Keyboard Shortcuts
ctrl + shift + ? :
Show all keyboard shortcuts
ctrl + g :
Navigate to a group
ctrl + shift + f :
Find
ctrl + / :
Quick actions
esc to dismiss
Likes
Search
Temporary Workaround to improve USB command stability
To Erik, Hugen, ttrf (de facto code "owners")
*The Command Stability Issue:* Some people have reported some stability issues with the usb command interface, including the screen going white. I have also noticed this issue with my own python experiments, and have been investigating it over the last few days. I've found an easy and reliable way reproduce the issue: I use putty to repeatedly type the "freq" command with different frequencies, and monitor the result on the tx port with a frequency counter. Although the usb interface continues to work, the frequency stops changing after 4-8 "freq" commands. The screen will also randomly go white, but much less frequently. Once the nanovna gets into this state it needs to be powered off and on again to restore normal operation; the usb reset command isn't sufficient. I've reproduced this issue with Hugen's Aug 2nd firmware, and with a build today based on Erik's firmware (here: ). *The Temporary Workaround:* In investigating this issue I have found that the problem is somehow related to the UI thread calling draw_all_cells() in parallel to the usb command thread running. Stability is improved by restricting when this function is called. See the code snippet (just 4 lines are moved)( ) This is a temporary fix until we find what the underlying cause of this interaction is. As described below, Hugen's firmware also needs stack size changes to improve stability. *Stack Size Issues* The symptoms we see are typically associated with a thread's stack overflowing the allocated area and trampling on adjacent memory. Indeed the command thread makes function calls at least 12 deep to process the "freq" command, and together with the parameters passed and local variables in each function, consumes a lot of stack! However I've eliminated this as a cause (at least for Erik's version). I did so by following the chibiOS guide re. stack sizes (here: ). The default nanovna build already set sets the mentioned build variables CH_DBG_ENABLE_STACK_CHECK and CH_DBG_FILL_THREADS. I implemented a simple memory dump command to let me do this testing (code also in the gist linked above). My conclusion is that Erik's firmware version has just 4 bytes spare stack space. This could easily be accidentally "blown" by a future simple software change (adding a parameter or another local variable). I therefore recommend increasing the usb thread's stack size by 64 bytes, (and reducing the UI stack size by the same amount, since it's stack is larger than needed and there's no other memory left). The two lines to change are also in the gist above. I also believe Hugen's version has the usb thread far too small - the "freq" command will definitely overwrite the UI's stack and cause instability - this needs fixing. We already have a proliferation of firmware versions, so I won't add to the problem by adding my own. Instead I suggest the existing owners investigate and take these changes forward as they see fit. Rgds, Dave |
Several days ago I reported stability issues with some usb commands (
/g/nanovna-users/message/2379 ) . This is a follow-up... First, to summarise the issue, a number of commands sent on the usb port leads to nasty stability issues (frequency freezing, the screen going white). The problem commands are: "freq", "scan", "offset", "power". (These all make calls to an internal routine called set_frequency() to set the output frequency). I was surprised at the time that more people weren't reporting similar problems, especially given the popularity of nanovna-saver, which uses the usb interface. However I've since looked at nanovna-saver and discovered that the only commands is uses are "info", "sweep", "data", "frequencies". These commands instead rely on the UI Thread to call set_frequency() for all 101 data points. It very nicely avoids the problems I encountered. This means that the issues I've found will *not* affect most users. However if you are dabbling with writing your own pc software you should stay clear of the problem commands. The following is therefore only relevant to those that want to use these problem commands, and to those wanting to improve overall firmware stability. I continued to investigate the cause of the instability, partly because I was concerned there was a random "rogue pointer" bug somewhere, and that my test case just made the bug more easily reproducible - therefore finding and fixing it might lead to other benefits. I had previously found that stability was improved by having the UI thread avoid calling draw_all_cells() (which updates the screen) whilst the usb interface was active. I therefore focused my efforts on the screen drawing code. The screen drawing logic is long and complicated (partly described by ttrf here: (use google translate). It's therefore a perfect candidate for a subtle bug causing a buffer's bounds to be exceeded. However, instead I isolated the problem to inside the screen-driver itself: ili9341.c. The source code includes 2 alternate implementations of ili9341_bulk(), selection between them being via an #ifdef. This routine outputs a large amount of data to the LCD screen. The "live" version uses DMA to transfer the information fast and without CPU intervention. The commented-out version transfers the data via the CPU alone, without DMA. When I switched to building with the CPU version the stability issue had gone! I have tried to narrow down what specifically about the DMA version causes stability issues, but have made no further progress. However I have eliminated memory corruption issues in the screen drawing logic as the cause of the instability. Note: the DMA version doesn't cause problems normally because the UI thread only ever calls set_frequency() and ili9341_bulk() sequentially; the problem only arises if the calls occur in parallel. Next steps: Permanently reverting to the non-DMA version of ili9341_bulk() isn't really an option as the screen refresh is considerably slower. It therefore appears that the "temporary" workaround I suggested earlier is perfectly viable longer term, at least until someone can work out why the DMA version causes issues. The stack size changes also make sense. I've extensively tested my own version of the firmware that includes the fix and it works very reliably both with the usb commands above, and also with nanovna-saver. Rgds, Dave |
to navigate to use esc to dismiss