¿ªÔÆÌåÓý

ctrl + shift + ? for shortcuts
© 2025 Groups.io

Temporary Workaround to improve USB command stability


 

To Erik, Hugen, ttrf (de facto code "owners")

*The Command Stability Issue:*

Some people have reported some stability issues with the usb command
interface, including the screen going white.

I have also noticed this issue with my own python experiments, and have
been investigating it over the last few days. I've found an easy and
reliable way reproduce the issue: I use putty to repeatedly type the "freq"
command with different frequencies, and monitor the result on the tx port
with a frequency counter. Although the usb interface continues to work, the
frequency stops changing after 4-8 "freq" commands. The screen will also
randomly go white, but much less frequently.

Once the nanovna gets into this state it needs to be powered off and on
again to restore normal operation; the usb reset command isn't sufficient.
I've reproduced this issue with Hugen's Aug 2nd firmware, and with a build
today based on Erik's firmware (here:
).

*The Temporary Workaround:*

In investigating this issue I have found that the problem is somehow
related to the UI thread calling draw_all_cells() in parallel to the usb
command thread running. Stability is improved by restricting when this
function is called. See the code snippet (just 4 lines are moved)(
)

This is a temporary fix until we find what the underlying cause of this
interaction is.

As described below, Hugen's firmware also needs stack size changes to
improve stability.

*Stack Size Issues*

The symptoms we see are typically associated with a thread's stack
overflowing the allocated area and trampling on adjacent memory. Indeed the
command thread makes function calls at least 12 deep to process the "freq"
command, and together with the parameters passed and local variables in
each function, consumes a lot of stack!

However I've eliminated this as a cause (at least for Erik's version). I
did so by following the chibiOS guide re. stack sizes (here:
). The default
nanovna build already set sets the mentioned build variables
CH_DBG_ENABLE_STACK_CHECK and CH_DBG_FILL_THREADS. I implemented a simple
memory dump command to let me do this testing (code also in the gist linked
above).

My conclusion is that Erik's firmware version has just 4 bytes spare stack
space. This could easily be accidentally "blown" by a future simple
software change (adding a parameter or another local variable). I therefore
recommend increasing the usb thread's stack size by 64 bytes, (and reducing
the UI stack size by the same amount, since it's stack is larger than
needed and there's no other memory left). The two lines to change are also
in the gist above.

I also believe Hugen's version has the usb thread far too small - the
"freq" command will definitely overwrite the UI's stack and cause
instability - this needs fixing.

We already have a proliferation of firmware versions, so I won't add to the
problem by adding my own. Instead I suggest the existing owners investigate
and take these changes forward as they see fit.

Rgds,
Dave


 

Several days ago I reported stability issues with some usb commands (
/g/nanovna-users/message/2379 ) . This is a follow-up...

First, to summarise the issue, a number of commands sent on the usb port
leads to nasty stability issues (frequency freezing, the screen going
white). The problem commands are: "freq", "scan", "offset", "power". (These
all make calls to an internal routine called set_frequency() to set the
output frequency).

I was surprised at the time that more people weren't reporting similar
problems, especially given the popularity of nanovna-saver, which uses the
usb interface. However I've since looked at nanovna-saver and discovered
that the only commands is uses are "info", "sweep", "data", "frequencies".
These commands instead rely on the UI Thread to call set_frequency() for
all 101 data points. It very nicely avoids the problems I encountered.

This means that the issues I've found will *not* affect most users. However
if you are dabbling with writing your own pc software you should stay clear
of the problem commands.

The following is therefore only relevant to those that want to use these
problem commands, and to those wanting to improve overall firmware
stability.

I continued to investigate the cause of the instability, partly because I
was concerned there was a random "rogue pointer" bug somewhere, and that my
test case just made the bug more easily reproducible - therefore finding
and fixing it might lead to other benefits.

I had previously found that stability was improved by having the UI thread
avoid calling draw_all_cells() (which updates the screen) whilst the usb
interface was active. I therefore focused my efforts on the screen drawing
code. The screen drawing logic is long and complicated (partly described by
ttrf here:

(use
google translate). It's therefore a perfect candidate for a subtle bug
causing a buffer's bounds to be exceeded.

However, instead I isolated the problem to inside the screen-driver itself:
ili9341.c. The source code includes 2 alternate implementations
of ili9341_bulk(), selection between them being via an #ifdef. This routine
outputs a large amount of data to the LCD screen. The "live" version uses
DMA to transfer the information fast and without CPU intervention. The
commented-out version transfers the data via the CPU alone, without DMA.
When I switched to building with the CPU version the stability issue had
gone! I have tried to narrow down what specifically about the DMA version
causes stability issues, but have made no further progress. However I have
eliminated memory corruption issues in the screen drawing logic as the
cause of the instability.

Note: the DMA version doesn't cause problems normally because the UI thread
only ever calls set_frequency() and ili9341_bulk() sequentially; the
problem only arises if the calls occur in parallel.

Next steps:
Permanently reverting to the non-DMA version of ili9341_bulk() isn't really
an option as the screen refresh is considerably slower.

It therefore appears that the "temporary" workaround I suggested earlier is
perfectly viable longer term, at least until someone can work out why the
DMA version causes issues.

The stack size changes also make sense.

I've extensively tested my own version of the firmware that includes the
fix and it works very reliably both with the usb commands above, and also
with nanovna-saver.

Rgds,
Dave