
CW Mistakes -- Stack collisions


Jack, W8TEE
 

Yep, stack collisions are insidious because you may not even know it happened.

For anyone who might be unaware, the controller's SRAM is used for data storage. Think of it as being divided into two parts: 1) the Heap, which is at the "bottom" of SRAM, and 2) the Stack, which grows downward from the top as you use the Stack. Any time you define a variable with global scope, or any static variables, in the sketch, they are placed in the Heap. Pretend that the Heap has used up 1000 bytes of SRAM, so that much of your SRAM is taken before your code even starts to run. This leaves just over 1000 bytes of Stack space on a Nano.

As your sketch starts to execute, local variables (those with function or statement block scope) are allocated on the Stack. Each time you call a function, any arguments passed to the function are placed on the Stack and read from the Stack by the function. (Each function call also has some overhead info placed on the Stack, so it knows where to return to when the function call is finished.) Once into the function, any of its variables are also placed on the Stack. If that function calls another function, the new function call also chews into the Stack space. Each time you use the Stack, that new data is like a new pile of clean salad plates getting placed on the plate dispenser at a salad bar. The more plates, the deeper the bottom of the stack is pushed downward. Pile enough data on the Stack, and you can push the bottom of the Stack into the top of the Heap space, overwriting the data stored there. Functions that call functions that call functions (e.g., recursive function calls) can often cause this kind of problem. This is called a Stack Collision.

When a Stack Collision happens, white smoke pours out of...naw, just kidding. Indeed, that's the bad part: you may not even know it happened. If the Heap data at the top just got clobbered, but you don't access it again, you won't know it's gone. Likewise, your first function call when the program started is on the bottom of the Stack and its data might never get popped back off the Stack. This is what I call a "Silent Stack Crash".
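
To make the picture above concrete, here is a small host-side model of a 2K SRAM, not real AVR code. It is only a sketch: the names ToySram, pushFrame, clobberedBytes and maxSafeDepth are made up for illustration, the 64-byte frame size is an assumption, and the bottom region stands in for what the post calls the Heap. Like real hardware, nothing in the model stops a stack push from writing over the global data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy model of a 2K SRAM: global/static data sits at the bottom,
// the stack grows downward from the top. Sizes are illustrative only.
constexpr std::size_t kSramSize = 2048;

struct ToySram {
    std::uint8_t mem[kSramSize];
    std::size_t staticEnd;  // first address above the global/static data
    std::size_t sp;         // lowest address currently used by the stack

    explicit ToySram(std::size_t staticBytes)
        : staticEnd(staticBytes), sp(kSramSize) {
        assert(staticBytes <= kSramSize);
        std::memset(mem, 0, sizeof mem);
        std::memset(mem, 0xAA, staticBytes);  // stand-in for global data
    }

    // Push one call frame. No check against staticEnd is made, which is
    // exactly why a collision goes unnoticed.
    void pushFrame(std::size_t frameBytes) {
        assert(frameBytes <= sp);
        sp -= frameBytes;
        std::memset(mem + sp, 0x11, frameBytes);  // stand-in for frame data
    }

    // Bytes of global data the stack has overwritten so far.
    std::size_t clobberedBytes() const {
        return sp < staticEnd ? staticEnd - sp : 0;
    }
};

// How many nested calls of frameBytes each fit before a collision.
std::size_t maxSafeDepth(std::size_t staticBytes, std::size_t frameBytes) {
    assert(frameBytes > 0 && staticBytes <= kSramSize);
    return (kSramSize - staticBytes) / frameBytes;
}
```

With 1000 bytes of globals and an assumed 64 bytes per call, maxSafeDepth(1000, 64) gives the number of nested calls that fit. One more pushFrame(64) after that makes clobberedBytes() non-zero, and the model raises no error, which is the "silent" part.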

With this understanding, now think about the 2K of SRAM on the Nano compared to 1024K of SRAM on the Teensy 4.1. In many applications, the limiting factor on a program is SRAM, not flash. If people are thinking about replacing the ATMega328 with something else, keep an eye on the replacement's SRAM resources...they are important.

Jack, W8TEE

On Sunday, May 16, 2021, 10:59:43 PM EDT, Jerry Gaffke via groups.io <jgaffke@...> wrote:


Those are some really good hints on how to instrument code.
A good reason to find a processor with a few more pins than the Nano.

Doesn't really need to be timer based, can just set pins high when doing
the various jobs, perhaps include a pin for when it's idle. This is far better than trying
to print stuff to a serial monitor or a display, as writing to a pin is pretty much instantaneous.

One of those $20 DSO138's would be adequate as a scope for this sort of thing.

Driving LED's from those pins would sort of work, but not nearly as informative as a scope.

Checking the stack like that is a very good idea, it can be really hard to debug when the
stack fills up to where it starts stepping on your global variables. All it takes is one array
invoked as a dynamic variable, or a poorly written library function. Could set hundreds of bytes
to the magic 0x55 value and count them each loop if you wish, so long as you have time to
do the counting.
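
The "set hundreds of bytes to 0x55 and count them" idea might look roughly like this. It is a host-side sketch that works on an ordinary byte array standing in for the free gap between the globals and the stack. The names paintRegion and untouchedBytes are hypothetical, and on a real AVR you would have to find the start and end of that gap yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t kMagic = 0x55;

// Fill the free region between the globals and the stack with the magic
// value. Index 0 is the end nearest the globals.
void paintRegion(std::uint8_t* region, std::size_t len) {
    assert(region != nullptr);
    for (std::size_t i = 0; i < len; ++i) {
        region[i] = kMagic;
    }
}

// Count the bytes, starting at the end nearest the globals, that still
// hold the magic value. The stack grows down toward index 0, so this is
// the margin the stack has not reached since painting.
std::size_t untouchedBytes(const std::uint8_t* region, std::size_t len) {
    assert(region != nullptr);
    std::size_t n = 0;
    while (n < len && region[n] == kMagic) {
        ++n;
    }
    return n;
}
```

The count can only shrink over time, so it acts as a low-water mark for free SRAM. One caveat: a stack byte that happens to hold 0x55 right at the boundary would be counted as untouched, so treat the result as accurate to within a byte or two.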

Jerry, KE7ER


On Sun, May 16, 2021 at 06:15 PM, jerry@... wrote:
On 2021-05-16 16:15, Evan Hand wrote:
All,
While I agree that there is enough sample time for the ADC, I have not
seen any analysis on the polling time of the main loop of the program.
I once worked on an embedded project with the following architecture, which is more or less what I use when starting a project from scratch:

1. A main loop. Each run through the loop takes a defined time ( we used 50ms, or 20 times a second ).

2. A timer interrupt. On that project, it happened every 2 ms. It bumped a counter on every invocation.

3. The main loop would time out its 50ms by watching the counter bumped by the 2ms interrupt.

4. When the main loop finished whatever useful things it was doing, it would set a pin high and wait till the timer count was 50ms. Then it would set the pin low, and go back to the beginning of the loop.

5. We would stick an oscilloscope on that pin, and it would tell us if the CPU was too busy. As the CPU loaded up, the pulse would get
shorter and shorter, because the CPU was spending less time just waiting at the end of the 50ms loop.
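
The five steps above can be sketched on a host PC as follows. This is not the original 8031 code. The names g_ticks, g_idlePin, timerIsr and runOnePass are made up for illustration, the "pin" is just a bool, and the wait loop calls the ISR itself because there is no real timer interrupt on the host.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTickMs = 2;    // timer interrupt period
constexpr std::uint32_t kLoopMs = 50;   // fixed main-loop period
constexpr std::uint32_t kTicksPerLoop = kLoopMs / kTickMs;  // 25 ticks

volatile std::uint32_t g_ticks = 0;  // bumped by the timer interrupt
bool g_idlePin = false;              // stand-in for a GPIO pin

// On real hardware this would be the 2 ms timer ISR.
void timerIsr() { g_ticks = g_ticks + 1; }

// One pass of the main loop. doWork is the useful work; on the host it
// calls timerIsr() itself to simulate elapsed time. Returns how many
// ticks the idle pin was high, i.e. the pulse width seen on the scope.
template <typename Work>
std::uint32_t runOnePass(Work doWork) {
    assert(!g_idlePin);  // pin must be low on entry
    const std::uint32_t start = g_ticks;
    doWork();
    g_idlePin = true;    // work finished: raise the pin
    const std::uint32_t idleStart = g_ticks;
    while (static_cast<std::uint32_t>(g_ticks - start) < kTicksPerLoop) {
        timerIsr();      // host only: a real MCU just waits for the ISR
    }
    g_idlePin = false;   // period is up: drop the pin and loop again
    return g_ticks - idleStart;
}
```

If the work takes 10 ticks (20 ms), the pin stays high for the remaining 15 ticks. If the work overruns the 25-tick budget, the wait loop never runs and the pulse vanishes, which is the "CPU too busy" signature described in step 5.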

This was done by an Intel 8031 running at 9MHz, with VERY limited resources.

At the top of the main loop, the stack pointer was always at the same number. If it wasn't, we knew something was badly wrong, and the processor would log the occurrence and reset.

We also set the top 3 bytes of the stack to 0x55. If it wasn't 0x55 anymore, we knew we were close to blowing the stack.
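
A rough host-side sketch of the two checks described above (same stack pointer at the top of every loop, and three guard bytes still holding 0x55). The names StackHealth and checkStack are hypothetical, and the stack pointer is passed in as a plain number, since reading the real one is processor-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint8_t kGuardValue = 0x55;
constexpr std::size_t kGuardLen = 3;

enum class StackHealth { Ok, GuardOverwritten, PointerDrift };

// guard points at the kGuardLen bytes at the far end of the stack region.
// expectedSp is the stack pointer recorded on the first pass through the
// main loop; currentSp is its value at the top of this pass.
StackHealth checkStack(const std::uint8_t* guard,
                       std::uintptr_t expectedSp,
                       std::uintptr_t currentSp) {
    assert(guard != nullptr);
    if (currentSp != expectedSp) {
        return StackHealth::PointerDrift;  // something pushed and never popped
    }
    for (std::size_t i = 0; i < kGuardLen; ++i) {
        if (guard[i] != kGuardValue) {
            return StackHealth::GuardOverwritten;  // stack reached its limit
        }
    }
    return StackHealth::Ok;
}
```

Calling this at the top of the main loop and logging and resetting on anything other than Ok would mirror what the post describes. What to do on failure is a design choice the sketch leaves open.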

- Jerry KF6VB

--
Jack, W8TEE