More time-domain samples will not provide better resolution, because the resolution is set by the upper frequency limit. However, if the UI allowed picking a peak on the time-domain display, one could window that peak in time, Fourier transform it, and do a linear fit to the phase to get the delay time and display that.
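A rough sketch of that pick, window, transform, and fit sequence, in Python with NumPy rather than anything that would run on the instrument. Everything numeric here is an assumption made up for illustration: a 101-point sweep from 0.1 to 1.0 GHz, one reflection at 20 ns, a second one at 60 ns, a little noise, a rectangular gate of 11 bins, and a fit over the middle of the band only to stay away from the ripple the gate edges cause. The sign convention exp(+j*2*pi*f*t) is assumed throughout.

```python
import numpy as np

# All values are invented for illustration. Units: GHz and ns.
N = 101
DF = 0.009
freqs = 0.1 + DF * np.arange(N)          # 0.1 .. 1.0 GHz sweep
t_true = 20.0                            # ns, the isolated reflection we want
t_other = 60.0                           # ns, a second, well separated reflection

rng = np.random.default_rng(0)
noise = 0.01 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
d = (np.exp(2j * np.pi * freqs * t_true)
     + 0.5 * np.exp(2j * np.pi * freqs * t_other)
     + noise)

# With the exp(+j*2*pi*f*t) convention a forward FFT maps frequency to delay.
h = np.fft.fft(d)
t_axis = np.arange(N) / (N * DF)         # delay of each bin, about 1.1 ns apart

# "Pick" the strongest peak and gate it with a rectangular window.
n0 = int(np.argmax(np.abs(h)))
half_width = 5
gate = np.zeros(N)
gate[n0 - half_width:n0 + half_width + 1] = 1.0

# Back to the frequency domain, then fit a line to the unwrapped phase.
g = np.fft.ifft(h * gate)
keep = slice(N // 5, N - N // 5)         # middle of the band only
phase = np.unwrap(np.angle(g[keep]))
slope, intercept = np.polyfit(freqs[keep], phase, 1)   # slope in rad per GHz
t_est = slope / (2 * np.pi)              # delay in ns

print(f"peak bin at {t_axis[n0]:.2f} ns, phase-fit delay {t_est:.3f} ns")
```

The peak bin alone only locates the reflection to the nearest bin, while the phase slope gives an estimate that is not tied to the bin spacing. A tapered gate would probably behave better than the rectangular one used here.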
For peaks which are close together, the Rayleigh criterion set by the maximum frequency is the best you can do. However, the phase fit is most easily done with just a single peak. For an isolated peak, the phase resolution and the wavelength of the highest frequency measured determine the spatial resolution. I doubt that the velocity factor is sufficiently consistent to do better than a few cm.
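To put rough numbers on the two limits, here is a back-of-envelope calculation in Python. All of the inputs are assumptions chosen only for illustration (900 MHz top frequency, 1 degree of usable phase resolution, velocity factor 0.66 known to 1 percent, a fault 3 m down the cable), and it assumes a reflection measurement, so the delay is a round trip.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def distance_resolution(f_max_hz, phase_res_deg, velocity_factor):
    """Distance change that moves the phase at f_max by phase_res_deg.

    Assumes a round trip: delay = 2 * distance / (velocity_factor * C),
    so phase = 4 * pi * f * distance / (velocity_factor * C).
    """
    dphi = math.radians(phase_res_deg)
    return velocity_factor * C * dphi / (4 * math.pi * f_max_hz)

def velocity_factor_error(length_m, vf_rel_error):
    """Distance error caused by a relative error in the velocity factor."""
    return length_m * vf_rel_error

# Hypothetical inputs, not measured values.
phase_limit = distance_resolution(900e6, 1.0, 0.66)   # roughly 0.3 mm
vf_limit = velocity_factor_error(3.0, 0.01)           # 3 cm

print(f"phase-limited: {phase_limit * 1e3:.2f} mm, "
      f"velocity-factor-limited: {vf_limit * 1e2:.1f} cm")
```

With these made-up inputs the velocity factor uncertainty dominates by about two orders of magnitude.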
A basis pursuit will resolve closely spaced reflections, but that's not something that an STM32F series MCU can do.
Off the top of my head, I don't see a way to solve d = a0*exp(j*2*pi*f*t0) + a1*exp(j*2*pi*f*t1) by least squares. Though you might get close enough by estimating a0 & a1 in the time domain and only solving for t0 and t1.
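One way to try the "fix a0 and a1, solve only for t0 and t1" idea is a brute-force search over a grid of candidate delays, keeping the pair with the smallest squared residual. This is a sketch in Python with NumPy, not a worked-out method and certainly not MCU code. The function name fit_two_delays, the sweep, the amplitudes, the delays, and the grid are all hypothetical, and a0 and a1 are simply assumed to be known already.

```python
import numpy as np

def fit_two_delays(freqs, d, a0, a1, t_grid):
    """Grid search for (t0, t1) minimizing
    sum |d - a0*exp(j*2*pi*f*t0) - a1*exp(j*2*pi*f*t1)|**2
    with a0 and a1 held fixed."""
    # basis[i, k] = exp(j*2*pi*f_k*t_i) for every candidate delay t_i
    basis = np.exp(2j * np.pi * np.outer(t_grid, freqs))
    best_err, best_t0, best_t1 = np.inf, None, None
    for i, t0 in enumerate(t_grid):
        resid = d - a0 * basis[i]                        # remove first term
        err = np.sum(np.abs(resid[None, :] - a1 * basis) ** 2, axis=1)
        j = int(np.argmin(err))                          # best t1 for this t0
        if err[j] < best_err:
            best_err, best_t0, best_t1 = err[j], t0, t_grid[j]
    return best_t0, best_t1

# Invented test case. Units: GHz and ns. The two delays are 0.6 ns apart,
# closer than the roughly 1.1 ns bin spacing of a 0.9 GHz span.
freqs = 0.1 + 0.009 * np.arange(101)
a0, a1 = 1.0, 0.6
t0_true, t1_true = 20.0, 20.6
rng = np.random.default_rng(1)
noise = 0.01 * (rng.standard_normal(101) + 1j * rng.standard_normal(101))
d = (a0 * np.exp(2j * np.pi * freqs * t0_true)
     + a1 * np.exp(2j * np.pi * freqs * t1_true)
     + noise)

t_grid = np.linspace(18.0, 23.0, 101)                    # 0.05 ns steps
t0_est, t1_est = fit_two_delays(freqs, d, a0, a1, t_grid)
print(f"t0 = {t0_est:.2f} ns, t1 = {t1_est:.2f} ns")
```

In this toy case the amplitudes are exact and the noise is small, which is much kinder than real data would be. How well it holds up when a0 and a1 are only rough time-domain estimates is exactly the open question.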
I wish I had time to work on this more, but other tasks must be attended to.
Have Fun!
Reg