I don't know how to send this to just one person, as K9HZ probably has to answer it.
In winding L2 I find I cannot get 7 turns on; I come up a little short. Maybe I am not strong enough to pull it tighter.
My questions: I have 18 gauge wire. Is that adequate for L2, or do I need 20 gauge?
I can get 20 gauge at AliExpress, but it probably is not silver coated and the finish is not TFE.
I found it at one vendor, but the minimum length is 100 ft. at an insane price.
Can I use 20 gauge PVC wire with no silver coating?
If not, can I buy another piece from K9HZ?
Thanks for any help. 73,
Bruce VA3BKI
|
Re: Learning while on vacation
The superscalar aspect of the pipelines was when they added additional execution units to handle branch prediction so that they could compute both paths. As they added more simultaneous instructions to achieve higher throughput (4-way superscalar), that was when the hyperthreading abstraction came into existence.
|
Re: Learning while on vacation
Somewhere I have an article about the DSP/CPU versus FPGA wars of the late 1990s/early 2000s. FPGA won! Even though FPGAs clock at much slower speeds, the ability to transform data in a single clock cycle wins!
There are three distinct hardware technologies which can be combined in various ways for an optimal solution to a specific problem. In order of fastest to slowest:
1. Hard silicon (meaning you take the algorithms in the FPGA and make an IC).
2. FPGA.
3. General purpose CPU + DSP-specific hardware.
One of the vendors I worked with while still employed (Thank God I am retired!) told our team a story of what was going on in the defense industry at the time. There was a huge and growing demand for IC designers of the hard silicon type. They wanted to replace the FPGAs in aircraft and other flying objects with hard silicon. Why would that be? It turns out there were so many FPGAs in them that it had become a significant weight and power factor which was limiting the aircraft's performance! So when you take an FPGA and turn it into hard silicon, there is a significant size and power reduction. But you can't re-program hard silicon. So you have to make damn sure the algorithm you are going to turn into hard silicon is well refined!
At the time I had no idea how dependent the defense industry had become on FPGAs. I guess even the smallest missiles have a handful of FPGAs in them. FPGAs also found their way into the finance industry for high-speed trading type applications. A bunch of the very best technical talent got sucked up by Wall Street! Same goes for the crypto miners. They were cranking out hard silicon for that purpose. Super-strong motivation to be the fastest!
I'll keep my eyes out for that article about the FPGA versus DSP wars. Interesting stuff!
73 Greg KF5N
|
Re: Learning while on vacation
On 2023-09-26 17:23, Steve Clark wrote:
With all of this extra circuitry that got added to these 'superscalar' architectures, the processor designers came up with a way to package it into things like 'hyperthreading' and whatever AMD called their virtual 'processors.' These are actually just beefed up versions of the superscalar architectures.
*** My understanding of "hyperthreading" is that it's just multiple sets of CPU registers. I did a project with an 8051 back in the 80's. The '51 had two sets of registers, aka "register banks", that you chose by hitting a single bit in, IIRC, the status register. That second register bank was EXTREMELY advantageous for interrupt processing, because it saved us from having to push & pop everything. Not only that, but the second bank could save info from previous interrupts.
According to Wikipedia, "superscalar" is not about pipelining. It's about simultaneous execution of multiple instructions. So: multiple execution units in a single core. Want to do 6 instructions at once? You need 6 ALUs.
- Jerry, KF6VB
|
Re: Learning while on vacation
Also, here is the originator of that Trenz "teensy" style FPGA board that is manufactured by Trenz in Germany (I think):
|
Re: Learning while on vacation
Here is what looks to be a really good FFT implementation (for FPGA).
The author, Dan Gisselquist, seems to be fairly knowledgeable about FPGA development. If you look on his GitHub page he has a number of interesting projects, including a soft CPU core and a CORDIC block, which is the heart of a lot of radio DSP type functions.
|
Re: Learning while on vacation
is a good place to start
-- John G0ORX
On 27/09/2023 04:45, K9HZ wrote:
I would like to start a list with free sources for IP. That's where my interest lies at the moment.
|
Re: Learning while on vacation
I would like to start a list with free sources for IP. That's where my interest lies at the moment.
Dr. William J. Schmidt - K9HZ J68HZ 8P6HK ZF2HZ PJ4/K9HZ VP5/K9HZ PJ2/K9HZ
Owner - Operator Big Signal Ranch - K9ZC Staunton, Illinois
Owner - Operator Villa Grand Piton - J68HZ Soufriere, St. Lucia W.I.
Rent it: www.VillaGrandPiton.com
email: bill@...
|
Re: Learning while on vacation
Yeah, I just jumped in to try to help folks understand that the RT1062 has plenty of "horse power." Then it seemed that some did not understand pipelining or superscalar when I quoted from the NXP RT1062 documentation. Hopefully, we won't get duped into a monorail like the folks in the Simpsons ;)
In an earlier post, I said that I suspected that DMA wasn't used and/or configured properly. I also said that the original ConvolutionSDR source code provided us various opportunities for optimizations and improvements. After some of those are implemented, I don't think anyone will be searching for a replacement for the RT1062, unless it has less "horse power" than the RT1062. How low can we go?
"Once CPU clocks got above around 100 MHz, they had to do this pipelining in order to allow the 'subcircuits' to function at the higher clock rates." -- Steve (KE0DXH)
Also, if it helps folks understand, the CPU clock speed depends at least in part on the size of each pipeline stage. Signals travel through each pipeline stage, and the fewer transistors the signals have to travel through, the faster the stage can be run. In other words, the fewer transistors the signals have to travel through, the sooner a stable result is reached at the end of the stage. For instance, the stable result is provided to the next pipeline stage at the rising edge of the CPU clock.
As an example, in the early 2000's, the PowerPC processor had a seven-stage (or six-stage, I can't remember) pipeline running around 500 MHz, while inside the Pentium there was around a 23-stage pipeline running at two to three times that speed (i.e., much smaller pipeline stages). Back then, a lot of people equated CPU clock speed with computational performance. Perhaps a lot of people still do…
One of the shortcomings of having a lot of pipeline stages is the cost of missing a branch prediction. When, not if, a branch prediction is missed, the pipeline must be cleared (or at least most of it), and all the CPU work in the pipeline must be discarded. The work in any pipeline must be thrown out when a branch prediction is missed, but it may "hurt" less with shorter pipelines.
Anyway, enough geeking out for one night, but I'd be interested in seeing what FPGA features can be implemented with this project.
?
Ian
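To put rough numbers on the two points in the post above (the slowest stage sets the clock, and a missed branch prediction costs roughly the work in flight), here is a minimal Python sketch. The stage delays, branch fraction, and misprediction rate are made-up illustrative values, not figures from NXP, PowerPC, or Intel documentation, and the cost model (one flush costs about one pipeline depth in cycles) is a simplifying assumption.

```python
def max_clock_mhz(stage_delays_ns):
    """The clock period can be no shorter than the slowest stage's settling time."""
    return 1000.0 / max(stage_delays_ns)


def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once pipeline flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Hypothetical designs: the same 14 ns of logic cut into 7 or 20 stages.
designs = {
    "7-stage": [2.0] * 7,
    "20-stage": [0.7] * 20,
}

for name, stages in designs.items():
    clock = max_clock_mhz(stages)
    # Assumed workload: 20% branches, 10% of them mispredicted,
    # each flush costing about one pipeline depth in cycles.
    cpi = effective_cpi(1.0, 0.2, 0.1, len(stages))
    print(f"{name}: {clock:.0f} MHz, CPI {cpi:.2f}, "
          f"{clock / cpi:.0f} million instructions/s")
```

With these assumed numbers the deeper pipeline still comes out ahead, but by less than its clock advantage, which is the "clock speed is not performance" point in the post.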
|
Re: Learning while on vacation
It's kind of interesting where this discussion has gone. Ian's clothing description describes what's called 'pipelining': you break a higher-level function into smaller parts, and then stagger the execution so that at each clock, each of the 'parts' can do its piece of the larger function. Once CPU clocks got above around 100 MHz, they had to do this pipelining in order to allow the 'subcircuits' to function at the higher clock rates. The only problem with pipelining is that sometimes you get an instruction that has 2 possible 'outcomes', i.e. a branch instruction. To solve this problem, the CPU designers started adding additional logic to work on both outcomes of the instruction in the same clock, and then after the condition of the branch is known, the correct one can be picked. To feed all of this at the higher clock rates required inventing all of the caching that you probably are familiar with, which intelligently peeks ahead and grabs 'blocks' of RAM to support processing both paths of the branch.
With all of this extra circuitry that got added to these 'superscalar' architectures, the processor designers came up with a way to package it into things like 'hyperthreading' and whatever AMD called their virtual 'processors.' These are actually just beefed up versions of the superscalar architectures.
Now what is really interesting is that pipelining is EXACTLY the strong point of FPGAs. If you have some processing that you can break down into stages, where each stage does a small piece and feeds it to the next stage, then your data can flow through the pipeline a clock at a time, and while there is an N clock cycle latency for the final answer, you basically are computing a full solution every single clock cycle. For FPGAs, you want to put the algorithms into them that have NO branching, so that you don't have to deal with all of that crap. Let the superscalar CPU deal with all the branching type of algorithms. And use the FPGA to do the easily pipelined operations.
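The "N clocks of latency, then one answer per clock" behavior can be seen in a small software model. This is only a Python sketch of a register-per-stage pipeline, not FPGA code; the three stage functions (multiply, add, square) and the name `run_pipeline` are arbitrary choices made for illustration.

```python
def run_pipeline(stages, inputs):
    """Clock a register-per-stage pipeline; return (cycle, result) pairs."""
    regs = [None] * len(stages)                 # output register of each stage
    results = []
    feed = list(inputs) + [None] * len(stages)  # extra clocks to drain the pipe
    for cycle, x in enumerate(feed, start=1):
        # Update from the last stage backwards so every stage sees the value
        # its predecessor held before this clock edge.
        for i in reversed(range(len(stages))):
            src = x if i == 0 else regs[i - 1]
            regs[i] = None if src is None else stages[i](src)
        if regs[-1] is not None:
            results.append((cycle, regs[-1]))
    return results


# Three branch-free stages computing (3*x + 1)**2, one small piece per stage.
stages = [lambda v: v * 3, lambda v: v + 1, lambda v: v * v]

for cycle, value in run_pipeline(stages, [1, 2, 3, 4]):
    print(f"clock {cycle}: result {value}")
```

The first result appears on clock 3 (the pipeline depth), and after that a new result comes out on every clock, even though each individual input still takes three clocks to pass through.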
I am currently in the process of reading the awesome book that W8TEE wrote, and am interested in FPGA-based DSP. There are a couple of FPGA boards I'd like to also suggest. One of them is actually in the 'teensy' line of design:
The other one is a Chinese model, but it actually has an 800 MHz 2-core ARM embedded into it also, along with an amazing amount of I/O peripherals, for the best price I've seen for such functionality:
I have one of the second boards, and am in the process of getting the first one. These boards use Xilinx-based FPGAs instead of Altera (Intel).
73,
Steve KE0DXH
|
Re: Learning while on vacation
Jerry:
A quick look at Wikipedia reveals: "A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor."
In pipelining, different stages of the processor perform their respective portions of work in executing an instruction. In classic RISC (reduced instruction set computer) architecture, a processor has five stages: instruction fetch, instruction decode, execute, memory access, and writeback. Each stage consumes one clock cycle (notwithstanding floating point operations). You'd have to consult the NXP documentation for the six stages of the Teensy 4.0 and 4.1 processor.
A basic analogy of pipelining could be washing, drying, and ironing clothes. A piece of clothing is put in the washer, then put in the dryer, then ironed. When a piece of clothing exits the washer, another piece of clothing can be put into the washer and continue down the clothing-processing pipeline.
Sticking with that basic analogy, if your clothing system were superscalar, two pieces of clothing would be put into two separate washers, then into two separate dryers, and then ironed by two separate irons.
That's pretty basic. A processor may have only some of its stages in parallel and still be called superscalar. For instance, years ago, Sun Microsystems came out with a SPARC processor with a superscalar ALU (arithmetic logic unit).
In the clothing-processing analogy, one could have a washer, a dryer, and two irons, since one piece of clothing may take longer to iron than another. One could even put lots of clothes in the washer and dryer stages and then go to a single iron stage. That way, one can speed up the pipeline by doing some portion of the processing in parallel.
While two pieces of clothing are almost never dependent on one another, two instructions may or may not be. For example, one instruction may depend on a result from another instruction. Where that result is (for example, still being processed) will dictate whether those two instructions can be executed in parallel.
Those are very basic analogies/examples, but I hope they help =) The book Computer Architecture: A Quantitative Approach explains things much better.
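The dependency point can be made concrete with a toy model. The following Python sketch is a deliberately simplified, hypothetical two-wide in-order issue rule (the register names and the `Instr` record are invented for illustration, and it is not how the Teensy's processor or any real CPU is documented to decide): two neighboring instructions go in the same cycle only if the second one does not read or overwrite the first one's result.

```python
from collections import namedtuple

# A toy instruction: one destination register and the registers it reads.
Instr = namedtuple("Instr", "dest srcs")


def can_dual_issue(first, second):
    """True if `second` neither reads nor overwrites what `first` writes."""
    reads_result = first.dest in second.srcs     # read-after-write dependency
    same_target = first.dest == second.dest      # write-after-write conflict
    return not (reads_result or same_target)


def schedule(program):
    """Greedy in-order issue, at most two instructions per cycle."""
    cycles, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_dual_issue(program[i], program[i + 1]):
            cycles.append(program[i:i + 2])
            i += 2
        else:
            cycles.append(program[i:i + 1])
            i += 1
    return cycles


program = [
    Instr("r1", ("r2", "r3")),   # r1 = r2 + r3
    Instr("r4", ("r1", "r5")),   # needs r1, so it cannot pair with the line above
    Instr("r6", ("r7", "r8")),   # independent, pairs with the previous instruction
    Instr("r9", ("r2", "r3")),
]

for n, group in enumerate(schedule(program), start=1):
    print(f"cycle {n}: {[ins.dest for ins in group]}")
```

Four instructions take three cycles here rather than two, purely because of the one dependency, which is the laundry analogy's "two shirts are independent, two instructions may not be" in code form.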
Ian
|
Re: Learning while on vacation
On 2023-09-22 12:12, ian007 via groups.io wrote:
Wes et al.:
The processor of the Teensy 4.0 and 4.1 does have parallel functionality. It has a 6-stage _SUPERSCALAR_ pipeline.
*** Does this mean that it can execute six instructions at the same time (assuming no data dependencies)?
- Jerry, KF6VB
|
File /K9HZ 20W 1-54 MHz RF Amplifier module/20W AMP 160M-6M Build Instructions V2.4 092423.pdf updated
#file-notice
The following files and folders have been updated in the Files area of the [email protected] group.
By: K9HZ <bill@...>
Description:
20W PA Build and Setup Instructions V2.4
|
Re: Learning while on vacation
On Fri, Sep 22, 2023 at 03:34 PM, Jim Strohm wrote:
Just for grins, I chose my home QTH and the first place I could think of in Australia, which was Adelaide. Turns out that the distance is about 9,184 miles as the 747 flies.
Ignoring ionospheric bounces, covering that distance takes 9,184 miles / 186,000 mi/sec, or about 1/20 sec flight time, as an electron flies.
Or 0.05 sec.
When I wrote my previous message that mentioned latency, I could have been more clear. I blame jet lag and diving back into the T41 project right after a vacation.
This discussion might be confusing two different types of latency. In the short term, on a scale of microseconds, "interrupt latency" is the delay between a hardware event and the start of interrupt processing in software. This isn't usually a problem unless the main body of software disables interrupts for long periods.
There's a related delay. Long interrupt service routines or very frequent interrupts can slow down "normal" non-interrupt processing. For instance, at 192 ksps, new I and Q samples from the T41's receive ADC appear every 5.2 microseconds. Even on a 600 MHz processor, handling each sample individually via an interrupt would put a significant load on the processor. The T41 gets around this by using the Teensy Audio library, which works pretty much "off the shelf" to handle ADC and DAC samples via DMA to and from RAM buffers. No per-sample interrupts are needed.
However, the DSP functions work on buffers of (if memory serves) 512 I and Q samples. So latency of the T41 receiver, RF to audio, is at least about 5.3 milliseconds: half to fill a buffer with I/Q samples, and half to fill a buffer with digital audio samples before streaming them out to analog audio. Add to this the delay to push signals through the DSP software chain, which shares processor time with waterfall display calculations and other functions. It would be interesting to measure these delays for both transmit and receive.
Transmit and receive latencies under 50 milliseconds or so are probably not going to affect amateur radio operations, even contesting. Tatsuya Hirahara JQ3ALW has an article in the September 2023 issue of QEX where he measures the latency of two commercial transceivers. He found that a simple analog direct-conversion receiver has a latency of about 0.5 msec, RF to audio, while the Icom IC-7630 had receive latency of 8 to 16 msec, and the IC-7851 clocked in at 4 to 20 msec. The latency of the two commercial rigs increased as receive bandwidth was narrowed. It wouldn't be hard to make similar measurements on the T41.
This is getting far afield from FPGA vs. software processing, but what I meant earlier was that the T41 avoids a lot of microsecond-scale delays by using DMA, but has the millisecond-scale delays for handling buffered data that are a feature of many SDR designs. Wearing my software engineering hat for a moment, I'd say this was a good design tradeoff for Convolution SDR and the T41 project.
Wes AC8JF
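The figures in this message are easy to check with a few lines of Python. This is a back-of-the-envelope sketch only: the 192 ksps rate, 600 MHz clock, and the quoted 9,184 mile path come from the thread, the 512-sample buffer is the "if memory serves" value and so is an assumption, and the two-buffer latency floor simply follows the model described above rather than any measurement of the T41.

```python
SAMPLE_RATE_HZ = 192_000      # from the post
BUFFER_SAMPLES = 512          # "if memory serves" in the post, so an assumption
CPU_CLOCK_HZ = 600_000_000    # from the post


def sample_period_us(rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1e6 / rate_hz


def buffer_fill_ms(samples, rate_hz):
    """Time to fill one buffer at the given sample rate, in milliseconds."""
    return 1e3 * samples / rate_hz


def cycles_per_sample(cpu_hz, rate_hz):
    """CPU clock cycles available between two samples."""
    return cpu_hz / rate_hz


def flight_time_ms(miles, miles_per_s=186_000):
    """Straight-line propagation time at the rounded speed of light."""
    return 1e3 * miles / miles_per_s


fill = buffer_fill_ms(BUFFER_SAMPLES, SAMPLE_RATE_HZ)
print(f"sample period:        {sample_period_us(SAMPLE_RATE_HZ):.2f} us")
print(f"cycles per sample:    {cycles_per_sample(CPU_CLOCK_HZ, SAMPLE_RATE_HZ):.0f}")
print(f"one buffer:           {fill:.2f} ms")
print(f"input + output buffer: {2 * fill:.2f} ms")
print(f"9,184 mile path:      {flight_time_ms(9184):.1f} ms")
```

Under these assumptions the buffering floor is roughly a tenth of the propagation time over the quoted path, which fits the remark that latencies under 50 milliseconds or so are unlikely to matter on the air.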
|
Re: Learning while on vacation
I'd like to throw in a couple of things to this great discussion
from a systems perspective.
In respect of latency one could perhaps ask questions like these:
Does/will it exist?
Does it matter, and if it does, is it known so that corrections
might be applied?
I had to ask these questions back in 1988 when I was instructed
to get the time right on a network broadcast station. (Reading the
news at the wrong time is frowned upon.....)
In my case latency would exist in various fixed links and could
be determined so correction figures for time data were known and
applied to what was sent. Audible time signals were sent early so
the time in London was the same as a long way up north.
However in some cases latency would be present but not easily
known. The way that was handled was to use data modems which
established connections between both ends and then worked out the
round trip time by pinging tones. So the latency was then known
and a correction figure could be applied to the transmitted time
data.
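The correction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the 1988 system: the function names are invented, and it assumes the link is symmetric so that the one-way delay is half of the measured round trip.

```python
def one_way_delay_s(round_trip_s):
    """Assume a symmetric link: each direction takes half the round trip."""
    return round_trip_s / 2.0


def send_time_s(target_arrival_s, round_trip_s):
    """When to transmit so that the signal arrives at target_arrival_s."""
    return target_arrival_s - one_way_delay_s(round_trip_s)


# Example: a measured round trip of 80 ms means sending 40 ms early.
measured_round_trip = 0.080
top_of_hour = 3600.0   # seconds past the previous hour
print(f"send at {send_time_s(top_of_hour, measured_round_trip):.3f} s "
      f"to arrive at {top_of_hour:.3f} s")
```

If the two directions of a link are not equally slow, the halving step is where the error creeps in, so a fixed link with a known delay in each direction would use that figure directly instead.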
Processors weren't too clever back in 1988 but they were clever
enough for what I had to do.
Now we have the Teensy, FPGAs, and all sorts of wonderful things.
The same principles exist though.
Jim G4EQX
|
Re: Learning while on vacation
Having written a lot of code for Hermes and Apache Labs radios, I can assure you that it does not do all the DSP work in the FPGA.
On the receive side the hardware ADC samples 64 MHz of spectrum with 24-bit I and Q samples. This is why it needed an FPGA to be able to handle all that data.
Depending on the hardware version and the size of the FPGA, it could then output over the Ethernet interface (or the earlier USB interface) between 1 and 7 DDC streams of 24-bit I and Q samples at sample rates of 48k to 384k centered anywhere in the 64 MHz. With this I could have multiple receivers running concurrently on multiple bands.
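To get a feel for how much data those DDC streams represent, here is a small Python sketch. It counts raw I/Q payload only; packet headers and protocol overhead are ignored, the function name is invented, and whether any given board actually runs the largest combination shown is not something this calculation claims.

```python
def ddc_payload_mbit_s(streams, sample_rate_hz, bits_per_sample=24):
    """Raw I/Q payload rate in Mbit/s, ignoring headers and protocol overhead."""
    # Each complex sample carries one I value and one Q value.
    return streams * sample_rate_hz * 2 * bits_per_sample / 1e6


for streams, rate in ((1, 48_000), (1, 384_000), (7, 384_000)):
    print(f"{streams} stream(s) at {rate // 1000}k: "
          f"{ddc_payload_mbit_s(streams, rate):.3f} Mbit/s")
```

Compared with the raw ADC output, these decimated streams are small enough to send to a host, which is why the split of DDC in the FPGA and DSP on the host is workable.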
The host system would then perform all the DSP processing on these DDC streams. WDSP was written by Warren Pratt and is used as the DSP library by both my Linux code and the Windows code.
In a similar way the host processed microphone input and performed the DSP code to output I and Q samples over the Ethernet to then have the FPGA perform the DUC to output through the DAC.
There are 2 different network protocols implemented. Protocol 1 was the earlier version and required all DDC streams to be at the same sample rate. Protocol 2 was written to allow DDC streams to be at different sample rates, plus a lot of other improvements.
-- John G0ORX
On Sat, 23 Sept 2023, 00:32 K9HZ, <bill@...> wrote:
I was gonna say… from what I've seen… even the Hermes project… the FPGA did all of the digital processing including FP DSP… but the A/D, D/A, and analog filters were all external to the FPGA.
I've attached a couple block diagrams of SDRs… They are almost functionally identical…
I think you will have problems implementing the DSP functions in the FPGA. It is fairly easy to get samples from an A-D or D-A and do some processing on it to implement a DDC or DUC, but to do any more requires complex mathematics which would have to be implemented on the FPGA using some code to implement a CPU that has floating point.
I may be wrong but most of the FPGA based SDR radios just implement the DDC and DUC in the FPGA and leave the rest to a CPU (ARM or Intel) using either a USB or Ethernet interface to send/receive the I/Q data.
-- John G0ORX
On 22/09/2023 19:44, K9HZ wrote:
What you say works for regular processors, even with multiple cores. Not for FPGAs. They always have many thread-equivalent tasks running independent of the rest. They CAN communicate if you want them to but it's not necessary… and the type of communication is akin to using shared memory so that the individual tasks don't wait on each other.
Once I am proficient in programming my FPGA kit, I will build the equivalent of the QSE, QSD, the DSP functions, and all the "other stuff" into my FPGA as separate and independent tasks. Nothing will be interrupt driven. That is drastically different from Arduino or RPi processing.
On Fri, Sep 8, 2023 at 06:35 PM, K9HZ wrote:
[T]he Teensy is a device that executes code tasks serially… meaning NOT with parallel functions… and it must be interrupted to generate I/O.
Interrupts, no matter how fast the processor is, cause chaos with radio functions. Makes audio discontinuous. Makes screen updates discontinuous. For code, it makes perfect sense. For continuous processes like streaming audio, or a demodulator… It's bad.
Late to the discussion, but... those are dangerous words. :-) The primary rule of real-time programming is that software must never cause a task to miss its timing deadline. The corollary is that when real-time software in a device meets all its deadlines, a person perceives that the device is running continuously. Yes, interrupts can cause problems, but good design techniques have created plenty of real-time applications, for instance your car's engine controller, where the timing never fails.
Now, the T41-EP software has kludged together "soft" real-time operation on a platform not designed for it, Arduino. Uncertain timing for interrupts and DSP functions is probably one reason why the T41, like the Convolution SDR it's based on, uses direct memory access (DMA) to buffer several milliseconds of each IQ and audio channel to get around timing issues. The price of this is transmitter and receiver latency. I'd guess all SDR transceivers suffer many milliseconds of signal latency. (See, for instance, the September 2023 QEX.)
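To put rough numbers on the buffering latency described above, here is a minimal sketch. The block sizes, block counts, and sample rates are illustrative assumptions picked for easy arithmetic, not values taken from the T41-EP or Convolution SDR source, and the function name is hypothetical.

```python
def buffer_latency_ms(samples_per_block: int, blocks_buffered: int,
                      sample_rate_hz: float) -> float:
    """Return how long a sample sits in a chain of buffered blocks, in ms.

    A sample cannot leave the chain until every queued block ahead of it
    has been clocked out, so the delay is (total samples) / (sample rate).
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return 1000.0 * samples_per_block * blocks_buffered / sample_rate_hz


# Assumed example values (hypothetical):
# 128-sample blocks, 16 blocks queued, 192 kHz IQ sample rate.
iq_latency = buffer_latency_ms(128, 16, 192_000)

# Assumed audio stage: 128-sample blocks, 2 blocks queued, 24 kHz audio rate.
audio_latency = buffer_latency_ms(128, 2, 24_000)

total_latency = iq_latency + audio_latency

print(f"IQ buffering:    {iq_latency:.2f} ms")
print(f"Audio buffering: {audio_latency:.2f} ms")
print(f"Total:           {total_latency:.2f} ms")
```

Under these assumed numbers each stage contributes a bit under 11 ms, which shows how quickly "several milliseconds" per channel adds up once more than one stage is buffered.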
73,
Wes Plouff AC8JF
|
Re: Learning while on vacation
I was gonna say... from what I've seen... even the Hermes project... the FPGA did all of the digital processing including FP DSP... but the A/D, D/A, and analog filters were all external to the FPGA.
I've attached a couple block diagrams of SDRs... They are almost functionally identical...
Dr. William J. Schmidt - K9HZ
From: John Melton via groups.io
Sent: Friday, September 22, 2023 2:02 PM
Subject: Re: [AmateurRadioBuilders] Learning while on vacation
I think you will have problems implementing the DSP functions in the FPGA. It is fairly easy to get samples from an A-D or D-A and do some processing on it to implement a DDC or DUC, but to do any more requires complex mathematics which would have to be implemented on the FPGA using some code to implement a CPU that has floating point.
I may be wrong but most of the FPGA based SDR radios just implement the DDC and DUC in the FPGA and leave the rest to a CPU (ARM or Intel) using either a USB or Ethernet interface to send/receive the I/Q data.
-- John G0ORX
|
Re: Learning while on vacation
Wes et al.:
The processor of the Teensy 4.0 and 4.1 does have parallel functionality. It has a 6-stage superscalar pipeline.
This project was born from source code that provided us many opportunities for improvements and optimizations. For instance, some of those (more esoteric) optimizations include: moving the interrupt table into RAM, moving frequently called interrupt handlers into RAM, and suspending interrupts for time-critical sections (if we have any). I'm sure there are lots more (less esoteric) improvements and optimizations, but I have just started looking at the source code and am waiting to buy one of the T41-EP kits.
One should also consider the hardware of the NXP MIMXRT1062DVJ6B. An ADC (analog-to-digital converter) of the MIMXRT1062DVJ6B is distinct from and operates independently of the processing unit (or "CPU" if you will) and can write directly to RAM (see, e.g., direct memory access). Then when the processing unit "has time," it can process the data from RAM (while the ADC continues to gather new data and write that new data to RAM). (Darn it!! That sounds like parallelism again!!) The same goes for a DAC (operating in reverse of an ADC process) and other peripheral hardware of the MIMXRT1062DVJ6B. If (1) the hardware is configured properly and the code takes advantage of that, or (2) the processing is VERY fast, a human will not notice any latency.
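The ADC-writes-while-CPU-processes arrangement described above is often built as a "ping-pong" (double) buffer. Below is a small host-side simulation of that idea in plain Python. It is only a sketch: it is not Teensy code, the class and function names are hypothetical, and real DMA on the MIMXRT1062 is set up through hardware configuration that is not shown here.

```python
class PingPongBuffer:
    """Two half-buffers: the simulated DMA fills one while the CPU reads the other."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.halves = [[0] * block_size, [0] * block_size]
        self.ready = [False, False]   # True while a half holds unprocessed data
        self.fill_index = 0           # half the simulated DMA will write next
        self.overruns = 0             # blocks overwritten before the CPU read them

    def dma_block_complete(self, samples):
        """Simulated 'block done' event: store one block, then swap halves."""
        if self.ready[self.fill_index]:
            self.overruns += 1        # CPU missed its deadline for this half
        self.halves[self.fill_index] = list(samples)
        self.ready[self.fill_index] = True
        self.fill_index ^= 1

    def cpu_process(self):
        """Consume the oldest ready half; the sum stands in for real DSP work."""
        # The half about to be refilled is the older one, so check it first.
        for idx in (self.fill_index, self.fill_index ^ 1):
            if self.ready[idx]:
                self.ready[idx] = False
                return sum(self.halves[idx])
        return None                   # nothing waiting


def run(blocks, cpu_every):
    """Feed `blocks` blocks; the CPU only gets to run after every `cpu_every`-th block."""
    buf = PingPongBuffer(block_size=4)
    results = []
    for n in range(blocks):
        buf.dma_block_complete([n] * 4)
        if (n + 1) % cpu_every == 0:
            out = buf.cpu_process()
            if out is not None:
                results.append(out)
    return buf.overruns, results


if __name__ == "__main__":
    print("CPU keeps up:   ", run(blocks=6, cpu_every=1))
    print("CPU falls behind:", run(blocks=6, cpu_every=3))
```

When the CPU services every block, no data is lost. When it only runs every third block, the overrun counter climbs, which is the simulated equivalent of a missed real-time deadline and an audible glitch.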
Since some of us are here to learn, I'll leave it as an exercise to learn/read about: direct memory access (DMA), DMA channels, real-time processing, and hard real-time processing. Entire books have been written about the latter two, but the Wikipedia pages on any of those can probably provide a better (and more in-depth) explanation than I can in a group posting. Also, if one is not familiar with the term "superscalar," that can also be an exercise to learn ;)
Ian
|
Re: Learning while on vacation
Yes, so do many, many others like the RadioBerry, the HPSDR hardware, the QS1R, Prometheus, etc... all open source.
Dr. William J. Schmidt - K9HZ
From: Lou, KI5FTY
Sent: Friday, September 22, 2023 2:07 PM
Subject: Re: [AmateurRadioBuilders] Learning while on vacation
Flex uses FPGA for all the processing
|