Open Source DSP XOs

Why do you have to interrupt the processor for each sample?

Well, at present I'm not using interrupts at all, rather polling for the next sample when I've finished with the current one. But perhaps in future I'll learn how to take advantage of the serial comms FIFO, and then I'll only need to poll every 4 samples. At present I have a simple serial-to-parallel converter with a latch which only holds a single stereo sample.
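For concreteness, a minimal sketch of that polling loop in C, assuming a hypothetical memory-mapped latch; the register addresses and the read-clears-flag behaviour are illustrative, not a real LPC memory map:

```c
#include <stdint.h>

/* Hypothetical registers for the serial-to-parallel latch. */
#define SAMPLE_READY (*(volatile uint32_t *)0x40080000u) /* non-zero when a new sample is latched */
#define SAMPLE_DATA  (*(volatile uint32_t *)0x40080004u) /* 2 x 16-bit stereo sample              */

static inline uint32_t next_stereo_sample(void)
{
    while (SAMPLE_READY == 0u) {
        /* spin; any cycles left over in this sample period are spent here */
    }
    return SAMPLE_DATA; /* assume reading the data clears the ready flag */
}
```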

Surely this is not a requirement for an active crossover, where a bit of latency is neither here nor there.

Quite so, it's not a requirement; it just suits the particular SoC and method of interfacing to the incoming data rather well.
 
Well, at present I'm not using interrupts at all, rather polling for the next sample when I've finished with the current one. But perhaps in future I'll learn how to take advantage of the serial comms FIFO, and then I'll only need to poll every 4 samples. At present I have a simple serial-to-parallel converter with a latch which only holds a single stereo sample.



Quite so, it's not a requirement; it just suits the particular SoC and method of interfacing to the incoming data rather well.

You are throwing away valuable clock cycles doing the polling. It would probably add up to much more than the interrupt latency, and that's time the processor could be spending doing other things.

What you're gaining on the roundabouts you're losing on the swings ;)
 
Seems that abraxalito's approach is to use a Cortex-M0 as close as possible to the DAC, adding intelligence to the DAC. Consider a digital audio source like S/PDIF, and a stereo 3-way crossover. I guess abraxalito would use one S/PDIF receiver followed by three Cortex-M0s operating in parallel, perhaps four when adding some global equalization, a long FIR perhaps. All Cortex-M0s see LRCK (Fs) as the main clock, also exploiting BITCLK and LRCK. No MCLK is needed, as abraxalito prefers NOS DACs.

The only functions abraxalito would add are 1) a high-quality (analog) volume control driven by an infrared remote control and 2) a message-passing scheme for changing the filter coefficients. No doubt abraxalito would use an extra Cortex-M0 for this, dealing with the RC5 infrared receiver interrupts (the received bits) and with a USB device connection, without disturbing the Cortex-M0s used as DSPs. The USB device would connect to a USB host (like a PC or a tablet) running a GUI. From there would come the filter coefficient updates. That Cortex-M0 would then dispatch the data to all the other Cortex-M0 DSPs.

For the interprocessor communication, knowing there could be five Cortex-M0s as data receivers (general equalizing, subwoofer, woofer, midrange, tweeter), and knowing that abraxalito has a preference for a compact layout (less EMI) and a "no interrupt" scheme, I would advise a mono-master to multi-slave serial line, unidirectional, using a handmade SPI-like scheme (data, clock, enable) with bit-banging and polling, conveying 32-bit data words preceded by an 8-bit CPU identity prefix and a 24-bit local address saying where to write the data, with a bitrate equal to half the sampling frequency.

Let's see what this becomes viewed from a Cortex-M0 DSP. When it has finished processing a sample, instead of waiting for the new sample (polling), it polls the data communication serial line input and processes any transition it sees there. All Cortex-M0s used as DSPs are wired in parallel on the data communication line. Processing one transition is straightforward: it's a state machine. In such a state machine, you need states for reading the 8-bit CPU identity, comparing it to the local CPU identity, reading the 24-bit address, reading the 32-bit data, and finally writing the 32-bit data into local memory. You manage the communication at a bitrate of half of Fs. If you need to update five 32-bit words, you do five separate transactions. This guarantees that any Cortex-M0 doing the DSP will only spend between 10 and 20 cycles on the communication after having processed each audio sample.
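To make the state machine concrete, here is a sketch in C of the slave side of that scheme. Everything device-specific is an assumption: the GPIO input register address, the pin assignments, and MY_CPU_ID are illustrative placeholders.

```c
#include <stdint.h>

#define GPIO_IN   (*(volatile uint32_t *)0x50000000u) /* hypothetical GPIO input port */
#define ENABLE_BIT (GPIO_IN & 1u)                     /* pin 0: enable */
#define CLOCK_BIT  ((GPIO_IN >> 1) & 1u)              /* pin 1: clock  */
#define DATA_BIT   ((GPIO_IN >> 2) & 1u)              /* pin 2: data   */

#define MY_CPU_ID 0x03u                               /* e.g. the woofer DSP */

enum comm_state { IDLE, GET_ID, GET_ADDR, GET_DATA };

static enum comm_state state = IDLE;
static uint32_t shift, nbits, prev_clk;
static uint32_t id, addr;

/* Called once per audio sample, after the DSP work is done.
 * At an Fs/2 bitrate there is at most one clock edge per call. */
void comm_poll(void)
{
    uint32_t clk = CLOCK_BIT;
    uint32_t rising = clk & ~prev_clk & 1u;
    prev_clk = clk;

    if (ENABLE_BIT == 0u) {            /* enable low: line idle, reset */
        state = IDLE;
        return;
    }
    if (state == IDLE) {               /* enable just went high        */
        state = GET_ID;
        shift = 0u;
        nbits = 0u;
    }
    if (rising == 0u)
        return;                        /* no new bit this sample       */

    shift = (shift << 1) | DATA_BIT;   /* MSB first */
    nbits++;

    if (state == GET_ID && nbits == 8u) {
        id = shift; shift = 0u; nbits = 0u; state = GET_ADDR;
    } else if (state == GET_ADDR && nbits == 24u) {
        addr = shift; shift = 0u; nbits = 0u; state = GET_DATA;
    } else if (state == GET_DATA && nbits == 32u) {
        if (id == MY_CPU_ID)
            *(volatile uint32_t *)addr = shift;  /* write the coefficient */
        state = IDLE;
    }
}
```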
 
Up to this point, pretty much right, steph_tsf :)

For the interprocessor communication, knowing there could be five Cortex-M0s as data receivers (general equalizing, subwoofer, woofer, midrange, tweeter), and knowing that abraxalito has a preference for a compact layout (less EMI) and a "no interrupt" scheme, I would advise a mono-master to multi-slave serial line, unidirectional, using a handmade SPI-like scheme (data, clock, enable) with bit-banging and polling, conveying 32-bit data words preceded by an 8-bit CPU identity prefix and a 24-bit local address saying where to write the data.

I'm not at all a fan of bit-banging; that's one way to squander perfectly good CPU cycles that could be going towards a longer FIR filter. All the LPCs have UARTs, so I'd explore using them before descending into the abyss of bit-banging :) The 9-bit (RS-485) mode of the UART is a bit fiddly to use, but would allow some kind of addressing function.
 
I'm not at all a fan of bit-banging; that's one way to squander perfectly good CPU cycles that could be going towards a longer FIR filter.
Disagree. The bitbang scheme I am advising only steals about 20 CPU cycles per audio sample, a hard-to-beat figure, with a deterministic worst case. The only caveat is the relatively slow datacom bitrate (half the sampling frequency).
All the LPCs have UARTs, so I'd explore using them before descending into the abyss of bit-banging :) The 9-bit (RS-485) mode of the UART is a bit fiddly to use, but would allow some kind of addressing function.
If you stick to a "no interrupt" scheme, you would poll the "new data available" flag of the UART. If the UART is configured to handle one byte at a time, and if you keep the discipline of dealing with one datacom byte per audio sample, such a scheme will only steal about 20 CPU cycles per sample, with a deterministic worst case, and deliver eight times the bitrate of the above bitbang scheme. The flip side is the datacom bit clock, a new clock in the system, potentially causing EMI whenever there is datacom traffic while the audio is on.
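As a sketch of that discipline, assuming LPC17xx-style UART registers (the addresses and the datacom_feed() handler are illustrative assumptions):

```c
#include <stdint.h>

#define UART_RBR (*(volatile uint32_t *)0x40008000u) /* receive buffer (illustrative) */
#define UART_LSR (*(volatile uint32_t *)0x40008014u) /* line status    (illustrative) */
#define LSR_RDR  0x01u                               /* "receiver data ready" flag    */

extern void datacom_feed(uint8_t b); /* hypothetical byte-wise protocol handler */

/* Called once per audio sample, after the DSP work:
 * consume at most one byte, so the worst case stays deterministic. */
void uart_poll_one_byte(void)
{
    if (UART_LSR & LSR_RDR)
        datacom_feed((uint8_t)UART_RBR);
}
```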

I'm afraid that if you want a "more efficient" datacom, not dealing with one datacom bit per audio sample, or with one datacom byte per audio sample, you'll have difficulty predicting how many CPU cycles must be allocated to the datacom task after the DSP task has completed.

Provided that the Fs/2 bitrate is enough, the bitbang scheme dealing with one datacom bit per audio sample is the most valuable scheme, compromise-free. In an audiophile context, it should be high on your list.
 
Disagree. The bitbang scheme I am advising only steals about 20 CPU cycles per audio sample, a hard-to-beat figure, with a deterministic worst case. The only caveat is the relatively slow datacom bitrate (half the sampling frequency).

Ah, I hadn't fully digested the fact that the bit-banging operates at such a low bitrate - objection dropped :) There's always an exception to the 'no bit banging' rule.

I'm afraid that if you want a "more efficient" datacom, not dealing with one datacom bit per audio sample, or with one datacom byte per audio sample, you'll have difficulty predicting how many CPU cycles must be allocated to the datacom task after the DSP task has completed.

I haven't reached this point at the current level of detail in my own designs - I'm still at the broad-brush stage. Thanks for pointing out the pitfall regarding determinism.

Provided that the Fs/2 bitrate is enough, the bitbang scheme dealing with one datacom bit per audio sample is the most valuable scheme, compromise-free. In an audiophile context, it should be high on your list.

Noted.
 
Another consideration in the single-sample versus sample-block processing trade-off is the opportunity to halt the processor to save power. Many embedded processors have the ability to halt until a particular hardware event occurs. By batching several samples together in a block, not only can the firmware potentially use more efficient instructions (SIMD), but it becomes possible to handle hundreds of samples very quickly and then halt the processor while DMA collects the next block. A simple end-of-block interrupt from the DMA can wake the processor periodically. Granted, this requires that the hardware allow DMA between peripheral and memory to continue while the instruction stream is halted, and I'm sure that not all processors are designed for this.
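A minimal sketch of that structure on a Cortex-M class part, assuming the DMA controller has already been configured for ping-pong peripheral-to-memory transfers (the setup code and process_block() are assumed, not shown):

```c
#include <stdint.h>

#define BLOCK 256
static int32_t buf[2][BLOCK];              /* ping-pong buffers filled by DMA */
static volatile uint32_t block_ready;      /* set by the end-of-block ISR     */
static volatile uint32_t filled_half = 1u; /* toggles as halves complete; DMA
                                              is assumed to fill half 0 first */

extern void process_block(int32_t *samples, uint32_t n); /* the actual DSP */

void DMA_IRQHandler(void)                  /* end-of-block interrupt */
{
    /* clear the DMA interrupt flag here (device-specific, omitted) */
    filled_half ^= 1u;                     /* the half the DMA just completed */
    block_ready = 1u;
}

int main(void)
{
    /* ... DMA and peripheral setup would go here ... */
    for (;;) {
        while (block_ready == 0u)
            __asm volatile ("wfi");        /* halt the core; DMA keeps running */
        block_ready = 0u;
        process_block(buf[filled_half], BLOCK);
    }
}
```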
 
What's unproductive about writing in assembler?

<edit> It looks to me that you've fundamentally missed the whole point of DIY. It's not about 'being productive' (as if that's an end in itself); it's about having fun (i.e. producing a feeling of well-being) while being productive of things we can't buy commercially. At least that's how it works for me.
 
What's unproductive about writing in assembler?

<edit> It looks to me that you've fundamentally missed the whole point of DIY. It's not about 'being productive' (as if that's an end in itself); it's about having fun (i.e. producing a feeling of well-being) while being productive of things we can't buy commercially. At least that's how it works for me.

Assembler is not portable, whereas C usually is.
 
Cortex-M0 assembler, I predict, is going to become the world's most portable computer language ever, in terms of the number of platforms that will run it natively.

I guess you want bare-metal programming, and for this you think you need to code in assembly. There is another way to do it. You may use the C language in a very restrictive way:

- use only integer arithmetic and integer arrays, no characters;
- define most variables as public, and a few counters and flags as "register" (if your C compiler doesn't ignore such a directive);
- use only simple control flow (if-then-else, switch, do-while, break, goto);
- avoid calling functions, preferring to store the results of evaluations in dedicated variables;
- be cautious with pointers and structures (check the code generated by your compiler);
- and last but not least, avoid all input and output features and all unix/linux system interfaces.

This way, you could generate efficient, readable, maintainable and portable code for the Cortex-M0/1/3/4 family. On top of this you would use assembler for your IIRs and FIRs, with different macros finely tuned to each Cortex version you are running.
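As a concrete illustration of this restrictive style (the post above would still hand the inner loops to assembler in the end), here is a single biquad in integer-only C; the s1.30 coefficient values are placeholders:

```c
#include <stdint.h>

/* Coefficients in s1.30 fixed point; these values are placeholders. */
int32_t b0 = 0x40000000, b1 = 0, b2 = 0;  /* 0x40000000 = 1.0 in s1.30 */
int32_t a1 = 0, a2 = 0;
int32_t x1s, x2s, y1s, y2s;               /* filter state, kept public */

int32_t biquad(int32_t x)
{
    int64_t acc;

    acc  = (int64_t)b0 * x;               /* direct form I */
    acc += (int64_t)b1 * x1s;
    acc += (int64_t)b2 * x2s;
    acc -= (int64_t)a1 * y1s;
    acc -= (int64_t)a2 * y2s;

    x2s = x1s;
    x1s = x;
    y2s = y1s;
    y1s = (int32_t)(acc >> 30);           /* back to the input scaling */
    return y1s;
}
```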
 
Last edited:
I guess you want bare-metal programming, and for this you think you need to code in assembly.

The reason I want bare-metal programming is because it's such fun. Having fun programming is the primary metric to optimise, and in my experience the most fun is to be had with tight hand-coding: optimizing every cycle, knowing all the quirks of the hardware and working around or in step with them. In contrast, C programmers are at the mercy of their compiler and its ability to optimize, and have little need to know machine specifics. If I wrote in C I'd be mulling over the machine code it generated anyway and wondering why it was idiotic in its use of resources. So why not just take the (to me unnecessary) abstraction layer out of the way from the start?

C coding certainly has its place; tiny, ultra-low-power systems with heavily limited resources aren't one of them, in my estimation.
 
My point was only that in most cases you can implement the same function in C in less time.

If you really enjoy writing in assembler then sure, but I don't know many people who do.

C is heavily used in automotive and implantable medical devices. Assembler is difficult to maintain, error-prone, gives you no ability to do static analysis, and makes unit testing more difficult.
 
The reason I want bare-metal programming is because it's such fun. In contrast, C programmers are at the mercy of their compiler and its ability to optimize, and have little need to know machine specifics. So why not just take the (to me unnecessary) abstraction layer out of the way from the start?
Try the datacom bitbang scheme as an example. Where do you see an "abstraction layer" if you code it in the very restrictive C described above? I think you would have fun coding the datacom bitbang in such restrictive C. I think that initially you rejected the bitbang datacom scheme because of the annoyance of coding it in assembly!
 
No, I rejected it initially not for the coding challenges (I enjoy them, provided the purpose is worthwhile) but for the apparent waste of CPU resources. When you presented your case in more detail, I noticed that the 20 cycles you estimated were indeed fairly immaterial in the scheme of things.
 
It's not merely the processor, it's all the peripherals too. But yeah, my aim is to get to know them better than the compiler writer :)
Agree, I/O and peripheral control are areas where handmade assembly code generally outperforms C-compiled code. Such handmade assembly code may deliver a ready-to-use 32-bit integer to be accessed from C. This way you may combine the advantages of both worlds.
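A tiny sketch of that split, with a hand-picked instruction wrapped so that the C side receives a plain uint32_t. The latch address is a made-up placeholder, and GCC extended asm on a Cortex-M0 is assumed:

```c
#include <stdint.h>

#define LATCH_ADDR ((volatile uint32_t *)0x50002000u) /* hypothetical latch */

/* One hand-chosen load; C just sees a ready-to-use 32-bit integer. */
static inline uint32_t latch_read(void)
{
    uint32_t v;
    __asm volatile ("ldr %0, [%1]"
                    : "=r"(v)
                    : "r"(LATCH_ADDR)
                    : "memory");
    return v;
}
```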
 
I disagree with the claim that assembly programming is a waste of time.

In my experience, a mix of C and assembly is most beneficial. Interrupt routines, FFT, and other DSP should generally be coded in assembly, unless you already have an efficient API from the chip manufacturer. The high-level program flow logic can be in C because it is not usually time-critical.

Assembler is not portable, whereas C usually is.
Interrupt routines are not portable, because they are highly dependent upon the specific peripherals that they are servicing.

As for FFT or other serious DSP subroutines, if you're coding in C for portability then you probably are not using your processor very efficiently.

Companies like Texas Instruments have entire documents devoted to the subset of C you need to use to take advantage of special addressing modes, bit reversal, and other constructs that are common in DSP but simply cannot be expressed in C. If you want to code those sorts of things in C and place copious comments warning maintainers not to change a single character, then proceed at your own risk. It's not going to be portable anyway, although it might work with a slightly different processor that uses the same compiler.
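For instance, a DSP's hardware modulo addressing keeps a FIR delay line circular for free; in portable C the usual workaround is a power-of-two mask, the kind of idiom those restricted-C documents tend to recommend. A generic sketch (s1.30 coefficients assumed):

```c
#include <stdint.h>

#define N 64u                       /* FIR length: a power of two, on purpose */
static int32_t delay[N];            /* circular delay line */
static uint32_t head;

int32_t fir_step(int32_t x, const int32_t *coef)
{
    int64_t acc = 0;
    uint32_t idx;

    head = (head + 1u) & (N - 1u);  /* modulo addressing emulated by masking */
    delay[head] = x;

    idx = head;
    for (uint32_t k = 0; k < N; k++) {
        acc += (int64_t)coef[k] * delay[idx];
        idx = (idx - 1u) & (N - 1u);           /* wrap without a branch */
    }
    return (int32_t)(acc >> 30);               /* assuming s1.30 coefficients */
}
```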

I'm speaking primarily of actual DSP chips, rather than general purpose processors. However, every processor that I've studied seems to have a few instructions that are useful for speeding things up in ways that the compiler usually misses. This is particularly true for vector processing and other DSP-related tasks.

In general, embedded firmware is not portable. I actually think that it's a mistake to code embedded firmware using techniques that are aimed at portability rather than optimization.
 