High quality Raspberry Pi 24bit/384k I2S card

@m0rci: Congrats and hats off.

I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but extending 8ch to 16ch or even to 32ch. There are very very few USB UAC2 -> 16ch I2S bridges, and basically none USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.

Please allow me a few questions: IIUC you generate all your I2S clocks with the 2350. Does the 2350 have enough dividers to generate both the incoming and the outgoing I2S clocks, or do you generate the outgoing (i.e. slower) I2S with PIO? Or do you use individual PIOs, each running a PIO assembly program, to generate the incoming and outgoing clocks from the 2350's PLL-derived master clock (you say 8x BCLK)?

Does the 2350 have an option for an external precise clock (e.g. the standard audio MCLK of 24.576MHz), to avoid any PLL altogether? Does it allow switching between clocks (e.g. to allow switching between 48kHz and 44.1kHz clocks)?

Thanks a lot!
 
A single PIO state machine generates all the clock signals (MCLK, BCLK, LRCLK) for the incoming and for the outgoing I2S signals. Notice that the two outgoing I2S signals share the same clock, so it's just a matter of outputting 6 clock signals (notice also that the RPi does not need a master clock, and the MAX98357s I'm using for testing don't need it either, but the MA12070 wants it so... it's not like I'm paying for it).
When I realized that I could offload all the clocks to a single SM and use the others for signal management, instead of trying to intermix clock and signal management, it all became much easier. And of course you can only do this if you're the clock master!
About clock "precision"... at this stage I'm just using a standard Pico clone board, so the PIOs' clock is derived from the sysclk of the board. And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on. I really don't care about 44.1 and its multiples but, yes, in the future the idea is to switch to a custom PCB, and there I could use an "audio" xtal. Notice that these small Pico boards also support external clock generators / external xtals, so I assume you could "run" them with an audio xtal.
BTW: I'm using the 2040, I do also have a couple of 2350s but for this stuff I did not need them.
Then again: I'm not thinking of this as a general solution. I just support a single fixed incoming signal rate, with a single supported signal format (S32_LE), and I'm producing two stereo outgoing signals at half that rate. Adding support for dynamic rate switching, more formats, and so on would be relatively easy, but that's not my use case.
 
I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but extending 8ch to 16ch or even to 32ch. There are very very few USB UAC2 -> 16ch I2S bridges, and basically none USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.
Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers this stuff (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling. I'm sure it can be done, not trivial but feasible.
 
I think you'd probably want the Pico to be in slave mode, not the Pi. I do this for something else and have one core of the Pico doing the work, or in your case feeding the I2S to the DACs. The other core of the Pico receives data as an SPI slave to the Pi. The code I posted above is the SPI slave PIO code. The Pico also signals to the Pi via a GPIO that it is ready for data, which triggers the Pi to send. One thing I have noted is that it is difficult to send more than 16 bytes in a transaction, though I don't remember why anymore. Another issue I think I ran into was that the Pi wants to toggle CS for every byte, or word, I can't remember which, and that slows things down even more. So what I did was add the overlay on the Pi that moves CS to an unrouted GPIO and then bit-bang CS. So a transfer is: CS low via GPIO, transfer 16 bytes, CS high via GPIO.
 
When I realized that I could offload all the clocks to a single SM and use the others for signal management, instead of trying to intermix clock and signal management, it all became much easier. And of course you can only do this if you're the clock master!
IIUC generating the clocks (i.e. handling synchronous I2S) is the key enabler of your design. I thought about using PIOs for the deserialization of asynchronous I2S and it would require either external clock dividers (PIO's instructions handling only the data lines), or produce a high level of jitter on the clock lines (since clocked by async clock). I got stuck there. Your method solves this.

And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on.
12MHz/48kHz/64bit = 3.90625 - that requires fractional dividers IIUC. These are (non-randomly) jittery by design. But replacing the clock with a different one which needs only an integer divider is easy, that's true.
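To make the divider arithmetic concrete, here is a small sketch that checks which source clocks give an integer BCLK divider. The function name and the list of candidate frequencies are just my illustration, and it assumes the divider is applied directly to the source clock with 64 bits per stereo frame, as in the calculation above.

```python
from fractions import Fraction

def bclk_divider(source_hz, sample_rate_hz, bits_per_frame=64):
    """Exact ratio between a source clock and the I2S bit clock (BCLK)."""
    return Fraction(source_hz, sample_rate_hz * bits_per_frame)

# Candidate source clocks: the 12 MHz board xtal and two audio-rate clocks.
for source_hz in (12_000_000, 12_288_000, 24_576_000):
    ratio = bclk_divider(source_hz, 48_000)
    kind = "integer" if ratio.denominator == 1 else "fractional"
    print(f"{source_hz} Hz -> divider {float(ratio)} ({kind})")
```

With these inputs the 12MHz case is fractional, while 12.288MHz and 24.576MHz divide evenly.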

Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers this stuff (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling.
I am afraid that would ruin the low and, especially, fixed deterministic latency a good I2S deserializer should have. IMO the final I2S output must run synchronously to the alsa interface, to allow proper timing in the apps upstream of the interface. Also, good adaptive resampling of the many channels I am considering would put a huge load on the CPU. The bridges often have to resample adaptively already (incoming master clock); another resampling stage would be too much, IMO. It would also preclude running on low-power ARM SoCs that can be powered directly from the USB bus (like the Radxa Pi-S core, which already has 16ch I2S).

I would probably prefer to stay with I2S only (plus e.g. I2C for controlling the deserializer, of course).

Also I thought of using PIOs only, no DMA, again to keep the timing exact.

BTW - how did you solve "marking" the first channel in the RPi I2S, so that the PIO deserializes to the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher samplerates. I thought of "misusing" the LSBs of one channel of the serialized I2S - for 32bit length the LSB is way below audible. I think it's good to keep everything serial, fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.)

Do you consider releasing your code?
 
For some reason the "quote" button does not appear in your message for me. Does anybody know why? Oh well, manual quoting, I guess...
12MHz/48kHz/64bit = 3.90625
That shows I can't do math. But yes, I really didn't dig into that, the idea was to look at it for the "PCB version".
I am afraid that would ruin the low and especially fixed deterministic latency a good I2S deserializer should have.
OK, I need to know what your aim is, otherwise I cannot understand what your problem is.
I assume you are performing some kind of "live processing", otherwise you should not care about the "deterministic latency" of the outgoing I2S. As long as the output clock is stable and as long as the application can refill the output buffer fast enough, you will be fine.
However, I can see issues if you have a setup like:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> I2S_OUT
In this case, if the IN rate and the OUT rate are not exactly in sync, sooner or later you will incur a buffer overrun at the input (when the output rate is lower) or a buffer underrun at the output (when the output rate is higher).
To avoid that, the most straightforward solution is for the PC to be the clock master for both I2S_IN and I2S_OUT, or to use an external clock provider and set all devices as clock slaves.
Notice that I have never actually programmed a system of this kind; it's all based on my assumptions about how this stuff should work, yet I would be surprised to learn that things are different.
The solution we're discussing is:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> SPI -> PICO_BUFFER -> I2S_OUT
As you can see there is not a requirement on the SPI to be synchronous, as long as the average rate of the SPI signal prevents the PICO_BUFFER from overrunning or underrunning (are these even words?).
So the problem is not in the SPI part, the problem, again, is having I2S_IN and I2S_OUT running at the very same rate.
If you can use the PICO as the clock provider this will not be a problem. If you can't... well I have a couple of solutions in mind.
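To put rough numbers on the "sooner or later", here is a back-of-the-envelope sketch. The function name, the half-full starting point and the example figures are my own assumptions, not measurements from any real setup.

```python
def seconds_until_xrun(buffer_frames, rate_hz, mismatch_ppm, fill=0.5):
    """Time until a buffer overruns or underruns under a constant rate mismatch.

    mismatch_ppm > 0: the producer is faster, so the buffer fills up (overrun).
    mismatch_ppm < 0: the consumer is faster, so the buffer drains (underrun).
    fill is the assumed initial fill level (0.5 = half full).
    """
    drift_frames_per_s = rate_hz * mismatch_ppm * 1e-6
    if drift_frames_per_s == 0:
        return float("inf")  # identical clocks never drift apart
    if drift_frames_per_s > 0:
        headroom = buffer_frames * (1 - fill)
    else:
        headroom = buffer_frames * fill
    return headroom / abs(drift_frames_per_s)

# Example: 1024-frame buffer at 48 kHz, clocks 100 ppm apart.
print(round(seconds_until_xrun(1024, 48_000, 100), 1), "seconds")
```

The point is only that any non-zero mismatch gives a finite time, whatever the buffer size; a bigger buffer just postpones the xrun and adds latency.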
Do you consider releasing your code?
If I ever find the time to polish it in a reasonable way... and that could very well be never, but who knows...
 
BTW - how did you solve "marking" the first channel in the RPi I2S, so that the PIO deserializes to the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher samplerates. I thought of "misusing" the LSBs of one channel of the serialized I2S - for 32bit length the LSB is way below audible. I think it's good to keep everything serial, fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.)
Yes, the plan is to use the LSB as an in-band control signal. Notice that the only problem I have is that the input signal is:
L1 R1 L2 R2 L1 R1 L2 R2...
and I can distinguish L from R (given the LRCLK) but I cannot distinguish L1 from L2. The idea is to set the LSB of L1 to zero and the LSB of L2 to 1 (or whatever) and I will be fine.
At this point the implemented solution is even simpler: the Pico initially waits for a non-zero L frame, assumes that's a control frame, ignores it, and starts processing from the next frame (which will be an L1). On the PC side I just make sure to add a 0x00000001,0x00000000 sequence in front of the actual signal and I'm fine.
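For what it's worth, the LSB-marker idea (L1 gets LSB 0, L2 gets LSB 1) can be sketched on the host side like this. This is plain Python to illustrate the logic, not the PIO code; the function names are made up, and samples are assumed to be unsigned 32-bit words.

```python
MASK32 = 0xFFFFFFFF

def tag_stream(quads):
    """Serialize (L1, R1, L2, R2) tuples into stereo frames with an LSB marker.

    The LSB of every L1 is forced to 0 and the LSB of every L2 to 1.
    """
    frames = []
    for l1, r1, l2, r2 in quads:
        frames.append(((l1 & ~1) & MASK32, r1 & MASK32))
        frames.append(((l2 | 1) & MASK32, r2 & MASK32))
    return frames

def demux(frames):
    """Route each stereo frame to output 1 or output 2 by the LSB of L."""
    out1, out2 = [], []
    for left, right in frames:
        (out2 if left & 1 else out1).append((left, right))
    return out1, out2

frames = tag_stream([(10, 11, 20, 21), (30, 31, 40, 41)])
print(demux(frames))       # starting at the beginning of the stream
print(demux(frames[1:]))   # starting mid-stream, first frame lost
```

Because every frame carries its own marker, the receiver does not depend on catching the first frame of the stream.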
 
For some reason the "quote" button does not appear in your message for me.
The last post does not contain a quote button. However, it can be quoted by highlighting text with the mouse. Then a little local quote button appears which is what I used to quote your post.

Other than that, approximating an I2S clock frequency by switching between two frequencies to achieve the correct average frequency is asking for trouble in a dac. Clock jitter can be quite audible, and a deliberately jittered one has proven to be a bad idea.
 
The last post does not contain a quote button. However, it can be quoted by highlighting text with the mouse. Then a little local quote button appears which is what I used to quote your post.
Thanks for the hint, I'm obviously a newbie here.
Other than that, approximating an I2S clock frequency by switching between two frequencies to achieve the correct average frequency is asking for trouble in a dac. Clock jitter can be quite audible, and a deliberately jittered one has proven to be a bad idea.
I'm not getting who this applies to. I don't remember anybody here suggesting to switch between frequencies...
 
The solution we're discussing is:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> SPI -> PICO_BUFFER -> I2S_OUT
As you can see there is not a requirement on the SPI to be synchronous, as long as the average rate of the SPI signal prevents the PICO_BUFFER from overrunning or underrunning
That is true. But for low-latency applications the alsa device should not only run on average at the correct rate of the output clock; the period expirations should also be as regularly paced as possible, so that the buffers involved can be minimal. The SPI running at a different speed than the output clock would have to be gated by some feedback protocol from the MCU, yet the alsa device should be consuming samples at regular intervals. That means another buffer in the driver. Then there is another buffer in the MCU for DMA to the PIO emulating the I2S protocol. All these buffers and the feedback gating not only increase the latency, but also reduce the precision of the latency. E.g. video can be delayed to match the audio latency, but only if the latency is known in advance and reasonably stable between restarts.

CamillaDSP is a great example of an I/O buffer with very advanced async resampling. As you said, input and output clocks can differ. Async/adaptive resampling monitors the I/O rate ratio and adjusts the resampling ratio as needed. The resampling ratio should not change a lot, because any change in the ratio introduces minor but unremovable distortions to the stream. However, if the consumer/output rate deviates (although on average it's spot on), the feedback value will deviate too. And the slower the feedback, the larger the buffers must be to accommodate the deviation before the feedback fixes the buffer fill.

That's why I am looking for a way to use the I2S interface directly (only one buffer -> DMA -> short FIFO in the I2S interface), running as slave, with the perfectly deterministic PIO for deserializing the multiple-rate data line to the several final-rate data lines.

The idea is to set the LSB of L1 to zero and the LSB of L2 to 1 (or whatever) and I will be fine.
That was exactly what I thought too. It's corrupting the L's LSBs but acceptable at 32bit, IMO.
At this point the implemented solution is even simpler: the Pico initially waits for a non-zero L frame and it assumes that's a control frame, it ignores it and it starts the processing from the next frame (that will be an L1)
I would be a bit afraid that pausing or restarts, especially after xruns, can cause the first sample to be skipped/dropped. IMO the method should allow resyncing within the running stream, like all streaming protocols do (SPDIF - the side bits, RTP, AC3/DTS - chunks preceded by a header, etc...)
 
It's something that was done with the RPi GPIO bus in the past to approximate the I2S bus clock frequencies needed by dacs. Apparently the RPi clock was not integer divisible to produce the exact needed clock frequencies.
It's the only way for RPi4 and older to generate master I2S clocks. That's why I find it a shame that most RPi hats are I2S slaves instead of masters with a local clock, and many users resort to plain FIFO hacks instead of properly slaving the I2S transmitter to a precise clock in the first place.
 
That's why I am looking for a way to use the I2S interface directly (only one buffer -> DMA -> short FIFO in the I2S interface), running as slave, with the perfectly deterministic PIO for deserializing the multiple-rate data line to the several final-rate data lines.
I understand. Yet I would not ditch the solution a priori. If the in and out rates are indeed the same (which would be the case if using all devices as slaves) you can probably get away with very small buffers.
And in any case, the Pico solution just adds one more buffer to the chain (it's not like you have three more). If you can keep this buffer reasonably small, maybe you could get away with it.
Consider that most existing sound applications apply quite a lot of buffering (often just to "play safe") and most users do not ever realize that. But yes, if you cannot enforce the same clocks at in and out you also have the issues with resampling and so forth...
 
I would be a bit afraid that pausing or restarts, especially after xruns, can cause the first sample to be skipped/dropped. IMO the method should allow resyncing within the running stream, like all streaming protocols do (SPDIF - the side bits, RTP, AC3/DTS - chunks preceded by a header, etc...)
And that's the reason why I'm planning to adopt the other solution.
 
Consider that most existing sound applications apply quite a lot of buffering (often just to "play safe") and most users do not ever realize that.
Yes, but the massively multichannel model which I am interested in (as these are not readily available) is mostly for AV and/or advanced audio work where latency matters. That's why a solution I would be looking for (just considering for now 🙂 ) would have small and predictable latency as a major criterion. I would imagine a hat-sized module (not limited to RPi, but with the same input pinout for 8ch I2S, 1x I2C, and likely several GPIOs) with two selectable audio-frequency-compatible clocks and several RP2350s for extending the 8ch 384kHz I2S to either 16ch (192kHz output max) or 32ch (96kHz output max) (the PIO code would likely be similar). The I2S outputs would include an MCLK master line for the codecs, as (basically) all I2S interfaces outside the RPi realm do.
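The 16ch/192kHz and 32ch/96kHz figures follow from keeping the aggregate sample throughput of the 8ch 384kHz input constant. A trivial sketch of that trade-off (the function name is only for illustration):

```python
def per_channel_rate(in_channels, in_rate_hz, out_channels):
    """Max output sample rate when the total sample throughput is preserved."""
    total_samples_per_s = in_channels * in_rate_hz
    if total_samples_per_s % out_channels:
        raise ValueError("output channel count does not divide the throughput evenly")
    return total_samples_per_s // out_channels

for out_channels in (16, 32):
    print(out_channels, "ch ->", per_channel_rate(8, 384_000, out_channels), "Hz")
```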
 
I'm not getting who this applies to. I don't remember anybody here suggesting to switch between frequencies...
This is basically how fractional frequency dividers work. The division ratio is still integer, usually modulated by sigma delta. The clock edge positions can be compensated by dynamic delay lines to minimize the TIA. It is basically a noise shaping technique to move the noise to higher offset frequencies.
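A very simplified sketch of the principle, using a plain first-order accumulator rather than a real sigma-delta modulator with edge compensation. It is not how any particular chip implements it; the function name is made up, and the 125/32 ratio is just the 3.90625 from the 12MHz example earlier in the thread.

```python
from fractions import Fraction

def fractional_divider(ratio, n_periods):
    """Integer divisor used for each output period, averaging out to `ratio`.

    A first-order accumulator collects the fractional part; every time it
    overflows, one output period is stretched by one input clock cycle.
    """
    base = ratio.numerator // ratio.denominator
    frac = ratio - base
    acc = Fraction(0)
    divisors = []
    for _ in range(n_periods):
        acc += frac
        if acc >= 1:
            acc -= 1
            divisors.append(base + 1)
        else:
            divisors.append(base)
    return divisors

divisors = fractional_divider(Fraction(125, 32), 32)
print(divisors)
print("average:", sum(divisors) / len(divisors))
```

Each output period is either 3 or 4 input cycles long, so the average frequency is exact but the individual edges wander, which is the deterministic jitter mentioned above.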