High quality Raspberry Pi 24bit/384k I2S card

@m0rci: Congrats and hats off.

I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but extending 8ch to 16ch or even to 32ch. There are very very few USB UAC2 -> 16ch I2S bridges, and basically none USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.

Please allow me a few questions: IIUC you generate all your I2S clocks with the 2350. Does the 2350 have enough dividers to generate both the incoming and the outgoing I2S clocks, or do you generate the outgoing (i.e. slower) I2S with PIO? Or do you use individual PIOs, each running a PIO assembly program, to generate the incoming and outgoing clocks from the 2350's PLL-derived master clock (you say 8x BCLK)?

Does the 2350 have an option for a precise external clock (e.g. the standard audio MCLK of 24.576MHz), to avoid any PLL altogether? Does it allow switching between clocks (e.g. to allow switching between 48kHz and 44.1kHz clocks)?

Thanks a lot!
 
A single PIO state machine generates all the clock signals (MCLK, BCLK, LRCLK) for the incoming and for the outgoing I2S signals. Notice that the two outgoing I2S signals share the same clocks, so it's just a matter of outputting 6 clock signals (notice also that the RPi does not need a master clock, and neither do the MAX98357s I'm using for testing, but the MA12070 wants it so... it's not like I'm paying for it).
When I realized that I could offload all the clocks to a single SM and use the others for signal management, instead of trying to intermix clock and signal management, it all became much easier. And of course you can only do this if you're the clock master!
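One way to picture a single state machine driving every clock line: if all the clocks are power-of-two divisions of one base tick, they are simply different bits of a free-running counter. The sketch below is a plain Python simulation, not PIO code, and not necessarily how the actual Pico program is written. The bit assignments, the 64 BCLK cycles per frame, and the omission of MCLK are all assumptions made for illustration.

```python
# Simulation only: derive I2S clocks for a fast input bus and a
# half-rate output bus from one free-running counter.
# Assumed layout (hypothetical): 64 BCLK cycles per LRCLK period.

IN_BCLK_BIT = 0    # toggles every tick
OUT_BCLK_BIT = 1   # half the input bit clock
IN_LRCLK_BIT = 6   # IN_BCLK / 64
OUT_LRCLK_BIT = 7  # OUT_BCLK / 64


def clock_levels(tick):
    """Return the four clock levels (0 or 1) at a given counter value."""
    return {
        "in_bclk": (tick >> IN_BCLK_BIT) & 1,
        "out_bclk": (tick >> OUT_BCLK_BIT) & 1,
        "in_lrclk": (tick >> IN_LRCLK_BIT) & 1,
        "out_lrclk": (tick >> OUT_LRCLK_BIT) & 1,
    }


def count_rising_edges(name, ticks):
    """Count 0 -> 1 transitions of one clock over a number of ticks."""
    edges = 0
    previous = clock_levels(0)[name]
    for tick in range(1, ticks + 1):
        level = clock_levels(tick)[name]
        if previous == 0 and level == 1:
            edges += 1
        previous = level
    return edges
```

With this layout each LRCLK changes on the same tick as a falling edge of its own BCLK, and the output clocks run at exactly half the input rate without any extra bookkeeping.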
About clock "precision"... at this stage I'm just using a standard Pico clone board, so the PIOs' clock is derived from the sysclk of the board. And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on. I really don't care about 44.1 and multiples but, yes, in the future, the idea is to switch to a custom PCB, and there I could use an "audio" xtal. Notice that these small Pico boards also support external clock generators / external xtals, so I assume you could "run" them with an audio xtal.
BTW: I'm using the 2040, I do also have a couple of 2350s but for this stuff I did not need them.
Then again: I'm not thinking of this as a general solution. I just support a single fixed incoming signal rate, with a single supported signal format (S32_LE), and I'm producing two stereo outgoing signals at half that rate. Adding support for dynamic rate switching, more formats, and so on would be relatively easy, but that's not my use case.
 
I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but extending 8ch to 16ch or even to 32ch. There are very very few USB UAC2 -> 16ch I2S bridges, and basically none USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.
Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers it (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling. I'm sure it can be done, not trivial but feasible.
 
I think you'd probably want the Pico to be in slave mode, not the Pi. I do this for something else and have one core of the Pico doing the work, or in your case feeding the I2S to the DACs. The other core in the Pico then receives data as an SPI slave to the Pi. The code I posted above is the SPI slave PIO code. The Pico also signals to the Pi via a GPIO that it is ready for data. That triggers the Pi to send. One thing I have noted is that it is difficult to send more than 16 bytes in a transaction, though. I don't remember why anymore. Another issue I think I ran into was that the Pi wants to toggle CS for every byte, or word, I can't remember anymore. That slows things down even more. So what I did is add the overlay on the Pi that moves CS to an unrouted GPIO and then use bit banging for CS. So a transfer is CS low via GPIO, transfer 16 bytes, CS high via GPIO.
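The handshake described above (wait for the Pico's ready line, pull CS low by hand, send at most 16 bytes, release CS) can be sketched as below. The callables `wait_ready`, `set_cs` and `transfer` are hypothetical placeholders for whatever GPIO and SPI library is in use; they are injected so the chunking logic can be exercised without hardware.

```python
CHUNK_SIZE = 16  # bytes per transaction, the limit mentioned above


def send_in_chunks(data, wait_ready, set_cs, transfer, chunk_size=CHUNK_SIZE):
    """Send data as a series of small, manually framed SPI transactions.

    wait_ready() blocks until the slave signals it can accept data.
    set_cs(level) drives the bit-banged chip-select GPIO (0 = active).
    transfer(chunk) clocks out one chunk of bytes.
    All three are hypothetical hooks supplied by the caller.
    Returns the number of transactions performed.
    """
    transactions = 0
    for start in range(0, len(data), chunk_size):
        chunk = bytes(data[start:start + chunk_size])
        wait_ready()
        set_cs(0)
        try:
            transfer(chunk)
        finally:
            set_cs(1)  # always release CS, even if the transfer fails
        transactions += 1
    return transactions
```

Passing the hardware access in as functions keeps the framing logic separate from any particular SPI or GPIO binding, which is an illustration choice rather than a description of the setup above.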
 
When I realized that I could offload all the clocks to a single SM and use the others for signal management, instead of trying to intermix clock and signal management, it all became much easier. And of course you can only do this if you're the clock master!
IIUC generating the clocks (i.e. handling synchronous I2S) is the key enabler of your design. I thought about using PIOs for the deserialization of asynchronous I2S, and it would either require external clock dividers (with the PIO instructions handling only the data lines) or produce a high level of jitter on the clock lines (since they would be clocked by an asynchronous clock). I got stuck there. Your method solves this.

And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on.
12MHz/48kHz/64bit = 3.90625 - that requires fractional dividers IIUC. These are (non-randomly) jittery by design. But replacing the clock with a different one which needs only an integer divider is easy, that's true.
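The arithmetic behind that figure, as a small sketch. It assumes 64 BCLK cycles per frame (two 32-bit slots) and compares the 12 MHz crystal with the 24.576 MHz audio clock mentioned earlier in the thread; the function name is made up for the example.

```python
from fractions import Fraction

BITS_PER_FRAME = 64  # assumed: 2 slots x 32 bits


def bclk_divider(source_hz, sample_rate_hz, bits_per_frame=BITS_PER_FRAME):
    """Exact ratio between a source clock and the required BCLK."""
    return Fraction(source_hz, sample_rate_hz * bits_per_frame)


for source_hz in (12_000_000, 24_576_000):
    divider = bclk_divider(source_hz, 48_000)
    kind = "integer" if divider.denominator == 1 else "fractional"
    print(f"{source_hz} Hz -> divider {float(divider)} ({kind})")
```

Under these assumptions the 12 MHz source needs a divider of 125/32 = 3.90625, while 24.576 MHz divides down by exactly 8.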

Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers it (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling.
I am afraid that would ruin the low and, especially, fixed deterministic latency a good I2S deserializer should have. IMO the final I2S output must run synchronously to the ALSA interface, to allow proper timing in the apps in front of the interface. Also, good adaptive resampling of the many channels I have in mind would impose a huge CPU load. The bridges often have to adaptively resample already (incoming master clock); another resampling stage would be too much, IMO. It would also preclude running on low-power ARM SoCs that can be powered directly from the USB bus (like the Radxa Pi-S core, which already has 16ch I2S).

I would probably prefer to stay with I2S only (plus e.g. I2C for controlling the deserializer, of course).

Also I thought of using PIOs only, no DMA, again to keep the timing exact.

BTW - how did you solve "marking" the first channel in the RPi I2S, so that the PIO deserializes to the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher samplerates. I thought of "misusing" the LSBs of one channel of the serialized I2S - for 32bit length the LSB is way below audible. I think it's good to keep everything serial; fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.)

Do you consider releasing your code?
 
For some reason the "quote" button does not appear in your message for me. Does anybody know why? Oh well, manual quoting, I guess...
12MHz/48kHz/64bit = 3.90625
That shows I can't do math. But yes, I really didn't dig into that, the idea was to look at it for the "PCB version".
I am afraid that would ruin the low and especially fixed deterministic latency a good I2S deserializer should have.
OK, I need to know what your aim is, otherwise I cannot understand what your problem is.
I assume you are performing some kind of "live processing", otherwise you should not care about the "deterministic latency" of the outgoing I2S. As long as the output clock is stable and as long as the application can refill the output buffer fast enough, you will be fine.
However, I can see issues if you have a setup like:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> I2S_OUT
In this case, if the IN rate and the OUT rate are not exactly in sync, sooner or later you will run into a buffer overrun at the input (when the output rate is lower) or a buffer underrun at the output (when the output rate is higher).
To avoid that, the most straightforward solution is for the PC to be the clock master for both I2S_IN and I2S_OUT, or to use an external clock provider and set all devices as clock slaves.
Notice that I have never actually programmed a system of this kind; it's all based on my assumptions about how this stuff should work, yet I would be surprised to learn that things are different.
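To put a rough number on "sooner or later": the sketch below estimates how long a buffer lasts when the two clocks differ by a few parts per million. The buffer size, starting fill level and ppm figure are illustrative assumptions, not measurements from any setup in this thread.

```python
def seconds_until_xrun(buffer_frames, start_fill, sample_rate_hz, mismatch_ppm):
    """Time until a buffer between two free-running clocks over/underruns.

    mismatch_ppm > 0 means the writer is faster than the reader
    (the buffer fills up); < 0 means the reader is faster (it drains).
    Returns float('inf') when the clocks match exactly.
    """
    drift_frames_per_s = sample_rate_hz * mismatch_ppm / 1_000_000
    if drift_frames_per_s == 0:
        return float("inf")
    if drift_frames_per_s > 0:
        headroom = buffer_frames - start_fill  # frames left before overrun
    else:
        headroom = start_fill                  # frames left before underrun
    return headroom / abs(drift_frames_per_s)


# Example: 4096-frame buffer, half full, 48 kHz, clocks 50 ppm apart.
print(seconds_until_xrun(4096, 2048, 48_000, 50))
```

With these made-up numbers the drift is 2.4 frames per second, so the buffer lasts roughly 14 minutes before it overruns.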
The solution we're discussing is:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> SPI -> PICO_BUFFER -> I2S_OUT
As you can see, there is no requirement for the SPI to be synchronous, as long as the average rate of the SPI signal prevents the PICO_BUFFER from overrunning or underrunning (are these even words?).
So the problem is not in the SPI part, the problem, again, is having I2S_IN and I2S_OUT running at the very same rate.
If you can use the PICO as the clock provider this will not be a problem. If you can't... well I have a couple of solutions in mind.
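The point about the average SPI rate can be checked with a toy simulation: bursty SPI writes on one side, a steady I2S drain on the other, and a "ready" condition that refuses a burst when it would not fit. All numbers and names below are assumptions chosen for illustration; this models the idea, not any real firmware.

```python
BURST = 16  # bytes per SPI transaction, as in the scheme described earlier


def simulate(capacity, drain_per_step, bursts_per_step, steps, start_level):
    """Simulate PICO_BUFFER fed by bursty SPI and drained by steady I2S.

    Each step the SPI master offers up to bursts_per_step bursts, but a
    burst is only accepted while there is room for it (the "ready" line).
    Then the I2S side removes drain_per_step bytes.
    Returns (min_level, max_level, underruns).
    """
    level = start_level
    min_level = max_level = level
    underruns = 0
    for _ in range(steps):
        for _ in range(bursts_per_step):
            if level + BURST <= capacity:  # ready asserted
                level += BURST
        max_level = max(max_level, level)
        if level < drain_per_step:
            underruns += 1
            level = 0
        else:
            level -= drain_per_step
        min_level = min(min_level, level)
    return min_level, max_level, underruns
```

For example, `simulate(256, 24, 2, 1000, 128)` offers 32 bytes per step against a 24-byte drain and never underruns, while `simulate(256, 24, 1, 1000, 128)` offers only 16 bytes per step and eventually runs dry.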
Do you consider releasing your code?
If I ever find the time to polish it in a reasonable way... and that could very well be never, but who knows...
 
BTW - how did you solve "marking" the first channel in the RPi I2S, so that the PIO deserializes to the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher samplerates. I thought of "misusing" the LSBs of one channel of the serialized I2S - for 32bit length the LSB is way below audible. I think it's good to keep everything serial; fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.)
Yes, the plan is to use the LSB as an in-band control signal. Notice that the only problem I have is that the input signal is:
L1 R1 L2 R2 L1 R1 L2 R2...
and I can distinguish L from R (given the LRCLK) but I cannot distinguish L1 from L2. The idea is to set the LSB of L1 to zero and the LSB of L2 to 1 (or whatever) and I will be fine.
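A minimal model of that LSB tagging, assuming the samples are handled as raw 32-bit words and that frames arrive in the bus order shown above. The function names are invented for the example.

```python
def tag_left_samples(frames):
    """Tag the L words of an interleaved L1 R1 L2 R2 ... stream.

    frames is a list of (left, right) pairs of raw 32-bit words
    (0 .. 0xFFFFFFFF) in bus order, so even-indexed frames belong to
    stream 1 and odd-indexed frames to stream 2.
    The LSB of each left word is forced to 0 for stream 1, 1 for stream 2.
    """
    tagged = []
    for index, (left, right) in enumerate(frames):
        stream_bit = index & 1
        left = ((left & ~1) | stream_bit) & 0xFFFFFFFF
        tagged.append((left, right))
    return tagged


def stream_of(left_word):
    """Receiver side: which stream (1 or 2) does this left word belong to?"""
    return 1 + (left_word & 1)
```

Only the left words are touched, and only in their lowest bit; the right words pass through unchanged because LRCLK already separates L from R.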
At this point the implemented solution is even simpler: the Pico initially waits for a non-zero L frame and assumes that's a control frame; it ignores it and starts processing from the next frame (which will be an L1). On the PC side I just make sure to add a 0x000001,0x00000000 sequence in front of the actual signal and I'm fine.
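The control-frame approach can be modelled in a few lines. This is a Python sketch of the logic as described, not the Pico code (which has not been released); it assumes the bus carries all-zero frames before the marker and that frames are (left, right) pairs of 32-bit words. The function names are hypothetical.

```python
CONTROL_FRAME = (0x00000001, 0x00000000)  # marker frame described above


def add_control_frame(frames):
    """Sender side: put the marker frame in front of the real signal."""
    return [CONTROL_FRAME] + list(frames)


def split_streams(bus_frames):
    """Receiver side: sync on the first non-zero L word, then demultiplex.

    Frames before the marker (bus idle, all zero) are skipped, the marker
    itself is dropped, and the following frames alternate between
    stream 1 and stream 2, starting with stream 1.
    """
    frames = iter(bus_frames)
    for left, _right in frames:
        if left != 0:
            break  # marker found; it is discarded
    else:
        return [], []  # never synchronised
    stream1, stream2 = [], []
    for index, frame in enumerate(frames):
        (stream1 if index % 2 == 0 else stream2).append(frame)
    return stream1, stream2
```

Once synchronised, the receiver only has to count frames, so the audio data itself is left untouched, unlike with per-sample LSB tagging.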
 
For some reason the "quote" button does not appear in your message for me.
The last post does not contain a quote button. However, it can be quoted by highlighting text with the mouse. Then a little local quote button appears which is what I used to quote your post.

Other than that, approximating an I2S clock frequency by switching between two frequencies to achieve the correct average frequency is asking for trouble in a DAC. Clock jitter can be quite audible, and a deliberately jittered clock has proven to be a bad idea.
 
The last post does not contain a quote button. However, it can be quoted by highlighting text with the mouse. Then a little local quote button appears which is what I used to quote your post.
Thanks for the hint, I'm obviously a newbie here.
Other than that, approximating an I2S clock frequency by switching between two frequencies to achieve the correct average frequency is asking for trouble in a DAC. Clock jitter can be quite audible, and a deliberately jittered clock has proven to be a bad idea.
I'm not getting who this applies to. I don't remember anybody here suggesting to switch between frequencies...