High quality Raspberry Pi 24bit/384k I2S card

@m0rci: Congrats and hats off.

I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but for extending 8ch to 16ch or even to 32ch. There are very few USB UAC2 -> 16ch I2S bridges and basically no USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.

Please let me ask a few questions: IIUC you generate all your I2S clocks with the 2350. Does the 2350 have enough dividers to generate both the incoming and the outgoing I2S clocks, or do you generate the outgoing (i.e. slower) I2S clocks with PIO? Or do you use separate PIOs to generate the incoming and outgoing clocks with a PIO assembly program, running from the 2350's PLL-derived master clock (you say 8x BCLK)?

Does the 2350 have an option for an external precise clock (e.g. the standard audio MCLK of 24.576MHz), to avoid the PLL altogether? Does it allow switching between clocks (e.g. between 48kHz-family and 44.1kHz-family clocks)?

Thanks a lot!
 
A single PIO state machine generates all the clock signals (MCLK, BCLK, LRCLK) for both the incoming and the outgoing I2S signals. Notice that the two outgoing I2S signals share the same clocks, so it's just a matter of outputting 6 clock signals (notice also that the RPi does not need a master clock, and not even the MAX98357s I'm using for testing need it, but the MA12070 wants it, so... it's not like I'm paying for it).
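One way to picture why a single state machine can drive all six lines: if every clock ratio is a power of two, each clock is simply one bit of a free-running counter. The sketch below assumes MCLK = 256·fs, 32-bit slots (64 BCLKs per frame) and an outgoing rate of fs/2; these ratios and the names in `CLOCK_BITS` are hypothetical, and the post doesn't say how m0rci's PIO program is actually written. It only models the pin levels per tick and checks that each LRCLK toggles on a falling BCLK edge, as I2S expects.

```python
# Hypothetical ratios (not from the post): MCLK = 256*fs, 32-bit slots,
# outgoing rate = fs/2. The counter advances once per MCLK_in edge,
# i.e. at twice the MCLK_in frequency.
CLOCK_BITS = {
    "in_mclk": 0,    # 256 * fs
    "out_mclk": 1,   # 256 * fs/2
    "in_bclk": 2,    # 64 * fs   (2 x 32-bit slots per frame)
    "out_bclk": 3,   # 64 * fs/2
    "in_lrclk": 8,   # fs
    "out_lrclk": 9,  # fs/2
}


def pin_levels(tick: int) -> dict[str, int]:
    """Logic level of each of the six clock lines at a given counter tick."""
    return {name: (tick >> bit) & 1 for name, bit in CLOCK_BITS.items()}


def period_ticks(name: str) -> int:
    """Clock period in ticks: bit b toggles every 2**b ticks, so period 2**(b+1)."""
    return 2 << CLOCK_BITS[name]


def lrclk_changes_on_bclk_falling(prefix: str, ticks: int) -> bool:
    """True if <prefix>_lrclk only ever toggles when <prefix>_bclk goes 1 -> 0."""
    for t in range(1, ticks):
        prev, cur = pin_levels(t - 1), pin_levels(t)
        if prev[prefix + "_lrclk"] != cur[prefix + "_lrclk"]:
            if not (prev[prefix + "_bclk"] == 1 and cur[prefix + "_bclk"] == 0):
                return False
    return True


# 256 MCLK periods and 64 BCLK periods per input frame; output frames are twice as long.
print(period_ticks("in_lrclk") // period_ticks("in_mclk"))   # 256
print(period_ticks("out_lrclk") // period_ticks("in_lrclk"))  # 2
```

Because the counter wraps all lower bits from 1 to 0 at the moment a higher bit toggles, the alignment between the incoming and outgoing clocks comes for free in this model.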
When I realized that I could offload all the clocks to a single SM and use the others for signal management, instead of trying to intermix clock and signal management, it all became incredibly easier. And of course you can only do this if you're the clock master!
About clock "precision"... at this stage I'm just using a standard Pico clone board, so the PIOs' clock is derived from the board's sysclk. And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on. I really don't care about 44.1k and multiples but, yes, in the future the idea is to switch to a custom PCB, and there I could use an "audio" xtal. Notice that these small Pico boards also support external clock generators / external xtals, so I assume you could "run" them from an audio xtal.
BTW: I'm using the 2040. I also have a couple of 2350s, but for this stuff I did not need them.
Then again: I'm not thinking of this as a general solution. I support just a single fixed incoming rate, with a single supported sample format (S32_LE), and I'm producing two stereo outgoing signals at half that rate. Adding support for dynamic rate switching, more formats, and so on would be relatively easy, but that's not my use case.
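Reading the incoming stream as stereo S32_LE frames at the full rate, producing two stereo streams at half that rate is a plain de-interleave. The post doesn't say which incoming frames feed which output, so the even/odd mapping and the name `split_stream` below are assumptions for illustration only.

```python
import struct


def split_stream(data: bytes) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Split S32_LE stereo frames into two stereo streams at half the frame rate.

    Hypothetical mapping: even frames feed output A, odd frames feed output B.
    """
    if len(data) % 16:  # two 8-byte stereo frames per output frame pair
        raise ValueError("need a whole number of 16-byte frame pairs")
    frames = list(struct.iter_unpack("<ii", data))  # (left, right) as signed 32-bit
    return frames[0::2], frames[1::2]


a, b = split_stream(struct.pack("<8i", 1, 2, 3, 4, 5, 6, 7, 8))
print(a)  # [(1, 2), (5, 6)]
print(b)  # [(3, 4), (7, 8)]
```

Any fixed mapping works as long as the receiver knows where each group of frames starts, which is exactly the alignment question raised later in the thread.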
 
I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but for extending 8ch to 16ch or even to 32ch. There are very few USB UAC2 -> 16ch I2S bridges and basically no USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.
Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers it (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling. I'm sure it can be done; not trivial, but feasible.
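With several I2S data lines sharing one set of clocks, a PIO program can shift out one bit per data line on every BCLK, so the buffer has to be "bit-sliced" first. A minimal sketch, assuming 16 channels on 8 stereo data lines, channel 2k as the left slot and 2k+1 as the right slot of line k, and MSB-first 32-bit slots; `frame_to_pin_words` is a hypothetical helper, and the one-BCLK data delay of standard I2S is left out for clarity.

```python
def frame_to_pin_words(frame: list[int], lanes: int = 8, bits: int = 32) -> list[int]:
    """Bit-slice one frame of 2*lanes channels into one pin word per BCLK.

    Hypothetical layout: channel 2k = left slot of data line k, channel 2k+1 =
    right slot. Bit k of each returned word is the level of data line k.
    """
    if len(frame) != 2 * lanes:
        raise ValueError("frame must hold 2 * lanes samples")
    mask = (1 << bits) - 1
    words = []
    for tick in range(2 * bits):            # 64 BCLK periods per frame
        slot, bitpos = divmod(tick, bits)   # slot 0 = left, 1 = right
        shift = bits - 1 - bitpos           # MSB first
        word = 0
        for lane in range(lanes):
            sample = frame[2 * lane + slot] & mask  # two's-complement bit pattern
            word |= ((sample >> shift) & 1) << lane
        words.append(word)
    return words


words = frame_to_pin_words([-1] * 16)
print(len(words), hex(words[0]))  # 64 0xff
```

The per-bit loop only makes the mapping explicit; on the Pico the same transposition would likely be done with word-wide bit tricks so it keeps up with the sample rate.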
 
I think you'd probably want the Pico to be in slave mode, not the Pi. I do this for something else and have one core of the Pico doing the work, or in your case feeding the I2S to the DACs. Then the other core of the Pico receives data as an SPI slave to the Pi. The code I posted above is the spi slave pio code. The Pico also signals to the Pi via a GPIO that it is ready for data; that triggers the Pi to send. One thing I have noted is that it is difficult to send more than 16 bytes in a transaction, though. I don't remember why anymore. Another issue I think I ran into was that the Pi wants to toggle CS for every byte, or word, I can't remember anymore. That slows things down even more. So what I did is add an overlay on the Pi that moves CS to an unrouted GPIO, and then bit-bang CS. So a transfer is: CS low via GPIO, transfer 16 bytes, CS high via GPIO.
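The Pi-side loop described above (wait for the Pico's ready GPIO, CS low, at most 16 bytes, CS high) can be sketched without tying it to a particular library. The three callables are hypothetical hooks: on a real Pi, `xfer` might wrap py-spidev's `xfer2` and `set_cs` a GPIO write to the pin used for bit-banged CS, but that wiring is not shown. The post doesn't say whether the ready signal is checked per transaction or per larger block; this sketch assumes per transaction.

```python
from typing import Callable, Iterator

MAX_CHUNK = 16  # from the post: more than 16 bytes per transaction was problematic


def chunks(data: bytes, size: int = MAX_CHUNK) -> Iterator[bytes]:
    """Yield consecutive slices of at most `size` bytes."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def send_buffer(data: bytes,
                wait_ready: Callable[[], None],
                set_cs: Callable[[bool], None],
                xfer: Callable[[bytes], None]) -> int:
    """Send `data` as <=16-byte SPI transactions with manually driven CS.

    Hypothetical hooks: wait_ready blocks until the Pico's ready GPIO is
    asserted, set_cs drives the (active-low) CS GPIO, xfer clocks out bytes.
    Returns the number of transactions.
    """
    count = 0
    for chunk in chunks(data):
        wait_ready()   # Pico signals it can accept more data
        set_cs(False)  # CS low via plain GPIO
        xfer(chunk)    # one short transaction
        set_cs(True)   # CS high ends the transaction
        count += 1
    return count
```

Passing the hardware access in as callables keeps the framing logic testable on a desktop with fake hooks that just record what happens.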
 
When I realized that I could offload all the clocks to a single SM and use the others for signal management, instead of trying to intermix clock and signal management, it all became incredibly easier. And of course you can only do this if you're the clock master!
IIUC generating the clocks (i.e. running synchronous I2S) is the key enabler of your design. I thought about using PIOs for deserializing asynchronous I2S, and it would either require external clock dividers (with the PIO instructions handling only the data lines), or produce a high level of jitter on the clock lines (since the PIO is clocked by a clock asynchronous to the incoming I2S). I got stuck there. Your method solves this.
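A rough way to see the jitter problem: a PIO clocked by its own system clock can only notice an external clock edge on one of its own cycles, so every regenerated edge lands up to one system clock period late, by an amount that varies from edge to edge. The sketch below assumes a 125 MHz system clock and a 3.072 MHz BCLK (48 kHz × 64 bits), and ignores any fixed input-synchronizer delay, since a constant delay doesn't change the spread.

```python
import math
from fractions import Fraction


def edge_errors_ns(bclk_hz: int, sys_hz: int, edges: int) -> list[Fraction]:
    """Delay from each ideal external BCLK edge to the first sysclk edge at or after it."""
    t_bclk = Fraction(10**9, bclk_hz)  # ns per BCLK period
    t_sys = Fraction(10**9, sys_hz)    # ns per system clock cycle
    errors = []
    for k in range(edges):
        t = k * t_bclk                         # ideal edge time
        seen = math.ceil(t / t_sys) * t_sys    # first sysclk edge that can see it
        errors.append(seen - t)
    return errors


errs = edge_errors_ns(3_072_000, 125_000_000, 1000)
print(float(max(errs) - min(errs)))  # peak-to-peak just under 8 ns (one sysclk period)
```

With these two unrelated frequencies the error sweeps almost the whole 0 to 8 ns range, which is why regenerating clocks from an asynchronous input is hard without external dividers.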

And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on.
12MHz / (48kHz × 64 bits) = 3.90625, which IIUC requires a fractional divider. Fractional dividers are (non-randomly) jittery by design. But replacing the clock with a different one which needs only an integer divider is easy, that's true.
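The arithmetic above can be checked against a 16.8 fixed-point divider (16 integer bits, 8 fractional bits, as in the RP2040 PIO clock divider). The sketch divides 12 MHz straight down to BCLK, matching the post's simplification rather than a real sysclk setup, and `fractional_periods` is a simple first-order accumulator model that may not reproduce the exact hardware sequence. It shows that 3.90625 is exactly representable (3 + 232/256), so the average rate is exact, while individual periods alternate between 3 and 4 source cycles.

```python
from fractions import Fraction


def pio_divider_16_8(src_hz: int, target_hz: int) -> tuple[int, int, bool]:
    """Return (integer part, 8-bit fraction, exact?) for src_hz / target_hz."""
    scaled = Fraction(src_hz, target_hz) * 256   # 8 fractional bits
    rounded = round(scaled)                      # nearest representable divider
    int_part, frac_part = divmod(rounded, 256)
    return int_part, frac_part, scaled == rounded


def fractional_periods(int_part: int, frac_part: int, n: int) -> list[int]:
    """First-order accumulator model: add one extra cycle on each overflow."""
    acc = 0
    periods = []
    for _ in range(n):
        acc += frac_part
        if acc >= 256:
            acc -= 256
            periods.append(int_part + 1)
        else:
            periods.append(int_part)
    return periods


print(pio_divider_16_8(12_000_000, 48_000 * 64))  # (3, 232, True)
p = fractional_periods(3, 232, 256)
print(sorted(set(p)), sum(p))                     # [3, 4] 1000
print(pio_divider_16_8(24_576_000, 48_000 * 64))  # (8, 0, True): integer only
```

The last line is the "different clock" case: with a 24.576 MHz audio clock the BCLK divider is an integer and every period is identical.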

Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers it (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling.
I am afraid that would ruin the low and, especially, fixed deterministic latency a good I2S deserializer should have. IMO the final I2S output must run synchronously to the ALSA interface, to allow proper timing in the apps upstream of the interface. Also, good adaptive resampling of the many channels I have in mind would take a huge CPU load. The bridges often have to resample adaptively already (incoming master clock); another resampling stage would be too much, IMO. It would also preclude running on low-power ARM SoCs that can be powered directly from the USB bus (like the Radxa Pi-S core, which already has 16ch I2S).

I would probably prefer to stay with I2S only (plus e.g. I2C for controlling the deserializer, of course).

Also I thought of using PIOs only, no DMA, again to keep the timing exact.

BTW, how did you solve "marking" the first channel in the RPi I2S stream, so that the PIO deserializes into the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher sample rates. I thought of "misusing" the LSB of one channel of the serialized I2S: for a 32-bit sample length the LSB is way below audibility. I think it is good to keep everything serial; fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.).

Would you consider releasing your code?