High quality Raspberry Pi 24bit/384k I2S card

phofman · 2025-05-13 6:23 pm

@m0rci: Congrats and hats off.

I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but for extending 8ch to 16ch or even to 32ch. There are very very few USB UAC2 -> 16ch I2S bridges, and basically no USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.

Please allow me a few questions: IIUC you generate all your I2S clocks with the 2350. Does the 2350 have enough dividers to generate both the incoming and the outgoing I2S clocks, or do you generate the outgoing (i.e. slower) I2S with PIO? Or do you use individual PIOs for generating the incoming and outgoing clocks with a PIO assembly program, using the 2350's PLL-derived master clock (you say 8x BCLK)?

Does the 2350 have an option for an external precise clock (e.g. a standard audio MCLK of 24.576MHz), to avoid any PLL altogether? Does it allow switching between clocks (e.g. to allow switching between 48kHz and 44.1kHz clocks)?

Thanks a lot!

m0rci · 2025-05-13 7:59 pm

A single PIO state machine generates all the clock signals (MCLK, BCLK, LRCLK) for the incoming and for the outgoing I2S signals. Notice that the two outgoing I2S signals share the same clock, so it's just a matter of outputting 6 clock signals (notice also that the RPi does not need a master clock, and neither do the MAX98357s I'm using for testing, but the MA12070 wants it so... it's not like I'm paying for it).
When I realized that I could offload all the clocks to a single SM and use the others for signal management instead of trying to intermix clock and signal management, it all became so much easier. And of course you can only do this if you're the clock master!
About clock "precision"... at this stage I'm just using a standard Pico clone board, so the PIOs' clock is derived from the sysclk of the board. And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on. I really don't care about 44.1 and multiples but, yes, in the future, the idea is to switch to a custom PCB and there I could use an "audio" xtal. Notice that these small Pico boards also support external clock generators / external xtals, so I assume you could "run" them with an audio xtal.
BTW: I'm using the 2040, I do also have a couple of 2350s but for this stuff I did not need them.
Then again: I'm not thinking of this as a general solution. I just support a single fixed incoming signal rate, with a single supported signal format (S32_LE), and I'm producing two stereo outgoing signals at half that rate. Adding support for dynamic rate switching, more formats, and so on would be relatively easy, but that's not my use case.

m0rci · 2025-05-13 8:22 pm

phofman said:
I have been thinking about a similar project, just not for extending stereo I2S (because 4ch/8ch I2S are already easily available), but for extending 8ch to 16ch or even to 32ch. There are very very few USB UAC2 -> 16ch I2S bridges, and basically no USB UAC2 -> 32ch I2S bridges, while the USB gadget can handle that data flow easily (it could run 50+ 48kHz channels with just a minor modification). But the principle is identical, no matter how many channels.

Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers this stuff (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling. I'm sure it can be done, not trivial but feasible.

mikeAtx · 2025-05-13 10:09 pm

I think you'd probably want the Pico to be in slave mode, not the Pi. I do this for something else and have one core of the Pico doing the work, or in your case feeding the I2S to the DACs. The other core of the Pico receives data as an SPI slave to the Pi. The code I posted above is the SPI slave PIO code. The Pico also signals to the Pi via a GPIO that it is ready for data, which triggers the Pi to send. One thing I have noted is that it is difficult to send more than 16 bytes in a transaction, though. I don't remember why anymore. Another issue I think I ran into was that the Pi wants to toggle CS for every byte, or word, I can't remember anymore, which slows things down even more. So what I did is add the overlay on the Pi that moves CS to an unrouted GPIO and then bit-bang CS. So a transfer is: CS low via GPIO, transfer 16 bytes, CS high via GPIO.

phofman · 2025-05-14 7:43 am

m0rci said:
When I realized that I could offload all the clocks to a single SM and use the others for signal management instead of trying to intermix clock and signal management, it all became so much easier. And of course you can only do this if you're the clock master!

IIUC generating the clocks (i.e. handling synchronous I2S) is the key enabler of your design. I thought about using PIOs for the deserialization of asynchronous I2S, but it would either require external clock dividers (with the PIO instructions handling only the data lines) or produce a high level of jitter on the clock lines (since they would be clocked by an async clock). I got stuck there. Your method solves this.

m0rci said:
And since there is a 12MHz xtal on the board, all the 48k, 96k, etc... rates are spot on.

12MHz/48kHz/64bit = 3.90625 - that requires fractional dividers IIUC. These are (non-randomly) jittery by design. But replacing the clock with a different one which needs only an integer divider is easy, that's true.
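To make that arithmetic concrete, here is a minimal Python sketch (not from anyone's project; the function name `bclk_divider` and the 64 bits per stereo frame are assumptions taken from the calculation above). It computes the exact ratio between a reference clock and the bit clock, which shows at a glance whether an integer divider is enough:

```python
from fractions import Fraction

def bclk_divider(ref_hz, fs_hz, bits_per_frame=64):
    """Exact ratio ref_clock / BCLK, where BCLK = fs * bits_per_frame.

    bits_per_frame=64 assumes 2 channels x 32-bit slots, as in the
    calculation above.
    """
    return Fraction(ref_hz, fs_hz * bits_per_frame)

def needs_fractional_divider(ref_hz, fs_hz, bits_per_frame=64):
    """True when the ratio is not an integer, i.e. a plain integer
    divider cannot produce the exact bit clock."""
    return bclk_divider(ref_hz, fs_hz, bits_per_frame).denominator != 1

if __name__ == "__main__":
    for ref in (12_000_000, 24_576_000):
        ratio = bclk_divider(ref, 48_000)
        print(ref, float(ratio), needs_fractional_divider(ref, 48_000))
```

With a 12MHz reference the ratio is 125/32, so it is not an integer, while 24.576MHz at 48kHz (and 22.5792MHz at 44.1kHz) divides by exactly 8.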

m0rci said:
Have you considered creating an ALSA virtual device that accepts, say, a 16-channel signal and just transfers this stuff (with the most convenient encoding) to a Pico via SPI? The Pico would then put all the data in a buffer that is used to feed 16 I2S output signals. They would share the same clocks, so it looks quite straightforward. The issue I see is that RPis do not run as SPI slaves, so you will have a rate mismatch to handle. But then again you could manage the skew with some form of backpressure, and when the RPi gets too far ahead or too far behind you could resort to resampling.

I am afraid that would ruin the low and especially fixed deterministic latency a good I2S deserializer should have. IMO the final I2S output must run synchronously to the ALSA interface, to allow proper timing in the apps before the interface. Also, good adaptive resampling of the many channels I would consider would impose a huge CPU load. The bridges often have to adaptively resample already (incoming master clock), so another resampling would be too much, IMO. It would also preclude running on low-power ARM SoCs which can be powered from the USB bus directly (like the Radxa Pi-S core with 16ch I2S already).

I would probably prefer to stay with I2S only (plus e.g. I2C for controlling the deserializer, of course).

Also I thought of using PIOs only, no DMA, again to keep the timing exact.

BTW - how did you solve "marking" the first channel in the RPi I2S, so that PIO deserializes to the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher sample rates. I thought of "misusing" LSBs of one channel of the serialized I2S - for 32bit length the LSB is way below audible. I think it is good to keep everything serial, fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.)

Do you consider releasing your code?

m0rci · 2025-05-14 10:09 am

For some reason the "quote" button does not appear in your message for me. Does anybody know why? Oh well, manual quoting, I guess...

12MHz/48kHz/64bit = 3.90625

That shows I can't do math. But yes, I really didn't dig into that, the idea was to look at it for the "PCB version".

I am afraid that would ruin the low and especially fixed deterministic latency a good I2S deserializer should have.

OK, I need to know what your aim is, otherwise I cannot understand what your problem is.
I assume you are performing some kind of "live processing", otherwise you should not care about the "deterministic latency" of the outgoing I2S. As long as the output clock is stable and as long as the application can refill the output buffer fast enough, you will be fine.
However, I can see issues if you have a setup like:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> I2S_OUT
In this case, if the IN rate and the OUT rate are not exactly in sync, sooner or later you will get a buffer overrun at the input (when the output rate is lower) or a buffer underrun at the output (when the output rate is higher).
To avoid that, the most straightforward solution is for the PC to be the clock master for both I2S_IN and I2S_OUT or to use an external clock provider and set all devices as clock slaves.
Notice that I have never actually programmed a system of this kind; it's all based on my assumptions about how this stuff should work, yet I would be surprised to learn that things are different.
The solution we're discussing is:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> SPI -> PICO_BUFFER -> I2S_OUT
As you can see there is no requirement for the SPI to be synchronous, as long as the average rate of the SPI signal prevents the PICO_BUFFER from overrunning or underrunning (are these even words?).
So the problem is not in the SPI part, the problem, again, is having I2S_IN and I2S_OUT running at the very same rate.
If you can use the PICO as the clock provider this will not be a problem. If you can't... well I have a couple of solutions in mind.
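As a rough illustration of why the clock mismatch matters more than the transport, here is a small Python sketch (an assumption-laden back-of-the-envelope model, not a measurement). The function name `seconds_until_xrun`, the half-full starting point and the example numbers are all hypothetical:

```python
def seconds_until_xrun(buffer_frames, fs_hz, mismatch_ppm, start_fill=0.5):
    """Time until a buffer between two free-running clocks over/underruns.

    mismatch_ppm > 0: the writer is faster than the reader (buffer fills,
    eventually overruns). mismatch_ppm < 0: the reader is faster (buffer
    drains, eventually underruns). start_fill is the initial fill fraction.
    """
    drift_frames_per_s = fs_hz * mismatch_ppm * 1e-6
    if drift_frames_per_s == 0:
        return float("inf")  # identical clocks never drift apart
    if drift_frames_per_s > 0:
        headroom = buffer_frames * (1.0 - start_fill)  # room left to fill
    else:
        headroom = buffer_frames * start_fill          # frames left to drain
    return headroom / abs(drift_frames_per_s)

if __name__ == "__main__":
    # Hypothetical example: 1024-frame buffer, 48 kHz, clocks 100 ppm apart.
    print(seconds_until_xrun(1024, 48_000, 100))
```

Under these assumptions a 1024-frame buffer at 48kHz with clocks 100 ppm apart lasts less than two minutes, whatever the SPI speed is. With a common clock the drift term is zero and the buffer only has to absorb transfer burstiness.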

Do you consider releasing your code?

If I ever find the time to polish it in a reasonable way... and that could very well be never, but who knows...

m0rci · 2025-05-14 10:44 am

phofman said:
BTW - how did you solve "marking" the first channel in the RPi I2S, so that PIO deserializes to the correct channels? IIRC there were several I2S deserialization projects which used another GPIO from the driver for the marks, but that was not 100% reliable due to timing delays, especially at higher sample rates. I thought of "misusing" LSBs of one channel of the serialized I2S - for 32bit length the LSB is way below audible. I think it is good to keep everything serial, fast parallel is tough (PATA -> SATA, SCSI -> SAS, PCI -> PCI-e, etc.)

Yes, the plan is to use the LSB as an in-band control signal. Notice that the only problem I have is that the input signal is:
L1 R1 L2 R2 L1 R1 L2 R2...
and I can distinguish L from R (given the LRCLK) but I cannot distinguish L1 from L2. The idea is to set the LSB of L1 to zero and the LSB of L2 to 1 (or whatever) and I will be fine.
At this point the implemented solution is even simpler: the Pico initially waits for a non-zero L frame and it assumes that's a control frame, it ignores it and it starts the processing from the next frame (that will be an L1). On the PC side I just make sure to add a 0x00000001,0x00000000 sequence in front of the actual signal and I'm fine.
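For readers following along, the planned LSB marking can be sketched in a few lines of Python. This is only a host-side model of the idea under stated assumptions, not the actual Pico/PIO code: the names `tag_stream` and `demux_stream` are hypothetical, samples are treated as unsigned 32-bit words, and the convention "LSB 0 = first pair, LSB 1 = second pair" is the one suggested above.

```python
MASK32 = 0xFFFFFFFF

def tag_stream(quads):
    """Serialize (L1, R1, L2, R2) tuples into stereo frames.

    The LSB of each left sample is overwritten to mark which output
    pair the frame belongs to: 0 -> pair 1, 1 -> pair 2.
    """
    frames = []
    for l1, r1, l2, r2 in quads:
        frames.append(((l1 & ~1) & MASK32, r1 & MASK32))
        frames.append(((l2 | 1) & MASK32, r2 & MASK32))
    return frames

def demux_stream(frames):
    """Route each stereo frame to its output pair using only the marker
    bit, so the receiver can start (or restart) at any frame."""
    pair1, pair2 = [], []
    for left, right in frames:
        (pair2 if left & 1 else pair1).append((left, right))
    return pair1, pair2

if __name__ == "__main__":
    stream = tag_stream([(10, 11, 20, 21), (12, 13, 22, 23)])
    # Drop the first frame to mimic joining the stream mid-way.
    print(demux_stream(stream[1:]))
```

Because every frame carries its own marker, the receiver does not depend on having seen the start of the stream, which is the difference from the leading control frame approach.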

Markw4 · 2025-05-14 3:37 pm

m0rci said:
For some reason the "quote" button does not appear in your message for me.

The last post does not contain a quote button. However, it can be quoted by highlighting text with the mouse. Then a little local quote button appears which is what I used to quote your post.

Other than that, approximating an I2S clock frequency by switching between two frequencies to achieve the correct average frequency is asking for trouble in a dac. Clock jitter can be quite audible, and a deliberately jittered one has proven to be a bad idea.

m0rci · 2025-05-14 3:55 pm

Markw4 said:
The last post does not contain a quote button. However, it can be quoted by highlighting text with the mouse. Then a little local quote button appears which is what I used to quote your post.

Thanks for the hint, I'm obviously a newbie here.

Markw4 said:
Other than that, approximating an I2S clock frequency by switching between two frequencies to achieve the correct average frequency is asking for trouble in a dac. Clock jitter can be quite audible, and a deliberately jittered one has proven to be a bad idea.

I'm not getting who this applies to. I don't remember anybody here suggesting to switch between frequencies...

Markw4 · 2025-05-14 3:58 pm

m0rci said:
I don't remember anybody here suggesting to switch between frequencies...

It's something that was done with the RPi GPIO bus in the past to approximate the I2S bus clock frequencies needed by DACs. Apparently the RPi clock was not integer-divisible to produce the exact clock frequencies needed.

phofman · 2025-05-15 9:12 am

m0rci said:
The solution we're discussing is:
I2S_IN -> IN_BUFFER -> APP -> OUT_BUFFER -> SPI -> PICO_BUFFER -> I2S_OUT
As you can see there is no requirement for the SPI to be synchronous, as long as the average rate of the SPI signal prevents the PICO_BUFFER from overrunning or underrunning

That is true. But for low-latency applications the ALSA device should not only run on average at the correct rate of the output clock, but its period expirations should also be as regularly paced as possible, so that the buffers involved can be minimal. The SPI running at a different speed than the output clock would have to be gated by some feedback protocol from the MCU, yet the ALSA device should be consuming samples at regular intervals. That means another buffer in the driver. Then there is another buffer in the MCU for DMA to the PIO emulating the I2S protocol. All these buffers and the feedback gating not only increase the latency, but also reduce the precision of the latency. E.g. video can be delayed to match the audio latency, but only if the latency is known in advance and reasonably stable between restarts.

CamillaDSP is a great example of an I/O buffer with very advanced async resampling. As you said, input and output clocks can differ. Async/adaptive resampling monitors the I/O rate ratio and adjusts the resampling ratio as needed. The resampling ratio should not change a lot because any change in the ratio introduces minor but unremovable distortions to the stream. However, if the consumer/output rate deviates (although on average it's spot on), the feedback value will deviate too. And the slower the feedback, the larger the buffers must be to accommodate the deviation before the feedback fixes the buffer fill.
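The relation between feedback strength and buffer excursion can be illustrated with a toy model in Python. To be clear, this is NOT CamillaDSP's algorithm, just a plain proportional controller on the buffer fill. The name `simulate_rate_feedback` and all parameters (`kp`, `target`, `chunk`) are made up for illustration:

```python
def simulate_rate_feedback(true_ratio, kp=0.5, steps=1000,
                           target=512.0, chunk=64):
    """Toy proportional rate control on an output buffer.

    Each step the resampler writes chunk * est frames and the output
    clock consumes chunk * true_ratio frames. The resampling ratio est
    is nudged in proportion to the buffer-fill error.
    Returns (final ratio estimate, final buffer fill in frames).
    """
    fill = target
    est = 1.0
    for _ in range(steps):
        # Below target -> produce more; above target -> produce less.
        est = 1.0 + kp * (target - fill) / target
        fill += chunk * est          # frames written by the resampler
        fill -= chunk * true_ratio   # frames consumed by the output clock
    return est, fill

if __name__ == "__main__":
    # Output clock 100 ppm fast; compare strong and weak feedback.
    for kp in (0.5, 0.05):
        print(kp, simulate_rate_feedback(1.0001, kp=kp))
```

In this model the steady-state fill error is `target * (true_ratio - 1) / kp`, so weakening the feedback by a factor of ten makes the buffer settle ten times further from its target, which is the "slower feedback needs larger buffers" trade-off in its simplest form.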

That's why I am looking for a way to use the I2S interface directly (only one buffer -> DMA -> short FIFO in the I2S interface), running as slave, with the perfectly deterministic PIO for deserializing the multiple-rate data line to the several final-rate data lines.

m0rci said:
The idea is to set the LSB of L1 to zero and the LSB of L2 to 1 (or whatever) and I will be fine.

That was exactly what I thought too. It's corrupting the L's LSBs but acceptable at 32bit, IMO.

m0rci said:
At this point the implemented solution is even simpler: the Pico initially waits for a non-zero L frame and it assumes that's a control frame, it ignores it and it starts the processing from the next frame (that will be an L1)

I would be a bit afraid that pausing or restarts, especially after xruns, can cause the first sample to be skipped/dropped. IMO the method should allow resyncing within the running stream, like all streaming protocols do (SPDIF - the side bits, RTP, AC3/DTS - chunks preceded by a header, etc...)

phofman · 2025-05-15 9:16 am

Markw4 said:
It's something that was done with the RPi GPIO bus in the past to approximate the I2S bus clock frequencies needed by DACs. Apparently the RPi clock was not integer-divisible to produce the exact clock frequencies needed.

It's the only way for the RPi4 and older to generate master I2S clocks. That's why I find it a shame that most RPi hats are I2S slaves instead of masters with a local clock, and many users resort to plain FIFO hacks instead of properly slaving the I2S transmitter to a precise clock in the first place.

m0rci · 2025-05-15 9:37 am

phofman said:
That's why I am looking for a way to use the I2S interface directly (only one buffer -> DMA -> short FIFO in the I2S interface), running as slave, with the perfectly deterministic PIO for deserializing the multiple-rate data line to the several final-rate data lines.

I understand. Yet I would not ditch the solution a priori. If the in and out rates are indeed the same (which would be the case if using all devices as slaves) you can probably get away with very small buffers.
And in any case, the Pico solution just adds another buffer to the chain (it's not like you have three more). If you can keep this buffer at a reasonable size maybe you could get away with it.
Consider that most existing sound applications apply quite a lot of buffering (often just to "play safe") and most users do not ever realize that. But yes, if you cannot enforce the same clocks at in and out you also have the issues with resampling and so forth...

m0rci · 2025-05-15 9:41 am

phofman said:
I would be a bit afraid that pausing or restarts, especially after xruns, can cause the first sample to be skipped/dropped. IMO the method should allow resyncing within the running stream, like all streaming protocols do (SPDIF - the side bits, RTP, AC3/DTS - chunks preceded by a header, etc...)

And that's the reason why I'm planning to adopt the other solution.

phofman · 2025-05-15 10:14 am

m0rci said:
Consider that most existing sound applications apply quite a lot of buffering (often just to "play safe") and most users do not ever realize that.

Yes, but the massively multichannel model which I am interested in (as these are not readily available) is mostly for AV and/or advanced audio work where latency matters. That's why a solution I would be looking for (just considering for now 🙂 ) would have small and predictable latency as a major criterion. I would imagine a hat-sized module (not limited to RPi, but the same input pinout for 8ch I2S, 1x I2C, and likely several GPIOs) with two selectable audio-frequency-compatible clocks and several RP2350s for extending the 8ch 384kHz I2S to either 16ch (192kHz output max) or 32ch (96kHz output max) (the PIO code would likely be similar). The I2S outputs would include an MCLK master line for the codecs, as (basically) all I2S interfaces outside the RPi realm do.

eclipsevl · 2025-05-15 10:58 am

m0rci said:
I'm not getting who this applies to. I don't remember anybody here suggesting to switch between frequencies...

This is basically how fractional frequency dividers work. The division ratio is still integer, usually modulated by a sigma-delta modulator. The clock edge positions can be compensated by dynamic delay lines to minimize the TIE (time interval error). It is basically a noise-shaping technique that moves the noise to higher offset frequencies.

m0rci · 2025-05-20 11:55 am

I never dug into fractional frequency dividers but your explanation makes sense: the PLL acts as a low pass filter and the incoming signal is sigma-delta modulated. It seems to me that this approach, if well designed, should indeed produce a rather stable clock signal. So you're saying somebody tried to achieve this without the PLL/low pass stage? I bet you get lousy results...

eclipsevl · 2025-05-20 4:38 pm

m0rci said:
the PLL acts as a low pass filter and the incoming signal is sigma-delta modulated. It seems to me that this approach, if well designed, should indeed produce a rather stable clock signal.

This is how Frac-N PLLs work, yes. The feedback divider is integer but modulated with SD to get a fractional divider value. The SD noise is filtered by the PLL loop filter.

m0rci said:
So you're saying somebody tried to achieve this without the PLL/low pass stage? I bet you get lousy results...

Yes, but it depends on the implementation. There are noise cancellation techniques that can yield very good results, for example the CDCM6208.

But I guess MCUs have very basic ones; for example the PIC32MZ has fractional dividers too, but the edge jitter is so high that it is visible on a scope.

phofman · 2025-05-20 7:58 pm

IIUC standard fractional dividers do not employ the PLL loop, but just switch between adjacent integer divider values to achieve the (jittery) fractional average.
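A minimal Python sketch of that switching, assuming a simple first-order accumulator (real hardware may differ, and the name `divider_sequence` is made up here). It reuses the 125/32 = 3.90625 ratio from earlier in the thread:

```python
from fractions import Fraction

def divider_sequence(ratio, n):
    """Emit n integer divide values whose average approaches ratio.

    First-order accumulator: divide by floor(ratio) normally, and by
    floor(ratio) + 1 whenever the accumulated fractional part overflows.
    """
    base = ratio.numerator // ratio.denominator
    frac = ratio - base
    acc = Fraction(0)
    seq = []
    for _ in range(n):
        acc += frac
        if acc >= 1:
            acc -= 1
            seq.append(base + 1)
        else:
            seq.append(base)
    return seq

if __name__ == "__main__":
    seq = divider_sequence(Fraction(125, 32), 32)
    print(seq)
    print(Fraction(sum(seq), len(seq)))  # average divide value
```

Over 32 output periods the sequence contains only the values 3 and 4 and sums to exactly 125 input cycles, so the average is right while individual periods are either 3 or 4 input cycles long. That period-to-period variation is the deterministic jitter being discussed.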

Somewhere I read that a PLL is a very costly part of the chip (taking lots of space), therefore MCUs/SoCs have only a very limited number of PLLs.

eclipsevl · 2025-05-21 9:15 am

phofman said:
Somewhere I read that a PLL is a very costly part of the chip (taking lots of space), therefore MCUs/SoCs have only a very limited number of PLLs.

Area and power, yes.

phofman said:
IIUC standard fractional dividers do not employ the PLL loop, but just switch between adjacent integer divider values to achieve the (jittery) fractional average.

Yes, in a purely digital implementation (which is what is mostly used in MCUs) that is the case. But there are clock generators that implement noise cancellation (this needs some analog circuitry, but not as complex as a PLL) and filter most of the jitter. Not as clean as a PLL can be, but still good for many applications.
