Resample dev modules.

I'm building a DIY digital path router. It's not a mixer, although it will down/up mix. It's not a DSP, although it will do basic EQ. It's not really even about music, although music will be played through it. It's just a big hub for connecting the plethora of USB, 3.5mm jack and optical devices into a common set of buses, ending up with an N-to-N digital router.

I can feel the hatred already, but I'm not even going "High Fidelity" from the start. V1.0 will be 48kHz/16-bit across the board. Which brings me to my question.

Given that audio clocks are what they are, I'm not going to go chasing baller xtals and trying to make clocks play nicely. Life is too short. I have been well advised by those "in the know" in such aspects that clocks can't really be simplified away.

So, unless I want to handle all the potential nasties that clock skew and drift WILL cause in software, or I want to invest a decade learning to program a "real" generic DSP, my best option seems to be jelly-bean resampler ICs. In my case they won't really be resampling, it's the same format on both sides, but what they will do is resample across clock domains.

I see a few posts in here around similar ideas, but they usually involve buying audiophile pre-assembled units with DACs/ADCs and/or all the high-end stuff in tow.

I'm wondering if anyone has any experience with the basic reclockers/sample rate converters. I'm looking at the consumer-grade end of the market, similar to, say, the level of the PCM5102 or PCM2706: basically just set some pins for your preferred format and/or let it auto-detect as slave on both sides, whatever.

I won't exclude chips that do high-end formats, as long as they also do basic DVD-quality 48kHz. If V1.0 works, then V2.0 may well have a specific "music" path for a bit higher, like 24/96.

I've tried trawling through Farnell and TI's site, but browsing datasheets is never easy. I also tried trawling AliExpress for clones of other modules, but I found too wide a range of price and functionality, so... I came here to ask.
 
Any reason for not using ALSA etc? It's two commands to pipe data or use the mixer...
It's an interesting point. It would couple all audio to a single PC of course. I do have a 24/7 PC, but it's a noisy beast. Fine if it's not doing the DAC/ADC portions. Moving those external and battery-powering them would free me from that noise domain, even if I need to go to opto or magnetic isolation.

How does USB Host to USB Host audio work though?

ALSA in general... I dislike it. Although that's not exactly an objective opinion, it's just that me and ALSA don't get along. It has been nothing BUT a burden to me over the years of using Linux. I mean, the OSS kernel module (remember it?) supported multiplexing out of the box. ALSA back in the day... just didn't. So you had to have a sound daemon to do that, which meant that only half your applications supported your sound daemon. Then there is the thing of it randomly readdressing all your dynamic sound devices when you plug them in, forcing you to go and lock everything down to a hardware ID... until you get another new device or your mate brings round his headset for you to try. It might be a lot better today... but my current issue with ALSA is that it sends the wrong sample rate from VirtualBox over RDP, so I can tell my dev VMs' sound from others as it plays at twice the frequency with drop-outs.

Anyway, partly there is the hobby aspect, a little "tool orientated design". Partly I want the box to be standalone, battery powered, and to place the minimum of requirements on the end devices: USB 48K, optical 48K. The EQ is just because I... don't have a central output EQ, so I have to rely on junky software EQs in browsers for music.

No... I'm not paying the prices audiophiles want for devices that contain £3.99 ICs and a bunch of passives for £500. I know how the maths of electronics works. A £500 ADC/DAC audiophile "sound card" is probably only about £50 of hardware, £50 of engineering and £400 of productisation, distribution, retail and marketing. If you skip all of the latter, you get £500 worth of audio box for £50-100 plus a whole lot of your time!
 
If you want full USB support (i.e. be able to use hubs etc.) then you will need to buffer between the devices at the application layer (i.e. a streaming relay), and it will have issues synchronising multiple streams for coordinated playback (i.e. active speakers) without clock synchronisation support.
Your data frames for each speaker would then carry sample clocks in the data stream etc.
 
If you want full USB support (i.e. be able to use hubs etc.) then you will need to buffer between the devices at the application layer (i.e. a streaming relay), and it will have issues synchronising multiple streams for coordinated playback (i.e. active speakers) without clock synchronisation support.

Oh. Took me a while to see what you mean. I have no requirements for audio sync between inputs (or even outputs really). Not yet anyway.

In practice, while I would like any or all outputs active, it's not like I am going to be using it for running multi-room speakers, for example, where sync and delay are important. It's more likely to be the desktop speakers on a DAC line out, the headphones on an internal DIY headphone amp and maybe a BT source for the wireless headphones. It's unlikely all will be active and very unlikely they will all be playing the same source. So I can live without audio sync control, as long as I maintain audio sync across stereo pairs.

For down-mixing my first prototype works fine. It literally just waits for both buffers to have a readable portion, processes it asynchronously and waits for the next alignment. It's a prototype; both of those inputs need to be active for it to work. I'm not too afraid of the routing logic needed to make the streams optional and dynamic, hopefully I shouldn't be 🙂
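
For illustration, the core of that prototype loop is roughly this shape (a simplified sketch with made-up names and sizes, not the actual code):

```c
/* Simplified sketch of the down-mix loop described above: wait until both
 * input ring buffers hold at least one 1ms block, then mix and forward it.
 * Names and sizes are illustrative, not the actual prototype code. */
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SAMPLES 96u                 /* 1ms @ 48kHz, stereo interleaved */
#define RING_SAMPLES  1024u               /* power of two, so free-running   */
#define RING_MASK     (RING_SAMPLES - 1u) /* counters wrap cleanly           */

typedef struct {
    int16_t data[RING_SAMPLES];
    volatile uint32_t head, tail;         /* free-running sample counters */
} ring_t;

static bool block_ready(const ring_t *r)
{
    return (r->head - r->tail) >= BLOCK_SAMPLES;
}

static void downmix_poll(ring_t *in_a, ring_t *in_b, ring_t *out)
{
    if (!block_ready(in_a) || !block_ready(in_b))
        return;                           /* wait for the next alignment */

    for (uint32_t i = 0; i < BLOCK_SAMPLES; i++) {
        int16_t a = in_a->data[(in_a->tail + i) & RING_MASK];
        int16_t b = in_b->data[(in_b->tail + i) & RING_MASK];
        int32_t mix = ((int32_t)a + (int32_t)b) / 2;   /* naive 50/50 mix */
        out->data[(out->head + i) & RING_MASK] = (int16_t)mix;
    }
    in_a->tail += BLOCK_SAMPLES;
    in_b->tail += BLOCK_SAMPLES;
    out->head  += BLOCK_SAMPLES;
}
```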
 
Resampling in realtime has all the issues that static analysis and conversion doesn't. Most of those chips have a FIR filter with programmable coefficients etc., but a static analysis can use a DFT across the entire piece of music and use that intensive brute force to resample and reconstruct. It can take incidentals into account and scale over them if desired.
 
Oh. Took me a while to see what you mean. I have no requirements for audio sync between inputs (or even outputs really). Not yet anyway.

In practice, while I would like any or all outputs active, it's not like I am going to be using it for running multi-room speakers, for example, where sync and delay are important. It's more likely to be the desktop speakers on a DAC line out, the headphones on an internal DIY headphone amp and maybe a BT source for the wireless headphones. It's unlikely all will be active and very unlikely they will all be playing the same source. So I can live without audio sync control, as long as I maintain audio sync across stereo pairs.

For down-mixing my first prototype works fine. It literally just waits for both buffers to have a readable portion, processes it asynchronously and waits for the next alignment. It's a prototype; both of those inputs need to be active for it to work. I'm not too afraid of the routing logic needed to make the streams optional and dynamic, hopefully I shouldn't be 🙂
Just as a full stop to that idea - do not underestimate the complexity of distributed synchronisation! 😀
 
Interesting. The architecture is open; to be honest, each bit of it is juggling between different options. Most of the decisions surround where to push the various inherent problems: can they be encapsulated in ICs so that "I don't need to care", or is it better to just do everything in software?

All too often you find an IC that solves 100% of your problem, you get excited and then you find a single line in the datasheet which basically says, "That thing you wanted to use this for... it's not going to work for you."
 
Latency is an issue, one I'm just going to try and outrun. If I have to deal with it, then that will be a pain. The USB inputs are likely to carry audio which is very much synced to video. I know that modern USB (and BT) devices have a way of reporting their latency to the host, and you will see Netflix and YouTube pause and realign buffers to meet that latency (i.e. delay the video). Having a dynamic, multi-path audio route would make calculating that dynamic latency and reporting it to each USB endpoint individually a real pain in the ****.

So I'm on a wing and a prayer, hoping I can maintain 1ms buffers from in to out.

I intend to do that, in part, by breaking with tradition and not using I2S internally. I2S 'packets' will be bussed over high-speed SPI at 50Mbit/s, so they will be decoupled from the audio clocks entirely for the greater part. The way I see it: 1ms of audio gets clocked in. No matter where or how I send it around, it still remains 1ms of audio. When it reaches an output I2S buffer it can be clocked back out over 1ms.
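
For scale, the back-of-envelope numbers for one 1ms block, assuming 48kHz/16-bit stereo and ignoring SPI protocol/framing overhead:

```c
/* Back-of-envelope sizing for the 1ms transport blocks.
 * Assumes 48kHz, 16-bit, stereo; SPI framing overhead ignored. */
#include <stdio.h>

int main(void)
{
    const double sample_rate  = 48000.0;  /* Hz */
    const int    bytes_per_fr = 2 * 2;    /* 16-bit x 2 channels */
    const double block_ms     = 1.0;
    const double spi_bit_rate = 50e6;     /* 50 Mbit/s */

    double frames_per_block = sample_rate * block_ms / 1000.0;   /* 48  */
    double bytes_per_block  = frames_per_block * bytes_per_fr;   /* 192 */
    double spi_us_per_block = bytes_per_block * 8.0 / spi_bit_rate * 1e6;

    printf("1ms block: %.0f frames, %.0f bytes\n", frames_per_block, bytes_per_block);
    printf("SPI time per block at 50 Mbit/s: %.1f us\n", spi_us_per_block);
    /* ~30.7us per 1ms block, so one SPI link could in principle carry
     * around 30 such streams before raw bandwidth becomes the limit. */
    return 0;
}
```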

I can go up to higher millisecond numbers, however I run the risk of latency building up. The bottom-line point is... buffers don't solve anything, they can only buy you time: if you can't process 1ms of audio data in 1ms, it's game over anyway (with caveats: efficiencies and overheads, temporary glitches and so on).

Ideally I want <10ms end to end. I'm starting with 1ms until something tells me I can't do that. 🙂
 
Buffers are just being used as a queue, so queueing theory takes over. It's irrespective of whether the system transfers 50% of the audio data ahead of playback or not; the key is getting the clocks synchronised so everything hits your ears at the right time.
Once calibrated/analysed, an optimal playback model can generate code (or at least the latency solution).
Then you have DSPs such as CamillaDSP that can correct for room aberrations at the same time. Etc. etc. Just wait until you want atmospheric effects panning left to right in 3D; with latency timing like that you need to be accurate.

Like I said, it becomes a large and complex subject matter (I did distributed and parallel systems as a degree specialisation).

Now if you only want one source and destination at once.. then that’s easier 🙂
 
Resampling on the fly is often used to shift noise etc.; however, it can open you up to other distortion, such as zero offset, or where the end of the interpolated sample doesn't match the next real data sample. Even interpolating ahead doesn't solve large transients etc., so it's a bit of a compromise.
 
Yes. So you see why I'm lucky I don't have sync requirements. 🙂

My finger in the air theory around the clocks is as follows:

I will have a "master" clock on my board for anything that is under my clocking control, i.e. anything that accepts my master clock input. In software I will handle the rare occurrences of buffer wrap: stuff/drop.
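
By stuff/drop I mean something along these lines: a rough sketch (illustrative names and thresholds, not real code) that watches the fill level of an output ring buffer and repeats or skips a single frame to pull it back towards the middle:

```c
/* Rough sketch of sample stuff/drop on a ring buffer to absorb slow
 * drift. Thresholds and names are illustrative only. */
#include <stdint.h>

#define RING_FRAMES  256u                 /* stereo frames, power of two */
#define TARGET_FILL  (RING_FRAMES / 2u)
#define HYSTERESIS   16u

typedef struct {
    int16_t l[RING_FRAMES], r[RING_FRAMES];
    volatile uint32_t head, tail;         /* free-running, masked on use */
} frame_ring_t;

static uint32_t fill(const frame_ring_t *rb) { return rb->head - rb->tail; }

/* Called once per output frame request (e.g. from the I2S TX interrupt). */
static void get_output_frame(frame_ring_t *rb, int16_t *l, int16_t *r)
{
    if (fill(rb) == 0) {                  /* hard underrun: emit silence */
        *l = *r = 0;
        return;
    }

    uint32_t idx = rb->tail & (RING_FRAMES - 1u);
    *l = rb->l[idx];
    *r = rb->r[idx];

    if (fill(rb) > TARGET_FILL + HYSTERESIS)
        rb->tail += 2;                    /* drop: consume one extra frame */
    else if (fill(rb) < TARGET_FILL - HYSTERESIS)
        ;                                 /* stuff: repeat this frame next call */
    else
        rb->tail += 1;                    /* normal consumption */
}
```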

For devices which insist on crossing clock domains (born from the simplicity that aligns with my laziness), like consumer USB bridges with 12MHz clocks and 100:1 clock/sample ratios... I can accept that their output will be nicely formed. If it's 48K@16 clocked by a 12MHz MCLK, then passing it through a resampler into a clock domain of 24.576MHz should provide a sensible amount of overlap in the (over)sampling.

What is SOTA?
 
As an aside... a large part of this project, the wanting to handle the low-level buffers, synchronisation, inter-MCU IPC, etc., is because my day job is doing this at enterprise level and I miss the days of hacking critical-path low-level C code in stock exchange back ends to gain a microsecond on competitors. In enterprise Java it's just a list of dependencies and "Meccano" software.
 
Yes. So you see why I'm lucky I don't have sync requirements. 🙂
Why "sync requirements" do you mean specifically?
For devices which insist on crossing clock domains (born from the simplicity that aligns with my laziness), like consumer USB bridges with 12MHz clocks and 100:1 clock/sample ratios... I can accept that their output will be nicely formed. If it's 48K@16 clocked by a 12MHz MCLK, then passing it through a resampler into a clock domain of 24.576MHz should provide a sensible amount of overlap in the (over)sampling.
I am afraid I do not understand this. The 12MHz clock of a USB bridge is used for the bridge operation, not for generating the output I2S signal. Either the bridge runs in adaptive mode, in which case the output signal clock is generated from the incoming USB data stream via a PLL (i.e. effectively synced to the clock of the USB host controller which sends the stream), or the bridge runs in async mode, in which case the output signal clock is fixed, provided externally or through some PLL within the USB bridge. In both cases the output clock is already related to the audio sample rate, not to the 12MHz operating clock of the USB bridge.

What is SOTA?
State of the art, top performance
 
Why "sync requirements" do you mean specifically?

Often the requirement of simultaneous multi-input systems is to "mix" multiple sources such as instruments, microphones etc. You would not, for example, want the guitar to slip ahead of the vocals. However, I have no such requirements. On simultaneous outputs, sync requirements would be needed if you were powering 16 speakers in a hall: you need specific delays so as not to create phase pockets etc. (I am assuming).

In my case if two outputs happen to fall perfectly out of phase while playing the same thing, it matters not. It's not a mixer and it's not for distributed audio.

I am afraid I do not understand this. The 12MHz clock of a USB bridge is used for the bridge operation, not for generating the output I2S signal.

You are indeed correct; however, it's irrelevant, as I have no control over that clock. So I can try to sync clock domains, I can resample, or I can stuff/drop to prevent buffer wrap.
 
I mean, maybe I am "barking up the wrong tree".

The reason I'm concerned with clocks is not to have all of my audio perfectly synced by the same clock so that it can never slip.

The reason I am concerned with clocks is rather that clocks drift and jitter, and if I try to mix two clock domains without intervention the drift will invariably go in one direction or the other and will eventually run off the end of a buffer somewhere.

This is exactly what happens with a USB async endpoint if you don't fix it.

I was hoping to use resampler ICs to "clean" that up and reclock it to my internal clock, such that for most of the internals everything shares the same MCLK, bit clock and LR clock. Buffer overruns/underruns due to clock quality in a single clock domain should be fairly rare, so I don't need to be that precious about handling them. A click every blue moon is fine, if it even comes to that.
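
To put a rough number on "eventually": assuming, say, a 100ppm total mismatch between two nominal 48kHz domains (an illustrative figure, not a measurement), the slip rate works out like this:

```c
/* Back-of-envelope: how fast two free-running "48kHz" clock domains slip
 * apart. The 100ppm mismatch is an assumed figure for illustration. */
#include <stdio.h>

int main(void)
{
    const double fs        = 48000.0;   /* nominal sample rate, Hz */
    const double ppm_error = 100.0;     /* assumed total clock mismatch */
    const double margin    = 48.0;      /* spare frames, i.e. ~1ms of slack */

    double slip_per_sec = fs * ppm_error / 1e6;       /* frames per second */
    double secs_to_wrap = margin / slip_per_sec;

    printf("slip: %.1f frames/s\n", slip_per_sec);    /* 4.8 frames/s */
    printf("1ms of slack gone in %.0f s\n", secs_to_wrap);  /* ~10 s */
    return 0;
}
```

So on those assumptions a 1ms buffer margin is eaten in roughly ten seconds without an ASRC or stuff/drop in the path.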