Asynchronous Sample Rate Conversion

Hello guys! My first post here :) Never want to join a party without bringing something, so I thought I'd offer a tutorial on Asynchronous Sample Rate Conversion. I know it's been discussed many times, but I'd like to devote a full thread to an in-depth technical review of this technology. Are you guys game?

Rough outline would be something like :

1. Focus on the easiest case, modest upsampling from say 44.1kHz to 48kHz.

2. Review the details ... in both time & frequency domains ... of common filter structures used for ASRC, focusing first on "jitter free" environments.

3. Describe how the system responds to jitter on incoming data, compare to a PLL-based clock recovery system.

Sound like a plan? I know you guys know nothing about my credentials ;) but maybe we'll get to that as the thread develops, cool? Of course, I'll welcome all inputs from the experts on the forum.
 
Well that's all I need !

Let's start with a clear statement of the problem :

Input data : digital audio samples, arriving at 44.1kHz

Output data : digital audio samples, leaving at 48kHz. The 48kHz clock is derived from a higher speed local oscillator, say 512x, or 24.576MHz.

Input data rate, also called Fs_in, and output rate, also called Fs_out, are completely ASYNCHRONOUS .... meaning they are NOT derived from the same fundamental timebase.

We may discuss precision of the audio samples, in terms of number of bits or dynamic range, as the thread progresses ... because the dynamic range will certainly be impacted by our filter structures. But we'll cross that bridge when we get to it :)

Now, we'll quickly find out how good a teacher I am :) because I have no visual aids :( I encourage the reader to draw two "time bases", or two horizontal lines. The top line will have periodic "ticks" corresponding to Fs_in, while the bottom line will have "ticks" corresponding to Fs_out. The ticks on the bottom timebase occur more frequently, so we immediately see that the fundamental job of the Asynchronous Sample Rate Converter is to provide output audio samples corresponding to points in time IN BETWEEN the original audio samples. Sometimes only one sample in between, sometimes two ... because the output rate is faster. Make sense?
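In fact, since I have no visual aids, here's a tiny Python sketch (assuming you have numpy handy ... purely illustrative) that prints where the first few output ticks land relative to the input ticks ... a poor man's version of those two timelines:

Code:
import numpy as np

fs_in, fs_out = 44100.0, 48000.0      # input & output sample rates
t_out = np.arange(12) / fs_out        # first 12 output tick times, in seconds

# Position of each output tick, measured in INPUT sample periods.
# The fractional part says how far BETWEEN input samples it lands.
pos = t_out * fs_in
for t, p in zip(t_out, pos):
    print(f"output tick at {t*1e6:7.2f} us -> input sample {int(p)} + {p % 1.0:.4f}")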

So right away we recognize the job of our ASRC to be one of INTERPOLATION.

Next post ... a couple of ideas for interpolation that we'll rule out pretty quickly.
 
Ok so we have a problem of interpolation. Let's examine a couple of ideas to solve it :

1. OK, Fs_in at 44.1kHz and Fs_out at 48kHz. No problem, you might say ... simply interpolate the input data by the integer 160, to a rate of 7.056MHz, and follow that by integer decimation of 147 to the final output rate of 48kHz. Simple, right ?? Nope, wrong.

The problem is that the clocks are not derived from the same fundamental timebase. So the input rate will NOT be exactly 44.1kHz, and/or the output rate will NOT be exactly 48kHz when measured with the same timebase. Furthermore, this technique would not be "flexible" enough to easily accommodate other input & output rates. So it's a bad idea, fundamentally flawed for ASYNCHRONOUS sample rate conversion.

2. Well here's another idea. Why don't I just convert the incoming signal, sampled at Fs_in, to an ANALOG signal with a super-duper DAC. Then, I'll re-sample the output at the rate I choose, Fs_out, with a super-duper ADC. Problem solved, right?

Well, in a manner of speaking, yes ... it solves the problem. But, it introduces two more conversion stages, each with ugly analog electronics involved :) (please take no offense, my first love is, and will always be, analog circuit design). Furthermore, it has to be a GREAT DAC, with great anti-imaging filters, followed by a GREAT ADC, with great anti-alias filters. Why? Because all of the residual image energy "left over" from imperfect interpolation will ALIAS back down into the audio band upon ADC re-sampling, if not adequately filtered.

So these ideas aren't great. BUT, they do give us a couple clues about a better method :

Clue #1 : Async SRC really is about interpolation, but it's also about subsequent decimation. So I have to be careful that my interpolation process, whatever it is, does a REALLY good job of filtering images. Remember that filling in samples in the TIME domain is equivalent to low-pass filtering (or attenuating) images in the FREQUENCY domain ... this is the essence of interpolation.

Clue #2 : The analog model, while cumbersome in practice, is pretty sweet from a CONCEPTUAL perspective. The question is, can I approximate this analog model purely in the digital domain? After all, digital-to-analog conversion is really nothing more than interpolation, taken to the ultimate extreme :)

Let me pause for some feedback at this point. How am I doing?
 
Hi Werewolf,

I often see examples of systems using asynchronous sample rate converters where the secondary-side master clock is an exact multiple of Fs. An example:

I have a Sony DAV-S400 that uses their S-Master digital amplifier modulator. The modulator, the CXD9634Q (manufactured and sold by Mitsubishi as the M65817AFP), contains an asynchronous sample rate converter to attenuate phase noise from PCM and DSD data streams (the DSD is first converted to PCM). However, Sony chose to use a secondary master clock of 49.152MHz (1024Fs). Surely, due to the finite stop-band attenuation of the SRC filters, it's a bad idea to have a 1:1 ratio between input and output data rates?

Also,

As a result of manufacturers saving on silicon area, most integrated DACs these days have poor stop-band rejection. Let's say we are using a CD source @ 44.1kHz, and use an SRC @ 200kHz (not an exact multiple of Fs!). Would our integrated DAC now gain the benefits of the superior stop-band rejection of the SRC?

Benchmark Systems have a paper on the Web titled "Jitter and its effects", where they demonstrate that the stop-band rejection of an ADC's decimation filters is reduced by jitter. Would the decimation filters in an SRC be affected in the same way?

Keep up the tutorial; I hope to gain a better understanding of the inner workings of SRCs.

Are you / were you connected with our dear friends at Crystal or D2AUDIO?

John
 
John, I'll try to address your questions :

My quick bio :
BSEE Lehigh University 83
MSEE MIT 85
Bell Labs MTS 85-89
Crystal design engineer 89-92
Crystal design manager 92-95
Crystal VP Engineering 95-97
Silicon Labs co-founder 97

So yep, know the old Crystal guys pretty well :)

Let's talk stop-band rejection of the SRC interpolators. The short answer is YES ... you need awesome stop-band rejection to prevent residual image energy from aliasing back into the audio band during subsequent decimation. But here's something a bit less obvious ... ALL of the high frequency residual image energy must fold back down during decimation, because it's all gotta end up inside half the output sample rate. Knowing this, what's the best stopband "shape" for the ASRC interpolator? Hint : it's NOT the flat, equi-ripple response you see with most DAC interpolators or ADC decimators. A flat stopband does NOT minimize total stopband energy ... and minimum total stopband energy is exactly what you need for ASRC.
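Here's a quick numerical sketch of that hint, if you want to play along at home (scipy's remez and firls designs, with the band edges and filter length picked arbitrarily ... the numbers are illustrative only). Same length, same band edges; the least-squares design gives up some PEAK stopband rejection near the band edge, but wins on TOTAL stopband energy ... which is what matters when everything folds down:

Code:
import numpy as np
from scipy.signal import remez, firls, freqz

N = 129                      # linear-phase FIR length (odd, as firls requires)
fp, fstop = 0.20, 0.25       # band edges in cycles/sample (Nyquist = 0.5)

h_pm = remez(N, [0, fp, fstop, 0.5], [1, 0])          # flat, equiripple stopband
h_ls = firls(N, [0, 2*fp, 2*fstop, 1], [1, 1, 0, 0])  # least-squares: minimizes
                                                      # integrated squared error

def stopband_energy(h):
    w, H = freqz(h, worN=16384)
    f = w / (2 * np.pi)                               # back to cycles/sample
    sb = f >= fstop
    return np.sum(np.abs(H[sb])**2) * (f[1] - f[0])

print("equiripple    total stopband energy :", stopband_energy(h_pm))
print("least-squares total stopband energy :", stopband_energy(h_ls))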

On the other hand ... we have to remember that image energy is directly proportional to baseband energy. So poor stopband performance, while limiting SNR, does not impact dynamic range ... as the signal reduces in amplitude, so must the image energy. Now the aliased images will certainly not be harmonically related to the signal in any pleasing way, so we still gotta clobber them :)

But let's look at the specs for the Analog Devices AD1896, for example. Worst case SNR ~120dB, dynamic range ~140dB. It's quite a performer ... must have awesome stop-band rejection to hit those numbers :)

Make sense?

OK, next up we'll talk about how that "analog model" can be approximated digitally, as we begin to develop a "conceptual" architecture for the ASRC. And we'll soon see that stopband rejection of the interpolator is not the ONLY thing to worry about, as far as residual image energy is concerned ...
 
A Confession...

While I am following most of your tutorial I have to admit that I have no real background in digital circuits and as such I'm not familiar with a few of the terms you use. Short definitions of a few terms would make this readable by a much wider audience (for me I'm not sure what decimation involves...)
Thanks!
Doug
 
thanks Doug! That's exactly the kinda feedback I was looking for :)

some definitions : whenever possible, I'll offer explanations in both the time & frequency domains. These two domains are NOT independent ... in fact, the uniqueness of the Fourier Transform guarantees just the opposite ... they are intimately linked. However, sometimes a certain operation or principle is just easier to see in one domain; that's the sole reason we have both :)

INTERPOLATION : In the time domain, this is simply the process we use to "fill in" digital samples BETWEEN the original samples. Usually the new samples are evenly spaced between the original samples, in which case we have "integer" interpolation. It's VERY important to note that, thanks to the Nyquist Theorem, this is NOT guesswork (for bandlimited signals). When a signal is sampled at a rate at least twice the bandwidth of the signal, the information BETWEEN samples is NOT lost ... it can be completely recovered from the original samples. Well, at least in theory ... in practice we can only approximate this fundamental truth, but to very high degrees of precision.
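To make that "NOT guesswork" claim concrete, here's a little numerical demo (numpy only; and since we can only use a finite chunk of the ideal, infinitely-long reconstruction, expect agreement to a few decimal places, not perfection). We compute the value exactly halfway between two samples of a bandlimited tone, using nothing but the surrounding samples:

Code:
import numpy as np

fs, f0 = 48000.0, 3000.0            # sample rate, and a tone well below Nyquist
n = np.arange(-200, 201)            # a few hundred samples around the point of interest
x = np.sin(2 * np.pi * f0 * n / fs)

# Whittaker-Shannon reconstruction: x(t) = sum_n x[n] * sinc(t - n), t in sample units
t = 10.5                            # exactly halfway between samples 10 and 11
estimate = np.sum(x * np.sinc(t - n))
exact = np.sin(2 * np.pi * f0 * t / fs)
print(f"reconstructed {estimate:.5f} vs exact {exact:.5f}")   # agree to ~3-4 digits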

What's the best way, from a signal processing perspective, to "fill in" these samples? Glad you asked :) cuz that's where the frequency domain comes in real handy. You see, any discrete-time sequence ... like a digital audio signal ... contains the full audio or baseband of the original analog signal, plus periodically replicated "images" of the baseband ALL ALONG the frequency axis. Interpolation, in the frequency domain, is nothing more than removing SOME, or in the ultimate conversion back to analog, ALL of these images. What's the best way to remove high frequency images, while preserving the audio band as faithfully as possible? It's the same answer in BOTH the time & frequency domains ... an ideal LOW PASS FILTER. Of course, we can't build ideal LPFs, because their impulse response is infinitely long ... in both time directions. But we can sure build GREAT approximations digitally.
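Here's a quick way to actually SEE those images (numpy again, numbers picked for convenience). Take a 1kHz tone sampled at 48kHz, pretend it's running at 96kHz by stuffing zeros between the samples, and look at the spectrum ... the baseband tone shows up, and so does its image folded around 48kHz:

Code:
import numpy as np

fs = 48000.0
n = np.arange(4800)
x = np.sin(2 * np.pi * 1000.0 * n / fs)     # 1 kHz tone at 48 kHz

y = np.zeros(2 * len(x))
y[::2] = x                                  # zero-stuffed: now a 96 kHz sequence

Y = np.abs(np.fft.rfft(y * np.hanning(len(y))))
f = np.fft.rfftfreq(len(y), d=1 / 96000.0)
peaks = f[Y > 0.4 * Y.max()] / 1000.0
print("spectral peaks near (kHz):", np.round(peaks, 2))   # ~1.0 ... and ~47.0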

So, an interpolator is really an ideal low-pass filter. Quick example ... let's say I just want to "fill in" one sample between each input sample. There's really two conceptual steps in the process: first, "zero stuffing". We essentially consider the input signal to be coming twice as fast, with a zero inserted between each input sample. Then, the 2x rate signal feeds a low-pass filter ... output is an interpolated sequence :)
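And here's that exact 2x example in a few lines of Python (scipy's firwin standing in for our "GREAT approximation" of the ideal LPF ... the filter length is an arbitrary choice):

Code:
import numpy as np
from scipy.signal import firwin, lfilter

fs = 48000.0
x = np.sin(2 * np.pi * 1000.0 * np.arange(480) / fs)   # 1 kHz tone at 48 kHz

# Step 1: zero-stuff ... treat the signal as a 96 kHz sequence with zeros in between
y = np.zeros(2 * len(x))
y[::2] = x

# Step 2: low-pass at the ORIGINAL Nyquist (24 kHz), i.e. 0.5 x the NEW Nyquist.
# The gain of 2 restores the amplitude "lost" to the stuffed zeros.
h = 2 * firwin(101, 0.5)          # cutoff normalized so the new Nyquist = 1
x2 = lfilter(h, 1.0, y)           # the interpolated sequence, at 96 kHz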

DECIMATION : really nothing more than a sampling process. In fact, it's just digitally sampling (or re-sampling) a digital or discrete-time sequence. However, like any sampling process, we have to be careful NOT to alias any high frequency trash back down into the audio band. Because once aliasing occurs, it cannot be undone. SO ... before any sampling, re-sampling, or decimation process, we have to filter. What's the best filter? Guess where we look :) Yep, the frequency domain quickly tells us that our friend, the LOW-PASS FILTER, is the ideal way to prevent aliasing.
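A tiny demo of WHY the filter must come first (scipy again, illustrative specs): park a 30kHz tone at a 96kHz rate, then drop to 48kHz. Grab-every-other-sample folds it down to 48-30 = 18kHz, right into the audio band; filter first, and it's gone:

Code:
import numpy as np
from scipy.signal import firwin, lfilter

fs = 96000.0
n = np.arange(8192)
trash = np.sin(2 * np.pi * 30000.0 * n / fs)   # 30 kHz: above the NEW Nyquist of 24 kHz

aliased = trash[::2]                           # naive downsample: aliases to 18 kHz!

h = firwin(101, 24000.0, fs=fs)                # anti-alias LPF at the new Nyquist
clean = lfilter(h, 1.0, trash)[::2]            # filter FIRST, then keep every other sample

print("RMS, no filter  :", np.sqrt(np.mean(aliased[100:] ** 2)))   # ~0.707, now at 18 kHz
print("RMS, with filter:", np.sqrt(np.mean(clean[100:] ** 2)))     # tiny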

A good example of decimation is the process used in oversampled (delta-sigma, MASH, etc.) Analog-to-Digital Converters. These little gems OVERSAMPLE the original audio signal in a clever way to trade speed for accuracy ... but ultimately provide samples at a Nyquist rate of 44.1kHz or 48kHz. The "downsampling" process is exactly decimation ... but the anti-alias low-pass filter (sometimes called the decimation filter) must PRECEDE the actual downsampling process to filter (or remove) all the high frequency quantization noise associated with low-precision sampling.

Now in general rate conversion problems, we have BOTH interpolation and decimation to contend with ... turns out we really only need ONE low-pass filter to squash images (interpolation) and prevent aliasing (decimation).
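scipy even has this "one filter in the middle" structure built in, as upfirdn (zero-stuff by up, filter ONCE, keep every down-th sample). A sketch with made-up numbers:

Code:
import numpy as np
from scipy.signal import firwin, upfirdn

up, down = 2, 3                 # example: 48 kHz -> 32 kHz
# ONE low-pass does both jobs. Its cutoff sits at the LOWER of the two
# Nyquist frequencies ... here the output's ... normalized to the upsampled rate.
h = up * firwin(151, 1.0 / down)                # cutoff = 1/3 of the upsampled Nyquist

x = np.sin(2 * np.pi * 0.01 * np.arange(600))   # some band-limited input
y = upfirdn(h, x, up=up, down=down)             # squashes images AND prevents aliasing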
 
Doug, an even more "on point" explanation of interpolation & decimation. Let's review that first rate conversion technique I mentioned ....

Let's say we are dealing with a SYNCHRONOUS (no "a" prefix :) ) rate conversion problem, from 44.1kHz to 48kHz, where both rates are derived from the SAME clock source. Meaning that, when measured with the same stopwatch (clock), the 44.1kHz rate is EXACTLY 44.1kHz, and the 48kHz rate is EXACTLY 48kHz.

I suggested that, in this environment, one could simply interpolate by 160 to 7.056MHz and decimate by 147 to 48kHz. Let's explore that in a bit more detail, to further define what we mean by interpolation and decimation.

In this synchronous environment, the first thing to do (at least conceptually) is to take the 44.1kHz data stream and stuff 160-1=159 zeros, evenly spaced in the time domain, between each sample. Then you apply the "zero-stuffed" sequence to a low-pass filter (cutoff at half the original sample rate, or 22.05kHz), which "fills in" the zeros with real data points. That completes the interpolation step.

Now we simply grab every 147th sample, to generate our output sequence at 48kHz. That "grabbing every 147th sample" is precisely DECIMATION. Just digital resampling, or downsampling. We didn't need a special filter for decimation in this case, cuz the interpolation filter did its job for us :) Make sense?
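For the curious: scipy's resample_poly does precisely this 160/147 chain, with a polyphase trick so the 7.056MHz intermediate rate never physically exists inside the machine (its built-in Kaiser-windowed filter stands in for our 22.05kHz LPF):

Code:
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 44100, 48000
x = np.sin(2 * np.pi * 1000.0 * np.arange(fs_in) / fs_in)   # one second of 1 kHz tone

# Conceptually: stuff 159 zeros, low-pass at 22.05 kHz, keep every 147th sample.
y = resample_poly(x, up=160, down=147)
print(len(x), "->", len(y), "samples")    # 44100 -> 48000: one second at 48 kHz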

But of course, in this thread we're dealing with ASYNCHRONOUS sample rate conversion. The 44.1kHz clock and the 48kHz clock are NOT derived from the same timebase, meaning they are not exactly the numbers they claim to be, when measured with the same stopwatch.

So, in this asynch world, if you go back to my two original timelines ... top one with ticks at 44.1kHz, bottom one with ticks at 48kHz ... the definition of asynch rate conversion is this :

There is NO interpolation by any FINITE integer you can do on the INPUT sample points ... meaning, no finite number of samples you can evenly space between the original samples ... that will give you perfect alignment with all the OUTPUT sample points.
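Here's a little numerical illustration of that statement (the 50ppm detune below is just a made-up, crystal-realistic offset). In the synchronous case, every 160th output tick lands EXACTLY on an input tick. Detune the output clock even slightly, and the exact hits vanish ... forever:

Code:
import numpy as np

ratio_sync  = 44100.0 / 48000.0                   # exactly 147/160
ratio_async = 44100.0 / (48000.0 * (1 + 50e-6))   # output clock off by 50 ppm

for name, r in [("sync ", ratio_sync), ("async", ratio_async)]:
    k = np.arange(1, 2000)               # output tick index
    phase = (k * r) % 1.0                # tick position, in input sample periods
    hits = np.sum(np.minimum(phase, 1.0 - phase) < 1e-9)
    print(name, ":", hits, "of 1999 output ticks land exactly on an input tick")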

PLEASE let me know if this statement makes sense ... it's the very essence of Asynchronous sample rate conversion !!!! :) :) :) :)

SPEAK UP !!!! Please humor an aging engineer :)
 
Hi Werewolf.
Nice thread you started here. Were(wolf) you a teacher ;) ? I've understood almost all of it :) Very "pedagogic" (sorry, dunno the English term...). Your posts are very clear, even for a non-English speaker like me.

Just a point for me please :

werewolf said:
So, an interpolator is really an ideal low-pass filter. Quick example ... let's say I just want to "fill in" one sample between each input sample. There's really two conceptual steps in the process: first, "zero stuffing". We essentially consider the input signal to be coming twice as fast, with a zero inserted between each input sample. Then, the 2x rate signal feeds a low-pass filter ... output is an interpolated sequence :)

What would be the cutoff frequency of the ideal low-pass after the interpolator here? Half of the 2x rate?

Sure some diagrams would help... But darn good explanations - for the moment ;)

Keep it up, and thanks for sharing your knowledge.
 
I have to admit that I'm no expert and I'm probably missing something very fundamental, but can you not simply convert from 44.1kHz to 48kHz by oversampling the digital input at a multiple of fs_out and then using an FIR (or some other form of digital filter) to remove the imaging components created by the modulation of the original analogue signal and the 44.1kHz sampling rate?

Will this approach work, or will it cause some other strange modulation between the 44.1kHz and 48kHz signals?
 
Werewolf,

Nice to see someone as knowledgeable as yourself on this forum. I have one quick related question:

When using an SRC (which I am pretty much forced to with my Behringer DCX2496), how do I maximize jitter performance given that the input will almost always be 16-bit 44.1kHz and that the outputs will always be 96kHz?

I am thinking specifically about putting a master clock in the Behringer (near the DAC, or near the SRC?) and clocking the source remotely from this master oscillator.

Would I get just as good performance if I had a stable clock in the source, and a stable separate clock in the target, or is there a benefit from going this route when using an SRC? I am trying to understand what the critical performance-improving factors are.

Any thoughts on this?

Petter
 
Thanks guys :) I'll post a few answers before we progress with the tutorial, cool?

Cheff : The ideal interpolator is an ideal LPF, with a cutoff at half the input sample rate. This is far from obvious in the time domain ... it is STILL a topic of endless debate & confusion ... but is very obvious in the frequency domain. And of course the simple result pertains EQUALLY to both domains.

Annex : Can't really do what you're suggesting either. In the asynch world, no amount of finite integer interpolation of the Fs_out "ticks" will give perfect alignment with Fs_in ticks, any more than the other way around. Basically, simple integer interpolation of EITHER rate will never give perfect time alignment with the OTHER rate ... because they are asynchronous.

Now I know it sounds like I've painted myself in a corner ... but please be patient for another post or two :)

Petter : sounds like you want to jump right to the conclusion! I'm not planning to hit the JITTER implications for a little while yet !
But, since you asked ...

I've always hated the S/PDIF format, for the simple reason that it's technically backwards ... just so that a single cable can be used between a transport (source) and a DAC. We all know the BEST way to manage the system timebase is to put the cleanest (highest Q, lowest jitter) clock source known to man right next to the component that CARES about how clean the clock is ... the DAC. And then, use whatever means necessary (simple cable, FIFOs, etc.) to SLAVE the transport to the DAC ... instead of the other way around. So, for minimal jitter, here are the best ways to go (in increasing order of performance):

1. Use a PLL-based clock, recovered from the S/PDIF stream, to clock the DAC.
2. Use an Asynch SRC to receive the S/PDIF stream. This allows a local (to the DAC) clock to time the DAC, and of course provide data to the DAC on the local timebase. I think this is the best option available that's still compatible with S/PDIF. It will be the ultimate intent of this thread to compare and demonstrate the superiority of this technique to option #1.
3. Slave the transport to the DAC ... probably requires, horror of horrors, more than one cable between the source (or transport) and the DAC (or processor). I know at least a couple versions of this technique are discussed frequently on this board :)

Now if the Behringer unit in question has a Crystal CS8420, it is most probably already using option #2.

But I promise, we will discuss the jitter impacts of ASRC, after we develop two things : the conceptual plan, and the more detailed implementation architecture :)
 
Hi W-Wolf...

nice reading stuff, nice theory. But imo what counts is how it sounds. I have built a few versions with ASRC over the last 4 years and these are my 2 key findings:

1. If you do asynch resampling and then apply a digital filter (like the 8420 followed by an SM5842 or DF1704, for example) you can do almost any rate you want, totally asynchronous. I have used 46.875kHz, 70.31kHz and 93.75kHz. The clock driving this was also clocking the DAC. All with relatively the same result: more air around instruments, better soundstage pinpointing. I liked the last one best, but they were all close....

2. I tried this as well with the non-oversampling dddac1543 and it did not work well !!! Indeed you get the extra air etc., but a tremendous amount of IM distortion ... brrrrr, with female voice you could actually HEAR the extra tones :dead: . Then I moved to the Tent XO clock at 11.2896MHz, so in fact this stays a sample rate conversion, but very very close to the original rate. And yes, I again had exactly the same benefits as described in #1.

The sound improvement is very easy to hear, also in a blind test (you only need an A-B switch, so indeed very easy, no cable plugging etc.)

The practice is already here, so I'm interested to read your theory now. Still wondering what conclusions you will draw, and whether there is a hands-on part as well. I guess you have been doing some workshopping as well, and not only number crunching??

doede

ps: as one example (I have some more test pictures) I attach an FFT of a 1kHz sine, ASRC'd with a 47kHz clock in a non-oversampling DAC.... clearly this will not sound pretty... :D
 

Attachments

  • sinus 0db 1khz fft 12mhz clock.jpg