Asynchronous Sample Rate Conversion
Hello guys! My first post here :) Never want to join a party without bringing something, so I thought I'd offer a tutorial on Asynchronous Sample Rate Conversion. I know it's been discussed many times, but I'd like to devote a full thread to an in-depth technical review of this technology. Are you guys game?
Rough outline would be something like :
1. Focus on the easiest case, modest upsampling from say 44.1kHz to 48kHz.
2. Review the details ... in both time & frequency domains ... of common filter structures used for ASRC, focusing first on "jitter free" environments.
3. Describe how the system responds to jitter on incoming data, compare to a PLL-based clock recovery system.
Sound like a plan? I know you guys know nothing about my credentials ;) but maybe we'll get through that as the thread develops, cool? Of course, I'll welcome all inputs from the experts on the forum.
I know I'd certainly be interested in reading your tutorial!
Well that's all I need !
Let's start with a clear statement of the problem :
Input data : digital audio samples, arriving at 44.1kHz
Output data : digital audio samples, leaving at 48kHz. The 48kHz clock is derived from a higher speed local oscillator, say 512x, or 24.576MHz.
Input data rate, also called Fs_in, and output rate, also called Fs_out, are completely ASYNCHRONOUS .... meaning they are NOT derived from the same fundamental timebase.
We may discuss precision of the audio samples, in terms of number of bits or dynamic range, as the thread progresses ... because the dynamic range will certainly be impacted by our filter structures. But we'll cross that bridge when we get to it :)
Now, we'll quickly find out how good a teacher I am :) because I have no visual aids :( I encourage the reader to draw two "time bases", or two horizontal lines. The top line will have periodic "ticks" corresponding to Fs_in, while the bottom line will have "ticks" corresponding to Fs_out. The ticks on the bottom timebase occur more frequently, so we immediately see that the fundamental job of the Asynchronous Sample Rate Converter is to provide output audio samples corresponding to points of time IN BETWEEN the original audio samples. Sometimes only one output sample falls between two input samples, sometimes two ... because the output rate is faster. Make sense?
So right away we recognize the job of our ASRC to be one of INTERPOLATION.
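For anyone who'd rather not draw the two timebases by hand, here is a minimal sketch that counts how many output ticks land inside each input period. It assumes ideal, exactly nominal 44.1kHz and 48kHz clocks (the real asynchronous case is messier), and the function name is just made up for illustration.

```python
def outputs_between_inputs(fs_in, fs_out, n_inputs):
    """Count output ticks that fall in each input interval [k/fs_in, (k+1)/fs_in).

    Assumes both clocks are ideal and start aligned at t = 0.
    Output tick m happens at time m/fs_out.
    """
    counts = []
    for k in range(n_inputs):
        # ceil(a / b) for integers written as -(-a // b)
        first = -(-k * fs_out // fs_in)         # first output tick at or after input tick k
        nxt = -(-(k + 1) * fs_out // fs_in)     # first output tick at or after input tick k+1
        counts.append(nxt - first)
    return counts

counts = outputs_between_inputs(44100, 48000, 147)
print(counts[:12])
print("total output ticks in 147 input periods:", sum(counts))
```

Every entry is either 1 or 2, which is the "sometimes one, sometimes two" pattern you see on the two timelines.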
Next post ... a couple of ideas for interpolation that we'll rule out pretty quickly.
Ok so we have a problem of interpolation. Let's examine a couple of ideas to solve it :
1. OK, Fs_in at 44.1kHz and Fs_out at 48kHz. No problem, you might say ... simply interpolate the input data by the integer 160, to a rate of 7.056MHz, and follow that by integer decimation of 147 to the final output rate of 48kHz. Simple, right ?? Nope, wrong.
The problem is that the clocks are not derived from the same fundamental timebase. So the input rate will NOT be exactly 44.1kHz, and/or the output rate will NOT be exactly 48kHz when measured with the same timebase. Furthermore, this technique would not be "flexible" enough to easily accommodate other input & output rates. So it's a bad idea, fundamentally flawed for ASYNCHRONOUS sample rate conversion.
2. Well here's another idea. Why don't I just convert the incoming signal, sampled at Fs_in, to an ANALOG signal with a super-duper DAC. Then, I'll re-sample the output at the rate I choose, Fs_out, with a super-duper ADC. Problem solved, right?
Well, in a manner of speaking, yes ... it solves the problem. But, it introduces two more conversion stages, each with ugly analog electronics involved :) (please take no offense, my first love is, and will always be, analog circuit design). Furthermore, it has to be a GREAT DAC, with great anti-imaging filters, followed by a GREAT ADC, with great anti-alias filters. Why? Because all of the residual image energy "left over" from imperfect interpolation will ALIAS back down into the audio band upon ADC re-sampling, if not adequately filtered.
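To put rough numbers on idea 1, here is a small sketch. The first part just recovers 160/147 and the 7.056MHz intermediate rate from the two nominal rates. The second part estimates how quickly a fixed-ratio converter drifts by one whole sample when the clocks disagree. The 50 ppm mismatch is purely an assumed, illustrative figure, not a number from any datasheet.

```python
from fractions import Fraction

# Nominal ratio Fs_out / Fs_in reduced to lowest terms
ratio = Fraction(48000, 44100)
L, M = ratio.numerator, ratio.denominator   # interpolate by L, decimate by M
intermediate_rate = 44100 * L               # rate after interpolation, in Hz

def seconds_until_one_sample_slip(fs_out_nominal, ppm_error):
    """Time for a fixed L/M converter to be off by one whole output sample,
    assuming the true clock ratio differs from nominal by ppm_error ppm."""
    slip_rate = fs_out_nominal * ppm_error * 1e-6   # samples gained or lost per second
    return 1.0 / slip_rate

print(L, M, intermediate_rate)
print(seconds_until_one_sample_slip(48000, 50))     # 50 ppm is an assumed mismatch
```

So with the assumed mismatch, the "simple" scheme has to drop or repeat a sample a couple of times per second, which is why it is ruled out for the asynchronous case.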
So these ideas aren't great. BUT, they do give us a couple clues about a better method :
Clue #1 : Async SRC really is about interpolation, but it's also about subsequent decimation. So I have to be careful that my interpolation process, whatever it is, does a REALLY good job of filtering images. Remember that filling in samples in the TIME domain is equivalent to low-pass filtering (or attenuating) images in the FREQUENCY domain ... this is the essence of interpolation.
Clue #2 : The analog model, while cumbersome in practice, is pretty sweet from a CONCEPTUAL perspective. The question is, can I approximate this analog model purely in the digital domain? After all, digital-to-analog conversion is really nothing more than interpolation, taken to the ultimate extreme :)
Let me pause for some feedback at this point. How am I doing?
I often see examples of systems using Asynchronous sample rate converters where the secondary side master clock is an exact multiple of Fs. An example: -
I have a Sony DAV-S400 that uses their S-Master digital amplifier modulator. The modulator, the CXD9634Q (manufactured and sold by Mitsubishi as the M65817AFP), contains an Asynchronous sample rate converter to attenuate phase noise from PCM and DSD data streams (the DSD is first converted to PCM). However, Sony chose to use a secondary master clock of 49.152MHz (1024Fs). Surely, due to the finite stop-band attenuation of the SRC filters, it’s a bad idea to have a 1:1 ratio between input and output data rates?
As a result of manufacturers saving on silicon area, most integrated DACs these days have poor stop-band rejection. Let's say we are using a CD source @ 44.1kHz, and use an SRC @ 200kHz (not an exact multiple of Fs!), would our integrated DAC now gain the benefits of the superior stop-band rejection of the SRC?
Benchmark Systems have a paper on the Web titled “Jitter and its effects”, where they demonstrate that the stop-band rejection of the decimation filters of an ADC is reduced by jitter. Would the decimation filters in an SRC be affected in the same way?
Keep up the tutorial, hope I gain a better understanding on the inner workings of SRC’s.
Are you / were you connected with our dear friends at Crystal or D2AUDIO ?
John, I'll try to address your questions :
My quick bio :
BSEE Lehigh University 83
MSEE MIT 85
Bell Labs MTS 85-89
Crystal design engineer 89-92
Crystal design manager 92-95
Crystal VP Engineering 95-97
Silicon Labs co-founder 97
So yep, know the old Crystal guys pretty well :)
Let's talk stop-band rejection of the SRC interpolators. The short answer is YES ... you need awesome stop band rejection to prevent residual image energy from aliasing back into the audio band during subsequent decimation. But here's something a bit less obvious ... ALL of the high frequency residual image energy must fold back down during decimation, because it's gotta end up in half the sample rate. Knowing this, what's the best stopband "shape" for the ASRC interpolator? Hint : it's NOT the flat, equi-ripple response you see with most DAC interpolators or ADC decimators. Flat stopband does NOT minimize total stopband energy ... but that's exactly what you need for ASRC.
On the other hand ... we have to remember that image energy is directly proportional to baseband energy. So poor stopband performance, while limiting SNR, does not impact dynamic range ... as the signal reduces in amplitude, so must the image energy. Now the aliased images will certainly not be harmonically related to the signal in any pleasing way, so we still gotta clobber them :)
But let's look at the specs for the Analog Devices 1896, for example. Worst case SNR ~120dB, dynamic range ~140dB. It's quite a performer ... must have awesome stop band rejection to hit those numbers :)
OK, next up we'll talk about how that "analog model" can be approximated digitally, as we begin to develop a "conceptual" architecture for the ASRC. And we'll soon see that stopband rejection of the interpolator is not the ONLY thing to worry about, as far as residual image energy is concerned ...
While I am following most of your tutorial I have to admit that I have no real background in digital circuits and as such I'm not familiar with a few of the terms you use. Short definitions of a few terms would make this readable by a much wider audience (for me I'm not sure what decimation involves...)
thanks Doug! That's exactly the kinda feedback I was looking for :)
some definitions : whenever possible, I'll offer explanations in both the time & frequency domains. These two domains are NOT independent ... in fact, the uniqueness of the Fourier Transform guarantees just the opposite ... they are intimately linked. However, sometimes a certain operation or principle is just easier to see in one domain, that's the sole reason we have both :)
INTERPOLATION : In the time domain, this is simply the process we use to "fill in" digital samples BETWEEN the original samples. Usually the new samples are evenly spaced between the original samples, in which case we have "integer" interpolation. It's VERY important to note that, thanks to the Nyquist Theorem, this is NOT guesswork (for bandlimited signals). When a signal is sampled at a rate more than twice the bandwidth of the signal, the information BETWEEN samples is NOT lost ... it can be completely recovered from the original samples. Well, at least in theory ... in practice we can only approximate this fundamental truth, but to very high degrees of precision.
What's the best way, from a signal processing perspective, to "fill in" these samples? Glad you asked :) cuz that's where the frequency domain comes in real handy. You see, any discrete-time sequence ... like a digital audio signal ... contains the full audio or baseband of the original analog signal, plus periodically replicated "images" of the baseband ALL ALONG the frequency axis. Interpolation, in the frequency domain, is nothing more than removing SOME, or in the ultimate conversion back to analog, ALL of these images. What's the best way to remove high frequency images, while preserving the audio band as faithfully as possible? It's the same answer in BOTH the time & frequency domains ... an ideal LOW PASS FILTER. Of course, we can't build ideal LPF's, because their impulse response is infinitely long ... in both time directions. But we can sure build GREAT approximations digitally.
So, an interpolator is really an ideal low-pass filter. Quick example ... let's say I just want to "fill in" one sample between each input sample. There's really two conceptual steps in the process: first, "zero stuffing". We essentially consider the input signal to be coming twice as fast, with a zero inserted between each input sample. Then, the 2x rate signal feeds a low-pass filter ... output is an interpolated sequence :)
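Here is a rough sketch of that 2x example: zero stuffing followed by a low-pass filter. The filter is a Hann-windowed sinc with 63 taps, which is just a convenient assumption for illustration (a real converter would use a much better filter), and the 1kHz test tone and all function names are made up for the demo.

```python
import math

def zero_stuff(x, factor):
    """Insert factor-1 zeros after every input sample."""
    y = []
    for s in x:
        y.append(s)
        y.extend([0.0] * (factor - 1))
    return y

def windowed_sinc_lowpass(num_taps, cutoff, gain):
    """Hann-windowed sinc FIR. cutoff is a fraction of the sample rate (0..0.5)."""
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(gain * ideal * window)
    return h

def fir_filter(x, h):
    """Direct-form FIR, treating samples outside x as zero."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if 0 <= n - k < len(x):
                acc += hk * x[n - k]
        y.append(acc)
    return y

fs = 44100.0
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(200)]   # 1kHz test tone
up = zero_stuff(x, 2)                                  # now nominally at 88.2kHz
h = windowed_sinc_lowpass(63, 0.25, gain=2.0)          # cutoff at the old Nyquist, gain 2
y = fir_filter(up, h)

delay = 31   # (63 - 1) / 2 samples of filter delay at the 2x rate
max_err = max(abs(y[n] - math.sin(2 * math.pi * 1000 * (n - delay) / (2 * fs)))
              for n in range(62, 400))
print("worst-case error vs. the true sine:", max_err)
```

The gain of 2 makes up for the energy "lost" to the stuffed zeros. The filtered output tracks the true sine at the new in-between points, apart from the filter delay and a small error set by how good the low-pass filter is.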
DECIMATION : really nothing more than a sampling process. In fact, it's just digitally sampling (or re-sampling) a digital or discrete-time sequence. However, like any sampling process, we have to be careful NOT to alias any high frequency trash back down into the audio band. Because once aliasing occurs, it cannot be undone. SO ... before any sampling, re-sampling, or decimation process, we have to filter. What's the best filter? Guess where we look :) Yep, frequency domain, quickly tells us that our friend, the LOW-PASS FILTER, is the ideal way to prevent aliasing.
Good example of decimation is the process used in oversampled (delta-sigma, mash, etc.) type Analog-to-Digital Converters. These little gems OVERSAMPLE the original audio signal in a clever way to trade speed for accuracy ... but ultimately provide samples at a Nyquist rate of 44.1kHz or 48kHz. The "downsampling" process is exactly decimation ... but the anti-alias low-pass filter (sometimes called the decimation filter) must PRECEDE the actual downsampling process to filter (or remove) all the high frequency quantization noise associated with low-precision sampling.
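To see why the filter has to come first, here is a tiny sketch of decimation done WITHOUT any filtering. The rates and the 30kHz tone are assumed values picked only to make the arithmetic easy: a 30kHz tone sampled at 96kHz, decimated by 2 to 48kHz, becomes indistinguishable from an 18kHz tone.

```python
import math

def decimate_no_filter(x, factor):
    """Keep every factor-th sample, with no anti-alias filter (deliberately wrong)."""
    return x[::factor]

def alias_frequency(f, fs):
    """Frequency in [0, fs/2] where a tone at f shows up after sampling at fs."""
    f = f % fs
    return fs - f if f > fs / 2 else f

fs_high = 96000
tone = [math.cos(2 * math.pi * 30000 * n / fs_high) for n in range(960)]
decimated = decimate_no_filter(tone, 2)                      # now at 48kHz

f_alias = alias_frequency(30000, 48000)
alias = [math.cos(2 * math.pi * f_alias * m / 48000) for m in range(480)]
worst = max(abs(a - b) for a, b in zip(decimated, alias))
print("alias lands at", f_alias, "Hz; difference from a real tone there:", worst)
```

Once the samples are identical there is no way to tell the alias from a genuine in-band tone, which is the "cannot be undone" part.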
Now in general rate conversion problems, we have BOTH interpolation and decimation to contend with ... turns out we really only need ONE low-pass filter to squash images (interpolation) and prevent aliasing (decimation).
Doug, an even more "on point" explanation of interpolation & decimation. Let's review that first rate conversion technique I mentioned ....
Let's say we are dealing with a SYNCHRONOUS (no "a" prefix :) ) rate conversion problem, from 44.1kHz to 48kHz, where both rates are derived from the SAME clock source. Meaning that, when measured with the same stopwatch (clock), the 44.1kHz rate is EXACTLY 44.1kHz, and the 48kHz rate is EXACTLY 48kHz.
I suggested that, in this environment, one could simply interpolate by 160 to 7.056MHz and decimate by 147 to 48kHz. Let's explore that in a bit more detail, to further define what we mean by interpolation and decimation.
In this synchronous environment, the first thing to do (at least conceptually) is to take the 44.1kHz data stream and stuff 160-1=159 zeros, evenly spaced in the time domain, between each sample. Then you apply the "zero-stuffed" sequence to a low-pass filter (cutoff at half the original sample rate or 22.05kHz), which will finalize the interpolation process by "filling in" the zeros with real data points. That completes the interpolation step.
Now we simply grab every 147th sample, to generate our output sequence at 48kHz. That "grabbing every 147th sample" is precisely DECIMATION. Just digital resampling, or downsampling. We didn't need a special filter for decimation in this case, cuz the interpolation filter did its job for us :) Make sense?
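A sketch of that synchronous 160/147 conversion might look like the code below. It never builds the 7.056MHz stream: since 159 out of every 160 samples there are zeros, and we only keep every 147th filter output, each output sample is computed directly from the handful of real input samples that matter. The Hann-windowed sinc, the 32 input samples per output, and all the names are assumptions for illustration, not how any particular chip does it.

```python
import math

L_UP, M_DOWN = 160, 147            # interpolate by 160, decimate by 147
TAPS_PER_PHASE = 32                # assumed: input samples contributing to each output
NUM_TAPS = L_UP * TAPS_PER_PHASE + 1
CENTER = (NUM_TAPS - 1) // 2       # filter delay, in 7.056MHz ticks

def prototype_tap(k):
    """Tap k of a Hann-windowed sinc low-pass at the 7.056MHz rate."""
    t = k - CENTER
    cutoff = 0.5 / L_UP            # 22.05kHz as a fraction of 7.056MHz
    ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
    window = 0.5 - 0.5 * math.cos(2 * math.pi * k / (NUM_TAPS - 1))
    return L_UP * ideal * window   # gain of 160 makes up for the stuffed zeros

H = [prototype_tap(k) for k in range(NUM_TAPS)]

def resample_160_147(x, n_out):
    """Output n is the filtered 7.056MHz stream sampled at tick 147*n."""
    y = []
    for n in range(n_out):
        j = n * M_DOWN                                   # position on the 7.056MHz grid
        i_hi = min(j // L_UP, len(x) - 1)                # newest input at or before j
        i_lo = max(0, -(-(j - NUM_TAPS + 1) // L_UP))    # oldest input still under the filter
        acc = 0.0
        for i in range(i_lo, i_hi + 1):
            acc += H[j - i * L_UP] * x[i]                # only non-zero samples contribute
        y.append(acc)
    return y

x = [math.sin(2 * math.pi * 1000 * i / 44100) for i in range(441)]   # 10 ms of 1kHz
y = resample_160_147(x, 480)                                         # 10 ms at 48kHz

# Compare against the true sine, allowing for the filter delay
worst = max(abs(y[n] - math.sin(2 * math.pi * 1000 * (n * M_DOWN - CENTER) / 7056000))
            for n in range(35, 480))
print("worst-case error:", worst)
```

Note that 441 input samples turn into exactly 480 output samples, and that this only works because the ratio is locked at exactly 160/147.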
But of course, in this thread we're dealing with ASYNCHRONOUS sample rate conversion. The 44.1kHz clock and the 48kHz are NOT derived from the same timebase, meaning they are not exactly the numbers they claim to be, when measured with the same stopwatch.
So, in this asynch world, if you go back to my two original timelines ... top one with ticks at 44.1kHz, bottom one with ticks at 48kHz ... the definition of Asynch rate conversion is this :
There is NO interpolation by any FINITE integer you can do on the INPUT sample points ... meaning, no finite number of samples you can evenly space between the original samples ... that will give you perfect alignment with all the OUTPUT sample points.
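One way to get a feel for this is to see what happens to the "nice" 160/147 ratio as soon as one clock is a hair off. The sketch below assumes a FIXED offset of 1 part per billion on the input clock, which is a made-up number and actually kinder than real life: with two free-running oscillators the ratio keeps wandering, so no fixed integer ratio holds at all.

```python
from fractions import Fraction

nominal = Fraction(48000, 44100)       # reduces to 160/147

def ratio_with_offset(ppb):
    """Exact Fs_out/Fs_in when the input clock runs fast by ppb parts per billion."""
    fs_in = Fraction(44100) * (1 + Fraction(ppb, 10**9))
    return Fraction(48000) / fs_in

off = ratio_with_offset(1)
print("nominal interpolation factor:", nominal.numerator)
print("with a 1 ppb offset:", off.numerator)
```

Even that tiny fixed offset pushes the required integer interpolation factor from 160 to 160 billion, so the "interpolate by an integer, then decimate by an integer" recipe stops being practical long before it stops being possible in principle.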
PLEASE let me know if this statement makes sense ... it's the very essence of Asynchronous sample rate conversion !!!! :) :) :) :)
SPEAK UP !!!! Please humor an aging engineer :)
Well I'm just about still with you :)