Asynchronous Sample Rate Conversion

Oh, one more thing, as an "aside". Remember how I was careful to distinguish between Asynchronous Re-timing/sampling/clocking and Asynchronous Sample Rate Conversion? I said that the first category was really a degenerate case, mathematically, of the second. Well, Asynchronous Re-timing/sampling/clocking is nothing more than Asynchronous Sample Rate Conversion with NO interpolation ... or more accurately, with N=1. A bit smaller than what's required for, say, 16 or 24 bit precision ;) The math will be left as "an exercise for the reader" ... but you can expect a pretty big difference in residual image energy from the ZOH function with N=1, and N=2^20 !! ;) ;)
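
For anyone who'd rather peek at the answer than do the exercise, here's a minimal sketch (in Python, assuming a 44.1kHz input rate and a worst-case 20kHz tone purely for illustration) that just evaluates the ZOH response at the first image frequency for a few values of N:

Code:
import math

# Minimal sketch: ZOH magnitude response |sin(x)/x|, x = pi*f/(N*Fs_in),
# evaluated at the first image of a worst-case tone (f = N*Fs_in - f_sig).
# Fs_in = 44.1kHz and f_sig = 20kHz are illustrative assumptions only.
def zoh_image_gain_db(n, fs_in=44100.0, f_sig=20000.0):
    x = math.pi * (n * fs_in - f_sig) / (n * fs_in)
    return 20.0 * math.log10(abs(math.sin(x) / x))

for n in (1, 2, 256, 2**20):
    print(f"N = {n:8d} -> residual image at {zoh_image_gain_db(n):8.1f} dB")

# N =        1 ->   about -4.8 dB  (Asynchronous Re-timing: image barely touched)
# N =  1048576 -> about -127.3 dB  (N ~ 2^20: consistent with ~24 bit precision)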
 
werewolf said:
Mathematically, this operation is described by applying a zero-order hold function on the interpolated data, followed by asynchronous decimation to Fs_out. In the time domain, this holding operation completely "fills in" the small remaining gaps between the interpolated data points ... effectively creating an analog signal :)

In the frequency domain, the zero-order hold applies the following filter function to the interpolated data :

ZOH = sin[pi*f/(N*Fs_in)] / [pi*f/(N*Fs_in)] = the familiar "sin(x)/x" form

where f = frequency, and N*Fs_in = interpolated sample rate.

In this post we found that, in order to keep the second component down around -126dB, N must be astronomically huge ... on the order of 2^20.

Even though my hand-held calculator fried many posts ago - yielding the inevitable "TILT" display - I have attempted to follow this tutorial because it's instructional, informative and fascinating in that it gives some insight into what the designer of such things must contend with - BUT - eventually it has to be reduced to some practical advice for the DIYers.

My own conclusion is that *most* of the math can be reduced as follows:

what*%^$#the*&^%$I-don't get-it%$#$%$ = Use a CS8420.

Am I right? I don't want to give away the punch line:) :) :) but I also think that the odds of an average DIY-guy (and I'm less than average, thank you) actually reducing this to practice without the ability to *create* silicon is slim-to-none.

But, I await the final episode(s) because it IS interesting! Please continue!!!:)

Regards,
Bill
 
netgeek - I hear you :) I'm doing my best to confine the math to a post or two ... it's my goal to provide the math foundation for those who want to explore it, and yet still provide some useful conclusions for those who would rather bypass the details. Of course, we'll have to wait until we're done to see if I've been successful!

Case in point is this ZOH math ... you can certainly ignore it, and jump to the conclusion without really missing anything. But I do hope that the general trend is clear : the farther you interpolate, the smaller the difference will be between adjacent samples ... and so the smaller the error will be from using the most recent one, instead of the exact right one. Make sense?

Just so happens you need to interpolate by about a MILLION to get the error reasonably consistent with ~24 bit precision !! :)

But your point is VERY well taken ... I hope that nobody is turned away from this thread because it's too technical ... please continue to give me some feedback. We've got a couple more concepts to explore before we talk about JITTER ... which will form the basis for a comparison with the technology that ASRC is fighting to replace ... namely, PLL clock recovery schemes.

But you're right ... perhaps the bottom line for the DIYers is that YES, use a CS8420 or AD1896 instead of the old CS8412/14 standard. It's just a good thing to do ... it will allow you to clock your processor & DAC with a LOCAL crystal oscillator, while remaining "backward compatible" with the S/PDIF standard, and all will be right with the world :)
 
werewolf,

Well, I wasn't suggesting that the math should be abandoned or ignored - indeed it's the most useful way of proving and/or illustrating some of your points - hopefully, those truly interested in the subject will continue undaunted. That's what I'm trying to do - even if I'm starting to get glazed and dizzy:) Just kidding...

As for the extent of interpolation - perhaps a new mantra: "Interpolate unto infinity if you will - there will be no errors thus". Kidding aside, your point is well taken. But - what's the practical limit? And, more importantly, (at least in this forum) what can practically be done?!;) By that I mean to ask what DIY-types can do to practically implement some of these concepts. I've got a fairly decent arsenal of lab equipment, etc. - but I've not yet gotten to the point where I can start etching silicon in the basement:) :)

Is the tutorial meant to be an educational (and perhaps cautionary) tale so that designers can make better (i.e. informed) choices? Because in the long run "folks in these here parts" are completely reliant upon, and at the mercy of, :) "them folks what does make silicon".:) For the most part I think readers here want part numbers (AND stocking numbers for Mouser, Newark, et al :) :) ) as well, if available... Your posting is unique in this regard because there's no schematic, CAD file, photos, nor solicitation for donations to a PCB fund. And that's a good thing! But eventually I'd hope you can point us in a direction (after having been educated) towards practical and readily implemented solutions.

As for the "PLL/Clock recovery" replacement: I won't even go there (or close to it).:) :eek: :eek: I like the idea - but clocks in general seem to be a religious issue around here - beware if you want to open THIS can of worms:hot: :hot: (Although I'd welcome the effort)....

As for "all being right with the world" .... hahahahahaha....wouldn't it be grand if there was *anything* here that could be agreed that moved us closer to that???:) :)

Still looking forward to the continuation of this tutorial!!!

Regards,
Bill
 
werewolf said:
netgeek - I hear you :) I'm doing my best to confine the math to a post or two ... it's my goal to provide the math foundation for those who want to explore it, and yet still provide some useful conclusions for those who would rather bypass the details.

**

But you're right ... perhaps the bottom line for the DIYers is that YES, use a CS8420 or AD1896 instead of the old CS8412/14 standard.


Werewolf, looking at your CV, we are privileged to have you here.

I've studied the conceptual models on the AD1890/1896 data sheets, and this step-by-step tutorial is certainly achieving your goal; it is much more understandable for people like myself who are not DSP experts.

You are probably aware TI has some new ASRC chips that do a THD of -140 dB and a DR of 144 dB; better than even the 1896 and, it seems, close to the theoretical limit. They do not state any jitter rejection data, so I'm keen to see your section on jitter rejection (or the resultant artifacts), and how it relates to these core THD and DR specs.

Excellent thread.

Cheers,

Terry
 
Thanks Terry ... I just downloaded the data sheet for the Burr Brown/TI SRC4192/93. Looks like some great reading :D

Quick review. Input data at the Fs_in rate needs to be interpolated by N~2^20, so that the error incurred by Asynchronous decimation to Fs_out will be acceptably small. In a nutshell, that's what we know so far :) (Why it took me so many posts, to say that one sentence ... escapes me at the moment :confused: )

But how can we possibly interpolate, or upsample, to 50GHz ... and beyond? Well, there's one VERY important point to keep in mind, as we consider hardware architectures for realizing our Asynchronous Sample Rate Converter. And that point is this :

We don't need ALL of the samples of the interpolated data.

In fact, we are going to IGNORE most of them ... we only NEED the ones that are closest to the Fs_out clock ticks. In signal processing lingo, we say that very, very few samples will "survive the decimation to Fs_out". Yes, we have to digitally interpolate the Fs_in data by N, conceptually CREATING N more data points ... and N is positively huge! But .... BUT .... we are going to throw MOST of them away, because we only need to keep the ones CLOSEST to Fs_out. Make sense?
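
If a concrete picture helps, here's a toy sketch of what "surviving the decimation" means (a deliberately tiny N and arbitrary 44.1kHz-to-48kHz rates, purely for readability):

Code:
# Toy illustration of "surviving the decimation": for each Fs_out tick we
# keep only the ONE interpolated sample nearest that tick.  N is kept tiny
# here purely so the printout is readable -- the real N is ~2^20.
fs_in, fs_out, N = 44100.0, 48000.0, 8
interp_rate = N * fs_in                  # rate of the conceptual interpolated stream

for k in range(5):                       # first few output clock ticks
    t_out = k / fs_out                   # time of this Fs_out edge
    idx = round(t_out * interp_rate)     # nearest interpolated-sample index
    print(f"Fs_out tick {k}: keep interpolated sample #{idx}, discard the rest")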

So if we're going to throw most of the interpolated data points away, why even calculate them in the first place?

Well ... this is where we find a BIG difference between IIR (Infinite Impulse Response) and FIR (Finite Impulse Response) filters. Certainly a topic for yet another thread, but for now we'll just state that IIR has a big DISADVANTAGE here. If we decided to use an IIR low-pass filter for our interpolation, we would indeed have to calculate ALL of the interpolated data points ... not because the ultimate output at Fs_out needs them, but because future calculations need past results ... i.e., feedback ... in IIR structures.

But there is no such feedback in an FIR structure. With FIR, we only need to calculate the outputs that will survive the decimation process. Oh, and of course you get a PERFECTLY linear phase response to boot :) So we don't have the enormous computation burden of really upsampling to 50 GHz.
 
So we will use an FIR filter for our interpolation by the huge N. A quick review of FIR interpolation is in order.

We said before that interpolation by N has two conceptual steps ... first, stuffing N-1 zeros in between the Fs_in data points and second, passing the zero-stuffed signal into our (FIR) filter. We can think of an FIR filter as a "tapped delay line", meaning that each output is nothing more than a linear combination of a "time slice" of input data. It's worth noting that any such time slice of input data will be VERY sparse ... due to that zero-stuffing process. In other words, most of the FIR input data points are zero, which further reduces our computation burden. And finally, the multiplicative elements in the linear combination are nothing more than the "coefficients" of the FIR filter.

But how many non-zero data points ... that is, how many Fs_in samples ... do I need in my linear combination? Without going into all the details, let me just state a strong relationship:

The number of Fs_in data points used in the FIR calculation will dramatically impact the frequency domain TRANSITION BAND of the FIR filter. We will call the number of Fs_in data points, used by our FIR filter, the POLYPHASE LENGTH of the FIR filter.

In other words,

more Fs_in data points used in computation = longer polyphase length = steeper low-pass transition band

And it's really this "polyphase length" concept that dictates our computation ... because this is how many NON-ZERO inputs our FIR computation will use ... no matter how far we interpolate.

However, the TOTAL LENGTH of the FIR filter itself must take into account how far we interpolate. In fact, the total FIR length is given by the simple relationship :

FIR Filter Length = (Polyphase Length)*(Interpolation Ratio = N)

Time for a few simple examples !!

Let's say our frequency domain transition band requirements dictate that we need a polyphase length of 16. IF we want to use such a filter to interpolate by simply 2, the total FIR filter length would be 16*2 = 32. So there would be a total of 32 filter coefficients, just waiting for input data to linearly combine. But every other input sample is zero, because of the zero-stuffing we did to interpolate by 2. So even though there are 32 FIR coefficients, we only need to perform 16 multiply/additions for each output sample. Make sense?

Now let's say we want to use a similar filter, polyphase length = 16, to interpolate by a factor of 4. Total FIR filter length now must be 16*4 = 64, or a total of 64 FIR coefficients. But because of zero-stuffing, 3 out of every 4 input samples are zero, so once again each output still only needs 16 computations.
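
Here's a minimal sketch of that book-keeping for the interpolate-by-4 example (the filter is a crude windowed-sinc standing in for a real design, and the indexing is just one reasonable convention):

Code:
import numpy as np

# Polyphase interpolation by N = 4 with polyphase length 16: total FIR
# length 64, but only 16 multiply/adds per output because 3 of every 4
# zero-stuffed inputs are zero.  The filter is a crude windowed-sinc,
# purely for illustration -- not a production design.
L_POLY, N = 16, 4
n_taps = L_POLY * N                                   # 64 coefficients in total
t = np.arange(n_taps) - (n_taps - 1) / 2.0
h = np.sinc(t / N) * np.hamming(n_taps)               # low-pass, cutoff ~Fs_in/2

polyphase = [h[p::N] for p in range(N)]               # N sub-filters, 16 taps each

x = np.random.randn(256)                              # some Fs_in-rate input data

def interpolated_sample(m):
    """Sample m of the (conceptual) N*Fs_in-rate stream (assumes m//N >= 15)."""
    p, q = m % N, m // N                              # which sub-filter, which input sample
    recent = x[q - L_POLY + 1 : q + 1][::-1]          # last 16 inputs, newest first
    return float(np.dot(polyphase[p], recent))        # 16 multiplies, never 64

print(interpolated_sample(100))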

So for interpolation, it makes A LOT of sense to consider the two "factors" of an FIR filter : Polyphase Length, and the interpolation ratio = N. In fact, we consider the entire FIR filter to be made up of several SUB-FILTERS ... each one a "separate" polyphase filter.

Total FIR = (Polyphase length)*(Number of Polyphase Filters)

How many of these sub-filters ... how many polyphase filters ... are there? Very simple, there are N of them, where N is the interpolation ratio :)

So, to summarize, we will use an FIR filter to perform the HUGE interpolation of Fs_in data needed by our ASRC. And we consider the HUGE FIR filter to be made up of several sub filters, or several polyphase filters. Each filter has the same POLYPHASE LENGTH, and there are N of them. The polyphase length simply dictates how many Fs_in samples will be used in the computation to interpolate ... or, alternatively, how many Fs_in samples are used in the computation to produce an Fs_out sample.

Please stick with me ... I promise this is going somewhere interesting! For now, a couple more numbers. High quality digital audio requires polyphase lengths of 64. Let's just accept that this number gives us adequately STEEP transition bands for our LPF interpolators. And we know that our interpolation ratio is about 1 MILLION for high quality ASRC. Does that mean that these Asynchronous Sample Rate Converters use FIR filters that are ... about 64 MILLION taps long ???????

Yes, in fact it does :) :D 1 Million Polyphase "sub-filters", each one 64 taps long.
 
Alright, this is a "BREAK-POINT" post. You can disregard all of the previous technical details, if you choose, and start here. As promised, we will reward all the readers :) and recast Asynchronous Sample Rate Conversion technology as a simple arithmetic problem. How's that sound?

Input data arrives at an Fs_in rate. We need to store the most recent 64 input samples, in a RAM or FIFO, on our device. Of course, this memory gets updated frequently ... every time a new input sample comes along, we kick out an old one. Out with the old, in with the new ! :)

Also stored on our device is a set, or more accurately, an ARRAY of numbers, called coefficients. These coefficients can be stored in ROM, because they will never change :) Now it's quite a LARGE array ... it has 64 columns, but approximately 1 MILLION rows. We call each row a "polyphase filter" ... so each polyphase filter has 64 coefficients, and there are a million different polyphase filters. (In reality, we might not really need a ROM this big, because there are some clever ways to calculate some polyphase filter rows from others. But it's absolutely fine to imagine ALL these polyphase filters, with 64 coefficients each, stored on our device).

Now what happens when an OUTPUT clock "tick", arriving at a totally asynchronous rate of Fs_out, comes along? Well, our device needs to CALCULATE an audio output sample. And it does this very simply : the device simply SELECTS one of the polyphase filters, which has 64 coefficients, and multiplies those 64 coefficients by the 64 input data samples stored in RAM. Then it adds up the result of all those multiplications, and provides that value as the output sample. And that's all there is to it :) :)
By the way, we often call this multiply/add operation a "convolution".
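
For the software-minded, here's roughly what that arithmetic looks like (a toy sketch: a tiny coefficient table filled with random placeholders rather than a real filter design, and the choice of row left as a bare argument):

Code:
from collections import deque
import numpy as np

# Toy model of the ASRC arithmetic.  Real parts store ~2^20 polyphase rows;
# here the "ROM" is tiny and filled with random placeholders, because the
# coefficients themselves aren't the point of this sketch.
POLY_LEN, NUM_PHASES = 64, 64

rom = np.random.randn(NUM_PHASES, POLY_LEN)           # array of polyphase rows
ram = deque([0.0] * POLY_LEN, maxlen=POLY_LEN)        # last 64 input samples

def on_input_sample(x):
    """Fs_in event: out with the old, in with the new."""
    ram.appendleft(x)                                 # newest sample at the front

def on_output_tick(phase_index):
    """Fs_out event: select ONE polyphase row and convolve it with the RAM."""
    return float(np.dot(rom[phase_index], list(ram)))

for sample in np.random.randn(200):                   # feed some input data ...
    on_input_sample(sample)
print(on_output_tick(17))                             # ... then ask for an output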

Simple, right? Indeed it is !!! There's only ONE missing piece. How does the device know which polyphase filter to select, from the one million available, when an output clock (Fs_out) edge comes along?

THIS IS THE WHOLE "TRICK" OF ASYNCHRONOUS SAMPLE RATE CONVERSION !!!! We have to figure out which of the very many polyphase filters to select, for multiplication/addition with input data, when an output clock edge comes along. And we shall address this issue in the next post ... and ultimately find the key to understanding how the ASRC responds to JITTER.

Questions? Comments? Have I lost my audience? :(
 
werewolf said:
... so each polyphase filter has 64 coefficients, and there are a million different polyphase filters. (In reality, we might not really need a ROM this big, because there are some clever ways to calculate some polyphase filter rows from others. But it's absolutely fine to imagine ALL these polyphase filters, with 64 coefficients each, stored on our device).

You're doing great! I hope you will shed some more light on the coefficient interpolation implied by the above statement at some point. I believe this is where the "real" trade secrets of AD and CS are hiding ;)
 
werewolf said:
By the way, we often call this multiply/add operation a "convolution".

Wow! That is by far the most concise explanation of convolution I have ever seen!! :bigeyes: :D


I also have a question regarding a specific ASRC implementation. In another forum, I came across the following response (note, this is not mine, that's why I'm asking for another opinion):

I've read comments about CS8420 sound quality and many people were not happy, in fact this chip was giving ASRC a bad name. That's why I didn't even consider using it.
...
As for CS8420 that was mostly comments of audiophiles who heard commercial equipment using this chip and compared it with full reclocking and other normal DACs. That was the first generation of ASRC and lot of people weren't blown away with it then.

Can anybody comment on that? There's a CS8420 in my DAC and I haven't noticed any problem, but then again, I don't have much to compare with.
 
Well, these are rather complicated chips... it's not unknown to have bugs from time to time. Such bugs might cause anything from complete operational failure through subtle performance degradation.

As a side note, CS8420 is by no means a first generation ASRC chip. The first IC I recall seeing on the market was the AD1890 / AD1891 (ca. 1993?), well before the CS8420 came out. CS8420 might have been a first for Crystal though...
 
All very valid comments & points guys :) Thanks, it helps to re-fuel my gas tank !

This post is probably the first of two to deal with the "central" topic of Asynchronous Rate Conversion, as we have re-defined the problem :

How do you pick the right polyphase filter when an output word is demanded by Fs_out?

Let's take a step back, and recognize that it's the INPUT data, on the Fs_in clock, that's being interpolated by huge N. What this means is that EACH polyphase filter is really "associated" with ONE of N points in time, evenly spaced between INPUT samples.

What we need to do is use the clocks available to us to pick the right "point in time" associated with an Fs_out edge ... same thing as picking the right polyphase. The PROBLEM is that it would take something on the order of a 50GHz clock to give us that kind of RESOLUTION in time ... at least, for all POSSIBLE time points. And we don't have clocks that fast available to us !!! We certainly have the Fs_in clock, the Fs_out clock, and maybe even many multiples of the Fs_out clock ... up to 512*Fs_out is common ... available. But nothing remotely approaching that speed :) (I've suggested multiples of Fs_out being available, because in the ASRC environment, Fs_out is often derived from a local crystal oscillator operating at ~512*Fs_out).

But we DO have something going for us :) And that is simply this : we don't expect Fs_in and Fs_out to be dynamically CHANGING very fast, if at all (jitter issues notwithstanding). What this means is simply that we don't need the ultra-wide BANDWIDTH on our time RESOLUTION that a lightning-fast clock would provide.

This is VERY good news ... because it means that I can use an AVERAGING process to "measure" the clock rates, or time resolution. I can resolve the distinct time points needed to select the right polyphase by AVERAGING over many, many clock cycles. In other words, I can resolve very FINE points in time by AVERAGING over very many COARSE clock cycles.

Make sense? Coarse averaging, for fine resolution.

This post has been very "general" ... next post we'll examine, in a bit more detail, a specific technique for the "coarse" averaging. I'll call it the "Poly-Phase Locked Loop" or PPLL for short :)

But one thing to take away from this post : we need coarse AVERAGING for the fine resolution required to pick the ONE right polyphase out of a million. That "averaging" process, being of course a form of a low-pass filter, will dictate how the ASRC responds to clock jitter.
 
hifiZen said:
Well, these are rather complicated chips... it's not unknown to have bugs from time to time. Such bugs might cause anything from complete operational failure through subtle performance degradation.

That's an understatement, the CS8420 has A LOT of bugs! I know because we use them at work and they are a nightmare to configure and set up correctly. Its datasheet was written by monkeys with typewriters. Although once configured properly, the sound quality seems to be fine in my experience.
 
Can't say I follow everything but conceptually I do and think this is indeed a great thread...keep it up!


That's an understatement, the CS8420 has A LOT of bugs! I know because we use them at work and they are a nightmare to configure and set up correctly. Its datasheet was written by monkeys with typewriters. Although once configured properly, the sound quality seems to be fine in my experience.

I did not find it that bad, if I recall there was one bug that was hard to get around.

I've come to really appreciate my 8420/DF1704/PCM1704K DAC which besides sounding great is very versatile and lets me run anything from CD @ 16/44, DVD-Video @ 16-24/48kHz to DVD-Audio @ 24/192kHz with ease.
 
Demystifying the digits

werewolf, this is a super set of posts. I haven't got through all of it yet, but then I'm partially sighted and read rather slowly.

So, why not assemble all this great information into a single document so that it could be downloaded? It should be straightforward, with just a little editing. Maybe a pdf?

Thanks.
 
A 8, are you using the CS8420 in software mode? The errata sheet says there's no workaround for the invalid mode bug in hardware mode. I wonder if someone can suggest a circuit that will detect every time a new input stream starts so the chip can be automatically reset, as that appears to be the only way to deal with the problem when in hardware mode.
 
Spartacus - thank you ! Maybe a pdf is in the near future :) but in the meantime, allow me to post the first of a few great references, written by my old friend Bob Adams from Analog Devices :

"A Stereo Asynchronous Digital Sample-Rate Converter for Digital Audio", IEEE Journal of Solid State Circuits, vol. 29, No.4, April 1994, pp.481-488.

In fact, it was Bob Adams who, to the best of my knowledge, first developed the "Polyphase Locked Loop" concept that we'll explore in the next post (but he didn't call it that :) ).

Actually, the IEEE paper followed shortly after another great reference from the same author :

"Theory and VLSI Architectures for Asynchronous Sample Rate Converters", Journal of the Audio Engineering Society, vol. 41, No.7, 1993.

Shame on me, for not mentioning these landmark works sooner. Most, if not all, of the concepts we've explored can be found in these references :)
 
Alright, the details of the Polyphase Locked Loop (PPLL). The previous post gave the general idea ... and it's really all you need to appreciate that there's some heavy FILTERING of clock edges/rates, in order to select the right polyphase at Fs_out edges. But here's a more in-depth review, for those interested :)

Statement of the problem : we have to pick the right polyphase filter, when an Fs_out edge comes along.

Now we know that there are PRECISELY N polyphase "intervals" between Fs_in clock edges, because, by definition, we are interpolating the INPUT data by N. So we must consider these N polyphase intervals to be ALIGNED with the INPUT data period (equal to 1/Fs_in). What we need to figure out is how many polyphase intervals are associated with an Fs_out period. Because, if we knew how many polyphase intervals exist between OUTPUT clock edges (Fs_out), we could simply increment a counter, clocked by Fs_out, by this "magic" number ... and the output of this counter would always point to the correct polyphase each time an Fs_out edge comes along.

In other words, there is a "magic" number of polyphases between each OUTPUT clock edge (Fs_out) ... let's call this number M. We know that M is less than N, because there are exactly N polyphases between INPUT clock edges (Fs_in), and Fs_in is less than Fs_out. Make sense? And once we know this "magic" number of polyphases between Fs_out edges, we can increment a simple counter by this number M with each Fs_out clock edge ... and the counter will always point to the right polyphase. (It must overflow periodically, in a modulo-N fashion, but that's just fine).

How do we determine M, the increment to our counter clocked by Fs_out? Why, we put it in a servo loop that we'll call a Polyphase Locked Loop. The loop operates on the following principle :

Fs_out/Fs_in = N/M = ratio of the sample rates

Where N is known, and M can be found by recognizing that, after a sufficiently long period of time, a counter clocked by Fs_out and incremented by M should equal a counter clocked by Fs_in and incremented by N. In other words, N*Fs_in = M*Fs_out (on average).

So, every so often our counter clocked by Fs_out and incremented by M is COMPARED to a counter clocked by Fs_in and incremented by N. The DIFFERENCE between these two counters is attenuated by some gain factor, and adjusts the estimate for M. That's all there is to it :) A simple feedback loop, operating in a similar fashion to a PLL that's measuring, and accumulating, the phase difference between two clocks in order to align their edges.

But of course, here we need to align "polyphase counts" in order to figure out how many polyphases "fit" between Fs_out edges. The loop will, of course, only perform this task in an AVERAGE sense ... it's got a pretty narrow bandwidth, given the precision required and the clocks available. How narrow? The bandwidth of the Polyphase Locked Loop in the AD1896 is about 3 Hertz.
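
For the curious, here's a crude behavioural sketch of that servo (toy loop gains chosen only so this little simulation settles, plus a proportional "damping" term that the description above glosses over; the real loop design is of course rather more careful):

Code:
# Crude behavioural sketch of the Polyphase Locked Loop.  Gains are toy
# values chosen only so this simulation settles; the proportional term is
# an extra damping path not described above, added so the loop converges.
N = 2**20
fs_in, fs_out = 44100.0, 48000.0
ideal_M = N * fs_in / fs_out            # the "magic" number the loop should find

M = float(N)                            # deliberately wrong starting guess
cnt_in = cnt_out = 0.0                  # the two counters being compared
err_prev = 0.0
g_int, g_prop = 1e-6, 2e-3              # small gains -> narrow loop bandwidth

t_in = t_out = 0.0
for _ in range(2_000_000):              # interleave the two asynchronous clocks
    if t_in <= t_out:                   # Fs_in edge: bump the input counter by N
        cnt_in += N
        t_in += 1.0 / fs_in
    else:                               # Fs_out edge: bump the output counter by M,
        cnt_out += M                    # compare the counters, and nudge M
        t_out += 1.0 / fs_out
        err = cnt_in - cnt_out
        M += g_int * err + g_prop * (err - err_prev)
        err_prev = err

print(f"loop estimate of M: {M:.1f}   ideal: {ideal_M:.1f}")
# A counter incremented by (the settled) M at each Fs_out edge, taken
# modulo N, then points at the polyphase row to use for that output.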

A couple noteworthy points :

1. The same principle can work with multiples of Fs_in and/or Fs_out clocks.

2. The majority of all the operations within the ASRC can be clocked by Fs_out & its multiples. In fact, the Fs_in clock is really only needed for two things : one, to write a new input data word into the RAM (used in the convolution operation) and two, to clock one of the two PPLL counters mentioned above. All of the other convolution operations can be clocked by multiples of Fs_out.

Well, I think this brings us to the end of our technology review. I've got one or two more posts left, to describe how the ASRC responds to input clock JITTER, and compare this technique to a PLL clock recovery scheme.

Certainly a good time to break for questions :)
 