Upsampler/Interpolator Idea

JWR

Member
2006-11-13 1:47 pm
I have been working on a sinc based upsampler/interpolator for a few days and I was wondering if anyone here has any input. Specifically, I'm attempting to take a 16/44.1 input and output 24/196. Instead of a textbook approach, I'm trying to do this by convolving the input with a scaled sinc function.

My problem is that it is (expectedly) dog slow. Right now a 1-second input takes ~4 minutes to compute (on a dual-core Westmere Xeon). I'm using C# and the .NET 4.0 parallel functions, so it's multi-threaded. All calculation is being done in double-precision floating point (64-bit). I tried F# and C++ and found no appreciable speed gains there (OK, both were a bit faster, but the underlying sine calc is still slow!), so I'm comfortable saying it's an algorithmic issue and not primarily a language-based one (which is reasonable; calculating millions of floating-point sine functions isn't cheap).

I have investigated caching the sinc function, but there doesn't seem to be any periodicity in my sampling points due to the odd 44.1->196 conversion ratio. I wanted to target 196k so that I could use any DAC I want. From what I can tell, TI DACs don't support 176.4k sampling rates; it looks like AKM and Wolfson do, but I'm not as familiar with them. As an aside, I think that 44.1 -> 176.4 could be much faster because of the periodicity of the sinc function...

So has anyone done this? At my current rate it looks like it would take the better part of a day to convert a single song. I'm going to try it on a quad core with hyperthreading and see if it's any better, but I need a few orders of magnitude of improvement to make this practical.

I can post code if anyone is interested in the details.
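For anyone curious, the brute-force idea amounts to something like the following. This is a minimal Python/numpy sketch of the approach described above (with an assumed truncation width), not the actual C# code:

```python
import numpy as np

def sinc_resample_brute(x, fs_in, fs_out, half_width=100):
    """Brute-force truncated-sinc resampling (illustrative sketch only)."""
    n_out = int(len(x) * fs_out / fs_in)
    y = np.zeros(n_out)
    for m in range(n_out):
        t = m * fs_in / fs_out                   # output instant, in input-sample units
        center = int(np.floor(t))
        lo = max(0, center - half_width)
        hi = min(len(x), center + half_width + 1)
        n = np.arange(lo, hi)
        y[m] = np.dot(x[lo:hi], np.sinc(t - n))  # np.sinc(u) = sin(pi*u)/(pi*u)
    return y

# Tiny demo: 10 ms of a 1 kHz tone, 44.1 kHz -> 192 kHz.
x = np.sin(2 * np.pi * 1000 * np.arange(441) / 44100)
y = sinc_resample_brute(x, 44100, 192000)
print(len(y))   # 1920
```

The cost problem is visible immediately: every output sample needs a fresh batch of sinc (i.e. sine) evaluations, because `t - n` never repeats exactly.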
 

JWR

Member
2006-11-13 1:47 pm
david: Great idea about interpolating from a reasonably sized lookup table! I will try this approach - as long as I calculate down to 1/2^24 accuracy there shouldn't be too much accumulated error. Well, maybe I'll run some tests and find out, because even very small error can accumulate over the 2*2^24 samples in either direction from the input sample... Would you use a linear interpolation or something higher order?
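A hypothetical sketch of the table idea (table size and truncation width here are assumptions, not david's numbers):

```python
import numpy as np

# Tabulate sinc once on a fine grid, then linearly interpolate between
# grid points instead of calling sin() per point.

HALF_WIDTH = 100          # truncate the kernel at +/-100 input samples
STEPS = 1024              # table entries between adjacent integer arguments

grid = np.arange(HALF_WIDTH * STEPS + 2) / STEPS
table = np.sinc(grid)     # np.sinc(u) = sin(pi*u)/(pi*u)

def sinc_lut(u):
    """Approximate sinc(u) by linear interpolation into the table."""
    u = abs(u)                        # sinc is even
    if u >= HALF_WIDTH:
        return 0.0
    pos = u * STEPS
    i = int(pos)
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac

# Worst-case linear-interpolation error is ~ max|sinc''| * h^2 / 8 for
# step h = 1/STEPS, i.e. roughly (pi^2/3) / (8 * 1024^2) ~ 4e-7 here.
err = max(abs(sinc_lut(v) - np.sinc(v)) for v in np.linspace(0, 5, 2001))
print(err < 1e-6)   # True
```

Since the error shrinks quadratically with the table step for linear interpolation (and faster for cubic), a higher-order scheme mainly buys a smaller table, not fundamentally better accuracy.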

Ken: That is a good question. Sampling theory says that analog signal reconstruction is a sinc convolution. I see interpolating/upsampling as a D-A type of operation, where the higher resolution space is just closer to analog than the source. A chip could not possibly carry out this type of operation, because it must be causal and keep its delay as small as possible. While whatever interpolation technique they do use (?) is probably good enough, it cannot reproduce the exact higher resolution representation of the data. So, I want to compare a textbook interpolation approach with an ideal approach and see the difference. Maybe it's irrelevant and all error is beneath the noise floor or rounds to zero, but I've never seen it proven. Also, another benefit is the one-step upsampling and interpolation.
 
Hi JWR,

First, a quick point:
You mention converting to a 196kHz sample-rate. This isn't standard. I'd put it down to you mistyping 192kHz (which is the standard) but you did it several times, so it seems this is your actual aim. I'd recommend you double-check this!


Hmmm, I'm a little confused - convolving by a sinc function is about as textbook as it gets. However, the implementation isn't usually described the way you have. As you're probably aware, the sinc function performs a perfect band-limited interpolation between the samples, which can be evaluated at any point and so allow you to resample the stream at whatever frequency you wish. The problem of course is that the sinc function has infinite support and must be windowed in some fashion. While you can reach a result of arbitrary precision by using a longer window, you can never reach the exactly resampled result.

In addition to length, there are a variety of applicable windowing functions. The rectangular window is optimal in terms of mean-squared error (i.e. error power) with respect to window length, but does not perform too well in the time domain (i.e. pre- and post-echo). Some work has been done on optimising this sort of calculation against criteria other than MSE - I can dig out some papers if you are interested in this.

You report you are calculating "millions" of points - exactly how much of the sinc function are you using?
[ My guess is that you are using a weighted sinc function for every sample in the track - this is far more than necessary. I do not remember offhand, but the sinc function will be more than 120dB down in level after probably a few hundred cycles, so going beyond this isn't going to increase your accuracy much. It is a judgement call though, so take it as far as you wish :) ]
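To illustrate how much the window choice matters, here is a quick numerical comparison. The kernel length, cutoff, and Kaiser beta are arbitrary assumptions for demonstration, not a design recommendation:

```python
import numpy as np

# Compare the far-stopband leakage of a truncated sinc kernel with and
# without a Kaiser shaping window.

N = 4001                          # kernel length in taps
n = np.arange(N) - N // 2
kernel = np.sinc(n / 4.0)         # ideal low-pass (cutoff fs/8), truncated

def stopband_db(h):
    H = np.abs(np.fft.rfft(h, 1 << 18))
    H /= H[0]                     # normalize to DC gain
    f = np.linspace(0, 0.5, len(H))
    return 20 * np.log10(H[f > 0.3].max())   # worst leakage far past cutoff

rect_db = stopband_db(kernel)                       # rectangular truncation
kai_db = stopband_db(kernel * np.kaiser(N, 12.0))   # Kaiser-windowed
print(rect_db, kai_db)
```

The rectangular truncation leaves leakage tens of dB above the Kaiser version at the same length, which is the time-domain/frequency-domain trade mentioned above.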


Now, in my view the key to moving forward is to realize that your current method is exactly equivalent to an FIR filter interpolation using a sinc kernel. For this sort of interpolation, as you noted, you require a periodicity. However, there is a periodicity in the conversion from 44.1kHz to 192kHz - which is 640/147. So, the conversion can be considered as an upsampling by an integer factor of 640 times, followed by a downsampling by an integer factor of 147 times. If you consider that in its raw form, you have a long FIR filter running at 640Fs! But closer inspection shows that the vast majority of input samples are zeros and don't contribute anything to the result, so you can get there much more cheaply by just not calculating for them. The standard structure to do this is called a polyphase filter, and is quite straightforward. It is documented all over the place. I would recommend you look at this approach, as it combines the arbitrary precision of the sinc implementation with the cheapness and ease of a fixed filter kernel.

I appreciate the fun of deviating from the mainstream, but in this case, there's a reason that polyphase filters are the norm! ;) In any case, the trick now becomes in designing the "best" filter kernel for the job, and there's plenty of fun to be had there. Nobody quite knows what to aim for in the properties of these filters, so it's still an active research topic.
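For concreteness, the polyphase structure described above can be sketched like this. This is illustrative numpy with hypothetical kernel sizes; a real implementation would precompute the per-phase tap sets rather than index the full-rate kernel in a loop:

```python
import numpy as np

# Conceptually: upsample by L=640 (zero-stuffing), low-pass with a
# Kaiser-windowed sinc at the L*fs rate, downsample by M=147 -- but only
# the filter phases that actually produce an output sample are evaluated.

def resample_rational(x, L, M, taps_per_phase=32, beta=10.0):
    N = L * taps_per_phase                    # kernel length at the L*fs rate
    h = np.sinc((np.arange(N) - N / 2) / L) * np.kaiser(N, beta)
    n_out = len(x) * L // M
    y = np.zeros(n_out)
    K = taps_per_phase
    for m in range(n_out):
        pos = m * M                           # position on the L*fs grid
        phase, base = pos % L, pos // L
        for k in range(K):
            i = base - k + K // 2             # center the kernel on t = pos/L
            if 0 <= i < len(x):
                y[m] += h[phase + k * L] * x[i]
    return y

# 10 ms of a 1 kHz tone: 441 samples at 44.1 kHz -> 1920 at 192 kHz.
x = np.sin(2 * np.pi * 1000 * np.arange(441) / 44100)
y = resample_rational(x, 640, 147)
print(len(y))   # 1920
```

Note the cost: each output sample touches only `taps_per_phase` input samples, independent of the 640x notional upsampling, which is exactly why the polyphase form is so much cheaper than the brute-force convolution.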
 

JWR

Member
2006-11-13 1:47 pm
Wingfeather: You're right on the typo above, please read all 196 as 192. Before my rant below, thanks for your response too! I appreciate all input.

My idea derives from this sentence: "the sinc function has infinite support" -- which is true in analog, but in a digital representation the sinc function eventually rounds to 0, i.e. after 2*2^24 (2^25) samples for 24 bit. So the infinite sinc function can be fully described in 24bits as 4*2^24 (2^26) samples. If you convolve with this huge sinc function, you get exactly the higher resolution representation of your input. While I think you're spot on about the absurdity involved, i.e. that the sinc function is small after only a few cycles, I am curious to see the differences.
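A quick check of that envelope argument (the specific test point is my own choice):

```python
import numpy as np

# Sanity check of the "rounds to zero in 24 bits" reasoning above: the
# envelope of sinc(x) = sin(pi*x)/(pi*x) is 1/(pi*x), which falls below
# one 24-bit LSB (2^-24) once x exceeds 2^24/pi -- about 5.34 million
# samples, comfortably inside the 2*2^24 bound quoted above.
lsb = 2.0 ** -24
threshold = 2 ** 24 / np.pi
print(threshold)                      # ~5.34e6 input samples
print(abs(np.sinc(6e6)) < lsb)       # True: past the threshold, below 1 LSB
```

So the strict 24-bit support is actually a few times shorter than 2^25 samples per side, though either way it is a colossal kernel.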

The textbook approach to interpolation I was referring to was to zero-pad and lowpass filter (the way I learned it in school), which is what I assume any hardware interpolator does.

Yes, this is a "FIR filter interpolation using a sinc kernel" as you point out, but by periodicity I should have said "rational periodicity", i.e. something that I can use to make my sine calculations simpler. As it is I have no repeated sine calculations in 44.1->192 so I have to call a sin(x) on every output sample for each input sample. I'm running a 1-minute sample right now (on a 24-core server mind you), so 44100*60*192000*60 = 30,481,920,000,000 sine functions (which was the impetus for the first posting)!

However, I didn't think of 192/44.1 as 640/147... Good idea, maybe this approach would be easier if I interpolated to 28.224MHz and back down... At least then the sine function would be cache-able...
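The periodicity is easy to verify (a one-off Python check):

```python
# Because 192000/44100 reduces to 640/147, output sample m lands at
# input-grid position m*147/640; the fractional part (m*147 mod 640)/640
# takes at most 640 distinct values, so every sinc argument splits into
# (integer offset) + (one of 640 tabulated phases).
phases = {(m * 147) % 640 for m in range(100000)}
print(len(phases))   # 640: gcd(147, 640) = 1, so all phases occur
```

So nothing needs to run at 28.224MHz: only 640 distinct sets of sinc values ever occur, and they can all be computed once up front.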

So I have to admit I didn't consider a filterbank approach, but that's not really my goal. I'm aware there are established methods of doing this... My purpose here was to see if I could actually do the "brute force" calculations in a reasonable timespan, which I'm not so sure about any more. If the 28MHz idea doesn't pan out, I'm going to have to start looking into CUDA programming or a distributed client approach.
 
Hi JWR,

You are correct that almost all hardware interpolators utilize zero-stuffing followed by low-pass filtering via convolution with a sinc function kernel. In addition, the majority are half-band filters due to the hardware compactness that allows. While half-band filters are particularly problematic in performance terms when utilized as anti-alias filters in monolithic ADC chips (which they widely are), they are not so problematic in sample-rate conversion applications. You are also correct that the residual errors (the interpolation filter's stop-band digital noise floor) of a practical hardware upconversion process are well below the analog system noise floor. For the AD1896 this floor is around 140dB below the full-scale output. For the SRC4192 it is even lower. There is some small variance in this noise floor figure depending on the conversion ratio, due to the support for non-rational conversion ratios. Those noise figures are actual instrument-measured results, and are well documented as such in the respective datasheets of the two devices.

A great side-benefit of the AD1896 chip in particular is the absolutely outstanding jitter rejection it can provide. We're talking about a jitter attenuation mask which begins at around 3Hz and is down by a whopping 55dB at 100Hz, and by more than 100dB at 1kHz!

I do understand, however, that you are curious to evaluate what an error/noise floor below even -140dB sounds like. I suggest that if you go to the trouble of implementing a custom upsampling software module, you do not base it on a half-band interpolation algorithm. Instead, implement a filter whose stop-band begins at the input-rate Nyquist point. Another option you may want to consider is the DSP (software) based upsampling engine designed by Anagram Technologies of Switzerland (now apparently owned by ABC-PCB in the U.K.). The Anagram Q5 module features a digital noise floor more than 170dB down, among other very high-performance ASRC features.
 
JWR,
Which TI DACs don't support 176.4k? The 24-bit delta-sigma DACs that I am familiar with are all spec'd at 10kHz to 200kHz. But you probably wouldn't want to use any of them, because they include an 8x oversampling digital filter, which would circumvent whatever you hoped to accomplish with your software interpolator. The PCM1704 will work at any sample rate up to 781kHz or 1041kHz, depending on frame size.