# IIR filter coefficient quantization and filter performance

Status
Not open for further replies.

#### CharlieLaub

The problem:
IIR filters are not infinitely accurate. Their accuracy depends on how many significant digits the coefficients used to calculate them can provide. You may know that hardware DSP IIR filters often use 40-bit or wider coefficient representations to gain significant figures when calculating the filter transfer function. For example, the miniDSP uses up to 56 bits for this purpose. When calculating filters in software, it's common to use the double precision data type, which is 64 bits wide in total (52 bits in the mantissa), but in some cases this is still not enough to eliminate roundoff error for IIR filters with low corner frequencies. One example might be an HP filter used to form a 6th order vented alignment for a woofer/subwoofer.

Lack of precision can have several consequences. Truncation to the available precision, even when the coefficients are represented in double precision, can lead to errors at the lowest audio frequencies, and this error becomes worse as the frequency decreases or the sample rate increases. A PowerPoint presentation reviewing these concepts can be found online here:
http://signal.ece.utexas.edu/~arslan/courses/dsp/lecture25.ppt
A paper titled "Parameter Quantization in Direct-Form Recursive Audio Filters" by Brian Neunaber investigated this topic for audio applications. The paper is available online here:
Direct-Form Filter Parameter Quantization
As an example, Table 4 (see page 10) shows that the error in filter corner frequency (Fc) for Fc = 20 Hz rises from about 0.2% to over 2.5% when the sample rate is increased from 48 kHz to 192 kHz using double precision. The minimum Fc that can be represented with double precision rises from about 2.6 Hz at a 48 kHz sampling rate to 10.6 Hz at a 192 kHz sampling rate. For filter Q, the situation is similar. At 48 kHz the error in Q for Q = 1.414 (mild peaking) at 20 Hz is approximately 1%, and it would rise to over 4% if the sample rate were increased to 192 kHz. Figures in the Neunaber paper illustrate these effects very clearly.
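One way to check figures like these for a given mantissa width is to round the feedback coefficients and see where the poles end up. The sketch below is not taken from the paper. It assumes the widely used RBJ "Audio EQ Cookbook" biquad formulas, and `denominator`, `round_to_bits` and `pole_frequency` are helper names made up for illustration. It models coefficient rounding only, not roundoff inside the running filter.

```python
import math

def denominator(fc, q, fs):
    """Normalized feedback coefficients (a1, a2) of an RBJ-style biquad (assumed design)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

def round_to_bits(x, bits):
    """Round x to `bits` significant binary digits, floating-point style."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** bits
    return math.ldexp(round(m * scale) / scale, e)

def pole_frequency(a1, a2, fs):
    """Frequency in Hz given by the pole angle of z^2 + a1*z + a2 (complex poles)."""
    r = math.sqrt(a2)
    c = max(-1.0, min(1.0, -a1 / (2.0 * r)))   # clamp against rounding overshoot
    return math.acos(c) * fs / (2.0 * math.pi)

for fs in (48_000, 192_000):
    a1, a2 = denominator(20.0, math.sqrt(2.0), fs)
    f_ref = pole_frequency(a1, a2, fs)
    for bits in (24, 53):
        f_q = pole_frequency(round_to_bits(a1, bits), round_to_bits(a2, bits), fs)
        err = 100.0 * abs(f_q - f_ref) / f_ref
        print(f"fs={fs} Hz, {bits}-bit mantissa: pole frequency error {err:.3g} %")
```

The pole-angle frequency is not identical to the nominal Fc (it depends on Q), so the script compares each rounded result against the unrounded one rather than against 20 Hz.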

When higher order filters are used, the loss in accuracy is strongly compounded, and stability could be compromised if direct realization of a higher order filter is attempted. For this reason it is generally not advisable to compute an IIR filter higher than second order in a single step. Instead, multiple first and second order filters are calculated separately and the signal is routed through them in series, multiplying their effects without significantly reducing the filter accuracy.

Finally, coefficient truncation is a source of noise in IIR filters even at modest frequencies (and lower). The more bits are used, the lower the magnitude of this noise.
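A minimal sketch of the cascade idea, assuming Direct Form I sections whose coefficients are already divided by a0. The `Biquad` class and `cascade` helper are hypothetical names for illustration and are not taken from any plugin mentioned in this thread.

```python
class Biquad:
    """One second-order section in Direct Form I (coefficients normalized so a0 = 1)."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.b = (b0, b1, b2)
        self.a = (a1, a2)
        self.x1 = self.x2 = 0.0   # previous two inputs
        self.y1 = self.y2 = 0.0   # previous two outputs

    def process(self, x):
        b0, b1, b2 = self.b
        a1, a2 = self.a
        y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


def cascade(sections, samples):
    """Run each sample through every section in series; the responses multiply."""
    out = []
    for x in samples:
        for section in sections:
            x = section.process(x)
        out.append(x)
    return out


# Two identical one-pole lowpass sections, y[n] = 0.5*x[n] + 0.5*y[n-1],
# written as degenerate biquads so the arithmetic is easy to follow by hand.
sections = [Biquad(0.5, 0.0, 0.0, -0.5, 0.0) for _ in range(2)]
impulse = [1.0] + [0.0] * 4
print(cascade(sections, impulse))   # [0.25, 0.25, 0.1875, 0.125, 0.078125]
```

Each section keeps its own state, so a higher order filter is just a longer list of sections, and every set of coefficients stays second order or lower.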

What can be done to reduce the effects of parameter quantization?
The paper reviews some strategies for improving filter accuracy. Unlike hardware-based IIR DSP realizations, however, software offers a simple way to increase accuracy: use a higher precision variable to hold the coefficients. On some platforms the "long double" type exists, which is an 80-bit representation with 63 bits of mantissa accuracy, but the long double type is not universally supported. Instead we can turn to an arbitrary precision library such as the GNU Multiple Precision Arithmetic Library (GMP). Using such a library, it should be possible to create a LADSPA plugin with a user-defined mantissa precision, or perhaps to determine the amount of precision needed for a given filter and a desired accuracy level. Using the GMP library would certainly be more computationally expensive than using native data types, but it could ameliorate the problems associated with parameter quantization. To preserve computational resources, such an implementation could be applied only where it is needed.
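As a rough illustration of user-selectable precision, the sketch below runs the same Direct Form I biquad once in native FP64 and once at a chosen number of decimal digits. It uses Python's standard `decimal` module purely as a stand-in for GMP (this is an assumption for illustration, not the GMP API, and far too slow for a real-time plugin). The coefficients are those of the 10 Hz highpass example quoted later in this thread; `run_biquad` and `impulse_response` are made-up helper names.

```python
from decimal import Decimal, localcontext

# Coefficients (b0, b1, b2, a1, a2) of the 10 Hz highpass quoted later in the thread.
COEFFS = ("0.999532291511683", "-1.999064583023370", "0.999532291511683",
          "-1.999063726687330", "0.999065439359400")

def run_biquad(b0, b1, b2, a1, a2, samples, zero):
    """Direct Form I recursion; works for floats and for Decimals alike."""
    x1 = x2 = y1 = y2 = zero
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def impulse_response(n, digits=None):
    """digits=None uses native floats (FP64); otherwise Decimal arithmetic
    carried to `digits` significant decimal digits."""
    if digits is None:
        coeffs = [float(c) for c in COEFFS]
        return run_biquad(*coeffs, [1.0] + [0.0] * (n - 1), 0.0)
    with localcontext() as ctx:
        ctx.prec = digits            # user-defined working precision
        coeffs = [Decimal(c) for c in COEFFS]
        return run_biquad(*coeffs, [Decimal(1)] + [Decimal(0)] * (n - 1), Decimal(0))

fp64 = impulse_response(1000)
wide = impulse_response(1000, digits=40)
worst = max(abs(Decimal(f) - w) for f, w in zip(fp64, wide))
print(f"largest FP64 vs 40-digit difference over 1000 samples: {worst:.3E}")
```

Changing `digits` shows how the result moves with working precision, which is one way to estimate how much precision a particular filter actually needs.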

There are still IIR filter response inaccuracies that occur at higher audio frequencies that would not be solved by this approach. These are worse for lower sample rates and have to do with the properties of the bilinear transform. I discussed some of these here, as well as a way to modify the filter to better match the desired response in this thread:

All of these topics are of interest to me, and relevant with the advent of small and inexpensive computing hardware like the Raspberry Pi that can perform IIR filtering in software. If there is enough interest I am willing to write a new version of my LADSPA plugins using the GMP library.

I welcome comments and feedback and hope to start a discussion about the topic here.


#### Soldermizer

You post an interesting question. I do have a technical background, but I am a generalist when it comes to DSP algorithms. That is a polite way of saying I basically know Jack S**t about what you've posted. Really though, I had some Comp Sci training, and to be truthful, infinite accuracy doesn't exist in the digital (or analog) world. There will always be limited precision. However... again, speaking generally, I am of the opinion that:

1. While it is certainly possible that hardware and/or software may produce errors ("noise" in our context), this is often an error in the software and/or the result of trying to exceed the limitations of the software or hardware for a specific purpose;

2. For all practical purposes these issues are (or "should be") non-existent for the hobbyist and probably most professional users. It is sort of like worrying that your DAC has "only" 24 bits when your source was recorded in 16-bit 44.1 kHz PCM.

3. The questions remain valid when trying to get the best performance from low-end computing power like a Raspberry Pi.

Sent from my NV570P using Tapatalk

#### CharlieLaub

Disagree on your point 1. Disagree on your point 2. Point 3 is true, the question DOES remain valid on a Raspberry Pi, which has plenty of computing horsepower for IIR filters. This is about calculating the filter in a better way for certain applications. You just don't seem to understand the problem.

Sent directly from my hand typing on a keybord...


#### DPH

Thanks, that was a typo (meant to type 64 bits, the total for double). But you are correct in that only the mantissa is relevant. I edited the post.

Yeah, I figured it was a simple oops.

#### DPH

Charlie,

I can appreciate your concern here (I honestly haven't looked at FP32 errors on IIR filters), but want to make sure we're abundantly clear: the entire analysis in the QSC article uses *single precision* floating point math, i.e. a 24 bit mantissa. Not double precision, which would push those errors into unobtanium, given the extra 28 bits of mantissa. I'll admit to being surprised by the accumulated errors in FP32 (single precision) at low frequencies, but this all seems like unnecessary hand wringing. In today's computational landscape, even using something like a \$30 Raspberry Pi (esp. the 3rd edition), we have more horsepower on tap than desktop systems had ~10 years ago. SIMD-heavy applications (cough, DSP, cough) are even better optimized on modern hardware. I'm completely unsure of the speed or hardware optimization of IIR filters on a modern processor, or of how efficiently a recursive algorithm packs onto a SIMD core vs the main cores. One probably needs to get messy with some assembly optimizations there to squeeze every last bit (pun intended) out.

BruteFIR was able to run some pretty menacing FIR filters back in its day, on hardware long since eclipsed.
How high throughput can I get?

With a massive convolution configuration file setting up BruteFIR to run 26 filters, each 131072 taps long, each connected to its own input and output (that is 26 inputs and outputs), meaning a total of 3407872 filter taps, a 1 GHz AMD Athlon with 266 MHz DDR RAM gets about 90% processor load, and can successfully run it in real time. The sample rate was 44.1 kHz, BruteFIR was compiled with 32 bit floating point precision, and the I/O delay was set to 375 ms. The sound card used was an RME Audio Hammerfall.

In short, today's SoCs are fast enough to handle almost any *FIR* DSP you might want, let alone IIR using double precision math. AVX/FMA/NEON (in ARMv8) can all crunch FP64 quite quickly, albeit at reduced rates compared to FP32 (not that you'd really be worried about that with IIR filters).

But, given that you can run beefy-enough FP32 FIR filters which play nice with SIMD cores, giving quite the alluring speedup, I really have to wonder what the draw of IIR filters is, outside of using a now-underpowered miniDSP? I mean, as has been noted by (Forgot name, sorry!) on the RPI3 thread, BruteFIR already works, and FFTW3 has NEON support, which means you're not likely going to squeeze much more from it. Those FFTW kids don't mess around.

Edit to add: we should be more careful with our language, to be honest, and stick with things like FP32 and FP64 versus double/single precision, as different architectures call them differently.


#### scott wurcer

Edit to add: we should be more careful with our language, to be honest, and stick with things like FP32 and FP64 versus double/single precision, as different architectures call them differently.

There are FPUs with FP80 and libraries with FP128. It was pointed out to me a long time ago that some physics problems involve matrices that are very nearly singular, and this seemingly absurd resolution is needed.

I put some numerical noise comments in my LA article. Even RIAA with two serial biquads had dramatically different results (but still very low) based on order at FP32. At FP64 you have almost 200 dB of extra room to make any difference.

Charlie - You need to filter the literature carefully; there are a LOT of articles/whitepapers based around early fixed point DSPs which are simply outdated.


#### Mark Johnson

Just accept the factor of 30 decrease in throughput and use software implementations of floating point operations with 110 bit mantissas and 17 bit exponents. Put it inside an FPGA and nobody . will . ever . know .

#### CharlieLaub

Charlie,

I can appreciate your concern here (I honestly haven't looked at FP32 errors on IIR filters), but want to make sure we're abundantly clear: the entire article from QSC is analyzed using *single precision* floating point math, i.e. 24 bit mantissa. Not double precision, which would push those errors into unobtanium, given the extra 28 bits of mantissa.

No, no, no... please go back and re-read the article, specifically the section called "1.2.2 Floating-point Implementation" and basically the rest of the article from Neunaber at QSC.

The article makes perfectly clear that, in software, errors can result when calculating low frequency IIR filters even with 64-bit double precision. I am really only concerned with software applications (but the same principle applies).

This is why I am looking at higher precision solutions. It looks to me like the Boost libraries have 100- or 128-bit floating point types that offer a lot more mantissa bits. Sure, they are much slower, but I think they are still very doable even on a Raspberry Pi (2 or 3). You only need the slow code with the huge floats for low frequency filtering, after all. Anyway, small computing hardware is just getting faster every year, so if they ask too much I just have to wait for some time to let Moore's Law chug along...

So, specifically what IS the problem here? I'll give you a specific example:
Highpass filter Fc=10Hz, Q=1.4:
Z-transform Transfer Function with each Coefficient divided by a0, and b0, b1 and b2 normalized to give the specified gain
b0 = 0.999532291511683
b1 = -1.999064583023370
b2 = 0.999532291511683
a1 = -1.999063726687330
a2 = 0.999065439359400

Notice all those nines leading each coefficient, and that they are very close to either -1, 1, or 2? Remember in the Direct Form you are adding and subtracting the product of these coefficients and the sample, which is on the order of 0.1 to 0.01 typically. This pushes the 9's down into the mantissa and reduces the accuracy of the filter computation.
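For anyone wanting to reproduce these numbers: they appear to match the RBJ "Audio EQ Cookbook" highpass formulas at a 48 kHz sample rate. The post does not state either, so treat both as assumptions, and `rbj_highpass` as a name made up for this sketch. The script also prints 1 + a1 + a2, the small leftover that remains once the leading nines cancel; its size relative to 1 gives a rough heuristic for how many mantissa bits the cancellation uses up.

```python
import math

def rbj_highpass(fc, q, fs):
    """Second-order highpass, RBJ cookbook style, normalized so that a0 = 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    return b0, b1, b2, a1, a2

# fs = 48 kHz is an assumption; fc and Q are from the example above.
b0, b1, b2, a1, a2 = rbj_highpass(fc=10.0, q=1.4, fs=48_000.0)
for name, value in zip(("b0", "b1", "b2", "a1", "a2"), (b0, b1, b2, a1, a2)):
    print(f"{name} = {value:.15f}")

# What is left after the nines cancel: roughly (2*pi*fc/fs)**2.
residual = 1.0 + a1 + a2
print(f"1 + a1 + a2 = {residual:.6e}")
print(f"bits used up by the cancellation: about {math.log2(1.0 / residual):.1f}")
```

Under this heuristic, a coefficient format with N mantissa bits leaves roughly N minus that many bits to describe the corner frequency, which is why the mantissa width matters so much more at low Fc and high sample rates.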

The QSC paper shows how, for single AND double precision, this translates to errors in corner frequency and pole location of the filter. If you want a very accurate low frequency filter, maybe increasing the mantissa bits above what a double can provide is not a bad thing? That is the motivation for this thread...

#### DPH

Charlie,

Am I misreading things in the Neunaber article? As I read it, all the tables etc. were calculated using 24-bit fixed-point coefficients (e.g., middle of page 6, under eq 22, and again on pg 9 under eq 32); the labels underneath each figure point this out, as do the frequent comments about floating point's factor-of-two advantage at the same 24-bit mantissa. So what is presented is, in effect, a factor of 2 worse than FP32 calculations. Yes, 7 of the extra bits are "wasted", and you do have to worry about the internal implementation of the FPU (well, no you don't: while FP80 internal is common on desktop CPUs, IEEE 754 FP43 isn't found anywhere, so it's going to remain FP32 for all internal calculation steps).

I.e. the results given in the Neunaber paper are relevant to FP32 (with a x2 scaling). It seems we agree that FP64 isn't too egregious computationally, so we're 28-bits (29 bits, given it's not fixed-point) ahead of all these figures. 29 bits, which if I'm doing my math right, is a 174 dB improvement, or essentially 5 nHz maximum error (using the math right off Table 4) at 20 Hz! Barring Scott's we-need-every-last-drop-of-precision-for-physics-calcs, I think we're WAY more than okay! If you're clever enough to translate back/forth with fixed-point [-1,1) and need to squeeze every last drop of performance out of a system, then use int32's, and have 31 bits of precision (a tidy 8 bits better, giving you 48 dB improvement).
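The decibel figures here follow from the usual rule of about 6.02 dB per bit, i.e. 20·log10(2) for each doubling of resolution. A quick check of the two numbers quoted above (the function name `db_for_bits` is just for this sketch):

```python
import math

def db_for_bits(bits):
    """Change in resolution, in dB, from `bits` extra binary digits of precision."""
    return 20.0 * math.log10(2.0 ** bits)

for bits in (8, 29):
    print(f"{bits} extra bits -> {db_for_bits(bits):.1f} dB")
# 8 extra bits -> 48.2 dB
# 29 extra bits -> 174.6 dB
```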

INT64, FP80, and FP128 are ultimately, "why?" for audio DSP.

#### CharlieLaub

Oh, wow, Mea Culpa - I mean I really can't believe it. I have to apologize, and explain.

You are right, the paper's analyses were all done for 24 bit mantissas! This whole time I somehow had it in my brain that it was all done for DOUBLE precision... I had the idea from a recollection of an earlier read that the paper talked about double precision and its lack of accuracy at low frequency... and that seemed to be misleading me the entire time when I recently re-read the paper! So this whole thread is totally wrong on my part.

I really don't know how that all happened... perhaps I should make a visit to my optometrist in the very near future.

Thank you for your tact in pointing out my error.

#### wintermute

As the thread started based on an error, there does not seem to be any point in leaving it open, so it has now been closed.
