Impulse response, FFTs, deconvolution
I'm trying to understand the theory and practice of room correction, starting with the measurement of impulse response, then hopefully moving on to deriving the inverse filter to perform a 'perfect' correction of some music recorded at the same position as the impulse response measurement (listening in headphones). At that point, I may feel that I have some grasp of how this stuff works, and can then experiment with 'tempering' the correction to make it work in practice.
I was wondering if anyone could help with some of the real 'nuts and bolts' of how this works:
As I understand it, as a first approximation, we can regard the room as 'smearing' the sound from each speaker by adding spurious echoes, reflections and reverberation that all reach our ears at different times after the direct sound; the audio from the speakers is 'convolved' with the room characteristic. This characteristic can be captured in its entirety (at one listening position) as the impulse response, and inverted to 'correct' the source audio for the room. (Various caveats, such as the system being assumed to be linear etc. which, in practice we may not achieve, but it's near enough for us to do something with it). This goes beyond normal frequency-selective equalisation, and allows us to effectively suppress echoes and reverberation - which seems rather miraculous and I can't wait to try it for myself, even if, in practice, it wouldn't be a good idea to take it as far as theoretically possible.
Capturing the impulse response can be done as easily as making a 'click' and recording the sound in the room for a couple of seconds, but a better way is to use a longer duration 'probe' signal that helps to reduce the effects of ambient noise and is more predictable, and easier for speakers to reproduce. A swept sine wave is good and especially, for various reasons, one that increases in frequency exponentially. However, any signal could be used, as long as it contains the full range of audio frequencies, such as white noise.
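For anyone wanting to try this, here is a minimal sketch of an exponentially swept sine in Python with NumPy. The function name `exp_sweep` and the 20 Hz to 20 kHz, 10 second, 48 kHz settings are my own illustrative choices, not values taken from the thread.

```python
import numpy as np

def exp_sweep(f_start, f_end, duration, fs):
    """Sine sweep whose frequency rises exponentially from f_start to f_end (Hz)."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f_end / f_start)  # natural log of the overall frequency ratio
    # Phase is the time integral of 2*pi*f(t), with f(t) = f_start * exp(t * rate / duration).
    phase = 2 * np.pi * f_start * duration / rate * (np.exp(t * rate / duration) - 1.0)
    return np.sin(phase)

# Hypothetical settings: 20 Hz to 20 kHz over 10 seconds at 48 kHz.
sweep = exp_sweep(20.0, 20000.0, duration=10.0, fs=48000)
```

In practice a short fade-in and fade-out is often applied to the sweep to avoid clicks at the ends; that is left out here to keep the sketch short.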
The impulse response can supposedly be derived by deconvolving the recorded signal with the 'dry' test signal. In the frequency domain, convolution is the equivalent of multiplying the spectra of two signals together, and deconvolution is simply obtained by dividing one by the other.
I have made some recordings of the room, with a loopback of the swept sine wave test signal recorded simultaneously, so in theory all I have to do is take the forward FFT of both signals (with suitable quantities of zero padding at the end of each signal, no windowing necessary(?)) and, bin-by-bin, divide the spectrum of the recorded signal by that of the test signal (using complex arithmetic), take the inverse FFT and hey presto I should have the impulse response. I think. It seems that the usual technique is to load up the FFT's real input with an audio signal and set the imaginary values to zero. Then, after the frequency domain processing, the audio output is again taken from the real values only. I might have expected the imaginary values to be almost zero, but in my experiments I am finding that the impulse response output is complex, i.e. both real and imaginary values are present at equal levels on average. Does this mean that I definitely have a problem somewhere in my calculations, or is there a way of extracting the 'real' impulse response from this?
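The procedure described above can be sketched in a few lines of NumPy. This is only a toy under stated assumptions: the 'room' is a made-up impulse response with two echoes, the test signal is white noise rather than a sweep, and there is no measurement noise. Using `np.fft.rfft` and `np.fft.irfft` means the output is real-valued by construction, so no imaginary part has to be discarded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "dry" test signal: white noise (a sweep would be handled the same way).
dry = rng.standard_normal(4096)

# Hypothetical room: direct sound plus two echoes.
true_ir = np.zeros(256)
true_ir[0] = 1.0
true_ir[40] = 0.5
true_ir[130] = -0.25

recorded = np.convolve(dry, true_ir)  # what the microphone would capture

# Zero-pad both signals to a common length that holds the full linear convolution.
n_fft = 1 << (len(recorded) - 1).bit_length()  # next power of two
D = np.fft.rfft(dry, n_fft)
R = np.fft.rfft(recorded, n_fft)

H = R / D                                     # bin-by-bin complex division
ir = np.fft.irfft(H, n_fft)[:len(true_ir)]    # real-valued impulse response estimate

print(np.max(np.abs(ir - true_ir)))           # recovery error for this noise-free toy
```

With a real recording the division would need some protection in bins where the test signal has little energy, which this sketch does not attempt.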
The usual way of correcting a room seems to be by using a convolution processor that convolves a FIR correction filter with the audio (again by FFT). For experimental purposes, the filter could be the perfect inverse of the impulse response, so convolving it with the signal would be the same as deconvolving the signal with the impulse response. Again, I find that this filter (obtained by inverting the division in the impulse response extraction stage above) is complex. How do I derive from this a simple kernel that can be convolved with the signal to effectively deconvolve the impulse response from the audio?
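One possible way of getting a plain real kernel from the inverted spectrum, sketched with NumPy. The room response here is invented (an echo louder than the direct sound), and the half-buffer rotation with `np.roll` is my assumption about how to turn the part of the inverse that wraps round to the end of the inverse-FFT buffer into an ordinary delay; it is not something taken from a particular convolver's documentation.

```python
import numpy as np

n_fft = 4096

# Hypothetical room response in which the echo is louder than the direct sound,
# so the exact inverse needs energy *before* its main impulse.
room = np.zeros(128)
room[0] = 0.5
room[60] = 1.0

H = np.fft.rfft(room, n_fft)
inverse = np.fft.irfft(1.0 / H, n_fft)  # real, but the early part wraps to the end

# Rotate by half the buffer so the wrapped part comes first.
# The cost is n_fft // 2 samples of added delay.
kernel = np.roll(inverse, n_fft // 2)

result = np.convolve(room, kernel)      # ordinary time-domain convolution
peak = int(np.argmax(np.abs(result)))   # position of the recovered impulse
```

In this toy case `result` is very close to a single impulse delayed by `n_fft // 2` samples. A measured room response would normally need the division regularised before anything like this is attempted.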
Many thanks to anyone who can help with this.
My mistake was to use a very poor FFT that, for large FFT sizes, produced only very approximate conjugate symmetry (up to 1% error or so) and also just inaccurate values, leading to complex values at the final output. I changed to the open source FFTW library (from the FFTW Home Page), and suddenly my impulse response outputs are almost pure real.
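A quick sanity check along these lines, sketched with NumPy's FFT rather than FFTW: for a real input, bin N-k should be the complex conjugate of bin k, and the inverse transform should come back with a negligible imaginary part. The variable names are just illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)  # any real-valued signal
X = np.fft.fft(x)

# For real input, bin N-k should equal the complex conjugate of bin k.
mirror = np.conj(X[1:][::-1])  # X[N-1], X[N-2], ..., X[1], conjugated
symmetry_error = np.max(np.abs(X[1:] - mirror)) / np.max(np.abs(X))

# A spectrum with that symmetry transforms back to an almost purely real signal.
y = np.fft.ifft(X)
imag_leak = np.max(np.abs(y.imag)) / np.max(np.abs(y.real))

print(symmetry_error, imag_leak)
```

Both numbers should sit near double precision rounding error. Anything like the 1% mentioned above would point at the FFT routine itself.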
I'm interested in room correction and have just started to investigate. I have found it a very interesting topic.
To my limited understanding, I'm not sure that an FIR filter is able to cancel out the echo/reverberation; I think that would mean crossing between the frequency domain and the time domain. I see it as something like subtracting a faint, unrecognisable image of an earlier stretch of sound, and doing so more than once.
I think it is mostly just correcting the frequency/phase response of the system.
My experiments so far have been useless. I got nowhere with this software, DRC: Digital Room Correction, experiencing a strongly unpleasant effect even when using the 'minimal' configuration, hence my attempts to write my own software just to get a handle on what I am doing wrong.
The way I try to imagine what the FIR filter should be doing is this: suppose you were standing in a field with a speaker in front of you, and a concrete wall behind you. An impulse from the speaker would reach your ears after a certain delay, carry on to the wall, and then be reflected back to your ears as a single echo, after another delay. In theory, by sending a negative impulse from the speaker at just the right moment after the initial impulse, the reflection from the wall could be cancelled out at your ears. If, instead of impulses, we were listening to steady sine waves, then we would get partial reinforcement and cancellation at different frequencies, appearing to give us a wobbly frequency response. We might be tempted to think that a graphic equaliser would fix the problem, but it would not work with transient signals. Suppose the echo came 10 seconds after the original transient. Clearly no 'tone control' could fix it.
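The speaker-and-wall picture can be tried numerically. One detail worth adding is that the cancelling impulse is itself reflected by the wall, so in this idealised model the correction becomes a decaying train of alternating impulses rather than a single one. The delay, echo gain and number of terms below are hypothetical values chosen only to make the sketch run.

```python
import numpy as np

delay = 100  # echo delay in samples (hypothetical)
gain = 0.6   # echo amplitude relative to the direct sound (hypothetical)
terms = 8    # number of correction impulses to keep

# Room: direct sound plus one echo.
room = np.zeros(delay + 1)
room[0] = 1.0
room[delay] = gain

# Correction: impulses of amplitude (-gain)**k at multiples of the echo delay.
correction = np.zeros((terms - 1) * delay + 1)
correction[::delay] = (-gain) ** np.arange(terms)

heard = np.convolve(correction, room)  # what arrives at the listening position

print(heard[0], heard[-1])  # direct sound, and the one echo left uncancelled
```

Everything between the first and last sample cancels, and the single leftover echo has amplitude `gain ** terms`, so it shrinks as more correction impulses are kept.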
I'm not so naive that I believe that we can, or should, achieve perfect 'correction', but I would love to think that I had improved the general coherency of my system e.g. aligning the 'group delays' of my speakers as a by-product of the processing. Denis Sbragion who designed the DRC system above seems to have done a great job of ensuring that the processing can be subtle, and he has included ways to progressively reduce the correction at high frequencies with time, for example. If only I could make it work!
I have, in fact, managed to correct some music recorded at the microphone using my own deconvolution software. The straight recording sounds hollow and 'boxy' the way it always does when the microphone is some distance from the source (the problem we are trying to correct for), but when deconvolved with the measured impulse response (or almost, using the 'Inverse Kirkeby' method, microphone in the same location), the recording sounds much closer to the original - in headphones. But I have not got it to work 'for real'. When your head is in the same position as the microphone, you just don't perceive the same hollow 'boxy' sound without the correction, anyway. And if I pre-deconvolve the source audio with the measured IR, then it doesn't suddenly appear to my ears as though I'm listening with headphones or in an anechoic chamber, as I had expected! Instead, I hear strong 'pre-echoes' and colouration. Maybe I'm not understanding what I'm supposed to be doing with convolution and deconvolution. I have yet to 'close the loop' and record the sound of the corrected audio using the microphone and listen to it with headphones. If it sounds OK, but doesn't when 'live' with my head in the same position as the microphone, I'm not sure how to proceed from there.
Maybe you have already done this: without moving the mic, run the swept sine wave and calculate the FIR, then run the FIR-corrected swept sine wave and check whether the frequency and phase response have improved.
What mic and what digitizer are you using?
The idea is that at least you see real improvement from the measurement.
Headphones may not be "flat"; the only thing certain is that they do not have any audible reverberation/echo.
The first issue that gives you trouble is that you are assuming a linear system. Even a simple loudspeaker system is not linear. The voice coil heats in use and so the coil resistance goes up and the efficiency down. Air also has limits to linearity.
The second issue is that you can do this for only one location. As you have two ears, that places a second limit. A localization error of 10 µs is significant.
Third is that recordings, being binaural representations of three dimensional pressure modulations, are often improved by adding additional early reflections.
There are several commercial products that try this and although they sound different they don't particularly sound better.
But it is fun to play.
I think it is possible to correct (to some extent) the non-linearity, for example by combining multiple correction FIRs created at different sound levels and doing some kind of interpolation.
Localization is going to be a big problem, especially if there is more than one person listening.
How are you handling the noise in your room measurements? As you pointed out, deconvolution is a division in Fourier space, so any noise in the transfer function will have a devastating effect on the outcome (sometimes blowing up the whole calculation to infinity), especially in regions where the amplitude is low. There are many scientific papers on handling the noise effectively, but in essence, you have to fight it with the amount of data you take.
The "pre-echoes" you are hearing may well be derived from "ringing" as a result of deconvolution.
"Pulse" measurement of a system's transfer function is, in theory, at its best when the "pulse" is infinitely narrow and the slew rate is infinitely high. If you cannot get close enough to that in practice, then measuring the steady-state condition is the better way to go. Given that a lot of us can hear a difference in the -60dB region, you have to suppress noise even lower. As the noise goes down by the root of the number of measurements, you want to measure at least 1 million sine waves. Even at 20 kHz, it will be 50 seconds, while you need more than an hour for middle C (261 Hz). At least, you can try to increase the number of measurements in a practical manner and see whether it improves your deconvolution.
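The arithmetic behind those figures can be checked with a few lines of Python. I am reading the 1 million figure as the count needed for a 60 dB improvement when uncorrelated noise falls as the square root of the number of measurements; the function names are made up for the example.

```python
def averages_needed(improvement_db):
    """Measurements needed for a given noise reduction, assuming uncorrelated noise
    whose amplitude falls as the square root of the number of measurements."""
    return round(10 ** (improvement_db / 10))

def seconds_for_cycles(cycles, freq_hz):
    """Time taken to play a given number of cycles of a sine wave at freq_hz."""
    return cycles / freq_hz

n = averages_needed(60)               # number of cycles for 60 dB
t_20k = seconds_for_cycles(n, 20000)  # time at 20 kHz, in seconds
t_c = seconds_for_cycles(n, 261)      # time at middle C, in seconds

print(n, t_20k, t_c / 60)
```

This gives one million cycles, 50 seconds at 20 kHz, and roughly 64 minutes at 261 Hz, which matches the figures quoted above.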
By the way, are you doing your deconvolution in double precision floating point??
Thanks for the comments.
The microphone I'm using is a simple Panasonic WM61 capsule with a 10k bias resistor and a 9V battery. I did build my own Veroboard pre-amp, but I'm now using a Tascam mixer's microphone input. I have yet to build the capsule into a housing, so it currently exists at the end of a stalk of screened cable, reasonably far from any nearby surfaces.
Yes, it has occurred to me that the deconvolution process can involve division of something by nothing, or of nothing by nothing. I wasn't sure whether the best thing was to catch potential divide by (almost) zeroes and discard them, or also to pre-empt the problem to some degree by only dividing the bins within my sweep's frequency range. In theory the Kirkeby inverse filter is supposed to suppress huge boosts, as well. (I am carrying out all the calculations in double precision.)
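As a rough illustration of both ideas (protecting against near-zero divisions and treating bins outside the sweep range differently), here is a sketch of a regularised inverse of the form conj(H) / (|H|^2 + beta), which is the shape usually associated with the Kirkeby approach. The function name, the beta values, the band edges and the toy impulse response are all assumptions made for the example, not settings from any particular tool.

```python
import numpy as np

def regularised_inverse(ir, n_fft, fs, f_lo, f_hi, beta_in=1e-4, beta_out=1.0):
    """Inverse spectrum conj(H) / (|H|^2 + beta), with a larger beta outside f_lo..f_hi."""
    H = np.fft.rfft(ir, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    beta = np.where(in_band, beta_in, beta_out)  # heavier regularisation out of band
    return np.conj(H) / (np.abs(H) ** 2 + beta)

# Hypothetical impulse response: direct sound plus one echo.
ir = np.zeros(512)
ir[0] = 1.0
ir[60] = 0.5

C = regularised_inverse(ir, n_fft=4096, fs=48000, f_lo=20.0, f_hi=20000.0)
max_boost_db = 20 * np.log10(np.max(np.abs(C)))

print(max_boost_db)
```

With this form the gain in any bin can never exceed 1 / (2 * sqrt(beta)), so beta sets a ceiling on the boost instead of bins having to be caught and discarded one by one.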
I realise that the system isn't strictly linear, so a single filter can only be an approximation, but I was hoping for something reasonably close which I could then refine. I have a feeling I'm making an error somewhere, though, because although I can 'correct' a recording of uncorrected audio made with the mic at the listening position by multiplying its FFT by the FFT of the Kirkeby filter and taking the inverse FFT, I can't seem to achieve the same result by saving the Kirkeby's real part as a wav file, then later convolving it with music using one of the VST-type convolvers and recording the result with the mic. I'm doing something stupid somewhere.
I have just discovered this nice looking GUI front end for the Sbragion DRC system (don't know how I missed it before) which looks to simplify the process quite a lot, so I will be trying this ASAP.
Digital Room Correction Designer Help