rePhase, a loudspeaker phase linearization, EQ and FIR filtering tool

Hi Toino
Thanks :)

There are 3 different types of delay to be considered when a convolution is to be applied:
- in/out delay: conversion times, buffering, etc. (can be quite high with a PC...)
- convolution process delay: will be quite high for FFT convolution, which obviously needs buffering, somewhat lower for partitioned convolution, and zero for direct time-domain convolution
- impulse delay: this is the delay implied by the correction impulse itself, due to the position of the peak within the impulse. It will typically be half the impulse length for linear-phase correction impulses, very low (possibly zero) for energy-centered minimum-phase correction impulses, and very high (up to the full impulse length) for energy-centered impulses that purely linearize the phase of a minimum-phase system (see the sketch below).
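
To illustrate that third type of delay, here is a minimal Python sketch (function name and figures are mine, purely illustrative) that estimates the delay a correction impulse adds from the position of its main peak:

    import numpy as np

    def impulse_delay_ms(kernel, sample_rate):
        """Estimate the delay a correction impulse adds, from the
        position of its largest tap."""
        peak = int(np.argmax(np.abs(kernel)))
        return 1000.0 * peak / sample_rate

    fs = 48000
    # A linear-phase kernel has its peak in the middle...
    linear_phase = np.zeros(4096)
    linear_phase[2048] = 1.0
    print(impulse_delay_ms(linear_phase, fs))  # ~42.7 ms

    # ...whereas a minimum-phase kernel has its energy at the start.
    minimum_phase = np.zeros(4096)
    minimum_phase[0] = 1.0
    print(impulse_delay_ms(minimum_phase, fs))  # 0.0 ms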

All in all, hardware solutions implementing time-domain convolution, such as the openDRC, will impose the lowest possible delay for a given correction.

On the other hand, a PC with its soundcard buffers, routing, plugin architecture and FFT convolution will likely add a lot of delay on top of the one implied by the impulse itself.
Partitioning the FFT convolution will of course reduce this processing delay to some extent, but the other delays will remain unchanged.
 
I have read a lot of papers about partitioned convolution, and the claim that it reduces delay is there; but I also don't understand how that is possible...
Without partitioning you have to wait until your sample buffer (which is as large as the convolution kernel) is completely filled before the FFT/iFFT transforms can take place, yielding a fully transformed buffer which can then be streamed to the DAC. Even if you had a single Dirac as the first sample of the kernel, you would have to wait one full kernel length before output could start.

By dividing the FFT processing into smaller blocks you can reduce, but not eliminate, this unavoidable additional processing latency, which comes on top of the intrinsic delay of the impulse kernel itself (which is application-specific). Zero processing delay can only be had with time-domain convolution, which is practical only for very small kernels (which is why most hardware FIR XO systems downsample the MF/LF ranges to make convolution feasible).
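
For what it's worth, here is a minimal numpy sketch of uniformly partitioned overlap-add convolution, just to show where the block-size latency comes from (a generic textbook construction, not the internals of any particular convolver):

    import numpy as np

    def partitioned_convolve(x, kernel, block=256):
        """Uniformly partitioned overlap-add FFT convolution: each output
        block is ready after buffering only `block` input samples, instead
        of len(kernel) samples for a single big FFT."""
        n_parts = -(-len(kernel) // block)   # ceil(len(kernel) / block)
        fft_len = 2 * block
        # Transform each kernel partition once, up front.
        parts = [np.fft.rfft(kernel[i * block:(i + 1) * block], fft_len)
                 for i in range(n_parts)]
        # Frequency-domain delay line of past input-block spectra.
        history = [np.zeros(block + 1, dtype=complex)] * n_parts
        overlap = np.zeros(block)
        out = []
        for start in range(0, len(x), block):
            xb = x[start:start + block]
            xb = np.pad(xb, (0, block - len(xb)))      # zero-pad last block
            history = [np.fft.rfft(xb, fft_len)] + history[:-1]
            acc = sum(h * p for h, p in zip(history, parts))
            yb = np.fft.irfft(acc, fft_len)
            out.append(yb[:block] + overlap)           # overlap-add
            overlap = yb[block:]
        return np.concatenate(out)                     # tail truncated here

    x = np.random.randn(4096)
    h = np.random.randn(1024)
    y = partitioned_convolve(x, h)
    assert np.allclose(y, np.convolve(x, h)[:len(y)])

The point is that output starts after one block of input rather than one full kernel length, but that block-sized latency never goes away.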

With modern audio devices and operating systems, the I/O latency of PC-based systems can be ignored: it is often less than 128 samples.
 
An eyeballed baseline minimum is the cycle period of the lowest frequency you want to process, times the number of 360deg rotations you need to roll back, using the tweeter as time zero. 720deg @ 50Hz gives 40ms.
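
That rule of thumb is trivially just the period of the lowest frequency times the number of rotations:

    def min_delay_ms(f_low_hz, rotations):
        # period (ms) of the lowest corrected frequency, times the number
        # of full 360deg rotations to roll back
        return rotations * 1000.0 / f_low_hz

    print(min_delay_ms(50.0, 2))   # 40.0 ms (720deg @ 50Hz)
    print(min_delay_ms(30.0, 2))   # ~66.7 ms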

For the work I do latency isn't critical. Typically I use 32k-sample kernels @ 96kHz with time zero centered, so the delay is only 16k samples (~170ms).
For playback, other than the time difference from hitting "start" to when the music starts being played, latency is no concern.

Looks like the latency required to correct my PA down to 30 Hz would be far too long: over 20ms starts sounding like a discrete "slap" echo (when combined with the stage sound), and 170ms is into Pink Floyd territory.

Thanks!
 
Here are some data showing what result can be expected for a given delay, with (left) and without (right) iterative optimization of amplitude accuracy.
The amplitude scale is +/-1dB, and +/-0.1dB for the last cases.

The data are for a 24dB/oct LR filter at 100Hz, but can be easily transposed to other filters:
- doubling the filter slope implies doubling the delay for the same result
- halving the filter slope implies halving the delay
- going one octave lower implies doubling the delay (halving for an octave higher)
- going a decade lower implies 10 times the delay

As you can see 10ms for a 24dB/oct LR at 100Hz gives good results with optimization, and 8ms can already be considered good enough depending on application.

So based on these figures, a 30Hz BR correction would for example require ~33ms to yield the same result as the 10ms example.
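
Those transposition rules boil down to a simple proportionality, sketched here with the 10ms / 24dB/oct / 100Hz case above as the baseline:

    def required_delay_ms(target_f_hz, target_slope_db_oct=24.0,
                          base_ms=10.0, base_f_hz=100.0, base_slope_db_oct=24.0):
        # delay scales inversely with frequency and linearly with slope
        return base_ms * (base_f_hz / target_f_hz) \
                       * (target_slope_db_oct / base_slope_db_oct)

    print(required_delay_ms(30.0))         # ~33.3 ms for 24dB/oct at 30Hz
    print(required_delay_ms(100.0, 48.0))  # 20.0 ms for 48dB/oct at 100Hz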

If you also have a crossover around 100Hz, for example, then it will also have some impact on the BR phase shift, and you will possibly need more delay to achieve good results.
The filter to consider for your calculation is the lowest one you want to correct, but you also have to take into account filters that are "not too far" above it, as their phase shifts will combine.

If you want to limit yourself to 15ms, for example, then instead of trying to correct the full BR (and getting a result similar to the 5ms example) you could correct only half of its phase shift (virtually turning your BR into a sealed box, phase-wise) and get a good result (same as the 10ms example).
 

Attachments

  • rephase phase linearization with optimization.jpg (217 KB)
  • rephase phase linearization without optimization.jpg (222.9 KB)

Remarkably, complex FR correction can be applied above, say, 200Hz with very short latency, and will likely produce a much more tangible result for a vastly larger set of people.
 
At the moment I'm using my M-Audio Fast Track Pro as the output, it has 4 channels out. The player is JRiver. I just tell JRiver that I want 4 output channels and the convolver script text file does the routing, crossovers and such. I currently have a file for HP and a file for LP, each 24 bit mono wav. JRiver hits the 4 channels with Left Low, Right Low, Left High, Right High, then it's straight out of the Fast Track to the amps. So JRiver is doing the splitting into 4 channels, I tell the convolver which impulse to use and where to route it. Simple, but took some time to figure out.
What I'm suggesting is to refine the "script", for natively supporting IIR BiQuad filters and FIR filters.
The FIR filter gets described just as usual, using a monophonic .wav file.
Each IIR BiQuad filter would get described using a short stereo .wav containing 3 stereo samples. The non-recursive part of the IIR BiQuad would be described using the left channel. The recursive part of the IIR BiQuad would be described using the right channel.

The script would be a text file just as usual, naming a succession of .wav files. Say 8 x IIR BiQuad filter, plus 1 x FIR filter.
A typical two-way Xover crossing at 2 kHz would specify 8 x IIR BiQuad filters in series, followed by a single 128-tap FIR.
A typical two-way Xover crossing at 200 Hz would specify 8 x IIR BiQuad filters in series, followed by a single 1024-tap FIR.
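
To make it concrete, such a script could look something like this (purely hypothetical syntax and file names, not the format of any existing convolver):

    # hypothetical crossover script: 8 BiQuads in series, then one FIR
    biquad_notch_5kHz.wav
    biquad_peq_2.wav
    ...
    biquad_peq_8.wav
    fir_fine_128taps.wav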

Basically, the aim of the 8 x IIR BiQuad filters is to shorten the speaker impulse response prior to applying the FIR filter. Most bass and midbass drivers exhibit a sharp cone resonance at approx 5 kHz. Some exhibit a +5 dB resonance, others a +15 dB resonance. Their impulse responses thus exhibit ringing. If you intend to suppress such ringing using a FIR filter, the FIR filter needs to be long, covering tens of milliseconds. A more suitable approach is to suppress the ringing using an IIR BiQuad filter configured as a -5 dB or -15 dB notch at 5 kHz with the correct Q factor. Consequently, the FIR filter that follows doesn't need to cover tens of milliseconds. It can be 128-tap (less than 3 ms long at 44,100 Hz), fully dedicated to the fine gain and phase linearization.
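
As a sketch of that notch, using the well-known RBJ cookbook peaking-EQ formulas (the 5 kHz / -15 dB figures are the example values from above; the Q is an arbitrary placeholder):

    import numpy as np
    from scipy.signal import lfilter

    def peaking_biquad(fs, f0, gain_db, q):
        """RBJ cookbook peaking EQ; a negative gain gives the cut
        needed to tame a cone-breakup resonance."""
        A = 10.0 ** (gain_db / 40.0)
        w0 = 2.0 * np.pi * f0 / fs
        alpha = np.sin(w0) / (2.0 * q)
        b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
        a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
        return b / a[0], a / a[0]

    b, a = peaking_biquad(fs=44100, f0=5000, gain_db=-15.0, q=5.0)
    # y = lfilter(b, a, x)  # run the driver signal through it before the FIR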

Oh, and I'm forgetting an advantage. Tuning an IIR BiQuad filter is easy and intuitive: it's like operating the pots of a PEQ. After having suppressed the main ringing, you'll get motivated to suppress second-order ringing, to fill bumps in the frequency response, and to pre-filter the signal using a lowpass and/or a highpass. Doing so, you will be amazed at how a succession of eight IIR BiQuad filters can massively improve (shorten) the impulse response.

This way, the FIR filter that follows appears as a nice refinement, not only for linearizing the amplitude response into a 2 dB corridor, but also for reaching a smooth, controlled phase.

Such an approach was applied in the Philips DSS930 and DSS940 digital speakers, back in 1993. Two-way speakers, crossover around 3.5 kHz, DSP at 44.1 kHz. Only 41 taps in the FIR filter, if I remember correctly. Outstanding overall impulse response as a result.
 
FIR filters are very good devices for audio, provided they are used in a proper way.

1- FIR filters can be long, yet describe low orders, say order 3
2- avoid high orders tending towards brickwall filtering, as they generate ringing
3- when crossing at 2 kHz, very good results can be obtained using a 128-tap FIR filter (see the sketch after this list)
4- when crossing at 200 Hz, very good results can be obtained using a 1024-tap FIR filter
5- an FIR-based crossover can be designed to implement a symmetric phase-linear 3rd-order slope (3rd-order highpass and 3rd-order lowpass)
6- AND designed for zero relative phase shift
7- AND designed for zero overall phase shift
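
Here is a scipy sketch of points 3, 6 and 7: a 128-tap linear-phase lowpass at 2 kHz plus its complementary highpass, whose sum is a pure delay (the generic complementary-pair construction, not necessarily the exact design meant above):

    import numpy as np
    from scipy.signal import firwin

    fs, taps, fc = 44100, 128, 2000

    # Linear-phase lowpass prototype (odd length keeps the delay integer).
    lp = firwin(taps + 1, fc, fs=fs)

    # Complementary highpass: a delayed Dirac minus the lowpass.
    hp = -lp
    hp[taps // 2] += 1.0

    # lp + hp is a pure delay of taps/2 samples: flat summed amplitude,
    # zero relative phase shift and zero overall phase shift.
    recon = lp + hp
    assert np.isclose(recon[taps // 2], 1.0)
    assert np.allclose(np.delete(recon, taps // 2), 0.0)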

The IIR BiQuad filters I'm referring to in the post above are only there to maximize the FIR-based crossover's efficiency. The FIR filter's efficiency is maximized when it doesn't need to battle the speaker cone breakup and ringing at high frequency. A single carefully tuned IIR BiQuad filter will eradicate the nasty effect of the main (first) speaker breakup mode. Adding a few more tuned IIR BiQuad filters will eradicate the effects of higher-order breakup modes. This way the tweeter's emission won't get garbled by spurious emissions coming from the midbass speaker.
 
Regarding IIR BiQuad filters, it's here:
http://www.diyaudio.com/forums/digi...n-help-digital-audio-filters.html#post3692813

A0, A1, A2 are the coefficients in the recursive path.
B0, B1, B2 are the coefficients in the non-recursive path.

Those six values can sit into a stereo .wav containing 3 stereo audio samples.
To avoid numbers outside the (-1, +1) range, we would divide them all by two inside the .wav.
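
A quick sketch of that packing (scipy used for brevity; I'm assuming 32-bit float samples, since the post doesn't specify a sample format):

    import numpy as np
    from scipy.io import wavfile

    def write_biquad_wav(path, b, a, fs=44100):
        """Pack one BiQuad into a 3-sample stereo .wav as described above:
        left channel = non-recursive (B) coefficients, right channel =
        recursive (A) coefficients, all divided by two to stay inside
        the (-1, +1) range."""
        b = np.asarray(b, dtype=np.float32)
        a = np.asarray(a, dtype=np.float32)
        frames = np.stack([b, a], axis=1) / 2.0  # shape (3, 2): L=B, R=A
        wavfile.write(path, fs, frames)

    # e.g. for the notch of the attached screenshot, with b0..a2 taken
    # from whatever tool designed it:
    # write_biquad_wav("4000Hz_notch.wav", [b0, b1, b2], [a0, a1, a2])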
 

Attachments

  • 4000 Hz notch -6dB Q3.0.png (65.5 KB)
Excuse me for answering an old post in this thread. I'm just treading water here...

Why use a specialised DSP board when a near-silent PC can be had for a couple of quid?
And then you can develop on the target machine using free software. 32-bit floating point is very quick, and it's powerful enough to use 64-bit if you so desire (even on a 32-bit machine). Plus you lock everything to the same sample clock (the sound card is the destination for the audio stream, the source for your crossover app, and the destination for your crossover's output).

All this talk of SHARCs etc. makes me think I'm missing something, and also makes me head spin at the learning curve involved.

There's only one reason to justify the SHARC and Mini DSP at this point in the game: Making a reliable backup to your Windowz computer. For example, I'm using Acourate Convolver for my main audio system, 64-bit, dithered volume control, to 24 bit on the way to the DACs, superb convolver, 65k impulse response length, etc. etc. etc. What's not to like? Uh... yeah, Windowz. Can you say, "excuse me, I have to reboot my loudspeakers." So, basically, I'm learning how to get the best performance possible from the Mini-Sharc as a backup system. It boots up instantly, it's stable, and with the best optimization it should get me through a few days of main computer being down.

I do wish someone would conquer developing an Intel-based convolver in an embedded system that doesn't require windowz and does what the Mini-DSP system does. We're just not going to see a dedicated DSP chip with the speed and power of an Intel i7.