A convolution-based alternative to electrical loudspeaker correction networks

It makes sense that with a limited filter length you would want to focus on at least getting the magnitude/min phase response right.

I use single-driver (sealed) speakers, so EP correction that makes a noticeable difference for me means stronger room correction, which is not the direction I want to go in. From right in front of the speakers (I measure from the sweet spot), I didn't think 1 cycle of EP correction sounded too bad. With 2 cycles, the attack portion of electric bass and kick drum starts softening much more noticeably, and with 4 cycles it sounds like it's going in reverse (it actually is :)). If you look at the custom config file, you'll see that I decided on 1 cycle, but there are some other settings that allow it to go a bit higher in areas where there are large response dips. I thought that was a good compromise for people who want to hear a noticeable difference compared to the min phase filter and aren't overly concerned about performance far outside of the sweet spot.
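For readers who want to poke at this themselves: the minimum-phase/excess-phase split that these EP windows operate on can be sketched in plain numpy. This is the generic textbook cepstrum method, not DRC's actual implementation, and the function names are my own:

```python
import numpy as np

def minimum_phase(h, oversample=8):
    """Minimum-phase version of impulse response h, via the real cepstrum."""
    n = len(h) * oversample                # zero-pad to limit cepstral aliasing
    H = np.fft.fft(h, n)
    # cepstrum of the log-magnitude only (the phase is discarded)
    c = np.fft.ifft(np.log(np.maximum(np.abs(H), 1e-12))).real
    w = np.zeros(n)                        # fold: keep c[0], double the causal part
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    return np.fft.ifft(np.exp(np.fft.fft(c * w))).real

def excess_phase(h, oversample=8):
    """All-pass (excess-phase) remainder H / H_min; its magnitude is ~1."""
    n = len(h) * oversample
    return np.fft.fft(h, n) / np.fft.fft(minimum_phase(h, oversample))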

This post still rings in my head. If we correct at a single listening position and use a long EP filter (4 cycles is long there), we'd be correcting for the floor bump and ceiling bump. In other words, we try to move that bump in line with the first sound wave. In all positions other than that measurement point, you'd risk having the (moved) floor bump arrive ahead of the direct sound.

That's why I don't correct all anomalies in my left and right channel. If you look through REW and flip through the gating while looking at all plots, only the "bent" sound can be fixed; sound arriving too late (in general), in other words. Floor bumps and side wall reflections don't do that. They do not bend the sound if there is some distance between them and the listening position. They cause a double peak (and dips). If we move the woofer (if we have one) close to the floor it can act in line with its floor reflection and be treated as one. That's the merit of floor-to-ceiling arrays and the reason they don't have obvious floor or ceiling reflections.

The separate drivers (and their reflections) act like multiple measurements. Each has its own set of reflections, but the next one's will be slightly different.
So the total effect of each single reflection is rather small. Only things that are at an equal distance as seen from the listening area can still have a larger effect and should be treated.

Now why am I typing all of this? This is the reason why I could use a longer (~ 3 cycles) phase window compared to a single driver speaker and not run into immediate trouble. Averaging works.

Bringing a woofer down to the floor would work, though not at all frequencies, just at longer wavelengths. If we get clever with it, it could help minimise room effects. That is, if we really want to "remove" the room (as much as we can) from what we hear. I do realize that isn't everyone's goal.
 
Taking this a step further, one could use a Synergy style horn for everything above the floor bump and an array to handle the low end. Provided the array is short enough not to really act as a true line source (i.e. dropping off only 3 dB for each doubling of distance).
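As a quick sanity check of that 3 dB figure (my own back-of-envelope helper, not anyone's measurement):

```python
import math

def spl_drop(distance_ratio, line_source=False):
    """SPL change (dB) when listening distance grows by distance_ratio.
    Point source: -6 dB per doubling; ideal (infinite) line source: -3 dB."""
    per_doubling = 3.0 if line_source else 6.0
    return -per_doubling * math.log2(distance_ratio)
```

`spl_drop(4)` gives -12 dB for a point source but only -6 dB for an ideal line source; a short array stops behaving like the latter beyond some distance, which is the condition mentioned above.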

The array bass would average out floor/ceiling reflections and most room modes and the horn would avoid the rest of the walls. :)

Or average out everything, like B&O does, or like the new Lexicon speaker.

A single driver will always have specific problems related to its position unless we take away that problem with treatment. Or we could move up close, nearfield, so the reflections would be down in SPL, just like we do when we make semi-anechoic measurements.

I guess I'm saying: work with the room, we cannot avoid it (completely).

DSP cannot fix the room, though it can fix the (overall) tonal balance.
 
In the case of the experimental filter, the reason for the longer (in the midrange anyway) ERB window is not to perform a higher resolution correction (increased location dependency) but to initially highlight the portion of the response that has the greatest impact on our perception. The psychoacoustic stage interprets the peaks and dips of that response with an algorithm rather than just flipping everything upside down, and the response of the newer filter is actually smoother than that of the 4 cycle inversion (especially in the lower portion of the spectrum). Of course, a smoother filter response equates to fewer problems away from the sweet spot, and this technically comes at the expense of reduced accuracy at the sweet spot. The reason I'm excited about this method is that I feel the *perceived* sound at the sweet spot is slightly more neutral.
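For readers unfamiliar with the ERB scale mentioned here: it follows Glasberg and Moore's bandwidth formula, growing wider with rising frequency. A toy smoother along those lines (my own illustration, not DRC's actual ERB stage):

```python
import numpy as np

def erb_bandwidth(f):
    """Glasberg & Moore equivalent rectangular bandwidth (Hz) at frequency f."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_smooth(freqs, mag):
    """Smooth a magnitude response with a Gaussian roughly one ERB wide
    at each analysis frequency (wider windows higher up, like our hearing)."""
    out = np.empty_like(mag, dtype=float)
    for i, f in enumerate(freqs):
        w = np.exp(-0.5 * ((freqs - f) / (erb_bandwidth(f) / 2.0)) ** 2)
        out[i] = np.sum(w * mag) / np.sum(w)
    return out
```

Narrow dips largely disappear after smoothing while broad trends survive, which is roughly why a perception-weighted stage can produce a smoother filter than a straight inversion.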
 
Taking this a step further, one could use a Synergy style horn for everything above the floor bump and an array to handle the low end. Provided the array is short enough not to really act as a true line source (i.e. dropping off only 3 dB for each doubling of distance).

I guess I'm saying: work with the room, we cannot avoid it (completely).

DSP cannot fix the room, though it can fix the (overall) tonal balance.
Where have we seen this before? :)

Putting the horn in the corner totally eliminates wall reflections but ceiling and floor have to be dealt with.

I chickened out on a 2nd bass bin sitting on top of the Synergy, so I didn't get to the array bass point. I feared for my back at the thought of putting it in place, and then that it might not stay there.

Just having the woofer close to the floor for 300 Hz and down helps enormously. And it turned out I need the space above the horn for bass traps to attenuate longitudinal modes.



I think there should be a Murphy's-law kind of rule for room acoustics: there is always one more reflection or mode to be dealt with.
 

Attachments

  • full room pic with both bass traps in place.jpg


Nice setup, but did those "bass" traps actually attenuate any bass reflections?
 
In the case of the experimental filter, the reason for the longer (in the midrange anyway) ERB window is not to perform a higher resolution correction (increased location dependency) but to initially highlight the portion of the response that has the greatest impact on our perception. The psychoacoustic stage interprets the peaks and dips of that response with an algorithm rather than just flipping everything upside down, and the response of the newer filter is actually smoother than that of the 4 cycle inversion (especially in the lower portion of the spectrum). Of course, a smoother filter response equates to fewer problems away from the sweet spot, and this technically comes at the expense of reduced accuracy at the sweet spot. The reason I'm excited about this method is that I feel the *perceived* sound at the sweet spot is slightly more neutral.

Could you show us where it is smoother? There are so many graphs that make up an IR that "smoother" is a pretty relative statement. :) I'm always a proponent of seeing it in graphs, the more the better.

I'd accept the theory that ERB filtering tries to bring the human perception part of listening into the equation, but in my humble opinion that "might" not be the best or ideal way to go about correcting the actual loudspeaker movement that creates the actual wave front.

Personally, my main interest has been to make the loudspeaker movement follow its input as ideally (textbook) as possible; however, if I want to look at the perceived part, I'd probably filter the end result with an "ERB filter" to see how it performs there.

The signal that goes into the speaker to create the actual impulse measurement (IR) is pretty short in duration. I try to only guide that signal, the actual driver movement, up to its peak SPL level. That's what can be seen best in early waterfall plots or 1/3 octave filtered IRs. If you have a clear ridge in the early waterfall plot (mimicking the original incoming signal from the amp), the speaker acts like that ideal transducer as much as possible.
It's the reason why I asked for a comparison between your two different ways of correction. The actual measurements at the listening position after correction would be even better.
Everything that happens after that peak will still alter the perceived sound, which is why there can still be all kinds of perceived differences. The more we avoid or remove early reflections, the closer the end result will be to the adjusted curve of that first wave front.
After optimising speaker behaviour I do look at how we perceive that sound, at which point ERB might come into play to look at how our perception interprets the sound presented to the ears.
Getting that first wave front as ideal as possible will also clean up the ERB view anyway, without having to use it as the base for processing.

Even though the DRC documentation states ERB will not be a much stronger correction than the minimal template, its longer window in the mid frequencies looks past the peak SPL of the actual incoming signal and adjusts its frequency curve there. Its window is very short at bass frequencies and didn't work for my setup on the low end at all.
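For anyone wanting to experiment with these cycle-based windows outside DRC, a naive per-bin sketch looks like this (deliberately simple and slow; the function name and structure are mine, real tools like DRC or REW's frequency-dependent window do this far more efficiently):

```python
import numpy as np

def fdw(ir, fs, cycles=4):
    """Frequency-dependent window: at each frequency f, keep roughly
    `cycles` periods of the impulse response around its main peak."""
    n = len(ir)
    peak = int(np.argmax(np.abs(ir)))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    out = np.zeros(len(freqs), dtype=complex)
    out[0] = np.fft.rfft(ir)[0]                 # DC bin: no cycles defined
    for i, f in enumerate(freqs):
        if i == 0:
            continue
        half = int(round(cycles * fs / f))      # window half-width in samples
        w_full = np.hanning(2 * half + 1)       # window centred on the peak
        w = np.zeros(n)
        lo, hi = peak - half, peak + half + 1
        a, b = max(lo, 0), min(hi, n)
        w[a:b] = w_full[a - lo:(2 * half + 1) - (hi - b)]
        out[i] = np.fft.rfft(ir * w)[i]         # keep only this frequency's bin
    return freqs, out
```

With cycles=1 only the immediate arrival survives at each frequency; with cycles=4 progressively more of the room is folded into the corrected response, which is the trade-off discussed throughout this thread.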

The difference for me, personally, is that I do want to get my speaker to act as the most precise device I can get it to be (the reason to use shorter duration filters), but I do adjust the (overall) FR curve over a longer window to adjust the actual tonal or perceived balance.

I split that part up into separate goals of the journey: basically, correcting the speaker response the best I can technically, and adjusting the tonal balance in listening over a longer time frame.
This took a lot of my time and isn't an automated process, but over time it did give me an idea what to shoot for as far as a room target (or actually my preferred room target) goes.
The room will always alter the perceived tonal balance. I'm sure we can all agree on that.
Treating the room made the end curve way more linear and much more in line with the target set or used in processing. What I learned in listening to a wide variety of songs and genres, adjusting the perceived tonal balance over time, I put back into the processing chain later on. Step by step, gradually growing to a (short) correction that translates to what I hear (I did not use an actual ERB type view but my own ears teaching me what to look for, (re)viewing my preference in measurements to record the differences).

I don't constrain myself to only listening in that sweet spot; I live with it and get to know how it works out at different places in the room, still listening to it even in adjacent rooms to pick up what it sounds like over there.

Treating the room by removing early reflections presented a new problem while fixing the other parts I was looking at. It resulted in a tonal difference I heard between the phantom centre and the sides, which only becomes obvious in the absence of early reflections.

Fix one thing and the next problem will surface :).

It's a long route (taking up a huge amount of time) I took, but it has been very educational. At first I was just looking at room curves, ERB targets and other textbook examples.
Doing experiments with different room curves taught me a great deal about how very small changes can actually lead to huge perceived differences. Much bigger than I ever expected or anticipated.
That's when I decided to "grow" into a target, one that would fit my own room and listening preference, taking the time to learn how to find that by re-doing measurements to show me what I preferred.

I try to keep in mind all the text and theory I've absorbed, like ERB, Harman's room curve research and things like sound power in the room, but test each part to confirm it for myself. Basically questioning every part of what I've read and creating my own truth instead of just accepting the views of others on this subject as the truth.
This is just my view though; I'm only spilling it here because it has taught me way more than blindly accepting theory. No one needs to agree with me. :eek:

In this journey I've come to my own conclusion about linear phase too. Going in, I had given myself the task to create a time coherent, linear phase result at the listening spot. That was the reason to go with full range arrays. I ended up preferring to adjust my setup as a minimum phase band pass device. I've spent a lot of time on that too, as it's way more difficult to get all of that right (or sounding right) than simply adjusting phase at a single position.

I see a lot of reports on this forum where people claim phase isn't that important, but when asked they are rarely willing to tell me how they actually tested it to come to that conclusion.
In a room it is extremely difficult to align phase without going through each measurement with a fine-tooth comb, looking for every detail out in that room. Every bit of room influence messes up that phase, so simply adjusting a single phase plot with a program like RePhase would, in my humble opinion, result in failure to create a valid phase correction at the listening spot. It took me half a year to come close to something that worked. pos keeps hammering home that the original purpose of RePhase is to linearize the known phase deviations of the crossovers used, and rightfully so.
Many seem to think they can simply straighten the phase of a measurement with RePhase and call that time coherency, overlooking the fact that each and every change out in the room will actually influence and change the phase of that first wave front. So changing the phase of a single measurement is trying to change the room + speaker. When you play with phase (not directed at anyone in particular), keep an eye on that IR. Every wiggle in front of the main peak is energy arriving early. Filter a measurement IR like that with a gate that stops just before the main peak to get an idea of the frequencies (and their level) that ring before they should start making sound.
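That last suggestion is easy to script. A minimal sketch of the idea (gate everything from the main peak onward and look at the spectrum of what remains; a helper of my own, REW's IR windows can do the same interactively):

```python
import numpy as np

def pre_arrival_spectrum(ir, fs):
    """dB spectrum of everything arriving before the main peak.
    Energy here is pre-ringing: frequencies sounding before they should."""
    peak = int(np.argmax(np.abs(ir)))
    early = np.asarray(ir, dtype=float).copy()
    early[max(peak - 1, 0):] = 0.0          # keep only samples before the peak
    freqs = np.fft.rfftfreq(len(early), 1.0 / fs)
    mag = 20 * np.log10(np.maximum(np.abs(np.fft.rfft(early)), 1e-12))
    return freqs, mag
```

A clean minimum-phase result shows nothing but the noise floor here; a linear-phase correction that drifted off target in the room shows distinct ridges at the frequencies that arrive early.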

Greg, I certainly don't intend these posts I type as commenting on or disregarding your current experiments. The sole reason to type them is to make all readers here think. Make them think harder if possible, and question themselves and others too. I question myself all the time as I really don't think I know it all :).
Making connections between graphs and my ear/brain combination is my goal, and I'm making progress in being able to "read" graphs. But there are so many variables, and I sure haven't figured out all of them (yet). The brain interpretation part is the most difficult part of all. Even that part slowly makes more sense, and can be linked to a lot of common theory out there.

I'll stick to optimising the speaker performance first, and continue to take the perceptual part as a separate second step. Too bad I don't have much free time anymore, because I can still think of a million things to try :D.

I'm looking forward to the recently announced REW developments to make it simpler/easier to average multiple measurements.

We really should all have anechoic rooms available to play in, unlimited budget and time, etc... How I wish I could play like that! A binaural head would be fun too, using it to capture the sound at the listening spot for others to evaluate through headphones. That should work better to relate back to what we hear at that spot.
 
Ronald,

Thanks so much for all of your thoughtful posts here. I do read them all very carefully :). I'm sorry that I don't have the time right now to respond to all of your points, but I don't think you necessarily intended for me to anyway. I don't think there is much of anything at all we disagree on, I'm just interested right now in finding a simpler and more automated process to give me results that I'm happy with.

I think I might need to clarify one thing: it's the response of the newer filter itself (not the corrected response) that is smoother than that of the 4 cycle inversion.

Anyway, I'm looking forward to sharing some graphs - let me know what you'd like to see exactly. Today, however, is one of maybe two days this week that I can get away from the computer. Off to the shop for some hardware processing...;).
 
Ronald,

Thanks so much for all of your thoughtful posts here. I do read them all very carefully :). I'm sorry that I don't have the time right now to respond to all of your points, but I don't think you necessarily intended for me to anyway. I don't think there is much of anything at all we disagree on, I'm just interested right now in finding a simpler and more automated process to give me results that I'm happy with.

Very clear goal (the automated process) and I wouldn't want to detract from that with my ramblings. :)
My posts were intended to dump some thoughts that might inspire others. Or to prove that I'm simply a nut case ;).

I think I might need to clarify one thing: it's the response of the newer filter itself (not the corrected response) that is smoother than that of the 4 cycle inversion.

That's way more clear for me :D.

Anyway, I'm looking forward to sharing some graphs - let me know what you'd like to see exactly. Today, however, is one of maybe two days this week that I can get away from the computer. Off to the shop for some hardware processing...;).

You can't run APL_TDA yet, can you? In that case a waterfall plot with these settings would be interesting, as measured at the listening position:
[attached image: wfs.jpg]


A spectrogram (and/or wavelet) showing the first 30 ms would help too.

I'd like to see the first ~30 to 50 ms of the filtered IR too (in dBFS, to see levels of reflections and decay), and an IR up to ~20 ms. Just to compare these two different methods.

The frequency response curve filtered with a FDW of 1/6 oct, showing phase, would help too. And a left/right FR curve of each method to show the balance.

I could think of more things to look at but these will get us started by looking at possible differences in these plots. We may need to dig deeper.

On and off axis measurements would be way more work; I'm only interested in those if you have the time. The rest is easy to extract from what you probably have already.

One note: I'd love to see these from an actual measurement at the listening position with corrections applied. The predictions from DRC-FIR are close but do differ in practice.

It's just curiosity that makes me ask for these. If we hear differences, I always want to find them :D.
 
@wesayso,

thanks to gmad, I figured out how to convolve a response with a music file in foobar, and then ABX it in foobar.

I am currently ABXing a min phase and a linear phase filter, all else equal, even the magnitude target.

On my first attempt I gave up, even though the min phase and linear phase filters are very different in the time domain. In reality they seem to sound exactly the same, which makes sense, because when you look at the phase data for the corrected filters it's clear that most of what the linear filter is doing is slowing down the arrival of the tweeter sounds to match the woofer arrival.

I will create a dropbox account and post a blind AB of min phase vs linear phase.
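For anyone replicating this outside foobar, the convolution step itself is tiny. A mono sketch with numpy (the peak-normalisation policy here is my own choice; foobar's convolver handles levels internally):

```python
import numpy as np

def apply_correction(audio, filter_ir, headroom_db=1.0):
    """Convolve a mono track with a correction filter's impulse response,
    then scale to leave some headroom so the rendered ABX files don't clip."""
    out = np.convolve(np.asarray(audio, dtype=float),
                      np.asarray(filter_ir, dtype=float))
    peak = max(np.max(np.abs(out)), 1e-12)
    return out * (10 ** (-headroom_db / 20.0) / peak)
```

Level-match the min phase and linear phase renders (e.g. by RMS) before ABXing, otherwise the louder file tends to win.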
 
A headphone test is, in my opinion, not the ideal way to test this.
A large part of music is felt as well as heard. Headphones won't give you that.

I was sure there would be large differences from aligning phase, and still am, but they're different from what I thought going into this project. As said, opting for linear phase makes it easy to mess things up and have some frequencies arrive in advance.

The effects of phase rotation are subtle in most sounds, but differences do exist. You can even train yourself to pick up differences more easily once you know what to listen for.
Check out the actual arrival of all the frequencies at your listening spot. Most rooms make a mess of the bass frequencies, which can actually be more devastating for the sound than any group delay from a crossover.

I'll agree it's going to be hard to hear between two songs on headphones.
But there's more at play here. Get the bass part to arrive sooner than the high frequencies and I promise you'll notice it immediately.
Get it arriving late and it will be much more forgiving. Yet the reason for me to try and get it "just right" has more to do with overtones and harmonics. And it will be more revealing in a real stereo setup with proper bottom end extension too.
We do react to timing cues, and a time coherent setup can sound quite scary reproducing sudden sounds like a gun shot or even a drum hit.

At some point you'll hear the difference between Boem! and Booeem! Where that point lies is hard to tell. Getting the bass part first (meoB!) will reveal itself immediately ;). Just kidding in this part.

To get closer to real life sounds I wanted to try this. Not with headphones, but with a stereo setup that has good imaging already. I was expecting imaging differences and even improvements. I was wrong there; there weren't many clues that it helped imaging. However, due to the array nature of my speakers I had a phase bend at ~150 Hz, right where the impedance peak is. Sound was a little slower there. Fixing that gap was a noticeable improvement; more fun with kick drums! A keeps-you-on-your-toes kind of sound.

In pictures that sort of looks like this:
An APL plot without any FIR correction at all:
[attached image: APL_Demo_wesayso no cor.jpg]

See how the sound is late between 100-200 Hz, not even building up yet? It's "less than" 15 ms.

After the "fix" things start to look different:
[attached image: APL_Demo_wesayso.jpg]

The build up on the left side is still smooth, no early sound. Let's move the peak right a bit:
[attached image: stereo.jpg]


Don't mind the slight ridge you see in both corrected plots. This just means I didn't align the microphone perfectly between both speakers. You'll see they happen at slightly different frequencies, as these were two separate stereo measurements.

So my illusion (prior to trying it out) that it would matter for imaging wasn't true. But it still makes a difference (both tonally and in "jump" factor) and resembles the real sounds heard in life. A sudden strike of lightning very close to you will make you jump.
I remember riding my bicycle through town when someone dropped a big wooden beam from the back of a truck right next to me as I was passing it. That! is what I want. :)
Headphones won't reveal that as easily. A system playing loud enough from 20 Hz to 18 kHz just might. It's the vibration you feel in all of your body lining up with the sound that I'm after.
Bang on a drum hard enough in your room to hear how it sounds. It will be immediate sound without delay, followed by the room reflections etc.
Overtones and harmonics lining up without a crossover is another thing. A good read on that is what Troels put into words better than I could.

There are differences to be observed even between crossovers; read Troels' reasons to go with 2nd order instead of 4th: Siri's Killer Note
Of course I've tracked down that track and can confirm Siri's killer note sounds killer on the time coherent arrays. ;)

//rambling mode off....
 
I hope it's ok that:

-the FR graphs are L/R, everything else is 2 channel average

-this is all MP response only (the filters are not touching the EP response anyway).

Obviously, the custom filtered response is going to look "better" overall since it's a stronger filter. I think the most meaningful comparison here is probably the 24 cycle frequency-dependent windowed FR. Let me know if I forgot/screwed up anything, and if you see anything interesting.
 

Attachments

  • graphs.zip
The most meaningful difference here for me is the balance difference.
It does not seem like much, but it is happening over a wide area.
The Psycho version could seem more detailed because of the higher high frequency balance.
This can be seen in all graphs in a direct comparison.

I'm still liking the 4 cycle correction, but would tweak it; not your goal obviously.
I do expect you could better each one if you look for parameters that fix the 2.7 kHz bump. As can be seen, the ERB correction does some of that correction, but dip limiting does not allow it to be fixed better. I do believe that part could easily be improved/fixed with min phase PEQ, so DRC should be able to do a little better and remove the excess ringing (it's most probably driver related). Even though this is a 2 channel average, I'd expect to see something in both single driver graphs indicating a driver issue; if it weren't at the same spot in both measurements, the averaging would have removed it from the sum.

[attached image: Animation1.gif]


As said, balance would be the biggest noticeable difference. Room for improvement in both; not by lengthening the correction, but by optimising settings. I'm only looking at 2 kHz and up in the waterfall plots.
 

Attachments

  • Animation1.gif
I just compared the filters with (A-weighted RMS level matched) white noise (files are now in the folder). I still feel that the psycho filter provides a slightly more neutral sounding midrange, but I think the important thing here is how similar they sound despite the different tonal balances some of the graphs might suggest. If we were (for the sake of argument) to call it a draw in terms of perceived sound quality, the advantage would then go to the filter with less resolution (since that should allow for better off-axis performance).
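The A-weighted level matching described here can be reproduced with the standard A-weighting curve from IEC 61672. A frequency-domain sketch (my own helpers, not the exact tool used for these clips):

```python
import numpy as np

def a_weight(f):
    """IEC 61672 A-weighting magnitude (linear gain, unnormalised)."""
    f2 = np.asarray(f, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return num / den

def a_weighted_rms(x, fs):
    """A-weighted RMS level of a signal, applied in the frequency domain."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    w = a_weight(f) / a_weight(1000.0)      # normalise to 0 dB at 1 kHz
    y = np.fft.irfft(w * X, len(x))         # back to an A-weighted time signal
    return np.sqrt(np.mean(y ** 2))
```

Scaling both noise files so their `a_weighted_rms` values match removes the loudness bias that otherwise dominates this kind of comparison.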

I think the two things I've been most interested in accomplishing with DRC for a little while now have been (regarding perceived sound quality) improving the reliability of the automated response, and reducing filter resolution with minimal impact. To be more sure about whether this new process meets those goals, I'll need some more time to live with it, as well as feedback from other DRC users.
 
So the psycho filter allows for better off-axis performance because it has less resolution.
Even less resolution than that would be to use no filter at all. Is the off-axis performance at its best now, without any filtering at all? :)

This above is why I do not agree with a statement like that until we actually see proof of what it does off axis. Who knows, the 4 cycle correction might still be doing better!

On paper the custom filter performs best, but in real life you do not prefer it. How can we learn why that is? I'd say by making the custom correction similar in frequency balance to the Psycho and comparing again.
Another subsequent test would be to see how both perform between 0 and 30 degrees off axis. One or the other of the above just might give us the answer.

These are the steps I would take to learn. I hate to just assume things; actually checking works better every time, in my humble opinion.

When I was testing Pano's phase shuffler I came across some differences between sitting off axis on the left side compared to sitting off axis on the right. You bet I found out why in the measurements! The speaker can never do better than it first measures, is what Dunlavy once said. I fully agree. But you don't want to know how sensitive we are to the area between 1.5 kHz and 5 kHz, as this is right where the cross talk is messing with perception. The custom filter shows a bump right there in every graph. I know what a bump like that can sound like. It can have such a large influence on perception that it makes me say: that alone could totally be the reason you prefer the other filter. Well, that, and the fact that overall the Psycho gets hotter the higher we go in frequency. All of it is enough to explain all of the perceptual differences, which might have nothing to do with the actual strength of each filter, just how its balance turned out.
 
So the psycho filter allows for better off-axis performance because it has less resolution.
Even less resolution than that would be to use no filter at all. Is the off-axis performance at its best now, without any filtering at all? :)

Well, let's change "better off-axis response" to "less off-axis artifacts". Does that change anything? :)

If/when you get around to comparing the white noise clips, I'd be interested to know if your perception is just what you'd expect based on the graphs...


edit: I'm not ignoring your request for off-axis measurements; I'd like to see it too. I'm planning to get to that...
 
That reads a little better, but without checking we're still assuming it's true.

A lot of driver anomalies still exist when moving off axis. Other parts are going to be influenced too much by the room. That's why taking several measurements and averaging them (in a time aligned manner) does make sense.

A good full range driver will still show artefacts of its own, e.g. due to the transition from pistonic behaviour to a more bending-mode state up high, the influence of its size, the (bulging or dipping) surround, and how well the spider balances everything; many things to consider.
If we can solve a few of those problems early on, it just might do better overall (at all angles). Where to stop, or where our DSP manipulation starts making things worse, is hard to say without rigorously moving through the measurements and changing one thing at a time.

Doing listening tests is not on my agenda anytime soon; I have a plugged ear due to a cold. :(
Look at the waterfall plots again. One of them is cleaner, the other is less clean but more balanced. How are we going to know how our brain figures out what's happening?
Our listening tools (ears) aren't even very good measurement devices. At least not compared to our own DSP inside our head!

You will hear things differently than I will, and we will have differences in interpretation too. No problem there, but it is a fact. Knowing what we hear and relating that to the graphs means eliminating every little detail one by one to learn only a tiny part of it. But that's what I've been doing, or at least tried to do.

Both corrections I listened to were way too bright sounding for me, missing a large portion that I still deem important for overall listening pleasure.

The most dangerous part is drawing conclusions after changing a million tiny things at once, as that's the real difference between these two corrections.

Changing just one thing is hard enough already, but it's a far more consistent way to find clues.

By the way, I hate listening to white noise. I've done that way too much, and with 2 speakers you'll only pick up the problems that stereo presents us, which is also a big part of our total perception. A single speaker and white noise is much safer. As soon as we have 2 stereo speakers firing we will experience cross talk; no way around that, even if it may be partially masked.
 
I'll take a more accurate magnitude measurement at the listening position even if it makes the response outside of the listening position worse.

Also, a single position measurement is still a bad idea. There will be anomalies, and that is why you guys are having to use such short windows to capture more of the speaker response vs. the room response.

You will find that if you fix the single position measurement and then run the measurement again, there will still be significant phase deviation in the unwindowed measurement.
 
Also, a single position measurement is still a bad idea. There will be anomalies, and that is why you guys are having to use such short windows to capture more of the speaker response vs. the room response.

The multi-position measurement in this case was done in 5 positions across my sofa (probably covering about 5' or so (it was a while ago)).
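A simplified sketch of that kind of multi-position (vector) average, assuming each measurement is aligned on its main impulse peak before the complex spectra are averaged (my own helper, not REW's implementation):

```python
import numpy as np

def vector_average(irs, nfft=None):
    """Average several measurements as complex spectra after a crude
    time alignment on each impulse response's main peak. Position-dependent
    anomalies partially cancel; common (speaker) behaviour survives."""
    irs = [np.asarray(h, dtype=float) for h in irs]
    if nfft is None:
        nfft = max(len(h) for h in irs)
    specs = []
    for h in irs:
        shift = int(np.argmax(np.abs(h)))       # align main peak to t = 0
        specs.append(np.fft.rfft(np.roll(h, -shift), nfft))
    return np.mean(specs, axis=0)
```

The peak alignment is deliberately crude; with real sweeps you would align on the measured time-of-flight, but the principle of averaging complex (not magnitude-only) spectra is the same.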
 

Attachments

  • multi position vs. single position.jpg