The making of: The Two Towers (a 25 driver Full Range line array)

As promised, a bit more info on my current ACDC filter (Aural Comb Delay Convolver).

This time I use no extra EQ to do what I want. Instead, I place a short, negative, band-passed burst, down in level, in front of the main signal, and add back a bit of the energy that the negative signal robs, to balance the result. The overall level is lower because part of the signal that gets combined at the ears is canceled.
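
As a rough sketch of the idea (not the actual filter from the thread; the sample rate, burst level, and band edges below are assumed purely for illustration), such an impulse response could be built like this:

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs = 96000        # sample rate in Hz (assumed)
tau = 270e-6      # interaural delay the filter targets (s)
level = 0.1       # pre-burst level relative to the main pulse (assumed)

n = 1024
ir = np.zeros(n)
main = n // 2
ir[main] = 1.0    # the main (Dirac) pulse

# Band-limit a small negative impulse so the cancellation only acts in a
# chosen band (~1-8 kHz assumed), and place it tau before the main pulse.
bp = firwin(127, [1000, 8000], pass_zero=False, fs=fs)
pre = np.zeros(n)
# compensate the band-pass group delay of (127 - 1) / 2 = 63 samples
pre[main - int(round(tau * fs)) - 63] = -level
ir += lfilter(bp, 1.0, pre)
```

The negative burst lands ~26 samples (0.27 ms at 96 kHz) ahead of the main pulse, band-limited so it only touches the region where the interaural comb lives.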

If we remember the Dirac pulse combing at the ear:

di01.jpg

(0.1 ms rise time)
di02.jpg

(0.2 ms rise time)
di03.jpg

(0.3 ms rise time)

After which we'd get the interaural comb pattern (0.270 ms has passed):
dic01.jpg

(0.1 ms rise time of L + 0.270 ms delayed R)
dic02.jpg

(0.2 ms rise time of L + 0.270 ms delayed R)
dic03.jpg

(0.3 ms rise time of L + 0.270 ms delayed R)
dic04.jpg

(0.4 ms rise time of L + 0.270 ms delayed R)
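
For reference, the comb frequencies in these plots follow directly from the delay-and-add math. A small numerical check (my own illustration, using the 0.270 ms figure from above):

```python
import numpy as np

tau = 270e-6   # interaural arrival difference (s)

# summing x(t) and x(t - tau) gives a magnitude response |1 + exp(-j*2*pi*f*tau)|
f = np.linspace(100, 10000, 1000)
mag = np.abs(1 + np.exp(-2j * np.pi * f * tau))

# peaks land at k / tau, dips at (k + 1/2) / tau:
print(0.5 / tau)   # first dip:   ~1852 Hz
print(1.0 / tau)   # first peak:  ~3704 Hz
print(1.5 / tau)   # second dip:  ~5556 Hz
print(2.0 / tau)   # second peak: ~7407 Hz
```

Those numbers line up with the bumps (~3.7 and ~7.2 kHz) and dips (~1.85 and ~5.5 kHz) discussed elsewhere in the thread.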

The new filter looks like this:
acdc01.jpg

(0.1 ms rise time)
acdc02.jpg

(0.2 ms rise time)
acdc03.jpg

(0.3 ms rise time)

After which we'd get the interaural comb pattern (0.270 ms has passed):
acdc-2-01.jpg

(0.1 ms rise time of L + 0.270 ms delayed R)
acdc-2-02.jpg

(0.2 ms rise time of L + 0.270 ms delayed R)
acdc-2-03.jpg

(0.3 ms rise time of L + 0.270 ms delayed R)
acdc-2-04.jpg

(0.4 ms rise time of L + 0.270 ms delayed R)

Much better behaved and cleaner than my first attempt.

The trick is that the direct signal aimed at our ears builds up the anti-comb signal over time. Let's look at the direct signal (as if we only had one ear) after more time has passed (no combing happening here):
acdc08.jpg

You can see how the reverse "EQ" of the interaural combing happening at the ears slowly builds over time, largely fighting the comb pattern as it happens, but not before: it's (almost) perfectly flat until the first combing starts.
You can also see the difference in total SPL between the Dirac pulse combing and this filter. I only raise the area of interest globally by about 1 dB to compensate (not seen in these plots).

I still haven't figured out whether we need to apply something like this to the side signal. Right now I only apply it to the mid part (of a mid/side stereo split).
My reasoning: real-life sounds coming from hard left or hard right would eventually reach the other ear as well, plus there's no signal in the opposite channel that's exactly the same to cause the combing. So in that theory it would be best to only apply this to the mid part of a mid/side signal. Only a slight EQ curve is needed at the sides to make them tonally equal to the mid: the old S-curve I already applied. (*)
An advantage of the slightly higher "Sides" level below 1 kHz is the introduction of an anti-phase signal (at very low levels) in the opposite channels. Somewhere buried in this thread is a complete explanation of that fact with tests and graphs; if you search this thread for "Surround field" you're bound to find it in an earlier post by me.
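
For reference, the mid/side split referred to above is plain sum/difference matrixing (a generic sketch, not necessarily JRiver's exact scaling):

```python
import numpy as np

def ms_encode(left, right):
    """Split stereo into mid (shared) and side (difference) signals."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    """Recombine mid/side back into left/right."""
    return mid + side, mid - side

# a mono (phantom-center) source lands entirely in mid:
l = np.array([1.0, 0.5, -0.25])
r = np.array([1.0, 0.5, -0.25])
m, s = ms_encode(l, r)
# s is all zeros here, so a filter applied to m alone
# touches only the shared (center) content
```

This is why processing the mid channel only leaves hard-panned side content untouched.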

I need to work on this stuff a while longer to fine-tune it. As with the phase shuffler, this will work best when early reflections have been reduced or eliminated and the phase of the early left and right channels tracks each other within reason. Off axis you get a slight EQ curve gradually building over time, combined with the combing at that position: every off-axis position gets similar combing, but at different frequencies at each ear. A more thorough examination of that can be found in one of my posts in the Phantom Center thread (as part of the shuffler examinations).

(*) for more info on the S-curve, see: http://www.sengpielaudio.com/FrequenzabhHoerereignisrichtung.pdf

I've been using something similar to this for quite a while. It was a substitute for the JRiver effect "Surround field" I used before I moved on to this S-curve.

For me it doesn't widen the stage. It lines up the imaging position of lower-frequency and higher-frequency sounds at the left and right sides, making background vocals sound more believable, fuller/more complete if you will.
Even this S-curve is related to the 2 ears we listen with and the way the sounds add for the center mixed sounds. Anyway, the paper probably explains the rationale better than I can, but the short version is: it's our 2 ears at work again.
 
Got a minor setback today... I wanted to listen to a few different delay numbers to see which one matched my ears/setup best.

I noticed a difference between 270 and 290 us. But 280 us did not seem all that different from 290 us.
Time to throw it in a loop again:
jriverdelay.jpg

As it turns out, 280 us and 290 us are exactly the same (and the true delay seems to be ~272.5 us, while the 270 us setting is more like 250 us in the real world).
So there's definitely a limit to the accuracy achievable here. I actually had to use 0.290 to get the graphs to line up in the 270 us plots I make, so I had already noticed something was fishy.

I installed Voxengo's Sound Delay again, as it claims to be able to do this kind of fine adjustment (0.0x ms, with x being 0 to 9), but sadly no real change. I could probably find a way to do it in a FIR-based filter, but then I'd have to use the filter on the sides as well, as I do not split the stereo signal to process mid and side with FIR.
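
For what it's worth, the FIR route mentioned above is doable with a windowed-sinc fractional delay. A generic sketch (the tap count and window are arbitrary choices, and this is not any plugin's actual method):

```python
import numpy as np

def frac_delay_fir(frac, ntaps=63):
    """Windowed-sinc FIR approximating a fractional-sample delay.

    Total delay through the filter is (ntaps - 1) / 2 + frac samples.
    """
    center = (ntaps - 1) / 2
    n = np.arange(ntaps)
    h = np.sinc(n - center - frac)   # ideal delay, truncated
    h *= np.hamming(ntaps)           # window to tame truncation ripple
    return h / h.sum()               # unity gain at DC

fs = 44100
want = 280e-6                 # 280 us falls between 11 and 12 samples at 44.1 kHz
base = int(want * fs)         # 12 whole samples
frac = want * fs - base       # ~0.35 of a sample left over
h = frac_delay_fir(frac)      # cascade after a plain 12-sample integer delay
```

The FIR's own latency of (ntaps - 1) / 2 samples is the same on both channels, so it drops out of the interaural difference; only the fractional part matters.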

I'll try and hunt down a workable solution, first I've got to make sure it's JRiver or something else at play. But it makes it that much harder to fine tune the delay to match the ears. Not that the 290 us isn't working, even the 270 us sounds pretty good to my ears, but I'd rather have more accuracy available to play with numbers in between.

Edit: found out why... the 44100 Hz sample rate is just too coarse for finer adjustments.
11 samples = 249 us
12 samples = 272 us

So it's the sample rate that keeps me from fine tuning the timing.
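
The arithmetic behind those numbers, for the record:

```python
fs = 44100                             # playback sample rate (Hz)
us = 1e6 / fs                          # one sample ~= 22.68 us

print(round(11 * us, 1))               # 249.4 us -> what the "270 us" setting delivers
print(round(12 * us, 1))               # 272.1 us -> both "280" and "290" land here

# doubling the rate halves the grid, opening up in-between values:
print(round(23 * 1e6 / 88200, 1))      # 260.8 us, unreachable at 44.1 kHz
print(round(24 * 1e6 / 88200, 1))      # 272.1 us
```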
 
Last edited:
That didn't take too long: doubling the sample rate should fix this:
highersamplerate.jpg

At least it would give me some wiggle room. Compared here are 88.2 kHz and 96 kHz, and they match.

But I should redo my measurements to match the sample rate, I guess. As I have an up-sampling DAC, I have avoided an in-between upsampling step so far, and the optical input is limited to 24/96, so it would have to be really worth it. But it does open up a little fine-tuning. I guess I need to figure out my personal interaural time delay at the ear. I've got a pair of in-ear headphones I could try as in-ear microphones :D. Worth a shot I guess.
 
Something has been bothering me the more I think about the phenomenon you're addressing. When I think about what we are actually hearing, it isn't represented fully by both channels driven at one ear location. I think a better representation is as follows.

(RR) Measurement 1) right channel at right ear
(RL) Measurement 2) right channel at left ear (with a 0.275 ms delay offset)
(LL) Measurement 3) left channel at left ear
(LR) Measurement 4) left channel at right ear (with a 0.275 ms delay offset)

So, would it follow that the totality of what we are hearing could be represented as:

1) + 2) = RR+RL
1) + 3) = RR+LL
1) + 4) = RR+LR
2) + 4) = RL+LR
2) + 3) = RL+LL
3) + 4) = LL+LR

????

I think so.

First RR, LL, LR, RL (1st pic).
I went into the impulses of LR and RL and entered a negative offset of 275 us (3rd pic), then did all the math operations A+B to create the 6 new graphs (2nd pic), then averaged those 6 (4th pic).
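
With idealized flat responses, the averaging of those six combinations can be sketched numerically (a toy model of my own: it ignores the actual head-related responses and just uses a pure 275 µs delay for the crossed paths):

```python
import numpy as np

tau = 275e-6                          # head offset used above
f = np.linspace(200, 10000, 5000)
z = np.exp(-2j * np.pi * f * tau)     # the crossed (delayed) paths RL, LR
one = np.ones_like(z)                 # the direct paths RR, LL

# the six pairwise sums from the post above:
pairs = [one + z,    # RR+RL
         one + one,  # RR+LL
         one + z,    # RR+LR
         z + z,      # RL+LR
         z + one,    # RL+LL
         one + z]    # LL+LR

avg = np.mean([np.abs(p) for p in pairs], axis=0)

# a single delay-and-add comb swings all the way down to zero;
# here two of the six pairs never comb, so the average bottoms out
# at (2 + 2) / 6 of full scale instead of at a null
```

In this toy model the comb is diluted rather than eliminated, which is in line with the conclusion that follows.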
 

Attachments

  • RR RL LL LR.jpg
  • ALL6.jpg
  • impulseoffset.jpg
  • ave6.jpg
So the difference (on my system) becomes this (see pic)

(orange = averaging in the comb effect)
(green = just averaging the 4 basic measurements RR, LL, LR, RL without the 275 us head offset)

So, in conclusion: the head-offset problem does darken the middle of the soundstage, but not as drastically as the theoretical models, or a stereo sweep at one ear location, would indicate.
 

Attachments

  • diff.jpg
Yes I see your point... that's very close to how I see it too, more or less.
That's why I originally shaved a bit off at ~3.5 and 7 kHz and got an improvement in some songs. (I used 3.7 and 7.2 originally for a long time, with a slightly different delay figure.)
After that I started experimenting with a little boost at ~1.8 and 5.5...
Just minor tweaks, no more than 1.5 to 2 dB. It made some voices more coherent in the phantom center, though most voices were perfectly OK before that tweak.

After that I tried the rePhase 2 shuffler from Pano, and that did something my EQ tweaks couldn't mimic (it also made a bit of a mess of the overall tonal balance somehow).
The thing it did for me is create more depth (more room in general) in the center image than my EQ did.
That's when I started adding and subtracting files like crazy to figure out why 'some' of that phase shuffler worked. :) And I don't have THE answer yet. I'm trying to fix part of the problem, but keep it gentle enough not to disturb off center listening.

Not an easy task by any means. It's also the reason why I limit this processing to the phantom center (shared L and R) info and leave the sides untouched.
Have you tried any of those test songs (dyno)Mike made? You could burn a couple to CD and play them.
 
In my system, I am not having any trouble with vocal intelligibility. Now that it's been pointed out, I might agree my middle is a bit darker in the 2 kHz range than the sides.

But overall, it's not a big enough disturbance to try to fix. For me, this subject is more a thought experiment than an actual problem that I need to fix. Granted, if it were easy to fix, I might give it a try. But your documentation has made it clear that this is no easy task.

I haven't listened to any samples. I don't want to tease myself with an upgrade I can't implement. :santa:
 
Well, I can shake your hand, it's more a thought experiment for me too. But the vocal intelligibility definitely changed for me, though like I said, only on a few particular tracks; I've mentioned a few of those in my thread before. One is Christina Aguilera with "Underappreciated", and a couple more on that album too. I use that album a lot, not because I like it that much :eek: but because it has a lot of different mixes all on one album plus a lot of widely panned background vocals. You can hear the difference in vocal recording even without cross talk (headphones). Another song, "Impossible", has a normal, clear vocal in comparison. You'd almost think she'd had too much to drink recording that other track :D.

I can play with it at no extra cost, just trying to manipulate a few parameters so it's easy to play this game. And I'm convinced there's something to be had.
And if I find it, I get to keep it ;).

The most difficult part (as always) is not to mess with the things that were good about the "before". I have set up a test zone in JRiver and can go wild with experiments while keeping my "normal" settings safe and sound.
I'm doing this to learn, more so than expecting to actually cure this. It would be nice of course, but indeed I'm learning here. Our hearing system is quite a marvel. It's fun to learn how we detect direction and distance, and I can actually play with those variables this way. I'm learning why I hear the sound where it shouldn't be, sort of :joker:.

P.S.
One thing I did miss in your add/subtract picture: to me it seems the combined left+right phantom should be higher in level (below 1 kHz) than the sides, at least once the cross talk kicks in after ~270 µs. That's what I'm getting anyway, which is also the basis for that "S" curve story I linked.
That would put the bumps at 3.5 and 7 kHz just above the average left and right SPL level, and the dips at 2 and 5.5 kHz below it.
 
I've done some further experimenting yesterday. The graphs I posted a few days ago were from a cross talk signal placed before the main pulse.

In theory that one is wrong, but compared to doing it the other way around it made much more sense to try: it had far better-looking waterfall plots than the version with the cross talk signal behind the main pulse (which in theory would be more correct).

Why do I say one would be "more" right and the other wrong? The pulse arriving late at the opposing ear is the one we would ideally like to cancel. That would mean the cross talk cancelation should happen after the main pulse. But if you do that the main waterfall plot already shows the counter shape of the cross talk cancelation and remains that way, much like a fixed EQ.

Having the cancelation dip in front of the main signal gives a flat-topped FR in the early waterfall plots (like the ones I showed a few days back) and builds the counter signal over time, making the second pulse less severe in its comb pattern.

I listened to both. Compared to having no cancelation they both gave me a more 3D-like stage, but differently. In a way this reminded me of the original rePhase shuffler, both in the waterfall plots and in listening. What I didn't expect was that the worse-looking waterfall plots were the more focused and separated in the phantom center, while the other chain gives a smoother (but less focused) tonal balance.

Now don't get these experiments wrong, I'm not using cross talk cancelation on the level used in most Ambiophonics algorithms (at least the ones I tried). I'm way down in level from the main signal in comparison. An Ambiophonic setup would place the speakers close together and rely on the cross talk to get the left/right separation. I'm still using a regular stereo setup and only use this cross talk cancelation in the (phantom) center signal.
I have played extensively with Ambio routines and IR pulses in the distant past and could not get those algorithms to work to my liking (in my room, with the speakers I had at that time). Not that I want to discount their efforts in any way; I'm just chasing a different view or take on the matter here. But I'm after a fix for the same problem, so there are going to be similarities.

After sensing/experiencing the difference in the side content with just a tiny bit of mid/side processing (discussed way earlier in this thread), I figured just a hint of the same principles should work for the phantom center too. That's why I keep the levels of cross talk cancelation way down, to keep the stereo image close to the same everywhere in the room (meaning off axis). I'm within plus/minus 3 dB at all times in any FR curve, while actually improving the balance at the listening (sweet) spot.

Wow this is turning into a long explanation... :)

In short: I combined both methods I mentioned above... a little cancelation before the main pulse and a little cancelation behind the main pulse.
The result? About the same separation I experienced with the rePhase shuffler, but with a smoother tonal balance. The increased sense of depth is there, the vocal parts have improved, and intelligibility on the few odd test tracks is favorable. As always I ran out of time; now why does that happen every time!

To give you an idea on the pulse shape:
x-talkpulse.jpg


And filtered IR showing how far down the signal is relative to the main pulse:
x-talkfIR.jpg


Phase is almost flat for this pulse, as it's an almost symmetrical signal: no more than a 4 degree phase swing...
 
It would only be fair to show the early waterfall plots of this type of IR.

First the single speaker for ~0.3 ms again:

adc01.jpg

(0.1 ms rise time)
adc02.jpg

(0.2 ms rise time)
adc03.jpg

(0.3 ms rise time)

In the above plots you can see the second cancelation pulse creating the "static EQ" I spoke about.

After which we'd get the interaural comb pattern (0.270 ms has passed):
xadc01.jpg

(0.1 ms rise time of L + 0.270 ms delayed R)
xadc02.jpg

(0.2 ms rise time of L + 0.270 ms delayed R)
xadc03.jpg

(0.3 ms rise time of L + 0.270 ms delayed R)
xadc04.jpg

(0.4 ms rise time of L + 0.270 ms delayed R)

The combined result of using both pulses. One can also see that the second pulse (after the main one) is topping off the bumps at ~3700 Hz and ~7200 Hz compared to using the pre-pulse only:

At 0.3 ms and 0.4 ms rise time that one looked like this:
acdc-2-03.jpg

(0.3 ms rise time of L + 0.270 ms delayed R)
acdc-2-04.jpg

(0.4 ms rise time of L + 0.270 ms delayed R)

You can see the added signal at the bottom of the graph actually canceling some of that 3.7 kHz bump. The signal I'm putting out looks like:
Code:
  +
  +
  +
  +
- +
- + -
The first cancelation is slightly stronger than the second one. At ~1850 Hz they fuse, and at 3700 Hz they cancel part of it, etc...
It is band-passed to only hit the area from ~1 to 8 kHz; the band-pass filters are linear phase.
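
Putting the pieces of that description together (purely a reconstruction; the levels, spacing, and tap count are guesses for illustration), the combined pre/post cancelation IR could look like:

```python
import numpy as np
from scipy.signal import firwin, fftconvolve

fs = 96000                        # assumed sample rate
tau = 270e-6
d = int(round(tau * fs))          # ~26 samples

n = 512
c = n // 2
main = np.zeros(n)
main[c] = 1.0                     # the main pulse

canc = np.zeros(n)
canc[c - d] = -0.08               # first cancelation, slightly stronger (level assumed)
canc[c + d] = -0.05               # second cancelation (level assumed)

# linear-phase band pass restricting the cancelation to ~1-8 kHz;
# mode="same" keeps the odd-length filter's group delay compensated
bp = firwin(255, [1000, 8000], pass_zero=False, fs=fs)
ir = main + fftconvolve(canc, bp, mode="same")

# near fs / (2 * d) ~ 1850 Hz the two negative taps sit half a cycle
# off the main pulse and add to it (boost); near fs / d ~ 3700 Hz they
# sit a whole cycle off and subtract (dip) -- the anti-comb shape
```

At ~1850 Hz the cancelation pulses fuse with the main pulse, while at ~3700 Hz they eat into the response, matching the description above.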
 