Why "Flat" is Inaccurate

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
I’ve been thinking about something lately. What I’m proposing here is an explanation, based in science (for those who demand such things), for why a gently tailing top end response is perceptually more accurate than a flat on axis frequency response. Remember you read it here first! :)

There’s a school of thought that rigidly subscribes to a flat frequency response as the ideal target for any loudspeaker. Minor variations are sometimes accommodated: rooms with long high frequency reverb decay times are “allowed” some high end tailoring, to tilt down the top end. Voicing my own personal designs over the years, I’ve never been able to accept this overall approach.

For example, take your standard two-way with 4th order acoustic Linkwitz Riley xovers: you get a beautiful graph. However, for me, this approach never sounded completely accurate. Over time, in differing rooms, I tend to hear this as excess energy centered around 5 kHz, even with a flat on-axis response. I tend to tip the top end down a bit, sometimes allowing it back up after 10 kHz. I know I’m not the only one that favours this tailored on-axis target.

I can finally explain why this is a more accurate approach (in fidelity, not preference).

Imagine typical stereo creating a phantom center image with speakers set at some angle of incidence to the head. Well, the speaker playing at the right, into the right ear, projects more treble into the right ear than a real center sound source would. Ditto for the left ear.

The only place where the tonal balance is really correct is for an image right at one of the speakers.

To get an idea of the phantom center tonal error, see this head related transfer function (HRTF) graph (averaged over a small population), scanned from a 1966 JASA article by Edgar Shaw:
http://www3.sympatico.ca/dalfarra/HRTF.jpg

For an equilateral triangle set up, a speaker would be at 30 degrees trying to replicate a phantom in the center. The perceived artificial boost at 7 kHz is on the order of 3 to 4 dB, with a gently rising characteristic starting at 2 kHz, peaking at around 7 kHz, then reaching equality again above 10 kHz. There it is, the dreaded subjective high-end hotness with flat on-axis designs.

By compensating using a gentle high-end roll off, our center phantom image perceptually sounds tonally correct again. Of course 3 to 4 dB compensation at 7 kHz adds the inverse error for images at the speakers, so a compromise of, say, 2 dB at 7 kHz sounds completely reasonable. Using a tweeter with some rise after 10 kHz then brings the difference back, and everyone is happy.
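A minimal sketch of the compromise described above. The anchor points are an assumption: they are only the three figures quoted in the text (start of the rise at 2 kHz, a peak of 3 to 4 dB at 7 kHz taken here as 3.5 dB, equality at 10 kHz), joined by straight lines on a log-frequency axis. They are not values read from Shaw's graph, and the function names are hypothetical.

```python
import math

# Assumed anchor points (Hz, dB) taken from the description in the post,
# not from measured data: excess level of a 30 degree source over a 0 degree one.
ANCHORS_HZ_DB = [(2000.0, 0.0), (7000.0, 3.5), (10000.0, 0.0)]


def phantom_excess_db(freq_hz, anchors=ANCHORS_HZ_DB):
    """Excess level in dB, interpolated linearly on a log-frequency axis."""
    if freq_hz <= anchors[0][0] or freq_hz >= anchors[-1][0]:
        return 0.0
    for (f_lo, db_lo), (f_hi, db_hi) in zip(anchors, anchors[1:]):
        if f_lo <= freq_hz <= f_hi:
            t = math.log(freq_hz / f_lo) / math.log(f_hi / f_lo)
            return db_lo + t * (db_hi - db_lo)
    return 0.0


def compromise_eq_db(freq_hz, cut_at_peak_db=2.0):
    """On-axis EQ in dB: the excess curve inverted and scaled to the chosen cut."""
    peak_db = max(db for _, db in ANCHORS_HZ_DB)
    return -phantom_excess_db(freq_hz) * cut_at_peak_db / peak_db


for f in (1000, 2000, 4000, 7000, 8500, 10000, 15000):
    print(f"{f:>6} Hz  excess {phantom_excess_db(f):4.1f} dB  "
          f"EQ {compromise_eq_db(f):5.1f} dB")
```

With these assumed anchors and a 2 dB cut, the center phantom image would still be about 1.5 dB hot at 7 kHz and images at the speakers about 2 dB dull, which is the trade described in the paragraph above.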

Of course this is tricky, as everyone’s HRTF is as unique as his or her fingerprint. However, it’s a very reasonable assumption that everyone will hear more treble from a source 30 degrees incident than 0 degrees. Tailor to taste.

The repercussions of this effect are wide ranging. For example, the HRTF at 30 degrees is also hotter from 200 Hz to 1 kHz than the response at 0 degrees (note: all curves converge below 100 Hz, as the head becomes small in relation to the wavelength and head diffraction effects minimize). Perhaps there should also be less baffle diffraction compensation than a flat measurement would indicate. Indeed, many voice to 4 or 5 dB of compensation, even for nominally flat acoustic power designs.
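As a rough check on why the curves converge at low frequencies, the wavelength can be compared with the size of the head. The speed of sound and the head radius below are assumptions (a round figure for room-temperature air and a commonly used spherical-head radius), not values from the Shaw article.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 C
HEAD_RADIUS_M = 0.0875      # assumed: spherical-head approximation


def wavelength_m(freq_hz):
    """Wavelength in metres at the given frequency."""
    return SPEED_OF_SOUND_M_S / freq_hz


def ka(freq_hz, radius_m=HEAD_RADIUS_M):
    """Head circumference divided by wavelength (dimensionless)."""
    return 2.0 * math.pi * radius_m / wavelength_m(freq_hz)


for f in (100, 200, 1000, 7000):
    print(f"{f:>5} Hz  wavelength {wavelength_m(f):6.3f} m  ka {ka(f):6.2f}")
```

When ka is well below 1 the head is acoustically small and barely shadows or diffracts the sound, so the angle of incidence matters little. When ka is well above 1, as at 7 kHz, the angle matters a great deal.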

It would also result in perceived tonal changes with changes in speaker placement geometry that are independent of room effect and speaker toe-in. Different angles, different perceived phantom image error.

The difference between stereo and single speaker mono was always ascribed to the picket fencing and crosstalk error inherent in stereo. I’d wager that the HRTF difference for phantom image generation is also a large factor.

Finally, and I know Lynn will love this last one as it’ll dovetail nicely with his far eastern philosophies: this effect explains why different people hear different tonal balances from the same speaker/room, and why there is objectively no one “right” frequency response across the population. Everyone needs a response tailored to their own HRTFs, if phantom images are to sound tonally correct to them.

To me, this is a “very big deal”. Almost makes me want to run out and buy Etymotic in-ear mics and get the HRTF characterized.

Cheers,
Dave Dal Farra
 
Interesting approach. I just dialed in some DEQ according to the diagram that you linked to and what you suggested. My dipoles now have slightly better separation of left and right, and the voicing is a tad better. Am I hearing things? :dead:

:edit: playing around with the toe-in a bit more, now I'm not sure what to make of this.
The speakers did so well earlier; now they don't do the disappearing act anymore.
 
Nah, if the 0 degree angle of incidence is used as a reference and normalized to a straight line, the peak around 2 kHz would practically disappear and the peak around 5~7 kHz would also be reduced. I agree that everyone's hearing is different, but that's beside the point. A clarinet wouldn't be equalized in real life.

And the other thing is: why should a phantom image be only in the centre? I agree that a lot of multimedia sounds are monophonic, but there's no 'standard' listening position, so there are other factors to consider such as the off-axis speaker response, reverberation of the room, and the polar patterns of the microphones used for the recordings.
 
If you set up the speakers so that they are not directed at the listener but are parallel to the center line, then the frequency dependent directional polar pattern will make sure the high frequencies are not emphasized.

I remember that the legendary Celestion SL-6 had 1.5 dB/octave rolloff over most of its frequency range and I read somewhere that its optimal setup is not tilted towards the listener.
 
If sounds straight in front of the head have a different FR than those off to the sides, the phantom image may, indeed, sound odd.

But to EQ the phantom center properly you would need more than simple EQ. The EQ would need to track how common to center the sound is, sort of like the simple Dolby matrix stuff. Apply the head function EQ most heavily to sounds that are common to left and right, i.e. the center. The EQ would need to change as the sounds become less common.

A good setup would allow you to dial in the angle of the speakers and maybe other things.

That's not to say that you couldn't fake it with a simpler EQ.
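One way to fake it is a plain mid/side split: cut only the part of the signal that is common to left and right, and leave the difference alone. This is a sketch under stated assumptions, not the tracking EQ described above. The filter is a standard RBJ "Audio EQ Cookbook" peaking biquad, and the 7 kHz centre, -2 dB gain, Q of 1 and 48 kHz sample rate are illustrative choices only. All function names are hypothetical.

```python
import math


def peaking_biquad(fs_hz, f0_hz, gain_db, q):
    """RBJ cookbook peaking EQ coefficients, normalized so that a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def biquad_filter(b, a, samples):
    """Direct form I filtering of a list of samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in samples:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def eq_center_only(left, right, fs_hz=48000.0, f0_hz=7000.0,
                   gain_db=-2.0, q=1.0):
    """Apply the cut to the mid (L+R) signal only, then rebuild left/right."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    b, a = peaking_biquad(fs_hz, f0_hz, gain_db, q)
    mid_eq = biquad_filter(b, a, mid)
    new_left = [m + s for m, s in zip(mid_eq, side)]
    new_right = [m - s for m, s in zip(mid_eq, side)]
    return new_left, new_right
```

A sound that is identical in both channels gets the full cut, and a sound that is exactly out of phase between the channels gets none. The limitation is that a hard-panned sound still has half its amplitude in the mid signal, so it receives part of the cut and leaves a small residue in the opposite channel. That is why this is only a fake of an EQ that truly tracks image position.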

Does this mean that the Binaural recordings done with a dummy head should have a better center phantom image? Because the HRTF is built in?
 
You are right in that stereo reproduction of a centered sound image requires a frequency response correction due to the HRTF. You are not the first to realize this, though...:D Correct me if I am wrong, but didn't the "BBC dip" have an explanation like that?
 
DDF said:
For an equilateral triangle set up, a speaker would be at 30 degrees trying to replicate a phantom in the center. The perceived artificial boost at 7 kHz is on the order of 3 to 4 dB, with a gently rising characteristic starting at 2 kHz, peaking at around 7 kHz, then reaching equality again above 10 kHz. There it is, the dreaded subjective high-end hotness with flat on-axis designs.

Interesting idea, Dave. The problem I see with it is that both ears are hearing sounds from both speakers, not just right-speaker, right-ear. Your graph doesn't show 30 degrees so let's take 45 instead. What the ear would hear would be a sum of the 45 and 315 curves. And the power spectrum reaching the ear would be even less than a linear addition of the curves because the short wavelengths make the signals uncorrelated. Given all that, I think any boost would be MUCH smaller.
 
While that's true, catapult, the magnitude difference of the direct first arrivals between 315 and 45 is on the order of 15 dB, too wide for much cancellation or summation even with phase co-operating fully. Regarding the diffuse/power field, it's an interesting question, but those signals would also be much reduced in level for a non-pathological speaker setup. Their impact, it seems to me, hinges on how the ear weighs the delay and directionality above and beyond straight frequency tailoring.
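The 15 dB figure can be turned into bounds on how much the weaker arrival can change the level at the ear. The sketch below assumes two simple cases: fully coherent arrivals (in phase or out of phase) and fully uncorrelated arrivals (power sum). Real signals at the ear would fall somewhere in between, and the function name is hypothetical.

```python
import math


def sum_bounds_db(level_diff_db):
    """Level change in dB, relative to the louder arrival alone, when a second
    arrival that is level_diff_db lower is added to it.

    Returns (in_phase, out_of_phase, uncorrelated).
    """
    ratio = 10.0 ** (-level_diff_db / 20.0)  # amplitude of the weaker arrival
    in_phase = 20.0 * math.log10(1.0 + ratio)
    if ratio < 1.0:
        out_of_phase = 20.0 * math.log10(1.0 - ratio)
    else:
        out_of_phase = float("-inf")  # equal levels cancel completely
    uncorrelated = 10.0 * math.log10(1.0 + ratio ** 2)
    return in_phase, out_of_phase, uncorrelated


for diff in (15.0, 10.0, 6.0, 3.0):
    hi, lo, pwr = sum_bounds_db(diff)
    print(f"{diff:4.1f} dB apart: in phase {hi:+5.2f} dB, "
          f"out of phase {lo:+6.2f} dB, uncorrelated {pwr:+5.2f} dB")
```

At 15 dB apart the weaker arrival can shift the level by about +1.4 dB or -1.7 dB at most, and by only about +0.14 dB if the two are uncorrelated. The shift grows as the level difference shrinks.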
 
Last time I checked, recordings are still monitored in stereo before release. The 3 to 4 dB emphasis at 7 kHz mentioned would be present for the engineer to hear, and possibly EQ out.
Having a fixed de-emphasis in the loudspeaker filter seems problematic to me. Either a switchable (and adjustable) active EQ circuit or burning a copy of the CD with equalization applied (re-mastered ;) ) appears more reasonable.

DDF said:
There it is, the dreaded subjective high-end hotness with flat on axis designs.

I have yet to hear this condition where flat on- and off-axis response, down through and below the xo region, is incorporated in the design, as in most serious monitors like Genelecs, etc., and some home speakers (including the NRC crowd) that include directivity control and low distortion drivers that are not overdriven.

cheers,

AJ
 
I simply don't like "flat" speakers because my ears aren't as good as they were. The signal/noise ratio of my ears is becoming smaller, and I have a 40 dB dip in my hearing threshold from +/- 1k to 4.5k, and the threshold rises to -5 dB @ 8k again! Maybe this has to be considered too: the perceived hearing curve is not the same for everyone.

As a result of my dip, a lot of speakers give an edgy treble, with (a lot of?) distortion; my ears have become very sensitive to distortion levels.
 
I’m glad to see so many thoughtful responses!

Panomaniac: center channel? What’s a few more comb filters? :)

Sony once sold a pair of headphones with a gyro sensor, to detect head motion and change the HRTF according to head rotation angle. You first hit a “reset” button, then let the gyro do its thing. We bought a pair (rudely pricey), tried it at length, and it didn’t work well at all. The problem with HRTF and binaural encoding is that HRTFs are individualized, and the illusion can work anywhere from extremely well all the way to terribly poorly, depending upon your HRTF. I’ve sat down and listened to live feeds through KEMARs and HATS, played back through speakers with crosstalk cancellation algos. Great effects, but it destroys the music. My HRTF doesn’t match the standard curves the industry seems to favour, and I find most binaural recordings generally sound cupped or honky, if not downright phasey.

Sqlkev: I’m glad someone tried it, thanks! Did you do it with an acoustic measurement: i.e. the absolute response hits this target? Or did the DEQ dial up just the relative tilt between the two angles? What was the original speaker on axis target?

CeramicMan/panomaniac: center was chosen as worst case to illustrate the concept, but of course it varies over angle. Some compensation is better than none, and I personally would choose the center, as center fill images are the most typical. Of course the room, directionality, recording etc. affect things, but don’t give up hope yet. See my response to AJ below to understand how to best apply this concept.

Oshifis: many experienced builders will build in a tilt like this from day one.

Svante: I’m a voracious consumer of speaker design literature (for over 30 years) and I’ve never seen this concept in print before. I can’t claim to be the first to think of it, but it’s certainly original to me, and not by any means common knowledge. ie in 30 years, I’ve never heard of this.

The BBC dip is interesting. I’ve yet to see any reference describing exactly what it is. It’s the Loch Ness monster of speaker philosophies: everyone claims to know it, but no one can draw it. If you have a reference, please post it. My understanding of it is a depression in the mid band, whose purpose is to add some diffuse field equalization to the response. The idea being the mic picks up incident but also non-incident sound, and a more natural tonal balance has the playback chain apply some diffuse field response weighting.

Hi catapult, as rdf mentions, the “crosstalk” signal shows additional inherent delay, and the tonal summation isn’t the same as if it were non-delayed. I view this as a second order effect and one of the inherent errors in stereophony.

AJ: Here, you’re repeating an argument I used to make on the MAD board. I guess I had that coming. :) However, the argument is misapplied in this case. The argument goes something like this: if we want to replicate the exact “message” the recording engineer provided, our speakers, speaker set up and room would have to be exactly like those used in the final mastering, assuming the engineer was aiming to have that system sound as close to “live” as possible (which isn’t actually that common a case). We could look at this in despair and throw our hands up, knowing that our reference is unknown. However, this argument is a justification to reject the concept of “absolute sound” and allow your own personal experience to dictate what is accurate. Since each recording varies, we should rather target our home systems to sound as real to us as possible, with what we perceive as accuracy, and with the recordings we choose as reference, rather than some mythical absolute. We could apply this to each CD to decode the difference between each recording’s vision and ours, but isn’t that a bit impractical? I’d rather play with the kids. :)

But all is not lost. Here is how this concept should be applied. In the absence of a known reference, I look at it this way: if you voice in stereo, just appreciate the fact that images glued to the speakers will sound a tad hotter than those more towards the center phantom image, assuming you’re looking ahead. Try to strike a tonal balance that best trades off the tonal balance at differing phantom image locations. That’s it. IME, the resultant on-axis response which sounds the most “right” invariably has the 4 to 7 kHz region tipped down, then some rebound around 10 kHz. Some less, some more, depending on f3, dispersion, and which room it’s going to go in. It’s the beauty of DIY hifi: we all get to roll our own to taste. This effect is real; consider or discard at your leisure.

Tubee, unfortunately we bring our ears to the live gig, and into the stereo room. So it’s a wash, if you’re trying to make it sound like what you think of as being real. I’m sorry to hear about your situation, but it’s a bit unique; compensating for hearing damage is trying to be “better” than live.
 
rdf said:
While that's true, catapult, the magnitude difference of the direct first arrivals between 315 and 45 is on the order of 15 dB, too wide for much cancellation or summation even with phase co-operating fully.

That's true at 7K but, if you go lower, the difference is less so you will get more summing at each ear which will make the 7K peak seem much smaller.
 
I don't know, I could be going the wrong way in reply to your point, but for listeners I don't see the point in having a flat response. 5k is in the range that humans pick up/hear much better. So yeah, you get some truly flat response speakers and of course the mid and presence frequencies are gonna stand out more than on those shelf speakers in the garage. I always thought flat response speakers are what the mix engineer wants, so he can pick out the frequencies that stand out too much and take care of them. For me, when it comes to listening, not mixing, color is desired. From the amp to the speakers I want a colored sound. So I agree the listener should just tailor to their own desire, and adjust for room acoustics. I can't imagine having my JBL LSR6328P's in my living room, or some ATCs or Westlakes at that.
Maybe some audiophiles are taking it a bit far when they want flat response speakers in their living room. If that's the case, good luck ever enjoying any music, right? Leave the flat monitors for the mix engineers and then get your markers and color the sh#% out of that CD when you get home. More important than flat is frequency range, for the studio and the home listener. Right?
 
DDF said:
Tubee, unfortunately we bring our ears to the live gig, and into the stereo room. So it’s a wash, if you’re trying to make it sound like what you think of as being real. I’m sorry to hear about your situation, but it’s a bit unique; compensating for hearing damage is trying to be “better” than live.

Thanks for the answer.

The hearing dip is probably hereditary from my father's side (I am just 40!) and could also have been made worse by pop concerts. The tinnitus doesn't help at all. And nasal allergic reactions/eustachian tube problems aren't good for it either.
Yes, at the moment I change my speakers somewhat to get a more "flat" response to my ears, but I know it is not the best way. I must do that because the treble overshadows the rest.
Within 10 years I will probably wear hearing aids for sure :xeye: (I sell them myself, besides optical aids). And in 15 years practise another hobby? But in some way my ears are more sensitive: I now know more easily how a certain speaker sounds, and like or dislike it more easily.

With human speech I have no problem; it is direct and undistorted. Speakers distort more in some way, not only linear (flat) distortion but also harmonic distortion. I like my hybrid amp (tube-FET); its harmonic spectrum is more natural compared to an all bipolar transistor amp.

The speaker's linear distortion is also affected by the sound field it radiates to the side angles.
Not all speaker units have a flat response: my current tweeter has a curve rising to 20k (Focal inverted dome). I have a new speaker design in my head to give some solutions: low distortion units (Peerless HDS) and dipole tweeters (B&G Neo3).
 
impsick said:
I always thought flat response speakers are what the mix engineer wants, so he can pick out the frequencies that stand out too much and take care of them. For me, when it comes to listening, not mixing, color is desired. From the amp to the speakers I want a colored sound.


Then you probably have a bipolar transistor amplifier at home, Impsick, right? To my ears they give a warmer and somewhat colored sound/distortion spectrum, even if they have a "flat" response.
Swapping the standard diode bridge in it for HEXFREDs can remove some color; all parts have their specific sonic characteristics. I restored a Quad 303/33 amp a year ago, and compared the sound to my hybrid amp.
 
DDF said:


The BBC dip is interesting. I’ve yet to see any reference describing exactly what it is. It’s the Loch Ness monster of speaker philosophies: everyone claims to know it, but no one can draw it. If you have a reference, please post it. My understanding of it is a depression in the mid band, whose purpose is to add some diffuse field equalization to the response. The idea being the mic picks up incident but also non-incident sound, and a more natural tonal balance has the playback chain apply some diffuse field response weighting.



To my knowledge it was not some local notch (a popular legend with the reviewers) but a gradual tailoring, 2 to 3 dB at the most. That can be found in the larger Spendor and Harbeth monitors too.

I found this also, from Lynn Olson:

"The Ariel now sounded sweet, relaxed, and natural. The 2-meter on-axis measurement (shown above with no response smoothing) followed the intended 2dB slope from 100Hz to 10kHz with a very mild recession around the 3.8kHz crossover region. This is the classic "BBC dip", and much preferable to a "forward" emphasis in the upper midrange. Since ear is approaching its greatest sensitivity in the 2-5kHz region, even very small peaks create an unpleasant and unnatural sibilance. By contrast, a small dip in this region results in a slightly more distant perspective, and a more relaxed sound.''
 