The room should ideally only help supplement the limits of speakers in the lower end of the spectrum. I remember playing a set of small speakers in a small room that was nearly square, 3 m x 3 m, probably the worst kind. Someone wanted to demo a laser record player. Larger speakers were terrible in that room, and amazingly the small speakers that I brought in, Lullaby, fit that room nicely. We could hear the people stomping on stage with pretty good image presentation.
I sometimes wonder whether certain criteria can be established. The purest experiences have always been in pretty dead rooms, but there is always a threshold beyond which either the system is being pushed too hard, so that you start hearing signs of fatigue, or the reflections in the room are strong enough to mess things up. A good system should play well within these limits, just giving you the feeling that you are sitting further away than usual.
...
and cabinet diffraction from the tweeters causing polar ripple in the treble for small listener position offsets. (The latter seriously degrades the phantom channel image and stability)
Good point ....
Example comparison of effects from
listening angle vs. symmetric/asymmetric mounting of tweeter on baffle
Symmetric Tweeter (no baffle roundoffs)
- 0 degrees: Ripple (2 kHz ... 10 kHz) ca. 3.8 dB
- 10 degrees: Ripple (2 kHz ... 10 kHz) ca. 2.5 dB
- 30 degrees: Ripple (2 kHz ... 10 kHz) ca. 3.0 dB
Asymmetric Tweeter (no baffle roundoffs)
- 0 degrees: Ripple (2 kHz ... 10 kHz) ca. 2.8 dB
- 10 degrees: Ripple (2 kHz ... 10 kHz) ca. 2.1 dB
- 30 degrees: Ripple (2 kHz ... 10 kHz) ca. 1.5 dB
Pictures refer to listening angles of 0, 10 and 30 degrees
Green: Symmetric tweeter mounting position (same height, but equal distance to left and right baffle edges)
Pink: Asymmetric tweeter mounting position
Simulation angles changed towards the shorter distance between tweeter and baffle edge:
Asymmetric tweeter is preferably shifted "inside" the stereo base to reduce diffraction ripple.
Kind Regards
Stereo works best under whatever condition the mastering/mixing was done. If that was done in an anechoic chamber then maybe. However if it was done in a control room with a degree of normal room reflections (although typically less than an untreated living room) then that is how it will sound "best" or at least closest to the engineer's artistic intention.
If you let go of trying to achieve absolute "reality" (what you hear in your room sounds exactly like the live instruments in a real venue recorded and played back raw) which is not possible with two channels, to trying to achieve hearing what the engineer heard in the control room ("artistic intent") then your goal becomes a lot more achievable.
An analogy - if you display a violet flower on a TV screen should we be disappointed that the colours don't look like the real violet flower held next to the TV (since the violet colour is outside of the colour space of the TV and CAN'T be reproduced faithfully) or should we be happy that our TV is calibrated as closely as possible to the colour specs of the TV standard and thus most accurately duplicates the artistic intention of the director when the colour grading was performed in post production ? We see what they intended without further embellishment ?
In my mind there's a difference between making stereo work best and making a recording work best. The latter is what you describe. The former hasn't been researched enough in my opinion.
If you're referring to the "stereo dip" at around 2 kHz then I don't see how a side wall reflection fills this in. The delay to the first side wall reflection is well beyond the fusing time of the original signal so will not alter our perception of tonal balance significantly - 2 kHz is high enough that we are perceiving the tonal balance based on the first arrival, not the steady state response.
The only time it would alter our perception of balance ("fill in" the hole) is if you're well past the critical distance into the reverberant field - but you won't be getting a good phantom image well back past the critical distance anyway so that's a moot point. Why design for such a non-optimal listening position?
The first stereo dip at around 2 kHz can be compensated for with a small bit of EQ. In well-imaging speakers that get their final "voicing" by small tweaks based on listening (rather than pure design by measurement) you could argue that the designer is inadvertently (or knowingly) compensating for the stereo dip as part of the voicing process. If you know what angle of separation the speakers will be used at, the frequency and depth of the dip are fairly predictable, as people's heads don't vary in size that much.
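To put a rough number on "fairly predictable", here is a minimal sketch. It assumes a very simplified model that is not from the post above: the head is treated as two points 0.175 m apart, each speaker's sound reaches the far ear later than the near ear by a plane-wave path difference, and the first dip for a centred phantom image sits where that delay equals half a period. The function name and the default ear spacing are illustrative choices only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def stereo_dip_hz(separation_deg, ear_spacing_m=0.175):
    """Estimate the first comb-filter dip for a centre phantom image.

    separation_deg is the total angle between the two speakers as seen
    from the listener. The head is modelled as two points (no shadowing),
    which is a deliberate simplification.
    """
    half_angle = math.radians(separation_deg / 2.0)
    # Extra travel time to the far ear compared with the near ear.
    delay_s = ear_spacing_m * math.sin(half_angle) / SPEED_OF_SOUND
    # Direct and crosstalk signals cancel when the delay is half a period.
    return 1.0 / (2.0 * delay_s)


for separation in (45, 60, 90):
    print(f"{separation} deg separation: first dip near {stereo_dip_hz(separation):.0f} Hz")
```

With these assumed numbers a 60 degree setup lands close to 2 kHz, and the estimate moves up in frequency as the speakers are brought closer together. A real head shadows the far ear, so treat this as an order-of-magnitude check rather than a prediction of dip depth.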
If you're referring to the stereo interference field above about 3 kHz affecting phantom channel imaging (phantom stability with listener movement etc.), I think that comes down largely to two things: the typically insufficient angular separation between the two speakers (HRTF crosstalk >3 kHz drops dramatically if you go from a typical 45 degrees of separation to 55-60, for example) and cabinet diffraction from the tweeters causing polar ripple in the treble for small listener position offsets. (The latter seriously degrades the phantom channel image and stability.)
Sufficiently separate the speakers and eliminate diffraction from the tweeter (make it a true point source with smooth off axis response) and phantom image trouble largely goes away - even in a fairly dead room with damped side walls. My best listening room had curtains along both side walls at the speaker end so was fairly "dead" at high frequencies (the other end was live) and phantom channel imaging was excellent, much better than when the curtains were pulled back.
There's a paper by Vickers that provides a good overview and even proposes phase decorrelation to "fix" the phantom center issue: Vickers, Fixing the Phantom Center: Diffusing Acoustical Crosstalk, Audio Engineering Society Convention Paper 7916
Simple solution that is great for centre channel dialogue, but that adds as many problems as it solves for music.
Sure you can produce a musical instrument dead ahead without any interference effects but if you have a TV the centre speaker is either too high or too low. (Pinhole projection screens aside) But more importantly what happens to the instrument that you're trying to image as coming from half way between centre and one side ?
I agree. Projection and an acoustically transparent screen is the only solution.
The answer is that you're back to two speakers (centre and right say) both producing the same sound at different relative volume/phase/whatever, but this time you have half the angular separation than you had before (centre to right instead of left to right) which makes your high frequency "stereo" crosstalk MUCH worse.
Not sure if this is worse than the phantom center issue.
The fact that there aren't any standards is a real shame, but that doesn't mean they are impossible to implement. It's just a commercial chicken and egg problem, not a fundamentally unsolvable technical problem. There is a realistic, optimal window for reverberation time in a "small" room, for example; that alone would be a start. Most living rooms are well above the optimal reverberation time, especially for the wide dispersion speakers that prevail.
I agree but I would like to see even stricter production standards that go beyond RT.
Good point ....
Not sure if the conclusion is valid because ripple is the same for each speaker.

Especially when using the NTSC format (Never The Same Color)
Good joke, but not quite accurate. An NTSC display with true SMPTE phosphors or well implemented DLP, LED, etc., can be just as accurate as PAL, SECAM or even HDTV. It's in signal transmission and spatial resolution that NTSC has its greatest flaws.
Kinda OT, but since it's what I do for a living, I couldn't let it go. 😉
An analogy - if you display a violet flower on a TV screen should we be disappointed that the colours don't look like the real violet flower held next to the TV (since the violet colour is outside of the colour space of the TV and CAN'T be reproduced faithfully) or should we be happy that our TV is calibrated as closely as possible to the colour specs of the TV standard and thus most accurately duplicates the artistic intention of the director when the colour grading was performed in post production ?
Yes, a good analogy - and an important point in video and still image reproduction (true violet is often recorded as blue). But the analogy is useful beyond that. The director and the color grader/timer should have large gamut, accurate, calibrated displays, but they often don't. Just like not all mixing and mastering suites are perfect.
But that does not mean that all is lost for audio or video. If YOU have a playback system that is wide gamut and well calibrated, then almost everything you play back will look or sound its best. We often try to cheat with boosts and tweaks here or there, but those tricks don't work well for everything. A wide gamut, calibrated system will. It can be even better than what was seen or heard in the mastering suite.
Not sure if the conclusion is valid because ripple is the same for each speaker.
Let's think about it ...
during shifting the (listening) position, the off axis angle
increases for the farther speaker and decreases for the
closer speaker (assuming speakers are not toed in).
You are right in that we would have to compare differences
in the L-R frequency response for a given setup including
both speakers, toe-in angle etc.
But effects will tend to get even worse IMO for a shift in listening
position: There is no reason for mitigation.
For a perfect symmetrical setup and listening position - you are right -
the diffraction effects should be symmetrical (the same) for both channels.
But such diffraction effects will make imaging notably
unstable under small listener offsets, I suppose .... and that's what
Simon was suggesting.
^
Now add lots of room reflections. This will result in more stable phantom images. At the same time it introduces new issues. With stereo you can't have it all. Or maybe you can: I've proposed a headphone based binaural loudspeaker renderer which could completely remove issues due to head position.
I really don't have phantom image problems with my system or room. Basically a non-issue.
Headphone based systems might be cool, but they aren't there yet. The best available, the Smyth, just imitates speakers and a room, so there is no real gain in imaging.
I do like it when the Germans get to arguing, it's always worth reading!!
Markus - there is no English word "reflexion", it's reflection. Also, so much of your position is based on the "recreation illusion", which dismisses the "stereo as the medium" illusion (and film as well). I am quite fond of the latter and as I say so often it is "optimum". The former leaves me wanting, as it seems to do for you as well, but that still leaves stereo (and film) as a great medium for the presentation of acoustical art. Let's try and keep that in mind.
Oh and Sy ... you're just a lazy SOB 😀
I really don't have phantom image problems with my system or room. Basically a non-issue.
Lots of people do have problems with stable center images. As soon as there's more than one listener everybody has a problem with phantom images.
Headphone based systems might be cool, but they aren't there yet. The best available, the Smyth, just imitates speakers and a room, so there is no real gain in imaging.
Yes, the Realiser is limited. I'd like to see a system that handles HRTF and room parameters separately. It's doable. All the puzzle pieces are available.
Lots of people do have problems with stable center images. As soon as there's more than one listener everybody has a problem with phantom images.
Maybe worth a poll 🙂.
Can't imagine that there are that many people with problems with a (stable) phantom center image. For multiple listeners it could become a problem though.
I do like it when the Germans get to arguing, it's always worth reading!!
That's why we even started to do it in English 😉
Markus - there is no English word "reflexion", it's reflection.
Earl lecturing me about spelling, that's a good one 🙂 Yes, sorry, it's reflection. I do get it right most of the time but in German there's "Reflektion" and "Reflexion" which have different meanings.
Also, so much of your position is based on the "recreation illusion", which dismisses the "stereo as the medium" illusion (and film as well). I am quite fond of the latter and as I say so often it is "optimum". The former leaves me wanting, as it seems to do for you as well, but that still leaves stereo (and film) as a great medium for the presentation of acoustical art. Let's try and keep that in mind.
Then I didn't explain my position clearly enough. I don't want stereo to do things it can't do. I'm perfectly fine with "stereo as the medium". But for this to work, better standards are needed.
If recreating real auditory spaces is the goal then something new has to be invented.
Let's think about it ...
during shifting the (listening) position, the off axis angle
increases for the farther speaker and decreases for the
closer speaker (assuming speakers are not toed in).
You are right in that we would have to compare differences
in the L-R frequency response for a given setup including
both speakers, toe-in angle etc.
But effects will tend to get even worse IMO for a shift in listening
position: There is no reason for mitigation.
For a perfect symmetrical setup and listening position - you are right -
the diffraction effects should be symmetrical (the same) for both channels.
But diffraction won't cause the same frequency response deviation from both speakers for a number of reasons, some of which you both missed.
1) If you cross the speakers in front of or behind the listener - as is almost always the case for good imaging - the angle offset of the listener from the front face of the speakers is not symmetrical for left and right as the listener moves along a horizontal/sideways axis.
For example if the speakers are crossed in front of the listener such that the speakers are both 10 degrees off pointing at the listener, as you move the listener sideways to the left you are going further off axis on the left speaker but moving closer towards on axis for the right speaker - but past some point you will start going off axis again on the other side of the right speaker.
Clearly if the off axis angle from each speaker is different (and moving in the same direction for some listener positional ranges and the opposite direction for others) the diffraction ripple of the two speakers will not track at all.
2) Even if the speakers are toed exactly at the listener (which would only be valid for one listening distance), unless the speakers are at infinity, geometry means the angles will not be equal and opposite as the listener moves sideways any significant distance. (Simple trig will bear this out.)
3) If the speakers are of an asymmetric design (tweeter offset to the right on the left speaker and to the left on the right speaker, for example) then all thoughts of symmetry when going off axis are thrown out the window - as the listener moves to the left they go off the "wide side" of the left speaker but the "narrow side" of the right speaker - the diffraction response will not match at all!
It's this last point which is one of the reasons why I do NOT like asymmetric speaker designs (mirror image left and right instead of mirrored down the centreline of the speaker).
An asymmetrical design might result in a flatter on axis response but the changes off axis each way will be quite different, such that phantom image stability is further harmed. The right way to do it IMHO is to have each speaker symmetrical down its centre line (left and right speakers identical) and deal with treble diffraction using directivity and/or absorption.
This has the best chance of minimising differences in left/right treble response as you move sideways, thus optimising the phantom image stability and centring.
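The "simple trig" in points 1) and 2) can be sketched in a few lines. This is only an illustration under assumed numbers (speakers 2.4 m apart, listener 2.5 m back); the function name and the layout are hypothetical, not taken from any setup described in this thread.

```python
import math


def off_axis_angles(half_spacing_m, aim_point, listener):
    """Signed off-axis angle (degrees) of the listener for the left and right speaker.

    Speakers sit at (-half_spacing_m, 0) and (+half_spacing_m, 0). aim_point and
    listener are (x, y) positions in metres, with y pointing into the room.
    """
    angles = []
    for speaker_x in (-half_spacing_m, half_spacing_m):
        # Direction of the speaker axis, measured from straight ahead.
        axis = math.atan2(aim_point[0] - speaker_x, aim_point[1])
        # Direction from the speaker to the listener.
        to_listener = math.atan2(listener[0] - speaker_x, listener[1])
        angles.append(math.degrees(to_listener - axis))
    return tuple(angles)


# Assumed layout: both speakers toed in exactly at the centre seat.
for shift in (0.0, -0.3, -0.6):
    left, right = off_axis_angles(1.2, (0.0, 2.5), (shift, 2.5))
    print(f"listener x = {shift:+.1f} m: left {left:+.2f} deg, right {right:+.2f} deg")
```

At the centre seat both angles are zero; once the listener shifts sideways the two magnitudes drift apart, which is the geometric part of the argument. Moving aim_point in front of the listener (for example (0.0, 2.0)) shows the crossed-in-front case from point 1).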
But such diffraction effects will make imaging notably unstable under small listener offsets, I suppose .... and that's what Simon was suggesting.
That's what I was suggesting, although to my way of thinking the traditional approaches to minimising diffraction effects, like offset drivers, actually harm the stability of the phantom image for the reason described above.
Lateral image placement at high frequencies (> ~2 kHz) is almost entirely determined by left/right amplitude balance. It takes a relatively small imbalance in the treble over an octave or more (on the order of 1 dB) to skew the phantom image significantly to one side.
I first realised this when trying to use two full range drivers which due to differences in their cone condition weren't well matched in the treble - the overall balance and trend was similar but all the individual ups and downs and bumps were different and distributed at different frequencies so that at any given frequency in the treble one driver or the other might be a couple of dB or so higher or lower than the other.
This resulted in a lot of "tearing" of the phantom image where it would seem to pull to one side or the other depending on content and sometimes "smear" into a wider but less distinct phantom image. I spent a lot of time applying unique left/right EQ to match the two drivers as much as possible and got a good feel for just how critical it was.
With the two channels more closely matching in frequency response the phantom image was much more stable and pin point in the centre without side to side tearing, but I had to match the two drivers with EQ very closely to achieve it. The match between left and right was a lot more critical than adherence to a flat response as far as the phantom image was concerned.
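For anyone wanting to check their own pair, a minimal sketch of the comparison might look like this. It assumes you already have left and right magnitude responses in dB at the same frequencies; the octave bands, the 1 dB threshold (taken from the rough figure above) and the sample numbers are all illustrative assumptions, not measurements.

```python
def band_imbalance_db(freqs_hz, left_db, right_db, lo_hz, hi_hz):
    """Mean left-minus-right level difference (dB) inside one frequency band."""
    diffs = [l - r for f, l, r in zip(freqs_hz, left_db, right_db) if lo_hz <= f < hi_hz]
    if not diffs:
        raise ValueError("no measurement points inside the band")
    return sum(diffs) / len(diffs)


def flag_mismatched_bands(freqs_hz, left_db, right_db, bands, threshold_db=1.0):
    """Return the bands whose mean L/R imbalance exceeds the threshold."""
    flagged = []
    for lo_hz, hi_hz in bands:
        imbalance = band_imbalance_db(freqs_hz, left_db, right_db, lo_hz, hi_hz)
        if abs(imbalance) > threshold_db:
            flagged.append(((lo_hz, hi_hz), imbalance))
    return flagged


# Made-up example data, for illustration only.
freqs = [2500, 3150, 5000, 6300, 10000, 12500]
left = [0.0, 0.5, 2.0, 1.5, -0.2, 0.1]
right = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
treble_octaves = [(2000, 4000), (4000, 8000), (8000, 16000)]

for band, imbalance in flag_mismatched_bands(freqs, left, right, treble_octaves):
    print(f"{band[0]}-{band[1]} Hz: left is {imbalance:+.2f} dB relative to right")
```

Averaging over an octave hides narrow bumps, so it only catches the broad imbalance described above, not the fine "tearing" from mismatched individual peaks.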
Now imagine the effects of diffraction at treble frequencies - at any given frequency as you move further off axis the diffraction ripple will cause the response to alternatively increase and decrease as in a ripple. At a different frequency the opposite may be happening at the same angle, as each frequency will have its own ripple density.
What happens is as the listener moves sideways certain key frequency bands (not all frequencies are equal in the treble in their effect on imaging) will increase on one speaker and simultaneously decrease on the other speaker, pulling the image to the side.
You can easily get a situation where moving slightly to the left causes an increase in the treble to the right ear and a decrease to the left ear, pulling the image to the opposite side from the one that you moved to. But if you keep moving it "flips" back to the other side again, as the ripple causes the amplitude balance at key frequency ranges to flip-flop between left and right.
It's important to note that two different things can cause very similar symptoms - treble diffraction from the speaker (the speakers actual response directed towards the listener is changing with angle and see-sawing back and forth relative to the opposite speaker as the angle changes) but also stereo cancellation through summation at your ears.
The comb filtering from two different speakers (even diffraction free speakers with near ideal response) will have a similar effect if the separation is not great enough for our HRTF to provide sufficient crosstalk reduction.
So as I said earlier, sufficient angular separation of the speakers is needed to get enough crosstalk reduction in the treble (from your own HRTF) to minimise the effects of comb filtering on the phantom image, but you also need to have a smooth diffraction free horizontal off axis response from the tweeter so that the response either stays the same or smoothly and monotonically falls as you go off axis.
What you don't want is one speaker increasing its response at a given frequency as you move to the side while the other one is simultaneously reducing it, and then reversing roles as you continue to move further to the side. That is, I think the response falling monotonically as you move off axis is an important ingredient, something that cannot happen at all frequencies when there is diffraction.
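A toy model can illustrate why diffraction rules out a monotonic off-axis fall. This is a sketch under loud assumptions and not anyone's actual simulation from this thread: the tweeter is an ideal point source, each baffle edge is a single inverted secondary source 0.10 m to either side, the edge strength of 0.2 is arbitrary, and the listener is in the far field.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def toy_level_db(freq_hz, angle_deg, edge_dist_m=0.10, edge_gain=0.2):
    """Level (dB re. the direct sound) of a point source plus two baffle-edge sources."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    s = math.sin(math.radians(angle_deg))
    response = 1.0  # direct sound from the tweeter
    # Edge on the far side from the listener: longer extra path, inverted polarity.
    response -= edge_gain * cmath.exp(-1j * k * edge_dist_m * (1.0 + s))
    # Edge on the near side: shorter extra path, inverted polarity.
    response -= edge_gain * cmath.exp(-1j * k * edge_dist_m * (1.0 - s))
    return 20.0 * math.log10(abs(response))


def is_monotonic_falling(levels_db):
    """True if the level never rises from one angle to the next."""
    return all(b <= a + 1e-9 for a, b in zip(levels_db, levels_db[1:]))


angles = range(0, 61)
with_edges = [toy_level_db(8000.0, a) for a in angles]
without_edges = [toy_level_db(8000.0, a, edge_gain=0.0) for a in angles]
print("monotonic with edge sources:   ", is_monotonic_falling(with_edges))
print("monotonic without edge sources:", is_monotonic_falling(without_edges))
```

In this model the level at 8 kHz swings up and down as the angle grows, so two speakers seen at different off-axis angles can easily trade places in level. The model ignores the tweeter's own directivity, which in a real speaker is what would provide the smooth fall.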
^
Now add lots of room reflections. This will result in more stable phantom images.
Only by turning a pin point "reach out and grab it" phantom image into a big ball of diffuse sound spread out around where the phantom image used to be.
Not my cup of tea. 🙂
Only by turning a pin point "reach out and grab it" phantom image into a big ball of diffuse sound spread out around where the phantom image used to be.
Exactly. In my opinion a good sound reproduction system should enable the recording to deliver both perceptions to the listener. Two speaker stereo can't do it.
Yes, a good analogy - and an important point in video and still image reproduction (true violet is often recorded as blue). But the analogy is useful beyond that. The director and the color grader/timer should have large gamut, accurate, calibrated displays, but they often don't. Just like not all mixing and mastering suites are perfect.
But that does not mean that all is lost for audio or video. If YOU have a playback system that is wide gamut and well calibrated, then almost everything you play back will look or sound its best. We often try to cheat with boosts and tweaks here or there, but those tricks don't work well for everything. A wide gamut, calibrated system will. It can be even better than what was seen or heard in the mastering suite.
To further the analogies a bit, I was thinking about the distinction between "accuracy" and "realism" with reproduced sound, with some people striving for one, some for the other, and others treating them as if they're the same thing, which I don't think they are.
What do I mean ? Imagine you have a yellow flower that reflects spectrally pure yellow light (I know that a real flower probably doesn't but lets pretend) - it looks yellow to our eyes, right ?
What if we then record that with a video camera and reproduce it on a TV - still looks yellow, right ? If the camera and TV are high quality and well calibrated, chances are the colour yellow will look VERY close to the original flower. (Yellow being much easier to reproduce than violet...well within the colour gamut of the TV system)
Is this reproduction of the flower realistic looking ? Yes. Is it an accurate reproduction of the light spectrum of the original scene ? No ! Not even close !
TVs (well, most of them, if you want to get picky) can't and don't produce yellow light; they can only produce red and green, which when mixed together look yellow due to the limitations of our eyes. The yellow appearance is an illusion.
Our eyes only have receptors for red green and blue spectra (only talking about the colour receptors here) each of which has an overlapping but distinct frequency range that they respond to.
Yellow light happens to fall in between red and green so partially stimulates both red and green receptors in the eyes, which our brain interprets as yellow. However discrete red and green spectra can also stimulate the red and green receptors to the same degree and thus look yellow even though it's a completely different light spectrum than the original scene.
In fact there are a multitude (infinite ?) of different spectral combinations that will appear the same colour to our eyes, for each possible perceived colour.
Video recording technology exploits this fact by mimicking the red green and blue sensory bands of our eyes in the camera and recording that information as RGB values (later converted to YUV or other formats) rather than attempting to capture the entire raw spectral response of the scene per pixel, which would be vastly more difficult, use vastly more storage and for no real benefit.
When the RGB values are reproduced by the TV we perceive the original colour - provided that the original colour was within the boundaries of the colourspace / gamut that the camera and TV use.
It looks realistic to us because it exploits and works with our perceptual weaknesses, (we only have 3 colour receptors with known centre frequencies and filter shapes, that don't vary a lot between people) but it is not technically accurate, and an alien (or perhaps other animals) with different visual systems would not be fooled and would not see a "realistic" image or rendition of colour, for example if they used 5 different colour receptors with different centre frequencies. (Or could perceive the actual spectra for that matter, like walking spectrometers 🙂 )
The point is there is a many to one mapping of actual light spectra to perceived colours. More than one light spectrum can stimulate the brain in an identical fashion.
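The many to one mapping can be made concrete with a toy calculation. Everything numeric here is an assumption for illustration: the two receptors are modelled as Gaussian curves with made-up peaks and widths rather than real eye data, the blue receptor is ignored altogether, and the wavelengths chosen for "yellow", "red" and "green" are arbitrary.

```python
import math

# Toy receptor peaks in nanometres (assumed values, not measured eye data).
RED_RECEPTOR_PEAK = 565.0
GREEN_RECEPTOR_PEAK = 535.0


def receptor_response(wavelength_nm, peak_nm, width_nm=40.0):
    """Gaussian sensitivity curve standing in for a real receptor."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / width_nm) ** 2)


def perceived(spectrum):
    """Map a spectrum {wavelength_nm: power} to (red, green) receptor signals."""
    red = sum(p * receptor_response(w, RED_RECEPTOR_PEAK) for w, p in spectrum.items())
    green = sum(p * receptor_response(w, GREEN_RECEPTOR_PEAK) for w, p in spectrum.items())
    return red, green


def match_with_red_green(target_nm, red_nm=620.0, green_nm=540.0):
    """Solve the 2x2 system for the red/green mix giving the same receptor signals."""
    t_red, t_green = perceived({target_nm: 1.0})
    r_red, r_green = perceived({red_nm: 1.0})
    g_red, g_green = perceived({green_nm: 1.0})
    det = r_red * g_green - g_red * r_green
    w_red = (t_red * g_green - g_red * t_green) / det
    w_green = (r_red * t_green - t_red * r_green) / det
    return {red_nm: w_red, green_nm: w_green}


pure_yellow = {580.0: 1.0}
mix = match_with_red_green(580.0)
print("pure yellow ->", perceived(pure_yellow))
print("red + green ->", perceived(mix), "using weights", mix)
```

The two spectra share no wavelength at all, yet both receptor signals come out the same, which is all the eye (in this toy version) has to go on.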
Isn't this the case for sound as well ? We already know that certain deficiencies in sound reproduction are generally inaudible - smoothly changing phase shift through the spectrum (that doesn't result in concentrated peaks of group delay) can result in a waveform that looks totally different but sounds identical.
There is a many to one mapping between an actual sound field and our perception of it. That means that a "realistic" sounding reproduction of something does not necessarily have to be a 1 to 1 "accurate" reproduction of it to stimulate the brain in the same way.
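The phase example is easy to demonstrate numerically. A minimal sketch, with assumed values throughout: five harmonics with 1/k amplitudes, one period of 64 samples, and a smooth quadratic phase curve picked arbitrarily. It shows only that the waveform changes while the magnitude spectrum does not; whether a given phase change is audible is a separate question that code cannot answer.

```python
import cmath
import math

N = 64  # samples in exactly one period
AMPLITUDES = {k: 1.0 / k for k in range(1, 6)}  # harmonic number -> amplitude


def synth(phase_of):
    """Build one period of the signal with the given phase per harmonic."""
    return [
        sum(a * math.sin(2.0 * math.pi * k * n / N + phase_of(k)) for k, a in AMPLITUDES.items())
        for n in range(N)
    ]


def magnitude_spectrum(signal):
    """Plain DFT magnitudes for bins 0 .. N/2 - 1."""
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(signal))) / N
        for k in range(N // 2)
    ]


original = synth(lambda k: 0.0)
shifted = synth(lambda k: 0.3 * k * k)  # smooth, frequency dependent phase shift

waveform_change = max(abs(a - b) for a, b in zip(original, shifted))
spectrum_change = max(
    abs(a - b) for a, b in zip(magnitude_spectrum(original), magnitude_spectrum(shifted))
)
print(f"largest waveform difference:  {waveform_change:.3f}")
print(f"largest magnitude difference: {spectrum_change:.2e}")
```

The waveform difference is large compared with the signal itself, while the magnitude difference is down at rounding-error level.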
An "accurate" reproduction of the full 3d sound field surrounding a listener at a venue with a 2 channel system is simply not possible - not enough information is captured in two channels to recreate the sound field, with the possible exception of a binaural recording, which does not lend itself to be decoded to multiple speakers for reproduction in a room.
So on a pure "accuracy" level 2 channel is doomed. We can improve its accuracy in the sense of getting the frequency response really flat, distortion and noise low, freedom from dynamic range compression etc etc, which are all noble goals, but it will never reproduce the sound field of a venue in an accurate way that can be verified by measurements to be accurately recreating the sound field. (It's questionable whether it can be done even with a large number of channels and speakers)
But can it sound "realistic" (i.e. impart a high degree of realism, almost as if you're there) even if it is not technically accurate ? To this I say yes, occasionally, when everything comes together just right.
I've heard very accurate speakers that are neutral, and sound like a very accurate and well balanced reproduction of a recording, but you are not fooled for one minute that they are speakers and not the real thing.
I've heard other speakers (admittedly in different rooms) which I know for a fact are NOT "accurate" in the traditional sense, with quite big errors in frequency response flatness, known problems with the crossovers, (not steep enough, too much driver overlap, poor phase tracking) odd driver layouts etc, on traditional measurements they wouldn't fare too well (except for dynamic performance) and despite all this on many recordings can sound absolutely uncannily real where you can close your eyes and picture yourself there. Seriously. I bet a few of you have heard times where certain recordings on a certain system just sounded completely real too.
How can that be possible ? I still don't know... All I can think of is that there is a distinction between accuracy and realism. Accuracy is trying to reproduce the original sound field as perfectly as possible (not possible with 2 channels and two speakers), while realism is trying to produce a sound field which, by knowing and exploiting the weaknesses (and specific many to one mappings) of the ear and brain, sounds equivalent to your brain even if it's not actually the same, or is in fact quite different. Like making the brain see yellow by presenting it with red and green light. (The phantom channel is perhaps an example of tricking the brain into hearing something different as equivalent)
How to do that in a repeatable and deliberate way I'm not sure as it seems to happen more by accident and good luck than design...
Just like a Bose 901 "improves" spaciousness.
It's adding something that is not in the recording in order to make it sound "good".
The added spaciousness is a property of the listening room and not a property of the recorded space. It's added to each and every recording regardless if appropriate or not.
How so?
No.

Griesinger is NOT describing a process that is adding anything (..except for the little bit about LARES).
Instead he is describing a process to lower effects introduced by the room that are perceptually confusing when trying to hear lower freq. in-phase random fluctuations that are present in the recording.
In fact he specifically mentions a reference as free-field.. that he's essentially trying to accomplish a free-field condition in a small room with reproduction at lower freq.s. to better reproduce those in-phase random fluctuations.
The near-field setup lowers the time and intensity of modal effects relative to direct sound (Griesinger's approach), and it ALSO increases the effect of time and intensity between channels (or their difference). (..particularly for "impulsive" sounds.)
^
Which specific Griesinger paper are you referring to?
And, why would my near field sub "Ironically, [...] do a better job of reproducing random in-phase fluctuation (..at least for that one listener), than what he proposes."??
All I can think of...
Very good post which reflects very much my own thinking.
- What is the ideal directivity pattern for stereo speakers?