Measuring the Imaginary

Surely (ahem) a set of binaural dummy-head-and-ears recordings, taken at various progressive positions of known (human-judged) differences in stereo effectiveness, fed into a mega-comparator (multi-billion-dollar AI), ought to identify and then reproduce the effect. All the commercial surround-sound technology probably came about this way.

Analogy: stereo vision. Imaginary, until the parallax math was worked out and a computer could reconstruct a 3D scene from two perspectives of known separation. However, even with only one perspective (viewing a single picture) a person can still perceive a 3D scene with about 60% true depth, PROVIDED only one eye is open and the other is shut. With both eyes open viewing the same image, the brain decides the scene is completely flat. This "imaginary" 3D effect is quite uncanny; I discovered it by accident in 1999.
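The "parallax math" mentioned above reduces to a one-line triangulation formula once the camera geometry is known. A quick Python sketch (the function name and the numbers in the example are mine, chosen for illustration):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulate depth from a rectified stereo pair.

    focal_px     -- focal length expressed in pixels
    baseline_m   -- horizontal separation of the two viewpoints (metres)
    disparity_px -- horizontal shift of the same feature between the images
    """
    if disparity_px <= 0:
        raise ValueError("feature at infinity or mismatched")
    return focal_px * baseline_m / disparity_px

# A feature shifted 50 px between viewpoints 6.5 cm apart (roughly human
# eye spacing), with an 800 px focal length, sits at about 1.04 m.
depth = depth_from_disparity(800.0, 0.065, 50.0)
```

Smaller disparity means greater depth, which is why depth resolution falls off rapidly with distance.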
 
It doesn't matter where the sound emanates from, it only matters what the pressure is at your ears.
A croaking frog out front on a lily pad: the sound comes from an actual point in space and your ears hear it as such. So, when two speakers "reassemble" a soundfield, do the wavefronts add at particular points in actual space, so that it really is louder there? Like two waves that add on a water surface, where the displacement really is higher at the intersecting point.

Or is that perception all in your head, and a hi-fi driving speakers is just a pair of "doors"? If you stuck a mic at the place where the horn seems to be blaring, apparently it measures no louder than any other spot.
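On the physics side of the question: in linear acoustics the wavefronts from two speakers really do superpose at every point in space, and a mic at a point of constructive interference really does measure a higher pressure. Whether that point corresponds to the perceived image location is the separate, perceptual question. A quick free-field Python sketch with ideal monopole sources (names and geometry are mine):

```python
import cmath
import math

def pressure_at(point, sources, freq, c=343.0):
    """Complex sound pressure at `point` from ideal monopole `sources`.

    Each source is ((x, y), amplitude). Pressures superpose linearly:
    p = sum(a / r * exp(-j*k*r)), with k = 2*pi*f/c (1/r spreading, free field).
    """
    k = 2 * math.pi * freq / c
    total = 0j
    for (sx, sy), amp in sources:
        r = math.hypot(point[0] - sx, point[1] - sy)
        total += amp / r * cmath.exp(-1j * k * r)
    return total

# Two in-phase speakers 2 m apart; a point equidistant from both receives
# exactly double the pressure of one speaker alone (constructive interference).
spk = [((-1.0, 0.0), 1.0), ((1.0, 0.0), 1.0)]
mid = (0.0, 3.0)
both = abs(pressure_at(mid, spk, 1000.0))
one = abs(pressure_at(mid, spk[:1], 1000.0))
```

Move the observation point slightly off the centre line at high frequency and the two path lengths differ, so the sum can just as easily dip toward a null.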
 
The stereo illusion of sound stage only works as well as it does if the virtual source is between and/or behind the speakers. If the virtual source ever seems to emanate from in front of the speakers, IME it indicates a problem in the reproduction system. In some cases the sound can also seem to emanate from beyond the width and/or height of a speaker, but it still shouldn't be in front of the speakers. So, if you have a frog somewhere in the stereo illusion's reproduction space, then it will have a L/R location given by cues such as the volume level from each speaker and ITD timing differences between the speakers. How far away the frog is in the distance will be given by cues such as the ratio of direct to reverberant sound, HF loss in air associated with distance, etc.
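The level-difference cue mentioned above has a standard textbook model, the stereophonic "tangent law", which predicts the phantom-image azimuth from the two channel gains. A quick Python sketch (function name mine; this is only a low-frequency, free-field approximation, and real perception also weighs ITD and other cues):

```python
import math

def image_angle_deg(g_left, g_right, speaker_angle_deg=30.0):
    """Predicted phantom-image azimuth from interchannel level differences.

    Classic stereophonic 'tangent law':
        tan(theta) = tan(theta0) * (gL - gR) / (gL + gR)
    where theta0 is the half-angle subtended by the speaker pair.
    Positive result = image pulled toward the left speaker.
    """
    t0 = math.tan(math.radians(speaker_angle_deg))
    t = t0 * (g_left - g_right) / (g_left + g_right)
    return math.degrees(math.atan(t))

# Equal gains put the frog dead centre; left channel only puts it
# exactly at the left speaker (30 degrees), never in front of it.
centre = image_angle_deg(1.0, 1.0)
hard_left = image_angle_deg(1.0, 0.0)
```

Note the model can only place the image within the arc between the speakers, consistent with the post's observation that images in front of the speakers signal a reproduction fault.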

To put it another way, the stereo illusion is more or less as much fooling your perceptual system as RGB video is. It's just a different perceptual sense, so it needs different tricks.
 
wchang said:
Any specific key active ingredients? And an explanation of how it worked?

No, as a manufacturer it’s a little tricky for me to get into ingredients and explanations, but I can get more specific with Schumann 7.73 Hz. Fire at will.

One thing I can say: I stabilize my CDs using special black electrical tape (3M Super 88). Cut 1.5” long strips, then cut each of them lengthwise to make two narrow 1.5” strips. I use three of those narrow strips of tape on the label side, radially, 120 degrees apart.

Also, ensure the CD is level when playing. You can’t rely on the top of the CD player or transport being level, since it is usually not (rpt not) at the same level as the spinning disc. These steps reduce the wobbling and fluttering of the CD while playing. The reason they are audible is that the laser-tracking servo feedback system no longer has to work so hard that it can’t keep up.
 
How about a microphone that's, say, a 90-degree flat array of "shotgun" or other highly directional microphones, arranged so that the outermost pair point outside the L/R speaker placement? Those would detect any "beyond the speaker" soundfield placement.

If by "beyond the speaker" soundfield placement you mean sound emanating outside of them (L/R, the soundstage seeming wider than where your L/R loudspeakers are located), then you are hearing your room playing a trick on you, not your loudspeakers imaging outside the stereo triangle. Sad but true. Just bring a pair of loudspeakers outdoors and listen to them on tracks that are 'wider than the loudspeaker locations'... an enlightening experience.



I was playing with the Reaper N-band dynamic range compressor plug-in, in expansion mode, listening to a big-band recording where, as those players will do, a few horn players stand up in the middle of the bandstand. They were clearly more dynamic with the multiband expander, but what was really cool is that they remained in place spatially, and were more clearly defined in their location by the dynamics of their toots. So now we have a dynamics-in-place/position quality: you wouldn't want the image elements to smear as things get louder in a transient, then snap back into focus, would you?

Why would their location change when you process their dynamic? 🤔
 
To put it another way, the stereo illusion is more or less as much fooling your perceptual system as RGB video is. It's just a different perceptual sense, so it needs different tricks.
This is not strictly correct. True stereophonic recording and reproduction - as opposed to binaural/transaural methods that have also been raised in this thread - renders exactly the sound pressures at the ears at low frequencies (below about 650Hz) as would be evident with a real acoustic source in the free field placed somewhere between the angle subtended by the (nominally) two loudspeakers.
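The low-frequency claim above can be illustrated numerically: the phasor sum of both speaker signals at each ear position produces an interaural phase difference that mimics the ITD of a single real source, which is the essence of "summing localization". A quick Python sketch (geometry and names are mine; head shadowing and diffraction are deliberately ignored, which is exactly why the model fails at higher frequencies):

```python
import cmath
import math

C = 343.0  # speed of sound in air, m/s

def interaural_delay_us(g_left, g_right, freq,
                        spk_angle=30.0, spk_dist=2.0, ear_sep=0.17):
    """Equivalent interaural time delay (microseconds) produced at two
    ear positions by a stereo pair carrying only a level difference.

    Each ear receives BOTH speaker signals, delayed by their path lengths;
    the phase of the summed pressure then differs between the ears,
    mimicking a single displaced source. Free-field, no head, low-frequency
    sketch only.
    """
    a = math.radians(spk_angle)
    speakers = [(-spk_dist * math.sin(a), spk_dist * math.cos(a), g_left),
                (spk_dist * math.sin(a), spk_dist * math.cos(a), g_right)]
    ears = [(-ear_sep / 2, 0.0), (ear_sep / 2, 0.0)]
    k = 2 * math.pi * freq / C
    phases = []
    for ex, ey in ears:
        p = sum(g * cmath.exp(-1j * k * math.hypot(sx - ex, sy - ey))
                for sx, sy, g in speakers)
        phases.append(cmath.phase(p))
    return (phases[0] - phases[1]) / (2 * math.pi * freq) * 1e6

# Equal gains give a symmetric field and zero equivalent ITD; a 6 dB
# left bias at 200 Hz yields a genuine leading phase at the left ear.
balanced = interaural_delay_us(1.0, 1.0, 200.0)
left_biased = interaural_delay_us(2.0, 1.0, 200.0)
```

Swapping the gains mirrors the geometry and exactly negates the delay, as the symmetry of the setup demands.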

At higher frequencies, the departure from this condition can be ascribed completely to the listener's head getting in the way - a condition that is very hard to circumvent! Nevertheless, there have existed several means to alleviate many of these audible defects of stereophonic recording since its inception - although these have been largely ignored or deleted. EMI's "stereosonic" filtering is a good example, having all but disappeared since the 1960s.

More generally in this thread, there persists much confusion caused by the errant inference that left and right recording channels individually represent the sound pressures required at the left and right ears. In short, stereophonic recording REQUIRES left and right channel information is reproduced at BOTH ears. Furthermore, there are significant audible benefits to be gained in stereophonic reproduction by employing more than two loudspeakers: Two loudspeakers is simply the minimum number required.

Further confusion in stereo abounds, however. The "specmanship" observed with channel separation, and features such as "dual mono" electronics, are yet more good examples of the departure of so-called "hi-fi" from the requirements for genuine high-fidelity stereophonic reproduction. There are significant audible benefits to be gained in stereophonic reproduction by employing suitable "mid-side (MS)" processing that deliberately compromises channel separation, for example.

We might also do well to regard the MS format as the natural format for stereophonic recording and reproduction, especially where we attempt to quantify subjective assessments such as "sound staging". But perhaps of more specific relevance to this thread...
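The MS matrix referred to above is a two-line transform, and the much-maligned "width" control is just a gain on the S (difference) channel before decoding. A minimal Python sketch (per-sample, pure Python for clarity; function name mine):

```python
def ms_width(left, right, width=1.0):
    """Adjust stereo width via mid-side matrixing.

    Encode:  M = (L + R) / 2,  S = (L - R) / 2
    Scale S by `width` (0 = mono, 1 = unchanged, >1 = wider), then
    decode:  L' = M + w*S,  R' = M - w*S
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 * width for l, r in zip(left, right)]
    out_l = [m + s for m, s in zip(mid, side)]
    out_r = [m - s for m, s in zip(mid, side)]
    return out_l, out_r

# width=1.0 is an exact identity; width=0.0 collapses both channels to M.
```

This also makes the earlier point concrete: reducing channel separation (width < 1) is a deliberate, controlled operation in the MS domain, not merely a defect.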

Stereophonic reproduction contains NO height information whatsoever. Where height information is perceived, it is often ascribable to distortions introduced by the loudspeakers, for example; the same effect is unlikely to be reliably reproduced over different loudspeakers or with different listeners.

"Image depth" is (possibly best) evident in monaural reproduction, and results primarily (obvious level differences aside) from the reproduction of early reflections from the recording environment - audible cues that are relatively easily masked, or absent altogether when material is recorded in dry, studio environments (often with different real or emulated environments presented in the same recording!).

(We should note here that whilst M=L+R, M is not necessarily mono! What M is depends on the microphone techniques employed. Indeed, many of the auditory cues critical to this thread are more reliant on the recording method than anything else).

"Spatial separation" between recorded performers in stereophonic reproduction is perfectly predictable, if only as desired at low frequencies. But where the means to compensate for the defects of stereophonic reproduction are exploited, the width of the stereo image can extend considerably and reliably beyond the outer loudspeakers, often to the extent of developing a sense of "sonic envelopment" too.

And if spatial separation is combined with image depth, we even have a means to qualify subjective qualities such as "sound staging" too. However, if we wish to quantify such terms, we must look beyond single-frequency analyses, beyond second-order (energetic) signal representations, and separate the defects of stereophonic methods from those due to other distortions in the reproduction chain. We must also take account of our learning abilities, which render all such measures compromised at the outset.

As such, the perception of sound staging can be modelled from bispectral information incident at the ears, both related to the frequency content per channel (how we tell one person's voice from another in old fashioned analogue telephony, for example), and resulting from having related information presented at the two ears (how we often tell the position of a sound source, for example). Such analysis is certainly not a simple task (even to envisage how the information can be usefully presented is difficult!). But it is a method that does not require anything imaginary.

In doing so, we might also then realise the effect of the SIGNIFICANT comb filtering that is evident when using two loudspeakers. (We also note here that these colourations are evident in "dual mono" replay too - that is, replaying a single monaural recorded channel over multiple loudspeakers). The result of these (linear) errors is that information is rendered errantly across a perceived sound stage, if indeed it is rendered at all! (We might then remark here on the oddity that assessing stereophonic reproduction via two loudspeakers is fundamentally flawed!).
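The interchannel comb filter described above is easy to quantify: at any ear off the exact centre line, the two speaker signals arrive with a path difference, and their sum has periodic nulls in frequency. A quick Python sketch (names and the 10 cm example are mine):

```python
import cmath
import math

def comb_gain_db(freq, path_diff_m, c=343.0):
    """Magnitude (dB) of two equal, in-phase signals summed after a
    path-length difference -- the interchannel comb filter at one ear.

    Nulls fall where the delay equals an odd half-period:
        f_null = (2n + 1) * c / (2 * path_diff)
    """
    tau = path_diff_m / c
    h = 1 + cmath.exp(-2j * math.pi * freq * tau)
    return 20 * math.log10(abs(h))

# A 10 cm path difference (ear slightly off-centre) puts the first null
# near c / (2 * 0.1) ~= 1715 Hz; at DC the coherent sum gains +6 dB.
dc_gain = comb_gain_db(0.0, 0.1)
near_null = comb_gain_db(1714.9, 0.1)
```

Right at a null the gain is minus infinity (log of zero), which is why the sketch probes just beside it; in a room, reflections partially fill the nulls in, which blurs rather than removes the colouration.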

Yet even with bispectral analysis tools in place, and with the gross effects of comb filtering compensated for as best as possible, our ability to learn - and even to teach ourselves - how to identify audible artefacts (whether they be real or otherwise) casts great doubt on the reliability of any measurement tool claiming to identify such small errors as relevant in this thread. Perhaps of greater relevance is therefore to first identify the reliability of the perceptions we are attempting to quantify.

[I also note here ahead of any responses stating the obvious, that much of the above musing is distinctly different from any discussion of the requirements for binaural (dummy head) recording, where we seek to reproduce the two discrete recorded channels at each ear with no "crosstalk" (nominally via headphones). Attempting transaural reproduction, where we try to reliably reproduce binaural recordings (or their emulation) via loudspeakers, remains a questionable endeavour].
 
There are significant audible benefits to be gained in stereophonic reproduction by employing suitable "mid-side (MS)" processing that deliberately compromises channel separation, for example.

We might also do well to regard the MS format as the natural format for stereophonic recording and reproduction...
Be nice to see some reference on these claims. I know what MS processing is, and how to implement it. More interested in the psychoacoustics of your claims.
 
Be nice to see some reference on these claims. I know what MS processing is, and how to implement it. More interested in the psychoacoustics of your claims.
There are no claims of my own included. Rather my response is a selected summary of refereed and well-cited publications.

As a starting point, a good paper on the defects of stereophonic reproduction is (I would suggest):

"A New Approach to the Assessment of Stereophonic Sound System Performance". J. C. Bennett, K. Barker, and F. O. Edeko. (JAES, vol. 33, no. 5, pp. 314-321, 1985 May)

For articles on stereophonic recording and MS processing, I would recommend seeking out any of Michael Gerzon's papers on recording or shuffling. He had a gift for making the complicated sound simple. I believe too that his "metatheory of auditory localisation" published in the JAES also contains an appendix dedicated to stereophonic perception (?).

Similarly easily readable references on bispectral and cross-bispectral analyses are not well-known to me.
 
Any two loudspeakers? Very large panel ESL speakers?
Two ears listening to two or more sound sources separated in space produces constructive and destructive interference. Equalisation at high frequencies is limited by inevitable head movements being relatively large re the wavelengths concerned. The other means of smoothing is to generate a diffuse response, which is essentially to blur the information in time. It matters not how big your speakers are, nor how they move the air.
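The head-movement argument above comes down to comparing the movement to the wavelength. A quick Python sketch (the quarter-wavelength criterion is my own illustrative choice, not a figure from the post):

```python
def wavelength_m(freq_hz, c=343.0):
    """Acoustic wavelength in air at a given frequency."""
    return c / freq_hz

def eq_breakdown_freq(head_movement_m, c=343.0):
    """Frequency above which a given head movement exceeds a quarter
    wavelength -- a rough threshold beyond which fixed-point equalisation
    of the two-speaker interference pattern stops tracking the listener.
    """
    return c / (4 * head_movement_m)

# A 2 cm involuntary head movement: 343 / (4 * 0.02) ~= 4290 Hz.
# Above roughly that frequency, the ear is no longer where the EQ assumed.
threshold = eq_breakdown_freq(0.02)
```

By this crude measure, exact equalisation of the interference pattern is already hopeless well within the treble, regardless of loudspeaker size or type.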
 
It matters not how big your speakers are, nor how they move the air.
Actually, I need to correct myself here! It does matter if your speakers are large (specifically, how large they are re the wavelength in question), because smearing in time, or additional comb filtering with large (singular rather than concentric, like the Quad ESL63) diaphragms, is inevitable - and that will be in addition to the stereo filtering from having two such loudspeakers. Having one effect compensate the other would appear unlikely if the diaphragm dimensions are larger than those of your head.
 
Okay. I think I will disagree on a few points. I have the Waves S1 Stereo Imager, which I can use in Samplitude to evaluate shuffling on the Sound Lab ESL speakers. IMHO, it's an obvious and phony-sounding effect, which means it was wasted money.

Also, if one thinks about it, it's obvious that a lot of processing such as MS and shuffling would be low-cost to include in mass-produced audio devices, and almost free in computer software such as HQ Player, JRiver, etc. The problem is that most of that stuff sounds bad if used on a good system. What does sound good, and what some people want more than DSP gimmicks, is conversion of PCM to high-sample-rate DSD.
That's for good hi-fi systems.

However, it appears to be a very different situation with car audio. There are a lot of DSP tricks being used to sell high-end car sound systems. Some people seem to be very impressed and other people think it sounds bad.

Regarding Sound Lab ESL panels, they are segmented and curved. There is a little beaming at very high frequencies from the smaller segments, but it doesn't sound like comb filtering. I can create comb filtering by placing a small metal reflector on the rug in front of the speakers. The comb filtering effect has an obvious sound. Take the reflector away and there is no audible comb filtering. Small reflecting surfaces between and behind the speakers can also produce audible comb filtering, which I find and correct because I recognize the sound. IOW, my observations with my speakers in my system do not confirm your claims. Therefore, I remain unconvinced.
 
Stereophonic reproduction contains NO height information whatsoever. Where height information is perceived, it is often ascribable to distortions introduced by the loudspeakers, for example; the same effect is unlikely to be reliably reproduced over different loudspeakers or with different listeners.

"Image depth" is (possibly best) evident in monaural reproduction, and results primarily (obvious level differences aside) from the reproduction of early reflections from the recording environment - audible cues that are relatively easily masked, or absent altogether when material is recorded in dry, studio environments (often with different real or emulated environments presented in the same recording!).

If those statements were true then I would not (rpt not) be able to get a very wide, deep and high soundstage using headphones; the signal arriving at my ears contains all the necessary ambient information from the original recording. I realize other audiophiles often say they don’t like headphones because they don’t soundstage. My soundstage on headphones now - it wasn’t always so at all, it took a lot of experimentation and feeling my way along, using all tools at my disposal, which are many - is very similar to the soundstage I got with all-tube electronics driving Quad 57s. If you sit a chimpanzee down at a typewriter he will eventually type out all the plays of Wm Shakespeare. I’m not saying this huge soundstage occurs with all recordings, and not to the same degree. It all depends on how the recording was made and the venue.

Geoff Kait
Machina Dynamica
Not too chicken to change
 
Markw4,
S1 isn't supposed to be used on a 2-track signal but on discrete sounds in the context of a mix/stems. That's why it has 'asymmetric' placement capability.

The only exception would be to use it as a typical MS stereo 'expander/compressor' process when a mix has a 'hole' in the M part of the matrix (which often happens when people use a mic pair without consulting the tables in the paper I linked previously, giving an abnormal SRA - or they don't know what SRA is...) or when the image is too narrow: in those cases, on a good system, the plug-in produces very good results, as does any M/S matrixing treatment.
The issue, IMHO, is when you try to use it (without finesse) on everything, especially things that do not need such treatment...


Soundbloke's explanations are what is usually considered the state of scientific knowledge on this at this point in time. At least, that was the case the last time I took a serious look at it, some years ago, when I had to teach this professionally. It might have changed since, as digital processing has opened up approaches that weren't possible a few years ago.

People claiming there is height information within typically captured stereo (other than the distortion Soundbloke talked about, or acoustics playing tricks on you): LOL.
Time to study what it's all about. It's relatively easy, as there are scientific articles to refer to.
 
The only exception would be to use it as a typical MS stereo 'expander/ compressor' process...
Exactly my interest. Even slight changes to width came at a price: damage to the center image. And it didn't sound natural at all.
Of course there is a crossover frequency adjustment, but any frequencies chosen for shuffling were adversely affected.

Regarding the known science, little if any of it was conducted on systems with Sound Lab ESLs and other very high quality reproduction equipment. It's unlikely anyone would bother even if they could afford the cost, since most listeners will be listening on more modest equipment anyway. Moreover, the best reproduction equipment available today is measurably better than the best of 40 years ago. Don't even get me started on the evolution of high-quality CD playback.
 
its an obvious and phony sounding effect
There is a significant problem with such a generalisation...

Shuffling has (unfortunately) come to subsume two different processes - one being the compensation of coincidentally-miked acoustic recordings, the other being the "enhancement" of stereo recordings such as in "spaciousness" effects and the like.

These "shuffling" effects are not the same thing and should not be confused. The former is absolutely not "phony" but restorative of what was present at the recording; the latter is indeed an effect that some may find beneficial and others may not.

Both such "shufflers" are obvious, however. Listening to a coincidentally-miked acoustic recording without appropriate compensation is an obvious distortion. Adding a "spaciousness" control can usefully increase audible resolution in some cases, but this is highly dependent on the nature of the recording itself. (Using both together should be a non-starter!)
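The restorative shuffler described above amounts to a gentle low-frequency shelf applied to the S (difference) channel only. A quick Python sketch of such a target magnitude response (the gain and turnover values here are illustrative placeholders, not from any published shuffler design):

```python
import math

def shuffler_s_gain(freq_hz, low_gain_db=4.0, turnover_hz=400.0):
    """Target side-channel magnitude for a simple LF 'width' shuffler.

    First-order shelf: boost S by `low_gain_db` below `turnover_hz`,
    sliding smoothly back to unity gain above it. For the analogue
    prototype H(s) = (g + s/w0) / (1 + s/w0) the magnitude is:
        |H| = sqrt((g^2 + w^2) / (1 + w^2)),  w = f / turnover
    """
    g = 10 ** (low_gain_db / 20)
    w = freq_hz / turnover_hz
    return math.sqrt((g * g + w * w) / (1 + w * w))

# Full boost at DC, unity gain well above the turnover; the transition
# is deliberately gentle, as abrupt width changes sound unnatural.
lf = shuffler_s_gain(0.0)
hf = shuffler_s_gain(20000.0)
```

Applying this to S widens only the low-frequency image, which is where the level-difference localisation cue is weakest for coincident microphone recordings.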

Further, MS processing can extend beyond the low frequency realm (where spaciousness typically applies below 400-500Hz), and can be used to ameliorate the mid-frequency response colourations imposed on "centrally positioned" information, such as evident on lead vocals replayed via two loudspeakers for a typical example.

Many years ago, Blauert described a filter for the M channel to compensate for just the first such frequency. I believe also this might have appeared commercially somewhere as a "presence" control? Certainly DSP can offer higher frequency corrections too. How high is still limited by wavelength as per my previous response, however.

But there is lots more to be gained by MS processing too: variable width controls to compensate for the different localisation methods we exploit at low and high frequencies (rather than exacting equalisation); using such processing to provide more than two loudspeaker feeds, and thereby ameliorating the aforementioned comb filtering effects. I have yet to encounter an example where moving from two to three loudspeakers as such is not immediately obvious and preferable to everyone present at the time.

Further, IMHO loudspeakers with acoustically large diaphragms always sound coloured and give rise to false impressions of stereophonic images - that is they generate distortions that are not part of the recorded information even if some like the effect. I am certainly not alone in that opinion, but others are perfectly free to exhibit a different subjective preference.

I remain unconvinced

As I referred to in my first response, each of us possesses a highly non-linear learning capability with an inherent and significant capacity for delusion too. You may very well have learned to compensate for the defects in your loudspeakers - something for which others would likely lack the particular experience. Without blind testing, you might also be suffering from a delusion. It matters not. Neither does your opinion of my "claims"/statements of fact. But the task of high fidelity reproduction is surely to limit such defects in the first place?
 
I don't see that shuffling plays a useful role in high quality reproduction systems.

However, I would agree that a center speaker could be a nice improvement, but not a junky center speaker. If each speaker and its amplifier cost $25,000, and each DAC channel costs an additional $5k+, then the question arises whether it would be preferable to scrap the 2-channel system and replace it with a lesser-quality, equivalent-total-cost 3-channel system. Personally, I would not make that swap. The system is quite useful as it is.