Hi,
some coffee break thoughts on subject of depth perception:
Currently my impression is that perception, in general, is always the result of combining multiple sensory inputs, memories and all kinds of stuff, which the brain handles subconsciously and lets exist as our perception, in consciousness. I think it is important to understand that our conscious minds do not control what we hear, because the process that makes the perception is subconscious and we cannot directly control it with our thoughts. We can't force our brain to do things for us directly, but we can indirectly. In a loudspeaker / hifi context this means that what we perceive is never simply the truth that comes out of the loudspeaker; all kinds of things happen within our brain in addition, things that have nothing to do with hifi or speakers or the sound coming out of them, and they affect the perception.
I'm no neuroscientist or anything; this is just my current view on the subject based on random exposure to papers and data and personal observations. Take the material you attached to your post, the last screenshot in particular showing how the performers are perceived: notice how even in the "Direct listening" case the performers appear closer than they are in reality, just like with the various mic / loudspeaker chains as intermediate steps. I speculate this is something simple, like local early reflections muddling the localization cues, so the brain always makes a best guess of the location and errs toward the sound source being closer than it actually is, because that is the safer and better error to make. Estimating it as too far away is potentially lethal in comparison: imagine hearing a train behind you as you walk on the tracks; if you perceived it as too far away you'd get squished, instead of jumping out of the way earlier than the last second. This means that whatever is on the track is perceived as closer than it actually is, and if one wants it to seem further away it needs to be even further away, right?
So, if you want to perceive depth from your stereo you could check that your gear performs sufficiently well, but the trick to good depth perception is not in the gear but in the process that makes the perception. What I mean is, all the sensory inputs the brain utilizes need to be rigged so that your brain arrives at the location you want; the task is to trick your own brain by consciously using indirect methods to affect the unconscious parts of yourself. Simplest example of what I mean: assuming the brain errs the depth localization closer than reality, you'd want to make audio that hits your ear as if from too far away, so that when the brain adds its error the source lands at the intended depth, right? In practice this means all audio cues that indicate short distance should be minimized, so the brain doesn't notice them, and all things that make sound appear farther should be there, emphasized. Early reflections should be reduced in level, so as not to give the brain a hint that the sound source is right there in front of you, and cues of far distance, like late reverberation, enhanced. Also make sure the frequency balance is right; for example, too-bright speakers would indicate all sounds are close, as treble should attenuate with distance. It could also mean that the playback system and source audio need sufficient dynamic range to give cues that there are sounds nearby and also far away, and further away than they actually are, so kind of more dynamics than there would naturally be, less compression in the signal processing. Also, keep your eyes closed and so on, to control the sensory inputs, trying to align everything so that the brain provides the perception you want it to provide.
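The distance cues listed above (lower level, duller treble, more late reverb relative to the direct sound) can be sketched in a few lines of DSP. This is my own illustrative toy, not anything from the post; the function name, the cutoff-vs-distance mapping and the reverb level are all assumptions chosen just to show the idea.

```python
# Toy sketch: push a dry sound "farther away" by combining three of the
# distance cues discussed above: quieter level, duller treble (air/HF
# absorption), and a stronger late-reverb tail relative to the direct sound.
# All parameter mappings are illustrative assumptions, not measured data.
import numpy as np

def distance_cues(dry, fs, distance_m, rt60=1.8, seed=0):
    """Return `dry` processed to suggest a source `distance_m` away."""
    rng = np.random.default_rng(seed)
    # 1) Inverse-distance level drop, referenced to 1 m.
    gain = 1.0 / max(distance_m, 1.0)
    # 2) Crude HF absorption: one-pole lowpass whose cutoff falls with
    #    distance (purely illustrative mapping).
    fc = max(2000.0, 16000.0 / distance_m)
    a = np.exp(-2.0 * np.pi * fc / fs)
    direct = np.empty_like(dry)
    y = 0.0
    for i, x in enumerate(dry * gain):
        y = (1.0 - a) * x + a * y
        direct[i] = y
    # 3) Late reverb: exponentially decaying noise tail at a fixed level,
    #    so the direct/reverb ratio falls as the direct gain falls.
    tail_len = int(rt60 * fs)
    t = np.arange(tail_len) / fs
    tail = rng.standard_normal(tail_len) * 10.0 ** (-3.0 * t / rt60)
    wet = np.convolve(dry, tail) * 0.02
    out = np.zeros(len(wet))
    out[: len(direct)] += direct
    out += wet
    return out / np.max(np.abs(out))
```

Feeding the same dry click through at 1 m and 8 m gives two renderings where the farther one is duller and more reverb-dominated, which is the "hit the ear as if from too far away" trick in miniature.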
ps. there is a fun aspect to this: everyone should build their own speakers, because that will ensure theirs are the best in the world, not because the sound leaving them is the best but because yer brain makes you perceive them as the best. Until, at a later date, the brain's opinion changes 😀
I have extended the reflector point-source proof-of-concept to multiple-tweeter configurations, in order to better match the high sensitivity and SPL potential of large Pro/PA loudspeaker drivers. I now have nine different 15" widebands, including the (one-time) industry standards Peavey Scorpion and Black Widow, Fostex 15W300, JBL 2226H, etc. Of course, the tweeter assembly and convex reflector should be "bridge" mounted and physically isolated from the cone and dustcap; unnecessary near-field unless playing deep bass loud. I think this experimental "speaker" is to a large extent able to mimic the aims of the Fletcher (1934), Wente & Thuras monster-horn point-source with 120° horizontal, 60° vertical dispersion. Moving hemispherically around the dustcap while maintaining distance and ear/mic angle, comb-filtering and tonality change were not noticeable up to 75° off-axis (extreme violin music or a 10.5khz sinewave). Interestingly, the best (farthest-away) image-depth was observed well off-axis, despite the (putative) increased non-coincidence of cone and multiple-tweeter acoustic centers seen from off-axis (@AllenB @Scottmoose). Could be experimental error/inexactness in driver positioning, or perhaps something more subtle. I stand by the theory that distance perception is primarily monophonic and dependent on time- and phase-aligned midrange-to-very-high frequencies, i.e. a coherent source; the brain is then able to synthesize and interpret each sound as coming from a distance correlated to the actual recording distance (if minimally mic'ed) or the studio-engineered soundstage.
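The "acoustic centers within 1/4 wavelength" coherence criterion from the post is easy to put numbers on. A quick back-of-envelope helper (my own, assuming c = 343 m/s) shows why "<1cm" is about right at 10khz:

```python
# Back-of-envelope check of the quarter-wavelength alignment criterion:
# how far apart may two acoustic centers be while staying within λ/4?
# Speed of sound assumed to be 343 m/s (room temperature air).
def quarter_wavelength_mm(freq_hz, c=343.0):
    """Largest acoustic-center misalignment (mm) staying within λ/4."""
    return (c / freq_hz) / 4.0 * 1000.0

for f in (3500, 10000, 15000):
    print(f"{f} Hz: +/-{quarter_wavelength_mm(f):.1f} mm")
# 10 kHz comes out to roughly +/-8.6 mm, i.e. just under 1 cm.
```

So at 10khz the tolerance is about 8.6 mm, and at 15khz only about 5.7 mm, which is why physically aligning reflected acoustic centers gets fiddly at the top of the band.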
Hi guys. This continuation of #5404/5406 may be my thousandth message here (hard to believe), all in a short span of two-plus years. I finally squeezed in a few hours of experiments and trials in order to extend the reflector-coaxial virtual point-source to multiple-tweeter configurations. I don't know if the tweeters' reflected acoustic centers are quite within 1/4 wavelength (<1cm at very high frequencies >10khz), but they should be closer than front-firing tweeters. I evaluated the "speaker" by listening to music and a 10.5khz sinewave -- hovering over the speaker maintaining distance and...
Don't know if people can figure out how to play the videos on bilibili (China's youtube) -- maybe just keep pressing. Lossy compression, sorry.
Extreme violin whistling folktune repeated ~60° off-axis; on-axis; ~30° off-axis https://b23.tv/ljcMPBf
Same, hemisphere-hovering https://b23.tv/Fa4CFV0
Violin showpiece in the style of Central Asia, hemisphere-hovering https://b23.tv/wudKgCx
(The same two pieces I recorded for the single tweeter reflector "Axia")...
"Can this method be adapted to car audio"
Three pieces of evidence:
(1) "Stereophonic" well-focused depth of sound sources is best perceived with the L/R speakers pointed axially at the L/R ears at an oblique angle, where hearing is most acute (not from straight ahead nor from the sides). A listener moving forward/back a few inches around the "sweet spot" can collapse the soundstage depth. Directional, very-high-frequency tones at the limit of hearing can only be heard the same way, i.e. there's a correlation. Speakers (or speaker placements) with poor HF extension don't image well. Further experiment: observe soundstage depth perception while altering HF extension.
(2) At very high frequencies the wavelengths become smaller than the L/R distance-to-speaker variance, even within the sweet spot, so L/R phase-correlation is not (cannot be) critical for imaging, as is well known. Evolution-wise, having ears that face opposite directions to detect predator/prey (and gauge nearness) further suggests L/R decoupling, i.e. the ears can function independently. Loudness in general is a major component of distance perception, but so is the tonality change due to the attenuation of (more directional) HF with distance; this requires only one ear to hear, though having both ears co-directional can help (e.g. sensitivity). Further experiment: cut out one channel while listening from the sweet spot.
(3) I have done a 2-way (using the Wavecor 045 ceramic-annulus soft-dustcap tweeter) that had a deep soundstage when the tweeter was offset (time- and XO-phase both aligned); when the tweeter was wired in reverse polarity and moved to the baffle (XO-phase still aligned but the acoustic centers no longer so; nothing else changed), the deep soundstage was gone. Further experiment: start with time- and XO-phase both aligned and move the tweeter offset by one XO wavelength.
These further experiments can help to test the theory. Additional tests anyone?
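Point (2) above can be sanity-checked numerically. A small helper (my own, assuming c = 343 m/s) expresses a given path-length change as a fraction of a cycle; a couple of centimeters of head movement is negligible at typical ITD frequencies but over half a cycle at 10khz, which is why HF interaural phase stops being a usable cue there:

```python
# Rough numbers behind point (2): a small head movement changes each
# ear's path to the speaker by centimeters. Express that change in
# wavelengths (cycles) at various frequencies; once it approaches or
# exceeds half a cycle, interaural phase becomes ambiguous.
# Speed of sound assumed 343 m/s.
def cycles_of_shift(freq_hz, path_delta_m, c=343.0):
    """Path-length change expressed in cycles at freq_hz."""
    return path_delta_m * freq_hz / c

for f in (700, 1500, 10000):
    print(f"{f} Hz: 2 cm shift = {cycles_of_shift(f, 0.02):.2f} cycles")
```

At 700 Hz a 2 cm shift is only about 0.04 cycles (phase stays meaningful), while at 10khz it is about 0.58 cycles, past the half-cycle ambiguity point.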
(Observations of analogous psycho-visual and psycho-acoustic stereo phenomena can be found in the "ragged coaxial" thread and also some threads I started in late 2022.)
Please buy a car "coaxial speaker" (for your car) whose tweeter can be disassembled (by cutting if need be), then flipped around and re-mounted so that a small reflector disk or dome can be inserted about half-way between the cone's spider and the tweeter's voice-coil. Then nudge things around while listening (at an approximate, appropriate angle) to music or a test tone, for "maximum effect". The reflector can be anything: paper/foil, a plastic spoon, a piece of a ball; for a different project (omni-directional fullrange driver egg-goatee) I used clear packing tape folded on itself. Secure in place with wire/hot-glue/etc.
"I remade the salted egg goaties, one end obtuse-ish and the other end acute-ish. Obtuse didn't..."
Before the dangerously close big pointy snow-cones I had tried smaller cones cut from paper cups, then cooked salted goose eggs, then added "piece-of-tape" pointy goaties (to excellent effect). Still room for improvement above 10khz, but (listened to omni) already remarkably flat from above 120hz all the way to 10khz. Surprisingly, I could hear an 11.7khz tone played at low SPL, omni. The challenge is to extend/raise the very-HF bounce without enlarging the remaining ~8khz trace of the original HF plateau.
In response to some recent forum threads, and at the risk of offending both "audiophiles" and "audio engineers", I'd like to state a rather simple view of what distinguishes one position/aspiration from the other -- pertaining to a fundamental but usually misunderstood aspect of audio fidelity that I believe can be scientifically studied. Imaging, of course. Much has been known for decades (nearly a century, even; Fletcher et al.).
o The so-called HRTF (head-related transfer function) models how L/R sound waves diffract around the listener's head and facial features, reflect off the receiving outer-ear parts, then enter the ear canals and become "heard". While phase change/delay due to HRTF may account for partial stereo sound localization (as proven by binaural in-ear recording/playback), the high-frequency wavelength is too small compared to random variables such as distance for L vs R HF phase/delay to be an effective cue.
o Instead, L vs R relative loudness is a strong directional cue due to the outer ears' angular sensitivity/specificity (Fletcher); most recordings (except "purist" single-mic'ed ones) manipulate this to move/pan a sound source left or right on the "soundstage". This gives one dimension -- horizontal spread -- to the soundstage, and is the limit of how most listeners understand "stereo sound". We try to do better -- in several ways.
o By vertically aligning drivers' HF output and reducing baffle reflection and edge-diffraction -- thus minimizing each speaker's horizontal output signature/footprint -- the soundstage directional specificity can be sharpened.
o By aligning two drivers' phase at/around their crossover frequency, we get smoother frequency response but also -- subjectively -- a well-focused point-depth to each sound source as part of the soundstage. (I say "subjectively" because one common refrain is that imaging is not objectively measurable.)
o Constant directivity (a hugely popular theme in this forum) addresses the problem of a listener sitting (far) off the midline between the speakers, where the L vs R distance-to-speaker attenuation disparity interferes with the L vs R relative-loudness cues in the recording. If the speakers' L vs R directivity-related loudness fall-off pattern is able to match and compensate for the L vs R distance attenuation, while keeping tonality/FR (below 10khz) mainly flat and true, then horizontal imaging ought to be better. (If there's more to this, someone please explain.)
o Now we come to "audiophile", which I will simply/simplistically define as someone who has heard and recognized, and then strives for, a "deeper" soundstage and higher-fidelity transient response that requires not just XO phase-alignment but time-alignment over a very high bandwidth, above 10khz, whether through DSP or by physically aligning the drivers' acoustic centers. The subject of this thread.
Frankly, I don't believe very many people have heard such a "coherent" sound system from the proper sweet spot, ears angled obliquely to the speakers toed-in exactly on-axis, actually hearing a fully directional soundwave well above 10khz (wavelength shorter than the size of the transducer). This kind of experience is an Aha! moment.
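The L vs R relative-loudness panning described in the second point above is worth seeing in code. The standard mixing-desk implementation is a constant-power pan law (this is common practice, not claimed to be the poster's method; the function name is mine):

```python
# Minimal illustration of the L vs R relative-loudness cue: a pan pot
# places a mono source on the "soundstage" purely by level difference.
# Constant-power (sine/cosine) pan law, the usual mixing-console choice,
# keeps total radiated power constant across the pan range.
import math

def constant_power_pan(pos):
    """pos in [-1 (hard left), +1 (hard right)] -> (gain_L, gain_R)."""
    theta = (pos + 1.0) * math.pi / 4.0   # maps pos to 0..pi/2
    return math.cos(theta), math.sin(theta)

for pos in (-1.0, 0.0, 1.0):
    gl, gr = constant_power_pan(pos)
    print(f"pos {pos:+.1f}: L={gl:.3f}  R={gr:.3f}")
```

Note that this moves the image only horizontally; nothing in the level difference encodes depth, which is exactly why the thread argues depth needs the separate coherence/time-alignment cues.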
Please enlighten me/us!

What I miss in your write-up is the effect Doppler has on the image width and depth perception. It does have one!
Yes. In the late seventies / early eighties we did a lot of testing. With a 3-way system (XO at ~450-3500Hz) the sound space was nicely wide and deep, and the perspective appeared normal, that is, with a vanishing point: simply put, the further away, the smaller.
With a 2-way, XO at 1750Hz, this perspective sort of inverted: the further away, the wider, but also more vague, a cloud instead of a point. So the inverse of a vanishing point. For pop music it was nice most of the time; for classical or jazz the recording mostly also has the space recorded, and that did not reproduce the spaciousness as you would expect. Not bad, though. We attributed this to the Doppler effect; a rule of thumb then was that over a frequency range of up to about 3 octaves the Doppler effect is not audible, but at 5 or more it becomes audible. Also, very close harmony would be added together as one and more or less sound somewhat distorted. Think of two sopranos singing the Flower Duet, or a choir where sopranos or baritones sing the same line.
Some background on the situation then: one of the persons involved worked as a department head in a specialized medical centre for brain-related issues, and with the ENT specialists of that institute they, instead of studying traumas, studied the way human hearing (both nervous and psycho) normally works.
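The Doppler mechanism blamed above is easy to estimate: a cone doing bass excursion frequency-modulates any treble it is simultaneously reproducing. A quick sketch (my own helper, assuming c = 343 m/s and a sinusoidal bass excursion) puts numbers on a 2-way midwoofer crossed at 1750Hz:

```python
# Rough Doppler (FM distortion) estimate for a wide-band driver: the
# cone's bass-note motion shifts the treble it is also reproducing.
# Peak cone velocity of a sinusoid: v = 2*pi*f_low*x_peak.
# Peak frequency deviation of the treble tone: df = f_high * v / c.
import math

def doppler_shift_hz(f_low, x_peak_mm, f_high, c=343.0):
    """Peak Doppler deviation (Hz) of f_high from f_low excursion."""
    v_peak = 2.0 * math.pi * f_low * (x_peak_mm / 1000.0)
    return f_high * v_peak / c

# Example: 100 Hz bass at 1 mm peak excursion while playing 1750 Hz.
print(f"{doppler_shift_hz(100, 1.0, 1750):.1f} Hz peak deviation")
```

This also shows why the octave rule of thumb makes sense: the deviation scales with the ratio of the highest reproduced frequency to the excursion-dominated lows, so a very large bass driver (low excursion for a given SPL) or a narrower band per driver shrinks the effect.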
"With a 3-way system (xo at ~450-3500Hz) the sound space was nice wide and deep and perspective appeared normal, that is a vanishing point, simply put, the further away the smaller. With 2-way, xo at 1750Hz, this perspective sort of inversed. The further away the wider but also more vague/cloud instead of point."

Interesting: loss of coherence in the technical sense of time/phase errors. I get the vanishing-point analogy.
"We attibruted this to the doppler effect, a rule of thumb then was frequency range of up to about 3 octaves the doppler effect is not audible, But at 5 or more it becomes audible."

Thanks for the explanation. Yes, so many Doppler-like distortions: time/phase/frequency/IM generated at different points of the widest-excursion cycle. If 3-4 octaves are fine but 5 are not, then 2-way midwoofers are out, unless the bass driver is very large (like my 15") and the SPL is domestic, not PA. I'll try to think of a test for Doppler's effect on imaging.
I did listen to the Engleskyts soprano/organ played loud to unbearably loud -- the closing climactic bottom chord -- near-field and far-field in a large space, using a variety of 15" drivers and up to three high-sensitivity tweeters to match SPL... and was surprised by the avalanche of exuberant fine detail I hadn't been able to hear using small speakers and headphones (many). Not to say a 15" could match the live concert experience I had at Grace Cathedral in San Francisco... I think I played the tests too loud, and I was not doing bass extension, so the drivers just sat in a washbasin or over a tall steel drum, i.e. sealed.
Anyway, Doppler is relatively easy to mitigate. Ditto cabinet/component/tube vibration which can fuzzy-up otherwise pin-point imaging.
Audio-lensing and holographic depth perception: theory and experiments