Audio-lensing and holographic depth perception: theory and experiments

Ha! (Mods, please don't move this to the Lounge yet.)

After dozens of DIY experiments under a wide range of (domestic) conditions and speaker/listener placements (including height), I'm reasonably convinced by the empirical evidence and common-sense inference to state a theory of soundstage imaging depth. Bits have been posted over time to the Full Range Photo Gallery and various threads, such as the on-going "ragged coaxials".

Metaphysics or psycho-acoustics -- a progression?

Claim 1: It is the monophonic recording-through-to-playback-chain that is responsible for enabling soundstage depth perception; the stereophonic L/R is responsible for enabling horizontal directional perception.

Claim 2: A necessary and sufficient condition for a monophonic audio chain to enable the perception of soundstage depth projecting well beyond the speaker is close time- and phase-alignment (coherence), including the very-high-frequency components of the sound.

I call this effect "audio-lensing" and experienced it quite dramatically, replaying phone recordings of my coherent near-field 15in/ceramic-tweeter "reflector-coaxial" (playing virtuosic violin music). Aiming the phone's top end (a Vivo iqoo Z7 with stereo speakers at its ends) in various directions, and at various distance combinations from myself and from a room boundary 1-7 meters away, the sound source was heard to project/float in mid-air far from the phone, near or just past the wall in the pointed direction. In one extreme case the sound "played" from a second-floor balcony at ~7.5 m. The phenomenon was analogous to "lensing" in that the distances from the phone to me and from the phone to the perceived source were in a reciprocal relationship: the source was heard farther away when the phone was closer to me (and the reverse). Again, this was stereo-phone playback, with one end-speaker pointed away, of recordings of a monophonic coherent near-field 15"/ceramic-tweeter 2-way. The phenomenon was not observed when replaying other people's sound clips, but I did not have time to test very many.
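
(If one wanted to formalize the analogy: a thin-lens-style relation would display exactly this reciprocity. This is only an illustrative assumption on my part; the "focal length" f is a hypothetical constant of the particular setup, not something these observations establish.)

```latex
% Thin-lens-style relation (an assumed analogy, not a fitted model):
%   d_l = phone-to-listener distance
%   d_i = phone-to-perceived-image distance
%   f   = hypothetical "focal length" of the setup
\frac{1}{d_l} + \frac{1}{d_i} \approx \frac{1}{f}
```

For fixed f, decreasing d_l forces d_i to increase, matching "source farther when phone closer".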

I will be adding experimental observations and inferences to back up this theory, and audio clips if possible. I'd appreciate members sharing their experience and insight.

https://www.diyaudio.com/community/...ials-with-ragged-response.408887/post-7783316
 
Three pieces of evidence:

(1) "Stereophonic" well-focused-depth of sound sources is best perceived with L/R speakers pointed axially at L/R ears at an oblique angle, where hearing is most acute (not from straight-ahead nor from the sides). Listener moving forward/back a few inches about the "sweet spot" can collapse the soundstage depth. Directional, very high frequency tones at the limit of hearing can only be heard the same way i.e. there's correlation. Poor HF-extension speakers (or speaker-placement) don't image well. Further experiment: Observe soundstage depth perception while altering HF extension.

(2) At very high frequencies the wavelengths become smaller than the L/R distance-to-speaker variance, even within the sweet spot, so L/R phase correlation is not (cannot be) critical for imaging, as is well known. Evolution-wise, having ears that face opposite directions to detect predator/prey (and gauge nearness) further suggests L/R decoupling, i.e. the ears can function independently. Loudness in general is a major component of distance perception, but so is the tonality change due to the attenuation of (more directional) HF with distance; this requires only one ear to hear, though having both ears co-directional can help (e.g. sensitivity). Further experiment: cut out one channel while listening from the sweet spot.
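
A back-of-envelope check of the wavelength claim (a path-length variance of a few centimeters is my assumption of typical sweet-spot tolerance):

```python
# Wavelength vs. plausible L/R path-length variance at the sweet spot.
c = 343.0                                  # speed of sound (m/s), ~20 degC
for f in (1000, 4000, 10000, 16000):
    print(f"{f:>5} Hz: wavelength {1000 * c / f:6.1f} mm")
# ~34 mm at 10 kHz: a head movement of a few cm already exceeds one
# wavelength, so interaural phase there is ambiguous, as argued above.
```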

(3) I have built a 2-way (using the Wavecor 045 ceramic-annulus/soft-dustcap tweeter) that had a deep soundstage when the tweeter was offset (time- and XO-phase both aligned); when the tweeter was wired in reverse polarity and moved to the baffle (XO-phase still aligned but the acoustic centers no longer so; nothing else changed), the deep soundstage was gone. Further experiment: start with time- and XO-phase both aligned, then move the tweeter offset by one XO wavelength.
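
For scale, a sketch of what "offset by one XO wavelength" amounts to (the 2.5 kHz crossover is an assumed example, not this build's actual value):

```python
# Physical offset corresponding to one wavelength at the XO frequency.
c = 343.0                                  # speed of sound, m/s
f_xo = 2500.0                              # assumed example XO frequency, Hz
print(f"one XO wavelength = {1000 * c / f_xo:.0f} mm")   # ~137 mm
```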

These further experiments can help to test the theory. Additional tests, anyone?

(Observations of analogous psycho-visual and psycho-acoustic stereo phenomena can be found in the "ragged coaxial" thread and also some threads I started in late 2022.)
 
Tying these together:
The phenomenon was analogous to "lensing" in that the distances from the phone to me and from the phone to the perceived source were in a reciprocal relationship: the source was heard farther away when the phone was closer to me (and the reverse).
Loudness in general is a major component of distance perception, but so is the tonality change due to the attenuation of (more directional) HF with distance;
... a deep soundstage when the tweeter was offset (time- and XO-phase both aligned); when the tweeter was wired in reverse polarity and moved to the baffle (XO-phase still aligned but the acoustic centers no longer so; nothing else changed), the deep soundstage was gone.

A mechanism hypothesis: when a sound is coherent, i.e. all of its component harmonics up to very high frequency arrive together at the ear, and its temporal changes (such as a transient attack) are in proper order (i.e. very-high-fidelity response), the brain is able to interpret tonality degradation (i.e. attenuation of very high frequencies with distance) as the distance to the sound source. Lensing happens because of the top end-speaker's (still coherent) distant wall bounce, in combination with the bottom end-speaker's off-axis attenuation: closer to the phone, without precise axial calibration, the off-axis attenuation of very high frequencies at the ear is actually higher; this is perceived as a more distant sound source. Covering either end-speaker degraded the projected image.
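
For a sense of the magnitudes behind "tonality degradation as distance", here is a sketch; the absorption figures are rough textbook values for ~20 degC / 50% RH, quoted from memory and to be treated as assumptions:

```python
# Extra HF loss from air absorption at a few source distances.
absorption_db_per_m = {2000: 0.01, 4000: 0.03, 8000: 0.1, 16000: 0.4}
for d in (1, 4, 8):                        # source distances in meters
    tilt = {f: round(a * d, 2) for f, a in absorption_db_per_m.items()}
    print(f"{d} m: extra HF loss (dB) {tilt}")
# Only fractions of a dB per meter below 8 kHz: a cue this subtle would
# plausibly require the coherent arrival the hypothesis calls for.
```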

I don't yet fully believe this complex explanation, because the location of the projected image is fairly stable under quick changes of head position. But this stability could be a memory effect, which I did observe. Listening this morning before work to the LX MAOP 5/7 (mono, near a corner) from different angles, distances, and heights as I traversed the room, the floating image would shift (or the brain would lock in), but sometimes only after a momentary lag.

I can think of several experiments to test/falsify this hypothesis. What do you think?

https://www.diyaudio.com/community/threads/full-range-speaker-photo-gallery.65061/post-7791925
 
Hi,

for more listening tests with stereo sound, try mono pink noise through two speakers. It should make a maximally dry and sharp center phantom image, but the perception of it might differ. Listen to the size of the phantom center, and whether your speakers seem silent or whether some of the sound localizes to the speakers as well. Move slowly farther from or closer to the speakers, staying equidistant to both, and note what happens to the image. What if you play with toe-in and repeat the test?
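
A minimal sketch for generating that test signal (Python, assuming numpy/soundfile; length and level are arbitrary choices): identical pink noise in both channels, i.e. dual mono:

```python
# Dual-mono pink noise for the phantom-center listening test.
import numpy as np
import soundfile as sf

fs, dur = 48000, 30
n = fs * dur
spectrum = np.fft.rfft(np.random.randn(n))
f = np.fft.rfftfreq(n, 1 / fs)
spectrum[1:] /= np.sqrt(f[1:])             # 1/sqrt(f) amplitude -> pink
mono = np.fft.irfft(spectrum, n)
mono *= 0.2 / np.max(np.abs(mono))         # leave headroom
sf.write("pink_mono.wav", np.column_stack([mono, mono]), fs)
```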

Does this relate to the mono tests you've been doing and the perceptual effects you've been noticing?

About localization and memory:
yeah, I think localization has memory. I have a vague memory of reading somewhere that transients make the brain localize a sound, and the location sticks until new reliable information, perhaps another transient, relocates it. Transients are easily localizable, but sounds with a slow onset are not; I remember reading that this is because the brain has a sufficient amount of time to process the information before reflections come in and mix it up. IOW, there is a high signal-to-noise ratio with the transient. Here is a listening test: play mono violin on either of your speakers, left or right, and stand at the other side of your room. Can you tell which speaker output it? Now try some transient, like a snare-drum hit or a triangle, or even spoken word; can you localize that better to one of the speakers? If not, try moving a bit closer to reduce early reflections and repeat the test. The transient-containing signal localizes from a much farther listening distance; beyond that, early reflections swamp it and the brain just cannot pick it up. Once you localize the sound, the brain kind of assumes it stays there. I assume that if the loudspeaker also loses this transient-related information, through too-bad group delay, diffraction issues, resonances, anything that reduces phase information in general, the closer you need to be for the localization to happen.
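
If you want matched test signals for this, here is a rough sketch (all parameters are my assumptions): the same tone rendered once with a sharp attack and once with a slow onset:

```python
# Transient vs. slow-onset test tones for the localization test.
import numpy as np
import soundfile as sf

fs = 48000
t = np.arange(fs) / fs                     # 1 second
tone = np.sin(2 * np.pi * 880 * t)
click = tone * np.exp(-t / 0.01)           # sharp attack, fast decay
swell = tone * np.minimum(t / 0.5, 1.0)    # 500 ms fade-in, no transient
sf.write("transient.wav", 0.5 * click, fs)
sf.write("slow_onset.wav", 0.5 * swell, fs)
```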

Now that you are interested in all this stuff, it's good to read some basics. For example, check out the Wikipedia article on perception, which reads:
"The perceptual systems of the brain enable individuals to see the world around them as stable, even though the sensory information is typically incomplete and rapidly varying. Human and other animal brains are structured in a modular way, with different areas processing different kinds of sensory information. Some of these modules take the form of sensory maps, mapping some aspect of the world across part of the brain's surface. These different modules are interconnected and influence each other. For instance, taste is strongly influenced by smell.[7]"

Some other stuff:
"
Perception is not only the passive receipt of these signals, but it is also shaped by the recipient's learning, memory, expectation, and attention.[4][5] Sensory input is a process that transforms this low-level information to higher-level information (e.g., extracts shapes for object recognition).[5] The following process connects a person's concepts and expectations (or knowledge) with restorative and selective mechanisms, such as attention, that influence perception.

Perception depends on complex functions of the nervous system, but subjectively seems mostly effortless because this processing happens outside conscious awareness.[3] Since the rise of experimental psychology in the 19th century, psychology's understanding of perception has progressed by combining a variety of techniques.[4] Psychophysics quantitatively describes the relationships between the physical qualities of the sensory input and perception.[6] Sensory neuroscience studies the neural mechanisms underlying perception. Perceptual systems can also be studied computationally, in terms of the information they process. Perceptual issues in philosophy include the extent to which sensory qualities such as sound, smell or color exist in objective reality rather than in the mind of the perceiver.[4]

Although people traditionally viewed the senses as passive receptors, the study of illusions and ambiguous images has demonstrated that the brain's perceptual systems actively and pre-consciously attempt to make sense of their input.[4] There is still active debate about the extent to which perception is an active process of hypothesis testing, analogous to science, or whether realistic sensory information is rich enough to make this process unnecessary.[4]
"


So it's best to do this listening-test stuff eyes closed 🙂 Need to hear better? Then indirectly exploit the brain's sensory system to deliver the perception you'd like, e.g. by imagining how it works 😉
 
Another example: if you need to listen for the effects of something, like the importance of acoustic offset, first subject yourself to it somehow, to exploit the fact that our sensory system shapes perception, as in the Wikipedia article above. Because the auditory system adapts, you can actively adapt it to your advantage. Although we have no direct control over the subconscious parts of the auditory system, we can indirectly affect our perception of sound!

I haven't actively focused on the acoustic offset, so I have no listening test for you, but I'll use an early-reflections listening test as an example:

Before you start listening to your stereo system for effects of early reflections, the first thing is to subject yourself to reflections to make yourself sensitive to them. I've used a mobile phone with a makeshift horn attached: just cut up some big soda bottle and hold it in front of your phone speaker so that you get some extra directivity. Play some noise, point your hand-held sound torch at walls, the floor, objects in general, and listen to the reflection. It's really easy, as you are in control of making it.

Now set up your stereo so that only one speaker is playing, the other disconnected. Stand behind the playing speaker and rotate it a bit, pointing it towards various walls the same way as you did with your phone a minute ago, and sure enough you'll spot quite easily what the reflection sounds like. Then, while you hear the reflection on the wall, move from behind the speaker to in front of it while concentrating on the reflection you just heard, and sure enough you'll keep hearing it quite clearly the whole time. You are now actively listening to the reflection, detecting it very well, and can do your tests with it. Change the noise to music if you wish, and so on. The point was to make your sensory system very sensitive to the reflection, so it's easier to notice changes in it and adjust things as you see fit.

Now proceed with whatever it was you wanted to test. You could, for example, listen to your back-wall reflection: how it changes as you move closer to or farther from the speakers (adjusting the listening triangle), how it changes when you turn your head; use acoustic treatment like a pillow to figure out whether you can attenuate it; try to find a listening position where the back-wall reflection disappears from your perception; change toe-in until the reflection disappears; whatever you feel you need to be doing. The point is to first determine whether it affects perception negatively and draws your attention, and if it does, how the system needs to be arranged to make it disappear and not bother you or take away from your music enjoyment. If you can make it not draw attention to itself while your senses are really heightened to it, you can be quite sure it's a non-issue in a normal listening situation the next day, after your hearing has reset. Do whatever it was you wanted to do with it. This will give a lot of food for thought and a nice, fun perceptual experience, if nothing else 😀
 
Careful... you might find yourself in the bioacoustics realm, where individual differences from person to person give subjectivity its merit... and the new school of sighted listeners and reviewers on YouTube might label you a heretic.
 
Three pieces of evidence:

(1) "Stereophonic" well-focused depth of sound sources is best perceived with the L/R speakers pointed axially at the L/R ears at an oblique angle, where hearing is most acute (not from straight ahead nor from the sides). A listener moving forward/back by a few inches about the "sweet spot" can collapse the soundstage depth. Directional, very-high-frequency tones at the limit of hearing can only be heard the same way, i.e. there is a correlation. Speakers with poor HF extension (or poor placement) don't image well. Further experiment: observe soundstage depth perception while altering HF extension.

Well known phenomenon:
https://www.acousticfields.com/critical-distance-explained/#:~:text=Critical distance is the distance,and intelligibility of the sound.

What Griesinger studied is not totally different.

The soundstage collapses mainly because early reflections mess things up, even more so from 1 kHz up:

http://downloads.bbc.co.uk/rd/pubs/reports/1995-04.pdf

I think your point of view on HF needs some verification to validate your hypothesis.
High frequencies can have different effects on different people: we don't all hear/feel the same things. I've witnessed such a test regarding supertweeters, and the results were surprising.

When you alter HF content you de facto alter the way our brain integrates distance (depth): air absorbs high frequencies over distance, and our brains have internalized this for a long time.


(2) At very high frequencies the wavelengths become smaller than the L/R distance-to-speaker variance, even within the sweet spot, so L/R phase correlation is not (cannot be) critical for imaging, as is well known.

Yes, this is how our brain works. Our auditory system switches from time differences (delta-time) to level differences (delta-level) to interpret stereo signals.
This happens because, at higher frequencies, our head becomes a significant object which 'shadows' the sound source. Of course it's not on/off, so there is a grey area where both principles are used together. The transition occurs circa 1 kHz, with a grey zone spanning roughly an octave around it (750 Hz to 1.5 kHz).


Evolution-wise, having ears that face opposite directions to detect predator/prey (and gauge nearness) further suggests L/R decoupling, i.e. the ears can function independently. Loudness in general is a major component of distance perception, but so is the tonality change due to the attenuation of (more directional) HF with distance; this requires only one ear to hear, though having both ears co-directional can help (e.g. sensitivity). Further experiment: cut out one channel while listening from the sweet spot.

I get neither what you tried to explain, nor what your experiment will show or try to validate. Could you rephrase it, word it differently please?
(3) I have built a 2-way (using the Wavecor 045 ceramic-annulus/soft-dustcap tweeter) that had a deep soundstage when the tweeter was offset (time- and XO-phase both aligned); when the tweeter was wired in reverse polarity and moved to the baffle (XO-phase still aligned but the acoustic centers no longer so; nothing else changed), the deep soundstage was gone. Further experiment: start with time- and XO-phase both aligned, then move the tweeter offset by one XO wavelength.

These further experiments can help to test the theory. Additional tests, anyone?

I can suggest some tools and tests to perform.
Do you own a computer (laptop, tower, ...) running Windows (even an old one; XP is the limit)?
 
I'm attempting a couple of video-clip uploads to China's YouTube (bilibili). Please let me know whether and how well they play.
https://www.bilibili.com/video/BV12JtoeeEgg/ (short Extreme violin, from CD-rip, "reflector-coaxial" distance ~1m) "Reminiscing"
https://www.bilibili.com/video/BV1KptoeYETq/ (longer showpiece, from CD-rip, "reflector-coaxial" distance ~1m) "Sunshine..."
Taken with an inexpensive smartphone, so please EQ/rebalance if you want to enjoy the music. For test evaluation only. (Bass falls off gradually below 230 Hz due to the well-damped nested washbasins.)

https://www.bilibili.com/video/BV1ay4y1q7i3/ (someone else's video upload of the same violinist -- China's first to win gold at the 1987 Paganini International Competition, at 18)
Sunshine on Taxkorgan / Sunshine over Tashkurgan (1976, Chen Gang, composer; Lü Siqing, violin; Zheng Hui, piano)
 
Thank you @tmuikku @krivium @mayhem13 for replying. I'll very briefly try to clarify and also bring in (or bring back) additional observations of a more subtle nature. Much of msg#1-3 is "common sense" or "well-known facts", but (I hope) cast in a new light once connected. And common sense may be wrong; the facts may be not that well known, maybe not even factual, or not the whole story.

(4) My ultimate L/R comp method https://www.diyaudio.com/community/threads/replacing-crossover-capacitors.392208/post-7177613
"Make sure listening environment is symmetrical i.e. does not favor one side. L and R speakers XO'ed differently. Play the same musical segment twice in succession, "instantly" swap L/R input to the (pre)amp, not speakerwire. If one half of the soundstage presentation is preferred both times, that side (speaker XO) wins."
=> A possible (subtle) implication is that L and R auditory depth-perception (sound-field recognition) can function separately or at least not in unison.
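
For those without a convenient way to swap at the (pre)amp, a file-based stand-in is a minimal sketch like this (segment.wav is a hypothetical test file); render a channel-swapped copy and alternate playback:

```python
# Render an instantly L/R-swapped copy of a stereo test segment.
import soundfile as sf

data, fs = sf.read("segment.wav")          # hypothetical stereo segment
sf.write("segment_swapped.wav", data[:, [1, 0]], fs)
```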

(5) Stereo-depth headphone https://www.diyaudio.com/community/threads/open-wing-headphone-crossfeed-stereo-sound.391630/
"I was able to achieve stereo sound, depth perception with singer/musicians/stage in front of me, albeit smaller and nearer than the presentation through stereo loudspeakers. The music separates/delineates more easily and sounds more natural. This effect requires: (1) twisting the headphone pads "OPEN-WING" so sound comes from FRONT-LEFT/RIGHT (and very slightly above if possible), not shooting straight into ear canal; headphones with 2-degrees-of-freedom work best (2) EQ DOWN both trebble and bass for a realistic sense of distance, but compensate for bass-loss due to pads not being sealed (try hand-cupping ears or draping flaps over the gap as in picture) (3) time-delayed crossfeed to opposite channel (I conjecture that additional lower bass delay would be better)"
=> Wrong loudness of the treble or bass can break distance perception; treble >2.5khz adjusted properly can aid distance perception. (EQ curve based on my >100 front-row live music integrated listening impressions.)
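
The time-delayed crossfeed reduces to a simple skeleton in DSP; this sketch uses assumed starting values (0.3 ms, -6 dB), not the thread's actual settings:

```python
# Minimal time-delayed crossfeed: attenuated, delayed copy of each
# channel added to the opposite channel.
import numpy as np
import soundfile as sf

data, fs = sf.read("music.wav")            # hypothetical stereo source
d = int(0.0003 * fs)                       # ~0.3 ms cross-delay (assumed)
g = 0.5                                    # crossfeed gain, ~-6 dB (assumed)
L, R = data[:, 0], data[:, 1]
out = np.zeros_like(data)
out[d:, 0] = g * R[: len(R) - d]           # delayed right into left
out[d:, 1] = g * L[: len(L) - d]           # delayed left into right
out[:, 0] += L
out[:, 1] += R
sf.write("crossfed.wav", out / np.max(np.abs(out)), fs)
```

(The conjectured extra lower-bass delay would add a band-split before the cross-path; omitted here for brevity.)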

The above fill in msg#2 point (2). A prey animal's ears listen in different directions at the same time (ours can too). The proposed "further experiment": listen in the sweet spot and note the perceived soundstage depth, then cut out one channel (by remote) without moving or turning one's head, and note by how much the soundstage depth changes. (Of course the horizontal span will have collapsed toward the still-active speaker.) If depth does not collapse (fully or nearly so), even after a momentary memory effect passes, that would lend support to msg#1 Claim 1.

Certainly our psycho-aural and psycho-visual organs/pathways/mechanisms are different (in fact the visual pathways partly cross, at the "optic chiasm"). But they both serve the same survival goal in the same physical environment, so one would suspect parallels such as I wrote in "ragged coaxials":

(6) One-eye looking at a picture has depth https://www.diyaudio.com/community/...ials-with-ragged-response.408887/post-7789761
"With both eyes open and no parallax the brain says "flat" no second thought. With just one eye open the brain amplifies the information and reconstructs the scene in real-time, to a perceived depth of ~65-70% that of a true stereo-pair of images seen through a 3D-viewer."
=> Only 1/3 of visual depth is dependent on parallax (and AI has already been trained to infer a 3D model from a single 2D image). Auditory depth perception should rely even less on combining L/R, given the ears' position and direction (and all the rest).

I have been a music fan my whole life and (sorry) consider myself an "audiophile purist". Until recently I had considered L/R stereophonic sound to be absolutely necessary for a soundstage to have depth. A quick Bing search for "monophonic depth" gives as its first hit https://audiomav.com/mono-vs-stereo-and-when-to-use-them/ : "There is no sound perspective when using monophonic sound. All of the sound heard from the track are at the same volume level. Since the volume levels are the same within the track, there isn't any audio-based depth perception." A well-known fact!

In 2022 I got hooked on vintage loudspeaker drivers on China's eBay equivalent, the Idle Fish market, and asked innocently, "suppose you had an awesome speaker but only one?"
https://www.diyaudio.com/community/threads/suppose-you-had-an-awesome-speaker-but-only-one.392618/
which led to dozens of DIY experiments, mono and stereo, many up-firing or LX (reported in the Full Range Photo Gallery beginning in late 2023), including this effort:

(7) Wall-bounce stereo console dipoleL-R omniL+R https://www.diyaudio.com/community/...from-a-single-loudspeaker.200040/post-7681161 related https://www.diyaudio.com/community/...ntom-center-image-problem.393540/post-7688820
"First turn up the omni pair to get decent bass (but not quite enough) and the floating image, then turn up the dipole pair to spread out the image; finally tweak the overall volume so the floating 3D sound stage distance is consistent with the loudness"
=> Observed/effected the de-coupling of soundstage depth (monophonic) and width (stereophonic), and the importance of adjusting proper loudness.

A DSP fiend (we have many) can probably test out Claims 1 & 2, particularly time-aligned vs. not, with everything else held as equal as possible. (And I think many already have.) Another "well-known fact": imaging cannot be measured. Maybe only the L/R "stereophonic" part cannot be measured.
 
Hi,

I'm trying to capture the essence of your thread. It seems to me you are mainly writing about depth perception, what affects it, and what you've found with listening tests, right? But holographic means 3D to me, not just depth; are you also assuming that when depth is perceived the full 3D is complete, e.g. that depth perception is the hardest part to achieve?

Yeah, your reasoning, observations, and experiences seem logical. I can deduce what sound sources at different distances would sound like just by reasoning, but perhaps there is more that I cannot reason out. Depth is perceived from things like reverb, attenuation of highs, overall loudness, and perhaps something else, like frequency-dependent reverb. It doesn't matter too much yet, as we can assume that when all the cues are present there is no reason depth would not be perceived, right? Conversely, when some of the audible things that indicate depth give mixed cues, the brain is not fooled and there is no depth perception, right? So what we need to do is list some of these things and try to prevent mixed depth cues.

So a playback system* should be able to reproduce sounds that have all the cues that the closest and the farthest sound source would have, and everything in between should fall into place, right? Perception of depth is then determined by how close and how far a sound we can imitate with the playback system. Various aspects of the playback system support either or both, and could ruin either or both, so depth perception would be reduced either at the close end or at the far end. So what I'm thinking is: all we need to work out is what extremely close sounds would require from our playback system in order to be realistic (no mixed cues), and what extremely far sounds would need. Either could be optimized relatively separately, but maximum depth happens when both extremes work, right?

What are the properties of a sound source at close distance, then? Here are a few:
  • quite a long delay before early reflections arrive after the direct sound, and reverb in general at a low level compared to the direct sound
  • is quite loud compared to other sounds (in the recording), including the reverb
  • has high dynamics, because we could hear very low-level sounds from someone whispering in our ear, and it would also get very loud if they yelled into it
  • all the highs need to be there, IOW a natural balance between highs and lows
  • what else?

The farther the source is:
  • the dynamic range gets lower, because the lowest sounds (like a whisper) would not be audible
  • highs attenuate in the air, so the frequency balance changes somewhat
  • early reflections are louder and arrive with shorter delay relative to the direct sound; reverb in general is louder compared to the direct sound
  • the overall loudness, dynamics, and reverb need to be such in the recording that they are realistic relative to each other and to the close sounds, i.e. realistic distance information in the recording
  • what else?

So the goal is to fulfil the above properties with our playback system (a rough rendering sketch follows this list):
  • high dynamic-range capability: loud sounds need to be as loud as they were in order not to reduce how close they seem, and low-level sounds need to stay audible as well, so a low-noise system at a high SPL setting should give the most distance hints as far as SPL goes.
  • the properties of reverberation in the local room need to be in check so that the cues in the recording come through. Early reflections are the loudest and would ruin the "close" sound whenever early reflections come later in the recording than in your local room. A maximally dry recording, low-SPL local early reflections, and high SPL capability should be able to bring sounds right into your face. As early reflections get reduced, either by toe-in or positioning, the sound should come closer to you, as more of the close-sound cues in the recording become audible (over your own room's early reflections).
  • the overall volume of the system needs to be high enough to make the low-level far-away stuff audible (so that there is a hint of stuff far away). At too low a listening level, the relatively quiet far-away sounds fade out of audibility well before the singer who's been mixed up front.
  • a relatively dry listening room emphasizes sounds close to the listener and doesn't help at all with the far sounds, so it enables very close sounds and does nothing for the far sounds (in the recording). A very reverberant room biases all sounds farther out, so it helps with far-distance depth but disables the close distances.
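
Tying the list to something concrete, here is a rough sketch that renders a dry sample at increasing simulated distances by combining the cues above: level drop, HF rolloff, and a growing reverb-to-direct ratio. The exponential-noise "room" and all constants are crude assumptions:

```python
# Push a dry mono sample "further away" with level, HF, and reverb cues.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt, fftconvolve

dry, fs = sf.read("dry_voice.wav")         # hypothetical dry mono sample
ir = np.random.randn(fs) * np.exp(-np.arange(fs) / (0.4 * fs))  # fake room
wet = fftconvolve(dry, ir)[: len(dry)]
for d, fc in ((1, 16000), (4, 10000), (12, 6000)):  # distance m, LPF Hz
    sos = butter(4, fc, btype="low", fs=fs, output="sos")
    direct = sosfiltfilt(sos, dry) / d     # 1/r level law, duller with d
    mix = direct + 0.05 * d * wet          # reverb ratio grows with d
    sf.write(f"distance_{d}m.wav", mix / np.max(np.abs(mix)), fs)
```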

In general the room affects frequency balance because of system directivity: highs attenuate much faster than lows the farther away one listens, simply because lows are more omnidirectional than highs; furnishings also attenuate the (room) highs more, and so on. So it's a delicate question at which distance (and positioning overall) any speaker system produces a realistic frequency balance and low enough early reflections, right?

And stuff like this, just by trying to reason with what we know, or think we know. There is likely much more than what I wrote, for example what reverberation does to sound and how the auditory system processes it (it mixes phase and reduces the signal-to-noise ratio). I think this is also why the phase and delay behavior of the loudspeakers matters: it mixes up the original sound's harmonics, reducing SNR like loud reflections would, which would prevent the close sound from happening, bias everything farther back in depth, and give cues that conflict with SPL and the other depth aspects of various sounds in the recording. Perhaps any one of these playback-system attributes is enough to give a mixed cue and flatten the sound, preventing the holographic from happening.

My coffee break is over, so perhaps I'll continue later with the lists 😀 Is this the kind of thing you've been writing about?

*) The playback system includes all the electronics, the speakers, the room, and the positioning of speakers and listener.
 
Another cup of coffee 😀
Within the above hides lots of stuff, like why ~constant-directivity speakers would give better depth perception:
Any smooth-directivity speaker whose DI increases with frequency can be toed in so that power response and direct sound are relatively similar, IOW constant directivity at a particular angle. But if one now moves and changes listening distance, the angle changes, that constant directivity is lost, and the low/high balance changes as one moves; this makes for mixed depth cues and likely flattens the sound everywhere in the room except around the sweet spot. So only a constant-directivity speaker allows the SPL depth cue to work at multiple listening distances, IOW maintains frequency balance no matter what the listening distance is. Well, stereo speakers with a phantom image would likely collapse the depth anyway when deviating from the center line.

I assume directivity should extend to very low frequencies, including room treatment / multi-sub, so that the lows (room modal region) also stay in balance with the highs to maintain depth perception. If the recording contains only sounds recorded in rooms, there is a low-frequency mismatch similar to our own room acoustics regarding the depth cue (lows stay at about the same SPL regardless of distance within the same room), but if we listen to sounds that are supposed to give depth perception of an outdoor environment, then the system should be constant-directivity down to very low frequencies.

ps. one could fabricate many listening tests for any of the things in my previous post: listening tests for how to push the deepest end of the sound field deeper, and for how to affect how close the sound comes. Just with toe-in, positioning of yourself, and how everything sits in the room. With one or two speakers. Manipulating sound with DSP. Even building physically adjustable prototypes, if one is interested in going that far.

There are lots of hidden nuggets to reason with: some people just don't like in-your-face sound, so then just increase your early reflections and that's it, the sound never comes too close. Want to get the sound far back? Then perhaps increase the room sound, using a dipole for example, and to make sure it never comes into your face don't make it too big, so that its dynamics and SPL capability are limited. In theory a constant-directivity big dipole system should give the most depth, as early reflections can be reduced (closest depth preserved) and late reverberation increased (farthest depth enhanced), but the sound could also be dynamic, so somewhat in-your-face, I think. Conversely, front-radiating speakers will never give the same depth as a dipole, as the dipole enhances the depth with reverberation; there are extra far-depth cues in comparison. Lots of stuff, at least in theory, and optimization would need to be done according to what one likes: don't like close sounds? Optimize differently than for those who like them 🙂 Listening tests would help tell whether these are true or not. Perhaps all we need is a good balance between close and far depth, no need to go to extremes.
 
The inherent challenge is that most of what this thread is about is already baked into the recording... how the engineer placed tracks in the soundstage, as well as the use of microphones and time-domain post-processing. We as end users really have no idea what that specific reference is... so we often struggle to get out of a recording what we want it to sound like. No single stereo pair of speakers will ever be able to do that for all recordings, and the fact is that if you're fortunate enough to own a few premium stereo pairs and a large enough soundstage to place them all at home, you're likely as close as one could ever hope to be. And I'll be completely honest: one of those stereo pairs should be a pair of Maggies, because more often than not there are recordings where, regardless of the Maggies' native tonal imbalance, the spatial imaging is just too good to be outshined.

Constant-DI speakers crossfired in front of the listening position are another great trick... if the recording has nice mid-side processing, the image can float in space. Lots of recording-specific cons too... those who practice this trick know what I'm talking about.

Closest I've come to the 'one ring to rule them all'? ... still the physically and acoustically small two-way placed in the 3D space and supported by a multi-sub system... nothing sounds outstanding but everything sounds great... a fair trade-off IMO. If one does have a pair of Maggies close by... those can on occasion produce amazing listening nirvana.
 
I would guess good soundstage depth has been around much, much longer than the attention paid to "constant directivity" and "room-effect management", especially via digital simulation software and DSP hardware. A small but good $15 full-range driver* can have excellent imaging, much better than most multi-way speakers, if the direct, very-high-frequency sound can be heard, i.e. aimed axially at an ear obliquely. When the listener's head moves a few inches and the soundstage depth collapses, surely the room reflections haven't changed much, if at all. As a first-order approximation I think soundstage depth is a direct-sound effect. Reflections can help or hinder imaging; but that they are responsible for, or indispensable to, soundstage depth is a "good" hypothesis that can be tested quantitatively in a large room (not to mention an anechoic chamber). The "audio-lensing" mentioned above may be correlated with it.

But my two Claims in msg#1 are more about ("theory") the mono vs. stereo effect (each ear's independent depth perception vs. L/R-together parallax depth perception) and ("practice") time-aligned arrival of very high frequencies being necessary (along with phase alignment etc.). What I'm not sure of is whether said condition is indispensable per se, or whether it enables the very-high-fidelity transient response which is indispensable. (This can probably be tested with the choice of music/sound; a DSP sketch follows below.)

* For example, a 4" whizzer full-range (a brand called AIRS) effortlessly projected the violin soloist front and high, the piano back and low.
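
As a sketch of the DSP test hinted at above (the 2.5 kHz split and the delay values are assumptions; the zero-phase split is a stand-in for a real crossover): delay only the HF band of an otherwise coherent mono track and listen for when the depth collapses:

```python
# Misalign only the HF band of a coherent mono source, by known amounts.
import numpy as np
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

data, fs = sf.read("coherent_mono.wav")    # hypothetical mono source
sos = butter(4, 2500, btype="low", fs=fs, output="sos")
lo = sosfiltfilt(sos, data)                # zero-phase low band
hi = data - lo                             # complementary high band
for delay_us in (0, 100, 400):             # trial misalignments
    d = int(round(delay_us * 1e-6 * fs))
    shifted = np.concatenate([np.zeros(d), hi[: len(hi) - d]])
    sf.write(f"misaligned_{delay_us}us.wav", lo + shifted, fs)
```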
 
My contribution:
I recommend, based on some arguments:

Speakers have an image size. Different speakers have different image sizes.
The majority of stereo signals are found on both channels, i.e. "mono".
If speakers are placed too far apart, the sound image is torn apart.

To find a sound that is homogeneous in depth, width, height, and tonality: I recommend placing your speakers about 50 cm apart (center to center) and listening for a while. Then at some point place them 60 cm apart. Then 70 cm. And so on, until the sound image tears apart... Don't forget to then move the speakers back until they play together again ;-)
With compact speakers this distance is far less than one meter; with really large floorstanding speakers, very rarely more than 1.5 meters.
Toe-in: for maximum homogeneity, often such that the outer side walls of the boxes remain visible.
Distance to the rear wall: set it where the bass and fundamentals engage.

And: sources and amplifiers (and individual parts such as transistors or caps) also have different imaging sizes and shapes. The initially adjusted loudspeaker placement must very often be corrected, even if only by 2-3 cm, when replacing equipment.

...and: no DSP; all your equipment, all its parts, the whole system is the "DSP" ;-)
 
Regarding the bilibili clips above: I found that on my phone browser they kept trying to launch the bilibili app until the third attempt; then they would play from the browser after a final confirmation.

The short Extreme violin-as-tin-whistle piece was recorded from about 1 ft, not 1 m. Due to the wide dispersion of the very high frequencies bouncing off the convex dustcap, no axial-placement calibration was done or felt necessary; the phone was simply handheld for the duration (shakily).

The longer piece was recorded from about 1 m, standing over the speaker. There are interesting comparisons between the performances (vs. the other video) and between the sounds. "Audio-lensing" with my Vivo iqoo Z7 phone, projecting toward a wall/corner or (in stereo landscape orientation) toward both sides, was stronger and clearer with my own recordings.
 
p.s. bilibili compression made HF kind of harsh; smartphone recording lost a lot of LF.

p.p.s. The open-wing headphone thread linked to my old bilibili audio-only clips, including demos of Lowther PM2A/Fidelio, PM6A-nude, and Naturelle/Eve-honeycomb/TLonken, as well as crossfeed-EQ test files vs. the music source.
 
Hi,
yeah, the idea is to let the sound on the recording come through. A recording has spatial information on it, be it just dry mono speech without much at all, or an orchestra in a hall, and the idea is to preserve that in playback. There is no need to know what that information is, as we are not replicating the recording situation but setting up a playback situation that works for all recordings in this respect. This ideal is not always possible, and then one does whatever seems fine given the circumstances.

If the spatial cues were only those I listed yesterday, it would be possible to qualify the system regardless of the recording just by making sure the system is technically in check: enough dynamics, phase preserved, early reflections in check, and so on; these could simply be measured and deemed fine or lacking. But fear not, it is relatively easy to listen for this stuff as well, so both methods can be used to evaluate whether the cues come through sufficiently, and which result the listener likes more.

And it's not even a question of which one to choose, because I'd speculate both can be accommodated simultaneously, and switched between quickly, if one is willing to have two listening spots! 🙂 By both I mean the situations where the spatial information of the recording comes through better or worse. It's easy to ruin the sound by increasing listening distance and letting early reflections overwhelm the spatial cues. This of course assumes the system is good enough that the good sound exists at some short listening distance.
 
For sure, this is the hi-fi, I think: good stereo spatial sound, the magic that hi-fi has been about for decades. Or in mono, why not.

I also think the magic is in the direct sound and not in the local room reflections. Local room reflections could prevent it, though; same with bad phase and delay behavior. Same with a too-small speaker compressing, bad positioning, or too much noise in the system. Or not listening exactly equidistant to both speakers, if stereo. I bet your good mono system would also lose it if you dragged it closer to a corner, where the frequency balance changes and early reflections increase in amplitude and shorten in delay relative to the direct sound. Any and all of these could ruin it, so we want them all aligned, or fine enough that the spatial cues in the direct sound survive through the room to our ears, and most importantly through our auditory system, passing as the real deal, so that a "realistic" perception happens.

By realistic I mean that if the spatial cues on the recording are mixed up by local room effects or deficits in the system, as touched on earlier, the brain gets mixed cues and just ignores some of them: it provides perception with the most probable sound, which is likely a flat sound localized to what your eyes see.

Basically the philosophy is that a system cannot be better than what is on the recording; it can only subtract from it. It's how you look at things: the magic is not made in the system, but let through! And it needs to be let through the auditory system as well, all the way to conscious perception.

A single full-range-driver speaker is good for this stuff. It doesn't suit all applications, though, as it has limited SPL capability and typically issues with coverage/directivity.

ps. someone on YouTube used this kind of analogy for sound reproduction: "to see clearly through multiple consecutive windows, all of those windows need to be clean. If any of the windows is dirty, the scenery behind gets dirty."
 
Wow, thanks a million @tmuikku. I'm going to have to join ASR to contact j_j. I haven't had time to read the thread yet, but I downloaded the 1934 Symposium on Auditory Perspective and read 2+ articles, starting with Fletcher.
How does the saying go? First the experts all say the idea is wrong. Then they say it is right but not important. Finally they say it is both right and important, but they had known it all along!
So they have known it for ~0.9 of a century... Sadly, neither Klipsch in '64 nor the subsequent reprinter provided comments on progress after '34. So I'm going to take a shot -- very non-expertly, regardless of time period.

(1) Subjective observers were asked to localize apparent sound sources (speech, not music) behind a curtain. My take-away (Fletcher's too, Fig. 1) is that 2-channel (2 mics, 2 speakers) collapsed soundstage depth to a line near the back; the 2-channel L/R-ear angular-sensitivity differential explained lateral directional localization (their curves matched my experience qualitatively but not quite quantitatively); but this mechanism in fact obfuscated depth perception (which I preliminarily confirmed yesterday: lowering one channel marginally increased the perceived depth of sound projected behind the speaker).

(2) Much of their effort was devoted to 3-channel; a center mic and/or speaker made the sound field more uniform, as expected (i.e. in line with my experiments on the symmetric Hafler differential and the single stereo console etc., following others), with pros and cons such as increased depth but narrowed stage width.

(3) Depth perception (again, of speech) largely came from a combination of loudness and the direct/indirect ratio, which could be simulated using reverb. This matched my headphone stereo-sound conclusions (as @Pano had found too).

(4) The beginning (intro) mentioned the importance of frequency response, phase coherence, and sound quality (which I took to mean transient and decay fidelity). The upper half of their sketchily described loudspeaker (for distance-simulcast replaying of the Philadelphia Orchestra, with Stokowski at the EQ-monitoring controller) was designed to be a 120-degree-horizontal equal-loudness (near-)point-source multi-path horn (WOW!).

(5) Unfortunately, the Fig. 1 subjective localization comparison did not specifically include mono (center channel), though Fletcher cited a '31 paper on motion-picture sound.

(6) It would be easy to criticize many aspects of the 0.9-century-old report (many occurrences of "it is obvious that...") and just as easy to romanticize it. I think we can do a more rigorous and quantitative refinement of their work -- surely every generation has?

https://www.aes.org/aeshc/docs/bell.labs/auditoryperspective.pdf
 