Bessel vs Critically Damped Enclosure

Our HAS is built around and relies upon the time domain and is astonishingly accurate (microsecond scale) and is ultra sensitive to errors in time... This is why our HAS instantly detects and rejects any and all sounds as "fake" when they are reproduced with time domain errors in the millisecond scale...
I hope I am explaining this well enough to encourage others to investigate this further.

You have explained it well enough to remind me of an "investigation" that disproves your assertion that "our HAS instantly detects and rejects any and all sounds as 'fake' when they are reproduced with time domain errors in the millisecond scale..."

I provided a humorous (to me...) example of an "invisible speaker" one Halloween, using a very narrow-dispersion horn (only 13 degrees from about 2000 Hz up) located about 15 meters from the sidewalk the kids walked along as they went trick-or-treating.

As the kids came within range of the horn, which covered around 2 meters of the sidewalk, I'd whisper or make cat meowing noises into the microphone/mixer/amp driving the horn. I was hidden in a dark garage where I could see them and hear them over headphones.

The close-miked sound would make the kids stop and look in every direction for the "ghost" whisper or non-existent cat, which sounded like it was within inches of them, even though the actual source of the sound was up a hill 15 meters away.

By initially adding reverb to my voice and then reducing the reverb-to-dry ratio, I could also create the illusion that the ghost voice was coming from a distance and then getting right next to the "target".
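The reverb-to-dry trick plays on a well-known distance cue: the more reverberant energy there is relative to the direct sound, the farther away a source tends to seem. Below is a minimal Python/NumPy sketch of the idea, assuming a toy artificial reverb (exponentially decaying noise) rather than whatever effects unit was actually used; `toy_reverb_ir`, `approaching_voice` and the default dB values are hypothetical choices for illustration.

```python
import numpy as np


def toy_reverb_ir(fs, rt60=1.2, seed=0):
    """Hypothetical stand-in reverb: Gaussian noise under an exponential
    envelope that decays by 60 dB over rt60 seconds, scaled to unit energy."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(rt60 * fs)) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude reaches -60 dB at t = rt60
    ir = rng.standard_normal(t.size) * envelope
    return ir / np.sqrt(np.sum(ir ** 2))


def approaching_voice(dry, fs, start_db=6.0, end_db=-20.0):
    """Mix dry + wet while ramping the wet gain linearly in dB from start_db
    (source seems distant) to end_db (source seems close).
    Returns (mix, wet_gain)."""
    wet = np.convolve(dry, toy_reverb_ir(fs))[: dry.size]  # trim reverb tail
    wet_gain = 10.0 ** (np.linspace(start_db, end_db, dry.size) / 20.0)
    return dry + wet_gain * wet, wet_gain
```

For example, `mix, gain = approaching_voice(voice, 8000)` on a one-second mono NumPy array sweeps the reverb send from +6 dB to -20 dB. Because the toy impulse response is normalised to unit energy, that is only roughly the reverb-to-dry ratio, but the voice should still seem to move from far away to right beside the listener.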

The "invisible speaker" was so convincing that many of the kids (and a few of their parents) took off running. Some went in the direction they were headed, others turned around.

I think I collected more spilled candy than we gave out; very few kids walked up the 20 steps to the door where my (ex-)wife was dispensing candy. Usually the only ones making it up the stairs were kids wearing full-head masks that made it hard to hear anything but their own breathing.

The real trick was using HF sounds (whispering), as the Maltese horn is so directional up high that people only 10 feet outside the pattern would not hear what those in the pattern were freaking out over. One set of kids would scream and run off, then I could scare the next set.

Later, when the stoned teenagers came out, I had some real fun. I'd say something like "Hey, can I bum a smoke?", and they would look around: "Where are you, man?" "I'm right here, can't you see me?" A few hung around talking for a while before getting creeped out.

Probably the most fun I had with sound, and no one figured out where the "phantom" was all night.

In retrospect, I know the EV DH1AMT driver I used has several milliseconds of ringing in the upper range, yet everyone was convinced they were hearing something very real and present.

You might also then like to consider the capability of the human auditory system in a low-fidelity application, such as analogue telephony. Here we have (by modern standards) an exceptionally poor transmission channel and a single transducer pressed against a single ear: an audio system in which inter-aural delay, or the processing thereof, is completely irrelevant.

Yet with such low-fidelity reproduction, it is possible to recognise who you are talking to, and (apart from well-known prank impersonators) to do so with 100% reliability. Moreover, it is possible to discern information about the talker's emotions, or about the acoustic environment in which they are situated. Such capabilities often elude reliable linear analysis in spite of copious amounts of DSP power given to the task*.

To explain how we (or DSP) can achieve this efficiently when listening with a single ear, we often consider the human voice as exhibiting formants - that is, broad resonances due to our vocal tracts and their variation. Via bispectral means, we can discern how one formant is related to another (or not), and how those relationships vary; we can also discern how echoes relate to those formants, and how they vary if the talker's environment changes too.
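For readers unfamiliar with "bispectral means": the bispectrum B(k1, k2) = E[X(k1) X(k2) X*(k1+k2)] tends to average to zero unless the phase at k1+k2 is consistently tied to the phases at k1 and k2 (quadratic phase coupling), a relationship a power spectrum cannot reveal. The sketch below is a generic NumPy estimate of its normalised form, the squared bicoherence, on synthetic tones; it is not a model of formants or of the auditory system, and `three_tones`, the bin numbers and the segment count are illustrative assumptions.

```python
import numpy as np


def bicoherence(segments, k1, k2):
    """Squared bicoherence at FFT bins (k1, k2), averaged over the rows of
    `segments`: |E[X1 X2 X3*]|^2 / (E[|X1 X2|^2] E[|X3|^2]), X3 = X(k1+k2)."""
    X = np.fft.rfft(segments, axis=1)
    x1, x2, x3 = X[:, k1], X[:, k2], X[:, k1 + k2]
    num = np.abs(np.mean(x1 * x2 * np.conj(x3))) ** 2
    den = np.mean(np.abs(x1 * x2) ** 2) * np.mean(np.abs(x3) ** 2)
    return num / den


def three_tones(n_seg, n, k1, k2, coupled, seed=0):
    """Segments holding tones at bins k1, k2 and k1+k2 with random phases.
    If `coupled`, the third phase is the sum of the first two (quadratic
    phase coupling); otherwise it is drawn independently."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    segs = np.empty((n_seg, n))
    for i in range(n_seg):
        p1, p2, p_free = rng.uniform(0.0, 2.0 * np.pi, 3)
        p3 = p1 + p2 if coupled else p_free
        segs[i] = (np.cos(2 * np.pi * k1 * t / n + p1)
                   + np.cos(2 * np.pi * k2 * t / n + p2)
                   + np.cos(2 * np.pi * (k1 + k2) * t / n + p3))
    return segs
```

With `n = 256`, bins 10 and 17 and 400 segments, the coupled set should give a bicoherence very close to 1 and the independent set a value on the order of 1/400, even though every segment in both sets has exactly the same magnitude spectrum.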

Thus we can resolve information with 100% accuracy even where the time domain behaviour is dominated by the frequency limitations of the telephone, and without any contribution from the other ear. Any assertion that the limit of our auditory system and its remarkable capabilities is defined exclusively by the linear time domain, or by the detection of inter-aural time differences, is thus fundamentally flawed.

This does not deny what is possible using two ears instead: localising a voice and its emotional content at one of the crowded, noisy cocktail parties I regularly get invited to is a good example of how the cross-bispectrum can aid our hearing, even where those you do not want to hear are speaking with similar frequency content. Similarly, we can mask detrimental room acoustics that would appear to obliterate information in the linear time (or frequency) domain.
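The cross-bispectrum extends the same idea to two signals, here taken as left- and right-ear channels, using one common convention: B_LR(k1, k2) = E[L(k1) L(k2) R*(k1+k2)]. In this hypothetical NumPy sketch, a phase-coupled "talker" reaches the right ear d samples later (modelled crudely as a circular delay), so the cross-bicoherence stays near 1 and the phase of B_LR picks up a term 2π(k1+k2)d/n from the delay, while an uncoupled source with the same magnitude spectrum averages away. The function names, delay and bin choices are assumptions for illustration only.

```python
import numpy as np


def cross_bispectrum(left, right, k1, k2):
    """Segment-averaged cross-bispectrum E[L(k1) L(k2) R*(k1+k2)] and the
    matching squared cross-bicoherence; rows of left/right are segments."""
    L = np.fft.rfft(left, axis=1)
    R = np.fft.rfft(right, axis=1)
    a, b, c = L[:, k1], L[:, k2], R[:, k1 + k2]
    B = np.mean(a * b * np.conj(c))
    coh = np.abs(B) ** 2 / (np.mean(np.abs(a * b) ** 2) * np.mean(np.abs(c) ** 2))
    return B, coh


def talker(n_seg, n, k1, k2, coupled, seed=0):
    """Hypothetical 'talker': tones at k1, k2, k1+k2 with random phases per
    segment; if `coupled`, the third phase is the sum of the first two."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    p = rng.uniform(0.0, 2.0 * np.pi, (n_seg, 3))
    if coupled:
        p[:, 2] = p[:, 0] + p[:, 1]
    # p[:, [j]] has shape (n_seg, 1), so each term broadcasts to (n_seg, n)
    return sum(np.cos(2 * np.pi * k * t / n + p[:, [j]])
               for j, k in enumerate((k1, k2, k1 + k2)))
```

A right-ear channel can be simulated with `np.roll(left, d, axis=1)`; the delay then shows up as the phase of `B` rather than in the magnitude spectra.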

Importantly, the bispectrum (and the cross-bispectrum, where appropriate) offers not only the means to recognise a talker and attach a cognitive 'label' to them; it also offers the opportunity to learn to tell the sonic signature of one loudspeaker defect from another, for example. We can then train ourselves to hear details in what another listener would discern only as noise, and which would not alarm them in the slightest.

The hysteretic non-linearity due to learning thus presents a significant barrier to audio analysis tools in high-fidelity applications. But to imply that the linear time domain behaviour contains some magical elixir of high fidelity is a nonsense, and to imply that inter-aural thresholds are part of our capabilities when viewed in this single domain simply has no foundation.

*This needs some clarification before anyone catches me out, since magnitude in frequency or complex time (i.e. the ETC, or energy-time curve) is derived from a non-linear squaring operation. "Phase blindness" comes to our rescue in linear auditory models, however, which is why I have persisted with the terminology.
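To illustrate the footnote's point about phase blindness: squaring to a magnitude spectrum discards phase, so two waveforms that differ only in the phase of one component look identical to it, whereas a third-order quantity such as the bispectrum still separates them. A minimal NumPy sketch, assuming pure tones on exact FFT bins (the bin numbers are arbitrary):

```python
import numpy as np

n, k1, k2 = 256, 10, 17
t = np.arange(n)


def tones(phase3):
    """Tones at bins k1, k2 and k1+k2; only the third component's phase varies."""
    return (np.cos(2 * np.pi * k1 * t / n) + np.cos(2 * np.pi * k2 * t / n)
            + np.cos(2 * np.pi * (k1 + k2) * t / n + phase3))


def bispectrum_point(x, k1, k2):
    """Single-segment bispectrum value X(k1) X(k2) X*(k1+k2)."""
    X = np.fft.rfft(x)
    return X[k1] * X[k2] * np.conj(X[k1 + k2])


x_a, x_b = tones(0.0), tones(np.pi / 2)
# Magnitude spectra match, waveforms do not
same_magnitude = np.allclose(np.abs(np.fft.rfft(x_a)), np.abs(np.fft.rfft(x_b)))
same_waveform = np.allclose(x_a, x_b)
# The bispectrum phase tracks the phase shift of the third component
phase_a = np.angle(bispectrum_point(x_a, k1, k2))
phase_b = np.angle(bispectrum_point(x_b, k1, k2))
print(same_magnitude, same_waveform, phase_a, phase_b)
```

Here `phase_a` comes out at (numerically) zero and `phase_b` at -π/2, while the two magnitude spectra are indistinguishable.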
I provided a humorous (to me...) example of an "invisible speaker"
That the brain is so easily fooled when deprived of its most powerfully processed sense - sight - is also easily demonstrated by screening a stereo pair of speakers behind an acoustically transparent curtain, whereupon most listeners will hear a subjectively better sound than when sighted. Golden Ears suddenly become made of cloth.
I have fooled experienced listeners in this way and still cannot fathom why sighted auditions of equipment even still exist, unless impressing one's neighbours is a significant factor...
That the brain is so easily fooled when deprived of its most powerfully processed sense - sight
That is true in most cases. I am aware of one genuine exception, however: Angus MacKenzie. He was blind but had developed extraordinary hearing capabilities, presumably as a compensatory mechanism.
I can remember one demonstration where people were asked to raise their hands if they could perceive a difference. Angus raised his hand after a couple of seconds, at which point half the audience followed in haste. Ego, I would suggest, is also a most significant factor.
Amazingly, Angus then highlighted a further issue that not even the demonstrator had known about. He was genuinely extraordinary.