"The phase coherence of harmonics in the vocal formant range, ~630Hz to 4000Hz"

However, the point is that the envelope (from which timbre and pitch are derived) can be correct only if the phases of the harmonic components are preserved through the audio reproduction chain, including loudspeakers and room.


- Elias
Elias,

A Bösendorfer model 290 is nine and one half feet long, about the same as the distance between my stereo speakers.
Even though sound is produced from the entire length of the sound board, the "timbre and pitch" of that piano are just fine when tuned.

Correct or not, when reproduced over my stereo speakers the "timbre and pitch" seem largely unchanged (close enough for jazz ;) ), even though the phases of the harmonic components may differ from those in the recording of the piano.

Art
 
......However, the point is that the envelope (from which timbre and pitch are derived) can be correct only if the phases of the harmonic components are preserved through the audio reproduction chain, including loudspeakers and room.


- Elias
The old audibility-of-phase debate rages on. From a purely pitch standpoint, if the proportions of all components are maintained, then it sounds the same. That then gets qualified: certain sounds under controlled conditions do change character when the phase relationships of the components are modified.

All naturally produced sounds in music have a t=0 where silence ends and sound begins. All spectral components start in phase at zero amplitude and typically rise to the peak of the attack within a short time. During the attack phase, the instrument and the manner of energy input form the spectral content and phase relationships that are the distinct defining characteristics of the instrument and the energizing technique.

The more closely that attack behavior follows the minimum-phase/group-delay behavior of a typical speaker, the more accurately the speaker reproduces that particular sound. A speaker with flat frequency response and flat phase response reproduces a wide variety of minimum-phase sources equally well.

The highest fidelity dictates temporal fidelity, which translates to phase fidelity in the frequency domain.
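Here is a minimal sketch of that point (my own, assuming Python with numpy and scipy; the tone burst and the 300 Hz all-pass corner are arbitrary choices): the all-pass leaves the magnitude spectrum of a three-harmonic burst essentially unchanged, yet the waveform, and with it the attack envelope, is clearly altered.

Code:
import numpy as np
from scipy.signal import lfilter

fs = 48000
t = np.arange(0, 0.2, 1 / fs)

# A tone burst: three harmonics starting in phase at t = 0,
# a 5 ms attack ramp and an exponential decay.
env = np.minimum(t / 0.005, 1.0) * np.exp(-t / 0.02)
x = env * (np.sin(2 * np.pi * 220 * t)
           + 0.5 * np.sin(2 * np.pi * 440 * t)
           + 0.25 * np.sin(2 * np.pi * 660 * t))

# First-order all-pass around 300 Hz: unity magnitude at every frequency,
# frequency-dependent phase shift.
f0 = 300.0
k = (np.tan(np.pi * f0 / fs) - 1) / (np.tan(np.pi * f0 / fs) + 1)
y = lfilter([k, 1.0], [1.0, k], x)

# Magnitude spectra match to within tiny end effects; the waveforms do not.
X, Y = np.fft.rfft(x), np.fft.rfft(y)
print("relative magnitude-spectrum difference:",
      np.max(np.abs(np.abs(X) - np.abs(Y))) / np.max(np.abs(X)))
print("relative waveform difference:          ",
      np.max(np.abs(x - y)) / np.max(np.abs(x)))

A frequency-response measurement calls the two versions identical; only the time-domain (phase) view tells them apart.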

No argument from me on this.:D

Regards,

Andrew
 
A valid question to ask: is the stereo triangle Hi-Fi?
Yes, very much so. It can be. Just because it isn't always done well doesn't mean it can't be done right. I've heard it done well, from small systems to very large ones. Stereo can be 3D.
As was said nicely earlier:
Good speakers, well placed, allow the mind to easily find and lock onto the phantom image streams with little concentration.
:checked:
 
I thought we were now talking about the basilar membrane filter banks. Those are of course real and have "fixed" center frequencies with regard to the inner hair cells. Which does not keep them from interacting ... :)

It is no longer clear to me what point you are trying to make.

Have you invented a new cochlear mechanics? :p

There are so many of those hair cells interacting with the basilar membrane that the center frequencies of the band-pass filters can be considered continuous.
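A rough back-of-the-envelope sketch of how dense that sampling is (mine; it assumes Python with numpy, the Glasberg & Moore ERB formulas, and the commonly quoted figure of about 3,500 inner hair cells):

Code:
import numpy as np

def erb_number(f_hz):
    # Glasberg & Moore (1990) ERB-rate scale, in Cams
    return 21.4 * np.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_bandwidth(f_hz):
    # equivalent rectangular bandwidth of the auditory filter at f_hz
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

n_ihc = 3500                                   # commonly quoted inner-hair-cell count (assumption)
cams = erb_number(20000.0) - erb_number(20.0)  # ERBs spanned by the hearing range
print(f"hearing range spans about {cams:.1f} ERBs")
print(f"that is roughly {n_ihc / cams:.0f} inner hair cells per auditory-filter bandwidth")

# Spacing between adjacent 'channels' versus the local filter bandwidth near 1 kHz:
f = 1000.0
step_hz = (cams / n_ihc) * erb_bandwidth(f)    # ERB-rate step converted to Hz at f
print(f"adjacent channel spacing near 1 kHz is about {step_hz:.1f} Hz, "
      f"versus a filter bandwidth of about {erb_bandwidth(f):.0f} Hz")

With on the order of a hundred hair cells per filter bandwidth, treating the set of center frequencies as continuous seems fair.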


Those envelopes are "correct" as long as they allow the detection of the corresponding vocal formant. How do you know to what extent the envelope curve has to be preserved before this detection starts to fail? Any links?

Rudolf

To quantify this phenomenon would be a valuable information resource. Unfortunately I don't know the answer yet.

I have found some very good information about modulation domain perception here:
Physiological Reviews
Journal of Neurophysiology
The Journal of Neuroscience


- Elias
 
In my opinion the biggest problem of 2-channel stereo is that it doesn't provide enough plausible dynamic cues caused by head movements (primarily head rotation). I believe these cues are the most important for delivering a realistic auditory event in the presence of the conflicting cues caused by the inherent problems of common stereo.

For example, omnipolar radiation will enable such cues at the expense of clarity and timbre. The question is whether a certain subset of acoustic parameters is enough to enable realism while maintaining clarity and timbre.

I'm planning to do more tests in that direction.

Be careful there. If you continue like this you may end up promoting flooders :D


I agree, the head shifting and turning really kills the stereo imaging. Many people may not be aware of it, but the head is in constant small subconscious movement; we are so used to it that it passes unnoticed. I think this movement partly demolishes the stereo phantom image at high frequencies.
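A tiny order-of-magnitude sketch of why that hits the high frequencies first (mine; it treats a lateral head shift as roughly the change in the interchannel path difference, which is about right for a +/-30 degree stereo triangle, and assumes Python):

Code:
c = 343.0        # speed of sound in m/s
shift_cm = 2.0   # a small, barely conscious head movement (assumed)

for f in (500.0, 2000.0, 5000.0, 10000.0):
    wavelength_cm = 100.0 * c / f
    phase_deg = 360.0 * shift_cm / wavelength_cm
    print(f"{f/1000:4.1f} kHz: wavelength {wavelength_cm:5.1f} cm -> "
          f"a {shift_cm:.0f} cm shift is about {phase_deg:4.0f} deg of phase")

A couple of centimeters is a negligible fraction of a wavelength at 500 Hz but a large fraction of a cycle above a few kHz, so the phase-based part of the phantom image is the first to fall apart.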


- Elias
 
then maybe you need to post a quote supporting that - or you are using an odd definition of coherence

I get a cross-correlation peak if I pass correlated signals through identical all-pass filters - no difference - they are still coherent - same value of correlation peak.

You were talking about coherence between channels.

Griesinger was talking about coherence of harmonic phases within a channel.
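A small numerical sketch of the two different things being called "coherence" here (mine, assuming Python with numpy and scipy; the harmonic frequencies and the 500 Hz all-pass corner are arbitrary): passing both channels through the same all-pass filter leaves the normalized cross-correlation peak between the channels untouched, while the phase relationships between the harmonics within each channel are changed.

Code:
import numpy as np
from scipy.signal import lfilter, correlate

rng = np.random.default_rng(0)
fs = 48000
t = np.arange(0, 0.5, 1 / fs)

# Two correlated channels: the same harmonic signal plus a little independent noise.
x = (np.sin(2 * np.pi * 200 * t)
     + 0.5 * np.sin(2 * np.pi * 400 * t)
     + 0.25 * np.sin(2 * np.pi * 600 * t))
left = x + 0.05 * rng.standard_normal(len(t))
right = x + 0.05 * rng.standard_normal(len(t))

def allpass(sig, f0=500.0):
    # first-order all-pass: unity magnitude, frequency-dependent phase
    k = (np.tan(np.pi * f0 / fs) - 1) / (np.tan(np.pi * f0 / fs) + 1)
    return lfilter([k, 1.0], [1.0, k], sig)

def peak_corr(a, b):
    # peak of the normalized cross-correlation between a and b
    a = a - a.mean()
    b = b - b.mean()
    cc = correlate(a, b, mode="full", method="fft")
    return cc.max() / np.sqrt(np.sum(a**2) * np.sum(b**2))

print("interchannel correlation peak, unfiltered:     ", round(peak_corr(left, right), 4))
print("interchannel correlation peak, both all-passed:", round(peak_corr(allpass(left), allpass(right)), 4))

# Within one channel, though, the harmonics' relative phases do change:
def bin_phase(sig, f):
    return np.angle(np.fft.rfft(sig)[int(round(f * len(sig) / fs))])

for f in (200, 400, 600):
    shift = np.degrees(bin_phase(allpass(x), f) - bin_phase(x, f))
    print(f"phase added to the {f} Hz component: {shift:7.1f} deg")

So both statements can be true at once: interchannel coherence survives identical all-pass filtering, while the within-channel harmonic phase structure does not.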
 
Elias,

A Bösendorfer model 290 is nine and one half feet long, about the same as the distance between my stereo speakers.
Even though sound is produced from the entire length of the sound board, the "timbre and pitch" of that piano are just fine when tuned.

Correct or not, when reproduced over my stereo speakers the "timbre and pitch" seem largely unchanged (close enough for jazz ;) ), even though the phases of the harmonic components may differ from those in the recording of the piano.

Art


Well, I'm sure it will still sound like a piano even if played through a mobile phone :D

I suspect the harmonic energy of a piano note may not be the best example for the perception of harmonic phases.

Griesinger was focusing on human speech and singing.


- Elias
 
That would be bad news, because if we need to rely on masking effects to increase the plausibility of spatial reproduction by overriding conflicting cues, then quality will most likely suffer a lot.
I like the idea of missing spatial cues that can be extracted from existing recordings and added in a yet-to-be-defined way much better :)

Who would not like the idea? But it does not work that way.

Typically there is some cue lacking for an intended image, but simultaneously there are one or more contradicting cues coming from the real sources (the speakers). If things have already evolved into this state, nothing much can be done to add a correct cue and at the same time make the contradicting cues disappear.

It looks like the only option to date is to override the contradicting cues with enough fuzziness for perception to neglect them and to focus on the remaining cues, which hopefully point to the intended image.


- Elias
 
I have found some very good information about modulation domain perception here:
Physiological Reviews
Journal of Neurophysiology
The Journal of Neuroscience
Very good tip, Elias. :)
I already got "Neural processing of amplitude modulated sounds" from the first link. It will take a while to digest. :rolleyes: Do you have any particular PDF you would recommend from those sources?
Typically there is some cue lacking for an intended image, but simultaneously there are one or more contradicting cues coming from the real sources (the speakers). If things have already evolved into this state, nothing much can be done to add a correct cue and at the same time make the contradicting cues disappear.
Without a better understanding of what the brain regards as a "correct" cue and what as a "contradicting" cue in the auditory stream, we are lost.
BTW: If I want to hear the tweeter, I hear the tweeter. If I want to follow the stream on the auditory scene, I hear distributed instruments playing on the stereo stage. Why isn't it as simple as that for everyone? It takes really bad recordings to lock my ears to the tweeters :eek:

Rudolf
 
Elias,

Your post helped educate me on why the early Altec theater speakers, which cross over from the woofers to a large midrange horn at around 600-700 Hz and run it up to 10 kHz, are so well reviewed and still loved. New 1.4" compression drivers and new horn profiles are popular today for this bandwidth. The data also helped explain why the poor polar response when a tweeter is added at 10 kHz can be an "acceptable sin" to many Altec big-horn lovers and even a few golden-ear critics. (Read the WIKI on vocal range below.)

SO...SO....
1) ARE HARMONICS KEY? Should we design speakers like the classic Altecs, with a constant-directivity horn covering 630 Hz to 10 kHz?

2) ARE FUNDAMENTALS KEY? Should we design speakers with a wide-bandwidth midbass that covers the 80-1,100 Hz fundamental vocal range?

3) SHOULD WE ACCEPT COMPROMISES TO COVER BOTH F & H? Should we perfect an 80 Hz-20 kHz Synergy Horn, even with subtle higher-order-mode (HOM) artifacts? Put 40" x 40" Synergy Horns in every living room?


WIKI:
Vocal range is the measure of the breadth of pitches that a human voice can phonate. The most common application of the term "vocal range" is within the context of singing, where it is used as one of the major defining characteristics for classifying singing voices into groups known as voice types.

The following are the general vocal ranges associated with each voice type using scientific pitch notation where middle C=C4. Some singers within these voice types may be able to sing somewhat higher or lower:

Soprano: C4 – C6
Mezzo-soprano: A3 – A5
Contralto: F3 – F5
Tenor: C3 – C5
Baritone: F2 – F4
Bass: E2 – E4

In terms of frequency, human voices are roughly in the range of 80 Hz to 1100 Hz (that is, E2 to C6) for normal male and female voices together.

Fundamental Speech frequency
The voiced speech of a typical adult male will have a fundamental frequency from 85 to 180 Hz, and that of a typical adult female from 165 to 255 Hz.

Telephony, and equal power:
Telephone transmission is optimized for 300-3000 Hz, since the small telephone receiver speaker can cover this range while also rejecting 60 Hz/120 Hz power noise, and the brain can rebuild a voice fundamental from the upper harmonics. Speakers with 300-3,000 Hz crossovers avoid midrange intermodulation (Doppler) distortion and power-dissipation problems, but they put a crossover in the vocal range that is often audible.
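To put numbers on the fundamentals-versus-harmonics question above, here is a quick sketch (mine, assuming Python; A4 = 440 Hz and the 630 Hz horn edge quoted above): it converts the scientific pitch names to frequencies, confirms the roughly 80-1100 Hz fundamental range, and shows that the lowest notes of every voice type reach a 630 Hz horn only through their harmonics.

Code:
import re

A4 = 440.0
NOTE_OFFSETS = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}

def note_to_hz(name):
    # scientific pitch notation (middle C = C4) to frequency, A4 = 440 Hz
    letter, octave = re.match(r"([A-G])(-?\d+)", name).groups()
    semitones = NOTE_OFFSETS[letter] + 12 * (int(octave) - 4)
    return A4 * 2 ** (semitones / 12)

ranges = {
    "Soprano":       ("C4", "C6"),
    "Mezzo-soprano": ("A3", "A5"),
    "Contralto":     ("F3", "F5"),
    "Tenor":         ("C3", "C5"),
    "Baritone":      ("F2", "F4"),
    "Bass":          ("E2", "E4"),
}

horn_lo = 630.0   # lower edge of the classic Altec-style horn band from the post

for voice, (lo, hi) in ranges.items():
    f_lo, f_hi = note_to_hz(lo), note_to_hz(hi)
    n = 1
    while n * f_lo < horn_lo:     # lowest harmonic of the lowest note inside the horn band
        n += 1
    print(f"{voice:14s} {f_lo:6.1f} - {f_hi:6.1f} Hz   "
          f"(lowest note enters the 630 Hz horn only at harmonic {n})")

print("overall fundamental range:", round(note_to_hz("E2"), 1), "to", round(note_to_hz("C6"), 1), "Hz")

So a 630 Hz-10 kHz horn lives almost entirely on harmonics, an 80-1,100 Hz midbass covers the fundamentals, and the 300-3,000 Hz telephone band splits the difference and leans on the brain to rebuild the missing fundamentals.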
 

Attachment: BigSynergyHorn.jpg
I see. What would be the limit in your opinion? Would, for example, 50 m² still qualify as "small"? The larger the room, the further you can place the speakers from the walls, delaying the reflections. Some people advocate a 5-6 ms difference, some more. I myself don't know. :eek:
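For what it is worth, a back-of-the-envelope sketch (mine; the listener position and the simple side-wall mirror-image geometry are assumptions, not a rule): a 5-6 ms gap between the direct sound and the first side-wall reflection corresponds to roughly 1.7-2 m of extra path, which in a plain mirror-image picture needs the speaker to stand well away from that wall.

Code:
import numpy as np

c = 343.0   # speed of sound, m/s

# Extra path length needed for a given reflection delay:
for dt_ms in (5.0, 6.0):
    print(f"{dt_ms:.0f} ms delay -> reflection path {c * dt_ms / 1000:.2f} m longer than the direct path")

# Simple side-wall mirror-image model (assumed geometry):
# the wall is the plane x = 0, the listener sits at (2.0, 0),
# the speaker stands d meters from the wall and 3 m ahead of the listener line.
xl, y_sp = 2.0, 3.0
for d in (0.5, 1.0, 1.5, 2.0):
    direct    = np.hypot(xl - d, y_sp)
    reflected = np.hypot(xl + d, y_sp)   # path via the mirror image at (-d, y_sp)
    delay_ms  = (reflected - direct) / c * 1000.0
    print(f"speaker {d:.1f} m from the side wall -> first reflection about {delay_ms:.1f} ms late")

In this particular geometry the 5-6 ms target is only reached with the speaker around 2 m from the side wall, which is why the question of how large is "large enough" keeps coming up.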
 
The discussion here is very relevant to my plan to create virtual speakers for playback over headphones, using my own HRTFs.

In the beginning I thought that the technical side of capturing HRTFs and EQing headphones (a la Smyth) would be the hard part. It turns out that deciding what kind of virtual speakers are in play might be harder.

Virtualising speakers means creating crosstalk. Things like ambio try to cancel crosstalk and get closer to headphones.

The direction of the first reflections depends on the type of speaker and the room geometry.

Moving the head in a stereo setup reveals comb filtering -> you don't want to recreate this when using headphones and a head tracker.
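On the comb-filtering point, a minimal sketch (mine; free-field arithmetic with point receivers, no HRTF, listening distance and angles assumed): for a phantom-center image the same signal arrives from both speakers with a small time offset that depends on where the head, or the ear, happens to be, and the notches of the resulting comb filter move as that offset changes.

Code:
import numpy as np

c = 343.0
r = 2.5                                    # listening distance, m (assumed)
half_base = r * np.sin(np.radians(30.0))   # standard +/-30 degree stereo triangle
y = r * np.cos(np.radians(30.0))

def arrival_offset(dx):
    # time offset between the two speaker arrivals at a receiver shifted dx meters sideways
    d_left  = np.hypot(half_base + dx, y)
    d_right = np.hypot(half_base - dx, y)
    return abs(d_left - d_right) / c

# For identical (phantom-center) signals the sum is a comb filter:
# |H(f)| = |2 cos(pi f dt)|, notches at f = (2k + 1) / (2 dt).
for dx_cm in (2.0, 5.0, 8.5):              # 8.5 cm is roughly one ear's offset from the head center
    dt = arrival_offset(dx_cm / 100.0)
    notches = ", ".join(f"{(2*k + 1) / (2*dt) / 1000:.1f} kHz" for k in range(3))
    print(f"offset {dx_cm:4.1f} cm: dt = {dt*1e6:4.0f} us, notches near {notches}")

At one ear of a centered head the first notch sits around 2 kHz, and it slides around as soon as the head moves, which is exactly the artifact a head-tracked headphone rendering would rather not reproduce.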

In a scenario where you have total control of reflections, either real (outdoors) or virtualised (headphones), what would be closest to ideal for you guys?

@Marcus - does the Realiser let you shorten/attenuate the tail of the impulse response from the speakers? How does it sound as you get closer to the direct sound only?
 
Without a better understanding of what the brain regards as a "correct" cue and what as a "contradicting" cue in the auditory stream, we are lost.
BTW: If I want to hear the tweeter, I hear the tweeter. If I want to follow the stream on the auditory scene, I hear distributed instruments playing on the stereo stage. Why isn't it as simple as that for everyone? It takes really bad recordings to lock my ears to the tweeters :eek:

Rudolf

The dominant sources of contradictory cues are early reflections and changes in driver state due to non-linearity and multiple modes of driver vibration (break-up). Early reflections and their attendant short delays tend to cause aberrations in image size, location, and amplitude. Poor summing of the signals from multiple drivers gives the reflections a different amplitude and phase structure, which forces the brain to question whether an additional principal source demanding attention has entered the scene.

The driver basket, the magnet, and reflections from inside the monkey coffin re-radiate through the drivers (especially the big, thin, light ones); these, along with baffle edges and panel resonances, all contribute to the speaker's impulse response, which is imparted to every signal, playing on top of and throughout it.

A bad recording may be thought of as one that exceeds the bandwidth capacity of the speaker, strongly exciting its non-linear behavior.

That you can hear your tweeter when you want to, with music playing, illustrates this perfectly. The tweeters produce cues which you actively ignore. With my speakers you can stare right at them, twist your head, move it back and forth, and the speakers remain invisible to the ears. With a single speaker, the head stationary, and a music source, it is very difficult to identify how far away the speaker is. The speaker acts like a lens that happens to be the source: you don't see a clean lens, only the image that passes through it.

Regards,

Andrew
 