"The phase coherence of harmonics in the vocal formant range, ~630Hz to 4000Hz"

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.

The thread title is one of the key points from David Griesinger's presentation:
"Pitch, Timbre, Source Separation"
http://www.davidgriesinger.com/Acoustics_Today/Pitch,%20Timbre,%20Source%20Separation_talk_web_sound_3.pptx


It looks like he has concluded that many fundamental auditory perceptions derive from this property:
"the phase coherence of harmonics in the vocal formant range, ~630Hz to 4000Hz"



Now, this can have some interesting consequences for sound reproduction over loudspeakers in a small room, too.

In order to achieve the above requirement we would need at least:

* Phase-linear loudspeaker over this freq range

* No crossover within this freq range

* No cabinet diffraction within this freq range

* No early room reflections in this freq range

* Modulation transfer function of unity in this freq range



That's all nice, BUT what happens to the harmonic phase coherence when you put two loudspeakers in a stereo triangle and get terrible stereophonic comb filtering at the listening position? :rolleyes:

There should be at least one way to reproduce sound over loudspeakers while keeping the above requirements achievable: crosstalk cancelling ;)
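
To put rough numbers on the comb filtering, here is a minimal Python sketch. It assumes a centre phantom image (same signal in both speakers), so each ear gets the near speaker's signal plus a delayed, attenuated copy from the far speaker. The 0.25 ms delay and the 0.7 gain are illustrative assumptions, not measured values, and head shadowing is reduced to one frequency-independent gain.

```python
import numpy as np

def comb_magnitude_db(freq_hz, delay_s, gain):
    """Level (dB) at one ear for direct sound plus one delayed, scaled copy."""
    h = 1.0 + gain * np.exp(-2j * np.pi * freq_hz * delay_s)
    return 20 * np.log10(np.abs(h))

def notch_frequencies(delay_s, f_max_hz):
    """Notches sit at odd multiples of 1 / (2 * delay)."""
    k = np.arange(int(np.ceil(f_max_hz * delay_s)) + 1)
    f = (2 * k + 1) / (2 * delay_s)
    return f[f <= f_max_hz]

DELAY_S = 0.25e-3   # assumed extra path delay to the far ear
FAR_GAIN = 0.7      # assumed attenuation of the far-speaker signal

notches = notch_frequencies(DELAY_S, 4000.0)
for f in notches:
    print(f"notch near {f:.0f} Hz, "
          f"{comb_magnitude_db(f, DELAY_S, FAR_GAIN):.1f} dB")
```

With these assumed values the first notch lands at 2 kHz, i.e. inside the 630 Hz to 4000 Hz range.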



- Elias
 
The properties of stereo phantom image formation remain largely intact for speaker pairs that mash up phase in a similar fashion.

When signal content in one speaker undergoes different mashing, such as with driver breakup modes or a poorly behaved passive crossover, the phantom image is degraded.

Asymmetrical lateral early reflections distort imaging. And identifying the source as some kind of speaker because of the speaker's transfer function is nothing new.

The take-home message from David Griesinger is that virtually all sources with low-frequency components have harmonic content, which the brain can decode into locational cues more quickly.

Regards,

Andrew
 
Elias,
as I understand Griesinger, phase coherence is not a value in itself, but helps the brain to decipher the individual formant(s) in each critical band filter. I'm referring to slide 28 here. So the phase coherence must be sufficient within each critical band (and perhaps some region to the left and right of it), but the phase need not be linear across the complete vocal formant range.
Room reflections certainly are an issue. Cabinet diffractions can be an issue if not dealt with in the way I do ;) :D.
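
For a feel of how many harmonics share one auditory filter, here is a rough Python sketch. It uses the Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation as a stand-in for the critical band. That is an assumption on my part and not necessarily the filter model on Griesinger's slide; the 125 Hz fundamental is just an example value.

```python
def erb_hz(center_hz):
    """Glasberg & Moore ERB approximation, in Hz."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def harmonics_per_band(center_hz, f0_hz):
    """Average number of harmonics of f0 that fall inside one ERB."""
    return erb_hz(center_hz) / f0_hz

F0_HZ = 125.0  # assumed example fundamental

for fc in (630.0, 1000.0, 2000.0, 4000.0):
    print(f"{fc:6.0f} Hz: ERB = {erb_hz(fc):5.1f} Hz, "
          f"about {harmonics_per_band(fc, F0_HZ):.1f} harmonics per band")
```

Under these assumptions there is less than one harmonic per band at 630 Hz and between three and four at 4 kHz. A wider filter estimate would put more harmonics in each band.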

Rudolf
 
Elias: I think that most of your ideals could be achieved.

#1: agree, but isn't that impossible?
#2: agree. Quite possible.
#3: again I agree, and it's quite possible to achieve.
#4: the band of frequencies you quote would suit judicious room treatment.
#5: I use transfer functions but never the modal variety ;) Would getting #2, #3 and #4 be close enough to achieving #5?
 
#1 has already been achieved:

[Attached image: how-achieve-coherence-phase-overlays.png]


#2 is not a requirement.

#3 is effectively achieved by the speaker system used to demonstrate #1.

#4 follows from #3 and good speaker placement.

#5 for the described speaker is exceptional.

The DSP-based Pluto clone has amazing imaging.

Regards

Andrew
 
Elias,
as I understand Griesinger, phase coherence is not a value in itself, but helps the brain to decipher the individual formant(s) in each critical band filter. I'm referring to slide 28 here. So the phase coherence must be sufficient within each critical band (and perhaps some region to the left and right of it), but the phase need not be linear across the complete vocal formant range.

Interesting argument, but since a single critical band does not by itself fully determine timbre or pitch, the comparison is made across the full freq range. And since critical bandwidth is not a discrete concept but a smooth, continuous filtering phenomenon, it follows that if phase linearity is required within one band, it is required across the whole freq band.

As noted in other posts, linear phase can be approached with engineering.


Room reflections certainly are an issue. Cabinet diffractions can be an issue if not dealt with in the way I do ;) :D.

Rudolf

Also, maybe a big non-diffracting horn like the JMLC would do.


- Elias
 
Hey guys, are you missing the point? ;)

We can engineer a loudspeaker fulfilling the requirements. But have you given any thought to what happens when two of those (ideal) speakers are placed at two corners of an equilateral triangle, with the listener at the third corner? The conspiracy is born, marketed to the masses as stereo :rolleyes:

Due to interaural crosstalk at the listening position, timbre and pitch are screwed up!

A valid question to ask: is the stereo triangle Hi-Fi? :rolleyes:


- Elias
 
Cross talk cancelling

Not going to happen because there is no practical implementation that could attract a large audience. It would also require different production techniques. Virtually all stereophonic recordings ever made are more or less optimized (EQ, interchannel time and level differences, etc.) for 2 speaker crosstalk playback.

In my opinion the biggest problem of 2 channel stereo is that it doesn't enable enough plausible dynamic cues caused by head movements (primarily head rotation). I believe these cues are most important for delivering a realistic auditory event in the presence of conflicting cues caused by the inherent problems of common stereo.

For example omnipolar radiation will enable such cues at the expense of clarity and timbre. The question is if a certain subset of acoustic parameters is enough to enable realism while maintaining clarity and timbre.

I'm planning to do more tests in that direction.

Interesting read: http://opus.kobv.de/tuberlin/volltexte/2004/921/pdf/mackensen_philip.pdf
 
Interesting argument, but since a single critical band does not by itself fully determine timbre or pitch, the comparison is made across the full freq range. And since critical bandwidth is not a discrete concept but a smooth, continuous filtering phenomenon, it follows that if phase linearity is required within one band, it is required across the whole freq band.
Elias,
you may have looked at slide 28, but possibly you have not understood it yet. ;) There is a low-pass at the second stage. After that the phase is lost and only the envelopes are compared. And we are not talking about what you understand as critical bandwidth, but about the basilar membrane filter banks, which are rather fixed.
 
Hey guys, are you missing the point? ;)

We can engineer a loudspeaker fulfilling the requirements. But have you given any thought to what happens when two of those (ideal) speakers are placed at two corners of an equilateral triangle, with the listener at the third corner? The conspiracy is born, marketed to the masses as stereo :rolleyes:

Due to interaural crosstalk at the listening position, timbre and pitch are screwed up!

A valid question to ask: is the stereo triangle Hi-Fi? :rolleyes:


- Elias

Stereo is not a conspiracy but, as with all perceptions, illusory mechanisms are in play.

The stereo illusion is possible because the brain didn't evolve to cope with such highly correlated multiple sources; hence the glory of a choir singing as one. Humans are one of the few species that pay attention to images in a mirror.

Fidelity of an information channel may be viewed as temporal fidelity to the source signal in the time domain, or as frequency and phase-angle fidelity in the frequency domain.

Crosstalk is less of a problem than HRTF-related equalization when panning a phantom image element from center to side during the mix. Stereo microphones with and without a dummy head are illustrative.

Two speakers may be likened to light passing through a double slit; resolving the two slits depends on wavelength and slit separation.

Obviously, reconstruction of an arbitrary three-dimensional wavefront at a point in space and time requires more than two sources.

Two eyes, each a two-dimensional receiver, suffice for 3-D vision. Within limits, two highly correlated 2-D images suffice for the mind to accept the illusion of a 3-D scene from a single perspective.

Many visual illusions give insight into the neural processing of vision. A particularly good one, which shares much in common with hearing, is the "Magic Eye" type of picture. A single flat image carries depth-mask cues and, with a little practice, many people see the encoded 3-D image. The manner of depth-mask generation affects how easily the mind can lock in on the 3-D content. Depth in the image ties to the distance of the viewer from the source. Maintaining head and source orientation is important.

My first experiences with these images were tough. It took many trials and concentration. When the image appeared, even blinking would cause it to dissolve. With repetition I found that images appeared quicker, and I could view them in a relaxed state, even momentarily closing one or both eyes.

The parallels with stereo audio are very strong. Good, well-placed speakers allow the mind to easily find and lock onto phantom image streams with little concentration. Early reflections and speakers with poor off-axis behavior force the mind to decide whether a sound is a new element/source or belongs to an existing process stream. When the mind easily associates wall reflections with phantom image streams, it ceases further processing of these sub-streams, and the walls tend to vanish.

When reflections become dominant, image size, location and total intensity get changed, much as when peering through a kaleidoscope.

Regards,

Andrew
 
When reflections become dominant, image size, location and total intensity get changed
It may also be that local reflections mask some of the cues that would tend to localize the speakers as the sound source . . . both "broadening" (or obscuring altogether) those sources, and leading the mind to give more weight to whatever cues are embedded in the recording. There seems to be a strong desire to create a "scene" to account for the origin of sounds . . . when we "know" that it can't be two boxes, we fabricate an image as best we can out of the information available.
 
May also be that local reflections mask some of the cues that would tend to localize the speakers as sound source.

That would be bad news because if we need to rely on masking effects to increase plausibility of spatial reproduction by overriding conflicting cues then quality will most likely suffer a lot.
I like the idea of missing spatial cues that can be extracted from existing recordings and added in a yet to be defined way much better :)
 
The issue was not a coherence difference but the absolute value.

Then maybe you need to post a quote supporting that, or you are using an odd definition of coherence.

I get a cross-correlation peak if I pass correlated signals through identical all-pass filters. No difference: they are still coherent, with the same value of the correlation peak.
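
This is easy to check numerically. Here is a minimal Python sketch, assuming white-noise test signals and a first-order all-pass with an arbitrary coefficient of 0.6. Filtering and correlation are both done circularly via the FFT so that edge effects don't blur the comparison; all names and values are illustrative.

```python
import numpy as np

def allpass_response(n, a):
    """First-order all-pass H = (-a + z^-1) / (1 - a z^-1) on an n-point rfft grid."""
    w = 2 * np.pi * np.fft.rfftfreq(n)
    z1 = np.exp(-1j * w)
    return (-a + z1) / (1 - a * z1)

def xcorr_peak(x, y):
    """Peak of the normalized circular cross-correlation of x and y."""
    n = len(x)
    c = np.fft.irfft(np.conj(np.fft.rfft(x)) * np.fft.rfft(y), n)
    return np.max(np.abs(c)) / np.sqrt(np.sum(x**2) * np.sum(y**2))

rng = np.random.default_rng(0)
n = 4096
left = rng.standard_normal(n)
# Correlated partner: delayed, scaled copy plus a little independent noise.
right = 0.8 * np.roll(left, 7) + 0.1 * rng.standard_normal(n)

# Apply the SAME all-pass to both channels.
H = allpass_response(n, a=0.6)
left_ap = np.fft.irfft(np.fft.rfft(left) * H, n)
right_ap = np.fft.irfft(np.fft.rfft(right) * H, n)

peak_before = xcorr_peak(left, right)
peak_after = xcorr_peak(left_ap, right_ap)
print(peak_before, peak_after)
```

Because the all-pass has unit magnitude at every frequency, the cross-spectrum of the two channels is unchanged, so the two printed peaks agree to within rounding error.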
 
Elias,
you may have looked at slide 28, but possibly you have not understood it yet. ;) There is a low-pass at the second stage. After that the phase is lost and only the envelopes are compared. And we are not talking about what you understand as critical bandwidth, but about the basilar membrane filter banks, which are rather fixed.

The phenomenon happens in the modulation domain. For that reason I suggested using the modulation transfer function (MTF).
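
As a rough illustration of what an MTF below unity means, here is a Python sketch using Schroeder's relation between the MTF and the squared impulse response. That relation assumes noise-like carriers, so it is only a proxy for the harmonic-envelope case discussed here. The single reflection at 5 ms and 0.7 relative amplitude is a made-up example.

```python
import numpy as np

def mtf(impulse_response, fs_hz, mod_freq_hz):
    """Schroeder's MTF: normalized Fourier transform of the squared impulse response."""
    energy = np.asarray(impulse_response, dtype=float) ** 2
    t = np.arange(len(energy)) / fs_hz
    spectrum = np.sum(energy * np.exp(-2j * np.pi * mod_freq_hz * t))
    return np.abs(spectrum) / np.sum(energy)

FS_HZ = 48000

direct_only = np.zeros(1024)
direct_only[0] = 1.0

with_reflection = direct_only.copy()
with_reflection[240] = 0.7  # 240 samples = 5 ms at 48 kHz (assumed example)

for f_mod in (25.0, 50.0, 100.0):
    print(f_mod, mtf(direct_only, FS_HZ, f_mod), mtf(with_reflection, FS_HZ, f_mod))
```

Direct sound alone gives an MTF of 1 at every modulation frequency. The added reflection pulls it below 1, most strongly where the delay equals half a modulation period (100 Hz in this example).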

The basilar membrane is a mechanical resonator, and there are no 'fixed' corner freqs for each critical band; rather, they can be continuously interpolated. The only reason all the publications model the filtering action as a bank of filters is the saving in computational cost.

However, the point is the envelope (from which timbre and pitch are derived) can be correct only if the phases of the harmonic components are preserved through the audio reproduction chain including loudspeakers and room.
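
A toy Python sketch of this dependence (not Griesinger's model): it sums the harmonics of an assumed 125 Hz fundamental that fall between 1 kHz and 2 kHz, once with aligned phases and once with random phases, and compares how peaked the resulting envelope is. The band limits, the fundamental and the crest-factor measure are my choices for illustration.

```python
import numpy as np

FS_HZ = 48000
F0_HZ = 125.0                    # assumed fundamental
harmonics = np.arange(8, 17)     # harmonics 8..16 = 1000 Hz .. 2000 Hz
t = np.arange(4 * 384) / FS_HZ   # four whole periods of F0 (384 samples each)

def envelope(phases):
    """Magnitude of the analytic signal formed by the in-band harmonics."""
    arg = 2 * np.pi * F0_HZ * harmonics[:, None] * t[None, :] + phases[:, None]
    return np.abs(np.sum(np.exp(1j * arg), axis=0))

def crest_factor(env):
    """Peak of the envelope divided by its RMS value."""
    return env.max() / np.sqrt(np.mean(env ** 2))

rng = np.random.default_rng(1)
aligned = envelope(np.zeros(len(harmonics)))
scrambled = envelope(rng.uniform(0, 2 * np.pi, len(harmonics)))

crest_aligned = crest_factor(aligned)
crest_scrambled = crest_factor(scrambled)
print(crest_aligned, crest_scrambled)
```

Both signals have the same magnitude spectrum; only the phases differ. The aligned case gives a crest factor of 3 (nine harmonics: peak 9, RMS 3), while the scrambled case gives a lower value, i.e. a flatter envelope. The sketch says nothing about how much flattening is audible.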


- Elias
 
The basilar membrane is a mechanical resonator, and there are no 'fixed' corner freqs for each critical band; rather, they can be continuously interpolated.
I thought we were now talking about the basilar membrane filter banks. Those are of course real and have "fixed" center frequencies with regard to the inner hair cells. Which does not keep them from interacting ... :)
However, the point is the envelope (from which timbre and pitch are derived) can be correct only if the phases of the harmonic components are preserved through the audio reproduction chain including loudspeakers and room.
Those envelopes are "correct" as long as they allow detection of the corresponding vocal formant. How do you know to what extent the envelope curve has to be preserved before this detection starts to fail? Any links?

Rudolf
 