AES Objective-Subjective Forum

I am not a member, so I have not read the paper. Several groups are, and have been, convinced that most subjective preferences are not real, and they have set out specifically to prove that hypothesis using some form of ABX as the weapon.

It is unclear from where the test pool of listeners was drawn, but if they are people who joined the test fully expecting to support their belief that no difference will be heard...

A few ABX tests supporting the idea that small differences are detectable have been dismissed on grounds such as too small a sample size, blind luck, or demands that better than 2-sigma significance be achieved. Each camp has its own built-in biases and preconceptions. Due to the controversial nature of the subject, no first-class examiner or scientist will touch it with a ten-foot pole.
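For a sense of what a 2-sigma hurdle means in practice, here is a minimal Python sketch (assuming SciPy is installed; the trial counts are arbitrary examples, not figures from any particular test) of how many correct answers an ABX run needs before guessing can be rejected at the 5% level:

```python
# Minimal sketch: smallest number of correct answers out of n ABX
# trials needed to reject "just guessing" (p = 0.5) at the 5% level.
# Assumes SciPy; the trial counts below are arbitrary examples.
from scipy.stats import binom

def min_correct(n_trials, alpha=0.05):
    """Smallest k such that P(X >= k | guessing) < alpha."""
    for k in range(n_trials + 1):
        # Survival function: sf(k - 1) = P(X >= k)
        if binom.sf(k - 1, n_trials, 0.5) < alpha:
            return k

for n in (10, 16, 25, 50):
    print(f"{n} trials: need >= {min_correct(n)} correct")
```

For a classic 16-trial run this works out to 12 or more correct, which is roughly the 2-sigma hurdle mentioned above.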
 
To all.
Of course what you hear is not the same as what someone else hears, and blind tests are not all equal in methodology or in the subjects taking part, so there is no reason for anyone to get unduly upset over the results. The results apply to that test, those people, on that day.
Find out your own thresholds at home by doing some simple blind tests; after all, this is DIY.
 
fredex said:
To all.
Of course what you hear is not the same as what someone else hears, and blind tests are not all equal in methodology or in the subjects taking part, so there is no reason for anyone to get unduly upset over the results. The results apply to that test, those people, on that day.
Find out your own thresholds at home by doing some simple blind tests; after all, this is DIY.


The whole point of doing things 'scientifically' is so that they are repeatable, by other people, on other days. But I largely agree with you. Life is a journey; we need to go through it. If people actually ran their own tests (in a proper manner), there might be a lot fewer snake-oilers, and science wouldn't be the belief system it is for most people. But who has the time for that? :hot:

I doubt it would be possible for each of us to understand and perform every experiment and examination out there, such that we could say we rely on no one's opinions, including the 'professional opinions' of experts. So where does that leave us? Up a creek, if being thorough is your thing.

I'm someone who values being thorough, and I really respect Meyer and Moran's work; I wish more people were that thorough. But by trusting their work, I create a belief system: I trust something I believe is correct but haven't actually experienced myself and so don't know to be 'true'. So for myself, I realize it is futile to try to prove everything to myself as (objectively) true and not base my life on the (subjective) opinions of others and myself. An aggravating thing for someone who just wants to be realistic. So I've decided: if I'm not at least enjoying myself, why bother?

The bottom line: it's impossible not to be subjective, but if you want to interact with other people, it's helpful to try to be objective. SY, on the other hand, just wants to stir up trouble. 😉
 
cuibono said:



The whole point of doing things 'scientifically' is so that they are repeatable, by other people, on other days. But I largely agree with you. Life is a journey; we need to go through it. If people actually ran their own tests (in a proper manner), there might be a lot fewer snake-oilers, and science wouldn't be the belief system it is for most people. But who has the time for that? :hot:

I doubt it would be possible for each of us to understand and perform every experiment and examination out there, such that we could say we rely on no one's opinions, including the 'professional opinions' of experts. So where does that leave us? Up a creek, if being thorough is your thing.

I'm someone who values being thorough, and I really respect Meyer and Moran's work; I wish more people were that thorough. But by trusting their work, I create a belief system: I trust something I believe is correct but haven't actually experienced myself and so don't know to be 'true'. So for myself, I realize it is futile to try to prove everything to myself as (objectively) true and not base my life on the (subjective) opinions of others and myself. An aggravating thing for someone who just wants to be realistic. So I've decided: if I'm not at least enjoying myself, why bother?

The bottom line: it's impossible not to be subjective, but if you want to interact with other people, it's helpful to try to be objective. SY, on the other hand, just wants to stir up trouble. 😉


Good post. I would just say that in everyday life we depend on other people's experiences and 'proofs' because we are convinced that they are an accurate model of reality. If you connect a 9V battery to a 1k resistor, you are pretty sure that the current will be 9mA, even if you have never done this before. That is the power and dependability of repeatable, reliable, scientific information.

I don't have that warm and fuzzy feeling when I swap, say, a Caddock resistor for a Vishay in my amp. There is really no way to predict what the difference in sound will be, if any, even if someone went to great lengths to convince me that it really works because he or she heard it at home. Because it is anecdotal. Blind tests are neither ideal nor infallible, but if your thing is being thorough, they beat anecdotes hands down.

Jan Didden
 
Has anybody seen this article in the June AES journal:

"On Some Biases Encountered in Modern Audio Quality Listening Tests-A Review"

Abstract: "A careful evaluation of listening tests designed to measure audio quality shows that they are vulnerable to systematic errors, which include biases due to affective judgments, response mapping bias, and interface bias. As a result of factors such as personal preferences, the appearance of the equipment, and the listeners' expectations or mood, errors can range up to 40% with respect to the total range of the scale. As a general conclusion, test results should be considered relative, rather than absolute. Scales in previous studies, which have been assumed to be linear, may exhibit departure from linearity. The visual appearance of the user interface may lead to severe quantization of the distribution of scores. Recommendations are offered to improve audio quality tests."

It contains a lot of interesting examples pertinent to the current discussion, such as the experiment in which a group of listeners was asked to assess the audio quality of two identical types of hearing aids, labeled as either digital or conventional. Of 40 participants, 33 preferred the hearing aids labeled digital, 3 preferred the conventional ones, and only 4 did not hear a difference between the two.
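To put that 33-versus-3 split in perspective, here is a minimal Python sketch (assuming SciPy; the numbers are taken straight from the example above) testing whether the label-driven preference among the 36 listeners who expressed one could plausibly be chance:

```python
# Of the 36 listeners who reported a preference between two *identical*
# hearing aids, 33 preferred the one labeled "digital". If the label
# had no effect, each preference would be a fair coin flip.
# Assumes SciPy >= 1.7 for binomtest.
from scipy.stats import binomtest

result = binomtest(33, n=36, p=0.5, alternative="greater")
print(f"probability of 33/36 by chance: {result.pvalue:.1e}")
```

The p-value is vanishingly small, which is the whole point of the example: the devices were identical, so the preference can only be labeling bias.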

Jan Didden
 
Addressing the possible biases of participants is a major point of the test design process.
"Blinding" itself removes only one sort of bias; the expectation bias of participants regarding the difference between SACD and 16-bit/44.1 kHz remains unaddressed.

AFAIR, no controls were used, so literally no one knows what level of sensitivity was reached in these tests.

Especially considering that other studies (on related topics) came to different results, I have some doubts about the conclusions that can be drawn from the Meyer/Moran article.
 
It is unclear from where the test pool of listeners was drawn, but if they are people who joined the test fully expecting to support their belief that no difference will be heard... Due to the controversial nature of the subject, no first-class examiner or scientist will touch it with a ten-foot pole.

The paper can be ordered by non-members (or viewed and copied at an engineering library), and it does answer your question: the listeners comprised several categories and demographics, and these were broken down in the test results.

As for the latter, it's not the "controversial" nature that is keeping most scientists away from this, it's the low probability that there's anything there to be found. There are much more fertile fields to plow.
 
SY said:


The paper can be ordered by non-members (or viewed and copied at an engineering library), and it does answer your question: the listeners comprised several categories and demographics, and these were broken down in the test results.

As for the latter, it's not the "controversial" nature that is keeping most scientists away from this, it's the low probability that there's anything there to be found. There are much more fertile fields to plow.

I think SY has hit the nail on the head. It is not a fruitful area for a scientist, nor is it controversial.

These issues were pretty much put to rest about two to three decades ago. The relevant literature was not confined to audio magazines and journals; much of it came from work on more general issues in experimental psychology.
 
WithTarragon said:
These issues were pretty much put to rest about two to three decades ago.

'Knowns' as prosaic as the equal-loudness curves were revised again just recently. The advances over thirty years in the audible detectability of distortion alone, driven by research into lossy compression, must be staggering. It seems hardly likely that all the important questions were settled with the Williamson.
What was settled for engineers, though, was the answer: global negative feedback. No matter how low the audible detection threshold is set, a set of rules and a toolkit were now available to exceed the necessary measured results. To my mind that was a much greater cause for turning new engineers toward more exciting fields.
 
rdf said:


'Knowns' as prosaic as the equal-loudness curves were revised again just recently. The advances over thirty years in the audible detectability of distortion alone, driven by research into lossy compression, must be staggering. It seems hardly likely that all the important questions were settled with the Williamson.
What was settled for engineers, though, was the answer: global negative feedback. No matter how low the audible detection threshold is set, a set of rules and a toolkit were now available to exceed the necessary measured results. To my mind that was a much greater cause for turning new engineers toward more exciting fields.

RDF, you are misunderstanding my comments. The issues of what constitutes proper psychophysical measurement, and the overall roles of bias, placebo, expectations, learning, memory, attention, and so on, were largely worked out 20 to 30 years ago. Of course, some work is still being done, and some of the answers and models are being refined. But for the stuff we are talking about, a good-enough answer is already out there. It really would not be a fruitful area of research. Bringing up equal-loudness curves (which really have not changed all that much) is a red herring and will only get folks off track. Likewise with negative feedback, etc. We are discussing measurement techniques, their analysis, and their interpretation.

Proper measurement can be done on these topics (audibility of certain kinds of distortion, etc.). However, it would require a proper test protocol. Most audio enthusiasts would simply have no idea where to begin. It is not their field of expertise or training, so why should we expect an expert-level design and analysis?

Incidentally, most folks who criticize the Meyer/Moran work have not read the paper, or, if they have, they usually lack the background to think about it critically. It is not bad work, and rejecting it reflexively is a mistake. Those guys knew what they were doing. The issue is really not whether they used fancy cables or an amplifier that cost a million dollars (one that the "critic" happens to own).
 
The article is not bad, I must admit, but it certainly lacks the quality of the average AES paper. I don't want to sound snobbish, but they could at least have stated the equipment they used.
I once learnt that one should list the tools used in a scientific paper (for your own records you usually even note the serial numbers of the equipment, for repeatability).

Regards

Charles
 
I like this. Some real things are being discussed.

I (only) brought up religion to hopefully "head off at the pass" the religious battle between the rampant subjectivists and the rampant objectivists that these sorts of discussions inevitably descend into. It remains my contention that unless you accept that ALL aspects of science are ultimately based on faith, and thus open to some question, you are building a religion. Even if you do accept it, it's still a religion, because we're still subjective creatures; there's no way to prove that logic and reason aren't an elaborate illusion. That is the sort of thing that makes people extremely uneasy, so they pick a side and fight for their beliefs; religious battle follows, shouting rather than listening, and so on.

Or as rdf's sig says:
"Science is the belief in the ignorance of the experts - Richard P. Feynman"

I now understand what this quote means. Science is a very pessimistic religion, even at the best of times; I've never really given it much thought before. I'm an engineer.

Anyway...

As far as this paper goes, some fairly conventional things come to mind:

1. The threshold of detection isn't absolute; it is a decreasing probability of detection as quality increases. Given that CD audio is supposed to be just good enough to be "perfect", it follows that it should be possible to detect improvements over it with suitable effort. Before that is dismissed as overkill, remember that there are many more people listening to music than there are running scientific trials on it.

2. Through even basic noise shaping, 16/44.1 audio is capable of encoding somewhere between 18 and 19 bits' worth of old-school PCM resolution, seriously closing the gap between CD and SACD. When mastering a CD from a very high-quality recording, a great deal of effort is put into making sure the right dithering is used, and the various options open to the recording engineer (e.g. TPDF, UV22) are often auditioned to make sure they don't cause problems (a rough dither sketch follows this list). Presumably it was a good recorder doing the monitoring in this test; maybe it's just a really good match for material that would otherwise have shown up differences on a different recorder (CD loop).

3. The very people who are most "qualified" to take part in these sorts of tests have usually suffered HF hearing loss as a result of their jobs and/or their age. That could account for some of the bandwidth difference. M&M do go hunting for correlations, but on a necessarily reduced data set per (smaller) group. They didn't get people off the street; it was "about 60 members of the Boston Audio Society and many other interested parties". In doing this they are appealing to audiophiles, not scientists.

4. What people often overlook is that a dodgy or 'fruity' audio system can be more likely to expose, rather than mask, the faults of a dodgy audio coding system. An example is someone comparing MP3s against the original on computer speakers that happen to have a 10 dB peak right where the codec is trying to hide some grunge. These sorts of problems might be inaudible on an accurate high-end system that is otherwise considered very "revealing" of detail in the recording (the music, which is designed to be heard, as opposed to the encoding, which is designed not to be heard).

5. ABX testing is a pretty blunt tool for comparing subtle musical details, due to the either/or nature of the test. Small differences need a vast number of trials before statistically significant results are available (a trial-count sketch follows this list), and subjects are forced to make a very unnatural "decision" that is not a normal part of the musical listening experience and can completely and consistently throw them off. That only accounts for a difference in sensitivity, though; the ABX test will eventually pick something up if it is there waiting. The exception is where the unnatural forcing of an 'unnecessary' objective judgement completely bypasses the normal path in the brain, resulting in an unknowing random string of judgements or fabrications. A lot of effort goes into removing bias and influence, on the expectation that humans are unnaturally cunning. But humans are also capable of unnatural stupidity.

6. M&M seem to have been on this warpath for years. Good on them. But it doesn't matter what attention you give to blindedness, experimenter bias tends to get into the results of contentious studies. As one poster pointed out, they were expecting a null result, and told the subjects (I can't find this bit, I'll need to read it properly, which I admit I haven't), you can't get much more bias than that other than by fiddling the results.

7. The experiment lacks a control, because no one could pick the difference. What could they have done? Reducing the number of bits only raises the noise floor until it can be heard outright, and increases the audibility of whatever noise shaping is being used, rather than exposing any particular distortions inherent to the CD encoding standard. Reducing the sample rate only reduces the bandwidth until it can be heard outright, and increases the audibility of whatever anti-aliasing filtering is being used, rather than exposing any such distortions. See the problem?

None of these things significantly invalidates the test, in my opinion, but they do give some wiggle room.
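On point 2, here is a minimal NumPy sketch (my own illustration, not anything from the paper; the tone, level, and seed are arbitrary) of the dithering decision: it quantizes a 1 kHz tone to 16 bits with plain rounding and with TPDF dither, showing the few dB of extra noise that buy freedom from signal-correlated truncation error:

```python
# Quantize a 1 kHz sine to 16 bits, with and without TPDF dither.
# Assumes NumPy; the -12 dBFS level and the tone are arbitrary examples.
import numpy as np

fs, n = 44100, 1 << 16
t = np.arange(n) / fs
x = 0.25 * np.sin(2 * np.pi * 1000.0 * t)   # 1 kHz at -12 dBFS

q = 1.0 / 32768.0                           # 16-bit quantization step

# Plain rounding: the error is correlated with the signal (distortion)
rounded = np.round(x / q) * q

# TPDF dither: two uniform noises summed, added before rounding;
# this decorrelates the error at the cost of a few dB more noise
rng = np.random.default_rng(0)
tpdf = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)
dithered = np.round(x / q + tpdf) * q

for name, y in (("plain rounding", rounded), ("TPDF dithered", dithered)):
    err_rms = np.sqrt(np.mean((y - x) ** 2))
    print(f"{name}: error = {20 * np.log10(err_rms):.1f} dBFS RMS")
```

Noise shaping goes one step further and pushes that noise out of the most audible band, which is where the effective 18-19 bit figure comes from.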
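And on point 5, a rough sketch of why small differences need so many trials (again assuming SciPy; the 60% figure is an arbitrary stand-in for a barely audible difference): it finds the shortest ABX run in which a listener who is right 60% of the time would pass a 5%-significance criterion with 80% power.

```python
# How many ABX trials does it take to catch a listener who is right
# 60% of the time, at 5% significance with 80% power? Assumes SciPy;
# the 60%, 5%, and 80% figures are arbitrary illustrative choices.
from scipy.stats import binom

def trials_needed(p_true=0.60, alpha=0.05, power=0.80):
    for n in range(5, 5000):
        # Smallest passing score k with P(X >= k | guessing) <= alpha
        k = int(binom.isf(alpha, n, 0.5)) + 1
        # Chance that the genuine (p_true) listener reaches that score
        if binom.sf(k - 1, n, p_true) >= power:
            return n, k

n, k = trials_needed()
print(f"need about {n} trials (passing score: {k} correct)")
```

That comes out to well over a hundred trials, versus a handful for a gross difference, which is the 'vast number of tests' problem in a nutshell.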
 
adx said:
I like this. Some real things are being discussed.

...snip....
As one poster pointed out, they were expecting a null result and told the subjects so (I can't find this bit; I'll need to read the paper properly, which I admit I haven't). You can't get much more bias than that, short of fiddling the results.

...snip...
I think that was me, and I wondered if this was the case. I have no evidence to support the notion that panelists were told of this, but I do know that these same people have been promoting the "ABX proves there's no difference" school for some years.

I personally find difference testing most tedious; recently a friend and I spent one full day comparing two good cables. First, find some kind of difference; second, find an otherwise high-quality recording that highlights that difference; then listen, and listen some more. Now try a few other recordings, since occasionally a product has a strength, and also a weakness, different from those of the comparison product.

In the end, decide which is either more accurate or more pleasing, which is another small dilemma. All this was sighted and subjective; if performed double-blind, add one day for setup and a second full day of testing.

A whole new definition of listener fatigue: do people rush their scores in double-blind tests because they can't wait to escape the clutches of the test creators?

In the Blowtorch thread, John seems to say that what works for him is a never-ending pursuit of lower distortion levels, especially of high-order harmonics. He implies this correlates well with his listening results. There is still room for improvement in these performance numbers, which leads me to believe we aren't done yet with getting better sound reproduction; it also flies mostly in the face of the "all well-designed amplifiers sound the same" school.
 
scott wurcer said:


Subconsciously hearing the sound of the A/B/X box? In retrospect, very few listening tests examine all the possibilities that could skew the result. I have never attended any listening comparison where there was an attempt to equalize both sources to within +/-0.1 dB broadband. That would disqualify all of them in the eyes of some.


Scott,
even +/-0.1 dB is not quite enough, as many can hear 0.1 dB.
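For scale, a quick bit of arithmetic (plain Python; the dB values are arbitrary examples) showing what such level differences mean as voltage ratios:

```python
# Convert small dB level differences to percentage voltage differences.
for db in (0.1, 0.2, 0.5, 1.0):
    ratio = 10 ** (db / 20)   # dB to voltage ratio
    print(f"{db:.1f} dB -> {100 * (ratio - 1):.2f}% level difference")
```

Matching to +/-0.1 dB therefore means holding levels to within roughly 1%, which helps explain why so few comparisons manage it.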
 