The double blind auditions thread

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
Hi,



To put this into different words, you fell for the con-job.

Would you have found equally "no difference" had you been:

1) Ignorant that you were comparing two items that you had previously perceived as "very different"?

2) Simply asked to rank Item A and Item B by preference, while ignorant not only of the identity of Item A and Item B, but also of their very nature?



So, in other words, the ABX test itself (I will grant that the ABX device itself was largely transparent) introduced an additional variable, which obscured the sonic differences that you claim long-term sighted testing reveals.

BTW, I once demonstrated that with the right kind of "challenge" to the right kind of "subject" it is possible to cause apparent inaudibility of gross sonic differences (polarity reversal of one channel).

In the end, the ABX protocol alone, as normally applied, already contains a strong bias against non-null results. This bias can be made far greater by exposing the subject to the tests without training and by ensuring the subject has significant emotional involvement (that is, the subject is convinced of the presence of a difference and is especially eager to show that such a difference exists).

By handling things right, even the grossest sonic differences may be made inaudible, never mind any that should be considered subtle.
...

Ciao T
so you are arguing strongly for the proposition that internal mental state - induced by the presentation, peer pressures, etc. - does have massive effects on perception??

how does applying that to your characterization of "bad" DBT, ABX lead to the conclusion that sighted listening without controls is the way forward???



Training, directing focus to expected differences, or asking for multidimensional rankings - done controlled, blind, and level matched - seem readily doable. Let's have some direction on "good" subjective test protocols

some reported tests are amateurish, but why are you projecting a desire for null results onto anyone running blind ABX tests - surely researchers in psychoacoustics need positive results to advance theory - and they do get positive results in DBT tests
 
Hi,

so you are arguing strongly for the proposition that internal mental state - induced by the presentation, peer pressures, etc. - does have massive effects on perception??

Would you care to argue the opposite position?

how does applying that to your characterization of "bad" DBT, ABX lead to the conclusion that sighted listening without controls is the way forward???

It does not lead to such conclusions, nor do I remember suggesting that such a conclusion was to be drawn.

It merely leads me to consider such tests as no more scientific, and no more relevant in terms of general applicability, than the gushing, superlative-loaded reports about the "Teleportation Tweak" from Machina Dynamica that I read over at Audio Asylum's Tweakers Asylum, that's all.

Training, directing focus to expected differences, or asking for multidimensional rankings done controlled, blind, level matched seem to be readily doable – lets have some direction on “good” subjective test protocols

I believe I have made suggestions towards such on many occasions when the subject of blind testing came up. I have arguably also, on such occasions, condemned in the strongest possible terms the kind of bad testing and prejudiced test setups perpetuated and published by the ABX Mafia.

some reported tests are amateurish, but why are you projecting a desire for null results on anyone running blind ABX tests

I am not projecting any desire.

I am referring to the originators of the Audio ABX tests, who for the better part of a decade operated a commercial entity to sell their "ABX Comparator" and who, following commercial failure, continued (and continue) to promote their method, protocol, etc...

Knowing who they are should serve as illustration that there is no projection needed, observation will do just splendidly.

Now I know that there are others who have uncritically adopted these methods. These of course have no particular desire or agenda, but merely an affinity for the "everything sounds the same" philosophy, or a desire to oppose it that is not matched by expertise. These are just pitiable for wasting their time.

surely researchers in psychoacoustics need positive results to advance theory - and they do get positive results in DBT tests

It should be noted that, while ABX testing as promoted by the ABX Corporation does qualify as double blind, double blind testing does not equate wholly to the ABX methodology (despite strenuous efforts of the ABX Mafia to project the opposite image).

In most cases the researchers get positive results precisely because they do NOT apply ABX but other double blind protocols...

Ciao T
 
In total, Thorsten, you seem to be making the point that I was induced into believing the "no differences" gospel by the nature of the test and the motives of those giving the test.
Let's be clear about how this works...
The ABX box I used has three buttons, A, B and X. You can press A to hear amplifier A at any time, similarly with B. When you press X, you get either A or B, randomly, which you are then to identify, aurally.
My point is that when the levels were matched, I could not tell A from B by pushing the A or B buttons. As I said, there were amplifiers that I believed were very different, listened to through what Thorsten calls a largely transparent device.
Maybe I fell for a con job, but my evaluation of the testing situation was that there were no agendas at that time.
Thorsten, your questions 1 and 2 are hypothetical situations that I can't speak to and don't care to speculate about. I was only relating my experience.
I do believe that the format of ABX testing doesn't allow testing everything that characterizes the audible performance of a piece of equipment, for example listening fatigue. I do believe that the ABX format has shown that level matching and frequency response are important variables and may explain some of the audible differences reported between equipment. It may well be that ABX testing is not particularly useful for testing any other variables.
I also believe that we are particularly good at fooling ourselves.
 
Hi,

But are you not the one who keeps hammering on about the proper scientific method - rightfully so, BTW? Why do you now support the conclusion of this study?

I neither support nor refute any results of Oohashi's studies.

I pointed out that:

1) Oohashi's study was a controlled, blind test
2) That when his original study was criticised, he attempted to address the criticisms with additional studies

Do you disagree with the above?

Ciao T
 
Thorsten, I didn't claim that the ABX is the best (or only) DBT test, nor did I claim that using one listener is enough for such and such purpose.
My only claim is that you can get informative results from single-listener tests. The information one can get, if one's methodology is sound, can be something like:
"Given these test conditions and this listener we state with a confidence X that the effect is audible or it is not audible above some threshold T".
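A statement of that form falls out of a simple one-sided exact binomial test against the chance rate of 50%. A minimal sketch (the trial counts are hypothetical, not from any test discussed in this thread):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided exact binomial test: the probability of scoring at
    least `correct` out of `trials` ABX trials by pure guessing
    (chance of a correct identification per trial = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical single-listener session: 14 correct out of 16 trials.
p = abx_p_value(14, 16)
print(f"p = {p:.4f}")  # small p -> unlikely to be guessing under these conditions
```

If p falls below the chosen significance level (say 0.05), one may claim audibility at the corresponding confidence for this listener under these test conditions; a larger p is a failure to detect, not proof of inaudibility.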
 
Hi,

Thorsten, you criticize the ABX methodology saying that it can mask audible effects in some conditions.

I posted the references previously.

For those preferring a "popular science" level rendering to the serious math in the JAES articles, Stereophile's "The highs and lows of blind testing", which I also referenced, gives an "Executive Summary".

Do you know of any methodology that is superior in this regard? If so, please let us know.

First, let me repeat, training and controls. Listeners need to be trained and evaluation against positive (known audible phenomena) and negative (no difference) controls must be made, to assure that the test produces neither false positives nor false negatives.

For methodology, extended questionnaires that allow detailed analysis of preference are by far more useful. They also allow easy statistical analysis using a confidence interval to illustrate just how reliably we can conclude that a difference was heard or not. Plus, such methodology is valid even for very small sample sizes (you just cannot get very high confidence levels).
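One way such a confidence interval can be computed even at small sample sizes is the Wilson score interval for a proportion; a sketch, using a hypothetical listener panel (the 7-of-10 figure is invented for illustration):

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; unlike the plain
    normal approximation it behaves sensibly at small n."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical panel: 7 of 10 listeners prefer item A.
lo, hi = wilson_interval(7, 10)
print(f"95% CI: {lo:.2f} .. {hi:.2f}")
```

Note that with only 10 listeners the interval still straddles 0.5, which illustrates the point above: small samples remain valid, they just cannot deliver high confidence.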

All of this of course requires more than the "Milkmaid" approach of the ABX test (the one formalised by the ABX Corporation), but is also highly likely to give results that are useful beyond establishing the likelihood that a real difference was present.

Meanwhile, the ABX approach, especially combined with untrained listeners, unfamiliar systems and the intentional use of subjects holding serious biases about the items tested, appears (but is not) scientifically sound and has a very high likelihood of returning a "null result".

So, I guess anyone will pick whichever test method suits one's own aims. My aims are not served by the ABX test, nor can I conceive of any particular situation where serious science, aimed at increasing our understanding, would benefit from the typical ABX test, but I can conceive of other aims that it would suit admirably.

Ciao T
 
From the BAS paper:
"The gains of the "A" and "B" paths were matched in both left and right channels to within 0.05 dB at 1 kHz using the PCM-F1's gain controls."


Level matching is an important aspect of a well designed test. They matched to about 0.05dB difference which is roughly 1%.
If my memory is correct, in the $10000 listening challenge (or something like that) the levels were matched with an Audio Precision to 0.01dB which is about 0.2%.

Does anyone have more info on the accuracy needed in level matching ?
Can an accuracy of 1% or 0.2% be achieved just with the volume pot of the amplifier, or is something more elaborate necessary?
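For reference, the percentage figures quoted above correspond to treating the dB values as power ratios (the 10*log10 convention); as voltage ratios (20*log10) they come out about half as large. A quick sketch of the conversion, showing both conventions:

```python
def db_to_pct(db, power=True):
    """Percentage deviation implied by a level mismatch given in dB.
    power=True uses the 10*log10 (power) convention,
    power=False the 20*log10 (voltage/level) convention."""
    divisor = 10.0 if power else 20.0
    return (10 ** (db / divisor) - 1) * 100

print(db_to_pct(0.05))         # ~1.16 % as a power ratio
print(db_to_pct(0.05, False))  # ~0.58 % as a voltage ratio
print(db_to_pct(0.01))         # ~0.23 % as a power ratio
```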
 
Hi,

Level matching is an important aspect of a well designed test.

Yes. However if the level matching is achieved by using a digital processor with A2D and D2A conversion (the PCM-F1), may one ask what the result of such an action may be?

Level matching is of course important, however I believe it would be required to demonstrate first that the method employed remains undetectable...

In principle it is reasonably trivial to make precision attenuators, but they would also need to be buffered, so implementing this is a non-trivial project.

Much easier just to use a generic 16-bit AD/DA processor, one without oversampling, with high-order LC filters in front of the ADC and DAC - filters whose replacements by Apogee Digital were what first put that company on the map.

Ciao T
 
I found this link about the $10k amp challenge:

Richard Clark Amplifier Challenge FAQ

It says that the levels were equalized to 0.05dB precision.
Here are the additional test requirements:

The amplifiers in the test must be operated within their linear power capacity. Power capacity is defined as clipping or 2% THD 20Hz to 10kHz, whichever is less. This means that if one amplifier has more power (Watts) than the other, the amplifiers will be judged within the power range of the least powerful amplifier.
The levels of both left and right channels will be adjusted to match to within .05 dB. Polarity of connections must be maintained so that the signal is not inverted. Left and Right cannot be reversed. Neither amplifier can exhibit excessive noise. Channel separation of the amps must be at least 30 dB from 20Hz to 20kHz.
All signal processing circuitry (e.g. bass boost, filters) must be turned off, and if the amplifier still exhibits nonlinear frequency response, an equalizer will be set by Richard Clark and inserted inline with one of the amps so that they both exhibit identical frequency response. The listener can choose which amplifier gets the equalizer.


So it seems that 0.05dB is somewhat of a standard in this kind of test?
 
A standard pot often has 270 degrees of electrical travel and usually gives 45 dB of attenuation over that range. That works out to 6 degrees per dB, so I suspect 0.1 dB is not a reasonably stable adjustment.
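The arithmetic above can be checked directly; a tiny sketch (the 270-degree travel and 45 dB span are the post's own assumptions, not datasheet values for any particular pot):

```python
# Assumed pot characteristics, as stated in the post above.
travel_deg = 270   # electrical travel in degrees
span_db = 45       # attenuation range in dB

deg_per_db = travel_deg / span_db      # 6 degrees of rotation per dB
deg_for_0p1_db = 0.1 * deg_per_db      # rotation needed for a 0.1 dB step
deg_for_0p05_db = 0.05 * deg_per_db    # rotation needed for a 0.05 dB step

print(deg_per_db, deg_for_0p1_db, deg_for_0p05_db)
```

Roughly 0.3 degrees of rotation for a 0.05 dB step supports the point that matching to that precision with a bare volume pot is impractical.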

Since I am of the strong OPINION that I can hear differences between amplifiers, and even the same amplifier under adjustment, any test that says they all sound the same must have errors.

Years ago I used to adjust the output bias pot by ear. I am pretty sure that the changes I heard while making the adjustment and listening to music at a low level really did exist as they also show up on instrumentation.

ONLY FOOLS ARE POSITIVE, I am positive about that!
 
Yes. However if the level matching is achieved by using a digital processor with A2D and D2A conversion (the PCM-F1), may one ask what the result of such an action may be?

Level matching.

It wasn't applied to the "pure" analog side but to the channel which included the A/D and D/A conversion. So clearly, it was transparent (as was the PCM-F1), at least to Mr. T, once he couldn't peek and had to use ears alone.
 
ABX This:
 

Attachment: ABX.jpg
First of all, as Thorsten_L. stated, these papers were chosen because they are freely accessible, the research topics are connected, and they give an impression of what a tremendous effort is sometimes needed to implement controlled tests with more than one listener.

And to illustrate the fact that, despite the efforts, something like a 'perfect' test is as illusory as 'perfect' technical gear.
Some points in every test are debatable, simply because the underlying decisions are subjective (at least up to a certain point) - see for example the chosen significance level (SL) - and because the experimenter cannot (and should not) control every variable in a test situation.
The latter because of the golden rule of testing:
"The tighter the control of an experiment, the less the practical relevance of any result."

Sensory tests are based on subjective evaluation and rely on the answers of the participants. There always exists uncertainty as to whether everything that was heard was also consciously perceived and reported.

Oohashi et al. is quite interesting because they tried to get a somewhat more objective view on these mechanisms.
Unfortunately nobody else picked up these methods; otherwise we might now know a bit more about individual reactions to various test schemes.

That one has had a lot of criticism:
1 no one could notice any difference between normal and hypersonic sound.

To be correct, no one could detect the hypersonic sound (HFC) if it was presented alone.

2 it takes 20 seconds for an effect to be measured.

That is one of the very interesting effects they found. They did an experiment and reported the methods and the results - in what way could that be criticised? Just because somebody did not like this result?

3 It has not been repeated by anyone but Oohashi

Obviously these methods are quite expensive, but the fact that nobody has reproduced their work is surely not Oohashi's fault, is it?

4 quote from page 3550: If you know in what order what is being tested, bias is inevitable.

That part of the experiment seems to be only single blind.

... although not a sensory test or a DBT. ....

It was mainly a DBT, in parts single blind; acoustical stimuli were presented and the reactions of listeners to the various stimuli were observed. Why should that not count as a sensory test?
 