DAC blind test: NO audible difference whatsoever

That's fair, Jakob. It's a push-pull between sides calling one "delirious" and "deaf". Getting at the hard truth usually takes a lot more effort than anyone wants.

It's always a push-pull between statistical power (to eliminate chance, where ternary/tetrad tests win) and actually getting testers to perform well on positive and negative controls (where ternary/tetrad tests show up as problematic).
 
You still don't get the problem with this thread or what is being said about such a "hacked together" test - do you?
LOL. Scott "gets it", don't worry about that. He was likely talking about the broader scope of A/B and ABX that so often gets discussed across these pages.
If one fails to hear a difference, blame the test. That's been done countless times over the years, here and across the audio forums. The test works if you do hear a difference, but the test is faulty if you don't. It could never be that there simply is no difference, could it? :)
 
That's fair, Jakob. It's a push-pull between sides calling one "delirious" and "deaf". Getting at the hard truth usually takes a lot more effort than anyone wants.

It is fair if we consider the "cult-like" beliefs on all sides that I've mentioned, but if somebody claims to be an objectivist, judging strictly on a scientific basis and demanding the application of scientific tools even by hobbyists, then imo we should expect more from him than to accept any result, regardless of methodological sloppiness, if it only suits his personal belief about what a result should be or even _must_ be.

Instead imo we could expect him not only to demand "do an ABX and you will see" but also to recommend anything that helps an inexperienced tester to achieve correct results.

It's always a push-pull between statistical power (to eliminate chance, where ternary/tetrad tests win) and actually getting testers to perform well on positive and negative controls (where ternary/tetrad tests show up as problematic).

"to eliminate chance...." puzzles me a bit, as statistical power (1-beta) means to really detect a difference when there is one. (Or more formally to reject the null-hypothesis if it is wrong)

So for assumed effect sizes we are able to perform power calculations for the various test approaches.

But as several authors have pointed out, in reality there is a distinction between the calculated (theoretical) statistical power and the so-called operational power, as human factors come into play.
So, while the calculated statistical power of, for example, the 2-AFC test is lower than that of the 3-AFC test, the actual operational power might be higher, as it is somewhat easier for the participants to do the test.

In that sense the term "operational power" means detecting a difference in the actual test _reality_ when a difference exists.
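
To make the power-calculation idea concrete, here is a minimal sketch (purely illustrative, not taken from the original posts or the cited literature) of the usual binomial approach: assume a per-trial probability of a correct answer for a real listener (the effect size), pick a significance criterion under the guessing null hypothesis, and compute the probability of reaching that criterion. The function name and the example numbers are my own.

```python
# Minimal sketch of a binomial power calculation for forced-choice listening tests.
# Assumption: the listener's true per-trial hit rate (p_listener) is given as the effect size.
from scipy.stats import binom

def forced_choice_power(n_trials, p_guess, p_listener, alpha=0.05):
    """Power = P(reject "pure guessing" | listener answers correctly with prob. p_listener).

    p_guess    -- chance rate per trial (1/2 for ABX/2-AFC, 1/3 for triangle/3-AFC)
    p_listener -- assumed true probability of a correct answer (the effect size)
    """
    # Smallest number of correct answers that is significant at level alpha under guessing.
    k_crit = int(binom.ppf(1 - alpha, n_trials, p_guess)) + 1
    # Probability that the assumed listener reaches or exceeds that criterion.
    return binom.sf(k_crit - 1, n_trials, p_listener)

# Example: 16 trials, listener assumed to be correct 70% of the time.
print(forced_choice_power(16, 1/2, 0.70))  # ABX / 2-AFC chance rate
print(forced_choice_power(16, 1/3, 0.70))  # triangle / 3-AFC chance rate
```

Of course the same underlying sensory difference does not translate into the same hit rate in different protocols; that mapping is exactly what Thurstonian models, and the distinction between calculated and operational power, are about.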
 
LOL. Scott "gets it", don't worry about that. He was likely talking about the broader scope of A/B and ABX that so often gets discussed across these pages.
If one fails to hear a difference, blame the test. That's been done countless times over the years, here and across the audio forums. The test works if you do hear a difference, but the test is faulty if you don't. It could never be that there simply is no difference, could it? :)

But he doesn't get it - in my post I particularly referred to this thread & its title with the obviously flawed tests & flawed conclusions to which he replied as an apologist, in defense of such non-science.

I would suggest that Jakob's post sums up my thoughts about Steve's post
" if somebody claims to be an objectivist, judging strictly on a scientific basis and demanding the application of scientific tools even by hobbyists then imo we should expect more from him as to accept any result - regardles of methodological sloppyness - if it only suits his personal belief what a result should be or even _must_ be."
 
It's from:
Virginie Jesionka, Benoît Rousseau, and John M. Ennis, "Transitioning from proportion of discriminators to a more meaningful measure of sensory difference," Food Quality and Preference, 32 (2014), 77–82.

In this graph the values are mainly calculated for overall test results; for example, in the case of "Tedja et al." there was 1 main test subject and the other two subjects were used for confirmation purposes. Participant 1 got 50.4% correct in the triangle test and 75.1% in 3-AFC, having done 720 triangle and 718 3-AFC trials, while subjects 2 and 3 did 240 triangle and 240 3-AFC trials each.

In the actual articles it is often more refined, even calculated for the different triplets presented.
Thanks Jakob
 
......
Instead imo we could expect him not only to demand "do an ABX and you will see" but also to recommend anything that helps an inexperienced tester to achieve correct results.

Exactly! If people are ACTUALLY trying to find the truth about audible differences then they would NOT be supporting a parlor-trick listening test (like we see here) guaranteed to deliver a null result (a no-better-than-chance outcome).

This is the great dupe being perpetrated on audio forums by many of those who claim to be on the side of science - their self-proclaimed objectivism couldn't be further from the truth!!
 
That's fair, Jakob. It's a push-pull between sides calling one "delirious" and "deaf". Getting at the hard truth usually takes a lot more effort than anyone wants.

It's always a push-pull between statistical power (to eliminate chance, where ternary/tetrad tests win) and actually getting testers to perform well on positive and negative controls (where ternary/tetrad tests show up as problematic).

Well, if tests with adequate controls & adequate statistical power aren't feasible then what are you suggesting? Are you saying that such audio tests should be confined to research facilities and conducted by people who know their limitations & traps? That such tests being suggested on audio forums as "proof" of audibility is, at best, ill-informed, at worst, disingenuous?
 
LOL. Scott "gets it", don't worry about that. He was likely talking about the broader scope of A/B and ABX that so often gets discussed across these pages.
If one fails to hear a difference, blame the test. That's been done countless times over the years, here and across the audio forums. The test works if you do hear a difference, but the test is faulty if you don't. It could never be that there simply is no difference, could it? :)

If a test, shown to be one of the least sensitive blind tests (when run & administered by people who know what they are doing & using trained listeners), is suggested on audio forums for untrained people to run as a suitable means of proving audibility - do you not think that such a flawed approach should be called out?

Yes, there may not be any audible difference to be found, but the ABX test seen here (& typical of 99% of blind tests on many audio forums) is nowhere near suitable to answer this question. To pretend otherwise is disingenuous. "How about there just be no audible difference" is an answer that ignores the point being made about the flawed nature of the test & its misuse on audio forums, as evidenced by this thread's title.
 
"If you can't dazzle them with brilliance then baffle them with BS!" Can't recall who said that, but it would appear to be appropriate here.

I don't see any point in continuing to produce pages of pseudo-scientific gobbledygook to show that the OP's methodology, from a scientific perspective, was flawed, when that is already quite clear to everyone.
OK, I do see one point: it helps people to avoid dealing with the obvious issues raised by Pano in post #1722 above.
 
"to eliminate chance...." puzzles me a bit, as statistical power (1-beta) means to really detect a difference when there is one. (Or more formally to reject the null-hypothesis if it is wrong)

So for assumed effect sizes we are able to perform power calculations for the various test approaches.

But as several authors have pointed out, in reality there is a distinction between the calculated (theoretical) statistical power and the so-called operational power, as human factors come into play.
So, while the calculated statistical power of, for example, the 2-AFC test is lower than that of the 3-AFC test, the actual operational power might be higher, as it is somewhat easier for the participants to do the test.

In that sense the term "operational power" means detecting a difference in the actual test _reality_ when a difference exists.

I'm pretty sure we're saying the same thing: in essence, more "options" (e.g. paired/tetrad vs 3-AFC) reduce the probability of a test subject guessing the right answer (in this case, matching correctly), which means fewer tests are required to rule out chance. As a corollary, the higher-order tests are far more demanding of the test subject, and therefore our "sensor" (the test subject) is less reliable and more likely to give responses that look like noise/chance.

One has to balance the false positive rate against the false negative rate, and sensitivity vs selectivity. That's all.
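
As a rough illustration of the "fewer tests required" point (my own sketch, with invented example numbers), one can ask under the same binomial model how many trials a protocol needs before chance can be ruled out at alpha = 0.05 with, say, 80% power, given an assumed hit rate for the listener.

```python
# Rough illustration: trials needed to reach alpha = 0.05 with a target power,
# given the per-trial chance rate of the protocol and an assumed listener hit rate.
from scipy.stats import binom

def trials_needed(p_guess, p_listener, alpha=0.05, target_power=0.8, n_max=500):
    for n in range(1, n_max + 1):
        k_crit = int(binom.ppf(1 - alpha, n, p_guess)) + 1   # reject guessing if >= k_crit correct
        if binom.sf(k_crit - 1, n, p_listener) >= target_power:
            return n
    return None

# Same assumed 75% hit rate, different chance rates (illustrative only):
print(trials_needed(1/2, 0.75))  # ABX / 2-AFC-style chance rate
print(trials_needed(1/3, 0.75))  # triangle / 3-AFC-style chance rate
```

The corollary above applies, of course: holding the hit rate fixed across protocols is itself an assumption, since the harder tasks usually depress it.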

(Maybe it's a language thing? Dunno, I don't really get a sense that we're disagreeing, other than I'm probably more sensitive to false positives, and you're a bit more sympathetic. )

Well, if tests with adequate controls & adequate statistical power aren't feasible then what are you suggesting? Are you saying that such audio tests should be confined to research facilities and conducted by people who know their limitations & traps? That such tests being suggested on audio forums as "proof" of audibility is, at best, ill-informed, at worst, disingenuous?

I don't know about you, but I do this stuff for my own enjoyment. It's an entertainment industry. Nelson Pass has a great line about subjectivists vs objectivists and hoping at the end of the day that everyone's having fun. Therefore, if I want to run a test (in this field, not my work-work), it's going to be in that vein. Do I like this more than that? And do I even care if that essentially becomes an arbitrary decision? It's not going to help anyone else.

Let's be honest, if you want to take a one-sided argument that this "null" ABX is the worst thing on the planet, be my guest. Please have the same level of passion in arguing about every "positive" listening test out there on the forum as well, and the misinformation and disingenuousness involved, because they're probably even less rigorous. Why are you so passionate about (potentially) false negatives when there's a plethora of (potentially) false positives here as well? Surely your work in the opamp and capacitor rolling crowd would be of much service.

So I advocate running tests to find out what you like and not even pretending they are remotely scientific (i.e., useful to anyone but yourself). Find out what makes you happy and run with it.

If someone wants to come at it with all the rigor that's needed, that's great, but wholly unexpected in a DIY audio forum. I'd really enjoy it if people actually knew when they're spouting anecdote and when they have something of substance, but this isn't the venue for that. (Leave that for work-work.)
 
"If you can't dazzle them with brilliance then baffle them with BS!" Can't recall who said that, but it would appear to be appropriate here.

I don't see any point in continuing to produce pages of pseudo-scientific gobbledygook to show that the OP's methodology, from a scientific perspective, was flawed, when that is already quite clear to everyone.
OK, I do see one point: it helps people to avoid dealing with the obvious issues raised by Pano in post #1722 above.

No, it's not just that it was flawed from a scientific perspective - it is flawed from the perspective of trying to find out what real audible differences exist - that's the point - perception is not a black & white, binary thing.

People hear differences & on audio forums are often challenged to prove this with this parlor trick test posing as scientific. It's BS & should be pointed out whenever this trick is foisted under the banner of scientific investigation.
 
@DPH - again I agree with you - this hobby is for enjoyment - listen & decide what moves you, connects you to the music playback better - do whatever tests you like if you want.

Yes, you may find out in time that you were wrong & made the wrong choice, but you may also do a blind test & find out in time that there actually was a difference - neither is foolproof, but I guess it all boils down to which mistakes are more crucial to you, false positives or false negatives?

If somebody reports an improvement/device that made a sonic difference & there is enough validation & I'm interested I will try to evaluate this myself - end of. If I'm unsure of a sonic difference that's it - I may park it & return to it later - improvements in other areas can often reveal differences that were once masked.
 
Well, for me, I'd like to thank the OP because this thread has saved me a whole bunch of time - instead of trying out all sorts of DACs (since one can't rely on most reviews) I will simply buy a decently engineered DAC and feel happy knowing that it's not worth the money and time to chase down the ghost of a better sounding one.
 
No, it's not just that it was flawed from a scientific perspective - it is flawed from the perspective of trying to find out what real audible differences exist - that's the point - perception is not a black & white, binary thing.

People hear differences & on audio forums are often challenged to prove this with this parlor trick test posing as scientific. It's BS & should be pointed out whenever this trick is foisted under the banner of scientific investigation.

Real audible differences are differences that can be heard. It's really that simple. And all that needs to be done to show they exist is for someone to demonstrate they can hear them in a controlled situation.
Why doesn't that happen? The test doesn't need to be perfect, just good enough to reveal the difference. How good would a test need to be to hear any difference between the two DACs in the OP? For goodness' sake, it's like comparing a Ferrari with a Fiat 500.
 
I'm pretty sure we're saying the same thing: in essence, more "options" (e.g. paired/tetrad vs 3-AFC) reduce the probability of a test subject guessing the right answer (in this case, matching correctly), which means fewer tests are required to rule out chance.

I was simply wondering about
"It's always a push-pull between statistical power (to eliminate chance, where ternary/tetrad tests win)...."

because, due to the part in brackets, I understand it as if eliminating chance were coupled to the statistical power.

In the tests we are discussing, the null hypothesis is usually "no actual difference, i.e. random guessing", and the guard against false positives is the chosen significance level criterion.
So, as you've said, using a test protocol with a lower guessing chance in each trial leads to higher efficiency (a smaller number of trials is needed to test at the same level of significance), but it does not automatically raise the statistical power.

As a corollary, the higher-order tests are far more demanding of the test subject, and therefore our "sensor" (the test subject) is less reliable and more likely to give responses that look like noise/chance.

Which means that the statistical power is lower when using these kinds of tests.
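
A toy example of that effect (my own numbers, purely illustrative): with a fixed number of trials, the protocol with the lower chance rate looks better on paper, but if the harder task drags the listener's actual hit rate down, the resulting operational power can end up lower.

```python
# Toy illustration of calculated vs. "operational" power: same trial count, but the harder
# protocol is assumed to degrade the listener's actual hit rate (numbers are invented).
from scipy.stats import binom

def power(n_trials, p_guess, p_listener, alpha=0.05):
    k_crit = int(binom.ppf(1 - alpha, n_trials, p_guess)) + 1  # significance criterion under guessing
    return binom.sf(k_crit - 1, n_trials, p_listener)

n = 24
print(power(n, 1/2, 0.75))  # easier 2-AFC-style task, assumed hit rate 0.75
print(power(n, 1/3, 0.55))  # harder triangle-style task, assumed hit rate dropping to 0.55
```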

One has to balance the false positive rate against the false negative rate, and sensitivity vs selectivity. That's all.

"That´s all" it should be that simple but as history and todays discussions imo show, it is not, as the guard against false negatives is routinely neglected.
Even if not, the normally accepted statistical power threshold of 0.8 still is a four times larger risk for beta errors as for alpha errors. In the balanced case it should be for alpha = 0.05 -> beta = 0.05 -> power = (1- beta) = 0.95 .

(Maybe it's a language thing? Dunno, I don't really get a sense that we're disagreeing, other than I'm probably more sensitive to false positives, and you're a bit more sympathetic. )

It could of course be a language thing; basically I'm more of an agnostic, so I'm interested in balanced error risks... :)

Btw, the second quote in your post was not written by me, but I'd like to address one point of your response:


<snip> Why are you so passionate about (potentially) false negatives when there's a plethora of (potentially) false positives here as well?<snip>

In my case it is triggered by the usage of these controlled blind tests in the great debate about possibly audible differences.
My interest in these tests was triggered when reading articles by Dan Shanefield about "blind" listening tests, because his proposal sounded reasonable. As described earlier, we tried it and I got a positive result while my colleague did not, although in a qualitative way he was able to describe the sonic difference between the two DUTs exactly the way I perceived it.
So in our first attempt we already faced some of the difficulties with this kind of test.

I became a member of the AES and therefore read, some time later, the first article by Les Leventhal about the potentially high probability of false negatives when using the then commonly used ABX tests. This was new stuff for me (nothing new in other fields, but in this one often neglected even up to today), and it became clear that some modifications of the test routine were needed.

The authors/proponents of the ABX method responded to Leventhal's critique and, although polite (that's the way it is in a professional journal), the authors used some ad hoc arguments (for example, that if it's not detected in the ABX then it will not be of practical relevance), and overall you could notice that the criticism wasn't appreciated.
That surprised me, as the scientific reasoning for doing such tests (the hunt for the truth) wasn't compatible with the response by the authors.
Ad hoc arguments are fine, but in science you know that you have the obligation to test whether these new arguments really rescue your model, and that never happened.
Btw, the much less polite version of the response to Leventhal by two of the authors could be read in the Stereophile letters section in 1989, and that made me even more suspicious that an agenda often played a role rather than the search for science or truth.

Instead the ABX test became a sort of knock-out argument and, additionally and unfortunately, a sort of synonym for "blind test".
While I understand the frustration about the lack of effort from "audiophiles" to deliver some hard(er) evidence for all these (or at least for some) allegedly audible differences, I don't have much sympathy for all the misguided conclusions drawn from negative test results.

Furthermore, every listener has to decide whether something is useful for his listening experience or not, regardless of whether somebody else heard something somehow and somewhere. Presumably not everything that was (probably) audible in listening tests elsewhere is of importance for other listeners.

So I advocate running tests to find out what you like and not even pretending they are remotely scientific (i.e., useful to anyone but yourself). Find out what makes you happy and run with it.

Good proposal, but it will not help in forum discussions. :)
 
Real audible differences are differences that can be heard. It's really that simple.

Absolutely.

And all that needs to be done to show they exist is for someone to demonstrate they can hear them in a controlled situation.

That's where the difficulties start... :cool:

Why doesn't that happen? The test doesn't need to be perfect, just good enough to reveal the difference. How good would a test need to be to hear any difference between the two DACs in the OP? For goodness' sake, it's like comparing a Ferrari with a Fiat 500.

Although we know that most likely no test will ever be perfect (it's human work), in forum discussions (and whenever religion-like beliefs dominate) anything less than perfect will not help.

The same people who do not believe in audible differences between, let's say, DACs, and who talk about audiophiles only searching for excuses, become very creative in the case of positive test results.

If somebody gets a positive test result he faces:

Was it published? (If not, the opponent doesn't care about the test.)
If it was published, was it peer reviewed? (If not, the opponent doesn't care...)
If it was published and peer reviewed, then it was the wrong journal and/or the editors should have asked other reviewers who were more knowledgeable in this particular field.
Was it replicated? (If not, your opponent doesn't care, but he will never disclose what number of replications would be sufficient.)
Was it sufficiently significant? (If "only" at the 0.05 level, your opponent will not be satisfied, but will never disclose which level of significance would be sufficient.)
And if nothing else helps, you'll face the accusation that the test was never done or the result was manipulated.

This list is not imagined but simply reflects what has happened over the years in this and other forums. ;)
 
Real audible differences are differences that can be heard. It's really that simple. And all that needs to be done to show they exist is for someone to demonstrate they can hear them in a controlled situation.
Why doesn't that happen? The test doesn't need to be perfect, just good enough to reveal the difference. How good would a test need to be to hear any difference between the two DACs in the OP? For goodness' sake, it's like comparing a Ferrari with a Fiat 500.

This is the issue - perception isn't understood by many here & hence how to test it is also misunderstood. Some people know this - listen to Jakob, for instance - but most don't. Even things that are obviously audibly different can escape detection in such tests - perception is just not a binary, black & white thing, as I said before. And those who have designed rigorous blind tests know this & hence properly run tests are specifically designed to deal with the nature of perception & its natural variability.

So, when you have a test that ignores these design principles, the overwhelming likelihood is that there will be 50% right answers and 50% wrong answers, i.e. no better than chance according to statistical analysis.

DPH asked why we don't rail against sighted listening in the same way, i.e. why we don't highlight the flawed nature of it. Simple - everyone with any sense treats posted anecdotal descriptions as possibly (&, depending on the forum, probably) incorrect, & as such they accumulate evidence (possibly other anecdotal reports) to judge whether it's worth personally investigating.

On the other hand, these blind tests are claimed to have a higher level of veracity - usually claimed to be backed by science when in fact they are a shoddy pretense at what correctly designed blind tests attempt to do.

This is the issue & it is adequately evident that many on this thread & forum are fooled into believing that such shoddy tests are better than anecdotal listening impressions - nothing could be further from the truth.
 
@jakob - again I agree - the goal of science (the search for truth) is the first victim of tests such as ABX, often used as a denial of someone's listening impressions. Sure, anecdotal reports of listening impressions can be wrong - so what - evaluate these for yourself, just as you try to do with everything else in life.

I rail against these ABX tests in forum discussions because they mask the possible truths rather than attempting to get to the heart of the matter & do so in the name of science, which is doubly galling.

As I keep saying, auditory perception is a moving target which is not easy to evaluate - lots of people have a simplistic view of what is being evaluated.

If people are going to use such tests as "proofs" then the onus is on such people to educate themselves in what is being evaluated & how to evaluate it - not just use some simplistic thinking & testing.
 