DAC blind test: NO audible difference whatsoever

planet10 · 2017-11-13 4:01 am

JonBocani said:
You need to avoid the ''placebo effect'' in order to prove the efficacy of a drug.

The placebo effect is proven to be effective (same as the hifi boutique salesperson's speech), so the way to deal with that is to split the test into two groups: one that takes the real drug and another that takes a sugar pill with no effect (besides placebo).

At the end, most of the time the two groups will show positive effects, and the real drug must prove itself more effective than the sugar pill (placebo) by a certain margin.

That is not an ABX test.

I don’t have issues with blind tests, just when the results are not interpreted correctly. An ABX test cannot be used to determine that 2 DUTs show no difference.

dave

abraxalito · 2017-11-13 4:02 am

Colonel Jessup

JonBocani said:
Even though the truth hurts,

JonBocani · 2017-11-13 4:15 am

planet10 said:
That is not an ABX test.

I don’t have issues with blind tests, just when the results are not interpreted correctly. An ABX test cannot be used to determine that 2 DUTs show no difference.

dave

At the essence, the whole idea is to be able to differentiate something from something else (A from B).

Couldn't be simpler than that: Two things.

''Are you able to identify A from B ?''

There is no interpretation possible, it's a clear cut case of YES or NO. That's the beauty of it.

Then, the logic that comes along with that is: If you CAN'T differentiate A from B, how can you possibly prefer one over another?

That's the spirit.

planet10 · 2017-11-13 4:16 am

The test is not sufficiently strong to show that 2 DUTs are the same. This is because of the forced choice. The test does not enable you to draw the conclusion you are drawing.

dave

jameshillj · 2017-11-13 4:22 am

I wish I could enjoy my music on cheap gear - I have never understood the concept of comparing one product to another to see which one was 'best' - Whose 'best'? It's a bit of nonsense really, as the gear I prefer to listen to, another person may dislike intensely

Plus - My 'best' dac (highest specced, that is) is the Ayre but I spend most of my time listening to the modded Line Magnetic and occasionally, a NOS 1541A - it's all about the (listening to) music for me

I've taken part in numerous ABX double-blind tests over the years and find them mostly inconclusive as nearly all are short-term tests, there's little time to concentrate or focus your listening and the written evaluations are completely open to interpretation - you end up listening to differences, not audio reproduction.

JonBocani · 2017-11-13 4:26 am

And, yes Dave, you are right. Tests in pharmaceutical contexts are not ABX. They're not because it's impossible to work that way.

The participants (testees) have no way to test A, then B, then X, numerous times... for obvious reasons. Drugs cannot be tested the same as audio components. It's not a matter of A/B within seconds or minutes, but over days/weeks with a lot of biases.

So the way they do it is similar to the ABX, in essence, but with an A group and a B group. The placebo group becomes the base reference. Then, an equivalent positive differentiation would be something like +5%, or whatever percentage they consider valid, for the non-placebo group.

ABX valid differentiation is usually considered 17/20 or better.
Pharma tests valid differentiation is probably north of 5% over the placebo group. I don't know... Interesting, though.
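
For a sense of how strict the 17/20 figure is, here is a minimal sketch that computes the chance of reaching a given score by pure guessing. It assumes each trial is an independent coin flip (probability 0.5 of a correct answer); the function name `abx_p_value` is made up for this illustration.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of getting at least `correct` answers right
    out of `trials` when every answer is a pure guess (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Chance of scoring 17 or more out of 20 by guessing alone.
    print(f"P(>=17/20 by guessing) = {abx_p_value(17, 20):.4f}")
```

Under that assumption, a guesser reaches 17/20 or better only about 0.13% of the time, so a pass is strong evidence that the listener hears something.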

here is a start:

Placebo-controlled study - Wikipedia

JonBocani · 2017-11-13 4:29 am

planet10 said:
The test is not sufficiently strong to show that 2 DUTs are the same.

dave

Please tell me more about that.

planet10 · 2017-11-13 4:30 am

None of that matters. ABX is statistically unable to determine whether 2 devices are the same, only whether they are different.

dave

planet10 · 2017-11-13 4:37 am

JonBocani said:
Please tell me more about that.

Read the entire Wikipedia article you cited.

But here is the 1st paragraph (italics mine):

An ABX test is a method of comparing two choices of sensory stimuli to identify detectable differences between them. A subject is presented with two known samples (sample A, the first reference, and sample B, the second reference) followed by one unknown sample X that is randomly selected from either A or B. The subject is then required to identify X as either A or B. If X cannot be identified reliably with a low p-value in a predetermined number of trials, then the null hypothesis cannot be rejected and it cannot be proven that there is a perceptible difference between A and B.
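
One way to see why failing to reject the null hypothesis is weak evidence of "no difference" is to ask how often a listener who really does hear something would still fail. The sketch below is only an illustration: it assumes independent trials, reuses the 17-of-20 pass mark mentioned earlier in the thread, and the listener hit rates are hypothetical values chosen for the example, not measurements.

```python
from math import comb


def pass_probability(p_correct: float, trials: int = 20, needed: int = 17) -> float:
    """Probability that a listener who answers each trial correctly with
    probability `p_correct` reaches at least `needed` correct answers."""
    return sum(
        comb(trials, k) * p_correct ** k * (1 - p_correct) ** (trials - k)
        for k in range(needed, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical true hit rates; 0.5 is pure guessing.
    for p_correct in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"true hit rate {p_correct:.1f}: passes 17/20 with probability "
              f"{pass_probability(p_correct):.3f}")
```

A listener with a real but modest ability (say a 60% hit rate) would pass such a test only a small fraction of the time, so a failed run does not by itself show that A and B sound the same.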

JonBocani · 2017-11-13 4:44 am

jameshillj said:
find them mostly inconclusive as nearly all are short-term tests, there's little time to concentrate or focus your listening and the written evaluations are completely open to interpretation - you end up listening to differences, not audio reproduction.

I heard that a lot, but i found that -on the contrary- the longer it is, the less your brain can potentially spot a difference.

Best music excerpt time would probably be somewhere between 5sec and 25sec.

Audio memory is very very short.

The thing to remember is that NOT all blind tests fall into the ''everybody fails'' pit. There is always a threshold. And these thresholds prove that the ABX test is a valid method. At the very least, it proves that some things show bigger differences than others, which fall into the more... subtle category. If any.

I remember the first serious blind test i organized back in 2010... MP3 v.s. AAC v.s. CD v.s. HD 24/96.

I had to lower the quality down to 64kbps (!) MP3 files to find the threshold where MOST people (not all!) could spot it. That was a shock. I was able to do it, and so were my audiophile buddies... But a few participants were not, to my big surprise.

At as ''low'' as 192kbps.... no one could spot any of the files. So the threshold was somewhere between 96 and 128kbps. That's MP3 only; AAC was impossible to spot as well.

And i'm not even talking about the 24/96 v.s. HD or the AAC 256kbps... No one was even close. MP3's were challenging enough.

So, YES, thresholds are the key here. ABX shouldn't be discarded because a threshold is not yet found.

I'm pretty sure if you ABX a Pepsi and a glass of Vodka, you'll find it. 😀

planet10 · 2017-11-13 4:47 am

I'm pretty sure if you ABX a Pepsi and a glass of Vodka, you'll find it.

ABX is a strong test as far as 2 DUTs being found different.

dave

JonBocani · 2017-11-13 4:54 am

If X cannot be identified reliably with a low p-value in a predetermined number of trials, then the null hypothesis cannot be rejected and it cannot be proven that there is a perceptible difference between A and B.

Ok ?

Today we had, numerous times, participants that WERE NOT sure of the answer, once ''forced'' to give it.

On a scientific point of view, is that a problem ? No.

ABX test is meant to demonstrate that you can identify A from B. Therefore, if you're NOT able to provide the answer for a round, then that round says ''NO, you cannot identify A from B''. It's the equivalent of a negative (wrong) answer.

If X is A
...and you say B. It's a negative.
if you say ''i will not be forced to answer/ i don't know'', it's also a negative.
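
The scoring rule just described can be sketched as follows. This only encodes the rule as stated in this post (an abstention is scored like a wrong answer); whether that rule is statistically appropriate is exactly what is disputed in this thread. The function name `score_abx` and the sample answers are hypothetical.

```python
from typing import Optional, Sequence


def score_abx(truth: Sequence[str], answers: Sequence[Optional[str]]) -> int:
    """Count correct identifications of X.

    `truth` holds the real identity of X per round ("A" or "B").
    `answers` holds the listener's reply, or None for "I don't know".
    Under the rule sketched here, None never matches, so it counts as wrong.
    """
    if len(truth) != len(answers):
        raise ValueError("truth and answers must have the same length")
    return sum(1 for actual, reply in zip(truth, answers) if reply == actual)


if __name__ == "__main__":
    truth = ["A", "B", "A", "A"]      # made-up round data
    answers = ["A", None, "B", "A"]   # one abstention, one wrong answer
    print(score_abx(truth, answers), "correct out of", len(truth))
```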

You cannot cheat an ABX test, unless you are able to differentiate but say the opposite.. Why would we do that?

You have to prove your capacity to identify. Simple as that.

JonBocani · 2017-11-13 4:56 am

ABX is a strong test as far as 2 DUTs being found different.

still don't know what you mean, Dave.

planet10 · 2017-11-13 4:57 am

JonBocani said:
ABX test is meant to demonstrate that you can identify A from B. Therefore, if you're NOT able to provide the answer for a round, then that round says ''NO, you cannot identify A from B''. It's the equivalent of a negative (wrong) answer.

That is not so. The conclusion may seem valid, but it is statistically invalid.

dave

planet10 · 2017-11-13 5:02 am

JonBocani said:
still don't know what you mean, Dave.

An ABX test is used to prove 2 DUT are different, it cannot be used to prove 2 DUT are the same.

dave

JonBocani · 2017-11-13 5:05 am

planet10 said:
That is not so. The conclusion may seem valid, but it is statistically invalid.

dave

17/20 is what's considered valid.

We sure as hell can consider a 501/499 result as ''no one can tell the difference'' 😀

As many participants as possible, for as many trials as possible, in the most stable test environment possible. That's the key.

We were only 4 today. Granted: that's not much. On a ''scientific/statistical'' basis, probably not valid.
That being said, i see no hint whatsoever that could change the outcome if the test was made with 20 or 1000 participants.

And, frankly, i was expecting some day & night differences from a 30$ DAC v.s. one that is 100x the price...

EVEN IF 1 participant out of 10 could spot it... that would be a problem, IMO. But that was not even the case.

planet10 · 2017-11-13 5:07 am

JonBocani said:
We sure as hell can consider a 501/499 result as ''no one can tell the difference''

No you can’t. The test is statistically incapable.

dave

JonBocani · 2017-11-13 5:17 am

planet10 said:
An ABX test is used to prove 2 DUT are different, it cannot be used to prove 2 DUT are the same.

dave

That's where i don't agree with that ''logic''. It doesn't make any sense.

As i mentioned earlier, it's about thresholds. The line where things become possible to identify (for some).

Music files test, it was low bitrate MP3. Not HD24/96, not even lossy AAC, not CD, but MP3 below 192kbps...

Now, midrange drivers... You have to remove 1/2 octave to spot it.

Same with SPL, an educated guess here: some will hear a 0.5 dB difference, but to get 99%+ correct answers you'll probably need a 1.5 dB difference. While making it anything less than 0.2 dB would prove that no one is able to spot it.

Thresholds.

Pepsi/Vodka
Pepsi served @ 4.12 deg C v.s. Pepsi served @ 4.13 deg C.

Maybe 1 human out of 1,000,000,000 won't be able to spot the Pepsi/Vodka (probably drunk) and maybe 1 human out of 1,000,000,000 WILL be able to spot the 0.01 deg temperature difference.

But if the ABX test were run on all humans on the planet, we might find that the thresholds are: 1.5 ml of Vodka in 250 ml of Pepsi for 50.1% of the population, and a 1.4 deg C temperature difference for 50.1% of the population.

JonBocani · 2017-11-13 5:19 am

planet10 said:
No you can’t. The test is statistically incapable.

dave

501/499 says ''random'' all over the wall.

Or 5,000,000,000,000 V.S. 5,000,000,000,000 if you prefer... 😎
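
To put a number on what a 501/499 split can and cannot say, here is a rough sketch that computes an approximate 95% confidence interval for the true hit rate. It is an illustration under stated assumptions: the trials are treated as independent (pooling several listeners may violate that), the interval uses the simple normal (Wald) approximation, and the 95% level (z = 1.96) is a conventional choice, not something taken from the test described here.

```python
from math import sqrt


def wald_interval(correct: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for the true proportion of correct
    answers, using the normal (Wald) approximation to the binomial."""
    p_hat = correct / trials
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    low, high = wald_interval(501, 1000)
    print(f"observed 501/1000, approximate 95% interval: {low:.3f} to {high:.3f}")
```

The interval comfortably includes 0.5, so the result is consistent with guessing, but it also includes hit rates a few points above 0.5, so a small real effect is not ruled out by this count alone.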

JonBocani · 2017-11-13 5:26 am

If 1 audiophile out of 2 were able to spot a difference between component A and component B... Would you be more interested in it than in another set of components that gave 0 out of 100 results ?

I would.

That's thresholds. That's having perspective on what's doing what. That's knowing the true audible impact of each component.
