DAC blind test: NO audible difference whatsoever

Nice initiative, JonBocani.

A difficulty with blind tests is setting them up so that it's possible to hear smaller details. This often implies that the listeners have done some training before the final test, and that the music material reveals the differences. The acoustics of the room also play a major role. Or use headphones, of course.

If these and other criteria aren't met, the test often ends in a null result. An example is a blind test some radio stations ran between MP3 at 192 kbps and lossless. They found no audible difference in their test and concluded that MP3 at 192 kbps was as good as lossless. The problem is that many other similar blind tests have shown an audible difference, and there are also listeners who have been able to distinguish 256 kbps and even 320 kbps from lossless. It all depends on how well the test is performed.

A trained listener under the best conditions can pick out differences which aren't audible under a typical blind test with random non-trained listeners.

Bottom line is: A blind test doesn't necessarily give us the answers.
 
I am highly experienced and have tested many DACs, from the very cheap to the $2000 class.
I can easily tell them apart with my setup, ABX blinded or not, whatever you like.
So far no cheap DAC (under $500) can reproduce realistic or acceptable drums to me, and it can be heard in seconds.
All DACs sound the same? No more jokes, please.
(All DIYers know that: you need a very revealing system and well-trained ears to differentiate high-end DACs, and they know the reason well.)
 
mmerrill99 said:
It's in your use of the words "likely" & "roughly" where you need to seek the answer
I knew you would say something like that.

Funnily, when people use the type of ABX test that started this thread, they almost always get a "no difference found" null result.
They get a null result when the difference is small. They don't when the difference is large. What is surprising about this? Of course, if someone's income depends on others continuing to believe that a small difference is actually a large difference then it is not surprising that FUD flies around.

You still refuse to look into hidden anchors & controls - oh well.
I am not designing tests. I am not criticising tests. You are criticising tests yet continually refuse to actually say anything useful. When pressed you shift the goalposts.

Yes, in this test of unknown quality there is no difference found - so what does that tell you?
It tells me that the difference is small. This is exactly what electrical theory tells me about any competently made DAC. Hence I have nothing to explain. I don't know how small the difference is. I am not sure that you know how small either (or how big, as seen from your point of view). You still haven't told us how you can be so certain that the test was flawed and insensitive, when you (like us) have no idea how sensitive the test was, and you have not told us how different these DACs should be (information which we don't have, but which you appear to have, as possession of it is a necessary prerequisite for criticising the test).

Ones that can correlate with auditory perception
And they are? If you don't know then say so. If you do know then tell us, or at least try to hide behind commercial confidentiality.

You really believe this??
Yes. DACs are low power devices, so thermal issues are unlikely to matter. Word-length combined with modern accuracy and dither means that resolution is well below any likely psychoacoustic threshold. The difficult part of audio electronics, where test signals matter, is power output stages. And, of course, transducers.
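To put a rough number on the word-length point: a back-of-envelope sketch (my own illustration, not from the thread), using the standard idealized quantization-SNR formula for an N-bit converter, SNR ≈ 6.02·N + 1.76 dB:

```python
# My own illustration (assumptions: ideal converter, full-scale sine, proper dither):
# idealized quantization SNR per word length, SNR = 6.02*N + 1.76 dB.
for bits in (16, 20, 24):
    snr_db = 6.02 * bits + 1.76
    print(f"{bits}-bit: ~{snr_db:.0f} dB")
# 16-bit: ~98 dB, 20-bit: ~122 dB, 24-bit: ~146 dB
```

Even the 16-bit figure is typically larger than the gap between loud playback peaks and the noise floor of an ordinary listening room, which is the sense in which word-length resolution sits below any likely psychoacoustic threshold.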
 
Omholt said:
A trained listener under the best conditions can pick out differences which aren't audible under a typical blind test with random non-trained listeners.
That may be so, but what does it tell us about untrained listeners listening to music under non-test conditions? This is what DACs are sold for. It is often asserted (implicitly or explicitly) that untrained listeners can detect differences under non-test conditions which cannot be detected under test conditions. How can we test this assertion? If we test it then the fact that we are doing a test seems to erase it - according to some.

Bottom line is: A blind test doesn't necessarily give us the answers.
That depends on what the question is. Of one thing we can be certain: a sighted test necessarily cannot give us the answers.

kinsei said:
I am highly experienced and have tested many DACs, from the very cheap to the $2000 class.
I can easily tell them apart with my setup, ABX blinded or not, whatever you like.
So far no cheap DAC (under $500) can reproduce realistic or acceptable drums to me, and it can be heard in seconds.
All DACs sound the same? No more jokes, please.
Your test lacks controls and calibration. Your statistical analysis is laughable. Your equipment is inadequate for the task. You need to demonstrate the contrary before we can take you seriously.

There: I just saved mmerrill99 a post. He must agree with me, for these are his standard criticisms of DAC tests. Once again I will not hold my breath.
 
@QAMatt,

You asserted that the ABX is the most sensitive test for detecting differences, and it was to that assertion that Mark4 was responding - I assume with reference to the various articles I cited a couple of posts ago. These explored the differences between ABX tests and other test protocols, and it was found that other protocols showed better results, i.e. a higher proportion of correct responses.

As noted below, they didn't use multidimensional stimuli (and weren't exploring the hearing sense), and any impact of training wasn't examined.

Harris /1/ asserted as early as 1952, in a letter to the JASA, that in his experiments an A/B test was more sensitive than ABX, with further corroboration by other people doing similar experiments.

In addition, they found that subjectively the ABX task was more difficult:
"In this laboratory we have made some comparisons among DLs for pitch as measured by the ABX technique and by a two category forced-choice judgment variation of the constants method (tones A, B, subject forced to guess B "higher" or "lower"). Judgments were subjectively somewhat easier to make with the AB than with the ABX method, but a greater difference appeared in that the DLs were uniformly smaller by AB than by ABX. On a recent visit to this laboratory, Professor W. A. Rosenblith and Dr. Stevens collected some DLs by the AB method with similar results.
The case seems to be that ABX is too complicated to yield the finest measures of sensitivity."

Huang/Lawless /2/ did experiments in 1997 comparing the ABX with other protocols (paired comparison, 3-AFC, duo-trio and triangle), and their data showed that, although all tests delivered significant results, the proportion of correct responses was higher for paired comparison and 3-AFC.

Macmillan/Creelman /3/ predicted, on the basis of their models, that 2AFC and (especially) 3AFC tests would show a greater proportion of correct responses than ABX, except when the differences are really large. (ABX would still be more sensitive than same/different tests.)
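To make that prediction concrete, here is a minimal sketch (my own illustration; the formulas are the equal-variance Gaussian models for 2AFC and for ABX under the independent-observation rule, as I read them in Macmillan/Creelman) of the proportion correct each protocol predicts at the same sensitivity d':

```python
# My own illustration of the detection-theory prediction (equal-variance
# Gaussian assumption): proportion correct for 2AFC vs. ABX at the same d'.
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pc_2afc(d):
    """2AFC: p(c) = Phi(d'/sqrt(2))."""
    return phi(d / sqrt(2.0))

def pc_abx(d):
    """ABX, independent-observation rule:
    p(c) = Phi(d'/sqrt(2))*Phi(d'/2) + Phi(-d'/sqrt(2))*Phi(-d'/2)."""
    return phi(d / sqrt(2.0)) * phi(d / 2.0) + phi(-d / sqrt(2.0)) * phi(-d / 2.0)

for d in (0.5, 1.0, 2.0, 3.0):
    print(f"d'={d:.1f}  2AFC: {pc_2afc(d):.3f}  ABX: {pc_abx(d):.3f}")
```

At d' = 1 this gives roughly 76% correct for 2AFC but only about 60% for ABX; the gap closes as d' grows, which matches the "really large differences" caveat.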

But all of these were done with tests where the DUTs differed in only one dimension, so it might be different with multidimensional stimuli. Still, I experienced the same feeling of difficulty when trying ABX myself, and observed it too in two other people I asked to try it, and therefore dropped it (decided not to use it).

/1/ J. Donald Harris, "Remarks on the Determination of a Differential Threshold by the So-Called ABX Technique", J. Acoust. Soc. Am. 24, 417 (1952).
/2/ Yu-Ting Huang, Harry Lawless, "Sensitivity of the ABX Discrimination Test", Journal of Sensory Studies 13 (1998), 229-239.
/3/ Neil A. Macmillan, C. Douglas Creelman, Detection Theory: A User's Guide, 2nd edition, Lawrence Erlbaum Associates, Inc., 2005, p. 253.
 
Wowowow - I've been using this forum but hadn't checked this topic yet. Interesting.

In the same vein, there is this article on why we don't need those large 32-bit files to feed our high-end DACs. I cannot make any judgements myself, but those who are interested, read on.

Maybe it clarifies things a little bit, maybe not. I'm still thinking (being in the middle of a complete new DIY chain), so decide for yourself. Cheers.
 
Omholt said:
Bottom line is: A blind test doesn't necessarily give us the answers.

This painful truth will not gain much traction here, sadly.
 
That may be so, but what does it tell us about untrained listeners listening to music under non-test conditions? This is what DACs are sold for. It is often asserted (implicitly or explicitly) that untrained listeners can detect differences under non-test conditions which cannot be detected under test conditions. How can we test this assertion? If we test it then the fact that we are doing a test seems to erase it - according to some.
Answer to the bolded:
It doesn't necessarily give us any clear answers. One reason is that the music material used may not reveal the differences. As an example: if you're going to hear differences between MP3 at 256 kbps and lossless, you need to pick out a part of a song that actually reveals the subtle differences - which are mostly in the very high frequencies, by the way. How many blind tests do you think actually do that? Just picking out something random isn't good enough here.

Another reason is that people may very well hear the differences but don't know what to listen for the first time, and need some training.

It isn't about having golden ears, but more about conducting the test so that it's possible to distinguish what everyone with normal hearing is capable of hearing in the long run. How important the differences are is another debate.

Even if there's only 5% of a song that reveals audible differences, it's still something that may contribute to musical pleasure over time. Obviously not much, though.



That depends on what the question is. Of one thing we can be certain: a sighted test necessarily cannot give us the answers.
Absolutely. Sighted tests and a lack of volume levelling prove nothing whatsoever. There is a lot of subjectivism out there claiming to have the right answers, and that's folly. Blind tests are the best we have. But let's make sure we set them up so that small differences are actually detectable, and not end up calling something science and proof when it really isn't.
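On the volume-levelling point, here is a minimal sketch of what levelling means in practice (my own illustration, assuming two decoded mono clips as float arrays in [-1, 1]; the function names are hypothetical):

```python
# My own illustration: RMS level-matching two clips before a blind comparison,
# since even a small level offset can masquerade as a "quality" difference.
import numpy as np

def rms(x: np.ndarray) -> float:
    """Root-mean-square level of a clip."""
    return float(np.sqrt(np.mean(np.square(x))))

def match_level(reference: np.ndarray, candidate: np.ndarray) -> np.ndarray:
    """Scale `candidate` so its RMS level equals that of `reference`."""
    return candidate * (rms(reference) / rms(candidate))

# Usage with two decoded clips a and b:
# offset_db = 20 * np.log10(rms(a) / rms(b))  # level offset in dB before matching
# b_matched = match_level(a, b)
```

Matching to well within a fraction of a dB is commonly recommended, because a slightly louder presentation tends to be heard as "better" rather than as "louder".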
 
Omholt said:
It doesn't necessarily give us any clear answers. One reason is that the music material used may not reveal the differences. As an example: if you're going to hear differences between MP3 at 256 kbps and lossless, you need to pick out a part of a song that actually reveals the subtle differences - which are mostly in the very high frequencies, by the way. How many blind tests do you think actually do that? Just picking out something random isn't good enough here.
That is why we need lots of tests, so each test result becomes just one piece of data which when accumulated with other data gradually produces a picture. We need the test protocols not too rigidly specified, because if everyone merely repeats exactly the same test then we can expect to get the same outcome. I see no clash between one test which finds 192 kbps is indistinguishable from lossless, and another test which finds a difference; they are merely two data points in the whole picture. I do not draw the conclusion that the first test was flawed and the second test was correct, as some might.

Even if there's only 5% of a song that reveals audible differences, it's still something that may contribute to musical pleasure over time. Obviously not much, though.
Yes. Doing lots of tests with lots of different music will gradually tell us what is audible and what is not, and how likely someone is to notice it.
 
I knew you would say something like that.


They get a null result when the difference is small. They don't when the difference is large. What is surprising about this?

Which is quite unspecific, as "small" and "large" seem to have quite different meanings to listeners.
Is a difference that a listener did not detect without being directed to it, but afterwards never fails to detect, a small or a large difference?
Furthermore, is "small" the same as "unimportant", and who decides what it is?

We should remember that this kind of reproduction is exclusively meant to be consumed by human listeners, and that the perceptions of these listeners are the final arbiter of the reproduction's quality, whether the differences are small or large (i.e. the quality of the perceived illusion).
A difference measured in technical terms that omits the human/individual qualification misses the whole point.

Imo an answer is still missing: does the "gorilla" that a large proportion of viewers failed to detect qualify as a "large" or a "small" difference? What about the electric guitar in the inattentional deafness experiments?

Of course, if someone's income depends on others continuing to believe that a small difference is actually a large difference then it is not surprising that FUD flies around.

Wouldn't it be better to refrain from this sort of "ad hominem" approach?
A short time ago you deliberately used the term "ears only" as a synonym for "blind test", which is not only misleading but totally incorrect. You even conceded, after being questioned about it, that you (of course) knew that no "ears only" test exists.
Do I have to speculate about your (hidden?) motivations?
Wouldn't it be better to stick to facts and arguments, so as to have a fruitful discussion?

I am not designing tests. I am not criticising tests.

Imo you are criticising tests, which btw is the only way to get good/better tests.

It tells me that the difference is small. This is exactly what electrical theory tells me about any competently made DAC. Hence I have nothing to explain.

If you are willing to accept the results of sloppy tests because you are biased by your knowledge of measured differences, then the risk of wrong conclusions is quite high.
You are only able to qualify the degree of a difference by comparing measured differences against a model of the hearing sense. To examine whether your sorting into "small" and "large" reflects reality, you need to do perceptual evaluations. The results of those evaluations will only help you if they were sound tests.

As you've said, you don't know much about the tests done (and the rest of us don't either), but you seem willing to accept the results as data points just because they were done somehow "blind".

But, as stated before, any of these tests delivers as its result just "null hypothesis can be rejected" or "null hypothesis cannot be rejected".
Any further conclusion is only warranted if the usual quality criteria are fulfilled; therefore tests we don't know much about do not create data points that help us to answer our (research) question(s).
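For concreteness: the reject/fail-to-reject decision for a single ABX run reduces to an exact one-sided binomial test against chance. A minimal sketch (my own illustration, standard library only):

```python
# My own illustration: the "null hypothesis" decision for one ABX run is an
# exact one-sided binomial test against guessing (p = 0.5).
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """P(at least `correct` right out of `trials`) under pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))  # ~0.038 -> null hypothesis rejected at alpha = 0.05
print(abx_p_value(10, 16))  # ~0.227 -> null hypothesis not rejected; nothing more follows
```

A "not rejected" outcome at these sample sizes is compatible both with no audible difference and with a real but small one, which is why the quality criteria above matter.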

If you want to depart from this formal logic - "we don't know much, but they are good guys, so everything will have been okay" - then you have to accept that sighted listening tests create data points too, although you never know whether the results are correct.
 
That is why we need lots of tests, so each test result becomes just one piece of data which when accumulated with other data gradually produces a picture.
I agree that multiple tests is an advantage.
We need the test protocols not too rigidly specified, because if everyone merely repeats exactly the same test then we can expect to get the same outcome.
I think we need some minimum requirements. You can get a null result from almost anything if certain aspects aren't met.

I see no clash between one test which finds 192 kbps is indistinguishable from lossless, and another test which finds a difference; they are merely two data points in the whole picture. I do not draw the conclusion that the first test was flawed and the second test was correct, as some might.
When 192 kbps is found indistinguishable from lossless, that says to me the test was poorly done. 192 kbps should be quite easy to distinguish from lossless when the test is properly conducted.

Yes. Doing lots of tests with lots of different music will gradually tell us what is audible and what is not, and how likely someone is to notice it.
It really depends on how it's conducted IMO.

For me, even though I'm a believer in blind tests, I have a problem when someone uses a single blind test as conclusive scientific proof of something.

When the LEDE (live end, dead end) acoustic principle was developed, it was found that phase anomalies from a speaker became very audible in a well-treated room - something they weren't in a regular room with little or no acoustic treatment. So they ended up having to custom-make their own speakers instead of buying something off the shelf. This is an example of how circumstances may affect what's audible or not.
 
.......
I think we need some minimum requirements. You can get a null result from almost anything if certain aspects aren't met.
....
Yes, & the use of controls within a test certifies certain aspects of it, but even this is rejected by some here. Calling results from unqualified tests 'data points' is a misuse of language & logic, IMO, but it serves the agenda of many: they count the number of 'null results' as confirmation that there is no difference to be heard & are uncaring about how those null results were arrived at (any random selection of A or B qualifies as a test in this logic - a nodding-dog or dipping-bird toy hitting keyboard keys during a test would qualify as a 'result'). I've seen moderators on the Hydrogen Audio forum saying that if they don't hear a difference in the first one or two trials, they just hit random keys for all the rest of the trials to get to 16 trials - they thought this was perfectly acceptable, as 'life's too short'.
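To put a number on that anecdote (my own calculation, assuming the common 12-of-16 significance criterion): a listener who genuinely hears the difference on the first two trials and then guesses the remaining fourteen will still "fail" the run about 91% of the time.

```python
# My own calculation: 2 genuine hits followed by 14 coin-flip guesses;
# the run "passes" only if at least 10 of the 14 guesses also come up right.
from math import comb

p_pass = sum(comb(14, k) for k in range(10, 15)) / 2 ** 14
print(f"{p_pass:.3f}")  # ~0.090, so ~91% of such runs end as a 'null result'
```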

Controls within tests would reveal these sorts of issues, as well as others.
 
Your test lacks controls and calibration. Your statistical analysis is laughable. Your equipment is inadequate for the task. You need to demonstrate the contrary before we can take you seriously.

A good demonstration of your lack of logical thinking and your ignorance of DAC design.
By your own argument, when you insisted that all DACs sound the same, you should have at least tested all the DACs on the market to qualify your claim - why not show us, then?
 
I am highly experienced and have tested many DACs, from the very cheap to the $2000 class.
I can easily tell them apart with my setup, ABX blinded or not, whatever you like.
So far no cheap DAC (under $500) can reproduce realistic or acceptable drums to me, and it can be heard in seconds.
All DACs sound the same? No more jokes, please.
(All DIYers know that: you need a very revealing system and well-trained ears to differentiate high-end DACs, and they know the reason well.)
I think it's only fair that you go first.
 
.....

When 192 kbps is found indistinguishable from lossless, that says to me the test was poorly done. 192 kbps should be quite easy to distinguish from lossless when the test is properly conducted.

......

Yes, this statement should be the source of a realization about auditory perception - it isn't a static, unchanging faculty; it varies with all sorts of conditions, situations, etc.

This is the fundamental problem with most people who run blind tests - they think they are testing a fixed thing called hearing, i.e. they make statements along the lines of 'if it's audible then it will be audible in all circumstances; otherwise it's negligible'.

The wanton disregard for understanding this is at the heart of the problems we see here - any old test will do as long as it's blind - disregard for what they are actually testing & how to do it properly is the antithesis of the scientific principle.

The irony is that they then cite science as being on their side.
 
Jakob2 said:
Is a difference that a listener did not detect without being directed to it, but afterwards never fails to detect, a small or a large difference?
For that particular listener it is neither small nor large; it is intermediate. If it were small he would not be consistently able to detect it; if it were large he would rarely fail to detect it even if not directed to it.

If you are willing to accept the results of sloppy tests because you are biased by your knowledge of measured differences, then the risk of wrong conclusions is quite high.
I don't know whether the tests were "sloppy". I accept the result as a data point, without attempting to place a particular value on it.
 
For that particular listener it is neither small nor large; it is intermediate. If it were small he would not be consistently able to detect it; if it were large he would rarely fail to detect it even if not directed to it.
So we can gather from this that an aspect of sound is not a small difference if, once one is directed what to listen for, one can thereafter consistently hear it in a blind test.

Does this not contradict the claims made by you & others that these are small differences because they 'need training to be able to detect them'?


I don't know whether the tests were "sloppy". I accept the result as a data point, without attempting to place a particular value on it.
Illogical & unscientific, as usual.
 
For that particular listener it is neither small nor large; it is intermediate. If it were small he would not be consistently able to detect it; if it were large he would rarely fail to detect it even if not directed to it.


I don't know whether the tests were "sloppy". I accept the result as a data point, without attempting to place a particular value on it.

Is it still a data point if the value is zero?

Remember that the test was conducted by a person with frequently elucidated views that human hearing ability is way overrated and that the entire high-end audio market is simply snake oil sold to wealthy boobs who are easily fooled by marketing claims.

Do you really trust that he has the interest and ability to conduct a test that would produce data rather than noise?

Remember, too, that the OP doesn't even have a basic understanding of what a null result means and what it doesn't mean.
 
But you are asking this of a person who has proclaimed he is not interested in how valid blind tests should be conducted - he considers all tests as data points. His replies are nothing more than an attempt at validating any & all rubbish as long as it supports his agenda - calling it a data point is a lame attempt at giving it a degree of validity & gravitas.
 