DAC blind test: NO audible difference whatsoever

I don't agree that it's about what "some very picky people hear" - I'm pretty sure most people recognize a more realistic auditory illusion when they hear it.

I'm pretty sure they don't. The reason is that I have tested people with a variety of DACs, and even with coaching most don't learn how to hear all the differences quickly or easily. It's more like they can learn, with some effort, how to listen for the most obvious differences, but beyond that not much else. Even then they don't always care that they are hearing a bit more clearly than before, especially if it's expensive to get it.

For most people, if anything it's more like they have learned to ignore differences between DACs rather than to notice them. They focus on the music only and don't listen to the fine details of reproduction. For one thing, they are rarely exposed to finely detailed, very accurate systems, and people are notoriously bad at noticing what's missing, like missing distortion. People are much better at noticing what's new, and it seems to work out that going from a very good system to a less good system may be more noticeable than the other way around, at least for differences in DACs. It's different when going from a transistor radio to a system with chest-thumping bass, as that is a very noticeable, big, new thing - not something that takes effort to learn to give attention to.


Changing the subject a bit, may I ask if you have tried a Benchmark DAC-3, with a good set of cans? They go to a number of lengths to make their DACs more accurate than the stock ESS chips manage in minimal implementations. They also measure very well.

In the high-end audiophile market they are considered mid-tier, but in the mastering and mixing market they are right up there in the top tier, I would say. It depends on what people want. Mixing and mastering engineers are hired to produce results that sound as good as possible on many different systems, and to do that they need accurate systems. Audiophiles, on the other hand, care more about their listening pleasure and want what they think sounds best to them.

I am more in the mixing and mastering camp in terms of my wants and needs. Therefore, I want measured accuracy and good sound, both of them. What you want might be different. You should not assume everybody wants what you want.
 
Ok, fair enough. I guess the point I'm trying to make is to get to the crux of where the information comes from that allows us to form a realistic image perception in our minds. Ultimately it comes from the recording, but not in any specific way that makes it possible or impossible; it's not that simple an issue. It's largely to do with how we interact with it, which must include how the speakers radiate it into the space we occupy while listening.

Sure, a DAC or amplifier is part of a system which also includes the speakers & room.

Each device in the complete system has its own role & typical flaws.

Once a certain quality of reproduction is achieved in a system (subtle details aren't being masked or exaggerated), it's possible to differentially evaluate individual devices in that system by swapping one device out for another.

The phrase "a more realistic illusion" has many layers to it - often they are intertwined. Obviously this firstly depends on the recording - if we want to hear how well the recording room ambience is reproduced, this needs to be recorded appropriately; electronica is not where it will be found. And remember there is almost always recording-engineer manipulation of what is initially captured, so we are generally listening to a mixture of the original musicians' work overlaid with these manipulations. Unless we make a recording ourselves & know what we did, we are faced with the "circle of confusion", as Toole calls it - we have no reference to judge what we are hearing.

But we have our inbuilt auditory model of how sounds behave in the real world, & this is in operation at all times. So, at one level, we can judge how close the cymbals are to what we determine as realistic & how much like noise we perceive them to be. The same applies to any individual sound object in the scene - the sound of clapping hands, violins.

At the next level, "realism" is perceived in how solid & 3D the soundstage is (obviously again dependent on the recording, although studio manipulation also plays a part in this). How precisely located & unwavering in the auditory spatial scene is each auditory object? Usually the correct portrayal of the individual sounds in the above paragraph results in soundstage realism too.
 
Maybe so, but I'm not so sure that people don't notice when they are more engaged by the music & find it more interesting on one DAC compared to another DAC which puts all the notes in the right place but lacks something. As you said before about System 1 vs System 2 cognitive processes - I would extend the System 1 concept a bit by saying that even though it is the quick, intuitive part of the cognitive process, it is also second-guessed by System 2 processes. A lot of people do not have the knack of being able to turn off their analytical listening & are not able to access the System 1 aspect (as evidenced in this thread) - being convinced that the System 1 assessment has validity often takes some time & multiple exposures to different music/environments.

I haven't heard the Benchmark, but I have heard Meitner, Lampizator, Chord, dCS & others - most of these measure excellently!!

Sure, I shouldn't assume others want the same things, but I keep going back to the preference tests by Harman on speakers - it turns out that trained/untrained, experts/non-experts, students/mixing engineers(?) generally gravitate towards the same sound preference, as far as this test evaluated, & it turns out that this corresponds to a smooth frequency response on & off axis

If you are mixing & mastering, then you would be interested in this mixing engineer's report of his ABX testing on sample-rate differentiation - it's a great insight into how difficult ABX testing is, even for an experienced individual
Foobar 2000 ABX Test - Redbook vs 192/24 - Gearslutz Pro Audio Community
 
Yes, so as Mark also implies, most people don't recognise a more realistic illusion, due in part to a lack of interest and also a lack of experience - if you've worn earbuds most of your life, what chance have you?

I know we mostly assume this, but is it because we have some superiority complex about being audiophiles? See my mention of Harman's preference testing of speakers!
 
I was thinking more of everyday exposure to the real sound world as opposed to only listening to our high-end audio equipment ;) Also hearing real live music - not sure how prevalent that is these days, though, and I don't mean in the concert hall (chance would be a fine thing); a pub singalong would do. Even just conversations with people - how we learn to hear is very important in this regard.
 
Oh, I see what you mean - yes, exposure to the real world of sound is important, but I reckon the groundwork/modelling/patterns are mostly established in early development - I think learning to speak in 'proper' sentences is related to this pattern learning, so by the time we have mastered how to speak we have established the internal auditory models. I think it's pretty much impossible to avoid exposure to real-world sound.

What you say is correct, though, as it's something I read recently - spoken Chinese uses intonation within the sentence to communicate different meanings, & foreigners thus have great problems communicating, because even though they produce the correct sounds, their intonation often suggests a different meaning.
The interesting aspect of this is that there are a lot more Chinese people with perfect pitch than are found elsewhere.

I believe this is correct but can't find my source for it - I have a feeling it may have been from Scott Wurcer, so forgive me if I got it wrong.

Edit: It was Earl Geddes, here: Who makes the lowest distortion speaker drivers
 
Yes, I'm almost certain it was Scott earlier in this thread; we nearly went off on an interesting tangent about scales and harmony, and the apparent lack thereof

I found it in the above link in another interesting thread - 123 pages, but still some interesting stuff, & I feel there's some meeting of minds or understanding happening on both sides. Maybe, perhaps, just possibly - or am I reading it all wrong??
 
Ah right, yes, that's another interesting, fun thread. You are not reading it wrong. I'm a little out of my depth here when it comes to knowledge and understanding, but I'm not one to give up, and I try to have zero ego; it does me no favours and is a hindrance to just about everything, as I see it. Like everyone, I have my views and beliefs, which are mine and which I like to keep if possible! Communication is what it's all about. It's also not easy in this medium, where there is no tone of voice to help - another reason I don't get hung up on what people say or the way they say it.
 
I don't want to pick through a whole thread to look for particular salient-to-me factors. ABX brings its own set of problems for detecting small differences, but aside from that, one has to be careful with sample rates on most operating systems, as the OS will usually perform automatic low-quality SRC on you without any warning. That can destroy most of whatever difference there may be between hi-res and CD-quality recordings. Using ASIO drivers whenever possible is probably the best defense.
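As a concrete illustration of that defense, here is a minimal playback sketch, assuming Windows plus the sounddevice and soundfile Python packages (the file name is a hypothetical placeholder; on other platforms an ASIO or exclusive-mode output serves the same purpose):

```python
# Minimal sketch: play a file at its native sample rate in WASAPI exclusive
# mode, bypassing the OS mixer's automatic sample-rate conversion.
# Assumes Windows + the sounddevice/soundfile packages; the file name below
# is a hypothetical placeholder.
import sounddevice as sd
import soundfile as sf

data, rate = sf.read("test_192k.wav")  # hypothetical hi-res test file

# Exclusive mode takes the device away from the shared Windows mixer,
# so the stream really runs at `rate` instead of being resampled.
sd.play(data, samplerate=rate, extra_settings=sd.WasapiSettings(exclusive=True))
sd.wait()  # block until playback finishes
```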
 
Ok, it's only 2 pages - others might like to read through it to learn about real ABX testing & what it involves

First thing to remember is that he has a preference for high-sample-rate sound from his long-term exposure to & work with different sample rates, so he set out to 'prove' his preference was based on something 'real' by doing ABX tests

Here are some salient excerpts:
Caveats--Program material is crucial. Anything that did not pass through the air on the way to the recording material, like ITB synth tracks, I'm completely unable to detect; only live acoustic sources give me anything to work with. So for lots of published material, sample rates really don't matter--and they surely don't matter to me for that material. However, this result is also strong support for a claim that I'm detecting a phenomenon of pure sample rate/word length difference, and not just incidental coloration induced by processing. The latter should be detectable on all program material with sufficient freq content.
Also, these differences ARE small, and hard to detect. I did note that I was able to speed up my decision process as time went on, but only gradually. It's a difference that's analogous to the difference between a picture just barely out of focus, and one that's sharply focused throughout--a holistic impression. For casual purposes, a picture that's focused "enough" will do--in Marketing, that's 'satisficing'. But of course I always want more.

In re "kind of artefact", I tried to listen for soundstage depth and accurate detail. It took a lot of training repetitions, and remains a holistic impression, not any single feature I can easily point to. It seems to me that the 192 files have the aural analogue of better focus. To train, I would try to hear *precisely* where in front of me particular sound features were located, in two dimensions: left-to-right, and closer-to-further away--the foobar tool would then allow me to match up which two were easier to precisely locate. I know it muddies the waters, but I also had a very holistic impression of sound (uhhhhhh) 'texture'??--in which the 192 file was smoother/silkier/richer. The 192 is easier on the ears (just slightly) over time; with good sound reproduction through quality headphones (DT 770) through quality interface (RME Babyface) I can listen for quite a while without ear fatigue, even on material that would normally be considered pretty harsh (capsule's 'Starry Sky', for example), and which *does* wear me out over time when heard via Redbook audio.
I realize that the ABX only reveals that *something* is detected that allows me to identify the proper pairs. No one need take my word for it that I'm listening for and hearing spatial detail--but that is in fact what I'm doing, so folks can take it or leave it in that respect.

I will note that IF it were the case that a consistent artifact/distortion is being added to the signal, then it would also have to be the case that this artifact would be detectable in all tested content. But this is not the case. If there's no soundstage depth present in a live-recorded signal on the disk, then I can't score above random guessing in foobar, period. The fact IS that I can detect the difference on some, but not others.
Practice improves performance. To reach 99.8% statistical reliability, and to do so more quickly (this new one was done in about 1/3 the time required for the trials listed above in the thread), I mainly have to train my concentration.

It is *very* easy to get off on a tangent, listening for a certain brightness or darkness, for the timbre balance in one part, several parts, or all--this immediately introduces errors, even though this type of listening is much more likely to be what I am and need to be doing when recording and mixing a new track.

Once I am able to repeatedly focus just on spatial focus/accuracy--4 times in a row, for X & Y, and A & B--then I can hit the target. Get lazy even one time, miss the target.
It took me a **lot** of training. I listened for a dozen wrong things before I settled on the aspects below.

I try to visualize the point source of every single instrument in the mix--that's why I picked a complex mix for this trial. I pinpoint precisely where each instrument is, and especially its distance from the listener. Problem is, both versions already have *some* spatial depth and placement, it's only a matter of deciding which one is deeper, and more precise. I've tried making determinations off of a particular part, like a guitar vamp or hi-hat pattern, but can't get above about 2/3 correct that way.
The better approach is just to ask myself which version is easier to precisely visualize, as a holistic judgment of all the pieces together. Equally effective, or rather equally contributing to the choice, is asking which version holistically gives me a sense of a physically larger soundstage, especially in the dimension extending directly away from me--thus the idea of listening to reverb characteristics.
Having to listen to four playbacks (A/B, X/Y, for one choice) gives rise to the problem of desensitization. Neurons naturally give decreased response to repetitions, so I've found I can target my answer more easily if I pause 5-10 seconds between an A/B (or an X/Y). Otherwise, A/B is always easier than X/Y.
I have rather junky monitors, KRK Rokit 6's, so I'm kind of surprised I can get a result out of them. To get down into low single digits I shifted to my headphones pushed by a nice Schiit Asgard2 amp, which I just acquired--if your headphones are good, I'd recommend using them for the testing. This is more for isolation than anything else.
The techniques used to obtain a result were the same: listening for tonal and EQ differences is hopeless, while listening for spatial depth makes the right choice apparent. Ear fatigue is still a problem to be fought: you can see a short break I took near the end to refresh my ears.

Sorry if it seems like I'm copying over the whole thread, but as I went through it I couldn't help extracting what I felt was relevant :eek:
 
So, I didn't see where he ever said whether he was using an ASIO driver in Foobar or not. If not and you try to compare two files at different sample rates, one of them is going to undergo low-quality SRC by the OS. In that case, one isn't really listening to two different sample rates at all.
 
The reason I posted it was not to debate the ins & outs of sample rates but rather to show a reasonably good description of the effort he goes through & the difficulties he encounters in doing ABX tests

Just as another 'datapoint' :) he posts again on another forum but this time about ABX testing the audibility of jitter
Jitter Correlation to Audibility | Page 8 | Head-Fi.org

I won't try extracting much, but note this - he listens for a completely different aspect of the sound: not soundstage this time, but a specific portrayal difference in the snare drum. Later I think he adjusts to something else?

Again, I'm just giving people a reference to read about real ABX testing & what's involved - far more detail than is given here, and it reiterates points made in this thread about ABX issues

Well, I will insist on the caveat that *all* ABX testing is of a sort pretty much wholly removed from how one would normally listen to music. The protocol can't be completed otherwise. The *only* time I ever listened like that in real life was when I was trying to hear John Lennon say "I bury Paul" at the end of "Strawberry Fields". That said,

Yes, my first research question is usually "Is differentiation possible at all???", and so I use the tools available to hunt for the differences.
It was particularly difficult in this case, as I don't have a good sense of what problematic jitter *ought* to sound like, and it matters what testers are listening for.

Since I can pick out a difference on one snare hit, a further refinement would be to listen more 'casually', and see if the drum set sounds different throughout.

I'm guessing that the added jitter track would have been indistinguishable for this particular music, but it's faintly conceivable that interested listeners could learn to hear the difference without the procedures I described.
 
A test which only has a statistical outcome cannot reject (or fail to reject) anything, in spite of forms of words which may suggest otherwise. It can only give some indication about the likelihood of something being true, and hence the likelihood of it being false.

You stated that a statistical analysis can neither reject nor fail to reject the null hypothesis, but in fact that is the core of the process.
It is a formal statistical procedure, and at the end a decision is made by comparison with a predefined criterion. So, to be precise: if the probability of the observed data is below the predefined level of significance, the null hypothesis can be rejected at SL = 0.05. The advice is sometimes given to just report the p-value and let the readers decide, but even that doesn't change the formal procedure.

It is a statistical analysis based on the _observed_ data (in fact, on the observed and more extreme data); the statistical analysis can't decide whether the data was gathered in a good/sound/meaningful, well-planned and well-executed experiment.
The analysis just answers the question of how probable the observed data are under the assumption that the null hypothesis is true.

Therefore I emphasized the role of the main quality criteria (and the needed replication(s)): if an experiment/test is objective, valid and reliable, it is justified to draw further conclusions based on the result, although the experimenter still does not know whether the decision is correct or incorrect.

This assumes, of course, that it cannot be both true and false and that it is a meaningful statement.

It is important to realize that the statistical analysis of any experimental result and the assessment of its quality (wrt the hypothesis under examination) are totally different topics.

Let's take some examples: two DUTs were examined in a controlled listening test (ABX or A/B, 100 trials, 65 hits, SL = 0.05), so the statistical analysis - using the exact binomial test - with:
H0: p = 0.5
H1: p > 0.5 (it's a different question whether a test should be two-sided; this one is one-sided)

would result in "the null hypothesis can be rejected"; the actual P for this observed result (under the assumption that the null hypothesis is true) is:
P(X >= 65) = 0.0018.

Looks impressive, but if nobody did measurements on the DUTs, we simply don't know whether the result is to be expected or surprising.
Let's assume later measurements showed a level difference of 0.4 dB between the DUTs; the statistical analysis would still be absolutely correct, but the result would hardly be anything special.

Let's consider the same test but with a different hypothetical result of only 45 hits; then the statistical analysis would result in "the null hypothesis cannot be rejected"; the actual P for this observed data (under the assumption that the null hypothesis is true) is:
P(X >= 45) = 0.864

Which means the observed data are compatible with our null hypothesis (p = 0.5, i.e. only random guessing is at play), so we can't reject the null hypothesis.
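For anyone who wants to reproduce these two p-values, here is a minimal sketch using SciPy's exact binomial test (this assumes SciPy >= 1.7, where binomtest was introduced; the two hit counts are the hypothetical outcomes above):

```python
# One-sided exact binomial test for the two hypothetical outcomes above.
# H0: p = 0.5 (random guessing), H1: p > 0.5. Requires SciPy >= 1.7.
from scipy.stats import binomtest

ALPHA = 0.05  # the predefined significance level

for hits in (65, 45):
    result = binomtest(k=hits, n=100, p=0.5, alternative="greater")
    verdict = "reject H0" if result.pvalue < ALPHA else "cannot reject H0"
    print(f"{hits}/100 hits: P(X >= {hits}) = {result.pvalue:.4f} -> {verdict}")
# 65/100 -> 0.0018 (reject H0); 45/100 -> 0.8644 (cannot reject H0)
```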

We don't know if the participants were really guessing, and if nobody has tested it before (see the matter of positive controls), we don't know if the participants would have been able to detect any difference; but in any case the statistical analysis would still be correct.

But in neither of these two hypothetical test scenarios would we be able to draw further conclusions from the results.


The apparent glee with which some people make the point you are making (though you make it quite soberly) suggests that they quite like the idea that difference can be 'proved' but indistinguishability cannot.

I am not sure if it helps to talk about "glee", because these are facts of testing.

Obviously there are questions of epistemology touched on within these procedures and, following that, we have to state that "proof" is not provided (in the case of either positive or negative results). I think the proponents of the idea of corroborating "indistinguishability" do have a point: tests which are objective, valid and reliable can IMO indeed contribute to evidence of "indistinguishability", provided that sample sizes are of a sufficient order.
 
From post 696. I'll repeat it because we're on the topic of tests, confidence, and hypotheses.

The OP has enough samples now to extract statistics.

The A/B test should have a binomial distribution, as there are only 2 outcomes, just like a coin toss. So the 95% confidence interval, assuming an unbiased random pick, is 0.5 +/- 1.96*(0.5*0.5/263)^0.5, i.e. p = 0.5 +/- 0.0604, or [0.4396, 0.5604], which is what the OP got (actual 0.475).
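For reference, a minimal sketch of that calculation with the numbers quoted above:

```python
# Normal-approximation 95% confidence interval for the hit rate of an
# unbiased (p = 0.5) guesser over n trials, as computed in the post above.
import math

n = 263      # number of A/B trials reported by the OP
p0 = 0.5     # hit rate expected under random guessing
z = 1.96     # two-sided 95% z-score

half_width = z * math.sqrt(p0 * (1 - p0) / n)
print(f"95% CI under H0: [{p0 - half_width:.4f}, {p0 + half_width:.4f}]")
# -> [0.4396, 0.5604]; the observed 0.475 lies inside, consistent with guessing.
```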

See the post above; the statistical test/calculation/analysis speaks for itself.
(Although I'd admit to having some problems extracting which number of trials was done under the different test conditions.)


Now for some semantics. If your null hypothesis was "there is a difference", then you are within the 95% confidence interval of a random result, so you cannot statistically say "there is a difference". For most people the converse is also true: "there was no difference" detected.

I don't understand the controversy it generated. :scratch1:

As for semantics, it's "the observed data are compatible with the null hypothesis" (i.e. p = 0.5), but we don't know whether no difference was detected (generally) or whether the listeners really were only randomly guessing.
Mainly because we haven't really examined what the participants were doing; we have only analyzed the observed data under the assumption that the null hypothesis is true.

Furthermore, we don't know if this result occurred because the difference is "indistinguishable", or because the listeners would not have been able to spot a difference in general, or because the listeners were not able to detect a difference under the specific test conditions.

Yes, it can. It can, to a predefined confidence level. In medicine that's usually a 95% chance the result is correct. Sometimes 95% sure isn't sure enough; it depends.

The "confidence level" is usually associated with the construction of a socalled "confidence interval" (that´s what DonVK did), but in these audio tests it´s mainly a socalled "significance level" in calculation the p-value.
It is an ongoing discussion if it would be better to dismiss p-values in favour of confidence intervals and the afore mentioned philosophical questions are imo still unresolved.

That includes the differences between the Bayesian and Frequentist approaches to probability.
The construction of confidence intervals (and therefore the confidence level) is a statistical procedure, and there is a probability associated with it (given that the assumptions are correct and met) if done repeatedly.

In the Frequentist's world there can't be a 95% level of correctness or probability assigned to a hypothesis, because in this view the correct result is either included in the confidence interval or it is not, so it's either 0% or 100%.
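To make that Frequentist point concrete, here is a small illustrative simulation (a sketch only, not tied to the actual test data): repeat the 263-trial experiment many times with a true hit rate of 0.5 and count how often the 95% interval built around each observed proportion contains the true value. Any single interval either contains it or it doesn't; the "95%" is only the long-run rate.

```python
# Frequentist coverage demo: each individual 95% CI either contains the
# true proportion or it doesn't; "95%" describes the long-run rate over
# many repeated experiments. Illustrative sketch with simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, p_true, z = 263, 0.5, 1.96
runs = 10_000

hits = rng.binomial(n, p_true, size=runs)      # one experiment per entry
p_hat = hits / n
half = z * np.sqrt(p_hat * (1 - p_hat) / n)    # Wald interval half-width
covered = (p_hat - half <= p_true) & (p_true <= p_hat + half)
print(f"Long-run coverage over {runs} runs: {covered.mean():.3f}")  # ~0.95
```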
 
Some manufacturers only publish specs that portray their product in a favorable light. Many are not equipped to take a full gamut of high-quality measurements either. So, to a significant extent it's not that there is a huge problem with measurement technology itself, but perhaps more with how it is sometimes used.

There is also the price point to consider. Most equipment is designed to sell at a particular price point and to compete only in that domain. Competition is not usually primarily on detailed specifications that consumers don't understand, and probably don't want to learn to understand.

Finally, there is little or no funding to do the kind of research that would be needed to better understand how some very picky people hear, and how to best measure some of it. In the meantime we are left with conducting listening tests along with taking measurements. It's up to each manufacturer to decide how to find listeners and whether or not to train them, and if so, how. Smaller manufacturers may not be able to afford much. So, it's very complicated and difficult to design and build a really good product, salable at a good price, and then convince enough people to buy it to justify all the costs.

I agree with a lot of what you say, but this varies considerably from one manufacturer to another. I can speak for myself as a manufacturer.

IME many of the audio measurements that are routinely done to characterize equipment are woefully inadequate for correlating with sound quality in high-performance systems. This is why even JA of Stereophile is sometimes stumped when a really badly measuring component gets a rave review from the reviewer, and why he himself hears differences in both digital and analog cables that cannot be correlated with any measurements.

I for one have been wanting to correlate digital jitter in the S/PDIF signal to sound quality. I feel that a single number representing the jitter of some master clock is insufficient and really useless. Even the plots that you commonly see are useless. No real correlation has ever been done, except maybe at a really gross level, like 10 ns of jitter versus 100 ps, and even that is debated in terms of whether the jitter is correlated or uncorrelated with the waveform or music.

Until now.

See these plots of jitter for various S/PDIF cables:

Can S/PDIF cable jitter be measured?

This shows that some aspects of the histograms are useful for correlating with sound quality. Others, like "liquid vocals", are simply a result of conductor material and quality, having little to do with jitter shape, spectrum or magnitude. It will take a bit more work and more plots, but I think this is useful. Because of these plots, I made changes to my best BNC-BNC cable and improved it significantly. It's already working for me.

Steve N.
Empirical Audio
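For readers curious how a period-jitter histogram can be produced in principle, here is a generic sketch (an illustration only, not Empirical Audio's method; the capture parameters and test signal are hypothetical): record the clock line at a high sample rate, estimate each cycle's period from interpolated zero crossings, and histogram the deviations from the mean period.

```python
# Generic sketch of a period-jitter histogram from a captured clock waveform.
# NOT the method behind the linked plots; the capture parameters below are
# hypothetical, and the input here is synthetic data for illustration.
import numpy as np
import matplotlib.pyplot as plt

def period_jitter(signal: np.ndarray, fs: float) -> np.ndarray:
    """Per-cycle period deviations (seconds) from the mean period."""
    s = signal - np.mean(signal)
    idx = np.nonzero((s[:-1] < 0) & (s[1:] >= 0))[0]  # rising zero crossings
    frac = -s[idx] / (s[idx + 1] - s[idx])            # sub-sample refinement
    crossings = (idx + frac) / fs                     # crossing times, seconds
    periods = np.diff(crossings)
    return periods - np.mean(periods)

# Hypothetical 100 MS/s capture of a ~6.144 MHz clock with synthetic phase noise.
fs = 100e6
t = np.arange(200_000) / fs
phase = 2 * np.pi * 6.144e6 * t + 0.02 * np.random.default_rng(0).standard_normal(t.size)
capture = np.sin(phase)

plt.hist(period_jitter(capture, fs) * 1e12, bins=100)
plt.xlabel("Period deviation (ps)")
plt.ylabel("Count")
plt.title("Period-jitter histogram (synthetic data)")
plt.show()
```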
 