Blind DAC Public Listening Test Results

RocketScientist · 2011-03-17 4:34 pm

I conducted a public blind listening test by making high quality recordings of the outputs of 4 different USB DACs using their line outputs and also the headphone outputs loaded with real headphones. The music excerpts were available for download in FLAC format and 20 people voted on at least some of the trials. The DACs are:

Behringer UCA202 ($29)
Modified UCA202 (above + $5 worth of parts)
NuForce uDAC-2 ($130)
Benchmark DAC1 Pre ($1600)

This was only an informal listening test, so don't expect rigorous methods or all the results to be statistically valid, but there were some interesting clear preferences. The link below has the full results:

NwAvGuy: DAC Listening Challenge Results

NuForce%20uDAC-2%20vs%20Behringer%20UCA202%20Sound_thumb%5B1%5D.jpg

Colin · 2011-03-17 4:58 pm

Nice work. Interesting blog and shame you got booted off another forum because one advertiser couldn't address the technical limitations of their product.

RocketScientist · 2011-03-17 5:22 pm

Thanks. I haven't quite been fully booted (yet at least) off Head-Fi but they've succeeded in making it much more difficult to contribute there. It's especially a shame because the majority of participants in the listening tests came from Head-Fi and I, and others, have been unable to even indicate in those Head-Fi threads the results have been published. Any such indications are immediately deleted. One really has to wonder where Head-Fi's priorities are?

Fortunately there are plenty of more rational forums--such as this one! And I'm more after quality than quantity anyway.

SY · 2011-03-17 5:38 pm

There's things to agree with and disagree with, there's things that could be done differently, but... you recognized what you did for what it is, there are some clever things said, you made some very trenchant observations, you qualified your comments, and I'm delighted you posted that link here.

SoNic_real_one · 2011-03-17 5:40 pm

That concept of testing is based on:

It's somewhat like looking at 3 different shirts at an online retailer. You can tell easily how they're different from each other by looking at the pictures of each shirt using your PC. You don't need a perfect computer display to tell that one might be a slightly darker shade of blue than another one. Most any computer display is good enough to show the differences. But you can't be sure of the exact shade of blue until you have the real shirt in front of you.

See, that assumption is false... Because many people cannot discern between subtle shades of blue. So, even if different, their eyes will still see them identical because of their poor vision (or poor PC monitor). And they would guess.

Your tests, run on a good equipment will reveal the differences. Run on a system that outputs tones of c**p, that c**p will MASK the real differences. A PC speaker system that has a bandwidth of 60-15kHz and a SNR/Dynamic of 60dB will NOT reveal nothing that happens out of those limits.
Maybe you should ask the testers to reveal their listening equipment and age.

RocketScientist · 2011-03-17 5:44 pm

Thanks SY. The "trenchant" remarks are mostly related to the outright attacks NuForce has tried to make on my review and even me personally. I've tried to remain as objective as possible regarding the measurements, results, etc. But I'm sure some of my commentary is tainted by my negative experiences with NuForce. Still, I've tried to compliment their product where compliments are due. It's not all bad.

And if I do this again, I'd very much appreciate input on ways to do a better job.

RocketScientist · 2011-03-17 5:50 pm

SoNic_real_one said:
That concept of testing is based on:
See, that assumption is false... Because many people cannot discern between subtle shades of blue. So, even if different, their eyes will still see them identical because of their poor vision (or poor PC monitor). And they would guess.

Your tests, run on a good equipment will reveal the differences. Run on a system that outputs tones of c**p, that c**p will MASK the real differences. A PC speaker system that has a bandwidth of 60-15kHz and a SNR/Dynamic of 60dB will NOT reveal nothing that happens out of those limits.

I should probably be more clear about a "reasonably decent" computer display or listening gear. Obviously a set of $20 PC speakers isn't going to cut it. In reality, most of the participants used headphones in the $100 -$400 range and many were driven from relatively expensive dedicated headphone sources. A few used the headphone jack on a MacBook Pro, etc. but they still used fairly well respected cans.

I agree the lower you go fidelity wise, the more there's a potential for masking. But it's interesting the subtle advantages of the high-end Benchmark stood out even when listening to the files on much less expensive hardware with far inferior specs including the SNR, etc.

And short of having the listeners go through an interview/selection process, I can't do anything about their "poor vision" or "poor hearing". I think it's fair to just include those interested in headphone listening as they're the typical customer of the products being compared.

SY · 2011-03-17 5:52 pm

RocketScientist said:
And if I do this again, I'd very much appreciate input on ways to do a better job.

I'd be delighted to help.

azneinstein · 2011-03-17 6:47 pm

Rocket, I thought your blind test was great and it shows what audio is about. Clearly, the assumption is as price increases- we get better fidelity, so we expect which one should be better until/or we hit personal preferences or "incorrect" pairing of sounds. I'm thinking of doing a similiar test of straight 1-10 scores on 10 or so 8" fullrangers soon with RS drivers, hopefully the Burros, random ebay's to the Visatons B200 and it'll be all by ear and friends which I expect, similiar results where the Visatons being the only audiophile driver to top and have everything else follow.

But I definitely think DACs are one of the harder equipment to judge which is why people expect the need for all the components to be audiophile. I tried an A/B/C a while back with an oldschool monster branded dac, audio authority, and another friends dac where we could barely pick out differences but in blind tests and different songs, we probably couldn't pick out accurately which dac was playing what, just which one sounds better.

RocketScientist · 2011-03-17 7:21 pm

@azneinstein I agree DAC differences are generally "non obvious". When I first conceived the test, my greatest concern was there wouldn't be any consistent outcome. But I did careful ABX trials on my own (using Foobar) of the first few recordings and was somewhat surprised I scored very well on some of them. But that only identified there was a difference. It was harder to pick a favorite when I didn't know which file was which. So I really didn't know what to expect from the "public".

I've seen enough solid evidence to be convinced that some people are much better "listeners" than others including in blind tests. I think it's partly just human genetics--we all have things we're predisposed to be better and worse at--and partly a matter of training one's ears and brain to hear small differences. So just because I can or can't hear something, I don't assume the same will be true for everyone.

HydrogenAudio has done some interesting listening tests using Foobar ABX of things like different re-sampling algorithms. And, with some, many couldn't hear a reliable difference while a select few could. There was even a blind cable test where among a panel of many listeners, only Michael Fremer from Stereophile (if I remember correctly) could reliably identify one of the cables.

So that's partly why I did this test. I thought it would be fun to see how my listening abilities stacked up to whoever else participated. Electronics lend themselves well to this sort of test where anyone can use Foobar ABX to compare the test files. Speakers and headphones, however, are very different.

I know Harman International has a really expensive and advanced listening room with a theatrical-like "stage" that can automatically at the push-of-a-button reposition speakers in the same spot for blind testing behind an acoustically transparent but visually opaque "screen". Even something as minor as trying to compare two speakers side-by-side or stacked on top of each other can mess up the comparison. Sean Olive has written about all that on his blog. He's perhaps most famous for his excellent article about all the problems with "sighted" (versus blind) listening:

Dishonesty of Sighted Listening

The "sighted" bias came out in my listening tests in that one person flagged early on one of the reference tracks (that hadn't been through any of the DAC's) as being "clearly the worst" and 5 more people promptly "heard" the same thing. Some of those who used blind ABX, however, picked the same track as their favorite. So it's easy to see how even a single positive or negative subjective comment on any product in the forums can severely bias many others who evaluate the same thing.

And I also think it's significant that a lot of manufactures "design by ear" but they use sighted listening. Given all the obvious and well proven bias with sighted listening, is it really valid to use it as the primary criteria for product design? Sean Olive does a better job than most of trying to grapple with that issue.

And as for price, that was partly why I chose the lowly $29 Behringer to include for comparison. It's like including a $3 wine in an expensive wine tasting. I love it when the low priced entry turns out to be a well liked bargain in a blind test! I this case it didn't, but some still liked it.

SY · 2011-03-17 7:50 pm

The Fremer thing is questionable and has taken on the aura of Urban Legend. Main issue to be addressed in a good test (besides "no peeking" or other non-auditory clues) is VERY careful level-matching.

RocketScientist · 2011-03-17 8:32 pm

SY said:
The Fremer thing is questionable and has taken on the aura of Urban Legend. Main issue to be addressed in a good test (besides "no peeking" or other non-auditory clues) is VERY careful level-matching.

Yeah, the Fremer thing does stand out as a bit of an oddity. I've met the man but we didn't get to that topic, we only talked about vinyl.

As for level matching, the left channel RMS average volume in each group of trials were matched to +/- 0.01 dB. So hopefully that's careful enough?

SY · 2011-03-17 8:36 pm

Excellent!

SoNic_real_one · 2011-03-18 4:18 am

Maybe you could include some control samples? Like one particular musical pice - once full CD quality, one at reduced quality (run thru mp3 at 128kbps or something like that).
If the respondents pick up corectly that, validate their results...

RocketScientist · 2011-03-18 2:54 pm

A "hidden" challenge is a good idea but it does complicate the trial a bit. And the more simple the trial (i.e. the fewer things to compare, less listening time, etc.) the more responses you'll likely get. It's like those online "tell us about our website" questionnaires. They get a decent response to the simple 15 second ones but few want to bother with the long ones unless they've been "bribed" in some way to make it worth their while.

One can also argue you're trying to represent what the average person would hear (or not hear). If some people don't have the hearing/skills/etc. to identify certain differences, isn't that still significant data for the study? Those people, and everyone else like them, might want to buy one of the items being reviewed and could save some money getting the cheapest one if they can't hear any differences.

I guess it depends on what you're trying accomplish. If you really want to discern the ultimate differences between gear, then you limit the trial to the most skilled listeners you can find. But few others would hear all the differences they heard so you can argue that's not very practical for the average person shopping for that gear. By including a wider range of listening skills, you average the results out to something more realistic.

Clinical drug trials used to generally be run on fairly random samples of the population within some broad boundaries. But, increasingly, the expensive new drugs were doing no better (and often worse) than the placebo in the trials. So, in response, some of the drug companies are now quietly "pre-selecting" their trial candidates in various ways to include only people they suspect will respond best to their drug. That's OK if the drug is then marketed only for that group of people, but that's often not the case. They're stacking the odds unfairly in their favor.

SoNic_real_one · 2011-03-18 10:40 pm

Well, you are right. Some (most?) of the new generation is perfectly happy with the downloaded mp3/aac songs, played via an iPod or similar. They never heard "better". I have a collegue at work - he is 24 and NEVER listened a CD player.
If that is the target of the study, sure... don't bother with control samples.
If you want to compare medium/high end equipment, targeted to sonically "educated" people, is a different story. maybe they don't have ADD and can go tru a couple extra samples.

RocketScientist · 2011-03-18 11:12 pm

To me, the best compromise, is to try and use participants who are among the "target customers" for the devices being compared. So, like you said, that will depend on what you're comparing.

SY · 2011-03-18 11:39 pm

Agreed, you want to select your participants rather than make it a free-for-all; there are some straightforward ways to remove the bandwagon effect and keep the responses truly independent. As a side effect, by implementing the controls correctly, you'll also get rid of any potential presentation bias. If you do that and are careful about level matching, you'll have a totally solid test.

wakibaki · 2011-03-19 12:45 am

Good test and site. I came away from reading it feeling that I had learned something about the products, that's a rarity in audio these days.

As for the behaviour on the Head-Fi site, it's dishonest, perverse and it's their loss.

w

RocketScientist · 2011-03-19 3:33 am

Thanks SY, I agree on all counts. There are various ways to do online surveys to allow voting without seeing any of the other results. Clearly that would help a lot.

The flipside, however, is it was fun to watch the forum bias skew the reference track into being the "worst". I think the exact same thing happens all the time with subjective opinions posted in these forums. Person A says Product B is amazing and others magically hear the same thing and, as you said, others pile on the same bandwagon from there.

And thanks Wakibaki for the encouragement.

Blind DAC Public Listening Test Results

Member

Member

Member

Ex-Moderator

Member

Member

Member

Ex-Moderator

Member

Member

Ex-Moderator

Member

Ex-Moderator

Member

Member

Member

Member

Ex-Moderator

Banned

Member