Blind DAC Public Listening Test Results

SY · 2011-03-19 7:56 am

Well, the test doesn't have to be online- you can have an online signup, then handle distribution of tracks and voting via email. This gives you much more flexibility in the test design to implement positive controls and control for presentation order.

BTW, thanks for understanding my comments for what they are- suggestions on taking an interesting and clever test, where you avoided the most common error (level matching), and improving it to a rock-solid test with incontrovertible data, worthy of publication- which I urge you to do, it would be a great paper. You're one of the few people I've encountered on these forums who understands the concepts of qualifying results and the reality of error sources; I more often encounter hostility from people who are convinced that their test design is the ultimate and don't get the idea that they're dealing with humans. :up:

Buckapound · 2011-03-19 12:53 pm

If you're looking for a good hidden test, simply make a duplicate or two of one of the files already in the test. Those people who rank them as noticeably different have a lower reliability on everything else. This is a common quality check in wine judging.

--Buckapound

RocketScientist · 2011-03-19 3:51 pm

Thanks to both SY and Buckapound. One problem with duplicate files, in this case, is anyone using Foobar ABX can also use the Foobar Bitcompare plugin (or several other ways) to identify files that are the same. Plus the FLAC file sizes would also be identical unless you "doctored" one of the files.

Of course, once someone resorts to analyzing the files in ways that don't involve listening, they're essentially "cheating". But they can still skew the results if nobody knows they cheated.

Ideally, trials should be run in person where all those things could be controlled. Conducting them "remotely" over the web creates more ways for someone to skew the results. Another example would be people sharing their preferences with each other via email, PM, chat, etc.

There are other limitations as well. While short snippets of music work well for Foobar ABX trials, they're not so good for anyone who wants to just sit and listen for a while. I expected to get more complaints about my clips being under 20 seconds. But 20 seconds is apparently the legal limit to use a "public excerpt" without running into copyright issues and needing formal permission. File sizes (and bandwidth) would also get big for full length lossless test files.

So the challenge is finding ways to work around as many of the various limitations of this sort of listening trial as possible. Lots of good ideas have already been presented. I certainly welcome more input on ways to improve future tests. I've already learned a lot.

godfrey · 2011-03-19 4:13 pm

To avoid the bandwagon effect, you could send each person differently named files. That way participants can't compare notes till the results are published, because only you know that Bob's dlgtjm.WAV is the same as Sue's akhtux.WAV

I'd use random file names like that too so people don't subconsciously vote for their favorite president (or whatever).

[edit]

RocketScientist said:
But 20 seconds is apparently the legal limit to use a "public excerpt" without running into copyright issues and needing formal permission.

Good to know! The question of legality cropped up on a thread here recently, regarding a similar test.

RocketScientist · 2011-03-19 4:33 pm

It's worth noting, in the USA at least, the excerpt has to also be for "non-commercial" (i.e. personal or educational) use. A for-profit business cannot share even 15 second excerpts legally. For example the preview clips on Amazon and similar sites require explicit pre-arranged legal permission (but it's not hard for them to get blanket permission, as they are after all, trying to sell the music for the benefit of the copyright holders).

I do agree random names would be better in terms of any psychological association.

The file name trick works if you want to email the files individually. But short of some fancy server-side file name scripting, I don't see a way to automate different names when people initiate their own downloads.
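For the emailed-files case, godfrey's idea could be sketched roughly as below. This is only an illustration: the participant names, the track names (`dac_a.flac` and so on), and the helper functions are all hypothetical, and the six-letter naming scheme is simply modelled on the "dlgtjm" example above.

```python
import secrets
import string

def random_name(length=6, ext=".flac"):
    """Return a random lowercase file name, e.g. six letters plus '.flac'."""
    letters = string.ascii_lowercase
    return "".join(secrets.choice(letters) for _ in range(length)) + ext

def assign_names(participants, tracks):
    """Build the organiser's private key: participant -> {random name: real track}."""
    key = {}
    used = set()
    for person in participants:
        mapping = {}
        for track in tracks:
            name = random_name()
            while name in used:  # never hand out the same name twice
                name = random_name()
            used.add(name)
            mapping[name] = track
        key[person] = mapping
    return key

# Hypothetical participants and source files.
key = assign_names(["Bob", "Sue"], ["dac_a.flac", "dac_b.flac", "dac_c.flac"])
for person, mapping in key.items():
    print(person, mapping)
```

Only the organiser keeps `key`; each participant sees just their own random names, so notes can't be compared until the results are published.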

sampleaccurate · 2011-03-20 7:19 am

RocketScientist said:
I conducted a public blind listening test by making high quality recordings of the outputs of 4 different USB DACs using their line outputs and also the headphone outputs loaded with real headphones. The music excerpts were available for download in FLAC format and 20 people voted on at least some of the trials. The DACs are:

Behringer UCA202 ($29)

Modified UCA202 (above + $5 worth of parts)

NuForce uDAC-2 ($130)

Benchmark DAC1 Pre ($1600)

This was only an informal listening test, so don't expect rigorous methods or all the results to be statistically valid, but there were some interesting clear preferences. The link below has the full results:

NwAvGuy: DAC Listening Challenge Results

I think you were unfair to Behringer for not trying their $199 ADA8000. Not only is it an 8 channel D/A, it's an 8 channel A/D as well.

The UCA202 sounds HORRIBLE. I own one and it even picks up radio noise. You could buy FIFTY-FIVE UCA202s for the price of one DAC1. Comparing a $29 product to a $1600 product isn't realistic. It would make more sense to choose some mid-priced ($200 or so) converters and compare them instead IMHO. Who is seriously going to use a $29 Behringer converter for a hi-fi system? For consumer use, fine, but not even close to hi-fi. Is it a surprise it doesn't sound as good? However, it's not a surprise that some people can't hear the difference. It's genetics as well as experience that determines whether someone can hear subtle differences in products.

Any test that isn't double blind is really worthless. High end audio manufacturers are AFRAID to have the results of double blind tests performed between their equipment and the less expensive competition because they know the difference is all too often indistinguishable to the average consumer and even to many who consider themselves audiophiles.

There is LESS and LESS of a correlation between price and quality of sound as technology advances and good sounding D/A converters are mass produced for pennies each. The high cost of good converters is in recouping development expenses, not fabrication. I for one believe we're very close to the point where inexpensive converters will sound as good as expensive ones, and the only people paying thousands of dollars for D/As are rich folks who cling to the idea that more expensive MUST be better quality. But that requires that the consumer demand quality and buy only reasonable priced products that perform well. Otherwise there is little incentive for all manufacturers to make good product and some will continue to make garbage that sells.

There's no reason a good D/A should cost thousands of dollars, but that statement will be met with rabid disagreement from those companies that produce D/As that are ridiculously overpriced. BUT, if they can sell 'em to people who think they have golden ears and have unlimited money to waste MORE POWER TO THEM!

SY · 2011-03-20 1:04 pm

godfrey said:
To avoid the bandwagon effect, you could send each person differently named files. That way participants can't compare notes till the results are published, because only you know that Bob's dlgtjm.WAV is the same as Sue's akhtux.WAV

I'd use random file names like that too so people don't subconsciously vote for their favorite president (or whatever).

Ding-ding-ding! Exactly correct. It also allows you to mix in some appropriate positive controls (e.g., MP3 or other lossy encoded/decoded) to determine whether or not to use the scores in the aggregate numbers.
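A rough sketch of how control trials might gate the aggregate scores, combining the lossy positive control with Buckapound's hidden duplicate. The trial names, listener names, and answers are all made up for illustration, and it assumes the positive control is a difference already known to be audible.

```python
def passes_controls(answers, controls):
    """True if the listener answered every control trial as expected."""
    return all(answers.get(trial) == expected
               for trial, expected in controls.items())

# Hypothetical control trials: trial id -> expected answer.
controls = {
    "lossy_vs_lossless": "different",  # positive control, assumed audible
    "hidden_duplicate": "same",        # two copies of the same file
}

# Hypothetical responses from two listeners.
responses = {
    "listener1": {"lossy_vs_lossless": "different",
                  "hidden_duplicate": "same",
                  "dac_trial": "A"},
    "listener2": {"lossy_vs_lossless": "same",
                  "hidden_duplicate": "different",
                  "dac_trial": "B"},
}

# Keep only listeners who passed the controls for the aggregate numbers.
qualified = {name: r for name, r in responses.items()
             if passes_controls(r, controls)}
print(sorted(qualified))
```

Whether to drop a listener entirely or just weight them lower is a design decision; the sketch takes the simplest option.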

RocketScientist · 2011-03-20 2:35 pm

sampleaccurate said:
I think you were unfair to Behringer for not trying their $199 ADA8000. Not only is it an 8 channel D/A, it's an 8 channel A/D as well...

I tried to explain at the start of the listening test (and perhaps didn't do a great job) the main point of the test was to evaluate the "sound" of the NuForce uDAC-2. The Behringer UCA202 was included because I happened to already have one, and it measured fairly well when I reviewed it. The Benchmark, as explained in the tests, was mainly included as a reference not as valid competition for the other two products.

sampleaccurate said:
The UCA202 sounds HORRIBLE. I own one and it even picks up radio noise. You could buy FIFTY-FIVE UCA202s for the price of one DAC1. Comparing a $29 product to a $1600 product isn't realistic. It would make more sense to choose some mid-priced ($200 or so) converters and compare them instead IMHO. Who is seriously going to use a $29 Behringer converter for a hi-fi system?

Judging from some of the scores, in blind testing, the UCA202 isn't "horrible" but it's not great either. It did comfortably score better than the NuForce when driving headphones. I haven't noticed any audible noise problems either subjectively or in the measurements. But, like most USB powered DACs, it's somewhat at the mercy of the PC it's powered from and I don't live next to any radio station antenna towers. The main point of including the UCA202 was to see how a much cheaper DAC would compare to the NuForce.

sampleaccurate said:
There's no reason a good D/A should cost thousands of dollars, but that statement will be met with rabid disagreement from those companies that produce D/As that are ridiculously overpriced. BUT, if they can sell 'em to people who think they have golden ears and have unlimited money to waste MORE POWER TO THEM!

I mostly agree with you. I do think the price of the Benchmark (and similar products from other companies) is justified for several reasons including being made in low volumes in the USA, having reference level specs, balanced outputs, etc. But does the average person need to spend that much? Absolutely not.

In the blind test linked below, a lowly Walmart-grade Sony CD player (with its cheap DAC) is compared to an expensive high-end Wadia CD transport/DAC in a well run blind test. And none of the audiophiles listening could tell them apart (despite the fact the Sony was further "handicapped" by a $200 Behringer amp and $4 interconnect cable). The total cost of the two systems being compared was about $700 versus $12,000:

Matrix HiFi Blind Test

sampleaccurate said:
I for one believe we're very close to the point where inexpensive converters will sound as good as expensive ones, and the only people paying thousands of dollars for D/As are rich folks who cling to the idea that more expensive MUST be better quality. But that requires that the consumer demand quality and buy only reasonable priced products that perform well. Otherwise there is little incentive for all manufacturers to make good product and some will continue to make garbage that sells.

I agree with you! That's what my blog is all about. I haven't tested one yet but this $150 USB DAC is likely already at the point of diminishing returns and all the average person needs (if they don't need a headphone output):

HRT Music Streamer II

And I wrote an article on why a DAC like the Benchmark can be justified in some circumstances:

Why the Benchmark DAC1?

But my needs are far from typical. So, yes, I'm all for rewarding manufacturers that "get it right" for the least amount of money.

SoNic_real_one · 2011-03-20 2:48 pm

All in all it is a great idea and you have a valid methodology to create the files.
Selection of the sample population could be improved. Maybe, based on the control files, you could publish two results - one for those with discerning equipment/ears, the other for the rest of the people.
They would have two different points of diminishing returns. But there will be some points of course.

sampleaccurate · 2011-03-21 2:45 pm

RocketScientist said:
In the blind test linked below, a lowly Walmart-grade Sony CD player (with its cheap DAC) is compared to an expensive high-end Wadia CD transport/DAC in a well run blind test. And none of the audiophiles listening could tell them apart (despite the fact the Sony was further "handicapped" by a $200 Behringer amp and $4 interconnect cable). The total cost of the two systems being compared was about $700 versus $12,000:

Matrix HiFi Blind Test

It's too bad that this kind of test isn't conducted on a wide range of converters by an independent committee of people who have good ears and who don't work for a company or publication that has a financial interest in any of the products tested. I've found plenty of independent evaluations by individuals, but I don't trust the ears of one person other than mine!

BTW, my UCA202 is noisy and the sound isn't as "clear" as my high end stuff. You're correct though that it could be my computer. USB supplies only 5 volts and it's often not clean, and I could be close to a radio tower - I don't know. For the price, if all you want to do is make recordings for fun it's a GREAT DEAL! I guess I'm just very particular about my sound and if I hear noise and lack of clarity I don't use it.

I agree with almost everything you said. I slightly misunderstood your position but now that you've clarified it I wholeheartedly agree with you.

My kid will be very lucky if he likes music (expecting in May). He will inherit a wealth of studio equipment and by the time he's capable of recording music ultra high quality converters will be dirt cheap.

As far as low volume causing high costs, if the chip is designed and the fabrication process is complete, the only reason to keep production low is to keep prices up. As the test you refer to proves, cheap A/D converters are catching up to the "high end" stuff and it's already to the point that many audiophiles can't hear the difference.

Excellent post IMHO. Thanks for sharing that information.

RocketScientist · 2011-03-21 3:41 pm

sampleaccurate said:
It's too bad that this kind of test isn't conducted on a wide range of converters by an independent committee of people who have good ears and who don't work for a company or publication that has a financial interest in any of the products tested.

That is ideal. And such studies have also been done. There's a whole list of blind studies and other interesting links and posts here:

Testing-audiophile-claims-and-myths (Head-Fi thread)

sampleaccurate said:
As far as low volume causing high costs, if the chip is designed and the fabrication process is complete, the only reason to keep production low is to keep prices up. As the test you refer to proves, cheap A/D converters are catching up to the "high end" stuff and it's already to the point that many audiophiles can't hear the difference.

Excellent post IMHO. Thanks for sharing that information.

Thanks and glad we're mostly in agreement. As for low volumes, the chips are not the problem. You can design really good audio gear (especially something like a DAC) with off-the-shelf standard parts. The problem is everything else and how big a market there is for the product.

There's a lot of upfront and one-time costs that have to be amortized over the total number of units sold--all the R&D, mechanical design, the PC board, custom enclosure, manufacturing engineering (surface mount placement programming, solder masks, reflow profiles, test fixtures, etc.), graphics, packaging, marketing, support materials, etc. And it's also vastly more expensive to manufacture a product in low volumes in Western countries compared to high volumes in Asian countries. So when you add all that stuff up and divide by the numbers of units sold, companies like Benchmark are charging a fair price especially considering their dealers take a healthy cut of the profits.
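As a back-of-envelope sketch of that "add it all up and divide" arithmetic: the dollar figures below are purely hypothetical placeholders, not Benchmark's (or anyone's) real costs, and the model ignores dealer margins.

```python
def unit_cost(fixed_costs, variable_cost_per_unit, units_sold):
    """Per-unit cost: one-time costs spread over the run, plus parts and labor."""
    return fixed_costs / units_sold + variable_cost_per_unit

# Hypothetical numbers, for illustration only.
fixed = 500_000.0   # R&D, tooling, test fixtures, marketing, etc.
variable = 150.0    # parts and assembly per unit

for units in (1_000, 10_000, 1_000_000):
    print(units, round(unit_cost(fixed, variable, units), 2))
```

With these made-up inputs the one-time costs dominate at a 1,000-unit run and nearly vanish at a million units, which is the whole point about volume.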

Most everyone takes it for granted, but those cheap prepaid cell phones you can buy for $29 would literally cost more like $2900 each if the company could only sell 1000 of them, it was their only product, and they were made in the USA. I'm not exaggerating.

The only reason phones are so cheap is they share much of the design and components among other models, the parts are made in extremely high quantities and fiercely competitive (i.e. low profit margins), they're made in huge batches in highly automated factories, nearly everything else is done with obscenely cheap labor, and the companies make a bunch of different models to spread out their overhead and administrative costs.

It's also a "chicken and egg" problem sometimes. Something like the Benchmark DAC1 could be made in higher volumes much cheaper in China. But it would still be relatively expensive as it uses a lot of expensive parts. And I doubt any company wants to take the financial risk making 10,000 of them hoping they will all sell at a lower price than Benchmark's currently charging.

Cornelis Spronk · 2011-03-21 9:03 pm

This is one of the more interesting posts. I like to read about things that have been actually done.

More of the subjects chose 'A', the less expensive system. This, in my humble opinion, can be explained. Most people judge what is good by what they expect to hear. Having heard more bad sound sets up a bias towards the bad sound.

I have had this experience with a group of audiophiles. After a listening session, the group talked things over, and generally came to the conclusion that one of the systems sounded better. I took a minority position as to the quality of a sound system. I suggested that the system less in favour sounded more like real live orchestral music. It was suggested that I was too critical to make such a comparison. I took this to mean that HiFi systems must be compared to other systems, not to the original sound.

I got the opinion that it was a mistake for me to believe that the purpose of high fidelity was to create as closely as possible the sound of the original orchestra.

Living in Cambridge I am fortunate to have an opportunity to go to excellent live concerts. In the UK, I have been fortunate enough to hear musical performances that match and even exceed anything on the best recordings.

Apparently it has been "proven" by good scientific method that wines that are otherwise equal taste better when the subjects believe them to have the higher price tag.

RocketScientist · 2011-03-21 9:27 pm

@Cornelis, Yes, it's absolutely well proven price creates subjective bias. It's true in wine or audio. My study wasn't nearly "blind" enough in some respects.

The whole "live vs Memorex" argument is an interesting one. In nearly all cases, the speakers (or headphones) are by far the weakest link in the chain. When choosing speakers/headphones it's certainly valid to use live music as your "reference" if you want. Others might prefer a different sound (say more bass or brighter highs) and that's fine too.

But if you're comparing anything earlier in the signal chain (like the DACs here), you're generally trying to mainly decide how they differ from each other. Whatever headphones or speakers are being used will have far more to do with how close the overall sound is to live sound. But, if you think a particular source component sounds more like an original performance there's nothing wrong with that.

To me, the most interesting comparisons are those where even the "golden eared audiophiles" are unable to hear any difference. In those situations it's not about subjective preferences, or which one is "closer to live". It's simply about seeing if A can be distinguished from B. If not, you should buy the cheaper one and spend the left over money on something that will make a more noticeable difference.

This just came up on another forum... SACD was created, and marketed, to sound obviously better than CD. But, a very well done peer-reviewed study by AES failed to reveal any meaningful difference between the two. The Wikipedia Summary:

"The Audio Engineering Society published the results of a year-long trial in which a range of subjects including professional recording engineers were asked to discern the difference between SACD and compact disc audio (44.1 kHz/16 bit) under double blind test conditions. Out of 554 trials, there were 276 correct answers, a 49.8% success rate corresponding almost exactly to the 50% that would have been expected by chance guessing alone."

So blind studies can easily help define what matters most. Even if you try to discredit them, how big can the differences be if people keep failing to hear differences over and over? People obviously could do a lot more to improve the sound of their systems buying better speakers than an SACD player (or 24/192 fancy DAC) and new music in a new format.
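For anyone who wants to check the figures in the Wikipedia summary above, here is a small sketch of an exact one-sided binomial calculation. It assumes each trial is an independent 50/50 guess under the null hypothesis; the function name is just for illustration.

```python
from math import comb

def p_value_at_chance(correct, trials):
    """P(X >= correct) when every trial is an independent 50/50 guess."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2**trials  # exact integer sums, one final division

rate = 276 / 554
p = p_value_at_chance(276, 554)
print(f"success rate: {rate:.1%}")
print(f"chance of 276 or more correct by pure guessing: {p:.3f}")
```

A probability above one half here means pure guessing would do at least this well more often than not, so the result gives no evidence that listeners could tell the formats apart.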

SoNic_real_one · 2011-03-22 1:19 am

That is not the correct statistical interpretation.
It can also be interpreted that 100% of the population HEARD the difference between SACD and CD. But for some reason, 50% of them preferred the sound that they were accustomed to. You know, the brain, in time, overcompensates the stimuli to match a subjective notion of "correct".
Somebody who listened all the time to cassette tape or mp3 would initially rate the sound of CD "harsh" and "metallic". Because his brain still overcompensates for the lack of bandwidth and dynamics. After some time the brain adjusts itself and music sounds nice again.
Somebody who listened to CD all the time might rate the SACD too "bright".
Some people would say that the equipment had to be "burned in" to sound good... when in reality their brain needs that "burn in" period.

I have listened to the same songs in CD and SACD formats. And I can tell the difference every time.

RocketScientist · 2011-03-22 1:24 am

SoNic_real_one, you need to read up on your statistics theory. A 50% result is the same as guessing. It has nothing to do with "preferred" anything. It has everything to do with even being able to tell them apart. Which they could do no better than someone randomly guessing. That study is published in a very well respected peer reviewed journal (AES) and has stood the test of 4 years of scrutiny. And they were recording engineers--not people listening to low fi cassette tape.

SoNic_real_one · 2011-03-22 7:26 am

You are funny, correcting me about probability theory.

Your assumption is true if you flip a coin. Those are unrelated, random events, covered by discrete probability theory. The main assumption being an equiprobable sample space.

Anyway, in this case, human hearing psychology makes the events correlated (we all hear similarly, averaged in a normal distribution) and the "flipping coin" probability doesn't apply as straightforwardly as you assume...
Read in your trusty wiki about probability distributions. And causes that might shift the curve to one side or another of a normal distribution.
That's what I was talking about.

That "study" doesn't take into account the aspect of hearing adaptability. It makes the same basic mistake of assuming that human hearing is similar to rolling a dice. Typical for engineers.

Make an experiment. Listen to 100 hours of music with the tone control for "high" at minimum. And then put it to zero - it will sound bad on the high end. You will be extra sensitive in that part, losing the "precision" to appreciate those stimuli.
Or then listen only to mp3 in ear buds or FM radio in car speakers (like a regular population sample used in those studies) and then switch to CD/SACD with high-end gear. They will sound the same to your ear because you lost precision in that area of stimuli. You will "hear" the higher quality, but without precision. It will sound "the same" for another 100 hours. After that period you will be able to detect differences; your brain is adapted, fine-tuned to those stimuli.
It's like driving for a while in a dark tunnel (bad audio quality) and going out into the sun (high def music). For a while, everything will be bright, but with no details.

This is why some people claim that OpAmps need to "burn in". Because their ears are adapted to the crappy sound of the previous device.

Using a person with hearing distorted by low quality music in a "study" that compares CD with SACD is meaningless - sure they will not be able to detect the differences. Both will sound "bright" in the same way; their brain won't recognise the small differences, being covered by the big "adaptive mask" present in the brain at that moment.

RocketScientist · 2011-03-22 3:24 pm

The methodology used in that study is well proven and is widely used. If there was a fatal flaw, as Sonic_real_one suggests, such a flaw should have been exposed and well documented long ago. There are certainly plenty of people who have tried to attack that study and others like it--such as this one:

Sampling Rate Discrimination 44.1 kHz vs. 88.2 kHz

To my knowledge, the best anyone has been able to do is suggest there may be some really minuscule differences not revealed in these studies. But these are hardly the sort of "obvious" differences people claim in sighted listening. They're so small one can easily argue they're insignificant--especially relative to all the other variables that change the sound.

I don't see how "hearing adaptability" is an issue as Sonic_real_one suggests. In fact, ABX studies are often done with very brief listening periods as the brain has a short memory for what something sounds like. That's how most people prefer to use the ABX comparator in Foobar (as many did in my listening tests).

This wasn't some small, poorly run study. It ran for a year and involved 554 trials and many different listeners. The odds that ALL those listeners were suffering some magically debilitating hearing problem are essentially zero. Remember, they were not being asked to evaluate which source sounded better, only if they could hear any difference at all.

But, regardless, if what Sonic_real_one suggests is true, there should be some credible references documenting why all these blind studies are invalid. If someone can provide a link to such a reference I'd love to check it out. I'm sure the Audio Engineering Society would be interested as well.

SoNic_real_one · 2011-03-22 5:02 pm

Probably they were, most of them, exposed on a daily basis to FM music and/or iPod mp3's. Also subjects older than 40 should NOT be involved in this kind of "studies".
Do you have the data to confirm or refute the composition of the group? It's like asking a whole train of people coming out of the same dark tunnel to tell the differences between two color TVs.
"Hearing" needs the brain too... short and long term adaptability is a natural thing.

That selection of the sample population would shift the results in the direction stated - that they cannot tell the difference between those two formats. It doesn't mean that they will always be unable to tell...

Jakob2 · 2011-03-22 5:34 pm

RocketScientist said:
SoNic_real_one, you need to read up on your statistics theory. A 50% result is the same as guessing. It has nothing to do with "preferred" anything. It has everything to do with even being able to tell them apart. Which they could do no better than someone randomly guessing. That study is published in a very well respected peer reviewed journal (AES) and has stood the test of 4 years of scrutiny. And they were recording engineers--not people listening to low fi cassette tape.

Normally researchers are a bit reluctant to call a result the same as guessing, so at first a result gives only the statistically based conclusion that "the null hypothesis could not be rejected".

Further conclusions about the reason for this outcome are only valid if they can be drawn by scientific reasoning.

Unfortunately the study lacks controls and had some severe issues, to a degree that I was asking myself how on earth it could pass the review process.

Among the participants was a subgroup of recording engineers, which did better than the others (P ~ 0.08), and a subgroup of three women who gave some alarming results (reminiscent of Leventhal's SSPP - statistically significant poor performance).

Among the issues was an experimenter heavily expressing his opinion that controlled listening would most certainly reveal that a high res format could not be distinguished.

No positive control was incorporated, there was no training of the participants under test conditions, the first system broke at some point during the test, but nobody could say when it happened due to the lack of measurements, no measurement of the system was shown, and no analysis of the content of the sound samples was given.

So it is hard to draw any conclusions.
Nevertheless of course it might be true that no one could tell the formats apart.
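One way to make the "could not be rejected" point concrete is to ask how likely a test of this size would be to flag a small but real effect. The sketch below is only an illustration: the one-sided 5% threshold and the "true" detection rates are assumptions picked for the example, not figures from the study.

```python
from math import comb

def tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

def critical_count(n, alpha=0.05):
    """Smallest number of correct answers whose chance (p = 0.5) tail is <= alpha."""
    for k in range(n + 1):
        if tail(k, n, 0.5) <= alpha:
            return k
    return None  # only happens when n is too small for the chosen alpha

def power(n, true_rate, alpha=0.05):
    """Probability of reaching the critical count if listeners really score true_rate."""
    return tail(critical_count(n, alpha), n, true_rate)

n_trials = 554  # trial count quoted earlier in the thread
for assumed_rate in (0.52, 0.55, 0.60):  # hypothetical true detection rates
    print(assumed_rate, round(power(n_trials, assumed_rate), 2))
```

The closer the assumed true rate is to 50%, the lower the power, so a null result is weaker evidence against very small differences than against large ones.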

RocketScientist · 2011-03-22 5:46 pm

Jakob2 said:
So it is hard to draw any conclusions.

As I said the result has been controversial, but my personal take on the "controversy" is most of the arguments are a stretch. It's like arguing over what kind of bullets are being fired at rebels in Libya.

I'm curious if you also so readily discredit this study:

AES E-Library: Sampling Rate Discrimination: 44.1 kHz vs. 88.2 kHz

And, if it's possible to run a proper blind study that DOES show a clear audible advantage to SACD and/or higher bit rate digital audio, why hasn't one been done? If you have a link to such a study, please post it. Most of the pro SACD stuff is a lot more vague and easy to debunk than the "sounds the same" studies.

I'm really not trying to be difficult, but I'm posting links to very credible, peer reviewed studies that have, more or less, stood the challenge of time and lots of critics. And what I get back in response are completely unreferenced, "armchair" critiques with no references whatsoever to anything that really supports the counter claims.
