DAC blind test: NO audible difference whatsoever

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
This is your own personal misconception of the situation, but then you just state it's all a simulacrum and anything goes.
You misunderstand my words - I didn't state "anything goes". What I mean is that we have an internal mechanism called auditory perception which, like all our perceptual systems, judges what is heard according to how realistic the perceived soundscape is. It's not a matter of "anything goes": evaluating the portrayal of realism in sound is very much defined by the 'rules' & experience gathered from the real world about how auditory objects behave - better to call them models than rules. If the audio playback of a real sonic event contravenes these models then it will be perceived as less realistic, & the perceived deviation from realism depends on how much of the model is contravened. So it's definitely not "anything goes" as far as a judgement about the realism of the illusion created by playback is concerned.

Where it gets complicated, IMO, is that 2 channel stereo is already a fudge as regards realism - it can't portray a soundscape as we would have heard it live - so we are already suspending some of our criteria for judging realistic sound & allowing the audio portrayal to be limited by the medium. But in terms of how we deal with sound, this is a bit like how we evaluate the room acoustics in which we are listening to live sound & subconsciously understand the effect it is having (there are all sorts of added complexities here, such as the recorded room acoustic being played back within the playback room's acoustic, but we seem to cope).

Anyway, that's some of what is behind the words I used - it's not an "anything goes" message - it's very much dictated by how we judge realism in soundscapes & their portrayal through our playback system

It has similarities to the other major piece of playback equipment found in rooms - the TV. We seem to accept its limitations & deal with the images portrayed as a representation of reality. As in our audio systems, TV is not a case of anything goes - we recognise clarity, more realistic hues & colors, more fluid movement, etc - all the factors our visual perception model has learned about how visual objects behave in our perceived world.

I think what you are getting at is that some people like to turn up the color saturation or the contrast or whatever & initially feel that it is a better portrayal - I would suggest mainly because they are trying to compensate for some shortfall in the visual portrayal.
 
@mmerrill

If the following is not the purpose of the device under discussion, I'm sure I don't know what is......
A sampled data stream has a unique relationship to an analog waveform output, if the "actual end use" is not to reproduce this then these exercises are just a waste of time.
I don't disagree with your analysis of the role of psychoacoustics in the way we perceive the audio illusion, but I fail to see how the DACs role figures in this
 
@mmerrill

If the following is not the purpose of the device under discussion, I'm sure I don't know what is......

I don't disagree with your analysis of the role of psychoacoustics in the way we perceive the audio illusion, but I fail to see how the DACs role figures in this

I agree. The whole point of this discussion is whether DACs which are designed to be transparent can be told apart, and under what conditions. DACs which are not designed to be transparent obviously don't need to apply.
 
mmerrill99 said:
All of this would be fine if you could show that the test itself has enough sensitivity to differentiate between differences that are equivalent in range to the type of differences being tested for.
The "type of difference being tested for" was any difference which is fairly plainly audible to a small group of people listening for differences. Such a difference might not be audible to someone not listening for differences (e.g. just enjoying the music). Hence this test is likely to be more sensitive than the usual use of the devices. If they knew in advance exactly what difference to listen for then they probably could have used test equipment instead to measure the electrical difference in the output. They (allegedly) deliberately chose a cheap and a more expensive item because they (allegedly) believed that the difference would be audible - thus attempting in some part to calibrate their tests for more expensive units.

In other words, if your string comparison test can only resolve differences of 1 foot or 1 inch, the only claim you can make is that two strings match to within 1 foot or 1 inch - it's a very basic concept & your twisting & turning is not logical
But the only test which matters is "can I use it on this parcel?". If we knew exactly how long a piece of string to use then we would not need to do the parcel test, but merely measure the length of the string. In audio everyone keeps telling us that we are still at the 'parcel' stage and all our attempts at quantifying the string are useless, yet when cheap and expensive strings are shown to be equally good at tying parcels they then demand that we get our rulers out. Do you see how illogical and unreasonable this is?

How would we quantify the sensitivity of a DAC test? Could we say that it can distinguish $10 from $50 DACs? Or shiny DACs from matt DACs? Note that no electrical parameters can be used, as we are told we don't know what the correct electrical parameters are.

Let us suppose that someone came up with a listening test with suitably well-trained listeners which could reliably distinguish between $100 and $1000 DACs (and let us assume that they were not 'tuned by ear' and have trivial differences such as a wonky frequency response). When the details of the test were made known it would almost certainly be clear that it was so unlike normal music listening that for normal purposes we could regard almost all competent DACs as being indistinguishable.

Are you saying that you deny the end use of audio devices is more than the engineering measurements?
No, I believe that Scott is saying that the role of a DAC is merely to reproduce the signal which emerged from the anti-aliasing filter and entered the ADC. Any colourations should be added in the studio before the ADC, or added by the user after the DAC (e.g. tone controls, 'tube buffers').

No, you are missing the whole point - your claim was that the end goal of a DAC/amplifier, etc was to be accurate according to a prescribed set of measurements.

I contend this is wrong & stops short of what is the actual goal of all audio devices - to create an audio illusion as believable & realistic as possible.
The goal of an ADC/DAC pair is simply to get an undamaged signal from A to B. Same for an amplifier. Anything else is an effects box. Many people do not want "realistic" audio (the goal of hi-fi), but fondly imagine that they do.

"Components as effect boxes" is just a bit of a mantra as is "competently designed" - it is a blinkered view from within the 'accuracy is everything' mindset & again denies the end goal of the audio devices being designed/engineered.
No blinkers, just a desire for sound reproduction. If a particular performer/microphone/speaker/room/listener needs some adjustment then by all means add it - but not in the DAC.
 
mmerrill99 said:
I think what you are getting at is that some people like to turn up the color saturation or the contrast or whatever & initially feel that it is a better portrayal - I would suggest mainly because they are trying to compensate for some shortfall in the visual portrayal.
No, they are trying to achieve an 'impressive' outcome. They probably also turn up the bass and treble tone controls (if they have them). They think more is better.

oivavoi said:
DACs which are not designed to be transparent obviously don't need to apply.
Exactly.
 
@mmerrill

If the following is not the purpose of the device under discussion, I'm sure I don't know what is......
But the problem with Scott's statement is this - how is it determined that a DAC's analog output accurately recreates the analog signal from which the digital signal was derived? We use a set of measurements. As has been stated already - does this set of measurements fully characterise the DAC? What test signal is used in the measurements? Not music. Let's be clear here - no DAC is 100% accurate, so we interpret these measurements as being sufficiently below audible thresholds to be negligible (thresholds which are not based on complex waveforms)

We are extrapolating from all these 'good enough' guesswork measurements & claiming that DAC A is transparent.

I disagree with this extrapolation/claim

I don't disagree with your analysis of the role of psychoacoustics in the way we perceive the audio illusion, but I fail to see how the DACs role figures in this
Every device in the audio chain needs to preserve & transmit the signal to produce the end result which is judged by auditory perception. If each device does this job correctly, psychoacoustics should match valid & accurate measurements. Unfortunately, we are often making judgements with an incomplete set of measurements, & often our auditory perception tells us something different from what the measurements suggest should be a transparent device.
 
I agree. The whole point of this discussion is whether DACs which are designed to be transparent can be told apart, and under what conditions. DACs which are not designed to be transparent obviously don't need to apply.

Yes, if the purpose of this ABX test was to see if a number of untrained listeners could tell apart two devices in some system of unknown quality, then yes, I guess it's job done but the o/p seems to have tried to make general claims based on his limited knowledge of how to do perceptual testing. It's a perfect example of just what not to do in such testing.

I really can't see how his test comes anywhere near the question you suggest " whether DACs which are designed to be transparent can be told apart, and under what conditions."
 

TNT

snip

Every device in the audio chain needs to preserve & transmit the signal to produce the end result which is judged by auditory perception. If each device does this job correctly psychoacoustics should match valid & accurate measurements.... .

*IF* a ("stereo") system, as we know it, is theoretically capable of an accurate reproduction of "reality". I'm not so sure, so the efforts within the current concept might be in vain. The DAC, however, is probably not the biggest problem - the problems lie at the ends of the chain. If the reproduction system is in fact flawed, it is hard to aurally judge one component in such a chain.

//
 
Maybe what we need is blind evaluation of tests. Potential critics are told all about the test, apart from the devices under test and the result. They can then decide whether the test is good enough, or not, or 'cannot tell'. Only after they have announced their verdict can the DUT and test result be revealed. That would stop all this 'you did the test wrong' as code for 'I don't like the result' or 'an excellent test' as code for 'you got the result I expected'. I suspect that the critics would say that they can only evaluate a test if they know what the DUT are i.e. only sighted test evaluations can be done, as unsighted test evaluations are too stressful?

I am still waiting for criticism of Mark's tests, which happened to get the opposite result (DACs were distinguishable), yet the silence from test critics is deafening. Of course, it could be that finding a difference is the only way of validating a test so that by definition all tests which find no difference are bad tests.
 
The "type of difference being tested for" was any difference which is fairly plainly audible to a small group of people listening for differences. Such a difference might not be audible to someone not listening for differences (e.g. just enjoying the music). Hence this test is likely to be more sensitive than the usual use of the devices. If they knew in advance exactly what difference to listen for then they probably could have used test equipment instead to measure the electrical difference in the output. They (allegedly) deliberately chose a cheap and a more expensive item because they (allegedly) believed that the difference would be audible - thus attempting in some part to calibrate their tests for more expensive units.
Again, you really don't understand (I suspect willfully so) the use of controls within such tests - your replies belie this misunderstanding. I've explained it often enough so I suggest you read about such controls yourself


But the only test which matters is "can I use it on this parcel?". If we knew exactly how long a piece of string to use then we would not need to do the parcel test, but merely measure the length of the string. In audio everyone keeps telling us that we are still at the 'parcel' stage and all our attempts at quantifying the string are useless, yet when cheap and expensive strings are shown to be equally good at tying parcels they then demand that we get our rulers out. Do you see how illogical and unreasonable this is?
I'm really not interested in reading the string analogy - this ABX test is intended to be a difference test but the tester (nor the reader) has any idea of the level of differences that can be revealed by that particular test setup, participants, etc. - some controls are needed to begin to answer this.


How would we quantify the sensitivity of a DAC test? Could we say that it can distinguish $10 from $50 DACs? Or shiny DACs from matt DACs? Note that no electrical parameters can be used, as we are told we don't know what the correct electrical parameters are.
Read up on hidden anchors & controls

Let us suppose that someone came up with a listening test with suitably well-trained listeners which could reliably distinguish between $100 and $1000 DACs (and let us assume that they were not 'tuned by ear' and have trivial differences such as a wonky frequency response). When the details of the test were made known it would almost certainly be clear that it was so unlike normal music listening that for normal purposes we could regard almost all competent DACs as being indistinguishable.
Again, you need to inform yourself about blind tests to answer your questions


No, I believe that Scott is saying that the role of a DAC is merely to reproduce the signal which emerged from the anti-aliasing filter and entered the ADC. Any colourations should be added in the studio before the ADC, or added by the user after the DAC (e.g. tone controls, 'tube buffers').


The goal of an ADC/DAC pair is simply to get an undamaged signal from A to B. Same for an amplifier. Anything else is an effects box. Many people do not want "realistic" audio (the goal of hi-fi), but fondly imagine that they do.
See my answer to scottjoplin


No blinkers, just a desire for sound reproduction. If a particular performer/microphone/speaker/room/listener needs some adjustment then by all means add it - but not in the DAC.

I want a reproduction which is informed by knowledge of psychoacoustics - how we get there seems to be where we diverge - I come at it from the end goal of auditory perception & its understanding.

I know what Scott & you are saying - get accurate reproduction first & then we can adjust to suit the vagaries of our rooms, hearing, etc.

My point is that the measures of accuracy are flawed - 100% accuracy is impossible & thus begins the compromises over what is important & what isn't as far as what we can hear. It's where engineering meets psychoacoustics & this is where the understanding of psychoacoustics should be uppermost, informing if these target thresholds are actually psychoacoustically correct for the auditory perception of music.
 
*IF* a ("stereo") system, as we know it, is theoretically capable of an accurate reproduction of "reality". I'm not so sure, so the efforts within the current concept might be in vain. The DAC, however, is probably not the biggest problem - the problems lie at the ends of the chain. If the reproduction system is in fact flawed, it is hard to aurally judge one component in such a chain.

//
I know there is a common view held that the transducers & room interactions are the most flawed & therefore should receive the most attention but not all flaws are psychoacoustically equivalent.

Just as a reference, Earl Geddes is also of the opinion that distortions in electronics are more psychoacoustically noticeable than distortions in speakers.
 
<snip> .... cancer research .... <snip>

As we know, there is an ongoing debate about the so-called replication crisis in science - difficulties based on methodology and, of course, on the human observer, who is much more prone to error than a meter reading. Furthermore there often seems to be a problem with statistics, which has led some researchers to conclude that a surprisingly large proportion of published scientific work is seriously flawed.

Otoh statistics can help even in this situation, and under certain assumptions one can calculate the probability that a positive study result will still be positive in a replication attempt.
If you assume a positive result at a significance level of 0.05, with an actual p just below 0.05, the calculated probability that a replication will again show a significant result is nearly a coin flip. (Of course this short summary neglects a lot of variables, and we can/should expand in another thread.)
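The coin-flip claim can be checked numerically. Under the simplifying assumption that the true effect size equals the observed one, a result that only just reached p = 0.05 (two-sided) has roughly 50% power on an exact replication. A minimal standard-library sketch (the function name is illustrative, not from any post in this thread):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

def replication_probability(p_two_sided: float, alpha: float = 0.05) -> float:
    """Probability that an exact replication is again significant at the
    same two-sided alpha, assuming the true effect equals the observed one
    (a strong, simplifying assumption that neglects many variables)."""
    z_obs = nd.inv_cdf(1 - p_two_sided / 2)   # z-score of the observed result
    z_crit = nd.inv_cdf(1 - alpha / 2)        # critical z for significance
    # The replicate's z-statistic is centred on z_obs with unit variance:
    return 1 - nd.cdf(z_crit - z_obs)

print(round(replication_probability(0.05), 3))  # → 0.5
```

A result at p = 0.01 fares better (about 73% replication probability under the same assumption), which is one reason some fields argue for stricter significance thresholds.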

See the bit above about the state of cancer science. Seriously, we can do better than relying on pink noise or Clark's 1984 ABX box with TL074 opamps and some back-to-back diodes for introducing distortion. That was more than 30 years ago...

Of course, given that newer experiments usually tend to find lower thresholds than older ones, we could expect this to be true in our case too. But I thought that you had already questioned those older figures.
Btw, both Clark and Frindle reported that these results were reached with music samples, not noise or specially constructed artificial waveforms.

He states these differences in the paper, but I'm not aware of any published research from him confirming this, sample sizes, etc. Did he ever publish those studies along with the details? See above about cancer R&D's inability to reproduce fundamental research. And then tell me whether you think audio research is more or less disciplined and funded than cancer research.

As said above, there are quite a lot of reasons (some established, and at least some conjectured) for the so-called replication crisis, but there is still the often-mentioned problem of not-so-well-defined research objectives, which seems to affect our discussion in this thread too.

No; although Clark's article was published in the peer-reviewed JAES, he did not describe the experiments in detail - he just reported what was found and supplied a graph with various conditions of level differences (broadband, 3 octaves wide, and so on). The aforementioned level difference was for the broadband condition; according to Clark, the figure for a 3-octave-wide level difference centred on the most sensitive region of human hearing was even a bit lower.

Frindle wasn't more explicit either.

But I haven't written that they gave estimates for the underlying population distribution, nor that they used sample sizes large enough to allow any further conclusions about populations.
Given your various descriptions of the prizes you were willing to pay to the first ones reporting (using your software), it seemed obvious that you weren't looking for those population parameters - am I mistaken?

Therefore I was a bit surprised that you complained about the missing details of those older attempts, while you would not get much more than the signed result from people using your software.
As said before, that might be me misunderstanding the question to be examined; as you were talking about the level differences with piano notes, that might be the point where I misunderstood.

If you are in fact just looking for listeners doing the "piano note test", that might be a different thing, with special exclusion/inclusion of music samples, but you still wouldn't know about the equipment listeners were using, because no measurements would be included. At that point I am again wondering why you complained about the lack of detailed test setup description from Frindle/Clark, because I'd say you won't get more information from your listeners.

In another post you wrote about being interested in knowing what people could detect during "casual listening" and - I think Mark4 and MMerill99 already mentioned it - it is questionable whether listeners can do "casual listening" while "ABXing" under the additional impact of a prize win. At least we know from SDT experiments that monetary rewards do have an impact on the internal decision criteria of the participants.

But if you showed me videos of people doing it, I'd flip my opinion instantly. It's that simple.

Obviously in reality it is never that simple. :)
As said above, you seem unwilling to accept Clark's and Frindle's reports as valid numbers (due to the missing detailed information), while you won't get much more information from your test/bet proposals.

I'd still be interested in knowing whether your software randomly assigns the music samples to "A" and "B" in _each_ trial, or only once for a whole test run?
And what you think/do about the familywise error?
Doing multiple tests will most likely produce positive results by chance too - what about the multiple comparison problem?
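The familywise-error point is easy to quantify: if each of m independent ABX runs is judged at a per-test alpha of 0.05, the chance of at least one false positive grows quickly with m. A small standard-library illustration (function names are mine, for the sketch only):

```python
def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests,
    each run at per-test significance level alpha."""
    return 1 - (1 - alpha) ** m

def sidak_corrected_alpha(fwer: float, m: int) -> float:
    """Per-test alpha needed to hold the familywise error rate at
    `fwer` across m independent tests (Sidak correction)."""
    return 1 - (1 - fwer) ** (1 / m)

print(round(familywise_error(0.05, 10), 3))       # → 0.401
print(round(sidak_corrected_alpha(0.05, 10), 4))  # → 0.0051
```

So ten independent null tests at alpha = 0.05 have a roughly 40% chance of producing at least one "significant" result by luck alone, which is why repeated ABX runs need some correction before a single positive run is taken seriously.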

Yes, the tradeoff as you know is small trial count means it's easy to guess your way to success. If we're always picking 10 trials for example, then we can increase the cost of guessing your way to success (such that it's probably not worth it) while not overtaxing the listener.

Which is imo a problematic take on probability. Guessing 7 correct answers in a 7-trial test is even less probable than correctly guessing 9 out of 10 or 12 out of 16, but low-trial tests suffer from a much higher risk of committing a beta (Type II) error.
So doing only 10-trial tests is only warranted if the detection ability of listeners _under_ the _specific_ test conditions is really good.
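The trial-count comparison above is just the binomial tail probability under pure guessing (p = 0.5 per trial), and can be checked with the Python standard library:

```python
from math import comb

def guess_probability(correct: int, trials: int) -> float:
    """P(at least `correct` successes in `trials` trials) when every
    answer is a pure 50/50 guess - i.e. the chance of passing by luck."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for correct, trials in [(7, 7), (9, 10), (12, 16)]:
    print(f"{correct}/{trials}: {guess_probability(correct, trials):.4f}")
# prints:
# 7/7: 0.0078
# 9/10: 0.0107
# 12/16: 0.0384
```

This confirms the claim: 7/7 is the hardest of the three to reach by guessing, even though the longer tests give a real (non-guessing) listener more chances to demonstrate a marginal ability, i.e. lower beta error.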

And I should add that what I'm suggesting isn't the absolute limit of human hearing abilities.<snip>

A threshold in psychophysics is usually a 50% number, while a JND is around 70-75% correct responses. (Not necessarily used in multidimensional perceptual evaluations).
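For context, those percent-correct figures map onto signal-detection theory: in a two-alternative forced-choice task, proportion correct converts to the sensitivity index d' via the common approximation d' = sqrt(2) * z(pc). (A proper ABX analysis uses a different decision model; this sketch, with an illustrative function name, only shows why 50% is chance and ~75% is the conventional JND point.)

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard-normal CDF

def dprime_2afc(proportion_correct: float) -> float:
    """Sensitivity d' for a 2AFC task, via the standard
    approximation d' = sqrt(2) * z(proportion correct)."""
    return sqrt(2) * z(proportion_correct)

print(round(dprime_2afc(0.50), 3))  # chance performance → 0.0
print(round(dprime_2afc(0.75), 3))  # conventional JND point → 0.954
```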

To me, that is what is so helpful about artificially degrading the stream in real time: You establish limits to the types of distortion you can readily hear, you pick a safety factor, and you are done. And you can have enormous sample sizes.

As said before, imo your software/idea is a nice/useful attempt (especially as a training instrument), but the overall idea is a bit unclear to me.
As said above, the "casual listening" concept isn't really compatible with the reward and training ideas, and detailed information about the various test setups is still missing.

In another post you were (iirc) talking about a database for the different distortion mechanisms (which is a completely different topic) and about concluding from those numbers to the general audibility of differences in measured performance, which is imo yet another separate topic.

With all due respect, and as said above, I might be mistaken on some points.
 
mmerrill99 said:
But the problem with Scott's statement is this - how is it determined that a DAC's analog output accurately recreates the analog signal from which the digital signal was derived? We use a set of measurements. As has been stated already - does this set of measurements fully characterise the DAC? What test signal is used in the measurements? Not music. Let's be clear here - no DAC is 100% accurate, so we interpret these measurements as being sufficiently below audible thresholds to be negligible (thresholds which are not based on complex waveforms)

We are extrapolating from all these 'good enough' guesswork measurements & claiming that DAC A is transparent.

I disagree with this extrapolation/claim
The test signal used is unlikely to matter, because a DAC is unlikely to have the sort of problems which might be affected/exposed by particular signals. I realise that this answer will not satisfy the 'we know almost nothing about sound' brigade. We know roughly what parameters are needed for sound reproduction, and we know that the DAC (unless deliberately 'voiced') exceeds almost all of these requirements by quite a margin. Filter characteristics are the main issue. Some people get excited about jitter. In both cases a DAC is much better than LP.

Every device in the audio chain needs to preserve & transmit the signal to produce the end result which is judged by auditory perception. If each device does this job correctly, psychoacoustics should match valid & accurate measurements. Unfortunately, we are often making judgements with an incomplete set of measurements, & often our auditory perception tells us something different from what the measurements suggest should be a transparent device.
This is the classic 'we don't know enough' argument. When tested (by listening), it is often found that actually we do know quite a lot; the usual response is to question the tests. FUD is a powerful economic weapon.

Again, you really don't understand (I suspect willfully so) the use of controls within such tests - your replies belie this misunderstanding. I've explained it often enough so I suggest you read about such controls yourself
Are you asking that they first buy (or construct) a really bad DAC in order to confirm that they can hear it? How bad does it have to be? They thought they had already done this with the $30 DAC.

I'm really not interested in reading the string analogy - this ABX test is intended to be a difference test but the tester (nor the reader) has any idea of the level of differences that can be revealed by that particular test setup, participants, etc. - some controls are needed to begin to answer this.
You miss the point. The test was not 'how different are these devices?' (how much longer is one string?). The test was 'can we tell them apart, in this system?' (do they both secure the parcel adequately?). It may be false to extrapolate to 'can anyone tell them apart in any system?' but that was not the test.

I know what Scott & you are saying - get accurate reproduction first & then we can adjust to suit the vagaries of our rooms, hearing, etc.

My point is that the measures of accuracy are flawed - 100% accuracy is impossible & thus begins the compromises over what is important & what isn't as far as what we can hear. It's where engineering meets psychoacoustics & this is where the understanding of psychoacoustics should be uppermost, informing if these target thresholds are actually psychoacoustically correct for the auditory perception of music.
OK, which measures of accuracy are flawed? What would you like to replace them with? Or is this just more FUD?
 
The test signal used is unlikely to matter, because a DAC is unlikely to have the sort of problems which might be affected/exposed by particular signals. I realise that this answer will not satisfy the 'we know almost nothing about sound' brigade. We know roughly what parameters are needed for sound reproduction, and we know that the DAC (unless deliberately 'voiced') exceeds almost all of these requirements by quite a margin. Filter characteristics are the main issue. Some people get excited about jitter. In both cases a DAC is much better than LP.
It's in your use of the words "likely" & "roughly" where you need to seek the answer

This is the classic 'we don't know enough' argument. When tested (by listening), it is often found that actually we do know quite a lot; the usual response is to question the tests. FUD is a powerful economic weapon.
Funnily enough, when people use the type of ABX test that started this thread they almost always get a "no difference found" null result


Are you asking that they first buy (or construct) a really bad DAC in order to confirm that they can hear it? How bad does it have to be? They thought they had already done this with the $30 DAC.
You still refuse to look into hidden anchors & controls - oh well.


You miss the point. The test was not 'how different are these devices?' (how much longer is one string?). The test was 'can we tell them apart, in this system?' (do they both secure the parcel adequately?). It may be false to extrapolate to 'can anyone tell them apart in any system?' but that was not the test.
Yes, in this test of unknown quality there was no difference found - so what does that tell you?


OK, which measures of accuracy are flawed? What would you like to replace them with? Or is this just more FUD?
Ones that can correlate with auditory perception
 
Maybe your definition of psychoacoustics is different - I understand it to mean auditory perception
Now if you are saying auditory perception is a red herring, I beg to differ

This is great, it's becoming an exercise in logic, pedantry and semantics. I know who I think is winning, and yes, it is a competition if you hadn't already realised. Keep up the good work folks, I'm learning all the time......

See?
 