Can a double blind test really be double blind?
Posted 27th December 2016 at 12:48 PM by jan.didden
Updated 27th December 2016 at 12:51 PM by jan.didden
This blog is not about audio. Or at least not in the sense of a design or equipment discussion. It's about how we as humans tick, and possible (probable?) implications for how we form opinions and views about what we hear, about a particular design or sound.
My long-standing interest in this area was recently rekindled by a couple of posts from Mark4w and a book he recommended (Thinking, Fast and Slow, by Daniel Kahneman). At about the time I received the book I also came across a scientific discussion on TV about the process our brain goes through to form an opinion and serve it up to our consciousness as 'this is how it is'.
I thought back to all the discussions I've had on diyaudio about sighted versus blind listening. Without opening Pandora's box yet again, in a nutshell: the brain uses every input it can get its hands on to form an opinion. So when you listen to, say, a new amp, the brain uses not only the sound coming in through your ears, but also how the amp looks, what people you trust think about it, whether you paid a handsome sum for it or spent many, many hours building it, etc. Some of these are direct inputs from your senses, but a significant part comes from your memory. Thus, if we want to judge an amp on 'just' its sound, we want to eliminate all those non-sound inputs, hence the call for (double) blind controlled testing. So far so good, nothing new here.
The implication of the above is that if we shut off all those extraneous inputs, the brain has only the sound to work on, and that gives us a clear, uncontaminated judgement about the sound. But I am not so sure we can force the brain's hand, so to speak, in this way. It is very unnatural for the brain to do that; it is wired NOT to do that. What the TV science show discussed was that in such an 'environmentally deprived situation' the brain starts to make up what it feels should be there, even if it isn't! Many observations support this view. For instance, it is known that in prolonged solitary confinement the brain does make things up: you start hallucinating. One interesting case: why do people who report seeing ghosts, aliens etc. almost always report them in dim light, in the dark of a bedroom, and on similar occasions? Typically these are occasions where sensory input falls to a very low level and the brain starts filling in, based largely on previous experience and memory. We probably all remember lying awake in the dark while a chair started to look like a crouching beast.
So, when we participate in a controlled double blind test, do we really limit ourselves to 'just' the sound? Or is the brain still filling in some blanks and thus skewing our judgement?
Can a double blind test ever be truly double blind?
Total Comments 33
Comments
No Jan, 'figment of imagination' is an incorrect labelling of the illusion of stereo (or indeed all the illusions of perception). Imagination is largely unconstrained (we're free to imagine whatever we like), but in the case of the illusion of stereo what we perceive is constrained by the vibrations in the air picked up at our ears.
Posted 30th December 2016 at 11:36 PM by abraxalito
Quote: "No Jan, 'figment of imagination' is an incorrect labelling of the illusion of stereo (or indeed all the illusions of perception). Imagination is largely unconstrained (we're free to imagine whatever we like), but in the case of the illusion of stereo what we perceive is constrained by the vibrations in the air picked up at our ears."
And that has been the theme all along; I don't think it is in any way controversial that perception is the result of combining and integrating external inputs, memory, experiences, expectations etc; it's the very reason why we would like to do controlled testing if we are interested in the sound only.
So we set up the test such that we shut off everything except the sound, but my question is: can we really do that? Sure, we can close our eyes; we can make sure we have no idea what the DUTs are or what is playing at any one time. But we still have our memory, experiences, expectations, etc. The example was given of someone who is convinced that HiRes sounds better than CD and then sets up a controlled test to prove it. Well, in this case there is no chance in hell that it comes up with no discernible difference, so this test is strongly biased: the outcome is already determined before it starts. I am willing to accept that this guy did his best to be impartial and honest and all that, but we all know how strong unintended clues etc. can be.
So still my question is: can a (double) blind test ever really be (double) blind at all?
Posted 31st December 2016 at 08:42 AM by jan.didden
Updated 31st December 2016 at 08:46 AM by jan.didden
Then it seems you're twisting the normal use of imagination (which is entirely voluntary) into something constrained. So the word 'imagination' isn't apposite; the word 'illusion' is better suited.
Take optical illusions: in the case of, say, spurious 'neon' dots (go look at Donald Hoffman's webpage of examples), we're constrained by sensory input, and even though we know intellectually they're not there in the visual field, we're not free to 'imagine' them away. Hence the dots aren't imaginary, they're illusory. 'Imagination' is entirely the wrong word to be using.
Posted 31st December 2016 at 09:18 AM by abraxalito
OK, it's illusory then. But I still have the question that started this blog. Can we really have a (double) blind or otherwise controlled test in which the brain constructs the opinion that gets transmitted to my conscious thought relying only on the sound, with nothing else also playing a role?
Posted 31st December 2016 at 10:51 AM by jan.didden
'The sound' is the output of the brain's perceptual processing, not its input. The input is vibrations in the air. So the sound is an illusion, a construct of the brain, constrained by what's picked up and transmitted to it by the ears. The idea that _nothing else_ could be used by the brain as input beyond what comes from the ears would seem to me to be an imaginary one. But isn't the point of double blind testing not that nothing else _can_ be used, but rather that nothing else correlated with the identity of A/B is _available_ to the brain to be used?
Posted 31st December 2016 at 11:51 AM by abraxalito
Not saying I have the answers, just the opposite, but I am wondering. I don't believe it is possible to shut down the brain's autopilot, and then the question becomes, as you rightly point out, whether or not the brain's unasked-for assistance has a bearing on the test you are participating in. Hmm.
Posted 31st December 2016 at 01:15 PM by jan.didden
Jan, I'm not sure you understand ABX testing. Have you done any? "The example was given from someone who is convinced that HiRes sounds better than CD and then sets up a controlled test to proof it. Well, in this case, there is no chance in hell that it comes up with no discernible difference, so this test is strongly biased: the outcome is already determined before it starts."
By the very design of this Foobar ABX test (& any DBT ABX test), the participant doesn't know whether the unknown track is the high-res or the RB version. Just to explain how ABX works: the participant always has tracks A & B available, which he can play at any stage. He knows which of these is RB & which is high-res; they are simply references to refresh one's hearing. He is then presented with an unidentified X, which is either A or B. His task is to select what X is: either A or B.
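For readers who haven't used an ABX comparator, the trial logic just described can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not foobar2000's actual code: the file names are hypothetical, and the "listener" is a stand-in function that receives A, B and X and must answer "A" or "B". In the simulation the file name stands in for the sound itself, so comparing X with A models hearing the difference. Because X is drawn at random on every trial, the only way to score well is to tell X apart.

```python
import random
from typing import Callable, Optional

# A "listener" is any function that receives the audio for A, B and X
# (here just hypothetical file names) and answers "A" or "B".
Listener = Callable[[str, str, str], str]


def run_abx(listener: Listener, n_trials: int = 16, seed: Optional[int] = None) -> int:
    """Run n_trials ABX trials and return how many times X was identified correctly."""
    rng = random.Random(seed)
    refs = {"A": "red_book.wav", "B": "hi_res.wav"}  # hypothetical files; A and B are known
    correct = 0
    for _ in range(n_trials):
        hidden = rng.choice(["A", "B"])  # X is re-drawn at random on every trial
        x = refs[hidden]                 # the listener gets the audio, never the label
        if listener(refs["A"], refs["B"], x) == hidden:
            correct += 1
    return correct


if __name__ == "__main__":
    # Stand-in for a listener who can genuinely tell the two apart.
    hears_it = lambda a, b, x: "A" if x == a else "B"
    # Stand-in for a listener who always answers "A", whatever X is.
    always_a = lambda a, b, x: "A"
    print("hears the difference:", run_abx(hears_it, seed=1), "/ 16")
    print("always answers A:    ", run_abx(always_a, seed=1), "/ 16")
```

The listener who can distinguish the two scores 16/16; the one who always answers "A" scores however many times X happened to be A, which is about half on average.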
His selection is therefore based purely on what he hears & not on his previous conviction. This is repeated a number of times (16 is the usual minimum) to rule out the possibility that he got it right by random selection (hence the statement seen on the Foobar screen, "the chances of guessing are 1%" or whatever).
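The "chances of guessing" figure can be reproduced, at least approximately, with a one-sided binomial calculation: if the listener hears no difference, each trial is a 50/50 coin flip. The sketch below assumes independent trials; whether foobar2000's ABX component computes its figure in exactly this way is an assumption here. For 16 trials it gives about 3.8% for 12 correct and about 1.1% for 13 correct.

```python
from math import comb


def p_guessing(correct: int, trials: int) -> float:
    """Probability of getting `correct` or more right out of `trials`
    when every answer is a pure 50/50 guess (one-sided binomial tail)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for k in range(8, 17):
        print(f"{k}/16 correct: chance of guessing = {p_guessing(k, 16):.2%}")
```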
A conviction that there will be a particular positive outcome is exactly what the test is designed to neutralise: it prevents that conviction from producing a false positive result.
As Abrax says, you are missing the point about imagination & illusion: the important point he makes is that the illusion is based on, and constrained by, the features contained in the sound waves.
So, again, in an ABX test, it doesn't matter what you are conjuring up in your imagination while listening to X, your job is to match X (the unknown) to either A or B.
The only way the outcome can be predetermined in such a test is when someone has a bias that they will hear no difference, so they don't really apply themselves; they just choose randomly. The result of such a test will be a score around the 50% expected from guessing.
Posted 31st December 2016 at 02:22 PM by mmerrill99
mmerrill99, I already conceded that my use of illusion and imagination was mixed up; see my earlier reply. I always try to use the right words because I know people will grab any spelling error to throw a discussion off, as happened here as well, but I am certainly not perfect, and luckily we succeeded in getting back to the topic at hand.
As to an ABX test, I have a Van Alstine ABX box permanently wired into my system, so yes, I am somewhat familiar with it.
My point is that once you decide that HiRes sounds better than CD, and then you design some test to prove it, that's a bias if there ever was one. I really do not care how the test and/or test results are described; I have no confidence in a test where the outcome, that there is an audible difference, is decided before it starts. As I said before, I have no reason to doubt the tester's honesty and sincerity, but that is not the question here. The question here is what, if anything, the brain does to influence a test, even a (double) blind one. A test with a predetermined outcome is ripe for that.
That is why I will not discuss this test any further, sorry.
Posted 31st December 2016 at 02:33 PM by jan.didden
OK, Jan, then seeing as you understand ABX testing, please tell me how this statement holds: "you design some test to prove it". This Foobar ABX test is not designed by him; it is a predesigned test. I suggest you try using it & then tell us how a preconceived notion of which is better, A or B, aids one in determining whether X is A or B. Or tell us how, using your ABX box in a blind test, an expectation of a particular positive result can predetermine the outcome of the test.
You really make no sense whatsoever in your comments. I had assumed this was because you are unaware of how ABX testing works, but there seems to be something else at play: you want to shut down any further discussion of a particular ABX test, when a real-world practical example is the best way of discussing matters in a real sense instead of in an abstract way.
Most people doing such ABX tests do so because they have already decided that one thing sounds better than another & they want to test themselves to ensure it's not placebo. Are they therefore all going to have a predetermined outcome? The design of Foobar ABX testing prevents such a bias from influencing the outcome. As I said, the only bias that will sail through unchecked & influence the outcome is the expectation that no difference will be heard.
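The claim that a preconceived preference cannot push an ABX score above chance, while a "no difference" attitude can hide a real difference, can be checked with a small simulation. In this hypothetical sketch, passing the hidden identity of X to a rule stands in for access to the sound of X. One simulated listener always answers "B" (a fixed conviction, no audible cue), one flips a coin (doesn't really try), and one is assumed, purely for illustration, to pick up a real cue on 80% of trials and guess on the rest.

```python
import random
from typing import Callable

# A rule maps (rng, hidden identity of X) -> answer "A" or "B".
# Receiving `hidden` stands in for access to the sound of X; rules that
# ignore it are listening with no usable audible cue.
Rule = Callable[[random.Random, str], str]


def correct_rate(rule: Rule, n_trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of ABX trials answered correctly when X is randomised."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        hidden = rng.choice("AB")  # X is A or B with equal probability
        if rule(rng, hidden) == hidden:
            correct += 1
    return correct / n_trials


def convinced_but_no_cue(rng: random.Random, hidden: str) -> str:
    # Certain that B is "the better one", but hears nothing, so always says B.
    return "B"


def expects_no_difference(rng: random.Random, hidden: str) -> str:
    # Doesn't really try: a coin flip on every trial.
    return rng.choice("AB")


def attentive(rng: random.Random, hidden: str, hit_rate: float = 0.8) -> str:
    # Hypothetical listener who really picks up the cue on 80% of trials
    # and guesses on the rest (expected score 0.8 + 0.2 * 0.5 = 0.9).
    if rng.random() < hit_rate:
        return hidden
    return rng.choice("AB")


if __name__ == "__main__":
    for rule in (convinced_but_no_cue, expects_no_difference, attentive):
        print(f"{rule.__name__:>22}: {correct_rate(rule):.3f}")
```

The first two land near 0.5 and only the third lands near 0.9: a fixed conviction without a cue cannot manufacture a positive, whereas not trying can bury a difference that an attentive listener would find.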
Really, Jan, I think you need to refresh yourself on how ABX testing works & what double or even single blind means (Foobar ABX is double blind testing), and what bias it is designed to eliminate. Where can "the unintended clues" arise in a Foobar ABX test? Are you just repeating things that SY has mentioned in his article? I'm sure SY would tell you exactly the same, if he could participate.
Posted 31st December 2016 at 03:10 PM by mmerrill99
Updated 31st December 2016 at 09:55 PM by mmerrill99
Oh well, sucker that I am, I spent some time reading that thread about the Ultmusicsnob test. Another hour wasted.
One guy, upsampling Red Book CD and calling it hi-res, for Pete's sake, saying that he can hear the difference and concluding that hi-res sounds better than CD. Luckily there are a few sensible guys in that thread, but lots of easily impressed lurkers.
Posted 2nd January 2017 at 02:46 PM by jan.didden
Ah, so you looked at the link I gave. Hopefully you can now see that having a preconceived notion going into the Foobar ABX test does not result in a predetermined positive result confirming the bias (unless that bias is that 'there won't be any differences to be found', which often results in the participant not trying as hard to find the audible cue as this guy demonstrates he is willing to).
But unfortunately you seem to miss the point about this test & why I linked to it: it's about the process of ABX testing. I introduced this real-world example of ABX tests to show what's involved; I didn't introduce it as a 'proof' of RB vs high-res, or to get involved in any such sideshow.
It doesn't matter whether what he perceived in sighted listening (& 'proved' with the ABX test) was the result of the software upsampling filter or of the different reconstruction filters in his DAC; what is being detailed is the process of ABX testing for differences other than frequency/amplitude.
If you read about his tests on different jitter files, you will see that in this case he is using audible cues other than soundstage as the differentiator. These jitter files were created by a third party with different amounts & types of jitter, and with the frequency/amplitude of the resulting differences measured. Sorry, the jitter thread on Head-Fi is unfortunately again a long one (361 posts); he comes in about a third of the way through, at post 113: "Jitter Correlation to Audibility", page 8.
Again, I post these links to his ABX tests to show the extreme difficulty involved in doing real world ABX tests & I would posit that there are not many members who have the knowledge/expertise/dedication or interest in going to the lengths he shows are needed to get a result which isn't due to guessing i.e. to 'prove' what they hear sighted.
As I said before "Calling for people to do a "no peeking" test ignores the underlying mechanisms of auditory perception & as a result, the reality of what blind testing of any real value actually entails."
So instead of your blog title "Can a double blind test really be double blind" I would suggest an alternative: "How practical is it to demand DBT 'proof' on an audio forum?"
Posted 2nd January 2017 at 06:40 PM by mmerrill99
Sorry Sir, you tried to make this an ABX test; this is a blog I started and it is NOT about ABX. But I guess when all you have is a hammer, everything looks like a nail.
I have no problem with ABX; on the contrary, it is a very nice tool for trying to do an 'ears only' test. But it's only part of a total test protocol. To make a test acceptable for others, it is NOT enough to use an ABX tool. You need to design the test to the question you would like to answer. Then you need enough participants, enough trials, to make the outcome, whatever it is, statistically significant. Obviously, this Ultmusicsnob just learned how to press the Foobar buttons but has no idea how to do a test that can withstand justified criticism.
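The point about "enough trials" can be made concrete. The hypothetical helpers below find the smallest score that pure guessing would reach with probability at most 5%, and then the chance (statistical power) that a listener who is genuinely right a given fraction of the time reaches that score. They assume independent trials from a single listener; pooling trials across several participants would additionally assume they are all equally sensitive. With only 4 trials no score is significant at 5% (even 4/4 happens by chance 1 time in 16), and with 16 trials 12 correct are needed.

```python
from math import comb


def tail(trials: int, k: int, p: float) -> float:
    """P(at least k correct out of `trials`) if each trial is correct with probability p."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i) for i in range(k, trials + 1))


def critical_count(trials: int, alpha: float = 0.05) -> int:
    """Smallest score that pure guessing (p = 0.5) reaches with probability <= alpha.

    Returns trials + 1 when even a perfect score is not significant.
    """
    for k in range(trials + 1):
        if tail(trials, k, 0.5) <= alpha:
            return k
    return trials + 1


def power(trials: int, p_true: float, alpha: float = 0.05) -> float:
    """Chance that a listener who is right with probability p_true reaches the critical score."""
    return tail(trials, critical_count(trials, alpha), p_true)


if __name__ == "__main__":
    for n in (4, 16, 32, 64):
        print(f"{n:>3} trials: need {critical_count(n)} correct, "
              f"power at 70% hit rate = {power(n, 0.7):.2f}")
```

Running it for a few trial counts shows how quickly a short test can miss a real but modest difference, which is one reason formal protocols fix the number of trials and the significance level in advance.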
And then he doesn't even know what he is testing - definitely not Hires audibility. It is all anecdotal at best; I can post Foobar screens in about 5 minutes showing even better results.
This is no criticism of you, or maybe it is a bit. If you had read that thread as carefully as I did, you probably wouldn't have fallen into the trap.
Chalk it up to learning.
Posted 2nd January 2017 at 06:56 PM by jan.didden
I didn't try to make this into an ABX test - ABX is simply one of many types of blind tests & the one I cited is directly apropos to your statement "In other words, you need to 'listen for differences'. No way around that."
What blind testing was done to establish what you already quoted: "blind testing show fabulous sensitivity to tiny changes in level, frequency response, interchannel timing and localization"?
I see that you are now suggesting something more than an ABX test. Do you expect forum members to 'prove' their listening perceptions? You wrote: "To make a test acceptable for others, it is NOT enough to use an ABX tool. You need to design the test to the question you would like to answer. Then you need enough participants, enough trials, to make the outcome, whatever it is, statistically significant."
Do you know what statistical significance is & how it is calculated? Do you know what the statistical significance of UltMusicsnob's ABX tests are?
I repeat again: "Calling for people to do a "no peeking" test ignores the underlying mechanisms of auditory perception & as a result, the reality of what blind testing of any real value actually entails." I was making the case for why I think the usual forum demand for blind tests of reported differences was gravely mistaken.
And you do understand that the need for such a formal & statistical approach to sensory testing is precisely because of the inexact nature of what's being tested.
I've no problem with 'justified' criticisms but where have you justified any of your criticisms - I see no sign of that.
Your criticism that a bias predetermined a blind test outcome was shown to be uninformed & unjustified.
Your criticism "And then he doesn't even know what he is testing - definitely not Hires audibility. It is all anecdotal at best" is directly addressed by him in post #8 on that thread: "What I have proven rigorously is that when I make the alterations to the file that I describe -- upsampling, word length, nothing else -- then the result at my ears, on my signal path, is one I describe as "superior", and can reliably detect, using the recommended blind tool foobar2000." It couldn't be more clearly stated. Your misunderstanding/misrepresentation of that thread is again erroneous & unjustified.
You also need to justify your claim about me: exactly what 'trap' are you claiming I fell into?
Posted 2nd January 2017 at 10:24 PM by mmerrill99