Can a double blind test really be double blind?

Posted 27th December 2016 at 12:48 PM by jan.didden
Updated 27th December 2016 at 12:51 PM by jan.didden

This blog is not about audio. Or at least not in the sense of a design or equipment discussion. It's about how we as humans tick, and possible (probable?) implications for how we form opinions and views about what we hear, about a particular design or sound.

My long-standing interest in this area was recently triggered again by a couple of posts from Markw4, and a book he recommended (Thinking, Fast and Slow, by Daniel Kahneman). At about the time I received the book I also came across a science discussion on TV about the process our brain goes through to form an opinion and serve that up to our consciousness as 'this is how it is'.

I thought back to all the discussions I've had on diyaudio about sighted versus blind listening. Without opening up Pandora's box yet again, in a nutshell: the brain uses every input it can get its hands on to form an opinion. So when you listen to, say, a new amp, the brain not only uses the sound coming in through your ears, but also how the amp looks, what people you trust think about it, whether you paid a handsome sum to get it or spent many, many hours building it, etc. Some of these are direct inputs from your senses, but a significant part comes from your memory. Thus, if we want to judge an amp on 'just' its sound, we want to eliminate all those non-sound inputs, hence the call for (double) blind controlled testing. So far so good, nothing new here.

The implication of the above is that if we shut off all those extraneous inputs, the brain has only the sound to work on, and that gives us a clear, uncontaminated judgement about the sound. But I am not so sure we can force the brain's hand, so to speak, in this way. It is very unnatural for the brain to do that; it is wired NOT to do that. What the TV science show discussed was that in such an 'environmentally deprived situation' the brain starts to make up what it feels should be there, even if it isn't! Many observations support this view. For instance, it is known that in prolonged solitary confinement, the brain does make up things – you start hallucinating. One interesting case: why do people who report seeing ghosts, aliens etc. almost always report them in dim light, in the dark of a bedroom, and on similar occasions? Typically these are the occasions where sensory input falls to a very low level and the brain starts filling in, based largely on previous experience and memory. We probably all remember lying awake in the dark while a chair starts to look like a crouching beast.

So, when we participate in a controlled double blind test, do we really limit ourselves to 'just' the sound? Or is the brain still filling in some blanks and thus skewing our judgement?
Can a double blind test ever be truly double blind?


Comments

I think you are double thinking "double blind" giving it a meaning beyond its technical experimental use

"double blind" simply means that during a trial there is no other information channel for the subject to "read"

specifically the 2nd "blind" in "double blind" is that the experimenter, anyone in the room or in any other communication with the test subject is also "single blind" as to the identity of the specific X trial A/B status

this is to avoid the "Clever Hans" effect

sound waves hitting ears and the best possible neural processing of that signal with training and focus ARE wanted if the question is can you hear the difference

Posted 27th December 2016 at 03:56 PM by jcx

Yes, fully agree, my rambling is not about the first (or 2nd) blind in double blind. I should have made clear it is about the test subject only, and how he/she can be blind. The thrust of my thinking is that this test subject may be cut off from all external sensory inputs except the sound, but does that make his/her listening 'blind'? Does that guarantee that the listening report is not influenced by anything except the sound?
Knowing how devious a mind/brain really is, I doubt that.

As an example, when doing the test you probably know where you are, the time of day (morning, afternoon, evening), probably the names of the people administering the test etc. There would be a clear link in your memory to similar events with the same people with some specific outcome. This may be enough to skew your judgement big time. Hence my question: can a (double or not) blind, controlled test really be blind in the sense that ONLY the sound matters?

Posted 27th December 2016 at 06:17 PM by jan.didden

there is neurophysical feedback between hearing and brain state, expectation of listening for differences along different acoustic perceptual axes like loud/soft, pitch, timbre, attack, timing...
the Haas effect, a zoo of others can actually change the sensitivity of the inner ear sound to neural impulse train transduction

so yes, a subject whose expectation is primed for listening for a particular difference may be distracted from hearing a "clearly audible" difference along another axis

the gorilla suit walking past the basketball players is a classic of perceptual focus, no reason to believe there aren't similar effects in our audio processing

Posted 27th December 2016 at 06:34 PM by jcx

This is what SY has to say about it:

"The unasked question: why does blind testing show fabulous sensitivity to tiny changes in level, frequency response, interchannel timing and localization...,? The issue isn't that ears only reduces sensitivity, there's too much data to the contrary; it's that it fails to confirm the magic differences that form high end marketing and the resulting audio lore. That's really the only controversy. This stuff is the usual excuse-making."

Posted 27th December 2016 at 07:20 PM by jan.didden

There are always issues like those discussed here when human subjectivity is involved. To some extent it may be helpful to approach the problem statistically. We could test many different people using one particular test, as is often done. Or if we are more interested in one specific individual, we could try using different tests at different times and under different circumstances. For some individuals we might see fairly consistent results, which in turn might give us more confidence about the reliability of our measurements for them. For individuals more at the other end of the spectrum, it may be harder to determine their abilities with much reliability or certainty, or how they might be likely to measure in the future.
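As a rough illustration of the statistical approach, here is a minimal sketch that scores a single forced-choice (ABX-style) session against pure guessing. It assumes each trial is independent and has a 50% chance of being right by luck; the function name `abx_p_value` is made up for this example and does not come from any ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value for an ABX-style session.

    Returns the probability of getting at least `correct` answers right
    out of `trials` by guessing alone (assumed chance level: 0.5 per trial).
    """
    if trials < 0 or not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    # Count all outcomes with `correct` or more hits, divide by all outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Hypothetical session scores, for illustration only.
    for correct, trials in [(8, 10), (12, 16), (14, 16)]:
        print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

Under these assumptions, 8 right out of 10 still has roughly a 5.5% chance of happening by luck, which is one reason repeating the test on different days and under different circumstances says more than a single short run.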

Posted 28th December 2016 at 12:26 AM by Markw4

The issue that you raise is fundamental to our perceptions - each one of them is attempting to resolve the problem of making a model of the exterior world from moment to moment neurological signals. This problem is an ill-formed one in that often there is not enough information to reach a unique solution - hence we are always dealing with best guesses. Our biological & psychological demeanour is to try to avoid insecurity so we use all the inputs at our disposal, including memory, knowledge of objects in the world & signals from all our senses, to try to solve the problem that faces perception. Cutting off the signals from the predominant perceptual sense of vision forces us to use our sense of hearing in a way that we are not accustomed to. Can our brain adjust to this new approach to analysis of the problem? Yes, but it takes time & training - just as the brain can adjust (takes about a week) to the new approach to analysis when the image projected on the fovea is upright rather than the normally inverted form.

I don't see how quoting SY's thoughts advances this discussion - it simply echoes the selective, half-baked, & uninformed dialogue of some of the high-end audio marketing sector he is trying to abuse!

Posted 28th December 2016 at 04:32 PM by mmerrill99

Updated 28th December 2016 at 05:02 PM by mmerrill99

Thanks for your reaction, yes the training point is important. Maybe we can, after all, learn to depend on our ears only.
Surprised that it takes a week to get used to inverted vision, I thought I read somewhere that it would take just a couple of hours, but can't reproduce that.

As to SY's comment, I think he did highlight the important point that (double) blind testing has shown that it can facilitate hearing very subtle and minute differences. And yet the often so-called 'obvious' differences from uncontrolled sighted tests disappear. That's something we need to reconcile.

Posted 28th December 2016 at 06:51 PM by jan.didden

The brain adjustment to upright images: "Your Eyes See Everything Upside Down" (Mental Floss UK)

Yes, but SY mentions that blind testing using (trained) participants can "show fabulous sensitivity to tiny changes in level, frequency response, interchannel timing and localization"

These are typical indicators used in tests where audio snippets are used to identify these minute, subtle differences, yet the 'obvious' differences heard in sighted listening are not usually of this type - they are more often related to holistic differences with 'realism' of the sound, connectedness to the performers, etc - the sort of differences that are the result of the analysis function of auditory perception that I mentioned.

If you want to see what's actually involved in a real blind test for differences which are not just amplitude, frequency, localisation issues which are more suited to A/B testing then look at Ultmusicsnob's posts on head-fi where he walks through blind testing 16/44 Vs 24/192 (he also did blind tests for jitter - worth searching out his posts). What is evident from his approach is that he is not listening for freq/amplitude differences. What is also evident is that most people wouldn't be likely or able to go to the trouble that this recording engineer has done!!

"Keeping my attention focused for a proper aural listening posture is brutal. It is VERY easy to drift into listening for frequency domains–which is usually the most productive approach when recording and mixing. Instead I try to focus on depth of the soundstage, the sound picture I think I can hear. The more 3D it seems, the better."

Program material is crucial. Anything that did not pass through the air on the way to the recording material, like ITB synth tracks, I'm completely unable to detect; only live acoustic sources give me anything to work with. So for lots of published material, sample rates really don't matter–and they surely don't matter to me for that material. However, this result is also strong support for a claim that I'm detecting a phenomenon of pure sample rate/word length difference, and not just incidental coloration induced by processing. The latter should be detectable on all program material with sufficient freq content.
Also, these differences ARE small, and hard to detect. I did note that I was able to speed up my decision process as time went on, but only gradually. It's a difference that's analogous to the difference between a picture just barely out of focus, and one that's sharp focused throughout–a holistic impression. For casual purposes, a picture that focused “enough” will do–in Marketing, that's ‘satisficing’. But of course I always want more.

It took me a **lot** of training. I listened for a dozen wrong things before I settled on the aspects below.

The difference I hear is NOT tonal quality (I certainly don't claim to hear above 22 kHz). I would describe it as spatial depth, spatial precision, spatial detail. The higher resolution file seems to me to have a dimensional soundstage that is in *slightly* better focus. I have to actively concentrate on NOT looking for freq balance and tonal differences, as those will lead you astray every time. I actively try to visualize the entire soundstage and place every musical element in it. When I do that, I can get the difference. It's *very* easy to drift into mix engineer mode and start listening for timbres–this ruins the series every time. Half the battle is just concentrating on spatial perception ONLY

I initially found training my ears to find a difference very difficult. It's *very* easy to go toward listening for tonal changes, which does not help. I get reliable results only when trying to visualize spatial detail and soundstage size, and I tend to get results in streaks. I get distracted by imaginary tonal differences, and have to get back on track by concentrating only on the perceived space and accuracy of the soundstage image.

Posted 28th December 2016 at 07:40 PM by mmerrill99

	Just to clarify, in case it wasn't clear: ultmusicsnob's listening tests are ABX tests for which he posted audits, but that was on Gearslutz, https://www.gearslutz.com/board/elec...-192-24-a.html
	Posted 28th December 2016 at 09:01 PM by mmerrill99 Updated 30th December 2016 at 08:44 PM by mmerrill99

There was a problem reported not too long ago in one of the forum threads here when somebody used foobar ABX testing and could reliably distinguish the difference between two files almost all of the time. The problem was that the files were bitwise identical. When this was discovered, some other people reported they had heard about some problem with foobar playing ABX test files at slightly different volume levels.
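A simple guard against this kind of surprise is to check, before the ABX session, whether the two files are actually different on disk. The following is only a sketch using Python's standard library; the function names are made up for this example, and it compares whole files, so two files with identical audio but different metadata tags would still count as different.

```python
import hashlib


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bitwise_identical(path_a: str, path_b: str) -> bool:
    """True if both files have exactly the same bytes (same digest)."""
    return file_digest(path_a) == file_digest(path_b)
```

If `bitwise_identical("a.wav", "b.wav")` (hypothetical file names) returns True and a listener still scores well above chance, the difference must be coming from the playback chain or the test software rather than from the files.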

Posted 29th December 2016 at 12:54 AM by Markw4

Updated 29th December 2016 at 01:00 AM by Markw4

	Strikes me that 'listening for differences' is a complete distraction and misdirection. I don't want an amp (or DAC or whatever) that's different, I am searching for one that's a more transparent window on the recording.
	Posted 29th December 2016 at 01:00 AM by abraxalito

Jan, I think this sentence is really the punch line: "So, when we participate in a controlled double blind test, do we really limit ourselves to 'just' the sound? Or is the brain still filling in some blanks and thus skewing our judgement?"

My take on it is that the brain is always filling in blanks, all the time, at least my brain is. The only way to shut it up is to fall asleep, and even then... So the challenge is to control this beast to stay focused on the task at hand, which can only be accomplished by training. In other words, a test can only be double blind if the test subject is sufficiently trained in the testing procedures. Otherwise you may test, indeed, very different things than you intended.

Posted 29th December 2016 at 10:53 AM by vacuphile

This "filling in the blanks" idea, in the way it's expressed, is incorrect - it's not a matter of training the mind to not do this - auditory perception is thought to work in this fundamental way & there's no changing it - it's like saying that we can stop our hair growing by training.

Once one realises that perception is, by & large, a guessing game & that we therefore have no way of improving this - it's not "skewing our judgement" - it's our biological machine, flaws & all that we have accommodated to.

So in blind tests we set up unnatural conditions, with unnatural ways of listening & training in how to handle this unnaturalness & then extrapolate the results as representative of what can/cannot be perceived & what is of importance in this hobby?

As Abrax says 'listening for differences' is a complete distraction and misdirection

But what do you think, Jan?

Posted 29th December 2016 at 03:39 PM by mmerrill99

Updated 29th December 2016 at 03:49 PM by mmerrill99

Most blind tests are not really blind, because the subject knows that he is being tested, he almost always knows what the expected result is and, worst of all, he is under pressure (conscious or unconscious) from the experimenter.
There are only two ways to learn the subjects' real opinions: either lie to them about the purpose of the test, which is what psychologists do (for example, saying that the turntable is being tested when it is really the amplifier), or make sure they do not know they are in a test at all. The latter is what is done in supermarkets, for example: put a product on the shelves, hide some cameras and observe the customers' behaviour without anyone knowing they are being observed.
Years ago I sold hi-fi for a few years, and many times I ran tests with customers without them knowing: I made changes to the systems we were listening to (without the customer knowing) and observed their reactions and comments. In all those years not one of them realised they had taken part in a listening test.
If anyone is interested in the results: people prefer vinyl and amplifiers without feedback, and everyone, absolutely everyone, can distinguish the differences between cables.

Posted 29th December 2016 at 04:54 PM by raul_77

Updated 29th December 2016 at 04:57 PM by raul_77

I've read "Fast and Slow" as well, a fascinating book. But you are reading too much into it, Jan. As SY has pointed out, blind tests show remarkable sensitivity to the things we care about. The conclusion that flatter, smoother responses are preferred is also what we expect. So, even if your mind is filling in the gaps, it is pointing us in the right (or at least, expected) direction. The contrary view would be that our hearing ability is so hopeless that without other cues, we are not able to discern the good from the bad.

Posted 30th December 2016 at 03:44 PM by ra7

Quote:

Originally Posted by abraxalito

Strikes me that 'listening for differences' is a complete distraction and misdirection. I don't want an amp (or DAC or whatever) that's different, I am searching for one that's a more transparent window on the recording.

Yeah but you can only know when you reached that when you have some kind of reference. In other words, you need to 'listen for differences'. No way around that.

Posted 30th December 2016 at 03:51 PM by jan.didden

Quote:

Originally Posted by ra7

I don't think I read too much into it, because I just started, still in Ch 1 ;-). But I get your point. Yet, lots of reports mention a certain kind of 'stress' when doing blind testing, which I would attribute to the unnatural state of having the other senses muted. I don't think we should doubt that in such a situation, the brain 'tries to make sense of it', i.e. fill in blanks if given half a chance. But knowing the incredible malleability of the brain, one can probably train to the point that one is totally at ease doing blind testing and getting very good results.

Posted 30th December 2016 at 03:54 PM by jan.didden

"blind tests show remarkable sensitivity to the things we care about." I would argue that this is an incorrect statement & it's patently obvious that these are not the "things we care about" in this audio hobby - this is where the stress lines show in this audio hobby.

It's also patently obvious that our hearing ability is not hopeless in its normal operation & we can discern the good from the bad but it does so by the brain's analysis of all the neurological signals at its disposal.

The dilemma here for many seems to be that when auditory perception is forced to rely on a subset of these signals it can differentiate freq/ampl & timing differences with adequate training, but can it be trained to differentiate more dynamic, holistic differences like realism, connectedness to the performers/performance, etc (the things that we really are interested in)? These attributes are in the auditory signals being received & are NOT figments of imagination. The difference between the attribute of realism & ampl/freq is that realism is about the dynamic relationship between the changing patterns in the soundfield, not about individual static differences like ampl/freq.

The difficulty of recognising these dynamic aspects in the soundfield is illustrated by Ultmusicsnob's writeup of his ABX testing of redbook Vs high-res audio.

But here's the thing - he already knew that he preferred high-res to redbook from sighted testing - what blind testing forced him to do was to find the audible key on which to differentiate. Not an easy task even though he already has identified in sighted listening that high-res sounded superior to him. This also shows that we can identify good sound in normal listening.

So finding the attribute to key on for these tests is not an easy task (these attributes don't just fall out of our ordinary listening) - he tried at least 12 dead-ends (I expect his background as a production engineer greatly helped him in this regard). Staying focused on this attribute (soundfield depth, layering & solidity) proved to be very difficult for him during the tests.

His ABX jitter tests show a whole different set of attributes to key on & the difficulty in finding these along with the concentration & focus needed to do a blind test that has veracity.

Add to this the fact that the music has to be carefully chosen to isolate the particular attributes being tested for.

As can be seen, it's not JUST a matter of training that is at the heart of a successful test, it is many, many factors that most people haven't the time or dedication to undergo.

Calling for people to do a "no peeking" test ignores the underlying mechanisms of auditory perception & as a result, the reality of what blind testing of any real value actually entails.

Posted 30th December 2016 at 05:29 PM by mmerrill99

@ mmerrill99: don't want to waste bandwidth by quoting your valuable post, but have two serious issues with it:

1 - You mentioned 'figment of imagination' - or rather the absence. Well, let me remind you that sound reproduction in general and stereo reproduction in particular are all about figments of imagination. Hint: There is no singer in your room between the speakers - there are only two boxes with flapping membranes. The whole acoustic 'landscape' is a figment of your imagination;

2 - So this guy knew that HiRes sounded better than CD, and he got confirmation after he repeated it in a controlled way? Sorry, I'm not impressed. You say it beautifully: he knew what the outcome had to be, and that forced him to come up with an explanation. Really beautiful. ;-)

Posted 30th December 2016 at 06:59 PM by jan.didden

Updated 30th December 2016 at 07:02 PM by jan.didden

I know only too well that audio replay is a case of how believable is the illusion produced & the better the illusion, the better we consider the replay. We judge the realism of this illusion in the same way as we analyse & judge the sounds in the world - no different. The closer this analysis matches our real-world model of how sound works, the more realistic is the illusion. It's this realism that provides the sense of connectedness with the performance/performers - the stuff that we "really do care about"

What you are misreading in my post (although I don't know why - it seems clearly explained) is that this illusion is created by the audio signal & its analysis by our auditory perception - it's not created by a "figment of imagination" as is meant in the common use of this phrase.

What I said is that it is the analysis of the dynamics within this signal stream that determines how realistic an illusion is produced & not spot differences in freq/ampl that are so often quoted as 'proving' the efficacy & accuracy of blind testing.

I don't know what your problem is with his double blind ABX test results - he had no way of knowing which was high-res in the test - isn't that the point of DBTs? You really don't make sense in "knew what the outcome had to be" - a double blind test avoids this "knowing" so what does your point mean - his DB testing confirmed what he had already determined as his preference in sighted listening? Do you expect blind tests to always contradict sighted listening or for them to be only valid when they do contradict sighted listening?

What is meant by "blind testing forced him to do was to find the audible key on which to differentiate." is that we can best differentiate in ABX blind tests (the type you are talking about) for differences by finding & isolating an audible key difference which can be identified in a snippet of the audio signal. This is what anyone who has done ABX testing knows - it's necessary to identify a specific difference, hence one is 'forced' into searching for this 'tell' in order to participate 'validly' in an ABX test. Otherwise one could just randomly select answers but one has to ask "is this a real test" and/or "what is being tested"

BTW, Jan, it's obvious you haven't read the link I gave to Ultmusicsnob's ABX tests & posts!! Based on your initial blog post you seemed interested in all viewpoints, including real DBT evidence as I linked to but is this no longer the case?

Posted 30th December 2016 at 07:35 PM by mmerrill99

Updated 30th December 2016 at 09:57 PM by mmerrill99
