Double Blind Testing - diyAudio
Old 17th September 2005, 06:59 PM   #1
pjpoes  United States
diyAudio Member
Join Date: Jan 2005
Location: New York
Double Blind Testing

Someone inspired me to write a post about double-blind testing and its implications for audio reviewing or, IMHO, its lack of implications. I think there is a great deal of misunderstanding about this term and about when it is supposed to be used in an experiment. It really does not make the most sense in audio reviewing. As for my qualifications to make this statement: I am a recent graduate of a Clinical Psychology program, where I spent most of my time on research design. I now work as a Research Associate for an agency under Cornell University. Most of my work is done in conjunction with other big universities, and some of it involves pretty high-profile cases. I won't tell you that I am an expert in experimental design, but it is what I went to 8 years of college for, and I probably know more about it than most. I feel that experimental psychology is the field most relevant to the subjective nature of audio equipment review. I do not think that engineering experimental designs would be appropriate, and I do not think that even medical experimental designs would be the most appropriate. When someone wants to test the effect that something has on human senses and human perception, which is where I believe listening to music falls, they would turn to someone like me, not a medical doctor or an engineer.

I pulled out some textbooks from my undergraduate years to get the most basic explanations I could, as some I thought would just not be fair to write up here. Single-blind testing is an experiment in which the treatment group is not informed as to the nature of the experiment. Double-blind is when neither the treatment group nor the experimenter is informed of the experimental condition. This is used to remove experimenter bias, when it is believed that the experimenter could introduce an extraneous variable through his own bias. So let's look at an example. Say we are testing the difference between amp "A" and amp "B". The experimenter has, say, an A/B switch hooked up between a preamp and the amps, and the treatment group, a listener or multiple listeners, sits and listens. The listener then, through some measure, say a checklist or possibly just his educated opinion, tries to identify the differences he hears. With double-blind, the experimenter would not know which amp he is switching to, and the idea is that he could not influence the listener into thinking he hears differences that are not really there. Here I will agree that a double-blind test looks like a good idea.
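
For anyone who wants to see the blinding step concretely, here is a minimal Python sketch. It is not from any textbook or from the post above: the function names, the trial count and the seed are all made up for illustration. It assumes a program (or a third person) draws the A/B order and keeps the key sealed until scoring, so neither the listener nor the person at the switch knows which amp is playing.

```python
import random

def make_blind_schedule(n_trials, seed=None):
    """Return a sealed key mapping trial number -> 'A' or 'B'.

    Hypothetical helper: the key is drawn at random and is only opened
    after the listener has written down all of his answers.
    """
    rng = random.Random(seed)
    return {trial: rng.choice("AB") for trial in range(1, n_trials + 1)}

def score(key, guesses):
    """Count how many of the listener's guesses match the sealed key."""
    return sum(1 for trial, amp in key.items() if guesses.get(trial) == amp)

key = make_blind_schedule(16, seed=2005)   # 16 trials, illustrative seed
guesses = dict(key)                        # a listener who is right every time
print(score(key, guesses))                 # 16
```

In a real session the guesses would of course come from the listener's answer sheet, and the switch positions would be relabelled so that "A" on the box does not always mean the same amp.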

However, we just introduced a whole bunch of extraneous variables that we missed, and we took certain things for granted, including somehow turning the treatment subject into an objective measure, which he/she is not. First, the A/B box is assumed to be transparent in this experiment. In most of science it would be, but in audio it probably is not. Second, we assume that the listener can hear all the differences between these amps within a matter of seconds or minutes. Third, we assume that the subject has a strong sense memory, but research in this area suggests that our short-term sense memory is quite poor, so that would also not be true. Fourth, we assume that the subject can objectify, to the best of his ability, the subjective differences he is experiencing. Finally, we are not comparing against a control group, so we have a subject bias, although we are controlling within subjects by using one amplifier as the control, hopefully.

How does that compare with the experiments that reviewers do in real life? Not at all. We get on their cases and gripe about the need for double-blind testing. However, that is not enough. A simple double-blind test does not take into account all the variables needed, and I think the naturalistic experimental design that reviewers use actually does a better job. It takes into account environment and sense memory, minimizes issues like the transparency of the switch, and takes advantage of subject bias rather than being hurt by it. In my opinion, a true experiment on whether amps sound different would use a highly controlled design that eliminated as many variables as possible, was counterbalanced, and involved not one person or one set of trials but many. It would also take into account all the variables of the subject, such as sense memory, quality of the senses, etc. Another major issue is that our reviewers do not go through training to standardize how they objectify what they hear, so what is bright to one may be detailed to another. They shoot from the hip, when they need actual measures capable of turning subjective experience into objective data. Such instruments exist. They usually use 5-point Likert scale questions and would require training to ensure consistency. Given that this will probably never materialize, and is probably overkill, I really do believe that audio review does not lend itself well to the experimental paradigm, but can be explained accurately through the simple introduction of reliable measures instead.
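
To make "counterbalanced" a bit more concrete, here is a rough Python sketch under my own assumptions: the function name and the listener count are hypothetical, and it only covers the presentation order, not the rest of a controlled design. The idea is that every possible order of the conditions is used about equally often across listeners, so order effects average out.

```python
from itertools import permutations

def counterbalanced_orders(conditions, n_listeners):
    """Give each listener a presentation order, cycling through every
    permutation so each order is used about equally often."""
    orders = list(permutations(conditions))
    return [orders[i % len(orders)] for i in range(n_listeners)]

# Six hypothetical listeners, two amps
plan = counterbalanced_orders(["amp A", "amp B"], 6)
for listener, order in enumerate(plan, start=1):
    print(listener, order)
```

With two amps there are only two orders, so half the listeners hear A first and half hear B first. With more conditions the number of permutations grows quickly, and a Latin square is the usual shortcut.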
Old 17th September 2005, 09:41 PM   #2
diyAudio Member
 
jan.didden's Avatar
 
Join Date: May 2002
Location: Great City of Turnhout, Belgium
Blog Entries: 7
pjpoes,

Nice write-up of a subject that is close to my heart. But I have a few comments. I feel you are mixing up a few things.

Firstly, we are NOT trying to assess the impact of music on the subjects. We are trying to find out whether two pieces of equipment reproduce the same signal differently.

Now, your case that DBTs are flawed because the A/B switch can influence the result, and because subjects lack sound memory etc., is certainly true. But I can immediately use that to kill the value of ANY *subjective* judgement, because by the same reasoning a subjective judgement is ONLY valid for the person making it, and not for me or you. So the validity of subjective judgements for, say, purchase advice is non-existent.

Toward the end you say (if I understand your drift): why not accept subjective judgements as valid judgements? Well, aside from the above objection, there is another one. You agree that extraneous factors determine the result, things that are documented elsewhere also, like the colour, shape, design, brand name, reputation etc. The problem is, we don't want to assess the impact of reputation on the perception of a component's sound (unless we are doing a market survey). We are trying, as I said before, to assess whether a component reproduces the signal differently.

So, to keep this discussion clean, let's first try to make clear what we are trying to say:

Are we indeed trying to assess the difference in sound reproduction? Why then should we accept that, say, physical design has at least as big or even bigger influence on that perceived (or not) difference? Would you accept the outcome of a clinical experiment trying to find out if a certain new medical drug works if you KNOW that the pills' shape and color have AT LEAST the same or even bigger influence on the patient's report on its effectiveness???

Jan Didden
__________________
If you don't change your beliefs, your life will be like this forever. Is that good news? - W. S. Maugham
Check out Linear Audio!
Old 18th September 2005, 03:23 AM   #3
pjpoes  United States
diyAudio Member
Join Date: Jan 2005
Location: New York
To be honest, I think you confused a great deal of what I was trying to say, but I may not have been clear. I had not thought this out like I might for work, so I see that what I wrote is a bit mixed.

First, you may want to assess whether two amps reproduce sound differently, but it is my contention that we can't actually measure that. If you don't believe me, then try to think through what it is you are looking at in the subjective assessment of a product. If we wanted to know only what difference a product makes to the sound, and not the interpretation of the listener, then we would have to eliminate the listener. We could measure it, as we do, but we have realized that we cannot measure all that we can hear. Then you have the issue of how sound works on a human. We take sound in through our ears and then process it in our brains, with a great many factors influencing how we perceive that sound. It is not a very straightforward process, which makes assessment very difficult. So instead, what I was suggesting was that if a truly scientific experiment were to be done, you would have to exhaustively test every aspect of the process to eliminate extraneous variables, including using many, many different listeners. However, I don't believe that to be practical, and I feel that the naturalistic observation method currently used really does a perfectly adequate job. We can, ourselves, adjust for listener bias and other variables just by being familiar with the listener.

As for the objective versus subjective matter, it's important to understand those terms. Basically, subjective means the interpretation of a response to a stimulus, and objective means the actual response. If we can't measure the actual response, and in audio we cannot, then we have to objectify a subjective interpretation. We can do that to a point, though not completely. First, you use a special type of question scale known as a Likert scale. Those are the rating scales: (1) poor, (2) fair, (3) good, etc., and you usually use an odd number of choices, like 5 or 7. Then, in order to avoid differences between reviewers or observers, and to avoid ceiling effects or central tendencies, you train each observer to match some standard you set up, and you would do this with every observer until they rated to within 1% of that standard, if possible. The closer the better. This keeps everything as consistent as possible and allows you to compare the numbers to each other. This is basically the method for objectifying a subjective matter. It's not 100% accurate, but it does a very good job nonetheless. It's really all we can do.
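
As a rough illustration of checking an observer against a standard, here is a small Python sketch. Everything in it is assumed for the example: the function name, the ratings, and the choice of mean absolute difference as a percentage of the scale range as the agreement measure. Real studies often use other statistics for rater agreement.

```python
def percent_deviation(trainee, standard, scale_min=1, scale_max=5):
    """Mean absolute difference between trainee and reference ratings,
    expressed as a percentage of the scale's range."""
    if len(trainee) != len(standard) or not standard:
        raise ValueError("rating lists must be non-empty and equal length")
    span = scale_max - scale_min
    total = sum(abs(t - s) for t, s in zip(trainee, standard))
    return 100.0 * total / (len(standard) * span)

standard = [4, 2, 5, 3, 1, 4, 3, 2]   # hypothetical reference ratings (5-point Likert)
trainee  = [4, 2, 4, 3, 1, 4, 3, 2]   # one item off by a single point
print(percent_deviation(trainee, standard))   # 3.125
```

Note that with only eight items a single one-point miss already costs about 3%, so a 1% target on this measure needs either a perfect match or a much longer training set.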

See, what I think people forget is that we really can't measure how something actually sounds to people, because we have the unfortunate problem of having people. People don't all hear things the same way, and simple measurements don't tell us what the human brain will do to our interpretation of the sound.

One example of how powerful the human psyche's influence on what we hear can be, and one that may help you understand the problem here, is the effect that happens with two notes very close together. When you have two notes played that are very close to each other, say one is slightly flat to the other, they tend to warble or modulate. The closer to each other they get, the faster that modulation is. Once you have them at the same frequency, it stops, and you simply hear the two tones layered on each other. Now, and I might have this detail wrong, as it's something I was told in a recording class (free elective), that is not actually happening. You could not measure the warble, as nothing is warbling. However, our brains cannot process the two dissimilar but close notes, and so they create the warble. It's a psychoacoustic effect. Musicians, myself included, use it to tune instruments, as it's very accurate. What this tells us is that our brain, universally so, has an effect on the sound, in many cases an effect that we all share collectively but cannot measure. We can only express it. So the key is to find a way to objectify our subjective experience as much as possible, and that starts with consistency. The other way, and it's what we do with reviewers now, is to simply get to know the reviewer and all of his or her quirks, styles, terms, tastes, biases, etc. Then when they review something, we can remove all of those from the review equation, and what is left is what actually exists for everyone who listens, or as close as you are going to get.
Old 19th September 2005, 07:44 AM   #4
diyAudio Member
 
Sch3mat1c's Avatar
 
Join Date: Jan 2003
Location: Milwaukee, WI
No, the warble is actually there. When you linearly mix two frequencies that are very close, the amplitude appears to modulate. In fact, AM itself is a carrier with upper and lower sidebands, so it's no surprise that the waveform is very similar. The difference is most easily seen at maximum modulation, where the envelope defined by the peak voltage appears as rectified halves of a sine wave.

The following graph was produced with the equation:
f(x) = A * sin(Theta * C) + B * sin(Theta * D)
Where Theta = Pi * (1 + x/2).
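
For anyone who wants to check this numerically rather than from the graph, here is a small Python sketch. It does not use the Theta parameterization above. It assumes equal amplitudes (A = B = 1), two illustrative frequencies of 440 Hz and 438 Hz, and an 8000 Hz sample rate, all chosen for the example. It compares the plain sum of two sines with the same signal written as a tone at the mean frequency multiplied by a slow cosine envelope.

```python
import math

def two_tone(t, f1, f2):
    """Linear sum of two equal-amplitude sine tones at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(t, f1, f2):
    """The same signal written as a tone at the mean frequency multiplied
    by a slowly varying cosine envelope (sum-to-product identity)."""
    mean, half_diff = (f1 + f2) / 2, (f1 - f2) / 2
    return 2 * math.sin(2 * math.pi * mean * t) * math.cos(2 * math.pi * half_diff * t)

# Compare the two forms over one second of samples.
max_err = max(
    abs(two_tone(n / 8000, 440, 438) - carrier_times_envelope(n / 8000, 440, 438))
    for n in range(8000)
)
print(max_err < 1e-9)   # True: the two forms agree to rounding error
```

The envelope term is what shows up as the warble: it is in the summed waveform itself, before any ear or brain gets involved.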

Tim
[Attached image: interference.gif]
__________________
Seven Transistor Labs
Projects and Resources
Old 19th September 2005, 07:52 AM   #5
diyAudio Member
 
Sch3mat1c's Avatar
 
Join Date: Jan 2003
Location: Milwaukee, WI
Quote:
Originally posted by pjpoes
First, you may want to assess whether two amps reproduce sound differently, but it is my contention that we can't actually measure that. If you don't believe me, then try to think through what it is you are looking at in the subjective assessment of a product. If we wanted to know only what difference a product makes to the sound, and not the interpretation of the listener, then we would have to eliminate the listener. We could measure it, as we do, but we have realized that we cannot measure all that we can hear. Then you have the issue of how sound works on a human.
Sounds to me like you're almost leading up to Occam's Razor: the simplest solution is probably the right one. In this case, psychoacoustics. Except you didn't, you went off on a far less probable tangent.

I mean, give me a break here... if you can't measure a correlation using today's sophisticated equipment, what the heck makes you think the human subjects are going on anything perceived aurally?

The ONLY barrier is that people REFUSE to believe that they are listening to their own PREJUDICES, not their cables or markers-on-CD-edges hacks (especially the $500/ft+ cables!).

Tim
Old 19th September 2005, 10:11 AM   #6
SY  United States
diyAudio Moderator
 
SY's Avatar
 
Join Date: Oct 2002
Location: Chicagoland
Blog Entries: 1
Quote:
When you have two notes played that are very close to each other, say one is slightly flat to the other, they tend to warble or modulate. The closer to each other they get, the faster that modulation is.
Nope, exactly the opposite. As the two notes get closer, the beats decrease in frequency.
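
A quick way to see this, sketched in Python with made-up frequencies: for two steady tones the beat rate is simply the difference between the two frequencies, so it drops toward zero as the notes converge. The function name and the 440 Hz reference are only for illustration.

```python
def beat_frequency(f1, f2):
    """Beats per second heard when two steady tones at f1 and f2 Hz are mixed."""
    return abs(f1 - f2)

# Bring a flat note up toward a 440 Hz reference and watch the beats slow down.
for flat in (430, 436, 439, 440):
    print(flat, beat_frequency(440, flat))
```

This prints beat rates of 10, 4, 1 and 0 per second, which is why the beats slowing to a stop is the cue that two strings are in tune.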
__________________
And while they may not be as strong as apes, don't lock eyes with 'em, don't do it. Puts 'em on edge. They might go into berzerker mode; come at you like a whirling dervish, all fists and elbows.
Old 19th September 2005, 12:18 PM   #7
diyAudio Member
 
jan.didden's Avatar
 
Join Date: May 2002
Location: Great City of Turnhout, Belgium
Blog Entries: 7
Quote:
Originally posted by pjpoes
[snip]First, you may want to assess whether two amps reproduce sound differently, but it is my contention that we can't actually measure that. If you don't believe me, then try to think through what it is you are looking at in the subjective assessment of a product. If we wanted to know only what difference a product makes to the sound, and not the interpretation of the listener, then we would have to eliminate the listener. We could measure it, as we do, but we have realized that we cannot measure all that we can hear. Then you have the issue of how sound works on a human. We take sound in through our ears and then process it in our brains, with a great many factors influencing how we perceive that sound. It is not a very straightforward process, which makes assessment very difficult. So instead, what I was suggesting was that if a truly scientific experiment were to be done, you would have to exhaustively test every aspect of the process to eliminate extraneous variables, including using many, many different listeners. However, I don't believe that to be practical, and I feel that the naturalistic observation method currently used really does a perfectly adequate job. We can, ourselves, adjust for listener bias and other variables just by being familiar with the listener. [snip]

Sorry, but I don't get any of this. First you seem to imply that because humans are subjective, it is very difficult to do an assessment of sound reproduction. You say that it is almost impossible to do a good scientific assessment. So, you say, let's stay with the current naturalistic (whatever that is; I take it that it is subjective) method. But by your own words that subjective assessment is then worthless as an assessment of sound reproduction!! You find it next to impossible to isolate all the external factors that lead to subjectivity, yet you think we can easily "adjust for listener bias and other variables just by being familiar with the listener". Surely you are joking??

Why are humans subjective? Well, I postulate that they are NOT really subjective. But, being bombarded with a plethora of impressions and opinions (sound, color, appearance, reputation, body language of peers etc.), they make a weighted judgement, and that judgement may therefore differ from occasion to occasion, from person to person etc., even with the SAME test or equipment. The way to solve this would be to try to switch off all those external impressions and signals and to limit the variables to what you want to assess: the sound reproduction. That is what DBTs are for, and so far I am not aware of any method that does better. Your proposal, which in essence says let's just listen, have fun and state whatever we feel, is a giant step backwards. Unless you just want to listen, have fun and state whatever you feel, of course. But I thought the object here was to find out whether people can hear differences between equipment and, if so, what they are and/or which is preferred.

Jan Didden
Old 19th September 2005, 10:22 PM   #8
pjpoes  United States
diyAudio Member
Join Date: Jan 2005
Location: New York
Many of you have missed my point big time. I think a lot of that is a misunderstanding of the scientific method, which is exactly why I stated that I may have a bit of knowledge over many of you, given my background.

As I said on the warble effect, it was something I had simply been told by an instructor in a studio production class, and I took it with a grain of salt, as I had never heard it before.

As for the issue of subjective versus objective, it's clear to me that you simply don't understand at least my definition, as it would be used in the kind of scientific testing I do. Humans are subjective, as far as measuring our senses is concerned. That is a fact, and it is not arguable at the moment. Subjective means our experience of something; objective means the actual "something." Again, that is the operational definition used in psychological studies of subjectivity versus objectivity with human subjects. That means that if we are trying to measure the experience that a listener gets from an amp, what we are measuring is subjective. The goal of any good scientific study is to attempt to code the subjective experience into numbers that can be treated like consistent objective data, to objectify it as best as possible. That is all I proposed. However, I feel that fully doing that would be unreasonable.

I never said anything about just listening and stating whatever, nor do I think that cable prices or anything of that sort have anything to do with this discussion, so for the sake of a proper argument, let's leave all of that aside for the moment.

Occam's razor is a common mantra in psychology, and one I should learn better, as I am great at designing complex, all-encompassing studies that no one could ever get funded, and I often have to streamline my studies big time in the end. So here, I suppose what I am suggesting is that we avoid my overly complex method of measure for studying the sound of an amp, and instead go with a simpler, less effective, but acceptable method, which would not be far off from what we have now. I would just call for more standardization, maybe some training in how to review a product, so everyone in a group of reviewers comes to the same sorts of conclusions for a given product, thus eliminating listener bias. Of course, that isn't so simple either, as sometimes what sounds accurate doesn't sound as pleasing, and then opinions and biases too play a part, etc. etc.

Again, I will restate that I believe we cannot measure the sound of a piece of equipment, say an amp, directly. We can take some measurements and get a rough idea from them. However, I believe they give us an incomplete picture, and that the listening experience must also be measured, which is exactly what a product review does. A product review consists of the listening experience, or the listener's impression of the sound quality, and then a product overview covering features and setup. If you believe that we can measure everything we can hear, that is fine, that's a difference of opinion, but that would nullify my whole theory, and DBT as well, since we wouldn't need it. We would simply need to measure our equipment on a computer and be done with it. However, if you buy the premise that some aspects of sound cannot be measured and instead must be experienced, then we must measure that human experience to get a full picture of the SQ. Again, keep in mind that we cannot measure the sound quality of an amp directly, we cannot measure what our ears are detecting, and we cannot look at the brain's processing and see the full picture, so we must rely on the subject's reflection on the experience. That, as far as I can see, is a fact. DBT can be used in that case but, as I wrote before, it still doesn't give a full picture of the SQ differences between products, because our senses simply don't work that well.

I have one more analogy that may help you understand why DBT cannot work to detect these differences. If I set up a study to show you different color cards and have you say whether two are the same or different, and I placed them side by side, you probably could detect very slight differences in shade. However, if I show you one, then take it away and show you the other, you probably could not detect the difference. This is because our sense memory is not very strong, so we cannot hold the one color in our brains long enough to compare it with the other. The analogue in sound would be playing two different components for you and asking you to tell the slight differences between the two. If you had both available at the exact same time, you probably could, except that our ears can't deal with that well, and it's very difficult to do. So instead, I have to present only one sound, take it away, present the next, and ask you to detect the difference. The problem here again is that our memory is simply not strong enough to pick out all the subtle differences that exist. Instead, I would have to let you live with the two colors, or sounds, for a long period of time, and then you would begin to notice all the differences. Again, please understand that this is something I can defend with a great many studies. That is how our senses work, and it is always taken into account when attempting to measure the response of our senses.
Old 20th September 2005, 12:47 PM   #9
diyAudio Member
 
jan.didden's Avatar
 
Join Date: May 2002
Location: Great City of Turnhout, Belgium
Blog Entries: 7
Quote:
Originally posted by pjpoes
[snip]As for the issue of subjective versus objective, it's clear to me that you simply don't understand at least my definition, as it would be used in the kind of scientific testing I do. Humans are subjective, as far as measuring our senses is concerned. That is a fact, and it is not arguable at the moment. Subjective means our experience of something; objective means the actual "something." Again, that is the operational definition used in psychological studies of subjectivity versus objectivity with human subjects. That means that if we are trying to measure the experience that a listener gets from an amp, what we are measuring is subjective. The goal of any good scientific study is to attempt to code the subjective experience into numbers that can be treated like consistent objective data, to objectify it as best as possible. That is all I proposed. However, I feel that fully doing that would be unreasonable.[snip]

We understand all that, done it, been there. You skipped the basics. With DBT we don't want to objectivise human subjectivity. We JUST want to find out if there is an audible difference. To do that, we try to set up the test such that the only variable is the sound. We try to keep the listener from knowing which component in the comparison is playing. We try to delete all clues that are not strictly the sound. I don't see how you can get more scientific and objective than that.

The listener hears a difference or not, and scores accordingly. Now it is entirely possible that he hears it sometimes, or just on some types of music, or only after a beer or two. That then would be important indicators that the differences are relatively small. That then would also mean that in normal listening where we try to enjoy the music instead of listening to the equipment, the equipment is pretty equal in performance. On the other hand, if many different listeners in any circumstance consistently hear a difference, that would be an indication that one equipment would be superior to the other.
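
To put a number on "hears it sometimes" versus "consistently hears it", here is a minimal Python sketch of the usual one-sided binomial calculation. The function name and the 12-out-of-16 example are only illustrative, and it assumes each trial is an independent two-way choice with a 50% chance of guessing right.

```python
from math import comb

def p_value_at_least(correct, trials, p_guess=0.5):
    """Probability of getting `correct` or more right out of `trials`
    by guessing alone (one-sided binomial tail)."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical session: 12 correct identifications out of 16 trials.
print(round(p_value_at_least(12, 16), 4))   # 0.0384
```

So 12 out of 16 would happen by pure guessing a little under 4% of the time. A score near 8 out of 16 is what guessing looks like, and a listener who only scores well on some material would show up as a middling overall result.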

The problem is not the test. The problem is the unwillingness to accept results that contradict people's prejudices. Some people go so far as to suggest, without any back-up, that the stress of an organised test keeps people from hearing differences that they would readily hear in their own home when comparing two pieces of equipment. And that at home they would be perfectly able to exclude equipment price, design, reputation etc. from their judgement, although it is documented that even experienced listeners cannot do that. So, I feel you are barking up the wrong tree.

Now, I agree, if we were to do a test asking listeners: "Please tell me which component gives the best rendition of the emotional content of this musical excerpt", then we would run into all sorts of trouble, as you discussed. In that case we would indeed, as you say, be "trying to measure the experience that a listener gets from an amp". But that's not what we do.

Jan Didden
Old 20th September 2005, 12:53 PM   #10
diyAudio Member
 
jan.didden's Avatar
 
Join Date: May 2002
Location: Great City of Turnhout, Belgium
Blog Entries: 7
Quote:
[snip] avoid my overly complex method of measure for studying the sound of an amp, and instead go with a simpler, less effective, but acceptable method, which would not be far off from what we have now. I would just call for more standardization, maybe some training in how to review a product, so everyone in a group of reviewers comes to the same sorts of conclusions for a given product, thus eliminating listener bias. Of course, that isn't so simple either, as sometimes what sounds accurate doesn't sound as pleasing, and then opinions and biases too play a part, etc. etc. [snip]

I hear what you are saying here, and I agree that it appears to be a sound objective or scientific way. But the problem I see is that what you call a standard is simply YOUR perception, which you want to impose on others as the Way It Should Be. This would be a great test to find out how the various pieces of equipment do according to YOUR standard, but what's that to me?

Jan Didden

PS Please don't take all this personally; I really try to address your posts' contents.
Copyright 1999-2014 diyAudio
