Double Blind Testing

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
carlosfm said:


SY, this isn't funny.
Who are you comparing us to? :confused:


To the elephant that would bother with a biting ant so that the ant could feel important? Not funny at all :D

Here's a much more realistic joke:

An ant and an elephant were walking side by side through a desert. After a while the ant looks back and says:
"Wow! Look at all the dust we're raising!"

;)


SY said:
Cork. You guys are the world's largest producer of cork. That's why I always check under my car for bombs when I'm visiting Portugal.

SY, you're giving our hospitality a bad name. Here in Portugal we do not hide bombs under cars. We welcome visitors with bottles of champagne. We'll even point one at you just to show you another sample of our products.

(Cheers)
 
SY said:
Paulo, the Amorims have not been quite as generous as you!

SY, I don't know what the problem with Amorim is; they probably only sell in large quantities.
But hey, generalizing about a whole country (or people) from business experience with one company is not wise, is it?

Would it be logical to think that all Americans are as smart as Bush?
Of course not... :D
 
Well, I have run out of ways to explain myself here. Honestly, much of what you say would not jibe with the professional scientific community, though you think it does. Doing DBTs as you feel they should be done would introduce far more intervening variables than you are willing to accept, and you still seem to be making a presumption about our senses that does not fit the model accepted by the scientific community as a whole. Our senses simply cannot do what you seem to think they can. No amount of "reasonable" time would allow our sense of hearing to absorb enough to tell a difference reliably, and the differences don't have to be that subtle either. Yes, I would expect that somebody could hear the difference between an amp that makes 1% distortion and one that makes 0.001% distortion, but again, if such differences are all that matter, then we don't need a DBT; a simple distortion measurement would do fine. However, if you believe, as I think most here do, that there is more to sound than what we can measure, hence the need for something like a DBT, then my point remains valid.

The standard you call "My Standard" is simply how things are done in any human-based scientific experiment. You must create a standard to keep things consistent and give you something to compare with. Though you may call this "My Standard", it really is not; it is how things are done, and the standard is validated, usually more than once, before it is used, which makes it simply a standard. Like I said, it's not perfect, but again, it's far more valid than a DBT would be in this sort of scenario. DBTs are simply a small part of measuring in an experiment. If a DBT were all you did, and you submitted the results of a simple DBT to a publication panel, it would be laughed out of the room. I do not believe that DBTs really have any place in this sort of experiment, but if you did do one of sorts, it would still have to be only one small part of many experiments used to measure the differences found. You would have to train the listeners, again for consistency. You would have to standardize terms far more than they are now, and then run each listener through a series of standardized tests in which the listener scores specific known variables, all to ensure that each listener on the panel is able to score reliably and consistently, not only with themselves but with the others.

As for the color cards, and picking a curtain and living with it: my feeling is that for some this is true and for others it is not. Some people are fussy and picky, and most research suggests this is a mix of psychological quirks and actual physical differences. Some people are more sensitive to color than others. Some are more sensitive to certain colors than others. A card shown back and forth in a DBT would still give only a very small sample of what the person is going to have to live with. Once they get the big picture, say a room full of these curtains, and they live with them for months if not years, certain people might begin to find the color just not quite right. Others may be happy. That's all an issue of personal taste, but it still matters, as the same applies to most everything. In music and audio equipment, some people cannot hear as well as others, some are more sensitive to certain frequencies than others, and so on. Then you have personal taste, which also affects perception of sound. By the way, my experimental design proposal here would eliminate the variable of personal taste; a DBT in and of itself does not. My point with all of this is that two amps which sound somewhat similar may in time begin to seem quite different as you live with them, and some people may find the differences worth the huge cost, even if they are subtle.
 
I will back up my response here with someone who has done a great deal of research, over 35 years' worth, on something he calls perceptual organization, with regard to sound. His area of study is similar enough to what we are attempting here that his methods would transfer over easily. In fact, his methods are basically the same as mine, though he is more willing to call this completely subjective than I am; he too objectifies his subjective results through systems of coding and practice for consistency. The reason he does this, and the reason I say that traditional Yes/No or Right/Wrong measures using DBTs fall short, is that they are not sensitive enough, and subjective experience can be far more sensitive to differences. He even states in one of his articles, "Auditory Scene Analysis and the Role of Phenomenology in Experimental Psychology" (Albert S. Bregman), that visual studies have an edge when being published, because they can show the subjective methods used through drawings, and everyone can see the differences for themselves, but not so with audio. You have to hear it to understand, and so people are suspicious.

My point about why the commonly proposed double blind testing is bad is that, as proposed, it does not measure the sound of an amplifier in an objective way. It only eliminates visual bias and experimenter bias, and that is not all that matters. You still have personal preference for sound, and ways of describing the difference, unless all you want is "yes, it sounds different", "no, it doesn't", or "it's Amp A", "no, it's Amp B", which again has its flaws. I could show you that DBTs used for color are flawed: if I showed you two different yet similar color cards back to back, and asked you to identify which was A and which was B consistently over many trials, you would get as many wrong as you would trying to identify the different amps. That doesn't mean they aren't different, nor does it mean that you can't tell the difference; it simply means that the test is not sensitive enough to allow you to tell the difference, so it lacks validity. Hence the need for a subjective measure. However, to make it valid (it already has very strong ecological validity because it is subjective, but weak internal validity because it's not consistent), we create a way to make it consistent. As for DBTs not measuring what you think they are, keep this in mind: a really common problem is the understanding of what measuring sensory perception means. Even in psychology that term has come to mean measuring a difference that exists when the senses are stimulated. Psychologists do their best to measure this change accurately by eliminating the problem of actual human perception, which includes a lot of brain processing that introduces biases based on past experience, mood, heart rate, moon phase, etc. However, my point here, and Bregman's point, is that this is impossible.
Now an actual scientist would argue that it is not, but again, he would not use a DBT to prove this. He would simply do things like measure the biological response of the body to the stimulus, such as brain activity, or chemicals in the blood, which can change with the stimulus, such as the introduction of an Amino C or other indicator. The other way, the place where a DBT would play a role, though again it would have to be modified from how you all propose its use, is to attempt to objectify a subjective experience. Each click of that switch to change between amps is still only changing our experience of the amplifier's sound; it is not a direct measure of the amp itself. We attempt to eliminate bias by removing visual bias and interviewer bias, and the test controls for room, speakers, etc. All of which is needed; however, then what? How do you code the data received from this test? How do you extrapolate anything meaningful from the responses, especially if it's turned into a sensitive test? That is where my methods come in, which is to standardize the reviewer through training and education in how to hear the differences, how to verbalize the differences, etc.
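
For the "which is A, which is B" identification trials described above, one common way to score the answers is a one-sided binomial test against guessing. The sketch below is only an illustration under assumptions not stated in the thread: trials are independent, chance is 50%, and the function name `abx_p_value` and the 12-of-16 session are hypothetical.

```python
from math import comb


def abx_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of getting at least `correct` answers right by guessing alone."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical session: 12 correct identifications out of 16 trials.
p = abx_p_value(12, 16)
print(f"p = {p:.4f}")  # prints "p = 0.0384"
```

A small p-value suggests the listener was probably not guessing. A large one does not show that the amps sound the same, which is the sensitivity concern raised in the post.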
 
SNIP

pjpoes said:
All of which is needed; however, then what? How do you code the data received from this test? How do you extrapolate anything meaningful from the responses, especially if it's turned into a sensitive test? That is where my methods come in, which is to standardize the reviewer through training and education in how to hear the differences, how to verbalize the differences, etc.


I don't get it. Put a set of speakers in a room. Cut wires. Hook up an A/B switch. Make sure the levels all equal out to the point that measuring equipment would consider them equal, then bring in the subjects.

Play a song and switch between amps. Even if you play 4 bars, then go back and play the same 4 bars again with a different amp. If there is a difference and the subject likes that difference, have them write it down, or better yet raise their hands when the sound they prefer is on. I don't get how this is hard to understand. I don't see a variable in this particular test scenario.

This isn't a Coke vs. Pepsi test :hot:

Coke is better by the way :smash:
 
janneman said:
but nobody ever said DBTs are perfect; but they beat anything else that has been proposed.

This is true if you are trying to eliminate certain biases, but it does not automatically follow that the DBT is the most sensitive test of all audible differences. Maybe a good test (one worth doing) is well and truly impossible?
 
I don't get it. Put a set of speakers in a room. Cut wires. Hook up an A/B switch. Make sure the levels all equal out to the point that measuring equipment would consider them equal, then bring in the subjects.

Play a song and switch between amps. Even if you play 4 bars, then go back and play the same 4 bars again with a different amp. If there is a difference and the subject likes that difference, have them write it down, or better yet raise their hands when the sound they prefer is on. I don't get how this is hard to understand. I don't see a variable in this particular test scenario.

This isn't a Coke vs. Pepsi test :hot:

Coke is better by the way :smash:

The variable is the listener. First, what defines preferred sound, and how are they identifying it? I mean, if all you want to know is which one they prefer, fine, but that's of no use: how do you know which one sounds better? And if 10 prefer amp A and 2 prefer amp B, why did ten prefer amp A, and what were the differences? Most people would want to know this, the reason being that it helps them identify why it's preferred, and they can use that with their own sound bias. What you suggest is fine, but honestly, it adds an extra step. Why then bother with the DBT at all? If the only concern is that knowing the brand or the look will affect the perceived sound, then a DBT still isn't really needed, just an SBT, where the reviewer doesn't know the details about the amp. Then again, you still haven't solved the problem of reliably distinguishing differences between amps, something that SBT or DBT has nothing to do with; that's a matter of reliable coding methods.
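
On the "10 prefer amp A, 2 prefer amp B" example: before asking why, it may help to check whether such a split is even distinguishable from coin-flipping. The following is a minimal sketch of a two-sided sign test, assuming each listener gives one independent preference and has no real preference under the null hypothesis. The function name `sign_test_p` is made up for illustration.

```python
from math import comb


def sign_test_p(prefer_a: int, prefer_b: int) -> float:
    """Two-sided sign test: how likely is a split this lopsided under 50/50 preference?"""
    n = prefer_a + prefer_b
    k = max(prefer_a, prefer_b)
    # Probability of k or more listeners landing on the same (larger) side.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Double it because either amp could have been the favourite; cap at 1.
    return min(1.0, 2 * tail)


print(f"{sign_test_p(10, 2):.4f}")  # prints "0.0386"
```

This only says whether a preference exists in the panel. It says nothing about why amp A was preferred, which is the question the post goes on to raise.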


Madmike2: Why would people try to identify different amps?

As for this question, it pertains to an old bunch of threads and ideas that still seem to persist. Basically, there is a view that all amps with good specs sound the same, and that we could not identify or distinguish between them. The proof of this was a large bet and a poorly set-up SBT scenario which, scientifically, had no validity or reliability. It was of such poor sensitivity that nobody stood a chance, which is why DBTs and SBTs are a poor way to review amps, or any component for that matter.

quote:
unless all you want is, yes it sounds different, no it doesn't

That strikes me as a pretty damn good place to start.

Yes, it is a good place to start, but not a good place to end. It seems to me that many people want to end there, and the whole point of all this discussion is to say that we can't end there; it doesn't tell us anything useful other than that they sound different. My response to every review I read like that would always be, "Great. Why? Which is better, and by whose definition?" These are all things I need to know to draw any sort of conclusion from the review. And if you simply think we should start there, fine, but then why fuss with a DBT or SBT? That can be achieved through far simpler methods.

This whole post started with a friend trying to lecture me about wasting money on stereo equipment, saying that science has proven we can't hear the differences. Then I read some letters to the editor in Stereophile about how we need to start doing DBTs, as they are the best and only way (IMO not even a viable way), and then saw the editors actually catering to these people, who really know nothing about scientific design and have no fundamental understanding of what it takes to properly evaluate what differences exist and the qualities of those differences. I felt I needed to give my view, based on a lengthy education and some experience in developing experimental designs in a closely related field of study. I of course have my bias, but much of what I say is true regardless of opinion or view, and is simply stating the facts of good experimental design.
 
pjpoes said:


This whole post started with a friend trying to lecture me about wasting money on stereo equipment, saying that science has proven we can't hear the differences. Then I read some letters to the editor in Stereophile about how we need to start doing DBTs, as they are the best and only way (IMO not even a viable way), and then saw the editors actually catering to these people, who really know nothing about scientific design and have no fundamental understanding of what it takes to properly evaluate what differences exist and the qualities of those differences. I felt I needed to give my view, based on a lengthy education and some experience in developing experimental designs in a closely related field of study. I of course have my bias, but much of what I say is true regardless of opinion or view, and is simply stating the facts of good experimental design.

Then tell your friend he is smoking something. I am an IDIOT and I can hear differences. Not in cables, and I don't want to start a speaker wire thread again because that's been beaten to death. But dude... who cares WHY something sounds better besides you? Someone who might want to reverse engineer it, maybe. But average Joe and Jane Consumer want to buy what sounds best. And if they judge things by what everyone else likes, then they will buy what the 50 people liked and not what the other 12 did. Again you ask why? Because they work like all lemmings do: follow the majority.

Now the people in this forum are sitting at home scratching their heads wondering what you are going on about. We tweak, tune, position, elevate, pontificate till something sounds good to US. And if you don't like our sound then YOU are an idiot. Because we are a majority of one. And we are never wrong.

Science cannot measure opinion, dude. Opinion isn't static and science needs absolutes. You just spent 8 years in school to become an advertising copywriter. All you're ever going to do is come up with new and better ways to convince people that a fluid is a solid. I hope you understand what I am saying ;)

And before you say that I am not understanding what YOU are saying, let me tell you what a brilliant man once said before I killed him.

"It's not the job of the listener to infer what you are saying, it's the job of the teller to mean what they say." Milton H. Erickson

So if you think people don't get it, then stop describing it the same way over and over again using different sentence structure.

(just joking, I didn't kill him) :angel:
 
AX tech editor
Joined 2002
Paid Member
jeff mai said:


This is true if you are trying to eliminate certain biases, but it does not automatically follow that the DBT is the most sensitive test of all audible differences. Maybe a good test (one worth doing) is well and truly impossible?


Well, IMHO DBT is the best we have to identify audible differences without mucking up the perception with extraneous impressions & beliefs, if there ARE differences. So it follows that IMHO it IS the most sensitive. It would be nice to have one that is even more sensitive and able to shut out all possible biases, but I don't know of any. Doesn't mean it doesn't exist, of course.

Jan Didden
 
pjpoes said:
I don't get it. Put a set of speakers in a room. Cut wires. Hook up an A/B switch. Make sure the levels all equal out to the point that measuring equipment would consider them equal, then bring in the subjects.

Play a song and switch between amps. Even if you play 4 bars, then go back and play the same 4 bars again with a different amp. If there is a difference and the subject likes that difference, have them write it down, or better yet raise their hands when the sound they prefer is on. I don't get how this is hard to understand. I don't see a variable in this particular test scenario.
[snip]


Sure, that'll work. IF the test is DB, that is. If not, you are never sure that the preference is really caused by the sound, and not the design, color, prejudice, peer opinion, etc. You ARE aware of that, yes?


Just in case: look here: http://www.harman.com/wp/pdf/AudioScience.pdf . Scroll down to page 10, section "BLIND vs. SIGHTED TESTS – SEEING IS BELIEVING"

Jan Didden
 
pjpoes said:
[snip] The variable is the listener. First, what defines preferred sound, and how are they identifying it? I mean, if all you want to know is which one they prefer, fine, but that's of no use: how do you know which one sounds better? And if 10 prefer amp A and 2 prefer amp B, why did ten prefer amp A, and what were the differences? Most people would want to know this, the reason being that it helps them identify why it's preferred, and they can use that with their own sound bias. What you suggest is fine, but honestly, it adds an extra step. Why then bother with the DBT at all? If the only concern is that knowing the brand or the look will affect the perceived sound, then a DBT still isn't really needed, just an SBT, where the reviewer doesn't know the details about the amp. Then again, you still haven't solved the problem of reliably distinguishing differences between amps, something that SBT or DBT has nothing to do with; that's a matter of reliable coding methods. [snip]


OK, so now we just want to know why someone prefers A over B, and it should not just be limited to the sound. OK I can relate to that. BUT the problem I see is that it is useless. Change the venue, change the lighting, change the sound level, and you can end up with totally different judgements. Does that help?

Jan Didden
 
Madmike2 said:
Play a song and switch between amps. Even if you play 4 bars, then go back and play the same 4 bars again with a different amp. If there is a difference and the subject likes that difference, have them write it down, or better yet raise their hands when the sound they prefer is on. I don't get how this is hard to understand. I don't see a variable in this particular test scenario.

You simply can't trust everyone's opinion because some don't have the sensitivity and/or experience to recognize the true (unamplified) sound of an instrument.
This way, they don't know if it sounds real or not.

Also, some can detect the sound of a (low-level) piano in the background, with other instruments in the foreground, while for others it passes completely unnoticed.
A less transparent amp 'hides' the sound of that piano. Change to a better one and that sound is there, but some still don't notice.
They will only notice when someone points it out.

This is true for any kind of test. Blind, sighted, A/B or not.
They are never conclusive.
Learn to trust your ears and listen for yourself.
 
pjpoes - First off, you can stop mentioning how qualified you are, I think we all get it by now. Second, why do you have to categorize the purpose of a DBT as to determine which piece of equipment sounds better? What about just being able to determine if there is an audible difference? If people can't determine if there is an audible difference between two pieces of equipment by narrowing down the passage size to something relatively short in length, and playing it over and over again in a relatively short period of time, then why should anyone believe different pieces of equipment produce any audible differences? We could all just buy a HTIB and be happy.

What if you did a DBT with the variable being that the music is played through either Paradigm Sigs or some Bose. Not only would people easily hear an audible difference after enough trials went by so that each speaker got to play once, but they would probably be able to tell you which they preferred in that same minimum amount of trials as well.
 
Madmike2 said:

But dude... who cares WHY something sounds better besides you? Someone who might want to reverse engineer it, maybe. But average Joe and Jane Consumer want to buy what sounds best. And if they judge things by what everyone else likes, then they will buy what the 50 people liked and not what the other 12 did. Again you ask why? Because they work like all lemmings do: follow the majority.

Science cannot measure opinion, dude. Opinion isn't static and science needs absolutes. You just spent 8 years in school to become an advertising copywriter. All you're ever going to do is come up with new and better ways to convince people that a fluid is a solid. I hope you understand what I am saying ;)


(just joking, I didn't kill him) :angel:

Well, I would like to say that I do not work in advertising, nor do I do copywriting. I really did take that comment to be a bit offensive. I work in social services doing program design and analysis, which means I help design programs that are supposed to help people. Though it is an old field, government and public programs have traditionally not relied on sound methods to ensure they work, so, as many of you know, we have tons of expensive programs that don't do what they say they do. In recent times the government and public agencies have begun relying much more on psychologists to develop these programs with the social workers and nurses, and then to assess how well they work. To that end, they are generally very sound, and are very good at creating sound new methods for dealing with age-old problems. In that sense I look at myself as a humanitarian. I help people; I just don't tend to do a lot of hands-on, one-on-one work. Instead I help lots of people, but at a step removed.

As to who cares what others think of how something sounds, I wish nobody did. It would be great if the world made decisions like that on an individual basis, but we don't. Some people are like lemmings, but others are not. And it's not all for a bad reason: it's a lot of work to go around and listen to every product on the market, and a good review can be helpful in narrowing the field. I happen to like the way reviews are done now, but so many people want products assessed in a scientific manner, which can only go so far. There are ways to remove issues like opinion and bias, to a point, but never 100%. My point with all of this was that a DBT, or any other single test method like it, doesn't in and of itself remove these, but a great many people here seem to think it does. Some seem not even to believe me, but it is the truth: these tests do not do that, and they have never been able to. Consumer investigation groups employ people with a similar but lesser degree than mine (lesser only in that it's less involved in human issues and more involved in numbers; they are mostly statisticians). They will use DBTs because they lack expertise in scientific design; they specialize in data analysis instead, and often just don't realize the mistakes they are making.

In response to SteveCallas: just because a test cannot show a difference in this narrowed amount of time does not mean there is no difference. It may just mean that the test is not sensitive enough to flesh out the differences, which is why validated measures are used. Not always; it may mean there is no difference, but without ensuring that your method of measurement has that ability, it remains a problematic area of the test. This is common with things like DBTs: you end up measuring something without taking into account variables that could affect the test's ability to flesh out differences. My example has been our limited ability to remember things from our senses for more than a matter of seconds.
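
The "not sensitive enough" worry can be put in numbers as statistical power: the chance that a listener who really does hear something passes the test. The sketch below is an illustration under assumptions that do not come from the thread: independent forced-choice trials, a 5% significance level, and a hypothetical listener who answers correctly 60% of the time. All function names are made up.

```python
from math import comb


def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_hits(trials: int, alpha: float = 0.05) -> int:
    """Smallest score that pure guessing reaches with probability <= alpha.

    Returns trials + 1 when no score is rare enough (too few trials).
    """
    for k in range(trials + 1):
        if binom_tail(k, trials, 0.5) <= alpha:
            return k
    return trials + 1


def power(trials: int, true_rate: float, alpha: float = 0.05) -> float:
    """Chance that a listener with hit rate `true_rate` reaches the pass mark."""
    return binom_tail(critical_hits(trials, alpha), trials, true_rate)


# Pass mark and power for a 60%-accurate listener at several session lengths.
for n in (16, 50, 200):
    print(n, critical_hits(n), round(power(n, 0.6), 2))
```

With 16 trials the pass mark works out to 12 hits, and a listener who is right 60% of the time would pass only about one time in six. A null result from a short session therefore says little either way.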

Carlos, your quote here goes to my point exactly:
"You simply can't trust everyone's opinion because some don't have the sensitivity and/or experience to recognize the true (unamplified) sound of an instrument.
This way, they don't know if it sounds real or not."
-You are right, you can't trust them; that's why you standardize the opinion. It's common practice in assessing subjective matters. Basically, you develop a training protocol for assessing, in this example, sound; run the listeners through it; then use your method of measurement with all the people who went through the program. You continue to work with them until the variability between their responses is less than 1%. Though you might argue that what we created as our definition of good sound is not valid or not the same as yours, it's at least consistent, and by sharing with the reader the definition we used, the operational definition, the reader can at least understand what the results mean in that context. As it is, you don't know that, nor do you know what the reviewer knows about good sound.
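
The post does not say how "variability between their responses" would be computed, so the following is only one possible reading, sketched under assumptions: each trained listener scores each attribute on a positive numeric scale, and variability is the standard deviation across listeners as a percentage of the mean. The names and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def rater_spread_percent(scores: list[float]) -> float:
    """Spread across raters for one item: std dev as a percentage of the mean."""
    return 100 * pstdev(scores) / mean(scores)


def panel_is_consistent(ratings: dict[str, list[float]], limit: float = 1.0) -> bool:
    """True when every item's spread across raters is below `limit` percent."""
    return all(rater_spread_percent(s) < limit for s in ratings.values())


# Hypothetical training round: three raters score two attributes on a 1-10 scale.
ratings = {
    "bass tightness": [7.0, 7.0, 7.1],
    "midrange clarity": [5.0, 7.0, 9.0],
}
print(panel_is_consistent(ratings))  # prints "False": raters disagree on clarity
```

Under this reading, training would continue until the check returns `True` for every attribute in the operational definition.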

As for that article by Harman, what they say is generally true and actually does not disagree with what I have said, but it gives an incomplete picture. First, they still assume that we can do listening tests objectively; again, I don't see how. If you all believe we can, please share. Not only do they never actually show in that article a truly objective (by its definition) measure of sound, they also misuse what subjective measures are. Listening tests are still subjective measures. DBTs are not a measure, they are a method, and they are not objective either, at all; they remain a way to objectify subjective data. You are still testing human experience; the method just removes shades of grey from that experience, which makes it more concrete and makes it seem more objective. It is true that subjective measures can be notoriously elastic, which is why a good scientist always publishes his results in a professional peer-reviewed journal with all his methods and assessments clearly stated in the write-up, which you will note the Harman people never did with their own results, but that is typical of marketing companies. Publishing in a peer-reviewed journal doesn't mean you can't still cheat with data, but it means that other professionals are now reviewing your work, and if you did, they will find it, they will write it up, and your work will lose credit and validity.
 
pjpoes said:
Carlos, your quote here goes to my point exactly:
"You simply can't trust everyone's opinion because some don't have the sensitivity and/or experience to recognize the true (unamplified) sound of an instrument.
This way, they don't know if it sounds real or not."
-You are right, you can't trust them; that's why you standardize the opinion. It's common practice in assessing subjective matters. Basically, you develop a training protocol for assessing, in this example, sound; run the listeners through it; then use your method of measurement with all the people who went through the program. You continue to work with them until the variability between their responses is less than 1%. Though you might argue that what we created as our definition of good sound is not valid or not the same as yours, it's at least consistent, and by sharing with the reader the definition we used, the operational definition, the reader can at least understand what the results mean in that context. As it is, you don't know that, nor do you know what the reviewer knows about good sound.

I don't agree with the first part of your post, but you salvage it in the end. :D
You can't teach people how to listen, appreciate and concentrate on the music, without the feeling that they are being put to a test (which can lead to hasty decisions), in a couple of sessions.
Some things come from the experience of listening to instruments, basic music knowledge, listening to lots of systems, music of all kinds, etc...

Some like unnatural presentations, like 'hey, this amp has much more bass!'. It hasn't; it may just be that it can't drive those speakers and sounds loose and untight in the bass, which is easily confused with better bass by the incautious listener.
Live, unamplified drums sound TIGHT.

You can teach some of these things to people in a couple of sessions, and they will go home thinking.
Thinking of things they have never thought of, never realized, never noticed.
And they will later learn to appreciate all this, but that may take lots of time, lots of music listening, and listening to lots of systems, because then they start to realize that they weren't enjoying their favourite music with all the pleasure they could get on that &/&%$%$# midi system. :D
And believe me, when it sounds good, they will listen to more music.
 