Double Blind Testing

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
Hey Carlos, and everyone else for that matter, I agree with that totally. You can't teach someone to be an expert listener overnight, and I never said you could; what might be needed is a year-long training program that includes live music and education on everything. That's not uncommon. At the same time, being an absolute expert isn't necessary for these sorts of tests.

What you do is develop a way of measuring and coding, in this case, sound. It doesn't matter whether the standard is "right": we are measuring something subjective, so to measure it at all you have to formalize it. That means creating a standard that is consistent, not necessarily correct. Then you simply have to get listeners to use this measure, say a Likert-style scale, all the same way, so that each one scores the same sound the same way, to within 1%. That doesn't mean they are right in any absolute sense. My point, and I think your point, is that such an absolute doesn't easily exist in any concrete way; it has to be experienced fully over time. But you can use that long-term experience to develop the standard your listeners are held to.

So let's say I find that you are an expert listener because of your experience in the field. I tell you my definitions of terms so you know how to apply them when evaluating something on my scale, or, more likely, I use you to develop the scale, so the definitions are your own to begin with. You then review different things and score the sound. That becomes the standard, and I take other listeners, have them do the same thing, and keep at it until they all score the sounds you listened to the same way you did. It keeps things consistent. Some might argue that your definition of good sound is wrong, and that's fine, but what I have here is still a sound way to measure sound consistently against some standard.
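To make the "score the same way to within 1%" idea concrete, here is a minimal sketch in Python; the function names, the 1-10 scale, and every score below are invented purely for illustration:

```python
# Hold trainee listeners to a reference ("expert") scorer until their
# ratings agree within a tolerance of the full scale.

def mean_abs_disagreement(reference, trainee):
    """Average absolute gap between two raters scoring the same items."""
    assert len(reference) == len(trainee)
    return sum(abs(r - t) for r, t in zip(reference, trainee)) / len(reference)

def is_calibrated(reference, trainee, scale_max=10, tolerance=0.01):
    """True once the trainee's mean disagreement is within 1% of the scale."""
    return mean_abs_disagreement(reference, trainee) <= tolerance * scale_max

# Hypothetical 1-10 scores for five test recordings:
expert  = [7, 8, 6, 9, 7]
trainee = [7, 8, 6, 9, 7]
print(is_calibrated(expert, trainee))  # True: trainee matches the reference
```

Training would just mean repeating the scoring sessions until `is_calibrated` comes back true for every listener on the panel.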
And by changing only the product, we can still show an actual change in the sound. Yes, change one variable besides the product and the results are no good anymore, but that is true of any study; done properly, a study like this still shows a change. Whether the change is for better or worse sound can be debated, but that is a matter of the operational definitions rather than the method, and with regard to subjective experience, that is a debate that can never end.

But again, this isn't to say, "Hey, this is what we should do," or "Hey, sound can be objectively measured if you do it my way." It's to say that DBTs don't objectively, or rather directly and consistently, measure changes in sound, much less sound quality. A DBT is just a small part, a method to be used in conjunction with an actual measure, such as the scale. The point was that we don't have the ability to measure sound objectively, other than through oscilloscopes and the like. And if you think my methods are too complex, then good, sit back and listen; that was the point. Even attempting to measure more objectively would be so intensive that no reviewer in his right mind would ever bother. Most people want to use DBTs the way marketing groups do, under the erroneous belief that they are objective, and they're not.

Because again, to sit someone down under blind conditions and ask, "Which amp, A or B, do you like better?" and have them say A tells you only one thing: they like A better. That is their subjective view. You still don't know why, and you don't know whether A is actually better, just that they like it better. Flipping back and forth and asking them to identify A and B as they are switched has different problems, and is the reason I don't like using DBTs in the traditional sense, "traditional" being how some audio "scientists" have used them. The test isn't sensitive enough: it can be humanly impossible to identify, under those conditions, differences between amps that really do exist. To my knowledge this hasn't been done with amps, but it has been done with colors and other sounds. I have a few references to studies in which differences in the stimuli could be detected under one kind of test condition but not under the random A/B DBT condition, showing potential problems with DBTs.
The reasons they gave for why the DBTs didn't work were things like sense memory and sense confusion, the idea that if you make certain types of changes to a sensory experience, the brain will simply revert to a more comfortable experience. Sequencing is a big factor there; visual sequencing is the most studied, but some research on audio sequencing has also been done.
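For what it's worth, the sensitivity question can be put in numbers. A quick sketch (Python, purely illustrative; the trial counts are invented) of the binomial math behind an A/B identification test, showing why short tests can miss real differences:

```python
from math import comb

def abx_p_value(n_trials, n_correct):
    """One-sided binomial p-value: the chance of getting n_correct or more
    right out of n_trials by pure guessing (guess rate 1/2)."""
    return sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1)) / 2 ** n_trials

# 12 of 16 correct rejects "just guessing" at the usual 0.05 level...
print(round(abx_p_value(16, 12), 3))  # 0.038
# ...but 6 of 10 does not, even though the listener beat chance:
print(round(abx_p_value(10, 6), 3))   # 0.377
```

A listener who genuinely hears a difference, but only catches it 60% of the time, needs on the order of dozens of trials before the test can show it, which is one concrete sense in which a short DBT "isn't sensitive enough."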
 
pjpoes,
Perhaps you need to set aside the actual test methodology (DBT, visual cues) for a moment and decide what it is you are going to try and measure.

If we want more objective measures of sound, perhaps we can get people to suggest some. In most reviews, the metrics people discuss are either physically measurable (bass extension to XX Hz, or THD of X%) or subjective and ill-defined ("tight" bass, or accuracy of the soundstage).

It seems to me that you are reaching for something in the middle, something that is perception based, but quantifiable. I imagine most people on this board actually use metrics like this to determine what it is they like and dislike, but when results are posted, all the fine details get left out. Some may not even be consciously aware of the metrics they use.

For example, one thing I like to listen for is the note separation in a fast stand-up bass riff. If I can't hear the individual notes, then I would call the bass response muddy or loose. Perhaps as an objective measure, test subjects could be asked to try and count the number of notes they hear in a test sample. It is quantifiable, and bears directly on people's perception of sound.
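A sketch of how that note-counting idea could be turned into a score; the riff length and the listener counts below are invented:

```python
def note_count_score(actual_notes, counted_notes):
    """Score in [0, 1]: 1.0 when every note is heard, falling off linearly
    with the miss; a crude stand-in for "bass definition"."""
    miss = abs(actual_notes - counted_notes)
    return max(0.0, 1.0 - miss / actual_notes)

# A 12-note riff: hearing all 12 scores 1.0; a "muddy" presentation
# where only 8 notes come through scores about 0.67.
print(note_count_score(12, 12), round(note_count_score(12, 8), 2))  # 1.0 0.67
```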

Do people have any other suggestions for measurement metrics? What SPECIFICALLY do you listen for when judging audio equipment?
 
What you suggest isn't necessarily wrong or bad, though again, I feel I must not be expressing myself correctly, as that isn't quite what I meant, but you are starting to understand it.

First, I wasn't actually suggesting something in between in the sense of making up some new way to measure sound. I am borrowing from standard scientific methods of quantifying subjective experience, the most famous being the Likert-scale questionnaire, which uses scale questions (you know: on a scale of 1 to 10, 10 being the best, how would you rate such and such?), and suggesting we use these in reviews. So you are on the right track, but a scientist actually trying to do this would do more than just say "count the number of notes you hear"; that could, however, be one question on the scale. To make it usable, you have to code it: in this case, the closer the count is to the actual number of notes, the better the setup reproduces that aspect of sound. You would also have to label that aspect of sound, say, note definition in the mid-to-upper bass region. You would have lots of other questions like that, mixed in with the Likert questions.

To make the Likert scores valid, though, you have to train the listeners; otherwise the questions lack sensitivity and you get what's called a ceiling effect: all the scores hover around a certain area, quite often the top, for all the good products. They didn't call it that, but the Harman article discussed it: when they introduced standards of comparison, so that they had a "bad" anchor, anything decent bunched up at the top. They then fixed it the wrong way. They should have trained the listeners on how to score things, and then they would have seen greater resolution in the results.
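The ceiling effect is easy to check for mechanically. A small sketch; the cutoffs and both panels' scores are invented, not standard values:

```python
def ceiling_effect(scores, scale_max=10, top_fraction=0.2, threshold=0.5):
    """Flag a ceiling effect: too many scores bunched in the top of the scale.

    `top_fraction` defines the top band (e.g. 9-10 on a 10-point scale) and
    `threshold` is the share of scores that band may hold before we worry.
    """
    cutoff = scale_max * (1 - top_fraction)
    share_at_top = sum(1 for s in scores if s > cutoff) / len(scores)
    return share_at_top >= threshold

# Untrained panel: every decent product lands a 9 or 10, so no resolution:
print(ceiling_effect([9, 10, 9, 9, 10, 8, 9, 10]))   # True
# Trained panel spreads its scores, giving usable resolution:
print(ceiling_effect([4, 7, 5, 8, 6, 3, 7, 5]))      # False
```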

This would be kind of crazy, but one way to use this would be, say, to develop a scale where one question is about resolution. We make a recording to use as our standard, and it records something difficult to resolve. Say we record a snare drum, and we want listeners to hear the snap of the stick against the skin, the metal beads (I don't know the right name) shaking, the cymbals rattling from the vibrations, the reflections around the room: things that could all easily get lost if the system could not resolve fine detail. Then we have a series of questions asking, on the scale, how well could you hear each of those, as separate questions. Tally the scores, and the higher the total, the more resolving the system is, with drums. That is always an issue with this method: you can't generalize, which is true of anything, but people do. They say, "Oh, it scored high, it must be resolving," but it may not be in the higher frequencies, or with other midrange instruments. Your idea is fine too, really, but you will have things that can't be quantified that concretely, and for those you just have to standardize by coming up with standard definitions of terms and having listeners rate on a scale how well a system achieves the ideal of each term. And they would know the ideal because you trained them to know it.

As for what I look for in a good system: resolution in the midband is important. I like to hear the delineation of midrange instruments. Horns, especially the trumpet, are big for me; guitar too, both electric and acoustic. I play the electric guitar, so I am quite familiar with it, and I also go to a lot of live shows and am involved in both recording artists and being recorded, so I am familiar, up to a point, with that aspect as well. I use those as my standards; I can make meaningful judgments for myself about how closely something approximates the sound I hold as my standard.

Soundstaging is important to me as well, and I have various things I look for. Instruments should be placed correctly, both in scale and in location within the three dimensions, and I find that most systems have trouble with depth and with having a stage exist outside the speakers. I of course have standard recordings I test with, ones I know will do things like present a lot of depth, or have an instrument extend beyond the edges of the speakers. Sometimes I think those are the results of accidents in the studio, but hey, it worked. One example is a 1950s RCA Red label recording; I can't remember the title, as I haven't used it in a while, but I remember noticing one day that for some reason the rear stage left extended out beyond the speakers quite a bit. I think it was probably a phase problem, but nonetheless it made the orchestra sound realistically large instead of compressed between my speakers like normal. I thought it might be a strange interaction with my stereo, so I played it on a secondary system in my living room: same thing. Then again at my father's house: same thing. It's in the recording. However, on my father's system and my secondary system, which both don't image nearly as well, the effect was far less pronounced, and in one case the stage was completely flat but still extended far beyond the speaker.
I also listen for the visceral aspects: music is physical, you can feel it, and I think that affects how realistic something sounds to us. A kick drum sitting 10-15 feet in front of me would shake the hell out of my house and probably feel like a thunk in my gut, but even my current stereo can't reproduce that realistically. Again, that is something that could be formalized easily enough: listeners rate on a scale how intensely they felt the kick drum, and in that case I think the realistic score would sit in the middle; the closer to the middle, the more realistic. A large set of subwoofers, or a massive stereo in a small enough room, could possibly reproduce a drum at a more physically intense level than would ever actually be felt if the drum were in the same room. To keep it accurate, train the listeners: have them hear a drum in the room they are reviewing in, so they know what a live drum in that room sounds like, then play back a recording of the same drum recorded in the same room. I know, you still have variables; no recording format to date could likely capture every aspect of the live sound, and no recording method could either, so the recording is inevitably compromised to a point, but you can always prorate the score. That, by the way, is where you get into scientists fudging numbers: they use inconsistent methods to "prorate" the score (not the proper term in statistics; it would actually be manipulating alpha), shifting it in whatever direction helps prove their hypothesis.
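The alpha-fudging complaint is easy to illustrate. A toy sketch, with a made-up p-value:

```python
def verdict(p_value, alpha):
    """Report significance under a given alpha cutoff."""
    return "significant" if p_value < alpha else "not significant"

# One made-up result, two after-the-fact cutoffs, two opposite conclusions:
p = 0.08
print(verdict(p, 0.05))  # not significant
print(verdict(p, 0.10))  # significant
# An honest protocol fixes alpha before the data are collected; choosing it
# afterwards lets the analyst steer the conclusion either way.
```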
 
Shoebacca said:

Do people have any other suggestions for measurement metrics? What SPECIFICALLY do you listen for when judging audio equipment?

Music :)

Could it be you're all trying to quantify the unquantifiable?

While trying to legitimize DBT as a scientific method of evaluating hi-fi components, why not go a little beyond? Just hook the subjects up to electroencephalogram machines (or lie detectors, or whatever) and measure their brain/body reactions while they're listening.

Of course, not all individuals will react the same way, and you'll be back at square one. Using monkeys for the testing is out of the question too, as they're all individuals like us. Just watch the Odyssey Channel.

I'm in a cybercafé and can't type much more. I'll come back with a serious point of view...
 
If DBT doesn't work because "perception of listener is subjective and depends on so many things", why bother with reviews and forums like this? After all, if everything some here have said were true, we wouldn't know how a particular piece of wire/amplifier/CD player/resistor/capacitor/insert-your-pet-audio-fantasy-here is going to sound to us from one day to the next (everything changes, remember), let alone believe phony reviewers, or a DIYer who often doesn't know a resistor from a capacitor yet declares a "day and night" difference between amplifier designs.

Sorry, but until you give us something better, DBT is the best method yet for weeding out the crooks who dismiss engineering and science and embrace voodoo and black magic.
Don't believe measurements? Here ya go, mate: I'll switch the wires and you tell me which is which.
End of story.
 
janneman said:
Well, IMHO DBT is the best we have to identify audible differences without mucking up the perception with extraneous impressions & beliefs, if there ARE differences. So it follows that IMHO it IS the most sensitive.

The DBT may be free from certain biases, but it does not follow that it is the most sensitive test in all circumstances. The DBT may provide results that are free from certain biases while reducing the acuity of the listener compared to other listening methods.

I don't think you'll find anyone that will say Coke and Pepsi taste the same, but there are plenty of blind test results showing that they do. Someone must be "imagining" in these cases - which group is it? The ones that taste the difference or the ones that don't?
 
Bratislav said:
If DBT doesn't work because "perception of listener is subjective and depends on so many things", why bother with reviews and forums like this?

Indeed, why bother?

I don't come here for opinions - I come here for inspiration and ideas. For the reason above and because there are few people that can clearly communicate what they hear (myself included), I don't trust anyone's ears but my own.
 
There is a sidebar in the latest edition of "Discover" concerning perception that may throw a wrinkle into this discussion. It seems someone exposed test subjects to a scent while flashing words on the wall in front of them. This was done without giving the subjects any explanation of what was going on; they were simply asked to say whether they liked the scent or not. The words were either "cheddar cheese" or "body odor". I'm sure you wouldn't be surprised by the answers.

But that's NOT the punchline. The subjects had brain MRI images taken during the test. When "cheddar cheese" was flashed, there was a higher level of brain activity "in the secondary olfactory cortex -- a collection of neurons that mediate pleasant sensory responses to smells and taste."

Hypothesis: if you associate certain visual cues (Conrad Johnson logos, glowing tubes, certain colors of cable terminated with gold plate, etc.) with a pleasant listening experience, it may not merely be that you believe they sound better than items lacking the cues, but that your brain processes the information in such a way that they really do sound nicer by the time the signal reaches the conscious part of the brain. I'm not claiming to believe this on the basis of one sidebar, but it's not an unreasonable possibility. Too bad no one is likely to try this out with audio components. I would like to compare the brain images of someone listening to a real CJ unit vs. someone listening to a RadioShack receiver hidden inside a CJ enclosure.

:D :D
 
jeff mai said:

For the reason above and because there are few people that can clearly communicate what they hear (myself included), I don't trust anyone's ears but my own.

The whole point of DBT is to admit that you cannot trust even your own ears when other "subjective" influences are present, like knowing which wire/amp/resistor/connector/whatever is in play.

Yes, we are a subjective lot. Yes, we are influenced by our surroundings, our emotions, what we had for breakfast, whether we got laid, whether it was sunny or rainy outside. That should NOT be a reason to spend another 5 grand on that speaker cable! You can invest in other things that will make you feel good much more objectively :angel:
 
I tend to subscribe to Sam's hypothesis.

I don't think you'll find anyone that will say Coke and Pepsi taste the same, but there are plenty of blind test results showing that they do.

Really? Can you give a cite of any real experiment that showed this? I see it mentioned all the time, but I find it rather incredible, especially given the results in wine sensory panels that I run routinely.
 
Bratislav said:
The whole point of DBT is to admit that you cannot trust even your own ears when other "subjective" influences are present, like knowing which wire/amp/resistor/connector/whatever is in play.

Why stop at a quarter measure? If you really wanted to be free of subjective influences you'd let a large group of disinterested strangers choose your system.
 
SY said:
Really? Can you give a cite of any real experiment that showed this? I see it mentioned all the time, but I find it rather incredible, especially given the results in wine sensory panels that I run routinely.

First Google search result:

http://www.museumofhoaxes.com/hoax/weblog/comments/1448/P20/

Unfortunately the original article in The Independent is subscription only.
 
quote:
I don't think you'll find anyone that will say Coke and Pepsi taste the same, but there are plenty of blind test results showing that they do.

Really? Can you give a cite of any real experiment that showed this? I see it mentioned all the time, but I find it rather incredible, especially given the results in wine sensory panels that I run routinely.

How about Diet Coke vs. Diet Pepsi? I'm not being silly. To me they seem interchangeable, but both differ from the non-diet variety.
 
BTW, Sam's hypothesis suggests that if the overall sensory experience is enhanced by visual cues, logos, inch-thick brushed-aluminum front panels, the presence of a Bybee doodad, or what not, then the purveyors of such things are actually providing value for money, at least for certain individuals. Same for crystal pyramids, copper bracelets, etc.

The same applies to those who are impressed by THD+N vs. frequency charts. It doesn't matter if, per your audiologist, it's inaudible below .0x%; if the chart is impressive, the brain will make it sound better. In DIY-land even the schematic may have an effect: different people seem to see elegance in different topologies based on the diagram alone.
 
SY said:
Jeff, that secondary source (too bad that there's no info on who did the test, methodology, etc.) does NOT say that tasters couldn't distinguish the two. It says that there was no consistent difference in preference. That is not at all the same thing.

Point taken, I was sucked in by the misleading byline which said "the difference is all in your head".

Considering the fondness many people have for making qualitative judgements, I'd find it quite surprising if most people expressing no preference did taste a difference. I agree though that it's not the same thing.
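SY's discrimination-vs-preference distinction can be made concrete with a little binomial arithmetic. A sketch with invented counts; the triangle test, with its 1/3 guess rate, is a standard discrimination format in sensory work:

```python
from math import comb

def binom_p_value(n, k, chance):
    """One-sided p-value: probability of k or more correct out of n
    at guessing rate `chance`."""
    return sum(comb(n, i) * chance**i * (1 - chance)**(n - i) for i in range(k, n + 1))

# Discrimination (triangle test, guess rate 1/3): 16 of 30 correct is
# strong evidence the tasters CAN tell the samples apart...
p_discrimination = binom_p_value(30, 16, 1/3)
# ...even while a preference question (guess rate 1/2) splitting 16 vs 14
# shows no consistent preference at all.
p_preference = binom_p_value(30, 16, 1/2)
print(p_discrimination < 0.05, p_preference < 0.05)  # True False
```

So a panel can pass the discrimination test decisively while the preference question comes out a wash, which is exactly the gap between "they taste the same" and "no consistent preference".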
 
jeff mai said:


Why stop at a quarter measure? If you really wanted to be free of subjective influences you'd let a large group of disinterested strangers choose your system.

Funnily enough, that is exactly what happens in most cases. People buy items they are told sound great.
Or do you honestly believe that Martin Colloms/John Atkinson/Gregg Borowmann/the sweet-talking dude at Encel's is interested one iota in YOUR system (beyond obvious interests like pushing the product for whatever personal gain there might be)?
 
jeff mai said:


Point taken, I was sucked in by the misleading byline which said "the difference is all in your head".

Considering the fondness many people have for making qualitative judgements, I'd find it quite surprising if most people expressing no preference did taste a difference. I agree though that it's not the same thing.


Here in Toronto we have a three-week-long "carnival" nicknamed the "EX". One summer I worked there doing the pop quiz. No one could see the product, and all the cheesy little plastic cups were the same. We knew exactly what we put in each cup, but the girl running it always made us back away from the table when a new crew of "monkeys" came up to the lab for a taste (I guess so our faces wouldn't betray anything). Sometimes, just to be pricks, we would pour a little Pepsi into the Coke or vice versa, just to see their faces, and almost everyone scrunched up a little, like they couldn't quite figure it out. BUT: over our three weeks, four days a week, six hours a day, I watched with my own two eyes as literally hundreds of people came through and did the taste test. You know what? It was almost 20 years ago, but I do remember that better than 7 out of 10 chose Coke. No waffling, just point and shoot: "That one." Young, old, skaters, BMXers; didn't matter.

They are both sweet, acidic, nasty drinks that are a last resort when I am thirsty, but Coke > Pepsi.

One day I want to put Coke up against Bud Light. I wonder who would win that fight... :rolleyes:

I know that taste test wasn't a DBT, but I just wanted to mention it. So I could repeat that Coke is better than Pepsi, because the majority thinks so :p
 