Claim your $1M from the Great Randi

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
It's funny that some of the first verbs I learned in French were "cracher" and "avaler." Both "er" verbs with pretty regular conjugation.

See, the jokes write themselves.

In any case, let me also clarify how triangle tests work in wine tasting. They're actually a bit different from ABX, but have similar rigor in removing all psychological cues; they're really XYZ, where in any given trio, two of the three are the same.

As an example, let's say we're tasting wines A versus B to see if there is any perceivable difference. Any given trio could be AAB, ABA, BAA, BBA, BAB, or ABB. The taster doesn't need to say which one is A or which one is B; he or she merely needs to spot the odd wine in the trio. We've done this in ABX format, too, but it requires more trials; there's actually a bit of multiplexing in the XYZ triangle version. And certainly that mode of testing could be done in audio, too, if desired.
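To make the statistics of the triangle format concrete: under the null hypothesis (no perceivable difference), the taster spots the odd sample by pure chance with probability 1/3 per trio, so significance is just a one-sided binomial tail. A minimal sketch (the function name is mine, not anything from the thread):

```python
from math import comb

def triangle_test_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value for a triangle (XYZ) test.

    Under the null hypothesis of no perceivable difference, the taster
    picks the odd sample by chance with probability 1/3 per trio.
    """
    p_chance = 1 / 3
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

So, for example, 7 correct out of 10 trios is already well below the usual 0.05 threshold, whereas 5 of 10 is not; this is why the triangle format needs fewer trials than a 50/50-chance format to reach the same confidence.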

There are many other forms of blind test, too. For example, when I was designing pointing devices for computers, we would test things like the effect of production tolerances, how to set a specification, or the effect of a coding change on responsiveness and usability. We would mix a group of pointing devices with different characteristics, code them, then have the users rank them or do a Fitts test, chasing little targets around a screen. You could then try to correlate the effect of the variable with user ranking or Fitts test performance. Or you might find out that users couldn't tell the difference. The key thing, though, is that we could not make performance claims without doing a blind test where the subject was unaware of whether or not he or she was using a "good" device or a "bad" device. Our customers demanded such testing, did it themselves, and again, no one complained about the validity of blind testing just because things that they WANTED to be true turned out not to be so.
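For readers unfamiliar with that kind of pointing-device test: the target-chasing task is scored with Fitts' law, which predicts movement time as a linear function of an "index of difficulty" in bits. A hedged sketch of the arithmetic (using the common Shannon formulation; the helper names are mine):

```python
from math import log2

def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits:
    ID = log2(D/W + 1), where D is distance to the target and W its width."""
    return log2(distance / width + 1)

def fit_fitts_law(ids, times):
    """Ordinary least-squares fit of MT = a + b * ID over paired lists of
    index-of-difficulty values and measured movement times.
    Returns (a, b); throughput in bits/s is roughly 1 / b."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_t = sum(times) / n
    b = sum((i - mean_id) * (t - mean_t) for i, t in zip(ids, times)) / sum(
        (i - mean_id) ** 2 for i in ids
    )
    a = mean_t - b * mean_id
    return a, b
```

Comparing the fitted slope (or throughput) across coded devices, with subjects blind to which device is which, is exactly the sort of correlation described above.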
 
SY, I think that there are serious differences in the 'wine tasting' test and the ABX test. I think that it really makes a difference in the results, even if the same statistics are employed. It may also be that we are looking for different things, such as 'quality difference' in wines, but we expect the hi fi stuff to be virtually the same 'transmission' of audio information, and if there is a difference, we will first equalize it out, as best we can.
 
John, do you recall offhand what blind test format Lipshitz and Vanderkooy used to establish the audibility of level and eq changes at the 0.1dB level? They published this in JAES sometime in the late '70s, and knowing your phenomenal library and impeccable filing system, I figure you've got the paper.
 
AX tech editor
Joined 2002
Paid Member
john curl said:
SY, I think that there are serious differences in the 'wine tasting' test and the ABX test. I think that it really makes a difference in the results, even if the same statistics are employed. It may also be that we are looking for different things, such as 'quality difference' in wines, but we expect the hi fi stuff to be virtually the same 'transmission' of audio information, and if there is a difference, we will first equalize it out, as best we can.


John,

Why would it be different? It seems to me that those tests are trying to find differences between wines, or differences between audio reproduction, period. These differences can take different forms, of course. Sure, you use different senses in each case, but there doesn't seem to be a conceptual difference.
The case of trying to equalise the audio level before testing seems analogous to, say, making sure all wines are at the same temperature. It seems common sense that if you want to find differences in taste, you try to eliminate the other differences so they don't confound your perception. Likewise, when looking for audible differences caused by different equipment, you try to eliminate differences in level, which would also confuse your perceptive apparatus.

Jan Didden
 
John,

I'd be interested in why you believe ABX testing specifically is flawed for the purpose of establishing audibility differences in audio equipment. It's one thing to dismiss it, but it would be more informative to us all if you spelled out your concerns (other than anecdotes like 'I've seen bar tests with 7up and ginger ale where it didn't work so well').

And if your concerns about audio ABX testing have merit, then what would you suggest as a controlled testing alternative? Are there any controlled listening formats that you would find acceptable, or does it all boil down to the age-old 'listening during a test isn't the same as listening when relaxed' explanation?
 
John, can you summarize your objections to L&V's procedures? (I asked the question earlier because it's been 15 years since I read their papers and I don't remember the details.) Do you have a copy of your published objections I could read?

To tie together your analogy and Jan's point along with my comments about wine, when we do a test on wine, we use the same kinds of glasses for each sample. If you've got a wine in a big glass and the same one in a small glass, differences can be found which aren't there.
 
SY, I do have virtually all the info from the AES on CD rom and all articles from 'The Audio Amateur' where most of the important debate occurred. If you want to contact me personally, I can get it for you. It would be easiest for you to pick up copies from me, as my computer is not at this time capable of sending info on this forum, or even by e-mail.
My basic complaint about the original Lipshitz-Vanderkooy articles in 'TAA' was the lack of technical understanding needed to make the testing somewhat of a level playing field. For example, they rolled off the highs 6 dB at 15 kHz on both units being tested, and apparently didn't notice or care. They also tried to test for slew rate with a moving-magnet cartridge that had a 4th-order rolloff filter at approximately 20 kHz. DUH?
To me, there were other factors similar to wine tasting with equally dirty glasses, rather than clean ones.
My problem with ABX testing itself was first stated 25 years ago in 'TAA'. This was followed up by subsequent articles in 'TAA' by Rod Rees, who was a professor at Washington State University (I'm pretty sure). It is all here, if you are interested. You are free to pass any info that I give you on this subject to associates or post it here.
 
John,

IIRC the main point (or one of the main points) made by you in that long-gone epoch was that, although (double-)blind testing removes some potential biases for listeners to hear differences that were not really there, it doesn't remove biases NOT to hear differences that ARE there. This may, among other reasons, be to avoid possible embarrassment if one is exposed as being "wrong".
This in itself is a valid point, although of course exactly the same point, or even stronger, could be made for sighted tests.

But please observe: blind testing is often (and in fact should be) arranged so that individual choices or preferences are anonymous. After all, we are after statistically significant scores, and it really doesn't matter whether it was Stuart or John (no pun intended) who made a particular choice. This immediately removes any anxiety about possible embarrassment, so it becomes a non-issue.

In fact, I feel that the opposite of your point is true. It seems to me that listeners in any test would be strongly motivated to confirm, if only to themselves, that they CAN hear differences. It is also for this reason that blind testing MUST be anonymous to avoid competition among listeners on who can hear the most differences the quickest.

So, whichever way you cut it, it seems to me that the "blinder" the test the more you force the participants into "honesty".

Jan Didden
 
it doesn't remove biases NOT to hear differences that ARE there.

This is encountered in non-audio blind testing. A couple of simple mitigation techniques are to select test subjects in such a way that the nature and purpose of the test is concealed, or to imbue the test subjects with false counter-expectations.

For example, instead of telling the subjects that the investigators are from Golden Ear Electronics, Inc. and that they are going to compare different pieces of audio hardware, you tell them you are from Juilliard and that they are going to compare selections from the Muleshoe, Texas High School Youth Symphony to the Aroostook, Maine Community Chamber Orchestra. In other words, something the test subjects are unlikely to give a d**n about. Of course, what you actually play is the same recording by whoever, through different pieces of equipment. You also tell your test subjects that if they are discerning enough to correctly differentiate between the two groups, they get a prize. (Free concert tickets?) Thus they have no clue what is actually being tested, but they are motivated to discern differences. This removes the bias NOT to hear differences between equipment. Since there will (should) be a baseline series where there really are NO differences, i.e., all the sound samples are driven by the same equipment, you can compare performance in that case to the series where you actually are presenting them with different equipment. Ideally, you, the tester, have no way of knowing until later which was which. This eliminates bias in either direction.

The point is that a bias NOT to find differences can destroy the validity of your test, and there is no good way to detect and nullify that bias. A bias TO find differences can be detected and countered by administering a no-difference baseline which can be compared to the "real" test. Thus you can improve the validity of the test by concealing the actual purpose, coupled with providing a motivation to FIND differences.
 
Hi Sam,

Interesting post. The only concern I would have is that I think you have in some way to direct the listeners to the differences you are looking for. In your example, they may for instance come up with differences along the lines of "this one is more sensitive, the other recording is colder." You're not really looking for that; you're (I presume) looking for things like "this is smoother, this is less rough." I don't know, it's 1.30 in the morning, maybe I'm talking nonsense, but I have a gnawing concern.

Jan Didden
 
Jan: too easy this time. Next time, try Heinlein or Clarke.

sam9: Randi uses a good technique, one that I borrowed during the time when I was playing around with blind-testing of amps and preamps. I always used subjects who claimed that they were hearing night-and-day differences. And we would do the first few runs non-blinded to make sure that they could still "hear" those differences. Once the test was blinded, they would start frowning and I knew what the outcome would end up being. A few didn't frown, had supreme confidence that they scored well and were honestly astounded when we checked their answers against the key.

FWIW, our tasting test today was a Petite Sirah bottled side by side with synthetic cork and screwcap. Identical chemistries (pH, free and bound sulfur, optical absorbance). With only 30 days in the bottle, my lab tech and I were able to score 100% identification in 5 trio-forced-choice trials each. This blind testing stuff really works. I don't know what the heck the test-pressure grumbling is all about.
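Checking the arithmetic on that result: with chance at 1/3 per trio, a perfect 5-for-5 score is already very unlikely under the null, and two tasters both going perfect compounds it. A quick back-of-the-envelope sketch (my calculation, not from the post):

```python
# Probability of going 5/5 in triangle trials by pure guessing (chance = 1/3)
p_per_taster = (1 / 3) ** 5   # = 1/243, about 0.0041

# Two independent tasters, both perfect
p_both = p_per_taster ** 2    # = 1/59049, about 1.7e-5

print(round(p_per_taster, 5))  # 0.00412
print(f"{p_both:.2e}")         # 1.69e-05
```

In other words, each taster individually clears a 0.05 significance bar by more than an order of magnitude, which is presumably why SY calls the result so convincing.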

And now it's off to a nice dinner in San Francisco, where the bottles will be all unblinded and largely German and Austrian.
 
To expand on my point in a more general way: John certainly pointed out a possible problem with ABX testing, but it sounds like (apologies if I misinterpret) having found a problem, rather than suggesting a way to fix it, he just dismissed the whole thing. Heck, no one would ever get an amplifier design to work with that approach!

If you'll forgive a bit of exaggeration: suppose in ca. 1915 folks had listened to an early broadcast using one of Lee de Forest's triodes, judged (quite correctly) that it sounded like crap, and decided just to forget the whole business. Today we would be getting the latest results from Athens via spark-gap transmitters and Morse code.
 