Claim your $1M from the Great Randi

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
Konnichiwa,

sam9 said:
To expand on my point in a more general way: John certainly pointed out a possible problem with ABX testing, but it sounds like (apologies if I misinterpret), having found a problem, rather than suggesting a way to fix it, he just dismissed the whole thing.

Enough suggestions have been made over time as to "how to fix the ABX test". For example, when the ABX test showed that cable differences were audible at a .2 significance (meaning, roughly, that we are 80% certain the results were not random) with a small sample size (where even a .2 significance would statistically favour a "null" result for small differences), the solution was to shift the significance to .05, which reliably returned a "null" result for pretty much anything.

To be clear, a .2 significance means that 4 correct out of 5 trials is taken as rejecting the "null hypothesis" with reasonable certainty. On the other hand, a .05 significance means that we require 19 correct in 20 trials before we reject the null hypothesis.
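(As an editorial aside, the binomial arithmetic behind those two scores is easy to check; this little sketch is mine, not part of the original post. It computes the one-sided probability of reaching a given score by pure guessing.)

```python
from math import comb

def p_value(successes: int, trials: int) -> float:
    """Probability of scoring at least `successes` out of `trials`
    by pure guessing (p = 0.5 per trial), i.e. the one-sided
    binomial tail used to judge an ABX run."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

# 4 correct out of 5 trials: just under the 0.2 threshold
print(p_value(4, 5))    # 0.1875

# 19 correct out of 20 trials: far below the 0.05 threshold
print(p_value(19, 20))  # about 2e-05
```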

Now, let us ignore any concerns about artificially levelled playing fields and test setups that are simply not discriminating enough (in effect the audio equivalent of the bar ABX test for drinks, of which certain high-profile ABX experimenters seem so awfully fond). Discounting ANY such tomfoolery and deliberate manipulation of the tests, and instituting an ABX test with a .05 significance, you will find that the vast majority of audiophiles AND sound engineers lack the discrimination and acoustic memory retention to complete the above trial 19/20 successfully, even with quite obviously audible stimuli (like polarity inversion in ONE channel versus BOTH channels; both are considered audible).

We may argue that the reason people fail to succeed is down to experimental stress, poor hearing, whatever. The fact remains that the specific, small-sample-size double blind test proposed for audio use as "ABX" is inherently flawed in both the test setup and the post-test statistical evaluation.

The possible fixes are manifold, but there is one key element that is invariably absent from the ABX "let's prove audiophiles are idiots who pay over the odds for something they cannot hear" test, but which has been present in all successful double blind tests for audio use (e.g. research into the audibility of compression algorithms to the AVERAGE listener), and which is unlikely to be corrected in audiophile-debunking tests, namely sample size.

If you assume small differences and wish to qualify their audibility, you MUST either adjust your significance or increase your sample size (participants and trials) until your chosen significance can yield results other than an automatic "null". This implies either very large studies with literally hundreds of participants, just as they are conducted in serious professional research in medicine and/or serious audio, or publishing the test data with the admission that, due to the lack of sample size, a significance better than .4 could not reliably be applied to the test, "but we chose to use .05 anyway and got the result we wanted, namely nobody can hear anything!"
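(To put a rough number on the sample sizes involved, here is an editorial sketch of my own, not from the thread. Assuming a listener who genuinely hears a small difference and answers correctly 60% of the time, it searches for the trial count at which a .05-level test would catch them with 80% probability, using exact binomial tails.)

```python
from math import comb

def tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); an empty sum (k > n) is 0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def trials_needed(p_true: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest trial count at which a listener who answers correctly
    with probability p_true passes the alpha-level test with the
    requested probability (the test's power)."""
    for n in range(1, 1000):
        # smallest score that still rejects the null at level alpha
        # (k == n + 1 means no passing score exists at this n)
        k = next(k for k in range(n + 2) if tail(k, n, 0.5) <= alpha)
        if tail(k, n, p_true) >= power:
            return n
    raise ValueError("no trial count found in search range")

# a small but real difference (60% accuracy) needs on the order of
# 150 trials; an obvious one (90% accuracy) needs only a handful
print(trials_needed(0.6))
print(trials_needed(0.9))
```

The sheer size of that first number is the point: with a handful of trials, a .05 criterion cannot distinguish a subtle-but-real difference from guessing.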

My problem with most blind tests is less the fundamental method than many specifics of the implementation and the post-test statistical analysis.

SY, IF I were to conduct an ABX-style test of whether a small group of average wine drinkers could tell the difference between an Italian Chianti and a French Cabernet, I would with disturbing reliability find that "all wines taste the same". I would also find that many supposedly golden tongues find "all wines taste the same". I could then safely dismiss the few who did taste a difference as "exceptional" and as "lucky coins" and claim "all wine tastes the same". The reason this has never been done is the obviousness of the fact that wines taste different.

However, with sufficient intent I can use blind testing and statistics to "prove" anything. Of course, as I pointed out before, even a million ABX tests with null results don't "prove" ANYTHING WHATSOEVER, but this fine distinction is usually omitted from the publication of the (null) results and not understood by the many who blithely claim that so-and-so has proven that this'n'that is inaudible and anyone paying money to buy this'n'that is a fool, easily parted from his money.

With that, in parting, I would suggest that correctly implemented blind and double blind testing is a perfectly useful tool, as is a 9mm fully automatic Uzi in assassinations.

But the Uzi would not be my tool of choice to reliably hit my target at a distance of 500m without hitting anyone else. It would be great, though, for taking someone out drive-by style on the sidewalk, along with half of his posse and a few innocent bystanders in the process. Similar distinctions apply to all tools, including blind testing: use the right tool for the right job.

The people who currently use ABX testing ARE using the right tool for THEIR job, which is to convince everyone that there are no audible differences between audio gear.

Sayonara
 
SY, IF I were to conduct an ABX-style test of whether a small group of average wine drinkers could tell the difference between an Italian Chianti and a French Cabernet, I would with disturbing reliability find that "all wines taste the same". I would also find that many supposedly golden tongues find "all wines taste the same".

I suggest you try this. You will find that you are 100% incorrect. The drinkers may well not be able to say which is the Cabernet and which is the Chianti, but they will have no problem telling that the wines are different. Heck, even the color will be different.

And, unlike the world of goo-goo audio, credentialed professionals in wine take and pass blind tests as part of their certification exams.
 
As a proud member of the high end audio design team, I think that this obsession with double blind testing shows a problem in itself. It is not really necessary for making a successful audio product. However, if the problems of ABX testing were easy to fix, or the proponents were willing to address many of the criticisms of the test, then it would be used. At this point, it is essentially a NULL test that implies we are generally wasting our time attempting to improve audio design.
 
AX tech editor
Joined 2002
Paid Member
john curl said:
As a proud member of the high end audio design team, I think that this obsession with double blind testing shows a problem in itself. It is not really necessary to make a successful audio product. [snip]


That is very true. I would even go as far as to suggest that it isn't really necessary to make a GOOD audio product to be successful. You can fool most of the people most of the time. To be honest, this is not limited to audio, of course.

Jan Didden
 
Janneman, you FORGOT that: "You can't fool ALL the people, ALL the time!"
We ALL know that we can sometimes fool ourselves, and others. BUT, that does NOT mean that we would bother to fool ourselves, when we could make something cheaper and more profitable to us, by fooling ourselves and others. We want to make better audio products, this keeps us on our toes to watch out against fooling ourselves in some way. PS on rereading, I hope that I have not been 'foolish' in the way that I stated this :xeye:
 
"You can't fool ALL the people, ALL the time!"
We ALL know that we can sometimes fool ourselves, and others. BUT, that does NOT mean that we would bother to fool ourselves, when we could make something cheaper and more profitable to us, by fooling ourselves and others. We want to make better audio products, this keeps us on our toes to watch out against fooling ourselves in some way.

Is there not also a tendency to fool ourselves with the assumption that "making it cheaper" is the opposite of "making it better"? If high-quality (and "high-end") audio is a good thing, would it not be a noble endeavor to extend its availability to those who are less prosperous?
 
I don't quite know where this is going, but: I design products, or at least assist others in product design, at ALL price levels. For example, I might review a $100 retail phono stage and offer suggestions to improve it. At the same time I make a $5000 retail phono stage that really sounds a lot better. I apply all that I can from my experience with the $5000 phono stage to make the $100 unit as good as possible, but I'm afraid it doesn't sound quite as good. If I subjected these two preamps to an ABX test, it is quite probable that almost nobody would be able to hear the difference to a 95% statistical level. Does this mean that the $100 unit, used within its capabilities, is just as good sounding as the $5000 unit? If so, then save your money, folks! However, in normal listening tests there is a significant difference between the phono stages, so I think there is something wrong with the ABX test method, rather than that the two phono stages actually perform at the same quality level.
 
john curl said:
Janneman, you FORGOT that: "You can't fool ALL the people, ALL the time!"
We ALL know that we can sometimes fool ourselves, and others. BUT, that does NOT mean that we would bother to fool ourselves, when we could make something cheaper and more profitable to us, by fooling ourselves and others. We want to make better audio products, this keeps us on our toes to watch out against fooling ourselves in some way. PS on rereading, I hope that I have not been 'foolish' in the way that I stated this :xeye:

John,

I get your point. I think I was not very clear about what I meant. It was in no way meant negatively toward all those very erudite and experienced audio designers who turn out the finest products without any ABX whatsoever, and who make amateurs like myself feel like very small fish indeed. What I meant was that sometimes you see a product on the market that is so mediocre, to say the least, that it makes your toes curl (no pun intended, honestly, it's a Dutch saying). And somebody makes a lot of money off it (defining 'success' for convenience as 'making a lot of money'). And that happens also in the audio market. What I meant to say is that success is no guarantee of quality, which was what I thought you said.

Anyway, knowing full well that we also routinely fool ourselves, one should always search for tools or methods to try to minimize it, and to try to be as 'scientific', if you get my drift, as possible. To me ABX is a good tool, albeit not a perfect one.


When I say we fool ourselves, I don't mean that we do it consciously and willingly. But our perception apparatus is very inaccurate. It works by analogies. It works out things like 'sounds like', 'looks like', 'feels like', 'it reminds me of', 'it's like that time that I', etc.
So, when you really are looking for very small, precise differences in, say, sound or sight, you have to force yourself into an unnatural mode. Being aware of it is half the battle; training yourself also helps. But it is important that you constantly watch yourself and remain very critical.

Jan Didden
 
john curl said:
I don't quite know where this is going, but: I design products, or at least assist others in product design, at ALL price levels. For example, I might review a $100 retail phono stage and offer suggestions to improve it. At the same time I make a $5000 retail phono stage that really sounds a lot better. I apply all that I can from my experience with the $5000 phono stage to make the $100 unit as good as possible, but I'm afraid it doesn't sound quite as good. If I subjected these two preamps to an ABX test, it is quite probable that almost nobody would be able to hear the difference to a 95% statistical level. Does this mean that the $100 unit, used within its capabilities, is just as good sounding as the $5000 unit? If so, then save your money, folks! However, in normal listening tests there is a significant difference between the phono stages, so I think there is something wrong with the ABX test method, rather than that the two phono stages actually perform at the same quality level.


John,

For me this comes back to a point I mentioned earlier, which is that perception is a complex of inputs, both external and internally generated. If someone listens to and reports on a sound, it is practically IMPOSSIBLE to ONLY take into account what impinges on the tympanum and ignore everything else.

It has, as a simple example, been discovered that in anticipation of a certain perception, the brain actively increases the gain in those sensory channels that would support the expectation, and decreases the gain in those channels that would work against it. And that is expected behaviour if you realise that in a forest, where you gather fruit, you would have your acoustic perception tuned to the sounds of an approaching tiger. You don't want to be distracted by, say, a songbird, so you decrease the high-frequency channels. I'm not making this up; I can give you several references to studies if you want.

So, perception is complex, involving expectations and experiences as well as the actual physical stimulus.

Therefore, to come full circle, it is no mystery to me that in sighted tests one can 'hear' (or should I say 'perceive') differences that disappear in an ABX test. In an ABX test you are really sensory deprived! It would be daft to do an ABX test in a shop when you want to buy a system. You want the system you 'like' in the widest sense, and the actual sound is just a part of it. But if you want to judge, say, whether changing a resistor from carbon film to metal film makes a difference, you cannot escape some form of blind testing, IMHO.

Jan Didden
 
The really interesting question is whether or not we suffer from placebo effects.

Only if we're human.


It would be daft to do an ABX test in a shop when you want to buy a system.

But quite smart to do one there if you want to buy an amplifier or CD player (or, horror of horrors, magic hockey pucks or rocks in a jar), assuming you can convince the shop to do a proper level-match. If I can't tell the difference between a $3000 amp and a $400 amp, that's $2600 left that I can spend on concert tickets, CDs, and chasing girls.

Blind testing (whether ABX or some other format) is an invaluable tool in all forms of sensory research. It's a bit of a straw man to complain that it's not useful for doing other tasks.
 
SY said:
[snip]But quite smart to do one there if you want to buy an amplifier or CD player (or, horror of horrors, magic hockey pucks or rocks in a jar), assuming you can convince the shop to do a proper level-match. If I can't tell the difference between a $3000 amp and a $400 amp, that's $2600 left that I can spend on concert tickets, CDs, and chasing girls.

Blind testing (whether ABX or some other format) is an invaluable tool in all forms of sensory research. It's a bit of a straw man to complain that it's not useful for doing other tasks.

Stuart,

When you go to a shop to check out these rocks-in-a-jar, you have really already decided to buy them. There is only an infinitesimally small chance that you will refrain from buying based on what you hear. It would really take an incompetent salesperson to mess it up. They will 'sound' OK, I'm sure.

Blind tests in a shop won't work. Most people have already selected what they want; maybe they are in doubt between two speaker models or something, but they have already mentally parted with their money. There is no way they will decide for the $800 system if they ask to hear the $2500 system as well. It'll be the $2500 system, as any seasoned salesman will confirm.

Jan Didden
 
Konnichiwa,

SY said:
I suggest you try this. You will find that you are 100% incorrect.

You insist on misinterpreting my comments.

SY said:
The drinkers may well not be able to say which is the Cabernet and which is the Chianti, but they will have no problem telling that the wines are different. Heck, even the color will be different.

And all of this will be hard to distinguish if I make sure the light is dim and of the right colour, and flood the room with powerful (and pleasant floral) smells. Then the differences, which in other circumstances may be blindingly obvious, may simply be obliterated. And THAT was my point, together with the fact that the wine drinkers will still have a hard time telling accurately WHICH WINE is the Cabernet and which is the Chianti in 19 separate trials out of 20.

Or, to be ABSOLUTELY clear, a current standard ABX test allows you to take a taste of each of item A and item B (as many tastes as you like) and then requires you, in the interest of expediency usually quite quickly, to take (trial) twenty samples drawn from a pair of choices, which will be completely random (and may be the same for each pair), and to correctly identify item A or item B in nineteen of these trials. As such it requires not only the identification of a difference, but the correct identification of an earlier established reference!
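(An editorial illustration, not part of the original post: the pass criterion described above is easy to simulate. The per-trial accuracy below is an assumed figure, chosen to show how seldom even a quite discriminating listener clears a 19-of-20 bar.)

```python
import random

def abx_run(p_correct: float, trials: int = 20) -> int:
    """Simulate one ABX session: the listener identifies X
    correctly with probability p_correct on each trial."""
    return sum(random.random() < p_correct for _ in range(trials))

def pass_rate(p_correct: float, needed: int = 19, sessions: int = 100_000) -> float:
    """Fraction of simulated sessions that reach the pass mark."""
    random.seed(1)  # fixed seed so the estimate is repeatable
    return sum(abx_run(p_correct) >= needed for _ in range(sessions)) / sessions

# a listener who identifies X correctly 80% of the time
# still fails a 19-of-20 criterion in the vast majority of sessions
print(pass_rate(0.8))  # about 0.07
```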

I maintain that such a test of wine will be less sensitive to modest differences than the one you carry out; try it.

I MUST REPEAT: the simple term "double blind" does not ensure that a given test is useful for a given purpose. One will have to ensure that a number of other variables are adjusted correctly too.

Sayonara
 