TG, as an addition to your perceptive post, I found that music that wasn't particularly interesting to me was one of the best tests of sonics. As a bonus, I wasn't burned out on music that I love.
Don't let the "hasn't been done before" part spook you. That's the whole reason to implement the controls- the claim is extraordinary. That DOESN'T mean it's not real.
I won't be blowing a police whistle and I promise to make you feel as comfortable and relaxed as possible. It's not a matter of passing or failing as a person, it's just some wires in your hifi- there's nothing big at stake, just your own curiosity and desire to know what's real and what isn't. Pressure shouldn't enter into it.
TG, as an addition to your perceptive post, I found that music that wasn't particularly interesting to me was one of the best tests of sonics. As a bonus, I wasn't burned out on music that I love.
If I may make another point, a very telling test was to play music that the owner of a very carefully put together system hates. In one case we played an original Vertigo pressing of "Autobahn" on a system built around old RCA's and Mercury's. Suddenly it sounded big and loud but very ordinary.
...I found that music that wasn't particularly interesting to me was one of the best tests of sonics.
I also agree with this. A buddy of mine has a Cyndi Lauper CD he likes to torture me with. No "euphonics" there.

I think it's often hard to pinpoint differences because we listen to music with different parts of our brain than we do other sounds. That seems to make it hard to be objective with music - especially music we like.
So if you can't differentiate the modifications alleged to be caused by cabling differences while listening to music for pleasure, why does it matter to you if the cables cause a difference? How do you know the differences are there in the first place?
Hello auplater!
You're twisting my words. I NEVER said I couldn't hear differences while listening to music for pleasure. This is in fact what I do almost 100% of the time.
I know for some reason it seems people who hear differences in wires cannot pass a controlled DBT in front of witnesses who don't believe wires have a sonic difference. Now whether this is a result of "performance anxiety" or the ear/brain possibly not working the same when taking a test as it does when listening for pleasure, I don't know. What I do know is there are many people who belittle those who've failed DBTs and proclaim the reason they've failed the DBT is because they're "fooling themselves" and there are no sonic differences in wires.
I can already hear people, like yourself, claim TG was just one more who thought he could hear differences that don't really exist! He probably also believes in fairies, leprechauns, that the world is flat & that the sun revolves around the earth. Your contempt towards those who believe wires do indeed have a unique sonic characteristic of their own is heard through your comments of "alleged to be caused by cabling differences" and "How do you know the differences are there in the first place?"
So for the sake of those people like you who don't believe differences in wires exist, I find it more important to just prove there are sonic differences and they can be heard, period! Besides, why do you care what method I use, provided I don't cheat and I prove sonic differences in wires do exist for reasons other than LCR? Is it more important to have your opinion be correct or to know the truth? I want to know the truth and I'm willing to be tested under controlled protocols to discover what the truth is! Even if that means I possibly fail.
Thetubeguy1954
~Rational Subjectivism. It's An Acquired Taste!~
Although I believe in the law of diminishing returns, there certainly is a difference in cables, whether interconnects or speaker cables. My DBT is my wife and kids. All are non-audiophiles. They can routinely pick out the difference between cheap cables and a decent set. I think it mostly depends on the particular set-up: matching the cable LCR to the impedance and characteristics of the amp.
In what way are they DBT? Do you blindfold them before any swap (or not)?
jd
TG, the "performance anxiety" never seemed to bother anyone (including me) who was seriously involved in organoleptic wine research. We did DBTs routinely, and our goal was to find out what was real, not to see who had the better nose or taste buds- that's not what these were testing. Again, I encourage you to think of this as a way of discovering something and gaining understanding, not a "test" of you as a person or some physiological "quality factor" of your ears. I applaud your willingness and open-minded attitude, yet again.
TG, the "performance anxiety" never seemed to bother anyone (including me) who was seriously involved in organoleptic wine research.
It's easy to be relaxed while drinking. 😀 Also consider movie DVDs, for me they provide the right combination of cold voice, disinterest and depending on the movie potentially huge spatial effects.
...then come on over. I'll show the mechanics of tap operation. Nothing subjective about it.
Turkey all gone, even the stock is made and in the freezer. I can't believe I ate turkey for 5 days and not once complained. Even cold from the fridge was good. Mmm...now I'm hungry again.
Cheers.
Still a tiny bit of turkey here... and half a pot of soup/stew -- i didn't bother setting the stock aside.
dave
Hey Dave!
How's things in Canada? Believe it or not I have all the windows in my home open today, 12/31/09. Heck, I like turkey so much I make one on our Ronco ---{set it and forget it}--- indoor rotisserie once about every 2-3 months, so I can have an extra special dinner ready for my wife when she gets home from work. After that we pick on it, make sandwiches and have quick, microwaved dinners until it's gone.
On another off-topic note, I just purchased two pairs of the rare 8 ohm, 94 dB Foster/Fostex FF7-4273 10in dome woofers that you commented on in a different thread here at diyaudio:
http://www.diyaudio.com/forums/planars-exotics/150426-dome-woofer.html
Do you think they'll make a nice subwoofer or lowend augmenter for my Sachikos?
Thetubeguy1954
I've not seen anything but pictures. I just know they are pretty and have a suspicion that they are probably pretty good based on their efforts on normal bass drivers.
dave
Unfortunately, the only person here who claims to have done test design has only vague suggestions and criticisms and refuses to be pinned down to specifics. When pressed, he has only suggested "positive controls" of things that the claimant does NOT claim to be able to hear (e.g., frequency response) nor are related to the stimulus (which is NOT frequency response).
Not to forget the statistical remark and the proposal of a paired preference test along with positive _and_ negative controls. 🙂
Besides the fact that we are not only discussing TG/SY but blind test protocols more generally, I really can't remember having proposed only something related to frequency response.
In fact I've several times proposed taking some things from Paul Frindle's quite impressive list, and AFAIR there is a bit more than "frequency response" to be found there.
But to set a bottom line: normally the claimant claims to hear a difference, and you as the experimenter don't know what the reason could be.
But you want to set up a test, which will itself act as a confounder for the claimant one way or another (that is one of the best-known facts in testing).
So normally you should try to have the claimant/participant in the most sensitive state he is able to reach. As you don't know what the claimant will be able to hear during any "normal listening" (to find that out you'd need a totally different test scheme), you normally only have the chance to look for the best possible sensitivity under test conditions.
AFAIR it is common sense that small level differences will be perceived (if perceived at all) as sound differences, so for example you could use a small level difference as a positive control. Do you really think that detection of 0.1 dB and below is an easy task?
As long as none of us knows what the physical reason for any detected audible difference (if there is one) would be, we just have to be creative and present several different positive controls (in the optimal case), all at the highest sensitivity level already reported.
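As a rough illustration of what such a level-difference positive control could look like in practice, here is a minimal Python sketch (assuming NumPy and a synthetic 1 kHz tone standing in for real programme material; the 0.1 dB figure is simply the one mentioned above, not a recommendation from any particular study):

```python
# Minimal sketch of a small level-difference positive control (assumption:
# NumPy is available and a synthetic 1 kHz tone stands in for real programme
# material). One copy of the signal is attenuated by a known 0.1 dB.
import numpy as np

def apply_level_offset(samples: np.ndarray, offset_db: float) -> np.ndarray:
    """Return a copy of `samples` scaled by `offset_db` decibels."""
    gain = 10.0 ** (offset_db / 20.0)   # -0.1 dB -> factor of about 0.9886
    return samples * gain

fs = 44100                                        # sample rate in Hz
t = np.arange(fs) / fs                            # one second of time axis
reference = 0.5 * np.sin(2 * np.pi * 1000 * t)    # 1 kHz tone
control = apply_level_offset(reference, -0.1)     # the 0.1 dB quieter copy

# Verify the two versions really differ by ~0.1 dB RMS:
rms = lambda x: np.sqrt(np.mean(x ** 2))
print(f"RMS difference: {20 * np.log10(rms(control) / rms(reference)):.3f} dB")
```

A 0.1 dB offset corresponds to a gain change of only just over 1%, which is why it makes such a demanding positive control.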
I'm beginning to wonder if he actually knows what a "positive control" actually is or is just using it as a mantra. I am also beginning to suspect that attempts are being made to scare TG off or have him back off from the agreed blind protocol and thus prevent his claim from being tested- I certainly hope this is not the case.
Strange ideas, but I'll ask the next time I see him. 🙂
Wishes
P.S. It seems that I got the wrong idea and that auplater doesn't really want to talk about Type II error risks, but what would be your answer to the question?
Most amateur builders lack the equipment to make physical measurements of cables, so all we can rely on is what we hear. 99.9% of us could not tell any difference between 18 gauge zip cord and a $10,000.00 pair of cables. Measurements would also show few differences that would impact the sound quality. Like religion, I say let those who believe, believe.
The fun of our hobby is the satisfaction we get from creating great sound with our own hands and technical judgement. If $10,000.00 cables make you happy, by all means be happy.
Ken Bird
DBT's Part I of III
I've recently read a 3-part "article" by Jon Risch on DBTs, ABX and the potential problems associated with the ABX forced-choice DBT. I'd like to make it clear this ISN'T the type of DBT SY & I will partake in. It does, however, make for interesting reading, especially in light of often hearing its proponents state "In 20/25/30 years of testing, no one has been able to prove audio components sound different, under controlled conditions, when nothing is broken", etc.
NOTE: I edited out one very small portion that spoke about another audio forum. Other than that, it's reproduced in its entirety. ---Thetubeguy1954
DBTs, ABX and the Meaning of Life? Part 1
Talking about Double Blind Tests is worse than discussing politics or religion, and the infamous DBT thread death-spiral is all too familiar to most of us who have perused the various audio message boards or news groups on the Internet. I personally have been accused of being anti-DBT; nothing could be further from the truth.
I have often come out against the unwarranted conclusions that a few certain folks come to over the results of a few certain amateur listening tests, primarily because I am more familiar than most (but not more than jj) with the problems and limitations of such listening tests.
So I am going to discuss some of the various issues and aspects of DBT's, and in this, I am primarily referring to the amateur type of listening tests, not the professionally conducted types that jj and the codec folks do. I am not going to place this disclaimer at the end of every paragraph, so jj, please print out that sentence, and attach manually. If I specifically refer to a professionally conducted test, I will say so, quite clearly.
I also may refer to audio cable testing at times, but really, what I am saying applies to almost all audio component testing, and is relevant and applicable.
Now that that is out of the way, let's get down to brass tacks.
What is valid? That is, what is a valid listening test, or what constitutes a scientifically valid set of data?
The gold standard for many years has been serious studies or papers published in a peer-reviewed professional journal. There are many reasons for this, and I am not going to cover all of them. Suffice it to say that this kind of presentation allows one to examine all the facts, the procedures, and the data. It provides for the review of the paper and its contents by peers in the field, and is published where other professionals have access to it and can question it or raise points they feel have been overlooked. Does such publication guarantee that the conclusions reached by the author are solid? No, but it does provide a certain minimal level of information, screening and review that makes the data and conclusions useful up to a point.
DBT test results from certain amateur listening tests get thrown about sometimes as if they were some sort of hard, cold facts; after all, it was "scientifically determined" that such and such was the case, right?
However, when we look at these DBT listening tests more closely, we find that most have not been published in a peer-reviewed professional journal, in fact, not one DBT on audio cables has been published in such a manner. None. Very few listening tests on other audio components, with the exception of codecs, have been so published either. There have been a few landmark studies on speakers, ala Toole, and most people agree that audio loudspeaker systems do sound different, so this is not one of the more controversial components of study.
So why all the noise about DBT's? Where have they been published, and are they valid evidence? Well, for audio cables, only a handful have been published in popular press magazines. Note that this is not the same thing as being published in a professional journal, an editor may or may not have an agenda, no one else may be reviewing the article for accuracy or proper scientific procedures, etc. When I say just a handful, this is literally the case, as there are only about a half dozen (depends on your criteria) on speaker cables, and few on interconnects. For other audio components, there may be a half dozen articles or so. Not all of these came up with null results either, so it would be very hard to come to any sort of real conclusion based on the data from these articles.
What about web sites, message board posts, news group posts? These are what is known as anecdotal data, they usually have not provided all the details of the tests, nor all of the data, nor have they been reviewed by anyone for proper scientific procedures, etc.
The vast majority of listening test accounts are of an anecdotal nature, and not traditionally allowed to be considered as any sort of good scientifically based evidence.
So the very thing that is being argued about, DBT listening tests on audio components, is not of a nature that one can say is very useful in terms of truly valid scientific evidence.
So what about these amateur listening tests, these anecdotal web sites, the popular press magazine articles, are they any good to make any judgments from?
One of the great little catch phrases that gets used by some folks extolling these amateur DBTs is that "In 20/25/30 years of testing, no one has found XXX audio component to sound different, under controlled conditions, when nothing is broken", etc.
This is meant to sound like DBT tests put the matter to bed years ago. This sounds all very well and fine, until you realize: what was the SOTA 25 or 30 years ago? What kind of cables would have been compared 25 or 30 years ago? I can tell you: zip cords against zip cords. Several of the articles commonly referred to by folks citing popular press DBT results were this old. Some of the articles on CDPs are 14 to 17 years old. How far have CDPs come in that length of time? I mean, we are talking about CDPs that probably did not even have 15 or 16 bits of resolution, no dither, multi-stage, multi-opamp analog output filters, etc.
So if you stop to think about how valid, how relevant some of these really old tests are to the current state of audio, including mid-fi, then it becomes clear that some of them are not really of any use for modern audio components.
What about the tests themselves, how were they conducted? Let's look at a typical scenario for one of the more popular testing paradigms of the day: an ABX style listening test. Note that this is not intended to represent ALL such tests, but merely to provide some idea of what went on in many of the amateur listening tests commonly cited.
First, an ABX switchbox was used to connect the two DUTs (Devices Under Test). In most cases, this required additional cables to insert the switchbox into the signal chain, so it could control which unit was being heard at any given time. The extra cables were, almost without exception, just zip cords and/or el cheapo ICs. Even when an audio cable was the subject of the test, the extra cable portion was almost always a zip cord or an el cheapo IC. The reasoning here was that both units were subjected to the same conditions, so it shouldn't matter. So much for the weakest link.
For cable tests, this would be a serious limiting factor, as whatever losses or problems the zip cords or cheap ICs had, were now superimposed on the test cables as well. Ironically, since the vast majority of testers did not believe that audio cables had any sonic impact, they created a situation that virtually guaranteed that it would be hard, at best, to hear what was going on.
Then the listener is asked to listen to the test units, and 'familiarize' themselves with the switchbox and listening protocol.
Typically, while the listener was listening, and switching back and forth, the music was allowed to play on. The first portion was the so-called sighted portion of the test, where they knew the identity of each unit (they know which one is A, and which one is B). The listener was often encouraged to switch back and forth during this portion, and to state whether or not they felt they were hearing the same kinds of sonic differences they did under sighted listening without the switchbox. More on this aspect later.
Then after what might have been hundreds of switches back and forth, under what I would call fairly casual conditions, they would enter the forced choice portion of the listening test, and be asked to identify an unknown DUT, presented as X. They still had access to hearing DUT A or B, and still knew what device the A or B unit was, but X was an unknown, and they were asked to make a choice as to whether it was unit A or unit B.
Classically, they were exposed to a total of 16 trials where they had to select what unit they thought X was, and since it was what is known as a forced-choice situation, even if they were to readily admit that they did not think they could identify the DUT, or that they had listening fatigue, they still were supposed to make a choice.
Note that each trial could consist of as many switches back and forth from A to B and back again, and to X and back again.
A single listener might only participate in a single run of 16 trials, and there might only be a handful of such listeners.
Once the listening tests were completed, then the test administrator would check the ABX hardware for the accuracy scores of the listener, and check this against a table of probability ratings, to see how much of a probability existed that the listener had actually been identifying the DUT beyond a certain level of sheer chance.
The benchmark for the 16 trials was to get 12 or more correct; this would then establish that the listener had less than a 5% chance of just guessing that many correct. It is what is known as a confidence level of 95%. The criterion for what was considered 'good enough' so as to not be just due to chance is supposed to be selected before the test, and then adhered to. Other confidence levels could be used, such as 99% (very strict, and usually extremely hard to meet in these kinds of tests), or 90%. It should be noted that for a 95% confidence level, just conducting 20 runs would typically result in one that appeared to exceed the 95% confidence level, even if everything was just random choices. So in order to take the test results as a valid positive, one would have to do better than this on average.
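To put rough numbers on the 12-of-16 criterion and the repeated-runs point, here is a minimal Python sketch using only the standard library (the arithmetic follows directly from the binomial distribution; it is an illustration, not a figure taken from any particular test report):

```python
# Binomial arithmetic behind the 12-of-16 criterion (standard library only).
from math import comb

def tail_probability(n: int, k: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials at rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

p_guess = tail_probability(16, 12)                 # pure guessing, 16 trials
print(f"P(12 or more of 16 by chance) = {p_guess:.4f}")   # about 0.0384 (< 5%)

# Repeated runs: the chance that at least one of 20 guessing-only runs
# nevertheless reaches the 12/16 "pass" mark.
p_any_of_20 = 1 - (1 - p_guess) ** 20
print(f"P(at least one 'pass' in 20 runs) = {p_any_of_20:.2f}")   # roughly 0.54
```

So a single 12/16 result sits just under the 5% chance level, but across 20 guessing-only runs the odds are better than even that at least one run clears it.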
Much was made of these kinds of tests, mainly because of the fact that they were Double Blind, due to the use of the automated switchbox hardware. The test administrator did not know the identity of the X unit until after the test was completed, and therefore, was theoretically incapable of influencing the outcome of the tests.
What were the problems with these early amateur DBTs?
Unfortunately, they were legion.
It was often assumed that since these tests were double blind, they represented the only 'true' kind of valid listening test available. However, it was often overlooked that the mere fact that any given listening test was a DBT did not guarantee ANYTHING else at all. It could have been the world's worst listening test ever, and still have been double blind.
The long open (sighted) initial portion was not really training, nor was it a valid control of the test's sensitivity. In my opinion, it was more of a fatigue-inducing situation than anything else. The listener seldom got any real training: they were not exposed to the forced-choice scenario until it was time to 'perform', and they were not really trained in terms of what kinds of things to listen for, what kinds of things to hone in on, etc.
The music was typically left to play on, and this is a huge error in procedure. In essence, the listener was never comparing the same signal on both DUTs at any given time, in fact, the same signal was NEVER compared, ONLY a different signal was ever compared. This is such a big problem with the procedure, that such listening tests could be summarily dismissed as an invalid attempt based on this alone.
In terms of listening fatigue, the listener was encouraged to switch back and forth as often and as much as they desired, and this often led inexperienced and untrained listeners to switch back and forth a huge number of times, all the while not really focusing on the musical presentation that much. Again, with the music playing on, it would be very hard to draw any sort of valid choice, and just as hard to hear what the two units were doing even when you knew which one was which.
This, combined with the typically open-ended initial sighted portion and the relatively large number of trials, each of which might include dozens or even hundreds of times that the listener switched back and forth between the various units, was in my opinion the cause of a lot of listener fatigue, and therefore also a very significant factor in these kinds of tests coming up with null results.
Then there was the issue of the switchbox itself, and the extra cables, often of a very poor overall quality level. The relays inside the ABX boxes were of various types over the years: the early ones were mercury-wetted reed relays, the later ones supposedly had ruthenium-plated relay contacts. It has been argued that the switchbox was a source of significant degradation of the listening test's resolving power, due to the extra cables and contacts involved. The signal was exposed to magnetic fields inside the relay, and had to travel through a lot of extra wiring and contacts compared to a normal direct real-world connection.
Defenders claimed that the ABX switchbox had passed two tests that assured it was transparent, aside from the usual objective measurement standards of THD, noise and the like:
One, it had been tested using yet another ABX switchbox, and the results had turned up as a null.
Two, J. Gordon Holt, the golden-ears of Stereophile fame, was said to have found it to be 'inaudible' during one of his listening sessions once long ago.
Well, I hope that I don't have to explain the fallacy involved with the first assertion, and the second one is ironic, as one of the very things that the ABX folks were against, was the acceptance of any pronouncements from golden-eared reviewers using sighted listening to review audio products. I think it incredible that they wanted to dismiss and discount all the other reviewers, and Mr. Holt as well when he was reviewing audio equipment, but it was OK to accept his pronouncement on THEIR unit as being transparent when using the same methods. Even so, it is a good idea to note that this occurred back in the 80's, so who knows what one would hear using modern high performance audio gear?
Finally, the confidence level chosen, as well as the particular number of trials, created a very high 'bar' to hurdle: the listener had to be hearing quite definite things, and would not have been able to register more subtle differences under the requirements chosen.
Despite all of this, certain folks try to cite these old DBT tests as definitive evidence of no sonic differences for audio cables, CDPs, power amps, etc. Not only are the previous problems cited good reasons not to do this, even if none of the problems had existed, and all the items objected to been corrected, there would still be a fundamental problem with doing so.
This fundamental problem is the equating of a null result, that is, a listening test result that simply failed to reach the previously defined criterion for statistical significance, with a negative result.
If you have a controlled listening test, and it fails to reach the defined level of statistical confidence, then the result is often called a null result, or "accepting the null hypothesis". However, this kind of result really and truly has no other meaning. You can not legitimately equate a null result to a negative.
Some folks have tried to argue that the equating of a null with a negative is legitimate, and even cited a lone book as a reference. However, the vast majority of statistics books, professors, and accepted authorities still maintain that doing so is just not correct.
The primary reason for not doing so was touched on earlier: you cannot know how sensitive the listening test set-up is unless you have performed a control experiment to determine this. Without such a control (a test that has determined how sensitive both the listening test set-up and the listening subjects are to very subtle sound issues), you have no way of knowing that the listening test was even inherently capable of discerning what was being tested for!
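The sensitivity point can also be illustrated numerically. The minimal sketch below (the 70% true success rate is a purely hypothetical figure chosen for illustration, not a measured value) shows how often a listener who genuinely hears a difference would still fail the 12-of-16 criterion:

```python
# Sensitivity (statistical power) of the 12-of-16 criterion for a listener who
# really does hear a difference. The 0.7 per-trial success rate is a purely
# hypothetical figure used for illustration.
from math import comb

def pass_probability(n: int, k: int, p_true: float) -> float:
    """Probability of reaching k or more correct answers in n trials."""
    return sum(comb(n, i) * p_true ** i * (1 - p_true) ** (n - i)
               for i in range(k, n + 1))

power = pass_probability(16, 12, 0.7)
print(f"Chance a 70%-correct listener passes 12/16: {power:.2f}")     # about 0.45
print(f"Chance of a false null (Type II error):     {1 - power:.2f}") # about 0.55
```

Under that assumption the test would return a null more often than not even though a real audible difference exists, which is exactly why a null cannot be read as a negative without a control of the test's sensitivity.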
In the ABX-style listening tests, the comments by listeners in the sighted portion that they are indeed hearing what they expected to hear are often cited as a sufficient provision of this test-sensitivity control information.
However, this is NOT a scientific way to achieve the determination of this control condition. It is another example of the answer begging the question. Just as you can not use the test to test the test, you can not use a sighted portion to verify the performance of the forced choice portion. This is yet another example of the incorrect reasoning used to justify these kinds of listening tests, and how valid they are supposed to be.
Part 2 will cover the inherent problems and flaws with DBT listening tests, even when done impeccably. Part 3 will cover alternate methods and include some comments on doing your own DBT's.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thetubeguy1954
It does, however, make for interesting reading, especially in light of often hearing its proponents state "In 20/25/30 years of testing, no one has been able to prove audio components sound different, under controlled conditions, when nothing is broken", etc.
Two comments:
1. First off, this IS a true statement. No-one has. The proponents of "mysterious" wire effects have yet to cough up even a scintilla of evidence. Maybe you can- but none of the guys hustling these wires have done so. Why not?
2. Risch is the guy who censors any mentions of controlled testing from the forum he moderates. There's open and honest discussion for you.
Also, have you gotten Risch's permission to post this? AFAIK, it's his copyrighted material. If you don't have permission but you have a link, I can edit this for you.