pg. 208 Stereophile mag Oct 2007 Industry Update

Status
Not open for further replies.
The premise of ABX is that everything sounds the same, so that is all it could prove.

Jan beat me to it- that is absolutely, 100% wrong, and I don't say that to you very often.

And as a general note to all, double blind and ABX are NOT one and the same thing- ABX is just one possible test format (I say this as someone with about 20 years of professional experience in sensory testing). All the energy spent in trying to find real or imagined flaws and pointlessly arguing on Internet forums could be much better used for understanding the various types of test formats, choosing ones appropriate to the issues at hand, and performing them. But one has to be ready to accept the outcome, even if it's not the one that one wants.
 
rdf said:
...as an expert in listener preference, not reproductive accuracy. Until they start comparing to a live source that's the only valid interpretation of the results.[snip]

True. But that doesn't in itself invalidate the test and the outcome.

The point I was making by reference to this test was that, besides the sound, the shape, color, reputation, etc. of the item under test (the speakers in this case) DO influence the judgement of even seasoned listeners, people who declared before the test: "I know about these non-sound influences, but I can disregard them". They couldn't.

rdf said:
[snip] And I see you fell for auplater's parry/thrust manoeuvre. 😉

Somebody should tell me when to stop 😀

Jan Didden
 
planet10 said:


The premise of ABX is that everything sounds the same, so that is all it could prove.

I'm puzzled by this statement. It's logically impossible to *prove* that no difference exists in any case. It's only possible to demonstrate that under the specific test conditions, no difference was detected. [or possibly that 'no statistically significant result was obtained']
Thus, I'm puzzled as to how ABX can 'prove' that everything sounds the same at all, let alone how this can be attributed to an a-priori assumption.
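
To put a number on 'no statistically significant result was obtained', here is a minimal sketch of the usual arithmetic, assuming each trial is an independent two-way choice so that a listener who hears nothing is right half the time. The function name `abx_p_value` and the 16-trial session are illustrative assumptions, not anything taken from a particular ABX protocol.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value.

    Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (chance = 0.5 per trial).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the ways to score `correct` or better, out of 2**trials outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical 16-trial session.
print(abx_p_value(12, 16))  # about 0.038: unlikely to be guessing alone
print(abx_p_value(9, 16))   # about 0.40: entirely consistent with guessing
```

Note what the second result does and does not say: 9 of 16 fails to show a difference under these conditions, but it is not evidence that no difference exists.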


The forced choice affects the beta such that it actually can't do anything useful.

I must be slow. ANY testing methodology would seem to involve a 'forced choice' in the sense of providing an answer. If you're suggesting that the introduction of a 3rd sample changes the structure of the test, I'd be interested in seeing the reasoning.
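
For anyone who wants to look at the beta question numerically, a rough sketch follows. It assumes a plain binomial model (independent trials, a fixed true hit rate `p_true`, a one-sided test at `alpha`), which is a simplification and not a claim about how any real ABX session behaves; the function names are made up for illustration.

```python
from math import comb

def critical_hits(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose guessing-only tail probability is <= alpha."""
    for c in range(trials + 1):
        tail = sum(comb(trials, k) for k in range(c, trials + 1)) / 2 ** trials
        if tail <= alpha:
            return c
    return trials + 1  # no achievable score reaches significance

def beta(trials: int, p_true: float, alpha: float = 0.05) -> float:
    """Type II error: chance of missing a real effect of size p_true."""
    c = critical_hits(trials, alpha)
    power = sum(
        comb(trials, k) * p_true ** k * (1 - p_true) ** (trials - k)
        for k in range(c, trials + 1)
    )
    return 1 - power

# Hypothetical listener who is right 70% of the time, at two session lengths.
for n in (16, 50):
    print(n, critical_hits(n), round(beta(n, 0.7), 3))
```

Under this model, beta depends on the number of trials and on how audible the difference really is. A short session with a subtle difference has a high chance of a miss, which is an argument for running more trials, not evidence about the test format itself.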


I would infer from this that you would say that eliminating X altogether and running a straight comparison between Unknown Entity A and Unknown Entity B is better [maybe I'm missing something, as the only other conclusion is 'testing doesn't work']. I have never seen an analysis, but to my layman's perspective this seems to be a more stringent and objectionable test since you aren't provided the opportunity to 'recalibrate' against a known component. IMHO this is the benefit of an ABX approach over an A/B - you are provided as much opportunity as possible to ground yourself in a known state before making a choice.


ABX is pseudo-science at its best, and besides the death blow above has all sorts of other weaknesses (like how much does the switch-box itself obscure?)

dave


The fact that you mention 'switch box' indicates to me that you are talking about ABX differently than I am, or at least how I intended to. I may be deviating from accepted terminology or practice, but I take 'ABX' to be an abstract/logical methodology for evaluation. It's pretty simple conceptually:

Listen to known entity A as much as you want
Listen to known entity B as much as you want
Listen to unknown entity X,
[repeat at will, switching between A,B,X]
Eventually, identify X as either A or B
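
The steps above can be sketched as a toy simulation. Everything in it is an assumption for illustration: the function name `run_abx_session`, and especially the listener model, in which the listener really hears the difference on a fraction `p_detect` of trials and otherwise guesses.

```python
import random

def run_abx_session(trials: int, p_detect: float, rng: random.Random) -> int:
    """Simulate an ABX session and return the number of correct answers.

    p_detect is the (hypothetical) probability that the listener truly
    identifies X on a given trial; on the other trials the answer is a
    coin flip between A and B.
    """
    correct = 0
    for _ in range(trials):
        x = rng.choice("AB")            # X is secretly A or B
        if rng.random() < p_detect:
            answer = x                  # difference actually heard
        else:
            answer = rng.choice("AB")   # forced to answer, so guess
        if answer == x:
            correct += 1
    return correct

rng = random.Random(2007)               # fixed seed so runs are repeatable
for p in (0.0, 0.3, 1.0):
    print(p, run_abx_session(16, p, rng), "of 16")
```

With `p_detect` at 0.0 the scores hover around half right, which is all a pure guesser can manage; that is the baseline any real result has to be judged against.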


Suggesting that 'ABX' in this logical definition is 'useless' seems to be a stretch. This is how the majority of progress on perceptual coding has been made, and despite it being uninteresting to audiophiles I find the ability to throw out 80+% of a signal and still end up with something half-way decent pretty remarkable.


Having said all that, I fully realize that there have been a lot of zealots running around with their 'ABX' switchboxes crowing about how they have 'proven' that everything sounds the same. I'm not attempting to defend that practice in any way. As I said in my earlier post, I have come to the conclusion that a properly administered ABX test (as I think of it) is at best highly impractical to try to run, and may be effectively impossible for evaluating physical components.
 
I have a feeling the brain uses sight as a reference to build up a picture of the differences between items. Totally blind, it can't build up that picture because it has no reference point, and short-term memory isn't very good at holding the info for comparisons (which is why you can't tell the difference between two similar coloured cards shown in sequence).

A valid blind test IMHO would be where each item has a letter so you always know which is playing and the brain can build up a catalogue of differences.

Having the letter A or B displayed in a blind test where equipment isn't visible has no ability to affect the outcome unless there are actually differences between the equipment.

John🙂
 
janneman said:
Really Dave? That's news to me. Is that documented somewhere?

Jan,

I already mentioned this to you specifically at least once... the record of this would be in the archives of the hifi list, circa 2000/2001, and then that list provider was sucked into the maw of yahoo, a black hole i've never been able to get anything out of....

dave
 
SY said:
that is absolutely, 100% wrong, and I don't say that to you very often.

I believe that i picked that up off you SY. Even if i got it backwards it does not make the test any more valid

double blind and ABX are NOT one and the same thing

That is for sure. I am still waiting to see someone come up with a solid reliable blind test. The best i've seen are when the listener is actually blind to the fact that they are even being tested.

dave
 
mr-mac said:
I have a feeling the brain uses sight as a reference to build up a picture of the differences between items. Totally blind, it can't build up that picture because it has no reference point, and short-term memory isn't very good at holding the info for comparisons (which is why you can't tell the difference between two similar coloured cards shown in sequence).
[snip]John🙂


The brain uses EVERYTHING it can to construct a final picture for you: sound, sight, previous experiences, expectations, your 'body state' (how you feel etc). If that is not enough, the brain will go to great lengths to preserve your self-value and your ego.

If you are part of a team that accomplishes something important, your brain will make sure you feel it was because of your strong involvement and leadership. If the team failed, your brain will make sure you know it was because of the incompetence of the other team members. Personal objectivity is an illusion if there ever was one.

Jan Didden
 
Sorry Jan... your posts always seem on the mark and I tend to lend them a great deal of credibility... after all, you actually make equipment and stand by your results... as well as provide cogent arguments...:nod: maybe an analogy to SET amps should be made wrt..😀

Guess I should butt out and let the discussion continue re: the time worn DBT / ABX rebuttal/invalidation crowd...

SY's right on the money... if all the efforts to dismiss blind testing were spent on understanding what testing can and cannot do, we'd all be better off...

I just hope the nay-sayers don't use any of the medical procedures and/or medications that have been validated through extensive statistically valid methods of blind testing, thus remaining true to their cause
 
janneman said:
True. But that doesn't in itself invalidate the test and the outcome.

100% agreement. It does very much limit how those results are interpreted and applied. I can think of no better way to design loudspeakers which will, all else equal, meet with wide marketplace acceptance. What requires demonstration is equivalency between preference and accuracy rather than with smiling graphic equalizers. Your response did make me realize the similarities between the protocols at the HK labs and wine tasting. Wines are invariably - reflexively flinching at SY's response - compared to each other in a closed system. A reference outside of other wines appears (to the layman) nonsensical. Speakers however, assuming accuracy is the goal, have a reference completely outside themselves in the form of the original acoustic event. So to me HK’s results at best suggest something about the latter, but no more.
 
On the Harman paper...ask yourself this question:
Do you feel that it has more credibility, coming from HK, than if the paper had been released by some small boutique outfit?
Think that through before answering. The implications aren't as self evident as you might think.
rdf, for one, seems to feel that the HK brand name imparts legitimacy. If I read his post properly, auplater seems to feel otherwise. It warrants the time and trouble to take it down at least three or four levels. On the surface, the words Harman International at the top sound official; makes it seem like, okay this must be legitimate. On the second level, there's the "But what if their profit motive outweighs their impulse to tell the truth?" Then there's: But surely HK wouldn't risk their credibility by putting out marketing fluff masquerading as scientific fact...would they? Then you think back to other white papers you've seen that were, in fact, thinly disguised sales literature. And so on. Like I said, think before responding.
Jan, the problem is that you didn't begin with, "Okay, I've got this link to something that might bear on what we've been talking about. There are some problems with it, perhaps, but it says thus-and-such." It was more along the lines of a sneering, "See, I've got you now! Here's an authority figure that says I'm right!" And nary a word indicating that you saw the inherent flaws in the paper. The question at this point is: Did you recognize the flaws and keep quiet in hopes that they would not be noticed, or did you not see them until they were pointed out. Either way, it doesn't paint a pretty picture.
And that's the problem. It's not about science, it's about emotion. It's about control and authority figures. It's about hypocrisy and lack of rigor in examining your own position. What it is not about is science. You are free to distrust me. You are free to trust Harman International. But you still haven't explained what my motive for lying might be, whereas HK has every reason in the world to disguise sales literature as "science."
In the end, you can put it to the test, you know, you can actually train your ears and listen. Then you will know. Now that's science. It's based on independent verifiability. Unfortunately, that's an old-fashioned concept that is currently out of fashion. It's much more in vogue to simply take an authority's word for it.
Me? I chose independence years ago. The authorities were quite simply wrong.
Juergen Knoop,
You have hit pretty much the same point that the editorial in Wine Spectator (mentioned earlier) was talking about. ABX is fatally flawed.

Grey
 
mr-mac,
Your colored card experiment is being ignored, perhaps because it cuts too close to the quick for some. I can tell you that I would flunk.
My father, both uncles, and grandfather were all in the textiles business. They, through long years of experience, were able to glance at a bolt of fabric and tell immediately what lot the dye came from. I could not see the differences, but I was only a youngster and they'd been doing it longer than I had been alive. In other words, eyes can be educated, too.
Although all three of them are dead now, I imagine they would do quite well at your colored card test, whereas I would not do well at all, not having educated my eyes in the same manner.
Perhaps we should coin the term Golden Eye...?

Grey
 
Do you feel that it has more credibility, coming from HK, than if the paper had been released by some small boutique outfit?

c) neither of the above.

The paper has more credibility because Floyd Toole, much-honored AES Fellow, has a long and distinguished track record as a serious researcher in both industry and government/academia. His papers are subject to comment and (in the case of his numerous AES publications) peer review. If you disagree with them on substantive grounds, get something published in JAES.
 
Donuts and coffee are basic food groups to broadcast engineers. Trivializing somewhat, wine has no target outside of 'user preference'. Unless a vintner intentionally strives for the perfect hotdog experience (my father's homemade Concord qualifies) the reference is always internal, another wine. Speaker designers who claim to target realism by default imply correlation to an acoustic event entirely unrelated to other speakers. The metric of success should be that acoustic event, not other speakers.

Not even close Grey. I own nothing Harman nor care. The consideration comes from exactly what SY spoke of, decades of peer reviewed research. For 25 years Toole did at the NRC much what he’s doing at HK. Preemptive industry whoring?

That hardly makes me a Toole fanboi. Whispers in the acoustical community completely unrelated to hifi suggest the calibration of NRC’s rooms during his tenure…. could have been better. The one speaker I heard long ago (Rega) Toole reputedly had a hand designing was appalling. To reiterate clearly, HK’s labs are doing 100% exactly what they should be doing for their parent company, determining what an educated listener prefers to hear. Nothing dishonest or underhanded about it, nor does there need to be for it to fulfill its purpose. It’s valid science within its limits. My problem with it is what I see as a lack of proven correlation between preference, the metric I’ve seen applied in every HK paper including the one linked here, and accuracy in reproduction.
 
Nope, still have to disagree, except for very special cases- as you point out, Toole is specifically looking at user preferences. If, for example, I were testing the effects of jitter on digital sound and sensitivity thresholds, I'd be comparing signals using lowest possible jitter to signals with varying levels and distributions of jitter, just to see "audibly different" or no, not preference or any reference to live events. Same with amps, preamps, wires, whatever. "Can a difference be distinguished?" must precede "is A preferred to B?" or "is A more 'realistic' than B?" Until that question is answered, the experimenter can't work the problem on any deeper level.
 
Loudspeakers are one of those special cases. The discussion arose from Jan's linking to an HK release in part discussing sighted vs. blind speaker listening tests. In that case the differences giving rise to preference were clearly audible and your metrics met. Presumably they were also repeatable, otherwise HK's statement has no value. The published results were limited to examining preference between models rather than indistinguishability from an unrelated acoustic source. It's my recollection the same is true for all HK's and the NRC's work, though that might be more a gap in my knowledge than a failing on their part.
 
Back at RIT, in Hollis Todd's class on photography, they held up a green card and told or asked (can't remember exactly) us if it was the color of the grass outside. The entire class laughed and said, "no way". The card was obviously a color no self respecting blade of grass would have anything to do with. BTW, the lighting was reasonable incandescent, nothing odd. Well, we all went outside, where the green card was laid on the green grass at a slight distance. It matched so well, you could barely see it. The eye/brain does a lot of adjustment based on current conditions, and color memory is as bad as audio memory. Or the other way 'round. Hearing is anything but a constant and sometimes I can hear very subtle differences, whereas other times (usually when I'm trying to hear a difference), everything sounds about the same. The arguments for ABX and other tests are logical and rational, and those tests certainly indicate when equipment is significantly different. OTOH, they don't show up subtleties well at all, leading to the question of whether those subtleties even exist. Just about everybody who hears subtleties is convinced they're real, not some secondary effect of emotion, weather, or too-spicy pizza. AFAIK, we don't have a test that compares two different components' ability to give emotional involvement in music. I think I know it when I hear it, but it disappears the minute I try to prove it or test for it.

As for speakers, I'm not sure I want an exact reproduction of a live performance in my living room. I want what I remember it sounding like. The competition for speakers is other speakers, not live performances, so if I were a manufacturer, I'd be designing to compete with other speakers during short listening sessions in a showroom. And making sure I had great marketing. IMO, that rarely produces great speakers, but it prevents going out of business.:dead:
 
Blind faith in authority was my credo, too, for quite a few years. After all, the science couldn't be wrong! It was peer reviewed, and had graphs and numbers and all those nice things.
Now, looking back, I recall the scathing criticism (some delivered by yours truly, but also in the audio press and in overheard conversations--in fact, it was everywhere...same as today) of those who said capacitors sounded different. Some of you might not have been around then, but it had the same "I have science on my side and you're a misguided idiot" attitude about it. Then the capacitor paper came out in the late '70s and suddenly it was all right to hear differences in capacitors. It was now politically correct, you see.
And, of course, as we all know, the capacitors in question didn't have any effect on sound the day before the paper came out, but the day after...well, somehow, mysteriously, they did.
Right?
People who listened insisted that absolute phase could be heard. "Oh no it can't!" chortled the crowd. "There's no mechanism in the ear that can detect such a thing. There's no reason for people to evolve such an ability. You're a *******' lunatic!" they cried. Until, lo and behold, the day came when it was proved that people could actually hear absolute phase.
Well, we all know what happened...it was inaudible the day before the paper came out...then overnight it became audible.
Right?
No. Wrong. It was there all along. People who actually listened knew it but were laughed at if they dared say a word.
I confess to a sneaking admiration for Doug Self, who--rather grumpily--admitted that...well, okay, maybe absolute phase can be heard after all, in a later revision of his book. Given his jeering in the past, I can only imagine the pain it must have cost him to write that paragraph.
Of course, it could all have been avoided if he had just shut up and listened. Ten minutes would have done the trick, maybe just two, assuming that he knew how to listen. But no, his arrogance and pride got in the way and he was so certain that science was on his side that he fired one broadside after another at those who listened. Couldn't be bothered to do the one thing that would answer the question:
Listen.

"Oh, but I just can't trust my ears."
"I need someone to tell me it's okay to hear this effect and I don't have permission."
"I wouldn't know what to listen for."
"I can't be bothered with that stuff. It's just a minor effect at best."
"But everybody will laugh at me."
"There's no paper to back me up."
"My equipment's not good enough."
"What will my uncle think?"
"I thought I heard something. I must have imagined it. After all, everybody knows it's not really there."
"I read a paper on that once. It proved beyond the shadow of a doubt that no such thing exists."
"I can't see why or how it would work, so it can't be so."
"I can't go against the crowd. There are so many of them. They must be right."
"I can't set up a scientifically valid test on my own. It would take at least a dozen people and thousands of dollars of test gear."

I was once where you are now. I got over myself.
Listen.
It's that simple.

Grey
 