The double blind auditions thread

And in parts, not blind at all.

I wouldn't call Oohashi a sensory test. Their hypothesis (not proven) is that the changes they present were caused by sensory stimuli, but there could be quite a few other sources. For example, the data analysis and classification were not controlled. And so far, their work has failed replication in controlled sensory tests in England and in another Japanese lab, which makes me wonder about the last point.
 
And in parts, not blind at all.

Let's see: the listening test was double blind, the PET scan part was single blind, the second EEG experiment was double blind, the first EEG experiment was (maybe) single blind.

I wouldn't call Oohashi a sensory test. Their hypothesis (not proven) is that the changes they present were caused by sensory stimuli, but there could be quite a few other sources. For example, the data analysis and classification were not controlled.

I don't know if I got your argument right; do you argue that, because the test was not "triple blinded" (the phrase as used in medicine, when the statisticians also do not know who was tested and what was tested), we cannot accept the results?

Which would lead to the question of whether there are any audio tests fulfilling these requirements. I'm not aware of one.

OTOH, that the statistical analysis that produces those impressive pictures of the brain is not a trivial task has been known since a lecture was given in which scans of a dead fish still showed brain activity. :)
But I think the authors were well aware of this trap - they included baseline and other scans.
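
As an aside, the scale of that trap is easy to see with a toy calculation. Here is a minimal sketch in Python; the voxel count, scan count and threshold are made-up illustration values, not anything from the paper: test thousands of voxels of pure noise at p < 0.05 and hundreds come up "active", while a Bonferroni-style correction removes them.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_voxels = 10_000   # hypothetical number of voxels tested
n_scans = 20        # hypothetical scans per voxel
alpha = 0.05        # per-voxel significance threshold

# Pure noise: no real signal anywhere ("dead fish" conditions).
noise = rng.normal(size=(n_voxels, n_scans))
t, p = stats.ttest_1samp(noise, 0.0, axis=1)

print("false positives, uncorrected:", int((p < alpha).sum()))             # roughly 500
print("false positives, Bonferroni: ", int((p < alpha / n_voxels).sum()))  # roughly 0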

And so far, their work has failed replication in controlled sensory tests in England and in another Japanese lab, which makes me wonder about the last point.

But to be fair, AFAIK nobody has even tried to duplicate their experiments.
 
Yes, two groups tried, I've cited them repeatedly. Null.

There were no controls on the brain scans (see the discussion in the Blowtorch thread). No controls on the data selection and interpretation, no correction for multiple comparisons. But as I said, they were at least honest and forthcoming about their methods; however, this was not established as a sensory effect, so classing it as a sensory test is erroneous at this point.
 
Actually I like the polygraph test of plants exposed to evil thoughts. You can find lots of cites to that, almost none to the debunking follow ups.

The web can be a handy tool, but I've seen quotes from sites ranging from neo-Nazi to plain zany used to support theories, and then there are the quotes that are just plain out of context.

But the question I will ask is what is gained by double blind or equally accurate tests?

Do you design for the outlier, the average, or the below average?
 
But the question I will ask is what is gained by double blind or equally accurate tests?
To prevent things like this.
[attached image: 1602.gif]
 
Folks,

So, what are we seeing here?

We are discussing blind testing.

A range of more or less accessible papers that illustrate good experimental practice in research has been posted.

Most are completely ignored and not discussed. One paper, which posited some observations that seem to run somewhat counter to received orthodox dogma, is the only one being discussed.

Is the experimental setup being discussed?

Is the dichotomy being discussed that, while one of several tests failed to show results, a much less subjective method did show them?

No, all of this is being ignored as much as possible, and the paper and underlying study are attacked not with facts, but instead by spreading FUD and making aspersions that seem meant to cast doubt on the study, while not giving any factual criticism of the paper and/or study!

All other issues around DBT testing that are covered in those various papers, including the ITU recommendations, which are very widely peer reviewed prior to being formalised, as well as articles published in peer-reviewed journals, are swept under the carpet and ignored.

Meanwhile, certain other experiments, which are best likened to a confidence trick and violate any sensible experimental protocol across the board, are being heavily talked up in an attempt to give them a relevance and credence that cannot be justified if we compare the experimental setup, statistics, etc. with any reasonable protocol one would use in serious science, and that instead places them firmly with the kind of sideshows run by common mountebanks and confidence tricksters...

Someone with sufficient ill will could come to the conclusion that some here pursue an agenda that has nought to do with science, truth and facts.

Otherwise, why don't we discuss how, for example, the study by Oohashi, which is controversial in some circles, and the "Lipshitz digital challenge", which is at least as controversial in other circles, relate to the ITU recommendations for "Methods for the subjective assessment of small impairments in audio systems"?

Surely, if the experimental design, procedure and statistical analysis of a given experiment are shown to follow good experimental design as suggested by the ITU, it can only aid the study's authority, regardless of whether the outcome or conclusion is one we "like" or not.

Equally, if the experimental design, procedure and statistical analysis of a given experiment are shown to NOT follow good experimental design as suggested by the ITU, it must strongly reduce the study's authority, regardless of whether the outcome or conclusion is one we "like" or not.

Well, at least this would be the scientific and reasonable approach, embodied by this quote from Sy's signature, which I strongly agree with and support:

"In science, contrary evidence causes one to question a theory. In religion, contrary evidence causes one to question the evidence."

Ciao T
 
One paper, which posited some observations that seem to run somewhat counter to received orthodox dogma, is the only one being discussed.

Is the experimental setup being discussed?

If you're speaking of Oohashi et al, yes, it was.



Someone with sufficient ill will could come to the conclusion that some here pursue an agenda that has nought to do with science, truth and facts.

Who accused you of that?
 
Yes, two groups tried, I've cited them repeatedly. Null.

Last time, someone ;) nitpicked about the difference between repetition and replication, so I have to ask if you really mean "replication" or if you mean "repetition" (which should be the same as duplication)?

I'm not aware of any other experiments that tried to repeat Oohashi et al.'s methods.
I know the two papers you've cited (several others tried to explore this topic; AFAIR all with inconclusive results), but the differences are quite large, therefore I'd doubt that we could call them repetitions of the original experiment.

As the experiment incorporated a subjective evaluation of the presented stimuli, why should it not have been a sensory test?
Of course the authors tried to expand the analysis beyond the subjective answer scheme.

There were no controls on the brain scans (see the discussion in the Blowtorch thread). No controls on the data selection and interpretation, no correction for multiple comparisons.

I am not sure about that; at least they cited the papers of Friston et al. (and those people were in fact developing countermeasures against the multiple-comparisons problem), so at first glance I got the impression that they were, as stated before, quite aware of the problem.
 
Citing a paper and implementing a multiple-comparisons correction are two different things. They didn't do the latter. Perhaps HF content can work for the brain waves of dead fish?

The "sensory" comparisons were a model of bad statistics. Take a set of results that average out to 50%, separate them into two piles with one larger than 50%, one smaller, then claim significance. No wonder they couldn't get their paper accepted in JAES.

The results of the replications at NHK and KEF were not "inconclusive," they were null.
 
Sy,

The results of the replications at NHK and KEF were not "inconclusive," they were null.

So, we have three sets of tests that appear, from the debate, similar but not 100% identical in methodology. One showed a result. Two did not. Null results ARE inconclusive; they simply mean we cannot accept the thesis as true.
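
A back-of-the-envelope power calculation illustrates why (the trial count and "true" detection rate below are hypothetical, chosen only for illustration): with a small number of trials, even a listener who really is right more often than chance will usually still return a statistically null result.

from scipy import stats

n_trials = 16          # hypothetical number of ABX trials
true_rate = 0.60       # assumed real but small ability (not chance)
alpha = 0.05

# Smallest number of correct answers that is significant under H0 (p = 0.5).
k_crit = next(k for k in range(n_trials + 1)
              if stats.binom.sf(k - 1, n_trials, 0.5) <= alpha)

# Probability that such a listener actually reaches that criterion.
power = stats.binom.sf(k_crit - 1, n_trials, true_rate)
print("need at least", k_crit, "of", n_trials, "correct; power =", round(power, 2))
# Power comes out well under 50%, so a null outcome here is weak evidence
# against a small real effect: inconclusive rather than a disproof.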

So, anyway, how would you say the Lipshitz/ABX challenge experiment fares when compared to the ITU recommendations cited earlier?

We should remember, incidentally (or not incidentally), that the ITU are mainly telecoms people; their definition of "small impairment" is a far greater impairment than is normally accepted as audible in HiFi...

So, where does good old Stan stand then with his experiment? Solid scientific ground, well above criticism, or perchance something a little less exalted than that, when seen in the light of the standards serious experiments and experimenters should hew to?

Ciao T
 
If "serious experimenter" means, "getting the results Thorsten likes," then Prof. Lipshitz is not a serious experimenter. If you can point out his specific errors (a general "he didn't follow a standard of questionable relevance that I don't use either" is not an answer, and you are vague about what you mean by "his experiment"), I'm sure we'd all be interested.
 
Dear Sy,

If "serious experimenter" means, "getting the results Thorsten likes," then Prof. Lipshitz is not a serious experimenter. If you can point out his specific errors (a general "he didn't follow a standard of questionable relevance that I don't use either" is not an answer), I'm sure we'd all be interested.

I believe the supposition on the table is that the recommendation ITU-R BS.1116-1 (10/97) "Methods for the subjective assessment of small impairments in audio systems including multichannel sound systems" represents what may be called "best practice" or perhaps "gold standard".

And the question tabled is whether a particular test you are particularly championing in this thread did indeed conform to such a "gold standard", or whether it may be found wanting.

As you are by far more familiar with the test you champion than I am, I posed the question to you: how does the test compare to best practice? It seems that you are not in a mood to confirm that this test did in fact represent good or best practice, which is surprising; surely a test you are so fond of would represent the highest standards? So it should be trivial to agree that it represented best practice.

Seeing that you are unwilling to come out in favour of this test being equal to the gold standard and instead seek to reject the gold standard, I shall leave the genteel reader to draw her or his conclusion.

Maybe, when I have nothing better to do, I will take the occasion to illustrate just to what degree the cited "Digital Challenge" failed to conform to "best practice", but I see no urgency, so for now I'll instead leave the field to those who wish to expound just how much said experiment indeed represents best practice and how it should be used as a model for any other experiment in audibility research...

Ciao T

PS: I do believe your little jibe of "getting the results Thorsten likes" is a bit out of order, seeing that the results were not even part of the debate, but rather the methodology applied in said experiment. It may rather be a case of the pot calling the kettle black, if I may opine...
 
A standard is not "best practice" or "gold standard." It is a standard used to allow people following the standard to directly compare results. I sit on several standards committees and we do our best to come up with something reasonable while being well aware that the standards have a very narrow utility.

And again, you vaguely refer to "the test." What test?
 
So... with all the hub-bub as to the invalidity of most (if not all) objective efforts to characterize reality from illusory perceptions proffered by a vocal minority here, one wonders what we are to make of the subjective evidence presented by many non-believers as to the superiority of their claims??? Seems all the objections to the ABX/DBT protocols and multivariate analysis schema are exponentially confounded in subjective claims of superiority based on pronouncements made w/o evidence other than personal observational skills... how's about putting some numbers and such up, eh?

John L.
 
Sy,

A standard is not "best practice" or "gold standard." It is a standard used to allow people following the standard to directly compare results. I sit on several standards committees and we do our best to come up with something reasonable while being well aware that the standards have a very narrow utility.

So, are you suggesting that the ITU recommendations referenced are not applicable to the majority of tests for audibility? If not, how are they not applicable to, for example, the "Digital Challenge" you have been championing here?

BTW, would you suggest that the failure of one person to detect one particular effect in very specific circumstances offers proof that there is no difference that can be perceived by anyone?

And again, you vaguely refer to "the test." What test?

The test, as should be clear from the fact that it is the only test mentioned in the respective posts, refers to Mr Lipshitz's "Digital Challenge"; however, I am happy to clarify this for you in case you misunderstood.

Ciao T
 
Switching for ABX tests

How is the switching usually done? Is it by means of a relay? If so, is it a special type of relay designed for audio, or does a regular relay suffice?

I believe that there is no problem for an audio amp to suddenly have its output switched from 8 ohms (or so) to no load ("infinite" impedance).
Are there problems with that?

What other details are important for the power switcher?
 