What kind of evidence do you consider as sufficient?

NATDBERG, while I agree with most or all of what you said, it has very little to do with the point I was trying to make.

The point being that the participants cannot fail an ABX test; only the difference being tested can, so there is no pressure on those participants.
The exception is when a participant claimed audible differences in non-blind reviews, and in the audiophool world those differences tend to be of the 'night and day', 'my wife could hear it from the kitchen' or 'a veil was lifted' type.
This creates self-imposed pressure as well as the possibility of failing for that particular participant.
 
@DPH,

Jacob, I wish I could give you a straight answer that didn't have conditionals. Obviously I agree on the need for a clear hypothesis. "It depends" would be my most honest answer. Preregistration of test protocols (DBT with positive and negative controls, or prior research into the capability of the testers under similar conditions) and training results showing that variability is heading towards its asymptote are a good start. Consult with a statistician and have a pre-trial analysis workflow established.

Nothing wrong with conditionals, as we already know that there is no "one size fits all" method.

I'd say that means you would be satisfied with any experiment that fulfills the usual quality criteria (i.e. validity, objectivity and reliability) but has added preregistration.
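
As a minimal sketch of what such a pre-trial analysis workflow could look like, assuming a plain binomial ABX design (the alpha, power and p_true values below are illustrative assumptions, not prescriptions):

from scipy.stats import binom

def required_trials(p_true=0.7, alpha=0.05, power=0.8, n_max=200):
    """Smallest trial count N whose alpha-level exact binomial test
    reaches the desired power against a listener who answers correctly
    with probability p_true (chance level = 0.5)."""
    for n in range(10, n_max + 1):
        # critical count: smallest k with P(X >= k | p=0.5) <= alpha
        k_crit = int(binom.isf(alpha, n, 0.5)) + 1
        # power: probability that a p_true listener reaches k_crit
        if binom.sf(k_crit - 1, n, p_true) >= power:
            return n, k_crit
    return None

print(required_trials())  # -> (trial count, minimum correct answers)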

@DF96,

It would help if those involved in the test had no financial interest in the outcome - or at the very least declared any such interest. Reputational interest can also be an issue.

"It would help..." is nice, but is it mandatory or a knock out criterium?

@RMarsh,

"Majority .. around the world" , that sets the bar really high, or is it a misunderstanding?
 
<snip>
This creates self-imposed pressure as well as the possibility to fail for the particular participant.

Self-imposed pressure makes it worse, but it is already quite likely that the experimental conditions per se create a kind of "pressure" or "distraction".

But in the specific case you've mentioned I'm pretty sure that "failing" meant "inability to get a positive ABX result" although the listener is confident that an audible difference exists and that he perceives it.
 
So far it seems that no unusual requests or conditions were posted; following good scientific practice seems to be considered sufficient.

A bit harder to meet is the demand that none of the experimental crew should have financial or reputational interests at stake.
A new proposal (for the audio field, imo) is the demand for preregistration, but it is a very important one.

Generally nothing really surprising, but some information is missing. What about the level of significance that is required? What about replications/duplications?
 
Late to the discussion (again, I am) but

I personally like “computationally blind testing, with trickery”. It seems to expose even the most subtle auditioning bias.

For example: I “made a box” some 20 years ago (before anything was Arduino-cheap) which had a bunch of whisper-quiet mercury-reed switches and gold-plated super-duper jacks on the back. There was a 10-meter cable to an “audition box”, and inside was a single-board microcontroller I got for free to do the testing.

Initially, I gave the audition participant the choice of A, B and X: A for A, B for B, and X for A-or-B more or less at random. That way — so my thinking went — the auditor could “train up” on both A and B, then, using X, try to decide whether it was path A or B. Moreover, once X was chosen, the auditor had to choose "A" or "B" as her guess … which was statistically registered when the "Y" (yes) or "N" (no) button was depressed following the A or B.

A typical test was like this:

A, B, A, B, A, B, X
A, B, A, Y (i.e. X sounded most like A)
A, B, A, B, X
A N (X didn't sound like A)
… and so on
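
A minimal sketch of that trial logic, assuming a console stand-in for the buttons; play_path() and the exact Y/N bookkeeping are my reading of the transcript above, not the original box's firmware:

import random

def play_path(path):                     # stand-in for firing the real relays
    print(f"[now auditioning path {path}]")

def run_session():
    results = []                         # (true identity of X, auditor's verdict)
    x_is, last_ref = None, None
    while True:
        key = input("A/B/X/Y/N (Q quits): ").strip().upper()
        if key == "Q":
            break
        if key in ("A", "B"):            # free training on the known paths
            play_path(key)
            last_ref = key               # most recently auditioned known path
        elif key == "X":
            x_is = random.choice("AB")   # X is A-or-B, chosen at random
            print("[now auditioning X]") # the real box would route to x_is
        elif key in ("Y", "N") and x_is and last_ref:
            # Y: "X sounded like the last reference"; N: it didn't
            verdict = last_ref if key == "Y" else ("B" if last_ref == "A" else "A")
            results.append((x_is, verdict))
            x_is = None                  # trial closed; a fresh X is required
    correct = sum(v == t for t, v in results)
    print(f"{correct}/{len(results)} verdicts matched the hidden path")

run_session()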

Did this work?

Well … because of little quirks in the device, the minute sound of the reed switches was detectable to listeners in a very quiet chamber. Thus, the “blindness” went away. Users could easily figure out what was what from cues, and the statistics became rubbish.

Next, I decided on using the same system, but with “trickery”. I put in much larger conventional relays that made distinct clicking sounds. A bunch of them. However, they weren't attached to anything. Just there to click. Moreover, when the auditor chose 'A' or 'B', which relays clicked was chosen at random. Thus (and because they were so much louder) there wasn't any pattern to learn. 'A' and 'B' became quite indeterminate.

Another tack that I tried was the very same, but with both 'A' and 'B' connected to the 'B' channel. 'A' would never fire up, and the louder bogey relays continued their random patterns. This was VERY telling, because even so, some people expressed strong opinions about whether they preferred A or B. (Obviously, this is easily done by externally jumper-wiring the back of the box, so that the box wasn't involved at all.)
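
A minimal sketch of both “trickery” stages, assuming the same kind of console stand-in as above; click_decoys() and the null wiring below are an illustrative reconstruction, not the actual hardware:

import random

def click_decoys(n=6):
    # the decoy relays fire in a random pattern on every switch,
    # masking any audible cue from the real switching
    pattern = "".join(random.choice("01") for _ in range(n))
    print(f"[decoy relays: {pattern}]")

def select_path(requested, null_test=False):
    click_decoys()
    # in the null condition both 'A' and 'B' route to path B,
    # so any reported preference cannot come from the signal
    actual = "B" if null_test else requested
    print(f"[requested {requested}, actually hearing {actual}]")
    return actual

# toy run: ten 'A'/'B' requests under the null wiring
for _ in range(10):
    select_path(random.choice("AB"), null_test=True)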

Out of nearly 100 people, we encountered fewer than 15% who could reliably state, “I really cannot tell a difference at all”.

In any case, I do prefer the automated-randomness-with-trickery setup. It seems to have the best quick-unlearning for auditors. Especially when they're told HOW the “unbiasing” is being performed. Never up front, but after a few minutes of trials.

Just saying,
GoatGuy
 
He won't give a straight answer or acknowledge that he has no answer, and he's doing this out of his audio-electronics business interest. Yes, he sells.
I'm afraid I find myself agreeing with you. I wish it wasn't so: lots of smoke and mirrors and non-answers to straight questions. He's done it to me once too often and I'm sick of it; I'm not wasting my time with him anymore.
 
Looking at this thread for the first time, two things immediately come to mind from cognitive-psychology research into bias and human nature. (1) There is very interesting research showing that when people have firmly made up their minds, they are closed to thinking in truth-seeking mode.<snip>

I remember having read, in one of Laplace's articles from around 1810, about the tendency of humans to establish quite strong opinions even on topics that they don't really know well. And he noticed already back then that after this initial opinion-forming a kind of confirmation bias comes into play, which filters new information and dismisses those parts that do not corroborate the opinion.

A supplemental idea came from a Gigerenzer article (~1993, although I don't know if he invented it): that we tend to behave like Bayesians more often than we think, which means we maintain (strong) prior beliefs and any new information only varies the degree of our belief.

Of course, in true Bayesian style this degree of belief is expressed as a probability, but if one sets one's prior belief (in our case, in the audibility of a certain effect) at zero, then no experimental result/evidence will ever be convincing.
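
A one-line illustration of that point, assuming a simple two-hypothesis Bayes update ("difference audible" vs. "not audible"); the numbers are arbitrary:

def posterior(prior, p_data_if_audible, p_data_if_not):
    # Bayes' theorem for two hypotheses
    num = prior * p_data_if_audible
    return num / (num + (1 - prior) * p_data_if_not)

print(posterior(0.30, 0.9, 0.1))  # an open-minded prior moves a lot: ~0.79
print(posterior(0.00, 0.9, 0.1))  # a prior of zero: posterior stays 0.0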
 
NATDBERG, while I agree with most or all of what you said, it has very little to do with the point I was trying to make.

The point being that the participants cannot fail an ABX test; only the difference being tested can, so there is no pressure on those participants.
The exception is when a participant claimed audible differences in non-blind reviews, and in the audiophool world those differences tend to be of the 'night and day', 'my wife could hear it from the kitchen' or 'a veil was lifted' type.
This creates self-imposed pressure as well as the possibility of failing for that particular participant.

I see - sorry, my confusion! I think the quote in your post was missing the original context, so I lost it too.
 
@Jakob2, Maybe you are putting the cart before the horse by asking what people would need to see to be convinced. You are asking people to play a game of imagination that people tend to be very bad at. There is research showing that people are very bad at imagining the future and how they will feel or react if this or that comes to pass.

Also, it seems to me that for now we just ought to work on demonstrating just how bad and mistaken the old hearing research is, at least for some fraction of the population. We also ought to look into what kind of hearing mistakes people make when they think they hear a difference and in reality there isn't one. My bet would be that the errors are mostly systematic, that is, everyone tends to make the exact same errors.

If you have read Daniel Kahneman's book, Thinking, Fast and Slow, what he and Amos Tversky did that later won Kahneman a Nobel Prize was to experiment on themselves until they found systematic errors in their own thinking, then design experiments to demonstrate the same effects in others. I think there are probably some easy pickings of that sort that can be exploited in hearing research. No need to go after the more difficult high-hanging fruit until the easy stuff has been dispensed with.
 
So far it seems that no unusual requests or conditions were posted; following good scientific practice seems to be considered sufficient.

A bit harder to meet is the demand that none of the experimental crew should have financial or reputational interests at stake.
A new proposal (for the audio field, imo) is the demand for preregistration, but it is a very important one.

Generally nothing really surprising, but some information is missing. What about the level of significance that is required? What about replications/duplications?

The latter is where my "it depends" and "please consult a statistician" comes into play. :) It really depends on exactly what you want to demonstrate. But, yes, I'm pretty straightforward as far as following present best practices.

I'm reluctant to say "6 sigma" or p < .05 or anything of that sort.
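
To make that reluctance concrete, here is a small sketch of what an exact binomial test demands at a few common thresholds (the N values and alphas below are illustrative assumptions only; chance level = 0.5):

from scipy.stats import binom

for n in (10, 16, 25, 50):
    for alpha in (0.05, 0.01):
        # smallest correct count that is significant at this alpha
        k = int(binom.isf(alpha, n, 0.5)) + 1
        print(f"N={n:3d}  alpha={alpha:.2f}  need >= {k} correct")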
 
Self-imposed pressure makes it worse, but it is already quite likely that the experimental conditions per se create a kind of "pressure" or "distraction".
No pressure or distraction. Those are the terms they come up with after the results turn out to be random guessing.

But in the specific case you've mentioned I'm pretty sure that "failing" meant "inability to get a positive ABX result" although the listener is confident that an audible difference exists and that he perceives it.
It's either significantly correct or random guessing. As long as the listener makes the selections, there is no inability to guess. That so-called "pressure" or "stress" during the test is a scapegoat conjured up by sellers/shills.
 
...perform another test which is just a front whilst they observe some other behaviour. Could something like that be worked out?

(my emphasis)

It could. It would be a lot easier if data were also being gathered without the testees having to vocalize their thoughts (i.e. by directly reading brain function).

But even without that: I read about an informal test done at an AES convention where attendees in a theatre, watching/listening to a performance, were first subjected to hi-rez (24/192?), then the source was step-by-step degraded through CD quality to MP3. The writer described how he felt the sonics slowly got worse.

It adds more difficulty to the test, but I expect it is doable.

dave
 
The trained ear/brain readily differentiates time-domain behavioural differences between DUTs, whereas stationary signals/measurements do not.
I find that testing using a slightly different methodology is more useful than ABX blind testing for my purposes; call it AX testing.

By this I mean unlimited training on the A/Reference condition.
Next is knowingly switching to B and learning the differences, with unlimited switching back to A/Reference as required to reinforce findings.

The data-collection stage is subject-determined (length/track position): listen to A, then switch (immediate, or with a variable muting period; length/track position subject-determined) to the unknown, i.e. A or B, and then make a decision of 'sameness' or 'difference'.
I find that by this re-referencing at each experiment, very fine differences can be readily and reliably discerned.
A or B, sighted or unsighted longer-term listening then determines ultimate preference.
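
A minimal sketch of this AX "same/different" flow, assuming a console stand-in; present() and the trial count are hypothetical, not Dan's actual setup:

import random

def present(label):
    print(f"[playing {label}]")

trials = []
for _ in range(10):
    present("A")                      # re-reference before every trial
    unknown = random.choice("AB")     # hidden condition for this trial
    present("the unknown")            # a real rig would route to `unknown`
    verdict = input("same as A or different? (s/d): ").strip().lower()
    trials.append((unknown, verdict))

# a verdict is consistent if 'same' was answered exactly when the unknown was A
hits = sum((u == "A") == (v == "s") for u, v in trials)
print(f"{hits}/{len(trials)} verdicts consistent with the hidden condition")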
My two cents.


Dan.
 
Self-imposed pressure makes it worse, but it is already quite likely that the experimental conditions per se create a kind of "pressure" or "distraction".

But in the specific case you've mentioned I'm pretty sure that "failing" meant "inability to get a positive ABX result" although the listener is confident that an audible difference exists and that he perceives it.

Can you please explain the difference between "get a positive ABX result" vs "perceives it so"? Are you suggesting that ABX (or other blind protocol) results are predicated on something other than auditory perception?
 