John Curl's Blowtorch preamplifier part III

john curl said:
PMA, you must look to your own designs as possibly lacking something IF people do not prefer them over others, rather than assuming everyone is fooling themselves over what something looks like or some sales pitch from another.
You want him to design line stages which are better than a piece of wire?

Actually, his latest test does show a small (statistically quite insignificant) preference for 'amp' over 'wire', so maybe he is already on the path to true audio?
 
<snip>
Interestingly, getting 2 out of 8 shows reverse correlation with the correct answers, and should be considered just as significant as getting 6 out of 8. If either trend were to persist, that might suggest System 1 and/or System 2 are responding in a non-random way, not guessing.

Forgot to address that before....

Usually we are looking for a direction in the results, while the null hypothesis represents the "random guessing"; formally written it is:

H0: p = 0.5 (H0 is the null hypothesis)

and the alternative hypothesis would be:

H1: p > 0.5

This kind of hypothesis is directional, and a test of this kind is called "one tailed" or "one sided".
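
As a concrete check, here is a minimal Python sketch of the one-tailed p-value, using the 6-out-of-8 result quoted above (standard library only; the numbers are taken from the quoted post, not new data):

Code:
from math import comb

def one_tailed_p(k: int, n: int) -> float:
    # P(X >= k) for X ~ Binomial(n, 0.5): the probability of k or more
    # correct answers under pure guessing.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(one_tailed_p(6, 8))  # ~0.1445, so 6 of 8 is not significant at SL = 0.05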

Another possible variant is:

H0: p = 0.5

and the alternative hypothesis would be:

H1: p ≠ 0.5

In this case the direction is unspecified; the hypothesis is accordingly called "nondirectional", and tests of this kind are called "two sided" or "two tailed".

As the binomial distribution under H0 is in principle symmetric (whether perfect symmetry is possible depends on the actual sample size), the probabilities are the same on both sides of the distribution.
In this case we would accept results from both sides, but have to split our criterion of SL = 0.05 across the two sides, using 0.025 for the low end (results like 0, 1 or 2) and 0.025 for the high end (results like 10, 9 or 8).
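
A small sketch of the two-sided criterion, assuming the n = 10 trials implied by the 0/1/2 and 8/9/10 examples above; it demonstrates the tail symmetry and checks each low-end score against the 0.025 per-tail criterion:

Code:
from math import comb

def p_at_most(k: int, n: int) -> float:
    # P(X <= k) for X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

n = 10
for k in (0, 1, 2):
    low = p_at_most(k, n)               # low tail: k or fewer correct
    high = 1 - p_at_most(n - k - 1, n)  # high tail: n - k or more correct
    print(f"P(X<={k}) = {low:.4f}  P(X>={n-k}) = {high:.4f}  "
          f"below 0.025: {low <= 0.025}")

# By symmetry both tails are equal; with n = 10 only scores of 0 or 1
# (and, symmetrically, 9 or 10) fall below 0.025 -- a score of 2 of 10
# (P ~ 0.0547) just misses the two-sided criterion.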

Leventhal covered that in a JAES article (~1986–1988, iirc) and coined the nice term "statistically significant poor performance".

Obviously this makes rejection of the null hypothesis more difficult in the case of true positives (i.e. for listeners who really could distinguish). The advice therefore is not to alter the criterion for the high side, and to take significantly poor results just as a hint that some participants might have misunderstood the instructions given or confused the answer buttons.
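
To illustrate the power loss, here is a minimal sketch assuming a hypothetical listener with a true hit rate of p = 0.7 over n = 16 trials (both numbers are illustrative assumptions, not from the thread):

Code:
from math import comb

def p_at_least(k: int, n: int, p: float) -> float:
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_k(n: int, alpha: float) -> int:
    # smallest score whose upper-tail probability under guessing is <= alpha
    return next(k for k in range(n + 1) if p_at_least(k, n, 0.5) <= alpha)

n, p_true = 16, 0.7  # assumed trial count and true hit rate (hypothetical)
for alpha in (0.05, 0.025):
    k = critical_k(n, alpha)
    print(f"alpha = {alpha}: need >= {k}/{n} correct, "
          f"power ~ {p_at_least(k, n, p_true):.2f}")

# Halving the high-side criterion raises the critical score from 12 to 13
# and drops the power from roughly 0.45 to roughly 0.25 for this listener.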
 
The factors above all discredit any short testing regimen.

I would not agree in general, as it is my experience that humans are able to learn to control those factors (not perfectly, though, as we are still humans), and there is some evidence for this from food tests too.

But it is of course more difficult in the case of multidimensional evaluations than in directional one-dimensional experiments, which shows in the data from different studies/experiments.

Forget about variability between different humans; my opinion after testing many is that there is very little correlation between most people for anything more than gross differences. But even within a single test subject, variability is significant.

Agreed that the variability can be surprisingly large, but IME it is possible to find agreement within groups if vocabulary is synchronized and sound examples (representing various effects in different categories) are listened to together.

<snip>..Longer testing sessions over a few days or longer average out these factors as much as possible.

Agreed again. Surprisingly, I was strongly attacked in another forum by the admin after mentioning this positive effect of listening over a longer time span. His point was that I failed to provide scientific evidence for this temporal averaging effect, which surely must exist only in my imagination....
Got banned shortly after that for a never-specified violation of forum rules, so I could not figure out why somebody would negate temporal averaging.

Except for gross sonic differences, I believe any of these tests really only test the individuals, not the equipment. Of course there is value in that result, but not necessarily as applied to subtle equipment differences.

It seems usually to be a combination of testing both: the listener and the actual (if existing) difference.
 

I noted before that HHoyt also gets it (glad to see it): with enough people registering their honest opinions, a realistic view of Foobar ABX testing can be established for the general reader, who will no longer be fooled by the ABX party trick.
 
Agreed again. Surprisingly, I was strongly attacked in another forum by the admin after mentioning this positive effect of listening over a longer time span. His point was that I failed to provide scientific evidence for this temporal averaging effect, which surely must exist only in my imagination....

I would just like to point out that there are forums that ban anyone who insists on any non-sighted test of any protocol.
 
Guys, I have been defined as 'crazy'. At least 'crazy' enough to be rejected by the US military during the Vietnam War! (thank goodness)
When it comes to 'bragging' about my success in audio design, my record should be obvious from the number of products I designed that have been given high ratings internationally.
However, I am a LOUSY businessman, and a poor marketing guy, by almost every account. I struggle along on some modest royalties, because I failed at running several actual businesses in the past, but I am happy that I achieved a known level of success in advancing audio circuit design. I certainly never got rich from it, although I did help a few others to do so. I personally feel that I have lived a 'completed' life by contributing to audio design for the last 50 years, and am still going strong at it. Now, what am I doing here? I am trying to help others find what 'works' to make a more successful audio design. And I have tried to give it away for free, only to be put down by the 'hear no difference fraternity' that came to reside in THIS thread. Why this thread? You know why.
 
While you guys intelligently discuss the inner workings of ABX testing and statistics, I keep hoping to point out that many audio designers have come to ignore ABX tests, because they tend to 'throw the baby out with the bathwater'. That is the only reason. Serious audio designers need to hear differences, not have them obscured by some test procedure. We have to be able to hear these differences while we develop products for release to the public, in order to be competitive with our fellow designers in the marketplace. Only non-ABX methods, not necessarily sighted, seem to work, so we stick to them. I have found from personal experience that IF an audio designer designs essentially for the best 'specs' only, and relies on instruments and ABX nulls alone, the resulting product MAY fail in the audio marketplace, just because it 'sounds' disappointing, no matter what it looks like or who designed it. I have made such a mistake, more than once, to my own personal embarrassment. Have any of you out there? Well, if you have, please don't blame it on fickle audiophiles, attempting to prove them wrong with ABX nulls that you have obtained. It just doesn't work.
 
... I keep hoping to point out that many audio designers have come to ignore ABX tests...

Hey John, long time, how z'it doin'?

While I certainly respect the designers' point of view, yours among many others, I'd like to point out that many audio reviewers and audio consumers also agree with it.

Check out John Atkinson's session at RMAF 2018: he spent half an hour talking about how this amp sounds better (or worse) than that amp, and then he jokingly said, at 40:41, "... Of course, the blind listening tests say they sound the same anyway...".
 