John Curl's Blowtorch preamplifier part III

Status
Not open for further replies.
Actually, there is still a problem with Foobar ABX and how it calculates probability of guessing. Guessing should give a score of 4 out of 8 (on average). The probability of guessing in that case should be at a maximum. However, Foobar ABX says there is a 50% chance of guessing when the score is 4 out of 8.

Foobar ABX calculates the probability exactly according to the mathematical definition of probability. The probability of 4 successes in 8 coin flips is NOT 50%! For n = 10 attempts with k successes, the probability that you were guessing (that is, of scoring at least k by chance) is

k- probability you were guessing
1- 99%
2- 99%
3- 95%
4- 83%
5- 62%
6- 38%
7- 17%
8- 5%
9- 1%

The probability of exactly k successes in n attempts, each with two possible outcomes (the so-called Bernoulli scheme), is

P(X = k) = (n over k) * p^k * (1 - p)^(n - k)

p .... probability of success in a single attempt
n .... number of attempts
k .... number of successes
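As a cross-check, the table can be reproduced in a few lines. This is an illustrative sketch in Python; the function names are mine, not anything from Foobar:

```python
from math import comb

def p_exact(n: int, k: int, p: float = 0.5) -> float:
    """P(X = k): chance of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_guessing(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k): chance of scoring k or better by pure guessing."""
    return sum(p_exact(n, i, p) for i in range(k, n + 1))

# Reproduce the n = 10 table above (with one more decimal).
for k in range(1, 10):
    print(f"{k} - {100 * p_guessing(10, k):.1f}%")
```

Running it gives 99.9%, 98.9%, 94.5%, 82.8%, 62.3%, 37.7%, 17.2%, 5.5%, 1.1%, which rounds to the table above.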
 
Actually, there is still a problem with Foobar ABX and how it calculates probability of guessing. Guessing should give a score of 4 out of 8 (on average). The probability of guessing in that case should be at a maximum. However, Foobar ABX says there is a 50% chance of guessing when the score is 4 out of 8.

No, it does not. Foobar ABX says this in the 4/8 case:

Code:
Output:
WASAPI (event) : OUT (DUO-CAPTURE EX), 24-bit
Crossfading: NO

08:38:52 : Test started.
08:39:07 : 01/01
08:39:10 : 02/02
08:39:14 : 03/03
08:39:20 : 04/04
08:39:25 : 04/05
08:39:29 : 04/06
08:39:34 : 04/07
08:39:38 : 04/08
08:39:38 : Test finished.

 ---------- 
Total: 4/8
Probability that you were guessing: 63.7%
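That 63.7% is just the cumulative binomial tail P(X >= 4) for 8 fair-coin trials. A quick standard-library check (an illustrative sketch, not Foobar's actual code):

```python
from math import comb

# P(X >= 4) for n = 8 fair-coin trials: the figure Foobar ABX
# reports as "probability that you were guessing" for a 4/8 score.
n = 8
p_value = sum(comb(n, k) for k in range(4, n + 1)) / 2**n
print(f"Probability that you were guessing: {p_value:.1%}")  # 63.7%
```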
 
The probability of a coin flip and the probability of guessing are two different things. On average, X=A should happen in ABX as often as X=B. If one doesn't know the correct answer and therefore always chooses X=A, the answers should be right half the time; one should score 50% correct.

This is the same type of problem that comes up in trying to penalize guessing on multiple-choice exams. The usual formula is that if there are 4 choices on each question, the exam taker is penalized 1/3 point for each incorrect answer. The purpose of applying a penalty is to discourage guessing.

However, the approach has been criticized for various reasons, including, IIRC, whether or not the formula is the most correct one to use.
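For reference, the usual "formula scoring" rule with m choices penalizes each wrong answer by 1/(m-1) points, which makes a pure guesser's expected score exactly zero. A minimal sketch (the function names are mine, purely illustrative):

```python
import random
from fractions import Fraction

def expected_guess_score(m: int) -> Fraction:
    """Expected per-question score of a pure guesser with m choices:
    +1 for a correct answer, -1/(m-1) for a wrong one."""
    return Fraction(1, m) - Fraction(m - 1, m) * Fraction(1, m - 1)

def simulate(m: int, questions: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    score = 0.0
    for _ in range(questions):
        if rng.randrange(m) == 0:    # guessed the single correct option
            score += 1.0
        else:
            score -= 1 / (m - 1)     # guessing penalty
    return score / questions

print(expected_guess_score(4))  # 0 (the penalty exactly cancels guessing)
print(simulate(4))              # close to 0
```

Whether this is the "most correct" formula is exactly the criticism mentioned above; the sketch only shows what the rule is designed to do.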

In addition, with ABX testing it has not been researched whether there is in fact a reverse correlation effect due to System 1 operations in instances of weak or unfamiliar signal detection. I think there might be.

Further, coin flipping assumes that each trial is a random event, uncorrelated with the other trials. For a human who guesses systematically, or whose errors are correlated through memory, fatigue, and other factors while trying to answer the same question over and over without quite getting it, it might be more like having a skilled coin flipper who can affect the outcome of trials, making them partially deterministic rather than purely random and independent.

For the above reasons it might be better to skip trying to assign guessing probabilities for ABX testing of small differences. Equating it to a process like coin flipping may be too much of an oversimplification.
 
The probability of a coin flip and the probability of guessing are two different things.

Of course. The probability of exactly 5 successes in 10 attempts is not the same as the probability that you were guessing if you had 5 successes from 10 attempts.

The probability that you were guessing, given k successes from n attempts, is the probability of "at least k successes".
For our example of 5/10, it is calculated as the probability of exactly 5 successes out of 10, plus the probability of exactly 6 out of 10, and so on: a sum of the probabilities of 5/10 + 6/10 + 7/10 + 8/10 + 9/10 + 10/10. And that is exactly how Foobar calculates the result; there is no problem, no mistake.
 
That illustrates one of the problems with null-hypothesis testing as it is usually done.
The observed data, the number of correct answers, is summed up and used as the test statistic.
The analysis is done under the assumption that the null hypothesis is true (the null hypothesis is stated as random guessing) and uses the exact binomial test.


A significance level is set before doing the experiment (often at SL = 0.05) and the accumulated probabilities for each possible result are compared to this significance criterion.
Foobar shows these accumulated probabilities for the result chain rounded to one digit.

The chance to get 10 correct answers in 10 trials is P(10 | 10) = 0.00098 = A (rounded to the 5th decimal).
The chance to get 9 correct answers in 10 trials is P(9 | 10) = 0.0098 = B.

Given the often-used SL = 0.05 we would accept each of these results, rejecting the null hypothesis, but that means (according to the Kolmogorov axioms) that we have to add the probabilities, as the probability for getting the result P(A or B) = P(A) and P(B).
In this example it gives P(A or B) = 0.01074, so it is still below our criterion.

The next would be P(8 | 10) = 0.04395 = C, but now the cumulated probability is
P(A or B or C) = 0.05469.
The probability for that result would be higher than our criterion of SL = 0.05, so in a formal decision process we would no longer reject the null hypothesis.

Actually we are evaluating how compatible the observed data is with our null hypothesis, but as we are not really examining whether the null hypothesis is true, we can't conclude that a negative test result establishes or corroborates the null hypothesis; the real reason might be different, and other hypotheses might be even more compatible with the observed data than the "random guessing" assumption.
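The step-down decision described above can be written out directly. A sketch assuming SL = 0.05 and n = 10, as in the example:

```python
from math import comb

SL = 0.05   # significance level chosen before the experiment
N = 10      # number of trials

def p_exact(k: int, n: int = N) -> float:
    """P(X = k) under the null hypothesis of random guessing (p = 0.5)."""
    return comb(n, k) / 2**n

# Accumulate tail probabilities from the best score downward and compare
# the running sum to the criterion, as in the worked numbers above.
cumulative = 0.0
for k in range(N, 6, -1):
    cumulative += p_exact(k)
    decision = "reject H0" if cumulative <= SL else "fail to reject H0"
    print(f"k >= {k}: cumulative p = {cumulative:.5f} -> {decision}")
# k >= 10: 0.00098 -> reject H0
# k >= 9:  0.01074 -> reject H0
# k >= 8:  0.05469 -> fail to reject H0 (above SL)
```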
 
Actually we are evaluating how compatible the observed data is with our null hypothesis, but as we are not really examining whether the null hypothesis is true, we can't conclude that a negative test result establishes or corroborates the null hypothesis; the real reason might be different, and other hypotheses might be even more compatible with the observed data than the "random guessing" assumption.

Exactly! Thank you. A calculation is being made on the probability of a particular score occurring based on the assumption that all answers are random guesses, but in fact answers probably are not all random guesses, so the probability of a particular score occurring is likely not being correctly calculated.

That's not all, either. Probably no point in going on, though.
 
The next would be P(8 | 10) = 0.04395 = C, but now the cumulated probability is
P(A or B or C) = 0.05469.
The probability for that result would be higher than our criterion of SL = 0.05, so in a formal decision process we would no longer reject the null hypothesis.


The same results I spoke about; the only difference is "exactly 8 of 10" versus "at least 8 of 10". We are not interested in "exactly 8 of 10" alone.
 
Hope that there are enough people who are not partisan in the discussion that we might actually make some progress, rather than just keep pushing the boulder up the hill each day.

Sadly there are those that think there is a problem, but are not prepared to produce a test process that can be used by all, but just keep avoiding doing so.

.. you must look to your own designs as possibly lacking something, IF people do not prefer them over others, not that everyone is fooling themselves over what something looks like or some sales pitch from another.

While some people still think that if it doesn't sound different it's bad, that rock is going to keep getting pushed...
 
The same results I spoke about; the only difference is "exactly 8 of 10" versus "at least 8 of 10". We are not interested in "exactly 8 of 10" alone.

Actually no difference, as you cited it already:
"The next would be P(8 | 10) = 0.04395 = C, but now the cumulated probability is
P(A or B or C) = 0.05469.

The probability for that result would be higher than our criterion of SL = 0.05, so in a formal decision process we would no longer reject the null hypothesis.
"
(emphasis added)

P(A or B or C) = P(A) + P(B) + P(C), which is the probability of at least 8 out of 10.

It is better to use the expression P(X >= 8 | 10), with X representing the number of correct trials.
I remember that Excel, for example, uses the expression "at least" in the explanation of its probability calculation but actually calculates P(X > 8 | 10), which is a problem in experiments with small trial numbers.
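The difference between the strict and non-strict tail is easy to demonstrate (a sketch; `tail` is my own name, and which variant any particular tool computes should be checked against its own documentation):

```python
from math import comb

def tail(n: int, k: int, strict: bool = False) -> float:
    """P(X >= k), or P(X > k) when strict, for n fair-coin trials."""
    lo = k + 1 if strict else k
    return sum(comb(n, i) for i in range(lo, n + 1)) / 2**n

# With small n the off-by-one changes the decision at SL = 0.05:
print(tail(10, 8))               # 0.0546875    (>= 8 of 10, above SL)
print(tail(10, 8, strict=True))  # 0.0107421875 (>  8 of 10, below SL)
```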

Edit: in my post with the explanation I used the line "P(A or B) = P(A) and P(B)" but meant P(A or B) = P(A) plus P(B), so it is the mathematical operator "+", not the logical "and".
 
Sadly there are those that think there is a problem, but are not prepared to produce a test process that can be used by all, but just keep avoiding doing so.

While some people still think that if it doesn't sound different it's bad, that rock is going to keep getting pushed...

I have proposed a change to ABX as it currently exists in Foobar ABX that I think would move it in the direction of being more useful, maybe even the whole way to being definitely fair and useful. It would involve adding a loop checkbox, and single-button switching between samples with eyes closed, such as a hotkey on the keyboard. Those two things should help a lot. At that point I would want to test again. I am not sure about the question at the end of how the answer choices are presented; I don't remember whether it's okay or not, since I haven't tried it for quite a while. A programmer here in the forum did contact me by PM at one time and offered to do it so we would have something a little better than Foobar ABX, but he eventually decided it was more than he could take on.

With regard to "if it doesn't sound different it's bad," I don't know what that is supposed to mean. Some things are indistinguishable from one another, just not usually the things in PMA's listening tests, although they can be quite hard to use Foobar ABX on. My only complaint is that if I can reliably hear a difference blind using a different protocol myself, I would like to see us find a protocol as good that we can agree on and that everybody here can use. The problem is that Foobar ABX can't be changed, and it is the only program with a validation test system (although it can be cheated). Like many things, it would appear to take funding to fix, and nobody wants to pay. They only want to argue.

Once again, I would like to step back as this is taking up too much time. Bye.
 
See, I had hoped that an interesting discussion could come out of the fact that we have an interesting result to discuss. Not the score in isolation but the way he trained himself to hear the difference and the fact that he was honest enough to admit he didn't have a preference and wouldn't guarantee that he could tell the difference if just presented with one (so effectively transparent for normal use).



The method was a variant of fast switching on a single event. This may increase the sensitivity of the test, but it is at odds with the hardline subjectivists and their view that you need hours or days listening to one before switching in order to hear the differences. This is fascinating, though not unexpected.



My takeaway is that an open and honest mind can get a lot from these sorts of comparisons. Personal learning rather than data to solve an unsolvable argument, but still good :)


Back to pushing your boulders. The vulture will be along to eat your liver in a while!
 