Can you hear a difference between 2 solid state preamps?

Poll: Can you hear a difference between the two test files?

  • I can hear a difference, but have no ABX result: 12 votes (50.0%)
  • I cannot hear a difference and have no ABX result: 6 votes (25.0%)
  • I can hear a difference and have an ABX result: 4 votes (16.7%)
  • I cannot hear a difference and have an ABX result: 2 votes (8.3%)

Total voters: 24. Poll closed.
Yes, until I was able to find a part of the sample recording that allowed for me to tell the difference quite reliably. Then, I got similar results as posted, repeatedly, but always limited to 8 trials, as I could not keep concentration to make 16 trials in a row.
So, first trials were something like initial preparations. I agree that if I did not have this training time, I would not be able to tell the difference as I did later. It is not so easy in ABX.
That was my point - you didn't get 11 out of 16 trials, since by your own account you couldn't focus for 16 trials, & therefore if you had done 16 trials you would have delivered a random result. On the other hand you could have done 16 trials with long breaks between each trial (or taken a break between the two sets of 8) to avoid the problem of holding concentration for a longish time - people who don't realise the limits of their concentration & how it is affecting their ABX performance can be led to believing that the ABX results 'prove' that there is no audible difference discernible.

Glad to see you stating that ABX testing is very difficult - maybe those who pontificate about 'night & day' differences heard in sighted listening, yet not easily discerned when doing ABX testing, may take note?
 
@mmerrill99
No, generally it is correct to combine two separate runs, provided every run itself is objective, reliable and valid.
Independence can be questionable, so sometimes it is better to choose another statistical test that does not rely on the independence of trials.

Of course it depends on the significance level (SL) that one wants to achieve, but for the usual SL = 0.05 even 5 trials are sufficient (actually testing at the 0.032 level). SL only considers alpha-errors, so beta-errors are neglected, but if we take beta-errors into account even 20 trials don't help much (though that depends on the listener's ability to detect differences, if they exist).
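
To make the arithmetic behind the "0.032 level" explicit, here is a minimal sketch in plain Python (exact one-sided binomial probability, nothing specific to any particular ABX software):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value: probability of getting at least
    `correct` answers right in `trials` forced-choice trials by pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 5 out of 5 correct: 0.5**5 = 0.03125, the "0.032 level" mentioned above
print(abx_p_value(5, 5))    # 0.03125
# For comparison, 12 out of 16 correct is the usual criterion at SL = 0.05
print(abx_p_value(12, 16))  # ~0.038
```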

Not sure what the confidence interval length of 10 means, as usually the confidence interval would be an interval for the listener's true proportion of correct answers (as opposed to the assumed p = 0.5 under the null hypothesis).
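
If one prefers the confidence-interval view, a short sketch of the Wilson score interval for that true proportion; the 11/16 example is simply the figure mentioned earlier in the thread:

```python
from math import sqrt

def wilson_interval(correct: int, trials: int, z: float = 1.96):
    """Wilson score confidence interval for the true proportion of correct answers."""
    p_hat = correct / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half, centre + half

# 11/16 correct gives roughly (0.44, 0.86): the interval still contains 0.5,
# so on its own it neither demonstrates nor rules out genuine discrimination.
print(wilson_interval(11, 16))
```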

From Leventhal here (emphasis mine) :
I do not know whether these "subtle differences" are real or imaginary. But I do know that many listening tests using the ABX comparator, including many published tests such as those in Audio cited by reader Huss, are conducted and analyzed in such a way that subtle differences actually heard by the listener will likely go unidentified by the experimenter when the data is analyzed. The problem with these listening studies is that the experimenters conducted too few trials (for example, 16), and used the .05 level of significance when subjecting the data to a statistical test of significance.
 
@jakob2
Combining the results of two ABX tests opens the possibility that many 8-trial tests were done, most gave random results, & only the two slightly positive results were used, ignoring all other ABX runs - it's somewhat like picking out just the correct trials from an ABX test and ignoring those that aren't correct - I'm sure you agree this is not valid.

Doing a run of 16 trials consecutively avoids these potential issues

This whole area of listening tests that rely on statistical analysis is a minefield
 
I will speculate that the capacitors on the input pins of the 797 need to be increased a bit, or better (though that will change the design enough that it may need some adjustments), the resistances in parallel with them reduced. This will reduce the measured performance a bit.

So you want to reduce BW. To get some 100kHz/-3dB, right? However, you seem to be one of not too many who prefers C#2 sample. So what now, the circuit to be modified according to subjective preferences of a single member or two? ;)
 
So you want to reduce BW. To get some 100kHz/-3dB, right?
Of course not. That can be done by increasing the capacitance of the input filter :) The idea is to increase stability. The effect on measured performance should be minimal. It (the C tweak) is icing on the cake. It can be increased or decreased, it is just a 'function' of the chosen Rf. I don't know what value you chose for Rf; I'm guessing you chose too high a value. Assuming that the jfet current can be set through the power supply (and that a K170 can handle 11 mA), the value of Rf can be 'minimized' (assuming that you chose a high value). It would be better if the gain were lower, of course - lower than the value that I assume you have chosen.

However, you seem to be one of not too many who prefers C#2 sample.
I would no doubt prefer direct (without the preamp). I don't prefer C#2, just that listening to C#1 is more tiring for me, so I have no choice. But I wish the AD797 could perform 'better'. It is just a matter of stability, I believe.

So what now, the circuit to be modified according to subjective preferences of a single member or two? ;)
That's the point of voicing a design. You can provide two designs (the same design with different options) which do not differ a lot in measurement and are therefore supposed to sound the same.
BTW, I'm curious how you would replace the AD797 with an AD744 - of course, removing the socket, if any, is a good tweak.
 
It always amazes me when someone has the ability, after listening to a 90-second excerpt, to home in on the exact component that is the problem.
Honestly I doubt that others can hear what I can regarding improper PRAT that I associate with stability issues (because this usually happens with opamps with a known tendency for instability). In ABX I could identify which was which based on distortion. I could also pinpoint this 'instability' (I used the percussion part) but I don't know which one was which (in ABX you only know that A=X and B=Y but you don't know which one is X and which one is Y). But listening to C#1 and C#2 using several amps I know that C#1 is fatiguing. C#2 is higher in 'distortion' (typical perception) but that's not as fatiguing as C#1 (it's hard to describe the sound of it in words)...


Instability in opamp implementations is theoretically affected by input stray capacitance. We want this to form a pole far above the amplifier bandwidth. My rough calculation for the AD797 gives a maximum Rf of around 105 Ohm for a noise gain of 2 (non-inverting buffer) with the pole at twice the bandwidth... (Other efforts related to increasing stability are, of course, a short and 'clean' layout, especially in the feedback path.) I have too little experience with opamps to know how much this will affect the perceived sound, but one or two is, I think, sufficient...
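
For what it's worth, the arithmetic behind an Rf of roughly 105 Ohm can be sketched as below. The feedback-network pole is f = 1/(2*pi*(Rf||Rg)*Cin); the gain-bandwidth figure and the total capacitance assumed at the inverting node are my own illustrative assumptions, not values from the post:

```python
from math import pi

# Assumed values (illustrative only, not from the post): AD797 gain-bandwidth
# product and total stray + input capacitance seen at the inverting node.
GBW_HZ = 110e6
C_IN_F = 27e-12

noise_gain = 2.0                       # non-inverting stage with gain of 2, so Rf = Rg
closed_loop_bw = GBW_HZ / noise_gain   # ~55 MHz
target_pole = 2 * closed_loop_bw       # place the pole at twice the closed-loop bandwidth

# Feedback-network pole: f = 1 / (2*pi*(Rf || Rg)*C_in); with Rf = Rg,
# Rf || Rg = Rf / 2, so Rf = 1 / (pi * target_pole * C_in)
rf_max = 1 / (pi * target_pole * C_IN_F)
print(f"Rf max ~ {rf_max:.0f} ohm")    # ~107 ohm with these assumed numbers
```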
 
That's an enormous leap given the sample size here. Nothing can be taken from this test as far as I am concerned.

Agreed, but my comment was related to a post over at the "blowtorch thread" :

AD711 might have been a breakthrough in the times of the uA741, however that's all. OPA134 is much better, technically and audibly, though it is an about 20-year-old chip. Interesting how rigid the audio community is and how long it takes to notice something.

No one objected.


I did the test unblinded so my results are totally void. But I am interested to know if I can pick the files out with some form of blinding. However, I am just not set up for that sort of test at the moment.

That's the question. :)
As said before when mentioning the concept of "qualitative methods" (surely in other threads), if you consider a really refined evaluation wrt "main and sub parameters" to assess the quality of the reproduction (which means something between 6 and 20 parameters), you are most likely still blinded (wrt these parameters) although listening sighted.
That is one of the advantages of this kind of test.


I should also reiterate that I accept the bias of pride of ownership, so I would personally build such a preamp with the top opamps because

a) I can
b) I would know that I had put in the best I could afford and use that bias to my advantage.

Of course, nothing wrong with that.

My biggest intrigue, though, is the fleeting glimpses of evidence that people do listen in very different ways. It's not a surprise, but this is the first time it has appeared in the open for me personally.

Scottjoplin mentioned something similar (afair) and it is still surprising, as I've (others as well) literally written numerous times about the quite large intersubject differences when evaluating sound events and even more when evaluating the lossy reproduction in our usual stereophonic setups; even mentioning the important role of experience and so on. :)
 
Yes, until I was able to find a part of the sample recording that allowed for me to tell the difference quite reliably. Then, I got similar results as posted, repeatedly, but always limited to 8 trials, as I could not keep concentration to make 16 trials in a row.
So, first trials were something like initial preparations. I agree that if I did not have this training time, I would not be able to tell the difference as I did later. It is not so easy in ABX.

I guess it is obvious where the argument gets a bit problematic, as it is just based on your subjective impression of the process. ;)

There is no problem with the 8 trial runs provided that you report _all_ results.

I've a weak memory that the training problem with controlled listening tests (even more important in the case of ABX tests) was mentioned in the past from "time to time" ...... :)
 
From Leventhal here (emphasis mine) :

" The problem with these listening studies is that the experimenters conducted too few trials (for example, 16), and used the .05 level of significance when subjecting the data to a statistical test of significance."

Yes, there he points to the problem: using SL = 0.05 (as the decision criterion for statistical significance) while neglecting the probability of beta-errors (i.e. failing to reject the null hypothesis although it is wrong), which "skyrockets" for listener detection abilities below 0.8.

Leventhal's main point was the importance of sufficient statistical power when doing tests.
As the term "statistical significance" denotes the guard against alpha-errors (i.e. rejecting the null hypothesis although it is true), which is the risk of reaching the required number of correct trials by random guessing, there is generally nothing wrong with using the minimal number of trials required.

Just an example: if you want a balanced approach (i.e. the same risk) for the probabilities of alpha- and beta-errors, then it is
alpha = 0.05
beta = 0.05 (i.e. a statistical power of 0.95)

and the number of trials depends on the (in most cases unknown) listener's ability to detect audible differences under the given test conditions; that means we need:

13 trials, if the listener detection ability is 0.9
28 trials, if the listener detection ability is 0.8
67 trials, if the listener detection ability is 0.7
268 trials, if the listener detection ability is 0.6
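
For anyone who wants to reproduce these figures, a small sketch of the underlying power calculation (exact one-sided binomial test with a brute-force search over the number of trials; the 0.6 case takes a few seconds):

```python
from math import comb

def tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def trials_needed(detect: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Smallest number of trials for which a one-sided exact binomial test at level
    `alpha` reaches the requested power against a listener with ability `detect`."""
    n = 1
    while True:
        # smallest number of correct answers significant at level alpha
        # (k = n + 1 means no attainable significance at this n, so we keep going)
        k = next(k for k in range(n + 2) if tail(n, k, 0.5) <= alpha)
        if tail(n, k, detect) >= power:
            return n
        n += 1

for ability in (0.9, 0.8, 0.7, 0.6):
    print(ability, trials_needed(ability))  # should reproduce the 13 / 28 / 67 / 268 figures above
```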



@jakob2
Combining the results of two ABX tests opens the possibility that many 8-trial tests were done, most gave random results, & only the two slightly positive results were used, ignoring all other ABX runs - it's somewhat like picking out just the correct trials from an ABX test and ignoring those that aren't correct - I'm sure you agree this is not valid.

Doing a run of 16 trials consecutively avoids these potential issues

This whole area of listening tests that rely on statistical analysis is a minefield

Yes, see my comments to PMA on this, but it is not a problem of statistical significance. If one does not report/consider all results that will be misleading in the case of 16 trial tests as well.

Statistical analysis and the matter of tests overall is a minefield indeed. :)
 
Scottjoplin mentioned something similar (afair) and it is still surprising, as I've (others as well) literally written numerous times about the quite large intersubject differences when evaluating sound events and even more when evaluating the lossy reproduction in our usual stereophonic setups; even mentioning the important role of experience and so on. :)
Yes, my comment was about the different ways/things people focused on and what they picked up on/missed/were surprised by; there was actual evidence here from members who participated. I'm sure Bob and others here consider themselves to be experienced.
 
I guess it is obvious where the argument gets a bit problematic, as it is just based on your subjective impression of the process. ;)

There is no problem with the 8 trial runs provided that you report _all_ results.

I've a weak memory that the training problem with controlled listening tests (even more important in the case of ABX tests) was mentioned in the past from "time to time" ...... :)

Indeed, training is crucial, & I would be interested in hearing from PMA how long it took & how many dead ends he went down before finding "a part of the sample recording that allowed for me to tell the difference quite reliably"?

The details, particularly coming from PMA, of what it takes to achieve a successful ABX test would be of interest to many who think ABX testing is simple
 
" The problem with these listening studies is that the experimenters conducted too few trials (for example, 16), and used the .05 level of significance when subjecting the data to a statistical test of significance."

Yes, there he points to the problem: using SL = 0.05 (as the decision criterion for statistical significance) while neglecting the probability of beta-errors (i.e. failing to reject the null hypothesis although it is wrong), which "skyrockets" for listener detection abilities below 0.8.

Leventhal's main point was the importance of sufficient statistical power when doing tests.
As the term "statistical significance" denotes the guard against alpha-errors (i.e. rejecting the null hypothesis although it is true), which is the risk of reaching the required number of correct trials by random guessing, there is generally nothing wrong with using the minimal number of trials required.

Just an example: if you want a balanced approach (i.e. the same risk) for the probabilities of alpha- and beta-errors, then it is
alpha = 0.05
beta = 0.05 (i.e. a statistical power of 0.95)

and the number of trials depends on the (in most cases unknown) listener's ability to detect audible differences under the given test conditions; that means we need:

13 trials, if the listener detection ability is 0.9
28 trials, if the listener detection ability is 0.8
67 trials, if the listener detection ability is 0.7
268 trials, if the listener detection ability is 0.6
But in a typical forum-run/home-based ABX listening test we don't know the "listener detection ability" (& I would include "system revealing ability" in this phrase as a dependent factor) - that is a large part of the problem


Yes, see my comments to PMA on this, but it is not a problem of statistical significance. If one does not report/consider all results that will be misleading in the case of 16 trial tests as well.
But this is where it gets tricky. If PMA or anyone gets a positive result on 8 trials & then does another 8 trials but gets a null result, does he report this or retrospectively consider it a training run - it's very easy to fall into the latter trap. Nothing wrong with doing training runs at any time, but if the ABX results reported are actually just the cherry-picked positive results, then ......

Doing a 16 trial run at least means it's not two 8 trial runs cherry picked from all 8 trial runs.

I'm not accusing PMA of anything, just pointing out some of the practical issues with ABX testing
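
A quick simulation illustrates the selective-reporting problem: even a listener guessing at random will occasionally produce runs of 6 or more correct out of 8, and combining only the best two of several runs can easily clear the usual 12/16 criterion. The session and run counts below are arbitrary assumptions, purely for illustration:

```python
import random

def cherry_pick_rate(sessions: int = 10000, runs_per_session: int = 6, seed: int = 1) -> float:
    """Fraction of sessions in which a purely guessing listener, doing several 8-trial
    runs and reporting only the best two combined as a '16-trial' score, reaches >= 12/16
    (the usual ~5% criterion for a genuine single 16-trial ABX run)."""
    rng = random.Random(seed)
    looks_significant = 0
    for _ in range(sessions):
        # number of correct guesses in each 8-trial run, at chance (p = 0.5)
        runs = sorted(sum(rng.random() < 0.5 for _ in range(8)) for _ in range(runs_per_session))
        if runs[-1] + runs[-2] >= 12:   # cherry-pick the best two runs
            looks_significant += 1
    return looks_significant / sessions

print(cherry_pick_rate())  # far above the nominal 5% false-positive rate
```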

Statistical analysis and the matter of tests overall is a minefield indeed. :)
Yep, yep & yep again
 