Can you hear a difference between 2 solid state preamps?

Can you hear a difference between the two test files?

  • I can hear a difference, but have no ABX result

    Votes: 12 50.0%
  • I cannot hear a difference and have no ABX result

    Votes: 6 25.0%
  • I can hear a difference and have an ABX result

    Votes: 4 16.7%
  • I cannot hear a difference and have an ABX result

    Votes: 2 8.3%

  • Total voters
    24
  • Poll closed.
Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
As already explained, the first trials were to pick out the part where I was able to catch the difference. It was the voice with "I got a song that I sing ...", you know, that ssong, ssing ... After finding this part, the results were consistently >5/8 each.

Thanks. How long did it take you to find & settle on this extract as the reliable differentiator between the tracks - what other sections did you try?
 
As already explained, the first trials were to pick out the part where I was able to catch the difference. It was the voice with "I got a song that I sing ...", you know, that ssong, ssing ... After finding this part, the results were consistently >5/8 each.

Agreed, I find it muffled, not as forward, with less presence and excessive sibilance, but I can't hear an evident difference between 1 & 2.
;)
 
@mmerrill99,

all we can do is emphasize that one must not cherry-pick.

If one nevertheless does, it doesn't matter whether he is cherry-picking from a batch of 8-trial runs or from a batch of 16-trial runs. :)

Of course, the detection ability is a crucial point; therefore I wrote (unknown) ability under the test conditions, which should be understood as including every possible variable, like reproduction-chain capabilities, state of mind at the time, detection ability in general, (possibly existing) problems due to the chosen test protocol, and so forth.
 
@mmerrill99,

all we can do is emphasize that one must not cherry-pick.

If one nevertheless does, it doesn't matter whether he is cherry-picking from a batch of 8-trial runs or from a batch of 16-trial runs. :)
I don't want to belabour the point, but I believe this is incorrect. Let's just look at the trials in each ABX test: if 32 positive trials are cherry-picked out of 100 trials, do you think this is the same as a positive result of 64 trials in one test?

Of course, the detection ability is a crucial point; therefore I wrote (unknown) ability under the test conditions, which should be understood as including every possible variable, like reproduction-chain capabilities, state of mind at the time, detection ability in general, (possibly existing) problems due to the chosen test protocol, and so forth.
Yes, & this is what we don't know for these home trials; in formal trials this can be accounted for.
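To put rough numbers on the 32-out-of-100 example: a sketch of my own, assuming a pure guesser with a 50% chance per trial (Python standard library only). The point is that a guesser can almost always assemble 32 correct trials out of 100, even though reporting only those 32 would look wildly significant.

```python
from math import comb

# A pure guesser answers 100 ABX trials; how often does the session
# contain at least 32 correct answers that could be cherry-picked out?
p_can_cherry_pick = sum(comb(100, k) for k in range(32, 101)) / 2**100

# Reporting only those 32 picked trials as "32/32" would claim this p-value:
p_nominal = 0.5**32

print(f"P(>=32 correct out of 100 by guessing) = {p_can_cherry_pick:.6f}")
print(f"nominal p-value of a reported '32/32'  = {p_nominal:.2e}")
```

The first probability is essentially 1, which is exactly why a cherry-picked 32/32 carries no evidential weight, while a genuine 64-trial test has a fixed, honest error rate.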
 
Agreed, but my comment was related to a post over at the "blowtorch thread" :
No one objected.
The blowtorch is never the best place for a serious discussion though :)




As said before when mentioning the concept of "qualitative methods" (surely in other threads), if you consider a really refined evaluation wrt "main and sub parameters" to assess the quality of the reproduction (which means something between 6 and 20 parameters), you are most likely still blinded (wrt these parameters) even though you are listening sighted.
That is one of the advantages of this kind of tests.
By the time I would find the time to do that sort of evaluation I'd be too old to trust my ears at all!


Scottjoplin mentioned something similar (afair) and it is still surprising, as I (and others as well) have literally written numerous times about the quite large inter-subject differences when evaluating sound events, and even more so when evaluating the lossy reproduction in our usual stereophonic setups; even mentioning the important role of experience and so on. :)


I know it has been stated, but this is the first time I have seen any evidence of it in action. Maybe I haven't been looking hard enough or maybe I haven't been participating enough for it to become more visible. I've always known I do not share the cymbal fetish that some have, and I've always known we are sadly lacking a common lexicon for describing sounds, with even more resistance from some to trying to modify an existing framework. Comes with the territory I guess.



Nevertheless I have learned something and that is good. I take my hat off to Pavel for his almost unending patience in setting these up, and to you for answering my dumb questions for the 30th time :)
 
Nevertheless I have learned something and that is good. I take my hat off to Pavel for his almost unending patience in setting these up, and to you for answering my dumb questions for the 30th time :)

Nice to read your kind words, Bill, and also thanks to Jakob for explaining the things that are outside my scope and that he is experienced in.
 
I don't want to belabour the point but I believe this is incorrect - let's look at the just two trials in each ABX test - if 32 positive trials are cherry picked out of 100 trials do you think this is the same as a positive result of a 64 trials in one test?

Not sure why you've posted this.
The basic rule is that "cherry picking" is prohibited, period. :)
If one does, for example, 21 8-trial ABX tests and gets two with 5 or 6 hits, then it would be cherry-picking to only report these two with combined results.

If one does, for example, 11 16-trial ABX tests and gets one with 12 hits, then it would be cherry-picking to only report this one.

One must use all the data that is available for the statistical analysis, but it is always possible to do the evil "cherry picking" (and others ;) ) regardless of the number of trials in each test.
I hope that clarifies it?!
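The inflation from picking the best runs out of a batch can be quantified. A small illustrative sketch of my own (assuming a pure guesser, 50% per trial; the 21-run scenario mirrors the example above):

```python
from math import comb

def tail_prob(n, k, p=0.5):
    """P(at least k hits in n trials) for per-trial hit probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Single-run tail probabilities under pure guessing:
p_5of8 = tail_prob(8, 5)      # >= 5/8 correct
p_6of8 = tail_prob(8, 6)      # >= 6/8 correct
p_12of16 = tail_prob(16, 12)  # >= 12/16 correct

# A guesser running 21 independent 8-trial tests: chance that at least
# two runs reach 6/8 or better (the cherry-picker's "evidence"):
n_runs = 21
p_none = (1 - p_6of8) ** n_runs                            # no run reaches 6/8
p_one = n_runs * p_6of8 * (1 - p_6of8) ** (n_runs - 1)     # exactly one does
p_two_or_more = 1 - p_none - p_one

print(f"P(>=5/8)   = {p_5of8:.3f}")
print(f"P(>=6/8)   = {p_6of8:.3f}")
print(f"P(>=12/16) = {p_12of16:.4f}")
print(f"P(at least 2 of {n_runs} runs reach 6/8) = {p_two_or_more:.2f}")
```

A single 6/8 run is unremarkable (about a 1-in-7 fluke), so across 21 runs a guesser will usually produce two or more of them; this is why reporting only those runs proves nothing.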
 
The basic rule is that "cherry picking" is prohibited, period. :)

However, a "learning phase" is allowed in many DBT tests. What is the difference if you call, say, the first 5-10 attempts a "learning phase" rather than "unsuccessful trials", and all trials after the learning phase count and are above some threshold, in this case 6/8 and better? Regardless of whether 2 or more protocols are posted, is it cherry picking as well?
 
However, a "learning phase" is allowed in many DBT tests. What is the difference if you call, say, the first 5-10 attempts a "learning phase" rather than "unsuccessful trials", and all trials after the learning phase count and are above some threshold, in this case 6/8 and better? Regardless of whether 2 or more protocols are posted, is it cherry picking as well?

The term "cherry picking" means just that: to pick the best from a batch for the report; sometimes it happens because people don't know about the problem, sometimes it happens intentionally (one of the reasons for the so-called replication crisis in some fields).

Of course, there is nothing wrong with training, if the training phase is really separated from the "real" test phase.
We have mentioned Feynman's famous speech before, and it is well worth remembering his line that "the easiest one to fool is oneself".

If you state _before_ doing a test if it will be a training test (and therefore not be counted) or a real test (that will be reported/used for statistical analysis) there is no problem.
 
Not sure why you've posted this.
The basic rule is that "cherry picking" is prohibited, period. :)
If one does, for example, 21 8-trial ABX tests and gets two with 5 or 6 hits, then it would be cherry-picking to only report these two with combined results.

If one does, for example, 11 16-trial ABX tests and gets one with 12 hits, then it would be cherry-picking to only report this one.

One must use all the data that is available for the statistical analysis, but it is always possible to do the evil "cherry picking" (and others ;) ) regardless of the number of trials in each test.
I hope that clarifies it?!

Jakob, my point is that if one 'cherry picks' and reports 11 correct out of 16 trials (in one test) from a set of listening tests (all of 16 trials each), this is more statistically significant than 'cherry picking' a 5-out-of-8 run and a 6-out-of-8 run from a set of 8-trial ABX tests: it's statistically more difficult to get one run of 11 correct in 16 than it is to get a 5 and a 6 in runs of 8 trials.

Agreed, no cherry picking is allowed, but sometimes people are not even aware they are cherry picking, & getting 5 out of 8 & 6 out of 8 randomly is not terribly difficult if enough ABX runs are considered.

Foobar ABX was changed a while back in a number of ways - one of which was to stop reporting the ongoing total of correct results to the test subject - this was to avoid people stopping the test when they had got a good run of positive results, a form of 'cherry picking'.

As you can see from the posts here, people still think ABX testing is simply about "hearing" & fail to understand the significance of the statistics behind the test.
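The optional-stopping problem behind that Foobar change can be illustrated with a small Monte Carlo sketch. The stopping rule below is my own illustrative one, not Foobar's actual logic, and the simulated listener has no hearing ability at all:

```python
import random
from math import comb

def peeking_guesser(max_trials=16, rng=None):
    """A pure guesser who watches the running score and stops as soon as
    it looks impressive (illustrative rule: >= 75% correct after >= 8 trials)."""
    rng = rng or random
    hits = 0
    for trial in range(1, max_trials + 1):
        hits += rng.random() < 0.5           # coin-flip "answer"
        if trial >= 8 and hits >= 0.75 * trial:
            return True                      # stops and reports a "win"
    return False

rng = random.Random(1)
n_sims = 20000
p_peeking = sum(peeking_guesser(rng=rng) for _ in range(n_sims)) / n_sims

# Same guesser forced to finish all 16 trials and report everything:
p_fixed = sum(comb(16, k) for k in range(12, 17)) / 2**16

print(f"peeking guesser 'succeeds' in {p_peeking:.1%} of sessions")
print(f"fixed-length guesser reaches 12/16 in {p_fixed:.1%}")
```

Letting the subject stop on a good run multiplies the false-positive rate several times over the fixed-length test, which is why hiding the running score matters.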
 
However, a "learning phase" is allowed in many DBT tests. What is the difference if you call, say, the first 5-10 attempts a "learning phase" rather than "unsuccessful trials", and all trials after the learning phase count and are above some threshold, in this case 6/8 and better? Regardless of whether 2 or more protocols are posted, is it cherry picking as well?
PMA, if the 'real', non-training ABX runs show statistical significance, then they are considered a positive result. If you report 5/8 & 6/8 and "2 or more protocols" (meaning ABX tests?) were run, then were all these subsequent ABX tests at or above the 5/8 or 6/8 threshold?

In other words if you are consistently scoring above a threshold & you report just that threshold then that is fine.
 
Jakob, my point is that if one 'cherry picks' and reports 11 correct out of 16 trials (in one test) from a set of listening tests (all of 16 trials each), this is more statistically significant than 'cherry picking' a 5-out-of-8 run and a 6-out-of-8 run from a set of 8-trial ABX tests: it's statistically more difficult to get one run of 11 correct in 16 than it is to get a 5 and a 6 in runs of 8 trials.<snip>

Generally it depends; according to PMA's description it has to be 5/8 and 6/8 in consecutive runs, and the probability of getting such a result is comparable to the probability of getting an 11/16 result if both are doing 8 tests.
If it doesn't have to be in consecutive runs, then you are right; the probabilities are higher for the 5/8 and 6/8 case.

But we shouldn't use the term "statistically significant" for "cherry picking".
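Those probabilities are easy to check. A short sketch of my own under the pure-guessing assumption (how "comparable" the two scenarios are depends on exactly how many runs each listener performs):

```python
from math import comb

def tail(n, k):
    """P(at least k of n correct) for a pure guesser (p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

p_11of16 = tail(16, 11)           # one 16-trial run reaching 11/16
p_pair = tail(8, 5) * tail(8, 6)  # a fixed pair of consecutive 8-trial
                                  # runs reaching >=5/8 then >=6/8

print(f"P(>=11/16 in one run)             = {p_11of16:.3f}")
print(f"P(>=5/8 then >=6/8 in fixed pair) = {p_pair:.3f}")
```

For a fixed, pre-specified pair of consecutive runs the 5/8-then-6/8 outcome is actually the rarer one; it is only when the pair may be picked from anywhere in a longer series of runs that its probability climbs above that of a single 11/16 run.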
 
The blowtorch is never the best place for a serious discussion though :)

It used to be different, before grumpy old dogmatics decided that it would be better to destroy any friendly discussion where people don't agree with their handmade hypotheses about what people can hear.

But anyway, usually someone's post about audible differences between opamps (in non-pathologically behaving circuits with comparable measured numbers) would be enough to trigger a flood of posts mocking audiophiles, demanding "level matched double blind listening tests" and even more "extreme" requirements. But this time? :)

By the time I would find the time to do that sort of evaluation I'd be too old to trust my ears at all!

It's not that time-consuming at all; it's just to find out why and where differences (that might have led to preferences) exist. A bit of additional analysis overall.

I know it has been stated, but this is the first time I have seen any evidence of it in action. Maybe I haven't been looking hard enough or maybe I haven't been participating enough for it to become more visible.

I was just surprised reading that, because I (in this case) underestimated how much more convincing personal experience of a fact is than reading about it.

Nevertheless I have learned something .....

And so did I. :)

I'd hope that this example and the opamp case in the Blowtorch thread might more often help to prevent members from "BS-shouting" when others describe their listening impressions.
 
I was just surprised reading that, because I (in this case) underestimated how much more convincing personal experience of a fact is than reading about it.

That is why I have tried to invite people out to visit, so they can see for themselves.

I'd hope that this example and the opamp case in the Blowtorch thread might more often help to prevent members from "BS-shouting" when others describe their listening impressions.

+1
 
It used to be different, before grumpy old dogmatics decided that it would be better to destroy any friendly discussion where people don't agree with their handmade hypotheses about what people can hear.
That would have been before my time :). I should be clear. I don't believe I could tell the files apart if they were shuffled in some way. If I get round to testing them and I can, then the real headscratching begins.


But anyway, usually someone's post about audible differences between opamps (in non-pathologically behaving circuits with comparable measured numbers) would be enough to trigger a flood of posts mocking audiophiles, demanding "level matched double blind listening tests" and even more "extreme" requirements. But this time? :)
There is bad behaviour on both sides. If someone claims night and day differences with some random component rolling or having paid someone to empty the kitty litter into their DAC with no before or after measurements then they are fair game in the lounge IMO.


It's not that time-consuming at all; it's just to find out why and where differences (that might have led to preferences) exist. A bit of additional analysis overall.
Right now that is time I don't have. My critical listening time is a couple of hours a month at the moment. Come the spring that should change, but such is the self-inflicted burden of a second brood.


I was just surprised reading that, because I (in this case) underestimated how much more convincing personal experience of a fact is than reading about it.
I've been uneasy since the 80s about the fact that I can't translate 90% of what audio reviewers are saying into something that relates to me. Online it's worse. I've told all my children not to touch hot things, and all of them have got burned at least once, so I can confirm humans need to experience things :). I am no closer to a common lexicon, but at least now I can be sure I focus on different things. Wrong or not, they are my preferences and I make no claims about whether anyone else would enjoy my system.
 
That would have been before my time :). I should be clear. I don't believe I could tell the files apart if they were shuffled in some way. If I get round to testing them and I can, then the real headscratching begins.

Bill,
Possibly your system isn't good enough to make the differences as clearly audible as they otherwise might be. That is why my old Bryston 4B power amp had to go, and was replaced with a Benchmark AHB2. Now I find the Benchmark DAC-3 is coming very near to the end of its usefulness. It will go when the AK4499 dac on my workbench is ready.

Happy to let you hear for yourself if and when you get around to visiting Northern California.
 