Humans are humans, some folks are just better about hiding it than others. 🙂
As someone who has spent his entire professional career in some form of research environment, I can say we scientists/researchers are good at being objective about everyone else's data, unless it rocks our world and/or contradicts our own. Which is to say we're emotionally attached just like everyone else, and it's the exceptional few who are able to look at their own investment with the same sort of objective detachment.
Which is why it's best to work with others (no matter how problematic that can become), since it keeps everyone a lot more honest. And why we rely on external review and replication for reliability.
<snip> The test results do however dismiss sighted testing as a valid technique to determine minor differences.
There are several reasons why I think that conclusion is not generally true. It is undeniable that the impact of sight/knowledge of what is what can be quite strong, but the question remains why this specific impact should be unbeatable even by experience/practice. It seems unreasonable to assume that this bias factor/confounder is literally the only one that can't be controlled/defeated.
We already know that after removing this specific bias factor, a lot of other bias effects are still at work, and further we know that a perfect experiment will normally not be possible; so after all efforts to block some confounders and to randomize the impact of those effects we can't block out, some bias effects will inevitably remain.
But we have to assume that participants are able to control these to a certain degree.
For example, it seems quite difficult to prevent humans from expecting at least something; some sensory labs try to block that out by not telling the testers what effect is under test.
In food sensory work the usual practice is to ask an expert and only start serious (and costly) blind sensory tests if the expert thinks the differences are really small perception-wise, a strategy based on the assumption that control of bias effects is indeed a matter of learning/experience.
<snip> Considering these facts I don't understand why anyone would still want to consider sighted preference testing a valid technique to determine purely aural performance.
I understand the concerns but want to point out that something like a test of "purely aural performance" does not exist, as listening is always a combination of aural signal processing and interpretation done by our brain.
But dismissing sighted listening completely would IMO mean neglecting practical experience.
How about this idea: faux front panels so those preferring sighted testing could still do so, but both amplifiers would look the same. What do you think the results would be then?
We did that with the preamplifier test I've mentioned before; not only using the same front panels but (nearly) exactly the same cases, jacks, knobs and so on. The measured difference between the two variants was clearly in the "small" category. I could distinguish the two, and handed the units out to 5 different listeners for evaluation without telling them it was part of a test approach, just asking colleagues for their preference (if they had one). But of course I chose these listeners (and was only able to find 5) because I knew about their ability, knew they were used to doing this kind of comparison as part of their business (and hobby as well), and knew their listening habits and preferences very well. Therefore a lot of synchronization was included, but done upfront without the listeners' knowledge.
The result was very encouraging, as they all preferred the same variant as I did, so we had two controlled independent listening experiments in which the null hypothesis could be rejected.
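For what it's worth, the chance of that unanimous outcome under pure guessing can be worked out with a simple sign test. Here is a minimal Python sketch, assuming each listener's preference is an independent coin flip under the null hypothesis (the one-sided versus two-sided framing is my assumption, not something stated in the post):

```python
# Minimal sketch: sign test on the five listeners' preferences, under the
# null hypothesis that each listener picks either variant with probability
# 1/2, independently of the others.
n = 5

# One-sided: all five agree with a variant specified in advance
# (the one the author himself preferred).
p_one_sided = 0.5 ** n          # 1/32 = 0.03125

# Two-sided: all five merely agree with each other, whichever variant.
p_two_sided = 2 * 0.5 ** n      # 1/16 = 0.0625

print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

On that reading, the one-sided result clears the conventional 0.05 threshold while the two-sided version just misses it, which is one reason stating the hypothesis up front matters.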
<snip> Kind of interesting from a physiological viewpoint, but not terribly relevant to understanding aural performance of different circuit topologies...which is why I joined this group...
It seems quite difficult to address the aural performance of different circuit topologies, as the clash between the subjectivists and the objectivists (at least those are the self-ascribed viewpoints; I often have my doubts) overshadows the discussion in a lot of threads.
Until we reach some agreement about demanding and performing good controlled tests, I fear we will not make any progress in the discussions.
After having been unfortunately involved in many exhausting testing experiences, I now find the best way to evaluate components is ~30 minute relaxed listening sessions with a variety of music to excite system weaknesses in a dimly lit room I am familiar with. <snip>
It's quite similar to my approach, but it basically means that sighted listening has its merits too, which reflects what most people usually do. Even if you are interested in controlled listening tests, you do sighted listening first to search for things that could/should be assessed more rigorously by controlled listening later. Doing so would be a contradiction in reasoning if impressions from sighted listening had no weight at all.
It does not matter to me, nor does it make me inferior, if someone else could hear that difference; the music in my head comes through my hearing apparatus, not theirs. I know of no more revealing technique if what I am interested in is actual aural performance minus all the other BS.
I agree, although there is a grain of salt: I have to accept the risk of missing some effects that combined might be of relevance, even though each one alone would not be.
What are we supposed to be looking at? If Sonny Rollins was here in my room pretending to play his sax would my system sound more realistic?
Interestingly, over on Pavel's thread someone has had a pretty good ABX result (6/8), but didn't have a preference 🙂.
Yes, but a single 6/8 does not say much. This may be achieved even by "blind clicks", IME. I would like to see 8/8, 10/10, or something like 16/20. 6/8 still has a strong guessing probability.
Possibility does not mean PROBABILITY! Most of us actually want honest differences, not wishful thinking. What is the point of 'wishful thinking'?
PMA, you must look to your own designs as possibly lacking something if people do not prefer them over others, rather than assuming everyone is fooling themselves over what something looks like or some sales pitch from another.
<snip> a single 6/8 does not say much. <snip> 6/8 still has a strong guessing probability.
Flip 8 fair coins. The probability that at least six of them come up heads is 37/256 ≈ 0.1445, which strikes me as a relatively low number. It occurs only about one time in seven.
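For reference, those tail probabilities are straightforward to compute exactly. Here is a minimal Python sketch of the standard one-sided binomial calculation (the scores beyond 6/8 are the stronger ones PMA asked for above):

```python
from math import comb

def p_at_least(k: int, n: int) -> float:
    """Chance of k or more correct answers in n trials by pure guessing (p = 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# 6/8 from the thread, plus the stronger scores suggested above.
for k, n in [(6, 8), (8, 8), (10, 10), (16, 20)]:
    print(f"{k}/{n}: p = {p_at_least(k, n):.4f}")

# 6/8:   p = 0.1445  (about one time in seven by guessing)
# 8/8:   p = 0.0039
# 10/10: p = 0.0010
# 16/20: p = 0.0059
```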
Pavel: I agree that a single result does not prove anything. But it does show that someone with an open mind can score a good result yet be honest about how much it actually matters. Gives me some hope 🙂
Hope for what? That Jakob2 wasn't right when he said only negative results are accepted? Would PMA have been so quick to point to chance if the score had been 2 out of 8?
Actually, there is still a problem with Foobar ABX and how it calculates the probability of guessing. Guessing should give a score of 4 out of 8 (on average), and the probability of guessing in that case should be at a maximum. However, Foobar ABX says there is a 50% chance of guessing when the score is 4 out of 8. Interestingly, getting 2 out of 8 shows reverse correlation with the correct answers, and should be considered as significant as getting 6 out of 8. If either trend were to persist, that might suggest System 1 and/or System 2 are responding in a non-random way, not guessing.
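To make the symmetry point concrete, here is a hedged sketch of the same binomial arithmetic as above, comparing the usual one-sided "k or more correct" p-value with a two-sided version that also counts reverse-correlated scores (I'm assuming the standard calculation here; I can't verify exactly what any given Foobar ABX version reports):

```python
from math import comb

def p_at_least(k: int, n: int) -> float:
    """One-sided p-value: k or more correct out of n by guessing (p = 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

n = 8
print(p_at_least(4, n))   # 0.6367 -- a 4/8 score is entirely unremarkable
print(p_at_least(6, n))   # 0.1445 -- the 6/8 score discussed above

# Symmetry: a "reverse correlated" 2/8 is exactly as far from chance as 6/8,
# i.e. P(2 or fewer correct) equals P(6 or more correct).
p_2_or_fewer = sum(comb(n, i) for i in range(0, 3)) / 2**n
print(p_2_or_fewer)       # 0.1445

# A two-sided test, treating both directions as evidence of non-random
# responding, simply doubles the tail:
print(2 * p_at_least(6, n))   # 0.2891
```

On this standard calculation a 4/8 gives about 64%, not 50%, so if a tool reports 50% there it is presumably using some other convention.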
Hope that there are enough people who are not partisan in the discussion that we might actually make some progress, rather than just pushing the boulder up the hill each day.
Mark, you sure it's not representing the data in a non-intuitive way from a binomial distribution?
https://stattrek.com/online-calculator/binomial.aspx
Oh, I would agree that that's where the numbers come from. Just not sure what their intentions were. Did they want to show how available statistics could be used to help interpret results? Did they want to find some numbers that gave an accurate estimate of guessing? Maybe they wanted to do both at once. Whatever the intentions were, something is wrong somewhere.
Don't we need thousands of pounds (dollars) to make progress... Mark?
Probably. There would almost have to be some expenses involved, even with a lot of volunteer efforts. And, while it would be nice to think we could get lots of volunteer efforts, over the years what mostly seems to have happened is a lot of arguing and no real progress.
<snip> over the years what mostly seems to have happened is a lot of arguing and no real progress.
We are going to eventually have some tests with subjects selected from across a large cross section of the population rather than hand picked from colleagues with similar tastes and sensitivities.
We are? Who is paying for it? What questions will the tests be investigating?
What would you suggest?
<snip> Whatever the intentions were, something is wrong somewhere.
https://en.wikipedia.org/wiki/Hanlon's_razor
(although stupidity might be better called "laziness")