AES Objective-Subjective Forum

milosz said:
<snip>

No one heard a DIFFERENCE.

Let me repeat that.

NO ONE HEARD A DIFFERENCE.

If there is an audible difference between these two sample rates / depths, it damn well ought to be audible when the same hardware is asked to play each in turn. If the difference is so bleeding subtle as to be inaudible except under very very special conditions, then that audible difference is far too tiny to be worthwhile.
They're all available for download. None of the professional musicians - including the guy from the Chicago Symphony - said they were bad recordings.
<snip>

The underlying assumption in your reasoning is that an (experienced) listener must be able to perform as well under blind test conditions as in his somewhat more normal listening routine.

At first glance that is just a hypothesis, and one which must be verified.

So, if you are doing more blind tests, please try to find out by also presenting some differences that are known to be audible.

If you present differences at various sensitivity levels, you'll get an impression of what your participants can really detect under the specific circumstances of your test.

Wishes


P.S. And of course it would be better to have some measured data for the test equipment
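A minimal sketch of how such graded positive controls might be mixed into a trial list (Python; the dB offsets, trial counts and function names are illustrative only, not anything prescribed above). The idea is simply to hide known-audible level differences of varying size among the real trials, so the results also show what magnitude of difference the panel actually resolves under the test conditions.

Code:
import random

def build_trials(n_real=20, control_offsets_db=(0.1, 0.3, 1.0, 3.0), n_per_control=5):
    """Hypothetical trial-list builder: the real comparison plus hidden
    positive controls that differ only by a known level offset."""
    trials = [{"kind": "real", "offset_db": 0.0} for _ in range(n_real)]
    for off in control_offsets_db:
        trials += [{"kind": "control", "offset_db": off} for _ in range(n_per_control)]
    random.shuffle(trials)
    return trials

def detection_rates(trials, heard_flags):
    """Per offset: what fraction of trials was reported as 'different'?"""
    tally = {}
    for t, heard in zip(trials, heard_flags):
        hits, total = tally.get(t["offset_db"], (0, 0))
        tally[t["offset_db"]] = (hits + int(heard), total + 1)
    return {off: hits / total for off, (hits, total) in sorted(tally.items())}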
 
Jakob2 said:


The underlying assumption in your reasoning is that an (experienced) listener must be able to perform as well under blind test conditions as in his somewhat more normal listening routine.

At first glance that is just a hypothesis, and one which must be verified.

So, if you are doing more blind tests, please try to find out by also presenting some differences that are known to be audible.

If you present differences at various sensitivity levels, you'll get an impression of what your participants can really detect under the specific circumstances of your test.

Wishes


P.S. And of course it would be better to have some measured data for the test equipment


I think Bill stated that in sighted tests (i.e. when they knew which file was playing) the subjects claimed to hear huge differences between the two. Can we not use that as a control? 😉

If you do not accept it, but instead require separate control tests, how do you know that the differences they report *then* are actually there?

Sometimes I feel that the tired old statement 'the test stress prevents me from hearing a difference I know must be there' is something like a "Universal Cop-out". Sorry, not meant to be personal.


Jan Didden
 
janneman said:



I think Bill stated that in sighted tests (i.e. when they knew which file was playing) the subjects claimed to hear huge differences between the two. Can we not use that as a control? 😉

If you do not accept it, but instead require separate control tests, how do you know that the differences they report *then* are actually there?

Sometimes I feel that the tired old statement 'the test stress prevents me from hearing a difference I know must be there' is something like a "Universal Cop-out". Sorry, not meant to be personal.


Jan Didden

Unfortunately, if we could use that as a control, we'd not need any further DBTs. 🙂

If someone is arguing that a difference can't be audible because it would otherwise have been detected (given the very skilled listeners), wouldn't it be nice to know which differences actually could have been detected? 🙂

Does it really make sense to do _tests_ if you have to assume something in the end?

I've posted it before - maybe you have to conduct some DBTs yourself to get an impression of which differences can remain undetected if the participants are not used to the specific test protocol.

Wishes


P.S. It's just a matter of methodology; in this case I already assume that in Bill's test all the statistical reasoning leads to the results mentioned. Of course, normally the data, the hypothesis, the significance level and so on would need to be presented....
 
janneman said:
If you do not accept it, but instead require separate control tests, how do you know that the differences they report *then* are actually there?

Perhaps use varying levels of 'impairment' (eg varying the wordlength and sampling frequency) also in a blind way, so that you should end up with a correlation (or a brick wall 🙂 ) showing what level of impairment is detectable. This would also be a good way to address sensitivity concerns with ABX, as the test can reasonably expect to find a threshold (or correlation) somewhere, it just doesn't prescribe where it should be. I haven't seen this done with ABX, afaik it is done with more 'sophisticated' rating systems (eg "on a scale of 1 to 10...") and associated more complex stats which I really don't understand (like accounting, for the same reason).

The same thing (ABX with varying wordlength etc) could also be done sighted and/or with immediate feedback, so subjects could get some training in and get a feel for what they can really hear.
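A rough sketch of one way to generate such graded impairments, assuming plain requantization to shorter wordlengths is the impairment of interest (Python/NumPy; the bit depths and dither choice are illustrative, not anything the post prescribes):

Code:
import numpy as np

def requantize(x, bits):
    """Requantize a float signal in [-1, 1] to the given wordlength,
    with simple TPDF dither so the truncation behaves like a real conversion."""
    q = 2.0 ** (bits - 1)
    dither = (np.random.uniform(-0.5, 0.5, x.shape) +
              np.random.uniform(-0.5, 0.5, x.shape)) / q
    return np.clip(np.round((x + dither) * q) / q, -1.0, 1.0)

# Graded impairments: somewhere along this ladder the ABX scores should
# climb away from 50%, which is the threshold the test is looking for.
x = np.random.randn(48000) * 0.1          # stand-in for a real audio excerpt
versions = {b: requantize(x, b) for b in (16, 12, 10, 8, 6)}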

There is no doubt (in my mind) that a lot of it is imaginary or at best some undiscovered type of magic. Things like copper sounding warmer than silver make the ordinary public laugh because the connection is just so obvious. Yet if someone hears this difference, they are a lot more likely to believe what they heard over the opinions of others (scientists, engineers etc). That opens the door for all the snake oilers out there, to the point where people are selling clocks and holographic dots to willing customers who should and do know better, but can't help themselves because they do observe an improvement, and that observation is in itself the end goal (ie, listening to music).
 
janneman said:



<snip>

If you do not accept it, but instead require separate control tests, how do you know that the differences they report *then* are actually there?

<snip>


Jan Didden

That is a problem of statistics in general: you never find the _real_ truth, it's just a matter of probability. 🙂

In this regard we have fewer problems in our tests than in other empirical matters. Normally you have to estimate the distribution of the population from a size-restricted sample, but in our case the distribution to test against is already known, as we are testing against pure chance.

But no matter how hard you try, however many tests you run or however many additional guards against false positives you design, in the end you'll never really _know_.

But you'll get some probabilities to deal with.
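The "distribution is already known" point can be made concrete in a couple of lines: for a forced-choice test the null hypothesis is simply a binomial with p = 0.5, so the probability of any score arising from guessing alone can be computed exactly (a sketch; the 14-out-of-20 numbers are purely illustrative):

Code:
from math import comb

def p_at_least(correct, trials, p=0.5):
    """Exact probability of getting at least `correct` answers right
    by guessing alone (binomial tail with known p)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

print(p_at_least(14, 20))   # ~0.058: 14/20 is suggestive but not yet below 0.05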
 
Jakob2 said:


That is a problem of statistics in general: you never find the _real_ truth, it's just a matter of probability. 🙂

In this regard we have fewer problems in our tests than in other empirical matters. Normally you have to estimate the distribution of the population from a size-restricted sample, but in our case the distribution to test against is already known, as we are testing against pure chance.

But no matter how hard you try, however many tests you run or however many additional guards against false positives you design, in the end you'll never really _know_.

But you'll get some probabilities to deal with.


Hard to argue against that one. That's objective truth 😀

Jan Didden
 
janneman said:
Another interesting DB test. This one was done by a guy called 'billmilosz' on the Yahoo group for the DEQX. You can find this particular post, and a whole discussion, there.

/start of quote
"Here was the test:
I used a few different pieces of music, recorded professionally at 96 kilosamples per second with 24-bit depth, in stereo. For example,
Dvorak violin concerto from http://01688cb.netsolhost.com/samplerdownload/

Then, using Adobe Audition I did high-precision resampling down to 44.1 kilosamples per second with 16 bit depth, same as a CD.
So I had two files of the same music - one which had a lot more information in it (24 bit / 96 kHz sample rate) and one in which this extra information had been removed, leaving only the amount of information that one finds in a normal CD.
I wrote a little program for my computer using the C++ language with which I have some familiarity. This program allowed me to start two software audio players at the same time - one playing the 96 / 24 bit version of the file, the other playing the 44.1 / 16 version. They stay in perfect sync whilst playing.

Then this same little bit of software I wrote waits for any key on the PC keyboard to be hit. When a key is hit it either keeps the same version playing (96/24 or 44.1/16) or it switches from one version to the other - there's a large table of true random numbers that's used to randomize the action, so this is truly double-blind. Then it waits for the test subject to enter "Y" or "N" using the keyboard. The test subject is told to enter Y if he / she heard a difference between the two versions, and an N if they did not. The software keeps a text file as a log of the files played and the answers from the test subject.

The output of the PC sound card (a Creative X-Fi with quite respectable performance at both 192/24 and 44.1/16 rates) was fed to a pair of Monarchy SM-70 Pro amps in mono; these are good class-A amps. These were driving a pair of ESL-57's, with refurbished panels and HT sections by highly-regarded Quad guru Wayne Piquet. This was in a fairly small, quiet room. The speakers were placed fairly close to the listening position, so listening was essentially nearfield. Detail, linearity, transient response etc. of this amp / speaker system are very good. Quad ESL-57's are very revealing.

I did this with around 45 test subjects over the past year or so. It's very easy to do; the gear is always set up in one of my rooms because that's where I use it. I just have to select the PC as the source to the power amps and fire up the program, so pretty much any visitor to my home gets badgered by me into doing the test.

I know a lot of musicians, sound engineers, producers, and also a lot of guys who consider themselves highly-skilled "golden eared audiophiles." I also used some non-music / non-audio types, and also a few children (around 7~10 years old. They can hear far higher in frequency. Very few people over 30 can hear well above 16 kHz; this is a medical fact.) I didn't use any rock-type musicians or producers; typically their hearing is pretty shot from listening to loud shows. The musicians were a mix of jazz, folk and classical. Some were professionals and the rest were studying in MFA programs at one of the local colleges. One was with the Chicago Symphony (I live in Chicago) - some of you may have heard of them. The producers were mostly radio (NPR) types, with some film and one theater audio designer. There was also a composer / music professor. Some of these guys were also audio nuts. The other audio guys were just audio hobbyists; they are accountants, lawyers, a cab driver, a software engineer, and one art museum curator. There was one audio store owner.

All the test subjects are asked to do is try to see if they can hear a difference in the sound.

Correct answers averaged out just below 50%. No individual listener got more than 53% correct. This is pretty much what you'd expect from chance.

To me, this experiment shows that a fairly decent sample of folks who make a living with music and sound, along with people who consider themselves skilled at listening, simply cannot hear the differences between 96 /24 and 44.1 /16 audio.

I suppose it could be argued that using a better audio card is necessary, but I reject that argument. This is not about SOUND QUALITY, it is simply CAN YOU HEAR ANY DIFFERENCE. The Creative X-Fi is a well engineered card with low noise and distortion, etc., and if the differences between 96 / 24 and 44.1 / 16 are REALLY audible, then SOMEONE should have heard a difference.

But no one did.

By the way, about half of the audiophiles said they COULD hear a BIG difference between 96/24 and 44.1/16 files when they KNEW which file they were listening to. But this was apparently just a
psychoacoustic effect of expectation: they THOUGHT a 96/24 file
SHOULD sound better, and in their brains IT DID.

Once they no longer knew which file they were listening to, this "BIG DIFFERENCE" in sound TOTALLY VANISHED. That tells me that, in fact, they COULD NOT ACTUALLY HEAR A DIFFERENCE.

I am guessing that upsampling a 44.1 /16 CD to some higher rate would also prove to be inaudible in a double blind test. I will make a pair of files for this and add it to the tests I subject my dinner guests to.

Most of the people involved in the test were quite interested in the results. Only one is no longer speaking to me.....

FYI I am considered a good cook. So, even if they have to sit still for a few minutes of batty psychoacoustic testing, people I invite over for dinner rarely turn down the invitation. Punjabi lamb or chicken with artichoke in mole apparently compensate for the audio test mania."
/end of quote

Jan Didden
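For anyone who wants to reproduce the core of that procedure, here is a rough sketch of the randomize-and-log logic (Python rather than the C++ the poster used; the file names are placeholders and the actual synchronized playback and switching are deliberately glossed over):

Code:
import random

# Placeholder file names; the real test used two synchronized players.
FILES = {"hires": "music_96_24.wav", "redbook": "music_44_16.wav"}

def run_trials(n_trials=20, logfile="abx_log.txt"):
    """Each trial: randomly keep or switch the playing version, ask the
    listener whether they heard a change, and log everything for scoring."""
    current = random.choice(list(FILES))
    with open(logfile, "w") as log:
        for i in range(1, n_trials + 1):
            switched = random.random() < 0.5      # keep or switch, at random
            if switched:
                current = "redbook" if current == "hires" else "hires"
            # ...the real program would now cut over to FILES[current]...
            answer = input(f"Trial {i}: did you hear a difference? [y/n] ")
            heard = answer.strip().lower().startswith("y")
            log.write(f"{i}\t{current}\t{switched}\t{heard}\t{heard == switched}\n")

if __name__ == "__main__":
    run_trials()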

Issue 1 - with 45 tested, your margin of error is about 15%.
Issue 2 - someone did hear a difference; you rejected it though (53%).
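For reference, the ~15% figure matches the usual normal-approximation margin of error for a proportion of 0.5 with 45 subjects; a quick back-of-the-envelope check (nothing here is specific to Bill's actual data):

Code:
from math import sqrt

n, p = 45, 0.5
margin = 1.96 * sqrt(p * (1 - p) / n)   # 95% confidence, normal approximation
print(f"{margin:.1%}")                  # ~14.6%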
 
When poking around near a threshold (of hearing, say), there's always going to be uncertainty.

That's irrelevant though. When differences are "clearly audible", you can expect 100% correct answers from everyone who reports they can hear those differences sighted. Any less and you have to ask yourself what's going on - either it's becoming inaudible, or something in the test is throwing people off. While it is possible for someone who genuinely hears the difference to score only 50% by chance, it's extremely unlikely, and even less likely to be replicated.
 
As you know, I am working on high-end PA systems, where the comparison is direct and immediate.
I always have to compare the sound directly from the stage to the same sound coming from my speakers. It is always different, so I have a lot yet to do, and I don't believe that anybody has done that already.
 
It has been mentioned before but worth repeating.
The main reason given by people who discount DBTing is that the conditions/surroundings of the test affect their hearing ability.
So they admit that factors other than the actual sound affect what they hear, and yet they don't think that seeing which amp or cable they are listening to has any effect on what they hear.
 
The DBT is a good tool, and I've done a few tests: blind, double-blind and not blind at all. All have their merits, but they also have faults. For example:
There have been many times in blind tests when the difference was obvious - but there have also been times when I could not tell the difference by listening but I could "feel" the difference. And that's interesting. Yes, go ahead and roll your eyes, skip to the next post if you must - but here is an example of one such test.

One day I was comparing DACs. Old DAC vs new DAC vs CD player. Made sure to level check everything and had my wife connect the DACs so I did not know which was which in this A/B/C test. C was always easy to hear – kinda dull and dirty. C turned out to be the CD player. But DAC A and DAC B – I could not hear a difference. The switch was seamless, as though there was no switch at all. No difference that I could hear.

But

I noticed something odd. Every time DAC “A” was in play my jaw would relax, I would sink deeper into the couch, my heartbeat would actually speed up and my foot would start tapping. But there is no way I could actually hear a difference when switching between the two. And yet DAC A always made me feel better and the music was more enjoyable. I didn’t want to go back to B or C. Why? I certainly could not have identified DAC A or B by ear – maybe not even have heard the switch – but given 4 or 5 bars of music the feeling was always different.

So maybe some of those tests where the listener is wired to a polygraph or EEG or EKG might reveal something more than asking “can you hear the difference?”



FWIW, the A/B test between DACs A&B was repeated a month later on a different system, different room, different state. There it was audible, and even a casual bystander preferred “A”.
 
beware the ugly DBTing et al...

how much more of this BS is going to make it to these (and other) forums?

Objectivists believe that if there is no measurable difference, there is no difference.

Subjectivists believe there is a discernible difference, based on their personal (and hence subjective) responses.

pano makes an interesting point. If a level-matched blind listening test can occur (with an adjustable-level remote preamp, maybe), and the levels can be verified with an SPL meter, everything else being "=", then the only thing to change would be the equipment being checked, and the music. This is assuming that we are talking about sources here. Amps may technically be slightly more difficult to test in this manner, but a rig could be made to allow for this as well.

Measuring some kind of emotional response would be an interesting experiment, and I might suggest one of the few ways that we could (perhaps) reconcile the two differing camps.

I don't care about measured performance (much). Specifications need not apply at iglooNanook, short of perhaps checking technical behaviour (e.g. measurement of values, output, etc.).

Musicians almost always make lousy listeners anyways. Why? As the late John Lee Hooker stated during a Stereophile interview (a series about musicians and their sound systems), he knew most of the music so well that his brain could fill in the rest. Of course there are always exceptions to "any" rule, so take that with a grain of salt.

So the AES can go on about the subjective-objective thing all they want. The only way we can be assured that a system is doing what it is supposed to do is if we stop making excuses for our equipment, stop making excuses for our listening environment, and really start enjoying our systems. I can tell you all from my personal experiences that some of the most satisfying systems I have heard cost under $100, and did not leave me wanting. If you start re-discovering old music or discovering new music, then I'm convinced our systems are doing the right thing--allowing us (correctly or incorrectly) to enjoy music.

As an aside, the "Spirit of Orion" project is nearing completion with little or no technical testing done. I have never heard a system that is as completely capable (including all the great speakers at the Vancouver Island diyAudio'fest '08), nor as jaw-dropping. Not because I designed it, not because I helped build it, but because it is a truly full-frequency system that almost anyone could have, is sweet and musical, and does almost all of the hi-fi stuff as well (sorry Ian, no room correction or error correction/equalization). One problem I didn't really count on is how revealing the Spirits are regarding upstream equipment. Good sources and good amplification are a must.

later

stew
 
Nanook, good post; your listening experiences are mine also.

FWIW
The subjectivist-objectivist debate often devolves into "my hearing ability is better than yours".
Much the same as "mine is bigger than yours". Who cares? A lot, apparently.
 
Re: beware the ugly DBTing et al...

Originally posted by Nanook: Measuring some kind of emotional response would be an interesting experiment, and I might suggest one of the few ways that we could (perhaps) reconcile the two differing camps.

Something roughly equivalent has been done: monitoring the brainwaves of subjects listening to a form of traditional Thai music with abundant content above 20 kHz, both full range and Redbook brick-wall filtered. The authors claim statistically significant differences; someone told me that (as I recall) either Lipshitz or Vanderkooy found significant flaws in the test protocol.

For the hard core: that right there ^ is a cap/resistor picker noting potential flaws in an experiment contradicting the basis of Redbook. Keep it in mind next time the 'all subjectivists' urge arises. =D
 
er, OT here: demo'ing the "Spirit of Orions"

(mods, please move if more appropriate elsewhere) sorry for the OT post here, but in response to Ian:

well, I'm a little farther from you than Vancouver (add about another 850-900 miles), so a personal demo for you may be slightly out for the moment 🙁

Perhaps I could be talked into a return visit to casaDlugos with them in tow next year (not nearly as big as Cal's "ghetto blaster minis" though, BUT I'd have to get a mini van or bring my truck).

As I have no means of measurement, I'd be up for a suggestion on measuring the frequency response. I wonder if I could use a USB ADC/DAC, taking the input from my trusty Radio Shack all-analog SPL meter (there are correction tables available), or modify the RS meter with a better, known microphone, feed it into said ADC and then into my iMac (or a PC). Then perhaps I'd be able to do a nice in-room frequency sweep. Right now I'm working on limiting the energy (and frequencies) going to the JX92 drivers. I can't say that I feel the need for either more low-end or high-end information; I'm just trying to clean up the midrange and protect the little Jordans as much as possible (and allow the few watts that we're running them with to do the most). Class A/B tube amp, 52 watts/ch. The 300B amp simply doesn't have enough damping factor to do it.
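One possible starting point for that in-room sweep, assuming a USB interface the computer can see and the sounddevice package (a sketch only; it ignores the RS meter's correction table, background noise and the usual windowing niceties):

Code:
import numpy as np
import sounddevice as sd   # assumed available: pip install sounddevice

fs = 48000
T, f0, f1 = 10.0, 20.0, 20000.0
t = np.linspace(0, T, int(fs * T), endpoint=False)
k = np.log(f1 / f0)
sweep = 0.3 * np.sin(2 * np.pi * f0 * T / k * (np.exp(t * k / T) - 1))  # log sine sweep

rec = sd.playrec(sweep.astype(np.float32), samplerate=fs, channels=1)   # play and record
sd.wait()

# Crude in-room magnitude response: ratio of recorded to played spectra,
# averaged over a few broad bands just to get the lay of the land.
freqs = np.fft.rfftfreq(len(sweep), 1 / fs)
ratio = np.abs(np.fft.rfft(rec[:, 0])) / (np.abs(np.fft.rfft(sweep)) + 1e-12)
for lo, hi in [(20, 200), (200, 2000), (2000, 20000)]:
    band = (freqs >= lo) & (freqs < hi)
    print(f"{lo:>5}-{hi:<5} Hz: {20 * np.log10(ratio[band].mean()):+.1f} dB")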
 
myhrrhleine said:


Issue 1 - with 45 tested, your margin of error is about 15%.
Issue 2 - someone did hear a difference; you rejected it though (53%).


Hi,

Just for the record: I have not performed this test, I was merely quoting it.

Issue 1 - So, the margin of error with 45 tested is 15%. Hmm. Since you apparently know what you are talking about, just for perspective: what would you say the margin of error is if one guy, listening all by himself, reports a particular outcome?

Issue 2 - I reread the text but couldn't find any reference to one guy hearing a difference and being rejected. Care to fill me in?

Jan Didden
 
fredex said:
It has been mentioned before but worth repeating.
The main reason given by people who discount DBTing is that the conditions/surroundings of the test affect their hearing ability.
So they admit that factors other than the actual sound affect what they hear, and yet they don't think that seeing which amp or cable they are listening to has any effect on what they hear.


Very good! May I use that?

Jan Didden
 