Hello all!
I am on the odyssey of finding out what makes the difference between various audio players on the PC. There is one area that I have not considered until right now: dithering.
Can the way a player dithers from its 64-bit floating-point internal format down to the external 24-bit integer format make a difference in the SQ between players? Since noise shaping is involved, is this process as much an art as it is a science? For that matter, could integer mode in the player make a difference in quality, since, for better or worse, it is the software that will be doing this conversion?
Thanks!
PS: I may have finally found what is responsible for the differences between audio players.
I use Linux with MPD, Ecasound and dither with SOX. I can't say I noticed any real difference when I turned the dithering on...
I would be strongly tempted to do double-blind or ABX testing to see if you can really hear any difference, if you care. At this point, I didn't hear a difference even when I knew it was "on" or "off", so I am not heading that way.
at 24 bits dither is irrelevant - the electronic noise of even today's very best audio DACs sits at the <~21-bit level
so the analog electronics' thermal/Johnson noise adequately dithers 24 bit PCM - unbiased rounding would be all you need numerically to truncate to 24 bits
"flavors" of dither may become an issue at 16 bits - perceptually weighted noise shaping involves choices of weighting curves
Basically nobody bothers with dither when going from 64-bit double to 24 bits for audio (or even in the far more common 32-bit-float-to-24-bit conversion), simply because there is no point. You do sometimes see it done in fields like oil exploration, where the signals really do have that sort of dynamic range, but for audio it is very much gilding the lily.
The mantissa of a single-precision float matches the precision of a 24-bit integer anyway, and it is significantly faster to scale to +-(INT_MAX - 1) and then just truncate than it is to do the above, add dither, then truncate.
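The scale-and-truncate conversion described above might look like this in Python (a sketch only; the names and the clamp are my additions, and 8388607 = 2^23 - 1 is the largest positive 24-bit value, standing in for the post's `INT_MAX - 1`):

```python
def float_to_int24(x: float) -> int:
    """Convert a float sample in [-1.0, 1.0] to a signed 24-bit integer
    by scaling and truncation, with no dither step."""
    FULL_SCALE = 2 ** 23 - 1           # 8388607, max positive 24-bit value
    x = max(-1.0, min(1.0, x))         # clamp to avoid integer overflow
    return int(x * FULL_SCALE)         # int() truncates toward zero

print(float_to_int24(1.0))   # 8388607
print(float_to_int24(-0.5))  # -4194303 (truncated toward zero)
```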
Given that 120dB dynamic range is about as good as it gets (and most rooms and reproduction chains are a long way short of that), and that 24 bits is ~144dB, that leaves you some 24dB in which to lose the intermod before it appears above the broadband noise, assuming a perfectly silent source recording with no thermal noise of its own.
In practice, even a very good recording made with excellent mics on a close-miked, very loud source will not manage a noise floor 120dB down from peak level in most circumstances.
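The dynamic-range arithmetic above is easy to check: an ideal n-bit PCM channel spans roughly 20*log10(2^n) dB, about 6.02 dB per bit. A quick verification:

```python
import math

def dynamic_range_db(bits: int) -> float:
    # Ideal n-bit PCM spans 20*log10(2**n) dB, ~6.02 dB per bit.
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(24), 1))        # 144.5 dB, the ~144 dB quoted above
print(round(dynamic_range_db(24) - 120, 1))  # ~24.5 dB margin over a 120 dB chain
```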
Given that the guys writing the digital audio workstations don't bother (and that is the kit used to produce the bloody recording), it seems unlikely that there is much to be gained by worrying about it later.
Regards, Dan.
So no recording software uses dithering when writing to a 24-bit target on disk? Well, so much for my "discovery". So there is probably no benefit to integer mode in the player, correct?
Another urban legend bites the dust! LOL
Bob
Depends; there are a **lot** of very poorly implemented playback tools out there, and it is entirely within the realm of the possible that some of them get one or the other mode wrong...
But yea, with a correctly written toolchain integer mode buys you nothing worth having (and fixed-point math is harder to write than floating point, so it is more prone to implementation screwups).
Hell, there was a version of a **major** DAW out there for a long time that had implemented an 'interesting' rounding mode which made it rather bloody obvious that you had used the thing; that screwup was probably the origin of the whole fixed-point thing.
Fact is, audio just ain't that critical: the required dynamic range of a distribution format (production needs more, because you are never sure where the peaks will end up) is under 100dB in just about all situations, 20-20kHz give or take, and that is trivial to deal with.
The transducers are where the magic happens; unless highly budget- or power-constrained, the electronics is basically straightforward (and the electronics usually has orders of magnitude better behaviour than the transducers).
Regards, Dan.
I think I now see what you are getting at. It would make a difference if 64-bit floats were converted to 16-bit integers by the software. So it is advisable to set up the player to output 24-bit data to avoid this problem, and hope that the player software is doing something intelligent with it, like a direct conversion between float and 24-bit integer.
Correct?
Come to think of it, isn't it true that by default the DAC handles this by directly accepting the float numbers and then doing the needed conversions itself? Correct?
Bob
actually I should make a distinction between dither and simple noise masking
additive dither, noise above a few LSBs will do, destroys the correlation of the distortion products from truncating longer words - this noise is always going to be present in any recording of a live human musical performance at levels much higher than a 24-bit LSB
DAC and electronic noise added after the truncation/rounding can only mask the theoretically correlated distortion products that could in principle be seen with a synthetic signal fading through the 24-bit LSB truncation/rounding level
http://audio.rightmark.org/lukin/dither/ may help understanding dither
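The correlation point can be demonstrated numerically. The sketch below is my own construction, not from the thread: the test signal repeats exactly in each half of the buffer, so an error that is a pure function of the signal (plain truncation) repeats with it, while TPDF-dithered rounding decorrelates the error:

```python
import math
import random

N = 4096
STEP = 256  # requantization step: dropping 8 bits, e.g. 24-bit -> 16-bit grid
# Low-level sine (2.5 steps peak) with a whole number of cycles per half-buffer.
sig = [2.5 * STEP * math.sin(2 * math.pi * 8 * n / N) for n in range(N)]

rng = random.Random(0)
def tpdf():
    # Triangular-PDF dither spanning +-1 LSB of the target grid.
    return rng.uniform(-STEP, 0) + rng.uniform(0, STEP)

trunc_err = [math.floor(s / STEP) * STEP - s for s in sig]
dith_err = [round((s + tpdf()) / STEP) * STEP - s for s in sig]

def corr(a, b):
    # Pearson correlation coefficient of two equal-length sequences.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

half = N // 2
print(corr(trunc_err[:half], trunc_err[half:]))  # close to 1: error repeats with the signal
print(corr(dith_err[:half], dith_err[half:]))    # close to 0: dither broke the correlation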
I now see what dithering is used for, which is very interesting indeed! If the player indicates it is sending 44.1 kHz / 64-bit data to the DAC, does this mean the DAC is handling any truncation where necessary? Or is a floating-point number being handed to the DAC?
Also, does integer mode determine whether integers, instead of floating-point numbers, are handed to the DAC?
Sorry for all the questions, but the people here are among the few knowledgeable ones willing to explain all of this to me. There is a lot of BS out there. 🙂
There is indeed lots of BS in internet audio discussions, spread by unscrupulous gold miners and their gullible, clueless herd. Fortunately, this and a few other sites offer the refreshing air of reality. And honestly, I am glad you want to look beneath the layer of marketing BS.
A soundcard consists of two parts: the interface controller and the DAC or SPDIF transmitter (let's call it just the DAC for simplicity). The interface controller communicates with the PC via USB, PCI/e, or FireWire. The DAC is linked to the controller most often via an I2S or Intel HDA bus.
The DAC chips accept only integer samples, 16, 24, or 32 bits wide; I have yet to hear of a DAC accepting floating point. The real-world resolution of consumer technology is below 24 bits; the latest DACs with 32-bit input are designed that way to simplify the connection, not because of higher resolution.
The soundcard controller reads samples from PC RAM (either directly by DMA, or via a USB bus which is in turn fed by a USB controller reading the samples from RAM by DMA too). It also accepts only integer samples. The most common size is 32 bits; some controllers accept 16 bits, and some USB controllers accept 24 bits too. The controller either trims the incoming samples or appends zeros to fit the DAC's specification. E.g. Envy24-based cards (such as the ESI Juli@) accept only 32 bits, while their DACs/ADCs always take 24-bit I2S: the controller removes the zeros. No dithering, no DSP.
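The zero-padding/trimming step described for Envy24-style controllers is just a bit shift. A sketch (function names are mine) assuming the usual left-justified layout of a 24-bit sample in a 32-bit slot:

```python
# A 24-bit sample rides left-justified in a 32-bit slot; the controller
# strips the zero byte again for the 24-bit I2S link. Pure bit shifting,
# no dither, no DSP. Samples are two's-complement values kept in the low
# 24 bits here; sign extension is omitted for brevity.
def pack24_into32(sample24: int) -> int:
    return (sample24 & 0xFFFFFF) << 8    # append 8 zero bits

def unpack32_to24(word32: int) -> int:
    return (word32 >> 8) & 0xFFFFFF      # strip the zeros again

s = 0x123456
print(hex(pack24_into32(s)))                  # 0x12345600
print(unpack32_to24(pack24_into32(s)) == s)   # True: the round trip is lossless
```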
The samples are stored into RAM by the soundcard driver, which is part of the OS kernel. The driver itself accepts only the formats supported by the card's controller - still exclusively integer.
And above the driver there are several layers of software. Each OS has a different structure of audio layers.
The most typical layer is the stream mixer. It allows multiple applications to play through the soundcard at the same time. If two applications request different sample rates, the mixer layer has to resample to a common rate. Typically this layer runs in floating point, as that makes the calculations easier. The end result always gets converted to one of the formats accepted by the driver/soundcard - integer.
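The float-mix-then-convert flow of such a layer can be sketched as follows (a toy model with invented names; real mixers also resample, apply volume curves, and may dither the final conversion):

```python
# Sum equal-rate float streams (samples in [-1, 1]) the way an OS mixer
# layer might: mix in floating point, clamp, and only convert to integer
# at the very end.
def mix_to_int16(streams):
    n = min(len(s) for s in streams)
    out = []
    for i in range(n):
        x = sum(s[i] for s in streams)   # mix in float
        x = max(-1.0, min(1.0, x))       # clamp the sum to full scale
        out.append(int(x * 32767))       # single final float -> int16 step
    return out

print(mix_to_int16([[0.5, -0.25], [0.25, -0.25]]))  # [24575, -16383]
```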
The mixing/audio layer has developed over time.
In Windows: kmixer, then kernel streaming, then WASAPI... I am not knowledgeable about Windows. In Linux: dmix, PulseAudio.
In OSX: Core Audio.
Audio power users did not need the mixing layer; they needed low latency instead, i.e. access to the lowest possible level of the stack. Since Microsoft did not bother to expose a low-level API, the ASIO API was created. It is just a layer of drivers and an API offering raw hardware access; every soundcard can support ASIO if there is a driver for this API, the hardware itself being a regular soundcard. Only recently did MS offer direct access too (WASAPI exclusive mode?).
Apple heard the calls for raw access and implemented "integer mode", an API to circumvent the floating-point mixing layer. The discussions about "which DAC supports integer mode" are misaimed - every soundcard controller does; it is just a question of appropriate drivers supporting the new API.
In Linux, ALSA has offered raw device access from the very beginning. Actually, the source code of the ALSA drivers is a great (and often the only) source of credible information about the hardware features of a particular soundcard.
BTW, very few soundcards offer DSP capability in hardware, the X-Fi being the most notable exception. The vast majority of effects are done by software supplied by the soundcard manufacturer in userspace, way above the actual driver.
Now your questions can be mapped onto their corresponding layers.
The stuff is actually pretty simple and there is absolutely no voodoo involved.
I suppose an external DAC with its own USB controller operates in a similar way. The following refers to the Mac and Core Audio. So internally it is a 32-bit word that gets trimmed by the controller to the requirements of the DAC before the data is sent on? Does this mean 32-bit data is sent to the external DAC? And does the conversion between float and integer in "integer mode" occur within the player itself? I am assuming DSP is being used by the player; otherwise, does the player pass integer data directly to the lower-level hardware driver?
Audio power users did not need the mixing layer; they needed low latency instead, i.e. access to the lowest possible level of the stack.
Is low latency required for bit-perfect playback from an audio player? The data ends up in a USB buffer anyway.
Here is my dithering question. I wanted to know whether the data format used by the player (and Core Audio) needs to be dithered before it is sent out through the hardware driver to the DAC. But if the internal format is a 32-bit integer, there should really be no dithering required, just a truncation of the numeric data before it is sent to the DAC. Correct?
However, if floats are being used, for instance because of DSP processing in the player, it first needs to convert them to integers. Conversion between floats and 32-bit (or even 24-bit) integers is trivial. If the controller only accepts 16-bit integers, which I hope is not the case for my setup, then dithering would be required, either by the player if integer mode is being used, or by Core Audio. So I guess it would come down to which piece of software does the better job of dithering. Otherwise there should not be any loss of accuracy for bit-perfect playback, which means dithering is not required.
How does this sound to you?
Thanks!
EDIT: Only 16/24-bit resolutions are supported by the Tenor TE7022L USB audio streaming controller in my external DAC. So the 32-bit data is truncated to 16 or 24 bits by the driver on the Mac before being sent out to the DAC? I would hope we are talking about 24-bit data in my case.
When doing some processing and converting back to 16 bits, dithering really matters; if truncation is used instead, it sounds like a loss of detail.
However, when converting to 24 bits, the LSB is 256x smaller, so it matters 256 times less! I wouldn't worry about that.
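The arithmetic behind that factor: each extra bit halves the LSB, so the 16-bit LSB is 2^8 = 256 times the 24-bit LSB, roughly 48 dB of extra error-floor headroom:

```python
import math

lsb16 = 1 / 2 ** 15   # LSB relative to full scale for signed 16-bit
lsb24 = 1 / 2 ** 23   # ...and for signed 24-bit
print(lsb16 / lsb24)                             # 256.0
print(round(20 * math.log10(lsb16 / lsb24), 1))  # 48.2 dB lower error floor
```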
If your soundcard has an SPDIF output, a simple way to check is to play some audio, output it over SPDIF, and record the SPDIF stream with your computer (it probably has an SPDIF input). If the original file and the recorded one are bit-identical (after adjusting for the delay), then all your questions are answered. If they are not, then you know you have a problem somewhere.
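Once both captures are on disk as sample arrays, the comparison itself is simple. A sketch of the align-and-compare step (function names and the toy sample lists are mine; a real test would load the original and recorded files as integer sample arrays):

```python
# Find the delay at which the recorded stream best matches the reference
# by counting exact sample matches, then verify bit-identity at that offset.
def find_offset(ref, rec, search=1000):
    best, best_hits = 0, -1
    for d in range(min(search, len(rec))):
        hits = sum(1 for a, b in zip(ref, rec[d:]) if a == b)
        if hits > best_hits:
            best, best_hits = d, hits
    return best

def bit_identical(ref, rec):
    d = find_offset(ref, rec)
    n = min(len(ref), len(rec) - d)
    return all(ref[i] == rec[d + i] for i in range(n))

ref = [3, 1, 4, 1, 5, 9, 2, 6]
rec = [0, 0, 3, 1, 4, 1, 5, 9, 2, 6]  # same data, delayed by two samples
print(bit_identical(ref, rec))        # True
```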
I suppose an external DAC with its own USB controller operates in a similar way.
Every USB soundcard has a USB receiver and a DAC. The cheap ones combine the two functions into a single chip.
The following refers to the Mac and Core Audio. So internally it is a 32-bit word that gets trimmed by the controller to the requirements of the DAC before the data is sent on? Does this mean 32-bit data is sent to the external DAC? And does the conversion between float and integer in "integer mode" occur within the player itself? I am assuming DSP is being used by the player; otherwise, does the player pass integer data directly to the lower-level hardware driver?
Of course, the USB layer accepts only integer format. Some layer above has to perform the conversion from float to int.
Is low latency required for bit-perfect playback from an audio player?
Absolutely not, latency and bit-perfection have nothing in common.
Here is my dithering question. I wanted to know whether the data format used by the player (and Core Audio) needs to be dithered before it is sent out through the hardware driver to the DAC. But if the internal format is a 32-bit integer, there should really be no dithering required, just a truncation of the numeric data before it is sent to the DAC. Correct?
Right. Dithering to int32 or int24 is useless.
However, if floats are being used, for instance because of DSP processing in the player, it first needs to convert them to integers. Conversion between floats and 32-bit (or even 24-bit) integers is trivial. If the controller only accepts 16-bit integers, which I hope is not the case for my setup, then dithering would be required, either by the player if integer mode is being used, or by Core Audio. So I guess it would come down to which piece of software does the better job of dithering. Otherwise there should not be any loss of accuracy for bit-perfect playback, which means dithering is not required.
I do not know whether Core Audio dithers when truncating to 16 bits. In any case, I very much doubt you would be able to tell the difference in a DBT. Honestly, I would not worry about it.
EDIT: Only 16/24-bit resolutions are supported by the Tenor TE7022L USB audio streaming controller in my external DAC. So the 32-bit data is truncated to 16 or 24 bits by the driver on the Mac before being sent out to the DAC? I would hope we are talking about 24-bit data in my case.
Your USB soundcard offers several altsets (configurations). The OSX audio subsystem should select the one that best fits the requested parameters; I would assume it works correctly and picks the 24-bit altset. In that case the low-level driver accepts only 24 bits, and some layer above has to truncate.
Thank you very much! You and the others here are very helpful when one, such as myself, makes the effort and is willing to listen. I wish this had been the first place I checked. 🙂