Highest peak between 400Hz-30kHz is at -88.2dB.
Highest crest factor?
I believe Audacity does not show crest factor. Anyhow, this was just a quick-and-dirty measurement. The resulting null depends heavily on factors other than DAC quality. Because of the analog leg (DAC to ADC), it is very difficult to fully synchronize the original recording and the re-recording. You can see this in the sample waveforms: there is a minute difference in the samples even though the wave format is the same. The gains should also be matched closely; I just used Audacity's 0.1 dB gain resolution.
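A rough sketch of the two corrections mentioned above, done numerically rather than by hand: integer-sample alignment by cross-correlation, then a least-squares gain match. It assumes both captures are mono float arrays at the same sample rate; `null_test` and `max_lag` are hypothetical names. Sub-sample timing offsets and clock drift from the DAC-to-ADC leg are not corrected, so they still limit the null depth.

```python
import numpy as np

def null_test(ref, rec, max_lag=1000):
    """Align `rec` to `ref` by an integer lag, match gain, return (lag, gain, null_db)."""
    ref = np.asarray(ref, dtype=float)
    rec = np.asarray(rec, dtype=float)
    # FFT cross-correlation xc[k] = sum_n rec[n + k] * ref[n]; zero-padded so it is not circular
    n = len(ref) + len(rec) - 1
    nfft = 1 << (n - 1).bit_length()
    xc = np.fft.irfft(np.fft.rfft(rec, nfft) * np.conj(np.fft.rfft(ref, nfft)), nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    lag = int(lags[np.argmax(np.abs(xc[lags]))])  # negative lags index from the end
    # Keep only the overlapping, aligned part of both signals
    a, b = (ref, rec[lag:]) if lag >= 0 else (ref[-lag:], rec)
    m = min(len(a), len(b))
    a, b = a[:m], b[:m]
    # Least-squares gain applied to the recording (instead of 0.1 dB steps)
    gain = np.dot(a, b) / np.dot(b, b)
    resid = a - gain * b
    null_db = 10.0 * np.log10(np.sum(resid ** 2) / np.sum(a ** 2))
    return lag, gain, null_db
```

With a recording delayed by 5 samples, scaled by 0.9 and with a little added noise, it should report a lag of 5 and a gain close to 1/0.9.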
There is something strange about these plots. Assuming that the normal input range is -1 to 1, with -1 corresponding to 25 % ones and 75 % zeros, 0 with 50 % ones and 50 % zeros, and 1 with 75 % ones and 25 % zeros, and that the SOX modulator is a plain single-bit sigma-delta, I would expect to see a tone or narrow bump at half the sigma-delta sample rate, 1.4112 MHz, when the input signal is 0. Offsets should shift it by 705.6 kHz per unit offset, so you get an audible mixing product with a 1.4112 MHz ripple at the reference or clock when the offset is between 0.000028 and 0.028 or between -0.000028 and -0.028. At +1 or -1, it should end up at a quarter of the sigma-delta sample rate, so 705.6 kHz.
Here are the spectrograms for a range of DC offsets. Structured noise is definitely there, with an FM tone in the middle. It looks quite neat when flipping through the pictures. So it is a big failure for this encoder.
https://peufeu.fr/audio/dsd_6_spectrogram_dc.zip
Marcel, here is the Python source code. It is quite simple, just loading the raw DSD file and using the spectrogram function. You could use it on your modulator, just by adapting the IO. I'm curious to see if your multibit approach gets rid of these tones.
https://peufeu.fr/audio/dsd_6_spectrogram_dc.py
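The linked script isn't reproduced here, but the same idea can be sketched with NumPy alone. This assumes a headerless, single-channel 1-bit stream with the earliest bit in the most significant position of each byte (the DSDIFF convention; DSF stores the earliest bit in the LSB), at DSD64. `dsd_bytes_to_bipolar` and `simple_spectrogram` are hypothetical helpers, not the functions in the script.

```python
import numpy as np

def dsd_bytes_to_bipolar(raw, msb_first=True):
    """Unpack a single-channel 1-bit stream into float32 samples of +1.0 / -1.0."""
    bits = np.unpackbits(np.frombuffer(raw, dtype=np.uint8),
                         bitorder="big" if msb_first else "little")
    return bits.astype(np.float32) * 2.0 - 1.0

def simple_spectrogram(x, fs, nfft=4096):
    """Non-overlapping Hann-windowed FFT frames; returns (freqs, times, power_db)."""
    n_frames = len(x) // nfft
    frames = x[: n_frames * nfft].reshape(n_frames, nfft)
    spec = np.abs(np.fft.rfft(frames * np.hanning(nfft), axis=1)) ** 2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (np.arange(n_frames) + 0.5) * nfft / fs
    return freqs, times, 10.0 * np.log10(spec + 1e-20)  # small floor avoids log(0)
```

Usage would be along the lines of `simple_spectrogram(dsd_bytes_to_bipolar(raw), 2_822_400)`, where `raw` holds one channel's bytes. With nfft = 4096 the bins are about 689 Hz apart, so the whole audio band fits into the first 29 or so rows of a 0 to 1.4112 MHz plot.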
Instead, you get a tone at 705.6 kHz for an input of 0 instead of +/- 1. Some mix-up between signed and unsigned maybe?
A small offset should just shift the tone a bit, not modulate it all the way from 0 to Nyquist, but still you get this at an offset of -0.000007:
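The figures in the prediction above follow from simple arithmetic. A minimal sketch, assuming the input scale described there (±1 maps to 75 % / 25 % ones) and a tone that moves linearly from fs/2 at 0 to fs/4 at ±1; `expected_idle_tone` and `audible_offset_range` are hypothetical names.

```python
FS_DSD64 = 2_822_400  # DSD64 bit rate in Hz

def expected_idle_tone(u, fs=FS_DSD64):
    """Idle-tone frequency for a DC input u on the +/-1 scale (25 %..75 % ones)."""
    # fs/2 at u = 0, moving down by fs/4 (705.6 kHz) per unit of |u|
    return fs / 2 - (fs / 4) * abs(u)

def audible_offset_range(fs=FS_DSD64, f_lo=20.0, f_hi=20_000.0):
    """|u| range whose idle tone mixes with an fs/2 ripple into f_lo..f_hi."""
    # The mixing product sits at fs/2 - expected_idle_tone(u) = (fs/4) * |u|
    return f_lo / (fs / 4), f_hi / (fs / 4)
```

`audible_offset_range()` gives roughly 2.83e-5 and 0.0283, matching the 0.000028 and 0.028 quoted above. For the -0.000007 offset, this model predicts a shift of only about 4.9 Hz away from 1.4112 MHz.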
No offset, here it is playing music. But look at the top of the spectrum! It looks like the music is encoded in the noise or something. Modulating this with half the sampling frequency, as you suggest, should be interesting. It's just like having noise at 32*Fs on VREF.
Here is the result. It is still aligned, so you can play it along with the original file.
Basically I just multiplied the DSD signal by (1,-1,1,-1...). So if you want to know what a 0.01% ripple on VREF sounds like, scale this to 0.01% and listen to it or add it to the original song. (spoiler: it's atrocious)
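The (1, -1, 1, -1...) trick models a reference that ripples at exactly half the bit rate. A DAC output is bit × VREF, so VREF·(1 + r·(-1)^n) adds r·dsd·(-1)^n to the clean signal, and multiplying by (-1)^n moves every frequency f to fs/2 - f. A minimal sketch; `vref_ripple_error` is a hypothetical helper, and it assumes the ripple is sampled exactly once per bit and in phase with the clock.

```python
import numpy as np

def vref_ripple_error(dsd, ripple=1e-4):
    """Error term added by a VREF ripple of relative size `ripple` at fs/2.
    dsd: array of +1/-1 samples; 1e-4 corresponds to the 0.01 % in the post."""
    alternating = np.where(np.arange(len(dsd)) % 2 == 0, 1.0, -1.0)  # (1, -1, 1, -1, ...)
    return ripple * dsd * alternating

# Example: a tone 1 kHz below fs/2 is folded down to 1 kHz.
fs = 2_822_400
n = np.arange(28_224)  # 10 ms, so FFT bins are 100 Hz apart
x = np.cos(2 * np.pi * (fs / 2 - 1_000) * n / fs)
err = vref_ripple_error(x, ripple=1.0)
peak_hz = np.argmax(np.abs(np.fft.rfft(err))) * fs / len(n)
```

To listen to it, add `vref_ripple_error(dsd)` to the original ±1 stream before converting to PCM.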
https://peufeu.fr/audio/05 Grandmother.flac-derivative.flac <--- WARNING: this one is full scale, so lower the volume a lot!
This plot looks exactly as I would expect when the music is soft, about -20 dB or so. The momentary value of the music modulates the momentary frequency of the idle tone around half the sigma-delta sample rate. The idle tone has an alias at fs,sigma-delta - ftone, so you always get a tone below fs,sigma-delta/2 that depends on the absolute value of the momentary value of the music.
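The DC-input case of this argument can be checked with the simplest possible loop. This sketch assumes a plain first-order, single-bit modulator, which is far simpler than a practical DSD modulator. Here m is the mean of the ±1 output (m = u/2 on the ±1 input scale used earlier), and the strongest line is expected at fs·(1 - |m|)/2 for either sign of m, which is the alias folding described above. `first_order_sdm` and `dominant_tone` are hypothetical names.

```python
import numpy as np

def first_order_sdm(x):
    """Plain first-order, single-bit sigma-delta loop; returns +1/-1 samples."""
    v = 0.0  # integrator state
    y = np.empty(len(x))
    for i, xi in enumerate(x):
        y[i] = 1.0 if v >= 0.0 else -1.0  # 1-bit quantizer
        v += xi - y[i]                     # integrate input minus feedback
    return y

def dominant_tone(m, fs=2_822_400, n=1 << 16):
    """Frequency of the strongest non-DC spectral line for a constant input m."""
    y = first_order_sdm(np.full(n, m))
    spec = np.abs(np.fft.rfft((y - y.mean()) * np.hanning(n)))
    spec[0] = 0.0  # ignore any DC residue
    return np.argmax(spec) * fs / n
```

For example, `dominant_tone(0.1)` and `dominant_tone(-0.1)` both land within a bin or two of 0.45·fs, about 1.27 MHz.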
Yes
There is something strange about these plots. Assuming that the normal input range is -1 to 1, with -1 corresponding to 25 % ones and 75 % zeros, 0 with 50 % ones and 50 % zeros, and 1 with 75 % ones and 25 % zeros, and that the SOX modulator is a plain single-bit sigma-delta, I would expect to see a tone or narrow bump at half the sigma-delta sample rate, 1.4112 MHz, when the input signal is 0. Offsets should shift it by 705.6 kHz per unit offset, so you get an audible mixing product with a 1.4112 MHz ripple at the reference or clock when the offset is between 0.000028 and 0.028 or between -0.000028 and -0.028. At +1 or -1, it should end up at a quarter of the sigma-delta sample rate, so 705.6 kHz.
Instead, you get a tone at 705.6 kHz for an input of 0 instead of +/- 1. Some mix-up between signed and unsigned maybe?
No mix-up, it's all in float32; I checked the DSD sample values and they are correct. The tone at 1.4M is there, but at low amplitude, so it didn't show on the spectrogram due to the limited color scale.
A small offset should just shift the tone a bit, not modulate it all the way from 0 to Nyquist, but still you get this at an offset of -0.000007:
Well, I guess this modulator has problems! Mark's Japanese software and SARACON are much cleaner.
you always get a tone below fs,sigma-delta/2 that depends on the absolute value of the momentary value of the music.
Is this also the case with a multibit sigma-delta?
That spectrum plot (post #268) looks like the whole modulator is running at half the intended clock frequency, DSD32 instead of DSD64. I thought the spectrogram with music from post #253 looked perfectly normal, but actually I don't see the music at audio frequencies (the bottom part is all dark).
Multibit sigma-deltas need not have tones around fs,sigma-delta/2. Whether they really don't have them depends on the design.
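A toy comparison of why a multibit loop need not idle near fs/2: the same first-order loop with a 2-level and a 17-level quantizer, fed a small DC input. This is a hypothetical illustration, not any particular product's design; `sdm_first_order` and `power_near_half_fs` are made-up names.

```python
import numpy as np

def sdm_first_order(x, n_levels):
    """First-order sigma-delta loop with an n_levels uniform quantizer on [-1, 1].
    n_levels=2 is the plain single-bit case; odd n_levels include a 0 level."""
    levels = np.linspace(-1.0, 1.0, n_levels)
    v = 0.0
    y = np.empty(len(x))
    for i, xi in enumerate(x):
        y[i] = levels[np.argmin(np.abs(levels - v))]  # nearest quantizer level
        v += xi - y[i]                                # integrate the error
    return y

def power_near_half_fs(y, lo=0.45):
    """Hann-windowed spectral power between lo*fs and fs/2 (fs normalized to 1)."""
    spec = np.abs(np.fft.rfft((y - y.mean()) * np.hanning(len(y)))) ** 2
    f = np.fft.rfftfreq(len(y))
    return spec[f >= lo].sum()

x = np.full(1 << 14, 0.001)
p_1bit = power_near_half_fs(sdm_first_order(x, 2))
p_17lev = power_near_half_fs(sdm_first_order(x, 17))
```

It says nothing about tones elsewhere: with this input the 17-level loop settles into a sparse pattern that repeats about every 125 samples (step size 0.125 divided by 0.001), i.e. a tone near fs/125, about 22.6 kHz at the DSD64 rate. That is where "depends on the design" comes in.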
That spectrum plot (post #268) looks like the whole modulator is running at half the intended clock frequency, DSD32 instead of DSD64.
It's running at DSD64, it just has a pretty bad idle tone.
I thought the spectrogram with music from post #253 looks perfectly normal, but actually I don't see the music at audio frequencies (the bottom part is all dark).
Yeah, I noticed that too. It's in pixel #0, which is hidden under the X axis. Matplotlib oddity 😀
If I drag the plot, it's there.
The plot of post #268 has a notch around 1.4112 MHz. Normally you only get notches around multiples of the sample rate, as those are aliases of the band around 0 Hz. Then again, the spectrum just below 1.4112 MHz doesn't look like the mirror image of the part just above 0 MHz.
All this is well and good as far as it goes. IIRC the Teac DSD converter sounded pretty bad. However, IIRC the Saracon DSD conversion that I heard didn't compare sound-wise with some HQ Player algorithms.
Perhaps there is more that can make DSD conversion and/or associated upsampling sound good or bad besides modulator noise and/or idle-tone behavior. For instance, how about audio-band phase response: are transients reproduced intact?
How about when modulator noise is very low level: does its particular correlation with the audio signal still allow the ear/brain system (of at least some people) to learn to recognize something about it? If so (and according to ESS the ear is 'exquisitely' sensitive to signal-correlated noise), could that be why the HQ Player adaptive algorithm is often preferred? Is it that the noise correlation is non-stationary in a good and/or useful way?
Regarding audio band phase response of a sigma-delta modulator, to the extent that you can trust the simplistic linear time-invariant models used for calculating coefficients, the signal transfer function corresponds to an IIR low-pass filter with the same poles as the noise transfer function. The zeros depend on the design: there can be none at all, zeros covering the poles, or anything in between.
As the poles determine where the noise transfer function becomes flat, they are usually placed well above the audio band, at locations corresponding to a corner frequency of a few hundred kilohertz. There will therefore not be much phase nonlinearity without zeros and none with zeros covering the poles.
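To put a number on "not much phase nonlinearity", here is a sketch for the no-zeros case, assuming (hypothetically) a signal transfer function with a double real pole corresponding to a 300 kHz corner at the DSD64 rate, normalized to unity gain at DC. Real designs usually have complex pole pairs, so treat this as an order-of-magnitude check only. It computes the group delay numerically over 0 to 20 kHz; the spread comes out at roughly 5 ns on a total delay of about 0.75 µs.

```python
import numpy as np

fs = 2_822_400.0                  # DSD64 rate
fc = 300e3                        # assumed pole corner frequency (hypothetical)
p = np.exp(-2 * np.pi * fc / fs)  # real pole radius in the z-plane

def stf(f):
    """STF with a double real pole at z = p and no zeros, unity gain at DC."""
    z_inv = np.exp(-2j * np.pi * f / fs)
    return (1 - p) ** 2 / (1 - p * z_inv) ** 2

f = np.linspace(0.0, 20e3, 2001)
phase = np.unwrap(np.angle(stf(f)))
group_delay = -np.gradient(phase, 2 * np.pi * f)  # seconds
spread_ns = (group_delay.max() - group_delay.min()) * 1e9
```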
...to the extent that you can trust the simplistic linear time-invariant models used for calculating coefficients, the signal transfer function corresponds to an IIR low-pass filter with the same poles as the noise transfer function.
Makes sense. It's that when I look at some of the noise-shaping graphs, I see FR bumps in the noise in the audio band. Even though they are low level, that raised the question in my mind as to whether those FR bumps correspond to phase shifts in the audio band. Also, getting a more ideal brickwall-looking FR often tends to imply more phase shifting. Might I expect less of that with upsampling, since that would leave the audio band in the flatter part of the noise-shaping FR?
I'm gonna do this to avoid testing the different oversamplers in the DSD encoders:
44.1k music -> upsample to 352.8k ->
352.8k -> DSD encode
352.8k -> upsample to 2.8224M PCM
subtract the results of these two, then downsample.
Because if I do this:
44.1k music -> DSD encode
44.1k music -> upsample to 2.8224M PCM
subtract the results of these two, then downsample.
Then it goes through two different upsamplers, so if there's a difference, I can't know if it comes from that or the DSD encoding...
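The plan can be written as a small harness where the DSD encoder, the 8x upsampler and the final downsampler are passed in, so the same 352.8 kHz signal feeds both paths. A sketch only: `difference_test` is a hypothetical name, and the stand-ins below (sample repetition, a first-order loop, block averaging) are crude placeholders, not the encoders or resamplers being tested.

```python
import numpy as np

def difference_test(x_352k, dsd_encode, upsample_8x, downsample):
    """Feed one 352.8 kHz signal to both paths and return the downsampled difference."""
    dsd = dsd_encode(x_352k)   # 352.8k -> DSD64 (+1/-1 at 2.8224 MHz)
    pcm = upsample_8x(x_352k)  # 352.8k -> 2.8224 MHz PCM
    n = min(len(dsd), len(pcm))
    return downsample(dsd[:n] - pcm[:n])

# Crude placeholder stages, only to show the plumbing:
def repeat_8x(x):
    return np.repeat(x, 8)

def first_order_dsd(x):
    up = repeat_8x(x)
    v, y = 0.0, np.empty(len(up))
    for i, s in enumerate(up):
        y[i] = 1.0 if v >= 0.0 else -1.0  # 1-bit quantizer
        v += s - y[i]                      # integrator
    return y

def block_mean_64(x):
    n = len(x) // 64 * 64
    return x[:n].reshape(-1, 64).mean(axis=1)
```

With these placeholders the residual per 64-sample block is bounded by the integrator state, so it stays below 1/16 for inputs within ±0.5.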
Not sure how that makes sense exactly. Say for example, upsampling 16/44 by 4x to 24/176, then DSD encode will produce DSD256/44 (means the 44kHz family version of DSD256). BCLK is the same for both PCM and DSD, around 11MHz, and they also produce about the same useful audio bandwidth. So, not clear on why upsample the PCM again before comparing? Besides, can't exactly compare files in two different encoding formats?
EDIT: Or maybe you mean you will play back through a DAC, record the output, then analyze? If so, what DAC operating in what mode? Or else, maybe use the SOX DSD decoder?
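The clock figure quoted in that post works out exactly, assuming an I2S-style PCM link with 64 bit clocks per frame (two 32-bit slots):

$$
f_{\mathrm{BCLK,PCM}} = 64 \times 176.4\ \mathrm{kHz} = 11.2896\ \mathrm{MHz} = 256 \times 44.1\ \mathrm{kHz} = f_{\mathrm{DSD256}}
$$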
not clear on why upsample the PCM again before comparing? Besides, can't exactly compare files in two different encoding formats?
To take the DSD-to-PCM conversion out of the equation, and only downsample the difference.
Comparing different formats at the same sample rate is easy, DSD is just +1, -1, +1... it's samples...
Makes sense. Its that when I look at some of the noise shaping graphs, I see noise FR bumps in the noise in the audio band. Even though they are low level, that raised the question in my mind as to whether those FR bumps correspond with phase shifts in the audio band.
There often are notches in the audio band in the noise transfer function (NTF), made by placing the zeros of the NTF (poles of the loop filter) at the appropriate places. The noise transfer function only shapes the quantization noise, including dither if applicable, but none of the other noise contributions. The notches therefore usually look like minor dents when you look at the total noise spectrum, if you see them at all. The part in between two dents then looks like a small bump.
The NTF zeros don't end up in the signal transfer function, so they don't affect the phase shift or the magnitude of the signal.
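A small numeric illustration of moving NTF zeros into the audio band, assuming (hypothetically) a second-order NTF with no poles and white quantization noise. One version has both zeros at DC; the other puts a zero pair at f0 = B/√3 with B = 20 kHz, the classic placement that minimizes in-band noise for this simple case. The notch lands near 11.5 kHz and the in-band quantization-noise power drops by about 9/4 (roughly 3.5 dB); the signal transfer function does not enter the calculation at all.

```python
import numpy as np

fs = 2_822_400.0
B = 20e3                 # audio band edge
f0 = B / np.sqrt(3)      # optimal zero for a 2nd-order NTF (small-angle approximation)

def ntf_dc_zeros(f):
    z_inv = np.exp(-2j * np.pi * f / fs)
    return (1 - z_inv) ** 2                # both zeros at z = 1 (DC)

def ntf_notched(f):
    z_inv = np.exp(-2j * np.pi * f / fs)
    c = np.cos(2 * np.pi * f0 / fs)
    return 1 - 2 * c * z_inv + z_inv ** 2  # zero pair at +/- f0 on the unit circle

f = np.linspace(0.0, B, 20001)
p_dc = np.mean(np.abs(ntf_dc_zeros(f)) ** 2)   # relative in-band noise power
p_notch = np.mean(np.abs(ntf_notched(f)) ** 2)
improvement_db = 10 * np.log10(p_dc / p_notch)
```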
Also, getting a more ideal brickwall looking FR often tends to imply more phase shifting. Might I expect less of that with upsampling since that would leave the audio band in the flatter part of the noise shaping FR?
I don't understand the question, so I can't answer it.
- How we perceive non-linear distortions