MicroSD Memory Card Transport Project

SDTrans192 Rev. 3.0 sonic profile

An SDTrans192 Rev. 3.0 board assigned for evaluation by overseas users is now in California, after traveling through Northern Europe and New York.

At this moment, I'd like to give my summary view of SDTrans192 Rev. 3.0's sonic profile. I asked eight users of the Rev. 2.1 board (most of whom have already tweaked their Rev. 2.1 boards in their own ways) to evaluate the Rev. 3.0. Only one of the eight said his heavily upgraded Rev. 2.1 was superior to the stock Rev. 3.0.
Eight reviewers including me concluded that Rev. 3.0 achieved better sonic results than the upgraded Rev. 2.1 boards.

Common impressions were:
1. Higher resolution of tones
- this brings improved staging
- it lets us pick out the tones of instruments playing in the background
2. Well-damped bass

As my experience in audiophile life has been limited, I was very surprised by the second point. I can easily understand that seeking high signal precision in the digital domain brings the result of the first point. However, how can I interpret the effect on the second?

As for bass, an additional explanation is necessary.
I saw contrasting review results on different audio systems. Some said the bass of Rev. 3.0 was weak, less punchy, and not deeper, while others said it was clearly superior and well-damped. I understand that these different results come from differences in the audio systems used. Based on my own experience with several systems, my understanding is:
"Rev. 3.0 brings positive results in bass on systems that have sufficient bass potential."

In the next post, I will write about what Chiaki has done in the design of Rev. 3.0.
 
... However, how can I interpret the effect on the second? ...

I'm not sure what "well-damped" bass means, but if you are talking about hearing a tauter, textured bass with more inner detail in it, then this is a known result of the sound of lower jitter & it surprises many people. The apparent lack of bass to some is probably explained by the removal of some bass bloatedness.

I have found that jitter is often perceived as a stronger initial attack on notes, & when it is removed some people feel that their system has lost some detail - but in fact all it has lost is some distortion that we have always interpreted as the detail of digital. Listened to more closely, the smoother presentation has no detail missing - it just isn't unnaturally exaggerated. How do I know that this is the correct, more realistic sound? Because people who have made their own recordings of live events report it. This, taken together with the other audible improvements in 3D sound stage, texture & inner detail in the sounds, leads me to this conclusion.

I liken jitter to MSG (flavour enhancer) in food - initially, it impresses because of the exaggeration/enhancement of the taste but it overlays on every taste & ultimately masks the true range of tastes in nature. It also gives me a headache, much like jitter (digitalis) :)
 
Dear jkeny,

I appreciated your explanation very much.
Every phrase in your reply seemed very reasonable to me.
Though my English vocabulary may not be sufficient, I could imagine what you meant in your post.

However, there is just one point on which I can't agree with you: your use of the term "jitter".
Naturally, I can imagine very well what you mean by the word "jitter".
Still, I hesitate to call it "jitter" because the term seems too ambiguous to me. I have never seen any well-defined, strictly measured quantitative jitter values for actual digital I2S signals.
For me, it is simply very difficult to tell the harm of "jitter" apart from that of "noise" in the analog characteristics of digital signals. (Of course, noise causes jitter.)

Anyway, I recognize the importance of reducing "jitter", in your sense of the word, in digital signal outputs.

There are three design points for achieving low "jitter" in the I2S output:
1. selection of clock sources
2. handling of BCLK signal
3. selection of interfacing method to DAC

Bunpei
 
Sure, Bunpei,
Jitter is an ambiguous term & causes more arguments than it should - I use the term as I'm sure some of what we are talking about is the result of a reduction in jitter, but probably not all of it. A lot of the confusion about jitter probably arises from the fact that it is treated as a single number rather than as a spectrum analysis of the jitter. It seems to me that jitter is at the same point in digital audio that THD was many years ago - just one figure stated. We now know that THD's importance really lies in the spectrum analysis of that THD. This understanding has yet to be accepted for jitter, & so the arguments about single jitter figures will rage on, much as they did (& do) about single THD distortion figures.

I'm not an expert in this - just a learner - & I'm sure there are other (many?) points where jitter can get into an I2S signal, the power supply being one such point. This is one of the problems: jitter is so insidious that it can never be eliminated, only reduced to a manageable level/spectrum. One of the advantages that SD card players have over computers is the more manageable number of parameters that need to be handled for best audio playback, i.e. no moving parts, spinning disks, head stepper motors, etc. - a smaller number of processes & less general activity.

Using a PC for audio is a bit like trying to play a turntable on a train - difficult!
 
Rev3 impressions

I recently had the pleasure of listening to Bunpei and Chiaki’s SDTrans 192 Rev 3.
I have owned the Rev 2 for nearly a year now. It has been modified with improved regulators/power supplies and the same clock crystals as the Rev 3. All material I listened to was 44/16 ripped with EAC.
I initially powered the Rev 3 with what was apparently a less than optimal supply, because my results were not that great. I did hear the low-mid and bass improvements Bunpei mentioned, but the treble was rather edgy (sibilant) and veiled, if both those words can be used in the same sentence.
Fortunately, just a few days before I sent the unit on, I remembered that I had an Acopian Gold 5V supply (linear) and tried that. There was a very big difference. The treble was very clean, and the low mid/bass was even better than before. So in all, I heard the Rev 3 as having more well-defined bass, mid bass, and lower midrange, while rivaling my modded Rev 2 in treble clarity, definition, and ambience/air. This low-end definition is far more than a frequency bump; there is a very transparent 3D quality to that region. I had thought this to be a speaker flaw in my system, but (once again) the SD Trans showed it to be a source issue. Overall, the Rev 3 gave a more relaxed and balanced (but still detailed) presentation of the music I listen to.

I did not try the HDMI out.
I did not listen to any Hi Res recordings.
I used I2S output.
The utilitarian display and card interface have not changed.
As with the Rev 2, the power supply is crucial to performance.

I found myself preferring the overall sound of the Rev 3 to that of my own unit.
I have mentioned this before to Bunpei and I will say it again here: I did not know my system until I used the SD Trans as a source.

Hats off to Bunpei and Chiaki on the Rev 3.
 
To compressit:
I greatly appreciate your report! We are very pleased with it, and happy to know that the SDTrans helped you reveal the potential of your system.

To jkeny:
I agree with your considerations completely.
-------------------------------------------------------------------------

Let me explain the approaches taken in SDTrans192 Rev. 3.0:
1. selection of clock sources
2. handling of BCLK signal
3. selection of interfacing method to DAC

1. Selection of clock sources

A crystal oscillator, the NZ2520SD, manufactured by a leading Japanese crystal maker, Nihon Dempa Kogyo Co., Ltd. (NDK), was selected as the clock source for I2S signal generation.

The model has a remarkably low phase noise profile. According to their measurement result sheet, the phase noise [dBc/Hz] at 26 MHz, 3.3 V, across 5 samples is:
1 Hz: max -76, min -81
10 Hz: max -108, min -111
100 Hz: max -136, min -138
1 kHz: max -151, min -152
10 kHz: max -156, min -157

These values are better than those of a rubidium-referenced oscillator, the Rubidium Frequency Standard FE-5680A from Frequency Electronics Inc., whose technical manual gives the phase noise (fo = 10 MHz) as:
@ 10 Hz: -100 dBc/Hz
@ 100 Hz: -125 dBc/Hz
@ 1 kHz: -145 dBc/Hz

As jkeny explained in his previous post, the single datasheet "jitter" value is only a statistical summary. "Phase noise", on the other hand, is a spectrum-type graph, for example:
[Image no longer available: an example phase-noise spectrum plot.]


I think this phase noise profile is reliable enough. However, usually only crystal manufacturers publish such measurement results, because the measurement requires a very expensive dedicated instrument, such as this system manufactured by Agilent Technologies:
E5505A Phase Noise Measurement Solution, 50 kHz to 110 GHz | Agilent
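As an aside, converting such a spectrum into a single RMS jitter number is a straightforward numerical integration. The C sketch below is my own back-of-the-envelope illustration (nothing from SDTrans itself): it integrates the NZ2520SD worst-case points quoted above, assuming L(f) runs in straight lines between the published points on a log-log plot.

[CODE]
/* Back-of-the-envelope sketch (not SDTrans code): integrate a published
 * phase-noise curve into a single RMS jitter estimate. Points are the
 * NZ2520SD worst-case values quoted above (26 MHz carrier); L(f) is
 * assumed linear between points on a log-log plot. Compile with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double PI = 3.14159265358979323846;
    const double f0 = 26.0e6;                       /* carrier, Hz */
    const double f[] = { 1.0, 10.0, 100.0, 1.0e3, 1.0e4 };        /* offsets, Hz */
    const double L[] = { -76.0, -108.0, -136.0, -151.0, -156.0 }; /* dBc/Hz      */
    const int    n   = sizeof f / sizeof f[0];
    double area = 0.0;                              /* integral of 10^(L/10) df  */

    for (int i = 0; i < n - 1; i++) {
        double s1 = pow(10.0, L[i] / 10.0);         /* linear power at f[i]      */
        double m  = (L[i + 1] - L[i]) / (10.0 * log10(f[i + 1] / f[i]));
        if (fabs(m + 1.0) > 1e-9)                   /* power-law segment         */
            area += s1 * f[i] * (pow(f[i + 1] / f[i], m + 1.0) - 1.0) / (m + 1.0);
        else                                        /* exact 1/f segment         */
            area += s1 * f[i] * log(f[i + 1] / f[i]);
    }
    double phi_rms = sqrt(2.0 * area);              /* rad, one-sided L(f)       */
    double t_rms   = phi_rms / (2.0 * PI * f0);     /* seconds                   */
    printf("RMS jitter, 1 Hz - 10 kHz band: %.3g ps\n", t_rms * 1e12);
    return 0;
}
[/CODE]

Note how strongly the result depends on the chosen integration band; that sensitivity is exactly why a single "jitter" figure says so little compared with the spectrum.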

Two Japanese SDTrans users independently compared NDK NZ2520SD and FE-5680A based clock sources. When they compared them using SDTrans192 Rev. 2.1, both concluded the FE-5680A was better.
After their latest evaluations using SDTrans192 Rev. 3.0, both published the same conclusion, "NDK is better", on their respective blogs (in Japanese).

Based on these results, I'd like to recommend that DIY users try the NZ2520SD in their projects.
 
"Two Japanese SDTrans users independently compared NDK NZ2520SD and FE-5680A based clock sources. When they compared them using SDTrans192 Rev. 2.1, both concluded the FE-5680A was better. After their latest evaluations using SDTrans192 Rev. 3.0, both published the same conclusion, 'NDK is better', on their respective blogs (in Japanese)."

There you are comparing apples and oranges! The FE-5680A is a DDS-based clock; also, its output frequency (12.288 MHz) is multiplied by two with a PLL to get 24.576 MHz, so the phase noise level becomes completely different (bigger - frequency multiplication by N raises phase noise by 20·log10(N) dB, about 6 dB for a doubler, before the PLL adds its own noise).

By the way, I am very curious about those NDK oscillators. Do you know whether they use AT- or SC-cut crystals? What is the price per unit? Thanks.

Vil
 
There you are comparing apples and oranges! The FE-5680A is a DDS-based clock; also, its output frequency (12.288 MHz) is multiplied by two with a PLL to get 24.576 MHz, so the phase noise level becomes completely different (bigger).
Your speculation about the clock signal processing is correct. As I prefer a "minimal" approach, I would not use the FE-5680A for 24.576 MHz. They did, however.
In this situation, what they needed was the same 24.576 MHz output - the "Vitamin C" - regardless of whether it came from orange or apple juice. Extracting Vitamin C from an apple rather than an orange might not be a good idea, but for the FE-5680A there was no other option.
By the way, do you know the calibration (servo) cycle time for steering the VCO frequency from the Rb-tube resonance in the FE-5680A? In the case of the PRS10, I recall the servo cycle may be 14.3 ms (= 70 Hz).

By the way, I am very curious about those NDK oscillators. Do you know whether they use AT- or SC-cut crystals? What is the price per unit?
Sorry, I don't know the cut type of the crystal; they have not published it. I will try to ask them.
The price per unit is 1,500 JPY for a single piece, or 1,000 JPY each for a 10-piece order. The long lead time of 2-3 months is an issue.
Mr. Ishida mounted the unit on an 8-pin DIP socket for use in various applications.

[Photo: the NZ2520SD mounted on an 8-pin DIP socket.]
 
Padding residual bits for 24 bit data

SDTrans192 outputs an I2S signal with a BCLK frequency of 64 × fs. This means the formal bit length of one channel slot is always 32 bits, while the effective bit length of the actual data is 16, 24, or 32 bits depending on the source. (For example, at fs = 192 kHz, BCLK = 64 × 192 kHz = 12.288 MHz, carrying 2 × 32 bits per frame.) Therefore, in the case of 16 or 24 bit audio data, the residual 16 or 8 bit space must be padded with dummy data.

As you may know, LPCM data in an audio file is represented as signed integers in two's complement.
The current FPGA program on SDTrans192 pads with binary 0 for a positive value and binary 1 for a negative value. Originally, padding with 0 for all values was adopted. One overseas user requested a change of method, and we made the change because two users, including the requester, said the new method brought a better sonic result.
However, the current padding, "binary 0 for a positive value and binary 1 for a negative value", is mathematically incorrect.

When we extend, for example, a 24 bit signed integer to a 32 bit signed integer, a method called "sign extension" should be used. In this operation, the sign bit of the original integer fills the extended space on the left: binary 1 for a negative value and binary 0 for a positive value.
However, if we play this value directly, the level drops (the value is 256 times too small, about 48 dB low). Therefore, we must multiply by 2^8. This requires an "arithmetic left shift" that pads eight 0s on the right. This operation is not sign dependent; it involves only fixed 0s.
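To make the two schemes concrete, here is a minimal C sketch (an illustration only; the real logic lives in the FPGA) of a 24 bit sample placed into a 32 bit I2S slot both ways:

[CODE]
/* C sketch of the two padding schemes described above (illustration only,
 * not the actual FPGA code). 's24' holds a sign-extended 24 bit sample. */
#include <stdint.h>
#include <stdio.h>

/* Mathematically correct method: arithmetic left shift by 8.
 * The eight residual LSBs are always 0, independent of sign. */
static int32_t pad_zero(int32_t s24)
{
    return (int32_t)((uint32_t)s24 << 8);
}

/* Current SDTrans192 method as described: fill the residual LSBs with
 * the sign - 0x00 for a positive value, 0xFF for a negative one. */
static int32_t pad_sign(int32_t s24)
{
    uint32_t fill = (s24 < 0) ? 0xFFu : 0x00u;
    return (int32_t)(((uint32_t)s24 << 8) | fill);
}

int main(void)
{
    int32_t pos = 0x123456;               /* positive 24 bit sample  */
    int32_t neg = (int32_t)0xFFABCDEF;    /* negative, sign-extended */
    printf("zero pad: %08X  %08X\n",
           (unsigned)pad_zero(pos), (unsigned)pad_zero(neg));
    printf("sign pad: %08X  %08X\n",
           (unsigned)pad_sign(pos), (unsigned)pad_sign(neg));
    return 0;
}
[/CODE]

Note that sign padding moves every negative sample slightly toward zero (by up to 255 counts out of 2^31) relative to correct zero padding - a tiny, sign-dependent offset. Whether that, or the power supply effect mentioned below, explains the sonic reports, I cannot say.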

I prefer a pragmatic approach: even if the operation is "scientifically unsound", I accept it as long as it produces "better sound".
One practical explanation supporting the "pad binary 0 for positive values and binary 1 for negative values" operation might be a favorable effect on the power supply conditions of the related digital circuits.
 
Carmen Habanera Fantasia 352.8 kHz/32 bit audio source

I obtained "Carmen Habanera Fantasia" 352.8 kHz/32 bit WAV file from one of my acquaintances. The file was once downloadable from this page;
Complimentary High Resolution Downloads Courtesy of First Impression Music (24/88.2, 24/176.4, 32/352.8) | Computer Audiophile
The file is not "integer LPCM" but "floating point LPCM".

As SDTrans192 can't handle 32 bit files at 352.8 kHz, we converted the original file into 352.8 kHz/24 bit and 176.4 kHz/32 bit versions. I felt that the tunes were very "hi-fi".

Just recently, I was able to examine FFT spectra of the source and found that the high-range components above 22 kHz are almost completely removed by a very sharp filter.
I guess it was re-recorded or re-sampled from a master tape or HDD originally recorded for a CD release.
Moreover, the floating point LPCM format may indicate that it is not a raw recording but an edited deliverable.
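For anyone who wants to repeat this kind of check, the C sketch below shows the idea (my own illustration, not the tool I actually used): a naive DFT over one block, comparing the energy below and above 22 kHz. WAV parsing is omitted and a synthetic tone stands in for the real samples.

[CODE]
/* Rough sketch: naive DFT over one block of samples, comparing energy
 * below and above 22 kHz to spot a CD-style brickwall filter.
 * WAV parsing is omitted; a synthetic 1 kHz tone stands in for real data.
 * Compile with -lm; the DFT is O(N^2), so keep N modest. */
#include <math.h>
#include <stdio.h>

#define N  4096
#define FS 352800.0               /* sampling rate of the file in question */

int main(void)
{
    const double PI = 3.14159265358979323846;
    static double x[N];
    double e_low = 0.0, e_high = 0.0;

    for (int n = 0; n < N; n++)   /* placeholder signal: 1 kHz sine */
        x[n] = sin(2.0 * PI * 1000.0 * n / FS);

    for (int k = 0; k < N / 2; k++) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            re += x[n] * cos(2.0 * PI * k * n / N);
            im -= x[n] * sin(2.0 * PI * k * n / N);
        }
        double p = re * re + im * im;          /* power in bin k */
        if (k * FS / N < 22000.0)
            e_low += p;
        else
            e_high += p;
    }
    printf("energy below 22 kHz: %.3e\n", e_low);
    printf("energy above 22 kHz: %.3e\n", e_high);
    return 0;
}
[/CODE]

A genuine wide-band recording should show meaningful energy above 22 kHz; a CD-derived source shows almost none.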

In contrast, the 2L DXD sources are genuine, respectable DXD recordings.

For your information, I understand that the name DXD was given by Digital Audio Denmark (DAD) to an ordinary LPCM WAV format at 352.8 kHz or 384 kHz.
Therefore, no formal independent standard document for DXD is available. The WAV file format produced by the combination of Pro Tools and the DAD AX24 might be the de facto standard for DXD. I believe the WAV files released by 2L can be regarded as "genuine".
 
Bit perfect play of PCM audio data

SDTrans192 can play only WAV audio files in integer LPCM representation. It can play neither FLAC, MP3, nor floating point PCM, though it can handle 32 bit data up to a 192 kHz sampling rate.
The MCU (CPU) is fully occupied with the task of transferring raw integer LPCM data from the SD memory card to the FPGA that generates the I2S signals. No job such as format conversion is possible at all.
This means playback is completely "bit perfect" for integer LPCM data; there is no room for conversion.

On the other hand, typical commercial audio DAC devices accept only integer LPCM input data, even when the bit length is 32 bits. I have never seen a commercial audio DAC chip that accepts raw 32 bit floating point data as input. Therefore, feeding a DAC chip with floating point PCM data is quite absurd. I once fed 32 bit floating point data to an ES9018 from the SDTrans by faking the data type; the resulting tones were garbled.

If you can play a floating point PCM WAV file by feeding a DAC chip from a computer, a program on the computer must be converting the floating point numbers into corresponding integers.
Of course, such automatic conversion is very convenient.
However, it is not "bit perfect".
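To make the point concrete, here is a minimal C sketch of such a conversion, under the usual assumption that float samples are normalized to [-1.0, +1.0):

[CODE]
/* Minimal sketch of the float-to-integer conversion a computer player
 * must perform before feeding an integer-only DAC chip. Assumes 32 bit
 * IEEE float samples normalized to [-1.0, +1.0), the usual convention. */
#include <stdint.h>
#include <stdio.h>

static int32_t float_to_int32(float x)
{
    if (x >= 1.0f)  return INT32_MAX;          /* clamp positive overrange */
    if (x < -1.0f)  return INT32_MIN;          /* clamp negative overrange */
    return (int32_t)(x * 2147483648.0f);       /* scale by 2^31, truncate  */
}

int main(void)
{
    const float samples[] = { 0.0f, 0.5f, -0.5f, 1.0f, -1.0f };
    for (int i = 0; i < 5; i++)
        printf("% .2f -> %11d\n", samples[i], float_to_int32(samples[i]));
    return 0;
}
[/CODE]

The scaling and truncation step is exactly where bit-perfectness is lost: the integers delivered to the DAC are no longer the bits stored in the file.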
 
I obtained "Carmen Habanera Fantasia" 352.8 kHz/32 bit WAV file from one of my acquaintances. The file was once downloadable from this page;
Complimentary High Resolution Downloads Courtesy of First Impression Music (24/88.2, 24/176.4, 32/352.8) | Computer Audiophile
The file is not "integer LPCM" but "floating point LPCM".

As SDTrans192 can't handle 32bit files of 352.8 kHz, we converted the original file into 352.8 kHz/24 bit and 176.4 kHz/32 bit. I felt that the tunes were very "Hi-fi".

Just recently, I could examine FFT spectra of the source and found high range components above 22 kHz is almost removed by a very sharp filter.
I guess it is re-recorded or re-sampled from a master tape or HDD originally recorded for a CD release.
Moreover, floating point LPCM may denote it's not a raw recording but an edited deliverable.

Contrarily, 2L DXD sources are respectful genuine DXD sources.

For your information, I understand that the name, DXD, was given by Digital Audio Denmark (DAD) for a usual LPCM WAV format for 352.8 kHz or 384 kHz.
Therefore, there is no formal independent standard document for DXD is available. The WAV file format produced by the combination of ProTools and DAD AX24 might be a de facto standard for DXD. I believe the WAV file format released by 2L can be regarded as "genuine".

Hi Bunpei,

I used your SDTrans192 (not Rev 3.0) to verify the 384k operation of my DAC some time back.
In doing so, I also verified that the SDTrans192 prior to Rev 3.0 is technically able to operate at 384k :D

I simply switched the clocks on the SDTrans192 so it actually clocked the data out at 384k.
I know this was not a true 384k test, as it was a 352.8k file clocked out at 384k, but I was able to verify that both the FPGA on the SDTrans192 and my DAC performed perfectly at 384k sample rates - and that was my goal at the time: to verify the flawless 384k operation of my DAC.
 
Hey, can anybody tell me how to make a simple MP3 player using Analog Devices' DAC IC (AD1852), with file access from a microSD card, using an 8051?

Have you ever read this page?
VLSI Solution-VS1053 Evaluation Kit
If your requirement is limited to 44.1 kHz/16 bit or 48 kHz/16 bit, the VS1053b chip would be able to output I2S signals for the AD1852.
The schematic linked from the web page above may help you.
 
... I simply switched the clocks on the SDTrans192 so it actually clocked the data out at 384k.
I know this was not a true 384k test, as it was a 352.8k file clocked out at 384k, but I was able to verify that both the FPGA on the SDTrans192 and my DAC performed perfectly at 384k sample rates ...

Hi RayCtech, welcome back to this thread!

The result is very interesting - you have hacked your SDTrans!
I will tell Chiaki about this result.
However, we have no music source from a genuine 384 kHz/24 bit recording. Do you think 2L will release such sources?
 
... This requires an "arithmetic left shift" that pads eight 0s on the right. This operation is not sign dependent; it involves only fixed 0s.

Just recently, I found an illustration that shows the fixed "0" padding of the residual bit space in an "S/PDIF sub frame". (In this figure the MSB is located on the right-hand side, so right and left are reversed relative to my earlier explanation.)

[Image no longer available: S/PDIF sub frame diagram showing the fixed "0" padding.]


The original blog article (written in Japanese),

[#24 Digital Audio ... S/PDIF ... (4) - link title garbled by encoding in the original post]

written by the president of RATOC Systems, explains S/PDIF fundamentals very well and in a rigorous manner. (The president is an excellent engineer who is also an audiophile.)

Here I'd like to explain one important point about the error handling mechanism in S/PDIF. This topic was covered in the blog article.

One sub frame of S/PDIF consists of 32 bits. A main 24 bit part is assigned to the LPCM data, a 4 bit leading part to the "preamble (sync)", and a 4 bit trailing part to "flag & status". The last bit of the flag & status part is a "parity" bit. A S/PDIF receiver calculates its own parity value and compares it with the received parity bit, so it can tell whether the sub frame is broken or not. As you may know very well, 1 bit of parity information can merely tell that the value is valid or invalid; no value correction is possible based on just one bit. For error correction, more redundant bits are necessary.
In a secured data communication scheme, when a parity error is detected in a data frame, a re-transmission of the frame is executed; for example, the TCP protocol used on the Internet invokes re-transmissions when necessary.
In the S/PDIF protocol, no re-transmission mechanism is implemented. The communication is one way only, and no re-transmission request can be sent from receiver to transmitter. Therefore, no data correction is available in S/PDIF transmission.
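To illustrate, here is a minimal C sketch of the 1 bit parity check over a sub frame (my own illustration; a real receiver works on the biphase-mark decoded bits, and the sub frame value below is hypothetical):

[CODE]
/* Minimal sketch of the 1 bit parity check in an S/PDIF sub frame.
 * Bits 4..31 (everything after the preamble, including the parity bit
 * itself) must hold an even number of ones. The check can only say
 * "something flipped" - never WHICH bit - so no correction is possible. */
#include <stdint.h>
#include <stdio.h>

static int ones_4_31(uint32_t sf)       /* count ones in bits 4..31 */
{
    int n = 0;
    for (int i = 4; i < 32; i++)
        n += (sf >> i) & 1;
    return n;
}

static uint32_t set_parity(uint32_t sf) /* transmitter side */
{
    sf &= ~(1u << 31);                  /* clear the parity slot        */
    if (ones_4_31(sf) % 2)              /* make bits 4..31 even overall */
        sf |= 1u << 31;
    return sf;
}

static int parity_ok(uint32_t sf)       /* receiver side */
{
    return ones_4_31(sf) % 2 == 0;
}

int main(void)
{
    uint32_t sf = set_parity(0x00ABCDE4);  /* hypothetical sub frame   */
    printf("intact:  %s\n", parity_ok(sf) ? "valid" : "error detected");
    sf ^= 1u << 20;                        /* one corrupted audio bit  */
    printf("flipped: %s\n", parity_ok(sf) ? "valid" : "error detected");
    return 0;
}
[/CODE]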

If you feel that the tunes through an S/PDIF line are better than I2S because of its error handling mechanism, it might be a placebo effect.
 
As you may know very well, 1 bit of parity information can merely tell that the value is valid or invalid; no value correction is possible based on just one bit. For error correction, more redundant bits are necessary.
If you feel that the tunes through an S/PDIF line are better than I2S because of its error handling mechanism, it might be a placebo effect.

You said it is "because of its error handling mechanism"...

Disregarding all other aspects of the differences between SPDIF and I2S, focusing only on the "error handling mechanism" of SPDIF, and then crediting the audible effect of that mechanism to placebo is a very strange technical conclusion, no?

Here is a very simplified example:
Suppose we use an SDTrans192 as the source and two DACs as destinations - one with a direct I2S input, the other with an SPDIF receiver delivering I2S to its DAC chip.
The FPGA on the SDTrans192 clocks out I2S signals, which are routed both to the SPDIF transmitter (onboard the SDTrans192) and directly to the I2S output pins.
One DAC receives the SPDIF signal and the other the direct I2S signals.

Let's say that we will send three samples:
1. A positive sample at 0 dB.
2. A positive sample at -6 dB.
3. A positive sample at 0 dB.

Let's say that we have just transmitted the first, error-free positive sample at 0 dB (positive clipping/maximum - a signed 16 bit two's complement value of 32767).

Then we transmit the positive sample at -6 dB (a signed 16 bit two's complement value of 16383), but some electrical disturbance interferes and the MSB in both the I2S and the SPDIF transmission is changed to a "1" - changing the sample to a NEGATIVE value...

With I2S, the decoded signal will then go from the first sample, a positive sample at 0 dB (positive clipping), to a negative sample at -6 dB above negative clipping (a signed 16 bit two's complement value of -16385).

With SPDIF, depending on the selected behaviour of the error handling mechanism, the DAC could be given either a zero value (0), the last known good value (32767), or the actual erroneous value (-16385).

Then we transmit the third, error-free positive sample at 0 dB (positive clipping/maximum - 32767).
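For reference, the two's complement arithmetic in this example can be checked with a few lines of C (the sample values are taken from the walkthrough above):

[CODE]
/* Tiny check of the two's complement arithmetic above: flipping the MSB
 * of +16383 (0x3FFF) in transit yields -16385 (0xBFFF) at the receiver. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int16_t sample = 16383;                       /* the -6 dB positive sample */
    int16_t hit    = (int16_t)(sample ^ 0x8000);  /* MSB flipped in transit    */
    printf("sent:     %6d (0x%04X)\n", sample, (unsigned)(uint16_t)sample);
    printf("received: %6d (0x%04X)\n", hit,    (unsigned)(uint16_t)hit);
    return 0;
}
[/CODE]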

How on earth can you attribute the conditional behaviour of the second sample (the sample in error via SPDIF) to placebo? I would like to see your logical or mathematical explanation of that...
 
Dear RayCtech,

Your discussion is completely correct in theory; I2S has a fatal defect.
My technical background is in computer science, and in that field no data transmission without error correction is acceptable in principle.

Therefore, we must do our best to eliminate transmission errors in I2S. In my system, the SDTrans192 Rev. 2.1 whose power supplies you kindly upgraded for us is connected to a TPA Buffalo II via I2S. The wires are shielded, and the wire length does not exceed an inch.

I can state that absolutely none of the errors you worry about occur in my system.
If such erroneous bit flips occurred frequently, you would always detect large spike noises and would easily conclude the system was defective. I have never heard such spike noises.

Let's say that we send 10,000 samples:
1. A zero sample at -96.3 dB: 0b0000000000000000
2. A zero sample at -96.3 dB: 0b0000000000000000
...
10000. A zero sample at -96.3 dB: 0b0000000000000000

Now suppose sample 50, a zero sample at -96.3 dB (0b0000000000000000), is mis-received as a negative full-scale sample at 0.0 dB (0b1000000000000000).

If such erroneous bit flips occurred frequently, every receiver chip datasheet would note, "We handle data errors in this way: ...". Have you ever read such a description?

If you want to argue your point, please show us clear, recorded, and reproducible evidence that bit flip errors actually happen over an I2S connection.

Bunpei
 