I realize that I don't have the foggiest idea how digital audio is transmitted at the basic layers. How is it streamed from a device to a network streamer? How is it transferred to a DAC? What protocols are used, and how are transmission failures or errors handled?
Feel free to link me to articles, I know I’m asking very broad questions. I don’t know where to jump in.
IIUC, transmission is the same as for any digital data: packets are sent through a network protocol stack. At some point a device driver for a particular OS sends the data to a DAC, formatted the way the DAC hardware wants to see it. So most of what you are asking about is general computer and networking stuff, not something specific to audio. Ultimately, what a DAC needs may include some control signals plus the audio data formatted for the DAC's operational mode, which might be PCM or DSD at some sample rate (and a particular bit depth, in the case of PCM).
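To make "formatted the way the DAC hardware wants to see it" a bit more concrete, here is a minimal sketch (Python, purely illustrative; real drivers do this in C inside the kernel) of packing float samples into interleaved 16-bit little-endian PCM, which is one common raw layout handed to audio hardware. The function name and sample values are made up for the example:

```python
import struct

def pack_pcm16_stereo(left, right):
    """Pack two lists of float samples (-1.0..1.0) into interleaved
    16-bit little-endian PCM bytes, the raw layout many DACs expect."""
    frames = bytearray()
    for l, r in zip(left, right):
        # Clamp and scale each float sample to a signed 16-bit integer
        li = max(-32768, min(32767, int(l * 32767)))
        ri = max(-32768, min(32767, int(r * 32767)))
        # '<hh' = little-endian, two signed 16-bit values (L then R)
        frames += struct.pack('<hh', li, ri)
    return bytes(frames)

# Example: one frame of silence followed by a full-scale left peak
buf = pack_pcm16_stereo([0.0, 1.0], [0.0, 0.0])
print(buf.hex())  # 00000000 ff7f0000
```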
The easiest way to understand it is with a layered model, by analogy with the OSI model in computer networking. The sender and the receiver communicate on each layer using an agreed protocol, and each layer carries the protocol of the next higher layer as its payload.
So, for example, TOSLINK defines communication by optical transmission. Over TOSLINK runs the S/PDIF format, which has the audio signal embedded in it; both the transmitter and the receiver must understand S/PDIF. The next layer above S/PDIF is, for example, 2-channel PCM. And so on. Each layer has some kind of header containing the protocol information, plus the actual data being carried. The topmost layer contains the analog audio encoded into some digital format (like WAV in computer files). The sending device encodes from the top layer down to the bottom and sends the result over the medium; the receiver decodes from the bottom layer back up to the top, so you get the original signal back.
TOSLINK, S/PDIF and WAV are just examples; other protocols can be used on each layer. One can use Ethernet instead of TOSLINK, I2S instead of S/PDIF, MP3 instead of WAV, etc. The important thing is that the sender and receiver talk the same protocol on each layer.
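To illustrate just the layering pattern described above (this is a toy, not real TOSLINK or S/PDIF framing; the layer names and headers are invented), a short Python sketch where the sender wraps the payload on the way down and the receiver unwraps it on the way up:

```python
# Toy layering sketch: each "layer" adds its own header on the way down
# and strips it on the way up. Headers and names are made up for
# illustration; real S/PDIF subframes look nothing like this.

def wrap(layer_name, payload):
    header = f"[{layer_name}]".encode()
    return header + payload

def unwrap(layer_name, packet):
    header = f"[{layer_name}]".encode()
    assert packet.startswith(header), f"receiver does not speak {layer_name}"
    return packet[len(header):]

# Sender: encode from the top layer down to the physical layer
audio = b"\x00\x01\x02\x03"  # pretend PCM samples
packet = wrap("PHY", wrap("FRAMING", wrap("PCM-2ch", audio)))

# Receiver: decode from the bottom layer back up
recovered = unwrap("PCM-2ch", unwrap("FRAMING", unwrap("PHY", packet)))
assert recovered == audio
print("both ends spoke the same protocols on every layer")
```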
Just noticed that the original question also asked about clocks. The clocks should be located at the DAC, and they should serve as the primary time reference for the DAC. One way that can work well is 'asynchronous USB', which means the USB data is sent as demanded by the DAC and its clock system, rather than using the PC's clock as the primary time reference. In other words, data is sent 'asynchronously' relative to the computer's timekeeping.
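A rough sketch of the asynchronous idea, if it helps: the DAC clock decides when samples leave the buffer, and the computer only answers the DAC's "send me this much" requests. The numbers and function names here are invented round figures, not the actual USB audio feedback mechanism:

```python
import collections

# Toy model of asynchronous pacing: the DAC clock decides when samples
# are consumed; the host just answers refill requests so the buffer
# never runs dry. Sizes and rates are made-up round numbers.
BUFFER_TARGET = 4096       # frames the DAC-side buffer tries to hold
FRAMES_PER_TICK = 441      # one DAC clock "tick" = 10 ms at 44.1 kHz

buffer = collections.deque()

def dac_tick():
    """The DAC clock fires: consume exactly one tick's worth of frames,
    then report how much room is left for the host to fill."""
    for _ in range(min(FRAMES_PER_TICK, len(buffer))):
        buffer.popleft()
    return BUFFER_TARGET - len(buffer)

def host_refill(requested):
    """The computer sends the requested frames, on the DAC's schedule."""
    buffer.extend(0 for _ in range(requested))

for tick in range(5):
    room = dac_tick()
    host_refill(room)
    print(f"tick {tick}: buffer holds {len(buffer)} frames")
```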
The OSI model is a perfect example…
Very well said
If you're asking about streaming audio over the Internet (or even just moving files that end up on a hard disk), then you're asking about how data moves across it. The answer is TCP/IP, which is a bit of a rabbit hole.
Playing .wav or .mp3 files from a thumb drive (FLASH memory) or hard disk (oh, the technology!) may be more straightforward (that is, a little less complicated), but it all ends with "samples show up at the DAC at just the right time..."
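A small illustration of the "samples show up at just the right time" part: read a .wav file in chunks and deliver them at the pace the file's own sample rate dictates. This is only a sketch (the file name is a placeholder, and the sleep stands in for handing data to a sound card):

```python
import wave, time

CHUNK_FRAMES = 1024  # frames delivered per iteration (arbitrary choice)

# 'example.wav' is a placeholder path; any PCM WAV file will do.
with wave.open('example.wav', 'rb') as wav:
    rate = wav.getframerate()
    chunk_seconds = CHUNK_FRAMES / rate
    while True:
        data = wav.readframes(CHUNK_FRAMES)
        if not data:
            break
        # A real player would hand 'data' to the sound card here;
        # sleeping just imitates delivering samples at the right pace.
        time.sleep(chunk_seconds)
```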
Some comments about dedicated DAC quality vs DVD player DAC quality got me thinking about how the DAC works. I have seen some explanations on YouTube but I am looking for a simple explanation, like the OP is.
This could form a basis for discussion: https://en.m.wikibooks.org/wiki/Sound_in_the_Digital_Domain
Sounds are created by vibrating surfaces. An electric motor, for example, creates a humming sound caused by the vibration of its surfaces in air. An electrical current is sent to the motor, and this electrical current, which has energy, is converted to sound energy, hence the sound you hear. This is the creation of sound through an analog signal, that is, through a varying electrical signal, which is also known as alternating current. A loudspeaker works in the same way. In the early days of sound reproduction, all signals were analog, that is, created by a mechanically vibrating device or an electrical current.
Using the popular sound editing tool Audacity, a tone can be generated. The sound in question is a sine wave, and zooming in, the classic sine wave shape is revealed. The frequency of the sound is 440 Hz, that is, the oscillations take place 440 times a second. You can also try out tones using an online tone generator, but turn the volume down first!
Here is the zoomed in view of the 440 Hz signal in Audacity:
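If you would rather generate the same 440 Hz tone in code and then open it in Audacity to zoom in yourself, something like this works with just the Python standard library (the file name, duration and amplitude are arbitrary choices):

```python
import math, struct, wave

RATE = 44100        # samples per second
FREQ = 440.0        # A4, the same tone as the Audacity example
SECONDS = 2
AMPLITUDE = 0.5     # half of full scale, to be kind to your ears

with wave.open('tone440.wav', 'wb') as out:
    out.setnchannels(1)   # mono
    out.setsampwidth(2)   # 16-bit samples
    out.setframerate(RATE)
    for n in range(RATE * SECONDS):
        # One sample of the sine wave at time n / RATE seconds
        value = AMPLITUDE * math.sin(2 * math.pi * FREQ * n / RATE)
        out.writeframes(struct.pack('<h', int(value * 32767)))
```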
Analog signals are stored on vinyl records and on magnetic tape. The varying sound pressure is used to cut a groove in a record or to create magnetic patterns on a tape, which are then played back through suitable equipment.
How is an audio signal stored digitally? This is the basis for digital recording. To understand this, let's look at the sound wave again. The wave can be represented with two coordinates in a coordinate system: the x-axis representing time, and the y-axis representing amplitude, or strength of the signal.
Taking the sine wave as an example, it is possible to create a sine wave using a set of values, or to generate these values from a formula. We will do this using a spreadsheet.
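For anyone without a spreadsheet handy, the same table of values can be sketched in a few lines of Python; each printed row is one (time, amplitude) pair computed from the sine formula (the frequency and number of points are arbitrary here):

```python
import math

FREQ = 1.0        # one cycle per second, to keep the numbers readable
POINTS = 8        # samples taken across that one second

# Print a small "spreadsheet": time in column one, sine value in column two
for n in range(POINTS + 1):
    t = n / POINTS                        # time in seconds
    y = math.sin(2 * math.pi * FREQ * t)  # amplitude at that instant
    print(f"{t:5.3f}  {y:+.3f}")
```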
You could start with PCM and I2S. Understanding the connection and the difference between them in audio use is very helpful for beginners.
That's the transportation part of it; that comes later. He is nicely describing the generation/storage.
Jan
Isn't all that sort of introductory info already written up in many other places? Maybe links to a few well-written introductory articles would be more useful?
Modern (sigma-delta or delta-sigma) DACs and ADCs are quite complicated beasts, and I think efforts to simplify them wouldn't do them justice. Earlier technologies, such as R/2R ladders for DACs and successive approximation registers (or even flash converters, not to be confused with flash memory) for ADCs, are easier to understand.
https://en.wikipedia.org/wiki/Resistor_ladder
A successive approximation ADC takes a DAC (such as an R/2R resistor ladder) and finds the bit combination that, through the DAC, gives the voltage closest to the analog input signal:
https://en.wikipedia.org/wiki/Successive-approximation_ADC
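Here is an idealized sketch of that successive-approximation loop in Python (a real SAR ADC uses a comparator and an internal DAC, not floating-point math; the reference voltage and bit depth below are arbitrary): starting from the most significant bit, each bit is kept only if the resulting DAC voltage does not overshoot the input.

```python
def sar_adc(v_in, v_ref=1.0, bits=8):
    """Idealized successive-approximation ADC.
    Returns the integer code whose DAC output best approaches v_in."""
    code = 0
    for bit in reversed(range(bits)):  # MSB first
        trial = code | (1 << bit)      # tentatively set this bit
        # Ideal DAC: code -> voltage (an R/2R ladder does this in hardware)
        v_dac = trial * v_ref / (1 << bits)
        if v_dac <= v_in:              # comparator decision
            code = trial               # keep the bit
    return code

print(sar_adc(0.5))    # 128 -> binary 10000000
print(sar_adc(0.915))  # 234 -> binary 11101010
```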
A flash ADC is mostly used in applications other than audio, but it still shows a way of turning voltages into bits:
https://en.wikipedia.org/wiki/Flash_ADC
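And the flash idea, equally idealized: the input is compared against every reference level at once, and the number of comparators whose threshold sits below the input is the output code. One comparator per level is why flash converters stay at low bit depths. Again, just a sketch, not a circuit:

```python
def flash_adc(v_in, v_ref=1.0, bits=3):
    """Idealized flash ADC: 2**bits - 1 comparators, each fed its own
    reference tap from a resistor divider; the count of comparators whose
    threshold lies below the input is the output code."""
    levels = 2 ** bits
    thresholds = [(i + 0.5) * v_ref / levels for i in range(1, levels)]
    return sum(1 for t in thresholds if v_in >= t)

for v in (0.05, 0.33, 0.75, 0.99):
    print(f"{v:.2f} V -> code {flash_adc(v)}")
```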
The original poster asked about the transmission of digital audio data. There are several transport protocols in use, and some are proprietary. To make things even more confusing, the audio data delivered to the DAC from a streaming source travels through several different digital systems, and the packets that carry the audio signal are converted or encapsulated at each boundary. At the very least, the audio data is fetched from the storage medium inside the source computing device through a high-speed serial bus such as PCIe. Then it is sent through a communication network, usually a TCP/IP-based one, and finally it is presented to the DAC chip through a serial peripheral bus such as I2S. Each segment of the journey has its own flow control, error handling, and data transfer protocols. Most of this is generic data communication/computing stuff, but there are a few protocols targeted specifically at audio/video streams. As an example, RTP (Real-time Transport Protocol) is the transport component of modern streaming protocols such as AirPlay. It is layered on top of an underlying IP network, typically running over UDP. To learn the basics, there is a good Wikipedia article about RTP.
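As a small taste of what RTP adds on top of UDP, here is a hedged sketch that puts a chunk of 16-bit PCM behind the standard 12-byte RTP header and sends it to a made-up receiver address. The address, port and payload-type choice are placeholder values, and a real sender would also need RTCP and session negotiation:

```python
import socket, struct

DEST = ('192.0.2.10', 5004)   # placeholder receiver address and RTP port
PAYLOAD_TYPE = 10             # RFC 3551: L16 stereo at 44.1 kHz
SSRC = 0x12345678             # arbitrary stream identifier

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sequence, timestamp = 0, 0

def send_chunk(pcm_bytes, frames_in_chunk):
    """Wrap one chunk of big-endian 16-bit PCM in an RTP header and send it."""
    global sequence, timestamp
    header = struct.pack(
        '!BBHII',
        0x80,                 # version 2, no padding/extension/CSRC
        PAYLOAD_TYPE & 0x7F,  # marker bit clear + payload type
        sequence & 0xFFFF,    # sequence number, for loss/reorder detection
        timestamp,            # media clock in sample units
        SSRC,
    )
    sock.sendto(header + pcm_bytes, DEST)
    sequence += 1
    timestamp += frames_in_chunk

# Example: one chunk of 256 frames of stereo silence
send_chunk(b'\x00' * 256 * 4, 256)
```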
Here's another resource for conversion between digital and analog; it's not simple, but it's pretty much complete. Worst case, it shows the depth of this part of the rabbit hole.
http://dspguide.com/
To continue: we have a set of values that approximates the shape of a sine wave. If we chose more points, we would get a smoother curve. The object of this exercise, if you remember, was to represent a sine wave in terms of numerical values.
If you have experimented with a loudspeaker driver, you will see that if a voltage is applied to the speaker, for example 1 volt of DC (this may damage the speaker, so hopefully you have seen it already: I have, when turning on an amplifier with the speakers connected), the speaker cone moves out. We can safely assume, therefore, that a current applied to the speaker terminals will cause it to move. With an alternating signal such as a sine wave or music, the speaker cone will vibrate, producing a tone or the music. The important thing to note is that this pattern of desired movement is what is stored in analog or digital format.
So we store the information this way: at time t = 0 (for simplicity, using seconds instead of milliseconds) on the x-axis, the y value is 0. At the next half second the y value is 0.48, and so on. In this way, the sine wave or the music can be stored in some format.
It is clear that storing data at 1-second intervals is not going to do any good. The frequencies used in music range from about 30 Hz (30 cycles per second) up to 20,000 cycles per second. So even one second of a 30 Hz sine wave needs at least 30 data values just to mark each cycle, and a 1 kHz tone needs at least 1000 y values (with their corresponding time values); in fact the signal must be sampled at least a couple of times per cycle of the highest frequency present, or the wave cannot be followed at all. The number of values stored per second is the sampling rate of the signal.
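A quick worked calculation makes those numbers concrete, assuming the standard CD rate of 44,100 samples per second: every second of audio stores 44,100 values regardless of the tone, and what changes with frequency is how many of those samples land on each cycle.

```python
RATE = 44100  # CD sampling rate, samples per second

for freq in (30, 1000, 20000):
    samples_per_cycle = RATE / freq
    print(f"{freq:>5} Hz tone: {RATE} values per second, "
          f"about {samples_per_cycle:.1f} samples per cycle")

# 30 Hz    -> ~1470 samples per cycle
# 1000 Hz  -> ~44.1 samples per cycle
# 20000 Hz -> ~2.2 samples per cycle (just above the minimum of 2)
```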
What about the y values, that is, the amplitude values? How many of these have to be available for storing the data? Assuming you want a record of amplitudes ranging from 0 to, say, 255, this requires 256 distinct values, which fit into 8 bits. These values are stored in binary format; note the pattern:
255 in binary: 11111111
234 in binary: 11101010
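The same point in code: with 8 bits there are exactly 256 possible amplitude codes (0 through 255), each with its own 8-digit binary pattern. The quantize helper below is only an illustration, not how any particular ADC rounds:

```python
BITS = 8
levels = 2 ** BITS          # 256 distinct amplitude values: 0 .. 255

def quantize(sample, bits=BITS):
    """Map a float sample in 0.0..1.0 onto an integer code of 'bits' bits."""
    return min(2 ** bits - 1, int(sample * (2 ** bits)))

for s in (0.0, 0.5, 0.915, 1.0):
    code = quantize(s)
    print(f"{s:5.3f} -> {code:3d} -> {code:0{BITS}b}")

# 0.915 of full scale maps to 234, printed as 11101010
```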
The Wikibook puts it this way:
https://en.m.wikibooks.org/wiki/Sound_in_the_Digital_Domain
Sampling is the process of taking a continuous, acoustic waveform and converting it into a digital stream of discrete numbers. An ADC measures the amplitude of the input at a regular rate creating a stream of values which represent the waveform in digital. The output is then created by passing these values to the DAC, which drives a loudspeaker appropriately. By measuring the amplitude many thousands of times a second, we create a "picture" of the sound which is of sufficient quality to human ears. The more and more we increase this sample rate, the more accurately a waveform is represented and reproduced.
The text goes on to say:
Sampling accuracy and bit depth
It has been established that the higher the sample rate, the more accurate the representation of a waveform in a digital system. However, although there are many reasons and arguments for higher sample rates, there are two general standards: 44100 samples per second and 48000 samples per second, with the former being the most commonplace.