Internet music streaming and sample clocks: how does it work?

You might simply assume that it is the sound card that provides the sample clock when listening to music on your PC. I can see how this would work if the music data can be delivered 'on demand' in chunks at the average rate the sound card requires. Is this how it works when listening to personal music streams, e.g. Spotify?

But what about internet music broadcasting, e.g. internet radio or live TV streaming? How is the discrepancy between your sound card's sample clock and the broadcast's sample rate resolved?

And there's a further technicality I'm curious about: some sound cards, e.g. the Creative Audigy, work at a fixed sample rate (48 kHz) and supposedly re-sample everything to their own internal clock rate. I can picture this for, say, the SPDIF input, but how does it work for listening to 44.1 kHz music off the hard drive? Does the PC provide a 44.1 kHz (but surely jittery?) stream to the card, which it then re-samples to 48 kHz? Or does the card demand chunks of data from the PC at an average 44.1 kHz sample rate and do the re-sampling purely in soft (firm-)ware? Or does something clever happen in the driver?

Thanks for any thoughts on this.
 
1. Have you seen the word "buffering" on those stations?
The data comes when it comes, is stored in a big memory buffer, and is read out with the audio clock, which is the one provided by the PC. If the two get too far out of whack due to low bandwidth, you will hear pauses while the player waits to re-buffer.
The same buffering can be used anywhere in the audio chain to isolate the clocks; a minimal sketch of the idea follows point 2 below. There is a project here that does that for S/PDIF transmission, and there are CD/DVD players that use buffering plus a dedicated audio DSP for the same purpose.
2. All sound cards can work at any sample rate because they have a programmable frequency divider inside. The driver chooses the right divider based on the Windows mixer's "instructions". Not very clean, but it works. A memory buffer inside the Windows mixer accounts for the differences (and that mixer is itself sometimes a quality issue, which is why ASIO drivers bypass it). Jitter from that divider is of course an issue.
Creative sound cards are a special case. They too can work at any sample rate; the difference is that they have an on-board DSP that does the sample-rate conversion (ASRC). The second generation also has a bit-perfect mode that bypasses that DSP via DMA (but you lose the other effects). The X-Fi has a monster DSP for that ASRC conversion. E-MU cards, like all the "pro" sound cards, have solved the problem with two different clocks (which have to be selected manually, per incoming stream).
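To make the buffering in point 1 concrete, here is a minimal producer/consumer sketch in Python; the class name, capacity, and underrun policy are hypothetical illustrations, not code from any actual player:

```python
import collections
import threading

class StreamBuffer:
    """Decouples network delivery from the sound card clock.

    The network thread pushes samples whenever packets arrive;
    the audio side pops a fixed block per card interrupt.
    """
    def __init__(self, capacity_samples=5 * 44100):  # ~5 s of audio
        # A full deque silently discards the oldest samples (overrun).
        self.buf = collections.deque(maxlen=capacity_samples)
        self.lock = threading.Lock()

    def push(self, samples):          # called at the *network's* pace
        with self.lock:
            self.buf.extend(samples)

    def pop(self, n):                 # called at the *sound card's* pace
        with self.lock:
            if len(self.buf) < n:     # underrun: not enough data yet
                return None           # player pauses and re-buffers
            return [self.buf.popleft() for _ in range(n)]
```

The point is simply that the two sides never share a clock: the network side writes whenever data arrives, the card side reads at its own fixed rate, and the buffer absorbs the short-term difference.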
 
But what about internet music broadcasting, e.g. internet radio or live TV streaming? How is the discrepancy between your sound card's sample clock and the broadcast's sample rate resolved?

This is a very good question. An interesting explanation of the options can be found at http://www.icg.isy.liu.se/courses/tsin02-ici/slides/07_Realtime-4up.pdf . Basically you either drop/interpolate samples to keep the input buffer optimally filled, or you adaptively resample the stream using the timing information carried in the stream. It is not a simple task at all. I have not been able to locate the place in the mplayer source code where some form of synchronization occurs.
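The first option, steering the buffer level by occasionally dropping or duplicating a sample, might look roughly like this; the thresholds and the one-sample-per-block policy are made-up illustrations, not taken from mplayer or any real player:

```python
def steer_buffer(block, fill_ratio, low=0.25, high=0.75):
    """Keep the buffer near half-full by dropping or duplicating
    one sample per block when the fill level strays too far.

    block      -- list of samples about to be played
    fill_ratio -- current buffer fill level, 0.0 .. 1.0
    """
    if fill_ratio > high:            # source clock faster: shed a sample
        return block[:-1]
    if fill_ratio < low:             # source clock slower: repeat a sample
        return block + [block[-1]]
    return block                     # clocks close enough: pass through
```

Dropping a sample consumes the incoming stream slightly faster than it is played; duplicating one does the opposite, so the fill level is nudged back toward the middle.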
 
Because there is no "interpolation" or "resampling". If you lose the stream, you lose the data; that is why a buffer is used, to allow for retries. And that buffering protocol is not in the player, it is in the network driver.

Yes, for network latency/unreliability there is the buffer. But you also have to account for the long-term bitrate difference between the incoming and outgoing streams: the timing in the transmitter (how fast it is sending data) versus the timing of the sound card clock. If the transmitter is slower, you can always pause playback and wait a bit for the buffer to refill. If the transmitter is faster, you end up with a full buffer and have to decide what to do with the superfluous samples: either drop them, or adaptively resample. Simple solutions use dropping/sample interpolation (mplayer? I do not know); more sophisticated ones (such as networked pulseaudio or jackd streaming) use continuous adaptive resampling via libsamplerate to bridge the average incoming and outgoing clocks.
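A crude sketch of that adaptive-resampling idea, using plain linear interpolation rather than libsamplerate's much better converters; the feedback gain is an arbitrary illustration:

```python
import numpy as np

def adaptive_resample(block, fill_ratio, gain=0.0005):
    """Resample one block so the buffer drifts back toward half-full.

    A fill level above 0.5 means the source clock is running fast,
    so the ratio is nudged above 1 to consume more input per output
    sample; below 0.5 it is nudged the other way.
    """
    ratio = 1.0 + gain * (fill_ratio - 0.5)      # e.g. 1 +/- 0.00025
    n_out = int(round(len(block) / ratio))
    src_pos = np.arange(n_out) * ratio           # fractional read positions
    return np.interp(src_pos, np.arange(len(block)), block)
```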
 
Because there is no "interpolation" or "resampling". If you lose the stream, you lose the data; that is why a buffer is used, to allow for retries. And that buffering protocol is not in the player, it is in the network driver.

A buffer doesn't help in this case.

For example, if the input stream carries exactly 44100 samples per second and the sound card's clock runs at 44090 samples per second, there will be an overrun of 10 samples per second, regardless of any "games" with buffers.

So some kind of interpolation is necessary.

Look, for example, for 'jack_diplomat' in the Sound Engineers Guide To Jack.
 
A buffer doesn't help in this case.

For example, if the input stream carries exactly 44100 samples per second and the sound card's clock runs at 44090 samples per second, there will be an overrun of 10 samples per second, regardless of any "games" with buffers.

So some kind of interpolation is necessary.

Look, for example, for 'jack_diplomat' in the Sound Engineers Guide To Jack.

Continuing that math: if you buffered one second before starting (or just sent the previous second's worth of music all at once), then with the rates above the 10 samples/s mismatch would use up that one-second cushion in 44100 / 10 = 4410 s, roughly 73 minutes. At that point the buffer overflows (or runs dry, if the clocks are mismatched the other way), the player glitches for a second, and it carries on for another 73 minutes.
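Spelled out in a few lines of Python, using the numbers from the example above:

```python
fs_in = 44100    # samples/s delivered by the stream
fs_out = 44090   # samples/s consumed by the sound card
cushion = 44100  # one second of pre-buffered samples

drift = fs_in - fs_out        # 10 surplus samples per second
seconds = cushion / drift     # 4410 s until the cushion is gone
print(seconds / 60)           # -> 73.5 minutes between glitches
```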
 
Continuing that math: if you buffered one second before starting (or just sent the previous second's worth of music all at once), then with the rates above the 10 samples/s mismatch would use up that one-second cushion in 44100 / 10 = 4410 s, roughly 73 minutes. At that point the buffer overflows (or runs dry, if the clocks are mismatched the other way), the player glitches for a second, and it carries on for another 73 minutes.

Yes, that is why simple stream players just drop or duplicate samples to keep the buffer filled. However, it is a compromise between quality/performance and complexity/CPU demands. E.g. jackd or pulseaudio do not accept this compromise and do proper adaptive resampling.
 
I guess my point was that I doubt many audio players do any kind of resampling; rather, they just accept that occasionally the buffer may underrun. I have no doubt that other schemes are better where that is not acceptable.

Interestingly, I was curious how bad just duplicating 10 samples every second at 44100 samples/s would be, so I worked an example with a 3 kHz sine wave. Here is the picture in the frequency domain: the strongest distortion peak is at about -23 dB.

[Attached image: frequency-domain plot of the 3 kHz sine with 10 duplicated samples per second]
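For anyone who wants to reproduce that experiment, a sketch along these lines should do; the duplicate positions and the window are my own assumptions, so the exact figure will differ a little from the plot above:

```python
import numpy as np

fs, f0 = 44100, 3000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)            # one second of a 3 kHz sine

# Duplicate 10 evenly spaced samples within the one-second block.
dup_at = np.linspace(0, fs - 1, 10, dtype=int)
y = np.insert(x, dup_at, x[dup_at])[:fs]  # keep the block length at fs

spectrum = np.abs(np.fft.rfft(y * np.hanning(fs)))
db = 20 * np.log10(spectrum / spectrum.max())
# Inspect db around the 3 kHz bin: the distortion products sit a
# couple of tens of dB below the main peak, as in the figure above.
```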
 