Two R Pis synchronized with NTP: OK? Inspired by C. Laub's talk at Burning Amp

Status
Not open for further replies.
Hello,

Sorry to open another thread, but I'm exploring the different options. The presentation by Charlie Laub at Burning Amp 2015 (2015 Burning Amp speaker: Charlie Laub - YouTube) is very inspiring. Using a separate R Pi in each speaker makes a lot of sense, as long as sufficient sync is maintained between the two audio streams. It is said in the presentation that the NTP protocol removes the roadblock and synchronizes the two R Pis.

I would be happy to learn more about the results, and if this NTP sync is:
- promising, but still not perfectly sufficient,
- or more than enough.

This would help to decide whether, from a time-fidelity perspective, using 2 R Pis (one in each speaker) is on par with using one central R Pi for both speakers... or not.

Thanks for sharing all this precious information.

Best regards,

JMF
 
I can comment on that!

Well, it turns out that NTP really doesn't have anything to do with it. I only figured this out about 6 months after my presentation at BA last year after some back and forth posting with DIYaudio member forta... At the time I made the presentation I felt that NTP running on all the systems did help to keep the timing together. Since then I have continued to modify and improve my streaming audio setup and currently things work quite well as long as the connection between the hardware that is streaming the audio and the hardware that is receiving it in the loudspeaker is relatively good quality. Nothing super fancy, but don't try to do it with a weak WiFi signal or there will be many more problems leading to dropouts and resync issues.

So, if I am not using NTP what am I using?
There is a protocol that is designed for "real time" streaming; in fact it's called the Real-time Transport Protocol, or RTP. It includes some time-base info that the receiving client(s) use to determine the playback speed. RTP is a wrapper around other data. Unfortunately it can't hold just any type of data, so for instance FLAC can't be streamed using RTP. But native PCM can, and it doesn't get better than that. I stream 16-bit PCM at 48kHz in my system, for instance. With a fast WiFi or wired connection you can do higher sample rates and deeper bit depths. For me that is not important.

How do I get the audio stream to each speaker and get them to play it back in synchrony?
This has also changed since I gave my talk. I now send a separate "unicast" RTP stream to each loudspeaker from my audio server. I have some shell scripts that do this for me. It's not super straightforward to explain here and there are several things going on. But the main actor is VLC, which streams multiple unicast RTP streams, one to each speaker. RTP is not "self-starting", meaning that the client needs to know about the payload in order to receive the stream. The info used to describe the stream is provided in a few lines of text in an SDP (Session Description Protocol) file by VLC. This can be provided to the clients in a couple of different ways, and previously I was using RTSP for this because it was easy and it mostly worked. But using RTSP does occasionally cause more problems than it solves, so recently I developed a new solution. The instance of VLC running on the audio server can also write the SDP to a file. I write a separate SDP file for each unicast stream to a local directory. I use NFS (network file system) to export this directory to all my client computers. On each client I mount this directory and point VLC to "open" the SDP file. The client VLC finds all the info it needs about the stream in the SDP, and starts playing.
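As an illustration only, and not Charlie's actual scripts: the server side could look roughly like the sketch below. VLC's `--sout` chain and its `transcode`, `duplicate` and `rtp{...,sdp=file://...}` modules are real VLC features, but all addresses, ports, and paths here are placeholders, and exact option names can vary between VLC versions. The script only builds and prints the command so it can be inspected; drop the `echo` to actually run it.

```shell
#!/bin/sh
# Hypothetical server-side sketch: one VLC instance duplicating a
# 16-bit/48 kHz PCM stream into two unicast RTP streams, writing an SDP
# file per stream into a directory exported over NFS to the clients.
SDP_DIR=/srv/sdp          # exported via NFS to the R Pi clients
LEFT=192.168.1.101        # placeholder: fixed IP of the left-speaker R Pi
RIGHT=192.168.1.102       # placeholder: fixed IP of the right-speaker R Pi

SOUT="#transcode{acodec=s16l,channels=2,samplerate=48000}:duplicate{dst=rtp{dst=$LEFT,port=5004,sdp=file://$SDP_DIR/left.sdp},dst=rtp{dst=$RIGHT,port=5004,sdp=file://$SDP_DIR/right.sdp}}"

# Print the command for inspection; remove 'echo' to launch it for real.
echo cvlc --no-video alsa://hw:0 --sout "$SOUT"
```

Each `rtp{...}` destination writes its own SDP file, which is what the clients later open over the NFS mount.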

One small but very important detail about the RTP stream and its SDP file is that there is a long number that is the stream ID. Each time you stop and restart the RTP stream, a new ID is generated. For this reason, the clients must get real-time access to the SDP info. If they did not get fresh SDP info, the stream could be played at most once by the clients; then the ID would change and the client would return "stream not found".
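For reference, a minimal SDP file for such a unicast PCM stream looks roughly like this (all addresses are made up; the long number in the `o=` line is the session ID that changes on every restart, which is exactly why the clients need fresh SDP info):

```
v=0
o=- 1234567890123456 1 IN IP4 192.168.1.10
s=Left speaker stream
c=IN IP4 192.168.1.101
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 L16/48000/2
```

Here payload type 96 is a dynamic type that the `a=rtpmap` line maps to 16-bit linear PCM (L16) at 48 kHz, stereo.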

Also, I should note that all the machines in my system have a fixed IP address assigned to them by the router when they power up.

I've been meaning to put together a tutorial on streaming with some examples, but I have my hands in too many pies at this time so I haven't yet done that. Anyone who is really intent on banging their head against this challenge is welcome to contact me and I will try to help you get up and running.

Is the wireless system playback just like a wired connection?
The realities of WiFi can include brownouts (slowing of throughput) and dropouts (brief loss of connection), at least in the 2.4GHz band where I live. When I scan the network I see my neighbors' WiFi systems, their wireless printers, etc. all wanting one of the available channels. YMMV depending on the band you use and how crowded it is. When my streaming audio system starts playing, for the first few seconds (roughly 10 seconds) RTP is detecting and homing in on the correct playback rate from the incoming data stream. If there is a WiFi dropout and the stream is lost, my system will immediately attempt to reload it. What happens is that one channel mutes for a second before audio plays from the newly re-acquired stream; for the first few seconds one might perceive a subtle effect of the RTP syncing on certain types of instruments (e.g. piano or violin) if you listen closely. Increasing the amount of buffering can sometimes eliminate dropouts entirely, depending on how consistently the stream is being received, so this might not be a problem for you. With a strong wireless connection that stays up and at full speed (or a wired connection, but what's the fun in that?) this should not be a problem. Other than that, yes, it is like a wired connection playing non-compressed audio: in short, great!
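On the client side, the amount of buffering VLC keeps is controlled by its `--network-caching` option (in milliseconds). A sketch, again with placeholder paths and hostnames, that prints the commands rather than running them:

```shell
#!/bin/sh
# Hypothetical client-side sketch for one speaker: mount the NFS export
# that holds the fresh SDP files, then play "our" stream with a larger
# network buffer to ride out brief WiFi brownouts (at the cost of a
# little extra latency).
CACHE_MS=1500             # bigger = more dropout tolerance, more delay
SDP=/mnt/sdp/left.sdp     # placeholder path to this speaker's SDP file

# Printed for inspection; remove 'echo' to run for real.
echo mount -t nfs audioserver:/srv/sdp /mnt/sdp
echo cvlc --network-caching=$CACHE_MS "$SDP"
```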
 
Thank you, Charlie, for taking so much time to share all this information and progress.

I understand from your detailed explanations that you succeeded in implementing those 2 synchronized streams, but that it is not so straightforward. I also understand that, from a usage perspective, the main benefit is going wireless.

My set-up can cope with a central CPU plus cables to the speakers (at least one cable per speaker).

So I may not go that route....

Thank you again, Charlie, for sharing, and for your libraries for software DSP applications: that is still something I want to try.

JMF
 
I didn't mean to discourage you!

It might be easier to implement portions of the complete system in steps. Maybe you want to start with one computer doing all the DSP and use wires to connect to the loudspeakers. Once that is working you could get two more computers and then send the audio wirelessly to them. The DSP crossover part can stay the same. In this way you can slowly build the different parts of the system instead of trying to do it all at once. Just a suggestion.
 
To get good sync between speakers, look at PTP ("NTP vs PTP: Network Timing Smackdown!"). It's much more accurate.

I have worked with many wireless systems, and the most common issue is not getting the sources stable, it's getting them to the same stable timing every time. When the L/R timing shifts with system startup it's unsettling. The current generation (Sonos, Allplay, PlayFi) have a fixed and significant latency, usually about 500 ms, to allow the network buffers to fill and cover for lost packets. If the controls (play, pause, etc.) are not delayed, most users are completely unaware of the delay, unless you are syncing with video.
 
I don't have any of this hardware lying around:
https://en.wikipedia.org/wiki/List_of_PTP_implementations

Would be interested in learning about software implementations of PTP.

Anyway, you are correct. There is a delay of about 500 ms, or even more, between when I push "stop" and when the audio actually stops at the speakers when using my WiFi wireless playback system. But that's certainly something I can live with.
 
This suggests you don't need special hardware in a small system: "PTP in networks without timing support".
Much to learn.

In fact, you don't need any time synchronization, so neither NTP nor PTP is useful (as it turns out). I know we spoke about this and PTP at Burning Amp, and at the time I thought otherwise. I now know that I don't need to do any clock updating or synchronization, and my clients no longer have real-time clock hardware onboard, nor do they use any mechanism to adjust for local (kernel clock) drift. Timing is based entirely on the incoming stream and the embedded timestamps in RTP. This was actually the case all along, but because of other network and stream issues I was having, it appeared that using NTP and keeping clocks synchronized improved playback. Now I know that to be false.
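The arithmetic behind "timing comes from the stream" is simple: RTP timestamps count samples at the media clock rate, so the receiver can recover the sender's pace without any wall-clock agreement between machines. For a 48 kHz stream, the time between two packets is just the timestamp difference divided by the sample rate:

```latex
\Delta t = \frac{\Delta\,\text{timestamp}}{f_s}
\qquad\text{e.g.}\qquad
\frac{480~\text{samples}}{48\,000~\text{Hz}} = 10~\text{ms}
```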

A readable overview of RTP can be found here:
What is Real-Time Transport Protocol (RTP)?
and this excellent FAQ answers many questions about RTP:
RTP: Some Frequently Asked Questions about RTP
 
Hi Charlie,

Thank you for the additional information. I'm starting to see the principles behind your solution, but I can't get a feel for the performance from a time-accuracy perspective.

I understand that you have a Central Computer, that streams audio to each speaker: one unicast for Left channel data, to a R Pi in the left speaker, and one unicast for Right channel data, to a R Pi in the right speaker.

Each R Pi in the speakers performs the crossover calculations on the received audio channels, and then drives DACs using the USB ports.

Do I understand correctly?

I imagine that it is important to have constant latency between the sending of the data by the central computer and the signals at the input of the DAC:
- always the same delay,
- no jitter (not sure that's the right term in this case).

From your experiments, or theory, or feeling, is this time accuracy from the sender to all the DACs on par with what we would get with an SPDIF link, or with USB in adaptive mode (or worse, or better)?

Does it make a difference to perform the DSP calculations in the central computer instead of in the speakers' R Pis (fewer software processes, more real-time)?

I see that some people are ready to spend 10 k€ on products with very good clocks. I'm trying to assess how close we can get to the same results with cheap electronics and sound design.

Best regards,

JMF
 
Regarding sync, I understand that in a house-wide multi-room setup with speakers in each room, sync is important to avoid echoes, and it's enough to keep sync within a few tens of ms to get the job done.

But when it comes to getting a left speaker and a right speaker in sync, I am afraid we need a much tighter time frame. After all, a 1 ms time difference is worth roughly 30 cm of travel distance.
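That rule of thumb can be made concrete. Taking the speed of sound as about 343 m/s:

```latex
d = v\,t = 343~\tfrac{\text{m}}{\text{s}} \times 1~\text{ms} \approx 34~\text{cm},
\qquad
343~\tfrac{\text{m}}{\text{s}} \times 100~\mu\text{s} \approx 3.4~\text{cm}
```

So a sync error of a few hundred microseconds corresponds to shifting one speaker by only a few centimetres.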

Hence my question: how tight can that sync be made? 10 ms? 1 ms? 1 µs?
 
Regarding sync, I understand that in a house-wide multi-room setup with speakers in each room, sync is important to avoid echoes, and it's enough to keep sync within a few tens of ms to get the job done.

But when it comes to getting a left speaker and a right speaker in sync, I am afraid we need a much tighter time frame. After all, a 1 ms time difference is worth roughly 30 cm of travel distance.

Hence my question: how tight can that sync be made? 10 ms? 1 ms? 1 µs?

I have been able to get well below 500 µs, around 100-200 µs, in a system with two independent (left- and right-speaker) clients operating from a stable WiFi stream.
 
Does it make a difference to perform the DSP calculations in the central computer instead of in the speakers' R Pis (fewer software processes, more real-time)?

I see that some people are ready to spend 10 k€ on products with very good clocks. I'm trying to assess how close we can get to the same results with cheap electronics and sound design.

And on my previous question about the location of the DSP calculations in the architecture: what would be your idea?

JMF
 
I'm confused about why you would want to do this.

Is it just to stream to a pair of speakers, or is there a more compelling reason to do so - like some sort of digital crossover/dsp?

I have a streaming Pi-based pair of speakers using an IQaudio Pi-DigiAMP+ and Moode. It works great.
 
I have been able to get well below 500 µs, around 100-200 µs, in a system with two independent (left- and right-speaker) clients operating from a stable WiFi stream.

Looks very good indeed. Did you need to do something special to get this, or is it "standard" when using RTP streaming?

Btw, like others, I find RTP streaming a bit... scary... 😱 Especially the SDP stuff, and all you had to build on the server side to take care of the streaming... Not really for a beginner like me... 😱

How do I get the audio stream to each speaker and get them to play it back in synchrony?
This has also changed since I gave my talk. I now send a separate "unicast" RTP stream to each loudspeaker from my audio server. I have some shell scripts that do this for me. It's not super straightforward to explain here and there are several things going on. But the main actor is VLC, which streams multiple unicast RTP streams, one to each speaker. RTP is not "self-starting", meaning that the client needs to know about the payload in order to receive the stream. The info used to describe the stream is provided in a few lines of text in an SDP (Session Description Protocol) file by VLC. This can be provided to the clients in a couple of different ways, and previously I was using RTSP for this because it was easy and it mostly worked. But using RTSP does occasionally cause more problems than it solves, so recently I developed a new solution. The instance of VLC running on the audio server can also write the SDP to a file. I write a separate SDP file for each unicast stream to a local directory. I use NFS (network file system) to export this directory to all my client computers. On each client I mount this directory and point VLC to "open" the SDP file. The client VLC finds all the info it needs about the stream in the SDP, and starts playing.

One small but very important detail about the RTP stream and its SDP file is that there is a long number that is the stream ID. Each time you stop and restart the RTP stream, a new ID is generated. For this reason, the clients must get real-time access to the SDP info. If they did not get fresh SDP info, the stream could be played at most once by the clients; then the ID would change and the client would return "stream not found".

Maybe I would like to try something easier... I heard about forked-daapd as an AirPlay streamer, but I don't know if it would work...

http://ejurgensen.github.io/forked-daapd/
 
I'm confused about why you would want to do this.

Is it just to stream to a pair of speakers, or is there a more compelling reason to do so - like some sort of digital crossover/dsp?

I have a streaming Pi-based pair of speakers using an IQaudio Pi-DigiAMP+ and Moode. It works great.

Yes, it is to perform digital crossover. I'm trying to understand the pros/cons of each option for the location of the crossover calculations:
- crossover for both speakers in the central CPU (pro = synchronization, con = 4 streams to send to the speakers),
- crossover in an R Pi inside each speaker (pro = only one stream per speaker, and one output from the computer, which is the standard; con = synchronization).

JMF
 
Yes, it is to perform digital crossover. I'm trying to understand the pros/cons of each option for the location of the crossover calculations:
- crossover for both speakers in the central CPU (pro = synchronization, con = 4 streams to send to the speakers),
- crossover in an R Pi inside each speaker (pro = only one stream per speaker, and one output from the computer, which is the standard; con = synchronization).

JMF

There is NOT enough synchronization between streams to do the crossover centrally and then stream the crossover outputs (e.g. different streams for tweeter/midrange/woofer) to the loudspeakers, where they are rendered to the analog domain via a DAC. The phase angle between each of these streams will be constantly changing and, as a result, so will the frequency response around the mid-tweeter and woofer-mid crossover points. I do not recommend this approach.

There IS (or there can be) enough synchronization to stream two copies of the same stream, one to the right speaker and one to the left, and then do the crossover processing at the speaker. Within the R Pi (or whatever hardware you use for the software DSP) the individual channels stay 100% in sync. If you use USB hardware that gets its clock from the USB bus, then all of your USB output devices will also be in perfect sync. I use this approach, and I can recommend it to others.
 