LADSPA plugin programming for Linux audio crossovers

I'm not planning to do anything in ALSA. That just seems way too complicated.

It has been configured in the other thread with no resampling and supporting any sample rate. But there are always many ways, fair enough.

Anyway, that wouldn't work in my case because the card is open all the time in my application, even when there is no audio coming in.

Your recording card is open constantly; your playback device (the snd-loop chain) is opened by the playback application only.

RTSP is the answer to your question #2.
How will you synchronize, to millisecond precision, the two RTP streams coming to two clock-independent devices? I really do not know; IMO it is not so simple, especially considering the clock drift. A few milliseconds of time difference will ruin the stereo image.
 
It has been configured in the other thread with no resampling and supporting any sample rate. But there are always many ways, fair enough.
Your recording card is open constantly; your playback device (the snd-loop chain) is opened by the playback application only.
No, that is not how my implementation works. The server, once I start a session, is always streaming, even if that "stream" is silence. At the client, the playback device is also always "on" and playing this stream. For instance, I can create a playlist of a few tracks on the server. When the playlist is over, the music stops but the stream continues. When the audio signal is no longer present at the client, I want the amps to shut down after a short delay. ALSA won't help in this case, because it still sees the stream open and audio coming through. It has no awareness of the audio signal level, which is what I use to determine when to power the amps up or down.

Anyway, like you mentioned, there are several approaches one could use. I'm choosing the LADSPA plugin based one.
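To make this concrete, here is a bare-bones sketch in C of the kind of level detection I'm talking about, written in the style of a LADSPA run() callback. The threshold and hold time below are placeholders for illustration, not values from any existing plugin:

```c
#include <math.h>

/* Hypothetical signal-level detector: track the RMS level of each
 * incoming block and decide, after a hold-off delay, whether the
 * amps should be powered. Threshold and hold time are illustrative. */

#define ON_THRESHOLD   1.0e-4f   /* linear RMS level (~ -80 dBFS) */
#define HOLD_SECONDS   30.0f     /* silence needed before power-down */

typedef struct {
    float sample_rate;
    float silent_seconds;  /* accumulated silence so far */
    int   amps_on;         /* current power state */
} LevelDetector;

void detect_run(LevelDetector *d, const float *in, unsigned long n)
{
    float sum = 0.0f;
    unsigned long i;

    for (i = 0; i < n; i++)
        sum += in[i] * in[i];

    if (sqrtf(sum / (float)n) > ON_THRESHOLD) {
        d->silent_seconds = 0.0f;
        d->amps_on = 1;                     /* signal present: power up */
    } else {
        d->silent_seconds += (float)n / d->sample_rate;
        if (d->silent_seconds > HOLD_SECONDS)
            d->amps_on = 0;                 /* long silence: power down */
    }
}
```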

How will you synchronize, to millisecond precision, the two RTP streams coming to two clock-independent devices? I really do not know; IMO it is not so simple, especially considering the clock drift. A few milliseconds of time difference will ruin the stereo image.

I don't agree with your assertion that a couple of milliseconds' difference in the playback timing between one speaker and the other in a stereo pair will "ruin" the stereo image. Just think about it. In one millisecond sound travels only about 0.35 m, or about 1 foot. If you are listening to your speakers and walk a few feet to one side, is the stereo image "ruined"? Definitely NOT. You need timing differences on the order of 20 milliseconds or more before you really notice any difference, and it is not an abrupt change from "perfect stereo" to "ruined" but rather a subtle awareness of an echo (once you get up to several tens of msec) that becomes more easily discernible the longer the delay.

Under RTSP the client frequently queries the stream server and then updates its time information about the stream. There is no need for tight clock synchronization at the outset, because the playback timing is constantly adjusted to maintain sync by varying the playback speed on the client. This offsets differences in clock speed between clients and between the server and clients. If you stop and then re-start the stream on a client, it will be back in perfect sync after a few seconds. Even more interesting: you can briefly (e.g. for 2 sec) pause a client and then restart it (you would not normally do this; the action intentionally generates a large playback timing difference). The client resumes playing the stream (which it has been buffering) and will be, in this example, 2 seconds behind. Over the next 30 seconds it will "catch up" to the other clients that have been continuously playing, and will again be in sync. In my player, the maximum amount of time that the client will try to make up in this way is a user-defined setting. If this is exceeded it will simply pick up the stream fresh from the server, just as if it were joining the stream for the first time.
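Just to illustrate the general mechanism (this is not VLC's actual code; all names and constants are made up):

```c
/* Sketch of a client-side catch-up loop: compare the local playback
 * position against the server-reported stream time and nudge the
 * playback rate fed to the resampler. Constants are illustrative. */

#define CATCHUP_SECONDS 30.0   /* horizon over which errors are worked off */
#define MAX_RATE_DEV    0.08   /* cap on playback-speed deviation (8%) */

double update_playback_rate(double server_time, double client_time)
{
    double error = server_time - client_time;   /* > 0: we are behind */
    double rate  = 1.0 + error / CATCHUP_SECONDS;

    /* Clamp so the speed change stays modest. */
    if (rate > 1.0 + MAX_RATE_DEV) rate = 1.0 + MAX_RATE_DEV;
    if (rate < 1.0 - MAX_RATE_DEV) rate = 1.0 - MAX_RATE_DEV;
    return rate;   /* 1.0 = nominal speed */
}
```

With a 2-second error this plays slightly fast and works the error off over roughly the 30-second window described above.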

It all seems to work very well from my experience, all over my WiFi network.


 
Very interesting! I notice that multiple AirPlay destinations do not maintain good sync; the timing differences between destinations are obvious. So AirPlay isn't ideal for 'Rockin' Party House' situations, which many of us secretly wish we needed, regardless of the age of our friends!
 
I don't agree with your assertion that a couple of milliseconds' difference in the playback timing between one speaker and the other in a stereo pair will "ruin" the stereo image. Just think about it. In one millisecond sound travels only about 0.35 m, or about 1 foot. If you are listening to your speakers and walk a few feet to one side, is the stereo image "ruined"? Definitely NOT. You need timing differences on the order of 20 milliseconds or more before you really notice any difference, and it is not an abrupt change from "perfect stereo" to "ruined" but rather a subtle awareness of an echo (once you get up to several tens of msec) that becomes more easily discernible the longer the delay.

Look at the results of this experiment: Realizability of Time-Difference Stereo Using the Precedence Effect. The figures are in sample times, i.e. one unit = 22 microseconds.

On good recordings of solo piano you can easily tell the higher tones spatially apart from the lower ones - the width of the piano.
 
Look at the results of this experiment: Realizability of Time-Difference Stereo Using the Precedence Effect. The figures are in sample times, i.e. one unit = 22 microseconds.

On good recordings of solo piano you can easily tell the higher tones spatially apart from the lower ones - the width of the piano.

Since you can't have any knowledge or measurements of the performance of my system, it seems you are assuming that there must be some significant timing flaws that should be obvious or cause obvious issues with the reproduction. If you are simply trying to point out that there MAY be issues IF timing differences are not controlled below SOME threshold, then you are right, and thanks for mentioning it. That's certainly a valid point. But from the way you have framed your posts I didn't get the impression that this was the intended message. You seem to have a general skepticism that this approach can work successfully, when I am saying that I have actually used this setup myself and it seems to work fine on a very middle-of-the-road WiFi network.

All I can really do to answer this line of inquiry is refer to my previous statements explaining my setup, which seems to work fine for me. The academic paper you linked to, while valid in and of itself, gives some clues about the limits for timing differences, although I think these are more likely 20 MILLIseconds than 20 microseconds, based on what I know about the precedence effect. You are welcome to try it out and see if you detect problems, and you are more than welcome to come back and report your observations here.

 
Since you can't have any knowledge or measurements of the performance of my system, it seems you are assuming that there must be some significant timing flaws that should be obvious or cause obvious issues with the reproduction.

I am not making any assumptions. I am asking what technology you use to maintain synchronization and time alignment between the right and left channels. The paper I pointed to shows it is in the single milliseconds (the charts show that 100 x 20 us = 2 ms was already significant). I know of only one open-source technology that maintains synchronization between networked, independently clocked soundcards: netjack. There are certainly proprietary technologies used in the industry. Networked speakers are an interesting topic, but IMO it is pretty complicated for time-aligned setups.

To put this into perspective: why did you strive for sub-sample delays in your crossovers, when a second set of speaker drivers not so far from the first set could be delayed basically at random and yet make no difference in the resultant sound?
 
Why did you strive for sub-sample delays in your crossovers, when a second set of speaker drivers not so far from the first set could be delayed basically at random and yet make no difference in the resultant sound?

Varying time differences in the signal sent to each driver WITHIN one speaker will change the phase relationship between the drivers. In and around the crossover region this will introduce interference (destructive or constructive) between the sound produced by the drivers, which will alter the frequency response. So changing the time delay between drivers is to be avoided, because even small delays will likely change the frequency response significantly, and this will be noticeable as timbre changes, etc. This is why you would never want to send the crossover-processed audio to each driver as a separate network stream.
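To put numbers on it: the phase error a fixed delay produces grows with frequency, phi = 360 * f * dt degrees. A quick check (the 2 kHz crossover point is just an example I picked):

```c
#include <stdio.h>

/* Phase error introduced between two drivers by a time offset:
 * phi = 360 * f * dt (degrees). A 2 kHz crossover frequency is
 * assumed purely for illustration. */
int main(void)
{
    double f = 2000.0;                 /* crossover frequency, Hz */
    double dt[] = { 0.0001, 0.00025 }; /* 0.1 ms and 0.25 ms offsets */

    for (int i = 0; i < 2; i++)
        printf("%.2f ms -> %.0f degrees at %.0f Hz\n",
               dt[i] * 1000.0, 360.0 * f * dt[i], f);
    /* 0.10 ms -> 72 degrees; 0.25 ms -> 180 degrees, i.e. full
     * cancellation for drivers that were otherwise in phase. */
    return 0;
}
```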

On the other hand, when there are time delays between two loudspeakers this can influence how the brain processes the sound and recreates a stereo "scene". There are some cute demos on the web showing that even a few milliseconds can shift the apparent position of the stereo image to the right or left. All I can say is that I really haven't experienced this kind of thing in my own setup. That may imply that the synchronization between clients is within a couple of milliseconds or so.

As for your questions of "how" and "what technology" I suggest you read up on RTSP and RTP.
 
I did my homework and studied RTP/RTCP. Great reading on the principles is RTP > Designing IP-Based Video Conferencing Systems: Dealing with Lip Synchronization and the following pages.

Still, I could not find real-world figures for the achievable precision of playback synchronization. Here is a VERY interesting paper, http://www.genelec.com/documents/publications/ICME_2009.pdf, quoting:

He found that lip synchronization tolerated up to 80 ms jitter between the visual and auditory signals to be imperceptible by human recipients. In other multimedia scenarios jitter for good synchronization quality ranged from 500 ms (loosely coupled audio, such as speaker and background music) to 11-20μs (tightly coupled audio, such as stereo channels creating an auditory image)

Also quoting:

However, RTCP is not designed for high precision playback synchronization of tens of microseconds between multiple recipients.

Unfortunately this report does not have any open-source code associated with it...

Measuring the time difference between the analog outputs of two RTSP clients sounds like a fun project. What RTSP server/receiver do you use, please? I do not have a Pi, but three x86 machines should do.
 
Measuring the time difference between the analog outputs of two RTSP clients sounds like a fun project. What RTSP server/receiver do you use, please? I do not have a Pi, but three x86 machines should do.

Thanks for the links. I will give them a read today.

I use VLC as both the server and player. It's not perfect, but it is free and has a lot of built in functionality both in its GUI based and command line versions.

As you have discovered, there doesn't seem to be anything published regarding the timing performance of RTSP/RTP. If you could do some testing with your machines, that would be awesome. I have been thinking about how I might try to do that here as well. Then we would have some info for a couple of different LANs, which would be typical for in-home distributed (and possibly wireless) audio. If you have any thoughts on testing methodology, feel free to post them.
 
On the testing methodology - I am thinking of measuring the phase shift between pure sine signals. I would start at a few tens of Hz (a period of tens of milliseconds) to make sure the time difference does not span more than one period of the signal. Increasing the frequency in subsequent measurements should improve the measurement precision, until the shift exceeds one period. I assume the time shift will not fluctuate within each measurement cycle, after letting RTCP do its work for a while.

My "code-upgraded" rigol entry level scope should be able to measure the phase shift with enough time precision (500Msps for each of the two channels).

The plan is set, now to find the time. It will take a while, I confess :)
 
OK, I did some RTSP client delay measurements using ARTA. I set up two Raspberry Pi 2s as RTSP clients and a laptop as the RTSP server. All of these were set up in my garage, which is not a very good area for signal strength, but I have the space to spread out. I used an outboard FireWire soundcard to capture the data. I sent the signal from ARTA out from the laptop using RTSP; it travels wirelessly to my router and then back down to the Pis, out through the DACs, and into the soundcard. One input channel was taken from the DAC connected to Pi #1 and the other from the DAC connected to Pi #2. In ARTA I used a 2-channel capture mode that uses the signal on one channel to trigger acquisition. From the captured data you can then easily get the differential group delay. I had to run the test manually, and I could do this about every 20 seconds or so.

I have to say there was more delay than I expected, and the amount of delay changed over time, minute to minute. The delay would remain stable for a couple of measurements and then the value would change in a stepwise fashion (i.e. not a slow drift). The delay was typically about 20-30 milliseconds or less, with occasional spikes to 50 ms. I am suspicious about the connectivity in my garage, so I plan to move indoors and do some more testing. I can try a hardwired connection to my router to take the WiFi out of the picture (I think I have a couple of free ports available), and if that looks better I will try WiFi again from a location that is not so distant. I only have these mini WiPi USB dongles on the Pis...

So, what to make of all of this... the numbers I measured are not as good as I was expecting from some listening tests I did, where everything seemed fine. I have to admit, however, that those tests did not use a setup ideal for critical stereo listening; in fact I was using a Pi on one side of the room and a PC on the other, and sitting halfway in between. On the other hand, as I have been saying, even a 20 millisecond L-R difference in stereo may tweak the soundfield without being obvious or all that objectionable (to some). Would I prefer a lower and more stable delay? Yes, definitely. But as it stands now the system is still workable. I will post again when I have some new measurements to talk about.
 
Thanks for the measurements. Honestly, IMO tens of ms of difference is not recommendable. According to the scientific experiments it is several orders of magnitude more than it should ideally be. I understand why those Finnish guys said plain RTSP is not suitable for "swarm" setups.

But again thanks a lot for the measurement.

Even though the guys mention the Squeezebox protocol as being designed for tight synchronization, I am not that convinced. It certainly has some features built in http://wiki.slimdevices.com/index.php/SlimProto_TCP_protocol but squeezeplay does not do any rewriting of the ALSA soundcard buffer; it is a standard interrupt-driven ALSA player like e.g. mpd, sox, etc. If the period in the L-channel player is 10 ms and in the R-channel player 20 ms (something the server has no control over), I do not think the two channels could play with a time difference below 1 ms.
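To show what I mean by the period limiting the timing (example numbers only):

```c
#include <stdio.h>

/* An interrupt-driven ALSA player is woken once per period, so its
 * timing granularity is period_frames / sample_rate. The values are
 * arbitrary, chosen to match the 10 ms / 20 ms case above. */
int main(void)
{
    unsigned rate = 44100;
    unsigned periods[] = { 441, 882 };  /* frames per period */

    for (int i = 0; i < 2; i++)
        printf("%u frames @ %u Hz = %.1f ms granularity\n",
               periods[i], rate, 1000.0 * periods[i] / rate);
    /* 441 -> 10.0 ms, 882 -> 20.0 ms: with unequal periods the two
     * players cannot be expected to align below the millisecond. */
    return 0;
}
```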

I would trust the pulseaudio protocol since PA regularly modifies the playback buffer just ahead of the DMA reading pointer of the soundcard.

OK, I will do the test with PA :)

Yet adding all the filters into the chain will likely ruin the low-level synchronization, IMO.
 
Extremely interesting thread.

Some questions for Charlie:
1. Earlier you indicated a desire to use the HDMI output + multi-channel audio extractor. In other posts not necessarily in this thread you indicated that this is not the preferred way to output audio from RP2 and USB DAC is better. Can you kindly step me through the pros and cons?

2. In an implementation that requires multiple clients to be synchronized, how tight does the synchronization need to be? I'm thinking through a patio system where each client is a mono channel, but with 2-way crossover. A listener would be able to hear sound from multiple speakers in this scenario.

3. Some time ago I standardized on Squeezebox hardware, which offers "good" synchronization and supports iTunes libraries. If you're in a testing mood, would there be any interest in seeing how good Squeezebox synchronization is? I'll drive....

4. For a complete system, how do you propose to control volume?

I saw/heard an earlier version of your system at the last NorCal DIY meet and was mightily impressed. Nice to see that development continues.

Thanks and regards,

Rob
 
Hi Rob - see my comments, below:

Extremely interesting thread.

Some questions for Charlie:
1. Earlier you indicated a desire to use the HDMI output + multi-channel audio extractor. In other posts not necessarily in this thread you indicated that this is not the preferred way to output audio from RP2 and USB DAC is better. Can you kindly step me through the pros and cons?
All I can say is that after trying for about 10 days to get multichannel HDMI audio working reliably, I gave up. Too bad, since I bought two HDMI extractors! You are welcome to borrow one to try things on your end, if you will eventually return it. One thing working against me was that I used a DVI-D monitor with an HDMI-to-DVI adapter; the Pi sees DVI and turns off the HDMI audio channels. I tried editing the config.txt file in all sorts of ways, but I could only manage 2-channel output via HDMI. In any case, this did not end up being a game changer, because the big question mark in all of this was how good the audio quality out of the HDMI audio extractor would be. I never got to find out, but no specs were provided, so my expectations were not all that high.

So my next experiment was to add a USB to SPDIF dongle to the Pi. This was an immediate success. The dongle was recognized on bootup and I could send SPDIF out. I had this setup supplying audio to a system that had a built in miniDSP (the one you saw last year) but I have since used the Pi for my LADSPA crossover stuff (this thread's topic). I might go back and revisit the SPDIF system while I am checking into this whole latency/synchronization thing...

2. In an implementation that requires multiple clients to be synchronized, how tight does the synchronization need to be? I'm thinking through a patio system where each client is a mono channel, but with 2-way crossover. A listener would be able to hear sound from multiple speakers in this scenario.
Honestly, in casual listening you might not notice much. Even when it is "not well sync'd" it's still within about 50 msec, and only for a short while. Actually, this can give mono sources a "fake stereo" effect, so you might enjoy it!

3. Some time ago I standardized on Squeezebox hardware, which offers "good" synchronization and supports iTunes libraries. If you're in a testing mood, would there be any interest in seeing how good Squeezebox synchronization is? I'll drive....
Once I have the time to do some more measurements on my setup I might take you up on your offer. I recall hearing that the Squeezebox has pretty good synchronization, but I don't have any first-hand experience with it.

4. For a complete system, how do you propose to control volume?

I saw/heard an earlier version of your system at the last NorCal DIY meet and was mightily impressed. Nice to see that development continues.

Thanks and regards,

Rob

If latency can be made low enough, you can control volume at the source. When I reduced the buffering within the various processing stages of my recent streaming system to 100 msec, I could get a real-time latency of about 0.75 seconds, which was a tolerable lag when changing volume, stopping or starting a track, etc. Again, I need to experiment more with this and figure out exactly where buffering is needed and how much. This likely depends on how reliable the connection is (e.g. is the WiFi signal strong?) and other factors, and there may not be a hard and fast rule that applies universally.

Finally, glad you liked my system last year. I'm adding a large OB sub to fill out the low end and plan to redo the baffles and driver mounting for this year's version. I'm still building it, and hopefully I won't run into any snags!
 

I just found this link, in which the software clocks on two Pis were tested:
http://blog.remibergsma.com/2013/05/12/how-accurately-can-the-raspberry-pi-keep-time/

The drift was found to be about 50 msec over 10 minutes. This seems to indicate that software clock drift can't account for the amount of timing error (jitter?) that I observed previously. I'm thinking it is coming from the transport or from ecasound processing. I will have to look into this in a little more detail later.
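A quick back-of-the-envelope check on that figure (nothing rigorous):

```c
#include <stdio.h>

/* Back-of-the-envelope: 50 ms of drift over 10 minutes, expressed as
 * ppm and as sample slip at 44.1 kHz. */
int main(void)
{
    double drift_s  = 0.050;   /* measured drift */
    double window_s = 600.0;   /* 10 minutes */
    double ppm      = drift_s / window_s * 1e6;

    printf("drift = %.0f ppm -> %.1f ms/minute, %.1f samples/s @ 44.1 kHz\n",
           ppm, drift_s / window_s * 60.0 * 1000.0, ppm * 1e-6 * 44100.0);
    /* ~83 ppm -> ~5 ms per minute: far too slow to explain 20-30 ms
     * step changes between measurements taken ~20 s apart. */
    return 0;
}
```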
 
I have been thinking of adding an RTC module for a different project (an openmediavault file server). Perhaps adding an RTC could help with the audio too. Here are the how-to instructions: RPi 2 (RTC module): from unruly Stepchild to Wunderkind - General - OpenMediaVault

The author reports good improvements for real-time charts, etc.

Another interesting note that maybe can add some ideas to explore is the concept explained in the shairport-sync wiki: "A much simplified version of NTP clock synchronisation protocols is used to keep the Shairport Sync local clock in sync with the source to well within a millisecond. All timings are made relative to that local clock.
As well as getting more accurate timing information from the source, Shairport Sync gets accurate timing information from the output device (via ALSA in Linux), so it can work out the exact time difference between incoming and outgoing audio."
https://github.com/mikebrady/shairport-sync/wiki
 
I also read somewhere that the BeagleBone Black supports hardware timestamping, which means that PTP could be implemented without a software-only config. How to set up ptpd, pulseaudio, etc. in Linux is far beyond my current knowledge level. But I'm starting to warm to the idea that the BeagleBone is a better platform to start from, apart from video performance, which I don't care about.

Regards,

Rob
 
I have been thinking of adding an RTC module for a different project (an openmediavault file server). Perhaps adding an RTC could help with the audio too.

From what I have read recently, adding an RTC won't improve the kernel clock. The RTC is read at startup to set the kernel clock with the time and date stored in the hardware, and then the OS does not refer to it again until the next boot. The kernel clock would continue to drift as before. An RTC is most useful when the system is rarely (or at least not continuously) connected to the internet.

I also read that the software clock in Raspbian tries to slowly "catch up" over an hour or more. See the top answer to this question:
http://raspberrypi.stackexchange.com/questions/4370/where-does-the-raspberry-pi-get-the-time-from
This might explain the problems I have observed in my measurements, e.g. if the software clock is continually adjusting itself during playback. I might try setting the time manually from a time server the next time I make playback sync measurements.
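If I want to verify that, one way (I believe) would be to poll the kernel clock discipline with adjtimex(2) while a measurement runs; a minimal sketch, assuming Linux/glibc:

```c
#include <stdio.h>
#include <sys/timex.h>

/* Poll the Linux kernel clock discipline via adjtimex(2) to see
 * whether NTP is slewing the clock during playback. Diagnostic only. */
int main(void)
{
    struct timex tx = { 0 };   /* modes = 0: read-only query */

    if (adjtimex(&tx) == -1) {
        perror("adjtimex");
        return 1;
    }
    /* tx.freq is the frequency correction in ppm, scaled by 2^16;
     * tx.offset is in microseconds (ns if STA_NANO is set in status). */
    printf("freq adjustment: %.3f ppm, offset: %ld, status: 0x%x\n",
           tx.freq / 65536.0, tx.offset, tx.status);
    return 0;
}
```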
 