A bash-script-based streaming audio system client controller for gstreamer

Why write some boring documentation when you can add more features??? :D

Previously I built the functionality whereby the user can run code (stored on either the client or the server) on the client during the launch and terminate operations for the gstreamer pipeline. I knew this would have some practical applications, and today I had one. Unfortunately I needed a little more flexibility, so I have expanded the possibilities: the user can now specify whether a given script should be run before or after the pipeline is launched or terminated.

For example, I'm testing out a new inexpensive ARM board for streaming audio. It has an onboard 24/192 codec, which is awesome. But I noticed that when I terminate the gstreamer pipeline to end streaming there are two "pops". These remind me of the sound you get when you unplug an RCA jack or something like that. I imagine this is not something I want fed to my speakers, especially the tweeter. But I discovered that if I muted the ALSA device before terminating the pipeline there was no popping sound. So, how to do this automatically as part of my termination process?

I needed to be able to MUTE the audio on the client BEFORE terminating the gstreamer pipeline. Likewise, to get sound back the next time, I needed to be able to UNMUTE the audio on the client AFTER the gstreamer pipeline was restarted. The muting and unmuting operations can be done from the command line using amixer. I created two scripts, mute.sh and unmute.sh, and put them on the SERVER. Then I set:
LOCAL_SCRIPT_AFTER_LAUNCH=./unmute.sh
and
LOCAL_SCRIPT_BEFORE_TERMINATE=./mute.sh
in the system configuration file.
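
For reference, the scripts themselves can be one-line amixer calls. Here is a minimal sketch written as two shell functions; "Master" is an assumed mixer control name (yours may differ; list the controls on your card with `amixer scontrols`):

```shell
# Minimal sketch of mute.sh / unmute.sh.
# "Master" is an assumed control name -- check yours with: amixer scontrols
mute_dac()   { amixer -q set "${CONTROL:-Master}" mute;   }
unmute_dac() { amixer -q set "${CONTROL:-Master}" unmute; }
```

mute.sh would simply call `mute_dac` and unmute.sh would call `unmute_dac`; the `-q` flag suppresses amixer's normal status output.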

This solved the problem and has completely automated the action. I think that this is very nice functionality and can be used for many different purposes to automate actions on the client that you would normally perform manually, like turning on amplifiers, or whatever.

The nice thing about using the LOCAL SCRIPT option is that if I have multiple clients in the system they can all use the same mute and unmute script, so I only need to write it once and all clients reference the same file. This keeps things centralized and tidy.

I came up with another program function that I hope to code next. Hopefully I can wrap it up and get going on the documentation after that.
 
Apologies for the delay... documentation is coming along, but I just spent a couple of days trying to track down the following problem:

I decided to try upsampling and streaming at 96kHz. When I did that, the audio would drop out about once a second. I spent a lot of time trying to debug my code until I started to look elsewhere.

I finally discovered that the problem was with MPD. I had set it to resample using the "highest quality sinc converter", that is in my mpd.conf file I have:
Code:
samplerate_converter = "0"
This had no problem upsampling from 44.1 to 48kHz (the SR I normally use) but resulted in the dropouts when upsampling to 96kHz. If I used a faster converter (type="2") there were no dropouts, but I wanted the highest quality possible so this was not an acceptable solution.

I decided to try using SoX to resample. You do not need to pipe to SoX; you can resample with SoX from within MPD simply by changing the setting to:
Code:
samplerate_converter = "soxr very high"
This will resample using SoX's highest quality resampling algorithm, which is very good. See:
SoX
SRC Comparisons
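
For anyone wanting to try both, the relevant mpd.conf fragment looks like this (keep only one line active at a time; the comments are my own annotations):

```
# mpd.conf -- choose one resampler:
samplerate_converter = "0"                # libsamplerate best sinc (heavy CPU)
#samplerate_converter = "soxr very high"  # libsoxr (much lighter on the CPU)
```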

What I found very striking is that when I changed to resampling via SoX the CPU utilization was only about 5-10%. The previous method, which uses libsamplerate, used just about 100% of one core on my machine when resampling 44.1 up to 48kHz. This is probably why it was crapping out when I tried upsampling to 96kHz. Either the SoX algorithm is much more efficient, or libsamplerate is very poorly optimized, or perhaps it was not compiled with any optimization.

Anyway, now that I have that sorted out I can back out a bunch of debugging statements and get back to finishing up the documentation.
 
Just FYI.

We discussed efficient libsoxr and libsoxr-lsr (its libsamplerate bindings) quite a while back elsewhere.

Inmate phofman prepared some interesting and to me very useful posts on his blog.

POST1

POST2

He also showed a great solution to speed up Pulseaudio resampling.
Pulseaudio also caused extreme loads with high-quality resampling in those days. I'm not sure whether this is still the case nowadays.

I also think it's a good idea to check the output spectrum, as phofman did.

Cheers
 
I believe that it is the libsoxr library (not libsoxr-lsr) that is being used by the version of mpd that is currently installed on my machine. In any case, it is left up to the user to choose a player and resampler.

I initially thought that the dropouts were being generated by the code I wrote due to some error, but in the end I determined that was not the case. Since I did not know about the posts you linked to and those issues, I thought I would mention it, because the CPU utilization dropped so significantly. Thanks for your reply about it. I hope these kinds of issues get sorted out with continued feedback from the community.
 
Looks like I will be able to release version 1.0 tomorrow. Just need to build a web page on my site for it. I will post a link here when it is ready for download.

For now, ponder the system requirements, below:

Requirements:
------------------------------------------------------------------------------
Software:
  • OS: some flavor of Linux on both server and clients
  • BASH shell (included with most Linux distros)
  • Audio subsystem: ALSA
  • ALSA Loopback (snd_aloop) enabled
  • gstreamer 1.x installed on server and client(s)
  • music player software, e.g. MPD on server
  • SSH: sshpass or ssh with shared public key
Hardware:
  • A "Server" computer with a static IP address
  • One or more "Client" computers with static IP addresses
  • A Local Area Network (wired or wireless)
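
Regarding the ALSA Loopback requirement: the snd_aloop module can be loaded on the fly with `sudo modprobe snd_aloop`, and made persistent across reboots with a one-line config file following the systemd modules-load convention (the file name is my choice, anything ending in .conf works):

```
# /etc/modules-load.d/snd_aloop.conf -- load the ALSA loopback driver at boot
snd_aloop
```

You can confirm the loopback card exists afterwards with `aplay -l`.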
 
Today I did a little qualitative latency test of the audio streaming system and I thought I would share the results here.

I streamed 16-bit, 48kHz audio over 2.4GHz WiFi to an ARM client with an integrated WiFi chip. I attempted to stream with the client latency set to 100, 60, 20 and 10 milliseconds. This sets the amount of buffering of the stream at the client side. I was able to play the stream at all settings, however, the lowest amount of buffering resulted in audible (but very brief, e.g. 50msec) dropouts. With the buffering set to 60msec, dropouts were very rare and by 100msec no dropouts could be detected.

I repeated the experiment using a 24-bit 96kHz stream. 10msec buffering resulted in frequent brief dropouts. Increasing to 20msec reduced their frequency to about once per 4-5 seconds on average. The 60msec and 100msec results were approximately the same as for the 16/48 stream, with no detectable dropouts or other audio artifacts at 100msec.

There is additional end-to-end latency resulting from playing the audio on the server, sending the stream, and rendering the audio on the client end, which seems to come to another 200-300msec. I don't currently have a good way to measure this, so this number is a guesstimate on my part. Overall this seems close enough to "real-time" that you can adjust the volume without much guesswork about when the change will come through. If a compressed audio codec were used this might reduce transmission time, but the time to compress and decompress the audio on each end might not improve overall latency, and I prefer to stream uncompressed audio in a format of my choosing.

It's important to note that this test was done using inexpensive WiFi gear. Using a wired LAN would allow for lower latency streams. The remaining latency is probably hardware (e.g. of the server and client) dependent.
 
1. The streams must be able to be rendered in sync. I use the RTP protocol for this purpose. RTP includes timestamps that are generated at the server side and these are used at the client side to control the rate that the stream is rendered to the local (internal) audio system, which in my case is ALSA.

...

4. I use NTP to synchronize the clocks of the server and all clients in my system. To provide low latency I added an NTP server to my network using a Raspberry Pi Zero. This is set up with the standard polling of some regional NTP timeservers. The goal is NOT to get accurate time/date on each client, but rather to get their clocks running at the same RATE. The actual time of day is irrelevant. By setting my NTP server to poll internet timeservers on a long time scale, and setting all local computers to poll my server on a very short timescale, everything can be kept in sync to within a couple hundred microseconds or better. For example, two clients in one of my systems are currently showing -0.038 msec and -0.044 msec from true time. That's a difference of only 0.006 msec or 6 microseconds. That is really pretty darn good.
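
The polling scheme described above can be expressed in a few lines of ntp.conf. This is only a sketch: the pool hostname and LAN address are illustrative, and minpoll/maxpoll are base-2 exponents (4 = 16 s, 10 = 1024 s):

```
# On the local NTP server (e.g. the Pi Zero): poll the internet rarely
server 0.pool.ntp.org iburst minpoll 10 maxpoll 12

# On each audio client: poll the local server frequently
server 192.168.1.10 iburst minpoll 4 maxpoll 5
```

Running `ntpq -p` on each client shows the current offset and jitter against the local server.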

Hi Charlie,

I am confused about sync. You say you use NTP, but NTP is not mentioned as required software in the GSASysCon instructions for use.

Can you please clarify this: does gstreamer alone allow synced multiple speakers, or is NTP also required?
 

NTP is not required, but in my opinion it helps with sync between multiple systems. When streaming, there is the stream buffer that needs to be filled and emptied at the right rate, but additionally the computer where the audio is rendered must also send the audio out of its DAC at the same rate as the other clients. In my systems I use DACs running in adaptive mode. Their rate is set by the rate at which data is sent to their buffer from the host computer. So the host computer's clock should be running at the same rate as all the others in the system. This is where NTP helps.

The stream buffering done by the program receiving the stream can get synchronization from the stream itself (RTP timestamps only), or gstreamer can compute a rate and drift, or just a rate and assume the drift is zero. These different modes of operation are what you can set in the config file for my program. There is some scant documentation on all of this from gstreamer. But this will not affect the DAC rate. Multiple asynchronous DACs in the system will eventually drift apart because there is no way to synchronize their clocks. This is why I use NTP and adaptive mode DACs, which indirectly get their clock rate from the host computer.

You are welcome to experiment with the settings and report back your findings. I wanted to make this available so that the user can choose which is best. At the same time I did incorporate a gstreamer element that tries to create a fixed rate output based on the local clock of the client and this is another reason to use NTP.
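
To make the modes concrete, here is a hedged sketch of a client-side receive pipeline of the kind described above. The port, caps, and latency values are illustrative assumptions, not anything my program hard-codes; `mode=synced` is the rtpjitterbuffer mode that assumes sender and receiver clocks are synchronized (e.g. via NTP):

```shell
# Illustrative receive pipeline: 16-bit/48k stereo RTP in, ALSA out.
# All values below are assumptions for the sketch, not program defaults.
PORT=4444
LATENCY_MS=100   # client-side buffering, in milliseconds
PIPELINE="udpsrc port=$PORT caps=\"application/x-rtp,media=audio,clock-rate=48000,encoding-name=L16,channels=2\" \
 ! rtpjitterbuffer mode=synced latency=$LATENCY_MS \
 ! rtpL16depay ! audioconvert ! audiorate ! alsasink"

# Print the full command for inspection / copy-paste:
echo "gst-launch-1.0 $PIPELINE"
```

The audiorate element fills in or drops samples so the output rate stays constant even when packets are lost.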

I welcome comments on all of this.
 
When all system clocks are running at the same rate, their USB busses are also running at the same rate, and the USB adaptive mode DAC clocks are all running at the same rate.

Sorry for pulling up this older thread. The USB adaptive DAC clocks are derived from hardware clocks PLLed in the chipset, while the "system clocks" that NTP/PTP discipline are purely software-based; they do not tune the hardware clock generator of the PC.

However, it is likely/possible the gstreamer chain drops/inserts samples to align with the time marks, regardless of the actual playback speed, virtually synchronizing the USB clocks on different devices.
 

In my own system I use my streaming audio controller (GSASysCon) to send PCM audio from a "server" computer to a bunch of "client" computers. Each computer has its own clock, and some types of DACs have their own clocks, too. Getting all of these into sync, or close to it, seems to give me good, reliable sound. I sync the clocks of the computers using NTP and a local GPS-based stratum 1 time server that I put together, plus a 3-computer stratum 2 server cluster. Typical jitter/wander of any computer within the audio network relative to the NTP server is on the order of 20-50 microseconds.

I use the gstreamer rtpjitterbuffer mode (mode 4) that assumes that sender and receiver clocks are well sync'd. I also use the gstreamer element audiorate. This inserts blank samples or removes extraneous samples from the as-received stream on each client to maintain a constant sample rate. This is important because without it, if/when streamed packets are dropped they are just eliminated and the stream "speeds up".

Internally I may or may not process the audio stream on each client using ecasound and ALSA (to implement a DSP crossover). Then the audio data is sent to DACs, and I may use more than one on each client. I need some way to control the DAC playback rate, and the only way I can do that under USB is to use a DAC with USB adaptive mode. I am not exactly sure how it derives the clock, but it is from the USB bus clock, which I believe is coming from the system clock and is therefore synchronized to other clocks throughout my audio network.

Using this approach seems to keep everything sync'd together well. I have also used the "server" computer as the NTP server for the system, and that can also work OK. The biggest challenge is that CPU load and environmental temperature both influence the clock rate, and because of the stream load on the LAN port of the server there is much more jitter in the responses to NTP queries. So while that approach is workable, the overall jitter is higher, and wander is much higher because of the influence of temperature. Using a GPS-based NTP server is best, because temperature is not a factor for it. There is still the possibility that suddenly heating or cooling a client will make its jitter increase temporarily. But mostly this is because I use the boards without any case, so the SoC is just sitting right there in the air...

I've been using this approach for a year or two and I am very happy with it.
 
Charlie, thanks for the details with numbers. I am not saying the streaming does not work (it does as your system proves), I just wanted to clarify the source of the USB clock.

I also use the gstreamer element audiorate. This inserts blank samples or removes extraneous samples from the as-received stream on each client to maintain a constant sample rate. This is important because without it, if/when streamed packets are dropped they are just eliminated and the stream "speeds up".

Yes, that is the key synchronization element.

I am not exactly sure how it derives the clock, but it is from the USB bus clock, which I believe is coming from the system clock and is therefore synchronized to other clocks throughout my audio network.

The USB bus is clocked by a hardware oscillator in your chipset/SoC. It is unrelated to the "system clock" - that is a purely software abstraction, running on the CPU plus likely some HPET to time the "ticks". But since your chain does the above-described synchronization between the required timing (time marks in the RTP stream plus a precisely synchronized system clock) and the real playback (the USB controller's internal hardware clock), the system experiences no buffer under/overruns.

It is just important to note that the playback is not 100% bit-perfect, but IMO that is no problem at all; it would be inaudible in a PCM stream. A non-PCM stream (DD, DTS) would be a problem, but that is not used in this scenario.

I've been using this approach for a year or two and I am very happy with it.

Great job, your work has helped a lot to many others.
 
Thanks for clarifying the clock source for the USB! I did not realize there was a separate source for that. So I will have to rethink WHY it seems to work (e.g. different systems maintain synchrony).

I am still working on the bash script, making small changes and improvements here and there. I will eventually release an update.
 
It works because the gstreamer plugin performs asynchronous reclocking (adding/removing samples as needed) between the incoming stream, timed by the synchronized system clock, and the output DAC clock (the USB controller clock). Very likely it would work for an asynchronous USB DAC too (clocked by the DAC itself).