Switching Alsa Loopback Timer Source to Playback HW Device

By default, the alsa loopback software device is clocked by a system timer. Chaining it with any hardware soundcard thus introduces two clock domains which must be aligned somehow. Fortunately, the loopback offers alsa controls for fine-tuning its internal software clock, which alsaloop and, more importantly, CamillaDSP make use of. The bridging SW must keep track of the clock difference between the devices on both sides and, using some controller (basically a PID regulator), fine-tune the loopback clock so that the production and consumption rates are equal on average. But this controller has delays and can overshoot, which is handled by keeping the bridge latency sufficiently long, an undesired effect.
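To make the regulator idea concrete, here is a minimal sketch of a PI controller of the kind described above. It is not the actual regulator in CamillaDSP or alsaloop; the struct name, the gains and the sign convention (output is a relative capture speed, 1.0 = nominal) are assumptions for illustration only.

```rust
/// Hypothetical PI controller for a bridge that trims the loopback clock.
/// Not the real CamillaDSP/alsaloop code; names and gains are made up.
struct RateController {
    target_level: f64, // desired playback buffer fill, in frames
    kp: f64,           // proportional gain
    ki: f64,           // integral gain
    integral: f64,     // accumulated relative error
}

impl RateController {
    fn new(target_level: f64, kp: f64, ki: f64) -> Self {
        Self { target_level, kp, ki, integral: 0.0 }
    }

    /// Takes the measured playback buffer level (frames) and returns the
    /// relative speed to request from the capture side (1.0 = nominal).
    /// A buffer fuller than the target means capture runs too fast,
    /// so the requested speed drops below 1.0, and vice versa.
    fn update(&mut self, measured_level: f64) -> f64 {
        let err = (measured_level - self.target_level) / self.target_level;
        self.integral += err;
        1.0 - self.kp * err - self.ki * self.integral
    }
}
```

The integral term is what removes a constant rate offset between the two clocks, but it is also the part that reacts late and overshoots, which is why the bridge needs some spare buffer (latency) to ride it out.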

I noticed a feature added to the loopback kernel code in 2019 https://patchwork.kernel.org/projec...1120115856.4125-1-andrew_gabbasov@mentor.com/ . The loopback timer can be switched to a virtual timer provided by any running alsa device. Technically it's quite simple: when the master device signals the end of its period, the loopback runs its end-of-period code too. As a result the loopback and the master device run absolutely synchronously.

The timer is configured either at module load (param timer_source), or dynamically via the read/write file /proc/asound/LoopbackX/timer_source. A configured or changed timer is applied only when the loopback device is opened, which is important to keep in mind.
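For doing the dynamic switch from a program instead of the shell, a minimal sketch using only the Rust standard library could look like this. The function names are hypothetical; the only thing taken from the text above is the procfs path layout. The write needs root, and the new source only takes effect the next time the loopback device is opened.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Builds the procfs path of the timer_source file for a loopback card id
/// (e.g. "Loopback"). Hypothetical helper, not part of any library.
fn timer_source_path(card_id: &str) -> PathBuf {
    PathBuf::from("/proc/asound").join(card_id).join("timer_source")
}

/// Writes a new timer source string (an empty string selects the system timer).
/// A trailing newline is added to mirror what a shell `echo` would write.
/// Requires root; the change is applied when the loopback is next opened.
fn set_timer_source(card_id: &str, source: &str) -> io::Result<()> {
    fs::write(timer_source_path(card_id), format!("{source}\n"))
}

/// Reads back the currently configured timer source string.
fn get_timer_source(card_id: &str) -> io::Result<String> {
    let raw = fs::read_to_string(timer_source_path(card_id))?;
    Ok(raw.trim_end().to_string())
}
```

A call such as set_timer_source("Loopback", "hw:Gadget") would then be the equivalent of the echo command used in this post.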

I tested the feature on a chain with a USB gadget as the playback device and CamillaDSP as the bridge (no rate adjust, empty pipeline). Conveniently, the USB gadget allows tweaking its samplerate over a large frequency range with its alsa control while playing (this works via USB async feedback messages to the host).

I switched the timer source to the USB gadget playback device on the USB host (empty string means the system timer):

Code:
echo 'hw:Gadget' > /proc/asound/Loopback/timer_source

CDSP would not start processing due to a simple deadlock: the playback device waits for chunks from the capture side, but the capture device is stalled (because the playback device, its timer source, is not running) and cannot produce any chunk to start with. This was simple to solve by sending two kickstart chunks of zero samples from the capture side before the capture loop (one chunk did not hit the configured playback device buffer threshold for start up).


Diff:
--- a/src/alsadevice.rs    (revision 3981740d8a4f38f00a44de3abee77ffd4342b546)
+++ b/src/alsadevice.rs    (date 1705754133966)
@@ -682,6 +682,14 @@
         peak: vec![0.0; params.channels],
     };
     let mut channel_mask = vec![true; params.channels];
+
+    if !send_zero_chunk(&channels, &params) {
+        return;
+    }
+    if !send_zero_chunk(&channels, &params) {
+        return;
+    }
+
     loop {
         match channels.command.try_recv() {
             Ok(CommandMessage::Exit) => {
@@ -871,6 +879,19 @@
     params.capture_status.write().state = ProcessingState::Inactive;
 }
 
+fn send_zero_chunk(channels: &CaptureChannels, params: &CaptureParams) -> bool {
+    let waveforms = vec![vec![0.0; params.chunksize]; params.channels];
+    let chunk = AudioChunk::new(waveforms, 0.0, 0.0, params.chunksize, params.chunksize);
+    let msg = AudioMessage::Audio(chunk);
+    if channels.audio.send(msg).is_err() {
+        info!("Processing thread has already stopped.");
+        return false;
+    } else {
+        info!("Sent kickstart zeros to playback.");
+    }
+    true
+}
+
 fn update_avail_min(
     pcmdevice: &PCM,
     frames: Frames,

With this simple change CDSP starts running fine, and no matter how much I torture the playback device rate, loopback runs synchronously and CDSP keeps the playback buffer levels basically constant:

Code:
amixer -c UAC2Gadget cset numid=1 1000000
Momentary freq = 48000 Hz (0x6.0000)

2024-01-20 14:44:05.865793 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1584.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14944, tv_nsec: 337173553 }, values: [0.0, 0.0] })
2024-01-20 14:44:08.873720 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1584.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14947, tv_nsec: 345099643 }, values: [0.0, 0.0] })
2024-01-20 14:44:11.881425 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1584.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14950, tv_nsec: 352803979 }, values: [0.0, 0.0] })


amixer -c UAC2Gadget cset numid=1 875000
Momentary freq = 42000 Hz (0x5.4000)

2024-01-20 14:44:35.004469 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.428535622406 Hz
2024-01-20 14:44:35.956677 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1507.9, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14974, tv_nsec: 428059201 }, values: [0.0, 0.0] })
2024-01-20 14:44:36.028451 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.5191411824 Hz
2024-01-20 14:44:37.052470 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.34199663706 Hz
2024-01-20 14:44:38.076479 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.594277356846 Hz
2024-01-20 14:44:38.979670 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1508.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14977, tv_nsec: 451050650 }, values: [0.0, 0.0] })
2024-01-20 14:44:39.100444 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42001.36209104719 Hz
2024-01-20 14:44:40.124472 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.031151256 Hz
2024-01-20 14:44:41.148455 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.761181763686 Hz
2024-01-20 14:44:42.002688 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1507.9, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14980, tv_nsec: 474066920 }, values: [0.0, 0.0] })
2024-01-20 14:44:42.172459 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.57122703359 Hz
2024-01-20 14:44:43.196445 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.87151808399 Hz
2024-01-20 14:44:44.220464 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.17634037181 Hz


amixer -c UAC2Gadget cset numid=1 1005000
Momentary freq = 48240 Hz (0x6.07ae)

2024-01-20 14:45:48.449535 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1579.6, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 15046, tv_nsec: 920911704 }, values: [0.0, 0.0] })
2024-01-20 14:45:51.463561 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1579.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 15049, tv_nsec: 934935325 }, values: [0.0, 0.0] })
2024-01-20 14:45:54.477358 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1578.8, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 15052, tv_nsec: 948735932 }, values: [0.0, 0.0] })
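The numbers in the log above can be reproduced with a bit of arithmetic. This sketch assumes that the control value is a scale factor where 1000000 means the nominal 48000 Hz, and that the hex value printed by the gadget is the rate in samples per 125 us microframe in 16.16 fixed point (i.e. high-speed USB with 8000 microframes per second). Both are my reading of the output, not something taken from the gadget driver source, and the function names are made up.

```rust
/// Momentary rate for a control value, assuming 1_000_000 = nominal rate.
fn momentary_freq_hz(nominal_hz: u64, ctl_value: u64) -> u64 {
    nominal_hz * ctl_value / 1_000_000
}

/// Rate expressed as samples per microframe in 16.16 fixed point,
/// formatted like the "0x6.07ae" values in the log.
/// Assumes high-speed USB: 8000 microframes per second.
fn feedback_q16_16(freq_hz: u64) -> String {
    let raw = (freq_hz << 16) / 8000;
    format!("0x{:x}.{:04x}", raw >> 16, raw & 0xffff)
}
```

For example, the control value 875000 gives 48000 * 875000 / 1000000 = 42000 Hz, which is 5.25 samples per microframe, i.e. 0x5.4000.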

IMO this is a rather useful loopback feature which may allow using lower latencies (shorter chunksize) for CDSP bridges/simple DSP involving alsa loopback.

Cons:
  • The loopback device does not consume samples from the player until the CDSP bridge starts running (i.e. until the loopback timer source starts running).
  • The /proc/asound/Loopback/timer_source file is writable only by root.
  • The Loopback capture and playback devices must use the same period size (CDSP already does this).
 
I was thinking about a reliable implementation which would add only minimum latency at start.

My testing modification was just a quick hack to pass zeros to the playback device.

This is the playback buffer fill when running:

[Attached image: 1705827065267.png]

IMO the zeros should be injected on the playback side (to avoid startup delay in Processing).

The playback device needs to start consuming the incoming chunk from Capture with minimum delay to minimize the bridge latency. That means the playback device should have as few zero samples in its buffer as possible when the first captured chunk comes. Alsa offers rewinding, but some devices do not support it (a great paper on alsa rewinding is http://lac.linuxaudio.org/2015/papers/10.pdf ).

IMO the zero blocks should be only 1 period long (chunk is 2 periods) and sent more often (until the first chunk arrives). Maybe avail_min could be lowered to 1 period for the time of kickstarting, making the buffer fill fluctuate around 1 period. Using avail_min less than one period may be dangerous for devices which calculate the delay only at period boundary (i.e. delay information with only large granularity, the second chart) - it could result in buffer underflows.

Also playback xrun handling would need to be analyzed. Since xruns stop playback, that would also pause capture. Threshold is set at 1 sample, so any already captured chunk in the chain would kickstart the playback upon its restarting (dev.prepare). However, if there was no chunk in the chain yet (like an xrun occurring before capture had time to produce the first chunk), playback would again need the zeros kickstart. Maybe the zero kickstarting could handle both the initial startup and xrun recovery: no incoming chunk -> keep sending 1 period of zeros (with dev.wait), until the chunk arrives. IMO for xruns in most cases no zeros would be sent, as in most cases the chain would already contain some chunk ready for playback.
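The "no incoming chunk -> send zeros" decision could be sketched like this. It is only an illustration with the standard library channel, not CamillaDSP code; the enum, the function name and the plain Vec<Vec<f64>> chunk type are hypothetical. In a real playback loop each returned block would be written to the device, followed by the device wait (dev.wait) before asking again.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// What the playback loop should write next (hypothetical type).
enum NextBlock {
    Audio(Vec<Vec<f64>>),   // a real chunk from the processing side
    Silence(Vec<Vec<f64>>), // one period of zeros to keep the device running
    Stop,                   // the sending side is gone
}

/// Non-blocking check of the incoming queue. If nothing has arrived yet,
/// returns one period of zeros so the playback device (and with it the
/// loopback timer) keeps ticking until the first real chunk shows up.
fn next_block(
    rx: &Receiver<Vec<Vec<f64>>>,
    channels: usize,
    period_frames: usize,
) -> NextBlock {
    match rx.try_recv() {
        Ok(chunk) => NextBlock::Audio(chunk),
        Err(TryRecvError::Empty) => {
            NextBlock::Silence(vec![vec![0.0; period_frames]; channels])
        }
        Err(TryRecvError::Disconnected) => NextBlock::Stop,
    }
}
```

The same function covers both cases discussed here: at startup it returns silence until capture delivers, and after an xrun it normally returns the queued chunk right away.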

As for my test requiring 2 zero chunks to start up: I thought it was caused by the threshold being set at 1 chunksize. But since it's at 1 sample only, IMO the first chunk reached playback and was played, but this one loopback timer tick was not enough for the capture to finish generating its first chunk for the playback to continue running. Generating the kickstart zeros in the playback loop should avoid that.

Edit: Maybe the zeros block could be smaller than 1 period (like half), with the kickstarting loop running faster, to minimize the added delay.
 
The Wasapi and Coreaudio backends pump zeros to the playback device if there is no available data. They both use a separate thread for handling the low-level shuffling of data. The Alsa backend is different in that it runs a single thread only. Maybe it's time to start looking into adding another thread also there. It gives some advantages, and it would solve the kick-start right away. The downside is all the added complexity.

You mentioned that the period must be the same for the loopback and the clock source card. Would it be possible to run the two at different sample rates, if the period size is set to give the same period time for them? For example, would it work to run the loopback at 48 kHz with a period of 256, and a DAC at 96 kHz and a period of 512?
 
The Wasapi and Coreaudio backends pump zeros to the playback device if there is no available data. They both use a separate thread for handling the low-level shuffling of data. The Alsa backend is different in that it runs a single thread only. Maybe it's time to start looking into adding another thread also there. It gives some advantages, and it would solve the kick-start right away.
That sounds like the very best solution, your inner loop (with zeros playback) is already well tested. Actually IIUC the inner loop on playback may allow using different period times for capture (= chunksize) and playback. That could be convenient for some device combination and scenarios too (e.g. the USB gadget capture has a fixed period time, while the period time on playback could be larger = safer).
Would it be possible to run the two at different sample rates, if the period size is set to give the same period time for them? For example, would it work to run the loopback at 48 kHz with a period of 256, and a DAC at 96 kHz and a period of 512?
I just tested this scenario and it seems to run OK. Period time must be equal for the master and slave, but the actual period sizes do not seem to matter.
A loopback device as the timing master behaves weirdly, but regular HW devices seem to be OK (loopback as the timing master would make no sense anyway).
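The "equal period time" condition is easy to check with integer arithmetic. A small sketch (the function names are made up for illustration):

```rust
/// True when both devices have the same period time. Compared exactly by
/// cross-multiplying: frames_a / rate_a == frames_b / rate_b.
fn same_period_time(rate_a: u64, frames_a: u64, rate_b: u64, frames_b: u64) -> bool {
    frames_a * rate_b == frames_b * rate_a
}

/// Period size in frames for the slave that matches the master's period
/// time, or None if it does not come out as a whole number of frames.
fn matching_period_frames(master_rate: u64, master_frames: u64, slave_rate: u64) -> Option<u64> {
    let num = master_frames * slave_rate;
    if master_rate != 0 && num % master_rate == 0 {
        Some(num / master_rate)
    } else {
        None
    }
}
```

For the tested case, 256 frames at 48 kHz and 512 frames at 96 kHz both last about 5.33 ms, so the check passes. A combination like a 48 kHz master with a period of 256 and a 44.1 kHz slave has no whole-frame match (it would need 235.2 frames).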