Switching Alsa Loopback Timer Source to Playback HW Device

phofman · 2024-01-20 2:04 pm

The alsa loopback software device is by default clocked by a system timer. Chaining it with any hardware soundcard thus introduces two clock domains which must be aligned somehow. Fortunately the loopback offers alsa controls for fine-tuning the internal software clock, which are conveniently used by alsaloop or, more importantly, by CamillaDSP. The bridging SW must keep track of the clock difference between the devices on both sides and, using some controller (basically a PID regulator), fine-tune the loopback clock to keep the production/consumption rates equal on average. But this controller has delays and can overshoot, which is handled by keeping the bridge latency sufficiently long, an undesired side effect.
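
To make the controller idea concrete, here is a minimal sketch of a PI regulator (a PID without the derivative term) that turns the playback buffer level into a relative rate for the loopback clock. The struct, its gains and the units are purely illustrative assumptions; this is not the algorithm used by alsaloop or CamillaDSP.

```rust
/// Hypothetical PI controller steering the loopback clock.
/// Gains and units are illustrative, not taken from any real implementation.
struct RateController {
    kp: f64,
    ki: f64,
    target_level: f64, // desired playback buffer fill in frames
    integral: f64,     // accumulated error
}

impl RateController {
    fn new(kp: f64, ki: f64, target_level: f64) -> Self {
        RateController { kp, ki, target_level, integral: 0.0 }
    }

    /// Returns the relative rate to request from the loopback (1.0 = nominal).
    fn update(&mut self, buffer_level: f64) -> f64 {
        // Positive error: the loopback delivers faster than the DAC consumes,
        // so the loopback clock has to be slowed down.
        let error = buffer_level - self.target_level;
        self.integral += error;
        1.0 - (self.kp * error + self.ki * self.integral)
    }
}
```

Because the correction only reacts to a buffer level that has already drifted, the buffer needs headroom on both sides of the target, which is where the extra latency comes from.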

I noticed a feature added to the loopback kernel code in 2019 https://patchwork.kernel.org/projec...1120115856.4125-1-andrew_gabbasov@mentor.com/ . The loopback timer can be switched to a virtual timer provided by any running alsa device. Technically it's quite simple - when the master device fires end of its period, the loopback runs its end-of-period code too. As a result the loopback and the master device run absolutely synchronously.

The timer is configured either at module load (param timer_source) or dynamically via the read-write file /proc/asound/LoopbackX/timer_source. Importantly, a configured/changed timer is applied when the loopback device is next opened.
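
For switching the timer source from a program instead of a shell, a small sketch could look like this. The helper names are hypothetical, and the card id (e.g. "Loopback") and the source string (e.g. "hw:Gadget") are whatever your system uses. A trailing newline is written on the assumption that the kernel trims it, mirroring what echo does.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Build the procfs path of the timer_source file for a loopback card.
/// `card_id` is the card id as it appears under /proc/asound.
/// Hypothetical helper, not part of any library.
fn timer_source_path(card_id: &str) -> PathBuf {
    PathBuf::from("/proc/asound").join(card_id).join("timer_source")
}

/// Write a new timer source; an empty string selects the system timer.
/// Needs root. The change is picked up when the loopback device is opened.
fn set_timer_source(card_id: &str, source: &str) -> io::Result<()> {
    // Assumption: the newline is stripped by the kernel, as with `echo`.
    fs::write(timer_source_path(card_id), format!("{}\n", source))
}
```

A call such as set_timer_source("Loopback", "hw:Gadget") has to happen before the bridge opens the loopback capture device.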

I tested the feature on a chain with USB gadget as a playback device, with CamillaDSP as the bridge (no rate adjust, empty pipeline). The USB gadget conveniently allows tweaking its samplerate over a large frequency range with its alsa control while playing (works via USB async feedback messages to the host).

I switched the timer source to the USB gadget playback device on the USB host (empty string means the system timer):

Code:

echo 'hw:Gadget' > /proc/asound/Loopback/timer_source

CDSP would not start processing, due to a simple deadlock - the playback device waits for chunks received from the capture side, but the capture device is stalled (as the playback device is not running) and cannot produce any chunk to start with. This was simple to solve by sending two kickstart chunks from the capture side with zero samples before the capture loop (one chunk did not hit the configured playback device buffer threshold for start up).

Diff:

--- a/src/alsadevice.rs    (revision 3981740d8a4f38f00a44de3abee77ffd4342b546)
+++ b/src/alsadevice.rs    (date 1705754133966)
@@ -682,6 +682,14 @@
         peak: vec![0.0; params.channels],
     };
     let mut channel_mask = vec![true; params.channels];
+
+    if !send_zero_chunk(&channels, &params) {
+        return;
+    }
+    if !send_zero_chunk(&channels, &params) {
+        return;
+    }
+
     loop {
         match channels.command.try_recv() {
             Ok(CommandMessage::Exit) => {
@@ -871,6 +879,19 @@
     params.capture_status.write().state = ProcessingState::Inactive;
 }
 
+fn send_zero_chunk(channels: &CaptureChannels, params: &CaptureParams) -> bool {
+    let waveforms = vec![vec![0.0; params.chunksize]; params.channels];
+    let chunk = AudioChunk::new(waveforms, 0.0, 0.0, params.chunksize, params.chunksize);
+    let msg = AudioMessage::Audio(chunk);
+    if channels.audio.send(msg).is_err() {
+        info!("Processing thread has already stopped.");
+        return false;
+    } else {
+        info!("Sent kickstart zeros to playback.");
+    }
+    true
+}
+
 fn update_avail_min(
     pcmdevice: &PCM,
     frames: Frames,

With this simple change CDSP starts running fine, and no matter how much I torture the playback device rate, loopback runs synchronously and CDSP keeps the playback buffer levels basically constant:

Code:

amixer -c UAC2Gadget cset numid=1 1000000
Momentary freq = 48000 Hz (0x6.0000)

2024-01-20 14:44:05.865793 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1584.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14944, tv_nsec: 337173553 }, values: [0.0, 0.0] })
2024-01-20 14:44:08.873720 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1584.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14947, tv_nsec: 345099643 }, values: [0.0, 0.0] })
2024-01-20 14:44:11.881425 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1584.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14950, tv_nsec: 352803979 }, values: [0.0, 0.0] })


amixer -c UAC2Gadget cset numid=1 875000
Momentary freq = 42000 Hz (0x5.4000)

2024-01-20 14:44:35.004469 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.428535622406 Hz
2024-01-20 14:44:35.956677 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1507.9, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14974, tv_nsec: 428059201 }, values: [0.0, 0.0] })
2024-01-20 14:44:36.028451 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.5191411824 Hz
2024-01-20 14:44:37.052470 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.34199663706 Hz
2024-01-20 14:44:38.076479 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.594277356846 Hz
2024-01-20 14:44:38.979670 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1508.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14977, tv_nsec: 451050650 }, values: [0.0, 0.0] })
2024-01-20 14:44:39.100444 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42001.36209104719 Hz
2024-01-20 14:44:40.124472 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.031151256 Hz
2024-01-20 14:44:41.148455 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.761181763686 Hz
2024-01-20 14:44:42.002688 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1507.9, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 14980, tv_nsec: 474066920 }, values: [0.0, 0.0] })
2024-01-20 14:44:42.172459 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.57122703359 Hz
2024-01-20 14:44:43.196445 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 42000.87151808399 Hz
2024-01-20 14:44:44.220464 WARN [src/alsadevice.rs:787] sample rate change detected, last rate was 41999.17634037181 Hz


amixer -c UAC2Gadget cset numid=1 1005000
Momentary freq = 48240 Hz (0x6.07ae)

2024-01-20 14:45:48.449535 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1579.6, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 15046, tv_nsec: 920911704 }, values: [0.0, 0.0] })
2024-01-20 14:45:51.463561 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1579.0, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 15049, tv_nsec: 934935325 }, values: [0.0, 0.0] })
2024-01-20 14:45:54.477358 DEBUG [src/alsadevice.rs:573] PB: buffer level: 1578.8, signal rms: Some(HistoryRecord { time: Instant { tv_sec: 15052, tv_nsec: 948735932 }, values: [0.0, 0.0] })
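
The three control values and the frequencies printed above are consistent with a simple relation: the control value scales the nominal 48000 Hz in parts per million, and the hex number is the resulting rate in 16.16 fixed-point samples per 125 us USB microframe. Both readings are inferred from the three data points in this log only, not from the gadget driver documentation, so treat the sketch below as an assumption.

```rust
/// Nominal sample rate of the gadget in the test above.
const NOMINAL_RATE_HZ: f64 = 48_000.0;

/// Assumed meaning of the control value: rate scale in parts per million.
fn momentary_rate_hz(control_value: u32) -> f64 {
    NOMINAL_RATE_HZ * control_value as f64 / 1_000_000.0
}

/// Assumed encoding of the printed hex value: 16.16 fixed-point samples
/// per 125 us microframe (8000 microframes per second), truncated.
fn feedback_16_16(rate_hz: f64) -> String {
    let raw = (rate_hz / 8000.0 * 65536.0) as u32;
    format!("0x{:x}.{:04x}", raw >> 16, raw & 0xffff)
}
```

With these assumptions, 875000 gives 42000 Hz and 0x5.4000, and 1005000 gives 48240 Hz and 0x6.07ae, matching the amixer output.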

IMO this is a rather useful loopback feature which may allow using lower latencies (shorter chunksize) for CDSP bridges/simple DSP involving alsa loopback.

Cons:

The loopback device does not consume samples from the player until the CDSP bridge starts running (i.e. until the loopback timer source starts running).
The /proc/asound/Loopback/timer_source is writable only by root.
The Loopback capture and playback devices must use the same period size (already used by CDSP).

HenrikEnquist · 2024-01-20 9:25 pm

This is a great discovery! I had no idea that this functionality existed. It looks very useful and it would make a lot of sense to implement the needed changes in CamillaDSP to make it usable.
I'll play around with this stuff as soon as I have some free time!

phofman · 2024-01-21 9:30 am

I was thinking about a reliable implementation which would add only minimum latency at start.

My testing modification was just a quick hack to pass zeros to the playback device.

This is the playback buffer fill when running:

IMO the zeros should be injected on the playback side (to avoid startup delay in Processing).

The playback device needs to start consuming the incoming chunk from Capture with minimum delay to minimize the bridge latency. That means the playback device should have as few zero samples in its buffer as possible when the first captured chunk comes. Alsa offers rewinding, but some devices do not support it (a great paper on alsa rewinding is http://lac.linuxaudio.org/2015/papers/10.pdf ).

IMO the zero blocks should be only 1 period long (chunk is 2 periods) and sent more often (until the first chunk arrives). Maybe avail_min could be lowered to 1 period for the time of kickstarting, making the buffer fill fluctuate around 1 period. Using avail_min less than one period may be dangerous for devices which calculate the delay only at period boundary (i.e. delay information with only large granularity, the second chart) - it could result in buffer underflows.

Also playback xrun handling would need to be analyzed. Since xruns stop playback, that would also pause capture. Threshold is set at 1 sample, so any already captured chunk in the chain would kickstart the playback upon its restarting (dev.prepare). However, if there was no chunk in the chain yet (like an xrun occurring before capture had time to produce the first chunk), playback would again need the zeros kickstart. Maybe the zero kickstarting could handle both the initial startup and xrun recovery: no incoming chunk -> keep sending 1 period of zeros (with dev.wait) until the chunk arrives. IMO for xruns, in most cases no zeros would be sent, as the chain would already contain some chunk ready for playback.
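
The "no incoming chunk -> keep sending zeros" idea can be sketched roughly as below. This is not CamillaDSP code: the Playback trait, the CountingSink test double and the polling timeout are hypothetical stand-ins, and frames are mono f32 values for brevity. In a real backend the write would block in the device (dev.wait plus a write), which paces the loop.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical stand-in for the playback device.
trait Playback {
    fn write(&mut self, frames: &[f32]);
}

/// Test double that only counts the zero frames it was given.
#[derive(Default)]
struct CountingSink {
    zero_frames: usize,
}

impl Playback for CountingSink {
    fn write(&mut self, frames: &[f32]) {
        self.zero_frames += frames.len();
    }
}

/// Feed zeros, one period at a time, until the first real chunk arrives.
/// Returns that chunk, or None if the capture side has gone away.
/// The same function could serve initial startup and xrun recovery.
fn kickstart<P: Playback>(
    dev: &mut P,
    chunks: &Receiver<Vec<f32>>,
    period_frames: usize,
    poll: Duration,
) -> Option<Vec<f32>> {
    let zeros = vec![0.0f32; period_frames];
    loop {
        match chunks.recv_timeout(poll) {
            // A chunk is already waiting: no zeros are needed at all.
            Ok(chunk) => return Some(chunk),
            // Nothing yet: keep the device (and thus the loopback timer) running.
            Err(RecvTimeoutError::Timeout) => dev.write(&zeros),
            Err(RecvTimeoutError::Disconnected) => return None,
        }
    }
}
```

If a chunk is already queued, as expected after most xruns, the function returns immediately without writing any zeros.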

As for my test requiring 2 zero chunks to start up: I thought it was caused by the threshold being set at 1 chunksize. But since it's at 1 sample only, IMO the first chunk reached playback and was played, but this one loopback timer tick was not enough for the capture to finish generating its first chunk for the playback to continue running. Generating the kickstart zeros in the playback loop should avoid that.

Edit: Maybe the zeros block could be smaller than 1 period (like half), with the kickstarting loop running faster, to minimize the added delay.

HenrikEnquist · 2024-01-22 11:06 pm

The Wasapi and Coreaudio backends pump zeros to the playback device if there is no available data. They both use a separate thread for handling the low-level shuffling of data. The Alsa backend is different in that it runs a single thread only. Maybe it's time to start looking into adding another thread also there. It gives some advantages, and it would solve the kick-start right away. The downside is all the added complexity.

You mentioned that the period must be the same for the loopback and the clock source card. Would it be possible to run the two at different sample rates, if the period size is set to give the same period time for them? For example, would it work to run the loopback at 48 kHz with a period of 256, and a DAC at 96 kHz and a period of 512?

phofman · 2024-01-23 8:51 am

HenrikEnquist said:
The Wasapi and Coreaudio backends pump zeros to the playback device if there is no available data. They both use a separate thread for handling the low-level shuffling of data. The Alsa backend is different in that it runs a single thread only. Maybe it's time to start looking into adding another thread also there. It gives some advantages, and it would solve the kick-start right away.

That sounds like the very best solution, your inner loop (with zeros playback) is already well tested. Actually IIUC the inner loop on playback may allow using different period times for capture (= chunksize) and playback. That could be convenient for some device combinations and scenarios too (e.g. the USB gadget capture has a fixed period time, while the period time on playback could be larger = safer).

HenrikEnquist said:
Would it be possible to run the two at different sample rates, if the period size is set to give the same period time for them? For example, would it work to run the loopback at 48 kHz with a period of 256, and a DAC at 96 kHz and a period of 512?

I just tested this scenario and it seems to run OK. Period time must be equal for the master and slave, but the actual period sizes do not seem to matter.
Loopback device as the timing master behaves weird, but regular HW devices seem to be OK (loopback as timing master would make no sense anyway).
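
The equal-period-time condition is easy to check with integer arithmetic. The sketch below uses hypothetical helper names and assumes nonzero sample rates; it compares frames/rate as exact fractions and derives a matching period size for the slave.

```rust
/// True if both devices have the same period duration.
/// Compares frames/rate as exact fractions via cross-multiplication.
fn same_period_time(frames_a: u64, rate_a: u64, frames_b: u64, rate_b: u64) -> bool {
    frames_a * rate_b == frames_b * rate_a
}

/// Period size for the slave so that its period time equals the master's.
/// Returns None if no whole number of frames gives an exact match.
fn matching_period(master_frames: u64, master_rate: u64, slave_rate: u64) -> Option<u64> {
    if master_rate == 0 {
        return None;
    }
    let num = master_frames * slave_rate;
    if num % master_rate == 0 {
        Some(num / master_rate)
    } else {
        None
    }
}
```

For the tested case, a DAC at 96 kHz with a period of 512 frames needs a loopback period of 256 frames at 48 kHz. A 44.1 kHz loopback has no exact match for that DAC period.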

siraaris · 2024-10-16 12:43 am

A common configuration using CDSP would be Roonbridge -> Loopback -> CDSP -> Hardware DAC. In this case would the approach be to set Loopback's timer_source to the Hardware DAC?

phofman · 2024-10-16 7:03 am

Yes, but you need to "kickstart" the playback to get the capture loopback clock running. I am not sure CDSP v. 3 does that, IMO not. For now (and maybe for always) you are better off with just enabling rate-adjust and letting the loopback clock be controlled by CDSP.

siraaris · 2024-10-16 2:59 pm

I guess this is probably part of a bigger conversation on what @HenrikEnquist has on the roadmap for CDSP.

🙂

HenrikEnquist · 2024-10-16 4:20 pm

I will most likely implement the separate thread in the alsa backend at some point, but it's not a very high priority for now since the current solution works well for the majority of use cases.
But even with that change, I would still recommend using rate adjust in cdsp instead of changing the timer source of the loopback. I don't see much benefit from using the timer source method, and it makes the loopback pretty quirky to use.

siraaris · 2024-11-04 5:35 am

I’ll try your mods @phofman as a test on my Loopback config with CDSP, as the only configuration I’ve managed to get working without XRUNs is on macOS using TotalMix’s Loopback function where Capture and Playback on CDSP is the same device.

All software options (BlackHole on macOS, Loopback on Linux) have XRUNs.

It seems logical that if Loopback on Linux was to use the DAC as the timer source that this would implement the basic principle of Capture and Playback having the same clock source.

phofman · 2024-11-04 7:09 am

Do you have all configs correct, including a target level that reasonably fits the buffer size? The alsa loopback is commonly used with CDSP without any issues.

siraaris · 2024-11-04 1:45 pm

Thanks - this prompted me to lower the period and this actually reduced the XRUNs, at increased CPU. There’s been none observed for a couple of hours so far. Maybe that is the approach - to reduce period whilst leaving say 20% CPU in reserve.

siraaris · 2024-11-05 12:26 pm

I swapped in a MOTU UltraLite mk5 for the Loopback function (RoonBridge Playback -> UltraLitemk5 and CDSP Capturing from UltraLitemk5’s Loopback).

Result is super stable.

With snd-aloop there were XRUNs either in end playback (an RME HDSPe MADI) or in CDSP Capture from snd-aloop’s Loopback.

This is with 2ch 96k.

I’m surprised that snd-aloop is so problematic.

phofman · 2024-11-05 12:38 pm

siraaris said:
With snd-aloop there were XRUNs either in end playback (an RME HDSPe MADI) or in CDSP Capture from snd-aloop’s Loopback.

Did you have your CDSP config correctly setup for alsa loopback (i.e. rate adjust enabled and reasonable target level)? People commonly use aloop without issues.

Edit: actually an almost identical duplicate of my post https://www.diyaudio.com/community/...ce-to-playback-hw-device.408077/#post-7835801 with no response...

siraaris · 2024-11-05 12:56 pm

I think I have CDSP setup correctly. Rate adjust is enabled, yes. I've tried a range of target levels.

The current config setup (for the UL as the Loopback) is attached. I'm using the "optional_alsa_params" branch to set period/buffer on devices.

You can also see that I've tried using Jack, in particular as Jack can utilise the RME card without plug (as it's a MMAP device). Jack on the final device (RME) results in stable CDSP Playback, but XRUN's present on (snd-aloop Loopback) Capture.

RoonBridge can also define buffer level (resulting in periods on Loopback Playback ranging from 240-24000).

Unfortunately I couldn't work out a way to get snd-aloop Loopback into Jack reliably - tried alsa_in and zita-a2j, more success with alsa_in, but still observing xruns.

Anyhow - more information than is necessary, just FYI.

I've tried using an Alsa Capture in CDSP (Loopback,1, RoonBridge plays to Loopback,0) and Jack on the CDSP Playback (ie. not using alsa_in). That also has issues.

So I've tried all sort of configurations - but maybe I'm missing something obvious?

phofman · 2024-11-05 1:39 pm

All unnecessary complications in the chain (such as jack) can just deteriorate the timing situation and often are just workarounds for suboptimal configuration/chain setup. Plain alsa is their common denominator, i.e. the simplest chain. Therefore I would suggest making the chain work with alsa only.

siraaris said:
in particular as Jack can utilise the RME card without plug (as it's a MMAP device)

Yes, CDSP uses/tries RW access only https://github.com/HEnquist/camilla...77bfea45df8bfba6e22cc1/src/alsadevice.rs#L374 . IMO it would be simple to change that to Access::MMapInterleaved https://docs.rs/alsa/latest/alsa/pcm/enum.Access.html#variant.MMapInterleaved and recompile CDSP if your particular HW supports MMAP only. Or using the plug plugin which would do the RW -> MMAP conversion only, provided all other params were compatible with your HW.

siraaris said:
RoonBridge can also define buffer level (resulting in periods on Loopback Playback ranging from 240-24000).

That's on the playback side of the loopback, CDSP uses its hw_params on the loopback's capture side.

siraaris said:
I'm using the "optional_alsa_params" branch to set period/buffer on devices.

IMO setting these adds another layer of complexity. Period and buffer sizes are closely related to xrun safety and target level control used in the rate-adjust algorithm. I would tend to keep them default at what CDSP uses and what its algorithms are designed for. Of course, checking which values are actually being used in /proc/asound/..../hw_params is important, especially when working with 16/32 channels where some hw/driver limitations can already be reached.

I do not know what CDSP version you use (i.e. the buffer size configured by CDSP), but I would suggest setting the target level to 3/4 of the playback buffer size. E.g. in v.2 your chunksize of 1024 would result in a buffer size of 2048. For that buffer size the requested target level of 2048 would not be a viable value, as it would mean the feedback is constantly trying to have the buffer filled at 100% at all measurement moments - clearly not achievable.

I believe your chain can be made to work reliably with alsa only and alsa loopback as the bridge, as it's a very common setup here.

siraaris · 2024-11-05 2:38 pm

I'm using the next30 branch.

I agree, default Alsa/CDSP would be ideal.

I'll set the chunksize=2048, target_level=4096 and see how it goes. It seems pretty stable on initial observation.

Thank you.

phofman · 2024-11-05 3:13 pm

V.3 sets buffer size to 4 x chunksize. For maximum xrun safety a target level of around 6k would seem optimal to me. But if your CPU is fast enough, 50% of the buffer (4 periods) should be safe too (i.e. lower overall latency). 16ch in and 32ch out 96kHz are some massive streams.
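
The numbers discussed here follow from simple arithmetic, sketched below. The buffer multipliers (2 x chunksize for v.2, 4 x chunksize for v.3) are taken from this thread only; the function names are hypothetical and the actual values should still be verified in hw_params.

```rust
/// Playback buffer size in frames, using the multipliers quoted in this
/// thread: 2 x chunksize for v.2, 4 x chunksize for v.3 (assumed for any
/// other version as well, which is a simplification).
fn buffer_frames(chunksize: usize, major_version: u32) -> usize {
    match major_version {
        2 => 2 * chunksize,
        _ => 4 * chunksize,
    }
}

/// Target level as a percentage of the buffer size, in frames.
fn target_level(buffer: usize, percent: usize) -> usize {
    buffer * percent / 100
}
```

For chunksize 1024 on v.2 this gives a 2048-frame buffer and a 3/4 target of 1536. For chunksize 2048 on v.3 the buffer is 8192 frames, so 3/4 is 6144 (the "around 6k") and 50% is 4096.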

siraaris · 2024-11-06 3:39 pm

I'm not having much luck with xruns on the Loopback capture. To not pollute this thread I'll move further discussion to the main CamillaDSP thread.
