CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc

Yes, but I'd kind of like to do it the web-interface and automatic-installer way; at this point I haven't found a couple-of-clicks installation routine described anywhere. EQ APO has both.

The installation only needs to happen once. After that, you can use the GUI (a web interface) to configure everything you need—no need to touch the command line or deal with any complex technical setup once it’s running.

The installation process for CDSP is clearly described here: https://github.com/mdsimon2/RPi-CamillaDSP

You can copy and paste all commands directly; everything should work perfectly.
 
@phofman, @HenrikEnquist

Quick FYI in case you're interested: I ran a few tests, and the issue seems to be related to the websocket:

1/ Without any websocket client - measured twice, each over a period of 20 minutes, same source played, ondemand CPU governor:

v2.0.3 (github bin): 0 xruns, 30% CPU avg.
beta3 (github bin): 1 xrun, 30% CPU avg.
beta3 (custom aarch64 build, no default features, +websocket, +32bit): 15 xruns, 24% CPU avg (that one is odd - lower CPU usage but a lot more xruns...)

2/ with a pycamilladsp program running cdsp.levels.playback_rms() in a loop without any delay:

beta3 (github bin): 28 xruns, 48% CPU
v2.0.3 (github bin): 0 xruns, 48% CPU

the number of cdsp.levels.playback_rms() requests processed was the same in both cases (~30k / minute)
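
For reference, the flooding loop is essentially something like this (a minimal sketch, assuming pycamilladsp 2.x and the default websocket port 1234):

Code:
# Flood the websocket interface with playback RMS requests, no delay.
# Minimal sketch, assuming pycamilladsp 2.x (CamillaClient) on port 1234.
from camilladsp import CamillaClient

cdsp = CamillaClient("127.0.0.1", 1234)
cdsp.connect()

count = 0
while True:
    cdsp.levels.playback_rms()  # one websocket request/reply per call
    count += 1
    if count % 100000 == 0:
        print(f"{count} requests sent")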
 
@phofman - sorry, forgot to mention that chunksize was 4096 for those tests (target_level is unset). The rest of the configuration is the same as what I posted a couple of posts ago.

Re: trace logs - OK, I'll run a few more tests with v3 vs v2 and post them (I wasn't sure you or Henrik had any interest in this - as mentioned, I don't want to waste your time).
 
2/ with a pycamilladsp program running cdsp.levels.playback_rms() in a loop without any delay:
Thanks, this is very good info. I'll try this and see if I can figure out what makes V3 perform worse.
beta3 (github bin): 28 xruns, 48% CPU
v2.0.3 (github bin): 0 xruns, 48% CPU
That's 28 xruns more than I would like to see for V3.

Re: trace logs - OK, I'll run a few more tests with v3 vs v2 and post them (I wasn't sure you or Henrik had any interest in this - as mentioned, I don't want to waste your time).
I'm very interested in sorting this out! It's always easiest to fix problems that I can reproduce on my own. You don't have to capture trace logs, but if I don't manage to trigger the xruns I may ask for some more help with that.
 
Trying now in a Linux VM on a MacBook Air M1. CamillaDSP v3 captures from and plays to loopbacks, aplay plays into the loopback I'm capturing from, and a Python script is hitting the websocket interface with requests in a loop with no delay. Right now it's at 3 million loop iterations and zero xruns. This machine is probably too fast to see the problem. I also have a Quartz64 board with 4 x Cortex-A55 cores at 2.0 GHz; I hope that will work "better".
 
@HenrikEnquist - Thanks!

Happy to help! I automated a few tests yesterday and this morning; here's a bit more info:

Tests: 6 runs of 10 minutes each, for both v2 and v3, flooding camilladsp with websocket requests (read_rms.py).

Command: chrt -r 45 su -l user -c camilladsp ... > /tmp/log-v<version>-<debug-?><session_nb> (/tmp is in RAM). Config files attached.

Counting underruns: grep 'underrun' <file> | wc -l (the "real" number of xruns is likely lower, as this may also count the first xrun right after start).

Source: squeezelite, playing music to the ALSA loopback @ 44100 Hz. Note: I configured the loopback with 3 dshare inputs: squeezelite's L+R plus another channel used to play low-frequency tones to keep the subwoofer from turning off (see asound.conf). IIRC I had to do that because once squeezelite was running it got "exclusive" access to the loopback, and it wasn't possible to play the low-frequency tone at regular intervals.

Unlike in my previous tests, the music was random (I don't think this changes anything, but I could be wrong).

I ran tests with and without debug logging to see if it made a difference: it did (a bit), despite /tmp being in RAM.

CPU usage (simply looking at top) was ~75% for read_rms.py and ~70% for camilladsp (v3 seemed to use a bit less), with a 1-minute system load average of ~1.55.

Latency tests: the output below is only over a few seconds and is similar for v2 and v3. When left running longer there are occasional latency spikes, but they aren't correlated with xruns.

Code:
cyclictest -M    # results in us

# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 0.71 0.69 0.43 5/217 20839

T: 0 (20839) P: 0 I:1000 C:   1054 Min:     74 Act: 7144 Avg:  282 Max:    7144

Results (nb. xruns - logfile)

Code:
# v2, with -l debug
 1 - log-v2-debug-0
 1 - log-v2-debug-1
 2 - log-v2-debug-2
 2 - log-v2-debug-3
 2 - log-v2-debug-4
 1 - log-v2-debug-5
total: 9 / avg: 2/session

# v2, without debug
 0 - log-v2-0
 0 - log-v2-1
 1 - log-v2-2
 1 - log-v2-3
 1 - log-v2-4
 2 - log-v2-5
total: 5 / avg: 1/session

# v3, with -l debug
10 - log-v3-debug-0
12 - log-v3-debug-1
 4 - log-v3-debug-2
12 - log-v3-debug-3
 1 - log-v3-debug-4
10 - log-v3-debug-5
total: 48 / avg: 10/session

# v3, without debug
 1 - log-v3-0
10 - log-v3-1
 9 - log-v3-2
12 - log-v3-3
14 - log-v3-4
 1 - log-v3-5
total: 39 / avg: 8/session
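
For reference, the counts above come from the grep command mentioned earlier; an equivalent tally over all the session logs in Python would be roughly this (a sketch, assuming the /tmp/log-v* naming used here):

Code:
# Count 'underrun' lines per session log, same as
# `grep 'underrun' <file> | wc -l` run on each file.
import glob

grand_total = 0
for path in sorted(glob.glob("/tmp/log-v*")):
    with open(path, errors="replace") as logfile:
        count = sum("underrun" in line for line in logfile)
    grand_total += count
    print(f"{count:2d} - {path}")
print(f"total: {grand_total}")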
 


The installation only needs to happen once. After that, you can use the GUI (a web interface) to configure everything you need—no need to touch the command line or deal with any complex technical setup once it’s running.

The installation process for CDSP is clearly described here: https://github.com/mdsimon2/RPi-CamillaDSP

You can copy and paste all commands directly; everything should work perfectly.

Thank you, that is exactly the reason I will stay away from it
 
I also have a Quartz64 board with 4 x Cortex-A55 cores at 2.0 GHz; I hope that will work "better".
It is very possible that my hardware has enough processor speed (for my needs) but is simply at the "limit" in terms of latency - which is probably what @phofman said (I only considered speed back then). In that case, a very small performance drop in one of the libraries used by CamillaDSP (or a compiler optimization, ...) would be enough to trigger xruns. The fact that there are more xruns at the debug logging level compared to the standard level (even with v2.0.3) would support this. And if that's indeed the case, you'd be wasting your time testing/investigating this issue. So don't worry! I'm of course happy to test things on my hardware if needed.

Cheers!
 
@HenrikEnquist

Here are some results; same test conditions as before except that this time the source was white noise played locally with aplay to exclude most of the network traffic and signal/music variability.

Branch "skip_instead_of__block", compiled with default features and lto = false (the default) ; logs attached:

Code:
xrun: 19 | p_status_blocked: 18 | p_params_blocked: 00 | log-v3-debug-0
xrun: 14 | p_status_blocked: 24 | p_params_blocked: 01 | log-v3-debug-1
xrun: 07 | p_status_blocked: 27 | p_params_blocked: 00 | log-v3-debug-2
xrun: 02 | p_status_blocked: 30 | p_params_blocked: 00 | log-v3-debug-3
xrun: 15 | p_status_blocked: 20 | p_params_blocked: 00 | log-v3-debug-4
xrun: 11 | p_status_blocked: 17 | p_params_blocked: 00 | log-v3-debug-5

And out of curiosity:

Same branch but compiled with lto = true; there seem to be slightly fewer xruns, with less variability:

Code:
xrun: 09 | p_status_blocked: 28 | p_params_blocked: 00 | pass 0
xrun: 03 | p_status_blocked: 16 | p_params_blocked: 00 | pass 1
xrun: 06 | p_status_blocked: 24 | p_params_blocked: 00 | pass 2
xrun: 08 | p_status_blocked: 17 | p_params_blocked: 00 | pass 3
xrun: 01 | p_status_blocked: 26 | p_params_blocked: 00 | pass 4
xrun: 08 | p_status_blocked: 19 | p_params_blocked: 00 | pass 5

For comparison, branch "next30" compiled with lto = true:

Code:
xrun: 12 | pass 0
xrun: 11 | pass 1
xrun: 14 | pass 2
xrun: 07 | pass 3
xrun: 01 | pass 4
xrun: 02 | pass 5

and without lto - similar results:

Code:
xrun: 01 | pass 0
xrun: 16 | pass 1
xrun: 01 | pass 2
xrun: 09 | pass 3
xrun: 12 | pass 4
xrun: 03 | pass 5
 


There are lots of events from numid 7, what control is that on your system?

Code:
$ amixer -c0 cget numid=7
numid=7,iface=PCM,name='PCM Rate Shift 100000',device=1
  ; type=INTEGER,access=rw------,values=1,min=80000,max=120000,step=1
  : values=100126

I had no clue what that control was for; if I understand correctly from this comment, alsa's loop device seems to do some resampling to match the clock rate of the producer and consumer, I didn't know that. No idea why those events would show up in v3's debug log and not in v2's (either they don't happen in v2, or v2's debug level simply doesn't output them?).

I don't do fancy stuff with alsa's loop device - here's how the module is initialized (there are 3 configured loopbacks for different purposes but only one is used at a given time - in the case of those tests, the first one).

Code:
$ cat /etc/modprobe.d/snd-aloop.conf
options snd-aloop index=0,1,2 enable=1,1,1 pcm_substreams=1,1,1 id="Loopback0","Loopback1","Loopback2"

By the way, I see that snd-aloop has a timer_source parameter to select the timer to be used; it defaults to jiffies. I'm wondering what would happen if I set that to the Motu M4 interface (maybe that's a dumb idea, and I'll have to figure out how to do it!)
 
if I understand correctly from this comment, alsa's loop device seems to do some resampling to match the clock rate of the producer and consumer
It fine-tunes the speed at which the virtual/software clock of the loopback device runs. No resampling involved.
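
As a rough worked example, with the value 100126 shown by amixer above, the loopback clock is being run about 0.13% fast (assuming the control value is simply a ratio scaled by 100000):

Code:
# Rough interpretation of the 'PCM Rate Shift 100000' value, assuming it
# scales the loopback's software clock by value / 100000.
nominal_rate = 44100      # Hz, the sample rate used in these tests
ctl_value = 100126        # from `amixer -c0 cget numid=7` above

speed = ctl_value / 100000
print(f"speed factor: {speed:.5f} ({(speed - 1) * 100:+.3f} %)")
print(f"effective loopback clock: {nominal_rate * speed:.1f} Hz")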

No idea why those events would show up in v3's debug log and not in v2's (either they don't happen in v2, or v2's debug level simply doesn't output them?)
That part was rewritten for v3. IIUC the Rate Shift ctl emits a change event after being updated by the rate-adjust algorithm, and that unimportant event is not caught by the checks in the get_event_action method https://github.com/HEnquist/camilla...16fce19db1d815a6/src/alsadevice_utils.rs#L457 . @HenrikEnquist: Maybe it could be caught/ignored there just to clean up the logs.

By the way, I see that snd-aloop has a timer_source parameter to select the timer to be used; it defaults to jiffies. I'm wondering what would happen if I set that to the Motu M4 interface (maybe that's a dumb idea, and I'll have to figure out how to do it!)
See https://www.diyaudio.com/community/...ck-timer-source-to-playback-hw-device.408077/

@HenrikEnquist: IIUC in the v3 config of @t-gh-ctrl the target_level is left at default, which should make it 3x the chunksize of 4096 frames = 12288 frames (3/4 of the 16k ALSA buffer). Looking at the PID controller's target level adjustments:

Code:
2024-11-03 15:21:53.137499 [src/helpers.rs:162] Rate controller, ramp step 1/20, current target 4525.771414170258
2024-11-03 15:22:04.681466 [src/helpers.rs:162] Rate controller, ramp step 2/20, current target 4442.1889025862065
2024-11-03 15:22:14.681564 [src/helpers.rs:162] Rate controller, ramp step 3/20, current target 4371.434797790948
2024-11-03 15:22:24.721680 [src/helpers.rs:162] Rate controller, ramp step 4/20, current target 4312.1240275862065
2024-11-03 15:22:44.601942 [src/helpers.rs:162] Rate controller, ramp step 5/20, current target 4262.950666756466
2024-11-03 15:22:54.645746 [src/helpers.rs:162] Rate controller, ramp step 6/20, current target 4222.687937068966
2024-11-03 15:23:10.090825 [src/helpers.rs:162] Rate controller, ramp step 7/20, current target 4190.188207273707
2024-11-03 15:23:20.270142 [src/helpers.rs:162] Rate controller, ramp step 8/20, current target 4164.382993103448
2024-11-03 15:23:30.493205 [src/helpers.rs:162] Rate controller, ramp step 9/20, current target 4144.282957273706
2024-11-03 15:23:40.501584 [src/helpers.rs:162] Rate controller, ramp step 10/20, current target 4128.977909482759
2024-11-03 15:23:56.141516 [src/helpers.rs:162] Rate controller, ramp step 11/20, current target 4117.636806411638
2024-11-03 15:24:06.173687 [src/helpers.rs:162] Rate controller, ramp step 12/20, current target 4109.507751724138
2024-11-03 15:24:16.181864 [src/helpers.rs:162] Rate controller, ramp step 13/20, current target 4103.917996066811
2024-11-03 15:24:26.184928 [src/helpers.rs:162] Rate controller, ramp step 14/20, current target 4100.273937068965
2024-11-03 15:24:36.189074 [src/helpers.rs:162] Rate controller, ramp step 15/20, current target 4098.061119342672
2024-11-03 15:24:46.216710 [src/helpers.rs:162] Rate controller, ramp step 16/20, current target 4096.844234482759
2024-11-03 15:24:56.221542 [src/helpers.rs:162] Rate controller, ramp step 17/20, current target 4096.26712106681
2024-11-03 15:25:06.225577 [src/helpers.rs:162] Rate controller, ramp step 18/20, current target 4096.052764655173
2024-11-03 15:25:16.233943 [src/helpers.rs:162] Rate controller, ramp step 19/20, current target 4096.003297790949
2024-11-03 15:25:26.241679 [src/helpers.rs:162] Rate controller, ramp step 20/20, current target 4096
2024-11-03 15:25:36.288944 [src/helpers.rs:146] Rate controller, buffer level is 6008.829787234043, starting to adjust back towards target of 4096
2024-11-03 15:25:36.289075 [src/helpers.rs:162] Rate controller, ramp step 1/20, current target 5654.011816888298
2024-11-03 15:25:46.344904 [src/helpers.rs:162] Rate controller, ramp step 2/20, current target 5351.007623404255
2024-11-03 15:25:56.389447 [src/helpers.rs:162] Rate controller, ramp step 3/20, current target 5094.50910412234
2024-11-03 15:26:06.393403 [src/helpers.rs:162] Rate controller, ramp step 4/20, current target 4879.495080851064
2024-11-03 15:26:16.401002 [src/helpers.rs:162] Rate controller, ramp step 5/20, current target 4701.231299867021
2024-11-03 15:26:26.404613 [src/helpers.rs:162] Rate controller, ramp step 6/20, current target 4555.270431914893
2024-11-03 15:26:36.412471 [src/helpers.rs:162] Rate controller, ramp step 7/20, current target 4437.452072207447
2024-11-03 15:26:46.444733 [src/helpers.rs:162] Rate controller, ramp step 8/20, current target 4343.902740425532
2024-11-03 15:26:56.449041 [src/helpers.rs:162] Rate controller, ramp step 9/20, current target 4271.0358807180855
2024-11-03 15:27:06.456820 [src/helpers.rs:162] Rate controller, ramp step 10/20, current target 4215.551861702128
2024-11-03 15:27:16.464959 [src/helpers.rs:162] Rate controller, ramp step 11/20, current target 4174.4379764627665
2024-11-03 15:27:26.472735 [src/helpers.rs:162] Rate controller, ramp step 12/20, current target 4144.968442553191
2024-11-03 15:27:36.484860 [src/helpers.rs:162] Rate controller, ramp step 13/20, current target 4124.704401994681
2024-11-03 15:27:46.512755 [src/helpers.rs:162] Rate controller, ramp step 14/20, current target 4111.493921276596
2024-11-03 15:27:56.556947 [src/helpers.rs:162] Rate controller, ramp step 15/20, current target 4103.471991356383
2024-11-03 15:28:06.561672 [src/helpers.rs:162] Rate controller, ramp step 16/20, current target 4099.060527659574
2024-11-03 15:28:16.573671 [src/helpers.rs:162] Rate controller, ramp step 17/20, current target 4096.968370079787
2024-11-03 15:28:26.585648 [src/helpers.rs:162] Rate controller, ramp step 18/20, current target 4096.191282978723
2024-11-03 15:28:36.593521 [src/helpers.rs:162] Rate controller, ramp step 19/20, current target 4096.01195518617
2024-11-03 15:28:53.920796 [src/helpers.rs:162] Rate controller, ramp step 20/20, current target 4096
2024-11-03 15:29:34.009029 [src/helpers.rs:146] Rate controller, buffer level is 5613.692307692308, starting to adjust back towards target of 4096
2024-11-03 15:29:34.009257 [src/helpers.rs:162] Rate controller, ramp step 1/20, current target 5332.169870192308
2024-11-03 15:29:44.021618 [src/helpers.rs:162] Rate controller, ramp step 2/20, current target 5091.757923076923
2024-11-03 15:29:54.037677 [src/helpers.rs:162] Rate controller, ramp step 3/20, current target 4888.244870192308
2024-11-03 15:30:04.070149 [src/helpers.rs:162] Rate controller, ramp step 4/20, current target 4717.64676923077
2024-11-03 15:30:14.081121 [src/helpers.rs:162] Rate controller, ramp step 5/20, current target 4576.2073317307695
2024-11-03 15:30:24.088890 [src/helpers.rs:162] Rate controller, ramp step 6/20, current target 4460.397923076923
2024-11-03 15:30:34.104625 [src/helpers.rs:162] Rate controller, ramp step 7/20, current target 4366.9175625
2024-11-03 15:30:44.116535 [src/helpers.rs:162] Rate controller, ramp step 8/20, current target 4292.692923076923
2024-11-03 15:30:54.132750 [src/helpers.rs:162] Rate controller, ramp step 9/20, current target 4234.87833173077
2024-11-03 15:31:04.164593 [src/helpers.rs:162] Rate controller, ramp step 10/20, current target 4190.8557692307695
2024-11-03 15:31:14.205024 [src/helpers.rs:162] Rate controller, ramp step 11/20, current target 4158.234870192307
2024-11-03 15:31:24.217030 [src/helpers.rs:162] Rate controller, ramp step 12/20, current target 4134.852923076924
2024-11-03 15:31:34.233853 [src/helpers.rs:162] Rate controller, ramp step 13/20, current target 4118.774870192307
2024-11-03 15:31:44.245750 [src/helpers.rs:162] Rate controller, ramp step 14/20, current target 4108.293307692307

The target level seems to be pushed down to 4096 frames, which is just one chunksize - very likely to produce xruns on that weak CPU. Perhaps the issue is here?
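
If anyone wants to plot that behaviour, the ramp targets can be pulled out of the debug log with a few lines of Python (a sketch based on the log lines quoted above; the file name is just an example):

Code:
# Extract the rate controller's ramp targets from a CamillaDSP debug log,
# matching the "Rate controller, ramp step N/20, current target X" lines.
import re

pattern = re.compile(r"ramp step (\d+)/20, current target ([\d.]+)")

with open("log-v3-debug-0") as logfile:     # example file name
    for line in logfile:
        match = pattern.search(line)
        if match:
            step, target = int(match.group(1)), float(match.group(2))
            print(f"step {step:2d}: target {target:8.1f} frames")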
 
@HenrikEnquist: Maybe it could be caught/ignored there just to clean up the logs
Yes that's a good idea, it's a bit confusing now.

Looking a bit more at the logs, it seems like it times out waiting for a ready event from the capture device. The waiting time limit there is somewhat arbitrary; we could try increasing it to see if it helps.

@t-gh-ctrl could you modify the limit here?
https://github.com/HEnquist/camilla...96738e16fce19db1d815a6/src/alsadevice.rs#L258
Change the 4 * millis_per_chunk to 8 *, and change the 10 on the two following lines to 20.
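
For a rough sense of scale: at 44.1 kHz with chunksize 4096 that change roughly doubles the wait limit, from about 372 ms to about 743 ms (assuming millis_per_chunk is simply chunksize divided by the sample rate, in milliseconds):

Code:
# Rough effect of changing 4 * millis_per_chunk to 8 *, assuming
# millis_per_chunk is just chunksize / samplerate in milliseconds.
chunksize = 4096
samplerate = 44100

millis_per_chunk = 1000 * chunksize / samplerate
print(f"millis_per_chunk: {millis_per_chunk:.1f} ms")
print(f"old limit (4x): {4 * millis_per_chunk:.0f} ms")
print(f"new limit (8x): {8 * millis_per_chunk:.0f} ms")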
 
@phofman - thank you for your explanation, I'm learning something new every day! I actually did a few tests with the timer source shortly after writing about it and was wondering why the interface was silent despite aplay running seemingly without issues. After reading the posts in the linked thread it's now clear why. I'll try the "deadlock patch" later today (out of curiosity).

@HenrikEnquist - here are the results, it's much better! (Note: I applied the changes to the "skip_instead_of__block" branch, but let me know if you'd like me to test with "next30"; logs attached):

Code:
xrun: 00 | p_status_blocked: 20 | p_params_blocked: 00 | log-v3-debug-0
xrun: 00 | p_status_blocked: 15 | p_params_blocked: 00 | log-v3-debug-1
xrun: 00 | p_status_blocked: 16 | p_params_blocked: 00 | log-v3-debug-2
xrun: 00 | p_status_blocked: 15 | p_params_blocked: 00 | log-v3-debug-3
xrun: 00 | p_status_blocked: 19 | p_params_blocked: 00 | log-v3-debug-4
xrun: 00 | p_status_blocked: 21 | p_params_blocked: 00 | log-v3-debug-5
 


The fact is that v3 doubled the buffer size from 2 chunks to 4 chunks, but this value was not changed. The current value of one buffer size may be at the edge of causing issues when considering some timing jitter. IIUC that's why it was raised from 2 to 4 chunks last time, in the commit https://github.com/HEnquist/camilla...2daccd86520398446c22592ae6e6677adf76R264-R267 .

Still, I wonder about the target level. IMHO a 4k target level for a 4k chunk and a 16k buffer is not what I would expect from the default target_level calculation in https://github.com/HEnquist/camilladsp/commit/ca25bdaaadf6f9e95425101edec622c4206e05ad . But I may be reading the PID code incorrectly.
 
Still, I wonder about the target level. IMHO a 4k target level for a 4k chunk and a 16k buffer is not what I would expect from the default target_level calculation in...
That piece of code is just calculating the limit, for checking that the config is valid.
The default target is 1*chunksize. That's a compromise between latency and robustness, and of course won't always be the ideal choice.
 
That piece of code is just calculating the limit, for checking that the config is valid.
Thanks for the explanation, I overlooked the next check.

@t-gh-ctrl: IMO for a weak CPU loaded close to the max, the config could set the target level higher to lower the chance of xruns. It will raise latency, but also increase the timing margin before an xrun. I used 3 chunksizes (i.e. 75% of the buffer, instead of the default 25%) on my Pi S with no xruns during stable operation (I have hard-coded changes for S32 <-> float format conversion for that weak CPU, but those are in different parts of CDSP).
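
For example, with pycamilladsp the target level could be raised on a running instance like this (a rough sketch, assuming pycamilladsp 2.x and the default port; the permanent way is setting target_level under devices in the YAML config):

Code:
# Raise target_level to 3 chunks on a running instance.
# Rough sketch assuming pycamilladsp 2.x; the permanent fix is setting
# devices.target_level in the YAML config file.
from camilladsp import CamillaClient

cdsp = CamillaClient("127.0.0.1", 1234)
cdsp.connect()

cfg = cdsp.config.active()                       # active config as a dict
cfg["devices"]["target_level"] = 3 * cfg["devices"]["chunksize"]
cdsp.config.set_active(cfg)                      # apply without a restart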
 