CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc.

just generating a low-pass Butterworth filter in rePhase on the "Minimum Phase Filters" tab, then using the .wav file in CamillaDSP.
Mikey, how many taps have you specified in rePhase, and where in the CamillaDSP config is the .wav file?

My understanding is that rePhase builds FIR filters, which will always add delay: the more taps, the longer the delay.

I use CamillaDSP to tri-amp and time align Klipschorns and used rePhase - see
https://www.diyaudio.com/community/...overs-room-correction-etc.349818/post-7213813
 
I'm having an issue that is probably something obvious I'm missing. I got CamillaDSP 1.0.3 up and running on Windows, with the GUI backend and frontend. The system is Windows 10 on a Ryzen 5800X3D, with output to a VB-Audio virtual cable, into CamillaDSP, and out to a Focusrite USB interface. Basic functionality seems to work great (e.g. applying a 100 Hz low-pass filter and watching/listening to something obviously works). But when I apply a (supposedly) minimum phase convolution filter (.wav file), I get a delay in the audio output that seems about equal to half the filter length. I've tried a few filter .wav files and they all seem to have the same effect. For example, just generating a low-pass Butterworth filter in rePhase on the "Minimum Phase Filters" tab, then using the .wav file in CamillaDSP. Maybe this is a question for the rePhase thread?

How do you have your centering set in rePhase? The default is middle, which gives you a lot of unnecessary latency for minimum phase filters. rePhase should tell you the latency; what does it report?

Here is an example of a minimum phase BW4 100 Hz LPF using 16K taps at 44.1 kHz with middle centering. rePhase reports a delay of 186 ms.

1696640981441.png


Using 1% centering gives 4 ms delay. It should be 0 ms if you use 0% centering.
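
As a rough cross-check of those figures, here is a minimal sketch that assumes the reported delay is simply the position of the impulse peak divided by the sample rate, and that "16K taps" means 16384. The function name `fir_peak_delay_ms` is made up for this example and is not part of rePhase or CamillaDSP.

```python
def fir_peak_delay_ms(taps: int, samplerate: float, centering: float) -> float:
    """Delay in ms when the impulse peak sits at `centering` (0.0 to 1.0) of the filter length."""
    peak_index = centering * taps          # sample position of the impulse peak
    return 1000.0 * peak_index / samplerate


taps = 16384       # assumption: "16K taps" = 16384
fs = 44100.0       # sample rate from the example above

for label, centering in [("middle", 0.5), ("1%", 0.01), ("0%", 0.0)]:
    print(f"{label:>6} centering: {fir_peak_delay_ms(taps, fs, centering):6.1f} ms")
```

Under these assumptions, middle centering gives about 185.8 ms and 1% centering about 3.7 ms, which matches the 186 ms and 4 ms reported by rePhase after rounding.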

1696641193460.png


Michael
 
Okay, thanks to @fluid in the rePhase thread: the delay was my own user error in rePhase. When creating the filter I was using middle centering in the settings. I changed it to "0" and "use closest perfect impulse" and now there is no more delay.
EDIT: one more quick test, 131k taps with a 15 kHz low-pass, doesn't seem to introduce any perceptible delay. I applied it to the left channel only and listened to someone clap in a YouTube video, and there doesn't seem to be any "double-tap" or anything like that :)
1696653321342.png
 
Yep, here's a 200Hz low-pass generated in rePhase, exactly in the middle of the file. What should I do?
Better check your filter again after regenerating it, hopefully with the correct settings.

As is, the graph you showed, with its irregular shape and its null interval, is nothing but a weird mess, regardless of any leading zeros or the like. If such a filter unexpectedly produced something like a BW 200 Hz LP amplitude characteristic at all, its phase would certainly be badly misbehaved. Instead, a correct 8k BW LP 200 Hz minphase filter looks like this:

Time.png


Magnitude.png


You cannot have adequate resolution at low frequencies with short FIR filters like this 8k filter. This is why the continuity is flawed below about 50 Hz in this example. Better go for 64k or 128k. A minphase filter will not introduce any substantial delay anyway, regardless of its length.
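
To put rough numbers on the resolution argument, here is a small sketch using the usual rule of thumb that an N-tap FIR has a frequency spacing of about fs / N. The 44.1 kHz sample rate is an assumption, since the post does not state it, and `fir_resolution` is a made-up helper name.

```python
def fir_resolution(taps: int, samplerate: float) -> tuple[float, float]:
    """Return (frequency spacing in Hz, filter length in ms) for an N-tap FIR."""
    bin_hz = samplerate / taps
    length_ms = 1000.0 * taps / samplerate
    return bin_hz, length_ms


fs = 44100.0  # assumed sample rate, not stated in the post
for taps in (8192, 65536, 131072):
    bin_hz, length_ms = fir_resolution(taps, fs)
    # Number of full periods of a 50 Hz tone that fit inside the filter
    periods_50hz = 50.0 * length_ms / 1000.0
    print(f"{taps:>6} taps: {bin_hz:6.3f} Hz spacing, {length_ms:7.1f} ms long, "
          f"{periods_50hz:5.1f} periods of 50 Hz")
```

With these assumptions an 8k filter has a spacing of a bit over 5 Hz, so only a handful of points describe the response below 50 Hz, while 64k and 128k filters are 8 and 16 times finer.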

CamillaDSP, like any other convolver, can only be as good as the FIR filters it is given. If you are in doubt about the quality of your FIR filters, you may be better off using the built-in, and therefore always precise, IIR filters of CamillaDSP where suitable.
 
On R2R DACs distortion can be lowered handsomely even by dithering
Yes sure there are use cases for dithering at > 16 bits, but since most people use delta-sigma DACs I would not give a general recommendation to use dithering. Some people yes, most people no.
As is, the graph you showed, with its irregular shape and its null interval, is nothing but a weird mess
It's probably fine. The vertical scale in that graph is set to dB which makes it look completely weird. But it wouldn't hurt to also look at it on a normal linear scale, it should look similar to the one from @Daihedz.
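
To illustrate why the same impulse response can look tidy on a linear scale and strange in dB, here is a minimal sketch. It uses a 2nd-order Butterworth low-pass built from the well-known RBJ biquad formulas as a stand-in (the filter in question is a BW4 from rePhase, so this is a simplification), and the 44.1 kHz sample rate and 1024-sample length are arbitrary assumptions.

```python
import math


def butterworth2_lowpass(f0: float, fs: float):
    """RBJ cookbook low-pass biquad with Q = 1/sqrt(2), i.e. a 2nd-order Butterworth."""
    w0 = 2.0 * math.pi * f0 / fs
    q = 1.0 / math.sqrt(2.0)
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / 2.0 / a0, (1.0 - cos_w0) / a0, (1.0 - cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def impulse_response(b, a, length: int):
    """Run a unit impulse through the biquad difference equation."""
    h = []
    x1 = x2 = y1 = y2 = 0.0
    for n in range(length):
        x0 = 1.0 if n == 0 else 0.0
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        h.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return h


def to_db(samples, floor: float = 1e-12):
    """Convert sample values to dB, clamping near-zero values to avoid log(0)."""
    return [20.0 * math.log10(max(abs(s), floor)) for s in samples]


b, a = butterworth2_lowpass(200.0, 44100.0)
h = impulse_response(b, a, 1024)
h_db = to_db(h)

sign_changes = sum(1 for p, q in zip(h, h[1:]) if p * q < 0)
print(f"linear range: {min(h):.6f} .. {max(h):.6f}")
print(f"dB range:     {min(h_db):.1f} .. {max(h_db):.1f} dB")
print(f"zero crossings: {sign_changes}")
```

On a linear scale this is a small bump followed by a quickly decaying ripple. On a dB scale every zero crossing becomes a deep notch and the vanishing tail is stretched over a huge range, which is why the plot looks so odd even though the filter is fine.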
 
It's probably fine. The vertical scale in that graph is set to dB which makes it look completely weird. But it wouldn't hurt to also look at it on a normal linear scale, it should look similar to the one from @Daihedz.

It is fine! You were right. Not only probably. Right straight away!

I generated a 32-bit WAV of the BW4 200 Hz LP and exported it to Audacity to check how "my" filter looks there. In linear graph mode everything looks more or less as it does in Acourate. In log/dB mode, however, the graph looks as weird as I described in my post. So the two filters, the one generated by rePhase and the one from Acourate, look more or less identical, and they probably are.

Lin.png


LogDB.png
 
One minor point - the timestamps in log files use UTC rather than local time. Is this intentional? Local time would be a lot more intuitive for me. Example:
Code:
streamer@office-streamer:~$ cat camilladsp2/camilladsp.log
2023-10-07 23:55:07.995369 INFO  [src/bin.rs:695] CamillaDSP version 2.0.0-alpha4
2023-10-07 23:55:07.995577 INFO  [src/bin.rs:696] Running on linux, x86_64
2023-10-07 23:55:08.029195 INFO  [src/alsadevice.rs:142] PB: Starting playback from Prepared state
streamer@office-streamer:~$ date
Sun Oct  8 13:03:21 NZDT 2023
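
Until the log format is settled, a small helper can convert the timestamps when reading the log. This is only a sketch: it assumes the log timestamps are UTC in the format shown above, and it uses a fixed UTC+13 offset for NZDT rather than a full time zone database. The name `log_time_to_local` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offset for New Zealand Daylight Time (UTC+13).
NZDT = timezone(timedelta(hours=13), "NZDT")


def log_time_to_local(stamp: str, tz: timezone = NZDT) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS.ffffff' UTC timestamp and convert it to `tz`."""
    utc = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    return utc.astimezone(tz)


local = log_time_to_local("2023-10-07 23:55:07.995369")
print(local.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

For the first log line this gives 2023-10-08 12:55:07 NZDT, a few minutes before the `date` output shown above.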
 
Yes, setting codegen_units = 1 avoids it, and gives a little speed boost as well.
It looks like I spoke too soon confirming that the problem is fully fixed in 2.0.0a4. It is fixed for the configs I had previously tested, but I've just stumbled across a different config where it resurfaces. This new config also uses biquads.

I suspect until the root cause is fixed in the compiler, these performance issues may continue to show up with different configs / compiler options in no particular pattern, depending on how the sse2 registers happen to be used.

As an alternative work-around, adding some assembly code to zero out the sse2 registers before processing each chunk fixes the problem for all the configs I've tested and does not rely on compiler options. It's a hack, but is only needed once per chunk and should be pretty unobtrusive. I'm about to submit a pull request containing this fix.

I can give details of the new config if anyone wants to analyse it further. I haven't spent time tracking down exactly where this problem occurs or hunting for denormals in particular registers but it looks like the same problem and is due to the same drop in instructions per clock. The elapsed time to process a chunk of audio in 2.0.0a4 is 2.307s when the problem occurs, and drops to 0.511s with the fix applied to zero out the sse2 registers.
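
For readers unfamiliar with denormals, here is a short illustration of what they are. It does not reproduce the slowdown and does not claim that denormals are the confirmed cause here. It only shows how a decaying value, like the state of an IIR filter after the input goes silent, drifts into the subnormal range of a 64-bit float before reaching zero.

```python
import sys

smallest_normal = sys.float_info.min   # smallest positive normal double, 2**-1022

value = 1.0
first_subnormal_step = None
first_zero_step = None

# Halve the value repeatedly, like a filter state decaying towards silence.
for step in range(1, 2000):
    value *= 0.5
    if first_subnormal_step is None and 0.0 < value < smallest_normal:
        first_subnormal_step = step
    if value == 0.0:
        first_zero_step = step
        break

print(f"first subnormal value after {first_subnormal_step} halvings")
print(f"value reaches exactly zero after {first_zero_step} halvings")
```

On many x86 CPUs, arithmetic on values in that in-between range is much slower than on normal numbers, which is why stale denormals can hurt throughput.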
 
Here's the config that causes the problem. It's doing a straightforward headphone crossfeed.
Code:
devices:
  samplerate: 48000
  chunksize: 2048
  queuelimit: 1
  capture:
    type: File
    channels: 2
    filename: "sin1k.raw"
    format: S16LE
  playback:
    type: File
    channels: 2
    filename: "/dev/null"
    format: S16LE

mixers:
  2to4cross:
    channels:
      in: 2
      out: 4
    mapping:
    - dest: 0
      mute: false
      sources:
      - channel: 0
        gain: 0
        inverted: false
        mute: false
    - dest: 1
      mute: false
      sources:
      - channel: 0
        gain: 0
        inverted: false
        mute: false
    - dest: 2
      mute: false
      sources:
      - channel: 1
        gain: 0
        inverted: false
        mute: false
    - dest: 3
      mute: false
      sources:
      - channel: 1
        gain: 0
        inverted: false
        mute: false

  4to2cross:
    channels:
      in: 4
      out: 2
    mapping:
    - dest: 0
      mute: false
      sources:
      - channel: 0
        gain: 0
        inverted: false
        mute: false
      - channel: 2
        gain: 0
        inverted: false
        mute: false
    - dest: 1
      mute: false
      sources:
      - channel: 1
        gain: 0
        inverted: false
        mute: false
      - channel: 3
        gain: 0
        inverted: false
        mute: false

filters:
  cx3_hi:
    parameters:
      freq: 868.97
      gain: -2
      type: Lowshelf
      q: 0.5
    type: Biquad
  cx3_lo:
    parameters:
      freq: 700
      type: LowpassFO
    type: Biquad
  cx3_lo_gain:
    type: Gain
    parameters:
      gain: -8
      inverted: false

pipeline:
  - type: Mixer
    name: 2to4cross
  - channel: 0
    names:
      - cx3_hi
    type: Filter
  - channel: 1
    names:
      - cx3_lo
      - cx3_lo_gain
    type: Filter
  - channel: 2
    names:
      - cx3_lo
      - cx3_lo_gain
    type: Filter
  - channel: 3
    names:
      - cx3_hi
    type: Filter
  - type: Mixer
    name: 4to2cross
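
The signal flow of this config can be summarised in a few lines. The sketch below mirrors only the routing: the two mixers and the -8 dB gain. The `hi` and `lo` arguments are hypothetical stand-ins for the cx3_hi and cx3_lo biquads and are not implemented here.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def crossfeed(left: float, right: float, hi, lo, lo_gain_db: float = -8.0):
    """Route one stereo sample the way the config above does."""
    g = db_to_linear(lo_gain_db)           # cx3_lo_gain
    # Mixer 2to4cross: left -> channels 0 and 1, right -> channels 2 and 3
    ch0, ch1, ch2, ch3 = left, left, right, right
    # Filter steps of the pipeline
    ch0 = hi(ch0)                          # direct path, left
    ch1 = g * lo(ch1)                      # cross path, left -> right
    ch2 = g * lo(ch2)                      # cross path, right -> left
    ch3 = hi(ch3)                          # direct path, right
    # Mixer 4to2cross: out 0 = ch0 + ch2, out 1 = ch1 + ch3
    return ch0 + ch2, ch1 + ch3


def identity(x: float) -> float:
    """Placeholder filter that passes the sample through unchanged."""
    return x


out_left, out_right = crossfeed(1.0, 0.0, hi=identity, lo=identity)
print(f"left in only -> out_left={out_left:.3f}, out_right={out_right:.3f}")
```

With the placeholder filters, a signal on the left input appears at full level on the left output and about 8 dB lower on the right output, which is the basic crossfeed idea.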

And the timing results. "camilladsp-2.0.0a4-download" is straight off GitHub and "camilladsp-2.0.0a4-zero" is a version I built with the extra instructions to zero out the sse2 registers. This example is simpler and the CPU usage is very low (2.3 seconds to process a 5 minute file even with the performance issue), so I probably wouldn't have noticed the problem if I hadn't been on the lookout for it, but it's definitely there.
Code:
$ time ./camilladsp-2.0.0a4-download crossfeed.yml
2023-10-08 19:38:00.527862 INFO [src/bin.rs:695] CamillaDSP version 2.0.0-alpha4
2023-10-08 19:38:00.527899 INFO [src/bin.rs:696] Running on linux, x86_64
2023-10-08 19:38:02.827811 INFO [src/bin.rs:395] Capture finished
2023-10-08 19:38:02.828053 INFO [src/bin.rs:378] Playback finished

real    0m2.303s
user    0m2.768s
sys     0m0.257s
$ time ./camilladsp-2.0.0a4-zero crossfeed.yml
2023-10-08 19:37:53.055546 INFO [src/bin.rs:695] CamillaDSP version 2.0.0-alpha4
2023-10-08 19:37:53.055588 INFO [src/bin.rs:696] Running on linux, x86_64
2023-10-08 19:37:53.531675 INFO [src/bin.rs:395] Capture finished
2023-10-08 19:37:53.531763 INFO [src/bin.rs:378] Playback finished

real    0m0.478s
user    0m1.061s
sys     0m0.119s
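
For comparison, the two `real` times can be turned into a speedup and a real-time factor. This is a small sketch that assumes the input file is exactly 5 minutes (300 s) long. `parse_real_time` is a made-up helper for the `0m2.303s` format printed by `time`.

```python
import re


def parse_real_time(text: str) -> float:
    """Convert a `time` value such as '0m2.303s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


audio_seconds = 300.0                       # assumption: a 5 minute test file
download = parse_real_time("0m2.303s")      # build from GitHub
zeroed = parse_real_time("0m0.478s")        # build that zeroes the registers

speedup = download / zeroed
print(f"speedup: {speedup:.2f}x")
print(f"real-time factor: {audio_seconds / download:.0f}x vs {audio_seconds / zeroed:.0f}x")
```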
 
Hi,

I just started using CamillaDSP and I'm very happy with it. Thank you very much for this incredible software.

I have one question:
I'm running CamillaDSP on an RPi4 with Moode. I have 10 filters for each stereo channel, and this produces a load of 2% CPU and 0.1% MEM. How many filters could I use without introducing any problems? Is the load linear, i.e. with 20 filters each I would have a load of 4%, and is this the only limiting factor?

Cheers, Joachim
 
I'm running CamillaDSP on an RPi4 with Moode. I have 10 filters for each stereo channel, and this produces a load of 2% CPU and 0.1% MEM. How many filters could I use without introducing any problems?

I have CamillaDSP (alpha 0.4) running on a Rpi3A+ clocked at 900 MHz (steady state). This produces a load of 95% CPU (8 threads on 2 CPU cores) and 3% MEM, and it has run rock stable so far. Clocking the CPU a bit lower, at 800 MHz, was just not enough: it led to sporadic xruns (roughly once every 30 minutes). With a different config (and versions <= alpha 0.3), CamillaDSP was stable with a total CPU load of 135%. Nota bene: this does not mean that alpha 0.4 is more critical than the previous versions. Multiple factors strongly influence the stability of a system, and there have been other changes to my setup since then. I mention this historical 135% value to underline that loads >100% may also run stably.

So, on your RP4, you may have huge (!!!) reserves CPU-wise.

Is the load linear i.e., with 20 filters each I would have a load of 4%, and is this the only limiting factor?

You could test this with e.g. 100 ... 500 filters and then report back on the forum.
 
Ok, so there is no data regarding this and there is no additional limiting factor that I haven't seen.

I have done two tests in addition to the one data point I had. All were done simply with a radio station stream at 98 kbps and 16 bit/48 kHz. The filters are Biquad Peaking.

Here is the data:
20 filters (10 for each channel) 2% CPU 0.2% MEM
40 filters (20 for each channel) 3% CPU 0.2% MEM
80 filters (40 for each channel) 4% CPU 0.2% MEM

Measured with "top" on the console.

This fits quite well with your estimate of "huge reserves CPU-wise" :)
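
To get a feel for the scaling, a straight line can be fitted to the three measurements. This is only a rough sketch: `top` rounds to whole percent, there are just three points, and any extrapolation assumes the trend continues, which has not been tested.

```python
filters = [20, 40, 80]        # total number of Biquad Peaking filters
cpu_percent = [2.0, 3.0, 4.0]  # CPU load as shown by top

n = len(filters)
mean_x = sum(filters) / n
mean_y = sum(cpu_percent) / n

# Ordinary least-squares fit: cpu = intercept + slope * filters
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(filters, cpu_percent))
sxx = sum((x - mean_x) ** 2 for x in filters)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"cpu% ~ {intercept:.2f} + {slope:.4f} * filters")
for count in (100, 500):
    print(f"{count} filters -> roughly {intercept + slope * count:.1f}% CPU (extrapolated)")
```

The fit suggests a fixed base load of about 1.5% plus a few hundredths of a percent per filter, so the load is not simply proportional to the filter count, and several hundred biquads should still leave plenty of headroom if the trend holds.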