CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc.

I have some build optimization questions if you don't mind.

On a Linux box, is setting "-C target-cpu=native" sufficient to enable all of the CPU's target features, or do they need to be added individually with "-C target-feature=+sse4.1,+sse4.2,+avx,+etc" (from the "rustc --print target-features" listing for the specific CPU)?
Setting target cpu to native gives the same result as adding each target feature manually.
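For anyone wanting to double-check this themselves, rustc can print the cfg values a given flag enables, so the effect of "native" can be compared against the generic target. A sketch (run from the project directory):

Code:
```
$ rustc --print cfg -C target-cpu=native | grep target_feature
$ rustc --print cfg | grep target_feature
$ RUSTFLAGS="-C target-cpu=native" cargo build --release
```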





Does LTO (Link Time Optimization) do anything other than clearing out dead/unused code from the resulting executable?
Don't know! I have assumed that the default release profile is a reasonable compromise, and haven't found the time to experiment with it yet.



After trying to comprehend the cargo and rustc docs, I appended the following "release" profile optimization flags to the end of Cargo.toml and noticed that the resulting binary shrank from ~8.5 MB to 5.5 MB.

Are these flags reasonable for optimization?
...
The "opt-level" is 3 by default, no need to set that. Playing with "lto" and "codegen-units" might be interesting; exploring this has been on my todo-list for quite a while.

The default settings are here:

Profiles - The Cargo Book
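For reference, a sketch of overriding those defaults in Cargo.toml (the values here are just things to experiment with, not recommendations):

Code:
```toml
[profile.release]
# Cargo's release defaults: opt-level = 3, lto = false, codegen-units = 16
lto = "fat"         # whole-program LTO; slower to build, can shrink or speed up the binary
codegen-units = 1   # one codegen unit lets LLVM optimize across the whole crate; slower compiles
```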


I also noticed the build is using "--edition=2018" when there appears to be a 2021 option. Is there any reason not to be using the 2021 option?
The editions are for handling changes of the language itself. There should be no performance benefit in using the newer edition. I will migrate everything to edition 2021 once 1.0.0 is out the door. The changes from 2018 to 2021 are quite small, should be very easy.
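For the record, cargo has a helper for edition migrations, roughly:

Code:
```
$ cargo fix --edition      # applies the machine-applicable migration fixes
$ # then bump edition = "2021" in Cargo.toml and rebuild
```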
 
Any chance a 64-bit camilladsp distro can be made, like a headless one for those of us who can't get our heads around installing and configuring Linux audio pipelines for camilladsp? It'd be awesome to have for things like turning a Raspberry Pi into a super powerful 2-channel DSP board. It's a big ask, but imagine it as a DIY (and better) alternative to miniDSP. You could even sell the distro. Just an idea.
 
I think I may have answered some of my own questions after discovering the camilladsp/target/.rustc_info.json log file and running a few test builds.

If "target_feature=+avx,+etc,+etc" is NOT explicitly specified, the CPU features still appear in the .rustc_info.json log file.

If "target_cpu=native" is specified, "aes" is added to the list.

If "target_cpu=ivybridge" is specified, "aes" is NOT added to the list (maybe because my Ivy Bridge CPU is unlocked?).

Code:
{"rustc_fingerprint":11919284090315669549,"outputs":
        {"6851037996948357471":
                {"success":false,"status":"exit status: 1","code":1,"stdout":"","stderr":
                        "error: `-Csplit-debuginfo` is unstable on this platform\n\n"},"17598535894874457435":
                {"success":true,"status":"","code":0, "stdout":
                        "rustc 1.55.0 (c8dfcfe04 2021-09-06)\n
                        binary: rustc\n
                        commit-hash: c8dfcfe046a7680554bf4eb612bad840e7631c4b\n
                        commit-date: 2021-09-06\nhost: x86_64-unknown-linux-gnu\nrelease: 1.55.0\n
                        LLVM version: 12.0.1\n","stderr":""},"10326529527813338399":
                {"success":true,"status":"","code":0,"stdout":"___\nlib___.rlib\nlib___.so\nlib___.so\nlib___.a\nlib___.so\n
                        /blah/blah/.rustup/toolchains/stable-x86_64-unknown-linux-gnu\n
                        proc_macro\n
                        target_arch="x86_64"\n
                        target_endian="little"\n
                        target_env="gnu"\n
                        target_family="unix"\n
                        target_feature="aes"\n
                        target_feature="avx"\n
                        target_feature="fxsr"\n
                        target_feature="pclmulqdq"\n
                        target_feature="popcnt"\n
                        target_feature="rdrand"\n
                        target_feature="sse"\n
                        target_feature="sse2"\n
                        target_feature="sse3"\n
                        target_feature="sse4.1"\n
                        target_feature="sse4.2"\n
                        target_feature="ssse3"\n
                        target_feature="xsave"\n
                        target_feature="xsaveopt"\n
                        target_os="linux"\n
                        target_pointer_width="64"\n
                        target_vendor="unknown"\nunix\nverbose\n","stderr":""}},"successes":{}}

After reading more about the rustc releases, it appears the rustc development is moving along at a rapid pace with about half a dozen releases so far in 2021.

The 2021 edition option became stable in August and is targeted for rustc version 1.56.0, due 10/21/2021, so it is not out yet. Migrating to it may also require significant code/syntax changes.
 
Setting target cpu to native gives the same result as adding each target feature manually.

Don't know! I have assumed that the default release profile is a reasonable compromise, and haven't found the time to experiment with it yet.

The "opt-level" is 3 by default, no need to set that. Playing with "lto" and "codegen-units" might be interesting; exploring this has been on my todo-list for quite a while.

The default settings are here:

Profiles - The Cargo Book

The editions are for handling changes of the language itself. There should be no performance benefit in using the newer edition. I will migrate everything to edition 2021 once 1.0.0 is out the door. The changes from 2018 to 2021 are quite small, should be very easy.

HenrikEnquist,

Thanks much for the info.

The LTO functionality is interesting and new to me. It reminds me a little of "coalescing compilers", which were few and far between, but LTO is that on steroids. I watched a few YouTube lectures on LTO and it appears to do much more than removing unused dead wood. One lecture indicated an average 3-5% performance increase.
 
Making and maintaining a custom distribution is a massive undertaking! That would take far too much time, so the chance for that is exactly zero. Unless someone else does it :)


But I plan to make an automated installer for the USB gadget mode. That will just need an RPi4 with some kind of DAC (HAT or USB) and a standard Raspberry Pi OS installation. Then the script will set up camilladsp, the gadget mode, the GUI, and systemd services to start everything on boot.

That gives you a device that behaves like a USB sound card, and that has DSP via CamillaDSP built in.


I plan to start on this properly once RPi OS comes with kernel 5.14. In earlier kernels the USB audio gadget mode doesn't work with Windows and macOS.
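As a rough idea of the systemd part of such a script, a unit file along these lines could start camilladsp on boot (all names and paths here are made up for illustration, not from the actual installer):

Code:
```ini
# /etc/systemd/system/camilladsp.service (sketch only)
[Unit]
Description=CamillaDSP
After=sound.target

[Service]
ExecStart=/usr/local/bin/camilladsp /etc/camilladsp/config.yml
Restart=always

[Install]
WantedBy=multi-user.target
```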
 
Really?! With my IQAudioDAC+ it's just a matter of enabling I2S for the board on startup, similar to how you'd enable onboard audio, but with dtoverlay=iqaudio-dacplus.

Well, I mean that dtoverlay=iqaudio-dacplus creates some ALSA device with some name, and that name needs to be entered into the camilladsp config. Automating all of that is the major hassle - that goes toward projects like moOde.
 
Benchmarking

HenrikEnquist,

Can you recommend a preferred way to benchmark the various optimized builds?

I would like to see if there are any differences and if so, by how much.

After running strip, my speed-optimized binary is now one third the size of the default build.

I have a ramdisk set up in tmpfs that is getting ~3.4 GB/s, which could house the sample input files and the config files, and from which camillaDSP could be launched via alsa_cdsp.

Would something like the following be suitable, or a pipelined input file?

Code:
time aplay -D camilladsp sample.flac

I would prefer to keep a player out of the benchmark if possible. Any suggestions are greatly welcomed (including a perf monitor).

Thanks much.
 
Can you recommend a preferred way to benchmark the various optimized builds?
I would start by preparing a test audiofile in a raw format, i32 for example. Put that in the ramdisk, and make a camilladsp config that uses File input to read the test file. For output, you can also use File, and put /dev/null. Add a mix of different filters and mixers in the pipeline to give Camilla some work to do.

Then just run "time camilladsp testconfig.yml". This will include setting up the pipeline and all filters, so make the testfile large enough that it takes a couple of seconds to process it.
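A minimal config along those lines might look like this (a sketch based on the config format in the CamillaDSP docs; filenames, rates and filter values are placeholders):

Code:
```yaml
devices:
  samplerate: 192000
  chunksize: 4096
  capture:
    type: File
    channels: 2
    filename: /mnt/ramdisk/test.raw
    format: S32LE
  playback:
    type: File
    channels: 2
    filename: /dev/null
    format: S32LE
filters:
  testfilter:
    type: Biquad
    parameters:
      type: Lowpass
      freq: 1000
      q: 0.7
pipeline:
  - type: Filter
    channel: 0
    names:
      - testfilter
  - type: Filter
    channel: 1
    names:
      - testfilter
```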

I'm very curious to know what results you get. I have only been benchmarking individual components, not the whole thing like this. There are some benchmarks in the camilladsp repo you can run with "cargo bench".
 
HenrikEnquist,

Can you recommend a preferred way to benchmark the various optimized builds?

For a little more detail than time provides, the tool perf is available:

Code:
$ sudo apt-get install linux-perf
$ sudo mount -t tmpfs -o size=150m myramdisk /mnt/ramdisk
$ cd /mnt/ramdisk
$ cp <yourtestfiles> .
$ perf_4.9 stat -r 10 camilladsp ./resample_test.yml
Sep 17 21:14:53.230 INFO Capture finished, module: camilladsp
Sep 17 21:14:53.231 INFO Playback finished, module: camilladsp

 Performance counter stats for 'camilladsp /usr/share/camilladsp/configs/resample_test.yml':

        236.931054      task-clock:u (msec)       #    1.964 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               234      page-faults:u             #    0.988 K/sec
       227,944,890      cycles:u                  #    0.962 GHz
       392,416,050      instructions:u            #    1.72  insn per cycle
        61,382,844      branches:u                #  259.075 M/sec
           175,133      branch-misses:u           #    0.29% of all branches

       0.120633112 seconds time elapsed
 
I would start by preparing a test audiofile in a raw format, i32 for example. Put that in the ramdisk, and make a camilladsp config that uses File input to read the test file. For output, you can also use File, and put /dev/null. Add a mix of different filters and mixers in the pipeline to give Camilla some work to do.

Then just run "time camilladsp testconfig.yml". This will include setting up the pipeline and all filters, so make the testfile large enough that it takes a couple of seconds to process it.

I'm very curious to know what results you get. I have only been benchmarking individual components, not the whole thing like this. There are some benchmarks in the camilladsp repo you can run with "cargo bench".

HenrikEnquist,

Thanks much. I needed to figure out how to create raw files. I was able to create a stereo 192 kHz 32-bit white noise wav file in REW, then use SoX to convert it into a stereo 32-bit raw file and run it through CamillaDSP to create an 8-channel 192 kHz 32-bit raw output file.

I used SoX to split the 8-channel raw file into 8 separate mono wav files so they could be individually plotted in Audacity (to verify the shape of each resulting channel's output). I need to find a better plotting program than Audacity. Everything appears to be functioning as expected, so I can generate a larger sample file and test different build options.
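For reference, the SoX steps were roughly like this (filenames and rates are examples; raw files need the format spelled out since they have no header):

Code:
```
$ # stereo 32-bit wav -> headerless raw
$ sox noise.wav -t raw -e signed-integer -b 32 noise.raw
$ # extract channel 1 of an 8-channel 192 kHz raw file as a mono wav
$ sox -t raw -r 192000 -e signed-integer -b 32 -c 8 out.raw ch1.wav remix 1
```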

I also tried the "cargo bench" option with different optimization flags from -O0 to -O3 and native (first time using it, and I don't think I got LTO working with it). Switching between -O2 and -O3 helped one of the bench tests and hurt another by a very slight margin. The largest delta was between no optimization and some optimization.

I will post my findings once I have them.

For a little more detail than time provides, the tool perf is available:

Code:
$ sudo apt-get install linux-perf
$ sudo mount -t tmpfs -o size=150m myramdisk /mnt/ramdisk
$ cd /mnt/ramdisk
$ cp <yourtestfiles> .
$ perf_4.9 stat -r 10 camilladsp ./resample_test.yml
Sep 17 21:14:53.230 INFO Capture finished, module: camilladsp
Sep 17 21:14:53.231 INFO Playback finished, module: camilladsp

 Performance counter stats for 'camilladsp /usr/share/camilladsp/configs/resample_test.yml':

        236.931054      task-clock:u (msec)       #    1.964 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               234      page-faults:u             #    0.988 K/sec
       227,944,890      cycles:u                  #    0.962 GHz
       392,416,050      instructions:u            #    1.72  insn per cycle
        61,382,844      branches:u                #  259.075 M/sec
           175,133      branch-misses:u           #    0.29% of all branches

       0.120633112 seconds time elapsed

Bitlab,

Thanks for this. It will be very helpful.
 
Just a hint - you can use sox for all of that, in a single command (and much more :) )

I am just learning SoX and discovered it even has some DSP functionality.

I tried using SoX to go full circle from wav -> raw -> wav and then doing a binary diff to guarantee my flag selection was correct. While doing this, I discovered SoX creates .wav files with type WAVE_FORMAT_EXTENSIBLE instead of WAVE_FORMAT_PCM, so the binary diffs fail due to the header type and additional data chunk. I also discovered that many programs have issues loading WAVE_FORMAT_EXTENSIBLE files.

Loading raw formats into Audacity requires answering a lot of format-related questions, since there is no descriptive header in a raw file. Being able to load the .wav file would bypass those questions and make loading less cumbersome.

Audacity's frequency analysis plot window could use a facelift compared to REW's. I tried loading the WAVE_FORMAT_EXTENSIBLE files into REW and it didn't like the "EXTENSIBLE" format. I posted about the conflict on REW's forum last night, and John replied that he will try to add EXTENSIBLE support in the next build. REW's frequency plotter is light years ahead of Audacity's and also allows multiple plots to be rendered on the same graph using REW's Overlay option.

This would allow plotting all processed XO filters in a single window with scaling/zooming controls and printing options.
 