CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc.

One other thing I noticed.

CamillaDSP and alsa_cdsp support FLOAT64 input. The alsa_cdsp plugin advertises support for six formats, including FLOAT32 and FLOAT64, via the ALSA plugin API, but programs such as JRiver Media Center (26/27/28) do NOT offer FLOAT32 or FLOAT64 as selectable output formats.
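
For anyone curious how that advertisement works: an ioplug-based plugin hands ALSA an explicit list of formats via snd_pcm_ioplug_set_param_list(). The sketch below is illustrative only; the function name and format list are my assumptions, not the actual alsa_cdsp source.

Code:
/* Illustrative sketch, not the actual alsa_cdsp source: an ioplug-based
 * plugin advertises its supported sample formats to ALSA like this. */
#include <alsa/asoundlib.h>
#include <alsa/pcm_external.h>

static int cdsp_set_format_constraints(snd_pcm_ioplug_t *io)
{
    /* Hypothetical format list; the real plugin builds its own. */
    static const unsigned int formats[] = {
        SND_PCM_FORMAT_S16_LE,   SND_PCM_FORMAT_S24_LE,
        SND_PCM_FORMAT_S24_3LE,  SND_PCM_FORMAT_S32_LE,
        SND_PCM_FORMAT_FLOAT_LE, SND_PCM_FORMAT_FLOAT64_LE,
    };
    /* Players see exactly this list when they query hw params; whether
     * they expose FLOAT32/FLOAT64 in their output options is up to them. */
    return snd_pcm_ioplug_set_param_list(io, SND_PCM_IOPLUG_HW_FORMAT,
                                         sizeof(formats) / sizeof(formats[0]),
                                         formats);
}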

It also appears that different Linux playback programs present these options inconsistently.
 
For a little more detail than time provides, the perf tool is available:

Code:
$ sudo apt-get install linux-perf
$ sudo mount -t tmpfs -o size=150m myramdisk /mnt/ramdisk
$ cd /mnt/ramdisk
$ cp <yourtestfiles> .
$ perf_4.9 stat -r 10 camilladsp ./resample_test.yml
Sep 17 21:14:53.230 INFO Capture finished, module: camilladsp
Sep 17 21:14:53.231 INFO Playback finished, module: camilladsp

 Performance counter stats for 'camilladsp /usr/share/camilladsp/configs/resample_test.yml':

        236.931054      task-clock:u (msec)       #    1.964 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               234      page-faults:u             #    0.988 K/sec
       227,944,890      cycles:u                  #    0.962 GHz
       392,416,050      instructions:u            #    1.72  insn per cycle
        61,382,844      branches:u                #  259.075 M/sec
           175,133      branch-misses:u           #    0.29% of all branches

       0.120633112 seconds time elapsed


I spent some more time learning SoX and wrote some bash scripts that create sample wav and raw white noise files in multiple bit depth (16, 24, 32, 64), sample rate (44.1–384 kHz) and format (floating-point and signed-integer) combinations.

I was able to validate that a 2-channel 384kHz FLOAT64 raw input file was processed into an 8-channel 192kHz S32LE output file, applying ten 64-bit FIR filters (multi-pass convolution in .wav format), gain, 2x8 mixing, sample rate and format reduction, etc. I then used this heavy test case for the perf test.

My goal is to have the benchmarks all scripted for easier reuse.

Here is sample "perf" and "time" output; perf runs CamillaDSP for 10 iterations.

My surprise is that running in and out of the ramdisk shows no significant difference from the NVMe drive using the same build optimization flags. I may have to increase the sample file size.

My second surprise (if I am grokking perf's output correctly) is that only about 1.15 CPUs were utilized. My CPU is an old Ivy Bridge i7-3770K with 4 cores and 8 threads. With 2 input channels and 8 output channels, I would have assumed more threads would come into play, but probably not, considering the tightly coupled 8-channel output interleaving.

Code:
sudo perf_5.10 stat -r 10 /tmp/ramdisk/test/camilladsp -l warn ./config_out.yml_192000.yml

 Performance counter stats for '/tmp/ramdisk/test/camilladsp -l warn ./config_out.yml_192000.yml' (10 runs):

          3,472.40 msec task-clock                #    1.154 CPUs utilized            ( +-  0.62% )
             1,590      context-switches          #    0.458 K/sec                    ( +-  0.08% )
                 4      cpu-migrations            #    0.001 K/sec                    ( +- 25.31% )
            32,250      page-faults               #    0.009 M/sec                    ( +-  0.00% )
    13,540,444,406      cycles                    #    3.899 GHz                      ( +-  0.62% )
     7,350,682,505      stalled-cycles-frontend   #   54.29% frontend cycles idle     ( +-  1.19% )
    21,625,826,985      instructions              #    1.60  insn per cycle
                                                  #    0.34  stalled cycles per insn  ( +-  0.01% )
     1,252,698,916      branches                  #  360.759 M/sec                    ( +-  0.04% )
         2,405,940      branch-misses             #    0.19% of all branches          ( +-  0.16% )

           3.00780 +- 0.00304 seconds time elapsed  ( +-  0.10% )


time /tmp/ramdisk/test/camilladsp -l warn ./config_out.yml_192000.yml

real    0m2.998s
user    0m3.307s
sys     0m0.165s
 
Is anyone familiar with perf's "stalled-cycles-frontend"?

It may be of some interest and appears to be related to prefetching data (cycles stalled waiting for data to arrive).

Gcc has some prefetch optimization build flags (e.g. "-fprefetch-loop-arrays"), but I haven't found any equivalent rustc flags to try yet.

https://elinux.org/images/3/37/ELCE_-_fighting_latency.pdf

Code:
     7,350,682,505      stalled-cycles-frontend   #   54.29% frontend cycles idle     ( +-  1.19% )
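
If the stalls do turn out to be data-side, gcc also exposes manual prefetching through __builtin_prefetch(), which is what -fprefetch-loop-arrays tries to insert automatically. A minimal sketch, not taken from CamillaDSP:

Code:
#include <stddef.h>

/* Manual data prefetch with gcc's __builtin_prefetch(); the
 * -fprefetch-loop-arrays flag attempts to insert these automatically. */
void apply_gain(double *dst, const double *src, size_t n, double gain)
{
    for (size_t i = 0; i < n; i++) {
        /* Hint that src[i + 64] will be read soon:
         * arg 2 = 0 (read), arg 3 = 1 (low temporal locality). */
        if (i + 64 < n)
            __builtin_prefetch(&src[i + 64], 0, 1);
        dst[i] = src[i] * gain;
    }
}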

Thanks much.
 
Ad RAM disk vs. NVMe: your tmpfs mount is only 150MB, which means the files are quite small. They all easily fit into the kernel's IO cache/memory buffers, so they are all in RAM after their first use. Locating the files on the RAM disk guarantees reading from memory; without it, reading from RAM is "just" highly probable.
 

FWIW, the 150MB ramdisk is Bitlap's, from the earlier perf example's tmpfs mount.

My ramdisk is a 2GB tmpfs, with 32GB of memory installed.

The CamillaDSP binary, config files, filters, data input and output files are all in the ramdisk for the ramdisk test (and not for the NVMe test). Still not a lot of space until the sample data file is made much larger.
 
OK, but 2GB is still tiny compared to the 32GB of RAM available. Your IO buffers will easily grow past 20GB with that much RAM, keeping most recently used files in RAM. The summary is shown by the free and top commands.

FWIW, I have htop running and it doesn't show much memory usage at all. Zero swap is being used, and I have adjusted the "swappiness" to avoid burning up the NVMe drive with unnecessary swapping, so it only swaps as a last resort in low-memory situations.
 
In top it is the buff/cache value; in htop, the yellow part of the RAM bar (I prefer top). Swap is basically the opposite of buffers/cache.

Of course little RAM is used on a 32GB machine when little is being run. Eventually most of your RAM will be used for caches (if you actually load that many files during the session). Just to explain why your NVMe and tmpfs results were basically identical; not an important topic here.
 
alsa_cdsp issues

Don't know if I should be reporting these issues here or somewhere else.

I have 2 Intel i7 x86-64 machines, one running Debian 10-64 (i7-3770/Gigabyte MoBo) and one running Debian 11-64 (i7-3770K/Gigabyte MoBo).

Issue 1:

I could never get the following asound.conf to work with 705600 & 768000 with 8 channels set.

I tracked it down to the calculated min_p value being set larger than the max_p value in libasound_module_pcm_cdsp.c when 8 channels are configured.

These min/max values are passed to:
Code:
snd_pcm_ioplug_set_param_minmax(&pcm->io,SND_PCM_IOPLUG_HW_PERIOD_BYTES, min_p, max_p)

Setting max_p to 32,768 (from 16,384) appears to work because it is now larger than the calculated min_p (30,780), and the aplay test doesn't crash anymore.
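
To illustrate why the channel count matters here: the period size in bytes scales with channels times bytes per sample, so a frame count that fits under a fixed byte cap at 2 or 4 channels can exceed it at 8. The numbers below are illustrative; the actual min_p formula in libasound_module_pcm_cdsp.c may differ.

Code:
#include <stdio.h>

/* Illustrative only -- not the actual min_p computation from
 * libasound_module_pcm_cdsp.c. Shows period bytes growing with
 * channel count past a fixed 16,384-byte max_p. */
int main(void)
{
    const unsigned frames = 960;          /* hypothetical period in frames */
    const unsigned bytes_per_sample = 4;  /* e.g. S32_LE */
    for (unsigned ch = 2; ch <= 8; ch *= 2) {
        unsigned period_bytes = frames * ch * bytes_per_sample;
        printf("%u ch: %6u bytes (%s 16384)\n", ch, period_bytes,
               period_bytes > 16384 ? "exceeds" : "fits under");
    }
    return 0;
}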

Code:
# /etc/asound.conf
pcm.camilladsp { # Native rate camilladsp wrapper
  type cdsp

...

  min_channels 2
  max_channels 8 # Only supports up to 4, not 8

  rates = [
    44100
    48000
    88200
    96000
    176400
    192000
    352800
    384000
    705600  # Problem with 8-channel DAC
    768000  # Problem with 8-channel DAC
  ]

...
}

Issue 2:

Alsa_cdsp runs fine on the Debian 10 machine but generates XRUN underrun errors on the Debian 11 machine when testing with aplay. The Debian 11 machine is faster (CPU, bus, memory, storage) and runs CamillaDSP 0.6.3 versus 0.5.0 (same alsa_cdsp source but different gcc versions).

I set up identical config files on both machines, set the camilladsp log level to "-l TRACE", and raised alsa_cdsp's "DEBUG=1" level to "DEBUG=5" (and recompiled/installed).

I captured and compared the output of a "FILE IN/FILE OUT" test run on both machines. The faster machine had the following extra logs right before the underruns started.

All of the configuration dumps and buffer sizes were logged with identical values before this ALSA driver callback method was called.

Code:
cdsp_sw_params()
CDSP Plugin INFO: Changing SW avail min: 7680 -> 94223624286336
Changing SW avail min: 94223624286336 -> 4294967258

cdsp_sw_params() is called and it updates pcm->io_avail_min with the much larger value. Once this is done, the underruns start. This happens without any of the above modifications in place.

This does NOT happen on the slower machine. I commented out the resetting of pcm->io_avail_min to the much larger value (leaving it at 7680), and alsa_cdsp stops throwing XRUN underrun errors.

The "Measured sample rate" debug logs are then similar on both machines, where before they were half the expected value on the fast machine with the XRUN underrun errors.

Code:
Oct 23 11:59:31.195 TRCE Measured sample rate is 80634.53888812399 Hz, module: camillalib::filedevice
Oct 23 11:59:32.246 TRCE Measured sample rate is 69908.20094474913 Hz, module: camillalib::filedevice
Oct 23 11:59:33.298 TRCE Measured sample rate is 72119.26986957765 Hz, module: camillalib::filedevice

This probably still needs a proper fix that finds out why this function is being called with such a large number; it could indicate another problem or a configuration issue.

aplay works with this hack on the faster machine (tested with 44.1–384 kHz wav files) with no XRUN underrun errors.

Issue 3:

JRMC28 is not working with alsa_cdsp on the faster machine. JRMC28 was developed on Debian 10, not 11. I will test JRMC28 on the Debian 10 machine as a WIP.
 
Update on Issue #2.

It appears the function call is returning random uninitialized memory while not returning an error code. Zeroing the memory first and then checking against zero appears to address the problem.

This callback is called twice on startup on both machines, but only the faster one was changing the value to something problematic.
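
A minimal sketch of the workaround (the callback shape and the cdsp_t/pcm names are assumptions based on the snippets above, not the actual patch):

Code:
/* Sketch of the workaround, not the actual alsa_cdsp patch. The call can
 * apparently return success without writing the output parameter, leaving
 * stack garbage behind; zero-initializing first makes that detectable. */
static int cdsp_sw_params(snd_pcm_ioplug_t *io, snd_pcm_sw_params_t *params)
{
    cdsp_t *pcm = io->private_data;   /* hypothetical plugin state */
    snd_pcm_uframes_t avail_min = 0;  /* zero first */

    snd_pcm_sw_params_get_avail_min(params, &avail_min);
    if (avail_min != 0 && avail_min != pcm->io_avail_min) {
        /* Only adopt a value that was actually written. */
        pcm->io_avail_min = avail_min;
    }
    return 0;
}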

 
Have you checked alsa-lib versions? Probably lots of changes in alsa-lib between Debian 10 and 11.

I pulled down the alsa-lib code and started looking at portions of it, but haven't compared differences between 10 and 11.

It appears JRMC has some debug logging features. I installed JRMC28 on Debian 10 as well as CamillaDSP 0.6.3, and it ran (with my alsa_cdsp patches). Given that, I will probably generate JRMC28 debug logs on both machines and compare the two. Hopefully some detailed, helpful deltas will appear in the Debian 11 version.

JRMC28 on Debian 11 doesn't get far enough for alsa_cdsp to generate the rate-specific config_out.yml file, so JRMC28's debug files need to be detailed enough to identify what it doesn't like.

JRMC doesn't seem to like outputting 2 channels into the 2x8 mixer. I usually have to configure it to output 8 channels, of which 6 are wasted going into the 2x8 mixer. It is probably some channel/rate/format mismatch issue.
 
Update:

I ran the JRMC28 tests and the logs are not verbose enough to identify the specific problem. I am trying to find out if there is a way to increase the verbosity level. On Debian 11, JRMC28 logs the following error and nothing more detailed.

Code:
CALSAPlugin::OpenALSA: Opening audio device camilladsp failed, Error = No such device or address
 
FYI, John Mulcahy, the creator of REW, just released a new version of REW that understands the extensible (WAVE_FORMAT_EXTENSIBLE) wav file format generated by SoX.

This allows you to use REW's feature-rich frequency plotting tools and controls (zoom/smoothing/overlays/colors/axis/gain/etc.). REW's plotting tools are much more capable than Audacity's.

Here is a capture of 4-way filters from SoX-generated white noise samples, with CamillaDSP configured to output to an 8-channel 192kHz S32LE file.

SoX was then used to split the 8-channel file into 8 mono wav files for individual inspection in REW's OVERLAY window.

Hope this is helpful to anyone wanting to inspect their filters for a good sanity check.

It also contrasts what the filters look like with different sample rates (e.g. comparing 16/44.1 versus 64/768 white noise sample files).

BTW, I used REW's level adjust to back out the individual driver gains configured in the CamillaDSP config file, to more accurately visualize the XO points.

(Image: REW overlay of the 4-way filter measurements)
 
SoX can generate both formats; the non-extensible wav requires the -t wavpcm specifier:

Code:
sox -r 192000 -c 1 -n -c8 -b 32 out.wav synth 5 sine 1k

soxi -V out.wav 
soxi INFO formats: detected file format type `wav'
soxi INFO wav: EXTENSIBLE

Input File     : 'out.wav'
Channels       : 8
Sample Rate    : 192000
Precision      : 32-bit
Duration       : 00:00:05.00 = 960000 samples ~ 375 CDDA sectors
File Size      : 30.7M
Bit Rate       : 49.2M
Sample Encoding: 32-bit Signed Integer PCM

vs.

Code:
sox -r 192000 -c 1 -n -c8 -b 32 -t wavpcm out-pcm.wav synth 5 sine 1k

soxi -V out-pcm.wav 
soxi INFO formats: detected file format type `wav'

Input File     : 'out-pcm.wav'
Channels       : 8
Sample Rate    : 192000
Precision      : 32-bit
Duration       : 00:00:05.00 = 960000 samples ~ 375 CDDA sectors
File Size      : 30.7M
Bit Rate       : 49.2M
Sample Encoding: 32-bit Signed Integer PCM

Historically, the built-in Java converters could not read the extensible format.
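
If you want to check which variant a given file carries without soxi, the distinguishing field is the fmt chunk's format tag: 0x0001 for plain PCM, 0xFFFE for extensible. A rough sketch that assumes the canonical header layout (a robust parser should walk the RIFF chunks properly):

Code:
#include <stdio.h>

/* Prints the wav format tag: 0x0001 = WAVE_FORMAT_PCM,
 * 0xFFFE = WAVE_FORMAT_EXTENSIBLE. Assumes the canonical layout where
 * the fmt chunk starts at byte 12; real parsers should walk the chunks. */
int main(int argc, char **argv)
{
    unsigned char hdr[22];
    if (argc < 2)
        return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f || fread(hdr, 1, sizeof hdr, f) != sizeof hdr) {
        perror(argv[1]);
        return 1;
    }
    /* The format tag is the first field of the fmt chunk body (LE). */
    unsigned tag = hdr[20] | ((unsigned)hdr[21] << 8);
    printf("%s: format tag 0x%04X (%s)\n", argv[1], tag,
           tag == 0xFFFE ? "EXTENSIBLE" : tag == 0x0001 ? "PCM" : "other");
    fclose(f);
    return 0;
}

Run against the two files above, it should report 0xFFFE for out.wav and 0x0001 for out-pcm.wav, assuming SoX writes the fmt chunk first.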