One other thing I noticed.
CamillaDSP and Alsa_cdsp support FLOAT64 input. The alsa_cdsp plugin advertises support for 6 formats, including FLOAT32 and FLOAT64, via the ALSA plugin API, but programs such as JRiver Media Center [26,27,28] do NOT offer FLOAT32 or FLOAT64 as selectable output formats.
It also appears that different Linux playback programs present these options inconsistently.
For a little more detail than time provides, the tool perf is available:
PHP:
$ sudo apt-get install linux-perf
$ sudo mount -t tmpfs -o size=150m myramdisk /mnt/ramdisk
$ cd /mnt/ramdisk
$ cp <yourtestfiles> .
$ perf_4.9 stat -r 10 camilladsp ./resample_test.yml
Sep 17 21:14:53.230 INFO Capture finished, module: camilladsp
Sep 17 21:14:53.231 INFO Playback finished, module: camilladsp
Performance counter stats for 'camilladsp /usr/share/camilladsp/configs/resample_test.yml':
236.931054 task-clock:u (msec) # 1.964 CPUs utilized
0 context-switches:u # 0.000 K/sec
0 cpu-migrations:u # 0.000 K/sec
234 page-faults:u # 0.988 K/sec
227,944,890 cycles:u # 0.962 GHz
392,416,050 instructions:u # 1.72 insn per cycle
61,382,844 branches:u # 259.075 M/sec
175,133 branch-misses:u # 0.29% of all branches
0.120633112 seconds time elapsed
I spent some more time learning SoX and was able to write some bash scripts that create sample wav and raw white noise files in multiple bit depth [16,24,32,64], sample rate [44.1-384]kHz and format [floating-point and signed-integer] combinations.
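A minimal sketch of how such a test-file matrix could be scripted, written in Python rather than bash and not the scripts actually used here. The file naming scheme, the 10 second duration and the helper name `sox_commands` are made up for illustration, and it only builds the SoX command lines instead of running them. It assumes floating-point output is limited to 32 and 64 bits and signed-integer output to 16, 24 and 32 bits.

```python
from itertools import product

# Sample rates in Hz covering the 44.1-384 kHz range mentioned above.
RATES = [44100, 48000, 88200, 96000, 176400, 192000, 352800, 384000]

# Bit depths per encoding. Assumption: floating-point is written only as
# 32 or 64 bit, signed-integer only as 16, 24 or 32 bit.
DEPTHS = {
    "signed-integer": [16, 24, 32],
    "floating-point": [32, 64],
}


def sox_commands(seconds=10, channels=2):
    """Return one SoX command line per encoding/depth/rate/container combo."""
    commands = []
    for encoding, depths in DEPTHS.items():
        tag = "f" if encoding == "floating-point" else "s"
        for bits, rate, ext in product(depths, RATES, ["wav", "raw"]):
            name = f"noise_{tag}{bits}_{rate}.{ext}"  # hypothetical naming scheme
            commands.append(
                f"sox -n -r {rate} -c {channels} -b {bits} -e {encoding} "
                f"{name} synth {seconds} whitenoise"
            )
    return commands


if __name__ == "__main__":
    for cmd in sox_commands():
        print(cmd)
```

Printing the commands first makes it easy to eyeball the matrix before piping it to a shell.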
I was able to validate that a 2-channel 384kHz FLOAT64 raw input file was processed into an 8-channel 192kHz S32LE output file, applying 10 64-bit FIR filters (multi-pass convolution in .wav format), gain, 2x8 mixing, sample rate and format reduction, etc. I then used this heavy test case to try the perf test.
My goal is to have the benchmarks all scripted for easier reuse.
Here is a sample "perf" and "time" output from running 10 iterations of CamillaDSP in the perf test.
My first surprise is that running from the ramdisk shows no significant difference compared with the NVMe drive, using the same build optimization flags. I may have to increase the sample file size.
My second surprise (if I am grokking perf's output correctly) is that only about 1.1 CPU threads were used. My CPU is an old IvyBridge i7-3770K with 4 cores and 8 threads. With 2 input channels and 8 output channels, I would have assumed more threads would come into play, but probably not, considering the tightly coupled 8-channel output interleaving.
PHP:
sudo perf_5.10 stat -r 10 /tmp/ramdisk/test/camilladsp -l warn ./config_out.yml_192000.yml
Performance counter stats for '/tmp/ramdisk/test/camilladsp -l warn ./config_out.yml_192000.yml' (10 runs):
3,472.40 msec task-clock # 1.154 CPUs utilized ( +- 0.62% )
1,590 context-switches # 0.458 K/sec ( +- 0.08% )
4 cpu-migrations # 0.001 K/sec ( +- 25.31% )
32,250 page-faults # 0.009 M/sec ( +- 0.00% )
13,540,444,406 cycles # 3.899 GHz ( +- 0.62% )
7,350,682,505 stalled-cycles-frontend # 54.29% frontend cycles idle ( +- 1.19% )
21,625,826,985 instructions # 1.60 insn per cycle
# 0.34 stalled cycles per insn ( +- 0.01% )
1,252,698,916 branches # 360.759 M/sec ( +- 0.04% )
2,405,940 branch-misses # 0.19% of all branches ( +- 0.16% )
3.00780 +- 0.00304 seconds time elapsed ( +- 0.10% )
time /tmp/ramdisk/test/camilladsp -l warn ./config_out.yml_192000.yml
real 0m2.998s
user 0m3.307s
sys 0m0.165s
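As a cross-check on perf's "CPUs utilized" figure, the same ratio can be derived from the time output as (user + sys) / real. A small Python sketch using the numbers above; the helper name `cpus_utilized` is just for illustration.

```python
def cpus_utilized(user_s, sys_s, real_s):
    """Average number of CPUs kept busy: CPU time divided by wall-clock time."""
    return (user_s + sys_s) / real_s


# Values from the `time` run above.
from_time = cpus_utilized(user_s=3.307, sys_s=0.165, real_s=2.998)

# perf reports task-clock (CPU time in msec) and elapsed seconds separately.
from_perf = (3472.40 / 1000.0) / 3.00780

print(f"time: {from_time:.3f} CPUs, perf: {from_perf:.3f} CPUs")
```

Both come out at roughly 1.15, so the two tools agree that little more than one thread was busy on average.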
Is anyone familiar with perf's "stalled-cycles-frontend"?
It may be of some interest and appears to be related to prefetching data (stalled waiting for data).
Gcc has some prefetch optimization build flags (e.g. "-fprefetch-loop-arrays"), but I haven't found any equivalent rustc flags to try yet.
https://elinux.org/images/3/37/ELCE_-_fighting_latency.pdf
Thanks much.
Ad RAM disk vs. NVMe: your TMPFS mount is only 150MB, which means the files are quite small. They all easily fit into the kernel's IO cache/memory buffers, so they are all in RAM after their first use. Placing the files in the RAM disk guarantees reading from memory; without it, reading from RAM is "just" highly probable.
FWIW, Bitlap's ramdisk is 150MB (first quote box).
My ramdisk is 2GB in the TMPFS with 32GB memory installed.
The CamillaDSP binary, config files, filters, data input and output files are all in the ramdisk for the ramdisk test (and not for the NVMe test). Still not a lot of space until the sample data file is made much larger.
OK, still 2GB is tiny compared to the 32GB of RAM available. Your IO buffers will easily reach over 20GB with that much RAM, keeping most recently used files in RAM. The summary is shown by the commands free and top.
FWIW, I have htop running and it doesn't show much memory usage at all. Zero swap is being used, and I have adjusted the "swappiness" to avoid burning up the NVMe drive with unnecessary swaps, so that it swaps only as a last resort in a low-memory situation.
In top the buff/cache value, in htop the yellow part of RAM usage (I prefer top). Swap is basically the opposite to buffers/cache.
Of course little RAM is used on a 32GB machine when little is being run. Eventually most of your RAM will be used for caches (if you actually load that many files during the session). Just to explain why your NVME and TMPFS results were basically identical, not an important topic here.
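To watch the cache grow without opening top, the same numbers can be read from /proc/meminfo. A minimal Python sketch; the helper names are hypothetical, and summing Buffers, Cached and SReclaimable is assumed to be roughly what top reports as buff/cache.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def cache_kb(fields):
    """Memory used for buffers and page cache, in kB (approximates buff/cache)."""
    return (fields.get("Buffers", 0)
            + fields.get("Cached", 0)
            + fields.get("SReclaimable", 0))


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):  # Linux only
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"buff/cache: {cache_kb(info) / 1024:.0f} MiB "
          f"of {info.get('MemTotal', 0) / 1024:.0f} MiB total")
```

Running it before and after a benchmark pass should show the cache figure rise by about the size of the files that were read.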
alsa_cdsp issues
Don't know if I should be reporting these issues here or somewhere else.
I have 2 Intel i7 X86-64 machines, one running Debian 10-64 (i7-3770/Gigabyte MoBo) and one running Debian 11-64 (i7-3770K/Gigabyte MoBo).
Issue 1:
I could never get the following asound.conf to work with 705600 & 768000 with 8 channels set.
I tracked it down to the calculated min_p value being set larger than the max_p value in libasound_module_pcm_cdsp.c when 8 channels are configured.
These min/max values are passed to:
Code:
snd_pcm_ioplug_set_param_minmax(&pcm->io,SND_PCM_IOPLUG_HW_PERIOD_BYTES, min_p, max_p)
Setting max_p to 32,768 (from 16,384) appears to work because it is now larger than the calculated min_p(30,780) and the aplay test doesn't crash any more.
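The arithmetic behind that workaround, as a small Python sketch. This is not the plugin's code: the helper names are hypothetical, and rounding the maximum up to a power of two is my assumption based on the 16,384 and 32,768 values, not a documented requirement.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def period_range(min_p, max_p):
    """Return a (min, max) pair that describes a non-empty range.

    If the computed minimum exceeds the configured maximum, no period size
    can satisfy the constraint, so the maximum is raised to the next power
    of two at or above the minimum.
    """
    if min_p > max_p:
        max_p = next_power_of_two(min_p)
    return min_p, max_p


# The values from the 8-channel case described above.
print(period_range(30780, 16384))
```

With min_p = 30,780 the adjusted maximum comes out as 32,768, which matches the value that made aplay stop crashing.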
Code:
# /etc/asound.conf
pcm.camilladsp { # Native rate camilladsp wrapper
type cdsp
...
min_channels 2
max_channels 8 # Only supports up to 4, not 8
rates = [
44100
48000
88200
96000
176400
192000
352800
384000
705600 # Problem with 8-channel DAC
768000 # Problem with 8-channel DAC
]
...
}
Issue 2:
Alsa_cdsp runs cleanly on the Debian 10 machine but generates XRUN underrun errors on the Debian 11 machine when tested with aplay. The Debian 11 machine is faster (CPU, bus, memory, storage) and is running 0.63.0 versus 0.50.0 (same alsa_cdsp but different gcc versions).
I set up identical config files on both machines, set the camilladsp log level to "-l TRACE" and changed the alsa_cdsp "DEBUG=1" level to "DEBUG=5" (and recompiled/installed).
I captured and compared output of "FILE IN/File OUT" test run on both machines. The faster machine had the following extra logs inserted right before the underruns started logging.
All of the configuration dumps and buffer sizes were logged with identical values before this ALSA driver callback method was called.
Code:
cdsp_sw_params()
CDSP Plugin INFO: Changing SW avail min: 7680 -> 94223624286336
Changing SW avail min: 94223624286336 -> 4294967258
cdsp_sw_params() is called and it updates pcm->io_avail_min with the much larger value. Once this is done, the underruns start. This happens without any of the above modifications in place.
This does NOT happen on the slower machine. I commented out the resetting of the pcm->io_avail_min to the much larger value (leaving it at 7680) and alsa_cdsp stops throwing XRUN underrun errors.
With that change, the "Measured sample rate" values in the debug log are similar on both machines; before it, they were half as large on the fast machine while the XRUN underrun errors were occurring.
Code:
Oct 23 11:59:31.195 TRCE Measured sample rate is 80634.53888812399 Hz, module: camillalib::filedevice
Oct 23 11:59:32.246 TRCE Measured sample rate is 69908.20094474913 Hz, module: camillalib::filedevice
Oct 23 11:59:33.298 TRCE Measured sample rate is 72119.26986957765 Hz, module: camillalib::filedevice
This bug probably needs a proper fix to find out why this function is being called with such a large number. It could indicate another problem or configuration issue.
aplay works with this hack on the faster machine (tested with [44.1-384]kHz wav files) with no XRUN underrun errors.
Issue 3:
JRMC28 is not working with alsa_cdsp on the faster machine. JRMC28 was developed on Debian 10, not 11. Will test JRMC28 on the Debian 10 machine as a WIP.
Update on Issue #2.
It appears the function call is returning random uninitialized memory while not returning an error code. Zeroing the memory first and then checking against zero appears to address the problem.
This callback is being called twice on startup on both machines, but only the faster one was changing the value to something problematic.
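The guard logic can be sketched in a few lines of Python. The real fix lives in the C plugin, so this only illustrates the idea: the function name, the upper bound check and the buffer size are all hypothetical and not taken from the alsa_cdsp source.

```python
def sane_avail_min(current, requested, buffer_frames):
    """Keep the current avail_min unless the requested value is plausible.

    A plausible value is assumed to lie in 1..buffer_frames. Zero (what a
    zeroed, never-written variable would hold) and anything larger than the
    buffer are treated as garbage and ignored.
    """
    if 0 < requested <= buffer_frames:
        return requested
    return current


BUFFER_FRAMES = 30720  # hypothetical buffer size, not taken from the logs

print(sane_avail_min(7680, 94223624286336, BUFFER_FRAMES))  # garbage value ignored
print(sane_avail_min(7680, 0, BUFFER_FRAMES))               # zeroed value ignored
print(sane_avail_min(7680, 3840, BUFFER_FRAMES))            # plausible value accepted
```

As a side note, the second logged value, 4294967258, equals 2**32 - 38, which is what -38 looks like when read as an unsigned 32-bit number. That may or may not be a coincidence.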
Have you checked alsa-lib versions? Probably lots of changes in alsa-lib between debian 10 and 11.
I pulled down the alsa-lib code and started looking at portions of it, but haven't compared differences between 10 and 11.
It appears JRMCXX has some debug logging features. I installed JRMC28 on Debian 10 as well as CamillaDSP 0.6.3 and it ran (with my alsa_cdsp patches). Being such, I will probably generate JRMC28 debug logs on both machines and compare the two. Hopefully there will be some detailed helpful deltas appearing in the Debian 11 version.
JRMC28 on Debian 11 doesn't get far enough for alsa_cdsp to generate the rate specific config_out.yml file so JRMC28's debug files need to be detailed enough to discover/identify what it doesn't like.
JRMC doesn't seem to like outputting 2 channels into the 2x8 mixer. I usually have to configure it to output 8 channels, of which 6 are wasted going into the 2x8 mixer. It is probably some channel/rate/format mismatch issue.
Update:
I ran the JRMC28 tests and the logs are not verbose enough to identify the specific problem. Trying to find out if there is a way to increase the verbosity level. On Debian 11, JRMC28 logs the following error, nothing more detailed.
Code:
CALSAPlugin::OpenALSA: Opening audio device camilladsp failed, Error = No such device or address
Class dunce here again. Stupidly upgraded the kernel from 5.11 to 5.13 and ran into this:
Code:
ubuntu@ubuntu:~$ sudo modprobe snd-aloop
modprobe: FATAL: Module snd-aloop not found in directory /lib/modules/5.13.0-1008-raspi
I have no idea where to go next. Help please!
Try running:
sudo apt install linux-modules-extra-$(uname -r)
Michael
Brilliant! All sorted. Many thanks again Michael. If I had to do this before, I've completely forgotten about it.
You are welcome.
I ran into it when I did a clean install of Ubuntu 21.10 recently, never had to worry about it on 21.04.
Michael
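For anyone wanting to check after a kernel upgrade whether snd-aloop is actually installed for the running kernel, here is a minimal Python sketch. It relies on the /lib/modules/<release> layout shown in the modprobe error above; the function name is hypothetical, and the assumption is that the module file is named after the module with a .ko suffix, optionally compressed.

```python
import os
import platform


def find_module(name, release=None, root="/lib/modules"):
    """Return paths of kernel module files called `name` for a kernel release."""
    release = release or platform.release()  # same string as `uname -r`
    # Accept both spellings, since '-' and '_' are used interchangeably.
    wanted = {name, name.replace("-", "_"), name.replace("_", "-")}
    hits = []
    for dirpath, _dirs, files in os.walk(os.path.join(root, release)):
        for fname in files:
            # Module files are <name>.ko, optionally compressed (.ko.xz, .ko.zst).
            if ".ko" in fname and fname.split(".ko")[0] in wanted:
                hits.append(os.path.join(dirpath, fname))
    return sorted(hits)


if __name__ == "__main__":
    found = find_module("snd-aloop")
    print(found if found else "snd-aloop not installed for this kernel")
```

An empty result for the running kernel is the situation where installing the matching linux-modules-extra package is worth trying.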
FYI, John Mulcahy, the creator of REW, just released a new version of REW which understands the PCM_FORMAT_EXTENSION wav file formats generated by SoX.
This allows you to use REW's feature rich frequency plotting tools and controls (zoom/smoothing/overlays/colors/axis/gain/etc.). REW's plotting tools are much more feature rich than Audacity's.
Here is a capture of 4-way filters from SoX generated white noise samples with CamillaDSP configured to output to an 8-channel 192kHz S32LE file.
SoX was then used to split the 8-channel file into 8-mono wav files for individual inspection in REW's OVERLAY window.
Hope this is helpful to anyone wanting to inspect their filters for a good sanity check.
It also contrasts what the filters look like with different sample rates (e.g. comparing 16/44.1 versus 64/768 white noise sample files).
BTW, I used REW's level adjust to back-out the individual driver gains configured in the CamillaDSP config file to more accurately visualize the XO points.
SoX can generate both formats; the non-extensible wav requires the -t wavpcm specifier:
Code:
sox -r 192000 -c 1 -n -c8 -b 32 out.wav synth 5 sine 1k
soxi -V out.wav
soxi INFO formats: detected file format type `wav'
soxi INFO wav: EXTENSIBLE
Input File : 'out.wav'
Channels : 8
Sample Rate : 192000
Precision : 32-bit
Duration : 00:00:05.00 = 960000 samples ~ 375 CDDA sectors
File Size : 30.7M
Bit Rate : 49.2M
Sample Encoding: 32-bit Signed Integer PCM
vs.
Code:
sox -r 192000 -c 1 -n -c8 -b 32 -t wavpcm out-pcm.wav synth 5 sine 1k
soxi -V out-pcm.wav
soxi INFO formats: detected file format type `wav'
Input File : 'out-pcm.wav'
Channels : 8
Sample Rate : 192000
Precision : 32-bit
Duration : 00:00:05.00 = 960000 samples ~ 375 CDDA sectors
File Size : 30.7M
Bit Rate : 49.2M
Sample Encoding: 32-bit Signed Integer PCM
Historically the built-in Java converters could not read the extensible format.
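If soxi is not at hand, the two variants can also be told apart by reading the format tag in the wav header. A minimal Python sketch, assuming a well-formed RIFF/WAVE file; the function name is made up for illustration. The tag values are the standard ones: 0x0001 for PCM, 0x0003 for IEEE float and 0xFFFE for EXTENSIBLE.

```python
import struct

FORMAT_NAMES = {0x0001: "PCM", 0x0003: "IEEE float", 0xFFFE: "EXTENSIBLE"}


def wav_format_tag(data):
    """Return the wFormatTag from the 'fmt ' chunk of RIFF/WAVE header bytes."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            (tag,) = struct.unpack_from("<H", data, pos + 8)
            return tag
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no fmt chunk found")


def describe_wav(path):
    """Read the start of a file and name its format tag."""
    with open(path, "rb") as f:
        tag = wav_format_tag(f.read(4096))  # assumes fmt sits near the start
    return FORMAT_NAMES.get(tag, hex(tag))
```

Calling `describe_wav("out.wav")` and `describe_wav("out-pcm.wav")` on the two files generated above should name the EXTENSIBLE and PCM tags respectively, if SoX wrote them as soxi reports.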