Just to recap.
To mimic a brutefir setup running a 65536 tap Dirac test filter with 8 partitions for a left and right channel I'd configure this in the latest develop branch:
Code:
devices:
  samplerate: 44100
  chunksize: 8192
  ...
Code:
filters:
  r_fir:
    type: Conv
    parameters:
      type: Values
      length: 65536
  l_fir:
    type: Conv
    parameters:
      type: Values
      length: 65536
Is that it?
Thx.
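The relation between the brutefir partitioning and the chunksize above can be checked with a little arithmetic. This is only a sketch in Python, assuming the filter is split into chunksize-long segments; the helper names are made up for illustration.

```python
import math


def partitions(filter_length, chunksize):
    """Number of chunksize-long segments needed to hold the whole filter."""
    return math.ceil(filter_length / chunksize)


def chunk_duration_ms(chunksize, samplerate):
    """Duration of one chunk of audio in milliseconds."""
    return 1000.0 * chunksize / samplerate


# 65536 taps at a chunksize of 8192 gives the 8 partitions of the brutefir setup.
print(partitions(65536, 8192))
print(round(chunk_duration_ms(8192, 44100), 1))
```

With these numbers each chunk holds roughly 186 ms of audio at 44100 Hz.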
I used the obscure pcm_hook feature of ALSA. Compiling only against libasound2 (meaning not having to bring in the full alsa-lib sources), it allows 4 hooks involving PCM devices: open (really hook install), modify hw_params, hw_free, and close.
I use the hw_free call to send a stop signal to camilla so that camilla closes the loopback device and it can then be reopened with any parameters by the playback program. I use the hw_params hook to send camilla a new config with the updated parameters telling it to reopen the loopback with the now appropriate settings.
Sadly hook install is too late, in that the loopback device is already locked down if camilla has it open. I send stop to camilla here anyway, because it at least causes camilla to free the connection, so that the playback program only fails once if camilla was already reading from the loopback. (This is when I ran into the "double stop breaks camilla" issue.)
The annoying thing is that I can't figure out how to get the new hw_params inside the hw_params hook function. The actual ALSA callback sends them along but the hook code stashes them in a private field that I don't know how to access without compiling against the full alsa-lib code. libasound2 has a bunch of declared but undefined structures that are defined internally to the alsa-lib code but not published in libasound2. Someone better with ALSA than me might know how to get them (I've asked on alsa-devel but no solutions yet.) Fortunately the parameters are available from the proc file system.
So what I really do is call an external program for hook creation and hw_free that does the websocket stop command and another one that parses the proc settings and configures camilla however it wants.
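A minimal sketch of the "parse the proc settings" step, in Python. The proc path is an assumption (first playback substream of a card named Loopback) and the function names are made up; the hook's real helper programs may look quite different. The sample text follows the usual `key: value` layout of an ALSA hw_params proc file, which reads just `closed` when the device is not open.

```python
from pathlib import Path

# Assumed location; adjust card, device and substream to the actual setup.
HW_PARAMS = Path("/proc/asound/Loopback/pcm0p/sub0/hw_params")


def parse_hw_params(text):
    """Return the 'key: value' lines as a dict, or None if the device is closed."""
    if text.strip() == "closed":
        return None
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def sample_rate(params):
    """The rate line looks like '44100 (44100/1)'; keep the leading integer."""
    return int(params["rate"].split()[0])


# Example content, used here instead of reading HW_PARAMS so the sketch runs anywhere.
SAMPLE = """access: RW_INTERLEAVED
format: S32_LE
subformat: STD
channels: 2
rate: 44100 (44100/1)
period_size: 1024
buffer_size: 16384
"""

params = parse_hw_params(SAMPLE)
print(params["format"], params["channels"], sample_rate(params))
```

On a real system the text would come from `HW_PARAMS.read_text()`, and the resulting rate, format and channel count could then be written into the config that is sent to camilla.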
I actually think the external callback is nice as people might have very different things they want to do with camilla at different sample rates. Different FIR taps being the obvious one, but maybe they have a selection of them for different house curves, etc.
The two commands could of course be made one with an argument. I think it's even possible to pass in the scripts you want run in the asound.conf/.asoundrc files but I didn't implement that.
In terms of being more robust, it would be helpful if camilla, rather than just dying when the loopback is opened with a rate that the output hardware doesn't support, returned an error code (hopefully matching ALSA's numbering scheme) that the hook could pass back to the playback program, while remaining running and ready to accept a new request. One problem with the loopback is that it doesn't reflect the actual hardware's capabilities.
RPi?
What about CamillaDSP and RPi4 (or another HW) as USB soundcard? I gave up on the notebook (pulseaudio sees only 2ch) and I'm still struggling with my PC. Camilla does not work on my daily Ubuntu 18.04 and I am too scared to upgrade to 20.04 (PulseAudio Crossover Rack does not work, there are differences in the debug log, but not enough to make a bug report) or Fedora 32 (alsa plays only 2 channels on ATI Ellesmere).
Is the RPi4 powerful enough for a 3 way FIR and EQ?
Alright! Yes, I think having it separate like this is the way to go. There is no reasonable way to include something like this in CamillaDSP without causing tons of issues for every other use case.
This whole thing would certainly become easier if somebody fixed the broken pcm_notify feature of the loopback. But there doesn't seem to be anything happening with that unfortunately.
If you use the wait flag, -w, then CamillaDSP will stay running and waiting for a new command if opening the playback device fails. I'll have to think a bit about how the error code could be made available.
Is your code available somewhere? It would be fun to take a look.
Is it the old pulseaudio of 18.04 that is stopping it from working? Do you absolutely need pulse? In my main system I simply let pulse output to an alsa loopback and then I capture from there. Then the pulse backend of CamillaDSP isn't needed.
The pi4 is pretty good, it can handle quite long filters at high sample rates. How long are the filters you use, and at what rate?
There is something strange going on with the latest pulseaudio. I can capture audio from pulse on my fedora 32 laptop, but with very high cpu usage. This wasn't a problem before.
Looks correct!
Didn't work as I suggested.
When checked with --check, an error about the missing "values" was issued.
Below worked:
Code:
l_fir:
  type: Conv
  parameters:
    type: Values
    values: [ 0.0 ]
    length: 65536
It seems the values field is still required, or the check function still expects it.
Oops sorry, the values field is still required. You can leave out the entire parameters: block to make a dirac spike, but that of course means you can't specify the length. I'll make the values field optional as well.
If you put "values: [ 1.0 ]" instead it will make a dirac, [1.0, 0.0, 0.0 ..... 0.0] of the length you want. Now you just have zeros (which works fine for just testing cpu load, but not very useful if you want to look at the output).
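To illustrate the difference between the two choices of values, here is a small Python sketch. It is not CamillaDSP code: `pad_values` and `convolve` are made-up helpers, the convolution is the plain direct form, and a short length of 8 is used instead of 65536 to keep it quick.

```python
def pad_values(values, length):
    """Append zeros to `values` until it is `length` samples long."""
    return list(values) + [0.0] * (length - len(values))


def convolve(signal, taps):
    """Direct-form FIR convolution, truncated to the length of the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(taps))):
            acc += taps[k] * signal[n - k]
        out.append(acc)
    return out


signal = [0.5, -0.25, 1.0, 0.0, 0.75]
dirac = pad_values([1.0], 8)  # [1.0, 0.0, ..., 0.0]
zeros = pad_values([0.0], 8)  # all zeros

print(convolve(signal, dirac))  # the input comes out unchanged
print(convolve(signal, zeros))  # silence
```

The dirac passes the signal through untouched, while the all-zero filter costs the same cpu time but outputs only silence.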
So what you're saying is that the function as implemented wouldn't act like a "transparent" filter?!
Meaning - no audio playback is possible?
I now fired up the DSP for the first time.
It immediately locked up the CPU at 100%. (brutefir runs with dirac idle at 2-3%.)
The PI4 runs at 1500 MHz on PIOS64.
For the test I pipe squeezelite >> CDSP >> aplay.
systemd:
Code:
ExecStart=/bin/sh -c "/usr/local/bin/squeezelite -n slcdsp -b 20000:20000 -a 32 -o - | /usr/local/bin/camilladsp /etc/camilladsp/configs/config.yml | /usr/bin/aplay --quiet -D hw:0,0 -r 44100 -f S32_LE -c 2"
It's basically the same systemd setup as brutefir.
My config:
Code:
---
devices:
  samplerate: 44100
  chunksize: 8192
  silence_threshold: -60
  silence_timeout: 3.0
  capture:
    type: Stdin
    channels: 2
    format: S32LE
  playback:
    type: Stdout
    channels: 2
    format: S32LE
filters:
  r_fir:
    type: Conv
    parameters:
      type: Values
      values: [ 1.0 ]
      length: 65536
  l_fir:
    type: Conv
    parameters:
      type: Values
      values: [ 1.0 ]
      length: 65536
mixers:
  mono:
    channels:
      in: 2
      out: 2
    mapping:
      - dest: 0
        sources:
          - channel: 0
            gain: -6
            inverted: false
          - channel: 1
            gain: -6
            inverted: false
      - dest: 1
        sources:
          - channel: 0
            gain: -6
            inverted: false
          - channel: 1
            gain: -6
            inverted: false
pipeline:
  - type: Mixer
    name: mono
  - type: Filter
    channel: 0
    names:
      - r_fir
  - type: Filter
    channel: 1
    names:
      - l_fir
What am I doing wrong?
Add this to the devices section:
Code:
queuelimit: 1
The 100% cpu happens when you have a source that will feed data at an unlimited rate, and a playback device with a limited rate. Then CamillaDSP will read as fast as it can until it has filled its internal queues. At that point it settles down and you should see a low cpu usage. Setting queuelimit to 1 makes the max internal queue length 1 element, so the queues get filled right away.
To make a transparent filter you give "values: [1.0]" and "length: 65536". That takes all the values from "values" and adds zeroes after to make it 65536 elements long in total. If you give "values: [0.0]", then the first element also becomes zero, so the whole vector becomes just 65536 zeroes.
If you leave out the whole parameters block, it generates a filter that is a single one, followed by chunksize-1 zeroes.
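The effect of queuelimit described in this post can be modelled with a bounded queue. This is only a toy sketch in Python, not CamillaDSP's implementation: `fill_until_full` is a made-up helper that counts how many chunks an unthrottled source can push before the queue refuses more, and the limit of 100 is an arbitrary number chosen for comparison.

```python
import queue


def fill_until_full(queuelimit):
    """Push chunks as fast as possible and count how many fit before the queue is full."""
    q = queue.Queue(maxsize=queuelimit)
    accepted = 0
    while True:
        try:
            q.put_nowait(b"chunk")
        except queue.Full:
            # From here on a blocking put would simply wait for the consumer.
            return accepted
        accepted += 1


print(fill_until_full(100))  # many chunks are read before the source has to wait
print(fill_until_full(1))    # the source has to wait after the first chunk
```

With a limit of 1 the fast source is held back almost immediately, which is why the initial burst of cpu usage should be over right away.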
Code:
---
devices:
  samplerate: 44100
  chunksize: 8192
  silence_threshold: -60
  silence_timeout: 3.0
  queuelimit: 1
  capture:
    type: Stdin
    channels: 2
    format: S32LE
  playback:
    type: Stdout
    channels: 2
    format: S32LE
It's still at 100% max and keeps squeezelite @ around 32%
That's odd. Does it stay at 100% indefinitely, and does the memory usage creep up? Is the sound you get out ok?
I'll try to reproduce this tonight.
I just figured something out. Playback works. 😀
However.
There's more.
If idle, CDSP goes through the roof starting at 106% then goes down to around 100% after a couple of seconds. And it stays that way.
As soon as I start playback it drops to 3%. Not bad.
As soon as I stop playback it goes up again to around 100%.
There's something wrong if no signal is present on stdin.
Ok! Then I think I know what's going on, if I'm right it's easy to fix. Will take a look tonight.
Does it do the same if you use the File device and point it at /dev/stdin?
Code:
---
devices:
  samplerate: 44100
  chunksize: 8192
  silence_threshold: -60
  silence_timeout: 3.0
  queuelimit: 1
  capture:
    type: File
    filename: /dev/stdin
    channels: 2
    format: S32LE
  playback:
    type: File
    filename: /dev/stdout
    channels: 2
    format: S32LE
Same situation.
I also just noticed a kind of scrambled mess (distortion) for half a second after starting the service and starting playback for the first time. After that all is OK. The next tracks run without issues.
With no camilla specific open/close scripts, the code is available here:
GitHub - scripple/alsa_hook_lbparams: ALSA pcm hook to let the capture end of the loopback device connect/disconnect with changing hw_params
I'd like to share some early benchmarks I just did.
I've been testing
* the RUSTFFT performance vs. FFTW3
* CamillaDSP vs. Brutefir
* with and without NEON/march optimizations
I've been offline convolving from file to file stored on tmpfs using 2^16 taps.
Code:
Performance Test
RPI4 4GB @1500MHz
PIOS64
SSD as boot and root device
camilladsp develop branch - Oct-07-2020
Testfile: .wav 44100/16bit - 00:03:06.16
TC
1. camilladsp-fftw $CONFIG
2. camilladsp $CONFIG
3. brutefir -nodefault $CONFIG_BF
4. TC1 without Rust opt flags (supposedly enabling NEON)
5. TC2 without Rust opt flags (supposedly enabling NEON)
##PI4OPTS-NEON-FFTW3
real 0m8,176s
user 0m9,272s
sys 0m0,692s
##PI4OPTS-NEON-RUSTFFT
real 0m8,787s
user 0m9,815s
sys 0m0,712s
>> +7.5%
##Brutefir
real 0m7,297s
user 0m0,114s
sys 0m0,382s
>> -12.2%
TC4/TC5 - Test w/o RUSTFLAGS ( Neon and march=native) did not show any differences-
What I found surprising is that there seems no change with NEON in or out.
And 2nd that brutefir was still quite a bit faster on the job.
RUSTFFT seems to be about 7.5% slower than FFTW. That's not surprising
since Henrik mentioned a slight advantage of FFTW over RustFFT earlier.
The Q that IMO remains - What's going on with NEON?
Enjoy.
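For anyone who wants to redo the percentage for the RustFFT vs. FFTW comparison, here is a small Python sketch working from the "real" times in the listing. The helper names are made up, and taking the FFTW run as the baseline is an assumption.

```python
def seconds(time_str):
    """Convert a `time` value such as '0m8,176s' (comma as decimal mark) to seconds."""
    minutes, rest = time_str.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest.replace(",", "."))


def percent_diff(value, baseline):
    """Relative difference of `value` against `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


fftw = seconds("0m8,176s")
rustfft = seconds("0m8,787s")
print(f"{percent_diff(rustfft, fftw):+.1f}%")
```

The same two helpers can be applied to any other pair of timings, as long as it is clear which run is taken as the baseline.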
The differences between RustFFT and FFTW are as expected. Same goes for CamillaDSP vs brutefir. I'm getting closer and closer to brutefir in speed, but I will probably never beat it.
For this test the majority of cpu time is used by the FFT/iFFT. It seems like RustFFT doesn't have any loops that the compiler manages to vectorize. And then there would be no speed advantage with neon. There is ongoing work to support AVX in RustFFT, but nothing for NEON yet. Proper support for NEON is very new in Rust so I expect things to change in this area.
When FFTW is used there shouldn't be any difference at all, since it's compiled by another compiler that doesn't care about the rustflags.
I think you would see a difference if you benchmark with a config that uses the asynchronous resampler. That one uses a lot of "simple" loops. IIRC this is where I saw some improvement (but still not a lot).
I started looking into how to properly use SIMD in the resampler, but it's a lot of work since a lot of code has to be rewritten as separate versions for SSE, AVX, NEON etc.
Should be fixed now. Can you try this again with the new version in develop?