CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc

phofman · 2020-06-21 1:53 pm

wealas said:
it seems rockchip is working on it directly: usb: dwc2: hcd: fix isoc out transfer with unaligned dma address * rockchip-linux/kernel@c4af28d * GitHub

While the author of that commit contributes to mainline kernel too, this particular commit (quite recent - from April) is not in the mainline - the introduced method dwc2_alloc_qh_dma_aligned_buf is nowhere to find in the mainline repo.

That means there is dwc2 driver in the mainline, supported by Synopsys guys in linux-usb mailinglist, and a different dwc2 driver in the rockchip repo, still kept at android version of 4.4. The latest patched kernel by GitHub - heiher/linux at nanopi-m4 does not contain that patch either. Yet the patch fixes quite an important issue - playback of quite common 24bits 96KHz 3LE. I believe such issue has been fixed long time ago in the mainline, using a different patch. And this is not even gadget, just the regular USB host. Gadget code has been way behind, very likely working very poor in 4.4, if at all.

I am sure such device will work great for mainstream tasks which have been widely tested for years. But IMO it is not suitable for novelty tasks using or even advancing the latest kernel code where yet unresolved issues are to be expected and communication with know-how holders for that particular hardware is necessary.

wealas · 2020-06-21 3:25 pm

I suspect the reason is that there is no interest in the non-basic gadget functionality.
Before you started working on the USB audio gadget, I think you said the code was pretty old and mostly abandoned. It looks to me like you showed interest and things started moving.
It also looks to me like you have a rockchip resource that is active and has the hardware knowledge. You also have the DWC module developers that can handle the software side and maintain the associated mainline code.
You also have a platform that is very popular for multimedia tasks, much more so than the RPI. And it supports 8ch I2S streams out of the box. It is also supported by Armbian, which I consider much better option than Raspbian.
The problem with the RPI is Broadcom. They don't publish full datasheets. So when you need something hardware related that was not part of the initial RPI design you're out of luck. That's why the RPI doesn't support slave i2s mode to this day. If you dig a bit you will find there was one ex-broadcom guy that managed to supply some register info for a bit but that was it. As soon as the rpi4 gets stable kernel support, all the broadcom support will dry out.

phofman · 2020-06-21 6:46 pm

wealas said:
That's why the RPI doesn't support slave i2s mode to this day. If you dig a bit you will find there was one ex-broadcom guy that managed to supply some register info for a bit but that was it.

Hm, the PCM/I2S interface of the Broadcom SoC is relatively well described in the public specification https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/rpi_DATA_2711_1p0.pdf page 138+ - register MODE_A, bits CLKM and FSM define the master/slave operation. A number of drivers/DACs configure the PCM interface to slave and provide their own BCLK/LRCLK - e.g. linux/hifiberry_dacplus.c at rpi-5.7.y * raspberrypi/linux * GitHub - setting alsa ASoC flag SND_SOC_DAIFMT_CBM_CFM which is used in linux/bcm2835-i2s.c at rpi-5.7.y * raspberrypi/linux * GitHub -> linux/bcm2835-i2s.c at rpi-5.7.y * raspberrypi/linux * GitHub

wealas · 2020-06-21 7:21 pm

Oops, sorry, I meant external MCLK (for the PCM_MCLK i2s clock in the datasheet) via GPIO.
In any case, this is way OT and to be fair, you are the one doing all the work on the audio gadget so you pick whatever platform you want.
I'm trying to move away from USB altogether and start using Ravenna ASAP.

phofman · 2020-06-21 7:38 pm

IIUC PCM_MCLK is internal bitclock for the master mode. It is replaced by external bitclock (i.e. PCM_CLK) in the slave mode.

In clock master mode (CLKM=0), the PCM_CLK is an output and is driven from the PCM_MCLK clock
input.
In clock slave mode (CLKM=1), the PCM_CLK is an input, supplied by some external clock source

RPi does not generate MCLK for the DAC, MCLK is not part of I2S. Many DACs generate their own MCLK via PLL from the BCLK.

phofman · 2020-06-21 8:27 pm

The PCM_MCLK is certainly the bclk because the driver sets bclk_rate to the assigned clock device (defined by upper layers) in master mode linux/bcm2835-i2s.c at rpi-5.7.y * raspberrypi/linux * GitHub .

HenrikEnquist · 2020-06-24 9:59 pm

HenrikEnquist said:
Quick test on my raspberry pi 4 with the same pipeline. Well not so quick, compiling on the Pi takes forever, especially with FFTW...

I had to go down to 96kHz since 192k was a bit much.

I'm running with resampling 44.1 -> 96kHz, chunksize 4096, 8 FIR filters per channel of 65k taps each.

RealFFT: 62%
FFTW: 58%

Pushing it a bit more:
Resampling 44.1 -> 192kHz, chunksize 4096, 6 FIR filters per channel of 65k taps each.

RealFFT: 95%

FFTW: 87%

I think most people would run much less than 6 x 65k taps, so I think it's safe to say the Pi4 runs just fine at 192kHz with resampling enabled.

I have been working on making the code more efficient. If I repeat the last two tests I did above on the Pi, I now get:
RealFFT: 82% (down from 95%)
FFTW: 78% (down from 87%)

Considering that most of the CPU-time is spent on FFT and IFFT (which are the same as before) I'm quite happy with this 🙂 This is in branch "split", which will be merged to develop as soon as I have tested it a little more.

voodooless · 2020-06-25 2:05 pm

One could bring down CPU time by using something like Nvidia Jetson Nano and use cuFFT. It's a drop-in(-ish) replacement for FFTW. It might give you upto 472 GFLOPS of FIR filtering goodness 😛

HenrikEnquist · 2020-06-25 2:41 pm

4real said:
One could bring down CPU time by using something like Nvidia Jetson Nano and use cuFFT. It's a drop-in(-ish) replacement for FFTW. It might give you upto 472 GFLOPS of FIR filtering goodness 😛

I have been playing with the idea, but with clFFT instead of cuFFT. There doesn't seem to be any good cuFFT binding for rust, but there is a clFFT one that looks good. It's a fun experiment and I will probably try it at some point.

But I don't expect to get anywhere near the maximum speed. I think the short(ish) 1D data used here is so small that FFT on the CPU is pretty fast, and doing it on a GPU means there is a large overhead of moving data to and from the GPU. It probably only makes sense for 2D or 3D data where the actual FFTing takes much longer than moving the data around.

voodooless · 2020-06-25 2:53 pm

HenrikEnquist said:
I have been playing with the idea, but with clFFT instead of cuFFT. There doesn't seem to be any good cuFFT binding for rust, but there is a clFFT one that looks good. It's a fun experiment and I will probably try it at some point.

Would be great if finally there would be proper support for the Pi 4 GPU.. Edit: and there is somebody at least working on a cufft rust wrapper: GitHub - mokus0/cufft.rs: Rust wrapper for cufft library

But I don't expect to get anywhere near the maximum speed. I think the short(ish) 1D data used here is so small that FFT on the CPU is pretty fast, and doing it on a GPU means there is a large overhead of moving data to and from the GPU. It probably only makes sense for 2D or 3D data where the actual FFTing takes much longer than moving the data around.

The great thing is that you might not need to copy anything: Programming Guide :: CUDA Toolkit Documentation. I have no idea however how the Rust memory model deals with that (my Rust is.. well, you can probably guess.. a bit.. :frosty:

).. However even so, having a larger dataset to FFT will bring much more performance. So if latency is not an issue, it could still work.. Or just crank up the sampling rate to 192 kHz or even 384 kHz 😀

In any case I really like what you have done so far. I did not have a chance to test it out yet. I might try compiling on a Mac first. I guess it will work if I disable ALSA (I do have Pulseaudio running). Would be really nice to have a fancy web interface to the config and make real time adjustments. Your simple web API might make this not so hard to do.

HenrikEnquist · 2020-06-25 9:14 pm

4real said:
Would be great if finally there would be proper support for the Pi 4 GPU.. Edit: and there is somebody at least working on a cufft rust wrapper: GitHub - mokus0/cufft.rs: Rust wrapper for cufft library

The great thing is that you might not need to copy anything: Programming Guide :: CUDA Toolkit Documentation. I have no idea however how the Rust memory model deals with that (my Rust is.. well, you can probably guess.. a bit.. ).. However even so, having a larger dataset to FFT will bring much more performance. So if latency is not an issue, it could still work.. Or just crank up the sampling rate to 192 kHz or even 384 kHz 😀

In any case I really like what you have done so far. I did not have a chance to test it out yet. I might try compiling on a Mac first. I guess it will work if I disable ALSA (I do have Pulseaudio running). Would be really nice to have a fancy web interface to the config and make real time adjustments. Your simple web API might make this not so hard to do.

Yeah I saw that cuFFT binding. No readme, and not even a single comment in any of the files.. I think I'll wait for something more ready to show up 🙂

When using OpenCL I think the proper way of doing it is to move the entire convolution to the GPU. I mean the whole process of FFT, multiply with all filter segments and accumulate result and then IFFT. And then fetch the data. I might take a look at this at some point, but it's not so high on the prio list.

I would be great if you could try compiling and running on a mac! You will of course need to disable the Alsa backend, but the rest should in theory be fine. But afaik nobody has tried it.

HenrikEnquist · 2020-06-25 10:22 pm

I merged the new version to master now. This means master is at v0.2.2 and includes all the latest goodies.

voodooless · 2020-06-26 8:57 am

HenrikEnquist said:
Yeah I saw that cuFFT binding. No readme, and not even a single comment in any of the files.. I think I'll wait for something more ready to show up 🙂

Yah, it's a long shot.. However it should actually only be a change in header file to get the basics working vs FFTW.

When using OpenCL I think the proper way of doing it is to move the entire convolution to the GPU. I mean the whole process of FFT, multiply with all filter segments and accumulate result and then IFFT. And then fetch the data. I might take a look at this at some point, but it's not so high on the prio list.

Well, it might be done when OpenCL comes to the Pi 4B 😉 There is another option however that would work on the Pi right now (as well on other GPU's): You could use the OpenGL ES shaders. Some people already did some of that with some success. When it comes to GPU power however, the Nvidia platform is miles ahead: the Pi 4 is estimated to give you about 32 GFLOPS, where Nvidia has almost 15x more raw power. BTW, both OpenCL as well as OpenGL Shaders know the concept of shared memory, so a copy of the data might not be needed in both cases.

I would be great if you could try compiling and running on a mac! You will of course need to disable the Alsa backend, but the rest should in theory be fine. But afaik nobody has tried it.

Will do so :cheers:

voodooless · 2020-06-26 9:36 am

4real said:
Will do so

It compiles and runs as far as I can see. Haven't figured out Pulseaudio on MacOS yet, so have no idea how that works yet. Will spend some more time later. You might want to exclude alsa as a default if the platform is MacOS, makes building a bit simpler 😉

HenrikEnquist · 2020-06-26 9:32 pm

4real said:
It compiles and runs as far as I can see. Haven't figured out Pulseaudio on MacOS yet, so have no idea how that works yet. Will spend some more time later. You might want to exclude alsa as a default if the platform is MacOS, makes building a bit simpler 😉

Great! Thanks for testing this. I have made a fix to disable Alsa on macos, it's in branch "mac". This doesn't seem to cause any trouble in linux but I haven't tried it on a mac. Could you give it a try?

What did you need to install in order to compile? It would be nice to add that to the readme 🙂

voodooless · 2020-06-26 9:41 pm

HenrikEnquist said:
Great! Thanks for testing this. I have made a fix to disable Alsa on macos, it's in branch "mac". This doesn't seem to cause any trouble in linux but I haven't tried it on a mac. Could you give it a try?

Yes, I’ll try.

What did you need to install in order to compile? It would be nice to add that to the readme 🙂

Basically nothing.. I already installed pulse, pkg-config and fftw via brew. Just installed Rust per your instructions, disabled alsa (and enabled fftw) and it builds.

TNT · 2020-06-29 10:22 am

4real, would you share your osx app?

//

voodooless · 2020-06-29 10:36 am

TNT said:
4real, would you share your osx app?

//

Well, it's not really an app. Best to build it yourself. That way you'll also get a native build. I can however give you a short installation instruction.

First of, install a decent terminal program. I like iTerm2, but there are others. You can also use the already installed Terminal.app if you like.

Next, make sure you have brew installed. Once installed install the following packages:

Code:

brew install fftw
brew install pkg-config
brew install pulseaudio

All that is left then is to follow the nice guide of how to build camilladsp. It also tells you how to install the rust environment. The only difference I made when building is to specify the correct build options:

Code:

RUSTFLAGS='-C target-cpu=native' cargo build --release --no-default-features --features pulse-backend --features socketserver --features FFTW

BTW, the "mac" branch does not seem to build:

Code:

 (/Users/christiaan/Downloads/camilladsp-mac)
error[E0463]: can't find crate for `clap`
 --> src/lib.rs:3:1
  |
3 | extern crate clap;
  | ^^^^^^^^^^^^^^^^^^ can't find crate

error: aborting due to previous error

For more information about this error, try `rustc --explain E0463`.
error: could not compile `camilladsp`.

To learn more, run the command again with --verbose.

TNT · 2020-06-29 11:08 am

Thanks!!

I dont have the development environment installed... maybe I should...

//

HenrikEnquist · 2020-06-29 11:12 pm

4real said:
Well, it's not really an app. Best to build it yourself. That way you'll also get a native build. I can however give you a short installation instruction.

First of, install a decent terminal program. I like iTerm2, but there are others. You can also use the already installed Terminal.app if you like.

Next, make sure you have brew installed. Once installed install the following packages:

Code:

brew install fftw brew install pkg-config brew install pulseaudio

All that is left then is to follow the nice guide of how to build camilladsp. It also tells you how to install the rust environment. The only difference I made when building is to specify the correct build options:

Code:

RUSTFLAGS='-C target-cpu=native' cargo build --release --no-default-features --features pulse-backend --features socketserver --features FFTW

BTW, the "mac" branch does not seem to build:

Code:

(/Users/christiaan/Downloads/camilladsp-mac) error[E0463]: can't find crate for `clap` --> src/lib.rs:3:1 | 3 | extern crate clap; | ^^^^^^^^^^^^^^^^^^ can't find crate error: aborting due to previous error For more information about this error, try `rustc --explain E0463`. error: could not compile `camilladsp`. To learn more, run the command again with --verbose.

Thanks! I'll add something about building on a mac in the readme.

I noticed the "mac" branch doesn't work. There is a new branch called "windowsbuild" which seems to build fine on both macos and windows. (yes I managed to find a mac to try on 🙂)
And I started implementing a backend based on the cpal audio library cpal - Rust which supports many backends. I will try to use it to add support for both Wasapi and CoreAudio.

Search

Amplifiers

Source & Line

Loudspeakers

Design & Build

General Interest

Live Sound

Member Areas

Site

Featured Vendors

Members Market

Vendors Market

Vendors

Search

CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc

phofman

wealas

phofman

wealas

phofman

phofman

HenrikEnquist

voodooless

HenrikEnquist

voodooless

HenrikEnquist

HenrikEnquist

voodooless

voodooless

HenrikEnquist

voodooless

TNT

voodooless

TNT

HenrikEnquist