FIR-LADSPA: A LADSPA plugin for FIR filtering

Several weeks ago I started to write code for implementing FIR filters under LADSPA, and mentioned it HERE.

LADSPA plugins were never intended to do intensive processing, or to have highly variable CPU loads between calls to the plugin by the LADSPA host. This means that implementing an FIR convolution inside the plugin itself is probably not advisable, or even possible. So I have taken a different approach:

The FIR convolution is performed in a separate process that is entirely independent of, and is running asynchronously from, the LADSPA plugin and host. I call it an "FIR engine". This separate process is launched using an operating system call as part of the LADSPA plugin setup process. Parameters and other info are passed from the plugin to the FIR engine using a "control" FIFO, and other FIFOs are used to pass unprocessed and processed data.

Performing the FIR convolution in a separate process has several advantages. The LADSPA plugin, once up and running, only needs to put/get data into/out of the FIFOs and this keeps it very computationally lightweight. The FIR convolution can only be performed when the desired length of data has been obtained via LADSPA. Because of the real-time nature of the audio processing, this takes Ndata/sample_rate seconds. For example, at 48 kHz if you process 16384 data points per call it will require the FIR engine to collect incoming data for about 0.34 seconds before it can perform one convolution. During that time, the LADSPA plugin will have been called 16 times (given a typical frame of 1024 samples). The convolution can therefore take as long as 0.3 seconds or so, by which time the next 16384 samples have been collected and need to be processed. This can be helpful when using low powered Linux hardware such as a Raspberry Pi. Additionally, the OS is free to schedule the FIR engine process around other processes, and there is no need to run it at a high priority, etc.

As part of the setup of the plugin and FIR engine, a uniquely named directory is created in the tmpfs (files in memory) that comes with all Linux distributions. The FIR engine collects the mean and longest "cycle time", that is, the time to process the data and return it to the LADSPA plugin, and writes them to a file in this directory. The cycle time latency can be obtained in a test run for a given platform and FIR filter set, and is then supplied to the LADSPA plugin as a parameter during normal use. This latency timing info is used to set up internal buffers so that underruns are prevented.
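
Tracking the mean and longest cycle time needs very little state. The sketch below is an assumption about how such bookkeeping could look, not the FIR engine's actual code; the struct and member names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical running statistics for the "cycle time" of the FIR engine.
struct CycleStats {
    std::size_t count = 0;   // number of cycles recorded so far
    double total_ms = 0.0;   // sum of all cycle times
    double longest_ms = 0.0; // worst case seen so far

    // Record one measured cycle time in milliseconds.
    void record(double cycle_ms) {
        ++count;
        total_ms += cycle_ms;
        longest_ms = std::max(longest_ms, cycle_ms);
    }

    // Mean cycle time, or 0 if nothing has been recorded yet.
    double mean_ms() const {
        return count ? total_ms / static_cast<double>(count) : 0.0;
    }
};
```

The longest value is the one that matters for sizing the plugin's buffers, since a single slow cycle is enough to cause an underrun.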

Because the FIR engine is a separate and independent process and because some LADSPA hosts do not correctly tear down the plugin (by calling deactivate, etc.) I have written the FIR engine to self-terminate and cleanup after itself. This includes deleting the tmpfs directory in which it was operating, and the FIFOs. Once this process is completed there is no sign that the FIR engine was ever there. A pair of error logs are written to the tmpfs and not deleted, however, on reboot the tmpfs filespace is wiped clean. The user can manually delete the error log files anytime. Since these only are used to record fatal error messages, it is not likely they will grow to any appreciable size.

I am currently coding up the FIR convolution using FFTW but have everything else functioning well using a dummy convolution function that simply passes input to output in the FIR engine. I hope to get a test version fully up and running in a week or so and will post updates as I have them.
 
Charlie, congrats. I like your chained LADSPA approach where you do not have to use the loopbacks.

It looks like Linux has gotten several top-quality filter solutions recently, each for a different scenario. I wish they made it all the way to stock Ubuntu packages...
 
Thanks! Yes I think this is going to work well.

One interesting aspect is that ANY process can write to the control FIFO for the FIR engine. The only command that is currently supported is one that causes a filter change, i.e. a new filter is loaded and applied to the audio stream without tearing down. This is handled by re-processing the previous set of data to create a new overlap-add data set. As long as the computing horsepower permits, this can allow for seamless transitions from filter to filter. The new filter must be able to operate with all the other parameters unchanged, that is to say that it will use the sample rate, data length, and FFT length from the previous filter(s). Currently, if these parameters need to be changed, the chain must be broken down and restarted. A filter with fewer taps can always be zero padded up to a standard size, so that all filters that one wishes to switch between use this same filter length. The idea is to make it possible to A/B filters, or compare a filter to one that has the same latency but does not result in any changes to amplitude or phase. It would be great for "can you hear it" ABX testing.
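
To illustrate the zero padding idea, here is a minimal sketch that assumes a filter is simply a list of taps held in memory. The function names are hypothetical and are not part of the plugin. The pass-through filter is a unit impulse padded to the same length, which leaves amplitude and phase untouched while going through the same processing chain.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Pad a shorter filter with trailing zeros up to the standard length shared
// by all filters that will be switched between.
std::vector<float> pad_taps(const std::vector<float>& taps,
                            std::size_t standard_length) {
    if (taps.size() > standard_length) {
        throw std::invalid_argument("filter is longer than the standard length");
    }
    std::vector<float> padded(taps);
    padded.resize(standard_length, 0.0f);  // appended taps are zero
    return padded;
}

// Reference filter for A/B comparison: a unit impulse at tap 0.
std::vector<float> passthrough_taps(std::size_t standard_length) {
    return pad_taps({1.0f}, standard_length);
}
```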
 
Is this a trick question or something? No, not from a mere picture with a "black box" representing some "sound card in". Details matter.

BTW, what does that have to do with this thread?

I clicked on the links in your signature, read the pages, and finally clicked on what you described as "strongly recommend reading this excellent".

Sorry and great work :D
 
Nice! This will be fun to look at once it's complete.
The FIFOs between the LADSPA part and the FIR engine, are they Unix pipes, or something else?
Will you support partitioning the impulse response for shorter latency?

Yes, the FIFOs are simple named pipes in the OS. I really like their flexibility, their ability to buffer, and they seem fast enough and have low enough latency for audio applications.
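
For anyone who has not used named pipes, the sketch below shows the basic POSIX calls involved. It is not taken from the plugin; the helper names and the idea of passing the path in as a string are assumptions for illustration.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <string>

// Create a named pipe at `path`, readable and writable by the owner only.
// Returns true on success.
bool make_fifo(const std::string& path) {
    return mkfifo(path.c_str(), 0600) == 0;
}

// Open the read end without blocking, so open() returns even when no
// writer has connected yet. Returns a file descriptor, or -1 on error.
int open_fifo_reader(const std::string& path) {
    return open(path.c_str(), O_RDONLY | O_NONBLOCK);
}

// Open the write end. This blocks until some process has the read end open.
int open_fifo_writer(const std::string& path) {
    return open(path.c_str(), O_WRONLY);
}
```

Once both ends are open, samples can be moved with ordinary write() and read() calls on the descriptors, and the kernel does the buffering in between.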

I would love to implement a partitioned convolution but I do not have the expertise to code one up myself. If any one can help with that, or can point me to some open source code that doesn't require too much overhead in terms of libraries and so on, please let me know.
 
Take a look here. It's Rust but shouldn't be too hard to translate to C.
camilladsp/fftconv.rs at develop * HEnquist/camilladsp * GitHub
Setting things up starts at row 30, filtering at row 87.

I'm not using fftw, but I guess that one works the same way as RustFFT.
 
I'm already working up the FFT/iFFT under FFTW in C/C++ and that is coming along nicely. It seems that your code is just doing that under Rust, and not using a partitioned convolution - is that correct?
 
No, check again! There is an extra loop that loops over the partitions (which I call segments). Check lines 106 to 111 in the develop branch. Compare develop and master to see the changes I made for partitioning.

Here is a nice description of the process:
fourier transform - How does minimum-latency partitioned convolution reverb work when you receive input samples in chunks, rather than one at a time? - Signal Processing Stack Exchange

Nice. I'm very impressed. Thank you very much for sharing your code freely! If I were more familiar with Rust and everything about partitioned convolution I would try to port it to C/C++. For now I think I will just try to get the LADSPA plugin up and running and see how it does.

I have an idea on how to do only 1 forward transform (of the audio data) and use it for N filtering processes. This would save N-1 forward transforms each time.

There will still be the usual and expected latency of FFT. Once I get everything up and running I can re-evaluate my options in terms of the FFT and iFFT calculations.
 
For now I think I will just try to get the LADSPA plugin up and running and see how it does.
That is definitely a good idea. Once it's working you can extend it. I would recommend writing some automated tests that you can run to check that your convolver gives the right output. That's a quick and easy way to check that any change you make doesn't break anything. For inspiration you can look at the tests at the bottom of my convolver source file. I try to run them every time before I push a new version to GitHub, and then there is a GitHub Action that runs them automatically for every push.
 
UPDATE:

I have implemented FFTW based convolution in the code and gotten everything to compile. I hope to move on to testing soon.

As part of getting set up for FFTW I wanted to make sure that the DFT length would NOT result in the slowest FFTW calculation method being used. According to the FFTW docs:
FFTW is best at handling sizes of the form 2^a * 3^b * 5^c * 7^d * 11^e * 13^f, where e+f is either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose algorithm (which nevertheless retains O(n log n) performance even for prime sizes).
I have chosen e = f = 0. In addition, I wanted to make sure that the user-supplied DFT length was a product of only the other primes: 2, 3, 5, and 7. So I wrote a function that checks the user-supplied FFT length and, if it cannot be factored into these primes, searches for the next higher integer that is composed only of these "lower prime" factors and uses that for the FFT length. The additional bins will just be zero padded as usual, and with O(N log N) complexity this should keep the convolution speed optimized for FFTW. It's a nice and useful feature IMO.
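
A check of this kind can be sketched in a few lines. This is an illustration of the idea rather than the function used in the plugin, and the names are hypothetical.

```cpp
#include <cstddef>

// True if n has no prime factors other than 2, 3, 5 and 7.
bool has_only_small_primes(std::size_t n) {
    if (n == 0) return false;
    const std::size_t primes[] = {2, 3, 5, 7};
    for (std::size_t p : primes) {
        while (n % p == 0) n /= p;  // strip out every factor of p
    }
    return n == 1;  // anything left over is a larger prime factor
}

// Smallest length >= requested that factors into 2, 3, 5 and 7 only.
std::size_t next_fft_length(std::size_t requested) {
    std::size_t n = requested < 1 ? 1 : requested;
    while (!has_only_small_primes(n)) ++n;
    return n;
}
```

A power of two such as 65536 is returned unchanged, while a request of 1001 (which is 7 * 11 * 13) is bumped up to 1008 (2^4 * 3^2 * 7).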

Also, I have chosen to use "system wisdom" rather than try to do a quick but sub-optimal wisdom calculation during startup. This will require the user to pre-compute FFTW wisdom using the command line wisdom building utility included with FFTW. This utility can be configured to employ SSE/2/3 or NEON instructions, and can be compiled for double or single precision. These parameters can have a strong influence on FFTW's speed, so some attention and customization by the user is warranted.

For more info on Wisdom pre-calculation, see:
FFTW-WISDOM(1) manual page
fftwf-wisdom(1): create wisdom - Linux man page
 
UPDATE:

I've finally gotten everything working and done some preliminary testing and benchmarking. The code still needs some optimization and cleanup, but things are looking good. The plan to separate the plugin and the FIR convolution into different processes seems to have paid off.

Channels are processed individually, that is to say this is exclusively a monophonic LADSPA plugin. Stereo or in general N channels simply invoke N plugins, each one running independently of the other.

I have been using an Intel J1900 based machine during development and testing. As an example of performance, I ran a 64k FFT. The time to perform the convolution and return the processed data to the plugin was about 3.1 msec. CPU consumption was 1% per channel. To put this into perspective, the data must be processed by the FIR engine and returned to the plugin during one audio frame. The plugin uses a frame size of 1024 samples and is running at 48 kHz, which comes out to about 21 msec per frame, meaning that the data is returned well in advance of the "deadline". LADSPA hosts that allow the user to adjust the frame size (ecasound for example) can extend the deadline to whatever the user desires, limited by system memory.

Hopefully I can toss the code onto a Raspberry Pi 4 and get some numbers for that platform in the next couple of days. I think it should work well there, too.
 
I have been continuing to test and refine the code. Have made the memory footprint smaller, which resulted in a speedup of about 15% for a 64k tap filter. Everything seems to be working well.

If anyone wants to give it a test before I release it please drop me a line. I would love some guinea pigs to give it a try so that I can firm up the instructions/documentation.
 
PLEASE FEEL FREE TO READ THE ATTACHED OVERVIEW

It will take me a little more time to review and upload everything to my LADSPA plugin web page, so for now I am attaching the file that describes the installation and use of the plugin and FIR engine. It should provide a good overview of everything and includes an example for a 3-way stereo crossover using ecasound as the LADSPA host.

If there is something missing or you find something to correct please post here or PM me so that I can fix the issue.
 

Attachments

  • INSTALLATION AND USE OF LADSPA-FIR.txt
Hi Charlie,
congratulations on your new realization.


I followed the instructions in INSTALLATION AND USE OF LADSPA-FIR,

PLUGIN AND FIR ENGINE INSTALLATION:
3. Issue the command "make" to build the plugin object file

but...

flavio-PC LADSPA-FIR # make
g++ -I. -Ofast -Wall -c -fPIC -DPIC -o LADSPA_FIR.o LADSPA_FIR.cpp
In file included from /usr/include/c++/5/chrono:35:0,
from LADSPA_FIR.cpp:32:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support \
^
LADSPA_FIR.cpp:302:3: warning: identifier ‘nullptr’ is a keyword in C++11 [-Wc++0x-compat]
if ( val == nullptr ) { // invalid to assign nullptr to std::string
^
LADSPA_FIR.cpp: In function ‘std::__cxx11::string& ltrim(std::__cxx11::string&)’:
LADSPA_FIR.cpp:145:2: warning: ‘auto’ changes meaning in C++11; please remove it [-Wc++0x-compat]
auto it = std::find_if(s.begin(), s.end(),
^
LADSPA_FIR.cpp:145:7: error: ‘it’ does not name a type
auto it = std::find_if(s.begin(), s.end(),
^
LADSPA_FIR.cpp:148:4: error: expected primary-expression before ‘)’ token
});
^
LADSPA_FIR.cpp:149:21: error: ‘it’ was not declared in this scope
s.erase(s.begin(), it);
^
LADSPA_FIR.cpp: In function ‘std::__cxx11::string& rtrim(std::__cxx11::string&)’:
LADSPA_FIR.cpp:156:2: warning: ‘auto’ changes meaning in C++11; please remove it [-Wc++0x-compat]
auto it = std::find_if(s.rbegin(), s.rend(),
^
LADSPA_FIR.cpp:156:7: error: ‘it’ does not name a type
auto it = std::find_if(s.rbegin(), s.rend(),
^
LADSPA_FIR.cpp:159:4: error: expected primary-expression before ‘)’ token
});
^
LADSPA_FIR.cpp:160:10: error: ‘it’ was not declared in this scope
s.erase(it.base(), s.end());
^
LADSPA_FIR.cpp: In function ‘std::__cxx11::string get_random_char_str(int)’:
LADSPA_FIR.cpp:279:3: warning: ‘auto’ changes meaning in C++11; please remove it [-Wc++0x-compat]
auto now = std::chrono::system_clock::now();
^
LADSPA_FIR.cpp:279:8: error: ‘now’ does not name a type
auto now = std::chrono::system_clock::now();
^
LADSPA_FIR.cpp:280:3: warning: ‘auto’ changes meaning in C++11; please remove it [-Wc++0x-compat]
auto now_us = std::chrono::time_point_cast<std::chrono::microseconds>(now);
^
LADSPA_FIR.cpp:280:8: error: ‘now_us’ does not name a type
auto now_us = std::chrono::time_point_cast<std::chrono::microseconds>(now);
^
LADSPA_FIR.cpp:281:3: warning: ‘auto’ changes meaning in C++11; please remove it [-Wc++0x-compat]
auto epoch = now_us.time_since_epoch();
^
LADSPA_FIR.cpp:281:8: error: ‘epoch’ does not name a type
auto epoch = now_us.time_since_epoch();
^
LADSPA_FIR.cpp:282:3: warning: ‘auto’ changes meaning in C++11; please remove it [-Wc++0x-compat]
auto epoch_usec = std::chrono::duration_cast<std::chrono::microseconds>(epoch
^
LADSPA_FIR.cpp:282:8: error: ‘epoch_usec’ does not name a type
auto epoch_usec = std::chrono::duration_cast<std::chrono::microseconds>(epoch
^
LADSPA_FIR.cpp:284:36: error: ‘epoch_usec’ was not declared in this scope
seed = static_cast<unsigned int>(epoch_usec.count() );
^
LADSPA_FIR.cpp: In function ‘std::__cxx11::string GetEnv(const string&)’:
LADSPA_FIR.cpp:302:15: error: ‘nullptr’ was not declared in this scope
if ( val == nullptr ) { // invalid to assign nullptr to std::string
^
LADSPA_FIR.cpp: In function ‘void connectPort(LADSPA_Handle, long unsigned int, LADSPA_Data*)’:
LADSPA_FIR.cpp:407:33: error: call of overloaded ‘abs(double)’ is ambiguous
temp = abs( trunc( temp ) ); //truncate to whole number, make sure is >=
^
In file included from /usr/include/c++/5/cstdlib:72:0,
from LADSPA_FIR.cpp:26:
/usr/include/stdlib.h:774:12: note: candidate: int abs(int)
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
^
In file included from LADSPA_FIR.cpp:26:0:
/usr/include/c++/5/cstdlib:179:3: note: candidate: __int128 std::abs(__int128)
abs(__GLIBCXX_TYPE_INT_N_0 __x) { return __x >= 0 ? __x : -__x; }
^
/usr/include/c++/5/cstdlib:174:3: note: candidate: long long int std::abs(long long int)
abs(long long __x) { return __builtin_llabs (__x); }
^
/usr/include/c++/5/cstdlib:166:3: note: candidate: long int std::abs(long int)
abs(long __i) { return __builtin_labs(__i); }
^
LADSPA_FIR.cpp:412:33: error: call of overloaded ‘abs(double)’ is ambiguous
temp = abs( trunc( temp ) ); //truncate to whole number, make sure is >=
^
In file included from /usr/include/c++/5/cstdlib:72:0,
from LADSPA_FIR.cpp:26:
/usr/include/stdlib.h:774:12: note: candidate: int abs(int)
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
^
In file included from LADSPA_FIR.cpp:26:0:
/usr/include/c++/5/cstdlib:179:3: note: candidate: __int128 std::abs(__int128)
abs(__GLIBCXX_TYPE_INT_N_0 __x) { return __x >= 0 ? __x : -__x; }
^
/usr/include/c++/5/cstdlib:174:3: note: candidate: long long int std::abs(long long int)
abs(long long __x) { return __builtin_llabs (__x); }
^
/usr/include/c++/5/cstdlib:166:3: note: candidate: long int std::abs(long int)
abs(long __i) { return __builtin_labs(__i); }
^
LADSPA_FIR.cpp:418:33: error: call of overloaded ‘abs(double)’ is ambiguous
temp = abs( trunc( temp ) ); //truncate to whole number, make sure is >=
^
In file included from /usr/include/c++/5/cstdlib:72:0,
from LADSPA_FIR.cpp:26:
/usr/include/stdlib.h:774:12: note: candidate: int abs(int)
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
^
In file included from LADSPA_FIR.cpp:26:0:
/usr/include/c++/5/cstdlib:179:3: note: candidate: __int128 std::abs(__int128)
abs(__GLIBCXX_TYPE_INT_N_0 __x) { return __x >= 0 ? __x : -__x; }
^
/usr/include/c++/5/cstdlib:174:3: note: candidate: long long int std::abs(long long int)
abs(long long __x) { return __builtin_llabs (__x); }
^
/usr/include/c++/5/cstdlib:166:3: note: candidate: long int std::abs(long int)
abs(long __i) { return __builtin_labs(__i); }
^
LADSPA_FIR.cpp:424:33: error: call of overloaded ‘abs(double)’ is ambiguous
temp = abs( trunc( temp ) ); //truncate to whole number, make sure is >=
^
In file included from /usr/include/c++/5/cstdlib:72:0,
from LADSPA_FIR.cpp:26:
/usr/include/stdlib.h:774:12: note: candidate: int abs(int)
extern int abs (int __x) __THROW __attribute__ ((__const__)) __wur;
^
In file included from LADSPA_FIR.cpp:26:0:
/usr/include/c++/5/cstdlib:179:3: note: candidate: __int128 std::abs(__int128)
abs(__GLIBCXX_TYPE_INT_N_0 __x) { return __x >= 0 ? __x : -__x; }
^
/usr/include/c++/5/cstdlib:174:3: note: candidate: long long int std::abs(long long int)
abs(long long __x) { return __builtin_llabs (__x); }
^
/usr/include/c++/5/cstdlib:166:3: note: candidate: long int std::abs(long int)
abs(long __i) { return __builtin_labs(__i); }
^
LADSPA_FIR.cpp: In function ‘void activate(LADSPA_Handle)’:
LADSPA_FIR.cpp:504:44: error: ‘stoi’ was not declared in this scope
FIR->filter_length = stoi( tokens[1] );
^
LADSPA_FIR.cpp:536:66: error: ‘to_string’ was not declared in this scope
a_message = "filter_length = " + to_string(FIR->filter_length);
^
LADSPA_FIR.cpp: In function ‘void run(LADSPA_Handle, long unsigned int)’:
LADSPA_FIR.cpp:631:74: error: ‘to_string’ was not declared in this scope
string a_message = "update_interval = " + to_string(update_interval);
^
LADSPA_FIR.cpp: In function ‘std::__cxx11::string GetEnv(const string&)’:
LADSPA_FIR.cpp:308:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
Makefile:14: recipe for target "LADSPA_FIR.o" failed
make: *** [LADSPA_FIR.o] Error 1


Help me! Please.