Let's build a FIR convolver for Pulseaudio Crossover Rack

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
This thread seems to be getting stale...

@dc655321 and Tfive:
Was there any progress on the FIR filtering implementation? PM me if you want to discuss privately.

I found an issue with the uniform-partitioning algorithm: input blocks smaller than the block size the library was configured for at runtime are not handled correctly.

Basically, the algorithm must be fed a constant number of input samples each iteration; otherwise the output is corrupted.

To work around this, I added a "mode" switch that lets the integrator choose between the uniform-partitioning algorithm and a generic overlap-save algorithm. The latter is flexible with respect to the number of input samples (it must be <= the initialized value), but is approximately an order of magnitude slower per convolution iteration. In most circumstances involving listening to music, the additional time requirement is not a handicap.
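To illustrate why saving input history makes variable block sizes harmless, here is a minimal sketch in Python. It is not the library's code: it uses a plain time-domain sum instead of an FFT, and the class name `StreamingFIR` and its methods are made up for this example. The only idea it borrows from overlap-save is that the last `len(taps) - 1` input samples are kept between calls.

```python
class StreamingFIR:
    """Hypothetical time-domain FIR that accepts input blocks of any length."""

    def __init__(self, taps):
        self.taps = list(taps)
        # History holds the last len(taps) - 1 input samples, initially silence.
        self.history = [0.0] * (len(self.taps) - 1)

    def process(self, block):
        # Prepend the saved history so every output sample sees a full window.
        x = self.history + list(block)
        m = len(self.taps)
        out = []
        for n in range(len(block)):
            # x[n + m - 1] is the newest input sample contributing to output n.
            acc = 0.0
            for k in range(m):
                acc += self.taps[k] * x[n + m - 1 - k]
            out.append(acc)
        # Save the tail of the input for the next call.
        self.history = x[len(x) - (m - 1):]
        return out
```

Feeding the same signal in chunks of 3, 1, 4 and 2 samples gives the same output as feeding it in one call, because the saved history always covers the filter's reach into the past.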

Currently, this new-ish code is passing my test suites. However, so was the uniform-partitioning code until I began probing the library's integration with ALSA and Pulseaudio more deeply :eek:

I am currently traveling (apartment shopping!), with lots of time in airports/planes, so I will take a crack at integration again and report my findings.


PS: if anyone knows of a way to instruct ALSA or Pulseaudio that a plugin/sink requires constantly-sized input chunks, please speak up. I have some ideas to test, but it would be great to be handed an answer for once without hours of digging :rolleyes:
 
There's very little progress atm. I built a skeleton LADSPA plugin, linking the library is working, but no real effort was put into a working implementation yet.


I'm currently pretty busy designing and building hardware. You can expect further results in this thread in a few weeks at best.
 
Well I might as well bring up my ideas and the approach I am planning towards a LADSPA plugin that can implement FIR filtering...

I am doing things a little differently: the LADSPA plugin and the FIR "engine" will operate as independent processes in the operating system. This is because the LADSPA interface receives audio data from, and returns it to, the LADSPA host in small chunks (frames) on the order of e.g. 1024 samples, possibly fewer. People often want to run FIR kernels on the order of thousands of taps, and the FFT size will often be several times larger than the number of taps to improve the efficiency of the FIR convolution. This means that the plugin will be buffering, buffering, buffering until there is enough data, and then it will pull the lever and the FIR engine will do the convolution and spit out a large data set. Then the LADSPA plugin will sip from that at 1024 samples per call to return processed data to the host, all the while buffering unprocessed data for the next cycle of the FIR engine. In between, the FIR engine is doing nothing: it's idle.

The problem with this is that the LADSPA interface was designed primarily with lightweight calculations in mind, in which one frame of data is processed and immediately returned to the host, and the overhead to do this does not vary in time. If the FIR engine were incorporated into the plugin code, every N frames there would be a spike in wall-clock time (e.g. a delay) while the FIR engine was running. I fear that this will cause problems for the host.
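For reference, the "buffer, then pull the lever" arrangement with the engine inside the plugin can be sketched roughly as follows. This is Python rather than LADSPA C, and `BlockAdapter`, `run` and `process_block` are hypothetical names chosen for the example; `process_block` stands in for whatever convolution routine needs exactly `block_size` samples per call. The output FIFO is pre-filled with one block of silence, which is the latency this scheme adds.

```python
from collections import deque


class BlockAdapter:
    """Hypothetical adapter: host frames of any size in, fixed-size blocks to the engine."""

    def __init__(self, block_size, process_block):
        self.block_size = block_size
        # process_block is only ever called with exactly block_size samples.
        self.process_block = process_block
        self.pending = []  # input not yet handed to the engine
        # One block of silence up front: the latency added by rebuffering.
        self.ready = deque([0.0] * block_size)

    def run(self, frame):
        self.pending.extend(frame)
        # When enough input has accumulated, run the engine inline.
        # This is the call that takes much longer than all the others.
        while len(self.pending) >= self.block_size:
            chunk = self.pending[:self.block_size]
            del self.pending[:self.block_size]
            self.ready.extend(self.process_block(chunk))
        # Return exactly as many samples as the host handed in.
        return [self.ready.popleft() for _ in range(len(frame))]
```

Most calls to `run` only copy samples; the occasional call that crosses the `block_size` threshold also pays for a full engine cycle, which is the uneven load described above.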

In my approach the LADSPA plugin takes the data it gets from the host and sticks it in a kind of FIFO, one 1024-sample frame at a time. At the same time the plugin loads processed data from the FIR engine, again one 1024-sample frame at a time. Independently, the FIR engine waits in the background, and when there is enough data in the FIFO it fires up, performs the FIR convolution, and spits the result into a data-return FIFO, from which the LADSPA plugin takes data to return to the host. The FIR engine is a separate process, and the operating system will take care of scheduling it. The big plus is that the FIR engine can take its time and run over a length of wall-clock time that might amount to several calls to the LADSPA plugin (e.g. several frames of data). Because the size of the data in the FIFOs is (it is assumed) much larger than the frame size, the return FIFO will not run dry before the FIR engine cycle has finished and more processed data is returned. At the same time the FIR engine does not delay the plugin from returning data to the host, and the plugin's computational demand remains constant in time.
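A rough sketch of the two-FIFO idea, under several assumptions: it is Python, a background thread stands in for the separate FIR engine process, `queue.Queue` stands in for whatever inter-process FIFO is eventually used, and `FifoConvolver`, `run`, `close` and `process_block` are names invented for the example. The return side is pre-filled with one block of silence so the plugin always has something to hand back.

```python
import queue
import threading
from collections import deque


class FifoConvolver:
    """Hypothetical plugin side plus a background engine joined by two FIFOs."""

    def __init__(self, block_size, process_block):
        self.block_size = block_size
        self.process_block = process_block   # stand-in for the FIR convolution
        self.to_engine = queue.Queue()       # host frames waiting for the engine
        self.from_engine = queue.Queue()     # processed blocks coming back
        # Silence handed out while the first block is still being collected.
        self.ready = deque([0.0] * block_size)
        self.worker = threading.Thread(target=self._engine, daemon=True)
        self.worker.start()

    def _engine(self):
        pending = []
        while True:
            frame = self.to_engine.get()     # sleeps until data arrives
            if frame is None:                # shutdown marker
                return
            pending.extend(frame)
            while len(pending) >= self.block_size:
                chunk = pending[:self.block_size]
                del pending[:self.block_size]
                self.from_engine.put(self.process_block(chunk))

    def run(self, frame):
        self.to_engine.put(list(frame))
        # With a deep enough pre-fill this loop never waits; it blocks here
        # only to keep the sketch deterministic.
        while len(self.ready) < len(frame):
            self.ready.extend(self.from_engine.get())
        return [self.ready.popleft() for _ in range(len(frame))]

    def close(self):
        self.to_engine.put(None)
        self.worker.join()
```

In this sketch `run` waits if the return FIFO is short; a real-time plugin would more likely make the pre-fill several engine blocks deep so that the wait never happens, which is the assumption stated above about the FIFOs being much larger than the frame size.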

There are more accounting details in this approach, and the data transfer has to be coordinated properly, but loads should be much more balanced.
 