Let's build a FIR convolver for Pulseaudio Crossover Rack

I don't really know about the licensing situation with the GPL vs. the BSD licence I release all my projects under. I hate the GPL because you can never be really sure whether you can use stuff released under it or not, with all the complications of programs vs. libraries and so on. The real open source licence is the BSD licence, IMO.

The API looks quite reasonable; I'll have a look at it some time.

It's not rocket science. Just read the GPL2 license text:

GNU General Public License v2.0 - GNU Project - Free Software Foundation

Supposedly it's GPL v3 that is making people quite unhappy, not version 2.

Here is some info comparing BSD and GPL:
What is the difference between a BSD and a GPL license types? - Quora
 
I know, I'm probably a bit too opinionated about the GPL. I just think it's way too long. And although it was meant to ensure contributions back to the project, a lot of companies simply use GPLed stuff and don't give a **** about the licence terms anyway. So what's the point of putting it all into a long licence text? Just release your stuff to the public and be done with it. Just my two cents.

Thanks for the hint about the library though. Appreciated!

Will take some time though till I can look into it because I'm mainly designing and building hardware at the moment... There's some uber cool stuff in the works ;)

PS: Whaaaat? I can't write $h1t here? :D:D:D
 
I'm finally (or so I think) figuring out what steps are used to perform FIR filtering on a data stream. I found a couple of web pages to be informative:

The illustrations at the bottom of this page were helpful:
Example of Overlap-Add Convolution

Also this page, with illustrations of overlap-add:
fft - Overlap/add time-domain audio frames: How does normalization/scaling work with overlap greater than 50%? - Signal Processing Stack Exchange

This leaves me with a couple of questions:
a. What is the usual, or preferred, window function for the input data?
b. If the window is N samples wide and I will be operating on chunks of data M samples wide, will the output FFT be M+2N in size? I am always a bit unsure of the vector sizes involved with convolution...
c. What are sane choices for M and N?
d. In the second link it is explained that there are several different overlap widths that can be used - some may require scaling. What is the usual/preferred overlap and why?

At this point, the general procedure seems to be:
1. Window input data/stream
2. Perform FIR filtering via convolution with kernel and windowed data
3. Move result into overlap-add buffer
4. When enough data has been placed into output buffer to satisfy overlap, copy data out of overlap-add buffer and into output data/stream

Are the steps above correct? Note I am assuming that I already know the filter kernel ahead of time. This will be determined via other means.
 
You should probably start your own thread about this. Not because I mind the discussion happening in this thread, but because you'd probably get more attention for your questions that way. If you do, please tell me so I can follow... Also, if you ask on Stack Exchange (which is more likely to get your questions answered) or similar, please keep us posted here, because at least I am interested in learning this stuff too.
 

The problem discussed at that link is that of analyzing the spectral content of a signal. It is related, but not directly applicable, to the problem of convolving an impulse response with a streaming input signal. An example use-case might be to display spectral power as a function of time (think of the real-time EQ display on an AVR).

However, FIR impulse responses are typically windowed prior to convolving with input data blocks.
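As a concrete example of that, here is a minimal window-method design sketch, assuming NumPy is available; the sample rate, cutoff, and tap count are illustrative, not from any particular project:

Code:
import numpy as np

fs = 48000       # sample rate in Hz (assumed)
fc = 2000        # cutoff frequency in Hz (assumed)
n_taps = 127     # odd length for a symmetric, linear-phase FIR

# Ideal (infinitely long) lowpass response, truncated to n_taps samples
n = np.arange(n_taps) - (n_taps - 1) / 2
h = 2 * fc / fs * np.sinc(2 * fc / fs * n)

# Taper the truncated response to control ripple; Hann is one common choice
h *= np.hanning(n_taps)

The window here is applied once, to the filter taps, at design time; it plays no part in the per-block streaming convolution.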

This leaves me with a couple of questions:
a. What is the usual, or preferred, window function for the input data?
b. If the window is N samples wide and I will be operating on chunks of data M samples wide, will the output FFT be M+2N in size? I am always a bit unsure of the vector sizes involved with convolution...
c. What are sane choices for M and N?
d. In the second link it is explained that there are several different overlap widths that can be used - some may require scaling. What is the usual/preferred overlap and why?

a. For overlap-add (or overlap-save), the input signal blocks are not windowed. Or rather, they are, but the rectangular window is used (equivalent to unity multiplication), so the operation is implicit only.

b. Forget window sizes here. The rule for vector lengths in convolution is that for an input block of size B and a filter (IR) of size N, the convolution product K must be
Code:
K >= B + N - 1
(a sketch putting this rule to work follows point d below).
c. The input block sizes are not typically yours to control - the blocks are provided by the audio application (e.g. VLC or Spotify --> Pulseaudio (convolve here) --> ALSA (or convolve here) --> DAC).

d. Again, forget windowing the input stream. This is a misunderstanding of the fundamentals.
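As promised under point b, here is a minimal overlap-add sketch, assuming NumPy is available; the function names and the fixed block size are illustrative, not taken from any real plugin. Each call consumes and emits exactly B samples, the per-block convolution product has length K = B + N - 1, and the trailing N - 1 samples are carried into the next call:

Code:
import numpy as np

def make_ola_convolver(h, B):
    """Streaming FFT convolution of fixed-size blocks with FIR h."""
    N = len(h)
    K = B + N - 1                        # linear convolution length per block
    nfft = 1 << (K - 1).bit_length()     # next power of two >= K
    H = np.fft.rfft(h, nfft)             # filter spectrum, computed once
    tail = np.zeros(N - 1)               # overlap carried between calls

    def process(x):                      # x holds exactly B samples
        nonlocal tail
        y = np.fft.irfft(np.fft.rfft(x, nfft) * H, nfft)[:K]
        y[:N - 1] += tail                # add the previous block's tail
        tail = y[B:].copy()              # save this block's tail (N - 1 samples)
        return y[:B]                     # emit exactly B samples
    return process

The rectangular "window" from point a is implicit here: each block is used as-is, only zero-padded by the FFT.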
 
Thanks for your reply, dc655321. If I am understanding correctly:

A. I will get a block of data from whatever process is handling the audio buffering. For LADSPA the size is whatever the LADSPA host decides. The block size (B) is known to the LADSPA plugin.
B. I convolve the input data block with the kernel of length K to yield a convolution output of length B+K-1.
C. Successive convolution outputs are summed together to form the output data stream.

That's it? Seems so simple. No windowing. Wow. Hard to believe it is not more complicated! Or am I missing something?
 
If I am understanding correctly:

A. I will get a block of data from whatever process is handling the audio buffering. For LADSPA the size is whatever the LADSPA host decides. The block size (B) is known to the LADSPA plugin.
Sort of?
You, the algorithm designer, must choose what size of blocks your algorithm will work with. The audio application may feed your code blocks of any size, though (~4 kilo-frames, for example), so one must be prepared to handle that situation.

B. I convolve the input data block with the kernel of length K to yield a convolution output of length B+K-1.

No. The output of your algorithm can only be as large as the input block it was given (assuming no output buffering). The rule, y >= b + n - 1, still applies per segment, but each segment's convolution product extends past the block boundary and would otherwise alias. Thus the need to overlap and add (or save/discard).

Eg: if your algorithm works in 4 ksample blocks of input, and your FIR filter is 8 ktaps (taps, samples, same thing), clearly you cannot multiply two vectors of unequal length (convolution in frequency domain IS multiplication). So, the 4 ksamples of input are padded with 4 kilo-zeros, yielding an input+zeros length of 8 ksamples. This can then be multiplied with the FIR transform, inverse FFT'd, and B samples of convolved output extracted.
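Here is a minimal sketch of that pad/multiply/inverse step, assuming NumPy is available, with the lengths taken from the example above. One caveat worth hedging: to keep the product free of circular (time-domain) aliasing, the transform must span at least B + N - 1 samples, so the 8 ksample figure above would in practice be rounded up further:

Code:
import numpy as np

B, N = 4096, 8192                        # input block and FIR lengths (example)
nfft = 1 << (B + N - 2).bit_length()     # next power of two >= B + N - 1
x = np.random.randn(B)                   # placeholder input block
h = np.random.randn(N)                   # placeholder FIR taps

X = np.fft.rfft(x, nfft)                 # zero-pads x to nfft internally
H = np.fft.rfft(h, nfft)                 # zero-pads h to nfft internally
y = np.fft.irfft(X * H, nfft)[:B + N - 1]  # the full linear convolution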

C. Successive convolution outputs are summed together to form the output data stream.

No.
There is always an overlap (hey, it's in the name!) of N-1 points, where N is the filter length.
It's a stateful algorithm.

With overlap-save, there is no addition, post-convolution, required.
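For comparison, a minimal overlap-save sketch under the same assumptions (NumPy available, fixed block size B, more than one filter tap); the last N - 1 input samples are saved and re-used, and the aliased head of each circular convolution is discarded rather than added:

Code:
import numpy as np

def make_ols_convolver(h, B):
    """Streaming overlap-save convolution; assumes len(h) > 1."""
    N = len(h)
    nfft = 1 << (B + N - 2).bit_length()  # next power of two >= B + N - 1
    H = np.fft.rfft(h, nfft)
    saved = np.zeros(N - 1)               # trailing input from the last call

    def process(x):                       # x holds exactly B samples
        nonlocal saved
        seg = np.concatenate([saved, x])  # N - 1 old samples + B new ones
        saved = seg[-(N - 1):].copy()     # keep the new tail for the next call
        y = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        return y[N - 1:N - 1 + B]         # discard the aliased head, no adds
    return process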

That's it? Seems so simple. No windowing. Wow. Hard to believe it is not more complicated! Or am I missing something?

I'm afraid you're missing a few things. But, this is not something that will make sense without a solid understanding of convolution.
 
Sort of?
You, the algorithm designer, must choose what size of blocks your algorithm will work with. The audio application may feed your code blocks of any size, though (~4 kilo-frames, for example), so one must be prepared to handle that situation.
Sure, I understand this point 100%. I was just giving an example from what I have experienced under LADSPA. The host gives you some block size. If you need "more" than that, you buffer it.
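A minimal sketch of that buffering, assuming NumPy is available; the class name and the process callback (e.g. the one returned by the make_ola_convolver sketch earlier in the thread) are illustrative. The adapter accepts host blocks of any size, feeds the convolver fixed B-sample chunks, and returns exactly as many samples as the host provided, at the cost of B samples of latency:

Code:
import numpy as np
from collections import deque

class BlockAdapter:
    def __init__(self, process, B):
        self.process, self.B = process, B
        self.fifo_in = deque()             # samples waiting to be convolved
        self.fifo_out = deque([0.0] * B)   # primed: B samples of latency

    def run(self, host_block):
        self.fifo_in.extend(host_block)
        while len(self.fifo_in) >= self.B:     # drain in fixed-size chunks
            x = np.array([self.fifo_in.popleft() for _ in range(self.B)])
            self.fifo_out.extend(self.process(x))
        # return exactly as many samples as the host handed us
        return np.array([self.fifo_out.popleft() for _ in range(len(host_block))])

Usage would be something like adapter = BlockAdapter(make_ola_convolver(h, 4096), 4096) followed by out = adapter.run(host_block) on each host callback.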

No. The output of your algorithm can only be as large as the input block it was given (assuming no output buffering). The rule, y >= b + n - 1, still applies per segment, but each segment's convolution product extends past the block boundary and would otherwise alias. Thus the need to overlap and add (or save/discard).

Eg: if your algorithm works in 4 ksample blocks of input, and your FIR filter is 8 ktaps (taps, samples, same thing), clearly you cannot multiply two vectors of unequal length (convolution in frequency domain IS multiplication). So, the 4 ksamples of input are padded with 4 kilo-zeros, yielding an input+zeros length of 8 ksamples. This can then be multiplied with the FIR transform, inverse FFT'd, and B samples of convolved output extracted.
Actually, there is no requirement that the impulse is smaller than the block of data. Where did you get that idea? Remember, outside of the non-zero data I am assuming infinite zeroes. Maybe a theoretical construct, but that is how it is. For example, if you are convolving an impulse with a FIR filter, the impulse is only a couple of samples wide, really. Then what? (that was a rhetorical question).
No.
There is always an overlap (hey, it's in the name!) of N-1 points, where N is the filter length.
It's a stateful algorithm.

With overlap-save, there is no addition, post-convolution, required.



I'm afraid you're missing a few things. But, this is not something that will make sense without a solid understanding of convolution.

I don't see a difference between what you are saying and what I stated in my post. There will be an "overlap". What is that? It means that the output produced by one block's convolution also contains information that must be added to adjacent blocks' convolutions to yield the "correct" output. This is just a consequence of segmenting (i.e. rectangular-windowing) the input.
 
@dc655321 any news about your library?

@Tfive apologies for the silence (again!).
I just got back late last night from traveling across the country for a two-day job interview. Never had one lasting 2 days before... Brutal.

I will ping you on Gitlab and we can discuss offline exactly what you need in an API (eg: what additions may/may not be required from cvngn for LADSPA use).
 
Would it be possible to use the Pulseaudio module-virtual-surround-sink? It's not intended for multichannel out, but as per the code it does support multichannel convolution.

Otherwise, in the coming year I will try to route PAXOR's output through BruteFIR. Or I'll try GitHub - bmc0/dsp: An audio processing program with an interactive mode, which uses the zita-convolver and FFTW libraries for FIR filtering.


Does the virtual-surround-sink actually support convolution?
I only had time for a quick look through the source, so may have missed it, but it looked to me like a "simple" mixing filter.


Would be grateful to have the convolution aspects here pointed out explicitly. Thx.
 
Just in case anyone is interested in what one of the Pulseaudio example HRIRs looks like (ignore the magnitude scaling; it's wrong here): it's 6 channels of 128 IR coefficients.
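For anyone wanting to reproduce such a plot, here is a minimal sketch assuming SciPy and Matplotlib are available; the file path is illustrative (the HRIR is a multichannel WAV):

Code:
import matplotlib.pyplot as plt
from scipy.io import wavfile

rate, hrir = wavfile.read("hrir.wav")     # expected shape: (128, 6)
fig, axes = plt.subplots(hrir.shape[1], 1, sharex=True)
for ch, ax in enumerate(axes):
    ax.plot(hrir[:, ch])                  # one impulse response per channel
    ax.set_ylabel("ch %d" % ch)
axes[-1].set_xlabel("sample")
plt.show()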
 

Attachments

  • Figure_1.png (plot of the 6-channel, 128-coefficient HRIR)