Brand New Raspberry Pi 3: 64 bits

Status
Not open for further replies.
Cortex A9 GCC Performance

Comments: In 2000, a dual-processor system where each core had 1 GFLOPS single-precision and 600 MFLOPS double-precision performance (on something relatively hard to optimize, like an FFT) was a decent workstation. Today, it's a decent cell phone.

😀

BruteFIR was originally released right around 2000-2001. The link is for a dual A9, versus the RPi 3's quad A53, which is marginally faster for our uses.

Reading around, BruteFIR on the RPi should be cross-compiled, as I'm not sure the on-Pi GCC has everything we need, and I think FFTW has been updated to handle AArch64 NEON (on recent GCC versions).

And, yes, Charlie, you understand what I was thinking.
 
@CharlieLaub: Would it make sense to benchmark something like BruteFIR to get a sense of tap count scaling? I've had a project like that on my back burner for a while.

I've spent more time studying FIR than IIR (though frankly, not too much time on either). Do IIR calculations also rely heavily on FFTs?
 
Raspberry PI 3

Daniel, Charlie,

Thanks for your discussion on the Pi 3. I learnt something from it, so keep the conversation going.

Rob Watts from Chord Electronics thinks that more taps sound better, and his latest products like the Hugo and Mojo have many. And the smallest timing difference can be detected by our brain. And he has FPGAs with massive processing power to work with.

I wonder: if you need more CPU power than one Pi 3 has, would it be easy to use a cluster of, say, Pi 3s to do the processing you're doing? As the Pi 3 has WiFi built in, that reduces the need for an Ethernet or WiFi dongle for each Pi 3.

Rick
 
You don't use convolutions for IIR filters like you do FIR filters, ergo, no. IIR filters are implemented via a difference equation (given the feedback mechanism).
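
To make the difference-equation point concrete, here is a minimal sketch of a single second-order IIR section in plain Python. The function name and the coefficient values are placeholders picked for illustration, not a designed filter. The `a1` and `a2` terms feed previous outputs back in, so each output sample depends on earlier outputs, not only on the input.

```python
def biquad(x, b0, b1, b2, a1, a2):
    """Direct form I difference equation:
    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    y = []
    x1 = x2 = y1 = y2 = 0.0  # delayed input and output samples
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn  # shift the input delay line
        y2, y1 = y1, yn  # shift the output (feedback) delay line
        y.append(yn)
    return y


# Placeholder one-pole case: y[n] = x[n] + 0.5 * y[n-1]
impulse = [1.0] + [0.0] * 7
h = biquad(impulse, b0=1.0, b1=0.0, b2=0.0, a1=-0.5, a2=0.0)
print(h)  # the response halves every sample and never reaches exactly zero
```

The impulse response keeps decaying instead of ending after a fixed number of samples, which is the "infinite" part of IIR.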

That said, there should be a way to use NEON to get a decent amount of speedup on IIR filters.

Rick--I think there's enough power on the RPi 3 to do quite long FIR filters, especially if you're not trying to push out to 8 channels or so. My questions are theoretical, as I haven't bought an RPi 2 or RPi 3 to really test it and characterize its performance. But I don't think running a multi-system convolution would be a smart way to go for $70, versus, say, getting an Atom (Cherry Trail)-based hardware setup.
 
From what I recall, in general, as you reduce the number of taps of an FIR filter, its approximation of the filtering function you ask of it gets worse. The converse is also true: increasing the number of taps results in better agreement between what you want the filter to do and what it actually does. There is likely a point where adding even more taps does not improve the accuracy all that much. Where that point is depends on exactly what the filter is doing and where in the frequency spectrum it is doing it.

Honestly, I always wonder if people who design FIR filters take the time to check whether the resulting set of taps produces the response they intended, or whether there are oscillations in the frequency response, etc., because of too few taps. Maybe this is done automagically by some design software... not sure. But the question is always in the back of my mind!
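
One way to run that kind of check is to compute the frequency response of the finished taps and measure how sharp the transition actually is. The sketch below assumes a Hamming-windowed sinc low-pass, which is only one of many design methods and not necessarily what any particular design tool does. The function names, the 0.1 cutoff and the -1 dB / -40 dB thresholds are all arbitrary choices for illustration.

```python
import numpy as np


def windowed_sinc_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass. cutoff is a fraction of the sample rate (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n)  # ideal low-pass, truncated
    h *= np.hamming(num_taps)                     # taper to reduce ripple
    return h / h.sum()                            # unity gain at DC


def transition_width(h, n_fft=1 << 16):
    """Distance (fraction of sample rate) between the first -1 dB and first -40 dB points."""
    mag = np.abs(np.fft.rfft(h, n_fft))
    mag_db = 20.0 * np.log10(np.maximum(mag, 1e-12))
    freqs = np.fft.rfftfreq(n_fft)
    f_pass = freqs[np.argmax(mag_db < -1.0)]
    f_stop = freqs[np.argmax(mag_db < -40.0)]
    return f_stop - f_pass


for taps in (63, 255, 1023):
    h = windowed_sinc_lowpass(taps, cutoff=0.1)
    print(taps, "taps -> transition width", transition_width(h))
```

With this design method the transition band narrows as the tap count goes up, so the realized response gets closer to the ideal brick wall. Plotting `mag_db` as well would show any ripple directly.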
 
Hmmm… from what I remember of EE–225 (UC Berkeley), I was under the impression that neither IIR nor FIR are convolution-based filtering systems. Specifically, FIR is a feed-forward fixed-factor summation system, and IIR is exactly the same but where the sum is fed back to the first ("zeroth") term. Therefore 'energy' can be retained within the system, leading to all the usual Nth order differential z-plane behavior. s-plane if you like (I don't).

Again, remembering from my stint of 3 years as lead programmer for a text-to-speech professor, it was felt that FIR offered unconditional stability in that it couldn't feed backward the residual response energy (leading to number space overflow and all the horrible sounding results one gets therein.) Oh yes… Nth order FIR is also mappable onto (N+2)th order 'recursive' filtering. Which if one follows the transform formulæ found in Rabiner & Gold, it is easy to compute the bi-directional filter parameters that exactly map response. And the last note is that a classic IIR filter system also can exactly map onto the same recursive filter.

Lastly, since we're on this almost comically geeky topic, the number of multiply-summations is found to be minimized for the recursive realtime topology, and in particular exactly finite: (4M + 6A) • Order. (where M and A are the work (time) to multiply and add, respectively)

So, while I respect field findings where lots of taps is sussed out to be good, especially for emulations of classic filter impulse responses, such as Butterworth, Bessel, Chebyshev, Elliptic and Sallen-Key, there are definitely times when all the mathematical-theoretic noodling doesn't really deliver detectably better results. Like using 32 bit linear digitization (if one could find such a beast) to represent more precise values for that which is coming off a 96 kHz / 24 bit recording. Mmm… more bits doesn't equal more goodness.

Anyway - perhaps my z-delay filter theory memories are weak. They are after all over 40 years 'old'.

GoatGuy
 
https://ccrma.stanford.edu/~jos/fp/Convolution_Representation_FIR_Filters.html

There's no such (good) representation of an IIR difference equation in terms of a convolution. Back to school! 🙂

I used convolution language because it is (at least to me) easier to think in terms of FTs and iFTs.
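
For anyone who wants to see the equivalence numerically, here is a small sketch using NumPy. It computes the same FIR output three ways: as the feed-forward weighted sum, as a convolution, and as a multiplication of spectra followed by an inverse FFT. The signal and tap values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)  # input signal (placeholder data)
h = rng.standard_normal(32)   # FIR taps (placeholder values)

# 1) Feed-forward weighted sum: y[n] = sum over k of h[k] * x[n-k]
y_sum = np.zeros(len(x) + len(h) - 1)
for n in range(len(y_sum)):
    for k in range(len(h)):
        if 0 <= n - k < len(x):
            y_sum[n] += h[k] * x[n - k]

# 2) The same operation written as a convolution
y_conv = np.convolve(x, h)

# 3) Multiply the spectra, then transform back (zero-padded to the full output length)
n_fft = len(x) + len(h) - 1
y_fft = np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft), n_fft)

print(np.allclose(y_sum, y_conv), np.allclose(y_conv, y_fft))
```

All three agree to floating-point precision, which is why FFT-based engines can be used for long FIR filters.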

POS on this board has some really great software (rePhase) for building and analyzing the effects of filter taps in FIR. Worth playing around with to get a better appreciation for what's going on. I believe Scott Wurcer did an article in Linear Audio about reduced tap optimization, too. (Subject to bad memory)
 
Hmmm… yes, what you say is true about IIR filters - if your desire is to get one to do something very “natural filter-ish” (especially with Q > ½ for any of the poles/zeros) then it is precious difficult to compute the mesh that will actually get the desired filter 'done'.

This I know. There are some well-regarded degenerate cases that have some nearly trivial solutions (which I can't find a cite for right now), but I know that for N = [1, 2 and 3] there are direct non-iterative solutions for those orders of IIR filters.

Now, while it may not sound like N = 2 or N = 3 is a terribly high order for a filter, the first I in IIR does mean 'infinite' impulse response: the filter system does not need to be 'wide enough' to competently emulate a Q > ½ second or third order filter that has slightly underdamped behavior (if that's what you're aiming for.)

And by Hoozidonger's Theorem (too lazy to look it up), there is also a mathematical identity that says that a huge number of 4th, 5th … Nth order IIR 'solutions' can be competently broken into (N=2) → (N=3) → … → (N=2) chained filter banks. With some attenuation between them to keep them from 'over unity' net gain.
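
The easy direction of that chaining idea can be checked numerically: running a signal through two second-order sections in series gives the same output as one fourth-order filter whose coefficient polynomials are the products of the two sections' polynomials. This is only a sketch of that equivalence, not a statement of the theorem alluded to above. The helper name and all coefficient values are made-up placeholders (both sections have poles inside the unit circle).

```python
import numpy as np


def iir_filter(b, a, x):
    """Direct-form difference equation, assuming a[0] == 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):      # feed-forward part
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, len(a)):   # feedback part
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


# Two second-order sections (placeholder coefficients)
b1, a1 = np.array([0.2, 0.4, 0.2]), np.array([1.0, -0.5, 0.25])
b2, a2 = np.array([0.3, 0.0, -0.3]), np.array([1.0, -0.2, 0.5])

# Equivalent single fourth-order filter: multiply the polynomials
b4, a4 = np.convolve(b1, b2), np.convolve(a1, a2)

x = np.random.default_rng(1).standard_normal(200)
y_chain = iir_filter(b2, a2, iir_filter(b1, a1, x))
y_single = iir_filter(b4, a4, x)
print(np.allclose(y_chain, y_single))
```

Going the other way (splitting a given high-order filter into sections) amounts to factoring those polynomials into second-order pieces.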

And (wow, I've not thought this in decades!) while thinking about that, I also remember figuring out that the Nth order IIR transform in Rabiner & Gold to a bidirectional recursive filter analytic system also mapped onto the (N=2) … → … (N=2) chained form, without changing the net amount of computational work in any case. Kind of a cool result.

Remember though - the most fundamental limitation of FIR, which arguably outweighs the ease with which one can find all the coefficients regardless of the 'z-order' of the thing, is that it simply does not have a natural filter's definitionally infinite exponential decay (if desired) or underdamped variations on exponentially damped oscillatory behavior. Is this bad, or a feature?

We tend to think of it as a feature, since just about no one that I know is designing Nth order filters to ring on their output. We audio (and video) people work pretty avidly to ensure that Q is always less than ½, both overall and if implemented in chained stages, between all stages. (That was the other Lemma of Hoozidonger's Theorem: that a chain of Q < ½ well damped stages could never in toto become a Q > ½ underdamped system. Kind of a 'lets you sleep well at night' realization.)

Thanks for the discussion!
And the memories!
GoatGuy
 
Most of the things discussed here are far too theoretical for me, but I am using BruteFIR on my Raspberrys (in fact I use it on all my SoC boards: there are 2 single-core Raspis, a HummingBoard i1 and an Orange Pi PC (getting more and more support each day since Armbian is developing for it)).

For the Raspis I modified the kernel a little bit, so I have at least multichannel output via HDMI feeding my AVR up to 8 channels at 48 kHz and 44.1 kHz, 16 bit (for me good old White Book Standard is enough 😀)
I can run BruteFIR with 65536 taps and 4 channels of output on an RPi 2 without any hassle; there is still headroom for more.
The single-core Pi is able to do 4 channels with 8192 taps. With the core pushed up to 900 MHz, the load is constant at 90%, running stable for hours.
It is possible to run an MPD server on the same machine.
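
As a rough sanity check on why those numbers are feasible at all, the sketch below compares the multiply-accumulate rate of a direct time-domain FIR against a crude operation count for FFT-based convolution, using the 65536 taps, 4 channels and 44.1 kHz mentioned above. The FFT estimate assumes one unpartitioned block per filter length and about n*log2(n) operations per FFT; both are simplifying assumptions, and this is not a description of BruteFIR's actual partitioning scheme.

```python
import math


def direct_macs_per_second(taps, sample_rate, channels):
    """Time-domain FIR: one multiply-accumulate per tap per output sample."""
    return taps * sample_rate * channels


def fft_ops_per_second(taps, sample_rate, channels):
    """Crude estimate for block convolution with block = taps and FFT size = 2 * taps.
    Assumes roughly n * log2(n) operations per FFT of size n (constants ignored)."""
    n = 2 * taps
    per_block = 2 * n * math.log2(n) + n  # forward FFT + inverse FFT + spectrum multiply
    blocks_per_second = sample_rate / taps
    return per_block * blocks_per_second * channels


direct = direct_macs_per_second(65536, 44100, 4)
fast = fft_ops_per_second(65536, 44100, 4)
print(f"direct: {direct:.3e} MAC/s, FFT-based: {fast:.3e} ops/s")
```

The direct form works out to billions of multiply-accumulates per second, while the FFT-based estimate is smaller by orders of magnitude, which is consistent with a small ARM board having headroom left.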

I have a Wolfson Audio Card for the Pi 1; perhaps I will find some time in the next few days to do some measurements with different tap counts.

I am interested in the results myself; there was a discussion on the German HifiForum about how many taps are enough, concerning the new miniDSP 2x4 HD.

Regards
 
OK. Good to know the RPi 3 is powerful enough. I am only listening to 2 channels, so that is enough for me.

Noted that an Atom is a better way to get more processing power if needed. Thanks!

Rick
 
Charlie,
I'm new to this and haven't tried running filters yet.
But I did get to hear Rob Watts last month when he was in town talking about the Chord Mojo and Dave DACs.

Rob said he learnt a lot in the last 2 years working on Hugo, Mojo and Dave.

You may be able to find some of the findings he shared on Google; I found them interesting.

I recall he said most sound engineers do not listen to results but rely only on measurements. And if you are going for ultra accuracy, only one measurement tool will do. He does both measurement and listening.

Rick
 
skyunlimited,
Yes, that will be interesting to try. And if you do, please also listen and let us know if you hear any difference.
 
Not sure if you (and other readers) know this but there is also an inexpensive single-core (Allwinner R8) board with onboard WiFi, USB coming out in June (can pre-order now) that I plan to try. Check it out here:
Get C.H.I.P. - The World's First Nine Dollar Computer
I gave the Pi Zero a go but it was rather disappointing. Sure, the board costs $5 if you can get one, but then you have to add on $30 worth of stuff to make it work for audio.

Wasn't aware of the "miniDSP 2x4 HD" as I haven't been following them since I fell off the miniDSP wagon and started using the R-Pi and other related boards (that I call minicomputers). I will have to check it out to see what they have been up to... I felt like that company went off on a tangent developing more expensive (yet stupid IMHO) room correction engines and kind of let their DSP crossover line get a bit stale. I get some additional capabilities with e.g. an R-Pi 2 and I am in total control and not subject to some MFG GUI and its limitations.
 
Thanks for the discussion!
And the memories!
GoatGuy

Not ignoring the rest of what you wrote (good info, will have to look up the official name of the Whodunit Theorem), but wanted to return thanks in kind.

Thanks! Practical information. Just to make sure I understand, your data flow is:

Input -> Resample to 16/48k (if needed) -> BruteFIR 4 ch x 65536 taps (probably 32 bit internal math?) -> HDMI to receiver

RPi 3, especially if recompiled for AArch64 (needed for NEON), is about 120-130% of the RPi 2's performance in terms of pure math crunching. On top of that, it has more bandwidth to/from memory (although I'm not sure how much bandwidth is needed in our application). All in all, sounds like you could convincingly do 6 channels x 65536 taps. Not bad at all.
 
On the high end of the SOC-micro/mini/meso/nano/pico/femto/atto 😀 computer spectrum, if you're willing to be a bit more hands-on and don't *need* the built-in WiFi/BTLE, the ODROID C2 looks pretty promising (Amlogic S905 processor = quad A53 clocked out to 2 GHz). Extra memory to boot.

Best if used headless by the looks of it, as I don't think the ARM Driver for Linux is all that hot.
 
Thanks! Practical information. Just to make sure I understand, your data flow is:

Input -> Resample to 16/48k (if needed) -> BruteFIR 4 ch x 65536 taps (probably 32 bit internal math?) -> HDMI to receiver
That is the way on the Pi 2; on the single-core it is 4x 8192.
I have just moved the SD card from the single-core to the RPi 2 to show a little picture. This is the Pi 2 running MPD and BruteFIR with 4x8192 taps @ 16/44.1 output...
 

Attachments

  • Screenshot - 03042016 - 06:35:50 PM.png (48.3 KB)