Brand New RaspBerry 3 : 64 bits

Status
This old topic is closed. If you want to reopen this topic, contact a moderator using the "Report Post" button.
I'm sitting here wondering the same thing: enough horsepower in the GPU for a FIR crossover?

How to optimize Raspberry Pi code using its GPU (Pete Warden's blog)

Haven't looked much past this, would need to write carefully aligned assembly to get the GPU to do our bidding. I'm not worried about latency in audio playback, but there'd need to be some decent optimizations to FFT/IFFT or direct convolution, primarily in cache alignment. Pretty much zero dependencies to worry about, though! :)

* I haven't touched assembly in a while. Could be fun!
 
I don't think the GPU is appreciably faster.

There's already a library that does FFT on the GPU for RPi here: GPU_FFT

It's a drop-in replacement for the FFTW 3 library used by tools such as BruteFIR.

More details on FFT on VideoCore here: https://www.raspberrypi.org/blog/accelerating-fourier-transforms-using-the-gpu/

All that said, an RPi2 is already fast enough to run 4 channel FIR filters in CPU, especially if you use a version of FFTW that takes advantage of the NEON SIMD instructions.
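For anyone wondering what the FFT library is actually doing for a FIR engine, here is a rough sketch of FFT-based (overlap-add) convolution. It uses NumPy's FFT rather than FFTW or GPU_FFT, and the function name, block size, and test signal are my own assumptions for illustration, not anything taken from BruteFIR.

```python
import numpy as np

def overlap_add_fir(signal, taps, block_size=4096):
    """Filter `signal` with FIR `taps` using FFT-based overlap-add."""
    n_taps = len(taps)
    # FFT length must hold one block plus the filter tail without wrap-around.
    fft_len = 1
    while fft_len < block_size + n_taps - 1:
        fft_len *= 2
    taps_fft = np.fft.rfft(taps, fft_len)  # transform the filter once

    out = np.zeros(len(signal) + n_taps - 1)
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        # Multiply spectra, then transform back: one block of linear convolution.
        y = np.fft.irfft(np.fft.rfft(block, fft_len) * taps_fft, fft_len)
        seg_len = len(block) + n_taps - 1
        out[start:start + seg_len] += y[:seg_len]  # add the overlapping tails
    return out

# Assumed test data: random "audio" and a random 1024-tap filter.
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
h = rng.standard_normal(1024)
max_err = np.max(np.abs(overlap_add_fir(x, h) - np.convolve(x, h)))
print("max difference vs. direct convolution:", max_err)
```

The per-block cost is dominated by the two FFTs, which is why a NEON-optimized FFT library matters so much for long filters.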
 
The Raspberry Pi folks have said that they will only provide a 32-bit OS for the near future, until the performance benefits of 64-bit can be demonstrated. Since it only has a fixed 1 GB of RAM, there are limited benefits to 64-bit.

The ARMv8 instruction set is likely to be the biggest benefit. I thought the GPU had been upgraded and moved from a 250MHz clock to 400MHz?

Is there more than 2ch I2S output from these?
 
Thanks, Keylimesoda! That's very helpful!

Horsepower-wise, it seems the GPU is the same architecture, but clocked up a bit from prior SoCs, if I'm reading correctly. (250 MHz -> 300 MHz...I'll take 120%)

Probably need to park it out to a USB multichannel DAC, ultimately.
 
Raspberry PI 3

I'm excited by the Pi 3 for the 64-bit CPU, faster speed, and built-in WiFi and Bluetooth. Ordered mine and waiting for delivery.

Will be keeping an eye out for 64bit OSs and apps to play with.

I currently play music with Moode 2.5 on my PI 2.

Rick
 
Much in the same way that so many applications are *still* 32-bit on laptops/desktops (at least under Windows), and only a few *need* 64-bit memory access, I don't really think the 32->64 bit part of moving from ARMv7 to ARMv8 is going to matter much for audio through an RPi and the kinds of things people use RPis for. Improved NEON architecture and higher IPC are absolutely useful for audio DSP, along with a tidy clock speed boost.

For Moode player, all I expect you'll see is an ever-so-slightly lower power demand, as you're not taxing your RPI2 already. But it's certainly fun having new toys! :)
 
Are these at a point where they can run 8-channel linear-phase FIR filters with all source material resampled to 96 kHz? The only reason I ask is that those are the filters I am currently using for my 4-way active speaker system. The filters are created in rePhase; JRiver is the convolution engine and audio player. There are also some delay and level adjustments specified in JRiver. This is all currently running on an old i7 laptop with 8 GB of memory.

What are the equivalent software tools I can use on the Pi 3?

Btw, does the Pi 3 have an I2S interface to directly feed the processed MC stream to a DAC board?

Another possibility is using the Pi 3 solely as a digital music player and streamer, with an I2S link to a DSP board doing the XO filters, followed by an MC DAC?
 
I read through a couple of tech reviews of the Pi 3 yesterday. It seems that the performance has been improved more or less across the board: memory, graphics processor, CPU/FPU frequency, etc. This has resulted in an improvement in "speed", as measured with benchmarks and real applications completing in less time, of 50%-100% with the price point remaining the same! Awesome! The addition of onboard WiFi (found to be decent in terms of performance) saves me the cost and resources of a USB WiFi dongle compared to the Pi 2.

After multiple failed efforts to order Orange Pi PCs from China and the price of those climbing from what was around $20 to more like $30+ now, combined with the poor quality of the OS ports to the Orange Pi, I can finally kiss that platform goodbye and embrace the Pi 3. I should be able to move all the code and other tools that I have developed for audio from my Pi 2 to the Pi 3 without a hitch. Judging by the excellent online community for software support and tips, as well as the good quality hardware of the Pis I have used to date, it's a no-brainer to stick with this platform for all my future audio projects.

Too bad that MCM has already sold out of Pi 3 stock!
 
jojip--that's basically my question as well. It's really going to come down to how fast the NEON hardware is on the quad A53. The double-precision FP ops are lost on us (I don't need my math to be perfect through -300 dB), but I do wonder if FP math or int math will be faster on the hardware. Another option is to do a single long FIR, then split the signals with an IIR filter block, which, ostensibly, should need less hardware to crunch.

Cannot help you at the moment about the I2S link, as my original plan is to use a USB DAC (can work from there).
 
Speak for yourself I guess. In my LADSPA plugins (DSP IIR crossover filters done in software) all the calculations are done in double precision (real numbers). I am looking forward to trying the 64-bit processing capability of the Pi 3, since this should speed up that portion of my code.
 
That's fine, it's just wildly excessive: I'll carry the baton of FP32 being far more than good enough; beyond loading down your hardware, what *are* you gaining from FP64 over FP32? With a 24-bit mantissa, especially if you're smart about your coefficients and do volume control at the last second, the accumulated errors are very, very, VERY small. I haven't even checked what the differences between FP32 and FP64 are after dithering to a 24-bit (even if full-scale) PCM stream.

No need for controversy, though, hardware is generally beefy enough to handle it.
 
At low frequencies the additional bits (in the mantissa) are really helping as the numerator and denominator transfer function coefficients become very close to 1.0. But you are correct that, at higher frequencies, the error becomes minimal.

Since there are often a number of filters in series (at least for me, when I am implementing several filters on the same audio bitstream), I want to minimize the accumulated error. For this reason I restrict the transfer functions of each filter to first or second order and use double precision for all TF calculations.

Certainly the audio signal itself is fine using 32 bit floats. It's only with the TF calculations that doubles are useful.
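To put a rough number on the low-frequency point, here is a small sketch. It assumes a second-order low-pass from the RBJ Audio EQ Cookbook at 20 Hz and a 96 kHz sample rate; the corner frequency, sample rate, Q, and function name are my own example choices, not values from the LADSPA plugins. The denominator at DC is 1 + a1 + a2, which is tiny when the corner is far below the sample rate, so rounding a1 and a2 to float32 eats a noticeable fraction of it.

```python
import math
import numpy as np

def lowpass_biquad(fc, fs, q=math.sqrt(0.5)):
    """Normalized (a0 = 1) low-pass biquad from the RBJ Audio EQ Cookbook."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b1 = (1.0 - math.cos(w0)) / a0
    b = (b1 / 2.0, b1, b1 / 2.0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a, w0, a0

fc, fs = 20.0, 96000.0  # assumed example values
b, (a1, a2), w0, a0 = lowpass_biquad(fc, fs)

# Denominator at DC (z = 1) is 1 + a1 + a2; analytically it is 4*sin^2(w0/2)/a0.
exact = 4.0 * math.sin(w0 / 2.0) ** 2 / a0

sum64 = 1.0 + a1 + a2
# Store the coefficients as float32, then sum in double to isolate storage error.
sum32 = 1.0 + float(np.float32(a1)) + float(np.float32(a2))

rel64 = abs(sum64 - exact) / exact
rel32 = abs(sum32 - exact) / exact
print(f"1 + a1 + a2 = {exact:.3e}")
print(f"relative error, float64 coefficients: {rel64:.1e}")
print(f"relative error, float32 coefficients: {rel32:.1e}")
```

With these assumed values, 1 + a1 + a2 is about 1.7e-6, while float32 rounding of a1 and a2 can be off by up to roughly 9e-8 in total, so the error can reach a few percent of the quantity that sets the DC gain. Raising the corner frequency shrinks the effect quickly, which matches the point about higher frequencies.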
 
Hah, I suppose this is a safe point to jump off to a new thread. :) I'm wont to implement my system as a hybrid IIR + FIR (a single, long tap to handle low frequency effects) or FIR throughout, so we have different goals. I would be interested to see what kind of error rate you're seeing in FP32 vs. FP64, though. Sounds like quite a long filter chain.

Edit to add: http://iowahills.com/A8FirIirDifferences.html might be helpful for those confused about our respective concerns.
 
That iowa hills page is getting quite dated.

The beauty of FIR filters, and quite possibly their most important feature, is that they can be implemented with integer math. As you are surely aware, everyone wants small, low power, low cost, portable devices. These devices typically use a processor similar to the Texas Instruments MSP430, or an FPGA, or an ASIC.

These types of processors work great and are as common as dirt, but seldom have a floating point math core.
Most current minicomputers like the Raspberry and Orange Pi, ODROID, etc have an FPU in hardware. There is no "slow" FPU emulation going on, and hasn't been for years!

I find it a bit misleading that their examples, which show how FIR can oh so easily be implemented with only a few times as many taps as an IIR filter has coefficients, always happen to have the corner frequency at something like 0.25*Nyquist. Unless you plan to do all your filtering at 10 or 20 kHz, you might want to look a little more deeply into just how many taps are typically used (generally many more, like thousands). And don't forget to check your FIR filter for deviations from the target... oops, need still more taps!
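A quick way to see the scale of the problem is fred harris' rule of thumb for FIR length, N ≈ (fs / transition width) × (stopband attenuation in dB / 22). That rule is my addition, not something from the Iowa Hills page, and the 60 dB attenuation and the transition widths below are assumed example values. It is only an estimate, but the trend is the point.

```python
import math

def estimate_fir_taps(fs, transition_hz, atten_db=60.0):
    """fred harris' rule of thumb: N ~ (fs / transition width) * (attenuation_dB / 22)."""
    return math.ceil(fs / transition_hz * atten_db / 22.0)

fs = 96000.0
for transition_hz in (12000.0, 1000.0, 100.0, 20.0):
    n = estimate_fir_taps(fs, transition_hz)
    print(f"{transition_hz:8.0f} Hz transition band -> ~{n} taps")
```

A filter with its corner near 0.25*Nyquist can get away with a transition band many kHz wide, hence a few dozen taps. A crossover at a few hundred Hz needs a transition band of 100 Hz or less, and the estimate lands in the thousands.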

I'm not sure why you would want to use FIR for low frequencies where the number of taps is large... and if you are going to use an FIR filter there really isn't a need to add on IIR filtering (or at least I don't see why) because whatever filtering that is done by the IIR filter can just be wrapped into the FIR filter. This is because an FIR filter can be as complex as you would like, so load em up!
 
My first simple thought was close. More than the 64 bits (still with 1 GB of RAM), the interesting news seems to be the better speed. There is also better power management: lower consumption, and maybe some layout problems were involved as well?! I'd bet on exactly the same layout for the 40-pin header (I2S, power, etc.). I also read that the USB power is better. Like you, I was going to buy another board: a Banana Pi, to get the SATA connection, because a USB HDD is not the best way to go with the Pi 2! I wanted a Pi-compatible board rather than the (good) Odroid C1+ because I have the 40-pin/I2S U.FL plug adapter made by IanCanada!
I know nothing about ARM; I hope the platform is close enough that our best music playback software will run on the Pi 3 with few modifications!
 
Charlie--my point was less architecture-specific and more about the computational demands of each filter design and the importance of having a large mantissa for IIR. That said, NEON speedup on integer ops seems to be greater than on FP ops (JPEG is int-heavy), but it's a far cry from emulation.

Second, hybrid FIR/IIR data flow would go:
Input PCM -> Resample to (FP32/96k) -> long-FIR (2ch) -> IIR (6ch) -> Resample to output PCM (TBD)

Where that long FIR will help with DRC/bulk EQ/phase preemption, and the IIR filters can be very lightweight/idealized--get the most bang:buck out of that very expensive FIR step. Or, flip-flop, where low-frequency corrections are done with IIR and everything else is done with a less expensive, shorter-tap FIR. (Will explore all architectures)

Obviously FIR gives you a lot more flexibility, and you'd probably go INT32 instead of FP32 for intermediate calculations.

Ultimately, it's important to look at all the compromises and pick the best working solution (even if that's a simpler, less efficient solution and throw more hardware at the problem) after characterization. Or if you're happy, just roll with what you've got. :D
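To make the hybrid data flow above concrete, here is a toy sketch: one FIR per input channel (2 total), then cheap IIR filters fanning each channel out to three ways (6 outputs). Big caveats: resampling is left out, the one-pole subtractive split is only a stand-in for real crossover filters, NumPy's default doubles are used instead of FP32, and all names, crossover frequencies, and the identity "correction" filter are assumptions for illustration.

```python
import math
import numpy as np

FS = 96000.0  # processing rate from the data flow above

def one_pole_lowpass(x, fc, fs=FS):
    """First-order IIR low-pass: y[n] = y[n-1] + k * (x[n] - y[n-1])."""
    k = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y = np.empty_like(x)
    state = 0.0
    for n, sample in enumerate(x):
        state += k * (sample - state)
        y[n] = state
    return y

def three_way_split(x, f_low, f_high):
    """Subtractive 3-band split, so low + mid + high reconstructs x."""
    low = one_pole_lowpass(x, f_low)
    rest = x - low
    mid = one_pole_lowpass(rest, f_high)
    high = rest - mid
    return low, mid, high

def hybrid_chain(stereo, correction_taps, f_low=300.0, f_high=3000.0):
    """stereo: array (2, n). Returns (6, n): low/mid/high for L, then for R."""
    outputs = []
    for channel in stereo:
        # The expensive step: one long FIR per input channel (2 total).
        corrected = np.convolve(channel, correction_taps)[:len(channel)]
        # The cheap step: IIR filters fan each channel out to three ways.
        outputs.extend(three_way_split(corrected, f_low, f_high))
    return np.array(outputs)

rng = np.random.default_rng(1)
stereo_in = rng.standard_normal((2, 2000))
taps = np.zeros(64)
taps[0] = 1.0  # placeholder "correction": identity FIR
six_ch = hybrid_chain(stereo_in, taps)
print(six_ch.shape)
```

The design point is that the FIR cost is paid twice rather than six times, and the per-way filtering costs only a handful of operations per sample.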
 
OK, I see what you are saying now. IIRC if you can turn on NEON and SSE optimizations there should be some resulting speedup. Can't recall how much that would be, exactly...

Your FIR-for-system-correction + IIR-for-filters (if I am understanding it correctly) is one that I have thought about doing. A few months ago now I created a tool for doing phase-equalization (phase linearization) using FIR, applied above a certain frequency (e.g. 50Hz or whatever). In this approach I was able to set lots and lots of coefficients to zero and still get pretty good results. What stopped me was my inexperience with some basic properties of FIR filters and trying to figure out just how many taps I needed for the sample rate that I was using, etc. I need to go back and revisit that effort and get some implementation assistance.
 
FIR filters can be run through BruteFIR on RPi. Most of the heavy mathematical lifting is done by the FFTW library, which already has optimizations for NEON.

I'm somewhat familiar with BruteFIR. I wasn't referring to how to implement an FIR filter on e.g. a Raspberry Pi but rather how to figure out how to choose the number of taps in my application as part of the design process to suit sampling rate and other parameters, and how to handle the coefficients that are set to zero so that they don't need to be calculated (only a few percent are non zero).
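On the zero-coefficient part, one possible approach (a sketch only, not how BruteFIR or the phase-linearization tool works) is a direct-form FIR that loops over just the non-zero taps. The function name, filter length, and 2% fill ratio below are assumptions chosen to mimic "only a few percent are non zero".

```python
import numpy as np

def sparse_fir(signal, taps):
    """Direct-form FIR that only touches the non-zero taps."""
    nonzero = np.flatnonzero(taps)  # indices (delays) of the taps that matter
    out = np.zeros(len(signal) + len(taps) - 1)
    for delay in nonzero:
        # Each non-zero tap contributes a scaled, delayed copy of the input.
        out[delay:delay + len(signal)] += taps[delay] * signal
    return out

# Assumed test filter: 8192 taps with about 2% of them non-zero.
rng = np.random.default_rng(2)
n_taps = 8192
taps = np.zeros(n_taps)
keep = rng.choice(n_taps, size=n_taps // 50, replace=False)
taps[keep] = rng.standard_normal(len(keep))

x = rng.standard_normal(5000)
max_err = np.max(np.abs(sparse_fir(x, taps) - np.convolve(x, taps)))
print("max difference vs. full convolution:", max_err)
```

The cost here scales with the number of non-zero taps rather than the full filter length. An FFT-based engine gets no such saving from zeros, since the FFT cost depends only on the filter length, so which approach wins depends on how sparse the filter really is.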

I tried to get some help with those topics back when I developed the phase-linearization tool, but I didn't get anywhere, probably because I didn't really know what questions to ask and even what the correct terminology should be in those questions! But I am very interested in the possibility of using FIR for system phase linearization, and I will revisit my previous posts on this. See this thread for more info:
http://www.diyaudio.com/forums/pc-based/281447-group-delay-equalization.html

Send me a PM if you are someone who can help me get up to speed with implementing this using FIR.
 