nVidia CUDA GPU computing PC FIR

Please see this first:
http://koonlab.com/CUDA_RealFIR/CUDA Real FIR.html

A "real" FIR crossover for 4-way requires huge computing power.
So I chose nVidia's "CUDA" graphics-card computing.

With CUDA, I think a fanless PC can do 88.2 kHz/24-bit, 8192-tap, 4-way FIR channel dividing in real time (with a fanless video card, 8600GT, and a fanless CPU, Athlon BE).

The following knowledge is required to reproduce CUDA FIR on your PC:
(1) FIR calculation, the mathematics
(2) C programming
(3) an nVidia graphics card, 8000 series or newer
(4) nVidia CUDA SDK programming
(5) a CUDA-compatible OS and driver

CUDA information is here:
CUDA site http://www.nvidia.com/object/cuda_home.html
(please don't ask me about basic CUDA/C/FIR math...)

I posted the current source and a converted wav file to my site.
If your player can open/play the extended wav format, please let me know.

Attached is an Audacity screen capture showing the resulting wav file. If I had an 8-channel sound device, maybe I could play it now.


  • 4way_converted.png


Paid Member
2002-09-25 11:01 am

If you don't need a high-power CPU, the card can happily sit alongside a BE or an E2160 topped with a fanless Thermalright Ultra-120 heatsink (the 2160 runs very cool at default clock and is a little more efficient than the Athlon), in a silent case with a fanless power supply. Maybe just one system fan for everything. The card's core will get to about 100 degrees C, but it can take it.

The ECS 9600GT is fanless and has more shader power (3x, IIRC) than the 8600GT, and it is just $150 from Newegg.com.

I just found another reason to buy one, but are binaries possible for the programming-illiterate like me?
I already got
and two ARCTIC ACCELS1 Rev 2 coolers (they looked low in stock).
I'm thinking of using two 8600GTs or two lower-clocked 9600GTs, in any case not at full power, to stay below 80 degrees C while still getting enough computing power. Only in the worst case will I need an automatic fan (normally stopped).

About the binary: my .exe requires the CUDA libraries, so anyone would need to install the CUDA driver and the CUDA SDK. I'm working on XP 64.
I wonder whether a CUDA application can be packaged as an installer.


Paid Member
2002-09-25 11:01 am
Hope the two cards work out well for you. SLI does have a few issues with latency as I understand it, which creates overhead at low system loads. This is why it is usually slower (!) than a single card at resolutions below, say, 1600x1200.

But that is gaming; I don't know if it is similar for GPGPU applications such as yours. I think you're one of the few people using CUDA publicly so far, so congratulations.

On another note, the Accel S2 is what ECS uses on its passive 9600GT. With a 'turbo' module, it is enough for an 8800GT. The turbo module is two small fans that strap onto the cooler. And the Scythe looks like a monster, total overkill, I love it :)

I'm on XP32, so your application a) will not work for me or 99% of consumers, but b) is the right approach, as 64-bit is much more efficient at this sort of thing anyway. I'm kind of hoping it'll work with only the driver binaries and that I won't need the SDK (I doubt it, though, as CUDA is currently a developer-oriented offering because of the lack of consumer GPGPU applications). I still need a graphics card, though; I'm assuming the CUDA driver means I can't use the card for its other purpose. I have a dedicated music machine, so switching to 64-bit is only a matter of buying an OS.

I'm watching the space closely, so good luck! I'll download the 27 MB file in a couple of days (my connection is running slow) and give it a whirl on my 1212m. I may not be able to hear all of it, as I don't have anything to decode the ADAT outputs, but I'll be able to switch between channels and hear them one at a time.
Yes, CUDA does not use SLI.
I'm now working with an 8800GTS and an 8400GS, and I can specify which card number to run on. (This is my main, powerful PC, so I will build a separate fanless PC.)

With the CUDA driver installed, I can still run some OpenGL 2.0 and DirectX 9 applications, so it's not a functionally limited driver.

I posted the test results of the Xylo-L board from www.fpga4fun.com to my site.
This board is very easy to use and enough for my application, but not extremely fast.
Hi, I have now nearly finished the hardware, firmware and PC software, and I'm listening to true 8192-tap FIR-filtered 4-way sound :spin:

I need to hold the whole (extended) wav data in memory, and I have no time to work out how to tell foobar how many minutes:seconds have been played. The current source file is posted on my site, so if you can try to implement it as a foobar DLL, I can look at your result.


  • p1010322diy.jpg
>Maiky: sorry, I have now made my own GUI to control the complicated command-line programs.
It is a small program in VB2005, about 1200 lines. I'm not sure about the copyright of this program.
The attachment shows the main GUI and the calculated frequency response.

I can drag and drop folders or single files onto the play list or the conversion box, then convert / play / stop / prev / next. Enough so far :)


  • wavexgui.png
nVidia contacted me and told me which DLLs can (and must) be included.
So I posted a zip binary package to my site.

The package contains:
FIR Parameter generator
GPU FIR Converter, 44.1/16 stereo wav to 8ch extended
Source codes
sample BAT file

Requirements
CUDA-capable GPU (8x00, 9x00) and driver
Microsoft Visual C++ 2005 redistributable runtime (many people should already have it; otherwise search Microsoft's site and download it)

Useful tools
Exact Audio Copy: to rip CDs to wav without LIST/info chunks
Audacity: to view the converted wav

How to play
easy: 7.1-channel audio device, and Windows Media Player (9 or later) / foobar
great: foobar / ASIO / Lynx AES16 board

How to recompile
Visual C++ 2005 (or you can try editing and recompiling on Linux)
nVidia CUDA toolkit

I hope someone can find bugs in the source code :)
This may be the only way to get true FIR for an acceptable amount of money :)

peufeu said:
8.5 G tap calculations/sec? Nice.
What format are you using for the FIR? 64-bit float?
8.5 GTaps/s is just the "minimum required". The GPU can do far more than that.

nVidia GPUs accept 32-bit float; I estimated earlier that it is enough for 16-bit output (but insufficient for full 24-bit accuracy).
The next generation of ATI and nVidia GPU computing will support double precision. It has a 52-bit mantissa, so it will be accurate enough for 24-bit audio.
But there will be a trade-off against performance (maybe no problem in practice? I hope).

In my opinion, 16-bit vs. 24-bit is a below -100 dB issue. More important is that true FIR, with linear phase, a flat synthesized response and sharp -80 dB filtering within 1/4 octave, is now possible.
And 88.2 kHz (or 176.4 kHz) up-sampling will improve the high frequencies.

I recommend looking into extended wav format playback and sending the data to your DAC; it's convenient.


2008-10-08 1:32 pm
I think I'll use something like this when I get my new hi-fi system (in no less than 5 years /cry...).
Do you think by that time the video card will be able to interface with the audio card? That is, I would use the video card as a crossover but connect the amplifier directly to my audio card.