Fourier Transform Speaker

Earths · 2025-04-27 9:57 pm

MarcelvdG said:
looks somewhat similar to a 19th-century spectrum analyser

I thought you were joking, wow what a device.
Helmholtz resonators except with an acoustic feed rather than tuning forks?

deckar01 · 2025-04-27 10:11 pm

I went down the N-channel crossover rabbit hole a while back and looked into harmonic separation. I decomposed audio into 96 channels of audible chromatic tones, then mixed them into 12 channels of fundamentals and their octave multiples. Digitally the transformation is invertible by just mixing the channels back down, but simulating spatial separation with a virtual sound stage produced lots of distortion, so I shelved it.
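A rough sketch of the grouping described above, assuming the 96 channels are laid out as 8 octaves of 12 semitones in ascending order (that layout, and the function names, are assumptions for illustration, not deckar01's actual code). It folds the 96 channels into 12 pitch-class channels and shows that mixing the 12 groups back down gives the same sum as mixing all 96.

```python
# Sketch: fold 96 chromatic channels into 12 pitch-class channels.
# Assumed layout: index 0 is the lowest note, indices ascend by semitone.

N_CHANNELS = 96
N_CLASSES = 12


def fold_to_pitch_classes(channels):
    """Sum channels that share a pitch class (same semitone, any octave)."""
    n_samples = len(channels[0])
    groups = [[0.0] * n_samples for _ in range(N_CLASSES)]
    for idx, samples in enumerate(channels):
        target = groups[idx % N_CLASSES]
        for t, value in enumerate(samples):
            target[t] += value
    return groups


def mix_down(groups):
    """Sum every channel in the list into a single signal."""
    return [sum(column) for column in zip(*groups)]
```

Because both steps are plain sums, the digital round trip is lossless; the distortion mentioned in the post would come from the spatial simulation, which this sketch does not model.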

I suspect they are using CQT (Constant-Q Transform). They stated 20 channels per speaker and I count 14 speakers (w/sub - stereo), so 280 frequency bins. Even if they are going outside the audible spectrum, they are probably sampling south of 50 cents per bin. They don't say how they are grouping the bins, but the goal seems to be preserving driver inertia. That sounds like expensive (not real-time) computation which might require the input to this rig to be preprocessed on a PC.
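The "south of 50 cents per bin" estimate can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 280 bins are spread evenly on a log-frequency axis, and the band edges (20 Hz to 20 kHz, and a hypothetical wider 16 Hz to 24 kHz) are assumptions rather than anything stated by the designer.

```python
import math


def cents_per_bin(n_bins, f_low, f_high):
    """Width of one bin in cents if n_bins are spaced evenly in log frequency."""
    octaves = math.log2(f_high / f_low)
    return 1200.0 * octaves / n_bins


# 280 bins across the nominal audible range (assumed 20 Hz to 20 kHz).
audible = cents_per_bin(280, 20.0, 20000.0)

# Same bin count across a hypothetical slightly wider range.
wider = cents_per_bin(280, 16.0, 24000.0)
```

Both cases come out a little under 50 cents, i.e. finer than a quarter tone per bin.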

njswede · 2025-04-27 10:19 pm

lcsaszar said:
(F)FT works for time-invariant signals, as far as I know. It does not work for music. I might be wrong.

The continuous Fourier Transform is defined over infinite time, so in effect it assumes a steady-state signal that goes on forever. In practice, however, we always do discrete Fourier Transforms over finite time (because infinite time would take too long 😀), which means that you can chop up a long signal (like an entire song) into frames and do an FT on each one of them. That’s what compression formats like MP3 do (although I think they use a closely related algorithm, the modified discrete cosine transform). So in a way, you can do a discrete FT over a time-variant signal.
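A minimal sketch of the frame-by-frame idea, using a naive DFT written out by hand so it needs no libraries. The sample rate, frame length and test tones are arbitrary choices for illustration; the signal changes pitch halfway through, and each frame's spectrum picks up whichever tone is present at that moment.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def dominant_bin_per_frame(signal, frame_len):
    """Chop the signal into frames and return the strongest bin (0..N/2) of each."""
    peaks = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        half = dft(signal[start:start + frame_len])[: frame_len // 2 + 1]
        peaks.append(max(range(len(half)), key=lambda k: abs(half[k])))
    return peaks


FS = 8000      # assumed sample rate in Hz
FRAME = 64     # frame length, giving 125 Hz bin spacing


def tone(freq, n_samples):
    return [math.sin(2 * math.pi * freq * t / FS) for t in range(n_samples)]


# 500 Hz for two frames, then 1500 Hz for two frames: a time-variant signal.
signal = tone(500, 2 * FRAME) + tone(1500, 2 * FRAME)
peak_freqs = [k * FS / FRAME for k in dominant_bin_per_frame(signal, FRAME)]
```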

MarcelvdG · 2025-04-27 10:20 pm

Earths said:
I thought you were joking, wow what a device.
Helmholtz resonators except with an acoustic feed rather than tuning forks?

I've often seen the one at Teylers Museum here, but never in use.

njswede · 2025-04-27 10:24 pm

So it’s basically just a lot of speakers and a really gnarly bank of digital crossovers? I saw some mumbo jumbo about directing each tone to a separate speaker, but I’m sure that’s just a fancy way of talking about an N-way crossover.

weltersys · 2025-04-28 12:18 am

CharlieM said:
If I'm understanding the concept correctly; a typical digital audio setup recreates a music signal as a mixture of pure sine waves that mix and interact via the Fourier transform principle, and the mixing occurs within a digital processor, upstream of the power amp(s) and speakers.

The Fourier transform takes a signal in the time domain and transforms it into the frequency domain. This transformation reveals the different frequencies present in the signal and their respective amplitudes and phases.
In typical digital signal processing, the Fourier transform is used to analyze and manipulate signals based on their frequency content. The signals are split by frequency content upstream of the power amp(s) and speakers. Using digital IIR filters, splitting the frequency ranges results in an alteration of the phase of each filter's pass band.
The more pass bands and the higher their slope (more "poles") the more phase shift from low to high frequencies.

CharlieM said:
In contrast; in this guy's setup the mixing of the component sine waves occurs at the speaker itself. That is; driving each speaker are twenty separate signal processors feeding pure component sine waves into twenty amp channels-- discretely driving twenty drivers in each speaker. As such, the Fourier transform occurs at the speaker itself, as the separate driver outputs blend together in front of the speaker.

The concept is no different than a typical digital audio setup other than the number of pass bands, and the use of FIR filters.
FIR filters require more processing power than IIR, but can use the inverse Fourier transform to maintain a flat phase response through as many pass bands as desired.
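As a small illustration of why FIR filters can keep the phase flat: a FIR filter with symmetric taps delays every frequency by the same number of samples, so once that fixed delay is removed the response is purely real. The windowed-sinc design, tap count and cutoff below are arbitrary assumptions, not the filters used in the speaker.

```python
import cmath
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Symmetric low-pass FIR taps; cutoff is a fraction of the sample rate (0 to 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hann)
    return taps


def response(taps, freq):
    """Complex frequency response at freq (as a fraction of the sample rate)."""
    return sum(h * cmath.exp(-2j * math.pi * freq * n) for n, h in enumerate(taps))


taps = windowed_sinc_lowpass(31, 0.1)
delay = (len(taps) - 1) / 2  # constant delay in samples for symmetric taps

# After compensating the constant delay, the imaginary part should vanish,
# which is what "linear phase" means in practice.
residual_imag = [
    abs((response(taps, f) * cmath.exp(2j * math.pi * f * delay)).imag)
    for f in (0.01, 0.03, 0.05, 0.08)
]
```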
When the outputs of the multiple pass bands (20 per tower!) are combined through individual loudspeakers widely separated in their physical location, the wide-bandwidth flat phase response can only be preserved in one specific location, the "center listening chair". The designer points out that the waveform there is similar to the original, or to what a pair of decent headphones would give, in this section of the video:

[Attached image: Linear phase at our center listening chair.png]

CharlieM said:
That's a lot of digital processors and amp channels so even if this guy succeeds in taming all the gremlins, it's a complicated and expensive setup.

When tuned for precise phase alignment at the "center listening chair", that alignment will be wrong everywhere else.
That "gremlin" is inherent in the design.
That was obvious in the video from the next day (posted in #6), presumably after tuning.
Rather than "greater impact and detail of the bass from the speakers", the recording from off center sounded terribly wrong.
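The geometry behind this can be sketched with simple path-length arithmetic. Everything numeric here is hypothetical (driver heights, seat positions, the 2 kHz test frequency, a 343 m/s speed of sound); the point is only that a delay tuned for one seat leaves a phase error at any other seat.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def arrival_offset(src_a, src_b, listener):
    """Seconds by which sound from src_b arrives later than sound from src_a."""
    return (math.dist(src_b, listener) - math.dist(src_a, listener)) / SPEED_OF_SOUND


def residual_phase_deg(src_a, src_b, tuned_pos, listener, freq_hz):
    """Phase error at freq_hz that remains after delays were set for tuned_pos."""
    residual = arrival_offset(src_a, src_b, listener) - arrival_offset(src_a, src_b, tuned_pos)
    return 360.0 * freq_hz * residual


# Hypothetical positions in metres: (sideways, towards listener, height).
low_driver = (0.0, 0.0, 0.3)
high_driver = (0.0, 0.0, 1.5)
center_chair = (0.0, 3.0, 1.0)
off_center = (1.5, 1.5, 1.0)

at_chair = residual_phase_deg(low_driver, high_driver, center_chair, center_chair, 2000.0)
elsewhere = residual_phase_deg(low_driver, high_driver, center_chair, off_center, 2000.0)
```

With these made-up numbers the error is zero at the tuned seat and tens of degrees at the other one, and it grows with frequency and with driver separation.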

It's definitely a complicated and expensive setup for one chair.

Mark'51 said:
There probably would be less intermodulation distortion. But imaging? That's a hard sell.

One certainly does not need 40 drivers and amp channels to get IM or AM distortion down below audibility at typical home listening levels, while imaging detail degrades roughly in proportion to the separation distance between the individual point sources.

Art

njswede · 2025-04-28 12:34 am

CharlieM said:
If I'm understanding the concept correctly; a typical digital audio setup recreates a music signal as a mixture of pure sine waves that mix and interact via the Fourier transform principle, and the mixing occurs within a digital processor, upstream of the power amp(s) and speakers.

Nothing is mixed. The principle of digital audio is incredibly simple: It’s just a stream of numbers that each represent a voltage that the DAC should output at fixed time intervals. A filter then “connects the dots” between these voltage samples and, for all intents and purposes, exactly recreates the original analog signal up to some maximum frequency (usually in the 20s of kilohertz for consumer gear). Everything is done in the time domain. No FFT or mixing is needed.
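The "connect the dots" step can be sketched with textbook sinc (Whittaker-Shannon) interpolation, done entirely in the time domain. This is an idealised illustration rather than what a real DAC's reconstruction filter does internally; the 1 kHz tone, 48 kHz rate and 1000-sample window are arbitrary assumptions, and the finite window leaves a small truncation error.

```python
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, t_in_samples):
    """Estimate the signal value at a (possibly fractional) time, in sample periods."""
    return sum(s * sinc(t_in_samples - n) for n, s in enumerate(samples))


FS = 48000.0
TONE = 1000.0
samples = [math.sin(2 * math.pi * TONE * n / FS) for n in range(1000)]

# Evaluate halfway between two stored samples, in the middle of the window.
t = 500.5
estimate = reconstruct(samples, t)
exact = math.sin(2 * math.pi * TONE * t / FS)
error = abs(estimate - exact)
```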

MarkRehorst · 2025-04-28 3:41 am

The way I interpret what is being done is they run an FFT on the input signal and break it into harmonics each of which occurs at a specific level depending on the input signal. Each driver operates over its optimum frequency range, and each has a sine wave generator attached. At one instant you might have a fundamental at 400 Hz with a 2nd harmonic at -20 dB, a 3rd at -40 dB, a 4th at -15, etc. With 20 drivers/signal generators you can have the fundamental and up to 19 harmonics (but only for a low frequency square wave input). In that instant you turn on sine generators at the appropriate levels, attached to the drivers for which those frequencies are best reproduced. For example, for a 30 Hz fundamental, you send the fundamental to the biggest bass driver, and harmonics are produced by smaller drivers higher up in the speaker.

It isn't just a multiway crossover where you filter the input signal and send pieces to different drivers. Each driver is producing a sine wave. The sum of the sines add up to the original input signal.
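A sketch of this interpretation, using the example levels from the post (400 Hz fundamental, 2nd harmonic at -20 dB, 3rd at -40 dB, 4th at -15 dB). The zero starting phases, the 48 kHz sample rate and the function names are assumptions; it only illustrates the per-driver sine idea and is not a description of the actual product.

```python
import math


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


# Harmonic number -> level in dB relative to the fundamental (from the post).
HARMONIC_LEVELS_DB = {1: 0.0, 2: -20.0, 3: -40.0, 4: -15.0}
FUNDAMENTAL_HZ = 400.0
FS = 48000.0  # assumed sample rate


def driver_signals(n_samples):
    """One pure sine per harmonic, as if each one fed its own driver."""
    return {
        h: [db_to_gain(db) * math.sin(2 * math.pi * h * FUNDAMENTAL_HZ * t / FS)
            for t in range(n_samples)]
        for h, db in HARMONIC_LEVELS_DB.items()
    }


def acoustic_sum(signals):
    """Idealised sum of all driver outputs at the listener."""
    return [sum(column) for column in zip(*signals.values())]


signals = driver_signals(480)
mixed = acoustic_sum(signals)
```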

mvs0 · 2025-04-28 8:03 am

It's not a Fourier transform speaker!
If it were, and let's say we use a 32-point transform at 48000 Hz:
This would result in the following bands:
1500Hz
3000Hz
4500Hz
6000Hz
7500Hz
9000Hz
10500Hz
12000Hz
13500Hz
15000Hz
16500Hz
18000Hz
19500Hz
21000Hz
22500Hz

Do you see this mapping to the speakers on the pictures?
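For reference, bin k of an N-point DFT sits at k times the sample rate divided by N, so the band centres follow directly from the transform length. A short sketch (the function name is just for illustration):

```python
def bin_frequencies(n_points, sample_rate):
    """Centre frequencies in Hz of DFT bins 1..N/2 (bin 0, DC, is left out)."""
    return [k * sample_rate / n_points for k in range(1, n_points // 2 + 1)]


# A 32-point transform at 48 kHz gives bins every 1500 Hz up to Nyquist.
bands = bin_frequencies(32, 48000)
```

The spacing is linear in frequency, which is quite unlike the roughly logarithmic spacing one would expect from a stack of drivers of graded size.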

havun · 2025-04-28 9:04 am

weltersys said:
, while imaging detail degrades roughly in proportion to the separation distance between the individual point sources.

Art

Stereo format works only in the horizontal plane, and the speakers (sound sources) are stacked vertically, so how would this affect the detail of images in the stereo picture?
If the speakers were placed horizontally, it would be as you write. Or am I wrong?

ianbo · 2025-04-28 10:23 am

njswede said:
The principle of digital audio is incredibly simple: It’s just a stream of numbers that each represent a voltage that the DAC should output at fixed time intervals.

Yes. I wonder if @CharlieM was thinking of DSP in his original post, rather than digital audio as such?

The primary idea behind these speakers, going by the video in post #1, seems to be that real world speaker drivers have a hard time reproducing complex signals, because the inertia of the cone (or other radiating surface) makes it hard for them to change direction quickly enough. By asking a driver to reproduce just a simple sine wave, you make its job much easier.

Is this true, though? We have an established way of assessing how good a transducer is at changing direction at different speeds - frequency response measurement. To reproduce a high frequency, a cone must change direction quickly; to reproduce a low frequency it must change direction slowly. A frequency response measurement gives you an idea of the bandwidth over which any particular transducer can operate successfully. Most can manage a few octaves with decent linearity, especially when equalised (passively or actively).

So I'm sceptical about the premise. Also, I don’t really understand the decision to stack lots of unbaffled drivers vertically. Was that just to get a lot of drivers as close together as possible, or is there some other thinking involved? And, truth be told, I'm rather horrified by the complexity that this system seems to involve.

But it's good that there are people around who are willing to think for themselves and try out unconventional approaches. It would be a boring world if there was only one way to build a good speaker.

njswede · 2025-04-28 12:02 pm

mvs0 said:
It's not a Fourier transform speaker!
If it were, and let's say we use a 32-point transform at 48000 Hz:
This would result in the following bands:
1500Hz
3000Hz
4500Hz
6000Hz
7500Hz
9000Hz
10500Hz
12000Hz
13500Hz
15000Hz
16500Hz
18000Hz
19500Hz
21000Hz
22500Hz

Do you see this mapping to the speakers on the pictures?

It's also a bit suspect that there are 13 speakers. I would have gone with 16, since it's a power of 2.

ianbo · 2025-04-28 12:29 pm

MarkRehorst said:
The way I interpret what is being done is they run an FFT on the input signal and break it into harmonics each of which occurs at a specific level depending on the input signal. Each driver operates over its optimum frequency range, and each has a sine wave generator attached. ...

It isn't just a multiway crossover where you filter the input signal and send pieces to different drivers. Each driver is producing a sine wave. The sum of the sines add up to the original input signal.

There's no way this would work, and give anything approaching fidelity. The complexity of real-world music signals goes far beyond what such a system could reproduce.

If the system worked as you suggest, it would be acting as something like a primitive polyphonic synthesiser. A modern digital keyboard has orders of magnitude more capability than that, yet is still nowhere near what you'd need to synthetically recreate the world of recorded music out there.

njswede · 2025-04-28 12:32 pm

I deleted my previous post, because it wasn't fully accurate. OK, it was plain wrong. 🙂 Thinking about it, I believe you COULD actually do this using Fourier transforms, more specifically Short Time Fourier Transforms (STFT). Since the Fourier transform is a reversible function, you can always get the signal back by doing an Inverse Fourier Transform, which is more or less just adding the spectral components together. It doesn't matter how short the FT is, you will always get the original back, albeit a very short snippet of it. You could take a 16-point FFT of a 48kHz signal and get (besides DC) 3k, 6k, 9k, 12k, 15k, 18k, 21k, 24k, route each spectral component to its own speaker and let the ears do the Inverse Fourier Transform by adding them together. I have no idea how that would work in practice. Probably not very well, if I were to guess.

But wait! How can we reproduce, say, a 100Hz tone when our lowest frequency is 3kHz? Well, the result of each Fourier transform is an amplitude and a phase for each tone. And that varies over time, so the lower notes are actually produced by modulating the higher frequencies. So at the end of the day, the lower frequencies are still present across all the drivers. So I'm not sure what that achieves.
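A sketch of this idea with a naive DFT, assuming 16-sample frames at 48 kHz (3 kHz bin spacing). Each frame is split into one real signal per bin, as if each were routed to its own speaker, and the parts are summed again. Note that a DFT also has a bin 0 (DC), which the frequency list above leaves out; for a 100 Hz tone, part of the signal shows up there as a level that changes slowly from frame to frame.

```python
import cmath
import math


def dft(frame):
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def bin_components(frame):
    """Split a real, even-length frame into one real signal per bin 0..N/2."""
    n = len(frame)
    spectrum = dft(frame)
    components = []
    for k in range(n // 2 + 1):
        # Bins k and N-k are complex conjugates for real input, so they pair up.
        weight = 1.0 if k in (0, n // 2) else 2.0
        components.append([
            weight * (spectrum[k] * cmath.exp(2j * math.pi * k * t / n)).real / n
            for t in range(n)
        ])
    return components


FS = 48000
N = 16
tone = [math.sin(2 * math.pi * 100 * t / FS) for t in range(480)]  # one 100 Hz cycle
frames = [tone[i:i + N] for i in range(0, len(tone), N)]

worst_error = 0.0
dc_levels = []
for frame in frames:
    parts = bin_components(frame)
    rebuilt = [sum(column) for column in zip(*parts)]
    worst_error = max(worst_error, max(abs(a - b) for a, b in zip(frame, rebuilt)))
    dc_levels.append(parts[0][0])  # the DC part is constant within a frame
```

Summing the parts rebuilds each frame to within rounding error, and the DC level swings positive and negative over the cycle, which is one way to see that the low-frequency content ends up spread across the channels instead of living in a dedicated one.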

Also, this is not "breaking the signal into harmonics" as someone suggested. Harmonics are features of a single tone, not a piece of music. Instead, we're breaking up the signal into fixed Fourier components that have nothing to do with the harmonics of the signal.

mvs0 · 2025-04-28 12:37 pm

ianbo said:
Yes. I wonder if @CharlieM was thinking of DSP in his original post, rather than digital audio as such?

The primary idea behind these speakers, going by the video in post #1, seems to be that real world speaker drivers have a hard time reproducing complex signals, because the inertia of the cone (or other radiating surface) makes it hard for them to change direction quickly enough. By asking a driver to reproduce just a simple sine wave, you make its job much easier.

Is this true, though? We have an established way of assessing how good a transducer is at changing direction at different speeds - frequency response measurement. To reproduce a high frequency, a cone must change direction quickly; to reproduce a low frequency it must change direction slowly. A frequency response measurement gives you an idea of the bandwidth over which any particular transducer can operate successfully. Most can manage a few octaves with decent linearity, especially when equalised (passively or actively).

So I'm sceptical about the premise. Also, I don’t really understand the decision to stack lots of unbaffled drivers vertically. Was that just to get a lot of drivers as close together as possible, or is there some other thinking involved? And, truth be told, I'm rather horrified by the complexity that this system seems to involve.

But it's good that there are people around who are willing to think for themselves and try out unconventional approaches. It would be a boring world if there was only one way to build a good speaker.

That is also the outcome for any multi-way speaker. That's just what this is.
They could have called it a true Fourier speaker if they had made it 1024 or so.

ianbo · 2025-04-28 12:49 pm

mvs0 said:
They could have called it a true Fourier speaker if they had made it 1024 or so.

Well, maybe not true, but closer. (Modern digital keyboards use samples of real instruments, plus something like 128 note polyphony, just to attempt to mimic a single instrument.)

njswede · 2025-04-28 12:54 pm

The problem with my short FFT approach above is obviously that the lowest frequency speaker has to be able to reproduce 3kHz, which more or less makes it a tweeter. And that totally defeats the purpose.

I still think it’s just a bunch of filters that he gave a fancy name…

ianbo · 2025-04-28 1:24 pm

njswede said:
I still think it’s just a bunch of filters that he gave a fancy name

Yes, I'm sure you're right. Presumably 20 way (there are 13 drivers in the vertical stack, plus 5 planar/AMT drivers on the frame, plus 2 woofers?) and presumably with linear phase filters.

The whole vertical stack of unbaffled drivers thing is a puzzle, though. They will be quasi-omni, horizontally, and with much-reduced direct sound at the listening position (from the stack of 13 anyway), compared to indirect sound. If they do have a distinctive sound, it'll have much more to do with this than the 20-way crossover, I reckon.

njswede · 2025-04-28 4:04 pm

I don’t know much about physical speaker design, but directing them away from the listener seems… interesting…

ianbo · 2025-04-28 4:36 pm

I watched the video in post #6 where the Audiophile Junkie guy listens to them. It's interesting how he reacted.

His first reaction was to pull his seat a lot closer. That's not a surprising response, since these speakers will have a much higher ratio of indirect to direct sound compared to most speakers.

The second thing he picked up, after he did that, was a leanness in the upper bass/lower mids - he suggests 100-250Hz. If each driver is covering approx. half an octave and they start from the floor with the lowest octave, then his ears would have been pretty much level with the drivers producing exactly those frequencies, and that's where the dipole null would be strongest. Listening further back might minimise this problem.
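The half-octave guess can be turned into a quick calculation. The 20 Hz starting frequency is purely an assumption (the thread does not give the real band edges), and so is the idea that bands map one-to-one onto drivers counted up from the floor.

```python
import math


def half_octave_band(freq_hz, lowest_hz, bands_per_octave=2):
    """Zero-based index of the band containing freq_hz, counting up from lowest_hz."""
    return math.floor(bands_per_octave * math.log2(freq_hz / lowest_hz))


LOWEST_HZ = 20.0  # assumed starting frequency, not a published figure

first = half_octave_band(100.0, LOWEST_HZ)
last = half_octave_band(250.0, LOWEST_HZ)
```

Under those assumptions 100-250 Hz lands on bands 4 to 7 (counting from zero), i.e. roughly the fifth to eighth drivers from the floor. Whether that is ear height depends on the driver spacing, which is not known here.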
