Experience of using AI programmes for loudspeaker design

I know very little at the moment about the ability of AI programmes to generate sensible or good data for typical 2- to 4-way loudspeaker design, hence my rambling questions below.

The candidate AI programme I am asking about here is Claude.ai. Is the free version enough, or do I need to go to the subscription version?

Could somebody kindly provide a brief overview of what the free version would allow me to achieve? Is it viable for loudspeaker design?

Forum member mechano23, using XMachina, has already shown that you can make a well-received design if the correct parameters are applied. I definitely do not want to spend three times as much on crossover components as on the £80 driver I wanted to use. Is cost one of the constraints or parameters that can be entered?
Do these programmes accept FRD and ZMA file data? Do they need real measured data, or can they work from manufacturers' data sheets and apply diffraction loss and other cabinet effects once the cabinet data is entered?

I am also not sure how long these things take to produce an output: is it microseconds, or does it spend the course of an evening whizzing around searchable data on the internet before it provides an answer? As a preference I like graphical displays in the time and frequency domains, as they give me an indication of what is going on.

In terms of model training for a loudspeaker design, are you able to describe the parameters that would let it measure a design against a predefined standard or target, say IEC 60268-21, the Harman preferred curve, BBC, a Stereophile reviewer's listening room, or audioXpress recommendations?

If that is not part of the AI and it is up to the designer to provide parameters, maybe we should start listing what parameters we think are relevant.
Does somebody want to share their design and parameters to enlighten me? How would it handle a remake of a Spica TC50, using Purifi drivers for one example and currently available Peerless drivers for the other?

Finally I assume as the models learn over the course of time one user in Shanghai will end up with exactly the same Xover design for a certain set of drivers and optimum box size, as somebody in Jakarta, if they use exactly the same choice of drivers?

Any help, thoughts or guidance much appreciated.
 
I think current large language model AI will not be able to properly design a speaker because it pulls on existing original creative content to predict and extrapolate the next step. In something totally new requiring spontaneous synthesis, it may have a problem as all it does is copy and extrapolate. It may get lucky. I have tried with ChatGPT and gotten rehashed designs pulled from the internet and tweaked by an AI “amateur.”

But I don’t think you can give it an FRD and ZMA file, baffle shape and dimensions and ask it to make a crossover design. Not yet anyhow. Maybe when someone specifically applies foundation models (FMs) to all the existing published designs, then an LLM based on the audio-specific trained FM can make a nice design.

As more AI generated content is uploaded to the internet, and that content is used to train new AI models, the resulting models will get worse and worse due to poisoning of the original creative content. Probably within a few years we will notice the AI generated stuff is going to not be as good as what we see now. Unless they take measures to prevent this self poisoning of the data.
 
Don't be deceived. ChatGPT and the like are very good at "understanding" human speech and composing a slick answer. Actually the talk is so good you would believe it.

I have yet to see a single technical question answered in a way that contained new knowledge, insights or anything I could not find on Google. I could also post a long list of questions I asked ChatGPT or POE which were answered in perfect grammar but contained inaccurate or plain wrong answers.
 
Here is an example prompt for ChatGPT 4.0:

Design me a new 2 way bookshelf speaker with a 5.25in woofer and 1in dome tweeter with a symmetric crossover at 1.6kHz.

Make the cabinet 7in wide x 12in high by 8in deep sealed design.

The woofer should be poly cone and tweeter silk dome.

Sensitivity of woofer is 86dB at 2.83vrms and tweeter is 92dB at 2.83vrms.

Response by ChatGPT:

Here’s a detailed design outline for your two-way bookshelf speaker:

Specifications

1. Cabinet Dimensions:

• 7 inches wide x 12 inches high x 8 inches deep (external dimensions).

• Material: MDF (medium-density fiberboard), 3/4-inch thick for rigidity.

2. Drivers:

• Woofer: 5.25-inch poly cone, sensitivity 86 dB at 2.83 Vrms.

• Tweeter: 1-inch silk dome, sensitivity 92 dB at 2.83 Vrms.

3. Crossover Frequency: 1.6 kHz, symmetric design.

4. Enclosure:

• Type: Sealed.

• Internal Volume: Approximately 4.5 liters (accounting for material thickness).

5. Crossover Type:

• Second-order Linkwitz-Riley (LR2) for both woofer and tweeter, providing a 12 dB/octave slope.

• Includes an L-pad to match tweeter sensitivity to woofer.

Crossover Design

Components:

1. Woofer Section:

• Low-pass filter: Second-order LR2.

• Inductor: L_{\text{low}} = 1.25 \, \text{mH}.

• Capacitor: C_{\text{low}} = 12.5 \, \mu\text{F}.

2. Tweeter Section:

• High-pass filter: Second-order LR2.

• Capacitor: C_{\text{high}} = 12.5 \, \mu\text{F}.

• Inductor: L_{\text{high}} = 1.25 \, \text{mH}.

3. L-Pad for Tweeter:

• Resistor R_1 = 3.9 \, \Omega.

• Resistor R_2 = 6.8 \, \Omega.

• This reduces the tweeter sensitivity by approximately 6 dB to match the woofer.

Cabinet Design

Internal Features:

1. Bracing: Use a vertical brace from front to back to reduce panel vibrations.

2. Damping: Line the interior walls with acoustic foam or polyfill to minimize internal reflections.

Driver Placement:

1. Tweeter: Centered horizontally, placed 2 inches below the top.

2. Woofer: Centered horizontally, 3.5 inches below the tweeter.

Baffle Design:

• Chamfer or round the baffle edges to reduce diffraction effects.

Sealing:

• Ensure airtight seals at all driver mounting points and joints to maintain the integrity of the sealed design.

Performance Predictions

1. Sensitivity: ~86 dB (after matching with the L-pad).

2. Frequency Response:

• Approx. 60 Hz to 20 kHz (-3 dB points).

3. Impedance: Nominal 8 ohms.

Assembly Notes

• Use high-quality binding posts for speaker terminals.

• Ensure precise crossover assembly on a circuit board to minimize interference.

• Verify phase alignment between drivers at the crossover point (reverse polarity testing is recommended).

Here’s the Bill of Materials (BOM) and sourcing recommendations for your bookshelf speaker. The components include the drivers, crossover parts, and additional hardware.

Bill of Materials (BOM)

1. Drivers

1. Woofer:


• Model: Dayton Audio DC130B-8 (5.25” poly cone woofer, 8 ohms, 86 dB).

• Cost: ~$25 each.

• Source: Parts Express.

2. Tweeter:

• Model: Dayton Audio DC28F-8 (1” silk dome tweeter, 8 ohms, 92 dB).

• Cost: ~$30 each.

• Source: Parts Express.

2. Crossover Components

1. Woofer Low-Pass Filter:

• Inductor: 1.25 mH air-core inductor (~0.2 ohm DCR).


• Model: Jantzen Audio Air Core Coil 1.25 mH, 20 AWG.

• Cost: ~$9 each.

• Capacitor: 12.5 µF polypropylene film.

• Model: Dayton Audio PMPC-12.5.

• Cost: ~$7 each.


2. Tweeter High-Pass Filter:

• Capacitor: 12.5 µF polypropylene film (same as above).

• Inductor: 1.25 mH air-core inductor (same as above).

3. Tweeter L-Pad Resistors:

• Resistor 1: 3.9 ohms, 10W wire-wound.

• Cost: ~$2 each.

• Resistor 2: 6.8 ohms, 10W wire-wound.

• Cost: ~$2 each.

3. Cabinet Materials

1. MDF Sheets:

• Material: 3/4” MDF.

• Cost: ~$30 for a 2’x4’ sheet (one sheet is sufficient for two cabinets).

• Source: Local hardware store (Home Depot, Lowe’s).

2. Damping Material:

• Type: Polyester fiberfill or acoustic foam.

• Cost: ~$10.

3. Bracing:

• Use MDF offcuts for bracing.

4. Driver Mounting Screws:

• #8 pan-head wood screws (~$5 for a box).

5. Binding Posts:

• Model: Dayton Audio BPA-38G Gold Binding Posts.

• Cost: ~$8/pair.

6. Gasket Tape:

• Type: Foam speaker gasket tape.

• Cost: ~$6.

Total Estimated Cost

• Per speaker: ~$90 to $100.

• Per pair: ~$180 to $200 (including tools and additional hardware).

Recommended Sources

1. Parts Express:

• Great for drivers, crossover components, and speaker accessories.

2. Madisound Speaker Components:

• Reliable for high-quality crossover components and DIY kits.

3. Local Hardware Stores:

• Ideal for MDF sheets, screws, and basic tools.

Optional Tools

• Soldering Iron: Required for crossover assembly (~$20).

• Jigsaw or Router: For cutting driver holes (~$50).

• Clamps: For gluing the cabinet (~$15).

• Multimeter: To verify crossover assembly (~$15).

1732733938786.png


That has to be the most basic, odd schematic I have ever seen, but it is good enough for point-to-point (P2P) wiring. Looking at the circuit, though, it seems the input is shorted to GND! So obviously there is a huge technical error. If one built it from the text alone it might work.

——-

There is no consideration for phase matching etc as we have seen.
 
The ability to reason well enough to actually do basic math was missing from general use AI's until recently. They are getting better all the time, but specific scientific applications often have necked down and highly trained systems just for that task. It's hard to predict when they will have sufficient understanding to effectively design something like a refined multi-way speaker, but I don't think we're there as of November 2024.

 
I imagine that once the parameters are specified as you did in your example, @xrk971, there are a lot of variables to work with, and many for it to explore or lose itself in.

As with human learning, refining the modelled parameters will take many man-hours from skilled people before these things work smoothly.

For a Government funded establishment or a global company that is targeting a particular type of data it will pay dividends.
For companies that work in medical research or people trying to predict climate change or other more sinister things it will need investment in the chipsets that are capable of handling AI, and people to work on the code or provide the correct input into the tools to auto generate the code.
 
Certain areas like medicine can be really helped for example…

Authors showed that LLMs outperform doctors in diagnosing in real-life differential diagnosis scenarios even when the doctors can use the LLM to help them. They got 35.4% correct, while doctors (with an average of 11 years of experience) got only 13.8%. Furthermore, they showed that their top-10 diagnoses contained the correct one far more often than doctors (55.4% vs. 34.6%). When they gave the doctors access to the LLM, their performance again fell short (24.6% for diagnoses, and 52.3% for top-10).

https://arxiv.org/pdf/2312.00164
 
It is not as bad as I expected. It states crossover frequency is 1.6 kHz while the components are selected for 1.2 kHz. I don't understand that, there are numerous sources on the internet stating how to calculate a cross-over. 1.2 kHz would barely be acceptable for this tweeter.
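
For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes the textbook second-order Linkwitz-Riley formulas and a purely resistive 8 ohm load, which a real driver is not, so the "corrected" values are only a starting point and not a finished crossover. The function names are mine.

```python
import math

def corner_frequency(l_henry, c_farad):
    """Corner frequency in Hz of a second-order LC section."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def lr2_values(f_hz, r_ohm):
    """Textbook LR2 (Q = 0.5) inductor and capacitor for a resistive load."""
    l_henry = r_ohm / (math.pi * f_hz)
    c_farad = 1.0 / (4.0 * math.pi * f_hz * r_ohm)
    return l_henry, c_farad

def lpad_attenuation_db(r_series, r_parallel, r_load):
    """Attenuation of an L-pad (series R, then R in parallel with the load)."""
    shunt = r_parallel * r_load / (r_parallel + r_load)
    return 20.0 * math.log10(shunt / (r_series + shunt))

# Values quoted in the ChatGPT answer: 1.25 mH and 12.5 uF.
f_actual = corner_frequency(1.25e-3, 12.5e-6)

# What the textbook formulas give for the requested 1.6 kHz into 8 ohms.
l_16, c_16 = lr2_values(1600.0, 8.0)

# The quoted L-pad: 3.9 ohm series, 6.8 ohm parallel, assumed 8 ohm tweeter.
pad_db = lpad_attenuation_db(3.9, 6.8, 8.0)

print(f"quoted parts cross at {f_actual:.0f} Hz")
print(f"1.6 kHz LR2 needs {l_16 * 1e3:.2f} mH and {c_16 * 1e6:.2f} uF")
print(f"L-pad attenuation {pad_db:.1f} dB")
```

The quoted parts land at roughly 1.27 kHz, while the L-pad does give about the 6 dB it claims.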
 
Raymond, let's not forget that there isn't a clear stated formula for building speakers to a given approach. Not to mention that there are many and varied approaches, often considered valid in their own right.

If you go searching you'll uncover much incorrect information. Some subjective, some misunderstood, some guessed at and much which is simply not known by the majority.

To put it succinctly, the best way to know is to already know...
 
In my defence, I did in fact suggest that we start listing the relevant parameters in my post.
I am also aware that there is a two-part audioXpress article that touches on the subject and lists parameters.

Consider the basic knowledge people need just to populate VituixCAD with FRD files that contain correct minimum-phase data and then add the ZMA data, not forgetting the data for the pesky vertical-plane measurements (which I often do). Turning a tall loudspeaker on its side and aligning it to the microphone has its own set of balancing and support problems.
I also have limited space available for making the measurements, so it has already become quite an involved process, and I imagine that this is where a lot of newcomers give up.

Moving further on into the tools menu to design the box, one also needs a good understanding of driver parameters, and has to check the values if entering one's own data. Then comes manually interacting with the programme: looking at low-frequency response, group delay, vent air velocity and power handling characteristics, and playing with box loss, Qt, SB4 and BW responses, etc. Several more variables there.

Moving on to the Xover itself, an area that generates a lot of questions in this forum, you have many choices.
DSP building blocks, Passive crossover, either as a building block or hand entered parameters.

If you have got to this stage in the design process you can happily listen to your design, or continue to optimise it to a known standard, assign weighting or apply your own tweaked response line and then commence listening and change components as required to come up with your chosen compromise.

Some background here into why I posted my question,
The reason for the question in the first place is that Netlist asked me to start a new thread on the subject of AI usage for loudspeaker design. This was because I had asked a question in a thread regarding claude.ai and the generated Xover for some SB Satori drivers

I haven't used REW or ARTA for several years now, although I am sure I could happily utilise them for a design.
I still have a two channel Behringer box and a phantom powered microphone with cal file which I used some years ago with a Hypex DLCP and some Ncore amps to apply the necessary DLCP corrected signals to a pair of kit built LX minis.

At that time I was in a reasonably well-paid job and was thinking about some nice Spendor speakers, but I decided not to purchase a new pair.
Instead, I opted for a Clio Pocket and used the remaining funds to buy my youngest a budget-friendly second-hand car.
I also have SoundEasy, but I haven't opened it for some years, and its old USB key gets flagged by Windows 11, which needs a security feature disabled before it will run.

I am simply thrilled and content with VituixCAD's abilities, and I haven't used all of them or possibly understood them all yet. 🙂

Obviously it may be some time before AI starts making real progress in loudspeaker design, so definitely not before Christmas.
In the first place, feeding it good real-world data will be a key factor, so no great changes there.

At the same time, the XMachina optimisation work looks promising for Xover tasks, for those who do not want to adjust parameters manually.

 
I’m running my own, local AI these days via Ollama and Open-WebUI. Just for ***** n’ giggles. Different LLMs produce different results, and it is important to understand that the same model can produce a different result every time it is queried.

Here’s what “mistral-small:latest” model does when presented with a loudspeaker design query:

Can you help me design a mass-loaded transmission line loudspeaker?

mistral-small:latest
Certainly! Designing a mass-loaded transmission line (MLTL) loudspeaker involves several steps, from understanding the principles to selecting components and calculating dimensions. Here's a simplified guide to help you through the process:


1. Understand the Principles​

A MLTL loudspeaker uses a long tube with a resistive load at one end (often filled with damping material) to control the back wave from the driver, reducing reflections and improving low-frequency response. The design involves calculating specific dimensions for the enclosure and transmission line.


2. Select Components​

Driver:​

  • Choose a woofer suitable for your application. Pay attention to parameters like sensitivity, impedance, frequency response, and power handling.

Port:​

  • Determine the port diameter using the formula d = √S, where S is the area of the woofer's cone.

3. Calculate Dimensions​

The dimensions for a MLTL are calculated based on the desired tuning frequency (f0). The transmission line length (L) is typically around 1/4 of the wavelength at f0, and the termination resistance (Rt) should match the driver's impedance.


Formulae:​

  • Wavelength (λ) : λ = c / f0, where c is the speed of sound in air (~344 m/s).
  • Transmission Line Length (L) : L ≈ λ / 4.
  • Volume of the Transmission Line (Vt) : Vt = A × L, where A is the cross-sectional area of the transmission line.

4. Calculate Resistive Load​

The resistive load at the end of the transmission line should be calculated based on the driver's Thiele-Small parameters (mainly Vas). The formula for the termination resistance (Rt) is: Rt = Z0 / Qts, where Z0 is the characteristic impedance of the transmission line, and Qts is the total Q factor of the driver.


5. Design the Enclosure​

  • Transmission Line Dimensions : Ensure the dimensions are practical for your application.
  • Materials : Use damping material (like fiberglass or foam) to line the transmission line and improve absorption.
  • Port Tuning : Adjust the port length to achieve the desired tuning frequency if necessary.

6. Simulate and Optimize​

Use software like WinISD, BassBox Pro, or similar simulation tools to model your design. This will help you optimize the dimensions and predict performance before building.


7. Build and Test​

  • Construction : Use high-quality materials for construction to ensure airtight seals and proper alignment of components.
  • Testing : Measure the frequency response, phase response, and overall performance using measurement equipment like an oscilloscope or a dedicated loudspeaker measurement tool.

Example Calculation:​

Let's assume you have chosen a woofer with a Vas of 30 liters, Qts of 0.5, and a frequency response down to 40 Hz. You want the MLTL to be tuned to around 40 Hz.


  1. Wavelength at 40 Hz : λ = 344 / 40 ≈ 8.6 meters.
  2. Transmission Line Length (L) : L ≈ 8.6 / 4 ≈ 2.15 meters.
  3. Volume of the Transmission Line (Vt) : If you use a rectangular transmission line with a cross-sectional area of 0.1 m², then Vt = 0.1 × 2.15 = 0.215 cubic meters (~215 liters).
  4. Termination Resistance (Rt) : Assuming a characteristic impedance (Z0) of around 6 ohms, Rt = 6 / 0.5 = 12 ohms.

Final Considerations:​

  • Ensure that the driver's power handling and sensitivity match your amplifier and overall system requirements.
  • Consider crossover design if you are using multiple drivers for a full-range system.
  • Proper sealing and damping are critical for the performance of the MLTL.







Wow! Great! So if I provide the Thiele-Small parameters of a driver, could you calculate the ideal MLTL ?




mistral-small:latest
Absolutely, I can help guide you through the calculations given the Thiele-Small parameters of your driver. Please provide the following Thiele-Small parameters for your woofer:


  1. Resonant Frequency (fs) : The free-air resonance frequency of the speaker.
  2. Voice Coil Diameter (D) or Sd (effective piston area): This is often given as a diameter, but you may need to calculate the area for some formulas.
  3. Mechanical Q (Qms) : The quality factor of the suspension system.
  4. Electrical Q (Qes) : The quality factor of the voice coil and magnet system.
  5. Total Q (Qts) : The total Q, which is usually calculated from the other two.
  6. Vas : The volume of air that the speaker acts like it has behind it when in a closed box.
  7. Cms : The mechanical compliance of the suspension system.
  8. Le : The inductance of the voice coil.
  9. Re : The resistance of the voice coil.
  10. BL : Force factor (product of field strength and length of the gap).
  11. Mms : Mass of moving parts.
  12. Fs or Resonant Frequency .

With these parameters, we can calculate the ideal dimensions for your MLTL loudspeaker.

—————————————

I will continue to provide it T/S parameters and see what happens.
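
For reference, the arithmetic in the model's example calculation can be reproduced in a few lines. This is only a sketch that repeats the model's own formulas (quarter-wave length, line volume, and its "termination resistance") so the numbers can be checked; it says nothing about whether those formulas are a sound way to design an MLTL. The function and variable names are mine.

```python
def quarter_wave_line(f0_hz, area_m2, c_m_per_s=344.0):
    """Return (wavelength, line length, line volume) using the model's formulas."""
    wavelength = c_m_per_s / f0_hz      # lambda = c / f0
    length = wavelength / 4.0           # L ~ lambda / 4
    volume = area_m2 * length           # Vt = A x L
    return wavelength, length, volume

def termination_resistance(z0_ohm, qts):
    """The model's own Rt = Z0 / Qts, reproduced here, not endorsed."""
    return z0_ohm / qts

# Inputs taken from the model's example: 40 Hz tuning, 0.1 m^2 line area,
# Z0 = 6 ohms, Qts = 0.5.
wavelength, length, volume = quarter_wave_line(40.0, 0.1)
rt = termination_resistance(6.0, 0.5)

print(f"wavelength {wavelength:.2f} m, line length {length:.2f} m")
print(f"line volume {volume * 1000:.0f} litres")
print(f"Rt {rt:.0f} ohms")
```

The numbers agree with the model's output: 8.6 m, 2.15 m, 215 litres and 12 ohms. Note that this is a 215 litre line for a driver whose Vas was given as 30 litres.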
 
It will be interesting to see the spread or tolerance of output data over 20 or so runs.

Does it churn out the parameters quite quickly or takes a few minutes?

If the data closely matches previous models within a certain tolerance, then naively I assume you have another tool that you could market or share with people wishing to build TLs?

Would you have to bundle this up as a product to make a usable tool, or is it likely that others will arrive at the same design criteria using different sources from your chosen Ollama and Open-WebUI plus Large Language Model (LLM)? I had to look that up, which indicates I know jack *&^% about this.
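
On the question of the spread over 20 or so runs, a rough sketch of how the answers could be collected from a local Ollama server follows. It assumes Ollama is listening on its default local port and uses its `/api/generate` endpoint with the `temperature` and `seed` options; the helper names are mine, and the comparison of the 20 answers is left to the reader.

```python
import json
import urllib.request

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt, temperature=None, seed=None):
    """Build the JSON payload; options are only set when given."""
    options = {}
    if temperature is not None:
        options["temperature"] = temperature
    if seed is not None:
        options["seed"] = seed
    return {"model": model, "prompt": prompt, "stream": False, "options": options}

def ask(payload, url=OLLAMA_URL):
    """Send one request and return the generated text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

def collect_runs(model, prompt, n_runs=20, **options):
    """Ask the same question n_runs times and return all the answers."""
    return [ask(build_request(model, prompt, **options)) for _ in range(n_runs)]

# Example payload only; nothing is sent until ask() or collect_runs() is called.
payload = build_request(
    "mistral-small:latest",
    "Can you help me design a mass-loaded transmission line loudspeaker?",
    temperature=0,
    seed=1,
)
```

With the default sampling settings the answers would be expected to differ from run to run. Fixing `temperature` and `seed` should make them far more repeatable, which is a separate matter from whether they are correct.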
 
I am an AI skeptic, be warned.

I assume you (@cogitech) do know something about speaker enclosure design. So you will assess the AI output. That is the goal of this project, isn't it? Imagine the next user does not know much about speaker enclosure design: how will he assess the soundness of the design before starting to build it?

Any AI model so far is fed with knowledge from the internet. That includes sites like Reddit. Why do you expect AI to produce information which is more useful compared to which you would have gathered yourself? I mean, I see the difference between information I got from Reddit and from StackExchange. AI does not tell me where it got the information from. And does it weigh the information from certain sites?

AI is awfully good at making a nice story in perfect grammar about anything. You would almost believe it. However, when I ask about something I know a great deal about, I see that AI produces a perfect article but not with the information I asked for.

Almost half a century ago when PSPICE was new on PC's and run on an AT with CGA graphics, my boss said: "Sure you should simulate your circuits. But to verify. Not by designing your circuit by randomly changing part values until it looks nice. YOU are the designer. Keep it that way."

Watch the movie Idiocracy (2006).
 
Why do you expect AI to produce information which is more useful compared to which you would have gathered yourself?
I don’t. I’m just playing along and contributing to the thread. At no point did I make any claim about the accuracy of the output, good or bad. I simply submitted a query and posted the results here for others to observe and interpret as they will.

I also never said I have used, or will use it to design speakers.

Great movie, indeed. I am also an AI skeptic, however I choose to confront it head on rather than look the other way. Also, I have the skills to run it privately at home, which increases the comfort level dramatically. Public, online instances (chatGPT, etc.) have obvious privacy and data mining implications.
 
Does it churn out the parameters quite quickly or takes a few minutes?
Sorry I didn’t answer earlier.

The speed depends on the LLM model that is used. With the mistral-small model I used in this example, it churns it out about as fast as I can read it, which is actually sort of slow. The model is too large to fit in the 4GB of VRAM on the Nvidia GPU in my laptop, so instead it uses system RAM and the 12-core CPU. GPUs are more capable at processing the requests, so if you use a model that fits in VRAM, it is blazing fast. For example, “llama3.2:latest” is a much smaller model, which fits comfortably in 4GB of VRAM, and therefore the GPU is utilized 100% when running that model. It answers nearly instantly.
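
As a rough rule of thumb for whether a model will fit, the sketch below multiplies parameter count by bits per weight. Everything numeric in it is an assumption on my part: roughly 22 billion parameters for mistral-small, roughly 3.2 billion for llama3.2, about 4.5 bits per weight for a 4-bit quantised download, and 1 GB of headroom for context and buffers. Check the actual download size before relying on it.

```python
def model_size_gb(n_params, bits_per_weight=4.5):
    """Approximate size of the weights in GB (assumed 4-bit style quantisation)."""
    return n_params * bits_per_weight / 8 / 1e9

def fits_in_vram(n_params, vram_gb, bits_per_weight=4.5, overhead_gb=1.0):
    """True if weights plus an assumed overhead fit in the given VRAM."""
    return model_size_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb

# Assumed parameter counts, not taken from this thread.
MODELS = {"mistral-small": 22e9, "llama3.2": 3.2e9}

for name, n_params in MODELS.items():
    size = model_size_gb(n_params)
    verdict = "fits" if fits_in_vram(n_params, 4.0) else "does not fit"
    print(f"{name}: ~{size:.1f} GB, {verdict} in 4 GB of VRAM")
```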

If you would like to play with locally-hosted AI and need something more accessible than Ollama + Open-WebUI, then I recommend MSTY. https://msty.app/ Anyone should be able to get that up and running in a few minutes. Performance will be much faster if you have a modern GPU with lots of VRAM, but also not terrible with a decent multi-core CPU.
 
The way most people use AI is sort of like a search engine that you can ask questions in plain English. That's fine, but you can go WAAAY further than that. This is an example of generating graphics from raw data.

I used FuzzMeasure to run measurements on a speaker at 0 15 30 45 60 75 90 degrees. I set it to compare all 7 curves on one graph like this:

focal-audax-0-15-30-45-60-75-90.png

Then I clicked on the graph and asked it to export data. It gave me a 14 column CSV that looks like this:

focal-audax-14columnCSV.png

All 7 measurements in one CSV. Which is nifty.

I really like FuzzMeasure, but it doesn't do polar spectrum plots. I'm not particularly fond of Windows or the DATS Parts Express measurement program that I normally use to generate these so I thought I'd try ChatGPT instead.

I uploaded my spreadsheet to ChatGPT o1, along with a screen shot of a DATS output, just like this:

chatgpt-polarplot1.png


It gave me this output:

chatgpt-polar-output2.png


As you can see it generated an entire python script for me on the spot without being asked.

Then it told me:

Key points in the script:

Normalization: magX_norm = magX - mag0 enforces 0° → 0 dB at all frequencies.

Mirroring: We simply copy each off‐axis curve (e.g. +15°) into its negative counterpart (−15°).

Log scale: plt.xscale('log') makes the horizontal frequency axis logarithmic.

Color range: vmin=-15, vmax=0 ensures 0 dB appears at the upper end (red) and −15 dB at the lower end (violet) of the rainbow colormap.

That should produce a heatmap similar to your attached PNG, with angles running vertically from −90° to +90° and frequencies along the horizontal (log) axis from ~20 Hz up to 20 kHz.
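
The script itself only survives as a screenshot, so here is a minimal sketch of the four steps listed above (normalise to on-axis, mirror, log frequency axis, fixed colour range). It is not the script ChatGPT produced. The tiny data set is a made-up placeholder rather than a measurement, the function names are mine, and matplotlib is only imported when the plot is actually requested.

```python
def build_polar_grid(curves):
    """curves maps angle in degrees -> list of dB values on a shared frequency grid.

    Returns (angles, grid) where grid[i][j] is the level at angles[i] and
    frequency j, normalised so the 0 degree curve is 0 dB everywhere.
    """
    on_axis = curves[0]
    by_angle = {}
    for angle, mags in curves.items():
        normalised = [m - ref for m, ref in zip(mags, on_axis)]
        by_angle[angle] = normalised
        by_angle[-angle] = normalised   # mirror +angle onto -angle
    angles = sorted(by_angle)
    return angles, [by_angle[a] for a in angles]

def plot_polar_map(freqs, angles, grid, path="polar_map.png"):
    """Draw the heatmap with a log frequency axis and a 0 to -15 dB colour range."""
    import matplotlib
    matplotlib.use("Agg")               # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(freqs, angles, grid, cmap="rainbow",
                         vmin=-15, vmax=0, shading="auto")
    ax.set_xscale("log")
    ax.set_xlabel("Frequency (Hz)")
    ax.set_ylabel("Angle (degrees)")
    fig.colorbar(mesh, ax=ax, label="dB relative to on-axis")
    fig.savefig(path)

# Placeholder data: three frequencies, three angles. Not a real measurement.
freqs = [100, 1000, 10000]
curves = {
    0: [80.0, 85.0, 83.0],
    15: [80.0, 84.0, 80.0],
    30: [79.0, 82.0, 74.0],
}
angles, grid = build_polar_grid(curves)
```

Calling `plot_polar_map(freqs, angles, grid)` then writes `polar_map.png`. With a real export, each column pair of the 14-column CSV would be read into `curves` first.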

So then I asked it:

"Can you run the python script?"

It thought about it for a few seconds and generated this:

chatgpt-polarplot-3.png


Hey Folks - keep in mind that's what I got on my very first try.

Then I replied:

"can you smooth the graphic so the data looks less pixelated?"

It immediately gave me this new graph:

Smoothed Polar Heatmap (Linear Interpolation) Normalized to 0° = 0 dB.png

But that's not all.

It generated a NEW python script:

chagpt-polar-response4.png


Last week I also used it for FFTs.

I exported an impulse response graph. I uploaded it to ChatGPT and said "please run an FFT of this impulse response. Ignore everything before 8msec and everything after 25msec. Generate a CSV file with 3 columns: Frequency, magnitude in DB, and phase in degrees. Smooth the data to 1/12 octave.”

It did all of that exactly as requested. I didn't have to explain what FFTs were or anything. It figured all that out.
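
For the curious, the request boils down to something like the sketch below. It is not what ChatGPT ran. To stay dependency-free it uses a plain DFT from the standard library (numpy.fft.rfft would be the usual choice), a rectangular gate, and a simple average of dB values for the 1/12 octave smoothing. Phase is relative to the start of the gate and is left unsmoothed. Function names and the output file name are mine.

```python
import cmath
import csv
import math

def gated_spectrum(impulse, sample_rate, t_start, t_end):
    """Gate the impulse response to [t_start, t_end) seconds and return
    rows of (frequency_hz, magnitude_db, phase_deg), skipping DC."""
    i0 = int(round(t_start * sample_rate))
    i1 = int(round(t_end * sample_rate))
    segment = impulse[i0:i1]
    n = len(segment)
    rows = []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                  for m, x in enumerate(segment))
        mag_db = 20.0 * math.log10(max(abs(acc), 1e-12))
        rows.append((k * sample_rate / n, mag_db, math.degrees(cmath.phase(acc))))
    return rows

def smooth_fractional_octave(rows, fraction=12):
    """Average the dB magnitude over a 1/fraction octave band around each bin."""
    half_band = 2.0 ** (1.0 / (2 * fraction))
    smoothed = []
    for freq, _, phase in rows:
        band = [m for g, m, _ in rows if freq / half_band <= g <= freq * half_band]
        smoothed.append((freq, sum(band) / len(band), phase))
    return smoothed

def write_csv(rows, path="fft_output.csv"):
    """Write the three columns asked for in the prompt."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["frequency_hz", "magnitude_db", "phase_deg"])
        writer.writerows(rows)

# Self-check with a synthetic unit impulse at 8 ms, sampled at 1 kHz:
# a lone impulse at the start of the gate should give a flat 0 dB spectrum.
test_impulse = [0.0] * 100
test_impulse[8] = 1.0
rows = gated_spectrum(test_impulse, 1000, 0.008, 0.025)
smoothed = smooth_fractional_octave(rows)
```

With a real impulse response exported as a list of samples, `write_csv(smoothed)` would produce the kind of file that was then loaded into FIR Designer.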

Then I uploaded the output FFT CSV file into FIR Designer which generated FIR files for my MiniDSP unit.

I haven't tried using ChatGPT to make FIR files, but that's something people can explore.

Upshot: You can do things in minutes with AI, that would take hours of programming or monkeying around with spreadsheets. You just have to ask it for EXACTLY what you want, in plain English.

This revolutionizes all kinds of careers. Obsoletes others.

It slashes stupid busy work that gets in the way of what you want.

If you're not playing with this stuff (which is fun, it's NOT a nuisance or burden) you're missing out on 1) cool hobby stuff, and 2) you're also falling behind in your career... because odds are there's stuff in your job or other parts of your life (like analyzing your 401K or whatever you like to do) that can be way easier than doing it the old way.

The new Google Gemini 2.5 has incredible capabilities as well. The AI companies are trying like crazy to one-up each other because they know this is the next trillion dollar opportunity, literally.

I hope you guys are paying attention.
 