Measuring speakers using music (Michael Tsiroulnikov's FSAF)

One of these 3 traces was measured with music. Which one is it?

  • Red - nothing goes fast as red!

    Votes: 3 18.8%
  • Blue - yawn...

    Votes: 1 6.3%
  • Black! You know what they say about black...

    Votes: 1 6.3%
  • What? Is there more than one trace?

    Votes: 4 25.0%
  • I need more hints... can I at least listen to the test file?

    Votes: 7 43.8%

  • Total voters
    16
No words can express...
 

Attachments

  • 20241024 DualTriode .png
Going back to the original quiz ...
There are 3 curves in 3 colours in the attached graph. Two are very similar but the Red curve is different below 200Hz. Which of A, B or C is the Red curve?
Tranh, you going to tell us which is the Red curve? My guess (and it's only a guess as I dunno da parameters of da white noise signal) is it's the white noise measurement using FSAF.
Thanks Tranh for telling us the correct answer. Red curve was measured with MLS pseudo random white noise.

Was this using the MLS signal as the stimulus but processing with FSAF? The answer will be important later 😊
 
Reading Mike's stuff from da MATLAB downloads linked in REW suggests his main interest is AEC. Finding that speaker distortion mucks this up and that current driving helps, he joined us plebs, starting the thread Experiments with the current drive.
No, his original work on using FSAF for loudspeaker measurement was uploaded to MatLab in Nov 2020 and other work using adaptive filtering in subbands to measure responses dates back at least to 2019. More general work on related topics goes back to at least 2001.
 
Thanks JohnPM

v1.0.0 of FSAF in MathWorks is dated 25nov20
The present v3.0 has
3 LOUDSPEAKER / ROOM MEASUREMENTS [408,409] as an application of FSAF from p32 in fsaf.4.pdf

Loudspeakers for AEC: Measurement and Linearization is on v2.0 20oct24

Were there others?

I still think his main interest (at least his day job) is AEC but it's really moot at this point.

 
.. from #118
Could you gurus please comment on my probably naive understanding (guess?) of what Michael is doing.
  1. Use FSAF with arbitrary stimulus to produce a long IR which models the supposedly LTI transfer function of your DUT (including room responses bla bla) This produces an 'accurate' response which we know corresponds to other methods for IR & response. Loadsa buzzwords and TLAs which I don't unnerstan at present but we don't care cos it gets 'good' response.
  2. Convolve this IR with the stimulus to get what should be the perfect result if the transfer function is LTI (ie has no distortion, compression bla bla)
  3. Subtract the expected perfect result from the actual result to get the residual.
  4. Analyse the residual to get the display.
John Mulcahy, the father & mother of REW, has confirmed this is what happens .. or at least in his version of Michael's stuff. Indeed, Mike says this is on p32 of fsaf.4.pdf under 3.1 BASICS. Missed that in the welter of TLAs & buzzwords.
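Steps 2 to 4 are small enough to sketch in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `residual_db`, the 4-tap `true_ir` and the cubic distortion term are invented for the example, and the 'model IR' is simply handed in rather than estimated by FSAF.

```python
import numpy as np

def residual_db(stimulus, measured, model_ir):
    """Steps 2-4: predict the LTI output, subtract it, report the residual level."""
    predicted = np.convolve(stimulus, model_ir)[: len(measured)]  # step 2: convolve
    residual = measured - predicted                                # step 3: subtract
    # Step 4 (simplest possible analysis): residual energy relative to the measurement.
    level = 10 * np.log10(np.sum(residual**2) / np.sum(measured**2))
    return residual, level

rng = np.random.default_rng(0)
stimulus = rng.standard_normal(48000)            # stand-in for 1 s of "music"
true_ir = np.array([1.0, 0.5, -0.25, 0.125])     # hypothetical LTI part of the DUT
linear_out = np.convolve(stimulus, true_ir)[: len(stimulus)]
measured = linear_out + 0.01 * linear_out**3     # hypothetical mild cubic distortion

residual, level = residual_db(stimulus, measured, true_ir)
print(f"residual is {level:.1f} dB relative to the measured signal")
```

Here the model IR is exact, so the residual is purely the cubic term. A least-squares model fitted to the distorted output would absorb whatever part of the distortion correlates with the stimulus, so its residual would be somewhat smaller.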

Originally, I thought FSAF did some supa dupa juju to dream up the distortion. But apart from step one, dreaming up the model, the other steps aren't that different from what I, Bill Waslo and other luminaries (& pseudo luminaries) have done in Jurassic times.

FSAF is ONLY used to dream up the model.

John Mulcahy's implementation of Mike's stuff has created loadsa interest cos people think it is a method to measure using music and even listen to the yukky stuff

There are 3 separate processes in da method above.

To investigate the claims that the FSAF residual (dunno if we should call it FSAF residual as it is common to da Jurassic stuff I, Bill & others have done) is an accurate representation of yucky stuff generated by speakers playing music, we look at each process in turn. … to be continued.
 
STEP ONE Measuring, or modeling our supposedly Linear Time Invariant (LTI) Device Under Test (DUT)

FSAF isn't the only way to get a good IR / Response / Model. IM not so HO (and also that of AP, Audiomatica, Da Dynamic Duo Lipshitz & Vanderkooy … ), Angelo's sweep method gets to a given accuracy faster than any other method. Mike claims -115dB accuracy for FSAF with an unspecified stimulus p39 fsaf.4.pdf

(DcibeL, Tranh and those of you who have compared Angelo's method with FSAF using the same sweep stimulus, what's the difference in processing time (if any)? p39 fsaf.4.pdf says 55x slower than Angelo or MLS to compute.)


You can use ANY broadband Stimulus to get IR / Response / Model. Just deconvolve the DUT output with your Stimulus ... 2 FFTs and trivial (complex) arithmetic. You effectively do this for both Angelo's method and traditional MLS.
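A rough NumPy sketch of that deconvolution, assuming a noise-free, distortion-free DUT. The function name `deconvolve`, the small regularisation constant `eps` and the 4-tap `true_ir` are assumptions for the example, not anything taken from REW or FSAF.

```python
import numpy as np

def deconvolve(stimulus, response, eps=1e-12):
    """Estimate the IR as IFFT(Y * conj(X) / (|X|^2 + eps)).

    Zero-padding to len(stimulus) + len(response) avoids circular
    wrap-around; eps guards bins where the stimulus has almost no energy.
    """
    n = len(stimulus) + len(response)
    X = np.fft.rfft(stimulus, n)       # FFT no. 1: the stimulus
    Y = np.fft.rfft(response, n)       # FFT no. 2: the DUT output
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)   # the "trivial (complex) arithmetic"
    return np.fft.irfft(H, n)          # back to the time domain

rng = np.random.default_rng(1)
stimulus = rng.standard_normal(4096)            # any broadband signal
true_ir = np.array([1.0, -0.6, 0.3, 0.1])       # hypothetical DUT
response = np.convolve(stimulus, true_ir)       # ideal LTI output

ir_est = deconvolve(stimulus, response)
print(np.round(ir_est[:4], 3))
```

With real music as the stimulus, some bins carry very little energy, so `eps` (or some other form of regularisation) matters far more than it does with the white noise used here.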

But the stimulus also decides measurement accuracy and, more importantly, how this accuracy is affected by distortion.

At one end is Angelo's sweep. This is very robust in the face of distortion and in fact, is used to measure distortion too. Each harmonic appears as a pre-impulse before the 'response' IR (or at the tail of the FFT record if you prefer, cos of the 'cyclic' nature of an FFT record).
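How far ahead of the main impulse each harmonic lands follows from the exponential sweep itself: for a sweep of length T from f1 to f2, the Nth harmonic's IR arrives T·ln(N)/ln(f2/f1) early. A small sketch, where the helper name `harmonic_advance` and the example 15 s, 20 Hz to 20 kHz sweep are chosen just for illustration:

```python
import math

def harmonic_advance(sweep_seconds, f_start, f_stop, order):
    """Time in seconds by which the order-th harmonic's IR precedes the linear IR."""
    return sweep_seconds * math.log(order) / math.log(f_stop / f_start)

# Example: a 15 s exponential sweep from 20 Hz to 20 kHz.
for order in (2, 3, 4, 5):
    dt = harmonic_advance(15.0, 20.0, 20000.0, order)
    print(f"H{order}: {dt:.3f} s before the main impulse")
```

A longer sweep spreads the harmonic IRs further apart, which makes it easier to window them out of (or into) the analysis.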

At the other end are 'noise' type stimuli like MLS. Accuracy of measurements with MLS is badly affected by distortion. You see this in Tranh's RED curve #1 and #34.
See also Distortion Immunity of MLS-Derived Impulse Response Measurements

But Tranh's RED curve is a FSAF measurement using MLS as the stimulus. The accuracy of this appears to be what you'd expect from the traditional MLS method, ie somewhat inaccurate cos the DUT isn't fully LTI (has distortion).

DcibeL and TNT, using Angelo's sweep but processing with FSAF, show what appears to be the same “total distortion”. FSAF using Angelo's sweep seems to work, though without giving us the individual harmonics. Response is good too.

It appears FSAF is no better or worse than traditional zillion point FFT methods (which are effectively what Angelo and MLS methods use.) I discount FSAF's 55x greater computational load cos this will become trivial as computing power rises.

Accuracy of FSAF is dependent on the stimulus as with other methods.

FSAF with noise stimuli becomes inaccurate with higher distortion in the DUT. FSAF, using 'noise', can't actually measure Distortion except via the cheat I'll describe later.

FSAF using Angelo's sweep can measure 'Total Distortion' but can't separate the Harmonics.

In between these two extremes are music and other broadband signals. Most of them are good enough for response with FSAF. How accurate and robust depends on how close they are to 'noise'. I'm not sure any of these are good for 'distortion' except via the cheat I'll describe later.

Dunno about multi-tones. Need to be sufficiently 'broadband' to do good response. Gotta engage brain.
 
JohnPM, is the IR model dreamt up by FSAF 'denoised'? ie noise MUCH below the tail of the IR?

If the "model IR" has noise, this will be convolved onto your original signal too. eg a sine wave will gain 'noise sidebands'.

You have this even from a "model IR" from Angelo's method, the most noise resistant method. If I was using one of these, I'd have to get rid of half the IR (the half with the harmonic IRs) and check if the remaining noise is well below the tail.

If this noise was significant, I'd model the IR starting with eg Simple Arbitrary IIRs. The Excess Phase of transducers is fairly straightforward to grok and hence model, but I dunno about the HUGE Excess Phase of a Room Impulse Response (RIR).

As FSAF is a 'modelling' method (as far as my small brain can grok), perhaps the IRs it dreams up are already denoised. This may be the real benefit of the 55x computational load with FSAF 😊

Or it might be simpler to just use an even longer Angelo sweep or loadsa averaging.
 
The FSAF adaptive filtering process produces a best estimate of the LTI response in a least squares sense. For a given stimulus duration, a log sweep will recover an IR with a noise floor around 12 dB lower than FSAF with a noise stimulus, based on my testing with programmatically applied distortion levels of around 10% and various levels of measurement noise. However, Michael's argument is that sweep measurements give an overly flattering view of the distortion the system will produce as (1) they are only able to extract harmonic distortion and (2) the stimulus is more benign than music. As such FSAF provides an opportunity to use an arguably more relevant stimulus and examine the total distortion that produces. There are faster and more accurate ways to extract the LTI behaviour of a system, but not while also extracting a distortion measure from content representative of final use.
 
Thanks for this John.

Your 12dB better noise floor compared to FSAF with noise is better than my Jurassic experience with my version of Angelo's method compared to MLSSA. I'll tell Prof Farina you think it's a log sweep rather than exponential too 😊
Quote: "However, Michael's argument is that sweep measurements give an overly flattering view of the distortion the system will produce as (1) they are only able to extract harmonic distortion and (2) the stimulus is more benign than music. As such FSAF provides an opportunity to use an arguably more relevant stimulus and examine the total distortion that produces. There are faster and more accurate ways to extract the LTI behaviour of a system, but not while also extracting a distortion measure from content representative of final use."
I'll pontificate on this in the next few posts.
 
Steps 2 and 3. Obtaining the residual and the cheat disguised. This is the 2nd PROCESS

Simple convolution and subtraction with book-keeping to ensure the residual is as small as possible in a Least Mean Squares (LMS) sense bla bla. Bill Waslo will be familiar with the nitty gritty.

But for a 'small' residual, the model IR must be as accurate and noise free as possible. Otherwise, the residual isn't a measure of the non-LTI ness of the DUT, but of the inaccuracy of the measurement method.

With MLS stimulus, this is a double whammy. Device distortion makes the 'model IR' inaccurate so the subtraction isn't accurate either. The residual is greater than it should be and we can't quantify this cos the residual is just more noise. The inaccurate 'model IR' is also 'noisy' and this noise is convolved again into the original stimulus before we do the subtraction.

An inaccurate 'model IR' gives a large residual and this is the cheat that allows us to hear a residual regardless of whether or not it represents the non-LTI behaviour of the DUT.

The most accurate 'model IR' is from Angelo's method and probably also from FSAF using Angelo's sweep. With Angelo's method, you need to throw away AT LEAST half your 'model IR' cos that bit has the IRs corresponding to each harmonic at the tail end of the FFT record. If the noise in the remainder is well below the tail of the 'model IR', you are good to go .. but see next episode.

Music as a stimulus is somewhere between these two cases. Not quite as accurate as Angelo's sweep but better than MLS and other 'noise' used with FSAF depending on how close the 'music' is to noise or to Angelo.

But for the subtraction, we don't have to use the 'model IR' we derive from the stimulus. If we know the 'model IR' is inaccurate eg from MLS or certain types of music, we can use a more accurate 'model IR' eg from Angelo's method.

How do we know we are using a better 'model IR'? A better 'model IR' will give a smaller residual and this is true regardless of the original stimulus.
 
I've glossed over loadsa important stuff but the above is generally true. You might ask, “Why doesn't a 'model IR' derived from the stimulus itself give the smallest residual?” Using this should give NO RESIDUAL. If the 'model IR' was the same length as the stimulus, this would probably happen.

We have a residual cos the 'model IR' is only a Least Mean Squares (LMS) estimate. It is usually much shorter than the stimulus: just over the RT60 reverb time for the Room Impulse Response (RIR) measurements that we use to 'listen to yucky stuff' (<1sec for Mike's RT60=200ms room in “Loudspeakers for AEC”), and much shorter than even that for a quasi anechoic response that us Jurassic speaker designers like.
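The effect of model length can be shown with a toy least-squares fit. This is a sketch only: it uses a plain batch least-squares solve (`np.linalg.lstsq`) rather than FSAF's subband adaptive filter, and the helper `ls_model_ir`, the 64-tap decaying `true_ir` and the tap counts are made up for illustration. The DUT here is perfectly LTI and noise free, so any residual comes purely from the model being too short.

```python
import numpy as np

def ls_model_ir(stimulus, measured, taps):
    """Least-squares FIR fit: minimise ||measured - A @ h|| over h of length `taps`."""
    n = len(measured)
    # Column k of A is the stimulus delayed by k samples (zeros shifted in).
    A = np.column_stack(
        [np.concatenate([np.zeros(k), stimulus[: n - k]]) for k in range(taps)]
    )
    h, *_ = np.linalg.lstsq(A, measured, rcond=None)
    return h, measured - A @ h

rng = np.random.default_rng(2)
stimulus = rng.standard_normal(2000)
true_ir = 0.9 ** np.arange(64)                  # hypothetical 64-tap decaying "room"
measured = np.convolve(stimulus, true_ir)[: len(stimulus)]

levels = {}
for taps in (16, 64):
    _, residual = ls_model_ir(stimulus, measured, taps)
    # The tiny constant only keeps log10 away from an exact zero.
    levels[taps] = 10 * np.log10(np.sum(residual**2) / np.sum(measured**2) + 1e-30)
    print(f"{taps:2d}-tap model: residual {levels[taps]:7.1f} dB")
```

The 16-tap model leaves a clearly audible-sized residual even though nothing in the toy DUT is non-linear, while the 64-tap model drives it down to numerical rounding.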


More importantly, the residual is a measure of LTI ness. 'Linear' (distortion) is only one part of LTI. 'Time Invariant' ness is also shown. (Angelo's method is a good way of showing this clearly without burning up your HF units.) This is mainly changes in frequency response due to voice coils heating up on sustained signals (compression) and will certainly increase the residual.
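A toy illustration of the 'Time Invariant' part, assuming a DUT that is perfectly linear but whose gain sags slowly by 0.5 dB over a 5 s excerpt. The drift figure, the white-noise stand-in for music and the 1-tap 'model IR' are all assumptions picked to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(3)
stimulus = rng.standard_normal(48000 * 5)             # 5 s stand-in for music
drift_db = np.linspace(0.0, -0.5, len(stimulus))      # assumed slow 0.5 dB compression
measured = stimulus * 10 ** (drift_db / 20)           # linear, but NOT time invariant

# Best single time-invariant gain in the least-squares sense (a 1-tap "model IR").
gain = np.dot(measured, stimulus) / np.dot(stimulus, stimulus)
residual = measured - gain * stimulus
level = 10 * np.log10(np.sum(residual**2) / np.sum(measured**2))
print(f"fitted gain {20 * np.log10(gain):.2f} dB, residual {level:.1f} dB")
```

No harmonic or intermodulation products are generated here, yet the residual is far from zero, and it is simply a scaled copy of the stimulus whose level changes over the excerpt.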

BTW, this non-TI stuff is detectable on musical DBLTs. Once distortion gets below a quite easy bar, it has little effect on musical DBLTs. Of course there are test signals where DBLTs easily detect low level distortion but I'm more interested in what can or cannot be heard on music.

Let's look at inaccuracies, using Angelo's method cos these are predictable and it is in theory the most 'accurate'.

A 15sec 20Hz – 20kHz sweep is long enough to clearly show non-TI stuff. eg a Treble unit without Ferrofluid will show less HF compared with a 1sec sweep. This is non-TI ness.

But which measurement is the better 'model IR'? Answer: the 'model IR' from the short sweep is accurate for the short sweep stimulus and will show the smallest residual for that. Similarly the 'model IR' from the long sweep is accurate for that.

On the other hand, noise type stimuli like MLS are inaccurate even for their own stimuli. You see this in Tranh's RED curve #1 and #34

In one case, Angelo's method accurately describes the DUT behaviour (including distortion) but under 2 conditions; short sweep and long sweep. In the other, processing a 'noise' type signal shows how inaccurate this can be with DUT distortion.

Both give a residual. But in one case, it is the LTI behaviour(s) of the DUT.. while in the other, it is the inaccuracy of the measurement.

A music stimulus will be somewhere between.

Whatever 'model IR' we use, however 'accurate', it will only give a small residual at certain times on a 15s music excerpt and this holds even for a 'model IR' generated by the excerpt itself. And there are certain stimuli (noise like stuff like MLS) that always result in greater or lesser 'model IR' inaccuracies from device distortion.
 
I like how you devote time to go through it! Although I do not understand the significance of any of it because of my very limited knowledge of signal processing... The thing that draws me to FSAF is the feeling that the ability to listen to the distortions is great. Also the RIR stuff is intriguing as it seems it could be possible to get better semi-anechoic measurements at home? From a quick skim through the papers (with AI), it seems there are several applications mikets' work touches.
Quote: "Both give a residual. But in one case, it is the LTI behaviour(s) of the DUT.. while in the other, it is the inaccuracy of the measurement."
What does this inaccuracy of measurement sound like in the residual? Is it noise as well or does it sound like the music? Can the ear still detect the distortions in the residual on top of the measurement inaccuracy?

I think the ability to compare two drivers by listening, to hear which one has the "better" sounding residual, is a cool way to evaluate. At least that is how it seems now, before really trying it. Hörnli brought up mikets' post about it in post https://www.diyaudio.com/community/...ichael-tsiroulnikovs-fsaf.418843/post-7825327 The ability to listen for annoyance while looking at all kinds of distortion graphs of the DUT (harmonics, IMD, anything) could perhaps reveal whether the harmonics actually matter, which ones, and so on?
 
Quote: "What does this inaccuracy of measurement sound like in the residual? Is it noise as well or does it sound like the music? Can the ear still detect the distortions in the residual on top of the measurement inaccuracy?"
I've done a lot of simulation of distortions and listening to see (hear?) how much is audible on music. eg Intermodulation Distortion Listening Tests

The use of REW's residual (I'm calling it REW's cos Mike didn't invent it and it actually has little to do with FSAF) is new so we have to build up experience listening and correlating it to listening tests. My guess is ..

With noise stimulus, you can't tell anything cos the residual is also noise, perhaps shaped by various factors in ways we are yet to grok.

With music or Angelo's sweep, if you hear what appears to be a softer version of the original perhaps shaped by various factors bla bla, this result is almost certainly due to inaccuracy. If what you hear sounds crackly or as though it is a recording run at twice or 3x normal speed, then it is distortion. The paper above points out that Intermod doesn't sound like a speaker fault but rather a bad amplifier.

These are just guesses.

Quote: "Also the RIR stuff is intriguing as it seems it could be possible to get better semi-anechoic measurements at home?"
We could do this with MLSSA and very efficiently with Angelo's method. FSAF can do this too but it doesn't bring anything new to the table.
 
Quote:
  1. Use FSAF with arbitrary stimulus to produce a long IR which models the supposedly LTI transfer function of your DUT (including room responses bla bla) This produces an 'accurate' response which we know corresponds to other methods for IR & response. Loadsa buzzwords and TLAs which I don't unnerstan at present but we know it gets 'good' response.
  2. Convolve this IR with the stimulus to get what should be the perfect result if the transfer function is LTI (ie has no distortion, compression bla bla)
  3. Subtract the expected perfect result from the actual result to get the residual.
JohnPM, what is the length of the 'model IR'? It must be shorter than the stimulus cos if it was the same length, it wouldn't be a LMS estimate, but 'exact'. I can't see you using a 'model IR' longer than eg several RT60s.

I think you see where I'm heading. My contention is you don't have to use FSAF to get your 'model IR'.

You can use straight deconvolution as Angelo's and MLS methods essentially do.

You can do this with music to determine response. A truncated result is your LMS 'model IR' which you convolve with your original music, and subtract to get the residual with the same accuracy as a 'model IR' obtained with FSAF.

ie You can compare two drivers by listening, which one has "better" sounding residual more easily.

How are you doing the convolution? I know this Millennium there are efficient ways to do zillion point FFTs, even in 'real time', which were not available to me in Jurassic DOS days. Are you using these?

It would be nice to be able to use a different 'model IR' to get the residual in REW, rather than just the one derived from the stimulus.
 
The paper kgrlee linked is downloadable only to AES members, but the conclusion is (bolded by me)

Intermodulation Distortion Listening Tests

It is fairly simple to measure the amount of intermodulation distortion produced by loudspeakers, but it is more difficult to find out how much of this kind of distortion is found objectionable (or just detectable) when masked by music. It is made more difficult by the fact that this has to be done in the absence of other kinds of distortion such as harmonic and transient intermodulation distortion. In order to measure the effects of intermodulation distortion, a black box was built which was capable of generating a known and controllable percentage of pure intermodulation distortion, and then listening tests were held at different sound pressure levels with different kinds of music with several speakers and listeners. The results show that intermodulation distortion is masked to a large extent by music but it can be easily detected when pure tones are used.



Author(s): Fryer, Peter A.
AES Convention: 50; Paper Number: L-10
Publication Date: 1975-03-06
Permalink: https://aes2.org/publications/elibrary-page/?id=2476