Future of Distortion Measurement..?

From time to time I brainstorm on this topic because, as I see it, we are a little bit "stuck" in this area.
Nowadays the mainstream, almost exclusively used/accepted method is THD.
And although it's a very useful way of doing some basic distortion measurement, (IMO)
it is not sufficient or insightful enough to represent the level and nature of audio distortions.
(Especially as most of the time the figures start with 2-3-4 zeros, etc.)
It is a very specific and "narrow" test aspect with constant frequency, signal level, load, etc.
Probably an "amplifier's sound" is built up from just the "opposite": the dynamic behaviour
between different frequencies and levels of the input signal and the different states of the momentary load, combined.
Therefore having multiple but "static" tests is probably also not enough, as the nature of the
transitions between these different states is probably a very decisive component.

One other aspect is that the final target (human hearing/sound processing) is also quite different
and works not on a numerical/scalar level but probably much more on a "higher, content-based" level,
so using DC, sine/square wave, burst, noise, etc. based tests can be quite misleading as well.
IMO our "ears" (the brain-level "sound-to-perception" conversion) are highly sensitive to pattern-level integrity.
This means they tolerate a lot of distortion if it doesn't affect that integrity,
and are disturbed if the distortion interferes with the original harmony
(even beyond the input signal, as we have "built-in" samples/sensitivity to natural harmony).
An example can be images or caricatures: link here
Our eyes decode the whole "big picture" (as "content") and match patterns, not just pixels.
(So measuring pixel-level differences may catch human-level distortions only with difficulty.)

Getting back to content integrity, I see it as similar to a hash mechanism (borrowing from the IT world):
hashing is an algorithm where you map (or "compress") a large content to a smaller representation in such a way that
if you change even just a small fraction of the large content, the hash will change, detecting the smallest changes easily and quickly.
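(A minimal sketch in Python, just to make the idea concrete; SHA-256 here stands in for any hash function:)

```python
# Flipping a single bit in a large "content" yields a completely different digest.
import hashlib

content = bytearray(b"x" * 1_000_000)    # one million bytes of "content"
digest_before = hashlib.sha256(content).hexdigest()

content[500_000] ^= 0x01                 # flip one single bit in the middle
digest_after = hashlib.sha256(content).hexdigest()

print(digest_before == digest_after)     # False: the tiniest change is detected
```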

Content-based distortion measurement: if we take a picture and want to analyze the distortion on a pixel
level, it would be (very) hard to catch the content-level distortion. For example, if there is a white background
with a black circle in the middle, every human would recognize immediately that this is a circle.
Now if we remove just 4-5 black pixels from this circle, even then our brain will easily and quickly recognize it as a circle.
If we measure this distortion just on the pixel level, it's (very) hard to distinguish whether the distorted pixels are affecting the content or not,
as we treat it as just a batch of data of equal significance.
It would be hard to say what is significant while we work just on the pixel level, as we don't know what content matters.

On the contrary, if we know that the content is mainly shapes, then a vector-graphical representation is much closer.
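A toy sketch of that idea (my own illustration, not from the thread): if we assume the content is a circle, a least-squares circle fit recovers virtually the same shape even after removing a few points, while a pixel-level comparison just counts changed pixels:

```python
# "Pixel level" vs "content level": a circle survives losing a few points.
import numpy as np

theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
x, y = 50 + 30 * np.cos(theta), 50 + 30 * np.sin(theta)   # ideal circle

def fit_circle(x, y):
    """Algebraic least-squares fit of x^2 + y^2 + D*x + E*y + F = 0."""
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -D / 2, -E / 2
    return cx, cy, np.sqrt(cx**2 + cy**2 - F)

keep = np.ones_like(x, dtype=bool)
keep[10:15] = False                        # "remove 4-5 pixels" from the circle

print(fit_circle(x, y))                    # (50.0, 50.0, 30.0)
print(fit_circle(x[keep], y[keep]))        # nearly identical: the content survives
print(np.count_nonzero(~keep), "points changed at the pixel level")
```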

One other interesting aspect is that although all our sound reproduction stages are "serial" devices
(they process one single stream of signal), the final outcome is perceived on the human level
as a buffered and rebuilt image/content. Similar to how CRT image reproduction works: a single beam of
electrons builds up a 2D image/content, and sometimes it's hard to correlate the "stream level"
signal with the content it builds up, and with the significance of the different distortions in the final outcome.
So again: measuring here with one (or more) simple, static, even continuous waveform would probably not be so meaningful,
as the final spectrum is much more complex: changes, and a continuous "change of changes", etc.

Maybe some solutions:
* visualizing the "single stream transfer function" as an actual picture could help with complex, built-up, content-level patterns (a rough sketch follows the links below)..?
* or involving AI to help us find correlations between human-ear-level results vs technical solutions..?
https://crowdunmix.org/the-best-way-to-recover-audio-distortion-using-ai/
https://www.marktechpost.com/2022/0...versarial-attacks-on-machine-learning-models/
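As a rough sketch of the first idea (my own construction, with a tanh() curve standing in for an arbitrary "black box" under test), one could render the difference between input and output spectrograms as a picture:

```python
# Render the behaviour of a simulated "black box" as a 2-D image.
import numpy as np
from scipy.signal import spectrogram

fs = 48_000
t = np.arange(fs) / fs
x = 0.8 * np.sin(2 * np.pi * 440 * t) * np.sin(2 * np.pi * 3 * t)  # modulated test tone
y = np.tanh(2.0 * x) / 2.0                    # hypothetical nonlinear device

f, tt, Sx = spectrogram(x, fs, nperseg=1024)
_, _, Sy = spectrogram(y, fs, nperseg=1024)

# The "picture": dB difference between output and input spectrograms.
diff_db = 10 * np.log10((Sy + 1e-12) / (Sx + 1e-12))
print(diff_db.shape)    # a 2-D image; display with e.g. matplotlib's imshow()
```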


What do you think..? Any thoughts are welcome!
 
I think the basic problem is that we think that distortion measurement results are directly connected to how something 'sounds'.
Distortion measurements are very useful to investigate the linearity (or not) of audio units. Nothing more, nothing less.
As such it is a useful design tool.

Your post has a number of interesting perception/hearing-related statements. The problem is that none of them has a number or parameter associated with it. That makes it very hard to define a measurement or metric that gives insight into how it will sound.

If you want to select an amplifier on the basis of best linearity at all levels and frequencies, a distortion measurement, preferably a multitone, is very useful.
It's like this: distortion measurements are useful to determine how 'HiFi' (in the original sense) a unit is, but not (100%) to determine how it will sound.
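(A minimal toy sketch of what such a multitone check can look like — my own illustration, with tanh() standing in for the amplifier under test: excite several bins at once and measure the energy that appears everywhere else:)

```python
import numpy as np

fs, n = 48_000, 1 << 16
bins = np.array([101, 211, 499, 997, 2003])       # excitation tones on exact FFT bins
t = np.arange(n)
x = sum(np.sin(2 * np.pi * b * t / n) for b in bins) / len(bins)

y = np.tanh(1.5 * x) / 1.5                        # mildly nonlinear "amplifier"

spec = np.abs(np.fft.rfft(y)) ** 2
tone_power = spec[bins].sum()
resid_power = spec.sum() - tone_power - spec[0]   # everything except tones and DC
print(10 * np.log10(resid_power / tone_power), "dB residual relative to the tones")
```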

BTW The title of the video you linked to is misleading: it does not recover audio from distortion, it only tries to recover audio from clipping.
There is a much simpler solution for that: don't clip it.
The signal distortion caused by circuit and device non-linearity cannot be recovered.

Jan
 
Hashing discards information but has the property that if a == b, then hash(a) == hash(b). The converse is not true: hash(a) == hash(b) does not imply a == b. hash(a) and hash(b) say nothing about whether a > b or a < b.

An incomplete circle is still recognizable because a circle contains very little information (constant radius in polar coordinates).

Discarding information can be useful if the information was redundant. This is the field of compression algorithms.

In audio, to run an FFT, one has to force the input to be periodic. That is a form of compression.
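(A small numerical sketch of that point, my own illustration: a tone that does not fit a whole number of cycles into the FFT block "leaks", and a window tapers the edges to fake periodicity, at the cost of discarding edge information:)

```python
import numpy as np

n = 1024
t = np.arange(n)
x = np.sin(2 * np.pi * 100.5 * t / n)     # 100.5 cycles: NOT periodic in the block

raw = np.abs(np.fft.rfft(x))
win = np.abs(np.fft.rfft(x * np.hanning(n)))

def leakage_db(s):
    """Energy outside the 5 bins around the peak, relative to the peak."""
    peak = np.argmax(s)
    mask = np.ones_like(s, dtype=bool)
    mask[max(peak - 2, 0):peak + 3] = False
    return 10 * np.log10((s[mask] ** 2).sum() / s[peak] ** 2)

print(leakage_db(raw), leakage_db(win))   # windowing strongly reduces the leakage
```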
Ed
 
I hesitate to participate here, but it’s interesting.

What happens in the box of a device and what happens inside one’s head are phenomena in very different domains.

Instruments are very good at measuring engineering work products and some of these work products have a direct effect on whether we like the resulting sound or not. Exaggerated example: An amp adds a distracting noise in the audible spectrum.

That being said, the vast majority of modern products are well engineered, thanks to the careful use of measuring instruments.
Hence, the typical metrics of engineering are no longer quality differentiators.

This is particularly true because human senses and their processing by the brain have evolved to identify signal in noise. The brain will literally modulate down noise in an attempt to find signals.

It’s easy to assume “signal” is positive and “non-signal” is negative, but that may not be true.
For instance, some people like lots of even harmonic distortion.
I even know people who had to listen to music with a background of white noise, got used to it and now prefer it. Go figure.

So, is it a return to the dark ages and good engineering no longer matters?

No, good engineering will always matter, but the bar in that realm has been raised to a level of standard that makes it less differentiating.

We are simply back to expressing preferences, which are very personal, based on our body hardware, emotional state, life experiences, environment, upbringing and who knows what else!?

In that context, finding numbers to pin to preferences may not be practical or useful.
 
If we think that a speaker is a perfect (in any sense) transformer of an electrical signal to sound pressure, then the electrical chain and its impact on sound will be perfectly characterised by: distortion, frequency response and phase. So if the premise I defined is correct, it follows that the measurements I mention are also 100% related to "sound" quality. If the premise is not correct
(which it isn't)
, well..

There is no need, I think, for any new measurement in the electrical (digital/analog) domain. Today's measurement systems exceed human hearing by far. To break new ground, I think it needs to happen in the acoustical domain. But I believe there are a few things that are interesting:

  • Envelope / wavelet analysis can be interesting (a minimal envelope sketch follows this list)...
  • Impulse analysis where the stimulus is actually an impulse and not a calculation from a steady-state sine sweep and FFT... the jolt of a natural impulse is really something else mechanically than checking in on a playing sine tone - be it a sweep....
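(A minimal sketch of the envelope idea, my own illustration: the analytic signal from a Hilbert transform yields the instantaneous amplitude of a tone burst:)

```python
import numpy as np
from scipy.signal import hilbert

fs = 48_000
t = np.arange(fs // 10) / fs                             # 100 ms
burst = np.sin(2 * np.pi * 1000 * t) * np.exp(-30 * t)   # decaying 1 kHz burst

envelope = np.abs(hilbert(burst))     # instantaneous amplitude of the burst
print(envelope[:3], envelope[-3:])    # tracks the exp(-30*t) decay (edges excepted)
```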

//
 
Your post has a number of interesting perception/hearing-related statements. The problem is that none of them has a number or parameter associated with it. That makes it very hard to define a measurement or metric that gives insight into how it will sound.

Yes, maybe even the title isn't precise, as the target would rather be to visualize the complex response spectrum of a "black box transfer system", and not "just" to give a few digits as a measurement of some narrow and meaningless aspect of distortion. I can imagine that would work much better and more intuitively, as such a method would display both the characteristic/nature of the system and some "crude" technical distortions (clipping, ringing) as well.
For example, let's imagine in video signal processing how difficult it would be to express a ghost image (sorry, I don't know the exact English term for it) or some content-level "skew" at the level of a serial signal stream or with a continuous sine-wave test signal, as the distortion may appear only when the signal transitions from one level and frequency state to another, and the pattern of these changes is the content itself. That's why a distortion can cause some "edge-finding-type" contrast, for example.


BTW The title of the video you linked to is misleading: it does not recover audio from distortion, it only tries to recover audio from clipping.
There is a much simpler solution for that: don't clip it.
The signal distortion caused by circuit and device non-linearity cannot be recovered.
The AI section I linked is (for me) just an example that AI may have the computing power to analyze very complex signals and still recognize content-level patterns.
If it can decode and generate complete new songs from singers, separating some technical distortions could be a much easier challenge if we set this as the target.

Hashing discards information but has the property that if a == b, then hash(a) == hash(b). The converse is not true: hash(a) == hash(b) does not imply a == b. hash(a) and hash(b) say nothing about whether a > b or a < b.

An incomplete circle is still recognizable because a circle contains very little information (constant radius in polar coordinates).

Discarding information can be useful if the information was redundant. This is the field of compression algorithms.

In audio, to run an FFT, one has to force the input to be periodic. That is a form of compression.
Ed
Why I brought up hashing was just as an example that for our "ears" (brains) some small differences can be irrelevant, easily tolerable and "completable", while others can be very disturbing, annoying, tiring. Maybe a method that would visualize/display the complex transfer function could help distinguish between these artifacts and could help a lot in improving our solutions.
If applied not just at the output of the amplifier but "measured" with a stereo microphone, it could catch up even with human-level listening tests.
It would be very interesting to see whether we could build such a sensitive system that would show nuance-level differences between cables, for example... 🙂
If so, the audio reproduction "industry" (or the DIYer community) could use this to very big effect.
The focus of the measurement could even be shifted to different aspects as needed. The point would be that the result is a complex-format response (a visual image, for example), yet easily readable, with much, much more info than a few THD numbers from static test signals.
And this could be used also for testing different loudspeakers (with the same system), or even the same loudspeaker while tuning the crossover in the meantime.

...
So, is it a return to the dark ages and good engineering no longer matters?

No, good engineering will always matter, but the bar in that realm has been raised to a level of standard that makes it less differentiating.

We are simply back to expressing preferences, which are very personal, based on our body hardware, emotional state, life experiences, environment, upbringing and who knows what else!?

In that context, finding numbers to pin to preferences may not be practical or useful.
Not at all. This would be just another tool that would display the nature of an audio transfer system,
beyond taste and subjectiveness, even when not producing numeric digits. (But it could if needed!)
Imagine a method where things like "warm sound" or "tight bass" or "open soundstage" would be clearly visible.
With a clever "mapping", for example, certain colors in the visualization could mean a warm sound, and odd harmonics some other color.
If this worked and were widely accepted, we could check and also tune our systems to the desired nature of reproduction:
even to engineering-level precision, or even to a "colorized" type of sound, while other technical distortions could also be managed, like local instabilities, ringing, difficulty handling heavy loads, etc.
 
There is no need, I think, for any new measurement in the electrical (digital/analog) domain. Today's measurement systems exceed human hearing by far. To break new ground,
The point is that detecting and evaluating the differences (distortions) is not trivial at all.
That's why I mentioned the CRT example.
Just measuring a simple technical quantity won't help us much if the target is the final experience.
Another example: a "low-level" engineering aspect could be DC precision.
One could say that all you have to do is measure the DC levels at the output.
If the gain is 10x, measure at 1mV, 2mV, 10mV, 100mV at the input and at the output.
One could call this a complete, 100% coverage test, but it is not, even if the measurement precision is 0.01uV.
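(A toy sketch of such a DC sweep, my own construction with a hypothetical, vanishingly small cubic error term: every static point measures as a perfect 10x within 0.01uV, yet the sweep says nothing about dynamic behaviour:)

```python
def amp(v_in):
    """Simulated amplifier: nominal gain 10 plus a tiny hypothetical cubic error."""
    return 10.0 * v_in - 2e-6 * v_in**3

for v_in in [0.001, 0.002, 0.010, 0.100]:        # 1 mV ... 100 mV
    err_nv = (amp(v_in) - 10.0 * v_in) * 1e9     # deviation from ideal, in nV
    print(f"in = {v_in * 1e3:6.1f} mV   error = {err_nv:12.6f} nV")
```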
The same is true for a typical frequency response. It gives 100% coverage, yet we all know it's also not decisive in a lot of cases,
as amps typically have a very linear response, and even so their sound is completely different.

THD is of course a next step, but clearly far from the final destination...
It also just measures these kinds of static, "DC"-like values.

With complex (real-world) signals and loads (even including speakers and room acoustics) the final outcome is much different.

My "initiative" would aim this level of complex characterization.
 
If it can decode and generate complete new songs from singers, separating some technical distortions could be a much easier challenge if we set this as the target.
That's not a valid comparison. If a signal is clipped, it can in principle be recovered, because there is a clear connection between the clipped waveform and the unclipped original. You don't need AI to recover that; some sweepable filters will do.
If a signal is distorted, information is lost. There is in principle no way to recover lost information.
You can look for, say, a 2nd harmonic, but you do not know whether it is due to distortion or whether it is part of the signal.
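(A toy numerical illustration of the clipping case — my own construction, with SciPy's cubic spline standing in for the "sweepable filters": the flat tops are easy to locate, and the missing peaks can be approximately rebuilt from the intact samples:)

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.arange(1000) / 1000.0
clean = np.sin(2 * np.pi * 5 * t)
clipped = np.clip(clean, -0.8, 0.8)

bad = np.abs(clipped) >= 0.8                  # locate the flat (clipped) tops
spline = CubicSpline(t[~bad], clipped[~bad])  # fit only the intact samples
restored = clipped.copy()
restored[bad] = spline(t[bad])                # rebuild across the gaps

print(np.max(np.abs(clipped - clean)))    # ~0.2 error on the flat tops
print(np.max(np.abs(restored - clean)))   # much smaller after reconstruction
```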

Jan
 