Digital Signal Processing - How it affects phase and time domain

Practical question then: Why does using the "Gaussian" upsampling filters on CD audio in HQ Player, before modulation to DSD256, tend upon playback to result in enhanced spatial localization cues in the virtual soundstage? What must be the tradeoffs between the time and frequency domains, etc.?
 
re' "Also, there are not many publicly available tools to synthesis IIR filters."

Maybe so, but I'll bet nearly everyone posting here has one: REW.
It has limits, but it's a good starting point, although as Mark said, not nearly so convenient as FIR-autoEQ.

To be honest, that is really the only reason why I'm always looking for a better FIR platform and software. It may be true that FIR is a crutch and can be misused, but it's also true that time is a limited commodity.

Hi Jack, yes it's amazing what REW can do. And it gets ever more amazing... if one can keep up with the updates, lol.

I so agree time is a limited resource... time and money, right?
And that's a reason I recently sprang for another auto-FIR generator that has measurement capability built in.
It lets me take a driver's measurement, establish an acoustic target for it, and then auto-generate a FIR file to give the desired response.
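The underlying idea can be sketched in a few lines of Python, with made-up frequencies and levels (the actual product surely does more; this is just the measure-target-invert concept):

```python
# Minimal sketch of the auto-FIR idea (not the actual tool): compare a
# measured magnitude response with an acoustic target and realize the
# difference as a linear-phase correction FIR. All numbers are made up.
import numpy as np
from scipy.signal import firwin2, freqz

fs = 48000
freqs = np.array([0, 100, 1000, 4000, 10000, fs / 2])   # Hz, hypothetical grid
meas_db = np.array([-20.0, -3, 0, 2, -4, -30])          # hypothetical driver response
target_db = np.array([-20.0, 0, 0, 0, 0, -30])          # hypothetical target curve

corr_db = np.clip(target_db - meas_db, -12, 12)         # limit boost/cut to something sane
taps = firwin2(4095, freqs, 10 ** (corr_db / 20), fs=fs)  # odd length -> linear phase

w, h = freqz(taps, worN=8192, fs=fs)                    # verify the realized response
```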

Been learning/practicing with it, and it's become very quick.
For grins this morning, I timed how long it takes to do the following on a 4-way synergy designed for 100 Hz up that had no processing in place.



So tasks were:
*Make raw measurements of each driver section. Red low, dark green mid, light green HF, blue VHF.
[screenshot: raw measurements of the four driver sections]

*Choose xover points based on the raw measurements. (I already had 100, 250, 750, and 4000 Hz in mind... could have chosen others; polars determine the final points.)
*Make a FIR file for each based on those xover points (used my standard 96 dB/oct linear-phase xovers; see the crossover sketch after the timing summary below). Let the files handle relative levels too.
*Measure each driver section post-FIR, to verify the FIR filters' in-band responses and xovers.
[screenshot: post-FIR measurements of each driver section]

*Input the time differences between the driver sections' impulse peaks to determine the delays to insert into the processor.

[screenshot: impulse responses of the driver sections]

All channels are delayed to the low section (red), which is the last to arrive. For example, using HF:
the red vertical bar is at the low section's impulse peak, and the blue vertical bar is set at the HF (green) peak.
The difference, 1.208 ms, is what HF gets as delay in the processor (a sketch of this step is at the end of this post).
[screenshot: impulse peaks with red and blue timing cursors]


*Make a final measurement of the entire speaker to verify.

[screenshot: final full-speaker measurement]

OK, elapsed time to do all the above.
22 minutes.
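
Regarding those 96 dB/oct linear-phase xovers: here is a minimal sketch of one common way to build a complementary linear-phase pair (my own construction, not necessarily what the tool does; the exact slope depends on tap count and window):

```python
# Sketch of a complementary linear-phase crossover pair: the highpass is
# formed as delayed-delta minus lowpass, so the two legs sum back to a
# pure delay by construction.
import numpy as np
from scipy.signal import firwin

fs, fc, n = 48000, 750, 4097          # odd tap count -> symmetric, linear phase

lp = firwin(n, fc, fs=fs)             # linear-phase lowpass at the xover point
hp = -lp
hp[n // 2] += 1.0                     # delta - lowpass = complementary highpass

recon = lp + hp                       # exactly a unit impulse at sample (n-1)/2
print(np.argmax(recon), recon.max()) # -> 2048, 1.0
```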

Now this was just a single-mic-position, 2-foot indoor tuning. Not at all the outdoor, off-a-deck, 3 m, multiple-vertical-mics, and spinorama work it takes for optimal polar confirmation. (That's where all the work is imo/ime.)
Simple 1/24-octave smoothing on all sections. No windowing, no noise reduction, no yada.
What I'm trying to show with this example is that when this degree of automation can be accomplished so easily and so quickly,
it makes me much more willing to do measurement-based xover experiments: optimizing polars, frequency-dependent windowing, impulse reflection removal, various FIR filter smoothing comparisons, etc.

And I get to HEAR the experiments...not just look at what models might tell me.
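
And for the delay step above, a minimal sketch using stand-in impulse responses in place of real measurements:

```python
# Sketch of the delay step: locate each section's impulse peak and delay
# the earlier arrivals to the low section. Stand-in IRs, not real data.
import numpy as np

fs = 48000
t = np.arange(4096)
imp_low = np.sinc((t - 2000) / 8.0)   # pretend low-section IR, peak at sample 2000
imp_hf = np.sinc((t - 1942) / 2.0)    # pretend HF IR, arriving 58 samples earlier

peak_low = np.argmax(np.abs(imp_low))
peak_hf = np.argmax(np.abs(imp_hf))

print((peak_low - peak_hf) / fs * 1e3, "ms")   # ~1.208 ms of delay for HF
```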
 
No, it's about more than that. The math (in this case it's actually arithmetic) is straightforward. But what is needed is an understanding of the interplay between the frequency domain and the time domain -- bandwidth and duration, causality and delay, spectral shape and time-domain oscillation, etc. Those plus the standard implementation considerations of numerical precision and accuracy, execution speed, and so on.
Of course. But no need to make it harder than it has to be 😉
 
Practical question then: Why does using the "Gaussian" upsampling filters on CD audio in HQ Player, before modulation to DSD256, tend upon playback to result in enhanced spatial localization cues in the virtual soundstage? What must be the tradeoffs between the time and frequency domains, etc.?

I don't know if this helps, but in the case of traditional minimum-phase analogue filters, the filters that have a more or less linear phase response in their passbands, little or no overshoot, and a rather smooth transition from passband to stopband (and therefore rather poor rejection in the first octaves above cut-off) are all approximations to an ideal Gaussian filter in some way or other. I mean things like Bessel filters, Gaussian magnitude filters and 0.05 degree equiripple linear phase filters.
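
As a rough illustration of that family resemblance (my own sketch, with an arbitrary 4th order and 1 kHz cut-off): a Bessel filter's group delay is nearly flat in the passband, i.e. its phase is close to linear, unlike e.g. a Butterworth of the same order:

```python
# Compare analog Bessel and Butterworth group delay (seconds).
import numpy as np
from scipy.signal import bessel, butter, freqs

w0 = 2 * np.pi * 1000
b1, a1 = bessel(4, w0, analog=True, norm='delay')
b2, a2 = butter(4, w0, analog=True)

w = 2 * np.pi * np.logspace(1, 4, 500)       # 10 Hz .. 10 kHz
_, h1 = freqs(b1, a1, w)
_, h2 = freqs(b2, a2, w)

gd_bessel = -np.gradient(np.unwrap(np.angle(h1)), w)
gd_butter = -np.gradient(np.unwrap(np.angle(h2)), w)
# gd_bessel stays near 1/w0 ~ 159 us across the passband; gd_butter peaks
# near cut-off, the time-domain signature of its overshoot and ringing.
```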

Unfortunately the manual of the latest versions of HQPlayer is only available when you install the program, which I don't intend to do, so I have no way to check if this is applicable.
 
To me that appears to be a question for psychoacoustics, not strictly DSP.
It's readily available information: https://en.wikipedia.org/wiki/Sound_localization
Some of the main cues include ratio of direct to reflected sound for distance, and ITD (Interaural Time Difference -- effectively phase accuracy between channels) within a few microseconds for precise lateral localization.
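
For scale, a back-of-envelope ITD estimate using the Woodworth spherical-head approximation (the head radius below is a nominal guess):

```python
# Woodworth-style ITD: r/c * (theta + sin(theta)) for azimuth theta.
import numpy as np

c, r = 343.0, 0.0875                 # speed of sound (m/s), head radius (m)

def itd(az_deg):
    th = np.radians(az_deg)
    return r / c * (th + np.sin(th))  # seconds

print(itd(90) * 1e6)                 # ~656 us: maximum lateral ITD
print((itd(1) - itd(0)) * 1e6)       # ~9 us per degree near straight ahead
```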

Marcel: The HQ Player manual does not give much detail. Mainly only the name of the filter. So the question is really one of preservation (and or enhancement) of cues during processing.

EDIT: I knew I was taking a chance asking to see if a practical problem would be seriously addressed. Too often experts turn up their noses when faced with interdisciplinary problems (or no-man's-land type problems, ones that lie in the cracks between non-overlapping disciplines).
 
Practical question then: Why does using the "Gaussian" upsampling filters on CD audio in HQ Player, before modulation to DSD256, tend upon playback to result in enhanced spatial localization cues in the virtual soundstage? What must be the tradeoffs between the time and frequency domains, etc.?

It's readily available information: https://en.wikipedia.org/wiki/Sound_localization
Some of the main cues include ratio of direct to reflected sound for distance, and ITD (Interaural Time Difference -- effectively phase accuracy between channels) within a few microseconds for precise lateral localization.

I have two comments on this, one related to time resolution and one general one:

Time resolution
Theoretically, when you have a signal chain consisting of an ideal anti-aliasing filter, a sampler that just samples and doesn't quantize, and an ideal anti-imaging filter, the whole chain is linear time invariant from input to output, even though the sampler itself is anything but time invariant. Hence, shifting the input signal of the whole chain by 1 ps leads to a 1 ps shift of its output signal, even when the sample rate is low.

Of course, in real life, nothing is ideal, but with a non-ideal but pretty good anti-aliasing filter, a dithered ADC with a reasonable number of bits and a non-ideal but pretty good anti-imaging filter, the response from input to output is still close to time invariant. That is, the time resolution is definitely not limited to multiples of the sample time.
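
A minimal numpy sketch of that point: delaying a band-limited 1 kHz tone by 1 us, a small fraction of the 20.8 us sample period at 48 kHz, with a windowed-sinc fractional-delay filter (filter length and window are arbitrary choices):

```python
# Time resolution below the sample period: delay a band-limited tone by
# 1 us (0.048 samples at fs = 48 kHz), then compare against the
# analytically delayed tone.
import numpy as np

fs = 48000
n = np.arange(4800)
x = np.sin(2 * np.pi * 1000 * n / fs)               # band-limited test tone

d = 1e-6 * fs                                       # 1 us expressed in samples
k = np.arange(-32, 33)
h = np.sinc(k - d) * np.hamming(len(k))             # fractional-delay FIR

y = np.convolve(x, h, mode='same')
x_ref = np.sin(2 * np.pi * 1000 * (n / fs - 1e-6))  # ideal 1 us delayed tone

print(np.max(np.abs(y[64:-64] - x_ref[64:-64])))    # small -> 1 us shift preserved
```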


General
Regarding your observation that Gaussian filters tend to result in enhanced spatial localization cues, presumably compared with ordinary steep phase-linear FIR filters, I could think of these possibilities, in random order and with no guarantee that the list is complete:

A. You have less intersample overshoot issues with the Gaussian filter
B. When using a steep phase-linear filter, you get distracted by its pre- and post-ringing
C. You like the slight treble loss caused by the smooth roll-off of the Gaussian filter, it somehow helps you to concentrate on the localization cues
D. Your observation might be incorrect
E. The pre- and post-echoes of a steep filter with passband ripple disturb you

Assuming a 20 kHz cut-off frequency of the steep filter, the pre- and post-ringing is due to the absence of everything above 20 kHz. Hence, B can only be applicable if you can notice the absence of signals above 20 kHz somehow.

Regarding E, I mean the echoes related to the passband ripple and the finite length of the filter, as identified by R. Lagadec and T. G. Stockham, "Dispersive models for A-to-D and D-to-A conversion systems", Audio Engineering Society preprint 2097, presented at the 75th convention, March 1984. I've summarized it in appendix B of https://linearaudio.net/sites/linearaudio.net/files/03 Didden LA V13 mvdg.pdf .

In a nutshell, suppose that the filter has linear phase and that its passband ripples have equal frequency distances. The filter can then be regarded as equivalent to a cascade of two linear-phase filters, a low-pass with perfectly flat passband and a filter that only has equidistant ripples all the way up to the Nyquist frequency. A filter with an impulse response consisting of a pre-echo, a main response and a post-echo produces exactly this kind of equidistant ripple response. We did a little experiment on this on this forum some three years ago, see the attachment.
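
A quick numerical check of that equivalence, with arbitrary echo amplitude and spacing:

```python
# An impulse response with small pre- and post-echoes around a main tap
# has a magnitude response with equidistant cosine ripple:
# |H(f)| = 1 + 2a*cos(2*pi*f*N) for echoes a at +/-N samples.
import numpy as np

a, N, nfft = 0.01, 64, 4096
h = np.zeros(2 * N + 1)
h[0] = a          # pre-echo
h[N] = 1.0        # main response
h[2 * N] = a      # post-echo

H = np.abs(np.fft.rfft(h, nfft))
f = np.arange(len(H)) / nfft                         # cycles/sample
print(np.max(np.abs(H - (1 + 2 * a * np.cos(2 * np.pi * f * N)))))  # ~0
```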
 

[attachment: pre/post-echo listening test report]

@MarcelvdG, thank you for the thoughtful responses. Seems to me there may be more going on as well, since the HQ Player manual doesn't say much, if anything, about computational tradeoffs of the types that @gberchin alluded to.

One thing I might share is that your RTZ DSD DAC (if well implemented) is very good at reproduction of soundstage. However, in terms of the effects of external circuitry and/or processing outside the DAC itself, both clock phase noise and PCM-to-DSD256 upsampling/conversion algorithms seem to have a substantial effect on audible spatial cue reproduction (referring here to the ability to produce a spacious and convincing soundstage between, behind, and somewhat beyond the width of the distance between the speakers -- OTOH, a narrow soundstage in front of the speakers usually turns out to be a jitter problem).
 
Regarding your observation that Gaussian filters tend to result in enhanced spatial localization cues, presumably compared with ordinary steep phase-linear FIR filters, I could think of these possibilities, in random order and with no guarantee that the list is complete:

A. You have less intersample overshoot issues with the Gaussian filter
B. When using a steep phase-linear filter, you get distracted by its pre- and post-ringing
C. You like the slight treble loss caused by the smooth roll-off of the Gaussian filter, it somehow helps you to concentrate on the localization cues
D. Your observation might be incorrect
E. The pre- and post-echoes of a steep filter with passband ripple disturb you
Gaussian filters like IIR, or Gaussian windowing for FIR filter generation?
Not sure what you mean?
 
Neither am I...

The discussion started with Mark's remark about HQPlayer's Gaussian interpolation filters, but neither of us knows exactly what those filters are. I assumed it's something with a more or less Gaussian impulse and magnitude response, but it might as well be Gaussian windowing.
 
Well look, how many of us back in high school math class were given word problems with extra information that wasn't needed? Some people couldn't handle it. Other people knew, "we don't need that piece of information."

Now, I wasn't trying to fool anyone by adding useless information to the question. However, whether or not the Gaussian part is necessary to know, there must be some things about DSP that help or hinder the preservation and/or enhancement of spatial localization cues. Part of it has to be plain numerical accuracy of processing. Part of it may have to do with very small FR variations (<1 dB) that may emphasize frequencies the ear most relies upon for localization, etc.
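
On the numerical-accuracy part, here is a toy comparison (arbitrary filter and signal, my own sketch) of the same convolution done in 32-bit and 64-bit floats, just to show the kind of error floor processing precision sets:

```python
# Same FIR convolution accumulated in float32 vs float64.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)
h = rng.standard_normal(8192) / 8192

y64 = np.convolve(x, h)
y32 = np.convolve(x.astype(np.float32), h.astype(np.float32)).astype(np.float64)

err_db = 20 * np.log10(np.max(np.abs(y64 - y32)) / np.max(np.abs(y64)))
print(err_db)   # on the order of -120 dB for float32 accumulation
```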

Other than that, it may be that some DACs are capable of being more linearly matched between channels when reproducing filtered audio with less severe rates of phase shift in the audio band, etc. IOW, there is no strict claim made that the problem is purely a mathematical one, nor maybe only a DSP math problem combined with a little basic psychoacoustics.

The question is more to the effect of: "If you are an expert at any of this, how do you proceed to make any progress at all towards answering the question?" Otherwise maybe we are just as well off letting mark100 hammer away as he does with his filter designs, and maybe by sheer luck and pluck he'll solve the problem the experts are afraid to touch?

EDIT: Now another observation to share: IME, one should NOT use ferrites in clocking circuitry or in its power supplies, or it will only make the problem harder.
 
Guys, if I am being too provocative then please accept my apologies. TBH, I fully agree that it's good for folks to understand the deeper math. However, after all the talk about the merits of developing deeper expertise, except for comments by MarcelvdG, we haven't started thinking all that deeply about a possibly somewhat gnarly real-life example problem. So was the prior talk of merits mostly sales pitch, or is the example problem too far out?
 
Gaussian filters like IIR, or Gaussian windowing for FIR filter generation?
Not sure what you mean?

Getting back to this: if the HQPlayer Gaussian filter has a fairly long, Gaussian-windowed or confined-Gaussian-windowed sinc impulse response rather than the approximately Gaussian impulse response I assumed, that pretty much eliminates options A, B and C of post #109, leaving only D, E and everything I didn't come up with. It seems likely that this is the case, as a Gaussian impulse response filter would not suppress the first image much.
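
For what it's worth, a small numpy comparison of the two interpretations (the widths below are my guesses, not HQPlayer's actual design) shows the difference in image suppression:

```python
# (1) an approximately Gaussian impulse response barely suppresses the
# first image when upsampling; (2) a Gaussian-windowed sinc of the same
# length can suppress it deeply.
import numpy as np

ratio = 8                                   # e.g. 44.1 kHz -> 352.8 kHz
n = np.arange(-256, 257)

h_gauss = np.exp(-0.5 * (n / 3.2) ** 2)     # ~Gaussian IR, arbitrary width
h_gsinc = np.sinc(n / ratio) * np.exp(-0.5 * (n / 64.0) ** 2)

for name, h in (("gaussian IR", h_gauss), ("gauss-windowed sinc", h_gsinc)):
    H = np.abs(np.fft.rfft(h / h.sum(), 8192))
    print(name, 20 * np.log10(H[8192 // ratio]))  # level at the first image
```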

Test person 2 from the report attached to post #109 is TFive, so @Tfive , what did the difference between the signals with and without pre- and post-echoes sound like to you? Was the stereo imaging different or was it something else?
 
In casual listening I would probably never have noticed any difference at all. I used my own audio player, where I could seamlessly switch between the three provided versions, with the option to go back to a configurable timestamp, so I could listen to the same passage of a song over and over again. With this technique I sat down and listened carefully to try to make out any differences. Here are my observations:
  • I could not find any obvious differences in tonality (except for the last point below), so no "EQ effect" was to be heard.
  • There was no perceived difference in stereo imaging.
  • Reverb tails seemed to be slightly affected by the pre/post echoes: they seemed to die down faster. In other words, they seemed clearer and longer when listening to the unaffected file.
  • Most obvious, though I repeat that it was still very hard to detect, was a certain "harshness" or "graininess" in the sound when listening to the affected files.
I had to spend about 5 minutes per song, switching back and forth between the three versions, analyzing around 3-4 sections of each song this way (quieter passages, louder passages, passages with the most obvious reverb tails).

As mentioned in the paper by Marcel, I did a retest under the influence of cannabis, which is now - finally - legal to consume and grow here in Germany. I thought that I could detect differences much more easily, which turned out to be total garbage in the end. Totally random results, who would have thought 😀

I did a second retest (now sober, like in the first test) where I could successfully discern three out of the four files. Again this was "hard work" and really careful listening was required.

HTH
 
Some more comments on other stuff I observed when playing around with EQing and building crossovers with DSP:
  • Doing IIR crossover filters, e.g. with a 24 dB/octave Linkwitz-Riley characteristic, and then applying a phase-correction FIR filter generated by rephase made no audible difference to me. I did not blindly A/B that, though. Might do that some time soon. So far I have no pressing need for any FIR filters in my system.
  • One very obvious downside to long FIR filters is the time delay they introduce. When listening to music this is irrelevant; when listening to the audio of a video stream/file (which I do a lot with my systems) it is very important. More than ~50 ms becomes obvious in music videos, where the drummer's movements appear to be ahead in time of the sound. Very annoying to watch. For spoken dialog this could be relaxed to ~100 ms, I guess (see the latency arithmetic after this list).
  • I had the pleasure of being part of two measurement/listening sessions, one with a three-way horn system and one with a five-way horn system, and I did my own FAST two-way system. During these sessions we measured time-delay differences between drivers, separately for the left and right channels. For this I used the time-alignment measurement capabilities based on so-called wavelets. This is included in my software, pulseaudio crossover rack. See screenshot here:
    [screenshot: wavelet time-alignment view in pulseaudio crossover rack]
    In all three cases there was a very obvious change in the tonality of the systems (though it might be that we just used a better EQ correction than before!) and, most obviously, the "tightness" and "punch" of the sound were much improved. Or in other words: the systems started to make you nod your head and tap your feet a lot more. 🙂
    The hardware/software used was totally different in all three cases: one time a high-end audio crossover whose brand and name I don't remember, one time two stacked Behringer DCX2496s with S/PDIF input for the five-way system, and in the case of my own system a Focusrite Scarlett 16i20 1st gen together with pulseaudio crossover rack.
    EDIT: there's a discussion of this method here: https://www.diyaudio.com/community/...rding-wavelets-and-phase-measurements.384660/
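
On the latency point above, the arithmetic is simply that a symmetric (linear-phase) FIR delays by half its length, regardless of its response:

```python
# Latency of a linear-phase FIR: (N - 1) / 2 samples.
fs = 48000
for taps in (4096, 16384, 65536):
    print(taps, "taps ->", (taps - 1) / 2 / fs * 1e3, "ms")
# 4096 taps is ~43 ms (borderline for video); 65536 taps is ~683 ms.
```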
 
A filter synthesis tool delivers such parameters, e.g. taking a measurement as input and spitting out an EQ filter.

Not all DSPs are created equal. While they should give you the same results with the same settings, this is often not the reality.

Bennett Prescott asked several people in 2011 to load an 'imaginary' loudspeaker preset, and in return got back almost two dozen different results.

Results are on his website under 'DSP differences'.

Measured results trump theoretical ones. When the auto-generated filter does not match reality, then what? Go back to the math that says it should be good?
 
I think maybe the phase behavior of the DSP platform, and the phase difference between drivers when they are not in perfect alignment, are underestimated.
It is difficult to assess how the different filters will interact when an extra phase difference is introduced and the filter type is unknown.
The Linkwitz-Riley filter is well understood with regard to phase and to summing the responses of two drivers with a time difference. The differing polar responses of the drivers add to the complexity.
So an IIR realization of an LR24 is, in my opinion, the best real-world solution when not all acoustic parameters of the drivers and box are known.
Steeper slopes made with FIR or IIR can give some nasty artifacts when a phase difference is introduced (a quick numerical illustration follows below).
Understanding the math of an LR filter is very useful for understanding the complexity of summing the responses of two drivers.

Also, FIR or IIR filters used near Nyquist, or the phase influence of LP filters, might fool the operator using the processor.
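
A rough numerical illustration of that LR24 point (idealized, no real driver or box acoustics; frequencies and offset are arbitrary):

```python
# An LR24 low/high pair sums flat, but a small time offset on one leg
# puts ripple in the crossover region.
import numpy as np
from scipy.signal import butter, sosfreqz

fs, fc = 48000, 1000
# LR24 = two cascaded 2nd-order Butterworth sections per leg
sos_lp = np.vstack([butter(2, fc, 'low', fs=fs, output='sos')] * 2)
sos_hp = np.vstack([butter(2, fc, 'high', fs=fs, output='sos')] * 2)

w, h_lp = sosfreqz(sos_lp, worN=4096, fs=fs)
_, h_hp = sosfreqz(sos_hp, worN=4096, fs=fs)

for offset in (0.0, 0.2e-3):                 # perfect alignment vs 0.2 ms off
    h_sum = h_lp + h_hp * np.exp(-2j * np.pi * w * offset)
    band = (w > 500) & (w < 2000)
    ripple = np.ptp(20 * np.log10(np.abs(h_sum[band])))
    print(offset * 1e3, "ms offset ->", round(ripple, 2), "dB ripple")
```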