Digital Room Correction Project

I have commenced a project to do digital room correction (DRC) of my sound system. I thought it might be useful to report on the techniques used and the progress of the project.

I begin by describing my existing use of DRC techniques and then describe how I intend to enhance the system.

Existing System

The existing system has been in place now for about 6 weeks. It has convolution filters running under the Squeezebox system. The equipment used to create and play the filters is:

  • Test signal generation: sine wave log sweep, played via the SqueezeBox receiver running the Inguz Audio test signal plug-in
  • Mic and mixer: Behringer omni-directional microphone ECM8000 (BEHRINGER: ECM8000) and Behringer UB1204 mixer with phantom power.
  • Recording software and hardware: Audacity under Windows, running on a laptop. Mixer output was fed to a Creative Audigy 2ZS PCMCIA sound card in the laptop.
  • Filter generation: DRC program by Denis Sbragion, running under Windows. (DRC: Digital Room Correction, freeware)
  • Filter convolver: Inguz Audio plug-in for Logitech Squeezebox (http://inguzaudio.com/RoomCorrection/, freeware)
The process used was:

  • a 20 Hz to 20 kHz sine wave log sweep test signal was played through the existing sound system. The signal lasted about 30 seconds. Each speaker played the signal separately to excite the room response. The SPL of the signal was manually adjusted to get a good recording level, based on a visual check of the Audacity recording screen during the recording.
  • the room response of the system was measured with the microphone positioned in the best listening position. The mic was on a mic stand, with the mic capsule oriented vertically at listener ear height. The listening setup is a standard triangle with a stereo pair of speakers, and with the prime listening position at the apex of the triangle. Speakers are at ear height, about 1.58 metres apart, and measured from their midpoint some 2.1 metres from the listening position. There is no acoustic room treatment in the room. Approximate room dimensions are 4 metres (w), 4 metres (d), 4.5 metres (h). The room is furnished with rugs, window drapes and fabric couches.
  • the room measurements were processed through the DRC program. This has a range of target curves that can be selected. I opted for the ‘erb’ curve which uses a psychoacoustic target that is fundamentally a flat target curve but with a slight roll-off of the bass response. The program created a correction filter for each speaker.
  • the left and right correction filters were converted into a stereo .wav file. The stereo filter was loaded into the Squeezebox system using the Inguz Audio plug-in. In my Squeezebox system iTunes is the media source for music files. All files are in Apple lossless format. When playing music the data stream is extracted in real time from iTunes then convolved at the PC server with the correction filter, then transmitted by WiFi to the Squeezebox receiver in the listening space. From there it goes into the analogue input chain of amplifiers and speakers.
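As an aside, the log sweep test signal described in the first step can be sketched in a few lines of Python. This is only an illustration of the technique, not the Inguz plug-in's actual code; the 20 Hz to 20 kHz range and roughly 30-second duration follow the description above:

```python
import numpy as np

def log_sweep(f0=20.0, f1=20000.0, duration=30.0, fs=44100):
    """Exponential (log) sine sweep from f0 to f1 Hz, lasting 'duration' seconds."""
    t = np.arange(int(duration * fs)) / fs
    k = np.log(f1 / f0)
    # Instantaneous phase of an exponential sweep: the frequency starts at f0
    # and reaches f1 exactly at t = duration.
    phase = 2 * np.pi * f0 * duration / k * (np.exp(t / duration * k) - 1.0)
    return np.sin(phase)

sweep = log_sweep()           # ~30 s of test signal at 44.1 kHz
print(len(sweep) / 44100)     # 30.0 seconds
```

The recorded response to a sweep like this is what the filter-generation software works from.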
The sound of the existing system is excellent. In particular the bass is very tight and controlled. It is much better than the 1/3 octave equaliser solution I had been using before the shift to digital room correction filters.

Proposed System

The goal of the proposed system is to recreate, as far as possible, the sound environment of a recording space that is physically larger than the listening space. That is, to increase the chance that the listener suspends disbelief and accepts the illusion that the room is larger than it actually is.

The three main changes to the existing setup to achieve this are:

  • running the filters on a dedicated PC rather than through the Squeezebox system.
  • moving from stereo to multi-speaker reproduction. Specifically from a 2.1 system to a 6.1 system.
  • running separate filters on each of the four surround speakers using impulse responses recorded from large acoustic spaces.
There are three advantages to running the filters on a PC compared with the Squeezebox. First, with an appropriate sound card and other hardware, more than two channels of audio can be processed, which allows for experimentation with surround sound formats. Second, at present only the audio output from the Squeezebox is processed through the filters; I have a DVD/CD player and home theatre system whose outputs can’t currently be put through them. Third, although the Inguz Audio plug-ins are really excellent software (and free!), they require the Squeezebox software to remain at version 7.2 or below, so I have had to re-install old versions of the Squeezebox server software to remain compatible with the Inguz Audio software.

Equipment list for proposed setup:

  • Test signal generation: sine wave log sweep, played from the filter software running on the laptop.
  • Mic and mixer: Behringer omni-directional microphone ECM8000 and Behringer UB1204 mixer with phantom power.
  • Recording software and hardware: Not yet decided. I am leaning towards the Audiolense software by Juice HiFi running under Windows on a laptop – but there are other options such as Acourate by Ulrich Bruggemann (http://www.acourate.com/). The Audiolense product seems better at driving the Audigy card during recording. Both products are payware. The sound card will remain the Creative Audigy 2ZS PCMCIA sound card.
  • Filter generation: Probably the same software as the recording software, Audiolense running under Windows.
  • Filter convolver: BruteFIR running under Linux (BruteFIR, freeware).
  • PC for running the filters: a fan-less, small PC with an appropriate sound card, running Linux.
  • Analogue input and output: Behringer ADA8000 DAC/ADC unit (BEHRINGER: ADA8000). This provides 8 channels of simultaneous input and 8 channels of simultaneous output via two ADAT connectors.
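Since BruteFIR appears in this list, it may be worth sketching the core operation a convolver performs: FIR convolution of the audio stream with the correction filter, computed via FFTs. This toy offline version (the filter taps are made-up example values) is only a sketch; BruteFIR itself uses heavily optimised partitioned convolution in real time:

```python
import numpy as np

def convolve_fir(signal, fir):
    """FFT-based linear convolution of a signal with an FIR correction filter.
    This is the core operation a convolver performs; BruteFIR does it in
    real time using overlapped, partitioned blocks."""
    n = len(signal) + len(fir) - 1
    nfft = 1 << (n - 1).bit_length()   # round up to a power of two
    out = np.fft.irfft(np.fft.rfft(signal, nfft) * np.fft.rfft(fir, nfft), nfft)
    return out[:n]

# Convolving an impulse with a filter returns the filter taps themselves
fir = np.array([0.5, 0.3, 0.2])
impulse = np.array([1.0, 0.0, 0.0, 0.0])
print(convolve_fir(impulse, fir))
```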
What’s been purchased so far:

  • Filter convolver: BruteFIR running under Linux (well not purchased, more like downloaded and compiled under Linux).
  • PC for running the filters: I purchased a small PC from eBay for $200. It has one PCI slot and a PCMCIA slot. It uses a VIA 800 MHz Mini-ITX motherboard with a fan-less heat sink and has 512 MB of memory, more than enough to run the filters. It does not have a hard disk. Initially I intend to run the BruteFIR convolver from a USB stick loaded with a disk-less version of Linux, “SPBlinux” (SPB-Linux 2.1 beta), and the BruteFIR software.
  • Sound card: RME Digi9636 Hammerfall PCI sound card (RME Intelligent Audio Solutions - Hammerfall Lite). This is a superseded card, which I bought on eBay for about $200. It has an S/PDIF digital input and output channel, and four ADAT connectors that can be configured as 16 input channels and 16 output channels at 44.1k or 48k sample rates.
  • Analogue input and output: Behringer Pro-8 ADA8000 DAC/ADC unit.
Next step: get the mini PC and BruteFIR software working for just 2 channels to confirm that the PC/sound card combination and the Behringer ADA8000 link is working.

I’ll provide updates as things progress.

I am happy to answer any questions or explain the project in more depth for those that are interested.
 
Nice plan, hats off.
Thanks.

How are you going to transfer audio between the playback NTB and the brutefir linux filter?
I'm going to answer this based on the NTB you mention being the same as the Squeezebox receiver I have in the lounge room. If I have misunderstood your question please let me know.

The S/PDIF coaxial output from the Squeezebox receiver will be input into the coax input of the RME sound card in the PC. The correction filters running on the PC (under BruteFIR) will convolve the input and send the digital output to the RME’s ADAT channels. Then, via Toslink optical cable, the output will go to the Behringer ADA8000, where the data stream is converted to line-level analogue. From there it’s just a normal analogue signal, which is fed into the amps and speakers.
 
Some questions:
1. Why did you choose the erb configuration file? Why not stronger correction?
2. Does the convolution process use a 32-bit floating point signal? And what data format is fed to the DAC? 16 bits?
3. How do you set the volume (or the magnitude of amplification after convolution)? I mean, in order to achieve higher resolution, supposedly the signal must be normalized...
4. Can you talk more about your amplifiers and speakers?
5. Did you use pa-xx.x.txt as your target frequency response curve? Given the room dimensions (max 4.5 m), it's impossible to generate bass under 38 Hz...
6. Is any microphone frequency response correction applied?
7. It looks like iTunes plays the music, then the Inguz Audio plug-in does the convolution. Sorry, I have no idea how the Squeezebox works. Does it work just like a sound card, with iTunes outputting music to it?
 
Some answers

Some questions: 1. Why did you choose the erb configuration file? Why not stronger correction?
I found the bass response for the room was smooth and tight using the erb target. The stronger targets produced a slight boom in the bass. I like to compare the room response to what I remember from live concert performances, and they have a seamlessness from mid to bass that really appeals to me. That’s what I’m trying to reproduce in the listening room.
2. Does the convolution process use a 32-bit floating point signal? And what data format is fed to the DAC? 16 bits?
The DRC program produces two files, one for each channel. I understand they are 32-bit floating point files. They are spliced together by an Inguz Audio program to produce a stereo .wav file that is loaded into the SqueezeBox server software to convolve the data stream before it is transmitted to the receiver.
3. How do you set the volume (or the magnitude of amplification after convolution)? I mean, in order to achieve higher resolution, supposedly the signal must be normalized...
It’s a trial and error process. The output after the filter is applied is often over the normal digital 0 dB limit, so I set a variable in an Inguz configuration file that attenuates the original signal to make room for the effect of the filter. At present the attenuation is -19 dB. It’s a compromise: it’s been set to prevent clipping of the loudest music I play, but for some recordings it means the amp gain has to be turned up quite a bit to compensate for the attenuation.
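To make the headroom arithmetic concrete, here is a small sketch. The -19 dB figure is the attenuation described above; the 12 dB filter boost is a made-up example value, not a measurement from my filters:

```python
def db_to_gain(db):
    """Convert a decibel value to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)

attenuation_db = -19.0   # the Inguz attenuation setting described above
filter_boost_db = 12.0   # hypothetical worst-case boost in a correction filter

# A full-scale (1.0) sample hit by the filter's biggest boost, after attenuation:
peak = 1.0 * db_to_gain(filter_boost_db) * db_to_gain(attenuation_db)
print(round(peak, 3))    # well under 1.0, so no digital clipping
```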
4. Can you talk more about your amplifier and speaker?
I have a 4.1 setup. The front left and right speakers are modified Mirage Omnisat V2. I like their innovative speaker geometry. They use a specially constructed parabolic “wave guide” above the midrange and tweeter drivers that disperses the sound wave through a large angle into the listening space. This is designed to overcome the narrowing of the power response as frequency increases from the omni-directional low frequencies to the directional high frequencies.

[Image: close-up of the Mirage Omnisat]

Also,

  • The speakers are a small acoustic radiator relative to the size of the room. This means they are closer to the ideal of a point source radiator.
  • The enclosures are rigid and non-rectangular, so they are less likely to emit out-of-phase secondary radiation into the listening area compared with rectangular wooden box designs.
They also have some practical advantages: they are sufficiently small and light to be placed on adjustable stands and easily moved around the room, and they come with built-in mounting screws that can be used with inexpensive home theatre stands.

I have mounted them so that the midrange driver is at the same height as the ears of the listener when sitting on the lounge at the optimal listening position.

The Omnisats use a very simple (3 component!) passive crossover. I removed the passive crossover board and replaced the banana plug speaker cable connectors with RCA connectors. That gave me one RCA cable per driver so I can run a simple stereo RCA cable to each enclosure from the amplifier.

I left the simple foam padding inside each enclosure that is part of a normal acoustic suspension design.

Due to the size and type of the midrange driver, and the small enclosure size, the Omnis are not going to have much bass output below 90 Hz. So I decided to add the Siegfried Linkwitz designed “Pluto” subwoofers as front bass units (see his website, Linkwitz Lab - Loudspeaker Design; this is the same Linkwitz of Linkwitz-Riley crossover design fame).

They seemed an ideal match for the satellites given they were designed to marry with omni-directional satellites with a low frequency roll-off.

Front Satellite Crossovers

I use a Behringer DCX2496 digital crossover for the front speakers. It provides 6 output channels which is perfect since I need 3 channels per side.

The Behringer has a number of different types of crossover filters (Butterworth, Bessel and Linkwitz-Riley), with a variety of slopes. I chose Linkwitz-Riley crossovers with a slope of 48 dB per octave.

The front bass units are crossed over at 100Hz. The crossover from midrange to tweeter is 3kHz. The passive crossover had used 2.7kHz. I adjusted the crossover value whilst listening to the speakers and found the 3kHz point sounded smoother than the 2.7kHz point.
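For anyone curious why Linkwitz-Riley crossovers are attractive, their defining property (the in-phase low-pass and high-pass outputs sum to a flat response) can be checked numerically. This sketch works with magnitudes only, using the fact that an LR filter of order 2n is a squared Butterworth filter of order n:

```python
import numpy as np

def lr_magnitudes(f, fc, order=8):
    """Low-pass and high-pass magnitude responses of a Linkwitz-Riley
    crossover of the given order (order 8 = 48 dB per octave)."""
    r = (np.asarray(f, dtype=float) / fc) ** order
    return 1.0 / (1.0 + r), r / (1.0 + r)

f = np.logspace(np.log10(20.0), np.log10(20000.0), 500)
lp, hp = lr_magnitudes(f, fc=3000.0)       # the 3 kHz, 48 dB/oct point above

# In-phase LP + HP outputs sum to a flat (unity) magnitude response
print(np.allclose(lp + hp, 1.0))           # True
# Each section is 6 dB down (magnitude 0.5) at the crossover frequency
lp_fc, _ = lr_magnitudes([3000.0], fc=3000.0)
print(round(20 * np.log10(lp_fc[0]), 2))   # -6.02
```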

Front Satellite Amplification

The front bass units are powered by a second hand Kenwood power amp. The satellites are powered by a second hand Yamaha RX-V520 AV receiver where I use the four 70W power amps in the receiver in external decoder mode.

Other speakers

OK, so that’s for the front speakers. For the rear speakers I use a pair of unmodified Spherex speakers (no longer manufactured) which are mounted on adjustable stands. These are the same design as the Mirage front speakers (they licensed the technology), but with smaller midrange and tweeter drivers.

For the subwoofer system I use a home built subwoofer done to a Linkwitz design, with a 12” Peerless driver in a 50L sealed box. It is driven by a Behringer A500 power amp in bridged mode. Because it is too close to the listening position compared with the satellites I digitally delay the signal being fed to the sub so that it is acoustically the same distance from the sub to the listener as from the front satellites to the listener.
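The delay calculation is simple arithmetic, sketched below. The subwoofer distance used here is a hypothetical example, not a measurement from my room:

```python
SPEED_OF_SOUND = 343.0   # metres per second in air at roughly 20 degrees C

def alignment_delay_ms(satellite_distance_m, sub_distance_m):
    """Delay to apply to the closer subwoofer so its output arrives at the
    listener at the same time as the front satellites' output."""
    return (satellite_distance_m - sub_distance_m) / SPEED_OF_SOUND * 1000.0

# Hypothetical figures: satellites 2.1 m from the listener (as described
# earlier), subwoofer only 1.0 m away.
print(round(alignment_delay_ms(2.1, 1.0), 2))   # 3.21 ms
```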

I use a Behringer CX310 crossover for the sub, set at 60 Hz with 24 dB per octave Linkwitz-Riley slopes.

The rear surround speakers are powered with a Yamaha DSP-A1, which also does all digital stream decoding.

I don’t have a front centre speaker. I use an $80 Behringer mixer to take the three pre-amp signals (front left, centre, front right) from the Yamaha and mix them to the front stereo channels. This allows me to control the level of the dialogue and adjust the centre channel tone using the equaliser on the mixer.

5. Did you use pa-xx.x.txt as your target frequency response curve? Given the room dimensions (max 4.5 m), it's impossible to generate bass under 38 Hz...
No, I used the erb-xx.x.txt curve. Not sure about the no bass under 38Hz. I think room modes and bass calculations can go a bit haywire in normal rooms (compared to the ideal rooms used in the equations). Also, the lounge room is open to another room of the same size so the combined volume is quite large. For instance the satellites are positioned in an archway joining the two rooms so they effectively don’t have any rear wall directly behind them.
6. Is any microphone frequency response correction applied?
No. I am aware that Denis Sbragion’s correction file for the Behringer ECM8000 was just based on testing one mic. I think if you are going to use an adjustment file it should be based on a reasonable number of samples.
7. It looks like iTunes plays the music, then the Inguz Audio plug-in does the convolution. Sorry, I have no idea how the Squeezebox works. Does it work just like a sound card, with iTunes outputting music to it?
No. The SqueezeBox server software runs on a PC where the music is also located. After taking data from iTunes and convolving it with the filter, the SqueezeBox software sends the data stream to a wireless transmitter. From there it gets picked up by a SqueezeBox wireless receiver, which has the digital-to-analogue converters. Then it’s on to the amps and speakers.
 
Correction to an answer

Some questions: 5. Did you use pa-xx.x.txt as your target frequency response curve? Given the room dimensions (max 4.5 m), it's impossible to generate bass under 38 Hz...

Sorry, I made a mistake when I answered the first part of this question, concerning the .txt file used by DRC.

You were right lazycatken - I have looked at the DRC configuration files and the 'erb' control file actually uses the pa-xx.xx.txt file for the target frequency response curve calculations.
 
I found the bass response for the room was smooth and tight using the erb target. The stronger targets produced a slight boom in the bass. I like to compare the room response to what I remember from live concert performances, and they have a seamlessness from mid to bass that really appeals to me. That’s what I’m trying to reproduce in the listening room.
In my system, 'extreme' or even 'strong' correction got better bass performance (fewer standing waves).
So far I have tested four systems (different rooms and equipment), and stronger correction produced better bass resolution in all four.
The only issue preventing stronger correction is pre-echo...
It’s a trial and error process. The output after the filter is applied is often over the normal digital 0 dB limit, so I set a variable in an Inguz configuration file that attenuates the original signal to make room for the effect of the filter. At present the attenuation is -19 dB. It’s a compromise: it’s been set to prevent clipping of the loudest music I play, but for some recordings it means the amp gain has to be turned up quite a bit to compensate for the attenuation.
The volume setting issue makes me interested in the internal signal format of the convolver...
No doubt the convolution is done in 32-bit floating point format. And then? 16 bits?
I suspect down-converting from float32 to integer16 must result in some loss...
Thus I use 24 bits, and it sounds better. Although I'm not sure that's due to the issue I was concerned about...
I have a 4.1 setup. The front left and right speakers are modified Mirage Omnisat V2. I like their innovative speaker geometry. They use a specially constructed parabolic “wave guide” above the midrange and tweeter drivers that disperses the sound wave through a large angle into the listening space. This is designed to overcome the narrowing of the power response as frequency increases from the omni-directional low frequencies to the directional high frequencies.

Yes, the design is very impressive...
That's why I mentioned it in my blog a couple of months ago ("Immersed in Music: An Interesting Driver Design, the Mirage Uni-theater"; title translated from Chinese).
But it may not be an appropriate choice for DRC...
You know, the major issue that DRC tries to solve is standing waves in a common listening room. DRC does not do much in the high frequency range; high frequencies must be taken care of by traditional passive room treatment. That's why Denis Sbragion keeps a heavily damped listening room.
Thus, I guess the best speaker choice for DRC is one that is quite directional, just like studio near-field monitor speakers...
I have mounted them so that the midrange driver is at the same height as the ears of the listener when sitting on the lounge at the optimal listening position.
It looks like your speakers are not placed in a position to get the best correction from DRC?
The DRC documentation suggests putting the speakers close to a wall. I did that and got a better result.
The Omnisats use a very simple (3 component!) passive crossover. I removed the passive crossover board and replaced the banana plug speaker cable connectors with RCA connectors. That gave me one RCA cable per driver so I can run a simple stereo RCA cable to each enclosure from the amplifier.
That's interesting... Why did you choose RCA instead of Y-spade or banana connectors?
The front bass units are crossed over at 100Hz. The crossover from midrange to tweeter is 3kHz. The passive crossover had used 2.7kHz. I adjusted the crossover value whilst listening to the speakers and found the 3kHz point sounded smoother than the 2.7kHz point.
Did you do measurements? Just curious what can be seen in the measurement results...
For the subwoofer system I use a home built subwoofer done to a Linkwitz design, with a 12” Peerless driver in a 50L sealed box.
...
No, I used the erb-xx.x.txt curve. Not sure about the no bass under 38Hz. I think room modes and bass calculations can go a bit haywire in normal rooms (compared to the ideal rooms used in the equations). Also, the lounge room is open to another room of the same size so the combined volume is quite large. For instance the satellites are positioned in an archway joining the two rooms so they effectively don’t have any rear wall directly behind them.
My speakers are Spica TC50s, with 6.5" woofers. No subwoofer accompanies them, though I have one.
The TC50 can't produce bass under 50 Hz. I realized it's useless to force the TC50 to generate a flat frequency response down to 20 Hz, so I modified the target frequency response curve.
No. I am aware that Denis Sbragion’s correction file for the Behringer ECM8000 was just based on testing one mic. I think if you are going to use an adjustment file it should be based on a reasonable number of samples.
I'm thinking about purchasing a mic with frequency response data...
No. The SqueezeBox server software runs on a PC where the music is also located. After taking data from iTunes and convolving it with the filter, the SqueezeBox software sends the data stream to a wireless transmitter. From there it gets picked up by a SqueezeBox wireless receiver, which has the digital-to-analogue converters. Then it’s on to the amps and speakers.
Sorry, I didn't present my question clearly.
I'm thinking about playing music in iTunes, then sending it via the LAN to a device, just like the Squeezebox.
Right now that device is an Apple AirPort Express. Unfortunately, the AirPort Express can only handle 16-bit music.
Thus, I'm thinking of building my own. You know, the first issue will be "How do I get the music out of iTunes?"
A simple way is to install a virtual sound card. Not a good idea, right?
Another way is to try to crack the encryption key of RAOP. I'm not an expert at that...
Or license RAOP from Apple. I'd have to win the lottery first...
Or?
 
The volume setting issue makes me interested in the internal signal format of the convolver... No doubt the convolution is done in 32-bit floating point format. And then? 16 bits? I suspect down-converting from float32 to integer16 must result in some loss... Thus I use 24 bits, and it sounds better. Although I'm not sure that's due to the issue I was concerned about...
The following is my understanding of how the process works. However, I am not an expert in digital signal processing so if anyone reading this wants to correct me please feel free.

The impulse response of the room is measured using a 16-bit resolution log sine sweep test tone. The measurement of the room response results in a 32-bit floating point format file. A series of mathematical calculations is then done on this file, resulting in a 32-bit floating point output file, which is the correction filter. The correction filter is then loaded into a convolver. A convolver like BruteFIR also uses 32-bit floating point calculations internally. Within BruteFIR, the word length of the final output that actually gets converted to analogue will depend on the settings of the sound card.

The question then becomes: is it better to use 24 bits or 16 bits in the final digital-to-analogue conversion? Remember that the original input data stream that goes into the convolver is likely to come from a CD source, which means the original data stream is in 16-bit format. Even if it is up-converted to 24 bits, no additional information is created in that up-sampling process. Provided the convolution algorithms are correctly designed, convolving a 16-bit input with a 32-bit floating point filter should not result in any loss of information. Certainly not information that would be audible.

I think that the question can only ultimately be answered through psychoacoustic testing. A properly designed and executed test will indicate if there are audible improvements in going from 16 bits to 24 bits in the final DAC process. My view is that 16-bit resolution is good enough in normal listening environments, so I don't see a need to necessarily use 24-bit conversions.
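The standard back-of-envelope numbers behind that view: an ideal N-bit quantizer gives a signal-to-noise ratio of roughly 6.02N + 1.76 dB for a full-scale sine wave, so a sketch of the comparison looks like this:

```python
def quantization_snr_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave,
    from the standard 6.02*N + 1.76 dB rule of thumb."""
    return 6.02 * bits + 1.76

print(round(quantization_snr_db(16), 1))   # 98.1 dB
print(round(quantization_snr_db(24), 1))   # 146.2 dB
```

Either figure is well beyond the noise floor of a domestic listening room, which is part of why audible differences are hard to establish without controlled testing.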

But it may not be an appropriate choice for DRC... You know, the major issue that DRC tries to solve is standing waves in a common listening room. DRC does not do much in the high frequency range; high frequencies must be taken care of by traditional passive room treatment. That's why Denis Sbragion keeps a heavily damped listening room. Thus, I guess the best speaker choice for DRC is one that is quite directional, just like studio near-field monitor speakers... It looks like your speakers are not placed in a position to get the best correction from DRC? The DRC documentation suggests putting the speakers close to a wall. I did that and got a better result.

I am very persuaded by the argument of Siegfried Linkwitz concerning the best speaker geometry and placement of speakers in a room to generate a sound field that convinces the ear/brain combination that you are in the recording space rather than the listening room.

As I understand his ideas, you need either dipole or monopole (omnidirectional) speakers with a uniform polar response across mid to high frequencies. One of the reasons I quite liked the Omnisat design was that, with the use of the waveguide, they really do seem to generate a polar response that is uniform across the mid and high frequencies. That, combined with a bass unit which would be omnidirectional anyway, means there is a very good chance of a nice consistency in the polar response across all useful frequencies.

The other issue is the placement of the speakers in the room. I am also convinced by his argument that you need at least a one-metre gap between the speaker and any side walls. This ensures that reflections do not arrive so close in time to the direct wave from the speakers that the brain cannot distinguish the two, which would collapse the illusion of phantom sources.

I have tried different placements of the speakers in the room, and it certainly appears to me that speakers at least one metre (about three feet) from a side wall sound better, in the sense that the phantom images and the realism of the reproduction are better.

I also think that room correction is easier if the reflected waves are both lower in intensity and there is a longer time gap between the direct wave and the reflected one - on the basis that the algorithms will have less work to do in distinguishing the corrections needed for the direct wave compared with correcting the reflected waves. At higher frequencies the DRC program uses an increasingly short time window so that would imply to me that the corrections are attempting to deal more with the direct wave from the loudspeaker rather than the room reflections.

That's interesting... Why did you choose RCA instead of Y spade or banana?

Convenience in building really. The existing speakers had a pair of speaker connectors. When I removed the passive crossover board and the speaker connectors, I found that the holes were perfect for the insertion of a pair of RCA terminals. That way I can run just a stereo RCA cable to the speakers and drive both speakers in the cabinet with one cable.

Did you do measurement? Just curious about what it can be seen on measurement result...

When I originally equalised the satellites with a 1/3 octave equaliser I hired an anechoic chamber and used a 1/6 octave measurement process to look at the response of the speakers as I varied the crossover frequency.

To be truthful, there wasn't much difference in the measured response as I varied the crossover frequency from 2.5 kHz through to 3.2 kHz. So my reasoning was that it would be better to have a high crossover frequency for the tweeter: given that it is so small, the less out-of-band information it receives, the less distortion there would be. On that basis I chose the 3 kHz crossover frequency.

I'm thinking about purchasing a mic with frequency response data...

Sounds like a good idea.

I'm thinking about playing music by iTune, then send music via LAN to a device, just like Squeezebox. Right now, it's Apple Airport Express. Unfortunately, Airport Express can only handle 16bits music. Thus, I'm thinking building my own. You know, the first issue will be "How to get music from iTune?" A simple way is to install a virtual sound card. Not a good idea, right? Another way is tried to hack the encryption key of RAOP. I'm not an expert of that...Or license RAOP from Apple. I have to buy lottery first...Or?

As I indicated earlier, I am simply not convinced that trying to get 24-bit source material (either originally recorded that way or through up-sampling) is worthwhile. The vast majority of existing commercially available material is in 16-bit format. If you run with that, then there are a number of solutions available, including just using the Apple AirPort Express as it's currently configured.

By the way, I have enjoyed your comments and questions; they have forced me to think through some things that had not been clear in my own mind before responding to your posts.
 
Hi Guys,

Nice to see this going on. I have been on the sidelines, considering dipping my toes into DRC. I have read elsewhere on the net about the latency issue. Using it for music is a no-brainer, but for TV and DVD the digital processing of the sound causes a delay, losing sync with the images on the TV. This is the main reason why I am still on the sidelines. Please let me know how you are addressing this. Or are you using it only for music?

Thanks,
Dinesh
 
Hi Guys, Nice to see this going on. I have been on the sidelines, considering dipping my toes into DRC. I have read elsewhere on the net about the latency issue. Using it for music is a no-brainer, but for TV and DVD the digital processing of the sound causes a delay, losing sync with the images on the TV. This is the main reason why I am still on the sidelines. Please let me know how you are addressing this. Or are you using it only for music? Thanks, Dinesh

I'm happy to address the latency issue from the perspectives of both the existing and proposed systems.

Existing system

When using my 4.1 home theatre system, the front left and front right speakers are fed via a Behringer DCX2496 active crossover unit. This takes two channels and converts them to six. Each channel is processed using an eighth-order (48 dB per octave) Linkwitz-Riley crossover calculation. All six channels are processed through one DSP unit. I do not know the latency figure for this unit, but there would obviously be some latency in doing those calculations.

But, I have not noticed any sync issues when watching images on the screen and listening to the dialogue from the front stereo speakers.

Proposed system

In the proposed system all audio channels will go through convolution filters, and the front left and right will then also go through the Behringer processor. As I understand it, there are three components to the magnitude of the latency (independent of what is happening in the Behringer): the latency within the sound card, the speed of the PC in doing the convolution calculations, and the efficiency of the convolution algorithms. The sound card I will be using, an RME card, has next to zero latency. The PC processor is quite fast. And the BruteFIR algorithms should be quick, since BruteFIR has been designed to be very fast at convolving data files.

On that basis there should not be a problem with latency. But I guess I’ll find out when everything comes together.
 
On latency ...

Adding a BruteFIR convolver to your audio chain will add a 40 to 50 ms delay to the audio, depending on the processor used. A video frame roughly equates to 30 ms. That means when watching a DVD or any other video source the audio will lag the video by roughly 1.5 frames. Most people begin to perceive the delay when the difference is 2 frames or greater. The best way to fix this problem would be to delay your video by an equal amount, bringing the audio back in sync with the video. There are video processors on the market that allow you to do this.
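Putting numbers on that (the 45 ms figure below is just the midpoint of the 40 to 50 ms estimate above):

```python
# Audio/video sync arithmetic for a convolver in the playback chain.
frame_ms = 30              # one video frame, roughly, as estimated above
convolver_delay_ms = 45    # midpoint of the 40-50 ms BruteFIR estimate

lag_frames = convolver_delay_ms / frame_ms
print(lag_frames)          # 1.5 frames of audio lag, under the ~2-frame threshold
```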
 
The question then becomes: is it better to use 24 bits or 16 bits in the final digital-to-analogue conversion? Remember that the original input data stream that goes into the convolver is likely to come from a CD source. That means that the original data stream is in 16-bit format. Even if it is up-converted to 24 bits, no additional information is created in that upsampling process. Provided the convolution algorithms are correctly designed, convolving a 16-bit input with a 32-bit floating-point filter file should not result in any loss of information. Certainly not information that would be audible.
Just do an experiment.
Normalize a music clip. Let's call the result N.
Filter N. Normalize the result, then convert to 16-bit format. Let's call the result NF.
Play NF. Then play N.
You'll find N sounds louder.
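A minimal numerical sketch of one mechanism behind that loudness drop: if the correction filter boosts a region where the music has little energy (a room dip, say), the boosted peaks eat headroom, and peak-normalizing afterwards pulls the whole clip down. The signal and filter gains below are invented purely for the demonstration:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs                  # 1 second of samples

# 'Music' with most energy at 200 Hz and a little at 50 Hz,
# where we pretend the room has a dip that the filter must boost.
music_200 = np.sin(2 * np.pi * 200 * t)
music_50 = 0.1 * np.sin(2 * np.pi * 50 * t)

def peak_normalize(x):
    return x / np.max(np.abs(x))

def rms(x):
    return np.sqrt(np.mean(x ** 2))

N = peak_normalize(music_200 + music_50)

# Correction filter: +20 dB at 50 Hz, unity elsewhere. Applied per
# component for simplicity, as a stand-in for a real convolution.
filtered = music_200 + 10.0 * music_50
NF = peak_normalize(filtered)           # re-normalize to avoid clipping

print(rms(N), rms(NF))                  # NF has the lower RMS: it sounds quieter
```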
I think that the question can only ultimately be answered through psychoacoustic testing. A properly designed and executed test will indicate if there are audible improvements in going from 16 bits to 24 bits in the final DAC process.
In my system, 24 bits indeed sounds better. More detail, more emotional...
I hired an anechoic chamber
That's great! How much per hour did they charge?
I also think that room correction is easier if the reflected waves are both lower in intensity and there is a longer time gap between the direct wave and the reflected one - on the basis that the algorithms will have less work to do in distinguishing the corrections needed for the direct wave compared with correcting the reflected waves. At higher frequencies the DRC program uses an increasingly short time window so that would imply to me that the corrections are attempting to deal more with the direct wave from the loudspeaker rather than the room reflections.
As you have noticed, DRC uses an increasingly short time window in the high frequency range. http://drc-fir.sourceforge.net/doc/drc004.gif
That means reflections with a time gap longer than the window cannot be filtered out...
I am very persuaded by the argument of Seigfreid Linkwitz concerning the best speaker geometry and placement of speakers in a room to generate a sound field that convinces the ear/brain combination that you are in the recording space rather than the listening room.
By making the impulse response of the whole system (equipment + room) close to the ideal impulse response, DRC tries to make the sound reproduced at the listening position close to the original music source. In my opinion, that means the music produced in the mastering studio, you know, nearly all direct sound.

BTW, did you have to do balancing for filtered music?
In my system, the right channel must be adjusted by 3 dB. That's another reason for 24 bits...
 
On latency ... Adding a BruteFIR convolver to your audio chain will add a 40 to 50 ms delay to the audio, depending on the processor used. A video frame roughly equates to 30 ms. That means when watching a DVD or any other video source the audio will lag the video by roughly 1.5 frames. Most people begin to perceive the delay when the difference is 2 frames or greater. The best way to fix this problem would be to delay your video by an equal amount, bringing the audio back in sync with the video. There are video processors on the market that allow you to do this.

That's interesting about the latency calculation. So what you are saying is that a latency of greater than 60 ms will result in a perceptible ‘out of sync’ mismatch.

Do you know what the latency would be for the Behringer DCX2496 unit that I am using? I would have thought that it would be of comparable latency to your calculation for BruteFIR, given that BruteFIR is known for its speed and I would be surprised if the DSP unit inside the Behringer was very fast. If it is of comparable latency (or greater) then that would imply I should already be seeing a perceptible delay. But there has been no problem so far.
 
Just do an experiment. Normalize a music clip. Let's call the result N. Filter N. Normalize the result, then convert to 16-bit format. Let's call the result NF. Play NF. Then play N. You'll find N sounds louder. In my system, 24 bits indeed sounds better. More detail, more emotional...

I suppose we need to be clear as to what the question is and therefore what the experiment is seeking to answer. As I understand it, what you are saying is that a 24-bit signal “sounds better” after having a filter applied to it compared with a 16-bit signal with a filter applied. And by sounds better I suppose that would mean that there is less distortion or greater clarity or any other description that would fit the idea that there is no distortion of the signal, or at least less distortion than in the 16-bit signal.

So what we need is an experiment which allows us to determine if a convolved 24-bit input signal has fewer artefacts than a similarly convolved 16-bit signal. My experiment to answer that question would be as follows:

Take a known analogue signal. Do two analogue to digital conversions, producing a 16-bit datastream and a 24 bit datastream.

Call the 16-bit conversion A. Call the 24 bit conversion B. Convolve both A and B with the same filter. If necessary normalise both A and B so there is no clipping of the datastreams. Check that both A and B will have the same average signal amplitude after conversion back to analogue.

Convert both A and B back to analogue, A with a 16-bit conversion, B with a 24 bit conversion.

Use an ABX software program to do a double-blind experiment to see if there is a perceptible difference between A and B. If there is a perceptible difference then it is likely that the 24-bit datastream has retained more information (less distortion) than the 16-bit datastream. If there is no perceptible difference compared to chance, then it doesn't matter whether you use a 16-bit or a 24-bit datastream for conversion. My guess is that there would be no perceptible difference, but that's an empirical question.
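The signal-preparation half of that experiment can be sketched numerically. Everything here is a stand-in (random noise for the source, an arbitrary filter); only the listening part needs real ABX software:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48000
x = rng.standard_normal(fs)
x /= np.max(np.abs(x))          # stand-in for the analogue source signal

def quantize(sig, bits):
    # Plain mid-tread rounding, no dither, for simplicity.
    scale = 2 ** (bits - 1) - 1
    return np.round(sig * scale) / scale

A = quantize(x, 16)             # the 16-bit capture
B = quantize(x, 24)             # the 24-bit capture

h = rng.standard_normal(512)
h /= np.sum(np.abs(h))          # total gain <= 1: no clipping, no renormalising needed

yA = np.convolve(A, h)          # convolve both with the same filter
yB = np.convolve(B, h)

# The average levels match almost exactly, so any ABX difference
# would have to come from the quantisation error alone.
print(np.sqrt(np.mean(yA ** 2)) / np.sqrt(np.mean(yB ** 2)))
```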

That's great! How much per hour did they charge?

About 125 Australian dollars per hour. I hired the chamber for 2 hours.

As you have noticed, DRC uses an increasingly short time window in the high frequency range. http://drc-fir.sourceforge.net/doc/drc004.gif That means reflections with a time gap longer than the window cannot be filtered out... By making the impulse response of the whole system (equipment + room) close to the ideal impulse response, DRC tries to make the sound reproduced at the listening position close to the original music source. In my opinion, that means the music produced in the mastering studio, you know, nearly all direct sound. ...

I have listened to music in a very acoustically dead environment. In fact I have listened to music in an anechoic chamber, and I must say that whilst there was fantastic clarity it was a very empty experience and not very emotionally engaging. Sort of like being inside a huge pair of headphones, but with the sound in front of you rather than inside your head.

The point that writers like Linkwitz are making is that you actually need a normal reverberant space, the sort of space that you would find in a living room, to generate the reflections needed for the listener to perceive phantom images in the soundstage. And being able to perceive a credible soundstage in front of you is essential to the emotional engagement with the reproduced performance.

The great advantage of the DRC program is that its control of bass response in the room is fantastic. When I looked at the output from the program it was creating many hundreds of filter taps for frequencies below 100 Hz. As the frequency increases the filtering is reduced. So in that sense the reflections from the higher frequencies are being left alone, as you say. And I think that is a good thing. Because it means that there is a nice reverberant space that is giving the right psychoacoustic clues to the listener.

BTW, did you have to do balancing for filtered music? In my system, right channel must be distorted 3dB. That's another reason for 24 bits...

Can't say that I've noticed a need to adjust left or right channels after filtering. I have a very symmetrical speaker setup. Both speakers are placed in the room at the same distance from the side walls in a triangle configuration. And the distance from each speaker to the listener is the same within 1 cm.
 
Do you know what the latency would be for the Behringer DCX2496 unit that I am using? I would have thought that it would be of comparable latency to your calculation for BruteFIR, given that BruteFIR is known for its speed and I would be surprised if the DSP unit inside the Behringer was very fast. If it is of comparable latency (or greater) then that would imply I should already be seeing a perceptible delay. But there has been no problem so far.

As I remember it, the Behringer DCX2496 runs a SHARC DSP to do its processing. The SHARC is a good performer. I would be very surprised if it represents more than 30ms delay total.

On 16 vs 24 bit ...
Always use 24 bits. In an ideal world, if you load allpass filters into a convolver, what you put in would be identical to what comes out. If the output matches the input, the system is said to be 100% 'bit accurate'. However, in the real world that never happens. What you will find is that DSP processing will trash a minimum of 2 bits, and more if there is a sample rate conversion. As crazy as it seems, you must run at 24 bits to be 16-bit accurate.
 
Originally Posted by lazycatken
Just do an experiment. Normalize a music clip. Let's call the result N. Filter N. Normalize the result, then convert to 16-bit format. Let's call the result NF. Play NF. Then play N. You'll find N sounds louder. In my system, 24 bits indeed sounds better. More detail, more emotional...
So what we need is an experiment which allows us to determine if a convolved 24-bit input signal has fewer artefacts than a similarly convolved 16-bit signal.
What I mean is, more dynamic range is necessary for the filtered sound.
Just use Audacity and check the waveforms of N and NF.
I have listened to music in a very acoustically dead environment. In fact I have listened to music in an anechoic chamber, and I must say that whilst there was fantastic clarity it was a very empty experience and not very emotionally engaging. Sort of like being inside a huge pair of headphones, but with the sound in front of you rather than inside your head.
That's interesting... I can imagine it.
The point that writers like Linkwitz are making is that you actually need a normal reverberant space, the sort of space that you would find in a living room, to generate the reflections needed for the listener to perceive phantom images in the soundstage. And being able to perceive a credible soundstage in front of you is essential to the emotional engagement with the reproduced performance.
Someone said the phantom image is due to sound arriving from the sides of the ears.
I tried heavy damping at the sides of the listening position; it indeed resulted in a less lively sound.
The great advantage of the DRC program is that its control of bass response in the room is fantastic. When I looked at the output from the program it was creating many hundreds of filter taps for frequencies below 100 Hz. As the frequency increases the filtering is reduced. So in that sense the reflections from the higher frequencies are being left alone, as you say. And I think that is a good thing. Because it means that there is a nice reverberant space that is giving the right psychoacoustic clues to the listener.
Well... Anyway, I tried raising MPWindowExponent to 1.5 and MPLowerWindow to 740, which is a much stronger correction than Extreme (1/740).
The filtered music is much clearer, I have to say, a huge improvement.
Even more lively, perhaps because it is less noisy.
And it makes me enjoy music more... great...
I have a very symmetrical speaker setup. Both speakers are placed in the room at the same distance from the side walls in a triangle configuration.
I tested 4 systems.
Before filtering I can't hear any imbalance, though all have asymmetrical setups.
After filtering, a 2~6 dB adjustment is necessary, depending on the correction level.
 
On 16 vs 24 bit ... Always use 24 bits. In an ideal world, if you load allpass filters into a convolver, what you put in would be identical to what comes out. If the output matches the input, the system is said to be 100% 'bit accurate'. However, in the real world that never happens. What you will find is that DSP processing will trash a minimum of 2 bits, and more if there is a sample rate conversion. As crazy as it seems, you must run at 24 bits to be 16-bit accurate.

Interesting stuff about 2 bits being trashed. I think the final arbiter is whether or not we can perceive a difference. That is why I suggest a proper psychoacoustic test to decide. If it showed that 24 bits was needed then so be it – I’m not wedded to 16 bits. In fact, my gear is going to convert the analogue signals to 24 bits anyway (that’s just the way they are designed).

But I remain sceptical about the absolute need for 24 bits. Particularly given the maximum 60 dB dynamic range in a normal listening room at home, I can’t see a difference between 24 bits and 16 bits being perceived in those environments.
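The arithmetic behind that scepticism, using the ~6.02 dB-per-bit rule for linear PCM:

```python
import math

def dynamic_range_db(bits):
    # Ratio of full scale to one quantisation step, in dB: ~6.02 dB per bit.
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 1))  # 96.3 dB
print(round(dynamic_range_db(24), 1))  # 144.5 dB
# Either comfortably exceeds the ~60 dB usable range of a domestic room.
```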
 
Anyway, I tried raising MPWindowExponent to 1.5 and MPLowerWindow to 740, which is a much stronger correction than Extreme (1/740). The filtered music is much clearer, I have to say, a huge improvement. Even more lively, perhaps because it is less noisy. And it makes me enjoy music more

I haven’t varied any of the parameters – just used the defaults. Looks like you’ve found a big difference in the sound using the MPWindowExponent and MPLowerWindow parameters. How did you decide on those values? Have you varied any other parameters?

I tested 4 systems. Before filtering I can't hear any imbalance, though all have asymmetrical setups. After filtering, a 2~6 dB adjustment is necessary, depending on the correction level.

That’s unexpected. Intuitively I’d expect the filters to deliver very nearly equal matched signals. Any idea why the filters would be generating a channel imbalance?
 
But I remain sceptical about the absolute need for 24 bits. Particularly given the maximum 60 dB dynamic range in a normal listening room at home, I can’t see a difference between 24 bits and 16 bits being perceived in those environments.
Just as you said. Judge it by your ears. You'll find the truth.
And then think about "What's wrong with the measurement theory/method/tool/value?"
I haven’t varied any of the parameters – just used the defaults. Looks like you’ve found a big difference in the sound using the MPWindowExponent and MPLowerWindow parameters. How did you decide on those values? Have you varied any other parameters?
Did you try filtering the sample impulse response provided by Denis Sbragion?
To achieve the simulation drawing shown in the DRC documentation, a huge correction must be applied.
I tried to apply as strong a correction as possible to the filter for my system, until pre-echo became perceivable. A quick way to do this is to find how sensitive your system is to pre-echo. Then make simulations to find the strongest plausible correction, and verify it.
That’s unexpected. Intuitively I’d expect the filters to deliver very nearly equal matched signals. Any idea why the filters would be generating a channel imbalance?
Check the DRC documentation; you can find the normalization factor...
Balance means equal loudness of the left and right channels. It's judged by ear. You know human ears are not frequency-linear...
One of the corrections DRC applies is to the magnitude frequency response. That is, DRC does the inverse of your system response. Check this blog post (title translated from Chinese): "Immersed in music...: Is the goal of EQ flat 20~20kHz?"
The first graph is my system response (a recording of the impulse response measurement, a log sweep); the second is the corrected log sweep.
The correction applied to the two channels may not be equal.

BTW, the graph shows why more dynamic range is necessary for filtered signal.
 
Hi all.

Originally posted by boconnor
As I understand it there are three components to the magnitude of the latency delay (independent of what is happening in the Behringer): the latency within the sound card, the speed of the PC in doing the convolution calculations, and the programming efficiency of the convolution algorithms.

You're almost correct. The sound card will certainly introduce some latency, which I'll term hardware latency. In professional sound cards you can usually specify one of several available. I think 32 samples is the lowest mine goes to. Note that hardware latency is caused by the block-processing nature of the PC and is not intrinsic to DSP - an embedded DSP system might only have a single sample of hardware latency.
However, it is a common misconception that processing latency has anything to do with the processing speed of the machine. If you have a fast processor, then your system will work; if you have one that's too slow, you'll get glitches and it won't. Instead, processing latency is decided by the specific types of filters being used.*

The DCX2496 emulates traditional analogue filters (such as Linkwitz-Riley and similar). In digital systems, this is achieved with infinite impulse response (IIR) filters, which are computationally cheap and which, like their analogue counterparts, themselves introduce no latency. So, the following observation:

Originally posted by boconnor
I have not noticed any sync issues when watching images on the screen and listening to the dialogue from the front stereo speakers.

is absolutely what should be expected from such a system. Actually, I tell a quick lie - each IIR filter in a system will introduce a single sample of latency. I would be very surprised if the DCX2496 isn't based on cascaded second-order filters (by far the simplest way to do it), so an eighth-order `effective' filter should introduce 4 samples of latency - essentially nothing.

In contrast, convolution is another word for the `other' type of digital filter - the finite impulse response (FIR). These filters generally do exhibit latency, but the amount is dependent on the design of the filter, and may be significant. It is however bounded by the number of taps, so at the least the filter cannot introduce a delay longer than itself.

One of the supposed advantages of FIR filters is that they can, with correct design, adjust frequency response in a linear-phase fashion, meaning simply that they delay all frequencies by the same length of time and will maintain waveshape. IIR filters don't do this. Whether this constitutes an audible advantage is still up for debate, but it's a popular criterion for designing many filters. Anyhow, a linear-phase FIR filter will by definition always have a delay which is half the length of the filter. So for example, at a 48kHz sample rate, a filter which is 4800 taps long will exhibit 2400 samples (50ms) of delay.
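That half-length delay is easy to verify numerically. The sketch below uses 4801 taps (odd, so the delay is an exact integer number of samples) and an arbitrary cutoff frequency:

```python
import numpy as np
from scipy import signal

fs = 48000
taps = 4801                             # odd length for an exact integer delay
h = signal.firwin(taps, 1000, fs=fs)    # linear-phase low-pass, cutoff illustrative

# A symmetric (linear-phase) FIR delays every frequency by (taps-1)/2 samples.
delay_samples = (taps - 1) // 2
delay_ms = 1000 * delay_samples / fs
print(delay_samples, delay_ms)          # 2400 samples = 50.0 ms

# Verify: an impulse through the filter peaks at exactly that sample.
impulse = np.zeros(taps)
impulse[0] = 1.0
out = signal.lfilter(h, 1.0, impulse)
print(np.argmax(np.abs(out)))           # 2400
```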
With room-correction filters, the definition of latency is rather more woolly, but it is often considered to be the location of the first large peak in the filter impulse response. I don't know offhand, but there might well be a large latency introduced in room-correction applications. I imagine this will depend on the exact method and the exact measurements used to produce it. In any case, if you're using a convolution-based filtering system, you should tread carefully with regards to latency - it can be an issue, in a way that it cannot with your DCX.

Originally posted by boconnor
So what we need is an experiment which allows us to determine if a convolved 24-bit input signal has fewer artefacts than a similarly convolved 16-bit signal

Thankfully, this experiment can be done in the mind. A quantised signal can be regarded as the sum of the infinite-resolution analogue signal and some error signal introduced due to the quantisation process. Convolution is a linear process, so the convolved output can be regarded as the convolution of the infinite-precision original signal plus the convolution of the error signal. Now, using a 16-bit signal will result in a larger error signal than that of a 24-bit signal, and there will be larger errors at the output. However, since the convolution introduces no error of its own, this increase will be exactly proportional to the increase in input error.
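The linearity argument can be checked directly: convolving (signal + error) gives exactly convolve(signal) + convolve(error), so the quantisation error just passes through the filter, scaled and nothing more. All values below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)            # 'infinite-resolution' signal
e = rng.standard_normal(1000) * 1e-4     # quantisation error, roughly 16-bit scale
h = rng.standard_normal(64) / 64         # arbitrary filter

lhs = np.convolve(x + e, h)              # convolve the 'quantised' signal
rhs = np.convolve(x, h) + np.convolve(e, h)  # clean part + convolved error

print(np.max(np.abs(lhs - rhs)))         # tiny: linearity holds to rounding error
```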
Now to implementation: 32-bit floating point is not perfectly linear, so the real convolver will introduce error on top of that contained in the input signal. The output error will be difficult to predict as it is the result of two nonlinearities in series, but it will be slightly more than proportional. It will also have a slightly more complex harmonic distortion structure.

Really, in the end, I think the difference in artefacts will be very close to proportional (i.e. if you can't hear the quantisation noise in a 16-bit signal then it's unlikely you will hear the quantisation noise in the convolver output). There will be more error, but it will be only slightly.

In any case, the convolved signal will contain more information than the signal coming in, so you should use 24 bits at the end regardless of whether the input was 16-bit or 24-bit, or you're artificially adding yet another error signal on top of the convolver output.

ASIDE: Since floating-point arithmetic cannot be made linear I would not use 32-bit floating point in a system where 24-bit transparency is required - in many instances, mostly when dealing with loud signals, harmonic distortion spikes will show above the 24-bit noise floor. Using 64-bit floating-point will solve this in a practical sense - though I will say that the distortion due to 32-bit arithmetic is unlikely to be audible.


Originally posted by Carl Huff
What you will find is that DSP processing will trash a minimum of 2 bits, and more if there is a sample rate conversion. As crazy as it seems, you must run at 24 bits to be 16-bit accurate

That's a vastly over-generalised statement. Poorly-done DSP arithmetic will trash the LSB (by truncation), but I don't know where you got 2 bits from. Especially, fixed-point systems which truncate and which are dealing with quiet signals may introduce many LSBs-worth of harmonic distortion - and this is for a single multiplication. Signal processing chains such as a filter are comprised of many multiplications and the distortions of each are summed together. A long FIR filter can exhibit a surprising amount of distortion because of this.

A `proper' DSP audio system will be TPDF-dithered, adding 2 LSBs of noise. However, `trash' is absolutely the wrong word for what this process does, as such a system will display an infinite dynamic range and can perfectly represent signals which are much less than one LSB in amplitude. It sounds incredible, but the latter isn't even theoretical - you can readily measure it (search for some old threads here on this subject - I've posted some).
Some noise will be added to the signal - this is what dither does, after all - but it's white noise and not distortion. Even for 16-bit output, it's usually below the noise floor in the room and can be ignored.
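The below-one-LSB claim is easy to reproduce. This sketch quantises a tone whose amplitude is only a quarter of a 16-bit LSB (about -108 dBFS); plain rounding would erase it entirely, but with TPDF dither it survives and can be recovered by correlation. All values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def to_16bit_tpdf(x):
    # TPDF dither: sum of two uniforms, 2 LSB peak-to-peak, added before rounding.
    scale = 2 ** 15
    dither = rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)
    return np.round(x * scale + dither)   # result in LSB units

t = np.arange(48000) / 48000
ref = np.sin(2 * np.pi * 1000 * t)
tiny = (0.25 / 2 ** 15) * ref             # a tone only 1/4 LSB in amplitude

q = to_16bit_tpdf(tiny)
# Undithered rounding of 'tiny' would give all zeros; the dithered version
# keeps the tone, recoverable by correlating against the reference.
amplitude_lsb = np.dot(q, ref) / np.dot(ref, ref)
print(amplitude_lsb)                      # close to 0.25 LSB
```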


Whew, that's a long post! If my explanations are unclear (or even contain little mistakes), it's only because it's very late and I'm tired. The issues are subtle though and should get careful, thorough explanation. I can give more detail on anything that needs it.

Right. Bed time... :)




* BruteFIR is a special case. It uses a frequency-domain algorithm, as compared to an FIR filter which is time-domain. The result is much the same, it's more of an implementation issue than anything. However, frequency-domain methods must be processed in blocks and actually add yet another layer of latency on top of what I've mentioned, similar in nature to how the sound card adds latency. BruteFIR's algorithm allows this (and only this) extra latency to be reduced by throwing more CPU power at it. Even so, the total system latency can only approach that of an FIR filter - never beat it.
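For the curious, the frequency-domain block processing the footnote describes can be sketched as overlap-add FFT convolution. Block size and filter below are arbitrary, and real engines like BruteFIR use far more refined partitioned schemes; this only shows that the block-based result matches direct convolution:

```python
import numpy as np

def fft_convolve_blocks(x, h, block):
    """Overlap-add FFT convolution: process the input in blocks, filter each
    block in the frequency domain, and add the overlapping tails back in.
    The need to buffer a whole block before processing is one source of
    latency in frequency-domain convolvers."""
    n = block + len(h) - 1
    nfft = 1 << (n - 1).bit_length()           # next power of two >= n
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        yseg = yseg[:len(seg) + len(h) - 1]    # keep only the linear-convolution part
        y[start:start + len(yseg)] += yseg     # overlap-add the tail
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
h = rng.standard_normal(257)

direct = np.convolve(x, h)
fast = fft_convolve_blocks(x, h, block=256)
print(np.max(np.abs(direct - fast)))           # tiny: identical up to rounding
```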
 