There IS an affordable HDMI to AES/EBU Multichannel Audio Solution

For years, I’ve been looking for an affordable device that can extract multichannel audio from HDMI and output it digitally for further processing, specifically for DSP. Keeping everything in the digital domain ensures the best possible audio quality.

My goal was to create a high-quality multichannel audio system that was also affordable, modular, and user-friendly. However, this proved to be quite challenging. The closest I got to achieving that goal is the setup described below:

Flow Chart:
  1. Smart TV (Spotify and video source)
    ↓ (Audio via SPDIF)
  2. Raspberry Pi
    • Receives SPDIF output from the TV.
    • Applies DSP (Digital Signal Processing).
      ↓ (4 Audio Channels via USB)
  3. Okto DAC8 (8 Channel DAC)
    • Receives 4-channel audio via USB from Raspberry Pi.
    • Converts digital audio to analog.
      ↓ (Analog Audio Out)
  4. 2 DIY Mono Block Amplifiers + 2 Active Subwoofers
    • Each amplifier receives one analog audio channel.
    • Amplifies audio for output to speakers.
Unfortunately, this setup resulted in stereo audio only, which didn’t meet my expectations. It could have been improved by using something like FFmpeg (thanks to @phofman for suggesting this) on the Raspberry Pi to decode Dolby-encoded audio over SPDIF for surround sound. However, I didn’t have the skills or the motivation to dive into that. Getting CamillaDSP working on the Pi and getting it to process the TOSLINK input was already a tough challenge. Once set up, though, CamillaDSP proved to be a great piece of software. The TOSLINK input had a specific quirk: it required a special power-on sequence to work, which wasn’t very user-friendly for my household.

I considered the Vanity PRO, which seemed like the best affordable high-quality solution, but for my use case I couldn't justify the cost.

Then I discovered a post from @yulen (on DIY Audio) promoting his 5AES to HDMI Black Box, which caught my attention; @mdsimon2 expressed an interest in the device as well. The product, a converter from HDMI to 5 AES outputs, also includes optional DSP processing via SigmaStudio. It features 2x HDMI inputs and 1x HDMI output, with up to 4K @ 30Hz output. (Note: I have since edited this post to reflect the correct HDMI connections, as I made a mistake in my original post.)

While the product was intriguing, I had some concerns, as it didn’t support Dolby decoding. However, the built-in DSP was a big plus for me. After several emails back and forth with Yulen (thanks for your patience!), I decided to try the product. Yulen assured me that it would work well with an Apple TV set to LPCM out, which was a perfect source for my needs: Spotify, Apple Music, Netflix, etc.

I made the purchase. The unit with the DSP module cost me $278 US (~£220 GBP), including tracked shipping to the UK. Within a week or so the Black Box arrived, well packaged and complete with the parts as discussed, along with the DSP board. I also ordered an Apple TV to pair with it.

To connect the Black Box to my DAC, I had to find a suitable DB25 to XLR cable. I then had to re-solder the connections to match the pinout of the Black Box (Yulen does sell premade cables if needed). This was my first time soldering a DB25 connector, so managing the soldering iron temperature and ensuring correct pinout was a bit tricky, but I got it done in a few hours.

Without the DSP board, the Black Box is plug-and-play and works flawlessly. It automatically adjusts audio output from stereo to multichannel depending on the content being played (e.g., Apple Music defaults to stereo, while movies play in 5.1). I’ve even seen it handle all 8 channels of audio. Since everything stays in the digital domain until it reaches the DAC, no extra conversion stage is added that could degrade the sound.

I’m currently running a 3.2 audio system. Since I haven’t set up the DSP yet, I’m only getting stereo sound and 3.1 channels for movies. Once the DSP is configured, I plan to add room correction and blend some of the rear surround channels into the front speakers. I also hope to split the LFE channel between the two subwoofers and add crossovers to the stereo signal to make full use of both subs.
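The routing I have in mind for the DSP can be sketched as a simple mixing matrix. This is only an illustration in Python, not SigmaStudio code: the channel names, the -3 dB surround gain and the -6 dB LFE split are my own assumptions, and crossovers and room correction are left out entirely.

```python
# Hypothetical gains (assumptions, not measured or recommended values).
SURROUND_GAIN = 10 ** (-3 / 20)  # fold each surround into its front at -3 dB
LFE_SPLIT = 0.5                  # each sub gets the LFE at about -6 dB

# Output channel -> {input channel: linear gain}. Inputs are one 5.1 frame.
MIX_3_2 = {
    "L":    {"L": 1.0, "Ls": SURROUND_GAIN},
    "R":    {"R": 1.0, "Rs": SURROUND_GAIN},
    "C":    {"C": 1.0},
    "Sub1": {"LFE": LFE_SPLIT},
    "Sub2": {"LFE": LFE_SPLIT},
}


def mix_frame(frame, matrix=MIX_3_2):
    """Map one 5.1 sample frame (dict of floats) onto the 3.2 outputs."""
    return {
        out: sum(gain * frame.get(src, 0.0) for src, gain in sources.items())
        for out, sources in matrix.items()
    }


frame = {"L": 0.5, "R": 0.25, "C": 0.1, "LFE": 0.2, "Ls": 0.0, "Rs": 0.0}
print(mix_frame(frame))
```

Keeping the routing as a table makes it easy to try other gains before committing anything to the DSP board.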

Here’s the current setup, outlined in the flow chart below:

Flow Chart:
  1. Apple TV (Spotify and video source)
    ↓ (Audio and Video via HDMI)
  2. Black Box
    • Receives LPCM HDMI output from Apple TV.
    • Applies DSP (not yet implemented).
      ↓ (4 Audio Channels via AES/EBU) + (HDMI video signal to TV)
  3. Okto DAC8 (8 Channel DAC)
    • Receives 4-channel audio via AES/EBU from Black Box.
    • Converts digital audio to analog.
      ↓ (Analog Audio Out)
  4. 1 DIY Mono Block Amplifier + 1 DIY Stereo Amplifier + 2 Active Subwoofers
    • Each amplifier receives one analog audio channel.
    • Amplifies audio for output to speakers.
Images of it in my application: IMG_0289.JPG, IMG_0291.JPG, IMG_0290.JPG

I ordered the required DB25 cable from here:
DB25 Cable

Additionally, I’ve ordered a generic UART USBi module and DB9 connector to communicate with the DSP board.

I support small manufacturers and appreciate the creativity and support from the DIY community, so I highly recommend checking out Yulen’s Black Box if you’re looking for a similar solution. Yulen also makes and sells other innovative products, which you can find on his blog (use a browser translator) and on his TikTok page, @SoundProAudio8.

He also has some upgrades for the Black Box in the pipeline, one of which is HDMI ARC. Personally, I would like to see 2x HDMI inputs with one being HDMI ARC, a boost in the 4K frame rate to 60Hz (even though 30Hz has proved fine so far), and HDCP decryption. Overall, though, I'm very happy with the product, and even without any upgrades it's perfect. I'm looking forward to utilising the DSP in the near future. Note that the unit already has 2x HDMI inputs and 1x HDMI output.

Yulen is very patient and helpful but please be mindful of the language barrier.

Has anyone else used the Black Box, and if so, what's your application?
 
I acquired the first version of this HDMI to AES-EBU converter as presented by Yulen (SoundPro Audio) on DIYAudio: the RED BOX (RB). Then, for several reasons, I very quickly got the new version, i.e. the BLACK BOX (BB).
My system is now organized around a MacMini M4 as a streamer, without screen, keyboard or mouse, and therefore controlled remotely from an iPad or a MacBook Pro via screen sharing or RVNC Viewer. There are many other ways to control it (AnyDesk, ...). The MacMini is connected to the BB over HDMI. I also have an Apple TV 4K connected over HDMI to the BB, since the BB has 2 HDMI inputs. The HDMI output is connected to a JVC D-ILA video projector, and the AES-EBU audio from the DB25 output is handled by a TRINNOV MC Pro processor, which performs all the processing (multichannel D/A conversion, room processing, phase and level adaptation, etc.). The audio system is a set of GENELEC studio monitors (three 1032A front speakers, two 1094A subs and two 8030 CW surrounds).
I have an Apple Music subscription which, through the decoder integrated into the MacMini, lets me benefit from spatial and stereo sound in high resolution without needing a special external decoder! Of course, any multichannel audio delivered by the ATV (Netflix, etc.) is played by the entire audio system! The TRINNOV processor is particularly flexible and allows perfect calibration and routing according to the connected audio system. This is the most direct connection I found that avoids too many cables and links. Siri controls the whole system through a SONOFF interface.
The BB is sold with a DSP option based on the ADAU1701 chip, which can be programmed for various needs with the SigmaStudio software from Analog Devices. Given the facilities of Apple Music and Apple devices, I haven’t used it for the moment…
Finally, the included remote control switches the HDMI inputs of the BB according to whether I use the MacMini or the ATV…
The BB is a beautiful piece of gear that takes a different approach from the competition to convert HDMI audio to AES format, with a very low level of jitter, delivering the professional signal levels needed in a professional environment. For those who need special AES processing, SoundPro Audio can provide useful support! Brilliant, definitely…
 
The Orange Pi 5 Plus is a cheapish SBC that has HDMI RX, among other things. Maybe it can receive a raw Atmos bitstream; it has plenty of compute to do DSP for room correction and to decode in real time. Someone would have to put in the hard work to do this, but it could make AVRs obsolete and add support for any bitstream: Apple, Eclipsa, Dolby Atmos, DTS:X.
 
@ondesx thank you for your post and for explaining your setup. You’re using some very high-end gear there, especially the Trinnov MC Pro. You’ve also highlighted to me that the Black Box has two HDMI inputs; I thought it had two outputs. This is great news, as it means I can connect a games console or other source to it.

I’m eager to get the DSP working, and Yulen has kindly shared some instructions on how to use it. There’s also some documentation I’ve found here, which I’m yet to look through properly but which may help you in setting the DSP up. Maybe @yulen can confirm whether this is applicable to the Black Box.

I’m so impressed with how well this product works.
 
@crazycoder : There are many RK3588 boards with HDMI input available, but mainline Linux kernel support is what drags this use case down: https://gitlab.collabora.com/hardwa...-rockchip-3588/-/blob/main/mainline-status.md . Their Android kernel (far from mainline) is probably quite well supported; I tested 4K/60Hz HDMI capture -> playback on an RK3588 running Android, and it worked fine.

For receiving hi-res from a commercial player you would need either a working HDCP setup and a purchased key (IMO unfeasible for small volumes), or an HDCP stripper before the input (viable).

For SW decoding of the Atmos object stream, so far I have heard only about https://github.com/VoidXH/Cavern, which is still quite a long way from a simple-to-use solution, IMO.
 
@phofman

Thanks for the reply

I am interested in whether it is feasible to build a stripped-down kernel with my own user program, to basically DIY a receiver.

The idea is an eARC input for audio only: process the raw stream, plus ARC events for volume control, etc. Add detection of the stream type: if it is Atmos, render it; if it is 2ch PCM, try Dolby upsampling. After signal decode and channel mapping, add room correction, all digitally, then stream it to a multichannel DAC at the highest bit rate and with as few buffered samples as the HW permits.

There is a lot of compute on this chip, so maybe it's possible.

I took a look at Cavern; it's the Atmos lite decoding, but that's a good start.

I'm unsure what comes out of ALSA if you plug a source into the HDMI in and bitstream Atmos. Have you experimented with that by chance?

Thanks
 
Quote: "I am interested if feasible to build a stripped down kernel with my own user program to basically diy a receiver."
A stripped-down kernel requires a full version first, which is not available yet. Still a long way to go.
Quote: "I'm unsure if you plug in a source into the HDMI in and bitstream Atmos, what comes out from ALSA. Have you experimented with that by chance?"
There is no HDMI RX support for the RK3588 in the mainline kernel yet, let alone support for the metadata in the stream where the Atmos objects are located. I doubt Rockchip is interested in decoding Atmos, because doing it legally requires expensive licenses, both annually and per piece.

But you can study the SoC datasheet (HDMI RX is in part 2 of the RK3588 Technical Reference https://github.com/FanX-Tek/rk3588-TRM-and-Datasheet/tree/master ) and upstream + extend the existing HDMI-RX driver in the Android kernel. Unlike with e.g. the RPi, all the technical information is publicly available (at least the datasheet seems quite exhaustive).
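On the question of what a bitstreamed source would look like at the ALSA level: if a driver ever delivers the raw samples, compressed audio normally arrives wrapped in IEC 61937 bursts inside what looks like a PCM stream. Below is a minimal Python sketch for spotting such a burst in a captured buffer. It assumes 16-bit little-endian samples (on 24/32-bit captures the words would sit in the upper bits), reads the data type from the low five bits of the Pc word, and only lists a few data-type codes; treat the table as illustrative rather than complete. Note that it identifies the carrier codec only, not whether Atmos objects are present.

```python
import struct

# IEC 61937 burst preamble sync words (Pa, Pb) as 16-bit values.
PA, PB = 0xF872, 0x4E1F

# A few data-type codes carried in the Pc word (illustrative, not complete).
DATA_TYPES = {0x01: "AC-3", 0x15: "E-AC-3", 0x16: "MAT (TrueHD)"}


def find_iec61937_burst(pcm_bytes):
    """Scan 16-bit little-endian samples for a Pa/Pb preamble.

    Returns the data-type name (or a hex string for codes not in the
    table) of the first burst found, or None if no preamble is present.
    """
    count = len(pcm_bytes) // 2
    words = struct.unpack("<%dH" % count, pcm_bytes[: count * 2])
    for i in range(count - 2):
        if words[i] == PA and words[i + 1] == PB:
            code = words[i + 2] & 0x1F  # assumed: data type in low 5 bits
            return DATA_TYPES.get(code, hex(code))
    return None


silence = bytes(8)
burst = struct.pack("<4H", PA, PB, 0x0015, 0x0000)
print(find_iec61937_burst(silence + burst))
print(find_iec61937_burst(silence))
```

A buffer that returns None here is most likely plain LPCM (or silence), which is what a source set to decode internally would send.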
 
@phofman that's great. I took a look at the register spec, and this seems like a fun project if I can get time to modify the driver; I have done Linux kernel driver work in the past. I know nothing of audio processing beyond samples at the ALSA level, so understanding audio HW and flows will be new to me. There are some registers for 3D audio; it looks like there is some parsing of metadata in HW? Not clear. I wonder if there is access to VI style sequences for testing the HW. That could help, but doubtful for anything. Play and get working!
 
Looking at the specs this does not have HDCP so will only receive audio at 16/44.1? Also unless the source supports it you won't get individual channels sent over the HDMI link?
1. Support up to 24bit/96K (96K is limited by AES output)
2. Support stereo and multi-channel output (greater than two channels, up to 8 channels)
3. The internal AES output interface is modular, and other interface outputs can be expanded through the I2S interface
4. The device has a total of 5 groups of AES outputs. After adding a DSP card with more than 8 channels, an additional 2 channels (1 group of AES outputs) can be added, and some frequency division processing can be performed, etc.
5. In addition to an independent HDMI 2-to-1 chip, the HDMI input end also has an independent HDMI 1 TO 2 chip, where HDCP is processed.
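As a rough sanity check on those numbers (my own arithmetic, not from the manufacturer): AES3 (AES/EBU) carries two channels per link, so 8 channels need 4 links and the fifth output gives channels 9 and 10. A small Python sketch, where the function names are made up for illustration and the payload figure counts audio bits only, without AES3 framing overhead:

```python
import math

CHANNELS_PER_AES3_LINK = 2  # AES3 (AES/EBU) carries two channels per link


def aes_links_needed(channels):
    """Number of AES3 links required for a given channel count."""
    return math.ceil(channels / CHANNELS_PER_AES3_LINK)


def payload_mbit_per_s(channels, bits=24, rate_hz=96_000):
    """Raw audio payload in Mbit/s (sample bits only, no framing)."""
    return channels * bits * rate_hz / 1e6


print(aes_links_needed(8))     # 4 links for 8 channels
print(aes_links_needed(10))    # 5 links once the DSP card adds 2 channels
print(payload_mbit_per_s(8))   # 18.432 Mbit/s at 24-bit / 96 kHz
```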
 
@STUNTfingers : Please find enclosed a picture showing the heart of the system, i.e. the BB, the MacMini M4 and the TRINNOV MC Pro!… The Genelec studio monitors have their own amplifiers, so they are perfectly optimized, and this setup reduces the number of links: there are only 7 XLR cables between them and the processor… I should point out that the room was also treated acoustically with some absorption panels and skyline diffusers to avoid early reflections as much as possible, which results in a large, stable and holographic soundstage even in stereo mode!… Obviously, the Spatial Audio is really immersive!
 

Attachment: FullSizeRender.jpeg
The BB is totally different from HDMI extractors! It’s a special piece of gear that converts HDMI audio into a professional format that is still digital, i.e. AES/EBU. Moreover, this conversion is available for 8 channels and not only for stereo…
 
Thanks for sharing this.

It looks like a very high-end setup. So the Trinnov manages the DSP? Do you happen to know how much jitter the Black Box produces?

Also are you using all Genelec for 5.1?
 
@STUNTfingers :

1-Sorry if I was imprecise in my description: no, the Optimizer MC Pro from TRINNOV does not manage the DSP of the BB... This audio processor is connected to the BB through the DB-25 cable according to the DOLBY protocol (which is not the TASCAM protocol, nor the standard AES-EBU protocol, because initially the BB was used with CP650/750 by Yulen). Be careful when building the DB-25 cable if the audio device following the BB expects the standard AES-EBU pinout.

2-We discussed the BB's jitter level with Yulen. The BB uses a specific chip (https://www.analog.com/media/en/technical-documentation/data-sheets/ADV7612.pdf), and we concluded that it is at least comparable to, or even better than, solutions using an FPGA... An option for the BB is planned for the future, but I prefer that Yulen talk about it.

3-Yes, all my GENELEC monitors are very easily routed through the TRINNOV MC Pro processor. The processor allows several routing options saved in its memory and accessible through the configuration pages. All the management is remotely accessible, since the MC Pro acts like a server and is on your network with an IP address. When the output signals of the BB are multi-channel, such as with NETFLIX for the Apple TV 4K or with the Mac Mini M4 with the Apple Music subscription, then the audio-spatial is reproduced WITHOUT any other decoder and in high resolution (24 bits - 192 kHz) by the set of 5 monitors (L,C, R, LS and RS) and the two subwoofers connected to the MC Pro. This is the reason I presently don't need to use the DSP included in the BB...

As it is currently, thanks in particular to the BB, this is a high-resolution audio system. Of course, as I said elsewhere, the listening studio is itself treated acoustically with absorbers and especially skyline diffusers that limit the attenuation of the signal and avoid many other acoustic problems such as early reflections, fluttering, excessive reverberation, etc.
 
I just ordered this Nuprime-X H16-A: https://nuprime-x.com/product/h16/. It won't ship from Taiwan until May though. Am very excited as it decodes ATMOS. They have only tested it to 9.1.4. Whereas my Audient Oria does 9.1.6. But I guess I will get to do the 9.1.6 test for them! Plan is to send it 7.1 True ATMOS over HDMI from an Nvidia Shield running Kodi. Hoping then to put sixteen channels (9.1.6) over Dante to the Oria for sound while the video continues to the TV over the eARC output. The engineers in Taiwan have not promised. But they have not discouraged me either.
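For anyone checking the channel arithmetic: a layout like 9.1.6 means 9 ear-level speakers, 1 LFE and 6 height speakers, so the total is just the sum of the parts. A tiny Python sketch (the function name is my own):

```python
def channel_count(layout):
    """Total discrete channels in a layout string such as '5.1' or '9.1.6'."""
    return sum(int(part) for part in layout.split("."))


for layout in ("5.1", "7.1", "9.1.4", "9.1.6"):
    print(layout, channel_count(layout))
```

So 9.1.4 needs 14 channels and 9.1.6 needs all 16.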
 
@Carousel : Thank you for this link. There are interesting pages about 3D audio on their web site. They point out that macOS currently decodes ATMOS audio from Dolby Digital Plus, which is a lossy codec... I was told that it was high-resolution, i.e. Dolby TrueHD! I think they are right, since the info from Apple is sometimes confusing... I'll investigate a software solution to this issue.
Anyway, since Apple doesn't stream any Dolby TrueHD signal, obviously no one could decode it!...
This device looks interesting even if it can't do more than the macOS decoding (i.e. Dolby Digital Plus), but I didn't understand how the audio is distributed to the amplifiers/loudspeakers or monitors, since the device outputs ONLY digital audio; hence the need for an external multichannel DAC, with or without an additional 3D decoder for TrueHD...
 