First post! I joined here as I think I'm wearing out my welcome on the electronics forum 🙂
There may be more experts in here who can nudge me away from peril, or my own stupidity, and in a more correct direction.
I set myself some requirements for how I personally want my audio served up.
A desktop system which will provide:
- multiple USB digital inputs,
- a single 3.5mm/RCA analogue 'aux' in,
- multiple 3.5mm/RCA analogue outs,
- the option of a single digital output (TOSLINK or coaxial S/PDIF, TBC),
- 2 (or more) internal 'buses' with minimal processing (mix/balance/EQ),
- nice to have: knobs, buttons and an LCD screen UI.
Without building anything I can find cheap rubbish for about £150 that will cover 75% of those requirements, but will end up fizzling, scratching and failing after a month. I can also find studio-grade monitor/cue routing boxes for 1U racks with multiple inputs and outputs, EQ and 4 or more headphone amps. However, these cost £3k+.
I'm not looking for "professional grade audio" or even a "professional grade mixer". Just something that allows me to keep the audio purely digital until the last possible moment, when it's handed off at the best quality reasonably achievable to an amplifier. In the case of the headphone amp, that will hopefully be a PCB trace away from the DAC... if I can't find a beefy enough I2S headphone amp with more than 500mW of output.
To that end, unless required to "cast" up in bit width for calculations, all audio will remain 16-bit. Unless there is a good reason, such as word alignment, buffer alignment or some other pressing reason, all audio will be 48K stereo.
This is cheating. Yes. It's the only reason I consider it achievable. If I wanted to support all the possible bit widths, sample frequencies and all manner of formats and endpoints... I would be better off just using a massive DSP chip and becoming long-term friends with its datasheet on the bedside table!
By standardising all streams to a single 48k 16-bit stereo format, and cutting both the USB and I2S buffers to the standard 1ms, I can treat audio buffers as cookie-cutter items. All of them will be 192 bytes containing 96 samples: 48 left, 48 right. This is subject to a trade-off between the efficiency/stability of processing larger buffers and their latency (and increased synchronisation requirements).
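A minimal C sketch of that buffer arithmetic, assuming interleaved L/R int16 samples. The macro names, `audio_buf_t` and `buffer_bytes` are hypothetical, not from any existing codebase; the `static_assert` simply makes the build fail if the cookie-cutter size ever drifts away from 192 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Format assumed from the post: 48 kHz, 16-bit, stereo, 1 ms buffers. */
#define FS_HZ           48000u
#define CHANNELS        2u
#define BUF_MS          1u
#define FRAMES_PER_BUF  (FS_HZ * BUF_MS / 1000u)      /* 48 L/R pairs */
#define SAMPLES_PER_BUF (FRAMES_PER_BUF * CHANNELS)   /* 96 samples   */

/* One "cookie cutter" buffer: interleaved L, R, L, R, ... */
typedef struct {
    int16_t s[SAMPLES_PER_BUF];
} audio_buf_t;

static_assert(sizeof(audio_buf_t) == 192, "1 ms of 48k/16-bit stereo should be 192 bytes");

/* General form, handy when weighing buffer length against latency. */
static unsigned buffer_bytes(unsigned fs_hz, unsigned ms,
                             unsigned channels, unsigned bytes_per_sample)
{
    return fs_hz * ms / 1000u * channels * bytes_per_sample;
}
```

For example, `buffer_bytes(48000, 2, 2, 2)` gives 384 bytes: twice the work per interrupt at the cost of an extra millisecond of latency.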
Ideally, running a SINGLE I2S clock across the project will prevent buffer creep/clock skew on all but the USB endpoints, which will need realignment from time to time. (The present test setup laps the buffer in about 30 minutes with a 24.576MHz clock on a breadboard. That is not ideal, as it includes about a minute where both DMAs are reading/writing the same frame!)
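To put numbers on the clock-skew problem: two clocks that are both nominally 48kHz but differ by some parts per million slip against each other at fs × ppm × 1e-6 frames per second. The hypothetical helper below turns that rate into "how long until one buffer's worth of slip". It does not know your actual ppm figure. That depends on your particular crystals or USB host and has to be measured.

```c
#include <assert.h>

/* Seconds until two nominally equal clocks, differing by ppm_offset
 * parts per million, drift apart by `frames` sample frames. */
static double seconds_to_slip(double fs_hz, double ppm_offset, double frames)
{
    assert(fs_hz > 0.0 && ppm_offset > 0.0);
    double slip_frames_per_second = fs_hz * ppm_offset * 1e-6;
    return frames / slip_frames_per_second;
}
```

At 48kHz, a 1ppm mismatch slips one 48-frame (1ms) buffer roughly every 1000 seconds (about 17 minutes), and 100ppm does it every 10 seconds. Each slip is a point where a sample has to be dropped or repeated, or the buffers realigned.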
The current outline architecture really hinges on one question: do I want to continue to use an MCU as the USB endpoint? I have had far, far too many issues with the driver code surrounding USB Audio Class on STM32. I still haven't solved the fact that it will not respond to the incoming audio stream about 4 times out of 5. I have to continually reset the MCU to get it to pick up the stream. Obviously a driver timing issue there: it's missing an important packet, or it's receiving it when it's not ready for it and the PC does not send another. I expect forcing an endpoint reset on power-up or on USB cable insertion may help sync them. The other option is using a hardware IC like a PCM2706. That has the issue that "consumer audio" ICs tend not to like 24.576MHz clocks; in fact they tend to have a max I2S MCLK of 12MHz. So until I can get my hands on one (shipping and IC supply issues) and test a few prototypes, I can't know whether it will be worth seeking an alternative slower clock or continuing with the MCU approach. The ideal here would be for the PCM2706 to run from an external I2S master clock, and for it to do the USB syncing, or ask the host to follow ITS clock. That would completely free me from doing any reclocking/reframing of audio streams, a huge bonus.
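For the "alternative slower clock" question, note that 24.576MHz is 512 × 48kHz. Halving it gives 12.288MHz (256 × fs), which is still just over a 12MHz MCLK ceiling. If that limit holds, the next common step down is 6.144MHz (128 × fs). Whether a particular chip accepts 128 × fs is a datasheet question. This hypothetical helper only does the arithmetic.

```c
#include <stdio.h>

/* Master clock for a given sample rate and MCLK/fs ratio. */
static unsigned long mclk_hz(unsigned long fs_hz, unsigned ratio)
{
    return fs_hz * ratio;
}

/* Print the common ratios and flag which fit under a part's MCLK limit.
 * 128/256/384/512 x fs are typical; each datasheet says what it accepts. */
static void list_mclk_options(unsigned long fs_hz, unsigned long max_mclk_hz)
{
    static const unsigned ratios[] = {128, 256, 384, 512};
    for (size_t i = 0; i < sizeof ratios / sizeof ratios[0]; i++) {
        unsigned long f = mclk_hz(fs_hz, ratios[i]);
        printf("%3u x fs = %8lu Hz %s\n", ratios[i], f,
               f <= max_mclk_hz ? "(within limit)" : "(too fast)");
    }
}
```

`list_mclk_options(48000, 12000000)` flags only 128 × fs as fitting under 12MHz.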
The decision above knocks on to the internal bus architecture. I2S is great and all, but it's slow, as it forces you down to the sample-frequency domain for transmission. I mean, that kinda is the point of it. However, the I2S ports are really just SPI ports with a few additional signals, so for the internal bus I am free to use SPI at a MUCH faster rate than I2S. A 48K@16-bit stereo I2S stream is about 1.5Mbit/s (48,000 × 16 × 2 = 1.536Mbit/s). The native hardware SPI bus on even the small STM32F411 runs at 50Mbit/s (well, 48Mbit/s if you also need the 48MHz USB clock). A single 1ms buffer (192 bytes) can be transmitted and received in well under 250us on that bus; the raw transfer at 48Mbit/s is about 32us.
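The bit-rate and transfer-time arithmetic, as a small C sketch. It only counts raw SPI clocking, so real transfers will take somewhat longer once chip-select handling, gaps between words and DMA set-up are included. The helper names are made up for illustration.

```c
#include <assert.h>

/* Raw bit rate of an uncompressed PCM stream. */
static double stream_bps(double fs_hz, unsigned bits, unsigned channels)
{
    return fs_hz * bits * channels;
}

/* Microseconds on the wire for one buffer at a given SPI clock,
 * ignoring inter-word gaps, chip-select handling and DMA set-up. */
static double buffer_transfer_us(unsigned buffer_bytes, double spi_bps)
{
    assert(spi_bps > 0.0);
    return buffer_bytes * 8.0 / spi_bps * 1e6;
}
```

`stream_bps(48000, 16, 2)` gives 1,536,000 bit/s, and `buffer_transfer_us(192, 48e6)` gives 32us. A 48Mbit/s bus therefore sits idle for most of each millisecond even when it carries several streams.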
High level: multiple input endpoints can "dump" their audio packets (pre-gained) onto one or two internal buses. These buses are mastered and processed by a beefier MCU such as an STM32H7. Processing will include down-mixing (simply mixing two streams together) and parametric EQ per bus, not per channel, not unless I have LOTS of free horsepower, which I doubt. I won't be writing perfectly optimised filter code and will probably be re-calculating biquad coefficients on the fly.
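A minimal sketch of the down-mix step: two interleaved 16-bit buffers, each with its own gain, summed in wider integers and saturated back to 16 bits. The Q15 gain format is an assumption for illustration, as are the function names. Here 32768 means unity, and the gain is held in an int32_t so that unity is exact.

```c
#include <stddef.h>
#include <stdint.h>

#define Q15_ONE 32768  /* gain of 1.0 in Q15, stored in an int32_t */

static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Mix two interleaved 16-bit buffers of n samples into out, applying a
 * Q15 gain to each input. Products and the sum are formed in 64 bits,
 * then saturated to 16 bits. */
static void mix2_q15(const int16_t *a, int32_t gain_a,
                     const int16_t *b, int32_t gain_b,
                     int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int64_t sa = (int64_t)a[i] * gain_a / Q15_ONE;
        int64_t sb = (int64_t)b[i] * gain_b / Q15_ONE;
        out[i] = sat16(sa + sb);
    }
}
```

Saturation is the crude answer to an overflowing sum. A bus processor may prefer to scale or limit instead.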
Finally, of course, output routing. Each internal bus can be assigned to any output... or, more practically, the buses will send their audio out regardless and each output endpoint is free to pick up that bus or not.
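A minimal sketch of the "buses publish, outputs subscribe" arrangement. Every bus produces a 1ms buffer each cycle, and each output simply records which bus it follows, if any. The routing table contents, sizes and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_BUSES     2
#define N_OUTPUTS   4
#define BUF_SAMPLES 96          /* 1 ms of 48k 16-bit stereo */
#define OUT_MUTED   (-1)

/* Every bus publishes one buffer per cycle whether anyone listens or not. */
typedef struct { int16_t s[BUF_SAMPLES]; } buf_t;

/* Each output records which bus it follows (example routing only). */
static int output_source[N_OUTPUTS] = {0, 0, 1, OUT_MUTED};

static void route_outputs(const buf_t bus[N_BUSES], buf_t out[N_OUTPUTS])
{
    for (size_t o = 0; o < N_OUTPUTS; o++) {
        int src = output_source[o];
        if (src >= 0 && src < N_BUSES)
            out[o] = bus[src];                   /* pick up that bus   */
        else
            memset(&out[o], 0, sizeof out[o]);   /* silence when muted */
    }
}
```

Re-routing is then just a write to `output_source`. The buses never need to know who is listening.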
4x input (I2S) -> 2x bus mixer (STM32F411) -> 50Mbit SPI -> 2x bus processor (STM32H7) -> 4x output (I2S)
If the PCM2706 avenue hits a dead end, I can drop the I2S bus and just use the SPI bus there. The exception would be the analogue aux in, which will have to have its I2S stream re-bussed to SPI.
I already have breadboard prototypes of mixing an ADC stream with a USB stream, including buffer alignments. I have a list of prototypes to put together and test. Each subcomponent or bus topology has a set of prototypes to test whether it works as expected (or not, as the case may be). Each helps me decide which path to take.
The hardest part stems from my lack of formal mathematics training. I just don't speak maths. That creates an issue when researching EQs and filters, where the main populace are electrical engineers who insist on deriving everything from scratch every time and making you sit through the explanation in derivatives each time. I can find open-source libraries to pilfer/borrow; it's just a matter of finding the easiest to port to ARM DSP biquads.
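Once the coefficients exist, running a biquad needs no calculus at all, just five multiplies and four remembered values per section. Below is a plain Direct Form I sketch in float, with sections cascaded so that, for example, 2 shelves and 3 peaks become 5 stages. Each channel needs its own state, so a stereo bus runs two copies. Libraries differ on the sign of the feedback terms. CMSIS-DSP's biquad functions, for one, document a1 and a2 with the opposite sign to the form used here, so that is worth checking when porting coefficients. The type and function names are hypothetical.

```c
#include <stddef.h>

/* One second-order section, Direct Form I:
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
 * with coefficients already normalised so that a0 == 1. */
typedef struct {
    float b0, b1, b2, a1, a2;   /* coefficients                  */
    float x1, x2, y1, y2;       /* state: previous inputs/outputs */
} biquad_t;

static float biquad_step(biquad_t *f, float x)
{
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1;  f->x1 = x;
    f->y2 = f->y1;  f->y1 = y;
    return y;
}

/* Run a cascade of sections in place over one channel's samples. */
static void biquad_cascade(biquad_t *stages, size_t nstages, float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float v = buf[i];
        for (size_t s = 0; s < nstages; s++)
            v = biquad_step(&stages[s], v);
        buf[i] = v;
    }
}
```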
To explain the requirements via use cases.
Scenario: I am in the office playing some games on the gaming PC. Currently its analogue output goes to a multi-output headphone amp which I use as an output mixer. I have levels for the wired headphones and the desktop monitor speakers, either or both at whatever level. On that particular PC I can even connect to the wireless headphones, in which case they take over the audio stream entirely.
However, that is not the only source of sound in the office. Following the gaming session I might shut down the power-hungry gaming PC and carry on with the micro-PC for basic desktop stuff. Currently I have to swap the audio cable over physically. The micro-PC does not have Bluetooth, and even if I added a dongle, it would fight with the gaming PC over binding/pairing etc. Been there with the bedroom PC and TV fighting over the headphones.
I have plans to add another tiny micro PC dedicated to the electronics bench as a "thin client" and provide USB connections and a monitor. Having sound would be nice also.
My "audio router" box will solve this by making the USB audio inputs from both/all PCs available in one place. Also in one place are the outputs for the desktop speakers, the headphones AND a bluetooth source for the headphones. So all outputs can be interconnected to all sources, regardless of which PC or device is on or off, playing or stopped.
There is one aspect I insist on being "professional" grade, and that is headroom and margins. Headphone amps are a case in point. Most "consumer grade" headphone amps have a nanny-state limiter. They are excessively annoying. If I want to melt my headphones or damage my hearing, that is my choice. I shall not have any IC or hardware which thinks otherwise. My current headphone amp has been "limited" by its output series resistors (against 38 Ohm headphones) to about 1 watt. It's not that I ever push the headphones to 1 watt, or that they can even play at that level without distortion; it's just that I never want to run out of gain budget. I'm willing to accept the risk that hitting play without checking the levels or EQ settings can be painful.
Seems unnecessarily complicated, and resolution limited by building in the USB codecs. Yamaha made a sort of DAC especially for desktop use that might do the job: the Yamaha Personal Sound Processor DP-U50.
http://www.byrneweb.com/sunburn/audio/yamahadpu50.html
https://www.amazon.com/Yamaha-Audio-Soundboard-Discontinued-Manufacturer/dp/B00005I9PU
better pictures of one here:
https://www.ebay.ca/itm/334209249252
It has a headphone jack, but the guy who reviewed it wasn't impressed. Maybe it's possible to wire in a better headphone amp.
Or maybe a Cambridge Audio DacMagic? I'm using one I got from a yard sale for $20, which sadly has no volume control or headphone output and doesn't go above 96k. The current version does, and adds Bluetooth input.
https://www.cambridgeaudio.com/row/en/products/hi-fi/dacmagic/dacmagic-200m
An A/V receiver or pre/processor might do the job. Budget-priced new pre/pros are practically non-existent, but obsolescence might mean that pre-HDMI units have poor resale value. An A/V receiver might seem like overkill, but a pre-HDMI unit can be cheap and have a bunch of digital inputs, depending on where it sits in the model line-up, and probably has a headphone output. Some Pioneers had multichannel line outputs if you wanted to use a better headphone amp.
Those options don't meet the requirements. The DacMagic, for example, has 2x TOS inputs (one doubling as S/PDIF) and 1 USB.
So, the inputs. The USB is useful to me. I have one optical S/PDIF out, and BT of course. Thing is, it's one or none. Not what I want. I want simultaneous audio.
This is maybe one of the areas where my requirements depart rapidly from the "HiFi mindset". I'm not interested in creating a HiFi. I may leave the analogue circuitry completely out of the box for various reasons. I'm creating a digital audio router to handle the annoyance of having different sources. These are not music sources or movie sources; they are generic sources, from things like PCs and BT devices like smartphones, which can carry all kinds of audio use-cases.
In the HiFi world, the number of times you want to listen to the digital radio while watching a DTS movie is so close to zero that very few manufacturers would even put it on the 5th page of requirements.
In my case I might have music playing from one of my sources, but I would like the others to remain active, so that if, for example, my boss phones me on the laptop, the ringer still plays through the outputs.
Yes this can all be done quite easily if you are on a single PC host. You can mux, duplex and even select outputs, all you want. It's when you have 3 PCs it becomes tricky. I've been down that research route of networked sound and sound servers and ... no.
Then there are outputs. I'm going to bet when you plug the headphones into that DACMagic the speaker outs cut. I'd like a way to re-route different inputs to different outputs. Such that most things come out the speakers, but if I want to listen to one source through the headphones.... I don't have to start muting the speakers, turning the amp down or repatching things.
On the complicated aspect: if you stay away from high-end studio gear and don't need high-bandwidth multichannel like DTS/DolbyN.N.N and what not, then the consumer-grade DAC/ADC/codec/DSP IC market tends to produce ICs which just work. There is no messing around with config interfaces and memory registers: you plug them in (solder them up), pull the FMT and FSEL pins (format and function select) the way you want them and turn them on. Boom, out comes your stream of digital audio. ADCs and DACs work that way, and so do the USB codecs. China will literally sell you a board with a high-end consumer ADC on one end and a DAC on the other for $4. You can mock, but if you knew what you were looking at you probably wouldn't. I've seen them with Cirrus audio ADCs and Burr-Brown PCM5102s set up for 384K 32-bit... for less than $10. Productise that, sell it in the western world with a nice brand name on it, and it's a £300+ box.
Except for the latter (USB), these consumer-grade "jelly bean" devices come with a range of formats. ADCs often have selectable 48k-192K at 24-bit or higher. My currently selected output DAC is a PCM5102A. It will auto-detect and just work with whatever you throw at it, within reason, up to 384K@32-bit. It's impressively quiet and provides more dynamic range and stereo separation than my PC's headphone socket (an ES9037, I believe). It's a £3.88 +VAT part. I'm going to order a strip of 10 from TI direct, in case I cook a few soldering them. The only downside is that its overload behaviour is hideous, so the circuitry downstream of it will need enough gain that the DAC never needs to be driven hard. It will, though, drive a pair of headphones to moderate volume with no additional supporting circuitry or amplifier.
DSPs, particularly audio DSPs, are not all created equal. They don't all do the same things. Typically, if a commercial product advertises that it has a "DSP", they mean a digital effects processor, which is still a DSP, just one with specific functions enabled: home theatre DSP/effects modes, for example. A less "shouty" use of a DSP would fit my project entirely, and that is using it for all the other stuff DSPs do really well, such as managing multiple cross-clock-domain I2S/SPDIF/etc. channels, delays, alignment, buffering, correction, filters, anti-pop, anti-noise, anti-buffer-crash: all the little edge cases covered in one central processing IC. Brilliant... until you open the datasheet and find it's 100 pages long, but is only the electrical datasheet; to find out how to use it you need the Reference Manual, which is 2500 pages. You find that the little inch-square DSP has not one but two ARM cores on it, with kilobytes of configuration registers and I2C and SPI configuration buses. It's a career learning to drive one of those! It's the programming of those bad boys that adds the engineering labour costs to a high-end studio rack box that costs 10K+.
So not for me. A central DSP with a few auxiliary supporting front-ends or rear-ends makes far more commercial sense when you consider that a digital audio product company will not be doing all that from scratch, but will have in-house experience to take advantage of. Once they program a DSP the way they wish, they can stamp out millions of products with just software config tweaks, and the BOM cost comes down because the number of components comes down, even if the remaining ones are higher value.
In terms of actual digital signal processing, all I need is gain, adders and, the most difficult part, a multiband parametric EQ. 2 shelves and 3 peak bands would do.
So that is why the plan is to use cheap though powerful general purpose MCUs and do the job of the DSP in software. The actual MCU chip is probably two or three times more expensive than a basic DSP but I only need to make one or two of these. So I can afford to splurge on BOM, there is effectively no limit on the BOM. If an IC I want costs me £15 each due to shortages, I'm fine with that. My day job is software engineering, so I'm fine with software complexity.
Consumer grade ADC ICs. Consumer grade DAC ICs. Consumer grade USB Bridges. Probably a few consumer grade I2S reclocking ICs.
48K 16-bit is the target requirement for V1. I have decided to keep a single flat clock/sample rate throughout. There are reasons and caveats. I don't need professional-grade, totally lossless encoding/decoding. I'm fine with a 24kHz Nyquist limit, which is still above the 20kHz usually quoted for human hearing. Hell, I can't hear anything over 14K at all, and anything beyond about 12K is a perception rather than a sound. I don't even hear CRTs whine any more when I enter a room with one. I literally dropped a high shelf with a 3dB crossover around 16k on an EQ and couldn't hear the difference between +15dB and -15dB. The cat was not impressed, though.
32-bit. 32-bit is both 'easy' and really difficult for the same reason. It aligns perfectly with the architecture of most ARM-core MCUs/MPUs, and their DSP extensions are designed to work with 32-bit numbers. What makes it harder, however, is that ALL of your mathematics has to be carefully bounded within the 32-bit width and cognisant of any overflows/carries. In reality you will be forced to DROP precision by shifting your bits right so you have headroom for calculating or downmixing. It's that horrible sacrifice audio people are too willing to make, IMHO. You don't go down in precision to come back up; you always do it the other way! 99.9% of amplifiers attenuate their input signal and then amplify it with a fixed gain. It's the same thing in the DSP code. If you start with full-range 32-bit numbers, you can't quite just add them together, and you certainly can't multiply them by anything greater than 1. So you shift them (divide by 2) as many times as you think you'll need headroom for what comes next. After that stage you are then responsible for rescaling your output back to 32-bit. Yuk.
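To make the headroom dance concrete, here is a hypothetical helper that sums several full-scale 32-bit samples. It first divides each one by 2^headroom_bits, then scales the total back up with saturation. It assumes no more than 2^headroom_bits inputs, checked with an assert. Division stands in for the right shift so negative samples behave portably in C. On a real Cortex-M you would more likely use arithmetic shifts or the saturating instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum nsrc full-scale 32-bit samples: give each `headroom_bits` of
 * headroom first, then scale the total back up and saturate. The low
 * bits discarded by the division are the precision being sacrificed. */
static int32_t sum_with_headroom(const int32_t *samples, size_t nsrc,
                                 unsigned headroom_bits)
{
    assert(headroom_bits < 31 && nsrc <= ((size_t)1 << headroom_bits));
    const int32_t scale = (int32_t)1 << headroom_bits;

    int32_t acc = 0;
    for (size_t i = 0; i < nsrc; i++)
        acc += samples[i] / scale;        /* cannot overflow given the assert */

    int64_t up = (int64_t)acc * scale;    /* rescale back to full 32-bit range */
    if (up > INT32_MAX) return INT32_MAX;
    if (up < INT32_MIN) return INT32_MIN;
    return (int32_t)up;
}
```

Summing {1, 1} with one bit of headroom returns 0 rather than 2. The bottom bit of each input is exactly the precision that gets thrown away.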
24-bit. A word 3 bytes wide. No thanks. Next. Seriously, it's just a whole load of 'dicking about' with half words and quarter words. It "does" provide you with some headroom in a 32-bit word for safety, though.
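For a taste of the 'dicking about', here is a sketch that unpacks one packed, little-endian 3-byte sample into a sign-extended int32_t and packs it back. The packed layout is an assumption. I2S links more commonly carry 24-bit audio MSB-aligned in a 32-bit slot, which needs a different (simpler) shift. The function names are hypothetical.

```c
#include <stdint.h>

/* Read one packed, little-endian, signed 24-bit sample and sign-extend
 * it to 32 bits. XOR-then-subtract avoids implementation-defined casts. */
static int32_t s24le_to_s32(const uint8_t *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    return (int32_t)(u ^ 0x800000u) - 0x800000;
}

/* Write the low 24 bits of v back out as 3 little-endian bytes. */
static void s32_to_s24le(int32_t v, uint8_t *p)
{
    uint32_t u = (uint32_t)v;
    p[0] = (uint8_t)(u & 0xFFu);
    p[1] = (uint8_t)((u >> 8) & 0xFFu);
    p[2] = (uint8_t)((u >> 16) & 0xFFu);
}
```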
16-bit. It's lovely. It's mommy bear's porridge. You can get 2 samples into a single 32-bit word "on the wire", which maps perfectly happily to L/R (or R/L, depending on whether you forgot to set that pin right 🙂): one L sample, one R sample, in one 32-bit word. So if you want to do anything relating to the stereo pair... it's right there in the same temporal and logical location. No need to go peek-and-seek through the buffer to find the associated L/R sample or the other xSB of the 32-bit word.
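A sketch of the one-word-per-stereo-pair idea: packing and unpacking L/R 16-bit samples in a uint32_t, plus a mono fold-down that only touches that single word. Which half-word holds L depends on the DMA/endianness setup (and that pin). The layout below, with L in the low half, is an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

/* Convert a 16-bit pattern to signed without implementation-defined narrowing. */
static int16_t u16_to_s16(uint16_t u)
{
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Assumed layout: left in the low half-word, right in the high half-word.
 * Swap the shifts if the channels come out the other way round. */
static uint32_t pack_lr(int16_t l, int16_t r)
{
    return (uint32_t)(uint16_t)l | ((uint32_t)(uint16_t)r << 16);
}

static int16_t left_of(uint32_t w)  { return u16_to_s16((uint16_t)(w & 0xFFFFu)); }
static int16_t right_of(uint32_t w) { return u16_to_s16((uint16_t)(w >> 16)); }

/* Anything involving the pair, such as a mono fold-down, reads one word. */
static int16_t mono_of(uint32_t w)
{
    return (int16_t)(((int32_t)left_of(w) + right_of(w)) / 2);
}
```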
When it comes to calculations, it's very natural in ARM to treat 16-bit and 32-bit words interchangeably. You can ask a 16-bit question of a 32-bit word and get a 16-bit answer, if that makes sense. In code you can literally cast buffers between 32-bit unsigned and 16-bit signed as long as it's valid. If you take 2 or 5 16-bit samples and add them all together, you can bet the result won't fit into your 16-bit word, not without throwing away 4/5ths of each one's dynamic range. But it will fit into the parent 32-bit word.
The trickier part is what to do with an output that now doesn't fit into your output word. There are two approaches. Is it the result of the previous calculation? Did you just add a bunch of stuff? Then chop it down by the appropriate weighting that undoes the totalling effect (a weighted average, either static or dynamic, i.e. multichannel compression). Note that doing this after the mix preserves the most dynamic range/SNR. Or you could compress it or limit it; or, if you are in a section of code where it really wasn't your fault, you mute the whole frame, flash a red LED that says "CLIP" or "O/L" and move on. If someone dumps two 100% volume streams into a single mix bus, it will clip, and that's not the processing bus's fault.
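Two of those options, sketched as hypothetical C helpers that take 32-bit accumulator values and produce 16-bit output. The first is a static weighted average (divide by the number of sources mixed). The second is the "not my fault" path, which mutes the whole frame and reports a clip so the caller can light the CLIP/O-L LED. Dynamic weighting, compression and limiting are left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Undo the totalling with a static weighting, done once after the mix. */
static void reduce_weighted(const int32_t *acc, size_t n, int32_t nsources, int16_t *out)
{
    assert(nsources > 0);
    for (size_t i = 0; i < n; i++) {
        int32_t v = acc[i] / nsources;
        out[i] = (int16_t)(v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : v);
    }
}

/* If any sample in the frame won't fit in 16 bits, mute the whole frame
 * and return true so the caller can flag CLIP; otherwise copy it through. */
static bool reduce_or_mute(const int32_t *acc, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (acc[i] > INT16_MAX || acc[i] < INT16_MIN) {
            for (size_t j = 0; j < n; j++)
                out[j] = 0;
            return true;
        }
    }
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)acc[i];
    return false;
}
```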
Caveats. Other than using 32 bits for headroom and alignment (for the likes of the ARM 32-bit biquads in their DSP extensions), I'm sticking to 16-bit 48K as the target architecture. That does not mean I'm not prototyping and playing with higher rates. It's just that stepping up to even 24-bit/96k would make my margins a lot narrower, and my code and interfaces would need to get tighter and tighter. At least the first time round I'm aiming somewhere I can be a bit sloppy and slow and still not add more than a few ms of latency. I do have an Atmel DSP Anamero(?) in the post: 384K@32-bit + D512 and all those high-end formats. It will be interesting to play with, but the processing power to work with those bitrates is... you'd need an ASIC DSP, basically. They only truly shine when you are handling 7, 8, 9, 10 channels of theatre audio. It would be perfume on a pig for a single stereo stream from Spotify, no?
However, your post did remind me I DO have an optical S/PDIF output on the "big" PC. It turns out several of the same codec chips I have been looking at support S/PDIF. That gives me one less pesky USB ground to worry about. My easy solution for power rail noise is going to be running the box on batteries with opto-coupled USB ports. When I switch it OFF, it can charge its battery with as much noise as it likes. Opto-coupled USB isolators that support anything faster than USB 1.1 cost a fortune; USB 1.1 ones, which are enough for 48K 16-bit 2-channel audio, cost a fiver. Another reason to stay at 48k.
So the inputs. The USB is useful to me. I have one optical SPDIF out, BT of course. Thing is it's one or none. Not what I want. I want simultaneous audio.
This is maybe one of the areas my requirement depart rapidly away from "HiFi mindset". I'm not interested in creating a Hifi. I may leave the analogue circuitry completely out of the box for various reasons. I'm creating a digital audio router to handle the annoyance of having different sources. These sources are not music sources, or movie sources they are generic sources from things like PCs and BT devices like smart phones which can carry all kinds of audio use-cases.
In the HiFi world the number of times you want to listen to the digital radio at the same time as watching a DTS movie are so close to zero very few manufacturers would even put that on the 5th page of requirements.
In my case I might have music playing from one of my sources, but I would like that the others remain active, such that, say, for example my boss phones me on the laptop, I'd still like that ringer sound to play through the outputs.
Yes this can all be done quite easily if you are on a single PC host. You can mux, duplex and even select outputs, all you want. It's when you have 3 PCs it becomes tricky. I've been down that research route of networked sound and sound servers and ... no.
Then there are outputs. I'm going to bet when you plug the headphones into that DACMagic the speaker outs cut. I'd like a way to re-route different inputs to different outputs. Such that most things come out the speakers, but if I want to listen to one source through the headphones.... I don't have to start muting the speakers, turning the amp down or repatching things.
On the complicated aspect. If you stay away from the high-end studio gear and you do not have a requirement for high bandwidth multi-channel like DTS/DolbyN.N.N and what not, then the consumer grade DAC/ADC/Codec/DSP IC market tends to produce ICs which just work. There is no messing around with config interfaces and memory registers, you plug them in (solder them up), pull the FMT and FSEL pins (Format and function select) to the way you want them and turn them on. Boom out comes you stream of digital audio. ADCs and DACs function that way. So do the USB codecs. Literally china will sell you a board with a high end consumer ADC on one end and a DAC on the other for $4. You can mock, but if you knew what you were looking at you probably wouldn't. I've seen them with Cyress audio ADCs and BurrBrown PCM5102's setup for 384K 32bit... for less than $10. You productise that and sell it in the western world with a nice brand name on it and it's a £300+ box.
Except for the latter (USB), these consumer-grade "jelly bean" devices come with a range of formats. ADCs often have selectable 48k-192K at 24bit or higher. My currently selected output DAC is a PCM5102A. It will auto-detect and just work with whatever you throw at it, within reason, up to 384K@32bit. It's impressively quiet and provides more dynamic range and stereo separation than my PC's headphone socket (an ES9037, I believe). It's a £3.88 +VAT part. I'm going to order a strip of 10 of them direct from TI, in case I cook a few soldering them. The only downside is that its overload behaviour is hideous, so the circuitry downstream of it will need to buffer it with enough gain that the DAC never needs to be driven hard. It will, though, drive a pair of headphones to moderate volume with no additional supporting circuitry or amplifier.
DSPs, particularly audio DSPs, are not all created equal. They don't all do the same things. Typically, if a commercial product advertises that it has a "DSP", they tend to mean a digital effects processor, which is still a DSP, just one with specific functions enabled, e.g. home theatre DSP/effects modes. A less "shouty" DSP usage would fit my project entirely: using the DSP for all the other stuff it does really well, such as managing multiple cross-clock-domain I2S/SPDIF/etc. channels, delays, alignment, buffering, correction, filters, anti-pop, anti-noise, anti-buffer-crash: all the little edge cases covered in one central processing IC. Brilliant.... Until you open the datasheet and find it's 100 pages long, and that's only the electrical datasheet; to find out how to use it you need the Reference Manual. It's 2500 pages. You find that the little inch-square DSP has not one but two ARM cores on it, plus kilobytes of configuration registers behind I2C and SPI configuration buses. It's a career learning to drive one of those! It's the programming of those bad boys that adds the engineering labour cost to a high-end studio rack box that costs 10K+.
So not for me. The central DSP with a few auxiliary supporting front-ends or rear-ends makes far more commercial sense when you consider that a digital audio product company won't be doing all that from scratch, but will have in-house experience to take advantage of. Once they've programmed a DSP the way they want, they can stamp out millions of products with just software config tweaks, and the BOM cost comes down because the number of components comes down, even if the remaining ones are higher value.
In terms of actual digital signal processing, all I need is gain, adders and, the most difficult part, a multiband parametric EQ. 2 shelves and 3 peak bands would do.
So that is why the plan is to use cheap though powerful general-purpose MCUs and do the DSP's job in software. The actual MCU chip is probably two or three times more expensive than a basic DSP, but I only need to make one or two of these, so I can afford to splurge on the BOM; there is effectively no limit on it. If an IC I want costs me £15 each due to shortages, I'm fine with that. My day job is software engineering, so I'm fine with software complexity.
Consumer grade ADC ICs. Consumer grade DAC ICs. Consumer grade USB Bridges. Probably a few consumer grade I2S reclocking ICs.
48K 16 bit is the target requirement for V1. I have decided to keep a single flat clock/sample rate throughout. There are reasons and caveats. I don't need professional-grade, totally lossless encoding/decoding. I'm fine with the 24kHz Nyquist limit that 48K gives me. Hell, I can't hear anything over 14K at all, and anything beyond about 12K is a perception rather than a sound. I don't even hear CRTs whine any more when I enter a room with one. I literally dropped a high shelf with a 3dB crossover around 16k on an EQ and couldn't hear the difference between +15dB and -15dB. The cat was not impressed though.
32 bit. 32 bit is both 'easy' and really difficult for the same reason. It aligns perfectly with most ARM-core MCU/MPU architectures, and their DSP extensions are designed to work with 32bit numbers. What makes things harder, however, is that ALL of your mathematics has to be carefully bounded within the 32bit width and cognisant of any overruns/carries. In reality you will be forced to DROP precision by shifting your bits right so you have headroom for calculating or downmixing. It's that horrible sacrifice audio people are too willing to make IMHO. You don't go down in precision to come back up; you always do it the other way! 99.9% of amplifiers attenuate their input signal and then amplify it with a fixed gain. It's the same thing in DSP code. If you start with full-range 32bit numbers, you can't quite just add them together, and you certainly can't multiply them by anything greater than 1. So you shift them right (divide by 2) as many times as you think you'll need headroom for what comes next. After that stage you are responsible for rescaling your output back to 32bit. Yuk.
24bit. A word 3 bytes wide. No thanks. Next. Seriously, it's just a whole load of 'dicking about' with half-words and quarter-words. It "does" provide you with some headroom in a 32bit word for safety, though.
16 bit. It's lovely. It's mommy bear's porridge. You can get 2 samples into a single 32bit word "on the wire", which maps perfectly happily to L/R (or R/L, depending on whether you forgot to set that pin right 🙂): one L sample, one R sample, in one 32bit word. So if you want to do anything relating to the stereo pair... it's right there in the same temporal and logical location. No need to go peek-and-seek through the buffer to find the associated L/R sample or the other xSB of the 32bit word.
When it comes to calculations, ARM makes it very natural to treat 16bit and 32bit words interchangeably. You can ask a 16bit question of a 32bit word and get a 16bit answer, if that makes sense. In code you can literally cast buffers between 32bit unsigned and 16bit signed as long as it's valid (mind alignment and C's strict-aliasing rules). If you take 2 or 5 16bit samples and add them all together, you can bet the result won't fit into a 16bit word, not without throwing away 4/5ths of each one's dynamic range. But it will fit into the parent 32bit word.
The trickier part is what to do with output that no longer fits into your output word. There are two approaches. Is it the result of the previous calculation? Did you just add a bunch of stuff? Then chop it down by the weighting that undoes the totalling effect (a weighted average, either static or dynamic, i.e. multichannel compression). Note that doing this after the mix preserves the most dynamic range/SNR. Or you could compress or limit it, or, if you are in a section of code where it really wasn't your fault, mute the whole frame, flash a red LED that says "CLIP" or "O/L" and move on. If someone dumps two 100% volume streams into a single mix bus it will CLIP, and that's not the processing bus's fault.
Caveats. Other than using 32 bits for headroom and for alignment with the likes of ARM's 32bit biquad routines for their DSP extensions, I'm sticking to 16bit 48K as the target architecture. That doesn't mean I'm not prototyping and playing with higher rates. It's just that stepping up to even 24bit/96k would make my margins a lot narrower, and my code and interfaces would need to be tighter and tighter. At least the first time round I'm aiming somewhere I can be a bit sloppy and slow and still not add more than a few ms of latency. I do have an Atmel-based Amanero(?) board in the post: 384K@32bit + DSD512 and all those high-end formats. It will be interesting to play with, but the processing power to work with those bitrates is.... you'd basically need an ASIC DSP. They only truly shine when you are handling 7, 8, 9, 10 channels of theatre audio. It would be perfume on a pig for a single stereo stream from Spotify, no?
However, your post did remind me that I DO have an optical SPDIF output on the "big" PC. It turns out several of the same codec chips I have been looking at support SPDIF. That means one less pesky USB ground to worry about. My easy solution for power rail noise is going to be running the box on batteries with opto-coupled USB ports. When I switch it OFF, it can charge its battery with as much noise as it likes. Opto-coupled USB isolators that support anything faster than USB1.1 cost a fortune; USB1.1 couplers, which are plenty for 48K 16bit 2-channel audio, cost a fiver. Another reason to stay at 48k.
Well, doing it digitally sounds like a big undertaking. It would be trivial to mix a bunch of analog inputs using op-amps. I'd be tempted to start with an off-the-shelf DJ mixer, then build an add-on box for more channels and tie that into the main and cue busses. But good luck with your project. Maybe you should get an analog mixer to use while you work on the digital version? BTW, there are $5 DACs on ebay; they resemble the one I bought about 10 years ago, but perhaps some corners have been cut since then.
https://www.diyaudio.com/community/...olution-how-good-could-it-sound.211702/page-2