24 channels USB to I2S interface (with Source codes, ASIO/VHDL/Schematic)

Thank you Koon. I will work through it with the sim first. I have a fair bit of MCU experience but I appreciate your advice: breadboard first!

Trevor, good summary of USB audio options. Note this option is not USB Audio compliant and uses a different model (FIFO buffer) not described in the paper (although quality should be equivalent to "best"). Latency may be a little higher with this solution because of the time taken to fill the buffer up to the level where the FPGA starts transferring the data, but it should only be in the low msec range as the USB transfer rate is very fast (faster than the output sample rate).
 
Looking into FPGA makes/series at the moment. Does anyone have any data on which FPGA make/series has low jitter output, or what techniques can be used to lower jitter on the output of the FPGA? I'm not talking about the clocks, but the inherent jitter in the FPGA itself through internal gate switching (also jitter may be irregular depending on the logic path taken for the signal).
 
now I'm listening through foobar - Reaper - my ASIO driver.:)
But foobar direct has some trouble, (1) does not show time info, (2) no graphical FFT meter, (3) when stopping, debug assertion (no source code for foo_asio_out.dll)
maybe.. ASIOTimeStamp function? I need to look into foobar plugin structure.
 

Attachments

  • koonasio01_20111117.cpp.txt
Koon, check with the foobar forum at hydrogenaudio (developer forum) for your ASIO question.

I have a couple more questions as I am considering a design similar to yours (and it is a good learning exercise for FPGA). Why did you use an expensive FPGA board? I have loaded your VHDL code into Xilinx Navigator and got it to compile easily for a small CPLD, which is a much simpler and cheaper alternative. You don't need too many logic blocks for this function (although I'm considering using SDRAM, which needs a lot more code to manage the SDRAM as a FIFO).

Also, why did you mention to isolate after the FPGA? A better option would be to isolate between the FTDI board and the FPGA, then any extra jitter added by the isolator will be removed by the FIFO, and the FPGA can share the digital ground with the other digital logic.

To optimise your design for even lower jitter, use a low jitter flip-flop (like the Potato Semiconductor ones) as a reclocker for the I2S outputs.
 
Hi dean,
Please don't think my design is 'reference'.. just a doodle :eek:
FPGA module: I already have some FPGA modules, and in this case a 50-pin one is good.
Of course you can replace it with any other CPLD or FPGA if the VHDL fits in.
SDRAM: yes, you can use SRAM/SDRAM or FPGA internal block RAM. Why I use an external FIFO is that buffering is not the point of the VHDL (sample code should be simplified).
This DLP design board can implement everything (USB interface, SDRAM, VHDL) in this module, but that is not a good sample.

Isolation point, power, clock, buffer of FF: if you want isolation between the PC and the external hardware, you can place the isolation around the UM232H.
For isolation between your DAC and the other digital logic, it should be after the FPGA.
The FPGA can be a source of clock (power supply) jitter, because it uses a different clock.

FlipFlop/reclock: also, Master Clock on this board or Master Clock on the DAC | reclock on this board or reclock just before DAC | is selectable.
I use TI digital amplifier so I only need buffer / clock on this board.
If you use a splendid DAC, you will pull the master clock from the DAC, and isolation / reclock will be placed just before the DAC.
You can modify it as you like. Please enjoy.
 
Koon, according to your schematic it seems that you are using an 8K FIFO. Did you try this design with just the 1K FTDI internal FIFO? If you keep the channels to 8 and the sample rate to 96 kHz then you should be able to get about 40 frames into the 1K FIFO buffer, which is about 0.5msec worth of buffer. For High Speed USB that should be enough (a buffer of 2 x 512-byte bulk transfer USB packets). However, with a small FIFO the design will be open to dropouts if the PC can't keep the packet rate up (bulk mode does not guarantee timing).
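
The arithmetic behind that estimate can be sketched in a few lines. This is only a rough check, assuming 24-bit samples (3 bytes per channel) and no per-frame header; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Whole audio frames that fit in a FIFO of fifoBytes.
// Assumes one frame = channels * bytesPerSample, with no header bytes.
inline std::uint32_t framesInFifo(std::uint32_t fifoBytes,
                                  std::uint32_t channels,
                                  std::uint32_t bytesPerSample)
{
    assert(channels > 0 && bytesPerSample > 0);
    return fifoBytes / (channels * bytesPerSample);
}

// Playback time covered by that many frames, in microseconds (rounded down).
inline std::uint64_t framesToMicroseconds(std::uint32_t frames,
                                          std::uint32_t sampleRateHz)
{
    assert(sampleRateHz > 0);
    return static_cast<std::uint64_t>(frames) * 1000000u / sampleRateHz;
}
```

With a 1024-byte FIFO, 8 channels and 3 bytes per sample this gives 42 frames, or a little over 0.4msec at 96 kHz, which is in line with the "about 40 frames, about 0.5msec" figure.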
 
Further research has answered this question.

Even on a lightly loaded USB bus, Windows may cause delays in USB bulk transfer requests, so there is a high chance of buffer underruns and gaps in audio output on a Windows system using this approach. Basically the FIFO buffer between the FTDI module and the FPGA/CPLD should be as large as possible to minimize underruns. I've been trying to decide what hardware to use for my prototype (CPLD with FIFO chip, or FPGA with SDRAM), and for some reason even medium capacity FIFO chips are very expensive, so if a large buffer is needed, an FPGA with a cheap SDRAM chip is the way to go. Note that with an FPGA and SDRAM approach you typically need to implement both the SDRAM controller (dual port for async) and the FIFO logic. Luckily there are reusable code blocks for these functions.
 
Hi dean, great, you are designing your own?
When I tried the internal buffer only, I saw many dropouts (FIFO empty flag) from my test program.
I think 0.5msec should be enough for a driver, but I don't want to touch the Windows DDK..

I used 72V05, 8KB = 105 samples = 2.3msec. (+ internal buffer)
My ASIO driver runs a dedicated thread for buffer writing, and it did not cause drops. I checked the underrun flag from the FPGA.

Yes, you can use FPGA + SRAM or SDRAM buffering. I paid $20 to avoid thinking about more VHDL. :)
And my previous designs also used a FIFO, not SDRAM.

Or, this is vendor specific, but you can use Block-RAM in FPGA.
 
Yes, I calculated similar buffer times. However, some research on the Microsoft support site mentions there will be times when USB bulk transfers experience delays that can't be controlled by the user, which would lead to buffer underruns. I have seen these delays occasionally in my experiments with the hi-speed FTDI modules, sometimes in the 10s of msecs. See: USB Device Using Bulk Transfers Experiences Buffer Overflows

So although it's great that we have error correction in bulk transfer (so we are guaranteed to get the bits to the FPGA bit perfect), we cannot do much about the timings. So I think for this application you want to use a large FIFO, 1MB or greater, which will pretty much ensure that you won't get buffer underruns, especially if you have a slower PC that shares the USB bus with other peripherals (I wouldn't use a USB hard disk or digital tuner card in the PC at the same time). Having a 1MB FIFO would give almost 300msec, which should be more than sufficient; the internal FPGA RAM is not large enough for that. One thing I'm not sure of: I think the application sends the audio data through the device driver at the native sample rate, which means that if we add too much buffer we end up with a large latency that will be unacceptable for video applications, while a small buffer leaves us open to possible USB bus congestion. Maybe a 10msec latency would be a nice compromise. Have you stress tested your USB connection while looking for FIFO underruns?
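
To put numbers on the size/latency trade-off, here is a small sketch that converts between FIFO size and buffered time. It assumes a 72-byte frame (24 channels x 3 bytes, no header) at 44.1 kHz; both figures and the function names are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// FIFO bytes needed to hold latencyMs of audio at the given stream format.
inline std::uint64_t fifoBytesForLatency(std::uint64_t sampleRateHz,
                                         std::uint64_t bytesPerFrame,
                                         std::uint64_t latencyMs)
{
    return sampleRateHz * bytesPerFrame * latencyMs / 1000u;
}

// Milliseconds of audio that a FIFO of fifoBytes can hold (rounded down).
inline std::uint64_t latencyMsForFifo(std::uint64_t fifoBytes,
                                      std::uint64_t sampleRateHz,
                                      std::uint64_t bytesPerFrame)
{
    assert(sampleRateHz > 0 && bytesPerFrame > 0);
    return fifoBytes * 1000u / (sampleRateHz * bytesPerFrame);
}
```

Under these assumptions a 10msec target needs roughly 31KB of FIFO, and a 1MB FIFO holds about 330msec, so the "almost 300msec" ballpark holds and a 10msec compromise would need far less than 1MB.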

The SDRAM controller and FIFO code is complex but from my research you can use the Xilinx CoreIP for an asynchronous FIFO.
http://www.xilinx.com/support/documentation/ip_documentation/fifo_generator_ds317.pdf
I have also found a fairly decent FPGA development board for $55 with a Spartan 3A and 4MB of SDRAM (see the XESS homepage announcements), with some great samples and utilities to support the board, as well as the UM232H for the high speed USB for $20.

So with a little bit of work we can have something similar to the Exadevices for under $100 with these development modules, and if the design is successful you could make an integrated board for under $50.
 
Hi, XESS is very cheap! Nice! Do they provide sample VHDL for using SDRAM as a FIFO? Please don't assume Xilinx CoreIP will work as it is; it always needs modification.

you will use 11 pins for FTDI, and MCLK, SCK, LRCK, SD(1-N) so total 32 should be enough.
Basic design = FTDI UM232H + XESS Xula-200 + master Clock + Buffer, reclock, isolation etc

If you want to change clock, FTDI FT2232H module should be used. then,
PORT A: FIFO supply
PORT B: Bit Bang, to switch Clock input and mode (11.2896/12.000/22.5792/24.000 etc)

For USB delay: I'm using Core i7/Core i3 machines and no USB hub, and did not see a huge delay.
While debugging I added a "FIFO Empty - 2.8MHz binary 8bit counter" and watched the upper 4 bits of output.
That is 44msec, 22msec, 11msec, 5msec. None of them was activated.
Stress: Reaper is a heavy process, but I think Reaper gives the ASIO DLL high priority.
With PCMark2011 or another benchmark, I should see the dropout :)

If you want a "clean" environment for USB communication, would a dedicated PCIe x1 USB card with a straight connection separate the FTDI from other USB transactions?
 
Koon,

Yes, they provide SDRAM controller samples but not a FIFO. The FIFO needs to be asynchronous as you have different clocks for input (synchronous FT245 mode at 60 MHz) and output (48xFs), which means it needs a dual-port SDRAM controller (which XESS provide) plus Gray encoding, circular buffer logic, etc.
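
For anyone new to the Gray encoding part: the read and write pointers of an asynchronous FIFO are usually passed between clock domains in Gray code, so that only one bit changes per increment. Below is a software model of that idea in C++ (not VHDL, and not taken from the XESS samples); it assumes a power-of-two depth and pointers that carry one extra wrap bit.

```cpp
#include <cassert>
#include <cstdint>

// Binary to Gray code: consecutive values differ in exactly one bit.
inline std::uint32_t toGray(std::uint32_t b) { return b ^ (b >> 1); }

// Gray code back to binary (prefix XOR over all higher bits).
inline std::uint32_t fromGray(std::uint32_t g)
{
    std::uint32_t b = g;
    for (std::uint32_t shift = 1; shift < 32; shift <<= 1) {
        b ^= b >> shift;
    }
    return b;
}

// Fill level of a circular FIFO from Gray-coded pointers.
// Pointers count modulo 2*depth, so full and empty can be told apart.
inline std::uint32_t fillLevel(std::uint32_t wrGray,
                               std::uint32_t rdGray,
                               std::uint32_t depth)
{
    assert(depth != 0 && (depth & (depth - 1)) == 0);  // power of two only
    const std::uint32_t mask = 2 * depth - 1;
    return (fromGray(wrGray) - fromGray(rdGray)) & mask;
}

inline bool isEmpty(std::uint32_t wrGray, std::uint32_t rdGray)
{
    return wrGray == rdGray;
}

inline bool isFull(std::uint32_t wrGray, std::uint32_t rdGray,
                   std::uint32_t depth)
{
    return fillLevel(wrGray, rdGray, depth) == depth;
}
```

In hardware the pointer received from the other clock domain is slightly stale, so the empty and full flags err on the safe side; the model above ignores that synchronizer delay.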

Good to hear you did not experience any dropouts, especially with an 8K FIFO (only 2.3msec buffer). The dropout problem is not CPU, it is a bus/IO issue. The best way to avoid this problem is to have the PC software feed the ASIO driver at a sample rate higher than the DAC sample rate, so the FIFO is always being filled. However, my understanding is that the PC software sends the data at the native sample rate, so we only have the buffer latency of 2.3msec as the margin of error. I am unsure about this; my logic tells me this is how the PC end works (send at sample rate), yet I would expect you to see occasional buffer underruns with such a small buffer. To stress test it you could plug in a USB drive and do a data transfer while monitoring your dropout counter.

I have an alternative 'design doodle' to consider. I have been looking at the Cypress FX2LP EZ-USB chips and they look even more powerful than the FTDI ones, including High Speed 480 Mbps support. These chips have a 4K FIFO buffer set up as 1K x quad buffers (4 deep FIFO buffers, which I assume for FIFO use is the same as a 4K buffer) and they have an 8051 micro on board. So I am thinking that if we do the I2S formatting on the PC end and send out bytes with each bit representing one of the I2S channels (and sending the L/R word toggle as one of the bits), then all you have to do is place the byte in the FIFO on the parallel port output of the Cypress chip and you have instant multi-channel I2S without needing a CPLD/FPGA! Ideally you want to use a reclocker as the final stage, but this should be one of the low jitter discrete chips outside of the I2S logic anyway. The micro on the Cypress USB chip can do auxiliary functions like the dropout counter and FIFO buffer management. The Cypress chip can also provide a separate control endpoint that you could use to send control information to the onboard 8051, like sample rate information, or even I2C commands for the DAC (eg. setup & volume for the ESS chips).

This approach seems to me to be even simpler, with very cheap hardware (dev boards on eBay for $15)! Cypress has a DLL similar to the FTDI D2xx driver and examples for .NET and C++, so the PC end should be easy. What do you think?
EZ-USB FX2LP - Cypress Semiconductor
 
Hi Dean,
I used single-clock FIFO access and just used a "to pass" register into the I2S area.
So you can implement the FIFO area simply.

The PC software prepares a "to fill" function and registers it with the ASIO driver.
Then the ASIO driver calls this function periodically, with double buffering.
While sending buffer 0 it requests buffer 1; then it requests buffer 0 while sending buffer 1.
Sampling rate is single.
In my code, 2.3msec is the budget for this cycle: {FT_W32_WriteFile finished, fire the event, and call FT_W32_WriteFile again}.
If this cycle exceeds 2.3msec, it will cause an underrun. But this is a continuous loop in a dedicated thread, so 2.3msec is long enough.
Code:
	// Writer loop of the dedicated thread: send the current buffer, then fire the event.
	while (Parent->started)
	{
		const int BufferNum = Parent->toggle;	// buffer selected for this pass
		DWORD dwWritten = 0;
		// one block = kBlockFrames frames of (header + 24bit/channel * 24channels)
		const DWORD dwBufferBytes = kBlockFrames * (kheaderbytes + 3*kNumOutputs);

		// Send the buffer (modified wav) to the FTDI device through D2XX
		const BOOL ok = FT_W32_WriteFile(Parent->ftHandle, Parent->pSendBuffers[BufferNum], dwBufferBytes, &dwWritten, NULL);
		if (!ok || dwWritten != dwBufferBytes) {
			// Failed or short write: record the error and flush the FTDI queues
			Parent->ftStatus = FT_W32_GetLastError(Parent->ftHandle);
			FT_W32_PurgeComm(Parent->ftHandle, PURGE_TXABORT | PURGE_RXABORT | PURGE_TXCLEAR | PURGE_RXCLEAR);
		}
		SetEvent(Parent->hEvent);	//Fire event
	}

(1) Player software has to supply the next data, 512 samples, every 10msec.
(2) ASIO driver continuously calls WriteFile, within the 2.3msec headroom.
(3) In FT_W32_WriteFile, FTDI d2xx.sys sends the byte stream to the UM232H FIFO.
(4) FPGA reads from FTDI to IDT, if IDT is not full and FTDI is not empty.
(5) FPGA reads from IDT to I2S, if I2S requests the next frame and IDT is not empty.
(6) FPGA generates the I2S stream for each L/R cycle.

(2) is not critical; as I wrote above, it's in a continuous loop.
(3) is not critical. When WriteFile() is called it has enough bandwidth (40MB/sec) and is mostly waiting for the FIFO to be consumed and become available.
(4)(5) are not critical; there is enough bandwidth (15MB/sec) and they are almost idle.
(6) is routine work.

Only (1) is not under my control. Player software without a pre-load buffer cannot supply data within 10msec.

Cypress FX2: I used this in my previous design. https://sites.google.com/site/koonaudioprojects/usb-to-multi-channel-i2s
But I wonder whether Cypress will continue to supply a signed driver (cyusb.sys) for Windows 7?
And programming the 8051 is not fun. If you want to go with an MCU with memory (DMA IO can work like a FIFO), maybe the Atmel SAM3U would be better. (There is a thread.)

Packing L/R + SD1234567 into one byte is a great idea. It can remove a lot of VHDL.. no, maybe we don't need an FPGA at all.
1Byte = bit(01234567)
bit0 = goes to LRclock
bit1 = SD1
bit2 = SD2
:
Then 64 bytes produce one L/R cycle, for 14 channels. A multi-channel "I14S" signal comes straight from the PC.
Outside logic: from the master clock, generate the serial clock and a pull signal for each byte. This can be 74HCxx glue logic.
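
A PC-side sketch of that packing, using the bit layout above (bit0 = LRCK, bit1..bit7 = SD1..SD7). It assumes 24-bit samples left-justified in 32-bit slots, MSB first, left channel first with LRCK low, and it ignores the one-bit data delay of standard I2S. Those choices and the name packFrame are assumptions for illustration, not a tested format.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLines = 7;      // SD1..SD7, two channels (L/R) per line
constexpr int kSlotBits = 32;  // bit clocks per channel slot

// samples[line][0] = left, samples[line][1] = right, each a signed 24-bit value.
// Returns the 64 bytes that clock out one full L/R cycle for all 14 channels.
inline std::array<std::uint8_t, 64> packFrame(
    const std::array<std::array<std::int32_t, 2>, kLines>& samples)
{
    std::array<std::uint8_t, 64> out{};
    for (int half = 0; half < 2; ++half) {          // 0 = left, 1 = right
        for (int bit = 0; bit < kSlotBits; ++bit) { // MSB first
            std::uint8_t byte = static_cast<std::uint8_t>(half);  // bit0 = LRCK
            for (int line = 0; line < kLines; ++line) {
                const std::int32_t s = samples[line][half];
                assert(s >= -(1 << 23) && s < (1 << 23));
                // Left-justify the 24-bit sample in its 32-bit slot.
                const std::uint32_t word = static_cast<std::uint32_t>(s) << 8;
                const std::uint32_t b = (word >> (31 - bit)) & 1u;
                byte |= static_cast<std::uint8_t>(b << (line + 1));
            }
            out[half * kSlotBits + bit] = byte;
        }
    }
    return out;
}
```

Each output byte is then presented on the parallel port once per serial clock, so the external logic only has to latch it.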

The DSD player (on another thread) works just like this. For DSD there is no L/R, only a bit stream comes from the PC. SIMPLE!:)
 
Koon, as always thanks for the detailed explanation. This has been a good tutorial for me. I'm glad you like the idea of packing the individual I2S bit streams into a byte; this thought came to me as I was looking at the Cypress data sheet, and it will reduce the complexity of this solution a lot. The other advantage of this method is that with a little more logic around the external FF reclockers (which are important to reduce jitter), multiple sample rates can be supported by adjusting the FF clock rate.

I understand the buffering in your PC code and that is not where you will get underruns, but my question is more basic about the buffering mechanism and how to avoid buffer underruns with USB bulk transfer. If the player software is delivering the data to the ASIO driver at the normal (single) sampling rate, then there is no way for the PC side or I2S side to avoid an underrun using USB bulk mode unless there is a large latency delay to fill up the buffer on the I2S side (assuming the problem we are tackling is a delay sending USB bulk packets due to Windows housekeeping - which will happen, can be up to 100msec). But if we have a large buffer on the I2S side then for video applications this won't work as there will be lip sync issues.

Do you use a special player with a buffer? What latency do you see in your system that avoids the buffer underruns? How well will your solution work with standard software like a DVD player that can't have the player buffers tweaked?

So I'm thinking that USB interrupt mode would be a better choice than bulk mode (which could suffer from buffer underruns or high latency) or isochronous mode (which can lose packets), as interrupt mode guarantees the latency and has all the other benefits of bulk transfer (retries in case of error) and up to 1KB in a frame. This can't be done in the FTDI chip but the Cypress chip can do it.

There is a bit more work needed for a Win64 install but it can be done; see this thread for details: fpga4fun.com, "Vista and windows 7 x64 driver".
I can't locate a thread with SAM3U, do you mean the Open Source Widget thread? They use an Atmel CPU, their problem is that there is no official UAC2 drivers for Windows which is a problem for high speed operation. I looked into writing the UAC2 drivers on the Windows side and it was not pretty, and came to this thread instead. Another reason for liking the cypress solution is that there are cheap and easily accessible development boards, and the cypress site has good support for PC developers (eg. .NET SDK).
 

For the foo_asio_out issue, I changed
getSamplePosition (sampleposition, timestamp)
from returning ASE_NOTPRESENT to returning the current sample count + timestamp.
OK foobar is running as usual.
So this interface should be "must have" for ASIO drivers :)
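
A minimal sketch of the bookkeeping behind such a getSamplePosition, for anyone writing their own driver. It does not use the ASIO SDK headers: HiLo is a stand-in for the SDK's hi/lo 64-bit pairs, PositionTracker is a hypothetical helper, and the steady_clock timestamp is an assumption (a real driver may use a different time base). Locking between the writer thread and the host call is left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for a 64-bit value split into two 32-bit halves (hypothetical type).
struct HiLo {
    std::uint32_t hi;
    std::uint32_t lo;
};

inline HiLo splitHiLo(std::uint64_t v)
{
    return HiLo{static_cast<std::uint32_t>(v >> 32),
                static_cast<std::uint32_t>(v & 0xFFFFFFFFu)};
}

class PositionTracker {
public:
    // Call once per buffer switch, after a block has been handed to the device.
    void onBufferSwitch(std::uint32_t blockFrames)
    {
        assert(blockFrames > 0);
        samplePos_ += blockFrames;
        stampNs_ = static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::steady_clock::now().time_since_epoch()).count());
        valid_ = true;
    }

    // Returns false (the "not present" case) until the first buffer switch.
    bool getSamplePosition(HiLo& pos, HiLo& stamp) const
    {
        if (!valid_) {
            return false;
        }
        pos = splitHiLo(samplePos_);
        stamp = splitHiLo(stampNs_);
        return true;
    }

private:
    std::uint64_t samplePos_ = 0;  // frames sent since start
    std::uint64_t stampNs_ = 0;    // time of the last buffer switch, in ns
    bool valid_ = false;
};
```

The host can then derive elapsed time and drive its time display from the returned pair, which matches the missing time info seen in foobar.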
 
Hi dean, anyway Cypress board is cheap, I bought one now :)

Difficult to explain, but going from usec to sec:
(1) I2S area: buffer = 72 bytes, for the next 1 frame. Cycle is 22usec. Register copy.
(2) FTDI-IDT area: buffer = total 9KBytes, 118 frames, about 2.6 msec. FIFO read.
[ FTDI D2XX area: bulk transfer, using 125us microframes ]
(3) ASIO double buffer: buffer = 39KB x 2, 512 sample frames each, about 11.6msec. Memory to WriteFile() loop.
(4) Player buffer: depends on implementation. Some MB, file read loop.

From (1) to (4), the buffers and cycles get larger and larger to avoid underrun.

FTDI D2XX has the 9KB FIFO buffer in front of it, and behind it there is 39KB of data.
If there is space available in the FIFO, D2XX sends some data from the ASIO side to the FTDI device.
If nothing is left in the buffer behind, the WriteFile() routine ends, but the (3) ASIO routine is waiting for that and issues the next 39KB in its continuous while() loop by switching buffers.
So an underrun only occurs when this buffer switching cannot be processed within 2.3msec.
Not buffer "filling". Switching. Filling is done in the background.

Interrupt or bulk: if bulk mode causes errors and retries, interrupt mode will cause errors at the same ratio, and it loses data = discontinuous wave, and lip sync will be lost.
I think bulk mode is enough (for bandwidth/response). If bulk mode causes underrun trouble, interrupt mode causes comb data loss trouble.
 
Hi Koon, sorry to be persistent with the buffer problem, stay with me here. I may have missed something fundamental; your explanations are appreciated and I'm learning :)

My proposition is that there is no problem with the buffers on the PC side, or with the buffers in the I2S DAC side, but the problem is that we can get random waits on the USB line with the bulk transfer. So think of it like you get an uncontrolled 30msec delay between the USB packet sent on the PC (FT_W32_WriteFile) and it being received on the FTDI chip. On the PC end you just keep filling up the internal D2XX buffers and the code blocks on the write. No problem on the PC side, the data gets bigger waiting to be sent and will get sent eventually when the USB jam clears.

Now on the I2S side it's a different matter. We have much smaller buffers and we have a constant clock stream pulling the data out of the FIFO buffer, which only gives us 2.3msec of buffer. This means that once the FIFO buffer is full, if we don't get the buffer refreshed within 2.3msec the I2S stream will run out (underrun), and this could happen if our regular USB bulk mode packets get delayed on the USB bus, which according to the Windows & USB specifications for bulk mode can happen if the USB bus or even Windows is busy. So from an engineering standpoint we have to cater for this situation even though it may not show up when testing on a lightly loaded system. There is nothing that we can do on the PC side to help this situation; the concern is the size of the FIFO on the I2S side.
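
The concern can be written as a simple check: a stall on the USB side causes an underrun when it lasts longer than the time the I2S side needs to drain a full FIFO. The sketch below assumes a constant drain rate of 3,439,800 bytes/sec (78 bytes per frame at 44.1 kHz, where 78 is inferred from the 39KB / 512 frame figure earlier in the thread) and made-up function names.

```cpp
#include <cassert>
#include <cstdint>

// Time for the I2S side to empty a full FIFO, in microseconds (rounded down).
inline std::uint64_t drainTimeUs(std::uint64_t fifoBytes,
                                 std::uint64_t bytesPerSecond)
{
    assert(bytesPerSecond > 0);
    return fifoBytes * 1000000u / bytesPerSecond;
}

// True if a USB-side stall of stallUs outlasts a FIFO that started out full.
inline bool stallCausesUnderrun(std::uint64_t fifoBytes,
                                std::uint64_t bytesPerSecond,
                                std::uint64_t stallUs)
{
    return stallUs > drainTimeUs(fifoBytes, bytesPerSecond);
}
```

With an 8KB FIFO the drain time is about 2.38msec, so a 30msec stall would underrun while a 1msec stall would not; a 1MB FIFO would ride through the 30msec stall at the cost of up to about 300msec of latency.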

The only choices we have to cater for larger USB bulk transfer delays are to increase the size of the FIFO buffer (which adds latency, OK for audio but no good in a multi-media situation), or to change to a USB mode that offers us some guarantee of latency. This is why I think USB interrupt mode is useful: it is similar to bulk mode (including retries for error correction) and it allows us to tune the guaranteed latency time (eg. we could set it to 1msec). Interrupt mode is better than isochronous mode: isochronous gives you guaranteed latency but you can still get corrupted data, whereas interrupt mode gives you both the guaranteed latency and the error checking.

I appreciate your patience with me (and my logic!). DIYAudio is an area for learning by experimentation, teaching and shared experiences. ;)
 
Hi dean,
Some points I'm curious about:
(1) Microsoft is writing about "from Device to Host": when the device buffer is full, the device cannot send data anymore. This is an issue, of course.
(2) "This means that once the FIFO buffer is full" - the FIFO buffer is almost always full? And there is no "refresh" of the buffer, always "append".
(3) Bulk transfer delay - I don't see such a delay in my setup, with a 4-way PC crossover.
(4) Bulk mode delay when Windows is busy.. this is the same for all devices: interrupt, PCI, everything (Windows driver and IO issue). Handling priority is a player software issue.
(5) Bulk mode delay in the USB specification.. this happens when USB bus usage = bus bandwidth. Bulk mode has lower priority. But what interrupt or isochronous device occupies 480Mbps?
For this device, FTDI uses around 3.5MB/sec. When the remaining 55MB/sec is used by another device, FTDI has lower priority and delay can happen. But with USB HD video cameras and a USB HDD on the same bus?
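
The 3.5MB/sec figure can be checked with a short calculation. The sketch assumes 24 channels of 24-bit samples at 44.1 kHz plus 6 header bytes per frame (inferred from 39KB per 512 frames), and takes the raw High Speed rate of 480Mbps as 60MB/sec; real bulk throughput is lower than the raw rate.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second for the audio stream, including the per-frame header.
inline std::uint64_t streamBytesPerSecond(std::uint64_t sampleRateHz,
                                          std::uint64_t channels,
                                          std::uint64_t bytesPerSample,
                                          std::uint64_t headerBytes)
{
    return sampleRateHz * (headerBytes + channels * bytesPerSample);
}

// Share of the bus used by the stream, in parts per thousand (rounded down).
inline std::uint64_t busSharePermille(std::uint64_t streamBps,
                                      std::uint64_t busBps)
{
    assert(busBps > 0);
    return streamBps * 1000u / busBps;
}
```

Under these assumptions the stream is about 3.44MB/sec, or roughly 6% of the raw bus rate.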

I2S 1 frame buffer : filled around 17usec before, consumed just before next cycle.
IDT FIFO buffer: 8KB almost always filled, consumed by IDT-I2S logic.
FTDI FIFO buffer: 1KB almost always filled, consumed by FTDI-IDT logic.
D2XX internal buffer: repeats Write()=39KB filled, 0KB remain=Write() Finished.
ASIO internal buffer: double buffered; while 1 is being written, 1 is being filled.

An underrun only occurs when the I2S 1 frame buffer is empty. Other buffers can be empty.
An overrun does not occur. Every layer is always "pushing/waiting" at a faster speed than the next layer. Only the one I2S MCLK pulls and consumes.

In short, I think bulk mode is enough and good for audio. Windows does nothing to get in the way, and the bus speed is fast enough that we do not have to worry about it being occupied.