UAC2.0 on STM32

@EvSap: I guess this is due to the holidays and other things to be done, but since I have followed this discussion with great interest and there was heavy progress before Christmas, I wondered what your current status with the project is.
I can't stop thinking about an "MCU + USB3300" device working at HS with a polling interval of 1 microframe to get maximum data throughput. So I have replaced the stm32f722 with an stm32h743 for tests, but have not succeeded: so far the USB3300 does not work at all with the stm32h743.

May I ask whether your codebase is available somewhere on GitHub? If not, will it be an open source example?
The code is not published on GitHub yet. I am going to share it once I get a working version.
 
I have been trying to get clean sound for several days without success. The sound is distorted and I don't understand why. I am sure the device receives data without losses and sends feedback properly: Iso OUT Incomplete interrupts don't occur, the UAC2 log reports no errors, and the PC sends audio data of variable size according to the feedback. But the output sound is not clean. The last sections of code under suspicion are buffer filling and reading. I have checked them several times and, by my logic, there is nothing wrong with the code there. But my understanding of buffer filling and reading may be wrong.
@bohrok2610 could you please take a look at my buffer filling / reading code? Maybe you will see something I am missing.
C:
#define USBD_AUDIO_FREQ        192000 //16 bit
#define AUDIO_OUT_PACKET       (uint16_t)(((USBD_AUDIO_FREQ * 2U * 2U) / 1000U) / 4)
#define AUDIO_OUT_PACKET_MAX   200
#define AUDIO_OUT_PACKET_NUM   120
#define AUDIO_TOTAL_BUF_SIZE   ((uint32_t)(AUDIO_OUT_PACKET * AUDIO_OUT_PACKET_NUM))

static uint8_t USBPacket[AUDIO_OUT_PACKET_MAX];
static uint8_t AudioData[AUDIO_TOTAL_BUF_SIZE];
//haudio->buffer = AudioData;

//---------------------buffer filling
uint32_t PacketSize = USBD_LL_GetRxDataSize(pdev, epnum);

uint32_t FreeSpace = AUDIO_TOTAL_BUF_SIZE - haudio->wr_ptr;

if (FreeSpace >= PacketSize)
{
    memcpy(&haudio->buffer[haudio->wr_ptr], USBPacket, PacketSize);

    haudio->wr_ptr += PacketSize;
}
else
{
    memcpy(&haudio->buffer[haudio->wr_ptr], USBPacket, FreeSpace);

    haudio->wr_ptr = 0;

    PacketSize -= FreeSpace;

    memcpy(&haudio->buffer[haudio->wr_ptr], &USBPacket[FreeSpace], PacketSize);

    haudio->wr_ptr += PacketSize;
}

(void)USBD_LL_PrepareReceive(pdev, AUDIO_OUT_EP, USBPacket, AUDIO_OUT_PACKET_MAX);

if ((haudio->rd_enable == 0U) && (haudio->wr_ptr >= AUDIO_PLAY_THRESHOLD))
{
    haudio->rd_enable = 1U;

    SAI_Play((uint16_t *)&haudio->buffer[haudio->rd_ptr], AUDIO_OUT_PACKET, 2);
}
C:
//in DMA transfer complete interrupt
haudio->rd_ptr += AUDIO_OUT_PACKET;

if (haudio->rd_ptr >= AUDIO_TOTAL_BUF_SIZE)
{
    haudio->rd_ptr = 0U;
}

SAI_Play((uint16_t *)&haudio->buffer[haudio->rd_ptr], AUDIO_OUT_PACKET, 2);
There are two buffers in the system: 1) USBPacket receives data from the PC; 2) AudioData stores audio data and feeds the output.
Filling the AudioData buffer. The device receives audio data from the PC into the USBPacket buffer, then checks the free space between the write position and the end of the AudioData buffer.
If that space is enough for the new packet, the whole packet is copied to AudioData at the current position.
If it is not enough, the first part of the packet is copied to AudioData from the current position up to the end of the buffer,
and the rest of the packet is copied to the start of AudioData (position 0).
When half of the AudioData buffer is filled, SAI starts outputting audio data.
Reading the AudioData buffer. SAI DMA always reads the same amount of data per transfer, equal to the nominal audio packet size AUDIO_OUT_PACKET. In the DMA transfer complete interrupt routine, the read position is updated and a new transfer starts. When the read position reaches the end of the AudioData buffer, reading restarts from the beginning, that is, read position = 0. The AudioData buffer size is AUDIO_OUT_PACKET * 120.
 
Your buffer filling process seems to be ok. What does SAI_Play do?

As the buffer filling and reading processes are asynchronous, buffer over/underruns may occur unless feedback is working properly. Do you check for buffer over/underruns? E.g. at buffer reading, check whether there is enough data to read (wr_ptr - rd_ptr >= AUDIO_OUT_PACKET)?
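A minimal sketch of such a check, assuming a single ring buffer addressed by byte offsets as in the code above. The helper names are hypothetical. Note that a plain `wr_ptr - rd_ptr` goes wrong once the write pointer has rolled over and sits below the read pointer, so the difference is taken modulo the buffer size. Equal pointers are treated as an empty buffer here; telling "full" from "empty" would need an extra flag or byte counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes written but not yet read in a ring buffer
 * of buf_size bytes. wr and rd are byte offsets into the buffer. */
uint32_t ring_fill_level(uint32_t wr, uint32_t rd, uint32_t buf_size)
{
    assert(buf_size > 0U);
    /* Reduce first so that a transient wr == buf_size counts as offset 0. */
    wr %= buf_size;
    rd %= buf_size;
    return (wr >= rd) ? (wr - rd) : (buf_size - rd + wr);
}

/* Returns 1 when at least packet_size bytes are available to read. */
int ring_can_read(uint32_t wr, uint32_t rd, uint32_t buf_size,
                  uint32_t packet_size)
{
    return ring_fill_level(wr, rd, buf_size) >= packet_size;
}
```

In the DMA transfer complete interrupt this could gate the next `SAI_Play` call, for example by playing silence or counting an underrun when `ring_can_read(...)` returns 0.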
 
What does SAI_Play do?
It initiates SAI transfer by means of DMA.
C:
SAI_Play((uint16_t *)&haudio->buffer[haudio->rd_ptr], AUDIO_OUT_PACKET, 2);
//&haudio->buffer[haudio->rd_ptr] - pointer to the data to be played
//AUDIO_OUT_PACKET - number of bytes to play
//2 - audio sample size in bytes, that is, 16 bit
void SAI_Play(uint16_t *Data, uint16_t Size, uint8_t ResByte)
{
  if (ResByte == 3)
    ResByte = 4;

  uint16_t TxSize = Size / ResByte;
 
  DMAStream->NDTR = TxSize;
  DMAStream->M0AR = (uint32_t)(&Data[0]);
  DMAStream->CR |= DMA_SxCR_EN;
  SAIBlock->CR1 |= SAI_xCR1_DMAEN;
}
Do you check for buffer over/underruns? E.g. at buffer reading check if there is enough data to read (wr_ptr - rd_ptr >= AUDIO_OUT_PACKET)?
No, I don't. Good idea. Buffer reading starts after half of the buffer has been filled, so wr_ptr should always be ahead of rd_ptr if feedback is working properly. Maybe the feedback implementation is the reason. I am going to check it. Thanks for the help.
 
As the buffer filling and reading processes are asynchronous buffer over/underruns may occur unless feedback is working properly.
It seems this is the case. Reading is much faster than writing. The feedback value is estimated by a timer that counts SAI BCLK pulses within each SOF period. The nominal BCLK pulse count is 768 per 125 us (192 kHz / 16 bit), and I get values close to this: 768 +/- 6. The feedback value is accumulated over 8 SOF periods (feedback endpoint bInterval = 4), then fitted to the range [0x002F0000, 0x00310000] (as Windows requires for data endpoint bInterval = 2) and sent to the PC.
C:
#define AUDIO_BCLK_NOMINAL   ((uint32_t)(USBD_AUDIO_FREQ * 2 * 16 / 1000) / 8)

static uint32_t BclkValue = AUDIO_BCLK_NOMINAL;
static int32_t FeedbackValue = 0;
static uint8_t SOFCnt = 0;
      
BclkValue = FB_TIMER->CNT;
FB_TIMER->CNT = 0;

FeedbackValue += BclkValue - AUDIO_BCLK_NOMINAL;

SOFCnt++;
if (SOFCnt == 8)
{
    FeedbackValue += AUDIO_BCLK_NOMINAL;
    FeedbackValue <<= 12;

    AudioFB[0] = FeedbackValue;
    AudioFB[1] = FeedbackValue >> 8;
    AudioFB[2] = FeedbackValue >> 16;
    AudioFB[3] = FeedbackValue >> 24;       

    USBD_LL_Transmit(pdev, EPNUM_AUDIO_FB, AudioFB, AUDIO_FEEDBACK_EP_PACKET_SIZE);
    FeedbackValue = 0;
    SOFCnt = 0;
}
I don't understand what the error is. On the one hand, the estimated feedback value is very close to the nominal value, so packets of nominal size are requested almost always. On the other hand, I can see in the debugger that reading is faster than writing, which would require larger packets more often. At the moment I have no explanation for this contradiction.
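As a cross-check of the arithmetic, here is a hedged sketch that converts a BCLK pulse count into a 16.16 fixed-point "samples per packet" value, following the convention used in this thread (48 samples per packet, i.e. 0x00300000, for data endpoint bInterval = 2). The function names and parameters are illustrative assumptions, not the code used in the project. With 32 BCLK pulses per stereo 16-bit sample, 6144 pulses over 8 microframes give exactly 0x00300000, and one extra pulse over that window moves the result by 0x200. For comparison, the snippet above shifts the summed deviation left by 12, which weights each extra pulse by 0x1000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert BCLK pulses counted over `microframes`
 * SOF periods into samples per packet in 16.16 fixed point.
 * bclk_per_sample: BCLK pulses per sample frame (2 ch x 16 bit = 32).
 * microframes_per_packet: 2 for data endpoint bInterval = 2. */
uint32_t feedback_q16(uint32_t bclk_total, uint32_t microframes,
                      uint32_t bclk_per_sample,
                      uint32_t microframes_per_packet)
{
    assert(microframes > 0U && bclk_per_sample > 0U);
    uint64_t num = ((uint64_t)bclk_total * microframes_per_packet) << 16;
    uint64_t den = (uint64_t)bclk_per_sample * microframes;
    return (uint32_t)(num / den);
}

/* Limit the feedback value to the range the host accepts. */
uint32_t feedback_clamp(uint32_t fb, uint32_t lo, uint32_t hi)
{
    if (fb < lo) return lo;
    if (fb > hi) return hi;
    return fb;
}
```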
 
Reading is much faster than writing.
If SAI runs much faster than the host, the reason may be the clock configuration. Are you using PLL? It may be useful to check that the data rates of the host and SAI are the same (e.g. by counters on received USB packets and SAI DMA interrupts).
The feedback value is estimated by a timer that counts SAI BCLK pulses within each SOF period. The nominal BCLK pulse count is 768 per 125 us (192 kHz / 16 bit), and I get values close to this: 768 +/- 6. The feedback value is accumulated over 8 SOF periods (feedback endpoint bInterval = 4), then fitted to the range [0x002F0000, 0x00310000] (as Windows requires for data endpoint bInterval = 2) and sent to the PC.
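One possible way to make that comparison, sketched with hypothetical names: count bytes in the USB DataOut callback and in the SAI DMA transfer complete interrupt, then look at the relative difference after a few seconds of playback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t usb_bytes;  /* bytes received from the host */
    uint32_t sai_bytes;  /* bytes handed to SAI DMA */
} rate_counters_t;

/* Call from the USB DataOut callback with the received packet size. */
void count_usb(rate_counters_t *c, uint32_t n) { c->usb_bytes += n; }

/* Call from the SAI DMA transfer complete interrupt with the transfer size. */
void count_sai(rate_counters_t *c, uint32_t n) { c->sai_bytes += n; }

/* Relative difference in ppm; positive means SAI consumes data faster
 * than the host delivers it. */
int32_t rate_mismatch_ppm(const rate_counters_t *c)
{
    assert(c != NULL);
    if (c->usb_bytes == 0U) {
        return 0;
    }
    int64_t diff = (int64_t)c->sai_bytes - (int64_t)c->usb_bytes;
    return (int32_t)(diff * 1000000 / (int64_t)c->usb_bytes);
}
```

A difference of a few hundred ppm is the kind of offset feedback is meant to absorb; a difference of many percent points to lost packets or a wrong clock setup rather than to feedback tuning.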
This seems too coarse for feedback. E.g. XMOS uses a moving average over 128 SOF periods. Using buffer write and read pointers as basis for feedback calculation should be easier and works just fine.
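A rough sketch of that idea, assuming the fill level (write pointer minus read pointer, modulo buffer size) is sampled once per feedback interval. The gain constant and function name are hypothetical and would need tuning and probably some averaging; the nominal value and limits are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive a 16.16 feedback value from the buffer
 * fill level. A buffer emptier than the target asks the host for more
 * data, a fuller one for less.
 * gain_q16_per_byte: correction in 16.16 units per byte of fill error. */
uint32_t feedback_from_fill(uint32_t nominal_q16, uint32_t fill,
                            uint32_t target, uint32_t gain_q16_per_byte,
                            uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    int64_t error = (int64_t)target - (int64_t)fill;
    int64_t fb = (int64_t)nominal_q16 + error * (int64_t)gain_q16_per_byte;
    if (fb < (int64_t)lo) fb = (int64_t)lo;
    if (fb > (int64_t)hi) fb = (int64_t)hi;
    return (uint32_t)fb;
}
```

With the buffer exactly at its target the function returns the nominal value, so the host keeps sending nominal-size packets; the correction only kicks in as the pointers drift apart or together.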
 
If SAI runs much faster than host the reason may be the clock configuration. Are you using PLL?
Yes, I use PLL. I believe SAI is configured properly (as far as the PLL settings allow), which is confirmed by the PLL settings and by measuring the frequency of the SAI output signals.

Using buffer write and read pointers as basis for feedback calculation should be easier and works just fine.
Going to try this technique.
 
I don't know what you mean by "much faster", but async feedback is not able to correct for large deviations, as the allowed feedback range shows. The Windows 10 UAC2 driver has a stable data rate, but when streaming starts there is some variation. If the async feedback scheme is too aggressive, this variation can easily be amplified.
 
What is the PLL error rate between the actual audio clock and the generated clock?

Even a 0.3% error will result in a 1ms slip in a matter of minutes. I found that out with the UAC 1 driver the hard way.

Hell, even two buffers running off the same clock rate or even the same clock will slip over time.

I spent yesterday working on I2S coding. My use case is a little more complicated, but the first thing I did was get away from circular buffers. They will cause much more pain than they solve. Instead I created a pool of buffers and hand them out when something needs one, in good faith that it is returned when done with. This means nobody is sharing buffer halves.

For buffer slippage control it's slightly easier and slightly more tricky for me. I am downmixing multiple I2S streams, so I need to align multiple buffers from different I2S streams and clocks and reclock them out. While the mixer is waiting for all 4 streams to have a buffer available, I am running a timer to see how long the alignment delay is. At a certain point I can intervene if required. However, this approach has an inherent fail-safe. If a buffer is late, its pointer will not go to the next stage, a blank 1ms frame will result and the buffers will align next time. If a buffer is early, its pointer replaces the previous buffer and a ms of audio is missed on that stream. No buffer crashing is possible.

Also when buffers are returned for reuse they are zeroed. So if anything ends up with a "dangling" pointer it will just point to an array of silence.
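The pool idea can be sketched roughly as follows. This is an illustration under assumed sizes and names, not the poster's code, and it ignores interrupt safety: a real version would need to protect `pool_acquire` and `pool_release` against concurrent use from ISRs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BUFFERS  8U    /* hypothetical pool depth */
#define POOL_BUF_SIZE 192U  /* hypothetical buffer size in bytes */

typedef struct {
    uint8_t data[POOL_BUFFERS][POOL_BUF_SIZE];
    uint8_t in_use[POOL_BUFFERS];
} buf_pool_t;

/* Mark every buffer free and fill all of them with silence. */
void pool_init(buf_pool_t *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Hand out a free buffer, or NULL when the pool is exhausted. */
uint8_t *pool_acquire(buf_pool_t *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_BUFFERS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1U;
            return p->data[i];
        }
    }
    return NULL;
}

/* Return a buffer; it is zeroed so a stale pointer only sees silence. */
void pool_release(buf_pool_t *p, uint8_t *buf)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_BUFFERS; i++) {
        if (buf == p->data[i]) {
            memset(p->data[i], 0, POOL_BUF_SIZE);
            p->in_use[i] = 0U;
            return;
        }
    }
}
```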
 
I don't know what you mean by "much faster"
My thoughts about it. The total buffer size is 120 packets, 120 * 192 = 23040. Reading starts when the buffer is half full, that is, when WrtPtr is >= 23040 / 2 = 11520. It appears that while the full buffer is read, that is, while RdPtr moves from 0 to 23040, WrtPtr moves from 11520 (or close to it), rolls over and reaches a value near 6000. In an ideal system WrtPtr should come back to 11520, covering the full buffer size just as RdPtr does. So RdPtr catches up with WrtPtr in the next cycle. Yet the estimated feedback value is very close to the nominal one (+/- 0.3), so the PC seldom sends larger packets.
 
My thoughts about it. The total buffer size is 120 packets, 120 * 192 = 23040. Reading starts when the buffer is half full, that is, when WrtPtr is >= 23040 / 2 = 11520. It appears that while the full buffer is read, that is, while RdPtr moves from 0 to 23040, WrtPtr moves from 11520 (or close to it), rolls over and reaches a value near 6000. In an ideal system WrtPtr should come back to 11520, covering the full buffer size just as RdPtr does. So RdPtr catches up with WrtPtr in the next cycle. Yet the estimated feedback value is very close to the nominal one (+/- 0.3), so the PC seldom sends larger packets.
That is far too much deviation for feedback. The data rate discrepancy (23040 to 17000) is very strange. If it was 2x then non-matching sample rates would be the explanation.
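Putting the quoted figures into numbers (a small sketch with hypothetical helpers): while the reader covered all 23040 bytes, the writer went from 11520 through one rollover to about 6000, i.e. it advanced about 17520 bytes, or roughly 76% of what was consumed.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes a ring-buffer pointer advanced while moving from `start` to
 * `end` with `wraps` rollovers in between. */
uint32_t ring_advance(uint32_t start, uint32_t end, uint32_t buf_size,
                      uint32_t wraps)
{
    assert(wraps * buf_size + end >= start);
    return wraps * buf_size + end - start;
}

/* Integer percentage of `part` relative to `whole`, rounded down. */
uint32_t percent_of(uint32_t part, uint32_t whole)
{
    assert(whole > 0U);
    return (uint32_t)((uint64_t)part * 100U / whole);
}
```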

BTW why are you using such a large buffer?
 
My thoughts about it. The total buffer size is 120 packets, 120 * 192 = 23040.
Wow. That's 120ms depending on bit rate.

I'm working with 192 byte buffers. 96 samples. 1ms. Although total end to end might be more like 3ms. That may need to increase, but maybe doubling it, not 120 fold!

By comparison something like a USB guitar box has a latency of 6ms and that's a cheap one. It's very difficult to play if your amp is 20ms behind your playing.


So as that is the drift per second and it's most likely in a consistent direction, you should be able to calculate how long it takes your buffers to slip by a particular amount.
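That calculation is simple enough to sketch, assuming the rate error is constant. The function name is hypothetical. For scale: at 100 ppm a 1 ms slip builds up in 10 s, at 0.3% (3000 ppm) in about a third of a second, and a 1 ms slip every few minutes corresponds to an error of only a few ppm.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds until two clocks that differ by error_ppm
 * (parts per million) have drifted apart by slip_us microseconds. */
uint64_t time_to_slip_us(uint32_t slip_us, uint32_t error_ppm)
{
    assert(error_ppm > 0U);
    return (uint64_t)slip_us * 1000000ULL / error_ppm;
}
```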

I found that using a 24.576 MHz clock injected into the micro gave me a theoretical 0% offset. It reduced the USB buffer slippage from once every few minutes to once every hour or so.
 
Wow. That's 120ms depending on bit rate.
Actually not as this is HS so one packet every 125us.
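For reference, the duration of a buffer follows directly from its size and the stream format. The sketch below (hypothetical helper name) uses the figures from the defines posted earlier: 192 kHz, 16-bit stereo (4 bytes per sample frame), 192-byte packets and a 120-packet buffer. Under those assumptions the buffer holds 30 ms of audio and each packet 250 us.

```c
#include <assert.h>
#include <stdint.h>

/* Playback time in microseconds held by buf_bytes of PCM data.
 * bytes_per_frame: bytes per sample frame (channels x bytes per sample). */
uint32_t buffer_duration_us(uint32_t buf_bytes, uint32_t sample_rate_hz,
                            uint32_t bytes_per_frame)
{
    assert(sample_rate_hz > 0U && bytes_per_frame > 0U);
    uint64_t bytes_per_second = (uint64_t)sample_rate_hz * bytes_per_frame;
    return (uint32_t)((uint64_t)buf_bytes * 1000000ULL / bytes_per_second);
}
```

The same function gives 1 ms for a 192-byte buffer at 48 kHz, 16-bit stereo.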
I found that using a 24.576 MHz clock injected into the micro gave me a theoretical 0% offset. It reduced the USB buffer slippage from once every few minutes to once every hour or so.
With async feedback and external MCK there should be no slippage.
 
I should point out, I gave up entirely on the STM32 USB audio interface. I'm sure it works fine, but the driver code for it is just not worth wasting time on.

If I am ever going to use the STM32 UAC again, I'm going to go straight to the LL drivers or the registers. It might be faster that way than attempting to use their drivers.

I went hardware instead with Atmel/Xilinx USB I2S bridges. At £30 each, that is £60, which equals about 1.5 hours at my salary rate. Using the STM32 USB libs already wasted about 10 times that for me. The USB I2S bridge took 5 minutes to get working and I only have to deal with the master clock slew/drift.
 
If you mean the UAC1 code that Cube generates, it is not a driver but non-working sample code. Nobody has suggested to use that.

STM32F723 works very well for UAC2 up to 768k/32 synchronized play & record. At the same time it can also be used for other duties such as a DAC/ADC/encoder/display controller.
 
The data rate discrepancy (23040 to 17000) is very strange.
It appears some data packets are lost. That is the reason. My idea was: as bInterval for data endpoint is 2, parity should be changed in SOF ISR.
C:
uint32_t USBx_BASE = (uint32_t)USB_OTG_HS;
   
if ((USBx_DEVICE->DSTS & (1U << 8)) == 0U)
{
    USBx_OUTEP(1)->DOEPCTL |= USB_OTG_DOEPCTL_SODDFRM;
}
else
{
    USBx_OUTEP(1)->DOEPCTL |= USB_OTG_DOEPCTL_SD0PID_SEVNFRM;
}

ISO Out interrupts didn't occur, and I thought that was correct. But after the strange situation with the buffer pointers described above, I implemented a packet counter in the DataOut routine. The counter showed packet loss. Then I removed the above parity-changing code from the SOF ISR and ISO Out interrupts started to occur. Now the following code is added to the ISO Out interrupt ISR:

USBD_LL_FlushEP(pdev, AUDIO_OUT_EP);
USBD_LL_PrepareReceive(pdev, AUDIO_OUT_EP, USBPacket, AUDIO_OUT_PACKET_MAX);

The sound is much clearer now. But the packet counter shows that some packets are still lost, though not as often as before.
I am investigating why this is happening.