@EvSap, this was probably the issue I faced and why I decided to check the parity:
http://forum.chibios.org/viewtopic.php?t=926
I just tested sending feedback at every interval, and it no longer seems to be an issue.
@bohrok2610, I see your point. As for the USB3300, I think the reason is not the USB3300 itself. The USB3300 is asynchronous to the STM32F722, whose CPU frequency is 216 MHz, that is, only 3.6 times faster than the 60 MHz ULPI clock. So I think the STM32F722 may miss some events/flags from the USB3300, which may cause packet loss.
I tried the UAC2 code on an STM32F769I-DISCO with a USB3320. The result is much better than on the STM32F722+USB3300 but not ideal: one packet is lost approximately every 2 minutes.
After a pause I have resumed this project. I managed to get a couple of STM32F723 MCUs and made a test board. The board receives data without losses. The SAI is clocked by an external 49.152 MHz oscillator. If feedback is not used, I can hear quiet clicks in the background. If feedback is used, the sound is much worse than without it, so the feedback implementation is wrong. I have spent several days trying to correct the feedback value calculation and have not succeeded yet; I just don't understand what is wrong.
The feedback value estimation is based on the gap between the Read and Write pointers. Reading starts when the Write pointer reaches half of the buffer size. The idea is that the gap should always stay around half of the buffer size +/- one packet size. So if, with Write pointer > Read pointer, the Read pointer gets closer to the Write pointer than half the buffer size, the feedback value is incremented to increase the data packet size. And if, with Read pointer > Write pointer, the Write pointer gets closer to the Read pointer than half the buffer size, the feedback value is decremented to decrease the data packet size. But it does not work correctly.
@bohrok2610, could you please take a look at the feedback value estimation code?
C:
```c
/* stream: 192 kHz / 32 bits
   BufferSize = 192 (bytes per nominal packet) * 80 (packets)
   AudioBuffer.HighPtrsGap = BufferSize / 2 + 192
   AudioBuffer.LowPtrsGap  = BufferSize / 2 - 192
   feedback = 0x18 +/- 1
*/
int8_t FBCorrection = 0;
if (AudioBuffer.rd_enable == 1)
{
    uint16_t PtrsGap;
    if (AudioBuffer.wr_ptr > AudioBuffer.rd_ptr)
    {
        // gap = amount of data waiting to be played
        PtrsGap = AudioBuffer.wr_ptr - AudioBuffer.rd_ptr;
        if (PtrsGap > AudioBuffer.HighPtrsGap)
            FBCorrection = -1;
        else if (PtrsGap < AudioBuffer.LowPtrsGap)
            FBCorrection = 1;
    }
    else // if (AudioBuffer.rd_ptr > AudioBuffer.wr_ptr)
    {
        // gap = free space between the write and read pointers
        PtrsGap = AudioBuffer.rd_ptr - AudioBuffer.wr_ptr;
        if (PtrsGap > AudioBuffer.HighPtrsGap)
            FBCorrection = 1;
        else if (PtrsGap < AudioBuffer.LowPtrsGap)
            FBCorrection = -1;
    }
}
// 16.16 value, little-endian: only the integer part (byte 2) changes, by a whole frame
AudioFB.Data[0] = 0;
AudioFB.Data[1] = 0;
AudioFB.Data[2] = AudioFB.Nominal + FBCorrection;
AudioFB.Data[3] = 0;
```
Your feedback implementation is a sort of bang-bang control; a proportional control should work better. At high speed the feedback value uses the 16.16 format, so instead of incrementing/decrementing the integer part of the feedback you should slowly increase/decrease the fractional part. See section 5.12.4.2 of the USB 2.0 specification.
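A minimal sketch of the proportional approach, assuming 32-bit stereo (8 bytes per frame), a nominal value of 24 frames per microframe (0x180000 in 16.16) and a buffer fill level measured in bytes. The names fb_proportional and fb_pack_le and the gain parameter gain_shift are hypothetical, and the gain is a tuning knob, not a value from the specification. The correction lands in the fractional part, and the 4-byte little-endian packing matches the AudioFB.Data[0..3] layout used in this thread.

```c
#include <stdint.h>

#define FB_ONE_FRAME 65536  /* 1.0 frame in 16.16 fixed point */

/* Hypothetical proportional controller.
   fill_bytes:      bytes currently waiting in the ring buffer
   target_bytes:    desired fill (e.g. half the buffer)
   bytes_per_frame: 8 for 32-bit stereo
   nominal_fb:      nominal frames per microframe in 16.16 (e.g. 24 << 16)
   gain_shift:      larger value = gentler correction */
static uint32_t fb_proportional(int32_t fill_bytes, int32_t target_bytes,
                                int32_t bytes_per_frame, uint32_t nominal_fb,
                                unsigned gain_shift)
{
    int64_t error = (int64_t)fill_bytes - target_bytes;   /* > 0: buffer too full */
    /* fill error in frames, scaled to 16.16 and divided by 2^gain_shift */
    int64_t offset = (error * FB_ONE_FRAME) / ((int64_t)bytes_per_frame << gain_shift);
    int64_t fb = (int64_t)nominal_fb - offset;            /* too full -> ask for fewer frames */

    /* keep the request within nominal +/- 1 frame */
    if (fb < (int64_t)nominal_fb - FB_ONE_FRAME) fb = (int64_t)nominal_fb - FB_ONE_FRAME;
    if (fb > (int64_t)nominal_fb + FB_ONE_FRAME) fb = (int64_t)nominal_fb + FB_ONE_FRAME;
    return (uint32_t)fb;
}

/* Pack the 16.16 value little-endian, as sent on the feedback endpoint. */
static void fb_pack_le(uint32_t fb, uint8_t out[4])
{
    out[0] = (uint8_t)fb;
    out[1] = (uint8_t)(fb >> 8);
    out[2] = (uint8_t)(fb >> 16);
    out[3] = (uint8_t)(fb >> 24);
}
```

With gain_shift = 6, a one-frame (8-byte) fill error moves the feedback by 1024/65536 = 1/64 frame; a larger shift gives a gentler correction.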
Yes. E.g. 48 kHz at bInterval=1 (high speed, 8000 packets/s) yields only 6 audio frames per packet. A change of +/-1 frame would then correspond to a sample rate of 56 kHz / 40 kHz, a huge jump. Typically the changes are tiny, just enough to track small clock variations.
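The arithmetic can be checked with a small helper, assuming high speed with bInterval = 1, i.e. 8000 packets per second; fb_to_rate_hz is a hypothetical name.

```c
#include <stdint.h>

/* Sample rate implied by a 16.16 feedback value (frames per packet)
   at a given packet rate (8000/s for high speed, bInterval = 1). */
static double fb_to_rate_hz(uint32_t fb_16_16, double packets_per_s)
{
    return (double)fb_16_16 / 65536.0 * packets_per_s;
}
```

At this packet rate, 6 << 16 gives 48 kHz and 7 << 16 or 5 << 16 gives 56 kHz or 40 kHz, while one LSB of the 16.16 value changes the rate by only 8000/65536 ≈ 0.122 Hz, which is why corrections belong in the fractional part.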
BTW, in this respect implicit feedback is much easier: the host driver just remembers which packet sizes were received from the device and uses the same sequence of packet sizes for its transmission. The device can keep sending e.g. zeros, just timed by the DAC clock (it can send whatever was collected in the buffer between the IN packet intervals). It is not available in the MS UAC2 driver, though. It is also not implemented by the Linux audio gadget, where it would be much more complicated because processing happens in much larger chunks than on a microcontroller.
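A minimal sketch of the device side, where each IN packet carries as many frames as elapsed on the DAC clock during that interval, with the fractional remainder carried forward. For the sake of a PC simulation it assumes the actual DAC rate is known in millihertz; on hardware the per-interval count would instead come from the sample clock itself (e.g. frames transferred by the SAI since the previous IN interval). implicit_src_t and implicit_next_frames are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint64_t acc_millihz;      /* carried-over remainder, in millihertz */
    uint32_t dac_rate_millihz; /* actual DAC rate (assumed known for this simulation) */
    uint32_t packets_per_s;    /* 8000 for high speed, bInterval = 1 */
} implicit_src_t;

/* Returns the number of frames to put into the next IN packet.
   The host would mirror this sequence of sizes in its OUT packets. */
static uint32_t implicit_next_frames(implicit_src_t *s)
{
    const uint64_t per_frame = (uint64_t)s->packets_per_s * 1000u; /* millihertz * interval */
    s->acc_millihz += s->dac_rate_millihz;
    uint32_t frames = (uint32_t)(s->acc_millihz / per_frame);
    s->acc_millihz %= per_frame;          /* keep the fraction for the next interval */
    return frames;
}
```

For a DAC running at 47999.5 Hz, one second of packets contains 7999 packets of 6 frames and a single packet of 5 frames, i.e. 47999 frames in total, so the host automatically sends slightly less data.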
I implemented the feedback value calculation by counting BCLK pulses with TIM2 between SOF interrupts. For a 192 kHz / 32-bit stream the nominal pulse count is 1536. On the board I see values in the range of 1530 to 1536, which I then convert to 16.16 format. For example, a count of 1534 BCLK pulses, i.e. 23.96875 frames, corresponds to 0x17F800 in 16.16 format. The sound is clearer than before but not pure. Is there something else I should pay attention to? Any help would be appreciated.
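The conversion described above fits in a one-line helper, assuming 64 BCLK pulses per stereo frame (2 channels × 32-bit slots) as in this setup; bclk_count_to_fb is a hypothetical name. Dividing by 64 in 16.16 is the same as shifting the count left by 10.

```c
#include <stdint.h>

/* Convert a BCLK pulse count measured over one SOF interval into a
   16.16 feedback value. bclk_per_frame = channels * slot bits (2 * 32 = 64). */
static uint32_t bclk_count_to_fb(uint32_t bclk_count, uint32_t bclk_per_frame)
{
    return (uint32_t)(((uint64_t)bclk_count << 16) / bclk_per_frame);
}
```

For example, 1536 pulses give 0x180000 (24.0 frames), 1534 give 0x17F800 and 1530 give 0x17E800.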
If the BCLK pulse counts are constantly below nominal it probably means that your SW is running too slow so you get buffer overruns. Feedback handling may slow it down even further which is why the sound is worse with FB. You could try lower sampling rates.
Your new FB implementation should work better, but it may still be a bit abrupt. As I said earlier in this thread, XMOS uses a moving average over 128 SOF periods.
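A simple moving average over the last 128 BCLK counts could look like this sketch. It is a generic ring-buffer average, not XMOS code; fb_avg_t and fb_avg_push are hypothetical names, and it assumes the same 64 BCLK pulses per frame. Averaging over 128 SOF periods also gains 7 bits of resolution compared with a single count.

```c
#include <stdint.h>

#define FB_AVG_LEN 128u  /* window length in SOF periods */

typedef struct {
    uint16_t samples[FB_AVG_LEN]; /* last FB_AVG_LEN BCLK counts */
    uint32_t sum;                 /* running sum of the samples in the window */
    uint32_t count;               /* number of valid samples (<= FB_AVG_LEN) */
    uint32_t head;                /* index of the next slot to overwrite */
} fb_avg_t;

/* Add one BCLK count (captured at an SOF) and return the averaged 16.16 value. */
static uint32_t fb_avg_push(fb_avg_t *a, uint16_t bclk_count, uint32_t bclk_per_frame)
{
    if (a->count == FB_AVG_LEN)
        a->sum -= a->samples[a->head];  /* window full: drop the oldest sample */
    else
        a->count++;
    a->samples[a->head] = bclk_count;
    a->sum += bclk_count;
    a->head = (a->head + 1u) % FB_AVG_LEN;
    /* average frames per SOF = sum / (count * bclk_per_frame), scaled to 16.16 */
    return (uint32_t)(((uint64_t)a->sum << 16) / ((uint64_t)a->count * bclk_per_frame));
}
```

Because the window slides one SOF at a time, a step in the measured count reaches the feedback value gradually over 128 SOF periods instead of all at once.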
> If the BCLK pulse counts are constantly below nominal it probably means that your SW is running too slow so you get buffer overruns.
IIUC, the BCLK as well as the SOF timing are hardware-based and not related to the feedback value. As a result, the BCLK pulse count between SOFs will be more or less constant regardless of the feedback value.
The calculation seems correct to me. Lower BCLK => lower requested rate. The averaging is a very good point.
When checking how the feedback performs, Wireshark with USB packet capture and tshark come in handy. These are the commands I used to check whether the Linux usb-audio driver was correctly applying implicit feedback for a given device:
Code:
```shell
# IN packets (device to host, isochronous, endpoint 0x81):
tshark -r /tmp/usblog.pcap -Y 'usb.src == "2.3.1" and usb.transfer_type == 0x00 and usb.endpoint_address == 0x81' -T fields -e usb.iso.iso_len > /tmp/in_lengths.txt
# OUT packets (host to device, isochronous, endpoint 0x01):
tshark -r /tmp/usblog.pcap -Y 'usb.dst == "2.3.1" and usb.transfer_type == 0x00 and usb.endpoint_address == 0x01' -T fields -e usb.iso.iso_len > /tmp/out_lengths.txt

# Histogram of packet lengths (one URB may list several comma-separated iso lengths)
tr ',' '\n' < /tmp/in_lengths.txt | sort | uniq -c
# 684 40
# 931484 48
tr ',' '\n' < /tmp/out_lengths.txt | sort | uniq -c
# 932198 48
```
The numbers revealed that the driver was ignoring the feedback: while the device sent 684 40-byte packets and 931,484 48-byte packets, the driver made no adjustment and sent 932,198 packets, all 48 bytes long.
In the same way you can check what your USB driver is sending. Wireshark and tshark are cross-platform, and the analysis pipeline can be run in any environment (Linux, macOS, Cygwin, WSL, etc.).
Most of the USB OTG and SAI functionality in the STM32F7 (including SOF and BCLK handling) is interrupt-driven, so if any of the interrupt handlers takes too much time, the timing may start to drift.
Why "and SAI"?
The SAI can run without interruption with DMA in circular mode. It only raises interrupts periodically, at the half and at the end of the buffer.
Is it possible to debug buffer underflow and overflow somehow? For example, by setting some GPIOs and monitoring them with a scope or logic analyzer? I have no practical experience with FPGA programming techniques 🙂
DMA half/full callback handler may be too slow.
It depends on the buffer size and the interrupt priority: the DMA interrupt priority should be lower than the USB one.
What should be done in the DMA interrupts? Just copy from the USB buffer to the DMA buffer.
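A PC-runnable sketch of this "just copy" step with a circular DMA buffer, assuming 32-bit words, a 1024-word ring filled by the USB OUT handler, and 48 stereo frames per DMA half; all names here are hypothetical. On an STM32 the two handlers would be called from the SAI half-transfer and transfer-complete callbacks (HAL_SAI_TxHalfCpltCallback / HAL_SAI_TxCpltCallback). Counting the halves that had to be padded with silence also gives a cheap underrun indicator, which could just as well drive a debug GPIO.

```c
#include <stdint.h>
#include <string.h>

#define RING_WORDS     1024u  /* power of two, so indices wrap with a mask */
#define DMA_HALF_WORDS 96u    /* 48 stereo frames of 32-bit samples */

/* Ring buffer written by the USB OUT handler. On hardware, rd/wr are shared
   between interrupts and would need to be volatile (or accessed atomically). */
typedef struct {
    uint32_t data[RING_WORDS];
    uint32_t rd, wr;
} ring_t;

static uint32_t dma_buf[2 * DMA_HALF_WORDS]; /* circular DMA source for the SAI */
static uint32_t underruns;                    /* halves padded with silence */

static uint32_t ring_level(const ring_t *r) { return (r->wr - r->rd) & (RING_WORDS - 1u); }

/* Copy one DMA half from the ring; pad with zeros if not enough data. */
static void refill_half(ring_t *r, uint32_t *dst)
{
    uint32_t avail = ring_level(r);
    uint32_t n = avail < DMA_HALF_WORDS ? avail : DMA_HALF_WORDS;
    for (uint32_t i = 0; i < n; i++) {
        dst[i] = r->data[r->rd];
        r->rd = (r->rd + 1u) & (RING_WORDS - 1u);
    }
    if (n < DMA_HALF_WORDS) {
        memset(&dst[n], 0, (DMA_HALF_WORDS - n) * sizeof dst[0]);
        underruns++;
    }
}

/* Half-transfer: DMA is now reading the second half, so refill the first. */
static void on_dma_half(ring_t *r) { refill_half(r, &dma_buf[0]); }

/* Transfer-complete: DMA wrapped to the first half, so refill the second. */
static void on_dma_complete(ring_t *r) { refill_half(r, &dma_buf[DMA_HALF_WORDS]); }
```

The handlers do nothing but a bounded copy, which keeps their run time short and predictable compared with the USB interrupt.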
P.S. And avoid HAL functions; they are what may slow the process down.
> Is it possible to debug the buffer under&overflow somehow? Like setting some GPIOs and monitoring them with a scope/digital analyzer? I have no practical experience with FPGA programming techniques 🙂
There are various ways to debug or monitor. E.g. debug output (printf) is possible via SWD. Or GPIOs could be used to light up LEDs.
BTW, the STM32 is an MCU, not an FPGA.
> What to do in DMA interrupts? Just copy from USB buffer to DMA buffer.
It depends entirely on the functionality. E.g. PCM output requires splitting the buffer into DL and DR. Also, the host may use a different bit depth than the SAI (e.g. 16/24 vs. 32).
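As an illustration of such a conversion, this sketch assumes the host sends interleaved stereo 24-bit samples packed in 3 bytes (little-endian), that the SAI expects left-justified 32-bit words, and that "DL and DR" means separate left and right data buffers; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpack interleaved stereo 24-bit little-endian samples (6 bytes per frame)
   into separate left/right arrays of left-justified 32-bit words
   (sample in bits 31..8, low byte zero). */
static void unpack_s24le_to_lr32(const uint8_t *src, size_t frames,
                                 uint32_t *left, uint32_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        const uint8_t *p = src + i * 6u;
        left[i]  = ((uint32_t)p[0] << 8) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 24);
        right[i] = ((uint32_t)p[3] << 8) | ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 24);
    }
}
```

Left-justifying keeps the sign bit in bit 31, so the same 32-bit SAI slot format works whether the host sends 16-, 24- or 32-bit samples; only the unpacking loop changes.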
> P.S. And not to use HAL functions - this is what may slow the process.
There is no need to avoid the USB OTG or SAI HAL, as it is just a thin layer.
I mean: don't use the HAL DMA and SAI functions inside interrupts and other time-critical code.
I HAD trouble with this.
HAL is designed as a "universal library" for all kinds of use cases.
A small example, setting a GPIO:
Code:
```c
void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState)
{
    /* Check the parameters */
    assert_param(IS_GPIO_PIN(GPIO_Pin));
    assert_param(IS_GPIO_PIN_ACTION(PinState));

    if (PinState != GPIO_PIN_RESET)
    {
        GPIOx->BSRR = (uint32_t)GPIO_Pin;
    }
    else
    {
        GPIOx->BRR = (uint32_t)GPIO_Pin;
    }
}
```
But if you know exactly which state you need to put a pin in, why do you need this "if"?
You can write GPIO->BSRR=GPIOPin; or GPIO->BRR=GPIOPin; directly, and it will run about twice as fast.
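On the STM32F7 the GPIO block has no separate BRR register (some other STM32 families have one), so a pin is reset by writing it to the upper half of BSRR. This sketch uses a one-field stand-in struct instead of the real GPIO_TypeDef so it can run on a PC; the struct and macro names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the CMSIS GPIO_TypeDef, for illustration only.
   On the target, use the device header (e.g. stm32f7xx.h) instead. */
typedef struct { volatile uint32_t BSRR; } gpio_mock_t;

/* BSRR: writing 1 to bit n (0..15) sets pin n, writing 1 to bit n + 16 resets it.
   Zero bits have no effect, so no read-modify-write and no branch are needed. */
#define PIN_SET(port, pin)   ((port)->BSRR = (uint32_t)(pin))
#define PIN_RESET(port, pin) ((port)->BSRR = (uint32_t)(pin) << 16)
```

Each macro compiles to a single store, which is also handy for toggling a debug pin around an interrupt handler to measure its duration on a scope.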
Another example: HAL_DMA_IRQHandler(DMA_HandleTypeDef *hdma).
You can see how many things are checked in this function, to cover every case! But usually many of them are not related to a particular mode of operation and can be skipped.
In a few projects the standard function was too slow, so I wrote my own My_HAL_DMA_IRQHandler(DMA_HandleTypeDef *hdma), which works much faster.
> Increase the optimization level to -O2 or -O3 if you haven't done so already.
I have tried different optimization levels; the situation has not changed.
The best result I have is with the following feedback calculation, but background clicks are still present and the sound is not pure. Using the debugger, I can see that the Read pointer reaches the Write pointer after 10,200 to 10,700 SAI transfers.
C:
```c
// AudioBuffer.Border = 192 (bytes) * 160 (packets)
// AudioBuffer.NominalPtrsGap = AudioBuffer.Border / 2 = 192 (bytes) * 80 (packets)
int32_t PtrsGap = AudioBuffer.wr_ptr - AudioBuffer.rd_ptr;
if (AudioBuffer.rd_ptr > AudioBuffer.wr_ptr)
    PtrsGap += AudioBuffer.Border;
if (PtrsGap > AudioBuffer.NominalPtrsGap)
    PtrsGap -= AudioBuffer.NominalPtrsGap;
else
    PtrsGap = AudioBuffer.NominalPtrsGap - PtrsGap;
// NOTE: from here on PtrsGap holds |gap - NominalPtrsGap|; the sign is lost
uint32_t PtrsGapFrac = (PtrsGap * 65536) / AudioBuffer.NominalPtrsGap;
uint32_t FBValue = AudioFB.Nominal << 16;
// NOTE: this compares the magnitude, so it is never true and FBValue is always increased
if (PtrsGap > AudioBuffer.NominalPtrsGap)
    FBValue -= PtrsGapFrac;
else
    FBValue += PtrsGapFrac;
// limit to nominal +/- 1 frame (0x18 +/- 1 in the integer part)
if (FBValue < 0x00170000)
    FBValue = 0x00170000;
else if (FBValue > 0x00190000)
    FBValue = 0x00190000;
// 16.16 value, little-endian
AudioFB.Data[0] = FBValue;
AudioFB.Data[1] = FBValue >> 8;
AudioFB.Data[2] = FBValue >> 16;
AudioFB.Data[3] = FBValue >> 24;
```
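In the code above, PtrsGap is overwritten with the magnitude of the deviation before the second if (PtrsGap > AudioBuffer.NominalPtrsGap). With the gap always in [0, Border), that test can never be true, so FBValue is always increased, even when the buffer is too full. A sketch that keeps the signed deviation, with the same scaling and clamping as the original (one frame for a deviation of NominalPtrsGap bytes, limits of nominal +/- 1 frame); fb_from_ptrs and its parameters are hypothetical names.

```c
#include <stdint.h>

/* wr, rd:      byte offsets into a ring of `border` bytes
   nominal_gap: desired fill, border / 2 in the original
   nominal_fb:  nominal frames per microframe in 16.16 (e.g. 24 << 16) */
static uint32_t fb_from_ptrs(uint32_t wr, uint32_t rd, uint32_t border,
                             uint32_t nominal_gap, uint32_t nominal_fb)
{
    int32_t gap = (int32_t)wr - (int32_t)rd;
    if (gap < 0)
        gap += (int32_t)border;                 /* unwrap: bytes waiting to be played */

    int32_t dev = gap - (int32_t)nominal_gap;   /* > 0: buffer fuller than nominal */

    /* deviation of nominal_gap bytes maps to one frame (65536); int64 avoids overflow */
    int32_t frac = (int32_t)(((int64_t)dev * 65536) / (int32_t)nominal_gap);

    int64_t fb = (int64_t)nominal_fb - frac;    /* fuller -> request fewer frames */
    if (fb < (int64_t)nominal_fb - 65536) fb = (int64_t)nominal_fb - 65536;
    if (fb > (int64_t)nominal_fb + 65536) fb = (int64_t)nominal_fb + 65536;
    return (uint32_t)fb;
}
```

With Border = 30720 and NominalPtrsGap = 15360, a buffer that is a quarter-buffer (7680 bytes) too full now yields 0x178000 and one that is a quarter-buffer too empty yields 0x188000, so the correction always pushes the fill back toward nominal.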