April 12, 2025

Black Pill boards - F411 USB -- DMA

First of all, sorry. You won't be able to use DMA with an actual F411 chip. Let me explain.

There are two variations of the USB controller we are looking at, the FS and the HS. Only the HS variant supports DMA. The F411 has only one USB controller and it is FS. You can verify this by looking at offset 0x14 in the endpoint registers. In the FS controller that offset is unused; in the HS controller it holds the DMA address register.
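To make the offset check concrete, here is a minimal sketch of the per-endpoint IN register block. The struct name is made up for this page, and the layout is written from my reading of the register map, so treat it as an assumption to check against the reference manual. The point is only that DIEPDMA lands at 0x14.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-endpoint IN register block of the OTG core (one block per endpoint).
 * The struct name is hypothetical; the layout is an assumption to verify
 * against the reference manual for the chip in use. */
struct in_ep_regs {
    volatile uint32_t DIEPCTL;    /* 0x00 endpoint control */
    uint32_t          reserved04; /* 0x04 */
    volatile uint32_t DIEPINT;    /* 0x08 endpoint interrupt status */
    uint32_t          reserved0c; /* 0x0C */
    volatile uint32_t DIEPTSIZ;   /* 0x10 transfer size */
    volatile uint32_t DIEPDMA;    /* 0x14 DMA address (HS core only) */
    volatile uint32_t DTXFSTS;    /* 0x18 Tx FIFO status */
    uint32_t          reserved1c; /* 0x1C */
};

/* Fail the build if the layout drifts from the offsets described above. */
_Static_assert(offsetof(struct in_ep_regs, DIEPDMA) == 0x14, "DIEPDMA offset");
_Static_assert(sizeof(struct in_ep_regs) == 0x20, "block size");
```

On the FS core the word at 0x14 is simply reserved, which is why DMA cannot be enabled there.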

Lately, I have been doing almost all of my work with my F429 board. It has two USB controllers: one is FS just like the F411, the other is HS. Only the HS controller is connected, which is a nice choice if you are only going to provide a connector for one. There are two important things to point out. The first is that the FS and HS controllers are almost entirely identical. The HS has extra endpoints (but you can ignore that). It has more FIFO RAM (but you can ignore that too). It also has DMA (but you don't need to use it). The second thing to note is that the HS controller works only in FS mode on this board. An external HS PHY chip would be required to actually use it in HS mode and none is provided, so we use the internal FS PHY. But we still refer to it as the HS controller.

Let's just try it

I edit library/usb_conf.h and uncomment #define HS_INTERNAL_DMA_ENABLED. Indeed, this is only available when HS_CORE is defined. Then we rebuild the software and get this:
vcp/cdc.c: In function 'CLASS_Setup':
vcp/cdc.c:320:16: error: 'usbd_cdc_Desc' undeclared
This is an unexpected problem to have to sort out. Why would a special descriptor be needed only when DMA is enabled? The code references usbd_cdc_Desc, but it is not declared anywhere, so this could never have compiled with DMA enabled.

I replace this reference with a call to "panic()" which allows me to build. The software will "panic" if the code that needs this ever gets called.

This is a request for a CDC_DESCRIPTOR_TYPE, which apparently never happens. (However, maybe some host systems would request this -- like Windows).

It just works. The device enumerates and my test of sending 512 bytes works fine.

How does this work?

In cdc.c a check ensures that APP_Tx_Buffer is not in CCRAM. I don't have CCRAM in any of the devices I am working with, but it is "core coupled RAM" that generally enhances performance, but cannot be used with DMA.
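A check like that can be sketched as a simple address range test. This is not the code in cdc.c. The base address and size below are my assumption (64 KB at 0x10000000, as on the larger F4 parts) and the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: CCM data RAM is 64 KB at 0x10000000, as on the larger F4
 * parts. Check the reference manual for the chip actually in use. */
#define CCM_BASE 0x10000000u
#define CCM_SIZE (64u * 1024u)

/* Return true when a buffer starts inside the CCM region, where the
 * USB DMA engine cannot reach it. */
static bool addr_in_ccm(uintptr_t addr)
{
    return addr >= CCM_BASE && addr < (CCM_BASE + CCM_SIZE);
}
```

A buffer in ordinary SRAM (addresses like 0x20000094) passes the test and can be handed to DMA.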

Then in driver.c we see:

#ifdef HS_INTERNAL_DMA_ENABLED
    pdev->cfg.dma_enable       = 1;
#endif
I like this because it makes for cleaner code than having ifdef statements everywhere.

Now WritePacket() becomes a no-op. Before, it would have copied the packet being written into the FIFO. This is called from EPStartXfer() which puts the transfer address into the appropriate register as follows:

WRITE_REG32(&pdev->hw->INEP_REGS[ep->num]->DIEPDMA, ep->dma_addr);
WRITE_REG32(&pdev->hw->OUTEP_REGS[ep->num]->DOEPDMA, ep->dma_addr);
This routine handles both IN and OUT endpoints. It also calls WritePacket(), which is now a no-op.
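The idea of one runtime flag choosing between the DMA path and the FIFO copy can be sketched on a PC with fake hardware. Everything here (fake_dev, fake_ep_hw, start_xfer, write_packet) is a hypothetical stand-in, much simplified compared to the driver. Only the dma_enable flag mirrors the real code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the hardware: one DMA address register
 * and a small FIFO, so the sketch can run on a PC. */
struct fake_ep_hw {
    uint32_t dma_reg;
    uint8_t  fifo[64];
    uint32_t fifo_used;
};

struct fake_dev {
    int dma_enable;          /* mirrors pdev->cfg.dma_enable */
    struct fake_ep_hw hw;
};

/* Copy into the FIFO only when DMA is off; with DMA on this does nothing. */
static void write_packet(struct fake_dev *dev, const uint8_t *buf, uint32_t len)
{
    if (dev->dma_enable)
        return;
    if (len > sizeof dev->hw.fifo)
        len = sizeof dev->hw.fifo;
    memcpy(dev->hw.fifo, buf, len);
    dev->hw.fifo_used = len;
}

/* Start a transfer: hand the buffer address to the DMA register when DMA
 * is enabled, then call write_packet(), which is a no-op in that case. */
static void start_xfer(struct fake_dev *dev, const uint8_t *buf, uint32_t len)
{
    if (dev->dma_enable)
        dev->hw.dma_reg = (uint32_t)(uintptr_t)buf;
    write_packet(dev, buf, len);
}
```

The callers never need an ifdef. They call start_xfer() either way and the flag decides what actually happens.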

Where does "dma_addr" get set, you may be asking? In one case it is set in EP_PrepareRx() right before the call to EPStartXfer(). The other case is in EP_Tx(), which is pretty much the same. These are in usb_dcd.c.

Straight thinking about USB endpoints

An endpoint is a buffer that is by and large controlled by the host. It is the job of a USB device to manage the buffer.

Off on a tangent -- DTR and 8 bit variables

Consider the following code (from vcp/vcp.c). Does declaring the function uint8_t buy us anything?
static volatile uint8_t VCP_DTRHIGH = 0;
uint8_t VCPGetDTR(void) { return VCP_DTRHIGH; }
This compiles to:
08001850 :
 8001850:   4b01        ldr r3, [pc, #4]    @ (8001858 )
 8001852:   7818        ldrb    r0, [r3, #0]
 8001854:   4770        bx  lr
 8001856:   bf00        nop
 8001858:   20000094    .word   0x20000094
I get exactly the same code when I compile:
int VCPGetDTR(void) { return VCP_DTRHIGH; }
The description of "ldrb" says that it fetches the byte, then zero extends it to a 32 bit value, so this makes good sense. If the 8 bit variable was signed, an extra instruction would be needed to sign extend, as per:
static volatile int8_t VCP_DTRHIGH = 0;
int VCPGetDTR(void) { return VCP_DTRHIGH; }

0800185c :
 800185c:   4b01        ldr r3, [pc, #4]    @ (8001864 )
 800185e:   7858        ldrb    r0, [r3, #1]
 8001860:   b240        sxtb    r0, r0
 8001862:   4770        bx  lr
 8001864:   20000094    .word   0x20000094
My conclusion is that the function return value may as well be int. The place to take care is when declaring variables that need memory allocated to them. Make them small and unsigned if you want to conserve both ram and code space.

Perform an experiment writing from USB

So far this is in progress and revealing some surprises. What I see is that when I do a short write (like 4 characters), they appear one by one as data for me to read! And they never show up on the Linux side where a read is pending. It is as though the USB is in "loopback" mode for writes from USB (to an IN endpoint). However, enumeration works fine, which involves both reading and writing. This is the case whether or not DMA is enabled.

Everything is fine when I use "picocom /dev/ttyACM1" on the Linux side. So --- this looks like a Linux problem, or actually a C programming problem. I change the C code to read only one character, and I never get any data read, but the data shows up "echoed" to USB. And maybe that is exactly what is happening. Linux is treating this like a terminal and echoing data. It may also be waiting for a full line (i.e. a linefeed).

Indeed, when I send "ABCD\n" the Linux side gets 5 bytes. The echo is seen by our USB as:

Hydra USB got: 1 00000041
Hydra USB got: 1 00000042
Hydra USB got: 1 00000043
Hydra USB got: 1 00000044
Hydra USB got: 2 0000000D
Note that the last response was 2 bytes; no doubt the second byte (not shown) was 0x0A, presumably because the tty echoes a newline as CR LF.

So if I turn off echo and set "raw" mode for the read it should all work fine.
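On the Linux side, turning off echo and line buffering is done with termios. Here is a minimal sketch using only standard POSIX calls. The function names are mine, and the exact set of flags to clear is a judgment call rather than anything taken from my test program.

```c
#include <termios.h>

/* Clear the line-discipline features that caused the surprise:
 * ECHO sends received bytes back to the device, ICANON holds data
 * until a newline arrives. */
static void make_raw_noecho(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_iflag &= ~(tcflag_t)(ICRNL | INLCR | IGNCR | IXON);
    t->c_oflag &= ~(tcflag_t)OPOST;
    t->c_cc[VMIN]  = 1;   /* read() returns as soon as 1 byte arrives */
    t->c_cc[VTIME] = 0;   /* no inter-byte timeout */
}

/* Apply the settings to an already opened tty such as /dev/ttyACM1.
 * Returns 0 on success, -1 on failure. */
static int configure_port(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) != 0)
        return -1;
    make_raw_noecho(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```

Call configure_port() right after open() and before the first read(). Programs like picocom do the equivalent on their own, which would explain why they never showed the problem.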

Follow a usb write from beginning to end

So, suppose we make a call like usb_write ( "ABCD\n", 5) to send 5 bytes. This will call class_usb_write() and then VCP_DataTx() in vcp/vcp.c. What happens here is that data is transferred to a 2048 byte "buffer" which acts like a software queue or FIFO. The buffer is APP_Tx_Buffer[] and a pair of pointers indicate whether the buffer is empty, full, or partially full.

In our case, the 5 bytes fit easily into the 2048 byte buffer, but suppose we were doing a bigger write. In this case, the buffer would get filled and then the code would spin watching the pointers, and as soon as data was removed, it would put more in. You might think that locking would be required, but apparently it is not, given the way things are coded.
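A queue of this kind can be sketched as a ring buffer with an "in" index and an "out" index. The names below are hypothetical, not the ones in vcp.c. One plausible reason no lock is needed is that each index is only ever written by one side: the writer advances "in" and the interrupt side advances "out".

```c
#include <stdint.h>

#define TX_BUF_SIZE 2048u   /* same size as APP_Tx_Buffer */

/* Hypothetical names; the driver keeps its own in/out pointers. */
struct tx_ring {
    uint8_t  data[TX_BUF_SIZE];
    volatile uint32_t in;    /* advanced only by the writer */
    volatile uint32_t out;   /* advanced only by the interrupt side */
};

/* Bytes currently queued. */
static uint32_t ring_count(const struct tx_ring *r)
{
    return (r->in + TX_BUF_SIZE - r->out) % TX_BUF_SIZE;
}

/* Queue up to len bytes; returns how many fit. One slot stays empty
 * so that in == out always means "empty". */
static uint32_t ring_put(struct tx_ring *r, const uint8_t *src, uint32_t len)
{
    uint32_t n = 0;
    while (n < len && ring_count(r) < TX_BUF_SIZE - 1) {
        r->data[r->in] = src[n++];
        r->in = (r->in + 1) % TX_BUF_SIZE;
    }
    return n;
}

/* Remove up to max bytes into dst; returns how many were taken. */
static uint32_t ring_get(struct tx_ring *r, uint8_t *dst, uint32_t max)
{
    uint32_t n = 0;
    while (n < max && r->out != r->in) {
        dst[n++] = r->data[r->out];
        r->out = (r->out + 1) % TX_BUF_SIZE;
    }
    return n;
}
```

A writer with more data than fits would call ring_put() repeatedly, spinning until the return values add up to the full length.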

What happens next is a bit mysterious. Data is in the buffer waiting to be sent, but who notices this and does something about it? I turn on debug messages to get a handle on what is happening and the first message I get after putting data into the buffer is from:

Handle_USBAsynchXfer()
This gets called from CLASS_SOF() in the same file. Aha! Indeed this is getting called at 1000 Hz in response to SOF interrupts. This checks on every call to see if data is in the buffer. If there is, it sets a "lock variable" USB_Tx_State and sets up the write for as much as is allowed of what is in the buffer, calling EP_Tx with that data. It modifies APP_Tx_ptr_out as though the data was removed from the buffer, but it has not been -- it has just been scheduled for the write. This is the kind of thing that usually leads to a race and the need for locks, but remember that this is running in interrupt code.

The routine EP_Tx is in driver/usb_dcd.c -- it calls EPStartXfer() which will either set the DMA pointer or copy the data into the FIFO buffer.

The routine CLASS_DataIn() eventually gets called. It either discovers (as in this case) that there is no more data to send, or it sets up subsequent transfers in the same way that the game was started by EPStartXfer(). This may get called multiple times if there is a lot of data in the buffer. When the buffer is finally empty this will clear the USB_Tx_State lock.

When does CLASS_DataIn() get called? It is called from CORE_DataInStage() in library/core.c. This is called from HandleInEP_ISR() in driver/interrupts.c. What has happened is that an endpoint interrupt has taken place and the "xfercompl" bit is set. This makes sense -- the previously scheduled transfer set up by either EPStartXfer() or CLASS_DataIn() has finished, and it is time to decide if there is more to be sent or if everything is all done.
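The handoff between the SOF path and the transfer-complete path can be modeled in a few lines. This is a toy model with made-up names (tx_model, on_sof, on_data_in), and the 64 byte packet size is an assumption for a full speed link. It ignores buffer wraparound and the actual hardware calls.

```c
#include <stdint.h>

#define MAX_PACKET 64u        /* ASSUMPTION: FS bulk packet size */

/* Hypothetical model of the transmit state, not the driver's own types. */
struct tx_model {
    uint32_t pending;         /* bytes still waiting in the buffer */
    int      busy;            /* plays the role of USB_Tx_State */
    uint32_t last_sent;       /* size of the packet just scheduled */
};

/* Schedule the next packet: at most MAX_PACKET bytes of what is pending. */
static void schedule_packet(struct tx_model *m)
{
    m->last_sent = m->pending > MAX_PACKET ? MAX_PACKET : m->pending;
    m->pending  -= m->last_sent;
    m->busy      = 1;
}

/* SOF path (1000 Hz): start a transfer only when idle and data is waiting. */
static void on_sof(struct tx_model *m)
{
    if (!m->busy && m->pending > 0)
        schedule_packet(m);
}

/* Transfer-complete path: continue with more data or release the flag. */
static void on_data_in(struct tx_model *m)
{
    if (m->pending > 0)
        schedule_packet(m);
    else
        m->busy = 0;
}
```

The busy flag is what keeps the SOF path from starting a second transfer while one is in flight. Only the transfer-complete path clears it, once nothing is left to send.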

Current debug messages

Here is what we see when writing 5 bytes from USB:
class USB write: 08005100, ABCD
- VCP DataTx 5 bytes: ABC
- VCP DataTx returns: 5
USB Tx asynch start: 5
USB Tx asynch send: 5
- EP 1 StartXfer 5 bytes
- TxE interrupt enabled for endpoint 1
OTG EP1 IN ISR status: 00000080
Endpoint 1, write packet 5 bytes to FIFO: ABC
OTG EP1 IN ISR status: 00000001
- TxE interrupt disabled for endpoint 1
usbd_cdc_DataIn called with 0 bytes waiting to send on endpoint 1
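
The two ISR status values in this trace can be decoded with a couple of bit masks. The bit positions below are from my reading of the DIEPINT register description, so take them as an assumption to verify. The macro and function names are made up for this sketch.

```c
#include <stdint.h>

/* DIEPINT bits seen in the trace above (positions are an assumption
 * to check against the reference manual). */
#define DIEPINT_XFRC 0x00000001u  /* transfer completed ("xfercompl") */
#define DIEPINT_TXFE 0x00000080u  /* transmit FIFO empty */

/* Return a short label for the status values printed by the ISR. */
static const char *diepint_name(uint32_t status)
{
    if (status & DIEPINT_XFRC)
        return "transfer complete";
    if (status & DIEPINT_TXFE)
        return "Tx FIFO empty";
    return "other";
}
```

Read this way, status 00000080 is the FIFO asking for data (which triggers the packet write) and status 00000001 is the completion that leads to the DataIn call.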

Bigger writes

Trying to write 512 bytes blows up. Writing 64 bytes doesn't do anything terrible, but the reader only sees the data after two writes (a total of 128 bytes). Writing 32 bytes seems OK and we don't get the "flush after two writes" behavior. Writing 80 bytes is even stranger: we NEVER see anything on the Linux side. Writing 48 bytes is just fine. Writing 63 bytes is just fine.

Note that for an HS USB device the parameter CDC_DATA_MAX_PACKET_SIZE is 512, whereas it is 64 for the FS device. I wonder if in our case it should be 64 for FS also, since our HS interface is running FS? I change 512 to 64 and now 64 byte writes work as expected! 99 byte writes work also! Even 200 byte writes work. Here is the debug trace for this:

Writing 200 bytes
class USB write: 200 2000000C, ABCDEF
- VCP DataTx 200 bytes: ABC
- VCP DataTx returns: 200
Tick 9 -- bytes: 0 -- int, sof, xof = 8531 8498 91
USB Tx asynch start: 200
USB Tx asynch send: 64
- EP 1 StartXfer 64 bytes
- TxE interrupt enabled for endpoint 1
OTG EP1 IN ISR status: 00000080
Endpoint 1, write packet 64 bytes to FIFO: ABC
- TxE interrupt disabled for endpoint 1
OTG EP1 IN ISR status: 00000001
- TxE interrupt disabled for endpoint 1
usbd_cdc_DataIn called with 136 bytes waiting to send on endpoint 1
USB Tx datain start: 136
USB Tx datain send: 64
- EP 1 StartXfer 64 bytes
- TxE interrupt enabled for endpoint 1
OTG EP1 IN ISR status: 00000080
Endpoint 1, write packet 64 bytes to FIFO: ABC
- TxE interrupt disabled for endpoint 1
OTG EP1 IN ISR status: 00000001
- TxE interrupt disabled for endpoint 1
usbd_cdc_DataIn called with 72 bytes waiting to send on endpoint 1
USB Tx datain start: 72
USB Tx datain send: 64
- EP 1 StartXfer 64 bytes
- TxE interrupt enabled for endpoint 1
OTG EP1 IN ISR status: 00000080
Endpoint 1, write packet 64 bytes to FIFO: ABC
- TxE interrupt disabled for endpoint 1
OTG EP1 IN ISR status: 00000001
- TxE interrupt disabled for endpoint 1
usbd_cdc_DataIn called with 8 bytes waiting to send on endpoint 1
USB Tx datain start: 8
USB Tx datain send: 8
- EP 1 StartXfer 8 bytes
- TxE interrupt enabled for endpoint 1
OTG EP1 IN ISR status: 00000080
Endpoint 1, write packet 8 bytes to FIFO: ABC
- TxE interrupt disabled for endpoint 1
OTG EP1 IN ISR status: 00000001
- TxE interrupt disabled for endpoint 1
usbd_cdc_DataIn called with 0 bytes waiting to send on endpoint 1
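
The lesson from the 512 versus 64 fix is that the packet size should follow the speed the bus actually runs at, not which core is in use. Here is a minimal sketch of that idea. The enum and function names are hypothetical and nothing like this exists in the driver as it stands.

```c
#include <stdint.h>

/* Hypothetical enum: the speed the bus actually runs at, which is not
 * the same thing as which core (FS or HS) is in use. */
enum bus_speed { BUS_FULL_SPEED, BUS_HIGH_SPEED };

/* Largest bulk packet for each speed. */
static uint32_t bulk_max_packet(enum bus_speed s)
{
    return s == BUS_HIGH_SPEED ? 512u : 64u;
}

/* Number of packets needed to move len bytes (a short final packet counts). */
static uint32_t packets_needed(uint32_t len, uint32_t max_packet)
{
    return (len + max_packet - 1u) / max_packet;
}
```

With a 64 byte packet size a 200 byte write takes 4 packets, which matches the 64, 64, 64, 8 sequence in the trace above.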

Conclusion

This has gotten pretty long, so I will move on to another page. I fixed a problem with long writes and the settings for the HS mode of the driver. Best of all, I have a solid understanding of how writes from the USB are handled. Next I should do the same kind of study for reads.

I keep meaning to do a study of why there is a special routine EP0StartXfer(). Why can't the general routine EPStartXfer() be used?


Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org