I am now studying the NetBSD startup code to see how they initialize the H3 ARM processor. I have timed a 1 megabyte transfer (using netcat) with NetBSD on my Orange Pi PC board and it runs in 0.17 seconds, just like it should (with Kyu the time is now 5 seconds). This proves that the H3 is capable of doing much better, and also that my specific board does not have broken hardware.
I built NetBSD for the H3 just so I would have a record of the files actually used in the build. This is all but impossible to work out just by looking at the sources themselves. Unfortunately, NetBSD does not leave the object files next to the sources in the nice way that U-Boot does (so they can serve as breadcrumbs). I captured the build log and extracted a list of filenames from it, which is proving to be invaluable.
Two files hold the main execution thread for startup:
    arch/arm/arm/armv6_start.S
    arch/arm/arm32/locore.S

I have no idea why there is arm and arm32. The 64 bit ARM stuff is all in arch/aarch64. Whatever the case, there is an interesting comment at the start of armv6_start.S:
     * At this point, this code has been loaded into SDRAM and the MMU should be off
     * with data caches disabled.
     * linux image type should be used in uboot images to ensure this is the case.

This is certainly not the case for Kyu, since U-Boot simply loads a binary file into memory and jumps into it. In fact, when I test the control registers I see that the cache enable bits are on and the MMU is enabled. One might think I could just take the U-Boot initialization and run with it, but that doesn't work.
    int cleanup_before_linux_select(int flags)
    {
        /*
         * this function is called just before we call linux
         * it prepares the processor for linux
         *
         * we turn off caches etc ...
         */
        disable_interrupts();

        if (flags & CBL_DISABLE_CACHES) {
            /*
             * turn off D-cache
             * dcache_disable() in turn flushes the d-cache and disables MMU
             */
            dcache_disable();
            v7_outer_cache_disable();

            /*
             * After D-cache is flushed and before it is disabled there may
             * be some new valid entries brought into the cache. We are
             * sure that these lines are not dirty and will not affect our
             * execution. (because unwinding the call-stack and setting a
             * bit in CP15 SCTRL is all we did during this. We have not
             * pushed anything on to the stack. Neither have we affected
             * any static data) So just invalidate the entire d-cache again
             * to avoid coherency problems for kernel
             */
            invalidate_dcache_all();

            icache_disable();
            invalidate_icache_all();
        } else {
            /*
             * Turn off I-cache and invalidate it
             */
            icache_disable();
            invalidate_icache_all();

            flush_dcache_all();
            invalidate_icache_all();
            icache_enable();
        }

        /*
         * Some CPU need more cache attention before starting the kernel.
         */
        cpu_cache_initialization();

        return 0;
    }

    int cleanup_before_linux(void)
    {
        return cleanup_before_linux_select(CBL_ALL);
    }

Worth reading might be: doc/README.arm-caches
So where is the above called?
    ./lib/efi_loader/efi_boottime.c: cleanup_before_linux();
    ./arch/arm/lib/spl.c: cleanup_before_linux();
    ./arch/arm/lib/bootm.c: cleanup_before_linux();

No matter what the comment in NetBSD says, I see no code in U-Boot that disables the MMU.
U-Boot SPL 2022.10-dirty (Jan 16 2023 - 21:31:52 -0700)
I added code to Kyu to record the values of the MMU registers the very instant we gain control in locore.S. I see:
    orig SCTLR = 00c5187d
    orig TTBR0 = 7fff4000
    orig TTBR1 = 40040059
    orig TTBCR = 80000f00
    orig DACR = 55555555

All this is interesting. The DACR gives permissions for the 16 domains (2 bits per domain). Each hex digit 5 (0101) sets two domains to 01, which means access is checked using the information in the tables. I have always used f (1111), which says to just skip checking access altogether.
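The 2-bits-per-domain layout of the DACR is easy to pick apart in code. What follows is only a sketch: the enum and the function name `dacr_domain` are made up for illustration and are not from Kyu, U-Boot, or NetBSD. The field meanings are the ones given in the ARMv7-A manual.

```c
#include <assert.h>
#include <stdint.h>

/* Meaning of each 2-bit DACR field (names here are illustrative only). */
enum dacr_access {
    DACR_NO_ACCESS = 0,  /* any access generates a domain fault */
    DACR_CLIENT    = 1,  /* access is checked against the table permissions */
    DACR_RESERVED  = 2,
    DACR_MANAGER   = 3   /* access is never checked */
};

/* Return the access field for one of the 16 domains (0..15). */
static enum dacr_access dacr_domain(uint32_t dacr, unsigned domain)
{
    assert(domain < 16);
    return (enum dacr_access)((dacr >> (2 * domain)) & 3u);
}
```

With the value 55555555 every domain decodes as client; with ffffffff every domain decodes as manager.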
The SCTLR tells us that both I and D caches are enabled and the MMU is enabled as well.
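For the record, here is how those bits can be tested. This is a minimal sketch; the macro and function names are my own for illustration. The bit positions (M = bit 0, A = bit 1, C = bit 2, Z = bit 11, I = bit 12) are from the ARMv7-A manual.

```c
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_A (1u << 1)   /* alignment check enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Each helper returns 1 if the corresponding enable bit is set, else 0. */
static inline int sctlr_mmu_on(uint32_t sctlr)      { return (sctlr & SCTLR_M) != 0; }
static inline int sctlr_align_check(uint32_t sctlr) { return (sctlr & SCTLR_A) != 0; }
static inline int sctlr_dcache_on(uint32_t sctlr)   { return (sctlr & SCTLR_C) != 0; }
static inline int sctlr_icache_on(uint32_t sctlr)   { return (sctlr & SCTLR_I) != 0; }
```

Applied to 00c5187d, the MMU, D cache, and I cache helpers all return 1, and the alignment check helper returns 0.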
The low 3 bits of TTBCR are zero and that means that only TTBR0 is being used. The big surprise is bit 31 being set (see below)
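In the ARMv7-A manual, bit 31 of TTBCR is EAE, which selects the long-descriptor (LPAE) translation table format. A small sketch of pulling the relevant fields out of the register follows; the struct and function names are hypothetical, and only the fields discussed here are decoded.

```c
#include <stdint.h>

struct ttbcr_fields {
    int eae;        /* bit 31: 1 selects the long-descriptor (LPAE) format */
    unsigned t0sz;  /* bits [2:0]: N in the short format, T0SZ in the long format */
    unsigned t1sz;  /* bits [18:16]: T1SZ, meaningful only when eae is 1 */
};

/* Split a raw TTBCR value into the fields above. */
static struct ttbcr_fields ttbcr_decode(uint32_t ttbcr)
{
    struct ttbcr_fields f;

    f.eae  = (int)((ttbcr >> 31) & 1u);
    f.t0sz = ttbcr & 7u;
    f.t1sz = (ttbcr >> 16) & 7u;
    return f;
}
```

For 80000f00 this gives eae = 1 with both size fields zero, so the long format is in use and TTBR0 covers the whole address space.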
The value in TTBR1 is not used and is a value I set, which has persisted through a reset and never was changed by U-Boot. I set TTBR1 equal to TTBR0 (why not), even though it should never be used.
The value of TTBR0 shows us that U-Boot stuck the MMU table way out near the very end of the 1G of RAM (0x1000_0000 is 256M of RAM). RAM on the H3 begins at 0x4000_0000, so 1G of it ends at 0x8000_0000.
We can use Kyu to look at the table:
    dl 0x7fff4000 32
    7fff4000 7fff0003 00000000 7fff1003 00000000
    7fff4010 7fff2003 00000000 7fff3003 00000000
    7fff4020 00000000 00000000 00000000 00000000
    Kyu, ready> dl 7fff0000 8
    7fff0000 00000441 00400000 00200441 00400000
    7fff0010 00400441 00400000 00600441 00400000
    7fff0020 00800441 00400000 00a00441 00400000
    7fff0030 00c00441 00400000 00e00441 00400000
    7fff0040 01000441 00400000 01200441 00400000
    7fff0050 01400441 00400000 01600441 00400000
    7fff0060 01800441 00400000 01a00441 00400000
    7fff0070 01c00441 00400000 01e00441 00400000
    Kyu, ready> dl 7fff1000 8
    7fff1000 40000449 00000000 40200449 00000000
    7fff1010 40400449 00000000 40600449 00000000
    7fff1020 40800449 00000000 40a00449 00000000
    7fff1030 40c00449 00000000 40e00449 00000000
    7fff1040 41000449 00000000 41200449 00000000
    7fff1050 41400449 00000000 41600449 00000000
    7fff1060 41800449 00000000 41a00449 00000000
    7fff1070 41c00449 00000000 41e00449 00000000

Holy smokes. This makes no sense at all.
A side note. Of the cores I deal with, the Cortex-A8 and A9 do not provide LPAE; the A7 does. Also LPAE is the default with arm64.
The details are there (as they should be) in the ARMv7 A/R manual. Search for "long-descriptor translation table format descriptors".
The first 2 levels look like this:

[figure: long-descriptor format, first and second level descriptors]

They say that "to be consistent with the short format" the bit AP[0] is not defined.
Setting AP[2] makes the memory "read only" (otherwise it is R/W). We see it set to 0.
Setting AP[1] allows access at any privilege level; otherwise access is privileged only. We see it set to 1.
The access flag bit will yield a fault if it is zero and the entry is read into the TLB. Software is expected to do something, then set the flag to one. Since I don't want this, setting this to 1 initially sounds like just the right thing.
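To check these bits against the dump, here is a sketch that decodes one 64-bit entry. It assumes that the entries at 7fff0000 and 7fff1000 are level 2 block descriptors covering 2 MB each (the 0x200000 stride suggests this) and that the dump shows the low 32-bit word first. The struct and function names are hypothetical; the bit positions are from the long-descriptor format in the ARMv7 A/R manual.

```c
#include <stdint.h>

struct lpae_block {
    unsigned type;       /* bits [1:0]: 1 = block, 3 = table (levels 1 and 2) */
    unsigned attr_indx;  /* bits [4:2]: index into MAIR0/MAIR1 */
    unsigned ap2_1;      /* bits [7:6]: AP[2:1] */
    unsigned sh;         /* bits [9:8]: shareability */
    unsigned af;         /* bit 10: access flag */
    unsigned xn;         /* bit 54: execute never */
    uint64_t addr;       /* bits [39:21]: output address of a 2 MB block */
};

/* Decode a long-descriptor entry given as two 32-bit words, low word first. */
static struct lpae_block lpae_decode(uint32_t lo, uint32_t hi)
{
    uint64_t d = ((uint64_t)hi << 32) | lo;
    struct lpae_block b;

    b.type      = (unsigned)(d & 3u);
    b.attr_indx = (unsigned)((d >> 2) & 7u);
    b.ap2_1     = (unsigned)((d >> 6) & 3u);
    b.sh        = (unsigned)((d >> 8) & 3u);
    b.af        = (unsigned)((d >> 10) & 1u);
    b.xn        = (unsigned)((d >> 54) & 1u);
    b.addr      = d & 0x000000FFFFE00000ull;
    return b;
}
```

For the entry 00000441 00400000 this yields a block with AP[2:1] = 01 (AP[2] clear, AP[1] set), the access flag set, and XN set. The entry 40200449 00000000 differs in having attribute index 2, XN clear, and output address 0x40200000.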
    orig SCTLR = 00c5187f
    orig ACTLR = 00000042
    orig SP = 9ef40818
    orig TTBR0 = 9fff0000
    orig TTBR1 = 00000000
    orig TTBCR = 00000000
    orig DACR = fffffffd
    Kyu (bbb), ready> di 9fff0000 8
    9fff0000 00000c12 00100c12 00200c12 00300c12
    9fff0010 00400c12 00500c12 00600c12 00700c12
    9fff0020 00800c12 00900c12 00a00c12 00b00c12
    9fff0030 00c00c12 00d00c12 00e00c12 00f00c12
    9fff0040 01000c12 01100c12 01200c12 01300c12
    9fff0050 01400c12 01500c12 01600c12 01700c12
    9fff0060 01800c12 01900c12 01a00c12 01b00c12
    9fff0070 01c00c12 01d00c12 01e00c12 01f00c12

So the I cache is enabled, the D cache also, and the MMU.
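Here TTBCR is zero, so these are 32-bit short-descriptor entries, and the 0x100000 stride says they are 1 MB sections. A sketch of decoding one of them follows. The struct and function names are hypothetical, the bit positions are from the short-descriptor section format in the ARMv7 A/R manual, and the code assumes the entry is a plain section rather than a supersection.

```c
#include <stdint.h>

struct section_entry {
    unsigned type;    /* bits [1:0]: 2 = section */
    unsigned b;       /* bit 2: bufferable */
    unsigned c;       /* bit 3: cacheable */
    unsigned xn;      /* bit 4: execute never */
    unsigned domain;  /* bits [8:5]: domain number, selects a DACR field */
    unsigned ap;      /* bits [11:10]: AP[1:0] */
    unsigned tex;     /* bits [14:12]: TEX */
    unsigned ap2;     /* bit 15: AP[2] */
    uint32_t base;    /* bits [31:20]: section base address */
};

/* Decode a first-level short-descriptor section entry. */
static struct section_entry section_decode(uint32_t d)
{
    struct section_entry s;

    s.type   = d & 3u;
    s.b      = (d >> 2) & 1u;
    s.c      = (d >> 3) & 1u;
    s.xn     = (d >> 4) & 1u;
    s.domain = (d >> 5) & 0xfu;
    s.ap     = (d >> 10) & 3u;
    s.tex    = (d >> 12) & 7u;
    s.ap2    = (d >> 15) & 1u;
    s.base   = d & 0xfff00000u;
    return s;
}
```

For 00100c12 this gives a section at 0x00100000 in domain 0, with AP[1:0] = 11, AP[2] clear, C and B both clear, and XN set.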
Kyu / tom@mmto.org