It is often important to get exactly the ARM manual for the specific processor, and there is a specific manual for the Cortex-A7 MPcore. However for details of the MMU, you go to the more general "ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition". Look for the section labeled VMSA (virtual memory system architecture) and find the subsection that describes "translation tables". This is section B3 (page 1307) and B3.3 (page 1318).
Why do we care about this at all if we are doing bare metal programming and want a static transparent address map? Because we must set up the translation tables if we want to enable the caches: the addresses that reference hardware registers must not be cached. So we set up a linear (transparent) mapping for the entire 4G address space, but only enable caching for the 1G sub-space that could contain RAM.
The first complication that arises when reading the manual is that it discusses various "extensions" that may or may not be present on your specific processor. These are:
    PFR0: 00001131
    PFR1: 00011011
These are described in sections B4.1.93 and B4.1.94 of the manual.
For PFR0 we have (right to left): ARM, Thumb2, Jazelle, ThumbEE
For PFR1, right to left: STD model, Security extensions, NO M profile, virtualization extensions, generic timer
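The right to left reading above is just pulling 4-bit fields out of the register value. Here is a minimal sketch of that in C. The helper name id_field is my own invention for illustration, not something from the manual:

```c
#include <assert.h>
#include <stdint.h>

/* Return 4-bit field number n of a 32-bit feature register,
 * counting from the right (n = 0 is bits 3:0, n = 1 is bits 7:4, ...).
 * A non-zero field means the feature is present in some form.
 */
unsigned id_field(uint32_t reg, unsigned n)
{
    assert(n < 8);   /* a 32-bit register holds eight 4-bit fields */
    return (reg >> (4 * n)) & 0xFu;
}
```

With the values read above, id_field(0x00011011, 1) is the Security extensions field and id_field(0x00011011, 3) is the virtualization extensions field. Both come out non-zero, while field 2 (M profile) is zero.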
So we have answered two of our questions, but still don't know about LPAE and Multiprocessing extensions. There are 4 "memory model feature registers", but they don't discuss LPAE.
Searching the Cortex A7 MPcore manual, I find the statement: "The Cortex-A7 MPCore processor supports the Virtualization Extensions (VE) and the Large Physical Address Extension (LPAE)". I am just going to ignore the multiprocessing extensions.
    TTBCR: 80000F00
The high bit is set, indicating that EAE (40 bit) addresses will be used. This means the TTBR0 is a 64 bit register!! This is something I have been unaware of up to now. There are special instructions that get or set such a thing from a pair of 32 bit arm registers:
    /* Set 64-bit TTBR0 (the upper 32 bits are forced to 0 here) */
    asm volatile ( "mcrr p15, 0, %0, %1, c2" : : "r"(low_32), "r"(0) : "memory");

    /* Get 64-bit TTBR0 */
    asm volatile ( "mrrc p15, 0, %0, %1, c2" : "=r" (low_32), "=r" (hi_32) );
I read this out and see:
    TTBR0: 00000000 7FFF4000
Note that if I just use mcr/mrc to access this register, I mess with the low 32 bits, which is fine as long as the upper 32 have been previously set to 0.
There is also another register (TTBR1), but it is never initialized or used. It reads as different random garbage after each power cycle of the board.
So, our first level page tables are at 0x7fff_4000 and they look like this:
    TT 0: 7FFF4000 - 00000000 7FFF0003
    TT 1: 7FFF4008 - 00000000 7FFF1003
    TT 2: 7FFF4010 - 00000000 7FFF2003
    TT 3: 7FFF4018 - 00000000 7FFF3003
Each of these 64 bit entries describes a 1G section of the address space. I was confused by these initially, expecting to find cacheability bits here, and the entries are all the same except for the addresses. But these just point to second level page tables with 2M entries (512 of them for each 1G; the addresses in the dump step by 0x200000 and the tables sit 0x1000 bytes apart). They look like this:
    TT 0: 7FFF0000 - 00400000 00000441
    TT 1: 7FFF0008 - 00400000 00200441
    TT 2: 7FFF0010 - 00400000 00400441
    TT 3: 7FFF0018 - 00400000 00600441

    TT 0: 7FFF1000 - 00000000 40000449
    TT 1: 7FFF1008 - 00000000 40200449
    TT 2: 7FFF1010 - 00000000 40400449
    TT 3: 7FFF1018 - 00000000 40600449

    TT 0: 7FFF2000 - 00400000 80000441
    TT 1: 7FFF2008 - 00400000 80200441
    TT 2: 7FFF2010 - 00400000 80400441
    TT 3: 7FFF2018 - 00400000 80600441

    TT 0: 7FFF3000 - 00400000 C0000441
    TT 1: 7FFF3008 - 00400000 C0200441
    TT 2: 7FFF3010 - 00400000 C0400441
    TT 3: 7FFF3018 - 00400000 C0600441
Here I have just dumped the first 4 of the 512 entries in each second level table.
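To make the two level lookup concrete, here is a rough sketch of how a 32 bit virtual address picks its entries. It assumes what the dumps show: 8 byte entries, a first level table at 0x7FFF4000 with one entry per 1G, and second level entries that step by 0x200000 (2M), which gives 512 entries per table. The function names are hypothetical, just for illustration:

```c
#include <stdint.h>

#define L1_TABLE_BASE 0x7FFF4000u   /* taken from the TTBR0 dump */
#define ENTRY_SIZE    8u            /* each table entry is 64 bits */

/* Bits 31:30 of the address select one of the four 1G first level entries. */
unsigned l1_index(uint32_t va)
{
    return va >> 30;
}

/* Bits 29:21 select one of the 512 2M entries within that 1G. */
unsigned l2_index(uint32_t va)
{
    return (va >> 21) & 0x1FFu;
}

/* Address of the first level entry that covers va. */
uint32_t l1_entry_addr(uint32_t va)
{
    return L1_TABLE_BASE + ENTRY_SIZE * l1_index(va);
}

/* Address of the second level entry that covers va, given the
 * base of the second level table the first level entry points to.
 */
uint32_t l2_entry_addr(uint32_t l2_table_base, uint32_t va)
{
    return l2_table_base + ENTRY_SIZE * l2_index(va);
}
```

For example, address 0x40200000 uses first level entry 1 (at 0x7FFF4008), which points to the table at 0x7FFF1000, and then second level entry 1 (at 0x7FFF1008), which matches the 40200449 entry in the dump.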
We have gotten ahead of ourselves, but have now presented the whole picture. Let's go back and look at the values in the registers and tables in detail.
First the "CR" (control register). We see: TTBCR: 80000F00. B4.1.153 in the manual describes this. We have already noticed that bit 31 (seen here as "8") enables extended addressing using 40 bits and 64 bit table entries. Why do this if we only have a 32 bit (4G) address space? It is because it allows large blocks (2M) grouped under 1G first level entries, and thus a relatively compact page table.
The 4 bits set to "F" are OOII where OO selects "outer" cacheability and II selects "inner" cacheability. I have run around in circles trying to find out what inner and outer are talking about. This is clearly a flaw in the documentation. What I take it to be is that "inner" refers to the L1 cache and "outer" refers to the L2 cache. The setting with the two bits set to one (0x3) is "Outer Write-Back no Write-Allocate Cacheable". As far as I can tell from the field descriptions, these bits apply to the memory accesses made during the translation table walk itself; the cacheability of each mapped region is set separately by the table entries.
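Picking the TTBCR value apart by hand is error prone, so here is a small sketch that does it. The field names (EAE, ORGN0, IRGN0, T0SZ) and bit positions are the ones I read from the long-descriptor layout of TTBCR in the manual; the struct and function names are my own. Only the fields discussed here are decoded:

```c
#include <stdint.h>

struct ttbcr_fields {
    unsigned eae;     /* bit 31: 1 = long-descriptor (64 bit entry) format */
    unsigned orgn0;   /* bits 11:10: outer cacheability ("OO") */
    unsigned irgn0;   /* bits 9:8: inner cacheability ("II") */
    unsigned t0sz;    /* bits 2:0: size offset of the region covered by TTBR0 */
};

/* Split a raw TTBCR value into the fields of interest. */
struct ttbcr_fields decode_ttbcr(uint32_t v)
{
    struct ttbcr_fields f;

    f.eae   = (v >> 31) & 0x1u;
    f.orgn0 = (v >> 10) & 0x3u;
    f.irgn0 = (v >> 8)  & 0x3u;
    f.t0sz  = v & 0x7u;
    return f;
}
```

Feeding it 0x80000F00 gives eae = 1, orgn0 = 3, irgn0 = 3 and t0sz = 0, which agrees with the reading above.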
The low 3 bits of the TTBCR determine how many bits in TTBR0 are used for the base address. I always just ignore this aspect of things and slap the actual address into this register, which works fine as long as all the bits beyond those used as the base are zero. There are no fancy bits in TTBR0, so it just holds the base address of the translation table:
    TTBR0: 00000000 7FFF4000
This takes us to the level 1 table, which has 4 entries like so:
    TT 0: 7FFF4000 - 00000000 7FFF0003
    TT 1: 7FFF4008 - 00000000 7FFF1003
    TT 2: 7FFF4010 - 00000000 7FFF2003
    TT 3: 7FFF4018 - 00000000 7FFF3003
The low bit is a "valid" bit -- if it is 0, any access to that 1G block of addresses is invalid and causes a fault. The next bit (bit 1) is set to 1, and that indicates that this simply points to a level 2 table.
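The checks described above can be written down as a few one-liners. This is only a sketch; the function names are hypothetical, and the address mask assumes the next level table address lives in bits 39:12 of the entry, which is how I read the long-descriptor format:

```c
#include <stdint.h>

/* Bit 0: entry is valid. If clear, accesses through it fault. */
int desc_is_valid(uint64_t d)
{
    return (int)(d & 1u);
}

/* Bits 1:0 == 11 at level 1: entry points to a level 2 table. */
int desc_is_table(uint64_t d)
{
    return (d & 3u) == 3u;
}

/* Address of the next level table (bits 39:12 of the entry). */
uint64_t desc_next_table(uint64_t d)
{
    return d & 0x000000FFFFFFF000ull;
}
```

For the first entry in the dump, desc_next_table(0x000000007FFF0003) yields 0x7FFF0000, which is where the first level 2 table was found.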
The level 2 tables (there are 4 of them) look like so:
    TT 0: 7FFF0000 - 00400000 00000441
    TT 1: 7FFF0008 - 00400000 00200441
    TT 2: 7FFF0010 - 00400000 00400441
    TT 3: 7FFF0018 - 00400000 00600441

    TT 0: 7FFF1000 - 00000000 40000449
    TT 1: 7FFF1008 - 00000000 40200449
    TT 2: 7FFF1010 - 00000000 40400449
    TT 3: 7FFF1018 - 00000000 40600449

    TT 0: 7FFF2000 - 00400000 80000441
    TT 1: 7FFF2008 - 00400000 80200441
    TT 2: 7FFF2010 - 00400000 80400441
    TT 3: 7FFF2018 - 00400000 80600441

    TT 0: 7FFF3000 - 00400000 C0000441
    TT 1: 7FFF3008 - 00400000 C0200441
    TT 2: 7FFF3010 - 00400000 C0400441
    TT 3: 7FFF3018 - 00400000 C0600441
The low 2 bits again indicate the type of entry. Here we see "01". In this format, bits 2-11 and 52-63 give attributes.
Bit 54 (which is the "4" in the upper word for everything but RAM) is "XN", which is execute never. (The neighboring bit 53 is "PXN", privileged execute never, which only forbids execution at PL1; it is not set here.) So an instruction fetch from these regions would cause a fault.
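Now, what about bits 11-2? We see 11-4 are always "44". The interesting thing is bit 3, which gets set to 1 for RAM. Bit 3 falls in the AttrIndx field (bits 4-2), which is an index into the MAIR0/MAIR1 registers, so RAM uses attribute entry 2 and everything else uses entry 0. This must indicate that this region is cacheable.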
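Here is a sketch that pulls a level 2 block entry apart, so the two flavors seen in the dump can be compared field by field. The field names and positions (AttrIndx in bits 4:2, AP[2:1] in bits 7:6, SH in bits 9:8, AF in bit 10, PXN in bit 53, XN in bit 54, output address in bits 39:21) are my reading of the long-descriptor block format, so check them against the manual before relying on them. The struct and function names are made up for this example:

```c
#include <stdint.h>

struct block_fields {
    uint64_t out_addr;   /* bits 39:21: base of the 2M block */
    unsigned attr_indx;  /* bits 4:2: index into MAIR0/MAIR1 */
    unsigned ap;         /* bits 7:6: access permissions AP[2:1] */
    unsigned sh;         /* bits 9:8: shareability */
    unsigned af;         /* bit 10: access flag */
    unsigned pxn;        /* bit 53: privileged execute never */
    unsigned xn;         /* bit 54: execute never */
};

/* Split a level 2 block entry (low bits "01") into its fields. */
struct block_fields decode_block(uint64_t d)
{
    struct block_fields f;

    f.out_addr  = d & 0x000000FFFFE00000ull;
    f.attr_indx = (unsigned)((d >> 2) & 0x7u);
    f.ap        = (unsigned)((d >> 6) & 0x3u);
    f.sh        = (unsigned)((d >> 8) & 0x3u);
    f.af        = (unsigned)((d >> 10) & 0x1u);
    f.pxn       = (unsigned)((d >> 53) & 0x1u);
    f.xn        = (unsigned)((d >> 54) & 0x1u);
    return f;
}
```

Decoding 0x0040000000000441 (non-RAM) and 0x0000000040000449 (RAM) shows that the only differences are attr_indx (0 versus 2) and xn (1 versus 0); ap, sh and af are the same in both.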
At this point I am getting lazy. That bit set to 1 in bit 3 indicates that ram should be cached, and selects a caching flavor. The ARM documentation is all but inscrutable about the translation table entries. Looking at U-Boot would be the better bet with the help of ctags (naturally) and checking which CONFIG options are actually active for the Orange Pi.
But for my purposes, just mimicking that bit without fully understanding it will do just fine. Do I really care about the execute-never bit on the other regions? It might catch some really crazy code runaways, but I doubt that it really matters.
So for me 0x441 for non-RAM and 0x449 for RAM.
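If I ever need to build my own tables instead of borrowing the ones U-Boot left behind, the following sketch reproduces the entries seen in the dumps. It is only an illustration under the assumptions above: 512 entries of 2M per second level table, 0x449 for RAM, and 0x441 plus the execute-never bit (bit 54) for everything else. All of the names are hypothetical, and nothing here touches real hardware:

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRIES   512u
#define BLOCK_SIZE   0x200000u       /* 2M per second level entry */
#define ATTR_NON_RAM 0x441u
#define ATTR_RAM     0x449u
#define XN_BIT       (1ull << 54)    /* the "4" seen in the upper word */

/* Fill one second level table with identity-mapped 2M blocks
 * covering 1G region number gb (0..3).
 */
void fill_l2_table(uint64_t table[L2_ENTRIES], unsigned gb, int is_ram)
{
    unsigned i;

    assert(gb < 4);
    for (i = 0; i < L2_ENTRIES; i++) {
        uint64_t pa = ((uint64_t)gb << 30) + (uint64_t)i * BLOCK_SIZE;

        if (is_ram)
            table[i] = pa | ATTR_RAM;
        else
            table[i] = pa | ATTR_NON_RAM | XN_BIT;
    }
}

/* First level entry pointing at a (4K aligned) second level table. */
uint64_t make_l1_entry(uint32_t l2_table_addr)
{
    assert((l2_table_addr & 0xFFFu) == 0);
    return (uint64_t)l2_table_addr | 0x3u;
}
```

Calling fill_l2_table(t, 1, 1) gives t[1] equal to 0x0000000040200449, and make_l1_entry(0x7FFF1000) gives 0x000000007FFF1003, both matching what U-Boot set up.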
What I need to know all this for is when I kick off another core. The smart thing to do would be to just tell it to use the very same translation table as core 0. The other smart thing would be to ensure that I don't stomp on the table set up by U-Boot. And that might be a little trickier if I work on a different H3 board (like the NanoPi) with less ram. The "right thing" to do would be to have core 0 read the TTBR0 and post the value someplace to be used when I set up core 1.
Another very important discovery is that TTBR0 is a 64 bit register. I have been getting away with treating it like a 32 bit register for core 0, relying on the upper 32 bits being set to zero by U-Boot. But when I start core 1, I must be sure to initialize the entire 64 bits.
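A rough sketch of how the hand-off might look, assuming core 0 posts the full 64 bit value in a shared variable and core 1 loads both halves. The variable and function names are hypothetical. The mrrc/mcrr instructions are the same ones shown earlier and are only compiled when building for 32 bit ARM; the split and join helpers are plain C:

```c
#include <stdint.h>

/* Hypothetical slot where core 0 leaves its TTBR0 value for core 1. */
volatile uint64_t shared_ttbr0;

uint32_t ttbr_lo(uint64_t v) { return (uint32_t)v; }
uint32_t ttbr_hi(uint64_t v) { return (uint32_t)(v >> 32); }

uint64_t ttbr_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__arm__)
/* Core 0: read the 64 bit TTBR0 and post it. */
void core0_post_ttbr0(void)
{
    uint32_t lo, hi;

    __asm__ volatile ("mrrc p15, 0, %0, %1, c2" : "=r" (lo), "=r" (hi));
    shared_ttbr0 = ttbr_join(lo, hi);
}

/* Core 1: load both halves, not just the low 32 bits. */
void core1_load_ttbr0(void)
{
    uint64_t v = shared_ttbr0;

    __asm__ volatile ("mcrr p15, 0, %0, %1, c2"
                      : : "r" (ttbr_lo(v)), "r" (ttbr_hi(v)) : "memory");
}
#endif
```

This only covers TTBR0. Core 1 presumably also needs the same TTBCR (and whatever attribute registers the table entries index) before the MMU is turned on, and since core 1 starts with its caches off, core 0 may have to make sure the posted value has actually reached memory. I have not verified either point.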
Tom's electronics pages / tom@mmto.org