I had the idea that maybe all the cores need to run and set the bit to enable the D cache to get the shared L2 cache to come alive. There is after all a bit to enable coherency between cores and without all the cores up and running, who can say that it isn't deadlocked waiting for some core that isn't running to handshake or something.
A conclusion up front: just spinning up all the cores has not yet made any difference. It is fun though to get multiple cores running and avoid the strange race conditions that arise.
Test 11: Multiple core test

    Kyu (opi), ready> i 11
    Kyu (opi), ready> Testing multicore startup
    BANG! BOOM!
    *T*J CT orine 1mm fua_isleetd_t ttbor s, tapargte able: 40040000
    set TTBCR to 0, read back: 00000000
    set TTBR0 to 40040059, read back: 40040059
    set TTBR1 to 40040059, read back: 40040059
    set DACR to ffffffff, read back: ffffffff
    GIC cpu_init called
    BONK! BONK! BONK! BONK! BONK! BONK! BONK! BONK! BONK! BONK!

A look at the code shows us that this calls test_core() in board/multicore.c. This routine will just be a stub in a single core system like the BBB. In the case of the Orange Pi it called a routine that I was endlessly using for experiments back in the day.
The BANG and BOOM just show core 0 sending interrupts to itself via the GIC. Then we launch core 1, which yields a mangled bunch of messages during startup while core 0 and core 1 are both accessing the uart. Core 1 then goes into a spin loop. After some delay, core 0 sends 10 interrupts to core 1, which yields the BONK messages.
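As a point of reference for the core-to-core interrupts, here is a minimal sketch of how the word written to the GIC distributor's GICD_SGIR register is composed. It assumes the interrupts are software generated interrupts (SGIs) on a GICv2-style controller; the function name sgir_value is hypothetical and is not taken from the Kyu source.

```c
#include <assert.h>
#include <stdint.h>

/* GICv2 GICD_SGIR field layout:
 *   bits [25:24] TargetListFilter (0 = deliver to the CPUs in CPUTargetList)
 *   bits [23:16] CPUTargetList, one bit per CPU interface
 *   bits [3:0]   SGIINTID, the SGI number 0..15
 *
 * sgir_value() is a hypothetical helper that builds the word that would
 * be written to GICD_SGIR to send SGI "sgi_id" to core "target_cpu".
 */
static uint32_t sgir_value(unsigned target_cpu, unsigned sgi_id)
{
    assert(target_cpu < 8);   /* GICv2 supports at most 8 CPU interfaces */
    assert(sgi_id < 16);      /* SGIs are interrupt IDs 0..15 */

    uint32_t target_list = 1u << target_cpu;
    return (target_list << 16) | sgi_id;
}
```

With this layout, sending SGI 0 from core 0 to core 1 means writing 0x00020000 to GICD_SGIR, and core 0 interrupting itself with the same SGI means writing 0x00010000.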
     10c: e3a00103  mov r0, #-1073741824  ; 0xc0000000
     110: e3800801  orr r0, r0, #65536    ; 0x10000
     114: e3e01000  mvn r1, #0
     118: e5801230  str r1, [r0, #560]    ; 0x230
     11c: e1a0000b  mov r0, fp
    .......
    .......
    .......
    .......
    ; We come here if we are a secondary processor
    ; r0 = 0xc0010230
     184: e3a00103  mov r0, #-1073741824  ; 0xc0000000
     188: e3800801  orr r0, r0, #65536    ; 0x10000
     18c: e3800e23  orr r0, r0, #560      ; 0x230
    ; this is where secondary cores park until they get moved elsewhere.
    ; loop around WFI instruction
    ; monitors the location r0 is pointing to (0xc0010230)
    ; this looks a lot like the PSCI scheme for starting secondary processors
    ; So, a secondary processor ends up here, spinning around a wfi
    ; and monitoring 0xc0010230
    ; Note that cmn adds the #1 to the value, discarding the result and setting flags
    ; So if the magic location is 0xffffffff, adding 1, wraps to zero,
    ; meaning to this loop, "nothing to do yet".
    ; Note that nothing here checks the cpu number, so it must be
    ; necessary to start the secondary processors one at a time,
    ; let them park here for a while, then move them elsewhere.
     190: e320f003  wfi
     194: e5901000  ldr r1, [r0]
     198: e3710001  cmn r1, #1
     19c: 112fff11  bxne r1
     1a0: eafffffa  b 0x190

So all cores are watching the same magic location (as per the comment above). I have seen other twists on this scheme that have the core read the MPIDR register to figure out who they are and then look at a second word to see if they have been requested for action. Another scheme would be to assign a different mailbox to each core.
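The test made by the parking loop, and the per-core mailbox alternative, can be sketched in C as follows. This is only an illustration that runs on a host machine: the function names, the count of 4 cores, and the use of the low MPIDR byte as the core index are all assumptions made for the example and are not taken from any ROM.

```c
#include <assert.h>
#include <stdint.h>

#define PARK_IDLE 0xffffffffu   /* value the ROM stores in the mailbox */
#define NCORES    4             /* hypothetical core count for the sketch */

/* Mirrors "cmn r1, #1 ; bxne r1": adding 1 to 0xffffffff wraps to zero,
 * so any value other than 0xffffffff is treated as a branch target. */
static int park_should_branch(uint32_t mailbox)
{
    return (uint32_t)(mailbox + 1u) != 0;
}

/* Hypothetical per-core variant: one mailbox word per core, selected by
 * the core number in the low byte (Aff0) of MPIDR. */
static int park_should_branch_percore(const uint32_t mbox[NCORES],
                                      uint32_t mpidr)
{
    unsigned core = mpidr & 0xff;

    assert(core < NCORES);
    return park_should_branch(mbox[core]);
}
```

Note that in the single-mailbox loop a stored value of zero also counts as "go", since only 0xffffffff wraps to zero. With per-core mailboxes, posting an address for core 1 leaves the other cores parked, so the cores no longer have to be started one at a time.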
The listing further below is similar code from the Rockchip RK3399 bootrom. Note a couple of differences though. One is the magic value 0xdeadbeaf that pulls the core out of the loop. Another is that the wfe instruction is used rather than wfi (either should do). Also note that this ROM runs in 64-bit aarch64 mode, whereas the Fire3 ran in 32-bit mode.
Once again, all the cores get awakened, and will race together to the same posted branch location. Presumably there they will sort themselves out and return to a wait loop like this if not needed.
Note that this works in conjunction with the SEV instruction, which is described as a "hint" instruction. It signals an event to all processors in a multiprocessor system. It is not clear to me whether SEV will awaken WFI as well as WFE. It will certainly awaken WFE. One person says:
    WFE is used for communication between multiple cores (one core waits by executing WFE and other core wakes it up by executing SEV). WFI is for sleeping until interrupt.

I buy this. So WFI is more like what you would use for the idle loop where only one processor is concerned. However all of this needs careful study (and will SEV awaken the loop in the Fire3 rom?).
    ; We are some core other than cluster 0, core 0
    ; Cores other than cluster 0, core 0 will spin here.
    ; The mechanism for awakening one of these cores is as follows:
    ;
    ;   Write the address to run to 0xff8c0008 (in SRAM)
    ;   Write 0xdeadbeaf to 0xff8c0004 (in SRAM)
    ;
    ; Note that we write 0xdeadbeaf, not 0xdeadbeef
    ;
    ; It seems to me that this will wake up all the cores and
    ; take them to this other address, which will then have to
    ; sort them all out via the mpidr_el1 register.
    ; It is a little surprising that we cannot select a single core
    ; to bring out of this WFE spin loop, but it is what it is.
    ffff002c: 58000760  ldr x0, 0xffff0118  ; ff8c0004
    ffff0030: 52800021  mov w1, #0x1        ; w1 = 1
    ffff0034: b9000001  str w1, [x0]        ; write the "1"
    ffff0038: d503205f  wfe
    ffff003c: 18000762  ldr w2, 0xffff0128  ; w2 = 0xdeadbeaf
    ffff0040: 580006c0  ldr x0, 0xffff0118  ; ff8c0004
    ffff0044: b9400001  ldr w1, [x0]
    ffff0048: 6b02003f  cmp w1, w2
    ffff004c: 54ffff61  b.ne 0xffff0038     ; go back to sleep
    ffff0050: 58000681  ldr x1, 0xffff0120  ; ff8c0008
    ffff0054: b9400020  ldr w0, [x1]
    ffff0058: d61f0000  br x0               ; go run at that address

I am a bit surprised, but apparently I have not yet read out and disassembled the bootrom for the Allwinner H5. My bet is that it will be like the H3 -- I will need to turn on clocks and manipulate a reset register to start each core running.
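Going back to the RK3399 listing, the wake-up handshake described in its comments can be sketched in C as below. This is a host-side model, not code from any bootrom: the SRAM is passed in as a plain array, and the function names are hypothetical. On real hardware the pointer would presumably be the SRAM base 0xff8c0000, and the poster would follow the writes with a barrier and a sev instruction.

```c
#include <assert.h>
#include <stdint.h>

#define RK_WAKE_MAGIC 0xdeadbeafu   /* note: "beaf", not "beef" */

/* Word indexes relative to 0xff8c0000:
 * flag word at 0xff8c0004, entry address at 0xff8c0008. */
#define RK_FLAG_WORD  1
#define RK_ENTRY_WORD 2

/* Hypothetical helper run by the boot core. The entry address is stored
 * before the magic value, so a core that sees the magic value also sees
 * a valid entry point. The ROM reads the entry as a 32 bit word. */
static void rk_post_entry(volatile uint32_t *sram, uint32_t entry)
{
    assert(entry != 0);
    sram[RK_ENTRY_WORD] = entry;
    sram[RK_FLAG_WORD] = RK_WAKE_MAGIC;
    /* On hardware: a dsb barrier and then sev would go here. */
}

/* Models what each parked core does after wfe returns.
 * Returns 0 for "go back to sleep", else the address to branch to. */
static uint32_t rk_parked_check(const volatile uint32_t *sram)
{
    if (sram[RK_FLAG_WORD] != RK_WAKE_MAGIC)
        return 0;
    return sram[RK_ENTRY_WORD];
}
```

Since every parked core runs the same check against the same two words, all of them leave the loop together, which is why the code at the posted address has to sort them out using mpidr_el1.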
Kyu / tom@mmto.org