December 22, 2016
Multiple cores - Part 1
I began looking into this (without success) a month ago, back in 12-22-2016.
Now I am back at it again, with new insights.
Bad documentation
This business is virtually undocumented in the Allwinner H3 datasheet.
On top of that, the stock linux kernel does not include the latest H3 code.
For that, you need to grab the "Armbian" distribution, which builds a custom
kernel that includes the most complete sunxi code for the H3 chip.
Armbian to the rescue
I had gone through the full exercise of building Armbian from scratch sometime
around 12-11-2016. To do this I ran VirtualBox, installed Debian (as directed)
and ran the Armbian build scripts (which are an involved set of bash scripts that
will only work on a Debian system). This patches the regular U-boot and
linux kernel (3.4.112) distribution and builds from the patched sources.
I worked out how to do the U-boot patching and building under Fedora, but never
followed on with the kernel. But since I had the patched kernel sources in
the virtual disk image, it was easy enough to transfer that snapshot and study it.
It turns out there is a huge amount of code in the arch/arm/mach-sunxi
directory. Looking at which files had corresponding ".o" files made it clear
which of the many files were actually involved in the build.
This is good, and a valuable source of information above and beyond what is
in the mainline linux kernel sources.
If you do begin looking at the linux code, there is a lot of nomenclature to be
aware of. First of all, the H3 chip is in the "sun8i" family.
In particular, it is a sun8iw7p1.
So you can ignore stuff for the sun8iw6 and the sun9i and others.
Note that the H5 chip (that I plan to work on someday)
is in the sun50i family, and as near as I can tell there is no support at all
for this in the code tree I have. This may be supported if a different
set of patches were applied and an H5 specific Armbian build was performed.
Maybe. Also note that the H5 is a 4-core Cortex-A53 device (a 64 bit ARM).
Something for another day.
ARM booting and Bootrom involvement
It turns out that when a new core starts running, it is simply another ARM processor
just like any other. It is unaware that it is part of a multi-core system or that
it is not the one and only ARM processor in the whole wide world.
It has ways of figuring things out (namely reading the processor affinity register),
but until it does that, it simply comes out of reset, sets the PC to 0xffff0000 and
starts running. This as it happily turns out is the start address of the H3
bootrom (and a second core will start in this just like the first processor did).
A new core does not get far before reading the processor affinity register and
getting its processor ID from the low 2 bits. If these are 0, it is the "main"
processor and continues on in the bootrom, ends up doing the SPL process,
running U-boot and all that we are familiar with.
However if the processor ID is non-zero, it is some other core and
the bootrom code does something special.
Namely it loads the PC from a special location, address 0x01f01da4.
Naturally this must have been set to point to some code we have
prepared for it by the routine we are using to start a new core.
Note that this address is not in the on-chip SRAM region.
It does behave like a 4 byte piece of SRAM, if you want to look at it that way.
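The two pieces described above (getting the core number from the affinity
register, and planting a jump address for the bootrom to find) can be sketched
in C as follows. This is only a sketch under assumptions: the function names
are mine, the CP15 read is the standard ARMv7-A encoding for MPIDR and is only
compiled on a 32-bit ARM target, and the entry slot is passed as a pointer so
the logic can be tried on a host before pointing it at 0x01f01da4.

```c
#include <stdint.h>

/* Location the H3 bootrom loads the PC from on a non-zero core (from the text). */
#define SECONDARY_ENTRY_ADDR 0x01f01da4u

/* The core number is in the low 2 bits of the affinity register (MPIDR). */
static inline unsigned core_id_from_mpidr(uint32_t mpidr)
{
    return mpidr & 0x3u;
}

#if defined(__arm__)
/* Read MPIDR through CP15 (c0, c0, opcode2 = 5). Only builds for 32-bit ARM. */
static inline uint32_t read_mpidr(void)
{
    uint32_t val;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 5" : "=r" (val));
    return val;
}
#endif

/* Store the address of our startup code where the bootrom will look for it.
 * On real hardware, pass (volatile uint32_t *)SECONDARY_ENTRY_ADDR.
 * The helper name and signature are hypothetical. */
static inline void set_secondary_entry(volatile uint32_t *entry_slot,
                                       uint32_t entry)
{
    *entry_slot = entry;
}
```

On the board itself the call would look like
set_secondary_entry((volatile uint32_t *)SECONDARY_ENTRY_ADDR, addr),
where addr is the address of whatever startup code has been prepared.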
Detecting a new core running
There are any number of ways for a freshly booted processor to get into
trouble, so we would like to contrive the simplest possible test to
have it announce that it is up and running. My first idea was to just
set aside a memory location, set this location to some non-zero value,
and have the new core clear the location when it starts up.
Processor zero can just poll this location, watching
to see if it goes to zero.
This all sounds simple enough --- until you consider the issue of caching.
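Setting caching aside for a moment, the sentinel idea itself can be sketched
like this. The names and the poll limit are my own inventions for
illustration, and the sentinel is passed as a pointer so that any location can
be tried.

```c
#include <stdint.h>

#define SENTINEL_ARMED 0xdeadbeefu   /* any non-zero value will do */

/* Processor zero: mark the sentinel before starting the new core. */
static inline void sentinel_arm(volatile uint32_t *sentinel)
{
    *sentinel = SENTINEL_ARMED;
}

/* New core: the very first thing it does is clear the sentinel. */
static inline void sentinel_announce(volatile uint32_t *sentinel)
{
    *sentinel = 0;
}

/* Processor zero: poll until the sentinel reads zero or we give up.
 * Returns 1 if the new core announced itself, 0 if we ran out of polls. */
static inline int sentinel_wait(volatile uint32_t *sentinel,
                                unsigned long max_polls)
{
    while (max_polls-- > 0) {
        if (*sentinel == 0)
            return 1;
    }
    return 0;
}
```

The poll limit is there so that processor zero does not hang forever when the
new core never shows up, which is the likely outcome of early experiments.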
The issue of caching
The simple strategy of using a "sentinel" like this fails because of
caching. What happens is that the fully initialized processor that
is trying to start a new core has the data cache enabled.
It writes some non-zero value to the sentinel address,
but that value never makes it beyond the cache.
The new core will almost certainly come up with the data cache disabled,
but this doesn't help us any. It writes its zero straight to SDRAM,
while processor zero keeps polling its own cached copy and never sees the change.
We could write some code to flush and invalidate cache lines to
solve all this, but we would like a simpler solution just to do
some preliminary testing.
The thing to do is to use some memory location that is not cached.
In other words, a memory location that is not in SDRAM.
One possibility is to use any part of the on-chip SRAM (which is
simply gathering dust once the system is booted up).
Another possibility is to just zero the value in the magic location
that holds the jump address, namely 0x01f01da4.
This works out just fine as it turns out.
After having success with this, we try just some "random" address in
on-chip SRAM (namely 0x4). This also works just fine.
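Putting the pieces together, the test sequence might look something like the
sketch below. Everything here beyond the two addresses is an assumption on my
part: the function names are made up, and the step that actually releases the
new core from reset is not covered above, so it appears only as a
caller-supplied hook. The two "kick" functions are host-only stand-ins that
let the logic be exercised without hardware.

```c
#include <stdint.h>

#define ENTRY_SLOT_ADDR 0x01f01da4u  /* magic jump-address location (from the text) */
#define SRAM_FLAG_ADDR  0x00000004u  /* "random" on-chip SRAM word (from the text) */

/* Hypothetical hook: whatever really releases the new core from reset. */
typedef void (*kick_fn)(void *ctx);

/* Arm the sentinel, plant the entry address, kick the core, then poll.
 * sentinel may be the same pointer as entry_slot; in that case entry must
 * be non-zero, since the entry address itself serves as the armed value.
 * Returns 1 if the sentinel was cleared, 0 if we gave up. */
static inline int start_and_wait(volatile uint32_t *entry_slot, uint32_t entry,
                                 volatile uint32_t *sentinel,
                                 kick_fn kick, void *ctx,
                                 unsigned long max_polls)
{
    *sentinel = 1;          /* arm first; overwritten below if it aliases */
    *entry_slot = entry;
    kick(ctx);
    while (max_polls-- > 0) {
        if (*sentinel == 0)
            return 1;
    }
    return 0;
}

/* Host-only stand-in for a core that "boots" at once and clears the word
 * that ctx points to, as the real startup code would. */
static inline void fake_kick(void *ctx)
{
    *(volatile uint32_t *)ctx = 0;
}

/* Host-only stand-in for a core that never comes up. */
static inline void dead_kick(void *ctx)
{
    (void)ctx;
}
```

On the board, entry_slot would be (volatile uint32_t *)ENTRY_SLOT_ADDR and the
sentinel either that same pointer or (volatile uint32_t *)SRAM_FLAG_ADDR.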
Have any comments? Questions?
Drop me a line!
Tom's electronics pages / tom@mmto.org