I could "fix" the problem by sticking a delay in soclose(), which stalled the call made by my application so that the call generated by network activity would complete first. The reason this works is that the Kyu tcp thread (that runs tcp_input() and hence the bulk of the TCP processing) runs at priority 13 in Kyu and my application thread runs at priority 20. But this is hardly a solution; it is just an aid in diagnosing the problem.
The symptom of the problem was an ARM data abort. This was happening because I was deallocating a semaphore structure twice. The first deallocation put it back on the free list (and corrupted it). The second deallocation tried to use corrupted data.
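To illustrate why a double deallocation is so destructive, here is a minimal sketch of a generic singly linked free list. This is not Kyu's actual semaphore allocator; the names struct sem, pool_free, pool_alloc and double_free_aliases are hypothetical and exist only for this illustration. Freeing the same structure twice makes it point at itself, so the rest of the list is lost and two later allocations hand out the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a semaphore structure kept on a free list. */
struct sem {
    struct sem *next;   /* link, meaningful only while on the free list */
    int state;
};

static struct sem *free_list = NULL;

/* Push a structure onto the free list (no double-free check). */
static void pool_free(struct sem *sp)
{
    sp->next = free_list;
    free_list = sp;
}

/* Pop a structure off the free list, or return NULL if it is empty. */
static struct sem *pool_alloc(void)
{
    struct sem *sp = free_list;

    if (sp != NULL)
        free_list = sp->next;
    return sp;
}

/* Returns 1 if a double free makes two allocations return the same object. */
int double_free_aliases(void)
{
    static struct sem pool[2];
    struct sem *a, *b;

    free_list = NULL;
    pool_free(&pool[1]);
    pool_free(&pool[0]);    /* list: pool[0] -> pool[1] */

    a = pool_alloc();       /* a is pool[0]; list: pool[1] */
    assert(a != NULL);

    pool_free(a);           /* list: pool[0] -> pool[1] */
    pool_free(a);           /* second free: a->next == a, pool[1] is lost */

    a = pool_alloc();       /* pool[0] */
    b = pool_alloc();       /* pool[0] again: two owners of one structure */
    return a == b;
}
```

In a real kernel the damage depends on what else shares the structure, so the visible symptom (here, a data abort) can show up far from the actual mistake.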
One of course asks, "why is the proven BSD code suddenly plagued with race conditions?" The answer is that I stubbed out the calls to splnet, splimp, and splx, which fiddle with processor priorities and which BSD used to lock critical sections. The question becomes how to lock critical sections in Kyu for this code (with semaphores, of course). Actually manipulating ARM processor interrupt priorities in a way that mimics what a VAX used to do is nothing I want anything to do with. It is worth noting that for BSD4.4 on the MC68020 processor, that is exactly what they did.
What I am doing right now (and what has fixed this specific bug) is to replace splnet() with a routine I call "net_lock()" and the corresponding splx() with a routine I call "net_unlock()". I add these to soclose() (which did lock a critical section with splnet/splx) and I am half way there.
The other half is realizing that the routine "tcp_input()" is called by the interrupt code on BSD and just runs at splnet the entire time when performing TCP processing. So I wrap my call to tcp_input() with my net_lock/net_unlock routines (which are implemented by a mutex semaphore) and I am in business. My bug is gone and everything works. Well, probably not everything, because there are other critical sections wrapped in splnet/splx that I need to look at.
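The scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: a POSIX mutex stands in for the Kyu mutex semaphore, and tcp_input_stub, tcp_rcv and soclose_stub are hypothetical placeholders for the real routines, not the actual Kyu or BSD code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumption: a POSIX mutex stands in for the Kyu mutex semaphore. */
static pthread_mutex_t net_mutex = PTHREAD_MUTEX_INITIALIZER;
static int net_lock_held = 0;       /* changed only while net_mutex is held */

/* Replaces splnet(): enter the network critical section. */
void net_lock(void)
{
    pthread_mutex_lock(&net_mutex);
    net_lock_held = 1;
}

/* Replaces splx(): leave the network critical section. */
void net_unlock(void)
{
    assert(net_lock_held);
    net_lock_held = 0;
    pthread_mutex_unlock(&net_mutex);
}

static int packets_processed = 0;

/* Hypothetical placeholder for tcp_input(); it expects the lock to be held,
 * just as the BSD routine expects to run at splnet. */
static void tcp_input_stub(void *pkt)
{
    (void) pkt;
    assert(net_lock_held);
    packets_processed++;
}

/* Network thread side: wrap the whole of TCP input processing in the lock. */
void tcp_rcv(void *pkt)
{
    net_lock();
    tcp_input_stub(pkt);
    net_unlock();
}

/* Application side: hypothetical placeholder for the critical section
 * that soclose() used to protect with splnet/splx. */
int soclose_stub(void)
{
    net_lock();
    /* ... tear down the socket while TCP input is locked out ... */
    net_unlock();
    return 0;
}
```

With both paths taking the same lock, the application thread and the tcp thread can no longer be inside soclose() and tcp_input() at the same time, whatever their priorities.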
There is another more tricky issue. The BSD code uses two special interrupt priority levels for the network code: splnet and splimp. It uses splimp to really tighten things up and avoid contention even with code running at interrupt level. Most of the TCP code uses splnet, which is a software interrupt level that does not preempt any hardware interrupts. The vital thing is that when the function tcp_input() is called with a new packet to process, it runs at splnet. The only time splimp is used is when pulling packets out of a queue, which has packets loaded into it by interrupt code.
The tricky aspect of this is that if I simply replace both splimp and splnet with calls to net_lock I am likely to see deadlock. In particular, if I am already locked as if at splimp level, and then try to lock again to go to splnet level, I will deadlock.
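The nested locking hazard can be demonstrated with a small sketch. It assumes a POSIX error-checking mutex as a stand-in for a plain (non-recursive) mutex semaphore; the function name nested_lock_result is hypothetical. An error-checking mutex reports EDEADLK where an ordinary mutex would simply hang, which makes the problem visible without actually deadlocking.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Take the same lock twice from one thread and return the result of the
 * second (nested) attempt. */
int nested_lock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&lock);     /* outer lock, as if at "splimp" */
    assert(rc == 0);

    rc = pthread_mutex_lock(&lock);     /* nested lock, as if going to "splnet" */

    pthread_mutex_unlock(&lock);        /* release the one lock we do hold */
    pthread_mutex_destroy(&lock);
    return rc;
}
```

The usual ways out are to avoid nesting altogether (which is what eliminating the single splimp call amounts to) or to use a lock that tracks its owner and a nesting depth.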
It turns out, though, that in practice this is not a big concern. This is because I see only one call to splimp in the TCP code, and I am not confronted with significant issues eliminating it.
The way to think about this is to consider two things:
Large locks like this can lead to inefficient code. A proper way to do things is to have different locks for every resource where there might be contention. This would require a lot of careful study of the code. For now I am willing to do something that closely emulates what was done by BSD (but without the splimp/splnet distinction) and just get things working.
What I am looking to do is to find the locks in code that would have been part of the "top half" in the BSD kernel (as indeed there are) and replace each of them with my semaphore-based locking scheme.
Correct is a first goal. Fast can come later (if ever).
Kyu / tom@mmto.org