There is a fair summary of what we know thus far.
I add some debug and run a test. After 489 transfers it is hung. The last message on the Kyu console is "Send blocked". Typing "u wang_thr" yields:
PC = 40010910 ( resched+288 ) Called from thr_block+7c -- 4000fa40 Called from sem_block_cpu+4c -- 40010e3c Called from sem_block+24 -- 40010e68 Called from sbwait+38 -- 40030fbc Called from sosend+31c -- 4002f19c Called from tcp_send+80 -- 4002ee68 Called from big_wangdoodle+f4 -- 4003230c Called from one_wangdoodle+124 -- 40032580This is the relatively infrequent block in "sbwait". This looks like a bonafide and legitimate deadlock, and we may as well investigate it.
The first thing to notice is that this seems to happen every time sosend decides that it needs to call sbwait and block. This doesn't happen very often, but when it does it does not seem to be handled properly. Who is supposed to awaken us? There are two cases in tcp_input() that call so_wwakeup_ext(). I added some debug to these and by the time I got back to the kyu console there had been more activity, the message "Send unblocked" along with "out of netbufs" and it was entirely hung. So I am just going to reboot, run another test, and perhaps the next time the added debug will be helpful (so_wwakeup now prints a message).
I now get 3 messages and the end of each transfer from so_wwakeup() when we are disconnecting and disconnected. I think I can just ignore these. Just watching the network activity LED lets me know at a glance if the test is running (it blinks rapidly and steadily) or if the test has hung (it blinks slowly and intermittently). Now we wait for a hang to analyze. Each transfer now takes about 2 seconds (the console messages slow it down), so 489 transfers would take 978 seconds (16 minutes). The next run hangs after 1195 transfers (30 minutes). No message about being blocked or awakened. The traceback is:
PC = 400104c8 ( change_thread+1f8 ) Called from thr_unblock+178 -- 4000fd14 Called from sem_unblock+70 -- 40010c20 Called from user_unlock+1c -- 40023a3c Called from tcp_send+88 -- 4002ee80 Called from big_wangdoodle+f4 -- 4003231cTyping "t 9" shows that we are rapidly cycling between 3 threads:
Switch wang_thr --> tcp-bsd Switch tcp-bsd --> tcp-timer Switch tcp-timer --> wang_thrWhy are we continually running tcp-bsd? It should only run when Kyu has a TCP packet that has arrived and needs to be processed. Running tcp-timer of course makes sense (it should run every 100 ms). I add debug to tcp-bsd to tell me how many actual packets are processed every time it awakens to pull packets off the queue, and now we run again and wait for a hang.
Away we go again with this debug in place. Now we are setting records, having gone for 2400+ transfers. It makes me wonder if just the tiny bit of delay that has now been added to tcp-bsd has made the difference. Now I am up to 3870 transfers (each taking 1.8 seconds according to netcat). Round this to 2 seconds each and this is nearly 2 hours. Finally we get this:
Wangdoodle connection 4126 cksum: out of data *** *** *** tcp_input 2, bad cksum = 3f m, tlen, len = 401686f0, 12849, 12869 ..... ..... Wangdoodle connection 4148 *** *** *** tcp_input 2, bad cksum = 4201 m, tlen, len = 40168230, 552, 572Bad checksums should not lead to a hang though, the packet should just get dropped and then will ultimately get resent. I type "u wang_thr" to get:
PC = 40010464 ( change_thread+194 ) Called from thr_unblock+178 -- 4000fd14 Called from sem_unblock+70 -- 40010c20 Called from user_unlock+1c -- 40023a3c Called from tcp_send+88 -- 4002eed8 Called from big_wangdoodle+f4 -- 40032374
Typing "t 9" shows absolutely nothing, which is odd. What is happening though is that tcp-bsd is probably in the very loop I wanted to diagnose (at priority 14). But the test code runs the "test" (i.e. t 9) at priority 20, so it is getting starved by the looping tcp-bsd.
Kyu, ready> l Thread: name ( &tp ) state pc sp pri Thread: shell (400a799c) READY J 40010464 40568000 11 Thread: net (400a75dc) SEM J 40010464 40578000 12 Thread: net_slow (400a74ec) REPEAT C 40015a78 4057c000 13 * Thread: tcp-bsd (400a77bc) READY I 40024b08 40570000 14 Thread: tcp-timer (400a76cc) READY C 40023c48 40574000 15 Thread: wangdoodle (400a730c) SEM J 40010464 40584000 30 Thread: wang_thr (400a721c) DEAD I 4000f2ec 40580000 35 Thread: wang_thr (400a703c) DEAD I 400117e0 40580000 35 Thread: wang_thr (400a6f4c) DEAD I 400117d0 40588000 35 Thread: wang_thr (400a6e5c) DEAD I 4000f2f8 4058c000 35 Thread: wang_thr (400a6d6c) DEAD I 4000f2dc 4058c000 35 Thread: wang_thr (400a6c7c) DEAD I 4000f2fc 4058c000 35 Thread: wang_thr (400a6a9c) DEAD I 4000f280 4058c000 35 Thread: wang_thr (400a69ac) DEAD I 4000f2fc 40590000 35 Thread: wang_thr (400a68bc) DEAD I 4000f2dc 40590000 35 Thread: wang_thr (400a66dc) DEAD I 4000f2d8 40590000 35 Thread: wang_thr (400a65ec) DEAD I 4000f2d8 40590000 35 Thread: wang_thr (400a64fc) DEAD I 4000f27c 40590000 35 Thread: wang_thr (400a640c) DEAD I 4000f2d8 40594000 35 Thread: wang_thr (400a631c) DEAD I 4000f2fc 40594000 35 Thread: wang_thr (400a622c) DEAD I 4000f2d8 40594000 35 Thread: wang_thr (400a613c) READY J 40010464 40594000 35 Thread: idle (400a78ac) READY C 40010964 4056c000 1234The traceback for any of these looks like the following (with perhaps different exact addresses in cleanup_thread())
PC = 4000f2d8 ( cleanup_thread+88 ) Called from xxx_arm+bffeeddc -- 00000004 Called from thr_exit -- 4000f770
My guess as to what is happening is that an interrupt happens during thr_exit(), and since
the thread is marked as "DEAD" the scheduler will never return to it to finish the exit
processing. Putting a lock on exit processing should solve this.
But an examination of the thr_exit() code shows the expected locks.
This will have to remain a mystery for the time being.
I rearranged a bunch of priorities as follows - the idea being to get some
elbow room to run things above the network priorities.
In particular, we can now type "t 9" and the thread that runs that "test"
and displays debug information can now run at priority 14.
Thread: name ( &tp ) state pc sp pri Thread: shell (400a799c) READY I 4000460c 40568000 11 Thread: net (400a75dc) READY I 400109d4 40578000 20 Thread: net_slow (400a74ec) REPEAT C 40015b40 4057c000 21 Thread: tcp-bsd (400a77bc) SEM J 4001052c 40570000 24 * Thread: tcp-timer (400a76cc) READY I 400109d4 40574000 25 Thread: wangdoodle (400a730c) SEM I 400109d4 40584000 31 Thread: idle (400a78ac) READY C 40010a2c 4056c000 1234Away we go again with a series of transfers. Starting at 5:54 PM. It hangs after only 114 transfers, this time with the "sbwait bug". It does announce that it is blocking with the "Send blocked" message. And, lo and behold there is no message from so_wwakeup, so it is simply not being notified of new buffer space. Why not? Either I screwed up and eliminated that wakeup that was once in the BSD code, or I am holding a lock and preventing other code from running that would free up space. Or maybe there is a more basic problem that is clogging up the output channel. Things are piling up waiting to be sent, and whatever is causing that is never getting rectified.
After a long wait, I get the message "Send unblocked", but without ever seeing a message that so_wwakeup did anything. How can this happen? This can only happen if sbwait() returned.
Wireshark reveals even more curious and confusing information.
Kyu sends 500 bytes with seq 157001, next seq 157501 (data is PK311) Linux gives the ACK for this (ack 157501) Kyu next sends 500 bytes with seq 151501, next seq 152001 (data is PK300) Linux repeats the ACK for 157501
Why has TCP regressed to the tune of 6000 bytes or so?
Also worth noting is that while hung in this way, every TCP facility is stuck.
Happily though the Kyu console is alive and well.
What information do we need to understand this better?
So, with some new features to display sizes of buffers and queues, away we go again. After 530 transfers it is still going. This has taken 16 minutes (960 seconds) which exactly matches 1.8 seconds per transfer times 530. So, a person could estimate 15 minutes for 500 transfers or 2000 transfers per hour. Each transfer is 200,000 bytes, so we are moving 400 megabytes per hour or 0.4 gigabyte per hour. Not terrific for a 100 megabit ethernet, but adequate for things I want to do -- and I'll be happy if it just works reliably.
And after only 1216 transfers we are hung! I see the messages "Send blocked" as well as "Send unblocked" (but I was busy and distracted, so who knows how long it was from one to the other). Also I see:
Kyu input queue size: 737Kyu has an active console, but will not respond to ping and all aspects of TCP are hung. There are 3 wang_thr -- all DEAD. Wireshark shows the strange seq. number regression we saw on the previous hang. My bet though is that if I had been on the ball and caught this shortly after the hang I might have see an sbwait situation.
First of all, since I added some code to tcp-bsd where it pulls packets off the queue, I have had much longer runs and none of the "thr_unblock" style hangs. Clearly this is some tricky timing issue having to do with the semaphore announcing packet arrivals, but it is progress in that it points to the root of the problem. But the last run of the day went only 272 transfers then got one of these, so I am hardly out of the woods here.
The sbwait problem has more drastic effects. Every aspect of Kyu networking stops. It is as though some thread is holding the interrupt lock, although if that was true my interrupt driven console would not work. So it is some fundamental semaphore thing. It could even be the same bug as the "thr_unblock" bug, but this time having to do with pulling packets off the Kyu main packet queue. Wouldn't that be something. If I can figure out why packet processing is frozen up, I will have learned something. Here is my guess. Kyu needs to deliver TCP packets via tcp_rcv() and this needs to get a lock to put them on the TCP input queue. If it is waiting to get that lock, that would explain this hang.
Look at sem_block_cpu ( sem ). This is designed to do the atomic blocking on a semaphore while holding the INT lock.
A handy addition to Kyu would be a "clock" that just counts seconds since bootup or some such. It would let me print times and then be able to evaluate timing between events (such as Send blocked/unblocked). It turns out I have this as get_timer_count_s() which returns a value in seconds, so as my last trick for the day I add this to the Send block/unblocked printout, and start another run.
Also, I am convinced that I do need a TCP output queue (and probably a network output queue as well).
Kyu / tom@mmto.org