December 29, 2022

BSD TCP/IP for Kyu - the unexpected shutdown bug

Here is how I can quickly and repeatedly generate this bug. I start a server on the Kyu side on port 114. This is the server that will dish out 200K of data every time I ask for it. Then I run my Python script that asks for such transfers back to back. Then I just lean on the ^C key. Python has a bug where the ^C does not interrupt the script, but simply interrupts each transfer, but this works out well for this test. In short order I get:
 SP = 4059bec0,  FP = 4059bed4
Called from bsd_panic+14 -- 400235bc
Called from sb_lock+5c -- 40030ba0
Called from sorflush+40 -- 4002f95c
Called from sofree+a8 -- 4002fa64
Called from soclose+1d8 -- 4002ffc8
Called from tcp_close+28 -- 40031080
Called from run_wangdoodle+94 -- 4003201c
Called from thr_exit -- 4000f554
locker count: 0
Input thread: idle
Timer thread: idle
User lock: busy
BSD panic: sb_lock
PANIC: BSD tcp
In general Kyu hangs, rather than just some tcp related thread being aborted. This indicates that even though I detected the bug and paniced, some rather serious corruption has taken place.

Troubleshooting

I spent an evening looking into this. What triggers the panic is that I have code in the routine sb_lock() in socket_sb.c that checks the value of sb->sb_lock and if it is 0, throws the panic. This ought to be a pointer to a Kyu mutex semaphore. I added printf statements along with calls to
socket_show ( so );
unroll_cur_short ();
I do this in the routine sorflush() in socket_io.c (which is what calls sb_lock). Consider two things. One is that it is unclear who and how sb->sb_lock gets cleared, even after I have finished my analysis of the bug. Also remember that the sockbuf structure in contained within the socket structure (one for reading and one for writing). So code in sorflush() has to generate a pointer to pass to sb_lock() as follows:
struct sockbuf *sb = &so->so_rcv;
As I poked at this I noticed that before I started hammering on Ctrl-C, several transfers took place just fine and each one yielded just on call to sorflush(), but when the trouble came, there were two calls to sorflush(). I added some code to display the state of the socket itself and discovered when the second call was being made, the socket structure was actually on the free list! This should not be!! Somehow two calls are being made to sofree(), which is the root cause of the problem, never mind the details of just how the socket structure gets corrupted. Once it is back on the free list, anything is possible.

So why are we calling sofree twice, and why do things work for "normal" transfers but not for interrupted transfers. Here is what I think explains it.
For a normal transfer, we send the 200K of data then call tcp_close on the Kyu side. This will trigger the sending of a FIN packet, but code will quickly run to process the close and finish long before the reponse to the FIN packet shows up.
This works.
For an interrupted transfer, the linux side terminates the transfer and sends the FIN. This causes code in tcp_input() to "close" the connection before we detect that the connection is broken and call tcp_close on the Kyu side.
This does not work.

In short, if we call tcp_close() first (before tcp_input() gets a FIN), things work. If tcp_input() runs first, we fail.

So the basic issue seems clear, but not yet all the details. At first I thought it was simply that I was not calling soclose() from my tcp_close(), but it is not that simple. Now I am focusing my attention on a flag in so_state called SS_NOFDREF.

The basic idea with this flag seems to be to indicate that a socket has no references from the "user side", so the protocol side is free to dispose of it or do as it pleases with it. The case of making a call to tcp_connect() which calls socreate(), the so_state bits are all zero, indicating that the socket has a "user side" reference, and that tcp_close() just might get called.

It is interesting to note that in sonewconn() (which handles the case where connections arise on a listening socket) this bit is set. This certainly is correct when you consider what the bit is supposed to mean. But the bit ought to get cleared before tcp_accept(). And this turns out to be the bug. When I coded up tcp_accept I did not understand the importance of the one line of code that clears this bit and omitted it.

To fix things, I added the routines accept1() from kern/uipc_syscalls.c, and soaccept() from kern/uipc_socket.c The single line I missed was in soaccept (the main purpose of which is to return the peer name). I hacked "accept1" into accept_kyu in my socket_io.c and now the bug seems to be fixed. It was all about proper handling os SS_NOFDREF.

To be continued ......

What does the protocol do if it runs first?
Here tcp_input() calls tcpcb_close, which calls in_pcbdetach(), which calls sofree() This is all good, but tcp_close() (actually soclose()) should check so->so_pcb, find it zero, and avoid trying to call sofree() a second time.

Attach and Detach

Connect and Disconnect


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org