SP = 4059bec0, FP = 4059bed4 Called from bsd_panic+14 -- 400235bc Called from sb_lock+5c -- 40030ba0 Called from sorflush+40 -- 4002f95c Called from sofree+a8 -- 4002fa64 Called from soclose+1d8 -- 4002ffc8 Called from tcp_close+28 -- 40031080 Called from run_wangdoodle+94 -- 4003201c Called from thr_exit -- 4000f554 locker count: 0 Input thread: idle Timer thread: idle User lock: busy BSD panic: sb_lock PANIC: BSD tcpIn general Kyu hangs, rather than just some tcp related thread being aborted. This indicates that even though I detected the bug and paniced, some rather serious corruption has taken place.
socket_show ( so ); unroll_cur_short ();I do this in the routine sorflush() in socket_io.c (which is what calls sb_lock). Consider two things. One is that it is unclear who and how sb->sb_lock gets cleared, even after I have finished my analysis of the bug. Also remember that the sockbuf structure in contained within the socket structure (one for reading and one for writing). So code in sorflush() has to generate a pointer to pass to sb_lock() as follows:
struct sockbuf *sb = &so->so_rcv;As I poked at this I noticed that before I started hammering on Ctrl-C, several transfers took place just fine and each one yielded just on call to sorflush(), but when the trouble came, there were two calls to sorflush(). I added some code to display the state of the socket itself and discovered when the second call was being made, the socket structure was actually on the free list! This should not be!! Somehow two calls are being made to sofree(), which is the root cause of the problem, never mind the details of just how the socket structure gets corrupted. Once it is back on the free list, anything is possible.
So why are we calling sofree twice, and why do things work for "normal" transfers
but not for interrupted transfers. Here is what I think explains it.
For a normal transfer, we send the 200K of data then call tcp_close on the Kyu
side. This will trigger the sending of a FIN packet, but code will quickly run
to process the close and finish long before the reponse to the FIN packet shows up.
This works.
For an interrupted transfer, the linux side terminates the transfer and sends the
FIN. This causes code in tcp_input() to "close" the connection before we detect
that the connection is broken and call tcp_close on the Kyu side.
This does not work.
In short, if we call tcp_close() first (before tcp_input() gets a FIN), things work. If tcp_input() runs first, we fail.
So the basic issue seems clear, but not yet all the details. At first I thought it was simply that I was not calling soclose() from my tcp_close(), but it is not that simple. Now I am focusing my attention on a flag in so_state called SS_NOFDREF.
The basic idea with this flag seems to be to indicate that a socket has no references from the "user side", so the protocol side is free to dispose of it or do as it pleases with it. The case of making a call to tcp_connect() which calls socreate(), the so_state bits are all zero, indicating that the socket has a "user side" reference, and that tcp_close() just might get called.
It is interesting to note that in sonewconn() (which handles the case where connections arise on a listening socket) this bit is set. This certainly is correct when you consider what the bit is supposed to mean. But the bit ought to get cleared before tcp_accept(). And this turns out to be the bug. When I coded up tcp_accept I did not understand the importance of the one line of code that clears this bit and omitted it.
To fix things, I added the routines accept1() from kern/uipc_syscalls.c, and soaccept() from kern/uipc_socket.c The single line I missed was in soaccept (the main purpose of which is to return the peer name). I hacked "accept1" into accept_kyu in my socket_io.c and now the bug seems to be fixed. It was all about proper handling os SS_NOFDREF.
What does the protocol do if it runs first?
Here tcp_input() calls tcpcb_close, which calls in_pcbdetach(), which calls sofree()
This is all good, but tcp_close() (actually soclose()) should check so->so_pcb, find
it zero, and avoid trying to call sofree() a second time.
Kyu / tom@mmto.org