December 10, 2022

BSD TCP/IP for Kyu - connection close and shutdown

I am seeing a bug where when I call soclose(so) once a connection is shutdown I am getting nasty errors (ARM Data faults). Digging into these a bit, I see that a semaphore structure is basically just rubbish. The reason is that the semaphore has already been destroyed, "free" has been called to return memory to the pool.

The reason is sort of a race. The protocol itself has already closed the connection, and I am repeating work (or trying to) that already has been done, i.e. I am calling soclose(so) twice when I should only be calling it once.

It would seem clear that some kind of reference count is needed. The protocol close should happen, but the socket should be left around for the code to actually recognize that the connection is gone and do a tidy shutdown.

Look at detail of the connection process

My simple test is against the DAYTIME server on TCP port 13. Here is what wireshark shows me: This all looks exactly right. Comer's book (Internetworking with TCP/IP vol 1) page 196 has a nice discussion of "closing a TCP connection". Three aspects stand out: So when one side sends a FIN, it basically says that "I am done sending". The other side will also need to indicate at some point that it is done sending.

Some details in the code

I spent quite a bit of time looking at the code, aiming to find where FIN was handled. I added some debug to display every active socket and the so_state variable. Then I ran my client that connects to port 13 on my linux machine. The linux machine sends the daytime string, then the FIN/ACK to shutdown the connection. I commented out the call to soclose() on the Kyu side and found that the Kyu code waits for ever with the socket marked as follows:
Socket: 401600a0 1 pcb = 40160158, 0022 CONNECTED CANTRCV
So, the fact that the FIN has been received from Linux has caused the socket to be marked as "CANTRCV", so a receive call can detect that waiting for more data is hopeless, and could (in theory) also relay this fact to the application.

Some more poking around led me to discover the routine "sorwakeup()" which calls sbwakeup(). At one point this was capable of doing an upcall, but I got rid of this (perhaps hastily) for Kyu. In the BSD code, the upcall wall only used by NFS code (something I have no intention of implementing).

At this time, sorwakeup() is a macro in socketvar.h and is just:

#define sorwakeup(so)   sbwakeup(&(so)->so_rcv)
But it used to be:
#define sorwakeup(so)   { sowakeup((so), &(so)->so_rcv); \
	      if ((so)->so_upcall) \
		(*((so)->so_upcall))((so), (so)->so_upcallarg, M_DONTWAIT); \
	    }
(Note that I renamed sowakeup to sbwakeup). My code for sbwakeup() now looks like this. The original code has a socket pointer argument and handled notifications via select and signals.
void
sbwakeup ( struct sockbuf *sb )
{
        // sb->sb_flags &= ~SB_SEL;
        if (sb->sb_flags & SB_WAIT) {
                sb->sb_flags &= ~SB_WAIT;
                // wakeup((caddr_t)&sb->sb_cc);
                sem_unblock ( sb->sb_sleep );
        }
}
Note that I have introduced the kyu semaphore "sb_sleep" in lieu of the BSD wakeup/tsleep scheme. The wait for the semaphore takes place in sbwait().

Chapter 16 of "the book" (TCP/IP Illustrated by Wright and Stevens) is entitled "Socket IO" and disusses socket buffers along with the sosend() and soreceive() routines. It very much deserves some close study at this juncture. Both of these routines are more complex than they need to be just for TCP. They support TCP, UDP (unreliable), as well as record oriented protocols and out of band data. They could certainly be streamlined to contain only what is needed for TCP -- and perhaps this is something I will tackle.

As just one example, UDP required "ATOMIC" treatment so that entire datagrams are returned by a read call, as well as the "ADDR" option that indicates that each datagram is preceded by an mbuf with address information. Other protocols such as ISO and NS require similar treatment, and may also expect record boundaries to be respected. We need none of this for TCP.

Debug experiments with soclose()

I added some printf and delays to this routine and discover there is a race going on! Without the delays, I see that sofree gets called twice on the same socket, which is definitely not good. This is what leads to my data abort. With a delay, everything works just fine.


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org