December 13, 2022

BSD TCP/IP for Kyu - Recv - blocking and signaling

More about race conditions quite frankly. The symptoms are that my call to "recv()" returns immediately with no data. Why did it not block and wait as it should? If I add 100 ms of delay between making the connection and trying to read the data it works. This is because in that time the data has arrived and there is no need to block.

Here is what some debug output looks like while making a connection to the daytime server on port 13. Remember that the usual unix daytime server will immediately (upon receiving a connection) send the response (the time string) and close the connection. What I am doing here in the debug is to announce every event that would cause the blocked "recv" call to unblock. I looked at this for several days before realizing what the trouble was.

Start connect to 192.168.0.5 (port 13)
rwakeup: connected 2
 -- soreceive BLOCK 1
 -- soreceive BLOCK 1 done
TCP input - data in tcp_reass 2 26
rwakeup: tcp_input reass
TCP input - data in tcp_reass 2 0
rwakeup: tcp_input reass
rwakeup: cant rcv more
Return from  soreceive with 0
0 bytes received
rwakeup: disconnecting
rwakeup: disconnecting
rwakeup: disconnected
rwakeup: cant rcv more
TCP connect test finished

Of course this will make no sense to someone who did not author these messages, so I should explain. The lines that start with wwakeup and rwakeup indicate calls to routines by this name that unblock either the write or read semaphores and indicate why. We really only care about the rwakeup messages, so I have deleted many of thw wwakeup messages. Notice that we get awakened due to state changes such as a connection being extablished or a disconnect happening, as well as data arriving.

Note the messages indicating tha soreceive has blocked then unblocked. What has happened is that the rwakeup prior to these messages has cleared the semaphore so that no blocking at all happens! In addition, all this happens before any data has arrived, so the thread running recv sees that no data is in the buffer. Why it doesn't loop around and continue to wait is yet to be determined. Before the recv call manages to return, data arrives and the tcp_input thread preempts the thread running the recv. It deposits 26 bytes in the buffer (and announces it), then it apparently gets a packet (the FIN/ACK) with no data, and reports that also (these are the rwakeup lines marked "reass" which refers to the TCP input reassembly routine (that actually is not doing any reassembly, but is given a chance to examine these packets).

Once the recv thread resumes, it has already decided that there is not data and that it does not need to loop and block, so it returns the 0 count. I modified the Kyu test code to look like this:

	n = tcp_recv ( so, buf, TEST_BUF_SIZE );
        printf ( "%d bytes received\n", n );
        if ( n == 0 ) {
            n = tcp_recv ( so, buf, TEST_BUF_SIZE );
            printf ( "%d bytes received\n", n );
        }
The expectation is that the second call will work. The first call is premature and should block (but doesn't, as discussed above). But it will cause enough delay (mostly due to the time to print the debug messages) that the second call will find the 26 bytes waiting. But now the first call works! The debug trace now looks like this:
Start connect to 192.168.0.5 (port 13)
rwakeup: connected 2
wwakeup: connected 2
 -- TCP input - data in tcp_reass 2 26
rwakeup: tcp_input reass
TCP input - data in tcp_reass 2 0
rwakeup: tcp_input reass
rwakeup: cant rcv more
receive RESTART
Return from  soreceive with 26
26 bytes received
13 DEC 2022 15:25:51 MST
wwakeup: disconnecting
rwakeup: disconnecting
wwakeup: disconnecting
rwakeup: disconnecting
wwakeup: disconnected
rwakeup: disconnected
rwakeup: cant rcv more
TCP connect test finished

But wait? We don't see any blocking at all now. Now it is miraculously working. What could have happened? Perhaps the linux machine is now responding really quickly with the daytime response after the connection handshake finished. No. It is me. I added a printf (the message "receive RESTART") before making the decision to block, and the delay sending that message over the serial port (even at 115200 baud) is enough to provide time for the packet to arrive. Here is a valuable lesson for the patient reader. I have seen this before when simply adding a printf either causes or "fixes" a bug due to added delay.

I comment out that printf (of "receive RESTART") and try my experiment again and get the expected result. You could accuse me of the mistake of changing two things at the same time, but who would expect adding a printf statement to have such a radical effect.

Start connect to 192.168.0.5 (port 13)
rwakeup: connected 2
wwakeup: connected 2
 -- soreceive BLOCK 1
 -- soreceive BLOCK 1 done (error = 1074422196
TCP input - data in tcp_reass 2 26
rwakeup: tcp_input reass
TCP input - data in tcp_reass 2 0
rwakeup: tcp_input reass
rwakeup: cant rcv more
Return from  soreceive with 0
0 bytes received
try again
Return from  soreceive with 26
26 bytes received
13 DEC 2022 15:50:50 MST
wwakeup: disconnecting
rwakeup: disconnecting
wwakeup: disconnecting
rwakeup: disconnecting
wwakeup: disconnected
rwakeup: disconnected
rwakeup: cant rcv more
TCP connect test finished

A side note on Kyu semaphores

Kyu provides a binary semaphore and two calls: sem_block() and sem_unblock(). Consider these being used for signaling, as they are here. The semaphore state will be "1" and the sem_block() call will block. The sem_unblock() call will reset the state to "0" and unblock the thread that is waiting in the sem_block() call. All well and good, but what is the problem here?

The first problem is indicated by the "rwakeup: connected 2" line. Here a wakeup was performed to indicate that the connection was established. This cleared the semaphore state to 0 before we decided that we wanted to block and wait for data. So our attempt to block and wait returned immediately, accomplishing nothing. With a binary semaphore as implemented for Kyu, we can call sem_unblock multiple times with no effect (if no thread is blocked on the semaphore) other than clearing the semaphore to zero.

What can we do?

The first question is why the BSD code does not check a second time, see that no data is available, and block again. Alternately you could ask why nobody was blocked waiting for the signal that announced that we were connected.

I would like to think that we don't need to monkey with the BSD code, but that I should contemplate adding additional semantics to the Kyu semaphores. On addition might be the ability to inspect a semaphore to see if we will block on it (we want that to be true in this case). The question then would arise, what do we do if we find the semaphore clear? This leads to the idea of adding the ability to force a semaphore to be set (so a subsequent sem_block call would indeed block). Either of these additiona could be subject to certain race conditions. We could add a call like "sem_block_ensure()" - but that sounds unspeakably ugly.

The real solution

After all of this I began to study why we were not looping and blocking again in soreceive() as one might expect. It turned out to be a stupid bug.

The routine sbwait() -- which is what does the blocking -- returns an error status (or it did in the original BSD version). When I reworked it to call a Kyu semaphore I neglected to have it return a value. So it was returning non-zero garbage,which soreceive() was interpreting as some kind of error, causing it to abandon ship.

Once I simply make sbwait() return 0 the code in soreceive() works as it should.

Even though this adventure ended up with this simple fix, it was a lot of work to track down and an interesting chance to look closer at the code.


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org