December 16, 2022

BSD TCP/IP for Kyu - Long send bug and panic

This bug turned up during testing. The test idea was to make a connection to a listening server and then send a lot of data. It was easy to set up a listener on the linux side using netcat using one of the following commands.
ncat -v -v -l -p 2022 > levi.test < /dev/null
ncat -v -v -l -p 2022 < /dev/null | wc -c
On the Kyu side, a loop like this was used:
	while ( total > 0 ) {
            if ( total < LOTS_CHUNK ) {
                tcp_send ( so, buf, total );
                break;
            } else {
                tcp_send ( so, buf, LOTS_CHUNK );
                total -= LOTS_CHUNK;
                buf += LOTS_CHUNK;
            }
        }
This worked fine with a chunk size of 2048, but failed with a chunk size of 8000. Further experimenting showed that 5000 works, but 6500 does not. Somewhere between 5000 and 6500 bytes exists some kind of threshold.

I expected this to fail in a different way than it did. I expected that I would ask for 8000 bytes to be written and that less would be actually be written and I would need to arrange for tcp_send() to return the actual number written. However what happens is that I get a panic!
The panic message is:

PANIC: m_copym

A possible workaround

I am tempted to fix this in the following way. I know that this works with values of 5000 bytes or less. So I could check the count, and if the value was greater than 5000, I could break the send up into small pieces that I know would work. In fact I could modify tcp_send() to contain a loop much like that shown above. Then if I desired to send a big block of data, just one call would be needed and the loop above would be "built in" as a handy convenience.

I may well do just this, but for now I want to dig a bit further.

We need more information

The panic message gives the name of a routine in uipc_mbuf.c. This routine contains three exactly identical panic messages, so I do several things. First I add a number 1, 2, or 3 to each message to distinguish them. Next, rather than calling panic() (which is a generic call in Kyu just like it was in BSD), I call bsd_panic(). This allows me to augment the usual one line printing of a panic message with a stack traceback. After doing all of this, I get:
Current thread is tcp-lots
 SP = 4056be70,  FP = 4056be84
Called from bsd_panic+14 -- 400235bc
Called from m_copym+dc -- 4002ceac
Called from tcp_output+8fc -- 40029840
Called from sosend+688 -- 4002eedc
Called from tcp_send+70 -- 4002e844
Called from lots_thread+b8 -- 40031e1c
Called from thr_exit -- 4000f554
BSD panic: m_copym 3
PANIC: BSD tcp
It is important to note that various amounts of data have been successfully sent. I have even seen one test with the chunk size set to 6500 that worked, while most do not. I am not entirely sure if this is a bug I have caused or an ancient bug (perhaps it should be called limitation) in the BSD code.

Looking at wireshark, I see data payloads that are typically multiples of 100 (such as 300, 400, 100, 500). The value of 500 seems typical. Usually a 48 byte payload starts the transfer for some reason.

It is worse than I thought

I just had a test case fail with 2048 byte chunks and the same panic reported. I modified a server on the Kyu side to respond to the string "big" by sending a large file. Then I used netcat from linux to poke it and get it to send data. Interestingly, netcat shows read returning:
READ SUCCESS for EID 1730 [192.168.0.73:114] (500 bytes)
READ SUCCESS for EID 1738 [192.168.0.73:114] (1500 bytes)
READ SUCCESS for EID 1746 [192.168.0.73:114] (48 bytes)
This indeeds totals to 2048 bytes, but wireshark shows only packets with 500 byte payloads.

But never mind the details of packet fragmentation, the real problem here is that I fail sometimes (maybe 1 out of 10 times) with a 2048 byte chunk size. When it does fail, I see:

Current thread is wang_thr
 SP = 4056ad98,  FP = 4056adac
Called from bsd_panic+14 -- 400235bc
Called from m_copym+dc -- 4002ceac
Called from tcp_output+8fc -- 40029840
Called from sosend+688 -- 4002eedc
Called from tcp_send+70 -- 4002e844
Called from big_wangdoodle+48 -- 4003164c
Called from one_wangdoodle+fc -- 40031790
Called from run_wangdoodle+88 -- 40031840
Called from thr_exit -- 4000f554
BSD panic: m_copym 3
PANIC: BSD tcp
I added some code to display how I am handling the mutual exclusion lock and it seems clear that it is not being held when this panic occurs.

So I wrapped my call to sosend() with net_lock and net_unlock and the bug persists.

Fix this with "big locks"

No doubt the scheme in the original code with splnet() and splx() worked fine and was debugged and tested. I could put some time and energy into emulating it more carefully, allowing arbitary nesting of such calls and keeping track of some kind od pseudo processor interrupt level. I decided to do something different.

What I have done is to identify the basic "entries" into the TCP code from the outside world as follows:

 *  -- tcp_input()
 *  -- timer calls
 *  -- user calls
 *  ---  send
 *  ---  receive
 *  ---  bind
 *  ---  accept
 *  ---  connect
 *  ---  close
I use one mutex semaphore and wrap each of the above calls with it. I also have to take note of places in send, receive, accept, and connect that block waiting and unlock during the wait. In general the wait is for something that will be driven by tcp_input (i.e. packet arrival) and not unlocking and letting tcp_input to run will yield immediate deadlock.

I make all of this conditional on the macro "BIG_LOCKS".
It seems to work.

Checksum error?

During testing, I got this message:
 *** *** *** tcp_input 2, bad cksum = ffff
There are still serious errors. It mostly just locks up. I have to do several transfers of about 100K before it does, but then it locks up.

I think this is fixed. The clue is that the hang always happened when the entire transfer was finished, i.e. during close. And an examination of close revealed that I had a deadlock by not releasing the lock while I waited for the connection to initiate the disconnect. A simple oversight, but it led to hours of debugging.

To aid the debugging, I changed my test to generate recognizable data packets. For whatever reason, even though I do 2000 byte writes, the packets have 500 byte payloads. I don't particularly care, but I took some pains that the final 500 bytes would be recognizable as the end of the data.

I am using a python script to pound away over and over doing 200K transfers.

Hangs

My python script runs any number of tests one after the other. The test I am using makes a connection to port 114 (the "wangdoodle" server) and sends the command "big" using netcat, which then receives the 200K transfer that ensues and writes it to disk (overwriting the previous such). I have seen hangs after:
4654 tests (200K * 4654 = 930M)
1234 tests (200K * 1234 = 247M)
Once a hang occurs, Kyu is entirely crashed, the console is unresponsve, so there is no way to have a clue about what is going on.

As an experiment I changed the size of the default thread stack from 4096 to 16384 bytes. This value is in kyu.h and it is necessary to do a "make clean ; make" to rebuild Kyu after such a change.


Have any comments? Questions? Drop me a line!

Kyu / tom@mmto.org