Wondering just what semaphore tcp-bsd is blocked on has me thinking of adding semaphore
names and displaying them by the "l" thread list command. So I did this.
This involved adding a routine "sem_set_name" if you desire to name your semaphores.
Otherwise you get a possibly unique name like "sem-ba68" which uses the last
16 bits of the address of the semaphore structure.
I also renamed the tcp-bsd task to tcp_input (which is more accurate and informative).
The new thread display looks like this:
Kyu, ready> l
Thread: name ( &tp ) state sem pc sp pri
Thread: shell (400a7a1c) READY I 40004610 40568000 11
Thread: net (400a764c) SEM I net-inq 40010a14 40578000 20
Thread: net_slow (400a7558) REPEAT C 40015c54 4057c000 21
Thread: tcp-input (400a7834) SEM J tcp-queu 40010568 40570000 24
* Thread: tcp-timer (400a7740) READY I 40010a14 40574000 25
Thread: wangdoodle (400a7370) SEM I socket 40010a14 40584000 31
Thread: idle (400a7928) READY C 40010a68 4056c000 1234
I am also thinking of implementing thr_block_atomic() and thr_block_atomic_t() to allow
race free producer/consumer setup. This is not a different sort of semaphore, but
some different semantics for semaphore blocking. It turns out I already did this,
calling this a "condition variable", the calls are:
m = sem_mutex_new () cv = cv_new ( m ) cv_wait ( cv ) cv_signal ( cv ) cv_destroy ( cv )I end up needing to rework the (pretty much never used or tested) code in cv_wait(). But now I have this in place for the TCP input queue.
As for the Kyu input queue, it does not require similar treatment. I uses hardware interrupts for the mutex to lock the queue and I already have an atomic block operator in use there. So, we gave it a spin and after 782 transfers we have a thr_unblock hang.
Thread: name ( &tp ) state sem pc sp pri Thread: shell (400a7a1c) READY J 40010568 40568000 11 Thread: net (400a764c) READY J 40010568 40578000 20 Thread: net_slow (400a7558) REPEAT C 40015d38 4057c000 21 * Thread: tcp-input (400a7834) SEM I tcp-inq 40010a30 40570000 24 Thread: tcp-timer (400a7740) READY C 40023f64 40574000 25 Thread: wangdoodle (400a7370) SEM J socket 40010568 40584000 31 Thread: wang_thr (400a7464) READY I 400105cc 40580000 35 Thread: idle (400a7928) READY C 40010a88 4056c000 1234 PC = 400105cc ( change_thread+1f8 ) Called from thr_unblock+178 -- 4000fe18 Called from sem_unblock+70 -- 40010dec Called from user_unlock+1c -- 40023d3c Called from tcp_send+88 -- 4002f294 Called from big_wangdoodle+10c -- 40032850 Called from lots+34 -- 40032904 Called from one_wangdoodle+158 -- 40032b3cIt is important to note that "user_unlock" indicates that we are dealing with the "big TCP lock". This is at the very end of "tcp_send" where we have finished sending a buffer and are releasing the lock to any other activity.
Wireshark shows all the packets in order as they should be. Using "t 9" to turn on debug shows the threads cycling from timer to input to wang. So "wang" is getting plenty of opportunity to run. Why is it just sitting there?
Aha! It all makes sense now. See the next page for details.
Also, I think I just fixed the DEAD thread thing. It was an inadvertent enabling of interrupts in the routine "thr_exit", and if an interrupt happened right then, we would never return to finish the job with the thread marked DEAD. This was because of locking/unlocking via interrupts inside of "cancel_repeat".
And, lest I forget, some things pending from yesterday. I added some code to where tcp-input pulls packets off the queue and it seemed to have a dramatic effect of allowing longer runs before a hang. That is the name of the game when you are dealing with probabilistic race conditions, but it will be interesting to comment out that code and then see how well we do. This code is in tcp_thread() in kyu_main.c
I am still convinced that I need a TCP output queue and/or an overall network output queue. What happens if there are ICMP packets mixed with TCP and maybe UDP thrown in to boot? At this time, everybody calls ip_send() and there is no effort at locking of any sort. The only thing that makes this work is sheer luck along with the fact that I typically am not doing multiple things in parallel. This is one of those things that was good enough when we were simply trying to get on the air, but now needs to be addressed.
Kyu / tom@mmto.org