One thread is doing hairy math problems (500x500 CDB matrix inversions and stuff), and posting results into a queue. After it posts results, it proceeds with the next math problem.
Another thread picks up the results from the queue, opens a TCP connection to another thread (on the same machine or a different machine), and transmits the results, after converting them to a string.
The results might be 50-100 kBytes.
The transmit thread opens a connection, then transmits a header and a block (of 1024), a header and a block, a header and a block, etc., until done.
The receive thread waits on a connection, then receives a header (fixed size), then a block (described by the header), a header and a block, a header and a block, etc., until done.
All TCP operations use a timeout of 200 ms.
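To make the framing concrete, here is a minimal Python sketch of the header-and-block scheme described above. It is for illustration only: the 4-byte length header, the zero-length "done" marker, and the function names are assumptions, not the actual header format in use.

```python
import socket
import struct

BLOCK_SIZE = 1024               # block size from the description above
TIMEOUT_S = 0.2                 # 200 ms, applied to every TCP operation
HEADER = struct.Struct("!I")    # ASSUMED header: 4-byte big-endian block length


def recv_exact(sock, n):
    """Read exactly n bytes; raise if the peer closes early or a read times out."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))   # may return fewer bytes than requested
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_result(sock, payload):
    """Send payload as header+block pairs, ending with a zero-length header."""
    sock.settimeout(TIMEOUT_S)
    for i in range(0, len(payload), BLOCK_SIZE):
        block = payload[i:i + BLOCK_SIZE]
        sock.sendall(HEADER.pack(len(block)) + block)
    sock.sendall(HEADER.pack(0))          # ASSUMED end-of-message marker


def recv_result(sock):
    """Receive header+block pairs until the zero-length header arrives."""
    sock.settimeout(TIMEOUT_S)
    parts = []
    while True:
        (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
        if length == 0:
            return b"".join(parts)
        parts.append(recv_exact(sock, length))
```

Note that in this sketch `send_result` returns normally as soon as the local TCP stack has accepted the data; nothing in it tells the sender whether `recv_result` on the other end finished or timed out.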
The trouble is, I'm getting receiver errors (56 = timeout) that the transmit side doesn't see. The transmit side is set to detect an error, and re-do the whole thing later if an error occurs. That has previously proven to be working. But now, I have cases where the receive side reports an error (56), but the transmit side doesn't know about it, so my code fails.
I thought that an error on the receive side would be reflected back to the transmit side (guaranteed delivery?).
Should I jack up the timeout value and hope for the best?
Should I implement an acknowledge reply scheme?
Other ideas?
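The acknowledge-reply option could be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not a definitive design: the single length-prefixed message, the one-byte `ACK` value, and the names `send_with_ack` / `recv_with_ack` are all hypothetical. The point it shows is that a successful TCP send only means the local stack accepted the data, so the sender treats the transfer as good only after the receiver explicitly confirms it.

```python
import socket
import struct

ACK = b"\x06"                 # ASSUMED one-byte acknowledge value
LEN = struct.Struct("!I")     # ASSUMED 4-byte big-endian length prefix


def _recv_exact(sock, n):
    """Read exactly n bytes; raise if the peer closes early or a read times out."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_with_ack(sock, payload, timeout_s=0.2):
    """Return True only if the receiver confirmed it got the whole payload."""
    sock.settimeout(timeout_s)
    try:
        sock.sendall(LEN.pack(len(payload)) + payload)
        return _recv_exact(sock, 1) == ACK   # wait for the receiver's reply
    except OSError:                          # covers timeouts and connection errors
        return False                         # caller can re-do the transfer later


def recv_with_ack(sock, timeout_s=0.2):
    """Receive one length-prefixed payload, then acknowledge it."""
    sock.settimeout(timeout_s)
    (n,) = LEN.unpack(_recv_exact(sock, LEN.size))
    payload = _recv_exact(sock, n)
    sock.sendall(ACK)                        # sent only after everything arrived
    return payload
```

With this shape, a receive-side timeout means no `ACK` is ever sent, so `send_with_ack` returns `False` and the existing retry logic on the transmit side can kick in. One consequence is that the sender may retry a transfer the receiver actually completed (if the `ACK` itself is lost), so the receiver would need to tolerate duplicates.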