I agree with Kenn that using lower-level function calls to the DAQmx driver will give you more flexibility and power. However, there is also something you can adjust in your original example while still using the DAQ Assistants.
Notice that you are specifying to read 85,000 samples at 80 kHz every time the DAQ Assistant runs. Just do the math (85,000 / 80,000 ≈ 1.06 s) and you'll see that each read takes over a second to complete. So right away your output is over a second behind your input. You can drastically improve performance by setting the number of samples to read to something comparatively small, like 1000. Basically, the smaller the input buffer, the lower the delay for the output loop. You have to be careful here, though, because reducing the input buffer size can cause buffer overflows if you aren't able to read the data quickly enough. So you don't want your buffer to be too small.
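If it helps to see the numbers, here is a rough sketch of that arithmetic in Python. It only does the division and does not talk to any hardware; the function name `read_latency_s` and the list of sample counts are just ones I picked for illustration.

```python
def read_latency_s(samples_per_read: int, sample_rate_hz: float) -> float:
    """Time (in seconds) the hardware needs to acquire one read's worth of samples."""
    return samples_per_read / sample_rate_hz


SAMPLE_RATE_HZ = 80_000  # 80 kHz, as in the original example

# Compare the original setting (85,000) with smaller, illustrative values.
for samples_per_read in (85_000, 10_000, 1_000):
    latency_ms = read_latency_s(samples_per_read, SAMPLE_RATE_HZ) * 1000
    # The loop must complete this many reads per second to keep up with the input.
    reads_per_second = SAMPLE_RATE_HZ / samples_per_read
    print(f"{samples_per_read:>6} samples/read: "
          f"{latency_ms:7.1f} ms per read, "
          f"{reads_per_second:5.1f} reads/s needed")
```

At 85,000 samples each read takes 1062.5 ms, while at 1000 samples it takes 12.5 ms. The catch is that your loop then has to run 80 times a second to keep up, which is where the overflow risk comes from.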
Jarrod S.
National Instruments