You may be near the actual limits of your hardware.
Check your board's specifications, but since it is a low-cost card it probably has no internal output FIFO buffer, which means the generation speed is limited by the system in which the board is installed.
Since this card has only one DMA channel, the software architecture can affect the throughput of the system: according to the hardware catalog, the 6036E has an output update rate of 10 kS/s only if the single DMA channel is dedicated to output generation; otherwise the update rate is limited to 1 kS/s (please note that these rates are classified as "typical", so your actual system may be unable to reach them).
It's now up to you to decide how to assign resources to the individual operations your application performs. With Set_DAQ_Device_Info (1, ND_DATA_XFER_MODE_AO_GR1, ND_UP_TO_1_DMA_CHANNEL); you should be able to assign the DMA channel to analog output, so that AO generation runs at the maximum rate (it may be necessary to first call Set_DAQ_Device_Info (1, ND_DATA_XFER_MODE_AI, ND_INTERRUPTS); to free the DMA channel). In any case, 0.05 ms seems to lie beyond your board's capabilities: at 10 kS/s the smallest update interval is 0.1 ms, and that only in the best conditions.
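To make the arithmetic behind that last remark explicit, here is a minimal sketch in plain C. It does not call NI-DAQ at all. It only converts between update rate and update interval using the "typical" catalog figures quoted above (10 kS/s with DMA, 1 kS/s without). The function names are hypothetical helpers made up for this illustration.

```c
#include <assert.h>

/* Smallest interval between two output updates, in milliseconds,
   for a given update rate in samples per second. */
double min_update_interval_ms(double rate_hz)
{
    assert(rate_hz > 0.0);
    return 1000.0 / rate_hz;
}

/* Update rate, in samples per second, needed to reach a given interval. */
double required_rate_hz(double interval_ms)
{
    assert(interval_ms > 0.0);
    return 1000.0 / interval_ms;
}

/* Returns 1 if the requested interval can be reached at max_rate_hz,
   0 otherwise. A tiny relative tolerance absorbs floating-point rounding
   when the request sits exactly on the limit. */
int interval_is_reachable(double interval_ms, double max_rate_hz)
{
    return required_rate_hz(interval_ms) <= max_rate_hz * (1.0 + 1e-9);
}
```

With these figures, interval_is_reachable(0.05, 10000.0) returns 0, because 0.05 ms would require 20 kS/s, while interval_is_reachable(0.1, 10000.0) returns 1. If the DMA channel is not dedicated to output, the limit drops to 1 kS/s and the smallest interval becomes 1 ms.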