Reducing pre-filling for buffered analog output

dmorris · ‎03-21-2013

I'm using a USB-6216 for a hardware-timed, buffered analog output task, with regeneration off, via DAQmx in C++.

Let's say the task is: I want to generate a sine wave at 1kHz, and a user is going to manually control the frequency of the sine wave. So of course the buffering is going to add some latency, but I want to reel in that latency within reason. And if I can't keep up, I understand I'll get a buffer under-run error.

My sampling rate is 1kHz, and I'm setting my buffer size to 10000 samples (10 seconds). I'm logically creating a "chunk size" of 1000 samples; this is what I'll write out whenever I get my EveryNSamples callback.

So here's how I hoped things would work:

DAQmxCreateTask(...);
DAQmxCreateAOVoltageChan(...);
DAQmxCfgSampClkTiming(taskHandle,"",1000.0,DAQmx_Val_Rising,DAQmx_Val_ContSamps,10000);
DAQmxSetBufOutputBufSize(taskHandle,10000);
DAQmxSetWriteRegenMode(taskHandle,DAQmx_Val_DoNotAllowRegen);
DAQmxRegisterEveryNSamplesEvent(taskHandle,DAQmx_Val_Transferred_From_Buffer,1000,0,EveryNCallback,0);

// Now I need to write out a couple chunks before I start the task, lest I should immediately have an under-run.
// I'm going to write THREE "chunks" (3000 samples) in this example.
for(int i=0; i<3; i++)
{
    // Assume I'm filling in sine wave data into "data" at each iteration
    DAQmxWriteAnalogF64(taskHandle,1000,0,0.0,DAQmx_Val_GroupByChannel,data,&samplesWritten,0);
}

DAQmxStartTask(taskHandle);

Now in my "EveryNCallback", I'm going to generate the next 1000 samples and do another DAQmxWriteAnalogF64. My hope was to *ignore* the first 3 calls to EveryNCallback, since those are just me "pre-writing".
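For illustration, here is a minimal sketch of how each 1000-sample chunk could be generated so that the waveform stays continuous when the user changes the frequency between chunks. `SineGenerator` and its members are hypothetical names, not part of DAQmx or of the attached project; the sketch only assumes that the callback needs a buffer of doubles to hand to DAQmxWriteAnalogF64.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DAQmx): keeps a running phase so the
// waveform stays continuous when the frequency changes between chunks.
struct SineGenerator {
    double sampleRateHz;  // output sample clock rate
    double phase = 0.0;   // current phase in radians, kept in [0, 2*pi)

    explicit SineGenerator(double rateHz) : sampleRateHz(rateHz) {
        assert(rateHz > 0.0);
    }

    // Fill `out` with the next chunk at the requested frequency and amplitude.
    void fill(std::vector<double>& out, double frequencyHz, double amplitude) {
        const double twoPi = 2.0 * std::acos(-1.0);
        const double step = twoPi * frequencyHz / sampleRateHz;
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] = amplitude * std::sin(phase);
            phase = std::fmod(phase + step, twoPi);  // carry phase into the next chunk
        }
    }
};
```

Each callback would call `fill` once with the current user-selected frequency and then write the resulting buffer. The frequency has to stay below half the sample rate for the output to resemble a sine wave.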

What I expected to happen was that now that 3000 samples are safely in the buffer, then 1 second after I started the task, when 1000 of these samples had *actually appeared on the analog wire*, I would get another EveryNCallback, and I'd fill in another 1000 samples, which should appear on the wire 3 seconds later. That is, I assumed this pre-write would determine my effective latency.

What I find in practice is that the device *always* immediately calls my callback until it has 8000 samples, regardless of buffer size, sample rate, etc., before any data hits the actual analog wire. This is apparently the size of the internal buffer on the 6216. What I want is for it *not* to require that I fill this up before we start the task. So I learned about DAQmx<Get/Set>AODataXferReqCond. The request condition defaults to DAQmx_Val_OnBrdMemNotFull, which is consistent with the behavior I'm seeing: the 6216 really wants to fill up its buffer all the time.
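The arithmetic behind these numbers is simply queued samples divided by sample rate. A small sketch, with `queueLatencySeconds` as a hypothetical helper name:

```cpp
#include <cassert>

// Delay before a newly written sample reaches the output, given how many
// samples are already queued ahead of it (software buffer plus on-board FIFO).
double queueLatencySeconds(unsigned long queuedSamples, double sampleRateHz) {
    assert(sampleRateHz > 0.0);
    return static_cast<double>(queuedSamples) / sampleRateHz;
}
```

With three pre-written chunks, `queueLatencySeconds(3000, 1000.0)` gives the 3 seconds I was hoping for; with the FIFO kept full at roughly 8000 samples it gives the 8 seconds I actually observe.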

I tried setting this to DAQmx_Val_OnBrdMemEmpty, which I thought would say "hey, 6216, I know your buffer isn't full, but I promise I'll keep up, please start putting samples on the wire". This doesn't throw an error, but seems to have no effect: if I immediately read the value back (via DAQmxGetAODataXferReqCond), it's still DAQmx_Val_OnBrdMemNotFull.

If this matters, calling DAQmxGetAODataXferMech tells me that I'm using USB bulk transfer. I have not tried changing this value. (Update: I tried changing this value, and USB bulk transfer is the only value that doesn't give me an error.)

So... is my mental model of how this works reasonable? Is there a way to get the latency down lower than a whopping 8 seconds?

I'm also pretty unclear on what the software buffer size (the one I configured via DAQmxSetBufOutputBufSize) even means here, since it doesn't seem to affect any of this.

Thanks!

[Update: I confirmed that if I just jack up the sampling rate, this problem largely goes away (since now filling up 8000 samples corresponds to lower latency), though in practice this is a very unsatisfactory solution, since I up the compute, bus, and memory load for no real reason. This is not an issue for generating a sine wave, but of course this is just a toy example and not what I'm actually interested in.]

-Dan

AleAlejandro · ‎03-22-2013

Please refer to page 5-2 of the manual for this device, which covers Analog Output Data Generation Modes. This may help shed some light on the matter.

http://www.ni.com/pdf/manuals/371931f.pdf

Daniel G.
Semiconductor & Wireless
National Instruments

dmorris · ‎03-22-2013

What specifically were you suggesting I look at? Page 5-2 discusses the different data generation methods; I'm definitely using hardware-timed, non-regeneration output, which I'm 99.9999% sure is the right (and only) option for even the simple case of "generate a sine wave whose frequency is under interactive control".

The question is specifically about latency and buffering behavior, and whether I can get a 6216 to *not* require its buffer to be full all the time. I can't easily confirm whether this question is specific to the 6216.

Thanks!

-Dan

dmorris · ‎10-03-2013

Ping on this thread? The last I heard was a rec to look at a documentation page that didn't seem relevant...

Summarizing: I'm trying to control the latency of a hardware-timed, non-regeneration write task by pre-buffering some amount of data. This is standard practice, for example, for audio APIs (e.g. WinMM, DirectSound). Hoping to get the same behavior from DAQmx. I have a vague notion that DAQmxGetAODataXferReqCond is relevant, but it's not entirely clear how to use it properly, and the behavior I'm seeing is inconsistent with the default request condition.

Tips?

-Dan

John_P1 · ‎10-03-2013

This sounds frustrating. As you've discovered, the Every N Samples event fires when data is transferred from the software buffer to the on-board FIFO (8k on the 6216). This event is typically used as an indication of when enough space has freed up in the software buffer for you to write new data (the write is a blocking call and I know I wouldn't want my application to remain unresponsive while waiting for buffer space to free up).

If there is any data in the software buffer, it will be transferred to the on-board FIFO (8k on the 6216) when there is space available. This transfer is executed by the lower-level driver that you don't have explicit control over (aside from setting the Data Transfer Request Condition, which apparently doesn't work on the 6216. It does seem weird, though, that no error is thrown; perhaps somebody from NI could look into this to determine if it is a bug).

I'd probably just suggest interpolating/repeating your samples and increasing the sample rate (cheesy I know). The 8191 sample FIFO represents ~8 seconds of data at 1 kHz, but only ~0.8 seconds at 10 kHz.
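A minimal sketch of the "repeat your samples" idea, assuming plain sample repetition (zero-order hold) is acceptable for the signal. `repeatSamples` is a hypothetical helper, not a DAQmx function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repeat every input sample `factor` times (zero-order hold), so data computed
// at the original rate can be written to a task clocked `factor` times faster.
std::vector<double> repeatSamples(const std::vector<double>& in, std::size_t factor) {
    assert(factor >= 1);
    std::vector<double> out;
    out.reserve(in.size() * factor);
    for (double v : in) {
        out.insert(out.end(), factor, v);  // append `factor` copies of v
    }
    return out;
}
```

With a factor of 10, a 1000-sample chunk computed at 1 kHz becomes a 10000-sample chunk for a 10 kHz task, and the same 8191-sample FIFO then holds roughly 0.8 seconds of output instead of roughly 8.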

If you can't increase the sample rate (e.g. the 1 kHz clock is coming from an external source or something), you could add a second channel (just output a full buffer of 0s). The FIFO is shared so that would reduce your added latency from ~8 seconds to ~4 seconds which is pretty close to the 3 you were looking for (not ideal, I know).
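As a rough sketch of the padding-channel idea: the helpers below build a two-channel buffer in grouped-by-channel (non-interleaved) layout and estimate the FIFO depth left for each channel. Both function names are hypothetical, and the even split of the FIFO across channels is an assumption taken from the description above rather than from the device documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grouped-by-channel layout: all samples for the real channel first,
// followed by the same number of zeros for the padding channel.
std::vector<double> addZeroChannel(const std::vector<double>& signal) {
    std::vector<double> out(signal);
    out.resize(signal.size() * 2, 0.0);
    return out;
}

// FIFO depth available to each channel, ASSUMING the FIFO is shared evenly.
std::size_t fifoSamplesPerChannel(std::size_t fifoSamples, std::size_t channelCount) {
    assert(channelCount >= 1);
    return fifoSamples / channelCount;
}
```

Under that assumption, `fifoSamplesPerChannel(8191, 2)` gives 4095 samples per channel, or about 4 seconds at 1 kHz.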

If your card supported signal events (it might be worth a try, but unless something has changed I don't think it does) you could instead use a counter output to generate a pulse every N sample clocks of the analog output and use the counter output event instead of the every N samples event. Again, I don't think this will work on your hardware though...

Lastly, you could drop events altogether and poll using DAQmxGetWriteTotalSampPerChanGenerated.
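One way the polling approach might be structured is sketched below: keep a running count of samples written, read the generated count on each poll, and write only enough to stay a fixed number of samples ahead. `samplesToTopUp` is a hypothetical helper. Whether the count reported on a USB device tracks the DAC output closely enough for this purpose is an assumption that would need checking on the hardware.

```cpp
#include <cassert>
#include <cstdint>

// Number of samples to write now so that (written - generated) returns to
// targetLead. Returns 0 when the output is already far enough ahead.
std::uint64_t samplesToTopUp(std::uint64_t totalWritten,
                             std::uint64_t totalGenerated,
                             std::uint64_t targetLead) {
    assert(totalWritten >= totalGenerated);  // cannot generate unwritten data
    const std::uint64_t lead = totalWritten - totalGenerated;
    return lead >= targetLead ? 0 : targetLead - lead;
}
```

In a polling loop, `totalGenerated` would come from DAQmxGetWriteTotalSampPerChanGenerated, and `targetLead` sets the latency directly: 3000 samples at 1 kHz corresponds to the 3 seconds mentioned earlier. A smaller lead lowers latency at the cost of a higher under-run risk.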

Best Regards,

John Passiak

Catherine-B · ‎10-04-2013

Hi dmorris,

I will look into this issue and see if I can reproduce it.

Best Regards,

Catherine B.
Applications Engineer
National Instruments

Catherine-B · ‎10-08-2013

Hi dmorris,

The Data Transfer Request Condition was working for me. Perhaps you could post your code (or a small subsection)?

Regards,

Catherine B.
Applications Engineer
National Instruments

dmorris · ‎10-08-2013

This issue about the data transfer request condition may be a red herring; the core question is:

*** Is there any way to tune the buffering latency for non-regenerated output? ***

I.e., can I trade a smaller buffer (so a higher risk of getting an underflow error) for lower latency? This tradeoff is standard practice in most streaming interfaces, and it seems critical for any kind of closed-loop control, so I'm hoping there's some way to do this. Or is there *always* a fixed number of samples that live between my write calls and what appears on the wire?

With that said... I went back and revisited the question about the data transfer request condition. It appears that my statement that I couldn't change it to anything other than "DAQmx_Val_OnBrdMemNotFull" was because I had already set the regen mode to DAQmx_Val_DoNotAllowRegen. I can write and read back the data transfer request condition before I do this. I don't know if this is exactly a bug, but IMO DAQmxSetAODataXferReqCond should probably return an error if DAQmx_Val_DoNotAllowRegen has been set; right now it just silently fails, e.g. in this case:

DAQmxSetWriteRegenMode(*m_taskHandlePtr, DAQmx_Val_DoNotAllowRegen);
DAQmxSetAODataXferReqCond(*m_taskHandlePtr, physicalChannel, DAQmx_Val_OnBrdMemEmpty);

int32 xferReqCond = 0;

// This will always read "DAQmx_Val_OnBrdMemNotFull", no matter what I write in the previous call
DAQmxGetAODataXferReqCond(*m_taskHandlePtr, physicalChannel, &xferReqCond);
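Since the setter returns without error here, a read-back check is the only way to catch the silent failure. A minimal sketch of such a check follows, with the driver calls passed in as callables so the logic can be exercised without hardware. `setAndVerify` and the two constant names are hypothetical; the numeric values are the ones printed in the log output later in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Values as printed in this thread's log output.
constexpr std::int32_t kOnBrdMemEmpty = 10235;    // DAQmx_Val_OnBrdMemEmpty
constexpr std::int32_t kOnBrdMemNotFull = 10242;  // DAQmx_Val_OnBrdMemNotFull

// Calls the setter, then the getter, and returns true only if both succeed
// and the value read back matches the value requested.
bool setAndVerify(const std::function<std::int32_t(std::int32_t)>& setter,
                  const std::function<std::int32_t(std::int32_t*)>& getter,
                  std::int32_t requested) {
    assert(setter && getter);
    if (setter(requested) < 0) return false;  // negative DAQmx status = error
    std::int32_t actual = 0;
    if (getter(&actual) < 0) return false;
    return actual == requested;
}
```

In real code the two callables would be thin lambdas wrapping DAQmxSetAODataXferReqCond and DAQmxGetAODataXferReqCond for the task and channel in question.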

Thanks!

-Dan

Catherine-B · ‎10-09-2013

Hi Dan,

Yes, you can tune the buffering latency for non-regenerated output. And the Data Transfer Request Condition is the property that allows you to accomplish this, as discussed in the article below. Though the article discusses using LabVIEW rather than a text-based language, the property should work the same way.

http://www.ni.com/white-paper/4402/en/

In the example attached to the article, the regeneration mode is first set, and then the data transfer request condition. So, the ordering you originally had should work. I will look into whether this is an issue specifically seen when using text-based DAQ functions.

Once you change the order of the function calls so that you are changing the data transfer request condition and then setting the regeneration mode, do you see the results you are looking for?

Regards,

Catherine B.
Applications Engineer
National Instruments

dmorris · ‎10-09-2013

Now I think we're onto something... 🙂

That white paper was really helpful. It sounds like "on board memory empty" is exactly what I want, and the problem may be the fact that something else about my task is making it impossible to set this as the transfer request condition.

I also made an error in my previous evaluation: it was not disabling regeneration that was forcing the request condition mode to "DAQmx_Val_OnBrdMemNotFull", it was setting a buffer size. It doesn't matter what I set it to: 1 sample, 10000 samples, 1000 samples... once I set a buffer size, DAQmxSetAODataXferReqCond always returns without error, but DAQmxGetAODataXferReqCond always indicates that the request condition is DAQmx_Val_OnBrdMemNotFull, which gives me 8000 samples of latency in the FIFO.

If I don't explicitly set the buffer size, my first write call sets the buffer size, and in fact this has the same effect: it immediately sets the data transfer request condition to DAQmx_Val_OnBrdMemNotFull.

I pulled this all out of the code it was tied into so I could share a standalone example. This example will:

1) Create an analog voltage channel at 1kHz

2) Set and confirm the data transfer request condition

3) Optionally set the buffer size (according to a #define), then confirm the data transfer request condition

4) Disable regeneration, then confirm the data transfer request condition

5) Register EveryNCallback and Done events

6) Write some data to the FIFO without autostart, then confirm the data transfer request condition

7) Start the task, then confirm the data transfer request condition

The example generates a sine wave on a hard-coded output pin (Dev3/ao0), and you can change the frequency and amplitude of the sine wave with simple keyboard commands. This lets me eyeball the latency (that channel goes right to a scope), and as reported, right now, no matter what I do, it's always around 8 seconds at 1kHz. Also consistent with this: no matter what I do, the device always triggers the EveryNCallback 8 times immediately and is sad if I don't immediately fill up the 8-second buffer.

I'm pasting (below) what the output looks like for me; this will likely not make sense without looking at the code, but basically as soon as I write data (the task has not yet started), the data transfer request condition becomes DAQmx_Val_OnBrdMemNotFull and there's nothing I can do to change it.

So the question remains: is there some way I can dial down the latency from 8 seconds for my 1kHz output channel? Part of the answer to this may be: am I doing something wrong that is preventing me from changing the data transfer request condition?

Code is attached as a Visual Studio 2012 project.

Any tips?

Thanks!

-Dan

Buffer size after DAQmxCfgSampClkTiming is 0
Data transfer request condition after DAQmxCfgSampClkTiming is 10242 (DAQmx_Val_OnBrdMemNotFull)
Data transfer request condition after DAQmxSetAODataXferReqCond is 10235
Buffer size after DAQmxSetWriteRegenMode is 0
Data transfer request condition after DAQmxSetWriteRegenMode is 10235 (DAQmx_Val_OnBrdMemEmpty)
Buffer size after Initial writes is 1000
Data transfer request condition after Initial writes is 10242 (DAQmx_Val_OnBrdMemNotFull)
Buffer size after DAQmxStartTask is 1000
Data transfer request condition after DAQmxStartTask is 10242 (DAQmx_Val_OnBrdMemNotFull)
Task started, use "quit" to quit...
Generating sine data for buffer 1 at time 0.000000 seconds
Generating sine data for buffer 2 at time 0.000000 seconds
Generating sine data for buffer 3 at time 0.000000 seconds
Generating sine data for buffer 4 at time 0.000000 seconds
Generating sine data for buffer 5 at time 0.000000 seconds
Generating sine data for buffer 6 at time 0.000000 seconds
Generating sine data for buffer 7 at time 0.000000 seconds
Generating sine data for buffer 8 at time 0.000000 seconds
Generating sine data for buffer 9 at time 0.281000 seconds
Generating sine data for buffer 10 at time 1.281000 seconds
Generating sine data for buffer 11 at time 2.281000 seconds
Generating sine data for buffer 12 at time 3.281000 seconds
Generating sine data for buffer 13 at time 4.281000 seconds
quit
Press any key to continue . . .
