DAQmxWriteDigitalU32 strange behavior / slow when overwriting entire buffer in regen mode - DMA completion wait?

alexkai · 09-14-2011

Greetings--

I am programming a soft real-time system that acquires analog data, does some DSP, and generates PWM on digital output ports. I am using an X Series PCIe-6353. I am having issues with the DAQmxWriteDigitalU32 function, which updates a 4096-sample buffer that is generated at 2 MS/s.

Here's the pertinent code:

		terr = DAQmxSetWriteRelativeTo(taskPWMDO, DAQmx_Val_FirstSample); 

<error handling code omitted>
	
		terr = DAQmxSetWriteOffset(taskPWMDO, 0);

<error handling code omitted>

		terr = DAQmxWriteDigitalU32(taskPWMDO, 
			4096,
			0,
			0,
			DAQmx_Val_GroupByChannel,
			ADO,
			&tint,
			NULL);

The idea is to overwrite the entire buffer with the new PWM data every 4096 sample clocks, with 4096 sample clocks worth of data. I don't care about glitches, it is all PWM.
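The post doesn't show how each PWM frame is built. As a rough illustration only, here is a minimal C++ sketch. It assumes one U32 sample per sample clock, where bit k drives line k of the port, and a common PWM period (in samples) with a separate duty cycle per line. `makePwmFrame` is a hypothetical helper, not part of DAQmx.

```cpp
#include <cassert>
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: fills one regeneration frame of port-wide U32 samples
// with PWM. Bit `line` of every sample drives digital line `line`.
// dutyCycles[line] is the fraction (0..1) of each period that the line is high.
std::vector<std::uint32_t> makePwmFrame(std::size_t frameSamples,
                                        std::size_t periodSamples,
                                        const std::vector<double>& dutyCycles) {
    assert(periodSamples > 0);
    assert(dutyCycles.size() <= 32);  // a U32 sample holds at most 32 lines
    std::vector<std::uint32_t> frame(frameSamples, 0u);
    for (std::size_t line = 0; line < dutyCycles.size(); ++line) {
        // Clamp the duty cycle, then convert it to "high" samples per period.
        const double duty = std::min(1.0, std::max(0.0, dutyCycles[line]));
        const auto highSamples = static_cast<std::size_t>(
            std::lround(duty * static_cast<double>(periodSamples)));
        for (std::size_t i = 0; i < frameSamples; ++i) {
            if (i % periodSamples < highSamples) {
                frame[i] |= (1u << line);  // set this line high for sample i
            }
        }
    }
    return frame;
}
```

For example, `makePwmFrame(4096, 40, duties)` would produce one 2.048 ms frame at 2 MS/s with a 50 kHz PWM period; the resulting array is the kind of data passed to DAQmxWriteDigitalU32 with 4096 samples per channel.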

Here's the odd behavior: when the buffer is defined to be 4096 samples and I attempt to write 4096 samples, the call takes a LONG time to execute, e.g.:

2.01405 2.03364 1.96055 2.11388 2.02275 1.91826 3.25712 3.29755 etc..

These are milliseconds, measured using the HPET.

When I write only half the buffer in DAQmxWriteDigitalU32, that is I keep the buffer size at 4096, but write only 2048 samples, the call executes as expected:

0.0186601 0.0189711 0.0189711 0.0189711 0.0192821 0.0186601 0.0186601 0.0186601, again in msecs.

The full-buffer write is more than 100x slower.

The code is otherwise IDENTICAL, I just replaced 4096 with 2048 in the code snippet above to make it go fast.

Why is there the delay in full buffer overwrite, and how to avoid it?

I have done some experiments, and generally it seems that only a few combinations work:

4096 or larger buffer / 2048 or smaller write is fast

2048 buffer / 2048 write is slow

2048 buffer / 1024 write is slow

Writing 3072 samples into an 8192-sample buffer produces a curious result: every other write is fast:

0.0186601
1.57709
0.0189711
1.48254
0.0186601
1.38738
0.0186601
1.29066
etc.

So this seems to be related to the hardware FIFO size, which is 2047 samples, and it seems that the write function is waiting on DMA completion. I cannot write more than the FIFO size, and I need to have at least a FIFO's worth of free space in the buffer; otherwise there's a wait. How can I bypass that? I don't care about glitches.

I have checked for the obvious (lack of DMA channels, etc. etc). The output buffer size is set explicitly using DAQmxCfgOutputBuffer. Please see the config log:

AI: Created channel :/Dev1/ai0, termconfig=10106.
AI: Created channel :/Dev1/ai1, termconfig=10106.
AI: Created channel :/Dev1/ai2, termconfig=10106.
AI: Created channel :/Dev1/ai3, termconfig=10106.
AI: Created channel :/Dev1/ai4, termconfig=10078.
AI: Created channel :/Dev1/ai5, termconfig=10078.
AI: Created channel :/Dev1/ai6, termconfig=10078.
AI: Created channel :/Dev1/ai14, termconfig=10078.
AI: Default input buffer size is 131072
AI: Default on-board input buffer size is 511
AI: Default FIFO transfer mode is 10241
AI: Default FIFO transfer mechanism is 10054
AI: Setup complete.
AO: Created channel :/Dev1/ao0.
AO: Start trigger source:/Dev1/ai/StartTrigger.
AO: Default regen settings is 10097
AO: Default output buffer size is 0
AO: Default on-board output buffer size is 8191
AO: Default FIFO transfer mode is 10242
AO: Default FIFO transfer mechanism is 10054
AO: Setup complete.
DO: Created channel :/Dev1/port0.
DO: Clock source is :/Dev1/ao/SampleClock.
DO: Start trigger source:/Dev1/ai/StartTrigger.
DO: Default FIFO transfer mode is 10242
DO: Default FIFO transfer mechanism is 10054
DO: Default on-board output buffer size is 2047
DO: Setup complete.
DODebug: Created channel :/Dev1/port1/line1:2.
DODebug: Setup complete.
Initialize: complete

AO: Output buffer size is 4096
DO: Output buffer size is 4096
DAQStart: complete
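The numeric settings in this log are raw DAQmx attribute values. A small lookup like the one below can make such logs easier to read. The numbers are the NIDAQmx.h constants as I understand them, and the assumption is that the "FIFO transfer mode" lines print the data transfer request condition; verify both against the header that ships with your DAQmx version before relying on this.

```cpp
#include <cassert>
#include <string>

// Maps raw int32 values seen in the configuration log to the NIDAQmx.h
// constant names they are believed to correspond to (check your header).
std::string daqmxConstantName(int value) {
    switch (value) {
        case 10078: return "DAQmx_Val_NRSE";             // AI terminal configuration
        case 10106: return "DAQmx_Val_Diff";             // AI terminal configuration
        case 10097: return "DAQmx_Val_AllowRegen";       // write regeneration mode
        case 10054: return "DAQmx_Val_DMA";              // data transfer mechanism
        case 10241: return "DAQmx_Val_OnBrdMemNotEmpty"; // data transfer request condition
        case 10242: return "DAQmx_Val_OnBrdMemNotFull";  // data transfer request condition
        default:    return "unknown";
    }
}
```

Read this way, the log says AI requests transfers when on-board memory is not empty, AO and DO request them when it is not full, all three tasks use DMA, and AO regeneration is allowed.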

Thank you for reading!

-Alex

Andrew_B · 09-15-2011

Hello alexkai,

This is Andrew Brown, an Applications Engineer from National Instruments. I appreciate your thorough writeup of your issue and related question. First, I want to confirm that the maximum FIFO size on your PCIe-6353 is 2047 samples according to page 7 of the NI 6353 Specifications document. This can lead to issues in applications such as the one you have provided.

To specifically answer your question, you can set the buffer to not allow regeneration to avoid the significant slowdown. This is presented in Configuring the Data Transfer Request Condition in DAQmx. Additionally, you can try to use a different data transfer mechanism to reduce the slowdown. This idea is presented in the article Configuring the Data Transfer Mechanism (Interrupts or DMA).

Please let me know if you have related questions. Thanks, and have a great day!

Regards,

Andrew Brown

Software Engineer
National Instruments

ZachHindes · 09-15-2011

Hey alexkai,

First off, is there any way I can see all the code related to taskPWMDO? I'm curious what (if any) other configuration you've done on it. You said you have tried modifying the buffer size, but I'd like to see how you are doing that.

As for some stuff I'd like you to answer/try:

I'm curious what happens if you don't use DAQmxSetWriteRelativeTo and DAQmxSetWriteOffset. Setting relative to First Sample seems a little strange to me. I'm pretty certain that is relative to the First Sample of the entire generation. I'm a little surprised that isn't erroring.
Are you modifying any of the regeneration settings?
Are you writing before you start the DO task or are you just setting the buffer size?
What version of DAQmx are you using? I didn't see that mentioned.

I'll try to reproduce some of the behaviors you're seeing here to see if I can help explain and fix it up for you.

------
Zach Hindes
NI R&D

alexkai · 09-15-2011

Andrew,

Thank you for the response. I have considered the PIO option and the non-regen option. PIO is slow, and is not preferable because I have a lot going on with the DAQ card already. The non-regen option is the backup plan (which now seems more likely). I am familiar with the functions to turn that on and off.

My direct question is this: given that the write functions seem to wait for DMA completion because they don't want to overwrite the active DMA transfer area, is it possible to turn off the wait?

I understand that most of the time, the wait is there to prevent output corruption, but I do not care about that. I am outputting PWM signals that do not change too much from frame to frame and it does not matter if some part of the previous buffer is never generated -- that would only mean that my system has faster response time.

I don't know if it is any more efficient, but we do have a support contract and we could communicate directly.

-Alex

alexkai · 09-15-2011

Zach,

The system runs DAQmx 9.3 (I can upgrade to 9.4 if that makes any difference), Windows 7 x64, Visual Studio 2010, all the latest patches, Core i5 quad-core 4.0 GHz, 16 GB RAM.

To answer your questions:

- DAQmxSetWriteRelativeTo and DAQmxSetWriteOffset are there for paranoia reasons; they don't seem to have any effect. In my original design, I set the buffer size to 4096 and write in blocks of 4096 each time, so the write should start at position 0 regardless.

- I tried modifying the regen settings, but the regen is on by default, so it doesn't seem to make any difference if I re-enable it.

- Before I start the task, I set the buffer size to 4096 and I write one full buffer (4096 samples) before calling StartTask.

Now for the code:

Configuration:

	terr = DAQmxCreateTask("PWMDO_Chip", &taskPWMDO);
	CheckNIErr(terr, HERR_DOCreateTask, "ADO: DAQmxCreateTask failed.\r\n");
 
	ts = "/Dev1/port0/line0:7";
	terr = DAQmxCreateDOChan(taskPWMDO,
							 ts.c_str(),
							 "",
							 DAQmx_Val_ChanForAllLines);
	  CheckNIErr(terr, HERR_DOCreateDOChan, "ADO: DAQmxCreateDOChan failed creating channel id " << ts << ".\r\n");
	tout << "ADO: Created channel :" << ts << ".\r\n";
	
	ts = "/Dev1/ao/SampleClock";
	terr = DAQmxCfgSampClkTiming(taskPWMDO,
								 ts.c_str(),
								 2000000,
								 DAQmx_Val_Rising,
								 DAQmx_Val_ContSamps,
								 4096);
	CheckNIErr(terr, HERR_DOCfgSampClkTiming, "ADO: DAQmxCfgSampClkTiming failed.\r\n");
	tout << "ADO: Clock source is :" << ts << ".\r\n";
 
	// The output task uses the input task's start trigger.  This causes the tasks to start at 
	// exactly the same time in hardware.
	AcqStateNext();	
	ts = "/Dev1/ai/StartTrigger";
	terr = DAQmxCfgDigEdgeStartTrig (taskPWMDO, 
									ts.c_str(), 
									DAQmx_Val_Rising);
	CheckNIErr(terr, HERR_DOCfgDigEdgeStartTrig, "ADO: DAQmxCfgDigEdgeStartTrig failed.\r\n");
	tout << "ADO: Start trigger source:" << ts << ".\r\n";
 
	terr = DAQmxCfgOutputBuffer (taskPWMDO, 4096);
	CheckNIErr(terr, HERR_DOCfgOutputBuffer, "ADO: DAQmxCfgOutputBuffer failed.\r\n");
 
	terr = DAQmxSetWriteRegenMode(taskPWMDO, DAQmx_Val_AllowRegen);
	CheckNIErr(terr, HERR_DOSetWriteRegenMode, "ADO: DAQmxSetWriteRegenMode failed.\r\n");
 
	// **************************** EVENTS
	// Register "DONE" event that handles cleanup and errors
 
	terr = DAQmxRegisterDoneEvent(taskPWMDO,
		0,
		DAQDoneCallback,
		&ltaskPWMDOToken);
	CheckNIErr(terr, HERR_AORegisterDoneEvent, "ADO: DAQmxRegisterDoneEvent failed.\r\n");
	tout << "ADO: Setup complete.\r\n";

Start task:

	// Write the data (full buffers)
	tstatus = DAQWriteOutputs(tout, aulDOWaveform, 4096, afAOWaveform, 4096);
	if (tstatus != HERR_NONE) return CHP::CHPProcCStatus(CS_UNKNOWN, tstatus, &tout);
 
	// *************************** START TASKS *************************************************
	// Start Digital output first, it won't really start because its clock is provided by ADO and 
	// its start edge is set to AI.
 
	btaskPWMDOrunning = 1;		
	terr = DAQmxStartTask(taskPWMDO);
	CheckNIErr(terr, HERR_DOStartError, "DAQStart: PWMDO: DAQmxStartTask failed.\r\n");

Write routine:

	// Write to Digital Out
	if (NULL != ADO) {
		terr = DAQmxSetWriteRelativeTo(taskPWMDO, DAQmx_Val_FirstSample);
		CheckNIErr(terr, HERR_DOWriteError, "DAQWriteOutputs: DAQmxSetWriteRelativeTo failed.\r\n");
 
		terr = DAQmxSetWriteOffset(taskPWMDO, 0);
		CheckNIErr(terr, HERR_DOWriteError, "DAQWriteOutputs: DAQmxSetWriteOffset failed.\r\n");
 
		terr = DAQmxWriteDigitalU32(taskPWMDO, 
			ADOSize,
			0,
			0,
			DAQmx_Val_GroupByScanNumber,
			ADO,
			&tint,
			NULL);
		CheckNIErr(terr, HERR_DOWriteError, "DAQWriteOutputs: DAQmxWriteDigitalU32 failed.\r\n");
	}

I can send you an entire application that you can run and observe the behavior, if that's helpful. Thank you in advance.

-Alex

alexkai · 09-15-2011

Zach,

Important detail: the entire thing runs off an EveryNSamples callback from the analog input task. The callback happens once every 2048 samples for AI @ 1 MS/s, which corresponds to 4096 samples for DO @ 2 MS/s. The callback function processes the incoming data, computes the control output, generates the PWM data, and sends it off to the digital output.

-Alex

ZachHindes · 09-15-2011

After much spelunking and grinding of teeth, we've figured out why the behavior is what you see. Here is what you're doing at a high level:

Conceptually, DAQmx owns a virtual buffer. Since we're doing a continuous task, this buffer can be looped over indefinitely. The first thing to notice is that the hardware is consuming data and is at some random point in the buffer. DAQmx doesn't want to update the buffer out from under the hardware as it's consuming, because that would always cause a glitch. So we do the best thing we can do, and move the write position past the hardware position.

Now, remember, the endlessly looping buffer is only a virtual concept. Physically, the buffer is a single chunk of memory that we loop around. So we're still updating the same physical position, but now we're going to wait so that we don't overwrite data the hardware hasn't consumed yet.

Now, the next thing to know is that we never write more than half the buffer size to the buffer at a time. So, to update the full buffer, we actually do two writes.

The first write covers the first half of the region being updated, and the second write covers the second half.

Now, to do the first write, we still need to make sure we don't overwrite data the hardware hasn't consumed. Therefore we want to make sure it is done consuming that portion of the buffer. So now we wait on hardware.

OK, so once the hardware has consumed past the end of that first half, we can write it. Let's assume we've done that.

Now, just like before, we don't want to overwrite what the hardware is doing. So before writing the second half, we have to wait AGAIN until the hardware has consumed the data we are going to overwrite.

Once we've waited, we can write the second half of the data.

Finally, since this buffer is being regenerated from, the entire buffer now holds the new data, and that is the final state.
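To make the "never more than half the buffer per write" rule concrete, here is a small model of the splitting described above. It only illustrates the explanation; it is not NI driver code, and `splitWrite` is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model (not driver code): a request to write `requested` samples into a
// regeneration buffer of `bufferSize` samples is carried out as consecutive
// transfers of at most bufferSize / 2 samples each.
std::vector<std::size_t> splitWrite(std::size_t requested, std::size_t bufferSize) {
    assert(bufferSize >= 2);
    const std::size_t maxChunk = bufferSize / 2;
    std::vector<std::size_t> chunks;
    while (requested > 0) {
        const std::size_t n = requested < maxChunk ? requested : maxChunk;
        chunks.push_back(n);  // each chunk may have to wait on hardware
        requested -= n;
    }
    return chunks;
}
```

With the original configuration, a 4096-sample write into a 4096-sample buffer becomes two 2048-sample transfers (two possible waits), while a 2048-sample write is a single transfer.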

That was long-winded. The bottom line is that there were two points in that sequence where we waited on hardware. One was for hardware to get halfway through the buffer. For a buffer of 4096 samples and an update rate of 2 MS/s, that should be between 0 and 1.024 ms. The second wait was for it to get all the way through the buffer starting from halfway. That should take 1.024 ms more. So the total is 1.024-2.048 ms, which is what you're seeing when you write the full buffer. When you write half a buffer, you will get random amounts of time from 0-1.024 ms, which is also what you're seeing. On top of this there are software latencies, which sometimes put us in weird situations where stuff takes a little longer: we hit an edge case where we were perfectly lined up with hardware, but by the time the second half of the buffer went to be written, the hardware happened to pass us, so we wait again!
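The timing arithmetic above can be checked with a few lines. This is a simplified model of the two waits described (software latency ignored, and any write larger than half a buffer treated like a full-buffer write); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct WaitRangeMs {
    double minMs;
    double maxMs;
};

// Duration, in milliseconds, of `samples` samples at `rateHz` samples/s.
double samplesToMs(double samples, double rateHz) {
    assert(rateHz > 0.0);
    return 1000.0 * samples / rateHz;
}

// Two-wait model: a write of at most half the buffer waits 0..T/2; a larger
// write (simplification: treated as a full-buffer write) waits T/2..T, where
// T is the time to generate the whole buffer once.
WaitRangeMs expectedWait(std::size_t writeSamples, std::size_t bufferSize, double rateHz) {
    const double half = samplesToMs(bufferSize / 2.0, rateHz);
    if (writeSamples <= bufferSize / 2) {
        return {0.0, half};
    }
    return {half, 2.0 * half};
}
```

For 4096 samples at 2 MS/s this gives 1.024 to 2.048 ms for a full-buffer write and 0 to 1.024 ms for a half-buffer write, consistent with the roughly 2 ms and 0.019 ms measurements in the original post.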

To illustrate the 0-1ms, for example, when I do a 4096 sample buffer and a 2048 sample write I see these write times (with a very low resolution timer unfortunately):

As you can see, it often takes 0 ms (<1) and has some interesting aliasing where it starts taking 2 ms (<3 but >1) for a while.

Unfortunately, this whole mess the driver undertakes is an effort to prevent glitching. Despite this, we glitch anyway and generate the warning I'm sure you're seeing, 200015. You don't care about the glitching and really want a way of saying, "look, just overwrite immediately, I don't care if the hardware glitches", which doesn't currently exist.

All of this is to say, I'm not really sure how to make the write take less time. We need some brainstorming on that, I think.

------
Zach Hindes
NI R&D

alexkai · 09-16-2011

Zach,

Thank you for looking into this -- this is excellent and is very very helpful.

I've been working on the workaround, and now I've run into another related problem.

First off, the option of non-regen will not work for my purposes. If I use non-regen and I need to achieve anywhere near my desired latency, I have to run very short buffers and eventually the write is going to be late thereby crashing the entire task. The regen option is there to prevent that, and task failures (and the associated restart times) are completely not an option for me because that would damage the hardware I am controlling.

So the workaround is:

- Use a large buffer, for example 32 x 4096 samples (i.e. 32x the original 4096 size)

- Track where the generation is using DAQmxGetWriteTotalSampPerChanGenerated, and write a couple of blocks ahead of it. Use DAQmxSetWriteRelativeTo / DAQmxSetWriteOffset to correct position ONLY in case of dropped frames. Otherwise, the writing can get caught behind the DMA and that's not good because it will go into permanent "slow"/wait mode.

- In case of a dropped frame, when the writes get overtaken by the DMA, the AI callbacks that drive the whole thing are eventually going to catch up, and the writing code is going to stay at a fixed distance ahead of the DMA, so it will overwrite itself a few times while processing the callbacks that happen faster than the DMA is advancing along the buffer.
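The index arithmetic behind this workaround (tPosDMAMax, tdDMAHighwater and tIndex in the DAQWriteDO routine posted further down) can be isolated into a pure function. This sketch mirrors that code; `framesAhead = 3` is an inference from the log, since the value of lDOStarvationThresholdFrames isn't shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Mirrors the DAQWriteDO index math: estimate the furthest sample the hardware
// may already have pulled into its FIFO, express it in frames from the top of
// the buffer, and aim the next write `framesAhead` frames past it, wrapping at
// the end of the buffer.
int targetFrameIndex(std::uint64_t samplesGenerated, std::uint64_t fifoSize,
                     std::uint64_t frameSize, int framesPerBuffer, int framesAhead) {
    assert(frameSize > 0 && framesPerBuffer > 0);
    const std::uint64_t bufferSize = frameSize * static_cast<std::uint64_t>(framesPerBuffer);
    const std::uint64_t dmaPos = (samplesGenerated + fifoSize) % bufferSize;
    const double dmaFrame = static_cast<double>(dmaPos) / static_cast<double>(frameSize);
    return (static_cast<int>(std::ceil(dmaFrame)) + framesAhead) % framesPerBuffer;
}
```

With a 2047-sample FIFO, 4096-sample frames and 32 frames per buffer, the SamplesGen values in the log reproduce the logged WriteIndex (17 at frame 1, 3 at frame 31, 4 at frame 32). Note that index * 4096 is a position inside one pass of the buffer, not an absolute sample count since the start of generation.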

In theory, this should work, except for some dropped frames that might get caught behind for a short while.

The problem now is the seemingly bogus error -200292 / DAQmxErrorSamplesCanNotYetBeWritten.

Below is the log of operation. The error occurs on first buffer wrap. It seems bogus.

FrameCount= -32 SamplesGen= 0 tmsec= 10486.9 / 26.4146 SampInBuf= 4096 DMAIndexEst= -1 WriteIndex= 0
FrameCount= -31 SamplesGen= 0 tmsec= 10513.8 / 0.0335882 SampInBuf= 8192 DMAIndexEst= -1 WriteIndex= 0

// snipped - this is pre-filling of the buffer with 32 frames of initial data 4096 samples each so that it is full when we start
// negative indices are frames before T-0 when everything starts

FrameCount= -1 SamplesGen= 0 tmsec= 10526.3 / 0.0261241 SampInBuf= 131072 DMAIndexEst= -1 WriteIndex= 0

DAQStart: complete.

// execution starts here; TaskStart takes about 50 ms, so the DMA is partway into the buffer when we start processing the first AI callback
// SamplesGen = # of samples generated, SampInBuf is the difference between samples generated and written
// DMAIndexEst is index, in frames, from the top of the buffer, where we expect the most forward position of DMA to be.
// WriteIndex is the frame index, from the top of the buffer, where we expect to write our 4096 sample block

FrameCount= 1 SamplesGen= 52673 tmsec= 51.4878 / 0.0264351 SampInBuf= 82495 DMAIndexEst= 13.3594 WriteIndex= 17 **** Buffer skip forward: last= 0 next= 17
FrameCount= 2 SamplesGen= 54395 tmsec= 52.308 / 0          SampInBuf= 80773 DMAIndexEst= 13.7798 WriteIndex= 17 **** Write operation skipped, duplicate index
FrameCount= 3 SamplesGen= 55860 tmsec= 53.041 / 0.0236361 SampInBuf= 83404 DMAIndexEst= 14.1375 WriteIndex= 18
FrameCount= 4 SamplesGen= 57384 tmsec= 53.8029 / 0.000311001 SampInBuf= 81880 DMAIndexEst= 14.5095 WriteIndex= 18 **** Write operation skipped, duplicate index
FrameCount= 5 SamplesGen= 58925 tmsec= 54.5733 / 0.000311001 SampInBuf= 80339 DMAIndexEst= 14.8857 WriteIndex= 18 **** Write operation skipped, duplicate index
FrameCount= 6 SamplesGen= 60702 tmsec= 55.4615 / 0.0245691 SampInBuf= 82658 DMAIndexEst= 15.3196 WriteIndex= 19
FrameCount= 7 SamplesGen= 62259 tmsec= 56.2399 / 0.000311001 SampInBuf= 81101 DMAIndexEst= 15.6997 WriteIndex= 19 **** Write operation skipped, duplicate index
FrameCount= 8 SamplesGen= 63729 tmsec= 56.9745 / 0.0245691 SampInBuf= 83727 DMAIndexEst= 16.0586 WriteIndex= 20
FrameCount= 9 SamplesGen= 65282 tmsec= 57.753 / 0.000311001 SampInBuf= 82174 DMAIndexEst= 16.4377 WriteIndex= 20 **** Write operation skipped, duplicate index
FrameCount= 10 SamplesGen= 66777 tmsec= 58.4991 / 0         SampInBuf= 80679 DMAIndexEst= 16.8027 WriteIndex= 20 **** Write operation skipped, duplicate index
FrameCount= 11 SamplesGen= 68292 tmsec= 59.2564 / 3.593 SampInBuf= 83260 DMAIndexEst= 17.1726 WriteIndex= 21
FrameCount= 12 SamplesGen= 77066 tmsec= 63.6688 / 0.0255021 SampInBuf= 78582 DMAIndexEst= 19.3147 WriteIndex= 23 **** Buffer skip forward: last= 21 next= 23
FrameCount= 13 SamplesGen= 78658 tmsec= 64.4379 / 0         SampInBuf= 76990 DMAIndexEst= 19.7034 WriteIndex= 23 **** Write operation skipped, duplicate index
FrameCount= 14 SamplesGen= 79741 tmsec= 64.9813 / 0         SampInBuf= 75907 DMAIndexEst= 19.9678 WriteIndex= 23 **** Write operation skipped, duplicate index
FrameCount= 15 SamplesGen= 80839 tmsec= 65.5286 / 0.0214591 SampInBuf= 78905 DMAIndexEst= 20.2358 WriteIndex= 24
FrameCount= 16 SamplesGen= 81993 tmsec= 66.1049 / 0         SampInBuf= 77751 DMAIndexEst= 20.5176 WriteIndex= 24 **** Write operation skipped, duplicate index
FrameCount= 17 SamplesGen= 83062 tmsec= 66.6395 / 0.000311001 SampInBuf= 76682 DMAIndexEst= 20.7786 WriteIndex= 24 **** Write operation skipped, duplicate index
FrameCount= 18 SamplesGen= 84209 tmsec= 67.2136 / 0.0233251 SampInBuf= 79631 DMAIndexEst= 21.0586 WriteIndex= 25
FrameCount= 19 SamplesGen= 85395 tmsec= 67.8061 / 0         SampInBuf= 78445 DMAIndexEst= 21.3481 WriteIndex= 25 **** Write operation skipped, duplicate index
FrameCount= 20 SamplesGen= 86468 tmsec= 68.3426 / 0         SampInBuf= 77372 DMAIndexEst= 21.6101 WriteIndex= 25 **** Write operation skipped, duplicate index
FrameCount= 21 SamplesGen= 87542 tmsec= 68.8797 / 0         SampInBuf= 76298 DMAIndexEst= 21.8723 WriteIndex= 25 **** Write operation skipped, duplicate index
FrameCount= 22 SamplesGen= 90405 tmsec= 70.3115 / 0.0217701 SampInBuf= 77531 DMAIndexEst= 22.5713 WriteIndex= 26
FrameCount= 23 SamplesGen= 94495 tmsec= 72.3557 / 1.49187 SampInBuf= 77537 DMAIndexEst= 23.5698 WriteIndex= 27
FrameCount= 24 SamplesGen= 98669 tmsec= 74.4432 / 0.0233251 SampInBuf= 77459 DMAIndexEst= 24.5889 WriteIndex= 28
FrameCount= 25 SamplesGen= 102676 tmsec= 76.4466 / 0.0211481 SampInBuf= 77548 DMAIndexEst= 25.5671 WriteIndex= 29
FrameCount= 26 SamplesGen= 106792 tmsec= 78.5045 / 0.0220811 SampInBuf= 77528 DMAIndexEst= 26.572 WriteIndex= 30
FrameCount= 27 SamplesGen= 110873 tmsec= 80.5453 / 0.0214591 SampInBuf= 77543 DMAIndexEst= 27.5684 WriteIndex= 31
FrameCount= 28 SamplesGen= 114979 tmsec= 82.5976 / 0.0217701 SampInBuf= 77533 DMAIndexEst= 28.5708 WriteIndex= 0
FrameCount= 29 SamplesGen= 119069 tmsec= 84.6431 / 0.0211481 SampInBuf= 77539 DMAIndexEst= 29.5693 WriteIndex= 1
FrameCount= 30 SamplesGen= 123161 tmsec= 86.6885 / 0.0211481 SampInBuf= 77543 DMAIndexEst= 30.5684 WriteIndex= 2
FrameCount= 31 SamplesGen= 127272 tmsec= 88.7443 / 0.0214591 SampInBuf= 77528 DMAIndexEst= 31.572 WriteIndex= 3
Buffer wrap ********************
FrameCount= 32 SamplesGen= 131354 tmsec= 90.7922 / 27.4829 SampInBuf= 73446 DMAIndexEst= 0.568604 WriteIndex= 4 **** Err=-200292

// Error -200292 happens here for no apparent reason: the writing is staying 4 frames ahead, all is well, and the error shouldn't be there
// It seems that it is trying to track my writing ... but I've overridden its positioning and I haven't written too many samples either

How do I disable/avoid the error -200292?

I've managed to avoid the DMA wait for the most part (except at frame 11, where the write took 3.6 ms, which I don't understand).

Now, if I simply disable the DAQmxSetWriteRelativeTo / DAQmxSetWriteOffset code below, everything works, except that I don't get my low latency.

Thank you in advance for reading all of this.

-Alex

alexkai · 09-16-2011

The code that decides when and where to write:

CHP_STATUS DAQWriteDO(std::stringstream& tout, 
					  uInt32 ADO[],
					  int ADOSize,
					  int64 AFrameTag) {
		NI_ERR terr = 0;				// Last returned error status (NI or otherwise)
		NI_ERR twarning = 0;			// Last returned NI warning
		CHP_STATUS tstatus = HERR_NONE;	// Last assigned HERR
		int32 tint = 0;
 
		// Write to Digital Out
		if (NULL != ADO) {
 
#ifdef DEBUG_DO
			stringstream ttout;
#endif
			double tdDMAHighwater = -1;
			bool tSkipWrite = false;
 
			AcqStateNext();
			terr = DAQmxGetWriteTotalSampPerChanGenerated(taskPWMDO, &nDOSamplesGenerated);
			CheckNIErr(terr, HERR_DOWriteError, "DAQWriteDO: DAQmxGetWriteTotalSampPerChanGenerated failed.\r\n");
 
			// Throttling is enabled when we are generating (AFrameTag > 0) and the threshold is configured
			if ((AFrameTag > 0) && (lDOStarvationThresholdFrames > 0)) {
				if (bDORegenEnabled) {
					// Compute the generation position in the buffer
					uInt64 tPosDMAMax = (nDOSamplesGenerated + lDOFIFOSize) % nDOBufferSizeScans;
					tdDMAHighwater = (double)tPosDMAMax / (double)lDOWaveformSize;
					int tIndex = ((int)ceil(tdDMAHighwater) + lDOStarvationThresholdFrames) % lDOBufferMultiplier;
#ifdef DEBUG_DO
					if (floor(tdDMAHighwater) == 0.0) ttout << "Buffer wrap ********************\r\n";
					
					if (tIndex == lDOBufferChunkIndexLast)  tSkipWrite = true;
					
					if (!tSkipWrite) {
 
						if (((tIndex - lDOBufferChunkIndexLast) != 1) && ((tIndex - lDOBufferChunkIndexLast) != -31)) {
							ttout << "Buffer skip: last= " << lDOBufferChunkIndexLast << " next= " << tIndex << "\r\n";
 
							AcqStateNext();
							terr = DAQmxSetWriteRelativeTo(taskPWMDO, DAQmx_Val_FirstSample);
							CheckNIErr(terr, HERR_DOWriteError, "DAQWriteDO: DAQmxSetWriteRelativeTo failed.\r\n");
 
							AcqStateNext();
							terr = DAQmxSetWriteOffset(taskPWMDO, tIndex * lDOWaveformSize);
							CheckNIErr(terr, HERR_DOWriteError, "DAQWriteDO: DAQmxSetWriteOffset failed.\r\n");
						}
 
					}
#endif
 
 
					lDOBufferChunkIndexLast = tIndex;
				}
			}
 
#ifdef DEBUG_DO
			double tdnow = oPWMDOTimer.MeasureMs(0);
#endif
			if (!tSkipWrite) {
				AcqStateNext();
				terr = DAQmxWriteDigitalU32(taskPWMDO, 
					ADOSize,
					0,
					0,
					DAQmx_Val_GroupByChannel,
					ADO,
					&tint,
					NULL);
 
				nDOSamplesWritten+= tint;
			}
 
#ifdef DEBUG_DO
			double tdnow2 = oPWMDOTimer.MeasureMs(0);
			double d = tdnow2 - tdnow;
 
			ttout << "FrameCount= " << AFrameTag << " SamplesGen= " << nDOSamplesGenerated << " tmsec= " << tdnow << " / " << d << 
				" SampInBuf= " << (nDOSamplesWritten - nDOSamplesGenerated) << " DMAIndexEst= " << tdDMAHighwater << " WriteIndex= " << lDOBufferChunkIndexLast;
 
			// tout << " tid=" << GetCurrentThreadId(); 
			if (tSkipWrite) ttout << " **** Write operation skipped, duplicate index "; else if (tint != ADOSize)  ttout << " *** SamplesWritten mismatch @ " << tint << " ";
			if (terr != 0) ttout << " **** Err=" << terr << " ";
			ttout << "\r\n";
			OutputDebugStringA(ttout.str().c_str());
#endif
 
			CheckNIErr(terr, HERR_DOWriteError, "DAQWriteDO: DAQmxWriteDigitalU32 failed.\r\n");
 
		}
 
		return HERR_NONE;
 
NIError:
		tout << "DAQWriteDO: NI Error: " << NIGetExtendedErrorInfo() << "\r\n" << endl;
 
		//OtherError:
		tout << "DAQWriteDO: Status: " << tstatus << "\r\n" << endl;
		tstatus = CHP::CHPProcCStatus(CS_UNKNOWN, tstatus, &tout);
 
		DAQTerminate(G::Config, false);
		return tstatus;
}

...and the configuration log:

AI: Created channel :/Dev1/ai0, termconfig=10106.
AI: Created channel :/Dev1/ai1, termconfig=10106.
AI: Created channel :/Dev1/ai2, termconfig=10106.
AI: Created channel :/Dev1/ai3, termconfig=10106.
AI: Created channel :/Dev1/ai4, termconfig=10078.
AI: Created channel :/Dev1/ai5, termconfig=10078.
AI: Created channel :/Dev1/ai6, termconfig=10078.
AI: Created channel :/Dev1/ai14, termconfig=10078.
AI: Callback registered at every 256 samples.
AI: Default input buffer size is 131072
AI: Default on-board input buffer size is 511
AI: Default FIFO transfer mode is 10241
AI: Default FIFO transfer mechanism is 10054
AI: Setup complete.
AO: Created channel :/Dev1/ao0.
AO: Start trigger source:/Dev1/ai/StartTrigger.
AO: Default regen settings is 10097
AO: Regen mode is 10097
AO: Default output buffer size is 0
AO: Default on-board output buffer size is 8191
AO: Setup complete.
DO: Created channel :/Dev1/port0.
DO: Clock source is :/Dev1/ao/SampleClock.
DO: Start trigger source:/Dev1/ai/StartTrigger.
DO: Default on-board output buffer size is 2047
DO: Buffer size is now: 131072.
DO: Regen mode is 10097
DO: Default FIFO transfer mode is 10242
DO: Default FIFO transfer mechanism is 10054
DO: Setup complete.
DODebug: Created channel :/Dev1/port1/line1:2.
DODebug: Setup complete.

ZachHindes · 09-16-2011

First off, I really like your idea!

I'm not quite able to reproduce the behavior you're seeing, but I think the problem may be that you're using offsets that are relative to the buffer, not absolute. For example, once you wrap, you're setting your offset back to 0 from First Sample. Offsets should be absolute, so if your buffer size is 1000, you should be doing offsets 0, 100, 200, ..., 900, 1000, 1100, ... for 100-sample writes.

Here is pseudo-code of what I have working doing the algorithm I think you described:

WritePosition = TotalSampPerChanGenerated + FIFOSize
WriteIndex = floor(WritePosition / WriteSize) + 1
WritePosition = WriteIndex * WriteSize
if WritePosition != OldWritePosition:
    set WriteRelativeTo to First Sample
    set WriteOffset to WritePosition
    Write
    OldWritePosition = WritePosition
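Here is the same algorithm as a minimal C++ sketch, reading WriteIndex as the integer quotient WritePosition / WriteSize plus one, which is what keeps the offsets absolute. The class name is hypothetical, and the actual DAQmx calls are left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sketch of the pseudocode above. Positions are absolute sample counts since
// the start of generation (what First Sample offsets are measured from),
// never wrapped modulo the buffer size.
class AbsoluteWritePlanner {
public:
    AbsoluteWritePlanner(std::int64_t fifoSize, std::int64_t writeSize)
        : fifoSize_(fifoSize), writeSize_(writeSize) {
        assert(writeSize > 0);
    }

    // Returns the absolute offset for the next write, or std::nullopt if that
    // block was already written (skip this iteration).
    std::optional<std::int64_t> next(std::int64_t totalSampPerChanGenerated) {
        const std::int64_t pos = totalSampPerChanGenerated + fifoSize_;
        const std::int64_t index = pos / writeSize_ + 1;  // first whole block past pos
        const std::int64_t writePos = index * writeSize_;
        if (writePos == lastWritePos_) return std::nullopt;
        lastWritePos_ = writePos;
        return writePos;
    }

private:
    std::int64_t fifoSize_;
    std::int64_t writeSize_;
    std::int64_t lastWritePos_ = -1;
};
```

Each non-empty result would be applied with DAQmxSetWriteRelativeTo(task, DAQmx_Val_FirstSample) and DAQmxSetWriteOffset before the write. For the failing frame in the earlier log (131354 samples generated, 2047-sample FIFO, 4096-sample writes) this yields an offset of 135168, whereas the wrapped scheme requested frame 4, i.e. 16384.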

With this code, I can run for thousands of iterations of actually calling DAQmx Write with no errors and the write taking no time at all. The only drawback I can see is that the Write Offset property is an i32, so eventually you will surpass max_i32. One thing you can do is change WriteRelativeTo to "Current Write Position", which takes very little rework.
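To put the i32 limit in numbers and sketch the "Current Write Position" alternative, here is a small helper. The conversion assumes the current write position is the absolute sample just after the previous write; that is an assumption about DAQmx semantics, so check it against the DAQmx C reference before using it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Seconds a continuous generation can run before an absolute sample offset
// no longer fits in the int32 WriteOffset property.
double secondsUntilInt32Overflow(double rateHz) {
    assert(rateHz > 0.0);
    return static_cast<double>(std::numeric_limits<std::int32_t>::max()) / rateHz;
}

// Hypothetical conversion for WriteRelativeTo = current write position,
// ASSUMING the current write position is the absolute sample just past the
// previous write: the offset is the (small) distance to the target block.
std::int32_t offsetFromCurrentWritePos(std::int64_t targetAbsPos, std::int64_t currentWritePos) {
    const std::int64_t delta = targetAbsPos - currentWritePos;
    assert(delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max());
    return static_cast<std::int32_t>(delta);
}
```

At 2 MS/s an absolute offset overflows int32 after about 1074 s, roughly 18 minutes. With the relative form, the offset is 0 in steady state (each block directly follows the previous one) and only becomes nonzero when a block is skipped.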

------
Zach Hindes
NI R&D
