Maximizing TDMS streaming speed

24 August 2011

 

Hello,

 

I'm currently using Windows LabVIEW 8.5. I'm trying to write data via the TDMS API at a very high rate, but the best I can do is write 8 channels at about 2000 samples/sec. Right now, my VI is set up to write 4 channels per call to the TDMS Write function. If I call the TDMS Write function less often by writing a larger group at one time, will this improve efficiency? Should I create a large buffer array, collect my samples, and then write the whole large array at once? If so, what is the best way to accomplish this? Or is there another solution?

 

thanks

-phil (phil@weballey.com)

Message 1 of 14

TDMS should be able to handle much more than 8 channels @ 2 kHz (depending on hardware, the limit should be hundreds to thousands of times higher), assuming you don't have horrendous I/O contention (e.g. streaming video to or from the same disk) or fragmentation (it'd have to be really bad).

 

The things I can think of are:

 

Grouping the data (as you mentioned as a possible fix) can help immensely if you're writing data one sample at a time, but there are diminishing returns from making the grouping larger. A simple way is to set the minimum buffer size for each channel. I don't know how you're collecting data, but with high-speed data collection on a non-real-time system you'll probably receive the data already grouped, and you configure how large you want that grouping when you set up your data collection (if you don't want to group things by hand).
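Since LabVIEW block diagrams can't be pasted as text, here's the grouping idea sketched in Python; `write_chunk` is a hypothetical stand-in for a single call to TDMS Write, not a real API:

```python
from typing import Callable

class ChannelBuffer:
    """Accumulate per-channel samples and flush them in large chunks,
    so the (expensive) write call runs once per chunk instead of once
    per time-step."""

    def __init__(self, channels, chunk_size, write_chunk: Callable):
        self.channels = list(channels)
        self.chunk_size = chunk_size          # samples per channel per flush
        self.write_chunk = write_chunk        # stand-in for TDMS Write
        self.buf = {ch: [] for ch in self.channels}

    def append(self, samples):
        """samples: one value per channel for this time-step."""
        for ch, s in zip(self.channels, samples):
            self.buf[ch].append(s)
        if len(self.buf[self.channels[0]]) >= self.chunk_size:
            self.flush()

    def flush(self):
        if self.buf[self.channels[0]]:
            self.write_chunk(self.buf)        # one large write, not many small ones
            self.buf = {ch: [] for ch in self.channels}
```

The acquisition loop just calls `append` each time-step and `flush` once at the end; tuning `chunk_size` is the diminishing-returns knob mentioned above.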

 

Don't close and reopen the TDMS file in a loop (every time a TDMS file is opened, the file is scanned through, which can really slow things down as the file gets larger).
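The open-once pattern, sketched with plain file I/O in Python (the TDMS case is analogous, just with a far higher reopen cost because of the index scan):

```python
import os

def log_samples(path, batches):
    """Open the file once and keep the handle for every write.

    The anti-pattern would be open()/write()/close() inside the loop;
    for TDMS the reopen is far worse, because each open rescans the
    file, so the cost grows as the file grows.
    """
    with open(path, "ab") as f:       # one open for the whole acquisition
        for batch in batches:
            f.write(batch)            # many writes against the same handle
    return os.path.getsize(path)
```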

 

Without example code it's hard to say where the problem is. The slowdown might be in a piece of code completely unrelated to writing the data.

 

It's possible the hard drive you're saving to is stuck in PIO mode (older versions of Windows would downgrade a drive from DMA mode to PIO mode if they had enough problems with the drive). If I remember right, in XP and older you can check Device Manager: open the IDE controller that the drive is attached to (not the drive itself), and under Advanced Settings there will be a current transfer mode, which should show some kind of DMA. If not, change the transfer mode to "DMA if available" (updating the drivers for the controller might help stop it from getting unset again).

 

I don't know how fast you intend to go, but there were some speed improvements in TDMS Write in LV 2009 (about 4 times faster). You're not anywhere close to the old speed limits, though, so I doubt upgrading would gain you much until you work out your current problem.

Message 2 of 14

25 August 2011

 

Thanks for the reply.

 

I'm collecting data from an Ethernet stream, but the data is only grouped (4 channels) per time-step, and I'm currently just writing each group as I receive it. I can't change the "group" that I receive, so I think I should collect the groups into a larger chunk and write that with one call to the TDMS Write function.

 

Is there a way to create an array (composed of X consecutive samples) to wire to the TDMS Write so that I don't have to modify the "channel name" entries? Or should I just modify the "channel name" array to reflect the repeated entries?

 

Also, the information about the buffer size isn't in my help documentation. Are there any other things I should try?

 

BTW, I'm running LabVIEW inside a Parallels VM on Mac OS X, so I know that my disk I/O is probably degraded quite a bit. The Ethernet performance is quite good, however.

 

-phil

 

Message 3 of 14

Writing 4 samples at a time will definitely slow you down. I hacked together a grouping-size speed test if you want to figure out a good size for your system. Since there were some TDMS performance changes between 8.5 and later versions of LabVIEW, I can't really say what would be an ideal size to aim for; on my system with LV 2011 and default TDMS settings, 1K and 1M seem to be good sample sizes (but since the OS buffer is disabled by default, things will probably be smoother for you on the older system).
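A grouping-size speed test along those lines might look like this Python sketch, where `write_fn` is whatever wrapper you have around TDMS Write (the names here are illustrative, not a real API):

```python
import time

def write_speed(write_fn, total_samples, group_size):
    """Time how long it takes to push `total_samples` through `write_fn`
    in chunks of `group_size`, and return the achieved samples/sec.
    Sweep `group_size` to find a good value for your system."""
    data = list(range(group_size))                # dummy chunk of samples
    calls = total_samples // group_size
    t0 = time.perf_counter()
    for _ in range(calls):
        write_fn(data)                            # one call per group
    elapsed = time.perf_counter() - t0
    return total_samples / elapsed if elapsed > 0 else float("inf")
```

Running it for group sizes of, say, 10 through 1,000,000 and comparing the returned rates shows where the diminishing returns set in.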

 

To write multiple channels with multiple samples, you need to pass either a 2D array of basic datatypes (doubles or singles) or a 1D array of waveforms. You may need to transpose the 2D array depending on how you put it together.
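For example, converting sample-major rows (one row per time-step, one column per channel) to the channel-major layout is just a transpose; a Python sketch of the same operation:

```python
def to_channel_major(rows):
    """Transpose sample-major 2D data (rows = time-steps, columns =
    channels) into channel-major form (rows = channels), the layout
    needed when writing several channels with several samples each."""
    return [list(col) for col in zip(*rows)]
```

With three time-steps of two channels, `to_channel_major([[1, 10], [2, 20], [3, 30]])` yields one row per channel: `[[1, 2, 3], [10, 20, 30]]`.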

 

Here's info on the minimum buffer size (the link says it should work for LV 8.5); with this you wouldn't need to group the data by hand:

http://digital.ni.com/public.nsf/allkb/63DA22F92660C7308625744700781A8D

Message 4 of 14

26 August 2011

 

OK. I've grouped the data, inserting 50 entries (5 channels, 10 samples) per call to TDMS. I can sustain entering 2 groups at a time at a sample rate of about 4000 Hz (TDMS called at 400 Hz). However, I seem to hit a wall at about 70,000 entries: the data begins to back up, and the writes slow down tremendously. Writing at a TDMS rate of 200 Hz lets me go twice as long, and I still hit approximately 70,000 entries before the slowdown. At 8000 Hz (TDMS rate 800 Hz), I get to about 41,000 entries before the backup occurs. I'm currently using 1000000 for NI_MinimumBufferSize, and all entries are single-precision floats.

 

-phil
Message 5 of 14

One thing that you might want to do is make sure that you are using an efficient VI architecture. When reading and writing data into and out of a buffer, it is generally wise to use a producer-consumer architecture. Here is a link to an example that uses TDMS and a producer-consumer architecture:

 

https://decibel.ni.com/content/docs/DOC-8962
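The core of the producer-consumer pattern, sketched in Python with a queue standing in for LabVIEW's queue functions and `write_fn` standing in for TDMS Write:

```python
import queue
import threading

def run_producer_consumer(batches, write_fn):
    """Decouple acquisition from disk I/O with a queue: the producer
    enqueues data as fast as it arrives, while the consumer drains the
    queue and does the (slower) writes in parallel."""
    q = queue.Queue()
    DONE = object()                       # sentinel marking end of data

    def producer():
        for b in batches:                 # stands in for the acquisition loop
            q.put(b)
        q.put(DONE)

    def consumer():
        while True:
            b = q.get()
            if b is DONE:
                break
            write_fn(b)                   # stands in for TDMS Write

    t_p = threading.Thread(target=producer)
    t_c = threading.Thread(target=consumer)
    t_p.start(); t_c.start()
    t_p.join(); t_c.join()
```

The queue absorbs bursts, so a temporarily slow write doesn't force the acquisition side to drop data.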

 

You might want to upload your VI as well, so that others might be able to provide you with more exact feedback.

 

Jacob K.

Message 6 of 14

Without code we can't tell, but it appears your slowdown occurs when you fill the operating system's write buffer, i.e. when you start actually writing to disk instead of memory. Newer versions of TDMS will automatically bypass this buffering and write to disk in block mode to speed things up. This may be your problem, since the Parallels VM may mess this up and use the buffer anyway. Try turning buffering off (it is a Boolean input on the TDMS Open VI). Your minimum buffer size is probably too large as well; try reducing it to something around 50 kBytes to 100 kBytes (with buffered writes, I have found 65 kBytes to be optimal on Windows).

 

Above all, don't be afraid to plot your data rates vs. buffer size in both buffered and non-buffered modes, using single and multiple waveforms per write. Look for write-speed maxima; you may find them in unforeseen locations. TDMS is currently heavily optimized for Windows platforms, and running it in Parallels introduces some extra variables that will need to be worked through.
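A rough throughput benchmark of this kind, sketched in Python with plain file writes (the file name and sizes are arbitrary test values; the TDMS case would substitute TDMS Write for the `f.write` call):

```python
import os
import tempfile
import time

def bench(chunk_size, total_bytes, buffered=True):
    """Measure write throughput (MB/s) for one chunk size. Sweep a range
    of chunk sizes with buffering on and off and look for the maximum,
    as suggested above."""
    chunk = b"\0" * chunk_size
    path = os.path.join(tempfile.mkdtemp(), "bench.bin")
    t0 = time.perf_counter()
    # buffering=0 disables Python's userspace buffer (binary mode only)
    with open(path, "wb", buffering=(-1 if buffered else 0)) as f:
        for _ in range(total_bytes // chunk_size):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())              # include the real disk cost
    elapsed = time.perf_counter() - t0
    os.remove(path)
    return total_bytes / elapsed / 1e6    # MB/s
```

Calling `bench(size, 100 * 2**20)` for sizes from 4 KB up to a few MB, in both modes, gives the data-rate-vs-buffer-size curves to plot.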

 

Good luck.

Message 7 of 14

Well, I'm using the exact design pattern specified in the prior posting.

 

As to the buffering, it does appear that it's the transaction rate to TDMS that matters, rather than the size of the transaction. I rewrote my VI to allow me to experiment with various group sizes and found that if I insert about 1000 samples (8 channels per sample) at a time, I can achieve really impressive streaming rates. This means that for the same data quantity, calling TDMS Write less often with larger insertions is more efficient than calling TDMS Write more often with smaller insertions.

 

I'd love to post my VI, but it's rather large (screen-wise). Maybe a JPG of the section might suffice?

 

-phil

Message 8 of 14

Hi All,

This question seemed related so I thought I would post it here.

I have a producer-consumer loop where I am acquiring data from an IMAQ device and trying to save to a list of .tdms files (perhaps a single file being appended to is better?). I am trying to save around 1500000 frames, each with 912 pixels (I16). I know that I need the ability to split those 150000 frames into subframes, and on the producer side I know that as long as my subframe size is 1000 or greater, I can stream data at 230 kHz without missing data.

On the consumer side, I seem to get very different data-streaming rates depending on my subframe size. I create a file for each of my subframes and then give this array of references to my consumer loop. For subframes of size 100, I can save at 40 kilo-frames per second; however, when I go to subframes of size 1000, the save rate drops to 3.8 kilo-frames per second. Jumping to subframes of size 10000 (which is actually my ideal for when I stick a processing loop in between these two loops), I drop to 0.33 kilo-frames per second. Obviously, as I jump factors of 10 in my subframe size, I am dropping by factors of 10 in my data-saving rate.

It feels like I should be able to get gains by:

1) Playing with the disk cache size, the buffer size, or by disabling buffering.
2) Setting the properties of the data before saving (is this like a preallocation, or is that what the "TDMS Reserve File Size" is for?)
3) Perhaps using the advanced TDMS functions?

For each of those points, I am not sure which avenue I should be exploring. Preliminary tests with disk cache sizes and buffer sizes didn't give any gains.

Also, some design considerations I have to keep in mind.

1) I know that I could reshape the data out of my IMAQ array into a 1D array, which would let me acquire and save in different chunk sizes. I am loath to do that, as the acquire loop is real-time and cannot miss any data. I guess the reshape could be done in the consumer loop, but it seems like there should be a better way.
2) I will eventually insert a processing loop in between these, such that the processing loop consumes the camera data and produces the save data.
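The consumer-side reshape from (1) could be sketched like this in Python, with frames as plain lists of pixel values (the subframe size is an illustrative parameter):

```python
def regroup(frames, subframe):
    """Flatten fixed-size frames into one 1D stream and re-chunk it into
    subframes of a different size, so the acquisition chunk size and the
    save chunk size can differ. Doing this on the consumer side keeps
    the real-time producer loop untouched."""
    flat = [px for frame in frames for px in frame]
    return [flat[i:i + subframe] for i in range(0, len(flat), subframe)]
```

So two 3-pixel frames regrouped into subframes of 2 become three 2-pixel chunks, decoupling the save size from the camera's frame size.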

Does anyone have any suggestions? Here are a few links that I found that didn't seem to answer the question completely.

http://forums.ni.com/t5/LabVIEW/How-to-use-TDMS-Advanced-functions-properly/td-p/2039026

Message 9 of 14

"TDMS Reserve File Size" is used for asynchronous write of advanced TDMS functions. 

For the standard TDMS Write, you can disable buffering and set the disk cache size to 4 MB for better performance.

 

If the performance still can't satisfy your requirements, the next step is to use the advanced TDMS API (asynchronous), which can reach the maximum disk throughput on Windows. Benchmarks and example usage can be found in the LabVIEW examples "TDMS Advanced - Asynchronous Write" and "TDMS Advanced - File Write Speed Test (Asynchronous)".

Another benefit of asynchronous TDMS is that since it's non-blocking I/O (it returns immediately instead of waiting until the I/O is finished), you can use only one loop instead of the two loops of the producer-consumer pattern!
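The single-loop idea can be mimicked in Python with a background worker whose submit call returns immediately; this is only an analogy for the TDMS Advanced asynchronous API, not its actual interface:

```python
from concurrent.futures import ThreadPoolExecutor

class AsyncWriter:
    """Mimic an asynchronous (non-blocking) write API: submit() returns
    immediately and the actual write happens on a background worker, so
    a single acquisition loop can both grab and 'write' data without
    blocking on the disk."""

    def __init__(self, write_fn):
        # one worker keeps writes in submission order
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.write_fn = write_fn          # stands in for the real write call
        self.pending = []

    def submit(self, data):
        """Non-blocking: queue the write and return immediately."""
        self.pending.append(self.pool.submit(self.write_fn, data))

    def close(self):
        """Block until every outstanding write has finished."""
        for fut in self.pending:
            fut.result()
        self.pool.shutdown()
```

The acquisition loop calls `submit` and keeps going; `close` at the end plays the role of waiting for the asynchronous writes to drain.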

Message 10 of 14