TDMS files large, take long time to read

I just upgraded to LabVIEW 8.20 and have been taking advantage of the new TDMS file write functions. So far, I like what I see. I have much better write performance than I did before. However, my file sizes are huge! I need some help figuring out whether this is a characteristic of TDMS, or whether my programming just isn't all that efficient.
 
Some program info: I have two independent, parallel while loops pulling in data off of their respective PXI cards. One card is from National Instruments, while the other is a third-party card from Condor. The first while loop is simple; I attached screenshots of each loop (simplified versions just to show the basics). Ultimately, I open a TDMS file, set the properties, write the data from each channel plus a timestamp, then close it. About 21 signals, pretty simple.
 
The second while loop pulls the data in using some third-party LabVIEW functions and is a bit more complex. It is a three-channel card pulling in multiple digital signals per channel. So I use many case structures to determine which channel I'm on, whether the data being supplied is good, and finally which signal I'm actually looking at. Since each signal comes in at a different sampling rate, I need to save a timestamp for each signal, thus doubling the data I save. Off of this loop I have about 54 total signals. I think I have it programmed correctly, but this is where I could probably use some expert advice. Am I properly passing the error out plus TDMS info from the case structures to the TDMS close function at the end? Will setting up the file structure, i.e. channel groups, help?

So the problem is this. Both loops sample at about 10-30 Hz. A one-hour test produced a 42 MB NI TDMS file and a 250 MB Condor TDMS file. The NI one opened quickly in DIAdem, while the Condor one took over 2 hours to load. The associated index files for both TDMS files are just about as big. So right off the bat, something seems wrong with the Condor file: it is not only a larger file, it took a disproportionately longer time to read into DIAdem than the NI file. I'm definitely worried about 5-hour tests producing absolutely huge files.
 
For those who know DIAdem:
When I save the Condor file as a TDM from DIAdem, the file size shrinks from 250 MB down to about 30. It seems TDMS files are better at storing data fast, but TDM files are better at storing data compactly. Should I convert my TDMS file to a TDM file before opening it in DIAdem? Is there even a way to do this in LabVIEW?
 
Hopefully this message doesn't make the problem sound more complicated than it really is.  I just want to be able to open up my files quickly after a test, and as of now, I have hours of loading time before even seeing the data. 

Thanks,
Alex 
Message 1 of 8
Hello,
 
I can address the LabVIEW half of your question, but I have sent it to the DIAdem team to handle that side of things (which seems to be your main question).
 
1. "Am I properly passing the error out plus TDMS info from the case structures to the TDMS close function at the end?"
- It looks OK, but you can use shift registers on your loops so that if an error occurs in one iteration, subsequent iterations know about it; you can also handle the case where an error occurs.
 
2. "Will setting up the file structure, ie channel groups, help?"
- You can give this a shot, but I have a feeling the bottleneck is DIAdem reading that huge 250 MB file. There is an example using the TDMS Set Properties function called "TDMS - Write data (time domain).vi", which you can find in labview\examples\file\plat-tdms.llb.
 
3. You can try periodically creating new files so that your data is spread out... perhaps a file every 30 minutes would be better.  Here's a general example of periodically creating new files while streaming data:
 
Stream to Disk with Periodic File Creation
 
If you have problems with this link in the future, try replacing "sine" with "zone" near the beginning of the URL.
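
If it helps to see the idea outside LabVIEW, here is a rough sketch of the same rolling-file pattern in Python, using the third-party npTDMS library (the file names, block size, and acquisition stub are all illustrative):

```python
import time
import numpy as np
from nptdms import TdmsWriter, ChannelObject

ROLL_SECONDS = 30 * 60  # start a new file every 30 minutes

def acquire_block():
    """Stand-in for reading a block of samples from the hardware."""
    return np.random.rand(100)

# e.g. ten 30-minute files instead of one huge 5-hour file
for file_index in range(10):
    started = time.time()
    with TdmsWriter("run_%03d.tdms" % file_index) as writer:
        while time.time() - started < ROLL_SECONDS:
            data = acquire_block()
            writer.write_segment([ChannelObject("Data", "signal", data)])
```

Each file then stays small enough to load into DIAdem individually.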
 
Best Regards,
 
JLS
Message 2 of 8
Hey Alex,

One thing I would recommend doing is using the TDMS Defrag.vi after you are finished writing.  This should increase performance when you are reading your data.  It can also decrease the file size a little.  Let us know if this helps out at all.
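
Conceptually, a defrag pass just reads all of the scattered channel data and rewrites it contiguously, one header per channel. If you ever need to script that outside LabVIEW, here is a rough Python sketch of the idea using the third-party npTDMS library (TDMS Defrag.vi remains the supported way to do it; this is just for illustration):

```python
from nptdms import TdmsFile, TdmsWriter, ChannelObject

# Read the fragmented file fully into memory (fine for offline post-processing).
original = TdmsFile.read("fragmented.tdms")

# Collect every channel's complete data so it can be rewritten contiguously.
channels = [
    ChannelObject(group.name, channel.name, channel[:],
                  properties=dict(channel.properties))
    for group in original.groups()
    for channel in group.channels()
]

# Writing everything as one segment yields a single header per channel,
# which is roughly what a defrag achieves.
with TdmsWriter("defragmented.tdms") as writer:
    writer.write_segment(channels)
```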
Pat P.
Software Engineer
National Instruments
Message 3 of 8

JLS, thanks for the comments. Yeah, I understand that this is probably more of a DIAdem issue, but it always helps if a more experienced programmer can look at the code and see if there are any dumb novice mistakes in it. I did incorporate the shift registers, though. Luckily, no missed errors there!

Pat, I took a 350 MB TDMS file (with an associated 320 MB index file) and ran it through a defrag. The resulting TDMS file was much more manageable, with a 48 MB TDMS file and a 7 KB index file, and it opened in DIAdem very quickly. Unfortunately, it took over two hours to get to this point. It took about the same time to load the original TDMS file into DIAdem and save it as a TDM, so I'm not saving much time there. Is there a way to incorporate this into the LabVIEW code so it's defragging as the code is running? Then the resulting file at the end would already be defragged and ready to go into DIAdem. Would this slow down the performance of the code/system?

Ultimately, I guess I want to be able to examine the results soon after a test is complete. Waiting two (or potentially more) hours just to see if a test produced good results is not a best-case scenario. Hopefully there is an easy way to get around this!

Thanks,

Alex

Message 4 of 8
Well, one problem is that you really don't want to defrag the file while you are writing to it, because that would ruin your file streaming speed. There are two sides to this coin:

1. TDM files are optimized for reading data quickly. All channel information is stored together in one continuous block in the file along with its metadata. This makes it quick and easy to access. It is slower to write data to file, however, because if you want to append to an existing channel, you have to move all the data after that channel in the file. This operation becomes slower as the file grows.

2. TDMS files are optimized for writing data quickly. Channel information no longer has to be stored in one contiguous block in the file. This means that you can append channel information very quickly at the end of the file, regardless of where the rest of the channel information is. The downside to this is that you will duplicate some of the metadata for that channel in the process. This could partly be what is causing your files to grow so large. You are appending potentially hundreds of redundant headers for channels.
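
As a rough back-of-envelope illustration (the per-channel header overhead here is just an assumed order of magnitude): at 30 Hz for one hour you perform about 108,000 writes. With roughly 108 channels in the Condor loop (54 signals plus a timestamp channel for each), even ~20 bytes of repeated metadata per channel per write comes to 108,000 x 108 x 20 bytes, or roughly 230 MB, which is the same order of magnitude as your 250 MB file.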

We at least offer formats that are good at each side; a compromise format that tried to meet halfway would likely only be good at neither streaming data to file nor reading it back.

Perhaps you could make the defrag process an automated overnight process somehow?
Jarrod S.
National Instruments
Message 5 of 8
After looking at your code, there are some other suggestions I might make as well. You could actually effect some of this "defragging" in your code itself. Right now you seem to be writing individual values to your file one at a time. You acquire a sample (or 10 channels of individual samples), then write them to the file. This causes duplication of the channel header information in your file for each individual data point.

You could look at storing each channel's data in a software buffer (such as a queue or a functional global), and then only writing it to file after a certain amount of time or when the buffer reaches a certain size. By writing arrays of data at a time instead of individual elements, you should drastically decrease the size of your file at the end. You might still want to defrag it, but it should be a much quicker process.

Check out this mock example. It shows two ways of writing TDMS files with three channels of data. The first way writes each channel one data point at a time. After 10,000 iterations, the file is 1.02 MB with an index file of around 800 KB. The second way stores the three channels of data in buffers until they reach 1000 elements each, and then writes them to file. The same amount of data now creates a TDMS file that's only 235 KB, with an index file of only 1.6 KB! The example is not very flexible; it is hard-coded to work with only three channels, but there are ways you could make this completely dynamic.

It's not a quick fix, exactly, but this could drastically help your file I/O performance. Open Large_File.vi for more info.
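
Here is a minimal sketch of the same buffer-then-flush pattern in Python, using the third-party npTDMS library (channel names and the buffer size are illustrative):

```python
import numpy as np
from nptdms import TdmsWriter, ChannelObject

BUFFER_SIZE = 1000  # flush once this many samples have accumulated
buffers = {"ch1": [], "ch2": [], "ch3": []}  # one buffer per channel

def flush(writer):
    """Write each channel's buffered samples as one array, then clear."""
    segment = [ChannelObject("Data", name, np.array(samples))
               for name, samples in buffers.items() if samples]
    if segment:
        writer.write_segment(segment)
    for samples in buffers.values():
        samples.clear()

with TdmsWriter("buffered.tdms") as writer:
    for i in range(10000):                   # stand-in for the acquisition loop
        for name in buffers:
            buffers[name].append(np.sin(i))  # stand-in for a real sample
        if len(buffers["ch1"]) >= BUFFER_SIZE:
            flush(writer)
    flush(writer)                            # write whatever is left over
```

One segment per flush means one header per channel per 1000 samples instead of one per sample, which is where the size savings come from.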
Jarrod S.
National Instruments
Message 6 of 8
Hello,
I was wondering if anybody could help with the two VIs posted by Jarrod. It seems that once the array position is greater than the buffer size, the data is just repeated. Can anyone help with this?
I'm trying to buffer a bunch of single values and then write them to the file every now and then, exactly as in this example.

Thank you,

Martin
Message 7 of 8
Using the "NI_MinimumBufferSize" property enables you to "defrag" the file while you are writing. It causes LabVIEW to accumulate the given number of values in memory before writing them to disk. See this thread for details: http://forums.ni.com/ni/board/message?board.id=60&message.id=6719&requireLogin=False.
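
The property itself is just metadata on the channel; LabVIEW's TDMS layer is what implements the buffering. As a purely illustrative sketch, this is how the property would be attached from Python with the third-party npTDMS library (which records the property but does not itself buffer):

```python
import numpy as np
from nptdms import TdmsWriter, ChannelObject

# Illustrative only: in LabVIEW you would wire this property to
# TDMS Set Properties before the first write to the channel.
ch = ChannelObject("Data", "signal_01", np.zeros(10),
                   properties={"NI_MinimumBufferSize": 1000})

with TdmsWriter("buffered_property.tdms") as writer:
    writer.write_segment([ch])
```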
Herbert
Message 8 of 8