"One header only" for TDMS writes?

NI-hilator · ‎02-21-2014

I will be writing some LabVIEW code to produce TDMS files where there will be frequent writes of small amounts of data over long periods of time. After doing some research on TDMS files, I am concerned about all the additional headers that may be created, and about the file size and fragmentation that result from this use case. It seems that TDMS is optimized for streaming data rather than for the frequent small writes I describe.

I read this in a forum: “Every time you call TDMS Write you will create a header on disc.” I have verified this in the following NI KnowledgeBase article, but it was written for LabVIEW 8.5: http://digital.ni.com/public.nsf/allkb/63DA22F92660C7308625744700781A8D “Any time data is written to a TDMS file a header is written along with the data to the TDMS index file. This means if you write 1 point at a time to the TDMS file then there is header information for every data point in your file.” I have also read in a different forum that LabVIEW 2009 introduced a feature called “one header only” for TDMS (if you keep writing to the same channels, every time with the same number of values, you'll have only one header in the file). However, I can only find that property in the Write to Measurement File Express VI. Even in the table of properties in the 2013 Help for the TDMS Set Properties function, I do not see a “one header only” property. Can anyone tell me how to do this with the standard and/or advanced TDMS functions?

From the LabVIEW 2013 Help, “Standard versus Advanced TDMS Functions”: “The standard TDMS VIs and functions write meta data and raw data at the same time. The Advanced TDMS VIs and functions enable you to write meta data and raw data separately.” So, again, does this mean that the Advanced TDMS functions DO NOT create a header on disc each time you write? Or does this depend solely on setting the minimum buffer size, NI_MinimumBufferSize? What about interleaved versus decimated writes? How does the data layout affect when or if a header gets rewritten to the TDMS file? Do I even need to do anything different, or just, as quoted, “keep writing to the same channels, every time with the same number of values” to avoid creating a new header?

Thanks in advance!

zaizhou.ma · ‎02-24-2014

Hi,

The one-header-only feature is enabled by default and cannot be turned off. But it only takes effect under strict conditions: every write must go to the same channels with the same number of values. To reduce the number of headers, you can set the 'NI_MinimumBufferSize' property on the channels. TDMS will then hold each channel's values in memory until the number of buffered values is larger than 'NI_MinimumBufferSize', and only then write them to the file with a new header.
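A toy Python model can make the buffering behaviour described above concrete. This is only a sketch, not NI's implementation: `BufferedChannel` and `simulate` are hypothetical names, each flush is assumed to cost one header, and the threshold is assumed to be "at least `min_buffer_size` values" (the post says "larger than", and the exact comparison is not confirmed here).

```python
class BufferedChannel:
    """Toy model of a channel with a minimum buffer size (not NI's code)."""

    def __init__(self, min_buffer_size):
        self.min_buffer_size = min_buffer_size
        self.pending = []   # values held in memory, lost on power failure
        self.segments = []  # each flush is assumed to cost one header

    def write(self, values):
        self.pending.extend(values)
        # Assumed rule: flush once enough values have accumulated.
        if len(self.pending) >= self.min_buffer_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.segments.append(list(self.pending))
            self.pending.clear()


def simulate(min_buffer_size, n_writes):
    """Write one value per call; return (flush count, values unflushed at close)."""
    channel = BufferedChannel(min_buffer_size)
    for i in range(n_writes):
        channel.write([float(i)])
    unflushed = len(channel.pending)
    channel.flush()  # closing the file is assumed to flush the remainder
    return len(channel.segments), unflushed


print(simulate(1, 1050))    # (1050, 0)
print(simulate(100, 1050))  # (11, 50)
```

In this model a threshold of 100 turns 1050 single-value writes into 11 flushes, but up to 99 values sit in memory at any moment, which is the trade-off against data loss.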

Based on your description, I believe the NI_MinimumBufferSize property is what you need, and it works with the standard TDMS functions.

Thanks,

Zaizhou Ma

NI-hilator · ‎02-24-2014

Thanks for the reply Zaizhou.

I am aware that setting the buffer size reduces the number of writes, but at the risk of data loss. You mentioned "... set the 'NI_MinimumBufferSize' property to channels". What do you mean by "to channels"? Do you mean set it for each channel?

Also, according to the LabVIEW 2013 Help, NI_MinimumBufferSize is valid only if the data layout input of the TDMS Write function is decimated (i.e. non-interleaved). Does an interleaved write result in a new header with each write? Again, what about the Advanced TDMS functions: will a TDMS Advanced Open with a TDMS Advanced Write bypass the header being written with each write?

Karl-G · ‎02-25-2014

Looking over the Advanced palette of TDMS streaming options, it appears that those VIs will allow you to specify asynchronous or synchronous data streaming for the chosen file. I don't see any further options for formatting the write operation.

Honestly, for your application (small amounts of data), TDMS streaming may not be the ideal approach. You may get better results with a standard write to an ASCII file, which would give you more control over what is written. Rather than continuously writing small packets, you could buffer the acquisition in software and then write a larger block at a specified time interval or sample count.

NI-hilator · ‎02-25-2014

Thanks for your feedback, Karl. I have approximately 20-30 data values that are recorded continuously, i.e. for weeks and months, some every 10 seconds and some every 10 milliseconds. I think a text file would become unwieldy.

NI-hilator · ‎02-26-2014

Will a TDMS Defrag remove redundant header information?

Karl-G · ‎02-26-2014

I'm unsure which parts of the TDMS index the TDMS Defragment function will affect. It will attempt to declutter the file to improve access performance.

Similar to my suggestion of using a different file type, buffering a number of samples into fewer write operations will also limit the number of headers written. Essentially, the headers contain timestamp and channel data, which should only be written once per write. So if 20 write operations of 1 data point each are called, the result is 20 headers.

Feel free to explore the effects of the Defrag function and to look into using a larger software-buffered write operation.
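The cost of one header per write can be estimated with some rough arithmetic. The sketch below assumes the worst case where every write gets its own full header; the 28-byte lead-in is my understanding of NI's published TDMS format description, and the 100-byte metadata size and the `overhead` helper are purely illustrative.

```python
LEAD_IN_BYTES = 28  # assumed segment lead-in size; check NI's format description


def overhead(n_values, values_per_write, meta_bytes=100, value_bytes=8):
    """Return (header_bytes, data_bytes) if every write gets its own header."""
    n_writes = -(-n_values // values_per_write)  # ceiling division
    header_bytes = n_writes * (LEAD_IN_BYTES + meta_bytes)
    data_bytes = n_values * value_bytes
    return header_bytes, data_bytes


print(overhead(20, 1))   # 20 writes of 1 point -> (2560, 160)
print(overhead(20, 20))  # 1 write of 20 points -> (128, 160)
```

Under these assumptions, writing 20 points one at a time spends 16 times more bytes on headers than on data, while a single buffered write keeps the header smaller than the data.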

NI-hilator · ‎02-26-2014

Thanks Karl.

I just read the following description of TDMS Defrag in a white paper: "Each call of the TDMS File Write VI records a block of data to file, which may contain several channels. This prevents the data for a single channel from being written contiguously across multiple writes. The TDMS File Defragment VI reorganizes the file by channel to optimize read operations. In this way, the user can take the performance penalty only once by defragmenting the TDMS file “offline” rather than with every read." But still, it did not explicitly state that the extra headers are removed, although the reduction in file size certainly indicates as much. I verified it by opening the TDMS file before and after a defrag in a hex editor. What remains a question is how interleaved versus decimated writes affect whether there is one header or multiple headers. Here is what I found:

Interleaved, one group, repeated write of same channels = one header

Decimated, one group, repeated write of same channels = one header

Interleaved, two groups, repeated write of same channels = multiple headers

Decimated, two groups, repeated write of same channels = multiple headers

So it seems that with the standard TDMS Write, the data layout (decimated/interleaved) has no effect on the number of headers, so long as you write the same number of values to the same channels. It's too bad this "One header only" concept wasn't carried through to include repeated writes to the same groups. I noticed that TDMS Write can only write to one group at a time, as evidenced by the fact that it accepts an array of channel data but only a single group name.
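One way to read the four results above is that a header is skipped only when a write looks exactly like the one immediately before it. The Python sketch below encodes that guess; it is a rule consistent with these observations, not documented TDMS behaviour, and `headers_written` is a hypothetical helper.

```python
def headers_written(writes):
    """Count headers for a list of (group, channels, values_per_channel, layout)."""
    headers = 0
    previous = None
    for group, channels, n_values, _layout in writes:
        # The layout is deliberately ignored, matching the observation that
        # interleaved and decimated writes behave the same.
        signature = (group, tuple(channels), n_values)
        # Assumed rule: a new header is needed only when the signature
        # differs from that of the immediately preceding write.
        if signature != previous:
            headers += 1
            previous = signature
    return headers


same_group = [("G1", ["a", "b"], 1, "interleaved")] * 100
two_groups = [("G1", ["a", "b"], 1, "decimated"),
              ("G2", ["c", "d"], 1, "decimated")] * 100

print(headers_written(same_group))  # 1
print(headers_written(two_groups))  # 200
```

Alternating between two groups changes the signature on every call, so each of the 200 writes gets its own header in this model.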

As for buffering, according to the LabVIEW 2013 Help, NI_MinimumBufferSize is valid only if the data layout input of the TDMS Write function is decimated. In my case I am using the interleaved setting to write an array of data to multiple channels. I could, as you mentioned, buffer the data before writing to reduce the initial size of the file, but I want to minimize data loss in the event of a power outage.

Other than buffering the data for fewer writes, is there a way to write to multiple groups without causing multiple headers with each write?

zaizhou.ma · ‎02-27-2014

Yes, you are right. You should set NI_MinimumBufferSize on each channel if you need every channel to buffer values. And it is valid only for the decimated data layout.

For Advanced TDMS, some functions set header information and others write raw data. For example, if you call 'set channel information' once and then call ‘Advanced Write’ many times, there will be only one header. But you have to make sure that the raw data layout passed to ‘Advanced Write’ matches the metadata passed to 'set channel information'.
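The split between metadata and raw data can be pictured with a small Python stand-in. This is not the LabVIEW API: `ToyAdvancedWriter` and its methods are hypothetical names that only loosely mirror the VIs mentioned above, and the layout check is an assumption about what "matching" means.

```python
class ToyAdvancedWriter:
    """Hypothetical stand-in for 'set channel information' plus 'Advanced Write'."""

    def __init__(self):
        self.headers = 0
        self.layout = None  # (number of channels, values per channel)
        self.raw = []

    def set_channel_information(self, channel_names, values_per_channel):
        self.layout = (len(channel_names), values_per_channel)
        self.headers += 1  # metadata is written here, once

    def advanced_write(self, raw_values):
        if self.layout is None:
            raise RuntimeError("set channel information first")
        n_channels, n_values = self.layout
        # Assumed check: each write must carry exactly the declared amount.
        if len(raw_values) != n_channels * n_values:
            raise ValueError("raw data does not match the declared layout")
        self.raw.extend(raw_values)  # raw data only, no new header


writer = ToyAdvancedWriter()
writer.set_channel_information(["a", "b", "c"], values_per_channel=1)
for i in range(1000):
    writer.advanced_write([i, i + 0.1, i + 0.2])

print(writer.headers, len(writer.raw))  # 1 3000
```

A write that carries the wrong number of values raises an error in this model, which mirrors the requirement that the raw data agree with the channel information set beforehand.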

zaizhou.ma · ‎02-27-2014

@NI-hilator wrote:

Will a TDMS Defrag remove redundant header information?

Yes, sure. TDMS Defragment reorganizes all the groups and channels into a single segment (the one header you mentioned). So, after defragmenting, there should be only one header on disk. The file size should shrink and the access performance should improve.
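As a rough picture of that reorganization, the Python sketch below merges many small segments into one, keeping each channel's values contiguous and in order. It is a toy model under the assumption that one segment means one header; `defragment` and the dictionary layout are hypothetical and say nothing about the real on-disk format.

```python
def defragment(segments):
    """Merge a list of {(group, channel): values} segments into one segment."""
    merged = {}
    for segment in segments:
        for key, values in segment.items():
            # Append to the channel's existing data so its values stay in order.
            merged.setdefault(key, []).extend(values)
    return [merged]  # a single segment, hence a single header in this model


fragmented = [
    {("G1", "a"): [1]},
    {("G2", "c"): [10]},
    {("G1", "a"): [2]},
    {("G2", "c"): [20]},
]

print(len(fragmented))         # 4
print(defragment(fragmented))  # [{('G1', 'a'): [1, 2], ('G2', 'c'): [10, 20]}]
```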
