Best log file format for multivariable non-continuous time series

Hi. Thanks for the support.

 

I'm running LabVIEW 2010 f2 32 bit.

 

As mentioned, the TDMS functionality is great, so I would love to be able to use it. But because of the way I need to do writes (many small unbuffered writes, since buffering risks data loss on a crash, all accumulating into one big file), it is difficult to get the necessary read performance with TDMS (the extra file size is less important).

 

If it had been possible to preallocate space for a given group/channel, so that writes would be done more contiguously without repeating the header each time, then TDMS would probably have been fast enough and I would not have hesitated to use it. Instead I've begun writing a custom format that preallocates chunks for each group and then adds data to those chunks in a continuous stream. The header of each chunk contains enough information for me to quickly browse the available data and time frames. Searching for a given time period is done with a binary search within the chunks that contain data from the requested period. (That is another feature that would be nice in the high-level TDMS API too: the ability to search a channel for given values, or values within a range, and then retrieve data from it and the other channels that were written together with it 🙂.)
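To make the idea concrete, here is a minimal sketch in Python of what such a chunked format could look like. This is an illustration only, not the poster's actual implementation: the header layout (`HEADER`), record layout (`RECORD`), and function names are all assumptions. Each chunk is preallocated to a fixed capacity, the header tracks how many records are used and the covered time range, and a lookup binary-searches the time-sorted records.

```python
import struct
from bisect import bisect_left

# Hypothetical chunk header: group id, capacity, used count, first/last timestamp.
HEADER = struct.Struct("<IIIdd")
# Hypothetical record: (timestamp, value) pairs, kept sorted by timestamp.
RECORD = struct.Struct("<dd")

def new_chunk(group_id, capacity):
    """Preallocate a chunk: one header followed by zeroed record slots."""
    hdr = HEADER.pack(group_id, capacity, 0, 0.0, 0.0)
    return bytearray(hdr + b"\x00" * (capacity * RECORD.size))

def append(chunk, t, value):
    """Append one record in place; only count/t_min/t_max in the header change."""
    group, cap, count, t_min, t_max = HEADER.unpack_from(chunk)
    if count >= cap:
        raise ValueError("chunk full")
    RECORD.pack_into(chunk, HEADER.size + count * RECORD.size, t, value)
    t_min = t if count == 0 else min(t_min, t)
    t_max = t if count == 0 else max(t_max, t)
    HEADER.pack_into(chunk, 0, group, cap, count + 1, t_min, t_max)

def find_first_at_or_after(chunk, t):
    """Binary-search the used records for the first one with timestamp >= t."""
    _, _, count, _, _ = HEADER.unpack_from(chunk)
    times = [RECORD.unpack_from(chunk, HEADER.size + i * RECORD.size)[0]
             for i in range(count)]
    i = bisect_left(times, t)
    if i == count:
        return None
    return RECORD.unpack_from(chunk, HEADER.size + i * RECORD.size)
```

Because the header's `t_min`/`t_max` summarize each chunk, a reader can skip whole chunks when browsing and only binary-search the ones overlapping the requested period.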

Message 11 of 15

Thank you for the thoughts on TDMS!

 

We have a property called "NI_MinimumBufferSize" that you can set on TDMS channels. Once it is set, TDMS holds the data values temporarily in memory and flushes them all at once when the accumulated count reaches the threshold (say 1000 to 10000). This decreases the number of headers, and from my understanding of what you described it is similar to your "preallocation", except that the values are staged in memory rather than on disk.
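The buffering behaviour this property enables can be sketched in a few lines of Python. This is only an illustration of the concept (values accumulate in memory and are written as one block, so one header serves many values); the class and names are invented, not the TDMS API.

```python
class BufferedChannel:
    """Sketch of flush-on-threshold buffering, as NI_MinimumBufferSize provides."""

    def __init__(self, flush_threshold, write_block):
        self.flush_threshold = flush_threshold
        self.write_block = write_block   # callback that writes one header + N values
        self.pending = []

    def write(self, value):
        """Stage one value; flush automatically when the threshold is reached."""
        self.pending.append(value)
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write all pending values as a single block (one header for the lot)."""
        if self.pending:
            self.write_block(list(self.pending))
            self.pending.clear()
```

With a threshold of, say, 5000, five thousand values share one header instead of each small write emitting its own.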

 

You can refer to the details of this property in the help of "TDMS Set Property".

 

Thank you again! 

Message 12 of 15

Yes, I know about the buffer size, but the problem is that we generate data rather slowly and still need to write to disk relatively often so that we do not lose data if the PC or software has an unexpected/uncontrolled shutdown. That is why we have to write to file once a minute, and each write will contain only between 1 and 60 values, so the buffer feature does not let us eliminate the overhead. If the TDMS format allowed us to avoid the repeated headers in another fashion, the problem could be solved. That is what I had in mind with the preallocation idea mentioned in the previous message.
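A rough back-of-the-envelope calculation shows why 1 to 60 values per flush is the painful range. The ~100-byte per-segment header size below is an assumption for illustration, not the exact TDMS segment layout; the point is only that with tiny flushes, the fixed header cost dominates.

```python
HEADER_BYTES = 100   # assumed per-segment header + metadata (illustrative)
SAMPLE_BYTES = 8     # one float64 value

def overhead_ratio(samples_per_flush):
    """Fraction of the written bytes that is header rather than data."""
    data = samples_per_flush * SAMPLE_BYTES
    return HEADER_BYTES / (HEADER_BYTES + data)

# At 1 sample per flush roughly 93% of the bytes are header;
# even at 60 samples per flush it is still about 17%.
```

More importantly for this thread, the scattered headers are what hurt read performance, since a reader must walk every small segment instead of streaming contiguous data.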

Message 13 of 15

OK, I understand; preallocating the size is a good idea. However, what happens if a crash occurs? Would the preallocated space then be filled with dummy values, like all zeros? My other concern is whether preallocation would affect performance. What do you think?

Message 14 of 15

If a crash happens when there is no write operation in progress, the preallocated but unused part will stay as it is, and the header will tell how much of it has been filled with data (alternatively, we can detect this from the content of the chunk).

 

If the crash happens during a write you might get a partial entry.

 

- If the partial entry is in the data section, the affected entries are simply excluded because the header has not been updated yet (and because the writes happen relatively often, we risk losing at most a minute of data).

- If it happens during an update of the header, the file could be corrupted, but other chunks might still be OK, and they could also allow us to reconstruct the corrupted chunk.

 

We already have an auto-repair feature in the log file API we use today. It detects inconsistencies in the data on every write and tries to eliminate them. If a crash has corrupted the last entries of the file, those errors will be removed from the file on the next write operation.
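One common way to implement such a repair pass, sketched below in Python, is to end every record with a completion marker that is written last; a record without the marker is a torn write and everything from it onward is discarded. The record layout and the `MARKER` sentinel are invented for this illustration and are not the poster's actual repair mechanism.

```python
import struct

# Hypothetical record: (timestamp, value, completion marker), marker written last.
RECORD = struct.Struct("<ddI")
MARKER = 0xC0FFEE   # assumed "record complete" sentinel

def repair(buf):
    """Return the complete leading records; truncate at the first torn write."""
    good = []
    usable = len(buf) - len(buf) % RECORD.size   # drop any partial trailing bytes
    for off in range(0, usable, RECORD.size):
        t, v, marker = RECORD.unpack_from(buf, off)
        if marker != MARKER:
            break   # partial/corrupt tail begins here; discard the rest
        good.append((t, v))
    return good
```

Run on every open (or write), this keeps the file self-healing: at most the entries from the interrupted flush are lost, matching the at-most-one-minute bound discussed above.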

Message 15 of 15