LabVIEW


Best log file format for multivariable non-continous time series

Databases or TDM(S) files are great, but what if you cannot use a database (due to the type of target) and TDM files seem unsuitable because the data does not come in blocks of continuous time series? What is the best file option for data logging?

 

Scenario:

  1. The number of variables you are going to log to a file can change during run-time
  2. The data is not sampled at fixed intervals (e.g. it has been deadband filtered)
  3. The files must be compact and fast to search through (i.e. binary files with known positions of the time stamps, channel descriptions etc.)
  4. Must be supported on compact fieldpoint and RIO controllers

Right now we use our own custom format for this, but it does not support item 1 in the list above (at least not within the same file), and it would be much nicer to have an open format that other software can read as well.
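For illustration, here is a minimal sketch (Python standing in for G, since the idea is language-independent) of the kind of fixed-record binary layout that requirement 3 implies. The record layout and function names are hypothetical, not part of any existing format:

```python
import struct
import time

# Hypothetical fixed-size record: one f64 timestamp + one f64 value.
# Fixed-size records give the "known positions" property from the list
# above: record i of a channel starts at header_size + i * RECORD.size.
RECORD = struct.Struct("<dd")  # little-endian: timestamp, value

def append_sample(f, value):
    """Append one deadband-filtered sample with its own timestamp."""
    f.write(RECORD.pack(time.time(), value))

def read_sample(f, index, header_size=0):
    """Random access by record index; no scanning required."""
    f.seek(header_size + index * RECORD.size)
    ts, val = RECORD.unpack(f.read(RECORD.size))
    return ts, val
```

Because every record carries its own timestamp, this layout tolerates irregular sample intervals (requirement 2); what it does not handle by itself is a channel count that changes at run time (requirement 1).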

 

Any suggestions?

 

 

Message 1 of 15
(3,879 Views)

HDF5 comes to mind, but you would need to compile it for the target controller OS.  Your best bet would probably be to modify your current custom format to support item 1.

Message 2 of 15
(3,868 Views)

I would really prefer a solution that is written in pure G with the source included (if not open), or one that is indirectly supported by NI.

 

It is a bit of a surprise that this does not come up often enough for it to be supported, e.g. in the TDM format (or for HDF5 support to be part of vi.lib... or database support on RT 😉).

 

I'm working on redesigning our custom file format, yes (perhaps by creating chunks that are given a preallocated size within the file; I'll read the HDF5 specification for ideas), but it really feels like I'm reinventing the wheel. Plenty of others must have similar data logging formats already.

Message 3 of 15
(3,849 Views)

I am unsure why TDMS would not work for you. Could you save your data as two vectors, one containing the data and the other the timestamps? To reduce disk usage, set a minimum buffer size so you don't constantly write single points (the latest TDMS API has this feature).
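The buffered two-vector idea can be sketched in plain Python (this stands in for the TDMS calls; `flush_fn` and `min_points` are illustrative names, not the actual TDMS API):

```python
# Accumulate points in memory and flush timestamps and values as two
# parallel vectors, so each flush produces one block write instead of
# one write (and one header) per point.
class BufferedChannel:
    def __init__(self, flush_fn, min_points=60):
        self.flush_fn = flush_fn      # e.g. a wrapper around a TDMS write
        self.min_points = min_points  # mimics the TDMS minimum buffer size
        self.timestamps, self.values = [], []

    def log(self, ts, value):
        self.timestamps.append(ts)
        self.values.append(value)
        if len(self.values) >= self.min_points:
            self.flush()

    def flush(self):
        if self.values:
            self.flush_fn(self.timestamps, self.values)  # two vectors per block
            self.timestamps, self.values = [], []
```

Calling `flush()` explicitly on a timer would bound the data at risk if the system crashes between automatic flushes.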

Message 4 of 15
(3,833 Views)

Based on your requirements, I would also say TDMS may be a good option for you.

Message 5 of 15
(3,820 Views)

You are right, I could use TDMS if I write the time as a separate channel in each of the groups.

 

It is not ideal that (as far as I understand) it adds headers on every write, though. (It ought to be possible to define the structure explicitly only on change, and then not get any new headers.) This fragmentation is reduced with buffering, but with typically just one sample per second or less I still have to write to disk every now and then to ensure that I do not lose hours of data if the PC/PAC crashes or loses power.

Message 6 of 15
(3,810 Views)

I did a test with a realistic data set (two log groups of 4 channels each, where one group produces 1 sample per second and the other 1 sample per minute, and where the maximum tolerable data loss from a crash is 1 minute of data), and the overhead with TDMS was about 29%.

 

It seems that, because of the way headers are added on each write, the format is less suited for long-term logging at low sample rates. That is unless you can accept the risk of data loss that comes with buffering (buffering reduces the number of headers added, but if the system crashes while readings are in the buffer, those readings are lost).
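The trade-off can be made concrete with a back-of-the-envelope formula. Here `header_bytes` is an assumed per-write header size, not a documented TDMS figure:

```python
def header_overhead(header_bytes, channels, samples_per_write, bytes_per_sample=8):
    """Fraction of each write taken by the header, for an assumed header size."""
    payload = channels * samples_per_write * bytes_per_sample
    return header_bytes / (header_bytes + payload)
```

Whatever the real header size is, the overhead fraction shrinks as `samples_per_write` grows, which is exactly why buffering helps, while the data at risk in a crash grows with the same parameter.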

Message 7 of 15
(3,801 Views)

Regarding the header problem: if you are using LV 2009 or a later version, there is a feature called "one header only" for TDMS. That means if you keep writing to the same channels, every time with the same number of values, you'll have only one header in the file.

If you are using LV 2010, there is another set of APIs called the TDMS Advanced API. With it, you have control over the number of headers: every time you call Set Channel Information it creates one header; otherwise, no further headers are written.

Message 8 of 15
(3,797 Views)

I did some tests of the performance. For a month's worth of data (2,592,000 rows) with 4 channels, I got the following results when reading all of the data:

 

1. TDMS file written as blocks of 60 values (1-minute buffers): 1.5 seconds.

2. As test 1, but with a defrag run on the final file: 0.9 seconds.

3. As tests 1 and 2, but with all the data written in one operation: 0.51 seconds.

4. Same data stored in a binary file (1 header + 2D array): 0.17 seconds.

 

So even if I could write everything in one go (which I cannot), reading a month of data is 3 times faster with a binary file. My application might get a lot of read requests and will need to read much more than 1 month of data, so the difference is significant (reading a year of data stored as monthly files would take me 12-18 seconds with TDMS files, but just 2 seconds with a binary file).
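The "known positions of the time stamps" property is what makes the binary file fast to search as well as to read: with fixed-size records you can binary-search on the timestamp field instead of scanning. A sketch (hypothetical record layout, Python for illustration):

```python
import struct

RECORD = struct.Struct("<dd")  # timestamp, value -- hypothetical layout

def find_first_at_or_after(f, t, n_records, header_size=0):
    """Binary search on the timestamp field: O(log n) seeks instead of a
    full scan. Works because every record has a fixed, known position."""
    lo, hi = 0, n_records
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(header_size + mid * RECORD.size)
        ts, _ = RECORD.unpack(f.read(RECORD.size))
        if ts < t:
            lo = mid + 1
        else:
            hi = mid
    return lo  # index of the first record with timestamp >= t
```

Locating a requested time range this way touches only a handful of records, after which the range itself can be read in one contiguous block.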

 

Because I'll be writing different groups of data at different rates, using the Advanced API to get just one (set of) header(s) is not an option.

 

TDMS files are very versatile: it is great to be able to dump a new group/channel into the file at any time, and to have a file format that other applications support as well. However, if the writes are many and each write is (has to be) small, performance takes a serious hit. In this particular case performance trumps ease of use, so I'll probably need to rewrite our custom binary format to preallocate chunks for each group (feature request for TDMS? 🙂 ).
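A minimal sketch of the preallocated-chunk idea (illustrative Python with an assumed chunk size; a real format would also keep an index of each group's chunk positions so readers can follow them):

```python
CHUNK_DATA = 4096  # assumed payload bytes reserved per chunk

class ChunkedGroup:
    """Each group fills its own fixed-size chunks; when a chunk fills,
    a fresh one is appended at the end of the file. Groups written at
    different rates therefore grow independently, without a header
    interleaved between every sample."""

    def __init__(self, f):
        self.f = f
        self.chunk_start = None
        self.used = 0

    def _new_chunk(self):
        self.f.seek(0, 2)                 # go to end of file
        self.chunk_start = self.f.tell()
        self.f.write(b"\0" * CHUNK_DATA)  # preallocate the whole chunk
        self.used = 0

    def append(self, record: bytes):
        if self.chunk_start is None or self.used + len(record) > CHUNK_DATA:
            self._new_chunk()
        self.f.seek(self.chunk_start + self.used)
        self.f.write(record)
        self.used += len(record)
```

Each group's samples stay contiguous within its chunks, so reads remain mostly sequential, at the cost of some slack space in partially filled chunks.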

Message 9 of 15
(3,763 Views)

Hi MTO,

 

Thank you very much for testing and thinking about the TDMS file format. I'm just wondering which version of LabVIEW you are using? And when you tested TDMS vs. binary, did you use "disable buffering" in the TDMS Open node?

 

TDMS has had some features for performance improvements, as I introduced above; in cases where you don't have too many headers, the performance of TDMS would not be far behind a binary file. Plus, if you use the TDMS Advanced API, TDMS can beat binary in high-speed streaming use cases.

 

Another main difference between TDMS and a binary file is that with a binary file you just stream the data to the file, and the file contains only the data that comes directly from LV; you have to take care of the file format yourself, and the data in a binary file is not well organized, or we can say it is not self-describing. In TDMS, the data is organized in a file/group/channel structure, which is very clear and can be read by multiple NI software products, like DIAdem, CVI, and so on.

Message 10 of 15
(3,743 Views)