LabVIEW


TDMS file size reduced by 6x through ZIP

OK, I'm using the TDMS format for streaming my data to disk.  I'm writing single-precision floating-point values to cut the file size in half.  I've got 6 channels at 50 kS/s (M-Series), and after about 10,000,000 samples I have a TDMS file of around 286 MB, which makes sense since this is about 240 MB of raw data.

However, when I ZIP the file, either in Windows or using LabVIEW, I can shrink it quite a bit, to around 44 MB.

Any thoughts on what's happening here?  It would seem like I'm losing some data or at least precision.  However, when I unzip, the file is completely normal.  Obviously storing the ZIP file seems like a much better option.

I'm hesitant to try to coerce my data into an unsigned or signed integer format, since I'm performing analysis and graphing of this data in real time, and doing a bunch of conversions from I16 or U16 to floating point would be painful; I might as well have the conversion done up front rather than do it myself.

I'm just sort of confused about the ability to compress a binary format so much.
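For anyone puzzled by the same thing: structured measurement data stored as 32-bit floats is highly redundant at the byte level, so a general-purpose lossless compressor can shrink it dramatically without losing a single bit. A minimal Python sketch reproduces the effect (the 50 Hz sine here is purely illustrative, not the poster's data):

```python
import array
import math
import zlib

# An illustrative 50 Hz sine sampled at 50 kS/s, stored as 32-bit floats,
# standing in for structured measurement data.
n = 100_000
samples = array.array("f", (math.sin(2 * math.pi * 50 * i / 50_000)
                            for i in range(n)))
raw = samples.tobytes()                    # 400,000 bytes of binary data

compressed = zlib.compress(raw, level=6)   # lossless DEFLATE, as used by ZIP
restored = zlib.decompress(compressed)

print(len(raw), len(compressed))           # the periodic signal shrinks heavily
assert restored == raw                     # every bit comes back intact
```

Real-world noise in the signal raises the floor, which is why a 286 MB acquisition stops at around 44 MB instead of shrinking toward nothing.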
Message 1 of 9
Only true white noise is incompressible; every other signal can be compressed to some degree.
I also suggest defragmenting your TDMS file; I think your 286 MB will shrink further.
I also suggest storing the data as I16 (read raw binary from your device) and scaling upon display. The examples show you how to do this.
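To make the suggestion concrete, here is a minimal sketch (in Python for illustration; in LabVIEW this would be a small scaling subVI) of keeping raw I16 counts on disk and applying a per-channel linear scale only when displaying. The gain/offset values are hypothetical placeholders for real calibration coefficients:

```python
import array

# Hypothetical per-channel linear calibration: volts = gain * counts + offset.
# Real coefficients would come from your device's scaling information.
GAINS = [10.0 / 32768.0] * 6      # e.g. a +/-10 V range over a 16-bit ADC
OFFSETS = [0.0] * 6

def scale_channel(raw_counts, channel):
    """Convert raw I16 counts for one channel to floating-point volts."""
    gain, offset = GAINS[channel], OFFSETS[channel]
    return array.array("f", (gain * c + offset for c in raw_counts))

counts = array.array("h", [-32768, 0, 16384, 32767])  # raw ADC codes
volts = scale_channel(counts, 0)   # scale only at display/analysis time
```

The raw I16 file stays half the size of a single-precision file, and the scaling cost is paid only for the data actually displayed.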

Ton
Free Code Capture Tool! Version 2.1.3 with comments, web-upload, back-save and snippets!
Nederlandse LabVIEW user groep www.lvug.nl
My LabVIEW Ideas

LabVIEW, programming like it should be!
Message 2 of 9
I'm already doing a TDMS defrag before calling the ZIP function in LV.  This reduces the TDMS index file a bunch but not the TDMS file by much.

The I16 calls just seem so painful... I started doing this, but in all honesty, it was tedious to recode everything since each channel has different scaling/calibration coefficients.  Maybe I need to make a quick conversion subVI.

One quick question: is the 1D raw I16 read for 6 channels that much faster than the unscaled 2D I16 call?  I know the samples are interleaved, and this is kind of painful since I'm already doing some gyrations to move the data into the FIFO buffer.

Thanks.
Message 3 of 9

Ah, the magic of lossless compression...

http://en.wikipedia.org/wiki/Lossless

Message 4 of 9
If you want/need compression, try NI-HWS (found on your driver CD).  It has native compression.  With your total data rate (300 kS/s), you may be able to stream and compress on the fly (depending on your computer specs).  There is no HWS-to-Excel converter, but you won't need one with that amount of data.  Most other analysis packages can read HWS using an HDF5 filter.  The under-the-hood file hierarchy is very similar to SCPI-DIF.
Message 5 of 9

Alternatively, just tell Windows to compress the folder that contains the files.  Right-click the folder, choose Properties, and click the Advanced button.  Enable compression, and select the option to apply it to the files within.

I've no idea what algorithm it uses, but it is very likely that datalogging files will shrink massively.  If you look at the file properties in Windows, it shows both the "Size" and the "Size on disk".

The CPU overhead should be fairly low, as it's handled at a very low level by the OS.  There are then no issues with file formats, as Windows handles the compression transparently.

Message 6 of 9
Sean,

Thanks for the reply.  However, this isn't applicable since the data collection and writing to disk are happening on a LV RT system, and hence Windows isn't available.

Does anyone know anything about the "NI_MinimumBufferSize" property for TDMS?  I can't seem to find much documentation on it.  I think it would help a lot in another area (trying to write a single value at a time to a TDMS channel).

Thanks.

-Mike
Message 7 of 9
Message 8 of 9
Regarding NI_MinimumBufferSize:

The best way to write data to TDMS files is in large chunks, such as 1000 points at a time. The reason is that TDMS files always append data to the end of the file, which makes writing a very fast operation; channel data in the file doesn't have to be stored contiguously. However, this also means that every time a channel gets written to the file, some indexing information is added so that it can be interpreted correctly later. If you're writing one point at a time, the indexing information might actually be larger than the actual data point. But if you write 1000 points at a time, you only need the same indexing information as for one point, so the balance is much better and you get much smaller files.

However, it can be inconvenient in a single-point application to buffer all your channels and only write them when they get to a certain size. This is what the NI_MinimumBufferSize setting does automatically for you. You can just write one point at a time, but each channel's data just stays in memory until it reaches the selected buffer size. Only then does it get written to disk.

Note that defragmenting your file collapses all the indexing information and puts all the channels together into contiguous chunks. But defragmenting a large single-point acquisition can take a lot of time, so it's best to either do the buffering yourself or set this property.
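The buffering behavior described above is easy to picture. The following Python sketch (the class name and flush callback are illustrative, not the TDMS API) shows the idea: single-point writes accumulate per channel in memory and are flushed in chunks, so the per-write indexing cost is paid once per chunk instead of once per point:

```python
class BufferedChannelWriter:
    """Sketch of NI_MinimumBufferSize-style buffering (illustrative only)."""

    def __init__(self, flush, min_buffer_size=1000):
        self.flush = flush                  # called as flush(channel, points)
        self.min_buffer_size = min_buffer_size
        self.buffers = {}                   # channel -> points pending on disk

    def write_point(self, channel, value):
        buf = self.buffers.setdefault(channel, [])
        buf.append(value)
        if len(buf) >= self.min_buffer_size:
            # One chunk written, one set of indexing overhead paid.
            self.flush(channel, buf)
            self.buffers[channel] = []

# Ten single-point writes with min_buffer_size=4 become two chunk writes,
# with two points still pending in memory.
chunks = []
writer = BufferedChannelWriter(lambda ch, pts: chunks.append((ch, len(pts))),
                               min_buffer_size=4)
for i in range(10):
    writer.write_point("ch0", float(i))
```

A real application would also flush any partially filled buffers when closing the file, which the sketch omits.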

P.S. Thanks Herbert.... 🙂

Message Edited by Jarrod S. on 08-17-2007 11:17 AM

Jarrod S.
National Instruments
Message 9 of 9