LabVIEW

File format choice

Hi,

 

I'm currently faced with a small design problem that I'd like your input on. For our application, software used for testing audio equipment on production lines and in R&D labs, I need to pick a file format to save our data to. I don't want to set a lot of limits on what the data could look like, but for now it's a series of user metadata (e.g. operator, SN, PN, firmware version) and a bunch of waveforms, xy traces, and results coming out of our test executor. These can all be nested in various groups.

I'm trying to optimize 2 things.
First is data interchangeability within our customers' test setups, which comprise our software and others as well. This means an easy way of parsing our files, or getting the information out of our software and into theirs. In practice that means existing support for parsing these files in other languages, since we don't have the resources to write parsers ourselves. The waveforms can go up to tens of MB, so not too large, and at this point ease of data exchange seems like a higher priority than minimizing storage space. We also currently differentiate between waveforms stored in this binary format and other xy pairs that we might output from various analysis steps, which will always be in the range of a few hundred points.
Secondly, we are getting requests to save to various tools like Tableau, WATS etc., so this file, and the API I'll have to design for our software to write it, will have to minimize the work required to import files directly into these tools.

A good API into our software could do the job, but we basically need our customers to do as little programming as possible. We already save to a database and to various file formats (txt, xlsx, wav, and a binary one as well), but they all have some problems associated with them. I'm deliberately not saying what I'm leaning towards, so as not to bias this thread from the very beginning.

Any thoughts on this are welcome!


Thank you,
Lucian Grec

Message 1 of 18

My first thought was TDMS

G# - Award winning reference based OOP for LV, for free! - Qestit VIPM GitHub

Qestit Systems
Certified-LabVIEW-Developer
Message 2 of 18

I work with OP. NI TDM doesn't seem to have wide industry adoption. It seems like it's just DIAdem plus an Excel converter. Some of our customers use WATS, some use Tableau, and some write their own in-house parsers in Python. So ideally we'd want a fairly general file format that works with, or could easily work with, all of those.

 

We're posting here mostly in case someone knows of a widespread industry standard measurement data format that works with common analysis packages.

Message 3 of 18

Have you considered HDF5?  I know a colleague who settled on that after some similar consideration for a well-defined format with reasonably wide support.  There's a VIPM package available.

 

 

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
Message 4 of 18

@lucian.grec wrote:

First is data interchangeability within our customers' test setup, comprised of our software and others as well.


What is "others"?

 

If these are third party programs (not NI based), do some research and see if there is a common standard file type that is understood by all.

 

Then study the full published file specification and implement it in LabVIEW. You have full control over every last byte in any file you write. 😄

Message 5 of 18

I did consider HDF5, but I have more or less excluded binary formats at this point, as they would make inspecting the data harder for the non-programmer/science types. Keep in mind we are not doing heavy streaming to disk, nor do we generate TB-sized files, so from my point of view the advantages of HDF5, TDMS, or any other binary format don't outweigh the disadvantages for our users.

We're trying to move away from CSV, even though at the end of the day it's pretty easy to view and parse. I wouldn't want our users to have to understand yet another complex format with its own API. I want to improve hierarchical data support in the file, which CSV doesn't lend itself to very well because of its tabular structure.
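
To illustrate the mismatch between nested results and a tabular file, here is a minimal Python sketch. The group and field names are hypothetical, not taken from the actual software; the point is only that every nested value has to be flattened into a path-style key before it fits a CSV row.

```python
import csv
import io

# Hypothetical nested result; names and values are illustrative only.
result = {
    "metadata": {"operator": "jdoe", "sn": "A123"},
    "groups": {
        "thd": {"value": 0.012, "unit": "%"},
        "response": {
            "left": {"level_db": -3.1},
            "right": {"level_db": -2.9},
        },
    },
}


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


rows = list(flatten(result))

# Write the flattened pairs as a two-column CSV held in memory.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["path", "value"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The hierarchy survives only as a naming convention inside the "path" column, so any consumer has to know that convention to rebuild the groups.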

Message 6 of 18

That's a good question that I can't answer at this point. Very likely not NI.
I might've also been a bit loose with the nomenclature. I am not out to write my own worse-than-HDF5 file specification 😄 but rather to choose between various text file types, e.g. JSON, XML, CSV and the like, that are ubiquitous nowadays. I am thinking JSON is easily extensible with other fields, can be nested as much as I please, and takes almost no effort for a Python user to load into a dictionary, and we won't have worse performance (i.e. file size) than a CSV file, since that was already ASCII to start with. On top of that, it should be very easy to read if prettified.
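
As a rough sketch of what that could look like from the Python side, the snippet below builds a nested document, writes it as prettified JSON, and loads it back into a dictionary. The layout and every field name ("metadata", "groups", "traces", "results") are assumptions made up for illustration, not an existing schema.

```python
import json

# Hypothetical document layout; all keys and values are illustrative.
document = {
    "metadata": {
        "operator": "jdoe",
        "sn": "A123",
        "pn": "PN-001",
        "firmware": "1.2.3",
    },
    "groups": [
        {
            "name": "frequency_response",
            "traces": [
                {
                    "name": "left",
                    "x_unit": "Hz",
                    "y_unit": "dB",
                    "x": [100.0, 1000.0, 10000.0],
                    "y": [-0.5, 0.0, -1.2],
                }
            ],
            "results": [
                {"name": "flatness", "value": 1.2, "unit": "dB", "passed": True}
            ],
        }
    ],
}

# indent=2 gives the "prettified" form that is easy to inspect by eye.
text = json.dumps(document, indent=2)

# Loading on the consumer side is a single call that returns plain dicts and lists.
loaded = json.loads(text)
trace = loaded["groups"][0]["traces"][0]
print(trace["name"], trace["x"], trace["y"])
```

New fields can be added to any level later without breaking a reader that only looks up the keys it knows.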

 

We could base64-encode our waveforms for a roughly 33% hit in file size compared to binary, but that really doesn't seem to be much of a concern.
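
A minimal sketch of the base64 idea, assuming float64 samples stored little-endian; the entry layout and its keys ("dtype", "byte_order", "encoding", "data") are hypothetical. Base64 emits 4 ASCII characters for every 3 input bytes, so the encoded text is about one third larger than the raw binary.

```python
import base64
import json
import math
import struct

# Illustrative waveform: 100 ms of a 1 kHz sine sampled at 48 kHz.
sample_rate_hz = 48000
samples = [math.sin(2 * math.pi * 1000 * n / sample_rate_hz) for n in range(4800)]

# Pack as little-endian float64, then base64-encode so it fits in a JSON string.
raw = struct.pack(f"<{len(samples)}d", *samples)
encoded = base64.b64encode(raw).decode("ascii")

# Hypothetical entry layout; the keys describe how to decode "data".
entry = {
    "name": "ch1",
    "sample_rate_hz": sample_rate_hz,
    "dtype": "float64",
    "byte_order": "little",
    "encoding": "base64",
    "data": encoded,
}
text = json.dumps(entry)

# Consumer side: parse the JSON, decode the string, unpack the doubles.
restored_entry = json.loads(text)
restored_raw = base64.b64decode(restored_entry["data"])
restored = list(struct.unpack(f"<{len(restored_raw) // 8}d", restored_raw))

overhead = len(encoded) / len(raw) - 1
print(f"raw: {len(raw)} bytes, base64: {len(encoded)} chars, overhead: {overhead:.1%}")
```

The round trip is bit-exact, which plain decimal text only guarantees if enough digits are written per sample. The trade-off is that the waveform itself is no longer human-readable, only the metadata around it.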

Message 7 of 18

@lucian.grec wrote:

That's a good question that I can't answer at this point. Very likely not NI.
I might've also been a bit loose with the nomenclature. I am not out to write my own worse-than-HDF5 file specification 😄 but rather to choose between various text file types, e.g. JSON, XML, CSV and the like, that are ubiquitous nowadays. I am thinking JSON is easily extensible with other fields, can be nested as much as I please, and takes almost no effort for a Python user to load into a dictionary, and we won't have worse performance (i.e. file size) than a CSV file, since that was already ASCII to start with. On top of that, it should be very easy to read if prettified.

 

We could base64-encode our waveforms for a roughly 33% hit in file size compared to binary, but that really doesn't seem to be much of a concern.


I was going to suggest JSON. I don't like XML.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 8 of 18

@Mark_Yedinak wrote:

I was going to suggest JSON. I don't like XML.


I am also not a fan of XML. JSON, in my opinion, is easier to read. So if you really need a hierarchical, ASCII/Unicode format, JSON would be the route I would go with.


GCentral
There are only two ways to tell somebody thanks: Kudos and Marked Solutions
Unofficial Forum Rules and Guidelines
"Not that we are sufficient in ourselves to claim anything as coming from us, but our sufficiency is from God" - 2 Corinthians 3:5
Message 9 of 18

I would still recommend TDMS: the hierarchy is specifically designed for measurements and supports attributes and events.

 

JSON also has good support but suffers from the DOM standard, and W3C won't / can't access it in a thread-safe manner.

 

SCOUT by SignalX is a reasonable  TDMS Editor and viewer for rough eyeballing 

 

npTDMS  is well featured for your py gurus

 

The TDMS addon for Excel is out there everywhere 

 

MATLAB support is around

 

Some quick Googling suggests Excel, MATLAB and Python are each supported in Tableau. I'll let y'all dig in for more.

 

And of course, you can always use the Native LabVIEW functions to do the analysis in a better language environment 😉


"Should be" isn't "Is" -Jay
Message 10 of 18