LabVIEW

How do you handle and distribute huge amounts of data?

I've got about 315 channels and will be collecting data at about 1300 Hz.  Data collection will last about 3-4 minutes.  I'm at a loss on how to distribute the data to researchers for evaluation.  These researchers will not have LabVIEW or any other NI software to use.  I've tried the TDM files and the add-in for Excel 2007, but the files are just too large for Excel to handle in a timely manner.  It took just over three hours to open one of my test files.  Does anybody have any suggestions on how to manage and distribute files this large?
Message 1 of 20
Do they need to have access to the raw data?
 
That is a lot of data: 4 minutes x 60 sec/minute x 1300 Hz x 315 channels x 4 bytes (for dbl) is over 393 million bytes if stored in binary.  If it were just strings in a text file, it would grow even more.  So I'm sure it would take a long time to open (though I wouldn't have thought 3-4 hours).
 
If the researchers only need some calculations or analysis on that data, or charts and graphs, perhaps the Report Generation VIs could handle it.


Message 2 of 20
Well I'm sitting in Cleveland Ohio right now so I am going to have to disregard your message.....hahhahaha...
 
The big problem is that the raw data is what is important to them.  I just set up the DAQ system, write the code, and collect the data.  The researchers want all the data to evaluate.  I'm just not sure how to get it to them.  This is the first time I have ever worked on a project with these kinds of data requirements.
Message 3 of 20


@Frank Rizzo wrote:
Well I'm sitting in Cleveland Ohio right now so I am going to have to disregard your message.....hahhahaha...

:)

I'll be curious to see if Jamal Lewis's yardage improves for you guys this year after having some definite dropoffs with us the last couple.

Though now that he's not playing against you, your defense's numbers against the run should improve. ;)

 

I'd be surprised if you could get that data to the researchers at all using Excel.  It has a limit of 256 columns and 65,536 rows.  If you had a column per channel and a row per data point, you'd be talking 315 columns and 312,000 rows for 4 minutes of data.  I guess you could always break it up into several files, being sure to leave some spare rows and columns so they have room to do some calculations.
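Just to put rough numbers on that split (assuming the older 256-column / 65,536-row limits and no spare cells):

```python
# back-of-envelope math for splitting the dataset into Excel-2003-sized pieces
import math

rows = 4 * 60 * 1300                     # 4 minutes at 1300 Hz = 312,000 samples
cols = 315                               # channels
row_chunks = math.ceil(rows / 65_536)    # 5 blocks of rows
col_chunks = math.ceil(cols / 256)       # 2 blocks of columns
print(row_chunks * col_chunks, "files")  # -> 10 files, before leaving any spare cells
```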

Out of curiosity, I created a spreadsheet where every cell was filled with a 1.  It took a good 30 seconds to save and was over 100 MB.  That was probably about 1/10 of the amount of data you're dealing with.  And going back to my earlier calculations, I would guess that as a text file, that much data would need about 10 bytes per value, thus getting you to about 1 GB in text files.

I would ask them what kind of software they will use to analyze these files and what format they would prefer.  There are really only two ways to get it to them: an ASCII text file, which could be very large but would be the most flexible to manipulate, or a binary file, which would be smaller but could cause problems if they don't interpret the file the same way you wrote it out.
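For a feel of that size difference, here is a toy comparison (the dimensions are made up for illustration and have nothing to do with your actual channel count), assuming 16-bit integer samples:

```python
# toy illustration of the ASCII-vs-binary size tradeoff
import os
import numpy as np

codes = np.random.randint(-32768, 32767, size=(10_000, 32), dtype=np.int16)
codes.tofile("toy.bin")                                  # raw 16-bit binary
np.savetxt("toy.txt", codes, fmt="%d", delimiter="\t")   # same values as tab-separated text

print(os.path.getsize("toy.bin"), "bytes as binary")     # 10,000 x 32 x 2 = 640,000
print(os.path.getsize("toy.txt"), "bytes as text")       # roughly 3-4x larger
```

The gap only widens once you start writing floating-point values with several digits of precision.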

I haven't used the TDM add-in for Excel before, so I don't know how powerful it is, or if there is a lot of overhead involved that would make it take 3 hours.  What if you create multiple TDM files?  Let's say you break it down into bunches of channels and only 20-30 seconds of data at a time, something that would keep the size of the data array within the row and column limits of Excel (I am guessing about 10 files).  Would each file go into Excel so much faster with the add-in that even if you need to do it 10 times, it would still be far quicker than one large file?  I am wondering if the add-in is spending a lot of time figuring out how to break down the large dataset on its own.

One other question: have you tried TDMS files as opposed to TDM files?  I know they are an upgrade and are supposed to work better with streaming data.  I wonder if they have any improvements for larger datasets.

 


 

Message 4 of 20
No one in their right mind *wants* to see 1GB of data, or anything close to it.  My experience is that they say "Give it all to me" because they don't know what they want.

Were it me, I would:
1.  Ask them what they want.  Then ask them again in a different way, and perhaps ask a third time to a different person.  Then provide it or say why it's a bad idea, and ask again.
2.  If #1 doesn't reduce the data set, provide them with a big, ugly text file (or a series of big, ugly text files, say, one file for each channel organized in folders; see the sketch after this list).  Text never goes out of style, and they may have already identified the "perfect tool" to analyze this stuff.  More power to them.
3.  Along with #2, provide a series of picturesque, polished graph summaries from LabVIEW's Report Generation VIs or image captures.  Make them shiny, neat snapshots of the data.  Embed them in a local web page with a good indexing system, shower them with glitter, whatever it takes.  (This may take some domain knowledge of what's being measured and why.)
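For what it's worth, #2 is easy to script outside of LabVIEW as well.  A rough Python sketch (the file names, folder layout, and in-memory array are all just placeholders) for splitting a run into per-channel text files:

```python
# sketch: write one plain-text file per channel, with a time column for convenience
import os
import numpy as np

def write_per_channel(data, out_dir, rate_hz=1300.0):
    """data is an (n_samples, n_channels) array loaded from whatever LabVIEW wrote."""
    os.makedirs(out_dir, exist_ok=True)
    t = np.arange(data.shape[0]) / rate_hz
    for ch in range(data.shape[1]):
        path = os.path.join(out_dir, "channel_%03d.txt" % ch)
        np.savetxt(path, np.column_stack((t, data[:, ch])),
                   fmt="%.6f", header="time_s\tvalue", comments="")

# demo with fake data just to show the call
write_per_channel(np.random.rand(1000, 8), out_dir="demo_run_channels")
```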

I have never seen an engineer or scientist turn down a graph.  If you're fortunate, they will end up wanting only the graphs and you can just zip and archive the data.  Less fortunate, and you deliver both (if the graphs are good, they'll still never look at the data, but it warms a techie's heart to know he has the goods on a graph).

Joe Z.
Message 5 of 20
Hi Ravens Fan,

To the horror of programmers everywhere, Excel 2007 now supports something like 16k columns and 1M rows.

We've actually kept that to ourselves where I work.  We already get spreadsheets with 20 tabs, 255 columns, and 50k rows.  Extending that further is... painful to contemplate.

Joe Z.


Message 6 of 20


@Underflow wrote:
Hi Ravens Fan,

To the horror of programmers everywhere, Excel 2007 now supports something like 16k columns and 1M rows.

We've actually kept that to ourselves where I work.  We already get spreadsheets with 20 tabs, 255 columns, and 50k rows.  Extending that further is... painful to contemplate.

Joe Z.



I did not realize that.  Honestly, I didn't even realize there was an Excel 2007 out.  I think the expanded worksheet space is long overdue, but you're right, it can cause a lot of headaches.  Like they say, trash will expand to fill the amount of available space.  Our company has just about finished upgrading all but the oldest, lowest-common-denominator PCs from Win2000 to XP, and the office package I'm running has Excel 2002.  It will probably be another year or two before they start installing Vista on the newest PCs, and the Office packages will probably be upgraded around the same time.

I realize I made an error in my earlier statement.  I said that double precision would be 4 bytes per value, but it is actually 8 bytes, so all my calculations would be twice as large.  They would be accurate for single precision numbers.  But I was just trying to put a sense of scale to the massive amount of data that was being generated.

Message 7 of 20
I have to collect and analyze very large datasets produced by LV programs every day.  A few suggestions:

1)  If you are collecting your data from DAQ boards, they are most likely 16-bit, so when using the DAQmx Read VI, read the data in as raw 16-bit integers.  If the researchers really want voltages or other calibrated floating-point values, just tell them to divide the 16-bit integer data by the appropriate value (based on the gain you set when sampling the channels, for example).  Let them deal with doubles.

2)  Stream the data to disk as raw binary values, so that you have chunks of 315x2 bytes (for 16-bit integers), for a total file size of 315 x 2 x (N samples) bytes (this would be a 150-200 MB file based on your numbers).  This represents the smallest possible size of the data you can create (unless you want to try to zip it afterwards, but that probably won't help much).  If the researchers can't read raw binary data into their analysis program (and most decent analysis packages and/or custom-written software should be able to), then they need to learn how, or you can write a small translation program that they can use to convert the raw binary data to whatever format they desire (even the dreaded Excel file!).  A rough sketch of such a translation program follows this list.

3)  For each raw binary file, I'd create a small, text-based (perhaps XML-formatted) header file (same name as the data file but with a different extension) that summarizes the data file associated with it (time/date collected, relevant experimental parameters, perhaps some summary statistics, etc.).  This way they can open up the header file to quickly see the important information, and then use that information to target specific parts of the data in the binary file for further processing.

4)  Excel is a very poor choice for this data.  No matter what you do, Excel will not handle this much data well at all.  If the researchers only know how to use Excel, introduce them to the wonderful world of MATLAB or SigmaPlot or some other environment capable of handling large amounts of data.  If they really want to look at the data you produce, they'll have to learn to go beyond Excel at some point.  Researchers have to learn how to adapt to new software and new types of data.  As a researcher myself, I know that sometimes you have to drag them kicking and screaming.  "Because we're used to it" is not a good excuse.

5)  If they originally planned to use Excel, that means the type of analysis they expect to do is relatively simple.  Perhaps you can find out what calculations/graphs they would be creating in Excel with the data, and just pre-empt that by having your program automatically create that output.
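To make #1-#3 a bit more concrete, here's a rough sketch of what that small translation program for the researchers might look like in Python/NumPy.  The channel count, voltage range, byte order, file names, and XML tags are all assumptions for illustration; they'd have to match whatever you actually configure and stream from LabVIEW (note that LabVIEW's binary file write defaults to big-endian, so agree on byte order up front).

```python
# sketch: read the raw interleaved int16 stream plus a small XML header, scale to volts
# assumed header layout (illustrative only):
#   <run><rate_hz>1300</rate_hz><channels>315</channels><fullscale_v>10.0</fullscale_v></run>
import numpy as np
import xml.etree.ElementTree as ET

def read_header(path):
    """Pull acquisition parameters out of the XML header file."""
    root = ET.parse(path).getroot()
    return {
        "rate_hz": float(root.findtext("rate_hz")),
        "channels": int(root.findtext("channels")),
        "fullscale_v": float(root.findtext("fullscale_v")),
    }

def read_raw(path, channels, fullscale_v):
    """Read interleaved int16 samples into an (n_samples, n_channels) array of volts."""
    codes = np.fromfile(path, dtype="<i2")        # use ">i2" if the file was written big-endian
    codes = codes.reshape(-1, channels)           # one row per scan of all channels
    return codes * (fullscale_v / 32768.0)        # scale ADC codes to volts

if __name__ == "__main__":
    info = read_header("run001.xml")              # hypothetical file names
    data = read_raw("run001.bin", info["channels"], info["fullscale_v"])
    print(data.shape, "samples x channels at", info["rate_hz"], "Hz")
```

From there they can slice out whichever channels and time windows they care about, or dump selected pieces to text for Excel.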

Hope that helps.  Good luck!
Message 8 of 20

Thanks to everyone for all the good information.  I'll let you know what I end up doing.  If there are any other ideas or past experiences anyone has to share I would welcome them.  Thanks!

 

 

Message 9 of 20
I was surprised not to see much discussion about DIAdem here. Can anyone talk about how DIAdem handles "huge" amounts of data?
Message 10 of 20