LabVIEW

Reading a very large text file

I am reading a very large text file (.dat), nearly 20 MB. The text file contains different data types: float, date/time, string, etc. I am using the "Read from Spreadsheet File.vi" to do this. The only problem is that LabVIEW takes a long time (~10 seconds) to read the data and then process it. Is there an alternate method I can use to read large text files in less time? Please help. Thanks.
Message 1 of 9

You will have more control if you read the file as a simple string, then do the parsing manually. "Read from Spreadsheet File.vi" does not work well if you have a mix of data types anyway.

How is the file structured? Do all "lines" contain the same number of characters? Any patterns? What separates the fields (tab, spaces, etc.)?

Please attach a simple version of your data file (e.g. the first 100 lines) and we'll figure out the best way.
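Since LabVIEW diagrams can't be pasted as text, here is a rough Python sketch of the read-then-parse idea, just to show the shape of it. The file name is a placeholder, and it assumes tab-separated fields:

```python
# Minimal sketch: read the whole file as one string, then parse manually.
# "data.dat" is a placeholder; adjust the delimiter to match your file.
with open("data.dat", "r") as f:
    raw = f.read()               # a single read pulls the entire file into memory

rows = []
for line in raw.splitlines():
    fields = line.split("\t")    # split each line on the tab delimiter
    rows.append(fields)          # keep fields as strings; convert per column later
```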

Message 2 of 9

I have attached a sample file. All lines may not contain the same number of characters. The pattern is as follows:

The first few rows are comments (strings that can run to hundreds of characters).

The next row is a header (always the same).

The remaining rows are data separated by tabs. The row length may vary, but the number of columns is fixed.

I hope this helps.

Thanks.

Note: The file extension is .xls in this example. In my actual code, the file extension is .dat.

Message 3 of 9
I think the first question we need to ask is what kind of computer you have; that may be the fundamental problem. My machine is a P4 3 GHz with 512 MB of RAM, and it takes just about 5 seconds to read a 20 MB file in the format you gave (278,531 lines) using the "Read from Spreadsheet File" VI. That's not unreasonable for a file of that size on this type of computer. I simplified the VI by using the raw file I/O functions followed by the "Spreadsheet String to Array" function, just like in the "Read from Spreadsheet File" VI, and there was a small improvement in speed (about half a second at most). Most of that time was actually spent in the "Spreadsheet String to Array" function performing the conversion from string to number.
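Since the diagram itself can't be pasted as text, here is a rough Python equivalent of that test, timing the raw read separately from the string-to-number conversion. The file name is a placeholder, and it assumes the comment and header rows have already been stripped so every line is numeric:

```python
import time

# Time the raw file read separately from the text-to-number conversion.
# "big.dat" is a placeholder; assumes purely numeric tab-separated rows.
t0 = time.perf_counter()
with open("big.dat", "r") as f:
    raw = f.read()
t1 = time.perf_counter()

data = [[float(x) for x in line.split("\t")]
        for line in raw.splitlines() if line]
t2 = time.perf_counter()

print(f"read:  {t1 - t0:.2f} s")
print(f"parse: {t2 - t1:.2f} s")   # the conversion is usually the larger share
```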
Message 4 of 9
This is just a 12 KB file. I have files that are ~20 MB. So you are implying that 10-12 seconds for a 20 MB file is reasonable? I want to know if there is a way to reduce this time.
Message 5 of 9

I agree with altenbach. I would read in the lines as straight text and parse later.

Message 6 of 9
20 MB is pretty big. The file I/O may be hardware bound. It takes me about 82 seconds to load my 187 MB monsters. It's a pain, but there are a lot of data points in there. I do not think that LabVIEW is making it particularly slow; it takes about the same amount of time to open them in WordPad. Mind you, this is over a network, which may be adding a bit to the time.

What you may be able to do is feed the data into queues and start processing it as it is being read. I don't know what sort of analysis you want to perform, but perhaps the processor could be working on the data while the file-read VI is waiting for the disk to spin.
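In text form (a Python sketch rather than a LabVIEW diagram), the producer/consumer idea looks roughly like this; the file name and chunk size are placeholders to tune:

```python
import queue
import threading

# Producer/consumer sketch: one thread reads the file in chunks of lines
# while another parses them, so the CPU works while the disk is busy.
CHUNK = 10_000                      # lines per queue element (a tuning guess)
q = queue.Queue(maxsize=8)          # bounded queue keeps memory in check

def producer(path):
    chunk = []
    with open(path, "r") as f:
        for line in f:
            chunk.append(line)
            if len(chunk) == CHUNK:
                q.put(chunk)
                chunk = []
    if chunk:
        q.put(chunk)
    q.put(None)                     # sentinel: no more data

def consumer():
    while True:
        chunk = q.get()
        if chunk is None:
            break
        # parse/process here while the producer keeps reading
        _ = [line.split("\t") for line in chunk]

t = threading.Thread(target=producer, args=("big.dat",))  # placeholder path
t.start()
consumer()
t.join()
```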
Message 7 of 9
I'm saying that the amount of time is relative to your computer, which is why I asked what type of computer you were using. I took the sample file you posted and duplicated the data until I had a 20 MB file. Then I used the core file I/O functions and fed the output of the Read function directly into a "Spreadsheet String to Array" function, as shown in the attachment. In essence I created a slimmed-down version of the "Read from Spreadsheet File" VI. The best I could get was 5 seconds, and I don't think that's unreasonable for the computer I had.

As to the suggestion of reading the whole file as straight text and parsing later, that's precisely what this example does: it reads the whole file at once and then parses it with the "Spreadsheet String to Array" function. It's still going to take time with that amount of data, and that time depends on CPU horsepower and the amount of RAM you have.

If you were to use binary files, you could gain speed since you would not need the text-to-number conversion, which is where the time is being chewed up.
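As a sketch of that idea outside LabVIEW (Python, with placeholder file name and values): the numbers are written as raw 8-byte doubles, so reading them back is a block copy with no text parsing:

```python
import struct

# Write a handful of doubles as raw binary, then read them straight back.
# No string-to-number conversion is needed on the way in.
values = [1.5, 2.25, 3.0]           # placeholder data

with open("data.bin", "wb") as f:
    f.write(struct.pack(f"<{len(values)}d", *values))   # little-endian doubles

with open("data.bin", "rb") as f:
    raw = f.read()
restored = struct.unpack(f"<{len(raw) // 8}d", raw)     # 8 bytes per double
```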
Message 8 of 9
Thanks for all your replies. I will try your suggestions and hopefully have a reasonable load time at the end of it.
Message 9 of 9