
Optimal way to manipulate a big text file?

Hi all,

 

Here I start my story...

 

A long, long time ago, Mr. A had a big text file (say 120 columns × 840k rows), around 100 MB. He wanted to extract text columns according to a list of input numbers. Say,

if the inputs are 1, 2, 3, 10, 20, 30,

output = a file with 6 columns, taken from columns 1, 2, 3, 10, 20, 30 of the big text file.

 

Then he found that the VI took too long to do the extraction. (The subVI consists of the normal array functions: Index Array and Insert Into Array to build a new array, then saving to a text file.)

 

Hence, he chopped the big text file into smaller files (65k rows each), and found it still took a lot of time, although the extraction itself sped up a lot. Then he chopped the files down to 1024 rows each, and still ended up spending a few minutes to produce the output file (smaller files = more files = more time spent opening and closing them).

 

So, here I am asking for help from the experts here. Any ideas to optimize the approach shown above? Is there an ideal number of rows (at 120 columns) to shorten the extraction time in this situation?

 

Please advise. Thank you very much. 🙂

Message 1 of 19
What is the format of the text file? Fixed column widths, or delimited? Do you have an example file with, say, only about 10 rows (and 120 columns)?
Message 2 of 19
How are you reading the file - individual lines, chunks of text, or Read From Spreadsheet File? You should not need to chop your original file into smaller ones; you just need to avoid processing the entire file at once.

You'll save time if you preallocate as much as possible - for example, create your intermediate array (the array with just the desired columns) when your code first starts, and reuse it for each new chunk of data you read and process. Also preallocate your output file by setting the end-of-file marker to some value large enough to hold all the output - you can move the marker back to the actual end once you finish processing.

What is the format of your data? Even if it's numbers, treat it as text to avoid unnecessary conversions.
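To make that concrete, here is a minimal sketch of the chunked, text-only idea - in Python purely for readability, since the real implementation would be G code; the file names, chunk size, and zero-based column indices are placeholders of mine, not anything from the original VI:

```python
from itertools import islice

wanted = [0, 1, 2, 9, 19, 29]   # columns 1, 2, 3, 10, 20, 30 (zero-based)
CHUNK_ROWS = 65536              # rows per pass; tune to available memory

with open("big.txt") as src, open("out.txt", "w") as dst:
    while True:
        chunk = list(islice(src, CHUNK_ROWS))    # read one chunk of whole lines
        if not chunk:
            break                                # end of file reached
        for line in chunk:
            fields = line.split()                # space-delimited row, kept as text
            dst.write(" ".join(fields[i] for i in wanted) + "\n")
```

In LabVIEW terms that is roughly Read from Text File (set to read lines, with a count wired) inside a While Loop, Index Array for the wanted columns, and Write to Text File on a file refnum you open once before the loop.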
Message 3 of 19

Hi,

 

Thanks for your replies, guys.

 

What is the format of the text file?

.txt 

 

Fixed column widths, or delimited?

Yes, the number of columns is fixed for the whole file. Delimiter = <space>.

 

How are you reading the file?

By using "Read From Spreadsheet File".

 

Let me explain more about my purpose.

Another reason I chop the big file into 65k-row files is that the VI after this section (Spreadsheet String to Digital) cannot process huge data at once, so 65k rows is the maximum allowed in my case.

 

Then I chopped it smaller still, to see whether that would improve performance. My VI is something like:

Spreadsheet String To Array => index the wanted columns => append the columns => Array To Spreadsheet String

I believe this is no good for a big file.

 

Here I attach a text file sample. Thanks for the advice. 😄
Message 4 of 19

Hi,

I too met with the same problem, but I changed the file to binary format and used file pointers to read only the required data, rather than loading a huge text file into memory and extracting the required portion. The format I followed writes the data in row format (i.e., each row is a channel); when retrieving, each channel is read separately, streamed into a separate temp file, and manipulated there.
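Roughly, the idea looks like this - a Python sketch of the concept only, not JK's actual VI; the float64 record layout, dimensions, and file name are my assumptions:

```python
import struct

NUM_COLS  = 120                    # values per record (one record = one row/channel)
VAL_BYTES = 8                      # assuming each value is a float64
ROW_BYTES = NUM_COLS * VAL_BYTES   # fixed record size makes every offset computable

def read_column(f, col, num_rows):
    """Collect one column by seeking straight to its offset in each record,
    without ever loading the whole file into memory."""
    out = []
    for row in range(num_rows):
        f.seek(row * ROW_BYTES + col * VAL_BYTES)
        out.append(struct.unpack("<d", f.read(VAL_BYTES))[0])
    return out

with open("data.bin", "rb") as f:
    col30 = read_column(f, col=29, num_rows=840_000)   # column 30, zero-based
```

In LabVIEW this would be Set File Position followed by Read from Binary File inside the loop.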

Post back if you would like to know about it.

With regards,
JK
(Certified LabVIEW Developer)
Message 5 of 19

Hi,

 

Thanks, JK1, for giving me some idea about it.

I think I am doing something very similar to what you have done. Hope to learn more. 🙂

Message 6 of 19

JK1 wrote:

Hi,

I too met with the same problem, but I changed the file to binary format and used file pointers to read only the required data, rather than loading a huge text file into memory


You do not need to change the file to binary. All files are inherently binary - it's just a matter of how you choose to read them.

 


engwei wrote:

What is the format of the text file?

.txt


Ummm... well, the extension tells me nothing; that's not what I meant. It doesn't matter, since you uploaded the file and I can see what the format is. Just out of curiosity, how long did it take to extract the columns with your initial attempt, and how long with your chopped-up attempts? I wrote a quick and dirty VI that took about 3.5 minutes on a file with a little over 700K rows. I'll have to see if I can improve that.

 

I don't know where you're getting the 65K row limit. Are you trying to open these with Excel?

Message 7 of 19

smercurio_fc wrote:

JK1 wrote:

Hi,

I too met with the same problem, but I changed the file to binary format and used file pointers to read only the required data, rather than loading a huge text file into memory


You do not need to change the file to binary. All files are inherently binary - it's just a matter of how you choose to read them.

...


If the data file was originally saved as binary, then we could walk through the file by manipulating the file pointer and read just the values we want, since the records would be of fixed size.

 

Ben

Message 8 of 19

Ben wrote:

If the data file was originally saved as binary, then we could walk through the file by manipulating the file pointer and read just the values we want, since the records would be of fixed size.


Yes, but you can obviously do the same thing if the file is a text file, especially given the structured format of the file as we have it. Saving it as "true" binary will, of course, make the file a lot smaller, though. 😉
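For the record, a quick sketch of that same trick on the text file itself, assuming every line really has the same byte length (fixed column widths plus the line ending - worth verifying on the actual file first):

```python
with open("big.txt", "rb") as f:    # binary mode, so seek offsets are exact bytes
    LINE_BYTES = len(f.readline())  # assumes every line has the same byte length
    f.seek(123_456 * LINE_BYTES)    # jump straight to row 123456
    fields = f.readline().split()   # that row's columns, still as raw text
```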

Message 9 of 19

engwei wrote:

Here I attach a text file sample. Thanks for the advice. 😄


Lines 387 and 461 look corrupted. Can you please verify?

Message 10 of 19