LabVIEW


read delimited spreadsheet with ~10e6 rows

Solved!

Background:

  • I am trying to read in a delimited text file with ~10 million rows that is ~8 GB in size.
  • The "Read Text File" function chokes on memory: it can't allocate enough. It dies "at the gate" before I even get to use the data.
  • I'm cleaning up the data and writing it to SQLite3 using the SQLite3 library by Dr. Powell.
  • After I read the spreadsheet, I use "Flatten To String" and then count characters.

I can't see why a file pointer (even on a FAT file system) couldn't make the read part of this work reasonably quickly.

 

I am trying to use a "read delimited spreadsheet" and increment the start of read offset by number of characters.  The idea is that I "gulp" a reasonable number of lines, count characters, increment the character skip value, process the lines, then return to "read delimited spreadsheet" with a larger "characters to skip" number.
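
For anyone who thinks better in text code, here is a rough Python sketch of that gulp-and-advance pattern (LabVIEW is graphical, so this is only an analogy; the file name, chunk size, and the process() stub are invented for illustration):

```python
CHUNK_LINES = 10_000                                 # "gulp" size; made up for the example

def process(chunk):
    """Placeholder for the clean-up and SQLite-insert step."""
    pass

offset = 0                                           # running "characters to skip"
with open("bigfile.txt", "rb") as f:                 # binary mode: offsets are plain byte counts
    while True:
        f.seek(offset)                               # re-position, as each spreadsheet read would
        chunk = [f.readline() for _ in range(CHUNK_LINES)]
        chunk = [line for line in chunk if line]     # readline() returns b"" at end of file
        if not chunk:
            break
        offset += sum(len(line) for line in chunk)   # advance by the bytes actually read
        process(chunk)
```

The key detail in the sketch is that offset is an ordinary Python integer, so it keeps growing past 2 GB without complaint.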

 

It gets to ~1.43 million lines and dies. 

 

Thoughts:

  • I can't find any malformed data, so it is likely not a malformed-data problem.
  • I have adjusted the "bite size", but that doesn't change the crash point.
  • Excel maxes out at 1,048,576 rows, and my operation shows it gets to about 1.4 million lines. This might be an Excel thing.
  • I would be surprised if this was a LabVIEW thing, but it could be an Excel or Microsoft thing.
  • The Flatten To String might be removing newline characters from the string.

 

Question:

  • Is there any reason it should die at that row count?
  • Why might this code be terminating only a small part of the way through the data?

 

Message 1 of 24

Have you tried "Read from Text File" with "Read Lines" enabled? Read a chunk of lines, remember the position of the last line read, and use "Set File Position" to start the next read at the line after it.
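
A loose Python equivalent of that suggestion, in case it helps to see it in text form (the path and chunk size are made up; in G the position would be carried in a shift register):

```python
CHUNK_LINES = 50_000                           # lines per read; illustrative value
position = 0                                   # byte position of the next unread line

with open("bigfile.txt", "rb") as f:
    while True:
        f.seek(position)                       # "Set File Position" analog
        chunk = [f.readline() for _ in range(CHUNK_LINES)]
        chunk = [line for line in chunk if line]
        if not chunk:                          # nothing left: end of file
            break
        position = f.tell()                    # remember where this read stopped
        # ... process the chunk here ...
```

The point is that nothing beyond the current chunk ever has to be resident in memory.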

 

 

Read from Text File Details
This function opens files as read-only. If you wire the refnum out output of this function to the file input of a write function, LabVIEW returns a permissions error. Use the Open/Create/Replace File function to open the file with the default read/write access and wire the refnum to the read and write functions.


By default, this function reads all characters from the text file. Wire an integer value to count to specify how many individual characters you want to read starting with the first character. Right-click the function and place a checkmark next to the Read Lines option in the shortcut menu to read individual lines from the text file. When you select the Read Lines option in the shortcut menu, wire an integer value to the count input to specify how many individual lines you want to read from the file starting with the first line. Enter a value of -1 in count to read all characters and lines from the text file.

 

Use the Set File Position function if you need to perform random access.

 

The function converts all platform-dependent end-of-line characters to line feed characters unless you right-click the function and remove the checkmark next to the Convert EOL shortcut menu item. If you wire a path to file, the function opens the file before reading from it and closes it afterwards.
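
The Convert EOL behavior described in the last paragraph amounts to the following normalization (a Python illustration of the effect, not a claim about how the node is implemented internally):

```python
raw = b"row 1\r\nrow 2\rrow 3\n"               # Windows, classic Mac, and Unix endings mixed
normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
assert normalized == b"row 1\nrow 2\nrow 3\n"  # every ending becomes a single line feed
```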

 

========================
=== Engineer Ambiguously ===
========================
Message 2 of 24

This node doesn't utilise Excel or any other Microsoft technology. It is a purely LabVIEW construct (albeit built on top of the Win32 API).

 

Some code would be helpful, but I am assuming that you are running out of memory. Posting your code, or a stripped-down example that fails, would obviously help.

 

Message 3 of 24

What version of LabVIEW are you running?  32 bit or 64 bit?

What version of Windows are you running?  32 bit or 64 bit?

How much memory does your PC have?

 

8 GB is a lot of data.  Unless you have a lot of memory and are running 64-bit LabVIEW, you are not going to be able to read in all that data at once.

Message 4 of 24

Dangit.  I was probing the count.  It is a 32-bit integer, and it goes negative (-2,146,754,745), so it has wrapped around.

 

This means, given my file size, that the counter on "Read delimited spreadsheet" isn't going to work for me. 
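
That negative value is exactly what an I32 produces once the offset passes 2,147,483,647 characters, i.e. a little past the 2 GB mark. A quick sanity check (Python, purely illustrative; the 2,148,212,551 input is just the unsigned offset that maps to the probed value):

```python
I32_MIN, I32_MAX = -2**31, 2**31 - 1            # range of a signed 32-bit counter

def wrap_i32(n):
    """Fold an arbitrary integer into I32 range, the way an overflowing counter does."""
    return (n - I32_MIN) % 2**32 + I32_MIN

print(f"{I32_MAX:,}")                           # 2,147,483,647 -- ceiling of the offset
print(f"{wrap_i32(2_148_212_551):,}")           # -2,146,754,745 -- the probed value above
```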

Message 5 of 24

You could split up the large file into smaller ones that are easier to process. There are tools you can download to do this for you. One I have used before is GSplit.

 

Either way you should implement some form of read-chunk-process loop as others have suggested. 
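
If a scripted pre-split is acceptable, the job is also only a few lines in a text language. A minimal Python sketch (file names and piece size invented), kept line-aligned so every piece is still a valid delimited file:

```python
LINES_PER_PIECE = 1_000_000                     # rows per output piece; tune to taste

with open("bigfile.txt", "rb") as src:
    out, piece = None, 0
    for i, line in enumerate(src):
        if i % LINES_PER_PIECE == 0:            # time to start a new piece
            if out:
                out.close()
            piece += 1
            out = open(f"bigfile_part{piece:03d}.txt", "wb")
        out.write(line)
    if out:
        out.close()
```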

Message 6 of 24

It isn't going to work.  The "Set File Position" offset is in bytes and is a 16-bit integer.  If a 32-bit integer can't even hold the character count (think ASCII, roughly one byte per character), then a 16-bit byte offset certainly isn't going to do the job.

 

LabVIEW might not be capable here.

Message 7 of 24

@tyk007 - I'm trying to do the chunk-process loop, but without any file manipulation outside of LabVIEW.

Message 8 of 24

@RavensFan - I have a 64-bit OS.  I am running 32-bit LabVIEW (15.0.1f1).  I have ~200 GB of RAM.

 

I have ~1 TB total, split into chunks, and I am going to be sad if I have to process it in R (32-bit, single-thread, single-core, interpreted) because LabVIEW can't crush this.

Message 9 of 24

@EngrStudent wrote:

@RavensFan - I have a 64-bit OS.  I am running 32-bit LabVIEW (15.0.1f1).  I have ~200 GB of RAM.


I'd be surprised if you have 200 GB of RAM. We're talking physical volatile memory, not storage space.

 

What's to stop you reading a line in, counting the characters in the line and then keeping a running UInt64 counter of the offset to seek from?
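
In text-language terms that running counter is just this (a Python sketch; in G the counter would live in a U64 shift register, and the path is invented):

```python
offset = 0                                      # running byte offset; never wraps in Python
with open("bigfile.txt", "rb") as f:
    for line in f:                              # one line at a time, nothing buffered whole
        offset += len(line)                     # count the characters just consumed
        # ... clean the line up and write it to SQLite here ...
print(offset)                                   # finishes equal to the file size in bytes
```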

Message 10 of 24