LabVIEW


Fixed width spreadsheet string to array


I have to read a file that uses a fixed width per column and has an unknown number of columns. Is there a simple way to convert this into an array, basically a "Spreadsheet String to Array" for fixed-width data?

 

I have a working method, but it is not exactly elegant and is somewhat slow for larger data sets. It also (currently) stops if there is a blank line in the data set. I could fix that by scanning for tokens (CR/LF), finding the number of lines, and replacing the outer While Loop with a For Loop, but I'm hoping there is a better solution.

 

Thanks,

-M

Message 1 of 19
Solution
Accepted by topic author BowenM

Start by right-clicking on your Read Text File.  There is an option to read lines.  Use that.  What that will now do is create an array of strings, each element being a line in your file.  Now change that While loop into a FOR loop with the array of lines autoindexing in.  Then eliminate the Get Line function.
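
LabVIEW is graphical, so the approach above cannot be shown as text directly. As a rough analogy only, here is a minimal Python sketch of the same idea: split the file contents into an array of lines, loop over that array, and slice each line into fixed-width fields. The function name `split_fixed_width` and the field width of 8 are assumptions for illustration and do not come from the original VI.

```python
def split_fixed_width(text, field_width):
    """Split fixed-width text into rows of field strings.

    field_width is a hypothetical, caller-supplied column width; the number
    of columns is derived from each line's length.
    """
    rows = []
    for line in text.splitlines():  # one element per line, like "read lines"
        if not line.strip():
            continue  # skip blank lines instead of stopping on them
        rows.append([line[i:i + field_width].strip()
                     for i in range(0, len(line), field_width)])
    return rows


# Made-up sample data: three 8-character columns with a blank line in between.
sample = "    1.00    2.50    3.25\n\n    4.00    5.50    6.75\n"
rows = split_fixed_width(sample, 8)
print(rows)
```

Because the loop runs over a known list of lines, a blank line is just an element to skip rather than a stop condition.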

 

That will help a little bit.  I'll have to think about the inner while loop to see if there is a better way.  Nothing immediately comes to mind.


Message 2 of 19

I didn't realize that the "Read Line" option existed. Thank you.

Message 3 of 19

Unfortunately, the "Read Line" option only seems to return an array if I have a count wired, which means I still need a While Loop, but this time while reading the file. I also looked at converting the string to a byte array and "bunching" it that way, but it didn't seem to give me any increase in efficiency.

 

Still, reading by line brought the total speed up enough that it is viable for the larger data. Not the prettiest solution, but it works.

 

Thanks,

-M

 

 

Message 4 of 19

Use "spreadsheet string to array", but use a 1D string array as array type, %s as format, and newline as delimiter.

Message 5 of 19

@altenbach wrote:

Use "spreadsheet string to array", but use a 1D string array as array type, %s as format, and newline as delimiter.

 

This does work, but I'm not sure why I would pick this method over another. Would you be willing to explain a bit?

 

From my knowledge, in the second method (Spreadsheet String to Array), the entire file would exist in memory twice since LV is entirely pass-by-value: once after reading from the text file, and once after converting it into an array. For larger files, this could quickly become a problem, though I'll admit LV memory handling is still a bit of a black box to me.
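
To make the concern concrete outside LabVIEW: in a text language, one way to avoid holding both the raw text and the converted array at once is to convert line by line as the file is read. The Python sketch below is an analogy only and says nothing about how LabVIEW manages its buffers. The name `iter_rows`, the field width of 8, and the in-memory stand-in for a file are all assumptions for illustration.

```python
import io


def iter_rows(stream, field_width):
    """Yield one row of floats per non-blank line of a fixed-width stream.

    Only the current line is held as text; earlier lines already exist only
    as numbers. Assumes every line length is a multiple of field_width.
    """
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        yield [float(line[i:i + field_width])
               for i in range(0, len(line), field_width)]


# io.StringIO stands in for an open text file in this sketch.
fake_file = io.StringIO("    1.00    2.50\n\n    4.00    5.50\n")
data = list(iter_rows(fake_file, 8))
print(data)
```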

 

Thanks for your input. I have been putting effort into making sure I am following best practices in my code. LV seems to be a language that is extremely easy to use, but difficult to use correctly 😉

 

-M

Message 6 of 19

@BowenM wrote:
From my knowledge, in the second method (Spreadsheet String to Array), the entire file would exist in memory twice since LV is entirely pass-by-value: once after reading from the text file, and once after converting it into an array. For larger files, this could quickly become a problem, though I'll admit LV memory handling is still a bit of a black box to me.

You did not say how big the datasets are, but I don't think this is something you have to worry about.

Do you have performance issues? Is the final data a 2D array of string or do you need to convert it to a numeric array, for example?

Is each row the same length for a given file?

Message 7 of 19

As for the data set size, that is variable. The LV program will be used to view data acquired by an old Unix program (which I will eventually be working to upgrade to LV). Most of the data isn't that large: 30-40 channels, maybe 2000 scans per channel. That said, there are times when there may be several hundred thousand scans per channel. I want to make sure that the program won't "break" on the rare large data set, which means paying attention to efficiency, data duplication in memory, etc.

 

For your other questions:  The final 2D array is being converted to floating point, and then some operations done.  Row length is constant after the header, which varies in size depending on the number of channels used.

 

Edit: One of the reasons I am so worried about making the program airtight, as it were, is that I won't be the end user. I don't know what an everyday technician will do, so I want to keep things like memory management in mind.

 

To this note, I plan on actually doing the loading inside of an "in place element structure" that uses a data value reference, and a message handler passing that reference in order to keep track of what is writing/reading the data when.  This will be done with a producer-consumer state machine architecture. 

 

Is this a "valid" design? I haven't really seen much documentation on it yet, and while I have been programming with LabVIEW for a while, I am only now trying to make sure I do it "correctly" and not just throw together whatever works.

Message 8 of 19

What is the datatype of the final numeric array (I32, SGL, etc.)?

 

Formatted strings are typically an inefficient way to store numbers (you only use a very small subset of ASCII characters!), so it probably pays to convert to numerics as soon as possible. If in doubt, code a few alternative versions and run a few benchmarks. That's what I would do. (I don't think the code you show is very efficient.)
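
As a text-language illustration of both points (numbers are more compact than formatted text, and alternatives are cheap to benchmark), here is a small Python sketch. It is not LabVIEW code. The 12-character field width, the sample values, and the two parser names are assumptions for illustration, and the timings it prints will differ from machine to machine.

```python
import timeit
from array import array

FIELD_WIDTH = 12  # hypothetical column width for this sketch


def parse_by_slicing(line, width=FIELD_WIDTH):
    """Convert one fixed-width line to doubles by slicing at fixed offsets."""
    return array("d", (float(line[i:i + width])
                       for i in range(0, len(line), width)))


def parse_by_split(line):
    """Alternative: split on whitespace. Only valid when fields never touch."""
    return array("d", map(float, line.split()))


# Build one made-up line of four right-aligned, 12-character fields.
line = "".join(f"{v:12.4f}" for v in (1.5, -2.25, 3.0, 40000.125))

values = parse_by_slicing(line)
text_bytes = len(line)                          # bytes used as ASCII text
numeric_bytes = len(values) * values.itemsize   # bytes used as doubles
print(text_bytes, numeric_bytes)

# Time each alternative on the same input before picking one.
for fn in (parse_by_slicing, parse_by_split):
    print(fn.__name__, timeit.timeit(lambda: fn(line), number=2000))
```

The whitespace-split variant is included only as a comparison point; it breaks as soon as a value fills its whole field and touches its neighbor, which is exactly the case a fixed-width format allows.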

 

 

(Note that there is an idea to allow empty string as delimiter for spreadsheet string to array. See if you like it)

Message 9 of 19

@BowenM wrote:

To this note, I plan on actually doing the loading inside of an "in place element structure" that uses a data value reference, and a message handler passing that reference in order to keep track of what is writing/reading the data when.  This will be done with a producer-consumer state machine architecture. 

 

Is this a "valid" design? I haven't really seen much documentation on it yet, and while I have been programming with LabVIEW for a while, I am only now trying to make sure I do it "correctly" and not just throw together whatever works.


This does not sound right.

Message 10 of 19