02-07-2014 12:20 PM
I have to read a file with a fixed width per column and an unknown number of columns. Is there a simple way to convert this into an array, basically a "Spreadsheet String to Array" for fixed-width data?
I have a working method, but it is not exactly elegant and is somewhat slow for larger data sets. It also (currently) has the problem of stopping if there is a blank line in the data set. I could fix that by scanning for tokens (CR/LF) to find the number of lines and replacing the outer while loop with a for loop, but I'm hoping there is a better solution.
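Since a LabVIEW diagram can't be shown as text here, the intent can be sketched in Python (names and the column width are invented for illustration): slice each line into fixed-width columns, tolerating blank lines instead of stopping on them.

```python
# Hypothetical sketch of the goal: turn a fixed-width text block into a 2D
# array of strings, skipping blank lines rather than halting on them.
def fixed_width_to_array(text, width):
    rows = []
    for line in text.splitlines():
        if not line.strip():  # blank line: skip instead of stopping
            continue
        # Slice into fixed-width columns; the column count falls out per row,
        # so an unknown number of columns is handled automatically.
        rows.append([line[i:i + width].strip()
                     for i in range(0, len(line), width)])
    return rows

data = "  1.0  2.0  3.0\n\n  4.0  5.0  6.0\n"
print(fixed_width_to_array(data, 5))
# [['1.0', '2.0', '3.0'], ['4.0', '5.0', '6.0']]
```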
Thanks,
-M
Solved!
02-07-2014 12:35 PM
Start by right-clicking on your Read Text File. There is an option to read lines. Use that. What that will now do is create an array of strings, each element being a line in your file. Now change that While loop into a FOR loop with the array of lines autoindexing in. Then eliminate the Get Line function.
That will help a little bit. I'll have to think about the inner while loop to see if there is a better way. Nothing immediately comes to mind.
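In textual terms (a rough Python analogue, since the actual change is on a LabVIEW diagram, and the file here is simulated), the suggestion amounts to: read the whole file as an array of lines once, then iterate that array directly instead of pulling one line per loop iteration.

```python
from io import StringIO

# StringIO stands in for Read Text File; splitlines() stands in for the
# "read lines" option, producing one array element per line.
fake_file = StringIO("line 1\nline 2\nline 3\n")
lines = fake_file.read().splitlines()

# The FOR loop autoindexes over the line array; no Get Line, no while loop.
for line in lines:
    print(line)
```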
02-07-2014 12:45 PM
I didn't realize that "Read Line" function existed. Thank you.
02-07-2014 01:28 PM
Unfortunately, the "Read Line" option only seems to return an array if I have a count wired, which means I still need a while loop, this time while reading the file. I also looked at converting it to a byte array and "bunching" it that way, but it didn't seem to give me any increase in efficiency.
Still, reading by line brought the total speed up enough that it is viable for the larger data sets. Not the prettiest solution, but it works.
Thanks,
-M
02-07-2014 01:34 PM
Use "spreadsheet string to array", but use a 1D string array as array type, %s as format, and newline as delimiter.
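The effect of that configuration, sketched as text (a Python analogue, not the actual LabVIEW call): with a 1D string array type, a %s format, and newline as the delimiter, the whole string becomes one array element per line in a single operation.

```python
# One call splits the entire string into a 1D array of lines, which is
# what Spreadsheet String to Array does with %s and a newline delimiter.
text = "row one\nrow two\nrow three"
lines = text.split("\n")
print(lines)  # ['row one', 'row two', 'row three']
```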
02-07-2014 02:17 PM
@altenbach wrote:
Use "spreadsheet string to array", but use a 1D string array as array type, %s as format, and newline as delimiter.
This does work, but I'm not sure why I would pick this method over another. Would you be willing to explain a bit?
From my understanding, with the second method (Spreadsheet String to Array) the entire file would exist in memory twice, since LV is entirely pass-by-value: once after reading from the text file, and once after converting it into an array. For larger files, this could quickly become a problem, though I'll admit LV memory handling is still a bit of a black box to me.
Thanks for your input. I have been putting effort into making sure I am following best practices in my code. LV seems to be a language that is extremely easy to use, but difficult to use correctly 😉
-M
02-07-2014 04:08 PM
@BowenM wrote:
From my knowledge, in the second method (Spreadsheet String to Array), the entire file would exist in memory twice since LV is entirely pass-by-value. Once after reading from text file, and once after converting it in to an array. For larger files, this could quickly become a problem - though I'll admit LV memory handling is still a bit of a black box to me.
You did not say how big the datasets are, but I don't think this is something you have to worry about.
Do you have performance issues? Is the final data a 2D array of string or do you need to convert it to a numeric array, for example?
Is each row the same length for a given file?
02-07-2014 04:30 PM - edited 02-07-2014 04:43 PM
As for the data set size, that is variable. The LV program will be used to view data acquired from another, older Unix program (which I will eventually be upgrading to LV). Most of the data isn't that large: 30-40 channels, maybe 2000 scans per channel. That having been said, there are times when there may be several hundred thousand scans per channel. I want to make sure that the program won't "break" under the rare large data set, which means paying attention to efficiency, data duplication in memory, etc.
For your other questions: The final 2D array is being converted to floating point, and then some operations done. Row length is constant after the header, which varies in size depending on the number of channels used.
Edit: One of the reasons I am so worried about making the program air-tight, as it were, is that I won't be the end user. I don't know what a normal, everyday technician will do, so I want to keep things like memory management in mind.
To this note, I plan on actually doing the loading inside of an "in place element structure" that uses a data value reference, and a message handler passing that reference in order to keep track of what is writing/reading the data when. This will be done with a producer-consumer state machine architecture.
Is this a "valid" design? I haven't really seen much documentation on it yet, and while I have been programming with LabVIEW for awhile, I am only now trying to make sure I do it "correctly" - not just throwing whatever works together.
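A loose textual analogue of the DVR idea described above (a Python sketch with invented names, not a claim about how LabVIEW implements it): the large dataset lives in exactly one place, access to it is serialized (the role of the in-place element structure), and the producer/consumer loops exchange only lightweight messages rather than copies of the data.

```python
import threading
import queue

class DataRef:
    """Stand-in for a Data Value Reference: one copy of the data, guarded access."""
    def __init__(self):
        self._lock = threading.Lock()  # analogue of the in-place element structure
        self._data = None

    def write(self, data):
        with self._lock:
            self._data = data

    def read(self):
        with self._lock:
            return self._data

ref = DataRef()
messages = queue.Queue()

def producer():
    ref.write([[1.0, 2.0], [3.0, 4.0]])  # load the file once, in place
    messages.put("data_ready")           # pass only a message, not the data

def consumer(results):
    if messages.get() == "data_ready":
        results.append(sum(sum(row) for row in ref.read()))

results = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(results,))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [10.0]
```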
02-07-2014 05:35 PM
What is the datatype of the final numeric array (I32, SGL, etc.)?
Formatted strings are typically an inefficient way to store numbers (you only use a very small subset of ASCII characters!), so it probably pays to go to numerics as soon as possible. If in doubt, code a few alternative versions and do a few benchmarks. That's what I would do. (I don't think the code you show is very efficient.)
(Note that there is an idea to allow an empty string as the delimiter for Spreadsheet String to Array. See if you like it.)
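The "code a few alternative versions and benchmark" advice might look like this as a Python sketch (function names and data shape invented; the real comparison would be between LabVIEW diagrams): two routes from the same text to a numeric 2D array, timed against each other.

```python
import timeit

# Synthetic data roughly matching the sizes mentioned: ~40 columns, 2000 rows.
data = "\n".join("\t".join(str(i + j) for j in range(40)) for i in range(2000))

def strings_then_convert():
    # Route 1: build the full 2D string array first, convert to floats after.
    table = [row.split("\t") for row in data.splitlines()]
    return [[float(x) for x in row] for row in table]

def convert_immediately():
    # Route 2: convert each row to numerics as soon as it is split.
    return [[float(x) for x in row.split("\t")] for row in data.splitlines()]

for f in (strings_then_convert, convert_immediately):
    t = timeit.timeit(f, number=5)
    print(f"{f.__name__}: {t:.3f} s")
```

Both routes produce the same result; the benchmark only reveals which one your runtime handles more efficiently, which is exactly the kind of question benchmarks answer better than intuition.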
02-07-2014 05:36 PM
@BowenM wrote:
To this note, I plan on actually doing the loading inside of an "in place element structure" that uses a data value reference, and a message handler passing that reference in order to keep track of what is writing/reading the data when. This will be done with a producer-consumer state machine architecture.
Is this a "valid" design? I haven't really seen much documentation on it yet, and while I have been programming with LabVIEW for awhile, I am only now trying to make sure I do it "correctly" - not just throwing whatever works together.
This does not sound right.