Efficient method to sort data from tab delimited text file

Kenny_K · ‎10-26-2006

I am currently writing a program to sort through acquired data and display it on a graph and some other indicators. The file is a tab-delimited text file with possibly hundreds of thousands of data points. The method I have tried so far works like this: if I want all of the data from October, I parse the month out of each timestamp, compare it to the desired month, and add the data point to the array if they match. Other possible sorting options are yearly and daily, possibly even hourly.

The method does work, but it takes some time (up to a minute on a 3.6 GHz P4 with 2 GB of RAM), and most of the other computers are not nearly as fast and have less memory. Is there a more efficient method of sorting the data?

I attached my sorting vi as well as a sample data file.

Thanks for the advice. The VI is saved in LV8.0.1.

Kenny

DavidJCrawford · ‎10-26-2006

Kenny

I usually upload the data to a database and then use SQL for this sort of thing. It makes my life much easier, and it is faster than trying to code my own sorting and parsing algorithms.

I usually use LabSQL and MS Access. NI have the Database Connectivity Toolkit, which I own, but I prefer LabSQL. From reading the forums, it looks as if I should make a move to MySQL, but I have not had time to do that yet.
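
As a rough illustration of the database approach, here is a minimal sketch in Python that uses the standard-library sqlite3 module in place of LabSQL and MS Access. The table name, the column names, and the ISO-style timestamp format are all assumptions made for the example; the real data file may look different.

```python
import sqlite3

# Hypothetical sample rows: ISO-style timestamp, TAB, reading.
SAMPLE = (
    "2006-09-30 23:59:00\t1.5\n"
    "2006-10-01 00:00:00\t2.0\n"
    "2006-10-15 12:30:00\t2.5\n"
    "2006-11-01 00:00:00\t3.0\n"
)


def load(text):
    """Load tab-delimited lines into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
    rows = (line.split("\t") for line in text.splitlines() if line)
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        ((ts, float(v)) for ts, v in rows),
    )
    # An index on the timestamp lets range queries avoid a full table scan.
    conn.execute("CREATE INDEX idx_ts ON readings (ts)")
    return conn


def query_range(conn, start, end):
    """Return values with start <= ts < end (ISO strings sort chronologically)."""
    cur = conn.execute(
        "SELECT value FROM readings WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return [v for (v,) in cur]


conn = load(SAMPLE)
october = query_range(conn, "2006-10-01", "2006-11-01")
```

With the data loaded once, selecting by year, day, or hour only changes the two bounds passed to the query.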

Just my £0.02

David

comrade · ‎10-26-2006

Hello Kenny,

First thing: on my machine it took just about 4 seconds to complete the job for a 17 MB text file with roughly 600,000 lines (not counting the time to read in the file).

I tried to replace the multiple array indexing by converting the 1D array that holds the three components of one line into a single string. I had no luck with it, since the conversion takes longer than the three instances of indexing the 1D array.

The subsequent use of Match Pattern ("sort for day in month" only) is redundant, though: wire the remaining substring after the first match of "/" (before the double case structure) directly to one more Match Pattern. That is about all I figured out for performance tuning, sorry.
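
The idea of feeding the remainder of the first match straight into the next match can be sketched in text form as follows. This is only an analogy in Python, not the VI itself, and the "MM/DD/YYYY HH:MM:SS" timestamp layout is an assumption.

```python
def split_date(stamp):
    """Split an assumed 'MM/DD/YYYY ...' stamp into (month, day, rest)."""
    # First match: everything before the first "/" is the month.
    month, _, rest = stamp.partition("/")
    # Second match runs on the remainder only, so the month is never rescanned.
    day, _, rest = rest.partition("/")
    return int(month), int(day), rest


month, day, rest = split_date("10/26/2006 09:32:00")
```

Each step only looks at the text left over from the previous one, which is the same saving as chaining the "after substring" output of one Match Pattern into the next.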

Find my modifications attached for your convenience.

You will have to reconnect the month and day controls to your TypeDefs and reconnect the DataIn Control to the FOR-loop...

Sorry - VIs are in 8.2...

Message Edited by comrade on 10-26-2006 10:38 AM

altenbach · ‎10-26-2006

First of all, "sorting" usually has a different meaning (sorting a numeric array in ascending or descending order, a string array alphabetically, etc.). Your data already seems to be sorted by date and time; you just want to pick a subset with certain characteristics.

The main problem that is slowing you down is your constant growing of large arrays. This causes constant memory reallocations.
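
To make the difference concrete, here is a small text-language sketch (Python, not LabVIEW) that contrasts growing a result one element at a time with allocating once and filling in place. The function names and sample values are made up for illustration. In LabVIEW terms the second variant would roughly correspond to Initialize Array, Replace Array Subset inside the loop, and one Array Subset at the end.

```python
def select_growing(values, keep):
    """Grow the result one element at a time (the slow pattern)."""
    out = []
    for v in values:
        if keep(v):
            out = out + [v]  # copies the whole list on every hit
    return out


def select_preallocated(values, keep):
    """Allocate once at the maximum possible size, fill in place, trim once."""
    out = [0.0] * len(values)  # single allocation up front
    count = 0
    for v in values:
        if keep(v):
            out[count] = v  # overwrite a slot, no reallocation
            count += 1
    return out[:count]  # one final copy of the used part


readings = [0.5, 2.0, 1.5, 3.0, 0.1]
picked = select_preallocated(readings, lambda v: v >= 1.0)
```

Both functions return the same result; only the number of allocations and copies differs, and that is what dominates with hundreds of thousands of points.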

Since your data is already sorted by date and time, all you need to do is place your data in a suitable data structure, find the start and end points of your selection, and then use "Array Subset", for example.
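
A minimal sketch of that idea in Python, assuming each line holds a "MM/DD/YYYY HH:MM:SS" timestamp, a tab, and one numeric value (the real file layout may differ). Because the timestamps are already in order, a binary search finds both ends of the selection, and a single slice plays the role of "Array Subset".

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical sample lines, already sorted by time as in the original data.
LINES = [
    "09/30/2006 23:59:00\t1.5",
    "10/01/2006 00:00:00\t2.0",
    "10/15/2006 12:30:00\t2.5",
    "11/01/2006 00:00:00\t3.0",
]


def parse_line(line):
    """Split one tab-delimited line into (datetime, float)."""
    stamp, value = line.split("\t")
    return datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"), float(value)


def slice_range(times, values, start, end):
    """Return values whose timestamps satisfy start <= t < end."""
    lo = bisect_left(times, start)  # first index with t >= start
    hi = bisect_left(times, end)    # first index with t >= end
    return values[lo:hi]            # one contiguous subset, no per-row test


parsed = [parse_line(line) for line in LINES]
times = [t for t, _ in parsed]
values = [v for _, v in parsed]

october = slice_range(times, values, datetime(2006, 10, 1), datetime(2006, 11, 1))
```

The file is parsed once; after that, a selection by month, day, or hour costs two binary searches instead of a pass over every line.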

Your code also seems to have a lot of unnecessary complexity. See, for example, your "test for sort data" (see image below).

The four cases differ only by file name, so only the file name belongs inside the case; the file operation belongs outside the inner case. Even better, just use autoindexing.
That shift register does not do anything, because it always contains the same data. Using "Index Array" with the index wired to [i] is equivalent to an autoindexing tunnel.
You have a case structure to select which files to read, and skipped files give you an empty array. Do you really need to do all these operations on an empty array? Why not place all the code inside the TRUE case?

Below is an image of one possible code alternative that addresses some of these points.

Message Edited by altenbach on 10-26-2006 09:32 AM
