05-14-2020 01:47 AM - edited 05-14-2020 01:52 AM
Hi all,
We are reading a CSV file (size ~1 GB) using the Read Spreadsheet VI and then storing the data in a variant. This is done in a subVI; if we don't open the subVI, the VI executes well without any issue. However, LabVIEW memory increases beyond 8 GB while reading and storing the data, because of which the whole LabVIEW environment sometimes hangs.
Sorry, we cannot share the VI or a picture of the code due to security restrictions.
Is there a better approach to reading big CSV files and storing the data (rather than using variants)?
We tried the solution from the link below, but it didn't help:
https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z0000015CQ0SAM&l=en-SG
05-14-2020 02:02 AM
Hi c,
@c_rituc wrote:
We are reading a CSV file (size ~1 GB) using the Read Spreadsheet VI and then storing the data in a variant.
How much data is contained in this file? How many rows, how many columns, which datatype (string, I32, DBL)?
Usually the solution is to NOT load the whole file into memory…
05-14-2020 02:16 AM
Hi GerdW,
Thanks for your reply.
For the 500 MB file there are 34 columns and 1,048,576 rows.
The size of the file varies, and thus the content does too.
We are reading the whole file because we need to search for a few strings in the first column and then take the data corresponding to those cells. The operation is done in a sequence; every time we come to this step, a different string is searched for. The file is read only the first time; after that we store the data in a variant in an FGV and use that. However, even the first-time read is hitting the memory.
05-14-2020 03:17 AM
Reading such big files into memory will always take a lot of time. If possible, try reading in chunks and parsing the chunks. The parsed data might be smaller, so the total amount of memory needed can be smaller.
You won't be able to do that with an OoTB VI, but it's not that hard to make it work with the File and Advanced File functions: open, read chunk until end of file, close file. If you know exactly what to expect, you can even give Scan From File a try.
This will probably be slower for small files, but a lot faster for big files.
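For anyone who prefers to see the idea in text form, here is a rough sketch in Python rather than LabVIEW (the chunk size and the comma-splitting are assumptions for illustration only); the LabVIEW equivalent would be an Open File / Read File-in-a-loop / Close File pattern:

# Rough sketch of the chunked-read idea (Python, for illustration only).
CHUNK_SIZE = 1024 * 1024  # read ~1 MB at a time instead of the whole file

def read_in_chunks(path):
    parsed_rows = []
    leftover = ""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            lines = (leftover + chunk).split("\n")
            leftover = lines.pop()               # last piece may be an incomplete line
            for line in lines:
                parsed_rows.append(line.split(","))   # keep only the fields you need here
    if leftover:
        parsed_rows.append(leftover.split(","))
    return parsed_rows

The important point is that only one chunk (plus whatever you decide to keep from the parsing step) sits in memory at any time.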
05-14-2020 04:15 AM
Hi Wiebe,
Thanks for your reply,
I will try to optimize the code and read in chunks. We tried something similar to what is mentioned in the link below; however, we didn't use the parallel loop function, and after dequeuing, the data is stored in a variant. I am not sure if the LabVIEW hang is happening because of the large data in the variant.
https://knowledge.ni.com/KnowledgeArticleDetails?id=kA00Z0000015CQ0SAM&l=en-SG
05-14-2020 04:40 AM
@c_rituc wrote:
I am not sure if the LabVIEW hang is happening because of the large data in the variant.
Large data in general is a major issue. You might want to just convert the CSV into a format that is nicer to the computer, like a TDMS or binary file. Then you can do random access on those files to process the data.
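As a rough illustration of the "convert once, random-access later" idea, here is a Python sketch (not TDMS or LabVIEW; the column count and data types are assumptions). The numeric columns are written as fixed-size doubles so any row can be fetched later by seeking straight to it:

# Sketch only: convert numeric CSV columns to a fixed-record binary file,
# then fetch single rows by seeking instead of keeping everything in memory.
import csv
import struct

N_COLS = 34               # assumed column count
ROW_BYTES = N_COLS * 8    # one record = 34 little-endian doubles

def csv_to_binary(csv_path, bin_path):
    with open(csv_path, newline="") as src, open(bin_path, "wb") as dst:
        for row in csv.reader(src):
            values = []
            for cell in row[:N_COLS]:
                try:
                    values.append(float(cell))
                except ValueError:
                    values.append(0.0)   # non-numeric cells (e.g. a string column) need separate handling
            values += [0.0] * (N_COLS - len(values))
            dst.write(struct.pack("<%dd" % N_COLS, *values))

def read_row(bin_path, row_index):
    with open(bin_path, "rb") as f:
        f.seek(row_index * ROW_BYTES)
        return struct.unpack("<%dd" % N_COLS, f.read(ROW_BYTES))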
05-14-2020 04:52 AM
Big data sets and variants collide quite hard. Speed and variants as well.
Whether it's the file or the variants is easy to test: simply create a large variant dataset with simulated (random) data.
Converting to and from variants is costly. They store more than their data (e.g. their type), so they require more memory. If you know you're reading floats, use doubles or singles (a single uses 50% of the memory of a double).
If you don't know, why use variants? Might as well use strings.
05-14-2020 09:27 AM
We are reading the whole file because we need to search for a few strings in the first column and then take the data corresponding to those cells. The operation is done in a sequence; every time we come to this step, a different string is searched for. The file is read only the first time; after that we store the data in a variant in an FGV and use that. However, even the first-time read is hitting the memory.
Why not do a two-pass approach:
First: do a ReadLn pass over the file, building an array with the content of the first cell of each line and also storing the seek position in the file.
Second: pick the data corresponding to the matching first-column entries using the stored file positions. On an SSD this should be quick. If not, load chunks of the file (100k) into a memory string and parse there.
This shouldn't cause any memory problems.
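Expressed as a Python sketch (the function names and the comma-separated layout are assumptions; the real thing would of course use LabVIEW file primitives), the two passes could look like this:

# Pass 1: remember only the first cell of every line plus its byte offset.
# Pass 2: seek back to the one line you need and parse just that line.
def build_index(path):
    index = []                                   # list of (first_cell, byte_offset)
    with open(path, "rb") as f:
        offset = f.tell()
        for line in iter(f.readline, b""):
            first_cell = line.split(b",", 1)[0].decode("utf-8", "replace")
            index.append((first_cell, offset))
            offset = f.tell()
    return index

def row_for(path, index, search_string):
    for first_cell, offset in index:
        if first_cell == search_string:
            with open(path, "rb") as f:
                f.seek(offset)
                line = f.readline().decode("utf-8", "replace")
                return line.rstrip("\r\n").split(",")
    return None

Only the first-column strings and the offsets stay in memory; each lookup re-reads just one line from disk.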
05-14-2020 11:30 AM
You likely have a problem with duplicating memory, and we can't help much if you don't show at least that part of your code. It's just reading a file, so there shouldn't be security issues.
If your CSV is 1 GB and you're reading it in one pass, it will pull the whole thing into memory (AFAIK). You're also converting the data. Your example said 34 x 1,048,576, or 35,651,584 elements. If these are all doubles (8 bytes), that's 285,212,672 bytes, or ~285 MB of data. I don't know how you're actually storing it (one variant that holds an array of doubles, or one array of 35 million variants), and I don't know the overhead of a variant, but let's go nuts and say it takes 2x the space to store a value in a variant as in a double.
You're still at only ~500 MB of memory, plus your 1 GB file. You should be seeing less than 2 GB of RAM used, but you're seeing 8 GB, so you're duplicating your data somewhere, likely 3-4 times. That's causing your slowdown.
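For reference, the back-of-the-envelope numbers above, worked out in a few lines of Python (the 2x variant overhead is the same rough assumption as in the text):

rows, cols = 1_048_576, 34
elements = rows * cols              # 35,651,584 elements
dbl_bytes = elements * 8            # 285,212,672 bytes, ~285 MB as doubles
variant_bytes = dbl_bytes * 2       # ~570 MB with an assumed 2x variant overhead
print(elements, dbl_bytes, variant_bytes)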
Not to mention, if you're running 32-bit LabVIEW, it can only get 2 (maybe 3) GB of actual memory at once.
Try using Tools -> Profile -> Show Buffer Allocations to see where you're duplicating your data. Better yet, post just the part of your code that reads in a file and we can help you look.
05-14-2020 11:58 AM
Hi Proven Zealot,
Thanks for your reply.
The CSV file format is string, and the data read is a 2D array of strings. After parsing, the data is stored in a variant.
The data read is still okay; the program hangs while trying to store/read data from the variant.
I will try to use just an array rather than a variant and see if that helps.