12-12-2021 03:47 AM - edited 12-12-2021 03:59 AM
I’m not sure there is any specific reason other than that int32 was the default datatype in 32-bit programming, and nobody felt it would become a problem anytime soon. Using 64-bit integers would also have increased the memory consumption of LabVIEW programs even further, at the cost of what amounted to assembly-level arithmetic on most platforms of the day, all to solve a problem that nobody expected to ever become an issue. 8 GB of memory back then would have cost a lot more than an average annual salary, required a fridge-sized box full of electronics, and been terribly slow to access. Even high-end hard disks could barely store that much data in the mid-'90s.
12-13-2021 10:16 AM
@altenbach wrote:
Can you explain your use case in a bit more detail? How many columns/rows? What do they represent? What kind of processing do you need to do (i.e., does an atomic processing step involve more than one point, e.g. an FFT)? What is the datatype (U8? DBL?)? For example, if there are only two columns (X and Y), maybe you can combine them into a complex number?
We are dealing with point cloud data. There are three point cloud files being inspected. We have to check the horizontal distance of each point in point cloud 1 against each point in point clouds 2 and 3, use that number to perform some math on the extra columns in point cloud 1, and plot a result for each point. Honestly, most of the LabVIEW built-in functions rock this operation, except that they need to support more elements. I can program around it, but it requires a lot more effort and creates a lot of copies of data that I'd rather not have. Spatial data is tough to break up without a lot of pre-processing because you can't just truncate to the first billion elements. I like the idea of combining the X and Y numbers into complex numbers. That may speed up the code further.
The minimum specification for our computers is 64 GB of RAM. The code usually takes a few hours to execute; in that scheme of things, taking 5 minutes to read a CSV doesn't strike me as inefficient.
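For a rough sense of the per-point pass described above (illustration only; the real code is LabVIEW and under NDA, so the array names, sizes, and the final column math here are all made up), a NumPy sketch might look like this:

```python
import numpy as np

# Hypothetical stand-ins: columns 0-1 are X/Y, the remaining columns are extra attributes.
cloud1 = np.random.rand(200_000, 5)
cloud2 = np.random.rand(150_000, 2)

def min_horizontal_dist(points, reference, block=256):
    """For each point in `points`, the smallest XY distance to any point in `reference`.
    Processed in blocks so the temporary pairwise array stays at block x len(reference)."""
    out = np.empty(len(points))
    for start in range(0, len(points), block):
        chunk = points[start:start + block, :2]                                  # (b, 2)
        d = np.linalg.norm(chunk[:, None, :] - reference[None, :, :2], axis=2)   # (b, n_ref)
        out[start:start + block] = d.min(axis=1)
    return out

d12 = min_horizontal_dist(cloud1, cloud2)
result = cloud1[:, 2] * d12   # placeholder for the real math on the extra columns
```

The block size just bounds the temporary pairwise array; a space-partitioning structure like the R-tree mentioned later in this thread would avoid the full pairwise pass entirely.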
12-13-2021 01:48 PM
Another data structure that can break the I32 limit is the map, but it is currently not obvious how maps could be used in your problem. Not enough information. 😉
What is "horizontal distance"? Just the difference in X (or RE). Remember that when using complex number, the direct 2D distance between two points is just the absolute value of the difference.
I am sure there are optimizations that could dramatically speed up your processing (avoiding data copies, in-placeness, avoiding array resizing, no debugging or indicators for inner data, inlining, parallelization, etc.). No guarantees, of course. (For each type of calculation, I would write and benchmark at least five different approaches.) 😄
12-13-2021 04:06 PM
@altenbach wrote:
Another data structure that can break the I32 limit is the map, but it is currently not obvious how maps could be used in your problem. Not enough information. 😉
What is "horizontal distance"? Just the difference in X (or Re)? Remember that when using complex numbers, the direct 2D distance between two points is just the absolute value of the difference.
I am sure there are optimizations that could dramatically speed up your processing (avoiding data copies, in-placeness, avoiding array resizing, no debugging or indicators for inner data, inlining, parallelization, etc.). No guarantees, of course. (For each type of calculation, I would write and benchmark at least five different approaches.) 😄
@altenbach I wish I could share all my problems for the world to solve; unfortunately there are NDAs. Horizontal distance is the actual 2D distance between two points: sqrt(ΔX^2 + ΔY^2).
We do this calculation 5 times a year, all within a week of each other. Optimization is not a priority when we can throw CPUs at the problem.
Now that I think about it more, I am using parallel loops! That means the code to process the data in chunks already exists, so there is no reason I can't just read the files in multiple parts and handle the data that way.
I remember also running into this issue trying to read large files using the LabVIEW file read functions. I think I got around it by setting the count to its maximum value and reading from the file over and over again. That code could use an update for sure; lots of people have files larger than 2 GB now.
I am going to dig into maps too. Haven't yet.
12-13-2021 05:24 PM
@AMyth wrote:
I remember also running into this issue trying to read large files using the LabVIEW file read functions. I think I got around it by setting the count to its maximum value and reading from the file over and over again. That code could use an update for sure; lots of people have files larger than 2 GB now.
The LabVIEW file I/O functions have used int64 offsets since LabVIEW 8.0, so unless you are using a very old version you should have no problem reading files larger than 2 GB. Of course you can't read more than 2^31 elements per file read (which can amount to more than 2 GB of data, depending on the datatype you read), since the data has to be copied into a LabVIEW array handle, but still.
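The same "read it in pieces" pattern, sketched outside LabVIEW in Python purely for illustration (the file name and chunk size are made up):

```python
CHUNK = 256 * 1024 * 1024            # 256 MB per read; tune to taste

def read_in_chunks(path, chunk=CHUNK):
    """Yield a large file piece by piece instead of asking for one giant read."""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:            # empty read means end of file
                break
            yield block

total = 0
for block in read_in_chunks("pointcloud1.csv"):   # hypothetical file
    total += len(block)
print(total)                          # bytes processed
```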
12-13-2021 05:32 PM - edited 12-13-2021 06:04 PM
So, a complex number does make sense. D = |(X + iY) - (X' + iY')|, just like Pythagoras figured out while sitting around naked.
A Map of <U64, CXD> might just work wonders. A sparse map would be ideal: unless you have a few billion terabytes of memory hanging around, you'll never fill it.
Leave it to CA to find an approach that is half imaginary 😀
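For what the sparse idea buys you, here is a rough Python analogue of that Map<U64, CXD>: a dictionary keyed by a 64-bit point index with complex values (in LabVIEW this would use the built-in map functions instead; the indices and values below are invented):

```python
# Sparse storage: only the indices you actually touch consume memory.
points: dict[int, complex] = {}

points[7_500_000_000] = 12.5 + 3.25j    # an index far beyond the I32 range is fine
points[3] = -1.0 + 0.5j

idx = 7_500_000_000
if idx in points:
    print(abs(points[idx] - points[3]))  # 2D distance between the two stored points
```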
12-14-2021 02:54 AM
Since you mentioned spatial data, I thought I'd link to the R-Tree module of SQLite, which is intended for exactly that. I've used it in a sonic inspection system. R-trees allow fast lookup for "find all points near this point" type operations.
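For anyone curious what that looks like, a minimal sketch via Python's sqlite3 (this assumes your SQLite build was compiled with the R*Tree module enabled, which most are; the table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# 2D R*Tree: an integer id plus min/max bounds for each dimension.
con.execute("CREATE VIRTUAL TABLE cloud USING rtree(id, minX, maxX, minY, maxY)")

# Points are stored as degenerate boxes (min == max).
con.executemany("INSERT INTO cloud VALUES (?, ?, ?, ?, ?)",
                [(1, 0.0, 0.0, 0.0, 0.0),
                 (2, 5.0, 5.0, 1.0, 1.0),
                 (3, 0.4, 0.4, 0.3, 0.3)])

# "Find all points near (0, 0)": a fast bounding-box query; refine with exact distances afterwards.
rows = con.execute("""SELECT id FROM cloud
                      WHERE minX >= -1 AND maxX <= 1
                        AND minY >= -1 AND maxY <= 1""").fetchall()
print(rows)   # [(1,), (3,)]
```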
12-14-2021 11:31 AM
For speed, some kind of space partitioning tree would help.
For the problem (64-bit array indexing), you can simply make an array of clusters, each containing an array. If the first inner array is full (at whatever size you pick), start filling the next one. When searching etc., loop over the outer array, step into each cluster, and loop over the inner array's elements. Some functions can even be used directly on the structure, although your memory requirements would grow quickly.
Not as easy as using the native LV functions, but not too hard either.
It will still be slow. Even just filling 2GB of random data took too long for me to actually try this.
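The chunk/offset arithmetic behind that array-of-clusters idea is simple enough; here is a Python sketch purely to illustrate the indexing (Python itself doesn't need this workaround, and the chunk size is arbitrary):

```python
CHUNK = 100_000_000                 # elements per inner array; anything below the I32 limit

class ChunkedArray:
    """A flat, 64-bit-indexable array built from fixed-size pieces."""
    def __init__(self):
        self.chunks = [[]]

    def append(self, value):
        if len(self.chunks[-1]) == CHUNK:   # last piece is full: start a new one
            self.chunks.append([])
        self.chunks[-1].append(value)

    def __getitem__(self, i):
        return self.chunks[i // CHUNK][i % CHUNK]   # 64-bit index -> (piece, offset)

a = ChunkedArray()
a.append(3.14)
print(a[0])          # 3.14
```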