12-10-2021 01:52 PM
Can someone talk me out of wanting 64-bit indexing of arrays, especially in 64-bit LabVIEW? A lot of my work involves reading large CSV files, and as projects get larger I'm running into the 32-bit index limit at times. I know I can write code that reads part of the file, operates on just that part, and then continues with the rest, but that is a lot of work for what we do, because ideally we want to look at all of the data all of the time. RAM isn't the limitation for us.
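(For reference, the chunked workaround I mean boils down to something like this rough Python sketch; the file name, chunk size, and column choice are placeholders, not our actual setup:)

```python
# Rough sketch of chunked CSV processing: read a slice of rows, reduce it,
# and carry the running result forward instead of holding one giant array.
# File name, chunk size, and column index are example values only.
import csv

CHUNK_ROWS = 1_000_000          # rows processed per pass (example value)
total = 0.0
count = 0

with open("big_export.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)       # first line assumed to be a header
    chunk = []
    for row in reader:
        chunk.append(float(row[1]))   # e.g. second column holds the value
        if len(chunk) >= CHUNK_ROWS:
            total += sum(chunk)       # per-chunk reduction
            count += len(chunk)
            chunk = []
    if chunk:                         # leftover rows at the end of the file
        total += sum(chunk)
        count += len(chunk)

mean = total / count if count else float("nan")
```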
12-10-2021 02:40 PM
The array size is an I32, so there is nothing you can really do except stay within that: a 1D array cannot contain more elements. You need to program your way around it.
(It seems ridiculous to store such large data in CSV files, because the parsing will be expensive and inefficient. Can you switch to binary files?)
12-10-2021 02:52 PM
I agree with your comment about large data in CSV files. Unfortunately, our choices are proprietary binary or CSV from the software that exports the data.
12-10-2021 04:04 PM
Can you explain your use case in a bit more detail? How many columns/rows? What do they represent? What kind of processing do you need to do? (i.e. does an atomic processing step involve more than one point, e.g. an FFT?) What is the datatype (U8? DBL?)? For example, if there are only two columns (X and Y), maybe you can combine them into a complex number?
12-10-2021 04:42 PM
That is a TON of data. I don't have a way around it, but I'm curious as well: are you really looking at over 4 billion elements of data at once? I think if you could explain what you're doing, we might be able to offer some alternative solutions. Perhaps a SQLite database would fit your needs?
12-10-2021 05:55 PM
For a little bit of history (because I was the lead developer on 64-bit LabVIEW)...
We chose not to switch to 64-bit (or pointer-sized, which depended on what platform you were running on) indices because we didn't want to create an incompatibility between 32-bit and 64-bit LabVIEW. (That is, if we supported 64-bit indices in 64-bit LV, what would we do on 32-bit LV? Should we truncate the data? Refuse to load the VI? Somehow support 64-bit indices on a 32-bit system?)
We felt that the use cases for 64-bit indices were rare, but the need for compatibility between the versions was much greater. That decision could certainly be changed at some point in the future, but it has far-reaching implications. For example, the "N" and "i" of loops would need to switch to 64-bit integers by default (since they are related to array indexing). Then, to reduce coercion dots, you would at least have to consider making the default integer type a 64-bit integer.
I think we're still in the world where most people don't need 64-bit integers _by default_. This may change in the future.
A couple of other points...
* I think array indices are signed, so the real limit is 2^31-1 (about 2 billion), not 4 billion.
* Keep in mind that the data element size can be (somewhat) arbitrarily large. If you try to create a billion-element array of billion-character strings, you will run out of memory quickly.
12-11-2021 02:47 AM - edited 12-11-2021 03:02 AM
Also, most computers are still not quite up to snuff for processing that much data. 2 billion array elements (yes, the array index is a signed int, so there is one unused bit, as LabVIEW doesn't use negative array indices or sizes anywhere for anything AFAIK) would mean 8 billion bytes when the array is single-precision floats, and they must sit in one single contiguous memory block. And it is very hard to do anything useful with that data without causing at least one copy, if not more. Even server-grade systems with 64 GB of memory will quickly run out with such data processing. You may be able to push a little beyond the physical memory limit, but not by much, since the memory manager will have a harder and harder time finding one huge contiguous block that is large enough.
So no, the time is indeed not quite here yet where LabVIEW could gain a real advantage by throwing away all backward compatibility and adopting 64-bit array indices. The only regret in hindsight is that they used a signed 32-bit integer, which wastes one bit that is never used. But back then, someone had said only a few years earlier that computers were never going to need more than 640 kB of memory! 😀
And when LabVIEW for Windows came out in 1992, lots of customers were moaning and complaining that LabVIEW was quite slow and painful to use on machines with a whopping 4 MB of RAM, which was the minimum requirement in the marketing documentation. Machines with 8 MB of RAM were considered beefy high-end machines! It was very hard to imagine back then that computers would someday contain enough memory that 32-bit pointers and indices wouldn't suffice. Also, most C compilers didn't even support 64-bit integers, so even if the LabVIEW developers had wanted to, they could not have used 64-bit indices without resorting to custom assembly programming!
12-11-2021 03:28 AM
I second Bert's suggestion of possibly converting the CSV files into SQLite or similar format designed to handle very large data sets.
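The conversion itself is not much work, either; a rough sketch (Python here just for illustration, with made-up file, table, and column names) could look like this, and the resulting database file can then be queried in slices from LabVIEW or anywhere else:

```python
# Hypothetical sketch: bulk-load a CSV export into SQLite so it can be
# queried in slices instead of held in one array. Names are examples only.
import csv
import sqlite3

conn = sqlite3.connect("measurements.db")
conn.execute("CREATE TABLE IF NOT EXISTS samples (t REAL, value REAL)")

with open("big_export.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)                              # skip the header line
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        ((float(t), float(v)) for t, v, *rest in reader),
    )
conn.commit()

# Later, pull back only what a given analysis step needs:
rows = conn.execute(
    "SELECT t, value FROM samples WHERE t BETWEEN ? AND ?", (0.0, 10.0)
).fetchall()
conn.close()
```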
12-11-2021 07:46 AM
I certainly cannot talk you out of wanting 64-bit arrays. I can state that you do not have them now and need to code within the language choices available.
Way back near the stone age, I wrote a program that took up more than half of my whopping 64K of memory. I remember the arguments at the time: programmers wanted better memory and speed from the engineers; the engineers wanted better programmers.
Your basic problem is that the author of the proprietary data export did not choose a format that scales up to your current use case. So, your two rational choices are
12-11-2021 05:36 PM
@rolfk wrote:
Also, most computers are still not quite up to snuff for processing that much data. 2 billion array elements (yes, the array index is a signed int, so there is one unused bit, as LabVIEW doesn't use negative array indices or sizes anywhere for anything AFAIK) would mean 8 billion bytes when the array is single-precision floats [...]
I was going to say I can imagine scenarios where a negative index would be helpful, but every time I thought about it, I talked myself out of it. The only reason I can think of for it being signed is so that you won't wrap around if some algorithm you wrote tried to index an array with a negative number, and that's pretty sketchy at best.
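To illustrate the wrap-around I mean (plain Python, masking to 32 bits to mimic what a U32 index would see):

```python
# A negative intermediate result stays obviously invalid with a signed index,
# but silently becomes a huge "valid-looking" value if treated as unsigned.
offset = 5 - 10                      # some index arithmetic gone negative
print(offset)                        # -5 -> easy to range-check as an I32
print(offset & 0xFFFFFFFF)           # 4294967291 -> what a U32 index would see
```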