11-07-2011 06:50 PM
>>Yes. Rewrite the operating system.
Yuck yuck.
I found a better solution than rewriting the OS.
I decided to use this technique, which calls an external EXE to iterate through and calculate the folder size:
https://decibel.ni.com/content/docs/DOC-17862
It's a million times faster than a pure-G solution could ever be.
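For anyone reading along outside LabVIEW, the call amounts to something like this Python sketch (in G you'd use System Exec.vi instead of subprocess). The du.exe location and the output-parsing regex are assumptions; check them against the du build you actually have:

```python
import re
import subprocess

def folder_size_via_du(path, du_exe=r"C:\Tools\du.exe"):
    """Shell out to du.exe and parse its size report.

    du_exe is a placeholder path; point it at wherever du.exe
    lives on your machine. The regex assumes the report contains
    a line like "Size: 31,415,926 bytes"; the exact wording
    varies between du builds, so adjust it to match yours.
    """
    result = subprocess.run([du_exe, path],
                            capture_output=True, text=True, check=True)
    match = re.search(r"Size:\s*([\d,]+)\s*bytes", result.stdout)
    if match is None:
        raise ValueError("unrecognized du output:\n" + result.stdout)
    return int(match.group(1).replace(",", ""))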
11-07-2011 08:45 PM
Did you try writing it in pure G? My first attempt at a pure-G solution is faster than DU at scanning large directory structures (LV2011f2).
All test numbers are from the second run, so the data read is likely cached.
My dev directory (lots of small files):
DU 26461 ms
LV 14596 ms
In a smaller 7 MB directory I got:
DU 110 ms
LV 128 ms
And in my data directory (68 GB, but large files):
DU 3829 ms
LV 3365 ms
While not the same thing, depending on what you're trying to do you might be able to use "Get Volume Info" to check the free space on the drive, which will be much faster than scanning the directories.
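For readers following along in a text language, the equivalent of Get Volume Info is a single volume query; a minimal Python sketch (the drive letter is just an example):

```python
import shutil

# One cheap volume query instead of a directory walk. Returns
# total, used, and free bytes for the volume containing the path.
total, used, free = shutil.disk_usage("C:\\")
print(f"used {used / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```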
11-08-2011 06:22 AM
Yes, I tried that. This style of pure-G solution takes forever to iterate through my data folder. I have data folders with billions of small files, nested in folders totaling well over 20 GB.
I ended up having to kill the pure-G solution because I gave up after 30 seconds.
But DU.exe takes around 1 second.
11-08-2011 07:49 AM
@josborne wrote:
>>Yes. Rewrite the operating system.
Yuck yuck.
I found a better solution than rewriting the OS.
I decided to use this technique, which calls an external EXE to iterate through and calculate the folder size:
https://decibel.ni.com/content/docs/DOC-17862
It's a million times faster than a pure-G solution could ever be.
You're missing the point. The poster(s) are looking for a way to do it without having to recurse the directories. All of the solutions so far, including yours, recurse the directories. Why? Because that's the way the operating system works.
11-08-2011 09:20 AM
No. If you re-read the posts, including mine, you will see that the real underlying goal is to get the folder size quickly.
11-08-2011 03:53 PM - edited 11-08-2011 03:54 PM
@josborne02 wrote:
Yes, I tried that. This style of pure-G solution takes forever to iterate through my data folder. I have data folders with billions of small files, nested in folders totaling well over 20 GB.
I ended up having to kill the pure-G solution because I gave up after 30 seconds.
But DU.exe takes around 1 second.
Hmm, I'm kind of curious what the difference is, since I get different results on my setup. I would guess LV is getting stuck on some file (maybe it needs some kind of lock to get the file size), or there's a loop in the folder structure (via hardlinks) that LabVIEW gets stuck in, since it doesn't check for that. If you really have "billions" of files (which implies an average file size on the order of 10 bytes in 20 GB), then the problem might be insufficient memory. Or, if you only ran the test once, caching would make an enormous difference. But you probably don't have the time to debug a solved problem.
@smercurio_fc wrote:
@josborne wrote:
>>Yes. Rewrite the operating system.
Yuck yuck.
I found a better solution than rewriting the OS.
I decided to use this technique, which calls an external EXE to iterate through and calculate the folder size:
https://decibel.ni.com/content/docs/DOC-17862
It's a million times faster than a pure-G solution could ever be.
You're missing the point. The poster(s) are looking for a way to do it without having to recurse the directories. All of the solutions so far, including yours, recurse the directories. Why? Because that's the way the operating system works.
There is a way to do it without recursing the directories: put the folder into its own partition and use Get Volume Info to get the in-use size of that partition. You should be able to use an NTFS junction (or maybe a volume mount point) to link the folder to the place you want it. I haven't tried this, but I don't see a reason why it wouldn't work (I suspect there would be a minor difference between folder and partition size, since the partition size would include filesystem metadata).
http://schinagl.priv.at/nt/hardlinkshellext/hardlinkshellext.html
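I haven't tried it in code either, but in Python terms the idea would look roughly like this (both paths are placeholders for your own layout; mklink /J is the cmd built-in that creates the junction):

```python
import shutil
import subprocess

# One-time setup: create an NTFS junction so the dedicated
# partition (D:\ here, a placeholder) appears at the folder's
# expected location. mklink is a cmd built-in, hence "cmd /c".
subprocess.run(["cmd", "/c", "mklink", "/J", r"C:\MyApp\Data", "D:\\"],
               check=True)

# Afterwards, "folder size" collapses to a single volume query:
total, used, free = shutil.disk_usage("D:\\")
print(f"data folder occupies about {used} bytes (includes metadata)")
```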
Although, if we knew what the folder size was being used for, we could suggest a different method that doesn't depend on it.
11-08-2011 04:48 PM
@josborne02 wrote:
Yes, I tried that. This style of pure-G solution takes forever to iterate through my datafolder. I have datafolders with billions of small files, nested in folders that total well over 20 GB+.
I ended up having to kill the pure-G solution because I gave up after 30 seconds.
But DU.exe takes around 1 second.
I find this result quite dubious. Are you sure you were running it correctly? The mere disk I/O for "billions of files" would lead one to believe that it would take longer than 1 second.
11-09-2011 07:20 AM
So at this point, it is an academic problem. As Matt W points out, we are debugging a solved problem. However, I will enlighten:
Here is a folder with 1.2 million files in it. Total size is about 30 GB.
If I pull up the folder properties in Windows, it takes about 1 second to add up the file sizes (screenshot omitted).
If I run DU.exe, it also takes about a second (output omitted).
All of the recommended pure-G solutions involve running the "List Folder" primitive, which takes well over 30 seconds to finish executing. Obviously, it is building up an internal array of paths with millions of elements. That just won't work; it'll overload memory. DU.exe is probably iterating through each file without having to load the entire file list into memory (see the sketch below). Perhaps NI should provide a function like this?
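Since G diagrams can't be pasted as text, here is a Python sketch of that streaming approach: consume one directory entry at a time instead of materializing the whole path array the way List Folder does. (Whether du.exe actually works this way internally is a guess.)

```python
import os

def folder_size(root):
    """Sum file sizes with a streaming directory walk: entries are
    consumed one at a time, so memory stays flat no matter how
    many millions of files there are."""
    total = 0
    stack = [root]
    while stack:
        folder = stack.pop()
        try:
            with os.scandir(folder) as entries:
                for entry in entries:
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)  # descend later
                        elif entry.is_file(follow_symlinks=False):
                            total += entry.stat(follow_symlinks=False).st_size
                    except OSError:
                        pass  # locked or vanished entry: skip it
        except OSError:
            pass  # unreadable folder: skip it
    return total
```

Note the follow_symlinks=False: that also sidesteps the hardlink/junction loops Matt W speculated about earlier in the thread.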
Sure, there are other workarounds, but why reconfigure my hardware (add a partition) or my file architecture when DU.exe does the job fine?
Anyway, I have wasted enough time on this. Problem solved.
11-09-2011 07:39 AM
Your example is flawed: it consists of a single folder. When you're talking about a folder hierarchy, it's a completely different story. When I ran du on a folder with many subfolders and far fewer than 1.2 million files, it took a hell of a lot longer than 1 second.
You are free to stop responding, but that won't change the facts: (a) you need to recurse through the directories, and (b) it takes time, depending on how many folders you have and how many files are inside them. There is no silver-bullet solution.
11-09-2011 07:44 AM
DU.exe works great.
Problem solved.