LabVIEW


How to get folder size in bytes


@smercurio_fc wrote:

 

I'm trying your VI now. It's still running after 20 minutes on the single-folder with 1.2 million files. Something doesn't seem quite right...


Maybe one of the recursive VIs isn't exiting properly?

I used a Not a Path constant as the signal to destroy the queue inside "Recursive File Size Queued.vi".

The other instances should then error out and exit, each returning the total size processed by that subVI...
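
For readers who don't have the VI open, the shutdown idea described above can be sketched in Python. This is only an analogy under stated assumptions, not the attached LabVIEW code: `STOP`, `size_worker` and `folder_size_queued` are hypothetical names, and instead of destroying the queue so that the remaining readers error out, the sketch sends one sentinel per worker.

```python
import os
import queue
import threading

STOP = None  # sentinel, standing in for the "Not a Path" constant (assumption)


def size_worker(paths, totals, lock):
    """Consume paths until the sentinel arrives, then record this worker's subtotal."""
    subtotal = 0
    while True:
        path = paths.get()
        if path is STOP:
            break
        try:
            subtotal += os.path.getsize(path)
        except OSError:
            pass  # unreadable or vanished file: add nothing for it
    with lock:
        totals.append(subtotal)


def folder_size_queued(root, workers=4):
    """Total size in bytes under root: one producer loop, several size workers."""
    paths = queue.Queue()
    totals, lock = [], threading.Lock()
    threads = [threading.Thread(target=size_worker, args=(paths, totals, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    # Producer: a single loop that enumerates every file name.
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            paths.put(os.path.join(dirpath, name))
    # One sentinel per worker, so none of them blocks forever on get().
    for _ in threads:
        paths.put(STOP)
    for t in threads:
        t.join()
    return sum(totals)
```

If a worker never sees a sentinel (or, in the LabVIEW version, never gets the queue error), it waits forever, which is one way a run can appear to hang.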

Message 31 of 37
CORRECTION:
 

@Phillip Brooks wrote:

 

I created a second version that included queues. One loop to get all the filenames and a recursive REENTRANT VI called multiple times to get the file sizes. This pushed the processor to 100% and ran in about half the time.

 

 


 

Message 32 of 37

I found that the output of "Get File Size" is not consistent!

I replaced "Get File Size" with "File/Directory Info" and the totals are much more reliable.

 

I discovered this when running my test example against the "C:\Program Files\National Instruments\IVI" folder. There is a shortcut in that folder...

 

EDIT:

 

A bug that was fixed in LV2009?

 

135928 - Get File Size, when file not found, populates indicator with size from last file that was found
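
A stale output like the one described in 135928 would silently inflate a running total, because the size of the previous file gets added again for the entry that could not be found. A defensive pattern is to use the size only when the lookup reported no error. The Python sketch below illustrates that pattern only; `total_size` is a hypothetical name, and in LabVIEW the equivalent would be checking the error cluster before adding the value.

```python
import os


def total_size(paths):
    """Sum file sizes, counting nothing for paths that cannot be read."""
    total = 0
    missing = []
    for path in paths:
        try:
            size = os.stat(path).st_size
        except OSError:
            # Lookup failed: do not reuse the size from a previous iteration.
            missing.append(path)
            continue
        total += size
    return total, missing
```

Returning the list of failed paths alongside the total makes it easy to see which entries (for example, a broken shortcut) were skipped.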
Message 33 of 37

@Phillip Brooks wrote:

@smercurio_fc wrote:

 

I'm trying your VI now. It's still running after 20 minutes on the single-folder with 1.2 million files. Something doesn't seem quite right...


Maybe one of the recursive VIs isn't exiting properly?

I used a Not a Path constant as the signal to destroy the queue inside "Recursive File Size Queued.vi".

The other instances should then error out and exit, each returning the total size processed by that subVI...


No, that's not the issue. There are actually two issues. The first is that you have two List Folder functions, which is an absolute killer when it comes to listing folders with lots of files. The second is that even though you have 4 parallel queues going, they don't do you much good, since the disk I/O is serialized anyway (at least on a single disk; I'm ignoring RAID here). Thus, you're back to having to get the file information one by one for 1.2 million files, regardless of whether they're in a single folder or in 1000 folders nested 10 levels deep.
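
The first point, listing each folder only once, can be illustrated outside LabVIEW with a small Python sketch. It is an assumption-laden analogy rather than a translation of any VI in this thread: `folder_size` is a hypothetical name, and `os.scandir` is used because it returns files and subfolders from one listing per folder. On Windows, `DirEntry.stat()` can usually answer from the data returned with the listing, without a further system call per file.

```python
import os


def folder_size(root):
    """Total size in bytes of regular files under root, one listing per folder."""
    total = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # visit this subfolder later
                    elif entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
                    # symbolic links and other entry types are skipped
        except OSError:
            continue  # folder vanished or is not readable
    return total
```

The explicit stack replaces recursion, so very deep trees do not depend on a recursion limit.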

Message 34 of 37

To save time working out the problem, I switched to a smaller directory of files with sizes between 20 and 24k, and wrote non-recursive functions.

Separated has separate timings for List Folder, building an array of files (oddly, doing this in a separate loop from the file sizes is faster), and getting the file sizes.

Info tests use File/Directory Info instead of Get File Size (which, oddly, is also faster).

Mix uses my list folder function instead of the LV primitive.

DU is the posted DU function with the -n flag (no recursion).

WinAPI is the non-recursive version of my function that I was using previously.
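
To make the "Separated" measurement concrete, here is a rough Python sketch of timing the three phases independently and averaging over several runs. It is not the attached LabVIEW code: `timed_folder_size` and `average_runs` are hypothetical names, and the warm-up run used to populate the disk cache is an assumption about what "cached" means in these tests.

```python
import os
import time


def timed_folder_size(folder):
    """Return (size_bytes, timings_ms) for one non-recursive pass over folder."""
    t0 = time.perf_counter()
    with os.scandir(folder) as entries:            # phase 1: list the folder
        names = [e.name for e in entries if e.is_file()]
    t1 = time.perf_counter()
    paths = [os.path.join(folder, name) for name in names]  # phase 2: build file list
    t2 = time.perf_counter()
    size = sum(os.path.getsize(p) for p in paths)  # phase 3: get file sizes
    t3 = time.perf_counter()
    timings = {
        "list": (t1 - t0) * 1000.0,
        "build": (t2 - t1) * 1000.0,
        "sizes": (t3 - t2) * 1000.0,
        "total": (t3 - t0) * 1000.0,
    }
    return size, timings


def average_runs(folder, runs=10):
    """Average the per-phase timings over several runs of the same folder."""
    timed_folder_size(folder)  # warm-up run (assumption: later runs hit the cache)
    sums = {}
    size = 0
    for _ in range(runs):
        size, timings = timed_folder_size(folder)
        for phase, ms in timings.items():
            sums[phase] = sums.get(phase, 0.0) + ms
    return size, {phase: ms / runs for phase, ms in sums.items()}
```

Calling `average_runs(folder, runs=10)` returns the folder size together with the mean time per phase in milliseconds.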

 

So, for an average of 10 cached tests of 10K files:

Test          Total (ms)  List Folder (ms)  Build File List (ms)  Get File Sizes (ms)  Size (bytes)
Separated          788.7             132.5                  79.9                576.3     245819811
Pure G             819.3                 -                     -                    -     245819811
Sep Info           679.9             136.1                  79.7                464.1     245819811
Pure G Info        698.6                 -                     -                    -     245819811
Mix                700.7                 -                     -                    -     245819811
Sep Mix            671.4              14.4                  79.3                577.7     245819811
Mix Info           576.5                 -                     -                    -     245819811
Sep Mix Info       563.9              14.1                  80.2                469.6     245819811
DU                  25.8                 -                     -                    -     245819811
WinAPI               9.5                 -                     -                    -     245819811

 

100 tests of 1000 Files

Test          Total (ms)  List Folder (ms)  Build File List (ms)  Get File Sizes (ms)  Size (bytes)
Separated          79.14             12.71                  7.84                58.56      24562057
Pure G             82.04                 -                     -                    -      24562057
Sep Info           67.12             12.66                  7.80                46.64      24562057
Pure G Info        69.49                 -                     -                    -      24562057
Mix                71.86                 -                     -                    -      24562057
Sep Mix            67.93              1.61                  7.76                58.54      24562057
Mix Info           57.80                 -                     -                    -      24562057
Sep Mix Info       56.01              1.66                  7.64                46.70      24562057
DU                 19.91                 -                     -                    -      24562057
WinAPI              0.96                 -                     -                    -      24562057

 

1000 tests of 100 files

Test          Total (ms)  List Folder (ms)  Build File List (ms)  Get File Sizes (ms)  Size (bytes)
Separated          7.802             1.422                 0.802                5.571       2464519
Pure G             8.045                 -                     -                    -       2464519
Sep Info           6.546             1.392                 0.769                4.380       2464519
Pure G Info        6.740                 -                     -                    -       2464519
Mix                6.786                 -                     -                    -       2464519
Sep Mix            6.465             0.225                 0.763                5.471       2464519
Mix Info           5.498                 -                     -                    -       2464519
Sep Mix Info       5.325             0.190                 0.767                4.365       2464519
DU                20.348                 -                     -                    -       2464519
WinAPI             0.140                 -                     -                    -       2464519

1000 tests of 10 files

Test          Total (ms)  List Folder (ms)  Build File List (ms)  Get File Sizes (ms)  Size (bytes)
Separated          0.925             0.291                 0.088                0.541        249394
Pure G             0.937                 -                     -                    -        249394
Sep Info           0.797             0.253                 0.083                0.454        249394
Pure G Info        0.822                 -                     -                    -        249394
Mix                0.736                 -                     -                    -        249394
Sep Mix            0.701             0.071                 0.080                0.549        249394
Mix Info           0.596                 -                     -                    -        249394
Sep Mix Info       0.587             0.066                 0.080                0.439        249394
DU                19.472                 -                     -                    -        249394
WinAPI             0.053                 -                     -                    -        249394

There is also a functional difference between DU and WinAPI on one side and the others: when they find a shortcut, DU and WinAPI return the size of the shortcut itself, while the others return the size of the file the shortcut points to.
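
The same distinction exists for symbolic links on most platforms and can be shown in a few lines of Python. Note the assumption: a Windows shortcut (.lnk) is an ordinary file rather than a symbolic link, so this sketch only illustrates the "size of the link" versus "size of the target" choice and does not reproduce what DU or the LabVIEW primitives do. `sum_sizes` is a hypothetical name.

```python
import os


def sum_sizes(paths, follow_links):
    """Sum sizes in bytes, counting either each link itself or the file it points to."""
    return sum(os.stat(p, follow_symlinks=follow_links).st_size for p in paths)
```

With `follow_links=False` a link contributes only its own small size; with `follow_links=True` it contributes the size of its target, so two tools that differ on this choice will report different totals for the same folder.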

 

I attached the code. It hasn't been cleaned up and is likely buggy (a bug in the CLNs may crash LabVIEW or leak memory), and it has no documentation and few comments (hopefully I deleted the ones that are no longer relevant, and the spelling and grammatical errors aren't too bad). In short, it is not representative of what I would use in production.

Message 35 of 37

Out of curiosity I ran a single cached run overnight. I don't think the disk cache is holding during these long tests, since I don't know what else would explain the differences in List Folder time between the Info and Get File Size numbers.

 

Test          Total (ms)  List Folder (ms)  Build File List (ms)  Get File Sizes (ms)  Size (bytes)
Separated         399843             52063                 11145               336635   31161818258
Pure G            423376                 -                     -                    -   31161818258
Sep Info           91164             18740                 11164                61260   31161818258
Pure G Info        93594                 -                     -                    -   31161818258
Mix               456419                 -                     -                    -   31161818258
Sep Mix           462253             75509                 11115               375629   31161818258
Mix Info           76684                 -                     -                    -   31161818258
Sep Mix Info       74180              2010                 11100                61070   31161818258
DU                   890                 -                     -                    -   31161818258
WinAPI              1323                 -                     -                    -   31161818258
Message 36 of 37

Unfortunately, I have not been able to run the code on my computer since I have LV 8.2. I tried to back-save it, but ran into other issues with that. I'll have to try on my computer at home.

Message 37 of 37