11-10-2011 08:26 AM - edited 11-10-2011 08:26 AM
@smercurio_fc wrote:
I'm trying your VI now. It's still running after 20 minutes on the single-folder with 1.2 million files. Something doesn't seem quite right...
Maybe one of the recursive VIs isn't exiting properly?
I used a Not a Path constant as a signal to destroy the queue inside "Recursive File Size Queued.vi".
The others should error out and exit, returning the total size processed by that subVI...
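For readers more at home in a text language, the stop-signal idea can be sketched in Python. A producer enqueues file paths, several workers dequeue them and sum sizes, and a sentinel value (`None` here, standing in for the Not a Path constant) tells each worker to stop and report its partial total. This is a minimal sketch under assumptions, not the posted VI: `sum_sizes_parallel` and `worker` are hypothetical names, Python threads stand in for parallel reentrant VI instances, and it sends one sentinel per worker instead of destroying the queue.

```python
import os
import queue
import threading

SENTINEL = None  # stand-in for the "Not a Path" stop signal (assumption)

def worker(q, totals, index):
    """Dequeue file paths and accumulate their sizes until the sentinel arrives."""
    subtotal = 0
    while True:
        path = q.get()
        if path is SENTINEL:
            break
        try:
            subtotal += os.path.getsize(path)
        except OSError:
            pass  # missing or unreadable file: skip it rather than guess a size
    totals[index] = subtotal  # each worker reports only what it processed

def sum_sizes_parallel(paths, n_workers=4):
    """Hypothetical helper: sum file sizes using n_workers queue consumers."""
    q = queue.Queue()
    totals = [0] * n_workers
    threads = [threading.Thread(target=worker, args=(q, totals, i))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for p in paths:
        q.put(p)
    for _ in range(n_workers):
        q.put(SENTINEL)  # FIFO order: every path is dequeued before any stop signal
    for t in threads:
        t.join()
    return sum(totals)
```

Because the queue is FIFO, all paths are consumed before the workers see their stop signals, so the partial totals always add up to the full size.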
11-10-2011 08:35 AM
CORRECTION:
@Phillip Brooks wrote:
I created a second version that included queues. One loop gets all the filenames, and a REENTRANT (not recursive) VI is called multiple times to get the file sizes. This pushed the processor to 100% and ran in about half the time.
11-10-2011 10:09 AM - edited 11-10-2011 10:11 AM
I found that the output of "Get File Size" is not consistent!
I replaced "Get File Size" with "File/Directory Info" and the totals are much more reliable.
I discovered this when running my test example against the "C:\Program Files\National Instruments\IVI" folder. There is a shortcut in that folder...
EDIT:
A bug that was fixed in LV2009?
| ID | Description |
|---|---|
| 135928 | Get File Size, when file not found, populates indicator with size from last file that was found |
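The symptom in that bug entry, where a failed lookup silently reports the previous file's size, is easy to guard against in any language by making the "not found" case explicit. Here is a minimal Python sketch of that defensive pattern. `file_sizes` is a hypothetical helper, and this says nothing about how LabVIEW implements Get File Size internally.

```python
import os

def file_sizes(paths):
    """Return (path, size) pairs; size is None when the file cannot be found.

    The size is computed fresh on every iteration, so a missing file can never
    inherit the size of the file processed before it.
    """
    results = []
    for path in paths:
        try:
            size = os.stat(path).st_size
        except FileNotFoundError:
            size = None  # report the miss explicitly instead of reusing a stale value
        results.append((path, size))
    return results
```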
11-10-2011 01:28 PM
@Phillip Brooks wrote:
@smercurio_fc wrote:
I'm trying your VI now. It's still running after 20 minutes on the single-folder with 1.2 million files. Something doesn't seem quite right...
Maybe one of the recursive VIs isn't exiting properly?
I used a Not a Path constant as a signal to destroy the queue inside "Recursive File Size Queued.vi".
The others should error out and exit, returning the total size processed by that subVI...
No, that's not the issue. There are actually two issues. The first is that you have two List Folder functions, which is an absolute killer when listing folders with lots of files. The second is that even though you have 4 parallel queues going, they don't really do you much good, since disk I/O is serialized anyway (at least on a single disk; I'm ignoring RAID here). Thus you're back to having to get the file information one by one for 1.2 million files, regardless of whether they're in a single folder or in 1000 folders nested 10 levels deep.
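The "list each folder only once" point can be illustrated with a hedged Python sketch (not anyone's posted code; `tree_size` is a hypothetical name). `os.scandir` enumerates each folder a single time, and per the Python documentation `DirEntry.stat()` on Windows only needs an extra system call for symbolic links, because the size comes back with the directory listing. That is the same idea as enumerating with the Win32 FindFirstFile/FindNextFile calls, whose find data includes each file's size. It does nothing about the second issue: the disk is still read serially.

```python
import os

def tree_size(root):
    """Total bytes of regular files under root, enumerating each folder exactly once."""
    total = 0
    stack = [root]  # explicit stack instead of recursion
    while stack:
        folder = stack.pop()
        try:
            with os.scandir(folder) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # don't follow linked dirs (avoids cycles)
                    elif entry.is_file(follow_symlinks=False):
                        # On Windows this size comes from the listing itself;
                        # on POSIX it costs one stat call per entry.
                        total += entry.stat(follow_symlinks=False).st_size
        except OSError:
            continue  # unreadable or vanished folder: skip it
    return total
```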
11-10-2011 06:50 PM
To save time working out the problem, I switched to a smaller directory of files with sizes between 20 and 24 KB and wrote non-recursing functions.
- **Separated** ("Sep") variants time the List Folder call, building the array of files, and getting the file sizes separately. Oddly, building the file list in a separate loop from the file sizes is faster.
- **Info** variants use "File/Directory Info" instead of "Get File Size" (which is also, oddly, faster).
- **Mix** variants use my own list-folder function instead of the LabVIEW List Folder primitive.
- **DU** is the posted DU function with the -n flag (no recursion).
- **WinAPI** is a non-recursing version of the function I was using previously.
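For anyone who wants to reproduce the "Separated" style of measurement outside LabVIEW, here is a rough Python analogue. It times listing, path building, and size lookup separately for one folder (no recursion) and averages each column over repeated runs. The function names and the single warm-up pass (meant to make later runs more likely to hit a warm file-system cache, as in the "cached" tests) are my assumptions, not part of the attached VIs.

```python
import os
import stat
import time

def _ms(start, end):
    return (end - start) * 1000.0

def timed_phases(folder):
    """Time listing, path building and size lookup separately for one folder."""
    t0 = time.perf_counter()
    names = os.listdir(folder)                               # "List Folder"
    t1 = time.perf_counter()
    paths = [os.path.join(folder, name) for name in names]   # "Build File List"
    t2 = time.perf_counter()
    size = 0
    for path in paths:                                       # "Get File Sizes"
        info = os.stat(path)
        if stat.S_ISREG(info.st_mode):                       # count regular files only
            size += info.st_size
    t3 = time.perf_counter()
    return (_ms(t0, t3), _ms(t0, t1), _ms(t1, t2), _ms(t2, t3)), size

def average_runs(folder, runs=10):
    """Average [total, list, build, sizes] in ms over `runs` repeats after a warm-up."""
    timed_phases(folder)  # warm-up pass, not included in the averages
    timings = []
    size = 0
    for _ in range(runs):
        t, size = timed_phases(folder)
        timings.append(t)
    averages = [sum(t[i] for t in timings) / runs for i in range(4)]
    return averages, size
```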
So for an average of 10 cached tests of 10K files:
| Test | Total (ms) | List Folder (ms) | Build File List (ms) | Get File Sizes (ms) | Size (bytes) |
|---|---|---|---|---|---|
| Separated | 788.7 | 132.5 | 79.9 | 576.3 | 245819811 |
| Pure G | 819.3 | | | | 245819811 |
| Sep Info | 679.9 | 136.1 | 79.7 | 464.1 | 245819811 |
| Pure G Info | 698.6 | | | | 245819811 |
| Mix | 700.7 | | | | 245819811 |
| Sep Mix | 671.4 | 14.4 | 79.3 | 577.7 | 245819811 |
| Mix Info | 576.5 | | | | 245819811 |
| Sep Mix Info | 563.9 | 14.1 | 80.2 | 469.6 | 245819811 |
| DU | 25.8 | | | | 245819811 |
| WinAPI | 9.5 | | | | 245819811 |
100 tests of 1000 files:
| Test | Total (ms) | List Folder (ms) | Build File List (ms) | Get File Sizes (ms) | Size (bytes) |
|---|---|---|---|---|---|
| Separated | 79.14 | 12.71 | 7.84 | 58.56 | 24562057 |
| Pure G | 82.04 | | | | 24562057 |
| Sep Info | 67.12 | 12.66 | 7.80 | 46.64 | 24562057 |
| Pure G Info | 69.49 | | | | 24562057 |
| Mix | 71.86 | | | | 24562057 |
| Sep Mix | 67.93 | 1.61 | 7.76 | 58.54 | 24562057 |
| Mix Info | 57.80 | | | | 24562057 |
| Sep Mix Info | 56.01 | 1.66 | 7.64 | 46.70 | 24562057 |
| DU | 19.91 | | | | 24562057 |
| WinAPI | 0.96 | | | | 24562057 |
1000 tests of 100 files
| Test | Total (ms) | List Folder (ms) | Build File List (ms) | Get File Sizes (ms) | Size (bytes) |
|---|---|---|---|---|---|
| Separated | 7.802 | 1.422 | 0.802 | 5.571 | 2464519 |
| Pure G | 8.045 | | | | 2464519 |
| Sep Info | 6.546 | 1.392 | 0.769 | 4.380 | 2464519 |
| Pure G Info | 6.740 | | | | 2464519 |
| Mix | 6.786 | | | | 2464519 |
| Sep Mix | 6.465 | 0.225 | 0.763 | 5.471 | 2464519 |
| Mix Info | 5.498 | | | | 2464519 |
| Sep Mix Info | 5.325 | 0.190 | 0.767 | 4.365 | 2464519 |
| DU | 20.348 | | | | 2464519 |
| WinAPI | 0.140 | | | | 2464519 |
1000 tests of 10 files
| Test | Total (ms) | List Folder (ms) | Build File List (ms) | Get File Sizes (ms) | Size (bytes) |
|---|---|---|---|---|---|
| Separated | 0.925 | 0.291 | 0.088 | 0.541 | 249394 |
| Pure G | 0.937 | | | | 249394 |
| Sep Info | 0.797 | 0.253 | 0.083 | 0.454 | 249394 |
| Pure G Info | 0.822 | | | | 249394 |
| Mix | 0.736 | | | | 249394 |
| Sep Mix | 0.701 | 0.071 | 0.080 | 0.549 | 249394 |
| Mix Info | 0.596 | | | | 249394 |
| Sep Mix Info | 0.587 | 0.066 | 0.080 | 0.439 | 249394 |
| DU | 19.472 | | | | 249394 |
| WinAPI | 0.053 | | | | 249394 |
There is also a functional difference between DU/WinAPI and the others: when they encounter a shortcut, DU and WinAPI return the size of the shortcut itself, while the others return the size of the file the shortcut points to.
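The same "count the link or count its target" choice shows up in most file APIs. A hedged Python sketch follows. It uses symbolic links as a stand-in, because Windows .lnk shortcuts are ordinary files that Python's `os` module does not resolve. `folder_size` is a hypothetical helper, and which convention DU, WinAPI, or the LabVIEW functions use is only what was observed in these tests.

```python
import os
import stat

def folder_size(folder, follow_links):
    """Sum the sizes of entries in one folder (non-recursive).

    follow_links=True  -> a link counts as the size of the file it points to (os.stat)
    follow_links=False -> a link counts as the size of the link entry itself (os.lstat)
    """
    total = 0
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        info = os.stat(path) if follow_links else os.lstat(path)
        if stat.S_ISREG(info.st_mode) or stat.S_ISLNK(info.st_mode):
            total += info.st_size
    return total
```

Note that with `follow_links=True` a broken link raises `FileNotFoundError`, which is one more reason to decide explicitly which behavior you want.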
I attached the code. It hasn't been cleaned up and is likely buggy (a bug in the CLNs may crash LabVIEW or leak memory), with no documentation and few comments (hopefully I deleted the ones that are no longer relevant, and the spelling and grammatical errors aren't too bad). In short, it's not representative of what I would use in production.
11-11-2011 03:24 PM
Out of curiosity, I ran a single cached run overnight. I don't think the disk cache holds during these long tests; I don't know what else would explain the differences in List Folder time between the Info and Get File Size numbers.
| Test | Total (ms) | List Folder (ms) | Build File List (ms) | Get File Sizes (ms) | Size (bytes) |
|---|---|---|---|---|---|
| Separated | 399843 | 52063 | 11145 | 336635 | 31161818258 |
| Pure G | 423376 | | | | 31161818258 |
| Sep Info | 91164 | 18740 | 11164 | 61260 | 31161818258 |
| Pure G Info | 93594 | | | | 31161818258 |
| Mix | 456419 | | | | 31161818258 |
| Sep Mix | 462253 | 75509 | 11115 | 375629 | 31161818258 |
| Mix Info | 76684 | | | | 31161818258 |
| Sep Mix Info | 74180 | 2010 | 11100 | 61070 | 31161818258 |
| DU | 890 | | | | 31161818258 |
| WinAPI | 1323 | | | | 31161818258 |
11-11-2011 04:37 PM
Unfortunately, I haven't been able to run the code on my computer since I have LV 8.2. I tried to back-save it but ran into other issues. I'll have to try on my computer at home.