LabVIEW


A File Reference and its evolving life!

Hi all,

 

I've noticed something that came as a bit of a surprise to me, though I think I have an explanation, at a hand-wavy high level anyway.  What I have not established is whether this is a 'bug' or a 'feature', and whether there is any way the following issue can be avoided at the NI function/API layer.

 

Consider the File Open and File Close functions.  You open a file, you use the reference to write/read data, and at some point you close the reference and the Close function spits out the file path.  Here are a couple of tidbits you may not be aware of (and that are easy to test):

 

Q1) After your application opens/creates a file and starts using the file reference to make writes, if an external source renames that file... guess what happens on your next write call?

A1::  The write successfully updates the newly renamed file with your new data, without producing an error or a warning.  (At least this is the case if your program is running on a vxWorks cRIO target and the file is renamed directly on the cRIO via an FTP browser.)

 

Did this surprise you? It surprised me!  My hand-wavy explanation is that the file pointer is managed/maintained by the OS, so when the OS tells the file system to rename the file, the pointer that LabVIEW holds remains valid and the contents of the memory at the pointer location are updated by the OS.
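To illustrate the idea (a minimal POSIX C sketch on a desktop OS, not actual cRIO code, and whether vxWorks behaves identically underneath is exactly the open question): a descriptor refers to the underlying file object, not to its name, so a rename does not invalidate it.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* open/create a file and keep the descriptor, like Open/Create/Replace File */
    int fd = open("original.log", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "first line\n", 11);

    /* pretend an external source renames the file while it is still open */
    if (rename("original.log", "renamed.log") != 0)
        perror("rename");

    /* the descriptor still refers to the same underlying file object,
       so this write lands in renamed.log without any error */
    if (write(fd, "second line\n", 12) < 0)
        perror("write after rename");

    close(fd);  /* a LabVIEW Close File here would still report "original.log" */
    return 0;
}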

 

Q2) Continuing from the situation set up in Q1: after writing several new chunks of data to a file that is now named something completely different from when the file reference was originally created, you use the Close function to close the reference.  What do you expect on the file-path output of the Close function??  What do you actually get??

A2::  The Close function will 'happily' return the ORIGINAL file path, not the path of the file it has actually been writing to(!).   This has some potentially significant ramifications for what you can use that output for.  There is a ton of room here for pontification and more or less 'crazy' schemes for what one could do, but I would argue the bottom line is that at this point your application has completely lost the ability to accurately and securely track its file(s).  Yes, you could list a folder and try to 'figure out' whether your file was renamed while you were writing, and you can make more or less good 'guesses' about which file you really had open, but you can never know for sure.

 

So, what do you guys think?  Is the behavior of returning the (incorrect) original file path when you close the handle a BUG or a FEATURE??  Would it not be possible for LabVIEW to read back the data contained in the (OS?) pointer location and, as needed, update the path output when it closes a reference?  Shouldn't we EXPECT that to be the behavior?
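For what it's worth, at least some OSes expose a way to ask "what is this open handle's current name?".  A Linux-only illustration (vxWorks has no /proc, and whether LabVIEW could do something equivalent internally is exactly what I'm asking; the helper name here is my own):

#include <stdio.h>
#include <unistd.h>

/* Fill 'buf' with the path that an open descriptor currently refers to,
   even if the file was renamed after it was opened.  Linux-specific. */
int current_path_of_fd(int fd, char *buf, size_t buflen)
{
    char linkname[64];
    snprintf(linkname, sizeof linkname, "/proc/self/fd/%d", fd);

    ssize_t n = readlink(linkname, buf, buflen - 1);
    if (n < 0)
        return -1;          /* e.g. not supported on this OS */

    buf[n] = '\0';
    return 0;
}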

 

Q3)  Again continuing from the above situation, let's assume we are back at the state in Q1, writing data to a (re)named file.  What happens if the file is deleted by an external process? What happens to the file reference? To file function calls using the reference?

A3::  This one is less surprising.  The file reference remains 'valid' (because it is a valid reference), but depending on which file function you call you will get errors such as error 6 (reported by a binary write) or error 4 (reported by a TDMS write), etc.  So as long as you don't rely on file refnum tests to establish whether you are good to go with a write or other file action, you should be able to recover in an appropriate way.  Just don't forget to close the file reference: even if the file is 'gone', the reference will remain in memory until you 'close' it (with an error). (I might be wrong about this last part?)
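In other words, the health check has to be the file operation itself, not the refnum.  The same pattern in POSIX C terms (the error handling and helper name are illustrative, not LabVIEW's):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int append_record(int fd, const void *data, size_t len)
{
    ssize_t n = write(fd, data, len);
    if (n < 0) {
        /* the descriptor itself still looks 'valid'; only the operation
           tells you that the file is gone or otherwise unusable */
        fprintf(stderr, "write failed: %s\n", strerror(errno));
        return -1;   /* caller decides how to recover */
    }
    return 0;
}

/* ...and regardless of any error, the descriptor still has to be closed
   eventually, otherwise it stays allocated:  close(fd);  */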

 

 

I am not sure whether the above is possible on e.g. Windows; Windows would probably prevent you from renaming a file that has an open file handle to it, but this is definitely observable on at least vxWorks cRIO targets.  (I don't have PharLap ETS or NI Linux RT devices so I can't test on those targets... if you want to test, it's pretty straightforward to make a simple test app for it.)

 

 

 

[begin rant-mode related to why I found this out and why this behavior BITES]

There are situations where the above could cause some rather annoying issues that, for somewhat contrived reasons related to cRIO file API performance and CPU/memory resource management, are non-trivial to work around.  For example, the NI "List Folder" function takes a very hefty chunk of time at 100% CPU that you cannot break up, so polling/listing folders after every file update (or even on a less regular interval) is a big challenge.  And if you are really unlucky (or didn't know any better) and issued the list command on a folder with thousands of files (as opposed to fewer than about 100), the list will lock your CPU at 100% for tens of seconds...  Therefore you might be tempted to maintain your own lookup table of files (see the sketch after this rant) so that your application can upload/push/transfer and/or delete files as dictated by your application-specific conditions... except that only works until some prankster or well-intentioned person remotes in and starts renaming files, because then your carefully maintained list of file names/paths suddenly falls apart. 😠

[\end rant]
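(The 'maintain your own lookup table' idea from the rant, very much reduced to a sketch; the structure, sizes and names are made up for illustration, and our real code is LabVIEW, not C.  The point is that the catalogue is only updated by the application's own create/delete calls, so no List Folder is needed in the steady state... which is precisely why an external rename silently breaks it.)

#include <string.h>
#include <time.h>

#define MAX_TRACKED_FILES 2048

typedef struct {
    char   path[256];
    long   bytes;
    time_t created;
} tracked_file_t;

static tracked_file_t catalogue[MAX_TRACKED_FILES];
static int            catalogue_count = 0;

/* call right after the application itself successfully creates a file */
int catalogue_add(const char *path, long bytes)
{
    if (catalogue_count >= MAX_TRACKED_FILES)
        return -1;
    tracked_file_t *e = &catalogue[catalogue_count++];
    strncpy(e->path, path, sizeof e->path - 1);
    e->path[sizeof e->path - 1] = '\0';
    e->bytes   = bytes;
    e->created = time(NULL);
    return 0;
}

/* call right after the application itself deletes one of its files */
void catalogue_remove(const char *path)
{
    for (int i = 0; i < catalogue_count; i++) {
        if (strcmp(catalogue[i].path, path) == 0) {
            catalogue[i] = catalogue[--catalogue_count];
            return;
        }
    }
}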

 

 

 

QFang
-------------
CLD LabVIEW 7.1 to 2016
Message 1 of 25
(4,851 Views)

I suspect the issue is the overhead that would be necessary to be constantly "aware" when another routine over which you have no control changes something that your program needs.  In situations where this could occur, one needs to "harden" the system to prevent such things (such as running a Real-Time OS and restricting access to specific IPs or NICs).  Indeed, one function of the RTOS is to prevent Windows, itself, from messing things up (like having a Virus Scan run during a time-critical loop).

 

BS

Message 2 of 25
(4,810 Views)

Would the "Deny Access" file function make it work the way you need it to in vxWorks?

 

File I/O > Advanced File Functions > Deny Access

Troy - CLD "If a hammer is the only tool you have, everything starts to look like a nail." ~ Maslow/Kaplan - Law of the instrument
Message 3 of 25
(4,787 Views)

The File Path returned by the Close File function is indeed always the file path that was used to open the file. Changing a file's name under the nose of another application is, under Windows, at best a dangerous adventure, but I believe it is possible too. It's how I get around the situation where a (crashed) Windows app still has a file handle open to a file, preventing me from deleting that file from the disk: renaming it suddenly makes the file overwritable/deletable.

 

That this works under Windows in some fashion certainly makes it less strange that you can do it under VxWorks too. Possibly denying write access as mentioned in the other post might prevent this, but it's no guarantee, especially on an RT system. Those systems are not programmed around the assumption that multiple users can access the same resources simultaneously, and therefore take little to no precautions to prevent (or secure against) that.

 

As to listing folder contents: that is the kind of thing many Linux (and other OS) programmers have spent man-years trying to solve in a way that is performant, multithreading-safe, multiuser-safe and whatnot, and it is always a tradeoff in one or more of those areas.

Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 4 of 25
(4,765 Views)

Interesting problem.  I did some tests.

 

Test Renaming File.png

 

On Windows, the code above throws an error in the RENAME function - "File Already Open"

The second text is not written, but the file is closed and contains the first text.

The path returned is the original path.

 

If I replace the RENAME operation with a 10-second delay and use Windows Explorer to rename the file during that delay, the rename fails with an error: "File is open".

 

All of that is consistent and sensible.

 

On a PXI box, running PharLap 13.1:

 

The code as shown above returns NO error.

The path returned is the original path "Eraseme.txt"

The file exists as "Eraseme2.txt" but it's EMPTY !

 

If I replace the RENAME with a delay, and change the name manually during the delay, then it's the same:

NO error.

The path returned is the original path "Eraseme.txt"

The file exists as "Eraseme2.txt" but it's EMPTY !

 

That seems wrong to me.

Steve Bird
Culverson Software - Elegant software that is a pleasure to use.
Culverson.com


LinkedIn

Blog for (mostly LabVIEW) programmers: Tips And Tricks

Message 5 of 25
(4,740 Views)

that is the kind of thing many Linux (and other OS) programmers have spent man-years trying to solve in a way that is performant, multithreading-safe, multiuser-safe

 

Indeed.  Just thinking about the problem gives me a headache.  It's easy to see the issues that need to be resolved and it's like a balloon: squeeze it down over here and it bulges out over there.

Steve Bird
Culverson Software - Elegant software that is a pleasure to use.
Culverson.com


LinkedIn

Blog for (mostly LabVIEW) programmers: Tips And Tricks

Message 6 of 25
(4,735 Views)

Hey guys, thanks for turning out your comments on this thread!

 

-Deny Access: still able to rename (and delete) the file via the FTP browser (didn't test other file avenues).  I think this is for the same reason that NI vxWorks targets (such as the cRIO-9014) do not support the concept of different users with different rights; as such, everyone has access rights to everything at the OS level.  Another issue for me is that "Deny Access" does not work on TDMS file references, so even if it worked, it would not help me.

--> I strongly suspect these things are non-issues, or issues that can be properly managed, on the new NI Linux RT targets, since FTP is disabled by default and they support user accounts and user restrictions on files/folders.  The controller could simply create its files in a tree where 'nobody else' has write access.
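(On such a target that part could be a one-liner at startup; sketch only, and the path is an assumption for illustration:)

#include <sys/stat.h>
#include <sys/types.h>

int make_private_log_dir(void)
{
    /* rwx for the owning account only; group/others get nothing */
    return mkdir("/home/lvuser/private_logs", 0700);
}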

 

Obviously nobody should mess around with files on a (running) cRIO, but customers don't always do what they are supposed to do. 😉   

 

As far as the 'resources' or overhead of updating the file refnum with the new information goes, this would not need to be done in a polling fashion: when the File Close function is called, it could, as part of that call, update its internal record from the pointer data, so this should be a low-overhead operation, I would think?  If that is a real concern, a boolean input defaulting to 'don't update', or a separate 'advanced close', could be added?

 

I've included a zip with the LV2013 project and test VIs (one for TDMS, one for binary) that I used.  Nothing fancy, but in the interest of full disclosure.  The snippet below is the 'binary file' test VI, in case you just want a quick peek:

binary snippet.png
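For anyone who can't open the snippet or the zip, the sequence both test VIs follow is roughly equivalent to this POSIX C pseudo-version (reconstructed from the descriptions in this thread, not translated from the actual VIs):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;

    int fd = open("Eraseme.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    write(fd, "first text\n", 11);

    /* the step an external FTP client (or the RENAME node) performs in the real tests */
    rename("Eraseme.txt", "Eraseme2.txt");

    write(fd, "second text\n", 12);   /* keeps succeeding on vxWorks */
    close(fd);                        /* LabVIEW still reports "Eraseme.txt" here */

    printf("Eraseme.txt exists:  %s\n", stat("Eraseme.txt",  &st) == 0 ? "yes" : "no");
    printf("Eraseme2.txt exists: %s\n", stat("Eraseme2.txt", &st) == 0 ? "yes" : "no");
    return 0;
}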

 

Steve Bird's finding of (yet) another behavior on PharLap systems is also very interesting, I think!!

[EDIT]  JUST TO CLARIFY: on vxWorks, the renamed file keeps being written to successfully, unlike the empty file that Steve Bird found on PharLap.

QFang
-------------
CLD LabVIEW 7.1 to 2016
Message 7 of 25
(4,727 Views)

Steve Bird ::  Indeed, I have a deep respect for the people who work on designing file systems and OS-level file functions!  It is about as far from trivial as I can imagine, especially when you start moving into niche constraints, where you need 'atomic, fail-safe file updates' while being 'CPU and memory restricted' and 'must support large files and folder structures AND be fast'...

 

A bit of a side-bar on 'that stuff' this all relates to:

The balloon analogy is very good and matches how my approaches have evolved over time.  Initially we were unaware of the/any file-system related issues.  Then we learned that performance goes through the floor if you have a) very large files or b) a large amount of items (folders or files) at any folder level (even if it is not the folder you are currently operating in/on); 'large' being nebulous, but seemingly you are mostly OK with a count of 100 to 120 or less at any level.  So we changed our log outputs to live within those limitations, in the process complicating the 'as needed, delete oldest file first, must work even after rebooting (i.e. must scan the drive)' logic.  We next learned that as the drive filled up with 'safe' folders and folder levels, due to a new customer requirement of 'one scan per ASCII WITSML file output' along with 'maximum log file retention', scanning all the files in a 'side loop' and 'deleting/trimming folders as required' was not fast enough to keep up with file production.  In fact, with 12k+ files (in hundreds of folders) the over-use of list and recursive file scans fell so far behind that new file production overtook file cleanup (after about 60 days of run-time), the 2 GB drive eventually overfilled, and it all went to **bleep**'s.  (The file-system limitations above are specific to the (old) Reliance file system that NI uses.)

 

We handled that by making some forced assumptions about files only being created by the application itself, combined with a one-time comprehensive scan of all files/folders at boot-up, prior to data acquisition.  We hold the information in a two-level-deep variant attribute lookup, with a helper variant lookup for folder summaries (that we can do fast limit checks on).  At that point it was fairly robust, as long as nobody renames or adds files on the system (since those files would not be catalogued and would thus be 'invisible' to our cleaners), but it's an evolving thing for sure, constantly trying to balance fault tolerance (toward various corner cases) against performance and resource management.  It's a challenging and rewarding topic, I think.  By the way, we no longer use the recursive file/folder listing, as it tied up the CPU too much; instead we wrote our own version with built-in 'sleep time' so that we can gradually build the listing while still giving time to other startup and initialization processes (roughly as sketched below).
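(The 'listing with built-in sleep time' bit, again as a rough POSIX-style sketch rather than our actual LabVIEW code; the callback, the 25-entry chunk size and the 10 ms sleep are illustrative choices:)

#include <dirent.h>
#include <time.h>

typedef void (*entry_cb_t)(const char *name);

int list_dir_gently(const char *path, entry_cb_t on_entry)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;

    struct dirent *e;
    int count = 0;
    while ((e = readdir(d)) != NULL) {
        on_entry(e->d_name);

        /* every 25 entries, sleep ~10 ms so lower-priority work
           (or the rest of the startup sequence) gets CPU time */
        if (++count % 25 == 0) {
            struct timespec ts = { 0, 10 * 1000 * 1000 };
            nanosleep(&ts, NULL);
        }
    }
    closedir(d);
    return 0;
}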

QFang
-------------
CLD LabVIEW 7.1 to 2016
Message 8 of 25
(4,722 Views)
I'm sure there are reasons for it, but storing that many files and doing that much file I/O on a cRIO sounds like a bad idea to me. All the storage on a cRIO is flash-based, so you get a limited number of write cycles - I've seen numbers between 100,000 and 1 million per sector. So, if you're writing/moving/deleting lots of small files frequently, you'll eventually get errors or bad data, probably with no warning. I've never actually seen this happen, but I've always tried to limit how often I write to a flash device. If you're writing to an external USB device this is less of a problem, since at least that's replaceable.
Message 9 of 25
(4,711 Views)

Hi nathand,

 

I hear you, but as you suspect, there are reasons for everything.  I did some modelling (with NI) on SSD life in our use case, and even with artificially accelerated worst-case scenarios, the wear-out is more than several years out.  Under realistic field conditions, the estimated lifetime is past 5-10 years.
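Just as a back-of-envelope (my round numbers, not the actual model): at, say, 4 kB written every 5 minutes you write roughly 1.2 MB per day, or about 0.4 GB per year.  Even if the 2 GB of flash were rated at only 100,000 erase cycles per sector and wear leveling spread the writes evenly, that is a raw write budget on the order of 2 GB x 100,000 = ~200 TB, which that data rate doesn't come anywhere near within the life of the equipment.  File-system metadata and small-write amplification eat into the budget substantially, but it is easy to see why the modelled wear-out ends up being years rather than months.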

 

We keep this in mind as much as we can.  The data rate is, relatively speaking, low (a few kB of data every several minutes); however, limited-to-no network access combined with questionable power source and UPS availability drives the 'max data retention' and 'can't use a USB drive' parts of the design.  (USB drives can become unmountable after unexpected power interruptions, for example; plus they must be FAT formatted, which means a bit of bad luck ruins the file allocation table, rendering the data on the drive 'lost' unless by some heroic effort the table is reconstructed; possible, but not ideal.)  We also typically set the 'max space' limit in our in-house disk maintenance routine to leave ~20+% of the total space on the SSD free.  In fact, we're not too happy with the mounting options for drives on e.g. the Linux RT products, because despite what (most?) people think about ext3 and ext4, they are far from being as safe as e.g. Reliance or Reliance NITRO; both have (several) possible conditions that can cause lost data and/or file corruption.

 

[Edit] For the devices that are not on any network, data is collected seasonally, and often involves helicopter rides... so to say that corrupted or lost data is 'expensive' is an understatement.  This is starting to diverge from the topic of the thread quite a bit though... THANKS QFang... so I'd best shut up and re-focus the thread!!  The file refnum and external deletion/renaming is obviously not an issue for systems running 'off grid'.

QFang
-------------
CLD LabVIEW 7.1 to 2016
Message 10 of 25
(4,705 Views)