Proving a file has not been altered using information within said file?

DoctorAutomatic · ‎05-10-2022

I don't know what this is commonly called (hashing, encryption, digital signing, etc.), but I could really use a way to show that a file my program generates was created only by my own program at a specific time and has not been altered elsewhere by other means. I realize an MD5 hash accomplishes part of this. The problem is that I need the "proof" to reside inside the file itself. So my first thought was that the file would consist of two parts: the data of concern, and the MD5 hash of that PART of the file. But then of course the data could be changed and rehashed and no one would know. I may have to live with this flaw. I do NOT need some ridiculous bank-level security or anything; I'm really interested in something that significantly discourages alteration and allows detection of alteration within reason. I need it to be self-contained (no third-party certificate authority stuff or network access) and self-referential in the sense that the "proof" is there in the file itself.

Again, this doesn't need to be bulletproof, I'm not guarding nuclear codes or billions of dollars. I'm hoping there's something out there a little more elegant and a bit more secure than my md5 idea earlier, but I could live with that if necessary. I need to also reiterate that I won't be able to depend on network access, so something that's purely local or within LabVIEW is ideal. I'd also prefer to implement something that is standardized/known and free as opposed to me making some home-cooked hash/algo of my own in LabVIEW.

Any ideas? I just want to be able to reasonably convince others that a file was created by my program, on a certain date/time, and that the contents have not been altered since then, without having to reference a second file (like a file containing the MD5 hash of the first file). I'd also like the actual contents to still be readable/unencrypted.

Gregory · ‎05-10-2022

You can try something like a cyclic redundancy check: https://forums.ni.com/t5/Example-Code/Calculating-the-CRC32-of-a-File-with-LabVIEW/ta-p/3496230

Also, the timestamp of the file is available using "File/Directory Info"
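For readers outside LabVIEW, here is a rough Python sketch of the same two steps (Python is an assumption for illustration only; the linked example is a LabVIEW VI). The helper names `file_crc32` and `file_mtime_utc` are hypothetical. Keep in mind that a CRC only catches accidental corruption, and a file system timestamp can be changed by anyone with write access.

```python
import os
import zlib
from datetime import datetime, timezone


def file_crc32(path, chunk_size=65536):
    """Return the CRC-32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        # Feed the file through zlib.crc32 chunk by chunk so large
        # files do not have to fit in memory.
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def file_mtime_utc(path):
    """Return the file's last-modified time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
```

Calling `format(file_crc32(path), "08x")` gives the checksum in the usual eight-digit hex form.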


LLindenbauer · ‎05-10-2022

You probably won't like to hear this, but you need to work on your threat model. Who is going to check the authenticity of the file? Who do you suspect might tamper with the file? Do you want to guard against accidents or against malicious intent? Is the program running in a compromised environment?

Your suspicion about hashing functions is correct. It sounds like you are looking for https://en.wikipedia.org/wiki/Digital_signature. These systems are widely used and reasonably simple to use. Check out https://docs.microsoft.com/en-us/dotnet/standard/security/cryptographic-signatures for an example.

The requirement to have the signature in the same file is peculiar. If you put both the signature and the file into a zip-file, the zip-file contains both a readable version of the file and the signature, does it not? The same is true if you append the signature to a text file.
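A minimal sketch of the "append it to the text file" idea, in Python for illustration. One loud caveat: Python's standard library has no public-key signing, so this swaps in an HMAC (a keyed hash) as a stand-in. That is not a true digital signature, because whoever holds the key to verify can also forge. The trailer marker `MARKER` and the functions `seal` and `verify` are hypothetical names, not part of any standard.

```python
import hashlib
import hmac

# Hypothetical trailer marker; any line unlikely to occur in the data works.
MARKER = b"\n--SEAL-HMAC-SHA256: "


def seal(data: bytes, key: bytes) -> bytes:
    """Return the data with a keyed SHA-256 tag appended as a final text line."""
    tag = hmac.new(key, data, hashlib.sha256).hexdigest().encode("ascii")
    return data + MARKER + tag + b"\n"


def verify(sealed: bytes, key: bytes) -> bool:
    """Check that the trailing tag matches everything before the marker."""
    # rpartition splits on the LAST marker, so data that happens to
    # contain the marker text itself is still handled correctly.
    data, sep, trailer = sealed.rpartition(MARKER)
    if not sep:
        return False  # no trailer present
    expected = hmac.new(key, data, hashlib.sha256).hexdigest().encode("ascii")
    return hmac.compare_digest(trailer.strip(), expected)
```

The data stays readable in any text editor. If the creation time matters, writing it as a line inside `data` before calling `seal` means the tag covers it too.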

To come around to the threat scenario: If you don't need to defend against actively malicious actors, adding the MD5 should be more than sufficient to check for integrity - any additional effort will probably make maintenance harder down the line. On the other hand, if you truly suspect a malicious actor, you might come around, second-guess the validity of the makeshift solution and have to restart the process from the beginning. Then again, if you just need some security theater, an MD5 should look intimidating enough.

Kyle97330 · ‎05-10-2022

You might want to consider at least dropping MD5 for something still public but more secure. Not long ago (2020?) LabVIEW officially deprecated its MD5 function and replaced it with a polymorphic VI implementing a bunch of flavors of the SHA algorithm.

Still, your requirement to have everything in one file makes real security impossible by definition, as the other posters mentioned. If the file contains 100% of the information needed to validate it, it also contains 100% of the information needed to create a falsified version of it.

Without a reference kept somewhere else, that's always going to be something you can't stop from happening.
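To make that concrete, here is a small Python sketch (Python and the trailer format `TAG` are assumptions for illustration) of a file that carries its own SHA-256. A careless edit is caught, but anyone who knows the layout can simply recompute the hash.

```python
import hashlib

TAG = b"\nSHA256:"  # hypothetical trailer format


def embed_hash(data: bytes) -> bytes:
    """Append the plain (unkeyed) SHA-256 of the data to the data itself."""
    return data + TAG + hashlib.sha256(data).hexdigest().encode("ascii")


def check_hash(blob: bytes) -> bool:
    """Validate a blob using only information stored inside the blob."""
    data, sep, digest = blob.rpartition(TAG)
    return bool(sep) and hashlib.sha256(data).hexdigest().encode("ascii") == digest


original = embed_hash(b"result=PASS")

# Editing the data without touching the hash is detected...
tampered = original.replace(b"PASS", b"FAIL")

# ...but rebuilding the file from scratch passes the same check,
# because no secret was ever needed to produce the hash.
forged = embed_hash(b"result=FAIL")
```

Here `check_hash(original)` and `check_hash(forged)` both succeed, while `check_hash(tampered)` fails.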

Is there any way you can do something like record just the checksum, i.e. no sensitive data, to an external database or something along those lines?
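As a rough idea of what that could look like, here is a Python sketch using SQLite as the "external database". The table layout and the names `open_registry`, `record`, and `matches` are all hypothetical, and a local SQLite file is only as trustworthy as whoever can write to it.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def open_registry(db_path: str) -> sqlite3.Connection:
    """Open (or create) a small table that stores checksums only, no file data."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "name TEXT PRIMARY KEY, sha256 TEXT NOT NULL, recorded_utc TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    """Store the SHA-256 and the current UTC time for a newly created file."""
    # A plain INSERT on a PRIMARY KEY column raises sqlite3.IntegrityError
    # if the name already exists, so an entry is never silently overwritten.
    conn.execute(
        "INSERT INTO checksums VALUES (?, ?, ?)",
        (name, hashlib.sha256(data).hexdigest(),
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def matches(conn: sqlite3.Connection, name: str, data: bytes) -> bool:
    """Return True if the data still hashes to the recorded checksum."""
    row = conn.execute(
        "SELECT sha256 FROM checksums WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(data).hexdigest()
```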

You could also consider some form of WORM (Write-Once-Read-Many) data storage as an alternative. You can go old-school and get a CD/DVD burner, or slightly more modern and get specialized WORM USB drives. Once the file is written there, you can't change it. It could be destroyed, but not changed.

BertMcMahan · ‎05-10-2022

If you're not guarding against a REALLY dedicated attacker, just obfuscate your method for creating the signature. Then anyone who tried to duplicate it couldn't. Someone could probably reverse engineer it if they were really dedicated, but it might be good enough for your work.

For example, if you had the "data" part of your file, you could create a hash of the data, then add 0x04 to every fifth character. That would be relatively easy to check but it'd be a real pain to reverse engineer it. Or do it in reverse; add 0x04 to every fifth byte, then create the hash.
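Both variants can be sketched in a few lines of Python (used here only for illustration). The choices of SHA-256, of operating on raw bytes, and of counting "every fifth" as the 5th, 10th, 15th, ... byte are assumptions; the function names are hypothetical.

```python
import hashlib


def _bump_every_fifth(buf: bytes) -> bytearray:
    """Add 0x04 (wrapping at 256) to the 5th, 10th, 15th, ... byte."""
    out = bytearray(buf)
    for i in range(4, len(out), 5):
        out[i] = (out[i] + 0x04) & 0xFF
    return out


def obfuscated_digest(data: bytes) -> str:
    """Variant 1: hash the data, then tweak the resulting digest bytes."""
    return _bump_every_fifth(hashlib.sha256(data).digest()).hex()


def digest_of_obfuscated(data: bytes) -> str:
    """Variant 2: tweak the data bytes first, then hash the result."""
    return hashlib.sha256(_bump_every_fifth(data)).hexdigest()
```

Variant 1 is trivially undone by anyone who compares the stored value against a plain SHA-256 of the data and notices which bytes differ by 4. Variant 2 gives away less, because the stored value looks like an ordinary hash of unknown input.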

This assumes that your "data validation method" is also hidden. If anyone needs to be able to validate it, I think you're going to have to go with something cryptographic.

RTSLVU · ‎05-11-2022

While I don't have a good answer for you other than to encrypt your data files, I have to comment on how much questions like this bother me.

If there is no legal or contractual requirement for this, then it means you (or your company) don't trust employees not to falsify test data or, basically, commit fraud.

I find that insulting to everyone in our field as the vast majority of engineers and technicians have a high sense of integrity when it comes to their work and most would resign before they would falsify test data.

If you are being urged to add this to prove you didn't alter the data your program collected, then you should be insulted.

