03-21-2017 09:16 AM
I inherited a DLL that the project uses to generate a checksum. The DLL builds a list of project files and calculates the MD5 hash of each one. Some files are excluded, such as anything Subversion (SVN) uses to track versions. The calculated MD5 hash values are saved to a text file. They are also passed to a roll-up hash function we built that 'and's each one into a single overall hash.
Later, using the same DLL, we run a verify step that calls the exact same MD5 hash generator function and does a string comparison (strncmp(), to be specific). The source of the 'known hash' is the text file saved earlier.
Here is where I found something unexpected: the CVI .c source files inside the project have a changing hash. Specifically, it is just the .c source files that generate both a DLL and an EXE that change. Everything else in that workspace matches, and that includes the generated files, i.e. the DLL/EXE.
I tried making all the files read-only using SetAttrib() before any MD5 is calculated. Then I ran the generate and the verify, and the hashes still changed. I ran the generate again, thinking that now everything is read-only and nothing should change, and the hashes still changed when I ran the verify.
I am using CVI 2013 SP2 with .NET version 4.5.2, and we are not using FIPS-compliant algorithms. The function that gets the MD5 hash is pasted below.
I am wondering if anyone has ideas about where this behavior is coming from. I could always add the .c files to the ignore list, but I would rather understand what is going on before I do that.
char* getFileMD5(const char* filePath)
{
    Initialize_mscorlib();
    System_IO_FileStream fileStream = NULL;
    System_Text_StringBuilder sbInstance = NULL;
    System_Text_StringBuilder sbReplaced = NULL;
    System_Security_Cryptography_MD5 cryptoInstance = NULL;
    char* hyphenatedHash = NULL;
    char* hashString = NULL;
    unsigned char* hash = NULL;
    ssize_t hashLength = 0;
    // Create a new instance of the cryptography class
    System_Security_Cryptography_MD5_Create(&cryptoInstance, NULL);
    System_Security_Cryptography_MD5_Initialize(cryptoInstance, NULL);
    // Open a file stream to the provided file path; exceptions are not handled here and probably should be
    System_IO_File_OpenRead(filePath, &fileStream, NULL);
    // Compute the MD5 hash of the file
    System_Security_Cryptography_MD5_ComputeHash(cryptoInstance, (System_IO_Stream) fileStream, &hash, &hashLength, NULL);
    // Convert the MD5 hash from a byte array to a hyphen-separated hex string
    System_BitConverter_ToString_1(hash, hashLength, &hyphenatedHash, NULL);
    // Strip the hyphens; the handle returned by Replace is kept separately so both handles can be discarded
    System_Text_StringBuilder__Create_2(&sbInstance, hyphenatedHash, NULL);
    System_Text_StringBuilder_Get_Length(sbInstance, &hashLength, NULL);
    System_Text_StringBuilder_Replace(sbInstance, "-", "", 0, strlen(hyphenatedHash), &sbReplaced, NULL);
    System_Text_StringBuilder_ToString(sbReplaced, &hashString, NULL);
    StringLowerCase(hashString);
    // Clean up; the intermediate hyphenated string is freed so it does not leak
    CDotNetFreeMemory(hash);
    CDotNetFreeMemory(hyphenatedHash);
    CDotNetDiscardHandle(sbReplaced);
    CDotNetDiscardHandle(sbInstance);
    CDotNetDiscardHandle(fileStream);
    CDotNetDiscardHandle(cryptoInstance);
    Close_mscorlib();
    // The caller owns the returned string and should release it with CDotNetFreeMemory
    return hashString;
}
03-22-2017 03:39 PM
The build output is going to change every time you build the executable, for a few reasons. The time and date, which are included in the build, will have changed. Data structures that are built internally are also dynamically allocated, and keys in the hash table are based on the pointer values of that allocated memory. The following Stack Overflow page describes this a little as well.
03-23-2017 07:39 AM
OK, I understand now where this change is coming from in the .c files. Let me ask a follow-on question: does that pointer allocation occur each time the file is accessed, even if it is accessed outside of a compiler? What I am getting at is that I would expect a source file to be just an ASCII text file that is read in. The changing hash makes me think the source file is actually stored as a precursor to executable code, as if the system looks at the file and asks 'what does the system need to allocate?', and maybe even responds with the types of resources it has.
It might be easier for me to just write a function that dumps the .c file out as a text file and run the checksum on that.
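As a rough illustration of the "treat the .c file as plain bytes" idea, here is a minimal sketch in standard C. The function name checksum_file_bytes is hypothetical, and FNV-1a is swapped in for MD5 purely to keep the example short; it is not what the DLL in this thread uses. The point is only that reading the file in binary mode and hashing exactly the bytes on disk should give the same value on every run as long as the file content is unchanged.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: checksum the raw bytes of a file.
   FNV-1a (64-bit) stands in for MD5 here only to keep the sketch short.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int checksum_file_bytes(const char *path, uint64_t *out)
{
    unsigned char buf[4096];
    size_t n, i;
    uint64_t h = 0xcbf29ce484222325ULL;    /* FNV-1a 64-bit offset basis */
    FILE *fp = fopen(path, "rb");          /* binary mode: no newline translation */

    if (fp == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;         /* FNV-1a 64-bit prime */
        }
    }

    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *out = h;
    return 0;
}
```

Calling this twice on the same untouched file and comparing the two results is a quick way to check whether the file content itself is stable, independent of the rest of the checksum process.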
03-24-2017 11:50 AM
Hi CJHammond,
If you are checksumming a text copy of a .c file, it should not include the differing pointer allocations, even if it was compiled at different times, because it would contain only the code itself. This sounds like a good way to check exclusively for changes in the code.
Thanks,
ShaneK
Applications Engineering
04-04-2017 11:09 AM
I believe I have the issue narrowed down to the 'how', that is, the process of determining the checksum. I basically ended up rewriting almost the entire DLL, and I now get the same answer for the checksum.
What I did to fix things up was this:
I wrote a new function that outputs every MD5 hash and its file name (with path) to a text file.
I also wrote a function that runs the first function, compares two text files, and writes the outcome to another text file.
Then I ran the first function right after deploying a new application build.
Ran some of the tests in the application.
Then ran the compare function. Jumped up and down. High-fived everyone in the office. Accepted praise and adoration. Obviously, a successful execution.
Went and made a change to a source file and saved it. Reran the compare and got a fail for the test, which is great: a fail condition was detected.
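The two-file comparison described in the steps above could look roughly like the following sketch. Everything here is an assumption for illustration: the function name compare_manifests, the MAX_LINE limit, and a manifest layout of one "hash path" entry per line, with both files listing entries in the same order. It is not the code from the rewritten DLL.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024   /* assumed upper bound on one "hash path" line */

/* Hypothetical helper: compare two manifest files line by line and write
   every differing line pair to report. Returns the number of differing
   lines, or -1 if either manifest cannot be opened. */
int compare_manifests(const char *known_path, const char *fresh_path, FILE *report)
{
    char a[MAX_LINE], b[MAX_LINE];
    char *ra, *rb;
    int line = 0, diffs = 0;
    FILE *fa = fopen(known_path, "r");
    FILE *fb = fopen(fresh_path, "r");

    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }

    for (;;) {
        ra = fgets(a, sizeof a, fa);
        rb = fgets(b, sizeof b, fb);
        if (ra == NULL && rb == NULL)
            break;                          /* both files exhausted */
        line++;
        /* A line missing from one file counts as a difference too. */
        if (ra == NULL || rb == NULL || strcmp(a, b) != 0) {
            diffs++;
            fprintf(report, "line %d differs\n  known: %s  fresh: %s",
                    line, ra ? a : "(missing)\n", rb ? b : "(missing)\n");
        }
    }

    fclose(fa);
    fclose(fb);
    return diffs;
}
```

A return value of 0 would correspond to the "jump up and down" case, and anything greater than 0 to the fail detected after editing a source file. The report file then shows which entries differ without having to step through the code.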
So, lessons learned.
When dealing with checksums, make sure that the exact, and I mean 100% exact, same code is executed to calculate the values. That means even the file list gets sorted in exactly the same manner.
One of the things the last developer did was generate a roll-up hash right at the beginning of the verify/compare function. This extra file open might affect the outcome. I would not think it would, but keeping things the same is just easier and takes out a variable.
Do not get fancy and try comparing a value in a file to a value in memory. Generating two files gives a good way to do a 'what went wrong' analysis outside of code, and file-open-and-string-compare logic is easy to write when you are ready to do it in code.
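On the first lesson, about sorting the list in exactly the same manner: one way to take ordering out of the picture is to sort the file list with a fixed, locale-independent comparison before hashing, in both the generate and the verify paths. The sketch below uses standard qsort() with strcmp(); the names compare_paths and sort_file_list are hypothetical and are not taken from the DLL discussed here.

```c
#include <stdlib.h>
#include <string.h>

/* Byte-wise comparison of two path strings; strcmp() does not depend on
   the locale, so the resulting order is the same on every run. */
static int compare_paths(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Hypothetical helper: sort an array of path strings in place so the
   generate and verify steps walk the files in the same order. */
void sort_file_list(const char **paths, size_t count)
{
    qsort(paths, count, sizeof paths[0], compare_paths);
}
```

Because the comparison is byte-wise, uppercase names sort before lowercase ones. That is fine for this purpose, since the only requirement is that both runs produce the same order.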