03-06-2011 04:18 PM
Hi Ray,
Most of the code modules are written in LabVIEW but wrapped in a CVI DLL. So TestStand calls either the LabVIEW DLL or the CVI DLL, but it does not call the VIs directly.
TestStand Version: 4.2
03-06-2011 05:43 PM
Norbert,
Let me answer all your questions in the same order
03-06-2011 06:12 PM
Hi Doug,
Yes, I do have multiple executions and multiple threads running from TestStand. Yes, I do have shared data via station globals. I have a simple watchdog timer sequence in my process model. Before the process model calls my MainSequence, I call this sequence in a new thread, which constantly monitors the time taken by the MainSequence and, if it takes more than 15 minutes, terminates the MainSequence programmatically. This lets me continue with the test run even if one of the developers has a never-ending loop in their sequences. See the attached sequence. I am not sure whether I have a race condition, but I will look into it.
I totally agree with you: something must have changed, and I should have had this issue all along; it only became apparent now as the number of sequences we execute increased over time. I used to see crashes every now and then before, but I ignored them. I had maybe 1 or 2 crashes out of 15 to 20 test runs.
I did try to reproduce the crash with WinDbg, but I could not make sense of the call stack. I will try again and maybe send you a log.
03-07-2011 09:50 AM
Hi Nathan,
If I understand you correctly, what you are saying is very consistent with a race condition. The fact that adding more threads makes the crash more likely to occur is a classic symptom of a race condition.
The most common cause is modifying the structure of station globals or file globals in one thread while accessing them in another. Modifying and accessing values at the same time is safe (TestStand automatically protects access to values), but adding and removing properties while accessing even sibling properties from another thread can lead to issues unless you protect access to the globals using a lock or critical section. This applies to station globals, file globals, and any other TestStand variables you might be sharing between threads. If you think you might be doing this, try adding TestStand Lock steps, using the same name for the lock, around all steps which access or modify globals.
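TestStand Lock steps are configured in the sequence rather than written as code, so the following is only an analogy: a minimal C sketch, assuming POSIX threads, of one shared lock guarding both structural edits and reads of a shared table. On Windows a CRITICAL_SECTION would play the same role. All names here (`globals_add`, `globals_remove`, `globals_get`, `g_lock`) are hypothetical and are not part of any TestStand API.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_PROPS 16
#define NAME_LEN  32

/* Hypothetical stand-in for a set of globals whose structure can change
   at run time. None of these names come from TestStand. */
static char g_names[MAX_PROPS][NAME_LEN];
static int  g_values[MAX_PROPS];
static int  g_count = 0;

/* One lock shared by every reader and writer, like one lock name reused
   in every Lock step. */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Structural change: add a property. Names longer than NAME_LEN - 1 are
   truncated. Returns 0 on success, -1 if the table is full. */
int globals_add(const char *name, int value)
{
    int rc = -1;
    assert(name != NULL);
    pthread_mutex_lock(&g_lock);
    if (g_count < MAX_PROPS) {
        strncpy(g_names[g_count], name, NAME_LEN - 1);
        g_names[g_count][NAME_LEN - 1] = '\0';
        g_values[g_count] = value;
        g_count++;
        rc = 0;
    }
    pthread_mutex_unlock(&g_lock);
    return rc;
}

/* Structural change: remove a property by moving the last entry into
   its slot. Returns 0 on success, -1 if the name is not found. */
int globals_remove(const char *name)
{
    int rc = -1;
    assert(name != NULL);
    pthread_mutex_lock(&g_lock);
    for (int i = 0; i < g_count; i++) {
        if (strcmp(g_names[i], name) == 0) {
            int last = g_count - 1;
            if (i != last) {
                memcpy(g_names[i], g_names[last], NAME_LEN);
                g_values[i] = g_values[last];
            }
            g_count--;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&g_lock);
    return rc;
}

/* Read access takes the same lock, so a reader never walks the table
   while another thread is halfway through adding or removing an entry. */
int globals_get(const char *name, int *value_out)
{
    int rc = -1;
    assert(name != NULL && value_out != NULL);
    pthread_mutex_lock(&g_lock);
    for (int i = 0; i < g_count; i++) {
        if (strcmp(g_names[i], name) == 0) {
            *value_out = g_values[i];
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&g_lock);
    return rc;
}
```

The point of the sketch is that the lock only helps if every access path uses the same one; a single unprotected reader is enough to bring the race back.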
Hope this helps,
-Doug
03-15-2011 01:55 AM
Doug,
I was able to take a minidump when TestStand hung. Please see the report below for the analyze command. I am also attaching the text file with thread info. I could not understand what it means by "A breakpoint has been reached"; there is no breakpoint in my code. If you look at the attached file, there are so many threads being opened. I am not sure how that happens, since my DLLs are not opening that many threads.
0:056> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
!pe
The current thread is unmanaged
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/SeqEdit_exe/4_2_0_134/ntdll_dll/5_1_2600_6055/0000120e.htm?Retr...
FAULTING_IP:
ntdll!DbgBreakPoint+0
7c90120e cc int 3
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
.exr 0xffffffffffffffff
ExceptionAddress: 7c90120e (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 00000002
Parameter[2]: 00000003
FAULTING_THREAD: 0000064c
DEFAULT_BUCKET_ID: STACKIMMUNE
PROCESS_NAME: SeqEdit.exe
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid
EXCEPTION_PARAMETER1: 00000000
EXCEPTION_PARAMETER2: 00000002
EXCEPTION_PARAMETER3: 00000003
MOD_LIST: <ANALYSIS/>
NTGLOBALFLAG: 70
APPLICATION_VERIFIER_FLAGS: 0
MANAGED_STACK: !dumpstack -EE
!dumpstack -EE
OS Thread Id: 0x64c (56)
Current frame:
ChildEBP RetAddr Caller,Callee
ADDITIONAL_DEBUG_TEXT: Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]
LAST_CONTROL_TRANSFER: from 7c952119 to 7c90120e
PRIMARY_PROBLEM_CLASS: STACKIMMUNE
BUGCHECK_STR: APPLICATION_FAULT_STACKIMMUNE
STACK_TEXT:
00000000 00000000 seqedit.exe+0x0
SYMBOL_NAME: seqedit.exe
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: seqedit
DEBUG_FLR_IMAGE_TIMESTAMP: 49e4fdc7
STACK_COMMAND: ** Pseudo Context ** ; kb
BUCKET_ID: MANUAL_BREAKIN
IMAGE_NAME: C:\Program Files\National Instruments\TestStand 4.2\Bin\SeqEdit.exe
FAILURE_BUCKET_ID: STACKIMMUNE_80000003_C:_Program_Files_National_Instruments_TestStand_4.2_Bin_SeqEdit.exe!Unknown
FOLLOWUP_IP:
SeqEdit+0
11000000 4d dec ebp
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/SeqEdit_exe/4_2_0_134/49e4fdc7/ntdll_dll/5_1_2600_6055/4d00f27d...
Followup: MachineOwner
03-16-2011 02:57 PM
Hi Nathan,
Are you adding or removing properties, subproperties, or array elements in the station globals at runtime? If so, have you tried protecting such edits and all accesses with a lock, as I suggested in my previous post?
If you are getting memory corruption, it will be hard to diagnose with a crash dump because the crash doesn't always happen where the error in the code is. It could happen later as a side effect of the memory getting corrupted.
Hope this helps,
-Doug
03-16-2011 06:14 PM
Hi Doug,
I can confirm we do not intentionally change any properties of the station globals during execution. I need to investigate whether we do it without our knowledge. For example, we have Property Loader steps which load certain values into the station globals, and I am not sure whether they change any properties of the station globals. I have attached the StationGlobals.ini for your reference. Is there any limit on the string length of station globals? I have two globals where the string length exceeds 2000 characters. I am not sure where the string length is defined.
03-17-2011 10:34 AM
If you are only changing values in the globals and not adding or removing properties, you should be OK. If you aren't completely sure, you can always add a lock around all access to the globals just in case. Just make sure you use the same lock name in all places.
There is no fixed limit on string length in TestStand; the only limit is available memory. 2000 characters is not that many.
Have you tried disabling various parts of your sequence or disabling some of the threads involved to see if the problem goes away (i.e. narrowing down the cause by process of elimination)? When it hangs is it always on the same step or is it random? Is it always hanging on steps that call into your code modules?
-Doug
03-17-2011 07:47 PM
Hi Doug,
I think I may have found one of the main problems. I wrote a simple test sequence and was able to crash or hang TestStand very often by loading and unloading one particular DLL. In this DLL, I read a file and store an array of strings using a LabVIEW function. I used the example code listed here: http://zone.ni.com/devzone/cda/epd/p/id/4119
In our test run, this DLL was loaded and unloaded several times, so the function read the file once on every load.
If I change this to load only once per test run, I might see some improvement. I will test this next week and publish the results later.
My initial observation is that when CVI tries to allocate large amounts of memory using those LabVIEW-related functions, the outcome varies: TestStand can hang, raise a system-level exception, crash completely, or work without any issues.
03-18-2011 10:27 AM
Hi Nathan,
Good to hear you've narrowed down the cause. One thing to consider, since this sounds like a potential memory corruption issue, is to make sure that any buffers you are allocating are big enough to hold the data that is being written to them. If they are not, then any data written past the end of the buffers will corrupt memory and potentially lead to crashes, hangs, or other unexpected behavior in later code. Also, if you are trying to allocate really large buffers, it's possible the allocation is failing. Make sure that you check that the allocation has succeeded before trying to use the memory.
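As a rough illustration of both checks, here is a minimal sketch in plain C. It assumes ordinary `malloc`/`calloc` rather than the LabVIEW or CVI memory functions used in the actual DLL, and the helper names (`copy_bounded`, `copy_string_array`, `free_string_array`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy src into a fixed-size buffer only if it fits, terminator included.
   Returns 0 on success, -1 if the buffer is too small (nothing is written). */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;              /* would write past the end of dst */
    }
    memcpy(dst, src, len + 1);
    return 0;
}

/* Free the first 'count' strings of an array and the array itself. */
void free_string_array(char **arr, size_t count)
{
    if (arr == NULL) {
        return;
    }
    for (size_t i = 0; i < count; i++) {
        free(arr[i]);
    }
    free(arr);
}

/* Duplicate an array of strings, checking every allocation.
   Returns NULL (with nothing leaked) if any allocation fails. */
char **copy_string_array(const char *const *src, size_t count)
{
    assert(src != NULL);
    char **dst = calloc(count ? count : 1, sizeof *dst);
    if (dst == NULL) {
        return NULL;            /* allocation of the pointer table failed */
    }
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(src[i]);
        dst[i] = malloc(len + 1);   /* size each buffer from the data */
        if (dst[i] == NULL) {
            free_string_array(dst, i);  /* release what was built so far */
            return NULL;
        }
        memcpy(dst[i], src[i], len + 1);
    }
    return dst;
}
```

A caller would test the result before touching it, for example `if (copy_string_array(lines, n) == NULL) { /* report the error and stop */ }`, instead of assuming the allocation worked.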
-Doug