NI TestStand


TestStand application either crashes or hangs

Hi Ray,

Most of the code modules are written in LabVIEW but wrapped in a CVI DLL. TestStand calls either the LabVIEW DLL or the CVI DLL; it never calls the VIs directly.

TestStand version: 4.2

Message 11 of 20

Norbert,

 

Let me answer your questions in the order you asked them:

 

  1. Under Sequence File Properties, the load option is set to "Load Dynamically" and the unload option to "Unload When Sequence File Is Closed". I expected this to override the load and unload options of all steps. However, I just found that some of the sequences have their own step load and unload options set. This could be one of the causes; I will change them and see whether the behavior improves.
  2. I can confirm that it is not stuck in an endless loop, but I am not sure how to check whether a code module has hung. I agree that a hung code module would explain TestStand hanging. Sometimes TestStand finishes executing the step, updates the step status to Error or Done, and then hangs. See the attached picture.
  3. No, I have not encountered these issues with the default examples. I did not encounter them with my own sequences for a very long time either; they started happening in the past three months.
  4. I am using the NI Sequence Editor, but I have customized the process model.
  5. The CPU load is 52-56%, and the memory allocation is Total: 3134892, Available: 1607068, System Cache: 501224.
  6. I logged private bytes during yesterday's run. They reached only about 500 MB when the execution hung at a step reporting a system-level exception, and the execution could not be terminated.
  7. The PC is networked. Yes, it may have downloaded a security patch or other Windows updates, but I am not sure what I can do about that.
  8. I agree that heat could be a factor. I ran the Windows Memory Diagnostic tool for several hours and it found no memory problems. As I understand it, the tool repeatedly writes to the RAM and checks for inconsistencies. The system is also not in a confined space.
  9. Yes, another PC with a similar configuration shows the same problem, but less often: my previous PC crashed in at least 4 out of 5 runs, while the second PC crashes in 2 or 3 out of 5.

 

Message 12 of 20

Hi Doug,

 

Yes, I do have multiple executions and multiple threads running in TestStand, and I do share data through station globals. My process model includes a simple watchdog timer sequence: before the process model calls MainSequence, it calls this sequence in a new thread. The watchdog monitors how long MainSequence has been running and, if it exceeds 15 minutes, terminates MainSequence programmatically. This lets the test run continue even if one of the developers has an endless loop in their sequence. See the attached sequence. I am not sure whether I have a race condition, but I will look into it.
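The watchdog pattern described above can be sketched outside TestStand as well. This is a minimal C sketch under stated assumptions, not the attached sequence: it uses POSIX threads, and the names `watchdog_t`, `watchdog_start`, `watchdog_should_stop`, and `watchdog_finish` are hypothetical. The monitor thread waits on a condition variable until a deadline; if the worker has not reported completion by then, the monitor only sets a flag, and the worker stops the next time it checks that flag. A cooperative flag is used because forcibly killing a thread that is inside a code module can leave locks held and memory half-written.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical watchdog shared by a worker thread and a monitor thread.
   All mutable fields are protected by `lock`. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    bool done;                /* set by the worker when it finishes */
    bool terminate_requested; /* set by the monitor when time runs out */
    long timeout_ms;
    pthread_t thread;
} watchdog_t;

static void *watchdog_main(void *arg) {
    watchdog_t *wd = arg;

    /* pthread_cond_timedwait() takes an absolute deadline: now + timeout. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += wd->timeout_ms / 1000;
    deadline.tv_nsec += (wd->timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&wd->lock);
    int rc = 0;
    /* Loop to tolerate spurious wakeups. */
    while (!wd->done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&wd->done_cond, &wd->lock, &deadline);
    if (!wd->done)
        wd->terminate_requested = true; /* ask the worker to stop; don't kill it */
    pthread_mutex_unlock(&wd->lock);
    return NULL;
}

/* Starts monitoring; returns pthread_create()'s result (0 on success). */
int watchdog_start(watchdog_t *wd, long timeout_ms) {
    pthread_mutex_init(&wd->lock, NULL);
    pthread_cond_init(&wd->done_cond, NULL);
    wd->done = false;
    wd->terminate_requested = false;
    wd->timeout_ms = timeout_ms;
    return pthread_create(&wd->thread, NULL, watchdog_main, wd);
}

/* Polled by the worker between units of work. */
bool watchdog_should_stop(watchdog_t *wd) {
    pthread_mutex_lock(&wd->lock);
    bool stop = wd->terminate_requested;
    pthread_mutex_unlock(&wd->lock);
    return stop;
}

/* Called by the worker when it returns, normally or after stopping early. */
void watchdog_finish(watchdog_t *wd) {
    pthread_mutex_lock(&wd->lock);
    wd->done = true;
    pthread_cond_signal(&wd->done_cond);
    pthread_mutex_unlock(&wd->lock);
    pthread_join(wd->thread, NULL);
    pthread_cond_destroy(&wd->done_cond);
    pthread_mutex_destroy(&wd->lock);
}
```

For a 15-minute limit, the worker would call `watchdog_start(&wd, 15L * 60 * 1000)`, check `watchdog_should_stop(&wd)` between steps, and call `watchdog_finish(&wd)` when it returns.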

 

I totally agree with you that something must have changed. I probably had this issue all along, and it has become apparent now because the number of sequences we execute has increased over time. I used to see occasional crashes before, but I ignored them; perhaps 1 or 2 crashes in 15 to 20 test runs.

 

I did try to reproduce the crash with WinDbg, but I could not make sense of the call stack. I will try again and maybe send you a log.

Message 13 of 20

Hi Nathan,

 

If I understand you correctly, what you are saying is very consistent with a race condition. The fact that adding more threads makes the crash more likely to occur is a classic symptom of a race condition.

 

The most common cause is modifying the structure of station globals or file globals in one thread while accessing them in another. Modifying and accessing values at the same time is safe (TestStand automatically protects access to values), but adding or removing properties while another thread accesses even sibling properties can lead to problems unless you protect access to the globals with a lock or critical section. This applies to station globals, file globals, and any other TestStand variables you share between threads. If you think you might be doing this, try adding TestStand Lock steps, all using the same lock name, around every step that accesses or modifies the globals.
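To illustrate why structural changes need a lock, here is a minimal C analogy; it is not how TestStand stores globals. The "globals" are a hypothetical growable array of name/value properties, and adding one may `realloc()` the array, which invalidates any pointer another thread holds into it, including pointers to unrelated sibling entries. The functions `globals_add`, `globals_remove`, and `globals_get` are illustrative names; the point is that all of them take the same mutex, and readers copy values out while holding it.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for shared globals: a growable array of properties. */
typedef struct { char name[32]; double value; } property_t;

static property_t *props = NULL;
static size_t prop_count = 0, prop_cap = 0;
/* One lock for every access: structural edits AND reads. */
static pthread_mutex_t globals_lock = PTHREAD_MUTEX_INITIALIZER;

int globals_add(const char *name, double value) {
    int rc = 0;
    pthread_mutex_lock(&globals_lock);
    if (prop_count == prop_cap) {
        size_t new_cap = prop_cap ? prop_cap * 2 : 4;
        property_t *grown = realloc(props, new_cap * sizeof *grown);
        if (grown == NULL) {
            rc = -1;
        } else {
            props = grown; /* any old pointer into `props` is now invalid */
            prop_cap = new_cap;
        }
    }
    if (rc == 0) {
        property_t *p = &props[prop_count++];
        snprintf(p->name, sizeof p->name, "%s", name);
        p->value = value;
    }
    pthread_mutex_unlock(&globals_lock);
    return rc;
}

/* Removing shifts the following entries, so it must be locked too. */
int globals_remove(const char *name) {
    int removed = 0;
    pthread_mutex_lock(&globals_lock);
    for (size_t i = 0; i < prop_count; i++) {
        if (strcmp(props[i].name, name) == 0) {
            memmove(&props[i], &props[i + 1], (prop_count - i - 1) * sizeof *props);
            prop_count--;
            removed = 1;
            break;
        }
    }
    pthread_mutex_unlock(&globals_lock);
    return removed;
}

/* Copies the value out under the lock; never returns a pointer into `props`. */
int globals_get(const char *name, double *out) {
    int found = 0;
    pthread_mutex_lock(&globals_lock);
    for (size_t i = 0; i < prop_count; i++) {
        if (strcmp(props[i].name, name) == 0) {
            *out = props[i].value;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&globals_lock);
    return found;
}
```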

 

Hope this helps,

-Doug

Message 14 of 20

Doug,

 

I was able to capture a minidump when TestStand hung. The output of the !analyze command is below, and I am also attaching a text file with the thread information. I do not understand why it reports "A breakpoint has been reached", since there are no breakpoints in my code. The attached file also shows a large number of threads, and I am not sure where they come from; my DLLs do not create that many threads.

 

0:056> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

!pe
The current thread is unmanaged
GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/StageOne/SeqEdit_exe/4_2_0_134/ntdll_dll/5_1_2600_6055/0000120e.htm?Retr...

FAULTING_IP:
ntdll!DbgBreakPoint+0
7c90120e cc              int     3

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
.exr 0xffffffffffffffff
ExceptionAddress: 7c90120e (ntdll!DbgBreakPoint)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 3
   Parameter[0]: 00000000
   Parameter[1]: 00000002
   Parameter[2]: 00000003

FAULTING_THREAD:  0000064c

DEFAULT_BUCKET_ID:  STACKIMMUNE

PROCESS_NAME:  SeqEdit.exe

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  00000002

EXCEPTION_PARAMETER3:  00000003

MOD_LIST: <ANALYSIS/>

NTGLOBALFLAG:  70

APPLICATION_VERIFIER_FLAGS:  0

MANAGED_STACK: !dumpstack -EE
!dumpstack -EE
OS Thread Id: 0x64c (56)
Current frame:
ChildEBP RetAddr  Caller,Callee

ADDITIONAL_DEBUG_TEXT:  Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]

LAST_CONTROL_TRANSFER:  from 7c952119 to 7c90120e

PRIMARY_PROBLEM_CLASS:  STACKIMMUNE

BUGCHECK_STR:  APPLICATION_FAULT_STACKIMMUNE

STACK_TEXT:  
00000000 00000000 seqedit.exe+0x0


SYMBOL_NAME:  seqedit.exe

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: seqedit

DEBUG_FLR_IMAGE_TIMESTAMP:  49e4fdc7

STACK_COMMAND:  ** Pseudo Context ** ; kb

BUCKET_ID:  MANUAL_BREAKIN

IMAGE_NAME:  C:\Program Files\National Instruments\TestStand 4.2\Bin\SeqEdit.exe

FAILURE_BUCKET_ID:  STACKIMMUNE_80000003_C:_Program_Files_National_Instruments_TestStand_4.2_Bin_SeqEdit.exe!Unknown

FOLLOWUP_IP:
SeqEdit+0
11000000 4d              dec     ebp

WATSON_STAGEONE_URL:  http://watson.microsoft.com/StageOne/SeqEdit_exe/4_2_0_134/49e4fdc7/ntdll_dll/5_1_2600_6055/4d00f27d...

Followup: MachineOwner

Message 15 of 20

Hi Nathan,

 

Are you adding or removing properties, subproperties, or array elements in the station globals at runtime? If so, have you tried protecting such edits and all accesses with a lock, as I suggested in my previous post?

 

If you are getting memory corruption, it will be hard to diagnose with a crash dump, because the crash doesn't necessarily happen where the faulty code is. It can happen later, as a side effect of the corrupted memory.
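One common way to catch an overrun close to the code that causes it, rather than at a later crash, is to place guard bytes after a buffer and check them as soon as the suspect call returns. A minimal C sketch follows; `guarded_alloc`, `guarded_check`, and `guarded_free` are hypothetical helpers, and the guard size and fill pattern are arbitrary choices.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical debugging helper: a buffer followed by a known guard pattern. */
#define GUARD_SIZE 16
#define GUARD_BYTE 0xAB

typedef struct { unsigned char *data; size_t size; } guarded_buf_t;

/* Allocates `size` usable bytes plus the guard; returns -1 if malloc fails. */
int guarded_alloc(guarded_buf_t *b, size_t size) {
    b->data = malloc(size + GUARD_SIZE);
    if (b->data == NULL) return -1;
    b->size = size;
    memset(b->data + size, GUARD_BYTE, GUARD_SIZE);
    return 0;
}

/* Returns 1 if the guard is intact, 0 if something wrote past `size`. */
int guarded_check(const guarded_buf_t *b) {
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (b->data[b->size + i] != GUARD_BYTE) return 0;
    return 1;
}

void guarded_free(guarded_buf_t *b) {
    free(b->data);
    b->data = NULL;
    b->size = 0;
}
```

This only detects writes that land in the guard region. A page-heap tool such as Application Verifier checks much more thoroughly, at the cost of speed and memory.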

 

Hope this helps,

-Doug

Message 16 of 20

Hi Doug,

 

I can confirm that we do not intentionally change any properties of the station globals during execution. I need to investigate whether we do so unintentionally; for example, we have Property Loader steps that load certain values into the station globals, and I am not sure whether they change any properties. I have attached StationGlobals.ini for your reference. Is there any limit on the string length of station globals? I have two globals whose strings exceed 2000 characters, and I am not sure where the string length is defined.

Message 17 of 20

If you are only changing values in the globals and not adding or removing properties, you should be OK. If you aren't completely sure, you can always add a lock around all access to the globals just in case. Just make sure you use the same lock name everywhere.
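Why the lock name matters can be shown with a small C analogy. In this sketch, `get_named_lock` is a hypothetical registry, not TestStand's Lock step: every caller that asks for the same name gets the same mutex, while a slightly different name (a typo, for example) silently yields a separate mutex that protects nothing shared.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical registry handing out one mutex per lock name. */
#define MAX_LOCKS 16

typedef struct { char name[64]; pthread_mutex_t mutex; } named_lock_t;

static named_lock_t locks[MAX_LOCKS];
static size_t lock_count = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns the mutex for `name`, creating it on first use.
   Returns NULL if the registry is full or the name is too long. */
pthread_mutex_t *get_named_lock(const char *name) {
    pthread_mutex_t *result = NULL;
    pthread_mutex_lock(&registry_lock);
    for (size_t i = 0; i < lock_count && result == NULL; i++) {
        if (strcmp(locks[i].name, name) == 0)
            result = &locks[i].mutex;
    }
    if (result == NULL && lock_count < MAX_LOCKS &&
        strlen(name) < sizeof locks[0].name) {
        strcpy(locks[lock_count].name, name);
        pthread_mutex_init(&locks[lock_count].mutex, NULL);
        result = &locks[lock_count].mutex;
        lock_count++;
    }
    pthread_mutex_unlock(&registry_lock);
    return result;
}
```

Two threads that both lock `get_named_lock("StationGlobals")` exclude each other; a thread that locks `get_named_lock("StationGlobal")` does not.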

 

There is no fixed limit on string length in TestStand; a string can be as long as available memory allows. 2000 characters is not very many.

 

Have you tried disabling various parts of your sequence, or some of the threads involved, to see whether the problem goes away (that is, narrowing down the cause by process of elimination)? When it hangs, is it always on the same step, or is it random? Is it always on steps that call into your code modules?
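Narrowing down by elimination can be made systematic by halving the set of suspects each round. Below is a minimal C sketch of the bookkeeping; `bisect_t` and its functions are hypothetical, and the sketch assumes exactly one part (step, thread, or code module) triggers the failure. Because the crash here shows up in only some runs, each round should repeat the test several times before reporting that the failure did not occur: a single false "no failure" sends the search into the wrong half, and a failure caused by two parts interacting (such as a race) may not bisect cleanly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bisection state: the culprit is assumed to be in [lo, hi). */
typedef struct { size_t lo, hi; } bisect_t;

void bisect_init(bisect_t *b, size_t count) { b->lo = 0; b->hi = count; }

/* True once a single suspect (or none) remains. */
bool bisect_done(const bisect_t *b) { return b->hi - b->lo <= 1; }

/* Parts to keep enabled this round: the first half of the suspects. */
void bisect_next(const bisect_t *b, size_t *first, size_t *last_exclusive) {
    *first = b->lo;
    *last_exclusive = b->lo + (b->hi - b->lo) / 2;
}

/* Report whether the failure reproduced with only that half enabled. */
void bisect_report(bisect_t *b, bool failure_reproduced) {
    size_t mid = b->lo + (b->hi - b->lo) / 2;
    if (failure_reproduced)
        b->hi = mid; /* culprit is in the enabled half */
    else
        b->lo = mid; /* culprit is in the disabled half */
}
```

Each round, enable parts `first` through `last_exclusive - 1`, run the test, and pass the outcome to `bisect_report`; when `bisect_done` returns true, `lo` is the remaining suspect. With 40 parts this takes about six rounds.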

 

-Doug

Message 18 of 20

Hi Doug,

 

I think I may have found one of the main problems. I wrote a simple test sequence and was able to crash or hang TestStand very often by loading and unloading one particular DLL. In this DLL, I read a file and store an array of strings using a LabVIEW function, based on the example code at http://zone.ni.com/devzone/cda/epd/p/id/4119

In our test run, this DLL was loaded and unloaded several times, so the function read the file again on every load.

If I change it to load only once per test run, I may see some improvement. I will test this next week and post the results.
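Loading the data once per test run amounts to caching it the first time it is needed. Here is a minimal C sketch, assuming a single data file and POSIX threads; `get_file_text`, `release_file_text`, and `read_whole_file` are hypothetical names, and `read_count` exists only to show that the file is read once. A failed read is not cached, so the next call retries.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static char *cached_text = NULL;
static size_t cached_len = 0;
static int read_count = 0; /* how many times the file was actually read */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reads the whole file into a NUL-terminated buffer, growing it as needed.
   Every allocation is checked; returns NULL on any failure. */
static char *read_whole_file(const char *path, size_t *len) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    size_t cap = 4096, n = 0;
    char *buf = malloc(cap + 1); /* +1 for the terminator */
    while (buf != NULL) {
        n += fread(buf + n, 1, cap - n, f);
        if (n < cap) break; /* end of file or read error */
        char *bigger = realloc(buf, cap * 2 + 1);
        if (bigger == NULL) { free(buf); buf = NULL; break; }
        buf = bigger;
        cap *= 2;
    }
    if (buf != NULL && ferror(f)) { free(buf); buf = NULL; }
    fclose(f);
    if (buf == NULL) return NULL;
    buf[n] = '\0';
    *len = n;
    return buf;
}

/* Returns the cached contents, reading `path` only on the first successful call. */
const char *get_file_text(const char *path, size_t *len) {
    pthread_mutex_lock(&cache_lock);
    if (cached_text == NULL) {
        cached_text = read_whole_file(path, &cached_len);
        read_count++;
    }
    const char *result = cached_text;
    if (result != NULL) *len = cached_len;
    pthread_mutex_unlock(&cache_lock);
    return result;
}

/* Call once at the end of the test run, e.g. from a cleanup step. */
void release_file_text(void) {
    pthread_mutex_lock(&cache_lock);
    free(cached_text);
    cached_text = NULL;
    cached_len = 0;
    pthread_mutex_unlock(&cache_lock);
}
```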

 

My initial observation is that when the CVI code tries to allocate large amounts of memory using those LabVIEW-related functions, TestStand may hang, report a system-level exception, crash completely, or work without any issues.

Message 19 of 20

Hi Nathan,

 

Good to hear you've narrowed down the cause. Since this sounds like a potential memory corruption issue, one thing to check is that any buffers you allocate are big enough to hold the data written to them. If they are not, any data written past the end of a buffer corrupts memory and can lead to crashes, hangs, or other unexpected behavior in later code. Also, if you are trying to allocate very large buffers, the allocation may be failing. Make sure you check that each allocation succeeded before using the memory.
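Both checks described above, buffers sized for the data and allocations verified before use, look like this in a minimal C sketch that reads a text file into an array of strings. It is not the LabVIEW example from the linked page; `read_lines`, `string_array_t`, and `string_array_free` are hypothetical, and the sketch relies on POSIX `getline()`, which sizes the line buffer itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

typedef struct { char **items; size_t count; } string_array_t;

void string_array_free(string_array_t *a) {
    for (size_t i = 0; i < a->count; i++) free(a->items[i]);
    free(a->items);
    a->items = NULL;
    a->count = 0;
}

/* Returns 0 on success, -1 if the file can't be read or memory runs out.
   On failure nothing is leaked and *out is left empty. */
int read_lines(const char *path, string_array_t *out) {
    out->items = NULL;
    out->count = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;

    size_t cap = 0, line_cap = 0;
    char *line = NULL; /* getline() grows this buffer as needed */
    ssize_t len;
    int rc = 0;
    while ((len = getline(&line, &line_cap, f)) != -1) {
        if (len > 0 && line[len - 1] == '\n') line[--len] = '\0';
        if (out->count == cap) {
            size_t new_cap = cap ? cap * 2 : 16;
            char **grown = realloc(out->items, new_cap * sizeof *grown);
            if (grown == NULL) { rc = -1; break; } /* check before use */
            out->items = grown;
            cap = new_cap;
        }
        char *copy = malloc((size_t)len + 1); /* +1 for the terminator */
        if (copy == NULL) { rc = -1; break; }
        memcpy(copy, line, (size_t)len + 1);
        out->items[out->count++] = copy;
    }
    if (rc == 0 && ferror(f)) rc = -1;
    free(line);
    fclose(f);
    if (rc != 0) string_array_free(out);
    return rc;
}
```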

 

-Doug

Message 20 of 20