11-07-2008 12:42 AM
Hi
We have been facing a strange problem for over 2 weeks.
We have developed an application that works on client-server communication.
The PXI-RT acts as the server, and the RT application is deployed to run on startup.
The host PC receives data continuously from the RT for analysis, processing and display.
The RT acquires data from a maximum of 63 channels (PXI and SCXI), decimates the data by a user-defined factor and transmits it to the host.
For the past few days, the RT code has been aborting after nearly 20 min of execution. Data was acquired from a maximum of 63 channels at 5 kS/s. The RT aborted irrespective of the number of channels configured for acquisition.
The duration after which the code aborts varies between 10 min and 20 min.
We were not able to trace the reason for this behaviour. We tried increasing the delays and timeouts in the communication VIs.
If anyone out there can suggest possible fixes for this issue, we would be grateful.
Any further information regarding the system will be provided if needed.
Thank you
11-07-2008 08:11 AM
You say that the RT aborts execution. What does that mean?
Does the RT stop running the VI and stay in that state until rebooted?
Does it reboot on its own?
Are there any error messages?
Norbert
11-07-2008 08:35 AM
As Norbert said, you'll have to elaborate on the specifics, but it sounds like you might possibly have a fast memory leak? (Just a guess, based on the loose description.) Have you tried monitoring the memory and CPU usage on your RT target?
The RT System Manager works pretty well for this, under Tools->Real-Time Module->System Manager.
A new tool in LabVIEW 8.6 is the Distributed System Manager, under the National Instruments group on your start menu or under Tools->Distributed System Manager. DSM works fantastically well and gives you a wealth of information that wasn't easily available before.
Either way, maybe try keeping an eye on the memory usage and see if it increases? If it's only 20 minutes before it fails, a memory leak should be really obvious, as you'd see the numbers changing very quickly. If it's not a memory leak there used to be a TCP stack issue on some older versions of LabVIEW RT, but that was a while ago. (Pre 8.5, if I recall)
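One rough way to turn "keep an eye on the memory usage" into a number is to note the usage at regular intervals and fit a straight line through the readings. The sketch below is plain Python, not LabVIEW, and is only an illustration: the function names and the sample readings are made up, and it assumes the values are copied by hand from whatever the System Manager displays.

```python
def leak_rate(samples):
    """Least-squares slope of memory usage (percent) versus time (seconds).

    samples: list of (time_s, used_percent) pairs, for example readings
    noted from the System Manager display (hypothetical data here).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share one timestamp")
    return num / den  # percent per second


def seconds_until_full(samples, limit=100.0):
    """Extrapolate when usage would reach `limit`; None if usage is not growing."""
    rate = leak_rate(samples)
    if rate <= 0:
        return None
    _, last_m = samples[-1]
    return (limit - last_m) / rate


# Made-up readings: one per minute, climbing 2 percentage points each time.
readings = [(60 * i, 20.0 + 2.0 * i) for i in range(6)]
rate = leak_rate(readings)
remaining = seconds_until_full(readings)
print(f"growth: {rate * 60:.2f} %/min, full in about {remaining / 60:.0f} min")
```

A clearly positive slope over a run of a few minutes points towards a leak; a flat line suggests looking elsewhere.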
I hope this is helpful...
Jim
11-10-2008 08:50 PM - edited 11-10-2008 08:53 PM
Hi
I did use the RTSM to observe the memory and CPU usage.
The CPU usage goes either to 100% or to 0% after a certain time. When the load goes to 100%, memory usage is still somewhere around 60%. In that case the RT code does not abort, and the connection re-establishes after 2 minutes.
When the CPU usage goes to 0% (which happens more often than 100%), the RT aborts the running code, and I have to reboot the RT to run the code again. I get a 'system error' message at times. I have seen three different system error messages at different times, and not always when the RT goes to 0% load. When the RT is at 0% CPU usage, memory is constant at around 20%.
This started happening recently, and I cannot tell what change in my code, if any, could have caused it.
Any suggestions?
Edit:
The code runs in LV 8.5.1
Memory does not change rapidly. It increases slowly to around 60% before the code aborts. The CPU usage jumps between a higher and a lower value, till it actually ends up either at 0% or at 100%
11-10-2008 09:21 PM
MScap wrote:Memory does not change rapidly. It increases slowly to around 60% before the code aborts. The CPU usage jumps between a higher and a lower value, till it actually ends up either at 0% or at 100%
If memory is growing, even slowly, to 60%, I would strongly suspect a memory leak. Is there anything in your code that would explain the memory growing?
I would suspect that you have arrays that are growing with time. If an array grows to be larger than the space it currently occupies, then the memory manager will move it in memory until it finds a contiguous block large enough to hold it. If the array grows so large that there isn't a large enough block of memory to hold it, then I believe the system would crash. Even though you are only using 60%, that means only 40% is free, which is less than half the memory. If the array has grown over time, it may now be larger than 40% of your memory space.
Take a close look at what operations you are doing on your arrays and make sure you are not growing them unnecessarily. Preallocate space for arrays by using the Initialize Array function. Use Replace Array Subset in your code rather than Build Array or Insert Into Array, and maintain the array between loop iterations using shift registers, along with any other data you need to hold, such as how much of the array contains valid data.
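Since LabVIEW diagrams cannot be pasted as text, here is the same preallocate-and-replace pattern as a rough Python analogy. It is only a sketch: the class and method names are invented for illustration, the list stands in for an array created once with Initialize Array, the slice assignment stands in for Replace Array Subset, and the two attributes play the role of values carried in shift registers.

```python
class PreallocatedBuffer:
    """Fixed-size buffer that is filled in place and never grows."""

    def __init__(self, capacity):
        # Allocate the full storage once, up front.
        self.data = [0.0] * capacity
        # Number of elements that currently hold valid data.
        self.count = 0

    def write(self, samples):
        """Copy `samples` into the next free slots without resizing."""
        n = len(samples)
        if self.count + n > len(self.data):
            raise OverflowError("buffer full; consume or clear it first")
        # Equal-length slice assignment overwrites elements in place,
        # so the underlying storage keeps its original size.
        self.data[self.count:self.count + n] = samples
        self.count += n

    def valid(self):
        """Return a copy of only the valid portion of the buffer."""
        return self.data[:self.count]

    def clear(self):
        """Mark the buffer empty; the storage itself is kept and reused."""
        self.count = 0


buf = PreallocatedBuffer(8)
buf.write([1.0, 2.0, 3.0])
buf.write([4.0])
print(buf.valid(), len(buf.data))
```

The key point is that the storage size stays at the preallocated capacity for the whole run; only the count of valid elements changes.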
11-11-2008 02:35 AM
Hi
The code running on the PXI-RT uses only queues. The data acquired from the DAQmx channels (PXI and SCXI) is logged to an HWS file on the RT, enqueued and transmitted back to the host. These are the only functions performed in the code, apart from TCP communication.
Thus, whether the memory increases slowly or rapidly must depend on the rate at which I enqueue and dequeue data.
The queue status shows a maximum of 1 element in each queue at any point in time, so I concluded that I am not dequeuing the data too slowly.
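A caveat on reading the queue status: an occasional check only shows the depth at that instant, so a backlog that builds up and drains between checks would go unnoticed. A bounded queue that records its own high-water mark avoids that. The sketch below is a single-threaded Python illustration with invented names (`MonitoredQueue`, `high_water`, `dropped`); it is not thread-safe and is not a model of how LabVIEW queues are implemented.

```python
from collections import deque


class MonitoredQueue:
    """Bounded FIFO that records the largest depth it ever reached."""

    def __init__(self, max_depth):
        self._items = deque()
        self.max_depth = max_depth
        self.high_water = 0   # peak number of queued elements seen so far
        self.dropped = 0      # enqueue attempts rejected because the queue was full

    def enqueue(self, item):
        """Add an item; return False (and count a drop) if the queue is full."""
        if len(self._items) >= self.max_depth:
            self.dropped += 1
            return False
        self._items.append(item)
        self.high_water = max(self.high_water, len(self._items))
        return True

    def dequeue(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def depth(self):
        return len(self._items)


q = MonitoredQueue(max_depth=4)
for block in ("a", "b", "c"):      # producer gets ahead for a moment
    q.enqueue(block)
while q.dequeue() is not None:     # consumer then catches up
    pass
print(q.depth(), q.high_water, q.dropped)
```

Here a status check after the consumer catches up reports an empty queue, while the high-water mark still shows that three elements were waiting at one point.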
I understood your explanation, Ravens. I shall try to trace the memory being used by my code and see if it helps.
Thank you
11-11-2008 08:07 AM
I agree that memory is the first thing to watch for this type of crashing.
I believe there was a config option that forced a reboot if communication with the host was lost. If your app is consuming all the CPU, then the comm timeout could be missing cycles and also triggering a reboot. Sorry, I don't remember where to find that setting, since it has moved between versions of LV.
Please keep us posted on what you find. Thank you,
Ben
11-12-2008 12:23 AM
Hi
The closest thing I remember to what Ben mentioned is the 'Halt if TCP fails' option in MAX, under the network settings for a remote target. If there is any other such setting, I do not remember seeing it either!
The latest observation made is this:
The data is acquired from a PXI-6251 module and an SCXI-1102B module. The PXI-6251 also acts as the device communicating with the SCXI module. The connection failure was not observed yesterday when only PXI channels were acquired, or when only SCXI channels were acquired.
This raises a question: can data be acquired from a PXI module and an SCXI module when that same PXI module is used for controlling the SCXI chassis (combo chassis or a separate chassis)?
Any other input would be welcome.
Thank you