RT system monitor reports several VIs as "bad" then crashes

testingHotAir · ‎10-14-2009

Hi,

I've got a fairly complex program running on my cFP-2220 which was working until I had to change a VFD drive from analog to digital control. To do that I've added two subroutines that utilize two of the RS232 ports and a queue-based producer-consumer architecture. Everything works fine until I get to the part of the program that polls the I/O modules. Here's what happens:

1.) New subroutine runs, executes fully, then I get a popup saying "Waiting for RT target to respond" and a spike to 100% CPU on the RT system monitor. Seems to correspond to the top level VI switching to the "run" state.

2.) Popup goes away, CPU usage goes back down to ~52%, program continues to execute.

3.) The top level VI goes into the polling part of the program (state machine architecture), I get the same "Waiting..." message, the CPU pegs, system freezes, then all the subVIs in the polling state of the program switch to "Bad" in the RT system monitor, and I lose the connection.

4.) I have to reset the controller in order to start the process over again.

Does anyone know what the RT system monitor means by "switched to Bad"?

Are there any pitfalls I should look out for when using either queues or serial functions on a RT target?

Any advice on what to check?

Thanks.

CLAD

DiscoBall · ‎10-15-2009

Hello,

Can you provide me the versions of the following:

LabVIEW Real Time
FieldPoint Drivers
NI VISA
NI Serial RT that is installed to the target

Which serial ports are you using? Can you test this new portion of code independently such that you can tell if this new code works on its own? Do any errors ever get reported?
Do you have access to the Real Time Execution Trace Toolkit? It could help us track down the issue. Does the Real Time System Manager report any significant increase in memory usage?

About pitfalls with queues: if you keep enqueuing elements faster than they are dequeued, more and more memory will be used. That memory will not be released until the queue is released; hence my question about memory usage. As for the serial VIs, there are not many caveats.
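To illustrate the queue point in text form (LabVIEW being graphical, this uses Python's standard `queue` module as a stand-in, not the LabVIEW queue functions): a minimal sketch, assuming a bounded queue is acceptable for the application. The size of 3 and the sample values are invented for illustration.

```python
import queue

# Bounded queue: at most 3 pending elements (size chosen only for illustration).
q = queue.Queue(maxsize=3)

dropped = 0
for sample in range(10):
    try:
        q.put_nowait(sample)   # producer enqueues without blocking
    except queue.Full:
        dropped += 1           # consumer is behind; count it instead of growing memory

print("queued:", q.qsize(), "dropped:", dropped)
```

With a bound in place, a producer that outruns the consumer shows up as a count of rejected elements rather than as steadily growing memory.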

I would first check the new code by itself and see if it runs correctly.


Joshua B.
National Instruments

testingHotAir · ‎10-19-2009

Sure,

LabVIEW RT 8.6

FP Drivers 6.0.2

NI VISA 4.4

Modbus I/O server 1.5.0

NI Serial RT 3.3.2

I'm using the 10-pin RS 485 port and the 10-pin RS 232 port for this part of the application. The DB9 RS 232 port on the front of the cFP2220 is being used for another part of the program. There is an unidentified error code "-197386942" reported after the new code executes. It looks like the type of error I get when trying to write an out-of-range value to a variable or I/O terminal, but there are no variables or I/O terminals written to in that code: only serial ports.

I don't have the execution trace toolkit unfortunately.

Memory usage remains flat at about 10% for the entire event. Seems to be just the processor that pegs.

Running the new code independent of the overall program does not crash the system and seems to execute correctly.

What I've done is commented out the section of the block diagram containing the new code using a case structure. The program still crashes, but one of the "Waiting for RT target to respond" messages no longer appears. Other than that, the behaviour of the program is identical. I guess this means I've got a problem in the original code somewhere. I did refactor the original block diagram when I put in the new code, so I may have made a mistake there.

CLAD

DiscoBall · ‎10-20-2009

I think you know where we should go from here...double check the original code for any mistakes that could explain the crash. Also, are you sure the 197386942 is correct? I was unable to find an error with that number. Can you post a screenshot? Which VI does the error come from? Do you have the Cons. Out switch ON currently?

Joshua B.
National Instruments

testingHotAir · ‎10-28-2009

I went over the code again and didn't find anything obvious. In fact the only error I found was a VI that had a "VISA Initialize Port" VI in a loop. That didn't stop the crash though. A couple of questions:

1.) My application uses ~30 network shared variables. Would using Datasocket to transmit a single cluster with 30 elements use less processor than the NSVs?

2.) I have the cFP 2220 configured as the SVE host. Would moving SVE hosting to the PC (which contains the front panel for the system) decrease processor usage significantly on the cFP?

The reason I ask is that the processor usage on the cFP while the program is running normally is ~52%, spiking into the lower 90% range when VIs change state. Since I am unable to find obvious errors in the code, I'm thinking that I'm just asking too much of the cFP's processor.

Thanks for all your help!

CLAD

rex1030 · ‎10-28-2009

When you say the program enters the 'polling part' and then talk about processor spikes I can't help but wonder if you forgot to add waits to your loops. Are you using timed loops or waits? If you are using timed loops have you changed the priority of any of the timed loops?

Also, with your "queue-based producer-consumer architecture" it sounds like the program is hanging on the queue read, as if you are reading faster than you are writing at times. Are you wiring the errors from the queue read so you can see if it is starving (underflow)?
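As a rough text illustration of the starvation check (Python's standard `queue` module standing in for the LabVIEW queue functions): a minimal sketch, assuming the consumer reads with a timeout so an empty queue is counted as an underflow instead of blocking forever. The `consume` helper, the timeout value, and the command names are all hypothetical.

```python
import queue

def consume(q, timeout_s=0.05, max_reads=5):
    """Read up to max_reads elements, counting reads that timed out (underflow)."""
    items, underflows = [], 0
    for _ in range(max_reads):
        try:
            items.append(q.get(timeout=timeout_s))  # wait briefly for the producer
        except queue.Empty:
            underflows += 1                         # nothing arrived in time
    return items, underflows

q = queue.Queue()
for command in ("start", "set_speed"):  # made-up commands for the example
    q.put(command)

items, underflows = consume(q)
print(items, "underflows:", underflows)
```

In LabVIEW the same idea would presumably be wiring a finite timeout into the dequeue and checking its timed-out output, so a starving consumer is visible instead of silently waiting.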

Could you post any code so we can see what's wrong? I know you said it was a big complicated program. Any screenshots of the polling parts would be great.

PS disco, don't forget the negative on that error number if that matters


---------------------------------
[will work for kudos]

testingHotAir · ‎10-28-2009

rex1030,

I'm using timed loops. I've attached a screenshot of the top level VI in the state that crashes (Top level run test.jpg). In it you can see the three parallel timed loops. The top loop has a period of 100ms and priority of 600, the middle loop has a period of 1000ms and a priority of 450, and the bottom loop has a period of 1000ms with a priority of 300. These did not change when I refactored the code.

The screenshot of the subVI that uses the producer-consumer structure is "Orifice changer auto.jpg". This subVI has been executing fine except for a momentary loss of connection with the cFP2220 when it finishes and the top level VI changes to the "run" state.

CLAD

rex1030 · ‎10-28-2009

Yeah, see, the problems you are having seem to be coming from you messing with the loop priorities. Now you see why I never touch those. The higher priority loops are starving the lower priority ones. Reset all the priorities back to default and see if that solves your problem.

Oh yeah, and I should also mention that it is very easy to discover the limits of the Compact FieldPoint devices, and from the complexity of the code you posted I would say that is probably part of it as well.

---------------------------------
[will work for kudos]

testingHotAir · ‎10-28-2009

In the first version of this code I had all the loops set to the same priority. This caused the network loop at the bottom to take too much processor and crash the system, so I used the priority setting to ensure that it came after the control loop. These priority and period settings worked fine before I added subVIs that use the serial ports.

I'm thinking that, after adding the code to use two of the remaining three serial ports, I've simply pushed my processor requirement above what the 400MHz chip in the cFP2220 can handle.

CLAD

rex1030 · ‎10-29-2009

Since you added VIs to that loop, it now does a lot more work in each iteration, and because you set it to a higher priority than the other loops, it is hogging all the processor time and the lower priority loops are not being run. Even if you don't set the network loop to the same priority, set the other ones to the same priority and see what happens.
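A back-of-the-envelope way to see the starvation argument, written as Python since a block diagram can't be pasted as text. This is only a sketch under strong assumptions: the periods and priorities are the ones quoted earlier in the thread (100 ms / 600, 1000 ms / 450, 1000 ms / 300), but the loop names and the per-iteration execution times are invented, and the model ignores everything else running on the target.

```python
# (name, priority, period_ms, exec_ms); exec_ms values are invented for illustration
loops = [
    ("control", 600, 100, 95),
    ("middle", 450, 1000, 200),
    ("network", 300, 1000, 150),
]

def utilization(loops):
    """Fraction of one CPU demanded by all loops together."""
    return sum(exec_ms / period_ms for _, _, period_ms, exec_ms in loops)

def time_left_for(loops, name, window_ms=1000):
    """CPU time in one window not claimed by strictly higher-priority loops."""
    prio = next(p for n, p, _, _ in loops if n == name)
    claimed = sum(exec_ms * (window_ms / period_ms)
                  for _, p, period_ms, exec_ms in loops if p > prio)
    return max(0.0, window_ms - claimed)

print("total demand:", round(utilization(loops), 2))
for name, _, _, exec_ms in loops:
    print(name, "needs", exec_ms, "ms, has", time_left_for(loops, name), "ms per second")
```

With these made-up numbers the total demand exceeds one CPU, and the lower priority loops get less time per second than they need, which is the pattern being described. Real measurements from the target would have to replace the invented execution times.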

On another note, since you have the loops in the bottom picture running with a 1000ms dt, which is an eternity for a computer, there is no reason to mess with the priorities here.

Why does the middle loop in the top picture not have any timing source or dt or anything?

Another thing that might help you is that the timed loops have an indicator you can read from on the inside of the loop that will tell you how long it actually took to run the loop. You will be able to tell from that if the loops aren't running in the time expected.
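The same measurement idea in text form, as a minimal Python sketch rather than a timed loop: time each iteration and flag the ones that ran longer than the period. The `run_timed_loop` helper and the sleep-based workloads are hypothetical stand-ins, and a desktop OS is assumed, so the timings are not deterministic the way they would be on the RT target.

```python
import time

def run_timed_loop(work, period_s, iterations):
    """Run work() once per period; return (duration_s, finished_late) per iteration."""
    results = []
    next_start = time.monotonic()
    for _ in range(iterations):
        start = time.monotonic()
        work()
        duration = time.monotonic() - start
        results.append((duration, duration > period_s))
        next_start += period_s
        # Sleep off whatever is left of the period, if anything.
        time.sleep(max(0.0, next_start - time.monotonic()))
    return results

# Stand-in workloads: one well inside the period, one that overruns it.
fast = run_timed_loop(lambda: time.sleep(0.001), period_s=0.05, iterations=3)
slow = run_timed_loop(lambda: time.sleep(0.08), period_s=0.05, iterations=3)

print("fast late flags:", [late for _, late in fast])
print("slow late flags:", [late for _, late in slow])
```

If the measured duration of the polling loop is regularly at or above its period, the loop is overrunning and anything at lower priority will struggle to get scheduled.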

---------------------------------
[will work for kudos]
