Shared Variable engine is too busy to respond

kevinks · ‎07-26-2018

Hi all,

The title will be familiar to many, so I will also list the methods we have already tried. Before getting to the actual trouble, here is a brief summary of the system. We are working on a project that uses two NI cRIO-9030 controllers (intended for redundant operation, but not yet configured for redundancy), the LabVIEW 2016 Full Development System and a Windows 10 workstation. We use network shared variables to communicate between the cRIO and the operator workstation, hereafter referred to as OS. There are over 250 digital inputs, over 100 digital outputs, 90+ analog inputs and 8 analog outputs. Around 375 shared variables (hereafter referred to as SVs) are in use. On the cRIO side, all the SVs are connected to a cluster in a communication VI, and the cluster is wired throughout the logic without using any global variables. On the OS, shared variables are used directly wherever necessary.

During integrated testing we found that the shared variable engine (SVE) on the OS crashes after a few seconds, either with an error saying that the SVE is too busy to respond or is unavailable, after which a Windows restart is required, or with a warning saying that the NI PSP server is not connected yet. We monitored tagserv in Windows Services and found that the Variable Engine stops on its own partway through.

We tried the following methods hoping to solve the issue:

1. Restarting Windows

2. Reinstalling LabVIEW

3. Formatting the cRIO

4. Disabling the firewall

When all of these failed, we suspected that we were demanding too much from the SV engine, so we moved the DI, DO and AI data to TCP/IP communication to lessen the load on the SV engine. (Yes, we still required SVs after that for transferring data.) We then tried the following, one after the other and in combination:

1. RT FIFO

2. Aliasing on the cRIO side alone

3. Aliasing on the OS side alone

4. Splitting the libraries on the OS side so that SVs that use aliasing for data transmission are in one library and SVs for alarms are in a separate library.

5. Using programmatic access to read and write SVs, ensuring that SV connections that are not needed throughout are closed immediately after being opened, hoping to lessen the load further.

We have also confirmed that none of the SVs have both read and write access. If an SV is read on the cRIO, the same SV is written on the OS. There are no conflicts.

Despite all this, we are still facing the issue. Since we have come this far, I don't think it would be easy for us to switch to a different solution rather than solve the present issue.

The project is classified; however, I can share screenshots of the error and test VIs once I have been granted permission. In the meantime, any suggestions or ideas would be very welcome. Thanks in advance.

MichaelBalzer · ‎07-30-2018

Just to be clear, are there two copies of all the variables, one set hosted on the cRIO, and one set hosted on the Windows workstation which is aliased to the cRIO SVs? Something like:

[cRIO] (Scan Engine) I/O -> [cRIO] SVs -> [OS] Aliased SVs

Does the SVE crash when using fewer variables? What about if the cRIO target is disconnected?

Are the SVs in a single library, or multiple libraries? I've found too many variables in a single library can be problematic, any more than about 50.

Have you tried disabling your firewall (as a quick test), or allowing the appropriate SV ports?

When the SVE does crash, can the "NI Variable Engine" service be restarted through services.msc?

After a reboot of the PC, can you open the Distributed System Manager and query the variables on the cRIO?

If you're looking to reduce the number of variables, you could try packing 64 digitals into a single U64 variable.
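As a rough illustration of the packing idea (a Python sketch rather than LabVIEW, where this would be done with boolean array and integer conversions), the functions below pack up to 64 digital states into one unsigned 64-bit word and unpack them again. The names `pack_bits` and `unpack_bits` and the bit ordering (first channel in bit 0) are assumptions made for this example only.

```python
def pack_bits(bits):
    """Pack up to 64 booleans into one unsigned 64-bit integer (first input -> bit 0)."""
    if len(bits) > 64:
        raise ValueError("at most 64 digital channels fit in a U64")
    word = 0
    for index, state in enumerate(bits):
        if state:
            word |= 1 << index  # set the bit that corresponds to this channel
    return word


def unpack_bits(word, count=64):
    """Recover the first `count` booleans from a packed 64-bit word."""
    return [bool((word >> index) & 1) for index in range(count)]
```

With this layout, 250 digital inputs would need four such words instead of 250 individual variables, at the cost of unpacking on the receiving side.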

For comparison I have a project with a roughly even mix of AI, AO, DI and DO with > 400 SVs split across 75 libraries hosted on a cRIO. A duplicate set of variables is hosted on a PC which are aliased to the cRIO's SVs. All SV access is programmatic at 10Hz, and is always to the locally hosted library (ie. cRIO reads/writes cRIO SVs, PC reads/writes PC SVs).

Unless otherwise stated, all code snippets and examples provided
by me are "as is", and are free to use and modify without attribution.

Jimmie_A. · ‎07-30-2018

Hi,

Quick question, have you contacted support regarding the issues you are experiencing? If yes, what did they say?

Below you will find some initial feedback and questions (so that I better understand what you are doing!).

I agree that you have a decent amount of networked shared variables, what computing capability do you have on the OS side of things? I guess you deploy and host the shared variables on the SVE on your cRIO target, correct? I normally deploy on the cRIO target since I expect it to be online longer than a Windows based machine. What rates are you running at? How much data do you actually send?

Good idea to pack the shared variables (digital I/O) to a bigger datatype and that way reduce number of shared variables.

Why do you connect all shared variables to a cluster? Especially if you use the programmatic approach?

Regards,
Jimmie Adolph
Systems Engineering Manager, National Instruments Northern European Region

kevinks · ‎08-02-2018

Sorry for the delay, please find the answers to your questions:

“are there two copies of all the variables, one set hosted on the cRIO, and one set hosted on the Windows workstation which is aliased to the cRIO SVs?”

Yes, there are two copies of all the variables: one set hosted on the cRIO, and the other hosted on the Windows workstation and aliased to the cRIO SVs.

“Does the SVE crash when using fewer variables?”

Yes, the SVE crashes even if we use only a few variables in the VI with a full library, and also if the library is set to hold only a few variables.

“What about if the cRIO target is disconnected?”

The same error persists even if the cRIO is disconnected.

“Are the SVs in a single library, or multiple libraries? I've found too many variables in a single library can be problematic, any more than about 50.”

This is something we haven't implemented yet. Currently we have two libraries. One library is for acquiring data from the cRIO, and its SVs are all aliased. The other library holds SVs for creating events: for example, if the user presses a button or enters a new tab, an event is recorded with the help of these SVs. These SVs are not used on the cRIO and are therefore not aliased to anything. [We suspected this might be the cause, so we tested without the event-generating library, but the error was still there.]

“Have you tried disabling your firewall (as a quick test), or allowing the appropriate SV ports?”

Yes, the firewall is disabled. As for the SV ports, we haven't looked into that. Could you please share some ideas on this?

“After a reboot of the PC, can you open the Distributed System Manager and query the variables on the cRIO?”

The Distributed System Manager works properly after a reboot; however, once the error has occurred, the library cannot be expanded in the Distributed System Manager.

“If you're looking to reduce the number of variables, you could try packing 64 digitals into a single U64 variable”

If we do that, how would we manage the alarms?

kevinks · ‎08-02-2018

“Quick question, have you contacted support regarding the issues you are experiencing? If yes, what did they say?”

Yes, we have. But the solutions and documents they shared were not able to solve our issues.

“I agree that you have a decent amount of networked shared variables, what computing capability do you have on the OS side of things? I guess you deploy and host the shared variables on the SVE on your cRIO target, correct? I normally deploy on the cRIO target since I expect it to be online longer than a Windows based machine?”

By computing capability, do you mean the workstation specs? If so, the system is powered by an Intel Xeon E3 with 16 GB RAM. The shared variables are deployed on both the cRIO and the Windows OS.

Regarding the bundling into a cluster, I will try to give you an overall picture of how our program works. We have three types of valves, say A, B and C. For these three types of valves we have certain SVs that carry information about their position, state, commands and fault status. Corresponding to the three valve types, there are three subVIs that read or write the SVs (the three subVIs sit inside a parent VI called Data Exchange). Each subVI has two clusters: one accepts commands from the SVs and passes them to the valve operation logic VI so that the valves are operated, and the other comes from the valve operation logic VI with the states of the valves and is unbundled and written to SVs so that the status is displayed on the OS. We used this approach so that we don't have to read the same SV at multiple places inside the logic, which could slow down execution. This way, SVs are read or written only when the Data Exchange VI executes, and are always wired.
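The read-once, write-once pattern described above could be sketched roughly as follows (in Python for illustration, since a VI cannot be pasted as text). Everything here is hypothetical: the `ValveCommands` and `ValveStatus` containers stand in for the two clusters, `read_sv` and `write_sv` stand in for whatever shared variable access is used, and the variable names such as `ValveA_OpenCmd` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ValveCommands:
    """Stand-in for the cluster that carries commands into the valve logic."""
    open_request: bool
    close_request: bool


@dataclass
class ValveStatus:
    """Stand-in for the cluster that carries valve states out of the logic."""
    position: float
    fault: bool


def data_exchange_cycle(read_sv, write_sv, valve_logic):
    """One pass: read each command SV once, run the logic, write each status SV once."""
    commands = ValveCommands(
        open_request=read_sv("ValveA_OpenCmd"),    # hypothetical SV name
        close_request=read_sv("ValveA_CloseCmd"),  # hypothetical SV name
    )
    status = valve_logic(commands)  # the logic only ever sees the wired data
    write_sv("ValveA_Position", status.position)
    write_sv("ValveA_Fault", status.fault)
    return status
```

The point of the design is that the valve logic never touches a shared variable itself, so each SV is accessed exactly once per cycle.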

I am also attaching two screenshots along with this reply. During execution the warning comes first and then the error.

MichaelBalzer · ‎08-02-2018

I feel your pain with the unexpected and unexplained shared variable behaviour, I've run into many issues with them in the past. You mentioned alarms - are you using the DSC module + logging?

If your firewall is disabled, then there's no need to set up the SV ports. It was more a case of if the firewall was active, then there's a range of ports you must allow through to enable SVs to work correctly.

Does the variable engine crash if you remove all of the aliasing from the server hosted variables and run the server application? This should've been covered with the cRIO disconnected test, but probably worth verifying anyway.

Have you tried hosting the variables on another PC (ideally a different version of Windows) and running the application from there? I'm wondering if it's an installation issue or Win10 version incompatibility.

Is there a minimal code example you can provide which causes the crash?

Sorry for the 20 questions! By the way the server hardware you have should be more than enough. Most of my dev and test is in a Win7 VM with 8GB RAM and two cores of an i7, and it handles hundreds of variables + DSC logging and alarming.


kevinks · ‎08-03-2018

Hi Michael,

Yes, we are using the DSC module for alarms and logging. We also tried testing on a Windows 7 machine with the same hardware specs, but the same issue occurred.

I said previously that the cRIO-disconnected test threw an error. After reading your comment I realised that I hadn't disabled the aliasing (sorry, my mistake). I disabled the aliasing, removed the hosted variables from the Distributed System Manager and ran the software again. It didn't throw any error, and the Distributed System Manager could be accessed properly.

I am now planning to pack the digitals into a U64 as you mentioned earlier. In the meantime, if the cRIO-disconnected test result gives you any ideas, please do let me know.

Thanks in advance,

kevinks

MichaelBalzer · ‎08-05-2018

Hi kevinks,

What options have you enabled in terms of update deadband and logging deadband / resolution for the server hosted variables? I think the default for the update deadband is 0% - try setting this to 0.01%, or even 0.1% to see if it has any effect. I think the log deadband defaults to 0.01% / 0.01 resolution, so you could set this to 0.1 if your update deadband is 0.1.
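For anyone unfamiliar with what a deadband does, here is a minimal Python sketch of the general idea: a new value is only passed on when it differs from the last published value by more than a threshold. It assumes the deadband is expressed as a percentage of the channel's full-scale range; the class name `DeadbandFilter` and its parameters are made up for this illustration and are not the DSC implementation.

```python
class DeadbandFilter:
    """Suppress updates that change by less than a percentage of the full-scale range."""

    def __init__(self, full_scale_range, deadband_percent):
        # Assumption: the deadband is a percentage of the channel's full-scale range.
        self.threshold = full_scale_range * deadband_percent / 100.0
        self.last_published = None

    def should_publish(self, value):
        """Return True and remember the value if it moved beyond the deadband."""
        if self.last_published is None or abs(value - self.last_published) > self.threshold:
            self.last_published = value
            return True
        return False
```

With a 0% deadband every change, however small, counts as an update, so a noisy analog channel generates an update on nearly every sample. Even a small non-zero deadband filters most of that noise out.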


kevinks · ‎09-28-2018

Hi,

Actually, I would like to reply to everyone. Thank you all for your valuable information. I had been using the shared variables as they were, together with the standard alarming VIs that came with LabVIEW. This seemed to be the problem. I have now stripped all of those out and am accessing variables programmatically, with alarms and events only where needed, which takes the load off the system and uses the DSC module to its fullest potential. Looking back at my previous programs, I now feel like a complete idiot. It took a while, but it sure taught me a lot.

MichaelBalzer · ‎09-30-2018

DSC has been a long learning experience for me, and I still run into gotchas. I'm glad you've been able to learn something too.

A friendly warning with alarm events - if a channel goes from say HIHI straight to LOLO (maybe the sensor was knocked offline), you will receive a bunch of alarm events at once, and not necessarily in the order you'd expect (HIHI -> HI -> No alarm -> LO -> LOLO). You may need to reread the alarm state of a channel after receiving the event to verify the actual alarm state.
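The re-read approach can be sketched like this (Python for illustration only). The function `handle_alarm_events` and the `read_alarm_state` callback are hypothetical stand-ins for however the current alarm state of a channel is actually queried; the sketch assumes events arrive as (channel, reported state) pairs.

```python
def handle_alarm_events(events, read_alarm_state):
    """Collapse a burst of alarm events into the confirmed current state per channel."""
    confirmed = {}
    for channel, _reported_state in events:
        # Do not trust the state carried by the event: ordering within a burst
        # is not guaranteed, so query the channel's actual state instead.
        if channel not in confirmed:
            confirmed[channel] = read_alarm_state(channel)
    return confirmed
```

Each channel is queried only once per burst, so a jump from HIHI to LOLO results in a single confirmed LOLO state rather than a chain of intermediate alarms.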

