03-16-2011 01:55 PM
We are currently running an application that accesses 10 separate ENET-100 GPIB units (GPIB0 to GPIB9). On startup the application calls ibdev exactly once for each ENET box, after verifying with a "ping" command that the unit is present on the network. Occasionally the application hangs on one of those initial 10 calls to ibdev. Killing the application and restarting it does not fix the problem (it still hangs). The problem does not disappear until the entire Linux machine is rebooted. After the reboot the application starts up automatically, and in most cases all 10 calls to ibdev succeed and the application performs flawlessly after that. Note that the ENET boxes are never turned off and are already powered on when the system first boots.
The program is written in C++ and is single threaded.
The OS is Red Hat Enterprise 5.0, kernel version 2.6.18-164.el5PAE.
The ENET boxes' firmware is version C.9.
The ni488 driver we are using is version 2.5.1b1 dated March 2008 (from README file).
Any help is deeply appreciated.
Thanks.
03-17-2011 06:33 PM
After doing some research, I came across a similar issue from the past. Is it possible that you are getting intermittent failures because the application is running out of system handles? This can happen when you make multiple calls to ibdev() without a paired ibonl() for each one.
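One way to guarantee that every descriptor obtained from ibdev() is released exactly once is to wrap it in a small RAII class. The sketch below is only an illustration: `DeviceHandle` and the `closer` callable are hypothetical names, and the callable stands in for the real release call (in an NI-488.2 program it would typically call ibonl(ud, 0)) so that the example compiles without the driver headers.

```cpp
#include <functional>
#include <utility>

// Owns one device descriptor and releases it exactly once.
// `closer` is a stand-in for the real release call; in an NI-488.2
// program it would typically be: [](int ud) { ibonl(ud, 0); }
class DeviceHandle {
public:
    DeviceHandle(int ud, std::function<void(int)> closer)
        : ud_(ud), closer_(std::move(closer)) {}

    ~DeviceHandle() { release(); }

    // Copying would lead to a double release, so it is disabled.
    DeviceHandle(const DeviceHandle&) = delete;
    DeviceHandle& operator=(const DeviceHandle&) = delete;

    // Moving transfers ownership and leaves the source empty.
    DeviceHandle(DeviceHandle&& other) noexcept
        : ud_(other.ud_), closer_(std::move(other.closer_)) {
        other.ud_ = -1;
    }

    int get() const { return ud_; }

    // Release the descriptor now; calling this again is a no-op.
    void release() {
        if (ud_ >= 0 && closer_) {
            closer_(ud_);
        }
        ud_ = -1;
    }

private:
    int ud_;
    std::function<void(int)> closer_;
};
```

With a wrapper like this, a descriptor is released when the object goes out of scope, including on early returns and exceptions.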
Aaron
03-21-2011 11:18 AM
Thanks for taking the time to research the issue with the hanging ibdev. The reasons I think we are seeing something completely different from the "running out of file handles / calling ibdev without ibonl" issue are:
1. The ibdev call is definitely made only once for each of the ten devices/ENET boxes, and the handle is never released with ibonl until the program exits.
2. Killing and restarting the program, which would release its resources, does not fix the problem. I have to reboot the entire box, which seems to suggest some bad Linux kernel voodoo has occurred.
3. The number of allowed file descriptors per process is set to 1024 with the Linux limit command.
4. When I list the open socket file descriptors for the process using lsof -a -i -p <process-id>, I see 4 file descriptors allocated by the first ibdev and 2 file descriptors allocated for each subsequent ibdev, for a total of 22 file descriptors (4 + 9 x 2), which is well below the 1024 limit.
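The same kind of check can be made from inside the process, which allows the count to be written to a log at startup. The following is a minimal sketch that assumes a Linux system with /proc mounted; `FdCounts` and `count_open_fds` are hypothetical names. Note that it counts all sockets (including Unix-domain ones), whereas lsof -i lists only Internet sockets, so the numbers may differ slightly.

```cpp
#include <dirent.h>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <string>

struct FdCounts {
    int total;    // all open descriptors of this process
    int sockets;  // the subset whose /proc link target starts with "socket:"
};

// Count the descriptors currently open in the calling process by
// listing /proc/self/fd (Linux-specific).
FdCounts count_open_fds() {
    FdCounts counts = {0, 0};
    DIR* dir = opendir("/proc/self/fd");
    if (dir == nullptr) {
        return counts;  // /proc not available; report zeros
    }
    const int own_fd = dirfd(dir);  // descriptor used for the listing itself
    while (dirent* entry = readdir(dir)) {
        if (entry->d_name[0] == '.') {
            continue;  // skip "." and ".."
        }
        const int fd = std::atoi(entry->d_name);
        if (fd == own_fd) {
            continue;  // do not count our own directory handle
        }
        ++counts.total;

        // Each entry is a symlink; sockets appear as "socket:[inode]".
        char target[256];
        const std::string path = std::string("/proc/self/fd/") + entry->d_name;
        const ssize_t len = readlink(path.c_str(), target, sizeof(target) - 1);
        if (len > 0) {
            target[len] = '\0';
            if (std::strncmp(target, "socket:", 7) == 0) {
                ++counts.sockets;
            }
        }
    }
    closedir(dir);
    return counts;
}
```

Calling `count_open_fds()` before and after each ibdev would show how many descriptors each call allocates.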
Thanks for your help.
03-22-2011 10:03 PM
It is unusual that you have to reboot the machine to get it functioning again after the issue happens. I noticed that you are using the beta drivers because of the Red Hat version you are running. It is possible that you've run across something driver related. Out of curiosity, have you tried a similar setup with the non-beta drivers and a Red Hat 4.0 release, or any other Linux-based setup with your hardware?
03-23-2011 08:32 AM
I don't think switching to the non-beta driver is going to make any difference. There is very little functional difference between those drivers.
It would be helpful to see an NI-Spy capture of the application when it is hanging, and then another NI-Spy capture showing the application when it is restarted after hanging. I hope from this we will be able to tell if it is always hanging on the same instrument/controller, or if once it hangs no new sessions can be opened to any device.
Jason S.
03-23-2011 09:44 AM
Thanks for your replies. The unfortunate logistics of the problem are that it is happening in the field, at 20,000+ feet in the air, and it is happening very rarely. It is not practical under those conditions to have someone running NI-Spy and monitoring. I have not been able to duplicate this in our lab (maybe because I do not have 10 ENET controllers available to me there). However, the thought that it might be a specific ENET controller is an intriguing one. Perhaps with some enhanced logging in the application we will be able to record which ENET is hanging and, if one is, determine whether it is the same one each time.
I do not have the option of using Red Hat Linux 4 and the previous drivers, because Red Hat 4 is not a version blessed by our security accreditation folks.
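A rough sketch of what such logging could look like follows. It writes a line before and after each open attempt and forces the line to disk, so that if the call never returns, the last BEGIN entry in the log identifies the controller involved. The names `log_line` and `open_with_logging` are hypothetical, and the `open_device` callable stands in for the application's real ibdev call so that the example compiles without the driver headers.

```cpp
#include <cstdio>
#include <ctime>
#include <functional>
#include <string>
#include <unistd.h>

// Append one timestamped line and push it to disk, so the entry
// survives even if the next call hangs and the machine is rebooted.
void log_line(std::FILE* log, const std::string& message) {
    const std::time_t now = std::time(nullptr);
    char stamp[32];
    std::strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S",
                  std::localtime(&now));
    std::fprintf(log, "%s %s\n", stamp, message.c_str());
    std::fflush(log);     // flush the stdio buffer to the kernel
    fsync(fileno(log));   // ask the kernel to write it to disk
}

// Wrap one open attempt with BEGIN/END log entries.
// `open_device` is a stand-in: in the real application it would call
// ibdev with the application's own address and timeout arguments.
int open_with_logging(std::FILE* log, int board,
                      const std::function<int(int)>& open_device) {
    const std::string name = "GPIB" + std::to_string(board);
    log_line(log, "BEGIN open board=" + name);
    const int ud = open_device(board);  // may hang; BEGIN is already on disk
    log_line(log, "END open board=" + name + " ud=" + std::to_string(ud));
    return ud;
}
```

A BEGIN entry with no matching END after a reboot would point at the board that was being opened when the hang occurred.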
Thanks for your help.