10-21-2013 01:42 PM
I am using ni488-2 on a Linux system (Redhat kernel 2.6.9-89), and my application hangs the system (requiring me to reboot) when accessing the Agilent 3458A meter on the 488 bus. I believe the hang occurs during a call to ibwrt. I have verified using "print" statements that the string and the string length are valid. And, it does not hang on the first call to ibwrt; it gets quite a way through the program before the problem occurs.
The NI libraries installed on this system (as shown by rpm -qa) are:
nipali-2.2.0-f0
ni4882ki-2.5.4-b1
ni4882i-2.5.4-b1
nispyi-2.5.2-f1
nikali-1.7.0-f0
nipalki-2.2.0-f0
Where can I find a list of known issues with this ni488 library release?
Do you have any other suggestions I can try?
10-22-2013 01:21 PM
Hi ben_white,
Have you tried updating your NI-488.2 drivers? You can find them here:
Thanks!
10-22-2013 07:29 PM
I am told by other developers that the currently installed drivers are the latest ones that are applicable for this Linux OS.
10-23-2013 03:51 PM
OK. Has this worked on your machine in the past? If you could, please let me know the following:
What commands are you sending to the device?
Have you been able to use this device on other machines? If so, are you using the same commands that you were using?
Are you sending the same commands every time, or are you changing the order of commands that you are sending?
Lastly, if you have a program that will let you get the crash dump files, those might be helpful for us to look into.
Thanks!
10-23-2013 04:12 PM
Stephanie,
I can't say that this has worked exactly as-is in the past, as we have modified our software in various places. What I can say is that the exact same software (same compiled binaries) works in Linux 2.6.18 (CentOS 5.4) but hangs the system, as previously described, in Linux 2.6.9 (CentOS 4.6). I believe the major difference between the two OS's is that on the former, we are using NI GPIB 2.9.0 drivers, and on the latter (the system hang case) we are using NI GPIB 2.5.4 drivers, as listed in my earlier posting.
I can't move to newer drivers on the Linux 2.6.9 system (as stated earlier, I've been told they will not install cleanly), but I did roll back the drivers on the Linux 2.6.18 system to use GPIB 2.5.4 drivers, and then I was able to reproduce the system hang problem. Or, to be more precise, the Linux 2.6.18 system was also having the system hang problem when I started working on this, and then someone here at the office suggested that I should try upgrading the GPIB drivers. And, that worked.
The device is an Agilent 3458A digital multimeter, and I've been using the same unit on all configurations tested.
I have a copy of the nispy capture log when the system hung, although I can't say that every last GPIB transaction was captured in the file before the OS went out to lunch. I can send that along if you ask.
More info: one of our senior developers has pointed out that the 2.6.9 kernel is a "hugemem" kernel, meaning that user mode addresses can (and in our programs, they do) exceed 31 bits. If the address of the buffer being passed into ibwrt or ibrd exceeds 31 bits, are there any known problems with the 2.5.4 GPIB driver in handling that buffer?
I believe the buffer contents are valid: I log the write buffer contents to the screen before each call to ibwrt, and the size of the read buffer is twice the size of the "count" value passed to ibrd.
-- Ben
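The "exceeds 31 bits" question above can be checked directly by logging each buffer address right before the GPIB call. The following is a minimal sketch, not part of the original program: the helper names `addr_has_bit31` and `log_buffer_address` are hypothetical, and the check only inspects bit 31 of the pointer value, which is the bit a 32-bit hugemem process may set.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if bit 31 of the address is set
 * (on a 32-bit system, the address is at or above 0x80000000). */
static int addr_has_bit31(uintptr_t addr)
{
    return (addr & (uintptr_t)0x80000000u) != 0;
}

/* Hypothetical helper: prints the buffer address to stderr and
 * reports whether bit 31 is set. Intended to be called immediately
 * before passing `buf` to ibwrt or ibrd. */
static int log_buffer_address(const char *label, const void *buf)
{
    int high = addr_has_bit31((uintptr_t)buf);
    fprintf(stderr, "%s buffer at %p, bit 31 %s\n",
            label, buf, high ? "set" : "clear");
    return high;
}
```

Calling `log_buffer_address("write", cmd)` before each write would show whether the hang correlates with high addresses.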
10-24-2013 02:29 PM
Hi Ben,
10-24-2013 03:41 PM
Stephanie,
I consulted the table that you referenced for NIKAL compatibility, and it appears that I am using the right version (1.7.0). We're using CentOS 4.8, which is basically a "rebranded" Redhat 4.8 (I said 4.6 in an earlier post, and that was my mistake).
The kernel is 32 bit, and the system I worked on most recently has 3 Gbyte physical memory. What's different about the "hugemem" kernel is that it allows user processes to access roughly 3 Gbyte of virtual addr space, as compared with 2 Gbyte for a standard Linux OS. That explains my earlier reference to addresses with the 32nd bit set.
I will attach the NISpy log to this message. It should answer the question about when the hang occurs. As far as I can observe, it is quite repeatable, and happens at the same point every time, at least as far as the log messages on my screen indicate. The NISPY capture indicates that the last command sent to the meter was "PRESET NORM ..." and my on-screen log messages agree with that. However, as you will note when you open the .spy file, it is truncated.
While my program is multi-threaded, the GPIB access is not. The meter is the only device on the GPIB bus, and only one thread is talking to that device.
Regarding the variability of Linux distributions, I understand your position. I'll try whatever suggestions you come up with, and if you have to give up at some point, that's life.
--Ben
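Because the capture file is truncated when the whole system hangs, one option is for the program to keep its own log and force each line to disk before issuing the GPIB call. This is a minimal sketch using standard POSIX calls (`open`, `write`, `fsync`); the function names `durable_log_open` and `durable_log_line` are hypothetical, and it does not touch the NI library at all.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens (or creates) an append-only log file.
 * Returns a file descriptor, or -1 on failure. */
static int durable_log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: appends one line and calls fsync so the line
 * is pushed to the disk before the function returns.
 * Returns 0 on success, -1 on failure. */
static int durable_log_line(int fd, const char *msg)
{
    size_t len = strlen(msg);

    if (write(fd, msg, len) != (ssize_t)len)
        return -1;
    if (write(fd, "\n", 1) != 1)
        return -1;
    return fsync(fd);
}
```

Logging the command string with `durable_log_line` just before each write means the last line in the file after a reboot should be the call that was in flight, assuming the hang happens after `fsync` returns.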
10-26-2013 09:02 PM
Hi Ben,
Does the hang occur every time that specific ibwrt PRESET NORM command is sent? For example, if you have another program that also writes this command, will it hang too? Or does it happen only after a certain number of ibwrt calls?
Thanks,
10-28-2013 02:42 PM
Stephanie,
My guess is that it is not the PRESET NORM command itself that is causing the hang. In fact, in my latest version of the program, the hang is now associated with TRIG HOLD, both in NISpy, and in my on-screen log messages.
I built two versions of the program: one is the "normal" version, and the other is basically the same, except that several of the meter command strings are issued twice on the GPIB. My intent was to answer your question about when the hang occurs. The NISpy logs from both runs (Capture3 = normal, Capture4 = doubled) are attached.
My conclusion is that the hang is not related to the number of GPIB writes: it occurs at the same place in the program each time. In the second test case, quite a few more writes to the GPIB have occurred at the time of hang. It's also not associated with a specific command string. As you noted, originally it hung after a "PRESET NORM; ...", now it hangs after a TRIG HOLD. You are welcome to challenge that analysis, of course.
Since we last communicated, I think I have also eliminated the 32nd address bit as a possible cause. I re-wrote my code to allocate the string buffer (a C++ char [256] variable) as static, which resulted in a 28-bit buffer address. In other words, that should be similar to the addresses one would see with a regular 2 Gbyte Linux kernel.
Ben
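The "28-bit buffer address" observation can be reproduced by counting the significant bits of a pointer value. This is a minimal sketch under that reading; `significant_bits` is a hypothetical helper, and the example addresses in the comments are illustrative values, not ones taken from the program in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: returns the number of significant bits in an
 * address, i.e. the position of the highest set bit plus one.
 * Examples: 0x0804A000 needs 28 bits, 0xBFFFF000 needs 32 bits. */
static unsigned significant_bits(uintptr_t addr)
{
    unsigned bits = 0;

    while (addr != 0) {
        bits++;
        addr >>= 1;
    }
    return bits;
}
```

Passing `(uintptr_t)buffer` for both the static buffer and the original one would confirm how many address bits each actually uses.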
10-29-2013 06:59 PM
Ben,
Thanks for attaching those screenshots. I looked into the errors you are seeing in the I/O Trace and found the following article:
http://digital.ni.com/public.nsf/websearch/2FA525A8585A92E9862566EE002A3755#ENOL
It provides a pretty good description for why you might be seeing the ENOL (2) error in your trace. Can you verify that this device works on another computer? Have you updated the firmware lately?
Also, when your system hangs, do you have to turn off the device and power it back on to start again, or can you restart the program?
Thanks!
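For spotting the ENOL (2) error in the program itself rather than only in the trace, the status and error values left by each call can be checked after every write. The sketch below is an assumption-laden illustration: it takes the status and error values as plain integers (in a real program they would come from the library's `ibsta` and `iberr` after each call), and the `SKETCH_` constants are local stand-ins assumed to match the NI-488.2 header values (ERR as bit 15 of the status word, ENOL as error code 2). Verify them against the header installed on the system.

```c
/* Local stand-ins, assumed to match the NI-488.2 header values. */
#define SKETCH_ERR  0x8000  /* status bit: the call ended in error */
#define SKETCH_EDVR 0       /* operating system error */
#define SKETCH_ENOL 2       /* no listeners on the GPIB */
#define SKETCH_EABO 6       /* operation aborted, e.g. on timeout */

/* Hypothetical helper: returns 1 if the status/error pair reports
 * a "no listeners" condition, 0 otherwise. */
static int gpib_is_no_listener(int status, int error)
{
    return (status & SKETCH_ERR) != 0 && error == SKETCH_ENOL;
}

/* Hypothetical helper: returns a short description for logging. */
static const char *gpib_describe(int status, int error)
{
    if ((status & SKETCH_ERR) == 0)
        return "ok";
    switch (error) {
    case SKETCH_EDVR: return "EDVR: operating system error";
    case SKETCH_ENOL: return "ENOL: no listeners on the bus";
    case SKETCH_EABO: return "EABO: operation aborted";
    default:          return "other GPIB error";
    }
}
```

Logging `gpib_describe(...)` after each write would show whether ENOL appears before the hang or only at the point of failure.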