system hang during ibwrt

ben_white · ‎10-21-2013

I am using ni488-2 on a Linux system (Redhat kernel 2.6.9-89), and my application hangs the system (requiring me to reboot) when accessing the Agilent 3458A meter on the 488 bus. I believe the hang occurs during a call to ibwrt. I have verified using "print" statements that the string and the string length are valid. And, it does not hang on the first call to ibwrt; it gets quite a way through the program before the problem occurs.

The NI libraries installed on this system (as shown by rpm -qa) are:

nipali-2.2.0-f0
ni4882ki-2.5.4-b1
ni4882i-2.5.4-b1
nispyi-2.5.2-f1
nikali-1.7.0-f0
nipalki-2.2.0-f0

Where can I find a list of known issues with this ni488 library release?

Do you have any other suggestions I can try?

warEagle12 · ‎10-22-2013

Hi ben_white,

Have you tried updating your NI-488.2 drivers? You can find them here:

http://search.ni.com/nisearch/app/main/p/bot/no/ap/tech/lang/en/pg/1/sn/catnav:du,n19:Linux,n8%3A3.2...

Thanks!

Stephanie S.
Application Engineer
National Instruments

ben_white · ‎10-22-2013

I am told by other developers that the currently installed drivers are the latest ones that are applicable for this Linux OS.

warEagle12 · ‎10-23-2013

Ok. Has this worked on your machine in the past? If you could, please let me know the following:

What commands are you sending to the device?

Have you been able to use this device on other machines? If so, are you using the same commands that you were using?

Are you sending the same commands every time, or are you changing the order of commands that you are sending?

Lastly, if you have a program that will let you get the crash dump files, those might be helpful for us to look into.

Thanks!

Stephanie S.
Application Engineer
National Instruments

ben_white · ‎10-23-2013

Stephanie,

I can't say that this has worked exactly as-is in the past, as we have modified our software in various places. What I can say is that the exact same software (same compiled binaries) works in Linux 2.6.18 (CentOS 5.4) but hangs the system, as previously described, in Linux 2.6.9 (CentOS 4.6). I believe the major difference between the two OS's is that on the former, we are using NI GPIB 2.9.0 drivers, and on the latter (the system hang case) we are using NI GPIB 2.5.4 drivers, as listed in my earlier posting.

I can't move to newer drivers on the Linux 2.6.9 system (as stated earlier, I've been told they will not install cleanly), but I did roll back the drivers on the Linux 2.6.18 system to use GPIB 2.5.4 drivers, and then I was able to reproduce the system hang problem. Or, to be more precise, the Linux 2.6.18 system was also having the system hang problem when I started working on this, and then someone here at the office suggested that I should try upgrading the GPIB drivers. And, that worked.

The device is an Agilent 3458A digital multimeter, and I've been using the same unit on all configurations tested.

I have a copy of the nispy capture log when the system hung, although I can't say that every last GPIB transaction was captured in the file before the OS went out to lunch. I can send that along if you ask.

More info: one of our senior developers has pointed out that the 2.6.9 kernel is a "hugemem" kernel, meaning that user mode addresses can (and in our programs, they do) exceed 31 bits. If the address of the buffer being passed into ibwrt or ibrd exceeds 31 bits, are there any known problems with the 2.5.4 GPIB driver in handling that buffer? I believe the buffer contents are valid: I log the write buffer contents to the screen before each call to ibwrt, and the size of the read buffer is twice the size of the "count" value passed to ibrd.
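For readers who want to repeat the address check described above, here is a minimal C sketch. It is not taken from the original program: the helper names `address_bits_used`, `address_exceeds_31_bits`, and `log_buffer` are hypothetical, and the sketch only inspects and prints the pointer value; it makes no GPIB calls.

```c
#include <stdint.h>
#include <stdio.h>

/* Number of significant bits in a pointer's numeric value (0 for NULL). */
int address_bits_used(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    int bits = 0;
    while (addr != 0) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/* Returns 1 when the address needs more than 31 bits (bit 31 or higher set). */
int address_exceeds_31_bits(const void *p)
{
    return address_bits_used(p) > 31;
}

/* Hypothetical logging helper: print where the buffer lives and what it
   holds, intended to be called just before handing the buffer to the driver. */
void log_buffer(FILE *out, const char *label, const char *buf, size_t len)
{
    fprintf(out, "%s: addr=%p bits=%d exceeds31=%d len=%zu data=\"%.*s\"\n",
            label, (void *)buf, address_bits_used(buf),
            address_exceeds_31_bits(buf), len, (int)len, buf);
}
```

Calling something like `log_buffer(stderr, "ibwrt", cmd, strlen(cmd))` immediately before each write would show whether the hang correlates with buffers located above the 2 Gbyte boundary.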

-- Ben

warEagle12 · ‎10-24-2013

Hi Ben,

Thank you for replying with all of this information. I just want to give you a heads up that since there is a significant amount of variability in behavior between Linux distributions, we decided to support the Linux versions specified in this article:

http://digital.ni.com/public.nsf/allkb/4857A755082E9E228625778900709661?OpenDocument

The effect of this variability is that drivers that work properly on one distribution may not work on others. It is possible that our products may function on other versions of the supported distributions; however, we have not tested these distributions. This document shows the version of NI-KAL that works with different Linux distributions, as well:

http://digital.ni.com/public.nsf/allkb/2B4BB2B78A02C2FA86257BD500605CAD?OpenDocument

I have not been able to find anything about a 2.5.4 GPIB driver buffer issue, but I will continue searching. Can you tell me more about your system? Are you using a 32-bit or 64-bit system? Here is another article that might be helpful for your situation in case your system has more than 4GB of RAM:

http://digital.ni.com/public.nsf/allkb/20789C3E4A3DCAFB8625708B00693CF9?OpenDocument

Also, when exactly will this ibwrt cause a failure? I know that you mentioned that it did not happen with the first ibwrt write, but do you know when it does occur? How many writes can you do before hanging and is this the same every time you do this? Are you doing anything else while you're writing or trying to access any other ports?

Lastly, it might be helpful to attach the NISpy capture that you have so that we can take a look.

Thank you!

Stephanie S.
Application Engineer
National Instruments

ben_white · ‎10-24-2013

Stephanie,

I consulted the table that you referenced for NIKAL compatibility, and it appears that I am using the right version (1.7.0). We're using CentOS 4.8, which is basically a "rebranded" Redhat 4.8 (I said 4.6 in an earlier post, and that was my mistake).

The kernel is 32 bit, and the system I worked on most recently has 3 Gbyte physical memory. What's different about the "hugemem" kernel is that it allows user processes to access roughly 3 Gbyte of virtual addr space, as compared with 2 Gbyte for a standard Linux OS. That explains my earlier reference to addresses with the 32nd bit set.

I will attach the NISpy log to this message. It should answer the question about when the hang occurs. As far as I can observe, it is quite repeatable, and happens at the same point every time, at least as far as the log messages on my screen indicate. The NISPY capture indicates that the last command sent to the meter was "PRESET NORM ..." and my on-screen log messages agree with that. However, as you will note when you open the .spy file, it is truncated.

While my program is multi-threaded, the GPIB access is not. The meter is the only device on the GPIB bus, and only one thread is talking to that device.

Regarding the variability of Linux distributions, I understand your position. I'll try whatever suggestions you come up with, and if you have to give up at some point, that's life.

--Ben

warEagle12 · ‎10-26-2013

Hi Ben,

Does the hang occur every time that specific ibwrt PRESET NORM command is sent? For example, if you have another program that also writes this command, will it hang too? Or does the hang occur after a certain number of ibwrt commands?

Thanks,

Stephanie S.
Application Engineer
National Instruments

ben_white · ‎10-28-2013

Stephanie,

My guess is that it is not the PRESET NORM command itself that is causing the hang. In fact, in my latest version of the program, the hang is now associated with TRIG HOLD, both in NISpy, and in my on-screen log messages.

I built two versions of the program: one is the "normal" version, and the other is basically the same, except that several of the meter command strings are issued twice on the GPIB. My intent was to answer your question about when the hang occurs. The NISpy logs from both runs (Capture3 = normal, Capture4 = doubled) are attached.

My conclusion is that the hang is not related to the number of GPIB writes: it occurs at the same place in the program each time. In the second test case, quite a few more writes to the GPIB have occurred at the time of hang. It's also not associated with a specific command string. As you noted, originally it hung after a "PRESET NORM; ...", now it hangs after a TRIG HOLD. You are welcome to challenge that analysis, of course.

Since we last communicated, I think I have also eliminated the 32nd address bit as a possible cause. I re-wrote my code to allocate the string buffer (a C++ char [256] variable) as static, which resulted in a 28-bit buffer address. In other words, that should be similar to the addresses one would see with a regular 2 Gbyte Linux kernel.

Ben

warEagle12 · ‎10-29-2013

Ben,

Thanks for attaching those screenshots. I looked into the errors you are seeing in the I/O Trace and found the following article:

http://digital.ni.com/public.nsf/websearch/2FA525A8585A92E9862566EE002A3755#ENOL

It provides a pretty good description for why you might be seeing the ENOL (2) error in your trace. Can you verify that this device works on another computer? Have you updated the firmware lately?
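As a rough illustration of how an application could flag ENOL itself after each call, here is a small C sketch. It assumes the usual NI-488.2 convention that the ERR bit (0x8000) in the status word signals a failure and that the error code then identifies it, with ENOL equal to 2 as shown in the trace. The constants are redefined locally with an `SK_` prefix so the sketch compiles without the driver headers, and `describe_gpib_status` is a hypothetical helper, not a driver function.

```c
/* Local copies of the relevant status and error values. The SK_ names are
   hypothetical; the numbers follow the NI-488.2 ibsta/iberr definitions. */
enum { SK_ERR = 0x8000, SK_TIMO = 0x4000, SK_CMPL = 0x0100 };
enum { SK_ENOL = 2, SK_EABO = 6 };

/* Turn a status word and error code into a short description.
   The error code is only meaningful when the ERR bit is set. */
const char *describe_gpib_status(int status, int error)
{
    if (!(status & SK_ERR))
        return "ok";
    switch (error) {
    case SK_ENOL:
        return "ENOL: no listeners on the bus";
    case SK_EABO:
        return "EABO: operation aborted";
    default:
        return "other GPIB error";
    }
}
```

In a real program the two arguments would be the driver's `ibsta` and `iberr` values read right after ibwrt or ibrd returns, and the resulting string could be written to the application log alongside the command that was sent.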

Also, when your system hangs do you have to turn off the device and power it back on to start again, or can you restart the program?

Thanks!

Stephanie S.
Application Engineer
National Instruments
