07-02-2009 12:25 PM
I am testing a new version of a program. The production version is running more or less OK, but I wanted to tweak a few things. This system has DIO/DAQ, VISA serial, and VISA PCI-GPIB running in parallel loops. With the PCI boards in the system but no hardware actually connected, the DIO and DAQ run fine.
However, without the actual hardware connected (3 GPIB devices and 8 multi-drop RS-485 devices), there are a huge number of timeouts, which I expect. What I don't expect is that the whole system seems to lock up after about 12 hours of running. Is each of these timeouts leaking memory or something?
My system log is full of
Jul 2 00:02:30 hybridcryomacb [0x0-0x28028].com.ni.labview[1919]: VISA: Async timeout forced
Jul 2 00:03:11: --- last message repeated 144 times ---
Jul 2 00:03:06 hybridcryomacb [0x0-0x28028].com.ni.labview[1919]: VISA: Async timeout forced
Jul 2 00:03:41: --- last message repeated 133 times ---
which I assume is just informational, and then I get a really bad message:
Jul 2 13:03:33 hybridcryomacb ReportCrash[9962]: Formulating crash report for process coreservicesd[9809]
which tells me my system is seriously FooBar. At that point the Finder menu bar is gone and the Dock starts to fail. I still have non-GUI access to the system. Has NI by any chance tested VISA against lots (i.e. 10,000 or more) of async VISA timeouts to check for cleanup?
VISA 4.4.0 & VISA 4.5.0
LabVIEW 8.5.1
Mac OS X 10.5.7 / PPC
07-06-2009 05:18 PM
Interesting issue, sth. Let me bring this issue up with our R&D department and see if they have seen something like this before.
Regards,
Sammy Z.
07-08-2009 09:44 AM
sth wrote: ... This system has DIO/DAQ, VISA serial, and VISA PCI-GPIB running in parallel loops. ... What I don't expect is that the whole system seems to lock up after about 12 hours of running. Is each of these timeouts leaking memory or something? ... Has NI by any chance tested VISA against lots (i.e. 10,000 or more) of async VISA timeouts to check for cleanup?
Great POST!
The simple explanation is that your GPIB controller buffer is full!
The big clue is the ASYNC TMO. (BTW, you would NEVER have traced this down if you were using synchronous VISA calls, because the OFFENDING call would return OK but the NEXT call would time out.) Since async calls wait for the data to leave the buffer before returning, the time to execute the call depends on the amount of data already in the buffer. If the data in the async call is greater than the remaining buffer space, VISA forces the call to time out (and reports the error you posted).
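For concreteness, here is a minimal C sketch of the asynchronous pattern under discussion, using the standard NI-VISA C API (the resource name, sizes, and timeout here are hypothetical, and exact timeout behavior depends on the driver underneath VISA):

#include <visa.h>
#include <stdio.h>

int main(void)
{
    ViSession rm, vi;
    ViJobId job;
    ViEventType etype;
    ViEvent ev;
    char cmd[] = "*IDN?\n";

    viOpenDefaultRM(&rm);
    viOpen(rm, "GPIB0::5::INSTR", VI_NULL, VI_NULL, &vi);  /* hypothetical resource */

    /* queue I/O-completion events so we can wait on the async job */
    viEnableEvent(vi, VI_EVENT_IO_COMPLETION, VI_QUEUE, VI_NULL);

    /* async write: returns immediately with a job ID */
    viWriteAsync(vi, (ViBuf)cmd, sizeof cmd - 1, &job);

    /* wait up to 2 s for the transfer to finish */
    if (viWaitOnEvent(vi, VI_EVENT_IO_COMPLETION, 2000, &etype, &ev) < VI_SUCCESS) {
        viTerminate(vi, VI_NULL, job);   /* no completion in time: abort the pending job */
        printf("async write timed out\n");
    } else {
        viClose(ev);
    }

    viClose(vi);
    viClose(rm);
    return 0;
}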
A couple of things you can do:
1) Make sure ALL calls are async, so the buffer is clean at the end of each call.
2) Hang a second GPIB bus to handle any equipment that needs huge amounts of access (so it doesn't block the rest of the communications).
3) If you are using RQS and waiting on an instrument, STOP! Find another way to synchronize your timing (you are idling the bus when you need to talk to other instruments).
07-08-2009 11:58 AM
Jeff,
Thanks for that info. You make a lot of interesting points. I have tried to maximize the asynchronous nature of this program.
You mention that using a wait for RQS is inefficient in that it ties up the GPIB bus until the request is satisfied. My understanding of the IEEE standard is that this is *supposed* to do the complete opposite: it allows all other transactions to take place on the GPIB bus until the SRQ line is asserted. At that point you poll to see who asserted the SRQ and service it. But for the time between the wait and when the SRQ is asserted, other threads should be able to access the GPIB.
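For reference, the SRQ pattern I mean looks roughly like this in C against NI-488.2 (a minimal sketch; the header name varies by platform, and the board index and device addresses are hypothetical):

#include <ni488.h>
#include <stdio.h>

int main(void)
{
    /* hypothetical device descriptors on board 0, primary addresses 5 and 9 */
    int dev[2];
    dev[0] = ibdev(0, 5, 0, T10s, 1, 0);
    dev[1] = ibdev(0, 9, 0, T10s, 1, 0);

    /* block until some instrument asserts SRQ; the bus itself stays free */
    short srq = 0;
    WaitSRQ(0, &srq);

    if (srq) {
        /* serial poll each device to find out who requested service */
        for (int i = 0; i < 2; i++) {
            char spr = 0;
            ibrsp(dev[i], &spr);
            if (spr & 0x40)   /* RQS bit set: this instrument asserted SRQ */
                printf("device %d needs service, status 0x%02x\n", i, spr);
        }
    }

    ibonl(dev[0], 0);
    ibonl(dev[1], 0);
    return 0;
}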
This brings up the problem of async vs. sync VISA calls. I have tried to make my system as asynchronous as possible, to take advantage of OS threading and multiple processors, and that effort has led to a lot of bug finding in LV over the years. However, I have noted in the past that, counterintuitively, async calls chew up a lot more CPU time than sync calls to VISA.
This is because an async call will spawn a thread that spins and polls, waiting for the call to complete. It does not use a driver callback to signal completion but eats up a lot of CPU time polling the transaction. It is *MUCH* more efficient to use synchronous calls if you don't run out of threads.
I have asked and never gotten an answer about the trade-offs of creating a larger number of threads (the default is 2 * Ncpu, and it is easy to use the NI utility to crank that up to 8 or so) compared to using async calls with fewer threads. The question is the overhead of thread switching vs. the huge CPU hit of the async calls.
As to the original problem, it is due to testing without hardware connected, so the async calls to different GPIB instruments are filling the buffer before they time out. Either a smaller default timeout or a larger buffer should solve this. However, I am not sure that you can set the buffer size for GPIB transactions.
And last, the comment about being all async to make sure the buffer is clean implies that the buffer still has data in it while the program attempts another GPIB call. Do you mean to make sure that they are all sync calls?
07-08-2009 01:48 PM
STH,
It is counterintuitive, but async calls return when the data leaves the buffer (so execution time is variable for a given size transfer, dependent on how much data was previously in the buffer; the execution time is asynchronous). Synchronous calls dump data to the buffer and return (so execution time for a given size transfer is always the same, or synchronous). Sync calls leave data in the buffer when they return. Async calls require the buffer to be emptied before they return (unless they time out).
You are correct that there is a speed trade-off from the calling application. I do tend to err towards robustness over CPU optimization.
The Wait for SRQ does involve a lot of bus time that cannot be used to transfer data while conducting serial polling. If you need to maximize bus throughput, you should consider alternatives that do not chew up transfer speed with handshake line operations (and combine as many messages as possible, and use block data transfers with compact data formats). Moreover, an instrument's SRQ is intentionally maskable by the SRQ of any lower-addressed instrument (a fact that we frequently forget when assigning addresses to equipment without regard to SRQ prioritization; you can get bitten badly by a low-addressed instrument that doesn't have code to service its SRQs). But by all means, if you have the bus capacity, SRQ is valuable. I advise against using it unless you really know exactly what each instrument is doing and how all the bus riders work together. Similarly, I usually advise against transferring CIC. The maintainers you pass your code off to may not be experts in 488.
And really, NO: the SRQ is a service request, and it gets priority if you are waiting for it. When using the SRQ line, the assumption to make is "I don't want to do anything on the bus until... (event upon which to raise SRQ)."
I do not believe the buffers are resizable in most GPIB controllers. A typical controller uses SRAM as a FIFO, so the buffer is in hardware. Some controllers may vary from this approach; I do not know.
07-13-2009 03:08 PM
Well, I haven't used SRQs unless forced to by certain instruments. In 90% of my cases I use a write/read combination with VISA locking, which should only lock the bus for the specific VISA address. I haven't tried passing control with CIC and Pass Control calls, since I have tried to have simple one-controller setups.
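For anyone following along, the write/read-with-locking pattern I mean looks roughly like this in the NI-VISA C API (a minimal sketch; the resource name, timeout, and command are hypothetical):

#include <visa.h>
#include <stdio.h>

int main(void)
{
    ViSession rm, vi;
    ViUInt32 ret;
    char cmd[] = "READ?\n";
    char resp[256];

    viOpenDefaultRM(&rm);
    viOpen(rm, "GPIB0::5::INSTR", VI_NULL, VI_NULL, &vi);  /* hypothetical resource */

    /* hold an exclusive lock only for the paired write/read transaction */
    if (viLock(vi, VI_EXCLUSIVE_LOCK, 2000, VI_NULL, VI_NULL) >= VI_SUCCESS) {
        viWrite(vi, (ViBuf)cmd, sizeof cmd - 1, &ret);
        viRead(vi, (ViBuf)resp, sizeof resp - 1, &ret);
        resp[ret] = '\0';
        printf("response: %s\n", resp);
        viUnlock(vi);
    }

    viClose(vi);
    viClose(rm);
    return 0;
}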
When we are talking about buffers, I guess we need to make clear the difference between the various places data is buffered. I was talking about the VISA Set Buffer, which I assume is in the VISA code. There are at least 3 levels where buffering could occur: the top level is the VISA code in user space, the second is in the kernel extension in kernel space, and the third is on the hardware of the PCI board (or other device) itself. I was assuming that the sync/async difference was all in the user-space API calls.
Originally I assumed that the user-space code for async would send the data to the kernel driver, set up a completion callback, and then return to LabVIEW. Thus LabVIEW could do other things with that thread. When the callback came, any error codes or returned data would be passed back to LabVIEW. It turns out that instead of returning, VISA uses a spin where it continually polls the driver for completion. Either way, the data is transferred to some buffer in user space, which should be settable.
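The callback model I expected would look something like this against NI-488.2. (Just a sketch of the idea: ibnotify does exist in NI-488.2, but as far as I know not on every platform, and it is clearly not what the LabVIEW VISA layer is doing here on the Mac.)

#include <ni488.h>
#include <stdio.h>

/* invoked by the driver when the async transfer completes or times out
   (on Windows this callback must be declared __stdcall) */
int onComplete(int ud, int sta, int err, long cnt, void *refData)
{
    if (sta & CMPL)
        printf("transfer done, %ld bytes\n", cnt);
    return 0;   /* 0 = do not rearm the notification */
}

int main(void)
{
    char cmd[] = "*IDN?";
    int ud = ibdev(0, 5, 0, T10s, 1, 0);   /* hypothetical device */

    ibwrta(ud, cmd, sizeof cmd - 1);       /* async write, returns at once */
    ibnotify(ud, CMPL | TIMO, onComplete, NULL);

    /* ... this thread is now free to do other work ... */

    getchar();          /* keep the process alive for the demo */
    ibonl(ud, 0);
    return 0;
}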
The error messages that started this thread should not be due to filled driver-level buffers. That would not be a VISA error but something deep in ni488.kext or nipal.kext.
07-13-2009 04:58 PM
That is correct! VISA is not the source of the error; it is reporting the real condition elsewhere that prevents timely completion of the function.
As an aside, you only listed the PC-side areas where buffers exist. In the instruments there are generally input buffers and a command queue that add delays to the execution of a sent command, beyond the base execution time of the function referred to in the command.
07-20-2009 07:56 AM
@Jeff Bohrer wrote:
It is counterintuitive, but async calls return when the data leaves the buffer (so execution time is variable for a given size transfer, dependent on how much data was previously in the buffer; the execution time is asynchronous). Synchronous calls dump data to the buffer and return (so execution time for a given size transfer is always the same, or synchronous). Sync calls leave data in the buffer when they return. Async calls require the buffer to be emptied before they return (unless they time out).
I am going to disagree with what you have stated regarding synchronous and asynchronous transfers. These actually mean different things depending on whether you are calling NI-VISA through text-based languages or through LabVIEW. For starters, NI-VISA ultimately ends up calling into a lower-level driver, so absolute behavior depends on the behavior of that driver. For most read/write transfers NI-VISA does not have its own internal buffer, so for the sake of this discussion I will explain how it works with NI-488.2. Things can behave differently with serial, but as I said, it depends on the specific serial driver in use.
Outside of LabVIEW:
Synchronous calls will block until the transfer has actually completed all the way out to the instrument. The call will not return as soon as the data has made it to the buffer. Once the call returns the data has already made it across the bus.
Asynchronous calls will return immediately after registering the request, returning control back to your application, and will set the CMPL bit when the transfer has gone all the way across the bus. This will allow your application to continue doing something else with the thread, and you need to watch for CMPL (usually through ibwait or ibnotify) in order to know when the transfer is complete.
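In C against NI-488.2, that asynchronous pattern looks roughly like this (a minimal sketch; the board and address are hypothetical):

#include <ni488.h>
#include <stdio.h>

int main(void)
{
    char cmd[] = "*IDN?";
    int ud = ibdev(0, 5, 0, T10s, 1, 0);   /* hypothetical device */

    ibwrta(ud, cmd, sizeof cmd - 1);   /* returns immediately after registering the request */

    /* ... the application is free to do other work here ... */

    ibwait(ud, CMPL | TIMO);   /* block until the transfer completes or times out */
    if (ibsta & ERR)
        printf("transfer failed, iberr = %d\n", iberr);
    else
        printf("transfer complete, %ld bytes sent\n", ibcntl);

    ibonl(ud, 0);
    return 0;
}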
Inside of LabVIEW:
Synchronous calls behave exactly as they do outside of LabVIEW. Since the call blocks until complete, the number of simultaneous transfers is limited by the threads available in your LabVIEW execution pool.
Asynchronous calls are wrapped by a layer in LabVIEW which makes them behave synchronously from the perspective of the user. The original call still returns immediately, but the driver is then polled to find out when the transfer has been completed. This allows the threads to be used elsewhere in the system, but the method used causes the inefficiency that sth mentioned, which is excessive CPU usage due to the polling. With Async calls in LabVIEW you can have more transfers going than your number of threads, but overall performance can go down due to the processor overhead.
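Conceptually, the wrapper behaves something like this sketch (hypothetical code to illustrate the polling overhead, not the actual LabVIEW implementation):

#include <ni488.h>
#include <unistd.h>   /* usleep */

/* what the LabVIEW async wrapper does, in spirit: start the transfer,
   then repeatedly poll the driver until it reports completion */
void asyncWriteAndPoll(int ud, char *buf, long n)
{
    ibwrta(ud, buf, n);      /* returns immediately */
    for (;;) {
        ibwait(ud, 0);       /* mask 0: return the current status at once */
        if (ibsta & (CMPL | TIMO))
            break;           /* transfer finished or timed out */
        usleep(1000);        /* these repeated wakeups are the CPU cost */
    }
}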
-Jason S.
07-20-2009 10:01 PM
Jason, thanks for the clarification. It leaves me with at least 2 questions for now.
1. Inside LV, there is a VISA primitive "Set Buffer Size" that allows setting the input, output, or both buffer sizes. The behavior is different on different platforms (drivers) for the serial port. Where is this buffer that is getting resized: in the VISA code, in the kernel/driver code, or in LabVIEW? (See the sketch at the end of this post for the call I have in mind.)
2. Given the description and the inefficiencies that you outline, what is the downside of allocating a bunch of threads to LV, at least a lot more than any possible number of simultaneous I/O transfers, and then doing everything synchronously? The NI utility tops out at 8 threads per execution system, but why not 100? Even if each thread takes 100 Kbytes (which is a lot), this is still only 10 MB. This can easily be set by messing with the preferences file, which will accept any number.
At the moment I have topped out my system at 8 threads per execution system, since that is easy and sort of sanctioned by the LV utility. But I am more than willing to set it to 100 if that just has a memory tradeoff. Or rather about 20, which should be more than any conceivable number of I/O calls that I am doing.
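For question 1, I assume the primitive maps to the viSetBuf call in the VISA C API; here is a minimal sketch of what I mean (the resource name and size are hypothetical):

#include <visa.h>

int main(void)
{
    ViSession rm, vi;

    viOpenDefaultRM(&rm);
    viOpen(rm, "ASRL1::INSTR", VI_NULL, VI_NULL, &vi);   /* hypothetical serial resource */

    /* resize the low-level I/O buffers to 32 KB each; where this memory
       actually lives (VISA user space, kernel driver, or hardware) is
       exactly the open question above */
    viSetBuf(vi, VI_IO_IN_BUF | VI_IO_OUT_BUF, 32768);

    viClose(vi);
    viClose(rm);
    return 0;
}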