Digital I/O


Interrupt-driven high-speed digital I/O with 256 MB onboard memory support

We are trying to build an application in which a suitable NI board reads two 13-bit I/O ports upon the occurrence of an interrupt (interrupt rate is about 100,000/second). The two 13-bit values are to be stored at a specific location in a 256 MB onboard memory according to their value. After a certain time interval (say 1 hour), the process of reading the two ports and storing data in onboard memory should pause, and the data stored in this 256 MB onboard memory should be transferred to PC (Windows XP) memory (RAM). The application software will then process it further according to our requirements.
 
We would like to know which NI board supports these features. Please also pass on any additional information. Thanks.
Message 1 of 28

Hello Pranav,

For this type of application, the devices to consider are the PCI/PXI-654x, PCI/PXI-655x, or PCI Express 653x devices.  All of these devices can acquire from more than 13 digital lines using an external sample clock at speeds much greater than 100 kHz.  The hard part is arranging these samples in memory according to their value, and deciding whether to use onboard memory or acquire directly into system memory (RAM).  The NI-654x and NI-655x are PCI or PXI devices that come in three varieties with 1, 8, or 64 Mb/channel of onboard memory, but based on my calculations, even the largest memory option will not hold enough data for an hour-long acquisition at 100 kS/s.  For example, acquiring at 100,000 bits per second per channel for 1 hour (60 sec * 60 min = 3600 sec) requires 360 Mb/channel of onboard memory.  Based on this limitation, I would recommend the PCI Express 653x device.  The PCI Express technology used by this device alleviates the need for onboard memory and allows the device to stream the acquired data directly into system memory (RAM).  However, there is no way that I know of to arrange these samples in system memory according to their value as they are acquired.  Instead, I would recommend rearranging the samples post-acquisition in your application, along with any further processing you require.
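
To make the memory limit explicit, here is the same arithmetic as a tiny Python sketch (a quick check only, not NI-specific code; the per-channel figures are the ones quoted above):

```python
# Sanity check on the onboard-memory math above.
sample_rate = 100_000                               # 100 kS/s -> 100,000 bits/s per line
seconds = 60 * 60                                   # 1-hour acquisition
needed_bits_per_channel = sample_rate * seconds     # 360,000,000 = 360 Mb/channel
largest_option_bits = 64 * 1_000_000                # largest NI-654x/655x option: 64 Mb/channel
print(needed_bits_per_channel > largest_option_bits)  # True: onboard memory alone can't hold it
```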

I hope this information helps and let us know if you have any further questions.

Travis G.
Applications Engineering
National Instruments
www.ni.com/support

 
Message 2 of 28

What is the nature of this "specific location within 256 MB memory, according to the value of the bits"?  Would this be a binning operation, where you want to count the number of occurrences of every possible 26-bit pattern?  It kinda sounds like it, because 2^26 is about 64 million values, and 256 MB lets you store a 4-byte (32-bit) count of the occurrences of each pattern.

I'd think that if your XP PC can't handle the processing on the fly, you might want to stream to disk rather than to RAM.  You're looking for a rate of about 400 kB/sec.  If you're only binning, I'd think you might be able to count on the fly.  If processing is more complex, you should certainly be able to stream to disk at that rate.

I'd think you might be able to consider a PCI-express version of the M-series 6259 board, which gives you 32-bits of hw-timed DIO along with additional static DIO, analog IO, and 2 counters.  It probably won't have as much onboard memory, but it appears that no board has enough to avoid the need to transfer data to system RAM.  The 6259 would give you much more variety for future apps, and it's even cheaper out of the box -- what's not to like?

-Kevin P.

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
Message 3 of 28
Thanks Travis and Kevin but still I need help.
 
I am sorry I did not give full and exact information about our application. Here it is in detail.
 
Two 13-bit values come from two different detectors on the above two I/O ports simultaneously, and at the same time a separate interrupt signal is generated by the detectors' electronics indicating that data is ready on the I/O ports. This interrupt is to be used to read these ports. My apologies: I said earlier that the interrupt rate is 100K/second, but that is the maximum rate of occurrence. It is not periodic with a 1/100,000-second period; it is purely random. In any given second there might be a few tens of occurrences, and in the next second it could go up to a maximum of 100K. So we need a maximum capacity of 100K/second.
 
The two 13-bit values represent the coordinates on the X and Y axes of a 3-dimensional view (x-y-z), where the Z axis represents the number of occurrences of an event at a specific coordinate. For example, suppose two occurrences/interrupts give the same pair of 13-bit values, (0FFF, 0FFF), which is (4095, 4095) in decimal. Then the count at coordinate (4095, 4095) is raised to 2; here 2 is the value of Z, the number of occurrences. Now, why is 256 MB of memory required?
 
The I/O ports are 13 bits wide, so each axis has 2^13 = 8192 points on it; we call these points channels. The full map is therefore an 8192 * 8192 = 64M-element array. If each element were one byte, a channel pair (pair of x-y coordinates) could store only up to 255 occurrences. But our application demands more, so we keep 4 bytes = 32 bits per channel pair, which allows counts up to 2^32 = 4,294,967,296. So the final equation is 8192 * 8192 * 4 bytes = 256 MB.
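
The layout described above can be sketched in a few lines of Python (illustrative only, not LabVIEW; the 8192 x 8192 x 4-byte figures are from this post):

```python
from array import array

SIZE = 1 << 13                                  # 13-bit port -> 8192 channels per axis
counts = array('I', bytes(4 * SIZE * SIZE))     # 8192*8192 32-bit counters = 256 MB, zeroed

def record_event(x, y):
    """Increment the occurrence count (Z) for detector coordinates (x, y)."""
    counts[y * SIZE + x] += 1                   # row-major flat index into the 2-D map

# Two events at (0x0FFF, 0x0FFF) = (4095, 4095) raise that channel pair's count to 2.
record_event(0x0FFF, 0x0FFF)
record_event(0x0FFF, 0x0FFF)
```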
 
The second point is why I am looking for 256 MB of onboard memory. Allocating 256 MB in PC RAM is possible, as we have 1 GB of RAM. However, in user mode on Windows XP, which is not a real-time OS, if I use the direct PC interrupt method the latencies are high and we may lose some of the data during acquisition when it arrives at a high rate on the I/O ports.
 
There is a solution for this problem: use a kernel-mode ISR and kernel-mode memory allocation, which is quite fast. This is possible, but then the PC doing the acquisition cannot be used to run any application other than data acquisition. That is not desirable, because we need to run the application software that processes this 256 MB of data on the same PC. As that task requires intensive graphics processing, there is a chance that even kernel-mode acquisition would be affected, resulting in higher latency and loss of data at the input terminals.
 
So if the digital I/O card has 256 MB of onboard memory and a small processor, the acquisition task can run in the background; when the user wants to see the data, the application software commands the DIO board to pause the acquisition while the whole 256 MB of data is transferred to computer memory. Once the data is received, the acquisition restarts and the analysis runs on the PC. This way the two processes, data acquisition and data analysis, would not interfere with each other, as they run on different processors and use different memory.
 
I do not know whether a single 256 MB array in PC memory would be able to support the above tasks.
regards
Pranav
 
Message 4 of 28
Thanks for all the helpful detail.  It appears I guessed right about using the 256 MB for "binning" (counting occurrences of) the possible 26-bit patterns.
 
I can't comment usefully about ISR latency and kernel mode because you already know more than me.  However, I rather strongly suspect that interrupt handling is quite beside the point.  The data acq tasks *should* be set up to use DMA for data transfer from board to system RAM, not interrupts.  This would normally be the default mode under DAQmx anyway unless you explicitly asked for interrupts.
 
Suggestion: use the detector's "data ready" pulse as an external sampling clock for the data acq task, and don't think in terms of interrupt handling.
 
I'm not personally aware of a board with 256 MB onboard memory, though I could very easily just be ignorant -- can anyone from NI comment?  (And not just as an excuse to call me "ignorant," ok? :P)  I'm still inclined to think it isn't needed, provided your data acq uses DMA and you structure your app with reasonable care.  This is especially true if your rate peaks at 100 kHz but has a much smaller average.
 
A "producer-consumer" design pattern using a queue for the data can be an efficient way to provide a buffer between your 256 MB array binning function and your data acq servicing function.  The essential key to queue efficiency is that when you read data from your data acq task you read it as a U32 array and send it directly into "Enqueue Element" and NO OTHER PLACE.  This allows LabVIEW to optimize under-the-hood and simply put a pointer in the queue and let the queue claim responsibility for the memory that's being pointed to.  If the wire branches somewhere else, data copies will be made of the arrays and will slow you down with new memory allocation.
 
If you use a 6259 board, I'd recommend that you wire to bits 0:12 and bits 16:28 so you have a low word and high word.  I'd also wire the remaining port bits to digital ground to guarantee that they're always 0.  Then when you get a 32-bit digital port value, it's very easy to convert the bit pattern to your X-Y coord indices.
 
One approach for your "consumer function" would go like this:
1. Read 1 array of U32's from the queue.
2. Auto-index a For Loop using this array -- each iteration of the loop will increment through the values from start to finish.
3. Inside the loop, use the LV function that splits a 32-bit u32 into a high u16 and a low u16.  I think the name is something like "Split Number".
4. Use the two u16's as indices into your pre-allocated 256 MB array.  Extract the current value at that location, increment it, then use "Replace Array Subset" to put it back at the same location.  The 256 MB array needs to come into and go out of the loop using a Shift Register, not tunnels.
5. Return to step #1 to repeat on next chunk of data.
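
Sketched in Python rather than LabVIEW (with queue.Queue standing in for the LabVIEW queue and a flat array for the 256 MB binning array -- a rough stand-in, not the actual implementation), the consumer loop above looks roughly like this:

```python
import queue
import threading
from array import array

SIZE = 1 << 13                                  # 8192 channels per axis
counts = array('I', bytes(4 * SIZE * SIZE))     # stand-in for the 256 MB binning array
q = queue.Queue()                               # producer-consumer buffer

def consumer():
    """Steps 1-5: pull one chunk, split each u32 into coords, bump that bin."""
    while True:
        chunk = q.get()                         # 1. read one array of u32 samples
        if chunk is None:                       #    (None = producer is done)
            return
        for sample in chunk:                    # 2. iterate through the chunk
            x = sample & 0x1FFF                 # 3. low word:  bits 0:12  -> X
            y = (sample >> 16) & 0x1FFF         #    high word: bits 16:28 -> Y
            counts[y * SIZE + x] += 1           # 4. increment that channel pair

t = threading.Thread(target=consumer)
t.start()
# Producer side: in the real app each chunk comes straight from the DAQ read.
q.put([0x0FFF0FFF, 0x0FFF0FFF])                 # two events at (4095, 4095)
q.put(None)                                     # signal end of acquisition
t.join()
```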
 
I'd make an educated guess that such simple-ish processing would handle more than 1 million samples/sec.  Your app seems very plausible.
 
When you try to combine this data acq program with a separate analysis program, you may need to start looking into explicitly setting vi priorities and execution systems to give precedence to the data acq program.
 
-Kevin P.
 
Message 5 of 28

Hello Kevin and Pranav,

Those are some great suggestions from Kevin for how to properly set up this type of application.  To reinforce what Kevin said, all DAQ devices will use DMA transfers by default, and have multiple DMA channels to transfer acquired data to system memory rather than using interrupts.  With the device having a direct DMA channel to system memory, no data should ever be overwritten in the onboard memory, especially at rates of 100 kHz.  If data is ever overwritten in the device's onboard memory before it can be transferred to system memory, the driver will throw an error indicating that an onboard memory buffer overflow occurred.  What especially helps avoid these overflows is using a PCI Express device like the PCIe M Series device Kevin suggested or the PCIe-653x devices, as the PCI Express bus gives the device a much wider dedicated pipe for pumping data into system memory with DMA.  Because of this 'double buffered' type of acquisition (a buffer in onboard memory on the device and a buffer in system RAM), you don't need a device with a full 256 MB of onboard memory to hold all samples of a finite 256 MB acquisition.  For further reference, though, the only devices with 256 MB of memory actually onboard are the high-memory option of the NI-654x Digital Waveform Generators/Analyzers, and that memory is divided into 8 MB/channel segments.

The next bottleneck to consider is actually reading the data out of system memory into your LabVIEW program (are you using LabVIEW?).  This is where system dependency and system memory allocation become factors.  Using the producer/consumer architecture Kevin suggested is the best way to separate your read-from-buffer operations from your data processing and file I/O operations, and to attach priorities to each.  In addition, you may want to look into the LabVIEW Real-Time development environment, although I'm not sure it is really required.  This tool allows you to create your program on a development machine and deploy it to a system running a non-Windows real-time operating system.  That lets the code run in kernel mode as you indicated, gives you more control over execution priorities, and avoids much of the jitter and system dependency involved in running your code on Windows.

I hope this information helps,

Travis G.
Applications Engineering
National Instruments
www.ni.com/support

 
Message 6 of 28
Hello Kevin and Travis,
 
Thanks for all the useful information. I am working on it with the help of your valuable tips. No doubt some points may require further discussion; in that case I will get back to you. All the interactions were informative, thanks once again.
 
-Pranav
Message 7 of 28

Hello Kevin and Travis

I hope you will hear me out once again. I did my 'homework' after the valuable information from our previous discussions, but there are still some hurdles.

Up to now I mainly focused on the acquisition-speed problem, that is, whether the DAQ card will be able to do this task. However, after studying all the recommended cards (PCI-6533, 6534, PCIe-6536, 6259), some other problems have arisen apart from the one above. They are:

1) Our ADCs need at least one feedback signal (acknowledgement) from the DAQ card after every read of the 26-bit data, indicating that the data has been accepted. Only after getting this signal will the ADCs accept another signal from the detector. This rules out using the 6259 in the pattern-generation mode you probably had in mind. However, I feel we may be able to use Level-ACK mode (what is your comment?) after inserting a small additional hardware circuit between the card and the ADC. Unfortunately, the 6259 does not support any of the six available handshaking modes, including Level-ACK. So we have three alternatives left: the 6533, 6534, and 6536.

2) The other problem is that our application needs information about the dead time along with the total acquisition period (i.e., real time). The dead time is the total time during which the ADC remains blind to the incoming signals. It arises from the finite time (on the order of a few microseconds) required by the whole setup (ADC, DAQ card, computer) between each read of data. For this, our ADCs provide a separate digital output line called 'composite deadtime' (CDT), which we used to gate (turn on) an onboard timer (8253) in our previous 256 * 256-resolution application. Unfortunately, the 6533, 6534, and 6536 have no onboard timer or corresponding hardware input lines. I thought about a software-based internal timer, but that seems quite difficult, since all 32-bit samples are dumped into memory by DMA without any processing. Again, some additional hardware may be required!! I am thinking about it (possibly using the 'change direction' feature on the 32 DIO lines). Any suggestions?

3) In addition to the above, our ADCs also want (so demanding!!) one more digital line for START/STOP of the acquisition, asserted at the beginning and at the end. As there are no separate digital lines left, it again seems that some additional hardware plus the 'change direction' feature might be required (any new ideas for this?).

4) This is about acquisition speed. I am still not clear!! I am torn between the PCI-6534 and the PCIe-6536. But before I say anything more, I would like to make sure the data-transfer process I have in mind is the same as the one you suggested. That is: a bunch of 32-bit samples is read and transferred to computer memory (an array) by DMA. This array (call it A1) is different from the final 256 MB array (call it A2). A1 can be any size, but larger is beneficial. The DMA process dumps 32-bit (actually 26-bit) samples into A1, each of which is effectively the address of a specific location in A2. This is the producer function. Meanwhile, the application software reads the 32-bit samples one by one and increments the value at the A2 location each sample points to. This is the consumer function. (I am keen to know which function should have priority.)

Now the problem is as follows. The 6534 has 64 MB of onboard RAM and a transfer rate of about 20 MB/s, whereas the PCIe-6536 has no onboard RAM but a transfer rate of 100 MB/s (I hope these rates are correct). Our application's maximum input rate is 100 kHz * 4 bytes (32 bits) = 400 kB/s. So, compared to the 6534, the PCIe-6536's producer function will engage the array A1 for a shorter period while dumping data into it. On the other hand, if the PCI bus is not free for a DMA transfer, the PCIe-6536 will have to wait until it becomes available, whereas the 6534 can presumably buffer in its onboard RAM and be ready for the next transfer. So which is the better choice, the 6534 or the PCIe-6536? Or simply the 6533 (no onboard memory, 5 MB/s, and cheap!!)?
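
For what it's worth, a quick back-of-envelope check of those rates (the board figures are the nominal ones quoted in this thread; treat them as approximate):

```python
# Worst-case input rate vs. the quoted bus transfer rates.
sample_rate = 100_000                            # events/s, absolute worst case
bytes_per_sample = 4                             # each 26-bit sample padded to a u32
required = sample_rate * bytes_per_sample        # 400,000 B/s = 400 kB/s

board_rates = {"PCI-6533": 5_000_000,            # ~5 MB/s
               "PCI-6534": 20_000_000,           # ~20 MB/s
               "PCIe-6536": 100_000_000}         # ~100 MB/s
headroom = {name: rate / required for name, rate in board_rates.items()}
# Even the slowest board has 12.5x headroom over the worst-case input rate.
```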

Thanks and warm wishes

Pranav

Message 8 of 28

Hello Pranav,

The more I learn about your application, the more I think the Digital Waveform Generator/Analyzer devices we've discussed are probably not the best fit.  Typically, these devices are used to generate or acquire hardware-timed digital waveforms for testing memory devices, image sensors, and display panels.  Your application sounds more like interfacing with and controlling a peripheral chip (your 26-bit ADC front end) using a customized messaging protocol.  All of the NI-653x devices have some basic capabilities for interfacing with peripheral chips using the 8255 handshaking protocol.  This protocol transfers data between the NI-653x device and the peripheral digital chip by handshaking REQ and ACK lines, and may or may not use a clock signal.  The details of this handshaking protocol are discussed in the following document, and also in the help manual for each device:

NI 653X User Manual

I would highly suggest studying the handshaking capabilities of these devices to see whether they are compatible with your peripheral.  However, if your peripheral chip requires a more complicated, customized handshaking protocol that the NI-653x devices do not directly support, I'm worried that you will run into their limitations and not be able to implement your application successfully.

Another option that comes to mind is to implement your own digital interface circuit on an FPGA-based device using the LabVIEW FPGA module.  This involves developing FPGA code within LabVIEW that is then deployed and executed on an FPGA device like a National Instruments R-Series reconfigurable I/O device.  Basically, this is like designing and developing your own customized digital communication chip to interface with your peripheral, but the LabVIEW FPGA module's graphical programming environment makes it easier than it sounds.  To get an idea of the capabilities of the LabVIEW FPGA module, take a look at the following tutorial:

Developing Digital Communication Interfaces with LabVIEW FPGA

I hope I haven't confused your choices even more.  Based on your initial post, the task seemed easily accomplished with the NI-653x Digital Waveform Generator/Analyzer devices; however, after finding out all the details of your communication protocol, the application no longer looks so trivial.

Travis Gorkin
Applications Engineering
National Instruments
Message 9 of 28

Dear Travis,

Thanks for the information about FPGA-based data acquisition. I am considering it, but at the same time I have gone back to thinking about the PCIe-6259 for our application, just to keep things simple and straightforward at this stage. I can look at an FPGA-based system if all the other alternatives are closed off: the FPGA route needs additional software modules (LabVIEW FPGA, deployment, etc.) which we do not have at present, and it would be somewhat more complex and take more time to implement.

At present, I am trying to find the most suitable DAQ card for our application, and once again I feel the PCIe-6259 may be useful. I have arrived at a possible strategy for acquiring the data with the PCIe-6259, described below. I need your help to determine whether this strategy is feasible.

 

Three main tasks are to be performed in our application.

 

(1) Acquisition of data on the high-speed 32-line DIO port

For this task, the plan is to use the 'change detection' feature on one of the 32 DIO lines. That line will be connected to the ADC's 'READY' signal, which indicates that data is available to read. From the PCIe-6259 user manual, a change detection event is generated on either the rising or falling edge of this signal, and that event can be used as the DI Sample Clock to acquire the data on the DIO lines. In addition, the change detection event can be routed to one of the PFI lines. So a single change detection event should trigger two things: generation of a DI Sample Clock edge, and a pulse (or change of digital level) on a PFI line. The DI Sample Clock will be used to read the 26-bit data on D0-D25, and the signal on the PFI line will be used to reset the ADC, making it ready to detect the next event from the detector.

I need your opinion on whether this is possible. That is, can a single change detection event generate both of these signals (a DI Sample Clock edge and a pulse on a PFI line)? I would also like to know whether the change detection event triggers the DI Sample Clock only once, so that only ONE 32-bit sample (actually 26 bits) is read per event. In other words, the change detection event should not create a continuous clock that keeps reading the data lines even when no data is available.

If this is possible, I will be able to acquire data even without handshaking.

 

(2) Controlling (gating on/off) an onboard counter with an external digital signal ('CDT') from the ADC

As I mentioned earlier, I need to accumulate the dead time during each acquisition period. For this, a separate digital line called 'cdt' is available from the ADC, which I would like to connect to the gate of one of the counters on the PCIe-6259. I also need to read and clear this counter through software at the beginning of, during, and at the end of the acquisition. I expect this is quite possible with the PCIe-6259.

 

(3) Starting and stopping the acquisition (switching the ADC on/off)

Our ADC needs one slow-speed (static I/O) digital line that can be switched ON and OFF, to activate or deactivate the ADC at the beginning and end of the acquisition. The PCIe-6259 has several PFI lines; I would like to use one of them in static I/O mode for this purpose.

 

For your easy reference, I have prepared a block diagram and a summary of the sequence of actions described above. Please see the next post (I am not able to attach it here due to the file size limit).

Thanks and regards

Pranav

Message 10 of 28