10-16-2015 06:36 AM
Hi,
I am getting AI data from an FPGA over a DMA FIFO in an interleaved format, so the order of elements is AI0, AI1, ..., AIn, AI0, AI1, and so on. For my application I need that data in the usual DAQmx format (channels in rows, samples in columns). So I basically want to replace a DAQmx Read with my DMA FIFO read (following the replacement of a PXIe card with 2x NI 9223 modules in an MXI chassis).
I came up with a solution using Decimate Array and then building those arrays into a 2D array. However, it causes a lot of buffer allocations and it is not scalable.
The example generates data for 8 channels and puts 7200 samples per channel into a 1D array (8x AI, 3600 triggers per rev, 2 revs per combustion cycle = 8x 7200 samples to read).
As no new data is generated I think it should be possible to convert the array without allocating a new buffer, right?
On top, a scalable approach to programmatically set the number of rows (channels) would be really nice.
Lukas
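For readers more familiar with array semantics outside LabVIEW, the layout conversion being asked for can be illustrated in NumPy (a hypothetical Python analogue, not LabVIEW code; the channel and sample counts are the ones from the example above):

```python
import numpy as np

n_channels = 8
n_samples = 7200  # 3600 triggers/rev * 2 revs per combustion cycle

# Interleaved FIFO stream: AI0, AI1, ..., AI7, AI0, AI1, ...
interleaved = np.arange(n_channels * n_samples)

# Deinterleave into the DAQmx-style layout:
# channels in rows, samples in columns.
daqmx_style = interleaved.reshape(n_samples, n_channels).T

# Row k now holds every 8th element starting at offset k,
# i.e. exactly what decimating the stream would give for channel k.
assert np.array_equal(daqmx_style[0], interleaved[0::n_channels])
```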
10-16-2015 06:46 AM - edited 10-16-2015 06:52 AM
Hello Lukas,
look at "Reshape Array". This is what you are looking for.
10-16-2015 06:47 AM
10-16-2015 08:22 AM
Hi Dave and Gerd,
thanks for the fast response.
Reshape does something different though; compare the resulting arrays (decimate and build on top, reshape as in Gerd's example on the bottom):
If I change the inputs to the Reshape function and then transpose the resulting array, I get the right result, but the operation takes around 5x longer to complete than decimate-and-build. Reshape without the transpose is faster though.
I attached a new version of the example with all three approaches in a disable structure.
Looking at memory usage, there is a buffer allocation for both the Reshape and the Transpose function. I think this is because the dimensions of the array change, even though the number of elements stays the same. Is there a better way to do this?
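The difference described here can be reproduced in NumPy terms (an illustrative sketch, not the LabVIEW code from the thread): reshaping the interleaved stream directly to (channels x samples) scans it row by row and mixes the channels, whereas reshaping to (samples x channels) and then transposing gives the intended result.

```python
import numpy as np

# Tiny interleaved stream: 3 channels, 4 samples each
# (AI0, AI1, AI2, AI0, AI1, AI2, ...)
interleaved = np.arange(12)
n_ch, n_samp = 3, 4

# Plain reshape to (channels, samples): rows are NOT channels;
# row 0 mixes all three channels.
wrong = interleaved.reshape(n_ch, n_samp)

# Swapped dimensions plus a transpose: row k really is channel k.
right = interleaved.reshape(n_samp, n_ch).T
```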
10-16-2015 09:00 AM
Hello Lukas,
it is best to avoid constantly allocating new memory. My solution is to use "Initialize Array" and then replace the elements. The performance results are shown in the screenshot below.
As you can see, the approach that replaces the elements in two nested for loops is three times slower than the single-loop approach. There you can use autoindexing, and the fast Quotient & Remainder operation fits our needs very well.
For some reason I'm not able to upload a VI, so here's the BD
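The single-loop idea translates roughly as follows (a NumPy/Python sketch of the preallocate-and-replace pattern, not the actual block diagram):

```python
import numpy as np

n_ch, n_samp = 3, 4
interleaved = np.arange(n_ch * n_samp)

# Preallocate the output once (like Initialize Array), then
# replace elements instead of building new arrays.
out = np.empty((n_ch, n_samp), dtype=interleaved.dtype)

# One loop over the autoindexed input; Quotient & Remainder maps
# flat index i to (sample = i div n_ch, channel = i mod n_ch).
for i, v in enumerate(interleaved):
    sample, channel = divmod(i, n_ch)
    out[channel, sample] = v
```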
10-16-2015 04:18 PM
daveTW, strange benchmark results...
For me, transpose and reshape are much faster. The first run (memory allocation) is the same; after that, reshape & transpose is 3-10 times faster.
LV2011, both front panels closed during the test, array size reset before the test (to ensure the same starting conditions).
10-17-2015 01:59 AM
That's interesting, I'll have to look at that on Monday!
I got around 200 µs for the decimate-and-build-into-2D version; reshape and transpose was always over 1000 µs.
10-17-2015 10:32 AM
That's not a very good benchmark, because the results seem wildly variable. I also probably would not run the two subVIs in parallel.
You can get orders of magnitude faster by disabling debugging and inlining the subVIs. I've seen cases where reshape was 5x faster, or the opposite, with no real change in the code. This probably needs to be investigated with more precise benchmarking code.
10-17-2015 11:42 AM - edited 10-19-2015 04:33 PM
OK, the main flaw is in your loop code: spoon-feeding one element at a time and performing a Q&R operation that many times.
You get much faster results by replacing one column at a time.
With a more precise benchmark, I get the following on my laptop:
Your loop: 7 ms
Reshape/transpose: 3.0 ms
My loop: 3.5 ms (Edit: the original posting had a bug, reposted with correct code)
Of course this is LabVIEW 2015. No telling if it is different in your LabVIEW 2011.
(This is just a quick draft to give you some ideas. Please verify correct operation! :D)
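The column-at-a-time replacement described above can be sketched like this (again an illustrative Python analogue, not the posted VI): each consecutive block of n_ch elements in the interleaved stream is one complete scan of all channels, so a single slice assignment fills a whole column of the preallocated array.

```python
import numpy as np

n_ch, n_samp = 3, 4
interleaved = np.arange(n_ch * n_samp)

# Preallocated output, channels in rows, samples in columns.
out = np.empty((n_ch, n_samp), dtype=interleaved.dtype)

# Replace one column (one full scan of all channels) per iteration
# instead of spoon-feeding single elements through Q&R.
for s in range(n_samp):
    out[:, s] = interleaved[s * n_ch:(s + 1) * n_ch]
```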
10-17-2015 12:15 PM - edited 10-17-2015 12:16 PM
Your loop looks like a really good approach; there should be just one memory allocation, for the Initialize Array function, right (besides the input data, of course)?
I will look into that on Monday and test it on my system. I'm also curious how your benchmark works; the execution times in the chart look a lot more stable than what I measured in my first test.
Regards,
Lukas