Help with inline processing for Memory Optimization

SeanDonner · ‎06-02-2009

Hello all. I have an embedded PXI system whose sole purpose is to gather digital data. I've been tasked to see just how much data we can gather on our PXI-8106 Real-Time controller before we run out of our 2 GB of memory.

The digital data is captured by a PXI FPGA card and DMA'd up to the Real-Time process running on the controller. The storage for the data on the controller uses a functional global that is pre-allocated before the test begins, to maintain determinism and prevent jitter. Each 32-bit digital word that the FPGA captures has a 32-bit word counter and a 32-bit timestamp attached to it before being sent up through the DMA channel. Once the test is complete, the large "compressed data" array is de-interlaced into three separate arrays (word count, data, timetag) and wrapped up in a cluster; this is where I see a problem. After I reformat the compressed data into its "cluster of arrays", I have doubled the amount of allocated memory, even though I no longer need the "compressed data" array. I was hoping somebody could offer some help on how to inline this conversion before storing the data, so that only the final format of the data is left in memory, cutting my memory needs in half and thus doubling the amount of data I can gather. We are stuck with LabVIEW 8.2, so I don't think we have access to the fancy memory deallocation VIs that I've read about.
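To put rough numbers on the "double the memory" problem, here is a minimal Python sketch (not LabVIEW) of the arithmetic. It assumes each sample is exactly three 32-bit values, as described above; the function name `max_samples` and the idea that the full 2 GB is usable are illustrative assumptions only, since the RT OS and application take their share.

```python
BYTES_PER_WORD = 4      # every field is a 32-bit value
WORDS_PER_SAMPLE = 3    # word counter, data word, timestamp


def max_samples(budget_bytes: int, copies: int = 1) -> int:
    """Samples that fit when `copies` full copies of the data coexist in memory."""
    bytes_per_sample = BYTES_PER_WORD * WORDS_PER_SAMPLE * copies
    return budget_bytes // bytes_per_sample


# Hypothetical budget: the whole 2 GB, which a real system will not leave free.
budget = 2 * 1024**3
with_extra_copy = max_samples(budget, copies=2)  # compressed array + cluster of arrays
final_only = max_samples(budget, copies=1)       # final format only
print(with_extra_copy, final_only)
```

Whatever the real budget turns out to be, dropping the second copy doubles the sample count, which is the whole point of the question.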

Here is the functional global used to store the "compressed data" that we get back every time we do a DMA Read. This functional global has three methods: clear data, add data, and read data.

Here is the data conversion VI that converts the compressed data into its final form; ready to be TCP'd up to the host computer. This VI is passed the "CD array" from the "Read Data" case of the functional global above.
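For readers without the block diagram in front of them, the conversion can be sketched in Python roughly as follows. This is only an analogy to the LabVIEW VI: the field order inside each triple (word count, data, timetag) and the name `deinterleave` are assumptions, and `array("I")` is assumed to be 32 bits wide on the machine running it.

```python
from array import array


def deinterleave(cd: array) -> tuple[array, array, array]:
    """Split [count0, data0, time0, count1, ...] into three arrays.

    Each extended slice builds a new array, so the interleaved input and the
    three outputs exist side by side until the caller releases `cd`.
    """
    if len(cd) % 3 != 0:
        raise ValueError("compressed data length must be a multiple of 3")
    return cd[0::3], cd[1::3], cd[2::3]


# Two made-up samples: (count, data, timetag)
cd = array("I", [0, 0xAAAA, 100, 1, 0xBBBB, 105])
word_count, data, timetag = deinterleave(cd)
```

The sketch has the same weakness as the VI: at the moment the function returns, both the interleaved array and the three output arrays are alive, which is where the doubled allocation comes from.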

Thanks in advance for your help.

RavensFan · ‎06-02-2009

Do you really need a cluster of 3 arrays as your end result data structure? Why not just go with the 3 arrays?

Rather than a cluster of 3 arrays, why not make it a 1-D array of the cluster?

Why not do all of your deinterlacing inside your functional global variable to create whatever final data structure makes the most sense? That way you only maintain one copy of the final large data structure rather than a copy of the original, a copy of the final, and copies of the intermediate data structures.

SeanDonner · ‎06-03-2009

Ravens Fan wrote:
Do you really need a cluster of 3 arrays as your end result data structure? Why not just go with the 3 arrays?

Rather than a cluster of 3 arrays, why not make it a 1-D array of the cluster?

The final output decision of a cluster of 3 arrays was made long ago (3 years, IIRC). Immediately after the de-interlacing, this cluster is flattened to a string and then sent to the host PC via TCP/IP. Wrapping the arrays in a cluster made this very easy to do. At this point, this format is unfortunately set in stone for all intents and purposes, as changing it would require a rewrite of some upstream APIs in released code that expect it. To go down this path, I would have to prove that changing the output format is the only way to fix this memory copy problem. I don't believe this is the case, is it?

Why not do all of your deinterlacing inside your functional global variable to create whatever final data structure makes the most sense? That way you only maintain one copy of the final large data structure rather than a copy of the original, a copy of the final, and copies of the intermediate data structures.

On the way home I was thinking about possible solutions, and that is one I thought of and wrote down to test tomorrow. Just to be sure we are on the same page: I was thinking of altering the "Read Data" case by tapping off the compressed data wire to the de-interlacing VI and then using a cluster indicator as the output from the SubVI. I'm hoping this would prevent the double copy.

If not, my other idea was to de-interlace at the very beginning of the functional global, before the data even enters the case structure. I would have to maintain 3 separate arrays, each 1/3 the size of the current compressed data array, and then in the "Read Data" case I would simply wrap up the 3 arrays in a cluster.

I hope one of these two ideas does the trick, otherwise I'm at a loss on how to do this and still keep a cluster of arrays as the output data structure.
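The second idea (de-interlace each DMA block as it arrives, into three pre-allocated arrays) can be sketched in Python as below. This is a hypothetical stand-in for the functional global, not its actual implementation: the class name `CaptureStore`, its method names, and the (word count, data, timetag) ordering inside each block are all assumptions made for illustration.

```python
from array import array


class CaptureStore:
    """Hypothetical analogue of the functional global, holding three arrays."""

    def __init__(self, capacity: int) -> None:
        # Pre-allocate once, before the test starts, so add() never grows anything.
        self.word_count = array("I", [0]) * capacity
        self.data = array("I", [0]) * capacity
        self.timetag = array("I", [0]) * capacity
        self.used = 0

    def clear(self) -> None:
        self.used = 0

    def add(self, block: array) -> None:
        """De-interleave one DMA block into the pre-allocated arrays."""
        n = len(block) // 3
        if len(block) % 3 != 0 or self.used + n > len(self.data):
            raise ValueError("bad block length or store is full")
        end = self.used + n
        # Same-length slice assignment overwrites elements without resizing.
        self.word_count[self.used:end] = block[0::3]
        self.data[self.used:end] = block[1::3]
        self.timetag[self.used:end] = block[2::3]
        self.used = end

    def read(self) -> tuple[memoryview, memoryview, memoryview]:
        """Return views of the filled part of each array (no copy is made)."""
        u = self.used
        return (memoryview(self.word_count)[:u],
                memoryview(self.data)[:u],
                memoryview(self.timetag)[:u])


store = CaptureStore(capacity=4)
store.add(array("I", [0, 0xAAAA, 100, 1, 0xBBBB, 105]))
counts, words, times = store.read()
```

Only one full-size copy of the data ever exists here, but the de-interleaving work moves into every `add` call, so the cost is paid during acquisition rather than at the end.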

Ben · ‎06-03-2009

That may help, along with taking the next step of putting the logic that converts and transmits the cluster in that same state, playing "chase the dots" as you go.

Another approach is to convert the AE (Action Engine) over to use the final cluster format and take advantage of the in-place operators (were they available in 8.2? I think so).

Have fun,

Ben


RavensFan · ‎06-03-2009

SiegeX wrote:

If not, my other idea was to de-interlace at the very beginning of the functional global, before the data even enters the case structure. I would have to maintain 3 separate arrays, each 1/3 the size of the current compressed data array, and then in the "Read Data" case I would simply wrap up the 3 arrays in a cluster.

This would be my recommendation.

In case you don't know, go to Tools/Profile/Show Buffer Allocations. That will show wherever a data copy is being made with a small black dot on your block diagram.

If you still don't have it optimized as much as you want, post your latest VI with a small set of sample data saved as defaults. I am by no means an expert in predicting how to optimize memory allocations, but there are several experts in the forums who are "wizards" at this.

SeanDonner · ‎06-09-2009

As an update, I have tried both methods mentioned above. First I tried the cluster-of-arrays conversion in the "Read Data" state of the AE, and that still allocated double the memory. I then tried de-interlacing outside the case structure of the AE, maintaining three separate arrays, each 1/3 the original size, and then simply wrapping them up into a cluster in the "Read Data" state. Not only did memory use not drop considerably, but the CPU load on the 8106 progressively edged up to 100% after about 20 seconds; I guess de-interlacing is more CPU intensive than I gave it credit for. Also, this was with only 1 of 10 channels going, so that idea is off the table.

It appears that, without any other ideas, the only solution is to offload the data formatting to the upstream host PC (Linux). Anything short of flattening the array to a string is just going to hose us.
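If the formatting does move to the host, the de-interlacing there could look something like this minimal Python sketch. It rests on several assumptions not stated above: that the payload is just the raw interleaved 32-bit words with any length header already stripped, that the words are big-endian (the default byte order of LabVIEW's Flatten To String), and that each triple is ordered word count, data, timetag. The name `split_payload` is made up for illustration.

```python
import struct


def split_payload(payload: bytes) -> tuple[list[int], list[int], list[int]]:
    """Split raw interleaved big-endian u32 words into three lists."""
    if len(payload) % 12 != 0:
        raise ValueError("payload must hold whole 12-byte samples")
    n_words = len(payload) // 4
    # ">" = big-endian, "I" = unsigned 32-bit; assumed wire format.
    words = struct.unpack(f">{n_words}I", payload)
    return list(words[0::3]), list(words[1::3]), list(words[2::3])


# Two made-up samples packed the way the controller is assumed to send them.
payload = struct.pack(">6I", 0, 0xAAAA, 100, 1, 0xBBBB, 105)
word_count, data, timetag = split_payload(payload)
```

The host has far more memory and CPU headroom than the RT controller, so the extra copy made here matters much less than it does on the 8106.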

P.S. 8.2 does not seem to have any of the in-place memory primitives sans 'Replace Array Element'.
