01-05-2013 02:04 PM - edited 01-05-2013 06:20 PM
wetland wrote:You can change the numbers. I think the zeros may be the problem for you. What I have is an answer of 5 to 12. I do not want an answer of 5 to 12, 0, 0, 0. To me the zeros would represent a pixel on row zero, and I cannot have code that artificially adds a zero after an image has been analyzed. I am happy that the code I have provides the answer I want. What I am trying to obtain is some of your experience with the in place structure and the DVR, and when it is best to use them. Your answers to the other parts of my reply have helped.
You are talking about at least three different issues and I have problems following you on my own pogo stick.
LabVIEW is very good at memory management, but sometimes when a wire is branched or a subVI is entered, a copy of the data needs to be made in memory. A DVR guarantees that all parts of the program operate on the same data in memory and no copy is ever made (except for indicators hooked up to it, etc.).
wetland wrote:OK, when using highlight execution, the first code that provides an answer is the numerical array code. This is important for me to know because images contain a large volume of data. But I am also interested in what the best code is for large programs, because a simple VI performs better than a large program. For me it is clear that for a simple VI the numerical array code is best. However, I would like to know whether DVR and in place structure code would be far better to use for large programs, e.g. provide better performance.
Highlight execution provides no useful information about benchmarking or execution order. Consider the following code:
Under regular execution, the top loop is many orders of magnitude faster than the bottom loop. Under highlight execution, the bottom loop is many orders of magnitude faster than the top loop. In one scenario the upper indicator is written first, and in the other scenario the bottom one. There is no correlation.
When benchmarking, you should also disable debugging and Bob's your uncle.
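The same principle applies in any language: measure with repeated timed runs under normal execution, never by watching a debugger step through the code. As a purely illustrative sketch in Python (the loops above are LabVIEW code; this just shows the measurement idea):

```python
import timeit

# Benchmark by timing many repetitions under normal execution;
# never judge relative speed by watching debug/highlight stepping.
setup = "data = list(range(10_000))"

# Two alternative implementations of the same result:
t_loop = timeit.timeit("s = 0\nfor x in data: s += x", setup=setup, number=200)
t_sum = timeit.timeit("sum(data)", setup=setup, number=200)

print(f"explicit loop: {t_loop:.4f} s, built-in sum: {t_sum:.4f} s")
```

Under a debugger both versions would appear comparably slow; only the timed measurement reveals the real difference.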
What's good for a small program is typically even better for a large program. Why do you doubt this?
@wetland wrote:
What do you mean by IPE protects from concurrent access?
Since the data in a DVR is potentially accessed concurrently by many different code parts, it needs to be protected that at any given moment only one code element has exclusive access. Imagine two CPUs loading it in their processor cache, making some modification, and writing it back to RAM. If that were allowed to occur concurrently, you'll end up with a random and partial result. Whatever writes last wins. Thus, you can only access/modify the DVR data using an in place element frame (IPE) with DVR terminals. While the execution is inside this IPE, that code fragment has exclusive access to the DVR data and all other IPEs that simultaneously try to access the DVR data need to wait their turn. This is very important to avoid chaos.
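As an analogy in a text language (a Python sketch, purely illustrative — this is not how LabVIEW implements the DVR, but the exclusive-access behavior is the same idea): the DVR data plus its lock behave like a shared value that can only be touched while holding a mutex.

```python
import threading

class DataValueRef:
    """Illustrative stand-in for a DVR: shared data guarded by a lock.

    Each access (the analogue of an IPE with DVR terminals) takes the
    lock, so only one code fragment can read-modify-write at a time."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def modify(self, fn):
        # Exclusive access for the whole read-modify-write,
        # like code running inside an in place element structure.
        with self._lock:
            self._value = fn(self._value)

    def read(self):
        with self._lock:
            return self._value

ref = DataValueRef(0)
threads = [
    threading.Thread(
        target=lambda: [ref.modify(lambda v: v + 1) for _ in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ref.read())  # 4000 -- no lost updates despite 4 concurrent writers
```

Without the lock, the two-CPU cache scenario described above would lose updates; with it, "whoever is inside waits their turn" and the result is deterministic.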
wetland wrote:Do you normally count the buffer allocations when you program? E.g. count dots; fewer dots, better code.
No, in 99.99% of the cases, performance is irrelevant. Only if you have a bottleneck operating on large datasets, and performance is important, you could code a few alternatives and count the dots for each. The number of allocation dots sometimes can give you a hint where improvements are possible. You still need to benchmark.
wetland wrote:Do you use DVR and in place structure in your code often?
No, because I don't deal with huge datasets. Still, I probably could use them more. They are great! 😉
Part of the reason is habit. I started LabVIEW programming way before DVRs were invented. 🙂
wetland wrote:I noticed that in LVOOP you can open a VI and all you see in the VI is input and output wires connected to an in place structure. So it seems to me that this is code I should place everywhere I can in a large program. Is this correct?
Why would you place the same code everywhere? Yes (with some notable exceptions, especially in the shipping examples :o), code found in the LabVIEW hierarchy (vi.lib, etc.) uses good coding practices and should be used as a good example.
The size of the program is irrelevant for this.
01-05-2013 11:07 PM
Hi RavensFan
You wrote:
"
Actually, highlight execution does not tell you what code will execute first under regular execution. Under regular execution, and on a PC with multiple cores, two pieces of code can execute simultaneously. But it won't look that way under highlight execution because LabVIEW needs to slow down the code and show only one wire executing at a time so that you can see it. Which piece of code it executes first can be arbitrary.
To prove this, go looking through the forums for the hundreds of posts where someone claims their code doesn't work right, giving wrong or unexpected results, while it seems to work okay in highlight execution. These people don't understand data flow and are using local variables incorrectly, causing a race condition. Yet when they run in highlight execution, the result of the race can be different because LabVIEW has scheduled and executed the code differently.
If you have a concern about what code executes first or second or last, highlight execution will not help you. You might find that something executes first on one PC, but something else executes first on another PC if LabVIEW decides to compile the code differently because of the processor architecture. If code execution order ever matters to you, then you need to write your LabVIEW code to account for that."
Thank you for a very good answer.
I now have a better understanding of highlight execution, and from now on I will only use it for troubleshooting. Yes, I have at times been forced to program on one computer and then another, only to find that the CPU specifications agreed at the beginning had been cut in half, e.g. a two-core computer reduced to one core. So I have seen my code speed up on one computer and nearly lock up on another. I have been after the DETT for years, and with LV 2012 I hope to wear it out, since my last LabVIEW was 8.6.1.
01-06-2013 12:30 AM
Hi Altenbach
(For the last two weeks over Christmas I have finally got LV 2012 working at home on my 8-core AMD computer. This may change when I get to work tomorrow, because at the moment I am using a work copy of LabVIEW to be able to finally open VIs from a number of online courses I did in the past. So if there is no response within a week, you will know I am disconnected from the forum, because I am not allowed to use it at work.)
The best note I have found on in place element structure use is: an in-place algorithm tries to reuse a memory allocation that would otherwise be repeated, once that data is no longer needed to produce a correct answer. This is going to be my guideline for judging where to use the in place structure. From this statement I can see why, in a class, you would use an in place element structure for the input and output of an execute VI.
I am going to use the Performance and Memory tool and the DETT. (At the moment I have a bug in the DETT: it currently works on projects only, instead of on individual VIs as well. I hope to do a full program repair to see whether it then works correctly.) From the review of my code's performance, for an individual VI I would use numerical array without -1.vi, and for a project I could use numerical array without -1 with a DVR and an in place element structure. The reason is that when a number of VIs are running, the code performs differently.
I still have to think through the Always Copy function. I would like examples of its use.
The note I have on the Always Copy function is that it is helpful for telling the compiler that you definitely want to make a copy of the data. This is clearer to me than what is stated in the help: use this function to control the outcome of the LabVIEW compiler's buffer allocation process.
Another note I have is that accessing data in a DVR requires the use of an in place element structure. I have also found that in the same VI you cannot use two in place element structures to access data from one DVR source at the same time; the two in place structures run serially. In place structures are blocking, to prevent data access inconsistencies. So you cannot just use in place element structures everywhere.
In the future, from what I have read, I will be using the DVR to reduce the size of the program and to further improve readability (and hopefully performance).
Ok back to your replies:
"Since the data in a DVR is potentially accessed concurrently by many different code parts, it needs to be protected that at any given moment only one code element has exclusive access. Imagine two CPUs loading it in their processor cache, making some modification, and writing it back to RAM. If that were allowed to occur concurrently, you'll end up with a random and partial result. Whatever writes last wins. Thus, you can only access/modify the DVR data using an in place element frame (IPE) with DVR terminals. While the execution is inside this IPE, that code fragment has exclusive access to the DVR data and all other IPEs that simultaneously try to access the DVR data need to wait their turn. This is very important to avoid chaos."
Buffer allocations
"No, in 99.99% of the cases, performance is irrelevant. Only if you have a bottleneck operating on large datasets, and performance is important, you could code a few alternatives and count the dots for each. The number of allocation dots sometimes can give you a hint where improvements are possible. You still need to benchmark."
Do you use DVR and in place structure in your code often?
LVOOP inplace element structure
"Why would you place the same code everywhere? Yes (with some notable exceptions, especially in the shipping examples ), code found in the LabVIEW hierarchy (vi.lib, etc.) uses good coding practices and should be used as a good example."
01-07-2013 04:00 PM
I have a relevant question here, also related to DVRs, IPEs and memory usage.
I am testing some usage of DVRs in LV-ARM code, trying to significantly reduce memory usage. With my initial technique I put all the code's operational data into a large cluster and created a DVR to point to the data. My initial thought was that I could use the DVR and IPE to read/write data from a single place in memory. However, when I run Show Buffer Allocations, it appears that buffers are allocated even inside the IPE structure. If buffers are allocated inside the IPE, then this technique will not provide any memory reduction and would actually increase memory allocation. Can one of the experts here please let me know whether buffer allocations will indeed occur for each DVR IPE referencing a cluster for read/write operations?
If a buffer allocation occurs inside a DVR IPE, then it seems to defeat the purpose of the IPE for DVRs used as a memory allocation minimization tool. Also, any suggestions on how I can keep my memory allocations in ONE place and have controlled read/write access to that location, without copies of the data being created, would be very helpful. In the code above I have a single-element boolean queue that is dequeued during write access in order to control access to the data among all the subVIs operated on by parallel loops. I realize that I can break up the cluster and make DVRs for each data element, but this increases memory usage by having many reference values allocated for passing to subVIs. My goal is to point to "All data" and operate solely on that single memory space in a highly controlled way. The LabVIEW help says, "When you run a VI, LabVIEW might or might not use those buffers to store data." This implies the buffers get allocated no matter what, but LV may not use them. How can I guarantee that LabVIEW will not allocate the buffer in the first place, if it cannot be done with the IPE?
01-07-2013 05:47 PM
What are you doing here? It looks like you have a queue inside a DVR that you're using as a mutex to lock access to an element inside the cluster. This is totally unnecessary. A DVR provides locking on its own (that's one of the reasons to use it). Only one in-place-element structure can access a particular DVR at a time. A DVR is NOT equivalent to a pointer in C.
An IPE does not magically prevent all memory allocation inside it - that would be impossible. All it does is tell the compiler that it should attempt to operate on the element "in-place," without making a copy. This applies only to elements that are specifically indexed on the border of the structure.
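The in-place-versus-copy distinction is the same one that exists in any language. As a hedged analogy in Python (illustrative only; `scale_in_place` and `scale_copy` are made-up names, not LabVIEW primitives):

```python
def scale_in_place(buf, factor):
    # Mutates the caller's buffer directly: no new allocation.
    # This is what an IPE asks the compiler to do for the
    # elements indexed on its border.
    for i in range(len(buf)):
        buf[i] *= factor
    return buf

def scale_copy(buf, factor):
    # Builds a new list: an extra buffer allocation,
    # like an ordinary dataflow operation that makes a copy.
    return [x * factor for x in buf]

data = [1, 2, 3]
out = scale_in_place(data, 10)
print(out is data)                   # True: same buffer was reused
print(scale_copy(data, 2) is data)   # False: a fresh buffer
```

The IPE only *requests* the first behavior for its bordered elements; everything else inside the structure is compiled normally.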
The code you show would be cleaner and equally if not more efficient if you used only the middle IPE, extracted only the value from the DVR, and placed a simple bundle-by-name inside it instead of the inner IPE. It's quite possible the DVR is unnecessary too, unless you are sharing it among multiple loops. If your "All Data" lives in a single loop, stored in a shift register, eliminate the DVR and structure your code to operate on the cluster in place.
01-07-2013 09:18 PM
Thanks for the response.
The queue serves to lock access, as you suggest, but it also serves as an efficient secondary mechanism to control other loop processes across multiple loops. The DVR is employed because there are multiple loops, and because I wish to clean up the source code a little with some subVIs and use subVIs to try to force memory deallocation in key places, thus minimizing buffer allocations to single DVR references in the places where I pass data to subVIs.
My initial thinking was that the DVR could be used as a reference to a single cluster of data that is passed wherever it is needed in the program, and that the IPE-DVR and IPE-unbundle/bundle could be used to force the original memory location in RAM to be used, resulting in a very discrete set of RAM utilized via a cluster accessed through the IPE-DVR. If the IPE-DVR allocates a buffer inside the IPE, then this theory is out the window, and my picture of the IPE-DVR lacks sufficient information to completely comprehend its capabilities. From the LabVIEW help I understood that the IPE would operate on the last buffer allocation on the incoming data wire without copying the data. The LV help really does not explain this all that well. Your information helps clarify things a little, though.
This is my first attempt at using the IPE. The ARM embedded target seems like a good place to use the IPE to minimize buffer allocations. Embedded ARM apps are inherently small, and it is easy to keep track of all the data when the will is there to maximize embedded capability. The IPE can definitely help reduce memory requirements, but if another buffer is allocated inside the IPE-DVR, then this particular thought experiment really loses its validity as a technique.
I am a very experienced LabVIEW programmer, but I am not sure why it is "impossible" to have some way to do what I would like to do here. I understand that LabVIEW allocates the buffers it needs for passing data to subVIs, and that the compiler makes internal decisions as to whether a buffer actually gets used, but I am not sure why it is "impossible" to really control data access down to a single discrete locale in RAM (without going to C). It seems to me that the functionality I seek is simply a matter of merging the IPE-DVR with the IPE-unbundle/bundle into a single structure. If a reference to the original buffer allocation is passed to the IPE-DVR read, and the IPE controls access, then the only thing preventing this access from being confined to ONLY the original buffer allocation is that the IPE-DVR and the IPE-unbundle/bundle are not merged into a single structure, which would prevent a buffer allocation exiting the DVR read. My ultimate goal is to easily confine dynamic RAM usage to discrete pre-allocated areas, minimizing data copies to at most one copy plus the primary, so that I know exactly how much RAM is utilized under all operating conditions before runtime, and to do all of this in a simple, centralized way that makes it easy to read total data usage at runtime.
Even if I use a subVI with an uninitialized shift register as a functional global, there will still be a copy made just to get the data in and out of the subVI, on top of the working copy needed during operation.
Replies here are welcomed.
01-07-2013 11:23 PM
b1 wrote:
The Queue serves to lock access as you suggest but it also serves as an efficient method of secondary functionality to control other loop processes across multiple loops.
The queue is also setting you up for a deadlock situation. What happens when two of your subVIs try to run this code at the same time? One of them gets the queue reference and dequeues the element, then exits the two layers of IPEs. Since you have no control over execution order, potentially the second VI now enters the first IPE, and waits forever for an element in the queue. Meanwhile the first VI is also waiting forever to get access to the DVR, which has been locked by the VI that's trying to dequeue. You could "fix" this by using one large IPE to get the data out of the DVR, then doing the dequeue/update/enqueue inside of it, but at that point the DVR is doing the locking and the queue just makes your code messy.
Are you solving a real performance problem, or do you just think you have one? The LabVIEW compiler is pretty good at determining when it can reuse a buffer even without an IPE. Instead of wrapping everything in several layers of IPE, you might start with simpler code (for example use only one IPE, for the DVR) and then put a second IPE for the cluster inside it only when you actually want to reuse a cluster element. In the example you posted, there's no need for the IPE to get the cluster element - just use a simple bundle. Strings inside clusters are stored as pointers, so all that's going to happen, both with and without the IPE, is that the cluster element will be updated to point at the new string (the old string will of course be freed if there are no other references to it). See the help topic "How LabVIEW Stores Data in Memory."
Buffer allocations can be misleading. I assume your concern is about the buffer allocation that appears to occur at the IPE border, when you get the cluster out of the DVR. There are three things to understand about buffer allocations:
1) Just because there's a buffer allocation dot doesn't mean that a new buffer is created; it just means one could be created there if necessary.
2) In a loop, even though there's a buffer allocation dot, the buffer may only be allocated the first time through the loop and then maintained through successive iterations.
3) Memory allocation can occur in places where there's no allocation dot, particularly when an existing variable-size type (string or array) grows.
What I think you're seeing here is LabVIEW telling you that it might have to make a copy of the cluster inside the DVR at the IPE border, because prior to that, there's no separate space allocated for the cluster. LabVIEW is copy-on-write - when you fork a wire, both wires contain a reference to the same original data. A copy occurs only when one of those branches is modified while the other branch is still using the original value. I believe the compiler is smart enough to schedule operations to avoid copies whenever possible, for example if only one branch modifies the data it will finish the branch that doesn't modify first, so that no copy is necessary.
Creating the DVR is a bit like forking the wire - until you try to modify the data inside it, there's no need for a separate copy. It's also possible to pass an invalid DVR to an IPE (for example, because some other user of the DVR destroyed it), in which case the IPE will return an error but will also allocate a new cluster so that the code can proceed. That's why you see the allocation dot in the IPE - because an allocation can happen, not because one will.
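The fork-then-copy-on-write idea can be sketched in a text language. This Python `COWBuffer` class is a made-up illustration of the concept (LabVIEW's actual implementation is internal to the compiler): forking shares the buffer, and an allocation happens only when a shared buffer is written.

```python
class COWBuffer:
    """Illustrative copy-on-write buffer: fork shares, write copies
    only when another branch still uses the data."""
    def __init__(self, data, shared=None):
        self.data = data
        self._shared = shared if shared is not None else [1]  # sharer count

    def fork(self):
        # Like branching a wire: no copy, just another reference.
        self._shared[0] += 1
        return COWBuffer(self.data, self._shared)

    def write(self, i, v):
        if self._shared[0] > 1:          # another branch still reads the data
            self._shared[0] -= 1
            self.data = list(self.data)  # the actual allocation happens here
            self._shared = [1]
        self.data[i] = v                 # otherwise, modify in place

a = COWBuffer([1, 2, 3])
b = a.fork()               # no copy yet: both share one buffer
print(b.data is a.data)    # True
b.write(0, 99)             # copy made now, because 'a' still uses the data
print(b.data is a.data)    # False
print(a.data)              # [1, 2, 3] -- the original branch is untouched
```

The buffer-allocation dot corresponds to the `list(self.data)` line: a place where a copy *can* happen, taken only when the data is actually shared at write time.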
01-08-2013 04:31 AM
Hi b1
01-09-2013 01:50 PM
Thanks for the in-depth response. I know this is not the ARM module forum section, but my desire for improved memory allocation led me here due to the overlapping discussion of IPEs and DVRs. I explored the possibility of using IPEs and DVRs to gain more control over memory allocation, not only memory usage. In the ARM module, if the buffers are allocated, the build can fail due to not enough memory: buffer allocations are counted against the RAM space, and you can quickly run out of RAM from allocations alone, causing the program to never complete a build.
I have a working program already, but I was attempting to improve memory allocation in order to add further functionality to the embedded processor. In ARM there is the option of PEEK and POKE directly to a RAM address, which can enable detailed control of memory space; however, writing and reading memory with PEEK and POKE has other memory overhead issues in the casting of the data, because these memory locations are read and written as integers. I realize I can break the data out of the cluster and have more control over how many buffer allocations occur and their sizes; however, my intent with having a large cluster was, and is, to have a simple way to know at run time how much memory is being utilized. There is a place in my program that already flattens the cluster data to a string, and there I can simply take the string length of the flattened data to determine the total memory size of the cluster. Also, by thorough code review and memory analysis, I can know every place the data gets copied before run time and add this to a run-time calculation. These two pieces of information would enable me to write a simple routine to display current RAM usage on the embedded ARM processor. Breaking up the cluster turns a simple, low-overhead memory usage routine into fragmented routines with memory overhead not worth pursuing. I guess ultimately the ideal scenario would be a way built into LabVIEW to enable real pointer-based operation on cluster data. My initial thought was that DVRs and the IPE would provide equivalent functionality, but that is clearly not the case from a buffer allocation standpoint, regardless of whether the data is actually copied or not.
Right now I feel that my efforts at efficient LabVIEW code have paid off, given that I already have the following implemented on the LM3S8962 with 256K flash and 64K of RAM:
TCP client
64 Analog Inputs through 4 Analog multiplexers
AI oversampling with up to an 80-sample moving average for all AI channels
up to 12 Dynamically deployable embedded PID controllers with optional integrated TEC hot side/cold side bit operation PID controllers
12 dynamically deployable on/off with tolerance controllers
48 serial digital output with timed on/off operation
48 serial digital input
48 additional PWM channels (expandable to 96 channels) on top of the 6 ARM integrated PWM channels
All of this on a single LM3S8962 processor with our own custom developed peripheral PCBs.
I sincerely feel this is very memory-efficient for LabVIEW, but I also know from my current analysis that if I could operate on a cluster in a manner where a pointer points to a single cluster in memory, with highly controlled read/write access to that single memory location, then I could likely double the capability I already have.
Any NI LabVIEW R&D folks out there see the advantage I am speaking of here?
Either way, I really appreciate everyone's highly skilled responses and the details provided in this thread. It has definitely helped enlighten me as to the detailed operation of the DVR and IPE. Thank you!
01-09-2013 04:11 PM
Sounds like a neat project (and more interesting than anything I'm doing these days).
I'm not familiar with the details of the ARM module; are you saying that everywhere that a buffer allocation dot appears, the ARM module allocates memory, regardless of whether or not the program actually uses it?
Is it clear that using an IPE to access a DVR will always potentially allocate a buffer, because there's no way to know in advance whether the DVR is valid? That said, when the DVR is valid, no allocation should occur. Again, I don't know if the ARM module will still allocate memory just in case it's needed; standard LabVIEW on Windows will not.
b1 wrote:
if I could operate on a cluster in a manner where a pointer points to a single cluster in memory and read/write access is highly controlled to single memory location then I could likely double the capability I already have.
You likely already have the ability to do this. The best option is of course a simple wire carrying a cluster, possibly accessed through an IPE. I don't know about the DVR approach on ARM as explained above. A functional global variable can work too, if you're careful about how you use it. You can read the entire cluster out without making a copy (due to the copy-on-write semantics), but for modifying the cluster, your functional global needs functions to write specific cluster elements. Even better is to move functions into the functional global VI itself when possible (ie, if you need to increment a value, make an "increment" function instead of reading the value out, incrementing, and writing it in again). This avoids race conditions and operates in-place.
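The "move the operation into the functional global" advice can be sketched in Python (an analogy only; the class, its lock, and the method names are invented for illustration, standing in for an FGV subVI with an action input):

```python
import threading

class FunctionalGlobal:
    """Sketch of the FGV pattern: the state lives in one place, and
    every operation (not just get/set) runs under the lock, in place."""
    def __init__(self):
        self._state = {"count": 0, "samples": []}   # the 'shift register'
        self._lock = threading.Lock()

    def increment(self):
        # One atomic read-modify-write. A caller-side
        # read -> increment -> write sequence would be a race.
        with self._lock:
            self._state["count"] += 1
            return self._state["count"]

    def add_sample(self, x):
        # Modifies one element in place; no copy of the whole state.
        with self._lock:
            self._state["samples"].append(x)

fg = FunctionalGlobal()
for _ in range(3):
    fg.increment()
print(fg.increment())  # 4
```

The design point is the same as in the FGV: because `increment` owns the whole read-modify-write, concurrent callers cannot interleave between the read and the write, and the state is never copied out wholesale just to change one field.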
I'm not sure how you would have a pointer type that remains consistent with LabVIEW's type safety, but I'd be curious to discuss how it could work.
If you're willing and able to share your code, and it's not excessively large (sounds like it might be, though), I'd be happy to look at it.
By the way, did my explanation about the potential for deadlock make sense?