06-17-2017 06:33 PM
@drjdpowell wrote:
Original conversation on OpenG Trim Whitespace.
That conversation was entirely about speed, not memory. My guess is that removing the reverse array will help both, but only noticeably for long strings. I think Phillip Brooks brings this up late in the thread. The answer is YES, I may be using this in an RT (i.e. memory-constrained) environment. But since I use it for string normalization before parsing commands, the strings are all tens of bytes, not tens of kilobytes.
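A version that avoids the reverse-array trick entirely just counts leading and trailing whitespace. A minimal C sketch of that idea (illustrative only, not the OpenG code; the real VI works on LabVIEW string handles and these names are made up):

#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: report the trimmed substring as a start offset and
   length instead of building a reversed copy or any intermediate buffer. */
static void trim_bounds(const char *s, size_t len, size_t *start, size_t *out_len)
{
    size_t first = 0, last = len;

    while (first < len && isspace((unsigned char)s[first]))
        first++;                      /* count leading whitespace */
    while (last > first && isspace((unsigned char)s[last - 1]))
        last--;                       /* count trailing whitespace */

    *start = first;
    *out_len = last - first;          /* caller copies (or just indexes) once, if at all */
}

Nothing is reversed and nothing is copied until the caller actually extracts the substring, which is the memory win for long strings.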
06-17-2017 06:54 PM
Just because a wire forks does not mean that the data is copied. That changed long ago. The compiler delays the decision on a buffer copy until one of the forks modifies the data. In this case I just count whitespace, so there is no need for a buffer copy. You can turn on highlighting of buffer copies to check this.
Thanks, I did not know that. If the compiler does not make a buffer copy, how can both operations proceed in parallel? (Even if both operations do not modify the data, they both need access; is this done serially or in parallel? Just curious how you know.) So far, in my initial tests in LabVIEW 2017 with the parallel read-only access DVRs, the read is slower in that mode, even though it is read only.
1. You lose the parallelism by doing both loops serially.
Yes, I know. But parallelism is often only faster with large data sets, and you said you have 10 bytes of data. Otherwise the overhead of multiple threads makes it slower and allocates more memory. You may want a subVI that is single-threaded and fast and that can run concurrently in multiple places. (This is my typical use case.) If your subVI uses too many threads, your application will be thread-starved.
You need to test which solution works best for your use case 90% of the time.
mcduff
06-17-2017 07:35 PM
@mcduff wrote:
Even if both operations do not modify the data, they both need access; is this done serially or in parallel? Just curious how you know.
There are a few presentations floating around that talk about this. Since the data is not changing, both parallel threads can access it at the same time.
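As a plain (non-LabVIEW) illustration of that point, two threads can read the same constant buffer with no copy and no lock, because neither one writes to it:

#include <pthread.h>
#include <stdio.h>

static const int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Each reader sums half of the shared array; read-only access needs no lock. */
static void *sum_half(void *arg)
{
    int offset = *(int *)arg;
    long sum = 0;
    for (int i = offset; i < offset + 4; i++)
        sum += data[i];
    printf("half starting at %d sums to %ld\n", offset, sum);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    int lo = 0, hi = 4;
    pthread_create(&a, NULL, sum_half, &lo);
    pthread_create(&b, NULL, sum_half, &hi);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}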
06-17-2017 07:42 PM
@mcduff wrote:
Just because a wire forks does not mean that the data is copied. That changed long ago. The compiler delays the decision on a buffer copy until one of the forks modifies the data. In this case I just count whitespace, so there is no need for a buffer copy. You can turn on highlighting of buffer copies to check this.
Thanks, I did not know that. If the compiler does not make a buffer copy, how can both operations proceed in parallel? (Even if both operations do not modify the data, they both need access; is this done serially or in parallel? Just curious how you know.) So far, in my initial tests in LabVIEW 2017 with the parallel read-only access DVRs, the read is slower in that mode, even though it is read only.
This has been an ongoing area of compiler optimization. I think it was fairly long ago that they started looking at actual data usage. Two cores can read from an array in parallel; you only get a race condition if they are modifying the data. In reality, chunks of the array are loaded into L1 or L2 cache at the CPU level.
I believe that DVRs force it to a single thread. When one DVR loop (icon, area... not sure of the term??) has a CPU executing inside it, all other access to the DVR is blocked. A DVR access forces a lock on that data. They are for different designs.
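My mental model of a classic DVR is basically a pointer guarded by a lock, something like this C analogy (not how LabVIEW actually implements it):

#include <pthread.h>
#include <stddef.h>

/* Rough analogy for a classic DVR: the data is reachable only through a
   reference that carries a lock, so every access is serialized.
   (Initialize the lock with pthread_mutex_init() when creating the ref.) */
typedef struct {
    pthread_mutex_t lock;
    double         *data;
    size_t          len;
} dvr_t;

static double dvr_read_element(dvr_t *ref, size_t i)
{
    pthread_mutex_lock(&ref->lock);    /* "enter the structure": others block */
    double v = ref->data[i];
    pthread_mutex_unlock(&ref->lock);  /* "leave": the next accessor proceeds */
    return v;
}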
Agree about the speed and parallelism on this for my use case. Still, I think NI makes a lot of the string utilities reentrant, and for some reason this one isn't. I maintain that this is an oversight on NI's part and that NI should issue a CAR, until someone gives me a reason that it should not be reentrant. The fact that their implementation is not great code is another long-standing problem.
06-17-2017 08:25 PM
This is obviously a few versions ago; not sure if it still applies.
From Aristos Queue
I created major problems for LabVIEW when I made the "Trim Whitespace.vi" be reentrant a few versions ago. Just launching LabVIEW's Getting Started Window spawned 150+ copies of that subVI. It was a major memory hog. In LV8.5, that subVI is no longer reentrant. The tradeoffs between the thread synchronization and the memory usage were such that not being reentrant is better.
Now, in LV8.5 we also introduced pooled reentrancy. The LabVIEW R&D team has been seriously debating a recommendation that the vast majority of non-state-maintaining-pure-function VIs should be changed to pooled reentrancy. In the long-term, this may be something we encourage. The "Trim Whitespace.vi" is a prime candidate for this status... we didn't do that in 8.5 because the feature was brand new --- since that subVI could be used in tons of user VIs around the world already, we didn't want any potential bugs in the new feature to impact existing VIs.
06-17-2017 08:31 PM
I believe that DVRs force it to a single thread. When one DVR loop (icon, area... not sure of the term??) has a CPU executing inside it, all other access to the DVR is blocked. A DVR access forces a lock on that data. They are for different designs.
In LV2017 there is a new DVR mode called "Read Only Parallel Access". It can be used only if the data is not modified. I would assume there is no blocking in this mode since nothing is modified, but I am not sure. But in my application the reads are slower when using it.
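If it behaves like a reader-writer lock, then multiple readers could hold the lock at once while a writer would still be exclusive. Roughly this pattern, sketched in C with pthreads (just an analogy, not NI's implementation, and it does not explain why my reads got slower):

#include <pthread.h>

static pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
static double shared_value = 0.0;

/* Many readers may hold the read lock simultaneously. */
static double read_value(void)
{
    pthread_rwlock_rdlock(&rw);
    double v = shared_value;
    pthread_rwlock_unlock(&rw);
    return v;
}

/* A writer takes the lock exclusively, blocking all readers until done. */
static void write_value(double v)
{
    pthread_rwlock_wrlock(&rw);
    shared_value = v;
    pthread_rwlock_unlock(&rw);
}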
Cheers,
mcduff
06-18-2017 09:31 AM
@mcduff wrote:
In LV2017 there is a new DVR mode called "Read Only Parallel Access". It can be used only if the data is not modified. I would assume there is no blocking in this mode since nothing is modified, but I am not sure. But in my application the reads are slower when using it.
That makes a lot of sense. Actually, for memory and speed I am trying to use DVRs as a circular buffer fed by a whole bunch of asynchronous network processes, and that would help a *LOT*. I played with the beta but haven't installed the final version yet. But we are sort of wandering OT here. This thread has been a lot more involved than I thought.
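For context, the circular buffer I have in mind boils down to something like this single-lock C sketch (sizes and names are made up; a DVR-based version would hold the buffer state behind the reference, and the lock must be set up with pthread_mutex_init()):

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 1024          /* illustrative size */

/* Minimal single-lock ring buffer: several network tasks push, one consumer
   pops. Overflow/overwrite policy is deliberately left out. */
typedef struct {
    pthread_mutex_t lock;
    unsigned char   buf[RING_CAPACITY];
    size_t          head, tail, count;
} ring_t;

static bool ring_push(ring_t *r, unsigned char byte)
{
    bool ok = false;
    pthread_mutex_lock(&r->lock);
    if (r->count < RING_CAPACITY) {
        r->buf[r->head] = byte;
        r->head = (r->head + 1) % RING_CAPACITY;
        r->count++;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}

static bool ring_pop(ring_t *r, unsigned char *byte)
{
    bool ok = false;
    pthread_mutex_lock(&r->lock);
    if (r->count > 0) {
        *byte = r->buf[r->tail];
        r->tail = (r->tail + 1) % RING_CAPACITY;
        r->count--;
        ok = true;
    }
    pthread_mutex_unlock(&r->lock);
    return ok;
}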
06-18-2017 09:45 AM
From Aristos Queue
I created major problems for LabVIEW when I made the "Trim Whitespace.vi" be reentrant a few versions ago. Just launching LabVIEW's Getting Started Window spawned 150+ copies of that subVI. It was a major memory hog. In LV8.5, that subVI is no longer reentrant. The tradeoffs between the thread synchronization and the memory usage were such that not being reentrant is better.
Well, that is a good reason from AQ. I'm not sure what is meant by "pooled" reentrancy. I think there is a performance hit as each clone is activated and a data space is allocated for each instance; this could be significant. Preallocated reentrancy is the only version allowed at subroutine priority, which these utilities run at.
However, I am not sure what that actually means other than using data space on the fly. If in this case there is a string input, then data space for a maximal string is *not* preallocated. Possibly this preallocation only applies to scalars and not to variable-size string/array elements, where a pointer (handle) is normally used??? Not preallocating data should mean that temporary storage is pushed onto the stack (heap?) and used there. Either way, avoiding the buffer copy operation should eliminate the memory hit (I assume they were passing big strings?).
The current implementation (I could look at the 8.5.1 version since I run a bunch of those systems) uses the match string primitive, which makes a copy of the input split into 3 parts, doubling the input string at each match. The two matches are done in series, so peak memory usage is approximately 3 times the input string (with some deallocation of the discarded parts).
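In C terms, that copy-based approach looks roughly like this (hypothetical code standing in for the match-string version, just to show where the extra allocations come from; strndup() is POSIX):

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-based trim: each step allocates a new buffer for the
   surviving portion, so peak usage is the original plus two partial copies. */
static char *trim_by_copy(const char *s)
{
    size_t len = strlen(s);

    size_t first = 0;
    while (first < len && isspace((unsigned char)s[first]))
        first++;
    char *after_leading = strndup(s + first, len - first);   /* copy #1 */

    size_t n = strlen(after_leading);
    while (n > 0 && isspace((unsigned char)after_leading[n - 1]))
        n--;
    char *trimmed = strndup(after_leading, n);                /* copy #2 */

    free(after_leading);
    return trimmed;   /* caller frees */
}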
Possibly some of these design decisions should be reviewed somewhere between the 8.5.1 version (very buggy!!!) and the 2016 version; eight major releases over 11 years is a lot of change. However, AQ's experience is a good reason not to make it reentrant.
06-18-2017 12:46 PM
sth wrote: Not sure what is meant by "pooled" reentrancy.
Shared Clone Reentrant. This means a clone is created only when needed, and those clones are reused by other calls that are not running at the same time. Preallocated Clone Reentrant creates a copy of the VI for each call site. So the Shared Clone option reduces the memory footprint at the cost of a little jitter when an additional clone is required. Depending on how the VIs are called, this could be a significant memory reduction at the cost of a small time hit, but that time hit will not be as bad as a non-reentrant VI blocking other calls. In summary, it is a balancing act.
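A loose analogy in C: preallocated is like reserving one data-space struct per call site up front, while shared clones draw one from a pool on demand and return it afterwards (not how LabVIEW actually manages clones, just the idea):

#include <pthread.h>

#define POOL_SIZE 4   /* illustrative pool size */

/* Stand-in for a clone's data space (see the later posts for what it holds). */
typedef struct { int scratch; char *string_handle; } dataspace_t;

static dataspace_t     pool[POOL_SIZE];
static int             in_use[POOL_SIZE];
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* "Shared clone": grab any free data space for the duration of one call. */
static dataspace_t *acquire_clone(void)
{
    dataspace_t *ds = NULL;
    pthread_mutex_lock(&pool_lock);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!in_use[i]) { in_use[i] = 1; ds = &pool[i]; break; }
    }
    pthread_mutex_unlock(&pool_lock);
    return ds;   /* NULL here means the pool must grow (the "jitter" case) */
}

static void release_clone(dataspace_t *ds)
{
    pthread_mutex_lock(&pool_lock);
    in_use[ds - pool] = 0;   /* data space is kept around for the next caller */
    pthread_mutex_unlock(&pool_lock);
}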
06-18-2017 02:13 PM
Yes, those are the two types. I hadn't considered the "pool" label as applying to the first type, as a synonym for shared.
The "shared clone reentrant" is not available when the VI is set to run at subroutine priority as these utilities are.
What I was trying to get at in my last post is what is actually cloned when an instance is created, whether shared clone or preallocated clone. There is no need to duplicate the compiled instructions, since they can be used by multiple processors simultaneously. There is no need to duplicate (clone?) the FP and BD, since those are generally not open. The only thing that needs duplicating is the data space (DS). This DS will consist of some number of scalars and, in this case, some arrays (strings). The arrays are not preallocated since the size is unknown, so each is probably a handle (pointer) to an array.
The preallocated clone reserves (allocates) this DS in the calling VI when it is compiled. The shared clone reserves this space dynamically when it is called in the calling chain (on the heap). Possibly when it exits, the memory is kept for another call rather than deallocated.
I can't come up with a reason why a clone of "Trim Whitespace.vi" should be more than a few hundred bytes, at the most. Why this should be a big performance hit, either preallocated or shared, is my question.
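To put a rough number on it, the per-clone data space for something like this might be only a few scalars plus string handles. A back-of-the-envelope C picture (not LabVIEW's real layout):

#include <stdint.h>
#include <stdio.h>

/* Back-of-the-envelope picture of a per-clone data space: a few scalars
   plus handles (pointers) to the variable-size strings. */
typedef struct {
    int32_t  leading_count;
    int32_t  trailing_count;
    char   **input_handle;    /* handle to the input string, not a copy */
    char   **output_handle;   /* handle to the trimmed result */
} trim_dataspace_t;

int main(void)
{
    /* Prints a couple of dozen bytes on a 64-bit machine, plus whatever the
       strings themselves occupy at run time. */
    printf("per-clone data space: %zu bytes\n", sizeof(trim_dataspace_t));
    return 0;
}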