06-20-2017 08:04 PM
@mcduff wrote:
@Hoovah
Maybe I should sign up and join the lurker thread.
Look at the following screenshot. The case structures may add buffer copies and hence slow things down. Look below and note one fewer pair of dots (buffer allocations) when there are no case structures. This may be one reason why the native trim is slower.
I was guessing that too, but I ran the profiler while running the test to get memory statistics. Even with a megabyte string I was seeing a maximum memory footprint of only a few kilobytes. I am not sure why, since in every case the output would be a megabyte. I assume those are internal copies, not the input and output copies, which are credited against the tester VI's memory (which was large).
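For readers outside LabVIEW, a rough C analogy of the buffer-copy point above (the names are hypothetical and this is not how the native VI is implemented): a trim that allocates a fresh output buffer on every call versus one that only returns a view into the original string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* "Copying" trim: allocates a new buffer for the result on every call,
 * roughly analogous to a data path that forces an extra buffer copy. */
char *trim_copying(const char *s)
{
    const char *start = s;
    const char *end = s + strlen(s);
    while (start < end && isspace((unsigned char)*start)) start++;
    while (end > start && isspace((unsigned char)end[-1])) end--;

    size_t n = (size_t)(end - start);
    char *out = malloc(n + 1);          /* fresh allocation every time */
    if (out) {
        memcpy(out, start, n);
        out[n] = '\0';
    }
    return out;
}

/* "Non-copying" trim: returns only an offset and length into the caller's
 * buffer, so no new allocation or copy is made at all. */
void trim_view(const char *s, const char **start_out, size_t *len_out)
{
    const char *start = s;
    const char *end = s + strlen(s);
    while (start < end && isspace((unsigned char)*start)) start++;
    while (end > start && isspace((unsigned char)end[-1])) end--;
    *start_out = start;
    *len_out = (size_t)(end - start);
}
```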
06-20-2017 08:06 PM
@mcduff wrote:
If I use your new tester and turn debugging off for the tester, then placing this directly in a case makes it about 100x faster than any other method: no case structures for trimming from both sides, and no problem with concurrent execution. Try it.
I believe it, but placing it at the top level may mean that the entire bit is optimized away, especially if debugging is off and wire probes are not allowed. So I am not completely convinced that it is actually trimming a string.
06-20-2017 08:28 PM
I believe it, but placing it at the top level may mean that the entire bit is optimized away, especially if debugging is off and wire probes are not allowed. So I am not completely convinced that it is actually trimming a string.
I do not understand: why would the end result change depending on whether debugging is on or off? If that were true, all of the executables I compile might give incorrect results, since I turn debugging off for the compiled version. Just output the last value to an indicator, or make an array for each case.
If I remember correctly, debugging should be off whenever you benchmark. (This is mentioned in Altenbach's presentation: http://forums.ni.com/t5/2016-Advanced-User-Track/TS9524-Code-Optimization-and-Benchmarking/gpm-p/353...) When debugging is on, the simple substitution is slower than your solutions; however, your VIs are subroutines, I believe, so there is no debugging by default. (I don't have LabVIEW here to check.)
Another small slowdown in the native Trim Whitespace comes from the case structures. Case structures are relatively fast with two cases, but they become slower when matching more than two cases, which is what the native function does.
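A rough C sketch of the many-cases point (purely illustrative; the native VI's internals are not shown here, and the function names are made up): checking a character against several whitespace values one by one versus a single table lookup.

```c
#include <stdbool.h>

/* Many-case test: the character is compared against each whitespace value
 * in turn, like a case structure that must match several cases. */
bool is_whitespace_cases(unsigned char c)
{
    switch (c) {
        case ' ': case '\t': case '\n': case '\r': case '\v': case '\f':
            return true;
        default:
            return false;
    }
}

/* Single-lookup test: one table index replaces the chain of comparisons. */
static const bool ws_table[256] = {
    [' '] = true, ['\t'] = true, ['\n'] = true,
    ['\r'] = true, ['\v'] = true, ['\f'] = true,
};

bool is_whitespace_lookup(unsigned char c)
{
    return ws_table[c];
}
```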
06-21-2017 04:53 AM
@mcduff wrote:
I believe it, but placing it at the top level may mean that the entire bit is optimized away, especially if debugging is off and wire probes are not allowed. So I am not completely convinced that it is actually trimming a string.
I do not understand, why would the end result change if debugging is on or off?
Because you don't have an end result. If your Test program doesn't actually use the trimmed strings, then the compiler may eliminate the code entirely. This is called "dead code elimination".
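A minimal C sketch of the same effect, assuming an optimizing build such as gcc -O2 (the function and variable names are hypothetical): a benchmark loop whose result is never observed can be removed entirely.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the string trim under test. */
static size_t trim_length(const char *s)
{
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == ' ')
        n--;
    return n;
}

int main(void)
{
    const char *text = "hello   ";
    size_t len = 0;

    /* If len were never used, an optimizing compiler could delete this
     * whole loop: nothing observable depends on it, so the "benchmark"
     * would be timing nothing. */
    for (int i = 0; i < 1000000; i++)
        len = trim_length(text);

    /* Using the result keeps at least one real computation alive; a careful
     * benchmark would also vary the input or write to a volatile sink so
     * the loop itself cannot be collapsed to a single call. */
    printf("trimmed length = %zu\n", len);
    return 0;
}
```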
06-21-2017 04:55 AM
@sth wrote:
But they would not all be simultaneous, since there is only one thread at subroutine priority!!
What do you mean? Subroutine priority means run-to-completion on the current thread, whatever that thread is. Multiple subroutines can be running at the same time, on different threads on different processors.
06-21-2017 07:54 AM
@drj
Got it. But why would the other subVIs give results? Why wouldn't the compiler eliminate them too, if the test strings are not used? Or maybe the compiler is not yet smart enough. I will try to experiment today if I can spare time from my real job.
06-21-2017 08:27 AM
@mcduff wrote:
I believe it, but placing it at the top level may mean that the entire bit is optimized away, especially if debugging is off and wire probes are not allowed. So I am not completely convinced that it is actually trimming a string.
I do not understand: why would the end result change depending on whether debugging is on or off? If that were true, all of the executables I compile might give incorrect results, since I turn debugging off for the compiled version. Just output the last value to an indicator, or make an array for each case.
If I remember correctly, debugging should be off whenever you benchmark. (This is mentioned in Altenbach's presentation: http://forums.ni.com/t5/2016-Advanced-User-Track/TS9524-Code-Optimization-and-Benchmarking/gpm-p/353...) When debugging is on, the simple substitution is slower than your solutions; however, your VIs are subroutines, I believe, so there is no debugging by default. (I don't have LabVIEW here to check.)
Another small slowdown in the native Trim Whitespace comes from the case structures. Case structures are relatively fast with two cases, but they become slower when matching more than two cases, which is what the native function does.
Because the value is not retained on a wire for debugging purposes. Send the output to a blank subVI after the timing call so the information has to be maintained; I think that will change the in-line timing. Subroutine calls are fairly fast and probably not the timing hit here. There is no issue of a "wrong answer" with debugging off, but if a value is not used within a VI then it can be optimized away.
Again, *speed is not the issue* I am raising; it is the simultaneity of a multi-threaded program. I don't really care whether the routine is optimized or not, given the other timing considerations.
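A rough C analogy of the "wire the output into a blank subVI" idea (the names are hypothetical): storing each result into a volatile sink forces the value to actually be produced while adding almost no time of its own.

```c
#include <stddef.h>

/* Volatile sink: stores to it may not be removed by the compiler, so any
 * value written here must really have been computed. This plays the same
 * role as wiring a benchmark output into a do-nothing subVI. */
static volatile size_t g_sink;

static inline void keep_alive(size_t value)
{
    g_sink = value;   /* cheap, but prevents dead code elimination */
}

/* Usage inside a timing loop (trim_length as in the earlier sketch):
 *     keep_alive(trim_length(text));
 */
```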
06-21-2017 08:31 AM
@drjdpowell wrote:
@sth wrote:
But they would not all be simultaneous, since there is only one thread at subroutine priority!!
What do you mean? Subroutine priority means run-to-completion on the current thread, whatever that thread is. Multiple subroutines can be running at the same time, on different threads on different processors.
No. Subroutine priority has only one thread and does not allow thread switching. Since it runs to completion without switching, there is effectively only one thread. Look at the "Thread Config" utility: you will notice that you can set the number of threads for "High Priority" execution, but not for Subroutine. The underlying execution engine for LabVIEW predates multicore and tries to emulate system process scheduling.
06-21-2017 09:41 AM - edited 06-21-2017 09:54 AM
I'm pretty sure one of the presentations at NIWeek talked about the available execution threads and indicated there was more than one subroutine thread available - just that only one process should be scheduled per thread. I'll try to find the presentation title - it was one of the introductory RT ones, if I recall correctly.
Edit: I'm not sure whether I misremembered the presentation or whether there was additional discussion beyond what is covered in the slides, but the slides make no mention of subroutine priority. The presentation can be found here: Optimizing Performance for LabVIEW Real Time Applications.
However, the detailed help gives this for subroutine:
When a VI runs at the Subroutine priority level, it effectively takes control of the thread in which it is running, and it runs in the same thread as its caller. No other VI can run in that thread until the subroutine VI finishes running, even if the other VI is at the Subroutine priority level. In single-threaded applications, no other VI runs. In execution systems, the thread running the subroutine does not handle other VIs, but the second thread of the execution system, along with other execution systems, can continue to run VIs.
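A rough pthreads analogy of the behavior the help text describes (purely illustrative; this is not how LabVIEW's execution system is implemented): each task monopolizes its own thread until it finishes, yet two such tasks on two threads still run at the same time.

```c
#include <pthread.h>
#include <stdio.h>

/* A "subroutine-like" task: once started on its thread, it runs to
 * completion without yielding, so nothing else can run on that thread
 * until it returns. */
static void *run_to_completion(void *arg)
{
    long id = (long)arg;
    volatile unsigned long work = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        work += i;                      /* no blocking, no yielding */
    printf("task %ld finished (%lu)\n", id, (unsigned long)work);
    return NULL;
}

int main(void)
{
    /* Two such tasks on two threads still execute concurrently: each thread
     * is monopolized by its own task, but neither blocks the other. */
    pthread_t t1, t2;
    pthread_create(&t1, NULL, run_to_completion, (void *)1L);
    pthread_create(&t2, NULL, run_to_completion, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```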
06-21-2017 10:33 AM
Updated the benchmark. Made the following changes: