

Compiler is Too Smart for My Own Good

Solved!

@mcduff wrote:

@billko wrote:


It's a bug.  The problems it creates are worse than the problem it solves.  I think an actual compromise, and not an "agree to disagree", would be to make the developer aware of what is actually going to happen, for example by highlighting the affected code.  The compiler stays the same, but the developer can reprogram if needed.


Consider the following hypothetical optimization; I have no idea whether it exists.

 

Look at the loop below: Array A and Array B have NO data dependencies on each other. Both arrays go into a case structure, where subVI A takes 300 ms to execute and subVI B takes 3 ms. After the loop, each array goes into the other subVI.

 

Under pure dataflow the VI will take 600 ms to run: 300 ms in the loop and 300 ms outside it. Now assume there is some compiler optimization that reasons: the data is preserved on the wire, the wires don't interact, and whether both wires exit the case structure at the same time is irrelevant because the data will not change. The optimizer therefore lets each wire leave as soon as its subVI finishes, and the execution time of the VI could be as low as 303 ms.
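
(For readers who think in text: a minimal Python sketch of the two schedules, simulating the hypothetical subVIs with sleeps. The names a300 and b3, and all of the code, are invented for this example.)

    import time
    from concurrent.futures import ThreadPoolExecutor

    def a300(x):
        # Stand-in for subVI A: 300 ms of work.
        time.sleep(0.300)
        return x

    def b3(x):
        # Stand-in for subVI B: 3 ms of work.
        time.sleep(0.003)
        return x

    def pure_dataflow(arr_a, arr_b):
        # The case structure is one node: neither output wire is
        # available until BOTH subVIs inside have finished.
        with ThreadPoolExecutor() as pool:
            fa = pool.submit(a300, arr_a)
            fb = pool.submit(b3, arr_b)
            out_a, out_b = fa.result(), fb.result()  # join at the boundary
            # After the structure, each array goes into the other subVI.
            fa2 = pool.submit(b3, out_a)
            fb2 = pool.submit(a300, out_b)
            return fa2.result(), fb2.result()        # ~600 ms total

    def hypothetically_optimized(arr_a, arr_b):
        # Each wire leaves as soon as its own subVI finishes, so the
        # two chains never wait on each other.
        with ThreadPoolExecutor() as pool:
            fa = pool.submit(lambda x: b3(a300(x)), arr_a)  # 303 ms chain
            fb = pool.submit(lambda x: a300(b3(x)), arr_b)  # 303 ms chain
            return fa.result(), fb.result()                 # ~303 ms total

    for f in (pure_dataflow, hypothetically_optimized):
        t0 = time.perf_counter()
        f([1, 2, 3], [4, 5, 6])
        print(f.__name__, round(time.perf_counter() - t0, 3), "s")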

 

This seems analogous to the speculative-execution bugs in CPUs: having the optimization seems efficient, but it may bite you sometimes, as in Paul's case. Once again, for some cases I may like the optimization; for other cases, maybe not.

 

[Attached image: Snap126.png]


But it fundamentally breaks the entire concept of the dataflow paradigm. Dataflow dictates that the data out of a node is available only when the node has completed all of its execution. By allowing the optimizer to do what it wants for efficiency's sake, you no longer have dataflow programming.

In your example above, since there is no data dependency between A and B, the programmer can simply separate them into two loops and gain the same efficiency. Breaking the core concept of the language paradigm for the sake of optimization is simply wrong. By allowing it, you now have a language with a bunch of exceptions that become maddening for a programmer to keep track of: all data is output from a node when it completes execution, unless condition A exists within the node, or perhaps the phase of the moon is just right, or the optimizer decides the underlying principle of the language doesn't apply. I'm sorry, but some things are simply off the table for the optimizer, and this is one of them.
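
(In the same invented Python terms as the sketch above: the programmer's restructuring is simply two independent branches, which reaches roughly 303 ms without bending dataflow at all.)

    import time
    from concurrent.futures import ThreadPoolExecutor

    def a300(x):
        time.sleep(0.300)  # stand-in for subVI A
        return x

    def b3(x):
        time.sleep(0.003)  # stand-in for subVI B
        return x

    def separated(arr_a, arr_b):
        # Two independent chains, written that way on purpose. Dataflow
        # holds within each chain, the chains share no wires, and running
        # them in parallel is plain dataflow, not a compiler exception.
        with ThreadPoolExecutor() as pool:
            chain_a = pool.submit(lambda x: b3(a300(x)), arr_a)
            chain_b = pool.submit(lambda x: a300(b3(x)), arr_b)
            return chain_a.result(), chain_b.result()  # ~303 ms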



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 41 of 55
(3,135 Views)

@mcduff wrote:

Consider the following hypothetical optimization; I have no idea whether it exists. [...]


Integrity of code should always come first, and this violates the first principle of dataflow.  But I understand this is a hypothetical, so let's say this optimization did exist.  Since it violates the principle of dataflow so blatantly, I would expect at the least some kind of visual warning to tell me it's not going to happen the way I think it is.

 

Let's say I was depending on the result on either wire NOT being available until the case structure completes - and this is a very reasonable thing to assume; otherwise, flush the whole idea of dataflow down the toilet.  Now what?  I don't want the compiler deciding whether it should violate dataflow just because it thinks it should.  Maybe IT is smart enough to figure it out, but am I?  And that brings us to the subject line of this topic: the compiler was too smart for the developer.  Not because the developer coded it incorrectly, but because the compiler arbitrarily decided to violate dataflow principles in a situation the developer was not likely to consider.
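
(To make that concrete in the same hypothetical Python stand-ins: here is code whose correctness, not just its timing, depends on the wire leaving only when the structure completes.)

    import time
    from concurrent.futures import ThreadPoolExecutor

    def acquire(ref):
        time.sleep(0.300)  # stand-in: 300 ms acquisition
        return ref

    def log_status(ref):
        time.sleep(0.003)  # stand-in: 3 ms bookkeeping
        return ref

    def shutdown(ref):
        print("instrument stopped")

    def vi(ref):
        with ThreadPoolExecutor() as pool:
            fa = pool.submit(acquire, ref)
            fb = pool.submit(log_status, ref)
            # Dataflow promise: fb's output wire leaves the structure only
            # when BOTH branches are done, so shutdown cannot overlap the
            # acquisition.
            fa.result()
            shutdown(fb.result())
            # Under the hypothetical optimization, fb's wire could leave
            # at ~3 ms and stop the instrument mid-acquisition.

    vi("instr://dev1")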

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
0 Kudos
Message 42 of 55
(3,130 Views)

This is what you said for the For Loop case:

 

Clearly the optimizer can remove the for loop and simply return the constant. There is no other code in the loop. Nothing is fundamentally changed. The compiler cannot definitively know the intent of the code in the original post, and in cases like that it MUST defer to the language syntax. If the optimizer is free to alter dataflow, then how can we ever know how our code will work?

 

Now suppose the optimizer does the following and changes the case structure from

[Attached image: Snap126.png]

to the following

[Attached image: Snap127.png]

I can make the same argument: nothing has changed, the data is the same, only the timing has changed. By optimizing out the For Loop in the earlier example, nothing changed except the timing. The compiler can NEVER infer intent from code unless it starts using some sort of AI/ML. Did the compiler change the intent of my code by removing the For Loop? Is it possible I wanted a delay specified by a For Loop constant?
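
(A textual analogue of the For Loop argument, sketched in Python; this is hypothetical, CPython does not actually fold such loops. The loop's only product is its last value, so a folding optimizer may replace it with the constant: identical data, none of the time.)

    import time

    def delay_via_loop(n):
        # Original code: the loop's only output is its last value, but
        # the programmer may be counting on it to burn time.
        x = 0
        for i in range(n):
            x = i
        return x

    def folded(n):
        # What a constant-folding optimizer is entitled to emit (n >= 1):
        # identical data, none of the time.
        return n - 1

    for f in (delay_via_loop, folded):
        t0 = time.perf_counter()
        f(50_000_000)
        print(f.__name__, round(time.perf_counter() - t0, 6), "s")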

 

Do I think it violates the purity of a dataflow language? Yes. But for me, most of the time the compiler will make the correct optimizations and my code will run better. That is why I am hesitant to advocate for a compiler that passes every purity test. Be careful what you wish for.

 

Once again, besides Paul's example, nobody here has given any real-world examples of the supposed optimization being a problem for their code. Do we have an edge case here, or a real-world problem that is affecting everybody's code?

0 Kudos
Message 43 of 55
(3,128 Views)

@mcduff wrote:

This is what you said for the For Loop case [...] Do we have an edge case here, or a real-world problem that is affecting everybody's code?


Just to be nitpicky: you really should use loops in your example instead of case structures. Your diagrams show case structures while your text discusses loops. It doesn't help your argument to be inconsistent.

 

I do somewhat see your point with your example, but there is a fundamental problem with altering the dataflow via the optimizer. In your example, let's say that I, the programmer, know for whatever reason that I want the second A300 node to execute after the completion of both the first A300 and B3. As written, with both contained in the same node (loop or case structure), I can guarantee that the second A300 and B3 will execute after the completion of the containing node, simply because I placed both inside the node and the language syntax states that will be the case. But if we allow the compiler to optimize it in the manner you suggest, I can no longer guarantee that, and I, the programmer, no longer have that level of control over the application. Is the compiler smart enough to dig through the contents of A300 and B3 to realize that they are using some global variable that needs to be synchronized?
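
(That guarantee, in the hypothetical Python stand-ins used earlier: the containing structure is effectively a join, supplied by the language itself.)

    import time
    from concurrent.futures import ThreadPoolExecutor, wait

    def a300(x):
        time.sleep(0.300)
        return x

    def b3(x):
        time.sleep(0.003)
        return x

    def with_structure_boundary(arr_a, arr_b):
        with ThreadPoolExecutor() as pool:
            fa = pool.submit(a300, arr_a)
            fb = pool.submit(b3, arr_b)
            # The containing structure IS this join; the language syntax
            # supplies it without any explicit synchronization code.
            wait([fa, fb])
            # Guaranteed to start only after both a300 and b3 above.
            return a300(fb.result())

If the optimizer dissolves that boundary, the join disappears, and the programmer has to rebuild it by hand, assuming they even know it was removed.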

 

What you propose as useful and beneficial completely takes control of the application away from the developer. And as you state, unless there is some very powerful AI/ML involved, the optimizer should not be making those decisions when they involve core behaviors of the language syntax, i.e., dataflow.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
0 Kudos
Message 44 of 55
(3,122 Views)

@Mark_Yedinak wrote:


Just to be nitpicky: you really should use loops in your example instead of case structures. Your diagrams show case structures while your text discusses loops. It doesn't help your argument to be inconsistent.


Sorry for the confusion. By the "For Loop" I was referring to an earlier message where the compiler removed the For Loop and just returned the last value. I remember some time back that one of the JavaScript benchmarks had a loop in it that only returned the last value. One JS engine optimized out the loop and posted an extremely fast result; the other JS engines said the optimization was not fair. By removing the For Loop, I can argue that the compiler changed my intent.

 


@Mark_Yedinak wrote:

[...] Is the compiler smart enough to dig through the contents of A300 and B3 to realize that they are using some global variable that needs to be synchronized?


I would suggest that the programmer use DVRs, semaphores, etc., rather than global variables. I don't think the compiler should reward bad programming techniques.
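
(A textual analogue of that advice, as a minimal Python sketch: let the shared state carry its own synchronization, roughly what a DVR does, instead of relying on scheduling order.)

    import threading

    class SharedValue:
        # Rough analogue of a DVR: the data travels with its own lock,
        # so correctness does not depend on which node happens to run first.
        def __init__(self, value=0):
            self._lock = threading.Lock()
            self._value = value

        def modify(self, fn):
            # In-place update, like the DVR's in-place element structure.
            with self._lock:
                self._value = fn(self._value)
                return self._value

    counter = SharedValue()
    counter.modify(lambda v: v + 1)  # safe from any thread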

 


@Mark_Yedinak wrote:

What you propose as useful and beneficial completely takes control of the application away from the developer. And as you state, unless there is some very powerful AI/ML involved, the optimizer should not be making those decisions when they involve core behaviors of the language syntax, i.e., dataflow.


Your credentials are way stronger than mine. I do not have a CS background (mine is in chemistry), so I am not going to win this discussion. You, as a LabVIEW Champion, have much more sway with NI than I do, so I am sure your opinions in this discussion carry more weight at NI.

 

That being said, I wish your approach/opinion were more nuanced. As Paul showed, there is a case where this optimization is not optimal. But what about the cases where it is? Computer speeds are not increasing like they did in the past; the trend is toward higher core counts, which benefit from parallelization. The compiler tries really hard to find places where parallelization can occur. Sometimes it is right, sometimes it is wrong. Limiting that ability for the sake of purity may end up hindering LabVIEW applications in the future. After the NXG debacle, I would rather not add roadblocks that may not be needed.

 

 

0 Kudos
Message 45 of 55
(3,108 Views)

Count me in with #TeamBug.

 

What if those subVIs are used to control or operate equipment?  Then the hypothetical "optimization" in msg #43 might actually be *dangerous*.   I definitely don't want the compiler deciding it can sometimes ignore the dataflow sequencing I've programmed.   It can't possibly know the consequences of such a reordering.

 

 

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.
0 Kudos
Message 46 of 55
(3,096 Views)

So do we have an actual example that demonstrates the original observation of the suspected bug? (A snippet with broken wires is not suitable!)

 

We need two unbroken VIs (caller and subVI), with all relevant settings verifiable (e.g. VI options, node setup, LabVIEW version, etc.), in a zip file ready to unzip and run. If we can verify the bug, I am sure the compiler team would be happy to hear about it.

0 Kudos
Message 47 of 55
(3,074 Views)
Solution
Accepted by paul_cardinale

The code in the first post isn't correctly wired. I downloaded the snippet and tried to reproduce the error, only to discover the reference wire out of the case structure is NOT connected to the FP.Close VI method. It simply passes behind it and straight into Close Reference. So FP.Close runs in parallel and can execute at any time, in classic LabVIEW race condition style.
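
(In textual terms, the miswiring turned one sequential chain into two unordered statements. A hypothetical Python sketch, not the actual VI:)

    import time
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical stand-ins for the nodes in the snippet.
    def run_case_structure(ref):
        time.sleep(0.1)
        return ref

    def fp_close(ref):
        print("front panel closed")

    def close_reference(ref):
        print("reference closed")

    def as_diagrammed(ref):
        # What the wiring appears to say: the reference wire sequences
        # FP.Close after the case structure completes.
        fp_close(run_case_structure(ref))
        close_reference(ref)

    def as_actually_wired(ref):
        # The wire passes BEHIND FP.Close, so FP.Close has no upstream
        # dependency and may fire at any time: a race.
        with ThreadPoolExecutor() as pool:
            work = pool.submit(run_case_structure, ref)
            pool.submit(fp_close, ref)  # unordered w.r.t. the work above
            close_reference(work.result())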

 

So this thankfully isn't a LabVIEW bug, and it's a good reason to use VI Analyzer!

Message 48 of 55
(3,050 Views)

@Dataflow_G wrote:

The code in the first post isn't correctly wired. [...]


A much easier way to see it (and I failed to do this) is just to use Block Diagram Cleanup.  Sure enough, you're right.

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
0 Kudos
Message 49 of 55
(3,040 Views)

And some incorrect wiring spurred quite the healthy debate.



Mark Yedinak
Certified LabVIEW Architect
LabVIEW Champion

"Does anyone know where the love of God goes when the waves turn the minutes to hours?"
Wreck of the Edmund Fitzgerald - Gordon Lightfoot
Message 50 of 55
(3,026 Views)