11-27-2023 01:54 PM
@JÞB wrote:
Of course, I can be wrong.
I get the same result also with a constant....
11-28-2023 03:29 AM - edited 11-28-2023 03:40 AM
Try changing the array size. On my machine the difference is barely visible with 10M elements; going down to 1M elements, the difference is 2x.
That is probably because cache size and memory latency dominate the result for larger arrays.
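For anyone who wants to reproduce this outside of LabVIEW, here is a minimal C benchmark sketch; the array size, divisor and repeat count are arbitrary picks for illustration, not taken from the measurements above. It times elementwise division against multiplication by the precomputed reciprocal.

/* Minimal benchmark sketch: divide an array by a scalar vs. multiply by
 * the precomputed reciprocal. Change N to compare 1M vs. 10M elements. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (1u << 20)   /* ~1M doubles; try 10M to see cache effects */
#define REPS 100

static double buf[N];

int main(void)
{
    const double divisor = 3.0;
    const double recip   = 1.0 / divisor;

    for (size_t i = 0; i < N; i++)
        buf[i] = (double)i + 0.5;

    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++)
        for (size_t i = 0; i < N; i++)
            buf[i] = buf[i] / divisor;

    clock_t t1 = clock();
    for (int r = 0; r < REPS; r++)
        for (size_t i = 0; i < N; i++)
            buf[i] = buf[i] * recip;

    clock_t t2 = clock();
    printf("divide:   %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("multiply: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    printf("checksum: %g\n", buf[N / 2]);   /* keep the work observable */
    return 0;
}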
11-28-2023 05:31 AM - edited 11-28-2023 05:34 AM
@altenbach wrote:
@JÞB wrote:
Of course, I can be wrong.
I get the same result also with a constant....
Could you try very large and very small constants? My mind is still wrapping around floating-point math near 1 versus several orders of magnitude away from 1. The finer points of the IEEE 754 implementation are probably over my head without duct tape, but something (maybe my 8-Ball) is telling me to check further.
Worst case, I learn something.
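To make that concrete, here is a quick C sketch (the constants are my own picks, purely for illustration) that checks whether x / c and x * (1/c) give bit-identical doubles for constants near 1 and many orders of magnitude away:

/* Sketch: does dividing by c equal multiplying by the rounded 1/c? */
#include <stdio.h>

static void compare(double x, double c)
{
    double by_div = x / c;
    double by_mul = x * (1.0 / c);
    printf("x = %-10g c = %-10g div = %.17g mul = %.17g %s\n",
           x, c, by_div, by_mul,
           by_div == by_mul ? "same" : "DIFFERENT");
}

int main(void)
{
    const double xs[] = { 0.9999999, 1.0000001, 123.456 };
    const double cs[] = { 3.0, 1.0000001, 10.0, 1e-300, 1e300 };
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 5; j++)
            compare(xs[i], cs[j]);
    return 0;
}

For powers of two the two paths agree exactly, since the reciprocal is representable; for most other constants 1/c is itself rounded, so the product can land one ULP away from the correctly rounded quotient.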
11-28-2023 06:22 AM
@cordm wrote:
Try changing the array size. On my machine the difference is barely visible with 10M elements; going down to 1M elements, the difference is 2x.
That is probably because cache size and memory latency dominate the result for larger arrays.
Good point. I can confirm the 2x difference with a 1M array on a 13900K (it's ~15% with a 10M array). On the old Xeon the difference increases from 3x (10M) to 6x (1M); probably it's just bad at double precision. With single precision, the difference shrinks to 45% (10M) and 2x (1M).
So we can agree that the difference is real and that it was a deliberate choice to not "optimize" the division of an array by a scalar?
11-28-2023 08:35 AM
It is definitely real outside of LabVIEW: https://stackoverflow.com/questions/4125033/floating-point-division-vs-floating-point-multiplication
The choice is deliberate insofar as LabVIEW adheres to the standard. For other compilers, you have to explicitly enable this optimization; it does not come with the usual -ON levels.
To quote the gcc optimization options:
-funsafe-math-optimizations
Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link time, it may include libraries or startup files that change the default FPU control word or other similar optimizations.
This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
The default is -fno-unsafe-math-optimizations.
If NI enabled it and you then compared computation results from LabVIEW and another language, you would be pulling your hair out wondering where those little deviations were coming from.
If you want the most performance and you know what you are doing, create a library in your language of choice where you can tune the compiler to your heart's content (and get access to the latest instruction set).
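A minimal sketch of that last suggestion (file name, function name and flags are illustrative, not an NI-endorsed recipe): put the hot loop in a small C shared library, build it with reciprocal math enabled, and call it from LabVIEW through a Call Library Function Node.

/* scale_array.c -- hypothetical helper that divides an array in place.
 * Build example (gcc):
 *   gcc -O3 -shared -fPIC -freciprocal-math -o scale_array.so scale_array.c
 * With -freciprocal-math the compiler may replace the division by a
 * multiplication with 1/divisor; without it, the division stays. */
#include <stddef.h>

void scale_array(double *data, size_t n, double divisor)
{
    for (size_t i = 0; i < n; i++)
        data[i] /= divisor;
}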
11-28-2023 08:55 AM - edited 11-28-2023 08:56 AM
@cordm wrote:
To quote the gcc optimization options:
-funsafe-math-optimizations
The name of that option already tells you that NI will NOT enable that in their compiler, and if they let you change that somehow, it would be at best some fairly well hidden and obscure INI file setting. But that would mean LabVIEW might do that everywhere, which is definitely NOT what you want!
I can of course hear your request already: why not a property at the VI level? Well, what about the other 2578 optimization options that gcc, Intel C, Microsoft C and more obscure C compilers support? Let's add them all too, and let the average user fumble with even more options they never cared about, never will care about, and which might or might not improve things depending on CPU, chipset, available memory, the moon phase and a few other random events.
11-28-2023 10:09 AM
In cases like this, I have been sprinkling 1/x in the right places for decades, without ever benchmarking, simply because it seems more reasonable. We never need to rely on the compiler to do that for us. 😄
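In C terms, that hand-applied trick is just hoisting one division out of the loop (a trivial sketch; the names are illustrative):

/* Compute 1/divisor once, then multiply inside the loop -- the textual
 * equivalent of wiring 1/x before the multiply instead of dividing the
 * array element by element. */
#include <stddef.h>

void divide_by_hand(double *data, size_t n, double divisor)
{
    const double recip = 1.0 / divisor;   /* one division, up front */
    for (size_t i = 0; i < n; i++)
        data[i] *= recip;                 /* cheap multiplies in the loop */
}

The result can differ from true division in the last bit, which is exactly the trade-off discussed above.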