The problem with the performance is precisely the "evil floating point bit level hacking".
The original code uses pointer casts to reinterpret the bits of an SGL as an I32, and later back again, without converting the value (a no-op at the bit level).
That reinterpretation is the core of the speed-up, but bear in mind that modern processors have dedicated instructions (for example, SSE `rsqrtss` on x86) that compute a reciprocal square root much faster than this method. The trick comes from an era when math co-processors were a thing and were not automatically part of the CPU.
Yes, it's more of an academic curiosity today. It is still interesting to implement in LabVIEW as an exercise.
As discussed elsewhere, if we just remove the FOR loop and operate on larger arrays directly, the per-element speed can be 40-50x higher, with exactly the same result. Of course, for small arrays the FOR loop version is faster.