04-15-2013 11:36 AM
I need to optimize an extensive, time-critical bitwise logical function, so I want to represent it with a minimum of logical operations. SSE2 includes four bitwise logical instructions: AND, OR, XOR, and AND NOT. If I wire A (UINT64) to the first input of an LV AND function, wire B (UINT64) to the input of an LV NOT function, and wire the output of the NOT function to the second input of the AND function, how many SSE2 instructions will be used?
A and (not B) = A AND NOT B -- at least one instruction
or
A and (not B) = A and (all-ones AND NOT B) -- at least two instructions (NOT built from AND NOT and an all-ones constant)
Which set would be better for representing the function: {AND, OR, XOR, NOT} or {AND, OR, XOR, AND NOT}, where AND NOT is built from two LV functions?
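For reference, here is a minimal C sketch (not LabVIEW code, and only an illustration; the array contents are made up) of what this maps to on x86: A AND (NOT B) corresponds to the single SSE2 instruction PANDN, exposed as the _mm_andnot_si128 intrinsic, so no separate NOT and no all-ones constant are needed at the machine level. Whether LabVIEW actually fuses the two primitives into one PANDN is exactly the question above.

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a[2] = { 0xF0F0F0F0F0F0F0F0ULL, 0x123456789ABCDEF0ULL };
    uint64_t b[2] = { 0xFF00FF00FF00FF00ULL, 0x0F0F0F0F0F0F0F0FULL };
    uint64_t r[2];

    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* PANDN computes (NOT first operand) AND (second operand),
     * so the operands are swapped relative to "A AND NOT B". */
    __m128i vr = _mm_andnot_si128(vb, va);

    _mm_storeu_si128((__m128i *)r, vr);
    printf("%016llx %016llx\n",
           (unsigned long long)r[0], (unsigned long long)r[1]);
    return 0;
}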
04-15-2013 11:55 AM - edited 04-15-2013 12:02 PM
I assume you are operating on arrays, not scalars. Right?
Instead of guessing, you should wire up a few alternatives and benchmark them. Also remember that the Compound Arithmetic node allows input inversion.
04-15-2013 12:02 PM
Yes, I get 18 UINT64 arrays as input and return 2 UINT64 arrays.
04-15-2013 01:11 PM - last edited on 05-05-2025 04:35 PM by Content Cleaner
@gmiroshnichenko wrote:
Yes, I get 18 UINT64 arrays as input and return 2 UINT64 arrays.
18 arrays (what size?) or an array with 18 U64 elements? Where is the reduction from 18 to 2 taking place? Where is the data coming from?
It is hard for me to imagine that a few bitwise operations are the bottleneck of the overall operation. Most likely, their cost is insignificant compared to everything else. Good overall code design (in-placeness, avoiding data copies, etc.) is probably more important.
In LabVIEW, there can be a significant difference between how you wire the primitives on the block diagram and the code the compiler actually generates. The LabVIEW compiler is quite sophisticated, and your alternatives might even result in the same code under the hood.
As I said, benchmarking is probably the only way to really tell. Also don't forget parallelization. Is this on a desktop or on LabVIEW RT?
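As a text-form illustration of the benchmarking advice (a rough C analogue, not LabVIEW; in LabVIEW the equivalent would be timing the wiring variants with Tick Count around a loop), the sketch below times two formulations of A AND NOT B over a large array. An optimizing C compiler will usually fold both loops into identical machine code, which mirrors the point that different-looking alternatives can compile to the same thing. Array size, fill values, and repetition count are arbitrary.

#include <stdint.h>
#include <stdio.h>
#include <time.h>   /* POSIX clock_gettime */

#define N (1u << 20)
static uint64_t a[N], b[N], r[N];

static void variant_fused(void)     /* r = a AND (NOT b), one expression */
{
    for (size_t i = 0; i < N; i++)
        r[i] = a[i] & ~b[i];
}

static void variant_two_step(void)  /* explicit NOT, then AND */
{
    for (size_t i = 0; i < N; i++) {
        uint64_t nb = ~b[i];
        r[i] = a[i] & nb;
    }
}

static double time_it(void (*f)(void), int reps)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        f();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = i * 0x9E3779B97F4A7C15ULL;  /* arbitrary test pattern */
        b[i] = ~a[i] >> 3;
    }
    printf("fused:    %.3f s\n", time_it(variant_fused, 100));
    printf("two-step: %.3f s\n", time_it(variant_two_step, 100));
    printf("check: %016llx\n", (unsigned long long)r[0]);  /* keep stores observable */
    return 0;
}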
04-15-2013 02:55 PM - last edited on 05-05-2025 04:35 PM by Content Cleaner
@altenbach wrote:
@gmiroshnichenko wrote:
Yes, I get 18 UINT64 arrays as input and return 2 UINT64 arrays.
18 arrays (what size?) or an array with 18 U64 elements? Where is the reduction from 18 to 2 taking place? Where is the data coming from?
It is hard for me to imagine that a few bitwise operations are the bottleneck of the overall operation. Most likely, their cost is insignificant compared to everything else. Good overall code design (in-placeness, avoiding data copies, etc.) is probably more important.
In LabVIEW, there can be a significant difference between how you wire the primitives on the block diagram and the code the compiler actually generates. The LabVIEW compiler is quite sophisticated, and your alternatives might even result in the same code under the hood.
As I said, benchmarking is probably the only way to really tell. Also don't forget parallelization. Is this on a desktop or on LabVIEW RT?
The process is cyclic:
1) Two arrays of 40 U64 elements are expanded to 18 similar arrays using "Rotate 1D Array", logical shifts, and bitmasks.
2) The 18 arrays are reduced back to 2 arrays with bitwise logical operations.
3) The new arrays are evaluated.
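A hypothetical C sketch of one such cycle, assuming plain C arrays in place of the LV arrays. The actual rotations, shift directions, masks, and reduction logic of the game are not described above, so everything specific here (ROWS, VARIANTS, the 40-bit mask, the OR/AND folds) is invented for illustration only.

#include <stdint.h>
#include <stdio.h>

#define ROWS 40      /* 40 U64 elements per board array */
#define VARIANTS 18  /* 18 expanded arrays */

/* Step 1 (expand): rotate the array by one element, shift bits, mask.
 * The per-variant shift direction and the 40-bit mask are placeholders. */
static void expand(const uint64_t board[ROWS], uint64_t out[VARIANTS][ROWS])
{
    for (int v = 0; v < VARIANTS; v++) {
        for (int i = 0; i < ROWS; i++) {
            int j = (i + 1) % ROWS;             /* "Rotate 1D Array" by 1 */
            uint64_t x = board[j];
            x = (v & 1) ? (x << 1) : (x >> 1);  /* logical shift */
            out[v][i] = x & 0xFFFFFFFFFFULL;    /* bitmask (40 bits assumed) */
        }
    }
}

/* Step 2 (reduce): combine the 18 arrays back to 2 with bitwise logic.
 * A plain OR fold and AND fold stand in for the real combining rules. */
static void reduce(uint64_t in[VARIANTS][ROWS],
                   uint64_t out0[ROWS], uint64_t out1[ROWS])
{
    for (int i = 0; i < ROWS; i++) {
        uint64_t acc_or = 0, acc_and = ~0ULL;
        for (int v = 0; v < VARIANTS; v++) {
            acc_or  |= in[v][i];
            acc_and &= in[v][i];
        }
        out0[i] = acc_or;
        out1[i] = acc_and;
    }
}

int main(void)
{
    static uint64_t board[ROWS], vars[VARIANTS][ROWS], r0[ROWS], r1[ROWS];
    for (int i = 0; i < ROWS; i++)
        board[i] = 0x5555555555ULL ^ (uint64_t)i;  /* arbitrary start board */
    expand(board, vars);
    reduce(vars, r0, r1);  /* step 3 (assessment) would follow here */
    printf("%016llx %016llx\n",
           (unsigned long long)r0[0], (unsigned long long)r1[0]);
    return 0;
}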
The purpose is to enumerate possibilities in a two-player game; the program is an AI. (It is a contest of AIs; the game is a modification of the "Game of Life".) Each cycle models one change of the game board. The time for thinking is very short, and the goal is to test as many variants as possible.
Thank you, I will try several ways of wiring. This is a desktop, and it will be possible to use parallelized For Loops.