04-15-2013 11:36 AM
I need to optimize an extensive, time-critical bitwise logical function, so I want to represent it with a minimum of logical operations. SSE2 includes four bitwise logical instructions: AND, OR, XOR, and AND NOT. If I wire A (UINT64) to the first input of an LV AND function, wire B (UINT64) to the input of an LV NOT function, and wire the output of the NOT function to the second input of the AND function, how many SSE2 instructions will be used?
A and (not B) = A AND NOT B -- at least one instruction
or
A and (not B) = A and (all-ones AND NOT B) -- at least two instructions (NOT built from AND NOT and an all-ones constant)
Which set would be better for representing the function: {AND, OR, XOR, NOT} or {AND, OR, XOR, AND NOT}, where AND NOT is built from two LV functions?
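For reference, here is a minimal C sketch (not LabVIEW code, and only an illustration; the array contents are made up) of what this maps to on x86: A AND (NOT B) corresponds to the single SSE2 instruction PANDN, exposed as the _mm_andnot_si128 intrinsic, so no separate NOT and no all-ones constant are needed at the machine level. Whether LabVIEW actually fuses the two primitives into one PANDN is exactly the question above.

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t a[2] = { 0xF0F0F0F0F0F0F0F0ULL, 0x123456789ABCDEF0ULL };
    uint64_t b[2] = { 0xFF00FF00FF00FF00ULL, 0x0F0F0F0F0F0F0F0FULL };
    uint64_t r[2];

    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* PANDN computes (NOT first operand) AND (second operand),
     * so the operands are swapped relative to "A AND NOT B". */
    __m128i vr = _mm_andnot_si128(vb, va);

    _mm_storeu_si128((__m128i *)r, vr);
    printf("%016llx %016llx\n",
           (unsigned long long)r[0], (unsigned long long)r[1]);
    return 0;
}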
04-15-2013 11:55 AM - edited 04-15-2013 12:02 PM
I assume you are operating on arrays, not scalars. Right?
Instead of guessing, you should wire up a few alternatives and benchmark them. Also remember that the Compound Arithmetic node allows input inversion.
04-15-2013 12:02 PM
Yes, I get 18 UINT64 arrays as input and return 2 UINT64 arrays.
04-15-2013 01:11 PM - last edited on 05-05-2025 04:35 PM by Content Cleaner
@gmiroshnichenko wrote:
Yes, I get 18 UINT64 arrays as input and return 2 UINT64 arrays.
18 arrays (what size?) or an array with 18 U64 elements? Where is the reduction from 18 to 2 taking place? Where is the data coming from?
It is hard for me to imagine that a few bitwise operations are the bottleneck of the overall operation. Most likely, their cost is insignificant compared to everything else. Good overall code design (in-placeness, avoiding data copies, etc.) is probably more important.
In LabVIEW, there can be a significant difference between how you wire the primitives on the block diagram and the code the compiler actually generates. The LabVIEW compiler is quite sophisticated, and your alternatives might even result in the same code under the hood.
As I said, benchmarking is probably the only way to really tell. Also don't forget parallelization. Is this on a desktop or on LabVIEW RT?
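As a text-form illustration of the benchmarking advice (a rough C analogue, not LabVIEW; in LabVIEW the equivalent would be timing the wiring variants with Tick Count around a loop), the sketch below times two formulations of A AND NOT B over a large array. An optimizing C compiler will usually fold both loops into identical machine code, which mirrors the point that different-looking alternatives can compile to the same thing. Array size, fill values, and repetition count are arbitrary.

#include <stdint.h>
#include <stdio.h>
#include <time.h>   /* POSIX clock_gettime */

#define N (1u << 20)
static uint64_t a[N], b[N], r[N];

static void variant_fused(void)     /* r = a AND (NOT b), one expression */
{
    for (size_t i = 0; i < N; i++)
        r[i] = a[i] & ~b[i];
}

static void variant_two_step(void)  /* explicit NOT, then AND */
{
    for (size_t i = 0; i < N; i++) {
        uint64_t nb = ~b[i];
        r[i] = a[i] & nb;
    }
}

static double time_it(void (*f)(void), int reps)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        f();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void)
{
    for (size_t i = 0; i < N; i++) {
        a[i] = i * 0x9E3779B97F4A7C15ULL;  /* arbitrary test pattern */
        b[i] = ~a[i] >> 3;
    }
    printf("fused:    %.3f s\n", time_it(variant_fused, 100));
    printf("two-step: %.3f s\n", time_it(variant_two_step, 100));
    printf("check: %016llx\n", (unsigned long long)r[0]);  /* keep stores observable */
    return 0;
}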
04-15-2013 02:55 PM - last edited on 05-05-2025 04:35 PM by Content Cleaner
@altenbach wrote:
@gmiroshnichenko wrote:
Yes, I get 18 UINT64 arrays as input and return 2 UINT64 arrays.
18 arrays (what size?) or an array with 18 U64 elements? Where is the reduction from 18 to 2 taking place? Where is the data coming from?
It is hard for me to imagine that a few bitwise operations are the bottleneck of the overall operation. Most likely, their cost is insignificant compared to everything else. Good overall code design (in-placeness, avoiding data copies, etc.) is probably more important.
In LabVIEW, there can be a significant difference between how you wire the primitives on the block diagram and the code the compiler actually generates. The LabVIEW compiler is quite sophisticated, and your alternatives might even result in the same code under the hood.
As I said, benchmarking is probably the only way to really tell. Also don't forget parallelization. Is this on a desktop or on LabVIEW RT?
The process is cyclic:
1) Two arrays of 40 U64 elements are expanded to 18 similar arrays using "Rotate 1D Array", logical shifts, and bitmasks.
2) The 18 arrays are reduced back to 2 arrays with bitwise logical operations.
3) The new arrays are evaluated.
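A hypothetical C sketch of one such cycle, assuming plain C arrays in place of the LV arrays. The actual rotations, shift directions, masks, and reduction logic of the game are not described above, so everything specific here (ROWS, VARIANTS, the 40-bit mask, the OR/AND folds) is invented for illustration only.

#include <stdint.h>
#include <stdio.h>

#define ROWS 40      /* 40 U64 elements per board array */
#define VARIANTS 18  /* 18 expanded arrays */

/* Step 1 (expand): rotate the array by one element, shift bits, mask.
 * The per-variant shift direction and the 40-bit mask are placeholders. */
static void expand(const uint64_t board[ROWS], uint64_t out[VARIANTS][ROWS])
{
    for (int v = 0; v < VARIANTS; v++) {
        for (int i = 0; i < ROWS; i++) {
            int j = (i + 1) % ROWS;             /* "Rotate 1D Array" by 1 */
            uint64_t x = board[j];
            x = (v & 1) ? (x << 1) : (x >> 1);  /* logical shift */
            out[v][i] = x & 0xFFFFFFFFFFULL;    /* bitmask (40 bits assumed) */
        }
    }
}

/* Step 2 (reduce): combine the 18 arrays back to 2 with bitwise logic.
 * A plain OR fold and AND fold stand in for the real combining rules. */
static void reduce(uint64_t in[VARIANTS][ROWS],
                   uint64_t out0[ROWS], uint64_t out1[ROWS])
{
    for (int i = 0; i < ROWS; i++) {
        uint64_t acc_or = 0, acc_and = ~0ULL;
        for (int v = 0; v < VARIANTS; v++) {
            acc_or  |= in[v][i];
            acc_and &= in[v][i];
        }
        out0[i] = acc_or;
        out1[i] = acc_and;
    }
}

int main(void)
{
    static uint64_t board[ROWS], vars[VARIANTS][ROWS], r0[ROWS], r1[ROWS];
    for (int i = 0; i < ROWS; i++)
        board[i] = 0x5555555555ULL ^ (uint64_t)i;  /* arbitrary start board */
    expand(board, vars);
    reduce(vars, r0, r1);  /* step 3 (assessment) would follow here */
    printf("%016llx %016llx\n",
           (unsigned long long)r0[0], (unsigned long long)r1[0]);
    return 0;
}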
The purpose is to enumerate possibilities in a two-player game; the program is an AI. (It is a contest of AIs; the game is a modification of the "Game of Life".) Each cycle models one change of the game board. The time for thinking is very short, and the goal is to test as many variants as possible.
Thank you, I will try several ways of wiring. This is a desktop, and it will be possible to use parallelized For Loops.