11-09-2010 11:47 AM
Good points. I realized after posting that putting the indicators in the sequence structure might not be a good idea, but moving them had no measurable effect on execution speed. Disabling debugging speeds up all the approaches by about 5 ms. I cheated a bit with the "2 shifts" approach: if there's data in the second-highest 4 bits of the 32-bit value, it will get included in the upper 12-bit value. Adding an AND to mask that off costs negligible additional time. The profiler seems to give different memory results depending on the order in which I try the algorithms, and I don't have time to sort that out. In case anyone wants to play further, here it is again with those modifications:
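For anyone following along in a text language, the shift-then-AND step can be sketched roughly as below. This is not the attached VI, just an illustration: the function name `extract_field` and the bit positions used in the example (a 12-bit field starting at bit 16, with stray data in the 4 bits above it) are assumptions, not the actual layout discussed in this thread.

```python
def extract_field(word: int, shift: int, width: int = 12) -> int:
    """Shift a field down to bit 0, then AND off everything above it.

    `shift` and `width` are hypothetical parameters; set them to match
    the real bit layout of your data.
    """
    mask = (1 << width) - 1          # width=12 gives 0x0FFF
    return (word >> shift) & mask

word = 0xABCD1234                    # made-up 32-bit sample value

# Shift only: the 4 bits above the field leak into the result.
shift_only = word >> 16

# Shift plus AND: only the 12-bit field survives.
masked = extract_field(word, 16)

print(hex(shift_only), hex(masked))
```

The AND is a single extra integer operation per element, which fits with the observation that it adds very little time.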
11-09-2010 11:55 AM - edited 11-09-2010 12:02 PM
Also remember that you can parallelize my FOR loop. I get a 2x speedup on my Core 2 Duo.
You should also delete the indicator for the raw array. Because the indicator updates asynchronously, its update overlaps with the benchmarking code and competes with it for CPU time.