1) There are several ways to accomplish what you are trying to do, but the main one is either to override the "Pre UUT" and "Post UUT" callbacks in your client sequence file (the one with your tests) or to rewrite the default versions of these sequences in the process model file. There is an example of this in TestStand\Examples\Callbacks\ParallelModel\OverrideSerialNumForParallelModel.seq.
2) Runstate.Root.Locals.UUT.SerialNumber exists mainly for backwards compatibility with how the original TestStand process model worked: it is where the LabVIEW adapter expects to find this information in order to pass it to a LabVIEW VI. You also need to set Runstate.Root.Parameters.TestSocket.UUT.SerialNumber. It would be best, though, to set this in the Pre UUT sequence, either in the process model itself or in an overriding version of the sequence in your client sequence file. The reason is that the process model needs to know the serial number before it calls the MainSequence of your client sequence file, so that it can correctly determine things like the report file path and other options that depend on the serial number. See the example I mentioned in 1) for help in doing this.
3) I'm not sure about this one. Much of the speed will depend on what exactly your tests are doing, what sort of hardware they need to interact with, and how fast the hardware itself is. I think the best way to determine this is with benchmarks, as you are doing. If you find that it is too slow, try to narrow down where the slowness is by eliminating possibilities. In my experience it isn't always where you'd expect.
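To illustrate the "narrow it down" idea in 3), here is a rough Python sketch, not TestStand code. It times each stage of a test separately and reports the slowest one. The stage names and the sleep calls are made-up placeholders for whatever your tests actually do; only the timing pattern is the point.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Add the elapsed wall-clock time of the enclosed block to results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        results[label] = results.get(label, 0.0) + elapsed


def slowest(results):
    """Return the label with the largest accumulated time."""
    return max(results, key=results.get)


def run_example():
    """Time three hypothetical stages; sleeps stand in for real work."""
    results = {}
    with timed("instrument_setup", results):
        time.sleep(0.01)   # placeholder for hardware configuration
    with timed("measurement", results):
        time.sleep(0.05)   # placeholder for the actual measurement
    with timed("logging", results):
        time.sleep(0.005)  # placeholder for result logging
    return results


if __name__ == "__main__":
    results = run_example()
    for label, seconds in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{label}: {seconds * 1000:.1f} ms")
    print("slowest stage:", slowest(results))
```

Once one stage stands out, you can split it into smaller timed pieces and repeat until you find the real bottleneck.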
Anyway, hope this helps,
Doug Melamed
NI