Hello, I would like to clarify a few points:
1. Multiple memory reads cannot run in parallel in an SCTL. I'm assuming that this is because two memory locations cannot be accessed simultaneously.
This is correct. A memory block cannot be accessed by more than one function at the same time. Because everything inside a SCTL executes within a single clock tick, multiple memory calls in the loop would occur simultaneously. The memory block is essentially a shared resource, so when it has multiple accessors an arbiter must decide which function gets access. Outside a SCTL this arbitration is usually fine, but arbitration is not supported inside a SCTL. That is what the initial error you saw (Invalid Arbitration for SCTL) was telling you. While some functions (I/O) let you modify the arbitration options, the memory blocks do not. If you are going to use a memory block in a SCTL, you cannot have two memory reads or two memory writes on the block diagram.
2. I initially missed the fact that the shift register to which the Memory Read vi was attached must be uninitialized. The Memory Read vi will not work in the SCTL if the shift register is initialized.
Correct! For the memory and lookup table functions in a SCTL, the shift register on the loop is actually a reference to a register inside the memory/lookup table. You cannot access this register externally because it is a component of the memory block. Forcing the programmer to use shift registers on the loop makes the program execution easier to understand, i.e. the output of the memory block has a one-iteration delay. This would not be obvious if you could wire an indicator directly to the memory block. Of course, you can use the "Select" and "First Call?" functions to write a simple initialization routine inside the SCTL.
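To make the iteration delay concrete, here is a rough software model in Python. It is only a sketch of the behavior described above, not LabVIEW code: the function name run_sctl, the default parameter, and the list standing in for the memory block are all made up for illustration.

```python
def run_sctl(memory, addresses, default=0):
    """Model a loop in which a memory read result arrives one iteration late."""
    # Models the uninitialized shift register, i.e. the memory's own output register.
    read_register = None
    outputs = []
    for i, addr in enumerate(addresses):
        first_call = (i == 0)  # stands in for the "First Call?" function
        # Stands in for "Select": use a known value on the first iteration,
        # when the register does not yet hold a valid read result.
        value = default if first_call else read_register
        outputs.append(value)
        # The read issued in this iteration is only visible in the next one.
        read_register = memory[addr]
    return outputs


if __name__ == "__main__":
    print(run_sctl([10, 20, 30, 40], [0, 1, 2, 3]))
```

In this model the value read from address 0 shows up in the second iteration, and the first iteration outputs the substituted default instead of undefined data.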
3. Use of the Memory Read vi also appears to drastically limit the amount of other processing which can be performed in the SCTL. There also appears to be a relation between this and the amount of processing being performed in other parts of the block diagram.
I would not agree with this statement. The memory function takes an entire tick to execute, which means you cannot place code in series with the memory function in a SCTL. You can, however, use pipelining to execute other functions in parallel with the memory function. Since an FPGA supports truly parallel operations, the memory function plays no role in the execution of parallel functions. I am not sure which resources you were referring to, but code within a SCTL should use fewer slices than code outside a SCTL. I have attached a screenshot of a simple test of this. The build reports show that the code using the SCTL uses fewer slices, so perhaps the additional resource usage is coming from somewhere else in your code.
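As a rough illustration of the pipelining idea, the Python sketch below models two stages separated by a register: the read stage fetches a new value every tick while the processing stage works on the value fetched in the previous tick. This is a software analogy under my own assumptions, not LabVIEW code; pipelined_loop, process, and the variable names are hypothetical.

```python
def pipelined_loop(memory, addresses, process):
    """Model a two-stage pipeline: stage 1 reads memory, stage 2 processes."""
    stage_value = None   # pipeline register between the two stages
    stage_valid = False  # whether the pipeline register holds a real value
    results = []
    ticks = 0
    # One extra tick (address None) flushes the last value out of the pipeline.
    for addr in list(addresses) + [None]:
        ticks += 1
        # Stage 2: process the value that was read during the previous tick.
        if stage_valid:
            results.append(process(stage_value))
        # Stage 1: the memory read occupies this whole tick.
        if addr is not None:
            stage_value = memory[addr]
            stage_valid = True
        else:
            stage_valid = False
    return results, ticks


if __name__ == "__main__":
    print(pipelined_loop([1, 2, 3], [0, 1, 2], lambda x: x * 2))
```

The point of the model is that N reads complete in N + 1 ticks: after the first tick fills the pipeline, one processed result comes out per tick even though no processing sits in series with the read.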
4. Perhaps the most intriguing observation is that the Memory Read vi does not appear to work in the SCTL if the Memory Read vi is used at any other point in the block diagram, even if it is in series with the SCTL.
This is a good point and goes along the lines of your first point. To handle multiple accessors, we insert arbiters. The code generation does not detect that your functions will never execute at the same time, so it inserts an arbiter anyway. An elegant workaround is to implement a state machine within a single SCTL. That way you use only one memory read block, and it is in the SCTL.
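The state machine workaround can be sketched roughly as follows. This is a Python model under my own assumptions, not LabVIEW code: the state names READ_A, READ_B and COLLECT and the function state_machine_loop are invented for illustration. The key feature is that there is exactly one memory read in the loop body, and the current state only chooses which address it uses.

```python
def state_machine_loop(memory, addr_a, addr_b):
    """Model one loop with a single memory read shared between states."""
    state = "READ_A"
    read_register = None  # output register of the single memory read
    a = b = None
    while state != "DONE":
        if state == "READ_A":
            address, next_state = addr_a, "READ_B"
        elif state == "READ_B":
            a = read_register  # value requested in READ_A arrives now
            address, next_state = addr_b, "COLLECT"
        else:  # COLLECT
            b = read_register  # value requested in READ_B arrives now
            address, next_state = None, "DONE"
        # The only memory read in the loop; the state selects its address.
        if address is not None:
            read_register = memory[address]
        state = next_state
    return a, b


if __name__ == "__main__":
    print(state_machine_loop([5, 7, 9], 0, 2))
```

Because every access goes through the same read, there is only one accessor to the memory and nothing for an arbiter to resolve.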