Hi Vishal,
The PCI-6534 has onboard memory plus a FIFO that is used to transfer data to the PC buffer. This FIFO, which feeds the DMA transfer from the card to PC memory, is 16 samples deep. During operation, data moves from the 6534's onboard memory into the "DMA" FIFO in blocks of a given size, and DMA then transfers it from the FIFO to PC memory. The block sizes used for both steps (onboard memory to the "DMA" FIFO, and the FIFO to PC memory) depend on the pattern clock speed and other factors, and we don't have access to these parameters. The only figure of value here is that the computer's DMA will be able to handle data that is 8 bits wide at 20 MHz (20 MB/s).
Once this data has been transferred from the card to PC memory, that memory will fill up unless LabVIEW or another application reads the buffer. Typically, you will use a buffer transfer call to read a user-configurable block of data from PC memory into the LabVIEW application memory. The parameters you do have access to are the PC buffer size, and the block size and rate at which you read from the PC buffer into the programming environment's memory. You will have to adjust these parameters to find the right values for your particular application.
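To get a feel for how those three parameters interact, here is a rough back-of-the-envelope sketch in Python. It is plain arithmetic, not NI-DAQ or LabVIEW code, and the function name and arguments are made up for illustration. It assumes the card fills the PC buffer at a steady rate and that your application drains it in fixed-size blocks at a steady rate.

```python
def seconds_until_full(buffer_bytes, sample_rate_hz, bytes_per_sample=1,
                       read_block_bytes=0, reads_per_second=0.0):
    """Estimate how long the PC buffer lasts before it overflows.

    Hypothetical helper: returns None when the application reads at least
    as fast as the card writes, so the buffer never fills up.
    """
    fill_rate = sample_rate_hz * bytes_per_sample      # bytes/s written by DMA
    drain_rate = read_block_bytes * reads_per_second   # bytes/s read by the app
    net_rate = fill_rate - drain_rate
    if net_rate <= 0:
        return None
    return buffer_bytes / net_rate


# 8-bit data at 20 MHz into a 20 MB buffer that nobody reads
print(seconds_until_full(20_000_000, 20_000_000))

# Same acquisition, reading 1 MB blocks 10 times per second
print(seconds_until_full(20_000_000, 20_000_000,
                         read_block_bytes=1_000_000, reads_per_second=10))
```

If the result is None, your read block size and read rate are keeping up with the acquisition; otherwise you need a bigger buffer, bigger blocks, or more frequent reads.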
As for the restriction of using port 0 and port 2 before port 1 and port 3: ports 0 and 1 form one group, and ports 2 and 3 form another. The two groups can operate completely independently of each other, but ports 0 and 1 cannot operate independently of one another (the same goes for ports 2 and 3). The card was designed so that a second byte can be added to form a word. However, the way the card works, if you only need 8 bits you will always use the lower-order byte (port 0 or port 2, depending on the group). Not being able to use port 1 instead of port 0 shouldn't be a big deal, though, since if you are only using 8 bits you might as well use port 0. And yes, you can have four 8-bit ports (ports 0 and 1 clocked together in the same direction, and ports 2 and 3 clocked together in the same direction), or two 16-bit ports (which can have the same or different directions). Finally, all four ports can be combined to create a 32-bit word (obviously clocked together and in the same direction).
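If it helps, the port rule above can be written down as a small check. This is only my own model of the rule in Python, not a driver call: the group labels "A" and "B" and the function name are invented for the example, and it assumes the two groups really are independent, so 16 bits on one group plus 8 bits on the other is treated as allowed.

```python
# Hypothetical model of the port rule; not part of any NI API.
# Each group is (low-order byte port, high-order byte port).
GROUPS = {"A": (0, 1), "B": (2, 3)}


def is_valid_port_selection(ports):
    """Return True if the selected ports respect the low-byte-first rule."""
    ports = set(ports)
    if not ports or not ports <= {0, 1, 2, 3}:
        return False
    for low, high in GROUPS.values():
        # The high-order byte of a group can only be used with its low byte.
        if high in ports and low not in ports:
            return False
    return True


for selection in ([0], [1], [0, 1], [2, 3], [3], [0, 1, 2, 3]):
    print(selection, is_valid_port_selection(selection))
```

So [0], [0, 1], [2, 3] and [0, 1, 2, 3] pass, while [1] or [3] on their own do not.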
I'm not sure if I completely answered your question but I hope that helps a bit.
Ron