If I understand correctly, you want to read 16 bits in a single pass using the parallel port.
That may be a challenge, since there are only 8 data bits and 4 control bits (if my memory is correct).
You can use the 4 control bits for data if you set the port up properly (I don't recall the whole procedure for that one).
However, with 8 + 4 = 12 lines you would still be 4 bits short of a 16-bit word for a write or read. The alternative is to do it in two passes.
You could use a 16-bit latch with a control signal to capture / release the data. However, you would still need two cycles for each operation (write or read), one per byte.
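To make the two-cycle idea concrete, here is a rough sketch in Python of how the two 8-bit reads could be stitched back into one 16-bit word. It does not touch real hardware: FakeLatch, select_byte and read_data are names I made up to stand in for the external latch, the control line that picks which half is presented, and the read of the 8 data lines. Treat it as an illustration under those assumptions, not as working port code.

```python
from typing import Callable


def read_word_two_pass(select_byte: Callable[[int], None],
                       read_data: Callable[[], int]) -> int:
    """Assemble a 16-bit word from two 8-bit reads (low byte first)."""
    select_byte(0)                 # ask the latch to present the low byte
    low = read_data() & 0xFF
    select_byte(1)                 # ask the latch to present the high byte
    high = read_data() & 0xFF
    return (high << 8) | low


class FakeLatch:
    """Software stand-in for a 16-bit latch (hypothetical, for illustration)."""

    def __init__(self, word: int) -> None:
        self.word = word & 0xFFFF  # the captured 16-bit value
        self.selected = 0          # 0 = low byte, 1 = high byte

    def select_byte(self, which: int) -> None:
        # On real hardware this would toggle one of the control lines.
        self.selected = which

    def read_data(self) -> int:
        # On real hardware this would read the 8 data lines.
        return (self.word >> (8 * self.selected)) & 0xFF


latch = FakeLatch(0xBEEF)
value = read_word_two_pass(latch.select_byte, latch.read_data)
print(hex(value))  # prints 0xbeef
```

The latch matters because it holds all 16 bits steady between the two reads, so both halves come from the same sample even though the port only sees 8 bits at a time.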
It's been a while since I read that PDF document. I do remember that there were tricks you could do with the port, but I cannot remember any for extending it to 16 bits.
R.