Well, there's a certain amount of overhead in VISA as it translates from its own high-level API to the serial, GPIB, or other more hardware-specific API underneath. VISA may well be doing some buffering or caching between these layers. I'm not sure this is something you need to 'correct,' as it may be VISA's intended behavior.
If you're just looking to benchmark response time from your instrument, you could send something other than the tick count in your VISA Write node and start your timer immediately after the write. True, you'd only be timing one half of your communications, but the receive side is probably the half you're interested in anyway.
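For anyone who'd rather see that idea in text form than as a block diagram, here's a rough Python sketch of the same timing approach. It's not LabVIEW and not the VISA API itself: `time_response`, `write`, and `read` are names I made up for illustration, and the default `*IDN?` command assumes your instrument answers the standard identification query. The `fake_read` stand-in just sleeps to simulate an instrument taking a while to reply.

```python
import time

def time_response(write, read, command="*IDN?"):
    """Send a fixed command, then time only the wait for the reply.

    `write` and `read` are hypothetical callables standing in for
    whatever sends to and receives from the instrument.
    """
    write(command)                       # send half: deliberately not timed
    start = time.perf_counter()          # start the timer right after the write returns
    reply = read()                       # blocks until the reply arrives
    elapsed = time.perf_counter() - start
    return reply, elapsed

if __name__ == "__main__":
    # Fake instrument for demonstration only (assumption: ~50 ms reply delay).
    def fake_write(cmd):
        pass

    def fake_read():
        time.sleep(0.05)
        return "FAKE,INSTRUMENT,0,0"

    reply, elapsed = time_response(fake_write, fake_read)
    print(f"reply={reply!r} after {elapsed * 1000:.1f} ms")
```

If you happen to be using PyVISA, you could probably pass an open resource's `write` and `read` methods in place of the fakes. Either way, the number you get still includes any buffering VISA does on the receive side, so treat it as "time until VISA hands back the reply," not pure instrument latency.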