Ben, I do believe you are correct. I usually don't worry much about running in the UI thread unless it hurts performance enough to cause problems. It very well might in your case, since just about everything will be running there. However, there is usually a workaround. In this case, you might try allocating your data as an array of clusters, each cluster containing an array. This gives you the same effect as the LV2 global, but the code now runs in a non-UI thread. You can keep the data in a local shift register for easy access. The downside is that you now have a gigabyte-sized wire, so you have to be very careful about how you use it. It is probably doable, however, especially in LV7.1, where in-place memory handling is fairly good.
Check out the attached VI (LV6.1) for a simple test of this concept. Since each channel is allocated separately from the main array, the data does not need to fit in one contiguous block of memory, so you can use more of your total memory. I managed to allocate about 1.5 GBytes on a system with 0.5 GBytes of physical RAM, with each channel holding 50,000 data points. Yes, it took a while, and my disk drive aged 5 days in 5 minutes.
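LabVIEW diagrams don't translate to text, so here is a rough Python analogue of the data layout, offered as a sketch rather than a port of the attached VI. The outer list stands in for the array of clusters, and each `array("d")` stands in for one channel's data: every channel is its own contiguous buffer, so the memory manager only has to find many moderately sized free blocks instead of one huge one. Writing into a channel by index modifies it in place, which is roughly what you want from the shift-register approach. The helper names (`allocate_channels`, `write_block`) and the float64 element type are assumptions for illustration, not taken from the original VI.

```python
from array import array

def allocate_channels(n_channels, points_per_channel):
    """Hypothetical helper: one separately allocated float64 buffer per channel.

    The outer list only holds references, so no single allocation is
    larger than one channel (points_per_channel * 8 bytes).
    """
    return [array("d", [0.0]) * points_per_channel for _ in range(n_channels)]

def write_block(channels, channel, start, values):
    """Overwrite part of one channel in place, without copying the other channels."""
    buf = channels[channel]
    # Same-length slice assignment replaces elements in the existing buffer.
    buf[start:start + len(values)] = array("d", values)

channels = allocate_channels(4, 50_000)
write_block(channels, 2, 100, [1.5, 2.5, 3.5])
```

The same caution about the gigabyte-sized wire applies here: anything that copies the outer structure wholesale (for example, building a new list of copied channels) defeats the point of allocating the channels separately.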
I have also had some success with non-thread-safe DLLs (HDF5, for example) by using LabVIEW synchronization tools (semaphores, notifiers, and so on) to serialize access, and by making the DLL calls reentrant so they run outside the UI thread. Performance improves, but you have to be very careful. Not all DLLs will handle this gracefully.
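The serialize-then-call-from-any-thread pattern is not specific to LabVIEW. In this hedged Python sketch, `unsafe_library_call` is a hypothetical stand-in for a non-thread-safe DLL function (it is not HDF5's API): it keeps its working state in a shared module-level buffer, so unsynchronized concurrent calls could corrupt each other. A single shared `threading.Lock` plays the role of the LabVIEW semaphore, and `run_workers` (also hypothetical) calls the library from several ordinary threads, only one of which is inside the library at any moment.

```python
import threading

# Hypothetical stand-in for a non-thread-safe DLL routine: it works in a
# shared scratch buffer, so interleaved calls could corrupt each other.
_scratch = []

def unsafe_library_call(values):
    _scratch.clear()
    for v in values:
        _scratch.append(v * 2)
    return sum(_scratch)

# One lock shared by every caller, playing the role of a LabVIEW semaphore.
_library_lock = threading.Lock()

def serialized_call(values):
    """Call the unsafe routine from any thread, one caller at a time."""
    with _library_lock:
        return unsafe_library_call(values)

def run_workers(n_workers, values):
    """Run n_workers threads that each call the library through the lock."""
    results = [None] * n_workers

    def worker(i):
        results[i] = serialized_call(values)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_workers(8, list(range(1000)))
```

The lock only protects you if every call path goes through `serialized_call`; a single unguarded call elsewhere reintroduces the race. A library that depends on being called from one particular thread, or that keeps per-thread state, can still misbehave even when calls are serialized, which is one reason not all DLLs tolerate this.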