LabVIEW Idea Exchange

ChrisLudwig

Large Datasets XY Graph (with autobuffering and display envelope decimating)

Status: New

I am extending an old idea, but the implementation is different from the OP's, so I made this a new idea:

https://forums.ni.com/t5/LabVIEW-Idea-Exchange/Decimation-feature-built-into-the-graph-indicators/id...

 

What I want is an XY graph with automatic disk buffering and on-screen decimation. Imagine this: I drop a super-duper large-datasets XY graph, then start sending data to it in chunks of XY pairs (updating the graph at 10 Hz while acquisition runs at 5000+ Hz). We are acquiring lots of high-rate data. The user wants to see XX seconds on screen, probably 60 or 120 seconds, or maybe 10 or 30 minutes, whatever. That standard plot width in time is defined as a property of the plot. So now data flows in and is buffered to a temp TDMS file on disk, with only the last XX seconds of data showing on the graph. The user can specify a file location for the plot buffers in the plot properties (read-only at runtime).

 

We decimate the incoming data as follows (a rough sketch follows the list):

  • Calculate the maximum possible pixel width of the graph for the largest single attached monitor
  • Divide the standard display width in time by the max pixel width to calculate the decimation interval
  • Buffer incoming data in RAM and calculate the min and max value over the time interval that corresponds to one pixel width. Write both the full-rate data and the time-stamped min and max values (at the decimation interval) to the temp TDMS
  • Plot a vertical line filling from the min to the max value at each decimation interval
  • Incoming data is always decimated at the standard rate, with both the decimated data and the full-rate data saved to file
  • In most use, the user will only watch data streaming to the XY graph without interaction. In some cases, they may grab an X scroll bar and scroll back in time. In that case the graph displays the previously decimated values, so that disk reads and processing are minimized for the scroll-back request.
  • If the user pauses the graph update, they can zoom in on X. In that case, the graph would rapidly re-zoom on the decimated envelope of the data. In the background, the raw data will be read from the TDMS and re-decimated for the current graph X range and pixel width, and the now less-decimated data will be enveloped on screen to replace the prior envelope. The user can carry on zooming in this manner until there is at least one vertical line of pixels for every data point, at which point the user sees individual points rather than an envelope between the min and max values.
  • Temp TDMS files are cleared when the graph is closed. 
  • The developer can opt to clear out the specified temp location on each launch in case a file was left on disk due to a crash.
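
To make the envelope step concrete, here is a rough sketch in Python (illustrative only, not the proposed LabVIEW implementation; `interval` is the decimation interval computed above, and the state tuple is just one way to carry a partial interval between chunks):

import math

def decimate_chunk(t, y, interval, state):
    """Fold one chunk of (t, y) samples into per-pixel min/max bars.
    state = (t0, y_min, y_max); start it as (first_timestamp, inf, -inf)."""
    t0, y_min, y_max = state
    bars = []                                # one (t0, y_min, y_max) per pixel column
    for ti, yi in zip(t, y):
        if ti - t0 >= interval:              # pixel interval complete: emit a bar
            bars.append((t0, y_min, y_max))
            t0, y_min, y_max = ti, math.inf, -math.inf
        y_min = min(y_min, yi)
        y_max = max(y_max, yi)
    return bars, (t0, y_min, y_max)          # partial interval carries into next chunk

Each returned bar becomes one vertical min-to-max line on screen, while the raw chunk goes to the temp TDMS untouched.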

This arrangement would allow unlimited zooming and graphing of large datasets without writing excessive data to the UI indicator or trying to hold excessive data in RAM.  It would allow a user to scroll back over days of data easily and safely.  The user would also have access to view the high rate data should they need to. 

 

14 Comments
wiebe@CARYA
Knight of NI

I never use waveforms, but maybe decimation is used in Graphs, not in XY Graphs or Waveform Graphs? IIRC, a waveform can be built of waveforms, so you can have gaps in the data (resulting in a non-constant dT)?

 

Not that there needs to be an excuse to skip decimation on XY/Waveform Graphs, but I think decimation was simply omitted when it got complex...

ChrisLudwig
Member

My typical application is time-series data in flight test. We typically acquire data at a rate of about 5 GB/hour, with tests running 1.5 to about 4 hours, occasionally longer, in the 6 to 12 hour range. Our highest-rate data is usually 4096 Hz, but occasionally faster, in the 5 kHz to 8192 Hz range. I know of many lab settings running their acquisition a few orders of magnitude faster than this, even. The fastest acquisition we normally encounter is to capture bearing ball-pass frequencies so we can track bearing wear. This is computationally costly, but it has saved many lives and is now common in the rotorcraft and turbine engine industries. We don't graph that data though; we just analyze it and flag exceedances or show rates of change of the analyzed metric.

 

For the 5-8 kHz data, I've used both waveform graphs and XY graphs holding Y and time. It really comes down to the data source. A waveform has a constant dt. That is often not the case for real acquired data, as acquisition for us is often driven by multiple clocks on the test article, not by the acquisition system. This is the case with avionics-databus-sourced data, for instance (the bulk of our lower-rate data). We also often have data dropouts, as we run radio telemetry between the acquisition/record system and the data processing/display system. It makes sense for us to use XY graphs to handle all that. Time is always monotonic, but we can't assume a constant dt. Ignoring the dt variation is bad practice when you are acquiring for hours and don't control the quality of the clocks.

 

It would be safe to assume a constant dt for on-screen display in some applications, but not all. I'd have to really think about that.

 

It is great that at least some graph types already employ decimation when the plot is drawn to the UI, but what we really need is a decimation scheme that streams incoming data to disk and populates the graph from disk, whether the user is scrolling or just watching the last XX minutes stream by, so that millions of samples per waveform never have to be held in RAM.

 

A SGL acquired at 4096 Hz for 6 hours is 353 MB in RAM, and that is if you are careful to limit copies of that data in RAM. Try that with 50 channels and you have an issue. That is why I'd love to see a control that manages an on-disk plot buffer for the developer. The projects I did in the past had a size limit on the plot buffer held in RAM. One actually had two buffers in RAM: one short-duration full-rate buffer, and one low-rate enveloped buffer running from the start of the test. Switching between the two buffers in the UI was not fun to manage, but we made it work. Both ideas work fine, but I usually only hold about 30 minutes of full-rate data in the plot buffer (user-configurable). I don't give the option to scroll back further than that, but I wish I could make that happen by triggering a read from disk, with decimation occurring in the read function to suit the requested X scale on the graph. The competition offers this: their data streams to a server, and the clients constantly send requests to the server for the data the user wants on the plots. They can have anything they want on a plot, but at the cost of some very, very expensive servers to keep up with all the clients' data requests. I instead prefer to stream all data to all clients via UDP and take advantage of the distributed processing power, with each client providing for its own processing overhead.
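
For reference, the arithmetic behind that figure: a SGL is 4 bytes, so 4 bytes × 4096 samples/s × 6 h × 3600 s/h = 353,894,400 bytes ≈ 354 MB per channel, and 50 such channels come to roughly 17.7 GB.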

 

I have an upcoming project where I'll be doing post-test data review from data stored in a TDMS file. I'll be making some new TDMS read VIs that allow fast access by time interval, or by start time and duration. Since dt isn't an exact constant, finding the exact indices of the start and end of a time interval involves iterating on the time array in each group of the TDMS file; a sketch of that index search follows below. This project won't need to show graphs while streaming live data to the TDMS and onto the graphs; it is just pulling from a TDMS for data review and reporting.
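
In case it helps to picture the search, here is one way it might look in Python (a sketch under assumptions: read_times(i, n) is a hypothetical helper that reads n timestamps from the TDMS time channel starting at index i, and slack bounds how far the real clock can have drifted from nominal):

from bisect import bisect_left

def find_start_index(read_times, n_samples, nominal_dt, t_file_start,
                     t_target, slack=1024):
    """Locate the index of t_target when dt is only nominally constant."""
    guess = int((t_target - t_file_start) / nominal_dt)   # first estimate
    lo = max(0, guess - slack)
    hi = min(n_samples, guess + slack)
    window = read_times(lo, hi - lo)                      # one small disk read
    return lo + bisect_left(window, t_target)             # exact index in window

If the drift can exceed slack samples, the window would need to widen or the search iterate, but the estimate-then-refine shape stays the same.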

 

I'll probably set up events on scale changes that feed an asynchronous process, which pulls data from the TDMS and updates what is in the plot buffer. While the user is zoomed out at a given time interval, I'll pull in data n plot-widths wide on either side of that interval (sketched below), either decimated during the read or at full resolution depending on the X scale. That lets the user scroll rapidly through the data and zoom in without disk reads slowing the response time down, and without filling RAM with several GB of data. If they zoom in past that last read's decimation level, the graph would refresh in a moment to the full-rate data; the same goes if they scroll forward or back past what I have already read from the TDMS. It will take some trial and error to get all that right and make a good experience. And who knows, I might fail to provide a desirable user experience if the reads or the decimation-on-read are too slow, but I think I'll be able to make it work. I just think that LV could be providing this type of solution to the developer.
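
A rough sketch of that prefetch window in Python (the names are mine for illustration, not an API):

def prefetch_span(x_min, x_max, n=1):
    """Time span to cache: the visible span plus n plot-widths either side."""
    width = x_max - x_min
    return x_min - n * width, x_max + n * width

def read_stride(span_seconds, sample_rate, pixel_width):
    """Decimation stride for a read: roughly one min/max pair per pixel;
    a stride of 1 means full-rate data (no decimation)."""
    return max(1, int(span_seconds * sample_rate // pixel_width))

The asynchronous reader would recompute both whenever a scale-change event fires, and re-read from the TDMS only when the user leaves the cached span or outruns its decimation level.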

Intaris
Proven Zealot

but what we really need is a decimation scheme that streams incoming data to disk and populates the graph from disk, whether the user is scrolling or just watching the last XX minutes stream by, so that millions of samples per waveform never have to be held in RAM.

 



This is literally what I've implemented already. It has two advantages in that it allows for automatic decimation by using smart disk reading schemes (We have guaranteed constant dT) and saves bucketloads of RAM. It actually allows us to "plot" datasets too large for 32-bit LabVIEW while still maintaining a decent performance. Windows 64-bit does a great job of caching files in any available RAM, essentially giving us "free" RAM access from within a 32-bit application space.
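
For readers wondering what smart disk reading buys you with a guaranteed constant dT: a timestamp maps straight to a file offset, so no time channel has to be stored or searched. A minimal sketch in Python (assuming a flat binary file of fixed-size samples, not TDMS):

def sample_offset(t_target, t_file_start, dt, bytes_per_sample, header_bytes=0):
    """Constant dt: the index, and hence byte offset, follow from arithmetic alone."""
    index = int((t_target - t_file_start) / dt)
    return header_bytes + index * bytes_per_sample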

 

I have evaluated TDMS in the past for exactly such an application and found it lacking. Write times degraded rapidly when writing a TDMS file incrementally. TDMS apparently cannot do a seamless append to data; it REQUIRES writing a new data header for each and every write, something which seems kind of unnecessary for a streaming file type. This not only slows write times but massively affects read times, because all of the headers need to be parsed.

ChrisLudwig
Member

@Intaris, you are correct that TDMS often has to repeat the header on incremental writes. Specifically, it writes a new header every time what is being written changes. So if you stream identical writes to the same group, it will not repeat the header. However, if data is being written to multiple groups, it will write a header at each change between groups. This would be a major problem, except that TDMS write buffering helps greatly. The TDMS Open function disables buffering by default, so enable buffering there. Then use the TDMS Set Properties VI to set NI_MinimumBufferSize to the desired buffer size. I normally configure the buffer size so that I am buffering about 1 second's worth of data before writing to disk. That greatly reduces the headers written to the TDMS file, reduces the resulting file size, and speeds up the writes. Running the TDMS defrag function post-test further reduces file size and access times. Plus, the defrag tends to be pretty quick and worth the effort. So when done right, TDMS is pretty good.
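
The effect of that buffer is easy to picture with a sketch (illustrative Python, not the TDMS API; flush_fn stands in for whatever actually writes one block plus its header):

class BufferedChannelWriter:
    """Hold about 1 second of samples in RAM and write them as one block,
    so one header is emitted per flush instead of one per write call."""
    def __init__(self, flush_fn, sample_rate, seconds=1.0):
        self.flush_fn = flush_fn
        self.capacity = int(sample_rate * seconds)
        self.buf = []

    def write(self, samples):
        self.buf.extend(samples)
        if len(self.buf) >= self.capacity:
            self.flush_fn(self.buf)    # single header, one large block
            self.buf = []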