Optimizing data routing on the fly

BertMcMahan · ‎09-30-2021

I'm working on a data parser that accepts incoming VISA serial packets, each of which can contain several types of data. I need to look at each packet, extract the relevant parameters from it, and send the data to various plots. Currently I'm doing this "in post": I wait for a test to finish, then process all of the packets. I'd like to do this in "sort of" real time so the user can see the plots build up during the test.

Packet rate is 1-2 kHz, and each packet can have one or more different "extractables" in it. Tests run for 60 seconds.

For example, I might get:

Packet 1- Start|Channel1:923|Channel2:1000|Endpacket

Packet 2- Start|SystemVoltage:10.2|SystemCurrent:0.3|Endpacket

Packet 3- Start|Channel1:900|Channel2:1001|Endpacket

Packet 4- Start|SystemVoltage:10.0|SystemCurrent:0.2|Endpacket

...
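For readers who want a text-language reference for the packet format above, here is a minimal Python sketch of the extraction step. It assumes every packet is framed by literal `Start` and `Endpacket` fields and that every value is numeric; the function name `parse_packet` is made up for illustration, and the actual parser in this thread is a LabVIEW VI.

```python
from typing import List, Tuple

def parse_packet(packet: str) -> List[Tuple[str, float]]:
    """Split one 'Start|name:value|...|Endpacket' string into (name, value) pairs."""
    fields = packet.strip().split("|")
    # Assumption: a valid packet starts with 'Start' and ends with 'Endpacket'.
    if len(fields) < 2 or fields[0] != "Start" or fields[-1] != "Endpacket":
        raise ValueError(f"malformed packet: {packet!r}")
    pairs = []
    for field in fields[1:-1]:
        # 'Channel1:923' -> name 'Channel1', raw value '923'
        name, _, raw = field.partition(":")
        pairs.append((name, float(raw)))
    return pairs

pairs = parse_packet("Start|Channel1:923|Channel2:1000|Endpacket")
print(pairs)
```

Returning (name, value) pairs keeps the parser independent of whatever storage scheme ends up holding the data.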

I can parse and extract the data easily enough. What I'm wondering is the best/quickest way to sort the data on-the-fly and avoid a million array resizes. A sample test might have 120,000 packets, each with 2-10 different elements. All datatypes will be known ahead of time, and I can estimate the number of total packets if preallocating will help.

Here are my ideas so far:

1- Use a map with a string Key ("Channel1", "SystemVoltage", etc) and an array value. Each packet grabs one or more elements, looks up the array in the Map, and appends. Conditional code could update the plot at, say, 5 Hz.

2- Use a map with a string Key and a Queue to contain the data, flushing the queue asynchronously at 5 Hz or so, adding each batch to an array which gets plotted to the screen. I could preallocate space by filling each queue with zeros then flushing before starting the operation.

3- Dump everything to a temp SQLite database and do queries to sort the data into meaningful plots (i.e., get all elements where "tag"=Channel1). This is new territory for me.

I'll be adding data very quickly. If there are on average 5 data elements per packet coming in at 2 kHz, that's 10,000 additions per second. That seems too fast for option 1, which would resize arrays very frequently unless I preallocate. Option 2 seems nice; I haven't tried it myself, but I've heard performance is good. Option 3 is the least coupled, but I haven't tested the speed limits of database access before. Will 10,000 additions per second be too fast? Am I overthinking this? Is my dream of "semi realtime plot updates" out of reach?
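As a rough way to probe option 3, here is a small Python sketch using the standard-library `sqlite3` module. It assumes a single three-column table (the table name, column names, and index name are all made up for illustration) and inserts one second's worth of data, 10,000 rows, in a single transaction. Batching inserts into one transaction is generally far cheaper than committing row by row, but the timing printed here only says something about the machine it runs on, not about a LabVIEW SQLite toolkit.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # a temp file path would be used the same way
conn.execute("CREATE TABLE samples (packet_no INTEGER, tag TEXT, value REAL)")
conn.execute("CREATE INDEX idx_samples_tag ON samples (tag)")

def insert_batch(connection, rows):
    """Insert many (packet_no, tag, value) rows inside one transaction."""
    with connection:  # commits once when the block exits
        connection.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)

# Simulated data: 2,500 packets x 4 tags = 10,000 rows (values are placeholders).
tags = ["Channel1", "Channel2", "SystemVoltage", "SystemCurrent"]
rows = [(n, tag, float(n)) for n in range(2500) for tag in tags]

start = time.perf_counter()
insert_batch(conn, rows)
elapsed = time.perf_counter() - start
print(f"inserted {len(rows)} rows in {elapsed:.4f} s")

# "Get all elements where tag = Channel1", in arrival order.
channel1 = [value for (value,) in conn.execute(
    "SELECT value FROM samples WHERE tag = ? ORDER BY packet_no", ("Channel1",))]
print(len(channel1))
```

In a live test the batch would be whatever accumulated since the last flush, so the 5 Hz plot update and the database commit could share one timer.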

Thanks for any help you can offer. I will prototype this soon but it'll save lots of time if someone here has done something similar 🙂

Kevin_Price · ‎09-30-2021

FWIW, I was also leaning toward option 2 before getting to your concluding thoughts. No specific experience, but it seems like the kind of thing where you could mock up some proof-of-concept test code in an hour or two, right?

Top of the head thoughts: pre-generate a bunch of random but realistic-looking packets. Then maybe take the unusual step of feeding them into your "engine" via queue with a fixed size of 1. Put a timeout on the producer side. Now you'll feed your consumer engine as fast as it can consume & process but no faster. Measure your total time to deliver all packets to figure out how many packets/sec your engine can consume & process.

There's the design, now the mockup is down to about an hour of mere implementation...
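The test harness described above can be sketched in Python roughly as follows. This is only an illustration of the idea (pre-generated packets, a size-1 queue, a timeout on the producer side, and a total-time measurement). The helper names and the trivial parsing "engine" are assumptions, and the absolute numbers it prints will not match what a LabVIEW implementation achieves.

```python
import queue
import threading
import time

def make_packets(count):
    """Pre-generate made-up but realistic-looking packets."""
    return [f"Start|Channel1:{900 + i % 50}|Channel2:{1000 + i % 7}|Endpacket"
            for i in range(count)]

def consume(q, results):
    """Stand-in for the real engine: parse packets until a None sentinel arrives."""
    while True:
        packet = q.get()
        if packet is None:
            break
        for field in packet.split("|")[1:-1]:
            name, _, raw = field.partition(":")
            results.setdefault(name, []).append(float(raw))

def benchmark(count=5000, timeout_s=1.0):
    packets = make_packets(count)
    q = queue.Queue(maxsize=1)  # size-1 queue: the producer cannot run ahead
    results = {}
    worker = threading.Thread(target=consume, args=(q, results), daemon=True)
    worker.start()
    start = time.perf_counter()
    for packet in packets:
        q.put(packet, timeout=timeout_s)  # raises queue.Full if the consumer stalls
    q.put(None, timeout=timeout_s)        # tell the consumer to stop
    worker.join()
    elapsed = time.perf_counter() - start
    return count / elapsed, results

rate, results = benchmark()
print(f"engine consumed about {rate:.0f} packets/s")
```

Because the queue holds only one element, the measured rate is set by the consumer, which is the number the comparison against the 1-2 kHz packet rate needs.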

-Kevin P

ALERT! LabVIEW's subscription-only policy came to an end (finally!). Unfortunately, pricing favors the captured and committed over new adopters -- so tread carefully.

Dobrinov · ‎09-30-2021

Based on your example I'd prepare as many separate queues (of type DBL, I guess) as you have plots - in this case that would be:

Queue 1: Channel 1

Queue 2: Channel 2

Queue 3: SystemVoltage

Queue 4: SystemCurrent (or combine with SystemVoltage as a pair in a cluster?)

And so on.

I'd test whether each packet can be processed fast enough in your VISA processing loop. If so, feed each numerical value (converted from string, and only the numerical values, for efficiency) into the corresponding queue. By processing the packets I mean: scan each string for "Channel1", "Channel2", "SystemVoltage" and "SystemCurrent", cut out the number for each parameter, convert as you please, and feed it into the corresponding queue based on which string "identifier" was found.

You can then implement another loop, or loops, to asynchronously process the queues, one for each plot. Display the processed data at your leisure.
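A rough Python rendering of this layout, for illustration only: one queue per plot, a routing function standing in for the VISA processing loop, and one consumer thread per queue standing in for the plot loops. The `STOP` sentinel and the `plot_data` dictionary are assumptions added so the sketch can shut down cleanly and be inspected.

```python
import queue
import threading

NAMES = ["Channel1", "Channel2", "SystemVoltage", "SystemCurrent"]
queues = {name: queue.Queue() for name in NAMES}  # one queue of doubles per plot
plot_data = {name: [] for name in NAMES}          # what each plot would display
STOP = object()                                   # sentinel that ends a consumer

def route(packet):
    """Producer side: scan the packet and enqueue only the numeric values."""
    for field in packet.split("|")[1:-1]:
        name, _, raw = field.partition(":")
        if name in queues:
            queues[name].put(float(raw))

def plot_loop(name):
    """Consumer side: drain one queue and accumulate its values."""
    q = queues[name]
    while True:
        item = q.get()
        if item is STOP:
            break
        plot_data[name].append(item)

workers = [threading.Thread(target=plot_loop, args=(name,)) for name in NAMES]
for worker in workers:
    worker.start()

for packet in [
    "Start|Channel1:923|Channel2:1000|Endpacket",
    "Start|SystemVoltage:10.2|SystemCurrent:0.3|Endpacket",
    "Start|Channel1:900|Channel2:1001|Endpacket",
    "Start|SystemVoltage:10.0|SystemCurrent:0.2|Endpacket",
]:
    route(packet)

for name in NAMES:
    queues[name].put(STOP)
for worker in workers:
    worker.join()
print(plot_data)
```

The producer never touches the plot arrays, so a slow display loop only lets its own queue grow instead of stalling the serial read.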

Off the top of my head.

"Good judgment comes from experience; experience comes from bad judgment." Frederick Brooks

altenbach · ‎09-30-2021

Do you have a quick mock up with simulated data and the desired output? Thanks!

I really doubt that maps would do anything useful, because you only have 2-10 keys, where an O(log N) lookup is not significantly better than an O(N) lookup, especially considering the extra overhead of maps. You could use an (almost) static map to translate names into, e.g., row indices of a 2D array that contains all data (or the value could be a cluster of [row index, next element position], etc., whatever is needed to replace the NaN with real data at the correct position).
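One way to picture the preallocated 2D array idea is the Python sketch below. It assumes a fixed capacity per row, keeps the "next element position" in a separate list rather than inside the map value, and uses the standard-library `array` module in place of a LabVIEW 2D array. The class and method names are made up for illustration.

```python
import math
from array import array

class ChannelTable:
    """Fixed-size rows of doubles, pre-filled with NaN, one row per data name."""

    def __init__(self, names, capacity):
        # Lookup built once up front: name -> row index.
        self.row_of = {name: i for i, name in enumerate(names)}
        self.rows = [array("d", [math.nan]) * capacity for _ in names]
        self.next_pos = [0] * len(names)  # next free column in each row

    def add(self, name, value):
        row = self.row_of[name]
        pos = self.next_pos[row]
        self.rows[row][pos] = value  # overwrite a NaN in place; no resize
        self.next_pos[row] = pos + 1  # IndexError above if capacity is exceeded

    def filled(self, name):
        """Return only the part of the row that holds real data."""
        row = self.row_of[name]
        return self.rows[row][: self.next_pos[row]]

table = ChannelTable(
    ["Channel1", "Channel2", "SystemVoltage", "SystemCurrent"], capacity=120_000)
table.add("Channel1", 923.0)
table.add("SystemVoltage", 10.2)
table.add("Channel1", 900.0)
print(list(table.filled("Channel1")))
```

Rows that receive data less often simply keep more trailing NaN, which most plotting code can be told to skip.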

Is each type of data a separate graph or all on the same graph?

LabVIEW Champion.

BertMcMahan · ‎09-30-2021

Thanks for the responses everyone. Looks like I have two votes for option 2 and one for option 1 (slightly modified). I'll get to prototyping and will let you know what I come up with in case it helps others in the future.

@altenbach wrote:

"Do you have a quick mock up with simulated data and the desired output? Thanks!"

Simulating the data is my next step. I was just hoping someone had some thoughts on this already.

"I really doubt that maps would do anything useful, because you only have 2-10 keys, where an O(log N) lookup is not significantly better than an O(N) lookup, especially considering the extra overhead of maps."

Makes sense. The maps would make things easier to read, but it sounds like the overhead might be a problem.

"You could use an (almost) static map to translate names into, e.g., row indices of a 2D array that contains all data (or the value could be a cluster of [row index, next element position], etc., whatever is needed to replace the NaN with real data at the correct position)."

Good idea. I didn't think of a giant preallocated 2D array to contain everything. Unfortunately some data appears more frequently than others, but like you said, a NaN would probably solve that issue.

"Is each type of data a separate graph or all on the same graph?"

Multiple graphs. Ideally this would be flexible enough that it doesn't care about the target and just provides a handful of arrays so that other code could manipulate and display them as it pleases.

I will reply back with the results of the prototyping!
