
High CPU usage: using clusters vs handling data individually?

Greetings Forum Members, 

We have encountered a problem, and I would like to ask for your opinion. We have a LabVIEW application that reads data from the CAN bus, processes it, sends out a control message based on the inputs, and saves everything into a TDMS file. There are two loops running in parallel: a CAN message read loop and the main control loop.
I developed the first version of this software with my then-limited knowledge, but the program did what was needed. This year I asked my students to rewrite the whole code: the main task stays the same, but the data on the block diagram is handled with a different approach. They have done so, but I noticed that the new code uses much more CPU (around 10-11%) than the old one (2-3%) while running on the same machine.

The fundamental difference between the old and new versions:
- In the old code, each piece of data was handled individually, several local variables were used between the loops, and in the main control loop everything was wired separately.
- In the new code, one big cluster is used to handle all the data. The reason is that the block diagram is much easier to understand. In subVIs, the In Place Element structure is used to bundle and unbundle data from the cluster. The cluster is not that huge in my opinion: around 200 elements. (A rough text analogue of the two styles is sketched right after this list.)
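
LabVIEW code is graphical, so purely as a rough text analogue of the two styles (a Python sketch with made-up field names, not taken from the attached VIs):

from dataclasses import dataclass, replace

# "Old" style: each value travels on its own wire, so updating one value
# only touches that value.
def control_step_old(speed, torque):
    return speed * 1.1, torque

# "New" style: the whole ~200-element cluster flows through every subVI.
# Unless the compiler keeps the operation in place, changing one field can
# mean copying the entire structure.
@dataclass
class BigCluster:
    speed: float = 0.0
    torque: float = 0.0
    # ... roughly 200 further fields ...

def control_step_new(data: BigCluster) -> BigCluster:
    return replace(data, speed=data.speed * 1.1)   # returns a full copy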

My question is: why is the new software performing so much worse? Do we need to rewrite the code in the old style, or just use smaller clusters?
I have heard that using clusters needs more CPU and memory, but this difference seems large to me. The end goal is to run the software on a Microsoft Surface Go 4, which is much weaker in hardware terms than the laptop we used for measuring the usage. I fear we will lose battery life with the high CPU usage, and get freezes as well.

I have attached the old and new code. Thank you for your replies. 

Message 1 of 16

If you haven't already, I would suggest doing some additional benchmarking to figure out exactly where the hot spots are.

 

Copying more data does take more time, but I would want to make sure there wasn't some other change made while updating the code that is causing the slowdown.
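
In text form the idea is just to timestamp each candidate section and compare; a minimal sketch (Python, placeholder stage names; in LabVIEW the equivalent would be tick counts around the suspect code or the Profile Performance and Memory tool):

import time

def timed(label, func, *args):
    # Run one stage of the loop and report how long it took.
    start = time.perf_counter()
    result = func(*args)
    print(f"{label}: {(time.perf_counter() - start) * 1e3:.2f} ms")
    return result

# Usage idea (stage functions are placeholders for the real VIs):
# frame   = timed("read CAN frame", read_can_frame)
# decoded = timed("decode", decode_frame, frame)
# timed("control + TDMS write", process_and_log, decoded)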

Message 2 of 16

I see very little reason for the sequence structures in the new code. Is that just for organization and labeling?

 

It is very difficult (i.e. impossible) to analyze truncated pictures of block diagrams.

Do you really have a front panel indicator for that gigantic cluster? Maybe use a DVR instead and leave the UI thread out of it.

Is this LabVIEW or LabVIEW RT? (You are using timed loops with assigned cores!) Can both loops keep up with the requested 50 ms? If so, why not place everything in a single plain while loop and let the compiler figure out the best multicore assignments?
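
To spell out the DVR idea outside of G: both loops would share a single reference to the data instead of pushing a fresh copy of the whole cluster into a front panel indicator every iteration. A very loose Python analogue (names invented):

import threading

class SharedData:
    # One instance shared by reference between the loops, guarded by a lock,
    # rather than a copy of the full cluster flowing through a UI indicator.
    def __init__(self):
        self.lock = threading.Lock()
        self.values = {}        # stands in for the big cluster

shared = SharedData()

def update_value(name, value):
    # Modify the shared data in place; no full copy of the structure is made.
    with shared.lock:
        shared.values[name] = value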

 

 

Message 3 of 16

I would use a Producer/Consumer architecture in this application. 

 

The Producer loop, running at whatever speed the CAN data arrives, should only receive the CAN data and place it in the queue.

 

The Consumer loop dequeues and decodes the CAN message, then does whatever you are doing with the data in your lower loop.

 

TBH: I don't know if that will reduce CPU usage, but Producer/Consumer is what it looks like you are trying to do here, and it's the right way to do it.
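
In text form the shape looks roughly like this (a Python analogue of the Producer/Consumer pattern; the CAN read and the processing below are stand-ins, not the real driver calls):

import queue
import threading
import time

can_queue = queue.Queue()
running = True

def read_can_frame():
    # Placeholder for the real CAN driver read.
    time.sleep(0.01)
    return b"\x00" * 8

def producer():
    # Producer: only reads frames and enqueues them, nothing else.
    while running:
        can_queue.put(read_can_frame())

def consumer():
    # Consumer: blocks until a frame arrives, then decodes, controls and logs.
    while running:
        frame = can_queue.get()
        print("handle", frame)   # stands in for decode + control + TDMS write

threading.Thread(target=producer, daemon=True).start()
threading.Thread(target=consumer, daemon=True).start()
time.sleep(1)                    # let the loops run briefly in this sketch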

========================
=== Engineer Ambiguously ===
========================
Message 4 of 16

First, take the advice from others to heart. Learn a bit about a proper Producer/Consumer approach, what exactly a timed loop is for, and more about dataflow. You have trouble with all three concepts.

Addressing the increased CPU usage: run VI Analyzer tests first. I strongly suspect that you have overlapping objects on your front panel, probably including the graph. Drawing and redrawing graphs and charts is a CPU-intensive action and requires the UI Thread. Do you really need to rescale X after every write, or would autoscaling be fine? (Is the user REALLY watching that closely?) Would dedicated, separate display and logging loops be better? Look at the Continuous Measurement and Logging project template! Would a chart be better than a graph? Have you looked at using the FP.Defer Updates property?

So many questions cannot be answered without BOTH the actual code and the User's Story. Sit down with the User first to get a Story of how they want to interact with the program. Then unload the UI Thread to stop wasting CPU time displaying more data than the User can possibly evaluate, at rates far faster than they can react to.
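
To make the "unload the UI Thread" point concrete, the idea is just to decouple the display rate from the processing and logging rate. A sketch of the concept (Python, illustrative only; in LabVIEW this would mean updating the graph from a slower display loop or deferring panel updates):

import time

UI_PERIOD = 0.25        # redraw roughly 4 times a second; plenty for a human
_last_draw = 0.0

def maybe_update_display(latest_values):
    # Keep logging at full rate elsewhere; only push data to the screen
    # a few times per second, no matter how fast the control loop runs.
    global _last_draw
    now = time.monotonic()
    if now - _last_draw >= UI_PERIOD:
        _last_draw = now
        print(latest_values)    # stands in for the graph/indicator update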

 

If you really look at how much data per second you are forcing through a Mark 1 human eyeball vs. how much data a human brain can actually process, then calculate what small percentage of the time that human is both between monitor and chair and interested in paying attention, you are likely going to vomit.


"Should be" isn't "Is" -Jay
Message 5 of 16

I'm pretty sure I remember that Timed Loops are all serialized, so the compiler can't multithread them. Change those to regular While loops and add a Wait Until Next ms Multiple in there if you need to slow them down. (Alternatively, you could switch your Consumer to be timed by waiting on data from a queue instead of polling a local variable, which is the right way to do that.)
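
The difference between the two timing styles, sketched in text (Python, illustrative only; handle() is a placeholder for the real processing):

import queue
import time

data_q = queue.Queue()

def handle(item):
    print(item)                  # placeholder for the real processing

def consumer_polling():
    # Polling a shared variable: wakes up every 50 ms whether or not new data
    # exists, burning CPU and adding up to 50 ms of latency per message.
    while True:
        time.sleep(0.05)
        while not data_q.empty():
            handle(data_q.get())

def consumer_waiting():
    # Waiting on the queue: sleeps until data actually arrives.
    while True:
        handle(data_q.get())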

 

If you're on RT there may be other considerations.

Message 6 of 16

@BertMcMahan wrote:

I'm pretty sure I remember that Timed Loops are all serialized, so the compiler can't multithread them. Change those to regular While loops and add a Wait Until Next ms Multiple in there if you need to slow them down. (Alternatively, you could switch your Consumer to be timed by waiting on data from a queue instead of polling a local variable, which is the right way to do that.)


Not quite. Timed Loops create brand-new dedicated execution systems with a special elevated priority. They have the same number of threads per core available as any other execution system. The total number of threads can be reduced by setting a core affinity (often done mistakenly); the OP may have done this... cannot tell without knowing the target OS.

 

"If you're on RT there may be other considerations."

 

Just looking at the gross number of FP objects shown in the pictures, an RT target would be an insane supposition. 😳 And those considerations would include possibly lobotomizing either the FP or the developer.


"Should be" isn't "Is" -Jay
Message 7 of 16

@JÞB wrote:


Not quite. Timed Loops create brand-new dedicated execution systems with a special elevated priority. They have the same number of threads per core available as any other execution system. The total number of threads can be reduced by setting a core affinity (often done mistakenly); the OP may have done this... cannot tell without knowing the target OS.

 

 


Found a thread discussing this - https://forums.ni.com/t5/LabVIEW/Timed-loop-amp-processor-allocation/td-p/3123335 - granted, it's a decade old, but crossrulz mentions Timed Loops are single-threaded. I couldn't find anything in the actual documentation. I took that to mean that, if multiple things could be parallelized, the compiler may do that in a regular While loop but not in a Timed Loop. I certainly could be misunderstanding; processor allocation and manual threading is one area I've never had to touch.

Message 8 of 16

@BertMcMahan wrote:

@JÞB wrote:


Not quite. Timed Loops create brand-new dedicated execution systems with a special elevated priority. They have the same number of threads per core available as any other execution system. The total number of threads can be reduced by setting a core affinity (often done mistakenly); the OP may have done this... cannot tell without knowing the target OS.

 

 


Found a thread discussing this - https://forums.ni.com/t5/LabVIEW/Timed-loop-amp-processor-allocation/td-p/3123335 - granted, it's a decade old, but crossrulz mentions Timed Loops are single-threaded. I couldn't find anything in the actual documentation. I took that to mean that, if multiple things could be parallelized, the compiler may do that in a regular While loop but not in a Timed Loop. I certainly could be misunderstanding; processor allocation and manual threading is one area I've never had to touch.


Tim was wrong. The LabVIEW Help file is a bit better as a source; a lot of normally meaningless trivia about execution system priority and the OS thread pool is given there. When you really need to figure out why CPU usage is high, you are often better off looking for greedy loops, the dreaded buffer reallocations, or a high UI Thread load. It's like clearing a forest before trying to dig up a few weeds. The serialization of chunks within a given loop (as well as execution priority) has an impact on CPU usage, but it is dust compared to the boulders of UI Thread switches for bloated FP object updates that simply overtax any possible user's ability to absorb them.
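
For anyone following along, a "greedy loop" in text form is just a loop that re-checks something with no wait and nothing to block on; it will happily eat a whole core (Python sketch):

import time

stop = False

def greedy_loop():
    # Spins re-checking the flag with no wait: ~100% of one core doing nothing.
    while not stop:
        pass

def polite_loop():
    # Same check, but yields the CPU between checks.
    while not stop:
        time.sleep(0.05)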


"Should be" isn't "Is" -Jay
Message 9 of 16

If you look at the picture, each timed loop here is assigned to a dedicated CPU core.

Message 10 of 16