Sub millisecond timing revisited

Andrey_Dmitriev · ‎06-04-2026

@JÞB wrote:

So, once more, I would like to know the argument for changing the exit criteria of the new Wait.

Of note: I even pointed out that a waitable object was possible in My Nugget authored so many years ago.

Hmm, I’m trying to understand every word in your question, and to be honest, I still can’t quite follow it. OK, step by step: under “new Wait” you probably mean High Resolution Polling Wait.vi, and under “exit criteria” you likely mean the polling loop at the end, which exits when the Windows Performance Counter reaches the target value and the while loop finishes.

The only nugget I found on this topic dates back to 2012: “Community Nugget: Sub‑millisecond timing in LabVIEW.” You introduced essentially the same concept using the Performance Counter. If I compare your implementation side by side with NI’s, there are no fundamental differences (except that you placed the DLL call in the UI thread, which is an absolute no‑go in tight loops, and you use “Greater or Equal” instead of “Greater,” which might save one iteration, a practically negligible gain):

Screenshot 2026-06-04 07.09.51.png

There are some minor improvements possible, as usual. For example, the entire spinning loop can be wrapped into a single library call, where a more efficient internal polling mechanism can be used. By the way, I forgot to mention an important note: the PAUSE assembly instruction (which is a spin-loop hint) is essential and recommended, as it improves the performance of spin‑wait loops. Also, the traditional Wait (ms) can be slightly improved by switching from a trivial Delay() to Waitable Timers, which was briefly discussed above.
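To make the polling loop with a spin-loop hint concrete, here is a minimal sketch in Rust. It is not the NI implementation: `spin_wait` is a hypothetical helper, and `std::time::Instant` stands in for a direct performance counter read (the assumption being that the monotonic clock is fine-grained enough on the target system). `std::hint::spin_loop()` is the standard library's spin-loop hint, which maps to PAUSE on x86.

```rust
use std::hint::spin_loop;
use std::time::{Duration, Instant};

/// Busy-waits until at least `delay` has elapsed and returns the number of polls.
/// Hypothetical helper for illustration only.
fn spin_wait(delay: Duration) -> u64 {
    let start = Instant::now();
    let mut polls = 0u64;
    // Exit only once the elapsed time has reached the requested delay,
    // so the wait can overshoot but never return early.
    while start.elapsed() < delay {
        spin_loop(); // spin-loop hint (PAUSE on x86/x86_64)
        polls += 1;
    }
    polls
}

fn main() {
    let delay = Duration::from_micros(200);
    let t0 = Instant::now();
    let polls = spin_wait(delay);
    println!("requested {:?}, waited {:?} in {} polls", delay, t0.elapsed(), polls);
}
```

The number of polls and the overshoot vary from run to run, which is exactly the non-determinism discussed in this thread.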

But Windows itself is not a real‑time OS, and that says it all. In well‑designed systems, sleeps, delays, and pauses are rarely used because we work with events, queues, notifications, and so on. Every time I add a polling loop or a delayed execution, I ask myself: “Do I really need this here?” Sometimes it’s unavoidable, and then understanding how it works helps to do it properly (or reinvent our own bicycle). If I truly need a reliable wait, there are plenty of real‑time OSes and devices. I have personal experience with RT‑11 and OS‑9, as well as some PLCs and cRIO. All of these are real‑time to some degree. For example, if I want an exact 50 µs delay on a 40 MHz FPGA, I simply count 2000 clock cycles, and that’s it.
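The cycle count in the FPGA example is just delay times clock frequency. As a quick check of the arithmetic (the function name `cycles_for_delay` is made up for this sketch, and on a real FPGA this would of course be a counter in hardware, not Rust):

```rust
/// Number of whole clock cycles in `delay_ns` nanoseconds at `clock_hz`.
/// Hypothetical helper; integer math, truncating any fractional cycle.
fn cycles_for_delay(delay_ns: u64, clock_hz: u64) -> u64 {
    // cycles = delay [s] * clock [Hz]; u128 keeps the intermediate product from overflowing.
    ((delay_ns as u128 * clock_hz as u128) / 1_000_000_000) as u64
}

fn main() {
    // 50 µs = 50_000 ns at 40 MHz
    let cycles = cycles_for_delay(50_000, 40_000_000);
    println!("{cycles} cycles"); // prints "2000 cycles"
}
```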

Back to your question: what exactly needs to be changed in the current implementation, and which arguments are you looking for? Is the “new Wait” not reliable and not precise enough? It will remain unreliable forever on Windows.

Andrey_Dmitriev · ‎06-04-2026

@Andrey_Dmitriev wrote: I forgot to mention an important note: the PAUSE assembly instruction (which is a spin-loop hint) is essential and recommended, as it improves the performance of spin‑wait loops.

And yes, this improves reliability rather than "raw performance". The spinning loop will run fewer iterations, but the deviation from one iteration to the next is lower:

Screenshot 2026-06-04 08.05.01.png

So, for the moment, this is my preferred way to spin such loops.

rolfk · ‎06-04-2026

@JÞB wrote:

Guys, I'm begging you to please help me get replies on topic.

It's not that I don't appreciate your thoughtful posts! I do! And the thorough knowledge on display has never been doubted by me.

I no longer have access to the round table (honestly, I would have preferred that forum to question why R&D is 15 years late, and missing key requirements)

So, once more, I would like to know the argument for changing the exit criteria of the new Wait.

Of note: I even pointed out that a waitable object was possible in My Nugget authored so many years ago.

You really need to calm down a bit.

First, that function was hidden in vi.lib since about 2015. So hardly many years after your post. It was added to the palette indeed only later but considering your gripes with it even that would have been a mistake.

Your main gripe seems to be that the loop in the NI VI will not exit in the first iteration. But if we enter with a delay of 0 or less, the function correctly skips everything. => No chance to get into that loop at all.

If we enter a delay of >0 to 50 ns (considering the 10MHz performance counter frequency that modern Windows systems seem to implement), the expected counter value indeed might end up being the same as the just read counter value due to rounding in the calculations and the loop "needlessly" goes into another iteration. That's wasteful but hardly in a way that you could ever notice or measure in any way. And the user specified to wait less than 100ns, maybe he should be instead forced to sit on a chair with thumbtacks to learn that assumptions bite often in the place that starts with the same three letters. No user ever will notice however, since there is another issue:

You say NI should try to create a deterministic High Resolution Wait. But the reality is that they can't since none of the Desktop OSes is deterministic in terms of timing. You can define confidence levels based on the timing accuracy you want to get, but none really can reach 100%. If we talk about sub ms accuracy, each of these systems is absolutely and definitely non-deterministic. So rather than spend infinite time to get something impossible working, I'm pretty ok with something that does the job reliably within the constraints of the actual system. Perfection is the enemy of good!

Even the use of Greater Than instead of Greater or Equal in the non-0 iteration is a debate that I would not recommend losing a second of sleep over. Technically, if you use >=, the loop could end half a performance counter interval before the intended moment. Wow, big fish indeed! We talk about an interval of 100ns in modern Windows systems (possibly more on older Windows systems) and we discuss 50ns early on a system that can't even reliably guarantee 1 ms even when using tricks. So > or >=? Who really cares in this? Both are equally good or wrong but achieve the intended behavior of a timer that will not block indefinitely but exit somewhere between -50ns and several ms after the intended time. Good enough for me and every other LabVIEW user who doesn't want to venture into assembly programming or kernel driver development. And while you can get that sort of timing inside a kernel driver, you end up with the issue of having to call it from user space, which is not just a thread context switch, which is already somewhat expensive, but a complete ring context switch, which is orders of magnitude worse.

Disclosure: I would also use Greater or Equal here, simply as a personal preference. That possible 50ns early exit on a function that can't even guarantee 1ms accuracy is simply splitting hairs over nothing.

PP: The idea that the first iteration could ever produce an expected value that is equal to the current count is actually wrong. There is a Round toward +Infinity in there. Since an input value of 0 is properly caught and skipped on entry already, the resulting delay after the multiplication is always >0 and the subsequent rounding will always generate at least 1, so there is no possibility that a check for equal value could ever be true. Basically, adding a check for that in the first iteration would just waste CPU time checking for something that can never happen. Considering this extra information, the use of >= inside the other case would indeed be preferable, as there is no chance that the loop could ever exit before the intended time.
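The rounding argument can be sketched in a few lines of Rust. This is only an illustration of "multiply by the frequency, then round toward +infinity", not the block diagram of the NI VI: `delay_to_ticks` is a hypothetical name, the delay is taken in integer nanoseconds to keep the math exact, and the 10 MHz frequency is an assumption (real code would query QueryPerformanceFrequency()).

```rust
/// Converts a delay in nanoseconds to counter ticks, rounding toward +infinity.
/// Hypothetical helper for illustration only.
fn delay_to_ticks(delay_ns: u64, freq_hz: u64) -> u64 {
    const NS_PER_S: u128 = 1_000_000_000;
    let product = delay_ns as u128 * freq_hz as u128;
    // Integer ceiling division: any non-zero product yields at least 1 tick.
    ((product + NS_PER_S - 1) / NS_PER_S) as u64
}

fn main() {
    let freq = 10_000_000; // assumed 10 MHz counter, i.e. 100 ns per tick
    for ns in [0u64, 1, 50, 100, 150] {
        println!("{ns} ns -> {} tick(s)", delay_to_ticks(ns, freq));
    }
}
```

With these inputs only the 0 ns case gives 0 ticks, and that case is skipped on entry anyway, so the target count is always at least one tick ahead of the count read at the start.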

But, but, we could add a check for this difference to be <10 for instance, as the loop will anyhow not be able to run fast enough to reach that point in one iteration. No, we can't. Windows does not document what the frequency is but specifically requires the call to QueryPerformanceFrequency() if any resemblance to real time is required. It seems to be 10MHz currently but it could very well be less on some versions or editions of Windows. Also, this would now introduce the possibility for the wait to exit before the intended time, something this VI goes to some lengths to avoid, as documented by the Round function.

PPP: There is of course another theoretical issue: the performance counter value is implemented as a signed integer. This is also how Windows defines the parameter of the QueryPerformanceCounter() function. This value can theoretically overflow and will then wrap around to -2^63, resulting in an almost indefinite delay due to the check in the Default case. But 2^63 ticks is about 29000 years with a Performance Counter Frequency of 10MHz. The counter starts at 0 when the system starts up. So I think it is safe to not worry about that possibility until Microsoft increases the frequency of that counter to 10GHz or thereabout. And even then the computer needs to run for more than 29 years without interruption before reaching the wrap around point.
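For anyone who wants to check the wrap-around estimate, here is the arithmetic as a small Rust sketch. The function name and the 365.25-day year are choices made for this example only.

```rust
/// Years until a signed 64-bit tick counter starting at 0 reaches 2^63 ticks.
/// Hypothetical helper for illustration only.
fn years_until_wrap(freq_hz: f64) -> f64 {
    const SECONDS_PER_YEAR: f64 = 365.25 * 24.0 * 3600.0;
    let max_ticks = i64::MAX as f64; // about 2^63
    max_ticks / freq_hz / SECONDS_PER_YEAR
}

fn main() {
    println!("10 MHz: about {:.0} years", years_until_wrap(10.0e6));
    println!("10 GHz: about {:.1} years", years_until_wrap(10.0e9));
}
```

This gives roughly 29 thousand years at 10 MHz and roughly 29 years at 10 GHz, in line with the numbers above.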

PPPP: So as a conclusion to this whole thing. I can't attest if there were any psychotropic mushrooms in any pizza eaten by anyone in LabVIEW R&D, but if there were I can confidently say that it had no adverse effect on this specific VI.

Rolf Kalbermatter My Blog

DEMO, Electronic and Mechanical Support department, room 36.LB00.390

Andrey_Dmitriev · ‎06-04-2026

@rolfk wrote: <...> value theoretically can overflow and will then wrap around to -2^63 resulting in an almost indefinite delay due to the check in the Default case. But, 2^63 is about 29000 years....

As a side note regarding overflowed counters: I remember some time ago a discussion between Linus Torvalds and a Google engineer about how this should be handled properly. Unfortunately, I can’t find the original thread in the Linux mailing list. In any case, as usual in this long thread, I'm with Rust’s approach. I’m quite happy that saturating and wrapping arithmetic are available out of the box.

Here, for example, we wait for 10 increments of a u8 counter. The counter starts at 250, overflows, and the logic still works correctly (by the way, this is also a nice small example showing how closures and mutation work together):

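use std::cell::Cell;

// Spins until the counter has advanced by at least `ticks` since the first read.
fn wait_ticks(mut get_counter: impl FnMut() -> u8, ticks: u8) {
    let start = get_counter();

    loop {
        let current = get_counter();
        // wrapping_sub gives the distance modulo 256, so the result stays
        // correct even after `current` has wrapped around past 255.
        let elapsed = current.wrapping_sub(start);

        if elapsed >= ticks {
            break;
        }
    }
}

fn main() {
    // Cell lets the closure update the counter while main can still read it.
    let counter = Cell::new(250u8);
    // Each call advances the simulated counter by one (wrapping) and prints a "+".
    let mut get_counter = || {
        let val = counter.get().wrapping_add(1);
        counter.set(val);
        print!("+");
        val
    };

    println!("Test 1 - start 10 increments from 250");
    wait_ticks(&mut get_counter, 10);
    println!("\n  -> Reached: {}", counter.get());

    println!("Test 2 - continue 10 more from 5");
    wait_ticks(&mut get_counter, 10);
    println!("\n  -> Reached: {}", counter.get());
}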

and it works:

Test 1 - start 10 increments from 250
+++++++++++
  -> Reached: 5
Test 2 - continue 10 more from 5
+++++++++++
  -> Reached: 16

But there’s no rocket science here, of course — the same idea can easily be implemented in any programming language...

rolfk · ‎06-04-2026

@Andrey_Dmitriev wrote:

@rolfk wrote: <...> value theoretically can overflow and will then wrap around to -2^63 resulting in an almost indefinite delay due to the check in the Default case. But, 2^63 is about 29000 years....

As a side note regarding overflowed counters: I remember some time ago a discussion between Linus Torvalds and a Google engineer about how this should be handled properly. Unfortunately, I can’t find the original thread in the Linux mailing list. In any case, as usual in this long thread, I'm with Rust’s approach. I’m quite happy that saturating and wrapping arithmetic are available out of the box.

In my timeout loops I never compare the ms counter (which is an unsigned value) with the expected timeout value, exactly because of this. The better approach is to calculate the difference between the two timer tick values and then evaluate it as a signed integer by simply casting it to that. If it is <=0 (or >= 0 if you got the subtraction the wrong way around 😂) then the timeout has elapsed. You could of course skip the cast and simply check if the MSB is set. Signed counter values make it slightly more complicated to reason about, but work in similar ways. However, with a wrap around interval in the thousands of years as in this case, I'm not going to even bother about that. 😁
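Sticking with Rust from the example above, the signed-difference check might look roughly like this. It is a sketch under stated assumptions: `timeout_elapsed` is a made-up name, the counter is a free-running u32 millisecond tick count, and the trick only holds while the interval being timed is shorter than 2^31 ms (a bit under 25 days).

```rust
/// True once `now` has reached or passed `deadline`.
/// Both values come from the same free-running u32 millisecond counter.
/// Hypothetical helper for illustration only.
fn timeout_elapsed(now: u32, deadline: u32) -> bool {
    // Wrapping difference reinterpreted as signed: <= 0 means the deadline is reached.
    (deadline.wrapping_sub(now) as i32) <= 0
}

fn main() {
    let start: u32 = u32::MAX - 4; // 5 ms before the counter wraps
    let deadline = start.wrapping_add(10); // wraps around to 5

    assert!(!timeout_elapsed(start, deadline)); // 10 ms still to go
    assert!(!timeout_elapsed(u32::MAX, deadline)); // 6 ms still to go
    assert!(timeout_elapsed(5, deadline)); // exactly at the deadline
    assert!(timeout_elapsed(6, deadline)); // 1 ms past the deadline
    println!("wrap-around handled");
}
```

A plain `now >= deadline` comparison would report the timeout as elapsed immediately in this scenario, because the deadline value (5) is numerically far below the start value.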

JÞB · ‎06-04-2026

Overflow is fixable. I even provided example code to the community for a U32 in 2009 (64-bit integers were not native to LabVIEW then).

I had seen the need after ENJOYING a Wait Until overflow circa 2001, using the antecedent "Wait +1(ms).vi" from the Traditional DAQ palette. I authored the code in my 2009 post in 2001! So you are only 2.5 decades late with that observation.

Now can we get back on topic? Or should I just get mean?

@13.6 BILLION years ago "Time" started. The second is the most accurate scientific measurement! Why does LabVIEW have poor timing?

"Should be" isn't "Is" -Jay

Andrey_Dmitriev · ‎06-04-2026

@JÞB wrote:

Now can we get back on topic? Or should I just get mean?

We have been on topic the whole time. Let’s proceed in a formal manner. As I come from a relatively large corporation, we typically follow structured processes. When discussing new development or design, we prepare a document called a Software Requirements Specification. If we identify that something does not work as expected, we create an Issue or Bug Report. For changes or modifications we usually raise an Engineering Change Request. As far as I know, at NI this is referred to as a Corrective Action Request (CAR) or something like that.

Could you provide your formal specification for this topic and define the measurable deliverables you expect from timing functions?

JÞB · ‎06-04-2026

Sure.

I would like to learn why the Wait function exit logic behavior changes with clock source. I would expect that the software requirements review would have addressed the issue and provided some justification (or, really bad pizza toppings.)

rolfk · ‎06-04-2026

@JÞB wrote:

Overflow is fixable. I even provided example code to the community for a U32 in 2009 (64-bit integers were not native to LabVIEW then).

LabVIEW 8.0 was the first version to consistently have 64-bit integer support, and it used them for the File I/O offsets. LabVIEW 2009 was the first to run as a 64-bit application, but that is independent of supporting 64-bit integers. There was actually some hidden support for 64-bit integers in LabVIEW 7.1, but trying to use the corresponding primitives was more likely to crash LabVIEW than to do anything useful.

@13.6 BILLION years ago "Time" started. The second is the most accurate scientific measurement! Why does LabVIEW have poor timing?

I'm not sure what you call poor timing. But LabVIEW can't implement an atomic clock or a hydrogen maser on a normal computer. That would be what is needed to come close to the most accurate scientific measurement.

I would like to learn why the Wait function exit logic behavior changes with clock source. I would expect that the software requirements review would have addressed the issue and provided some justification (or, really bad pizza toppings.)

Please explain to us poor souls what you mean by this. Where in that VI is anything about a different clock source for the Wait function? Are you referring to the (expected time - 2ms) Wait function in the first iteration? What's wrong with that? It exists to avoid heating up the CPU unnecessarily in a spinlock loop. And it is 2ms since the Wait (ms) has a documented inaccuracy of -1 to +x ms, which is a feature of the Windows API and beyond the control of LabVIEW. You seem to insist that this function is wrongly implemented for some reason. I can't see an error beyond the inherent inaccuracy of Windows timing functions, which LabVIEW can't fix without requiring an expensive plugin board with temperature stabilized high accuracy crystals and completely borking the Windows kernel to force it into submission.
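The "sleep for the expected time minus 2 ms, then spin" pattern described here could be sketched in Rust as follows. This is an illustration under assumptions, not the VI itself: `hybrid_wait` is a hypothetical name, `std::thread::sleep` stands in for Wait (ms), and `Instant` stands in for the performance counter.

```rust
use std::hint::spin_loop;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Sleeps for most of the delay, then spins on the monotonic clock for the rest.
/// Hypothetical helper for illustration only.
fn hybrid_wait(delay: Duration) {
    // Safety margin for the coarse sleep, taken from the 2 ms mentioned above.
    const MARGIN: Duration = Duration::from_millis(2);
    let deadline = Instant::now() + delay;

    // Coarse part: only sleep if the delay is longer than the margin.
    if let Some(coarse) = delay.checked_sub(MARGIN) {
        if !coarse.is_zero() {
            sleep(coarse);
        }
    }

    // Fine part: spin until the deadline, so the function never returns early.
    while Instant::now() < deadline {
        spin_loop();
    }
}

fn main() {
    let delay = Duration::from_millis(5);
    let t0 = Instant::now();
    hybrid_wait(delay);
    println!("requested {:?}, waited {:?}", delay, t0.elapsed());
}
```

If the coarse sleep overshoots the deadline, which the OS is free to do, the spin loop simply falls through and the wait ends late. Nothing in user space can prevent that.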

JÞB · ‎06-04-2026

The millisecond Wait based on the msec timer waits up to msec tick counts on non RT targets. The exit logic of the new Wait routine based on the high resolution timer is different. Why?

Edit: I'm pretty sure "borking" is an autocorrected feature. (To block a judicial appointment) the kernel works fine. The BIOS is designed for USERS. Which is why we have real time targets where Wait exit behavior differs. If you want pi from a counter you are irrational.
