05-21-2021 08:02 PM - edited 05-21-2021 08:03 PM
Howdy,
I'm using LabVIEW to perform some coincidence measurements after an experiment. I have two text files from two instruments; each text file contains a measured energy for each detected event, and a precise time stamp for when the energy was measured. Typical files contain millions of data points each, but I've attached two much smaller examples here because of file size limitations.
I'm using my LabVIEW program to:
1) only accept energy values within a certain range (e.g. 460 - 580 keV)
2) count events which are coincident to both instruments within a given time window (e.g. 0.1 ms)
My current code takes an array of all accepted timestamps from the first instrument and creates two arrays: one with each centroid value plus the time window, and the other with each centroid value minus the time window. It then compares the accepted timestamps from the second instrument against both of those arrays to find events which occur within the desired time window.
So ultimately I have on the order of n^3 comparison operations being performed. With these small example files it's not time consuming, but in my typical application I can easily have 200,000 accepted events from each instrument, which is 8,000 trillion comparison operations. This is obviously going to be slow, with a typical execution time of ~8.5 minutes. Can you think of a way to do this with significantly fewer operations?
I'm using LabVIEW 2020.
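For readers following along without the VI attachments, here is a rough sketch of that scheme in Python (the array names, units, and pairwise inner loop are illustrative, not taken from the actual code):

```python
import numpy as np

# Rough sketch of the windowed comparison described above.
# t1, t2: energy-accepted timestamps (in seconds) from instruments 1 and 2.
# Every timestamp from instrument 2 is tested against every
# [t - window, t + window] interval built from instrument 1, so the work
# grows with the product of the array sizes and blows up for large files.
def count_coincidences_brute_force(t1, t2, window=1e-4):
    upper = t1 + window    # centroid value plus the time window
    lower = t1 - window    # centroid value minus the time window
    count = 0
    for t in t2:
        for lo, hi in zip(lower, upper):
            if lo <= t <= hi:
                count += 1
    return count
```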
05-21-2021 09:42 PM
Honey I Shrunk The Code!
The following much smaller code produces the same results. I didn't try to change your main algorithm; I just focused on a fairly brute-force translation of your code using many fewer LabVIEW constructs. I thought the exercise might also speed things up appreciably, but unfortunately it didn't. Still, it's probably a better starting point for thinking about the algorithm because it no longer looks unnecessarily complicated.
Compare the pics below.
-Kevin P
AFTER:
BEFORE:
05-22-2021 12:43 AM
Thanks Kevin, it looks a lot better. This is part of the first program I've written, so I'm still learning all of the functions that are available.
05-22-2021 05:13 AM - edited 05-22-2021 05:49 AM
How about using integers instead of doubles? Operations on integers are usually much faster.
Instead of the event energy column you could use the event channel column, and instead of the event timestamp in seconds you could use the event timestamp (from start) in nanoseconds, as sketched below.
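A minimal sketch of the integer idea (in Python rather than LabVIEW; names and values are made up): a signed 64-bit integer covers roughly ±292 years of nanoseconds, so both the timestamps and the coincidence window are represented and compared exactly, with no rounding at the window edge.

```python
# Sketch: convert timestamps once to I64 nanoseconds, then all window
# comparisons are exact integer arithmetic (no DBL rounding).
WINDOW_NS = 100_000  # 0.1 ms expressed as integer nanoseconds

def within_window(ts1_ns: int, ts2_ns: int) -> bool:
    # The integer difference is exact, so boundary cases cannot flip.
    return abs(ts1_ns - ts2_ns) <= WINDOW_NS

# Two events 99,999 ns apart are unambiguously inside the window:
print(within_window(1_000_000_000, 1_000_099_999))  # True
```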
EDIT: Activate iteration parallelism for the last for-loop so that it uses all the cores of your CPU.
Regards, Jens
05-22-2021 12:01 PM - edited 05-22-2021 12:31 PM
If I understand correctly, you want to find time points that are within 1e-4 s of each other, but only count them if both values in column 4 are within range.
I assume that both time columns are non-descending.
Here's some quick code that searches for coincident time points using a threshold array lookup and, if one is found, does a quick range check. The result is almost identical (small differences; maybe there is a flaw in one of the versions, maybe in mine!) to the code posted earlier. Not sure which one is better, but maybe you can tweak it. This one uses a single loop and executes in a tiny fraction of a second for the sample files, i.e. it is significantly more efficient. It will also scale much better for large files!
Please check for bugs. I'll do a bit more validation.
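For anyone who wants to experiment with the same idea outside LabVIEW, here is a hypothetical NumPy version; np.searchsorted (a binary search into the sorted second time column) plays roughly the role of the threshold-array lookup, and the column layout and energy limits are assumptions:

```python
import numpy as np

# Single-loop coincidence count: for each accepted event in file 1,
# binary-search the (non-descending) time column of file 2 for the
# window [t - window, t + window], then range-check the energies.
def count_coincidences(t1, e1, t2, e2, window=1e-4, elo=460.0, ehi=580.0):
    ok2 = (e2 >= elo) & (e2 <= ehi)          # energy gate for file 2
    count = 0
    for t, e in zip(t1, e1):
        if not (elo <= e <= ehi):            # energy gate for file 1
            continue
        lo = np.searchsorted(t2, t - window, side="left")
        hi = np.searchsorted(t2, t + window, side="right")
        count += int(np.count_nonzero(ok2[lo:hi]))
    return count
```

Each lookup is a binary search, so the total work grows roughly as n log n rather than pairwise, which is why it scales so much better for large files.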
05-22-2021 01:08 PM - edited 05-22-2021 01:10 PM
@kglennon wrote:
My current code takes ....
You already got some comments about your original code, so let me be a bit more explicit about your mistakes and unnecessary complications:
05-22-2021 01:37 PM
Thanks for the advice, I'll apply the same ideas to the rest of the program and see if I can make it a lot smaller. I'll check out your solution against the larger files this week and see if it gets the right answer; I'm sure it'll be a lot faster.
05-22-2021 01:55 PM - edited 05-22-2021 02:04 PM
@altenbach wrote:
Here's some quick code that searches for coincident time points using threshold array, and if found, does a quick range check. The result is almost identical (small differences, maybe there is a flaw in one of the versions, maybe in mine!)
The mathematical problem is that your UTC times are gigantic, and 0.1 ms sits near the 15th significant digit; such a small difference between huge numbers is completely unreliable. You're running into the limitations of DBL. Most likely, the differences in results are due to differences in execution order.
Take, for example, these coincidence times: 3704215499.13784 (file 1) and 3704215499.13774 (file 2). The actual difference is 0.000100136 s, i.e. slightly outside the valid range, but your code detects it as inside due to slight differences in computation.
You need to re-think your approach to avoid these numerical limitations. (No, going to EXT will not really help).
Is there a reason you are using the UTC timestamp (19 significant digits in the file!!!!)? Can you use the event timestamp instead (column 0), or will that differ between runs? If you really need to use UTC (column 1), I would suggest stripping the leading common digits from all UTC times and parsing the rest into DBL.
Can you comment on the units of the various time columns?
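To make the magnitude problem concrete, here is a small Python demonstration using the two times quoted above (the digit-stripping offset is just an example): near t ≈ 3.7e9 s, adjacent DBL values are about 4.8e-7 s apart, i.e. roughly 0.5% of the 0.1 ms window, and the 19 digits in the file cannot even be parsed into DBL without loss.

```python
import math

t = 3704215499.13784
# Spacing between adjacent representable doubles at this magnitude:
print(math.ulp(t))       # ~4.77e-07 s of resolution near 3.7e9 s

# Stripping the common leading digits before parsing restores resolution:
a = "3704215499.13784"
b = "3704215499.13774"
common = 6               # e.g. drop the shared leading "370421"
da = float(a[common:])   # 5499.13784 -- the ulp here is ~9e-13 s
db = float(b[common:])
print(da - db)           # the difference is now resolved to ~1e-12 s
```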
05-22-2021 02:24 PM
@kglennon wrote:
Thanks for the advice, I'll apply the same ideas to the rest of the program and see if I can make it a lot smaller. I'll check out your solution against the larger files this week and see if it gets the right answer, I'm sure it'll be a lot faster.
As I said, you cannot reliably use the column at index 1, because the UTC values exceed the available precision of the DBL representation. If you use DBL, the results will be highly unreliable. Read my post right above for a detailed discussion.
If you compare results, you need to compare with known good code and yours isn't! Consider that before determining what the "right answer" is. Be very aware of the numerical limitations.
And yes, my code should scale much, much(!) better as the data size increases.
If you use my code, AND use time column 0 (instead of 1), AND select a time window of 1E-10, you will get 16 matches where the time difference is exactly zero. (With 1E-9, you get 42 matches, the extra ones off by ~1E-9). You need to decide what's correct.
05-22-2021 02:56 PM
@altenbach wrote:
If you use my code, AND use time column 0 (instead of 1), AND select a time window of 1E-10, you will get 16 matches where the time difference is exactly zero. (With 1E-9, you get 42 matches, the extra ones off by ~1E-9). You need to decide what's correct.
The first time column might be in microseconds (?).
Here's some expanded code that shows all 16 exact matches (zero diff in times!) in the valid range for the given files.