05-21-2021 08:02 PM - edited 05-21-2021 08:03 PM
Howdy,
I'm using LabVIEW to perform some coincidence measurements after an experiment. I have two text files from two instruments; each text file contains a measured energy for each detected event, and a precise time stamp for when the energy was measured. Typical files contain millions of data points each, but I've attached two much smaller examples here because of file size limitations.
I'm using my LabVIEW program to:
1) only accept energy values within a certain range (e.g. 460 - 580 keV)
2) count events which are coincident to both instruments within a given time window (e.g. 0.1 ms)
My current code takes an array of all accepted timestamps from the first instrument and creates two arrays: one with each centroid value plus the time window, and the other with each centroid value minus the time window. It then compares the accepted timestamps from the second instrument against both of those arrays to find events which occur within the desired time window.
So ultimately I have on the order of n^3 comparison operations being performed. With these small example files it's not time consuming, but in my typical application I can easily have 200,000 accepted events from each instrument, which is 8,000 trillion comparison operations. This is obviously going to be slow, with a typical execution time of ~8.5 minutes. Can you think of a way to do this with significantly fewer operations?
I'm using LabVIEW 2020.
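For readers following along without the VI attachments, here is a rough sketch of that scheme in Python (the array names, units, and pairwise inner loop are illustrative, not taken from the actual code):

```python
import numpy as np

# Rough sketch of the windowed comparison described above.
# t1, t2: energy-accepted timestamps (in seconds) from instruments 1 and 2.
# Every timestamp from instrument 2 is tested against every
# [t - window, t + window] interval built from instrument 1, so the work
# grows with the product of the array sizes and blows up for large files.
def count_coincidences_brute_force(t1, t2, window=1e-4):
    upper = t1 + window    # centroid value plus the time window
    lower = t1 - window    # centroid value minus the time window
    count = 0
    for t in t2:
        for lo, hi in zip(lower, upper):
            if lo <= t <= hi:
                count += 1
    return count
```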
05-21-2021 09:42 PM
Honey I Shrunk The Code!
The following much smaller code produces the same results. I didn't try to change your main algorithm; I just focused on a fairly brute-force translation of your code using many fewer LabVIEW constructs. I thought the exercise might also speed things up appreciably, but unfortunately it didn't. Still, it's probably a better starting point for thinking about the algorithm because it no longer looks unnecessarily complicated.
Compare the pics below.
-Kevin P
AFTER:
BEFORE:
05-22-2021 12:43 AM
Thanks Kevin, it looks a lot better. This is part of the first program I've written, so I'm still learning all of the functions that are available.
05-22-2021 05:13 AM - edited 05-22-2021 05:49 AM
How about using integers instead of doubles? Operations on integers are usually much faster.
Instead of the event energy column you could use the event channel column, and instead of the event timestamp in seconds you could use the event timestamp (from start) in nanoseconds, as sketched below.
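A minimal sketch of the integer idea (in Python rather than LabVIEW; names and values are made up): a signed 64-bit integer covers roughly ±292 years of nanoseconds, so both the timestamps and the coincidence window are represented and compared exactly, with no rounding at the window edge.

```python
# Sketch: convert timestamps once to I64 nanoseconds, then all window
# comparisons are exact integer arithmetic (no DBL rounding).
WINDOW_NS = 100_000  # 0.1 ms expressed as integer nanoseconds

def within_window(ts1_ns: int, ts2_ns: int) -> bool:
    # The integer difference is exact, so boundary cases cannot flip.
    return abs(ts1_ns - ts2_ns) <= WINDOW_NS

# Two events 99,999 ns apart are unambiguously inside the window:
print(within_window(1_000_000_000, 1_000_099_999))  # True
```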
EDIT: Activate iteration parallelism for the last for-loop so that it uses all the cores of your CPU.
Regards, Jens
05-22-2021 12:01 PM - edited 05-22-2021 12:31 PM
If I understand correctly, you want to find time points that are within 1e-4 s of each other, but only count them if both values in column 4 are within range.
I assume that both time columns are non-descending.
Here's some quick code that searches for coincident time points using a threshold array lookup and, if one is found, does a quick range check. The result is almost identical (small differences; maybe there is a flaw in one of the versions, maybe in mine!) to the code posted earlier. Not sure which one is better, but maybe you can tweak it. This one uses a single loop and executes in a tiny fraction of a second for the sample files, i.e. it is significantly more efficient. It will also scale much better for large files!
Please check for bugs. I'll do a bit more validation.
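For anyone who wants to experiment with the same idea outside LabVIEW, here is a hypothetical NumPy version; np.searchsorted (a binary search into the sorted second time column) plays roughly the role of the threshold-array lookup, and the column layout and energy limits are assumptions:

```python
import numpy as np

# Single-loop coincidence count: for each accepted event in file 1,
# binary-search the (non-descending) time column of file 2 for the
# window [t - window, t + window], then range-check the energies.
def count_coincidences(t1, e1, t2, e2, window=1e-4, elo=460.0, ehi=580.0):
    ok2 = (e2 >= elo) & (e2 <= ehi)          # energy gate for file 2
    count = 0
    for t, e in zip(t1, e1):
        if not (elo <= e <= ehi):            # energy gate for file 1
            continue
        lo = np.searchsorted(t2, t - window, side="left")
        hi = np.searchsorted(t2, t + window, side="right")
        count += int(np.count_nonzero(ok2[lo:hi]))
    return count
```

Each lookup is a binary search, so the total work grows roughly as n log n rather than pairwise, which is why it scales so much better for large files.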
05-22-2021 01:08 PM - edited 05-22-2021 01:10 PM
@kglennon wrote:
My current code takes ....
You already got some comments about your original code, so let me be a bit more explicit about your mistakes and unnecessary complications:
05-22-2021 01:37 PM
Thanks for the advice, I'll apply the same ideas to the rest of the program and see if I can make it a lot smaller. I'll check out your solution against the larger files this week and see if it gets the right answer; I'm sure it'll be a lot faster.
05-22-2021 01:55 PM - edited 05-22-2021 02:04 PM
@altenbach wrote:
Here's some quick code that searches for coincident time points using threshold array, and if found, does a quick range check. The result is almost identical (small differences, maybe there is a flaw in one of the versions, maybe in mine!)
The mathematical problem is that your UTC times are gigantic, and 0.1 ms sits near the 15th significant digit; such a small difference between huge numbers is completely unreliable. You're running into the limitations of DBL. Most likely, the differences in results are due to differences in execution order.
Take, for example, these coincidence times: 3704215499.13784 (file 1) and 3704215499.13774 (file 2). The actual difference is 0.000100136 s, i.e. slightly outside the valid range, but your code detects it as inside due to slight differences in computation.
You need to re-think your approach to avoid these numerical limitations. (No, going to EXT will not really help).
Is there a reason you are using the UTC timestamp (19 significant digits in the file!!!!)? Can you use the event timestamp instead (column 0), or will that differ between runs? If you really need to use UTC (column 1), I would suggest stripping the leading common digits from all UTC times and parsing the rest into DBL.
Can you comment on the units of the various time columns?
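To make the magnitude problem concrete, here is a small Python demonstration using the two times quoted above (the digit-stripping offset is just an example): near t ≈ 3.7e9 s, adjacent DBL values are about 4.8e-7 s apart, i.e. roughly 0.5% of the 0.1 ms window, and the 19 digits in the file cannot even be parsed into DBL without loss.

```python
import math

t = 3704215499.13784
# Spacing between adjacent representable doubles at this magnitude:
print(math.ulp(t))       # ~4.77e-07 s of resolution near 3.7e9 s

# Stripping the common leading digits before parsing restores resolution:
a = "3704215499.13784"
b = "3704215499.13774"
common = 6               # e.g. drop the shared leading "370421"
da = float(a[common:])   # 5499.13784 -- the ulp here is ~9e-13 s
db = float(b[common:])
print(da - db)           # the difference is now resolved to ~1e-12 s
```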
05-22-2021 02:24 PM
@kglennon wrote:
Thanks for the advice, I'll apply the same ideas to the rest of the program and see if I can make it a lot smaller. I'll check out your solution against the larger files this week and see if it gets the right answer, I'm sure it'll be a lot faster.
As I said, you cannot reliably use the column at index 1, because the UTC values exceed the available precision of the DBL representation. If you use DBL, the results will be highly unreliable. Read my post right above for a detailed discussion.
If you compare results, you need to compare with known good code and yours isn't! Consider that before determining what the "right answer" is. Be very aware of the numerical limitations.
And yes, my code should scale much, much(!) better as the data size increases.
If you use my code, AND use time column 0 (instead of 1), AND select a time window of 1E-10, you will get 16 matches where the time difference is exactly zero. (With 1E-9, you get 42 matches, the extra ones off by ~1E-9). You need to decide what's correct.
05-22-2021 02:56 PM
@altenbach wrote:
If you use my code, AND use time column 0 (instead of 1), AND select a time window of 1E-10, you will get 16 matches where the time difference is exactly zero. (With 1E-9, you get 42 matches, the extra ones off by ~1E-9). You need to decide what's correct.
The first time column might be in microseconds (?).
Here's some expanded code that shows all 16 exact matches (zero diff in times!) in the valid range for the given files.