11-13-2018 12:01 PM - edited 11-13-2018 12:14 PM
You can't use search array or sort array, because you are only interested in one particular cluster element. If this is just an integer array, you are still far from optimal. Search is linear and needs to inspect millions of values. If the 1-4 check fails, there is no need for any search. If 2 is not found, there is no need to search for 3.
If 2 is found, that would be used as the start index for the 3 search. I am sure my suggestion is many orders of magnitude faster. (Currently on the bus. Will try something later ... You never need to inspect more than a few dozen elements to solve)
11-13-2018 12:33 PM
I would suggest a more generic formulation of the problem to make it more reusable.
"Given a array of clusters containing an integer element that is non-descending, determine if all possible integer values between the values of the first and last array element are present."
Maybe for another time... 🙂
11-13-2018 01:52 PM - edited 11-13-2018 01:54 PM
Alright array of cluster. 1 and 4 still checked the same, 2 and 3 checking each element. Is it possible to pull the cluster elements out efficiently and then use search 1D array? I get that looking at each element when there are millions can be slow, and this is brute forcing the array.
Unofficial Forum Rules and Guidelines
Get going with G! - LabVIEW Wiki.
17 Part Blog on Automotive CAN bus. - Hooovahh - LabVIEW Overlord
11-13-2018 01:58 PM
This approach would work, but if the numeric values look like this:
1, 1, 1, .... 1, 2, 3, 4
...then we have to go through the entire array to find out it's legit. I need a solution that looks like a binary search, that goes through the array as efficiently as possible.
11-13-2018 02:23 PM - edited 11-13-2018 02:24 PM
Sorry it took two pages of replies to finally show me what you are trying to do. Yeah I think an intelligent binary search could be performed.
Do we have some real world data to try the normal approach with first? Is the non binary search fast enough with real world data? I just tested with 10M values and worst case (like you showed) took around 70ms, 100M took around 700ms. Maybe cluster complexity increases the time needed?
Unofficial Forum Rules and Guidelines
Get going with G! - LabVIEW Wiki.
17 Part Blog on Automotive CAN bus. - Hooovahh - LabVIEW Overlord
11-13-2018 02:28 PM
Sorry I keep moving the goalposts. The array of clusters is the easiest way conceptually to present the coding challenge. In reality, it's not exactly an array of clusters. Suffice to say that the binary search will definitely result in time savings much greater than an order of milliseconds. But whatever "binary search with logic to handle looking for two things" answer y'all come up with should be translatable to the situation I'm dealing with. 🙂
11-13-2018 03:25 PM - edited 11-13-2018 04:17 PM
Here's a quick draft (LabVIEW 2015). A few microseconds (1-3us) for 10M million elements of an array containing a cluster with the "interesting number", three other random DBLs and a short string.
(Sorry, the data generator ran out of memory for 100M but the analysis should still be in the low microseconds. We need to simplify the cluster for larger sizes...)
(Some things could be tweaked and there might be edge cases that need to be addressed. No failure for Size=1k and >1M trials: looks promising....:) The innermost case could also be wired directly to the number with one case "2" and the other "default"))
(Note that the data generation is significantly slower that the validity check. It takes about 10s to generate the 10M array ... 🐵
11-13-2018 05:33 PM
My implementation (LV2018) is very similar to CA's except I used an enum to track progress.
11-13-2018 06:39 PM
@GregSands wrote:
My implementation (LV2018) is very similar to CA's except I used an enum to track progress.
I haven't tested yours because my data input structures are different, but one thing to remember is that any case structure with more than two cases is measurably slower than a case structures with only two cases. Your stack of case structures has 15 possible paths while mine only has four possible paths. (Yes, my outer case has 3 cases, but the code is so fast it does not really matter).
I am curious what the performance would be.
11-13-2018 10:41 PM
Christian and GregS, y'all rock! Your algorithms both work well for my situation. Y'all brought my processing time down from ~1 sec to 0.003 sec. GregS, since you conveniently posted a .vim, I just dropped it into Christian's benchmark VI to test its speed and functionality.
Y'all both get shout-outs at NIWeek 2019. Christian gets a high five too since I know he'll be there. Greg gets one too if he shows up.