BreakPoint

cancel
Showing results for 
Search instead for 
Did you mean: 

Mini Coding Challenge - Binary Search (sort of)

You can't use search array or sort array, because you are only interested in one particular cluster element. If this is just an integer array, you are still far from optimal. Search is linear and needs to inspect millions of values. If the 1-4 check fails, there is no need for any search. If 2 is not found, there is no need to search for 3.

If 2 is found, that would be used as the start index for the 3 search. I am sure my suggestion is many orders of magnitude faster. (Currently on the bus. Will try something later ... You never need to inspect more than a few dozen elements to solve)

0 Kudos
Message 11 of 28
(6,123 Views)

I would suggest a more generic formulation of the problem to make it more reusable.

 

"Given a array of clusters containing an integer element that is non-descending, determine if all possible integer values between the values of the first and last array element are present."

 

Maybe for another time... 🙂

Message 12 of 28
(6,117 Views)

Alright array of cluster.  1 and 4 still checked the same, 2 and 3 checking each element.  Is it possible to pull the cluster elements out efficiently and then use search 1D array?  I get that looking at each element when there are millions can be slow, and this is brute forcing the array.

 

Example_VI_BD.png

 

0 Kudos
Message 13 of 28
(6,109 Views)

This approach would work, but if the numeric values look like this:

 

1, 1, 1, .... 1, 2, 3, 4

 

...then we have to go through the entire array to find out it's legit. I need a solution that looks like a binary search, that goes through the array as efficiently as possible.

0 Kudos
Message 14 of 28
(6,104 Views)

Sorry it took two pages of replies to finally show me what you are trying to do.  Yeah I think an intelligent binary search could be performed. 

 

Do we have some real world data to try the normal approach with first?  Is the non binary search fast enough with real world data?  I just tested with 10M values and worst case (like you showed) took around 70ms, 100M took around 700ms.  Maybe cluster complexity increases the time needed?

0 Kudos
Message 15 of 28
(6,095 Views)

Sorry I keep moving the goalposts. The array of clusters is the easiest way conceptually to present the coding challenge. In reality, it's not exactly an array of clusters. Suffice to say that the binary search will definitely result in time savings much greater than an order of milliseconds. But whatever "binary search with logic to handle looking for two things" answer y'all come up with should be translatable to the situation I'm dealing with. 🙂

0 Kudos
Message 16 of 28
(6,090 Views)

Here's a quick draft (LabVIEW 2015). A few microseconds (1-3us) for 10M million elements of an array containing a cluster with the "interesting number", three other random DBLs and a short string.

(Sorry, the data generator ran out of memory for 100M but the analysis should still be in the low microseconds. We need to simplify the cluster for larger sizes...)

 

(Some things could be tweaked and there might be edge cases that need to be addressed. No failure for Size=1k and >1M trials: looks promising....:) The innermost case could also be wired directly to the number with one case "2" and the other "default"))

 

(Note that the data generation is significantly slower that the validity check. It takes about 10s to generate the 10M array ... 🐵

 

Message 17 of 28
(6,082 Views)

My implementation (LV2018) is very similar to CA's except I used an enum to track progress.

Message 18 of 28
(6,052 Views)

@GregSands wrote:

My implementation (LV2018) is very similar to CA's except I used an enum to track progress.


I haven't tested yours because my data input structures are different, but one thing to remember is that any case structure with more than two cases is measurably slower than a case structures with only two cases. Your stack of case structures has 15 possible paths while mine only has four possible paths. (Yes, my outer case has 3 cases, but the code is so fast it does not really matter).

 

I am curious what the performance would be.

 

 

Message 19 of 28
(6,047 Views)

Christian and GregS, y'all rock! Your algorithms both work well for my situation. Y'all brought my processing time down from ~1 sec to 0.003 sec. GregS, since you conveniently posted a .vim, I just dropped it into Christian's benchmark VI to test its speed and functionality.

 

Y'all both get shout-outs at NIWeek 2019. Christian gets a high five too since I know he'll be there. Greg gets one too if he shows up.

Message 20 of 28
(6,034 Views)