Poll: Preferred Way to Handle Offset and Length When Filtering

Hooovahh · ‎06-11-2020

Okay, so this post is really to ask the forum's opinion on how a thing would best be handled in an API. I don't think there is a right or wrong answer, but I have an API I'm writing and I can't decide how others would prefer to use it, or what makes the most sense.

Let's say I have a database. In that database are many thousands of entries holding the "Name" and "Age" of people. Something like this:

James - 20

John - 22

Robert - 26

Michael - 20

William - 25

David - 20

Richard - 20

Joseph - 26

Thomas - 22

Charles - 29

Because there are many thousands of entries, my API has a read function that takes an Offset and a Length. So if you call the function with an Offset of 0 and a Length of 3 you'll get:

James - 20

John - 22

Robert - 26

Then it is assumed you would run this query again with an Offset of 3 and Length 3 getting:

Michael - 20

William - 25

David - 20

In the actual setup you would probably read something like 10,000 entries at a time, process those then get the next chunk.
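
For readers who think in text languages, the chunked read described above could be sketched roughly like this. It is Python rather than LabVIEW, and `ENTRIES`, `read_entries`, and `read_all_in_chunks` are hypothetical names standing in for the real API, which the post does not show.

```python
from typing import Iterator, List, Tuple

Entry = Tuple[str, int]  # (name, age)

# Sample data from the post; a hypothetical in-memory stand-in for the real store.
ENTRIES: List[Entry] = [
    ("James", 20), ("John", 22), ("Robert", 26), ("Michael", 20),
    ("William", 25), ("David", 20), ("Richard", 20), ("Joseph", 26),
    ("Thomas", 22), ("Charles", 29),
]


def read_entries(offset: int, length: int) -> List[Entry]:
    """Return up to `length` entries starting at index `offset`."""
    return ENTRIES[offset:offset + length]


def read_all_in_chunks(chunk_size: int) -> Iterator[List[Entry]]:
    """Yield successive chunks until the store is exhausted."""
    offset = 0
    while True:
        chunk = read_entries(offset, chunk_size)
        if not chunk:
            break
        yield chunk
        offset += len(chunk)
```

Calling `read_entries(0, 3)` and then `read_entries(3, 3)` reproduces the two groups of three names listed above.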

With this API I see a benefit to adding a filter. So we will pass in the array of Ages that we actually care about. If we pass in a filter of 20 and read all the examples I provided, the result would be:

James - 20

Michael - 20

David - 20

Richard - 20

But here is the kicker. I'd still prefer that Offset and Length be optional inputs. You might have many thousands of values that match the 20 filter, and you may want to process them in chunks.

So the question is, how should Offset and Length be handled when filtering is applied? If I have an Offset of 0 and a Length of 3, with a Filter on 20, should the result be:

James - 20

or

James - 20

Michael - 20

David - 20

By that I mean should the length represent the total number of names processed, or the total number to be returned?

Secondly, if I have an Offset of 3, a Length of 3, and a Filter on 20, should the result be:

Michael - 20

David - 20

or

Richard - 20

Or a 3rd option might be that filtering can only be applied when all entries in the database are read, so Offset and Length can't be used at all. Thoughts?
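
The first two options can be put side by side in a short Python sketch. The function names `chunk_then_filter` and `filter_then_chunk` are made up here for illustration and are not part of the API being discussed; the data is the ten-entry example from the post.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, int]  # (name, age)

ENTRIES: List[Entry] = [
    ("James", 20), ("John", 22), ("Robert", 26), ("Michael", 20),
    ("William", 25), ("David", 20), ("Richard", 20), ("Joseph", 26),
    ("Thomas", 22), ("Charles", 29),
]


def chunk_then_filter(entries: List[Entry], offset: int, length: int,
                      ages: Set[int]) -> List[Entry]:
    """Option 1: Offset/Length count entries examined; fewer may be returned."""
    window = entries[offset:offset + length]
    return [e for e in window if e[1] in ages]


def filter_then_chunk(entries: List[Entry], offset: int, length: int,
                      ages: Set[int]) -> List[Entry]:
    """Option 2: Offset/Length index into the matching entries only."""
    matches = [e for e in entries if e[1] in ages]
    return matches[offset:offset + length]
```

With a filter of `{20}`, Offset 0 and Length 3 give James alone under option 1 and James, Michael, David under option 2. Offset 3 and Length 3 give Michael and David under option 1 and Richard alone under option 2, matching the results listed above.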

This isn't actually for a database; it is for reading a CAN log, which I would want to filter on the CAN ID, but I didn't know if people would be all that familiar with that. In these custom file formats, querying the ID of every frame in the order it appeared can be quite difficult. Reading 3 at a time and then returning the filtered list is so much easier, especially if the first read takes place at some random point in the middle of the file. If that happens and Offset counts only matching frames, I basically need to read the whole file up to that point, counting the items that match the filter as I go.
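
The cost described in the last two sentences can be made concrete with a sketch. It assumes a log that can only be scanned front to back and a made-up frame layout of `(CAN ID, payload)`; `read_filtered` and `LOG` are hypothetical names, not the real reader. The function returns how many frames it had to scan so the overhead is visible.

```python
from typing import Iterable, List, Set, Tuple

Frame = Tuple[int, bytes]  # (CAN ID, payload); hypothetical layout

# Hypothetical log contents, in file order.
LOG: List[Frame] = [
    (0x100, b"\x01"), (0x200, b"\x02"), (0x300, b"\x03"), (0x100, b"\x04"),
    (0x250, b"\x05"), (0x100, b"\x06"), (0x100, b"\x07"), (0x300, b"\x08"),
    (0x200, b"\x09"), (0x290, b"\x0a"),
]


def read_filtered(frames: Iterable[Frame], ids: Set[int],
                  offset: int, length: int) -> Tuple[List[Frame], int]:
    """Return (matching frames, frames scanned) when Offset counts matches only."""
    result: List[Frame] = []
    matches_seen = 0
    scanned = 0
    if length <= 0:
        return result, scanned
    for frame in frames:
        scanned += 1
        if frame[0] not in ids:
            continue
        if matches_seen >= offset:
            result.append(frame)
            if len(result) == length:
                break  # enough matches collected; stop scanning
        matches_seen += 1
    return result, scanned
```

In this sketch, asking for matches 3 through 5 of ID `0x100` forces a scan of all ten frames to return a single one, whereas a plain offset-and-length read of frames 3 through 5 could seek straight to them if the file format allows it.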


al_g · ‎06-11-2020

I would expect Option 2 in both cases.

altenbach · ‎06-11-2020

I guess I don't quite understand the usefulness of this "database". The entries are not sorted in any recognizable manner and there probably are duplicate names. Returning entries based on some index or ordering seems arbitrary. Is there any other (hidden) field, e.g. a unique user ID, that determines the primary sort order?

What's the use case?


al_g · ‎06-11-2020

I think he explains his use case at the end. It's for filtering CAN logs.

BertMcMahan · ‎06-11-2020

I would expect the first option in both cases, but I think this could be fixed by a change in wording. Instead of just "offset" and "length" I'd call it something like "Search start" and "Search length".

If you want it to behave the second way, I think I'd change the names to "Search start" and "Elements to return". That would make it more concise.

I think I'd want the *second* method to be the one I'd normally want to use, but the first is what I'd assume from the names alone.

PsyenceFact · ‎06-11-2020

My thought would be that the filtering applies *after* offset and length have selected the entries to work on. Then you always know how many items have been processed.

PsyenceFact

altenbach · ‎06-11-2020

@al_g wrote:

I think he explains his use case at the end. It's for filtering CAN logs.

Ah, OK. The inherent sort order is "time", I guess.

Kevin_Price · ‎06-11-2020

And my thought is:

Why not make them 2 distinct API functions, one for Offset and Length, the other for Filtering? This eliminates the ambiguity *and* opens up the option to run them in either order. Much like the Unix idea of pipes. (And this wouldn't prevent you from making wrapper API functions that implement the 2 distinct orderings, if you prefer.)

You may lose some possible optimizations, but you gain clarity and flexibility. Might be worth the trade-off.
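
A rough Python sketch of that two-function idea, assuming lazy iterators play the role of pipes. The names `chunk` and `keep_ages` are invented for illustration; the data is the example from the first post.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple

Entry = Tuple[str, int]  # (name, age)

ENTRIES: List[Entry] = [
    ("James", 20), ("John", 22), ("Robert", 26), ("Michael", 20),
    ("William", 25), ("David", 20), ("Richard", 20), ("Joseph", 26),
    ("Thomas", 22), ("Charles", 29),
]


def chunk(entries: Iterable[Entry], offset: int, length: int) -> Iterator[Entry]:
    """Pass through `length` entries starting at position `offset` of the input."""
    return islice(entries, offset, offset + length)


def keep_ages(entries: Iterable[Entry], ages: Set[int]) -> Iterator[Entry]:
    """Pass through only the entries whose age is in `ages`."""
    return (e for e in entries if e[1] in ages)


# The caller picks the order explicitly, so neither function is ambiguous.
filtered_chunk = list(keep_ages(chunk(ENTRIES, 3, 3), {20}))
chunked_filter = list(chunk(keep_ages(ENTRIES, {20}), 3, 3))
```

Here `filtered_chunk` holds Michael and David, while `chunked_filter` holds only Richard. The lost optimization is visible too: `chunk` applied to a filtered stream has no way to seek, so it must pull every earlier entry through the filter.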

-Kevin P


Bob_Schor · ‎06-12-2020

You have two operations: "Chunk", defined by Offset and Length, that operates on your data, and "Filter", defined by Filter, that also operates on your data.

A Mathematician would say that "Chunk and Filter don't commute", that is to say, applying Chunk ( Filter (data)) is not the same as Filter (Chunk (data)).

There's absolutely nothing inherently wrong about this -- it is just the way you've defined the Operations, and what they do. There may well be instances when you want to Filter First, and then Chunk (which I'd symbolize as Chunk * Filter ()) and other instances where you are interested in Filtering the Chunk'ed data, Filter * Chunk (). You just have to realize that Order of Operations Matters.

Bob Schor

Yamaeda · ‎06-12-2020

I'd assume the offset and length to be applied after the filter, as with SQL's "top 10".

Compare to "SELECT TOP 3 * FROM Persons WHERE age > 25"
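
To check the SQL behaviour against the thread's example data, here is a small sketch using Python's built-in `sqlite3` module. `TOP` is SQL Server syntax; SQLite spells the same idea `LIMIT ... OFFSET ...`, so that is what the sketch uses. The table layout and the `query_page` helper are assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO persons (name, age) VALUES (?, ?)",
    [
        ("James", 20), ("John", 22), ("Robert", 26), ("Michael", 20),
        ("William", 25), ("David", 20), ("Richard", 20), ("Joseph", 26),
        ("Thomas", 22), ("Charles", 29),
    ],
)


def query_page(db: sqlite3.Connection, age: int,
               offset: int, length: int) -> List[Tuple[str, int]]:
    """WHERE is applied first; LIMIT/OFFSET then page through the matches."""
    cursor = db.execute(
        "SELECT name, age FROM persons WHERE age = ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (age, length, offset),
    )
    return cursor.fetchall()
```

`query_page(conn, 20, 0, 3)` returns James, Michael, and David, and `query_page(conn, 20, 3, 3)` returns Richard, i.e. the second option in both of the original questions.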
