11-07-2013 02:35 PM
Hi Team! I've run into a new unexpected behavior, this time while using Subroutines in a regular expression.
The following snippet in LV2013 shows a simple subroutine, and the results from three LabVIEW functions.
The interesting part here is the array of offsets returned by Execute Offsets. Let's deconstruct it:
Elements 0-1: The Whole Match is from index position 1 to 2 in the string
Elements 2-3: The first submatch is from index position -1 to -1
Elements 4-5: The second submatch is from index position 1 to 2
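For anyone following along outside LabVIEW, here is a minimal Python sketch of that same deconstruction. It assumes only the layout described in this thread (a flat array of start/end pairs, whole match first). The function name `offset_pairs` is hypothetical and not part of LabVIEW or pcre.

```python
def offset_pairs(offsets):
    """Group a flat offsets array into (start, end) pairs, whole match first.

    Hypothetical helper: assumes the array is [start0, end0, start1, end1, ...].
    """
    return list(zip(offsets[0::2], offsets[1::2]))

pairs = offset_pairs([1, 2, -1, -1, 1, 2])
for index, (start, end) in enumerate(pairs):
    label = "whole match" if index == 0 else f"submatch {index}"
    print(f"{label}: {start} to {end}")
# whole match: 1 to 2
# submatch 1: -1 to -1
# submatch 2: 1 to 2
```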
The unexpected behavior is the first submatch: what is the significance of -1 in this context?
It seems that this "first submatch" should not exist, yet its effect propagates up to Execute and to the Match Regular Expression node as "blank submatches". Snippet below for LV2013:
11-07-2013 04:05 PM
LabVIEW uses the pcre library (http://pcre.org) to parse the regular expression and perform the matches. The outputs you are seeing in the offsets array are the offsets returned by the pcre library itself (the result of calling the pcre_exec function). The array holds pairs of offsets (the index of the first character and one past the index of the last character): the first pair is for the whole match, and each subsequent pair is for one captured group. It seems that the pcre library (at least the version that LabVIEW currently uses) treats that defined subroutine as a submatch that warrants a slot in the offsets array, even though the specification says this kind of expression always fails to match (it exists only to be referenced later). The -1,-1 pair simply represents a capturing group that did not match anything.
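The same -1,-1 convention shows up outside pcre too. As an illustration only, here is a sketch using Python's standard `re` module (not pcre, and not what LabVIEW calls), whose `Match.span()` returns (-1, -1) for a group that did not participate. Python's `re` has no subroutine calls, so an optional group that never matches stands in for the defined subroutine. The helper name `ovector` is hypothetical.

```python
import re

def ovector(match):
    """Flatten a match into a pcre-style [start0, end0, start1, end1, ...] list.

    Hypothetical helper for illustration; group 0 is the whole match.
    """
    offsets = []
    for group in range(match.re.groups + 1):
        start, end = match.span(group)  # (-1, -1) if the group did not participate
        offsets.extend([start, end])
    return offsets

# Group 1 is optional and never participates when matching "abc".
m = re.search(r"(x)?(bc)", "abc")
print(ovector(m))  # [1, 3, -1, -1, 1, 3]
```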
tl;dr That's just how the library we use works. It's not behavior defined by LabVIEW. It's behavior defined by pcre.
11-07-2013 10:45 PM
Adam, thank you for the quick reply.
So, is there any way to extract meaningful information from these offsets returned by Execute Offsets from a subroutine?
In this new snippet below, we add a capturing group around a new subroutine 'theletterc' in an attempt to provide some meaningful offsets -- no dice.
Subroutines give meaningful values for the whole match (not shown in the snippet below, but it's bc as expected), but I can't figure out how to extract meaningful info from capturing groups inside the subroutine. Is this even possible? (Having scoured the internet and the PCRE man page, it seems this may not be a supported feature of PCRE.)
11-08-2013 10:23 AM
I don't know. You're at the bleeding edge of advanced features of regular expressions so maybe PCRE just doesn't handle it in a sane way, or maybe those features would require additional API calls that LabVIEW doesn't use. You may just have to keep digging in the PCRE documentation.
FWIW, in my experience the more complex a regular expression gets the more likely it is that you would be better off just writing some custom logic to recognize that pattern. Your code would likely end up more readable and more efficient. I like regular expressions (I added this feature to LabVIEW, after all), but sometimes they're just not the best solution. This may be one of those cases.
11-18-2013 06:44 PM
@AdamKemp wrote:
FWIW, in my experience the more complex a regular expression gets the more likely it is that you would be better off just writing some custom logic to recognize that pattern. Your code would likely end up more readable and more efficient. I like regular expressions (I added this feature to LabVIEW, after all), but sometimes they're just not the best solution. This may be one of those cases.
Ha! I agree with you wholeheartedly; that's actually the prime motivation here: templatizing some boilerplate regexes as subroutines! Just as with other source code, being able to modularize and reuse is likely better than a monolith. This also reduces recursion to something sensible and readable. It's been working great in practice (after I wrote a wrapper around Execute Offsets that handles the -1s more gracefully), and I'm still going to casually look for alternatives that allow submatches to be returned from subroutines.
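The wrapper itself isn't shown in the thread, so the following is only a hypothetical Python sketch of the idea, assuming the flat start/end offsets layout described above. It maps each -1,-1 pair to None rather than to a misleading blank string. The name `submatches_from_offsets` is made up for illustration.

```python
from typing import List, Optional

def submatches_from_offsets(text: str, offsets: List[int]) -> List[Optional[str]]:
    """Turn a flat [start, end, ...] offsets array into substrings.

    Hypothetical helper. Pair 0 is the whole match; each later pair is one
    capture group. A (-1, -1) pair means the group did not participate, so it
    becomes None instead of an empty string.
    """
    if len(offsets) % 2:
        raise ValueError("offsets must come in start/end pairs")
    result: List[Optional[str]] = []
    for i in range(0, len(offsets), 2):
        start, end = offsets[i], offsets[i + 1]
        result.append(None if start < 0 else text[start:end])
    return result

print(submatches_from_offsets("abc", [1, 3, -1, -1, 1, 3]))
# ['bc', None, 'bc']
```

Returning None keeps group numbering stable; a caller that only wants real submatches can filter the Nones out afterwards.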