LabVIEW


Match Regular Expression missing a pattern

Hi. I am attempting to use "Match Regular Expression" to help trim down some data tables that I export from ANSYS Mechanical. It worked for a while, but at some point it started skipping patterns that should match, and I cannot figure out what I am missing.

 

I start with the attached RegExIssue.html file, which has various tables and things:

[Screenshot: RegExIssue.html as exported, showing its tables]

I run this file through a VI which uses the regular expression to strip out various tables and things. The full version reformats the data and strips it down to the bare necessities. The version that I have attached here is a simplified version with just the expression in question.

 

[Screenshot: the simplified VI]

 

The regular expression that I am using is as follows:

<tbody><tr><th class="GroupName" colspan="\d*">(Scope|Definition|Integration Point Results|Information)</th></tr>(<tr><th class="ItemName">[\w\s]*</th>((<td colspan="\d*">[\w\s\(\)\+\-\*\/\=\.\,\°\%]*</td>)|(<td>[\w\s\(\)\+\-\*\/\=\.\,\°\%]*</td>)+)+</tr>)+</tbody>

 

When it works properly, the regular expression finds the subtables headed "Scope", "Definition", "Integration Point Results", and "Information", and they are removed one at a time. Recently, though, one of these subtables is not removed: in this example, the "Definition" one. The current result is the RegExIssue-Cleaned.html file, shown below:

 

[Screenshot: RegExIssue-Cleaned.html, with the "Definition" subtable still present]

 

I don't know what my regular expression is missing that keeps it from matching this table, especially since the same expression works on sites such as regex101.com.
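For checking the pattern outside LabVIEW, here is a minimal Python sketch (Python, the function name, and the sample HTML are illustrative assumptions, not part of the original VI) that applies the expression unchanged and removes matches one at a time. Like regex101, Python's `re` works on decoded text here, so the ° in the character class matches a ° in the input; that may not hold for the string LabVIEW actually receives.

```python
import re

# The expression from the original post, unchanged. Python patterns have no
# "/" delimiter, so the "/" in closing tags needs no escaping.
PATTERN = re.compile(
    r'<tbody><tr><th class="GroupName" colspan="\d*">'
    r'(Scope|Definition|Integration Point Results|Information)</th></tr>'
    r'(<tr><th class="ItemName">[\w\s]*</th>'
    r'((<td colspan="\d*">[\w\s\(\)\+\-\*\/\=\.\,\°\%]*</td>)'
    r'|(<td>[\w\s\(\)\+\-\*\/\=\.\,\°\%]*</td>)+)+</tr>)+</tbody>'
)


def strip_one_at_a_time(html: str) -> str:
    """Remove the first match, search again, and stop when nothing matches."""
    while True:
        html, count = PATTERN.subn("", html, count=1)
        if count == 0:
            return html


# Hypothetical sample: a "Definition" group (removed) and a "Results" group (kept).
sample = (
    '<tbody><tr><th class="GroupName" colspan="2">Definition</th></tr>'
    '<tr><th class="ItemName">Sweeping Phase</th><td>0. °</td></tr></tbody>'
    '<tbody><tr><th class="GroupName" colspan="2">Results</th></tr>'
    '<tr><th class="ItemName">Minimum</th><td>1.5</td></tr></tbody>'
)
print(strip_one_at_a_time(sample))
```

If the single-regex approach is kept, the inner `((<td colspan="\d*">X</td>)|(<td>X</td>)+)+` (X being the character class) can be written as `(<td( colspan="\d*")?>X</td>)+`, which matches the same rows with fewer nested quantifiers.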

 

Could anyone give me some insight into where the issue is coming from? Any help would be greatly appreciated. Thank you.

Message 1 of 6

I'll just link to this classic post...

https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

 

That said, I see a couple of issues. First, when I copy and paste your expression into regex101, it gives a bunch of errors. Those errors don't appear in LabVIEW; I haven't tracked down why, but FYI.


Also, I'd recommend a different approach. You are trying to grab the whole thing as one regex match, and (see the post above) a regex isn't a great way to parse HTML. I'd still use regex, but more as a pseudo-parser: use a regex to find the header of the table, parse forward until you find the end of that table, then delete that substring. You'll need to keep track of nested table declarations manually, but I think it'll be more robust than your current method, which seems fragile.
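That pseudo-parser idea can be sketched in Python (chosen only for illustration; the function names and sample HTML are hypothetical): a regex finds a group header, a counter walks the `<tbody>`/`</tbody>` tags to find the one that closes it, and that substring is deleted. It assumes each group is wrapped in its own `<tbody>`, as the original expression implies.

```python
import re

# Start marker: the header row of a group that should be removed.
HEADER = re.compile(
    r'<tbody><tr><th class="GroupName" colspan="\d*">'
    r'(?:Scope|Definition|Integration Point Results|Information)</th>'
)
# Any <tbody ...> or </tbody>; group(1) is "/" for a closing tag.
TBODY_TAG = re.compile(r'<(/?)tbody\b[^>]*>')


def remove_first_group(html: str) -> tuple[str, bool]:
    """Delete the first listed group's <tbody>...</tbody>; report whether it did."""
    m = HEADER.search(html)
    if m is None:
        return html, False
    depth = 0
    for tag in TBODY_TAG.finditer(html, m.start()):
        depth += -1 if tag.group(1) else 1
        if depth == 0:  # this </tbody> closes the one the header opened
            return html[:m.start()] + html[tag.end():], True
    return html, False  # unbalanced markup: stop rather than guess


def remove_groups(html: str) -> str:
    """Keep removing listed groups until none are left."""
    removed = True
    while removed:
        html, removed = remove_first_group(html)
    return html


# Hypothetical sample: the "Definition" group contains a nested table.
sample = (
    '<table>'
    '<tbody><tr><th class="GroupName" colspan="2">Definition</th></tr>'
    '<tr><td><table><tbody><tr><td>nested</td></tr></tbody></table></td></tr>'
    '</tbody>'
    '<tbody><tr><th class="GroupName" colspan="2">Results</th></tr>'
    '<tr><th class="ItemName">Minimum</th><td>1.5</td></tr></tbody>'
    '</table>'
)
print(remove_groups(sample))
```

Because the cell contents are never described, characters such as ° in a row no longer matter to whether a group is found.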

Message 2 of 6

The one thing that jumps out at me is the degree symbol on the Sweeping Phase line.

 

LabVIEW may not support extended ASCII characters.
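One way the degree sign could cause exactly this symptom is sketched below in Python. It rests on assumptions not confirmed in the thread: that the exported HTML is UTF-8, where ° is stored as the two bytes 0xC2 0xB0, while a pattern typed in LabVIEW holds ° as the single byte 0xB0 and matching is done byte by byte. Under those assumptions the character class accepts 0xB0 but not 0xC2, so the row containing ° fails and the whole "Definition" group is skipped. That would also be consistent with different ALT codes for ° making no difference.

```python
import re

# Hypothetical table cell, encoded the way a UTF-8 HTML export would store it.
cell = "<td>0. °</td>".encode("utf-8")  # "°" becomes the two bytes 0xC2 0xB0

# Byte pattern whose class holds only 0xB0, the single-byte Windows-1252/Latin-1
# degree sign (roughly what a single-byte pattern string would contain).
single_byte = re.compile(rb"<td>[\w\s\.\xb0]*</td>")

# Alternative that accepts the two-byte UTF-8 sequence as one unit.
utf8_aware = re.compile(rb"<td>(?:[\w\s\.]|\xc2\xb0)*</td>")

print(single_byte.search(cell))             # None: 0xC2 is not in the class
print(utf8_aware.search(cell) is not None)  # True
```

If this is the cause, a looser cell pattern such as `[^<]*` would sidestep the encoding question entirely.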

Message 3 of 6

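Yeah, with regex101 you just need to change the delimiter in the top-left corner to something other than /, and it seems to work fine. (The pattern contains unescaped / characters in its closing tags, such as </th>, which the default / delimiter treats as the end of the pattern.)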

 

[Screenshot: regex101 with the delimiter changed from /]

 

Regarding that link: yes, very good, and very passionately written. Because I am using this for a very specific set of cases, I think I still want to pursue this. I think part of the problem was that I was having trouble doing what you suggested. For some reason I was having difficulty finding the end of whatever table I was looking for, and eventually I stumbled upon this goofy solution. When most of these tables end with </tr></tbody> it makes it a bit tricky to find the right one. But yeah, it is fragile. The full version I wrote in something that I call my "Step Stepper", which allows me to step through the program bit by bit to see where things break and try to fix it.
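One way to find the right </tr></tbody> without describing every row is a lazy quantifier anchored on the group header: `.*?` stops at the first `</tbody>` after it. A minimal Python sketch, assuming group tables are never nested inside one another (if they are, a depth counter is needed instead); the function name and sample are hypothetical.

```python
import re

# Assumption: a group's rows contain no nested <tbody>, so the first </tbody>
# after a GroupName header closes that group. The lazy .*? stops there.
GROUP = re.compile(
    r'<tbody><tr><th class="GroupName" colspan="\d*">'
    r'(?:Scope|Definition|Integration Point Results|Information)</th></tr>'
    r'.*?</tbody>',
    re.DOTALL,  # allow . to match across line breaks in the export
)


def strip_groups(html: str) -> str:
    """Remove every listed group, whatever characters its cells contain."""
    return GROUP.sub("", html)


# Hypothetical sample; the "Definition" rows span a line break.
sample = (
    '<tbody><tr><th class="GroupName" colspan="2">Definition</th></tr>\n'
    '<tr><th class="ItemName">Sweeping Phase</th><td>0. °</td></tr></tbody>'
    '<tbody><tr><th class="GroupName" colspan="2">Results</th></tr>'
    '<tr><th class="ItemName">Minimum</th><td>1.5</td></tr></tbody>'
)
print(strip_groups(sample))
```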

 

Any time I've tried regex stuff on an actual website it has always been a nightmare and I give up very very fast.

 

Message 4 of 6

Yeah, I tried a few different ones, ALT+0176 (°) and ALT+248 (°), and neither seemed to work. It's weird, because I think I've had that symbol in there for a while, so it must have worked at some point.

Message 5 of 6

@JBatSRO wrote:

For some reason I was having difficulty finding the end of whatever table I was looking for, and eventually I stumbled upon this goofy solution. When most of these tables end with </tr></tbody> it makes it a bit tricky to find the right one. But yeah, it is fragile. The full version I wrote in something that I call my "Step Stepper", which allows me to step through the program bit by bit to see where things break and try to fix it.


I think that's where you'll be better off, actually. Doing some "regular code" plus regex will always be more powerful than regex alone. That's a REALLY long regex, and it's going to be hard to nail down.

 

Try breaking it up into a series of much simpler steps. As that link mentions, the groupings are not guaranteed to be in specific orders, so a regex can't really do what you need it to in a "general" sense. Your case might be focused enough for it to work, but it'll be fragile.

 

If you boil it down to substeps, I think you first want to find where each subtable starts, as marked by "GroupName", and then delete the areas between those starts. This VI will find these start points:

 

[Image: Find header row positions]
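A rough text-based equivalent of that header search, as a Python sketch (the VI itself isn't reproduced here; the pattern assumes header rows look like `<tr><th class="GroupName" ...>Name</th></tr>`, and the sample is hypothetical):

```python
import re

# Assumed header-row shape, based on the "GroupName" class in the export.
HEADER_ROW = re.compile(r'<tr><th class="GroupName"[^>]*>([^<]*)</th></tr>')


def find_header_rows(html: str) -> list[tuple[str, int]]:
    """Return (group name, start offset) for each GroupName header row, in order."""
    return [(m.group(1), m.start()) for m in HEADER_ROW.finditer(html)]


# Hypothetical single-table sample.
sample = (
    '<table><tr><th class="GroupName" colspan="2">Scope</th></tr>'
    '<tr><th class="ItemName">Geometry</th><td>All Bodies</td></tr>'
    '<tr><th class="GroupName" colspan="2">Results</th></tr>'
    '<tr><th class="ItemName">Minimum</th><td>1.5</td></tr></table>'
)
for name, pos in find_header_rows(sample):
    print(name, pos)
```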

 

If you want to delete everything between those rows, it's now easy for all of them except Information. The Information section runs all the way to the end of the table, so you need to find where the table ends by looking for the first </td></tr> that is followed by a </table> marker AFTER the Information position. Again, that's an easy substep:

 

[Image: Find end of last row in table]
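The end-of-table search could look like this in Python. It is a sketch that assumes the last row is closed by </td></tr> with at most an optional </tbody> before </table>, which may not match the real export exactly; the function name and sample are hypothetical.

```python
import re

# Assumption: the table's final row ends in </td></tr>, optionally followed by
# </tbody>, and then </table>. The lookahead checks that without consuming it.
LAST_ROW_END = re.compile(r'</td></tr>(?=\s*(?:</tbody>\s*)?</table>)')


def find_last_row_end(html: str, start: int) -> int:
    """Offset just past the table-closing '</td></tr>' found after `start`, or -1."""
    m = LAST_ROW_END.search(html, start)
    return m.end() if m else -1


# Hypothetical sample: Information is the last group in the table.
sample = (
    '<table><tr><th class="GroupName" colspan="2">Information</th></tr>'
    '<tr><th class="ItemName">Run Time</th><td>1 s</td></tr>'
    '<tr><th class="ItemName">Memory</th><td>1 MB</td></tr></table>'
)
start = sample.index("Information")
end = find_last_row_end(sample, start)
print(sample[end:])  # prints </table>
```

The first </td></tr> after "Information" is skipped because another row follows it; only the one directly before </table> satisfies the lookahead.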

 

Put those together and you can parse the whole thing:

 

[Image: RegExIssue, the combined VI]

 

That was pretty quick and dirty, but I think a series of simpler functions and searches like this will work more reliably than one giant regex.
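Putting the two searches together might look like the following Python sketch. The names are hypothetical, and it assumes header rows look like `<tr><th class="GroupName" ...>Name</th></tr>` and that each group runs from its header row to the next header row or to the end of its table's last row, whichever comes first.

```python
import re

HEADER_ROW = re.compile(r'<tr><th class="GroupName"[^>]*>([^<]*)</th></tr>')
# Assumed table ending: </td></tr>, optional </tbody>, then </table>.
LAST_ROW_END = re.compile(r'</td></tr>(?=\s*(?:</tbody>\s*)?</table>)')
DROP = {"Scope", "Definition", "Integration Point Results", "Information"}


def strip_groups(html: str, drop: set[str] = DROP) -> str:
    """Delete each dropped group from its header row to the next header row,
    or to the end of its table's last row, whichever comes first."""
    headers = [(m.group(1), m.start()) for m in HEADER_ROW.finditer(html)]
    spans = []
    for i, (name, start) in enumerate(headers):
        if name not in drop:
            continue
        candidates = []
        if i + 1 < len(headers):
            candidates.append(headers[i + 1][1])  # next group's header row
        table_end = LAST_ROW_END.search(html, start)
        if table_end:
            candidates.append(table_end.end())    # end of this table's last row
        if candidates:
            spans.append((start, min(candidates)))
    # Delete back to front so earlier offsets stay valid.
    for start, end in reversed(spans):
        html = html[:start] + html[end:]
    return html


# Hypothetical single-table sample.
sample = (
    '<table>'
    '<tr><th class="GroupName" colspan="2">Scope</th></tr>'
    '<tr><th class="ItemName">Geometry</th><td>All Bodies</td></tr>'
    '<tr><th class="GroupName" colspan="2">Results</th></tr>'
    '<tr><th class="ItemName">Minimum</th><td>1.5</td></tr>'
    '<tr><th class="GroupName" colspan="2">Information</th></tr>'
    '<tr><th class="ItemName">Run Time</th><td>1 s</td></tr>'
    '</table>'
)
print(strip_groups(sample))
```

Taking the minimum of the two candidate ends keeps a deletion from running past the end of its own table when the file holds several tables.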

Message 6 of 6