01-23-2026 02:14 PM
Hi. I am attempting to use "Match Regular Expression" to help trim down some data tables that I export from ANSYS Mechanical. It was working for a while, but at some point it kinda just started skipping/missing matchable patterns, and I cannot figure out what I am missing.
I start with the attached RegExIssue.html file, which has various tables and things:
I run this file through a VI which uses the regular expression to strip out various tables and things. The full version reformats the data and strips it down to the bare necessities. The version that I have attached here is a simplified version with just the expression in question.
The regular expression that I am using is as follows:
<tbody><tr><th class="GroupName" colspan="\d*">(Scope|Definition|Integration Point Results|Information)</th></tr>(<tr><th class="ItemName">[\w\s]*</th>((<td colspan="\d*">[\w\s\(\)\+\-\*\/\=\.\,\°\%]*</td>)|(<td>[\w\s\(\)\+\-\*\/\=\.\,\°\%]*</td>)+)+</tr>)+</tbody>
When it functions properly, the sub tables with the headings "Scope", "Definition", "Integration Point Results", and "Information" should be found by the regular expression and then removed (one at a time). What I have found, recently, is that one of these sub tables does not get removed, the "Definition" one in this example. The result that I get currently is shown in the RegExIssue-Cleaned.html file, shown below:
I do not know what I am missing in my regular expression to grab this table, especially since I am able to successfully utilize this regular expression on websites such as regex101.com
I was hoping I could get some insight or direction regarding where my issue is coming from? Any help would be greatly appreciated. Thank you.
01-23-2026 03:02 PM
I'll just link to this classic post...
https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
That said, I see a couple of issues. First, when I copy/paste your stuff into regex101, it gives a bunch of errors. Those same errors don't appear in LabVIEW. I haven't tracked down why, but FYI.
Also, I think I'd recommend a different approach. You are trying to grab the whole thing as one regex match, and (see post above) a regex isn't a great way to parse HTML. I'd recommend still using regex, but more as a pseudo-parser: use a regex to find the header of the table, then parse until you find the end of the table, then delete that substring. You'll need to keep track of nested table declarations manually, but I think it'll be more robust than your current method, which seems fragile.
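Since a block diagram can't be pasted as text, here is a rough Python sketch of the pseudo-parser idea: a small regex finds the group header, then a depth counter walks forward over the <tbody> tags until the one that closes the sub-table. The function names are made up for illustration, and the assumption that each sub-table sits in its own <tbody>...</tbody> pair is only inferred from the regex in the original post, not checked against the attached file.

```python
import re

# Header of a sub-table to remove. The group names come from the original
# regex; the <tbody>-per-group layout is an assumption.
HEADER = re.compile(
    r'<tbody><tr><th class="GroupName" colspan="\d*">'
    r'(?:Scope|Definition|Integration Point Results|Information)</th>'
)
# Any opening or closing tbody tag; group 1 is "/" for a closing tag.
TBODY_TAG = re.compile(r'<(/?)tbody\b[^>]*>')


def strip_group(html: str) -> str:
    """Remove the first matching sub-table, or return html unchanged."""
    header = HEADER.search(html)
    if header is None:
        return html
    depth = 0
    # Walk the tbody tags from the header onward, tracking nesting depth.
    for tag in TBODY_TAG.finditer(html, header.start()):
        depth += -1 if tag.group(1) else 1
        if depth == 0:  # this closing tag ends the sub-table
            return html[:header.start()] + html[tag.end():]
    return html  # unbalanced markup: leave the input alone


def strip_all_groups(html: str) -> str:
    """Remove matching sub-tables one at a time until none are left."""
    while True:
        cleaned = strip_group(html)
        if cleaned == html:
            return cleaned
        html = cleaned
```

The point of the split is that the header regex no longer has to describe every character allowed inside a cell, so an unexpected symbol in a row cannot make the whole match fail.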
01-23-2026 03:14 PM
The one thing that jumps out at me is the degree symbol on the Sweeping Phase line.
LabVIEW may not support extended ASCII characters.
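One way a degree symbol can break a match, sketched in Python for illustration: in UTF-8 the ° character is stored as two bytes (0xC2 0xB0), while in Windows-1252 it is a single byte (0xB0). If the pattern is compared byte by byte and was typed in one encoding while the file was saved in the other, the character class will not cover the symbol. Whether that is what happens inside the attached VI is an assumption; the cell text below is a made-up example, and the character class is shortened from the original.

```python
import re

cell = "<td>0. °</td>"                 # hypothetical table cell
as_cp1252 = cell.encode("cp1252")      # ° becomes the single byte 0xB0
as_utf8 = cell.encode("utf-8")         # ° becomes the two bytes 0xC2 0xB0

# Byte-oriented pattern whose class contains only the one-byte form of °.
byte_pattern = re.compile(rb"<td>[\w\s.\xb0]*</td>")

print(byte_pattern.search(as_cp1252) is not None)  # True
print(byte_pattern.search(as_utf8) is not None)    # False: 0xC2 is not in the class

# Decoding to text first makes ° a single character again.
text_pattern = re.compile(r"<td>[\w\s.°]*</td>")
print(text_pattern.search(as_utf8.decode("utf-8")) is not None)  # True
```

If this is the cause, checking the raw bytes of the exported file around the Sweeping Phase row would confirm it.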
01-23-2026 04:04 PM
Yeah, with regex101 I/you/we just need to change the delimiter in the top left corner to something other than the / and it seems to work fine.
Regarding that link, yes, very good. Very passionately written. I think because I am using this for a very specific set of cases that I still want to pursue this. I think part of the deal was I was having trouble doing what you suggested. For some reason I was having difficulty finding the end of whatever table I was looking for, and eventually I stumbled upon this goofy solution. When most of these tables end with </tr></tbody> it makes it a bit tricky to find the right one. But yeah, it is fragile. The full version I wrote in something that I call my "Step Stepper", which allows me to step through the program bit by bit to see where things break and try to fix it.
Any time I've tried regex stuff on an actual website it has always been a nightmare and I give up very very fast.
01-23-2026 04:04 PM
Yeah, I tried a few different ones, ALT+0176 ° and ALT+248 °, and neither seemed to work. It's weird because I think I've had that symbol in there for a while, so it must have worked at some point.
01-23-2026 05:38 PM
@JBatSRO wrote:
For some reason I was having difficulty finding the end of whatever table I was looking for, and eventually I stumbled upon this goofy solution. When most of these tables end with </tr></tbody> it makes it a bit tricky to find the right one. But yeah, it is fragile. The full version I wrote in something that I call my "Step Stepper", which allows me to step through the program bit by bit to see where things break and try to fix it.
I think that's where you'll be better off, actually. Doing some "regular code" plus regex will always be more powerful than regex alone. That's a REALLY long regex and it's going to be hard to nail it down.
Try breaking it up into a series of much simpler steps. As that link mentions, the groupings are not guaranteed to be in specific orders, so a regex can't really do what you need it to in a "general" sense. Your case might be focused enough for it to work, but it'll be fragile.
If you boil it down to substeps, I think you first want to find where each subtable starts as marked by "GroupName" and delete the areas between them. This VI will find these start points:
If you want to delete everything between those rows, it's now easy for all except Information. The Information section goes all the way through the end of the table, so you then need to find that by looking for the first </td></tr> followed by a </table> marker AFTER the Information position. Again, that's an easy substep:
Put those together and you can parse the whole thing:
That was pretty quick and dirty but I think that method of a bunch of simpler functions/searches will work more reliably than a giant regex.
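For anyone reading without the VI snippets, the same substeps can be sketched in text form. This is a Python approximation, not a transcription of the VIs: the function names are hypothetical, and the exact markup around the end of the table (an optional </tbody> before </table>) is an assumption.

```python
import re

# Substep 1: every sub-table starts at a row holding a "GroupName" header.
GROUP_START = re.compile(r'<tr><th class="GroupName"[^>]*>([^<]*)</th></tr>')
# Substep 2: the last row of a table is a </td></tr> followed by </table>
# (allowing an optional </tbody> in between, which is an assumption).
LAST_ROW_END = re.compile(r'</td></tr>(?=\s*(?:</tbody>\s*)?</table>)')
# Group names taken from the original regex.
UNWANTED = {"Scope", "Definition", "Integration Point Results", "Information"}


def find_group_starts(html):
    """Return (offset, group name) for each GroupName header row."""
    return [(m.start(), m.group(1)) for m in GROUP_START.finditer(html)]


def find_table_end(html, pos):
    """Offset just past the last row of the table that contains pos."""
    m = LAST_ROW_END.search(html, pos)
    return m.end() if m else len(html)


def remove_groups(html):
    """Substep 3: cut each unwanted group out, keeping everything else."""
    starts = find_group_starts(html)
    pieces, keep_from = [], 0
    for i, (start, name) in enumerate(starts):
        if name not in UNWANTED:
            continue
        next_start = starts[i + 1][0] if i + 1 < len(starts) else len(html)
        # A group ends at the next header or at the end of its own table,
        # whichever comes first.
        end = min(next_start, find_table_end(html, start))
        pieces.append(html[keep_from:start])
        keep_from = end
    pieces.append(html[keep_from:])
    return "".join(pieces)
```

Cutting from one header row to the next leaves any enclosing <tbody> and <table> tags paired up, which is why the start marker is the <tr> rather than the <tbody>.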