LabVIEW


Match pattern html tag


Is there a way to combine these two regular expressions, "[<][buis][>]" and "[<][/][buis][>]", into one regular expression?
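For readers who want to try the idea outside LabVIEW, here is a minimal Python sketch of one way to merge the two patterns: make the slash optional. The pattern `<(/?)([buis])>` and the helper name `find_tags` are illustrative assumptions, not something from the original VIs.

```python
import re

# One pattern for both opening and closing tags:
#   group 1 is "/" for a closing tag and "" for an opening tag,
#   group 2 is the style letter (b, u, i or s).
TAG = re.compile(r"<(/?)([buis])>")

def find_tags(text):
    """Return (offset, is_closing, style_letter) for every tag in text."""
    return [(m.start(), m.group(1) == "/", m.group(2))
            for m in TAG.finditer(text)]

sample = "un <b>jongleur</b> et un <i>arc</i>"
for offset, is_closing, letter in find_tags(sample):
    print(offset, "close" if is_closing else "open", letter)
```

In LabVIEW, the same PCRE-style pattern should be usable with Match Regular Expression. For Match Pattern, the bracketed equivalent would presumably be "[<][/]?[buis][>]", assuming `?` means "zero or one" there; that is worth checking against the function's help.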

 

Match pattern.jpg

0 Kudos
Message 1 of 12
(4,422 Views)

I posted this last year; I think it's what you're looking for.  There's a lot of good stuff on that thread.

Jim
You're entirely bonkers. But I'll tell you a secret. All the best people are. ~ Alice
For he does not know what will happen; So who can tell him when it will occur? Eccl. 8:7

Message 2 of 12
(4,392 Views)

Thanks Jim, there's really interesting stuff in this thread but I am still stuck (maybe I am missing something)...

Let me try to explain.  I need to know the position of each HTML tag (bold, underline, italic and strikeout) in the text, remove those tags and separate the text by styles (bold, underline, italic, strikeout or a mix of styles).

I start with a text like this:

"Il était une fois un <b>jongleur qui ne <i>savait pas</i> jongler alors<u> il utilisa</b> un </u>arc.  Il demeurait quand même un <s>peu</s> suspicieux."

 

The formatted text is:
"Il était une fois un jongleur qui ne savait pas jongler alors il utilisa un arc. Il demeurait quand même un peu suspicieux."

 

And I obtain this:

Style and text.jpg

 

Those VIs do the work (with a few small bugs).

I am just wondering if it is possible to simplify the search for HTML tags.
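The task described above (locate each tag, remove it, and group the text by active style) can be sketched in Python with a single combined pattern. This is only an illustration under assumptions: the function name `split_by_style` and the convention of returning the active styles as a sorted string such as "bi" are made up for the example and are not taken from the VIs in this post.

```python
import re

TAG = re.compile(r"<(/?)([buis])>")

def split_by_style(text):
    """Split text into (styles, chunk) runs and drop the tags.

    styles is a sorted string of active letters, e.g. "bi" for bold italic.
    A set of active styles is used instead of a stack, so overlapping
    tags such as <u>..</b>..</u> are tolerated.
    """
    runs, active, pos = [], set(), 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            # Text between the previous tag and this one keeps the current styles.
            runs.append(("".join(sorted(active)), text[pos:m.start()]))
        if m.group(1):
            active.discard(m.group(2))   # closing tag switches the style off
        else:
            active.add(m.group(2))       # opening tag switches the style on
        pos = m.end()
    if pos < len(text):
        runs.append(("".join(sorted(active)), text[pos:]))
    return runs

sample = ("Il était une fois un <b>jongleur qui ne <i>savait pas</i> jongler "
          "alors<u> il utilisa</b> un </u>arc.  Il demeurait quand même un "
          "<s>peu</s> suspicieux.")
for styles, chunk in split_by_style(sample):
    print(repr(styles), repr(chunk))
```

Concatenating the chunks gives back the text without tags, and each run carries the mix of styles that applies to it.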

 

Jean-Marc

0 Kudos
Message 3 of 12
(4,366 Views)

Hi Jean-Marc, 

 

I am still a little confused about what you are trying to do. Do you just want to figure out where a certain tag is, then sort by style? If so, try the following:

 

1) Do a simple comparison for the tags, using an Equal? VI, to find the desired tag, and its position

2) Use a String Subset VI, with offset of two or something like that for the style tag (example: for <b>jongleur qui ne, the tag you found with the comparison would be 'jongleur qui ne,' and if you did an offset of 2 it would give you the position of the style letter, or 'b' in this case). 

 

If this is not what you mean, please clarify further. Thanks!

 

~kgarrett

 

District Sales Engineer
0 Kudos
Message 4 of 12
(4,314 Views)

Do you care that your tags are improperly nested?

0 Kudos
Message 5 of 12
(4,308 Views)

@Darin.K wrote:

Do you care that your tags are improperly nested?


That threw me when I started working on it.

Jim

0 Kudos
Message 6 of 12
(4,298 Views)

 

I'll try to explain what I want to do.  Currently, if I add a string to the PDF document, the font (name, size, bold, underline, italic and strikeout) is the same for the whole string.

 

Append paragraph0.jpg

 

I want to be able to change the style (bold, underline, italic and strikeout) for part of the string, which is why I want to use the HTML tags (<b></b>, <u></u>, <i></i> and <s></s>).

 

If I add this string:

Append paragraph.jpg

 

I want to automatically separate the string style by style (bold, underline, italic and strikeout), like this:

 

Style and text.jpg

 

After that, I will write the array of styles and text to the PDF document, element by element.  It is currently working but not completely tested.

I am just wondering if it is possible to simplify the search for HTML tags by combining the two regular expressions from the first message.

 

0 Kudos
Message 7 of 12
(4,287 Views)
I understand what you are trying to do. Your example text has ill-formed HTML. Is this something you have to deal with, or just a result of hand-generating the example? The suggestions you'll receive depend on this detail.
0 Kudos
Message 8 of 12
(4,274 Views)

Darin,

 

To be honest, I have no real experience with html and yes,  it's "a result of hand generating the example"... 

 

What I really want is the ability to put a word in bold, in italic, or in bold and italic.  I just tried to be as general as possible.

0 Kudos
Message 9 of 12
(4,271 Views)
Solution
Accepted by topic author J-M

You wrote what I consider to be "human form" HTML: easy for us, hard for computers.  It is really much better to generate XML-compliant HTML, which simply means that all tags must be closed in the reverse order of their opening.  Sometimes this means you have to close a tag and then reopen it.  For example:

 

<b>This is bold, <i>this is bold and italic</i></b><i>and just italic</i>

 

The single span of italic text has to be broken up to obey the rules.
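If the input cannot be guaranteed to be well-formed, the close-and-reopen rule can also be applied automatically. The following Python sketch is one possible way to do it, not the method used in this post; the function name `make_well_formed` is hypothetical, and the sketch only knows the four tags b, u, i and s.

```python
import re

TAG = re.compile(r"<(/?)([buis])>")

def make_well_formed(text):
    """Rewrite overlapping b/u/i/s tags so they close in reverse order."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        closing, letter = m.group(1) == "/", m.group(2)
        if not closing:
            if letter not in stack:      # ignore a style that is already open
                stack.append(letter)
                out.append(f"<{letter}>")
        elif letter in stack:            # a stray closing tag is dropped
            # Close everything opened after `letter`, innermost first...
            reopen = []
            while stack[-1] != letter:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopen.append(inner)
            stack.pop()
            out.append(f"</{letter}>")
            # ...then reopen those styles in their original order.
            for inner in reversed(reopen):
                stack.append(inner)
                out.append(f"<{inner}>")
    out.append(text[pos:])
    while stack:                         # close anything still open at the end
        out.append(f"</{stack.pop()}>")
    return "".join(out)

print(make_well_formed(
    "<b>This is bold, <i>this is bold and italic</b>and just italic</i>"))
```

Note that this can emit empty elements (for example when a tag is reopened right before its own closing tag), which is harmless for a parser but may be worth filtering out.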

 

If you generate well-formed HTML, you can simply use the built-in (LV9+) XML parser to find the formatting.  I use a simple XPath expression to get all text nodes, and iterate up the parents to find the formatting.  This is more robust in the long run than regexes.  You should break it up into subVIs; I crammed it into a single snippet for posting purposes.
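Since the snippet above is a LabVIEW image, here is a rough textual analogue in Python of the same idea: parse the well-formed fragment as XML, visit every text node, and walk up its parents to collect the formatting. It uses the standard `xml.dom.minidom` module and a recursive walk rather than XPath; the function name `styled_runs`, the dummy `<root>` wrapper and the sorted-letters style string are assumptions made for this sketch.

```python
from xml.dom import minidom

STYLE_TAGS = {"b", "u", "i", "s"}

def styled_runs(fragment):
    """Return (styles, text) pairs for a well-formed fragment."""
    # A fragment has no single root element, so wrap it in a dummy one.
    doc = minidom.parseString(f"<root>{fragment}</root>")
    runs = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                # Climb the ancestors to see which styles enclose this text.
                styles, parent = set(), child.parentNode
                while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
                    if parent.tagName in STYLE_TAGS:
                        styles.add(parent.tagName)
                    parent = parent.parentNode
                runs.append(("".join(sorted(styles)), child.data))
            else:
                walk(child)

    walk(doc.documentElement)
    return runs

for styles, text in styled_runs(
        "<b>This is bold, <i>this is bold and italic</i></b><i>and just italic</i>"):
    print(repr(styles), repr(text))
```

As with any XML parser, the input has to be well-formed: overlapping tags or an unescaped `&` or `<` in the text will raise a parse error instead of being silently tolerated.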

 

ParseTextFormat.png

Message 10 of 12
(4,259 Views)