Match pattern html tag

J-M · ‎07-01-2011

Is there a way to combine these two regular expressions, "[<][buis][>]" and "[<][/][buis][>]", into a single regular expression?
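One possible way to merge the two patterns is to make the slash optional with the `?` quantifier. The sketch below uses Python's `re` module rather than LabVIEW, and the names `TAG`, `text` and `tags` are only illustrative. In the original bracket style the same idea would read "[<][/]?[buis][>]", assuming the matching function in use accepts `?` as "zero or one".

```python
import re

# The "/?" makes the slash optional, so a single pattern
# matches both opening (<b>) and closing (</b>) tags.
TAG = re.compile(r"</?[buis]>")

text = "un <s>peu</s> suspicieux"

# Collect (offset, tag) pairs for every match in the text.
tags = [(m.start(), m.group()) for m in TAG.finditer(text)]
print(tags)  # [(3, '<s>'), (9, '</s>')]
```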

Jean-Marc
LV2019
Free PDF Report with iTextSharp

jcarmody · ‎07-01-2011

I posted this last year; I think it's what you're looking for. There's a lot of good stuff on that thread.

Jim
You're entirely bonkers. But I'll tell you a secret. All the best people are. ~ Alice
For he does not know what will happen; So who can tell him when it will occur? Eccl. 8:7

J-M · ‎07-02-2011

Thanks Jim, there's really interesting stuff in this thread but I am still stuck (maybe I am missing something)...

Let me try to explain. I need to know the position of each HTML tag (bold, underline, italic and strikeout) in the text, remove those tags and separate the text by styles (bold, underline, italic, strikeout or a mix of styles).
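The task described above (locate each tag, strip it, and group the remaining text by active styles) can be sketched roughly as follows. This is Python, not the VIs from the thread, and `split_by_style`, `runs` and `positions` are hypothetical names. It assumes only the four tags b, u, i and s, and it tolerates tags closed out of order because it tracks a set of active styles instead of a stack.

```python
import re

# Group 1 is "/" for a closing tag, group 2 is the style letter.
TAG = re.compile(r"<(/?)([buis])>")

def split_by_style(text):
    """Return (runs, positions).

    runs: list of (styles, chunk), styles being a sorted string such as "bi".
    positions: list of (offset_in_original_text, tag).
    """
    runs, positions = [], []
    active = set()   # styles currently switched on
    pos = 0          # end of the previous tag
    for m in TAG.finditer(text):
        if m.start() > pos:
            runs.append(("".join(sorted(active)), text[pos:m.start()]))
        positions.append((m.start(), m.group()))
        closing, letter = m.groups()
        if closing:
            active.discard(letter)
        else:
            active.add(letter)
        pos = m.end()
    if pos < len(text):
        runs.append(("".join(sorted(active)), text[pos:]))
    return runs, positions

runs, positions = split_by_style("a<b>b<i>c</b>d</i>e")
print(runs)
```

Each element of `runs` could then be written to the PDF with the font style named by its first field.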

I start with a text like this:

"Il était une fois un jongleur qui ne savait pas jongler alors il utilisa un arc. Il demeurait quand même un <s>peu</s> suspicieux."

The formatted text is:
"Il était une fois un jongleur qui ne savait pas jongler alors il utilisa un arc. Il demeurait quand même un ~~peu~~ suspicieux."

And I obtain this:

These VIs do the job (with a few small bugs).

I am just wondering whether it is possible to simplify the search for HTML tags.

Jean-Marc


kellis.wiseman · ‎07-06-2011

Hi Jean-Marc,

I am still a little confused about what you are trying to do. Do you just want to figure out where a certain tag is, then sort by style? If so, try the following:

1) Do a simple comparison for the tags, using an Equal? VI, to find the desired tag, and its position

2) Use a String Subset VI, with offset of two or something like that for the style tag (example: for jongleur qui ne, the tag you found with the comparison would be 'jongleur qui ne,' and if you did an offset of 2 it would give you the position of the style letter, or 'b' in this case).

If this is not what you mean, please clarify further. Thanks!

~kgarrett

District Sales Engineer

Darin.K · ‎07-06-2011

Do you care that your tags are improperly nested?

jcarmody · ‎07-07-2011

@Darin.K wrote:

Do you care that your tags are improperly nested?

That threw me when I started working on it.

Jim

J-M · ‎07-07-2011

I'll try to explain what I want to do. Currently, if I add a string to the PDF document, the font (name, size, bold, underline, italic and strikeout) is the same for the whole string.

I want to be able to change the style (bold, underline, italic and strikeout) for part of the string, which is why I want to use the HTML tags (<b></b>, <u></u>, <i></i> and <s></s>).

If I add this string:

I want to automatically separate the string ( style by style - bold, underline, italic and strikeout), like this:

After that, I will write the array of styles and text to the PDF document, element by element. It currently works but has not been completely tested.

I am just wondering whether it is possible to simplify the search for HTML tags by combining the two regular expressions from the first message.

Jean-Marc

Darin.K · ‎07-07-2011

I understand what you are trying to do. Your example text has ill-formed HTML. Is this something you have to deal with, or just a result of hand-generating the example? The suggestions you'll receive depend on this detail.

J-M · ‎07-07-2011

Darin,

To be honest, I have no real experience with html and yes, it's "a result of hand generating the example"...

What I really want is to be able to put a word in bold, in italic, or in both bold and italic. I just tried to be as general as possible.

Jean-Marc

Darin.K · ‎07-07-2011

You wrote what I consider to be "human form" HTML: easy for us, hard for computers. It is really much better to generate XML-compliant HTML, which simply means that all tags must be closed in the reverse order of their opening. Sometimes this means you have to close a tag and then reopen it. For example:

<b>This is bold, <i>this is bold and italic</i></b><i> and just italic</i>

The single span of italic text has to be broken up to obey the rules.
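The close-and-reopen rule described above can be sketched as a small normalizer that turns "human form" markup into well-formed markup. This is a Python illustration under stated assumptions, not code from the thread: `make_well_formed` is a hypothetical name, only the tags b, u, i and s are handled, a stray closing tag is dropped, and a style opened twice is kept open only once.

```python
import re

TAG = re.compile(r"<(/?)([buis])>")

def make_well_formed(text):
    """Rewrite b/u/i/s tags so they close in reverse order of opening."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])   # plain text before this tag
        pos = m.end()
        closing, letter = m.groups()
        if not closing:
            if letter not in stack:       # skip re-opening an active style
                stack.append(letter)
                out.append(f"<{letter}>")
        elif letter in stack:
            reopened = []
            # Close every tag opened after `letter`, innermost first.
            while stack[-1] != letter:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopened.append(inner)
            stack.pop()
            out.append(f"</{letter}>")
            # Reopen the interrupted styles in their original order.
            for inner in reversed(reopened):
                stack.append(inner)
                out.append(f"<{inner}>")
    out.append(text[pos:])
    while stack:                          # close anything left open
        out.append(f"</{stack.pop()}>")
    return "".join(out)

fixed = make_well_formed(
    "<b>This is bold, <i>this is bold and italic</b> and just italic</i>"
)
print(fixed)
```

The italic span is split in two in the output, once inside the bold element and once after it. Reopening can leave empty elements such as `<i></i>` when a closing tag follows immediately, which is harmless for parsing.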

If you generate well-formed HTML, you can simply use the built-in (LV9+) XML parser to find the formatting. I use a simple XPath expression to get all text nodes, and iterate up the parents to find the formatting. This is more robust in the long run than regexes. You should break it up into subVIs; I crammed it into a single snippet for posting purposes.
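For readers without the snippet, the parser-based idea can be approximated in Python with the standard `xml.etree.ElementTree` module. This is a sketch of the same principle, not a transcription of the LabVIEW code: ElementTree elements have no parent pointers, so instead of walking up from each text node it walks down the tree and carries the active styles along. `styled_runs` and `STYLE_TAGS` are hypothetical names, and the fragment is wrapped in an assumed `<root>` element so that it has a single top-level node.

```python
import xml.etree.ElementTree as ET

STYLE_TAGS = {"b", "u", "i", "s"}

def styled_runs(fragment):
    """Return [(styles, text), ...] for a well-formed HTML fragment."""
    root = ET.fromstring(f"<root>{fragment}</root>")
    runs = []

    def walk(node, styles):
        if node.tag in STYLE_TAGS:
            styles = styles | {node.tag}
        label = "".join(sorted(styles))
        if node.text:                     # text before the first child
            runs.append((label, node.text))
        for child in node:
            walk(child, styles)
            if child.tail:                # text after the child, styled by node
                runs.append((label, child.tail))

    walk(root, frozenset())
    return runs

result = styled_runs(
    "<b>This is bold, <i>this is bold and italic</i></b><i> and just italic</i>"
)
print(result)
```

Improperly nested input makes `ET.fromstring` raise `ParseError`, which is one reason to generate well-formed markup first.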
