07-01-2011 10:36 AM
Is there a way to combine these two regular expressions, "[<][buis][>]" and "[<][/][buis][>]", into a single regular expression?
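For illustration only, one way to fold the two patterns together is to make the slash optional with `?`. The sketch below uses Python's `re` module rather than LabVIEW, so whether the identical pattern behaves the same way in the Match Regular Expression function is an assumption to check. The names `TAG`, `sample` and `tags` are made up for the example.

```python
import re

# The optional "/" lets one pattern match both opening and closing tags.
# Group 1 captures the slash (empty for an opening tag),
# group 2 captures the style letter (b, u, i or s).
TAG = re.compile(r"<(/?)([buis])>")

# Shortened, hypothetical sample based on the sentence in this thread.
sample = "un <b>jongleur qui ne <i>savait pas</i> jongler"

# Collect (position, is_closing, style_letter) for every tag found.
tags = [(m.start(), m.group(1) == "/", m.group(2)) for m in TAG.finditer(sample)]

for position, is_closing, style in tags:
    kind = "close" if is_closing else "open"
    print(position, kind, style)
```

Capturing the slash in its own group means a single match tells you both which style is involved and whether it is being switched on or off.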
07-01-2011 08:18 PM
I posted this last year; I think it's what you're looking for. There's a lot of good stuff on that thread.
07-02-2011 11:20 AM
Thanks Jim, there's really interesting stuff in this thread but I am still stuck (maybe I am missing something)...
Let me try to explain. I need to know the position of each HTML tag (bold, underline, italic and strikeout) in the text, remove those tags and separate the text by styles (bold, underline, italic, strikeout or a mix of styles).
I start with a text like this:
"Il était une fois un <b>jongleur qui ne <i>savait pas</i> jongler alors<u> il utilisa</b> un </u>arc. Il demeurait quand même un <s>peu</s> suspicieux."
The formatted text is:
"Il était une fois un jongleur qui ne savait pas jongler alors il utilisa un arc. Il demeurait quand même un peu suspicieux."
And I obtain this:
These VIs do the job (with a few small bugs).
I am just wondering whether the search for HTML tags can be simplified.
Jean-Marc
07-06-2011 06:55 PM
Hi Jean-Marc,
I am still a little confused about what you are trying to do. Do you just want to figure out where a certain tag is, then sort by style? If so, try the following:
1) Do a simple comparison for the tags, using an Equal? VI, to find the desired tag, and its position
2) Use a String Subset VI, with offset of two or something like that for the style tag (example: for <b>jongleur qui ne, the tag you found with the comparison would be 'jongleur qui ne,' and if you did an offset of 2 it would give you the position of the style letter, or 'b' in this case).
If this is not what you mean, please clarify further. Thanks!
~kgarrett
07-06-2011 07:17 PM
Do you care that your tags are improperly nested?
07-07-2011 05:20 AM
@Darin.K wrote:
Do you care that your tags are improperly nested?
That threw me when I started working on it.
07-07-2011 08:20 AM - edited 07-07-2011 08:22 AM
I'll try to explain what I want to do. Currently, if I add a string to the PDF document, the font (name, size, bold, underline, italic and strikeout) is the same for the whole string.
I want to be able to change the style (bold, underline, italic and strikeout) for part of the string, and that is why I want to use the HTML tags (<b></b>, <u></u>, <i></i> and <s></s>).
If I add this string:
I want to automatically separate the string ( style by style - bold, underline, italic and strikeout), like this:
After that, I will write the array of styles and text to the PDF document, element by element. It is currently working but not completely tested.
I am just wondering whether the search for HTML tags can be simplified by combining the two regular expressions from the first message.
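As a rough sketch of the splitting described above (written in Python for illustration, not a transcription of the VIs in this thread), a single combined pattern can drive the whole pass: the text between two tags becomes one element, labelled with the set of styles active at that point. The function name `split_by_style` and the choice to represent styles as a sorted string such as "bu" are assumptions made for the example.

```python
import re

# One pattern for both opening and closing tags; group 1 is the optional slash.
TAG = re.compile(r"<(/?)([buis])>")

def split_by_style(text):
    """Return a list of (styles, chunk) pairs with the tags removed.

    styles is a sorted string of active style letters, e.g. "" or "bi".
    A set (not a stack) tracks the active styles, so overlapping tags
    such as <b>..<u>..</b>..</u> are tolerated.
    """
    runs = []
    active = set()
    pos = 0
    for m in TAG.finditer(text):
        chunk = text[pos:m.start()]
        if chunk:
            runs.append(("".join(sorted(active)), chunk))
        if m.group(1):                 # closing tag: switch the style off
            active.discard(m.group(2))
        else:                          # opening tag: switch the style on
            active.add(m.group(2))
        pos = m.end()
    if text[pos:]:                     # trailing text after the last tag
        runs.append(("".join(sorted(active)), text[pos:]))
    return runs

for styles, chunk in split_by_style("a <b>b <i>c</i> d<u> e</b> f </u>g"):
    print(repr(styles), repr(chunk))
```

Joining the chunks back together gives the text with all tags stripped, which is a convenient sanity check on the result.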
07-07-2011 08:56 AM
07-07-2011 09:12 AM - edited 07-07-2011 09:14 AM
Darin,
To be honest, I have no real experience with html and yes, it's "a result of hand generating the example"...
What I really want is to be able to put a word in bold, in italic, or in both bold and italic. I just tried to be as general as possible.
07-07-2011 11:57 AM
You wrote what I consider to be "human form" HTML, easy for us, hard for computers. It is really much better to generate XML-compliant HTML, which simply means that all tags must be closed in the reverse order of their opening. Sometimes this means you have to close a tag and then reopen it. For example:
<b>This is bold, <i>this is bold and italic</i></b><i>and just italic</i>
The single span of italic text has to be broken up to obey the rules.
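A deliberately naive sketch of the close-and-reopen idea, in Python for illustration: if the text is already held as a list of (styles, text) pairs, wrapping every pair in its own complete set of tags always yields properly nested output, at the cost of more tags than strictly necessary. The function name `to_well_formed` and the input layout are assumptions, and the text is assumed to contain no `<` or `&` characters that would need escaping.

```python
def to_well_formed(runs):
    """Build properly nested markup from (styles, text) pairs.

    styles is a string of style letters such as "bi". Each run opens its
    tags in order and closes them in reverse order, so no tag ever
    overlaps a tag from a neighbouring run.
    """
    parts = []
    for styles, text in runs:
        opening = "".join(f"<{s}>" for s in styles)
        closing = "".join(f"</{s}>" for s in reversed(styles))
        parts.append(opening + text + closing)
    return "".join(parts)

print(to_well_formed([("b", "bold, "), ("bi", "both"), ("i", " italic")]))
```

The result is more verbose than the hand-written example because the bold span is also closed and reopened, but it obeys the same nesting rule.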
If you generate well-formed HTML, you can simply use the built-in (LV9+) XML parser to find the formatting. I use a simple XPath expression to get all text nodes, and iterate up the parents to find the formatting. This is more robust in the long run than regexes. You should break it up into subVIs; I crammed it into a single snippet for posting purposes.
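For readers without LabVIEW at hand, the same idea can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not a transcription of the snippet: ElementTree elements do not keep a reference to their parent, so instead of selecting text nodes and walking up, the sketch walks down and carries the active styles along. The function name `styled_runs` and the wrapping `<root>` element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def styled_runs(fragment):
    """Parse a well-formed fragment and return (styles, text) pairs.

    styles is a sorted string of the enclosing tag names, e.g. "bi".
    Improperly nested input raises xml.etree.ElementTree.ParseError.
    """
    # A fragment may have several top-level nodes, so wrap it in one root.
    root = ET.fromstring("<root>" + fragment + "</root>")
    runs = []

    def walk(node, styles):
        if node.text:                      # text before the first child
            runs.append(("".join(sorted(styles)), node.text))
        for child in node:
            walk(child, styles | {child.tag})
            if child.tail:                 # text after the child, in node's style
                runs.append(("".join(sorted(styles)), child.tail))

    walk(root, frozenset())
    return runs

html = "<b>This is bold, <i>this is bold and italic</i></b><i>and just italic</i>"
for styles, text in styled_runs(html):
    print(repr(styles), repr(text))
```

Note that the sketch records whatever tag names it meets, not only b, u, i and s, and that the parser rejects overlapping tags outright, which is exactly why the input has to be well-formed first.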