I've been happily coding away, converting a fairly complex MS Word solution into Writer, until I ran into a problem with the way OOo finds text in a Writer document using regular expressions. From what I've read, it is well documented that the find is "greedy", and that is my problem. I am searching for text that is enclosed within tags (like HTML or XML tags). Here is an example:
Here is an example with <myTag>some tagged text</myTag> and some non-tagged text. The problem is finding <myTag>individual instances of tagged text</myTag> that occur in the same paragraph without also finding all of the non-tagged text in between.
The basic regex that finds too much because it is greedy is:
I've also tried this:
which partly solves the problem, but fails for the case where tags are nested.
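To make the difference concrete outside of Writer, here is a small Python sketch. The two patterns are my assumption of what a typical greedy pattern and a typical negated-class pattern look like for this case; they are stand-ins for illustration and may not match the expressions I actually used character for character. Python's `re` module is also not the same engine as Writer's, so this only shows the general behaviour.

```python
import re

# Sample paragraph with two separately tagged runs of text.
text = ("Here is an example with <myTag>some tagged text</myTag> and some "
        "non-tagged text. The problem is finding <myTag>individual instances "
        "of tagged text</myTag> that occur in the same paragraph.")

# Assumed greedy pattern: ".*" runs on to the LAST </myTag> in the paragraph,
# so both tagged runs and everything between them come back as one match.
greedy = re.findall(r"<myTag>.*</myTag>", text)

# Assumed negated-class pattern: "[^<]*" cannot cross a "<", so each match
# ends at the first closing tag after its opening tag.
negated = re.findall(r"<myTag>[^<]*</myTag>", text)

print(len(greedy), len(negated))
```

The greedy pattern returns a single over-long match, while the negated-class pattern returns the two tagged runs separately, which is the "partly solves the problem" behaviour described above.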
Here is an example of <myTag>tagged text that also includes <anotherTag>some nested tags</anotherTag> inside it</myTag>. This won't work with the negated regex above.
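The nested case can be reproduced the same way. Again this is only a sketch in Python: the negated-class pattern is an assumed stand-in for the one I tried, and the lazy quantifier `.*?` is a feature of Python's `re` module. I have not confirmed whether Writer's find accepts it.

```python
import re

# Paragraph where <myTag> contains another, differently named tag.
nested = ("Here is an example of <myTag>tagged text that also includes "
          "<anotherTag>some nested tags</anotherTag> inside it</myTag>.")

# Assumed negated-class pattern: "[^<]*" stops at "<anotherTag>", and the
# literal "</myTag>" does not follow there, so nothing matches at all.
negated_hits = re.findall(r"<myTag>[^<]*</myTag>", nested)

# Lazy quantifier (Python syntax): ".*?" expands only as far as the first
# "</myTag>", so the inner <anotherTag> pair is included in the match.
lazy_hits = re.findall(r"<myTag>.*?</myTag>", nested)

print(len(negated_hits), len(lazy_hits))
```

In Python the lazy form finds the whole `<myTag>` element here, although it would still stop too early if a `<myTag>` were nested inside another `<myTag>`.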
Does anyone have any suggestions on how I might proceed? I realize this maybe isn't strictly a Macro/UNO API problem, since it can be demonstrated within Writer itself, but in the end I am using the regex in a Basic routine.
Thanks,