OR sequence in Find & Replace regular expression

huw
Volunteer
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

OR sequence in Find & Replace regular expression

Post by huw »

Code: Select all

^([:space:]|,|\.|[:lower:])
works to find the specified characters at the beginning of a cell. I'll call this expression A.

Code: Select all

([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)
works to find certain other character combinations or locations. I'll call this expression B.

Combining them A|B thus

Code: Select all

^([:space:]|,|\.|[:lower:])|([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)
works fine, yet B|A

Code: Select all

([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)|^([:space:]|,|\.|[:lower:])
does not! The A part of the expression is not evaluated. Does anyone have an idea why?
FPeters
Volunteer
Posts: 20
Joined: Sun Oct 07, 2007 9:28 pm
Location: Hamburg

Re: OR sequence in Find & Replace regular expression

Post by FPeters »

Could you give an example string that is not working? It's rather hard to
make your way through these regexps.

-f
huw
Volunteer
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: OR sequence in Find & Replace regular expression

Post by huw »

Sorry!

Expression A selects any cell beginning with either a space, comma, period, or lowercase letter.

Expression B selects any cell containing two consecutive spaces, OR a comma or period with an alphanumeric character directly on each side, OR ending with a space, comma, or period.

A test set:

Code: Select all

 E
,E
.E
ee
E  e
E,e
E.e
E 
E,
E.
End
Note the first "E" in that list should have a leading space, but it is being stripped by the forum software so you'll have to add it yourself.

All but "End" are selected by A|B, but the mystery is that B|A misses A.

Remember to turn on Regular expressions and Case sensitivity.
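
For comparison, a conventional regex engine should select the same cells with either ordering. Below is a minimal sketch in Python, not OOo itself. It assumes the POSIX classes translate roughly to `\s`, `[a-z]` and `[A-Za-z0-9]`, because Python's `re` module does not support the `[:name:]` classes. The names `A`, `B`, `cells` and `selected` are only illustrative.

```python
import re

# Assumed translations of the POSIX classes, since Python's re module
# has no [:space:], [:lower:] or [:alnum:].
A = r"^(\s|,|\.|[a-z])"
B = r"(\s{2})|([A-Za-z0-9](\.|,)[A-Za-z0-9])|((\s|,|\.)$)"

# The test set from this post, with the leading space restored on the first cell.
cells = [" E", ",E", ".E", "ee", "E  e", "E,e", "E.e", "E ", "E,", "E.", "End"]

def selected(pattern):
    """Return the cells in which the pattern matches anywhere (case-sensitive)."""
    return [c for c in cells if re.search(pattern, c)]

a_or_b = selected(A + "|" + B)
b_or_a = selected(B + "|" + A)

# The order of the alternatives should make no difference.
print(a_or_b == b_or_a)
# Cells that are not selected by A|B.
print([c for c in cells if c not in a_or_b])
```

In this sketch both orderings select every cell except "End", which is the behaviour expected from Find & Replace as well.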
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: OR sequence in Find & Replace regular expression

Post by acknak »

My guess is the "^" anchor that starts your A is the root of the problem. In A|B, the "^" is at the beginning of the pattern; in B|A, it falls in the middle.

In my (limited, I admit) understanding and experience, OOo's regexp search treats the anchors rather specially and gets confused when they don't appear at the start of an expression.

PS: If it's any help, Perl gives the same result either way: all lines from your test match except #1, #8 and #11. I deleted all leading spaces from your test sample, except one from the start of line #1.
AOO4/LO5 • Linux • Fedora 23
huw
Volunteer
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: OR sequence in Find & Replace regular expression

Post by huw »

Thanks. It looks like another quirk of OOo's regex implementation.

I've only ever used regex in OOo, never in a respected implementation, so forgive me if I ask why lines #1 & #8 didn't match in Perl - #1 has a leading space that should be caught by expression A, #8 has a trailing space that should be caught by expression B.

Note only line #1 should have a leading space.
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: OR sequence in Find & Replace regular expression

Post by acknak »

huw wrote: ... why lines #1 & #8 didn't match in Perl...
Because I'm an idiot, in a hurry ;-)

I forgot to fix the named classes in your pattern, so they weren't working properly, and I somehow lost the trailing space on #8.

Once I actually pay attention to what I'm doing, it works much better (only #11 does not match).

Just for kicks, here is what your pattern would look like in Perl, along with the output when run against your sample:

Code: Select all

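#!/usr/bin/perl -n
# Reads the input line by line (-n) and reports whether each line matches.

# Strip the trailing newline, which would otherwise match [[:space:]]$.
chomp;

my $match = 
/
   [[:space:],.]$                # ends with whitespace, a comma, or a period
 |
  ^[[:space:],.[:lower:]]        # begins with whitespace, comma, period, or lowercase letter
 |
   [[:space:]]{2}                # two consecutive whitespace characters
 |
  ([[:alnum:]][.,][[:alnum:]])   # comma or period between two alphanumerics
/ox;

# $` is the text before the match, $& the match itself, $' the text after it.
printf("%2d: %-8s %s\n", $., "'$_'", $match ? "matched <$`'$&'$'>" : "no match");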

Code: Select all

$ perl abx e
 1: ' E'     matched <' 'E>
 2: ',E'     matched <','E>
 3: '.E'     matched <'.'E>
 4: 'ee'     matched <'e'e>
 5: 'E  e'   matched <E'  'e>
 6: 'E,e'    matched <'E,e'>
 7: 'E.e'    matched <'E.e'>
 8: 'E '     matched <E' '>
 9: 'E,'     matched <E','>
10: 'E.'     matched <E'.'>
11: 'End'    no match
Perhaps this will help make it clear why the named classes are meant to appear inside "[classes]" and not to stand on their own.
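
To see what goes wrong when a named class stands on its own, here is a small Python sketch. It relies on the fact that Python's `re` module has no POSIX named classes at all, so a bare `[:space:]` is read as an ordinary bracket expression holding the characters `:`, `s`, `p`, `a`, `c` and `e`. As far as I understand, Perl reads the bare form the same way, which would explain my first, mistaken result. The names `bare`, `cells` and `missed` are only illustrative.

```python
import re

# The original pattern, with the named classes NOT wrapped in an outer [ ].
bare = (r"^([:space:]|,|\.|[:lower:])"
        r"|([:space:]{2})|([:alnum:](\.|,)[:alnum:])|(([:space:]|,|\.)$)")

# The test set, with the leading space on #1 and the trailing space on #8.
cells = [" E", ",E", ".E", "ee", "E  e", "E,e", "E.e", "E ", "E,", "E.", "End"]

# A bare [:space:] matches the letters of its own name, but not a space.
print([ch for ch in ": space" if re.fullmatch(r"[:space:]", ch)])

# 1-based numbers of the test lines that the bare pattern fails to match.
missed = [n for n, c in enumerate(cells, start=1) if not re.search(bare, c)]
print(missed)
```

With the bare classes, lines #1, #8 and #11 are missed, because a real space is never matched. Lines such as 'ee' and 'E  e' only match by accident, since "e" happens to be one of the letters in "space".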
AOO4/LO5 • Linux • Fedora 23
huw
Volunteer
Posts: 417
Joined: Wed Nov 21, 2007 1:57 pm

Re: OR sequence in Find & Replace regular expression

Post by huw »

This is now issue 84828.