Odd Regex Behavior

Topics: Developer Forum, User Forum
Jun 26, 2009 at 4:15 PM
Edited Jun 26, 2009 at 4:32 PM

Hey guys,

My regexp is as follows:

 

^/Tours/([a-zA-Z_\-& ]*)/?$

 

which should match "/Tours/SOMETHINGHERE" and "/Tours/SOMETHINGHERE/" with $1="SOMETHINGHERE" but does not match at all and continues to the next rule.

If I change the above to "^/Tours/(.*)/?$" then I get the results "SOMETHINGHERE" and "SOMETHINGHERE/" respectively, but I cannot have the trailing '/', which is why I have the more complicated pattern.

 

Please tell me that my issue here is not that IIRF doesn't support the ? metacharacter....

 

Thanks,

A.

 

P.S. As of right now I have the following rules:

RewriteRule  ^/Tours/(.*)/$
RewriteRule  ^/Tours/(.*)/?$
but I don't like having to make exceptions.... (the first matches /Tours/SOMETHINGHERE/, the second one matches /Tours/SOMETHINGHERE, and I'm no longer sure why...)
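
The difference between those two rules comes down to greedy matching: (.*) grabs as much as it can and only gives characters back when the rest of the pattern cannot match otherwise. Here is a minimal sketch of that behavior, using Python's re module as a stand-in. That is an assumption: IIRF uses PCRE, not Python, though both are backtracking engines and should agree on these simple patterns. The helper name `capture` is made up for illustration.

```python
import re

def capture(pattern, url):
    """Return group 1 if the pattern matches the URL, else None."""
    m = re.search(pattern, url)
    return m.group(1) if m else None

# Mandatory trailing slash: (.*) has to give the final "/" back to the pattern.
with_slash = r"^/Tours/(.*)/$"
# Optional trailing slash: greedy (.*) swallows the "/" and "/?" matches nothing.
optional_slash = r"^/Tours/(.*)/?$"

print(capture(with_slash, "/Tours/SOMETHINGHERE/"))      # SOMETHINGHERE
print(capture(with_slash, "/Tours/SOMETHINGHERE"))       # None
print(capture(optional_slash, "/Tours/SOMETHINGHERE"))   # SOMETHINGHERE
print(capture(optional_slash, "/Tours/SOMETHINGHERE/"))  # SOMETHINGHERE/
```

Restricting the capture to characters other than '/', which is what a character class does, keeps the trailing slash out of $1 with a single rule.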

Coordinator
Jun 26, 2009 at 8:34 PM
Edited Jun 26, 2009 at 8:37 PM

Regexes can be hard to get right.

There is a nifty tool called TestDriver.exe included in the IIRF download.  It allows you to test rules against various incoming URI requests.  The readme has some information on the tool, and the download also includes tons of samples showing how to use it.  The tool can help you iteratively develop and verify the regexes in your rules.  It is also indispensable for testing the rules you already have.  You wouldn't deploy software without rigorous testing; rewrite rules should be subject to the same rigor.  You want to be sure your rewrites work for expected and unexpected inputs, especially weird edge cases.
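
The same test-first habit can be sketched outside of IIRF. The snippet below is not TestDriver.exe and does not reproduce its input format; it is a small table-driven check in Python, whose re module is only assumed to agree with PCRE for a pattern this simple. The pattern is the one from the question, and the URLs in the table are illustrative.

```python
import re

# The pattern from the original question (Python's re stands in for PCRE here).
PATTERN = re.compile(r"^/Tours/([a-zA-Z_\-& ]*)/?$")

# (request URL, expected value of $1, or None when the rule should not match)
CASES = [
    ("/Tours/SOMETHINGHERE", "SOMETHINGHERE"),
    ("/Tours/SOMETHINGHERE/", "SOMETHINGHERE"),
    ("/Tours/Rock & Roll", "Rock & Roll"),
    ("/Tours/", ""),           # edge case: "*" lets an empty segment through
    ("/Tours/a/b", None),      # nested path is rejected
    ("/tours/abc", None),      # matching is case-sensitive
    ("/Tours/abc123", None),   # digits are not in the character class
]

def run_cases(pattern, cases):
    """Return (url, expected, actual) for every case that fails."""
    failures = []
    for url, expected in cases:
        m = pattern.search(url)
        actual = m.group(1) if m else None
        if actual != expected:
            failures.append((url, expected, actual))
    return failures

failures = run_cases(PATTERN, CASES)
print("all cases pass" if not failures else failures)
```

Writing down the unexpected inputs next to the expected ones is the point: the empty-segment case above is easy to miss until it is in the table.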

Anyway,  I spent about 60 seconds running the TestDriver tool, ran it 4 or 5 times, and got the answer you are looking for.

The rule I used:

   RewriteRule  ^/Tours/([-a-zA-Z\x20&]+)/?$       /TOURS/$1   [L]

To explain what I changed and why: 

  1. First I completed the rule by adding a replacement pattern. I didn't see a replacement pattern in any of your rules.
  2. Next I used the + quantifier and not the *, because I assume you want something to actually be there.  * means "zero or more"  while + means "one or more". I assumed you want the match to succeed if and only if there is at least one character.
  3. Next I used \x20 to indicate the space. I figured this out by looking at the output of the testdriver.exe program. It was barfing on the rule you provided, the one that included the space between the square brackets. This, I figured, meant that the rule parser was not happy about a space in the middle of the regex. No problem, I knew that PCRE allows the use of hex ASCII character codes, so I swapped the space out and used \x20 instead. When I ran the testdriver, it was no longer barfing on the rule.
  4. Next I relocated the dash to the start of the character set. I am not sure this is required, but somewhere back in the recesses of my mind I recalled a requirement that if a character set (the thing you specify between square brackets) includes a dash, then the dash must be the first character in the set.  In this case you don't need to escape it.  I am not sure if this is a real requirement or not, but I did it anyway. I find it more readable.
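
The points in this list can be checked in isolation. What follows is a hedged sketch in Python rather than PCRE (the two agree on these particular constructs, as far as I know): a numeric escape for the space, the dash in first position, and * versus +. One thing it brings out is that a backslash followed by 0 and more digits is read as octal, so a space is \040 in octal or \x20 in hex, while \020 is character 16.

```python
import re

# Numeric escapes inside a character class.
hex_space = re.compile(r"^[\x20]$")    # hex 20 = decimal 32 = space
octal_space = re.compile(r"^[\040]$")  # octal 40 = decimal 32 = space
octal_020 = re.compile(r"^[\020]$")    # octal 20 = decimal 16, a control character

print(bool(hex_space.match(" ")))      # True
print(bool(octal_space.match(" ")))    # True
print(bool(octal_020.match(" ")))      # False

# A dash placed first in a class is a literal dash; no escape is needed.
dash_first = re.compile(r"^[-a-z]+$")
print(bool(dash_first.match("a-b")))   # True

# "*" accepts an empty segment, "+" requires at least one character.
star = re.compile(r"^/Tours/([-a-zA-Z\x20&]*)/?$")
plus = re.compile(r"^/Tours/([-a-zA-Z\x20&]+)/?$")
print(bool(star.match("/Tours/")))     # True
print(bool(plus.match("/Tours/")))     # False
```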

IIRF uses PCRE as the regex engine, which does include support for the ? quantifier. You can read more about the PCRE regex syntax in the PCRE man page.
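
To see the optional trailing slash and the replacement pattern working together, here is a rough Python sketch of what a rule of that shape does. It assumes Python's re behaves like PCRE for this pattern, writes the space as a literal character for readability, and uses a made-up helper name, `rewrite`. IIRF's own rule processing (the [L] flag, falling through to later rules) is not modeled.

```python
import re

# Same shape as the rule above: one or more allowed characters, optional "/".
RULE = re.compile(r"^/Tours/([-a-zA-Z &]+)/?$")

def rewrite(url):
    """Return the rewritten URL, or the URL unchanged if the rule does not match."""
    m = RULE.match(url)
    if m is None:
        return url
    # Equivalent of the replacement pattern "/TOURS/$1".
    return "/TOURS/" + m.group(1)

for url in ["/Tours/SOMETHINGHERE", "/Tours/SOMETHINGHERE/", "/Tours/"]:
    print(url, "->", rewrite(url))
```

Both forms of the URL end up with the same $1, and the bare "/Tours/" is left alone because + demands at least one character.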

 

Jul 1, 2009 at 5:43 PM
Edited Jul 1, 2009 at 5:45 PM

Wow, I never expected a reply with so much information!
Thank you very much for the help you provided; I have gained quite a bit of knowledge here.

Somehow I must have glossed over the fact that such a tool came with this software. I had resorted to using an online tester which must have had a different engine.

Again, your post was most helpful and I am extremely appreciative of it,
A.