Problems ReWrite/Redirect Rule Set

Topics: User Forum
Sep 2, 2011 at 2:19 AM

HELP!

I'm not a programmer and I am having issues with a rule set.  With the exception of one page I have every page of one site rewriting and redirecting fine.  Here are the rules I have written for the page:

ReWriteRule ^/outings /outings.php?page_id=4050&name=Outings [L]

RedirectRule ^/.*(?=Outings) http://www.stevinsonranch.com/outings [R=301]

RedirectRule ^/.*(?=92730) http://www.stevinsonranch.com/outings [R=301]

Obviously, rule #1 rewrites the URL.  Rule #2 sets up a loop so that when the old, unfriendly URL is used, it will redirect to the new URL.  Rule #3 redirects a URL from another web site to the new page.

Rule #1 and rule #3 work great.  I can't get the second rule to work at all.  I have tried several variations of the rule, such as:

RedirectRule ^/.*(?=4050) http://www.stevinsonranch.com/outings [R=301]

and

RedirectRule ^/.*(?=Outings)$ http://www.stevinsonranch.com/outings [R=301]

Neither variation worked.

Please HELP!

Coordinator
Sep 2, 2011 at 3:51 AM
Edited Sep 2, 2011 at 4:01 AM

Did you look in the IIRF logfile?

Set the LogLevel to be 3 or 4, then run a URL through the site. There will be messages in the IIRF logfile indicating what IIRF did with the URL and why.

Also - you might want to try a regex tester.  It will show you that your regex does not match what you think it matches.

Also - why are you using (?=4050)?  Why the parens, and why the ?=, which in regex-speak is a zero-width, non-capturing lookahead assertion?  In particular, the segment (?=Outings)$ will never match any string.  It asserts that the string following the cursor is Outings, but then it also asserts that the string must end at that same position.  It is not possible for both conditions to be true, therefore the regex will never match any string.
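For anyone following along, here is a small sketch of that point. It uses Python's re module as a stand-in for the regex engine inside IIRF (an assumption made only for illustration; for patterns this simple the two should behave alike). The subject string is the old URL from the first post.

```python
import re

# The old URL as a regex subject (taken from the first post in this thread).
subject = "/outings.php?page_id=4050&name=Outings"

# Lookahead followed by end-of-string: both assertions are tested at the
# same position, and they cannot both hold there.
lookahead_then_end = re.compile(r"^/.*(?=Outings)$")

# Plain literal anchored at the end of the subject.
literal_at_end = re.compile(r"Outings$")

lookahead_result = lookahead_then_end.search(subject)
literal_result = literal_at_end.search(subject)

print(lookahead_result)         # None
print(literal_result.group(0))  # Outings
```

The first pattern finds nothing even though the URL really does end in Outings; the second finds it.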

But you don't need the assertion.  If you want to match 4050, just use 4050.  If you want to match Outings, match Outings. You can try all these combinations out with a regex tester tool.  Why not just use the real URL?

Here's a free regex tester tool I wrote: http://cheeso.members.winisp.net/srcview.aspx?file=WinForms-RegexTester.zip

 

Sep 2, 2011 at 5:45 AM
Edited Sep 2, 2011 at 5:46 AM

Hi Cheeso,

Thanks for the quick reply.  I have really struggled with the documentation and trying to understand the complex expressions.

My understanding was that using (?=4050) would cause the filter to look for 4050 in the URL, meaning a URL containing 4050.  In other words, the filter would look for an exact match of "4050."  In this case, as you can see from the rewrite rule, the unfriendly URL is:

www.stevinsonranch.com/outings.php?page_id=4050&name=Outings

In other rule pairs I have used the same type of expression to detect some unique item within the unfriendly URL to prevent any mix-ups with other similar URLs that need to be redirected.

The documentation explained that $ would indicate an "end of line," so I thought that perhaps since "Outings" was at the end of the URL, I could tell the expression to look for a line ending in "Outings."  Needless to say I have not had any success using $ at the end of an expression.

As an example of what I have been doing, my typical rule pairs for an original URL that is www.mywebsite.com/page_id=2011 might look something like this:

rewriterule ^/contact-us /page_id=2011 [L]
redirectrule ^/.*(?=2011) http://www.mywebsite.com/contact-us [R=301]

From my limited understanding, the sample rewrite expression would look for a URL with 2011 in it.

I don't recall where I picked up all of the items I am using for the rule pairs.  I think I saw an example of them somewhere.  Although these work most of the time, from what you have said, I have completely botched the process.

I guess I have absolutely no idea what I am doing.  I am sorry to be such an idiot.  I've read, read and re-read the documentation and thought I had at least the basics of what I needed to know to accomplish simple rewriting and redirection for the sites I have been forced to manage.  I've managed to get about 99% of the pages I need rewritten/redirected to perform correctly.
I don't understand how to use the entire URL to perform the redirect and I thought I was trying to match either 4050 or Outings.

I don't know if I just completely missed the explanations in the documentation or if I was just too stupid to understand them.  I'm not a programmer -- and I suck at math -- I'm just an old news photographer who studied hard and became a sys admin.  The problem with being a sys admin is that everyone thinks you should know everything about anything that uses any kind of technology.

I set the logging level to 4 as you have suggested but have no idea what I should be looking for.  I checked for items related to the rules in question and only saw entries which said "no match."

Where would you suggest I go from here?

Sep 2, 2011 at 6:40 AM

I forgot to add that I can't match for "Outings" unless I can specify that it is at the end of the URL because other URLs use the word in them, which is why I tried to use $ in the expression.

Coordinator
Sep 2, 2011 at 2:14 PM
kwongphotography wrote:

My understanding was that using (?=4050) would cause the filter to look for 4050 in the URL, meaning a URL containing 4050.  In other words, the filter would look for an exact match of "4050."  In this case, as you can see from the rewrite rule, the unfriendly URL is:

www.stevinsonranch.com/outings.php?page_id=4050&name=Outings

Yes, that's true.  But (?=4050) is a zero-width assertion, and the zero-width part is probably unnecessary.  You can just use 4050 - it also matches a 4050 anywhere in the string.  If you want to ensure it is 4050 and not 84050, then you could use =4050 (without the enclosing parens, and without the question mark).  I know regex has an arcane and baroque syntax, which is why I suggest using a regex tester to help you learn.
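A quick sketch of the difference between 4050 and =4050, again using Python's re module as a stand-in for the IIRF regex engine (an assumption). The second URL, with page_id=84050, is hypothetical and exists only to show the near-miss.

```python
import re

loose = re.compile(r"4050")    # matches 4050 anywhere in the subject
strict = re.compile(r"=4050")  # requires an "=" immediately before 4050

wanted = "/outings.php?page_id=4050&name=Outings"     # URL from this thread
lookalike = "/outings.php?page_id=84050&name=Other"   # hypothetical near-miss

checks = {
    "loose / wanted": bool(loose.search(wanted)),
    "loose / lookalike": bool(loose.search(lookalike)),
    "strict / wanted": bool(strict.search(wanted)),
    "strict / lookalike": bool(strict.search(lookalike)),
}

for name, hit in checks.items():
    print(name, hit)
```

Only the last check comes back False: the bare 4050 also hits 84050, while =4050 does not. Note that =4050 would still accept something like page_id=40501, so a longer literal may be safer if such IDs exist on the site.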

In other rule pairs I have used the same type of expression to detect some unique item within the unfriendly URL to prevent any mix-ups with other similar URLs that need to be redirected.

The documentation explained that $ would indicate an "end of line," so I thought that perhaps since "Outings" was at the end of the URL, I could tell the expression to look for a line ending in "Outings."  Needless to say I have not had any success using $ at the end of an expression.

Yes, but (?=Outings) is a zero-width assertion, meaning it asserts a match but does not advance the pointer in the subject string.  It asserts that "what comes next must be 'Outings'", but it does not move the pointer in the string.  In other words, the next part of the regex pattern is matched against the same location in the subject string.  (If you don't get this, I suggest you google "zero-width positive lookahead assertion" - it will give you some additional information that may help.)  This zero-width assertion is then followed by another assertion, $, which implies "end of string".  So you are asserting the presence of a word, Outings, and you are also asserting the end of the line at the same position.  The combination will never be true.  Hence the regex pattern will never match any string.  If you want to match the word 'Outings' only at the end of a line, then use Outings$.
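To make "does not advance the pointer" concrete, this sketch compares where a match ends with and without the lookahead. Python's re module is used as a stand-in for the IIRF regex engine (an assumption for illustration only).

```python
import re

subject = "/outings.php?page_id=4050&name=Outings"

# With the lookahead, the match stops just BEFORE "Outings": the word is
# checked but not consumed.
with_lookahead = re.match(r"^/.*(?=Outings)", subject)

# Without it, "Outings" is consumed and the match runs to the end.
consuming = re.match(r"^/.*Outings", subject)

print(with_lookahead.end(), consuming.end(), len(subject))
print(repr(subject[with_lookahead.end():]))  # what is left after the lookahead match
```

After the lookahead version matches, the text still waiting at the pointer is Outings, so a following $ has no chance. After the consuming version, the pointer is at the end of the string, which is where $ succeeds.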

As an example of what I have been doing, my typical rule pairs for an original URL that is www.mywebsite.com/page_id=2011 might look something like this:

rewriterule ^/contact-us /page_id=2011 [L]
redirectrule ^/.*(?=2011) http://www.mywebsite.com/contact-us [R=301]

From my limited understanding, the sample rewrite expression would look for a URL with 2011 in it.

I don't recall where I picked up all of the items I am using for the rule pairs.  I think I saw an example of them somewhere.  Although these work most of the time, from what you have said, I have completely botched the process.

I guess I have absolutely no idea what I am doing.  I am sorry to be such an idiot.  I've read, read and re-read the documentation and thought I had at least the basics of what I needed to know to accomplish simple rewriting and redirection for the sites I have been forced to manage.  I've managed to get about 99% of the pages I need rewritten/redirected to perform correctly.
I don't understand how to use the entire URL to perform the redirect and I thought I was trying to match either 4050 or Outings.

I don't know if I just completely missed the explanations in the documentation or if I was just too stupid to understand them.  I'm not a programmer -- and I suck at math -- I'm just an old news photographer who studied hard and became a sys admin.  The problem with being a sys admin is that everyone thinks you should know everything about anything that uses any kind of technology.

I set the logging level to 4 as you have suggested but have no idea what I should be looking for.  I checked for items related to the rules in question and only saw entries which said "no match."

Where would you suggest I go from here?

I commend you for your chutzpah.

The "no match" just means your regexes are not matching.  You already knew that.

I suggest here what I suggested in my prior message:  that you remove or replace your use of zero-width assertions, as you don't need them. Also download and use the regex tester tool, it will help.  Play around with it. Try different things. Get the feel for regexes. It's easier to understand it when you see the results immediately - immediate feedback will allow you to get the idea.  Once you get a good regex, something that matches your test urls, put it into the RedirectRule in IIRF.ini.
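If no regex tester is at hand, a few lines of script can do a similar job. The sketch below runs the three patterns from the first post against the old URL, in the order they were posted. It rests on two assumptions the thread does not confirm: that Python's re module behaves like the IIRF regex engine for these patterns, and that IIRF tries rules top to bottom and stops at a matching rule marked [L]. The labels and helper names are made up for the example.

```python
import re

# Patterns copied from the first post, in posted order.
rules = [
    ("ReWriteRule #1", r"^/outings"),
    ("RedirectRule #2", r"^/.*(?=Outings)"),
    ("RedirectRule #3", r"^/.*(?=92730)"),
]

def all_matches(subject, rules):
    """Labels of every rule whose pattern matches the subject."""
    return [label for label, pattern in rules if re.search(pattern, subject)]

def first_match(subject, rules):
    """Label of the first matching rule in list order, or None."""
    for label, pattern in rules:
        if re.search(pattern, subject):
            return label
    return None

old_url = "/outings.php?page_id=4050&name=Outings"

print(all_matches(old_url, rules))
print(first_match(old_url, rules))
```

In this sketch the pattern ^/outings also matches the old URL, because /outings.php begins with /outings, and it comes first in the list. If the ordering assumption holds, that may be worth checking in the IIRF log as a possible reason rule #2 never fires; it is a guess, not something confirmed here.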


Coordinator
Sep 2, 2011 at 2:24 PM
Edited Sep 2, 2011 at 2:31 PM

Examples.

#  regex          subjects

1  Outings$       matches:
                    Outings
                    ShoutOutings
                    ABOutings
                    page_id=4050&name=Outings
                  does not match:
                    outings
                    shouting
                    Outing
                    name=Outings&page_id=4050

2  (?=Outings)$   matches nothing, ever.

3  Outings        matches:
                    Outings
                    Outings and Activities
                    Activities and Outings
                    name=Outings&page=4050
                    page=4050&name=Outings
                  does not match:
                    Outing
                    outings
                    Aardvark
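The table above can be checked mechanically. This is only a sketch, with Python's re module standing in for the IIRF regex engine (an assumption); the strings are the ones listed in the table.

```python
import re

end_yes = ["Outings", "ShoutOutings", "ABOutings", "page_id=4050&name=Outings"]
end_no = ["outings", "shouting", "Outing", "name=Outings&page_id=4050"]
any_yes = ["Outings", "Outings and Activities", "Activities and Outings",
           "name=Outings&page=4050", "page=4050&name=Outings"]
any_no = ["Outing", "outings", "Aardvark"]

# pattern -> (strings expected to match, strings expected not to match)
table = {
    r"Outings$": (end_yes, end_no),
    r"(?=Outings)$": ([], end_yes + end_no + any_yes + any_no),
    r"Outings": (any_yes, any_no),
}

mismatches = []
for pattern, (should_match, should_not) in table.items():
    rx = re.compile(pattern)
    for s in should_match:
        if not rx.search(s):
            mismatches.append((pattern, s, "expected a match"))
    for s in should_not:
        if rx.search(s):
            mismatches.append((pattern, s, "expected no match"))

print(mismatches)  # an empty list means the table holds
```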

The first regex matches Outings only at the end of the string.  The second is useless, as I have described: it never matches anything.  The third may be helpful if you want to match the string 'Outings' anywhere in the input subject.  In the case of IIRF, the subject used for the regex is the incoming URL, without the scheme, hostname and (optional) port.  For a URL like http://stevinsonranch.com/who/what/where the subject is /who/what/where.  In your case I think you are looking at something like /outings.php?page_id=4050&name=Outings as the subject.

If that is what you want to match on, then why not use

^/outings\.php\?page_id=4050&name=Outings$

as the pattern?
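Here is a sketch of that full-URL pattern in action. The helper subject_of is hypothetical: it approximates the subject as described in this thread (path plus query string, no scheme, host or port) using Python's urllib, and Python's re module stands in for the IIRF regex engine. The extra URLs are made up as near-misses.

```python
import re
from urllib.parse import urlsplit

def subject_of(url):
    """Path plus query string, dropping scheme, host and port."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")

# The pattern suggested above: dot and question mark escaped, anchored at both ends.
exact = re.compile(r"^/outings\.php\?page_id=4050&name=Outings$")

old = subject_of("http://www.stevinsonranch.com/outings.php?page_id=4050&name=Outings")

# Hypothetical near-misses that should be left alone.
others = [
    subject_of("http://www.stevinsonranch.com/outings"),
    subject_of("http://www.stevinsonranch.com/outings.php?name=Outings&page_id=4050"),
    subject_of("http://www.stevinsonranch.com:80/outings.php?page_id=4050&name=Outings&x=1"),
]

print(old, bool(exact.search(old)))
for s in others:
    print(s, bool(exact.search(s)))
```

Only the old URL matches. The friendly URL /outings does not, so a redirect rule built on this pattern should not catch the page it redirects to.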

Oct 3, 2011 at 10:00 PM

Thanks again for your assistance.  I'll review your suggestions and try out the regex tester.  Have a great week.

 

Ken