URL modification, downcasing, then evaluate and redirect

Topics: Developer Forum, User Forum
Apr 14, 2010 at 2:55 PM
Edited Apr 14, 2010 at 2:56 PM

Hi,
I'm trying to accomplish a couple of fairly simple things here and am not sure how to go about it.
I've seen the downcasing syntax, and my regex is very rusty, but here's what I need to do.
As you'll see, I tend to think of this procedurally, and I'm not sure that's the right approach.

1. Take the incoming parameter.
2. Evaluate it and modify it if necessary, saving the value in the original parameter.
3. Check the variable's value against all the rules (like a switch/case statement).


------------------------------------------------------------------------    

Evaluate incoming url:
if url is missing "www."  ie: "somedomain.com/could/BE/AnyThing/here.asp"
     correct it to -> "www.somedomain.com/could/BE/AnyThing/here.asp"

------------------------------------------------------------------------    
    
Now that we've corrected it:
     downcase it to "www.somedomain.com/could/be/anything/here.asp"
    
------------------------------------------------------------------------    

Now that it's downcased:
     evaluate for redirection...
    
     --------------------------------------------------------------------    
     #some rules check for a specific page request...
     --------------------------------------------------------------------    
     if everything past first "/" == could/be/anything/here.aspx
          redirect to /here.aspx             (www.somedomain.com/here.aspx)
     if everything past first "/" == there/isnt/anything.aspx
          redirect to /anything.aspx         (www.somedomain.com/anything.aspx)
     --------------------------------------------------------------------    
     #some other rules simply check for any page that falls within a path    
     --------------------------------------------------------------------    
     if part of the path == /some/path/      (ex: www.somedomain.com/some/path/to/page.aspx OR www.somedomain.com/some/path/and/another/page.aspx)
          redirect to /newpage.aspx          (www.somedomain.com/newpage.aspx)
         
         
     and so on for many rules...
    
Thanks in advance,
Tom

Coordinator
Apr 14, 2010 at 6:40 PM

IIRF is rule-based, not procedural.

You need to write rules that tell IIRF what to do when the incoming URL matches.
You said your regex is rusty, but that's not the main point. Regex is just a way to describe the patterns to match against the incoming URL.
The larger issue is the idea behind rule-based logic, versus procedural logic.

Think about the main things you want to happen to a URL.  Let's start with downcasing as an example.  I presume you want downcasing to happen with a REDIRECT.  (If you don't clearly understand the difference between a REDIRECT and a REWRITE, read the documentation I wrote, now.  It usually makes no sense to downcase with a rewrite, but not always.  If you don't understand why, read the doc.) 

Ok, you want to downcase.  The simple rule for that is

RedirectRule ^/(.*[A-Z]+.*)$    /#L$1#E    

Look at the replacement string. That's the /#L$1#E . It replaces the first capture ($1) with itself, but wrapped in the #L ... #E markers, which down-case whatever sits between them. So that replaces A with a, and so on.

The pattern - that's the part that looks like cartoon swearing. It includes a single capture group, denoted by parens. Within the parens, .* says "zero or more of any character", because dot is a wildcard and * is a quantifier meaning "zero or more". The [A-Z] says "any character from A-Z." The + following that is a quantifier meaning "one or more". Therefore [A-Z]+ says "one or more of any character from A-Z". Finally, another .*, which says "zero or more of any character". Taken together this matches any string that has one or more uppercase characters in it.

The redirect rule is applied only to those URLs.  It redirects to the same URL, but converted to lower-case.  Presumably the browser will send in the lowercased URL after it gets the HTTP 302 response.  The new URL will not have any characters A-Z in it, because they've been converted to a-z.  Therefore the rule will not match for this second request, and the request will be handled by whatever handles URLs in your server.
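
To make the two-request behavior concrete, here is a small Python sketch of what the rule does. It is only an illustration, not IIRF itself: the function name downcase_redirect is made up, and it assumes Python's re module treats this simple pattern the same way IIRF's regex engine does.

```python
import re

# Same pattern as the RedirectRule above.
DOWNCASE_PATTERN = re.compile(r"^/(.*[A-Z]+.*)$")

def downcase_redirect(url_path):
    """Return the redirect target for url_path, or None if the rule does not match.

    Hypothetical helper that mimics the /#L$1#E replacement.
    """
    match = DOWNCASE_PATTERN.match(url_path)
    if match is None:
        return None
    # $1 is the first capture; #L ... #E lower-cases it.
    return "/" + match.group(1).lower()

# First request: contains uppercase letters, so the rule fires.
first = downcase_redirect("/could/BE/AnyThing/here.asp")
print(first)   # /could/be/anything/here.asp

# Second request (the browser following the redirect): no A-Z left, no match.
second = downcase_redirect(first)
print(second)  # None
```

The second call returning None is the reason the rule does not loop: once the URL is lowercase, the pattern has nothing to match.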

If you have other conditions that you'd like to use before downcasing, then work them into that rule.  For example, if you want to downcase only URLs that start with S, then include S as the first character in the capture group.  And so on. 

Ok, that takes care of downcasing.  You have other requirements; apply those the same way.  Create the rule, figure out the regex you need, create the replacement string.

Combine the rules into a set, and you're done.

 

 

 

Coordinator
Apr 14, 2010 at 6:46 PM

Regarding your other requirements: to force a www on all incoming requests, see this example.  You can combine that with downcasing, which gives you these two rules:

# prepend www when not present in the hostname
RewriteCond  %{HTTP_HOST}  !^www
RedirectRule ^/(.*)$       http://www.%{HTTP_HOST}/#L$1#E         [R=301]

# downcase any URL that needs it
RedirectRule ^/(.*[A-Z]+.*)$    /#L$1#E

Then you have the rules about /some/path and so on. You'll need to figure those out yourself; it shouldn't be too difficult.
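
If it helps to see the rule-based idea next to the procedural one, the two rules above can be modelled roughly like this in Python. This is a sketch under assumptions, not how IIRF is implemented: the evaluate function is hypothetical, it assumes the first rule that applies wins, and it takes the 302 for the second rule from the description of the downcasing rule earlier in this thread.

```python
import re

def evaluate(host, url_path):
    """Return (status, location) for the first rule that applies, or None.

    Hypothetical model of the two-rule set; host stands in for %{HTTP_HOST}.
    """
    # Rule 1: RewriteCond %{HTTP_HOST} !^www, then redirect with www prepended.
    if not re.search(r"^www", host):
        m = re.match(r"^/(.*)$", url_path)
        if m:
            return (301, "http://www." + host + "/" + m.group(1).lower())
    # Rule 2: downcase any URL path that still contains an uppercase letter.
    m = re.match(r"^/(.*[A-Z]+.*)$", url_path)
    if m:
        return (302, "/" + m.group(1).lower())
    # No rule applies: the request goes on to the server unchanged.
    return None

print(evaluate("somedomain.com", "/could/BE/AnyThing/here.asp"))
print(evaluate("www.somedomain.com", "/could/BE/AnyThing/here.asp"))
print(evaluate("www.somedomain.com", "/could/be/anything/here.asp"))
```

Each request is checked against the rules from the top. There is no variable carried from one step to the next; the "state" is simply the URL the browser sends on its next request.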

 

Apr 27, 2010 at 1:50 PM

Cheeso,

First of all, thanks for your info on this.  I implemented the rules based on your suggestions and all seemed to be working well.  Of course, now that we have this in production, we seem to be having a problem.

It's related to redirecting to a url with "www" in it when it is missing.  All redirects work when the "www" is present. 

So, when the "www" is missing the process seems to be getting into an endless loop.

Here is the portion of my iirf.ini that is enabled.

RewriteLogLevel 3
#StatusInquiry ON
StatusUrl /iirfStatus

# --------
# prepend www when not present in the hostname
# --------
RewriteCond  %{HTTP_HOST}  !^www
RedirectRule ^/(.*)$       http://www.%{HTTP_HOST}/#L$1#E         [R=301]

# --------
# downcase all URLs for evaluation below
# --------
RedirectRule ^/(.*[A-Z]+.*)$    /#L$1#E

# --------
# Section and Directory Redirects
# --------
RedirectRule ^/example(.*)$ http://www.website.com/default.aspx [R=301]
RedirectRule ^/pages/something(.*)$ /something.aspx [R=301]
RedirectRule ^/pages/else(.*)$ /anotherpage.aspx [R=301]


Here is a portion of the log:

Tue Apr 27 02:25:43 -  5776 - IsIniFileUpdated: c:\web\Site\server\webroot\Iirf.ini YES
Tue Apr 27 02:25:43 -  5776 - GetSiteConfig: Obtain  site '/LM/W3SVC/30/Root' , Ini file has been updated.
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: actual log file 'C:\web\Site\server\webroot\IIRF-Log.txt.2364.log'
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: ini file: 'c:\web\Site\server\webroot\Iirf.ini'
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: ini file timestamp: 2010/04/27 02:25:32 Eastern Daylight Time
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: pass 1
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line   4: StatusUrl /iirfstatus
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line   4: StatusUrl is enabled for local requests only.
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line  22: RewriteCond   %{HTTP_HOST}  !^www
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line  23: RedirectRule (rule 1)  '^/(.*)$'  'http://www.%{HTTP_HOST}/#L$1#E'  [R=301]
Tue Apr 27 02:25:43 -  5776 - ParseRuleModifierFlags: '[R=301]'
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line  29: RedirectRule (rule 2)  '^/(.*[A-Z]+.*)$'  '/#L$1#E'   (null)
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: INFO: line 29: Redirecting to a target that does not include an http(s):// scheme.
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig:                The rule will Redirect to a target on the local machine
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line  36: RedirectRule (rule 3)  '^/example(.*)$'  '/default.aspx'  [R=301]
Tue Apr 27 02:25:43 -  5776 - ParseRuleModifierFlags: '[R=301]'
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: INFO: line 36: Redirecting to a target that does not include an http(s):// scheme.
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig:                The rule will Redirect to a target on the local machine
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line  37: RedirectRule (rule 4)  '^/pages/something(.*)$'  '/something.aspx'  [R=301]
Tue Apr 27 02:25:43 -  5776 - ParseRuleModifierFlags: '[R=301]'
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: INFO: line 37: Redirecting to a target that does not include an http(s):// scheme.
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig:                The rule will Redirect to a target on the local machine
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: line  38: RedirectRule (rule 5)  '^/pages/else(.*)$'  '/anotherpage.aspx'  [R=301]
Tue Apr 27 02:25:43 -  5776 - ParseRuleModifierFlags: '[R=301]'
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: INFO: line 38: Redirecting to a target that does not include an http(s):// scheme.
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig:                The rule will Redirect to a target on the local machine
Tue Apr 27 02:25:43 -  5776 - ReadSiteConfig: Done reading, found 5 rules (0 errors, 0 warnings) on 88 lines
Tue Apr 27 02:25:43 -  5776 - ReleaseOrExpireSiteConfig: site '/LM/W3SVC/30/Root' (era=8) (rc=0) (Expired=1) (ptr=0x13367288)...
Tue Apr 27 02:25:43 -  5776 - GetSiteConfig: Obtain  site '/LM/W3SVC/30/Root' (era=9) (rc=1) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_URL_MAP
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: cfg= 0x1336DF28
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Tue Apr 27 02:25:43 -  5776 - DoRewrites
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'url' = '/example'
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'method' = 'GET'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: New Url, before decoding: '/example'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Url (no decoding): '/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: depth=0
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Rule 1 : 2 matches
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='%{HTTP_HOST}' out='website.com'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'website.com'
Tue Apr 27 02:25:43 -  5776 - EvalCondition: Cond %{HTTP_HOST} !^www => TRUE
Tue Apr 27 02:25:43 -  5776 - EvalConditionList: rule 1, TRUE, Rule will apply
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='http://www.%{HTTP_HOST}/#L$1#E' out='http://www.website.com/#L$1#E'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Result (length 37): http://www.website.com/example
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: returning 1301
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Redirect (code=301) Url to: 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_LOG
Tue Apr 27 02:25:43 -  5776 - ReleaseOrExpireSiteConfig: site '/LM/W3SVC/30/Root' (era=9) (rc=0) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - IsIniFileUpdated: c:\web\Site\server\webroot\Iirf.ini NO
Tue Apr 27 02:25:43 -  5776 - GetSiteConfig: Obtain  site '/LM/W3SVC/30/Root' (era=9) (rc=1) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_URL_MAP
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: cfg= 0x1336DF28
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Tue Apr 27 02:25:43 -  5776 - DoRewrites
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'url' = '/example'
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'method' = 'GET'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: New Url, before decoding: '/example'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Url (no decoding): '/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: depth=0
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Rule 1 : 2 matches
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='%{HTTP_HOST}' out='website.com'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'website.com'
Tue Apr 27 02:25:43 -  5776 - EvalCondition: Cond %{HTTP_HOST} !^www => TRUE
Tue Apr 27 02:25:43 -  5776 - EvalConditionList: rule 1, TRUE, Rule will apply
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='http://www.%{HTTP_HOST}/#L$1#E' out='http://www.website.com/#L$1#E'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Result (length 37): http://www.website.com/example
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: returning 1301
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Redirect (code=301) Url to: 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_LOG
Tue Apr 27 02:25:43 -  5776 - ReleaseOrExpireSiteConfig: site '/LM/W3SVC/30/Root' (era=9) (rc=0) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - IsIniFileUpdated: c:\web\Site\server\webroot\Iirf.ini NO
Tue Apr 27 02:25:43 -  5776 - GetSiteConfig: Obtain  site '/LM/W3SVC/30/Root' (era=9) (rc=1) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_URL_MAP
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: cfg= 0x1336DF28
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Tue Apr 27 02:25:43 -  5776 - DoRewrites
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'url' = '/example'
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'method' = 'GET'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: New Url, before decoding: '/example'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Url (no decoding): '/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: depth=0
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Rule 1 : 2 matches
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='%{HTTP_HOST}' out='website.com'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'website.com'
Tue Apr 27 02:25:43 -  5776 - EvalCondition: Cond %{HTTP_HOST} !^www => TRUE
Tue Apr 27 02:25:43 -  5776 - EvalConditionList: rule 1, TRUE, Rule will apply
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='http://www.%{HTTP_HOST}/#L$1#E' out='http://www.website.com/#L$1#E'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Result (length 37): http://www.website.com/example
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: returning 1301
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Redirect (code=301) Url to: 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_LOG
Tue Apr 27 02:25:43 -  5776 - ReleaseOrExpireSiteConfig: site '/LM/W3SVC/30/Root' (era=9) (rc=0) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - IsIniFileUpdated: c:\web\Site\server\webroot\Iirf.ini NO
Tue Apr 27 02:25:43 -  5776 - GetSiteConfig: Obtain  site '/LM/W3SVC/30/Root' (era=9) (rc=1) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_URL_MAP
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: cfg= 0x1336DF28
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Tue Apr 27 02:25:43 -  5776 - DoRewrites
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'url' = '/example'
Tue Apr 27 02:25:43 -  5776 - GetHeader_AutoFree: 'method' = 'GET'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: New Url, before decoding: '/example'
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Url (no decoding): '/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: depth=0
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Rule 1 : 2 matches
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='%{HTTP_HOST}' out='website.com'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'website.com'
Tue Apr 27 02:25:43 -  5776 - EvalCondition: Cond %{HTTP_HOST} !^www => TRUE
Tue Apr 27 02:25:43 -  5776 - EvalConditionList: rule 1, TRUE, Rule will apply
Tue Apr 27 02:25:43 -  5776 - ReplaceServerVariables: in='http://www.%{HTTP_HOST}/#L$1#E' out='http://www.website.com/#L$1#E'
Tue Apr 27 02:25:43 -  5776 - GenerateReplacementString: result 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: Result (length 37): http://www.website.com/example
Tue Apr 27 02:25:43 -  5776 - EvaluateRules: returning 1301
Tue Apr 27 02:25:43 -  5776 - DoRewrites: Redirect (code=301) Url to: 'http://www.website.com/example'
Tue Apr 27 02:25:43 -  5776 - HttpFilterProc: SF_NOTIFY_LOG
Tue Apr 27 02:25:43 -  5776 - ReleaseOrExpireSiteConfig: site '/LM/W3SVC/30/Root' (era=9) (rc=0) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:44 -   784 - IsIniFileUpdated: c:\web\Site\server\webroot\Iirf.ini NO
Tue Apr 27 02:25:44 -   784 - GetSiteConfig: Obtain  site '/LM/W3SVC/30/Root' (era=9) (rc=1) (Expired=0) (ptr=0x1336DF28)...
Tue Apr 27 02:25:44 -   784 - HttpFilterProc: SF_NOTIFY_URL_MAP
Tue Apr 27 02:25:44 -   784 - HttpFilterProc: cfg= 0x1336DF28
Tue Apr 27 02:25:44 -   784 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Tue Apr 27 02:25:44 -   784 - DoRewrites

Coordinator
Apr 27, 2010 at 6:03 PM

Very clear.  I see the endless loop in the log file.

It's unexpected, because you are clearly redirecting to www.website.com, yet the new request that arrives apparently includes only website.com as the HTTP_HOST.

To troubleshoot this, I would install Fiddler on the browser side to inspect the HTTP transactions.  Inside Fiddler, you should see:

  1. the outgoing original request,
  2. the HTTP 301 response, with the new (redirected) URL, from IIRF. 
  3. the new request, to the new URL from step 2.
  4. a response...

In step 3, I would expect the request to use the host www.website.com.  In the IIRF log, though, you are seeing the host as website.com.  Somehow there is a discrepancy.  Maybe something else is intervening to change the hostname.  Looking at the Fiddler trace may give you some clues.
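
If Fiddler is not at hand, the same request/response steps can be approximated with a short script that follows the redirects by hand and prints each hop. The sketch below is only one way to do it: fetch_once and trace_redirects are hypothetical names, it handles plain http only, and it shows what the server answers, not what may be rewriting the hostname in between.

```python
import http.client
from urllib.parse import urljoin, urlsplit

def fetch_once(url):
    """Issue one GET without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow redirects manually, recording (url, status, location) for each hop."""
    hops = []
    seen = set()
    while len(hops) < max_hops:
        status, location = fetch(url)
        hops.append((url, status, location))
        if status not in (301, 302, 303, 307, 308) or not location:
            break  # a normal response ends the chain
        seen.add(url)
        url = urljoin(url, location)
        if url in seen:
            # The server sent us back to a URL we already requested.
            hops.append((url, "loop", None))
            break
    return hops

# Usage, with a placeholder URL:
# for hop in trace_redirects("http://website.com/example"):
#     print(hop)
```

A hop whose Location equals the URL just requested is the loop seen in the log above.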

You could also try using %{SERVER_NAME} instead of %{HTTP_HOST} in your RewriteCond.  I don't know the exact difference between them, but it would be easy for you to test.  These variables are documented on MSDN.

 

Apr 28, 2010 at 7:42 PM

Hey,

I still haven't made any progress on this rule to add the 'www' when it's missing.

I just explicitly tried the following url (disguised)

http://website.com/somedir   And I get the following in my log...

Wed Apr 28 14:30:08 -  4432 - HttpFilterProc: SF_NOTIFY_URL_MAP
Wed Apr 28 14:30:08 -  4432 - HttpFilterProc: cfg= 0x12CA2648
Wed Apr 28 14:30:08 -  4432 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Wed Apr 28 14:30:08 -  4432 - DoRewrites
Wed Apr 28 14:30:08 -  4432 - DoRewrites: Url (no decoding): '/'
Wed Apr 28 14:30:08 -  4432 - EvaluateRules: depth=0
Wed Apr 28 14:30:08 -  4432 - EvaluateRules: Rule 1 : 2 matches
Wed Apr 28 14:30:08 -  4432 - EvalCondition: Cond %{HTTP_HOST} !^www => FALSE
Wed Apr 28 14:30:08 -  4432 - EvalConditionList: rule 1, FALSE, Rule does not apply

Again my rule is such:

RewriteCond  %{HTTP_HOST}  !^www
RedirectRule ^/(.*)$       http://www.%{HTTP_HOST}/#L$1#E         [R=301]

I'm concerned about the log lines in RED

Why is the first red line showing the URL as simply '/'?

There is NO 'www' in my request, but the rule is evaluating to false when it should be true.

Thanks again,

Tom

 

 

Apr 28, 2010 at 7:52 PM

Tom, bump your log level up (to 4 I think should do it) and you can see what HTTP_HOST is evaluating to, and that may provide a pointer to the issue.

 

Apr 28, 2010 at 8:36 PM

Ok, thanks.  I'll give that a try.

Coordinator
Apr 29, 2010 at 8:27 PM

The URL retrieved and displayed in the log is what is sent on the request line of an HTTP request, e.g.

GET /urlpath/info.asp  HTTP/1.1

That's why it shows up as / in your case.  I guess you are not using anything for the URL path; in other words, just http://servername
rswitt is correct - to understand how the condition is evaluating, you'd need to turn up logging.
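
To illustrate the point about the request line: the path and the hostname travel in different parts of the request. The path is on the request line, and the hostname arrives in the Host header, which is what %{HTTP_HOST} reflects. A rough Python sketch follows; split_request is a made-up helper and the raw request is a hand-written example, not taken from the log.

```python
def split_request(raw_request):
    """Return (url_path, host) from the text of an HTTP/1.1 request.

    Hypothetical helper for illustration only.
    """
    lines = raw_request.split("\r\n")
    # Request line: METHOD, URL path, protocol version.
    method, url_path, version = lines[0].split()
    host = None
    for line in lines[1:]:
        if line == "":
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    return url_path, host

# What a browser sends for http://website.com with nothing after the hostname.
raw = "GET / HTTP/1.1\r\nHost: website.com\r\n\r\n"
print(split_request(raw))  # ('/', 'website.com')
```

So a logged URL of '/' says nothing about whether www was present. Only the value of HTTP_HOST, visible at the higher log level, answers that.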