Rewrite rule for clean URLs

Oct 5, 2011 at 6:41 PM

Hi Cheeso,

I'm new to IIRF and am starting to wrap my head around crafting rules but need a little assistance.

Essentially we would like to hide .asp extensions from any URL, for all pages in the root and subfolders of the virtual directory.

We also need to keep query strings intact, and not rewrite URLs for images and stylesheets.

I am using TestDriver to validate the rules, which is very helpful. Here is the ini file:

 

# IIRF ini file
# version 1.1
# Mon, 26 Sept 2011  15:18
#===========================================
RewriteLog C:\b10\portal\Temp\Logs\iirf
RewriteLogLevel 3
RewriteEngine ON
RewriteBase ON
IterationLimit 2
RewriteRule (.+\.)(jpg|png|jpeg|gif|ttf|sql|txt|xslt|zip|css|xml)$   -   [L]
RewriteRule  ^([^\?]*) $1.asp [QSA] $

Sample URLs
# Incoming URL                      Expected Result
#-----------------------------------------------------
/WS/Portal/getStuff?var1=abc 	/WS/Portal/getStuff.asp?var1=abc
/login?abcde 			/login.asp?abcde
Output
C:\Program Files\Ionic Shade\IIRF 2.1>testdriver.exe -d C:\testurls
TestDriver: linked with 'Ionic ISAPI Rewriting Filter (IIRF) 2.1.2.0 x86 RELEASE'.
TestDriver: The IIRF library was built on 'Aug 17 2011 09:58:52'

Trying to read config at 'C:\testurls\Iirf.ini'
Wed Oct 05 13:31:57 -  4848 - -------------------------------------------------------
Wed Oct 05 13:31:57 -  4848 - Ionic ISAPI Rewriting Filter (IIRF) 2.1.2.0 x86 RELEASE
Wed Oct 05 13:31:57 -  4848 - IIRF was built on: Aug 17 2011 09:58:52
Wed Oct 05 13:31:57 -  4848 - GetLogFile: app:'None'  new log:'C:\b10\portal\Temp\Logs\iirf.237
6.log'
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: actual log file 'C:\b10\portal\Temp\Logs\iirf.237
6.log'
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: ini file: 'C:\testurls\Iirf.ini'
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: ini file timestamp: 2011/10/05 13:31:54 Eastern D
aylight Time
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: cfg(0x009D65B0)
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: LogLevel = 3
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: C:\testurls\Iirf.ini(8): RewriteEngine will be en
abled.
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: C:\testurls\Iirf.ini(9): RewriteBase will be enab
led.
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: C:\testurls\Iirf.ini(9): RewriteBase will be '?'
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: C:\testurls\Iirf.ini(10): IterationLimit 2
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: C:\testurls\Iirf.ini(11): RewriteRule (rule 1)  '
(.+\.)(jpg|png|jpeg|gif|ttf|sql|txt|xslt|zip|css|xml)$'  '-'      [L]
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: C:\testurls\Iirf.ini(14): RewriteRule (rule 2)  '
^([^\?]*)'  '$1.asp'    [QSA]
Wed Oct 05 13:31:57 -  4848 - CountIniLines: ini file C:\testurls\Iirf.ini (15 lines)
Wed Oct 05 13:31:57 -  4848 - ReadVdirConfig: Done reading INI for vdir(?), found 2 rules (0 er
rors, 0 warnings) on 15 lines, in 1 modules
done reading new config
Processing URLs...(C:\testurls\SampleUrls.txt)

Wed Oct 05 13:31:57 -  4848 - DoRewrites: Url: '/WS/Portal/getStuff?var1=abc'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: depth=0

***
Retrieving server variable that is not supported by TestDriver (SCRIPT_NAME)
Wed Oct 05 13:31:57 -  4848 - GetServerVariable failed. The parameter is incorrect.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: URL does not begin with RewriteBase string(?), not
 stripping it.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 1: -1 (No match)
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 2: 2 matches
Wed Oct 05 13:31:57 -  4848 - ApplyUrlEncoding: out '/WS/Portal/getStuff.asp'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Result (length 32): /WS/Portal/getStuff.asp?var1=a
bc
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: depth=1

***
Retrieving server variable that is not supported by TestDriver (SCRIPT_NAME)
Wed Oct 05 13:31:57 -  4848 - GetServerVariable failed. The parameter is incorrect.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: URL does not begin with RewriteBase string(?), not
 stripping it.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 1: -1 (No match)
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 2: 2 matches
Wed Oct 05 13:31:57 -  4848 - ApplyUrlEncoding: out '/WS/Portal/getStuff.asp.asp'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Result (length 36): /WS/Portal/getStuff.asp.asp?va
r1=abc
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: depth=2

***
Retrieving server variable that is not supported by TestDriver (SCRIPT_NAME)
Wed Oct 05 13:31:57 -  4848 - GetServerVariable failed. The parameter is incorrect.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: URL does not begin with RewriteBase string(?), not
 stripping it.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 1: -1 (No match)
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 2: 2 matches
Wed Oct 05 13:31:57 -  4848 - ApplyUrlEncoding: out '/WS/Portal/getStuff.asp.asp.asp'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Result (length 40): /WS/Portal/getStuff.asp.asp.as
p?var1=abc
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Iteration stopped; reached limit of 2 cycles.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: returning 1
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: returning 1
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: returning 1

REWRITE '/WS/Portal/getStuff?var1=abc' ==> '/WS/Portal/getStuff.asp.asp.asp?var1=abc'
ERROR expected(/WS/Portal/getStuff.asp?var1=abc)
        actual(/WS/Portal/getStuff.asp.asp.asp?var1=abc)

Wed Oct 05 13:31:57 -  4848 - DoRewrites: Url: '/login?abcde'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: depth=0

***
Retrieving server variable that is not supported by TestDriver (SCRIPT_NAME)
Wed Oct 05 13:31:57 -  4848 - GetServerVariable failed. The parameter is incorrect.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: URL does not begin with RewriteBase string(?), not
 stripping it.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 1: -1 (No match)
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 2: 2 matches
Wed Oct 05 13:31:57 -  4848 - ApplyUrlEncoding: out '/login.asp'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Result (length 16): /login.asp?abcde
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: depth=1

***
Retrieving server variable that is not supported by TestDriver (SCRIPT_NAME)
Wed Oct 05 13:31:57 -  4848 - GetServerVariable failed. The parameter is incorrect.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: URL does not begin with RewriteBase string(?), not
 stripping it.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 1: -1 (No match)
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 2: 2 matches
Wed Oct 05 13:31:57 -  4848 - ApplyUrlEncoding: out '/login.asp.asp'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Result (length 20): /login.asp.asp?abcde
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: depth=2

***
Retrieving server variable that is not supported by TestDriver (SCRIPT_NAME)
Wed Oct 05 13:31:57 -  4848 - GetServerVariable failed. The parameter is incorrect.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: URL does not begin with RewriteBase string(?), not
 stripping it.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 1: -1 (No match)
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Rule 2: 2 matches
Wed Oct 05 13:31:57 -  4848 - ApplyUrlEncoding: out '/login.asp.asp.asp'
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Result (length 24): /login.asp.asp.asp?abcde
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: Iteration stopped; reached limit of 2 cycles.
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: returning 1
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: returning 1
Wed Oct 05 13:31:57 -  4848 - EvaluateRules: returning 1

REWRITE '/login?abcde' ==> '/login.asp.asp.asp?abcde'
ERROR expected(/login.asp?abcde)
        actual(/login.asp.asp.asp?abcde)


2 Errors in 2 Total Trials

I think it is close but the second rule is looping and appending the .asp extension and I'm not sure how to resolve that.

Any help is much appreciated.

Thanks

 

Coordinator
Oct 5, 2011 at 7:41 PM
Edited Oct 5, 2011 at 7:46 PM

Hi, glad to see you using the TestDriver.

You asked about avoiding the looping.  One easy way is to modify the rules to not rewrite any URL that already ends with .asp.  If someone uses foo.asp, then you can just NOT rewrite at all.  The way to do that is with a rule like this:

RewriteRule \.asp(\?|$) -  [L]

What that says is, if a url has a .asp, followed by either a question mark (beginning of a query string) or the end of the string, then don't rewrite it. If you place that rule ahead of the rule that appends the .asp, then you will not get the .asp.asp.asp thing.
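To see why the guard rule stops the pile-up, here is a rough Python sketch of the iteration. It is only a stand-in: the function name simulate and the loop structure are my own assumptions, not IIRF's actual code, and Python's re module is used in place of the regex engine inside IIRF. The two patterns are the ones from this thread.

```python
import re

GUARD = re.compile(r"\.asp(\?|$)")   # the "do not rewrite" rule
APPEND = re.compile(r"^([^\?]*)")    # the original append rule

def simulate(url, use_guard, iteration_limit=2):
    """Hypothetical stand-in for the rule loop; not IIRF's real code."""
    # One initial pass (depth=0) plus up to `iteration_limit` repeats.
    for _ in range(iteration_limit + 1):
        if use_guard and GUARD.search(url):
            break  # like "- [L]": leave the URL alone and stop
        query = url.partition("?")[2]
        url = APPEND.match(url).group(1) + ".asp"
        if query:
            url += "?" + query  # re-attach the query string, as [QSA] does
    return url

print(simulate("/login?abcde", use_guard=False))  # .asp piles up
print(simulate("/login?abcde", use_guard=True))   # .asp is added once
```

Without the guard, every pass matches again and appends another .asp until the iteration limit is hit, which is what the TestDriver log shows. With the guard placed first, the second pass sees the .asp and stops.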

Next, though, I think you want to add a condition for the case where you are adding the .asp. You probably want to check that the file really exists before adding it. Use the -f test for RewriteCond to check. It should be something like this:

RewriteCond %{REQUEST_FILENAME}.asp -f
RewriteRule  ^([^\?]*) $1.asp [QSA]

The REQUEST_FILENAME server variable resolves to the filesystem path that corresponds to the requested URL path. If the document root for your vdir is c:\inetpub\wwwroot, then requesting http://example.com/fibble gives a REQUEST_FILENAME value of c:\inetpub\wwwroot\fibble. The -f test in the RewriteCond above checks whether a file exists with the given name, which is the REQUEST_FILENAME with .asp appended. If it does, the rule applies, and the rule is just your rule, which appends .asp to the URL path.
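As a rough illustration of what that file test amounts to, here is a small Python sketch. The helper names request_filename and should_append_asp, and the throwaway document root, are hypothetical; the real mapping from URL to file is done by IIS, not by code like this.

```python
import os
import tempfile

def request_filename(doc_root, url_path):
    """Map a URL path onto the filesystem, roughly as REQUEST_FILENAME would."""
    return os.path.join(doc_root, *url_path.strip("/").split("/"))

def should_append_asp(doc_root, url_path):
    """Mimic the -f test: is there a regular file named <REQUEST_FILENAME>.asp?"""
    return os.path.isfile(request_filename(doc_root, url_path) + ".asp")

with tempfile.TemporaryDirectory() as doc_root:
    # Create a hypothetical fibble.asp in a temporary "document root".
    open(os.path.join(doc_root, "fibble.asp"), "w").close()
    results = {
        "/fibble": should_append_asp(doc_root, "/fibble"),
        "/fibble.asp": should_append_asp(doc_root, "/fibble.asp"),
        "/missing": should_append_asp(doc_root, "/missing"),
    }

print(results)
```

Only /fibble passes the test. A request that already ends in .asp fails it, because there is no fibble.asp.asp on disk, and that is why the condition also stops the repeated appending.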

Either the first rule I suggested, or this modified version of your rule, would solve the problem of multiple .asp's getting appended to requests. You don't really need both.  I suggest you keep the latter one - which is basically your original rule with a condition applied to it.

Shortening URLs opens up the possibility of redirecting old requests for .asp URLs to the new, shorter version with no extension. Rather than doing "no rewrite" as I described above, you can modify that rule to actually redirect, like this:

# Tell requesters to use shorter URLs
RedirectRule ^/(.+)\.asp(\?|$)  /$1  [QSA,R=301]

Make sense? If anyone requests http://example.com/fibble.asp, it will be redirected to http://example.com/fibble. And then when that request arrives, it will NOT have the .asp, so the RedirectRule won't fire. The next rule will fire, which REWRITES to add back the .asp.
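The two-step flow can be sketched in Python as below. This is a simplified model, not how IIRF is implemented: the function handle and the set existing_pages are made up for illustration, the redirect pattern is assumed to capture only the part before .asp, and a set lookup stands in for the file-existence test.

```python
import re

# Assumed redirect pattern: group 1 is the path without the .asp extension.
REDIRECT = re.compile(r"^/(.+)\.asp(\?|$)")

def handle(url, existing_pages):
    """Return ("redirect", location) or ("serve", internal_url) for one request."""
    path, sep, query = url.partition("?")
    m = REDIRECT.match(url)
    if m:
        # 301 to the short form; the query string is carried over, as with [QSA].
        return ("redirect", "/" + m.group(1) + sep + query)
    if path + ".asp" in existing_pages:
        # Internal rewrite; the client never sees the .asp.
        return ("serve", path + ".asp" + sep + query)
    return ("serve", url)

pages = {"/fibble.asp"}                      # hypothetical site content
first = handle("/fibble.asp?x=1", pages)     # old-style request
second = handle(first[1], pages)             # the client follows the redirect
print(first, second)
```

The old-style request is answered with a redirect to the short URL, and the follow-up request for the short URL is served from the .asp file internally.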

The final set of rules looks like this:

# IIRF ini file
#===========================================
RewriteLog C:\b10\portal\Temp\Logs\iirf
RewriteLogLevel 3
RewriteEngine ON
RewriteBase ON
# don't rewrite for any of these static files
RewriteRule (.+\.)(jpg|png|jpeg|gif|ttf|sql|txt|xslt|zip|css|xml|ico)$   -   [L]

# For .asp requests, tell requesters to use shorter URLs
RedirectRule ^(.+)\.asp(\?|$)  $1  [QSA,R=301]

# when a short request comes in, append .asp if there is an .asp file.
# [L] stops rule processing here, so the rewritten .asp URL is not redirected again.
RewriteCond %{REQUEST_FILENAME}.asp -f
RewriteRule  ^([^\?]*) $1.asp [QSA,L]

I think that oughtta do it. You won't be able to test conditions that use REQUEST_FILENAME server variables in a non-server environment, like the TestDriver. To test, you will need to install IIRF, and put those rules into an IIRF.ini for a real vdir in IIS.

Oct 5, 2011 at 8:30 PM
Edited Oct 5, 2011 at 9:27 PM

Hi Cheeso,

Thanks so much for the assistance, I will look at the suggestions you provided.  

Since my initial post I have been working on another approach that is almost working. I'm curious whether it is viable with some tweaking:

RewriteRule ^([^\?]*)\?(.*) $1.asp$2

Sample URLS

# Incoming URL                      Expected Result
/WS/Portal/getStuff?var1=abc /WS/Portal/getStuff.asp?var1=abc

Output

REWRITE '/WS/Portal/getStuff?var1=abc' ==> '/WS/Portal/getStuff.aspvar1=abc'
ERROR expected(/WS/Portal/getStuff.asp?var1=abc)
        actual(/WS/Portal/getStuff.aspvar1=abc)

 

I broke up the URL into two groups, but I can't get the ? in the output.  

RewriteRule ^([^\?]*)\?(.*) $1.asp?$2 results in looping

 

REWRITE '/WS/Portal/getStuff?var1=abc' ==> '/WS/Portal/getStuff.asp.asp.asp.asp?var1=abc'
ERROR expected(/WS/Portal/getStuff.asp?var1=abc)
        actual(/WS/Portal/getStuff.asp.asp.asp.asp?var1=abc)

 

I'm interested to know if this could work.

Thanks!

Coordinator
Oct 5, 2011 at 10:10 PM
Edited Oct 5, 2011 at 10:16 PM

It could work - you just need to filter out the looping.   You can modify the regex to simply not match any url that already ends in .asp, using what is known as a non-capturing negative lookbehind assertion. The syntax is (?<!something). That says "Match where the preceding is not something."

In your case you would employ this assertion like this:

RewriteRule ^([^\?]*)(?<!\.asp)\?(.*) $1.asp?$2 

That says: capture everything up to the first question mark, and only match if what precedes that question mark does not end in .asp. It's a non-capturing assertion, so it does not increment the capture group count. You still reference $1 and $2 as you did previously.
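If you want to poke at the lookbehind outside of IIRF, a quick Python sketch like this one behaves the same way for these inputs. It assumes Python's re module as a stand-in for IIRF's regex engine, and the function name apply_rule is made up.

```python
import re

# Same pattern as the rule above; Python's re accepts this fixed-width lookbehind.
RULE = re.compile(r"^([^\?]*)(?<!\.asp)\?(.*)")

def apply_rule(url):
    """Return the rewritten URL, or the URL unchanged when the rule does not match."""
    m = RULE.match(url)
    return f"{m.group(1)}.asp?{m.group(2)}" if m else url

print(RULE.groups)                                     # the lookbehind adds no group
print(apply_rule("/WS/Portal/getStuff?var1=abc"))      # rewritten
print(apply_rule("/WS/Portal/getStuff.asp?var1=abc"))  # left alone, so no looping
print(apply_rule("/login"))                            # no ?, so this pattern does not match
```

The last case is worth noticing: this form of the pattern requires a question mark, so a URL with no query string is not rewritten at all.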

But I still think you'd ideally also want to use that -f test which I mentioned previously.

Actually, I think you would also want to capture the ? within the 2nd group, so ideally it would be like this:

RewriteRule ^([^\?]*)(?<!\.asp)(\?.*)?$   $1.asp$2

I've moved the ? into the 2nd capture group, which means the 2nd group matches if and only if it begins with a question mark. Because $2 now carries its own ?, the replacement is $1.asp$2 with no literal ? in it; otherwise the result would contain two question marks. I've also followed that group with a ?, unescaped. That's a quantifier that says "zero or one of the preceding", so the 2nd group may or may not be empty. The pattern will match a URL that has a query string, and will also match a URL that has no query string. I also terminated the pattern with $, which is the end-of-string assertion. This means no characters can follow. The upshot is, either the URL has no query string and ends in something that is NOT .asp, or it has something that does not end in .asp followed by a ? followed by some arbitrary text (the query string).
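Here is the same pattern exercised in Python, as a sanity check rather than a definitive test. It assumes Python's re module behaves like IIRF's regex engine for this pattern, and that the replacement is $1.asp$2, with the question mark supplied by group 2. The function name apply_final_rule is made up.

```python
import re

RULE = re.compile(r"^([^\?]*)(?<!\.asp)(\?.*)?$")

def apply_final_rule(url):
    """Append .asp to the path unless it is already there; keep any query string."""
    m = RULE.match(url)
    if not m:
        return url
    # Group 2 already starts with "?", or is None when there is no query string.
    return m.group(1) + ".asp" + (m.group(2) or "")

for url in ["/login", "/login?abcde", "/login.asp", "/login.asp?abcde"]:
    print(url, "->", apply_final_rule(url))
```

The first two URLs get .asp appended, with and without a query string. The last two already end in .asp and are left untouched, so running the rule a second time changes nothing.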

The resulting pattern is a good illustration of why I say the regular expression language looks like cartoon swearing.