Rewriting URL

Topics: Developer Forum, User Forum
Oct 18, 2009 at 4:42 PM


I need help with URL rewriting.

I was trying to reformat this URL:


To get:


I was doing this:

RewriteRule  ^/prov/(.*)/(.*).html$  /ProvSite.aspx?ProvId=$1&PageId=$2

It seems like I am not doing it right.


Any help will be appreciated.



Oct 19, 2009 at 9:50 AM

Your regular expression is incorrect.

The (.*) literally means "match as many characters as possible", and that includes slashes and dots.

I think what you're after is something like:

RewriteRule  ^/prov/([^/]*)/([^\.]*).html$  /ProvSite.aspx?ProvId=$1&PageId=$2

Cheeso may correct me yet.

This now looks for /prov/, then anything BUT a forward slash, which gives you everything between /prov/ and the next /. After that slash, it takes everything up to the first full stop.

At least I think this is right...
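
To see the difference between the two rules, here is a rough sketch using Python's re module as a stand-in for the rewrite filter's regex engine. That is an assumption: the syntax of these particular patterns is the same in both, but this is not the filter itself. The sample URLs are made up for illustration.

```python
import re

# The two patterns from the posts above, copied as written.
GREEDY = r"^/prov/(.*)/(.*).html$"
NEGATED = r"^/prov/([^/]*)/([^\.]*).html$"

def captures(pattern, url):
    """Return the captured groups as a tuple, or None if the URL does not match."""
    m = re.match(pattern, url)
    return m.groups() if m else None

# Simple case (hypothetical URL): both patterns capture the same values.
print(captures(GREEDY, "/prov/12/page.html"))   # ('12', 'page')
print(captures(NEGATED, "/prov/12/page.html"))  # ('12', 'page')

# Extra slash in the path (hypothetical URL): the captures differ.
print(captures(GREEDY, "/prov/12/a/b.html"))    # ('12/a', 'b')
print(captures(NEGATED, "/prov/12/a/b.html"))   # ('12', 'a/b')
```

With (.*) the first group swallows everything up to the last slash, while [^/]* stops at the first one.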

Hope this helps.

Oct 19, 2009 at 10:39 AM

Many thanks.


Oct 20, 2009 at 6:01 AM

Sounds about right, Shonk.   I would make just a few changes.

Many people might think that * matches one or more characters, but that's not the case. The * matches zero or more of the prior pattern atom (a character, or a range, or if parens precede the *, then the prior group). a* matches zero or more a characters. [a-z]* matches zero or more alpha characters.

You then have the somewhat confusing situation where the pattern /(a*)/ matched against the string "//" gives a positive result.  If you want "one or more" then use the + quantifier.   a+ matches one or more a characters.  [a-z]+ matches one or more alpha characters.  And /(a+)/ matched against the string "//" gives a negative (no match) result.  The same pattern applied to the string /a/ gives a positive result.
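
A quick way to check the * versus + behaviour described above is a short Python snippet. This is only a sketch: Python's re module stands in for the filter's regex engine, and the helper name matches is made up for the example.

```python
import re

def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None

print(matches(r"/(a*)/", "//"))   # True: zero a characters is acceptable to *
print(matches(r"/(a+)/", "//"))   # False: + needs at least one a
print(matches(r"/(a+)/", "/a/"))  # True
```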

Also, if the middle path element is only digits, you may want to restrict it still further with a digit range. Instead of [^/] , which in English is "any char that is not a slash", you can use the range [0-9] , which in English is "any decimal digit". With the + quantifier appended, you have [0-9]+ , which is "any series of one or more decimal digits".

Finally, the "word" preceding html should exclude the slash character, normally.  Otherwise your final "word" could include a slash, and the entire pattern would match /prov/3342/one/two/three.html , which I think you don't want.   So the final capture group should be ([^\./]+) .
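
The slash problem can be seen in a small Python sketch. The assumptions here: Python's re module stands in for the filter's regex engine, LOOSE is the earlier suggested rule as posted, and TIGHT is that same rule with only the final capture group swapped, built just for this comparison.

```python
import re

# LOOSE: final group excludes dots only, so slashes are allowed in it.
LOOSE = r"^/prov/([^/]*)/([^\.]*).html$"
# TIGHT: final group excludes dots and slashes (illustrative variant).
TIGHT = r"^/prov/([^/]*)/([^\./]+).html$"

def captures(pattern, url):
    """Return the captured groups as a tuple, or None if the URL does not match."""
    m = re.match(pattern, url)
    return m.groups() if m else None

deep = "/prov/3342/one/two/three.html"
print(captures(LOOSE, deep))  # ('3342', 'one/two/three')
print(captures(TIGHT, deep))  # None

flat = "/prov/3342/three.html"
print(captures(TIGHT, flat))  # ('3342', 'three')
```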

Using all these tweaks, the reworked rule looks like this: 

RewriteRule  ^/prov/([0-9]+)/([^\./]+)\.html$  /ProvSite.aspx?ProvId=$1&PageId=$2

This matches a URL path with three parts. The first is /prov, and the second is a series of one or more digits. The final part is the filename, which has one or more characters (none of which can be a dot or a slash), followed by .html.

Be sure to test against all expected and unexpected incoming URL requests.
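
One way to do that testing before touching the server is a small Python sketch like the one below. It is not the rewrite filter itself: Python's re module stands in for the filter's regex engine, the rewrite function and the sample URLs are hypothetical, and the dot before html is escaped in the pattern so that it only matches a literal dot.

```python
import re

# Assumption: "\." before html so that only a literal dot is accepted there.
RULE = re.compile(r"^/prov/([0-9]+)/([^\./]+)\.html$")

def rewrite(path):
    """Return the rewritten target for a matching path, or None if the rule does not apply."""
    m = RULE.match(path)
    if m is None:
        return None
    prov_id, page_id = m.groups()
    return f"/ProvSite.aspx?ProvId={prov_id}&PageId={page_id}"

# Hypothetical sample requests, both expected and unexpected.
samples = [
    "/prov/3342/about.html",          # expected form
    "/prov/3342/one/two/three.html",  # too many path segments
    "/prov/abc/about.html",           # non-numeric id
    "/prov//about.html",              # empty id
    "/prov/3342/.html",               # empty page name
    "/prov/3342/aboutXhtml",          # no literal dot before html
]
for path in samples:
    print(path, "->", rewrite(path))
```

Only the first sample should be rewritten, to /ProvSite.aspx?ProvId=3342&PageId=about, and the rest should come back as None.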