rewrite for .html to .php ONLY if .php exists?

Sep 30, 2009 at 9:07 PM

Hi, I am running into a little problem after installing IIRF on my work's web server, and I am not sure whether it can be solved easily or is just plain impossible. I can't find much help anywhere else, so I figured I would give this a shot. It's kind of complex and I want to be thorough, so please bear with me.

I just started working at a new place, and we have a couple of sites with incredibly high page rank for incredibly competitive terms. Unfortunately, whoever built these pages eons ago built them in plain HTML. Some sites have upwards of 400 UNIQUELY WRITTEN HTML FILES, rather than using PHP includes for headers, footers, side links and so on to generate dynamic pages, where one change would show up sitewide. So I constantly have to find and replace across a whole site just to change one link.

Obviously this is extremely annoying, but we can't just up and change the site to PHP without affecting our SEO, because everything shows up in our search results as index.html, whatever.html and so on. So our idea was to install IIRF to rewrite the requests: the pages still come up the same on Google, people click whatever.html, and it really loads our updated dynamic .php file.

BUT since we installed it before uploading our .php files, any request for whatever.html now returns a 404, because there is no whatever.php there yet. So what I am getting at is: is there any way to write a rule so that if a matching .php page exists the request is rewritten to it, and if it doesn't, the original .html is served with no rewrite? Right now EVERY .html request goes looking for the .php. We want the .html pages that have .php files behind them to be rewritten, and the ones that don't to just load as plain .html. Hope that makes sense.

Here is what we have so far...

#
# Wednesday, 05 August 2009  4:52PM est
#

#IterationLimit 5
#RewriteRule ^/([^/\.]+)(html)(.*)$  /$1.php$3  [L]

#Can be accessed via http://localhost/iirfstatus
StatusUrl /iirfStatus

RewriteRule ^(.*)\.html $1.php [I,L]
RewriteRule ^(.*)\.htm $1.php [I,L]

 

Thanks!

Coordinator
Sep 30, 2009 at 11:24 PM
Edited Oct 1, 2009 at 12:16 AM

You need to test for the existence of the file. This is done with a RewriteCond and the -f "special pattern". If the site is the "main" site (no virtual path), then the rules should look like this:

RewriteEngine ON
RewriteLog  c:\temp\iirf.wwwroot
RewriteLogLevel 2
StatusUrl  /iirfStatus
CondSubstringBackrefFlag *
MaxMatchCount 5

# -------------------------------------------------------
# Rewrite .html requests to .php, if the .php file exists.
# %{APPL_PHYSICAL_PATH} is the physical path for the IIS vdir
# or application. 
#
# The QSA modifier just appends any query string in the original
# request, so something like
#
#  http://server/page1.html?p1=value  will get rewritten to 
#     http://server/page1.php?p1=value 
#
# The [L] says, evaluate no more rules if this one fires.
#
RewriteCond  %{APPL_PHYSICAL_PATH}$1.php      -f
RewriteRule  ^/(.+)\.(html|htm)(\?(.*))?$     /$1.php             [QSA,L]

# If the request gets to this rule, that means the .php form is not present.
# If the .html file is ALSO not present, then rewrite to my custom 404 page. 
# This part is not necessary if you have a different 404 page already set up, but
# I wanted to show the syntax for "does not exist".  
RewriteCond  %{APPL_PHYSICAL_PATH}$1.html     !-f
RewriteRule  ^/(.+)\.(html|htm)(\?(.*))?$     /DoesNotExist.htm   [L]
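To make the decision logic of the two rules above easier to follow, here is a minimal Python sketch. It is not IIRF code and IIRF does not run anything like it. The names `rewrite` and `site_root` are hypothetical, with `site_root` standing in for %{APPL_PHYSICAL_PATH}. It only mirrors the order of the checks: rewrite to .php if that file exists, otherwise fall through, and send the request to the custom 404 page only when the .html file is missing too.

```python
import re
from pathlib import Path

# Same pattern as the RewriteRule lines above:
# path stem, extension, optional query string.
RULE = re.compile(r"^/(.+)\.(html|htm)(\?(.*))?$")


def rewrite(url: str, site_root: Path) -> str:
    """Return the URL that would be served for `url` under the two rules above.

    `site_root` plays the role of %{APPL_PHYSICAL_PATH} (an assumption made
    for this sketch only).
    """
    m = RULE.match(url)
    if m is None:
        return url  # not an .html/.htm request, so neither rule applies

    stem = m.group(1)
    query = m.group(3) or ""  # includes the leading "?", or "" if absent

    # Rule 1: the .php file exists (-f), so rewrite and keep the query string.
    if (site_root / f"{stem}.php").is_file():
        return f"/{stem}.php{query}"

    # Rule 2: the .html file does not exist either (!-f), so use the 404 page.
    if not (site_root / f"{stem}.html").is_file():
        return "/DoesNotExist.htm"

    # Neither rule fired: the request passes through and the .html is served.
    return url
```

With a folder that contains only page1.php and old.html, `rewrite("/page1.html?p1=value", root)` gives `/page1.php?p1=value`, `rewrite("/old.html", root)` leaves the URL alone, and `rewrite("/missing.html", root)` gives `/DoesNotExist.htm`.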

Supposing you are working with an IIS "application" that is mapped to the /foo virtual path, then the config looks slightly different, like this:

# -------------------------------------------------------
# Rewrite .html requests to .php, if the .php file exists.
# %{APPL_PHYSICAL_PATH} is the physical path for the IIS vdir.
#
# The QSA modifier just appends any query string in the original
# request, so something like
#
#  http://server/foo/page1.html?p1=value  will get rewritten to 
#     http://server/foo/page1.php?p1=value 
#
# The [L] says, evaluate no more rules if this one fires.
#
RewriteCond  %{APPL_PHYSICAL_PATH}$1.php         -f
RewriteRule  ^/foo/(.+)\.(html|htm)(\?(.*))?$    /foo/$1.php             [QSA,L]

# If the request gets here, that means the .php form is not present.
# If the .html file is also not present, then rewrite to my custom 404 page. 
# This part is not necessary if you have a different 404 page already set up, but
# I wanted to show the syntax for "does not exist".  
RewriteCond  %{APPL_PHYSICAL_PATH}$1.html        !-f
RewriteRule  ^/foo/(.+)\.(html|htm)(\?(.*))?$    /foo/DoesNotExist.htm   [L]

This is all documented in the CHM file.

PS: I use (.+) instead of (.*) because + means "one or more characters" while * means "zero or more characters". In other words, (.*)\.html will match a broken request like ".html", whereas (.+)\.html requires at least one character before the dot.
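The difference between the two quantifiers is easy to check outside IIRF. The snippet below is a small sketch using Python's `re` module, on the assumption that it treats + and * the same way as the regex engine IIRF uses for these simple patterns. The variable names `star` and `plus` are just illustrative.

```python
import re

# (.*) allows an empty name before ".html"; (.+) requires at least one character.
star = re.compile(r"^/(.*)\.html$")
plus = re.compile(r"^/(.+)\.html$")

for url in ["/index.html", "/.html"]:
    print(url, bool(star.match(url)), bool(plus.match(url)))

# Prints:
# /index.html True True
# /.html True False
```

Both patterns accept a normal request, but only the (.*) form accepts the broken "/.html" request.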