RewriteRule, WordPress, and sitemap.xml

Nov 10, 2008 at 5:47 PM
Edited Nov 10, 2008 at 5:49 PM
First off, Cheeso, this is an EXCELLENT and AWESOME rewrite DLL.
Thank you, thank you, thank you!!!!

However (sorry, I'm sure you get a lot of these), I'm having trouble getting past a mental block.

I've installed it, and it's working wonderfully on a WordPress blog I'm running on IIS 6 / Windows Server 2003.

However, that blog has a sitemap.xml file that resides in the root.

How do I EXCLUDE that file, and others, from the rewrite rules?

I've looked through the Readme.txt file, and I think that it has to do with Rewrite Conditions, but I just need a little guidance here.

OK, I'm using this rule in my IsapiRewrite4.ini for http://blog.beautyvice.com:

RewriteRule ^/(?!index.php)(?!wp-)(.*)$ /index.php/$1

As you can see, it works like a charm.

However, when I try to get Google to pick up the sitemap.xml file, or when I try to view any file in the root manually, I get a page with a 200 OK HTTP response.

For instance, http://blog.beautyvice.com/sitemap.xml should point to the XML file, but it points to a page instead.

So is RewriteCond what I should be using?
Could you help me with the correct usage?
I'm sorry; right now the syntax is a little beyond me. I would love to really get into it and learn it, but I have a pretty mission-critical blog to take care of, and I was hoping for a sample rule that excludes files or pages from the rewrite.

Thanks for any help.
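To see why /sitemap.xml comes back as a page, it helps to run the catch-all rule's pattern by hand. The Python sketch below assumes that IIRF matches the pattern against the URL path (starting with "/") and that Python's re module handles these lookaheads the same way IIRF's PCRE engine does; the sample URLs are only illustrations.

```python
import re

# The catch-all rule quoted above: anything whose path does not start with
# "index.php" or "wp-" is handed to WordPress as /index.php/<rest>.
CATCH_ALL = re.compile(r"^/(?!index.php)(?!wp-)(.*)$")


def rewrite(url: str) -> str:
    """Apply the catch-all rule once; return the URL unchanged if it does not match."""
    m = CATCH_ALL.match(url)
    # Python's \1 plays the role of IIRF's $1 in the replacement.
    return m.expand(r"/index.php/\1") if m else url


for url in ["/sitemap.xml", "/index.php", "/wp-admin/", "/2008/11/some-post/"]:
    print(url, "->", rewrite(url))
```

Since /sitemap.xml is neither index.php nor a wp- path, it becomes /index.php/sitemap.xml, and WordPress answers that with a page and a 200 OK, which matches the symptom in the question.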


Coordinator
Nov 11, 2008 at 12:51 AM
Glad you are finding it useful.
If you have a specific path you would like to exclude from URL rewriting, you can do so with a single rule that says "no rewrite" and also uses the [L] flag, for "last rule". For example:

RewriteRule ^/sitemap\.xml$   - [L]

The above rule says NOT to rewrite any request for /sitemap.xml on the current server. The dash character means "no rewrite", and the [L] means "don't process any more rules". If you put that rule at the top of your ini file, before any other rules, it will exempt sitemap.xml from rule processing.

Putting that rule at the bottom of your ini file does not make much sense. In that case, all the other rules are evaluated against each incoming URL first, and that rule is applied only if none of them matches; at that point it has no effect on the incoming request.

I mention this in the readme, but I did not provide an example. I'll update the readme.
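The effect of rule order can be sketched with a simplified model in Python: rules are tried top to bottom, the first matching rule decides the result, and "-" means "leave the URL alone". This is an assumption-laden sketch, not IIRF's engine; it treats every rule as if it carried [L] and ignores IIRF's other flags and its re-processing of rewritten URLs.

```python
import re

# Each rule is (pattern, replacement); "-" means "no rewrite".
EXCLUDE_SITEMAP = (r"^/sitemap\.xml$", "-")
CATCH_ALL = (r"^/(?!index.php)(?!wp-)(.*)$", r"/index.php/\1")


def apply_rules(url, rules):
    """Return the result of the first rule whose pattern matches (simplified model)."""
    for pattern, replacement in rules:
        m = re.match(pattern, url)
        if m:
            return url if replacement == "-" else m.expand(replacement)
    return url


print(apply_rules("/sitemap.xml", [EXCLUDE_SITEMAP, CATCH_ALL]))  # /sitemap.xml
print(apply_rules("/sitemap.xml", [CATCH_ALL, EXCLUDE_SITEMAP]))  # /index.php/sitemap.xml
```

With the exclusion first, the request stops there untouched; with it last, the catch-all has already claimed the URL and the exclusion never gets a say.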

Nov 11, 2008 at 3:48 AM
Worked like a charm.  Thank you.

Yes, examples would probably work best, especially for dudes like me.

I've read in other threads that you're not that up to speed on WordPress.

This is an EXCELLENT solution for WordPress blogs on Windows servers. It would be great if you took some time to delve into the possibilities.

But thanks for such an excellent solution!
Coordinator
Nov 11, 2008 at 5:54 PM
Mr Rumblpup,

Can you share your ini file for WordPress, now that you've got it sorted? I'd love to post it as an example on this site.

Jan 9, 2009 at 4:28 AM
Edited Jan 9, 2009 at 4:29 AM
Sorry I haven't gotten back to this in 3 months; I was in an automobile accident that has kept me from doing a lot of things.

Here is a copy of the ini file I am using. I'm sure it's not pretty, but it's doing the job. I would love to see someone who is better at writing these add a couple of touches that would round out the ini file for WordPress.

# PhpBlog.ini
#
#
# This example shows how one user employed IIRF with a
# PHP-powered blog engine.
#
# Remember to rename this file to IsapiRewrite4.ini ,
# or copy this content to a file named IsapiRewrite4.ini
# before using it!
#
# Note: The example here was contributed by someone who uses IIRF.
# I haven't tested this ini file, and I don't make any guarantees that
# it works as advertised or expected or desired.  It's an example.
# Good luck! -Cheeso
#
# Fri, 11 May 2007  12:46
#

RewriteLog  c:\temp\iirfLog.out
RewriteLogLevel 3

# MaxMatchCount
#
# Specifies the maximum number of sub-expression matches to
# capture for a single pattern. This specifies the size of the
# array in the C module.  If you have a pattern with more than
# the default number of matches, set this number.
#
# The default is 10.

MaxMatchCount 10

#
# /content/blogcategory/0/33/
#
# should translate to
#
# /index.php?option=com_content&task=blogcategory&id=0&Itemid=33

#RewriteRule ^/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$  /index.php?option=com_$1&task=$2&id=$3&item=$4
RewriteRule ^/sitemap.xml$   - [L]
RewriteRule ^/(?!index.php)(?!wp-)(.*)$ /index.php/$1

May 21, 2009 at 1:40 AM

I am using this filter with wonderful success on my site, http://www.300guitars.com, a WordPress-based site.

The only thing is that I am getting 404s on my sitemap.xml, both when I request it directly in a browser and when Google fetches it.

I was so excited to see this thread, especially the sample .ini file above, but unfortunately it doesn't seem to help with my problem. Any ideas? FYI, I am using rumblpup's file above.

Thanks so much, and please know that many are grateful for this fantastic filter.

Jun 8, 2009 at 6:36 AM
Edited Jun 8, 2009 at 6:38 AM

Wow, the emails from CodePlex are showing up in Junk Mail, even though I thought I had fixed that. Anyway, I didn't need anything other than

RewriteRule ^/sitemap.xml$   - [L]

Try commenting out your rules one by one, and see where the filter might be failing.

Also try what Cheeso said, putting it as the last line of your rules. It's working perfectly for the blog on my umbrellas site.

Jun 8, 2009 at 2:16 PM

Hi rumblpup,

Thanks so much for the response. I am confused, though: I thought Cheeso had said to put it first?

When I put it first, I get a 404; when I put it last, it loads my header and sidebars without the sitemap data.

I think I need to check with my host.

Thanks again.

Jun 15, 2009 at 5:36 PM

Thanks for sharing, this helped me with my blog.

I do have a question (and believe me, I tried everything already).

Our main site needs to be on HTTPS at all times, which is a simple rewrite. However, I don't know how to keep the blog from being rewritten to HTTPS while still rewriting blog-related URLs.

Any ideas?

Coordinator
Jun 25, 2009 at 7:17 PM

@fileitup , Can you explain what you want in some additional detail, with some examples?

Jun 25, 2009 at 8:20 PM

Thanks for replying, @cheeso, but I already figured it out and posted my results in another thread: http://iirf.codeplex.com/Thread/View.aspx?ThreadId=59446

Jun 25, 2009 at 10:09 PM

Hi,

I was hoping to redirect https://www.oldserver.org to https://www.newserver.org.

I tried using:

Jun 25, 2009 at 10:10 PM

RedirectRule https://www.oldserver.org$ https://www.newserver.org/$1 [I]

but to no avail. Does it make sense that this would not redirect?

Coordinator
Jun 26, 2009 at 8:43 PM

YES, it makes sense that it would not work. Sorry, it's not as simple as you would like. You cannot include the http:// (or https://) part in the pattern. You need to reference the HTTP_HOST server variable in an associated RewriteCond statement. Check the readme; there are examples. Also check the forums; I just answered a question like this.

cheers!
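A rough Python illustration of the point about HTTP_HOST: the rule pattern is assumed here to see only the request path (plus any query string), while the host arrives separately, so a pattern beginning with "https://" has nothing to match. The function below models the intended logic (check the host, then build the new URL from the path); the host names are the placeholders from the question, and this is not IIRF code.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

OLD_HOST = "www.oldserver.org"
NEW_HOST = "www.newserver.org"


def redirect_target(url: str) -> Optional[str]:
    """Return the redirect URL for requests to OLD_HOST, else None."""
    parts = urlsplit(url)
    # Assumed: the rule pattern only sees the path (and query string)...
    path = parts.path or "/"
    # ...so the host test is a separate condition, the role that
    # RewriteCond %{HTTP_HOST} plays in IIRF.
    if parts.hostname != OLD_HOST:
        return None
    query = "?" + parts.query if parts.query else ""
    return "https://" + NEW_HOST + path + query


# The pattern from the question never matches a path such as "/about".
print(re.search(r"https://www.oldserver.org$", "/about"))  # None
print(redirect_target("https://www.oldserver.org/about?x=1"))  # https://www.newserver.org/about?x=1
```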

Jul 30, 2009 at 3:11 AM

Hi,

I just want to ask again, in the hope that someone has some insight.

My site is here: http://www.300guitars.com

It is WordPress, with permalinks set to: /%year%/%monthnum%/%postname%/

The server is IIS. I would post a link to phpinfo.php, but, like sitemap.xml, it won't load properly.

Here is my IsapiRewrite4.ini:

# PhpBlog.ini
#
#
# This example shows how one user employed IIRF with a
# PHP-powered blog engine.
# Remember to rename this file to IsapiRewrite4.ini ,
# or copy this content to a file named IsapiRewrite4.ini
# before using it!
#
# Note: The example here was contributed by someone who uses IIRF.
# I haven't tested this ini file, and I don't make any guarantees that 
# it works as advertised or expected or desired.  It's an example.
# Good luck! -Cheeso
#
# Fri, 11 May 2007  12:46
#
RewriteLog  c:\temp\iirfLog.out
RewriteLogLevel 3
# MaxMatchCount
#
# Specifies the maximum number of sub-expression matches to
# capture for a single pattern. This specifies the size of the
# array in the C module.  If you have a pattern with more than
# the default number of matches, set this number.
#
# The default is 10. 
MaxMatchCount 10
# /content/blogcategory/0/33/
# should translate to
# /index.php?option=com_content&task=blogcategory&id=0&Itemid=33
#RewriteRule ^/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$  /index.php?option=com_$1&task=$2&id=$3&item=$4
RewriteRule ^/sitemap.xml$   -[L]
RewriteRule ^/(?!index.php)(?!wp-)(.*)$ /index.php/$1

For some reason I still get a 404 for http://www.300guitars.com/sitemap.xml

Any ideas? What more information can I provide to help?

best,

todd

Coordinator
Jul 30, 2009 at 3:23 AM
Edited Jul 30, 2009 at 3:36 AM

I can't tell for sure but it looks like you have

  RewriteRule  ^/sitemap.xml$  -[L]

(Notice: there's no space between the - and the [L].) I think you want at least one space there. More than one space is OK; in fact, I think it makes the ini file easier to read. You also want to escape the dot. The corrected line is

  RewriteRule  ^/sitemap\.xml$      -    [L]

If you had the original line, with no space between - and [L], then requesting http://www.300guitars.com/sitemap.xml would produce a 404. The rule would rewrite to a URL like -[L], which I suppose is not a document that exists on your server. So, 404. You should be able to see this pretty clearly in the IIRF log file.

It's somewhat inconvenient to scan the log file to see whether your rules are working properly, but it is irreplaceable if you want to see what the filter is actually doing.

In addition to making that change to the RewriteRule, I suggest that you also add another directive in the ini file:

    StatusUrl   /iirfstatus 

This is supported in IIRF v2.0g or v1.2.16 R5 and later. With that directive in your ini file, if you then request a URL like http://www.300guitars.com/iirfstatus, you will get an HTML page with a very brief report on the status of the IIRF filter. That page is a quick way to verify that the syntax of your ini file is correct and that there were no errors in parsing it.

In your case, though, with "-[L]" in the ini file, there would be no error or warning, I think. It's just a typo in the ini file, and there's no way for IIRF to know that you intended something else.
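Both problems with the original sitemap line can be reproduced in a few lines of Python. The parser below is hypothetical: it assumes a directive line is split on whitespace into directive, pattern, replacement, and optional flags, which is enough to show why "-[L]" ends up as the replacement. The second part shows what escaping the dot changes.

```python
import re


def parse_rule(line):
    """Hypothetical whitespace-based split of a RewriteRule line (not IIRF's parser)."""
    tokens = line.split()
    directive, pattern, replacement = tokens[:3]
    flags = tokens[3] if len(tokens) > 3 else None
    return directive, pattern, replacement, flags


original = parse_rule(r"RewriteRule ^/sitemap.xml$   -[L]")
corrected = parse_rule(r"RewriteRule  ^/sitemap\.xml$      -    [L]")
print(original)   # replacement is "-[L]" and there are no flags
print(corrected)  # replacement is "-" and the flags are "[L]"

# An unescaped "." matches any character; "\." matches only a literal dot.
for url in ["/sitemap.xml", "/sitemapXxml"]:
    print(url, bool(re.match(original[1], url)), bool(re.match(corrected[1], url)))
```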

Coordinator
Jul 30, 2009 at 3:34 AM

P.S. In the future, it's really best to start a new thread for a new question. It makes things easier to search later, too.

Thanks

Jul 30, 2009 at 5:14 AM

Hi

RumblePup, thank you for posting the ini file you use. However, I am still having trouble replicating the functionality of your site, http://blog.beautyvice.com/

Here's my IsapiRewrite4.ini:

#RewriteLog  C:\Temp
#RewriteLogLevel 3

# MaxMatchCount
#
# Specifies the maximum number of sub-expression matches to
# capture for a single pattern. This specifies the size of the
# array in the C module.  If you have a pattern with more than
# the default number of matches, set this number.
#
# The default is 10.

MaxMatchCount 10

RewriteCond %{HTTP_HOST} ^indiagrow\.in 
RewriteRule (.*) http://www.indiagrow.in$1 [R=301,L]

# 
# /content/blogcategory/0/33/
# 
# should translate to
# 
# /index.php?option=com_content&task=blogcategory&id=0&Itemid=33

#Expects a Custom wordpress permalink of:  /%post_id%/%category%/%postname%.html

RewriteRule ^/wp-admin/([^/]+)$  /wp-admin/$1
RewriteRule ^/([^/]+)/([^/]+)/([^/]+).html$  /index.php?p=$1
RewriteRule ^/comments/feed /index.php?feed=comments-rss2
RewriteRule ^/([^/]+)/([^/]+)$  /index.php?page_id=$1

#Add a rule for each page you wish to add in your blog.  Lookup the page id from the admin section of wordpress
RewriteRule ^/about  /index.php?page_id=2
RewriteRule ^/events  /index.php?page_id=41&
RewriteRule ^/contact  /index.php?page_id=112
RewriteRule ^/new-to-boston  /index.php?page_id=104
RewriteRule ^/food-dining  /index.php?page_id=9
RewriteRule ^/entertainment  /index.php?page_id=5

#Rewrite for categories
#/?cat=3
RewriteRule ^/defence  /?cat=3
RewriteRule ^/economy  /index.php?cat=14
RewriteRule ^/industry  /index.php?cat=17
RewriteRule ^/infrastructure  /index.php?cat=3
RewriteRule ^/happenings  /index.php?cat=4




#Rule for RSS Feeds
RewriteRule ^/feed /index.php?feed=rss2

RewriteRule ^/$ /index.php


IterationLimit 10
NotParsed  foo bar

The permalink is

/%post_id%/%category%/%postname%.html

Could you post the permalink structure you are using with your ini file?
Thanks
DaCoder

Coordinator
Jul 30, 2009 at 5:52 AM

DaCoder, a couple of observations:

#1, you have logging turned off, so there's no way for you to check what is happening with your rules. I suggest you uncomment the RewriteLog and RewriteLogLevel lines, run some URLs through the filter, and then examine the log file to see what is really happening.

#2, you have some old junk in your ini file, including a NotParsed statement, which I think I included in the documentation once to show that some lines will not be parsed. Also, you have two IterationLimit statements. And the comments in your ini file seem to have no correspondence to the actual ini file statements. You should clean all that stuff up.

#3, your rules have patterns with multiple captures, but typically you rewrite with only the first capture. For example:

  RewriteRule ^/([^/]+)/([^/]+)/([^/]+).html$  /index.php?p=$1
  ...
  RewriteRule ^/([^/]+)/([^/]+)$  /index.php?page_id=$1

Notice that in the first rule there, you have 3 captures, but in the replacement string, you reference only the first capture, with $1. In the second rule shown above you have two captures, but reference only the first one. That seems surprising. Probably not what you want.

#4, you haven't explained what problem you are having. You just said "it doesn't work." I'm gonna need more information than that. What doesn't work?
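Observation #3 is easy to check in Python. The URL below is the example post URL DaCoder gives later in the thread, and Python's \1, \2, \3 correspond to IIRF's $1, $2, $3 (an assumption that the two regex engines capture these groups the same way, which holds for a pattern this simple).

```python
import re

pattern = r"^/([^/]+)/([^/]+)/([^/]+).html$"
url = "/50/technology/airtel-slashes-broadband-rates.html"

m = re.match(pattern, url)
print(m.groups())                    # all three captures
print(m.expand(r"/index.php?p=\1"))  # only $1 reaches the rewritten URL
```

With a /%post_id%/%category%/%postname%.html permalink, dropping the category and post name may be intentional, since the post ID alone identifies the post; the point is to make sure each unused capture is unused on purpose.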

Jul 30, 2009 at 2:22 PM

Hi Cheeso,

Thanks for responding. I must admit I am a newbie when it comes to regex. I don't completely understand how rewriting works. I copied parts of the file from different places on the internet, trying to see which one worked.

Right now, on my site indiagrow.in, friendly URLs show up like http://indiagrow.in/50/technology/airtel-slashes-broadband-rates.html

As I mentioned earlier, my permalink structure is: /%post_id%/%category%/%postname%.html

The problem is that none of the category pages or tag searches work.

Problem Example 1: When I click on one of the categories at the top level of my site, a URL like http://indiagrow.in/category/economy is displayed in the address bar, but the site displays the home page, not just the articles related to the economy.

Problem Example 2: When I click on a tag, say "middle class", a URL like http://indiagrow.in/tag/middle-class is displayed in the address bar, but again the content is the home page, not search results for the tag "middle class".

Thanks

DaCoder
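Running the two problem URLs through the first four rules of DaCoder's ini file, in a simplified first-match model (top to bottom, first matching rule wins, no other IIRF behavior modeled), suggests where they go. This is a sketch, not a substitute for the IIRF log.

```python
import re

# The first four RewriteRules from DaCoder's ini file, in file order.
RULES = [
    (r"^/wp-admin/([^/]+)$", r"/wp-admin/\1"),
    (r"^/([^/]+)/([^/]+)/([^/]+).html$", r"/index.php?p=\1"),
    (r"^/comments/feed", "/index.php?feed=comments-rss2"),
    (r"^/([^/]+)/([^/]+)$", r"/index.php?page_id=\1"),
]


def first_match(url):
    """Rewrite with the first matching rule (simplified model of rule order)."""
    for pattern, replacement in RULES:
        m = re.match(pattern, url)
        if m:
            return m.expand(replacement)
    return url


for url in ["/category/economy", "/tag/middle-class"]:
    print(url, "->", first_match(url))
```

Both URLs land on the generic two-segment rule, so WordPress would receive page_id=category or page_id=tag rather than a category or tag query, which may explain why the home page comes back. The log would confirm or refute this.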

Coordinator
Jul 30, 2009 at 4:49 PM

Turn on logging and check the log messages to figure out why your rules behave the way they do. The log may seem hard to decipher at first, but if you spend just a few moments looking, you will see the incoming URLs and the rewrite actions applied to them. You'll see which rules match and which do not. You will see the patterns that get replaced. It's all very enlightening.

Also check the doc - there are good examples for how to do what you want. 

Aug 1, 2009 at 6:24 AM

Hi

Do you think you could put out a code sample for WordPress? The one from RumblePup doesn't work for me.

I have not been able to get a log file since I have to contact the hosting company to know the path to the folder. 

Thanks

DaCoder

Coordinator
Aug 1, 2009 at 8:52 AM

I already did give you some rules. I showed you some specific suggestions. Did you try them? After I sent my suggestions, your next reply was more explanation of what you wanted. Then I made some more suggestions, and you said "can you write the rules for me?" and "the ini file from Rumblepup isn't working for me." So far, I don't see that you've tried anything I suggested. ???

I don't have WordPress. I can't solve this for you. You have WordPress, though! And you understand the URL structure: the URLs you have, and the URLs you would like to have. If you put some effort in, I am confident that you will be able to figure it out yourself. You can read the doc, understand the tool, check the existing examples, run some tests, iterate toward a solution, and come up with something that works for you. You need to put some effort in. Show me that you are actually using any of my suggestions, and I will be willing to give you more help.

Another suggestion: if you are running IIRF on a hosted server, you should develop and verify the rewrite rules on your local machine before deploying them to the hosted server. Install IIRF on your own machine, and figure out which URLs you want to rewrite and how you expect them to look after the rewrite. Then write the rules, starting with what you have above. Run your URLs through the filter. Check the log, and see if it behaved the way you wanted. With your simple requirements (no evaluation of server variables), you could also do this with the testdriver, without ever installing IIRF on a web server. You can test variants of rulesets in about 3 seconds: edit the rule in a text editor, run the testdriver, examine the results, repeat. The total cycle time is 20 seconds, max. It's not difficult. The trick is that you have to read the doc. You have to spend some of your own time.

Check the doc, it is all explained in there.   

With a little effort on your part you will get it.  

Aug 12, 2009 at 5:19 PM

Do you know if anyone has tried to get this to work on Windows 2003 / IIS 6 with the WP Super Cache plugin for WordPress?

The rewrite rules for apache would be:

-----------------.htaccess-----------------
RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{HTTP_user_agent} !^.*(2.0\ MMP|240x320|AvantGo|BlackBerry|Blazer|Cellphone|Danger|DoCoMo|Elaine/3.0|EudoraWeb|hiptop|IEMobile|iPhone|iPod|KYOCERA/WX310K|LG/U990|MIDP-2.0|MMEF20|MOT-V|NetFront|Newt|Nintendo\ Wii|Nitro|Nokia|Opera\ Mini|Palm|Playstation\ Portable|portalmmm|Proxinet|ProxiNet|SHARP-TQ-GX10|Small|SonyEricsson|Symbian\ OS|SymbianOS|TS21i-10|UP.Browser|UP.Link|Windows\ CE|WinWAP).*
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]

RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{QUERY_STRING} !.*attachment_id=.*
RewriteCond %{HTTP_COOKIE} !^.*(comment_author_|wordpress|wp-postpass_).*$
RewriteCond %{HTTP_user_agent} !^.*(2.0\ MMP|240x320|AvantGo|BlackBerry|Blazer|Cellphone|Danger|DoCoMo|Elaine/3.0|EudoraWeb|hiptop|IEMobile|iPhone|iPod|KYOCERA/WX310K|LG/U990|MIDP-2.0|MMEF20|MOT-V|NetFront|Newt|Nintendo\ Wii|Nitro|Nokia|Opera\ Mini|Palm|Playstation\ Portable|portalmmm|Proxinet|ProxiNet|SHARP-TQ-GX10|Small|SonyEricsson|Symbian\ OS|SymbianOS|TS21i-10|UP.Browser|UP.Link|Windows\ CE|WinWAP).*
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
-----------------.htaccess-----------------
Coordinator
Aug 12, 2009 at 8:25 PM

I don't know if it's been done, but the IIRF conversion looks like this. Basically, replace each ! prefix on a RewriteCond pattern with a negative lookahead, (?!...), and move any leading caret out in front of the lookahead.

-----------------.htaccess-----------------
RewriteEngine On
RewriteBase /

RewriteCond %{REQUEST_METHOD} ^(?!POST$)
RewriteCond %{QUERY_STRING} ^(?!.*=)
RewriteCond %{HTTP_COOKIE} ^(?!.*(comment_author_|wordpress|wp-postpass_))
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{HTTP_user_agent} ^(?!.*(2.0\ MMP|240x320|AvantGo|BlackBerry|Blazer|Cellphone|Danger|DoCoMo|Elaine/3.0|EudoraWeb|hiptop|IEMobile|iPhone|iPod|KYOCERA/WX310K|LG/U990|MIDP-2.0|MMEF20|MOT-V|NetFront|Newt|Nintendo\ Wii|Nitro|Nokia|Opera\ Mini|Palm|Playstation\ Portable|portalmmm|Proxinet|ProxiNet|SHARP-TQ-GX10|Small|SonyEricsson|Symbian\ OS|SymbianOS|TS21i-10|UP.Browser|UP.Link|Windows\ CE|WinWAP).*)
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html.gz [L]

RewriteCond %{REQUEST_METHOD} ^(?!POST$)
RewriteCond %{QUERY_STRING} ^(?!.*=)
RewriteCond %{QUERY_STRING} ^(?!.*attachment_id=)
RewriteCond %{HTTP_COOKIE} ^(?!.*(comment_author_|wordpress|wp-postpass_))
RewriteCond %{HTTP_user_agent} ^(?!.*(2.0\ MMP|240x320|AvantGo|BlackBerry|Blazer|Cellphone|Danger|DoCoMo|Elaine/3.0|EudoraWeb|hiptop|IEMobile|iPhone|iPod|KYOCERA/WX310K|LG/U990|MIDP-2.0|MMEF20|MOT-V|NetFront|Newt|Nintendo\ Wii|Nitro|Nokia|Opera\ Mini|Palm|Playstation\ Portable|portalmmm|Proxinet|ProxiNet|SHARP-TQ-GX10|Small|SonyEricsson|Symbian\ OS|SymbianOS|TS21i-10|UP.Browser|UP.Link|Windows\ CE|WinWAP).*)
RewriteCond %{DOCUMENT_ROOT}/wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html -f
RewriteRule ^(.*) /wp-content/cache/supercache/%{HTTP_HOST}/$1/index.html [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
-----------------.htaccess-----------------
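Translating an Apache "!pattern" into a negative lookahead is easy to get subtly wrong, so it is worth testing each converted pattern. The Python sketch below assumes that, as in Apache, a RewriteCond pattern counts as a match if it matches anywhere in the tested value (modeled with re.search), and that Python's lookaheads behave like PCRE's for these patterns. It shows that an unanchored lookahead matches almost any string, that a $ immediately after the lookahead admits only an empty string, and that anchoring with ^ gives the intended "does not contain" test.

```python
import re


def cond_matches(pattern, value):
    """True if the pattern matches anywhere in value (like an unanchored RewriteCond)."""
    return re.search(pattern, value) is not None


COOKIE_KEYS = r"(comment_author_|wordpress|wp-postpass_)"

# Unanchored lookaheads find *some* position where they succeed.
print(cond_matches(r"(?!=POST)", "POST"))   # True, even for a POST
print(cond_matches(r"(?!.*=.*)", "p=123"))  # True, even with "=" present

# A "$" right after the lookahead only allows the empty string.
print(cond_matches(r"^(?!.*" + COOKIE_KEYS + r".*)$", "foo=bar"))  # False

# Anchored at the start, each lookahead tests the whole value.
print(cond_matches(r"^(?!POST$)", "POST"), cond_matches(r"^(?!POST$)", "GET"))  # False True
print(cond_matches(r"^(?!.*=)", "p=123"), cond_matches(r"^(?!.*=)", ""))        # False True
print(cond_matches(r"^(?!.*" + COOKIE_KEYS + ")", "foo=bar"))                   # True
print(cond_matches(r"^(?!.*" + COOKIE_KEYS + ")", "wordpress_logged_in=1"))     # False
```

The same kind of check applies to the attachment_id and user-agent conditions before deploying the rules.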