IIRF not playing nicely with query string?

Feb 12, 2011 at 12:07 AM
Edited Feb 12, 2011 at 12:07 AM

First off, my iirf.ini file:

 

# Vital components of your .htaccess file
RewriteEngine On
RewriteBase /

# The Friendly URLs part

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

RewriteLog /LOGfile.txt
RewriteLogLevel 2

 

This is the file that my CMS supplies as its .htaccess (I only added the logging). Pages are output to the browser as [site_root]/page1, while the CMS needs /index.php?q=page1. The rewrite rule works for such pages (on the surface; I'll get to that shortly), but fails when the URL has a query string. For example, /blog/?page=1 ought to be rewritten to /index.php?q=blog/&page=1, but I get a 404 error instead.

I looked at the logs and noticed some odd-looking behaviour:

 

Fri Feb 11 19:41:31 -  1144 - DoRewrites: Url (no decoding): '/blog/?page=1'
Fri Feb 11 19:41:31 -  1144 - EvaluateRules: Last Rule
Fri Feb 11 19:41:31 -  1144 - DoRewrites: Rewrite Url to: '/index.php?q=blog/?page=1&page=1'
Fri Feb 11 19:41:31 -  1144 - DoRewrites: Url (no decoding): '/index.php?q=blog/?page=1&page=1'
Fri Feb 11 19:41:31 -  1144 - EvaluateRules: Last Rule
Fri Feb 11 19:41:31 -  1144 - DoRewrites: Rewrite Url to: '/index.php?q=index.php?q=blog/?page=1&page=1&q=blog/?page=1&page=1'

 

Interestingly, even the pages that load properly instead of returning a 404 error appear to be rewritten twice:

 

Fri Feb 11 19:51:22 -  3684 - DoRewrites: Url (no decoding): '/blog/'
Fri Feb 11 19:51:22 -  3684 - EvaluateRules: Last Rule
Fri Feb 11 19:51:22 -  3684 - DoRewrites: Rewrite Url to: '/index.php?q=blog/'
Fri Feb 11 19:51:22 -  3684 - DoRewrites: Url (no decoding): '/index.php?q=blog/'
Fri Feb 11 19:51:22 -  3684 - EvaluateRules: Last Rule
Fri Feb 11 19:51:22 -  3684 - DoRewrites: Rewrite Url to: '/index.php?q=index.php?q=blog/&q=blog/'

 

This doesn't seem like normal behavior to me, but I'm not familiar enough with mod_rewrite to figure it out. Someone I worked with who has used mod_rewrite agreed that something odd was happening. From what little I understand, I was expecting to see something in the log along these lines:

 

Fri Feb 11 -  3684 - DoRewrites: Url (no decoding): '/blog/?page=1'
Fri Feb 11 -  3684 - EvaluateRules: Last Rule
Fri Feb 11 -  3684 - DoRewrites: Rewrite Url to: '/index.php?q=blog/&page=1'
Fri Feb 11 -  3684 - DoRewrites: Url (no decoding): '/index.php?q=blog/&page=1'

 

Is this an issue with IIRF? I'm not trying to do anything crazy here; it seems that this ought to work.

Coordinator
Feb 20, 2011 at 5:06 PM

Well, let's think about what you're doing.

Your URL pattern is ^/(.*)$ . In English, this means "any string that starts with / , followed by zero or more of anything, followed by the end of the line." In other words, it matches any URL, and because of your use of parentheses, it stuffs everything after the leading slash, query string and all, into the first capture group.

Now, the replacement string is index.php?q=$1 , which says: rewrite to index.php?q=SOMETHING, where SOMETHING is replaced with the first capture group. From above we know this is the whole URL after the leading slash, query string and all. Additionally, you use the QSA modifier, which says to append any query string present on the original URL to the rewritten URL.

When you pass input like '/blog/?page=1' into that rule, the rule fires. The first capture group is everything after the leading slash, query string and all: 'blog/?page=1' . Therefore, the rewrite produces '/index.php?q=blog/?page=1' . Then the QSA modifier is applied, and the query string gets appended (again). The original URL had a query string of 'page=1', resulting in '/index.php?q=blog/?page=1&page=1' , which is what your log shows.
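
The doubling can be reproduced outside IIRF. Here is a minimal Python sketch that imitates the rule. Note that simulate_rule is a hypothetical helper, not part of IIRF, and its QSA step only approximates what the modifier does (append the original query string).

```python
import re

def simulate_rule(url, pattern, replacement_prefix):
    """Imitate one rewrite pass with a QSA-style append (illustration only)."""
    match = re.match(pattern, url)
    if match is None:
        return url  # the rule does not fire
    # Substitute the first capture group into the replacement string.
    rewritten = replacement_prefix + match.group(1)
    # QSA-style step: append the original query string, if there is one.
    _, sep, query = url.partition("?")
    if sep and query:
        rewritten += ("&" if "?" in rewritten else "?") + query
    return rewritten

print(simulate_rule("/blog/?page=1", r"^/(.*)$", "/index.php?q="))
# /index.php?q=blog/?page=1&page=1
```

Because (.*) swallows the query string and the QSA step adds it again, page=1 shows up twice, as in the log above.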


So I think the behavior you see is expected behavior.  If you don't want the query string in there twice, don't capture it.  In other words, use a rule like this:

RewriteRule ^/([^?]*) /info.aspx?q=$1 [L,QSA]

What this says is: match any URL, but capture only up to (and not including) the first question mark. The question mark denotes the boundary between the URL path and the query string. So in this rule $1 gets the URL path only, and never captures the query string. The replacement string puts the URL path into query string param q, and then QSA appends any additional query string params.
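
To see what the narrower capture group does, here is a small Python sketch of the same idea. The helper rewrite_path_only is hypothetical, and the /index.php target is an assumption borrowed from the original rule; the QSA step is only approximated by appending the original query string.

```python
import re

# Hypothetical stand-in for the rule above: capture the path up to the
# first question mark, and nothing after it.
PATH_ONLY = re.compile(r"^/([^?]*)")

def rewrite_path_only(url):
    """Capture only the path, then append the original query string (QSA-like)."""
    match = PATH_ONLY.match(url)
    if match is None:
        return url  # the rule does not fire
    rewritten = "/index.php?q=" + match.group(1)
    # QSA-style step: append the original query string, if there is one.
    _, sep, query = url.partition("?")
    if sep and query:
        rewritten += "&" + query
    return rewritten

print(rewrite_path_only("/blog/?page=1"))  # /index.php?q=blog/&page=1
print(rewrite_path_only("/blog/"))         # /index.php?q=blog/
```

With the query string kept out of $1, page=1 appears only once in the result.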

I think that is what you are expecting to do. 

Your original rule would work fine, if you passed in only URLs that had no question mark.

 

Feb 20, 2011 at 10:15 PM

Cheeso, thanks. I understand exactly what you are saying, and it makes sense for the 404 links. With some tweaking of your proposed rule, I seem to have it working. The rule ended up as:

RewriteRule ^([^?]*) index.php?q=$1 [L,QSA]

I still see the URLs being rewritten twice in the log, not sure if this is normal or not. But things appear to be working on the user-end at least. See what I mean in this log snippet:

Sun Feb 20 18:09:08 -  3792 - DoRewrites: Url (no decoding): '/blog/'
Sun Feb 20 18:09:08 -  3792 - EvaluateRules: Last Rule
Sun Feb 20 18:09:08 -  3792 - DoRewrites: Rewrite Url to: '/index.php?q=blog/'
Sun Feb 20 18:09:08 -  3792 - DoRewrites: Url (no decoding): '/index.php?q=blog/'
Sun Feb 20 18:09:08 -  3792 - EvaluateRules: Last Rule
Sun Feb 20 18:09:08 -  3792 - DoRewrites: Rewrite Url to: '/index.php?q=index.php&q=blog/'

And then it does the same with the next URL, and so on. Thanks for at least helping get it working! If you can tell me whether the behaviour I see in the log is normal, that'd be great. I thought that the L meant that it was finished with the URL?

Coordinator
Feb 21, 2011 at 4:09 AM

Each line in your log that reads "DoRewrites: Url (no decoding): ...." indicates a new URL arriving into the IIRF filter, probably from a browser.

So, what I see in your log is a single URL that arrives, "/blog/", and it gets rewritten to /index.php?q=blog/ . At some point later, IIRF receives another incoming URL request, with the same value that the prior URL was rewritten to.

To figure out what is happening, it might be helpful to look at a trace from Fiddler2, or something similar.  This would show whether the browser is actually requesting the URL twice in succession.

Regardless of whether the browser is requesting the URL twice, you will want to filter out index.php from the URLs that you rewrite. I'm not sure whether the RewriteCond statements are doing that in your case; you may want to test. You can also explicitly avoid index.php via a negative lookahead, e.g. by putting (?!index.php) at the front of your regex for the RewriteRule.

Doing this may avoid the double-handling you see, but you shouldn't use this change as a way to "fix" the problem. I'm telling you you're seeing 2 requests, and you need to understand why.  If you change the rule to avoid handling the 2nd request, it will make things better but it doesn't avoid generating the 2 requests, when you expect only one.
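
The negative lookahead mentioned above can be tried out in isolation. This is a rough Python sketch, not IIRF configuration: captured_path is a hypothetical helper, the pattern assumes the leading slash is part of the match, and the dot is escaped here so that it matches a literal '.' only.

```python
import re

# Hypothetical pattern: the path-only capture with a negative lookahead in
# front, so that URLs starting with /index.php are left alone.
SKIP_INDEX = re.compile(r"^/(?!index\.php)([^?]*)")

def captured_path(url):
    """Return the captured path, or None when the pattern does not match."""
    match = SKIP_INDEX.match(url)
    return match.group(1) if match else None

for url in ("/blog/?page=1", "/index.php?q=blog/"):
    print(url, "->", captured_path(url))
# /blog/?page=1 -> blog/
# /index.php?q=blog/ -> None
```

The second URL no longer matches, so the already rewritten request would pass through untouched. This only hides the second request from the rule; it does not explain where that request comes from.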

good luck!

 

Feb 21, 2011 at 10:44 PM

That's what I figured, I thought it was 2 requests. Sounds like it isn't an IIRF thing, I'll look into it.  Thanks again!

Coordinator
Feb 21, 2011 at 11:32 PM

Once I had a person describe a problem where they were trying to rewrite, but there was a redirect happening and they couldn't figure out why. The symptom was similar to yours: the rewrite happened correctly, according to the IIRF log file, but then a stray redirect or re-request caused a second URL request to arrive.

After some investigation they determined that they had installed IIRF twice, and had 2 separate configurations, one of them had a rewrite, and the other had a redirect.

I don't believe this is happening in your case because with the current IIRF, for any vdir, there is only one possible IIRF.ini. (The problem I described above occurred with an earlier version of IIRF where configuration was stored outside the vdir.) But the 2nd request could be caused by a separate tool or configuration setting in IIS, or something odd in the browser or in a proxy server between the requesting browser and IIS.

But yes, almost definitely not IIRF.  If it were IIRF then you would see a log message indicating a redirect had occurred, before that 2nd request arrived.

good luck.