8 cycles limit because a bad rewritecond test

Topics: User Forum
Oct 14, 2010 at 3:15 PM

Hi Cheeso and all other fellows.

I have installed the last version of IIRF to find what may be a bug. I know that after each rewrite the result is passed again through all the filters as it must be. But as you can see from the log the loop does not stop where it might do and continue processing until the 8 cycle limit is reached producing a wrong result.

Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: ini file timestamp: 2010/10/14 15:38:21 Hora de verano romance
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: cfg(0x02145C10)
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: LogLevel = 3
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(28): RewriteEngine will be enabled.

The rules are the following:

Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(44): RewriteRule (rule 1)  '^/blog/(.*)$'  '/$1'     [NC]
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(48): RewriteCond   %{REQUEST_URI}  !^/plugins/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(49): RewriteCond   %{REQUEST_URI}  !^/images/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(50): RewriteCond   %{REQUEST_URI}  !^/media/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(51): RewriteCond   %{REQUEST_URI}  !^/includes/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(52): RewriteCond   %{REQUEST_URI}  !^/templates/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(53): RewriteCond   %{REQUEST_URI}  !^/components/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(54): RewriteCond   %{REQUEST_URI}  !^/administrator/ '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(55): RewriteCond   %{REQUEST_URI}  !^/index.php '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(56): RewriteRule (rule 2)  '^/(.*)'  '/index.php/$1'   (null)
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(58): RewriteCond   %{REQUEST_FILENAME}  !-f '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(59): RewriteCond   %{REQUEST_FILENAME}  !-d '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(60): RewriteCond   %{REQUEST_URI}  !^/index.php '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(61): RewriteCond   %{REQUEST_URI}  (/|\.php|\.html|\.htm|\.feed|\.pdf|\.raw|/[^.]*)$ '[NC]'
Thu Oct 14 15:38:38 -  4540 - ParseCondModifierFlags: '[NC]'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(62): RewriteRule (rule 3)  '(.*)'  'index.php'      [L]
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(74): RewriteCond   %{QUERY_STRING}  mosConfig_[a-zA-Z_]{1,21}(=|\%3D) '[OR]'
Thu Oct 14 15:38:38 -  4540 - ParseCondModifierFlags: '[OR]'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(76): RewriteCond   %{QUERY_STRING}  base64_encode.*\(.*\) '[OR]'
Thu Oct 14 15:38:38 -  4540 - ParseCondModifierFlags: '[OR]'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(78): RewriteCond   %{QUERY_STRING}  (\<|%3C).*script.*(\>|%3E) '[NC,OR]'
Thu Oct 14 15:38:38 -  4540 - ParseCondModifierFlags: '[NC,OR]'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(80): RewriteCond   %{QUERY_STRING}  GLOBALS(=|\[|\%[0-9A-Z]{0,2}) '[OR]'
Thu Oct 14 15:38:38 -  4540 - ParseCondModifierFlags: '[OR]'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(82): RewriteCond   %{QUERY_STRING}  _REQUEST(=|\[|\%[0-9A-Z]{0,2}) '(null)'
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: C:\Web\Doymer_pub\Iirf.ini(84): RewriteRule (rule 4)  '^(.*)$'  'index.php'      [F]
Thu Oct 14 15:38:38 -  4540 - ReadVdirConfig: Done reading INI for the root vdir, found 4 rules (0 errors, 0 warnings) on 87 lines
Thu Oct 14 15:38:38 -  4540 - HttpFilterProc: SF_NOTIFY_URL_MAP
Thu Oct 14 15:38:38 -  4540 - HttpFilterProc: cfg= 0x02145C10
Thu Oct 14 15:38:38 -  4540 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Thu Oct 14 15:38:38 -  4540 - DoRewrites


First loop. This one is correctly executed:

Thu Oct 14 15:38:38 -  4540 - DoRewrites: Url (no decoding): '/blog/es/blog/4-joomla/63-mod-arttree'
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: depth=0
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: no RewriteBase
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 1: 2 match
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Result (length 32): /es/blog/4-joomla/63-mod-arttree    <- This is CORRECT

Second loop now with the former modified URI /es/blog/4-joomla/63-mod-arttree


Thu Oct 14 15:38:38 -  4540 - EvaluateRules: depth=1
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: no RewriteBase
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 1: -1 (No match)           <---- This is CORRECT. URI now does NOT starts with /blog/
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 2: 2 matches
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 2: evaluating condition
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/plugins/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/images/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/media/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/includes/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/templates/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/components/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/administrator/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/index.php) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalConditionList: rule 2, TRUE, Rule will apply                                                       <--------- This is CORRECT, URI does NOT starts with any of defined keywords
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Result (length 42): /index.php/es/blog/4-joomla/63-mod-arttree

Third loop now with the former modified URI /index.php/es/blog/4-joomla/63-mod-arttree

Thu Oct 14 15:38:38 -  4540 - EvaluateRules: depth=2
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: no RewriteBase
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 1: -1 (No match)          <---- Again CORRECT
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 2: 2 matches
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 2: evaluating condition
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/plugins/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/images/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/media/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/includes/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/templates/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/components/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/administrator/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/index.php) => TRUE         <------- this is WRONG. The URI already starts with /index.php
Thu Oct 14 15:38:38 -  4540 - EvalConditionList: rule 2, TRUE, Rule will apply
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Result (length 52): /index.php/index.php/es/blog/4-joomla/63-mod-arttree   <------ of course this is NOT CORRECT either
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: depth=3
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: no RewriteBase
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 1: -1 (No match)
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 2: 2 matches
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Rule 2: evaluating condition
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/plugins/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/images/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/media/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/includes/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/templates/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/components/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/administrator/) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalCondition: Cond t(%{REQUEST_URI}) op(|) p(!^/index.php) => TRUE
Thu Oct 14 15:38:38 -  4540 - EvalConditionList: rule 2, TRUE, Rule will apply
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: Result (length 62): /index.php/index.php/index.php/es/blog/4-joomla/63-mod-arttree
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: depth=4
Thu Oct 14 15:38:38 -  4540 - EvaluateRules: no RewriteBase

and so on until level 8 is reached. Maybe my rules are the culprit but I do not know why. I assume that REQUEST_URI is updated every time a rewrite rule is executed.

Any ideas?

Thanks and Regards

Coordinator
Oct 14, 2010 at 9:29 PM

This is a bug in your rule, or maybe a misunderstanding on your part of your own rule.  It is not a bug in IIRF.

> I assume that REQUEST_URI is updated every time a rewrite rule is executed.

This is not a correct assumption.  The RewriteCond directives that you use test a server variable (REQUEST_URI) that does not get changed each time a rule fires.  Because of this the RewriteCond evaluates to true, and rule #2 fires repeatedly.

There are two ways to avoid the problem.  #A: rather than using RewriteCond directives, insert into your RewriteRule #2, regex exclusions such as (?!index.php).  #B: use in your RewriteCond directives, the value $1 in place of %{REQUEST_URI}.  In this case you need to be careful about what you are capturing.  The rule as you have it written would not capture the leading slash in the incoming URL. 

Your question would be a good example for me to include in the documentation.  

 

Oct 14, 2010 at 10:49 PM
Edited Oct 14, 2010 at 11:00 PM

Hi Cheeso.

Yes, that was the other possibility I considered in its moment but, as the second iteration seemed to start with the modified result from the first one (the '/blog/' prefix is NOT found anymore) I ‘archived’ it in the trash bin.

Expecting that REQUEST_URI might not be updated by default may be the problem I tried to use the PT modifier but this is not supported right now by IIRF. I do not know if PT (in case it would be active) would achieve the expected result anyway but I tried it just in case.

I understand what you propose and I will try #2. I did not dare to try it by myself because I did not find any referral to that use of $ variables in the docs both yours or apache’s. Perhaps I could have been a bit more aggressive with that. You are suggesting that in each iteration the former variables are kept and injected in the process. Is this correct? How are them named, the same? How are then named variables in sucessive iterations?

So that the (reduced) ruleset to should be something like:

RewriteRule ^/blog/(.*)$ /$1 [NC] (input: /blog/es/blog/.... -> output: /es/blog/...)
RewriteCond /$1 !^/plugins/  (input: /es/blog/... <- NOT a match)
RewriteCond /$1 !^/images/
RewriteCond /$1 !^/index.php <---- This SHOULD match this time.....
RewriteRule ^/($1) or what? /index.php/$2 or what?

 

Anyway this arises another question to me; Why is not '/blog/' found again in the second iteration with the ruleset I already have? If REQUEST_URI is static and unmodified we *should* have got another match the second time too. It is the same case or, at least, it looks alike. For sure that you, as one that knows the inners of the rewriting trick, knows why. 

Out of context: if I activate logging I cannot delete the file unless I restart the IIS server. Is the file handle not closed each time?. Being able to delete the file is a good thing because it can grow really fast..

Thanks for your help.

 

Coordinator
Oct 14, 2010 at 10:59 PM

On your out-of-context question - yes, IIRF keeps the logfile open.  Adding a fopen/fclose() to each log transaction would make the log file much more costly.

Regarding "why is not /blog found again in the second iteration?"  I don't understand the question.  In the first pass evaluation of rules, the first RewriteRule looks for URLs that begin with /blog.  If the rule fires it replaces the URL being considered with whatever follows /blog.  A second iteration of the rules follows.  When the first rule is again evaluated, the URL being considered no longer begins with /blog. 

The URL gets updated with each iteration.  The Server variable REQUEST_URI, doesn't get updated, implicitly, when you update the URL.

 

Oct 14, 2010 at 11:14 PM

Sorry. You are quite fast and your reply crossed with my updating of the former message, something that I should probably have not done. Please refer to it to avoid repeating it here.

Yes. I see it. The first rule is NOT using REQUEST_URI to make the susbtitution so whatever I substitue is delivered to the following phase. As I am not using any server variable I assume that this is somehow an internal one you initialize with the URI the first time the filter is executed. Can we access and use this variable? Does it has any name to use it in succesive iterations?.

That would terminate all my problems (I think) not needing complicated uses of $ variables that may make more harm than good.

Out-of-context reply: I agree that it would slow down the processing but at least I suggest a switch to force closing. Keep in mind that the need to delete the file only happen when actively debugging a ruleset not before. So even if someone want a permenent log file he/she could disable the closing of the file avoiding the unnecessary overhead but will allow a controlled debugging of small files not having to deal with hundreds of lines. Something like a LogForceClose command defaulting to false if not present. Or a RotateLogs command that, if present, would close the file each time and open a new one.

Coordinator
Oct 14, 2010 at 11:32 PM

> As I am not using any server variable I assume that this is somehow an internal one you initialize with the URI the first time the filter is executed.

Yes, correct.  You can access the variable, in any rule or condition, as $0 .

I don't know what you mean by "complicated uses of $ variables".  I suppose you are not comfortable with the idea behind them, but numbered capture fields are standard in most regex packages.  In the context of IIRF, the 0th field ($0) represents the entire input subject.  The 1st field ($1) represents the first capture group, and so on.  IIRF is consistent with the rest of the regex world in this respect.   It might feel "simpler" for you to reference the server variable, but that gets updated only upon completion of the IIRF rule-processing iterations.  This is done for efficiency purposes, to avoid an ISAPI api call for each positive rule match.  A single URL might reasonly undergo 10 or more transformations, which would imply numerous redundant system calls, and a consequent drop in performance.  Considering that, IIRF updates the server variable once, after all processing is complete for a given URL.

on the second question - I like your idea of rotating log files, and I think that would be useful in normal operation, not just for debugging.  I am hesitant to add a new feature to IIRF to force the open/close of the logfile for each log transaction, specifically to allow users performing debugging to avoid handing log files that contain hundreds of lines.  Handling files that contain hundreds of lines is a problem that a good text editor solves quite readily, or you could use a unix-style utility like "more".   In short, I think there are better ways to address your problem.  I will consider the log rollover idea, though.

 

Coordinator
Oct 14, 2010 at 11:34 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Oct 15, 2010 at 12:10 AM

Believe me Cheeso. Even with editors help (I am using notepad++) it is really a pain to find/follow one session in a 400K+ bytes log file. It is a lot more easy to have a one-file-one-session mechanism.

But I take your point. Only you know the overhead cost that opening and closing files will have in overall processing of iirf.

And regarding the point of being comfortable or not about using $ variables, it is not a matter of comfort; it is a matter of what I point to in my first reply: how are variables named in second and subsequent iterations when they are somehow nested?

Using the initial case we have:

RewriteRule ^/blog/(.*)$ /$1 [NC] (input: /blog/es/blog/.... -> output: /es/blog/...)  Here $1 equals blog/es/..., depth_0 iteration finishes here because a rewrite. Start depth_1 iter. This line does NOT matches this time
RewriteCond /$1 !^/plugins/  (input: $1 = /es/blog/... <- NOT a match)      Assuming that $1 has the same value in depth_1 that it had in depth_0. Is this true?
RewriteCond /$1 !^/images/
RewriteCond /$1 !^/index.php <---- This SHOULD match this time
RewriteRule ^/(.*) <or should this be ^/($1)>  /index.php/what?   <---- What do we use in the rewrite rule? Is .* the same as $1 (most probably not)

Understand what I am refering to? Or am I totally lost?

With your $0 variable this concatenated rules could be written this way:

RewriteRule ^/blog/(.*)$ /$1 [NC] (input: /blog/es/blog/.... -> output: /es/blog/...)
RewriteCond /$0 !^/plugins/  (input: $0 = /es/blog/... <- NOT a match)
RewriteCond /$0 !^/images/
RewriteCond /$0 !^/index.php
RewriteRule ^/(.*)  /index.php/$1 <---- This time .* is equal to $0, right? $1 is, now, local to the procedure so no naming problems here, different to the former depth_0 $1

Is this correct?

Regards. 

 

Coordinator
Oct 15, 2010 at 1:12 PM

You almost have it. 

$0, $1, and so on, should generally not be used in a regex pattern, but rather in the replacement pattern in a RewriteRule. 

The formal name for these things are "back references", but in more detail, in regex parlance they are references to the captured subpatterns of the regex that has just been evaluated.  You can use back references in the regex pattern, but they refer to prior captured subpatterns, and according to the PCRE documentation (PCRE is the regex engine used by IIRF), the syntax for use within a regex pattern is \1, \2, and so on.  A regex pattern like ([a-z]{4}).+?/1 will match strings like "abracadabra", with the \1 referring to the subpattern "abra".  You can never use \0 in a regex pattern, because it refers to "the entire subject string", and a subpattern will never match the entire subject string, except in the degenerate case.  If you don't understand this, read the PCRE documentation on back references.

As described in the IIRF documentation, these symbols can also be used in the RewriteCond, because the RewriteCond, though it appears before the RewriteRule (RedirectRule, ProxyPass...) in the ini file, is actually evaluated after the RewriteRule (etc) during runtime.  So the $0, $1, etc have a defined value at the time the RewriteCond is evaluated. Check the documentation for "back references" for more detail on this.

Your use of a form like /$0 in a RewriteCond seems wrong; the $0 refers to the entire subject string, including the leading slash, if any.  So if you want the refer to the URL in a RewriteCond, try just $0, with no leading slash.  

Now that I've written all this stuff, I have regrets. I am basically just quickly re-explaining what I have already written in the IIRF documentation.  I recommend that instead of asking me how regex works here, you read the IIRF documentation again, especially the parts on back references; I've explained it all in there, I promise.  If it's not clear there, cite the section and I will enhance the documentation to make it clearer for everyone.   It doesn't make sense for me to repeat descriptions of how regex works here, in parts, with each additional question you ask.  Also beyond the IIRF doc, read the PCRE doc, especially the part on back-references. And finally, run some tests, play around with the testdriver.exe, or other regex tools, and see for yourself how it works. 

Good luck!

 

Oct 15, 2010 at 1:38 PM

Hi Cheeso.

Sorry to have disgruntled you. It was not my intention. My apologizes.

Anyway you have helped a lot and have a very good eye; you are right that the / in /$0 is wrong. Thanks for all your time and interest.

Best Regards.

Coordinator
Oct 15, 2010 at 2:04 PM

I appreciate your concern.  I'm not disgruntled, I just think it's a bad idea for me to do basic explanations here in the forums when the documentation has (should have) all the necessary information. 

Anyway, good luck.