Application Pool crashes and PCRE loops

Topics: Developer Forum, User Forum
Jul 8, 2009 at 12:57 AM

I have been troubleshooting this issue for months. And by now, Cheeso must think I'm a total idiot. Thanks to him and others for their help and patience. I'm going to spare all of the gory details that led me here, and try to keep this one simple. Let's just say that I have arrived here after resolving a few other issues that produced this same event log entry, which is logged when my application pool processes (4 of them, in a web garden) crash:

Event Type:	Warning
Event Source:	W3SVC
Event Category:	None
Event ID:	1011
Date:		7/7/2009
Time:		5:34:59 PM
User:		N/A
Computer:	WEB
Description:
A process serving application pool 'dotNET Pool' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '4132'. The data field contains the error number. 

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Data:
0000: 6d 00 07 80               m..€    

 


Here is the URL that is causing a crash. I know this reproduces the error, and I'm not sure if there are others like it. I've not found any yet.

/Artists/9168/Project-Object-with-Ike-Willis%2c-Ed-Mann-and-Don-Preston-(performing-the-music-of-Frank-Zappa)/Articles

Rewrites to:

/artists/articles.aspx?artistID=9168&a=Project-Object-with-Ike-Willis%2c-Ed-Mann-and-Don-Preston-(performing-the-music-of-Frank-Zappa)

 

Here are my rewrite rules in full. They've been fully tested and run fine through the TestDriver. I've run thousands of URLs through the test driver with these rules and not seen any problems. Even the URL in question above goes through the TestDriver and is rewritten as expected.

RewriteLog  C:\IIRF_1.2.15R5\iirf.out
RewriteLogLevel 0

MaxMatchCount 10

IterationLimit 10

################################################################

### CATCH Bogus Urls causing problems ###
RewriteRule   ^/*.+(\%3Cbr\%3E|\%3Cbr|<|>|\%3C\/td|%F7).*/?			/404.aspx?forbidden		[I,NF]
RewriteRule   ^/(?:(articles|artists|fans|profiles))/.+(src|target).+/?$	/404.aspx?invalidLength		[I,NF]
RewriteRule   ^/(?:(fans|profiles))/.{100,}+/?$					/404.aspx?invalidLength		[I,NF]

### REDIRECTS - not Rewrites ###
RedirectRule  ^/addvenue/?$			http://www.domain.com/Venues/AddVenue.aspx			[I,R]
RedirectRule  ^/addfestival/?$			http://www.domain.com/Festivals/AddFestival.aspx		[I,R]
RedirectRule  ^/festivals/festivalguide.aspx	http://www.domain.com/Festivals/				[I,R]
RedirectRule  ^/addshow/?$			http://www.domain.com/Shows/AddShow.aspx			[I,R]
RedirectRule  ^/bugs/?$				http://www.domain.com/About/Contact/Bugs.aspx			[I,R]
RedirectRule  ^/community/?$			http://www.domain.com/Fans					[I,R]
RedirectRule  ^/unsubscribe/?$			http://www.domain.com/Mydomain/Unsubscribe/Unsubscribe.aspx	[I,R]
RedirectRule  ^/(addband|addartist)/?$		http://www.domain.com/Artists/AddArtist.aspx			[I,R]
RedirectRule  ^/(addstory|addarticle)/?$	http://www.domain.com/Articles/AddStory.aspx			[I,R]
RedirectRule  ^/(fix|fixform|corrections)/?$	http://www.domain.com/About/Contact/Fix.aspx			[I,R]
RedirectRule  ^/(?:artists|bands)/(.[^?/\n]+(?<!\.aspx))/?$	http://www.domain.com/Search/Results.aspx?Search=$1&More=ArtSch&us=1		[I,R]

### USER PROFILES ### /profiles/name -> /fans/name 
RewriteRule  ^/profiles/(\w+).aspx(.*)$							/fans/$1.aspx$2			[I,U]
RewriteRule  ^/(?:(profiles|fans))/((([^?/\n]|/[^.?/\n])+)(?<!\.aspx))/?$		/fans/ataglance.aspx?un=$2	[I,U,L]

### ARTISTS ### /artist/id/name/subpage 
RewriteRule  ^/artists/(\d+)/((?>([^?/\n])+)(?<!\.aspx))/?(shows|articles|fans|chatter|bio|links|store)/?$		/artists/$4.aspx?artistID=$1&a=$2	[I,U,L]
RewriteRule  ^/artists/(\d+)/((?>([^?/\n])+)(?<!\.aspx))/?(goods)/?$    						/artists/store.aspx?artistID=$1&a=$2	[I,U,L]
RewriteRule  ^/artists/(\d+)/((?>([^?/\n])+)(?<!\.aspx))/?(forum|forums)/?$    						/artists/chatter.aspx?artistID=$1&a=$2	[I,U,L]
RewriteRule  ^/(?:artists|bands)/(\d+)/((([^?/\n]|/[^.?/\n])+)(?<!\.aspx))/?$		/artists/artist.aspx?artistID=$1&a=$2	[I,U,L]

### ARTICLES ### /articles/id/title/pagenum
RewriteRule  ^/articles/(\d+)/((?>([^?/\n])+)(?<!\.aspx))/(\d+)/? 	/articles/story.aspx?storyID=$1&pagenum=$4	[I,U,L]
RewriteRule  ^/articles/(\d+)/((([^?/\n])+)(?<!\.aspx))/?$		/articles/story.aspx?storyID=$1			[I,U,L]


If I take out the text in parentheses, it works. Sending in this URL:

/Artists/9168/Project-Object-with-Ike-Willis%2c-Ed-Mann-and-Don-Preston/Articles

If I take out the %2c, it works. Sending this URL:

/Artists/9168/Project-Object-with-Ike-Willis-Ed-Mann-and-Don-Preston-(performing-the-music-of-Frank-Zappa)/Articles

With the original URL, I get an App Pool crash and a debug report that looks like this:

 

Error WARNING - DebugDiag was not able to locate debug symbols for IsapiRewrite4.dll, so the information below may be incomplete.

In w3wp__PID__9280__Date__07_06_2009__Time_08_47_31PM__389__Second_Chance_Exception_C00000FD.dmp the assembly instruction at IsapiRewrite4!pcre_exec+12f9 in C:\IIRF1.2.15R5\IsapiRewrite4.dll has caused a stack overflow exception (0xC00000FD) when trying to write to memory location 0x056d2be8 on thread 42

Please follow up with the vendor for C:\IIRF1.2.15R5\IsapiRewrite4.dll

 

Thread 42 - System ID 3688
Entry point   w3tp!THREAD_MANAGER::ThreadManagerThread 
Create time   7/6/2009 6:07:04 PM 
Time spent in user mode   0 Days 0:0:36.656 
Time spent in kernel mode   0 Days 0:0:14.968 

This thread is calling an ISAPI Filter IsapiRewrite4

Function   Source 
IsapiRewrite4!pcre_exec+12f9    
IsapiRewrite4!pcre_exec+1660    
IsapiRewrite4!pcre_exec+2779    
[... the two frames above (pcre_exec+1660, pcre_exec+2779) repeat 94 more times ...]
IsapiRewrite4!pcre_exec+1660    
IsapiRewrite4!pcre_exec+2176    
IsapiRewrite4!pcre_exec+1660    
IsapiRewrite4!pcre_exec+9ff7    
IsapiRewrite4!pcre_exec+1660    
IsapiRewrite4!pcre_exec+f3e    
IsapiRewrite4!TerminateFilter+39ba    

My IIRF Logs look like this, where the next rule, Rule 17, would be the one that matches:

Mon Jul 06 20:49:23 -  6480 - HttpFilterProc: SF_NOTIFY_URL_MAP
Mon Jul 06 20:49:23 -  6480 - OnUrlMap: storing physical path (D:\domain\Artists\9168\Project-Object-with-Ike-Willis,-Ed-Mann-and-Don-Preston-(performing-the-music-of-Frank-Zappa)\Articles), in ptr (0x05b19ff0)
Mon Jul 06 20:49:23 -  6480 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Mon Jul 06 20:49:23 -  6480 - DoRewrites
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree: getting 'url'
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree - no joy (GetLastError()=1413)
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree: 128 bytes
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree: result ''
Mon Jul 06 20:49:23 -  6480 - GetHeader_AutoFree: getting 'url'
Mon Jul 06 20:49:23 -  6480 - GetHeader_AutoFree: 119 bytes, result '/Artists/9168/Project-Object-with-Ike-Willis%2c-Ed-Mann-and-Don-Preston-(performing-the-music-of-Frank-Zappa)/Articles'
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree: getting 'QUERY_STRING'
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree: 1 bytes
Mon Jul 06 20:49:23 -  6480 - GetServerVariable_AutoFree: result ''
Mon Jul 06 20:49:23 -  6480 - GetHeader_AutoFree: getting 'method'
Mon Jul 06 20:49:23 -  6480 - GetHeader_AutoFree: 4 bytes, result 'GET'
Mon Jul 06 20:49:23 -  6480 - DoRewrites: New Url: '/Artists/9168/Project-Object-with-Ike-Willis%2c-Ed-Mann-and-Don-Preston-(performing-the-music-of-Frank-Zappa)/Articles'
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: depth=0
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 1 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 2 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 3 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 4 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 5 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 6 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 7 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 8 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 9 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 10 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 11 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 12 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 13 : -1 (No match)
Mon Jul 06 20:49:23 -  7652 - HttpFilterProc: SF_NOTIFY_LOG
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 14 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 15 : -1 (No match)
Mon Jul 06 20:49:23 -  6480 - EvaluateRules: Rule 16 : -1 (No match)

Coordinator
Jul 8, 2009 at 2:39 PM

And the error happens *every time* with that URL, is that right?

Coordinator
Jul 8, 2009 at 3:00 PM

I reproduced the problem.  I'll let you know.

Jul 8, 2009 at 4:39 PM

Glad to see that I'm not going crazy!

I tried some more URLs after finally getting one that could reproduce the problem.  Here are some that I've tried.

This is a real URL and does NOT cause any problems. /Artists/48752/The-Zone-(CA) 
But because the artist name after the ID is irrelevant, it could be anything. So I've modified it to see if I could repro the problem.

/Artists/48752/The-Zone-(CA)The-%2cZone-(CA)The-%2cZone-(CA)The-%2cZone-(CA)The-%2cZone-(CA)The-%2cZone-(CA)The-%2cZone-(CA)

^ this does repro my problem.

So then I tried this:

/Artists/48752/The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)The-Zone-(CA)

^ this does repro my issue as well (without the %2c in the URL)

And to continue the test of this URL, I tried this one now as well:

/Artists/48752/The-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-ZoneThe-Zone

^ and this does NOT reproduce my issue!


For the record I'm still running under 1.2.15 R5.

Coordinator
Jul 8, 2009 at 4:46 PM
Edited Jul 8, 2009 at 5:12 PM

mase, can you talk me through what you are trying to do with Rule 17?

The sequence ((?>([^?/\n])+) is making me curious.  Are you really intending to do "atomic grouping", as per the PCRE man page, with that ?> sequence?  And do you really need all those parentheses?  Also, what do you intend with the \n?  Is that supposed to be a newline?  I think you can safely assume URLs will not contain newlines! I think what you want, in English, is "a sequence of one or more chars, not including slash or question-mark, and not ending in .aspx".  Is that right?

If that's true, could you replace that rule with this:?

RewriteRule  ^/artists/(\d+)/([^\?/]+)(?<!\.aspx)/?(shows|articles|fans|chatter|bio|links|store)/?$
              /artists/$3.aspx?artistID=$1&a=$2
              [I,U,L]

(the above should be all-on-one-line in the ini file)

What this says, in English, is,

a line beginning with a slash, followed by the word 'Artists', followed by a slash, a series of decimal digits, and another slash. Then, a sequence of one or more characters that does not include a question-mark or a slash, and this sequence must not end in .aspx. Then, optionally, a slash (should this really be optional?), followed by one of the following words: shows, articles, fans, etc. Optionally, there is one more slash.

When I use your original rule with IIRF, I get the pcre endless loop thing.  I know sometimes the crash dump says "ntdll.dll" is the culprit, but when I make the IIRF .pdb file available, DebugDiag can resolve symbols and it is very clear that it is pcre_exec() in an infinite loop.  Your own dump shows that pcre_exec is oscillating between two addresses.  This could happen if one function calls a second function, and the second function calls the first.  With enough of that, a stack overflow will soon occur, or IIS will reap the process due to the safety thresholds.
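To make the "one function calls a second, and the second calls the first" mechanism concrete, here is a toy sketch in Python. It is not PCRE's code, and the function names are made up for illustration. It only shows how a matcher that recurses once per repetition of a group ends up with a call depth that grows with the length of the input.

```python
def match_group(text, pos, depth, stats):
    """Toy stand-in for matching one [^?/] character (hypothetical, not PCRE)."""
    stats["max_depth"] = max(stats["max_depth"], depth)
    if pos < len(text) and text[pos] not in "?/":
        # One character consumed; hand control to the repeat step.
        return match_repeat(text, pos + 1, depth + 1, stats)
    return None


def match_repeat(text, pos, depth, stats):
    """Toy stand-in for the '+': greedily try one more iteration of the group."""
    stats["max_depth"] = max(stats["max_depth"], depth)
    longer = match_group(text, pos, depth + 1, stats)
    return pos if longer is None else longer


def run(text):
    """Return (end position of the match, deepest call depth reached)."""
    stats = {"max_depth": 0}
    end = match_group(text, 0, 1, stats)
    return end, stats["max_depth"]


short = "The-Zone-(CA)"
print(run(short))        # 13 characters
print(run(short * 10))   # 130 characters
```

In this toy, every extra character adds two nested calls. Python reports a RecursionError when the nesting gets too deep; native code has no such guard and overflows the thread's stack instead.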

pcre_exec is a function inside the PCRE library.  So it isn't a bug in the IIRF code, per se, though you could still argue that IIRF is broken.  Today, I am not really interested in debugging the PCRE library, so I looked for a way to avoid this problem.  And by changing the regular expression pattern, it seems to avoid the problem.   

When I use this modified rule - the one I showed above - it just works.

Now, if you ask me, Why does the original rule work in the testdriver.exe, and not in the ISAPI filter? - that is a complete mystery to me. I don't understand that part. I would have bet against THAT happening. But it does happen.  But again, I am not interested really, in debugging the PCRE library.  I have treated it like a black box so far, and I don't wanna open the box.

If the modified rule I suggested is ok, you will probably want to simplify the patterns in the other rules in your ini file, in the same way.  You use that sequence - ((?>([^?/\n])+) - repeatedly in various rules.  If you could swap that out for something simpler, I think you might be on your way.
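For anyone who wants to poke at the simplified pattern outside IIRF, here is a minimal sketch using Python's re module. Python's regex engine is not PCRE, so this says nothing about the crash itself; it only shows what the pattern captures. The rewrite() helper is hypothetical, re.IGNORECASE stands in for the [I] flag, and the [U] and [L] flags are ignored.

```python
import re

# The simplified pattern from the post above; re.IGNORECASE approximates IIRF's [I].
SIMPLIFIED = re.compile(
    r"^/artists/(\d+)/([^\?/]+)(?<!\.aspx)/?"
    r"(shows|articles|fans|chatter|bio|links|store)/?$",
    re.IGNORECASE,
)


def rewrite(url):
    """Hypothetical helper: return the rewritten URL, or None if the rule does not match."""
    m = SIMPLIFIED.match(url)
    if m is None:
        return None
    artist_id, name, page = m.groups()
    # Mirrors the replacement string /artists/$3.aspx?artistID=$1&a=$2
    return f"/artists/{page}.aspx?artistID={artist_id}&a={name}"


crash_url = ("/Artists/9168/Project-Object-with-Ike-Willis%2c-Ed-Mann-and-"
             "Don-Preston-(performing-the-music-of-Frank-Zappa)/Articles")

print(rewrite(crash_url))                     # the URL from the original report
print(rewrite("/artists/1/mystore"))          # no slash before the keyword
print(rewrite("/artists/1/page.aspx/shows"))  # name segment ends in .aspx
```

The second call illustrates the "should this really be optional?" question: because the slash before the keyword is optional, /artists/1/mystore matches with a=my and page store. The third call returns None because of the (?<!\.aspx) lookbehind. Note that $3 keeps the case of the request, so the first call yields /artists/Articles.aspx rather than /artists/articles.aspx.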

 

Coordinator
Jul 8, 2009 at 5:01 PM

I tried your other URLs and they also succeed (no crash) when using my modified rule.

Jul 8, 2009 at 5:28 PM

Cheeso,

You're totally right! Your English translations of my rules are correct.

I have tested your modified rule and confirmed that it works with the URLs in question.

I think the \n was added because I was getting bogus URL requests coming in from spam bots, automated scripts, and malformed links followed by search bots. Is what I'm doing with the Not Found rules at the top, the best way to handle invalid URLs coming in? Do you recommend the UrlScan ISAPI that I've seen others use?

From what I remember, what I was trying to do with ((?>([^?/\n])+) was to quickly fail the match if there was a question mark or new line character. But I think this could be done better by adding the new line characters to the Not Found set of rules at the top.

As you said, unless there seems to be some valid reason to keep the \n, I will need to go through and modify all of my rules so they don't include this.

THANK YOU SO MUCH! This has been a problem for months and I've been at a loss for a long time.  I didn't think it was a problem with my rules because they tested perfectly in the TestDriver program.

I will return here and update you once again after I've modified all my rules and can no longer reproduce my issues.

Coordinator
Jul 8, 2009 at 6:17 PM

What you are doing with the Not Found rules looks mostly reasonable to me.  I think with the ?: you are intending to do a "non-capturing group", is that right?  Not sure why you would use that, when those patterns don't seem worthy of not capturing.   Also I don't get what the pattern in the first rule is doing.   It starts with:  ^/*.+  .  That reads "at the start of the URL, zero or more slashes, followed by one or more of any character."  Remember * is "zero or more".  I think what you want might be ^/.+ .  Or maybe something else.  But I guess you have sample urls that you run through the testdriver to make sure these rules fire properly, and catch what you think they should catch.  
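A quick way to see the difference between ^/*.+ and ^/.+ is to try both on a few strings. This is a sketch in Python's re module rather than PCRE, but the two engines should agree on syntax this basic; the sample strings are invented for illustration.

```python
import re

loose = re.compile(r"^/*.+")   # zero or more leading slashes, then one or more of anything
strict = re.compile(r"^/.+")   # exactly one slash required first, then one or more of anything

samples = ["/artists/9168", "artists/9168", "///x"]

# Map each sample to (matches loose?, matches strict?)
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}
for s, (is_loose, is_strict) in results.items():
    print(f"{s!r}: loose={is_loose} strict={is_strict}")
```

Only the string without a leading slash tells the two apart. Since . also matches a slash, ^/*.+ accepts the same strings as plain ^.+ would, so the /* part adds nothing.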

You might consider UrlScan. It is handy for ruling out sets of URLs.  On the other hand, if you are not bothered by too many broken URLs, skip it.  I know a couple of years ago, when IIS5 was out (and very buggy), UrlScan was an absolute requirement.  There used to be a directory traversal bug, so that using IIS you could start cmd.exe and run commands in it.  But that class of bugs is gone now, so UrlScan is not so imperative. It may still be helpful, though.

Do you really get newlines in your URLs? I've never seen that, and I didn't think it was possible.  

Jul 8, 2009 at 7:29 PM

Thankfully, your edits worked like a charm. I went through and made similar changes to the other rules that had that \n in there.  I believe that new line characters can come through on the URL with %0A (linefeed) or %0D (carriage return).

Obviously, I'm not the best at generating regular expressions. They make my head hurt and are easy to get lost in. I also used things like RegexBuddy to help test URLs when creating the expressions before putting them into the ini file. RegexBuddy never ran into any of these problems using PCRE either.

I have confirmed and tested many times the problematic URLs and they are no longer causing problems!

Since this is what I believe to be the 3rd time I've resolved this error, I really hope it's the last.

Again -- THANK YOU SO MUCH Cheeso.  I owe you big time. I don't know where you're located, but I need to buy you several beers!

Coordinator
Jul 8, 2009 at 7:41 PM
Edited Jul 8, 2009 at 7:44 PM

No problem, I'm glad it's sorted, and I'm glad to help.  I hope it stays fixed this time.

Regular Expressions are powerful, yet very hard to use.  That's for sure.

About the %0A and %0D: maybe you can filter those out with more "not found" rules, or UrlScan.  Anything with that stuff in the URL is probably malicious. At the very least, match on %0A (and/or %0D), not on \n.  The %0A in the URL is actually a string.  The requester wants the server to interpret it as a linefeed, but in actuality it is a 3-character substring within the URL: a %, a zero, and a letter A. It will not get "transformed" into a linefeed until someone or something does a UrlDecode on it, and IIRF v1.2 does not do that. (In IIRF v2.0, URL-decoding becomes optional.)
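The point about %0A being three ordinary characters until something decodes it can be checked with a few lines of Python. The request URL below is made up for illustration, and urllib.parse.unquote plays the role of the UrlDecode step.

```python
import re
from urllib.parse import unquote

raw = "/artists/9168/Some-Name%0A%0D/articles"  # hypothetical request URL

# A pattern looking for a real newline or carriage return finds nothing in the raw URL...
has_real_newline = re.search(r"[\r\n]", raw) is not None

# ...but a pattern looking for the literal escape sequences does.
has_encoded_newline = re.search(r"%0[AD]", raw, re.IGNORECASE) is not None

# Only after URL-decoding do the control characters actually appear.
decoded = unquote(raw)

print(has_real_newline, has_encoded_newline, "\n" in decoded, "\r" in decoded)
```

So a "not found" rule written against the undecoded URL would need to look for %0A or %0D, as suggested above, since \n never appears there.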

Thanks for the offer on the beers. Heh, maybe you should pay it forward.  Buy someone a beer, right where you are.