RewriteRule guru challenge... :)

Oct 27, 2006 at 7:34 PM
Hi,
I'm trying to accomplish two separate things:

1) be able to use the values of tags such as %{HTTP_REFERER} in the RewriteRule itself (not just in the "RewriteCond") :)
e.g. I want to accomplish something like: RewriteRule ^(.*)$ $1%&referer=%{HTTP_REFERER}

2) how does one reliably extract ONLY the FILENAME portion and not the entire path in a rewrite, trimming off one or more directory paths?
eg. http://anyDir1/anyDir2/anyDirN/anyFILENAME.html?morestuff... -> http:/myIsapi.dll?form=%anyFILENAME&morestuff...

ideally, the same rule would work with:
http://anyDir1/anyFILENAME.html?morestuff... -> http:/myIsapi.dll?form=%anyFILENAME&morestuff...

or
http://anyFILENAME.html?morestuff... -> http:/myIsapi.dll?form=%anyFILENAME&morestuff...

Thanks in advance... if you can SOLVE these... YOU D'MAN!!!
Coordinator
Oct 27, 2006 at 10:19 PM
For #1, you want the value of HTTP_REFERER in the replacement string of a RewriteRule.

Check the readme; it says:

The replacement string can contain:

- constant text

- references to the matched substrings in the input, in the form
of $n, where n is a digit. (Eg $1, $2, $3...). These are
called back-references.

- references to the matched substrings in the most recently
evaluated RewriteCond associated to that RewriteRule. These
take the form %n, where n is a digit. (Eg %1, %2, %3...)


The third possibility is what you want. You can use a simple RewriteCond mated to a RewriteRule to do this.

RewriteCond %{HTTP_REFERER} (.+)
RewriteRule /something.asp /something.asp?referer=%1
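
To make the %1 mechanism concrete, here is a rough Python sketch of how such a RewriteCond/RewriteRule pair could be evaluated. It is not the filter's actual code: the function name apply_rule and the headers dictionary are made up for illustration, and it assumes the replacement string becomes the whole rewritten URL.

```python
import re

def apply_rule(url, headers):
    """Mimic one RewriteCond + RewriteRule pair (illustrative only)."""
    # RewriteCond %{HTTP_REFERER} (.+)
    cond = re.search(r"(.+)", headers.get("Referer", ""))
    if cond is None:
        return url  # condition not met, so the rule is skipped

    # RewriteRule /something.asp /something.asp?referer=%1
    rule = re.search(r"/something.asp", url)
    if rule is None:
        return url  # pattern did not match, URL is left alone

    replacement = "/something.asp?referer=%1"
    # %1 comes from the RewriteCond capture; $1 would come from the
    # RewriteRule pattern's own capture groups.
    return replacement.replace("%1", cond.group(1))


print(apply_rule("/something.asp", {"Referer": "http://example.com/a"}))
print(apply_rule("/something.asp", {}))
```

With an empty or missing referer the (.+) condition cannot match, so in this sketch the URL passes through unchanged.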

For your #2 request, you want to strip paths from filenames. This should be easy to do with something like

RewriteRule ^/(?!myIsapi.dll)([^\/]*/)*(.*)$ /myIsapi.dll?form=$2&morestuff

The (?!myIsapi.dll) says: don't rewrite if the URL starts with /myIsapi.dll (no infinite rewrite loops).
The next set of parens says: capture zero or more things that end in a slash. This is the path. The next set of parens is (.*), which is the final thing, after all slashes. That is stuffed into match #2. So the replacement pattern references $2.

You should test this, of course, with all of your expected and unexpected inputs. Look for edge cases like trailing slashes, missing files and so on.
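
One way to try the pattern against sample inputs is a small Python sketch like the one below. It assumes the pattern reads ^/(?!myIsapi.dll)([^\/]*/)*(.*)$ (that is, with the asterisks and square brackets in place) and that Python's re module treats it the same way the filter's regex engine does. The function name rewrite is hypothetical.

```python
import re

# Assumed reading of the pattern: skip /myIsapi.dll, swallow every
# "segment/" into group 1, keep whatever follows the last slash in group 2.
PATTERN = re.compile(r"^/(?!myIsapi.dll)([^\/]*/)*(.*)$")

def rewrite(url):
    m = PATTERN.match(url)
    if m is None:
        return None  # rule does not apply
    return "/myIsapi.dll?form=" + m.group(2) + "&morestuff"


for url in ["/anyDir1/anyDir2/anyFILENAME.html",
            "/anyFILENAME.html",
            "/dir/file.html?a=1",
            "/myIsapi.dll?form=x"]:
    print(url, "->", rewrite(url))
```

Note that in this sketch the query string stays attached to the filename (group 2 is "file.html?a=1" for the third input), so it is one of the edge cases worth checking.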

best,
-Cheeso


Oct 28, 2006 at 1:36 AM
Cheeso,
I REALLY APPRECIATE your help!

It's getting closer...

Your Solution for my original #1 works like a charm -- thanks!

The Solution for #2 has some serious constraints that weren't considered, one because I wasn't precise enough (SORRY, my mistake -- i didn't think it would be a factor.)

(source:) http://anyDir1/anyDir2/anyDirN/anyFILENAME.html?morestuff...
(target:) http:/myIsapi.dll?form=%anyFILENAME&morestuff...

you propose: RewriteRule ^/(?!myIsapi.dll)(^\//)(.*)$ /myIsapi.dll?form=$2&morestuff

several problems:
a. the original "?morestuff..." also needs to be dynamically propagated from the source to the target, changing the leading "?" to a "&" so that the target has "&morestuff...", rather than being added as a literal value on the target.
b. I didn't state my original target precisely (SORRY!): I need it to include a hard-wired target directory prefix (totally independent of any source directory), i.e.
(updated specification's target:) http:/newDir/myIsapi.dll?form=%anyFILENAME&morestuff...

I realize that my "newDir" would also be swallowed up by the trimming, because of the recursion in your formula. Again, sorry... I didn't foresee the impact (and am only beginning to understand how the recursion works from your example).

Is this still a possible solution?

Thanks,
Dave
Nov 5, 2006 at 10:06 PM
Dave, first note that the regex you say Cheeso proposed is not what he proposed at all. The CodePlex forum software messed up his regex by converting characters between asterisks into bold text, and text between opening and closing square brackets into a link. Hence, add asterisks around the bold text and square brackets around the link, and you'll see what the regex was meant to be.
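
The mangling described above can be imitated with a few lines of Python. This is only a guess at what the forum software does (drop the brackets around [text], drop a pair of asterisks around *text*), and the intended pattern shown is a reconstruction under that assumption, not something confirmed in this thread. The function name mangle is made up.

```python
import re

def mangle(text):
    """Imitate the assumed forum auto-formatting (illustrative only)."""
    # [text] is rendered as a link, so the square brackets disappear
    text = re.sub(r"\[([^\]]*)\]", r"\1", text)
    # *text* is rendered as bold, so the paired asterisks disappear
    text = re.sub(r"\*([^*]+)\*", r"\1", text)
    return text


intended = r"^/(?!myIsapi.dll)([^\/]*/)*(.*)$"   # reconstructed, assumed
displayed = r"^/(?!myIsapi.dll)(^\//)(.*)$"      # as quoted in Dave's post

print(mangle(intended))
print(mangle(intended) == displayed)
```

Under these two assumed formatting rules the reconstructed pattern collapses to exactly the text Dave quoted, which is some evidence the reconstruction is right.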

Now let's solve your problem...

You say you want to convert the following (with a directory path with any number of possible subdirectories)...

/directoryPath/filename.ext?query

...into this:

/newDirectory/myIsapi.dll?form=filename.ext&query

I'm going to assume the following, which you did not explicitly specify in your posts:

- You want this rule to apply to files with any extension (not only ".html").
- You do not want to remove the file extension from the rewritten URL (though some of your examples suggest otherwise).
- You want to preserve any number of original variable/value pairs in the URL query string (not only the first one).
- You want to apply this rule even when you do not have a filename in the original URL (i.e. without a filename your rewritten query string would read "?form=&query").

If any of those assumptions are incorrect, let me know.


Here's the rewrite rule:

RewriteRule ^(?!.?/myIsapi\.dll)/(?:(?!\?|/^/.\.).)/?(^?)(?:#^?)?\??(.)$ /newDirectory/myIsapi.dll?form=$1&$2


Note the following:
- With both "/dir1/dir2/" and "/dir1/dir2" (with or without page anchors and/or URL variables coming immediately afterwards), my regex will treat "dir2" as a directory, rather than a file without an extension (which is possible though unlikely in the second example since there is no trailing slash).
- The regex will not work correctly with directory names which include periods, although they are legal characters for directory names. If you need to account for directories with periods in their names, let me know and the regex can be modified appropriately.
- In cases where there is no original URL query string, the rewritten URL will end in an ampersand character (this shouldn't cause any problems).
- Page anchors after filenames (e.g. filename.html#top) are stripped from the rewritten URLs.
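
The rule and some of the notes above can be tried out with a short Python sketch. It assumes the pattern reads as shown below once the asterisks, plus signs and square brackets stripped by the forum are restored (following the spaced-out version posted later in this thread), and that Python's re module behaves like the filter's regex engine for this pattern. The function name rewrite is made up for illustration.

```python
import re

# Assumed reconstruction of the rule's pattern (see lead-in).
RULE = re.compile(
    r"^(?!.*?/myIsapi\.dll)"      # never touch URLs already rewritten
    r"/(?:(?!\?|/[^/.]+\.).)+"    # directory path, up to "/name." or "?"
    r"/?([^?]*)"                  # $1: filename
    r"(?:#[^?]*)?"                # optional page anchor
    r"\??(.*)$"                   # $2: original query string
)

def rewrite(url):
    m = RULE.match(url)
    if m is None:
        return None
    return "/newDirectory/myIsapi.dll?form=" + m.group(1) + "&" + m.group(2)


print(rewrite("/dir1/dir2/filename.html?a=1&b=2"))
# /newDirectory/myIsapi.dll?form=filename.html&a=1&b=2
print(rewrite("/dir1/filename.html"))   # no query: trailing ampersand
# /newDirectory/myIsapi.dll?form=filename.html&
print(rewrite("/dir1/dir2"))            # dir2 is treated as a directory
# /newDirectory/myIsapi.dll?form=&
print(rewrite("/newDirectory/myIsapi.dll?form=x"))
# None
```

In this Python reading, a file sitting directly under the root (for example /filename.html?a=1) is consumed as if it were a directory, so form comes out empty. That case is worth testing against the real filter.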
Nov 5, 2006 at 10:12 PM
Damn auto-formatting screwing up my regex... lemme try posting the regex again, this time within quotes to see if that prevents the CodePlex software from screwing it up:

"^(?!.?/myIsapi\.dll)/(?:(?!\?|/^/.\.).)/?(^?)(?:#^?)?\??(.)$"
Nov 5, 2006 at 10:21 PM
No luck, and I can't even say to just add asterisks around the bold text and square brackets around the links, because plus signs are being removed as well for no apparent reason.

Let me try again with spaces between each character (remove all spaces before using):

^ ( ? ! . * ? / myIsapi \ . dll ) / ( ? : ( ? ! \ ? | / ^ / . + \ . ) . ) + / ? ( ^ ? * ) ( ? : # ^ ? * ) ? \ ? ? ( . * ) $


Here's hoping it will work this way...
Nov 5, 2006 at 10:23 PM
Aha, that kind of worked.... Just remove all the spaces and add square brackets around the links.

(I think I'll steer clear of posting regexes on the forum here for a little while.)
Nov 6, 2006 at 4:38 PM
Since Cheeso mentioned at http://www.codeplex.com/Project/DisplayThread.aspx?ProjectName=IIRF&ForumId=408&ThreadId=1460 that page anchors are never part of the URL request sent to the server, we can remove the handling for them. So, the regex becomes:

^(?!.*?/myIsapi\.dll)/(?:(?!\?|/[^/.]+\.).)+/?([^?]*)\??(.*)$

Or you could modify it slightly as follows, which makes it a bit easier to follow the ending logically but takes a few extra characters and should be functionally identical:

^(?!.*?/myIsapi\.dll)/(?:(?!\?|/[^/.]+\.).)+/?([^?]*)(?:\?(.*))?$

If you don't want file extensions (e.g. ".html") to be included in the rewritten URL (I'm less sure of my previous assumption in this regard than the others), you could modify the regex as follows:

^(?!.*?/myIsapi\.dll)/(?:(?!\?|/[^/.]+\.).)+/?([^?.]*)(?:\.[^?]*)?(?:\?(.*))?$

That's very basic (e.g. with the filename "document.2.html" it would only capture "document" into backreference 1) and I haven't really tested the regex, but it should work fine.
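
Since the last regex is described as not really tested, here is a small Python sketch that exercises the three variants from this post. It assumes they read as below with spaces removed and square brackets restored, and that Python's re module matches them the way the filter's regex engine would. The names HEAD, SHORT, EXPLICIT, NO_EXT and rewrite are made up for illustration.

```python
import re

# Shared front part: loop guard, then the directory path.
HEAD = r"^(?!.*?/myIsapi\.dll)/(?:(?!\?|/[^/.]+\.).)+/?"

SHORT = re.compile(HEAD + r"([^?]*)\??(.*)$")
EXPLICIT = re.compile(HEAD + r"([^?]*)(?:\?(.*))?$")
NO_EXT = re.compile(HEAD + r"([^?.]*)(?:\.[^?]*)?(?:\?(.*))?$")

def rewrite(regex, url):
    m = regex.match(url)
    if m is None:
        return None
    form = m.group(1) or ""
    # An optional group that did not take part in the match is None in
    # Python; it is treated as empty text here (an assumption about how
    # the filter expands $2 in that case).
    query = m.group(2) or ""
    return "/newDirectory/myIsapi.dll?form=" + form + "&" + query


samples = ["/dir/file.html?a=1&b=2", "/dir/file.html", "/dir/",
           "/newDirectory/myIsapi.dll?form=x"]
for url in samples:
    print(url, rewrite(SHORT, url) == rewrite(EXPLICIT, url))

print(rewrite(NO_EXT, "/docs/document.2.html?x=1"))
# /newDirectory/myIsapi.dll?form=document&x=1
```

On these samples the first two variants agree, and the extension-stripping variant keeps only "document" from "document.2.html", as described above.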
Nov 8, 2006 at 12:22 AM
It looks like the autoformatting rules changed somewhat after the CodePlex maintenance that happened earlier today.