Path analysis and regex best practice

Topics: User Forum
Mar 15, 2010 at 7:00 PM

Hi all and Cheeso -

I've made a bit of progress from where I was a few weeks back. I'm working with regexes to do URL redirects with IIRF, and it works great. However, I've run into a few path patterns I'd like to handle a bit more elegantly than I'm currently doing.

Key: ^(\w+\.\w+)?$ = gets the file name off the path.

What I'm doing now (all I need is the old file name; I know where we're going path-wise):

RedirectRule ^/design/_documents/earthwork/(\w+\.\w+)?$ http://awebsite.gov/resources/design/toolsguidance/documents/$1

RedirectRule ^/design/_documents/eng-estimate/(\w+\.\w+)?$ http://awebsite.gov/resources/design/eeprog/documents/$1

RedirectRule ^/design/_documents/misc_forms/design/(\w+\.\w+)?$ http://awebsite.gov/resources/design/forms/documents/$1

It could be better(?) but it works for me.

What I'm finding, though, is that I have rules/paths like the ones below, so it seems I could take a little more of the original path and use it in the group variables to rebuild the new redirected URL:

RedirectRule ^/design/SampleFiles/_documents/3RSampleSheets/A-Gen_sht/(\w+\.\w+)?$ http://awebsite.gov/resources/SampleFiles/documents/3RSampleSheets/A-Gen_sht/$1

RedirectRule ^/design/SampleFiles/_documents/3RSampleSheets/B-Summ/(\w+\.\w+)?$ http://awebsite.gov/resources/SampleFiles/documents/3RSampleSheets/B-Summ/$1

RedirectRule ^/design/SampleFiles/_documents/4RSampleSheets/A-Gen_sht/(\w+\.\w+)?$ http://awebsite.gov/resources/SampleFiles/documents/4RSampleSheets/A-Gen_sht/$1

RedirectRule ^/design/SampleFiles/_documents/4RSampleSheets/B-Summ/(\w+\.\w+)?$ http://awebsite.gov/resources/SampleFiles/documents/4RSampleSheets/B-Summ/$1

As you can see above, I have illustrated portions of two patterns.

The full pattern is as follows:

/3RSampleSheets/A-Gen_sht/

/3RSampleSheets/B-Summ/

/3RSampleSheets/C-AnotherCategory/

/3RSampleSheets/G-AnotherCategory/

... to /3RSampleSheets/ZZ-AnotherCategory/

("AnotherCategory" could be any text, always different per path)

The same can be said for the /4RSampleSheets/ paths.

So my question is... since the information I'd like to use is already in the path, how do I get at it and use it? All I need is to extract two folder levels prior to the file name. Of course there are other similar patterns, though they may only need one folder level before the file name. I can do the code on a case by case basis, per grouped pattern. I just wanted to handle larger groups like above that may go A through Z with categories. Does this make sense? I can do it the long way, but thought I'd get some advice first. I'm not a fan of doing things the wrong way.

Thanks again for your time and efforts.

-Brent

Coordinator
Mar 15, 2010 at 8:07 PM
Edited Mar 15, 2010 at 9:18 PM

Hey Brent,

I'm glad to help. I see you're working on a .gov site. I hope you're doing something good for the government!

OK, on your first set of rules, the ones you said "could be better": they look fine. You could consolidate and generalize them if you wanted to, using a RewriteMap. RewriteMap is a new feature of IIRF v2.1 that does a lookup in a "map file", like this:

RewriteMap  pathmap  txt:iirf-pathmap.txt
RedirectRule ^/design/_documents/(earthwork|eng-estimate|misc_forms/design)/(\w+\.\w+)?$ 
http://awebsite.gov/resources/design/${pathmap:$1}/documents/$2

(the above rule must be all-on-one-line)

The contents of the iirf-pathmap.txt file would be a list of pairs. Each pair maps a value from an incoming URL to a value for the URL to redirect to, like this:

# map file for IIRF
earthwork          toolsguidance
eng-estimate       eeprog
misc_forms/design  forms

But that might make sense only if you have many things to map from and to.  With only 3 rules, it might not make sense. 
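If you want to sanity-check the map idea before touching the server, here is a rough Python sketch of what the rule plus the map file are meant to do. It is not how IIRF implements RewriteMap; the helper names (parse_pathmap, redirect) and the sample file names are made up for illustration, and Python's re module stands in for the PCRE engine, which should behave the same for a pattern this simple.

```python
import re

# Stand-in for the contents of iirf-pathmap.txt (old folder -> new folder).
PATHMAP_TEXT = """\
# map file for IIRF
earthwork          toolsguidance
eng-estimate       eeprog
misc_forms/design  forms
"""

# Same pattern as the consolidated RedirectRule.
RULE = re.compile(
    r"^/design/_documents/(earthwork|eng-estimate|misc_forms/design)/(\w+\.\w+)?$"
)


def parse_pathmap(text):
    """Hypothetical helper: read whitespace-separated pairs, skipping comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        mapping[key] = value.strip()
    return mapping


def redirect(url_path, mapping):
    """Hypothetical helper: return the redirect target, or None if no match."""
    m = RULE.match(url_path)
    if m is None:
        return None
    new_folder = mapping[m.group(1)]   # plays the role of ${pathmap:$1}
    filename = m.group(2) or ""        # $2 is optional in the pattern
    return f"http://awebsite.gov/resources/design/{new_folder}/documents/{filename}"


if __name__ == "__main__":
    pathmap = parse_pathmap(PATHMAP_TEXT)
    # plan.pdf and form1.doc are invented sample file names.
    for path in ("/design/_documents/earthwork/plan.pdf",
                 "/design/_documents/misc_forms/design/form1.doc",
                 "/design/_documents/unknown/plan.pdf"):
        print(path, "->", redirect(path, pathmap))
```

Adding a fourth folder then means one more alternative in the pattern and one more line in the map, rather than a whole new rule.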

Ok, on your next set of rules, you can consolidate them to a single rule, like this:

RedirectRule ^/design/SampleFiles/_documents/(3|4)RSampleSheets/([^/]+)/(\w+\.\w+)?$ 
      http://awebsite.gov/resources/SampleFiles/documents/$1RSampleSheets/$2/$3

(once again, the above rule must be all-on-one-line)

As you can see, there are three capturing groups. The first captures either a 3 or a 4. The second captures the path segment just before the file name; this is the thing that is A-SomeCategory through Z-SomeCategory. The third captures the file name. Actually, that rule will match any path segment in that position, not only segments that begin with A- through Z-. If you want to handle only A-Z, then you can write a more restrictive rule, like this:

RedirectRule ^/design/SampleFiles/_documents/(3|4)RSampleSheets/([A-Z]-[^/]+)/(\w+\.\w+)?$ 
      http://awebsite.gov/resources/SampleFiles/documents/$1RSampleSheets/$2/$3

This one will match only URLs whose penultimate path segment starts with a single capital letter and a hyphen: A-, B-, and so on through Z-.
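To see the difference between the broad and the restrictive rule, a quick Python sketch like the one below may help. It is only an approximation of what IIRF does: the redirect helper and the sheet1.dgn file name are invented for the example, and Python's re module is standing in for PCRE. The patterns are written without a caret in front of the file-name group, because a ^ in the middle of a pattern asserts start-of-string and could never match at that position.

```python
import re

# Broad rule: any single path segment between ?RSampleSheets/ and the file name.
BROAD = re.compile(
    r"^/design/SampleFiles/_documents/(3|4)RSampleSheets/([^/]+)/(\w+\.\w+)?$"
)

# Restrictive rule: the segment must start with one capital letter and a hyphen.
STRICT = re.compile(
    r"^/design/SampleFiles/_documents/(3|4)RSampleSheets/([A-Z]-[^/]+)/(\w+\.\w+)?$"
)

# Python equivalent of the $1 / $2 / $3 replacement string.
TARGET = "http://awebsite.gov/resources/SampleFiles/documents/{0}RSampleSheets/{1}/{2}"


def redirect(rule, url_path):
    """Hypothetical helper: build the redirect URL, or None if the rule misses."""
    m = rule.match(url_path)
    if m is None:
        return None
    series, category = m.group(1), m.group(2)
    filename = m.group(3) or ""   # the file-name group is optional
    return TARGET.format(series, category, filename)


if __name__ == "__main__":
    base = "/design/SampleFiles/_documents/"
    samples = [
        base + "3RSampleSheets/A-Gen_sht/sheet1.dgn",
        base + "4RSampleSheets/B-Summ/sheet1.dgn",
        base + "3RSampleSheets/ZZ-AnotherCategory/sheet1.dgn",
        base + "5RSampleSheets/A-Gen_sht/sheet1.dgn",
    ]
    for path in samples:
        print(path)
        print("  broad :", redirect(BROAD, path))
        print("  strict:", redirect(STRICT, path))
```

One thing the sketch makes visible: a two-letter prefix such as ZZ- passes the broad rule but not the restrictive one. If prefixes like that really occur on your site, something like [A-Z]{1,2}- in place of [A-Z]- might be worth trying, though that is a guess about your folder naming rather than anything I can confirm.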

Mar 15, 2010 at 8:20 PM

Cheeso - Thank you for your response. You've certainly given me a bit to think about, so I look forward to parsing it out and getting my brain around it.

Thanks again, Brent