URL rewrite and whitespace

Topics: Developer Forum, User Forum
Aug 19, 2009 at 4:58 PM

Hi, today I tried a new rewrite rule using version 2.0j of the library:

#########################################
RewriteRule ^/(.*)/pagina_dinamica.html$ /pagina_dinamica.asp?pagina=$1
#########################################

The rewrite works correctly in the normal case:

#########################################
http://localhost/gruppi/pagina_dinamica.html => http://localhost/pagina_dinamica.asp?pagina=gruppi
#########################################

But if I use this URL, the spaces are trimmed in a very strange way:

#########################################
http://localhost/gruppi cre/pagina_dinamica.html
Wed Aug 19 17:50:04 -  5900 - DoRewrites: Rewrite Url to: '/pagina_dinamica.asp?pagina=gruppi cre
#########################################

When I read the query string parameter with this ASP code:

#########################################
response.write(request.querystring("pagina"))
#########################################

The result is:

#########################################
gruppicre
#########################################

I don't know whether it's a library problem or an ASP limitation, but I usually have no problem retrieving a parameter that contains whitespace!

Regards!

Coordinator
Aug 19, 2009 at 5:30 PM

Hello Marco,

First, do you really want spaces in your URLs? According to my understanding of RFC 2396, spaces are excluded from the list of allowed characters in a URI, specifically because of complications like the one you are encountering. If you want a space in the URL, escape it with %20. The RFC says:

Data corresponding to excluded characters must be escaped in order to be properly represented within a URI.
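
As a small illustration of the escaping the RFC describes, here is a sketch in Python. Python is not part of IIRF or ASP; it is used here only because its standard library makes the encoding easy to see. The path is the one from the example above.

```python
from urllib.parse import quote, unquote

raw_path = "/gruppi cre/pagina_dinamica.html"

# quote() percent-encodes characters that are not allowed in a URI path;
# "/" is left alone by default.
escaped_path = quote(raw_path)
print(escaped_path)  # /gruppi%20cre/pagina_dinamica.html

# unquote() reverses the escaping.
print(unquote(escaped_path))  # /gruppi cre/pagina_dinamica.html
```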

Second, what happens if you invoke '/pagina_dinamica.asp?pagina=gruppi cre' directly? And what happens for that request if you remove IIRF from the system?
 

Aug 19, 2009 at 6:43 PM

Hi Cheeso, thanks for the quick answer.

I know that a whitespace character is usually a problem in a URL, but all the major browsers escape the URL automatically, so I don't have to worry about it. URL rewriting is a new experience for me, so I have probably made some big logical mistake.

I'd like to create search-engine-friendly URLs like:

#########################################
http://localhost/Toscana/Firenze e Dintorni/hotel_alexandra_15.html => http://localhost/scheda_hotel.asp?regione=Toscana&localita=Firenze e Dintorni&hotel=Alexandra&id=15
#########################################

So the whitespaces are "needed". I can substitute them with underscores, but I'd like to know whether this is my mistake, something required by the RFC, or a bug.
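
A minimal sketch of the underscore substitution mentioned above, written in Python purely for illustration. The helper names `to_slug` and `from_slug` are hypothetical, and the round trip only works if the original names never contain an underscore themselves.

```python
import re


def to_slug(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


def from_slug(slug: str) -> str:
    """Turn underscores back into spaces, e.g. before a database lookup."""
    return slug.replace("_", " ")


segment = to_slug("Firenze e Dintorni")
print(segment)             # Firenze_e_Dintorni
print(from_slug(segment))  # Firenze e Dintorni
```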

If I invoke '/pagina_dinamica.asp?pagina=gruppi cre' directly, the correct value is displayed, but that request bypasses the IIRF library because no rule matches it.
If I remove the IIRF library from the system the correct value is displayed.

Regards! 

 

Coordinator
Aug 20, 2009 at 1:27 AM

Yes, the browser encodes (or escapes) the URL, inserting a + in place of each space. IIRF receives the URL and decodes (unescapes) it before performing pattern matching. The capture in your rule matches "gruppi cre", including the space. That captured group then gets inserted into the replacement string for the URL path.

From that point I don't know how or why the space disappears.  It sure seems like it *should* work. 
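
The decode, match, substitute sequence described above can be sketched roughly as follows. This is Python, not IIRF's actual code; the `rewrite` function is a hypothetical stand-in that only mirrors the steps as described, using the rule from the first post.

```python
import re
from urllib.parse import unquote

# The rule from the first post, expressed as a Python regex and template.
PATTERN = re.compile(r"^/(.*)/pagina_dinamica.html$")
REPLACEMENT = r"/pagina_dinamica.asp?pagina=\1"


def rewrite(request_path: str) -> str:
    """Hypothetical stand-in for the filter: decode, match, substitute."""
    decoded = unquote(request_path)   # step 1: unescape the incoming URL
    match = PATTERN.match(decoded)    # step 2: match the rule's pattern
    if match is None:
        return request_path           # no rule matched; leave it unchanged
    return match.expand(REPLACEMENT)  # step 3: insert the captured group


result = rewrite("/gruppi%20cre/pagina_dinamica.html")
print(result)  # /pagina_dinamica.asp?pagina=gruppi cre
```

The result still contains a raw space, which matches the log line in the first post. The sketch does not show where the space is lost afterwards.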

Can you try with this directive in your IIRF.ini file:

UrlDecoding OFF

Aug 20, 2009 at 7:53 AM

OK, I think the problem is solved:

- using "UrlDecoding OFF", the script works correctly
- using "+" for the space, the script works correctly even with "UrlDecoding ON". The problem is probably that I was using "%20" to escape the whitespace, which was my mistake!
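
For reference, the two ways of escaping a space mentioned above can be compared with Python's standard library. This is only an illustration of the two encodings; it makes no claim about how IIRF or ASP decode them internally.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "gruppi cre"

percent_form = quote(value)    # space becomes %20
plus_form = quote_plus(value)  # space becomes +
print(percent_form, plus_form)  # gruppi%20cre gruppi+cre

# Plain percent-decoding leaves "+" untouched; form-style decoding
# treats "+" as a space and also handles %20.
print(unquote(plus_form))          # gruppi+cre
print(unquote_plus(plus_form))     # gruppi cre
print(unquote_plus(percent_form))  # gruppi cre
```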

Many thanks!

Coordinator
Aug 20, 2009 at 3:44 PM

I'm glad it's working for you.