Duplicate content issues

Topics: User Forum
Oct 2, 2008 at 7:41 PM
Hi all,

I need some help to overcome duplicate content issues. Many pages of my website have dropped in Google's rankings, probably because of duplicate content.
I have two issues, which I think might be solved with rewrite rules.
The first is URLs with and without capitals. I have external links coming in like www.mywebsite.com/Sony but also like www.mywebsite.com/sony. Google sees this as duplicate content. Can this be solved by a rule?

The second is maybe simple, but I cannot resolve it. I have a default page, www.mywebsite.com/default.asp. External links sometimes refer to www.mywebsite.com and sometimes to www.mywebsite.com/default.asp. In Google these two pages (which are the same, of course) have different PageRank. Is there a rule to redirect www.mywebsite.com/default.asp to www.mywebsite.com?

Thanks in advance.
Roger

Coordinator
Oct 3, 2008 at 6:29 AM
Edited Oct 3, 2008 at 6:34 AM
For the first issue.
I don't know exactly how Google's page ranking works, and I'm not sure whether uppercase and lowercase are treated differently. As far as I know, the IETF RFC (RFC 3986) treats only the scheme and host name of a URL as case-insensitive; the path is case-sensitive in general, so Google's URL index may well treat /Sony and /sony as two different pages. But this is just a guess. I might be totally wrong.

Consider using a single rule to REDIRECT /sony to /Sony (or vice versa), so that uppercase and lowercase references end up at the same endpoint, either capitalized or not. (This means using the [R] flag.) Google's indexer is smart enough to follow redirects. Regardless of whether Google's URL index is case-insensitive, you might want to do this anyway, just for aesthetic or URL hygiene purposes.

For the second issue, same idea. In the IIRF ini file, specify a rule that redirects ([R] flag) from www.mywebsite.com/Default.asp to www.mywebsite.com. Then specify a rule to REWRITE from www.mywebsite.com (with no page at all) to default.asp. You will need to take care to use the [L] flag there, because you don't want a logical loop in your rule set (in other words, you don't want the rewrite to result in a URL that then gets redirected). So in practice, if you request default.asp from the server, the server will say back, "oh no, you should request /". Then your browser does that (follows the redirect), and IIRF rewrites / to default.asp. It is default.asp that gets executed on the server, but in the browser (and to Google's indexer) the URL looks like www.mywebsite.com.
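
If it helps to see that flow spelled out, here is a rough Python sketch of it. This is only a model of the idea, not IIRF itself: the function names (handle, browse) are made up for illustration, and matching Default.asp case-insensitively is an assumption on my part.

```python
import re

# Hypothetical model of the two rules; this is not IIRF's real engine.
REDIRECT_RULE = re.compile(r"^/default\.asp$", re.IGNORECASE)  # assumed case-insensitive
REWRITE_RULE = re.compile(r"^/$")


def handle(path):
    """Process one request; return ('redirect', url) or ('serve', internal_path)."""
    if REDIRECT_RULE.match(path):
        # External redirect: the client is told to request "/" instead.
        return ("redirect", "/")
    if REWRITE_RULE.match(path):
        # Internal rewrite, and processing stops here (the [L] idea), so the
        # rewritten path is never fed back into the redirect rule above.
        return ("serve", "/default.asp")
    return ("serve", path)


def browse(path, max_hops=5):
    """Follow redirects like a browser; return (visible_url, served_path)."""
    for _ in range(max_hops):
        action, target = handle(path)
        if action == "serve":
            return (path, target)
        path = target
    raise RuntimeError("redirect loop")


if __name__ == "__main__":
    for start in ["/default.asp", "/", "/Sony"]:
        print(start, "->", browse(start))
```

The point of stopping after the rewrite is visible in handle: if the rewritten /default.asp were run through the rules again, it would hit the redirect rule and the request would bounce forever.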

good luck 


Oct 3, 2008 at 11:16 AM
Hi Cheeso (and others),

Thanks for your answers. I know for certain that Google sees lowercase and uppercase URLs as duplicates. In Google's webmaster tools for my site it is flagged as a problem under Diagnoses -> Duplicate content in meta tags.

Because I have more than 500 URLs with the product brand in the URL, is there a simple rule to redirect them all to lowercase?

For my second question, about stripping off default.asp, can you show me an example in code? I am not good at writing specific rules.

Thanks in advance

Coordinator
Oct 4, 2008 at 1:22 AM
There is a way to change case when rewriting, yes. Check the Readme and the examples. In the replacement pattern, you want to surround the word to be lowercased with #L and #E.
Check out this example:

http://www.codeplex.com/IIRF/Wiki/View.aspx?title=Case%20Folding&referringTitle=Examples
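
The logic you are after can be sketched in Python like this. It is an illustration of the idea only, not IIRF syntax; the function name canonical_redirect is made up, and leaving the query string untouched is my own assumption about what you would want.

```python
import re

HAS_UPPER = re.compile(r"[A-Z]")


def canonical_redirect(url):
    """Return the lowercased URL to redirect to, or None if no redirect is needed.

    Only the path is lowercased; anything after "?" is left as it is
    (an assumption, since query values may be case-sensitive).
    """
    path, sep, query = url.partition("?")
    if HAS_UPPER.search(path):
        return path.lower() + sep + query
    return None


if __name__ == "__main__":
    for url in ["/Sony", "/sony", "/Sony?id=AB"]:
        print(url, "->", canonical_redirect(url))
```

Redirecting only when the path actually contains an uppercase letter is what keeps this from looping: the lowercased URL no longer matches, so the second request is served normally.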

For the second case:

RewriteRule ^/default\.asp$   / [R]
RewriteRule ^/$  /default.asp  [L]
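
One thing to keep in mind when writing patterns like these is that they are regular expressions, so an unescaped dot matches any character and matching is case-sensitive unless you ask otherwise. A quick way to check a pattern before putting it in the ini file is to try it in any regex engine. Here is a small Python check; Python's re module is only a stand-in here and I am assuming it behaves the same as IIRF's regex engine for these simple patterns. The helper name matches is made up.

```python
import re

UNESCAPED = re.compile(r"^/default.asp$")    # "." matches any single character
ESCAPED = re.compile(r"^/default\.asp$")     # "\." matches a literal dot only
ESCAPED_CI = re.compile(r"^/default\.asp$", re.IGNORECASE)


def matches(pattern, path):
    """Return True if the whole path matches the compiled pattern."""
    return pattern.match(path) is not None


if __name__ == "__main__":
    for path in ["/default.asp", "/defaultXasp", "/Default.asp"]:
        print(path,
              matches(UNESCAPED, path),
              matches(ESCAPED, path),
              matches(ESCAPED_CI, path))
```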

Oct 5, 2008 at 7:20 AM
Hi Cheeso,

Thanks for your solutions. The example you referred to worked for the domain name (it changed www.MyWebsite.com to www.mywebsite.com) but not for the words after the slash. So I implemented another solution (I hope this is correct for Google): I do a string comparison against the word Sony in the database, strcomp(string1, string2)=0, and if that is false I force an error. The user gets a default error.asp page. I believe Google will not index this page anymore, so no duplicate issues.

I tried the second case, but it does not work (for me?). I get the error.asp page when I type in www.mywebsite.com/default.asp. The first line alone gives no error. Any ideas?


Coordinator
Oct 6, 2008 at 8:49 PM
"The example you refer to worked for the domainname (changed www.MyWebsite.com to www.mywebsite.com) but not for words behind the slash."

Hmm, the case-folding stuff should just work. I don't understand why it wouldn't, unless your rules are not really being applied or you've got something else going on.

As for the rules for the second case, I tried them and they work here, perfectly. I suspect something is not quite right in your configuration. Both of these things should just work, and you're telling me that neither of them does.

What does the IIRF log file say?