Channel: Four Miles to May

Avoid Duplicate Content – Consistent URLs, Link Element & Mod Rewrite


Duplicate content is a misleading term. When you first happen upon it, it seems to imply that your site has the same content on more than one page. Or that your site is duplicating (or has stolen) content that’s already on someone else’s site. But in most cases, when someone says your site has duplicate content, they mean that more than one url pointing to your website serves up the same content. For example, it’s not uncommon for the following url patterns to return the same page:

www.mysite.com/mystuff/
mysite.com/mystuff/
www.mysite.com/mystuff/index.php
mysite.com/mystuff/index.php?some-parameter=some-value

The confusion is compounded when you have more than one domain name pointing to the same site.
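To make the idea concrete, here's a minimal Python sketch that maps url variants like the ones above onto one preferred url. It assumes, purely for illustration, that www.mysite.com is the preferred host, that index.php is the server's default document, and that query strings never change what the page shows. The names canonicalize, PREFERRED_HOST, ALIAS_HOSTS and DEFAULT_DOCUMENT are hypothetical, not part of any library, and a real site would have to decide which parameters actually matter.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical assumptions for this sketch:
# - www.mysite.com is the preferred host, served over http
# - mysite.com and www.mysite.com both point at the same site
# - index.php is the default document, so /dir/index.php shows the same page as /dir/
# - query strings never change the page content, so they are dropped
PREFERRED_HOST = "www.mysite.com"
ALIAS_HOSTS = {"mysite.com", "www.mysite.com"}
DEFAULT_DOCUMENT = "index.php"


def canonicalize(url: str) -> str:
    """Map any url variant for this site onto one preferred form."""
    # Add a scheme if it's missing so urlsplit puts the host in netloc.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # hostname is already lowercased; be explicit
    if host not in ALIAS_HOSTS:
        raise ValueError(f"not one of our hosts: {host}")
    path = parts.path or "/"
    # Strip a trailing default document: /mystuff/index.php -> /mystuff/
    if path.endswith("/" + DEFAULT_DOCUMENT):
        path = path[: -len(DEFAULT_DOCUMENT)]
    # Rebuild with the preferred scheme and host, no query string, no fragment.
    return urlunsplit(("http", PREFERRED_HOST, path, "", ""))


variants = [
    "www.mysite.com/mystuff/",
    "mysite.com/mystuff/",
    "www.mysite.com/mystuff/index.php",
    "mysite.com/mystuff/index.php?some-parameter=some-value",
]
for v in variants:
    print(v, "->", canonicalize(v))
```

Every variant in the list comes out as http://www.mysite.com/mystuff/, which is exactly the single url you'd want search engines (and your own internal links) to use.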


From a site visitor’s perspective, duplicate content is not such a big deal. Most of us don’t particularly care how we got to the content we were looking for. But search engines are perplexed by such shenanigans. Which url should they include in their indexes?

In February 2009, Google, Yahoo and Microsoft announced support for a new link element that lets you specify which url is your preferred, or “canonical,” url for a particular page. It goes in the page’s <head>. Here’s the syntax:

<link rel="canonical" href="http://example.com/page.html" />

The “rel” attribute says what the link points to: in this case, the canonical version of the current page. The “href” attribute is where you give your preferred url for that page. According to Google engineer Matt Cutts, Google, at least, will try to use the canonical url when indexing the page, though Google reserves the right to use a different url in its index. (Watch Matt’s video on the new element.)
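To see the element from a crawler’s side, here’s a minimal Python sketch that pulls the canonical url out of a page with the standard library’s html.parser. It’s only an illustration of what the tag carries, not how Google, Yahoo or Microsoft actually process it; CanonicalFinder and find_canonical are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also lands here, because HTMLParser's
        # default handle_startendtag calls handle_starttag.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical url declared in html, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = (
    "<html><head><title>My stuff</title>"
    '<link rel="canonical" href="http://example.com/page.html" />'
    "</head><body>...</body></html>"
)
print(find_canonical(page))  # http://example.com/page.html
```

Whichever of your duplicate urls a crawler fetches, a page carrying this element reports the same preferred url.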

If you’d rather not depend on Google’s url selection process, here are a couple of simple things you can do to minimize duplicate content problems on your site:

  • When linking pages within your site to each other, always use the same url format. Don’t use mysite.com/mystuff/index.html in one spot and mysite.com/mystuff/ in another to link to the same content.
  • If your site is hosted on an Apache server, use Apache’s mod_rewrite to permanently redirect all of your domain names to one domain (including mysite.com to www.mysite.com, or vice versa).

Here’s one way to do that. Include the following in your .htaccess file:

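<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# Skip requests with no Host header (old HTTP/1.0 clients) to avoid a redirect loop
RewriteCond %{HTTP_HOST} !^$
# Match any host other than www.mysite.com (dots escaped, case-insensitive)
RewriteCond %{HTTP_HOST} !^www\.mysite\.com$ [NC]
# Permanently (301) redirect to the same path on the preferred host
RewriteRule ^(.*)$ http://www.mysite.com/$1 [R=301,L]
</IfModule>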

The previous snippet first checks that the mod_rewrite module is available. If it is, the rewrite condition matches any request whose host name is not www.mysite.com (the [NC] flag makes the match case-insensitive), and the rewrite rule permanently (301) redirects that request to the same path on http://www.mysite.com.
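If you’d rather make the same decision in application code (say, on a host where you can’t edit .htaccess), here’s a minimal Python sketch of the logic. It isn’t tied to any web framework; redirect_for and PREFERRED_HOST are hypothetical names, and the guard for a missing Host header is an added safety check so a request without one isn’t redirected in circles.

```python
from typing import Optional, Tuple

# Hypothetical preferred host, matching the .htaccess example.
PREFERRED_HOST = "www.mysite.com"


def redirect_for(host: str, path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Return (status, location) if the request should be redirected, else None.

    path is the request path, starting with "/"; query is the raw query string.
    """
    if not host:
        # No Host header: redirecting would just loop, so serve the page as-is.
        return None
    if host.lower() == PREFERRED_HOST:
        # Already on the preferred host (case-insensitive, like [NC]).
        return None
    # Same path on the preferred host; keep the query string, as mod_rewrite does.
    location = f"http://{PREFERRED_HOST}{path}"
    if query:
        location += "?" + query
    return (301, location)


print(redirect_for("mysite.com", "/mystuff/index.php", "some-parameter=some-value"))
print(redirect_for("WWW.MySite.com", "/mystuff/"))
```

The first call returns (301, 'http://www.mysite.com/mystuff/index.php?some-parameter=some-value') and the second returns None, since that request is already on the preferred host.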

