Canonical Tag Flying Under the Radar?

This is a guest post by Craig Smith of Trinity Insight.

In early February of this year, search engine representatives from Google, Yahoo, and MSN (before it was Bing) announced a uniform method for adopting a new HTML tag that helps webmasters reduce duplicate content.  This “canonical” tag, which is inserted within the HEAD portion of an HTML document, is a great way to reduce the potential negative effects of having the same page indexed multiple times under a variety of URLs.

The tag is written like this:  <link rel="canonical" href="http://mysite.com/page1.html" />

This essentially tells a search crawler, “Hey Googlebot, this isn’t the preferred page for this content; http://mysite.com/page1.html is.”

Across the variety of ecommerce platforms and content management systems we work with, duplicate content is a pretty widespread issue, and this tag will go a long way toward helping webmasters properly structure a site for optimal SEO.

Think about this for a minute.  You can have the following example variations for a fictional homepage:

www.mysite.com

mysite.com

www.mysite.com/

www.mysite.com/index.htm

mysite.com/index.htm

www.mysite.com/home.aspx

You get the picture.  Which is the primary page?  All of these URLs can be indexed by search engines, but which version should an engine show when users are searching?
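To make the problem concrete, here is a minimal Python sketch that maps every homepage variation listed above onto a single preferred URL.  The choice of www.mysite.com as the preferred host, the http scheme, and the list of index file names are assumptions made for illustration; your own site policy may differ.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed site policy (hypothetical): prefer the "www" host and treat
# these file names as aliases of the directory root.
PREFERRED_HOST = "www.mysite.com"
HOST_ALIASES = {"mysite.com"}
INDEX_ALIASES = {"index.htm", "index.html", "home.aspx"}


def canonical_homepage(url: str) -> str:
    """Map a homepage variant onto one preferred URL."""
    # Bare strings such as "mysite.com/index.htm" carry no scheme, so add one.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Fold alternate host names into the preferred host.
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST

    # An empty path and "/" refer to the same page.
    path = parts.path or "/"

    # Replace a trailing index file name with the directory it lives in.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_ALIASES:
        path = head + "/"

    # Keep any query string, drop the fragment.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Running each of the six variations through canonical_homepage yields the same string, which is the value you would then place in the canonical tag's href.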

Sure, you can use 301 redirects to fix this issue, but these can be tough to set up across varied system and server environments.  You could try to eliminate parameters such as session IDs and tracking codes, but then you lose valuable data that helps you understand the dynamics of your visitors and marketing campaigns.

In working with a new client in the eCommerce sector, we saw this issue firsthand.  A single product page had 27 different versions (each with a different URL) indexed within Google.  Each version had the same title and exactly the same content.  Because the parametric filtering capabilities on the site gave users many different paths to the product, these URLs had existed in the index for years.

How does the canonical tag help your SEO efforts when you have these duplicate URLs in search indexes?  For starters, unlike a 301, which redirects all web traffic, the canonical tag is a signal for search engines only, so visitors still land on the URL they requested and you keep your existing URL parameters.  It helps engines concentrate link equity into one primary URL for a specific piece of content, and it essentially tells them which page you want treated as the “authority” page.
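As a rough illustration of keeping your parameters for analytics while still pointing engines at one URL, the Python sketch below derives a canonical URL by dropping non-content parameters and then builds the tag.  The parameter names in NON_CONTENT_PARAMS are hypothetical; substitute whatever session and tracking parameters your platform actually appends.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that do not change the page content.
NON_CONTENT_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Return the URL with session and tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's HEAD."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'
```

The visitor's URL is left untouched, so your analytics still see the session and campaign data; only the tag emitted in the HEAD points at the cleaned-up address.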

Looking for answers direct from Google relating to the canonical tag?  Here are some Q&A answers that they provided on their Webmaster Central blog:

Is rel="canonical" a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.

Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel="canonical" returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.

What if the rel="canonical" hasn’t yet been indexed?
Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel="canonical" hint.

Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.

Can this link tag be used to suggest a canonical URL on a completely different domain?
No. To migrate to a completely different domain, permanent (301) redirects are more appropriate. Google currently will take canonicalization suggestions into account across subdomains (or within a domain), but not across domains. So site owners can suggest www.example.com vs. example.com vs. help.example.com, but not example.com vs. example-widgets.com.
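To see what the relative-path answer means in practice, the Python sketch below resolves a relative canonical href the way standard URL resolution works, using urljoin from the standard library.  This is an illustration of the general rule, not a description of Google's crawler, and the page and base URLs in the usage note are made up.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_canonical(page_url: str, href: str,
                      base_href: Optional[str] = None) -> str:
    """Resolve a canonical href against the page URL or a <base> URL."""
    # A <base> URL, when present, replaces the page URL as the reference
    # point. It may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, href)
```

For a page at http://www.example.com/shop/list.php?sort=price, the href product.php?item=swedish-fish resolves into the /shop/ directory.  If the page declares a base of http://www.example.com/catalog/, the same href resolves into /catalog/ instead, which is worth checking before you rely on relative paths.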

So what’s your action plan?  The first step is to evaluate your site paths and look for instances in which multiple URLs serve the same content.  Find the duplicates and decide which version you want as your primary version.
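One simple way to start that audit, sketched in Python below, is to group URLs whose page bodies are identical.  It assumes you have already fetched the pages into a dictionary of URL to HTML (the fetching step is left out), and it only catches exact duplicates after whitespace is collapsed; pages with slight differences, such as a different sort order, would need a fuzzier comparison.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicate_groups(pages: Dict[str, str]) -> List[List[str]]:
    """Group URLs whose bodies match; `pages` maps URL -> page body."""
    by_digest = defaultdict(list)
    for url, body in pages.items():
        # Collapse whitespace so trivial formatting differences
        # do not hide a duplicate.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]
```

Each returned group is a set of URLs competing for the same content; pick one URL per group as the primary version and point the others at it.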

Embed the tag on the duplicate pages as shown above, ideally on an automated basis within your eCommerce platform, and help the engines index your site more effectively.  The canonical tag is a major development in the SEO market that has flown a bit under the radar, but it can really make a difference in your rankings.  Best of luck in reducing your duplicate content and making your website more efficient for search engines to crawl and index!


About the Author:

Craig Smith is the founder of Trinity Insight, an eCommerce optimization firm that specializes in web analytics consulting and multivariate testing.
