A guide to canonical tags
The internet is driven by content. Whether it’s product descriptions on an ecommerce site, articles on a news website, or user-generated posts on a social network, content is what we create and consume every time we get online.
For the most part, online content is heavily text-based, but there are pages – and entire websites – that are dominated by images, audio and video too.
All of this content has value, from financial value (ranging from the cost of creating it, to the revenue it brings in) through to SEO value (how it helps you to rank in Google Search results and build your online audience).
Because the same content can often be reached at more than one URL, or republished elsewhere, it’s important to be able to tell the search engines which is the original copy of a page – the ‘canonical’ version of that content.
Canonical tags are used to achieve this, helping to identify deliberately duplicated content and signalling to the search robots which copy of a given web page is the original and ‘best’ one to index.
In this guide we’ll define canonical tags, provide examples of their usage, and take a look at some further examples of when and why you should use canonicals.
What is a canonical tag?
A canonical tag is an HTML tag inserted into the <head> section of a web page, which includes a URL reference to the definitive, original version of that page.
Canonical tags can be used to declare that a page is itself the original version; however, they are even more powerful when they are used to declare that the page on which the canonical tag appears is not the original version of that content.
We’ll look at why this is so important in a moment. For now, just know that a canonical tag is telling the search engines one of two things:
- This page is the original.
- This page is not the original (with a link to the original page).
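To make the two messages concrete, here is a minimal Python sketch that compares a page’s own URL with the URL in its canonical tag. The function names and example URLs are hypothetical, and the comparison only normalises the scheme, host and fragment; real search engines weigh many other signals before choosing a canonical.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case the scheme and host and drop any #fragment (for comparison only)."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def canonical_message(page_url: str, canonical_href: str) -> str:
    """Return which of the two messages a page's canonical tag sends."""
    # Resolve the href against the page URL, in case it is written as a relative path.
    target = urljoin(page_url, canonical_href)
    if normalise(target) == normalise(page_url):
        return "This page is the original."
    return f"This page is not the original (the original is {target})."


print(canonical_message("https://www.example.com/shoes", "https://www.example.com/shoes"))
# This page is the original.
print(canonical_message("https://www.example.com/shoes?sort=price", "https://www.example.com/shoes"))
# This page is not the original (the original is https://www.example.com/shoes).
```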
As a website publisher, this is your opportunity to provide information directly to the search engines that could impact on your rankings in the search results, so don’t waste this chance.
Example of a canonical tag
A canonical tag has a standard format:
<link rel="canonical" href="https://www.example.com/canonical-page" />
You might hear canonical tags referred to as meta tags. This is not strictly correct, as a canonical tag is a <link> element, not a <meta> element.
However, if you are familiar with meta tags (e.g. the ‘description’ meta tags commonly used for SEO purposes), you might find it useful to think of canonical tags in the same way.
Some common characteristics of rel="canonical" tags and meta tags include:
- Placed in the <head> of the page.
- Provide extra information about the page.
- Used by search engines when crawling the page.
From the example given above, you can see that this is a relatively simple way to include this information, but it can have a big impact on how your pages are indexed.
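If your pages are generated by code rather than written by hand, a small helper can produce the tag in this standard format. This is a minimal Python sketch with a hypothetical function name. It uses straight double quotes on purpose: typographic quotes pasted from a word processor are treated by HTML parsers as part of the attribute value rather than as delimiters, so the tag would not say rel="canonical" at all.

```python
from html import escape


def canonical_link_tag(url: str) -> str:
    """Build a rel="canonical" <link> element for a page's <head>."""
    # Straight double quotes delimit the attributes, and escape() turns
    # characters such as & and " in the URL into HTML entities.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_link_tag("https://www.example.com/canonical-page"))
# <link rel="canonical" href="https://www.example.com/canonical-page" />
```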
Why should you use a canonical tag?
Many new websites implement rel="canonical" tags across all their pages by default, but if your existing website is more than a few years old, there is a good chance it is not already using canonical tags.
It’s easy to check:
- Visit a page of your website.
- View the page source (keyboard shortcut ctrl+U in most desktop web browsers).
- Look for a <link rel="canonical"> tag in the <head> section of the page source.
You should also look for canonical tags in the <body> section of the page and move them into the <head> if they are present – we’ll come on to why this is important a little later.
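The manual check can also be scripted. The sketch below uses Python’s built-in html.parser module to list every canonical tag in a page’s HTML and whether it appeared inside the <head>. The class and function names are hypothetical, and html.parser does not reproduce every quirk of how browsers decide where the <head> ends, so treat the result as a first pass rather than a definitive audit.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect every rel="canonical" <link> and note whether it was inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, was_in_head) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # anything after <body> opens is outside the <head>
        elif tag == "link":
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.found.append((attributes.get("href") or "", self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical_tags(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.found


page = """<!DOCTYPE html>
<html>
<head>
  <title>Blue widgets</title>
  <link rel="canonical" href="https://www.example.com/blue-widgets" />
</head>
<body>
  <link rel="canonical" href="https://www.example.com/widgets" />
</body>
</html>"""

for href, in_head in find_canonical_tags(page):
    print(href, "OK (in <head>)" if in_head else "PROBLEM: outside <head>")
```

To check a live page instead of a saved string, you could pass in `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the page is served as UTF-8. More than one entry in the result, or any entry outside the <head>, is worth investigating.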
In any case, if you find canonical tags in your page code, it’s also worth checking that they are implemented correctly, as relatively minor mistakes can have significant consequences.
As for why canonical tags are so important:
In recent years Google in particular have prioritised original content, high quality and various forms of ownership and authorship.
Until the mid-2010s, the primary way to signal this was the rel="author" tag, which specified who had written the page and linked it to their profile on the short-lived Google+ social network.
Once this method was retired by Google, canonicalization rose to the fore as a way to let Google know that a page you publish is the original and should be included in the search results as such.
How to implement canonicals
Canonical tags are not intended for pages that reuse only a small section of content that is also available elsewhere online, e.g. e-commerce product descriptions, recipes, and so on.
Rather, you should implement canonicals when the entire page has a direct equivalent somewhere else, to show that you recognize the original page exists.
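If the equivalent pages are translations of one another into different languages, hreflang tags can tell Google so, which can help the search engine to provide the correct language in its results for users elsewhere around the world. In that case, each translation should keep a canonical tag that references itself, rather than one pointing to a version in another language.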
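A minimal Python sketch of the head tags for one page published in several languages: every version lists hreflang alternates for all the translations (including itself) and carries a canonical tag pointing at itself, in line with Google’s guidance not to point the canonical at a different language version. The URLs and language codes are hypothetical placeholders.

```python
# Hypothetical language versions of one page; replace with your own URLs.
LANGUAGE_VERSIONS = {
    "en": "https://www.example.com/en/pricing",
    "fr": "https://www.example.com/fr/tarifs",
    "de": "https://www.example.com/de/preise",
}


def head_tags_for(lang: str, versions: dict) -> list:
    """Return the <head> link tags for the version of the page in `lang`."""
    # Each translation is the canonical page for its own language...
    tags = [f'<link rel="canonical" href="{versions[lang]}" />']
    # ...and the hreflang alternates point search engines at every translation.
    for code, url in versions.items():
        tags.append(f'<link rel="alternate" hreflang="{code}" href="{url}" />')
    return tags


for tag in head_tags_for("fr", LANGUAGE_VERSIONS):
    print(tag)
```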
You can (and probably should) add a self-referencing canonical tag to every original page you publish, too, to stake your claim as its definitive publisher in case a third-party website scrapes and republishes your content at a later date.
Don’t use canonical tags in an attempt to disguise poor design. For example, if you use URL parameters to make minor changes to the language, currency or other variables on an otherwise identical page, there are better ways to implement those variables, so that the search engines don’t just see an infinite (or very large) number of near-identical pages on your site.
Mistakes to avoid
The tags themselves are not complicated, but there are some common mistakes in implementing canonical tags, often when they are used in conjunction with other ways to control the search engines’ access to your content.
Five of the biggest mistakes to avoid with canonical tags are:
1. Mixing rel="canonical" and noindex
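Canonical tags tell the search engines that you are aware the content on the non-canonical pages is largely duplicated or derivative, and which version you want them to treat as the original; the non-canonical pages themselves remain accessible to visitors and crawlers.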
In contrast, noindex tells the search engines that you don’t want a page to be indexed in the search results at all.
This may look like a minor distinction, but the two tags send contradictory signals, and the repercussions can be major if Google acts on the noindex and de-lists pages completely. So if you are using rel="canonical" tags, don’t use noindex on the same pages as well.
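A minimal sketch for spotting this mistake, using Python’s html.parser with hypothetical class and function names: it reports pages that carry both a canonical tag and a robots noindex meta tag. It cannot see noindex directives sent in the X-Robots-Tag HTTP header, so that header needs checking separately.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Record canonical URLs and any robots 'noindex' meta directive on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals = []
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.canonicals.append(a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            # content is a comma-separated list, e.g. "noindex, follow"
            directives = [d.strip().lower() for d in a.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def mixes_canonical_and_noindex(html: str) -> bool:
    """True if the page has a canonical tag and also a noindex robots meta tag."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return bool(parser.canonicals) and parser.noindex


page = (
    '<head><link rel="canonical" href="https://www.example.com/blue-widgets" />'
    '<meta name="robots" content="noindex, follow"></head>'
)
print(mixes_canonical_and_noindex(page))  # True
```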
2. Blocking access with robots.txt
Your robots.txt file is another way to prevent the search engines from crawling your website and can be used to block access on a large scale, rather than page by page.
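Used correctly, it can be very useful, for example to stop the search robots crawling a brand new part of your website until all your pages are published, polished, optimized and ready for public viewing (although robots.txt controls crawling rather than indexing, so a blocked URL can still appear in the search results if other pages link to it).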
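But one wrong instruction in robots.txt can wipe pages from the search results completely. It also stops canonicalization from working: a search robot that is not allowed to crawl a non-canonical page cannot read the canonical tag on it. So if what you want to achieve is canonicalization, don’t ban the search robots from crawling the non-canonical pages.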
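Python’s standard library includes urllib.robotparser, which can check whether your robots.txt rules block any of the non-canonical URLs you are relying on canonical tags for. The rules, URLs and function name below are hypothetical, and robotparser’s matching is simpler than Google’s own (for example, it may not treat wildcard patterns the same way), so treat this as a sanity check rather than a definitive answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; /print/ holds printer-friendly duplicates
# whose rel="canonical" tags point at the main articles.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /drafts/
"""

# Hypothetical non-canonical URLs that carry canonical tags.
NON_CANONICAL_URLS = [
    "https://www.example.com/print/blue-widgets",
    "https://www.example.com/blue-widgets?ref=newsletter",
]


def blocked_non_canonicals(robots_txt: str, urls: list, agent: str = "Googlebot") -> list:
    """Return the URLs that robots.txt stops `agent` from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A crawler that may not fetch a page never sees the canonical tag on it.
    return [url for url in urls if not parser.can_fetch(agent, url)]


print(blocked_non_canonicals(ROBOTS_TXT, NON_CANONICAL_URLS))
```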
3. Using canonical tags for pagination
This one is about content, not access. A common mistake is to use rel="canonical" tags on the second and subsequent pages of a piece of content to point back to the first page.
Examples include long articles split over several pages, series of content published at weekly intervals (or any other intervals) or multiple pages/posts in a monthly archive.
However, the content on these pages is not equivalent: they follow on from one another in series rather than existing in parallel, so it is not correct to treat the subsequent pages as non-canonical versions of page one.
To be correct, each page should have a canonical tag that references itself. There are some other simple steps you can take for multi-page content too:
- Use rel="next" and rel="prev" tags, which are supported by Bing.
- Provide a single-page equivalent, which is preferred by Google.
- If you provide a single-page version, point the canonical tags on the individual pages to it instead.
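As a sketch of these options, the hypothetical Python helper below produces the head tags for one page of a paginated series: a self-referencing canonical tag by default, or a canonical pointing at a single-page version if one exists, plus rel="prev" and rel="next" links. The ?page=N URL pattern is an assumption; substitute your own site’s scheme.

```python
from typing import List, Optional


def pagination_head_tags(
    base_url: str, page: int, total_pages: int, view_all_url: Optional[str] = None
) -> List[str]:
    """Return <head> link tags for page `page` (1-based) of a paginated series."""

    def url_for(n: int) -> str:
        # Assumed URL scheme: page 1 lives at base_url, later pages at ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    # Self-referencing canonical by default; the single-page version if one exists.
    canonical = view_all_url or url_for(page)
    tags = [f'<link rel="canonical" href="{canonical}" />']
    # rel="prev" / rel="next" describe the order of the series.
    if page > 1:
        tags.append(f'<link rel="prev" href="{url_for(page - 1)}" />')
    if page < total_pages:
        tags.append(f'<link rel="next" href="{url_for(page + 1)}" />')
    return tags


for tag in pagination_head_tags("https://www.example.com/long-article", page=2, total_pages=3):
    print(tag)
```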
Remember that publishing on the internet does not have to be linear and pages do not all have to be the same length, so optimizing your content is a case of learning what works best.
4. Using canonical tags in the <body> section
Many tags that belong in the page header are still picked up if they appear elsewhere on the page, so placing them in the <body> is not uncommon, particularly during development work and quick fixes.
Canonical tags are different: they will not work if they are placed outside of the <head> section of your page.
Make sure they do not appear in the <body> part of your page code, and that you don’t have any other plugins inserting a closing </head> tag before your canonical tags appear.
5. Using multiple canonical tags on one page
Each page should only contain one canonical tag, which should point to the one definitive version of the content that appears on that page.
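It might seem sensible to use multiple canonical tags, for example if your page contains the original manufacturer’s descriptions of several different products, but a canonical tag applies to the whole page, and if a page contains more than one, search engines may ignore all of them.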
You can also end up with multiple canonical tags inserted by different conflicting plugins and other bits of code on your site, so watch out for this happening if you implement new SEO plugins over time.
It’s often an easy fix: just make sure your site is configured in such a way that only one canonical tag appears per page, and there’s no reason why this problem should persist.
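One way to guarantee a single canonical tag per page, assuming your templates and plugins can be routed through shared code, is to give the canonical URL a single slot that refuses conflicting values. The HeadBuilder class below is a hypothetical Python sketch of that idea, not the API of any particular CMS or plugin.

```python
from typing import Optional


class HeadBuilder:
    """Hypothetical page-head builder with a single slot for the canonical URL.

    Templates and plugins call set_canonical() instead of writing their own
    <link> tags, so a conflicting second canonical is caught, not emitted.
    """

    def __init__(self) -> None:
        self._canonical: Optional[str] = None
        self._set_by: Optional[str] = None

    def set_canonical(self, url: str, source: str) -> None:
        if self._canonical is None:
            self._canonical, self._set_by = url, source
        elif self._canonical != url:
            raise ValueError(
                f"{source} tried to set canonical {url!r}, "
                f"but {self._set_by} already set {self._canonical!r}"
            )
        # Setting the same URL twice is harmless: render() still emits one tag.

    def render(self) -> str:
        if self._canonical is None:
            return ""
        return f'<link rel="canonical" href="{self._canonical}" />'


head = HeadBuilder()
head.set_canonical("https://www.example.com/blue-widgets", source="theme")
head.set_canonical("https://www.example.com/blue-widgets", source="seo-plugin")
try:
    head.set_canonical("https://www.example.com/widgets", source="old-plugin")
except ValueError as err:
    print(f"Conflict: {err}")
print(head.render())
```

Raising an error makes the conflict visible during development; on a live site you might prefer to log it and keep the first value.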
Final thoughts
The code for a rel="canonical" tag is not difficult to understand or add to a page; the complexity is in understanding the various use cases and avoiding any mistakes in implementation.
However, the potential power of a correctly configured canonical tag is huge, so take the time to check your website and add in this line of code if it’s not there already.