What you need to know about duplicate content
Duplicate content is a major cause for concern for webmasters and online marketers. If a page is flagged as duplicate content, you run the risk of it vanishing from the search results entirely.
However, this does not always happen. In some cases, you might find your page ranks lower in Google Search results than a very similar page with slightly better content.
Sometimes duplicate content is not a reason to worry at all. Some content, such as song lyrics and poetry, cannot be reworded, and Google is increasingly good at recognizing this and acting accordingly.
What is duplicate content?
Duplicate content occurs when the same blocks of content appear at more than one URL on the same domain.
Generally, duplicate content is not created to deceive search engines in the hopes of achieving better search performance.
Rather, it occurs when extensive and complex content schedules develop across multiple departments, or when numerous outside agencies handle production and publication over a lengthy period.
If duplicate content is purposely created to deceive search engines, however, this may trigger a response from Google. Examples of malicious duplicate content include content that is:
An article in Google’s Search Central states: “Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.”
How much duplicate content is acceptable?
Depending on the size of your website, it can be nearly impossible to avoid at least some duplicate content between pages. The truth, however, is that no one outside of Google is quite sure how much duplicate content is acceptable.
Some articles covering this topic state a percentage, but Google has never confirmed any such figure, so it is impossible to give an accurate answer.
As a rule of thumb, your content must be of high quality, valuable to your users, and unique for it to perform well in Search.
Should I worry about duplicate content?
Content that appears on every page of your site, such as your header, footer, navigation and contact details, is unlikely to trigger a duplicate content warning.
But you should still make sure every page has some unique content that provides good value, to balance those repetitive page elements.
It’s also good practice to give each page a unique title tag and meta description tag. Your Content Management System may have this ability built in, or you might be able to install an SEO plugin or extension to edit title and meta description tags for each page.
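As a rough illustration of how you might check for repeated title and meta description tags, here is a minimal Python sketch. It assumes you already have the HTML of each page in a dictionary keyed by URL; the names `HeadTagParser` and `find_shared_head_tags` are hypothetical and only serve this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects the <title> text and the meta description of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_shared_head_tags(pages):
    """Return {(kind, text): [urls]} for titles/descriptions used on 2+ pages."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        seen[("title", parser.title.strip())].append(url)
        seen[("description", parser.description)].append(url)
    # Ignore empty values and anything that appears on only one page.
    return {key: urls for key, urls in seen.items() if key[1] and len(urls) > 1}
```

Any key in the returned dictionary points to a title or description that is worth rewriting on at least one of the listed pages.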
Does duplicate content affect your rankings?
One of the misconceptions about duplicate content is that it acts as a negative ranking factor, particularly in Google Search results.
However, in January 2021, Google’s John Mueller stated that duplicate content is not something that would affect a site’s ranking opportunities.
He said during a Google Search Central Office Hours live video that, “it’s not so much that there’s a negative score associated with it. It’s more that, if we find exactly the same information on multiple pages on the web, and someone searches specifically for that piece of information, then we’ll try to find the best matching page.”
This means that if you have two pages covering the same topic, Google will decide which is more relevant to a user and serve them that page in the SERP.
However, this could result in Google serving a page to users that is not your preferred page, which is particularly painful when the unserved page holds commercial value.
Negative effects of duplicate content
To summarize, there are several possible negative impacts of duplicate content within your own website:
- Google might choose the ‘wrong’ page to include in Search results
- Google might include the ‘right’ page but at a lower rank
- Google might exclude the ‘right’ page completely
Depending on the outcome in each case, having identical content across multiple pages can be anything from an inconvenience to a catastrophe in terms of website traffic and revenues.
As such, it is sensible to regularly audit duplicate content on your website and take action to resolve any obvious issues, before they can impact your traffic or Search rankings.
How to find duplicate content on your website
There are several ways to identify duplicate content on your own website. Some examples of this include:
- Error messages and duplicate content add-ons in your Content Management System
- Third-party apps and browser extensions that compare your content with other websites
- Google Search Console Index Coverage Report
You can also use online plagiarism tools to identify paragraphs on your page that also appear word for word elsewhere online, either on your own website or on someone else’s.
This might not be ‘duplicate content’ in the strictest sense, as with billions of web pages online, it’s possible for the same sentence or figure of speech to appear in multiple places, purely by chance.
Remember, some duplicate content on a page usually will not harm its Search ranking. The important thing is to publish content that is well-written and of high value to your users.
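To make the idea of word-for-word matching concrete, here is a minimal Python sketch that finds paragraphs repeated across your own pages. It assumes you have the plain text of each page in a dictionary keyed by URL, with paragraphs separated by blank lines. The function names and the `min_words` threshold are hypothetical choices for this example, not a standard; the threshold is there so that short sentences that match purely by chance are skipped.

```python
import hashlib
import re
from collections import defaultdict


def normalize(paragraph):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def find_repeated_paragraphs(pages, min_words=8):
    """Map each paragraph found on 2+ pages to the sorted URLs that contain it."""
    locations = defaultdict(set)
    originals = {}
    for url, text in pages.items():
        # Treat blank lines as paragraph boundaries.
        for paragraph in re.split(r"\n\s*\n", text):
            cleaned = normalize(paragraph)
            if len(cleaned.split()) < min_words:
                continue  # too short to be a meaningful match
            digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
            locations[digest].add(url)
            originals.setdefault(digest, cleaned)
    return {
        originals[digest]: sorted(urls)
        for digest, urls in locations.items()
        if len(urls) > 1
    }
```

This only catches exact matches after normalization. Dedicated plagiarism tools typically also detect near-duplicates, which this sketch does not attempt.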
Google Search Console duplicate content errors
Google Search Console is a useful tool to discover duplicate content, partly because its Index Coverage Report flags up specific pages to fix, but also because it shows you, directly from Google, which of your pages it has and has not chosen to index.
The Google Search Console Index Coverage Report has several sections. To find duplicate content, look under ‘Excluded’ for the following codes:
- Duplicate without user-selected canonical
- Duplicate, Google chose different canonical than user
- Duplicate, submitted URL not selected as canonical
Any of these three is an indication that your page has been excluded from Google’s Search Index due to a duplicate content issue.
In each case, you can see the important word is ‘canonical’ and we will come on to what that means, and how to use it, in a moment.
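If you export the report to a spreadsheet, a short script can pull out just the pages with one of the three duplicate codes. The Python sketch below assumes a CSV with columns named `url` and `status`. Those column names are an assumption for illustration, so adjust them to match whatever your export actually contains.

```python
import csv
import io

# The three status codes listed above.
DUPLICATE_STATUSES = {
    "Duplicate without user-selected canonical",
    "Duplicate, Google chose different canonical than user",
    "Duplicate, submitted URL not selected as canonical",
}


def duplicate_rows(csv_text, url_column="url", status_column="status"):
    """Return (url, status) pairs whose status is one of the duplicate codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row[url_column], row[status_column])
        for row in reader
        if row[status_column].strip() in DUPLICATE_STATUSES
    ]
```

Because two of the codes contain a comma, the status values need to be quoted in the CSV, which spreadsheet exports normally handle for you.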
What to do with duplicate content
There are several ways to tackle duplicate content within your own website. Here we’ll look at the pros and cons of some of the main methods to fix duplicate content errors.
1. Create unique content
By definition, unique content will avoid triggering a duplicate content error. It’s good practice to create unique content with good intrinsic value, as this will generally perform better.
Over time, a website built using unique content should offer significantly more value in the Search results, compared with a repetitive or plagiarized website.
Unique content also avoids copyright issues, for example if you use press releases or manufacturers’ product descriptions word for word without express permission.
However, unique content takes time, effort and money to produce, so you might want to include some alternative methods alongside your unique content marketing strategy.
2. Use different media formats
In general, duplicate content errors only apply to the text on a page. Google has indicated that it does not treat the wording in a video as a duplicate of the same wording in a blog post.
John Mueller discussed this in another Google Search Central Office Hours live video, also in January 2021.
He said: “We would not say ‘the text in this video is exactly the same as a blog post, therefore we don’t show either of them or we only show one of them’. So if you have a video that matches your blog post I think that’s perfectly fine.”
This means if you have a content marketing strategy that includes multimedia such as infographics, videos and podcasts, there should be no duplicate content concerns about publishing a transcript or identically worded blog post or landing page to your website.
3. Use canonical tags
We saw the word ‘canonical’ mentioned in the Google Search Console status codes listed above. So what is a canonical tag and how can it help with duplicate content errors?
According to Google: “A canonical URL is the URL of the page that Google thinks is most representative from a set of duplicate pages on your site.”
This might be because the same page has different possible URLs (e.g. /dresses/size12/ and /size12/dresses/ in dynamically generated ecommerce URLs) or because a significant chunk of text (e.g. a long item description) is duplicated on different static pages.
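To spot the first kind of duplication in a list of your own URLs, you could group paths that contain the same segments in a different order. The Python sketch below is one simple heuristic under that assumption, and `group_reordered_urls` is a hypothetical name. It will not catch other URL variations, such as tracking parameters.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_reordered_urls(urls):
    """Group URLs whose path segments match once their order is ignored."""
    groups = defaultdict(list)
    for url in urls:
        segments = [part for part in urlsplit(url).path.split("/") if part]
        groups[tuple(sorted(segments))].append(url)
    # Only groups with two or more URLs are candidates for a canonical tag.
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group is a set of URLs that probably serve the same page, so one of them should be chosen as the canonical version.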
You can specify a canonical page in five ways:
- Include a <link rel="canonical"> element in the page code
- Use a rel=canonical HTTP header
- Specify canonical pages in your sitemap
- Use a 301 redirect when deprecating old duplicate content
- Indicate a canonical AMP variant for AMP pages
No matter which method you use, Google may decide to overrule your canonical page preference, if its robots decide a different version of the page offers better performance, such as faster loading time or a more mobile-friendly layout.
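As a rough sketch of what three of these methods look like in practice, the Python helpers below build the <link> element, the rel=canonical HTTP header and a 301 redirect for a given canonical URL. The function names are hypothetical, and how you attach the header or redirect to a response depends on your own server or Content Management System.

```python
from html import escape


def canonical_link_element(url):
    """Build the <link> element that goes inside a page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_http_header(url):
    """Build a (name, value) pair for the rel=canonical HTTP header."""
    return ("Link", f'<{url}>; rel="canonical"')


def permanent_redirect(url):
    """Build the status code and headers for a 301 redirect to the canonical URL."""
    return 301, [("Location", url)]
```

The HTTP header is mainly useful for files that have no HTML <head>, such as PDFs, while the 301 redirect suits old duplicate pages that no longer need to be reachable at their own URL.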
What to do next
Fixing duplicate content errors is a three-step process:
- Audit website content for duplicate pages
- Decide on a method to fix duplicate content errors
- Fix existing errors and adopt best practice in future
With these three relatively simple steps, you can ensure future content marketing campaigns are not hampered by duplicate content effects, allowing you to drive maximum traffic, engagement and revenues from your ecommerce site.
If you want to know more about content strategy, check out our dedicated services page.