How to prevent duplicate content across international websites
International SEO is important for larger businesses that want to expand into new languages and locations. However, an international SEO strategy needs careful planning, because incorrectly implemented SEO practices can harm a site's performance.
What Is International SEO and Why Is It Important?
International SEO is the process of optimizing a domain that has alternate language versions, or versions of the same site for different countries, to improve its organic search performance in the targeted countries and languages.
Improving the organic search performance of these multilingual or multi-regional sites increases a company's visibility and opens up a whole new range of customer insights.
International SEO also involves considering which search engines are used in each country. For example, many users in Russia prefer Yandex over Google, and users in China largely search through Baidu.
These search engines treat websites differently, so understanding how they work is crucial for your website to succeed internationally.
What is Duplicate Content?
There are two types of duplicate content: internal and external. Internal duplicate content occurs when the same or similar content appears on multiple pages within a domain, causing those pages to compete against each other.
External duplicate content occurs when multiple domains owned by the same group have the same or similar content, so the sites compete with one another.
Google does not penalize duplicate content in itself, unless it appears intended to deceive users or manipulate search results, for example content copied directly from another site. That said, we have seen Google reward unique, interesting content. Regarding the so-called duplicate content penalty, Google has said:
“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.”
Duplicate content can still have negative effects, such as indexing issues, reduced crawlability and weaker overall performance of the affected pages or sites.
Duplicate Content in International SEO
Across international versions of a site, it's important to send clear signals that distinguish directories in the same language, even when the versions are otherwise equivalent. For example, having exactly the same content in the /fr-fr/ and /fr-be/ directories may cause search engines to treat the pages as equivalent. This can lead to indexing issues where one directory is chosen over the other, even though hreflang has been implemented.
The main reason is that search engines struggle to identify a clear, unique value proposition for each page. They may stop crawling one of the directories rather than spend excessive crawl budget on duplicate pages, since crawling comes at the search engine's expense.
This also creates a poor user journey. For example, a page intended for users in France may appear in the SERP for a user in Switzerland. This happens because Google cannot tell which page to rank for a given query when all the versions provide the same content, in the same language, with the same purpose and value proposition.
Below is an example of this issue. For the French Canadian version of the domain, the majority of sessions come from France, and French Canadian users are only the third-largest source of sessions.
This suggests that Google does not see these versions as different, because the content is written in the same language and is identical across the site. As a result, Google shows users who search in French any French-language version of the domain, even though they may use a different regional variety of French.
For eCommerce websites, this can create a very frustrating user experience, especially if product availability differs by location.
However, there are many technical and on-page factors SEOs can control, such as implementing HTML elements correctly across the domain and providing unique content.
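To find directories at risk of being treated as equivalent, like the /fr-fr/ and /fr-be/ example earlier, you could compare the body text of each locale version. The following is a minimal sketch that assumes you have already extracted each page's main text. It uses Python's standard difflib module. The 0.9 threshold and the example paths are hypothetical values chosen for illustration, not figures published by Google.

```python
from difflib import SequenceMatcher

# Hypothetical review threshold: an illustrative assumption, not a Google figure.
SIMILARITY_THRESHOLD = 0.9


def text_similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between two page texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def flag_near_duplicates(pages: dict, threshold: float = SIMILARITY_THRESHOLD) -> list:
    """Compare every pair of locale versions and return (path, path, score)
    tuples whose similarity meets or exceeds the threshold."""
    flagged = []
    paths = sorted(pages)
    for i, first in enumerate(paths):
        for second in paths[i + 1:]:
            score = text_similarity(pages[first], pages[second])
            if score >= threshold:
                flagged.append((first, second, round(score, 2)))
    return flagged


if __name__ == "__main__":
    # Hypothetical extracted body text for three French-language directories.
    pages = {
        "/fr-fr/chaussures/": "Livraison gratuite en France. Retours sous 30 jours.",
        "/fr-be/chaussures/": "Livraison gratuite en France. Retours sous 30 jours.",
        "/fr-ca/chaussures/": "Livraison gratuite partout au Canada. Découvrez nos bottes d'hiver.",
    }
    for first, second, score in flag_near_duplicates(pages):
        print(f"{first} vs {second}: {score}")
    # prints "/fr-be/chaussures/ vs /fr-fr/chaussures/: 1.0"
```

A high score does not mean Google will consolidate the pages. It simply prompts you to check whether each version offers its own value proposition.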
The Importance of Unique Value Propositions in International SEO
A unique value proposition is a strategy that sets your site apart from competitors by making sure each page has a clear intent and adds value for the user.
Unique value propositions are an important factor in international SEO, because Google needs to identify a distinct value proposition or benefit for each page before it will treat them as individual pages. To create a value proposition, you need to understand the intent of the search query, what Google is currently showing for that query, and the value propositions that other websites are offering for it.
Methods of Preventing Duplicate Content
Unique content is a crucial factor in international SEO, because duplicate content can cause issues for a website and confuse crawlers. Avoiding these indexing issues, and stopping Google from choosing the wrong version of a page to show in the SERPs, will improve a website's visibility and performance.
There are many ways to ensure the content across a domain is unique, such as understanding the intent of each page and carrying out keyword research. This allows you to determine the primary keyword and intent for each page, and to make sure they differ across the site. Targeting more specific long-tail keywords also helps Google distinguish which page is more suitable for a search query.
Another way to prevent duplicate content is to optimize the metadata. Giving each version its own title tag, H1 and meta description can signal to crawlers that the pages are genuinely different rather than multiple versions of the same page.
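A quick audit can catch metadata that has been copied between locale versions. The following is a minimal sketch that assumes a hypothetical `pages` dictionary of already-collected title, H1 and meta description values per URL path. How you collect these values, for example from a crawler export, is up to you. Values are compared case-insensitively, which is a choice made for this sketch.

```python
from collections import defaultdict

# Fields compared across locale versions of a page.
FIELDS = ("title", "h1", "meta_description")


def find_shared_metadata(pages: dict) -> dict:
    """Return {(field, normalised value): [paths]} for every title, H1 or meta
    description that appears on more than one page. Values are compared
    case-insensitively with surrounding whitespace removed."""
    seen = defaultdict(list)
    for path, meta in pages.items():
        for field in FIELDS:
            value = meta.get(field, "").strip().lower()
            if value:  # ignore missing or empty fields
                seen[(field, value)].append(path)
    return {key: sorted(paths) for key, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical scraped metadata for two French-language directories.
    pages = {
        "/fr-fr/chaussures/": {
            "title": "Chaussures de course | Exemple",
            "h1": "Chaussures de course",
            "meta_description": "Livraison gratuite en France.",
        },
        "/fr-be/chaussures/": {
            "title": "Chaussures de course | Exemple",
            "h1": "Chaussures de course en Belgique",
            "meta_description": "Livraison gratuite en Belgique.",
        },
    }
    for (field, value), paths in find_shared_metadata(pages).items():
        print(f"Shared {field} '{value}': {', '.join(paths)}")
```

Here, only the title is reported as shared, which points to the field that still needs localising.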
For international SEO, best practice for content production on multilingual sites is to write the content manually, rather than using an AI translation tool such as Google Translate or translating every language version of a page from a single English version.
This also helps produce more localised content for each country or language, as content translated from English will not always sound natural in another language. Writing in each language also surfaces keyword variations that are more familiar to users in their mother tongue. These variations can help rankings, because users are more likely to search for those specific terms.
Further Methods to Improve Indexing
A canonical tag is an HTML element that indicates which page should be treated as the 'main' version to index when similar or duplicate versions of the page exist. Google treats it as a signal rather than a strict directive.
For example, a canonical tag can show that a French-language product page should be indexed under the '/fr/' version of a domain, rather than as the English-language product page on the '/gb/' site. Canonical tags also indicate to Google which version of the page to rank for relevant search queries.
Canonicals should either be self-referencing, stating that the page itself is the main version to index, or point to the correct language version. You can check this through the Coverage section of Google Search Console.
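The following Python sketch builds a canonical link element, either self-referencing or pointing to a preferred version. It assumes that query strings and fragments, such as tracking parameters, should never form part of the canonical URL. This is a simplification, so adjust the rule to match how your site uses parameters. The example.com URLs are hypothetical.

```python
from html import escape
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalise_canonical(url: str) -> str:
    """Assumption for this sketch: canonicals never carry query strings or
    fragments (e.g. tracking parameters), and the host is lower-cased."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def canonical_link(page_url: str, preferred_url: Optional[str] = None) -> str:
    """Build the <link rel="canonical"> element for a page's <head>.

    With no preferred_url the canonical is self-referencing; otherwise it
    points at the version you want indexed."""
    target = normalise_canonical(preferred_url or page_url)
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'


if __name__ == "__main__":
    # Self-referencing canonical for a hypothetical French product page.
    print(canonical_link("https://www.example.com/fr-fr/produit/?utm_source=news"))
    # prints <link rel="canonical" href="https://www.example.com/fr-fr/produit/">
```

Passing a second argument produces a canonical that points to another version of the page instead.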
Below is an example of Google choosing a different canonical from the one declared in the HTML. In this case, the cause is identical content.
Here, Google has chosen the '/en-ca/' version of this page as the authoritative page and indexed it, even though the canonical tag points to the '/en-gb/' version as the 'main' page, because the content is duplicated across the site. See our guide to Canonical Tags for more information.
Hreflang annotations tell Google which language, and optionally which region, a specific page is targeting. This helps Google understand which language version to show users in different countries.
Incorrect implementation of hreflang can harm crawlability and indexing. There are various aspects to consider when implementing hreflang, such as where to place the annotations: in the on-page HTML markup, in the HTTP header, or in the XML sitemap. For a more in-depth tutorial on how to correctly implement hreflang attributes, see Google's guidance and our guide to advanced Hreflang.
Hreflang annotations should also be self-referencing and include every other suitable version of the page. For example, if a product is available in the UK, France and Canada, use only the 'en-gb', 'fr-fr' and 'en-ca' values, rather than listing every language the site supports. If the product becomes unavailable on the French site, the hreflang annotations on both the UK and Canadian pages will need updating to remove 'fr-fr'.
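One way to keep hreflang annotations consistent is to generate them from a single list of locales. That way, every version of a page carries the same set, including a reference to itself. The following Python sketch assumes a hypothetical URL pattern, https://www.example.com/<locale>/<slug>/, in which the directory name matches the hreflang value. Real sites often differ, so treat the URL-building function as a placeholder. The sketch outputs both HTML link elements and the equivalent HTTP Link header value.

```python
from html import escape

# Hypothetical URL pattern: https://www.example.com/<locale>/<slug>/, where the
# directory name matches the hreflang value. Replace with your own URL logic.
BASE_URL = "https://www.example.com"


def alternate_urls(slug: str, locales: list) -> dict:
    """Map each hreflang value to the URL of that locale's version of the page."""
    return {locale: f"{BASE_URL}/{locale}/{slug}/" for locale in locales}


def hreflang_tags(slug: str, locales: list) -> list:
    """<link> elements for the <head>. Every version of the page gets this same
    list, so each page references itself as well as all its alternates."""
    return [
        f'<link rel="alternate" hreflang="{locale}" href="{escape(url, quote=True)}">'
        for locale, url in alternate_urls(slug, locales).items()
    ]


def hreflang_link_header(slug: str, locales: list) -> str:
    """The same annotations as an HTTP Link header value (e.g. for PDFs)."""
    return ", ".join(
        f'<{url}>; rel="alternate"; hreflang="{locale}"'
        for locale, url in alternate_urls(slug, locales).items()
    )


if __name__ == "__main__":
    locales = ["en-gb", "fr-fr", "en-ca"]  # markets where the product is sold
    for tag in hreflang_tags("trainers", locales):
        print(tag)
    # If the product is withdrawn in France, remove the locale and regenerate
    # the annotations for every remaining version so they stay in sync.
    locales.remove("fr-fr")
    print("Link: " + hreflang_link_header("trainers", locales))
```

Generating every page's annotations from one list means you only need to make a change such as removing 'fr-fr' once.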
XML sitemaps differ from HTML sitemaps in that they exist purely for crawlers, showing which pages across the domain should be crawled and indexed. HTML sitemaps, by contrast, are visible to users and often link to the important pages that you want Google to index.
A domain can have multiple XML sitemaps, which signals to crawlers that the site has separate sections. For international SEO, you could give each language version of the site its own XML sitemap, or even split sitemaps by country, depending on how your site is structured.
Having multiple, logically organised XML sitemaps can improve crawlability, as you are essentially pointing the crawler to each individual page. This makes crawling more efficient, so more pages can be crawled in a given timeframe and, hopefully, indexed.
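The following is a minimal sketch of splitting sitemaps by language. It groups URL paths by their first directory and writes one sitemap per group, using Python's standard xml.etree.ElementTree module and the sitemaps.org namespace. It then lists those sitemaps in a sitemap index. The domain, paths and sitemap file names are hypothetical, and a production sitemap may also need fields such as lastmod.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://www.example.com"  # hypothetical domain
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def split_by_locale(paths: list) -> dict:
    """Group site paths by their first directory, e.g. /fr-fr/... -> 'fr-fr'."""
    groups = {}
    for path in paths:
        locale = path.strip("/").split("/")[0]
        groups.setdefault(locale, []).append(f"{BASE_URL}{path}")
    return groups


def build_urlset(urls: list) -> str:
    """One sitemap file listing the given page URLs in <url><loc> entries."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return XML_DECLARATION + ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls: list) -> str:
    """A sitemap index pointing crawlers at each per-locale sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return XML_DECLARATION + ET.tostring(index, encoding="unicode")


if __name__ == "__main__":
    paths = ["/en-gb/", "/en-gb/trainers/", "/fr-fr/", "/fr-fr/chaussures/"]
    groups = split_by_locale(paths)
    # Hypothetical naming scheme: one file per locale, e.g. sitemap-fr-fr.xml.
    for locale, urls in groups.items():
        print(f"sitemap-{locale}.xml\n{build_urlset(urls)}\n")
    print(build_sitemap_index([f"{BASE_URL}/sitemap-{loc}.xml" for loc in groups]))
```

Grouping by the first directory suits a subdirectory structure like /fr-fr/. A site that separates countries by subdomain or domain would need a different grouping rule.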