Content duplication happens when the same content can be found at more than one URL. Such errors should be corrected so that the website ranks well in Google. Read on to see how to fix them and which tools to use.
Why does duplicate content occur?
There can be many reasons for this, but it usually happens when:
- additional parameters (or alternative URLs) appear in the URL, e.g. the content at the original address www.example/seo-mistakes is also reachable at www.example/article-on-blog?param=seo-mistakes
- the website has separate versions, e.g. www.site.com and site.com (with and without the "www" prefix), and both versions serve the same content
- similarly, both the http:// and the https:// versions of the site are active and visible to search engines
- the same page is available with and without a trailing slash “/”, e.g. www.site.com/ and www.site.com
- your content is republished on other websites without always linking to your original article, so the search engine cannot identify the original and "sees" just another version of the article.
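To make the URL variants above more concrete, here is a minimal Python sketch that collapses them into one preferred form. The preferences (https, the "www" host, no trailing slash) and the list of ignored query parameters are our assumptions for the example; your site may well choose differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of query parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def normalize_url(url: str) -> str:
    """Collapse common URL variants into one preferred form."""
    parts = urlsplit(url)
    # Assumption: the "www" host is the preferred one.
    host = parts.netloc.lower()
    if not host.startswith("www."):
        host = "www." + host
    # Treat "/page/" and "/page" (and "/" and "") as the same path.
    path = parts.path.rstrip("/")
    # Keep only the query parameters that are not on the ignore list.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    # Assumption: https is the preferred scheme; fragments are dropped.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


variants = [
    "http://site.com",
    "https://site.com/",
    "http://www.site.com",
    "https://www.site.com/",
]
unique = {normalize_url(u) for u in variants}
```

All four variants end up as the same address, so `unique` contains a single entry. Running a list of crawled URLs through a function like this is one way to spot groups of addresses that probably serve the same content.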
Why is content duplication so bad?
It’s harmful because Google won’t know which page should be ranked higher. The search engine is "forced" to choose among identical pages, and the original page may not be the one selected for the best search results. In other words: Google will either "refuse" to rank any of them or will use its algorithm to pick the page to show for the given query.
How to fix these errors?
At our Drupal agency, we help to solve this problem by:
- Setting up a 301 redirect from the duplicate page to the original content page.
- Using rel=canonical, which "tells" search engines that the given page should be treated as a copy of a specified URL and that all links should be credited to that URL. We add a link element with rel=canonical to the HTML head of each duplicate version of the page and set its href to the URL of the original (canonical) page.
- Ensuring that the distributing website links back to the original content and not to the URL variation.
Important e-commerce tip: if your products are offered by resellers, make sure the product descriptions on their websites are unique, so that they differ from the ones on the original website.
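The first two fixes can be sketched in a few lines of Python. This is only an illustration: the `REDIRECTS` mapping, its paths and both function names are hypothetical, and in practice redirects and canonical tags are usually configured in the CMS or the web server rather than written by hand.

```python
from html import escape
from typing import Optional, Tuple

# Hypothetical mapping from duplicate paths to the original page.
REDIRECTS = {
    "/seo-mistakes-copy": "/seo-mistakes",
    "/blog/seo-mistakes/": "/seo-mistakes",
}


def resolve_redirect(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if the path is a known duplicate, else None."""
    target = REDIRECTS.get(path)
    if target is None:
        return None
    # 301 means "moved permanently".
    return 301, target


def canonical_link(original_url: str) -> str:
    """Build the link element for the <head> of a duplicate page."""
    # Escape the URL so that characters such as & or " stay valid in HTML.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}" />'
```

A 301 redirect suits duplicates that visitors never need to see. The canonical link suits duplicates that must stay reachable, such as a page address with tracking parameters.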
Tools to check duplicate content
There are many tools to verify website content for duplication. Below we’ll list a few that caught our attention. It’s good practice to combine them and analyse your website using at least two different tools. The results are then more reliable and you can see which content needs to be fixed first.
Siteliner. It’s one of the most intuitive and easy-to-use tools. The results are presented in a clear and intelligible way. On the main page, just enter your website address to start the analysis. You then receive a list of URLs suspected of containing duplicate content, together with the number of duplicate words. Clicking a URL takes you to further verification, where you need to confirm whether Siteliner's suggestions are correct. It’s up to the user to judge whether given content is actually a duplicate.
Duplichecker. This is another interesting tool that can help you in a slightly different manner. It allows you to verify content that is written but not yet published and compare it with a specific URL. You can load a .docx file and then find out what percentage is suspected of being duplicated. Of course, here too it’s up to you to make the final decision whether to edit the text to minimise the risk of a duplicate. The tool is also useful when your website contains a lot of content, e.g. new entries from different authors appear regularly on the blog and you want to check if a similar topic has appeared before. To do so, you just need to paste a paragraph of the content that you plan to write into the dialogue window, and you’ll find out whether something similar already exists.
Plagium. This plagiarism checker is similar in functionality to the previous tool. In this case, you can find out if and where the content is published on the Internet. In other words: you may be able to find out if someone has plagiarised your content. To do so, you need to paste a fragment of the text that you want to check into the dialogue window. After a while you’ll get a list of pages with information about the degree of similarity to your content and the number of duplicate phrases. At this stage, you can already assess whether a given result is worth a closer look. If it is, then after clicking on a given URL you’ll be taken to a subpage where the fragment that Plagium suspects may have been copied from you is highlighted. Is there any added value? Yes: if you plan a campaign advertising your services, you can check whether similar slogans already exist on other websites.
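To give a rough idea of what a "percentage suspected as duplication" can mean, here is a small Python sketch that compares a draft with an already published text using overlapping three-word sequences. This is our own simplified illustration; we don't know how Duplichecker or the other tools calculate their scores, and the function names are hypothetical.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def duplicate_percentage(draft: str, published: str, size: int = 3) -> float:
    """Percentage of the draft's shingles that also occur in the published text."""
    draft_shingles = shingles(draft, size)
    if not draft_shingles:
        # The draft is shorter than one shingle, so there is nothing to compare.
        return 0.0
    shared = draft_shingles & shingles(published, size)
    return 100.0 * len(shared) / len(draft_shingles)
```

Two identical texts give 100.0 and texts without any shared three-word sequence give 0.0. As with the tools themselves, a high score is only a hint, and the final decision whether to edit the text is yours.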
We’ve already mentioned that there are many tools of this type. You can also check SEO Review Tools, Duplicate Content Checker, or Yoast, which we wrote about in the context of configuring an SEO plugin for a Drupal website.
Plagiarism – and what next?
You now know how to check whether the content on your own websites contains duplicates. The situation is different when you find out that someone else has appropriated your content.
You should try to contact the owner of the website on which your content was published and inform them about this situation. It’s good practice to ask first. There is always a chance that the person didn’t know that they came into possession of someone else's intellectual property.
You should also consider whether it’s a good idea to request removal of the duplicate. It may turn out that the website is of high quality and that it is worth keeping the content there, provided that it names you as the author and links to your website.
When you don’t know who owns a particular website, check on Whoishostingthis.com who is hosting that website. Then you can notify the hosting company that the site is using copyrighted content.
Firstly, remember to create original, unique content. If you create your website's content this way, you’ll minimise any potential duplication. Secondly, don’t forget about ongoing verification or about a full SEO audit. Duplication is an important element, but not the only one that affects how Google rates a given page. I encourage you to read the text on the Drupal SEO audit and implementation of improvements to gain comprehensive knowledge on this issue. Thirdly, if you want to use someone else's content, e.g. by posting a quote, remember to provide the source and, where possible, link to the source website.