Searching for duplicate pages. The internal enemy: duplicate pages


Duplicate pages are pages with identical content that live at different URLs. Such copies make it harder for search engines to index a site.

What are duplicate pages on the site

Duplicates can appear when different content management systems are used. Duplicates on the same site are harmless to the user, but when search engines find duplicate pages they may apply a filter, lower the site's positions, and so on. Duplicates should therefore be removed quickly, and you should try to keep them from appearing at all.

What are the types of duplicates

Duplicate pages on a site can be either full or partial.

  • Partial duplicates are fragments of content repeated within the resource. For example, placing parts of the text of one article in another produces partial duplication. Such duplicates are sometimes called incomplete.
  • Full duplicates are pages that are complete copies of other pages. They worsen the site's ranking.

For example, many blogs contain duplicate pages. Duplicates hurt rankings and reduce the value of the content to nothing, so you need to get rid of duplicate pages.

Causes of duplicate pages

  1. Use of a content management system (CMS) is the most common cause of duplicate pages. For example, when one entry on a resource belongs to several categories at once and the category names are included in the entry's URL, the result is duplicate pages, for example:
    wiki.site.ru/blog1/info/
    wiki.site.ru/blog2/info/
  2. Technical sections. Bitrix and Joomla are the worst offenders here. For example, one of the site's functions (search, filtering, registration, etc.) generates parametric addresses containing the same information as the resource whose URL has no parameters. For example:
    site.ru/rarticles.php
    site.ru/rarticles.php?ajax=Y
  3. Human factor. This primarily means that a person can, through simple inattention, publish the same article in several sections of the site.
  4. Technical errors. Incorrect link generation and settings in various content management systems lead to page duplication. For example, if link generation is broken in the OpenCart system, a loop may occur:
    site.ru/tools/tools/tools/…/…/…

Why are duplicate pages dangerous?

  1. It becomes much harder to optimize the site for search engines. There can be many duplicates of one page in the search engine's index, and they interfere with the indexing of other pages.
  2. External links to the site lose value. Copies make it difficult to identify the relevant page.
  3. Duplicates appear in the search results. If a duplicate has accumulated behavioral metrics and good traffic, then after a data update it can take the place of the main resource in the results.
  4. Positions are lost in search engine results. If the main text contains fuzzy duplicates, the article may not reach the SERP due to low uniqueness. A news item or blog post, for example, may simply go unnoticed because the search algorithm treats it as a duplicate.
  5. The probability of the whole site falling under a search engine filter increases. Google and Yandex fight non-unique information, and sanctions may be imposed on the site.

How to find duplicate pages

To remove duplicate pages, you first need to find them. There are three ways to find copies on the site.


How to remove duplicate pages

You need to get rid of duplicates, and beyond that, understand the causes of their appearance and prevent copies of pages from spreading.

  • You can use built-in search engine features. For Google, use the rel="canonical" attribute: a tag of the form <link rel="canonical" href="…"> is embedded in the code of each duplicate, pointing to the main page that should be indexed (see the sketches after this list).
  • You can disable page indexing in the robots.txt file. However, this will not completely eliminate duplicates from the search engine: you cannot write an indexing rule for every individual page, so it only works for groups of pages.
  • You can use a 301 redirect. Robots will then be forwarded from the duplicate to the original page, and the server's 301 response will tell them that the page no longer exists at that address.
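Minimal sketches of all three approaches, assuming an Apache server and reusing the placeholder URLs from the examples above.

A canonical tag in the <head> of a duplicate page:

    <link rel="canonical" href="https://site.ru/rarticles.php">

A robots.txt rule closing a whole group of parametric pages (the * wildcard is understood by Google and Yandex):

    User-agent: *
    Disallow: /*?ajax=

A 301 redirect from a duplicate to the original in .htaccess:

    Redirect 301 /blog2/info/ https://wiki.site.ru/blog1/info/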

Duplicates affect rankings. If they are not removed in time, there is a high probability that the site will fall under Google's Panda filter or Yandex's AGS filter.

And if there is duplicate content at a different URL, a canonical is set and the page is closed in robots.txt, yet the page is still in the index, how should this be regarded?

Canonical solves the duplication problem.
But if the page got into the index and was then closed in robots.txt, the robot can no longer crawl it and recalculate its parameters: blocked from crawling, it never even sees the canonical tag.

I agree with the previous answer. You can also solve the problem by sending a removal request in the search console.

Maxim Gordienko

Why is it recommended to use canonical for pagination pages instead of removing the text + noindex, follow + adding "Page N" to the beginning of the Title on the second and subsequent pagination pages (or adding prev/next)? I have run into products from the second and subsequent pages being indexed poorly when canonical was in place.

Have you used the X-Robots-Tag HTTP header in practice to prevent indexing of pages? When relying on robots.txt, pages like this often pop up: http://my.jetscreenshot.com...
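For reference, a minimal sketch of such a header set via Apache .htaccess (assumes mod_headers is enabled; the PDF pattern is only an illustration):

    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex, follow"
    </FilesMatch>

Unlike a robots.txt block, the page can still be crawled, so the noindex directive is actually seen and the URL drops out of the index.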

Canonical is just a recommendation. You can also use 301 redirects to the relevant pages. As for duplicate-finding programs, I recommend Comparser: it also shows the site structure and has a few more useful features. Serpstat is expensive.

Better to use canonical plus prev/next and it will be great.
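As a sketch, prev/next annotations are ordinary link tags in the head of each pagination page; for a hypothetical page 2 of a category listing:

    <link rel="prev" href="https://site.ru/category/?page=1">
    <link rel="next" href="https://site.ru/category/?page=3">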

Maxim Gordienko

Seoprofi, for example, writes that it makes sense to put a canonical on pagination pages only when there is a "show all products" page (and Google's recommendations give no example of pagination in its classic form). And since the products (content) on the second page differ from the first, pointing a canonical at the first page is illogical.

If you only need to check for duplicates, it is better to use dedicated software. I recommend Netpeak Spider: it is actively developed and checks a lot of parameters on a site: https://netpeaksoftware.com... We use it all the time at work.
Serpstat is good because it is a platform with many tools: query analytics, links, audit, position checking.

Please advise: we removed identical categories from our online store's site and created new ones; the products from the removed categories (and other products) were assigned the new categories, so new product pages were created. What is the best way to proceed? Make the product URLs static (not dynamic) and set 301 redirects from the old pages to the newly created ones? (The online store has existed for 6 months.) And should a product's URL change when its category changes? (The category name is part of the product URL.)

1. To avoid duplicate product URLs, place all products in a single folder such as /product/, and show the categories in the breadcrumb menu.
2. If that is not possible, choose one of the following options.
2.1. Add link rel=canonical pointing to the main product page, since a new page appears every time a new category is included in the URL, and you yourself choose which page is the main one.
2.2. Set up a 301 redirect to the main URL. Then anyone who lands on an old URL of the site will be sent on by the 301 redirect.
3. Product URLs should preferably be static and user friendly.
4. "Should a product's URL change when its category changes? (The category name is part of the product URL.)"
If you cannot avoid putting the category in the URL (as in point 1), then when the category changes the URL changes too, and you need to set a 301 redirect to the new address (a sketch follows below).
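A sketch of the redirect from point 2.2, in Apache .htaccess terms (the category and product names are hypothetical):

    Redirect 301 /old-category/product-name/ https://site.ru/new-category/product-name/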

Thanks for such a detailed explanation :)

Please advise how to avoid duplicate content. There are 33 products of the same type: https://delivax.com.ua/pack...
Writing a unique description for each is difficult and does not seem necessary. But because the description is duplicated, only 5 of the 33 items stay in the index. Is this worth worrying about, and what should be done about it?

One of the main reasons a site loses positions and traffic is a growing number of duplicate pages. They can arise from peculiarities of the CMS (engine), from attempts to squeeze maximum search traffic out of the site by inflating its page count, or from third parties consciously or unconsciously linking to your duplicates from other resources.

The problem of duplicates is closely tied to the problem of the search engine determining the canonical address of a page. In some cases the robot can work out the canonical address itself, for example when only the order of parameters in a dynamic URL has been changed:

?&cat=10&product=25

In fact, this is the same page as

?product=25&cat=10

But in most cases the canonical page is hard to determine, so both full and partial duplicates end up in the index.
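A canonical tag placed on every parameter variant resolves this for the robot; a sketch, reusing the cat/product URLs above (the domain is a placeholder):

    <link rel="canonical" href="https://site.ru/?cat=10&product=25">

Both parameter orderings then point to a single address.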

Interestingly, duplicates are not so terrible for Yandex: even site search results pages (which are partial duplicates of one another) can bring it good traffic. Google is more critical of duplicates, owing to its fight against MFA and template sites.

The main methods for finding duplicates on the site

Below are the main methods by which you can quickly find duplicate pages on your site. Use them periodically.

1. Google Webmaster

Go to the Google webmaster panel. Find the menu section "Optimization" - "HTML Improvements". On this page you can see the number of duplicate meta descriptions and TITLEs.

In this way you can find full copies of pages, but unfortunately not partial duplicates, which have unique, though template-based, titles.

2. The Xenu program

Xenu Link Sleuth is one of the popular optimizer programs; it helps conduct a technical audit of the site and, among other things, find duplicate titles (if, for example, you have no access to Google Webmaster).

More details about this program are given in a review article. Simply crawl the site, sort the results by title, and look for visually matching titles. For all its convenience, this method has the same drawback: there is no way to find partial page duplicates.

3. Search results

Search results reflect not only the site itself, but also, to some extent, the search engine's attitude toward it. To look for duplicates on Google, you can use a special query.

site:mysite.ru -site:mysite.ru/&

Where the components are:

site:mysite.ru shows the pages of mysite.ru that are in the Google index (the general index).

site:mysite.ru/& shows the pages of mysite.ru that participate in the search (the main index).

This way you can identify thin pages and partial duplicates that do not participate in the search and may be preventing pages from the main index from ranking higher. When searching, be sure to click "repeat the search, including missing results" if there were few results, to see a more objective picture (see the example site:drezex.com.ua -site:drezex.com.ua/&).

Now that you have found all the duplicate pages, you can remove them by adjusting the site engine or by adding the appropriate tag (canonical or noindex) to the head of the pages.


How to check a website for duplicate pages

In this article we will look at the main issues around duplicate pages: what causes duplicates, why you need to get rid of them, and how to do it.

To begin with, let's figure out what lies behind the concept of "content duplication". It often happens that some pages contain partially or completely identical content, while each individual page has its own address.

Causes of duplicates:

- site owners themselves create duplicates for specific purposes, say, a printable page that lets a visitor of a commercial site copy the necessary information about a particular product or service.

- they are generated by the engine of the resource, since such behavior is built into it. Quite a few modern CMSs can produce identical pages with different URLs located in different directories.

- errors by the webmaster working on the site's promotion, who can create two identical main pages that differ in their addresses.

- changing the structure of the site. When you create a new template with a different URL system, the new pages that contain the old content get different URLs.

We have listed the possible causes of exact duplicates, but there are also fuzzy, that is, partial ones. Such pages often share part of the resource's template while their content differs slightly. Such duplicates can be site pages with identical search results or with a repeated element of an article, most often a picture.

You should get rid of duplicate pages. No, duplication is not a virus, but it too grows over time, regardless of the resource itself. Duplicates are often the result of an unprofessional webmaster or simply of incorrect site code.

It is important to know that duplicates can do considerable damage to a resource. What are the consequences of having duplicates on a site? First of all, worse indexing of the resource. Agree that such a situation will not please the site's owner: while money and time are constantly spent on promotion, the resource can start losing popularity within days. The depth of the problem depends on the number of duplicates.

It happens that the main page alone has a couple of duplicates. Blogs are a somewhat different story: thanks to replytocom, the copying of comments can produce a huge number of duplicates, so the more popular the blog, the more duplicates it contains. Google in particular lowers a resource's positions because of such duplicates.

Search engine algorithms work automatically, and it often happens that the system perceives a duplicate as more relevant than the original page. Then the search results return not the original but its duplicate, and the duplicate carries other parameters, which later contributes to the pessimization of the site.

What do we get? Duplicate pages become a real obstacle to indexing the site, a reason the search engine picks the wrong page as relevant, and a drain on the influence of natural links. Duplicates also distribute internal link weight incorrectly, weakening the promoted pages and distorting behavioral metrics.

How to check the site for duplicate pages?

There are various ways to find and check for duplicate pages. They demand from the performer varying degrees of CMS knowledge, as well as an understanding of how the search index works. Let's look at the simplest way to check a site for duplicate pages. Note right away that this method is not especially accurate, but it does not take much time.

To check your own resource for duplicates, you just need to enter a special query in the search engine's advanced search. The advanced version of Yandex search gives fairly detailed results, since it lets you enter refinement parameters for the query.

We need the resource's address and the fragment of text we want to check for duplication. Select a piece of text on the page, then enter the copied text and the site's address in the Yandex advanced search form. Now click the "Find" button, and the system starts searching.

The results will not be displayed in the usual mode: the list will contain only the titles and snippets of our resource. If the system returns a single result, there are no duplicates of this page. Several results mean there are duplicates to deal with.
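Typed straight into the search box, such a query might look like this (mysite.ru is a placeholder; the quotes request an exact match of the phrase, and site: restricts the search to your site):

    "a unique sentence copied from the page being checked" site:mysite.ru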

Now let's see how to do the same in Google. In essence, the procedure is no different; you perform the same actions as in Yandex.

Advanced search makes it easy to find all duplicates of a specific piece of text. Of course, this way we will not find duplicates of pages that do not contain that text. It must be said that if a duplicate was created by a broken template that shows on another page, for example, only a picture from the original, such a duplicate cannot be detected by the method described above. This calls for another method.

The second method is also simple. You need to use a special operator to request the indexed pages of your site, or of its individual pages, and then look through the results manually in search of duplicates.

The required query syntax is as follows:
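It is built on the same site: operator used in the Google examples above (mysite.ru is a placeholder):

    site:mysite.ru
    site:mysite.ru/page.html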

When only the address of the site's home page is entered into the search, we are shown the list of pages indexed by the search robot. When we specify the address of a specific page, the system displays the already indexed duplicates of that page.

The Yandex search engine provides the list of duplicates immediately, but in Google, to see the duplicates, you will additionally need to click "Show hidden results", since otherwise the original page is usually the only one displayed.

As you can see in the picture, the main results contain one page of the site, and it is the original. But the index holds other pages that are its duplicates. To see them, click the "Show hidden results" link. The result is a list where the original comes first, followed by the duplicates. These duplicates will often have to be cleaned out manually.

Today we will talk about duplicate content, or rather about methods for finding duplicate pages on your resource. The duplication problem is acute on the modern internet, because a site with duplicate pages risks being penalized by search engines.

So the first thing we need to understand is what content duplication (duplicate pages) is and what types of duplicates exist; then we will look at ways to deal with them.

Duplicate content is the display of the same text on different pages of a site (at different addresses). Duplicate pages on a site come in two types:

  • Full duplicates;
  • Incomplete (partial) duplicates.

Full duplicates are when one page fully displays the contents of another while having a different address, for example https://site/?&cat=10&product=25 and https://site/?product=25&cat=10.

Incomplete duplicates are a partial display of one page's text on another, for example a news feed in a blog or text in sidebars. They are most often found in online stores and on sites that publish announcements and news.

How to identify duplicate pages on the site.

Below are the methods used to identify duplicates. There is nothing complicated here; it just takes a little time and patience.

  1. Google webmaster panel;
  2. Yandex search results;
  3. Google search results;
  4. The page opens both with a trailing slash "/" and without;
  5. The page opens both with www and without www.

1. Let's start with the first method: go to your Google webmaster account. Go to the "Search Appearance" (or "Optimization") tab and choose "HTML Improvements". On this page you can find and view all duplicated meta descriptions and titles.

Google Webmaster determines duplicate pages on a site.

This method is great for detecting full duplicates; partial duplicates cannot be detected with it.

2. Next, let's see how duplicates can be found using Yandex search results. Go to the search engine and enter part of the page's text, wrapping it in "quotes" to get an exact match of the phrase.



Yandex - check duplicate pages

If only the one original page appears in the search results, that is excellent: there are no duplicates. If a couple of pages appear, there are duplicates that need to be removed.

3. Using the Google search engine, you can find duplicate pages on the site just as in Yandex, except that the query site:moysite.ru -site:moysite.ru/& is entered into the search bar, where moysite.ru is replaced with your site's address. If only one of your pages is found in the results, there are no duplicates; if there are several, measures must be taken against duplication.

4. Duplication can also be fought where the engine generates links that open both with a trailing slash "/" and without it. For a page such as https://website/?&cat=10&product=25, check whether the same address opens with a slash at the end: https://website/?&cat=10&product=25/. If it opens and is not redirected (via a 301) to the page above, it is a duplicate page. If it redirects, everything works fine and you don't have to worry.
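If the slash variant does open as a separate page, one common fix is an Apache mod_rewrite rule that strips the trailing slash with a 301; a sketch, assuming .htaccess and that no real directory URLs need the slash:

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.+)/$ /$1 [R=301,L]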

5. We determine the mirrors of the site's main page. Similarly to the method described above, try adding www in front of the site's address or removing it. If the site opens at both addresses, you have duplicates of the main page, and you need to glue them together by choosing the site's main mirror.
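Gluing the mirrors can also be done with a 301; a sketch of an .htaccess rule that forwards www to the bare domain (which mirror is the main one is your choice):

    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
    RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]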

Look for duplicate content on your resource, because it can lead to bad consequences. While Yandex is relatively tolerant of duplicates, Google punishes them severely and imposes filters. Duplicate pages are, roughly speaking, internet garbage, and search engines do not like garbage because it eats up a lot of resources. So I advise you to eliminate these problems before an article is even indexed by a search engine.