Keyword clustering. Clustering the semantic core by search engine results

Query clustering sorts (splits) the list of queries in a semantic core (SC) into groups of similar queries, so that each group can be assigned to a site page and that page optimized for it.

How are queries clustered?

The tool analyzes the Yandex search results for each query and compares them with the results for the other queries in the list. If the same relevant pages appear in the top 10 for two queries, the queries are treated as similar and placed in one group. This means a single page can be optimized for all of them.

Query clustering threshold. This is the number of relevant pages that must coincide in the search results for different queries. Simply put, if you enter two queries into Yandex and their top 10 results share two identical pages (two out of ten), then with a "clustering threshold of 2" these two queries are placed in one group.
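The threshold rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the function names and the example URLs are made up, and the top-10 lists are assumed to be already collected.

```python
def shared_urls(top_a, top_b, depth=10):
    """Count the URLs that appear in the top `depth` results of both queries."""
    return len(set(top_a[:depth]) & set(top_b[:depth]))


def same_group(top_a, top_b, threshold=2):
    """Two queries go into one group when enough of their top URLs coincide."""
    return shared_urls(top_a, top_b) >= threshold


# Made-up top-10 lists for two queries; only the first two URLs coincide.
top_a = [f"https://site{i}.example/" for i in range(10)]
top_b = top_a[:2] + [f"https://other{i}.example/" for i in range(8)]

print(shared_urls(top_a, top_b))              # number of coinciding pages
print(same_group(top_a, top_b, threshold=2))  # grouped at threshold 2
print(same_group(top_a, top_b, threshold=3))  # not grouped at threshold 3
```

With two coinciding pages the queries are grouped at threshold 2 but kept apart at threshold 3.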

Cons of manual query grouping

Grouping keywords, also known as a breakdown, is performed by SEO specialists immediately after the keywords are collected.

It is hard to judge similarity by hand when there are many queries. You either have to enter each query into the search engine or rely on intuition and experience, which can backfire during promotion and fail to give the needed results.
High cost, which comes from how long the process takes. A high-quality breakdown of a 500-query semantic core takes 4 to 16 hours on average. You have to read each query, decide which group it belongs to (while keeping all existing groups in your head) and, if necessary, double-check it in the search engine or in services... brrr.

Pros of automatic query grouping

The speed of the breakdown is roughly the speed of sound. The system checks the search results for each query, compares them, and lets you correct any minor exceptions by hand, after which the result can be exported to a CSV file (for Excel).
Accuracy comes from eliminating the human factor. A person can get distracted, lose the thread, forget something, overlook something or simply fail to do the breakdown correctly; a program has no such difficulties.
The tool is provided completely free of charge. It does not need a monthly salary, holidays or sick leave, and it has no working hours: it works 24/7.

The breakdown is a very important step in promotion: it sets the optimization goals for each page of the project and for the site as a whole.

Today we will talk about an important step in preparing content for a site: clustering the semantic core. This means splitting the keywords from the core into groups so that each group corresponds to its own page. Right after collection, the keywords for a project are just a list without any structure or hierarchy. The list contains both queries that are very similar to each other and queries that differ significantly in meaning.

Clustering groups queries for maximum relevance to a single page. It can be done both by hand and automatically, using the various services available online. Let's look at all the techniques that can be used to distribute and group queries across the pages of a site.


Manual clustering of the semantic core

Although it is laborious and takes a lot of time, manual grouping gives the highest quality. It suits small projects where the number of keywords is not too large. If there are several thousand keywords, it is better to run automatic clustering first and then refine the results by hand.

It is done simply: keywords are collected into separate groups by meaning. To make this clearer, here is an example. Suppose keywords have been collected for an informational project about infants. You need to split the entire semantic core into groups by individual illness, so that a dedicated article can be written about each illness and its treatment.

From the mass of keywords collected for the query "infants", we select those that contain the word "tremor" and get a keyword cluster with all the queries about tremor in infants. These are the keywords to use when writing the article about that illness.
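The same selection is easy to reproduce outside a spreadsheet. Below is a minimal Python sketch; the keyword list is invented for illustration, and a simple case-insensitive substring check stands in for whatever filter you use in Excel or Google Sheets.

```python
def pick_cluster(keywords, word):
    """Return the keywords that contain `word` (case-insensitive substring)."""
    needle = word.lower()
    return [kw for kw in keywords if needle in kw.lower()]


# Invented sample of keywords collected for the query "infants".
keywords = [
    "tremor in infants",
    "infant chin tremor causes",
    "colic in infants",
    "how to treat tremor in a newborn",
]

tremor_cluster = pick_cluster(keywords, "tremor")
print(tremor_cluster)
```

A substring check also catches word forms such as "tremors", which is usually what you want when collecting a cluster by hand.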

For manual grouping, an ordinary Excel or Google spreadsheet is enough to sort, filter and highlight the needed rows and words. There are also free services that make manual grouping easier. One example is Keyword Assistant, which lets you pick the needed keywords from the general list in a few clicks and put them into a group.

Automatic query clustering - online services

Automatic clustering follows fixed algorithms and does the same thing a person does. Its advantages are speed, thousands of times faster than manual work, and the analysis of keywords and their positions in the search results.

Its main drawback is that the algorithms have no logical thinking at all, so queries are often placed in the wrong groups. Clearly related queries may also fail to land in one group. For example, the queries "how to dress a child for low temperatures outside" and "a temperature of 39 in a child" belong to different groups, but the algorithms often merge them into one.

Either way, after automatic clustering the semantic core has to be refined by hand and brought to its final form. The further optimization of the site depends directly on the quality of the grouping.

For automatic work I recommend Rush Analytics, a powerful tool for the optimizer. You just add all the keywords and the program groups them as quickly as possible. The only drawback is that the service is quite expensive, so for one-off use it is better to find an optimizer with a subscription. For a hundred rubles or so, he will run your keywords through the service.

The following resources are relevant: SEO Intellect, Keyassistant.

Features of clustering semantics for commercial sites

With queries for informational projects everything is clear: the stop words there are all commercial phrases containing words such as "buy", "order" and so on. With commercial projects things are less clear-cut. It is worth spending a little time on how best to group keywords, for example, for an online store.

For example, an electronics store is going to sell TVs. There are many queries with the word "TV", and we need to cluster them. All commercial queries, such as "buy Samsung TV" or "buy TV 43 inch diagonal", are distributed into their own clusters: by brand, diagonal or other properties.

Informational queries such as "how to choose a good TV" or "which TV is good in 2017" are sorted separately, and in the site structure we plan a blog section where we tell users how to choose a TV or explain the advantages of a particular feature. This way we can attract additional traffic from informational queries as well.
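A first rough split into commercial and informational queries can be automated before the manual pass. The sketch below is an assumption-laden illustration: the marker words are just the two mentioned above ("buy", "order"), and real projects will need a longer list.

```python
# Commercial marker words; only "buy" and "order" come from the text above.
COMMERCIAL_WORDS = {"buy", "order"}


def split_by_intent(queries):
    """Split queries into (commercial, informational) by marker words."""
    commercial, informational = [], []
    for query in queries:
        words = set(query.lower().split())
        if words & COMMERCIAL_WORDS:
            commercial.append(query)
        else:
            informational.append(query)
    return commercial, informational


queries = [
    "buy Samsung TV",
    "buy TV 43 inch diagonal",
    "how to choose a good TV",
    "which TV is good in 2017",
]

commercial, informational = split_by_intent(queries)
print(commercial)     # goes to catalog clusters (brand, diagonal, ...)
print(informational)  # goes to the blog section
```

The commercial list is then clustered by brand, diagonal and other properties, while the informational list feeds the blog section.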

As you can see, clustering the semantic core is easy; it just takes a fair amount of time. But it is one of the pillars of preparing a project for launch.



Keyword clustering is the automated distribution of queries into groups based on search engine results.

The Rush Analytics clustering algorithm collects the top 10 URLs from Yandex or Google for each of your keywords, compares the results for every keyword, and groups the queries exactly the way they can be promoted successfully in search engines and the way it is convenient and logical to create pages on the site.

In Rush Analytics, clustering can be done with two methods: Soft and Hard.

After the queries are processed, you get a site structure that is almost ready and correctly formed from the search engines' point of view. And based on the frequency data for each group of keywords, you can easily decide whether to create additional pages on the site.


FAQ on clustering: the most frequent questions of our users

Clustering is the grouping of keywords based on a comparison of search engine results. The algorithm collects the top 10 URLs for your keywords, compares the results for every keyword, and groups the queries exactly the way they can be promoted successfully in search engines and the way it is convenient and logical to create pages on the site.

You need to upload to Rush Analytics a list of keywords with their frequency (of any type), or mark some keywords as main (marker queries) and the rest as ordinary.
To use the combined clustering algorithm, you need both the frequency and the marker flags. Read about it a little further down.

Clustering accuracy specifies how many common URLs the search results for two queries must share for the queries to be merged into one group.
In other words, the higher the clustering (grouping) accuracy, the more similar the phrases that fall into one group (cluster).
For most topics, accuracy = 5 is quite enough.

Each topic has its own necessary and sufficient threshold of search result similarity for producing a high-quality semantic core. For example, when promoting online stores it is a big problem if the keywords "Redmond RX500 multicooker" and "Redmond RX500-1 multicooker" fall into one cluster, because these are different products and they must be promoted on different product cards. Here we recommend accuracy = 5.

If the site's traffic is mostly Russian and comes from Yandex, it is best to cluster by Yandex, choosing the region in which the site is promoted.
You can also use both search engines and then compare the results. The results are often very similar across search engines.
If you are promoting a site for other markets, clustering is already available for all regions and languages of the worldwide search results.
Soon we will add country and city selection for clustering by Google.com results. If you are interested in this feature, vote in our community and it will appear much sooner.

Yes, clusters can be merged by hand, and sometimes it is even necessary.
When can you merge two clusters into one?
Keywords such as "buy Redmond multicookers" and "Redmond multicooker" can often fall into different clusters because of poor-quality Yandex and Google results for these queries.
In this case you need to merge the clusters into one and promote them to the Redmond multicookers page. This is a completely normal situation.
When should you not merge two clusters into one?
When one cluster contains informational queries and the other commercial ones. For example, the clusters "buy Redmond multicookers" and "Redmond multicooker review" cannot be merged, because these queries have to be promoted on fundamentally different pages.
I am not sure whether to merge two clusters or not. What should I do?
We explain in detail what to do in this case in our manual.

Because the words on the "Unclustered" tab did not find a pair to form a cluster. Unfortunately, not all keywords can be grouped, because not all of them are related to each other.
We are guided first of all by how keywords will be promoted (ranked), and we group them on the basis of similar search results.
For example, the queries "mobile phone" and "cell phones" have to be promoted on different pages. One query is informational and the other is commercial, so they will never be promoted on the same page.
What to do with unclustered queries?
If you consider these keywords valuable, you can add them by hand to existing groups (they may have failed to cluster because of poor search results) or create separate pages on the site for them.

Before clustering, all phrases containing stop words are excluded from the list. That is, such keywords are not used in clustering and are discarded even before the queries are compared.
We recommend this option if you upload a "dirty" list of keywords to a clustering project. It helps save the clustering budget and removes the tedious manual cleaning of stop words in Excel. You can use our ready-made stop-word lists for geo queries and various topics, or create your own list.

Step-by-step algorithm for working with the service:

Creating a project. To create a project, go to the Clusterization tab and click "Create new project".

Step One: Search Engine and Region.
Here you need to enter the project name (required field). Any name will do; it is often convenient to use the site name so that the project is easy to find later.
Next, specify the search engine on which the grouping will be based. You can choose either Yandex or Google.
For Google, all regions and languages of the world are currently available.
Step Two: Collection Settings

All about our clustering algorithms
Clustering method:
- Soft clustering: in this method the algorithm determines the central (marker) queries and compares all other queries with them. It works well for clustering keywords for traffic-oriented projects: online stores, informational sites, and service sites with low competition.
- Hard clustering: queries are merged into a group only if there is a set of URLs common to all queries in the group. This type of clustering groups fewer keywords, but with very high accuracy. Ideal for competitive high-frequency queries.
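The difference between the two methods is easiest to see on a toy example. The sketch below is a simplified illustration under stated assumptions, not the Rush Analytics implementation: the function names, the greedy order of processing and the single-letter "URLs" are all invented, and the markers for the soft method are passed in by hand.

```python
def soft_cluster(serps, markers, accuracy):
    """Attach each query to the marker whose top shares the most URLs with it."""
    clusters = {m: [m] for m in markers}
    unclustered = []
    for query, urls in serps.items():
        if query in clusters:
            continue
        best = max(markers, key=lambda m: len(set(urls) & set(serps[m])))
        if len(set(urls) & set(serps[best])) >= accuracy:
            clusters[best].append(query)
        else:
            unclustered.append(query)
    return clusters, unclustered


def hard_cluster(serps, accuracy):
    """Add a query to a group only if ALL group members still share enough URLs."""
    clusters = []  # each item: {"queries": [...], "urls": URLs common to all members}
    for query, urls in serps.items():
        for cluster in clusters:
            common = cluster["urls"] & set(urls)
            if len(common) >= accuracy:
                cluster["queries"].append(query)
                cluster["urls"] = common
                break
        else:
            clusters.append({"queries": [query], "urls": set(urls)})
    return [c["queries"] for c in clusters]


# Toy tops: both other queries overlap with "tv", but not with each other.
serps = {
    "tv": ["a", "b", "c", "d", "e", "f"],
    "buy tv": ["a", "b", "c", "x", "y", "z"],
    "tv price": ["d", "e", "f", "u", "v", "w"],
}

soft_groups, soft_rest = soft_cluster(serps, markers=["tv"], accuracy=3)
hard_groups = hard_cluster(serps, accuracy=3)
print(soft_groups, soft_rest)
print(hard_groups)
```

Here the soft method puts all three queries into one group because each shares three URLs with the marker, while the hard method keeps "tv price" apart because no set of three URLs is common to all three queries.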
Type: selection of the clustering algorithm.
We have 3 clustering algorithms:
- Clustering with manual markers
- Clustering by WordStat
- Combined clustering algorithm (manual markers + WordStat)
They work on the same basic principle, comparing the similarity of search engine tops, but are intended for somewhat different tasks.

Algorithm using manual markers:

This algorithm is most effective when you have a finished and fairly branched site structure (catalog), you know all the markers, and you only need to understand which queries to promote the existing pages for, with no task of expanding the site structure. In this case you take your markers (names of categories / pages), collect search suggestions for them, mark the markers as 1 and the collected cloud as 0, and send everything to clustering. At the output you get ready-made semantics for your categories, and the words that did not attach to your structure remain unclustered.
Data upload format: Keyword | Marker (1/0) - Download an example input file

WordStat clustering algorithm

This algorithm solves the opposite task to the manual markers algorithm: you do not yet know the structure of your site and cannot pick out the markers; you have simply collected WordStat data, search suggestions and their frequencies. Now you need to structure this semantics to get groups of queries for the pages of a future site or for future categories of an existing site. The WordStat clustering algorithm fits this case perfectly. It works as follows.
The entire keyword list is sorted by frequency in descending order, the algorithm tries to bind every possible word from the list to the most frequent word and forms a cluster, and then the same is repeated iteratively for the following keywords.
Do not worry that on the first pass a keyword may be attached to the wrong cluster: we use machine learning algorithms built on binary trees to prevent this :)
Data upload format: Keyword | Frequency (any) - Download an example input file
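The frequency-driven pass described above can be sketched as a simple greedy loop. This is an illustration only: the function name, the sample tops and the frequencies are invented, and the machine-learning correction step mentioned above is not modelled at all.

```python
def wordstat_cluster(serps, freq, accuracy):
    """Greedy pass: the most frequent remaining query becomes a cluster top."""
    remaining = sorted(serps, key=lambda q: freq[q], reverse=True)
    clusters = {}
    while remaining:
        head = remaining.pop(0)
        head_urls = set(serps[head])
        # Bind every remaining query whose top shares enough URLs with the head.
        members = [q for q in remaining
                   if len(head_urls & set(serps[q])) >= accuracy]
        clusters[head] = [head] + members
        remaining = [q for q in remaining if q not in members]
    return clusters


# Invented tops (short lists instead of a real top 10) and frequencies.
serps = {
    "multicooker": ["u1", "u2", "u3", "u4"],
    "buy multicooker": ["u1", "u2", "u3", "x"],
    "multicooker recipes": ["r1", "r2", "r3", "r4"],
    "recipes for multicooker": ["r1", "r2", "r3", "y"],
}
freq = {
    "multicooker": 1000,
    "buy multicooker": 400,
    "multicooker recipes": 700,
    "recipes for multicooker": 100,
}

clusters = wordstat_cluster(serps, freq, accuracy=3)
print(clusters)
```

In this sketch a query that matches nothing ends up as a cluster of one; in the service such words would presumably go to the unclustered list instead.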

The combined algorithm (manual markers + WordStat) combines the approaches of the two previous methods.

This algorithm suits the task of simultaneously selecting keywords for the existing site structure and expanding it. It works as follows: first we try to bind all possible queries to your marker queries and form a finished structure tied to your markers. Then all queries that were not tied to markers are sorted by descending frequency and grouped among themselves. As a result, you get:
a) Ready-made semantics for the existing site categories
b) An extension of the semantics for your site.
We strongly recommend the combined algorithm: it gives the best result.
Data upload format: Keyword | Marker (1/0) | Frequency - Download an example input file
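The two passes of the combined approach can be sketched as follows. The input rows mirror the Keyword | Marker (1/0) | Frequency format; everything else (function names, sample data, the greedy second pass) is an assumption made for illustration, not the service's real code.

```python
def overlap(urls_a, urls_b):
    """Number of URLs two tops have in common."""
    return len(set(urls_a) & set(urls_b))


def combined_cluster(rows, serps, accuracy):
    """rows: (keyword, marker_flag, frequency) tuples; serps: keyword -> URLs."""
    markers = [kw for kw, flag, _ in rows if flag == 1]
    freq = {kw: f for kw, _, f in rows}
    clusters = {m: [m] for m in markers}
    leftovers = []

    # Pass 1: bind queries to the manually marked queries.
    for kw, flag, _ in rows:
        if flag == 1:
            continue
        best = max(markers, key=lambda m: overlap(serps[kw], serps[m]), default=None)
        if best is not None and overlap(serps[kw], serps[best]) >= accuracy:
            clusters[best].append(kw)
        else:
            leftovers.append(kw)

    # Pass 2: group the leftovers among themselves by descending frequency.
    leftovers.sort(key=lambda kw: freq[kw], reverse=True)
    while leftovers:
        head = leftovers.pop(0)
        members = [kw for kw in leftovers
                   if overlap(serps[head], serps[kw]) >= accuracy]
        clusters[head] = [head] + members
        leftovers = [kw for kw in leftovers if kw not in members]
    return clusters


rows = [
    ("multicooker", 1, 1000),
    ("buy multicooker", 0, 400),
    ("multicooker recipes", 0, 700),
    ("recipes for multicooker", 0, 100),
]
serps = {
    "multicooker": ["u1", "u2", "u3", "u4"],
    "buy multicooker": ["u1", "u2", "u3", "x"],
    "multicooker recipes": ["r1", "r2", "r3", "r4"],
    "recipes for multicooker": ["r1", "r2", "r3", "y"],
}

clusters = combined_cluster(rows, serps, accuracy=3)
print(clusters)
```

The first cluster corresponds to an existing category (the marker), and the second one is the extension of the structure found among the leftover queries.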
All you need to know about clustering accuracy
Accuracy: the higher the clustering (grouping) accuracy, the more similar the phrases that fall into one group (cluster).
In other words, this option sets how many common URLs in the search engine's top 10 are needed for keywords to fall into one cluster.
Each topic has its own necessary and sufficient threshold of search result similarity for producing a high-quality semantic core. For example, when promoting online stores it is a big problem if the keywords "Redmond RX500 multicooker" and "Redmond RX500-1 multicooker" fall into one cluster, because these are different products and they must be promoted on different product cards. Here we recommend accuracy = 5.
For informational topics, for example discount or recipe sites, such accuracy is not needed: the task there is to get the maximum number of grouped clusters for writing articles. For such sites we recommend an accuracy of 3 or 4. For sites in very competitive topics, where the fight for the top is mainly over competitive high-frequency (HF) queries, we recommend an increased clustering accuracy of 6 or 7 and separate pages for the queries that did not cluster.
We recommend trying values from 3 to 6 and checking in the results which one gives enough completeness and accuracy for your semantics. The higher the accuracy value, the smaller the groups.
Other clustering settings
Do not cluster if the frequency is less than: this option lets you skip keywords whose frequency is below the specified value. It saves you from manually cleaning out unpopular queries: such words are placed on the "Unclustered" tab.
Determining the relevant URL for clusters of an existing site
Just enter the desired domain and our algorithms will try to determine the relevant URL for the resulting clusters.
The option works as follows: if your site is already in the top 10, we show this URL and highlight it in green. Otherwise, we select the URL for the marker query using the site: operator.
IMPORTANT: relevant URLs are selected for the marker (main) queries of the clusters and are assigned to the whole cluster (all of its keywords).
Step Three: "Keywords and price".
Upload a file with queries.
Supported formats: XLS, XLSX. Data format: query; marker or frequency. For clustering with WordStat + manual markers the data format is: query; marker; frequency.
Entering stop words
Before clustering, phrases containing stop words are excluded from the list. This helps save the clustering budget and removes the need to clean out stop words by hand. It is especially useful if you cluster a "dirty" keyword list that has not been cleaned beforehand.
You can use our ready-made stop-word lists for geo queries and various topics, or create your own list. And do not forget about the "Expert options": by default, character matching is used, i.e. a partial occurrence of the stop word removes the whole word / phrase; if you need the stop word to match exactly, choose phrase matching.
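The difference between the two matching modes can be shown with a short Python sketch. It is only an approximation of the behaviour described above: the function name and sample keywords are invented, partial matching is modelled as a substring check and exact matching as a whole-word regular expression.

```python
import re


def remove_stop_words(keywords, stop_words, exact=False):
    """Drop keywords that contain a stop word.

    exact=False: partial (substring) match, as in the default mode.
    exact=True:  the stop word must match as a whole word or phrase.
    """
    kept = []
    for kw in keywords:
        text = kw.lower()
        if exact:
            hit = any(re.search(r"\b" + re.escape(sw.lower()) + r"\b", text)
                      for sw in stop_words)
        else:
            hit = any(sw.lower() in text for sw in stop_words)
        if not hit:
            kept.append(kw)
    return kept


keywords = ["free multicooker recipes", "freezer repair", "buy multicooker"]

partial = remove_stop_words(keywords, ["free"])
exact = remove_stop_words(keywords, ["free"], exact=True)
print(partial)
print(exact)
```

With partial matching the stop word "free" also removes "freezer repair"; with exact matching only the phrase that contains the separate word "free" is removed.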

Click "Create a new project" and your project is sent to clustering!

Now you can track the project status on the "Queue" tab or in the list of clustering projects.
At the moment, Rush Analytics has 5 statuses:
Queue - the data has not been collected yet; the project is waiting for its turn
Data collection - a counter shows how many keywords have been processed
Clustering - the project data has been collected; the system is calculating all the metrics needed to give you the result
On pause - you can pause the project manually if you are not sure you want to collect it. The project can also be paused automatically if you have run out of money on your balance.
Ready - the project is ready; you can view the results in the web interface or download them in XLSX format

Clustering output file - column description

The result of clustering in XLSX format is as follows:

Cluster tops - the marker queries, either specified by you manually or determined by the system
Cluster name - taken from the name of the marker query
Cluster size - the number of keywords in the group
Keyword frequency - the frequency you set at the "Keywords" step. Depending on which frequency you used (basic, in quotes, or with an exclamation mark), the clustering results may differ slightly
Total cluster frequency - the sum of the frequencies of all keywords in the cluster
Top matches - the number of URLs that the search results for this query share with the results for the reference (marker) query
Highlights - words highlighted in the search results collected for your keyword
Highlights for cluster - highlights without duplicates across all words of this cluster
TOP URL - the most visible competitor URL across all queries of the cluster. Here we evaluate how often competitor URLs occur in the search results for each query and the position of each competitor URL in the results
Relevant URL - the relevant URL found for the cluster, if the "Determine relevant URLs" option was selected
The option works as follows: if your site is already in the top 10, we show this URL and highlight it in green. Otherwise, we select the URL for the marker query using the site: operator.
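To make the TOP URL column more concrete, here is one possible way to score competitor URLs by both how often and how high they appear. The exact formula used by the service is not stated in this description, so the scoring below (11 minus the position, summed over all queries of the cluster) is purely an assumption for illustration, as are the function name and the sample data.

```python
from collections import defaultdict


def top_url(cluster_serps, depth=10):
    """Pick the URL with the best combined frequency/position score.

    Assumed scoring: each appearance adds (depth + 1 - position) points,
    so a first place gives 10 points and a tenth place gives 1.
    """
    scores = defaultdict(int)
    for urls in cluster_serps.values():
        for position, url in enumerate(urls[:depth], start=1):
            scores[url] += depth + 1 - position
    return max(scores, key=scores.get)


# Invented tops for the three queries of one cluster.
cluster_serps = {
    "buy multicooker": ["a.example", "b.example", "c.example"],
    "multicooker price": ["b.example", "a.example", "d.example"],
    "multicooker shop": ["b.example", "e.example", "a.example"],
}

best = top_url(cluster_serps)
print(best)
```

Both a.example and b.example appear for every query, but b.example ranks higher more often, so it gets the better score.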

You can see examples of ready-made files after clustering in our portfolio.

Keyword clustering is the automatic distribution of queries into thematic clusters (groups) based on the similarity of Yandex or Google search results. Clustering is done to solve the following tasks:

To understand which queries need to be promoted together on one page and which separately
To turn the mess of a huge number of queries in the semantic core into a clear and logical structure
To bind whole groups of queries to existing pages of the site right away and make promotion as efficient as possible.

This method of grouping queries appeared on the market fairly recently but has already become very popular. What are its advantages?

Advantages of clustering by the method of tops

Unambiguous identification of the queries that must be promoted on one page and, conversely, of the queries that will never rank for one page despite their similarity
Accounting for synonyms and rephrasings: when grouping by the method of tops, queries such as "overalls", "clothing for workers" and "workwear" are matched and clustered correctly and are not lost

Huge speed of grouping the queries of the semantic core. Unlike manual sorting or sorting with templates in Excel, clustering by the method of tops takes a few minutes rather than hours, days or weeks

Disadvantages of clustering by the method of tops

When the search results for a query, or for the topic as a whole, are of low quality (many irrelevant answers, many forums, doorway pages, etc.), the quality of clustering declines proportionally
This grouping method is hard to implement: it needs a complex multistage algorithm and a lot of data collected from the search results

Clustering queries in Rush Analytics

When creating the clustering module in Rush Analytics, we tried to make it as flexible and convenient as possible, so that our solution suits any task and any topic. Namely:

High speed of collection and grouping. Clustering a semantic core takes from a few seconds to a few minutes, depending on its size
Adjustable grouping accuracy - depending on the quality of the search results in your topic and other factors, you can choose the appropriate clustering accuracy, from 3 to 8
Three clustering algorithms:

A) By WordStat - the highest-frequency queries become the tops of the clusters (the queries to which the rest are attached). Great for informational topics.

B) By markers - you choose the marker queries that will be the tops of the clusters yourself. Great for stores where product demand prevails.

C) Hybrid algorithm - markers are specified manually and the queries are grouped. For the queries that could not be bound by the first method, cluster tops are selected automatically by WordStat and clustering is repeated. This method achieves maximum accuracy and completeness. Suitable for any project

A simple and understandable interface that both newcomers and experienced specialists can figure out.
Responsive support service. If you have technical problems or simply need help with any question about clustering, collecting search suggestions or WordStat, our support will be happy to help.

Keyword grouping (also called query clustering) is the division of keywords into homogeneous groups by certain features. It is done to increase the relevance between keywords and ads. As a result, the quality of advertising campaigns improves and the cost per click goes down.

When preparing advertising campaigns, keyword grouping takes quite a long time. Some specialists use the "1 keyword - 1 ad group" approach, which removes the need for grouping and saves time. But it is not always convenient, especially with a large number of low-frequency keywords. If the keywords have been clustered, that is, divided into semantic groups, the structure of the advertising campaign becomes clearer and easier to analyze.