As Google evolves, it provides a number of features that restrict or permit the crawling and indexing of a website by its bots. We have already seen, in a separate hub, how the robots.txt file is used to this end. We are also aware of the ‘noindex’ robots meta tag, which is placed at the page level to keep that page out of the index (it does not by itself stop the page from being crawled). We are also aware of the rel=“canonical” attribute, which does not redirect bots but tells them which URL is the preferred version of a page, so that the duplicate versions are less likely to be indexed. All these measures are used in different circumstances and for different ends. However, they are practical only when you have FTP access to the website, the URLs do not carry too many parameters, and the website is easy to manage.
Besides these commonly used measures, Google also provides an additional tool in its Webmaster Tools, the URL Parameters tool, which can be used effectively for handling pages that have URL parameters. You can find it under the Crawl section.
Be careful while using this tool, because it restricts crawling, wholly or partly, for URLs containing the parameters you add. Using this tool does not redirect bots. Its most important benefit is that it lets you handle multiple parameters on a site even when you do not have FTP access.
When the Content Is the Same on Pages Having the Same URL Parameter
When you begin to add a parameter, the tool asks whether the pages containing this parameter have the same content or different content. If the content is the same, you just have to select No and save, because the parameters on these pages are essentially used for tracking visits or referrals. Restricting bots' access to these URLs would jeopardize tracking, which is crucial for management information systems.
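To see why such URLs count as the same page, here is a minimal Python sketch that strips tracking-only parameters and compares what is left. The parameter names in TRACKING_PARAMS and the example.com URLs are assumptions made up for illustration; the sketch does not reproduce anything Google runs internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking-only parameters; replace with the ones your site uses.
TRACKING_PARAMS = {"ref", "utm_source", "sessionid"}


def strip_tracking(url: str) -> str:
    """Return the URL with tracking-only parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


a = strip_tracking("http://www.example.com/shoes?ref=newsletter")
b = strip_tracking("http://www.example.com/shoes?ref=partner-site")
print(a == b)  # True: both reduce to http://www.example.com/shoes
```

If two URLs become identical once the parameter is removed, the parameter is a likely candidate for the No answer.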
When the Content Is Different on Pages Having the Same URL Parameter
However, there are other types of pages that share a single URL parameter but where the content is not the same on every page. In this scenario, the URL Parameters tool offers webmasters four options. These are:
1. Let Google Bot Decide
2. Every URL
3. Only URLs with value=x
4. No URLs
‘Let Google Bot Decide’ is the option to select when you are not sure how the pages with a URL parameter should be handled. In this scenario, Google decides how to treat the parameter.
‘Every URL’ should be selected to make the bots understand that every URL having a particular parameter is unique. When you make the bots treat these pages as unique, you must ensure that the content on them really is different. If the content is the same and the bots treat the page URLs as unique, you have a case of duplicate content, which might adversely affect your rankings. This option is not meant to permit or restrict crawling, only to tell bots that the URLs are unique.
‘Only URLs with value=x’ allows Google bots to crawl only those pages where the parameter's value matches the value ‘x’ entered by the administrator. This option was developed for the typical situation where the content is the same but the parameter is used to sort the order in which it is displayed. You might want Google bots to crawl only one sort order of the page and skip the others.
When a parameter is entered under the ‘No URLs’ category, it is a clear indication to the Google bots that no URL containing this parameter should be crawled. This option should be used carefully. It is most useful for long-tail URLs with multiple parameters, when there is a need to restrict crawling to the last one or two parameters only.
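The three explicit options can be modeled as a small decision function. The Python sketch below is only a simplified illustration: the setting names ("every_url", "only_value", "no_urls"), the function may_crawl, and the example URLs are all hypothetical, and ‘Let Google Bot Decide’ is left out because it depends on Google's own judgment.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def may_crawl(url: str, param: str, setting: str, value: Optional[str] = None) -> bool:
    """Illustrative model of one parameter setting applied to one URL."""
    found = parse_qs(urlsplit(url).query).get(param)
    if found is None:
        return True  # parameter absent: this setting says nothing about the URL
    if setting == "every_url":
        return True  # each value is treated as a distinct page
    if setting == "only_value":
        return value in found  # crawl only when the chosen value is present
    if setting == "no_urls":
        return False  # never crawl URLs carrying this parameter
    raise ValueError(f"unknown setting: {setting}")


base = "http://www.example.com/list"
print(may_crawl(base + "?sort=price", "sort", "only_value", "price"))  # True
print(may_crawl(base + "?sort=name", "sort", "only_value", "price"))   # False
print(may_crawl(base + "?sort=name", "sort", "no_urls"))               # False
```

A URL that does not contain the parameter at all is unaffected by its setting, which is why the function returns True in that case.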
Cases of Multiple URL Parameters
There are instances when a URL has more than one parameter, often three or more. In these cases, you might already have used this tool on one or more of the parameters appearing in the URL. For pages whose URLs contain several of these parameters, the most restrictive setting takes precedence over the less restrictive ones.
I will use the same example that Google gives, but make it easier to follow.
Suppose there are three parameters, each with its own setting:
- shopping-category (Every URL)
- sort-by (Only URLs with value = production-year)
- sort-order (Only URLs with value = asc)
Now, there are two URLs:
On the basis of these three settings, the bots will crawl the first URL and not the second, because the second setting tells bots to crawl only URLs that have ‘sort-by=production-year’, not URLs sorted by size.
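The same reasoning can be checked with a short Python sketch. The two URLs below are hypothetical stand-ins built from the parameter names above (one sorted by production year, one by size), and the RULES table is a simplified model of the three settings rather than Google's actual logic.

```python
from urllib.parse import parse_qsl, urlsplit

# Required value per parameter; None stands for "Every URL" (any value is fine).
RULES = {
    "shopping-category": None,
    "sort-by": "production-year",
    "sort-order": "asc",
}


def crawlable(url: str) -> bool:
    """A URL is crawled only if no setting rejects it (most restrictive wins)."""
    params = dict(parse_qsl(urlsplit(url).query))
    for name, required in RULES.items():
        if name in params and required is not None and params[name] != required:
            return False
    return True


# Hypothetical URLs for illustration only.
first = "http://www.example.com/search?shopping-category=dvd-players&sort-by=production-year&sort-order=asc"
second = "http://www.example.com/search?shopping-category=dvd-players&sort-by=size&sort-order=asc"

print(crawlable(first))   # True
print(crawlable(second))  # False: sort-by is "size", not "production-year"
```

Because a single failing setting is enough to reject a URL, the strictest setting effectively decides the outcome.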
Web development companies should ideally use distinct names for URL parameters when developing complex websites, since there may be a need to control the crawling behavior of Google bots using this tool.