It contains instructions for search engine robots and lets you allow or disallow indexing of particular pages or directories. It also has other useful functions, which are applied through special directives. When configured correctly, this file plays an important role in how search engines index the resource.
Robots.txt is a text file located in the root directory of the site that contains instructions intended for search engines. This file allows you to restrict indexing of any sections and pages of your site, and to specify the path to the sitemap file and the site's primary mirror.
Closing a site's technical sections from indexing is a clear example of the usefulness of robots.txt. If the technical pages of your website are open for indexing, the search engine will try to throw these pages out of the index, and in doing so it may also exclude useful pages of the site.
To create it, you can use plain Notepad, and then place the file in the root directory of the resource. When a crawler visits your site, it reads this file of instructions first.
Two basic directives are customarily used to configure this file: User-agent and Disallow. The first determines which search engine robot must obey the indexing ban defined by the second directive. For example, the statement "User-agent: *" followed by "Disallow: /" disallows indexing of the entire site for all search engines. If the path to a directory or file is written after Disallow, the robot will stop indexing it. Remember not to list several paths on a single line, as a line written that way will not work. Conversely, if you want to open a directory or file for indexing, use the Allow directive.
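As an illustration, a robots.txt that blocks a hypothetical /admin/ directory for all robots while keeping one of its subfolders open could look like this (the paths are purely illustrative; note that some crawlers evaluate rules by specificity while others read them in file order, so placing Allow before the broader Disallow is the safer ordering):

```
User-agent: *
Allow: /admin/public/
Disallow: /admin/
```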
Robots.txt may also use additional directives. For example, Host is used when your site has several mirrors. This directive tells the robot which mirror of your site is the main one, and that mirror is what will appear in search engine results. Sitemap is a directive that helps search engine bots locate your sitemap. The Crawl-delay directive is used to create a delay between page downloads by search engines, which is very useful if the site has a large number of pages. The Request-rate directive adjusts how frequently search engines load the site's pages; for example, "Request-rate: 1/9" means the search engine will load no more than one page every 9 seconds. The Visit-time directive defines the period of time during which the robot is allowed to load pages; it is recommended to set this time in GMT.
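Putting these directives together, a sketch of such a file might look like the following (the domain and values are illustrative; keep in mind that Host, Crawl-delay, Request-rate, and Visit-time are non-standard extensions that only some search engines honor):

```
User-agent: *
Disallow: /tmp/
Crawl-delay: 5
Request-rate: 1/9
Visit-time: 0200-0600

Host: www.example.com
Sitemap: https://www.example.com/sitemap.xml
```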
The role of correct settings in the work of robots.txt
An incorrect configuration of this file can result in pages containing sensitive information about your clients or customers being exposed to search, or, vice versa, in useful and necessary pages of the resource being banned from indexing by search engines.
How do you verify that your settings are correct?
For this purpose, you can use a service such as the robots.txt analysis tool in Yandex.Webmaster. Type in the name of the domain you want to check, and the service will display all the errors it finds.
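Besides online services, you can also sanity-check your rules locally. As a minimal sketch, Python's standard urllib.robotparser module can parse robots.txt rules and answer whether a given page may be fetched (the rules and URLs below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt rules; in practice you would load the real file
# with rp.set_url("https://example.com/robots.txt") followed by rp.read().
rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Ask whether the generic robot ("*") may fetch specific pages.
print(rp.can_fetch("*", "https://example.com/admin/settings.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))           # True
```

This does not replace a full validation service, but it quickly shows whether a rule actually blocks or allows the paths you expect.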