Every website owner should take search engine optimization seriously. Because search engines keep refining their crawling strategies and growing more sophisticated, SEO can be a tricky task. Some methods, such as keyword usage and on-page SEO, are familiar to many; other techniques are more obscure, yet equally important. In this article we look at robots.txt, one of the most controversial SEO tools.
What Is Robots.txt
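Simply put, robots.txt is a communication mechanism between your site and the search engines. It sets out the rules for crawling your website and presents them to the web crawler. In other words, the file defines which parts of your site a crawler such as Googlebot is allowed to visit and which it should avoid. Keep in mind that it controls crawling rather than indexing: a disallowed URL can still show up in search results if other pages link to it.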
It is important to remember, though, that while most reputable search engines recognize and honor the directives in the robots.txt file, some malicious or low-quality crawlers may disregard them completely.
Now, if you are wondering whether robots.txt is an absolute necessity, the answer is no. Your site will be indexed by the search engines whether you have one or not. Moreover, WordPress has its own virtual robots.txt. However, it is strongly recommended to create this file for your site and place it in your site’s root folder.
How to Create One for WordPress
Creating a robots.txt file is much simpler than you might think. All you have to do is create a plain-text file, name it robots.txt, and upload it to your site’s root directory. Writing its content is not difficult either. There are six directives you are likely to come across in robots.txt:
- User-Agent – names the search engine crawler that the rules that follow apply to;
- Disallow – prohibits the crawler from crawling particular files or directories on your site;
- Allow – specifically allows crawling an item on the site;
- Sitemap – points the crawler to the sitemap;
- Host – designates the preferred domain for a site with more than one mirror (a non-standard directive associated with Yandex; Google does not support it);
- Crawl-delay – specifies the interval, in seconds, between a crawler’s requests to your server (also non-standard; Googlebot ignores it).
As for the user-agent names, the complete list can be found on this site. Note that if you want to set the same rules for all search engines, use an asterisk (*) as the value of the User-Agent directive.
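To see how these directives fit together in a file, here is a minimal Python sketch that assembles robots.txt content from a list of rules. The function name build_robots_txt, the example.com URL, and the /private/ path are made up for illustration; this is just one possible way to generate the file, not something WordPress does for you.

```python
def build_robots_txt(groups, sitemaps=()):
    """Build robots.txt text.

    groups   -- list of (user_agent, [(directive, path), ...]) tuples
    sitemaps -- iterable of absolute sitemap URLs
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            # One Allow/Disallow rule per line.
            lines.append(f"{directive}: {path}")
        lines.append("")  # blank line separates one group from the next
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # Strip trailing blank lines and end the file with a single newline.
    return "\n".join(lines).rstrip("\n") + "\n"


# Hypothetical rules: block /private/ for every crawler and list one sitemap.
content = build_robots_txt(
    groups=[("*", [("Disallow", "/private/")])],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(content)
```

Saving the returned text as robots.txt in the site’s root directory gives you the file described above.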
Here are a few other general rules for composing the robots.txt file:
- If you wish to define different rules for specific bots, do so at the beginning of the file.
- Use a forward slash (/) on its own to allow or disallow all directories.
- Put each path you wish to allow or disallow on a separate line.
- Keep the file clean: comments (lines starting with #) are permitted, but avoid stray leading spaces and blank lines within a group of rules.
- Remember that the paths to directories are case-sensitive.
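The rules about bot-specific groups and case sensitivity can be tried out with Python’s standard-library urllib.robotparser module. The bot names, paths, and the example.com domain below are placeholders, and the parser’s matching is a simplified model that may differ from how a particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: a group for one specific bot first, then a catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /Private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "https://example.com"

# Googlebot matches its own group, so only /drafts/ is off limits for it.
googlebot_drafts = parser.can_fetch("Googlebot", base + "/drafts/post.html")
googlebot_private = parser.can_fetch("Googlebot", base + "/Private/page.html")

# Any other bot falls back to the catch-all (*) group.
other_private = parser.can_fetch("OtherBot", base + "/Private/page.html")

# Paths are case-sensitive: /private/ is not the same as /Private/.
other_lowercase = parser.can_fetch("OtherBot", base + "/private/page.html")

print(googlebot_drafts, googlebot_private, other_private, other_lowercase)
```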
Taking all of this into account, a robots.txt file may look something like this:
User-Agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Sitemap: http://mysite.com/post-sitemap.xml
Sitemap: http://mysite.com/page-sitemap.xml
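As a quick sanity check, this example can be fed to Python’s urllib.robotparser to confirm that it behaves as intended. The image and plugin file paths below are invented for the test, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The example file from the article, one directive per line.
EXAMPLE = """\
User-Agent: *
Allow: /wp-content/uploads/
Disallow: /wp-content/plugins/
Sitemap: http://mysite.com/post-sitemap.xml
Sitemap: http://mysite.com/page-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

# Hypothetical URLs used only to exercise the rules.
upload_ok = parser.can_fetch(
    "Googlebot", "http://mysite.com/wp-content/uploads/2019/06/photo.jpg"
)
plugin_ok = parser.can_fetch(
    "Googlebot", "http://mysite.com/wp-content/plugins/some-plugin/readme.txt"
)

# Sitemap lines are collected separately from the crawl rules.
sitemaps = parser.site_maps()

print(upload_ok, plugin_ok, sitemaps)
```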
What to Include in Your Robots.txt
As mentioned at the beginning of this article, robots.txt has caused a lot of controversy around the web. Different sources offer varied, often contradictory advice on how best to use this SEO tool.
One existing tactic calls for disallowing readme.html, the WordPress plugin directory, and the login pages. This step is recommended as a way to increase the security of your site and protect it from mass attacks. The same strategy also advises adding the sitemap to your robots.txt.
Another approach is rather minimalistic. It rests on the idea that search engines, and Google in particular, perceive the site as a whole, so disallowing some resources may cause major crawl errors and, in turn, a drop in search rankings. This method also advises against disallowing the WordPress login page and admin directory, because WordPress already adds a noindex tag to them. As for sitemaps, it recommends adding them manually in Google Webmaster Tools (since renamed Google Search Console).
It is up to you which strategy to choose, but either way, make sure there are no crawl errors and that Google sees your site properly. To do so, go to Google Webmaster Tools and choose the “Fetch as Google” option and then “Fetch and render” (in the current Google Search Console, the URL Inspection tool fills this role). If errors occur, consider adjusting the instructions in your robots.txt file.
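A rough local pre-check can complement Google’s own tools. The sketch below uses Python’s urllib.robotparser to list which of a page’s assets a given robots.txt would block. The asset URLs, the blocked_assets helper, and the example.com domain are assumptions for illustration. Blocked stylesheets or scripts are a common reason for a page to render incorrectly for a crawler, although the parser here only approximates how Google evaluates the rules.

```python
from urllib.robotparser import RobotFileParser

# Rules in the spirit of the "disallow the plugin directory" tactic.
RULES = """\
User-agent: *
Disallow: /wp-content/plugins/
"""

# Hypothetical assets that a page on the site might load.
ASSETS = [
    "https://example.com/wp-content/plugins/some-plugin/style.css",
    "https://example.com/wp-content/plugins/some-plugin/script.js",
    "https://example.com/wp-content/uploads/2019/06/photo.jpg",
]


def blocked_assets(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text blocks for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


blocked = blocked_assets(RULES, ASSETS)
for url in blocked:
    print("blocked:", url)
```

If the list includes CSS or JavaScript files that your pages depend on, that is a hint the rule may be too broad.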
As you can see, robots.txt is a tricky yet important part of a site’s search engine optimization. Used properly, it can keep the search engines away from content you don’t want them to crawl. Misused, it can do a lot of harm and seriously damage your search engine rankings.
While in the past it was common to block a whole range of resources from search engines, the trend nowadays is toward a minimalistic robots.txt.
All in all, remember that there are no universal rules for composing the ideal robots.txt file; its content should be based on the type of site you have.
P.S. Looking for a way to switch to WordPress? Have a go with CMS2CMS, an automated migration service. This tool will help you transfer all your content quickly, easily, and safely.
By Valentyna Tysiachna