Months ago, Google announced new indexing rules for robots.txt and sitemap files. One change prevents website owners and webmasters from using the robots.txt file to keep pages hidden from search: the file, like any other URL on your site, can now be read and indexed by Google.
If you have not kept up with Google's recent algorithm updates, this new crawl behavior may overwhelm you. However, Google's John Mueller recently shared some workarounds: methods for keeping robots.txt and sitemap files from appearing in Google's search result pages.
The advice was in response to a tweet from Gary Illyes, also from Google. Illyes noted that robots.txt now functions in the same manner as a regular URL. Though it still gives crawlers instructions on which pages to crawl, you cannot stop crawlers from indexing the file itself.
Illyes' tweet read as follows: “Triggered by an internal question: robots.txt from indexing point of view is just a URL whose content can be indexed. It can become canonical or it can be deduped, just like any other URL. It only has special meaning for crawling, but there its index status doesn’t matter at all.”
Taking his colleague’s cue, Mueller explained that blocking these files from being indexed requires the x-robots-tag HTTP header. You can use this to your advantage, provided you know how it works.
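As a concrete illustration, here is a sketch of how the header could be attached on an Apache server with mod_headers enabled (the directives are standard, but the exact file names and server setup are assumptions about your site):

```apache
# Hypothetical .htaccess sketch: serve robots.txt and sitemap.xml
# with "X-Robots-Tag: noindex" so Google can still crawl and obey
# them, but drops them from its search index.
<Files "robots.txt">
  Header set X-Robots-Tag "noindex"
</Files>
<Files "sitemap.xml">
  Header set X-Robots-Tag "noindex"
</Files>
```

The key point is that the header travels with the HTTP response rather than inside the file, so it works even for files like robots.txt that have no place for a meta tag.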
He further added: “Also, if your robots.txt or sitemap file is ranking for normal queries (not site:), that’s usually a sign that your site is really bad off and should be improved instead.”
Should your robots.txt file start appearing in search results, use the x-robots-tag HTTP header to block it. Note that this solution is only a stopgap: there is no long-term method that stops your robots.txt file from ranking. Mueller recommends vigilance, as there might be bigger issues to handle in the future.
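To see the mechanism from the other side, here is a minimal, hypothetical sketch of how an indexer might interpret the header. The function name and parsing rules are simplifications for illustration, not Google's actual implementation (real crawlers also handle user-agent-scoped values such as "googlebot: noindex"):

```python
def blocks_indexing(headers: dict) -> bool:
    """Return True if an X-Robots-Tag response header forbids indexing.

    Simplified sketch: splits the header value on commas and looks
    for the "noindex" or "none" directives.
    """
    value = headers.get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in value.split(",")}
    return "noindex" in directives or "none" in directives


# A robots.txt response carrying the blocking header:
print(blocks_indexing({"X-Robots-Tag": "noindex"}))     # True
# A response without it remains indexable:
print(blocks_indexing({"Content-Type": "text/plain"}))  # False
```

This also shows why the fix is per-response: remove the header and the file becomes indexable again, which is why Mueller frames it as a workaround rather than a permanent setting.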