Robots.txt File Usage and Response Code Messages
Last, but not least, in our Technical SEO Checklist series, we’ll explore how to use the robots.txt file and what to do when you receive a response code error.
Website owners use the /robots.txt file to give instructions about their site to web robots; the mechanism is formally known as the Robots Exclusion Protocol. Its main purpose is to direct robots away from pages that shouldn’t be indexed because they aren’t relevant to a search user.
You may consider adding your PPC landing pages, admin section, employee portal, etc. to the robots.txt file. Sometimes a link on the publicly accessible website points to a page or section like the ones mentioned above; without a rule in robots.txt, that link lets a robot crawl through and find the page you meant to be protected or hidden.
This is also valuable when creating a website, because you want to avoid indexing development pages. Google is wonderful at crawling the internet to find websites, and a website under construction is no exception.
How It Works
If a search engine robot wants to visit a website URL, it first checks for http://www.example.com/robots.txt to either get permission or to leave.
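You can simulate this check with Python’s standard-library `urllib.robotparser` module; a minimal sketch (the `/admin/` rule and example URLs here are just illustrations):

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt. A real crawler would instead call
# rp.set_url("http://www.example.com/robots.txt") followed by rp.read().
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

# A well-behaved robot asks before fetching each URL.
print(rp.can_fetch("*", "http://www.example.com/admin/login"))  # False
print(rp.can_fetch("*", "http://www.example.com/blog/post"))    # True
```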
To exclude all robots from the entire website:
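```
User-agent: *
Disallow: /
```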
To allow all robots complete access of the website:
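```
User-agent: *
Disallow:
```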
(To accomplish this, you can also create an empty robots.txt file.)
To exclude all robots from part of the server:
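```
User-agent: *
Disallow: /admin/
Disallow: /private/
```

(The paths here are examples; list whichever directories you want excluded.)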
Be aware that robots can ignore your /robots.txt file. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
The /robots.txt file is publicly available, so anyone can see which sections of your server you don’t want robots to visit. Don’t use /robots.txt to hide confidential or internal information; for that, create an employee portal or secure login.
There are multiple tools you can run for diagnostic testing of your website. We do it often for our client sites, as well as our own, because it’s important to see how your website responds to requests. Here’s a rundown of response codes that you may see and suggestions to take action, or continue on your merry way.
- 200: Perfect. This is how normal, live pages respond.
- 301: Redirect, put in place when a page moved elsewhere. This says that the page lives on another URL permanently, and that page is where you’ll find the relevant information. This is perfectly fine.
- 302: Temporary Redirect. This page isn’t here, or there, but it might come back? If the move is actually permanent, this should be corrected with a 301 redirect.
- 404: No page found. This is an error due to the deletion of a page without a redirect, or if a page was requested that doesn’t exist. This can be remedied with a 301 redirect (if the page was moved) or a custom 404 page to help the user find their way.
- 500: Internal Server Error. This is a very generic response code shown to users and search engine robots when something has gone wrong on the server. This is the time to contact your hosting provider (GoDaddy, Network Solutions, etc.) and alert them to the issue.
- 502: Bad Gateway. The server was acting as a gateway or proxy and received an invalid response from the upstream server. This is an error that needs to be fixed, because users aren’t able to access your website.
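The checks above can be run yourself with a short script; here’s a minimal sketch using Python’s standard library (the advice wording mirrors the list above, the example URL is hypothetical, and redirects are deliberately not followed so 301/302 codes stay visible):

```python
import urllib.error
import urllib.request

# Advice keyed by status code, mirroring the list above.
ADVICE = {
    200: "Perfect: a normal, live page.",
    301: "Permanent redirect: fine if intentional.",
    302: "Temporary redirect: correct with a 301 if the move is permanent.",
    404: "Not found: add a 301 redirect or a custom 404 page.",
    500: "Internal server error: contact your hosting provider.",
    502: "Bad gateway: invalid response from the upstream server.",
}

def advice_for(code: int) -> str:
    return ADVICE.get(code, f"Status {code}: look it up before acting.")

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from silently following 3xx responses."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def check_url(url: str) -> int:
    """Return the raw HTTP status code for a URL."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

# Example (hypothetical URL):
# print(advice_for(check_url("http://www.example.com/")))
```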
There are other technical portions to the web, such as Schema markup, code structure, responsive design, and image sizing, each of which is its own beast, but these are the first steps for beginner and intermediate website managers to improve their overall usability and site optimization. Make sure to read Part One and Part Two of our series to tackle your basic technical SEO.