The robots.txt file is one of the fundamentals that most website creators and owners (webmasters) leave out of a website's SEO.
The robots.txt is a simple text file containing directives that tell a search engine what is indexable and what isn't.
This allows a search engine to display results that are more relevant and accurate for a specific page, or for an entire site.
The file is uploaded to your website so it appears at a URL that a search engine's "crawler" can access and interpret.
What are the main uses of the robots.txt file?
This file has many uses.
To make it less confusing, we will focus on the most important directives the file supports, which will ensure it is used correctly - the robots.txt is one of the most important parts of a website's SEO!
Taking the time with the robots.txt, making sure it is always accessible and correctly written, is one of the most important things you can do for a search engine's crawler.
How is the file created?
There are many programs available online that can generate the file for a website, but they are limited in functionality and often only get half the job done.
The file itself is really simple to write, and no extra software or paid service is needed to create it - if your computer is running Windows, all that needs to be opened is Notepad.
Notepad is easily accessed through the Start menu, as an icon in your list of programs.
Once Notepad is open, it is a simple matter of writing in the basic directives that are needed for that particular website.
Once the file is finished, all that has to be done is save it with this exact name, with no capitals - robots.txt
After you have saved it, upload it to your website so it sits at a URL a crawler can reach, i.e. www.website.com/robots.txt
If you are using a platform to build a website where all the webmaster has to do is drag in a feature and all the scripting and building of the website is automated, then the file manager/upload area would be accessed from a control panel. This option is best when a webmaster is not familiar with HTML or website building.
If you are an advanced webmaster, upload it directly to the site's root directory so it becomes its own URL.
Basic directives of the robots.txt
Let's look at a general example of the file. Depending on what is needed for a website, the directives used will differ.
This is just to show its basic directives and how they work.
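A minimal example of such a file, allowing every crawler to index the whole site, would look like this:

```
User-agent: *
Allow: /
```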
The User-agent line names the search engine's crawler that the rules will target. Placing a * means the rules will affect every search engine's crawler that arrives at the website.
The Allow line is the part of the file where you state what is allowed to be crawled. Placing a / after Allow: permits every part of the website to be crawled and indexed.
Now that we have explained a basic scenario, understanding the primary directives of the file becomes less confusing.
Advanced directives of the robots.txt
The next directive is useful when it comes to keeping a search engine from crawling content that isn't helping your SEO.
The Disallow directive
Just as the Allow directive permits an action, Disallow forbids it. The same applies for the value we have chosen, /, meaning everything.
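For example, blocking every crawler from an entire site would look like this:

```
User-agent: *
Disallow: /
```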
The benefit of this directive is that you can keep pages or content that aren't helping your website's search performance out of search results. To narrow it down to a specific page, add the path of the page after Disallow: (e.g. /contact-us.html)
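If you want to sanity-check how a crawler will read your rules before uploading them, Python's built-in `urllib.robotparser` module interprets a robots.txt the same way a well-behaved crawler does. The rules and page paths below are just illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block every crawler from the contact page only.
rules = """\
User-agent: *
Disallow: /contact-us.html
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The contact page is blocked, but the rest of the site stays crawlable.
print(parser.can_fetch("*", "https://www.website.com/contact-us.html"))  # False
print(parser.can_fetch("*", "https://www.website.com/about.html"))       # True
```

This is a quick way to catch typos in a rule before a real crawler ever sees them.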
Removing an image:
To remove an image, simply place the crawler you wish to target in the User-agent line, then type the image's path in the Disallow line.
User-agent: Googlebot-Image (or the image crawler of another search engine - each crawler you target gets its own User-agent line)
With Disallow: / this will remove all of a website's images from that search engine. To limit it to a specific image, type the image's path after the Disallow: (e.g. /images/cat-scratching.jpg)
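Putting this together, a file that keeps Google's image crawler away from a single image (the path here is just an example) would look like this:

```
User-agent: Googlebot-Image
Disallow: /images/cat-scratching.jpg
```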
Further reading can be found in our DIY SEO Help Article.