10 Oct Robots.txt Made Simple
A question I get asked a lot is: just what the heck is a “robots.txt” file, and what good is it? Well, in some ways it can be the most important file you have, but it’s often overlooked and disregarded.
It’s important in that making a mistake in your robots.txt file could easily torpedo the success of your site. One single typo, and poof, gone. Let me explain.
Search engine robots, or crawlers (like Googlebot and Yahoo Slurp), read the robots.txt file when they visit your site. It tells them where they may search and where they may not.
For probably about 95% of all webmasters the file, if they even have one, is two lines that read:
    User-agent: *
    Disallow:
And that’s the file. It tells all robots to search and index everything they can find. “User-agent” names the robot the rule applies to; here the wildcard “*” means the directive is for all robots (if we wanted to command just Google’s crawler, we could write “Googlebot” instead).
The second line, “Disallow:”, tells the robot which files it is restricted from accessing. Here it’s blank, so the robots can access every file they can find.
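If you’d like to check this for yourself, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The helper name `allowed`, the constant names, and the example.com URL are my own placeholders for illustration. It feeds the two-line file to the parser, then tries two variations for contrast: one that blocks everything with a single slash, and one that addresses only “Googlebot”.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two-line file described above: all robots, nothing disallowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# For contrast: a single slash after "Disallow:" blocks the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A rule addressed only to Google's crawler; other robots are unaffected.
GOOGLE_ONLY = "User-agent: Googlebot\nDisallow: /\n"

# Hypothetical URL, used only for illustration.
PAGE = "https://www.example.com/some-page.html"

if __name__ == "__main__":
    print(allowed(ALLOW_ALL, "Googlebot", PAGE))    # True
    print(allowed(BLOCK_ALL, "Googlebot", PAGE))    # False
    print(allowed(GOOGLE_ONLY, "Googlebot", PAGE))  # False
    print(allowed(GOOGLE_ONLY, "Slurp", PAGE))      # True
```

Keep in mind this only shows how Python’s parser reads the file. Real crawlers use their own parsers, so treat it as a quick sanity check rather than a guarantee of how any particular search engine will behave.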