Robots directives play an important role in SEO. There are two ways to control how search engines treat pages and folders: one is the Robots META tag and the other is the robots.txt file.
A web page creator can specify which pages search engines should index and which they should not by placing a Robots META tag in the HEAD section of the HTML.
Here are some common Robots META tags:
<meta content="NOINDEX" name="ROBOTS"> - Ignore content and follow links
<meta content="NOFOLLOW, INDEX" name="ROBOTS"> - Include content and do not follow links
<meta content="NOINDEX,NOFOLLOW" name="ROBOTS"> - Ignore content and do not follow links
<meta content="INDEX,FOLLOW" name="ROBOTS"> - Include content and follow links
<meta content="NOARCHIVE" name="ROBOTS"> - The cached link should not be shown on search results pages
<meta content="NOODP" name="ROBOTS"> - The Open Directory Project (ODP) title and description for the page should not be displayed in search results
<meta content="NOYDIR" name="ROBOTS"> - The Yahoo Directory title and description for the page should not be displayed in search results
<meta content="NOSNIPPET" name="ROBOTS"> - Only the title is displayed on the search results page, with no description or text snippet for this page
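As a rough illustration of how a crawler might read these tags, the following Python sketch collects the directives from a page's Robots META tag using the standard library's html.parser module. The class name RobotsMetaParser and the helper robots_directives are hypothetical names chosen for this example, and real search engines may interpret the tags differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="ROBOTS" content="..."> tags.

    Hypothetical helper for illustration; not how any particular
    search engine is known to work.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated and case-insensitive.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of lowercase robots directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta content="NOINDEX,NOFOLLOW" name="ROBOTS"></head></html>'
directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
```

For the sample page, both may_index and may_follow come out False, which matches the "Ignore content and do not follow links" entry in the list.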
In addition, a robots.txt file can be used to control user-agent access at the folder level. This file is placed in the root directory of each server, and its format is plain text, not HTML.
Through this file, a website owner or webmaster can allow crawlers to access web page content and disallow access to admin, cgi, and any other secured files that search engines should not crawl.
A typical robots.txt file tells all robots that they may crawl the site except for the admin files, that they may crawl the files in the content folder, and that they should not crawl the test, paypal, credit and cgi folders.
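The rules described above might be written and checked as in the Python sketch below, which uses the standard library's urllib.robotparser module. The robots.txt contents are a reconstruction assumed from the description, not the original listing; the exact folder paths, the www.example.com host, and the helper is_allowed are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents, reconstructed from the description:
# all robots may crawl the content folder, but not the admin, test,
# paypal, credit or cgi folders. The exact paths are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Allow: /content/
Disallow: /admin/
Disallow: /test/
Disallow: /paypal/
Disallow: /credit/
Disallow: /cgi/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="*"):
    """Return True if the given user agent may crawl the path."""
    # www.example.com is a placeholder host for illustration.
    return parser.can_fetch(agent, "https://www.example.com" + path)


for path in ("/content/page.html", "/admin/login.html", "/cgi/script.pl"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Paths that match none of the rules, such as /index.html, are treated as allowed by this parser.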