Once a website is up and running, we need to make sure that visiting search engines can access all the pages we want them to look at.
Sometimes, though, we may want search engines not to index certain parts of the site, or even to keep some search engines out of the site altogether.
This is where a simple little 2-line text file called robots.txt comes in.
Robots.txt sits in your website's main directory (on Linux systems this is usually your /public_html/ directory), and looks something like the following:

User-agent: [spider name]
Disallow: [directory or file name]
The first line names the robot (the "user-agent") that the rules apply to, and the second line controls whether it is allowed in at all, or which parts of the site it is not allowed to visit.
To handle multiple spiders, simply repeat these lines for each one.
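To see how a crawler reads these records, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which implements the robots exclusion rules. The file contents, the robot name `ExampleBot`, and the `example.com` URLs are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with two records. Each record is a
# "User-agent" line followed by its "Disallow" line.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)

# ExampleBot has its own record: kept out of /private/, but /tmp/ is fine.
print(rp.can_fetch("ExampleBot", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("ExampleBot", "https://example.com/tmp/file.html"))      # True

# Any other robot falls through to the "*" record.
print(rp.can_fetch("OtherBot", "https://example.com/tmp/file.html"))        # False
print(rp.can_fetch("OtherBot", "https://example.com/private/page.html"))    # True
```

A robot uses the first record whose User-agent matches its name, and only falls back to the `*` record when no named record applies.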
For example, a robots.txt file can let Google (user-agent name Googlebot) visit and index every page, while at the same time banning Ask Jeeves from the site entirely.
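A file like the one just described can be sketched and checked with Python's standard-library `urllib.robotparser`. This sketch assumes `Teoma` as the user-agent token for the Ask Jeeves crawler (its historical crawler name; confirm it against a current user-agent list before relying on it), and uses a hypothetical `example.com` URL.

```python
from urllib.robotparser import RobotFileParser

# Googlebot gets an empty Disallow line, which blocks nothing.
# "Teoma" (assumed Ask Jeeves token) gets "Disallow: /", which blocks everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Teoma
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Teoma"):
    allowed = parser.can_fetch(agent, "https://example.com/index.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
# Googlebot: allowed
# Teoma: blocked
```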
For a reasonably up-to-date list of robot user-agent names, visit http://www.robotstxt.org/wc/active/html/index.html
It's still a good idea to put a robots.txt file on your site, even if you want to allow every robot to index every page. It will stop your error logs from filling up with entries from search engines trying to access a robots.txt file that doesn't exist.
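An "allow everything" file can be as short as a wildcard user-agent followed by an empty Disallow line. The sketch below uses Python's standard-library `urllib.robotparser` to confirm that such a file lets any robot fetch any page; the helper name `is_allowed` and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A permissive robots.txt: every robot ("*") may visit every page,
# because an empty Disallow line blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(ALLOW_ALL, "Googlebot", "https://example.com/"))           # True
print(is_allowed(ALLOW_ALL, "AnyOtherBot", "https://example.com/a/b.html"))  # True
```

Serving even this tiny file means requests for /robots.txt succeed instead of showing up as "not found" errors in your logs.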
To learn more about robots.txt, see the full list of resources at http://www.websitesecrets101.com/robotstxt-further-reading-resources.