Harnessing the Power of Robots.txt

Once we have a website up and running, we should make sure that all visiting search engines can access all the pages we want them to consider. Sometimes, though, we may want search engines not to index certain parts of the site, or even ban some search engines from the site altogether. This is where a simple little two-line text file called robots.txt comes in.

Robots.txt resides in your website's main directory (on Linux hosting this is often your /public_html/ directory), and looks something like the following:

    User-agent: *
    Disallow:

The first line names the robot the rules apply to (* means every robot), and the second line lists the parts of the site that robot is not allowed to visit. An empty Disallow line, as above, means nothing is off limits.

If you want to handle multiple spiders, simply repeat these lines for each one. For example:

    User-agent: googlebot
    Disallow:

    User-agent: askjeeves
    Disallow: /

This lets Google (user-agent name Googlebot) visit every page on the site, while at the same time banning Ask Jeeves from the site completely.

It is still advisable to place a robots.txt file on your site even if you want every robot to index every page. Otherwise, your error logs will fill up with entries from search engines requesting a robots.txt file that doesn't exist.
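One way to check that a robots.txt file says what you intend is to run it through a parser before uploading it. The sketch below assumes Python 3 and uses the standard library's urllib.robotparser; the helper name is_allowed and the example.com URL are hypothetical, chosen only for illustration. Individual search engines may interpret edge cases slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-spider example from the text: Googlebot may crawl everything,
# Ask Jeeves is banned from the whole site.
MULTI_SPIDER = """\
User-agent: googlebot
Disallow:

User-agent: askjeeves
Disallow: /
"""

# The minimal "allow everyone" file: * matches every robot, and an
# empty Disallow line blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so nothing is downloaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/index.html"  # hypothetical URL
    for agent in ("googlebot", "askjeeves", "bingbot"):
        verdict = "allowed" if is_allowed(MULTI_SPIDER, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

Run against the two-spider file, googlebot is allowed and askjeeves is blocked. Note that bingbot is also allowed: a robot not named in any User-agent line, with no * group present, is not restricted by the file.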