
How To Create A Robots Text File on Your Website
This article gives accurate instructions for creating a robots.txt file on a website. It discusses why you need a robots.txt file, and some tools and software to help you construct one are included, although the file is easily constructed using a simple text editor like Notepad.

What Is A robots.txt File

A robots.txt is a file placed on your server to tell the various search engine spiders not to crawl or index certain sections or pages of your site. You can use it to prevent indexing totally, to prevent certain sensitive areas of your site from being indexed, or to issue individual indexing instructions to specific search engines.

The file itself is a simple text file which, as mentioned, is usually created in Notepad; using a WYSIWYG editor is not recommended. The completed robots.txt file is then saved to the root directory of your site, that is, the directory where your home page or index page is, which is commonly referred to as the public_html folder.
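Because the file always sits in the root directory, its address can be worked out from the address of any page on the site. The following is a minimal Python sketch of that idea; the function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the address where a spider would look for robots.txt.

    Hypothetical helper: keeps the scheme and host of the page address
    and replaces the path, query and fragment with /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/products/index.html?x=1"))
# prints http://www.example.com/robots.txt
```

A robots.txt file saved in a subdirectory instead of the root would not be found at this address.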

Another method of stopping an individual web page from being indexed is to use the appropriate Robots Exclusion Meta Tag in the head section of the HTML code. For full instructions, please refer to How To Create A Robot Meta Tag, which includes a List of Robots Names For Popular Search Engines.

Why Do You Need A robots.txt File

Most search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders or bots arrive on your site. So even if you currently do not need to exclude the spiders from any part of your site, having a robots.txt file is still a good idea; it can act as a sort of invitation into your site.

There are a number of situations where you may wish to exclude spiders from some or all of your site.

You are still building the site or certain pages and do not want the unfinished work to appear in search engines.

You have information that, while not sensitive enough to bother password protecting, is of no interest to anyone but those it is intended for, and you would prefer it did not appear in search engines.

Most people will have some directories they would prefer were not crawled. For example, do you really need to have your cgi-bin indexed? Or a directory that simply contains thank-you or error pages?

If you are using doorway pages (similar pages each optimized for an individual search engine) you may wish to ensure that individual robots do not have access to all of them. This is important in order to avoid being penalized for spamming a search engine with a series of overly similar pages.

You would like to exclude some bots or spiders altogether, for example those from search engines you do not want to appear in, or those whose chief purpose is collecting email addresses.

The very fact that search engines are looking for them is reason enough to put one on your site. Have you looked at your site statistics recently? If your stats include a section on files not found, you are sure to see many entries where search engine spiders looked for, and failed to find, a robots.txt file on your site.

Creating the robots.txt file

There is nothing difficult about creating a basic robots.txt file. It can be created using Notepad or whatever your favorite text editor is. Each entry has just two lines:

User-agent: Spider or Bot name
Disallow: Directory or File Name

The Disallow line can be repeated for each directory or file you want to exclude, and the whole entry can be repeated for each spider or bot you want to exclude.

A few examples will make it clearer.

1. Exclude a file from an individual Search Engine

You have a file privatefile.htm in a directory called private that you do not wish to be indexed by Google. You know that the spider that Google sends out is called Googlebot. You would add these lines to your robots.txt file:

User-agent: Googlebot
Disallow: /private/privatefile.htm
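If you want to check an entry like this before uploading it, one option is Python's standard urllib.robotparser module. The sketch below feeds the two lines to the parser and asks what each spider may fetch. The name SomeOtherBot is made up for illustration, and real spiders may interpret rules slightly differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line entry from example 1.
rules = """\
User-agent: Googlebot
Disallow: /private/privatefile.htm
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is shut out of the one file but not its neighbours.
print(parser.can_fetch("Googlebot", "/private/privatefile.htm"))     # False
print(parser.can_fetch("Googlebot", "/private/otherfile.htm"))       # True

# A spider the entry does not name (hypothetical) is unaffected.
print(parser.can_fetch("SomeOtherBot", "/private/privatefile.htm"))  # True
```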

2. Exclude a section of your site from all spiders and bots

You are building a new section of your site in a directory called newsection and do not wish it to be indexed before you are finished. In this case you do not need to specify each robot that you wish to exclude; you can simply use a wildcard character * to exclude them all.

User-agent: *
Disallow: /newsection/

Note that there is a forward slash at the beginning and end of the directory name indicating that you do not want any files in that directory indexed.
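The effect of the closing slash can be seen with Python's standard urllib.robotparser module, which treats each Disallow value as a prefix of the path. This is only a sketch: the helper name allowed, the bot name AnyBot and the file names are made up for illustration, and a real spider's matching may differ in detail.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "AnyBot") -> bool:
    """Hypothetical helper: parse robots.txt text and test one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


with_slash = "User-agent: *\nDisallow: /newsection/\n"
without_slash = "User-agent: *\nDisallow: /newsection\n"

# With the closing slash, only files inside the directory are excluded.
print(allowed(with_slash, "/newsection/page.htm"))     # False
print(allowed(with_slash, "/newsection-old.htm"))      # True

# Without it, anything whose path merely starts with /newsection is excluded.
print(allowed(without_slash, "/newsection-old.htm"))   # False
```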

3. Allow all spiders to index everything

Once again you can use the wildcard * to let all spiders know they are welcome. The second line, the Disallow line, you just leave empty; that is, you disallow from nowhere.

User-agent: *
Disallow:

4. Allow no spiders to index any part of your site

This requires just a tiny change from the command above, so be careful!

User-agent: *
Disallow: /

If you use this command while building your site, don't forget to remove it once your site is live!
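Since examples 3 and 4 differ by a single character, it can be worth testing which one you have actually written. Here is a minimal sketch using Python's standard urllib.robotparser module; the helper name allowed and the bot name AnyBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "AnyBot") -> bool:
    """Hypothetical helper: parse robots.txt text and test one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # example 3: empty Disallow
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # example 4: a single slash

print(allowed(ALLOW_ALL, "/index.htm"))  # True
print(allowed(BLOCK_ALL, "/index.htm"))  # False
```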

Getting More Complicated

If you have a more complex set of requirements, you are going to need a robots.txt file with a number of different commands. You need to be quite careful creating such a file; you do not want to accidentally disallow access to spiders, or to areas, you really want indexed.

Let's take quite a complex scenario. You want most spiders to index most of your site, with the following exceptions:

You want none of the files in your cgi-bin indexed at all, nor do you want any of the FrontPage-specific folders indexed, e.g. _private, _themes, _vti_cnf and so on.

You want to exclude your entire site from a single search engine, let's say AltaVista.

You do not want any of your images to appear in the Google Image Search index.

You want to present a different version of a particular page to Lycos and Google. (Caution here: there are a lot of question marks over the use of doorway pages in this fashion. This is not the place for a discussion of them, but if you are using this technique you should do some research on it first.)

Let's take this one in stages!

1. First you would ban all search engines from the directories you do not want indexed at all:

User-agent: *
Disallow: /cgi-bin/
Disallow: /_borders/
Disallow: /_derived/
Disallow: /_fpclass/
Disallow: /_overlay/
Disallow: /_private/
Disallow: /_themes/
Disallow: /_vti_bin/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_map/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/

It is not necessary to create a new command for each directory; it is quite acceptable to just list them as above.
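If the list of directories is long, you may prefer to generate the entry rather than type it. The following Python sketch writes one User-agent line followed by one Disallow line per directory. The function name build_record is made up for illustration, and only three of the folders are shown to keep it short.

```python
def build_record(user_agent: str, disallowed_paths: list[str]) -> str:
    """Hypothetical helper: build one robots.txt entry as text.

    One User-agent line is followed by a Disallow line per path.
    An empty list produces an empty Disallow, which excludes nothing.
    """
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    else:
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


record = build_record("*", ["/_private/", "/_themes/", "/_vti_cnf/"])
print(record, end="")
# User-agent: *
# Disallow: /_private/
# Disallow: /_themes/
# Disallow: /_vti_cnf/
```

The Disallow lines are kept together with no blank line between them, since a blank line is normally read as the end of an entry.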

2. The next thing we want to discuss is how to prevent a particular search engine from getting in there at all. Let's use AltaVista as an example. The AltaVista bot is called Scooter.

User-agent: Scooter
Di

Copyright © robotictoys1.com