PHP, MySQL, Drupal, .htaccess, Robots.txt, Phponwebsites: User-agent

6 Dec 2013


Robots.txt is a text file that tells search engine crawlers how to crawl the pages on your site. Well-behaved search engines follow the instructions in robots.txt, although obeying it is voluntary. If you don't want some of your pages to be crawled by search engines, you can disallow those pages in robots.txt, and search engines will then skip the pages it lists.

Robots.txt at phponwebsites

A search engine reads your robots.txt file before it crawls your pages. The file must be placed in the root (main) directory of your site, so that it is available at /robots.txt. Otherwise, search engines will not be able to find it on your server.
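To see why the location matters: a crawler builds the robots.txt address from the site's scheme and host only, ignoring the path of whatever page it is about to visit. The sketch below shows that rule with Python's standard `urllib.parse.urljoin`. The function name `robots_url` and the example.com URLs are hypothetical placeholders, not part of any search engine's API.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the address where crawlers look for robots.txt for page_url's site (hypothetical helper)."""
    # A reference that starts with "/" keeps only the scheme and host of the
    # base URL, so robots.txt is always expected in the site's root directory.
    return urljoin(page_url, "/robots.txt")


print(robots_url("https://www.example.com/blog/2013/12/post.html"))
# https://www.example.com/robots.txt
```

A file saved as /blog/robots.txt would therefore never be requested.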

                User-agent: *
                Allow: /

User-agent specifies which search engine crawlers the rules apply to; * means all crawlers.
Allow: / lets all search engines crawl all of your content.
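One way to check how these rules are interpreted is Python's standard `urllib.robotparser` module. It is only an illustrative parser. Real search engines use their own implementations, which can differ in edge cases (for example, Google resolves conflicting Allow/Disallow lines by the most specific path, while Python applies the first matching line). The crawler names and example.com URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The "allow everything" rules from above, one string per line.
rules = """\
User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)  # parse() also records the rules as loaded

# "*" matches any crawler name, and the "/" prefix matches every path.
for agent in ("Googlebot", "Bingbot"):
    print(agent, parser.can_fetch(agent, "https://www.example.com/about.html"))
```

Both lines print True.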

Disallow all crawlers with robots.txt:

                User-agent: *
                Disallow:  /
Now, search engines will not crawl any of your pages.
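A quick check with Python's standard `urllib.robotparser` (an illustrative parser, not any search engine's own) confirms the effect. The crawler names and example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:  /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# "Disallow: /" is a prefix of every path, so no page may be crawled.
for path in ("/", "/index.html", "/images/logo.png"):
    print(path, parser.can_fetch("Googlebot", "https://www.example.com" + path))
```

Every path prints False.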

Disallow a particular folder with robots.txt:

                User-agent: *
                Disallow:  /cgi-bin/
                Disallow:  /images/

You need a separate Disallow line for each folder. Now search engines will not crawl your cgi-bin and images folders.
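Each Disallow value is matched as a path prefix, so one line covers everything inside that folder, while pages outside it remain crawlable. The sketch below checks this with Python's standard `urllib.robotparser`, used here only as an illustrative parser. The file names and example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:  /cgi-bin/
Disallow:  /images/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Paths starting with a disallowed prefix are blocked; others are allowed.
for path in ("/images/logo.png", "/images/2013/photo.jpg", "/cgi-bin/form.cgi", "/index.html"):
    print(path, parser.can_fetch("Googlebot", "https://www.example.com" + path))
```

Only /index.html prints True.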

Disallow a particular file in a folder with robots.txt:

                User-agent: *
                Disallow:  /sample/example.php

Now, search engines will not crawl the example.php file in the sample folder.
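To confirm that only the named file is blocked, here is a check with Python's standard `urllib.robotparser`, used as an illustrative parser. The sibling file other.php and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:  /sample/example.php
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The blocked file is refused; other files in the same folder are allowed.
for path in ("/sample/example.php", "/sample/other.php"):
    print(path, parser.can_fetch("Googlebot", "https://www.example.com" + path))
```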

Robots meta tags:

A robots meta tag controls whether an individual page appears in search results. It must be placed within the page's <head> section.
For example,
                    <meta name="robots" content="noindex">
The above line instructs search engines not to show this page in search results.
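Unlike robots.txt, this instruction lives inside the page itself, so a crawler has to download and parse the HTML to see it. The sketch below shows a simplified version of that step using Python's standard `html.parser`. The class name `RobotsMetaFinder` and the sample page are hypothetical, and real crawlers handle many more cases than this.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            # content may hold several comma-separated directives.
            for directive in (attributes.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


page = """<html><head>
<meta name="robots" content="noindex">
<title>Private page</title>
</head><body>...</body></html>"""

finder = RobotsMetaFinder()
finder.feed(page)
print("noindex" in finder.directives)  # True
```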

Disallow a particular search engine with robots.txt:

If you want to stop a particular search engine from crawling your pages, name its crawler in the User-agent line:

              User-agent: Googlebot
              Disallow: / 

Now every search engine except Google can crawl your pages. Googlebot is Google's web crawler.
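Because these rules name only Googlebot, any crawler not listed falls through to the default, which is to allow. The sketch below checks this with Python's standard `urllib.robotparser`, an illustrative parser rather than Google's own. "Bingbot" is used only as an example of another crawler name, and the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

url = "https://www.example.com/index.html"
# Googlebot matches the named group and is blocked; other crawlers have
# no matching group, so they are allowed by default.
for agent in ("Googlebot", "Bingbot"):
    print(agent, parser.can_fetch(agent, url))
```

This prints False for Googlebot and True for Bingbot.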
