robots.txt

From IndieWeb

robots.txt is a file used to inform web crawlers what parts of a site should or should not be crawled.

Common directive names are User-agent (which crawler a rule group applies to), Disallow (a path prefix that crawler should not fetch), Allow, and Sitemap.
The following examples can be copied into a plain-text robots.txt file placed at the root of your domain.

A brief example that blocks crawling of everything inside a particular top-level directory, "/wiki/":

User-agent: *
Disallow: /wiki/

Note that Google has been reported to ignore rules under the "*" User-agent, so Googlebot may need to be disallowed explicitly:

User-agent: Googlebot
Disallow: /wiki/

You may want to entirely block some particularly abusive bots:

User-agent: AhrefsBot
Disallow: /
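Before deploying rules like the ones above, it can help to check how a well-behaved crawler will interpret them. A minimal sketch using Python's standard-library urllib.robotparser (the paths and bot names here are just the examples from this page):

```python
import urllib.robotparser

# The combined rules from the examples above.
rules = """\
User-agent: *
Disallow: /wiki/

User-agent: AhrefsBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
# parse() accepts the file's lines directly; read_url()/read() would
# instead fetch a live /robots.txt.
rp.parse(rules.splitlines())

# Any crawler falls under the "*" group unless it has its own group.
print(rp.can_fetch("SomeBot", "https://example.com/wiki/Main_Page"))  # False
print(rp.can_fetch("SomeBot", "https://example.com/about"))           # True

# AhrefsBot matches its own group, which disallows everything.
print(rp.can_fetch("AhrefsBot", "https://example.com/about"))         # False
```

Note that robots.txt is purely advisory: a parser like this tells you what a polite crawler should do, but abusive bots can ignore the file, so blocking them may also require server-level measures.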

More examples:

See Also