robots.txt is a file used to inform web crawlers what parts of a site should or should not be crawled.
Common directives include User-agent, Disallow, Allow, and Sitemap.
The following examples may be copy-pasted into a plain-text robots.txt file placed at the root of your domain.
Brief example blocking everything inside a particular top-level directory, "/wiki/" (each directive goes on its own line):

User-agent: *
Disallow: /wiki/
Note that Googlebot obeys only the most specific matching group: if a Googlebot-specific group exists anywhere in the file, it ignores the "*" group, so the rule must be repeated there:

User-agent: Googlebot
Disallow: /wiki/
You may want to block some particularly abusive bots entirely:

User-agent: AhrefsBot
Disallow: /
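As a sanity check, the rules above can be tested with Python's standard-library robots.txt parser. This is a minimal sketch: the bot name "SomeBot" and the example.com URLs are placeholders, and parse() is fed the rules directly rather than fetching them from a live site.

```python
from urllib import robotparser

# Load the example rules from this page (normally read_url()/read()
# would fetch robots.txt from the site root).
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /wiki/",
    "",
    "User-agent: AhrefsBot",
    "Disallow: /",
])

# An unlisted bot falls under the "*" group: /wiki/ is blocked,
# everything else is allowed.
print(rp.can_fetch("SomeBot", "https://example.com/wiki/Main_Page"))  # False
print(rp.can_fetch("SomeBot", "https://example.com/about"))           # True

# AhrefsBot matches its own group and is blocked everywhere.
print(rp.can_fetch("AhrefsBot", "https://example.com/about"))         # False
```

The same module can also be pointed at a real site with set_url() and read() to check how a deployed robots.txt treats a given crawler.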
- LOL: https://web.archive.org/web/20140702214604/https://www.google.com/killer-robots.txt
- Google crawler’s implementation of robots.txt: https://developers.google.com/search/docs/advanced/robots/robots_txt
- Google's C++ robots.txt parser: https://github.com/google/robotstxt