Spider Access Rule Configuration

Default Access Policy (Applies to all spiders not individually configured)

Configure mainstream search engine spiders individually

Baiduspider (Baidu)

Googlebot (Google)

Bingbot (Bing)

Sogou (Sogou)

360Spider (360)

YisouSpider (Yisou)

YandexBot (Yandex)

Bytespider (Toutiao)

Directories to Disallow (one per line, e.g., /admin/)

Directories to Allow (one per line, e.g., /public/)

Crawl-delay (Optional, in seconds)

Sitemap URL (Optional)

Generated robots.txt content
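The form fields map fairly directly onto robots.txt groups. The Python below is a rough sketch of how a generator like this one might assemble the file. It is not this tool's actual code: the RobotsConfig class, its field names, and the helper functions are hypothetical. One design choice is deliberate. Shared directory rules are repeated in every allowed spider's group, because a crawler follows only the most specific User-agent group that matches it. A spider with its own group would otherwise ignore the rules listed under User-agent: *.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RobotsConfig:
    """Hypothetical mirror of the generator form fields."""
    default_allow: bool = True                               # policy for "User-agent: *"
    per_bot: dict[str, bool] = field(default_factory=dict)   # e.g. {"YandexBot": False}
    disallow: list[str] = field(default_factory=list)        # e.g. ["/admin/"]
    allow: list[str] = field(default_factory=list)           # e.g. ["/admin/public/"]
    crawl_delay: int | None = None                           # seconds, optional
    sitemap: str | None = None                               # absolute URL, optional


def _group(agent: str, allowed: bool, cfg: RobotsConfig) -> list[str]:
    """Build the lines of one User-agent group."""
    lines = [f"User-agent: {agent}"]
    if not allowed:
        lines.append("Disallow: /")      # block the whole site for this agent
        return lines
    # Allow exceptions first, then the broader Disallow rules.
    lines += [f"Allow: {path}" for path in cfg.allow]
    lines += [f"Disallow: {path}" for path in cfg.disallow]
    if not cfg.allow and not cfg.disallow:
        lines.append("Disallow:")        # empty Disallow means nothing is blocked
    if cfg.crawl_delay is not None:
        lines.append(f"Crawl-delay: {cfg.crawl_delay}")
    return lines


def build_robots_txt(cfg: RobotsConfig) -> str:
    # Individually configured spiders get their own groups;
    # every other spider falls back to the "*" group.
    groups = [_group(agent, ok, cfg) for agent, ok in cfg.per_bot.items()]
    groups.append(_group("*", cfg.default_allow, cfg))
    text = "\n\n".join("\n".join(g) for g in groups)  # blank line between groups
    if cfg.sitemap:
        text += f"\n\nSitemap: {cfg.sitemap}"         # Sitemap is not tied to a group
    return text + "\n"


if __name__ == "__main__":
    cfg = RobotsConfig(
        per_bot={"Baiduspider": True, "YandexBot": False},
        disallow=["/admin/"],
        allow=["/admin/public/"],
        sitemap="https://www.yourdomain.com/sitemap.xml",
    )
    print(build_robots_txt(cfg), end="")
```

With the sample settings in the __main__ block, Baiduspider and * receive the same Allow/Disallow lines. YandexBot receives Disallow: /, and the Sitemap line comes last.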

Definition: robots.txt is a plain-text (UTF-8) file stored in the root directory of a website. A well-behaved search engine spider requests it before crawling any other page on the site.
Role: It acts as a "gentleman's agreement" between the website and crawlers. It tells search engines which directories they may crawl and which are off-limits. This keeps crawlers away from areas not meant for search results and reduces wasted server bandwidth.

User-agent: Defines which search engine spider the rule applies to. * represents all spiders.
Disallow: Tells the crawler not to crawl the specified directory or file. For example, Disallow: /admin/ forbids crawling all content under the admin directory.
Allow: Explicitly permits crawling of a directory or file. It is usually paired with Disallow to "make an exception," allowing a specific subdirectory inside an otherwise blocked directory. For example, Allow: /admin/public/ alongside Disallow: /admin/ opens only the public subdirectory.
Crawl-delay: Sets the minimum interval, in seconds, between successive requests so that a spider does not crawl too fast and overload the server. (Note: Crawl-delay is not part of the standard Robots Exclusion Protocol, and support varies by search engine. Google, for example, ignores it.)
Sitemap: Gives crawlers the absolute URL of the website's Sitemap file (usually XML). This helps search engines discover the site's pages more efficiently.
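Python's standard-library urllib.robotparser module can parse a sample file to show how a compliant crawler interprets these directives. The file contents and URLs below are hypothetical. One caveat applies: urllib.robotparser uses the first matching Allow/Disallow line in a group, whereas RFC 9309 (which Google follows) uses the most specific, longest match. Listing the Allow exception before the broader Disallow gives the same result under both rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using User-agent, Allow, Disallow, Crawl-delay and Sitemap.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://www.yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of fetching

base = "https://www.yourdomain.com"
print(parser.can_fetch("Googlebot", base + "/admin/login"))       # False: Disallow: /admin/
print(parser.can_fetch("Googlebot", base + "/admin/public/faq"))  # True: Allow exception
print(parser.can_fetch("Googlebot", base + "/blog/"))             # True: no rule matches
print(parser.crawl_delay("Bingbot"))                              # 10
print(parser.site_maps())  # ['https://www.yourdomain.com/sitemap.xml']
```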

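The Robots protocol is purely advisory: it keeps honest crawlers out but does nothing against dishonest ones. Malicious crawlers can ignore it entirely. Because robots.txt is publicly readable, listing a path in it can even reveal where sensitive content lives. Truly sensitive or confidential data must therefore be protected by access control (authentication and authorization) on the server side.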
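Server-side protection can be sketched as a small piece of WSGI middleware. Here, every request under a protected prefix must carry a valid token, whether or not the client ever read robots.txt. This is only an illustration: require_token, the /admin/ prefix, and the bearer-token scheme are assumptions. A real site would normally rely on its web framework's own authentication, and the token would come from configuration, not code.

```python
import hmac


def require_token(app, token, protected_prefix="/admin/"):
    """Hypothetical WSGI middleware: paths under protected_prefix need a bearer token."""
    expected = ("Bearer " + token).encode()

    def guarded(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith(protected_prefix):
            supplied = environ.get("HTTP_AUTHORIZATION", "").encode()
            # Constant-time comparison so the token cannot be guessed via timing.
            if not hmac.compare_digest(supplied, expected):
                start_response("401 Unauthorized",
                               [("Content-Type", "text/plain"),
                                ("WWW-Authenticate", "Bearer")])
                return [b"Unauthorized\n"]
        return app(environ, start_response)

    return guarded


def hello_app(environ, start_response):
    """Stand-in application used to demonstrate the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


app = require_token(hello_app, token="change-me")  # placeholder secret
```

A crawler that ignores robots.txt and requests /admin/ still receives 401 Unauthorized, because the check runs on the server rather than in the crawler.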
This file must be placed in the root directory of your website, for example: https://www.yourdomain.com/robots.txt.
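A robots.txt file governs only the scheme, host, and port it is served from, so a subdomain such as shop.yourdomain.com needs its own file. A crawler therefore locates it by keeping those parts of a page URL and replacing everything else. The short sketch below uses the standard urllib.parse module. robots_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep scheme and host[:port]; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.com/blog/post?id=1#top"))
# https://www.yourdomain.com/robots.txt
print(robots_url("https://shop.yourdomain.com:8443/cart"))
# https://shop.yourdomain.com:8443/robots.txt
```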

Robots.txt Generator