Block bingbot from crawling my site

Question

I would like to completely block Bing from crawling my site for now (it's hitting my site at an alarming rate: 500 GB of data per month).

I have 1,000 subdomains added to Bing Webmaster Tools, so I can't go and set the crawl rate for each one. I tried to block it using the robots.txt file, but it doesn't work. Here is my robots.txt:

    # robots.txt
    User-agent: *
    Disallow:
    Disallow: *.axd
    Disallow: /cgi-bin/
    Disallow: /member
    Disallow: bingbot
    User-agent: ia_archiver
    Disallow: /

asp.net-mvc .htaccess robots.txt bots bing

Zoinky Nov 28 '14 at 12:19

2 answers

Your robots.txt file is invalid:

You need line breaks: one directive per line, and a blank line between records (a record starts with one or more User-agent lines).
Disallow: bingbot prohibits crawling URLs whose paths begin with "bingbot" (i.e., http://example.com/bingbot), which is probably not what you want.
Not an error, but the empty Disallow: line is not needed (allowing everything is already the default).
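One way to see the first two points in action is Python's standard urllib.robotparser, a simple parser that follows the original specification rather than any particular search engine's implementation. The sketch below feeds it the question's file with its line breaks restored (an assumption, since the original layout was lost) and asks about a placeholder host, example.com:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, assuming one directive per line.
QUESTION_ROBOTS_TXT = """\
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(QUESTION_ROBOTS_TXT.splitlines())

# No record names bingbot, so it falls back to the "User-agent: *" record,
# where "Disallow: bingbot" is only a path rule, not a user-agent match.
bingbot_allowed = parser.can_fetch("bingbot", "https://example.com/")

# The ia_archiver record works because it starts with its own User-agent line.
ia_archiver_allowed = parser.can_fetch("ia_archiver", "https://example.com/")

print("bingbot may fetch /:", bingbot_allowed)
print("ia_archiver may fetch /:", ia_archiver_allowed)
```

With this parser, bingbot is allowed and ia_archiver is blocked. Note that can_fetch only compares the product token before the first "/", so pass "bingbot" rather than Bing's full "Mozilla/5.0 (compatible; bingbot/2.0; ...)" user agent string.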

So, you probably want to use:

    User-agent: *
    Disallow: *.axd
    Disallow: /cgi-bin/
    Disallow: /member

    User-agent: bingbot
    User-agent: ia_archiver
    Disallow: /

This disallows crawling anything for bingbot and ia_archiver. All other bots may crawl everything except URLs whose paths begin with /member, /cgi-bin/, or *.axd.
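To double-check a rewritten file before deploying it, Python's standard urllib.robotparser can be asked about a few sample URLs. This is a sketch: example.com, the sample paths and Googlebot as the "other bot" are illustrative assumptions, and real crawlers may handle edge cases differently from this parser:

```python
from urllib.robotparser import RobotFileParser

# The corrected file: one directive per line, a blank line between records,
# and bingbot/ia_archiver sharing a single record.
CORRECTED_ROBOTS_TXT = """\
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(CORRECTED_ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may crawl `path` on a placeholder host."""
    return parser.can_fetch(agent, "https://example.com" + path)


for agent, path in [
    ("bingbot", "/"),
    ("ia_archiver", "/about"),
    ("Googlebot", "/about"),
    ("Googlebot", "/member/profile"),
    ("Googlebot", "/cgi-bin/test.pl"),
]:
    print(f"{agent:12} {path:18} allowed={allowed(agent, path)}")
```

bingbot and ia_archiver are refused everywhere, while Googlebot is refused only under /member and /cgi-bin/.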

Note that under the original robots.txt specification, *.axd is interpreted literally (so bots will not crawl http://example.com/*.axd, but will crawl http://example.com/foo.axd). However, many bots extend the specification and treat * as a wildcard.
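The difference between the two readings can be sketched as two small matchers. literal_match and wildcard_match are hypothetical helpers, not any crawler's actual code; the wildcard version assumes the common extension in which * matches any run of characters and a trailing $ anchors the end of the path. The sketch uses /*.axd, with a leading slash, as the pattern:

```python
import re


def literal_match(pattern: str, path: str) -> bool:
    """Original-spec reading: a plain prefix comparison, '*' is an ordinary character."""
    return path.startswith(pattern)


def wildcard_match(pattern: str, path: str) -> bool:
    """Extended reading: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' where '*' stood.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so this is still a prefix match.
    return re.match(regex, path) is not None


PATTERN = "/*.axd"
for path in ["/*.axd", "/foo.axd", "/scripts/WebResource.axd", "/foo.aspx"]:
    print(f"{path:28} literal={literal_match(PATTERN, path)!s:5} "
          f"wildcard={wildcard_match(PATTERN, path)}")
```

Under the literal reading only the path "/*.axd" itself is blocked; under the wildcard reading every path containing ".axd" after the first slash is.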

unor Nov 29 '14 at 19:00

Carl · Accepted Answer · 2014-11-28T16:54:06+0000

This will definitely affect your SEO/search ranking and cause pages to drop out of the index, so use it with caution.

You can block requests based on the user agent string if you have the IIS URL Rewrite module installed (if not, get it here).

Then add the rule to your web.config as follows:

    <system.webServer>
      <!-- URL Rewrite rules must sit inside the <rewrite> element -->
      <rewrite>
        <rules>
          <rule name="Request Blocking Rule" stopProcessing="true">
            <!-- Match every URL, then test the User-Agent header -->
            <match url=".*" />
            <conditions>
              <add input="{HTTP_USER_AGENT}" pattern="msnbot|BingBot" />
            </conditions>
            <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
          </rule>
        </rules>
      </rewrite>
    </system.webServer>

This returns a 403 Forbidden response to any request whose user agent contains msnbot or BingBot.
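The rule's logic (match every URL, test the User-Agent header against msnbot|BingBot, answer 403) can be sketched outside IIS as a small Python WSGI middleware. This is a swapped-in illustration of the same idea, not a replacement for the web.config rule; block_bots, hello_app and status_for are hypothetical names, and re.IGNORECASE is used on the assumption that the URL Rewrite condition is case-insensitive (its default):

```python
import re
from typing import Callable, Iterable

# Same alternation as the IIS condition; IGNORECASE so "bingbot" also matches.
BLOCKED_AGENTS = re.compile(r"msnbot|BingBot", re.IGNORECASE)


def block_bots(app: Callable) -> Callable:
    """Hypothetical WSGI middleware: answer 403 for matching user agents."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Like <match url=".*" />, the check runs for every request path.
        if BLOCKED_AGENTS.search(user_agent):
            body = b"You do not have permission to view this page."
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


app = block_bots(hello_app)


def status_for(user_agent: str) -> str:
    """Call the wrapped app with a fake request and return its status line."""
    captured = []
    app({"HTTP_USER_AGENT": user_agent, "PATH_INFO": "/"},
        lambda status, headers: captured.append(status))
    return captured[0]


print(status_for("Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"))
print(status_for("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Unlike robots.txt, this is enforced by the server, so it works even against bots that ignore robots.txt, at the cost of still receiving (and rejecting) each request.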

UPDATE

Looking at your robots.txt, I think it should be:

    # robots.txt
    User-agent: *
    Disallow:
    Disallow: *.axd
    Disallow: /cgi-bin/
    Disallow: /member

    User-agent: bingbot
    Disallow: /

    User-agent: ia_archiver
    Disallow: /