
Comment Re: Imagine the material... (Score 1) 86

There's a lot of info available about mitigating attacks (or just hungry crawlers) and how to avoid/blackhole them. The problem is, you need control over portions of the network stack to apply those mitigations effectively.
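For illustration, the crudest network-stack lever on a Linux host is a kernel blackhole route. This is a minimal sketch, assuming root, the iproute2 "ip" tool, and a placeholder offender address (203.0.113.45 is a documentation-range IP, not a real culprit):

    import subprocess

    def blackhole(ip: str) -> None:
        """Add a kernel blackhole route for this IP. Packets routed back
        to it are dropped, which effectively kills its TCP sessions.
        Requires root and iproute2 (Linux only)."""
        subprocess.run(["ip", "route", "add", "blackhole", ip], check=True)

    # Hypothetical offender pulled from your access logs:
    blackhole("203.0.113.45")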

Security through obscurity only works so long as you can be obscure, which is part of the vibe of the post. It's really stressful sometimes, depending on what's hosted.

Of the sites I don't have behind Cloudflare, the assets aren't worth anything and I truly don't care if they show up in AI. Otherwise, what's mine is mine, and not theirs.

Comment Re: Imagine the material... (Score 1) 86

If you can afford it, also consider Cloudflare; their bot identification is really good. You can use the defaults or write your own filters. They're not the only ones that do this, but my experience with them has been positive. Much depends on how well you understand how the web actually works, i.e., how the network and the site interact.

Their protections are cheap for the quality/speed. All of the large sites I manage are behind Cloudflare, including their DNS. Their DNS management is superior and has interesting tricks for mixed-media sites. (I don't work for either of these companies, Cloudflare or Wordfence.)

Comment Re: Imagine the material... (Score 3, Informative) 86

Get Wordfence if your site runs WordPress. The controls inside (even the free version) are enough to rate-limit crawlers effectively.

If you don't run WordPress, your choices are more complex: you MUST front-end your site with an IP filtering system and rate-limit everyone methodically, as sketched below. Crawlers eventually quit.
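As a rough illustration of rate-limiting everyone methodically, here is a minimal per-IP token-bucket sketch in Python; the refill rate and burst size are made-up numbers you would tune to your own traffic:

    import time
    from collections import defaultdict

    RATE = 2.0    # tokens refilled per second (assumed request budget)
    BURST = 10.0  # bucket capacity (assumed burst allowance)

    buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

    def allow(ip: str) -> bool:
        """Return True if this IP may make another request right now."""
        b = buckets[ip]
        now = time.monotonic()
        b["tokens"] = min(BURST, b["tokens"] + (now - b["ts"]) * RATE)
        b["ts"] = now
        if b["tokens"] >= 1.0:
            b["tokens"] -= 1.0
            return True
        return False  # over budget: throttle, tarpit, or queue for banning

In practice this logic lives in whatever fronts your site (reverse proxy, WAF, middleware), not in the application itself.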

Many crawlers identify themselves in the GET/POST sequence, and you have to parse those requests. If you understand fail2ban conceptually, that's the method: score repeated, like-type GETs arriving at high rates, plus folder-traversal attempts. Accumulate your list and ban/null-route/block them, or whatever your framework permits.
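A fail2ban-style scorer over an access log might look like the sketch below. The combined-log-format regex, the traversal patterns, and the threshold are all assumptions to tune; real log parsing needs more care than one regex:

    import re
    from collections import Counter

    # Crude combined-log-format matcher: IP ... "METHOD PATH ..."
    LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+)')

    # Assumed red flags: directory traversal and hammering one URL.
    TRAVERSAL = re.compile(r'\.\./|%2e%2e', re.IGNORECASE)
    THRESHOLD = 100  # hits per (ip, path) pair before flagging; tune to taste

    def score(logfile: str) -> set:
        hits = Counter()   # (ip, path) -> request count
        flagged = set()
        with open(logfile) as f:
            for line in f:
                m = LINE.match(line)
                if not m:
                    continue
                ip, method, path = m.groups()
                if TRAVERSAL.search(path):
                    flagged.add(ip)       # traversal attempt: flag at once
                hits[(ip, path)] += 1
                if hits[(ip, path)] > THRESHOLD:
                    flagged.add(ip)       # same URL hammered repeatedly
        return flagged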

Yes, you can divert crawlers through various famous time-wasters (tarpits), but this also drags down your site's performance. CAPTCHAs and the like are becoming easier to fool, so they're not a good strategy either.

Once you decide on a filtering strategy, monitor it, and share your IP ban list with others. Ban the entire CIDR block, because crawlers will attack using randomized IPs within their block. If you get actual customers/viewers, monitor your complaint box and put the legitimate ones on an exemption list (see the sketch below).
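Widening single offenders to their CIDR block while honoring an exemption list can be done with Python's standard ipaddress module. The /24 prefix and the exempt address here are assumptions; when you can find the operator's actual allocation in WHOIS, use that block size instead:

    import ipaddress

    # Hypothetical known-good customer; your exemption list goes here.
    EXEMPT = {ipaddress.ip_address("198.51.100.7")}

    def to_ban_blocks(flagged, prefix=24):
        """Collapse flagged IPv4 addresses into CIDR blocks, skipping
        any block that contains an exempted address."""
        blocks = set()
        for raw in flagged:
            addr = ipaddress.ip_address(raw)
            if addr.version != 4:
                continue  # handle IPv6 separately
            block = ipaddress.ip_network(f"{raw}/{prefix}", strict=False)
            if any(e in block for e in EXEMPT):
                continue
            blocks.add(block)
        return blocks

    # Two offenders in one block yield a single /24 ban, not two entries:
    print(to_ban_blocks(["203.0.113.45", "203.0.113.99"]))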

Comment Re: Imagine the material... (Score 0) 86

There is a difference between "secret" and "Don't Crawl Our Site".

It's almost impossible for a crawler to masquerade as a human; even throttled crawlers are easily identifiable through the many distinct, and often evil, traits they exhibit.

The kleptocracy of AI (and other) crawlers is what's at issue.
