I run a web site, and its robots.txt asks msnbot not to index it. So far as I can tell from my access logs, msnbot and its relatives (media, and others) respect this request.
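For anyone unfamiliar with the mechanism, here is a minimal sketch of that kind of robots.txt rule, checked with Python's standard urllib.robotparser. The rules, the path, and the "examplebot" name are illustrative assumptions, not a copy of my actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out msnbot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved msnbot should conclude it may not fetch anything.
msnbot_allowed = parser.can_fetch("msnbot", "/index.html")

# This parser matches agent names by substring, so a relative such as
# "msnbot-media" (version string is made up) falls under the same rule.
media_allowed = parser.can_fetch("msnbot-media/1.1", "/index.html")

# "examplebot" is a made-up crawler name covered only by the wildcard entry.
other_allowed = parser.can_fetch("examplebot", "/index.html")

print(msnbot_allowed, media_allowed, other_allowed)
```

Keep in mind that robots.txt is only a request. It works because the crawler chooses to honor it, and nothing on the server enforces it.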
Needless to say, I was surprised when I suddenly started getting referrals from Bing queries. That simply shouldn't be. I've expelled the stench of Microsoft from my servers. I prefer quality over quantity, and don't care to have Microsoft benefit from anything I put my heart and soul into. So how did Microsoft index my pages?
My first thought was that they have another bot. Yet there is no reference to a bing*bot in my logs. And as I said, msnbot* isn't identifying itself if it's ignoring robots.txt. So, if I were being denied indexing access to the best sites on a given topic, but wanted to index them anyway, how would I go about it?
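The log check itself is simple. A rough sketch, assuming the access log is in the common "combined" format with the referrer and user agent as the last two quoted fields; the sample lines and the regular expressions are made up for illustration.

```python
import re
from collections import Counter

# Assumption: combined log format, which ends with "referer" "user-agent".
TAIL_PATTERN = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

# Match msnbot (and relatives, by prefix) or anything shaped like bing*bot.
BOT_PATTERN = re.compile(r"(msnbot|bing\w*bot)", re.IGNORECASE)


def count_microsoft_bots(lines):
    """Count log lines whose user agent looks like a Microsoft crawler."""
    counts = Counter()
    for line in lines:
        tail = TAIL_PATTERN.search(line)
        if tail is None:
            continue  # not in the expected format; skip it
        bot = BOT_PATTERN.search(tail.group("agent"))
        if bot:
            counts[bot.group(1).lower()] += 1
    return counts


# Hypothetical log lines, not taken from a real server.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jun/2009:00:00:01 +0000] "GET /robots.txt HTTP/1.1" '
    '200 64 "-" "msnbot/1.1"',
    '192.0.2.2 - - [01/Jun/2009:00:00:02 +0000] "GET /page.html HTTP/1.1" '
    '200 512 "http://www.bing.com/search?q=example" "Mozilla/5.0"',
]

bot_counts = count_microsoft_bots(SAMPLE_LOG)
print(dict(bot_counts))
```

In this sample the second line is a human visitor arriving from a Bing result page, so it counts as a referral rather than a crawler hit. That is the pattern described above: referrals from Bing with no bing*bot in sight.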
Well, I'd probably start off by going through a bunch of blogs with something that could understand the context - much like the recent Google Wave demonstration of their new context-sensitive spell checker. A lot of blogs link back to my site for detailed information and as a primary source. So if someone queries Bing about this subject matter, the indexed pages' links to my site could be used to suggest it as a primary source, and my site would appear in the results, perhaps even higher than the blogs that reference it.
Thus, Microsoft can circumvent my desire to not have them index my site - and I see little that I can do to change that. Their support page says that site administrators can have some control, but only if they have an MSN Live login - which isn't going to happen.
Needless to say, I'm not at all happy about this, and will be working in some of my free time to see to it that anyone coming in from Bing is rerouted elsewhere. You couldn't pay me to use a Microsoft product. (I overwrote my final MS partition at the stroke of midnight, January 1, 2000, and have refused to use their products since, much to the headache of the HR department. Anything not available on FreeBSD, Linux, or Mac OS X isn't necessary.) Microsoft will pay for this overstepping of bounds.
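One way the rerouting could work is to inspect the Referer header on each request. This is only a sketch of the idea, not what is running on my server: the function names are hypothetical, the redirect target is a placeholder, and hooking it into a real web server is left out.

```python
from urllib.parse import urlsplit


def is_bing_referrer(referer):
    """Return True if the Referer header points at bing.com or a subdomain."""
    if not referer:
        return False
    host = urlsplit(referer).hostname or ""
    return host == "bing.com" or host.endswith(".bing.com")


def choose_response(referer, redirect_to="https://example.org/"):
    """Pick a status code and headers for a request.

    redirect_to is a placeholder; substitute wherever Bing visitors
    should be sent.
    """
    if is_bing_referrer(referer):
        return 302, {"Location": redirect_to}
    return 200, {}


print(choose_response("http://www.bing.com/search?q=example"))
print(choose_response("http://example.com/some-blog-post"))
```

The Referer header is supplied by the visitor's browser and can be missing or stripped, so this catches most Bing arrivals rather than all of them. It also does nothing to remove the pages from Bing's index.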