
Comment Re:Sounds like the accusations are true. (Score 1) 96

And I'll add one more point, which I meant to make in my previous comment:

Agents, AI or otherwise, aren't just pulling down a single page to present to the user. They're performing logic on that page that was accessed, and probably accessing additional pages. AI agents can be given instructions to "collect all the content on slashdot.org". I did that, and here's what ChatGPT did: https://chatgpt.com/share/6894...

That's behavior that should respect robots.txt, but it apparently uses Chrome, so... It clearly doesn't.

Comment Re:Sounds like the accusations are true. (Score 1) 96

All scrapers, crawlers, and other 'bots' SHOULD respect robots.txt. The original intent was to block what was termed at the time (1994) as "crawlers", but that has evolved as the Internet has evolved.

Justifying crawling behavior by saying it's "just scraping, and then loading additional pages..." is... Well. Fucky logic to say the least. Following your logic here: If I access a single page, then extract all the links in it and add them to an RSS feed, I'm free to then access all those subsequent pages because now they're in an RSS feed, and I get to scrape them. I just run this in a loop, iterating over all of them because hey, I just want the contents of all those pages, and...

Where do you draw the line? Your "amending the fulltext to your RSS feed" example is crawling. You iterated over a series of links, accessing all the linked pages to get their full text with an automated process. It's just a nonsense argument to try to say you weren't "crawling". Just because you added --max-depth=1 doesn't mean it wasn't an iterative, automated process retrieving the contents of a page.

An AI Agent, acting on behalf of a set of logic instructions given by a user, accessing multiple pages and traversing them based on findings in the preceding pages (such as executing a search, then following links to scrape for values) isn't crawling then?

Automated processes aren't just dumb "crawling and scraping" any more.

I don't believe any automated process should allow itself to access content denied by robots.txt, no matter the logic leaps made to justify it. If I wanted automated processes to access those URLs, they wouldn't be covered by a Disallow rule in robots.txt. A robots.txt file is a statement saying "I specifically do or do not allow access based on these criteria". The criteria aren't dependent on your use case. They're dependent on mine.
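To make that concrete, here's a rough sketch of the check I'd expect any automated process to run before each fetch. It uses Python's standard urllib.robotparser. The robots.txt contents, the "ExampleAgent" and "OtherBot" names, the example.com URLs, and the allowed_urls helper are all made up for illustration; this is not how any particular agent actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleAgent
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that robots_txt permits user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


# Hypothetical candidate links an agent might have extracted from a page.
candidates = [
    "https://example.com/index.html",
    "https://example.com/private/notes.html",
]

if __name__ == "__main__":
    print(allowed_urls(ROBOTS_TXT, "ExampleAgent", candidates))
    print(allowed_urls(ROBOTS_TXT, "OtherBot", candidates))
```

The site owner's rules decide the result, not whatever the agent was asked to do: the same two links come back differently depending on which user agent is asking.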

Comment Re:Sounds like the accusations are true. (Score 1) 96

Depending on the agent, the output may never be seen by a human. "Constantly monitor eBay for a good deal on a waffle iron, and trigger a notification when found", for example. No eyes will ever see the pages the agent loads. It's just consuming eBay's compute resources. A much better written prompt will also ignore promoted eBay listings, inline advertisements, and so on.

Even if it's an action taken on behalf of a person, it's very unlikely the ads on the page will be delivered to the user. Keep in mind, the entire point of agentic actions is to get an end result, not to use it as a web browser proxy where you see the site's fully rendered content.

The point is something like starting an agentic task of "Access all available pages on the site slashdot.org from the main page and all comments, ignoring advertisements and clearly sponsored content, and build a sqlite database containing the site's posts and all crawled content"

Or

"Crawl everything2 looking for any references to petrified Natalie Portman and hot grits. Copy the text of any such pages, ignoring and omitting any off-topic content such as advertisements and navigation elements. Translate these pages to Pig Latin and repost them anonymously to Slashdot under random comments."

Comment Re:Sounds like the accusations are true. (Score 2) 96

Cloudflare will block if you ask them to block. If your desire is to let AI do what they want on your site, Cloudflare won't get in the way.

Many to most websites are businesses supported by ad revenue in addition to ecommerce. Letting AI become an ad blocker for your end users is not ideal. Letting AI go haywire and purchase random shit instead of what the agent was instructed to do is also not ideal, for that matter.

Each site deserves the right to say "yes", "no", or "on these terms..." to AI crawlers and agents doing things.

Comment Re:Sounds like the accusations are true. (Score 1) 96

In my case, I have this in robots.txt, on a forum I don't want to get absolutely destroyed by bots that don't rate-limit themselves:

User-agent: *
Disallow: /

As for agentic actions, it's still an AI performing the actions. I said "No" to robots accessing my site. That they then pivot and access my site anyway is inappropriate no matter the reason.
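For what it's worth, those two lines aren't ambiguous to a standards-following parser. A minimal sketch using Python's standard urllib.robotparser, where the may_fetch helper, the agent names, and the forum.example URLs are placeholders I made up:

```python
from urllib.robotparser import RobotFileParser

# The same two-line blanket denial quoted above.
BLANKET_DENY = ["User-agent: *", "Disallow: /"]


def may_fetch(user_agent, url, robots_lines=BLANKET_DENY):
    """Return True only if the given robots.txt lines allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical agent names and URLs, for illustration only.
    print(may_fetch("SomeAgent", "https://forum.example/thread/1"))
    print(may_fetch("Mozilla/5.0", "https://forum.example/"))
```

Under that rule every path is denied to every user agent string, including one that looks like a browser. Whether a given bot bothers to ask is a separate question.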

Comment Sounds like the accusations are true. (Score 4, Insightful) 96

Sounds to me like the accusations are true, and Perplexity is deflecting by saying they're harmless and even helpful.

On my own servers, I see a pattern of behavior of something hitting my robots.txt (which has both a blanket denial for all user agents AND specific denials for all known bots), and then suddenly a variety of IP addresses start hammering my site. It's bad enough that I'm going to put my servers behind either Cloudflare or at least one of the gatekeeper challenge systems.

Perplexity's shitty response really does nothing for me but confirm the accusations.

Comment Re:Albanese is a money stealing cunt (Score 1) 125

How the fuck can a party win with 37% of the national vote?

This happens in countries with more than two political parties. In a two-party election, Labor would have won with 55% of the vote from the look of things - https://en.wikipedia.org/wiki/... and https://en.wikipedia.org/wiki/...

Comment Re:DO NOT WASTE YOUR MONEY -- no spoliers (Score 4, Informative) 70

Ah, the fine words of someone that wanted a movie that spent half its runtime retelling Superman's origin story, and the other half rehashing a specific comic book word for word.

Interesting you have so much to say about a movie you watched one quarter of.

"making his home an environmental complaint" - What?
