Comment I'm fine with this. (Score 2) 48

Honestly, more distributions that have no reason to exist and no distinguishing traits should shut down. The people who worked on them should move to projects that aren't just rehashes of existing ones.

When your eulogy can't describe you in any meaningful way, neither your life nor your death was worth celebrating or remembering.

Comment Re:context (Score 1) 61

Re-upload it to where, exactly? What other service is providing ChatGPT 4o, when "Open"AI obsoletes it again?

You can carry the context over (or just select a different model for the current context) as you wish, but what these people want is for the model itself to remain available, with its behavior, tone, and tenor intact.

They didn't lose the past context or the ability to continue the "conversation" entirely; they lost the ability to continue the "conversation" with 4o. These LLMs DO have a "personality", or at least a certain color to their output.

Comment Re:Sounds like the accusations are true. (Score 1) 96

And I'll add one more point, which I meant to make in my previous comment:

Agents, AI or otherwise, aren't just pulling down a single page to present to the user. They're performing logic on the page that was accessed, and probably accessing additional pages. AI agents can be given instructions to "collect all the content on slashdot.org". I did that, and here's what ChatGPT did: https://chatgpt.com/share/6894...

That's behavior that should respect robots.txt, but since it apparently uses Chrome... it clearly doesn't.

Comment Re:Sounds like the accusations are true. (Score 1) 96

All scrapers, crawlers, and other 'bots' SHOULD respect robots.txt. The original intent was to block what was termed at the time (1994) as "crawlers", but that has evolved as the Internet has evolved.

Justifying crawling behavior by saying it's "just scraping, and then loading additional pages..." is... Well. Fucky logic, to say the least. Following your logic here: if I access a single page, then extract all the links in it and add them to an RSS feed, I'm free to access all those subsequent pages because now they're in an RSS feed, and I get to scrape them. I just run this in a loop, iterating over all of them because hey, I just want the contents of all those pages, and...

Where do you draw the line? Your "appending the fulltext to your RSS feed" example is crawling. You iterated over a series of links, accessing all the linked pages to get their full text with an automated process. It's a nonsense argument to say you weren't "crawling". Just because you added --max-depth=1 doesn't mean it wasn't an iterative, automated process retrieving the contents of each page.
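To make that concrete, the "fulltext RSS" scenario boils down to something like this (a rough Python sketch; the URL, the parsing, and the feed-appending step are placeholders of mine, not anyone's actual code):

# Fetch one page, pull out its links, then fetch every linked page.
# Functionally a depth-1 crawler, whatever you want to call it.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

start = "https://example.com/index.html"  # hypothetical starting page
html = requests.get(start).text
soup = BeautifulSoup(html, "html.parser")
links = [urljoin(start, a["href"]) for a in soup.find_all("a", href=True)]

for url in links:                      # --max-depth=1, but still iterating
    fulltext = requests.get(url).text  # automated retrieval of every linked page
    # ...append fulltext to the feed...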

So an AI agent, acting on a set of logic instructions given by a user, accessing multiple pages and traversing them based on findings in the preceding pages (such as executing a search, then following links to scrape for values), isn't crawling?

Automated processes aren't just dumb "crawling and scraping" any more.

I don't believe any automated process should allow itself to access content denied by robots.txt, no matter the logic leaps made to justify it. If I wanted automated processes to access those URLs, they wouldn't be covered by a Disallow rule in robots.txt. A robots.txt file is a statement saying "I specifically do or do not allow access based on these criteria". Those criteria aren't dependent on your use case. They're dependent on mine.
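Honoring that statement is about one conditional, for what it's worth. A minimal sketch using Python's standard-library robotparser (the agent name and URLs here are placeholders):

import urllib.robotparser
import requests

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

url = "https://example.com/forum/thread/42"
agent = "SomeAgent/1.0"  # hypothetical user agent string
if rp.can_fetch(agent, url):
    page = requests.get(url, headers={"User-Agent": agent}).text
else:
    # Disallowed: the polite thing is to stop here, no logic leaps.
    page = None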

Comment Re:Sounds like the accusations are true. (Score 1) 96

Depending on the agent, the output may never be seen by a human. "Constantly monitor eBay for a good deal on a waffle iron, and trigger a notification when found", for example. No human eyes will ever see the pages the agent loads; it's just consuming eBay's compute resources. A better-written prompt will also ignore promoted eBay listings, inline advertisements, and so on.
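For illustration, that waffle-iron watcher reduces to roughly this (a hedged sketch; the search URL, CSS selectors, threshold, and notify() are my own placeholders, not any real agent's internals):

import time
import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://www.ebay.com/sch/i.html?_nkw=waffle+iron"  # illustrative only
TARGET = 20.00  # arbitrary "good deal" threshold

def notify(msg):
    print(msg)  # stand-in for an email or push notification

while True:
    soup = BeautifulSoup(requests.get(SEARCH_URL).text, "html.parser")
    for item in soup.select(".s-item"):             # selector is a guess
        tag = item.select_one(".s-item__price")
        if not tag or not tag.text.startswith("$"):
            continue
        try:
            price = float(tag.text.strip("$").replace(",", ""))
        except ValueError:
            continue  # price ranges like "$10.00 to $20.00" get skipped
        if price <= TARGET:
            notify("Deal found: " + tag.text)
    time.sleep(600)  # re-poll every 10 minutes; nobody ever looks at the page or its ads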

Even if it's an action taken on behalf of a person, it's very unlikely the ads on the page will be delivered to that user. Keep in mind, the entire point of agentic actions is to get an end result, not to act as a web browser proxy where you see the site's fully rendered content.

The point is something like starting an agentic task of "Access all available pages on the site slashdot.org from the main page and all comments, ignoring advertisements and clearly sponsored content, and build a sqlite database containing the site's posts and all crawled content"

Or

"Crawl everything2 looking for any references to petrified Natalie Portman and hot grits. Copy the text of any such pages, ignoring and omitting any off-topic content such as advertisements and navigation elements. Translate these pages to Pig Latin and repost them anonymously to Slashdot under random comments."

Comment Re:Sounds like the accusations are true. (Score 2) 96

Cloudflare will block if you ask it to block. If you want to let AI agents do whatever they want on your site, Cloudflare won't get in the way.

Many, if not most, websites are businesses supported by ad revenue in addition to ecommerce. Letting AI become an ad blocker for your end users is not ideal. Letting AI go haywire and purchase random shit instead of what the agent was instructed to buy is also not ideal, for that matter.

Each site deserves the right to say "yes", "no", or "on these terms..." to AI crawlers and agents doing things.

Comment Re:Sounds like the accusations are true. (Score 1) 96

In my case, I have this in robots.txt, on a forum I don't want to get absolutely destroyed by bots that don't rate-limit themselves:

User-agent: *
Disallow: /

As for agentic actions, it's still an AI performing the actions. I said "No" to robots accessing my site. That they then pivot and do something that ISN'T "not accessing my site" is inappropriate, no matter the reason.

Comment Sounds like the accusations are true. (Score 4, Insightful) 96

Sounds to me like the accusations are true, and Perplexity is deflecting by saying they're harmless and even helpful.

On my own servers, I see a pattern of something hitting my robots.txt (which has both a blanket denial for all user agents AND specific denials for all known bots), and then suddenly a variety of IP addresses starts hammering my site. It's bad enough that I'm going to put my servers behind Cloudflare, or at least behind one of the gatekeeper challenge systems.

Perplexity's shitty response really does nothing for me but confirm the accusations.

Comment Re:Albanese is a money stealing cunt (Score 1) 125

How the fuck can a party win with 37% of the national vote?

This happens in countries with more than two political parties. In a two-party election, Labor would have won with 55% of the vote from the look of things - https://en.wikipedia.org/wiki/... and https://en.wikipedia.org/wiki/...
