Comment Re:Sounds like the accusations are true. (Score 1) 92

How many scrapers do you know that respect robots.txt? That's a standard for crawlers, not for scrapers.

An example: for some time I had a tool running that scraped the web pages linked from RSS feeds to turn headline-only feeds into full-text feeds. That's a scraper: it gets a fixed list of URLs to process, fetches the data, transforms it, and serves it to the user. But since the list is fixed and no new links are added, it doesn't (need to) access robots.txt. The list is fetched because the user asked for it. There is no autonomous decision by some (ro)bot that would need guidance from robots.txt; there is a clear objective defined by the person who wants the content added to the feed.

As said above, if the scraper is combined with a crawler, that's a different thing. But in general, you look at robots.txt when you decide whether to follow a link, not when you already have the URL on your TODO list.
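For the crawler side, Python's standard library even ships a robots.txt parser. A minimal sketch of the "check before following a discovered link" step (the rules and the crawler name here are made up for illustration; a real crawler would load the file over HTTP with set_url() and read()):

```python
from urllib.robotparser import RobotFileParser

# A crawler consults robots.txt before enqueueing a newly discovered link.
# The rules are parsed from literal lines here so the sketch runs offline.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

def should_follow(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Decide whether the crawler may follow this discovered link."""
    return rp.can_fetch(user_agent, url)

print(should_follow("https://example.com/articles/1"))    # allowed by the rules above
print(should_follow("https://example.com/private/data"))  # disallowed by the rules above
```

A scraper working off a user-supplied URL list would simply skip this check, which is exactly the distinction made above.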

Comment Re:Soo, who to trust? (Score 1) 92

No, it does not.

I side with Perplexity in that the access is not crawling, is not used to train AI, and is not a request that needs to obey robots.txt.
But it *IS* scraping by the very definition, i.e., fetching content automatically to parse it and extract information. And most scrapers do not respect robots.txt, as it is made for crawlers, which autonomously follow links, which scrapers don't.

Of course, sometimes crawlers and scrapers are combined, e.g., in the bots that hunt for content for AI training.

Comment Re:What? (Score 1) 92

The point is that users are free to choose their software. If you deny specific software access to content, websites will not only deny AI agents but also browsers with adblockers. Or maybe just everything that isn't Chrome. If I want to read your website with lynx so I don't have to see your ugly background color, that's my right. If I let some AI website fetch and summarize it, that's the same thing. Yes, you wanted me to read the whole thing, but I said TL;DR and made a summary of it, just as I may choose to block your ads or use the content in completely unrelated ways. You just can't (technically or legally) force users to consume your content in one specific way.

If you want to try, go help Axel Springer sue Adblock Plus. They are also trying to do that, claiming that changing the HTML source before displaying it (I'm not even sure that's how ABP does its blocking) would infringe their copyright.

Comment Re:What? (Score 1) 92

It is not unusual to send a browser user-agent string when accessing websites with a virtual browser.

Do you run a webserver? Ever noticed how an iPhone visits little-known sites a few minutes after the Googlebot accessed them?
It isn't even unlikely that it really is an iPhone, testing the load times and how well the page renders on mobile to decide on the ranking (especially in mobile searches). Still, it doesn't identify as Googlebot, probably because Google also wants to catch you serving different content to Googlebot than to actual users.

Comment Explanation (Score 1) 92

Perplexity has three types of access to websites.

The first is crawling. Most of that is not done by Perplexity itself, as they mostly use models trained by others.
The second is web search. Once the AI has decided which search terms might find the information for your question, it asks a search engine (e.g. via the Bing API). The search engine has already crawled the site, and a request may even trigger a recrawl.
The third is agentic access. It doesn't crawl for information but only accesses it as a proxy for the user: once the search results are in, Perplexity fetches the content and summarizes it, just as the user's browser could.
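The agentic step amounts to an ordinary client-side fetch-and-transform. A purely illustrative sketch (the function names and the trivial first-sentences "summarizer" are my own invention, not anything Perplexity has published; the fetcher is injected so the proxy logic can be shown without a network):

```python
from typing import Callable

def summarize(text: str, max_sentences: int = 2) -> str:
    """Stand-in summarizer: keep only the first few sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def agentic_fetch(url: str, fetch: Callable[[str], str]) -> str:
    """Act as a proxy for the user: fetch exactly one page on request,
    transform it, and return the result. No links are followed and
    nothing is retained for training."""
    page_text = fetch(url)
    return summarize(page_text)

# Stubbed fetcher standing in for the HTTP request:
fake_fetch = lambda url: "First sentence. Second sentence. Third sentence."
print(agentic_fetch("https://example.com/article", fake_fetch))
# -> First sentence. Second sentence.
```

The point of the sketch: structurally this is the same single request a browser extension could make on the user's behalf, not a crawl.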

Now Cloudflare and Perplexity disagree about that last step. Cloudflare lumps everything together, seems unable to distinguish Perplexity's requests from requests by others using the same fetching services Perplexity uses, and insists that the requests are crawling. Perplexity, on the other hand, says they just make a single request, don't crawl any further after fetching the page, and don't train an AI with the data, but simply hand it to the user, acting merely as a proxy (or, in modern terms, as an agent) for the user.

Comment Re:Ever wondered why Win11 requires a TPM? (Score 4, Insightful) 89

Yes, Stallman sounds extremist, but he's only two steps ahead of what's coming. And the problem is that even the Stallman-style niches where you can use your computer unrestricted are shrinking. What use is GNU/Linux when you can't open your files because it has no valid trust chain to one of the preinstalled trust anchors?

And with the Web Environment Integrity framework, Google already tried to let websites restrict which browser you can use. Expect that stuff to come back under a new name. First they will say banking websites want to ensure you're not using a malware-infested browser. Then they will provide some API for websites to disallow extensions (again: an extension could steal your banking data!), and the websites will verify your browser to make sure you don't fake the API response.
And then you'll see websites that won't work if you have an adblocker installed. With a web locked down to trusted browsers, you can kiss your privacy extensions goodbye as well if you want to see the content, because ads without tracking are worth less than ads that know who you are.

Comment Re:Too late. (Score 1) 60

The point is that if the USE constitutes fair use, there is no copyright violation.

Simple way to understand it:

Is copyright applicable at all?
Is the work trivial? If yes, copyright isn't applicable.
Does an exception such as fair use apply? Then it is not applicable either.
If it is not applicable, you're free to train your AI.

If it is applicable, the next questions are about licenses.
Is the work already in the public domain? Use it.
Is it under a public license? Use it, and do what the license says.
Is there an automated offer to buy it (e.g. a stock-image website)? You can buy it.
None of the above? Contact the owner and negotiate a license.
They don't grant one? You can't use it.
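The checklist above can be sketched as a small decision function. The boolean flags and their names are my own simplification of the steps in the comment, and this is obviously an illustration, not legal advice:

```python
def may_use(work: dict) -> str:
    """Walk the checklist: first copyright applicability, then licensing.
    `work` is a dict of boolean flags invented for this sketch."""
    # Step 1: is copyright applicable at all?
    if work.get("trivial") or work.get("fair_use_applies"):
        return "free to use (copyright not applicable / exception applies)"
    # Step 2: copyright applies, so look for a license path.
    if work.get("public_domain"):
        return "use it"
    if work.get("public_license"):
        return "use it, following the license terms"
    if work.get("purchasable"):
        return "buy a license"
    if work.get("owner_grants_license"):
        return "use it under the negotiated license"
    return "can't use it"

print(may_use({"fair_use_applies": True}))
print(may_use({}))  # no exception and no license path
```

Note that the order matters: the license questions are only reached when neither triviality nor an exception has already settled the matter.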

Comment Whatever (Score 1) 60

And movies start by telling you that you'll go to jail if you copy them. That neither stops people from copying nor puts the people who are caught in jail.

And for the AI companies the situation is simple: either the fair-use claim holds, in which case Universal Pictures can keep their anti-AI messages to themselves, or it doesn't hold and the AI companies can file for bankruptcy, because they would have to ask everyone for licenses. Universal Pictures' content is only a tiny fraction of what will go into the video models, and without a blanket right to train, they can't create the models at all.

Comment Re:Microsoft's Palladium is here (Score 5, Insightful) 89

If you blame the cheaters for DRM that uses the TPM, you're following the companies' narrative. Blame Activision. Anti-cheat should be done server-side; if you lock down the player's PC, you're doing it wrong. You can't justify "we need to take your rights away because of cheaters" by blaming the cheaters, because that problem isn't worth the restrictions.
