AI

Perplexity AI Faces Scrutiny Over Web Scraping and Chatbot Accuracy (wired.com) 20

Perplexity AI, a billion-dollar "AI" search startup, has come under scrutiny for its data collection practices and the accuracy of its chatbot's responses. Despite claiming to respect website operators' wishes, Perplexity appears to scrape content from sites that have blocked its crawler, using an undisclosed IP address, a Wired investigation found. The chatbot also generates summaries that closely paraphrase original reporting with minimal attribution. Furthermore, its AI often "hallucinates," inventing false information when unable to access articles directly. Perplexity's CEO, Aravind Srinivas, maintains the company is not acting unethically.
This discussion has been archived. No new comments can be posted.

  • Why do we care? "Perplexity Is a Bullshit Machine": it's surprisingly unclear what the AI search startup actually is [wired.com]

    Do we finally have the beginning of the end of the AI hype cycle? Maybe it's just the end of the beginning?

    • I too have noticed Perplexity straight up making up citations. But is it just Perplexity bs-ing?

      https://www.cbc.ca/news/canada... [www.cbc.ca]

      https://fortune.com/2023/06/23... [fortune.com]

      https://www.fd.org/news/colora... [fd.org]

      Maybe it's not a bad thing. If it's generally understood to be wrong some part of the time, people will have to check everything instead of just copy/pasting. It's no different than real life, where it's an important skill to discern fact from fiction.

      • As far as making things up goes, I think it's worse than people think. The AI gets trained off any lies that people spot. On the other hand, if a lie doesn't get spotted, then on it goes. Furthermore, that lie may well get out into the world and into training documents. That means that it's learning to tell exactly those lies that are impossible to spot. People will check the (fake) AI by searching on the internet and will find validation in an answer that's already been dumped from the AI by some previo

      • Problem is that AI will produce bullshit at a thousand times the rate of even the worst human bullshitters.
      • I had this same problem with ChatGPT-4o today. I asked it for some crime stats and it linked to articles written by Pew Research, the FBI, and Reason magazine. At one point the response contained about eight URLs. Of those eight, only two were real. Not only were the articles in Reason all coming up as "404 not found", but searching for the article titles got me nothing. That's because they never existed at all.
  • by TomGreenhaw ( 929233 ) on Thursday June 20, 2024 @08:46AM (#64563429)
    I thought I was dealing with a low grade denial of service attack until I saw all the traffic coming from a Claude Bot user agent slurping the same pages relentlessly from dozens of Amazon VMs. Robots.txt was completely ignored. As much as I dislike useless regulation, there ought to be a law...
    • Comment removed based on user account deletion
      • Not dynamic; I can only assume it's a bug of some kind. It's a shame, really. I didn't want to block them, as I would like my customer's site to be included in LLMs.

        It may not even be Anthropic, but another actor posing as them.
  • Despite all the other BS Perplexity is doing, which may or may not be accurate, there is one thing here that needs to be addressed:

    Robots.txt is not a gatekeeper

    In theory, Perplexity’s chatbot shouldn’t be able to summarize WIRED articles, because our engineers have blocked its crawler via our robots.txt file since earlier this year. This file instructs web crawlers on which parts of the site to avoid, and Perplexity claims to respect the robots.txt standard. WIRED’s analysis found that

    • If you do not want to have your text scraped by AI bots, just don't publish it on the internet. Unfortunately, this makes the internet useless...
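To make the "robots.txt is not a gatekeeper" point concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user-agent name, the example.com URLs, and the `is_allowed` helper are all hypothetical, chosen only for illustration. The check runs entirely on the crawler's side, so a crawler that skips it, or identifies itself under a different user agent, is not stopped by the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the user-agent name and paths are
# illustrative assumptions, not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler calls a check like this and honors the answer;
    # nothing on the server side enforces it.
    for agent, url in [
        ("ExampleBot", "https://example.com/article"),
        ("OtherBot", "https://example.com/article"),
        ("OtherBot", "https://example.com/private/page"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A site operator who needs actual enforcement would have to block requests at the server or firewall level; that is outside what this sketch shows.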
  • A forum I visit daily was really slow for a few days about a week ago. The forum was showing over 160 "guests" when on any normal day there would be about 20. The forum admin tracked the IP addresses of these "guests" back to Facebook. These guests were visiting every page in the forum in rapid succession for days on end. Robots.txt did nothing to stop them. It looks like a Llama was spitting all over the forum.

  • These people are greedy assholes and do not care who they steal from or how much damage they do.

  • I've used it quite a bit over the last month and found it generally useful on a variety of subjects.

    It's good if you are asking it how to do something very specific in Linux. It's not always right, because the answer may be out of date, but it gives a better and quicker pointer to a viable source than doing the screening yourself.

    It is sometimes quite wrong because it relies on one inaccurate source, e.g. an opinion piece on an obscure partisan web site. So you have to be wary.

    On literature and literary criticism
