
AI Tool Detects LLM-Generated Text in Research Papers and Peer Reviews
An analysis of tens of thousands of research-paper submissions by an academic publisher has shown a dramatic increase in AI-generated text over the past few years. Nature: The American Association for Cancer Research (AACR) found that 23% of abstracts and 5% of peer-review reports submitted to its journals in 2024 contained text that was probably generated by large language models (LLMs). The publisher also found that fewer than 25% of authors disclosed their use of AI to prepare manuscripts, despite the publisher mandating such disclosure at submission.
To screen manuscripts for signs of AI use, the AACR used an AI tool that was developed by Pangram Labs, based in New York City. When applied to 46,500 abstracts, 46,021 methods sections and 29,544 peer-review comments submitted to 10 AACR journals between 2021 and 2024, the tool flagged a rise in suspected AI-generated text in submissions and review reports since the public release of OpenAI's chatbot, ChatGPT, in November 2022.
Not terribly surprising (Score:2)
Of course, this being /. I have read TFS but not TFA.
What is it that they are detecting, really? That some text was generated by an LLM is not terribly surprising: lots of language tools akin to Grammarly are powered by LLMs.
That the abstract was partially written by an LLM isn't really a problem either. Abstracts are summaries, and LLMs are somewhat decent at those. As long as you proofread for accuracy, it's probably fine.
Now, if the paper and the results are LLM-generated, then yeah, that's an issue.
Re: (Score:3)
I've heard complaints from autistic people that they have been accused of being AI due to the way they write. I bet the false positives are pretty bad with this one.
Re: (Score:1)
I write stories about hardboiled detectives, usually interacting with fairy tale and fantasy characters.
My writing style is deliberately contrived, with over-the-top metaphors and lots of archaic slang.
I wonder if this tool would tell me that I'm an AI based on that?
You're Totally Right (Score:2)
What are the odds this AI hallucinates about finding passages written by AI? Feels like a very slippery slope, and it doesn't address the real problem: there are generally too many bogus papers submitted and not enough staff to review them.
Re: (Score:3)
All AI detectors are known for large false-positive rates. Don't rely on them; you'll probably do harm to people who didn't use AI.
Re: (Score:2)
The people you mention who didn't use AI are essentially victims of the AI cheaters, whose behaviour causes predictable countermeasures. Just like the wider journal readership, who are victims too.
Re: (Score:2)
If you don't need to worry about false positives, then just go with a detector that always outputs "AI". It has a zero false-negative rate.
You ALWAYS need to minimize both false positives and false negatives. And if you want to make allegations against someone, you'd better have a very low false-positive rate.
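To make that concrete, here is a minimal sketch of the two error rates; the counts below are illustrative assumptions, not AACR or Pangram figures:

```python
# Why a detector must balance both error rates. All numbers are made up
# for illustration.

def rates(flagged_ai, total_ai, flagged_human, total_human):
    """Return (false_negative_rate, false_positive_rate)."""
    fnr = 1 - flagged_ai / total_ai        # AI texts the detector missed
    fpr = flagged_human / total_human      # human texts wrongly accused
    return fnr, fpr

# Degenerate detector that always outputs "AI": zero false negatives,
# but it accuses every honest author.
print(rates(flagged_ai=100, total_ai=100, flagged_human=900, total_human=900))
# -> (0.0, 1.0)

# A plausible real detector: misses some AI text, still accuses some humans.
print(rates(flagged_ai=95, total_ai=100, flagged_human=18, total_human=900))
# -> approx (0.05, 0.02)
```

The degenerate detector "wins" on one metric only, which is the grandparent's point: optimizing a single error rate in isolation is meaningless.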
Begun, (Score:1)
The AI wars have
Who's gonna guard the guard? (Score:3)
If AI algorithms are fundamentally unreliable, are they allowed to grade their own homework?
Accuracy? Relevance? (Score:2)
Then there is the question of accuracy and relevance.
Re: (Score:1)
For this workflow, it just needs to be accurate enough to flag a manuscript or reviewer comments for human review. If the authors disclosed their use and it was AI-assisted, great. If not, question what else the authors or reviewers might be dishonest about.
The detection AI concurs reasonably with human judgement: "The study also found that submissions in 2025 with abstracts flagged by Pangram were twice as likely to be rejected by journal editors before peer review as were those not flagged by the tool."
Re: (Score:2)
For this workflow, it just needs to be accurate enough to flag a manuscript or reviewer comments for human review.
How do you figure that? A human generally can't tell AI generated text from human generated text although I will admit that I'm getting a bit of an AI-vibe from your post.
Re: (Score:1)
"A human generally can't tell AI generated text from human generated text"
Go read some of the grad-student Facebook groups. Folks who regularly see AI text and human-authored text can tell them apart fairly reliably. TFS talks about how humans agreed with the AI detector about AI-assisted texts being low quality.
When I did editor training, a part of that was to read and edit for *flow*, which gen AI does not currently understand.
> I will admit that I'm getting a bit of an AI-vibe from your post.
Thank you.
Adversarial Networks (Score:2)
We already have adversarial network training methods. Each time I see a "tool that detects AI" I can only imagine the AI tool makers are also going to start using these tools to do adversarial training on their models / outputs to bypass whatever checks these tools do.
Then again, we're getting closer and closer to XKCD's reality: https://xkcd.com/810/ [xkcd.com]
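For anyone who hasn't seen the pattern, here's a toy sketch of that cat-and-mouse loop. The "detector" and "paraphraser" below are trivial stand-ins for real models, not anything Pangram or the LLM vendors actually expose:

```python
# Toy adversarial refinement loop: a generator is repeatedly tuned against
# a frozen detector until its output stops being flagged. Both functions
# here are fake stand-ins for illustration.
import random

def detector_score(text):
    # Stand-in for an AI detector: pretends texts containing "delve"
    # look machine-written. A real detector is a trained classifier.
    return 0.9 if "delve" in text else 0.1

def paraphrase(text):
    # Stand-in for a generator's rewrite step: swap out the flagged token.
    swaps = {"delve": random.choice(["dig", "look", "probe"])}
    return " ".join(swaps.get(w, w) for w in text.split())

text = "we delve into the results"
for step in range(10):                  # adversarial refinement loop
    if detector_score(text) < 0.5:      # detector no longer flags it
        break
    text = paraphrase(text)             # mutate to evade the detector
print(step, text)
```

Scale that loop up to real models on both sides and every published detector becomes training signal for the next generation of evaders.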
Using an LLM to create text... (Score:2)
...is fine, if you honestly disclose how it was created
Using an LLM to create text and claiming that you wrote it is fraud
That's pretty dumb (Score:2)
All my text, including this, is generated by an LLM called my brain.
I doubt that (Score:2)
Many people use AI to 'beautify' their own texts, or correct them stylistically or make them sound more scientific, legal or whatever, especially since LOTS of them aren't native English speakers.
That's not 'generating'.
Re: (Score:2)
You can be as accurate as any top scientist, but if you don't SOUND like a scientist you won't be taken (as) seriously. It's stupid, but it's true.
It's no different than with accents, which have been studied quite a lot. You can say the most profound thing, but if you have a southern / redneck / hillbilly accent when you say it, you are immediately perceived as less intelligent, and depending on the listener it can trigger some heavy cognitive dissonance, distracting from the message being sent. Again, this is stupid, but it's true.
quis custodiet ipsos custodes? (Score:2)
Interesting article. The paper [arxiv.org] is pretty interesting, too. Pangram Text is an AI detector that claims 99% accuracy at spotting machine-generated text. Impressive benchmarks, clever tricks (mirror prompts, hard-negative mining), and a lot of self-congratulation about finally solving the “who wrote this?” problem. But if you peel it back, what you see is the same old cat-and-mouse game with plagiarism we’ve been playing since middle-school kids first discovered they could plagiarize an encyclopedia.
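A quick back-of-the-envelope Bayes check shows what a claimed "99% accuracy" actually buys at different base rates. Treating the claim as 99% sensitivity and 99% specificity is my own illustrative reading, not a number from the paper:

```python
# Bayes' rule: how trustworthy is a flag from a "99% accurate" detector?
# The sensitivity/specificity figures are assumptions for illustration.

def p_ai_given_flag(sensitivity, specificity, base_rate):
    """P(text is AI | detector flags it), by Bayes' rule."""
    true_flags = sensitivity * base_rate
    false_flags = (1 - specificity) * (1 - base_rate)
    return true_flags / (true_flags + false_flags)

# If 23% of abstracts contain LLM text (the AACR figure from the summary):
print(p_ai_given_flag(0.99, 0.99, 0.23))   # approx 0.97: flags mostly right

# If only 2% do, as at a venue with little AI use:
print(p_ai_given_flag(0.99, 0.99, 0.02))   # approx 0.67: a third of flags wrong
```

Same detector, same benchmark numbers, very different odds of falsely accusing an honest author, which is exactly why "who guards the guards" matters here.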