
AI Tool Detects LLM-Generated Text in Research Papers and Peer Reviews
An analysis of tens of thousands of research-paper submissions by an academic publisher has shown a dramatic increase in AI-generated text over the past few years. Nature: The American Association for Cancer Research (AACR) found that 23% of abstracts and 5% of peer-review reports submitted to its journals in 2024 contained text that was probably generated by large language models (LLMs). The publisher also found that fewer than 25% of authors disclosed their use of AI to prepare manuscripts, despite the publisher mandating such disclosure at submission.
To screen manuscripts for signs of AI use, the AACR used an AI tool that was developed by Pangram Labs, based in New York City. When applied to 46,500 abstracts, 46,021 methods sections and 29,544 peer-review comments submitted to 10 AACR journals between 2021 and 2024, the tool flagged a rise in suspected AI-generated text in submissions and review reports since the public release of OpenAI's chatbot, ChatGPT, in November 2022.
Not terribly surprising (Score:2)
Of course, this being /. I have read TFS but not TFA.
What is it that they are detecting, really? That some text was generated by an LLM is not terribly surprising: lots of language tools akin to Grammarly are powered by LLMs.
That the abstract was partially written by an LLM isn't really a problem either. Abstracts are summaries, and LLMs are somewhat decent at those. As long as you proofread for accuracy, it's probably fine.
Now, if the paper and the results are LLM-generated, then yeah, that's an issue.
Re: (Score:3)
I've heard complaints from autistic people that they have been accused of being AI due to the way they write. I bet the false positives are pretty bad with this one.
Re: (Score:1)
I write stories about hardboiled detectives, usually interacting with fairy tale and fantasy characters.
My writing style is deliberately contrived, with over-the-top metaphors and lots of archaic slang.
I wonder if this tool would tell me that I'm an AI based on that?
You're Totally Right (Score:2)
What are the odds this AI hallucinates about finding passages written by AI? Feels like a very slippery slope, and it doesn't address the real problem: there are generally too many bogus papers submitted and not enough staff to review them.
Re: (Score:3)
All AI detectors are known for large false-positive rates. Don't rely on them; you'll probably do harm to people who didn't use AI.
Re: (Score:2)
The people you mention who didn't use AI are essentially victims of the AI cheaters, whose behaviour causes predictable countermeasures. Just like the wider journal readership, who are victims too.
Re: (Score:2)
If you don't need to worry about false positives, then just go with a detector that always outputs "AI". It has a zero false-negative rate.
You ALWAYS need to minimize both false positives and false negatives. And if you want to make allegations against someone, you'd better have a very low false-positive rate.
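To make that concrete, here is a minimal sketch of the two error rates; the counts below are illustrative assumptions, not AACR or Pangram figures:

```python
# Why a detector must balance both error rates. All numbers are made up
# for illustration.

def rates(flagged_ai, total_ai, flagged_human, total_human):
    """Return (false_negative_rate, false_positive_rate)."""
    fnr = 1 - flagged_ai / total_ai        # AI texts the detector missed
    fpr = flagged_human / total_human      # human texts wrongly accused
    return fnr, fpr

# Degenerate detector that always outputs "AI": zero false negatives,
# but it accuses every honest author.
print(rates(flagged_ai=100, total_ai=100, flagged_human=900, total_human=900))
# -> (0.0, 1.0)

# A plausible real detector: misses some AI text, still accuses some humans.
print(rates(flagged_ai=95, total_ai=100, flagged_human=18, total_human=900))
# -> approx (0.05, 0.02)
```

The degenerate detector "wins" on one metric only, which is the grandparent's point: optimizing a single error rate in isolation is meaningless.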
Begun, (Score:1)
The AI wars have
Who's gonna guard the guard? (Score:3)
If AI algorithms are fundamentally unreliable, are they allowed to grade their own homework?
Accuracy? Relevance? (Score:2)
Then there is the question of accuracy and relevance.
Re: (Score:1)
For this workflow, it just needs to be accurate enough to flag a manuscript or reviewer comments for human review. If the authors disclosed their use and it was AI-assisted, great. If not, question what else the authors or reviewers might be dishonest about.
The detection AI concurs reasonably with human judgement: "The study also found that submissions in 2025 with abstracts flagged by Pangram were twice as likely to be rejected by journal editors before peer review as were those not flagged by the tool."
Re: (Score:2)
For this workflow, it just needs to be accurate enough to flag a manuscript or reviewer comments for human review.
How do you figure that? A human generally can't tell AI generated text from human generated text although I will admit that I'm getting a bit of an AI-vibe from your post.
Re: (Score:1)
"A human generally can't tell AI generated text from human generated text"
Go read some of the grad-student Facebook groups. Folks who regularly see AI text and human-authored text can tell them apart fairly reliably. TFS talks about how humans agreed with the AI detector about AI-assisted texts being low quality.
When I did editor training, a part of that was to read and edit for *flow*, which gen AI does not currently understand.
> I will admit that I'm getting a bit of an AI-vibe from your post.
Thank you.
Adversarial Networks (Score:2)
We already have adversarial network training methods. Each time I see a "tool that detects AI" I can only imagine the AI tool makers are also going to start using these tools to do adversarial training on their models / outputs to bypass whatever checks these tools do.
Then again, we're getting closer and closer to XKCD's reality: https://xkcd.com/810/ [xkcd.com]
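For anyone who hasn't seen the pattern, here's a toy sketch of that cat-and-mouse loop. The "detector" and "paraphraser" below are trivial stand-ins for real models, not anything Pangram or the LLM vendors actually expose:

```python
# Toy adversarial refinement loop: a generator is repeatedly tuned against
# a frozen detector until its output stops being flagged. Both functions
# here are fake stand-ins for illustration.
import random

def detector_score(text):
    # Stand-in for an AI detector: pretends texts containing "delve"
    # look machine-written. A real detector is a trained classifier.
    return 0.9 if "delve" in text else 0.1

def paraphrase(text):
    # Stand-in for a generator's rewrite step: swap out the flagged token.
    swaps = {"delve": random.choice(["dig", "look", "probe"])}
    return " ".join(swaps.get(w, w) for w in text.split())

text = "we delve into the results"
for step in range(10):                  # adversarial refinement loop
    if detector_score(text) < 0.5:      # detector no longer flags it
        break
    text = paraphrase(text)             # mutate to evade the detector
print(step, text)
```

Scale that loop up to real models on both sides and every published detector becomes training signal for the next generation of evaders.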
Using an LLM to create text... (Score:2)
...is fine, if you honestly disclose how it was created
Using an LLM to create text and claiming that you wrote it is fraud
That's pretty dumb (Score:2)
All my text, including this, is generated by an LLM called my brain.
I doubt that (Score:2)
Many people use AI to 'beautify' their own texts, or correct them stylistically or make them sound more scientific, legal or whatever, especially since LOTS of them aren't native English speakers.
That's not 'generating'.
Re: (Score:2)
You can be as accurate as any top scientist, but if you don't SOUND like a scientist you won't be taken (as) seriously. It's stupid, but it's true.
It's no different than with accents, which have been studied quite a lot. You can say the most profound thing, but if you have a southern / redneck / hillbilly accent when you say it, you are immediately perceived as less intelligent, and depending on the listener it can trigger some heavy cognitive dissonance, distracting from the message being sent. Again, this is stupid, but it's true.
quis custodiet ipsos custodes? (Score:2)
Interesting article. The paper [arxiv.org] is pretty interesting, too. Pangram Text is an AI detector that claims 99% accuracy at spotting machine-generated text. Impressive benchmarks, clever tricks (mirror prompts, hard-negative mining), and a lot of self-congratulation about finally solving the “who wrote this?” problem. But if you peel it back, what you see is the same old cat-and-mouse game with plagiarism we’ve been playing since middle-school kids first discovered they could plagiarize an encyclopedia.
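A quick back-of-the-envelope Bayes check shows what a claimed "99% accuracy" actually buys at different base rates. Treating the claim as 99% sensitivity and 99% specificity is my own illustrative reading, not a number from the paper:

```python
# Bayes' rule: how trustworthy is a flag from a "99% accurate" detector?
# The sensitivity/specificity figures are assumptions for illustration.

def p_ai_given_flag(sensitivity, specificity, base_rate):
    """P(text is AI | detector flags it), by Bayes' rule."""
    true_flags = sensitivity * base_rate
    false_flags = (1 - specificity) * (1 - base_rate)
    return true_flags / (true_flags + false_flags)

# If 23% of abstracts contain LLM text (the AACR figure from the summary):
print(p_ai_given_flag(0.99, 0.99, 0.23))   # approx 0.97: flags mostly right

# If only 2% do, as at a venue with little AI use:
print(p_ai_given_flag(0.99, 0.99, 0.02))   # approx 0.67: a third of flags wrong
```

Same detector, same benchmark numbers, very different odds of falsely accusing an honest author, which is exactly why "who guards the guards" matters here.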