Fake Scientific Paper Detector

moon_monkey writes "Ever wondered whether a scientific paper was actually written by a robot? A new program developed by researchers at Indiana University promises to tell you one way or the other. It was actually developed in response to a prank by MIT researchers who generated a paper from random bits of text and got it accepted for a conference."
  • Testing... (Score:2, Interesting)

    by OakDragon ( 885217 ) on Tuesday April 25, 2006 @03:25PM (#15199834) Journal
    "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."


    Yep, it works!
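
The "word string repetition" signal described in that quote can be sketched roughly. The Indiana detector's actual model isn't public, so the n-gram size, the function name, and the sample text below are purely illustrative:

```python
from collections import Counter

def ngram_repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that occur more than once.

    A crude proxy for the 'word string repetition' signal the
    detector is said to use; the real classifier is not public,
    so this threshold-free score is illustrative only.
    """
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    # Sum the occurrence counts of every n-gram that repeats at all.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)

sample = ("the cat sat on the mat and the cat sat on the rug "
          "while the dog slept on the mat")
print(round(ngram_repetition_rate(sample, 3), 3))  # → 0.444
```

A real classifier would compare such scores against distributions measured on known-human and known-generated corpora rather than using the raw rate directly.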

  • by cbelt3 ( 741637 ) on Tuesday April 25, 2006 @03:29PM (#15199872) Journal
    I've taken a long posting that I wrote on my blog and dropped it into the site. And I am Inauthentic. Now I understand the "Blade Runner moment" comment in the article. I shall begin to surround myself with oddly colored Polaroids and snapshots of theoretically implanted ancestors.

    The nice thing is that we've finally settled the argument over whether machines can be made to drink beer and like it!
  • by im_thatoneguy ( 819432 ) on Tuesday April 25, 2006 @03:35PM (#15199933)
    Apparently I'm on average 49% artificial, based on school papers I wrote. I dub thee, program: a failure.
  • Re:Self defeating? (Score:5, Interesting)

    by cp.tar ( 871488 ) on Tuesday April 25, 2006 @03:39PM (#15199978) Journal

    I recently had to check out an essay-grading robot for my Introduction to Natural Language Processing class.

    I fed it the introduction of a randomly generated essay. It got a 4/5 on all counts.

    I figure, if teachers are going to use robots to grade essays, we should use robots to create them in the first place.

  • Re:Yes! (Score:1, Interesting)

    by Anonymous Coward on Tuesday April 25, 2006 @03:45PM (#15200044)
    ...or we could have a human just read the damn thing.

    Novel idea.
  • by currivan ( 654314 ) on Tuesday April 25, 2006 @04:25PM (#15200390)
    Duplicating the first half of the sample fake paper after the end of the footnotes makes it go from inauthentic (17%) all the way up to 91% authentic. It seems to be looking for long-range n-gram repetition, but it doesn't have a ceiling on the frequency or length of the repeated text.

    It shouldn't be hard to compare the distribution of n-gram recurrence rates (or distances between recurrences) to the observed distribution for actual papers. Something like a KL divergence would capture deviations in either direction.
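
The commenter's suggestion can be sketched as follows; the helper names, bin width, and smoothing constant are assumptions, not anything from the actual detector. The idea is to histogram the distances between successive recurrences of each n-gram, then compare a candidate paper's histogram against a reference built from genuine papers using a (smoothed) KL divergence:

```python
import math
from collections import defaultdict

def recurrence_distances(words, n=3):
    """Distances (in tokens) between successive occurrences of each n-gram."""
    last_seen = {}
    dists = []
    for i in range(len(words) - n + 1):
        g = tuple(words[i:i + n])
        if g in last_seen:
            dists.append(i - last_seen[g])
        last_seen[g] = i
    return dists

def histogram(values, bin_width=10):
    """Bin distances so sparse counts become a comparable distribution."""
    h = defaultdict(int)
    for v in values:
        h[v // bin_width] += 1
    return h

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the union of supports, with additive smoothing
    so mass present in p but absent in q is still penalized."""
    keys = set(p) | set(q)
    ps, qs = sum(p.values()), sum(q.values())
    total = 0.0
    for k in keys:
        pk = (p.get(k, 0) + eps) / (ps + eps * len(keys))
        qk = (q.get(k, 0) + eps) / (qs + eps * len(keys))
        total += pk * math.log(pk / qk)
    return total
```

In practice one would symmetrize the comparison (e.g. Jensen-Shannon) to catch deviations in either direction, and pool the reference histogram from known-genuine papers. The duplication trick described above would then show up as an implausible spike of recurrences at exactly the offset of the copied block.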
  • Re:It Caught Mine (Score:3, Interesting)

    This raises a question: how do Wikipedia articles fare? I'd guess that they should be at least *somewhat* scientific...
  • Re:Self defeating? (Score:2, Interesting)

    by Frumious Wombat ( 845680 ) on Tuesday April 25, 2006 @04:33PM (#15200445)
    Personally, I'd be more interested in modifying this for Fraud Detection. The robot looks over your data and text, and decides, "Sorry Dave, a leap of faith has occurred here." Presumably, at that point the robot locks you out of your lab.

    This could lead to a whole series of literary robots: The Too Many Coincidences in Fiction Detector, The Humanities Thesis Verbiage Reducer, The This Movie Is Going to Suck No Matter Who Acts In/Directs It Detector, and so forth.
  • False positives (Score:3, Interesting)

    by macklin01 ( 760841 ) on Tuesday April 25, 2006 @07:05PM (#15201439) Homepage

    Hmmm, it's an interesting idea, but it seems to give a lot of false positives. (So naturally, it will detect fake papers, if it thinks every paper is fake.)

    First thing I tried was some pages on my computational oncology website, in particular my cancer primer, which took me no small amount of time to write. Everything I fed it was determined to be inauthentic. Perhaps I just write like a robot. :-) I figured the detector was probably tuned for real papers, so it wasn't too big of a deal.

    So next I tried my most recent research paper, and it, too, was determined to be inauthentic, in fact with less authenticity than my website. So much for the theory that it's tuned for scientific papers only. This thing is starting to look pretty bogus to me ... but an interesting idea, nonetheless. -- Paul

  • Re:Yes! (Score:3, Interesting)

    by Ruff_ilb ( 769396 ) on Tuesday April 25, 2006 @07:50PM (#15201658) Homepage
    They did; the board that accepted the MIT paper, not consisting of specialists in the field, was likely confused by the pseudo-scientific gibberish they encountered. By mastering the methodology for the typical unification of access points and redundancy, the MIT students were able to effectively enter the scientific conference.
  • by Animats ( 122034 ) on Tuesday April 25, 2006 @09:56PM (#15202124) Homepage
    I've been trying my own papers and articles from Wikipedia. My own papers all score around 90%. Wikipedia articles that I consider good ones seem to score in the 80% range. Badly written fancruft scores very low.

    Some variant on this thing might be useful as a new article filter in Wikipedia. We need more automation over there to stem the flow of incoming dreck.
