Fake Scientific Paper Detector - Slashdot


Fake Scientific Paper Detector (277 comments)

Posted by ScuttleMonkey on Tuesday April 25, 2006 @04:22PM from the paper-unnoticed-amidst-conference-white-noise dept.

moon_monkey writes "Ever wondered whether a scientific paper was actually written by a robot? A new program developed by researchers at Indiana University promises to tell you one way or the other. It was actually developed in response to a prank by MIT researchers who generated a paper from random bits of text and got it accepted for a conference."


Re:That's good and all (Score:5, Informative)

by visgoth ( 613861 ) writes: on Tuesday April 25, 2006 @04:34PM (#15199931)

Oh, I'm sure the work of monkeys is quite easily identifiable [vivaria.net].

Only works for scientific papers (Score:5, Informative)

by gurps_npc ( 621217 ) writes: on Tuesday April 25, 2006 @04:35PM (#15199936) Homepage

If you try to use it on any human-written NON-scientific paper, such as Lincoln's Gettysburg Address, it almost always considers it false.
I suspect that it is looking for conventional thinking with conventional word structure. As such, it is NOT a good idea i

Re:Typos (Score:3, Informative)

by dlakelan ( 43245 ) writes: <dlakelan&street-artists,org> on Tuesday April 25, 2006 @04:35PM (#15199941) Homepage

Do robots make typos? Do they make the same typos each time, or different ones?

Based on the Slashdot articles that get posted, I would say YES.

Actually, it's pretty easy to add convincing random misspellings to text. You could build a database from something like Usenet, using a spell checker to map misspelled words to their correct counterparts, and then use a straightforward algorithm to replace some set of words with misspellings, tuned for consistency. It would be easier than many other aspects of faking papers.
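A minimal sketch of the consistent-typo idea, assuming a tiny hand-made misspelling table in place of the Usenet-mined database described above (`MISSPELLINGS` and `add_typos` are hypothetical names). Consistency comes from picking one misspelling per word for the whole document, so the fake "author" always makes the same typo for the same word.

```python
import random
import re

# Hypothetical stand-in for a Usenet-derived table:
# correct spelling -> misspellings seen "in the wild".
MISSPELLINGS = {
    "receive": ["recieve", "receve"],
    "separate": ["seperate", "seprate"],
    "definitely": ["definately", "definitly"],
    "their": ["thier"],
    "occurred": ["occured", "ocurred"],
}


def add_typos(text, rate=0.5, seed=None):
    """Misspell roughly `rate` of the known word types, consistently.

    Each word type gets ONE misspelling for the whole document (or none),
    so a repeated word is always misspelled the same way.
    """
    rng = random.Random(seed)
    habits = {}
    for word, variants in MISSPELLINGS.items():
        if rng.random() < rate:
            habits[word] = rng.choice(variants)

    def swap(match):
        word = match.group(0)
        return habits.get(word, word)

    # Only all-lowercase words are touched; capitalized forms are left
    # alone to keep the sketch short.
    return re.sub(r"\b[a-z]+\b", swap, text)


print(add_typos("we definitely receive their data, definitely.", rate=1.0, seed=1))
```

Choosing the "habits" once per document, rather than flipping a coin at every occurrence, is what gives the tunable consistency the comment mentions; `rate` controls how sloppy the fake author is.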

Re:An interesting experiment (Score:2, Informative)

by Ontain ( 931201 ) writes: on Tuesday April 25, 2006 @04:44PM (#15200038)

That's not surprising. I ran a few articles through it and they came up in the 20-ish percent range. This detector isn't very good.

Re:An interesting experiment (Score:2, Informative)

by MindStalker ( 22827 ) writes: <mindstalker@[ ]il.com ['gma' in gap]> on Tuesday April 25, 2006 @04:48PM (#15200068) Journal

It was intended to classify scientific studies, not articles.

I am in awe (Score:5, Informative)

by DingerX ( 847589 ) writes: on Tuesday April 25, 2006 @05:02PM (#15200211) Journal

So I go there, and I start shoving it text from my hard drive. I try:

A) Text of an article (Philosophy) that I (a native English speaker) wrote in Italian: 98.5%, Authentic.
B) Text of an article I wrote in English (History): 87.8%
C) Text of an article (History) written in French by a native French speaker and translated into English: 93.2%
D) Critical edition of a 14th-century Latin text (Theology): 97.7%, Authentic.
E) Documentation for a field artillery simulation: 95.3%
F) A completely bogus narrative for a monastic order that doesn't exist, written in a style that mimics A)-C): 16.8%, Inauthentic

So in this case, we have a human-written document that has superficial meaning but is written as a "fake scientific paper", and it registers as such.

And yes, I did read the "purpose" of the page; I know it's not supposed to detect it.

And yet it does, decisively.

Re:Only works for scientific papers (Score:5, Informative)

by nasor ( 690345 ) writes: on Tuesday April 25, 2006 @05:07PM (#15200255)

No, it doesn't even seem to work on scientific papers. I submitted four papers from the latest issue of Inorganic Chemistry and it thought 2 out of 4 were false:

Inauthentic: Assembly of a Heterobinuclear 2-D Network: A Rare Example of Endo- and Exocyclic Coordination of Pd(II)/Ag(I) in a Single Macrocycle.

Inauthentic: Pyrazolate-Bridging Dinucleating Ligands Containing Hydrogen-Bond Donors: Synthesis and Structure of Their Cobalt Analogues

Authentic: Manganese Complexes of 1,3,5-Triaza-7-phosphaadamantane (PTA): The First Nitrogen-Bound Transition-Metal Complex of PTA

Authentic: Structure, Luminescence, and Adsorption Properties of Two Chiral Microporous Metal-Organic Frameworks

Based on this (small) sampling, the program doesn't appear to do any better than if it were to guess randomly. I wonder if this thing is even supposed to work, or if it just returns a random result based on a hash of the paper or something?
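To put numbers on "(small) sampling", here is a sketch of the binomial tail for a classifier that just flips a fair coin, assuming each paper is an independent 50/50 guess. Chance alone gets at least 2 of 4 right about 69% of the time, so a 2-of-4 score is consistent with random guessing; but four papers are also too few to rule much in or out.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    With p = 0.5 this is the chance that a classifier that flips a fair
    coin labels at least k of n papers correctly.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# nasor's sample: 4 genuine papers, 2 labelled "Authentic".
print(prob_at_least(2, 4))  # → 0.6875, i.e. 11/16
print(prob_at_least(4, 4))  # → 0.0625: even 4/4 happens by luck 1 time in 16
```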

What it says about anything (Score:3, Informative)

by Pi_0's don't shower ( 741216 ) writes: <ethan@isp.northw ... u ['est' in gap]> on Tuesday April 25, 2006 @05:26PM (#15200406) Homepage Journal

I just finished writing a scientific paper for publication. Apparently, this filter relies heavily on long-term pattern recognition. When I fed this application only my introduction, it told me my work was INAUTHENTIC, with a 35% chance of authenticity. When I fed it the first two sections, it said it was AUTHENTIC, with a 66% chance of authenticity. And finally, when I fed it the entire paper, it said it was AUTHENTIC at the 87% level.

So apparently, all you need to do to beat this filter is insert the same buzzwords and phrases at many different points in a long article, and you should be fine.

Read the Paper - Looks at Repetition (Score:3, Informative)

by Constantine Evans ( 969815 ) writes: on Tuesday April 25, 2006 @07:43PM (#15201300) Homepage

Read the paper listed in the menu of the website. The system essentially compresses the text with different window sizes and then looks at the compression factors. In other words, it is only looking for repetition of strings. This is absurdly easy to fool, and the MIT generator could easily be fixed to pass this filter. For example, try entering some random text once (your post, for example): it fails. Then append a few copies of the same text and run that through. Your post, run once, is too short. With two copies, it is rejected at 41.2%. With three, it passes at 93%. There is a window of repetition levels required to pass: papers that do not repeat enough are classified as fake, and so are papers that repeat too much (try entering twenty copies of your post).
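A rough sketch of the "compress with different window sizes" idea, using zlib's DEFLATE and its `wbits` window parameter as a stand-in. This is an assumption for illustration: it does not reproduce the detector's actual compressor or its classification step, only the kind of repetition signal a compression profile exposes.

```python
import random
import string
import zlib


def compression_profile(text, window_bits=range(9, 16)):
    """Return {wbits: compressed_size / original_size} for several windows.

    DEFLATE (via zlib) can only reuse material inside its history window of
    2**wbits bytes (512 B for wbits=9 up to 32 KiB for wbits=15), so a text
    whose repeats lie far apart only compresses well at the larger windows.
    """
    data = text.encode("utf-8")
    profile = {}
    for wbits in window_bits:
        comp = zlib.compressobj(level=9, wbits=wbits)
        packed = comp.compress(data) + comp.flush()
        profile[wbits] = len(packed) / len(data)
    return profile


# Demo: 2,000 characters of random letters, once and then three times over.
rng = random.Random(0)
once = "".join(rng.choice(string.ascii_lowercase + " ") for _ in range(2000))
single = compression_profile(once)
triple = compression_profile(once * 3)
for wbits in (9, 12, 15):
    print(wbits, round(single[wbits], 3), round(triple[wbits], 3))
```

For the single copy the ratio should barely move with window size. For three copies, windows of 4 KiB and up (wbits 12 and above) can point back at the earlier copies and the ratio drops sharply, while the 512-byte window cannot. That is why appending copies of a text shifts a repetition-based score so easily.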

It should be relatively simple to make a random paper generator that always passes this test with a higher probability than human-written papers.

Re:Turing test? (Score:2, Informative)

by ironring2006 ( 968941 ) writes: on Tuesday April 25, 2006 @08:51PM (#15201665)

Speaking of Turing, this showed up in the references for the automatic paper that I generated:

Turing, A., Wilkes, M. V., Nehru, B., Wang, F. Z., Subramanian, L., Zhao, W., Beaman, N. A., Turcotte, B. A., and Wu, V. Refining consistent hashing and 16 bit architectures with SandyEos. Journal of Efficient, Highly-Available Communication 1 (Apr. 2002), 50-62.
Glad to see he's still contributing to the field from the grave!
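For context on how output like that citation comes about: SCIgen, the MIT generator behind the prank, builds papers by randomly expanding a hand-written context-free grammar. Below is a toy expander in that spirit; the grammar (`GRAMMAR`, `expand`) is a made-up miniature seeded with pieces of the citation above, not SCIgen's actual rules.

```python
import random

# Toy context-free grammar (hypothetical, NOT SCIgen's rules): each nonterminal
# maps to a list of alternatives, and each alternative is a list of symbols.
# Any symbol that is not a key of GRAMMAR is emitted as literal text.
GRAMMAR = {
    "CITATION": [["AUTHORS", " ", "TITLE", ". ", "VENUE", " ", "ISSUE", "."]],
    "AUTHORS": [["AUTHOR"], ["AUTHOR", ", ", "AUTHOR", ", and ", "AUTHOR"]],
    "AUTHOR": [["Turing, A."], ["Wilkes, M. V."], ["Nehru, B."], ["Zhao, W."]],
    "TITLE": [["VERB", " ", "TOPIC", " and ", "TOPIC", " with SandyEos"]],
    "VERB": [["Refining"], ["Deconstructing"], ["Decoupling"]],
    "TOPIC": [["consistent hashing"], ["16 bit architectures"], ["Byzantine fault tolerance"]],
    "VENUE": [["Journal of Efficient, Highly-Available Communication"]],
    "ISSUE": [["1 (Apr. 2002), 50-62"]],
}


def expand(symbol, rng):
    """Recursively expand `symbol`, picking one alternative uniformly at random."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: literal text
    alternative = rng.choice(GRAMMAR[symbol])
    return "".join(expand(part, rng) for part in alternative)


# Nothing stops the same author or topic from being drawn twice, and the
# result is well-formed nonsense: the kind of text the detector aims to flag.
print(expand("CITATION", random.Random(2006)))
```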

Re:That's good and all (Score:1, Informative)

by Anonymous Coward writes: on Tuesday April 25, 2006 @09:31PM (#15201787)

The infinite-monkeys-and-time scenario (actually, you only need one or the other; or rather, you need infinite "monkeytime", which you get either if you have infinitely many monkeys, or if you have infinite time and an assurance that the monkey population will not die out) does not depend on a uniform probability distribution for letters. It only assumes that the probability of the desired combination of characters occurring is nonzero. That is quite plausible: one of the things a monkey could do with a typewriter is hit keys largely at random, so there is a nonzero probability of it keeping up this behaviour long enough to produce a combination of the desired length.

(Once you have a probability $P \in (0,1]$ of the combination arising within a finite time $t$, then for every multiple $kt$ of $t$, where $k$ is a positive integer, the probability of it arising at least once is at least $1-(1-P)^k$, which approaches 1 as $k$ (representing time) approaches infinity. Thus, the limit probability is 1.)
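The parenthetical argument can be written out as below. It leans on one assumption the comment leaves implicit: that every block of length t, whatever happened in earlier blocks, still gives the target a chance of at least P (independent blocks are the simplest case).

```latex
% Assumption: time is split into disjoint blocks of length t, and in every
% block the target appears with probability at least P > 0, regardless of
% what happened in earlier blocks.
\Pr[\text{no success by time } kt] \le (1-P)^k
\quad\Longrightarrow\quad
\Pr[\text{success by time } kt] \ge 1-(1-P)^k \xrightarrow[k\to\infty]{} 1,
\qquad \text{since } 0 \le 1-P < 1.

% Blocks needed for success probability at least 1-\delta (0 < \delta < 1, 0 < P < 1):
(1-P)^k \le \delta \iff k \ge \frac{\ln \delta}{\ln(1-P)}.
```

For example, with $P = 1/2$ and $\delta = 0.01$, the bound gives $k \ge 6.64$, so 7 blocks suffice: $(1/2)^7 = 1/128 < 0.01$.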

(Note, however, that many estimates of how likely it is to produce some given work in finite time do indeed depend on a uniform distribution, even though the underlying thought experiment does not.)
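The kind of estimate that does lean on a uniform distribution looks like this. It is a sketch assuming a 26-key, letters-only typewriter where every key is equally likely and keystrokes are independent; "banana" is just an example target, and `chance_per_attempt` and `expected_attempts` are illustrative names.

```python
from fractions import Fraction


def chance_per_attempt(target, alphabet_size):
    """Probability that one run of len(target) uniform, independent
    keystrokes types exactly `target`."""
    return Fraction(1, alphabet_size ** len(target))


def expected_attempts(target, alphabet_size):
    # Non-overlapping attempts are independent trials with success
    # probability p, so the expected number until the first success is 1/p.
    return 1 / chance_per_attempt(target, alphabet_size)


print(expected_attempts("banana", 26))  # → 308915776 (that is, 26**6)
```

Swap in a non-uniform keystroke model and these numbers change completely, while the limit argument above does not.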

