Fake Scientific Paper Detector 277
moon_monkey writes "Ever wondered whether a scientific paper was actually written by a robot? A new program developed by researchers at Indiana University promises to tell you one way or the other. It was actually developed in response to a prank by MIT researchers who generated a paper from random bits of text and got it accepted for a conference."
Yes! (Score:5, Funny)
Re:Yes! (Score:2, Funny)
Re:Yes! (Score:3, Funny)
They use old people's medicine for fuel.
Re:Yes! (Score:2)
Re:Yes! (Score:2)
Re:Yes! (Score:3, Interesting)
Re:Yes! (Score:2, Funny)
I for one am a paper writing robot overlord, you insensitive clod! I for one welcome our new video game consoles. They are called "hands". Shouldn't it be something like this will ever happen then you will see that they bring things out in manageable increments. Sure it is a biggish program, but many lone hackers have written one in under one person-year.
That's good and all (Score:5, Funny)
Re:That's good and all (Score:4, Funny)
Monkeys typing on typewriters as Mr. Burns works on the next great American novel:
Burns: This is a thousand monkeys working at a thousand typewriters. Soon they'll have written the greatest novel known to man.
(monkey smoking cigar typing on a typewriter)
Burns: Let's see. It was the best of times, it was the BLURST of times! You stupid monkey! (Smacks monkey upside his head)
Re:That's good and all (Score:5, Informative)
Re:That's good and all (Score:2)
Re:That's good and all (Score:2)
Re:That's good and all (Score:3, Funny)
Re:That's good and all (Score:2)
Re:That's good and all (Score:3, Funny)
Seems like it would be easier to develop a program that automatically detects /. dupes... but no.
*At least the million /. pounding monkeys detect it..*
Re:That's EASY! (Score:5, Funny)
If I could just find a way to recharge my PowerBook from your hatred, I could stop carrying this ugly power adaptor.
Re:That's EASY! (Score:3, Funny)
Turing test? (Score:5, Insightful)
Self defeating? (Score:5, Funny)
It seems like it wouldn't be too difficult to modify the MIT program to use this new anti-robot robot to write papers that this anti-robot robot would not be able to detect. Ideally, this would be done with a learning algorithm (so that it could easily be extended to other anti-robot robot programs), but reverse-engineering the anti-robot robot (by humans) should also provide a solution.
Now that Indiana U has thrown down the gauntlet, I wouldn't be surprised if MIT responds. Hopefully it will result in an even better paper-writing robot. Ideally, it will lead to dissertation-writing robots. :)
Re:Self defeating? (Score:5, Interesting)
I recently had to check out an essay-grading robot for my Introduction to Natural Language Processing class.
I fed it the introduction of a randomly generated essay. It got a 4/5 on all counts.
I figure, if teachers are going to use robots to grade essays, we should use robots to create them in the first place.
Re:Self defeating? (Score:2)
Hence my *lead-weighted* document folders. Bwahahahah.
Re:Self defeating? (Score:2)
Re:Self defeating? (Score:5, Funny)
Re:Self defeating? (Score:2)
This reminds me of a movie where a few students started sending tape-recorders to class instead of themselves. Gradually the scene had the professor lecturing to a room full of tape recorders. The last step in this scenario was a tape of the lecture being played to a room full of machines taping it.
(Dammit if I can't recall which movie that is though.)
Re:Self defeating? (Score:2, Funny)
Okay, actually I just wanted to comment that I love the sig.
Re:Self defeating? (Score:2)
On that day, I'll be long dead and so will my Moravec-inspired uploaded mind-children.
Re:Self defeating? (Score:3, Insightful)
next comes your anti robot robot
then the anti anti robot robot robot
and of course the anti anti anti robot robot robot robot
and the anti anti anti anti robot robot robot robot robot
I could go on since cut and paste is so easy
Perhaps it would be a million anti's followed by a million and one robots before something useful came out of such an exercise, but wouldn't it be cool t
Re:Self defeating? (Score:2, Interesting)
This could lead to a whole series of literary robots: The Too Many Coincidences in Fiction Detector, The Humanities Thesis Verbiage Reducer, The This Movie Is Going to Suck No Matter Who Acts In/Directs It Detector, and so forth.
Re:Self defeating? (Score:2)
Re:Self defeating? (Score:2)
Hmmm.... Have you ever read a dissertation? You'd have a hard time convincing me that such a robot hasn't been in common use for quite a while.
Reading dissertations (Score:2)
Re:Self defeating? (Score:2)
You are assuming that P == NP here, or that the bot that creates the paper has infinite time to run.
In this specific situation, it may be useful with a learning algorithm, but not in the general case.
Re:Turing test? (Score:2, Informative)
Testing... (Score:2, Interesting)
RESULTS: FAKE
Yep, it works!
A USEFUL application... (Score:2, Funny)
Re:A USEFUL application... (Score:2)
Discrimination (Score:5, Funny)
Re:Discrimination (Score:3, Funny)
I think the preferred term is "Ferro-Americans".
An interesting experiment (Score:5, Funny)
Re:An interesting experiment (Score:5, Funny)
INAUTHENTIC
with a 24.9% chance of being authentic text"
No kidding.
Re:An interesting experiment (Score:3, Funny)
I also tried another article from ABC News about meat eaters contributing to global warming (http://abcnews.go.com/Technology/story?id=185681
Looks like they have a crafty team of robots there at ABC.
Re:An interesting experiment (Score:2)
Wow!
Re:An interesting experiment (Score:2)
Re:An interesting experiment (Score:2, Informative)
Re:An interesting experiment (Score:2, Informative)
Re:An interesting experiment (Score:3, Funny)
Re:An interesting experiment (Score:2, Funny)
Typos (Score:2)
Do robots make typos? Do they make the same typos each time, or different ones?
Therein lies the true heart of a proper detector.
Re:Typos (Score:2)
Re:Typos (Score:3, Informative)
Based on the Slashdot articles that get posted, I would say YES.
Actually, it's pretty easy to add random, convincing misspellings to text: you could use a database from something like Usenet and a spell checker to map misspelled words to their real counterparts, then have a straightforward algorithm for replacing some set of words with misspellings, and you could tune that for consistency. It would be easier than many other as
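A minimal Python sketch of the misspelling scheme described above. The `MISSPELLINGS` table is a hypothetical, hand-written stand-in for a list mined from Usenet with a spell checker, and `add_typos`, `rate` and `seed` are illustrative names, not part of any existing tool. "Consistency" is read here as: each distinct word is decided once, so a word that gets a typo gets the same typo every time it appears.

```python
import random
import re

# Hypothetical misspelling table (correct word -> plausible misspellings).
# A real version would be mined from Usenet posts with a spell checker;
# this tiny hand-written sample is only a stand-in.
MISSPELLINGS = {
    "receive": ["recieve"],
    "separate": ["seperate", "separete"],
    "definitely": ["definately", "definatly"],
    "necessary": ["neccessary", "necesary"],
    "their": ["thier"],
}


def add_typos(text, rate=0.5, seed=0):
    """Swap known words for misspellings with probability `rate` per word.

    Each distinct word is decided once (misspell or not, and which
    misspelling), so repeated occurrences are misspelled consistently.
    """
    rng = random.Random(seed)
    decisions = {}  # lowercase word -> chosen misspelling, or None

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key not in MISSPELLINGS:
            return word
        if key not in decisions:
            if rng.random() < rate:
                decisions[key] = rng.choice(MISSPELLINGS[key])
            else:
                decisions[key] = None
        typo = decisions[key]
        if typo is None:
            return word
        # Keep a leading capital so sentence-initial words still look right.
        return typo.capitalize() if word[0].isupper() else typo

    return re.sub(r"[A-Za-z]+", swap, text)


print(add_typos("Their mail is definitely necessary. Definitely.", rate=1.0))
```

Lowering `rate` leaves more words intact; changing `seed` picks a different, but still internally consistent, set of typos.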
Re:Typos (Score:5, Funny)
Sadly, It appears that I am a robot. (Score:4, Interesting)
The nice thing is that we've finally settled the argument over whether machines can be made to drink beer and like it!
See what it says about slashdot (Score:3, Funny)
This text had been classified as INAUTHENTIC with a 32.2% chance of being authentic text
Bearing in mind that text with over a 50% chance will be classified as authentic, this adds credence to the theory that Slashdot comments are generated by monkeys randomly typing on keyboards.
What it says about anything (Score:3, Informative)
So apparently, all you need to do to beat this filter is insert
Sounds like a major innovation in input screening (Score:2)
Or is this just another application of Bayesian filters again?
The program is a failure. (Score:3, Interesting)
Re:The program is a failure. (Score:2)
The trick to reading the results is when it says "definitely fake" it's fake. Otherwise you ignore the result as either "not-fake" or inconclusive.
Tom
Re:The program is a failure. (Score:2)
Re:The program is a failure. (Score:2)
I don't know what the threshold for this test is but it's likely not around 50%.
Tom
Re:The program is a failure. (Score:2)
Only works for scientific papers (Score:5, Informative)
I suspect that it is looking for the conventional thinking with conventional word structure. As such, it is NOT a good idea i
Re:Only works for scientific papers (Score:5, Informative)
Inauthentic: Assembly of a Heterobinuclear 2-D Network: A Rare Example of Endo- and Exocyclic Coordination of PdII/AgI in a Single Macrocycle.
Inauthentic: Pyrazolate-Bridging Dinucleating Ligands Containing Hydrogen-Bond Donors: Synthesis and Structure of Their Cobalt Analogues
Authentic: Manganese Complexes of 1,3,5-Triaza-7-phosphaadamantane (PTA): The First Nitrogen-Bound Transition-Metal Complex of PTA
Authentic: Structure, Luminescence, and Adsorption Properties of Two Chiral Microporous Metal-Organic Frameworks
Based on this (small) sampling, the program doesn't appear to do any better than if it were to guess randomly. I wonder if this thing is even supposed to work, or if it just returns a random result based on a hash of the paper or something?
Read the Paper - Looks at Repetition (Score:3, Informative)
Read the paper listed in the menu of the website. The system essentially compresses the text with different window sizes, and then looks at the compression factors. In other words, it is only looking for repetition of strings. This is absurdly easy to fool, and the MIT generator could be easily fixed to pass this filter. For example, try entering a random text once (your post, for example). Note that it fails. Then append a few copies of the same text, and run that through. Your post, when run once, is too
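A rough Python sketch of a compression profile of the kind described above. It assumes DEFLATE (Python's `zlib`) as the compressor, since its `wbits` argument sets the sliding-window size (2**9 = 512 bytes up to 2**15 = 32 KiB). This is not the detector's actual code or classifier; `compression_profile` is a hypothetical helper that only computes the per-window compression ratios.

```python
import zlib


def compression_profile(text, window_bits=range(9, 16)):
    """Map window size in bytes -> compressed size / original size.

    Uses zlib's DEFLATE as a stand-in compressor; wbits 9..15 gives
    sliding windows from 512 bytes to 32 KiB. A lower ratio at a given
    window means more redundancy the compressor could exploit there.
    """
    data = text.encode("utf-8")
    if not data:
        raise ValueError("need non-empty text")
    profile = {}
    for bits in window_bits:
        # Positional args: level 9, DEFLATE method, window of 2**bits bytes.
        comp = zlib.compressobj(9, zlib.DEFLATED, bits)
        packed = comp.compress(data) + comp.flush()
        profile[2 ** bits] = len(packed) / len(data)
    return profile
```

Running it on one copy of a text and then on several appended copies shows the effect: for a text longer than the smallest window, the large-window ratios drop sharply once the copies fall inside the window, while the 512-byte ratio should barely move, because the copies are out of its reach.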
Re:Only works for scientific papers (Score:2)
Incase anyone was wondering... (Score:2)
It passed, with a "90.1% chance of being an authentic paper."
surely already done? (Score:2)
Ah.... (Score:2, Funny)
FYI: it wasn't really a conference (Score:2)
As I expected (Score:2)
QUI-GON Are you brainless? You almost got us killed!
JAR JAR I spake.
QUI-GON The ability to speak does not make you intelligent. Now get outta here!
This text had been classified as INAUTHENTIC with a 46.0% chance of being authentic text
What does it think of my paper? (Score:2)
This text had been classified as
AUTHENTIC
with a 95.2% chance of being an authentic paper
Whew! Cool, maybe I'll pass the Turing test too.
There're a lot of "my stuff was inauthentic" posts (Score:2)
Folks, I am a robot plagiarist (Score:2)
I then tried an article from Scientific American and it scored 24%. Sorry, guys, time for me to cancel the subscription; you are full of it. Alternatively, of course, it is the Indiana University School of Informatics that's full of it and the air is thick with over-hype. It would be interesting for someone with the t
I am in awe (Score:5, Informative)
A) Text of an article (Philosophy) I (a native English speaker) wrote in Italian: 98.5%, Authentic.
B) Text of an article I wrote in English (History): 87.8%
C) Text of an article (History) written in French by a native French speaker and translated into English: 93.2%
D) Critical edition of a 14th-century Latin text (Theology): 97.7%, Authentic.
E) Documentation for a field artillery simulation: 95.3%
F) A completely bogus narrative for a monastic order that doesn't exist, written in a style that mimics A)-C): 16.8%, Inauthentic.
So in this case, we have a human-written document that has only superficial meaning but is written as a "fake scientific paper", and it registers as such.
And yes, I did read the "purpose" of the page; I know it's not supposed to detect it.
And yet it does, decisively.
Re:I am in awe (Score:2)
For those of you who don't remember the story, Sokal, a physicist, wrote a paper full of postmodern-sounding gobbledygook, asserting among other things that gravity is a social construction (the paper was subtitled, "Towards a Transformative Hermeneutics of Quantum Gravity"). The paper was accepted at a peer-reviewed humanities journal. Sokal
Re:I am in awe (Score:2)
President Bush's Biography (Score:2, Funny)
I'm amazed too! It works!
Re:President Bush's Biography (Score:2)
Well, that's a relief (Score:2)
The Sokal Affair (Score:2)
Can fool it by duplicating first page (Score:2, Interesting)
It shouldn't be hard to compare the distribution of n-gram recurrence rates (or distances between recurrences) to the observed distribution for actual papers. Something like a KL divergence would capture deviations in either dir
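One way to make that concrete, sketched in Python under loose assumptions: bucket the gaps (in tokens) between repeats of each word bigram into powers of two, then compare a document's histogram with a reference histogram using KL divergence. The reference counts are assumed to come from pooling known-genuine papers; the helper names, `n=2`, the twelve bins and the add-one smoothing are all illustrative choices, not anything the detector is known to use.

```python
import math
from collections import Counter


def recurrence_histogram(text, n=2, num_bins=12):
    """Histogram of distances (in tokens) between consecutive occurrences
    of the same word n-gram. Bin k holds distances in [2**k, 2**(k+1));
    the last bin also absorbs anything longer."""
    tokens = text.lower().split()
    last_seen = {}
    hist = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in last_seen:
            dist = i - last_seen[gram]  # always >= 1
            hist[min(dist.bit_length() - 1, num_bins - 1)] += 1
        last_seen[gram] = i
    return [hist[k] for k in range(num_bins)]


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL(P || Q) in nats between two count histograms of equal length,
    with add-alpha smoothing so empty bins never give log(0)."""
    p_total = sum(p_counts) + alpha * len(p_counts)
    q_total = sum(q_counts) + alpha * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + alpha) / p_total
        q = (qc + alpha) / q_total
        kl += p * math.log(p / q)
    return kl
```

A document padded with a duplicated page piles counts into the long-distance bins, which should push its divergence from the reference up. KL divergence is not symmetric; summing both directions is one simple way to get a symmetric score.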
Heuristic Bayesian Filtering Success! (Score:2)
Fake mission statement detector? (Score:2)
I wonder if this program, with a different set of algorithms, would be able to detect whether a corporate mission statement was created using the Dilbert Mission Statement Generator [dilbert.com]. (Beware: Dilbert.com is pop-up hell.)
We need a Sarfatti detector (Score:2)
It's just another prank. (Score:2)
But can it write a paper that will be rejected (Score:2)
Fake Scientific Paper Detector (Score:2)
Sheesh (Score:2)
You would think that this embarrassment would cause the paper reviewers to look more closely at what the heck they are accepting, but instead we get a program that does that job better.
Anything, ANYTHING that keeps those reviewers from actually doing their work is welcome.
Apparently slashdot is also written by robots (Score:2)
Neopallium writes to tell us that in a recent announcement at the Desktop Linux Summit the Free Standards Group reports fourteen of the leading Linux vendors have pledged support for the newest release of the Linux Standards Base. From the article: "'The Release of LSB 3.1 is another milestone achieved by the industry and the Open Source Community that delivers ever increasing value to customers,' said Reza Rooholamini, director
False positives (Score:3, Interesting)
Hmmm, it's an interesting idea, but it seems to give a lot of false positives. (So naturally, it will detect fake papers, if it thinks every paper is fake.)
The first thing I tried was some pages on a computational oncology website [uci.edu], in particular my cancer primer [uci.edu], which took me a long time to write. Everything I fed it was determined to be inauthentic. Perhaps I just write like a robot. :-) I figured that perhaps the detector was primed for real papers, so it wasn't too big of a deal.
So, next I tried my most recent research paper [sciencedirect.com], and it, too, was determined to be inauthentic, and in fact with less authenticity than my website. So much for the theory of being primed for scientific papers only. This thing is starting to look pretty bogus to me ... but an interesting idea, nonetheless. -- Paul
Discarded Theories (Score:2)
I've always wanted to submit a paper to one of these vanity conference "peer reviewed journals" [cough cough], the ones where no paper is ever rejected, describing some work on long-discarded theories (>50 years). Just to be cheeky.
How does "N-ray studies of the Phlogiston Content of Polywater" sound?
Should probably wait until after tenure...
Trying Wikipedia articles (Score:5, Interesting)
Some variant on this thing might be useful as a new article filter in Wikipedia. We need more automation over there to stem the flow of incoming dreck.
Re:too bad this technology... (Score:2)
What we really need is a fake Ecstasy detector. The world would be a better place.
Some fields don't have those (Score:2)
Re:Great... (Score:2)
Re:How about . . . (Score:2)
Re:How about . . . (Score:3, Funny)
And if you don't like 2-ply, you can separate the sheets. Keep in mind that this works best before you wipe.
Re:What a Downer (Score:2)
From the paper. (Score:2)
One must understand our network configuration to grasp the genesis of our results. We ran a deployment on the NSA's planetary-scale overlay network to disprove the mutually largescale behavior of exhaustive archetypes. First, we halved the effective optical drive space of our mobile telephones to better understand the median latency of our desktop machines. This step flies in the face of conventional wisdom, but is instrumental to our results. We halved the signal-to-noise ratio of our mobile telephones. W
Re:It Caught Mine (Score:3, Interesting)
Re:Spam? (Score:2)
This touches on the one issue that has yet to be discussed here: what, if anything, could this program do to help identify spam? My first thought is "probably not much", only because papers tend to be a lot longer than e-mails, thus giving the program a chance to generate better statistics for a Bayesian filter to make a decision. Even so, when I look at some of the surreal gibberish embedded in spam tha