Slashdot
Wikipedia and Plagiarism
Posted by
CmdrTaco
on Sun Nov 05, 2006 10:28 AM
from the less-than-college-papers dept.
Spo22a writes "Daniel Brandt found the examples of suspected plagiarism at Wikipedia using a program he created to run a few sentences from about 12,000 articles against Google Inc.'s search engine. He removed matches in which another site appeared to be copying from Wikipedia, rather than the other way around, and examples in which material is in the public domain and was properly attributed.
Brandt ended with a list of 142 articles, which he brought to Wikipedia's attention.... 'They present it as an encyclopedia,' Brandt said Friday. 'They go around claiming it's almost as good as Britannica. They are trying to be mainstream respectable.'"
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
But (Score:1)
That doesn't seem like a lot (Score:2, Insightful)
Re:That doesn't seem like a lot (Score:4, Insightful)
Re:That doesn't seem like a lot (Score:5, Funny)
(http://www.puremango.co.uk/)
No, it's a wiki. If you find a problem with it, you add a template telling everyone that someone else should fix it.
How is this news? (Score:1)
(http://www.luminance.org/ | Last Journal: Wednesday April 24 2002, @05:35PM)
Impressive (Score:4, Interesting)
Not shocking, but not a big deal (Score:3, Interesting)
(http://hallert.net/)
Wikipedia is written by a large community, and people make mistakes. I have read about other reference tomes that have been caught plagiarizing (for example, some encyclopedias or atlases will put in a fake piece of data or a fake street so that they can easily determine if they're being copied from), and the turnaround time for fixing it can be years depending on the publishing cycle.
This isn't a condemnation of Wikipedia, despite Mr. Brandt's best efforts; it's a confirmation of why WP works.
Only 142?! (Score:1)
Pfizzle. (Score:2)
(http://www.oddquad.org/)
Yes, it's a problem, but that's actually not a bad score at all. You probably get more plagiarism than that on college papers at good schools. How many of these articles cite what they "plagiarize," even if they don't put it in quotes? Also, to make it legal plagiarizing, all you have to do is re-write each paragraph in your own words.
I see 1.18% of articles as potentially having text lifted from somewhere else as a serious issue for the maintainers of Wikipedia, sure. But I don't think it has a major negative impact on its reliability, or on the quantity or quality of information contained within it. And reliable information is what I care about when I go to Wikipedia. If it worked only by having mass excerpts of other sites, I'd call it "GOOGLE," and I'd still use it.
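For anyone checking the 1.18% figure: it is just the two numbers from the story divided out. A quick sketch in Python (the variable names are made up for illustration; this is not Brandt's program):

```python
# Figures quoted in the story summary; variable names are illustrative only.
flagged_articles = 142
sampled_articles = 12_000

# Share of the sampled articles that ended up on Brandt's list.
share = flagged_articles / sampled_articles
print(f"{share:.2%}")  # prints 1.18%
```

This says nothing about how the 12,000 articles were picked, so it is a share of that sample, not necessarily of Wikipedia as a whole.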
1% plagiarism! (Score:1, Flamebait)
Brandt is a Republican (Score:1, Troll)
(http://www.infiltrated.net/)
The proof of the pudding (Score:2)
Conclusion: the best way of improving Wikipedia is by showing where it has a problem. Mr Brandt disproved his opinion. Live and learn.
Thanks,
GerardM
Daniel Brandt, valuable Wikipedia contributor (Score:5, Insightful)
"If you strike me down, I shall become more powerful than you could possibly imagine..." -- Obi Wiki-nobi
From an ex-wikipedia administrator (Score:2)
I used to run into these all the time... but the thing is... a lot of them are caught and removed. Wikipedia has a system to deal with such infringements, and the users that post them. (See Wikipedia's policy [wikipedia.org] and their copyright problems reporting page [wikipedia.org].) The truth is that you're going to find copyright problems wherever there is user-submitted content (look at YouTube, for example).
142 out of 12,000? (Score:2)
But, hey, 142/12000 is less than 2%. I would have thought the percentage would have been at least 5%.
US Gov copyright? (Score:2, Insightful)
The original article, Brandt said, was copied from a biography on the Wyoming state government site.
Err... I thought works of the US Government were generally free from copyright...?
Re:US Gov copyright? (Score:4, Insightful)
(1) The Wyoming state government is not the US government: state government works are not generally free from copyright.
(2) Plagiarism is separate from copyright violation, anyway. Using material that is not subject to copyright or is in the public domain, and that comes from one unique identifiable source, without crediting the source is plagiarism, as is using copyright material without attribution in a way that does not violate copyright (say, fair use). Plagiarism isn't a violation of the law, but a violation of commonly accepted standards of integrity when it comes to not claiming others' work as your own.
Biographical articles. (Score:4, Funny)
They should write new and interesting histories for all these people rather than using the same old worn out ideas that are on so many places on the net.
All it takes is a little imagination.
A new birthplace, better achievements (why could Hitler not have discovered the cure for cancer and been the first man on the moon? It's better than the depressing story on Wiki at the moment.) and some creative editing would solve this problem once and for all.
Some Wiki articles are already better and contain things about people that have never happened, but sadly these often get put back to the same old boring stories almost as soon as the changes are made.
ok methodology, bad analysis (Score:2)
(Last Journal: Thursday May 03 2007, @11:34AM)
First, the sample size was 12,000. Where did that number come from? Were the samples picked randomly? Assuming so, is 12,000 a statistically effective sample size? And if the samples are random, and the size is sufficient, is that 142 articles statistically significant, that is, is the number of matches outside the margin of error? In other words, do the sample size, selection, and methodology merit a margin of error around 1%?
And then we get to the fact that sometimes Wikipedia text is copied to other sites. This in itself leads to the conclusion that Wikipedia has some credibility, even if unfounded. I found it interesting that we are not told how many articles off Wikipedia were plagiarized. I also wonder what 'Wikipedia appeared to be the one plagiarized' means, and what systematic errors were introduced by that subjective judgement. Perhaps 1%?
There is no question that plagiarism is a big issue, and we all must watch for it. I am on the side that plagiarism is no more an issue than in the past, but with better communication and distribution, we catch it more. At some level, because it is so easy to plagiarize now, we perhaps see more egregious cases of it.
What gets me is that an analysis of such low analytical value is news. I am once again amazed at how little people seem to know or care about proper logic. In the end all we know is that some study with questionable methodology produced 142 hits. Not a huge revelation, even if we stipulate the study is of even minimal value.
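On the margin-of-error question above, here is a rough sketch of what the sampling error alone would look like. It assumes the 12,000 articles were a simple random sample, which the story does not say, and it uses the textbook normal-approximation (Wald) interval for a proportion. The function name and the 1.96 multiplier for a 95% interval are my choices, not anything from Brandt's report.

```python
import math

def proportion_interval(hits, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to roughly 95%.
    """
    p = hits / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

# Counts taken from the story summary.
p, low, high = proportion_interval(142, 12_000)
print(f"estimate {p:.2%}, interval {low:.2%} to {high:.2%}")
```

Under those assumptions the interval comes out at roughly 1.0% to 1.4%, so pure sampling noise would be small next to the estimate. It does not cover the systematic errors raised above, such as how the articles were selected or the subjective call on who copied whom.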
That is a very unreal scenario (Score:1)
Confused? (Score:2)
Even Virus authors contribute (Score:2)
The attackers had used a Wikipedia feature that archives all previous versions of articles when changes have been made. The malicious page thus continued to exist in the archive, and the attackers were able to point to it in mass emails.
See here [heise.de] , here [techworld.com] and here [theregister.co.uk].
Okay, Brandt is learning. (Score:1)
(http://www.iki.fi/wwwwolf/)
This is how you fix the problems with Wikipedia: Point them out in a way that makes the problems easy to fix. Okay, it's probably still harder to get criticism against user conduct and policies reacted upon, but the way Wikipedia works, the content is still easy to fix. Especially in the case of plagiarism.
I really wish people would conduct accuracy and plagiarism studies a bit more often - especially when it's easy to fix, like this.
And by the way, Wikipedia recently got a bot that finds suspected plagiarism [wikipedia.org], which is pretty cool.
Turns out they weren't plagiarized... (Score:2)
(http://cliveholloway.net/ | Last Journal: Saturday February 28 2004, @05:54PM)
Statistically insignificant (Score:1)
(http://www.jroller.com/page/shareme/Weblog)
Plagiarism or Copyright? (Score:2)
Wikipedia is not a PhD candidate. Wikipedia's job is to provide accurate information.
Of course, sources should be provided as well.
But legally, the real issue here is Copyright, isn't it?
"Plagiarism" and "Copyright infringement" are not synonyms.
There is no copyright in facts.
Therefore, nonfiction works are open to have the facts used in Wikipedia.
Where a verbatim transcription would not be fair use, someone needs to paraphrase.
Wikipedia bashing du jour (Score:2)
Citations? (Score:2)
(http://scorch.quickfox.org/)
These people who ramble on that Wikipedia is inaccurate almost appear to me like they never sat through a history class in high school, where you have to verify your sources.
I've also never heard of citing encyclopedias in research projects, ever. I've also never seen good-grade coursework cite encyclopedia entries (it may cite information that some encyclopedias themselves cite).
Here is the link to my report (Score:2)
(http://slashdot.org/)
Why should Slashdotters care? Because while AP doesn't use links, Slashdot should have the courtesy of linking to the original sources that AP used to generate the report. (Plus AP also checked with Jimmy Wales for a reply, which is expected from professional reporters.)
The report is at http://www.wikipedia-watch.org/psamples.html [wikipedia-watch.org]
Wikipedia's own newsletter reports on it here:
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_
The efforts of Wikipedia administrators to clean up the mess are chronicled here: http://en.wikipedia.org/wiki/User:W.marsh/list [wikipedia.org]
Of course, Slashdotters may continue shooting from the hip if they choose. It's what they do best.
Brandt vs. Wikipedia (Score:2)
He got into a dispute because he didn't like having his biography on WP (though it was constructed from publicly available news sources). He was generally combative and belligerent, and so was blocked and banned various times; check out the Talk archives for details. Afterwards he started a webpage where he attempted to list the real-world identities of the editors involved in the dispute.
Brandt is also the guy responsible for outing the anonymous editor in the Seigenthaler controversy.
Not an unflattering biography (Score:2)
(http://iabervon.org/~barkalow/ | Last Journal: Saturday May 31 2003, @02:01AM)
Wikipedia is now digg.com, without the credit! (Score:1)
(http://dotancohen.com/)
Bah.
http://what-is-what.com/what_is/digg.html [what-is-what.com]
Brandt's paper and Wikipedia's response (Score:2)
(http://math-www.uni-paderborn.de/~axel/)
142 isn't bad. (Score:2)
That's the great thing about open source and projects like wiki.
You encounter a problem, it's very easy for people to fix it quickly.
If those 142 items are real, they are probably already being fixed now if not all fixed.
Brandt's odd sayings (Score:2)
(http://cipher-text.blogspot.com/)
Well, yes. Not that odd, really, given that it is an encyclopedia.
"They go around claiming it's almost as good as Britannica."
Actually, Wikipedians don't, in my experience. Most are quite sober when it comes to comparisons with Britannica. Brandt may be referring to the journal Nature, which did make such a claim for science articles.
They are trying to be mainstream respectable.
Wikipedia is already pretty darn mainstream, and if by "respectable" Brandt means "free of plagiarised material", then he's correct.
Well (Score:1)
If you equate good with referenced (Score:2)
(http://slashdot.org/)
Whether something is plagiarized or not doesn't really impact the quality of it. If someone copied a great article into Wikipedia, then Wikipedia has a great article, just through foul play. There have previously been comparisons which have shown Wikipedia to be just as accurate as Britannica. Now, it's been a while since I looked at a dictionary, but from what I can tell Wikipedia has far more external references than your average encyclopedia. I guess mostly because a Wikipedia page has little credibility on its own. So both as a reference and as a starting point it's better; what remains is just whether it's "respectable". With 99% own content, you can hardly say they've been using this as a strategy. I don't know what you could compare it with; it's as if one Linux app copied some code, and someone called Red Hat not respectable for distributing it despite being completely unaware. Or better yet, tried to imply that their business is built on stolen software. What's next? "I can find text copied without permission on Google. They're eeeeeeeeeeevil"?
Other concerns about Wikipedia (Score:2)
Now, an article presenting facts can be written by someone who has no academic qualifications but still represents the facts fairly and accurately, so I don't claim that a person MUST be academically qualified to write a good article, nor do I claim that an article is good just because a person with "academic" qualifications writes it. However, I believe that the articles' authors should be identified, and the article parts should be identified as primary, secondary or tertiary.
I go to the Wikipedia for information, but I'm cautious. I want to be able to cite the information in the Wikipedia, and that requires authors and accurate attribution.
Plagiarism is common but usually promotional (Score:2)
(http://www.animats.com)
Plagiarism shows up frequently in Wikipedia, but usually it's promotional. Typically, company X copied their "about" page into Wikipedia. Bands and musicians, usually ones that are a legend only in their own minds, try this. A new user associated with the thing being promoted is usually responsible.
Then there are the people with a collector mindset. They create endless minor articles like "Indiana State Highway 22" and biographical articles of long-forgotten city council members. Often by cutting and pasting. This is annoying, but complaints of copyright infringement are unlikely.
Ugh, another techno geek tries to prove he's smart. (Score:1)
If you're going to run an article like this and expect people to take it seriously, we need details. LOTS OF DETAILS.
Does my comment mean I don't think some of the content is uncredited, or stolen? No, it probably is, but anytime anyone presents what amounts to an experiment, it should be held to a scientific standard and subject to peer review; otherwise you end up with a bunch of people thinking something that is fact is not, and something that is not fact is. People need to be reminded to think critically when we see articles like this, or any article that makes a claim based on "my research" or "my program". Just because you made an experiment that proves your hypothesis doesn't mean it proves anything. I want more details, I want to review his findings, I want to review his process, and I want to see how deep he dug before he claimed that something in the public domain was actually not credited to its source.
How does he know? (Score:2)
(http://sogeeky.net/)
Wikipedia less than perfect... (Score:1)
(Last Journal: Friday August 24, @10:02PM)
Who is Daniel Brandt anyway? (Score:2)
(http://ygingras.net/)
So? (Score:2)
(http://www.leperkhanz.com/ | Last Journal: Wednesday October 01 2003, @05:17AM)
What stupid textbook industry shill came up with this crackpot survey? And as somebody else pointed out, ~1% of plagiarism isn't exactly high in my opinion.
Stop being an asshole, and if you DO find plagiarism, label it as such, and give us a better footnote as to where the original information came from.
Jesus!
rhY
I'd say this numerically proves its superiority (Score:2)
(Last Journal: Friday June 30 2006, @11:10PM)
Wow.
I seem to remember from high school that the major dictionaries sometimes put made-up words in their dictionaries in order to catch plagiarizing competing dictionary makers. Similarly they'll add an extra fake definition to a word, and then watch over the next decade or two to see if another dictionary picks up the fake definition.
Encyclopedias do the same. Add some small tidbit of fake information to an article to see if it surfaces somewhere else.
I don't believe the dictionary and encyclopedia publishers do this by accident...they do it because they have experienced such things before and found this to be a very easy way to prove stupidity and plagiarism on the other person's part.
Honestly...I think fewer than 200 out of over 12,000 articles is actually proving that it is quite good...and indeed non-plagiarised. Especially considering that wiki articles tend to be significantly longer, more in depth, and with more recent and politically charged items in them...I think it proves quite a large degree of integrity on Wikipedia's part.
Irony, pick up the white courtesy phone... (Score:1)
I have a problem with this part of the article... (Score:1)
Plagiarism the myth the fact. (Score:2)
I hope this isn't modded as flame bait, because I'm really being honest here in my argument. So, I'll get down to it.
First, American education enforces plagiarism. That's right! How so? Well, take for instance the fact that almost every test in any mundane American education facility encourages the student to regurgitate a canned answer from a designated source of information. It gets even worse when you enter the University level, and is unbelievably worse yet if you enter any top tier University (where the professors themselves demand you buy *their* book).
Even if a class such as "Critical Thinking" exists, there's really nothing truly critical about it. Add to the above facts another fact: American and European societies are bent on "Political Correctness". This only serves to deter true critical thinking, because any deterrence or compelling factor to NOT speak your mind, regardless of how vulgar it is, takes away from the full spectrum of perceptive analysis. Even academia is infected with this little bit; that's why you never see any books dedicated to the good things about Hitler or the bad things about Gandhi, even though any person in their right mind shall admit, even if in private, that Yin and Yang did not elude either of the two. There's a formula for the above. If X is a positive admission, Y is a negative admission and Z is the general image you are trying to paint the person in (where Z is a magnitude of either X or Y), then for any X/Y granted that is not a magnitude of Z, the opposite SHALL be grotesquely over-exaggerated so as to make the other negligible. That's why not one single historian or author is willing to point out