Wikipedia and Plagiarism
Spo22a writes "Daniel Brandt found the examples of suspected plagiarism at Wikipedia using a program he created to run a few sentences from about 12,000 articles against Google Inc.'s search engine. He removed matches in which another site appeared to be copying from Wikipedia, rather than the other way around, and examples in which material is in the public domain and was properly attributed.
Brandt ended with a list of 142 articles, which he brought to Wikipedia's attention.... 'They present it as an encyclopedia,' Brandt said Friday. 'They go around claiming it's almost as good as Britannica. They are trying to be mainstream respectable.'"
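The summary above describes the method only in outline: take a few sentences from each article and look for them elsewhere. As a rough illustration, here is a minimal sketch of that idea. It is not Brandt's program, which the summary does not show; the function names (`sample_sentences`, `verbatim_matches`) and the sample data are hypothetical, and it compares against locally supplied texts instead of querying a search engine. The filtering of mirrors and properly attributed public-domain material is not modeled.

```python
import re

def sample_sentences(text, count=3, min_words=8):
    """Pick the first few reasonably long sentences to use as probes."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    return long_enough[:count]

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def verbatim_matches(article, candidate_sources):
    """Return the names of sources that contain any probe sentence verbatim."""
    probes = [normalize(s) for s in sample_sentences(article)]
    hits = []
    for name, source_text in candidate_sources.items():
        haystack = normalize(source_text)
        if any(probe in haystack for probe in probes):
            hits.append(name)
    return hits

# Hypothetical data, made up purely for illustration.
article = ("Jane Doe was born in Cheyenne and served two terms "
           "in the state senate. She liked tea.")
sources = {
    "state-site": "Biography: Jane Doe was born in Cheyenne and served "
                  "two terms in the   state senate.",
    "other": "Nothing relevant here.",
}
matches = verbatim_matches(article, sources)
print(matches)
```

A match found this way only says that the same sentence appears in both places. Deciding which side copied from which, and whether the use was attributed, still needs a human check.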
That doesn't seem like a lot (Score:2, Insightful)
Re: (Score:2, Insightful)
Sounds like much ado about nothing once more. *yawn*
Re:That doesn't seem like a lot (Score:4, Insightful)
Re: (Score:3, Informative)
Re: (Score:2)
I guess the only thing this study tells us is that an UPPER limit on the number of plagiarisms is of the order of 1%. That's still an alarmingly high number.
Re: (Score:3, Insightful)
Considering that an audit of dead-tree encyclopedias hasn't been done, we can't say. What we CAN say is that it's foolish to make a comparison with Britannica, when an audit of Britannica found 10% of 600 articles to be non-factual. The sources cited in those 10% disavowed the articles' contents.
This isn't all that surprising either, when you think about it. People cite people who cite people, and someone somewhere will misinterpret what someone else wrote, or come to different conclusions while still ci
Re: (Score:2, Interesting)
Re: (Score:2)
Re: (Score:2, Insightful)
Re:That doesn't seem like a lot (Score:5, Funny)
No, it's a wiki. If you find a problem with it, you add a template telling everyone that someone else should fix it.
Re: (Score:2)
Re: (Score:2)
Regarding the article, there is already a very active community weeding possible copyright violations out of Wikipedia. I don't know how this can be considered news.
Re: (Score:2)
Tycho from Penny Arcade said it best [penny-arcade.com], and this is a point that has never been addressed.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
He lists where they're plagiarized from on his website... click on each article and read the box at the top.
He's got a bit more information in these threads: [1] [wikipediareview.com], [2] [wikipediareview.com], [3] [wikipediareview.com]. I don't agree with his conclusions, but he said he put around three weeks of effort into going over these by hand to make sure they were legitimate copyvios.
I hope you're not contributing... (Score:2)
Impressive (Score:4, Interesting)
Re: (Score:2)
Are you going to the prom? (Score:2)
Depends on what you're writing (Score:2)
If you're writing a summary article (e.g., on the current state of data mining [virginia.edu]), then as little as 10% (or even less) could be your own conclusions. However, if you're writing about your own research [virginia.edu], then you
Re: (Score:2)
Is this sample as biased as Wakeman from 151? (Score:2)
Does the article make any claims as to how Mr. Brandt chose the sample of 12,000 articles? How can we look for biases in the sample?
Not shocking, but not a big deal (Score:3, Interesting)
Wikipedia is written by a large community, and people make mistakes. I have read about other reference tomes that have been caught plagiarizing (for example, some encyclopedias or atlases will put in a fake piece of data or a fake street so that they can easily determine if they're being copied from), and the turnaround time for fixing it can be years depending on the publishing cycle.
This isn't a condemnation of Wikipedia, despite Mr. Brandt's best efforts; it's a confirmation of why WP works.
Pfizzle. (Score:2)
Yes, it's a problem, but that's actually not a bad score at all. You probably get more plagiarism than that on college papers at good schools. How many of these articles cite what they "plagiarize," even if they don't put it in quotes? Also, to make the plagiarism legal, all you have to do is rewrite each paragraph in your own words.
I see 1.18% of articles as potentially having text lifted from somewhere
Re: (Score:2)
142 out of 12,000, some of which aren't really a problem, and those are numbers generated by a critic?
And a very... dedicated critic, too [crank.net].
I must admit there's a certain recursive appeal to the idea of someone being notable enough for a Wikipedia entry purely because of his vehement attempts to avoid being mentioned on Wikipedia.
As usual, the talk page [wikipedia.org] has lots of entertaining dirt.
(Uncyclopedia has the real low-down [uncyclopedia.org], of course.)
Re: (Score:2)
A school's honor code may be very different from a nation's copyright laws. (As they should be.) Ideally, if you come up with an idea in conversation with a few friends around a coffee table, and they contribute meaningfully to the genesis of the idea, you'll cite, thank, or credit them in the finished product. But from a copyright standpoint, while you can copyright the form of an idea, you can't usually copyright the idea itself--which is why you can write a new horror novel, or a new
Re: (Score:2)
The proof of the pudding (Score:2)
Conclusion: the best way of improving Wikipedia is by showing where it has a problem. Mr Brandt disproved his opinion. Live and learn.
Thanks,
GerardM
Daniel Brandt, valuable Wikipedia contributor (Score:5, Insightful)
"If you strike me down, I shall become more powerful than you could possibly imagine..." -- Obi Wiki-nobi
Re: (Score:2)
Besides, 142 out of 1,500,000 articles is only 0.009% of the content
I don't know (Score:2)
Re: (Score:2)
From an ex-wikipedia administrator (Score:2)
142 out of 12,000? (Score:2)
But, hey, 142/12000 is less than 2%. I would have thought the percentage would have been at least 5%.
Re: (Score:2)
Re: (Score:2)
US Gov copyright? (Score:2, Insightful)
The original article, Brandt said, was copied from a biography on the Wyoming state government site.
Err... I thought works of the US Government were generally free from copyright...?
Re: (Score:2)
Re:US Gov copyright? (Score:4, Insightful)
(1) The Wyoming state government is not the US government: state government works are not generally free from copyright.
(2) Plagiarism is separate from copyright violation, anyway. Using material that is not subject to copyright or is in the public domain that is from one unique identifiable source without crediting the source is plagiarism, as is using copyrighted material in a way that does not violate copyright without attribution (say, fair use.) Plagiarism isn't a violation of the law, but a violation of commonly accepted standards of integrity when it comes to not claiming others' work as your own.
Re: (Score:2)
Everybody with half a brain can suggest that the knowledge didn't manifest itself out of thin air, even without citations given.
Re: (Score:2)
Re: (Score:2)
Biographical articles. (Score:4, Funny)
They should write new and interesting histories for all these people rather than using the same old worn out ideas that are on so many places on the net.
All it takes is a little imagination.
A new birthplace, better achievements (why could Hitler not have discovered the cure for cancer and been the first man on the moon? It's better than the depressing story on Wiki at the moment.) and some creative editing would solve this problem once and for all.
Some Wiki articles are already better and contain things about people that have never happened, but sadly these often get put back to the same old boring stories almost as soon as the changes are made.
Re: (Score:2)
I also understand he was responsible for tripling the population of African elephants during his lifetime.
Re: (Score:2)
http://uncyclopedia.org/wiki/Adolf_Hitler [uncyclopedia.org]
ok methodology, bad analysis (Score:2)
First, the sample size was 12,000. Where did that number come from? Were the samples picked randomly? Assuming so, is 12,000 a statistically effective sample size? And if the samples are random, and the size is sufficient, is that 142 articles statistically significant, that is, is the number of matches outside the margin of error? In other words, does the sample size, sele
Re: (Score:3, Informative)
Assuming that it is a binomial distribution, then p=142/12000=0.0118, q=0.9882, n=12000, which means the standard error is sqrt(npq)=11.8 (approximately). Thus a 95% confidence interval is that the true number of plagiarised articles in the sample lies between 119 and 165.
And this is only plagiarism from on-line sites that are indexed by Google. Plagiarism from dead tree sources could we
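For anyone who wants to check the binomial arithmetic above, here is a minimal sketch using the normal approximation. The function name `binomial_count_interval` is made up for illustration, and z = 1.96 is the usual multiplier for a 95% interval. It takes the 142 hits and the 12,000-article sample at face value and says nothing about how the sample was chosen.

```python
import math

def binomial_count_interval(hits, n, z=1.96):
    """Normal-approximation interval for a binomial count.

    Returns (low, high, sd), where sd = sqrt(n * p * q).
    """
    p = hits / n
    q = 1 - p
    sd = math.sqrt(n * p * q)
    return hits - z * sd, hits + z * sd, sd

low, high, sd = binomial_count_interval(142, 12000)
print(f"sqrt(npq) = {sd:.2f}")
print(f"95% interval for the count: {low:.0f} to {high:.0f}")
print(f"as a rate: {100 * low / 12000:.2f}% to {100 * high / 12000:.2f}%")
```

The interval on the count comes out at roughly 119 to 165 articles. It only covers sampling error, so it cannot account for plagiarism from sources the search never saw.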
Confused? (Score:2)
Re: (Score:2)
If you copy somebody's words, and these words are not in the public domain (as they would be if, for instance, the author is long dead or works for the U.S. government), and you can't defend the use as "fair use", then it's a civil offense and they can sue you (in some countries and severe cases it'
Even Virus authors contribute (Score:2)
The attackers had used a Wikipedia feature that archives all previous versions of articles when changes have been made. The malicious page thus continued to exist in the archive, and the attackers were able to point to it in mass emails.
See here [heise.de], here [techworld.com] and here [theregister.co.uk].
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Turns out they weren't plagiarized... (Score:2)
Re: (Score:2)
Re: (Score:2)
Yes.
Not all issues are legal issues.
The issue here is verbatim use, anyway. An automated script is going to have more trouble finding use of "facts" from another source that aren't verbatim copies of the presentation.
Verifiability (Score:2)
Not exactly. The job of Wikipedia (or for that matter any other general encyclopedia) is to provide verifiable [wikipedia.org] information from reliable sources. Verifiability > truth until the truth becomes verifiable.
Wikipedia bashing du jour (Score:2)
Re: (Score:2)
Here is the link to my report (Score:2)
Why should Slashdotters care? Because while AP doesn't use links, Slashdot should have the courtesy of linking to the original sources that AP used to generate the report. (Plus AP also checked with Jimmy Wales for a reply, which is expected from professional reporters.)
The report is at http://www.wikipedia-watch.org/psamples.html [wikipedia-watch.org]
Wikipedia's own newsletter reports on it here:
http://en.wikipedia.org/wiki/W [wikipedia.org]
What Brandt _should_ do, rather than crowing (Score:3, Insightful)
Is release the script or code that he used to generate his 142 plagiarised articles out of 12,000.
Such a script, if tuned and more widely applied, could be extraordinarily useful in weeding out future instances of plagiarism.
142 articles flagged, 142 articles fixed within hours. That's Wikipedia working as no dead-tree encyclopedia can.
Of course, Brandt would never do anything as useful as that, but will probably content himself with continuing to "shoot from the hip" and claim this as a blow against
Brandt vs. Wikipedia (Score:2)
He got into a dispute because he didn't like having his biography on WP (though it was constructed from publicly available news sources). He was generally combative and belligerent, and so was blocked and banned various times; check out the Talk archives for details. Afterwards he started a webpage where he attempted to list the real-world identities of the editors in
Re: (Score:2)
Not an unflattering biography (Score:2)
Especially since he used to sell such info himself (Score:2)
"From the 1960s onwards, Brandt collected clippings and citations pertaining to influential people and intelligence matters. In the 1980s, through his company Micro Associates, he sold a database of citations of these clippings, books, government reports, and other publications."
Pot, kettle, hello.....??!
Brandt's paper and Wikipedia's response (Score:2)
142 isn't bad. (Score:2)
That's the great thing about open source and projects like wiki.
You encounter a problem, it's very easy for people to fix it quickly.
If those 142 items are real, they are probably already being fixed now if not all fixed.
Brandt's odd sayings (Score:2)
Well, yes. Not that odd, really, given that it is an encyclopedia.
"They go around claiming it's almost as good as Britannica."
Actually, Wikipedians don't, in my experience. Most are quite sober when it comes to comparisons with Britannica. Brandt may be referring to the journal Nature, which did make such a claim for science articles.
"They are trying to be mainstream respectable."
Wikipedia is already pretty darn mainstream, and if by "respectable" Brand
If you equate good with referenced (Score:2)
Whether something is plagiarized or not doesn't really impact the quality of it. If someone copied a great article into Wikipedia, then Wikipedia has a great article - just through foul play. There have previously been comparisons which have shown Wikipedia to be just as accurate as Britannica. Now, it's been a while since I looked at a dictionary,
Other concerns about Wikipedia (Score:2)
Now, an article presenting facts can be written by someone who has no academic qualifications but still represents the facts fairly and accurately, so I don't claim that a person MUST be academically qualified to write a good article, nor do I clai
Plagiarism is common but usually promotional (Score:2)
Plagiarism shows up frequently in Wikipedia, but usually it's promotional. Typically, company X copied their "about" page into Wikipedia. Bands and musicians, usually ones that are a legend only in their own minds, try this. A new user associated with the thing being promoted is usually responsible.
Then there are the people with a collector mindset. They create endless minor articles like "Indiana State Highway 22" and biographical articles of long-forgotten city council members. Often by cutting and
How does he know? (Score:2)
Re: (Score:2)
Who is Daniel Brandt anyway? (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Then tubgirl is a smart woman...
Re: (Score:2)
Re: (Score:2)
victimless?!?!? (Score:2)
You're not only not buying the book of whoever did the (possibly expensive) research, you're not even crediting them, so they get zero credit, and because you've got the info you need you're even less likely to seek out the author's work! Just because the perpetrator has little to gain from committing the crime doesn't make it victimless!
Re: (Score:2)
If you are the sort of person that needs to buy expensive research papers, you are not in the target audience for Wikipedia! Wikipedia is not intended to be used for professional research, it's just a little fact book that may or may not be correct, with some links to sources on each page. Nothing more. It's not going to be making a dent into your sales figures, so relax!
If you support Wikipedia, ma
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
There is a big difference between plagiarised articles and articles with plagiarised passages. Pretty much every medium has a significant plagiarism rate, including scholarly journals.
The methodology in this case is more than a little suspect. At least 50% of Wikipedia is utter crap. There are fancruft, stubs, and POV-peddling forks. Anyone who is involved with Wikipedia will admit as muc
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
How does the Wherebot work? (Score:2)
Re: (Score:2)
Re: (Score:2)
This guy is almost on the same level as Jack Thompson in terms of stupidity/ignorance.
Re: (Score:2)
Re: (Score:2)
Daniel Brandt doesn't like Wikipedia. His article there was started 'against his wishes,' and although he managed to get it deleted once by a few choice threats, it was quite rapidly created again. Ironically, the community now agrees that his anti-Wikipedia rantings have made him notable enough to be included in the encyclopedia.
Mr. Brandt is certainly not a nice person. While your words "politician" and "Republican" are completely
Re: (Score:2)
How can one be "well aware" of something that isn't true? Wikipedia's copyright policies (WP:C and WP:COPYVIO) address copyright violations, not plagiarism. You can have a copyright violation without plagiarism—for instance, if the use of properly quoted, properly cited material exceeds legal "fair use", it is not plagiarism while it is a copyright violation. And you can l
Re: (Score:2)
Not True (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)