English Wikipedia Gets Two Millionth Article
reybrujo writes to inform us of a milestone for the English-language Wikipedia: the posting of its two millionth article. At the time of this posting there is uncertainty over which article achieved the milestone. "Initial reports stated that the two millionth article written was El Hormiguero, which covers a Spanish TV comedy show. Later review of this information found that this article was most likely not the two millionth, and instead a revised list of articles created around the two million mark has been generated, and is believed to be correct to within 3 articles. The Wikimedia Foundation, which operates the site, is expected to make an announcement with a final decision, which may require review of the official servers' logs."
Likely a lot more than 2 million (Score:5, Informative)
However, if they (or anyone else) need a plugin for MediaWiki that will list the pages in order so that you can count them and determine which article was the Nth article, I wrote a plugin called Page Create Order [bloomingpedia.org] that will put a special page called "List Pages By Creation Date" in your wiki. We developed it for Bloomingpedia originally. It's simple, but it does the job. It could easily be modified to count only articles of a certain size as well; the main purpose of this plugin is to see the order in which pages were created.
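To make the idea concrete, here is a minimal Python sketch of "list pages by creation date, then pick the Nth". It is not the plugin's code (which is not shown in this thread); the `Page` record, the `nth_created` helper, and the sample data are all hypothetical, and a real wiki would read these fields from its database.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Page(NamedTuple):
    # Hypothetical record, invented for this illustration.
    title: str
    created: datetime
    size_bytes: int


def nth_created(pages: list[Page], n: int, min_size: int = 0) -> Optional[Page]:
    """Return the n-th page (1-based) in creation order, skipping pages smaller than min_size."""
    eligible = sorted(
        (p for p in pages if p.size_bytes >= min_size),
        key=lambda p: p.created,
    )
    return eligible[n - 1] if 1 <= n <= len(eligible) else None


# Made-up sample data, deliberately out of creation order.
pages = [
    Page("Stub", datetime(2007, 9, 1, 10, 0), 40),
    Page("First", datetime(2007, 9, 1, 9, 0), 900),
    Page("Second", datetime(2007, 9, 1, 11, 0), 1200),
]

print(nth_created(pages, 2).title)               # Stub
print(nth_created(pages, 2, min_size=72).title)  # Second
```

The size filter is applied before counting, so which page counts as "the Nth" depends on the threshold you choose.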
Re: (Score:2)
How much is 72 bytes worth of text anyhow?
Re: (Score:3, Informative)
Either way, something about that length is likely to be a stub and not a 'real' article.
Re:Likely a lot more than 2 million (Score:5, Insightful)
For example, UTF-16 needs a lot of porting effort, while UTF-8 magically works in all 8-bit-clean programs that don't need to count codepoints or tell character properties (and hey, bytes happen to _be_ 8 bits wide, so unless you do something strange, you are 8-bit-clean). Most English-speaking developers won't put in this effort, so there goes your multi-lingual friendliness.
Or another, more insidious flaw of UTF-16: it gives people a false feeling that they can store an entire character in a single array position. This works... as long as you don't meet any character over U+FFFF (rare Han[1], etc) or characters which need to be written using a base char + combining characters (Indic scripts, etc). UTF-8 makes no such promises, and thus doesn't lead to such non-obvious bugs.
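A small Python check of both cases, offered as an illustration only. The two sample strings are my own picks: U+20000 is a Han ideograph outside the Basic Multilingual Plane, and U+0915 followed by U+093F is a Devanagari consonant plus a vowel sign, which a reader sees as one character.

```python
import unicodedata


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


han = "\U00020000"          # one code point above U+FFFF
devanagari = "\u0915\u093f"  # base consonant + combining vowel sign

print(utf16_units(han))         # 2 (a surrogate pair)
print(utf16_units(devanagari))  # 2 (two code points, one visible character)

# Normalization does not help here: no single precomposed code point exists.
print(len(unicodedata.normalize("NFC", devanagari)))  # 2
```

In both cases "one character per array slot" fails, for two different reasons.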
UTF-16 is an abomination that needs to go. Unfortunately, it's entrenched in the Windows API: you need to use BlueScreenW() instead of BlueScreenA() everywhere, and this is something people who don't need internationalization don't want to do. Even as of Vista, Microsoft still doesn't allow simply setting the system's code page to UTF-8, something which the whole Unix world[2] did years ago.
[1]. And according to People's Murderous Commiepublic of China's laws, you need to support these (as GB18030) in any product sold in mainland China. Of course, they don't give a damn about that law unless they want to demand a favour from a company, so they have yet another stick of non-compliance.
[2]. All non-toy distros do this by default, and if not for a few whiners, non-UTF8 locales would probably be dropped by now.
Re: (Score:2)
That depends on the encoding - either 72 characters in ASCII or UTF-8 or 36 characters if they go for the more multi-lingual friendly UTF-16.
UTF-16 more multi-lingual friendly than UTF-8? Er... it has many disadvantages and not a single benefit over UTF-8.
The touted benefit of UTF-16 is that for those who make almost no use of the 7-bit ASCII set (the only characters that are represented by a single byte in UTF-8), it can improve the speed of reading/scanning and reduce the ultimate size of many files.
In practice, this isn't really going to happen in most Web-based text, but for electronic versions of non-Web text, it can be a win. Overall, however, I agree with you that there's more benefit in using UTF-8 universally.
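The size trade-off is easy to measure. A rough Python sketch, with sample strings chosen only for illustration (the second is four Han characters, written as escapes); sizes are in bytes and exclude any byte-order mark.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes without a BOM) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


ascii_text = "Wikipedia"
cjk_text = "\u767e\u79d1\u4e8b\u5178"  # four Han characters

print(encoded_sizes(ascii_text))  # (9, 18)
print(encoded_sizes(cjk_text))    # (12, 8)
print(encoded_sizes("a" * 72))    # (72, 144)
```

So 72 bytes holds 72 ASCII characters in UTF-8 but only 36 in UTF-16, while for Han text UTF-16 comes out smaller by a third.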
Re: (Score:2)
You are describing the benefits of UCS-2, which is the character encoding used by Windows NT and .NET and Java. UCS-2 characters are fixed-width, but UTF-16 characters (like UTF-8 characters) support surrogate pairs, so you never know how long a character might be without
Re: (Score:2)
btw, Windows NT uses UCS-2 not UTF-16. UTF-16 has no benefit over UTF-8, but UCS-2 is convenient for string operations (since UCS-2 chars are fixed width, allowing for O(1) string indexing).
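A sketch of what O(1) indexing means here, in Python. Python has no codec named UCS-2, so this assumes text limited to the Basic Multilingual Plane, where UTF-16 little-endian bytes are the same as UCS-2; both function names are made up for the example.

```python
def ucs2_char_at(buf: bytes, i: int) -> str:
    """O(1): with fixed 2-byte characters, character i lives at bytes 2*i and 2*i+1."""
    return buf[2 * i : 2 * i + 2].decode("utf-16-le")


def utf8_char_at(buf: bytes, i: int) -> str:
    """O(n): scan for character start bytes (anything that is not 10xxxxxx)."""
    starts = [k for k, b in enumerate(buf) if b & 0xC0 != 0x80]
    end = starts[i + 1] if i + 1 < len(starts) else len(buf)
    return buf[starts[i]:end].decode("utf-8")


text = "na\u00efve caf\u00e9"  # ten characters, two of them non-ASCII

print(ucs2_char_at(text.encode("utf-16-le"), 2))  # ï
print(utf8_char_at(text.encode("utf-8"), 9))      # é
```

The fixed-width version breaks as soon as a character above U+FFFF shows up, which is the catch with treating UTF-16 as if it were UCS-2.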
Re: Likely a lot more than 2 million (Score:2)
something which the whole Unix world[2] did years ago.
[2]. All non-toy distros do this by default, and if not for few whiners, non-UTF8 locales would probably be dropped by now.
Unfortunately, that isn't quite true. As far as I know, none of the BSDs use UTF-8 by default. I have verified it on FreeBSD 6.2, but I cannot imagine that {Net,Open}BSD would use it either. Internationalization is definitely an area where Linux is above and ahead of BSD.
I've heard rumors that that's one of the things being improved for FreeBSD 7.0, but I don't know just how improved it is.
Re: (Score:2)
Re: (Score:2, Insightful)
Re: (Score:1)
MediaWiki actually uses the number of articles with at least one internal link [usemod.com].
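As a toy illustration of that rule in Python (not MediaWiki's actual code, and the sample pages are invented): count only pages whose wikitext contains at least one `[[...]]` link. Any other conditions the real software applies, such as skipping redirects, are left out; a redirect page would match this pattern too.

```python
import re

# Matches a simple internal link such as [[Target]] or [[Target|label]].
INTERNAL_LINK = re.compile(r"\[\[[^\[\]]+\]\]")


def count_articles(pages: dict[str, str]) -> int:
    """Count pages whose text contains at least one internal link."""
    return sum(1 for text in pages.values() if INTERNAL_LINK.search(text))


# Hypothetical wikitext samples.
pages = {
    "El Hormiguero": "A Spanish [[television]] comedy show.",
    "Orphan note": "Seventy-two bytes of text with no links at all.",
    "Slashdot": "A [[website]] covering [[technology|tech]] news.",
}

print(count_articles(pages))  # 2
```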
That was quick (Score:4, Funny)
Re: (Score:2)
Re: (Score:3, Informative)
Can you be notable for being not-notable? Or famous simply for being famous? ... Before you answer "no" think of celebrities like Paris Hilton...
Basically, the situation is this: Notability has its thresholds - either you are notable or not (though where exactly to draw the line is, at times, difficult - but we have a pretty clear picture by now). Articles about people, bands, groups, companies, websites, etc. have to have assertions of notability (e.g. "they're really big in Pakistan and have released three albums", or whatever). Notability has to be backed up by reliable sources.
This leads to the situation that 1) people who are famous for faili
Re: (Score:2)
Confusion? (Score:2, Funny)
The millionth (Score:3, Funny)
Which was the millionth article then? Not that it really matters, just being curious, cause I'm like, bored..
Re: (Score:3, Informative)
Re: (Score:2)
You keep using that word. I do not think it means what you think it means.
Re: (Score:1)
Is it so important? (Score:3, Insightful)
Do we have so few problems that we need to know EVERYTHING statistically? Does that matter (other than to inflate the vanity of a few)?
Re: (Score:3, Funny)
Re: (Score:2)
Or something like that.
Re: (Score:3, Funny)
It's 207 in case anyone's interested.
Re: (Score:1, Offtopic)
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
What I love about Wikipedia.... (Score:5, Funny)
The 2,000,000 article is actually the last article to be part of the first 2,000,000 articles and the 2,000,001 is the first of the third million.
I'm glad they cleared that up - I wondered whether the 2,000,000 article might be actually the one millionth or perhaps the 4 millionth....
Re: (Score:1)
Re: (Score:1)
Re: (Score:1)
Re: (Score:3, Funny)
Oh and wow, the Firefox spell checker thinks 'zeroth' is a word. Score one for Asimov (or did he not coin it? Whoever it was then, colour me curious!)
Re: (Score:3, Funny)
maybe a c++ programmer?
Re: (Score:3, Interesting)
So "zeroth" is a perfectly good word, and Asimov (who really didn't understand computers all that well) probably didn't coin it.
I once had a CS professor who insisted that his students number the sections in their papers from 0 ins
Re: (Score:3, Interesting)
It could have been worse. (Score:1)
Re: (Score:1)
It would be interesting to know (Score:2, Insightful)
Re:It would be interesting to know (Score:5, Insightful)
Re: (Score:2, Insightful)
Re: (Score:2)
The rub is that Wikipedia presents itself as a "real" encyclopedia, when it clearly isn't. If they didn't make such an issue out of the whole "notability" thing it wouldn't be so bad - as it is, it really looks like hypocrisy. I've got nothing against having all those articles up there - I've read a few of them myself. But Wikipedia is presented to the world as a real encyclopedia, with high standards to match (e.g. the "accuracy competition" with Britannica) - and yet the vast majority of its material does not relate to anything real or important by any stretch of the (non-geek) imagination. When 50% of Britannica is composed of biographies of Captain Janeway and Buffy Summers then Wikipedia will be able to count itself as a real encyclopedia, but not before.
Fair enough. But we could put this to the test. Has anybody done a survey to find out how the wiki articles break down by topic? My guess would be that even if the cruft were the majority of the articles, wiki is so large that it would still have britannica beat in sheer volume in the serious categories.
Re:It would be interesting to know (Score:5, Insightful)
It seems to me (and apparently the GP as well) that you're criticizing Wikipedia for not having the same limitations as a paper encyclopedia. Who cares what proportion of the articles fall into some niche category, as long as one can still easily find all the information one is looking for? The simple fact that a physical encyclopedia has limited storage space and thus cannot contain in-depth articles on every little special-interest detail does not appear to me to somehow constitute an advantage for physical encyclopedias.
Or were you perhaps simply protesting the direct comparison of article counts between Wikipedia and Britannica? That I could understand, since the comparison could hardly be fair. Their requirements are simply too different for any direct quantitative comparison to be meaningful.
Re: (Score:2)
Re: (Score:1)
Wikipedia now has 136,000 articles on comic book and sci-fi characters, as opposed to last year's 86,000!
Wikipedia now has 347,000 articles on natural science and math, as opposed to last year's 297,000!
Now, no offense to the comic book crowd but I simply don't give a rat's ass about how many articles on obscure trivia there are, or by how much they've increased
Re: (Score:2)
Re: (Score:2, Insightful)
'It would be interesting to know how many "real" articles there are. That is, if you took out the individual articles for all the boring
Re: (Score:2)
Sadly I've seen too many very useful Wikipedia articles being deleted over the years. There seem to be a number of Wikipedia people setting the bar too high for stuff like notability. Wikipedia isn't a paper encyclopedia, we don't need to restrict the content by size.
The most recent example I've seen was the CallWeaver article - now CallWeaver (which is quite a big Free software project) has no Wik
Re: (Score:1)
Re: (Score:1)
Also I'm not sure the divide is whether it's fiction or not - I would expect there to be articles on subjects such as Star Trek (not necessarily every episode)
Re: (Score:1)
Re: (Score:1)
That, and because the people who had a bad experience with Wikipedia were feeling bad today.
wiki boys determine this (Score:2)
The current consensus says it is fine to create a page about a fictional person in an SF series, but a page about a real person needs a lot more notability to stay off the fast-track deletion list.
If you think it is all right to add an external link, think again. Wikithink will most likely remove it.
Re: (Score:1)
Irrelevant articles tend to get either improved or deleted so the majority of the 2 million articles are almost certainly *used* (the count does not include tiny stub articles, redirects etc...). You have obviously never tried to create an article on Wikipedia or you would know that vanity, fan
Re: (Score:2)
Re: (Score:1)
Re: (Score:3, Informative)
Around a hundred, give or take (Score:2)
Spanglish Wiki? (Score:5, Funny)
Wow, that's ironical.
Oops! CSD-A7! (Score:1)
How many articles do other encyclopedias have? (Score:3, Interesting)
Re: (Score:3, Funny)
The size of the Britannica has remained roughly constant over the past 70 years, with about 40 million words on half a million topics.
http://en.wikipedia.org/wiki/Encyclop%C3%A6dia_Br
Re: (Score:1)
Wikipedia does [wikipedia.org], of course. It turns out some of the largest encyclopedias written were in Chinese, and that brings in a number of complexities in determining which is larger/est. Of course if you look at all languages of Wikipedia, it's over 8 million now [wikimedia.org], but that is mostly repeats.
Of course, the results will be edited... (Score:2)
and then of course (Score:3, Interesting)
It was "speedy kept", but it is amusing that a stratified sample [wikipedia.org] shows wikipedia filling up these days not only with trivia, but also with bureaucracy.
(Yes, I have a bee in my bonnet about wikipedia even though I love it -- see my sig.)
Re: (Score:2)
Re: (Score:2)
I find it telling (in as much as a stratified sample can tell you anything) that the two millionth article:
1. was a rather trivial piece on a TV show.
2. triggered a bureaucratic response.
Contrary to what you may believe, bureaucracy is not a good thing in most circumstances. It is certainly not the only source of fairness in the world, and indeed in many cases it actually generates large amounts of unf
Re: (Score:2)
Actually, I added you as a foe quite some time ago. I don't remember what it was for. In case you're wondering, I have no modifiers on "foe" posts; they're not censored.
Re: (Score:2)
Regarding the AfD discussion, how is that a particularly bureaucratic process? Everyone gets a chance to voi
Re: (Score:2)
Uhhh.... maybe. Just like looking at a single random person on the street can be enlightening. With precisely the same cautions. As I've left Wikipedia, I'm not particularly in the mood to go on and do a bunch of statistical analysis on the articles.
You can help review new articles (Score:3, Informative)
http://en.wikipedia.org/wiki/Special:Newpages [wikipedia.org]
This will take you to the list of the most recently created articles. If you find that you have trouble keeping up with other editors who are reviewing the same articles, you might find this link useful:
http://en.wikipedia.org/w/index.php?title=Special:Newpages&limit=250&offset=250&namespace=0 [wikipedia.org]
Which will take you to the same list, but starting from the 250th most recent article.
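If you want to page through that list from a script, the link above can be rebuilt for any offset. A minimal Python sketch; the parameter names are copied from the URL in this comment, the `newpages_url` helper is made up, and I have not checked how the site treats other values.

```python
from urllib.parse import urlencode

BASE = "http://en.wikipedia.org/w/index.php"


def newpages_url(limit: int = 250, offset: int = 0, namespace: int = 0) -> str:
    """Build a Special:Newpages URL in the same shape as the link above."""
    query = urlencode(
        {
            "title": "Special:Newpages",
            "limit": limit,
            "offset": offset,
            "namespace": namespace,
        },
        safe=":",  # keep the colon in "Special:Newpages" unescaped
    )
    return f"{BASE}?{query}"


# Second and third "pages" of 250 entries each.
for page in (1, 2):
    print(newpages_url(limit=250, offset=250 * page))
```

Each step of 250 in `offset` moves one screenful further back in the list.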
Typically, it's most useful to
Anyone can do these things, and you can also just improve on any article by adding additional sources, or expanding on the article.
Yeah, but hasn't Wikipedia jumped the shark? (Score:5, Insightful)
If wikipedia is only going to allow references to things already published elsewhere, and all written culture is inevitably moving online, how will wikipedia differentiate itself from Google? I mean, if there's no unique information in wikipedia, there's very little unique value in it. It's just a really labor-intensive presentation layer at that point, isn't it?
Re: (Score:1)
Re:Yeah, but hasn't Wikipedia jumped the shark? (Score:5, Informative)
Research isn't what I'm talking about. (Score:2)
By collating and linking vast amounts of information, Wikipedia does something google can't. It creates the presentation of the information manually.
So... like dmoz. A manual presentation layer. I'm content-driven, personally, a slick presentation does not increase my perception of the value of information.
Google can only index content that is already there through an algorithm.
Right, so it's an automatic (and thus more up-to-date) presentation layer, which carries quantifiable and repeatable bias by virtue of being algorithmic.
And for a long time if not forever, there will be information that is not online.
And increasingly, if your information source is not on-line,
Re: (Score:2, Informative)
Re: (Score:3, Informative)
No, the "no original research" rule was instituted to deal with physics crackpots. This is documented on wikipedia itself if you actually delve into the pages about the rule.
There is no good way for wikipedia to differentiate between the personal experiences or knowledge of a 73-year-old rocket scientist wunderkind, a crackpot writing stuff in his garage, or a pub
Re: (Score:2)
In order to deal with the very real threat of vandalism (let's not pretend it wasn't vandalism that sparked the changes in how wikipedia runs)
No, the "no original research" rule was instituted to deal with physics crackpots. This is documented on wikipedia itself if you actually delve into the pages about the rule.
The "changes" I meant to refer to were not specifically the "no original research" rule, but rather the increasing content policing with ever increasing adherence to more and more rigid rulesets. But it's kind of a bogus argument for me to make, anyway, since I suspect the vandalism and responses to it have been present since the start, and I'm just giving my own impression without researching anything. I've been dealing with aoe driver issues all week, my brain is mushy.
There is no good way for wikipedia to differentiate between the personal experiences or knowledge of a 73-year-old rocket scientist wunderkind, a crackpot writing stuff in his garage, or a published scientist dabbling poorly outside his actual area of expertise.
I disagree. I think the wiki wa
Re: (Score:2)
I know a few retired rocket scientists. I'd love it if their unique knowledge didn't go to the grave with them. I'd rather be able to look up the definition of a "yardley" as a unit of pressure than a list of characters from Harry Potter. Unfortunately, wikipedia doesn't seem to be interested in anything that's "from personal knowledge or experience" these days. [wikipedia.org]
Because Wikipedia is supposed to be an encyclopedia, not an original publication. I agree that this kind of knowledge should be archived and documented, but there are better places for it.
For example, there's a wikibooks page. You could try building an open textbook on rocket science. There's wikia [wikia.com] where you could build a rocket science Wiki. These are mostly pop-culture or community based wikis, but you could make a serious special interest wiki, with original content, if you wish.
Then you could link to i
Re: (Score:1)
...Wikipedia is supposed to be an encyclopedia, not an original publication.
Huh? The other encyclopedias are original publications. The articles I wrote in the distant past for Wikipedia were all original text from my brain... mostly from personal knowledge, with no cites at all. Some of those articles are huge now, and certainly most are far better than they were when I originally wrote them, but I think none would be unchallenged today.
Thank you for the wikibooks reference and wikia link, incidentally. Wasn't aware of those.
2M, give or take 1M (Score:2)
By some time next month I expect the 2Mth article will be more like the 1,990Kth.
and in other news... (Score:1)
More profit and conflict of interest for Wikia? (Score:2, Flamebait)
More details of this fiscal conflict of interest, that pads Wikia's pockets with each public relations brouhaha like this:
http://wikipediareview.com/blog/category/wikia/ [wikipediareview.com]
Why not the 2000001st (Score:1)
Re: (Score:1)
Re: (Score:2)
Re:Just one question (Score:4, Interesting)
Re: (Score:1)
You can't even quote an encyclopedia on a college paper, so why should anyone be using one?
So this makes encyclopedias useless? If you say so.
Re:Just one question (Score:5, Insightful)
But seriously, not every source has to be academic to be of use. For many subjects, wikipedia is an excellent starting point. You might want to take lemmata on controversial subjects like Palestine and evolution with a grain of salt, but for many a subject the articles on wikipedia are of excellent quality.
Re: (Score:1)
Actually, giving it a quick glance, I don't see any reason for there to be significant problems with the evolution article? Thankfully, NPOV doesn't mean "let the Creationists get equal say", and I suspect attempts to work in a pro-ID viewpoint would get reverted.
Re: (Score:1)
Wikipedia thrives on controversial subjects (Score:3, Insightful)
Because they draw people to try to reflect their points of view; and when you read the article (say, abortion [wikipedia.org] or evolution [wikipedia.org] or software patents [wikipedia.org]) you can gain a quick overview on almost any significant point of view on the subject, and how they relate to each other. Yes, individual viewpoints may not be perfectly reflected. But you *do* gain an incredibly broad view, which no traditional encyclopedia can deliver.
Wikipedia is much more likely to be useful on a controversial subject where people feel incline
Re: (Score:1)
Sad, but true.
Re:The Bad one is Generally History (Score:2)
My favorite example was the articles on freemasonry; there was an intense defensive tone and a lot of sweeping generalizations about how awesome the temples were and how every member was a paragon of moral humanity. While I hope it's been toned down a bit since then, I'm always on the lo
Re: (Score:1, Flamebait)
You might want to take lemmata on controversial subjects like Palestine and the Evolution with a grain of salt,
Or Afrocentrism [wikipedia.org], or Scientology [wikipedia.org], or Han Chauvinism [wikipedia.org], or Jihad [wikipedia.org], Islamophobia [wikipedia.org], or any article relating to politics, religion, history, personalities, art, or any humanities subject. Most of those articles were taken over by partisan propaganda groups and their admin backers a long time ago. Then again, after a few months, another partisan group takes over and changes th
Re:Just one question (Score:4, Interesting)
Re: (Score:2)
Re:Just one question (Score:4, Insightful)
Who cares? I mean honestly, who does?
In the long run, this is quite a minor historical marker. We're going to see article 5 million and MAYBE that will matter a little more. Maybe.
You can't even quote Wikipedia on a college paper, so why should anyone be using it
Correct - it's rather dumb to use it on a college paper (like using a regular paper encyclopedia); however, Wikipedia is the fastest starting point and is a good medium not only for specific information on subjects and sources, but also for the opinions of people with education, expertise, and bias on their subjects. If you dig into some controversial topics' histories, there is actually some VERY good information to wade through and find sources on. The end result is not perfect, the system IS flawed, but the information that you can glean from digging and researching STARTING at Wikipedia is quite useful.
Plus, the specialized wikis that are popping up and using wiki-style management for their small wikis (where REAL experts can actually post) may be the bigger genius behind wikipedia.
If your complaint about wikipedia is that the final articles are flawed, you're right...but look at the process behind some of those articles and the histories. Dig into that, and you find what you need.
Re: (Score:1)
Re:Just one question (Score:4, Insightful)
Wikipedia is a research tool, not the swiss army knife of research.