
English Wikipedia Gets Two Millionth Article

reybrujo writes to inform us of a milestone for the English-language Wikipedia: the posting of its two millionth article. At the time of this posting there is uncertainty over which article achieved the milestone. "Initial reports stated that the two millionth article written was El Hormiguero, which covers a Spanish TV comedy show. Later review of this information found that this article was most likely not number two million; instead, a revised list of articles created around the two-million mark has been generated, and is believed to be correct to within three articles. The Wikimedia Foundation, which operates the site, is expected to make an announcement with a final decision, which may require review of the official servers' logs."
  • by suso ( 153703 ) * on Monday September 10, 2007 @07:59AM (#20536837) Homepage Journal
    MediaWiki doesn't count all articles in its article count, and I'm not talking about talk or image pages either. I think it has a threshold of around 72 bytes before it counts an article as an article, so they are most likely well over 2 million. For instance, Bloomingpedia actually has 2,148 articles right now, but the MediaWiki count on the front page only shows 2,106; so 42 of the articles are smaller than the threshold.

    However, if they (or anyone else) need a plugin for MediaWiki that will list the pages in order so that you can count them and determine which article was the Nth article, I wrote a plugin called Page Create Order [] that will put a special page called "List Pages By Creation Date" in your wiki. We developed it for Bloomingpedia originally. It's simple, but it does the job. It could easily be modified to only count articles above a certain size as well; the main purpose of this plugin is to see the order in which pages were created.
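Both ideas in the comment above can be sketched together: a size threshold on what counts as an article, and ordering pages by creation date (a page's creation time is the timestamp of its earliest revision). The table and column names below loosely echo MediaWiki's `page`/`revision` schema, but this in-memory SQLite database, its data, and the 72-byte cutoff are illustrative assumptions, not MediaWiki's actual counting rule (which also considers namespace, redirects, and internal links).

```python
# Illustration only: mimics MediaWiki-style page/revision tables in SQLite.
# The 72-byte threshold is the parent comment's guess, not a documented rule.
import sqlite3

THRESHOLD_BYTES = 72  # assumed size cutoff from the comment above

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE page (
    page_id    INTEGER PRIMARY KEY,
    page_title TEXT,
    page_len   INTEGER        -- article size in bytes
);
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER REFERENCES page(page_id),
    rev_timestamp TEXT        -- MediaWiki-style YYYYMMDDHHMMSS
);
""")
db.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, "First_article",  4096),
    (2, "Tiny_stub",        40),   # below the assumed threshold
    (3, "Second_article", 2048),
])
db.executemany("INSERT INTO revision VALUES (?, ?, ?)", [
    (10, 1, "20070101000000"),   # First_article created
    (11, 3, "20070201000000"),   # Second_article created
    (12, 2, "20070215000000"),   # Tiny_stub created
    (13, 1, "20070301000000"),   # later edit to First_article
])

# Raw page count vs. "countable" count under the size threshold.
total = db.execute("SELECT COUNT(*) FROM page").fetchone()[0]
counted = db.execute("SELECT COUNT(*) FROM page WHERE page_len >= ?",
                     (THRESHOLD_BYTES,)).fetchone()[0]

# The Nth created page: order pages by their earliest revision timestamp.
created_order = [row[0] for row in db.execute("""
    SELECT p.page_title, MIN(r.rev_timestamp) AS created
    FROM page p JOIN revision r ON r.rev_page = p.page_id
    GROUP BY p.page_id
    ORDER BY created
""")]
print(total, counted, created_order)
```

With this toy data, the raw count is 3 but only 2 pages clear the threshold, which is the kind of gap the commenter observed between 2,148 and 2,106 on Bloomingpedia.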
  • by IBBoard ( 1128019 ) on Monday September 10, 2007 @08:08AM (#20536881) Homepage
    That depends on the encoding: either 72 characters in ASCII or ASCII-range UTF-8, or 36 characters if they go for the more multilingual-friendly UTF-16.

    Either way, something about that length is likely to be a stub and not a 'real' article.
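The byte arithmetic in the comment above can be checked directly: 72 bytes hold 72 ASCII characters under UTF-8, but only 36 characters under UTF-16, which spends at least two bytes per character. A quick sketch (the specific strings are arbitrary examples):

```python
# 72 bytes of ASCII text is 72 characters in UTF-8.
ascii_text = "a" * 72
assert len(ascii_text.encode("utf-8")) == 72

# UTF-16 uses at least 2 bytes per character, so the same 72-byte
# budget holds only 36 characters ("-le" omits the byte-order mark).
bmp_text = "ä" * 36
assert len(bmp_text.encode("utf-16-le")) == 72

# Characters outside the Basic Multilingual Plane cost 4 bytes
# in UTF-16 (a surrogate pair).
assert len("𝄞".encode("utf-16-le")) == 4
```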
  • Re:The millionth (Score:3, Informative)

    by Hachey ( 809077 ) on Monday September 10, 2007 @08:32AM (#20537031)
    The 1 millionth article was Jordanhill Railway Station []. Coincidentally, the 2 millionth article was very nearly a train station as well, this time one just outside of Tokyo.

  • Re:That was quick (Score:3, Informative)

    by WWWWolf ( 2428 ) <> on Monday September 10, 2007 @09:19AM (#20537405) Homepage

    Can you be notable for being not-notable? Or famous simply for being famous? ... Before you answer "no" think of celebrities like Paris Hilton...

    Basically, the situation is this: Notability has its thresholds - either you are notable or not (though where exactly to draw the line is, at times, difficult - but we have a pretty clear picture by now). Articles about people, bands, groups, companies, websites, etc. have to have assertions of notability (i.e. "they're really big in Pakistan and have released three albums", or whatever). Notability has to be backed up by reliable sources.

    This leads to the situation that 1) people who are famous for failing at something can be considered notable enough for articles of their own (provided someone noticed and documented that in a reliable source), and 2) worthless celebrities are, alas, notable enough for articles because they probably have had verifiable media appearances.

    (Think of it this way: if I had not heard about Paris Hilton before, I'd go to the article, come to the conclusion that she's a worthless celebrity, and be done with it. If there were no article about her, I'd probably ask "hey, this... thing is on TV all the time, what the heck has she done to get there, anyway, and why isn't there an article about her?" =)

  • by ajs ( 35943 ) <> on Monday September 10, 2007 @11:02AM (#20538927) Homepage Journal
    If you would like to help review newly created articles, just follow this URL: []

    This will take you to the list of the most recently created articles. If you find that you have trouble keeping up with other editors who are reviewing the same articles, you might find this link useful: []

    That link takes you to the same list, but starting from the 250th most recent article.

    Typically, it's most useful to

    Anyone can do these things, and you can also just improve any article by adding additional sources or by expanding it.
  • by Taxman415a ( 863020 ) on Monday September 10, 2007 @01:09PM (#20541029) Homepage Journal
    Wikipedia has never been interested in unique information. One of the first policies was the one against original research []. That certainly doesn't mean there isn't a place for original research (such venues are plentiful), nor does it mean Wikipedia isn't valuable. By collating and linking vast amounts of information, Wikipedia does something Google can't: it creates the presentation of the information manually. Google can only index content that is already there, through an algorithm. And for a long time, if not forever, there will be information that is not online. Further, Wikipedia summarizes information like Google will likely never be able to. Even if a Wikipedia article is not entirely right, it can give you an idea of where to look and what to look for, which is perhaps its only truly valuable contribution until there is a way to formally peer review and freeze content so that the reader can see a stabilized version.
  • by Carnildo ( 712617 ) on Monday September 10, 2007 @02:47PM (#20542639) Homepage Journal
    By your definition, Wikipedia has somewhere between 1,500,000 articles (discarding *all* articles about popular culture) and 1,900,000 articles (discarding just the things you consider "cruft"). The largest group of articles is biographies (30% of the encyclopedia), followed by articles on places (25%), popular culture (25%), and history (10%).
  • by Taxman415a ( 863020 ) on Monday September 10, 2007 @09:51PM (#20547727) Homepage Journal
    Well, "original research" just happens to be the name of the policy, but it covers all unpublished ideas and thought. And what I was saying is that Wikipedia intentionally avoids that type of thing as a necessary evil to maintain improvement in quality. Otherwise you either need a power structure that can say yea or nay on content, or you open the floodgates to all the latest crackpot theories. You have to spend enough time on the project to realize there isn't an in-between. And again, it's not like there aren't lots of other venues for publishing that other valuable unpublished information. That's what post-docs are for, right? :)

    A manual presentation layer. I'm content-driven, personally, a slick presentation does not increase my perception of the value of information.
    - Everybody says that, but studies show time and time again that the way information is presented has drastic effects on how much information gets across and how it is perceived. Next you're going to tell us ads don't affect you.

    Right, so it's an automatic (and thus more up-to-date) presentation layer, which carries quantifiable and repeatable bias by virtue of being algorithmic.
    - What you're missing here is that Google indexes links to information; it does not summarize the actual information as Wikipedia does. Even if the information you wanted were always in a Google search, you would still have to collate it, judge sources, etc. Also, quality information is not all, or perhaps even mostly, online right now. The work of summarizing the information is valuable, and if it is already done for you, it can get you further ahead on the task at hand.

    Why should a wiki be "stabilized"? Why is "formality" a virtue when wikipedia was created and gained value from non-conformance to traditional models?
    - Because the real goal is information quality: demonstrable quality, in a way useful to the reader/researcher. The non-conforming, radically open current system has been shown to be successful in producing content, a smaller portion of it of reasonably high quality. But studies and observation of Wikipedia show that it has extremely high variation in quality, from articles replaced with "YO MAMA SO PHAT..." to widely reviewed articles citing and properly summarizing all the best written material on the subject. Formal peer review can lead to higher information quality, and if that reviewed version is available as an option, default or not, it can allow the best of both worlds (like the Linux kernel and most other software). Then there can be both a radically open article that may be more up to date, balanced, etc., and a stable version that is at least guaranteed not to be vandalized. The amount of stabilization could be as little as that, or as much as the formally reviewed case, or both. Thus the best of both worlds: content is produced, high quality content is available, and the review processes can be demonstrated.
  • by nothings ( 597917 ) on Tuesday September 11, 2007 @05:36PM (#20562137) Homepage
    In order to deal with the very real threat of vandalism (let's not pretend it wasn't vandalism that sparked the changes in how wikipedia runs)

    No, the "no original research" rule was instituted to deal with physics crackpots. This is documented on wikipedia itself if you actually delve into the pages about the rule.

    There is no good way for wikipedia to differentiate between the personal experiences or knowledge of a 73-year-old rocket scientist wunderkind, a crackpot writing stuff in his garage, or a published scientist dabbling poorly outside his actual area of expertise. So wikipedia just disallows that sort of thing entirely, and relies instead on the difficulty those people have in publishing their work in peer-reviewed journals or mainstream publications, by setting thresholds in that direction.

    And it's not wikipedia's fault if the knowledge of a 73-year-old Jim-Yardley knower isn't preserved. Anecdotes and anything else from him can be written down on any web page and preserved for posterity that way. (And if they get media attention because they're not crackpottery, they may make it into wikipedia someday.)

    The goal of preserving absolutely everything known by every human, but only the good stuff, is unsatisfiable, and wikipedia errs on the extremely conservative side of the problem. It may not seem like that with all the pop culture crap to be found there, but wikipedia isn't a single coherent entity; it's a teeming mass of random people following the rules to varying degrees of accuracy and with no consistency at all. Somehow people care more about following the rules when it comes to rocket science than when it comes to character summaries of last year's big TV show. And isn't that awesome?
