
Text Mining the New York Times

Roland Piquepaille writes "Text mining is a computer technique to extract useful information from unstructured text. And it's a difficult task. But now, using a relatively new method named topic modeling, computer scientists from the University of California, Irvine (UCI), have analyzed 330,000 stories published by the New York Times between 2000 and 2002 in just a few hours. They were able to automatically isolate topics such as the Tour de France, prices of apartments in Brooklyn or dinosaur bones. This technique could soon be used not only by homeland security experts or librarians, but also by physicians, lawyers, real estate people, and even by you. Read more for additional details and a graph showing how the researchers discovered links between topics and people."
This discussion has been archived. No new comments can be posted.

  • Homeland security (Score:4, Insightful)

    by Anonymous Coward on Saturday July 29, 2006 @06:42AM (#15805041)
    Every time homeland security is mentioned as benefiting from a new technology, you should get a swift kick to the nuts. Goddamn, there is more than just terrorism in this world.
  • Funny (Score:1, Insightful)

    by vllbs ( 991844 ) on Saturday July 29, 2006 @06:48AM (#15805055)
    A relatively new method? A difficult task? Sorry, but these are almost laughable, even for a poor Spaniard like me.
  • by SirStanley ( 95545 ) on Saturday July 29, 2006 @07:21AM (#15805111) Homepage
    You mean they can group data by topic? Like clusty.com does when you search?

    I just read the stub of the article... because it seemed like it does exactly what clusty does and I don't care to read anymore.
  • by alcohollins ( 64804 ) on Saturday July 29, 2006 @08:09AM (#15805212)
    Not revolutionary. In fact, they're late.

    Google's AdSense network has done this for years to serve contextually relevant text ads across thousands of websites. Yahoo does now, too.

  • by mrogers ( 85392 ) on Saturday July 29, 2006 @08:21AM (#15805228)
    But the pretty graph [primidi.com] clearly shows that some guy called MOHAMMED is the missing link between Religion and Terrorism - without this new technology, homeland security experts might have been kept in the dark about that.

    The graph also shows links between US_Military and AL_QAEDA, and between ARIEL_SHARON and Mid_East_Conflict. If only they'd had this technology when they were trying to justify the invasion of Iraq.

    "Look, Saddam Hussein has links to Al Qaeda! You can see it on the graph!"

    "Uh, Mister Vice-President, this graph is based on press conferences in which you repeatedly mentioned Saddam Hussein and Al Qaeda in the same breath. It may not have any statistical value."

    "Shut up and bring me my war britches, dimwit, the computer never lies!"

  • by 1u3hr ( 530656 ) on Saturday July 29, 2006 @08:54AM (#15805308)
    The compulsory "Homeland Security" link makes me think of the story about a drunk who was crawling about on the sidewalk under a lamppost late one night. A Police Officer came up to him and inquired, "What are you doing?"
    The drunk replied, "I'm looking for my car keys."
    The Officer looked around in the lamplight, then asked the drunk, "I don't see any car keys. Are you sure you lost them here?"
    The drunk replied, "No, I lost them over there", and pointed to an area of the sidewalk deep in shadow.
    The policeman then asked, "Well, if you lost them over there, why are you looking over here?"
    The drunk looked at him and said, "Because the light is better over here."

    Searching for terrorists by datamining from the comfort of your cubicle is about as likely to be successful.
