Text Mining the New York Times
Roland Piquepaille writes "Text mining is a computer technique for extracting useful information from unstructured text, and it's a difficult task. But now, using a relatively new method called topic modeling, computer scientists from the University of California, Irvine (UCI) have analyzed 330,000 stories published by the New York Times between 2000 and 2002 in just a few hours. They were able to automatically isolate topics such as the Tour de France, prices of apartments in Brooklyn or dinosaur bones. This technique could soon be used not only by homeland security experts or librarians, but also by physicians, lawyers, real estate people, and even by you. Read more for additional details and a graph showing how the researchers discovered links between topics and people."
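The summary doesn't say which topic model the UCI team used. A common choice for this kind of work is latent Dirichlet allocation (LDA), so here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, assuming documents are already tokenized into word lists. The function names `lda_gibbs` and `top_words`, the hyperparameters (`alpha`, `beta`, `iterations`) and the toy corpus are all illustrative assumptions, not the researchers' actual code or data.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical sketch: LDA via collapsed Gibbs sampling.

    docs: list of token lists. Returns (topic-word counts, doc-topic counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # tokens of word w assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized p(topic = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return n_kw, n_dk, vocab

def top_words(n_kw, vocab, n=3):
    """Most frequent words per topic, a rough human-readable label."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in n_kw]

# Toy corpus echoing two of the topics in the summary (made-up text).
docs = [
    "tour france stage cyclist tour stage".split(),
    "cyclist france tour stage cyclist".split(),
    "apartment brooklyn price rent apartment".split(),
    "brooklyn rent price apartment price".split(),
]
n_kw, n_dk, vocab = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(n_kw, vocab)):
    print(k, words)
```

On a real corpus the number of topics, the priors and the iteration count all need tuning. Because each sweep is a pass over every token, the cost grows linearly with corpus size, which is consistent with a few hours for hundreds of thousands of articles, though the summary gives no implementation details.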
Homeland security (Score:4, Insightful)
Funny (Score:1, Insightful)
You mean clusty.com? (Score:3, Insightful)
I only read the stub of the article, because it seems to do exactly what Clusty does and I don't care to read any further.
They're late to the game. (Score:4, Insightful)
Google's AdSense network has done this for years to serve contextually relevant text ads across thousands of websites. Yahoo does now, too.
Re:Homeland security (Score:2, Insightful)
The graph also shows links between US_Military and AL_QAEDA, and between ARIEL_SHARON and Mid_East_Conflict. If only they'd had this technology when they were trying to justify the invasion of Iraq.
"Look, Saddam Hussein has links to Al Qaeda! You can see it on the graph!"
"Uh, Mister Vice-President, this graph is based on press conferences in which you repeatedly mentioned Saddam Hussein and Al Qaeda in the same breath. It may not have any statistical value."
"Shut up and bring me my war britches, dimwit, the computer never lies!"
Re:Homeland security (Score:3, Insightful)
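A police officer saw a drunk man searching the sidewalk under a streetlight and asked him what he was doing.
The drunk replied, "I'm looking for my car keys."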
The officer looked around in the lamplight, then asked the drunk, "I don't see any car keys. Are you sure you lost them here?"
The drunk replied, "No, I lost them over there", and pointed to an area of the sidewalk deep in shadow.
The policeman then asked, "Well, if you lost them over there, why are you looking over here?"
The drunk looked at him and said, "Because the light is better over here."
Searching for terrorists by data mining from the comfort of your cubicle is about as likely to be successful.