from the sound-of-google-knocking-on-your-door dept.
Grv writes "Researchers at the University of California, Irvine have announced a new technique they call 'topic modeling' that can be used to analyze and group massive amounts of text-based information. Unlike typical text indexing, topic modeling attempts to learn what a given section of text is about without being fed clues by humans. The researchers used their method to analyze and group 330,000 articles from the New York Times archive. From the article: 'The UCI team managed this by programming their software to find patterns of words which occurred together in New York Times articles published between 2000 and 2002. Once these word patterns were indexed, the software then turned them into topics and was able to construct a map of such topics over time.'"
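For anyone wondering what "patterns of words which occurred together" might look like in practice, here is a minimal Python sketch. It is not the UCI team's method (the summary doesn't give their algorithm); it only counts which word pairs share an article and tallies one pair per year, which is roughly the raw signal a real topic model would start from. The toy corpus `ARTICLES` and both function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: (year, text) pairs standing in for archive articles.
ARTICLES = [
    (2000, "election vote ballot recount florida"),
    (2000, "vote recount ballot court"),
    (2001, "stock market shares investors"),
    (2002, "market investors shares earnings"),
]


def cooccurrence_counts(articles):
    """Count how often each unordered word pair appears in the same article."""
    counts = Counter()
    for _year, text in articles:
        # Sort the distinct words so each pair has one canonical ordering.
        words = sorted(set(text.split()))
        counts.update(combinations(words, 2))
    return counts


def pair_counts_by_year(articles, pair):
    """Count, per year, the articles that contain both words of the pair."""
    by_year = Counter()
    for year, text in articles:
        words = set(text.split())
        if pair[0] in words and pair[1] in words:
            by_year[year] += 1
    return dict(by_year)


counts = cooccurrence_counts(ARTICLES)
```

Pairs with high counts (say `("ballot", "vote")`) hint at a shared theme, and `pair_counts_by_year` gives a crude version of the "map of topics over time". A real system would use a proper statistical model instead of raw pair counts.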