Tracking the Congressional Attention Span
Turismo writes "Ars Technica covers a new research project that uses computers to look at 70 million words from the Congressional Record. The project's goal was to track what our representatives were talking about at any given time, and researchers were able to do it without human training or intervention. From the article: '...researchers found, for instance, that "judicial nominations" have consumed steadily more Congressional attention between 1997 and 2004. In fact, the topic produced the most number of words published in a single "day" of the Congressional Record: 230,000 on November 12, 2003.' It looks like automated topic analysis has truly arrived."
Re:Pro-Gress vs Con-Gress (Score:5, Interesting)
Correlate the two and you'd really have something.
No, not that. What I meant was who outside of Congress is trying to push buttons, and who inside Congress is helping them. Also, you'd be able to watch for what you may consider important topics to see how they are dealt with.
Process Process Process (Score:5, Interesting)
Garbage in, Garbage Out. (Score:2, Interesting)
http://www.townhall.com/columnists/JohnStossel/20
Congress Zeitgeist (Score:3, Interesting)
So in web2.0 terms, this is Google Zeitgeist meets the Statistically Improbable Phrase analysis like you see on Amazon. Find pairs or sets of words which are out of the statistical norm for English, then start to track their rise and fall among the "marketplace of ideas" in Congress. Also, on the c|net news site, they have two graph views to visualize connections between similar-topic stories or often-viewed "hot" stories.
It would be interesting to see how many phrases are just a matter of the odd language that Congress uses. There's a stock metaphorical phrase for just about anything, and there are also a lot of phrases that are steeped in tradition which often get misunderstood by layfolk.
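A rough Python sketch of the improbable-phrase idea above, for anyone who wants to play with it. This is not the researchers' method or Amazon's: it just ranks word pairs by how over-represented they are in one text compared with a background text, using add-one smoothing. The function names and the two sample strings are made up for illustration.

```python
from collections import Counter
import math


def bigrams(text):
    """Split text on whitespace and return adjacent word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def improbable_phrases(target_text, background_text, top_n=3):
    """Rank target bigrams by log-ratio of their frequency vs. the background."""
    target = Counter(bigrams(target_text))
    background = Counter(bigrams(background_text))
    t_total = sum(target.values())
    b_total = sum(background.values())
    vocab = len(set(target) | set(background))
    scores = {}
    for phrase, count in target.items():
        # Add-one smoothing so phrases unseen in the background
        # do not cause a division by zero.
        p_target = (count + 1) / (t_total + vocab)
        p_background = (background[phrase] + 1) / (b_total + vocab)
        scores[phrase] = math.log(p_target / p_background)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical sample text, not taken from the Congressional Record.
floor_speech = "judicial nominations delay judicial nominations vote on the floor"
everyday_text = "the vote on the floor was on the record"
top = improbable_phrases(floor_speech, everyday_text)
```

Running the same ranking on each day's text and charting the scores would give the "rise and fall" view. Swapping the background text for older Congressional speech instead of ordinary English would help filter out the phrases that are just Congress's usual odd language.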
Re:Pro-Gress vs Con-Gress (Score:4, Interesting)
Just 'cause I was mildly interested (I've heard that wordplay before), I read the dictionary's entries for progress [reference.com], congress [reference.com] and con [reference.com].
And it appears con (when used in pros/cons of a decision) is different to con/com (the prefix).
The -gress suffix is from Indo-European ghredh (to go), and pro and con have root meanings of advance/forward and to meet, respectively.
Progress = Forward Go.
Congress = Meet Go.
See also: Clustering senators by votes & topic (Score:5, Interesting)
It uses not only word data (from the text of 16 years' worth of bills voted on in the U.S. Senate), but also the senators' voting records.
For example, you can see that Sen. Chafee (R-RI) (who was mentioned on this morning's NPR as a "liberal Republican") actually does fall into a cluster of Democrats, not fellow Republicans. When topics are discovered automatically from word data alone (without the votes, as the wustl.edu paper above does), the topics on this Senate data are reasonably coherent, but the topics created by this new "Group-Topic" model are even more interesting because their discovery is driven by the need to predict the votes as well as the words. For example, "Social Security" doesn't appear in the old model, but pops out clearly in the new model because it has such a distinct voting pattern.
Some of the other results are also pretty interesting: on Education and Domestic policy the Republicans are more split than the Democrats (forming 3 groups to the Democrats' 1). On other topics, the split is the other way around.
Using the same technique, there is also an analysis of 60 years' worth of voting records from the U.N. On the topic of "human rights", Nicaragua, Papua, Rwanda, Swaziland and Fiji all get clustered together. Ouch!
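To give a feel for the vote-driven half of this, here is a much simpler stand-in in Python. It is not the Group-Topic model, which learns the groups and topics jointly: it just links any two members whose votes agree often enough and reports the connected groups. The member labels, the vote rows, and the 0.75 threshold are all invented for illustration.

```python
from itertools import combinations


def agreement(votes_a, votes_b):
    """Fraction of roll calls on which two members voted the same way."""
    same = sum(a == b for a, b in zip(votes_a, votes_b))
    return same / len(votes_a)


def cluster_by_votes(records, threshold=0.75):
    """Group members linked by pairwise agreement at or above the threshold."""
    parent = {name: name for name in records}

    def find(name):
        # Walk up to the representative of this member's group.
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(records, 2):
        if agreement(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for name in records:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up roll-call data: 1 = yea, 0 = nay. Not real voting records.
toy_records = {
    "dem_1": [1, 1, 0, 1],
    "dem_2": [1, 1, 0, 1],
    "rep_1": [0, 0, 1, 0],
    "rep_2": [0, 0, 1, 1],
    "rep_liberal": [1, 1, 0, 0],
}
groups = cluster_by_votes(toy_records)
```

With this toy data the member labeled rep_liberal lands in the same group as the two dem_ members, which is the kind of cross-party placement described above. Running it separately on the votes for each topic would show the groups shifting from topic to topic.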
Congressional Record *IS* false (Score:3, Interesting)