Web Log 'Word Bursts' Could Identify New Crazes
Zorgatron writes "New Scientist reports that a researcher from Cornell University has come up with a clever method of identifying what's cool by automatically searching weblogs. Sudden increases or "bursts" in the usage of particular words may reflect a new craze, according to Jon Kleinberg. He has demonstrated the technique by searching through State of the Union addresses given since 1790." I wonder how long it will be before this can be done in close enough to real time to make it really useful.
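To make the "word burst" idea concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the method from Kleinberg's paper: it simply compares how often each word appears in a recent batch of posts against an older baseline batch, with add-one smoothing so unseen words do not divide by zero. The names `tokenize`, `burst_scores` and `min_count` are made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def burst_scores(baseline_docs, recent_docs, min_count=2):
    """Rank words by (smoothed recent frequency) / (smoothed baseline frequency).

    This is a naive ratio heuristic chosen for illustration only.
    """
    base = Counter(w for doc in baseline_docs for w in tokenize(doc))
    recent = Counter(w for doc in recent_docs for w in tokenize(doc))
    base_total = sum(base.values())
    recent_total = sum(recent.values())
    vocab = len(set(base) | set(recent)) or 1

    scores = {}
    for word, count in recent.items():
        if count < min_count:
            continue  # ignore words seen too rarely to call a burst
        recent_rate = (count + 1) / (recent_total + vocab)
        base_rate = (base[word] + 1) / (base_total + vocab)
        scores[word] = recent_rate / base_rate
    # Highest ratio first: these are the candidate "bursting" words.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    old_posts = ["the cat sat on the mat", "the dog sat on the log"]
    new_posts = ["the zorp craze", "zorp zorp the mat"]
    for word, score in burst_scores(old_posts, new_posts):
        print(word, round(score, 2))
```

A score well above 1 means the word is more frequent now than in the baseline; a real system would also need to account for the volume of text and for how long the increase lasts.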
Google? (Score:5, Insightful)
They have the capacity to do this; I don't see why they wouldn't.
Re:Google? (Score:4, Insightful)
As for how long it will be before we can do this in "real time", this all depends on what your definition of "real time" is. If you're happy with doing a few thousand blogs and getting results back in a few minutes, since at most only a few pages change on the average blog a day, I'd say any decent Perl guy could do that for you now.
Re:Google? (Score:3, Informative)
Blogdex (Score:5, Informative)
Nukular weapons (Score:5, Funny)
Has an important increase of the use of the word "nukular" been reported in the last few weeks then?
Re:Nukular weapons (Score:4, Insightful)
Does anyone else find this painfully obvious ? Certainly you wouldn't expect to hear the word "computer" much in FDR's state of the union addresses; just as you wouldn't expect to hear "icebox" in GWB's addresses.
The idea isn't as revolutionary as the author makes it out to be. People have been searching for terms in literature and using counts as indices of "importance" for a long time. Just to cite one example, researchers commonly use citation indexes to find out which fields are/were "hot".
But just think! (Score:3, Insightful)
This is a significant tool for the post-information age. It could reliably gauge the effectiveness of viral marketing. It could also intercept sub-culture developments before they become popular, and introduce them to the general population in association with a corporate brand.
Imagine if Nike or Pepsi, or *shudder* Microsoft, had caught the "All Your Base" thing on the upswing. They'd have a better slogan than the top down "Dude, you're gettin a Dell".
Google (Score:4, Informative)
Re:Google (Score:5, Informative)
And they do that much already ... on their Zeitgeist page: http://google.com/zeitgeist [google.com]
But this is different. The article is about monitoring the blogs, not the searches. As suggested in another comment, this may be related to Google's acquisition of Blogger.
Re:Google (Score:2)
Of course they could do something similar with weblogs.
Cheers!
Costyn.
Re:Google (Score:2)
Re:Google (Score:3, Insightful)
I'm eager to see what will come up next with Google's recent entry in weblog world.
It's just what I thought when someone said " Blogs are like dreams; they're only interesting to the people they belong to".
Re:Google (Score:4, Insightful)
In a way, it should even track how languages evolve and how new meanings are given to existing words (e.g., in the past, would anyone have thought that "defensive" and "attack" were not opposite words?).
I wonder whether this kind of analysis can be affected by people like me, who write in English without a proper knowledge of it.
Great.... (Score:5, Funny)
God Help Us (Score:4, Funny)
And CowboyNeal is the most popular man alive!
Re:God Help Us (Score:2)
Re:God Help Us (Score:3, Funny)
asshat asshat asshat
asshat asshat asshat
asshat asshat asshat
just doing my part...
Re:Great.... (Score:3, Funny)
"Dude, imagine if you had a Beowulf cluster of these things!"
Re:Great.... (Score:2)
No, that would be misleading advertising, because obviously:
In Soviet Russia, Pepsi drinks YOU!
Pre-emptive strikes? (Score:2)
Perhaps this emerging trend early warning system could be used to prevent such tragedies as the chronic overuse of the word "uber."
The first time I remember seeing "uber" being used was in the days when Microsoft's plan for world domination was described as "Windows uber alles." Since then, it's snowballed and these days, the word has been so overused it's simply become an annoying cliche.
If only we'd had an early warning system back then, we might have been able to prevent the uber-ification of Slashdot.
Re:Great.... (Score:2)
"In Soviet Russia. .
Which is a refreshing change, but often not as funny as:
(text of fortune cookie) ". . . in bed."
Conspiracy (Score:3, Funny)
"Okay, everyone write about polka dot socks tomorrow. And throw in something about drinking rotten milk. I bet we can start a new fad..."
"What's cool"? (Score:5, Insightful)
Re:"What's cool"? (Score:5, Insightful)
Re:"What's cool"? (Score:3, Insightful)
Re:"What's cool"? (Score:4, Insightful)
A Report on the Creators & Marketers of Popular Culture for Teenagers
Yeah, that's right. Popular Culture is manufactured -- everything the teenies think is "cool" or "hot" is identified months in advance by a highly sophisticated machine that probes the minds of kids to predict what will be the next trend so that the marketing establishment can gear up to take advantage of the short window where the "thing" is "cool" and can be sold to teens in such a way that they don't even realize what is going on.
Applications (Score:2, Interesting)
These techniques could easily be expanded to searching weblogs - I imagine the findings could be very interesting for content providers - eg a simple measure of what people want to read about.
Apache Logs too (Score:4, Interesting)
I, myself, am a distant third.
Write about enough things and then check your referral logs for Google and Yahoo searches (which include the query in the URL), and you get an imperfect idea of what people are interested in this week.
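A rough Python sketch of the referral-log idea follows. It assumes you have already pulled the referrer URLs out of your server log, and that Google puts the query in a `q` parameter and Yahoo in a `p` parameter; the names `QUERY_PARAMS`, `search_terms` and `top_terms` are invented for this example.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumed mapping from search engine host fragment to its query parameter.
QUERY_PARAMS = {"google": "q", "yahoo": "p"}


def search_terms(referrer):
    """Return the search words found in one referrer URL, or [] if none."""
    parsed = urlparse(referrer)
    host = parsed.netloc.lower()
    for engine, param in QUERY_PARAMS.items():
        if engine in host:
            values = parse_qs(parsed.query).get(param, [])
            return values[0].lower().split() if values else []
    return []  # not a search engine we recognise


def top_terms(referrers, n=10):
    """Count search words across all referrers and return the n most common."""
    counts = Counter()
    for ref in referrers:
        counts.update(search_terms(ref))
    return counts.most_common(n)


if __name__ == "__main__":
    refs = [
        "http://www.google.com/search?q=potato+gun&hl=en",
        "http://search.yahoo.com/search?p=potato+cannon",
        "http://example.com/page?q=ignored",
    ]
    print(top_terms(refs, 3))
```

As the comment says, this only gives an imperfect picture: it counts what people searched for to reach pages you already wrote.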
Useful? (Score:2)
Define "useful."
Re:Useful? (Score:2)
Someone once wrote (I'm really sorry, I forgot who - public thank you to the person who knows) something like "individually, nobody knows what's going on but collectively, we know exactly what's going on." This kind of meta-information is a social scientist's wet dream, I bet. I admit I'm fascinated. It's very...William Gibson? Was he the guy who wrote that line? Damn it.
Useful? (Score:4, Insightful)
Yes, I bet the spammers can't wait until they can use it...
Imagine (Score:3, Insightful)
Feedback loop and dotcom crash (Score:4, Interesting)
The analysis only works if your tool doesn't start modifying the data you are analyzing. If this thing ever caught on, it would quickly become meaningless, because everybody wants to be part of whatever craze is going on. Every morning you check which words are hip, you put them on your website... etc. etc.
You are right about feedback: the buzz would become a terrible din. That said, it is a cool idea.
Re:Imagine (Score:2, Funny)
pathetic (Score:2)
So is this guy like Screech in Saved by the Bell, constantly looking for a way to impress Zack and the guys?
Useful for... (Score:1, Funny)
Seriously, just read /. if you want to know the important stuff of the day. :)
Re:Useful for... (Score:5, Funny)
Twice usually.
Re:Useful for... (Score:2, Funny)
Except now popularity will last about 6 hours, tops, before some new wave of pop culture replaces it. By the time craze "X" hits the craze detector, all the really cool people will already be onto craze "Y", which will be detected a few hours later.
It's like the whole "avant-garde/in-style/out-of-style/retro/back-in-style" cycle managed by a Perl script in an infinite loop.
Microsoft! (Score:1)
Daypop (Score:5, Informative)
It's got the top 40 every day. Doing it some other way would only catch memes sooner. And if the system doesn't catch it until it's popular, it really doesn't help. What we need is a large and complete database of all meme type things.
Oh, great (Score:4, Funny)
Not In Our Brand Name, say I.
Re:Oh, great (Score:2)
New! From Mattel!
Let the webloggers determine what's cool? Heh. (Score:4, Insightful)
Furthermore, this thing isn't telling me anything I don't know. So it finds the word "Vietnam" during the Vietnam years. Hooray. I bet it finds the word Iraq today, or the phrase "Bin Ladin" last year.
Whoopdie-do. I'm impressed.
Re:Let the webloggers determine what's cool? Heh. (Score:5, Insightful)
That's the whole point. Weblogs are not the mainstream media, so he is betting that a new craze (or refresh of an old one) will show up there before the mainstream sites get hold of it. Face it, once it has hit CNN it is already past its sell-by date.
Take the whole potato gun thing for instance. If this was appearing on people's weblogs 6 months ago and an underground following had started, then it would pick this up. Could be a perfect time for one of the toy companies to start producing a parent friendly version (Not sure how...but hey!). By the time the craze hits CNN, Toys 'R Us is stocked with a version that fires water balloons, only uses compressed air and comes in 10 different plastic colours. Then they would have the advantage before the other companies jump on the bandwagon.
Of course, since there is only a very specific socioeconomic subset of the world population weblogging, what real usefulness does this give us?
A lot! Let me see, I have a large group of people who are rich, computer owning, and probably middle
To paraphrase Stevenson (Score:3, Insightful)
Re:Let the webloggers determine what's cool? Heh. (Score:2)
Viral marketing is something marketing officers have been trying to tap into forever, and this might help them determine how that sort of thing works; how information and trends are passed between people.
Also, maybe they could pinpoint characteristics of market leaders, i.e. those who talk about major trends before those trends get major.
It's useful *now* (Score:3, Insightful)
Why wait until it's real-time? Historical analysis is very useful, and not just to historians: linguists, anthropologists, social scientists, and others can use it too. Studying such a body of texts is called studying a "corpus," and such studies often yield surprising and interesting results (better than "atomic" showing up in the cold war). A new method like this would be very useful to nearly every discipline in the humanities I can think of.
Not all geeks are computer geeks. Not all nerds care only about the future.
The state of the World from google (Score:2, Interesting)
Re:The state of the World from google (Score:2)
This study is to find the things that the "fashionable" people are talking about BEFORE they get big.
No Kidding? (Score:2, Insightful)
MS is bringing out 3 Degrees which is reinventing IRC, this guy is telling us the painfully obvious, and I've been working on this little trick that's gonna really change the way we think of food. Get this, guys: I take two pieces of bread, a piece of cheese, and a piece of meat and stack it together. I call this wonderful new life shaping discovery "The meat-and-cheese-on-bread". I really think it's gonna change how we eat!
Re:No Kidding? (Score:2)
But, if memory serves me correctly, this war was actually called "the big war" or "the war to end all wars" by its participants. It was only years later, when WW II erupted that they renamed the earlier conflict to WW I.
Over 100 years hindsight is 20/20. I think the goal of this technology is to provide hindsight over a span of days/weeks/months.
A new apache module...? (Score:2, Interesting)
The module would count the words/phrases most commonly served (less tags and the top-n most common words in the language-encoding), then serve out the top 10 as HTTP header messages. That way, the results are unobtrusive and easy to recover.
Of course, this approach would inevitably be easy to skew/cheat. Anyway, that's my sixpenn'orth.
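The counting step described above could look something like the following Python sketch. It is not an Apache module: it only shows the logic (strip tags, drop very common words, tally, emit the top 10 as one header). The header name `X-Top-Words`, the tiny `STOP_WORDS` set and the `WordTally` class are all hypothetical, and the regex tag stripper is deliberately crude.

```python
import re
from collections import Counter

# A tiny stand-in for "the top-n most common words in the language".
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def strip_tags(html):
    """Crudely remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", html)


class WordTally:
    """Running count of the words served so far."""

    def __init__(self):
        self.counts = Counter()

    def record(self, html):
        """Add the words of one served page to the tally."""
        words = re.findall(r"[a-z]+", strip_tags(html).lower())
        self.counts.update(w for w in words if w not in STOP_WORDS)

    def header(self, n=10):
        """Return a (name, value) pair listing the n most served words."""
        top = [word for word, _ in self.counts.most_common(n)]
        return ("X-Top-Words", ", ".join(top))


if __name__ == "__main__":
    tally = WordTally()
    tally.record("<p>The potato gun is a potato cannon</p>")
    tally.record("<b>potato</b> gun")
    print(tally.header(2))
```

Because the tally is built purely from what the site itself serves, anyone who controls the content can inflate a word, which is the skewing problem the comment already points out.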
And in other news... (Score:4, Funny)
so much for that theory (Score:3, Funny)
I guess this pretty much lays to rest the article about how nerds don't work to be popular. We automate it!
Relegence (~eNow) already does this in realtime (Score:2, Informative)
http://www.relegence.com
Zeitgeist and Memes (Score:3, Informative)
Sounds like a combination of Google's Zeitgeist [http] and LiveJournal's MemeTracker [livejournal.com]. In other words, nothing that new.
It's also the basis for Computational Lexicography. Doing analysis on large corpora. One of the interests people have in this field is introduction of new words in society. The field used to use corpora such as the British National Corpus [ox.ac.uk], but since the explosion of the Web, sites such as Google can far exceed that size. Weblogs are simply a good example of a more natural form of language. The interesting thing would be not so much to find new trends through words... but if we can truly solve the whole natural language parsing problem and use such information to extract higher-level knowledge
The Inevitable Result... (Score:2)
New article title (Score:2, Insightful)
This is great for customer support! (Score:2, Insightful)
I'm sure customer support employees are going to love this idea... This way you can keep up an appearance of actually having read the customer emails, while really just redirecting to
Interesting use in science research (Score:2, Interesting)
Geeks finally figure out what is cool? (Score:3, Funny)
All your base are belong to us (Score:2)
Examples (Score:2)
The words 'militia', 'British' and 'savages' were used a lot around the time the American 'militia' tended to fight the 'British' and what they called the 'savages'.
The word 'depression' was used a lot during the 'depression'.
The word 'atomic' was used a lot during the cold war, and 'Vietnam' was used a lot during the Vietnam war.
I am utterly at a loss as to how such a seemingly interesting field as tracking word usage (well, it interests me) could possibly yield such stupefyingly, numbingly, almost frighteningly obvious and dull results.
I can only assume the true significance of Dr. Kleinberg's results was simply too terrifying to be revealed...
Individual words? (Score:2)
With common words, the language, or the way society expresses itself, could change in a way that simple word counting would not show, or at least not show clearly.
strange... (Score:3, Interesting)
dont they... (Score:2)
Amazon.com (Score:2, Interesting)
Prior Art -- The Economist's "Recession Index" (Score:3, Interesting)
eg:
Dec 10, 1998 [economist.com]
Nov 21, 2002 [colorado.edu]
whats george michel (Score:2)
They both get sucked off in blogs.
Stamp consumer on my forehead... (Score:4, Interesting)
"For example, identifying word bursts in the hundreds of thousands of personal diaries now on the web could help advertisers quickly spot an emerging craze."
Gonfonit!!! Why does cool new social technology have to be related to ways to help people sell things to Americans! Why is it okay for us to be considered a nation of consumers, otherwise basically useless biological skinsacks?!
I'll just strap my wallet to my chest with duct tape now and write my social security number in huge numbers on the back of my t-shirt for fast credit checks.
Re:Stamp consumer on my forehead... (Score:2)
"I was brainwashed into becoming a mindless consumer and all I got was this high quality Beefy-T (tm) shirt"
The product is you!
I'll just strap my wallet to my chest... (Score:2)
So that's where all the duct tape went.
Man, it's hard to finish a good homebrew anything without duct tape.
Re:Stamp consumer on my forehead... (Score:2)
I'll just strap my wallet to my chest with duct tape now and write my social security number in huge numbers on the back of my t-shirt for fast credit checks.
No need to go through all that trouble, just sign up for Micro$oft Passport.
Great (Score:2)
Like harvesting the info about some (rand(10)+15) year old person writing bullshit about (boyfriends|girlfriends|music|movies|stupid online quizzes|webrings) with a site design that's usually so horrible they could be successfully sued for crimes against humanity. All the marketing companies would encounter is the hype they created a few days before the harvest. So it might work to check if hypes/trends work out, but looking at "blogs" (the very word disgusts me) for something new and innovative is about as futile as trying to comprehend Bush's ramblings. The few remaining web logs or journals, as I prefer to call them without retching, are mainly technical. What trends are they going to squeeze out of the journal of a team of developers who want to keep the outside world up to date about what has happened lately? That such and so compiler sucks? That the network admin is a bitch? That the coffee tastes like sewage waste?
Heck, if any of those marketing companies are GOOD, they'll MAKE their own trends, not ride around on the success of others.
Paper is here (Score:2, Informative)
Data from state of the union addresses here. [cornell.edu]
That's too easy.. (Score:2)
Web Log 'Word Bursts' Could Identify New
and thought it must just go through blogs looking for long rambling outbursts about black helicopters, FBI, greys and aluminium beanies. Blimey, that's half the bloggers out there - you don't need a program to identify the crazies!
Isn't that Google News... (Score:2)
Well, that idea was my entry attempt for the Google programming contest [google.com], inspired by the Google Zeitgeist [google.com], which I personally found too infrequently updated (not to say static).
But finally they've put exactly that system to use in Google News [google.com]. Keywords that suddenly appear in many news sources get sent to the top of the front page. That's where I learnt of Columbia, a few minutes after it happened, and the first headlines didn't make sense at first.
So as usual, just search Google!
Well.. (Score:2)
One thing I've been noticing recently is `N.B.' I don't really know what it means, but people use it to insert extra comments when writing or updating something.
How Long? (Score:2)
About 3 weeks after the patent expires.
* Note: I don't actually know if the guy patented the idea, this is a joke.
Useful? (Score:2)
Economist (Score:2)
It looks for occurrences of the word "recession" in major newspapers, and it's a pretty good predictor (better than most economists).
Unfortunately, a lot of the related articles are subscribed content.
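For the curious, a keyword index of that kind is easy to sketch in Python. This is only a guess at the general shape, not how The Economist actually computes theirs: it counts, per period, how many articles mention the keyword at all. The function name `keyword_index` and the `(period, text)` input format are assumptions made for this example.

```python
from collections import Counter


def keyword_index(articles, keyword="recession"):
    """Count, per period, the articles whose text mentions the keyword.

    articles: iterable of (period, text) pairs, e.g. ("2001-Q1", "...").
    Matching is a plain case-insensitive substring test.
    """
    index = Counter()
    for period, text in articles:
        if keyword in text.lower():
            index[period] += 1
    return dict(index)


if __name__ == "__main__":
    sample = [
        ("2001-Q1", "Fears of recession grow"),
        ("2001-Q1", "Recession looms, say analysts"),
        ("2001-Q2", "Markets rally"),
    ]
    print(keyword_index(sample))
```

Periods with no matching articles are simply absent from the result, so a plot of the index would need to fill those in as zero.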
File this under the "no shit" drawer. (Score:2)
The death of Fads (Score:2)
The requirement to exploit emerging creative difference will change those fads to something else.
Let's figure out what it will be and sell it!
FBI has been doing this for years now... (Score:2)
As you can see, even with advanced technology and a huge corpus of email and search requests, coolness is all about the mirrored shades and gold embroidery.
British Savages! (Score:2)
Buzzword bingo (Score:2)
culture jamming (Score:3, Funny)
What a great opportunity for culture jamming! We just need a few thousand webloggers to start using weird words designed to repel "normal" people.
Obviously this could backfire and we could actually start a real trend. So, I propose that the first words we need to put out are ( geek || nerd ) && sexy. (And if you understood that, you must be hot stuff.) I'm willing to take this risk if you are.
Because being Cool (Score:2)
--Blair
"AAAAaaaaaaaayyyyyyyyyyyyyy..."
I Already Have This Software (Score:2)
The first and last day this runs (Score:2)
theonion.com something something FUNNY!!!!!!
Which Garbage Pail Jr Kid are U?! TRY IT!! I'M SMELLY MCGEE!!!!
I'm gonna move out at 18!!! MOM SUCKS!!!!
New Homestar Runner this week. SO COOL!!!!
Dog bites man
IE update out!! AWESOME PRIVACY CONTROL DUDEZ
My mom totally hates my hair!!! SUCH A BITCH!!!
Wow you can block pop-unders!! KEWL ITS FREE!!
Flash RULEZ!! This movie is wack ya'all!!!
Buffy something something!!! !!!!!!
Man bites dog back!! SUPER IRONY!!
etc
Blogolalia (Score:2)
Let's see it in action (Score:2)
where is an example? (Score:2)
OK, show us! Why all the talk and no examples?
If these simple algorithms exist, why doesn't the article give us a site that actually uses these algorithms, so we can see what's popular today for ourselves?
Re:Word Bursts? (Score:2)
When I saw the term, I thought of Tourette's syndrome. Consider it: wouldn't it be nice if presidents did suffer from Tourette's "word bursts" in their State of the Union addresses? They'd be the only words they said that weren't written on the teleprompter by paid lackeys -- and we'd know more about the state of the union, no question.
Re:Hopefully they don't read slashdot for this (Score:5, Insightful)
Which begs the observation: once people know the rules that determine what a "word burst" is and when it's happening, then tools will be developed to artificially inflate desired word bursts.
Create a few hundred shill accounts across thousands of blogs, then have each account on each blog make a couple of posts with the pre-determined phrase, and you have a manufactured word burst.
Like a few years ago, when people sold the ability to seed search engines so your site is in the top of the results list based on certain keywords.
Google makes that harder now, but it's always a contest between those who develop the rules (or algorithm) and those who seek to manipulate the data or the rules of the game.
A manufactured word burst I can remember from before the 2000 election was 'gravitas'. That word came out of nowhere, and was suddenly all over the media, used to describe a quality that Dubya was lacking. There was a talking points memo somewhere that was very widely distributed -- which is the analog version of what I am describing.
Look it up.
Reminds me of Asimov's Foundation (Score:3, Insightful)
once people know the rules that determine what a "word burst" is and when it's happening, then tools will be developed to artificially inflate desired word bursts
The Three Theorems of Psychohistorical Quantitivity:
1. The population under scrutiny is oblivious to the existence of the science of Psychohistory.
2. The time periods dealt with are in the region of 3 generations.
3. The population must be in the billions (±75 billions) for a statistical probability to have a psychohistorical validity.
Re:Hopefully they don't read slashdot for this (Score:2)
Talking of manufactured words, there are a lot of such words floating around: some are created ones, like Medireview; some are common mistakes, like "anyways"; and some are lingo, like wanna/gonna, that has become part of the language.
It gets particularly interesting after a while to watch the stats. I'd in fact written a primitive paper on such behaviour long ago; it can be found here at my site [metlin.org]. I'd also written an agent based on this; details of the agent can be found here [metlin.org].
Re:Hopefully they don't read slashdot for this (Score:2, Funny)
"Natalie" "Portman" "Soviet" "Russia" "1337" and "Dell"
Re:Art Exhibit (Score:3, Interesting)
The listening post is an art exhibit that more or less lives. It monitors certain chat rooms, and posts messages from those chat rooms to a wall of small LCD displays.
Is that the Great White's newest hit? (Score:2)