Google Researchers Create TV Audio Analysis System

Posted by CowboyNeal on Sat Jun 10, 2006 09:33 AM
from the tuning-in dept.
segphault writes "Ars Technica reports on a paper (PDF) about ambient audio analysis authored by Google researchers. The system described in the paper can effectively determine what television show a user is watching just by capturing a short audio clip. The paper explains how a regular computer microphone can be used to record an audio clip that is then converted into a statistical data summary and transmitted to a remote server which matches the clip against archived data in order to ascertain which TV show it is associated with. Apparently, the system is fully viable, and other kinds of ambient noise don't negatively impact its accuracy. The paper also describes how web services can provide contextually relevant information based on a consumer's television viewing activities."
  • I have just this to say... (Score:2, Funny)

    by Anonymous Coward on Saturday June 10 2006, @09:35AM (#15508969)
    Big Brother is listening to you!
  • This already exists? (Score:4, Interesting)

    by abigsmurf (919188) on Saturday June 10 2006, @09:36AM (#15508974)
    There's a system in the UK where you can go out clubbing, hear a song you like, dial a number and hold the phone out to the music and it'll text you the name of the song. Assuming they don't hire scores of extremely knowledgeable music buffs with quick fingers, surely it's a very similar system. TV dialogue may be less distinctive to the human ear but to a computer it just means a larger amount of data to search through.
    • Re:This already exists? by Nimloth (Score:3) Saturday June 10 2006, @09:50AM
      • Re:This already exists? by MagicM (Score:3) Saturday June 10 2006, @10:11AM
      • Re:This already exists? by sam1am (Score:2) Saturday June 10 2006, @10:27AM
      • Re:This already exists? by B1ackDragon (Score:2) Saturday June 10 2006, @10:46AM
        • Re:This already exists? (Score:4, Informative)

          by asuffield (111848) <asuffield@suffields.me.uk> on Saturday June 10 2006, @01:32PM (#15509925)
          I always wanted to have the ability to "hash" songs, and come up with an algorithm that would be robust enough to work across multiple codecs and encoding options, different (relative) normalizations, and maybe even be able to handle empty space at the beginning and/or end of the song.

          It's been done. Here's a system where you can hum a tune and it tells you the song: http://www.musipedia.org/ [musipedia.org]

          Current systems are mostly based on pitch changes, so they aren't perfect (especially with the recycled slush turned out by low-grade high-visibility pop acts), and largely useless for rap, but they mostly work. There are numerous variations on the system; this is just one of the more significant ones that is publicly available on the web.

          I would think by making a hash based on values relative to sound signatures within the clip this might be possible, but I don't really know how this stuff works

          What Google is doing may or may not be related. They might instead be using a form of speech recognition technology, or a combination of both, or something else entirely.
    • Re:This already exists? by Anonymous Coward (Score:2) Saturday June 10 2006, @10:41AM
    • Its called Shazam by Sanity (Score:2) Saturday June 10 2006, @11:14AM
    • Re:This already exists? by 70Bang (Score:2) Saturday June 10 2006, @03:01PM
    • Re:This already exists? by fuzzix (Score:2) Saturday June 10 2006, @05:21PM
  • by Anonymous Coward on Saturday June 10 2006, @09:36AM (#15508978)
    Is THIS why Google has been returning so many porn sites on my searches lately?
  • Uses & Motives? (Score:5, Insightful)

    This seems like a not too complicated idea. You create an inexpensive operation that extracts what features you want from the sound data. Most importantly, you avoid features that are prone to randomness and entropy. It would take some research to figure out what the best features are and that's the audio fingerprint.

    Since Google has more storage than you can imagine, they can most likely apply this fingerprinting technique to every episode of every major show. Then they host the fingerprints in Google style and use their patented "Google Technology" to search it much the same way web content is searched.

    Why would you want this? Well, there are the obvious marketing ploys. You know that people who watch Dharma & Greg like to shop at Trader Joe's and like Odwalla brand food, so you offer free episodes of Dharma & Greg with only Trader Joe's & Odwalla ads. You let the sponsors (Trader Joe's and Odwalla) foot the bill for the bandwidth/royalties or whatever.

    The second useful implication would be cross suggesting shows to a user based on random sampling of the shows. You could allow users to watch old TV shows on the internet and then build a profile of them and their shows. Much how Amazon works, you could then suggest other shows, other DVDs of shows or perhaps build a site that randomly shows the user episodes that they might like based on prior viewings and statistics of other users.

    The takeaway from this article for me was the fact that Google has a vested interest in archiving and now television will be archived Google style.

    I can't think of many other uses for this as the system isn't really "inferring" or "thinking" about data samples but is more so matching extracted features against a database. You know, voice recognition software allows for decent voice fingerprinting. You could most likely easily identify characters based on voices (but not actors due to stars like Hank Azaria who do multiple voices). Then you wouldn't need a database of all shows but more so just a database of character voice fingerprints. I would find this sort of approach more interesting but less specific and useful.

    Aside from showing this off to your friends, it's not very useful. What I personally would like to see this new Google strategy applied to is all the tapes recorded of famous people like the United States Presidents. If you divided those up into sessions and I was listening to a particular tape of the Nixon set where he talked about the "new right", perhaps a database with references would then point me to some tapes or materials on Joe McCarthy's staunch views on the right.
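For anyone who wants to poke at the fingerprint-and-lookup idea from the top of this comment, here is a minimal sketch in Python. It is not the method from the Google paper: the feature (whether each frame is louder than the one before it), the `FRAME` and `BITS` values, and every function name are assumptions made up for illustration. It also assumes the captured clip starts on a frame boundary, which a real microphone capture would not.

```python
import random
from collections import Counter, defaultdict

FRAME = 50  # samples per frame (hypothetical value)
BITS = 8    # consecutive trend bits packed into one lookup key (hypothetical)

def energy_bits(samples):
    """Cheap feature: 1 where a frame is louder than the previous frame, else 0."""
    energies = [sum(x * x for x in samples[i:i + FRAME])
                for i in range(0, len(samples) - FRAME + 1, FRAME)]
    return [int(b > a) for a, b in zip(energies, energies[1:])]

def keys(samples):
    """Yield (position, key) pairs; each key is BITS consecutive trend bits."""
    bits = energy_bits(samples)
    for i in range(len(bits) - BITS + 1):
        yield i, tuple(bits[i:i + BITS])

def build_index(shows):
    """Inverted index: key -> list of (show name, position in that show)."""
    index = defaultdict(list)
    for name, samples in shows.items():
        for pos, key in keys(samples):
            index[key].append((name, pos))
    return index

def lookup(index, clip):
    """Vote for (show, time offset) pairs that agree with the clip's keys."""
    votes = Counter()
    for pos, key in keys(clip):
        for name, show_pos in index.get(key, []):
            votes[(name, show_pos - pos)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name

def fake_show(rng, frames=300):
    """Synthetic stand-in for a soundtrack: noise with a random loudness per frame."""
    samples = []
    for _ in range(frames):
        loudness = rng.uniform(0.2, 1.0)
        samples.extend(loudness * rng.uniform(-1, 1) for _ in range(FRAME))
    return samples

rng = random.Random(1)
shows = {"show_a": fake_show(rng), "show_b": fake_show(rng)}
index = build_index(shows)

# Simulated capture: 60 frames of show_b plus a little room noise.
start = 100 * FRAME
clip = [s + rng.gauss(0, 0.02) for s in shows["show_b"][start:start + 60 * FRAME]]
print(lookup(index, clip))
```

Voting on (show, offset) pairs rather than on raw key hits is what keeps the short 8-bit keys usable: individual keys collide all the time, but only the true alignment collects hits at one consistent offset.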
  • Subpoena (Score:3, Insightful)

    by wombatmobile (623057) on Saturday June 10 2006, @09:40AM (#15508994)

    Designed to maximize user privacy while minimizing dependency on unique hardware, the system described in the paper seems interesting and feasible. In order to protect user privacy, the software uses "summary statistics" automatically generated from ambient audio rather than transmitting an actual recording. The actual audio cannot be extrapolated from the summary statistic data, so the system doesn't "overhear" or transmit user conversations.

    Still, if the data reveals what show the person is watching, your President or anyone else who gets to see the data might start treating you differently depending on what you are watching lately.

    • Re:Subpoena by B1ackDragon (Score:2) Saturday June 10 2006, @10:49AM
  • TVDB (Score:2)

    by itsthebin (725864) on Saturday June 10 2006, @09:41AM (#15509000)
    (http://www.soisanook.com/)
    will help to add metadata to all those mpeg4's you have bittorrented or recorded on your DVR
  • So Google researchers find a way to find your TV watching habits with only a simple computer microphone, and in the same paper they describe how they could use the microphone to find more about you for your online profile?

    This seems to be just asking for privacy concerns.
  • by baomike (143457) on Saturday June 10 2006, @09:56AM (#15509061)
    This, I think, assumes US TV. Does it work with classical music or Canadian French TV? Would it work on The National?
  • What... (Score:1)

    What, they haven't patented it yet?

    Oh... I guess that would have to be a dupe^H^H^H^Hseparate story in YRO.
  • What about p0rn?! (Score:1)

    by Ruvim (889012) on Saturday June 10 2006, @10:00AM (#15509075)
    There could be all kinds of films with any number of people, doing all kinds of things, all using the same exact soundtrack! Have they thought of THAT?!
  • by FudRucker (866063) on Saturday June 10 2006, @10:05AM (#15509090)
    i would like to see a graphic equalizer or something similar so i can remove all the loud music (drama effect) in TV shows, i hate it when the music is drowning out the dialog/conversation
  • by beeps (161734) on Saturday June 10 2006, @10:12AM (#15509113)
    shame on google

    remember cuecat? that funky little free barcode reader from radioshack?

    http://yro.slashdot.org/article.pl?sid=01/06/10/0410208 [slashdot.org]

    in one incarnation i believe they included a jack on the device and the end user was supposed to hook up an audio cable from one's tv to cuecat v.2. the computer would do all the heavy lifting, eventually finding a hidden tone that would magically pull up an advertiser's web page.

    it was spam magic that never took off. gee, i wonder why.
  • by Super Dave Osbourne (688888) on Saturday June 10 2006, @10:23AM (#15509151)
    I have not owned a TV for 17 years, so Google subcontracting or farming out to their own sub-conglomerate-company 'researching' the use patterns of TV viewers is in no way going to directly affect me getting spam calls to find out what I'm watching, and then sell me more spam calls or directed advertising based on use. Is Google really that determined to become the next M$, just as invasive and just as annoying? I had hopes they were not.
  • Nielsen (Score:2)

    by sam1am (753369) on Saturday June 10 2006, @10:35AM (#15509183)
    Or you could just use the audio information encoded [norpak.ca] by Nielsen and their portable people meters [arbitron.com].

    Actually - it appears they do the same thing Google's researchers talk about already:
    What happens if no audio code is present in the sample home?

    Nielsen's patented Nielsen Media Monitor Sites (MMS) collect and store a constant stream of unique audio signatures for each broadcast, cable, and satellite signal received, covering all 210 TV markets. This includes all client PBS stations and client cable origination channels.

    If any station's NAVE encoder is inadvertently interrupted, the A/P Meter installed in Nielsen sample homes uses the same patented technology to collect and store passive signatures for all non-encoded programming viewed. These signatures are downloaded each night to Nielsen's operations center. To identify viewing, the passive signatures collected from the A/P meter in the home are matched against the signatures collected by the MMS. This process occurs during the normal overnight data collection.

    The passive signature-matching engine in the A/P Meter system is intended as a fail-safe back-up system, to be used when codes are not present in the signal.
    Reference [nielsenmedia.com]
    • Re:Nielsen by apnielsen (Score:3) Saturday June 10 2006, @12:35PM
  • Privacy Maximization (Score:3, Interesting)

    How about outlawing electronic eavesdropping without written consent? I won't use Macromedia Flash because it turns the microphone on. That's creepy, and all non-free software with a microphone can do the same thing. It would be better if that kind of thing were against the law.

    In the meantime, I avoid non-free software and even have bad thoughts about my cell phone.

  • Similar tech (Score:2)

    by Dan East (318230) on Saturday June 10 2006, @10:41AM (#15509216)
    (http://dexplor.com/)
    This reminds me of a past Slashdot story [slashdot.org].

    Dan East
  • I have tested it (Score:3, Funny)

    by houghi (78078) on Saturday June 10 2006, @10:56AM (#15509287)
    (http://www.houghi.org/)
    and while watching Fox News, I was pointed here: http://tinyurl.com/z9x2y [tinyurl.com]
  • recognizing sound samples (Score:4, Insightful)

    by mstrcat (517519) * on Saturday June 10 2006, @11:07AM (#15509325)
    I don't watch TV much, so I could care less about identifying the TV shows. But what I really would like is an app that would accurately identify mp3 files and apply artist, track #, etc. I've tried a few of the available programs such as Replay Music and their accuracy is horrid. Maybe Google can do it better. Of course the other use I see for this is identifying music in movies and older TV shows. Newer TV shows do a great job of identifying music, but some older shows (season 1 of The Wire) have great music clips that aren't named in the credits.
  • New Concept? (Score:1)

    by E10Reads (732984) on Saturday June 10 2006, @11:37AM (#15509427)
    They've been doing this for years with the microphone in my TV?
  • Posts a screenshot with something like:

    You're watching Girls on Girls on Girls on Girls on Goats, #29.
  • Peoplemeter? (Score:2)

    by DaCool42 (525559) on Saturday June 10 2006, @12:00PM (#15509529)
    (http://slashdot.org/)
    You mean like peoplemeter [www.bbm.ca]? There is an audio version of this for radio as well.
  • by LM741N (258038) on Saturday June 10 2006, @12:20PM (#15509642)
    I'm wondering if a similar system can be used to cut out commercials from TV shows? It would be great to have some system that doesn't rely on anything incredibly sophisticated to accomplish this. Perhaps like tripping on commercial-type keywords and then using an IR remote to mute the TV or switch it to an unused video input to blank the screen.
  • by 0WaitState (231806) on Saturday June 10 2006, @12:25PM (#15509674)
    Rather than "improving" the content of the commercials I see, how about using the technology to recognize and mute commercials that I've previously flagged as either really, really annoying (eg. that oil company's talking cars, which is like being stuck on the subway listening to really stupid people talk about stupid things), or way too loud relative to the actual TV show, or simply shown too many times?

    I, for one, would gladly pay $10 extra per month to have a button on my remote that when pressed kills the audio feed for the currently on-screen commercial now and whenever that commercial comes up again. I wouldn't even mind if a message was sent to the advertiser saying "Hey, somebody is actually paying not to hear your crap". Negative feedback can be a good thing.
  • eyes wide shout (Score:3, Interesting)

    other kinds of ambient noise don't negatively impact its accuracy

    This very statement presupposes that other noise is irrelevant, which seems bogus.
    Snoring is background noise, and suggests non-watching.
    Laughter is background noise, and suggests careful watching.
    Of course, the laughter might not be about what's on TV...

    watch [reference.com] v. tr. 1. To look at steadily; observe, carefully or continuously: watch a parade.
    look [reference.com] v. To employ one's sight, especially in a given direction or on a given object:
    --The American Heritage (R) Dictionary

    It seems to me that watching is an activity involving the eyes and mental processing. It seems to me that audio of what is coming out of the TV is not a statement about either the eyes or about mental processing. This technology of Google's may be an advance in something, but I hope the advertisers paying for this data have their eyes open about the nature of what they are buying because (to re-mix a metaphor) to my eyes this sounds a bit suspect.

    Sociologically, it sounds like a foot in the door to get harmless censors in place. Oops, Freudian slip there. That's sensors, I mean. Google would never involve itself with censorship.

    Once the sensors are in place, when "we" realize that it's not getting "us" the data "we" want, we'll just do a few "harmless" downloads of "upgrades", perhaps causing a minor tweak to look at the video data rather than the audio, or perhaps doing language processing after all, and ... With user-friendly software like this, who needs spyware?

    I also question the claim that because no information is transmitted back to Google that this is the definition of not invading privacy. How is this fundamentally different than the claim that if the police search your house but find nothing, they have not invaded your privacy because they've not placed any record of illegal activity on your permanent record?

    It seems to me that once you place a Turing Machine into someone's environment, capable of doing arbitrary processing, and all it sends is a sanitized report, you have all the mechanism in place for abuse. What if the Turing Machine, capable of arbitrary processing, decides that it doesn't want to send a sanitized report. Who is auditing what is sanitized and what is not?

    What if it turns out to later be possible to lift information from the supposedly cleansed records? Who will audit the use of that data?

    There seem to me to be a lot of slippery slopes here.

  • Where is that 'do not listen' button (Score:2, Insightful)

    by magwm (466805) <MagWm&gmX,net> on Saturday June 10 2006, @01:55PM (#15510011)
    (http://www.lofzin.nl/ | Last Journal: Tuesday October 22 2002, @05:25AM)
    I'd hate google desktop (or any other google utility) spying on my mic to discover my musical prefs or anything else. no tv in my home, but what about the speed at which i type or the general noise in my home or how often my phone goes off or how hard or long my baby cries.. do not listen on my mic, please: 'click' . imagine how many things can be recorded and easily recognized in a home. and many a pc/laptop/headset has a builtin mic, useful to skype, which can thus be used. horror.
  • turn your laptop into a visual aid (Score:3, Interesting)

    by po_boy (69692) on Saturday June 10 2006, @02:26PM (#15510107)
    I'd like to implement something like this for myself, but with conversational noise instead of TV. I sometimes use my laptop as a visual aid during conversations in my living room. If we're talking about a particular topic, I may pull up a relevant wikipedia article, or something like that. I wouldn't mind if this were more automated.

    I can envision running a speech-to-text translator on my laptop mic and then piping that text into my beagle desktop searcher, or maybe even one of those google desktop search tools on windows. I'd rather not send this data to google, for privacy reasons, though.

    I could see this being useful at work, or in a conference or class, too. I could stand to have relevant pieces of notes that I took from previous classes pulled up when my professor mentions a particular topic.

    Anyone know of a tool or project like this?
  • by Mr. Freeman (933986) on Saturday June 10 2006, @03:12PM (#15510265)
    Alright, the article says that you need a microphone listening to your TV right? Now, unless Google is sneaking microphones into everyone's homes, the only way you could be spied on is if you agreed to have one placed in your home. If you agree to have one placed in your home, you probably aren't worried about Google spying on you.
    Saying that Google will use this to spy on people is like saying that the NSA will spy on people who email them all of their personal information, daily habits, etc.
  • by Slippy. (42536) on Saturday June 10 2006, @04:33PM (#15510457)
    So to sum this up: I give up my privacy at home. For...better targeted ads?

    I'm very skeptical this wouldn't be abused - if not by Google, then by someone else. And even if this is not abused, I run the risk for what?

    I don't like ads now.

    Everyone who loves the idea of personalized ads, put up your hand!

    ----------

    From the other side, what will your friends think when that "random" ad for viagra pops up?

  • Cool! (Score:1)

    by dasuridai (606603) on Saturday June 10 2006, @09:07PM (#15511221)
    I think this sounds like a great new tool for enhancing television as a learning tool. Shows like NOW [pbs.org] are often telling viewers to log onto the website for additional information. Wouldn't it be nice to simply have the computer pull that info up for you as you watch the program? This could be top down or bottom up, with tv shows creating web pages to be loaded, or a wiki style approach where the viewers themselves suggested links (this would of course be difficult during first broadcasts).

    What's more, from a commercial standpoint, this doesn't have to be directly related to the program at all. Certain inaudible (to the viewer) clips could be inserted into all kinds of programming to trigger a specific function from the computer. Certainly there are privacy ramifications to this, but I think google is doing the right thing by using their creative staff to push the boundary and experiment on projects such as this. YMMV.

  • Nothing new here (Score:1)

    by eyal0 (912653) on Sunday June 11 2006, @03:11AM (#15512038)
    Is this something new? At least 3 years ago I was using www.yes.net to get the name of a song on the radio. They've changed their homepage since, probably to commercialize it more, but I used to input my city, the radio station, and the time of day. Their site would display the song that played and offer a link to purchase the disc.
  • Easy task (Score:2)

    I didn't even RTFA, but from the summary I have an idea on how to implement this, and it's fairly simple, although it's probably not as computationally efficient as what they came up with. No need to be a great engineer; if you have studied digital signal processing for a few months it will be enough.

    So you take that audio clip, and you simply cross-correlate (reverse in the time-domain and convolve) it with your audio database. The highest peak in your results denotes a correlation between the audio clip and a show. The only problem is if the audio clip recorded some blank part in the show. However, with this technique, even if there's quite some noise in the audio clip or even someone talking over it, it's all good.
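The cross-correlation approach above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not what the paper does: it slides the clip along each stored track and takes a dot product at every offset (the same thing as reversing and convolving, just written directly), the division by signal energies is an added normalization so loud tracks do not win by volume alone, and the two "shows" are random stand-in signals. The function names are made up for the example.

```python
import math
import random

def best_match_score(clip, track):
    """Return (peak normalized cross-correlation, offset) of clip within track."""
    n = len(clip)
    clip_energy = math.sqrt(sum(x * x for x in clip))
    best_score, best_offset = 0.0, -1
    for offset in range(len(track) - n + 1):
        window = track[offset:offset + n]
        window_energy = math.sqrt(sum(x * x for x in window))
        if clip_energy == 0.0 or window_energy == 0.0:
            continue  # silent stretch: correlation is undefined, so skip it
        dot = sum(a * b for a, b in zip(clip, window))
        score = dot / (clip_energy * window_energy)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

def identify(clip, database):
    """Return the name of the track with the highest correlation peak."""
    return max(database, key=lambda name: best_match_score(clip, database[name])[0])

rng = random.Random(0)
database = {
    "show_a": [rng.uniform(-1, 1) for _ in range(400)],
    "show_b": [rng.uniform(-1, 1) for _ in range(400)],
}

# Simulated microphone capture: samples 120..199 of show_b plus room noise.
clip = [s + rng.gauss(0, 0.3) for s in database["show_b"][120:200]]
print(identify(clip, database))
print(best_match_score(clip, database["show_b"]))
```

The cost is the catch: every query touches every sample of every stored track, which is why a real system would more likely compare compact fingerprints or use an FFT-based correlation instead of this brute-force loop.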

  • There was a paper at Webmedia 2005 describing a system -- deployed in early 2005 -- used for real-time audio fingerprinting that does the same, AFAIK.

    http://doi.acm.org/10.1145/1114223.1114238 [acm.org]
    This paper describes a scalable real-time audio fingerprinting system developed by IBOPE Midia for radio and TV broadcast monitoring. A special temporal feature extraction strategy based on the Short-Time Fourier Transform has been designed. When given an input stream to analyse, the system matches it against the database and automatically recognizes instances of the previously registered samples within the input stream. The algorithm exploits the temporal evolution of the signal frequency spectrum in order to identify patterns and produce the final classification. The database is clusterized in order to provide an efficient and scalable search strategy. The system has been assessed using a database containing 393 distinct commercials. A 41-hour audio stream from three different TV channels has been analysed in less than 3 hours, attaining a 95.4% recognition rate.
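To make "temporal evolution of the signal frequency spectrum" concrete, here is a rough Python sketch of a Short-Time Fourier Transform fingerprint. It is not the IBOPE Midia algorithm, whose details are not given in the abstract: as an assumption for illustration, the fingerprint is simply the strongest frequency bin in each frame, the frames do not overlap, the DFT is the naive O(N^2) form, and `FRAME`, `dominant_bins`, `similarity` and `tone` are hypothetical names.

```python
import cmath
import math
import random

FRAME = 64  # samples per analysis frame (hypothetical value)

def dominant_bins(samples):
    """Fingerprint: index of the strongest frequency bin in each Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (FRAME - 1)) for n in range(FRAME)]
    fingerprint = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = [samples[start + n] * window[n] for n in range(FRAME)]
        # Naive DFT magnitudes for the non-negative frequency bins.
        magnitudes = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / FRAME)
                    for n in range(FRAME)))
            for k in range(FRAME // 2)
        ]
        fingerprint.append(magnitudes.index(max(magnitudes)))
    return fingerprint

def similarity(a, b):
    """Fraction of frames whose dominant bin agrees."""
    if not a or not b:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def tone(bin_index, length):
    """Sine wave whose frequency falls exactly on the given DFT bin."""
    return [math.sin(2 * math.pi * bin_index * n / FRAME) for n in range(length)]

# Toy "commercial jingle": three tones in sequence, two frames each.
jingle = tone(5, 128) + tone(12, 128) + tone(8, 128)
reference = dominant_bins(jingle)

rng = random.Random(0)
noisy_capture = [s + rng.gauss(0, 0.2) for s in jingle]
print(reference)
print(similarity(reference, dominant_bins(noisy_capture)))
```

A production system would use an FFT, overlapping frames and a far richer feature than one peak per frame, plus the clustered database search the abstract mentions; the point here is only that the sequence of per-frame spectra, not the raw waveform, is what gets compared.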
  • Re:Great... (Score:2)

    by GigsVT (208848) on Saturday June 10 2006, @09:39AM (#15508986)
    (Last Journal: Saturday June 30, @01:22AM)
    Off the top of my head:

    A PVR that doesn't need to rely on blind luck and often incorrect listings to know if it's recording the right thing.

    My Tivo often mischannels to PBS. I'm pretty sure this algorithm should be able to tell Family Guy from the "Boring ass old people talking about politics hour".
    • whereas by way2trivial (Score:3) Saturday June 10 2006, @10:39AM
  • Re:Great... (Score:2, Insightful)

    by Anonymous Coward on Saturday June 10 2006, @09:41AM (#15508997)
    profiling
  • Re:Great... (Score:2, Interesting)

    by Rytis (907427) on Saturday June 10 2006, @09:41AM (#15508999)
    (http://rytis.blogsome.com/)
    For displaying ads. There's some info in TechCrunch [techcrunch.com].
    [...]to listen to the ambient audio in a room, determine what is being watched on TV and offer web-based supplemental information, services and shopping contextual to each program being watched.
    • Re:Great... by Adam Hazzlebank (Score:1) Saturday June 10 2006, @09:49AM
  • Re:Great... (Score:3, Insightful)

    by Anonymous Coward on Saturday June 10 2006, @09:44AM (#15509010)
    Keeping piracy out of Google Video.
  • Re:Great... (Score:2)

    by Janek Kozicki (722688) on Saturday June 10 2006, @09:51AM (#15509041)
    (Last Journal: Tuesday May 10 2005, @03:47PM)
    and this is useful for what exactly?

    if you want to check what tv show it is - it means that you like it, and want to watch later/tell your friends about it. Google can sell data about tv shows popularity to interested parties, that will know where to plant ads (or they could place ads themselves...). It also can be used to determine an ad price for a given show.

    I haven't watched TV for 4 years now, and it feels great. If I accidentally see some of it somewhere I'm shocked at how dumb it is.
  • Re:Great... (Score:2)

    by tibike77 (611880) <tibikegamez@@@yahoo...com> on Saturday June 10 2006, @10:13AM (#15509117)
    (Last Journal: Friday November 10 2006, @06:20AM)
    I would assume its main use would be to display MORE pr0n ads for most people :D
  • Popup Television (Score:3)

    by AlpineR (32307) <wagnerr@umich.edu> on Saturday June 10 2006, @10:18AM (#15509131)
    (http://soayacs.blogspot.com/)

    Do you remember the MTV show Popup Video? They showed older music videos with popup balloons that gave extra information, like actors in the video that later became famous or mistakes made during production. If Google analyzed the sounds coming into your laptop and gave you a link to a site like the Internet Movie Database [imdb.com] then you could have Popup Television. Learn more about the specific episode you are watching, and even have the ability to edit that information yourself.

    It'd make an interesting toy. I'm sure that anyone with some imagination could think of even cooler applications.

    AlpineR

  • by dfung (68701) on Sunday June 11 2006, @12:11AM (#15511694)
    I think this is exactly right.

    If Google really wants to "do no evil", then they need to use this technology to recognize that a commercial just started, and turn the darn volume down! I'm a reasonable guy - they don't need to turn the volume off or skip over the commercial (although they are welcome to do this), just turn it down to the point where the overcompressed signal is not blasting me out of my brain. It's almost impossible to quietly watch any 10PM network TV show without getting blasted by the commercials for the next horror/exploding action movie that's in theaters.
