Automatic Image Tagging 123

bignickel writes "Researchers at Penn State have applied for a patent on software that automatically recognizes objects in photos and tags them accordingly. The 'Automatic Linguistic Indexing of Pictures Real-Time' software (catchy name) trained a database using tens of thousands of images, and new images have 15 tags suggested based on comparisons with objects or concepts in the database. Not sure how you identify a 'concept,' and they're only talking about having one correct tag in the top 15, but still cool."
This discussion has been archived. No new comments can be posted.

  • by goldmeer ( 65554 ) on Thursday November 02, 2006 @09:11PM (#16698175)
    The vast majority of the images on the internets, including The Google, include "Pornography" in their top 15 suggested tags. The accuracy rate is surprisingly high.
    • I don't know how to define pornography, but I know how to tag an image with it when I see it.
  • That Sucks (Score:2, Funny)

    by JerkyBoy ( 455854 ) *
    Researchers at a publicly funded institution are using their research results for personal (financial) gain. Pennsylvania's tax dollars at work? How is this legal?
    • Re: (Score:2, Informative)

      by Nybarius ( 799156 )
      Contrary to what you might believe, there is nothing unethical about making money. The government even gives out grants for entrepreneurs, and lets them keep all the profits; it's good for the economy, overall. The profit motive is a much more powerful incentive to positive social change than the goodness that lies in the hearts of men,

    • by Manchot ( 847225 )
      Just because the institution is partially publicly funded doesn't mean that the research is. As a matter of fact, at many public universities, the big research groups have the "opposite" of public funding. As an example, take the University of Illinois. In theory, they're a public university. In practice, they get 20% of their budget from the state. This means that the big research groups in the College of Engineering, some of whom bring in millions of dollars a year, can end up paying up to 50% in taxes to sub
    • Penn State is a private school (privately chartered by the Commonwealth) despite what the name implies. It also receives less than 5% of its funding from the state.

      from []

      Today Penn State is one of four 'state-related' universities (along with the University of Pittsburgh, Temple University, and Lincoln University), institutions that are not state-owned and -operated but that have the character of public universities and receive substantial state appropriations.


      • I think how well Penn State is compensated by the state can be calculated in lots of different ways. I'm sure Penn State benefits from things like state maintained roads, sewer systems and other general public infrastructure. As an educational institution there is some chance that they don't pay property taxes which means that they don't pay in to the funds that pay for that same infrastructure.

        All that said, I think software patents suck, no matter who is doing the patenting.

        • Agreed. I was merely addressing GPs point about Penn State being a public school.

          Regarding the compensation levels, Penn State receives less than the other 'state-affiliated' private universities in PA (though my info is somewhat dated).

          And I agree with your statement about the software patents, though I think .edu's generally allow the public their free not-for-profit use. Think about FreeBSD's origin at the University of California at Berkeley.

          And EULAs suck, but linux has one!
    • As an alum still paying off out-of-state tuition, I can tell you PSU is NOT a public institution. The University gets next to nothing from the state of PA.
  • by Heir Of The Mess ( 939658 ) on Thursday November 02, 2006 @09:21PM (#16698273)

    I've seen lots of systems like this. The problem is in the 50% of the images that don't work, so basically you have to manually tag 50% of your images.

    I saw an interesting one about 10 years ago. It took an X-ray image, did an edge detection, converted all the edges to a slope vs. distance 2D plot, and converted edge curves to a radius and distance plot, then used a kind of statistical correlation algorithm to pick which part of the body the image was from. I could imagine that you could apply something similar to the luminance of an image to pick out objects, and then maybe do some color transforms and stuff to improve results. The article says they do it in 1.4 seconds per image though, which is impressive.
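
A minimal sketch of the "edge detection on luminance" idea from the comment above. It is not the X-ray system described, whose details are not given. The function names, the threshold value, and the tiny test image are all assumptions made for illustration; the luminance weights are the common Rec. 601 ones.

```python
def luminance(pixel):
    """Approximate brightness of an (r, g, b) pixel using Rec. 601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def edge_map(image, threshold=50.0):
    """Mark pixels whose luminance jumps sharply to the right or lower neighbour.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    `threshold` is an arbitrary, hypothetical cut-off for this sketch.
    """
    grey = [[luminance(p) for p in row] for row in image]
    height, width = len(grey), len(grey[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Forward differences; pixels on the last column/row have no neighbour.
            dx = grey[y][x + 1] - grey[y][x] if x + 1 < width else 0.0
            dy = grey[y + 1][x] - grey[y][x] if y + 1 < height else 0.0
            edges[y][x] = (dx * dx + dy * dy) ** 0.5 > threshold
    return edges


# Made-up 3x4 image: dark on the left half, light on the right half.
dark, light = (10, 10, 10), (240, 240, 240)
image = [[dark, dark, light, light] for _ in range(3)]
for row in edge_map(image):
    print("".join("#" if is_edge else "." for is_edge in row))
```

Turning the resulting edge map into slope or radius plots and correlating them against reference shapes, as the comment describes, would be a separate step that this sketch does not attempt.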

    • Re: (Score:3, Insightful)

      by cloudmaster ( 10662 )
      Since you don't know *which* 50% it'll get right, though, you end up having to look at 100% to determine if the system got it right or not. At that point, it's only saving you a few seconds of typing / picking from a drop-down list. :)
      • by llauren ( 80737 )
        If they did it with any kind of sense, they would not tag pictures they weren't confident enough of. But of course, i didn't read the fine article. This is /. after all :)
      • by pe1chl ( 90186 )
        It is quite common to start building a system like this (image recognition, speech recognition, automatic translation, etc etc) and publish a press release stating that "the initial results are promising".
        That is because the coarse approach to the problem is relatively uncomplicated, and after building some framework and inputting some reference data it is easy to make the system do some things right, like guessing keywords correctly for 50% of the input.

        What is hard is to get it correct for close to 100% of
      • It depends what you're using the image tagging for. If it's just for search, a la Google, then it doesn't matter if you get some false positives (though it had better be way under 50%!). I mean, if you search for something on Google you sometimes come up with totally irrelevant stuff to the query, and people aren't that worried about it.
    • The human body is pretty much the same between people, and XRays are generally shot from similar directions person to person - so the kind of check you are describing seems like it would yield high matches for pretty much any part of the body.

      In the real world we have an object you might take a picture of from any angle, using a myriad of focal lengths, with variable levels of distortion depending on the lens and camera used. Really nasty for generic object recognition. I think the best we can hope for i
      • In the real world we have an object you might take a picture of from any angle, using a myriad of focal lengths, with variable levels of distortion depending on the lens and camera used. Really nasty for generic object recognition.

        Don't forget occlusion!

  • Not RTFA to be honest, but can I claim prior art? mage_Retrieval__A_Words_and_Pictures_Approach.pdf []

    We didn't have enough time to train the system properly, but it started off well...

    • I'm sorry, but that PDF link you posted is not a scientific article; it seems like a requirements specification made by business students.
      The science in your 'prior art' is mostly segmentation, and has very limited validation.

      If you say Kobus Barnard's work has prior art, that is true, because his work is very related to the ALIP system.
      see []
  • by ch-chuck ( 9622 )
    You've got to hand it to those cunning linguists at Penn State.

  • I'm sure a lot of research is being done in this area; in fact, there is lots of interest in implementing this sort of thing in DSP for robot vision. How much of what this patent covers overlaps with what the others are working on? Is this something completely out of left field, or does it fit the trend of where this area of research was headed anyway?
    • by MikeFM ( 12491 )
      I did this as a project a few years ago when I was building something similar to, the yet unheard of, Flickr. Except I didn't limit myself to 15 tags. I just went through thousands of pics and tagged them with keywords I thought of as I did them and used that information to train a system that'd then go through and tag other pics. It wasn't perfect but did work pretty well and I think it could have been awesome. I tried pitching it to Google as a mashup of Google Images and Slashdot with powerful indexing a
  • by damgx ( 132688 )
    Luis von Ahn did something almost the same, though his idea is to use humans as well.

    View the video on Human Computation []
  • For those who missed retrievr... [] While this is good for hours of entertainment, I hope what they're promoting is better.
  • FTFA:
    The analysis takes about 1.4 seconds per image and in 98 per cent of tests suggests at least one correct tag in the top 15.
    I suspect you could generate a list of 15 sufficiently vague words that would cover 98% of all images. Here's a start: people, sport, animal, trees...
    • I "tested" it with an image of Gord Downie holding a microphone in the wash of a blue spotlight. The tags this "groundbreaking" program suggested were:

      building city modern historical architecture people ocean_animal fish sub_sea water space cyber art ocean sport

      Only one "suggested" tag is relevant. Too bad so much time and effort went into such a waste of time and effort.

    • by linj ( 891019 )
      I suspect you could generate a list of 15 sufficiently vague words that would cover 98% of all images. Here's a start: people, sport, animal, trees...

      No kidding... How about "pixels"?
  • at least that's what I told the psychologist. Then on the second look, it looked like splotched ink on a paper that was then folded in half... I hope this software doesn't think like me cause at the end of it all, I saw a segfaulted X server on fvwm
  • by dampjam ( 779525 )
    I currently work for the group doing this - a very cool new feature will be launched in the next week that I am writing (stay tuned). Yes - this project has been done many times before by many people (to lesser degrees of success than this), but the thing to keep in mind is that this is realtime. It takes less than a second for the tags to be generated. All previous systems required a much larger amount of processing time. Check out to try it yourself!
    • by OzPhIsH ( 560038 )
      Just wondering what type of classifier you're using. Is it just an extension of one of the classical classification approaches, or a weird hybrid approach? Is there an actual research/conference paper those of us in the field can read instead of this fluff article? The real time nature of classifications makes me think you're handling the incoming images as stream data. Maybe somehow you can extend it to video, which is after all, nothing more than an image stream...
      • by dampjam ( 779525 )
        I will ask them to upload it to tomorrow... there might be a problem because it's published and now belongs to the journal. It was just presented at the ACM Multimedia Conference at UCSB, if you get those proceedings you can find it.

        Make sure to check back tomorrow to be able to search based on the tags that the computer suggests, people verify, and ones that people enter manually. I just got all the cron jobs working together.
  • How do they get less than a 50% average that you'd get by just guessing?

    (yes, assuming a normal distribution of 'concepts' in the pictures, etc)
    • How do they get less than a 50% average that you'd get by just guessing? (yes, assuming a normal distribution of 'concepts' in the pictures, etc)

      I hate to be a hater. . . but firstly, I doubt any sort of Gaussian or even mixture of Gaussians will work well to describe the distribution of picture labels. And secondly, you get a 50% average by making guesses about an even Bernoulli distribution, like a coin flip.
    • Re: (Score:2, Insightful)

      How do they get less than a 50% average that you'd get by just guessing?

      How do you get that 50% is average on guessing? Their tag pool contains 332 "concepts", which means that randomly picking 15 would give you about 1/22 chance of getting a correct tag for a picture that is tagged with one word. For a two-tag image, you get 1/11. To get up to 50% you'd have to work with images tagged with four or five words. Did I miss something here? Besides, the claim is that "in 98 per cent of tests suggests at
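
The 1/22 and 1/11 figures in the comment above can be checked with a short calculation. It assumes, as the comment does, that 15 distinct tags are drawn uniformly at random from a pool of 332 concepts. The function name and parameters are hypothetical.

```python
from math import comb


def hit_probability(pool_size, guesses, true_tags):
    """Chance that at least one of `guesses` distinct, uniformly random tags
    is among the `true_tags` correct tags of an image (hypergeometric model)."""
    # Probability that every guess lands on an incorrect tag.
    all_miss = comb(pool_size - true_tags, guesses) / comb(pool_size, guesses)
    return 1 - all_miss


for true_tags in (1, 2, 5):
    print(true_tags, round(hit_probability(332, 15, true_tags), 3))
```

Under these assumptions, a one-tag image comes out at 15/332 (about 1/22) and a two-tag image at a little under 1/11, which matches the comment. A five-tag image comes out nearer one in five than one in two, so pure guessing seems to need considerably more than four or five true tags per image to reach 50%.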

  • w00t!!! (Score:3, Funny)

    by rts008 ( 812749 ) on Thursday November 02, 2006 @09:38PM (#16698433) Journal
    Now almost 7% of my pr0n will get tagged correctly!
    That's cool, the rest of it will be like opening xmas presents!

    *file: 123456.jpeg>open>Aghh! Goatse!*

    Hmmm... This may be neat when it gets a LITTLE more accurate, but a cool start nonetheless.
    Kudos to the gang for getting a grip on a hard problem...erm..nevermind.
    • Re: (Score:2, Funny)

      by CCFreak2K ( 930973 )
      You named your penis "problem?"
      • by rts008 ( 812749 )
        A lot of the time it is. Especially its sense of timing.
        • by nsillik ( 791687 )
          I ran their software on the mental image after reading your comment:
          Tags: Disgusting, small, horrifying, triceratops, Grenada

          Three out of five ain't bad.
          • by rts008 ( 812749 )
            "Tags: Disgusting, small, horrifying, triceratops, Grenada"

            Okay, I get the first three, but triceratops and Grenada? (disclaimer: I trained Spl Forces teams to go into Grenada- and yes, it WAS as fscked up as you might have heard!- so I may have a whole different point of view/perspective about Grenada than you may have :-) )

            I'm intrigued, especially about the triceratops, and would appreciate an explanation if you would be so kind.
            (no sarcasm intended or implied- I'm really curious!)
            • by nsillik ( 791687 )
              It was an example of how poor the software (could) be. I'm not really judging the software, but most implementations I've seen give way-off responses like that.
  • Image recognition software is making it even easier for your kids to find porn! More at 6...

  • Not sure how far they got, but remember reading that IBM was working on this and had some reasonable success at object recognition in images. I'd love to be able to classify the 10k digital images I've got around. Especially if it can recognize individuals (not that it would know their names initially, but would be trainable).
    • by griffjon ( 14945 )
      I'm sure it can figure out "Jenna Jameson" quick enough.
    • Yep, it is called QBIC []--query by image content. The web site points to another web site or two that use QBIC for retrieving images from a collection.

      Facial recognition is one thing, but if you just want to try to categorize your current collection you might try imgSeek [], which is a pretty cool program. Keep in mind that no one has really yet hit upon a great general purpose algorithm for finding matches to images or query by content. There is a large subjective component in categorizing images. If an
  • Reportedly (Score:3, Funny)

    by stunt_penguin ( 906223 ) on Thursday November 02, 2006 @09:59PM (#16698621)
    Reportedly the researchers showed the system a picture of a Death Star, and it correctly tagged the image with 'thatsnomoon'.

    The system has clearly been let crawl the web for far too long.
  • Unless Jupiter Media [] gets to it first.

    Someone like myself would understand the hours of data-entry and database development that goes into indexing imagery. I research photo copyrights for a living.

    The fact that there is a feasible, automated system that can do the work will certainly cut down the man-hours for that sort of work; at least by half.

    Pity, though. I heard that Google and others had a telecommuting thing that paid people to recognize what's in a photo. Sorry to hear they'll be out of a job

  • .........SexSurfer logs in to begin his daily search of the web to find more images to rip in an effort to increase his database of porn images, utilizing this technology, only to find that most of the images consist of naked women with political statements printed on their asses......

    Seriously now, I am sure there are people out there that have already got ideas rolling around in their heads about how they can use this technology to hijack images to their advantage. Once somebody understands how the techno
  • well that really had to happen, i've just tried it and you can't really go wrong if one of the top 15 tags is 'photo' and another of the tags is 'thing'.
  • Now all they need to do is come up with a way to recognize spam words in image text without the overhead of OCR and they can make a fortune on that alone.
  • This reminds me of a uni assignment that i did where we matched images based on colour.
  • ... usually a pedant... but you don't train a database. It was likely a neural net, but TFA is rather thin on details. Anyone got a link to their paper?

  • by Anonymous Coward
    Makes it easier to process all that data generated by all those security cams.

    Is there a "Big Brother" category on Slashdot, yet?
  • This would be cool and all, but why not focus more on letting humans do the hard work-- like if I could take a picture of a tree and then press a button and say aloud; "redwood tree", and have that tag the file.
  • When I was studying textbooks on how to do this in undergrad comp sci.
  • That's all fine and great that they can tell us, but why the heck couldn't they make a web-interface for it so I could try it out?
  • This is just your standard data mining classification system, simply applied to image data as input, with the tags being possible classifications. This is an obvious application to ANYONE in the field. Software patents suck.
    • If it's so obvious, how did they do it?
      • Re: (Score:3, Insightful)

        by OzPhIsH ( 560038 )
        The application is obvious, although, I'll admit, their EXACT method isn't. But at its core, it is basic supervised learning. Feed your classifier a training set of images that are already tagged. Extract the features of the image and use those features to predict the tags. When the predicted classifications don't match the actual tags, adjust the model, rinse and repeat. Just pick up a data mining book. Like I said, lots of people are working on image classification, and this is an obvious application,
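
The train, predict, adjust loop described in the comment above can be sketched with a toy multiclass perceptron. This is a stand-in technique chosen for brevity, not the Penn State method, which the article does not detail. The features (average colour), the tag names, and the training "images" are all made up for illustration.

```python
def extract_features(pixels):
    """Hypothetical features: mean red, green, blue scaled to 0..1, plus a bias term."""
    n = len(pixels)
    means = [sum(p[channel] for p in pixels) / (255 * n) for channel in range(3)]
    return means + [1.0]


def predict(weights, feats):
    """Return the tag whose weight vector scores highest for these features."""
    scores = {tag: sum(w * f for w, f in zip(ws, feats)) for tag, ws in weights.items()}
    return max(scores, key=scores.get)


def train(examples, tags, epochs=200):
    """Perceptron rule: on a wrong guess, pull the true tag's weights towards the
    features and push the wrongly guessed tag's weights away. Rinse and repeat."""
    weights = {tag: [0.0] * 4 for tag in tags}
    for _ in range(epochs):
        mistakes = 0
        for pixels, true_tag in examples:
            feats = extract_features(pixels)
            guess = predict(weights, feats)
            if guess != true_tag:
                mistakes += 1
                for i, f in enumerate(feats):
                    weights[true_tag][i] += f
                    weights[guess][i] -= f
        if mistakes == 0:  # every training image is tagged correctly
            break
    return weights


# Made-up training set: each "image" is four identical pixels with a known tag.
training = [
    ([(20, 40, 220)] * 4, "ocean"),
    ([(30, 60, 200)] * 4, "ocean"),
    ([(30, 200, 40)] * 4, "trees"),
    ([(50, 180, 60)] * 4, "trees"),
    ([(220, 60, 30)] * 4, "sunset"),
    ([(200, 80, 50)] * 4, "sunset"),
]
model = train(training, ["ocean", "trees", "sunset"])
print(predict(model, extract_features([(25, 50, 210)] * 4)))
```

A real system would need far richer features than average colour and would rank many candidate tags rather than return one, but the loop has the same shape.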
  • Considering the adage that "a picture is worth a thousand words", they're going to have a lot more words to index--where the words may not follow a specific taxonomy.

    And that's one of the problems: does an image define the taxonomy, or does the taxonomy define the image [type]?

  • I quickly run out of imagination when it comes to assigning tags to my pictures: an automated mass tag finder will save hours of my precious time while uploading photos to Flickr.
  • Neural Nets (Score:2, Insightful)

    by gekoscan ( 1001678 )
    How can you take a neural network and train it, then patent that?
    That's like patenting training a dog to fetch a stick; it's completely ridiculous.

    You take software capable of generalizing a neural network algorithm by feeding it pictures and associating each picture with certain tags. It then creates a generalized algorithm model based on what you fed it initially. So that when you give new input it is capable of outputting tags most similar to what you initially trained it.

    So yes this software can recogn
  • If it worked, it'd be very useful. However, getting the top 1 tag correct 50% of the time (which is the only quantifiable claim in the article) is pretty straightforward. For most people's photo albums that can be done by the following AI program: "print 'people'"

    There's a few subjects that are so common that it's more or less a given they'll be in a large fraction of the photos. Outputting "people, buildings, nature, animals, plants, city" would probably give at least 1-2 "correct" tags for 90% of what's in
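
One way to test the point above is to score a constant "vague tags" baseline with the same at-least-one-correct-tag measure the article quotes. The sketch below assumes a small hand-labelled album. The file names and labels are invented purely to show the bookkeeping and say nothing about real hit rates.

```python
VAGUE_TAGS = ["people", "buildings", "nature", "animals", "plants", "city"]


def constant_baseline(_image):
    """Ignore the image entirely and always suggest the same vague tags."""
    return VAGUE_TAGS


def hit_rate(suggest, labelled_images):
    """Fraction of images for which at least one suggested tag is a true tag."""
    hits = sum(1 for image, truth in labelled_images if set(suggest(image)) & set(truth))
    return hits / len(labelled_images)


# Hypothetical album: (file name, hand-assigned tags).
album = [
    ("beach.jpg", {"people", "ocean"}),
    ("skyline.jpg", {"city", "buildings"}),
    ("cat.jpg", {"animals"}),
    ("circuit.jpg", {"electronics"}),
]
print(hit_rate(constant_baseline, album))
```

Any tagger worth patenting should beat this baseline on the same album by a clear margin; comparing the two numbers is more informative than the headline figure alone.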

  • by Tom ( 822 )
    So where's the download link? How can software matter if I can't get it? ;-)
  • I now have someone to play pictionary with.
  • So what were they doing, throwing a dart at a damn board? That success rate is no better than randomly applying vague words.
    Move along to real research.
  • I wonder how this compares with Riya []. At some point, there were plenty of rumours [] of a possible Google purchase of Riya. Then again they were rumours.

    I haven't RTFA and I don't have any experience with Riya either, so consider the above posting a waste of time (if you must).
  • Does it detect breast size?
  • Wouldn't it be easier to pay starving children in [???] $0.01 / hour to tag images?
