News for nerds, stuff that matters

Hitachi Develops New Visual Search

Posted by ScuttleMonkey on Wed Jul 25, 2007 03:28 PM
from the which-of-these-things-is-not-like-the-other dept.
Tech.Luver writes to tell us that Hitachi has developed a new visual search engine that can supposedly find similar images from within millions of video and picture data entries in around 1 second. "The technology assesses the similarity of images based on image characteristics presented as high-dimensional numeric information. The information is acquired by automatically detecting information regarding the images, such as color distribution and shapes."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Hmmm.... robotics? (Score:5, Interesting)

    by fyngyrz (762201) * on Wednesday July 25, @03:29PM (#19987983)
    (http://www.ideaspike.com/ | Last Journal: Monday October 22, @04:43AM)

    This is interesting to me - if it performs well - because this is one of the key missing elements for robotics; robots have a lot of trouble trying to match the environment around them to stored records of objects unless the environment is severely constrained. I'm not speaking of AI here (or at least, not yet) but just robots that would be able to clean your floor, carry your groceries, navigate in a burning building, walk your dog, tend your lawn. If they can classify images against stored images well, we're that much closer to generally useful and at least semi-autonomous robot devices.

    Training might be a little annoying the first few times, but once you had a good database, you could replicate - or share via RF, that'd be freaky... neighbor's robot learns what a ferret looks like, now yours knows too - so that newer models were more and more informed right out of the box. Crate. Coffin. Whatever.

    Add an associative database so that images normally found near other images which have just been found are searched first, and perhaps you could get the general search time down from the quoted 1 second, I'm thinking. One second is kind of pokey for a lot of robotic applications. But if the thing is in a kitchen, why would it need to be looking to recognize images that are found in a shipyard?

    And I, for one, would welcome our semi-autonomous, environment recognizing, floor cleaning robot underlings.
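The "associative database" idea above can be sketched in a few lines. This is only an illustration under assumptions: the class name `ContextIndex`, the labels, and the scoring rule are all made up for the example, and nothing here comes from Hitachi's system. It counts which objects have been seen together and then orders the candidate list so that objects associated with recent sightings are compared first.

```python
from collections import Counter, defaultdict


class ContextIndex:
    """Hypothetical helper: tracks which object labels tend to be seen together."""

    def __init__(self):
        # cooccur[a][b] = number of scenes in which a and b were both seen
        self.cooccur = defaultdict(Counter)

    def observe_scene(self, labels):
        """Record one scene (e.g. one room) as a collection of recognised labels."""
        unique = set(labels)
        for a in unique:
            for b in unique:
                if a != b:
                    self.cooccur[a][b] += 1

    def search_order(self, recent, all_labels):
        """Order candidates so labels seen near the `recent` ones come first."""
        score = Counter()
        for seen in recent:
            for other, n in self.cooccur[seen].items():
                score[other] += n
        # Highest context score first; ties are broken alphabetically.
        return sorted(all_labels, key=lambda lab: (-score[lab], lab))


index = ContextIndex()
index.observe_scene(["stove", "kettle", "fridge"])
index.observe_scene(["stove", "kettle", "toaster"])
index.observe_scene(["crane", "anchor", "hull"])

labels = ["anchor", "crane", "fridge", "hull", "kettle", "stove", "toaster"]
order = index.search_order(["stove"], labels)
```

Having just recognised a stove, the robot would try kitchen objects before shipyard ones. The full list is still searched eventually, so a wrong guess about context costs time rather than correctness.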

    • Re:Hmmm.... robotics? by dotpavan (Score:3) Wednesday July 25, @03:40PM
      • Re:Hmmm.... robotics? by poopdeville (Score:1) Wednesday July 25, @04:38PM
      • Re:Hmmm.... robotics? (Score:5, Interesting)

        by mikael (484) on Wednesday July 25, @08:10PM (#19990755)
        To implement a visual search engine you need to be able to perform the following:

        texture segmentation - splitting up a picture into segments of distinct objects. In a panoramic scene, you want to split the picture up into objects such as sky, ocean, waves, beach, boats, pier, wall, people, animals. As a psychological experiment, you can show someone a picture, point to a particular spot, and ask them for the first word they associate with that spot. Then you will see how every scene becomes segmented by our own vision systems.

        Basic image segmentation is implemented using edge detection by Fourier Transforms (FFT, IFFT, DFT). This is a very computation-intensive stage that is typically implemented using DSPs, GPUs or even dedicated ASICs. Data used by the FFT can be of any dimension: 1D (audio/radar), 2D (images) or 3D (volume visualisation). But to match the resolution of a human eye, you would need a 100 Megapixel floating point framebuffer.

        texture classification - having identified the silhouette of an object, now attempt to match the contents to a particular object. Simple ways include colour histograms and silhouette matching. More advanced methods attempt to simulate the first few layers of the human retina using Gabor filters, Ring filters and Wedge filters.
        But modelling just a single type of retinal cell requires one or more FFT operations for an entire image, and there are at least twelve different types of such cells. For efficiency, results for sample images are precalculated (these are referred to as feature vectors) and then compared against the results for any new image.
        For a really technical explanation of how human vision works have a look at The organisation of the retina and visual system [utah.edu]

        texture retrieval - the actual design of the search engine to retrieve images through content rather than just keyword:

        QBIC - Query By Image Content [ibm.com]. IBM's image retrieval database system

        All of this has to be performed for a single image. An entire movie requires processing hundreds of thousands of images.
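The simplest case mentioned in the classification step, colour histograms used as precalculated feature vectors, can be sketched as follows. This is a toy under assumptions: "images" are flat lists of (r, g, b) pixels, the function names are invented for the example, and histogram intersection is just one common similarity measure, not something the comment specifies.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised colour histogram for a list of (r, g, b) pixels, each 0-255."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Combine the three per-channel bin numbers into one histogram index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query_vec, database):
    """`database` maps an image name to its precalculated feature vector."""
    return max(database, key=lambda name: histogram_intersection(query_vec, database[name]))


# Toy "images": mostly-blue sky, all-green grass, and a blue sea used as the query.
sky = [(40, 90, 220)] * 90 + [(250, 250, 250)] * 10
grass = [(30, 180, 40)] * 100
sea = [(35, 80, 200)] * 80 + [(250, 250, 250)] * 20

database = {"sky": colour_histogram(sky), "grass": colour_histogram(grass)}
best = most_similar(colour_histogram(sea), database)
```

The database vectors are computed once, so a query only costs one histogram plus one cheap comparison per stored image. It also shows the weakness of colour alone: the sea matches the sky simply because both are blue.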
      • 1 reply beneath your current threshold.
    • Re:Hmmm.... robotics? (Score:5, Interesting)

      by lawpoop (604919) on Wednesday July 25, @03:49PM (#19988239)
      (http://lawpoop.blogspot.com/ | Last Journal: Friday May 28 2004, @06:51PM)

      I'm not speaking of AI here (or at least, not yet) but just robots that would be able to clean your floor, carry your groceries...
      Well, you are talking about AI here. It turns out that it's relatively easy to make a computer that can beat humans in chess or do complex math equations, but something as simple as walking on six, four, or two legs, which a lot of really stupid organisms do, is really difficult. Something like distinguishing 'indoors' from 'outdoors' or a cloud bank from the bushes, seems way in the future.

      My pet theory is that we don't have the right kind of device yet. A mind, the 'function' of an organic nervous system, is not a Turing machine. I don't really understand the math behind it, but Goedel's incompleteness theorem [wikipedia.org] seems to show that a human mathematician can understand certain mathematical proofs that a Turing machine can never prove. Since all computers are essentially Turing machines, no matter how fast or parallelized they are, or how much memory they have, they will never be able to do what a human mind can do. So, maybe someday we will have artificial intelligence, or, a floor-washing robot, but we currently don't have the right kind of device that can do it.
      • Re:Hmmm.... robotics? by moderatorrater (Score:2) Wednesday July 25, @03:57PM
      • Re:Hmmm.... robotics? by fyngyrz (Score:3) Wednesday July 25, @04:08PM
      • Re:Hmmm.... robotics? (Score:4, Interesting)

        by kebes (861706) on Wednesday July 25, @04:16PM (#19988573)
        (Last Journal: Monday January 08 2007, @02:45PM)
        Your implication is that the human mind cannot be reduced to a Turing machine. I am in the other camp--who believe that the mind is subject to rigorous physical law, and that physical law can be expressed arithmetically (in principle), and so the human mind is a Turing machine.

        Godel's theorem says that a consistent arithmetic system will contain unprovable truths. Put otherwise, such a system cannot be both consistent and complete. Thus the Godel counterargument to Strong AI (that human minds and computers are not fundamentally different) is that humans (e.g. mathematicians) can prove things like Godel's theorem, so we are able to "rise above" the arithmetic and exist in states of full proof and full consistency.

        But I think there is a flaw in that logic (note: I am not a mathematician). The theorem doesn't preclude that a given arithmetic system (e.g. human mind) will be able to prove a truth that a weaker system ignored. Thus our ability to see certain truths doesn't mean that there are not other truths that are unprovable to us.

        More fundamentally, no one has actually shown that the human mind is either consistent or complete (proving both would be required to show that we are not subject to Godel's theorem). The human mind is a computational device evolved to solve real-world problems, like escaping predators, rather than contrived ones, like mathematical proofs. It is thus in fact likely to be an inconsistent (internally contradictory) computational system. The human mind may be incomplete and inconsistent.

        I agree that "true AI" will require vastly more computer power, and much more sophisticated algorithms than we have today. But the emerging evidence, from what I've seen, is that "true AI" can be achieved, at least in principle, by a Turing machine.
        • Re:Hmmm.... robotics? by fyngyrz (Score:3) Wednesday July 25, @05:23PM
        • Re:Hmmm.... robotics? by MarsMartian (Score:1) Wednesday July 25, @05:27PM
        • Re:Hmmm.... robotics? (Score:4, Insightful)

          by lawpoop (604919) on Wednesday July 25, @05:43PM (#19989463)
          (http://lawpoop.blogspot.com/ | Last Journal: Friday May 28 2004, @06:51PM)

          Your implication is that the human mind cannot be reduced to a Turing machine. I am in the other camp--who believe that the mind is subject to rigorous physical law, and that physical law can be expressed arithmetically (in principle), and so the human mind is a Turing machine.
          I'm not saying that the mind is not subject to physical law, or is not based on math. All I'm saying is that the mind is not a Turing machine ( though it probably would have to have a Turing machine in it somewhere ). It's a different *kind* of machine, not a super-powerful Turing machine.

          Goedel basically showed that a Turing machine cannot do *all* the kinds of math that a human mind can do ( though it can do some). Not that a Turing machine lacks a certain amount of power, but just that it never will. It's just qualitatively the wrong tool for the job. It doesn't matter how much power you give it; the 'weakest' Turing machine is essentially the same as the 'strongest' one; it just simply can't do certain things. If a human is able to perceive and understand this, to know something that a Turing machine can't know, then the mind cannot *solely* be a Turing machine. This does not mean that the mind is not a different *kind* of machine, based on physical law, instead of some mystic hocus-pocus; it's just that it's not a Turing machine. My claim is that the mind is a qualitatively different kind of machine, not a Turing machine.

          Goedel's theorem says that a consistent arithmetic system will contain unprovable truths. Put otherwise, such a system cannot be both consistent and complete. Thus the Goedel counterargument to Strong AI (that human minds and computers are not fundamentally different) is that humans (e.g. mathematicians) can prove things like Godel's theorem, so we are able to "rise above" the arithmetic and exist in states of full proof and full consistency.

          But I think there is a flaw in that logic (note: I am not a mathematician). The theorem doesn't preclude that a given arithmetic system (e.g. human mind) will be able to prove a truth that a weaker system ignored. Thus our ability to see certain truths doesn't mean that there are not other truths that are unprovable to us.
          I don't think the implication of Goedel's theorem shows that we 'rise above' the Turing machine, but rather that we have a qualitatively different awareness or knowledge that a Turing machine doesn't have.

          Goedel's theorem is recursive. Any human mathematician can see that no matter how powerful the symbolic system is, the Turing machine will never be complete; there will be truths that the system can't prove. No matter how much you expand a particular system to show any truth a weaker system missed, there will be more truths that the newer, more powerful system missed. This process can go on ad nauseam into infinity. A human mind can perceive this foray into eternity, but the Turing machine has no way of proving it. How could a human mind perceive something that a Turing machine couldn't, unless we had some component that was fundamentally different than a Turing machine?

          What we seem to have that the Turing machine doesn't is meta-knowledge. We can see that any attempt to create a complete and consistent arithmetic system on a Turing machine will just lead to an endless series of more powerful systems that produce ever more elusive truths, and the process never ends. In this sense the Turing machine is 'myopic' -- it will never stop and say "Hey, I'm not getting anywhere with this; this is an infinite loop. No matter how powerful the system is, there will always be more truths that it cannot express." It's unable to know what it can't know, so to speak. However, as humans, we can somehow see the 'big picture', that no matter how powerful a system you make, there will always be another level of truths out there.

          More fundamentally, no one has actually shown that the human mind is either consistent or complete (proving both would be required to sh
        • Re:Hmmm.... robotics? by Repton (Score:2) Wednesday July 25, @07:11PM
        • Re:Hmmm.... robotics? by aragszxki (Score:3) Wednesday July 25, @07:49PM
        • Re:Hmmm.... robotics? by poopdeville (Score:1) Wednesday July 25, @08:12PM
          • Mod Up by TheCouchPotatoFamine (Score:1) Wednesday July 25, @10:43PM
        • Absolutely Wrong, analog != digital by TheCouchPotatoFamine (Score:1) Wednesday July 25, @10:17PM
        • The human brain does one thing only by master_p (Score:2) Thursday July 26, @06:10AM
        • 2 replies beneath your current threshold.
      • Re:Hmmm.... robotics? by Relic of the Future (Score:2) Wednesday July 25, @04:58PM
        • 1 reply beneath your current threshold.
      • Re:Hmmm.... robotics? by sxeraverx (Score:1) Wednesday July 25, @11:29PM
    • Re:Hmmm.... robotics? by Anonymous Coward (Score:2) Wednesday July 25, @03:57PM
    • Re:Hmmm.... robotics? by dj_tla (Score:3) Wednesday July 25, @04:11PM
    • Re:Hmmm.... robotics? by Jim Hall (Score:2) Thursday July 26, @09:29AM
    • 1 reply beneath your current threshold.
  • pr0n (Score:5, Insightful)

    by tronicum (617382) * on Wednesday July 25, @03:31PM (#19988001)
    great! new way to find even more porn.
    • Re:pr0n by Sciros (Score:2) Wednesday July 25, @03:37PM
      • Re:pr0n by grub (Score:1) Wednesday July 25, @03:46PM
      • Re:pr0n by kebes (Score:3) Wednesday July 25, @03:55PM
        • Re:pr0n by Sciros (Score:3) Wednesday July 25, @04:10PM
          • Re:pr0n by amchugh (Score:1) Wednesday July 25, @07:52PM
        • Re:pr0n by egoproxy (Score:1) Wednesday July 25, @10:47PM
        • Airplane Porn? by Ayanami Rei (Score:2) Wednesday July 25, @11:19PM
        • Re:pr0n by Wiseman1024 (Score:1) Thursday July 26, @01:37AM
        • 1 reply beneath your current threshold.
    • Re:pr0n by Wiseman1024 (Score:1) Thursday July 26, @01:54AM
    • Re:pr0n by zeylisse (Score:1) Thursday July 26, @02:13AM
      • 1 reply beneath your current threshold.
    • 1 reply beneath your current threshold.
  • by gatkinso (15975) on Wednesday July 25, @03:36PM (#19988075)
    and life ain't nuthin but bitchez n money.
  • Hash table? (Score:2)

    by Simon80 (874052) on Wednesday July 25, @03:36PM (#19988083)
    Sounds like an on-disk format involving hash tables. They'd probably win a patent on it, too.
  • by bepolite (972314) on Wednesday July 25, @03:41PM (#19988151)
    (http://www.politemail.com/)
    I would think this would be a big and useful upgrade for http://images.google.com/ [google.com]
    • 1 reply beneath your current threshold.
  • Robots (Score:2, Funny)

    Yes, but how quickly can this be integrated into robots. Robots programmed to destroy all buttons.

    "Kill multi button gadgets! Steve Jobs robot army angry!"

    • Re:Robots by tehdaemon (Score:2) Wednesday July 25, @05:17PM
  • similar to Video Google? (Score:2, Informative)

    by gstone (236734) on Wednesday July 25, @03:55PM (#19988305)
    (http://www.robots.ox.ac.uk/)
    From the rather less than transparent description in the linked article, it seems that this work is a hierarchical extension to a system known as Video Google [ox.ac.uk]. This system detects two-dimensional features [cs.ubc.ca] in every image of a video sequence, then uses hierarchical clustering to group "like" features together. The centres of these clusters are used as "visual words". Scenes from the original video can then be characterised by which of these visual words they contain.

    Using these words, search engine style indices and techniques can be used to make searching -- by supplying an example image area which can have its words computed -- quite fast.
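As a rough sketch of the "visual words plus search-engine index" idea: each feature descriptor is mapped to its nearest cluster centre, and an inverted index records which frames contain which word. The 2-D descriptors, the three-word vocabulary and the function names below are assumptions for illustration; real descriptors have far more dimensions, and real systems weight words (for example with tf-idf) rather than counting shared words.

```python
from collections import Counter, defaultdict


def nearest_word(descriptor, vocabulary):
    """Index of the cluster centre ("visual word") closest to a descriptor."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: dist2(descriptor, vocabulary[i]))


def build_index(frames, vocabulary):
    """Inverted index: visual word -> set of frame ids that contain it."""
    index = defaultdict(set)
    for frame_id, descriptors in frames.items():
        for d in descriptors:
            index[nearest_word(d, vocabulary)].add(frame_id)
    return index


def query(region_descriptors, index, vocabulary):
    """Rank frames by how many of the query region's visual words they share."""
    words = {nearest_word(d, vocabulary) for d in region_descriptors}
    votes = Counter()
    for w in words:
        for frame_id in index.get(w, ()):
            votes[frame_id] += 1
    return [frame_id for frame_id, _ in votes.most_common()]


# Assumed to come from an earlier clustering stage.
vocabulary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
frames = {
    "f1": [(1.0, 0.0), (9.0, 11.0)],
    "f2": [(0.0, 9.0)],
    "f3": [(10.0, 9.0), (1.0, 9.0)],
}
index = build_index(frames, vocabulary)
results = query([(0.0, 1.0), (11.0, 10.0)], index, vocabulary)
```

Only frames that share at least one word with the query are ever touched, which is why this kind of lookup stays fast as the collection grows.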

    The key bottleneck here is the clustering stage: reducing the original input of typically hundreds of features per frame -- multiplied by 25 frames per second by minutes, or hours, of video -- to a much smaller set of clusters. It looks like the work in the linked article is using a modified clustering algorithm [wikipedia.org] which does not require all of the data to be in memory at once.
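The article does not say which clustering algorithm is used, so the following is only an assumed stand-in: sequential (online) k-means, a well-known variant that updates the centres one point at a time and therefore never needs the whole data set in memory. The generator `feature_stream` is a made-up substitute for features read frame by frame.

```python
def sequential_kmeans(stream, initial_centres):
    """Update cluster centres one point at a time (online k-means)."""
    centres = [list(c) for c in initial_centres]
    counts = [0] * len(centres)
    for point in stream:
        # Find the centre closest to this point.
        j = min(range(len(centres)),
                key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centres[i])))
        counts[j] += 1
        # Move that centre towards the point by 1/count, i.e. keep a running mean.
        rate = 1.0 / counts[j]
        centres[j] = [c + rate * (p - c) for p, c in zip(point, centres[j])]
    return centres


def feature_stream():
    """Hypothetical source of 2-D features, yielded lazily one at a time."""
    for i in range(50):
        yield (0.0 + (i % 3) * 0.1, 0.0)   # cluster near the origin
        yield (5.0, 5.0 + (i % 3) * 0.1)   # cluster near (5, 5)


centres = sequential_kmeans(feature_stream(), [(1.0, 1.0), (4.0, 4.0)])
```

Memory use depends on the number of centres, not on the length of the video, at the price of a result that depends on the order in which points arrive.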

    The TRECVID [nist.gov] project is a challenge style exercise where groups compete to provide the best search results for a given set of queries where the search material is hours of video.

  • Document images (Score:3, Interesting)

    by zymurgyboy (532799) <zymurgyboy@yaho o . c om> on Wednesday July 25, @03:57PM (#19988343)
    Pictures are fun, but I wonder if it would be accurate enough to locate similar images of documents (and to what degree). It would be really cool (for me anyway) if it could look at, say, a million pdfs or tiffs that don't have embedded text and come back with everything similar/identical.

    I frequently have to create large collections of images from all sorts of file types -- some text-based, some graphics -- that get housed in a collection of images for easy, standardized review. If there were something that could avoid the step of extracting text from them, or later OCRing them and still end up with a searchable image collection, well, that would be exceedingly cool. It would cut the initial time outlay I have to devote to virtually any given project I have to deal with by 25 to 50%.

  • by saveourskyline (1103211) on Wednesday July 25, @03:58PM (#19988357)
    ...of 48-year-old, 365 lb. guys who steal pictures of girls from MySpace and pretend to be 14/f/kali.
  • by Kirin Fenrir (1001780) on Wednesday July 25, @03:58PM (#19988361)
    The technology can't determine what aspects of the image you're looking for.

    For example: I want to find more cat images. I feed it a picture of a white cat. I am more likely to be returned results of white dogs than, say, tabby or black cats.

    Unless I'm misunderstanding something?
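A toy example of that concern, with entirely invented numbers: each image is reduced to three hypothetical descriptors (brightness, ear pointiness, snout length), and the nearest neighbour of a white cat depends on how the features are weighted. None of this reflects how Hitachi's system actually weights its features.

```python
def weighted_distance(a, b, weights):
    """Euclidean distance with a separate weight per feature."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


# Hypothetical descriptors: (brightness, ear pointiness, snout length)
images = {
    "white dog": (0.95, 0.3, 0.8),
    "black cat": (0.05, 0.9, 0.2),
    "tabby cat": (0.45, 0.9, 0.2),
}
white_cat = (0.95, 0.9, 0.2)


def nearest(query, weights):
    """Name of the stored image closest to the query under the given weights."""
    return min(images, key=lambda name: weighted_distance(query, images[name], weights))


colour_heavy = nearest(white_cat, (3.0, 0.2, 0.2))   # colour dominates the match
shape_heavy = nearest(white_cat, (0.1, 1.0, 1.0))    # shape dominates the match
```

With colour weighted heavily the white dog wins; with shape weighted heavily a cat wins. A system with no way for the user to express which aspect matters has to pick one weighting for everyone.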
  • I could see the FBI paying some millions of dollars for a dedicated system like this... I mean, since they have that known terrorist photo database or whatever, they might want to improve performance... Of course, I would hope that the FBI would properly configure the servers if they were to buy this. They accidentally forget to change the server from images.google.com (or something similar) to terrorists.fbi.gov, and all of a sudden, your granny is a known terrorist. Oh no!
  • by SengirV (203400) on Wednesday July 25, @04:25PM (#19988693)
    ... long since forgotten, be responsible for such innovative technology?
    • 1 reply beneath your current threshold.
  • by Space cowboy (13680) * on Wednesday July 25, @05:07PM (#19989131)
    (Last Journal: Friday April 27 2007, @02:20PM)
    ... and the theory behind what I was doing is up at my blog [gornall.net]. Or at least most of it is (all of it modulo time constraints. It'll all get there eventually).

    Back in the day (almost 2 decades ago), I was using video rather than still images (which allowed me temporal information as well as spatial information) but I recently wrote a simple application to just use the spatial information to find me images "most-like" a source one. The original goal was to train the system and then try to leverage a semantic processor from the trained system. It worked reasonably well (sometimes astoundingly well) on the database I had (some 300,000 images downloaded from keyed-searches on google images).

    As Hitachi said, the key is to develop a matching system within a higher numerical dimension. One of the missing pages on the blog (I'll get there!) is how to evaluate the usefulness of any given feature (=dimension) of a region of an image. With this, one can approximate a numerical value for the information being relayed to the recognition-system using that feature, and therefore establish its worth as a feature.
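The blog's own method for scoring a feature isn't described here, so as an assumed illustration only, this sketch uses information gain, a standard way to put a number on how much a discrete feature tells you about the target class. The feature names and the four-sample data set are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Bits of class uncertainty removed by knowing a discrete feature's value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy that remains after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


labels       = ["boat", "boat", "grass", "grass"]
mostly_green = [False, False, True, True]    # separates the classes perfectly
has_edges    = [True, False, True, False]    # says nothing about the class

gain_green = information_gain(mostly_green, labels)
gain_edges = information_gain(has_edges, labels)
```

Here `mostly_green` is worth a full bit and `has_edges` is worth nothing, so a recognition system choosing between the two would keep the first and could discard the second.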

    When you know what you're looking for (your feature set) *and* the value of each of those features to your recognition system targets {man,boat,grass,house,...} you can create reasonably useful discriminators and rule-systems based on those discriminators. Note that the discriminators and the rule-system can be given to the system as a-priori information, but most of them are created and destroyed automatically *by* the recognition system as it evolves. It sounds complex, but really it's a bunch of simple ideas applied one after the other.

    Simon.
  • How long before you'll be able to search through pictures or video and have the computer do the image pattern recognition? You could type the word "beach" or "jogging" and it would show you all the pictures showing a beach (or jogging... or, err, jogging on the beach). Camera makers dropped the ball and don't have easy, intuitive image tagging built into the camera. Ideally a camera would by now have voice recognition or recording so that you could tag a photo like "me in front of eiffel tower in paris" prior to taking it, or at the very least a touchscreen system with common options. So now it's up to the AI of the computer to figure out the content of images using mad OCR-level image analysis technology. Hmm, maybe I should patent something like that.
  • by mikeasu (1025283) on Wednesday July 25, @05:52PM (#19989555)
    Example - my brother burnt me a CD a while back with an Irish instrumental I just love. No idea who it is, haven't heard from him yet about it. I was thinking it'd be neat to be able to search for say, a match to maybe 10 seconds of the chorus.
  • Saw it done 10 years ago (Score:3, Interesting)

    Some time between 1992 and 1994 IIRC when I was working at the photo/press agency Pacific Press Service in Tokyo, I saw a demo of a system created IIRC by NEC which searched 90,000 photos in under one second, based on a color freehand drawing you would draw on the screen of the EWS Unix workstation on which it ran. Basically if you drew a horizontal blue mass at the bottom of the screen you would get a lake, etc. In other words you could search by rough photographic composition. I am less impressed that after over 10 years Hitachi was able to do something along the same lines.
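How the NEC demo worked internally isn't known to me; the following is only a guess at the general idea of searching by rough composition. Each image (and the user's sketch) is reduced to the mean colour of each cell in a coarse grid, and photos are ranked by how close those grids are. The tiny 4x4 "images" and all function names are made up for the example.

```python
def layout_descriptor(image, grid=2):
    """Mean (r, g, b) of each cell in a grid x grid partition of the image."""
    h, w = len(image), len(image[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            pixels = [image[y][x] for y in rows for x in cols]
            cells.append(tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3)))
    return cells


def layout_distance(d1, d2):
    """Sum of squared colour differences between corresponding grid cells."""
    return sum((a - b) ** 2 for c1, c2 in zip(d1, d2) for a, b in zip(c1, c2))


def solid_rows(top, bottom, size=4):
    """Toy image: top half one colour, bottom half another."""
    half = size // 2
    return [[top] * size for _ in range(half)] + [[bottom] * size for _ in range(half)]


photos = {
    "lake": solid_rows((128, 128, 128), (20, 40, 230)),   # grey sky over blue water
    "meadow": solid_rows((120, 170, 250), (0, 160, 0)),   # blue sky over green grass
}
sketch = solid_rows((255, 255, 255), (0, 0, 255))          # blank top, blue mass at the bottom

query_desc = layout_descriptor(sketch)
match = min(photos, key=lambda name: layout_distance(query_desc, layout_descriptor(photos[name])))
```

Because the descriptor keeps position as well as colour, blue at the bottom of the sketch matches the lake rather than the meadow, whose blue is at the top.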
  • Visual Search? (Score:2)

    by Chas (5144) on Wednesday July 25, @10:54PM (#19992325)
    (http://www.evilnet.net/ | Last Journal: Wednesday August 30 2006, @12:30PM)
    Mother: Go find your little brother
    Older Brother: Found him! He's behind the sofa.

    *RING!*

    Mother: Hello?
    Voice On The Other End Of The Line: Ma'am, this is Pubert Skewya. I'm a lawyer for Duey, Cheatham, and Howe. We represent Hitachi.
    Mother: Uhm. Yeah? So what?
    VOTOEOTL: Ma'am, we have a record that you just encouraged your son to violate our client's patent on visual searches. Naturally, we'll settle out of court for one billion dollars, American. If you refuse, with the state of the economy as it is, we'll go after you in court, but we'll go after you for one billion dollars, Canadian. If you act now, and concede to our extor*COUGH*rightful demands, you'll save yourself money in the long run.
    Mother: Uhm. Yeah. Who is this really?
    VOTOEOTL: Ma'am, this is serious. Our client has a patent on visual searches. Every time you tell your son to go look for something, you're contributing to the violation of our client's patents.
    Mother: And I know ONE young man who's going to get his ass beaten for putting one of his idiot friends up to this stupid little prank...
    *CLICK*

    VOTOEOTL to the rest of his call center: SHIT! That's like the millionth one!
  • decades old (Score:2)

    by oohshiny (998054) on Thursday July 26, @01:57AM (#19993407)
    The approach is decades old... but no doubt, it's newly patented.
  • Speed issues (Score:2)

    I wonder if they've managed to solve the slowness of this sort of search any way other than just throwing a lot of boxes at it?
    My own system does 13 million images in about a minute, but with enough RAM to fit the dataset in memory I can do 10-20 seconds.
    I hope they're not just using a cluster to speed up access, that's a workable solution but it doesn't really help those of us who can't afford a dozen boxes to power their searcher.
  • Already done (Score:2)

    by 12357bd (686909) on Thursday July 26, @04:26AM (#19994099)
    Comparing a million images in a second on a commodity PC has been possible since 2004/2005; see http://www.immem.com/en/ [immem.com]. By now the state of the art is to compare a video stream against a 24h video pool in realtime, using this technology.
  • So now maybe I won't have to go to 4chan to find moar of some chick whose name I don't know.
  • Photosynth (Score:1)

    by thoughtlover (83833) on Thursday July 26, @08:12PM (#20005109)
    This looks similar to how Photosynth stores and correlates images. And I'd say, it's one of the more impressive things I've seen Microsoft do, but I think they bought this technology.

    http://www.youtube.com/watch?v=s-DqZ8jAmv0 [youtube.com]
  • by BananaStewGuy (999543) on Friday July 27, @03:12PM (#20015657)
    I got a chance to see this software in Japan the last time I visited Hitachi's Central Research Labs. It was impressive. Unfortunately, I couldn't tell anyone about it because it was still under wraps. Now that it's out in the open, here's a post with some details. [wilkinsons.com] Briefly, it does rely on pre-indexing of the images, it doesn't rely on any text tagging, and it's not intended to compete with Google Image Search et al. It is intended as an Enterprise application. It is remarkably good at finding faces, even when you don't tell it you're looking for faces. And it even works on video clips. Unfortunately, they didn't give me a copy to take home.
  • Re:HDD (Score:1, Informative)

    by Anonymous Coward on Wednesday July 25, @03:41PM (#19988137)
    Usually, HDD = Hard Disk Drive
  • 6 replies beneath your current threshold.