Slashdot
Hitachi Develops New Visual Search
Posted by
ScuttleMonkey
on Wed Jul 25, 2007 03:28 PM
from the which-of-these-things-is-not-like-the-other dept.
Tech.Luver writes to tell us that Hitachi has developed a new visual search engine that can supposedly find similar images from within millions of video and picture data entries in around 1 second. "The technology assesses the similarity of images based on image characteristics presented as high-dimensional numeric information. The information is acquired by automatically detecting information regarding the images, such as color distribution and shapes."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Hmmm.... robotics? (Score:5, Interesting)
(http://www.ideaspike.com/ | Last Journal: Monday October 22, @04:43AM)
This is interesting to me - if it performs well - because this is one of the key missing elements for robotics; robots have a lot of trouble trying to match the environment around them to stored records of objects unless the environment is severely constrained. I'm not speaking of AI here (or at least, not yet) but just robots that would be able to clean your floor, carry your groceries, navigate in a burning building, walk your dog, tend your lawn. If they can classify images against stored images well, we're that much closer to generally useful and at least semi-autonomous robot devices.
Training might be a little annoying the first few times, but once you had a good database, you could replicate - or share via RF, that'd be freaky... neighbor's robot learns what a ferret looks like, now yours knows too - so that newer models were more and more informed right out of the box. Crate. Coffin. Whatever.
Add an associative database so that images normally found near other images which have just been found are searched first, and perhaps you could get the general search time down from the quoted 1 second, I'm thinking. One second is kind of pokey for a lot of robotic applications. But if the thing is in a kitchen, why would it need to be looking to recognize images that are found in a shipyard?
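A minimal sketch of that context-first idea, assuming every stored record carries a context tag and a small feature vector. The records, tags, threshold and the names find_match and DATABASE are all made up for illustration; nothing here comes from the Hitachi article.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_match(query, database, context, threshold):
    """Search records tagged with the current context first, then the rest.

    database: list of (label, context_tag, feature_vector) tuples.
    Returns (label, records_examined), or (None, records_examined) on a miss.
    """
    # Stable sort: records whose tag equals the current context come first.
    ordered = sorted(database, key=lambda rec: rec[1] != context)
    for examined, (label, _tag, vector) in enumerate(ordered, start=1):
        if distance(query, vector) <= threshold:
            return label, examined
    return None, len(ordered)

# Hypothetical toy records: (label, context tag, feature vector).
DATABASE = [
    ("crane", "shipyard", [0.9, 0.1, 0.3]),
    ("anchor", "shipyard", [0.8, 0.2, 0.7]),
    ("kettle", "kitchen", [0.1, 0.9, 0.2]),
    ("toaster", "kitchen", [0.2, 0.8, 0.6]),
]

query = [0.12, 0.88, 0.22]
print(find_match(query, DATABASE, "kitchen", 0.1))   # kitchen records tried first
print(find_match(query, DATABASE, "shipyard", 0.1))  # same answer, more records examined
```

Note that this returns the first record within the threshold rather than the best one overall, which is the trade that buys the speed-up: a wrong context guess only costs extra comparisons, it does not hide the match.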
And I, for one, would welcome our semi-autonomous, environment recognizing, floor cleaning robot underlings.
Re:Hmmm.... robotics? (Score:5, Interesting)
texture segmentation - splitting up a picture into segments of distinct objects. In a panoramic scene, you want to split the picture up into objects such as sky, ocean, waves, beach, boats, pier, wall, people, animals. As a psychological experiment, you can show someone a picture, point to a particular spot and ask them for the first word that they associate with that spot. Then you will see how every scene becomes segmented by our own vision systems.
Basic image segmentation is implemented using edge detection by Fourier Transforms (FFT, IFFT, DFT). This is a very computation-intensive stage that is typically implemented using DSPs, GPUs or even dedicated ASICs. The data used by the FFT can have any number of dimensions: 1D (audio/radar), 2D (images) and 3D (volume visualisation). But to match the resolution of a human eye, you would need a 100 Megapixel floating point framebuffer.
texture classification - having identified the silhouette of an object, now attempt to match the contents to a particular object. Simple ways include colour histograms and silhouette matching. More advanced methods attempt to simulate the first few layers of the human retina using Gabor filters, Ring filters and Wedge filters.
But just to model a single type of retinal cell requires one or more FFT operations for an entire image. And there are at least twelve different types of such cells. For efficiency, precalculated results of sample images are generated (these are referred to as feature vectors) and then compared against the results of any new image.
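As a rough illustration of precomputed feature vectors, here is a sketch that uses the simplest feature mentioned above, a colour histogram, instead of the retina-style filters. The tiny pixel lists, the bin count and the names colour_histogram, most_similar and INDEX are assumptions chosen for the example, not anyone's real system.

```python
def colour_histogram(pixels, bins=4):
    """Return a normalised RGB histogram with bins**3 entries.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    """
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin number in 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[rb * bins * bins + gb * bins + bb] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist

def l1_distance(a, b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

def most_similar(query_pixels, index):
    """Name of the stored image whose histogram is closest to the query's."""
    query = colour_histogram(query_pixels)
    return min(index, key=lambda name: l1_distance(query, index[name]))

# Hypothetical sample "images", each just a short list of pixels.
SAMPLES = {
    "sky": [(30, 100, 220)] * 8 + [(240, 240, 240)] * 2,
    "beach": [(230, 200, 140)] * 7 + [(30, 100, 220)] * 3,
    "grass": [(40, 180, 60)] * 10,
}

# Feature vectors are computed once, ahead of time, and reused per query.
INDEX = {name: colour_histogram(pixels) for name, pixels in SAMPLES.items()}

new_image = [(35, 110, 210)] * 7 + [(250, 250, 250)] * 3
print(most_similar(new_image, INDEX))
```

Only the query image needs a fresh histogram at search time; everything in INDEX is reused, which is the efficiency point being made above.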
For a really technical explanation of how human vision works have a look at The organisation of the retina and visual system [utah.edu]
texture retrieval - the actual design of the search engine to retrieve images through content rather than just keyword:
QBIC - Query By Image Content [ibm.com]. IBM's image retrieval database system
All of this has to be performed for a single image. An entire movie requires the processing of hundreds of thousands of images.
Re:Hmmm.... robotics? (Score:5, Interesting)
(http://lawpoop.blogspot.com/ | Last Journal: Friday May 28 2004, @06:51PM)
My pet theory is that we don't have the right kind of device yet. A mind, the 'function' of an organic nervous system, is not a Turing machine. I don't really understand the math behind it, but Goedel's incompleteness theorem [wikipedia.org] seems to show that a human mathematician can understand certain mathematical proofs that a Turing machine can never prove. Since all computers are essentially Turing machines, no matter how fast or parallelized they are, or how much memory they have, they will never be able to do what a human mind can do. So, maybe someday we will have artificial intelligence, or a floor-washing robot, but we currently don't have the right kind of device that can do it.
Re:Hmmm.... robotics? (Score:4, Interesting)
(Last Journal: Monday January 08 2007, @02:45PM)
Godel's theorem says that a consistent arithmetic system will contain unprovable truths. Put otherwise, such a system cannot be both consistent and complete. Thus the Godel counterargument to Strong AI (that human minds and computers are not fundamentally different) is that humans (e.g. mathematicians) can prove things like Godel's theorem, so we are able to "rise above" the arithmetic and exist in states of full proof and full consistency.
But I think there is a flaw in that logic (note: I am not a mathematician). The theorem doesn't preclude that a given arithmetic system (e.g. human mind) will be able to prove a truth that a weaker system ignored. Thus our ability to see certain truths doesn't mean that there are not other truths that are unprovable to us.
More fundamentally, no one has actually shown that the human mind is either consistent or complete (proving both would be required to show that we are not subject to Godel's theorem). The human mind is a computational device evolved to solve real-world problems, like escaping predators, rather than contrived ones, like mathematical proofs. It is thus in fact likely to be an inconsistent (internally contradictory) computational system. The human mind may be incomplete and inconsistent.
I agree that "true AI" will require vastly more computer power, and much more sophisticated algorithms than we have today. But the emerging evidence, from what I've seen, is that "true AI" can be achieved, at least in principle, by a Turing machine.
Re:Hmmm.... robotics? (Score:4, Insightful)
(http://lawpoop.blogspot.com/ | Last Journal: Friday May 28 2004, @06:51PM)
Goedel basically showed that a Turing machine cannot do *all* the kinds of math that a human mind can do (though it can do some). Not that a Turing machine lacks a certain amount of power, but just that it never will. It's just qualitatively the wrong tool for the job. It doesn't matter how much power you give it; the 'weakest' Turing machine is essentially the same as the 'strongest' one; it just simply can't do certain things. If a human is able to perceive and understand this, to know something that a Turing machine can't know, then the mind cannot *solely* be a Turing machine. This does not mean that the mind is not a different *kind* of machine, based on physical law, instead of some mystic hocus-pocus; it's just that it's not a Turing machine. My claim is that the mind is a qualitatively different kind of machine, not a Turing machine.
But I think there is a flaw in that logic (note: I am not a mathematician). The theorem doesn't preclude that a given arithmetic system (e.g. human mind) will be able to prove a truth that a weaker system ignored. Thus our ability to see certain truths doesn't mean that there are not other truths that are unprovable to us.
Goedel's theorem is recursive. Any human mathematician can see that no matter how powerful the symbolic system is, the Turing machine will never be complete; there will be truths that the system can't prove. No matter how much you expand a particular system to show any truth a weaker system missed, there will be more truths that the newer, more powerful system missed. This process can go on ad nauseam into infinity. A human mind can perceive this foray into eternity, but the Turing machine has no way of proving it. How could a human mind perceive something that a Turing machine couldn't, unless we had some component that was fundamentally different than a Turing machine?
What we seem to have that the Turing machine doesn't is meta-knowledge. We can see that any attempt to create a complete and consistent arithmetic system on a Turing machine will just lead to an endless series of more powerful systems that produce ever more elusive truths, and the process never ends. In this sense the Turing machine is 'myopic' -- it will never stop and say "Hey, I'm not getting anywhere with this; this is an infinite loop. No matter how powerful the system is, there will always be more truths that it cannot express." It's unable to know what it can't know, so to speak. However, as humans, we can somehow see the 'big picture', that no matter how powerful a system you make, there will always be another level of truths out there.
pr0n (Score:5, Insightful)
Nuthin but a B-tree of eigenvalues (Score:2, Troll)
Hash table? (Score:2)
Will Google copy or buy this technology? (Score:3, Interesting)
(http://www.politemail.com/)
Robots (Score:2, Funny)
(http://www.warcloud.net/~odinson/ | Last Journal: Wednesday January 14 2004, @11:43AM)
"Kill multi button gadgets! Steve Jobs robot army angry!"
similar to Video Google? (Score:2, Informative)
(http://www.robots.ox.ac.uk/)
Using these words, search engine style indices and techniques can be used to make searching -- by supplying an example image area which can have its words computed -- quite fast.
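To make the search-engine analogy concrete, here is a small sketch of an inverted index over visual words. The frame ids and word ids are invented, and ranking by a simple count of shared words is an assumption made for brevity; real systems typically weight words, much as text search engines do.

```python
from collections import Counter, defaultdict

def build_index(frames):
    """Build an inverted index: visual word id -> set of frame ids.

    frames: dict mapping frame id -> list of visual word ids.
    """
    index = defaultdict(set)
    for frame_id, words in frames.items():
        for word in words:
            index[word].add(frame_id)
    return index

def search(index, query_words):
    """Rank frames by how many distinct query words they contain."""
    votes = Counter()
    for word in set(query_words):
        for frame_id in index.get(word, ()):
            votes[frame_id] += 1
    # Most shared words first; ties broken by frame id for a stable order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [frame_id for frame_id, _ in ranked]

# Hypothetical frames, each already reduced to a list of visual word ids.
FRAMES = {
    "frame_001": [3, 17, 42, 42],
    "frame_002": [5, 17, 99],
    "frame_003": [3, 42, 77],
}

INDEX = build_index(FRAMES)
print(search(INDEX, [17, 99]))
```

Only frames that share at least one word with the query are ever touched, which is why the lookup stays fast as the collection grows.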
The key bottleneck here is the clustering stage: reducing the original input of typically hundreds of features per frame -- multiplied by 25 frames per second by minutes, or hours, of video -- to a much smaller set of clusters. It looks like the work in the linked article is using a modified clustering algorithm [wikipedia.org] which does not require all of the data to be in memory at once.
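I don't know which modification the article actually uses, so the following is only one well-known way to cluster without holding the data in memory: sequential (online) k-means, which sees each feature once and keeps nothing but the centroids. The function name, the seeding by the first k points and the toy stream are assumptions for this sketch.

```python
def online_kmeans(stream, k):
    """Sequential k-means: a single pass with one point in memory at a time.

    stream: iterable of equal-length numeric feature vectors.
    The first k points seed the centroids (a simplifying assumption).
    """
    centroids, counts = [], []
    for point in stream:
        if len(centroids) < k:
            centroids.append(list(point))
            counts.append(1)
            continue
        # Pick the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], point)),
        )
        counts[nearest] += 1
        rate = 1.0 / counts[nearest]
        # Running-mean update: nudge the centroid towards the new point.
        centroids[nearest] = [
            c + rate * (p - c) for c, p in zip(centroids[nearest], point)
        ]
    return centroids

def feature_stream():
    """Hypothetical stand-in for features read frame by frame from video."""
    yield from [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]

print(online_kmeans(feature_stream(), k=2))
```

Memory use depends on k and the feature length only, not on the hours of video fed through it. The price is that the result depends on the order of the stream and on the seed points.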
The TRECVID [nist.gov] project is a challenge style exercise where groups compete to provide the best search results for a given set of queries where the search material is hours of video.
Document images (Score:3, Interesting)
I frequently have to create large collections of images from all sorts of file types -- some text-based, some graphics -- that get housed in a collection of images for easy, standardized review. If there were something that could avoid the step of extracting text from them, or later OCRing them and still end up with a searchable image collection, well, that would be exceedingly cool. It would cut the initial time outlay I have to devote to virtually any given project I have to deal with by 25 to 50%.
This is going to spell the death... (Score:1)
Not as useful as it sounds... (Score:2)
For example: I want to find more cat images. I feed it a picture of a white cat. I am more likely to be returned results of white dogs than, say, tabby or black cats.
Unless I'm misunderstanding something?
Hmm... FBI contract with Hitachi, maybe? (Score:1)
Since when can an ancient indian tribe ... (Score:5, Funny)
Hmm, sounds familiar... (Score:2)
(Last Journal: Friday April 27 2007, @02:20PM)
Back in the day (almost 2 decades ago), I was using video rather than still images (which allowed me temporal information as well as spatial information) but I recently wrote a simple application to just use the spatial information to find me images "most-like" a source one. The original goal was to train the system and then try to leverage a semantic processor from the trained system. It worked reasonably well (sometimes astoundingly well) on the database I had (some 300,000 images downloaded from keyed-searches on Google Images).
As Hitachi said, the key is to develop a matching system within a higher numerical dimension. One of the missing pages on the blog (I'll get there!) is how to evaluate the usefulness of any given feature (=dimension) of a region of an image. With this, one can approximate a numerical value for the information being relayed to the recognition-system using that feature, and therefore establish its worth as a feature.
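One common way to put a number on a feature's usefulness is information gain: how many bits of uncertainty about the target label disappear once the feature's value is known. That is not necessarily the measure meant here, and the feature names, labels and samples below are invented, so treat this as a sketch under those assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(samples, feature):
    """Bits of label uncertainty removed by knowing one feature's value.

    samples: list of (features_dict, label) pairs.
    """
    labels = [label for _, label in samples]
    groups = defaultdict(list)
    for features, label in samples:
        groups[features[feature]].append(label)
    # Expected entropy left over after splitting on the feature.
    remainder = sum(
        len(group) / len(labels) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder

# Hypothetical image regions described by two discretised features.
SAMPLES = [
    ({"straight_edges": False, "bright": True}, "grass"),
    ({"straight_edges": False, "bright": False}, "grass"),
    ({"straight_edges": True, "bright": True}, "boat"),
    ({"straight_edges": True, "bright": False}, "boat"),
]

for name in ("straight_edges", "bright"):
    print(name, information_gain(SAMPLES, name))
```

In this toy set straight_edges separates the two targets completely while bright tells you nothing, so a recognition system could keep the first feature and discard the second.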
When you know what you're looking for (your feature set) *and* the value of each of those features to your recognition system targets {man,boat,grass,house,...} you can create reasonably useful discriminators and rule-systems based on those discriminators. Note that the discriminators and the rule-system can be given to the system as a-priori information, but most of them are created and destroyed automatically *by* the recognition system as it evolves. It sounds complex, but really it's a bunch of simple ideas applied one after the other.
Simon.
Desktop search with image pattern recognition AI (Score:2)
Could this technique be applied to sound files? (Score:1)
Saw it done 10 years ago (Score:3, Interesting)
(http://telebody.com | Last Journal: Tuesday July 30 2002, @07:28AM)
Visual Search? (Score:2)
(http://www.evilnet.net/ | Last Journal: Wednesday August 30 2006, @12:30PM)
Older Brother: Found him! He's behind the sofa.
*RING!*
Mother: Hello?
Voice On The Other End Of The Line: Ma'am, this is Pubert Skewya. I'm a lawyer for Duey, Cheatham, and Howe. We represent Hitachi.
Mother: Uhm. Yeah? So what?
VOTOEOTL: Ma'am, we have a record that you just encouraged your son to violate our client's patent on visual searches. Naturally, we'll settle out of court for one billion dollars, American. If you refuse, with the state of the economy as it is, we'll go after you in court, but we'll go after you for one billion dollars, Canadian. If you act now, and concede to our extor*COUGH*rightful demands, you'll save yourself money in the long run.
Mother: Uhm. Yeah. Who is this really?
VOTOEOTL: Ma'am, this is serious. Our client has a patent on visual searches. Every time you tell your son to go look for something, you're contributing to the violation of our client's patents.
Mother: And I know ONE young man who's going to get his ass beaten for putting one of his idiot friends up to this stupid little prank...
*CLICK*
VOTOEOTL to the rest of his call center: SHIT! That's like the millionth one!
decades old (Score:2)
Speed issues (Score:2)
(http://foone.org/ | Last Journal: Sunday July 30 2006, @05:15PM)
My own system does 13 million images in about a minute, but with enough RAM to fit the dataset in memory I can do 10-20 seconds.
I hope they're not just using a cluster to speed up access. That's a workable solution, but it doesn't really help those of us who can't afford a dozen boxes to power our searcher.
Already done (Score:2)
Oh boy! (Score:2)
(http://slashdot.org/)
Photosynth (Score:1)
http://www.youtube.com/watch?v=s-DqZ8jAmv0 [youtube.com]
Glad this is public now (Score:1)
Re:HDD (Score:1, Informative)