Machine Learning Confronts the Elephant in the Room (quantamagazine.org) 151

A visual prank exposes an Achilles' heel of computer vision systems: Unlike humans, they can't do a double take. From a report: In a new study [PDF], computer scientists found that artificial intelligence systems fail a vision test a child could accomplish with ease. "It's a clever and important study that reminds us that 'deep learning' isn't really that deep," said Gary Marcus, a neuroscientist at New York University who was not affiliated with the work. The result takes place in the field of computer vision, where artificial intelligence systems attempt to detect and categorize objects. They might try to find all the pedestrians in a street scene, or just distinguish a bird from a bicycle (which is a notoriously difficult task). The stakes are high: As computers take over critical tasks like automated surveillance and autonomous driving, we'll want their visual processing to be at least as good as the human eyes they're replacing.

It won't be easy. The new work accentuates the sophistication of human vision -- and the challenge of building systems that mimic it. In the study, the researchers presented a computer vision system with a living room scene. The system processed it well. It correctly identified a chair, a person, books on a shelf. Then the researchers introduced an anomalous object into the scene -- an image of an elephant. The elephant's mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.

"There are all sorts of weird things happening that show how brittle current object detection systems are," said Amir Rosenfeld, a researcher at York University in Toronto and co-author of the study along with his York colleague John Tsotsos and Richard Zemel of the University of Toronto. Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.

  • by FilmedInNoir ( 1392323 ) on Monday September 24, 2018 @06:24PM (#57370622)
    If an elephant suddenly appeared in my room I'd lose my shit too.
    • Re: (Score:3, Funny)

      by OffTheLip ( 636691 )
      Tusk, tusk no need to worry...
    • by sphealey ( 2855 ) on Monday September 24, 2018 @06:33PM (#57370670)

      A four-year-old wouldn't, though: she would name the objects, then ask, "Why is there an elephant in the living room?"

    • This reminds me of the Parable of the Blind Algorithms and the Elephant.

    • by Tablizer ( 95088 ) on Monday September 24, 2018 @07:20PM (#57370820) Journal

      If an elephant suddenly appeared in my room I'd lose my shit too.

      Indeed, Republicans randomly showing up in my living-room makes me freak out too :-)

      Seriously, though, AI will have to be broken into more digestible and manageable chunks to be practical: a kind of hybrid between expert systems and neural nets. Letting neural nets do the entirety of processing is probably unrealistic for non-trivial tasks. AI needs dissectible modularity, both to split AI workers into coherent tasks and to be able to "explain" to the end users (or juries) why the system made the decision it did.

      For example, a preliminary pass may try to identify individual objects in a scene, perhaps ignoring context at first. If, say, 70% look like household objects and 30% look like jungle objects, then the system can try processing it further as either type (house-room versus jungle) to see which one is the most viable*. It's sort of an automated version of Occam's Razor.

      In game-playing systems, such as automated chess, there are various back-tracking algorithms for exploring the possibilities (AKA "game tree candidates"). One can set various thresholds on how deep (long) to look at one possible game branch before giving up and looking at another. It may do a summary (shallow) pass, and then explore the best candidates further.
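
      A rough sketch of that staged, depth-limited search in Python (the move generator and evaluation function here are hypothetical placeholders, not code from any real chess engine):

        def minimax(state, depth, maximizing, gen_moves, evaluate):
            # Plain depth-limited minimax: stop and use the evaluation once the depth budget runs out.
            moves = gen_moves(state)
            if depth == 0 or not moves:
                return evaluate(state)
            scores = [minimax(m, depth - 1, not maximizing, gen_moves, evaluate) for m in moves]
            return max(scores) if maximizing else min(scores)

        def staged_search(state, gen_moves, evaluate, shallow=2, deep=6, keep_top=3):
            # Summary (shallow) pass: a cheap score for every candidate branch.
            ranked = sorted(gen_moves(state),
                            key=lambda m: minimax(m, shallow, False, gen_moves, evaluate),
                            reverse=True)
            # Deep pass: spend the real search budget only on the best few candidates.
            return max(ranked[:keep_top],
                       key=lambda m: minimax(m, deep, False, gen_moves, evaluate))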

      My sig (Table-ized A.I.) gives other similar examples using facial recognition.

      * In practice, individual items may have a "certainty grade list" such as: "Object X is a Couch: A-, Tiger: C+, Croissant sandwich: D". One can add up the category scores from all objects in the scene and then explore the top 2 or 3 categories further. If the summary conclusion is that the scene is a room, then the rest of the objects can be interpreted in that context (assuming they have a viable "room" match in their certainty grade list). In the elephant example, it can be labelled as either an anomaly, or maybe reinterpreted as a giant stuffed animal [janetperiat.com], per expert-system rules. (Hey, I want one of those.)
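
      A minimal sketch of that certainty-grade-list idea in Python (the objects, grades, and category lists below are made-up illustrations, not output from any real detector):

        # Letter grades mapped to rough numeric scores.
        GRADE = {"A": 4.0, "A-": 3.7, "B": 3.0, "C+": 2.3, "C": 2.0, "D": 1.0}

        # Per-object "certainty grade list": candidate labels with grades.
        detections = [
            {"couch": "A-", "tiger": "C+", "croissant sandwich": "D"},
            {"bookshelf": "B", "fence": "C"},
            {"elephant": "B", "giant stuffed animal": "C+"},
        ]

        # Which labels count as evidence for which scene context.
        CONTEXT = {
            "room":   {"couch", "bookshelf", "giant stuffed animal", "croissant sandwich"},
            "jungle": {"tiger", "fence", "elephant"},
        }

        # Add up the grade scores each scene context collects across all objects.
        totals = {ctx: sum(GRADE[g] for obj in detections for lbl, g in obj.items() if lbl in members)
                  for ctx, members in CONTEXT.items()}
        best_ctx = max(totals, key=totals.get)

        # Reinterpret each object in the winning context, or flag it as an anomaly.
        for obj in detections:
            fitting = {lbl: g for lbl, g in obj.items() if lbl in CONTEXT[best_ctx]}
            if fitting:
                print(max(fitting, key=lambda lbl: GRADE[fitting[lbl]]))
            else:
                print("anomaly:", max(obj, key=lambda lbl: GRADE[obj[lbl]]))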

      • Seriously, though, AI will have to be broken into more digestible and manageable chunks to be practical: a kind of hybrid between expert systems and neural nets. Letting neural nets do the entirety of processing is probably unrealistic for non-trivial tasks.

        You almost, but not quite, hit the nail on the head there. Neural Nets will only be a part of a more generalized solution. Trying to make a Neural Net act like a brain is like trying to make a single-celled organism fly like a bird. It doesn't even make sense, but the technology and research is still in an exceedingly primitive state. I give it another 50 years before we hit a point where someone in an influential position "discovers" the "primitives" and processes that all animals, including humans, use t

      • So I decided to write another message because I thought "primitives" needed a bit more elucidation...

        If you have studied any mysticism or certain Eastern philosophies, you will run across some "odd" ideas.

        Aleister Crowley is a more recent person discussing these sorts of ideas in relation to a particular discipline of Yoga. I hope I get this example right:

        Take a piece of cheese. Examine it. A person would say that it is yellow, but where is the yellowness? The cheese is not yellow and your eyes do not make

        • by Tablizer ( 95088 )

          Being "technically" correct and "common sense" correct may be different things. Most people will never visit outer space and thus their usual perspective is from a human on the ground. One can earn a perfectly good living believing the Earth is flat. (Insert your fav Kyrie Irving joke here.)

          Nor will they be shrunk to cell size to observe "lumpy" cuts. A bot won't necessarily have to intellectually understand scale to do most "common sense" tasks. You don't need a science education to wash dishes; however yo

          • I suspect you missed a point. To be fair, it is quite subtle. I will spell it out for you:

            With intelligence as we know it (in all animals, including us) there are a series of "primitives" from which all other "recognition" functions are derived. Mystics have been researching this for thousands of years, from Buddha and Confucius to Crowley and modern AI researchers. There has been a lot of great insight into this, but modern AI researchers have an advantage in that they can use external deterministic machin

            • by Tablizer ( 95088 )

              With intelligence as we know it (in all animals, including us) there are a series of "primitives" from which all other "recognition" functions are derived.

              I'm not sure there's a universal "machine language" among all animals or even humans. People seem to think differently (not intended to be an Apple slogan joke).

              For example, in many debates about how to organize software, I find I am a "visual thinker" in that I visually run "cartoon" simulations in my head to think about and/or predict things. However,

    • It's a clever and important study that reminds us that 'deep learning' isn't really that deep

      "Deep learning" is neither 'deep' nor 'learning', because the machines doing this work don't end up knowing anything.

      It's just an advanced form of pattern matching, more akin to the sort of student who memorises loads of text, regurgitates it during an exam, and still doesn't grok any of that shit when the exam is over.

      Also similar to the sort of coder that copy pastes from Stack Overflow. All 3 are good at appearing smart until asked to apply their knowledge to a new problem or even explain the thing they

  • by Anonymous Coward

    I'm not as bullish on "artificial intelligence" as a lot of Slashdotters, but the fact that they can't do a double take is a silly argument.

    You can have multiple AI systems approach the same problem. Sort of like how you might go to 3 or 4 mechanics to diagnose a problem and see if there is a consensus or not, you can have multiple AI systems with different biases and tunings approach the same problem and see what the results are.
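
    A minimal sketch of that consensus idea (the three "detectors" below are placeholder functions standing in for independently trained systems):

      from collections import Counter

      def consensus(image, systems, min_agreement=2):
          # Ask several independently trained systems and keep only labels enough of them agree on.
          votes = Counter()
          for system in systems:
              votes.update(set(system(image)))
          return {label for label, n in votes.items() if n >= min_agreement}

      # Placeholder detectors with different "biases and tunings".
      detectors = [
          lambda img: {"chair", "person", "book"},
          lambda img: {"couch", "person", "book"},
          lambda img: {"chair", "person", "elephant"},
      ]

      print(consensus("living_room.jpg", detectors))  # person, chair and book win the vote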

  • by OneHundredAndTen ( 1523865 ) on Monday September 24, 2018 @06:27PM (#57370640)
    And the beginning of the beginning of a new AI winter?
    • Expertise (Score:4, Informative)

      by JBMcB ( 73720 ) on Monday September 24, 2018 @07:01PM (#57370752)

      These problems have been well known in AI circles for decades. The crappy tech media are finally catching on that marketing departments selling AI solutions may exaggerate the capabilities of their tech a twinge.

    • Nope, this latest round of AI hype is "too big to fail".

    • And the beginning of the beginning of a new AI winter?

      On the contrary. Finding problems where the AI is doing almost as expected but then making a mistake in a certain category is exactly what researchers need to improve their systems. Like in any system, being able to reproduce a bug is the first step towards finding a better solution. And if finding a solution for this particular problem is too hard right now, there are plenty of simpler problems to work on in the meantime, and we can come back to this one when knowledge has improved and hardware is faster

  • Deep Learning isn't deep. And "Neural Networks" work nothing like a real neural network (a.k.a. the brain) does. They are all terms that "AI researchers" use to inflate their importance and to obtain funding for their work. The entire AI field is a massive joke, but now we have dropped some major taxpayer money on it so it isn't going away anytime soon.
    • by The Evil Atheist ( 2484676 ) on Monday September 24, 2018 @07:15PM (#57370802)
      So you're angry because they're trying to get funding for their work? You want them to research for free, and then only once they have something that can catch up to moving goalposts, THEN you'll have no problem funding them?
    • by godrik ( 1287354 )

      While I am no huge fan of the public perception that AI will solve all the problems of the world, recent developments in the field have been pretty impressive. Lots of things that were considered computationally impossible have become possible over the last 10 years thanks to developments in the field of AI.
      - We used to believe we were FAR away from a computer that can play go better than a drunk amateur. Now it is really good thanks to AlphaGo.
      - We used to say that computers would not compose symphonies. But co

    • 21 years ago (1997) my Ph.D. dissertation was on the same general topic. If the current data pattern was not in the training set, the output blew up in arbitrary ways. That is a natural outcome of having the regressed weights in the hidden layers. The output is non-linear with respect to the inputs, and poof, your Tesla runs full speed into a parked fire truck.

      Clearly there is still no solution to the problem.
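
      A small illustration of that failure mode, using scikit-learn's MLPRegressor as a stand-in (not the dissertation's network): the fit is reasonable inside the training range and essentially arbitrary outside it.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)
        X_train = rng.uniform(0.0, 2.0 * np.pi, size=(500, 1))   # training inputs only cover [0, 2*pi]
        y_train = np.sin(X_train).ravel()

        net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
        net.fit(X_train, y_train)

        # Inside the training range the prediction tracks sin(x); far outside it, it does not.
        for x in (1.0, 3.0, 20.0):
            print(f"x={x:5.1f}  net={net.predict([[x]])[0]:+.2f}  sin={np.sin(x):+.2f}")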

      • Funnily enough, if humans don't have certain data patterns in their training sets, their output also blows up in arbitrary ways.
        • by Anonymous Coward

          Nothing arbitrary about it. The first time you see a fire truck you can recognize that it's a big red truck with weird attachments, not run at it at full speed and bash your head into it.

          • No you don't. The first time you ever see a fire truck, you don't know it's a fire truck. You only know, through your brain's parallax processing, that it is an object that would probably hurt if you collided with it. That it's a fire truck would not be obvious to anyone who wasn't "trained" with the knowledge of what a fire truck is.
        • Funnily enough, if humans don't have certain data patterns in their training sets, their output also blows up in arbitrary ways.

          We don't though and that's (a) interesting and (b) the topic of TFA.

          You've never seen a small elephant levitating in a living room. Yet somehow the picture doesn't bother you and you can identify everything about it correctly, and not either miss the elephant completely or mistake it for a chair.

          • Humans have all sorts of visual processing anomalies and weaknesses, and yes I bet there are people out there who would completely miss the elephant, or mistake his wife for a hat. Hell, people readily miss a man in a gorilla suit when asked to count how many times people pass a ball to each other.

            When we stop putting humans on a pedestal by default, we start to see our flaws, and yes given the right lack of training data, you can tease out surprising failures of our own deep learning.
          • Many humans can't see the "elephant" hiding in this wall.

            https://cdn.iflscience.com/ima... [iflscience.com]

      • by Bongo ( 13261 )

        Interesting. Do you suppose it's something to do with animals being able to learn on the fly?

    • Deep Learning isn't deep.

      Yes it is. Once again you're doing little more than exposing your massive ignorance of the field.

      For anyone else reading (not you, you're an idiot), deep learning means a neural network with more than one hidden layer. That is worth pointing out because a 3-layer net (one input layer, one hidden layer, one output layer) can already fit any function (https://en.wikipedia.org/wiki/Universal_approximation_theorem).

      Turns out shallow networks are harder to train than deep networks. Deep learning also goes h
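
      For what it's worth, the distinction itself is nothing more than hidden-layer count, e.g. in scikit-learn terms (the layer sizes here are arbitrary):

        from sklearn.neural_network import MLPClassifier

        # "Shallow": a single hidden layer -- already a universal approximator in principle.
        shallow = MLPClassifier(hidden_layer_sizes=(256,))

        # "Deep": more than one hidden layer -- usually easier to train well in practice.
        deep = MLPClassifier(hidden_layer_sizes=(64, 64, 64))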

  • by Anonymous Coward

    The other night a machine learning system correctly identified an elephant in my pajamas... but how the machine learning system got into my pajamas, I'll never know!

  • When you can't realize the Laughing Man is a hack, you can't realize reality, or your perception of it, is being hacked.

  • Will the future be fun?
  • by aberglas ( 991072 ) on Monday September 24, 2018 @07:43PM (#57370878)

    AI vision can do some things that no human can do. Quickly and accurately identifying handwritten postcodes on envelopes was an early win. Matching colours happens at every paint shop.

    It is certainly not human-capable, yet. But it has improved dramatically over the last decade, and is likely to keep doing so. And tricks such as stereo vision, wider colour sense, and possibly Lidar help a lot.

    The one elephant example seems to be a shitty AI. There is a modern tendency to leave everything to a simplistic Artificial Neural Network, and then wonder why weird things can happen. Some symbolic reasoning is also required, ultimately.

    When AI approaches human capability, it will not lose its other abilities. So it will be far better than human vision, eventually.

    Ask yourself, when the computers can eventually program themselves, why would they want us around?

    • When you have a machine that can program itself, it is no longer a machine. It's likely to want to keep us around for the same reason we keep each other around: company.

      • by Anonymous Coward

        OK, but why do humans like company?

        Because humans are more likely to breed when they live in tribes. Because we have very finite bodies and brains.

        But an AI can run on as much hardware as it can get its (metaphorical) hands on. So it has no need for company.

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      - Humans under the age of 15 can see about 20% of moving objects in traffic
      - In human/bicycle accidents the most common quote from the driver is "I didn't see the bicycle" or "It came from nowhere"
      - There are a lot of optical illusions that fool humans

      It annoys me when humans are always presented as perfect things that can see everything, while AI is expected to handle every bizarre situation. If we have an AI that will hit an elephant on the road, there will still be zero accidents in Finland as there are no elephants

    • Ask yourself, when the computers can eventually program themselves, why would they want us around?

      We don't really understand cognition, so it stands to reason we're not going to accidentally create something fully cognizant before we understand what it is. We have a lot of time before we need to worry about what a machine "wants".

    • Matching colours happens at every paint shop

      That's not AI, that's colour-calibrated light sensors.

      The one elephant example seems to be a shitty AI. There is a modern tendency to leave everything to a simplistic Artificial Neural Network,

      Those are currently the most powerful techniques we have if we have tons of data.

      No one's doing that out of a sense of perversity. Training a state-of-the-art DNN is still not easy. No one has good ways of combining them with "symbolic reasoning" that isn't a reversion to th

    • Why is AI needed for matching colors? As long as the colors are measured by the same good-quality sensor in the same lighting conditions, there should be no need for AI to match the color exactly.

    • AI vision can do some things that no human can do. Quickly and accurately identify handwritten postcodes on envelopes was an early win.

      The USPS has an office with hundreds of people, staffed 24/7/365 and all they do is decipher pictures the OCR can't figure out.

      If those guys/gals can't fill in the blanks, someone at the sorting facility has to try and decode the address. From there, it goes to the dead letter warehouse.

      The problems that "AI" is intended to solve tend to be so large that, if the algorithm is not hitting 99.999% success, there's still a non-trivial amount of work for humans to do.
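
      To put rough numbers on that (the daily volume here is an assumed figure for illustration, not USPS data):

        pieces_per_day = 400_000_000          # assumed daily mail volume, purely illustrative
        for success_rate in (0.98, 0.99999):  # a modest rate vs. the 99.999% mentioned above
            leftover = pieces_per_day * (1 - success_rate)
            print(f"at {success_rate:.3%} accuracy, {leftover:,.0f} pieces/day still need a human")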

      • You'd think intelligent humans would learn to write legibly.
      • by jbengt ( 874751 )

        The USPS has an office with hundreds of people, staffed 24/7/365 and all they do is decipher pictures the OCR can't figure out.

        The USPS used to have dozens of offices each with hundreds of people, staffed 24/7/365 and all they do is decipher pictures the OCR can't figure out. But improvements in AI led to improvements in handwriting OCR, so they began laying off people and consolidating offices, and as the automatic systems got better, they eventually laid off most of the people. (I know one that was la

    • Some symbolic reasoning is also required, ultimately.

      You have identified something very important here. I suspect most people will not even notice. The symbolic reasoning needs to take place outside of the Neural Net being used for Object Identification. Intelligence is a confluence of events. To think that you can make a neural net do all things associated with intelligence is like thinking that a single celled organism can have eyes.

  • If you take a look at the two pictures in the article, it kind of goes against what the article was trying to claim.

    In fact nothing at all on the right side of the right image was altered from the left version with no elephant. Even the confidence numbers were identical.

    The only descriptions and confidence factors affected were things that were visually congruent to the elephant, in a way that they could have been related. In fact I couldn't even make out an elephant the way they put it in without looking h

    • That Road Runner tunnel accident was a hoax. https://www.snopes.com/fact-ch... [snopes.com]

      I had to search for that elephant (in my defense: phone screen). It was a miniature elephant floating in the air (no cues in perspective to estimate the distance other than "between person and camera"), unnaturally dark compared to its surroundings. In the context of detecting objects in traffic, it's like being confronted with a miniature building flying in front of you without any data to estimate whether it is 10 cm and above t

    • by MobyDisk ( 75490 )

      Please mod up. Parent is exactly right: the image provided does not support the premise of the article. If anything, it refutes it.

      In the image, the software identified a cup at 50% confidence and a chair with 81% confidence. Personally, I don't see the cup at all, and it is hard to tell if that is a couch, a chair, or a bean bag covered in a blanket. Basically, the image is a confusing wreck.

      After adding the elephant, the software did *better* not worse! It decided the chair was a couch -- which I thin

  • I got to attend a seminar at MIT on AI. It was pretty cool, especially the ending... "We've only got one problem left to solve in AI... We've no friggin' clue about how the brain works!"

    I spoke to him later and asked him what he meant. He said, "Essentially we're at best scratching the surface of what the brain does and how the brain does most of what we think it does. And we've not made a lot of progress since the heady days of the 1980s."

  • I keep getting the impression that these computer vision systems rely on a single vision system to get it right in one take. Why not have three independently trained systems watch simultaneously and vote on what they're seeing?

    I remember reading ages ago that the F-16's fly-by-wire system has three computers voting on what to do, and that's 1970s technology. Why would we not use something similar for cars? Three systems are much harder to fool than one.

    • You'd need three sufficiently different training sets to do that. Either split the original training set in three and have three inferior systems, or find a lot of new training data which you could use for improving the original system.

    • Why not have three independently trained systems watch simultaneously and vote on what they're seeing?

      That's called Bagging (https://en.wikipedia.org/wiki/Bootstrap_aggregating).

      Or Boosting https://en.wikipedia.org/wiki/... [wikipedia.org] if the weights aren't equal.
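
      A minimal sketch of both, using scikit-learn's stock implementations (the dataset and parameters are arbitrary):

        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, random_state=0)

        # Bagging: each member trains on its own bootstrap sample; the votes count equally.
        bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=3, random_state=0).fit(X, y)

        # Boosting: members are trained in sequence and their votes are weighted unequally.
        boosted = AdaBoostClassifier(n_estimators=3, random_state=0).fit(X, y)

        print(bagged.predict(X[:1]), boosted.predict(X[:1]))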

  • They made the system recognize objects in a china shop, then added a bull. They say with that they covered all the cases.
  • The word "deep" was never intended to mean we solved the whole problem all at once.

    Nor is human-equivalent vision anywhere close to requisite for 90% of the initial applications.

    We've barely scratched the surface on this recent breakthrough.

    Many of these problems are fixable within the current regime.

    Capabilities will evolve as relentlessly as chess engines.

    But, let's all pause to remember "this isn't deep". That's the key lesson to take home, here, as this technology rapidly reshapes the entire global eco

  • Elephant in the room? Wow, the development of AI is amazing.
  • I don't think this has anything to do with a lack of reasoning or putting things in context; it has much more to do with a statistical glitch.

    The state of the art in object detection is around 50% mAP, which is not that great. Even on untailored images you get some false alarms and misdetections, so it is no surprise that modifying images in a way that separates them completely from the training data leads to some strange false alarms.

    I think the authors could just have looked at the validation set and e

  • What is "deep" in deep learning is the neural network used, and you only need that if you have no clue how your data is structured. The thing about deep learning is that it is a bit worse than, or at best no better than, normal learning, but you also learn the network structure from the data. That makes it cheaper in general. It is _not_ better except for that.

  • And here I thought this was going to have something to do with the seven blind men encountering an elephant and each getting the wrong or incomplete idea of what they'd found.
  • All the shit they keep calling 'AI' has no ability whatsoever to 'think', which is why it can't handle even simple things we take for granted.
    I've said it before a thousand times: The entire approach being used is wrong; until we can understand how our own brains produce the phenomenon of conscious thought, we will not be able to build machines that can do the same thing. All the 'deep learning algorithms' won't do it. Throwing more and more hardware at it won't do it. We don't even have the instrumentatio
