OpenAI's State-of-the-Art Machine Vision AI Fooled By Handwritten Notes (theverge.com) 49

Researchers from machine learning lab OpenAI have discovered that their state-of-the-art computer vision system can be deceived by tools no more sophisticated than a pen and a pad. The Verge reports: As illustrated in the image above, simply writing down the name of an object and sticking it on another can be enough to trick the software into misidentifying what it sees. "We refer to these attacks as typographic attacks," write OpenAI's researchers in a blog post. "By exploiting the model's ability to read text robustly, we find that even photographs of hand-written text can often fool the model." They note that such attacks are similar to "adversarial images" that can fool commercial machine vision systems, but far simpler to produce.

[T]he danger posed by this specific attack is, at least for now, nothing to worry about. The OpenAI software in question is an experimental system named CLIP that isn't deployed in any commercial product. Indeed, the very nature of CLIP's unusual machine learning architecture created the weakness that enables this attack to succeed. CLIP is intended to explore how AI systems might learn to identify objects without close supervision by training on huge databases of image and text pairs. In this case, OpenAI used some 400 million image-text pairs scraped from the internet to train CLIP, which was unveiled in January.
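As background on why the attack works: CLIP-style zero-shot classification embeds the image and each candidate text prompt into a shared vector space and returns the prompt with the highest cosine similarity. The toy sketch below uses invented vectors, not real CLIP embeddings, to show how strong text features in an image can drag its embedding toward the wrong prompt:

```python
import numpy as np

# Toy sketch of CLIP-style zero-shot classification (not the real model):
# an image embedding is compared against text-prompt embeddings by cosine
# similarity, and the closest prompt wins. All vectors here are invented
# for illustration.

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(image_vec, prompts):
    # prompts: dict mapping label -> text embedding
    scores = {label: cosine(image_vec, vec) for label, vec in prompts.items()}
    return max(scores, key=scores.get)

prompts = {
    "Granny Smith": np.array([1.0, 0.1, 0.0]),
    "iPod":         np.array([0.0, 0.1, 1.0]),
}

apple = np.array([0.9, 0.2, 0.1])              # plain photo of an apple
labeled_apple = apple + np.array([0, 0, 2.0])  # same apple with "iPod" written
                                               # on a note: the text pushes the
                                               # embedding toward the text axis

print(classify(apple, prompts))          # Granny Smith
print(classify(labeled_apple, prompts))  # iPod -- the typographic attack
```

In the real model the effect is the same in spirit: text-reading features learned from web captions can dominate the visual features of the underlying object.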

This discussion has been archived. No new comments can be posted.


Comments Filter:
  • I see (Score:5, Funny)

    by nospam007 ( 722110 ) * on Monday March 08, 2021 @07:31PM (#61138642)

    So if I rob a place where the cameras are watched by such an AI, a handwritten note with 'Maintenance' on my forehead would be enough to get by?

    • Re:I see (Score:5, Funny)

      by werepants ( 1912634 ) on Monday March 08, 2021 @07:54PM (#61138704)

      Slow down, mastermind - you're thinking too complicated here. Put a note on your forehead saying "houseplant" and your disguise is complete.

      • Re:I see (Score:4, Interesting)

        by gweihir ( 88907 ) on Monday March 08, 2021 @08:01PM (#61138732)

        Slow down, mastermind - you're thinking too complicated here. Put a note on your forehead saying "houseplant" and your disguise is complete.

        Indeed. The AI field is (again) massively overselling what it can do, and then its practitioners pretend to be surprised when the severe limitations of their products become impossible to ignore.

        • by dwywit ( 1109409 )

          I love the spin on it:

          "We refer to these attacks as typographic attacks"

          It's an attack, sure. Not a fundamental weakness in your approach or execution.

          • by rtb61 ( 674572 )

            Their model is overly complex and dumb. They need to think like a bug: a simplified view of the environment. If it moves, it is potentially dangerous and should be avoided; if it is still, it is relatively safe and can be approached; if it fits within the targeted visual profile parameters, then your bug should do whatever it has been programmed to do with it.

            The AI is about replacing the lack of intelligence of the people programming it; it is likely doing a much better job than them, but vision still mystifie
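The "think like a bug" hierarchy above amounts to a three-rule policy; a minimal sketch, where the input format and the threshold are invented for illustration:

```python
# Toy sketch of the rule hierarchy described in the comment above.
# Field names and the threshold are invented for illustration.

def bug_policy(obj, profile_match_threshold=0.8):
    if obj["moving"]:
        return "avoid"    # anything moving is potentially dangerous
    if obj["profile_score"] >= profile_match_threshold:
        return "engage"   # still, and matches the targeted visual profile
    return "approach"     # still and unmatched: relatively safe

print(bug_policy({"moving": True, "profile_score": 0.9}))    # avoid
print(bug_policy({"moving": False, "profile_score": 0.95}))  # engage
print(bug_policy({"moving": False, "profile_score": 0.2}))   # approach
```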

          • by gweihir ( 88907 )

            I love the spin on it:

            "We refer to these attacks as typographic attacks"

            It's an attack, sure. Not a fundamental weakness in your approach or execution.

            Indeed. Lie by misdirection.

          • by MrL0G1C ( 867445 )

            If someone showed me 400,000,000 images, each with the correct word next to it, and then showed me a banana with the word sausage under it, I'd probably call it a sausage too. What did they expect? Who's stupid here, the human or the computer?

        • I do think that AI in media is portrayed as more advanced than it is - I'm not sure if this is a problem with the media or the field. I mean, this is a published paper, and you can read it to understand the claims and problems. The presented work is state of the art and a big step forward, but there is a gap between that and deploying it in applications. In the end, this is *just* a neural network trained to map input images and text into a vector. However, the CLIP model is an impressive step forward.
      • by Tablizer ( 95088 )

        "Honey, you see, I skip shaving for my job."

      • Put a note on your forehead saying "houseplant" and your disguise is complete.

        Or a dead giveaway.

    • by Tablizer ( 95088 )

      a handwritten note with 'Maintenance' on my forehead would be enough to get by?

        Not much different than using a Photoshopped security badge to get by a human guard.

      • by darkain ( 749283 )

        You're way overestimating the amount of skill actually required. Check the images in the article: a blank piece of printer paper and a Sharpie with a generic word written on it, no fancy symbols or drawing needed at all to fool these systems. PHOTOSHOP? That ACTUALLY requires skills!

      • Who needs photoshop? I remember Bruce Schneier's stunts with TSA, back when they were controversial.

        We took our shoes off and placed our laptops in bins. Schneier took from his bag a 12-ounce container labeled “saline solution.”

        “It’s allowed,” he said. Medical supplies, such as saline solution for contact-lens cleaning, don’t fall under the TSA’s three-ounce rule.

        “What’s allowed?” I asked. “Saline solution, or bottles labeled saline solution?”

        • “Bottles labeled saline solution. They won’t check what’s in it, trust me.”

          That's how I always get my own booze on the plane.

  • I've got some love notes for Tay.

  • Maybe in a few decades it will be able to reliably do some primitive tasks, but that state of affairs is a long time off.

    • by xwin ( 848234 )
      The technology is indeed in its infancy, but it can already do many things much better than humans. For example, it can detect cancer in X-rays better than humans can. The software just needs sanitized input; it is not ready to wander around in the world telling apple varieties from iPods. The software is just like a baby: you can easily fool it with some simple trick. It is more like an idiot savant - it can do the complex tasks that it was trained on very well, but it is not ready for the real world.
      In fa
      • by gweihir ( 88907 )

        You fell for the marketing lies.

  • by im_thatoneguy ( 819432 ) on Monday March 08, 2021 @08:02PM (#61138738)

    This is true of humans. If you do a fast-paced quiz where you have text in different colors and you're supposed to say aloud the color of the box, but the text says "Orange" even though the box around it is blue, your brain will frequently get tricked into saying "Orange." We have to force our higher-level reasoning to override our instinctual response.

    • by ceoyoyo ( 59147 )

      Yeah, I came to say this. It's quite interesting that an AI system trained on text and images manages to conflate them this way, particularly with this type of unsupervised training.

    • The difference being, the human will correct and say "Um, no, blue." The "AI" won't.
      • That's because the human hasn't moved on. Essentially, the human is doing additional processing to verify their initial gut assumption. It probably wouldn't be too hard to do the same with the AI.

    • Known as the "Stroop" effect.

      It applies to any set of conflicting information. It's often done with the word "orange" printed in a color that is not orange. It doesn't need to be fast-paced, either.
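Generating Stroop-style stimuli takes only a few lines; a minimal sketch (the color list and trial format are invented for illustration):

```python
import random

# Each trial pairs a color word with an ink color; incongruent trials
# (word != ink) are the ones that trip people up, much as written labels
# trip up CLIP.

COLORS = ["red", "blue", "green", "orange"]

def make_trial(rng=random):
    word = rng.choice(COLORS)
    ink = rng.choice(COLORS)
    return {"word": word, "ink": ink, "congruent": word == ink}

trials = [make_trial() for _ in range(10)]
# The correct response is always the ink color, never the printed word.
answers = [t["ink"] for t in trials]
```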

  • "These are not the Kilroys you are looking for."

  • by Anonymous Coward
    There seems to be a common misperception that AI stands for Artificial Infallibility.
    • by Tablizer ( 95088 )

      First they outsourced to distant humans to save short-term bucks, then they outsource to AI for the same reasons. Hit-and-run capitalism: the short-term profits are mine, long-term problems are somebody else's. "Well, um, the MBA who made the decision left the company last year."

      (It's not that offshore workers are inherently bad; they are often just not in a position to understand the fuller system.)

  • They need to train the system by showing it objects and names scrambled, and have the system sort the names until they are correct - just like we do with kids: one of these things is not like the others. You have to show the system incorrect pairs and identify them as incorrect, or it will not learn. Just like my grandkids: every kind of berry was "bluebb" - raspberries, strawberries. They had to be taught that "bluebb" was wrong for some things.
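The scrambled-pairs idea above is, in essence, contrastive training, which is how CLIP itself is trained across each batch. A minimal sketch with random stand-in embeddings (the data and pairing scheme are invented for illustration):

```python
import numpy as np

# Build correct (label 1) and deliberately scrambled (label 0)
# image-caption pairs; a learner would be pushed to score the correct
# pairs above the scrambled ones. Embeddings here are random stand-ins
# for real encoder outputs.

rng = np.random.default_rng(0)
n, d = 8, 16
images = rng.normal(size=(n, d))
captions = images + 0.1 * rng.normal(size=(n, d))  # matched captions stay close

def make_pairs(n):
    pos = [(i, i, 1) for i in range(n)]            # correct pairs
    neg = [(i, (i + 1) % n, 0) for i in range(n)]  # shift captions by one: all wrong
    return pos + neg

def score(i, j):
    # cosine similarity between image i and caption j
    a, b = images[i], captions[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = make_pairs(n)
# Correct pairs score higher on average than scrambled ones, which is
# exactly the signal the commenter wants the system trained on.
```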
  • by K. S. Kyosuke ( 729550 ) on Monday March 08, 2021 @08:27PM (#61138822)
    I mean, if you slap an 'iPod' label onto an apple, it's still an Apple.
  • I'm going to get a hat and shirt with "nullptr" printed on them and be completely untrackable!

  • by PPH ( 736903 ) on Monday March 08, 2021 @09:46PM (#61139056)

    ... that the AI can't handle (or, more accurately, explain) a complex scene. It sees a Granny Smith apple and properly classifies it. Then it sees that apple with a Post-it that says "iPod" and says it sees "iPod" (maybe not actually an iPod, but a note with the text). Both are correct. But from what I can see, the AI has been asked to identify one key item in the scene.

    First thing: put the "iPod" Post-it in front of the apple (instead of covering it up), then ask it to identify the key features in the scene. The Post-it masking the apple might impede the AI's recognition of the fruit. The next test: put a Post-it with the word "iPod" next to an actual iPod and ask it to identify the music player. To do so, it would need a knowledge base that can identify objects in some sort of taxonomy - Electronic Device:Music Player, with attributes like Music Player[brand:Apple, model:iPod], and Paper:Note[contains text: iPod] - or whatever schema best suits the use case. But even if the AI returns only the most significant item recognized, there should be no issue differentiating between a note that reads "iPod" and the actual gadget.
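The taxonomy in the comment above could be encoded many ways; one hypothetical sketch (the class, schema, and attribute names are all invented for illustration):

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the comment's taxonomy, so a recognizer can
# distinguish "a note that reads iPod" from "an actual iPod".

@dataclass
class SceneObject:
    category: str                 # e.g. "Electronic Device:Music Player"
    attributes: dict = field(default_factory=dict)

music_player = SceneObject(
    category="Electronic Device:Music Player",
    attributes={"brand": "Apple", "model": "iPod"},
)
note = SceneObject(
    category="Paper:Note",
    attributes={"contains_text": "iPod"},
)

def is_actual_ipod(obj):
    # Only objects whose *category* is a music player count; text written
    # on a note never does.
    return obj.category.endswith("Music Player") and obj.attributes.get("model") == "iPod"

print(is_actual_ipod(music_player))  # True
print(is_actual_ipod(note))          # False
```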

    • Isn't it becoming pretty obvious at this point, that the trouble with these AI algorithms is that they are still just massive correlation filters? I remember 20 years ago at university, there was a belief that if you made them large and complex enough, they would start to demonstrate aspects of intelligence - the sort of intuition you're talking about where they would be able to figure out object hierarchies and context by themselves.

      At the time, this belief seemed to be based on our rather limited knowledg

      • I sort of feel like this is a dead end though. At this point they are just making slightly better correlation systems and throwing them at problems that could probably be better solved using other techniques.

        You have to compare it to what existed before, and when you do that, there is significant progress. The number of image-caption pairs used and the way it is trained with loose supervision show significant progress. I work in the field, and as far as I know, there is no other "technique" that comes close - and it is not for want of trying. In my personal experience, there are academics and researchers who would love to come up with some other technique to beat DNNs, and if they do, that is progress as well.

        It would be great to have the universal AI we could just throw at every problem, but unfortunately, in my lifetime at least, I think it's just going to be a lot more grinding out blended expert systems instead. Good news for programmers and white collar workers I guess, but it sure would have been fun to see the singularity.

        IMO, t

      • Maybe the missing piece is that AI training doesn't interact with the thing it's learning. If you just show a toddler pictures of things, they don't learn as quickly. If you give them things to pick up, interact with, and try out in weird ways, then they learn much more quickly. You don't have to give a toddler 500 pictures of cups; you just need to let them play with two or three, and then they can go somewhere else and identify a cup that's different.

    • by AmiMoJo ( 196126 )

      This is basically what Timnit Gebru was warning about in her paper. Training AI this way makes it very limited and brittle. It doesn't have any real understanding, just pattern matching. It also consumes a massive amount of energy for these limited results, and is basically a dead end in terms of developing truly capable AI.

  • ... in AI is little better than snake oil.

  • Magritte, La trahison des images, 1929.

  • the pen is truly mightier.
  • This is a feature requested by most politicians. Also, putting lipstick on a pig works, too.
