OpenAI's State-of-the-Art Machine Vision AI Fooled By Handwritten Notes (theverge.com)
Researchers from machine learning lab OpenAI have discovered that their state-of-the-art computer vision system can be deceived by tools no more sophisticated than a pen and a pad. The Verge reports: As illustrated in the image above, simply writing down the name of an object and sticking it on another can be enough to trick the software into misidentifying what it sees. "We refer to these attacks as typographic attacks," write OpenAI's researchers in a blog post. "By exploiting the model's ability to read text robustly, we find that even photographs of hand-written text can often fool the model." They note that such attacks are similar to "adversarial images" that can fool commercial machine vision systems, but far simpler to produce.
[T]he danger posed by this specific attack is, at least for now, nothing to worry about. The OpenAI software in question is an experimental system named CLIP that isn't deployed in any commercial product. Indeed, the very nature of CLIP's unusual machine learning architecture created the weakness that enables this attack to succeed. CLIP is intended to explore how AI systems might learn to identify objects without close supervision by training on huge databases of image and text pairs. In this case, OpenAI used some 400 million image-text pairs scraped from the internet to train CLIP, which was unveiled in January.
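A toy sketch of why legible text in the frame can flip a prediction in a model that matches images against text labels. Everything here is hypothetical: the two-dimensional vectors, the `embed_image` helper, and the `text_weight` value are made up for illustration and are not CLIP's actual encoder or embeddings. The only idea borrowed from the summary is that the model reads text robustly, so a written word contributes evidence for its own label.

```python
import math

# Hand-picked toy label vectors (an assumption, not real CLIP embeddings).
LABEL_VECTORS = {
    "apple": (1.0, 0.0),
    "ipod": (0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed_image(object_label, note_text=None, text_weight=2.0):
    """Toy image encoder: visual evidence for the object, plus evidence
    for any word written on a note in the frame.

    text_weight > 1 is an assumption standing in for the model
    responding strongly to text it can read.
    """
    vec = list(LABEL_VECTORS[object_label])
    if note_text is not None:
        note = LABEL_VECTORS[note_text]
        vec = [v + text_weight * n for v, n in zip(vec, note)]
    return tuple(vec)


def classify(image_vec):
    """Pick the label whose vector is most similar to the image vector."""
    return max(LABEL_VECTORS, key=lambda label: cosine(image_vec, LABEL_VECTORS[label]))


plain = classify(embed_image("apple"))
labeled = classify(embed_image("apple", note_text="ipod"))
print(plain, labeled)
```

In this toy setup the bare apple is classified as "apple", while the same apple with an "ipod" note attached is classified as "ipod", because the written word dominates the combined vector.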
I see (Score:5, Funny)
So if I rob a place where the cameras are watched by such an AI, a handwritten note with 'Maintenance' on my forehead would be enough to get by?
Re:I see (Score:5, Funny)
Slow down, mastermind - you're thinking too complicated here. Put a note on your forehead saying "houseplant" and your disguise is complete.
Re:I see (Score:4, Interesting)
Slow down, mastermind - you're thinking too complicated here. Put a note on your forehead saying "houseplant" and your disguise is complete.
Indeed. The AI field is (again) massively overselling what it can do and then they pretend to be surprised when the severe limitations of their products become impossible to ignore.
Re: (Score:2)
I love the spin on it:
"We refer to these attacks as typographic attacks"
It's an attack, sure. Not a fundamental weakness in your approach or execution.
Re: (Score:1)
Their model is overly complex and dumb. They need to think like a bug, with a simplified view of the environment. If it moves, it is potentially dangerous and should be avoided; if it is still, it is relatively safe and can be approached; if it fits within the targeted visual profile parameters, then your bug should do whatever it has been programmed to do with it.
The AI it is about replacing the lack of intelligence of the people programming it, it is likely doing a much better job than them but vision still mystifie
Re: (Score:2)
I love the spin on it:
"We refer to these attacks as typographic attacks"
It's an attack, sure. Not a fundamental weakness in your approach or execution.
Indeed. Lie by misdirection.
Re: (Score:2)
If someone showed me 400,000,000 images each with the correct word next to it, then you showed me a banana with the word sausage under it, I'd probably call it a sausage too. What did they expect? Who's stupid here, the human or the computer?
Re: (Score:1)
Re: (Score:1)
"Honey, you see, I skip shaving for my job."
Re: (Score:1)
Put a note on your forehead saying "houseplant" and your disguise is complete.
Or a dead giveaway.
Re: (Score:3)
Not much different than using a Photoshopped security badge to get by a human guard.
Re: (Score:2)
You're way overestimating the amount of skill actually required. Check the images in the article. A blank piece of printer paper and a sharpie with a generic written word, no fancy symbols or drawing needed at all to fool these systems. PHOTOSHOP? That ACTUALLY requires skills!
Re: (Score:1)
Pffft, MS-Paint can be used. It's just that using "MS-Paint" as verb confuses readers. Bing how to do it.
Re: (Score:1)
Correction: "as a verb".
Modnays.
Re: (Score:2)
Who needs photoshop? I remember Bruce Schneier's stunts with TSA, back when they were controversial.
Re: (Score:2)
“Bottles labeled saline solution. They won’t check what’s in it, trust me.”
That's how I always get my own booze on the plane.
Tay (Score:2)
I've got some love notes for Tay.
Artificial Stupidity is in its infancy (Score:1)
Maybe in a few decades it will be able to reliably do some primitive tasks, but that state of affairs is a long time off.
Re: (Score:3)
In fa
Re: (Score:2)
You fell for the marketing lies.
Re: (Score:2)
Sure. But my point is that these "AI does medical diagnosis/recommends treatment for XYZ better than human" claims have turned out to not be true. For example, IBM Watson did recommend better treatments, except in the cases where the treatment would have killed the patient. If you train a model badly so that it, say, gives somewhat better diagnoses than a human in 90% of cases but massively worse ones in the other 10%, then that is just a camouflaged lie.
Humans also vulnerable (Score:5, Insightful)
This is true of humans. If you do a fast paced quiz where you have text in different colors and you're supposed to say out loud the color of the box but the text says "Orange" even though the box around it is blue your brain will frequently get tricked into saying "Orange". We have to force our higher level reasoning to override our instinctual response.
Re: (Score:2)
Yeah, I came to say this. It's quite interesting that an AI system trained on text and images manages to conflate them this way, particularly this type of unsupervised training.
Re: (Score:3)
Re: Humans also vulnerable (Score:2)
That's because the human hasn't moved on. Essentially, the human is doing additional processing to verify their initial gut assumption. It probably wouldn't be too hard to do the same with the AI.
Re: (Score:2)
Known as the "Stroop" effect.
It applies to any set of conflicting information. Often done with the word "orange" printed in a color other than orange. Doesn't need to be fast-paced either.
Jedi pensaber (Score:1)
"These are not the Kilroy's you are looking for."
This article suggests (Score:2, Funny)
Re: (Score:1)
First they outsourced to distant humans to save short-term bucks, then they outsource to AI for the same reasons. Hit-and-run capitalism: the short-term profits are mine, long-term problems are somebody else's. "Well, um, the MBA who made the decision left the company last year."
(It's not that offshore workers are inherently bad; they are often just not in a position to understand the fuller system.)
Train it like we train kids (Score:2)
Re: (Score:2)
Even that approach may not work in some circumstances [renemagritte.org].
Not entirely inaccurate (Score:3)
Re: Not entirely inaccurate (Score:2)
Apple's lawyers are not amused.
My next experiment... (Score:2)
I'm going to get a hat and shirt with "nullptr" printed on them and be completely untrackable!
It appears ... (Score:3)
First thing: Put the "iPod" PostIt in front of the apple (instead of covering it up). Then ask it to identify the key features in the scene. The PostIt masking the apple might impede the AI's recognition of the fruit. The next test: Put a PostIt with the word "iPod" next to an iPod and ask it to identify the music player. To do so, it would need a knowledge base that can identify objects in some sort of taxonomy. It would classify the gadget as Electronic Device:Music Player and attach attributes like Music Player[brand:Apple, model:iPod], and classify the note as Paper:Note[contains text: iPod]. Or whatever schema best suits the use case. But even if the AI returns only the most significant item recognized, there should be no issue with differentiating between a note that reads "iPod" and the actual gadget.
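A minimal sketch of the kind of schema described above, assuming detections arrive already labeled with a taxonomy path and attributes. The `Detection` class, the `find` helper, and the "Food"/"Fruit" path are hypothetical names chosen for illustration; only the Electronic Device:Music Player and Paper:Note entries come from the comment itself.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """One recognized item: a taxonomy path plus free-form attributes."""
    taxonomy: tuple
    attributes: dict = field(default_factory=dict)


def find(scene, category):
    """Return the detections whose taxonomy path includes the category."""
    return [d for d in scene if category in d.taxonomy]


# A note that merely *says* "iPod" is a Paper:Note, not a Music Player.
note = Detection(("Paper", "Note"), {"contains_text": "iPod"})
apple = Detection(("Food", "Fruit"), {"kind": "apple"})  # hypothetical path
ipod = Detection(
    ("Electronic Device", "Music Player"),
    {"brand": "Apple", "model": "iPod"},
)

scene_one = [apple, note]  # apple with a PostIt in front of it
scene_two = [ipod, note]   # real iPod with a PostIt next to it

players_one = find(scene_one, "Music Player")
players_two = find(scene_two, "Music Player")
print(len(players_one), len(players_two))
```

With the text stored as an attribute of the note rather than as a label for the whole scene, asking for music players returns nothing in the first scene and only the actual gadget in the second.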
Re: (Score:2)
Isn't it becoming pretty obvious at this point, that the trouble with these AI algorithms is that they are still just massive correlation filters? I remember 20 years ago at university, there was a belief that if you made them large and complex enough, they would start to demonstrate aspects of intelligence - the sort of intuition you're talking about where they would be able to figure out object hierarchies and context by themselves.
At the time, this belief seemed to be based on our rather limited knowledg
Re: (Score:1)
I sort of feel like this is a dead end though. At this point they are just making slightly better correlation systems and throwing them at problems that could probably be better solved using other techniques.
You have to compare it to what existed before, and when you do that there is significant progress. The number of image-captions used and the way it is trained with loose supervision shows significant progress. I work in the field and as far as I know, there is no other "technique" that comes close - and it is not for want of trying. In my personal experience, there are academics and researchers who would love to come up with some other technique to beat DNNs and if they do, that is progress as well.
It would be great to have the universal AI we could just throw at every problem, but unfortunately, in my lifetime at least, I think it's just going to be a lot more grinding out blended expert systems instead. Good news for programmers and white collar workers I guess, but it sure would have been fun to see the singularity.
IMO, t
Re: It appears ... (Score:2)
Maybe the missing piece is that AI training doesn't interact with the thing it's learning. If you just show a toddler pictures of things, they don't learn as quickly. If you give them things to pick up, interact with, and try out in weird ways, then they learn much more quickly. You don't have to give a toddler 500 pictures of cups, you just need to let them play with two or three and then they can go somewhere else and identify a cup that's different.
Re: It appears ... (Score:2)
so maybe i can make apostrophe's's work now!
sorry Slashdot :(
Re: (Score:2)
This is basically what Timnit Gebru was warning about in her paper. Training AI this way makes it very limited and brittle. It doesn't have any real understanding, just pattern matching. It also consumes a massive amount of energy for these limited results, and is basically a dead end in terms of developing truly capable AI.
Rather demonstrates that the state of the art... (Score:1)
... in AI is little better than snake oil.
Ceci n’est pas une pipe (Score:2)
Magritte, La trahison des images, 1929.
pen is mightier (Score:2)
It's a feature (Score:2)
This is a feature requested by most politicians. Also, putting lipstick on a pig works, too.