We're not there yet but this effort by Microsoft is, IMHO, as smart as a mouse.
Mice are pretty smart; I'd argue that current AIs are at an insect level of "intelligence".
What's obvious from these results is that the AI has no idea what it's looking at. This is typical for a trained neural net: it finds the best matching pattern in an image, and maps that to one of its output categories. It does not distinguish between a random black-and-white blob and a penguin, so long as both match the pattern.
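To make the "best matching pattern" point concrete, here is a toy sketch in Python. It is not how Microsoft's system or any real neural net is implemented: the 3x3 templates, the `match_score` rule, and the `BLOB` image are all made up for illustration. It only shows the general shape of a closed-set classifier, which always picks one of its known categories and has no way to answer "that's not anything I know".

```python
import math

# Hypothetical 3x3 "images", flattened row by row: 1 = dark pixel, 0 = light pixel.
# These templates are invented for illustration; a real net learns its patterns.
TEMPLATES = {
    "penguin":    [1, 0, 1,  1, 0, 1,  1, 0, 1],  # dark sides, light middle
    "school bus": [1, 1, 1,  1, 1, 1,  0, 0, 0],  # dark body, light bottom strip
}


def match_score(image, template):
    """Count matching pixels minus mismatching pixels."""
    return sum(1 if a == b else -1 for a, b in zip(image, template))


def softmax(scores):
    """Turn raw scores into values that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image):
    """Return (label, confidence) for the best matching template.

    There is no "none of the above" output: every input gets a label.
    """
    labels = list(TEMPLATES)
    probs = softmax([match_score(image, TEMPLATES[name]) for name in labels])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# A meaningless striped blob that happens to resemble the "penguin" template.
BLOB = [1, 0, 1,  1, 0, 1,  1, 0, 0]

if __name__ == "__main__":
    label, confidence = classify(BLOB)
    print(label, round(confidence, 3))
```

The blob has no beak, eyes, or flippers, yet it comes out as "penguin" with high confidence, simply because that template is the closest match among the categories on offer.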
A mouse, and a true AI, has spatial understanding. It (intuitively) knows that the images represent objects in space, and can recreate a coarse 3D model of what it sees. It then breaks the scene down into basic features and identifies it based on those features. It might say: hey, these blobs remind me of a penguin, but it will never say that they *are* a penguin, because the blob is missing the beak, eyes, flippers, and feet.
Basically, what we have now are the neural nets we already had 50 years ago, only on much faster hardware, combined with a bot and a web search engine. It's ELIZA on steroids, and still a long, long way from actual intelligence.