The computer isn't trying to find food or avoid predators, so what is it "trying to do" when it "sees"?
Fortunately, we can answer this, because we (in the general sense) designed the algorithms.
It's trying, very specifically, to get a good score on the MNIST or ImageNet datasets. Anything far away from that data produces funny results. I'm not being glib; this plays out as follows:
One generally assumes that the data lies on some low-dimensional manifold of the full pixel space, which for 256x256 greyscale images has 256^2 = 65,536 dimensions. This is reasonable: a 65,536-dimensional space is very, very large.
A neural net essentially warps the crap out of the space, projects up into higher dimensions, warps the crap out of it again (and so on) and eventually places down a linear classifier. Things on one side of a hyperplane belong to one class, things on the other side belong to another class.
Or, if you prefer, it places some curved decision boundary down in the original space.
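To make the "warp, then slice with a hyperplane" picture concrete, here's a minimal numpy sketch. The layer sizes are made up and the weights are random stand-ins for whatever training would actually produce; the point is just the structure: nonlinear layers bend the space, then a single hyperplane in the warped space decides the class.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Two "warps": each layer bends/folds the space (the first also
# projects from 2 dimensions up to 32).
W1, b1 = rng.normal(size=(32, 2)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 32)), rng.normal(size=32)

# The final layer is just a hyperplane in the warped space.
w, b = rng.normal(size=32), 0.0

def classify(x):
    h = relu(W2 @ relu(W1 @ x + b1) + b2)  # warped representation
    return int(w @ h + b > 0)              # which side of the hyperplane?

print(classify(np.array([0.3, -1.2])))
```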
Things that are close to the decision boundary generally get low confidence, because it is hard to decide which side of the boundary they really lie on.
Points far, far away from the boundary are classified with high confidence because there is no ambiguity. Because such a point is far away, you can move it around quite a bit and it will STILL be on the same side of the boundary.
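Here's a toy illustration with a logistic classifier (the weights and points below are made up): confidence is a monotone function of the signed distance from the hyperplane, so a point on the boundary scores ~0.5 and a faraway point scores ~1.0.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.5

for x in [np.array([0.1, 0.7]),     # sits right on the boundary
          np.array([5.0, -5.0])]:   # far from the boundary
    dist = (w @ x + b) / np.linalg.norm(w)   # signed distance to hyperplane
    conf = sigmoid(w @ x + b)                # "confidence" in class 1
    print(f"distance {dist:+.2f} -> confidence {conf:.4f}")
```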
The thing is, the algorithm only optimizes the boundary near the datapoints it's trained on, because that's what it's trying to do: optimize performance on the training data.
If you generate a random datapoint, it will be far, far away from the manifold that the training data lies on, and therefore likely far, far away from the decision boundary. As a result, it winds up in a completely arbitrary class but with really high confidence.
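You can watch this happen without training anything. In the sketch below the "trained" weights are just random stand-ins, but the effect is the same: a random point in a 784-dimensional space (MNIST-sized) produces enormous logits, so the softmax pins some arbitrary class at essentially 1.0 confidence.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(10, 784))   # stand-in "trained" weights, 10 classes

def softmax(z):
    z = z - z.max()              # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

x_random = rng.normal(scale=50.0, size=784)   # nowhere near real digit images
p = softmax(W @ x_random)
print(p.argmax(), p.max())   # an arbitrary class, at ~1.0 confidence
```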
People have made efforts to figure out when a point is too far away from everything and classify it as "unknown". However, this is tricky. Firstly, NNs and other learning algorithms, like SVMs and boosting (i.e. anything involving a linear classifier in a warped space), try to push the training datapoints as far from the boundary as possible, because points too near the boundary are classified with low confidence.
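One simple proxy for "too far away" is a low top softmax probability, as in the sketch below (the threshold value is made up). The catch is exactly the one above: training deliberately inflates the margins, so even wildly off-manifold inputs often clear the cutoff.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify_or_reject(logits, threshold=0.9):
    # Answer only when the top class probability clears the threshold.
    p = softmax(logits)
    return int(p.argmax()) if p.max() >= threshold else "unknown"

print(classify_or_reject(np.array([4.0, 0.1, -2.0])))  # confident -> class 0
print(classify_or_reject(np.array([0.4, 0.3, 0.2])))   # ambiguous -> "unknown"
```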
Secondly, high-dimensional spaces are unimaginably sparse, so there's the rather irritating tendency for nothing to be near anything else.
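This sparsity is easy to demonstrate (uniform random data here, purely for illustration): as the dimension grows, the nearest neighbour of a query point is barely closer than its farthest neighbour, which makes any "too far from the training data" threshold nearly meaningless.

```python
import numpy as np

rng = np.random.default_rng(2)
for d in (2, 20, 200, 2000):
    train = rng.uniform(size=(1000, d))        # 1000 "training" points
    query = rng.uniform(size=d)                # one new point
    dists = np.linalg.norm(train - query, axis=1)
    # As d grows, min/max -> 1: everything is roughly equally far away.
    print(f"d={d:5d}  nearest={dists.min():7.3f}  farthest={dists.max():7.3f}  "
          f"ratio={dists.min() / dists.max():.3f}")
```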