Only data is stored in the secure enclave, from what I understand. The secure enclave isn't an environment so much as it is a storage area.
It's a bit more complicated than that. The secure enclave offers services to the OS: for example, you can give it a digital signature and ask whether that signature was produced with a trusted key, or you can give it some data and ask it to encrypt or decrypt that data with one of the stored keys. It has to do these things itself so that the key storage can be write-only, meaning keys go in but never come back out: if you are able to pull the keys out of the secure enclave then you may as well do the whole thing in software. A number of Android phones do something similar using TrustZone (though recent TrustZone attacks give me less confidence in that than I'd have had a year ago), but the Apple implementation is a separate ARM core running its own OS and communicating with the main OS.
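To make the 'write-only storage' idea concrete, here is a toy Python model. Everything in it is my own illustration: the class and method names are hypothetical and this is not Apple's API, HMAC stands in for whatever signature or encryption scheme the real hardware uses, and Python itself can't enforce the isolation that a separate core provides. The point is only the shape of the interface: callers can put keys in and ask for operations, but nothing ever hands a key back.

```python
import hashlib
import hmac
import os


class ToyEnclave:
    """Toy model of a write-only key store (hypothetical, not a real API)."""

    def __init__(self):
        # In real hardware the keys live on a separate core; here they are
        # just a private dict, which only models the interface.
        self.__keys = {}

    def generate_key(self, key_id):
        # The key is created inside the store and never returned.
        self.__keys[key_id] = os.urandom(32)

    def store_key(self, key_id, key):
        # Keys can be written in from outside...
        self.__keys[key_id] = bytes(key)

    # ...but there is deliberately no method that reads a key back out.

    def tag(self, key_id, data):
        # Service: compute an HMAC-SHA256 tag over data with a stored key.
        return hmac.new(self.__keys[key_id], data, hashlib.sha256).digest()

    def verify(self, key_id, data, tag):
        # Service: answer yes/no without revealing the key.
        return hmac.compare_digest(self.tag(key_id, data), tag)


enclave = ToyEnclave()
enclave.generate_key("device")
t = enclave.tag("device", b"hello")
print(enclave.verify("device", b"hello", t))     # valid tag
print(enclave.verify("device", b"tampered", t))  # data changed
```

If the OS could read the key, it could compute the same tags itself and the enclave would add nothing, which is the 'may as well do it in software' point.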
Your scenario relies on the connection between the camera and OS being open to attack and not secured.
Well, one of two attacks: either the OS is fully compromised (in which case you can't get into the secure enclave, but you can see anything that the OS can see), or the OS exposes more information to userspace than you might want it to.
We'll need more details but the current fingerprint scanner is secured from tampering from what I know.
The difference with the fingerprint scanner is that the only useful thing you can do with it is scan fingerprints, so the OS can expose an interface that consists of a single question: 'is this a valid fingerprint?'. This is actually slightly too narrow an interface for a lot of uses. For example, I might want several people to be able to unlock my iPad with their fingerprints, but no one other than me to be able to unlock my Internet banking app. iOS currently doesn't provide this functionality, just a 'does this finger belong to an authorised person?' query. In contrast, the distance-sensing camera is probably very useful for augmented reality applications, where you actually do need to expose the depth information to userspace. This makes it much harder to prevent a malicious app from simply doing a face scan and uploading it to the NSA / FSB / Google / Flat Earth Society.
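The gap between those two fingerprint queries can be sketched in a few lines of Python. This is purely illustrative: the class, the method names, and the per-person query are hypothetical, and real matching is a fuzzy comparison done in hardware rather than a dictionary lookup on strings.

```python
class ToyFingerprintService:
    """Toy model of two possible fingerprint interfaces (hypothetical)."""

    def __init__(self):
        self._enrolled = {}  # fingerprint template -> person

    def enroll(self, person, template):
        self._enrolled[template] = person

    def is_authorised(self, scan):
        # Narrow interface: any enrolled finger gets a plain "yes".
        return scan in self._enrolled

    def is_authorised_for(self, scan, allowed_people):
        # Finer query: "yes" only if the finger belongs to someone
        # the calling app has listed. Unknown scans map to None.
        person = self._enrolled.get(scan)
        return person is not None and person in allowed_people


svc = ToyFingerprintService()
svc.enroll("owner", "owner-finger")
svc.enroll("guest", "guest-finger")

# Unlocking the device: either person will do.
print(svc.is_authorised("guest-finger"))
# Unlocking the banking app: only the owner should pass.
print(svc.is_authorised_for("guest-finger", {"owner"}))
```

Note that even the finer query still returns only a yes/no answer, so no biometric data reaches the app. The depth camera is different precisely because the raw data is the useful part.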