Well, there's no point in using IP at all. IPv4 is designed to allow for about four billion addresses. What you need in this case is a point-to-point link, like a serial cable, not a network stack. Having a network stack would seriously add to the overhead.
And you certainly don't need UDP, or FTP running on top of it. Dropped data is a no-no, I believe, but there's no reason you can't just send commands and return data in the raw, then ask for any corrupted blocks of that data to be sent again.
It would look like this:
E: Send me file 102.
M: Sends file....
E: Thanks, now send me sections x, y, and z of that file again because I missed them.
M: X, Y, Z...
E: Thanks, now delete the file, point the camera ten degrees left, and take another picture.
How do you know if the data is corrupt? Well, you do CRC checking on the wire, and you split all file transfers into chunks and include a hash with each one. The chunks need to be small enough that a corrupted chunk doesn't cost much to resend, but big enough that the per-chunk overhead and extra round trips don't add up to massive latency.
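
Here's a rough sketch of that chunk-and-recheck idea in Python, just to make it concrete. Everything in it is my own assumption for illustration: the function names, the choice of SHA-256 as the per-chunk hash, the 1024-byte chunk size, and the fake "link" that corrupts a chunk once. It also assumes the wire-level CRC is handled somewhere below this and only shows the per-chunk hash check and the "send me x, y, and z again" loop.

```python
import hashlib

def split_into_chunks(data, chunk_size):
    """Sender side: cut the file into (index, payload, sha256 digest) records."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        payload = data[offset:offset + chunk_size]
        chunks.append((offset // chunk_size, payload, hashlib.sha256(payload).digest()))
    return chunks

def fetch_file(request_chunks, total_chunks, max_rounds=10):
    """Receiver side: ask for chunks, keep the ones whose hash matches,
    and re-request only the bad or missing ones.
    Returns (file bytes, number of request rounds used)."""
    good = {}
    wanted = list(range(total_chunks))
    for round_no in range(1, max_rounds + 1):
        for index, payload, digest in request_chunks(wanted):
            if hashlib.sha256(payload).digest() == digest:
                good[index] = payload
        wanted = [i for i in range(total_chunks) if i not in good]
        if not wanted:
            return b"".join(good[i] for i in range(total_chunks)), round_no
    raise RuntimeError("gave up; still missing chunks %r" % wanted)

def make_flaky_link(chunks, corrupt_once):
    """Hypothetical stand-in for the real link: flips one byte in the listed
    chunks the first time each is sent, then behaves."""
    pending = set(corrupt_once)
    def request_chunks(indices):
        replies = []
        for i in indices:
            index, payload, digest = chunks[i]
            if i in pending:
                pending.discard(i)
                payload = bytes([payload[0] ^ 0xFF]) + payload[1:]
            replies.append((index, payload, digest))
        return replies
    return request_chunks

# Example: a 2560-byte "file" in 1024-byte chunks, with chunk 1 damaged once.
original = bytes(range(256)) * 10
chunks = split_into_chunks(original, 1024)
link = make_flaky_link(chunks, corrupt_once=[1])
received, rounds = fetch_file(link, len(chunks))
```

The second round only asks for the one damaged chunk, which is the whole point: you pay for resending a chunk, not the file. A real version would also need timeouts and some way to tell the receiver how many chunks to expect.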