Comment Re:Software doesn't really matter (Score 2) 259
Personally, I definitely want metadata stored in the image file itself, because any other approach always carries a risk of losing that association. I feel you're setting yourself up for a disaster if you key on a hash, because the moment anything touches that file for *any reason*, poof, the link to that metadata is gone. You've highlighted the huge weakness in your own system, but then made a circular argument by saying "but modifying the original files is a bad idea anyway". It's only a bad idea if you've got a fragile system that depends on the exact file hash to reference critical metadata.
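To make that fragility concrete, here's a minimal Python sketch. Everything in it is made up for illustration: the `metadata_by_hash` store, the `file_key` helper, and the fake "file" bytes are assumptions, not anyone's actual system. It only shows that a change to the header, with the pixels untouched, is enough to orphan a hash-keyed entry.

```python
import hashlib

def file_key(data: bytes) -> str:
    """Key a file by the SHA-256 of its full contents (hypothetical scheme)."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for an image file: a small header followed by pixel data.
original = b"HEADER:rating=0;" + b"PIXELS" * 100

# Hypothetical external store that references files only by content hash.
metadata_by_hash = {file_key(original): {"caption": "Sunset at the pier"}}

# Some tool rewrites the header (say, to set a rating); the pixels are unchanged.
touched = original.replace(b"rating=0", b"rating=5")

# The lookup works for the untouched bytes and silently fails afterwards.
found_before = metadata_by_hash.get(file_key(original))
found_after = metadata_by_hash.get(file_key(touched))
```

Here `found_before` is the caption record and `found_after` is `None`, even though not a single pixel byte changed.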
I think there's a reason that the XMP standard goes to great pains to embed metadata inside the image files themselves rather than resorting to external sidecar files, which are typically considered a last resort and a very poor alternative. If you use the image's own embedded metadata as the original and authoritative source, then you can rebuild your database from scratch automatically, no matter what you've done with your image files, or how you've folded, spindled, or mutilated them.
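As a rough sketch of what "rebuild from scratch" could look like, the Python below scans file bytes for an embedded XMP packet and pulls out `dc:title`. The byte scan is a deliberately crude assumption that only works where the packet is stored as one plain XML run; a real tool would use a proper XMP reader. The file names, the sample packet, and the `read_embedded_title` and `rebuild_index` helpers are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"
RDF_LI = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}li"

# Crude assumption: the XMP packet appears as one contiguous XML run.
XMP_PACKET = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)

def read_embedded_title(data: bytes):
    """Return the first dc:title value found in embedded XMP, or None."""
    match = XMP_PACKET.search(data)
    if match is None:
        return None
    root = ET.fromstring(match.group(0))
    title = next(root.iter(DC_TITLE), None)
    if title is None:
        return None
    item = next(title.iter(RDF_LI), None)
    return item.text if item is not None else None

def rebuild_index(files: dict) -> dict:
    """Rebuild a name -> title index using only what the files carry."""
    return {name: read_embedded_title(data) for name, data in files.items()}

# Made-up sample packet and fake file contents for illustration.
SAMPLE_XMP = (
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sunset at the pier'
    b'</rdf:li></rdf:Alt></dc:title>'
    b'</rdf:Description></rdf:RDF></x:xmpmeta>'
)
files = {
    "a.jpg": b"\xff\xd8fake-jpeg-header" + SAMPLE_XMP + b"pixel-bytes\xff\xd9",
    "b.jpg": b"\xff\xd8no-metadata-here\xff\xd9",
}
index = rebuild_index(files)
```

The point is that the index is derived entirely from the files, so throwing the database away costs you nothing but a rescan.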
De-duplication is trivial if you use proper tools that compare visual features instead of relying on exact byte-for-byte matches. Also, I don't consider the backup issue to be significant, because if I make a change to the file's metadata, then I want that file backed up again, since I consider it to have been changed. However, since you're not changing the actual image data when you change metadata, any decent diff-based backup program should only need to store a small delta to represent the change.
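For the visual-comparison point, here's a minimal sketch of one well-known approach, an "average hash" over a downscaled grayscale grid. It isn't any particular tool's algorithm, and the tiny 4x4 "images" and the `average_hash` and `hamming` helpers are invented for illustration. A real tool would decode and downscale the actual image first.

```python
def average_hash(pixels) -> int:
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Made-up 4x4 grayscale "images".
original = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
# Same picture, slightly brighter everywhere (e.g. after a re-save).
brighter_copy = [[p + 3 for p in row] for row in original]
# A genuinely different picture: the inverted gradient.
other = [[240 - p for p in row] for row in original]

h_original = average_hash(original)
h_copy = average_hash(brighter_copy)
h_other = average_hash(other)

# Small distance suggests the same picture; large distance, a different one.
distance_to_copy = hamming(h_original, h_copy)
distance_to_other = hamming(h_original, h_other)
```

The brightened copy has different raw values everywhere, so an exact hash would call it a new file, yet its average hash sits right next to the original's while the inverted image lands far away.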