True, except I don't consider spatial memory and image memory to be the same.
True. They are not the same, but there is overlap. Topological memory rarely goes without image memory. Images are encoded, yes, and photographic memory is really rare (http://www.youtube.com/watch?v=jVqRT_kCOLI), and there is probably a reason for that.
First of all, spatial memory provides a constant stream of clues. That is, you have a current view that you can use as a key to the next association. If you try to remember an image, you usually have a one clue -> all details relation. You can fix that by making strings of associations, but those are the memory tricks we are talking about.
Memory is association. What you call tricks are just using conscious efforts to reinforce associations.
Furthermore, memory can be improved even without conscious meddling in the encoding system. One of the standard exercises in drawing classes is to observe different parts of the face - for example, you spend a week observing people's noses. All the time. Everywhere you go. Result: your perception and memory improve - it is as if the compression algorithms that help you encode and associate a given detail (nose, ears, eyes, etc.) get better.
Also, an image is not a single, full-detail clue in terms of experience at the level of neuron excitation. Your eyes always provide a stream, even when looking at a still image: the full resolution of your eye can only be achieved over a very limited FOV, which is why you move your eyes while you read this text or look at a painting.
It is the encoding system (and hardware) that your brain uses that determines its image retention capabilities - whether they are natural or consciously trained.
I consider remembering numbers and images equally hard because both have a single-context-to-many-details relationship, and to remember many of the details you need all kinds of tricks.
People differ in how hard they find it to remember certain details: some remember dates, some smells, some people, some emotions. So when you have to improve your memory, you would usually use something that you are naturally good at remembering.
Again true; however, I repeat, spatial (in the sense of topological + image) memory is particularly good in humans. Also, you remember most faces (images) from just one meeting. Names are the problem, and reinforcing the association is useful for remembering the name, but remembering faces this well is understandable: we have specialised hardware for that (see http://www.ted.com/talks/vilayanur_ramachandran_on_your_mind.html on what happens when it breaks). Most locations you remember extremely well, with quite a lot of detail. Stephen might remember more, but is that optimal? So you do abstract the images, unless they are particularly important to what you do or what you like. At the same time, you do remember distinguishing factors: for example, you might still remember the exact way to your childhood school, a neighbourhood where you lived ten years ago, a student dorm, or streets you walked in various cities, and that is not a purely abstract topological graph. The memory contains enough image detail for recognition (compression algorithms are normally optimised for recognition, not necessarily reconstruction; http://www.youtube.com/watch?v=Ipomu0MLFaI).