Most modern CPUs and the compilers for them are simply not designed for multiple threads/processes to interact with the same data. As an exercise, try writing a lockless single-producer single-consumer queue in C or C++. If you could make the same assumption in this two-thread example that you can make in a single-threaded problem, namely that the perceived order of operations is the order in which they're coded, then it'd be a snap.
But you see, once you start playing with more than one thread of execution, you gain visibility into both CPU reordering and compiler reordering. You also gain visibility into the optimizations being made (such as keeping values in a register instead of writing them back to memory, or invented speculative stores, and the like). If you research enough you'll find that while the volatile keyword will solve some of the problems, it doesn't solve them all, and it introduces others. (It works well for what it's designed for, which is interfacing with hardware; if it's being used for inter-thread comms it's being misused.) You wind up needing architecture-specific memory barriers/fences to tell the CPU what it may not reorder and when stores have to become visible to other cores. You also wind up needing compiler-specific constructs to keep the compiler from reordering accesses or holding things in registers when you don't want it to. volatile is often used for the latter, but note that while volatile accesses won't be reordered relative to each other, the standard says nothing about reordering non-volatile accesses around the volatile ones. Also, volatile doesn't bypass the cache or add any fences (on x86 it compiles to ordinary loads and stores, not CLFLUSH), so it does nothing about CPU reordering. What it does do is force a real load or store on every single access, which means unnecessary performance hits (and performance is evidently important if you're trying to avoid locks...).
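For what it's worth, since C++11 much of the above can be expressed portably with std::atomic and explicit memory orders instead of hand-written architecture-specific fences. Below is a minimal sketch of the single-producer single-consumer queue under that assumption. The names SpscQueue, try_push, try_pop and run_spsc_demo are made up for illustration, and this is one possible shape for such a queue, not a reconstruction of any particular implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical fixed-capacity ring buffer for exactly ONE producer thread
// and ONE consumer thread. One slot is always left empty so that
// "full" and "empty" can be told apart.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer thread only. Returns false if the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        // acquire pairs with the consumer's release store to head_, so the
        // consumer is done reading a slot before we overwrite it.
        if (next == head_.load(std::memory_order_acquire)) return false;
        buffer_[tail] = value;
        // release publishes the slot contents before the new tail index.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false if the queue is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        // acquire pairs with the producer's release store to tail_.
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buffer_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T buffer_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot to read
    std::atomic<std::size_t> tail_{0};  // next slot to write
};

// Demo: push 0..count-1 on this thread, pop them on another thread,
// and return the sum of everything the consumer received.
long long run_spsc_demo(int count) {
    SpscQueue<int, 64> queue;
    long long sum = 0;
    std::thread consumer([&] {
        for (int expected = 0; expected < count;) {
            int value = 0;
            if (queue.try_pop(value)) {
                assert(value == expected);  // items arrive in FIFO order
                sum += value;
                ++expected;
            } else {
                std::this_thread::yield();  // queue empty, let the producer run
            }
        }
    });
    for (int i = 0; i < count;) {
        if (queue.try_push(i)) ++i;
        else std::this_thread::yield();  // queue full, let the consumer run
    }
    consumer.join();  // join makes the consumer's writes to sum visible here
    return sum;
}
```

The acquire/release pairs are what stand in for the barriers: the compiler may not move the slot write past the release store, and on weakly ordered CPUs it emits whatever fence instructions the target needs. On x86 these particular orders typically compile down to plain loads and stores.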
Atomicity is a whole different level of fun as well. I was lucky: at the boundary I was dealing with inherently atomic operations (well, so long as I had my alignment correct, which new doesn't guarantee in every case), but if you're not... it's yet more architecture-specific code.
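If you'd rather check than hope, the standard library lets you ask both questions directly. The sketch below assumes C++11 or later; is_aligned_for, lock_free_on_this_target and as_atomic_slot are hypothetical helper names for illustration, not part of any library.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical helper: true if p is suitably aligned to hold a T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Hypothetical helper: true if std::atomic<T> needs no hidden lock
// on the target this code was compiled for.
template <typename T>
bool lock_free_on_this_target() {
    std::atomic<T> probe{};
    return probe.is_lock_free();
}

// Hypothetical helper: construct a shared 32-bit slot in raw storage,
// trapping in debug builds if the storage is misaligned.
std::atomic<std::uint32_t>* as_atomic_slot(void* storage) {
    assert(is_aligned_for<std::atomic<std::uint32_t>>(storage));
    return new (storage) std::atomic<std::uint32_t>(0);
}
```

If lock_free_on_this_target returns false for your type, the implementation falls back to a lock behind the scenes, which rather defeats the purpose of going lockless.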