3D positional audio is a solved problem only in two special cases: (a) you have made binaural recordings, with microphones in the ears of a dummy head whose HRTF is known to be sufficiently similar to the listener's (or in the listener's own ears!), or (b) you have all the original positional information about the sound sources, plus all the environmental information affecting propagation and reverb, so that you can compute the total wavefront converging from all directions on the listener's virtual head position and then convolve that with the listener's HRTF. Neither of these is useful in general. (a) is not useful because different listeners have widely different HRTFs, so the best you can do is use a generic dummy head, which significantly limits performance (you can find examples online and run your own test; the difference when the HRTF is substantially similar to your own is staggering: you can even clearly tell vertical directionality). (b) is useful only when you can record each sound source individually (fine for virtual sources such as those in computer games, impossible for general real-world recordings), and you still need the listener's HRTF (either measured, or computed from a laser scan of the head; both are impractical and time-consuming). Beyond this, there's still the issue of playback acoustics. With in-ear headphones that's fine. With speakers, delivering the processed binaural sound means you need to perform cross-talk cancellation (cancelling the sound from each speaker that reaches the opposite ear), which can only be done for a very limited set of spatial positions, and basically means a single listener not moving from the sweet spot.
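To make the "convolve with the listener's HRTF" step in (b) concrete, here is a minimal sketch in plain Python. It assumes you already have a time-domain HRTF (an HRIR pair) for the source's direction. The two short filters below are made-up placeholders, not measured data, and the function names are just for illustration.

```python
def convolve(signal, kernel):
    """Direct-form linear convolution; output has len(signal) + len(kernel) - 1 samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with the HRIR pair for its direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder HRIRs (an assumption, not real measurements) for a source off
# to the listener's left: the right ear hears it 3 samples later and at half
# the level. Real HRIRs are per listener, per direction, and far longer.
HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.0, 0.5]

click = [1.0, 0.0, 0.0, 0.0]  # a single-sample impulse as the test source
left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)
```

For several sources you would render each with the HRIR pair for its own direction and sum the results per ear. Everything hard about (b) sits outside this sketch: getting the per-source signals, modelling the room, and obtaining HRIRs that match the listener.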
There is a (c) as well, which doesn't rely on binaural sound and HRTFs and can handle multiple listeners: spherical many-channel approaches, exemplified by the BBC's old Ambisonics tech. That uses a many-directional (spherically distributed) microphone for recording, encoding in spherical harmonics, and playback on a set of speakers arranged in a sphere. With enough channels (read: too many to be practical) and a treated environment (read: anechoic chamber), you can get good positional audio, but still not approaching what is possible in the binaural cases (a) and (b).
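As a rough illustration of the spherical-harmonic encoding in (c), here is a sketch of first-order encoding of a single mono sample into the four classic B-format channels (W, X, Y, Z). It assumes the traditional convention where W is scaled by 1/sqrt(2) and azimuth is measured counterclockwise from straight ahead. Other normalizations exist, and the function names are my own.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample arriving from the given direction into W, X, Y, Z."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back
    y = sample * math.sin(az) * math.cos(el)  # left-right
    z = sample * math.sin(el)                 # up-down
    return w, x, y, z


def channels_for_order(order):
    """A full-sphere encoding of order N carries (N + 1)^2 channels."""
    return (order + 1) ** 2


# A source directly to the listener's left, at ear height.
w, x, y, z = encode_b_format(1.0, 90.0, 0.0)
```

The channel count grows as (N + 1)^2 (4 at first order, 16 at third, 36 at fifth), and you need at least that many speakers to play it back, which is where the "too many to be practical" part comes from.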