To render this in real-time for a video game (say, at 60 FPS), you would need a processor roughly a million times faster than what we have today.
What is needed is an architectural paradigm shift, not simply a beefier, faster GPU built on the single-instruction, multiple-thread (SIMT) model.
To elaborate: with a naive implementation in which independent kernels are run in parallel, one of the major bottlenecks for ray/path tracing via GPGPU processing is that every warp (a set of 32 threads) essentially executes the same instruction, with branching realized by transparently masking out threads. If branches frequently diverge, hardware utilization drops and performance degrades accordingly. A more careful implementation can improve utilization by appropriately partitioning work into sub-kernels, but you will still run into trouble once you begin handling secondary rays, whose paths are far less coherent than those of primary rays.
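To make the masking behavior concrete, here is a toy Python model of warp execution (not real GPU code; the lane count matches a 32-thread warp, but the per-branch cycle costs are made-up illustrative numbers). A divergent branch forces the warp to issue both sides serially, with inactive lanes masked off, so the fraction of lanes doing useful work falls:

```python
# Toy model of SIMT execution: a warp of 32 lanes hits an if/else.
# When lanes diverge, the warp runs BOTH sides serially, masking off
# the inactive lanes each time, so useful work per cycle drops.

WARP_SIZE = 32

def warp_utilization(taken_mask, then_cycles, else_cycles):
    """Fraction of lane-cycles doing useful work for one if/else."""
    n_taken = sum(taken_mask)        # lanes entering the 'then' side
    n_not = WARP_SIZE - n_taken      # lanes entering the 'else' side
    total_cycles = 0
    useful_lane_cycles = 0
    if n_taken:                      # 'then' side issued for the whole warp
        total_cycles += then_cycles
        useful_lane_cycles += n_taken * then_cycles
    if n_not:                        # 'else' side issued for the whole warp
        total_cycles += else_cycles
        useful_lane_cycles += n_not * else_cycles
    return useful_lane_cycles / (total_cycles * WARP_SIZE)

# Coherent warp: all 32 lanes take the same path -> full utilization.
print(warp_utilization([True] * 32, 100, 100))               # 1.0

# Divergent warp: half the lanes hit each side -> utilization halves.
print(warp_utilization([True] * 16 + [False] * 16, 100, 100))  # 0.5
```

With more branch nesting the masks compound, which is why incoherent secondary rays are so punishing on this execution model.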
For ray/path tracing to be carried out expeditiously, it would be prudent to move to a programmable multiple-instruction, multiple-thread (MIMT) architecture with many small cores, each capable of handling many threads. In fact, researchers have been moving in this direction for quite some time now, and the results are rather promising: an NVIDIA GTX 285, with a die area of around 300 mm^2, can handle around 100M primary rays/sec and 60M diffuse rays/sec, with thread issue rates of ~70% and ~50%, respectively, whereas a custom MIMT ASIC with an area of 200 mm^2 at the same fabrication node can reach around 400M rays/sec for both primary and diffuse rays, with a thread issue rate of ~70-80% for both. (As an aside, I have a paper, being submitted to either ACM Trans. Graphics or IEEE Trans. Vis. and Comp. Graphics, on an FPGA and theorized ASIC solution that blows these numbers away.)
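Normalizing by die area makes the gap plainer. This is only a back-of-the-envelope sketch from the figures quoted above (it ignores clock, memory system, and process differences), but it puts both designs on a common footing:

```python
# Back-of-the-envelope rays/sec per mm^2, using the figures quoted above.
gpu_area_mm2, asic_area_mm2 = 300, 200
gpu_primary, asic_primary = 100e6, 400e6   # rays/sec
gpu_diffuse, asic_diffuse = 60e6, 400e6    # rays/sec

gpu_primary_density = gpu_primary / gpu_area_mm2     # ~0.33M rays/sec/mm^2
asic_primary_density = asic_primary / asic_area_mm2  # 2.0M rays/sec/mm^2
gpu_diffuse_density = gpu_diffuse / gpu_area_mm2     # 0.2M rays/sec/mm^2
asic_diffuse_density = asic_diffuse / asic_area_mm2  # 2.0M rays/sec/mm^2

print(asic_primary_density / gpu_primary_density)  # ~6x for primary rays
print(asic_diffuse_density / gpu_diffuse_density)  # ~10x for diffuse rays
```

So per unit of silicon the MIMT design is roughly 6x better on coherent primary rays and roughly 10x better on incoherent diffuse rays, which is exactly where the SIMT model loses the most to divergence.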