The biggest problem with replicating CUDA is not technical; it is finding a VC with enough brains to know whom to hire. Most CS grads have the knowledge, but not the drive. Most liberal arts grads have the drive and the creativity, but not the knowledge. You need to find someone with both, because creating the next Nvidia killer will require a person who is boring enough to reinvent the wheel, but creative enough to find novel solutions to performance problems.
The computer science and hardware engineering behind Nvidia's hardware and the CUDA software have been known for decades. The Nvidia hardware could be replicated with FPGAs - notwithstanding any patents Nvidia might have. The software API could be replicated rather easily; parallelism has been known and studied in computer engineering for decades as well. What Nvidia did was political - they provided both the hardware and an API that made it easy to use, in one package that the C-suite could understand. The challenge was never technical; it was marketing.
More specifically, you'd need to understand how compilers work, and how to use Yacc, Bison, or a similar tool to generate the parser for you. You'd have to understand digital logic and how to build logic functions out of NAND gates. If you see an FPGA development kit, know what it is, and think to yourself, "What I could do with that..." you're probably a good fit for the job. And you'd need someone willing to bankroll your project until you could demonstrate that you had beaten Nvidia on something marketable - like floating point performance or power consumption.
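To make the NAND-gate point concrete, here is a minimal sketch in Python of building other logic functions, and then a small adder, from nothing but NAND. The function names (`nand`, `full_adder`, `add_bits`) and the 4-bit width are my own choices for illustration. Real hardware work would be done in an HDL such as Verilog or VHDL, not Python; this only shows the idea.

```python
def nand(a: int, b: int) -> int:
    """The only primitive gate: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

# Every other gate below is composed purely from nand().
def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor_(a, b)
    sum_bit = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out

def add_bits(x: int, y: int, width: int = 4) -> tuple[int, int]:
    """Ripple-carry adder: chain full adders from the lowest bit upward.

    Returns (result modulo 2**width, final carry bit).
    """
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

if __name__ == "__main__":
    print(add_bits(5, 6))  # 5 + 6 fits in 4 bits
    print(add_bits(9, 8))  # 9 + 8 overflows 4 bits, so the carry bit is set
```

The same composition scales up to multipliers and floating point units; the engineering effort goes into making those circuits fast and power-efficient, not into the logic itself.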
From an engineering standpoint, what Nvidia has done is trivial - because the solution could be reproduced by an engineer using already known techniques. What Nvidia actually did was combine technical knowledge with an understanding of their market to produce the dominant position they have today. Any computer engineer worth his diploma could produce a design with FPGAs that would beat Nvidia GPUs, but Nvidia did it first.