The goal of PrivateCore's product was to encrypt everything outside the CPU package using software techniques. Once you've done an attested boot and gotten your crypto keys in order, anything that leaves the CPU socket does so encrypted (except I/O to the network, I guess, but definitely the hard disk and data going to DRAM, etc.). Their key selling point was protection against cold boot attacks, DMA data dumps, sniffers on the DRAM lines, and so on. They also claim to have a secure hypervisor (preventing cross-VM theft) because they've stripped it down to its bare bones, but I believe that ended up being a secondary concern.
Anyway, their goal was to keep unencrypted data in the caches but encrypt it before it leaves the chip for DRAM. Their page is mostly high-level marketing fluff, so if they were claiming to do more than this, I missed it. Hardware for encrypted DRAM accesses exists in specialized platforms (e.g. the Xbox 360) but doesn't currently exist in commodity x86 server parts. So a friend and I sat down for an evening a while ago and tried to work out how they would do this without a DRAM controller that does the encryption for you.
Again, the goal is decrypted data in the caches, encrypted data in DRAM, with the crypto routines implemented entirely in software. The major difficulty is that the cache does whatever the cache wants, so it's really rather difficult to say "when this data is leaving the cache, call the software crypto routines." There is no good way for the hardware to tell you it's evicting data from the cache. (There are academic proposals for this kind of notification, but nothing currently exists.)
We thought up a number of solutions and were able to validate our guesses against their patent application. I will gloss over some of the deeper details (such as methods for reverse-engineering the cache's replacement policy).
The shortened version is:
1) Work on Intel processors that have >=30 MB of L3 cache
2) Run a tiny hypervisor that fits into some small amount of memory (let's say 10MB)
3) Mark all memory in the system that is not the hypervisor's code pages as non-cacheable
4) The hypervisor also has the crypto routines, so all of these non-cacheable pages can now be software encrypted using the hypervisor's routines. The DRAM-resident data is now encrypted.
4a) Because these were marked as non-cacheable data, the hypervisor is still resident in the cache (it was never displaced).
5) Mark some remaining amount of space (let's say 20MB) of physical memory as cacheable. This physical memory currently contains no data at all.
6) When you want to run a program or an OS, have the hypervisor move that program's starting code into the 20-meg range (decrypting it along the way) and set its virtual pages to point to that physical memory range
7) The program can now run because at least some of its pages are decrypted. Those pages are also cacheable, so accesses to them will hit in the cache
8) When you try to access code or data that is still encrypted, it will cause a page fault
9) The hypervisor's page fault handler will get that encrypted data, decrypt it, and put it somewhere in the 20-meg-range
9a) If the 20-meg range is already full of decrypted data, you will have to re-encrypt some of it and spill it back to DRAM (like paging it out to disk).
Because you only ever touch ~30 megs of physical memory marked as cacheable, you will "never" spill decrypted data to DRAM. Essentially, they built a system with 30 megs of main memory (backed by the SRAM of the CPU's cache), where DRAM is treated like disk/swap in a demand-paging system.
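The steps above are, at heart, demand paging with crypto on the paging path. Here's a toy model of that structure (all names, sizes, and the XOR "cipher" are my own stand-ins, purely illustrative — not PrivateCore's code): DRAM holds only ciphertext, a small plaintext pool plays the role of the cacheable window, and the fault handler decrypts on entry and re-encrypts on eviction.

```python
from collections import OrderedDict

PAGE = 4096
POOL_PAGES = 5                  # tiny stand-in for the ~20 MB cacheable window
KEY = bytes(range(256)) * (PAGE // 256)

def crypt(page):                # XOR is its own inverse; stand-in for real crypto
    return bytes(a ^ b for a, b in zip(page, KEY))

class EncryptedMemory:
    def __init__(self, nr_pages):
        # Backing "DRAM": every page is stored encrypted.
        self.dram = [crypt(bytes(PAGE)) for _ in range(nr_pages)]
        self.pool = OrderedDict()   # page number -> plaintext, in LRU order
        self.faults = 0

    def _fault_in(self, pn):
        self.faults += 1
        if len(self.pool) >= POOL_PAGES:          # pool full: spill a victim
            victim, plain = self.pool.popitem(last=False)
            self.dram[victim] = crypt(plain)      # re-encrypt on eviction
        self.pool[pn] = bytearray(crypt(self.dram[pn]))  # decrypt on entry

    def read(self, pn, off):
        if pn not in self.pool:                   # "page fault"
            self._fault_in(pn)
        self.pool.move_to_end(pn)
        return self.pool[pn][off]

    def write(self, pn, off, val):
        if pn not in self.pool:
            self._fault_in(pn)
        self.pool.move_to_end(pn)
        self.pool[pn][off] = val

mem = EncryptedMemory(nr_pages=64)
mem.write(0, 0, 0xAB)
# Touch more pages than the pool holds, forcing page 0 to spill encrypted.
for pn in range(1, 8):
    mem.read(pn, 0)
assert 0 not in mem.pool                   # page 0 was evicted...
assert mem.dram[0] != crypt(bytes(PAGE))   # ...and its DRAM copy is ciphertext
assert mem.read(0, 0) == 0xAB              # faulting it back decrypts correctly
```

Note how every miss costs a fault plus a full-page decrypt, and a miss with a full pool additionally costs a full-page re-encrypt — that's the overhead structure I complain about below.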
The reason I am convinced this is an acqui-hire rather than a purchase of the product itself: this will be unbearably slow on anything but tiny cache-resident applications. Not only are cache misses now page faults, but each one also requires moving data around and running software crypto routines. They claim good performance, but I can't imagine anything less than 100x slowdowns for server applications with large working sets. Faults to supervisor/hypervisor mode are not cheap. Perhaps CPU-bound benchmark applications will see little slowdown, but I've seen no solid evidence that real performance is any good. Looking at the way they do this memory encryption, the evidence strongly points toward huge overheads when, e.g., walking linked lists or dealing with disparate data. Imagine taking a page fault (and encrypting 4KB of data and then decrypting a different 4KB) on every memory access because you're jumping around a 40MB hash table.
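To put rough numbers behind that intuition — every figure here is an assumption I'm pulling out of the air for illustration, not a measurement of their product:

```python
# Back-of-envelope cost of one faulting access under this scheme, versus a
# plain DRAM miss. All constants are assumptions, not measurements.
DRAM_MISS_NS = 100          # ballpark latency of an ordinary last-level miss
FAULT_NS = 1_000            # VM-exit + fault handling round trip, assumed
CRYPTO_NS_PER_BYTE = 1      # software crypto throughput, assumed
PAGE = 4096

# Worst case per miss: re-encrypt one 4 KB victim, decrypt one 4 KB target.
encrypted_miss_ns = FAULT_NS + 2 * PAGE * CRYPTO_NS_PER_BYTE
slowdown = encrypted_miss_ns / DRAM_MISS_NS
print(f"{encrypted_miss_ns} ns per faulting access, ~{slowdown:.0f}x a DRAM miss")
```

Under those (debatable) constants, each cache-missing access costs around two orders of magnitude more than a normal DRAM miss, which is where my "100x or more" guess for pointer-chasing workloads comes from.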
Beyond the likely performance problems, they also cannot offer complete guarantees that data in the caches won't spill to DRAM by accident. I can think of a number of ways to trick a cache into spilling its data, and I don't believe any processor manufacturer guarantees that cached data will remain resident. If your security stance is "well, a cache eviction probably won't happen due to reasons outside my control," you will not be very safe against a particularly dedicated attacker. Mind you, this would only leak up to 30MB of user data at a time, but that's still a crack in the dam. Why would you pay a huge performance overhead for no actual security guarantee?
Their solution to this problem (in the patent application, anyway) was to periodically scan the cacheable physical memory range and scrub it of unencrypted data.
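That scrub pass might look something like the following — again a toy sketch with my own invented structure, not their implementation: walk the plaintext pool and push anything that hasn't been touched recently back to DRAM as ciphertext, so an unexpected cache eviction can only ever expose recently-used pages.

```python
def scrub(pool, dram, crypt, now, max_age=10):
    """pool maps page number -> (plaintext, last_touch); dram holds ciphertext."""
    for pn in list(pool):
        plain, last_touch = pool[pn]
        if now - last_touch > max_age:
            dram[pn] = crypt(plain)    # ciphertext copy goes back to DRAM...
            del pool[pn]               # ...and the plaintext copy is dropped

# Tiny demo with a stand-in XOR cipher: page 0 is stale, page 1 is fresh.
xor = lambda b: bytes(x ^ 0xFF for x in b)
pool = {0: (b"\x00\x01", 0), 1: (b"\x02\x03", 95)}
dram = {0: xor(b"\x00\x01"), 1: xor(b"\x02\x03")}
scrub(pool, dram, xor, now=100)
assert 0 not in pool and 1 in pool     # only the stale page was scrubbed
```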
In summary: I don't believe Facebook will be implementing this technique in their servers. If they really wanted encrypted DRAM, they would pay Intel or AMD to build a semi-custom processor with encryption built into the DRAM controller. They bought this company because they want to hire these guys, who have a lot of kernel and hypervisor knowledge.