Comment PrivateCore's product - likely the employees (Score 4, Interesting) 18

by Menacer on Friday August 08, 2014 @02:44AM (#47628327) Attached to: Facebook Acquires Server-Focused Security Startup

The goal of PrivateCore's product was to encrypt everything that's outside of the CPU core using software techniques. So once you've done an attested boot and gotten your crypto keys in order, from that point on anything outside the CPU socket is done in an encrypted manner (except I/O to the network I guess, but definitely hard disk and data going to the DRAM, etc.) Their important selling point here was that you could protect against cold boot attacks, DMA data dumps, data sniffers on the DRAM lines, etc. They also claim to have a secure hypervisor (preventing cross-VM thievery) because they've stripped it down to its bare bones, but I believe this ended up being a secondary concern.

Anyway, their goal was to have unencrypted data in the caches, but encrypt the data before it leaves the chip and goes out to DRAM. Their page is mostly high-level marketing fluff, so if they were claiming to do more than this, I missed it. The hardware for encrypted DRAM accesses exists in specialized platforms (e.g. the Xbox 360) but doesn't currently exist in commodity x86 server parts. As such, a friend and I sat down for an evening a while ago and tried to work out how they would do this without a DRAM controller that did the encryption for you.

Again, their goal is to have decrypted data in the caches, encrypted data in the DRAM. The crypto routines would have to be contained in software. The major difficulty is that the cache does whatever the cache wants, so it's really rather difficult to say "when this data is leaving the cache, call the software crypto routines." There is no good way for the hardware to tell you it's kicking data out of the cache. (There are academic proposals for this kind of information, but nothing currently exists.)

We thought up a number of solutions and were able to validate our guesses against their patent submission. I will gloss over some of the deeper details (such as methods for reverse engineering the cache's replacement policy).

The shortened version is:
1) Work on Intel cores that have >=30 MB of L3
2) Run a tiny hypervisor that fits into some small amount of memory (let's say 10MB)
3) Mark all pages in the system other than the hypervisor's code pages as non-cacheable
4) The hypervisor also has the crypto routines, so all of these non-cacheable pages can now be software encrypted using the hypervisor's routines. The DRAM-resident data is now encrypted.
4a) Because these were marked as non-cacheable data, the hypervisor is still resident in the cache (it was never displaced).
5) Mark some remaining amount of space (let's say 20MB) of physical memory as cacheable. This physical memory currently contains no data at all.
6) When you want to run a program or an OS, have the hypervisor move that program's starting code into the 20-meg-range (decrypt it along the way) and set its virtual pages to point to that physical memory range
7) The program can now run because (at least some of) its pages are decrypted. They are also cacheable, so it will hit in the cache
8) When you try to access code or data that is still encrypted, it will cause a page fault
9) The hypervisor's page fault handler will get that encrypted data, decrypt it, and put it somewhere in the 20-meg-range
9a) If the 20-meg range is already full of decrypted data, you will have to re-encrypt some of it and spill it back to DRAM (like paging it out to disk).
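The steps above amount to demand paging with encryption applied at the page boundary. Here is a toy Python model of that idea, not PrivateCore's implementation. Assumptions, loudly: `EncryptedPager`, `keystream`, and the frame count are hypothetical names for illustration; the SHA-256 XOR keystream is a placeholder and NOT secure encryption (a real system would use a proper cipher such as AES); and the cache itself is not modeled. DRAM holds only ciphertext, a fixed number of "cacheable" frames hold plaintext, and touching a non-resident page runs the "fault handler", which re-encrypts and spills the least recently used frame when the range is full.

```python
import hashlib
from collections import OrderedDict

PAGE_SIZE = 4096


def keystream(key: bytes, page_no: int) -> bytes:
    """Hypothetical per-page keystream (SHA-256 in counter mode). NOT secure crypto."""
    out = bytearray()
    counter = 0
    while len(out) < PAGE_SIZE:
        block = key + page_no.to_bytes(8, "little") + counter.to_bytes(8, "little")
        out += hashlib.sha256(block).digest()
        counter += 1
    return bytes(out[:PAGE_SIZE])


def xor_bytes(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))


class EncryptedPager:
    """DRAM holds only ciphertext; a few "cacheable" frames hold plaintext."""

    def __init__(self, key: bytes, num_pages: int, frame_count: int):
        self.key = key
        self.frame_count = frame_count
        # "DRAM": every page starts out as an encrypted page of zeros.
        self.dram = {p: xor_bytes(bytes(PAGE_SIZE), keystream(key, p))
                     for p in range(num_pages)}
        # "Cacheable range": page number -> plaintext, in least-recently-used order.
        self.frames = OrderedDict()
        self.faults = 0

    def _fault(self, page: int) -> None:
        """Page-fault handler: spill the LRU frame if full, then decrypt `page` into a frame."""
        self.faults += 1
        if len(self.frames) >= self.frame_count:
            victim, plain = self.frames.popitem(last=False)
            self.dram[victim] = xor_bytes(bytes(plain), keystream(self.key, victim))
        self.frames[page] = bytearray(xor_bytes(self.dram[page], keystream(self.key, page)))

    def _frame(self, addr: int) -> bytearray:
        page = addr // PAGE_SIZE
        if page not in self.frames:
            self._fault(page)
        self.frames.move_to_end(page)
        return self.frames[page]

    def read(self, addr: int) -> int:
        return self._frame(addr)[addr % PAGE_SIZE]

    def write(self, addr: int, value: int) -> None:
        self._frame(addr)[addr % PAGE_SIZE] = value


pager = EncryptedPager(b"demo key", num_pages=8, frame_count=2)
pager.write(0, 42)                  # fault: page 0 decrypted into a frame
pager.write(PAGE_SIZE, 7)           # fault: page 1 decrypted into a frame
pager.write(2 * PAGE_SIZE, 9)       # fault: page 2 brought in, LRU page 0 re-encrypted to "DRAM"
print(pager.read(0), pager.faults)  # 42 4
```

With only two frames, the last read has to fault page 0 back in (evicting page 1), which is exactly the "DRAM as swap" behavior described above.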

Because you are only touching ~30 megs of physical memory that is marked as cacheable, you will "never" spill decrypted data to the DRAM. Essentially, they built a system that has 30 megs of main memory (that 30 megs being the on-chip SRAM of the L3 cache), and DRAM is treated like disk/swap in a demand-paging system.

The reason I am convinced this is likely an acqui-hire, rather than a purchase of a particular product, is that this will be unbearably slow on anything but tiny cache-resident applications. Now, not only are cache misses page faults, but they also require moving data around and running through software crypto routines. They claim good performance, but I can't imagine anything but 100x or more slowdowns when dealing with server applications that have large working sets. Faults to supervisor/hypervisor mode are not cheap. Perhaps for CPU-bound benchmark applications you will see little slowdown, but I've seen no solid evidence that their performance is any good. Looking at the way they do this memory encryption, the evidence strongly points towards huge overheads when, e.g., walking linked lists or dealing with disparate data. Imagine taking a page fault (and encrypting 4KB of data and then decrypting a different 4KB of data) on every memory access because you're jumping around a 40MB hash table.
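To put a rough number on the hash-table example, here is a small Python model under hypothetical assumptions: 4 KB pages, a 20 MB plaintext range managed with LRU, and uniformly random accesses over a 40 MB table. It only counts how often an access lands on a non-resident page (i.e. would take the fault-plus-crypto path); it says nothing about what each fault costs, which depends on the hardware.

```python
import random
from collections import OrderedDict

PAGE = 4096
MB = 1 << 20


def fault_rate(table_bytes: int, resident_bytes: int, accesses: int, seed: int = 0) -> float:
    """Fraction of uniformly random accesses that miss the resident (plaintext) pages."""
    rng = random.Random(seed)
    num_pages = table_bytes // PAGE
    capacity = resident_bytes // PAGE
    resident = OrderedDict()  # page -> None, kept in LRU order
    faults = 0
    for _ in range(accesses):
        page = rng.randrange(num_pages)
        if page in resident:
            resident.move_to_end(page)
            continue
        faults += 1
        if len(resident) >= capacity:
            resident.popitem(last=False)  # re-encrypt and spill the LRU page
        resident[page] = None
    return faults / accesses


print(round(fault_rate(40 * MB, 20 * MB, 200_000), 2))  # about 0.5
```

Under these assumptions roughly every other access faults, because only half of the table can be plaintext at any moment; the replacement policy doesn't help when the access pattern is uniformly random.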

Beyond the likely performance problems, they also cannot offer complete guarantees that data in the caches won't spill to DRAM by accident. I can think of a number of possible ways to trick a cache into spilling its data, and I don't believe that any processor manufacturer guarantees that cached data will remain resident. If your security stance is "well, a cache eviction probably won't happen due to reasons outside my control," I don't think you will be very safe against a particularly dedicated hacker. Mind you, this would only leak up to 30MB of user data at a time, but that's still a crack in the dam. Why would you pay a huge performance overhead for no actual security guarantee?

Their solution to this problem (in the patent application, anyway) was to periodically scan the cacheable physical memory range and scrub it of unencrypted data.
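The comment doesn't say how often the scrubber runs or how it picks pages, so this Python sketch simply assumes a pass that re-encrypts every resident page. All names (`scrub`, `frames`, `dram`, `encrypt`) are hypothetical, and `toy_encrypt` is a placeholder, not real encryption. Under these assumptions, plaintext only sits in the cacheable range from the time a page faults in until the next scrub pass.

```python
from typing import Callable, Dict


def scrub(frames: Dict[int, bytearray], dram: Dict[int, bytes],
          encrypt: Callable[[int, bytes], bytes]) -> int:
    """One hypothetical scrub pass over the cacheable (plaintext) range.

    Every resident page is re-encrypted back to DRAM, its plaintext frame is
    overwritten with zeros, and the page is dropped, so the next access to it
    takes a page fault and is decrypted again. Returns the number of pages scrubbed.
    """
    count = len(frames)
    for page, plain in list(frames.items()):
        dram[page] = encrypt(page, bytes(plain))
        plain[:] = bytes(len(plain))  # wipe the plaintext frame in place
        del frames[page]
    return count


def toy_encrypt(page: int, data: bytes) -> bytes:
    return bytes(b ^ 0x5A for b in data)  # placeholder only; NOT real encryption


frame = bytearray(b"secret")
frames = {3: frame}
dram: Dict[int, bytes] = {}
print(scrub(frames, dram, toy_encrypt), frames, frame == bytearray(6))  # 1 {} True
```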

In summary: I don't believe that Facebook will be implementing this technique in their servers. If they really wanted encrypted DRAM, they would pay Intel or AMD to build a semi-custom processor with encryption techniques built into the DRAM controllers. They bought this company because they want to hire these guys who have a lot of kernel and hypervisor knowledge.

Comment Old Paper on Toying with Crackers (Score 2) 174

by Menacer on Tuesday February 01, 2011 @11:50AM (#35068144) Attached to: World's Worst Hacker?

This reminds me of Bill Cheswick's paper "An Evening with Berferd In Which a Cracker is Lured, Endured, and Studied," from the 1992 Winter USENIX Conference. (The paper is available directly from Mr. Cheswick's site as a PostScript file.)

In it, he toys with an intruder for a number of days. He pretends the system has actually been hacked, gives up bogus password files, and manually pretends to be a particularly slow machine with a lot of easy holes in it. It's an excellent, well-written paper. I recommend it to anyone who enjoyed this video.

Comment Re:Adobe Reader, now even slower! (Score 4, Insightful) 201

by Menacer on Friday November 19, 2010 @11:50AM (#34282068) Attached to: Adobe Launches Sandboxed Reader X

Just get Foxit and be done with it. It's light weight, doesn't hang browsers while opening large PDFs, has a SIGNIFICANTLY better search interface, and so far hasn't been subject to any major attacks/flaws.

You're incorrect that Foxit Reader has not been subject to attacks or flaws. This article from last year, for instance, describes in-the-wild attacks on Foxit. A Google search for "foxit reader buffer overflow" brings up a number of known (though by now patched) exploits.

Foxit Reader, like any other piece of software, is bound to have errors. Use it because you like the interface, or use it because it's less likely to be exploited due to its relative unpopularity. Don't delude yourself into thinking it's completely secure. That's the same fallacious argument that some OS X and Linux users make when saying that their operating systems are immune from viruses or worms. They may be more secure when compared to Windows, but there's nothing in their underlying architecture that prevents them from being exploited with enough effort.

Comment Re:Just an extension of existing debug facilities (Score 5, Informative) 154

by Menacer on Friday November 12, 2010 @10:41AM (#34206406) Attached to: Hidden Debug Mode Found In AMD Processors

Sure, but it's much faster to do it in hardware. This is the whole reason data watchpoints exist (see, for instance, the paper "Some Requirements for Architectural Support of Software Debugging" by Mark Scott Johnson from ASPLOS-I). You could technically have your debugger put address & data checks around every memory access, but that leads to completely unacceptable overheads. It's faster to let the hardware check the addresses in parallel with regular execution and take a fault only if you touch the watchpoint.

Similarly, if the hardware checks the value before taking a debug interrupt into the kernel (which then has to signal and schedule the debugger), it will be much, much faster than doing all of that and then having the debugger check the value & throw this particular interrupt away before continuing execution. That constant interrupt cycle can cause 10,000x or more slowdowns if you're constantly accessing a value & taking useless watchpoint faults on it.
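The arithmetic behind that argument can be spelled out in a few lines of Python. Every number below is a hypothetical placeholder chosen for illustration, not a measurement: the point is only that the trap count multiplies the per-trap cost, so letting the hardware filter by value removes almost all of the overhead when the interesting value is rare.

```python
def slowdown(base_cycles: int, traps: int, cycles_per_trap: int) -> float:
    """Run time with watchpoint traps, relative to the same run with no watchpoint."""
    return (base_cycles + traps * cycles_per_trap) / base_cycles


# Hypothetical workload: 10M cycles of real work that touches the watched
# address 1,000,000 times, of which only 1 access carries the interesting value.
BASE, ACCESSES, MATCHES, TRAP_COST = 10_000_000, 1_000_000, 1, 10_000

print(slowdown(BASE, ACCESSES, TRAP_COST))  # debugger filters every trap: 1001.0
print(slowdown(BASE, MATCHES, TRAP_COST))   # hardware filters by value: 1.001
```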

Comment Re:Just an extension of existing debug facilities (Score 5, Informative) 154

by Menacer on Friday November 12, 2010 @10:34AM (#34206336) Attached to: Hidden Debug Mode Found In AMD Processors

Oh, and the summary's description, "hardware data-aware conditional breakpoints, and direct hardware 'page guard'-style breakpoints", matches up with the line I copied & pasted from the forum post. I previously described the "hardware data-aware conditional breakpoints," where you can make the hardware take a fault only if the address of a memory operation matches && the value of the memory operation matches. Looking through my notes, embedded Power ISA (Book III-E) processors also let you set value-dependent watchpoints using the Data Address Compare (DAC) Registers. I'm not sure about other ISAs.
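The "address matches && value matches" rule can be written as a predicate. This Python sketch is a hypothetical model, not the actual AMD or Power ISA register layout: the `Watchpoint` fields and the optional `data_mask` (echoing the forum post's "data match (under optional mask)") are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Watchpoint:
    address: int
    data: int
    data_mask: int = 0xFFFFFFFFFFFFFFFF  # 1 bits must match; 0 bits are don't-cares


def fires(wp: Watchpoint, address: int, value: int) -> bool:
    """Fault only if the address matches AND the (masked) value matches."""
    if address != wp.address:
        return False
    return (value & wp.data_mask) == (wp.data & wp.data_mask)


wp = Watchpoint(address=0x7FFE1000, data=0)
print(fires(wp, 0x7FFE1000, 0), fires(wp, 0x7FFE1000, 5), fires(wp, 0x7FFE1008, 0))  # True False False

# Only the low byte is compared: fires for any value whose low byte is 0x2A.
low_byte = Watchpoint(address=0x7FFE1000, data=0x2A, data_mask=0xFF)
print(fires(low_byte, 0x7FFE1000, 0x12342A))  # True
```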

The second part of the summary's statement refers to 'page guard'-style breakpoints. This is referenced by Czernobyl's "masking of any or all of 12 low address bits". Again, this is a very interesting extension of the x86 debug registers, which only allow debug watchpoints of size 1, 2, 4, or 8 bytes (and the latter only in certain microarchitectures & modes). However, by turning up to 12 of the low address bits into don't-cares, it's possible to set watchpoints anywhere from 1 to 4096 bytes, limited to powers of two and size-alignment. This is cool from an x86 standpoint, but ARM, MIPS, and Itanium (off the top of my head) already do this.
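The arithmetic behind the "page guard" trick: treating the low k address bits as don't-cares makes one watchpoint cover every address in the naturally aligned 2^k-byte block that contains the watched address, so k = 12 covers a whole 4 KB page. A small Python sketch (the helper names are hypothetical):

```python
def mask_for(size: int) -> int:
    """Don't-care mask for a power-of-two region of 1..4096 bytes."""
    if size < 1 or size > 4096 or size & (size - 1):
        raise ValueError("size must be a power of two between 1 and 4096")
    return size - 1  # the low log2(size) bits are set


def in_region(watch_addr: int, size: int, access_addr: int) -> bool:
    """True if access_addr falls in the size-aligned block that contains watch_addr."""
    ignore = mask_for(size)
    return (access_addr & ~ignore) == (watch_addr & ~ignore)


# Masking all 12 low bits turns one watchpoint into a guard over a whole 4 KB page.
print(in_region(0x401234, 4096, 0x401000),
      in_region(0x401234, 4096, 0x401FFF),
      in_region(0x401234, 4096, 0x402000))  # True True False
```

Note the alignment restriction falls out of the masking: an 8-byte watchpoint on 0x401234 covers 0x401230 through 0x401237, not 0x401234 through 0x40123B.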

Suffice it to say, the stuff that Czernobyl found is very cool in relation to x86, especially if these facilities were officially released to the public at any point in the future. However, it's very unlikely to cause any kind of AMD-only viruses or other scary security concerns. These features exist on other ISAs without any kind of world-shattering problems. :)

Comment Just an extension of existing debug facilities (Score 5, Informative) 154

by Menacer on Friday November 12, 2010 @10:13AM (#34206176) Attached to: Hidden Debug Mode Found In AMD Processors

Based solely on the Google cache of the forum post describing this (linked above), there's no need to go into hysterics. For hardware and systems geeks, this is very cool. It's an extension of the existing x86 debug registers (DR0-7) that allows you to set a debug watchpoint that only fires when specific data is loaded in.
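For reference, arming one of the existing x86 debug registers means putting a linear address in DR0-DR3 and setting the matching enable, R/W, and LEN fields in DR7. The Python helper below only computes the DR7 bit pattern for one slot, following the architectural layout (local enable at bit 2n, R/W field at bits 16+4n, LEN field at bits 18+4n); it cannot actually load DR7, which requires ring 0. The `dr7_for` name is hypothetical, and the undocumented value-match and address-mask extensions are not modeled.

```python
# R/W field encodings ("io" is only valid when CR4.DE is set).
RW_BITS = {"execute": 0b00, "write": 0b01, "io": 0b10, "readwrite": 0b11}
# LEN field encodings (8 bytes only on certain processors and modes).
LEN_BITS = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_for(slot: int, access: str, length: int) -> int:
    """DR7 bits that locally enable debug register DR<slot> (0-3)."""
    if slot not in range(4):
        raise ValueError("slot must be 0..3")
    if access == "execute" and length != 1:
        raise ValueError("instruction breakpoints must use length 1")
    value = 1 << (2 * slot)                       # Ln: local enable
    value |= RW_BITS[access] << (16 + 4 * slot)   # R/Wn: which accesses fire
    value |= LEN_BITS[length] << (18 + 4 * slot)  # LENn: size of the watched region
    return value


print(hex(dr7_for(0, "write", 4)))  # 0xd0001
```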

There are a lot of researchers and tool builders that would love to have this because it would let them take a watchpoint fault only when a specific value is read from or written to a specific location. For instance, let's say that every so often you get a null pointer exception at a specific address. If you currently go into gdb and set 'watch 0x{address}', you're going to take a breakpoint every single time that pointer is accessed. Wouldn't it be great to do something like 'watch 0x{address} NULL' and only stop your debugger whenever 0 gets written into that address?
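The difference can be simulated over a recorded trace of stores. This is a hypothetical Python illustration: the trace format and the `stops` helper are invented, and 'watch 0x{address} NULL' remains the imagined syntax from the comment. An ordinary watchpoint stops on every store to the address, while a value-conditional one stops only at the store that writes 0.

```python
from typing import List, Optional, Tuple

Store = Tuple[int, int]  # (address, value written): hypothetical trace format


def stops(trace: List[Store], address: int, value: Optional[int] = None) -> List[int]:
    """Indices of stores where the debugger would stop.

    value=None models today's address-only watchpoint; an integer models the
    wished-for "stop only when this value is written" watchpoint.
    """
    return [i for i, (a, v) in enumerate(trace)
            if a == address and (value is None or v == value)]


PTR = 0x601040  # hypothetical address of the pointer that later turns out NULL
trace = [(PTR, 0x7F00), (0x601048, 0), (PTR, 0x7F10), (PTR, 0), (PTR, 0x7F20)]
print(stops(trace, PTR))           # [0, 2, 3, 4]  every store to the pointer
print(stops(trace, PTR, value=0))  # [3]  only the store that writes NULL
```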

That's what the forum posts imply, at least. "Guys, I've reversed this in part... breakpoints defined in DR0 can be made to fire only on data match (under optional mask), plus masking of any or all of 12 low address bits ! Works also for I/O break points, provided CR4_DE is set, of course !"

I would wager that this is not a large security concern. Access to DR7 is restricted to ring 0, and therefore enabling debug breakpoints must be done by the operating system. While extremely interesting (I wish I could read more!), Czernobyl appears to be describing a modification to debug breakpoints that are already enabled.

Comment Re:sweet! (Score 5, Informative) 202

by Menacer on Saturday August 07, 2010 @12:03AM (#33171532) Attached to: Debian 6.0 "Squeeze" Frozen

Individuals without a company and contributors with unknown affiliation add more to the Linux kernel than any _individual_ company, but that does not negate the statement that "the majority of contributions to Linux are from profit-making corporations". Red Hat, Novell, and IBM together make more Linux kernel contributions than all of the unaffiliated and unknown-affiliation contributors combined.

The document you appear to have misread even includes this sentence: "It is worth noting that, even if one assumes that all of the 'unknown' contributors were working on their own time, over 70% of all kernel development is demonstrably done by developers who are being paid for their work."
