A New Vulnerability In RSA Cryptography 108
romiz writes, "Branch Prediction Analysis is a recent attack vector against RSA public-key cryptography on personal computers that relies on timing measurements to get information on the bits in the private key. However, the method is not very practical because it requires many attempts to obtain meaningful information, and the current OpenSSL implementation now includes protections against those attacks. However, German cryptographer Jean-Pierre Seifert has announced a new method called Simple Branch Prediction Analysis that is at the same time much more efficient that the previous ones, only needs a single attempt, successfully bypasses the OpenSSL protections, and should prove harder to avoid without a very large execution penalty." From the article: "The successful extraction of almost all secret key bits by our SBPA attack against an openSSL RSA implementation proves that the often recommended blinding or so called randomization techniques to protect RSA against side-channel attacks are, in the context of SBPA attacks, totally useless." Le Monde interviewed Seifert (in French, but Babelfish works well) and claims that the details of the SBPA attack are being withheld; however, a PDF of the paper is linked from the ePrint abstract.
I got a question... (Score:4, Interesting)
VMS (Score:3, Interesting)
I started working on VAX/VMS systems in 1986. I changed jobs to another DEC site after nine months or so. I got an account, put my username in at the appropriate prompt, hit return and then immediately knew that I had entered my old username, not the new one.
I had to think for a bit before I knew the reason: VMS searched SYSUAF.DAT for my username before I entered the password. If it found the username it would stop searching. Later versions did the search after the password had been entered and one type of attack became less likely.
I suppose something like this could be done so that timing can't be used to debug the process runing the algorithm but another way of viewing it is that it is like getting the key from a chip by measuring the amount of power it uses or something. There may be limits to who far protection can go if you have the hardware or are on the same (virtual) machine.
Branch predictor as a covert channel (Score:5, Interesting)
This class of problems is only going to grow as CPUs become less and less deterministic.
Re:Unsecure computer - no secrets. Big deal ! (Score:4, Interesting)
Re:I got a question... (Score:3, Interesting)
Re:Not so bad... (Score:5, Interesting)
Essentially they took a very impractical attack with an unlikely scenario, and created a somewhat practical attack with an unlikely scenario. Avoid the problem scenario which was raised in the prior work last year, and you are still golden.
Re:I got a question... (Score:5, Interesting)
The following loop (adapted from fig. 3 in the paper) should IMHO work as well (although less efficiently): The only branch here is in the for loop, and that's independent of the key. Unless there are exploitable branches in the multiplication routine, of course.
Run it on the GPU (Score:2, Interesting)
Re:Not so bad... (Score:3, Interesting)
The problem is that theoretically the attacker could use javascript or any other locally interpreted language or an ActiveX control under Internet Explorer to run the attacking process as the same user. To get the attacking process scheduled on the same core as the RSA process, just spawn lots of attacker processes. Some of them will get scheduled alongside the crypto process, even on a massively parallel machine.
The big question will be whether multi-core processors in general will be vulnerable to these attacks using L2 cache timing attacks or something similar.
Re: DRM? (Score:3, Interesting)
No no no! I hate when researchers do that!
Not long ago someone published a way to load unsigned kernal drivers in Vista (Vista's DRM mechanisms criticially rely on owners being strictly forbidden to do that). So what happened? Microsoft patched Vista to prevent that mechanism from working... before the OS even made it to market. So the discovery and all that work went entirely to waste.
No, when someone finds an anti-DRM method the best thing to do is not just wait until release, but wait until the DRM scheme has had time to build up a critical usage base. If you find two or more methods at the same time, then only reveal them one at a time. Wait till they release the inevitable new lockdown patch, then wait maybe a week or two, then reveal the next one. (Revealing the second one on the very day they patch may be more fun and impressive, but I think it much better to drag out the value and effect of the multiple techniques over a period of weeks or months.)
-
Redundant ? (Score:3, Interesting)
The whole thing - as critical as it is - spies on the processes running on the machine. It is an indirect attack, checking on the resources used while performing some not broken algorithmic calculations.
When you disable pipelining and cache while doing the calculations, there is not much to spy on and nothing to gain. Just prevent the wanna-be intruder from seeing cache, pipelines, CPU from working makes you safe.
The problem is, that this isn't very practical.
I wished the editors used less misleading headlines. There is no vulnerability in RSA cryptography per se. It is rather that you observe Men At Work and from what you see you can guess what the're doing.
And in principle this applies to any other cryptography just as well. Inclusive DRM (which makes me giggle).
Nobody uses binary exponentiation (Score:2, Interesting)
Great idea! (Score:5, Interesting)
A slight improvement to your idea might be to balance the loop anyway, using D = 1 - di, etc., essentially a vectorized version of figure 4. This would slow it down by a factor of two but it would make it resistant to conventional timing attacks.
Solution to: Branch predictor as a covert channel (Score:4, Interesting)
R0 = 1; R1 = M
for i from 0 to n-1 do
if d[i] then
R1 = R0 * R1 mod N
R0 = R0 * R0 mod N
else
R0 = R0 * R1 mod N
R1 = R1 * R1 mod N
return R0
The key-dependent if statement is the key here, if we can remove all such branches, then there's no Branch Target Buffer entry that depends on it, and no timing channel attack either:
R0 = 1; R1 = M;
for (i = 0; i < n; i++) {
mask = 0 - d[i];
nmask = mask ^ -1;
T0 = R0 & mask;
T0 += R1 & nmask;
T1 = R0 * R1 mod N;
T0 = T0 * T0 mod N;
R1 = T1 & mask;
R0 = T0 & mask;
R0 += T1 & nmask;
R1 += T0 & nmask;
}
return R0;
There are at least three interesting issues here:
a) Most modern cpus have hw support for conditional operations, on x86 this is in the form of CMOVcc which is a (constant-time!) conditional move into a register, but as shown above, it really isn't needed here.
b) The perforance impact of the above branch removal can be negative!
On a P4 a branch miss costs about 20 clock cycles, and since a key-dependent branch will miss 50% of the time, the average cost is 10 cycles. My replacement code above takes around 5 cycles or less on any current cpu.
c) A final possible timing-channel attack would be due to the memory alignment of the R0 and R1 values:
By allocating them at the same address modulo the cpu page size, i.e. at 4 KB offset, the cache lines hit will be the same for both.
When I worked on the asm version of DFC, one of the AES also-rans, I removed a similar timing attack from a core 128-bit modular multiplication operation, using very similar techniques.
Terje
Re:I got a question... (Score:3, Interesting)
The question is if they have exploitable branching. For example, the only + in that algorithm is the C=C+1, and that can clearly be done without exploitable branching, e.g. with the following x86 assembler code (assuming lsb-first storage, as usual for x86): Note that the only branch here is the jnz for the loop, but that doesn't reveal anything about the key.
Also multiplication can be implemented without exploitable branches. I'm not completely sure about modulo, but I don't see an immediate reason why it shouldn't be possible as well.
The fact that internally the processor has to make decisions does not have any relevance for this exploit, because those won't affect the branch predictor (which is only concerned with instruction-level branches, not possible branches in microcode). So yes, from the instruction level view, the addition/multiplication "magically" happen (the "magic" of course being in the microcode and the digital logic of the processor).