Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×

A New Vulnerability In RSA Cryptography 108

romiz writes, "Branch Prediction Analysis is a recent attack vector against RSA public-key cryptography on personal computers that relies on timing measurements to get information on the bits in the private key. However, the method is not very practical because it requires many attempts to obtain meaningful information, and the current OpenSSL implementation now includes protections against those attacks. However, German cryptographer Jean-Pierre Seifert has announced a new method called Simple Branch Prediction Analysis that is at the same time much more efficient that the previous ones, only needs a single attempt, successfully bypasses the OpenSSL protections, and should prove harder to avoid without a very large execution penalty." From the article: "The successful extraction of almost all secret key bits by our SBPA attack against an openSSL RSA implementation proves that the often recommended blinding or so called randomization techniques to protect RSA against side-channel attacks are, in the context of SBPA attacks, totally useless." Le Monde interviewed Seifert (in French, but Babelfish works well) and claims that the details of the SBPA attack are being withheld; however, a PDF of the paper is linked from the ePrint abstract.
This discussion has been archived. No new comments can be posted.

A New Vulnerability In RSA Cryptography

Comments Filter:
  • I got a question... (Score:4, Interesting)

    by sam0vi ( 985269 ) on Saturday November 18, 2006 @05:50PM (#16899318)
    When i see this kind of news the following question arises: so what are we supposed to do now? Throw away RSA cryptography is not the best answer i think. What do you, fellow /.ers, would do to by pass this problem?
  • VMS (Score:3, Interesting)

    by MichaelSmith ( 789609 ) on Saturday November 18, 2006 @06:14PM (#16899528) Homepage Journal

    I started working on VAX/VMS systems in 1986. I changed jobs to another DEC site after nine months or so. I got an account, put my username in at the appropriate prompt, hit return and then immediately knew that I had entered my old username, not the new one.

    I had to think for a bit before I knew the reason: VMS searched SYSUAF.DAT for my username before I entered the password. If it found the username it would stop searching. Later versions did the search after the password had been entered and one type of attack became less likely.

    I suppose something like this could be done so that timing can't be used to debug the process runing the algorithm but another way of viewing it is that it is like getting the key from a chip by measuring the amount of power it uses or something. There may be limits to who far protection can go if you have the hardware or are on the same (virtual) machine.

  • by hpa ( 7948 ) on Saturday November 18, 2006 @06:16PM (#16899542) Homepage
    This isn't really a flaw in RSA cryptography, but rather the fairly obvious situation that a branch predictor, shared between processes of different privilege levels, can be used as a covert channel and thus can be used to reveal state. The same is true with the cache, for example, and multithreading makes this problem many times worse by increasing the bandwidth of the channel. On architectures which don't have branch predictors, or don't share them, this is not an issue. ARM processors, for example, tend to rely on predication rather than branches (except when running Thumb), and thus don't suffer the same problem.

    This class of problems is only going to grow as CPUs become less and less deterministic.
  • by Cid Highwind ( 9258 ) on Saturday November 18, 2006 @06:31PM (#16899660) Homepage
    Think managed web hosting companies that put dozens of virtual hosts on a single physical server. If this really works from an unprivileged account, one malicious user could steal SSL keys from all the rest.
  • by Charles Dodgeson ( 248492 ) * <jeffrey@goldmark.org> on Saturday November 18, 2006 @06:53PM (#16899820) Homepage Journal
    What about just adding small random timing loops into the encryption algorithm?
    Apparently that is already done to thwart what the paper calls "classical timing attacks", but this attack is going after shared information about optimization. That information sharing seems part of the CPU architecture from my brief look at the paper.
  • Re:Not so bad... (Score:5, Interesting)

    by SnowZero ( 92219 ) on Saturday November 18, 2006 @06:56PM (#16899848)
    It gets better. The attack requires that the two processes are running on the same core with hyperthreading enabled (i.e. ALU-poor CMP). The "spy" process will be sucking up 100% cpu pretty much continuously. They also simplified the multiplication routine from OpenSSL. Even if you are running such a setup on a P4 with HT turned on (even though its often useless), and you need to run secure processes along with unsecure ones (generally not a good idea anyway), patches already exist for Linux and BSDs to address this. The patches modify the scheduler to prevent processes from different users from running on the same physical core. A half-hearted attempt is made in the paper to say that these attacks to generalize to something remote, but no details are given as to how their attack would compensate for the 100,000 fold decrease in timing accuracy to pull off the attack on even a local LAN.

    Essentially they took a very impractical attack with an unlikely scenario, and created a somewhat practical attack with an unlikely scenario. Avoid the problem scenario which was raised in the prior work last year, and you are still golden.
  • by maxwell demon ( 590494 ) on Saturday November 18, 2006 @07:45PM (#16900226) Journal
    After now having read the complete article: Shouldn't it be possible to eliminate the branches completely?
    The following loop (adapted from fig. 3 in the paper) should IMHO work as well (although less efficiently):
    S = M
    A = M - 1
    for i from 1 to n-1 do
      S = S * S (mod N)
      C = di /* should be doable without branch by just bit masking and shifting */
      C = C * A
      C = C + 1 /* now if di was 1, C is M, otherwise C is 1 */
      S = S * C (mod N)
    return S
    The only branch here is in the for loop, and that's independent of the key. Unless there are exploitable branches in the multiplication routine, of course.
  • Run it on the GPU (Score:2, Interesting)

    by creysoft ( 856713 ) on Saturday November 18, 2006 @08:45PM (#16900710)
    This seems to rely on spy software watching your particular RSA application decrypt things, and thus said spy software would need to be running on the same hardware. Wouldn't it make sense to start writing RSA implementations that use the computer's GPU whenever possible? Using the GPU as a coprocessor has been discussed on Slashdot previously, and the main disadvantage was that the GPU isn't typically optimized for general purpose computing. However, since most GPU's are optimized for massive number crunching, I would think that RSA would play to its strengths. Obviously, whether or not the GPU is "good" at it, it would still come with the advantage of moving the implementation off the main CPU, and thus foiling any malware. (At least until they design malware that runs on the GPU...)
  • Re:Not so bad... (Score:3, Interesting)

    by DamnStupidElf ( 649844 ) <Fingolfin@linuxmail.org> on Saturday November 18, 2006 @09:39PM (#16901042)
    Even if you are running such a setup on a P4 with HT turned on (even though its often useless), and you need to run secure processes along with unsecure ones (generally not a good idea anyway), patches already exist for Linux and BSDs to address this. The patches modify the scheduler to prevent processes from different users from running on the same physical core.

    The problem is that theoretically the attacker could use javascript or any other locally interpreted language or an ActiveX control under Internet Explorer to run the attacking process as the same user. To get the attacking process scheduled on the same core as the RSA process, just spawn lots of attacker processes. Some of them will get scheduled alongside the crypto process, even on a massively parallel machine.

    The big question will be whether multi-core processors in general will be vulnerable to these attacks using L2 cache timing attacks or something similar.
  • Re: DRM? (Score:3, Interesting)

    by Alsee ( 515537 ) on Saturday November 18, 2006 @10:23PM (#16901218) Homepage
    I would love to see Vista DRMs cracked before the OS even make it to the market... :)

    No no no! I hate when researchers do that!

    Not long ago someone published a way to load unsigned kernal drivers in Vista (Vista's DRM mechanisms criticially rely on owners being strictly forbidden to do that). So what happened? Microsoft patched Vista to prevent that mechanism from working... before the OS even made it to market. So the discovery and all that work went entirely to waste.

    No, when someone finds an anti-DRM method the best thing to do is not just wait until release, but wait until the DRM scheme has had time to build up a critical usage base. If you find two or more methods at the same time, then only reveal them one at a time. Wait till they release the inevitable new lockdown patch, then wait maybe a week or two, then reveal the next one. (Revealing the second one on the very day they patch may be more fun and impressive, but I think it much better to drag out the value and effect of the multiple techniques over a period of weeks or months.)

    -
  • Redundant ? (Score:3, Interesting)

    by udippel ( 562132 ) on Saturday November 18, 2006 @11:19PM (#16901500)
    Even though this might be redundant after so many comments, it might be summarized again that the good prof has not broken RSA. Actually, the vulnerability has little to nothing to do with RSA.

    The whole thing - as critical as it is - spies on the processes running on the machine. It is an indirect attack, checking on the resources used while performing some not broken algorithmic calculations.
    When you disable pipelining and cache while doing the calculations, there is not much to spy on and nothing to gain. Just prevent the wanna-be intruder from seeing cache, pipelines, CPU from working makes you safe.
    The problem is, that this isn't very practical.

    I wished the editors used less misleading headlines. There is no vulnerability in RSA cryptography per se. It is rather that you observe Men At Work and from what you see you can guess what the're doing.
    And in principle this applies to any other cryptography just as well. Inclusive DRM (which makes me giggle).

  • by lgalfaso ( 1029202 ) on Sunday November 19, 2006 @12:36AM (#16901836)
    Hi, This paper is based on the wrong assumption that the algorithm that is used is binary exponentiation. This is false as every single respectable implementation uses N-ary multiplication or sliding windows in the worst case scenario. In both of these cases, the attack as shown in the paper would only be able to predict minimal information. Also, the claimed statement that you can do nothing with this type of attack is completely false (even in the case of binary exponentiation.) Just do this: - If you have a 1, do as usual - If you have a 0, do as in the case you have a 1, but ignore the last value, and use the result of the squaring. Regards, LG
  • Great idea! (Score:5, Interesting)

    by DrJimbo ( 594231 ) on Sunday November 19, 2006 @12:57AM (#16901932)
    That is a clever vectorization of the square-multiply loop. It sure looks to me like it would work (I used RSA encryption as the final project in a University assembly language class I taught). The slight decrease in efficiency of your routine will be not be noticed. The timing of the entire process is totally dominated by the N-byte x N-byte multiplications. An extra N-byte x 1-byte multiplication will cause less than a 1% slowdown, probably much less.

    A slight improvement to your idea might be to balance the loop anyway, using D = 1 - di, etc., essentially a vectorized version of figure 4. This would slow it down by a factor of two but it would make it resistant to conventional timing attacks.

  • by Terje Mathisen ( 128806 ) on Sunday November 19, 2006 @02:35PM (#16905446)
    From the linked article:

    R0 = 1; R1 = M
    for i from 0 to n-1 do
        if d[i] then
            R1 = R0 * R1 mod N
            R0 = R0 * R0 mod N
        else
            R0 = R0 * R1 mod N
            R1 = R1 * R1 mod N
    return R0

    The key-dependent if statement is the key here, if we can remove all such branches, then there's no Branch Target Buffer entry that depends on it, and no timing channel attack either:


    R0 = 1; R1 = M;
    for (i = 0; i < n; i++) {
        mask = 0 - d[i]; // Either 0 or -1
        nmask = mask ^ -1; // -1 or 0
        T0 = R0 & mask; // Either 0 or R0
        T0 += R1 & nmask; // At this point T0 will point to the value to be squared, R0 or R1!

        T1 = R0 * R1 mod N;
        T0 = T0 * T0 mod N; // Now we move the correct values back into R0 & R1
        R1 = T1 & mask;
        R0 = T0 & mask;
        R0 += T1 & nmask;
        R1 += T0 & nmask;
    }
    return R0;


    There are at least three interesting issues here:

    a) Most modern cpus have hw support for conditional operations, on x86 this is in the form of CMOVcc which is a (constant-time!) conditional move into a register, but as shown above, it really isn't needed here.

    b) The perforance impact of the above branch removal can be negative!
    On a P4 a branch miss costs about 20 clock cycles, and since a key-dependent branch will miss 50% of the time, the average cost is 10 cycles. My replacement code above takes around 5 cycles or less on any current cpu.

    c) A final possible timing-channel attack would be due to the memory alignment of the R0 and R1 values:
    By allocating them at the same address modulo the cpu page size, i.e. at 4 KB offset, the cache lines hit will be the same for both.

    When I worked on the asm version of DFC, one of the AES also-rans, I removed a similar timing attack from a core 128-bit modular multiplication operation, using very similar techniques.

    Terje
  • by maxwell demon ( 590494 ) on Sunday November 19, 2006 @05:03PM (#16906636) Journal
    everyone one of the statements that contain *, mod and + have
    branching. Or did you just think they magically happen?

    The question is if they have exploitable branching. For example, the only + in that algorithm is the C=C+1, and that can clearly be done without exploitable branching, e.g. with the following x86 assembler code (assuming lsb-first storage, as usual for x86):
    ADD1TOC:
      mov ecx, LENGTH
      mov esi, ADDRC
      mov edi, ADDRC
      clc
    LOOP:
      lodsd
      adc eax, 1
      stosd
      dec ecx
      jnz LOOP
    Note that the only branch here is the jnz for the loop, but that doesn't reveal anything about the key.

    Also multiplication can be implemented without exploitable branches. I'm not completely sure about modulo, but I don't see an immediate reason why it shouldn't be possible as well.

    The fact that internally the processor has to make decisions does not have any relevance for this exploit, because those won't affect the branch predictor (which is only concerned with instruction-level branches, not possible branches in microcode). So yes, from the instruction level view, the addition/multiplication "magically" happen (the "magic" of course being in the microcode and the digital logic of the processor).

New York... when civilization falls apart, remember, we were way ahead of you. - David Letterman

Working...