Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×

SHA-1 Cracking On A Budget 92

cloude-pottier writes "An enterprising individual went on eBay and found boards with more than half a dozen Virtex II Pro FPGAs, nursed them back to life and build a SHA-1 cracker with two of the boards. This is an excellent example of recycling, as these were originally a part of a Thompson Grass Valley HDTV broadcast system. As a part of the project, the creator wrote tools designed to graph the relationships between components. He also used JTAG to make reverse engineering the organization of the FPGAs on the board more apparent. More details can be seen on the actual project page."
This discussion has been archived. No new comments can be posted.

SHA-1 Cracking On A Budget

Comments Filter:
  • by MollyB ( 162595 ) * on Saturday September 01, 2007 @06:40AM (#20432487) Journal
    If this story is hard to understand (was for me), then a comment following TFA might be useful, if you don't/didn't read that far:

    5. FPGA - field programmable gate arrays are sort of like reconfigurable circuitry - they can be programmed to perform complex computations in one giant "step", rather than as a sequence of instructions (how a general purpose cpu like the pentium operates).

    This makes them fairly pointless for general computing, but when you need to crunch a bunch of numbers in the same way over and over, they can REALLY outperform a general cpu. Usually these are used to manipulate audio / video data streams in real time (the original purpose for the FPGAs used in this project) - but recently people have started using them to brute-force try to crack an encryption scheme. Where a general purpose cpu might take upwards of 40 clock cycles to check one possible answer, each of the FPGAs in this system can check at least one answer PER clock cycle.

    This guy pulled a bunch of FPGA systems out of some (defective?) HDTV video processing systems - reverse engineered exactly how everything was wired together, reprogrammed the FPGAs to do SHA-1 hash cracking rather than HDTV video processing, and added some usb control circuitry so the system could take commands from / return results to a pc.

    One could use this same board setup to do any sort of massively parallel data processing, but right now the system isn't wired up to really feed large amounts of data into / out of the system in real time. He can get away with that as hash cracking results are fairly small and infrequent, so the limited means he has for getting "answers" out of the system isn't too much of a problem.

    Posted at 4:39AM on Sep 1st 2007 by smilr
    HTH.

  • Re:Benchmarks? (Score:5, Informative)

    by Cheesey ( 70139 ) on Saturday September 01, 2007 @07:06AM (#20432553)
    Seems to me that it searches all possible 64-bit words that could be given to SHA-1. It cleverly reorders the search so that the Hamming distance of each block is at most 2 bits from the previous block, which allows Virtex block RAM resources to be used as part of the hash hardware. FPGA engineers often only use block RAMs for caches, FIFOs and scratchpads, so it is interesting to see them being used as part of the pipeline in this way despite the two-port limitation.

    So it doesn't search all possible inputs to SHA-1, but maybe you could use it in this situation:

    SHA-1(salt, password) = hash
    Given hash and salt, you can find password by a brute force search on this hardware (assuming password is less than 9 characters in length). This could be useful for obtaining user passwords from /etc/shadow when something like md5crypt is in use, although md5crypt might well be designed to defeat/slow down this type of attack, for example by using multiple rounds of hashing (as done by the older DES-based crypt program).
  • How fast is that? (Score:3, Informative)

    by tcdk ( 173945 ) on Saturday September 01, 2007 @07:06AM (#20432555) Homepage Journal

    NSA@home is a fast FPGA-based SHA-1 and MD5 bruteforce cracker. It is capable of searching the full 8-character keyspace (from a 64-character set) in about a day in the current configuration for 800 hashes concurrently.


    Anybody, have an idea how fast that is compared to modern a CPU?

    IIRC, the last time I did anything like this it took my 2200+ AMD about 24 hours to do a 6-character keyspace (from 64-character set) - with MD5.
  • SHA-cracker? (Score:5, Informative)

    by owlstead ( 636356 ) on Saturday September 01, 2007 @07:06AM (#20432559)
    That's nice, his own SHA-1 cracker. But, even with advanced cryptographic attacks, SHA-1 is still in the order of 2^63. Not something you would like to try with just a few FPGA's. What is meant here is a cracker to find out which plain text, with limited entropy, is used to create a certain hash value. A SHA-1-based password cracker would therefore be a better name, I suppose.

    It seems from here [unaligned.org] that it searches a 64 ^ 8 = (2 ^ 6) ^ 8 = 2 ^ 48 keyspace in 24 hours. No small feat, it should therefore do about 3,257,812,230 hashes in a second. It does 800 concurrently, which makes for 4 million a second per SHA-1 unit. Ouch, that's really fast.

    Note that this could be done with any hash or symmetric algorithm, as long as it can be implemented on FPGA. So the moral of the story: use very long password (or even better, pass phrases), or make sure that they won't be able to acquire the hash.
  • Re:How fast is that? (Score:3, Informative)

    by owlstead ( 636356 ) on Saturday September 01, 2007 @07:55AM (#20432779)
    Arg! Whoops, sorry about that. Read the post, but thought you were quoting the summary.

    I've wondered about Cell performance myself for a while, but I haven't got the time to go out of the way to do some measurements. For SSE3/4: I would call it highly unlikely that we would see anything like the performance they are posting: 2 ^ 12 performance difference for MD5 alone is quite a lot. Maybe SSE5 might speedup SHA-2 as well. Anyway, you might want to add the T2 (Rock processor) from Sun to that list, it has 8 crypto hardware accelerators on board. My 1.2 GHz C7 with hardware accelerator is probably way to slow to get anything near the performance, but I'll test it anyway. MD5 is much faster than SHA-1, which in turn is much faster than SHA-2, so you would want to use SHA-2 anyway, as a stop-gap measure.
  • Re:SHA-cracker? (Score:2, Informative)

    by Rythie ( 972757 ) on Saturday September 01, 2007 @08:15AM (#20432831) Homepage
    By comparison my Althon 64 3200+ does about 883,000 16byte hashes a second

    $ openssl speed sha1
    Doing sha1 for 3s on 16 size blocks: 2586683 sha1's in 2.93s
    Doing sha1 for 3s on 64 size blocks: 2063294 sha1's in 2.90s
    Doing sha1 for 3s on 256 size blocks: 1199179 sha1's in 2.75s
    Doing sha1 for 3s on 1024 size blocks: 479901 sha1's in 2.84s
    Doing sha1 for 3s on 8192 size blocks: 71496 sha1's in 2.87s
    OpenSSL 0.9.8c 05 Sep 2006
    built on: Tue Mar 6 08:16:57 UTC 2007
    options:bn(64,64) md2(int) rc4(ptr,char) des(idx,cisc,16,int) aes(partial) blowfish(ptr2)
    compiler: gcc -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -O3 -Wa,--noexecstack -g -Wall -DMD32_REG_T=int -DMD5_ASM
    available timing options: TIMES TIMEB HZ=100 [sysconf value]
    timing function used: times
    The 'numbers' are in 1000s of bytes per second processed.
    type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
    sha1 14125.23k 45534.76k 111632.66k 173034.73k 204074.99k
  • Re:SHA-cracker? (Score:3, Informative)

    by owlstead ( 636356 ) on Saturday September 01, 2007 @09:55AM (#20433319)
    Don't forget that PKCS#5 v2.0 uses an iteration count and a salt. This means that the algorithm is not applied just once, but 1000 times (or more, 1000 is the minimum). This would mean a slowdown of 1000 on these kind of crackers *if* they implement the iteration count. A salt would make it hard to use a default configuration like this one found on internet as well.

    As said, the hash algorithm itself does not matter too much. The problem with all these schemes is that the amount of entropy in the passwords is simply too small. Creating an SSL channel before authentication and sending a (hashed?) password would do for mail services I suppose. That way there is no way to do the off-line cracking; they'll have to attack the server directly.

  • by Space cowboy ( 13680 ) * on Saturday September 01, 2007 @11:34AM (#20433879) Journal
    There are some FPGA's that can control their output impedance on pins, but an FPGA is really for digital electronics - you're using 4-way look-up tables to emulate arbitrary 4-input logic-gates for the most (99.99%) part. I've seen genetic-algorithms produce capacitance-based designs where unconnected circuits affect each other due to analogue effects, but not humans. We tend to stick to the straight and narrow...

    An FPGA really is conceptually very simple, and they're not hard to "program" either... Contrived example:

    Verilog design to add/subtract 2 numbers (you'd never do this, but...)

    module addsub (a, b, addnsub, result);
    input [7:0] a;
    input [7:0] b;
    input addnsub;
    output [8:0] result;
    reg [8:0] result;

    always @(a or b or addnsub)
    begin
    if (addnsub)
    result = a + b;
    else
    result = a - b;
    end
    endmodule

    Compare that to a K&R "C" routine to do the same thing...

    void addsub(a, b, addnsub, result)
    short a;
    short b;
    unsigned char addnsub;
    short *result;

    {
    if (addnsub)
    *result = a + b;
    else
    *result = a - b;
    }

    In both cases, of course, you'd just use the 'if...else...' part, but I wanted to show more language structure...

    The key thing to remember is that in C, all things happen serially, unless you arrange otherwise with threading libraries. In Verilog, any block beginning with 'always @' happens in parallel with every other 'always @' block. Once you've mentally-mapped the concept of vast numbers of threads to the way hardware works, any competent multi-threaded programmer can become a competent hardware engineer.

    Of course, there's "guru stuff" in there as well (as much as you want, trust me :-), you don't get world-beating overnight, but it's relatively easy to get the 80% solution, and that might be just fine. Eking out the last 20% is where it gets hard, as you have to understand the internal structure of the LUTs, and how they interact with the carry-chain, what the LUT->LUT delay can be useful for etc. None of this is at all relevant unless you're missing your timing on a critical circuit (eg: you need 133MHz so your DDR2 SDRAM can work, but the synthesis tools (equivalent to a compiler) only deliver 125 MHz for your design).

    The 'always@' part is the hint of just where the power lies. *Everything* can happen in parallel, so you can build pipelines (like CPU's are pipelined today) into your logic, thus reducing the time taken per step (while taking multiple steps), thus increasing your clock rate. The benefit of course is that although the *first* result comes out in the same time, every clock thereafter, you'll also get a result.

    I wrote a JPEG decoder a couple of years or so ago, running at ~130MHz. That doesn't sound much, but that comes to ~65 MPixels/second because of the pipelining. Looking at the SSE-optimised intel libraries, a CMYK422->CMYK baseline decode (which is what the FPGA was doing) takes 371 clocks/pixel. The intel chip I was comparing to was a 3.6GHz P4, meaning it could do ~9.7 Mpixels/second. For motion-jpeg that's the difference between decoding SD frames (for the P4) and decoding HD frames (for the FPGA)...

    So, FPGAs tend to run slowly (relative to today's CPUs) but can exploit parallelism in ways CPUs just can't, but of course for serial processing, you can't beat a tradition

  • Re:Benchmarks? (Score:3, Informative)

    by BlueParrot ( 965239 ) on Saturday September 01, 2007 @11:55AM (#20434039)

    You might be in the shadow group, and there might be a server application that is in said group in order to read /etc/shadow, so if you can exploit that service to gain access to the contents of the shadow file, you can then try to root the machine after cracking root's (or someone with sudo I guess) password.


    http://www.bash.org/?701504 [bash.org]

All seems condemned in the long run to approximate a state akin to Gaussian noise. -- James Martin

Working...