Submission: John the Ripper Cracks Slow Hashes On GPU

solardiz writes: "A new community-enhanced version of John the Ripper adds support for GPUs via CUDA and OpenCL, currently focusing on slow-to-compute hashes and ciphers such as Fedora's and Ubuntu's sha512crypt, OpenBSD's bcrypt, encrypted RAR archives, and WiFi WPA-PSK. For sha512crypt, an NVIDIA GTX 570 achieves a 5x speedup per chip over an AMD FX-8120 CPU, whereas bcrypt barely reaches the CPU's speed on an AMD Radeon HD 7970 (a high-end GPU). This result reaffirms that bcrypt is currently a better choice than sha512crypt (let alone sha256crypt) for operating systems, applications, and websites to move to. The exceptions are systems that already use one of these "slow" hashes, and the advice holds only until a newer password hashing method, such as one based on the concept of sequential memory-hard functions, is ready to move to.

The same John the Ripper release also happens to add support for cracking of many additional and diverse hash types ranging from IBM RACF's as used on mainframes to Russian GOST and to Drupal 7's as used on popular websites — just to give a few examples — as well as support for Mac OS X keychains, KeePass and Password Safe databases, Office 2007/2010 and ODF documents, Firefox/Thunderbird/SeaMonkey master passwords, more RAR archive kinds, WPA-PSK, VNC and SIP authentication, and it makes greater use of AMD Bulldozer's XOP extensions."

Comment Re:NTLM hasn't been in active use for a while (Score 1) 45

NTLMv2 uses a challenge-response system and so you can't offline crack it in the same way.

John the Ripper, in the -jumbo versions (community-enhanced), includes support for cracking of both NTLMv1 and NTLMv2 challenge/responses - see NETNTLM_README in the documentation and NETNTLM_fmt.c and NETNTLMv2_fmt.c in the source tree.

Comment Re:i7 what? Who cares? (Score 1) 45

Matt -

I have so many comments on what you wrote that I don't dare to post them. :-) But I'll say a few things:

Password policies still make sense to me when combined with modern (salted and stretched) password hashes, particularly for large user databases where each account is of relatively little value (your Sony example applies here). Rather than absolutely require certain character classes, users should also be given the option to use longer passphrases, where the number of required character classes can be reduced to 2. I think you have our passwdqc in DragonFly (via FreeBSD), right? Well, it includes passphrase support by default, starting with 3 words of combined length 11, including separator characters - or longer, indeed.

Thank you for describing your authentication methods policy for developers. We use a similar policy for multi-developer or multi-sysadmin projects:

- Alexander

Comment Re:DES is slow and 3DES is slower (Score 2) 45

Slow? DES used to be slow prior to bitslicing. The 33 Gbps figure I mention is on par with that for AES using specialized instructions, but without reliance on such instructions. Sure, 3DES is 3x slower. But even for 3DES we get around 10 cycles/byte on one CPU core, which is on par with AES without specialized instructions. That said, data encryption with DES/3DES is in fact not the primary intended use for our results. We realize perfectly well that people want to hear "AES" these days.
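As a rough sanity check of the "around 10 cycles/byte" figure for 3DES, here is a small sketch. It assumes (this is not stated in the comment itself) that the 33 Gbps DES figure applies to a whole Core i7-2600K chip, taken here as 4 cores at 3.4 GHz, and that 3DES costs exactly three times as much as DES.

```python
# Rough check of the cycles/byte claim.
# Assumptions (not from the comment): the 33 Gbps figure is for the whole
# chip, with 4 cores at 3.4 GHz, and 3DES costs exactly 3x single DES.
DES_BITS_PER_SECOND = 33e9
CORES = 4
CLOCK_HZ = 3.4e9


def cycles_per_byte(bits_per_second, cores, clock_hz):
    """CPU cycles spent per byte processed, summed over all cores."""
    bytes_per_second = bits_per_second / 8
    return cores * clock_hz / bytes_per_second


des = cycles_per_byte(DES_BITS_PER_SECOND, CORES, CLOCK_HZ)
triple_des = 3 * des

print(f"DES:  {des:.1f} cycles/byte")
print(f"3DES: {triple_des:.1f} cycles/byte")
```

Under these assumptions DES comes out at a bit over 3 cycles/byte and 3DES at just under 10, which agrees with the figure quoted above.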

DES is being used for non-encryption a lot. Is authentication truly of no relevance to people that care about having secure encryption?

Is security auditing or other work on/with existing systems that use DES as a component now not worthwhile? Should we treat them as black boxes? It is not realistic to expect all of them to be gone in a few years from now. So research on DES is still relevant. Granted, smaller S-box circuits don't directly enable an attack better than slightly faster key search, but they may be useful in further research, including in cryptanalysis of DES itself - e.g., bitslice implementations of DES were used for differential cryptanalysis of DES.

There are side-channel attacks on AES. Sure, they are not always relevant, but neither are the DES/3DES concerns you mention. In many cases, side-channel attacks are a practical threat.

How many fully pipelined AES cores can you fit in an FPGA chip doing password hashing in an authentication server (with lots of parallelism included per one hash computation by our new hashing method)? And how many DES? The difference may be an order of magnitude, in favor of DES. And this means that our password hashes become this much slower to attack by CPUs/GPUs, compared to hypothetical hashes built on top of AES yet implemented in FPGA. (The small key and block sizes of DES may be dealt with by appropriate use of DES, and the slowdown is not a problem at all for this application - it's only efficient use of resources that matters.)

We actually wanted to build a password hashing method on top of SHA-2 and/or AES - since this is what people want to hear - but it is so tempting to build upon DES and/or Blowfish instead, resulting in much better properties against a number of realistic attack scenarios (offline password cracking on different kinds of hardware) that we're seriously considering these. To make people happy, we might call this most important component "non-crypto", add a PBKDF2 with SHA-256 or SHA-512 step, and show how the cryptographic security of our hashing method as a whole only depends on the latter. Everyone is happy. But DES, if we use it in the "non-crypto" component, plays an important role.

Summary: for some applications AES is better (perhaps for most of them), but for some DES is a better building block.

Finally, circuit minimization has uses beyond DES, and similarly sized S-boxes exist in other ciphers. So advances in this area may have uses beyond DES.

Comment Re:i7 what? Who cares? (Score 2) 45

Bitwise operations are not an issue. Besides, we have versions of our S-box expressions that primarily use "bit select" instructions, such as ATI's BFI_INT - these work on PowerPC/AltiVec in JtR 1.7.8, but I think they will see even more use on high-end AMD/ATI GPUs (this is what we primarily had in mind).

The real issue is register pressure (bitslice DES needs a lot of registers) and memory latencies. In our S-box expressions, we tried to minimize not only gate count, but also the number of registers needed for storage of temporary values in a software implementation. This was among the criteria applied to choose a few best versions among thousands of same-gate-count expressions that we generated. We also cared about the amount of inherent parallelism available in a single instance of the code for each S-box, even though it sort of contradicted the preference to require fewer registers.
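To make "bitslice" and "bit select" concrete, here is a minimal sketch in Python. It is not taken from John the Ripper: the operand order of bit_select() and the helper to_bitslice() are assumptions made for illustration, and real implementations use machine registers or SIMD vectors rather than Python integers.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register


def bit_select(a, b, c):
    """For each bit position, take b where c is 1 and a where c is 0.

    The operand order is an assumption; actual instructions differ.
    """
    return ((a & ~c) | (b & c)) & MASK64


def to_bitslice(values, nbits):
    """Transpose values so that slices[i] holds bit i of every value.

    Bit position 'lane' of each slice belongs to values[lane].
    """
    slices = [0] * nbits
    for lane, value in enumerate(values):
        for i in range(nbits):
            if (value >> i) & 1:
                slices[i] |= 1 << lane
    return slices


# 64 independent 3-bit inputs, one per lane of the 64-bit words.
inputs = [lane % 8 for lane in range(64)]
a, b, c = to_bitslice(inputs, 3)

# One bit_select evaluates the same 3-input function for all 64 lanes.
out = bit_select(a, b, c)

# Cross-check every lane against a plain scalar evaluation.
for lane, value in enumerate(inputs):
    a_bit, b_bit, c_bit = value & 1, (value >> 1) & 1, (value >> 2) & 1
    expected = b_bit if c_bit else a_bit
    assert (out >> lane) & 1 == expected
```

Each temporary in such an expression occupies a full register for all lanes at once, which is one way to see why the number of live temporaries matters as much as the gate count.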

Comment Re:i7 what? Who cares? (Score 2) 45

AES-NI is definitely too specific to AES, not reasonably reusable for DES. Yes, we have achieved a speed for DES comparable to that of AES with AES-NI.

We're actually considering building a password hashing method on top of something like this, where bitslice DES has the advantage of being scalable to arbitrary SIMD vector widths and not requiring specialized instructions for efficient implementation. DES is also FPGA-friendly (more so than AES), and we have a project to implement password hashing for authentication servers equipped with FPGA boards (linked pages: project rationale, alternative approach).

We're also considering Eksblowfish-like constructions, though - such as to make use of Xilinx Block RAMs (and thus require attackers to use more resources too).

BTW, not sure if I am speaking to the right Matt, but of the two SHA-crypt flavors the SHA-512 based one actually has a practical advantage over the SHA-256 one: more complete use of 64-bit CPUs in servers. So I think DragonFly BSD's choice was a mistake. GPU implementations of both are being worked on, and the difference should show up there.

Comment Re:Stupid question from crypto-newb (Score 2) 45

6-to-4 is large enough that you can't realistically find a perfect solution (the absolute smallest gate count) on present computers and given present knowledge. You can do it for 5-to-1, though. Also, generic Boolean expression minimization tools produce relatively poor results for DES S-boxes; specialized algorithms are the way to go. IIRC, I tried Espresso in the late 1990s. It couldn't even get close to Matthew Kwan's results from 1998, where he used a specialized algorithm.
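One rough way to see the difference in scale is to count truth tables. This is only an illustration and assumes an exhaustive approach that would tabulate the cheapest circuit for every possible function of a given shape; the comment above does not say which method was used, and truth_table_count() is a name made up for this sketch.

```python
def truth_table_count(n_inputs, n_outputs):
    """Number of distinct Boolean functions with the given shape.

    Each of the 2**n_inputs input combinations maps to n_outputs bits.
    """
    return 2 ** (n_outputs * 2 ** n_inputs)


five_to_one = truth_table_count(5, 1)   # 2**32 functions
six_to_four = truth_table_count(6, 4)   # 2**256 functions

print(f"5-to-1: {five_to_one:.3e} functions")
print(f"6-to-4: {six_to_four:.3e} functions")
```

A table with 2**32 entries is within reach of an ordinary computer, while anything indexed by 2**256 possibilities is not, so a 6-to-4 S-box has to be attacked with heuristics specific to that one function.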

Comment Re:i7 what? Who cares? (Score 4, Interesting) 45

Here are some specific performance numbers for DES-based crypt(3) on GPUs (for comparison, recall that we're reporting over 20M c/s on a CPU):

oclhashcat-plus is reported to achieve 55M on ATI HD 5970, only 25M on NVidia GTX570 at 1600 MHz core clock, 310M on 8x ATI HD 6970, 181M on 7x NVidia GTX580 (1594 MHz). The numbers for oclhashcat-lite are very similar (57M, 26M, 297M, 181M, respectively). These are off the hashcat website. This does not use our new S-boxes yet (I expect that future versions of *hashcat tools will).

Notice how the number for high-end NVidia is on par with that for our CPU, and for ATI is less than 3x better. Of course, GPUs do have an advantage, but it still does make sense to use CPUs as well, which a typical organization has more of and doesn't need to spend extra time to deploy, install drivers for, etc.

Now, our new S-boxes and other optimizations will provide better performance. Per discussions with a tripcode cracker author, I expect all the way up to 400M c/s on ATI HD 5970, which is close to its theoretical peak speed (approx. 80% of it per some estimates). This is a 20x improvement over our figure for the Core i7 CPU, which is significant. (There's a little room for improvement on the CPU as well, though - specifically, if we pre-generate or runtime-patch the code for each salt as opposed to using pointers at runtime like we do now. This kind of optimization is assumed in the 400M figure for the GPU. So with both having the optimization, the GPU's advantage will be less than 20x.)

Curiously, 400M c/s for 25 iterations of DES will mean that a single ATI HD 5970 with proper code will be able to crack 56-bit DES keys in just 42 days on average.
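The 42-day estimate can be reproduced with a few lines of arithmetic. This sketch uses only the figures from the comment (400M c/s, 25 DES iterations per crypt(3) computation) plus the usual convention that an average search covers half of the key space.

```python
CRYPTS_PER_SECOND = 400e6   # projected c/s on one ATI HD 5970
DES_ITERATIONS = 25         # DES iterations per crypt(3) computation
KEY_BITS = 56
SECONDS_PER_DAY = 86400

# Each crypt(3) computation is worth 25 single-DES encryptions.
des_per_second = CRYPTS_PER_SECOND * DES_ITERATIONS

# On average the key is found after searching half of the key space.
average_keys_tried = 2 ** (KEY_BITS - 1)

days = average_keys_tried / des_per_second / SECONDS_PER_DAY
print(f"Average time to find a 56-bit DES key: {days:.1f} days")
```

This gives about 41.7 days, which rounds to the 42 days quoted above.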

So, yes, GPUs have an advantage, and we have contributed to that as well.

Comment Re:i7 what? Who cares? (Score 2) 45

Actually, a lot of people care about CPUs. I spoke to someone from a penetration testing company the other day. They run a lot of password hash cracking. And they have 10x more CPUs (used for other purposes as well) than GPUs (bought specifically for password cracking). Given that the performance of DES-based crypt(3) on GPUs is nowhere near as impressive as it is for other hash types, they typically test these hashes on CPUs and not GPUs.

That said, yes, when we worked on the S-boxes, we thought of GPUs as well. One of our target sets of "logic gates" is specifically that of high-end AMD/ATI GPUs (it also works well for Cell, PowerPC/AltiVec, and AMD XOP, but we deliberately excluded gates/instructions that are present on only some of these four platforms). The author of one of the GPU-based cracking tools (for tripcodes) reported a 20% improvement on Radeon HD 5970 due to our new S-boxes. Andrey Belenko of ElcomSoft wrote in a tweet that "Effect for GPUs might be well above 20%, actually."

Comment Re:ONLY 17%? (Score 5, Insightful) 45

We were not the first to generate and try to optimize Boolean expressions for the S-boxes. Other researchers worked on this before, starting in 1997 when Eli Biham wrote his classic paper on bitslice DES. 17% is our improvement compared to those previous results. To me, it is impressive that after 14 years and numerous attempts by others, including successful ones, it was still possible to improve on the previous best results by as much as 17% at once. My gut feeling is that further improvements, while definitely possible, will be more limited. But then again, some people I spoke to had thought that our 17% was not possible.

Comment More detail (Score 1) 2


Submission: 17% Smaller DES S-box Circuits Found

solardiz writes: "DES is still in use, brute-force key search remains the most effective attack on it, and it is an attractive building block for certain applications (the key size may be increased e.g. with 3DES). Openwall researchers, with funding from Rapid7, came up with 17% shorter Boolean expressions representing the DES S-boxes. Openwall's John the Ripper 1.7.8 tests over 20 million combinations against DES-based crypt(3) per second on a Core i7-2600K at 3.4 GHz, which roughly corresponds to a DES encryption speed of 33 Gbps."
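The conversion from crypt(3) computations per second to an equivalent DES throughput can be sketched as follows. It assumes 25 DES iterations per crypt(3) computation and the 64-bit DES block size; the 20M c/s input is the lower bound implied by "over 20 million".

```python
CRYPTS_PER_SECOND = 20e6   # lower bound: "over 20 million" per second
DES_ITERATIONS = 25        # DES iterations per crypt(3) computation
BLOCK_BITS = 64            # DES block size

bits_per_second = CRYPTS_PER_SECOND * DES_ITERATIONS * BLOCK_BITS
gbps = bits_per_second / 1e9
print(f"Equivalent DES throughput: {gbps:.0f} Gbps")

# Rate needed to reach the quoted 33 Gbps exactly.
needed = 33e9 / (DES_ITERATIONS * BLOCK_BITS)
print(f"c/s needed for 33 Gbps: {needed / 1e6:.3f}M")
```

Exactly 20M c/s corresponds to 32 Gbps, so the quoted 33 Gbps is consistent with a rate slightly above 20M c/s.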

Comment Re:Here's what's affected (Score 1) 130

Was I the only one who read that as PHP ass framework?

No, a lot of people read it like that, and this word play was deliberate - it actually helps the marketing by attracting extra blog posts ("hey, I found something stupidly named" and the like). ;-) However, the official spelling is "phpass", with no caps, where "pass" is for password or pass (successful authentication).

I previously made a similar spelling joke in naming popa3d, a POP3 server. Russians get this one.

As to the bug, as other people noted, the typical alternatives to using phpass and crypt_blowfish would have been far worse. This is not an excuse. I do feel embarrassed by the bug. But I am also being realistic, so that I and others learn the right lessons from this.

In practice, most uses of phpass that I am aware of don't actually use crypt_blowfish (and thus are not affected) - they choose to use phpass' "portable hashes" instead. For passwords without 8-bit chars, those are weaker than crypt_blowfish, but they do avoid the bug. (And with 8-bit chars it is not obvious which are weaker, even despite the bug.)

(Apparently, I forgot to submit this comment the first time around. To avoid re-typing, I ptrace(2)-dumped my Firefox process memory to a file, then grepped it for "deliberate" - and here the comment is. I am used to doing things like that, e.g. when investigating abusive processes on compromised shared hosting accounts of customers. Maybe someone will find this tip useful or curious.)
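For anyone who wants to try the tip, the "grep the dump" half can be sketched in Python. The sketch assumes the process memory has already been saved to a file by some other means; it also assumes the text is stored as plain ASCII/UTF-8 bytes, which may not hold for every browser. The function name find_text_in_dump() is made up for this example.

```python
import re


def find_text_in_dump(path, keyword, context=200):
    """Return (offset, text) pairs for each occurrence of keyword.

    'text' is the surrounding region with runs of non-printable bytes
    collapsed to a single space. The whole file is read into memory,
    which is fine for a sketch but not for very large dumps.
    """
    with open(path, "rb") as f:
        data = f.read()

    needle = keyword.encode("utf-8")
    results = []
    start = 0
    while True:
        pos = data.find(needle, start)
        if pos < 0:
            break
        lo = max(0, pos - context)
        hi = min(len(data), pos + len(needle) + context)
        chunk = data[lo:hi]
        # Keep printable ASCII and newlines, squash everything else.
        text = re.sub(rb"[^\x20-\x7e\n]+", b" ", chunk).decode("ascii")
        results.append((pos, text))
        start = pos + len(needle)
    return results
```

A call such as find_text_in_dump("firefox.dump", "deliberate") would then list every place the word occurs, with enough context to recover the surrounding comment text (the file name here is hypothetical).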

Comment Re:Ulrich Drepper was right (Score 1) 130

It seems like Ulrich Drepper was right opposing, in rather harsh words, my suggestions to include bcrypt in glibc. My bad.

I also briefly thought of where we would be if Ulrich had accepted bcrypt into glibc. I have several points to make:

1. It is likely that adjusting the crypt_blowfish code to glibc conventions would have happened to remove the bug - just like it happened with Perl's Crypt::Eksblowfish (it's based on crypt_blowfish, but the bug is gone). Yes, this does mean that those coding conventions may be superior, although it is easier for them to be such when only a more limited number of platforms is considered. There is a lesson for me to learn here.

2. bcrypt is not only crypt_blowfish. glibc could also use OpenBSD's code (lacking this bug) if it looked more suitable by whatever criteria.

3. I just took another look at Ulrich's SHA-crypt.txt, sha256crypt.c, and sha512crypt.c. I don't see any 8-bit characters in passwords in the test vectors. Unless I have missed those, it looks like a bug in SHA-crypt causing similar misbehavior would go unnoticed. No, I do not find it likely that such a bug exists there, but then I also did not find it likely for crypt_blowfish. Does anyone want to test and confirm that there's no 8-bitness bug in SHA-crypt? Please do. But what should Ulrich's SHA-crypt be tested against? Does Solaris use the same code or a reimplementation? (I don't know.) If it's the same code, and no reimplementation exists, then you'd have to try causing collisions or something like that, perhaps with low "rounds" (to make this test reasonably quick). Or create that reimplementation for testing. (BTW, I did the latter in phpass, for testing the correctness of my implementation of its "portable hashes".)

4. SHA-crypt is reasonably good, especially for acceptance due to its use of NIST-approved SHA-2 family functions. However, it does have its drawbacks. bcrypt turned out to be GPU-unfriendly, whereas we should see reasonably efficient implementations of SHA-crypt for GPUs soon (this is being worked on and I see no major obstacles). In neither case is a GPU usable for password hashing on an authentication server (there's too little parallelism in one instance of a bcrypt or SHA-crypt hash computation), even if you had a GPU there, so GPU-unfriendliness is an advantage of bcrypt if you compare it against SHA-crypt.

5. Finally, there have been plenty of security bugs in glibc.
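The idea in point 3, testing one implementation against an independent reimplementation using passwords that contain 8-bit characters, can be sketched as follows. Both "implementations" here are toy stand-ins invented for this example, not crypt_blowfish or SHA-crypt code, and the sign-extension behavior of the second one is an assumed illustration of what an 8-bitness bug might look like.

```python
def key_word_reference(password: bytes) -> int:
    """Toy stand-in: pack up to 4 bytes into a 32-bit word, unsigned."""
    word = 0
    for byte in password[:4]:
        word = ((word << 8) | byte) & 0xFFFFFFFF
    return word


def key_word_suspect(password: bytes) -> int:
    """Toy stand-in with a hypothetical bug: bytes are sign-extended."""
    word = 0
    for byte in password[:4]:
        signed = byte - 256 if byte >= 128 else byte
        word = ((word << 8) | (signed & 0xFFFFFFFF)) & 0xFFFFFFFF
    return word


def find_mismatches(impl_a, impl_b, passwords):
    """Return the passwords on which the two implementations disagree."""
    return [p for p in passwords if impl_a(p) != impl_b(p)]


ascii_only = [b"pass", b"abcd", b"1234"]
with_8bit = [b"p\xe4ss", b"a\xa3bc", b"ab\xff!"]

# Test vectors without 8-bit characters cannot expose this kind of bug.
print(find_mismatches(key_word_reference, key_word_suspect, ascii_only))

# Vectors with 8-bit characters make the disagreement visible.
print(find_mismatches(key_word_reference, key_word_suspect, with_8bit))
```

With ASCII-only vectors the two toy functions agree on everything, which mirrors the concern about test vectors that lack 8-bit characters.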
