Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×

Comment Re:on-board AES? (Score 1) 219

The fastest code that I know of for AES in CTR mode is Kasper-Schwabe. It does 8 128-bit encryptions at a time, so it also should be suitable for, say, PMAC if you doctor it. I believe that it does not handle decryption (outside of CTR mode where it's the same as encryption) or other key sizes. Modes other than CTR lose some optimization, and should be ~20% slower. It should be available on Kasper's homepage. It requires SSSE3 and reportedly achieves 6.9 cycles/byte on Nehalem for CTR mode.

My code is available here. On Nehalem, it achieves ~9.4 cycles encrypting, ~11.1 cycles decrypting in essentially any mode. It is suitable for encryption or decryption, and supports all three key sizes (longer keys are slower, of course). A newer (unreleased, experimental) version makes slight performance improvements (maybe down to 9.1 cycles encrypting on Nehalem) and implements an optimization for CTR mode that brings it down to ~7.5 cycles. Email me (mhamburg AT cs DOT stanford DOT edu) if you want to try the experimental version. However, my code fundamentally requires SSSE3, and it performs quite poorly on Conroe.

Also, Dan Bernstein (homepage) has somewhere a fast conventional (not timing-attack resistant, but not requiring any sort of SSE) implementation of AES for several processors, and I've heard Crypto++ is pretty fast too.

I believe that all of the above libraries are public-domain and patent-free.

Out of curiosity, what's your application? Can you just get a VIA Nano or Intel Westmere core and run on that?

Comment Graphics core (Score 1) 219

It's a shame they didn't make a mid-level version with no graphics core, a la Core i7 860. As a crypto/security guy, I'd like to try out PCLMULQDQ, the AES instructions and maybe the IOMMU. But if I'm going to get a fancy new computer, I might as well put a decent graphics card in it, at which point their on-die graphics card is simply a waste of space, power, money and latency. And no, I'm not dropping $1k for a 6-core Gulftown.

Comment Re:on-board AES? (Score 3, Informative) 219

Why put AES on-board?

They're not: they're putting extra instructions on-board which help implement AES more efficiently. They may also allow you to implement other algorithms more efficiently, though I haven't looked at them in enough detail to be sure.

The instructions perform a single round of AES (which has 10-14 rounds depending on key size), either encrypting or decrypting. Certain other algorithms such as Lex, Camellia, Fugue and Grostl use AES S-boxes in their core, and can probably benefit from these instructions. However, they will not achieve nearly so much a speedup as AES.

The AES instructions themselves will approximately double the speed of sequential AES computations. This is very unimpressive; VIA's AES instructions are much faster. They will also make it resistant to cache-timing attacks without losing speed, which is unimpressive because you can already do this on Penryn and Nehalem. The low speed results from the AES instructions having latency 6; if you can use a parallel mode (GCM, OCB, PMAC, or CBC-decrypt, for example) then the performance should be 10-12x the fastest current libraries. Hopefully, this will cause people to stop using CBC mode, but perhaps I'm too optimistic.

Intel also added an instruction called PCLMULQDQ which does polynomial multiplication over F_2. If it's fast (I can't find timing numbers, but hopefully it's something like latency 2 and throughput 1) then it will be very useful for cryptography in general, speeding up certain operations by an order of magnitude or more. This is more exciting to me than the AES stuff, because it might enable faster, simpler elliptic-curve crypto and similarly simpler message authentication codes. Unfortunately, these operations are still slow on other processors, so cryptographers will be hesitant to use them until similar instructions become standard. If the guy you're communicating with has to do 10x the work so that you can do half the work... well, I guess it's still a win if you're the server.

I thought AES was relatively fast as encryption algorithms go.

That still doesn't make it fast at an absolute level. Particularly when you're doing full-disk encryption with user account encryption on top and IPSEC on all your network connections.

AES is fast for a block cipher, but modern stream ciphers such as Salsa20/12, Rabbit, HC and SOSEMANUK are about 3-4x faster. (In other words, they are still faster than AES in a sequential mode on Westmere.) AES is still competitive, though, if you can use OCB mode to encrypt and integrity-protect the data at the same time.

The fastest previous Intel processor with cutting-edge libraries in the most favorable mode could probably encrypt or decrypt 500MB/s/core at 3-3.5GHz. This is fast enough for most purposes, but in real life with ordinary libraries you'd probably get a third of that. So this will significantly improve disk and network encryption if they use a favorable cipher mode.

Cred: I am a cryptographer, and I wrote what is currently the fastest sequential AES library for Penryn and Nehalem processors. But the calculations above are back-of-the-envelope, so don't depend on them.

Comment Re:Premature optimization is evil... and stupid (Score 1) 249

This is only the case until most all instructions spend exactly 1 cycle in an execution unit. This includes shifts, adds, subs, muls, ands, ors, xors, and so on and on. Thats the state of the modern processor.

mul has latency 3 on Core i7. Of course, this assumes that by "modern" you mean i7. If we're talking about mobile processors, the latency is terrible on Atom, and probably not so hot on ARM either.

All of these instructions are 3 cycle latency on the AMD64 processors that I am familiar with (pre-phenom) and essentially the first cycle is loading the operands from the register pool into an execution unit, the second cycle is the execution, and the third cycle is retirement from the execution unit, updating the register pool. Core2's and i7's have these down to 2 cycle latency.

Every major desktop processor built in the last 10 years (maybe more like 20?) has single-cycle latency (or less!) for most simple operations (add/sub/xor/shift), enabled by forwarding between different stages of the pipeline. I don't know if lea counts for this because it's CISC-y, but something like an add always has single-cycle latency.

Comment Re:rule of the code (Score 2, Informative) 249

One of the courses I took was programming for parallelism. For extra credit, the instructor assigned a 27K x 27K matrix multiply; the person with the best time got a few extra points. A lot of the class worked hard in trying to optimize their code to get better times, I got the best time by playing with the compiler flags.

Really? Because I had a similar assignment (make Strassen's algorithm as fast as possible, in the 5-10k range) in my algorithms class a while back. I found that the key to a blazing fast program was careful memory layout: divide the matrix into tiles that fit into L1, transpose the matrix to avoid striding problems. Vectorizing the inner loops got another large factor. Compiling with -msse3 -march=native -O3 helped, but the other two were critical and took a fair amount of effort.

Comment Re:Premature optimization is evil... and stupid (Score 1) 249

GCC on x86 these days likes to emit small multiplies as one or two lea instructions. It gives you a = b + [1248]c + const. This lets you multiply by 2,3,5 or 9 in one cycle, along with the shifter which lets you multiply by 1,2,4,8,... in one cycle. Between these, you should be able to multiply by any constant up to 21 in 2 cycles, and add a constant to boot. Similarly, you can multiply by a smaller range of values and add another register and a constant as well.

You can do this even better on ARM, where every instruction gets a free shift or rotate. In 2 cycles you can multiply by thousands of different values.

Regardless of this, on any platform with a multiplier, the multiplier is faster for some random unknown value. For a small known value, though, add/shift ladders may be faster.

Comment Re:Stop with the drugs already (Score 1) 595

I don't remember the exact details, but I'll guess based on what I can find on the Internet. I'm guessing that I remember wrong and you should s/antigen/toxin/g. Wikipedia says that some malaria vaccines target the parasite itself, and some target the toxins it produces. People receiving anti-toxic vaccines would still be infected and would still have toxins in their systems, but the toxins would be reduced by an immune response.

Suppose these toxins increase malaria's infectiousness in some way (which is a reasonable guess, because otherwise malaria wouldn't produce them... killing your host for no reason is not adaptive). Then there may be a competitive advantage for strains which produce more toxins, so that the anti-toxic immune response is less effective. This means that malaria would evolve to be more harmful and more lethal, especially to unvaccinated people.

The same probably isn't true for an anti-parasitic vaccine, especially if it can completely prevent infection.

Comment Re:1:1 (Score 1) 582

There is a way to make some real, mathematical sense of something like IEEE 754 floating point numbers. In analysis, real numbers are defined as equivalence classes** of Cauchy sequences* of rational numbers. Arithmetic and comparison for > and -0. Furthermore, you get the usual laws like +0 * n === +0 for n > 0, +0 * +0 === +0, etc.

You can also add in infinity as a set of sequences of points whose absolute values grow without bound. These aren't Cauchy sequences, so infinity isn't a number. Nonetheless, the usual arithmetic gives you 1/0 === inf and 1/inf = 0 (It's not quite === because you can't get sequences like [0,0,0,...]), n+infinity === infinity, but infinity+infinity and infinity-infinity aren't defined (and therefore you don't have inf = inf).

Similarly to +-0, you can define +inf (resp -inf) as the subset of infinities consisting of sequences which are eventually positive (resp negative). You then get 1/+0 === +inf, (+inf) + (+inf) === +inf, 1/+inf === +0, etc. You also get comparisons like +inf > -inf, +inf != -inf and +inf > n for all real numbers n.

If you do these things, you end up with the same basic concepts as IEEE 754, but with a few differences:

  • There is an unsigned 0 in addition to +0 and -0; things break otherwise. 1-1 === 0 rather than +0.
  • Despite +0 = -0, the most straightforward definition also has +0 > -0.
  • Infinity isn't a number. +0 and -0 aren't strictly numbers either, but they're equal to 0 which is a number.
  • Infinities are incomparable to themselves unless you add a special rule. This isn't too surprising because (+inf) - (+inf) isn't defined.
  • There is usually an unsigned infinity, which is the result of 1/0. Unsigned infinity is neither greater nor less than any number.

Disclaimer: I am a mathematician, but I mostly just sucked this out of my thumb in the last 15 minutes, so it might well contain errors and/or be different from some more standard definition.

*A Cauchy sequence is a sequence that "ought to converge" because as you go down the sequence, the elements all get very close to one another. Decimal expansions are Cauchy sequences: pi, for example, is the limit of a sequence [3, 3.1, 3.14, 3.141, ...]. You can see that the terms of this sequence are all close together eventually, because after the 8th they're all within 0.0000001 of 3.1415926, etc.

** The equivalence relation x == y is that the sequence of differences x_i - y_i, which is easily seen to be a Cauchy sequence, converges to 0. This is a relationship between sequences, not numbers, so it's different from equality as numbers, which I'll write as x=y meaning that x'=y' for any x',y' in x,y; and identity as sets, which I'll write ===.

Comment Re:Stop with the drugs already (Score 1) 595

I'd say that you're both wrong. Preventing someone from getting polio doesn't generally make them more likely to get tuberculosis. This is the opposite case from antibiotic soaps, where killing off the colony on a surface just makes way for another, different colony. The new colony isn't particularly more or less likely to be harmful, but it is more likely to be resistant.

On the other hand, I read (too lazy to dig up the reference) a study that suggests that some (but not all) malaria vaccines may encourage evolution of a more harmful strain of the disease. If the vaccine targets an antigen that increases both pathogenicity and infectiousness, then strains which express more of that antigen may be more successful, and also more harmful to unvaccinated people.

Comment Emerging encryption technologies (Score 1) 93

"Emerging encryption technologies" such as Gentry's doubly-homomorphic encryption (which is what the link points to) tend to have a major disadvantage: they tend to be horribly inefficient. We're talking 6 orders of magnitude minimum, probably more like 12 orders. Unless there's a major breakthrough, this is not going to help.

Cryptographic engineering solutions, like DRM, might help. But then again, they might not: they require lots of engineering effort from the cloud providers, which they have little incentive to perform; and even then, DRM technologies don't have the greatest security record.

Operating system security measures will probably be very useful to protect against attacks, not from the hosting provider, but from other clients. These measures are tricky and unlikely to provide "perfect" security, but can definitely make attacks much more difficult.

I predict that after conventional defenses are applied, the solution will be either be less paranoid, or don't move to the cloud.

And yes, I am a cryptographer.

Comment Re:Too bad the US can't comprehend this concept (Score 1) 204

Is this known to be the cause of the fire?

Yes, but a footnote in the operator's manual warns against operating near any sort of combustible material, because MegaCorp knows how to cover its collective ass.

Lawsuits aren't usually simple, which is why I agree that judges should have discretion in cases like this.

Comment Re:Too bad the US can't comprehend this concept (Score 1) 204

If it was a candle, that's one thing. But any electrical device connected to mains power could short-circuit, overheat and set the whole place ablaze. This includes at least his lights, electronics and prolly a dozen heavy appliances.

Or do you Norwegians turn off the heat at night and throw the breakers?

Or have I been trolled?

Comment Re:Too bad the US can't comprehend this concept (Score 1) 204

This wouldn't help as much as you might think. Legal services are expensive. CommonMan probably can't afford a lawyer unless he wins, so even if he only has to pay "normal" attorney's fees, losing probably bankrupts him anyway. It basically negates the advantages of contingency.

Slashdot Top Deals

"Ninety percent of baseball is half mental." -- Yogi Berra

Working...