Comment Re:on-board AES? (Score 1) 219
The fastest code that I know of for AES in CTR mode is Kasper-Schwabe. It does eight 128-bit block encryptions at a time, so with some modification it should also be suitable for, say, PMAC. I believe that it does not handle decryption (except in CTR mode, where decryption is the same operation as encryption) or key sizes other than 128 bits. Modes other than CTR lose some of the optimizations and should be about 20% slower. It should be available on Kasper's homepage. It requires SSSE3 and reportedly achieves 6.9 cycles/byte on Nehalem for CTR mode.
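To illustrate why CTR mode needs no separate decryption routine, and why it suits code that processes eight blocks at once, here is a minimal sketch. The block function is a stand-in built from SHA-256 (an assumption for illustration only; it is not AES, not secure, and not taken from any of the libraries mentioned), and the names `toy_block_encrypt`, `ctr_keystream` and `ctr_xor` are hypothetical.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the forward AES block function. NOT AES and NOT secure."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ctr_keystream(key: bytes, nonce: bytes, nblocks: int, batch: int = 8) -> bytes:
    """Generate nblocks keystream blocks, handling `batch` counters per step.

    Counter blocks are independent of each other and of the data, which is
    what lets a parallel implementation work on several of them at once.
    """
    out = bytearray()
    for start in range(0, nblocks, batch):
        count = min(batch, nblocks - start)
        # 8-byte nonce followed by an 8-byte big-endian block counter
        counters = [nonce + (start + i).to_bytes(8, "big") for i in range(count)]
        for counter_block in counters:
            out += toy_block_encrypt(key, counter_block)
    return bytes(out)


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: both are an XOR with the same keystream."""
    nblocks = (len(data) + BLOCK - 1) // BLOCK
    keystream = ctr_keystream(key, nonce, nblocks)
    return bytes(d ^ k for d, k in zip(data, keystream))
```

Calling `ctr_xor` twice with the same key and nonce returns the original message, and only the forward block function is ever used. That is why an encrypt-only implementation is enough for CTR.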
My code is available here. On Nehalem, it achieves ~9.4 cycles/byte encrypting and ~11.1 cycles/byte decrypting in essentially any mode. It handles both encryption and decryption, and supports all three key sizes (longer keys are slower, of course). A newer (unreleased, experimental) version makes slight performance improvements (maybe down to 9.1 cycles/byte encrypting on Nehalem) and implements an optimization for CTR mode that brings it down to ~7.5 cycles/byte. Email me (mhamburg AT cs DOT stanford DOT edu) if you want to try the experimental version. However, my code fundamentally requires SSSE3, and it performs quite poorly on Conroe.
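For a rough sense of scale, reading the figures above as cycles per byte, they convert to single-core throughput by dividing the clock rate by the cycles/byte figure. The sketch below does that arithmetic; the 2.66 GHz clock is an assumed example value, not a measurement from any of these implementations, and `throughput_mb_s` is a hypothetical helper name.

```python
def throughput_mb_s(cycles_per_byte: float, clock_hz: float) -> float:
    """Convert a cycles/byte figure to MB/s (10^6 bytes/s) on one core."""
    return clock_hz / cycles_per_byte / 1e6


ASSUMED_CLOCK_HZ = 2.66e9  # assumption: example clock rate, not a benchmark setting

# Figures quoted in this comment, in cycles/byte on Nehalem
quoted = {
    "Kasper-Schwabe, CTR": 6.9,
    "my code, encrypt": 9.4,
    "my code, decrypt": 11.1,
    "experimental, CTR": 7.5,
}

for name, cpb in quoted.items():
    print(f"{name}: about {throughput_mb_s(cpb, ASSUMED_CLOCK_HZ):.0f} MB/s")
```

These are upper bounds for a single core running only the cipher; real throughput also depends on memory bandwidth and on whatever else the application does per byte.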
Also, Dan Bernstein (homepage) has, somewhere, a fast conventional implementation of AES for several processors; it is not timing-attack resistant, but it does not require any sort of SSE. I've heard Crypto++ is pretty fast too.
I believe that all of the above libraries are public-domain and patent-free.
Out of curiosity, what's your application? Can you just get a VIA Nano or Intel Westmere core and run on that?