VeriSign has already teamed up with PayPal to offer one-time-use passwords on key fobs, but it looks like it's now found a way to make that additional layer of protection even more portable, partnering with Innovative Card Technologies Inc. to squeeze the disposable digits onto standard-size bank cards. Apparently, you'll get a new password after each transaction you make online (displayed by pushing a button on the back of the card), making it theoretically impossible for anyone without the card to access your account, even if they somehow manage to get hold of your regular password. While it's not clear when the cards will actually be put into use, VeriSign is promising to make an announcement about a "major bank" set to use the cards sometime this month.
Just look at the matrix multiplication case. The graph shows that 1000x1000 takes 30 seconds on the CPU and 7 seconds on the GPU. Let's translate that to millions of operations per second: CPU -> 33 Mop/s, GPU -> 142 Mop/s. Matrix multiplication has cubic complexity, so for the CPU: 1000 * 1000 * 1000 / 30 seconds / 1000000 = 33 Mop/s, and for the GPU: 1000 * 1000 * 1000 / 7 seconds / 1000000 = 142 Mop/s.
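The conversion from the graph's timings to Mop/s can be checked with a few lines of Python. This is only a sketch of the arithmetic, counting n^3 operations for an n x n multiply as the post does; the function name `mops` is made up for illustration.

```python
def mops(n: int, seconds: float) -> float:
    """Millions of operations per second for an n x n matrix multiply,
    counting n**3 operations (the convention used in the post)."""
    return n ** 3 / seconds / 1e6


# Timings read off the graph, as quoted in the post.
cpu = mops(1000, 30.0)
gpu = mops(1000, 7.0)

print(f"CPU: {cpu:.1f} Mop/s")
print(f"GPU: {gpu:.1f} Mop/s")
```

The post rounds these down to 33 and 142 Mop/s.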
Now think for a while: 33 million operations per second on a 1.5 GHz Pentium 4 with SSE (I assume there is no SSE2). The Pentium 4 has a fused multiply-add unit, which lets it do two ops per clock. So we get 3 billion ops per second peak performance! What they are claiming, in effect, is that the CPU runs matrix multiply about 100 times slower than its peak. That is unlikely. You can get 2/3 of peak on a Pentium 4. Just look at the ATLAS or FLAME projects. If you use one of them, you can multiply 1000x1000 matrices in half a second: 14 times faster than the quoted GPU.
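The peak-performance estimate can be reproduced the same way. The sketch below takes every input from the post as given (1.5 GHz clock, two ops per clock, 2/3 of peak for a tuned library, 7 seconds on the GPU); the constant names are made up for illustration, and none of the figures were measured here.

```python
CLOCK_HZ = 1.5e9      # 1.5 GHz Pentium 4, as stated in the post
OPS_PER_CLOCK = 2     # the post's assumption: two ops per clock
EFFICIENCY = 2 / 3    # fraction of peak the post attributes to tuned libraries
N = 1000              # matrix dimension
GPU_SECONDS = 7.0     # GPU timing quoted from the graph
CPU_SECONDS = 30.0    # CPU timing quoted from the graph

peak = CLOCK_HZ * OPS_PER_CLOCK            # ops per second at peak
sustained = peak * EFFICIENCY              # ops per second a tuned library might reach
tuned_cpu_seconds = N ** 3 / sustained     # time for one 1000x1000 multiply
speedup_vs_gpu = GPU_SECONDS / tuned_cpu_seconds
quoted_vs_peak = peak / (N ** 3 / CPU_SECONDS)  # how far the quoted CPU figure is below peak

print(f"peak: {peak / 1e9:.1f} Gop/s")
print(f"tuned CPU time: {tuned_cpu_seconds:.2f} s")
print(f"tuned CPU vs quoted GPU: {speedup_vs_gpu:.0f}x")
print(f"quoted CPU rate is {quoted_vs_peak:.0f}x below peak")
```

With these inputs the quoted CPU figure comes out about 90 times below peak, which is the "100 times slower" the post rounds to.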
Another thing is the floating point arithmetic. The GPU uses 32-bit numbers (at most). This is too small for most scientific codes. The CPU can do 64-bit. Also, if you use 32-bit on the CPU it will be about twice as fast as 64-bit (a 128-bit SSE register holds four 32-bit values but only two 64-bit ones). So in 32-bit mode, the Pentium 4 is 28 times faster than the quoted GPU.
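To make the precision point concrete, here is a small sketch that uses only the Python standard library. It rounds a value to 32-bit precision with `struct` and shows an integer that a 64-bit float stores exactly but a 32-bit float does not. The helper name `to_float32` is made up for illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


big = 2.0 ** 24  # 16777216, the last point where 32-bit floats still count by ones

# In 64-bit arithmetic, adding 1 gives a distinct value.
print(big + 1.0 == big)

# Rounded to 32 bits, the added 1 is lost: 16777217 is not representable.
print(to_float32(big + 1.0) == big)
```

A 32-bit float carries a 24-bit significand, so long accumulations such as the dot products inside a large matrix multiply can lose low-order digits much sooner than in 64-bit.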
Finally, the length of the program. The reason matrix multiply was chosen is because it can be encoded in very short code: three simple loops. This fits well within the 128-instruction vertex code length, so you don't have to keep reloading the code. More challenging codes will exceed the allowed vertex code length. The three-loop matrix multiply implementation also stresses memory bandwidth, and the CPU has MB/s while the GPU has GB/s. No wonder the GPU wins. But I could have guessed that without running any tests.
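For reference, the "three simple loops" version looks roughly like this. It is a plain-Python sketch of the textbook algorithm, not the code behind the benchmark, and the name `matmul_naive` is made up for illustration.

```python
def matmul_naive(a, b):
    """Textbook three-loop matrix multiply on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):          # rows of a
        for j in range(p):      # columns of b
            for k in range(m):  # inner dimension
                # Two operand loads per multiply-add, and b is walked
                # down a column, so memory traffic dominates.
                c[i][j] += a[i][k] * b[k][j]
    return c


print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Tuned libraries such as ATLAS get their speed mainly by blocking these loops so that operands are reused from cache, which is why the naive version says more about memory bandwidth than about arithmetic throughput.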
The beer-cooled computer does not harm the ozone layer. -- John M. Ford, a.k.a. Dr. Mike