Re: What the hell is AVX-512? (Score 1)
With C code like:
int a[4] = { 1, 2, 3, 4 };
int b[4] = { 5, 6, 7, 8 };
for (int n = 0; n < 4; n++)
{
a[n] = a[n] + b[n];
}
Without vector instructions the compiler has to emit a separate load, add, and store for each element, so four scalar adds in total. With vector instructions (MMX, SSE, AVX, AVX-512) it could be done with something like:
movups xmm0, [a]
movups xmm1, [b]
paddd xmm0, xmm1 ; parallel add double words (uint32_t)
movups [a], xmm0
movups moves 128 bits between memory and an SSE register (two loads at the start, one store at the end). paddd treats xmm0 and xmm1 as four 32-bit ints each, adds them lane by lane, and puts the result back into xmm0. Assuming there are enough execution units, all 4 adds can be done at the same time.
In the end a will be { 6, 8, 10, 12 }. The same 128 bits can also be treated as four 32-bit floats, eight 16-bit ints, sixteen 8-bit ints, etc.
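For anyone who would rather stay in C than write assembly, here is a rough sketch of the same add using the SSE2 intrinsics from emmintrin.h. It assumes an x86-64 compiler (GCC, Clang, or MSVC), and the function name add4 is made up for this example. The integer loads typically compile to movdqu rather than movups, but the idea is the same.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */

/* Hypothetical helper: adds the four 32-bit ints in b to the four in a,
   using one vector add instead of four scalar adds. */
void add4(int a[4], const int b[4])
{
    /* Unaligned 128-bit loads, the integer counterpart of movups. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* paddd: four independent 32-bit adds, one per lane. */
    va = _mm_add_epi32(va, vb);

    /* Unaligned 128-bit store back into a. */
    _mm_storeu_si128((__m128i *)a, va);
}
```

Calling add4(a, b) with the arrays from the top of this comment leaves a as { 6, 8, 10, 12 }.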
AVX adds 256-bit registers and AVX-512 adds 512-bit registers along with per-lane masking (something the PlayStation 2 could do).
In plain English: with AVX-512, a single thread can compute a Mandelbrot set 16 pixels at a time (sixteen 32-bit floats per 512-bit register).
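To make the Mandelbrot point concrete, here is a rough sketch of the same idea at SSE width, so 4 pixels per step instead of 16. It is only an illustration: the function name mandel4 and its parameters are made up, and it assumes an x86-64 compiler with xmmintrin.h and emmintrin.h. SSE has no mask registers, so the "which pixels are still iterating" mask is kept in an ordinary vector register. AVX-512 would use 512-bit registers and dedicated mask registers for that job.

```c
#include <xmmintrin.h>  /* SSE: __m128 float operations */
#include <emmintrin.h>  /* SSE2: __m128i integer operations and casts */

/* Hypothetical helper: iterates z = z*z + c for four pixels at once.
   cx, cy hold the real and imaginary parts of c for each pixel.
   out receives the iteration count per pixel, capped at max_iter. */
void mandel4(const float cx[4], const float cy[4], int max_iter, int out[4])
{
    const __m128 four = _mm_set1_ps(4.0f);
    __m128 cr = _mm_loadu_ps(cx);
    __m128 ci = _mm_loadu_ps(cy);
    __m128 zr = _mm_setzero_ps();
    __m128 zi = _mm_setzero_ps();
    __m128i count = _mm_setzero_si128();
    /* All bits set in every lane: all four pixels start out active. */
    __m128 active = _mm_castsi128_ps(_mm_set1_epi32(-1));

    for (int i = 0; i < max_iter; i++) {
        __m128 zr2 = _mm_mul_ps(zr, zr);
        __m128 zi2 = _mm_mul_ps(zi, zi);

        /* A lane stays active only while |z|^2 <= 4; once cleared it stays cleared. */
        active = _mm_and_ps(active, _mm_cmple_ps(_mm_add_ps(zr2, zi2), four));
        if (_mm_movemask_ps(active) == 0)
            break;  /* every pixel has escaped */

        /* An active lane is -1 as an int, so subtracting the mask adds 1 there only. */
        count = _mm_sub_epi32(count, _mm_castps_si128(active));

        /* z = z*z + c: real part zr^2 - zi^2 + cr, imaginary part 2*zr*zi + ci. */
        __m128 zri = _mm_mul_ps(zr, zi);
        zr = _mm_add_ps(_mm_sub_ps(zr2, zi2), cr);
        zi = _mm_add_ps(_mm_add_ps(zri, zri), ci);
    }

    _mm_storeu_si128((__m128i *)out, count);
}
```

Pixels that have already escaped still get computed every step, and their results are simply ignored via the mask. That wasted work is the price of keeping all lanes in lockstep.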