SuSE Submits Enhancements for AMD Hammer
ackthpt writes "SuSE has this press release as they are submitting enhancements to the Linux kernal particular to AMD's x86-64 processor instruction set. Anticipated for the 2.6 kernel, some enhancements may appear in 2.4, as development is only beginning on 2.5. AMD's take on the announcement as well." nik notes that SuSE joins NetBSD in having ports to Hammer. Usenix members can see the paper Wasabi's Frank van der Linden wrote about the porting effort.
Good stuff! (Score:2)
Re:Good stuff! (Score:1)
off, and their secret contract involves not supporting AMD (at least the new way)
But I might be wrong, at least isn't Intel said to include x86-64 compatibility stuff into the next Pentium IV releases?
This would be a really, really cool way to get rid of M$ in a large market share, because _if_ IA-64 doesn't pay off, but x86-64 does (and it will, because of its ease to convert from and to x86-32), Intel will activate this, both chips sell (AMDs more I should guess), but M$ OS run only in 32-bit mode
great, but what about GCC? (Score:5, Interesting)
It's being done!! :) (Score:3, Interesting)
Re:It's being done!! :) (Score:1)
Re:It's being done!! :) (Score:4, Insightful)
already being done... (Score:5, Informative)
Re:great, but what about GCC? (Score:5, Interesting)
Re:great, but what about GCC? (Score:2)
Re:great, but what about GCC? (Score:3, Informative)
GCC's mission statement [gnu.org] does not make the running time of executable code a priority. We've recently been having a thread about this on the plan9 mailing list (or comp.os.plan9). (Ours started as a flame from Thomas Bushnell that plan9's 8c was nothing more than a "cute toy". 8c is more concerned with compilation speed than execution time, and on compilation speed it beats GCC hands down. If you want raw execution speed, look elsewhere.)
It could well be that Intel's compiler will show similar performance gains over GCC on the Hammer.
I wonder if every problem will start to look like a nail when the hammer claws its way out of the AMD tool box.
Re:great, but what about GCC? (Score:3, Informative)
But Intel's C compiler won't generate 64 bit code, which means that AMD has to rely on GCC for 64 bit applications. So any performance advantage of 64 bit is more than nullified because there's not a decent compiler for it.
compilers (Score:2)
Isn't that only for floating-point code? (Score:2)
Anyway, who's to say AMD don't have a demon proprietary compiler for x86-64 up their sleeve for just this purpose?
Re:Isn't that only for floating-point code? (Score:2)
Besides the fact that they've never developed a compiler before?
intel compiler not free (Score:2)
Intel does provide a number of free open source products [intel.com], including an Itanium assembler [intel.com], library routines, vision routines, and a network performance analyzer.
Linux Kernal? (Score:2, Funny)
Commodore machines have a kernal (Keyboard Entry Read, Network, And Link), linux has a kernel.
To make life more complicated: if you want to run a Unix like OS on a machine with a kernal (like the c64) it is not going to be linux but lunix (http://lng.sourceforge.net/).
Re:Linux Kernal? (Score:2)
But where/when can we get a Hammer? (Score:2)
Re:But where/when can we get a Hammer? (Score:3, Informative)
Re:But where/when can we get a Hammer? (Score:3, Funny)
Yes! Your nearest hardware store should have a good selection!
i sniff a server market takeover .. (Score:1, Interesting)
Re:i sniff a server market takeover .. (Score:1)
The Itanic smells a lot like IBM's ill-fated move to the Microchannel bus. On the other hand, if Itanic delivers on the promise of vastly superior performance (doubtful) AND if they make it easy to port IA-32 programs, then it will have a chance.
It seems likely that Intel will backtrack and create a hybrid 32-64 processor like AMD.
Newbie 64-bit question (Score:2)
For a decimal example, multiply 123,456 by 2 to get 246,912. Imagine your old number system was limited to max. 999. With the new system (max. 999,999) you've effectively multiplied 123*2 = 246 and 456 * 2 = 912 by a single instruction. Of course you'll have to separate the resulting numbers at the end, but you might get improvements if you do multiple instructions in succession.
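A binary version of the same idea, as a minimal sketch: two 16-bit values share one 32-bit word and are doubled by a single multiply. The helper names (pack2, double_both, high_field, low_field) are made up for illustration, and the trick only holds while neither value overflows its 16-bit field.

```c
#include <stdint.h>

/* Hypothetical helper: place two 16-bit values side by side in one
   32-bit word, like writing 123 and 456 as 123,456. */
static uint32_t pack2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Doubles both fields with one multiply. Assumption: each field is
   below 32768, so the low result cannot spill into the high field
   and the high result still fits in the word. */
static uint32_t double_both(uint32_t packed)
{
    return packed * 2u;
}

/* Separate the two results again at the end. */
static uint16_t high_field(uint32_t packed)
{
    return (uint16_t)(packed >> 16);
}

static uint16_t low_field(uint32_t packed)
{
    return (uint16_t)(packed & 0xFFFFu);
}
```

Calling double_both(pack2(123, 456)) and then reading the two fields back gives 246 and 912, at the cost of the extra shift and mask needed to separate them.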
Re:Newbie 64-bit question (Score:3, Informative)
It's usually not worth doing this if there's no SIMD hardware support, because the time wasted loading your values and then separating them isn't compensated by the gain in speed. Of course there are special cases (like when dealing with bit strings) where this is used by definition (and will be an improvement).
Re:Newbie 64-bit question (Score:3, Informative)
Besides, 64 bit operations are higher latency than 32 bit operations, and the cost of all of the shifting and masking to separate the results would be very high. It would be much faster to just do two separate 32 bit operations.
SIMD is a different story since the hardware assembles and reassembles the operands, and executes them on separate execution units.
Re:Newbie 64-bit question (Score:1)
Is this actually true of the x86-64 instruction set? It would strike me as a very poor design if simple operations (add/sub/bitwise) took more than a single cycle, otherwise having 64 bit words would be rather pointless as you could do 64 bit operations just as fast in 32 bit. The only advantage would be larger register space.
I can't actually find any documentation of instruction timings on AMD's site or x86-64.org. I would guess that most instructions take the same time in 64 bit as 32 bit. The exceptions would be things like multiply/divide etc.
Re:Newbie 64-bit question (Score:1)
Doing SIMD without SIMD hardware is possible (Score:2, Interesting)
Multiplication is a bad example, but it is possible to multiply several numbers at the same time by one or more coefficients. This usually isn't worth it unless the numbers are very small compared to the word size, e.g. 4 bits vs 32 bits.
However - there are a lot of operations which can be dramatically improved by packing data without any extra SIMD hardware. For example, you can perform some tricks with bit shifting to do pixel masking 32 bits (or 64!) at a time. You can do addition/subtraction trivially with the only thing to watch out for being the carry.
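One common way to deal with the carry, sketched here under the assumption that each byte should wrap around on its own (modulo 256). The function name add_bytes is made up for illustration; this is a well-known masking trick, not necessarily the one the poster had in mind.

```c
#include <stdint.h>

/* Hypothetical helper: adds four packed bytes lane by lane.
   Each lane wraps modulo 256 and never carries into its neighbour. */
static uint32_t add_bytes(uint32_t a, uint32_t b)
{
    /* Add only the low 7 bits of every byte. Each partial sum is at
       most 0xFE, so nothing crosses a byte boundary. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Fold the top bit of each byte back in with XOR, which is
       addition with the carry out of the byte thrown away. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

The same masks widened to 64 bits (0x7F7F7F7F7F7F7F7F and 0x8080808080808080) would handle eight bytes per operation.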
Whether it's worth it is a case-by-case decision. Sometimes the packing/unpacking/carry correction takes longer than the performance gain.
And here's an example where there's definitely a performance increase! I've used the code below to do motion blur in the past. It's slower than using MMX, but not by much. I wrote it so long ago I don't have any comparative figures though.
The idea here is that the framebuffer persists the image. The input and output buffers are 8 bits per primary. Now, you could do this a single byte at a time, but that would suck for speed. Instead, 4 bytes are computed at once. The formula for each output byte is based on:
out = (out * 3 + in) / 4
This is actually performed here slightly less accurately:
out = out / 2 + out / 4 + in / 4
I remove some of the visible artifacts in practise by a post-processing stage where 1 bit of noise is added.
The bit masks are applied to prevent the shifts "leaking" into the next byte in the word. Now, on the topic of 64bit - the above can be performed on 64bit words with no performance loss. This means it goes twice as fast. Although you'd be silly to do this on an architecture with SIMD instructions designed to do exactly this job.
On architectures without SIMD, tricks like this can give you several times speed increase. If anyone's interested in any other tricks I can pull some code onto a web page somewhere.
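A minimal sketch of the packed update described above, assuming 32-bit words holding four 8-bit channels. This is a reconstruction from the formula out = out / 2 + out / 4 + in / 4, not the poster's original listing; the function name blend4 and the exact mask constants are assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: blends four packed 8-bit channels at once.
   Per byte: out = out / 2 + out / 4 + in / 4, each term truncated. */
static uint32_t blend4(uint32_t out, uint32_t in)
{
    /* The masks clear the bits that each shift drags in from the
       byte above, so no channel leaks into its neighbour. */
    uint32_t half    = (out >> 1) & 0x7F7F7F7Fu; /* out / 2 per byte */
    uint32_t quarter = (out >> 2) & 0x3F3F3F3Fu; /* out / 4 per byte */
    uint32_t in_q    = (in  >> 2) & 0x3F3F3F3Fu; /* in / 4 per byte  */

    /* Largest per-byte sum is 127 + 63 + 63 = 253, so the adds
       cannot carry across byte boundaries. */
    return half + quarter + in_q;
}
```

A 64-bit variant would use uint64_t and the same masks repeated across eight bytes, which is where the "twice as fast" claim comes from.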
Re:Doing SIMD without SIMD hardware is possible (Score:2)
Re:Doing SIMD without SIMD hardware is possible (Score:1)
Alternatively you could do:
unsigned inw; unsigned char *in;
inw = *in++;
inw |= inw << 8;
inw |= inw << 16;
This gets the job done in nearly the same time. It depends on the architecture and context I suppose. On ARM you get or+shift in one cycle so the above looks pretty much like the C code:
ldrb r0, [r1]
add r1, r1, #1
orr r0, r0, lsl#8
orr r0, r0, lsl#16
On x86 it takes a ton more instructions, so the multiply ends up better. It's a shame compilers can't spot these kinds of optimisations.
Of course, you shouldn't be storing to memory one byte at a time if you can pack the bytes into words and store words at a time. Modern x86's will merge writes for you but I'd guess at it not making the instruction scheduling any easier for the processor.
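For comparison, a minimal sketch of the two ways to copy one byte into all four bytes of a word. The multiply form is an assumption about what "the multiply" above refers to, since that code is not shown here; multiplying by 0x01010101 is the usual version of the trick. Both function names are made up for illustration.

```c
#include <stdint.h>

/* Shift-and-OR replication, as in the C fragment above. */
static uint32_t replicate_shift(uint8_t b)
{
    uint32_t w = b;
    w |= w << 8;   /* 0x000000AB -> 0x0000ABAB */
    w |= w << 16;  /* 0x0000ABAB -> 0xABABABAB */
    return w;
}

/* Multiply replication (assumed form): each 01 byte in the constant
   contributes one shifted copy of b, and the copies never overlap. */
static uint32_t replicate_mul(uint8_t b)
{
    return (uint32_t)b * 0x01010101u;
}
```

Which one wins depends on the target: the shift form maps onto two ORR instructions on ARM, while on x86 the single multiply may be cheaper.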
Re:Doing SIMD without SIMD hardware is possible (Score:2)
misleading headline (Score:1)
Re:misleading headline (Score:4, Informative)
Interesting read (Score:4, Informative)
Interesting - they tested one of the Hammer CPUs on Suse, but they only ran XP in 32-bit...
Re:Interesting read (Score:3, Informative)
Freedom to Innovate (Score:5, Interesting)
The truth is that there is not a 64-bit version of Windows for the Hammer. AMD was able to modify the existing Linux code to create their own 64-bit version of Linux. This is the best example of the freedom granted by the GPL that I have seen in months. AMD is releasing a new product at the end of the year, and they are able to create a demand for it NOW by having software for it NOW.
Do you remember the lag between the introduction of Intel's Itanium and a Windows version for Itanium? It was not well coordinated. AMD has done the opposite, they created a demand and a use several months before the release, and it's working. We are all drooling over a 64-bit architecture, and we will have 6-8 months to think about (and save up for) the purchase of a Hammer.
This is the freedom to innovate that is granted by the GPL and denied by the MS EULA. GPLed software is going to make AMD some money.
I feel all warm and fuzzy inside.
Re:Freedom to Innovate (Score:1)
Excuse me? (Score:2)
Re:Excuse me? (Score:2)
As far as Windows for the Itanium, who cares? No, seriously, I don't know. I don't follow new Microsoft product releases unless something funny or terrible happens (Bill's demo crashes, or spyware). I am under the impression that there is a 64-bit Itanium Windows out there, but maybe I am wrong. That would be one hell of a lag.
Open Source software vital to hammer success (Score:3, Interesting)
While Hammer will fly at 32 bit code, the 64 bit code will really differentiate the processor. Two-way Clawhammer Beowulfs should be a huge business. But, the differentiation will really not show on Windows until (unless) they develop an x86-64 Windows. I wouldn't count on them doing that until Intel comes out with their version of x86-64. (note that I didn't say if). There will be great pressure to recompile and reoptimize Open software to take advantage of the Hammer.
I think this is a wonderful advancement. I run SuSE on an Athlon now, and will run SuSE on a dual Hammer in probably a year and a half (I can't afford to be bleeding edge). I can't find many optimizations for the Athlon in compilers and such. However, with the Hammer, the optimizations will be out there. Not only will the compilers have flags, but entire distributions will likely be built with re-compiled applications. That would be something I would pay more for.