AMD

SuSE Submits Enhancements for AMD Hammer 57

ackthpt writes "SuSE has this press release as they are submitting enhancements to the Linux kernal particular to AMD's x86-64 processor instruction set. The enhancements are anticipated for the 2.6 kernel, though some may appear in 2.4, as development on 2.5 is only beginning. AMD's take on the announcement as well." nik notes that SuSE joins NetBSD in having ports to Hammer. Usenix members can see the paper Wasabi's Frank van der Linden wrote about the porting effort.
  • Hammer is definitely gonna be an interesting and very cool set of chips! Glad to see someone is working on enhancing linux for it. Especially since the big bad wolf in Redmond hasn't yet even done a beta of 64-bit XP for the Hammers.
    • No wonder they don't: Wintel is paying off, and their secret contract involves not supporting AMD (at least not the new way) ;)

      But I might be wrong; at the least, Intel is said to be including x86-64 compatibility in upcoming Pentium IV releases.

      This would be a really, really cool way to take a large chunk of market share away from M$: _if_ IA-64 doesn't pay off but x86-64 does (and it will, because of how easy it is to convert to and from x86-32), Intel activates this compatibility, both chips sell (AMD's more, I should guess), but M$'s OS runs only in 32-bit mode ;)
  • by Khopesh ( 112447 ) on Saturday March 02, 2002 @09:37AM (#3097497) Homepage Journal
    this is truly a great move in the right direction, but we also need GCC support and optimization for this new architecture. AMD, please: you are the expert on your chips. As Intel made its own free compiler, so too can you. Ideally, release your compiler under the MIT License, LGPL, GPL, or something similar; releasing an optimization patch for GCC would blow my mind.
    • It's being done!! :) (Score:3, Interesting)

      by Daath ( 225404 )
      FreeBSD [freebsd.org] is working on an x86-64 GCC! Actually AMD itself has sponsored this! Take a look at the link!
    • by scorcherer ( 325559 ) on Saturday March 02, 2002 @09:45AM (#3097514) Homepage
      Take a look at GCC main page [gnu.org] and you'll see a note on the x86-64 port contributed by SuSE.
    • by Ace Rimmer ( 179561 ) on Saturday March 02, 2002 @09:46AM (#3097516)
      There are some people at SuSE working on gcc Hammer optimizations; this is part of the contract between AMD and SuSE.
    • On the other hand, that is a pretty big nail in Hammer's coffin. Intel's compiler is 2x the speed of GCC, and since it doesn't support x86-64, there goes any performance advantage (and then some) which Hammer was supposed to have.
      • Not necessarily. Recall from this [slashdot.org] Slashdot story about this [open-mag.com] article that the Intel compiler also showed similar results over GCC when targeting the Athlon.

        GCC's mission statement [gnu.org] is not about the running time of executable code; we've recently been having a thread about this on the plan9 mailing list (or comp.os.plan9). (Ours started as a flame from Thomas Bushnell calling plan9's 8c nothing more than a "cute toy" - 8c is more concerned with compilation speed than execution time, where it beats GCC hands down; if you want raw execution speed, look elsewhere.)

        It could well be that Intel's compiler will show similar performance gains over GCC on the Hammer.

        I wonder if every problem will start to look like a nail when the hammer claws its way out of the AMD toolbox.
        • Oh, I have no doubt that Intel's compiler will produce great 32 bit code on Hammer. Hammer is just a proliferation of K7 with 64 bit extensions, and AMD knows how to optimize their hardware for that compiler (they use it when submitting their SPEC scores).

          But Intel's C compiler won't generate 64 bit code, which means that AMD has to rely on GCC for 64 bit applications. So any performance advantage of 64 bit is more than nullified because there's not a decent compiler for it.
        • I agree with you. If AMD were to release its own set of x86-64 optimizations for GCC, very few of them would find their way into the GNU release of GCC. HOWEVER, I am suggesting an AMD-GCC distribution/supplement; this would be released as a set of diffs against the current GNU GCC and would neither "waste" GNU developers' time nor "bloat" the standard GCC distribution.
      • I've seen that quoted before, but from what I've read the performance differences on integer code aren't anywhere near that great. If you're running Apache, for instance, the relative floating-point performance is neither here nor there.

        Anyway, who's to say AMD don't have a demon proprietary compiler for x86-64 up their sleeve for just this purpose?

    • Just a nit-pick, but Intel compilers [intel.com] actually cost money: $500 for the Linux C/C++ compiler [shop-intel.com] ($125 academic [intel.com]).

      Intel does provide a number of free open source products [intel.com], including an Itanium assembler [intel.com], library routines, vision routines, and a network performance analyzer.
  • by tuxzone ( 64722 )
    To straighten things out:
    Commodore machines have a kernal (Keyboard Entry Read, Network, And Link), linux has a kernel.

    To make life more complicated: if you want to run a Unix like OS on a machine with a kernal (like the c64) it is not going to be linux but lunix (http://lng.sourceforge.net/).

    • I remember reading that in the Commodore Hacking magazine trivia section - I always thought it was a classic case of Commodore naming something first, then wrapping an acronym around it.
  • Are Hammers available right now? If so, where can I get one? Strictly for research purposes, of course...... ;)
  • by Anonymous Coward
    because intel put their itanium 64bit egg in the windows xp64 basket.
    • I agree, and possibly a bigger slice of the desktop market as well.

      The Itanic smells a lot like IBM's ill-fated move to the Microchannel bus. On the other hand, if Itanic delivers on the promise of vastly superior performance (doubtful) AND if they make it easy to port IA-32 programs, then it will have a chance.

      It seems likely that Intel will backtrack and create a hybrid 32/64-bit processor like AMD's.
  • Would it be possible to process two 32-bit operations at once in a 64-bit system? I imagine this is possible considering the information content.

    For a decimal example, multiply 123,456 by 2 to get 246,912. Imagine your old number system was limited to max. 999. With the new system (max. 999,999) you've effectively multiplied 123*2 = 246 and 456 * 2 = 912 by a single instruction. Of course you'll have to separate the resulting numbers at the end, but you might get improvements if you do multiple instructions in succession.

    • This is called a SIMD (single instruction, multiple data) operation. It's what MMX is all about.

      It's usually not worth doing this if there's no SIMD hardware support, because the time wasted loading your values and then separating them isn't compensated by the gain in speed. Of course there are special cases (like when dealing with bit strings) where this is used by definition (and will be an improvement).
    • No, you can't, unless you can guarantee that the result from the lower half of the operand will not affect any bits in the upper half. For multiplication this interference happens all the time, but for addition it happens only when the lower half carries over.

      Besides, 64 bit operations are higher latency than 32 bit operations, and the cost of all of the shifting and masking to separate the results would be very high. It would be much faster to just do two separate 32 bit operations.

      SIMD is a different story, since the hardware assembles and disassembles the operands and executes them on separate execution units.
      • Besides, 64 bit operations are higher latency than 32 bit operations, and the cost of all of the shifting and masking to separate the results would be very high. It would be much faster to just do two separate 32 bit operations.

        Is this actually true of the x86-64 instruction set? It would strike me as a very poor design if simple operations (add/sub/bitwise) took more than a single cycle, otherwise having 64 bit words would be rather pointless as you could do 64 bit operations just as fast in 32 bit. The only advantage would be larger register space.

        I can't actually find any documentation of instruction timings on AMD's site or x86-64.org. I would guess that most instructions take the same time in 64 bit as 32 bit. The exceptions would be things like multiply/divide etc.

    • Granted the GPRs are being extended from 32bits to 64bits, but more than that it's x86-64 because it has 64 bit *ADDRESSING*.
    • Ignore the other replies - it is possible to do this, and it definitely is a speed increase. See the example code below. You just have to be careful about the packing arrangement of data in each word, and the overlap when performing operations on them.

      Multiplication is a bad example, but it is possible to multiply several numbers at the same time by one or more coefficients. This usually isn't worth it unless the numbers are very small compared to the word size - e.g. 4 bits vs. 32 bits.

      However - there are a lot of operations which can be dramatically improved by packing data without any extra SIMD hardware. For example, you can perform some tricks with bit shifting to do pixel masking 32 bits (or 64!) at a time. You can do addition/subtraction trivially with the only thing to watch out for being the carry.

      Whether it's worth it is a case-by-case decision. Sometimes the packing/unpacking/carry correction takes longer than the performance gain.

      And here's an example where there's definitely a performance increase! I've used the code below to do motion blur in the past. It's slower than using MMX, but not by much. I wrote it so long ago that I don't have any comparative figures, though.

      unsigned *bufin = (unsigned *) buffer;
      unsigned *bufout = (unsigned *) motionbuf;
      unsigned mask1 = 0xfcfcfcfc; /* clears the low 2 bits of each byte */
      unsigned mask2 = 0xfefefefe; /* clears the low bit of each byte */
      for (unsigned n = (width * height) >> 2; n; n--) {
          unsigned in = *bufin++;
          unsigned out = *bufout;
          in &= mask1;
          in >>= 2;                  /* in / 4, per byte */
          out &= mask2;
          out >>= 1;                 /* out / 2, per byte */
          out += (out & mask2) >> 1; /* + out / 4, per byte */
          *bufout++ = in + out;
      }

      The idea here is that the framebuffer persists the image. The input and output buffers are 8 bits per primary. Now, you could do this a single byte at a time, but that would suck for speed. Instead, 4 bytes are computed at once. The formula for each output byte is based on:

      out = (out * 3 + in) / 4

      This is actually performed here slightly less accurately:

      out = out / 2 + out / 4 + in / 4

      I remove some of the visible artifacts in practice with a post-processing stage where 1 bit of noise is added.

      The bit masks are applied to prevent the shifts "leaking" into the next byte in the word. Now, on the topic of 64bit - the above can be performed on 64bit words with no performance loss. This means it goes twice as fast. Although you'd be silly to do this on an architecture with SIMD instructions designed to do exactly this job.

      On architectures without SIMD, tricks like this can give you several times speed increase. If anyone's interested in any other tricks I can pull some code onto a web page somewhere.

      • Another simple example is turning a 1-byte grayscale image into a 4-byte "color" image, as needed by some hardware, by multiplying each input byte by 0x01010101. I have measured this, and it definitely is faster than storing the 4 characters one after another into the output buffer.
        • Hmm, forgot about that one. Depending on the hardware, the multiply will end up single cycle because it'll early out after the first 8 bits of coefficient. I know recent ARM processors perform 12 bits per cycle, so you tune coefficients to be less than 4096. I think recent x86 processors are single cycle regardless of coefficient, possibly with some result latency.

          Alternatively you could do:

          unsigned inw; unsigned char *in; ...
          inw = *in++;
          inw |= inw << 8;
          inw |= inw << 16;

          This gets the job done in nearly the same time. It depends on the architecture and context I suppose. On ARM you get or+shift in one cycle so the above looks pretty much like the C code:

          ldrb r0, [r1]
          add r1, r1, #1
          orr r0, r0, r0, lsl #8
          orr r0, r0, r0, lsl #16

          On x86 it takes a ton more instructions, so the multiply ends up better. It's a shame compilers can't spot these kinds of optimisations.

          Of course, you shouldn't be storing to memory one byte at a time if you can pack the bytes into words and store words at a time. Modern x86's will merge writes for you but I'd guess at it not making the instruction scheduling any easier for the processor.
          • I did try doing shifts and or into a word, instead of the multiply and the result was slower, this was on both MIPS and Pentium. I'm sure the compilers could have done better, but it is possible that the only better thing would have been to recognize the equivalence to the multiply and do that.
  • No, SuSE is submitting enhancements for Linux for the AMD Hammer. Made me think they were actually making suggestions to the chip design for a second.
  • Interesting read (Score:4, Informative)

    by Matrim9 ( 558092 ) on Saturday March 02, 2002 @10:23AM (#3097612)
    http://www6.tomshardware.com/cpu/02q1/020227/

    Interesting - they tested one of the Hammer CPUs on SuSE, but they only ran XP in 32-bit... :o
    • Re:Interesting read (Score:3, Informative)

      by Phosphor3k ( 542747 )
      Just a clarification: Tom did not test these. This was a demonstration at a trade show that everyone and their brother has been reporting on. No one was allowed to test any software on these machines. However, this is the FIRST batch of Hammers shown in public. The stepping on the CPUs was A0. Generally, AMD and Intel do not show off such early production CPUs to the public, as they are still going through testing/debugging.
  • Freedom to Innovate (Score:5, Interesting)

    by Paul the Bold ( 264588 ) on Saturday March 02, 2002 @01:25PM (#3098238)
    You will recall that when AMD demoed hammer recently, they showed a 32-bit Windows system and a 64-bit Linux system. People were commenting on AMD preferring Linux over Windows, therefore showing a more powerful Linux demo than a Windows demo.

    The truth is that there is not a 64-bit version of Windows for the Hammer. AMD was able to modify the existing Linux code to create their own 64-bit version of Linux. This is the best example of the freedom granted by the GPL that I have seen in months. AMD is releasing a new product at the end of the year, and they are able to create a demand for it NOW by having software for it NOW.

    Do you remember the lag between the introduction of Intel's Itanium and a Windows version for Itanium? It was not well coordinated. AMD has done the opposite, they created a demand and a use several months before the release, and it's working. We are all drooling over a 64-bit architecture, and we will have 6-8 months to think about (and save up for) the purchase of a Hammer.

    This is the freedom to innovate that is granted by the GPL and denied by the MS EULA. GPLed software is going to make AMD some money.

    I feel all warm and fuzzy inside.
    • Yeah, you're probably right on that one. Although I wouldn't bet on a 64-bit linux version creating _that_ much demand for a processor. Sure, some geeks (including me) will want one ASAP, but remember, we're just talking about the kernel now. I'll feel warm and fuzzy inside when people who produce user-land software will compile AND optimise for the Hammer. Little OT: I _reeeeaaally_ hope AMD will keep using the hammer names (although I realise it's not likely)
    • When was the Itanium released? Where is Windows for the Itanium?
      • I don't remember exactly when it was released, but a search on pricewatch [pricewatch.com] for "Itanium" brings up a lot of vendors. It's out there.

        As far as Windows for the Itanium, who cares? No, seriously, I don't know. I don't follow new Microsoft product releases unless something funny or terrible happens (Bill's demo crashes, or spyware). I am under the impression that there is a 64-bit Itanium Windows out there, but maybe I am wrong. That would be one hell of a lag.

  • by AZPhysics ( 561228 ) on Saturday March 02, 2002 @03:21PM (#3098721)

    While Hammer will fly at 32-bit code, the 64-bit code will really differentiate the processor. Two-way Clawhammer Beowulfs should be a huge business. But the differentiation will really not show on Windows until (unless) they develop an x86-64 Windows. I wouldn't count on them doing that until Intel comes out with their version of x86-64 (note that I didn't say "if"). There will be great pressure to recompile and reoptimize Open software to take advantage of the Hammer.

    I think this is a wonderful advancement. I run SuSE on an Athlon now, and will run SuSE on a dual Hammer in probably a year and a half (I can't afford to be bleeding edge). I can't find many optimizations for the Athlon in compilers and such. However, with the Hammer, the optimizations will be out there. Not only will the compilers have flags, but entire distributions will likely be built with re-compiled applications. That would be something I would pay more for.
