AMD Delays Hammer

TeJarz writes "C|Net reports that their next processor (Hammer) has been rescheduled from its original Q4 release to Q1 2003. To quote C|Net: 'The delays are occurring to accommodate the release of a new version of Athlon with a 333MHz bus, said Crank. Current Athlons come with a 200MHz bus and 256KB of secondary cache.' Let's hope this doesn't get moved again."

  • Current Athlons (Score:3, Insightful)

    by Anonymous Coward on Friday September 13, 2002 @12:12AM (#4249327)
    ...have a 266MBz bus
    • by Anonymous Coward
      What is that, MegaBizzatch?
    • Re:Current Athlons (Score:2, Informative)

      by parkanoid ( 573952 )
      Technically it's 133, just double-pumped (it's obvious that the 533 p4s don't actually have 533 mhz fsb, that would be just silly). I hope they are referring to the base speed in this case as well.
      • Re:Current Athlons (Score:4, Informative)

        by packeteer ( 566398 ) <packeteer@sub d i m e n s i o n . com> on Friday September 13, 2002 @02:21AM (#4249717)
        In case everyone doesn't know what "double pumped" or "DDR FSB" mean, let me explain. The clock that sets how often data is transferred clicks over and over to keep the pace. On an Athlon it transfers data twice for every click. On a Pentium 4 it's 4 times a click. Right now most Athlons run at 133mhz "DDR FSB". Mine already runs at 166mhz (overclocked of course) and let me tell you it's sweet. I can't wait to see everyone have access to 166 mhz FSB Athlons.
        • Something I've never seen a good explanation of -- is there performance-wise any difference between a 266 MHz clock with data transferred once per clock and a 133 MHz clock with data transferred twice per clock (despite the actual clock ticking rate of course)?
          • by kimmo ( 52756 ) on Friday September 13, 2002 @06:14AM (#4250217) Homepage
            Latency.

            With single data rate a new address can be sent every clock for all memory requests.

            With double data rate a new address can be sent only with every other "clock", while the data transmission rate stays the same. Effectively this means transferring double the data for each request, while the number of requests doesn't change.

            This isn't a very serious problem, since single bytes or bus-wide words aren't usually transferred, but whole cachelines of 32/64 bytes. They will generate 4/8 sequential burst requests, nullifying much of the potential latency problem of "halfclocked" address generation.

            Ok, so why can't the addresses be sent like the data is another question which someone else with more knowledge might explain.. Maybe it would complicate things too much since the request-answer mechanism should be pipelined to accept new requests until previous requests are served. Or maybe the physical bus has some limitations, like using the same pins for address/data, which would simply make it impossible to send new addresses simultaneously (on falling edge of clock) while receiving data.
            • The additional latency required to synchronize the address with the rising edge rather than either edge is negligible when you consider the total amount of time required to perform the fetch from L3 or L4. Therefore there is no need to endure the more complex design to implement this.

              Most data is fetched in bursts. So there are typically 4 or more data phases per request. Consequently, there is no need for as high bandwidth for the address bus as for data.

              Plus, as another post said, it reduces the power requirements. This, combined with the fact that there are typically 4 or 8 data transfers per address is why P4 has gone to QDR buses. This way, there is one address per cycle, and an entire 4-unit burst can be completed in a cycle so the address bus could theoretically be completely saturated. Once you pass QDR (to Octal DR?), you may start requiring a higher data rate on the address bus as well for performing two 4-unit bursts per cycle.
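The arithmetic in this subthread can be sketched in a few lines of Python. The function names and the 64-bit (8-byte) data bus width are assumptions made for illustration and are not stated in the comments; the pump factors and burst lengths are the ones the posters mention.

```python
from fractions import Fraction

BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus


def effective_mts(base_clock_mhz, transfers_per_clock):
    """The 'marketing MHz': millions of data transfers per second."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock,
                        bus_width_bytes=BUS_WIDTH_BYTES):
    """Peak data bandwidth in MB/s (10**6 bytes per second)."""
    return effective_mts(base_clock_mhz, transfers_per_clock) * bus_width_bytes


def addresses_needed_per_clock(transfers_per_clock, burst_length):
    """Addresses per base clock needed to keep the data bus fully busy.

    Each address starts one burst of `burst_length` data transfers, so a
    value above 1 means one address per clock can no longer feed the bus.
    """
    return Fraction(transfers_per_clock, burst_length)


if __name__ == "__main__":
    for name, clock, pumps in [("SDR", 266, 1), ("DDR", 133, 2), ("QDR", 133, 4)]:
        print(name,
              effective_mts(clock, pumps), "MT/s,",
              peak_bandwidth_mb_s(clock, pumps), "MB/s peak,",
              addresses_needed_per_clock(pumps, 4), "addresses/clock for 4-unit bursts")
```

Under these assumptions, a 266 MHz single-pumped bus and a 133 MHz double-pumped bus have the same peak data rate; what differs is how often a new address can be issued. With 4-unit bursts, a quad-pumped bus needs exactly one address per clock, and anything beyond that would need more than one.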

  • by Anonymous Coward on Friday September 13, 2002 @12:13AM (#4249328)
    AMD = All Microprocessors Delayed
  • 266 Bus (Score:3, Interesting)

    by clinko ( 232501 ) on Friday September 13, 2002 @12:19AM (#4249355) Journal
    Current Athlons have 266 bus. You can still get the older 200 bus, but it died out about a year ago. Sorted in price on pricewatch [pricewatch.com]
  • Comment removed (Score:4, Insightful)

    by account_deleted ( 4530225 ) on Friday September 13, 2002 @12:22AM (#4249371)
    Comment removed based on user account deletion
    • Because the sooner it comes out, the sooner I get to play with a 64-bit OS development on a machine that gets top performance and doesn't cost $20,000. That alone is reason enough for AMD to ship it sooner.
      • Incidentally, you can get a nice new dual Alpha 21264 667 4u rackmount with 4GB ram and 18GB scsi (64 bit) for = $14,000 these days. With educational discount, you can buy a Compaq ES40 (with single cpu to start) for $20K. I have no idea what the used 21164 machines are selling for these days.

        I don't have the same motivation for 64 bit machines (I need them for cycle servers with big memory), but I'm just as anxious for a commoditized 64 bit platform to emerge.

        -Paul Komarek
      • Hell, I got a 500MHz Alpha system for $300, used.

        You are thinking about this all wrong. You seem to believe that a thousand dollar AMD chip is going to perform like a mainframe. You may be seriously disappointed when you figure out that these new 64-bit chips aren't going to make your current systems obsolete at all.
        • Err, I have no such illusions. I expect the Hammer to be about 20-30% faster at a given clock than existing chips, which is somewhat optimistic, but entirely within the expectations for the chip. I want a 64-bit machine because there are some things in OS development that are more fun when you have 64 bits of address space. Things like single-address space operating systems and persistent virtual memory stores become feasible with 64 bits of address space, while they aren't so nice to implement with only 32 bits.
    • by Paul Komarek ( 794 ) <komarek.paul@gmail.com> on Friday September 13, 2002 @12:38AM (#4249421) Homepage
      I, for one, am hoping to replace our Alphas with cpus from the AMD Hammer series. We're about to buy a bunch of P4-based machines despite the problems we've had with certain tight loops in scientific code performing 80 times slower than a similarly clocked Athlon (according to Athlon advertised "speed", not actual clock). No, I'm not exaggerating, and this has been verified independently -- the P4 cpu has some huge weak spots that really suck if you hit them. If Hammer were out and working properly, we probably wouldn't buy the P4 machines to hold us over.

      We need 64 bit machines to accommodate massive memory for our research. I'm really hoping the Hammer can provide a relatively inexpensive and *commoditized* 64 bit platform for us to work on, compared to existing 64 bit (workstation/server) platforms. And I want it yesterday. Actually, I want it last year.

      I have no idea what the editors or submitter meant, of course.

      -Paul Komarek
      • Re:Comment non-sense (Score:3, Interesting)

        by Archfeld ( 6757 )
        Any place I can look for some doc on that issue ?
        We are migrating from our Alphas to dual P4's and seeing a serious drop in performance that should not exist :( The fingers have all been pointed at software optimization and we are doing some heavy duty examinations but it sounds all too pat to me...
        • Re:Comment non-sense (Score:4, Interesting)

          by Paul Komarek ( 794 ) <komarek.paul@gmail.com> on Friday September 13, 2002 @01:28AM (#4249578) Homepage
          I can probably send you some test code (same for anyone else who asks), but I'll have to check with my advisor first. The smallest I've made the test code is a bit under 300 lines. It's been run on Alpha 21264 EV67, Athlon C, Athlon XP, P4, and P-III, and one other Pentium-ish platform. At least two (I believe it's actually three) profilers have been run to find the bottleneck; it appears to be the floating point unit stalling for data.

          Here are the timings. Note that these are just via "time" on GNU/Linux or a wall clock on Windows (or something -- I didn't do the Windows tests).

          P4 dual Xeon 1.7GHz/gcc: 82 seconds
          P3 1000/msvc: 18 seconds
          Athlon C 600/msvc: 2 seconds
          P3 1000/msvc, using floats and sse: 2 seconds
          Alpha 667/gcc: 2 seconds
          Athlon XP 1900+: 0.88 seconds

          I guess the Athlon's clock was closer to the P4's clock than I recalled in my original post. Either way, the slowdown on the Pentiums can be easily seen.

          -Paul Komarek
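To put the timings above on a common scale, here is a small Python sketch that divides each time by a chosen baseline. The dictionary keys and the function name are just labels made up for this illustration; the numbers are the ones posted, and nothing is normalized for clock speed or compiler.

```python
# Wall-clock timings in seconds, copied from the post above.
timings_s = {
    "P4 dual Xeon 1.7GHz/gcc": 82.0,
    "P3 1000/msvc": 18.0,
    "Athlon C 600/msvc": 2.0,
    "P3 1000/msvc (floats, sse)": 2.0,
    "Alpha 667/gcc": 2.0,
    "Athlon XP 1900+": 0.88,
}


def slowdown_vs(baseline, timings):
    """Return each run time divided by the baseline run time."""
    base = timings[baseline]
    return {name: seconds / base for name, seconds in timings.items()}


if __name__ == "__main__":
    for name, ratio in slowdown_vs("Athlon XP 1900+", timings_s).items():
        print(f"{name}: {ratio:.1f}x")
```

Against the Athlon XP 1900+ the Xeon run comes out at roughly 93x, and against the 2-second runs at 41x, so the "80 times slower" figure depends on which machine is taken as the reference.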
          • by stewartjm ( 608296 ) on Friday September 13, 2002 @03:13AM (#4249837)
            The P4's x87 FPU and x86 ALU are just plain slow compared to P3s and Athlons. Though I am surprised your code is running 82x slower. I'd expect more like 2-8x slower for compute bound code. You can get a somewhat sensationalistic overview of why it's so slow at this link [emulators.com].
            If you want more in-depth numbers you can compare appendix C of the Intel Pentium 4 Optimization Manual [intel.com] with chapter 29 of Agner Fog's Pentium/II/III Optimization Manual [leto.net]. You can see the Athlon numbers in Appendix F of AMD's Athlon Optimization Manual [amd.com].
            If you want to do number crunching with Pentium 4s your best bet is to use the SSE2 instructions/registers. You should be able to get a noticeable speedup by using the Intel C++ compiler [intel.com] and telling it to use SSE2 instructions. If you want to eke out max performance you'll have to use assembly language. Though you can probably get most of the way there using the Intel C++ Compiler's SSE2 intrinsics.
            I'm curious as to why your code is so much slower on a P4 than on an Athlon. The best way to find out would be to look at the assembly code that gcc is producing. You can do that by using gcc's -S option. If you'd like send me the C code and the output from -S and I'll see if I see anything obvious.
            I'm somewhat paranoid about posting my email address. My paranoia seems to work, as I've received no more than the occasional spam in the last few years. My email address is my slashdot user name at woh.rr.com.
            • If you'd like my test code, email me (my address is in my user info). We've already compared gcc's asm on Alpha, Athlon, and P4, and found nothing particularly strange. The stall seems to come from memory fetches. It's possible that blocking our matrix could really improve cache behavior, but it would be fairly painful to implement in this case.

              -Paul Komarek
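For readers unfamiliar with "blocking" (also called tiling): the idea is to restructure loops so that they finish working on a small sub-block that fits in cache before moving on. The research code is not shown in this thread, so the sketch below uses plain matrix multiplication as a stand-in, and the block size of 32 is an arbitrary placeholder. Pure Python will not show the cache benefit; the point is only the loop structure.

```python
def matmul_naive(a, b):
    """Textbook triple loop: walks a column of b for every output element."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=32):
    """Same result, computed one block-by-block tile at a time."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):          # tile rows of a / c
        for kk in range(0, m, block):      # tile the shared dimension
            for jj in range(0, p, block):  # tile columns of b / c
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

In a compiled language the block size would be chosen so that the tiles being touched fit in the L1 or L2 data cache of the target CPU.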
            • Aren't SSE ops lower precision? I'm guessing that the original poster was more interested in performance for scientific computing of some kind, where precision matters.
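On the precision question: the original SSE instructions operate on single-precision (32-bit) floats, while SSE2 adds double-precision (64-bit) operations, so the concern applies to the "floats and sse" timing rather than to SSE2 as such. The sketch below emulates single-precision accumulation in Python by rounding through a 32-bit float after every step; the helper names are made up for this illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate with the running total rounded to 32 bits each step."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def sum_float64(values):
    """Accumulate in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    values = [0.1] * 100_000
    print("float32 error:", abs(sum_float32(values) - 10_000.0))
    print("float64 error:", abs(sum_float64(values) - 10_000.0))
```

Whether the single-precision error matters depends on the computation; long accumulations like this one are where it tends to show.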
          • Re:Comment non-sense (Score:3, Interesting)

            by MSG ( 12810 )
            it appears to be the floating point unit stalling for data.

            Well, if it's stalling for data, your problem is probably that the P4 has a *tiny* L1 data cache compared to... uh... anything. It's only 8K, compared to the Athlon's 64K. See the following URLs:
            http://www.tomshardware.com/cpu/02q2/020402/p4_2400-01.html
            http://www.geek.com/procspec/intel/northwood.htm
            http://www.geek.com/procspec/amd/k7select.htm

            It's probably also worth noting that Intel does NOT list the P4 as a "server processor". The P4 is listed as a desktop or workstation processor. Only P3, Xeon, and Itanium chips are recommended for server use:
            http://www.intel.com/products/browse/processor.htm?iid=Homepage+Find_Products_Processors&

            You might want to show that to management and reconsider your purchase of P4 equipment. Even a P3 is likely to perform better.
            • Even a P3 is likely to perform better.

              And by saying that, I don't mean to imply that I think the P3 is a good choice, (I like the Athlons :-) I just mean that if the P4 is performing like crap for your applications, then you shouldn't use that processor.
            • It's probably also worth noting that Intel does NOT list the P4 as a "server processor". The P4 is listed as a desktop or workstation processor.

              Quite honestly, I think workstations tend to be more floating-point intensive than servers. For example, how many floating-point calculations does 3D CAD software do vs. Sendmail or LDAP?

              So, new PC customers should be buying "servers" for any graphics, mathematics, or scientific work. This only increases my dislike of Intel's marketing tactics.

              Perhaps Intel should market the P4 as an administrative assistant's toy, and let the engineers and scientists go to Sun, SGI, HP, and IBM for real workstations?
              • You can't believe how much I agree with you. I'm very tired of having to wade through "server" literature just to find a good workstation. OTOH, if you think of the workstation as a "cycle server", well .... =-)

                -Paul Komarek
        • Any place I can look for some doc on that issue ?

          Darek Mihocka of emulators.com has written a whole bunch of stuff about the Pentium 4. He has examples of code that performs badly on Pentium 4, although I'm not sure how the most recent versions of the P4 would work on his code samples.

          http://www.emulators.com/pentium4.htm [emulators.com]

          steveha
      • May I ask why you are going to P4s instead of just getting more Alphas? You yourself said you are losing quite a bit of performance with the P4 compared with the Alpha, but you don't say why more Alphas aren't an option.
        • Price/performance on the Alphas is low for most of our applications, making the only Alpha selling point its 64-bitness for big memory. Many of our apps don't need that much space and can run on x86. The few apps that do need 64-bitness will be run on our existing Alphas. If we could get dual Alpha 1GHz machines for the same price as dual P4 Xeons, we would.

          There's also the issue that finding replacement sysadmins for the Alphas isn't as easy as it is for the x86 machines. Alphas aren't much different to admin, but it can be a bit of a speedbump.

          -Paul Komarek
    • "Let's hope this doesn't get moved again."

      There's a damn good reason I want this to come out soon. The sooner AMD comes out with Hammer the sooner Intel has some extremely serious competition. If Hammer can stand up to its hype the P4 won't look so hot, especially if Hammer ramps well in clock speed. Strong competition = lowering of prices. Also, Athlon XPs would then be pushed into the value market. So not only would Intel be forced to drop prices on their desktop and server CPUs, but AMD's old lineup would become an absolute steal. Sounds good for the average consumer, eh? Let's hope for no more delays.

      -Yoweigh
    • Why should we hope it gets released now instead of later? Do you have anything riding on it?

      Hard as that may be to believe, some people use their computers for real work. And some of those people run into that dreaded 4G limit--4G is not a lot of memory anymore these days. And many of these people would love to have the choice of a Hammer over Itanium.
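The "4G limit" mentioned here falls straight out of the pointer width. A quick Python sketch of the arithmetic follows; the helper names are invented for this example, and the 48-bit line reflects the virtual address width of the first x86-64 implementations, which is narrower than the full 64 bits.

```python
def address_space_bytes(bits):
    """Number of distinct byte addresses a pointer of this width can name."""
    return 2 ** bits


def human(n_bytes):
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = n_bytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


if __name__ == "__main__":
    for bits in (32, 48, 64):
        print(f"{bits}-bit addresses: {human(address_space_bytes(bits))}")
```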

  • Anybody here have stock in AMD? I've been long on the company for like two years now, but it never seems to finally launch the Hammer!!! I was hoping for a christmas release, but that's not gonna happen now...... my stocks will get beaten tomorrow!!! :(
    • Hold onto your AMD.

      Really, there are only two companies positioned to provide CPUs for PCs. Intel is showing more and more signs of losing their grip on bits of the market, and AMD only really needs to gain small bits of Intel's market share to be wildly successful.

      Besides, you should never sell during a recession. Now, if you want risky products, join me and buy some Abiomed (makers of the Abicor heart). Might not work out long term, but if they work out the kinks people will pay any price...

      • Where is Intel losing their grip? They increased their market share by 3% this year...
      • I totally agree with you..... I only think they shouldn't start behaving like Intel, delaying their products by years..... Hammer was initially slated to come out first half of 2002. :(

        What you're saying is true..... AMD only has to get double their current market share (about 40% of the market) to start raking in the millions...
        and it's wildly undervalued, in my opinion.

        I might look into this Abiomed company..... sounds interesting.... I might be a customer in the not-too-distant future! :)
    • Re:Invested in AMD. (Score:3, Interesting)

      by Perdo ( 151843 )
      They got beaten today.

      Down 7% on Intel's 2 cent per share dividend.

      They'll get beaten again tomorrow.

      They'll get beaten at Christmas.

      They'll get beaten until Sledgehammer is released.. not Claw hammer which will have no x86-64 desktop software support right off the bat, and will have to rely solely on its pure x86 performance.

      Microsoft shafted them on the X-box because Intel paid Microsoft 200 million to use the Pentium III. Nvidia was stuck with an unused AMD integrated chipset for X-box and Nforce was born.

      Intel will pay Microsoft to shaft them again. No x86-64 Windows XP for AMD despite AMD testifying on Microsoft's behalf in exchange for anti-trust testimony. AMD made an unenforceable agreement with Microsoft. To enforce it would perjure themselves.

      So Intel wins again.

      Until Sledgehammer arrives. Sledgehammer is a server/workstation chip and will have full support of the dominant server operating system, Linux. Microsoft must support Sledgehammer or risk losing more of their already weak server market share.

      Long after Microsoft has done the work to get Windows running on the Server, Microsoft will incorporate x86-64 support into their desktop OS.

      Probably about the same time as they support Hyper threading and SSE3 for Intel.
  • by Anonymous Coward
    who works at AMD, we were talking about this tuesday, that the Hammer chips will be released next year, and I told him I thought late this year. Well, looks like he was right.
  • by mfago ( 514801 )
    "C|Net reports that their next processor (Hammer) has been rescheduled..."

    So now C|Net is making processors too?! (Sorry, couldn't resist...)
    • Actually, that was my first thought as well. Of course, if they DID make processors, I can only imagine the reviews.

      "C|Net gives this CPU 10 out of 10!"

      (ok, be gentle, this is my first post, literally :P )
  • ...again?
  • Honesty? (Score:3, Interesting)

    by wray ( 59341 ) on Friday September 13, 2002 @12:39AM (#4249425)
    What is the reason for the delay? Can it really be that it's just a business decision (as they seem to say) rather than a technological problem? It seems that AMD _needs_ this jump in 64 bit computing, the sse2 registers, and boost in performance on Intel. So to me, if it is a business decision, it is a poor one.

    Everything I have seen shows that Intel is doing much better in performance and climbing. AMD claims there is no real technological reason, yet there must be. Anyone have insights? It seems that it would be prudent for AMD to issue better explanations -- how could it hurt to be honest? I want to see competition, if they are going to lag in performance, then they present no reason for people to buy. (A similarly performing Intel chip is close in price right now)

    • The only reason I can think of it being a business reason is to milk the Athlon design a little longer before it goes into the value market. They would only do this if they thought Intel wasn't going to make many speed bumps, or they have silicon that ramps better than it currently overclocks. Once you finish the design work for a chip, the more you produce the lower your costs are, since you don't have to do a major revision to the design for some time, new processor roll outs are a balance between lowering fixed costs and keeping up with the competition.
    • by PaxTech ( 103481 ) on Friday September 13, 2002 @01:04AM (#4249504) Homepage
      They're waiting so they can ship the new chip bundled with Duke Nukem Forever. ;)
  • Good (Score:3, Informative)

    by Billly Gates ( 198444 ) on Friday September 13, 2002 @12:44AM (#4249439) Journal
    A delay from palladium which will be included by default starting with the Hammer. It was probably delayed because longhorn aka drm-Windows was delayed and it's needed to actually use the cryptography in the cpu.
    • While it's true that Palladium hardware needs the OS to enter trusted mode, Longhorn is in no way needed to run Hammer or any other Palladium enabled hardware. Remember that Palladium is not involved in the boot process, and when it is enabled it runs parallel to the kernel. But we have already been over this, haven't we Billly...
      • If the user wants to use palladium to secure his/her documents then the OS has to support it. Palladium is not required for the boot process unlike TCPA but other complications or bugs could arise if palladium is enabled by default with a non-supported OS. Palladium's own [microsoft.com] faq states: "...we have defined the "Palladium" initiative as a "new set of features in a forthcoming version of Windows that, when combined with new hardware and software, enable . . ." What we refer to as "Palladium" incorporates a Microsoft operating system." So Longhorn is needed, or a special version of XP in other words. Go do your homework next time cheezedawg.



        I bet it would be easy to disable it by default in the bios to boot XP or Linux. Microsoft should have picked TCPA which is more open and already has XP drivers for it by IBM.

        • Re:Good (Score:5, Informative)

          by Billly Gates ( 198444 ) on Friday September 13, 2002 @02:52AM (#4249799) Journal
          Oops I forgot to include this from the faq. [microsoft.com]



          Q: Can Linux, FreeBSD or another open source OS run on "Palladium" hardware?

          A: Virtually anything that runs on a Windows-based machine today will still run on a "Palladium" machine (there are some esoteric exceptions[1]). If you currently have a machine that runs both Linux and Windows, you would be able to have that same functionality on a "Palladium" machine.

          The exceptions are here



          [1] These exceptions include the following:

          1.)Some debuggers may need to be updated to work in the "Palladium" environment, but they can still work.

          2.)Some special performance tools may need to be updated.

          3.)Software that writes directly to TCPA hardware will need to be updated.

          4.)Memory scrub routines (at the hardware level) will need attention.

          5.)Third-party crash dump software may need to be updated.

          6.)BIOS mode hibernation features will need to be updated to work with "Palladium."



          It's these 6 reasons why palladium is still beta and why AMD is probably waiting before releasing Hammer.

  • the other side... (Score:3, Informative)

    by tanveer1979 ( 530624 ) on Friday September 13, 2002 @12:49AM (#4249449) Homepage Journal
    Lot of posts are screaming "again, again"... but the fact is a 64 bit processor is one devil to design.
    The biggest problem with current processors is that to design such devices we *have* to use dynamic logic. Ask any VLSI design engineer.. that is no joke. In fact many multipliers and dividers have to be hand edited! So delays are expected, and it does not reflect upon the designers and companies in any way.

    Before you ask.. I do not work for AMD, I work in another VLSI company, that's why I say.. it's tough. Millions of gates, thousands to be hand edited, it's a bitch.. but as they say the fruits of labour are sweet... and for AMD hammer is going to be the sweetest

    • Lot of posts are screaming "again, again"... but the fact is a 64 bit processor is one devil to design. The biggest problem with current processors is that to design such devices we *have* to use dynamic logic. Ask any VLSI design engineer.. that is no joke. In fact many multipliers and dividers have to be hand edited!

      This is true for any processor, 64-bit, 32-bit, or otherwise. If you want the last 20%-30% of the performance, it will involve hand-optimization and an ungodly amount of work.

      Designing a 64-bit chip vs. a 32-bit chip, OTOH, is mainly just replicating elements, though you do need design tweaking for a few pieces that don't scale well.

      Re. dynamic logic, it really depends on the process you're using and what your design goals are. There are a lot of "gotchas" that you're undoubtedly already aware of that can degrade performance in a dynamic logic system, and other tradeoffs that are made when adding dynamic components to an otherwise-static system.

      As with other design choices, there's no _requirement_ to do this. You just get a performance boost for certain specialized types of structure, which can justify the headaches if that kind of structure is on your critical path.

      For a 64-bit processor that doesn't use dynamic logic (last I checked), just look at the MIPS line.
  • I'm betting they're adding Palladium. It seems likely, since these days you must make sacrifices to gain things. XP Service Pack 1 will fix a few security holes, but at the cost of your privacy. Hammer will be 64-bit and more powerful than anything you've got now, but will probably be Palladium-enabled. Or maybe I'm being a pessimist and they're not adding Palladium. Lets hope not :|
    • Um, AMD announced support for Palladium long before Intel did. I'm not sure if it will ship with the first generation Hammer, but it will ship eventually.
    • If by "more powerful than anything you've got now" you mean "more powerful than any other AMD processor", you might be right.

      On the other hand, if not, there are a lot of processors out there now that would leave AMD and Intel's new offerings in the dust.
  • struct conspiracy theory {
    real : MS;
    int : palladium;
    int * : hammer;

    hmmm is it for integrating palladuim support!
    };//end of struct

    Sorry... couldnt help it ;-)
  • The hammer is a critical product for AMD that they would never delay unless there were *major* problems with it.

    1) AMD is currently losing huge amounts of money. The hammer would have allowed them to sell at the high-performance end of the market again where the sales prices are higher and might have helped them reduce the flow of red ink.

    2) The delay will badly hurt AMD partners such as motherboard and chipset vendors who have developed supporting products for hammer.

    3) The hammer had a potential performance lead over Intel that will be greatly eroded by the time it finally appears.

    4) Critical software development for hammer will be slowed which will slow eventual market acceptance of hammer.

    5) The delay will build momentum for Itanium.

    6) The delay will greatly reduce the pressure on Microsoft to support hammer and will give Microsoft the opportunity to also build momentum for Itanium. Depending on market conditions when the hammer finally appears, it is now even possible that Microsoft will never need to support hammer.

    7) This delay is so serious that it creates real doubts that hammer will *ever* be a viable product.

  • NooooooooooooOOOOOOooooOO!!!!

    The poster dies in a fit of agony...

  • by deft ( 253558 ) on Friday September 13, 2002 @01:34AM (#4249590) Homepage
    but it turns out you can't touch this.
  • by Anonymous Coward
    I live in Austin, and have friends who work at AMD. AMD may make a great processor, but their motherboards suck because the motherboard testing department's manager tries very hard not to find any bugs. (Test stuff that you know will work. Never install an OS, just use a ghost image of preinstalled windows XP copied through the network onto the hard drive. Testing with linux is a no-no, because you actually find reproducible bugs in the hardware! We can't have that, we're a testing department...)

    At least one woman was fired for making a Linux test CD and distributing it internally around the company, against that manager's wishes. Her name's on the test CD, and it was still being used inside AMD last week, but she answered too many Linux questions for people outside her department and as such was labeled "not a team player" in the internal politics. As far as I can tell, that was the most knowledgeable linux person they had anywhere near that area.

    AMD makes great processors, but until they get a new motherboard testing department, they'll have nothing to put them in.
  • by Erpo ( 237853 ) on Friday September 13, 2002 @03:00AM (#4249813)
    Everyone always makes the same really annoying mistake when it comes to athlon fsbs. Athlon front side busses do not run at 200MHz and 266MHz. They offer bandwidth equivalent to 200MHz and 266MHz by using both sides of the clock (DDR) on 100MHz and 133MHz fsbs. All new athlons use 133MHz DDR fsbs. The hammers will support 166MHz DDR memory busses, offering performance equivalent to 333MHz SDR memory.

    However, the notion of "fsb" is a little blurred with the hammer. Hammers will be directly connected to dimm banks and have integrated memory controllers, so the speed of the fsb will no longer be a determining factor in memory bandwidth. (* see mp note below) The traditional fsb to the traditional northbridge will be replaced by a "high speed" hypertransport link to a chip that connects to the agp slot, and has another (slower) hypertransport link to what could be called the south bridge. This "south bridge" will then connect the pci bus, serial ports, hard drives, usb ports, and any other devices that need to talk to the processor or main memory.

    *What does this mean for MP systems? Well, that's actually the really cool part. By moving the memory controller onto the processor and providing communication between processors over a hypertransport link (3.2GB/sec for dual, 6.4GB/sec for quad and above), memory bandwidth actually increases as more cpus are added! This is in contrast to a normal MP system where as more cpus are added, there is increased competition for a fixed resource (main memory) which is already the bottleneck in many single processor applications.

    That's my rant on terminology. Here's the question:

    I'm no kernel hacker, and I certainly don't know anything about writing schedulers, but it seems like this would require a change in how processes are handled in hammer mp systems. In traditional mp systems, every processor has equal access to main memory. If a process gets moved from one cpu to another, there's initial overhead to do the moving, but after that it can still get to its areas in memory without any problems. On a hammer mp system, migrating a process from one cpu to another would mean that in order to access its memory it would have to reach out of its cpu's hypertransport link, into another cpu's memory controller (which may or may not be busy) and into the attached ram. Considering there would not be enough bandwidth available on the 3.2GB/sec hypertransport bus (in the case of a dp system) for both processors to reach into each other's 166MHz DDR memory at the same time without suffering a performance hit, it seems like there would definitely be an advantage to keeping processes close to their data.

    What changes would this require to scheduling and process management code, if any? Has this already been addressed, or are there people working on it in the linux kernel?
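
    The bandwidth argument above can be checked with some back-of-the-envelope arithmetic. This is only a rough sketch: the 64-bit (8-byte) memory channel width and the idea that the 3.2GB/sec link figure is split evenly between the two processors are assumptions, not numbers from the post, and the function name is made up for illustration. A 128-bit memory interface would double the local figure.

```python
# Back-of-the-envelope peak bandwidth figures. Channel width and the way the
# link is shared are assumptions; only 166 MHz DDR and 3.2 GB/s come from the post.

def ddr_peak_gb_per_s(clock_mhz, bus_bytes=8, transfers_per_clock=2):
    """Peak bandwidth of one DDR channel in GB/s (1 GB = 10**9 bytes)."""
    return clock_mhz * 1e6 * transfers_per_clock * bus_bytes / 1e9

HT_LINK_GB_PER_S = 3.2  # figure quoted in the post for a dual-processor system

# 166 MHz DDR on an assumed 64-bit channel.
local = ddr_peak_gb_per_s(166)

# Assumption: when both CPUs read from each other's memory at the same time,
# each one gets at most half of the quoted link bandwidth.
remote_share = HT_LINK_GB_PER_S / 2

print(f"local DDR peak:        {local:.2f} GB/s")
print(f"per-CPU remote share:  {remote_share:.2f} GB/s")
if remote_share < local:
    print("remote access is the tighter limit")
else:
    print("local memory is the tighter limit")
```

    Under these assumptions the remote path offers less peak bandwidth than local memory, which is the poster's point about keeping processes close to their data.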
    • IANAKH, too, but I occasionally lurk on linux-kernel. As far as I have understood, there are things to be done before it even makes sense to think about scheduling and process mgmt. policies, because there's simply no information available for the scheduler etc. to know about something like "local" or "remote" memory.

      Simple Topology API [google.com] is one thread about this stuff.
    • by Erik Hensema ( 12898 ) on Friday September 13, 2002 @06:50AM (#4250255) Homepage

      Essentially this would be a NUMA system (non-uniform memory architecture). As far as I know Linux 2.6 will have support for these systems.

      In a real NUMA machine there would be a hierarchy of clusters of processors. Each cluster functions a bit like a traditional SMP system, but the clusters are interconnected over "low"-bandwidth busses. This makes memory accesses across clusters slower than direct accesses into the clusters' memory.

      Both the VM and the scheduler will have to know about this.

      Another point with NUMA systems is the possibility of gaps in the main memory (discontiguous memory). Kernel hackers are currently working on support for that (discontigmem patch, merged in 2.5.34).

    • Actually the part of the kernel which would be affected most by this kind of architecture is the memory management code. While allocating pages to processes you probably want to make sure most of a process' pages "belong" to the same CPU. If they don't, nothing you do to the scheduler will gain you anything. (See below why this is stupid.)

      This isn't a particularly new requirement. You have to be careful about selecting pages for processes today even on single CPU systems to avoid cache thrashing. Because of the way first or second-level CPU caches map to physical memory, certain memory-access patterns lead to constant reloading of the cache, making it pretty ineffective, or even worse than if it weren't there in the first place. By carefully mapping physical pages to virtual memory the OS can avoid this problem. Solaris does this, I don't know about Linux. Probably.
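
      The page-selection trick described above is usually called page colouring. Here is a toy model of the idea, not how Solaris or any real kernel implements it: the cache size, associativity, page size, and the names `colour` and `pick_page` are all assumptions chosen for illustration.

```python
# Toy model of page colouring for a physically indexed cache.
# All sizes are illustrative assumptions, not figures from the thread.

PAGE_SIZE = 4096          # bytes per page
CACHE_SIZE = 256 * 1024   # bytes of cache
WAYS = 1                  # direct-mapped: the worst case for conflicts

# Physical pages with the same colour compete for the same cache sets.
NUM_COLOURS = CACHE_SIZE // (WAYS * PAGE_SIZE)

def colour(phys_page_number):
    """Colour of a physical page: which slice of the cache it maps onto."""
    return phys_page_number % NUM_COLOURS

def pick_page(free_pages, virt_page_number):
    """Prefer a free physical page whose colour matches the virtual page's
    colour, so consecutive virtual pages spread evenly over the cache."""
    wanted = virt_page_number % NUM_COLOURS
    for page in free_pages:
        if colour(page) == wanted:
            free_pages.remove(page)
            return page
    return free_pages.pop(0)  # no matching colour left: take any free page

# A naive allocator could hand out pages 0, 64 and 128 for three consecutive
# virtual pages; all three share colour 0 and would evict each other.
free = [0, 64, 128, 1, 65, 2]
mapping = {v: pick_page(free, v) for v in range(3)}
print(mapping)
print({v: colour(p) for v, p in mapping.items()})
```

      The fallback branch is where the trouble starts: once the preferred colour runs out, the allocator has to hand out a conflicting page anyway.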

      So, this is one new requirement for the memory management code. No problem, we just make sure all process pages belong to one particular CPU and schedule this process to this CPU only. Everything is fast and nice. Intel is doomed. Or is it? Not so fast, all this is probably a bad idea:

      We can't make sure pages on the right CPU are even available. What if they are not? Give out wrong pages? This would lead to running times which are not reproducible. This is really bad. It gets worse. What if the right CPU is not available because it's running some other process?

      Probably it's best to allocate evenly distributed pages (some fast, some not so fast) to processes and not schedule them specially in any way.

      Easy ;)
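
      The two placement policies argued over in this comment can be compared with a small simulation. This is a sketch under made-up assumptions: a two-node machine, hypothetical free-page counts, and hypothetical function names. It only shows where pages end up, not how fast anything runs.

```python
# Toy comparison of two page-placement policies on a multi-node machine.
# Node ids, page counts and function names are hypothetical.

def allocate_local_first(free, home, n):
    """Take pages from the home node while it has any, then spill over to
    whichever node currently has the most free pages."""
    placement = []
    for _ in range(n):
        node = home if free[home] > 0 else max(free, key=free.get)
        if free[node] == 0:
            raise MemoryError("no free pages on any node")
        free[node] -= 1
        placement.append(node)
    return placement

def allocate_interleaved(free, n):
    """Spread pages round-robin over the nodes, skipping exhausted ones."""
    placement = []
    nodes = sorted(free)
    i = 0
    while len(placement) < n:
        if all(free[k] == 0 for k in nodes):
            raise MemoryError("no free pages on any node")
        node = nodes[i % len(nodes)]
        i += 1
        if free[node] > 0:
            free[node] -= 1
            placement.append(node)
    return placement

def local_fraction(placement, home):
    """Share of the allocated pages that live on the home node."""
    return placement.count(home) / len(placement)

# The home node (0) is nearly full, so local-first has to spill to node 1.
spilled = allocate_local_first({0: 2, 1: 5}, home=0, n=4)
mixed = allocate_interleaved({0: 3, 1: 3}, n=4)
print(spilled, local_fraction(spilled, 0))
print(mixed, local_fraction(mixed, 0))
```

      With local-first, the fraction of fast pages depends on how full the home node happens to be, which is the reproducibility worry raised above. Interleaving gives the same mix every time at the cost of never being fully local.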

      • Running times have never been particularly reproducible, and they're only going to get less so. Pretty soon, we'll have processors whose clock speed is dependent continuously on temperature; they run as fast as they can do accurately without damaging the chip. Of course, that means that your computer is faster during the winter.

        As far as existing chips go, hyperthreading also messes a lot with running times, because when you get processor time depends on when another process has cache misses.
    • By moving the memory controller onto the processor and providing communication between processors over a hypertransport link (3.2GB/sec for dual, 6.4GB/sec for quad and above), memory bandwidth actually increases as more cpus are added! This is in contrast to a normal MP system where as more cpus are added, there is increased competition for a fixed resource (main memory) which is already the bottleneck in many single processor applications.

      This is true only if the processors are running tasks with unrelated working sets (and if the data for each task is in that processor's memory).

      If you have tasks that require memory managed by another processor, you have to go through the hypertransport link and the other processor's memory controller to get it. This will be _slow_. HT is decent, but nowhere near as good as a direct connection to memory, and there _will_ be delays due to arbitration on the second chip and the various buffering stages the data transfer has to go through.

      So, for multiple processors working on a shared workload, you're screwed.

      The only way to ameliorate this is to have very smart OS-level memory management that can duplicate shared-but-not-modified pages across multiple memory banks, and both OS and processor support for update-based coherence between the banks. The hardware support for this is a bit tricky, and the OS support will be a nightmare if the OS wasn't NUMA-friendly to begin with.

      And under some cases - like tasks on multiple processors competing for access to a lock or all heavily modifying the same data page - you're screwed no matter what you do.

      So, don't rejoice yet. We'll only know for sure how well this will work when we have Hammer systems on our desks.
      • If you have tasks that require memory managed by another processor, you have to go through the hypertransport link and the other processor's memory controller to get it. This will be _slow_. HT is decent, but nowhere near as good as a direct connection to memory, and there _will_ be delays due to arbitration on the second chip and the various buffering stages the data transfer has to go through.

        So, for multiple processors working on a shared workload, you're screwed.


        From the Hammer presentations I've seen, this is not true at all. The HT link between CPUs is 6.4GB/s, which is actually faster than the direct-attached memory (~5.3GB/s). Since the HT controllers are running at >2GHz, they introduce minimal latency.

        And under some cases - like tasks on multiple processors competing for access to a lock or all heavily modifying the same data page - you're screwed no matter what you do.

        I don't think this is true either. Contention for a cache line will simply bounce the line between caches, which is much faster on Hammer than on a 400MHz shared-bus SMP.
  • by Heretic2 ( 117767 ) on Friday September 13, 2002 @03:25AM (#4249865)
    You ever notice how all the Hammers are clock speed locked at 800MHz? Yea, there's a reason for that. They're having problems cranking the clock speed up. For 800MHz they're fast as hell, beating P4 with twice the frequency, but they're not gonna release them until they clock faster than current Athlons so they're trying different types of transistors and what not.

    How the hell do I know that??? Look where I live, take a guess...The birds outside my window know things.
  • that's much bigger news than some delay - even bigger news is that it is not the first processor they produced :) "C|Net reports that their next processor (Hammer)". How low can you go :)
  • Seems smart to me... (Score:3, Interesting)

    by DeathPenguin ( 449875 ) on Friday September 13, 2002 @04:51AM (#4250105)
    Why should they rush the Hammer when the Itanium is failing as is? They know they can't push people to use their 64-bit capabilities, just like people didn't switch to Alphas. Squeeze every ounce of strength from the Athlon as they possibly can for now. Let Intel push the IA64 standard on everyone first to create a demand to migrate from 32-bit to 64-bit. That's where AMD plans to make their killing.

    I would imagine it would be better to release Hammer ASAP and create the 64-bit market themselves. Then again, I don't know the logistics required for such a launch, nor do I know exactly how much better, if any better, x86-64 would perform. Let's face it, not many people care about 64-bit versus 32-bit, they only know what the dork at CompUSA tells them. And if Hammers can't outscore P4's in the 32-bit apps that very short-sighted people care about, then there is really no place for Hammer in the consumer market.

    What I've heard, mostly from internet gossip, is that AMD is having problems making Hammer scale high enough to beat the P4 in 32-bit apps, although it only requires roughly 1 Hammer MHz to beat 3 P4 MHz. I've also heard that AMD is having problems making Hammers run above 800MHz. With the expected debut of the P4 at clock speeds above 3GHz, the Hammer doesn't stand much of a chance in 32-bit apps.

    In short, don't expect to see Hammers until Intel manages to salvage the Itanic.
  • I can wait another few months. That'll give me the extra time I needed to figure out how to get linux running on my 64-bit Atari Jaguar.

    Anything with 64-bit doom is good enough for me :)

  • Folks,

    While AMD works out the bugs of their Hammer line of CPUs, don't forget that AMD still has a card to play in terms of CPU competition with Intel: the Barton-core Athlon CPU due later this fall.

    Unlike the Athlon CPU core designs since the original Thunderbird-core Athlons, the Barton-core Athlon sports a larger 512 KB L2 cache on the CPU die, which will offer dramatic performance increases, especially with memory-intensive programs. Remember, the current Thoroughbred-core Athlon CPU rated at 2600+ already has reached parity with the Intel Pentium 4 2.53 GHz part, and that's with only 256 KB of L2 cache on the CPU die and using DDR266 DDR-SDRAM! What will the Barton-core Athlon do?
  • If this means Barton sooner rather than later, I'm happy... although from what I've read Barton (166 MHz FSB, 512k cache) is still slated for Q1'03. Sigh.

    Why? Because I'd like to get a Barton CPU for my next computer. I'm already in the waiting game for the NV30 and (to a much lesser extent) Serial ATA, so putting a better CPU on the list isn't a big deal.

    Why not Hammer? Because I know better than to buy a first generation CPU with first generation motherboards. Barton is just a mild revision to a 4 year old CPU core, and the motherboards are now hitting their 6th generation (KT133, KT133A, KT266, KT266A, KT333, KT400).

    For those who need the speed, power, and addressability of a 64-bit chip this announcement sucks, but for those just looking for a faster current generation chip it's not entirely bad.
  • Two weeks ago, there was a thread on the AMD 2700+. Several slashdotters were suggesting we hold off on purchasing an AMD processor until the K8 was released. I suggested [slashdot.org] that if we weren't careful, AMD might suffer in the same way Osborne Computer's [digitalcentury.com] sales slumped when they announced the Osborne 2.

    If too many people hold off purchasing an AMD now, because they want to wait for the newest, whiz-bang thing, then the possibility exists that AMD will not be able to finance the development of the K8 on time, or even that AMD will go bust.
