HyperTransport 3.0 Ratified

Hack Jandy writes "The HyperTransport consortium just released the 3.0 specification of HyperTransport. The new specification allows for external HyperTransport interconnects, basically meaning you might plug your next-generation Opteron into the equivalent of a USB port at the back of your computer. Among other things, the new specification also includes hot swap, on-the-fly reconfigurable HT links, and a hefty increase in bandwidth."
This discussion has been archived. No new comments can be posted.

  • by DaHat ( 247651 ) on Monday April 24, 2006 @05:21PM (#15192935)
    I can only imagine what that could do to us cheap bastards who have small clusters of older PC's sitting in a second bedroom or closet.

    "Hum... I can't quite afford a whole new system or even a motherboard and two new procs... I'll just add a new one to the back of an existing one"

    At last! The day of easily upgrading to a multi-proc system may soon be at hand! (assuming they also have some sort of... external hub device).
    • by merreborn ( 853723 ) * on Monday April 24, 2006 @05:46PM (#15193098) Journal
      "Hum... I can't quite afford a whole new system or even a motherboard and two new procs... I'll just add a new one to the back of an existing one" ...Except you'd need a hypertransport 3.0 motherboard to begin with, and enough appropriately clocked RAM to make use of the processor. The whole "External CPU" idea was just speculation anyway; it's not mentioned anywhere in the article.

      Point being, you'll never be able to plug a new opteron into _anything_ that's sitting in your closet right now.
  • by Anonymous Coward
    Maybe they should integrate the RAM in to the CPU or something.
    • Good point... but do you really want to dedicate a large chunk of ram to a specific processor in such a manor?

      Sure, with it there would be a possibility of cache coherency issues, while without it there would be a performance hit whenever something hit the bus...

      I guess it'd depend on the cost of ram when building such a device... I'm guessing that a whopping 64-128 meg cache ought to be enough for some time.
      • Not to be pedantic, but while I might not want to dedicate a large chunk of ram to a specific processor in such a manor, I might want to live in that manor, and maybe have my serfs carry out the computations for me.
      • In a design class I took, our professor talked about something called "processor-in-RAM". The idea is that you'd have a few processors all with their dedicated RAM. The program you are running would be copied in each processors's RAM. When a branch was ready to be taken, half the processors would go one way and the other half the other. The processors that guessed right would let the other processors know they were wrong and update them with the new information. This way there is no penalty hit as all
        • by smallfries ( 601545 ) on Monday April 24, 2006 @06:32PM (#15193336) Homepage
          You're mixing up a few pieces of technology here. Processors with their own dedicated memory has been invented many times by different people. Modern loosely coupled clusters fit this bill, but further back there was the transputer systems in which each processor had memory on board. Systems like this are more difficult to program than single image systems (even with a CSP derivative as the language) but they produce higher performance.

          The other thing that you are describing is multiway branch prediction. A processor like the Pentium guesses which way a branch goes and despatches instructions down that path into the pipeline. When it is wrong there is a hit as the pipeline stalls and all of those cycles are lost. In multiway branching both outcomes of the branch are despatched to the pipeline. The cost is that half the instructions being executed will be thrown away; if you go 2 branches deep then it is 75%. The advantage is that latency is minimised as the pipeline is always full.
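The 50%/75% figures above follow directly from the path count; a toy calculation (an editorial sketch, not the poster's) makes the pattern explicit:

```python
# Toy model of multiway branch dispatch: with both sides of every branch
# issued to the pipeline, 2**depth paths are in flight but only one is
# the true execution path, so the rest of the work is discarded.
def wasted_fraction(depth: int) -> float:
    """Fraction of dispatched instructions thrown away at a given depth."""
    paths = 2 ** depth
    return 1.0 - 1.0 / paths

print(wasted_fraction(1))  # 0.5  -> half the instructions are wasted
print(wasted_fraction(2))  # 0.75 -> the 75% figure for two branches deep
```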

          The last thing is processor-in-RAM, or smart memory. In this system a miniature processor is embedded on the DRAM die. The small processor is capable of computing striding patterns in arrays. As the program executes on the main processor, the smaller processor predicts which memory locations are going to be accessed and pre-sends the data to the host processor, reducing latency.
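A hypothetical sketch of the stride detection such a smart-memory controller might perform (the function and the two-stride threshold are illustrative, not from any real design):

```python
# Watch the stream of addresses the host CPU touches; once two
# consecutive strides agree, speculatively prefetch the next location.
def predict_next(addresses):
    """Return a predicted next address, or None if no stable stride."""
    if len(addresses) < 3:
        return None
    strides = [b - a for a, b in zip(addresses, addresses[1:])]
    if strides[-1] == strides[-2]:          # stable stride detected
        return addresses[-1] + strides[-1]  # prefetch candidate
    return None

print(hex(predict_next([0x1000, 0x1040, 0x1080])))  # 0x10c0: stride of 0x40
print(predict_next([0x1000, 0x1040, 0x2000]))       # None: stride changed
```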

          Good luck on your class. Architecture is one of the more interesting courses in a CS degree.

    • Take a variant of the older Slot 1 processor design. Have the processor on one side, and 4 memory slots on the other. It makes the traces between memory and processor almost negligible. Without the traces and slots for memory on the motherboard, they should get smaller and hopefully cost less.
    • I've often thought that. They already have support for NUMA in OSes like Windows, Linux, etc, and DRAM cells take 1/6th as many transistors as SRAM cells (used in cache). You could still have RAM external to the CPU, it would just be recognized as being non-local.
    • Maybe they should integrate the RAM in to the CPU or something.

      The problem with integrating DRAM is that capacitance is very sensitive to heat; cells won't be able to hold a charge (and will be useless functionally) if temperatures get too high.
    • Instead of that, how about having some REALLY fast RAM right next to the CPU? Take a look at a modern vid-card. Hi-end models have 256-512MB of uber-fast DDR3-RAM on a 256bit bus. And the GPUs are usually bigger than CPUs are. And still, they can sell the entire package (GPU, card and RAM) for about $500. What if we did something similar with CPUs? Instead of selling CPUs as chips, sell them as modules (like SGI and Sun do). Attached to that module would be the CPU, and attached to the CPU would be 256-512M
      • The RAM attached to video cards is not only fast, but has terrible latency. How do you think they clock it at 1GHz+? It wouldn't do at all for the main memory in an AMD64 machine.
        • IIRC, individual memory-chips (like in vid-cards) can be clocked higher than DIMM-modules can. And besides, we currently have 800Mhz (effective) RAM. How about leaving it at 800Mhz, but doubling the bus? latency would be reasonable, but bandwidth would be twice as big.
          • They can be clocked higher, but the timings must be relaxed for them to operate at that speed. It's a limitation of the DRAM core, not the interface. The bus could be widened, but that doesn't help latency. The big problem with modern machines is actually not bandwidth, but latency. The AMD64, for example, gains almost nothing with DDR2, even with an almost doubling of bandwidth, because latency is not reduced.
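A back-of-envelope model of the latency-vs-bandwidth point above (the numbers are illustrative round figures, not measurements): a single cache-line fetch is dominated by latency, so doubling bandwidth barely moves the total.

```python
# Time to fetch one block: fixed latency plus transfer time.
# 1 GB/s moves 1 byte per nanosecond, which keeps the units simple.
def fetch_ns(latency_ns: float, bandwidth_gbps: float, size_bytes: int) -> float:
    return latency_ns + size_bytes / bandwidth_gbps

LINE = 64  # bytes in a typical cache line
print(fetch_ns(60.0, 6.4, LINE))   # ~70 ns
print(fetch_ns(60.0, 12.8, LINE))  # ~65 ns: 2x the bandwidth, ~7% faster
```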
      • That's exactly what a cache is. It's very high speed memory, often SRAM, attached on a very wide bus. Instead of letting the programmer or the OS decide which parts of software to put in the high-speed ram, and what to leave in the low-speed ram, the cache controller does, essentially letting all the data have a place in the high-speed ram, but occasionally replacing it.
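That replacement behaviour can be sketched in a few lines; a plain LRU policy stands in here for whatever a real cache controller actually implements:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny model: fast memory holds `capacity` lines; the controller,
    not the programmer, decides which lines stay resident."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, oldest first

    def access(self, addr, slow_memory):
        if addr in self.lines:                  # hit: served from fast memory
            self.lines.move_to_end(addr)
            return self.lines[addr]
        data = slow_memory[addr]                # miss: fetch from slow memory
        self.lines[addr] = data
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)      # evict least recently used
        return data

ram = {a: a * 2 for a in range(8)}
cache = LRUCache(capacity=2)
for addr in (0, 1, 0, 2):
    cache.access(addr, ram)
print(list(cache.lines))  # [0, 2]: address 1 was evicted, 0 was kept warm
```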

        What you describe doesn't really solve any real problems. Graphics cards bennefit from fancy memories like gddr3 because they are bandwidth
      • You do raise an interesting point about the memory expansion. Opterons are limited in the amount of local memory they can address. Each opteron includes 2 memory controllers. DDR memory controllers can't drive more than 3 modules per bus, limiting an opteron to 6 dimms per cpu. Currently that puts a ceiling of 12GB/cpu of memory until 4GB dimms become available in quantity. For most people, this is not a real hindrance; 24GB for a dual-proc sled is plenty enough. There are some cases, however, when your re
  • External FPGA units? (Score:3, Interesting)

    by SaDan ( 81097 ) on Monday April 24, 2006 @05:24PM (#15192955) Homepage
    Hrm... Need a temporary boost in your folding at home project? Plug in an FPGA module!

    This can only be a good thing.
    • Problem is, the current crop of FPGA chips aren't fast enough to replace a 'real' CPU.

      I'm a great fan of FPGAs and they are cool, but I also know what their place is, and replacing comparably cheap CPUs (on a relative cost/performance curve) isn't it.
  • A port? (Score:2, Funny)

    by Anonymous Coward
    "you might plug your next generation Opteron into the equivalent of a USB port at the back of your computer"

    Is this a serial connection?
    Or will you need a foot wide port with 700 or so contacts on it?

    I know serial connections are very fast nowadays, but I don't know if you can get the entire memory bandwidth of a cpu without spreading the bandwidth in parallel connections.
    • Re:A port? (Score:2, Interesting)

      by Loconut1389 ( 455297 )
      Check out the SGI/CrayLink setup used for ccNUMA - the port is around 2.5 inches, but has quite a lot of pins (maybe 100?). I don't think foot-wide is really necessary.

      IMHO, fiber optics- though delicate, could offer higher bandwidth. I'd rather have my whole fiber go dark from a break and know it than have one strand of many go out and not know it and have all kinds of wacky/intermittent behavior.

      I still struggle to understand why fiber optics are so expensive- the lasers used are fairly cheap and the cab
      • Re:A port? (Score:5, Informative)

        by Anonymous Coward on Monday April 24, 2006 @06:31PM (#15193325)
        The reason fiber optic (particularly glass core) is so expensive is due to the difficult and sensitive process required to manufacture that cable, though the materials used are extremely inexpensive. The diameter of the glass core must be matched exactly to the wavelength of light to travel over that fiber. In addition the composition and purity of the glass must meet certain standards to prevent reflection, signal attenuation, or signal skew, all of which would result in inconsistent or degraded performance. As far as the lasers being cheap, yes a laser can be cheap, but again the same demanding requirements apply to both versions of laser used in data communications, which again increases the manufacturing cost.
      • If you designed your untrusted channels with some type of sliding window go-back-and-retry protocol, you'd know if you broke a wire or it's otherwise noisy.
  • So, you take the external interconnects, a large SMP box, and a transfer rate unachievable by anything except channel-bonded Myri/Infiniband/Quadrics, and you've suddenly commoditized (is that a word?) the Origin 2K architecture. Unfortunately, there will be that inevitable gap between "announced" and "benchmarkable", but this should lead to interesting system design.

    Computing might just become fun again. Small systems passing information around to form a display wall, or big systems chained together t
    • Re:Nice... (Score:3, Interesting)

      by questionlp ( 58365 )
      Although HT 3.0 will be a very good step toward bringing the Opteron closer to the Origin architecture, the Opteron still lacks good implementations of the cache coherency and other caching features of NUMAlink used in the Origin servers/clusters. The Horus chipset helps in some ways, but doesn't help scaling beyond 8P in a glueless fashion.

      Just my $0.01
      • Re:Nice... (Score:2, Troll)

        by Gothmolly ( 148874 )
        You're right. So because it doesn't do $SPECIFIC_BUZZWORD, we should shitcan the entire thing. Very +1, Insightful.
      • Re:Nice... (Score:3, Insightful)

        Are you suggesting AMD buy SGI?
      • the Opteron still lacks or does not have good implementations of the cache coherency and other caching features of NUMAlink used in the Origin servers/clusters.

        I have an SGI running Linux that has NUMAlink with cache coherency with stock Itanium CPUs and of course NUMAlinks. Is this something that cannot be extended from what SGI has done to use the cache coherency over HTX?

        I don't know, but hopefully someone does.

    • commoditized (is that a word?)

      Whatever. If you're in IT and you don't invent two words a year you're coasting. Try 'elaborisha': (obviously excessive complexity for the sake of questionable or obsolete tangibles.) Zero hits on Google. Verb it and you have elaborize. :)

      Computing might just become fun again.

      It's fun now. Over at Supermicro you have four socket motherboards designed for 1U hosts. Intel is planning 4 core CPUs (MP, blah blah) by Q1 '07. 16 cores in 1U. Meanwhile Sun has an 8 core CPU sh
  • Hmmmm. (Score:5, Insightful)

    by ultramk ( 470198 ) <ultramk@noSPAm.pacbell.net> on Monday April 24, 2006 @05:30PM (#15192998)
    I can see an interesting situation where you could have a traditional CPU, to which you could plug in additional external processor modules as your needs expand. (assuming the OS could handle sharing out multithreaded apps over a variety of different multi-CPU configurations.)

    Dave has a processor intensive project this week? He gets the big stack plugged into his machine until someone else in the office needs it.

    Server getting bogged down? Add another couple modules to the system.

    I like the idea.

    m-
    • Re:Hmmmm. (Score:3, Interesting)

      by DaHat ( 247651 )
      I was thinking something similar... there is one issue that no one here has thrown out yet. Heat.

      Let's say your company has a 4-way hub that can be plugged into the system of choice... imagine the cooling such a thing would require in order to keep from burning up in its enclosed plastic or (more likely) metal box.

      Not to mention the noise... oh good god the noise. My dual core 3800+ at home is quite loud... I can only imagine what a few of those bad boys sitting on your desk would sound like under full load.
      • I'm not exactly sure what you're saying, but if you're implying that having a loud 4-way would be hot, then I'd have to agree with you, though I think I'd prefer bad girls rather than boys. The only thing I don't understand is what this has to do with a manor.
      • My dual core 3800+ at home is quite loud...

        Really? I've got a 4200 with the stock cooler and it's whisper quiet. I had a shuttle box before and I was afraid the switch would be unpleasant. (I've had cpu coolers before that sounded like jets taking off. Not good...) But the cooler that came with the cpu was just fine, not much louder than the shuttle, and I run it with an open case.
      • Re:Hmmmm. (Score:4, Insightful)

        by masklinn ( 823351 ) <.slashdot.org. .at. .masklinn.net.> on Monday April 24, 2006 @06:09PM (#15193207)

        My dual core 3800+ at home is quite loud...

        No it isn't you dummy, your cooling system is, now just get a knowledgeable friend to slap a Thermalright HR-01 and a Nexus 120mm fan (undervolted to 9V) on it and it'll be whisper-quiet.

      • See, I was thinking more along the lines of sealed, liquid-cooled units.

        One of the things that people forget is that one of the biggest reasons that most PC cases are loud is that they have to be upgradable, and re-configurable. By default, the fans are full speed, and the air flow isn't designed for quiet operation.

        If something's a sealed module that will never need to be opened, the thermal profile is a known quantity that can be engineered around. Look at Apple's G5 series: whisper quiet, unless your amb
        • One of the things that people forget is that one of the biggest reasons that most PC cases are loud is that they have to be upgradable, and re-configurable. By default, the fans are full speed, and the air flow isn't designed for quiet operation.

          No, the one thing people forget about is that they didn't care about noise in the first place, not until they started going deaf, and they didn't want to put $50 into their CPU cooling solution and stuck to the crappy bulldozer-engine-like 60mm fans because it was

      • Let's say your company has a 4-way hub that can be plugged into the system of choice... imagine the cooling such a thing would require in order to keep from burning up in its enclosed plastic or (more likely) metal box.

        The new socket AM2 dual core Athlon X2 3800+ will be available in both "normal" 89W versions and ALSO 65W and 35W (!!) versions. The 89W number is already lower than what the Athlon (original one, not XP) 1400 would require. Simply put, these processors are not power hungry. Furthermore, y

      • I suppose a good deal of issues could be eliminated if low power cpu's were to be used in such a manor...

        I only do this because you have written this twice. The word is "manner," not "manor." A manner is a way of acting. A manor is a mansion.

      • Ok, you've done it at least twice in this story. THE WORD IS MANNER [answers.com], NOT MANOR [answers.com]. Please... I know they're homonyms. It's not that hard, though. Show respect for yourself and the people reading your comments by not coming across as ignorant.
    • I can see an interesting situation where you could have a traditional CPU, to which you could plug in additional external processor modules as your needs expand.

      Indeed. It sounds like a Cell-type Opteron configuration waiting to happen. If AMD manages to pull something like that off, Intel will have to eat dust for a while. For now, though, it's a fun speculation.
  • A fast replacement for MIDI!
  • We've got a liquid cooled CPU in a separate enclosure that is connected to the body by a HyperTransport Interconnect, too! Soon vendors will come to market with processers in self-contained water cooling devices where you just take the cord and plug it into the computer.

    Mother Nature knew it all along.

  • Increased Bandwidth (Score:5, Informative)

    by Metabolife ( 961249 ) on Monday April 24, 2006 @05:41PM (#15193075)
    HT 3.0 increases the bandwidth to 41.6 GB/s, that's 86% more than 2.0. It's also expected to be backwards compatible with current motherboards using 2.0. The new processor will run with 3.0 speeds while the motherboard will be stuck with 2.0. The new Rev. F AMD cpus are expected to have HT 3.0. It should help with multi-processor systems where the high bandwidth connects each cpu.
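The 86% figure checks out if you work from the consortium's per-link numbers (assuming 32-bit links, double data rate, and both directions counted in the aggregate; treat the breakdown as a sketch):

```python
# Aggregate link bandwidth = clock * 2 (DDR) * link width in bytes * 2 directions.
def ht_aggregate_gbs(clock_ghz: float, width_bits: int = 32) -> float:
    bytes_per_transfer = width_bits / 8
    return clock_ghz * 2 * bytes_per_transfer * 2

ht2 = ht_aggregate_gbs(1.4)  # HT 2.0 tops out at 1.4 GHz -> ~22.4 GB/s
ht3 = ht_aggregate_gbs(2.6)  # HT 3.0 tops out at 2.6 GHz -> ~41.6 GB/s
print(round(ht3, 1), round((ht3 / ht2 - 1) * 100))  # ~41.6 GB/s, ~86% more
```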
  • by Visaris ( 553352 ) on Monday April 24, 2006 @05:41PM (#15193076) Journal
    Whoever submitted the article doesn't understand what the external HT links are for. They are _NOT_ a replacement for USB or any other similar technology. External HT is used to link multiple chassis together to form a large SMP box. This is similar to infiniband, etc. This is NOT designed to be a way to just plug in a CPU to an external port. Read the pdf:

    http://www.hypertransport.org/docs/tech/ht30pres.pdf [hypertransport.org]
  • In the meantime... (Score:5, Interesting)

    by jd ( 1658 ) <imipak@ y a hoo.com> on Monday April 24, 2006 @05:48PM (#15193108) Homepage Journal
    Broadcom's BCM1250 MIPS processor implements a totally non-standard HyperTransport that blends several of the early 1.x specifications in a way that is unpredictable and a pain. Yes, folks, there are manufacturers out there who don't debug or maintain their product lines, who won't stick to published specs, and who can't be relied upon to publish their own specs. Sometimes, those of us who post on Slashdot slam Intel for decisions that are nothing short of insane, but there are actually far far worse offenders out there.


    Most of the HyperTransport updates look to be good (and, frankly, about time) but I am highly concerned that if certain manufacturers (such as Broadcom) haven't even bothered to do better than a fragmentary 1.x and have ignored 2.x entirely, there is little hope that they'll do much with 3.x.


    And that's the big problem. If AMD are the only ones who ever implement the specification in full, correctly, then it doesn't offer any significant advantage. It isn't universal enough to be useful. That is the killer that has murdered so many excellent technologies. Being good - even being the best - isn't enough. If a rival is more widely adopted, then it'll be the rival that wins. The marketplace doesn't reward quality, it rewards popularity. Quality achieves nothing.

  • Thanks a lot (Score:3, Funny)

    by hurfy ( 735314 ) on Monday April 24, 2006 @05:56PM (#15193146)
    Now half my brain will be trying to design a 939 connector USB cable in the background....

    hehe external CPU, someone got a better batch of something than i did.....

  • Processor on a stick. Cool idea. Now we only need to update the USB spec to supply devices with 100W of power! While you're at it don't forget that we'll also want a couple hubs in the path.

  • So finally (Score:2, Funny)

    by iminplaya ( 723125 )
    We'll be able to go from New York to Tokyo in less than three hours?

    •   We'll be able to go from New York to Tokyo in less than three hours?

      Ninety minutes from New York to Paris, well by '76 we'll be A-OK...

  • by Inoshiro ( 71693 ) on Monday April 24, 2006 @06:20PM (#15193273) Homepage
    Why are MacBook Pros so much faster than Powerbooks?

    The MacBook Pro sports a 667MHz DDR FSB, while the Powerbook sports a 133MHz FSB. It doesn't matter how fast your processor is if you don't have a fast enough way to feed it (much like a V-12 will not do well with a single-barrel carb used on a lawnmower engine).

    The Von Neumann bottleneck [wikipedia.org] is the significant limiting factor in all machines, once your working set of data exceeds that of your L1/L2 cache. Suddenly your 1.5 GHz G4 is 266 MHz :/

    Faster HyperTransport means happier users of AMD machines. My AMD64 beats the pants off my Sempron 2500 because its 800MHz HT bus allows it to do context switches in less than 1/3rd the time of the Sempron!
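The "1.5 GHz suddenly runs like 266 MHz" quip can be made concrete with a rough stall model (an editorial illustration; the miss rate and penalty below are made-up round numbers, not G4 measurements):

```python
# Average instruction rate once memory stalls dominate: each miss adds
# `penalty_cycles` of waiting on top of the 1 cycle of useful work.
def effective_mhz(core_mhz: float, miss_rate: float, penalty_cycles: int) -> float:
    cycles_per_instruction = 1 + miss_rate * penalty_cycles
    return core_mhz / cycles_per_instruction

print(effective_mhz(1500, 0.05, 100))  # ~250 MHz: near the "266 MHz" figure
print(effective_mhz(1500, 1.00, 100))  # ~15 MHz: every access missing cache
```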
    • or... (Score:3, Informative)

      Perhaps it's because your Sempron 2500 is a socket 754 chip, so cannot use dual-channel memory. The AMD64 has a faster FSB, and it's dual-channel.

      Many people (including yourself it seems) misunderstand HT. It isn't the FSB, an Athlon 64 has no FSB. HT is only used to communicate non-memory I/O and to synchronize caches between processors when doing memory I/O. So it's rather unlikely that HT could make your context switches 3X faster. Best thing for that would be a bigger cache, which your AMD64 probably ha
      • My AMD64 is a Socket 754, and my Sempron is Socket 462. It's on a much, much slower bus connection to its RAM. The Sempron has 180ns latency to RAM, while my AMD64 has 60 ns (worst case).

        The AMD64 average context switch latency is a few microseconds; 15ns average. Sempron is 10ns best, 70ns average. I can send you a PDF with a few hundred graphs I did with lmbench on several platforms for a research project recently, if you don't believe me.

        So, if my kernel is doing a context switch HZ times a second, I'
        • by ArbitraryConstant ( 763964 ) on Monday April 24, 2006 @10:08PM (#15194229) Homepage
          "The bus connection between my CPU and the RAM is, indeed, the Hypertransport. Northbridge, CPU, and RAM are all connected by it."

          This is wrong. Athlon64s have an on-die memory controller. They communicate with memory directly through the dual-DDR memory bus, no intermediaries. This is what gives Athlons their famously low memory latency.

          In Athlon64s, the northbridge as we know it does not exist because the memory is connected directly to the CPU itself. The CPU is connected to the chipset by way of a hypertransport bus, and memory I/O for other devices goes over this bus to the CPU's memory controller.
        • The bus connection between my CPU and the RAM is, indeed, the Hypertransport. Northbridge, CPU, and RAM are all connected by it.

          Well, whoever marked you as informative was fooled by the same info that fooled you into thinking this. Hypertransport, as the poster you are replying to explained, is *only* used to access non-memory I/O in single-CPU systems. In those systems, like yours, it is used as a link between the CPU and the northbridge (as the wikipedia article indicates), but, unlike Intel systems, the R
    • That 266MHz statement is only true when every other instruction is hitting a new address in RAM. In reality, you are likely to be hitting the disk a lot, too. Then you'll have less than 1MHz performance :-)
  • by zaguar ( 881743 ) on Monday April 24, 2006 @06:22PM (#15193278)
    I've said it before and I'll say it again - open standards lead to better products. Case in point - HyperTransport. That story about the possibilities of fluid simulations/path finding in the oil industry opened up by co-processors slotting into HTT links is just one example.

    Hey Intel, how's the FSB? And, for that matter, how's that DRM-soaked Viiv product going?

  • You've gotta see my dedicated Hypercard stack co-processor running on top of my custom Hypertransport stack.
    It's smokin!
  • Bah. Why bother.

    I'd rather have an external motherboard. Keep the CPU in the case, and everything else outside. /silly off.
  • Legos (Score:3, Funny)

    by Slayback ( 12197 ) on Monday April 24, 2006 @06:34PM (#15193347)
    Just make all the components (memory, CPU, disks, interfaces) like Legos, and you'll be set. Need more RAM? Just add another block. Suzy needs some extra CPU for a big project, let her borrow your block for the day.

    The bonus feature would be collecting enough hardware to make the Millennium Falcon out of your PC.
  • Too bad Apple isn't making new products with Hypertransport anymore, now that they're using Intel instead of the G5 or AMD. It would be interesting to have a rack of XServe machines that just do plug-and-play clustering via a Hypertransport port. Unless they go with AMD in the XServe (which actually wouldn't make much sense for a 1U single/dual processor unit), then I don't think we'll see anything like this.
    • How do you know they're not making new products with HyperTransport? They're still a signed member of the HyperTransport Consortium, and could be using HT elsewhere in the business. Just because their mainline products don't use it, isn't any reason to write it off.

      And who knows, maybe they'll convince Intel into using HT instead of the CSI bus they've been working on for so long. Intel's got to have an in-house implementation of HT up and running (it's an open standard, why not?), it's not all that far-
  • Is this intended to be used for peripherals as well? For example, I might have a handheld device that I can plug in to a desktop to use its CPU to do processor-intensive stuff on the handheld that it would not normally be used for when on its own.

    Or is that completely wrong?
