AMD Unveils Barcelona Quad-Core Details

mikemuch writes, "At today's Microprocessor Forum, Intel's Ben Sander laid out architecture details of the number-two CPU maker's upcoming quad-core Opterons. The processors will feature sped-up floating-point operations, improvements to IPC, more memory bandwidth, and improved power management. In his analysis on ExtremeTech, Loyd Case considers that the shift isn't as major as Intel's move from NetBurst to Core 2, but AMD claims that its quad core is true quad core, while Intel's is two dual-cores grafted together."
  • by aussie_a ( 778472 ) on Wednesday October 11, 2006 @04:08AM (#16389743) Journal
    No he didn't. But please, don't let that stop you claiming otherwise.
  • Re:Once again... (Score:4, Informative)

    by tygerstripes ( 832644 ) on Wednesday October 11, 2006 @04:34AM (#16389891)
    It wasn't that good. AMD came out with an architecture which, in practical terms, was better designed, while Intel just kept trying to push the envelope with this very hot chip, and steadily lost market share as a result. Core2Duo is fantastic, relatively speaking, but it was a very long time coming...
  • Hmmmm Wrong. (Score:3, Informative)

    by Solokron ( 198043 ) on Wednesday October 11, 2006 @04:48AM (#16389965) Homepage
    Looks like someone RTFA a bit wrong. Ben Sander works for AMD. He is one of their media presenters. Here are a few of the events he has done: thru.mid2005.html, v.txt
  • by Anonymous Coward on Wednesday October 11, 2006 @05:17AM (#16390099)
    Some of us do care. Some for work, some for fun. AMD's "designed as quad-core" approach has some notable consequences, especially in the cache layout that (on paper, of course) seems very well suited to virtualization -- much more so than the Intel solution in TFA.

    AMD: a shared L3 feeding core-specific L2 caches. Intel: each core-pair sharing an L2 cache. AMD's approach better avoids threads competing for the same data (thanks to copying it from L3 to every L2 that needs it), while keeping access latencies more uniform and predictable (and thus easier to optimize for).

    Other AMD enhancements look more like catch-up to Core 2: SSE [and it's "Extensions", dammit, not "Enhancements"] paths from 64bit to 128bit, more advanced memory handling (out-of-order loads versus Intel's disambiguation et al.), more instructions per clock by beefier decoding (more x86 ops through fast path instead of microcode) and more "free" ops (where Intel added way more discrete execution units from Core to Core 2).

    If AMD's quad manages to be better due to better memory bandwidth and latency (in practice), then they were quite right about "true quad-core" :)
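The L3-feeds-private-L2 arrangement described above can be sketched as a toy lookup model (purely illustrative; the dict-based "caches" and line names are invented for the example):

```python
# Toy model of the AMD layout: each core has a private L2 that is filled
# from a shared L3, so two cores reading the same line each get their own
# local copy instead of contending for one shared L2.
shared_l3 = {"lineA": b"data"}
private_l2 = {0: {}, 1: {}, 2: {}, 3: {}}

def read(core, line):
    if line in private_l2[core]:      # local L2 hit, lowest latency
        return private_l2[core][line], "L2"
    value = shared_l3[line]           # miss: fetch from the shared L3
    private_l2[core][line] = value    # and keep a private copy
    return value, "L3"

read(0, "lineA")            # core 0: filled from L3
read(1, "lineA")            # core 1: its own fill from L3
print(read(0, "lineA")[1])  # "L2" -- subsequent reads stay local
```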

  • by wolrahnaes ( 632574 ) <> on Wednesday October 11, 2006 @05:45AM (#16390223) Homepage Journal
    As the person who responded to your last post explained, that's just not possible with the K8 architecture as it is. The memory controller is on-die and memory technology is evolving, therefore the interface between the processor (where the controller is) and motherboard (where the DIMMs are) must also change.

    The closest thing to a solution would be going back to Pentium II/III-style processor-on-a-card designs, moving the memory slots onto an expansion card shared with the processor, which would then connect to the motherboard over a HyperTransport interface.

    This works, as some motherboard manufacturers (ASRock on the 939DUAL for one) have implemented something along these lines for AM2 expandability. The problem lies in laying out the circuitry for this new slot, not to mention the incompatibility with many of the large coolers we often use today. It also would become even more complex when faced with another one or two extra HyperTransport lanes as found on Opteron 2xx and 8xx chips, respectively.

    AMD made a compromise when they designed K8. On the one hand, the on-die memory controller improves latency by a huge amount and scales much better by completely eliminating the memory and FSB bottlenecks that Intel chips get in a multiprocessor environment. On the other hand, new memory interface = new socket, no way around it.

    From what I understand, the upcoming Socket F Opterons will have over 1200 pins in their socket so as to allow both a direct DDR2 interface and FB-DIMM. If I understand FB-DIMM technology correctly, it should end this issue by providing a standard interface to the DIMM which is then translated for whatever type of memory is in use. Logically this will trickle down to the consumers in another generation. For the time being however, AMD has stated that the upcoming "AM3" processors will still work in AM2 motherboards, as they will have both DDR2 and DDR3 controllers.
  • by Phleg ( 523632 ) <stephen&touset,org> on Wednesday October 11, 2006 @07:13AM (#16390677)
    A "true" quad-core means that all four cores share the same L2 cache, AFAIK. Basically, performance benefits because they can all use the same high-speed cache for L1 misses. This is also extremely useful in the case of multiple processes which aren't bound to a CPU. If process A is scheduled on processor 1, then 2, then 3, then 4, there are going to be a lot of cache misses (since it's in no CPU's L1 cache). With two dual-cores bolted on to each other, processes switching from processors 1-2 to 3-4 are going to incur severe performance penalties as any relevant memory is fetched over the memory bus from RAM.
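One common workaround for the scheduling effect described above is to pin a process to a fixed core so its cached working set stays warm. A minimal sketch using Linux's affinity API (the core number is arbitrary, and the call is Linux-only, hence the guard):

```python
# Pin the calling process to core 0 so the OS scheduler can't bounce it
# between dies and throw away its warm cache.
# os.sched_setaffinity is Linux-only, hence the hasattr guard.
import os

if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0})    # pid 0 = the calling process
    print(os.sched_getaffinity(0))  # {0}
```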
  • Re:Once again... (Score:4, Informative)

    by LaughingCoder ( 914424 ) on Wednesday October 11, 2006 @08:11AM (#16390995)
    NetBurst was designed for a market that touted clock speed as the performance measure for CPUs. AMD, with a big helping hand from the gamers, changed the game into rewarding true benchmark/performance rather than simple clock speed. I suppose if Intel had managed to achieve 10GHz clocks their performance would have been top notch, though one wonders how long those instruction pipelines would have to be ... and how much power they would have burned.

    Now Intel has out-benchmarked AMD, and is attempting to change the rules again to performance-per-watt. This next wave should be interesting to watch.
  • by Visaris ( 553352 ) on Wednesday October 11, 2006 @08:18AM (#16391055) Journal
    I won't buy any AMD processors anymore until AMD clarifies its socket plans and guarantees a minimum of three years' availability for processors on a socket.

    I suppose that means you won't buy an Intel chip either. Look at what happened with Conroe. Core 2 Duo uses a socket with the same name as the P4 socket, and the same number of pins too. But guess what? When Conroe came out there were fewer than a handful of reasonable boards, out of the hundreds of models available, that would actually support it. The voltage requirements changed slightly, the BIOS requirements changed, and the end result was that upgrading to Conroe on a given board was hit or miss. I fail to see how Intel's motherboard upgrade situation is any better than AMD's. It sounds to me like you're falling for Intel's game: "We kept the socket name and number of pins the same, so that means we have better socket longevity." Sorry, but I'm not falling for it. I've read too many horror stories on the forums from Conroe upgraders who thought they could use their current P4 boards.

    Don't get me started on Intel's TDP scam either (AMD's = max, Intel's = average). AMD may not always have the best tech, but I find them to be a much more straightforward company, with fewer sneaky games designed to trick customers.

    And why are we posting a story about AMD's tech said/written by an Intel employee? Sounds like it was biased before it even started to me.
  • by DohnJoe ( 900898 ) on Wednesday October 11, 2006 @08:22AM (#16391093)
    actually, there are three levels of cache in the Opteron: L1 and L2 are per-core, L3 is shared.
    They claim that this improves performance with virtualization.

    From the article:
    Barcelona uses a three-stage cache architecture. The L1 cache is 64KB, the L2 cache is 512KB and the L3 cache is 2MB. The L1 and L2 caches are dedicated to a particular core, while the L3 cache is shared among all cores. Note that the L3 cache has been engineered to be variable in size, so that different products may offer different L3 cache sizes. The L1 and L2 caches are exclusive, as with current Opterons and Athlon 64s. This means that the L1 and L2 cache don't hold copies of the same data.
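The figures quoted from the article make the per-chip totals easy to check (assuming, as TFA says, 64KB L1 and 512KB L2 per core plus a 2MB shared L3):

```python
# Per-core and per-chip cache totals for Barcelona as described in TFA.
KB, MB = 1024, 1024 * 1024
cores = 4
l1, l2, l3 = 64 * KB, 512 * KB, 2 * MB

per_core_private = l1 + l2              # 576 KB dedicated to each core
chip_total = cores * per_core_private + l3
print(chip_total // KB, "KB total")     # 4352 KB
```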
  • by tomstdenis ( 446163 ) <tomstdenis@gm a i> on Wednesday October 11, 2006 @08:23AM (#16391103) Homepage
    As others pointed out, inter-core communication has to hit the FSB. That makes things like owning/modifying/etc. cache lines slower, as you have to communicate that outside the chip.

    There are also process challenges. Two dies take more space than four cores on one die, since you have replicated some of the technology [e.g. the FSB interface drivers]. Space == money, therefore it's more costly.

    If one dual-core takes 65W [current C2D rating] then two of them will take 130W at least [Intel's ratings are not maximums]. AMD plans on fitting their quad-core within the 95W envelope. Given that this also includes the memory controller, you're saving an additional 20W or so. In theory you could save ~55W going the AMD route.

    Also, current C2D processors have lame power savings; you can only step into one of two modes [at least on the E6300] and it's processor-wide. The quad-core from AMD will allow PER-CORE frequency changes [and with more precision than before], meaning that when the thing isn't under full load you can save quite a bit. For instance, the Opteron 885 [dual-core, 2.6GHz] is rated for about 32W at idle, down from 95W at full load. I imagine the quad-core will have a similar idle rating.
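The ~55W figure in the comment above can be re-derived from its own assumed numbers (these are the commenter's estimates, not vendor datasheet values):

```python
# Back-of-envelope check of the ~55W savings claim. All wattages are the
# comment's assumptions, not measured or official TDP figures.
c2d_tdp = 65                 # W, one Core 2 Duo die
intel_mcm = 2 * c2d_tdp      # two dies in one package -> 130 W
northbridge_mc = 20          # W, rough cost of an external memory controller
amd_quad = 95                # W, AMD's stated quad-core envelope

savings = intel_mcm + northbridge_mc - amd_quad
print(savings)  # 55
```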

  • True QC versus MCM: (Score:5, Informative)

    by Visaris ( 553352 ) on Wednesday October 11, 2006 @08:39AM (#16391255) Journal
    Intel's QC is really an MCM, or multi-chip module. That means they have literally grabbed two Conroe (Core 2 Duo) chips off the assembly line and mounted them in a single package. From the outside it looks like a single chip, but inside, it has two separate pieces of Si, connected over the FSB. That is the problem: the two chips are connected to the same bus. A single chip presents one electrical load on the bus, and two chips present two loads. This means that the speed of the bus needs to be dropped. That is why Kentsfield will have a slower bus speed than normal chips. If you think about it, this is the exact opposite of the situation you want. You have just added a core, so it would be nice to add more bus bandwidth. Instead, the Intel solution lowers the overall bus bandwidth, not to mention that it is a shared bus. The two cores fight each other over a very slow external bus, and this creates a performance bottleneck.

    When all four cores are on a single piece of Si, all sharing an L3 cache, the chips don't need to fight over the external bus as much. The cores can share information between them internally, and do not need to touch the slow external bus to perform cache coherency and other synchronization. Also, a true QC chip presents one load to the outside bus. This means that the bus speed does not need to drop because of electrical load.

    There are many people who don't care how the cores are connected as long as the package works. The point is that the way the cores are connected has a direct impact on performance. We'll be talking about Intel vs. AMD cache hierarchy in 2007, when AMD uses dedicated L2 and shared L3 while Intel uses only shared L2. Expect cache thrashing on Intel's true QC chips with heavily threaded loads when they come out. Next I'll hear people say that the cache doesn't matter as long as it works. As long as it works for what? Single-threaded tiny-footprint benchmarks like SuperPi or Prime95? How about a fully threaded and loaded database, or any other app that will actually stress more than the execution units?
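The shared-FSB bottleneck argument boils down to simple division: every core added to the same bus shrinks each core's share of bandwidth. A sketch with illustrative numbers (not actual Kentsfield specs):

```python
# Illustrative only: per-core bus bandwidth when more cores share one
# front-side bus. The 1066 MT/s, 8-byte-wide bus is a stand-in example.
def per_core_bw(mt_per_s, bus_bytes, cores):
    return mt_per_s * bus_bytes / cores  # bytes/s available to each core

dual = per_core_bw(1066e6, 8, 2)  # two cores on the bus
quad = per_core_bw(1066e6, 8, 4)  # four cores on the same bus
print(dual / quad)  # 2.0 -- each core gets half the bandwidth
```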
  • by mhectorgato ( 1012167 ) on Wednesday October 11, 2006 @09:57AM (#16392199)
    I guess it all depends on what real-world applications you're talking about. If you're referring to Word/Excel/Web/etc., then an AMD QC won't be much quicker than an AMD DC either. If your real-world apps involve multi-threaded activity, then: according to DailyTech's benchmarks comparing a similarly clocked Core2Duo to Kentsfield (2+2), it's about 90% faster in 3D Studio Max 8, Cinebench 9.5 and TMPG Encoder, and about 70% faster in Windows Media encoding. HardwareSecret compared it to a Core2Extreme (10% faster clock speed): it's 80% faster in POV-Ray and 50% faster in Sony Vegas 7.0a. XBitLabs compared it to a Core2Extreme (10% faster clock speed) and it's 54% faster in the 3DMark 06 CPU tests. AnandTech estimated it was 51.4% faster than a similarly clocked Core2Duo in DivX 6.2.5 with XMPEG. The article concluded thusly: "With only a 266MHz difference in clock speed, the new Core 2 Extreme QX6700 isn't too hard of a choice to make. When Intel introduces a lower cost 2.40GHz Core 2 Quad version, things may get a little more complicated, but at the very high end we would rather have four slightly slower cores than two slightly faster cores. We expect that there will be some improvements in multitasking performance, especially if you have a decently fast I/O setup, and don't forget the performance boost you'll get in well threaded applications."
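As a sanity check, ~90% quad-over-dual gains like those quoted above are consistent with Amdahl's law only if the workloads are almost entirely parallel (the 98% parallel fraction below is an assumption chosen to match, not a measured value):

```python
# Amdahl's law: how much faster 4 cores are than 2 for a workload whose
# parallel fraction is p. p = 0.98 is an illustrative assumption.
def speedup(p, n):
    """Speedup over 1 core: p = parallel fraction, n = core count."""
    return 1 / ((1 - p) + p / n)

ratio = speedup(0.98, 4) / speedup(0.98, 2)  # quad vs dual
print(round(ratio, 2))  # 1.92 -- i.e. roughly 90% faster
```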
  • by aka1nas ( 607950 ) on Wednesday October 11, 2006 @02:38PM (#16397165)
    Pretty much every benchmark around shows that the K8 doesn't benefit significantly from extra cache. From 512KB to 1MB you get maybe 3% or 4% more performance if you are lucky. The IMC saps quite a bit of the gains that having more cache brings you, as the penalty for a cache miss is reduced.
