AMD Finally Unveils Barcelona Chip
Justin Oblehelm writes "AMD has finally unveiled its first set of quad-core processors, three months after its original launch date due to its 'complicated' design. Barcelona comes in three categories: high-performance, standard-performance and energy-efficient server models, but only the standard (up to 2.0 GHz) and energy-efficient (up to 1.9 GHz) categories will be available at launch. The high-performance Opterons, together with higher frequencies of the standard and energy-efficient chips, are expected out in the fourth quarter of this year.
But it's far from clear that this is the product that will help right AMD's ship."
how well will it overclock? (Score:2, Interesting)
I get 2.7 GHz out of a 2.0 GHz-rated X2 (on air).
Once again they have beaten Intel's prices by at least $100 so we all win.
Re:how well will it overclock? (Score:4, Informative)
Re: (Score:2)
Re: (Score:3, Insightful)
They are talking about server chips, which typically are more expensive than desktop chips.
Re: (Score:2)
Why? What's the difference? Honest question - isn't speed just speed?
Re: (Score:3, Funny)
Kinda like an Associates Degree vs. a High School Diploma.
Cheers.
Re: (Score:2)
Re: (Score:2)
Benchmarks (Score:5, Informative)
And a performance preview for Barcelona desktop as well [anandtech.com].
Re:Benchmarks (Score:4, Insightful)
Re:Benchmarks (Score:5, Insightful)
Re:Benchmarks (Score:4, Insightful)
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
Clock for clock Barcelona is faster than Clovertown (Score:1, Interesting)
If you scale the benchmarks to the same GHz rating, you will see that clock for clock Barcelona is at worst on par with Intel's best chip, and at best 80% faster on floating point. This is really quite amazing when you consider it's using the same amount of power as the previous dual-core AMD Opteron.
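To make "scaling to the same GHz rating" concrete, here's a quick Python sketch of the normalization (the scores and clocks below are invented placeholders, not the review's numbers):

    # Hypothetical clock-for-clock comparison: divide a benchmark score
    # by clock frequency. All values here are illustrative placeholders.
    chips = {
        "Barcelona": (100.0, 2.0),     # (score, GHz) - made-up numbers
        "Clovertown": (120.0, 2.66),
    }
    for name, (score, ghz) in chips.items():
        print(f"{name}: {score / ghz:.1f} points per GHz")

Per-GHz numbers like these only tell you about architectural efficiency; as others note below, chips are sold at their actual clocks, not clock for clock.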
Re:Clock for clock Barcelona is faster than Clover (Score:4, Insightful)
We already know that AMD has superior memory performance. If you are doing bandwidth-limited floating point, Barcelona is the clear winner.
If you're making a general statement about floating point performance, you're wrong.
Re: (Score:3, Informative)
Unfortunately processors are not generally sold "clock for clock." If you're on par clock for clock, but the other guy is clocked more than 50% faster than you... that could be trouble.
What good is an Intel chip that has fast floating point but the bus cannot feed it data fast enough?
Plenty good if the data can fit in cache, in which case the unit can be fed fast enough. For instance, say you're running LinPack [wikipedia.org].
Re:Clock for clock Barcelona is faster than Clover (Score:4, Informative)
Define throughput. At some point you need to decide whether you are solving problems like LinPack's or like spec_fp's. One causes lots of cache misses and benefits from memory bandwidth; the other does not.
Right now that chip appears to be Barcelona.
Well that's a hypothetical statement based on perception of your needs and their marketing.
I'm not interested with hypothetical arguments
That explains why you're making them (???)
I am looking forward to using Barcelona processors because they will get my mathematical computations done faster.
Hypothetically. Are you going to hypothetically switch when Intel's Penryn with SSE4 comes out? What about Intel's Nehalem?
By the way, check out numbers 2 and 3 on the Top 500 supercomputer list - they're Opterons.
And?? They were designed and built before Core 2 was released. Do you think I'm going to argue they should have used Pentium 4s? Those systems also make solid use of NUMA through a custom Cray crossbar (Seastar), and Intel doesn't have that. If they made them today I see no reason for them not to use Opterons. Do you have a computer with lots of Opterons and a Cray Seastar router on order?
The performance of those systems is measured using LinPack. As I mentioned at the beginning, declaring a 2.0 GHz Barcelona as having faster fp throughput than a 3.2 GHz Core 2 depends wholly on which types of calculations you are doing. spec_fp does calculations that are memory bound; LinPack does not (at least not as much). Barcelona's faster fp throughput is not due to a markedly superior fp unit (though it may be marginally better) but to its onboard memory controller. If you need that sort of thing, great, go with Barcelona. If you need raw speed on smaller working sets (under a couple of megabytes), chances are good that the higher clocked Core 2 with its huge cache will win.
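If you want to see the cache-vs-DRAM effect on your own box, here's a rough sketch (Python with NumPy; the sizes and where the cliff falls are machine-dependent assumptions):

    # Time the same reduction over working sets that fit in cache vs.
    # ones that spill to DRAM. Effective bandwidth drops off a cliff
    # once the array no longer fits.
    import time
    import numpy as np

    def bandwidth_gbs(n_floats, repeats=50):
        a = np.ones(n_floats, dtype=np.float64)
        start = time.perf_counter()
        for _ in range(repeats):
            a.sum()                    # streams the whole array
        elapsed = time.perf_counter() - start
        return (n_floats * 8 * repeats) / elapsed / 1e9

    for n in (2**14, 2**18, 2**24):    # ~128 KiB, ~2 MiB, ~128 MiB
        print(f"{n * 8 / 2**20:8.1f} MiB: {bandwidth_gbs(n):6.1f} GB/s effective")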
Re: (Score:1)
I really dislike this whole "tuner" mentality from most reviewers. This is a server chip, so not just clock for clock, but also dollar for dollar, and watt for watt will be big issues. Plus, Intel still generally releases larger caches, so that weighs in.
"Full generation behind"? (Score:2, Informative)
Heh, shouldn't that be "full generation ahead" since AMD manages to put four cores on a single die?
Re: (Score:1, Informative)
Emphasis mine. Reading comprehension 101: Read the whole sentence. AMD is at 65nm, Intel is at 45nm, just as when AMD was at 90nm, Intel hit 65nm. This qualifies them as being "a generation behind" in chip making processes.
Whether or not their architecture or their core design is better is completely irrelevant to that sentence (but relevant to the next, which is why it's so odd they'd put those two sentences together).
Re:"Full generation behind"? (Score:5, Insightful)
This is a direct reference to 65nm vs. 45nm geometry. If AMD brings their quad core to a 45nm process, that should help yield, power and performance. If nothing else, it puts them on a level playing field with Intel (who already have product at 45nm [intel.com]) so that it's down to "design vs. design." Being stuck one silicon technology generation back, they need to resort to other tricks to "keep up."
In other words, to be at overall performance parity with Intel, they have to have a more advanced design in 65nm to keep up with Intel's 45nm work.
Another thing worth noting: by being one generation back, the quad core setup is a double whammy. The die area of a given chip roughly halves with each technology node. Not only is AMD putting twice as much on one chip, it's also making chips that are twice the size per transistor. (Remember, to double square area, you only increase your linear feature size by sqrt(2), and 65/45 ≈ 1.44, which is about sqrt(2).) Each additional sq mm of die area causes greater yield loss than the one before it (driven by defect density in the source silicon). Doubling die size has a huge impact on yield. So, AMD will potentially suffer significantly higher yield loss, and correspondingly higher costs. Even if it can keep its ASP (average selling price) up, the profit margins will suck.
It'll be interesting to see if AMD can quickly shrink this design to 45nm and get closer to parity. The benefits of the quad core design probably become much more apparent at 45nm.
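To put rough numbers on the yield argument, here's a back-of-the-envelope Poisson yield model in Python (Y = exp(-A*D); the defect density and die areas are assumed round numbers, not AMD's actual figures):

    # Poisson yield model: doubling die area squares the yield,
    # since exp(-2*A*D) == exp(-A*D)**2.
    import math

    defect_density = 0.5               # defects per cm^2 (assumed)
    for area_cm2 in (1.0, 2.0):
        y = math.exp(-area_cm2 * defect_density)
        print(f"die area {area_cm2:.1f} cm^2 -> yield {y:.0%}")
    # -> ~61% at 1 cm^2, ~37% at 2 cm^2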
--Joe
Re: (Score:3, Informative)
Re:"Full generation behind"? (Score:4, Insightful)
That could help with leakage power, but that doesn't address the yield and cost issues at all.
Re: (Score:3, Interesting)
The die area of a given chip roughly halves with each technology node.
This is not entirely true. Although I agree overall with what you're saying, core logic transistors scale much worse than cache as the manufacturing process shrinks. I'm not sure if AMD factors this process disadvantage into their chip design, but it is an interesting design choice that they stuff their chip real estate with logic transistors instead of cache. I'm sure that I'm oversimplifying, but I have a gut feeling that they may be choosing to use less cache and more logic.
Re: (Score:3, Interesting)
Re: (Score:2)
Fair enough. That said, it's not the transistors so much as it is the wires that don't scale well. I'll warn you: I'm not a physical designer; I'm just an architect. The one and only cookie I directly designed and baked was in 2 micron. [spatula-city.org] Still, I'm aware of the trends.
There could be a couple of reasons AMD throws more logic at the problem.
A little more complicated (Score:1)
Re: (Score:2)
Fair enough, but in terms of dollars of revenue per wafer, the relative cost of a given defect is generally smaller on a 45nm wafer than on a 65nm wafer if the 45nm design is roughly half the size of the 65nm design. You've taken out a smaller percentage of the devices on the wafer. Note that I say "relative cost." 45nm wafers are still more expensive than 65nm wafers. :-)
Also, with a RAM-heavy design, you can build significant redundancy into your RAM arrays and perform RAM repair (remap failed columns to spares).
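A quick dies-per-wafer estimate shows why a single killed die matters relatively less at the smaller node (wafer size and die areas are assumed values; real layouts lose more to the wafer edge):

    # Crude dies-per-wafer estimate, ignoring edge loss and scribe lanes.
    import math

    wafer_area = math.pi * (300 / 2) ** 2          # 300mm wafer, in mm^2
    for node, die_area in (("65nm design", 280.0), ("45nm shrink", 140.0)):
        dies = int(wafer_area / die_area)
        print(f"{node}: ~{dies} dies/wafer; one dead die = {1 / dies:.2%} of the wafer")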
Re: (Score:2)
Yes, they are. [theregister.co.uk]
Re: (Score:1)
No... AMD's arrogance costs them dearly. Intel has superior fab/process technology and could build a monolithic quad-core, but it would be more expensive than an MCM because of the decreased yield of monolithic quad-core dies per wafer.
AMD already has a decent infrastructure to support an MCM quad-core very well but refuses to use it to increase their yields. Only arrogance and pride keep AMD from releasing MCM parts, which would significantly improve their yields.
Re: (Score:2)
Re: (Score:1)
Techreport (Score:5, Informative)
Re: (Score:3, Insightful)
Once Barcelona ramps up,
Re: (Score:1)
--sabre86
Re: (Score:2)
I'm curious (Score:1)
One thing of note is that motherboards already exist for this processor in fair numbers. The Barcelona uses Socket F (1207), which the current dual core Opterons already use. That should give this processor a decent jumpstart in the market.
Re: (Score:3, Informative)
* 2350 - 2.0 GHz, $389
* 8347 - 1.9 GHz, $786
* 8350 - 2.0 GHz, $1019
* 2344 HE - 1.7 GHz, $209
* 2346 HE - 1.8 GHz, $255
* 2347 HE - 1.9 GHz, $377
* 8346 HE - 1.8 GHz, $698
* 8347 HE - 1.9 GHz, $873
Finally, (more) fair wattage numbers! (Score:2)
Before everyone slams them for coming up with yet another cheesy marketing gimmick, I would point out that Intel has done this ever since the first of the power-sucking P4 line. They did it a bit less up-front, however, choosing to redefine "TDP" in their specs rather than give their numbers a new term (such as "ACP").
This still won't make for a completely fair direct comparison, because Intel's TDP still isn't a true worst-case maximum (see the throttling discussion below).
"right AMD's Ship" ? (Score:1)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
It'll be good to see what comes up in the next 2-3 months as production ramps up.
Re:"right AMD's Ship" ? (Score:5, Insightful)
Because it doesn't matter how many fronts you are leading on: if you run out of money and can't borrow any more, you lose.
AMD has been running low on money; fortunately, they can still borrow. If they don't stop losing money, their credit rating will tank and then they will not be able to borrow any more.
THAT is what righting the ship means.
Re: (Score:2)
C//
Cool (Score:3, Interesting)
Re: (Score:2, Troll)
The only regret I have is that we probably won't use 'em for DB servers because of Oracle's asinine policy of charging per core; sometimes I wish we had gone with SQL 2005 for more stuff, as it is going to scale better with improving hardware.
That is the most draconian pricing policy I have ever heard of. You actually have to pay Oracle for increasing your processing power?
And an honest question: was there a reason why you didn't look at MySQL or PostgreSQL? I'm not a database expert but my work with them has made me believe they are robust solutions--I certainly prefer them to MSSQL, which is about as pleasant to use as a suppository.
Re: (Score:2)
Um, yeah. Charging per processor (or machine) is par for the course for large "enterprise" software packages. Oracle, Rational, all the hardcore rendering software, etc.: they all do it. Welcome to real life.
Re: (Score:2, Interesting)
I think what the grandparent is distressed about is that they charge per core, rather than per physically discrete processor.
It is an interesting issue; what if you promised, honest Injun, to only use 3 cores of a quad-core CPU, for example?
Re: (Score:3, Informative)
Re: (Score:2)
I think what the grandparent is distressed about is that they charge per core, rather than per physically discrete processor.
I don't get it. A 'core' is simply a CPU that happens to share a piece of silicon with other CPUs. Charging per piece of silicon doesn't really reflect the computing resources available to the application.
Re: (Score:2)
Charging per processor or machine
There's a large gap between charging per processor and per machine. A machine could feasibly be used to independently run the software alongside other instances of it on the network, so charging for another license isn't unreasonable. But charging per core doesn't make any sense to me: unless each core is running a separate and independent instance of Oracle (can it be programmed to do so? Does virtualization play a role here?), it just seems like you're being penalized for attempting to increase your processing power.
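For what it's worth, the "can it be programmed to do so?" part is technically easy on Linux: you can pin a process to a subset of cores. Whether a license auditor would accept that is another matter entirely. A minimal sketch (Python on Linux; the core numbers are arbitrary):

    # Restrict the current process to 3 of the 4 cores (Linux only).
    import os

    os.sched_setaffinity(0, {0, 1, 2})        # pid 0 = this process
    print("allowed cores:", os.sched_getaffinity(0))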
Re: (Score:2)
haha, oh man, charging per core is a break (Score:1, Informative)
Re: (Score:1)
That shouldn't even be legal.
Re: (Score:3, Insightful)
Oracle is an amazingly powerful brand, and managers think that "scalability" is something you buy rather than an engineering problem for programmers and system architects to solve. That's really the whole story, given what servers cost and the actual performance differences between database packages with appropriately written client software.
If you considered using MSSQL (Score:1, Insightful)
Re:If you considered using MSSQL (Score:4, Insightful)
Re: (Score:2)
Re: (Score:2)
Oracle charges per CPU for up to 2-CPU servers (Score:2)
If you have 1 or 2 CPUs (i.e. you typically run "Standard Edition One" Oracle, which limits you to 2 CPU sockets anyway), you are only charged per CPU *socket*, regardless of the number of cores per CPU - here are the details [oracle.com] from the horse's mouth.
So a quad core single CPU server will set you back about 3,000 pounds + VAT in the UK.
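As a sanity check on the arithmetic, here's the per-socket scheme next to a hypothetical per-core one (only the ~3,000 pound per-socket figure comes from the paragraph above; the per-core price is an invented placeholder):

    # License cost for a single-socket quad-core box under two schemes.
    sockets, cores_per_socket = 1, 4
    per_socket_price = 3000        # GBP ex-VAT, from the comment above
    per_core_price = 1500          # GBP, hypothetical placeholder

    print("per-socket:", sockets * per_socket_price)                   # 3000
    print("per-core:  ", sockets * cores_per_socket * per_core_price)  # 6000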
Re: (Score:1)
The penalty of moving too fast... (Score:1)
I fear that Barcelona might well wind up as the Great Eastern [wikipedia.org] of chip making - an impressive technological first, but, in the end, a commercial disappointment.
Re: (Score:2)
Re: (Score:2)
Remember, AMD isn't Intel. They don't have the same resources as Intel. Spending the time on an MCM design would necessarily mean having fewer resources devoted to the native quad core version, and for whatever reason they chose to bet on the native design.
Re: (Score:1)
question for the local geniuses... (Score:2)
Cheers.
RS
Re:question for the local geniuses... (Score:4, Informative)
1: Whether the software CAN use multiple cores.
2: How efficiently it uses the extra cores.
3: Whether the program is currently limited by cpu power or by something else.
For "1:", if the program can't use the extra cores, then you'll only see a speed improvement from the fact that the cores are 15% more efficient. i.e. A 2GHz one of these quads performs the same as a 2.3GHz (+15%) dual core from the previous generation for applications in this category.
For "2:", if the program can use the extra cores, but not as efficiently as the first, then you'll see a speed increase equivalent to this. e.g., if the program does two tasks at once, one that takes 70 seconds and one that takes 30, then on one core it'll take 100 seconds. On two cores it would do the 70 second task on one core and the 30 second task on the other, reducing the total time to 70 seconds, a ~40% speed improvement.
For "3:", if the application is limited by something other than the cpu, e.g. "how quickly it can pull data from the hard-disk", you will likely see no improvement whatsoever.
In conclusion, depending on what applications you use, you will see anywhere from no improvement up to 2.3x the previous speed (2x for double the cores and +15% from the improved efficiency).
Note: As these cpus also have an extra instruction set extension, applications that make use of this could exceed the speed improvements I noted above.
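If it helps, here's the arithmetic above as a small Python sketch (the 15% per-core gain and the 70s/30s task split are the numbers from this comment; the scheduler is a naive longest-task-first toy):

    # Combine a per-core efficiency gain with how well independent
    # tasks spread across cores (naive longest-first scheduling).
    def new_time(task_seconds, cores, per_core_gain=1.15):
        loads = [0.0] * cores
        for t in sorted(task_seconds, reverse=True):
            loads[loads.index(min(loads))] += t / per_core_gain
        return max(loads)

    tasks = [70, 30]
    print(new_time(tasks, cores=1))    # ~87s: only the 15% gain helps
    print(new_time(tasks, cores=2))    # ~61s: the 70s task dominates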
Re: (Score:2)
Re:question for the local geniuses... (Score:4, Insightful)
When you move a multithreaded program to a system with more cores, any given thread is more likely to get a core to run on when it needs one. Assuming, of course, that you have enough threads for that to be an issue.
Shameless plug: I'm the docs lead for this Opteron-based server [sun.com], which can have up to 8 CPUs, for a total of 16 cores. When the Barcelona-based CPU modules are ready, customers will be able to upgrade their systems to a maximum of 32 cores. (Don't ask me when this will happen; Marketing would have me killed.) Obviously any software running on such a system has already dealt with the multicore optimization issue.
Re: (Score:2)
Re: (Score:1)
http://en.wikipedia.org/wiki/Multi-threading [wikipedia.org]
Not another fake number AMD! (Score:1)
But AMD customers who relied on the company's previous power metric of TDP (thermal design power) were putting too many resources into cooling and electrical supply, said Bruce Shaw, director of server and workstation marketing for AMD. That's because TDP was developed so server manufacturers would know how much power the chip consumes in worst-case maximum-power situations that very rarely occur, and design their systems accordingly, he said. So now AMD will advise customers of an Opteron processor's average CPU (central processing unit) power, or ACP. "ACP is meant to be the best real-world end-user estimate of what they are likely to see from the power consumption on the processor," Shaw said.
Oh great, first they used the "+" speed ratings (which I think Cyrix actually started, but AMD jumped right in). I can see that raw core speed really meant nothing, so I was OK with that, but TDP is a real number. Obviously their marketing folks decided it made the chips look too power-hungry, so they made up a lower power-usage number. Frankly, even a home user wants to know the top power usage of the CPU and video card to properly size their power supply.
Re:Not another fake number AMD! (Score:5, Insightful)
Re: (Score:2)
Re:Not another fake number AMD! (Score:4, Informative)
To some extent. The Pentium 4 is where this started. The Netburst architecture was very power hungry normally, but its maximum power was insane. The graph of power consumption vs. benchmark had a long "tail", which Intel sought to chop off. See, TDP is a real-life number, since it's used by OEMs and others to design thermal solutions for the parts. If the thermal solution is insufficient, then the parts fail. So it's not actually possible to fudge TDP numbers.
What Intel decided to do was implement an on-chip thermal diode and some logic that halved the effective clock cycle* if the temperature went above a certain threshold. What this meant is that, based on how they programmed this logic, they could guarantee that the chip's power consumption would never go above a certain level no matter what code you were running. They had effectively lopped off the long tail. The downside is that if your application does draw more power than the limit, then you'll see vastly reduced performance because of the clock throttling. Most of the time this is transient, so it's not that noticeable, but there were benchmarks out there that showed this effect very clearly. For example, a certain game benchmark would get lower scores at 640x480 than at 1600x1200, because at the lower res the game was CPU bound and was crossing the thermal threshold.
So theoretically with this feature Intel could fudge the numbers however they wanted and claim whatever TDP they desired. In practice they don't have that much flexibility because if they set the bar too low then their effective performance would suck, and their TDP numbers are set at average power + several standard deviations.
The main reason why Intel was able to suddenly have low power chips is because they ditched the Netburst architecture and went back to a design that was more balanced between high clock speeds and high IPC.
They kept the clock throttling logic, though, since it does still give them some benefit in reporting lower TDP numbers. AMD doesn't have this feature, so their TDP is truly the maximum power (as determined by running a "power virus") that you would ever see, even though it's unlikely. Since power has become ever more important as a marketing feature even outside of mobile, I'm not surprised that AMD would decide to start touting expected numbers vs maximum.
* Actually a 50% duty cycle of full speed for some number of microseconds followed by completely off.
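A toy model of that throttling behavior, if it helps (all constants - the threshold, heating and cooling rates - are invented for illustration; real chips are far more complex):

    # Above a temperature threshold, run at a 50% duty cycle,
    # lopping off the long tail of the power curve.
    def simulate(power_draw, steps=200, threshold=85.0):
        temp, work = 40.0, 0.0
        for _ in range(steps):
            duty = 0.5 if temp > threshold else 1.0
            temp += duty * power_draw * 0.1 - (temp - 40.0) * 0.05
            work += duty
        return temp, work

    for p in (20, 40):                 # modest load vs. "power virus"
        temp, work = simulate(p)
        print(f"power {p}: settles near {temp:.0f}C, work done {work:.0f} units")

The modest load never hits the threshold and does full work; the heavy load hovers at the threshold with proportionally less work done, which is exactly why setting the bar too low would tank performance.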
I've been buying Intel/Nvidia . . . (Score:3, Interesting)
Intel's been good to us Linux folk, and Nvidia has been easy enough to deal with.
If AMD comes out with an end-to-end Linux solution, CPU, GPU, and a good Linux-friendly partner for chipset, I'll seriously consider switching back to AMD parts.
Re:I've been buying Intel/Nvidia . . . (Score:4, Interesting)
Re: (Score:1)
Re: (Score:1)
AMD also has more energy-efficient chipsets with.. (Score:2)
More Barcelona (Score:2, Informative)
http://www.hothardware.com/Articles/AMD_Barcelona_Architecture_Launch_Native_QuadCore [hothardware.com]
does Intel need AMD (Score:1)
Re: (Score:1)
Re: (Score:1)
Blah (Score:2)
In the I/O arena, AMD potentially has the edge, and for HPC there's no question Barcelona will do well: this architecture is built for that kind of workload.
Re: (Score:2)