Sun To Release 8-Core Niagara 2 Processor 214
An anonymous reader writes "Sun Microsystems is set to announce its eight-core Niagara 2 processor next week. Each core supports eight threads, so the chip handles 64 simultaneous threads, making it the centerpiece of Sun's "Throughput Computing" effort. Along with having more cores than the quads from Intel and AMD, the Niagara 2 has dual, on-chip 10G Ethernet ports with cryptographic capability. Sun doesn't get much processor press, because the chips are used only in its own CoolThreads servers, but Niagara 2 will probably be the fastest processor out there when it's released, other than perhaps the also little-known 4-GHz IBM Power 6."
Good floating point too (Score:5, Interesting)
This processor will also have a floating-point unit for each core, unlike the UltraSPARC T1 (Niagara), which had only one shared amongst all 8 cores. This should make it much more suitable than the T1 for a wide variety of applications. The T1 did great on multithreaded server-type tasks (e.g. web, email, database) but would have been pretty hopeless for anything doing more than a bare minimum of FP work.
Interesting (Score:5, Interesting)
I'd like to see some benchmarks, and more technical specs, on these babies.
quad is a quad and I want a cheap 8-way desktop (Score:4, Interesting)
That said I've always wanted to get my hands on some of these new multicore UltraSparcs. I think they have a lot of potential, and the new ones seem extremely powerful.
Now if only Sun would put the low-end one in a Mac mini form factor and sell it as a Java developer's kit, then maybe I could play with one. The low-end Sun Fires are something I could almost afford, but I don't really want to keep a 1U on my desk just to try out the technology.
I think the big 64-bit address space and the ability to run lots of threads fit well with Sun's Java. Not that I am a Java developer; I just think it's a good match, and that seems to be why people were using the older CoolThreads systems: enterprise Java.
Re:Sun doesn't get much processor press (Score:5, Interesting)
Also, if the last thing you have touched is a V440 then you are not exactly up to speed with the cutting edge of Sun products. I promise you that if you had actually ever seen a system running a T1 chip you would not say "their processor division has been kinda lagging". The cool threads stuff is amazing and they are the only people doing anything quite like it. I am not sure if you picked this up from the article but with one chip you get _64_ hardware based threads.
In our internal benchmarks a £20k T2000 with 1 x 8 core T1 outperformed a £100k+ V880 with 8 x 2 core Sparc. Freakin' cool and excellent value for money. Plus all this fits in two rack units.
Working in small companies is nice but I promise you that out there in the big wide world "most" companies don't think that $US20k is very much at all to spend on a system that will be part of a critical service.
Re:quad is a quad and I want a cheap 8-way desktop (Score:3, Interesting)
Note that the post was about the number of cores/threads in the Niagara chip design. In terms of chip design, the circuitry on the silicon is what matters, not how you package, integrate, or market it. Moreover, it does matter to a customer if marketing speak fobs him off with two dual-core chips on a cracker instead of an integrated four-core design.
Performance does not scale purely with the number of cores; it also matters how efficiently the cores can share resources and intercommunicate. Things like accessing shared memory and inter-process communication are an important part of real-world applications. Just try to run a heavy-duty threaded server benchmark with a lot of IPC on a faux "Quad" and compare how it scales relative to a true dual-core design: you'll be lucky to get 1.5-1.6 times the performance instead of twice.
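To put rough numbers on that, here is a minimal sketch using Amdahl's law. It is a simplification and not a measurement: it assumes all the time lost to shared-memory contention and IPC can be lumped into a fraction of the work that does not speed up with more cores. The function names and the sample fractions are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_gain(parallel_fraction: float, cores_before: int, cores_after: int) -> float:
    """How much faster the same workload runs after adding cores."""
    return (amdahl_speedup(parallel_fraction, cores_after)
            / amdahl_speedup(parallel_fraction, cores_before))


if __name__ == "__main__":
    # Hypothetical scaling fractions; 1.0 is the ideal case with no shared bottleneck.
    for p in (1.0, 0.9, 0.85, 0.8):
        print(f"scaling fraction {p:.2f}: 2 -> 4 cores gives {relative_gain(p, 2, 4):.2f}x")
```

Under this model, a workload where 80 to 85 percent of the time scales with core count gets about 1.50x to 1.59x when going from two cores to four, which is in the range quoted above. The model says nothing about why the non-scaling part exists, so it cannot by itself tell a glued-together quad from an integrated one.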
Re:Sun doesn't get much processor press (Score:5, Interesting)
It's all relative. Your 'high performance' Dell or Gateway wouldn't do much other than run bind at one of our locations. You are comparing apples to oranges. These systems are not for you to surf the net with, and as for price, well, there is a lot to be gained from stability. I still have SPARC systems with OEM parts (minus the disks) that are close to 20 years old running at some locations. Bet your Dell can't say that.
Re:yes but ... (Score:3, Interesting)
Can it blend? Yes, I'm sure it can; the iPhone blended.
Speaking of which, how much does this processor cost, and why doesn't Sun Microsystems make laptops? I was looking for Unix machines recently and decided to go with the MacBook Pro rather than the Linux laptops at Dell, whose hardware and general lack of processing power don't seem to lend themselves to virtualizing other operating systems.
Re:Not going to be the fastest, but... (Score:2, Interesting)
Re:Interesting (Score:1, Interesting)
The Niagara line of processors is impressive, but you had to worry about workload AND what would happen in 18 months when the server was "reused" for some alternate need by folks who didn't know what floating point was.
I've tried to find a fit for the T1 servers in my work as a technical arch for the last year.
I do about 25 projects a year, from start to finish, so I see a fair number of needs. The Niagara is an easy fit for a web farm. I just wish I had more of those projects that weren't wintel.
Re:Sun doesn't get much processor press (Score:3, Interesting)
Rock and T2 look very promising, but before that their processor division was lagging so badly they were putting re-badged Fujitsu chips in their high-end machines to try to stay competitive. Between the end of the dot-com era and the release of the T1, Sun's microprocessor division seemed like dead weight. They made a huge gamble to start designing web-app-optimised chips as the bubble was bursting, and it looks like it will end up paying off, but it comes at the end of a period where 'lagging' is a very polite way of describing their performance.
By the way, there seem to be a lot of low-power SPARC variants, but I've never seen a palmtop form-factor device containing one. Do they exist? I'd be very interested in one, since SPARC, even SPARC32, is a lot better supported than ARM, in spite of the latter's ubiquity.
Re:Trust me... (Score:3, Interesting)
There'd be a Linux port in practically no time, and I know a bunch of us Linux power users would adopt that setup in no time... cheap commodity hardware coupled with a high-throughput RISC processor would be great for desktop multitasking, software development, file serving, etc.
Re:Interesting (Score:3, Interesting)
If anybody is planning to benchmark this running common apps, I'd also be very interested to see how the approach to hiding memory latency works on more pedestrian applications like video encoding and pattern recognition (and maybe even thread-heavy GUIs).
IIRC (I researched this proc years ago for a university paper), it tries to hide latency by switching thread contexts whenever there is a cache miss or branch misprediction. The crossbar should help a little with cache-related stalls, but the core would already have switched to another thread in any case. So, if there are complex paths of execution, you'd only run them a fraction of the time, on cores that are pretty bare-bones to start with. HPC is probably still better off with single-processor systems, even with the addition of per-core FPUs in Niagara 2, but the Niagara architecture could be really great as a coordinating hub and reporting center for a number of networked number-crunching machines.
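The switch-on-miss idea described above can be played with in a toy simulation. This is only a sketch of the scheme as this comment describes it, not of the actual Niagara pipeline: it assumes a core that issues one instruction per cycle, threads that each run a fixed number of cycles and then miss, a fixed miss latency, and free context switches. The function name and the numbers (5 compute cycles, 20 cycles of latency) are invented for illustration.

```python
def core_utilization(threads: int, compute_cycles: int, miss_latency: int,
                     total_cycles: int = 10_000) -> float:
    """Fraction of cycles in which a single-issue core does useful work."""
    ready_at = [0] * threads                 # first cycle each thread may run again
    remaining = [compute_cycles] * threads   # cycles left before each thread's next miss
    current = 0                              # thread that currently owns the core
    busy = 0
    for cycle in range(total_cycles):
        if ready_at[current] > cycle:
            # Current thread is waiting on memory: look for another ready thread.
            for offset in range(1, threads + 1):
                candidate = (current + offset) % threads
                if ready_at[candidate] <= cycle:
                    current = candidate
                    break
            else:
                continue                     # every thread is stalled: idle cycle
        busy += 1
        remaining[current] -= 1
        if remaining[current] == 0:
            # Simulated cache miss: park the thread until the data comes back.
            ready_at[current] = cycle + 1 + miss_latency
            remaining[current] = compute_cycles
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): core busy {core_utilization(n, 5, 20):.0%} of the time")
```

With these made-up numbers, one thread keeps the core busy 20% of the time, four threads 80%, and eight threads keep it fully busy. The flip side is the one raised above: each individual thread still only gets a fraction of the core, so a single complex execution path does not get any faster.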
Re:Trust me... (Score:2, Interesting)
Which makes me wonder
Re:Trust me... (Score:1, Interesting)
Also the extensive pipelining (OGL2/DX9/DX10) which murders the result latencies (irrelevant for graphics rendering) and the huge amount of execution units (ATI/AMD's R600 consists of 320 FP32 ALUs -- the Niagara 2 has 8 FP64 ALUs total).
I posted in my journal recently suggesting that it would be easier to produce a modern GPU than an older card, since modern GPUs have much less application-specific logic and do more in software, relying on just having lots of cores / pipelines to give speed.
This is just plain silly. Many stages of the old fixed-function pipeline are still necessarily there in today's GPUs -- triangle setup, rasteriser, render output -- and about the rest, the mere management hardware needed to run DX9 shader programs (not counting the actual ALUs) is more complex than the simple vertex transform engines and pixel register combiners of DX7 vintage.
(And the actual ALUs, taking up most of the real estate together with the huge register files, aren't a walk in the park to design either. Although you can replicate functionality a lot, it's tough to get the whole enchilada efficient -- for example R600 has a complex hierarchy of four banks of sixteen units of a vector + scalar engine pair, with multi-level caching, and multiple shared texture samplers.)
They don't do more in software, they do vastly more complex things in hardware. Although you have a point that they have been somewhat stepping toward general purpose CPUs in their complexity if not in their workflow and role and strengths.
Re:Smokin'... (Score:1, Interesting)
But how can a core run 4 threads with only 2 integer units? (Hint: it doesn't. "Running" doesn't mean the same as "managing"... where Niagara shines is the lightning-quick context switches between the 4 threads -- hiding memory latency like no other design except certainly Niagara 2 -- but no, it doesn't run them in parallel. Never mind what "32 simultaneous threads" goodness Sun has implied and clueless journos repeated.)
Re:Trust me... (Score:3, Interesting)
On the other hand, I have heard that the non-x86 processor families (SPARC, PPC, MIPS, etc.) use some innovative chipsets, so it might be advantageous to be able to use a SPARC processor on a motherboard with a completely different chipset as well.
What I was getting at, basically, is that there's no reason why the PC platform has to be x86-only! The only thing that WAS holding it back is the closed-source nature of the Windows operating system and most applications... case in point: 5 years after x86-64 was introduced, nearly all PCs use 64-bit processors, and yet 64-bit Windows is hopelessly lacking in native applications and drivers. On the other hand, Linux and FreeBSD both had x86-64 ports available before the processors were even available for purchase. I run Ubuntu 64-bit and it works great.
If I could buy a SPARC or MIPS or PPC board that could be physically and electrically integrated into my current PC, I'd definitely give it a try! There's actually not a lot to it... produce a motherboard that conforms to the ATX physical specifications with the right power connector, and include some of the on-board peripherals that PC users have come to expect: ethernet, audio, IDE and SATA controllers.
Back in the late 90s/early 2000s, there were similar things, though unfortunately never popular: PowerPC-based PC platforms and Alpha-based PC platforms. I believe that with the increased prevalence of open-source, this is an idea whose time has come!