Intel Reveals More Larrabee Architecture Details 123
Ninjakicks writes "Intel is presenting a paper at the SIGGRAPH 2008 industry conference in
Los Angeles on Aug. 12 that describes features and capabilities of its
first-ever forthcoming many-core architecture, codenamed Larrabee.
Details unveiled in the SIGGRAPH paper include a new approach to the
software rendering 3-D pipeline, a many-core programming model and
performance analysis for several applications. Initial product
implementations of the Larrabee architecture will target discrete graphics
applications, support DirectX and OpenGL, and run existing games and programs.
Additionally, a broad potential range of highly parallel applications including
scientific and engineering software will benefit from the Larrabee native C/C++
programming model."
Re:Good old SIGGRAPH (Score:5, Informative)
Re:Good news (Score:5, Informative)
I think it depends on how much Larrabee will cost, however with what we know so far Apple seems to be heading into multi-CPU architectures, so using Larrabee would make sense.
Larrabee costs somewhere between 150 and 300 Watt, so MacBooks and Mac Minis are not likely to use them. Mac Pro, on the other hand, possibly.
Re:Ray Tracing -v- Rasterization (Score:4, Informative)
No, because the article is about Intel explaining that the purpose of Larrabee is NOT to be specialised like that. It's meant to be a completely programmable architecture that you can use for rasterization, ray tracing, folding, superPi or whatever else you want to program onto it.
Basically, they're trying to say "it's not REALLY a GPU as such, it's actually a really fat, very parallel processor. But you can use it as a GPU if you really want to".
Re:Trying to fight the trend toward specialization (Score:4, Informative)
It almost certainly won't work. In the past, there has been a swing between general and special purpose hardware. General purpose is cheaper, special purpose is faster. When general purpose catches up with 'fast enough' then the special purpose dies. The difference now is that 'cheap' doesn't just mean 'low cost' it also means 'low power consumption,' and special-purpose hardware is always lower power than general-purpose hardware used for the same purpose (and can be turned off completely when not in use).
If you look at something like TI's ARM cores, they have a fairly simple CPU and a whole load of specialist DSPs and DSP-like parts that can be turned on and off independently.
Re:OpenGL (Score:5, Informative)
Re:OpenGL (Score:2, Informative)
You still need an API - which OpenGL provides. On the hardware side of things, few chips actually implement the (idealized) state machine that OpenGL specifies, it's always a driver in between that translates the OpenGL model to the chip model.
Not at all like Cell BE (Score:2, Informative)
> so far it looks like the x86 version of Cell
Then you missed the fact that the article says it uses a coherent 2-level cache for inter-core communications; the Cell BE is quite exotic in that it uses DMA transfers and has no memory coherency between the SPEs.
The article doesn't explicitly state that the Larrabee cores are homogeneous, but I would be surprised if they weren't; the Cell cores are somewhat heterogeneous if you want to use the PowerPC core to squeeze the last drop of processing power out of it.
You are correct in that Intel appears to have copied the ring network of the Cell BE, although I don't understand why they need it in addition to the coherent cache. Oh, well, guess I'll have to wait until the paper really hits the public.
Re:OpenGL (Score:4, Informative)
OpenGL is just an abstraction layer. Mesa implements OpenGL entirely in software. Implementing it 'in hardware' doesn't really mean 'in hardware' either, it means implementing it in software for a coprocessor that has an instruction set better suited to graphical operations than the host machine.
Sure, you could write your own rasteriser for Larrabee, but it wouldn't make sense to do so. If you use an off-the-shelf one then a lot more people are likely to be working on optimising it. And if you're implementing an off-the-shelf rasteriser, then implementing an open specification like OpenGL for the API makes more sense than making everyone learn a new one, and means that there's already a load of code out there that can make use of it.
Anandtech has an excellent article (Score:4, Informative)
That is much more detailed than the one linked in the article summary. It can be found here. [anandtech.com]
Re:C++ programming Model (Score:2, Informative)
In addition, Fortran to C tools have been around for some years. To say that Fortran is the only scientific language is BS. R [r-project.org], S Plus [insightful.com], Octave [wikipedia.org], matlab [mathworks.com], perl and CUDA [nvidia.com] to name a few. Taking R as an example - it provides an code interface that allows you to write optimised C/C++ routines and utilise those in the language itself.
Re:Not everybody plays 3D games (Score:1, Informative)
Nobody is arguing that Intel makes good graphics hardware. They make adequate graphics hardware that the majority use without problems.
Go to any non-gaming office building and tell me how many Intel graphics vs Nvidia and ATI you find. I am willing to bet that most, if not all, of them will be Intel.
Re:Good old SIGGRAPH (Score:3, Informative)
Re:Good news (Score:3, Informative)
They've stated that it will be a 150W+ chip on a PCI Express 2 card, as I recall, and is intended as a GPU, though it will be fully programmable and have CPU capability (so when not doing GPU stuff, it could serve as extra CPUs). It is intended to compete in the high end graphics market.
Essentially, it's a clutch of high performance software vector units in parallel with a bunch of CPUs. Graphics scale with each added processor because it is a software driven architecture, whereas traditional GPUs don't scale because they have a fixed function pipeline (if everything were written for shaders, I would think it would scale). One of the things Intel is touting is Binned rendering (aka chunked or tile rendering), which is breaking the frame into tiles and storing a list of front-to-back polygons in off-chip memory and the tile buffer is scaled to cache. Technically, this should be no faster than z-buffering, but I believe they're sorting and ray casting and in a brute-force sort of way this is faster than z-buffering. What I don't get here is how they get "2-7x the performance" because they have the extra sort step.
By the way, if you look at CPUs, Intel's Core2 line has five power designations:
X - Extreme - power > 75W
E - Standard Desktop 55-75W
T - Standard Mobile 25-55W
L - Low Voltage 15-25W (their name - they mean low power)
U - Ultra Low Voltage - Power < 15W
According to Wikipedia [wikipedia.org] the mini uses mobile processors (the T designation). Max power consumption of most laptops is 80W, so it is likely your mini maxes at 80W.