Slashdot is powered by your submissions, so send in your scoop
Comment Re:Topology matters more than GFLOPS (Score 1) 59

Comparing LINPACK numbers makes sense. But GFLOPS (or TFLOPS or PFLOPS or whatever), by itself, is a meaningless and misleading number. Most people just stop thinking when they see a single metric like GFLOPS, and then they compare GFLOPS in one system to GFLOPS in another system. If those systems *are* comparable, then fine. But often enough, they are not comparable.

Also, it wasn't a rant. A rant would have involved caps lock.

Comment Re:Topology matters more than GFLOPS (Score 1) 59

Yes, many problems can be expressed as dense linear algebra, and measuring and comparing LINPACK performance makes sense for those problems. However, many problems don't map well to dense linear algebra. The Berkeley "parallel dwarfs" paper expresses this idea better than I ever could: http://view.eecs.berkeley.edu/wiki/Dwarfs

Comment Topology matters more than GFLOPS (Score 5, Insightful) 59

I really, really wish articles would stop saying that computer X has Y GFLOPS. It's almost meaningless, because when you're dealing with that much CPU power, the real challenge is to make the communications topology match the computational topology. That is, you need the physical structure of the computer to be very similar to the structure of the problem you are working on. If you're doing parallel processing (and of course you are, for systems like this), then you need to be able to break your problem into chunks and map each chunk to a processor. Some problems are more easily divided into chunks than others. (Go read up on the "parallel dwarfs" for a description of how things can be divided up, if you're curious.)

I'll drill into an example. If you're doing a problem that can be spatially decomposed (fluid dynamics, molecular dynamics, etc.), then you can map regions of space to different processors. Then you run your simulation by having all the processors run for X time period (on your simulated timescale). At the end of the time period, each processor sends its results to its neighbors, and possibly to "far" neighbors if the forces exceed some threshold. In the worst case, every processor has to send a message to every other processor. Then you run the simulation for the next time chunk. Depending on your data set, you may spend *FAR* more time sending intermediate results between processors than you spend actually running the simulation.

That's what I mean by matching the physical topology to the computational topology. In a system where communication cost dominates computation cost, adding more processors usually doesn't help you *at all*, and can even slow the entire system down further. So it's really meaningless to say "my cluster can do 500 GFLOPS" unless you are talking about time actually spent doing productive simulation, not time wasted waiting for communication.
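To make the communication-vs-computation point concrete, here's a back-of-the-envelope model in Python. Every constant in it (flop rate, bandwidth, latency, bytes per boundary cell, the cubic-chunk halo estimate) is a made-up illustrative number, not a measurement of any real machine. The point it sketches: as you spread a fixed-size problem over more processors, each chunk's compute time shrinks faster than its halo-exchange time, so the communication fraction climbs.

```python
# A toy model (my own simplification, not from any real cluster) of how
# per-step time splits between computation and halo-exchange communication
# as a 3D spatial decomposition is spread over more processors.

def step_time(n_cells, n_procs, flops_per_cell=200.0,
              flop_rate=1e11, bytes_per_face_cell=64.0,
              bandwidth=1e10, latency=2e-6, neighbors=6):
    """Estimate (compute_seconds, comm_seconds) per step for one chunk."""
    cells_per_proc = n_cells / n_procs
    compute = cells_per_proc * flops_per_cell / flop_rate
    # A roughly cubic chunk of C cells has about 6 * C**(2/3) boundary
    # cells to exchange with its 6 face neighbors each step.
    face_cells = 6.0 * cells_per_proc ** (2.0 / 3.0)
    comm = neighbors * latency + face_cells * bytes_per_face_cell / bandwidth
    return compute, comm

for procs in (64, 4096, 262144):
    compute, comm = step_time(n_cells=1e9, n_procs=procs)
    frac = comm / (compute + comm)
    print(f"{procs:>7} procs: compute={compute:.2e}s "
          f"comm={comm:.2e}s comm fraction={frac:.0%}")
```

With these made-up numbers, the communication fraction goes from a few percent at 64 processors to dominating the step time at a quarter million, which is the sense in which more FLOPS stop helping.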

Here's a (somewhat dumb) analogy. Let's say a Formula 1 race car can do a nominal 250 MPH. (The real number doesn't matter.) If you had 1000 F1 cars lined up, side by side, then how fast can you go? You're not going 250,000 MPH, that's for sure.

I'm not saying that this is not a real advance in supercomputing. What I am saying is that you cannot measure the performance of any supercomputer with a single GFLOPS number. It's not an apples-to-apples comparison unless you really are working on the exact same problem (like molecular dynamics). And in that case, you need some unit of measurement that is specific to that kind of problem. For molecular dynamics, you could quantify the number of atoms being simulated, the average bond count, and the length of simulated time in every "tick" (the simulation time unit). THEN you could talk about how many of that unit your system can do per second, rather than a meaningless number like GFLOPS.
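As a sketch of the kind of problem-specific unit I mean: molecular dynamics people commonly report simulated nanoseconds per wall-clock day rather than GFLOPS. The timestep and throughput numbers below are hypothetical, just to show the arithmetic:

```python
# Problem-specific metric for molecular dynamics: simulated nanoseconds
# per day of wall-clock time, instead of raw GFLOPS.

def ns_per_day(timestep_fs, steps_per_second):
    """Simulated nanoseconds per wall-clock day."""
    ns_per_step = timestep_fs * 1e-6        # 1 fs = 1e-6 ns
    return ns_per_step * steps_per_second * 86400.0

# Hypothetical systems: identical hardware GFLOPS can yield wildly
# different throughput depending on atom count and topology.
print(ns_per_day(timestep_fs=2.0, steps_per_second=500.0))  # small protein
print(ns_per_day(timestep_fs=2.0, steps_per_second=20.0))   # huge system
```

Two machines with the same GFLOPS rating can differ by an order of magnitude on this metric, which is exactly the information the single FLOPS number hides.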

Comment Re:Powershell (Score 2) 1154

I hope you realize that PowerShell is totally extensible, and fully supports reading/writing text streams as well as object streams. You can do exactly what you described in PowerShell; not knowing that doesn't make it untrue. So there is no barrier between PowerShell and non-Microsoft technologies. You can write PowerShell scripts, write PowerShell cmdlets in C#, or read/write flat text data and process it, exactly the way you would in /bin/bash or whatever. If you spent a little time learning more about PowerShell, you would see that it is *not* a walled garden at all.

Comment Re:New Memory Technologies - The Impact (Score 2) 139

Uhhh, no. The cache hierarchy was added over time. The first few generations of computers did not have caches at all. Even the processors that powered a lot of early PC-era machines had no cache, unless you count the registers: the 8086/8088 had no cache, and neither did the 6502, 6800, or 68000. Cache hierarchies were added later.

The cache hierarchy also continually changes, albeit at a slow pace. Current-generation CPUs typically have a 3-level cache, but the cache hierarchy in GPUs is quite different. You also have to take into account cache-coherent architectures (easy to program, inherently non-scalable) vs. non-coherent architectures (harder to program, far more scalable). It's not the case that you just always want more and more cache -- you want the right kind and size of cache for the problem you are working on. For example, GPUs have a lot of local, read-only, non-coherent caches, used for texture sampling and for constant buffers. These are very specialized caches that don't look much like the general-purpose L2 and L3 caches of CPUs. (The L1 I-cache and D-cache in a CPU are also very specialized, but differently specialized than a GPU cache.)
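To illustrate why the "right kind" of cache matters more than raw size, here's a toy direct-mapped cache simulator. This is my own simplification for the sake of the argument (real caches are set-associative with smarter indexing and replacement), but it shows how the same cache gives near-perfect hits on a sequential walk and zero hits on a pathological stride:

```python
# Toy direct-mapped cache simulator: access pattern, not just cache size,
# decides the hit rate. Sizes and strides are illustrative only.

def hit_rate(addresses, num_lines=64, line_size=64):
    """Simulate a direct-mapped cache; return the fraction of hits."""
    tags = [None] * num_lines              # one tag slot per cache line
    hits = 0
    for addr in addresses:
        block = addr // line_size          # which memory block
        idx = block % num_lines            # which cache line it maps to
        if tags[idx] == block:
            hits += 1
        else:
            tags[idx] = block              # miss: fill the line
    return hits / len(addresses)

sequential = [i * 8 for i in range(4096)]                 # walk 8-byte words
strided = [(i * 4096) % (4096 * 8) for i in range(4096)]  # 4 KiB stride

print(f"sequential: {hit_rate(sequential):.0%}")  # ~88%: 7 of every 8
                                                  # words share a line
print(f"strided:    {hit_rate(strided):.0%}")     # 0%: every access
                                                  # conflicts on line 0
```

The strided pattern maps every access to the same cache line, so a cache that is plenty big for the working set still misses on every single access. That's the sense in which you want the right cache for the problem, not just more of it.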

Comment Re:Don't hire union workers (Score 2) 487

I'm an American, and AC has it totally right. German labor dynamics are awesome. Union/mgmt relations in the US are almost inherently adversarial. It's about how best to screw the other guy, not how to succeed together. My dad has been a machinist (tool and die, now primarily CNC) for 40 years, and has worked in union shops and non-union shops. The union shops were only marginally better, if at all. The best shop he ever worked at (and has worked at for 20 years now) is a non-union shop, run by... Germans.

Comment Re:AMD64 != Intel64 (Score 2) 101

Lots and lots of SDKs, packages, etc. in the Microsoft world use "x64". One example (among an endless stream of examples): The DirectX SDK uses "x64" for the binary and lib directories. Lots of installer packages use "x64" subdirs or use x64 in the name of the setup executable, etc. Another example: If you run msinfo32.exe on an x64 system, the "System Type" is listed as "x64-based PC". "x64" and "amd64" are both used quite a lot.

It's really rude to tell someone to "suck it". Be nice.
