Comment Re:Abusive Relationship with Dr. Chu? (Score 1) 113
If your point is that the NRC were abusive *toward* Dr. Chu and his appointees, then I'll agree with you.  But that's not generally what the phrase "abusive relationship with" means.
I'm not sure where the idea came from that there was any sort of relationship at all with Dr. Chu himself described in this article.
All he did was gather up some experts in the field and facilitate their advice to the Japanese. That's exactly what the Secretary of Energy should do.
And yes, some of their suggestions were radical. That's what "brainstorming" means. Coming up with all sorts of ideas and determining as a group which are the good ones and which are the bad ones. Has no one ever seen an episode of House before?
And Dr. Chu, as far as I can tell, was not himself directly involved in the "Chu group," the at-best-misleading-at-worst-inaccurate term used in the article. So to say anyone had an "abusive relationship" with Dr. Chu is just silly.
If a US Diplomat gets into a shouting match with a foreign minister, do we accuse Hillary Clinton of being abusive?
The first thing you need to do is realize you all are in over your heads. If you're desperate enough to post to Slashdot for help, you're already there.
The second thing you need to do is look for a consultant to help you out until you can hire permanent help to fill your vacant positions. I can strongly recommend R Systems (http://www.rsystemsinc.com/). It's run by former NCSA HPC gurus. I've worked with them many times; they have the know-how you need to salvage this mess in short order. You can't call them quickly enough; trust me on that.
Third, to answer some questions. The IB vs. 10GbE debate has been pretty well covered, but just to emphasize: if you need low latency (for tightly-coupled massively parallel processing), you *need* IB. Preferably QDR or FDR. For your core switches, go for a blade-style chassis whose backplane can handle FDR even if you opt for QDR for now. If it can handle EDR, even better, but I'm not sure those are shipping yet. FDR IB data rate is 56Gb/s and latency in the nanoseconds. Ethernet can't touch that yet.
All the scientists working with GPUs here are using nVidia. We've got 2050s and 2070s, so the 2090s are probably the right choice at the moment.
For management, xCat is by far the most scalable solution available right now, though we're working on an alternative. ROCKS does not scale well, largely due to its stateful nature. I'd caution you against using Scyld ClusterWare; it's based on BProc AFAIK, and as one of my friends is the former BProc maintainer, I can tell you that even *he* won't touch it with a ten-foot pole any more. It's too hairy and error-prone; it's also almost impossible to debug. Use something stateless and powerful but still relatively easy to maintain. Most of the large-scale shops (national labs and large academic sites) I know of use xCat or Perceus. Here at LBNL we use both xCat and Perceus with great success.
For Linux distribution, use RHEL or a clone. I'd recommend Scientific Linux 6 at this point. It's the best-run and most professionally-maintained of all the clones.
HTH. Good luck, and condolences on your recent loss(es).
Classic.  I'm really wishing I had mod points right now.
The Rocks approach is nice for quickly regenerating a failed node. And it's CentOS under the covers, as noted, so it's RHEL in disguise. If you're running 16 boxes with dual quad-cores, you'll lose the occasional disk drive. If you run 64 cheap desktops with single-socket dual-cores, you'll lose a disk drive every week or two.
Of course, if you're using a modern (read: stateless) provisioning system, "regenerating a failed node" simply requires a power-cycle. And you lose far fewer disk drives since they're not used for the OS. And replacing a dead node with a new one is a single command and a power button.
Systems like ROCKS only seem great if you haven't used anything else.
I'll preface this by saying that I'm an HPC admin for a major national lab, and I've also contributed to and been part of numerous HPC-related software development projects. I've even created and managed a distribution a time or two.
There are two important questions that should determine what you run. The first is: What software applications/programs are you expecting the cluster to run? While some software is written to be portable to any particular platform or distribution, scientists tend to want to focus more on science than on code portability, so not all code works on all distributions or OS flavors. Small clusters like yours often focus on a few particular pieces of scientific code. If that's the case for you, figure out what the scientists who wrote it use, and lean strongly toward using that.
The second question is, who will run it? Many small, one-off clusters are run by grad students and postdocs who work for their respective PI(s) for some number of years and then leave. In this scenario, it's important to make sure things are as well-documented and industry-standard as possible to ease the transition from one set of student admins to the next. (And yes, PI-owned clusters have a surprisingly long lifespan. Usually no less than 5 years, often longer.) To that end, I strongly recommend Red Hat or Scientific Linux.
We, and most large-scale computational systems groups, use one of two things: RHEL and derivatives, or vendor-provided (e.g., AIX, Cray). We run CentOS but are moving away from it ASAP. The Tri-Labs (Livermore, Sandia, and Los Alamos) use TOSS, which is based on CHAOS (https://computing.llnl.gov/linux/projects.html), which is based on RHEL. Many other sites use Scientific or CentOS. Older versions of Scientific deviated more from upstream, which caused sites like us to use CentOS instead. That's no longer true with SL6, and since CentOS 6 doesn't even exist yet (and RHEL6.1 is already out!), there are strong incentives to move to SL6.
Let me address some other points while I'm at it:
Why RHEL? If you can run RHEL itself, do so. RHEL isn't built with the same compilers it ships with; the binaries are highly optimized. Back when we were working on Caos Linux, we did some benchmarks that showed RHEL (and Caos, FWIW) to be as much as twice as fast as CentOS running the exact same code. So if performance is a consideration, and you can afford a few licenses, it's definitely worth considering. The support can be handy as well, particularly if this is a student-run cluster.
Why Scientific Linux? If you need a free alternative to RHEL or are running at a scale that makes RHEL licensing prohibitive, SL is the way to go, without a doubt. It's maintained professionally by a team at Fermilab whose full-time job is to do exactly that. They know their stuff, and they're paid for it by the DOE. Other rebuild projects suffer from staffing problems, personality problems, and lack-of-time problems that SL simply doesn't have.
Why not Fedora? Stability and reliability are critically important. Fedora is essentially a continuous beta of RHEL. It lacks both the life-cycle and life-span of a long-term, production-quality product.
Why not Gentoo? Pretty much the same answer. The target audience for Gentoo is not the enterprise/production server customer. Source-based distributions do not provide the consistency or reproducibility required for a scale-out computational platform. You'll also have a hard time getting scientific code targeted at Gentoo or other 2nd-tier distributions.
Why not Ubuntu or Debian? Ubuntu is a desktop platform, not a server platform. Again, it boils down to their target market. There's really no value-add in the server space with Ubuntu, so why not just run Debian? If Debian's what your admins know best, it's worth considering, but keep in mind that very, very few computational resources run Debian, so you may have to do a lot more fending for yourself if you go that route.
Why not SLES?  Mostly a personal choice, but with its uncertain future, I'd be hard-pressed to say it's a safe option.  If you have a support contract from, e.g., IBM, that's different.  But judging by your cluster size, I'm going to wager that's not the case.
Why not ROCKS?  Anyone who runs large systems will tell you that stateful provisioning is antiquated at best, largely because it simply doesn't scale well.  ROCKS is firmly locked into the stateful model, and rather than rethinking the design, its developers are trying to find ways to make it faster.  You can only say, "It's just a flesh wound!" so many times before the King is going to call it a draw and gallop on by you.
As for the question about user-friendliness, it depends on the people for whom you wish it to be friendly. If you want friendliness for the admin, what I've seen of Bright Cluster Manager looks promising. (I don't know if Scyld still uses BProc, but what I know about it has thoroughly convinced me never to touch the stuff.) IBM also has its Management Suite for Cloud that looked quite friendly at SC10.
For the users, there are a number of portal options you could try, including one from Adaptive (makers of Moab) that greatly simplifies job submission.  But the truth is, it's just Not That Hard(tm) to write up a template qsub script and hand it off to your users.  You really want to spend more time worrying about how to manage the resource efficiently and competently and make sure you maximize performance and stability.  That's what will get the most science done in the least amount of time...and isn't that really the point?
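To make the "template qsub script" point concrete, here's a minimal sketch of the idea. It assumes a Torque/PBS-style scheduler (the `#PBS` directives and `$PBS_O_WORKDIR`); the helper name `render_job_script`, the queue name "batch", the node counts, and the program path are all made up for illustration, so adjust them to whatever your site actually runs.

```python
# Hypothetical helper: renders a Torque/PBS-style job script from a template.
# Queue name, node counts, and program path are illustrative assumptions.

QSUB_TEMPLATE = """#!/bin/bash
#PBS -N {job_name}
#PBS -q {queue}
#PBS -l nodes={nodes}:ppn={ppn}
#PBS -l walltime={walltime}
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun -np {ranks} {program}
"""


def render_job_script(job_name, program, nodes=1, ppn=8,
                      walltime="01:00:00", queue="batch"):
    """Fill in the template; total MPI ranks = nodes * ppn."""
    return QSUB_TEMPLATE.format(
        job_name=job_name,
        queue=queue,
        nodes=nodes,
        ppn=ppn,
        walltime=walltime,
        ranks=nodes * ppn,
        program=program,
    )


if __name__ == "__main__":
    # Write the output to a file and hand it to the user to submit with qsub.
    print(render_job_script("my_sim", "./my_sim", nodes=2, ppn=8))
```

The users only ever touch the handful of parameters; everything else stays under the admin's control in the template.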
Maybe so, but the comment to which you replied, and with which you disagreed, was specifically about "using a message passing library."  That's MPI, not OpenMP.  It's like responding to someone saying, "I don't like spam!" with "But grilled cheese sandwiches are so much tastier when you put ham on them, so clearly you're wrong!"  Your statement may be technically correct, but as a response to the topic at hand, it is in error.
MPI != OpenMP
HTH.
Double the initial price? That's nothing. Back in my day, we hit TEN TIMES our initial stock price. Even our Friends and Family were millionaires!
Go burst your wussy little bubble wanna-be somewhere else and away from my azaleas.
Damn kids.
Blu-ray was released globally in June 2006; by December 2010, even with PS3s counted, it had a consumer penetration of 10.7%, according to NPD. This is the slowest adoption of a non-fringe video technology in history.
http://www.blu-ray.com/news/?id=4554
For scale, DVD was released in Japan in Nov 96, in the US in March 1997 and in Europe in Oct 1998. Even though it took them two years to get to three continents, it passed the 12% penetration mark in under four months (I can't find a number between 8 and 12%, it penetrated so fast.)
According to this, you are mistaken. It took three years for DVD to reach an appreciable footprint, same as Blu-Ray, and the BD chunk is larger than the DVD chunk after the same time. You also have to take into account that BD had direct competition from HD-DVD, whereas original DVD did not.
http://www.screendigest.com/www/reports/2010629b/10_07_evolution_of_home_entertainment_chart.gif
And compared to VHS, DVD looked just as abysmal.
So. Global release takes almost three and a half years to reach ten percent, whereas Japan-only release passes the 12% mark in under one financial quarter.
Even LaserDisc, the famously failed standard, hit 10% in under two years.
What is your metric for "catching on just fine?" Is it "I own two of them?"
Nice try, but your trolling skills are rusty.
No, it won't, for the same reason that the much more plausible minidisc format failed: it is ridiculously unwieldy, slow, expensive-per-byte, fragile and so on. A blu-ray burner starts around $85, and a writable 5-gig disc is in the neighborhood of $3.50 in bulk.
By comparison, the tiny, fast, durable, reliable MicroSD format will give you a reader/writer that pushes ten times the data rate of blu-ray *and* a cartridge five times the maximum size of a blu-ray disc for seven dollars.
<ad-hominem>Are you on crack?!</ad-hominem>
I can get a 50-pack of BD-R DL for $500. That's $10/disc for 50GB of storage, or $0.20/GB. By comparison, the best price I found for 64GB SDXC was about $140, and $60 for 32GB microSD, roughly $2/GB. The BD media price per GB is BETTER by an ORDER OF MAGNITUDE.
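For anyone who wants to check the math, here's a quick sketch using only the prices and capacities quoted above. The function name `price_per_gb` is just something I made up for illustration.

```python
# Price-per-GB comparison using only the figures quoted in this comment.
# price_per_gb is a hypothetical helper, not from any library.

def price_per_gb(price_usd, capacity_gb, units=1):
    """Cost per gigabyte for a purchase of `units` items of `capacity_gb` each."""
    return price_usd / (capacity_gb * units)


bd_r_dl = price_per_gb(500, 50, units=50)   # 50-pack of 50GB BD-R DL for $500
sdxc_64 = price_per_gb(140, 64)             # one 64GB SDXC card for $140
microsd_32 = price_per_gb(60, 32)           # one 32GB microSD card for $60

if __name__ == "__main__":
    for name, cost in [("BD-R DL", bd_r_dl),
                       ("64GB SDXC", sdxc_64),
                       ("32GB microSD", microsd_32)]:
        print(f"{name}: ${cost:.2f}/GB ({cost / bd_r_dl:.1f}x BD-R DL)")
```

Both flash options come out at roughly ten times the per-GB cost of the BD media.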
Oh, and its stability isn't on the order of single digit year counts.
Again, you are sorely mistaken and providing misinformation (with no evidence or proof whatsoever, mind you) to make your snarky comments look intelligent and well-considered. They aren't.
http://www.techmount.com/index.php/20060905/blu-ray-lifespan/
Blu-Ray discs will last 100-150 years. DVDs start at 10 years. Again, as much as an ORDER OF MAGNITUDE better. SD card life expectancy is similar to that of DVDs; even an SD card specifically designed for long-term, write-once archival storage will only last 100 years, making it comparable AT BEST to Blu-Ray: https://www.pcworld.com/article/199672/sandisks_sd_card_can_store_data_for_100_years.html
Why would anyone *ever* turn to blu-ray for storage? It's flash or tape, guy.
It is absolutely amazing to me that you're attempting to justify hardware choices in terms of the hardware being replaced, while ignoring the alternatives available. That's the kind of thinking one expects from a politician, not from someone with a five digit slashdot id.
How are you on an HPC group at LBNL if you think things like blu-ray will succeed as a storage medium? Do you make clusters of 386es?
Q.E.D.
The question is fundamentally flawed, so there can't really be a valid answer.
Blu-Ray is catching on just fine. I have 2 Blu-Ray players myself, not counting the Blu-Ray burner in my laptop. The quality is noticeably superior to DVD (unless you lack the quality of equipment or visual acuity to discern it). You also have to remember that many Blu-Rays these days also come with DVD copies (and digital copies), so the numbers may be skewed as a result of that.
Even if Blu-Ray doesn't succeed as a movie medium, it will succeed as a data medium due to the simple fact that file sizes are growing and DVDs are too small to provide reasonable backup/archive storage for today's larger drives, much as 3.5" floppies were back in the days of the Floppy Shuffle. A dual-layer BD holds ~6x as much as a dual-layer DVD, so for longer-term archival, BD is a necessary evolution of size.
I have a number of them, but those are my reasons, and they're driven by my experiences and my needs as a user (and, as a sysadmin, the needs of my customers and colleagues). You have to draw your own conclusions.
My advice would be to do your homework. Look at what others are saying about the various options you have: about the projects, about the people behind them...about quality and community and all those things that matter.
Then go with whatever project fits your needs and makes you proud to be part of it. It really is that simple.
Actually, they are mistaken. CentOS came out of the cAos Foundation and was started immediately after the announcement of the cessation of Red Hat Linux. Furthermore, cAos and CentOS have origins in the Vermillion project, which was a rebuild of RHL 7 (before RHEL even existed), and which in turn can trace its own lineage back to "Red Hat Linux with VA Linux Enhancements," or RH-VALE, a rebuild and customization of RHL begun in the late 90's at VA Research/VA Linux.
So they got their history a bit wrong.
Reading their web page, if they build RPMs the way they tell everyone else to build them (in their documentation and presentations), I'd be wary of using their packages.  Proper distribution building and packaging involves controlled, reproducible, well-defined builds inside jails or other similar constructs, not "Joe User building in his ad-hoc home directory build tree."  Just my opinion, of course.
Scientific Linux 6 is already out. See http://ftp1.scientificlinux.org/linux/scientific/6.0/x86_64/os/sl-release-notes-6.0.html for their detailed release notes. If there was any doubt in your mind that the direct rebuild projects are unaffected by this move, there shouldn't be any longer.
It's pretty clear they're trying very hard this time around to stay in lock-step with upstream (what they call TUV and what CentOS calls PNAELV) and add fewer packages into the mix directly. They're also funded to do this work full-time by the US government, and since many universities and national labs rely on SL, it's not going away any time soon.
If you've never tried it before, I encourage you to do so. To quote the old tagline, it's already ready already.
...but no "serious administrator" would be caught dead using linuxconf.
In fact, the only GUI/TUI admin tool a "serious administrator" would ever use was smit/smitty on AIX, and that's only because the F6 key taught you how to do everything the right way (command line) faster than getting up and walking over to the bookshelf to find the appropriate Redbook.
SAM, admintool, linuxconf, YaST...all anathema to a truly "serious" administrator.
PS: And if you read the above and asked yourself, "What's AIX?" the only thing serious about you is your acne. Now pipe down, and get off my lawn.
"355/113 -- Not the famous irrational number PI, but an incredible simulation!"