
Comment MIC presentations at SC11 (Score 3, Informative) 122

I'm at SC11 right now and just attended NIC's MIC presentation. The scaling looks fantastic across the various codes they compiled to run on it, but what was notably absent was performance relative to traditional x86 chips. The final presenter even said that now that the technology has been demonstrated to work (with minimal porting effort required), the next step will be to optimize and improve performance. The takeaway is that, relative to Intel's other chips, MIC performance wasn't impressive enough to include in the presentation. That's fine in my book because it's an ambitious project, but it sounds like there is still some work to do.

Comment Re:Bipartisan support (Score 1) 548

That's the price we pay for free trade. It's not just taxes; it's environmental laws, labor laws, safety regulations, anything that imposes a cost on a company. The problem is that these things are all expensive, and the inherent reasons for companies to keep labor local (shipping costs, training costs, etc.) aren't enough to keep them from moving to where they aren't enforced. We could raise import taxes on countries with unethical laws, but it wouldn't be popular with consumers or with many powerful multinational corporations. Given the current state of political campaign finance laws, it would be unlikely to survive even if it could be passed in the first place.

Unfortunately, the only thing that seems likely to change any of this is for the cost of shipping to become high enough that companies have an incentive to keep production and consumption local. I say that is unfortunate because it increases overall costs, severely limits consumer choice, reduces competition, and encourages local monopolies.

Comment Re:you have no clue at the depth of fraud (Score 4, Informative) 548

You are insane if you think teachers are making $100k/year in retirement. My wife used to teach elementary school and was making ~$40k/year with a master's degree in education. If you take a look at the national teacher averages, that's right in line:

http://www.payscale.com/research/US/All_K-12_Teachers/Salary

OK, let's look at police:

http://www.payscale.com/research/US/Industry=Law_Enforcement/Salary

Wow. Lots of $100k salaries there.

Comment Something Else (Score 2) 320

Hi,

I work for a supercomputing center and am the maintainer of our 1/2 PB Lustre deployment. I also hang out on the GlusterFS and Ceph IRC channels and mailing lists and have spent some time looking at both solutions for some of our other systems.

For what you want, Lustre isn't really the right answer. It's very fast for large transfers (though slow for small ones). On our storage I'm getting about 12 GB/s under ideal conditions, and that's nothing remarkable as far as Lustre deployments go. There are very few other options out there that are competitive at the ultra-high end (i.e., PBs of storage at 100+ GB/s). On the other hand, you *really* need to understand the intricacies of how it works to maintain it properly. It doesn't handle hardware failures very gracefully, and there are still numerous bugs in production releases. A lot of progress has been made since the Oracle acquisition, but it's going to be a while before I'd consider Lustre mainstream. I wouldn't use it for anything other than scratch (i.e., temporary data) storage space on a top500 cluster.
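To put numbers like that in context, here's a back-of-the-envelope sketch of how aggregate bandwidth in a striped filesystem like Lustre is capped by whichever side of the wire saturates first. Every figure in it is an illustrative assumption, not a measurement from our system.

    # Back-of-the-envelope aggregate bandwidth for a striped parallel filesystem.
    # Every figure here is an illustrative assumption, not a measurement.

    def aggregate_bw_gbs(n_clients, client_link_gbs, n_osts, ost_bw_gbs):
        """Streaming bandwidth is capped by whichever side saturates first."""
        return min(n_clients * client_link_gbs, n_osts * ost_bw_gbs)

    # e.g. 200 clients with ~3 GB/s usable links vs. 48 OSTs at ~0.3 GB/s each
    print(aggregate_bw_gbs(200, 3.0, 48, 0.3))   # -> 14.4, i.e. server-side limited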

GlusterFS and Ceph are both interesting. GlusterFS is pretty easy to set up and has a replication mode, but last I heard there were some issues with enabling striping and replication at the same time. Now that Red Hat is backing it, I imagine it's going to pick up in popularity really fast. Also, having the metadata distributed on the storage servers eliminates a major problem that Lustre still has: a single centralized metadata server. Having said that, it's still pretty young as far as these kinds of filesystems go, and it's not immune from problems either. Read through the mailing list.
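If you're wondering what "metadata distributed on the storage servers" buys you, here's a toy sketch of hash-based placement, which is the general idea behind GlusterFS's elastic hashing (this is not Gluster's actual algorithm, and the server names are made up): a file's location is computed from its name, so no central metadata server has to be consulted.

    # Toy sketch of hash-based placement: a file's location is computed from its
    # name, so no central metadata server is consulted on lookup. This is only
    # the general idea behind elastic hashing, not GlusterFS's real algorithm.
    import hashlib

    bricks = ["server1:/export/brick1", "server2:/export/brick1", "server3:/export/brick1"]

    def brick_for(path):
        digest = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return bricks[digest % len(bricks)]

    print(brick_for("/home/alice/results.dat"))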

Ceph is also very interesting, but you should really run it on btrfs, and that's just not there yet. You can also run it on XFS, but there have been some bugs (see the mailing list). Ceph is really neat, but I wouldn't consider it production ready. Rumors abound, though, that DreamHost is going to be making some announcements soon. Watch this space.

Ok, if you are still reading, here's what I would do if I were you:

If you are running on straight-up gigabit Ethernet, you basically have no reason to bother with distributed storage from a performance perspective. 10GE is a cheap upgrade path, and a single server will easily be able to handle the number of clients you'll have on a home network. From a reliability standpoint, I've personally found that something like 70-80% of the hardware problems I have are with hardware RAID controllers. I'd stick with something like ZFS on BSD (or Nexenta if you don't mind staying under 18 TB for the free license), then export via NFS or iSCSI depending on your needs. If you want HA across multiple servers, here's what people are doing on BSD with ZFS:

http://blather.michaelwlucas.com/archives/221
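As a quick sanity check on the "GigE is your bottleneck" point, here's a rough calculation with assumed figures; plug in the numbers for your own disks.

    # Rough sanity check with assumed figures: a single gigabit link tops out
    # around 110-115 MB/s after protocol overhead, which a small ZFS pool of
    # ordinary spinning disks already exceeds.
    gige_usable_mb_s = 115      # ~1 Gb/s minus TCP/NFS overhead (assumption)
    disk_stream_mb_s = 130      # typical 7200 rpm streaming rate (assumption)
    n_data_disks = 4

    array_mb_s = n_data_disks * disk_stream_mb_s
    print(f"array ~{array_mb_s} MB/s vs. network ~{gige_usable_mb_s} MB/s")
    # The wire, not the storage, is the limit; 10GE pushes the bottleneck back to the disks.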

Comment Re:Awesome! (Score 2) 205

Everyone, please note that a slashdotter with a 4 digit UID likes GNOME 3.

Hey bashers, take note! :-)

Wow, that's almost as good as having Linus's endorsement!

Comment Re:Yes, this is legit and no, we're not idiots (Score 1) 387

Hi,

I work for a Supercomputing Institute as an HPC system administrator. I thought you were trolling too. I'm still not entirely convinced, but I'll give you the benefit of the doubt. ;)

First, let me say that you are in a painful situation on multiple levels. Buying the nodes separately from the network (and storage! and GPUs! and infrastructure!) is a recipe for vendor in-fighting. You will almost certainly have at least some issues with one of these things at scale, and there's going to be finger-pointing. Be ready for it. You may want to consider conservative solutions just to increase your chances that everything will work when you hook it up. You should check with your node vendor to see what warranty implications come with putting third-party GPUs in their nodes. If you can get the rest of the equipment from your node vendor and have them put together a plan for how it will all work together, with some kind of acceptance criteria, it's going to dramatically increase your chances of success.

The interconnect isn't going to be as painful to deal with as something like Jaguar, but it's still something you need to think carefully about. You really need to figure out whether the applications you are going to run need low latency and/or high throughput. If that doesn't matter, just do bonded GigE for data and leave at least one GigE port for management. If you need high throughput and are considering 10GE anyway, you should also consider IB. QDR is expensive, but DDR is more reasonable and uses cheaper cables, and you still get lower latencies and higher throughput than 10GE. Just make sure you hire or train someone capable of supporting it. Also be aware that to get full throughput to every node on the machine, you need to buy a solution that really supports that. Don't expect full throughput to every node at once if you cheap out.
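To see why the latency vs. throughput question matters, here's a crude per-message transfer-time model (time = latency + size / bandwidth). The latency and bandwidth figures are rough assumptions for illustration, not quotes for any particular hardware.

    # Crude model of a single message: time = latency + size / bandwidth.
    # Latency/bandwidth figures are rough assumptions for illustration only.
    fabrics = {
        "bonded GigE": (50e-6, 0.23e9),   # ~50 us, ~2 Gb/s usable
        "10GE":        (10e-6, 1.1e9),
        "DDR IB":      (2.5e-6, 1.8e9),
        "QDR IB":      (1.5e-6, 3.2e9),
    }

    for size in (1_000, 1_000_000):       # a 1 KB message vs. a 1 MB message
        for name, (latency, bandwidth) in fabrics.items():
            t = latency + size / bandwidth
            print(f"{size:>9} B over {name:12s}: {t * 1e6:8.1f} us")
    # Small messages are dominated by latency, large ones by bandwidth, which is
    # why you need to know what your codes actually do before paying for QDR.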

For GPUs: why do you need GPUs? Do you have production code ready to run on them? Do you just want a couple of nodes to do development on and play with? If you really need them, you need to figure out the power and cooling your nodes and general infrastructure can handle and plan accordingly. Depending on what you are doing, QPI/HyperTransport will be the bottleneck with 2 GPUs per I/O hub (more like 1 if you do IB), so keep that in mind if you are doing a lot of main-memory-intensive GPU computation. Again, find out what your programmers are using. It's probably CUDA unless they are cutting-edge enough to have switched over to OpenCL. You'll probably want to stick with NVIDIA 2050+ cards for now unless you know that something else will meet your needs (consumer-grade cards in bulk, ATI cards with OpenCL, etc.).
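Here's a ballpark illustration of why the host link becomes the bottleneck for main-memory-heavy GPU work; the numbers are assumptions in the right neighborhood for PCIe 2.0 and a Fermi-class card, not measured values.

    # Ballpark check of the host-link bottleneck for data-heavy GPU work.
    # All numbers are rough assumptions, not measured values.
    pcie2_x16_gb_s = 6.0       # practical PCIe 2.0 x16 host<->device throughput
    gpu_mem_bw_gb_s = 144.0    # Fermi-class device memory bandwidth

    working_set_gb = 4.0       # data staged from host memory per pass (assumption)
    copy_s = working_set_gb / pcie2_x16_gb_s
    onboard_s = working_set_gb / gpu_mem_bw_gb_s

    print(f"host->device copy: {copy_s:.2f} s, on-card traversal: {onboard_s:.3f} s")
    # Re-staging data from host memory every pass is ~20x slower than the card
    # itself, so keep working sets resident on the GPU where you can.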

For infrastructure, do you have water cooling? You should! A system of that size is going to pump out a ton of heat. Look into active water-cooled doors for your racks. This is especially true if you are going to buy a significant number of GPUs. Do you know that you have enough power? Our 1k-node cluster uses 30% more power than the vendor spec'd when running HPL. You already have the nodes, so do some stress testing.
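If it helps, here's that 30% headroom point as a worked example; the per-node wattage is a placeholder assumption, so substitute your vendor's figure.

    # Worked example of the "plan for ~130% of nameplate under HPL" observation.
    # The per-node wattage is a placeholder; substitute your vendor's figure.
    n_nodes = 1000
    vendor_node_watts = 350    # vendor nameplate per node (assumption)
    hpl_factor = 1.30          # headroom observed under HPL

    nameplate_kw = n_nodes * vendor_node_watts / 1000
    print(f"nameplate: {nameplate_kw:.0f} kW, plan for: {nameplate_kw * hpl_factor:.0f} kW, plus cooling")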

As far as the operating system goes, you may not be able to run Rocks at that scale. We don't use Rocks at all, but the people I know who do say it's not really appropriate for systems with more than a thousand nodes. xCAT with RH/CentOS/Scientific Linux/SLES should work well. SLES licenses can be pretty expensive per year to maintain, and I'm not sure it's really worth it over the others (we run SLES here). Scientific Linux may not be the best to run on IB-connected machines, as most of its bigger users use GigE.

I feel bad saying this, but you guys are in over your heads. Frankly, it kind of sounds like you were in over your heads before your other staff departed; you just didn't know it yet. Go back to your node vendor and see if they can put together a plan for you. There's lots of HPC on the East Coast; you should be able to find a qualified consultant who can really look at your situation in detail and help you out. The worst thing you could do at this point is try to buy a bunch of commodity parts and cobble something together.

Comment It was fun (Score 1) 1521

Hi Rob,

It was fun hanging out with you in #e along with Raster, Mandrake, Snowman, and all the rest of the gang back in the old days, when Enlightenment and AfterStep were the newest, coolest things on the block. Whatever you end up deciding to do, I wish you the best of luck. Thanks for everything!

Mark

Comment lots of possibilities (Score 1) 174

Just off the top of my head:

- They don't want the OEMs to run to Google every time someone threatens them with patent lawsuits.
- Why stop Microsoft from pissing all over the OEMs? Does Google need OEM goodwill more than they would like Microsoft to have OEM ill-will?
- Maybe they want something from the OEMs to help them out that the OEMs aren't willing to agree to.
- Perhaps they don't want to escalate a patent war with Microsoft.

Comment Re:Perfect for Bitcoin mining! (Score 2) 184

I don't necessarily have an opinion regarding your discussion, but I wanted to point out that power draw specs can be incorrect, especially now that everyone is trying to be green. We have a large cluster here that we ended up having to install extra power for because the machine would shut down during HPL runs. The vendor (and this is not a small vendor) told us that for HPL, you have to spec power for 130% utilization instead of 100%. Now HPL is pretty intense, but it's something to keep in mind.

Comment Building Clusters (Score 5, Informative) 264

Hi,

I work at a Supercomputing Institute. You can run many different OSes and be successful with any of them. We run SLES on most of our systems, but CentOS and Red Hat are fine, and I'm using Ubuntu successfully for an OpenStack cloud. Rocks is popular, though it ties you to certain ways of doing things, which may or may not be your cup of tea. Certainly it offers a lot of common cluster software prepackaged, which may be what you are looking for.

More important than the OS are the things that surround it. What does your network look like? How are you going to install nodes, and how are you going to manage software? Personally, I'm a fan of using dhcp3 and tftpboot along with Kickstart to network-boot the nodes and launch installs, then network-boot with a pass-through to the local disk when they run. Once the initial install is done, I use Puppet to take over the rest of the configuration management for the node based on a pre-configured template for whatever job that node will serve (for clusters it's pretty easy since you are mostly dealing with compute nodes). It becomes extremely easy to replace nodes by just registering their MAC address and booting them into an install. This is just one way of doing it, though. You could use Cobbler to tie everything together, or use FAI. xCAT is popular on big systems, or you could use SystemImager, or replace Puppet with Chef or CFEngine... Next you have to decide how you want to schedule jobs. You could use Torque and Maui, or Sun Grid Engine, or SLURM...
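For the MAC-registration step in particular, here's a minimal sketch of generating ISC dhcpd host entries from a node list. The hostnames, MACs, IPs, and output filename are hypothetical; adapt them to your own subnet and config layout.

    # Minimal sketch of the "register the MAC and PXE-boot it" workflow: generate
    # ISC dhcpd host entries from a node list. Hostnames, MACs, IPs, and the
    # output filename are hypothetical; adapt them to your own site.
    nodes = {
        "node001": ("00:1e:67:aa:bb:01", "10.1.0.1"),
        "node002": ("00:1e:67:aa:bb:02", "10.1.0.2"),
    }

    stanza = """host {name} {{
      hardware ethernet {mac};
      fixed-address {ip};
      filename "pxelinux.0";
      next-server 10.1.0.254;
    }}"""

    with open("dhcpd-nodes.conf", "w") as handle:
        for name, (mac, ip) in sorted(nodes.items()):
            handle.write(stanza.format(name=name, mac=mac, ip=ip) + "\n\n")
    # Include the generated file from dhcpd.conf; adding a node is one more
    # dictionary entry followed by a PXE boot into the kickstart install.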

Or, if you are only talking about 8-16 nodes, you could just manually install Ubuntu on the nodes, pdsh apt-get update, and make people schedule their jobs on Google Calendar. ;) For the size of cluster you are talking about, and what I assume is probably a very limited administration budget, that might be the best way to go. Even with something like Rocks you are going to need to know what's going on when things break, and it can get really complicated really fast.

Comment Re:This is why... (Score 1) 317

I felt similarly until I left my (higher-paying) job in industry and went back to work in academia at a supercomputing institute. It's not perfect, but I'm a lot happier there. Why? Because the values of the people there, and of the Institute itself, are a lot closer to mine than where I used to work. If you really care about openness, freedom, and open source, there are ways you can surround yourself with it and still make a living. It may not be as glamorous and it may not pay as well, but it's still possible.

Security

Submission + - Allegations of an OpenBSD Backdoor Not Confirmed (infoq.com)

aabelro writes: Some allegations regarding backdoors implemented at the FBI's request in OpenBSD's IPsec stack were made earlier this month. After auditing the code, Theo de Raadt, the founder of OpenBSD, has concluded that there are no such threats in the open source operating system.
