
Comment Roll it yourself but take responsibility (Score 1) 219

Supermicro has 36- and 72-drive chassis that aren't horrible in terms of human effort (you can get 90-drive chassis, but I wouldn't recommend it). You COULD get 8TB drives for about 9.5 cents/GB (including the $10k 4U chassis overhead). 4TB drives will be more practical for rebuilds (and performance), but will push you to nearly 11 cents/GB. You can go with 1TB or even 1/2TB drives for performance (and faster rebuilds), but now you're up to 35 cents/GB.

That's roughly 288TB raw (36 x 8TB) for, say, $30k per 4U chassis. If you need 1/2 PB, I'd say spec out 1.5PB, which puts you at $175k to $200k. But you can grow into it.
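The arithmetic above can be checked with a quick back-of-envelope script. This is only a sketch: the function names are made up for illustration, and it assumes the 36-bay chassis fully populated with 8TB drives at the $30k all-in figure quoted above.

```python
import math

def chassis_needed(target_tb, drives_per_chassis=36, drive_tb=8):
    """Return (number of chassis, raw TB per chassis) for a raw-capacity target."""
    raw_per_chassis = drives_per_chassis * drive_tb
    return math.ceil(target_tb / raw_per_chassis), raw_per_chassis

def cents_per_gb(chassis_cost_usd, raw_tb):
    """All-in cost per raw GB, in cents (decimal units: 1TB = 1000GB)."""
    return chassis_cost_usd * 100 / (raw_tb * 1000)

# 1.5PB raw target, $30k per populated 4U chassis (figures from the comment).
count, raw_tb = chassis_needed(1500)
total_usd = count * 30_000
print(count, raw_tb, total_usd, round(cents_per_gb(30_000, raw_tb), 1))
```

With those inputs it comes out to 6 chassis and $180k, which sits inside the $175k to $200k range.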

Note this is for ARCHIVE, as you're not going to get any real performance out of it.. The CPU-to-disk ratio is too low.. I'm not even sure the motherboard can saturate a 40Gbps QSFP link and a $30k switch. That's kind of why Hadoop setups with a cheap single CPU + 4 direct-attached HDs are so popular.

At that size, I wouldn't recommend just RAID-1ing, LVMing, ext4ing (or btrfsing), then n-way foldering, then NFS mounting... You'll have problems when hosts go down, and with keeping the rest of the network from stalling / timing out.

Note, you don't want to 'back-up' this kind of system.. You need point-in-time snapshots.. And MAYBE periodic write-to-tape.. Copying is out of the question, so you just need a file-system that doesn't let you corrupt your data. DEFINITELY data has to replicate across multiple machines - you MUST assume hardware failure.

The problem is going to be partial network down-time, crashes, or stalls, and regularly replacing failed drives.. This kind of network is defined by how well it performs when 1/3 of your disks are in 1-week-long rebuild periods. Some systems (like HDFS) don't care about hardware failure.. There's no rebuild, just a constant sea of scheduled migration-of-data.

If you only ever schedule temporary bursts up to 80% capacity (probably even that is too high), and have a system that only consumes 50% of disk IO to rebuild, then a 4TB disk would take about 12 hours to re-replicate. If you have an intelligent system (EMC, NetApp, DDN, hdf, etc), you could get that down to 2 hours per disk (due to cross rebuilding).
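Here's a rough sketch of where the 12-hour and 2-hour figures can come from. Two numbers are my assumptions, not from any vendor: roughly 150 MB/s of sequential throughput per drive, and "cross rebuilding" modeled as 6 peer disks each contributing their rebuild share in parallel.

```python
def rebuild_hours(disk_tb, fill_fraction, disk_mb_per_s,
                  rebuild_io_fraction, parallel_sources=1):
    """Estimate the hours needed to re-replicate one failed disk.

    disk_mb_per_s is an assumed sequential throughput per drive;
    parallel_sources is a crude stand-in for cross rebuilding.
    """
    bytes_to_copy = disk_tb * 1e12 * fill_fraction
    bytes_per_s = disk_mb_per_s * 1e6 * rebuild_io_fraction * parallel_sources
    return bytes_to_copy / bytes_per_s / 3600

# 4TB disk, 80% full, half of an assumed 150 MB/s spent on the rebuild.
single = rebuild_hours(4, 0.8, 150, 0.5)
# Same disk, rebuilt from 6 peers at once (assumed fan-out).
crossed = rebuild_hours(4, 0.8, 150, 0.5, parallel_sources=6)
print(f"{single:.1f} h single-source, {crossed:.1f} h cross-rebuilt")
```

Plug in your own measured throughput under real load; the estimate is only as good as that number.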

I'm a big fan of object-file-systems (generally HTTP based).. That'll work well with the 3-way redundancy. You can typically fake out a POSIX-like file-system with FUSE.. You could even emulate CIFS or NFS. It's not going to be as responsive (high latency). Think S3.

There are also "experimental" POSIX systems like Ceph, GPFS, and Lustre. Very easy to screw up if you don't know what you're doing. And really painful to re-format after you've learned it's not tuned for your use-case.

HDFS will work - but it's mostly for running jobs on the data.

There's also AFS.

If you can afford it, there are commercial systems to do exactly what you want, but you'll need to triple the cost again. Just don't expect a fault-tolerant multi-host storage solution to be as fast as even a dedicated laptop drive. Remember when testing.. You're not going to be the only one using the system... Benchmarks perform very differently when under disk-recovery or random-scatter-shot load from random elements of the system - including copying in all that data.

Comment Git for large files (Score 1) 383

Git is an excellent system, but is less efficient for large files. This makes certain work-flows difficult to put into a git repository, e.g. storing compiled binaries, or keeping non-trivial test data sets. Given the 'cloud', do you foresee a version of git that uses 'web-resources' as SHA-able entities, to mitigate the proliferation of pack-file copies of said large files? Otherwise, do you have any thoughts / strategy for how to deal with large files?
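To make "web-resources as SHA-able entities" concrete, here's a minimal sketch. The hashing part is how git names a blob today (SHA-1 over a `blob <size>\0` header plus the content). The pointer record, its field names, and the example URL are purely hypothetical, just to show that a repo could store a small record and verify the bytes fetched from elsewhere.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

def make_pointer(url: str, data: bytes) -> dict:
    """Hypothetical pointer record that would be stored instead of the file."""
    return {"url": url, "oid": git_blob_id(data), "size": len(data)}

def verify(pointer: dict, fetched: bytes) -> bool:
    """Check that bytes fetched from pointer['url'] match the recorded ID."""
    return (len(fetched) == pointer["size"]
            and git_blob_id(fetched) == pointer["oid"])

payload = b"pretend this is a 2GB test data set"
ptr = make_pointer("https://example.com/data/set1.bin", payload)  # made-up URL
print(ptr["oid"], verify(ptr, payload), verify(ptr, payload + b"!"))
```

The repo would only carry the pointer, so clones stay small, and the hash still catches a resource that changed behind the URL.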

Comment network-operating systems (Score 1) 383

Have you ever considered a network-transparent OS layer? If not, why not? I once saw QNX and how the command line made little differentiation of which server you were physically on (run X on node 3, ps across all nodes). You ran commands pretty much on any node of consequence.. I've ALWAYS wanted this capability in Linux... cluster-ssh is about as close as I've ever gotten. These days Hadoop/Storm/etc give a half-assed approximation.

Comment GPU kernels (Score 4, Interesting) 383

Is there any inspiration that a GPU-based kernel / scheduler has for you? How might Linux be improved to better take advantage of GPU-type batch execution models, given that you worked at Transmeta on JIT-compiled, host-targeted runtimes? GPUs' 1,000-thread schedulers seem like the next great paradigm for the exact type of machines that Linux does best on.
Open Source

When Enthusiasm For Free Software Turns Ugly 177

An anonymous reader writes: Bruce Byfield writes for Linux Magazine about the unfortunate side-effect of people being passionate about open source software: discussions about rival projects can get heated and turn ugly. "Why, for example, would I possibly want to see OpenOffice humiliated? I prefer LibreOffice's releases, and, with some misgivings, the Free Software Foundation's philosophy and licensing over that of the Apache Foundation. I also question the efficiency of having two office suites so closely related to each other. Yet while exploring such issues may be news, I don't forget that, despite these differences, OpenOffice and the Apache Foundation still have the same general goals as LibreOffice or the Free Software Foundation. The same is true of other famous feuds. Why, because I have a personal preference for KDE, am I supposed to ignore GNOME's outstanding interface designs? Similarly, because I value Debian's stability and efforts at democracy, am I supposed to have a strong distaste for Ubuntu?"
