maraist - Slashdot User

Comment Roll it yourself but take responsibility (Score 1) 219

by maraist on Saturday July 25, 2015 @05:16PM (#50182439) Attached to: Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)

Super-Micro has 36 and 72 drive racks that aren't horrible human effort wise (you can get 90 drive racks, but I wouldn't recommend it). You COULD get 8TB drives for like 9.5 cent / GB (including the $10k 4U chassi overhead). 4TB drives will be more practical for rebuilds (and performance), but will push you to near 11c / GB. You can go with 1TB or even 1/2TB drives for performance (and faster rebuilds), but now you're up to 35c / GB.

That's roughly 288TB of RAW for say $30k 4U. If you need 1/2 PB, I'd say spec out 1.5PB - thus you're at $175K .. $200k.. But you can grow into it.

Note this is for ARCHIVE, as you're not going to get any real performance out of it.. Not enough CPU to disk ratio.. Not even sure if the MB can saturate a 40Gbps QSFP links and $30k switch. That's kind of why hadoop with cheap 1CPU + 4 direct-attached HDs are so popular.

At that size, I wouldn't recommend just RAID-1ing, LVMing, ext4ing (or btrfsing) then n-way foldering, then nfs mounting... Since you have problems when hosts go down and keeping any of the network from stalling / timing out.

Note, you don't want to 'back-up' this kind of system.. You need point-in-time snapshots.. And MAYBE periodic write-to-tape.. Copying is out of the question, so you just need a file-system that doesn't let you corrupt your data. DEFINITELY data has to replicate across multiple machines - you MUST assume hardware failure.

The problem is going to be partial network down-time, crashes, or stalls, and regularly replacing failed drives.. This kind of network is defined by how well it performs when 1/3 of your disks are in 1-week-long rebuild periods. Some systems (like HDFS) don't care about hardware failure.. There's no rebuild, just a constant sea of scheduled migration-of-data.

If you only ever schedule temporary bursts of 80% capacity (probably even too high), and have a system that only consumes 50% of disk-IO to rebuild, then a 4TB disk would take 12 hours to re-replicate. If you have an intelligent system (EMC, netapp, ddn, hdf, etc), you could get that down to 2 hours per disk (due to cross rebuilding).

I'm a big fan of object-file-systems (generally HTTP based).. That'll work well with the 3-way redundancy. You can typically fake out a POSIX-like file-system with fusefs.. You could even emulate CIFS or NFS. It's not going to be as responsive (high latency). Think S3.

There's also "experimental" posix systems like ceph, gpfs, luster. Very easy to screw up if you don't know what you're doing. And really painful to re-format after you've learn it's not tuned for your use-case.

HDFS will work - but it's mostly for running jobs on the data.

There's also AFS.

If you can afford it, there are commercial systems to do exactly what you want, but you'll need to tripple the cost again. Just don't expect a fault-tolerant multi-host storage solution to be as fast as even a dedicated laptop drive. Remember when testing.. You're not going to be the only one using the system... Benchmarks perform very differently when under disk-recovery or random-scatter-shot load by random elements of the system - including copying-in all that data.

Comment Git for large files (Score 1) 383

by maraist on Thursday June 25, 2015 @07:50PM (#49991109) Attached to: Interview: Ask Linus Torvalds a Question

Git is an excellent system, but is less efficient for large files. This makes certain work-flows difficult to put into a git-repository. i.e. storing compiled binaries, or when having non-trivial test-data-sets. Given the 'cloud', do you forsee a version of git that uses 'web-resources' as SHA-able entities, to mitigate the proliferation of pack-file copies of said large files. Otherwise, do you have any thoughts / strategy for how to deal with large files?

Comment Package managers (Score 1) 383

by maraist on Thursday June 25, 2015 @07:47PM (#49991077) Attached to: Interview: Ask Linus Torvalds a Question

What is your preference for package management? E.g. for a new library, which flavor do you reach for? rpm? tgz? deb? git-clone? home-brew? </wink>

Comment BSD/OSX (Score 1) 383

by maraist on Thursday June 25, 2015 @07:45PM (#49991065) Attached to: Interview: Ask Linus Torvalds a Question

What is your opinion of BSD and the OS-X stack? Ever use them? Do you envy and of their attributes?

Comment Hot heads (Score 1) 383

by maraist on Thursday June 25, 2015 @07:43PM (#49991049) Attached to: Interview: Ask Linus Torvalds a Question

How do you deal with hot heads? You are bound to have interacted with some of the most self-important, bombastic, difficult people on earth in your days. Do you have any advice how you've managed to keep your cool?

Comment C v.s. the world? (Score 1) 383

by maraist on Thursday June 25, 2015 @07:39PM (#49991025) Attached to: Interview: Ask Linus Torvalds a Question

Is C your favorite language? Given your ability to invent your way out of frustration, I find it hard to believe C hasn't frustrated you into innovation.. So many languages were created for that very reason. Perl, python, dart, go, rust, etc.

Comment network-operating systems (Score 1) 383

by maraist on Thursday June 25, 2015 @07:34PM (#49990989) Attached to: Interview: Ask Linus Torvalds a Question

Have you ever considered a network-transparent OS-layer? If not why? I once saw QNX and and how the command line made little differentiation of which server you were physically on. (run X on node 3, ps (across all nodes)). You ran commands pretty much on any node of consequence.. I've ALWAYS wanted this capability in Linux... cluster-ssh is about as close as I've ever gotten. These days hadoop/storm/etc give a half-assed approximation.

Comment GPU kernels (Score 4, Interesting) 383

by maraist on Thursday June 25, 2015 @07:30PM (#49990961) Attached to: Interview: Ask Linus Torvalds a Question

Is there any inspiration that a GPU based kernel / scheduler has for you? How might Linux be improved to better take advantage of GPU-type batch execution models. Given that you worked transmeta and JIT compiled host-targetted runtimes. GPUs 1,000-thread schedulers seem like the next great paradigm for the exact type of machines that Linux does best on.

Comment grand-central (queuing) (Score 1) 383

by maraist on Thursday June 25, 2015 @07:26PM (#49990939) Attached to: Interview: Ask Linus Torvalds a Question

What are your thoughts on apple's GrandCentral? It seems that a central kernel managed queuing system would be less overhead than having 10 apps each launching 10x num-cpu's threads and all over-subscribing.

Comment To 3D print out woes away (Score 1) 888

by maraist on Friday February 14, 2014 @05:19PM (#46250209) Attached to: Star Trek Economics

I'm not seeing the connecting-gap between ' Amercans no longer fret over iPhones (because we can print one with a 3D printer ) ' and 'We can build a star-ship because we've decoupled interest-in-work from the-need-to-work-to-earn-money-to-survive / acquire the things we wish to have'.

I don't fundamentally understand how a star trek society can exist. If we can all convert energy into material things. Consider the fabel, "these are rich people's problems".. Meaning the stresses that make us work harder are ultimately enslave us to our commitments, _change_ as we get wealthier (individually and socially), but they do not disappear.

You might consider the man that has earned enough money that he can go back-packing in Asia for 10 years.. Could the world function if everybody did so? Assume even that we had robots to build houses / plant our food. SOMETHING is always going to be present that prevents eutopia, even 1,000 years after such a world.

It's too narrow minded to look at today's problems, remove a single variable and say; now sci-fi happens.

Comment Re: USB is slower with high CPU overhead (Score 2) 141

by maraist on Wednesday March 06, 2013 @06:21PM (#43098049) Attached to: Seagate's New SSHD Hybrids Have Dual-Mode Flash Caches

Usb 3 is no where near as fast as even esata 2. Ignore the hype. BW is rarely the limiting factor. I can regularly get ssd boosts of 30% on the "slower" esata.

Comment Re:Strange title.... (Score 1) 286

by maraist on Saturday July 21, 2012 @08:30AM (#40722833) Attached to: Why You Shouldn't Write Off Google+ Just Yet

Not sure why we'd have another dot-crash. Lets compare

1999:
The world was in un-fettered prosperity
The government was the good-guys with surpluses and expanding state/local infrastructure.. Fiber was being laid. Communism was failing
People would quit the corporate world and be driven by silly business plans to build entire small-business capital ventures
The market saw growth-growth-growth
The pricing grew to reflect the short-term trend - the lead-in to a generic business plan - it self-fed (unsustainable exponential growth)
Then when the generic business plan got into the ROI phase.. there was NO ROI.... All the business plans failed at the same time
The market tanks
The world economy restabalizes (note, doesn't crash..yet)
All those small firms put a lot of people in under-employment (less shipping, less flying, less office-supplies, less construction, less luxry purchases, etc).
Local municipalities/labor-intensive-corporation had contractually obligated themselves to 7% annual growth for pension plans
Said market collapse and re-stabalization with more modest 4% growth, brings projected short-falls EVERYWHERE
2002:
World governments over-react (including to 9/11) - drop interest rate to near zero
under-employed masses react (as intended) by borrowing
The ONLY viable investment at this point is the still-growing land/gold-inflation. (e.g. finite-resource ownership).
Both hyper-inflate.
Producing another lead-in to a business plan that will have exponential growth and ultimately super-saturate ROI and thus pop - nothing would prevent this BUT
Newly deregulated banks now cross-buy their depleted LOSS-MAKING pension-funds (due to 2000 collapse) into the ONLY profit making venture, the obvious-bubble-making finite-resource market (gems, land, etc). Gems run the risk of a precious metal rush (e.g. uncovering a massive gold main). Housing is highly contingent upon the pyramid scheme.. Need more buyers than sellers - can't perpetuate unless you have an abundant birth rate (WHICH IS DROPPING).
World banks determine inflation is too high.. They jack up interest rates.
This chokes but does not end the bank-borrowing growth rate
Deregulated banks get more clever and aggressive with their loan practices - new forms of insurance (CDF) allows them to flat out gamble against their own customers - hedging their gambling bets. This is a short-term win.. And so long as you're the first one that quits the game, you can win. Now, there is no longer a free-market incentive for banks to find credible loan customers, and likewise they have incentives to bribe ratings agencies to lie, and both then have incentives to lie to share-holders. So the market capitalizes this ultimately flawed strategy. Country-wide (of which I personally successuflly contributed) had the country's leading CD ROI (at 6%), reflecting secured investments due to guarnteed fraud-based profits.
Then bad-debt begins to default.
The insurance begins to pay-out
Projects are re-normalized
Heavy gamblers that didn't immediately exist are punished.
The world governments over-react
The re-normalized land-value chokes potential sellers (being under-water they couldn't sell if they wanted to)
This prevents geographic job migration (you're stuck in Detroit Michigan)
The people employed in real-estate, investment-banking, corporate sales are now under-employed again - cascading more large corporate [semi-]failures. Air-lines, automotive, etc. All cascading an unemployment crisis in some countries.
Reduced commerce, unfullfilled gambling bets, investment losses, and projections there-of bring about impractical pension plans in countries like Greece, spain Ireland. Their elected policies could work in a 1999 world economy, but not in a 2008 recession. Debts pile up, productivity halts in greece, countries and currencies are on the verge of collapse.

2013
Unemployement likely continues as inflation and government debt escallates and starts to choke public investment spending (thereby reducing world-wide government employment - e.g. austerity measures).. Somehow conservatives in Germany and the US prevent big projects/investments from cash-injection, and thus the world economy stiffles.. Meanwhile...
Communist controlled countries with managed currency, factories, natural resources, continue to buy entire countries with scarse resources.. (Using cheap labor as their initial source of income).. Oil, and materials used in high tech equipment like batteries and electronics are slowly shifting profits from middle-east to China.

2016
Trade wars are in full effect, with corporate espionage, contractual violations
Natural disasters and continued US draughts further escallate world-wide shortages (shifting cheap food production sales from China)

2020
Shortages of basic scarce resources lead to a new world-war between the east and west..

Not seeing tech as a major factor. :)

Comment Re:Discourage (Score 1) 107

by maraist on Tuesday November 01, 2011 @08:08PM (#37914520) Attached to: Ask Slashdot: Learning Dart Development?

I've also worn the hat of hiring highly skilled technical programmers. What I've found is that most of what 'good' programmers exhibit is self-motivated-determination to read on their own. People that read, not because they HAVE to get something done for work (and thus the bare minimum will suffice), but because they like to read technical manuals as if they were novels. They'll read it through, not because they're looking for a short-cut, or to get some nagging bug fixed, but because they want to dive-deep into some paradigm or language.

Well-read programmers sometimes comes from CIS degrees, usually NOT. Ironically most people I see coming out of universities are CRAP programmers. They go in thinking they're going to do this thing, but get overwhelmed quickly, become bare-minimalists in terms of understanding and (typically implementation), and resort to side-effect, poorly-documented, maximal-surprise code. Why? Because just like the all-nighter they pulled getting their project to work; it was passable.

English majors make great programmers, in my experience.. Presumably because they are people that can absorb a technical manual in a single night.

I've also found some electrical engineers too be good programmers (I happen to be among them). Mostly because they tend to attach problems from the bits up. They often have a very deep understanding of what a function is doing. It also means they have, by default, a richer math background - doing lots of math/equation proofs is useful when writing logic-functions.

So if you're past the college years and are trying to prove yourself. Do a lot of deep-dives of open-source projects.. Convince yourself that they work (e.g. critically analyze the code to understand the decisions made, as if you were the one making those decisions). Make sure you become familiar with a tool-chain (gpp -> gcc -> asm -> objcopy -> ln -> kernel-loader). Convince yourself that lisp is a great language (this will require every ounce of logical-strength that you can muster). Learn small-talk (the parent of most paradigms these days). Learn C++ (so you can see what everybody is trying to implement without actually implementing). Develop a VERY good understanding of C - (learning how everything is a symbol) - try and correlate obdump -x and 'nm' against C functions.. Learn how to make a shared library (either windows DLL or linux .so or Mac OSX dynlib). Delve into the format/layout of ELF. Learn the significance of the various segment-types (this generally applys to all OSs). Learn an ASM if you can.. Start by running
gcc -S helloworld.c
and
gcc -S -m64 helloworld.c
for the 64bit equivalents.. Make sure to put lots of function-calls, floating-point and OS calls.. Learn what the assembly is doing.. wikipedia ANYTHING you don't understand.

Learn a good editor.. Visual Studio, Eclipse/IntelliJ, X-code, kdevelop, codeblocks. Learn at least 64 short-cuts in two of them. Get familiar with thin editors (notepad++, vim, kate).

LEARN TCP. Google it.. Use perl, python, ruby or Java to write your own client / server in both TCP and UDP if you can. If you're up to it, try writting it in C or C++ + boost or Visual Studio.

LEARN the HTTP protocol (almost impossible to be useful these days without it). Use 'nc' 'curl' 'wget' 'telnet' interchangably to interact with an HTTP service.

Learn XML.. At least the DTD, but to really do well, learn XSDs.. Use javascript's DOM to muck with it to start.. But you'll probably need to learn a C/C++/Java/.NET's perferred APIs. It's hard to NOT have to parse XML in most paid-applications.

Learn UML. Read a good book on design patterns; eventually you'll think in UML for classes and DB entities; but you'll also need to think in terms of it for collaboration diagrams, sequence-diagrams etc.. (lots of free [online] tools.. creately, lucid-charts, argouml). Learn to white-board as if you were Italian. This goes over GREAT in interviews.

Learn SQL.. It's not going anywhere, I promise. Use postgres + pgadmin or mysql + phpmyadmin. Learn what RDBMS is.. What ACID is.. What the CAP theorem is. Try the free Oracle if you have an afternoon to totally lose; but it's probably useful in job-hunting.

Read up on NoSQL solutions.

When creating your resume (after 2 hours of doing all this), make sure to be HONEST in your skill levels in all of the above.. Don't JUST list that you know mysql,postgres,oracle,MS SQL. List mysql [expert], postgres [seasoned], oracle [exposed to], MS SQL [novice]. This avoids wasting people's time, and prevents dissapointments in the interview that would otherwise have been managed expectations leading up to a (yeah, he'll need some ramp-up-time, but I think we have work for him).

Comment Re:no way - wrong search terms leave things behind (Score 1) 434

by maraist on Monday October 10, 2011 @07:01AM (#37660494) Attached to: Putting Emails In Folders Is a Waste of Time, Says IBM Study

Depends what's meant by 'putting in separate folder'. Can an email have exactly one parent? Then what happens when you have 50 folders, each with 300 unread items. Is this more or less organized?

My preference is to have 100% of emails show up in inbox - but be auto-tagged. This is better than traditional folders because there is more than one parent.

todo, Reference, todelete, asap, projX, companyY, contactCategoryZ, personal, mailinglist, mailinglistX, etc

For each new email, I set up a rule to tag all similar emails (90% are todel). BUT, because they always show up to INBOX, I have a half second glance to decide if that email needed TODO/Reference, or if I don't want it todelete for some reason.

Searching on tags is superior to remembering keywords, because you can navigate the tags (just like folders). And depending on your email tool, you can mix and match "(tags:foo or tags:bar) subject:sales".

Comment Re:Improve Slashdot By Rewinding To What It Grew O (Score 1) 763

by maraist on Friday October 07, 2011 @05:58AM (#37636858) Attached to: Help Shape the Future of Slashdot

Right, but what is the basis for the paranoia. I am highly skeptical (even of a geek community) of properly directing that paranoia to non tin-foil-hat conspiracies.
1) Theft follows the money and the naive (e.g. major banks, major places with credit cards, and people/groups susceptible to social engineering attacks)
2) net-Stalking generally is done by major govs/institutions that make wide-area attacks with non-targetted victims, or petty people with no servers from which to reliably cause a reasonably cautious netizen to worry.
3) Ad-tracking / Ad-metric-gathering allows vendors to.. Well, produce more targetted ads. I never understood the visceral hatred of double-click. Though I share the frustration with ad providers that steal my cursor with CSS popup DIVs or flash.
4) Porn sites presumably can detect repeat non-paying visitors and restrict content (big shocker there).

I understand the notion of a condom-mode web browsing (no cookies, no cache, no passwords), and I can see the frustration with the web essentially being broken in that mode; but honestly. Session cookies are much more elegant than embedded tokens in paths; as they are perma-linkable. And being a personal hater of 'apps' when a stateful website is just as functional (and almost by definition, more portable), I find it difficult to swallow a demand that HTTP remain stateless.

Slashdot Top Deals