
Comment From someone who's bought this much storage... (Score 1) 217

While I agree with most commenters that you need to supply many more details before even beginning to narrow the options, if you do look at the storage vendors, DDN (Data Direct Networks) is really hard to beat.

I see the EMC Isilon guys posting here and need to counter. :) They are overpriced and underpowered for almost every application. Their strength is typical enterprise environments - lots of small files accessed via NFS and "enterprise" SLAs. That's almost always the wrong solution for big data applications (NFS is terrible for big data). EMC Isilon sold a lot of storage into my space (gene sequencing) and very few customers are happy, especially when they find out what the other vendors could do.

I've organized bake-offs between DDN, Isilon, and a number of other vendors. DDN always came out ahead on price and performance (every time, they were half the price and twice the speed of Isilon). DDN is the most represented vendor on the Top500 supercomputing list and also powers a certain streaming movie/TV service we all know and love. DDN is also a pretty ethical company - if they're a bad match for your application, they'll let you know and provide recommendations.

Whatever you do, don't build it yourself. As tempting and fun as it is, given that you're asking the question, you've already self-identified as someone who won't be able to support it. I've seen many smart people go the SuperMicro JBOD route only to create support nightmares for themselves.

Also, for that much space, avoid Amazon at all costs. It's way too expensive compared to dedicated hardware.

For cost, budget around $150-250k to get started. It might seem pricey, but you'll spend more than that on manpower building it yourself (or your first few months on Amazon).

In addition to DDN, IBM, Dell, and HP all have solutions in this range that aren't terribly expensive.


Submission + - Building an "Open Source" community for a "Proprietary" Software Product

An anonymous reader writes: I run a company that develops scientific computing software. Our core product is a traditional proprietary application — we develop the software and deliver the "binaries" to our customers. We're considering changing our deployment to include all of the source code and giving our customers some additional rights to explore and extend it. The codebase is HTML/JavaScript/Python/SQL, so a lot of the code is available in some form already, albeit minified or byte-compiled.

Because we are in a scientific domain, most of our customers use Open Source software alongside our product. We also maintain Open Source projects and directly support others. We're strong supporters of Open Source and understand the value of having access to the source code.

We also support a free (as in beer) version of the software with a smaller feature set (production and enterprise elements that individual users don't need are removed). We'd like that version to use the same model as well to give users that don't need the full commercial version the ability to extend the software and submit patches back to us for inclusion in future releases.

Overall, we'd really like to find a model that allows our core product to work more like an Open Source product while maintaining control over the distribution rights. We'd like to foster a community around the product but still generate revenue to fund it.

In our space, the "give the product away but pay for support" model has never really worked. The market is too small and, importantly, most customers understand our value proposition and have no problem with our annual license model.

We've looked at traditional dual licensing approaches, but don't think they're really the right fit, either. A single license that gives users access to the code but limits the ability to redistribute the code and distribute patches to the "core" is what we'd prefer.

My questions for the Slashdot community: Does anyone have direct experience with models like this? Are there existing licenses that we should look at? What companies have succeeded doing this? Who has failed?

Comment Re:cool idea, but poorly done (Score 1) 32

Agreed. As a climber (haven't done El Cap yet, but have done some long Valley climbs), this is pretty lame. The distance between the images is way too far. I want a real, seamless view that I can follow up a route and see all the details. I want to be able to look at the rock, turn the camera and see all around. Look up, look down, find the next hold, see where my feet would be. You know, all the stuff that makes climbing fun.

Trip reports on Supertopo (www.supertopo.com) are way better than this if you want to get a feel for what it's like to climb a long route.


Comment Re:C++ is never the right tool (Score 4, Interesting) 296

Python is written in C. Linux is written in C. OS X is written in C (with libraries in Objective C). Most low level software is written in C, not C++. It's very important for this exercise to differentiate C from C++. They are not the same language and haven't been since C++ stopped being implemented using macros and the preprocessor and got its first compiler.

C is a much simpler language to learn and maintain, especially if you're doing low level code. C++ has a lot of very nice features, but its benefits really only come into play if you're willing to put the time and effort into properly learning generic programming (the foundation of Boost and the STL).

But, as most people have already pointed out, starting with Python and then migrating portions over to C or C++ as needed for performance is a much better approach. You can manage IO just as effectively from Python as you can from C or C++ and your development time will be much much shorter.
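To show what the "migrate hot spots to C" path looks like in practice, here's a minimal ctypes sketch. It assumes a Unix-like system where the C math library can be located; calling `sqrt` stands in for calling your own compiled hot loop, which would follow the same declare-signature-then-call pattern:

```python
import ctypes
import ctypes.util

# Locate and load the C math library (assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes marshals arguments correctly:
# double sqrt(double)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

result = libm.sqrt(9.0)
print(result)  # 3.0
```

For anything bigger than a single function, cffi or a small C extension module is usually nicer, but the idea is the same: prototype in Python, profile, then push only the proven bottleneck down into C.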


Comment Re:The Fuck? (Score 4, Insightful) 175

And the author hasn't looked at a relational database in the last few years, either. PostgreSQL, Oracle, MySQL, and I'm sure the other big ones all have JSON (or similar) column types now that let you attach semi-structured elements to your records. You get all the benefits of an RDBMS (ACID, referential integrity, 40 years of history) _and_ all the benefits of NoSQL.

Seriously, there's no good reason not to start with PostgreSQL and only add MongoDB if you really have a good use case for it (you know, you suddenly need to be Web Scale). Personally (and professionally), I use both, with PostgreSQL as the main DB for everything and MongoDB for read-only collections of indexed data.
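Here's a rough sketch of the hybrid approach, using SQLite's built-in JSON functions to stand in for PostgreSQL's JSONB (the Postgres syntax differs a bit, e.g. `->` / `->>` operators instead of `json_extract`; the table and column names are just made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,   -- structured, relational column
        metadata TEXT         -- semi-structured JSON blob
    )
""")
conn.execute(
    "INSERT INTO samples (name, metadata) VALUES (?, ?)",
    ("run-1", json.dumps({"instrument": "HiSeq", "lanes": 8})),
)

# Query into the JSON like a document store, but inside an ACID database.
row = conn.execute(
    "SELECT name, json_extract(metadata, '$.lanes') FROM samples"
).fetchone()
print(row)  # ('run-1', 8)
```

In Postgres you'd also get an index on the JSON path (GIN on a JSONB column), which covers most of the "we need Mongo for flexible schemas" argument.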

My challenge to devs out there: spend the hour it takes to learn SQL and understand what you can actually do with it. And, stop pretending that an RDBMS won't scale to meet your needs (spoiler alert: it will).


PC Games (Games)

The Real Scars of Korean Gaming 126

An anonymous reader writes: Professional e-sports have been slowly but steadily gaining a following in the U.S. over the past couple of decades, but in South Korea, it's already arrived as a popular form of entertainment. An article at the BBC takes a look at the e-sports scene there, which is generating huge salaries for the top players, but also injuries and insular lifestyles. It's growing more similar to traditional pro sports all the time. From the article: "A scar, half an inch wide, stretched from just above the elbow and up over his shoulder. 'Our company paid for full medical expenses, so he had an operation,' explained his coach, Kang Doh Kyung. '[He] is the best player in StarCraft and has won everything in this field and is still going strong.' Repetitive strain had injured Mr Lee's muscles, deforming them and making surgery the only option to save his illustrious career."

Comment Re:Learn from the wealthiest (Score 2) 150

Huh? Can you back that up with some evidence? I'm not in the super wealthy class, but do know some of them. They all let their kids use smartphones and tablets. Just as I let my kids use them and so do all my friends. Sure, there are the occasional families that don't allow access or restrict access, but those are few and far between - much like the families without TVs when I was growing up in the 80s.

Comment It's OK to Quit (Score 2) 583

Your first job could be the best job you'll ever have and it could be your last job. But, it could also be the worst job you'll have.

Be honest with yourself. If it's not working, don't be afraid to move on. It's not worth being miserable when you're just starting your career. Don't quit impulsively, but if things don't feel right, ask some older friends if what you're experiencing is normal or not. You don't have the experience yet to know better, but your elders do.

My first job was as a software engineer at a site everyone over 30 has used (it's still around, but not as popular). It was the early days of the internet. At my 6 month review, I got "dinged" for going home one morning at 3 am when everyone else stayed through the night. This was after two weeks of 18 hour days. I was doing more harm than good coding at that point. I was being paid $33k/yr and had no stock options. I was told everyone had to do this to keep up with "Internet Time". Over the next few weeks, most of the senior developers (back when senior developers were actually senior with 10+ years' experience) quit en masse. It took me a few more months to realize that this was not normal and leave as well. I would have been much better off walking after the first month.


Comment Re:Pay them market value (Score 5, Insightful) 234

Most CS professors are paid market value. You can look up salaries at public schools. You'll find that at the ones that compete with CMU, the salaries are all in the range of what the researchers would make at a company ($100-250k). Bonuses are a little harder to compete with. But, in CS at least, grants cover a ton of travel. To publish in CS, you have to go to the conferences you're publishing in, unlike the rest of science which just has journals. That more than makes up for the lack of bonuses as far as fringe benefits go.

Now, the one benefit you get from industry is that you don't have to write grants. But, you also have more job security in academia. What worries me most about this is that when this bubble bursts, Uber will be one of the first companies to go (at least, research at Uber will go quickly). These researchers will now be stuck without jobs in a market that will be very hostile towards PhDs. For their sake, I hope they all vest quickly enough to get a nest egg before things go south. (it's going to happen, it always does)


Comment I had microwave Internet 15 years ago... (Score 3, Informative) 221

In Louisville, CO, I lived in one of the few neighborhoods that was skipped over for broadband in 1999. Sprint set up a microwave service that filled in the gap. Bandwidth was awesome - I was getting 10-30 Mbps regularly. The downside was the latency - 100 ms ping times were the norm. I remember trying to play Duke Nukem with friends and having the unfair "advantage" of disappearing regularly when my client didn't ping back in time. Being line-of-sight, there were also issues with trees occasionally swaying in front of the dish (a pizza box attached to my roof) and snow blocking the signal.

As others have pointed out, microwave Internet isn't something new and, unfortunately, in the real world isn't a perfect solution.


Comment Hadoop was never really the right solution... (Score 5, Insightful) 100

A scripting language with a good math/stats library (e.g., NumPy/Pandas) and a decent RAID controller are all most people really need for most "big data" applications. If you need to scale a bit, add a few nodes (and put some RAM in them) and a job scheduler into the mix, and learn some basic data decomposition methods. Most big data analyses are embarrassingly parallel. If you really need 100+ TB of disk, set up Lustre or GPFS. Invest in some DDN storage (it's cheaper and faster than the HDFS system you'll build for Hadoop).

Here's the breakdown of that claim in more computer-sciencey terms: Almost all big data problems are simple counting problems with some stats thrown in. For more advanced clustering tasks, most math libraries have everything you need. Most "big data" sizes are under a few TB of data. Most big data problems are also I/O bound. Single nodes are actually pretty powerful and fast these days. 24 cores, 128 GB RAM, 15 TB of disk behind a RAID controller that can give you 400 MB/s data rates will cost you just barely five figures. This single node will outperform a standard 8-node Hadoop cluster. Why? Because the local, high-density disks that HDFS encourages are slow as molasses (30 MB/s). And...

Hadoop has a huge abstraction penalty for each record access. If you're doing minimal computation for each record, the cost of delivering the record dominates your runtime. In Hadoop, the cost is fairly high. If you're using a scripting language and reading right off the file system, your cost for each record is low. I've found Hadoop record access times to be about 20x slower than Python line read times from a text file, using the _same_ file system for Hadoop and Python (of course, Hadoop puts HDFS on top of it). In Big-O terms, the 'c' we usually leave out actually matters here - O(1*n) vs. O(20*n). 1 hour or 20 hours, you pick.
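If you want to measure that per-record constant on your own hardware, a rough sketch looks like this (the numbers it prints are illustrative of plain-Python line reads only; comparing against Hadoop requires timing the equivalent job there):

```python
import os
import tempfile
import time

# Build a small test file; scale n up for a meaningful measurement.
n = 100_000
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    for i in range(n):
        f.write(f"record-{i}\n")
    path = f.name

start = time.perf_counter()
count = 0
with open(path) as f:
    for line in f:  # plain line reads straight off the file system
        count += 1
elapsed = time.perf_counter() - start

print(f"{count} records in {elapsed:.4f}s "
      f"({count / elapsed / 1e6:.2f} M records/s)")
os.remove(path)
```

Run the same count as a trivial Hadoop streaming job over the same data and the ratio of the two wall-clock times is your 'c'.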

If you're really doing big data stuff, it helps to understand how data moves through your algorithms and architect things accordingly. Almost always, a few minutes of big-O thinking and some basic knowledge of your hardware will give you an approach that doesn't require Hadoop.

tl;dr: Hadoop and Spark give people the illusion that their problems are bigger than they actually are. Simply understanding your data flow and algorithms can save you the hassle of using either.

