
'Data Science' Is Dead

Nerval's Lobster writes "If you're going to make up a cool-sounding job title for yourself, 'Data Scientist' seems to fit the bill. When you put 'Data Scientist' on your resume, recruiters perk up, don't they? Go to the Strata conference and look on the jobs board — every company wants to hire Data Scientists. Time to jump aboard that bandwagon, right? Wrong, argues Miko Matsumura in a new column. 'Not only is Data Science not a science, it's not even a good job prospect,' he writes. 'Companies continue to burn millions of dollars to collect and gamely pick through the data under respective roofs. What's the time-to-value of the average "Big Data" project? How about "Never?"' After the 'Big Data' buzz cools a bit, he argues, it will be clear to everyone that 'Data Science' is dead and the job function of 'Data Scientist' will have jumped the shark."
This discussion has been archived. No new comments can be posted.

  • TFA is BS (Score:5, Interesting)

    by Sarten-X ( 1102295 ) on Wednesday March 05, 2014 @12:02PM (#46408829) Homepage

    Unfortunately, unless this is structured data, you will be subjected to the data equivalent of dumpster diving. But surfacing insight from a rotting pile of enterprise data is a ghastly process—at best.

    Sounds like this Miko Matsumura has no idea how successful Big Data projects actually work.

    To refine his analogy, unstructured data is much like processing recyclables. Everything that might possibly be good gets thrown into a large bin, and several sorting processes run to extract individual relevant (though messy) pieces. While those pieces alone aren't pure enough to be useful, there's enough meaningful information in them that statistical analysis can separate the good from the bad, and that's where the insight comes from.

    With a typical RDBMS, insight is readily apparent. A hypothesis that 75% of a user's purchases were widgets is simple to verify. In a non-relational database, as is often used in Big Data projects, that would be an inefficient computation (though it can be done). Rather, those databases are more aligned to produce a whole list of correlations between user demographics and purchasing habits, showing for example that users who buy widgets have often already bought foo bars. The "Data Scientist" didn't have to ever look specifically at statistics for widgets or foo bars, but the correlation is presented in a nice and accessible form, gleaned from millions or billions of independent data points.
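To make the contrast above concrete, here is a minimal sketch in Python. The purchase log, the user names, and the helper functions `item_share` and `co_purchase_counts` are all invented for illustration and are not taken from the article or from any particular database. The first function checks a single targeted hypothesis (the share of one user's purchases that were widgets); the second produces the broad list of item pairs bought by the same users. Note that it ignores purchase order, so it shows co-occurrence only, not "already bought".

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase log of (user, item) pairs; the data is made up.
purchases = [
    ("alice", "widget"), ("alice", "widget"), ("alice", "widget"), ("alice", "foo bar"),
    ("bob", "foo bar"), ("bob", "widget"),
    ("carol", "gadget"), ("carol", "foo bar"), ("carol", "widget"),
    ("dave", "gadget"),
]

def item_share(purchases, user, item):
    """Fraction of one user's purchases that were the given item."""
    items = [i for u, i in purchases if u == user]
    return items.count(item) / len(items) if items else 0.0

def co_purchase_counts(purchases):
    """For every pair of items, count how many users bought both."""
    baskets = {}
    for user, item in purchases:
        baskets.setdefault(user, set()).add(item)
    pairs = Counter()
    for items in baskets.values():
        # Sort so each pair is always keyed in the same order.
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return pairs

# Targeted hypothesis: were 75% of alice's purchases widgets?
share = item_share(purchases, "alice", "widget")

# Broad view: which item pairs show up together most often?
pairs = co_purchase_counts(purchases)
top_pair, top_count = pairs.most_common(1)[0]
```

On a real data set the pair counts would need to be compared against how popular each item is on its own before calling anything a correlation; raw counts are only the starting point.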

    Miko Matsumura is a Vice President at Hazelcast, an open source in-memory data grid company.

    This is a SlashBI article written by executives for executives, with little basis in fact. Lovely.

  • by Daniel Hoffmann ( 2902427 ) on Wednesday March 05, 2014 @12:19PM (#46409021)

    90% of what a data science expert does is what people like to call data jujitsu (data reconfiguration). That basically means getting data out of your RDBMSs, SAP, Twitter, Facebook, random text file dumps (.csv, etc.), random Excel/Word files and legacy databases and into some place you can actually generate conclusions from (like HDFS on a Hadoop cluster). Plus, during this process you need to normalize all your data so you can apply the same algorithm no matter where the data came from.

    All this means is that you will spend countless hours trying to connect to the client's legacy stuff and then countless hours trying to get the data out (without impacting production systems!), so you can then spend countless hours reformatting this data to be able to spend countless hours trying to get it into your Big Data(tm) solution so you can finally run some algorithms and create results. Now multiply all that by the number of different kinds of databases the client has and you get the idea.
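The normalization step mentioned above can be sketched roughly like this in Python. Everything here is an assumption for illustration: the two dumps, their field names, and the target schema (`customer_id`, `amount_cents`, `date`) are invented, and a real client system would be far messier. The point is only that each source gets its own small adapter, and everything downstream sees one record shape.

```python
import csv
import io
import json

# Hypothetical exports from two systems; field names and values are invented.
CSV_DUMP = "Customer ID,Amount (USD),Date\n 17 ,19.90,03/05/2014\n"
JSON_DUMP = '[{"custId": "42", "amount_cents": 500, "ts": "2014-03-05"}]'

def from_csv(text):
    """Yield normalized records from the CSV-style dump."""
    for row in csv.DictReader(io.StringIO(text)):
        # Assumes US-style MM/DD/YYYY dates in this source.
        month, day, year = row["Date"].split("/")
        yield {
            "customer_id": int(row["Customer ID"].strip()),
            "amount_cents": round(float(row["Amount (USD)"]) * 100),
            "date": f"{year}-{month}-{day}",
        }

def from_json(text):
    """Yield normalized records from the JSON-style dump."""
    for rec in json.loads(text):
        yield {
            "customer_id": int(rec["custId"]),
            "amount_cents": int(rec["amount_cents"]),
            "date": rec["ts"],  # assumed to be ISO formatted already
        }

# Once every source maps to the same schema, one algorithm covers all of it.
records = list(from_csv(CSV_DUMP)) + list(from_json(JSON_DUMP))
total_cents = sum(r["amount_cents"] for r in records)
```

Each new kind of source means another adapter like `from_csv`, which is where the countless hours go.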

    As an IT professional you really do not want to work in this field. No organization keeps its data in a clean, uniform way; a data scientist is like an IT janitor.

  • Re:data scientist (Score:2, Interesting)

    by Anonymous Coward on Wednesday March 05, 2014 @12:45PM (#46409371)

    As the author of the article, I'm happy to encourage people to call themselves statisticians, database engineers, etc. These roles are definitely in demand and will not go away when the bubble for "data scientists" pops.

    I'm just concerned about the recent spate of large companies trying to hire data scientists to "save" their expensive big data projects that aren't producing actionable insight. Those jobs are a dead end.

  • by Anonymous Coward on Wednesday March 05, 2014 @01:19PM (#46409833)

    I don't know about you but I am sick and tired of DICE's attempts to channel and steer the employment market through astroturf postings to Slashdot, which they also happen to own. Most of what the talking heads at DICE churn out regarding employment is simply untrue. Not untrue in the sense that they don't know any better, but lies, as in deliberately misleading information spread with mean-spirited intent.

    DICE is not a platform for you and me to find lucrative jobs. It is the other way around: DICE is a platform for employers to find cheap labor. The people who in THE END PAY DICE (that is, those who use their system to recruit and those who advertise on DICE.COM sites) are not interested in hooking you up with a $150,000 job when you could also be working for

    I'm not a Data Scientist myself, but I work with a bunch of them, and from what I see they are working on, I know I'd have to go back to school for that. It also explains why they are worth so much and hard to get.

  • by rockmuelle ( 575982 ) on Wednesday March 05, 2014 @01:48PM (#46410175)

    I've been working with big data since before it was a term and currently run a scientific software company that touches on many aspects of "data science". Many of my colleagues also work in the field. I've seen many fads come and go. Data Science as a profession is one of those.

    Most people who call themselves data scientists are really just doing "big data" processing using tools such as Hadoop. They are delivering results to managers who have jumped on the big data bandwagon and, not knowing any better, have asked for these skills. In 99% of the cases, the processing is simply haphazardly looking for patterns or running basic statistics on data that really isn't that big. However, there is a lot of low-hanging fruit in data that hasn't been analyzed before, and most practitioners who've suddenly become data analysis experts are rewarded for trivial findings. A tiny bit of statistics, programming, and data presentation skills go a long way.

    Compare this to the Web Masters of the late 1990s. The Web was new and managers knew that they needed Web sites. HTML and CGI were techie things but also fairly easy to learn. A group of people quickly figured out that they could be very important to a company by doing very little work and created the position of Web Master. A tiny bit of programming, sys admin, and design skills went a long way.

    Web Masters disappeared when IT departments realized that you actually needed real software developers, real designers, and real sys admins to run a corporate Web site. Sure, the bar is still low, but expertise beyond a 'For Dummies' book is still needed. And, few people can be experts in each area, hence the need for teams.

    Real data science has actually been around for a long time. Statisticians and data analysts have been performing this role for decades and have built up a lot of rigor around it. It's a tough skill set to develop, but a very useful one to have. "Big Data" distracted people a bit and let the current generation of data scientists jump in and pretend everything was new and we could throw out the old methods. As the field evolves, data science will necessarily transition back to the experts (statisticians) and become a team effort that includes people skilled in programming, IT, and the target domain (analysts).

    That said, there's good money to be made right now, so if you have Web Master on your resume, you might as well be a data scientist while you can. ;)

