Stats Businesses

Cutting Through Data Science Hype

An anonymous reader writes: Data science — or "big data" if you prefer — has evolved into a full-fledged buzzword, thanks to marketing departments around the world. John Foreman writes that part of the marketing blitz has been focused on how fast big data analysis can be. Most companies offering some kind of analytic service try to sell you on how it'll make it easy for you to quickly find and fix the problems with your business. But he points out that good, robust models need a stable set of inputs, and businesses often change far too quickly for any kind of stable prediction. He takes IBM's analytic services as an example, quoting Kevin Hillstrom: "If IBM Watson can find hidden correlations that help your business, then why can't IBM Watson stem a 3 year sales drop at IBM?" Foreman offers some simple advice: "Simple analyses don't require huge models that get blown away when the business changes. ... If your business is currently too chaotic to support a complex model, don't build one."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward

    IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?

    • IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?

      Well, I'd say that fossil fuels, which are mostly composed of dinosaurs who were unable to adapt (along with plants who were unable to adapt, and various other organisms who were unable to adapt), revolutionized the hell out of our entire civilization...

      Maybe if IBM were buried and subjected to a few million years of heat and pressure they too would become a highly coveted resource?

      • What do you mean "dinosaurs failed to adapt", there are several of them flying around in my garden right now!
        • Birds heap shame upon their ancestors merely by existing. (Except maybe shrikes; their willingness to keep up a proud tradition of bloodthirsty carnivorous murder despite now being about the size of a sparrow is pretty honorable).
    • The dinosaurs did not die out because they were unable to adapt any more than a person dies because they fail to "adapt" to a grenade.

      • Evolution is a cast-iron bitch sometimes. Dinos didn't adapt to the big grenade. Lots of other critters did.

        (And yes, fossil fuels are composed of relatively few actual dinosaurs, it's mostly ex-plant life.)

        • Grenades and huge rocks aren't "evolutionary," they are "catastrophic."

          • Catastrophe is a critical factor in most evolutionary history. Practices and traits that were successful, successful enough to become part of the biology or lifestyle of an organism, often fail as circumstances change. I'm afraid that abrupt changes in environment are a common, though often unpredictable, factor in many species.

            • Catastrophe is a critical factor in most evolutionary history.

              Citation, please.

              • by Antique Geekmeister ( 740220 ) on Saturday January 31, 2015 @06:08PM (#48948645)

                >> Catastrophe is a critical factor in most evolutionary history.

                > Citation, please.

                Wikipedia has a fairly good entry on "Catastrophism", and another on "Punctuated equilibrium". But even without large-scale events such as dinosaur-killer asteroids or the evolution of photosynthesis poisoning most species with much higher concentrations of volatile oxygen, there are much smaller and more frequent effects. Forest fires are a critical factor in breeding jack pine trees, floods are vital to the fertility of the ecosystem near river banks, and hurricanes spread species throughout their trail and profoundly affect the ecology and evolution of areas that are likely to endure hurricanes. And catastrophes can and do create a "founder effect", where a small number of introduced species members become a new species quite quickly in their new environment.

                Do I need to find individual links for each of those?

  • by turkeydance ( 1266624 ) on Friday January 30, 2015 @07:30PM (#48943397)
    "we don't need no stinkin' sales", we have Ginni.
  • by ShaunC ( 203807 ) on Friday January 30, 2015 @07:30PM (#48943401)

    "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

    • Mod up. Slightly vulgar but a really good analogy.
    • Re: (Score:2, Funny)

      by Anonymous Coward

      "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

      Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?

      • by Registered Coward v2 ( 447531 ) on Friday January 30, 2015 @09:42PM (#48944093)

        "Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

        Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?

        "Big Data" is like sex in a car while in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.

        • by gweihir ( 88907 )

          Sex in a car? Sounds messy and uncomfortable...

          • I used to have a car with a back seat truly the size of a sofa, a 1960 Dodge Phoenix (2dr dart... before they shrunk it). But alas, although I actually was having sex regularly, the car had no working parking brake so I couldn't do it in the car. Haven't had a vehicle with a big enough back seat to get my freak on since. I may never lose that purity point.

    • by gweihir ( 88907 )

      Indeed. The one big-data project I personally see at a customer does have the advantage that the IBM team is too stupid to actually collect the data (they just cannot hack the engineering, have been delayed for over a year now, and just recently were removed from the production platform again because they break other things). So while the customer pays them oodles of money, they at least do not get bogus analyses in return.

      The fascinating thing is that I thought that you do not find the combination of extr

    • I know the guy that did it. Big data is about asking the guy that did it.

      If I can assign that guy an identifier, then I know you forever.

      I know the girl, and I know the guy. More importantly, I know the guy that didn't go for that girl. I want to get paid.

      More importantly, I want everyone to be private.

      Don't pay me. I can't be bought.

      But everyone else, for all practical purposes, can.

    • by DeBaas ( 470886 )

      Indeed, and the few that actually do (or did) get it, love(d) it!

  • by Mikkeles ( 698461 ) on Friday January 30, 2015 @07:36PM (#48943437)

    Statistical Process Control and the Western Electric rules are very applicable here. Without stability for a baseline, it's (pretty well) impossible to utilize small data, much less big data (big bad data:).
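To make the baseline point concrete, here is a minimal sketch of two of the Western Electric rules from statistical process control: a single point more than three standard deviations from the center line, and eight consecutive points on the same side of it. The function name, the flag labels, and the sample numbers are all made up for illustration, and the other zone rules are left out.

```python
from statistics import mean, stdev

def western_electric_flags(baseline, samples):
    """Flag samples that break two of the Western Electric rules.

    baseline: measurements from a period assumed to be stable.
    samples:  new measurements checked against that baseline.
    """
    center = mean(baseline)
    sigma = stdev(baseline)
    flags = []
    run_side, run_len = 0, 0
    for i, x in enumerate(samples):
        # Rule 1: a single point more than 3 sigma from the center line.
        if abs(x - center) > 3 * sigma:
            flags.append((i, "rule1_beyond_3_sigma"))
        # Rule 4: eight consecutive points on the same side of the center line.
        side = (x > center) - (x < center)
        if side != 0 and side == run_side:
            run_len += 1
        else:
            run_side, run_len = side, (1 if side else 0)
        if run_len >= 8:
            flags.append((i, "rule4_run_of_8"))
    return flags

# Hypothetical measurements: a stable baseline, then a spike and a sustained shift.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
samples = [9.9, 10.1, 12.5, 10.3, 10.2, 10.1, 10.3, 10.2, 10.1, 10.3, 10.2]
print(western_electric_flags(baseline, samples))
```

Both checks depend entirely on `center` and `sigma` coming from a stable period. If the baseline itself keeps moving, every flag is meaningless, which is the comment's point.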

  • Marketing (Score:4, Funny)

    by sexconker ( 1179573 ) on Friday January 30, 2015 @07:58PM (#48943565)

    If you have a marketing department, you're wasting money.
    If you hire a marketing firm, you're burning money.
    If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.

    • by Anonymous Coward

      Actually, marketing is the soul of the business.
      *Cue the corporate atheists who claim that businesses have no souls...

    • If you don't have a marketing department no one knows you exist.

      Marketing is a bucket of shit at the best of times, but you can do very little without it.

      • by quax ( 19371 )

        Marketing also encompasses requirement gathering i.e. understanding what the market needs. Especially for the fast moving software industry it is a core business process and about much more than just advertising and branding.

  • by rockmuelle ( 575982 ) on Friday January 30, 2015 @09:48PM (#48944121)

    Data scientists are this bubble's web masters. 'Nuff said.

  • these systems could be effective, but it comes down to ontology or more broadly research design

    i'm not saying *any* company can benefit from "big data", but most can

    the core problem is a misunderstanding of what is happening...from a to z a lot of biz people are just clueless...the techies they hire to do the big data are partially responsible for this

    data analysis is great...everyone does it to some level...highly complex data analysis in a biz situation must have well thought out research questions and res

  • by EmperorOfCanada ( 1332175 ) on Friday January 30, 2015 @11:26PM (#48944597)
    I have worked with many very large data sets or very important data sets covering large numbers of people (not that big just complex). In both cases my first fight was with the data itself. I don't know how many databases I would get into with fields (all in one table) like phone, phone_num, number_phone, phonenum, and then usually a magical set like phone1, phone2, phone3, and phone2a.

    Or I would have lat longs for customers that put them 100 miles off the coast of Nova Scotia (not Sable Island either). Or mostly good lat longs, but if they couldn't get one then they would use the lat long of the nation's capital, resulting in 20% of the customers residing in any given nation's capital, which also then obscured the actual number of customers in the nation's capital.
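One rough way to spot that kind of default coordinate is to count how many records share exactly the same point and flag any point that holds an implausibly large share. This is only a sketch: the function name, the 5% threshold, and the sample data are assumptions, and a flagged point still needs a human to decide whether it is a default value or a real office tower.

```python
from collections import Counter

def suspicious_coordinates(points, max_share=0.05):
    """Return coordinates shared by more than max_share of all records.

    points:    list of (lat, lon) tuples, already rounded to a fixed precision.
    max_share: hypothetical threshold; tune it to the data set.
    """
    counts = Counter(points)
    total = len(points)
    return {pt: n for pt, n in counts.items() if n / total > max_share}

# Hypothetical data: 16 distinct customers plus 4 stamped with a capital's coordinates.
CAPITAL = (45.4215, -75.6972)
points = [(44.0 + i * 0.1, -63.0 - i * 0.1) for i in range(16)] + [CAPITAL] * 4
print(suspicious_coordinates(points))
```

Flagged records are better marked as "location unknown" than deleted, so the real customers in the capital can still be counted separately.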

    And then dates, can nobody ever get dates right. A favourite is that round one of the system will only record the day of a transaction but later they expand their collection to the hour and minute but now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.
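A hedged sketch of how the noon spike might be handled before any usage analysis: find exact times of day that hold an implausible share of the records, then leave those records out of the hourly histogram. The function names, the 20% threshold, and the timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime, time

def placeholder_times(timestamps, max_share=0.2):
    """Return exact times of day that account for more than max_share of records."""
    counts = Counter(ts.time() for ts in timestamps)
    total = len(timestamps)
    return {t for t, n in counts.items() if n / total > max_share}

def hour_histogram(timestamps, ignore_times):
    """Count transactions per hour, skipping records with a placeholder time."""
    hist = Counter()
    for ts in timestamps:
        if ts.time() not in ignore_times:
            hist[ts.hour] += 1
    return hist

# Hypothetical data: old records stored as "noon", newer ones with real times.
timestamps = [datetime(2014, 1, d, 12, 0) for d in range(5, 10)] + [
    datetime(2015, 1, 5, 9, 15),
    datetime(2015, 1, 5, 12, 40),
    datetime(2015, 1, 6, 17, 5),
    datetime(2015, 1, 6, 20, 30),
    datetime(2015, 1, 7, 8, 45),
]
fake = placeholder_times(timestamps)
print(fake, hour_histogram(timestamps, fake))
```

This also drops any genuine transaction that happened at exactly 12:00:00. If the date the system switched to recording hours and minutes is known, filtering on that cutover date is the cleaner option.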

    I can go on and on; one of my recent favorites is a phone company database where many phone calls never begin, or never end.

    So I think the big bucks is not in doing an ML processing of their data using some ingenious Hadoop crap but to maybe use ML to clean the data up. And by the way if someone has a tilde(~) in their name your OCR needs to be shot.
    • Absolutely true. Unfortunately, it's far easier to convince management that the problem is the lack of a shiny tool that shows them pretty graphs than shitty data that they have to pay some consultant an ungodly amount of money to fix. Because, of course, no one in the company has the time to fix the data on which they run their business.

      • Hey now!!! Ungodly amounts of money paid to consultants is how I make my living; don't go shitting on it :)

    • And then dates, can nobody ever get dates right. A favourite is that round one of the system will only record the day of a transaction but later they expand their collection to the hour and minute but now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.

      Data quality has been an issue with every project I've worked on involving data analysis or integration into a new system. One project was combining two employee databases for a merged company, where they decided to use SSNs as the key for unique records since it was a US company. Unfortunately for them, foreign employees on temporary jobs in the US often had 999-99-9999 or 123-45-6789 as SSNs, with the occasional real one thrown in. Then there were duplicate valid SSNs for employees that worked for both co
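As a sketch of the check that would have helped before SSNs were chosen as the key: reject the known placeholder values and anything structurally impossible (area 000, 666, or 900 to 999; group 00; serial 0000). The function name and the placeholder set are assumptions. Passing the check only means the value could be a real SSN, not that it is unique or belongs to that employee.

```python
import re

SSN_PATTERN = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")
# Placeholder values named in the comment above; a real list would be longer.
KNOWN_PLACEHOLDERS = {"123-45-6789", "999-99-9999"}

def usable_ssn(ssn):
    """Return True if ssn is plausible enough to serve as a matching key."""
    if ssn in KNOWN_PLACEHOLDERS:
        return False
    m = SSN_PATTERN.match(ssn)
    if not m:
        return False
    area, group, serial = m.groups()
    # Area numbers 000, 666 and 900-999 are never issued.
    if area in ("000", "666") or area.startswith("9"):
        return False
    # Group 00 and serial 0000 are never issued either.
    return group != "00" and serial != "0000"

# Hypothetical values for illustration only.
for value in ["123-45-6789", "999-99-9999", "456-78-9012", "000-12-3456"]:
    print(value, usable_ssn(value))
```

Records that fail the check need a fallback key, for example name plus date of birth, and the duplicate valid SSNs mentioned above still have to be resolved by hand.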

      • "Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data."

        I like this. I will use this from now on with my client. I will be sure to give proper credit to a Registered Coward :)

      • Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data.

        I actually independently came up with the 10% figure today as well, and mentioned to my project manager that unless he wants to invest real money chasing the long tail of data, he was going to have 10% of the records with bogus values in some fields. I will certainly adopt the rest of your quote!

        I have since added a corollary: I do not do IT projects unless you pay me enough to retire on.

        Here you lost me. Why were you even in this business if you didn't love the challenge? Don't take other peoples' bad data personally. Take it as an opportunity.

        • I have since added a corollary: I do not do IT projects unless you pay me enough to retire on.

          Here you lost me. Why were you even in this business if you didn't love the challenge? Don't take other peoples' bad data personally. Take it as an opportunity.

          I get enough work doing other things so IT work is something I can avoid unless it is lucrative enough. Most of my IT projects started out as me doing something different, then getting roped into staying on when they discovered I could actually deliver results. I've learned to say NO when asked to stay.

    • Yes! Dear Tea Pot! YES YES YES!!!!!!

      Then you find out the transactional data is jacked because it is 1) manually entered by a third party (not the user/customer) 2) entered without regard to policy 3) maybe not entered at all. [hangs head] and then they are the very ones asking for the analysis of that same data to drive their future planning and you want to beat them over the head with your rusting slide rule!!!!!!!

      • Worst database I ever worked on was the billing system for a telco. All fields text fields except for the automatically generated ID field. Thanks Lotus Notes and your IT Mall School training for that gem.

        Oh and the data input had pulldowns as a suggestion. So you could type Hal and it would suggest Halifax. But if you wanted you could just type Helifax and use that. This allowed for the easy addition of new towns and cities because in this small region they seemed to think we would be getting new towns
  • big data needs data science. data science does not need big data. data science = statistics and machine learning (mostly)

  • To predict global warming? Isn't this a form of "Data Science"?
  • by quax ( 19371 ) on Saturday January 31, 2015 @12:35AM (#48944797)

    Watson was impressive on Jeopardy, but a TV show is a very different venue than business data analytics.

    For the latter you really need a statistically sound approach in order to reach the right conclusion. [bayesia.us]

    (DISCLAIMER: I do not work for Bayesia, but actually a competitor, yet any person or company that understands Bayesianism [lesswrong.com] as a sound foundation for knowledge inference knows this dirty little secret about Watson)

