Cutting Through Data Science Hype
An anonymous reader writes: Data science — or "big data" if you prefer — has evolved into a full-fledged buzzword, thanks to marketing departments around the world. John Foreman writes that part of the marketing blitz has been focused on how fast big data analysis can be. Most companies offering some kind of analytic service try to sell you on how it'll make it easy for you to quickly find and fix the problems with your business. But he points out that good, robust models need a stable set of inputs, and businesses often change far too quickly for any kind of stable prediction. He takes IBM's analytic services as an example, quoting Kevin Hillstrom: "If IBM Watson can find hidden correlations that help your business, then why can't IBM Watson stem a 3 year sales drop at IBM?" Foreman offers some simple advice: "Simple analyses don't require huge models that get blown away when the business changes. ... If your business is currently too chaotic to support a complex model, don't build one."
Re: (Score:2)
Actually last year bonuses were forgone amid lower profits: BBC [bbc.com].
Re: (Score:1)
However they authorized stock buybacks that probably more than made up for the lack of 'bonuses' through sell off of restricted stock units. They didn't have bonuses directly, but they authorized giving cash to stockholders (particularly themselves).
Re: (Score:2)
Last year they gave up their bonuses. This year [yahoo.com] they brought them back.
Re: (Score:2)
Actually last year bonuses were forgone amid lower profits....
Now Watson has some data on what happens to a company when you cut the pay of its top-performing employees more than the lowest performing! *
* I'm talking about the regular employees who get ranked, not necessarily the executives.
Humans ask the questions. (Score:2)
The problem with op
Re: (Score:2)
Or "garbage in, garbage out" if you've seen the results of mathematically optimized processes encountering physical reality. But hey, someone earned a bonus for implementing them, and it's not their fault someone got the flu, a storm delayed a ship, a roadwork delayed a truck which thus arrived just after lunch hour began, the warehouse door got stuck so they had to use anoth
Re: (Score:1)
He is making the assumption that IBM is concerned with a sales drop. For the last decade and a half the only thing their awful management has cared about is executive compensation. Even after this year's awful earnings the genius Ginni said 'the results prove our strategy is working', and lo and behold they voted themselves bonuses today.
Agreed, his criticism is making the same mistake that the scientific method is there to avoid, jumping to conclusions by way of logical fallacies.
There could be any number of causes for a 3 year sales drop, and many of them lie in the market IBM is operating in. Making snarky commentary about using Watson to automagically fix the sales drop is hyperbole, not an analysis of predictive analytics or how it works.
This article says nothing about the size and diversity of datasets, nothing about regression algorithms,
Re:IBM (Score:5, Insightful)
This pretty much sums up the entirety of Big Data.
Data analysis can highlight the correlations that would otherwise go unnoticed, and the "big" data sets involved help to ensure that the noticed correlations are statistically significant. With a large enough sample size, the effects of time can be eliminated from the statistics, supporting analysis of even highly-dynamic models. To a statistician, this is all trivial, given a large enough data set.
Once correlations are discovered, interpreting them in the business context is a different matter for which computers are not well-suited. As the phrase goes, correlation is not causation. A business expert must analyse the observations and figure out what it all means. There may be a correlation indicating a causal relationship, or there may be a hidden cause not covered by the available data.
Even if a causal relationship can be identified, the management may not want to act on it. Sure, the company might make more money by changing their behavior in a particular market segment, but if that segment is dying, it may not be worth the expense to change now. That's also not a task for computers, yet.
Big Data techniques are effectively just a tool. It does one job particularly well, and does a few other jobs well enough to be useful. It is still up to humans to determine if Big Data is the best tool for a particular situation.
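To put a number on the "statistically significant" point above: with a big enough sample, even a correlation far too weak to act on clears the significance bar. A rough sketch in Python, assuming a Pearson correlation and using the Fisher z-transform as a normal approximation (the function name is made up for illustration, and this is not how any particular product does it):

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n samples.

    Assumption: uses the Fisher z-transform, where atanh(r) * sqrt(n - 3)
    is roughly standard normal when the true correlation is zero.
    """
    z = abs(math.atanh(r)) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal beyond |z|.
    return math.erfc(z / math.sqrt(2))


# Same tiny correlation, two very different sample sizes.
for n in (1_000, 1_000_000):
    p = correlation_p_value(0.01, n)
    print(f"n={n:>9,}  r=0.01  r^2={0.01 ** 2:.4f}  p={p:.3g}")
```

At a thousand rows r = 0.01 is noise; at a million rows it is "significant" while still explaining about 0.01% of the variance. Significance tells you the correlation is probably real, not that it matters, which is where the business expert comes in.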
Re: (Score:2)
> With a large enough sample size, the effects of time can be eliminated from the statistics.
Oh, dear. This is so wrong, on so many levels, I'm having difficulty even knowing where to start. But "time" is one of the most critical axes in any system involving feedback and cannot be safely ignored.
Re: (Score:2)
It's poorly worded above, but perhaps a better way to say it is that the time-dependent churn in a particular model is negligible (to a statistical irrelevance) if you can get enough data quickly enough. Effectively, once your data stream outpaces the time-dependent effects, those effects may no longer be relevant variables in your calculations.
For example, I'd expect that Google can collect enough data in an hour to determine if a UI improvement is helpful, or if a particular change to PageRank results in
Re: (Score:2)
The problem with Big Data as I see it: information is not the same as knowledge.
Sure, there is a lot of data, as more and more information feeds are made available, but there are still a lot of hidden data. The amount of work put into hiding data is huge. Also, the amount of work put into generating data is huge too, which creates a lot of noise. The point is, a typical decision involves tiny little microscopic bits of _knowledge_, and only a small sample from the masses of information that could be waded t
Re: (Score:2)
Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.
Re: (Score:3)
Find a rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.
Buy a very expensive rusty railroad spike. Shove it through your eyeball over and over again. That's what IBM products are like.
There, fixed it for you.
IBM has turned into GM (Score:1)
I've never had much of a chance to use IBM offerings. What is AIX like? What is DB2 like? What is Informix like? What is Lotus like? What is WebSphere like? What is the XL C/C++ compiler like?
IBM is repeating what General Motors has been doing: putting out junk, after junk, after junk
Decades ago it didn't matter if you bought Pontiac or Chevrolet or Buick, you bought the same fucking junk
Nowadays it doesn't matter if it is Informix or WebSphere or AIX or DB2 ... they simply aren't worth their sticker price
How can she live on such a low income? (Score:2)
Top 10 Reasons Why Ginni Rometty Will Fail as IBM's New CEO [netnetweb.com]
Summary from the article:
1. IBM Forgot Who They Were.
2. Ginni Has No Vision for the Future of IBM.
3. IBM Executives are out of Touch.
4. IBM's Sales Culture is Poison.
5. IBM's Executive Compensation is Misaligned.
6. IBM's Rape, Pillage & Burn Acquisition Strategy.
7. IBM's Offshore Model will kill its Services Business.
8. IBM Sells Futures. What is IBM's strategy
Missing the forest for the trees (Score:1)
IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?
Re: (Score:2)
IBM, like SAP, Oracle and the rest, are dinosaurs unable to adapt their businesses to changing markets. Why would they be able to do the same for your company?
Well, I'd say that fossil fuels, which are mostly composed of dinosaurs who were unable to adapt (along with plants who were unable to adapt, and various other organisms who were unable to adapt) revolutionized the hell out of our entire civilization...
Maybe if IBM were buried and subjected to a few million years of heat and pressure they too would become a highly coveted resource?
Re: (Score:3)
The dinosaurs did not die out because they were unable to adapt any more than a person dies because they fail to "adapt" to a grenade.
Re: (Score:2)
Evolution is a cast-iron bitch sometimes. Dinos didn't adapt to the big grenade. Lots of other critters did.
(And yes, fossil fuels are composed of relatively few actual dinosaurs, it's mostly ex-plant life.)
Re: (Score:2)
Grenades and huge rocks aren't "evolutionary," they are "catastrophic."
Re: (Score:2)
Catastrophe is a critical factor in most evolutionary history. Practices and traits that were successful, successful enough to become part of the biology or lifestyle of an organism, often fail as circumstances change. I'm afraid that abrupt changes in environment are a common, though often unpredictable, factor in many species.
Re: (Score:2)
Catastrophe is a critical factor in most evolutionary history.
Citation, please.
Re:Missing the forest for the trees (Score:4, Interesting)
>> Catastrophe is a critical factor in most evolutionary history.
> Citation, please.
Wikipedia has a fairly good entry on "Catastrophism", and another on "Punctuated equilibrium". But even without large scale events such as dinosaur killer asteroids or the evolution of photosynthesis poisoning most species with much higher concentrations of volatile oxygen, there are much smaller and more frequent effects. Forest fires are a critical factor in breeding jack pine trees, floods are vital to the fertility of the ecosystem near river banks, and hurricanes spread species throughout their trail and profoundly affect the ecology and evolution of areas that are likely to endure hurricanes. And catastrophes can and do create a "founder effect", where a small number of introduced species members become a new species quite quickly in their new environment.
Do I need to find individual links for each of those?
Re: (Score:1)
Across all of the Firefox-branded products, 87% of people report being "sad" with Firefox, while only 13% are "happy" with it!
There is a problem with the sad/happy feedback classes. Which feedback type is right to pick for idea submission, neutral feedback, feedback about issues that make the user both sad and happy? What about interactions with add-ons? I personally try to send both feedback types by breaking up the issues as much as possible, sometimes failing at it. With free software, the user can be happy even with clear problems, or limitations.
"Big Data" HYPE is "bullshit". (Score:2)
none of which disproves TFA's thesis...
TFA is about the **hype**...everything described in your post is value-added...not hype
Re: (Score:2)
Big data is really a thing.
Firefox feedback is not, in any sense, a representation of big data.
Global data sets are, for lack of a better word, global.
You are, for lack of a better word, a complete and total brain-lacking vacuum.
Re: (Score:2)
You are absolutely right, only problem is that Watson doesn't perform proper statistics. It's anything but Bayesian learning.
IBM's got this (Score:3)
Reminds me of a joke (Score:5, Funny)
"Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.
Re: (Score:2, Funny)
"Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.
Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?
Re:Reminds me of a joke (Score:5, Funny)
"Big Data" is like sex in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.
Well, OK, but this is slashdot. Are you sure your audience will get this analogy? Can you try to rework this into a car analogy instead?
"Big Data" is like sex in a car while in high school. Nobody really knows for sure how to do it properly, but everyone thinks everyone else is doing it, so everyone says they're doing it, too.
Re: (Score:2)
Sex in a car? Sounds messy and uncomfortable...
Re: (Score:2)
I used to have a car with a back seat truly the size of a sofa, a 1960 Dodge Phoenix (2dr dart... before they shrunk it). But alas, although I actually was having sex regularly, the car had no working parking brake so I couldn't do it in the car. Haven't had a vehicle with a big enough back seat to get my freak on since. I may never lose that purity point.
Re: (Score:2)
Indeed. The one big-data project I personally see at a customer does have the advantage that the IBM-team is too stupid to actually collect the data (they just cannot hack the engineering and have been delayed for over a year now and just recently were removed from the production platform again because they break other things). So while the customer pays them oodles of money, they at least do not get bogus analyses in return.
The fascinating thing is that I thought that you do not find the combination of extr
Re: (Score:2)
I know the guy that did it. Big data is about asking the guy that did it.
If I can assign that guy an identifier, then I know you forever.
I know the girl, and I know the guy. More importantly, I know the guy that didn't go for that girl. I want to get paid.
More importantly, I want everyone to be private.
Don't pay me. I can't be bought.
But everyone else, for all practical purposes, can.
Re: (Score:2)
Indeed, and the few that actually do (or did) get it, love(d) it!
Re: (Score:1)
It would have been much more effective if you left the last few sentences out.
Put simply, Tyson is a celebrity, not a scientist.
Re: (Score:1)
Brady Haran is neither, but he puts actual scientists on his YouTube channels, and they talk about honest science (and occasional amusing trivia), with no CGI or celebrity required. No politics, no manufactured quotes, many Nobel prizes.
Re: (Score:2)
Watson is a bad example since the goal of Watson was to be a showcase of what can be done in a particular area. It was the same with Deep Blue, the computer that won against the world chess champion Garry Kasparov. Nobody is using Deep Blue or Deep Blue-like machines to play chess. This was an algorithm and architecture challenge. The same holds for Watson.
The argument using Watson's incapacity to make IBM the most profitable company in the world is then irrelevant. However, IBM is selling since a long time d
Re: (Score:2)
Actually, Watson is pretty cool as you can feed in natural language data. That removes the very expensive translation step from creating an expert system. It does not do predictions or analyses though, it is just an expert system. Expert systems can be very useful in some tasks, but are rather limited in what they can do. And no, Watson is not (true/strong/whatever) AI and at least to expert audiences IBM is not claiming it is.
SPC (Score:3)
Statistical Process Control and Western Digital rule are very applicable here. Without stability for a baseline, it's (pretty well) impossible to utilize small data, much less big data (big bad data:).
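For what it's worth, the SPC run rules being referred to are usually called the Western Electric rules. A minimal sketch of two of them in Python, assuming you already have a mean and sigma estimated from a stable baseline period (which is exactly what a chaotic business does not give you). The function names are made up for illustration, and only two of the rules are shown:

```python
def beyond_three_sigma(points, mean, sigma):
    """Rule 1: indices of single points more than 3 sigma from the center line."""
    return [i for i, x in enumerate(points) if abs(x - mean) > 3 * sigma]


def same_side_run(points, mean, run=8):
    """Rule 4: indices where `run` or more consecutive points sit on one side
    of the center line. Points exactly on the line break the streak."""
    flagged = []
    streak = 0
    last_side = 0
    for i, x in enumerate(points):
        side = (x > mean) - (x < mean)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            streak += 1
        else:
            streak = 1 if side != 0 else 0
        last_side = side
        if streak >= run:
            flagged.append(i)
    return flagged


# Hypothetical baseline: mean 10, sigma 1.
samples = [10.2, 9.8, 13.5, 10.1]
print(beyond_three_sigma(samples, mean=10.0, sigma=1.0))
print(same_side_run([10.5] * 9, mean=10.0))
```

Both checks are only as good as the baseline: if the mean and sigma move every quarter, every point is an "out of control" signal and the chart tells you nothing.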
Re: (Score:1)
Er, yes (he wrote, shamefacedly).
Marketing (Score:4, Funny)
If you have a marketing department, you're wasting money.
If you hire a marketing firm, you're burning money.
If you hire a marketing firm and then take their advice, you're emptying your bank account into a volcano.
Re: (Score:1)
Actually, marketing is the soul of the business.
*Cue to the corporate-atheists that claim that businesses have no souls...
Re: (Score:3)
If you don't have a marketing department no one knows you exist.
Marketing is a bucket of shit at the best of times, but you can do very little without it.
Re: (Score:2)
Marketing also encompasses requirement gathering i.e. understanding what the market needs. Especially for the fast moving software industry it is a core business process and about much more than just advertising and branding.
Data scientists == web masters (Score:3)
Data scientists are this bubble's web masters. 'Nuff said.
Re: (Score:2)
Fair assessment.
research design = solution (Score:2)
these systems could be effective, but it comes down to ontology or more broadly research design
i'm not saying *any* company can benefit from "big data", but most can
the core problem is a misunderstanding of what is happening...from a to z alot of biz people are just clueless...the techies they hire to do the big data are partially responsible for this
data analysis is great...everyone does it to some level...highly complex data analysis in a biz situation must have well thought out research questions and res
Good data first, then maybe big data later (Score:5, Insightful)
Or I would have lat longs for customers that put them 100 miles off the coast of Nova Scotia (not sable island either). Or mostly good lat longs but if they couldn't get one then they would use the lat long of the nation's capital resulting in 20% of the customers residing in any given nation's capital which also then obscured the actual number of customers in the nation's capital.
And then dates, can nobody ever get dates right. A favourite is that round one of the system will only record the day of a transaction but later they expand their collection to the hour and minute but now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.
I can go on and on; one of my recent favorites is a phone company database where many phone calls never begin, or never end.
So I think the big bucks is not in doing an ML processing of their data using some ingenious Hadoop crap but to maybe use ML to clean the data up. And by the way if someone has a tilde(~) in their name your OCR needs to be shot.
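None of the problems above needs ML to spot in the first place. A rough sketch in Python of the kind of sanity checks that would catch the capital-city lat longs, the everything-at-noon timestamps, and the calls that never begin or end. The helper names, the 5% threshold, and the record layout are all assumptions for illustration, not anyone's real schema:

```python
from collections import Counter
from datetime import datetime


def overrepresented(values, max_share=0.05):
    """Return {value: share} for values that make up more than max_share of
    the data. A single coordinate or time of day holding a large share is
    usually a default or placeholder, not reality."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items() if c / total > max_share}


def incomplete_calls(calls):
    """Return call records missing a start or an end (hypothetical dict layout)."""
    return [c for c in calls if c.get("start") is None or c.get("end") is None]


# Made-up data: 20 customers piled onto one coordinate, 80 spread out.
coords = [(45.4215, -75.6972)] * 20 + [
    (44.0 + i * 0.01, -63.0 - i * 0.01) for i in range(80)
]
print(overrepresented(coords))

# Made-up data: old records stamped at noon, newer ones with real times.
stamps = [datetime(2015, 1, d, 12, 0) for d in range(1, 21)] + [
    datetime(2015, 2, 1, h, 7) for h in range(20)
]
print(overrepresented([t.time() for t in stamps]))

calls = [
    {"id": 1, "start": datetime(2015, 3, 1, 9, 0), "end": None},
    {"id": 2, "start": datetime(2015, 3, 1, 9, 5), "end": datetime(2015, 3, 1, 9, 9)},
]
print(incomplete_calls(calls))
```

Flagging is the cheap part. Deciding what to do with the flagged 20% (drop them, re-geocode them, treat noon as "time unknown") is where the money goes.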
Re: (Score:3)
Absolutely true. Unfortunately, it's far easier to convince management that the problem is the lack of a shiny tool that shows them pretty graphs than shitty data that they have to pay some consultant an ungodly amount of money to fix. Because, of course, no one in the company has the time to fix the data on which they run their business.
Re: (Score:2)
Hey now!!! Ungodly amounts of money paid to consultants is how I make my living; don't go shitting on it :)
Re: (Score:3)
And then dates, can nobody ever get dates right. A favourite is that round one of the system will only record the day of a transaction but later they expand their collection to the hour and minute but now the old dates are all at noon or something. So when you try to find the usage pattern of users there will be this massive spike at noon and a scattering of transactions in the rest of the day. Try and run that through a Bayesian analysis.
Data quality has been an issue with every project I've worked on involving data analysis or integration into a new system. One project was combining two employee databases for a merged company, where they decided to use SSNs as the key for unique records since it was a US company. Unfortunately for them, foreign employees on temporary jobs in the US often had 999-99-9999 or 123-45-6789 as SSNs, with the occasional real one thrown in. Then there were duplicate valid SSNs for employees that worked for both co
Re: (Score:2)
"Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data."
I like this. I will use this from now on with my client. I will be sure to give proper credit to a Registered Coward :)
Re: (Score:2)
Data cleanup will take twice as long, cost twice as much, and you will lose at least 10% of your data when you decide to finally give up scrubbing the data.
I actually independently came up with the 10% figure today as well, and mentioned to my project manager that unless he wants to invest real money chasing the long tail of data, he was going to have 10% of the records with bogus values in some fields. I will certainly adopt the rest of your quote!
I have since added a corollary: I do not do IT projects unless you pay me enough to retire on.
Here you lost me. Why were you even in this business if you didn't love the challenge? Don't take other people's bad data personally. Take it as an opportunity.
Re: (Score:2)
I have since added a corollary: I do not do IT projects unless you pay me enough to retire on.
Here you lost me. Why were you even in this business if you didn't love the challenge? Don't take other people's bad data personally. Take it as an opportunity.
I get enough work doing other things so IT work is something I can avoid unless it is lucrative enough. Most of my IT projects started out doing something different, then getting roped into staying on when they discovered I could actually deliver results. I've learned to say NO when asked to stay.
Re: (Score:2)
I see what you mean. You seem to suffer from The Curse of Competence:
http://dilbert.com/strip/2008-... [dilbert.com]
Re: (Score:2)
Yes! Dear Tea Pot! YES YES YES!!!!!!
Then you find out the transactional data is jacked because it is 1) manually entered by a third party (not the user/customer) 2) entered without regard to policy 3) maybe not entered at all. [hangs head] and then they are the very ones asking for the analysis of that same data to drive their future planning and you want to beat them over the head with your rusting slide rule!!!!!!!
Re: (Score:2)
Oh and the data input had pulldowns as a suggestion. So you could type Hal and it would suggest Halifax. But if you wanted you could just type Helifax and use that. This allowed for the easy addition of new towns and cities because in this small region they seemed to think we would be getting new towns
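Cleaning up the "Helifax" entries after the fact is at least mechanical. A small sketch using Python's standard difflib, assuming you have a list of known town names to match against (the list, the function name, and the 0.8 cutoff here are made up for illustration):

```python
import difflib

# Hypothetical reference list of valid town names.
KNOWN_TOWNS = ["Halifax", "Dartmouth", "Sydney", "Truro"]


def suggest_town(entered, known=KNOWN_TOWNS, cutoff=0.8):
    """Return the closest known town name, or None if nothing is close enough.

    Matching is case-sensitive; a None result means a human has to look at it.
    """
    matches = difflib.get_close_matches(entered, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_town("Helifax"))
print(suggest_town("Springfield"))
```

A cutoff that high will miss the worse typos, and a lower one starts merging towns that really are different, so the leftovers still end up in somebody's manual review pile.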
data science != big data ! (Score:1)
big data needs data science. data science does not need big data. data science = statistics and machine learning (mostly)
Aren't climatologists using "Big Data"? (Score:1)
The convoluted concept doesn't help (Score:3)
Watson was impressive on Jeopardy, but a TV show is a very different venue than business data analytics.
For the latter you really need a statistically sound approach in order to reach the right conclusion. [bayesia.us]
(DISCLAIMER: I do not work for Bayesia, but actually a competitor, yet any person or company that understands Bayesianism [lesswrong.com] as a sound foundation for knowledge inference knows this dirty little secret about Watson)