Follow Slashdot blog updates by subscribing to our blog RSS feed


Forgot your password?

Comment: Re:Shot in the dark: (Score 3, Insightful) 223

by arlow (#15101891) Attached to: Why Is Data Mining Still A Frontier?
It does work, but it requires judgement. A lot of people seem to think that you just shove the data into a statistical test, out comes a p-value, and if it's small enough you win. Interpreting and validating the initial hit is where 90% of the real work is, and it requires the careful application of prior knowledge and subsequent experiments. I work with a guy who's probably one of the best statisticians in the world, and he often asks me, "well, does the result make sense?" His judgement was developed over decades of looking at real data. If you just shove your data into an algorithm and take the top-scoring hits, you'll probably spend most of your time chasing bogus predictions. Algorithms are good for automating specific tasks that are essentially repeatable. Data mining requires an in-depth understanding of the specific problem you're trying to solve; you usually need to tailor your statistics so that they make sense for the problem. That's why the idea of selling someone a suite of fancy data mining software is probably useless; you need to sell them the statistican too.

probably :/

"The pyramid is opening!" "Which one?" "The one with the ever-widening hole in it!" -- The Firesign Theatre