Why Is Data Mining Still A Frontier?


Posted by Zonk on Monday April 10, 2006 @04:48PM from the let's-get-together dept.

bbsguru writes "How much do we know that we still don't know? A story in The Register points out that little has changed since Francis Bacon proposed combining knowledge to learn new things 400 years ago, despite all the computer power we now have. Scientific (and other) data is still housed in unrelated collections, waiting for some enterprising Relational Database Programmer to unlock the keys to understanding. Is RDBMS still a Brave New Frontier, or will Google make the art obsolete once they finish indexing everything?"

This discussion has been archived. No new comments can be posted.

Shot in the dark: (Score:5, Insightful)

by Spazntwich ( 208070 ) writes: on Monday April 10, 2006 @04:50PM (#15101182)

Either
a) There's not enough money in it to make it worthwhile

or

b) It doesn't work.

- Re:Shot in the dark: (Score:5, Insightful)
  
  by Disavian ( 611780 ) writes: on Monday April 10, 2006 @04:58PM (#15101271) Homepage
  
  How about
  
  c) our ability to produce data far outstrips our ability and/or willingness to analyze it
  
  - Re:Shot in the dark: (Score:5, Insightful)
    
    by flynt ( 248848 ) writes: on Monday April 10, 2006 @05:31PM (#15101529)
    
    Also, blindly "mining" data for trends can be very misleading. Hypothesis generation is usually better done some other way. There will always be trends in data we already have that are there by chance, and this is what data mining finds in many cases. Then models are fit to that data and don't validate on future samples taken, and everyone wonders why.
    
    - Re:Shot in the dark: (Score:3, Insightful)
      
      by AuMatar ( 183847 ) writes:
      
      Occurrences of polio go up in summer.
      People eat more ice cream in summer.
      
      Conclusion: ice cream causes polio.
      
      This was actually something people believed for a brief time before the Salk vaccine. It's also a great example of the kind of facts data mining most frequently dredges up: accidents or correlation with no real common cause.
      - Re:Shot in the dark: (Score:2)
        
        by VolciMaster ( 821873 ) writes:
        
        Conclusion: ice cream causes polio.
        
        Amazing how many people still don't understand the difference between correlation and causation. I think everyone should take an intro economics or statistics class just to realize the difference.
        
        Re:Shot in the dark: (Score:2)
        
        by mcmonkey ( 96054 ) writes:
        
        Conclusion: ice cream causes polio.
        Amazing how many people still don't understand the difference between correlation and causation. I think everyone should take an intro economics or statistics class just to realize the difference.
        Who has time for an economics class? Summer is just around the corner. We've got to do something about all that ice cream out there!!
        Please, think of the children.
        (Maybe we should start adding vaccine to the sprinkles. (Am I the only one who calls them jimmies?))
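
flynt's warning earlier in this subthread (trends that are there by chance, models that don't validate on future samples) can be sketched in a few lines. This is only an illustration under stated assumptions: the data is pure random noise generated on the spot, and the helper names `pearson` and `best_chance_predictor` are made up for the example, not taken from any tool mentioned in the thread.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_predictor(n_rows=40, n_features=500, seed=0):
    """Mine noise for its best 'trend', then check it on held-out rows."""
    rng = random.Random(seed)
    # Target and every candidate feature are independent noise by construction.
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

    half = n_rows // 2
    train, hold = slice(0, half), slice(half, n_rows)

    # "Mine" the first half: keep whichever feature correlates best with the target.
    scores = [abs(pearson(f[train], target[train])) for f in features]
    best = max(range(n_features), key=scores.__getitem__)

    # Re-test that single winning feature on rows it was not selected on.
    return scores[best], abs(pearson(features[best][hold], target[hold]))


train_r, holdout_r = best_chance_predictor()
print(f"best |r| on mined half:        {train_r:.2f}")
print(f"same feature on held-out half: {holdout_r:.2f}")
```

With enough candidate features, something always looks strongly correlated on the rows you searched, even though nothing here is related to anything. Holding back rows before the search is one common guard, not the only one.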
  - Re:Shot in the dark: (Score:3, Insightful)
    
    by rainman_bc ( 735332 ) writes:
    
    I'm a big fan of c. As a reporting and data analyst, I see the same crap all the time.
    
    People design systems for what they want to put into them, without considering what they want to get back out. That usually results in crappy query performance and all that crap because of insufficient care. When designing a system, engineers need to be aware of: 1) What do we want to store and how do we want to store it, 2) how do we want to put it in there, 3) What do we want to get back out of it.
    
    Many people in designin
    - Re:Shot in the dark: (Score:2)
      
      by IdleTime ( 561841 ) writes:
      
      Not to mention that when the data is not normalized (as is the case with most customers I deal with), it's just a messy spaghetti of data that cannot be related outside its initial scope.
      - Re:Shot in the dark: (Score:2)
        
        by budgenator ( 254554 ) writes:
        
        not all data should be normalized (accounting data jumps to mind), but most data that should be normalized isn't
        
        Re:Shot in the dark: (Score:2)
        
        by lrichardson ( 220639 ) writes:
        
        One of the databases I play with (and they pay me, too!) is Essbase ... in one sense, the most highly normalized form possible. And yes, accounting data goes in it.
        Think star schema, with the central table containing just numerical 'facts'. Each record's key links to every other table, and, for query optimization, we've got just one 'fact' per record. Payments, APR, Balances, they all get slapped in.
        It's one of the best OLAP tools I've seen. A hell of a lot of work to do it 'right', like ten hours pr
      - Re:Shot in the dark: (Score:2)
        
        by rainman_bc ( 735332 ) writes:
        
        Of course when data is normalized poorly sometimes you end up with impossible joins, only solved by unions.
        
        A specific case comes to mind. A guy I worked with wanted to design a billing system. He had six tables representing detail lines on the invoice. Each table had identical fields except for a few items. The data should not have been normalized because a report on invoicing would have required a six table union. Unacceptable IMO.
        
        Re:Shot in the dark: (Score:3, Funny)
        
        by Shimmer ( 3036 ) writes:
        
        Doesn't sound very normalized to me. Those "identical fields" should have been moved into their own table.
        
        Re:Shot in the dark: (Score:2)
        
        by jbolden ( 176878 ) writes:
        
        If you are using a real database look up "materialized views". If you aren't then add this to the list of reasons you should be.
    - Re:Shot in the dark: (Score:2)
      
      by Doctor Faustus ( 127273 ) writes:
      
      3) What do we want to get back out of it.
      
      Many people in designing systems pass over 3.
      
      Good. The desired results change too often to put them in the data model.
      
      It's been my experience that the best database designs come from focusing on a layout that makes sense. Aside from recursive hierarchies (which are a special case because they don't fit into the relational model very well), you should only need to look much at the actual queries you expect to run when you're deciding on indexes.
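
Doctor Faustus's point about leaving the queries out of the data model and only consulting them when choosing indexes can be tried with Python's built-in `sqlite3` module. A minimal sketch, assuming a hypothetical `invoice_line` table and a hypothetical lookup by `customer_id`; the table, column, and index names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_line ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL,"
    " amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO invoice_line (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

# The query we expect to run often; the schema above was not shaped around it.
QUERY = "SELECT * FROM invoice_line WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


before = plan(conn, QUERY, (42,))
# Only now, knowing the query, add an index that serves it.
conn.execute("CREATE INDEX idx_line_customer ON invoice_line (customer_id)")
after = plan(conn, QUERY, (42,))

print("before index:", before)
print("after index: ", after)
```

The table definition never changes between the two plans; only the index does. The exact wording of the plan text varies between SQLite versions, so treat the printed strings as informational.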
  - Re:or... (Score:2)
    
    by symbolic ( 11752 ) writes:
    
    Our ability to produce meaningful results, in most cases, is little more than a crapshoot.
  - Re:Shot in the dark: (Score:2)
    
    by KefabiMe ( 730997 ) writes:
    
    c) our ability to produce data far outstrips our ability and/or willingness to analyze it
    
    Wouldn't that be the same as b) it doesn't work?
    - Re:Shot in the dark: (Score:2)
      
      by Tim C ( 15259 ) writes:
      
      No, his c) is more like "I could do it, but I really can't be bothered".
  - Re:Shot in the dark: (Score:2)
    
    by drinkypoo ( 153816 ) writes:
    
    C is the "all of the above" choice. Willingness is tied to monetary reward. Ability falls under "it doesn't work". Congratulations on your +5 score on a totally redundant comment.
  - Re:Shot in the dark: (Score:2)
    
    by Miraba ( 846588 ) writes:
    
    You get cookies from me. It's especially true when scientists do field work, since the emphasis is to take as much data as possible.
    
    Real World Example: This past summer, I went to Cyprus for a field survey (surface examination and collection, no digging involved). In three weeks of 15 people working 4 hours a day, we grabbed over 10,000 pieces of worked stone. A proper excavation will yield enough data for an academic lifetime, but only a small percentage will ever be thoroughly analyzed and published.
- Clarification of your "b". (Score:2)
  
  by khasim ( 1285 ) writes:
  
  It doesn't work.
  
  How about "It doesn't work the way the vendor/consultant/salesguy/magazine said it would."
  
  The information you get out depends upon the data you put in.
  
  The people looking to "find" information in the data are the same people who decided what data to collect in the first place. And from whom to collect it. Etc.
  
  That means that you'll find out that 2004 was a banner year for bubblegum ice cream. But you won't know what will be popular in the summer of 2006.
- Re:Shot in the dark: (Score:5, Informative)
  
  by Daniel Dvorkin ( 106857 ) * writes: on Monday April 10, 2006 @05:03PM (#15101313) Homepage Journal
  
  Neither of those is quite true -- a lot of entities public and private are throwing a lot of money at data mining research, reasonably expecting a big payoff, and sometimes it gets very good results indeed. The basic problem is that, as with any worthwhile CS question, doing it well is hard. It is very easy to come up with false connections between data. Sorting the wheat from the chaff in any kind of automated or even semi-automated fashion, OTOH, is an enormous challenge.
  
  Analogies like this are always dangerous, but I'd say data mining now is about where language development was in the mid-1950's, when FORTRAN was first being developed. IOW, we have a set of tools that kind of work, most of the time, for certain applications -- but we can pretty much guarantee that they're not the best possible tools, and that we will build better ones. Consider how much work is still going on in language development half a century later, and you can see how much room there is for further development.
  
  - Re:Shot in the dark: (Score:5, Informative)
    
    by Coryoth ( 254751 ) writes: on Monday April 10, 2006 @06:15PM (#15101791) Homepage Journal
    
    a lot of entities public and private are throwing a lot of money at data mining research, reasonably expecting a big payoff, and sometimes it gets very good results indeed. The basic problem is that, as with any worthwhile CS question, doing it well is hard. It is very easy to come up with false connections between data. Sorting the wheat from the chaff in any kind of automated or even semi-automated fashion, OTOH, is an enormous challenge.
    
    I would suggest that, in practice, the real difficulty is that the problems that need to really be solved for data mining to be as effective as some people seem to wish it was are, when you actually get down to it, issues of pure mathematics. Research in pure mathematics (and pure CS which is awfully similar really) is just hard. Pretending that this is a new and growing field is actually somewhat of a lie. It's a very, very old field which people have been working on for a very long time, to the point where the problems that remain to be solved are incredibly difficult. What is new is someone other than pure mathematicians taking much interest in these problems. Do a search for "non linear manifold learning" on Google and you'll see what I mean.
    
    Jedidiah.
    
    - Re:Shot in the dark: (Score:2)
      
      by TubeSteak ( 669689 ) writes:
      
      I would suggest that, in practice, the real difficulty is that the problems that need to really be solved for data mining to be as effective as some people seem to wish it was are, when you actually get down to it, issues of pure mathematics.
      Don't forget that if you ask the wrong questions you get either:
      A. Wrong Answers
      or
      B. Garbage
      
      Having computers crunch data to look for relationships is all well and good, but you're almost always going to need someone to interpret the results to make sure they aren't A or
    - Re:Shot in the dark: (Score:4, Interesting)
      
      by asuffield ( 111848 ) writes: <asuffield@suffields.me.uk> on Monday April 10, 2006 @09:25PM (#15102870)
      
      I would suggest that, in practice, the real difficulty is that the problems that need to really be solved for data mining to be as effective as some people seem to wish it was are, when you actually get down to it, issues of pure mathematics.
      
      That's part of the problem.
      
      Another part is computational complexity. No, I'm not kidding. These things are often in like the second and third powers of the data set size. The data sets are often terabytes in size. We don't have computers that big, and by the time we do, we'll probably have bigger data sets. Contemporary data mining is an exercise in finding a fast enough approximation that is accurate enough to look convincing. We're not really sure how accurate they actually are - most of the time, there's no way to find out for certain. "Probably good enough" is the best you normally get. Some researchers can put a number on that 'probably' for you, eventually. Mostly they just compare the available approximations and tell you which one works the best.
      
      The biggest problem is the inability to figure out intelligent things to do with it. Computers aren't smart. You can't just hand them a heap of data and say "find me the things I want to know". You have to work out what the patterns in the data are for yourself, then do pure math research to turn those patterns into a mathematical model. Then you have to come up with useful questions to ask that model. That's two major insights plus several years of work - and most researchers only have one major insight in their entire career. Just to figure out what question to ask. Data mining is then the process of repeatedly answering that question for all possible values of the parameters. And the answers you get out will only be as good as the model you invented. The current method for discovering usable patterns in data is trial and error.
      
      I think that 'data mining' is more or less a frontier by definition. It's all the things we don't yet know about the data we currently have which would take a huge amount of effort to discover. Most unsolved problems in mathematics could probably be called 'data mining problems': if an answer exists, it can be derived from the existing body of theory. Most decisions that people make, from deciding whether to eat now or later, to deciding whether to invade a foreign nation, can also qualify. The sheer range of things it could cover means that there will probably always be vastly more unsolved problems than solved ones.
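
asuffield's complexity argument (exact answers cost the second or third power of the data set size, so practice falls back on approximations that are "probably good enough") can be shown on a toy statistic. A rough sketch, assuming the quantity of interest is the mean pairwise distance between one-dimensional points; the function names and sizes are invented for the example and stand in for far more expensive real analyses.

```python
import random


def exact_mean_distance(points):
    """Average |a - b| over every unordered pair: n*(n-1)/2 comparisons."""
    total, pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += abs(points[i] - points[j])
            pairs += 1
    return total / pairs, pairs


def sampled_mean_distance(points, n_samples, rng):
    """Estimate the same average from a fixed number of random pairs."""
    total = 0.0
    for _ in range(n_samples):
        a, b = rng.sample(points, 2)
        total += abs(a - b)
    return total / n_samples, n_samples


rng = random.Random(1)
points = [rng.random() for _ in range(1000)]

exact, exact_cost = exact_mean_distance(points)
approx, approx_cost = sampled_mean_distance(points, 5000, rng)

print(f"exact  {exact:.3f} using {exact_cost} comparisons")
print(f"approx {approx:.3f} using {approx_cost} comparisons")
```

Doubling the number of points roughly quadruples the exact cost, while the sampled estimate costs the same. The catch asuffield describes still applies: here we can compare against the exact answer, but at terabyte scale that check is usually unavailable.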
      
      - Re:Shot in the dark: (Score:2)
        
        by RollingThunder ( 88952 ) writes:
        
        We're about to roll out the "new" data warehouse at work. It's gonna start at 60 TB. I pray to god we never have to restore the bloody thing.
      - Semantic web (Score:2)
        
        by inKubus ( 199753 ) writes:
        
        Maybe Tim Berners-Lee and his semantic web will make something happen. That's the real problem. When you have to write like 30 or 40 layers of SQL queries to get what you want, and then to get a decent report you have to spend 100 hours in crystal or make compromises, and in the end all you have is more data. What is the MEANING of the data? I think a lot of the knowledge of humanity is stored in words and books and not indexed. Most db data is just statistics, which are useless ;)
        
        What if you could "exp
      - Re:Shot in the dark: (Score:2)
        
        by asuffield ( 111848 ) writes:
        
        So to say todays computers can't handle it is crap. The problem is purely around not knowing how to process it
        
        That's what I said. Nobody knows a way to process it that today's computers can handle. We *do* know several ways to process it that those computers *can't* handle.
        
        As to your claim that "we're storing hundreds of terabytes of data, obviously we can handle it" - you're just storing data. The problem is computational complexity, not storage. The well-known 'right' answers to most data mining problems
    - Re:Shot in the dark: (Score:2)
      
      by RussP ( 247375 ) writes:
      
      I think the fundamental problem is the lack of structure of most of the information on the Internet. It's mostly just one gigantic blob of amorphous text. Google may have a great search engine, but I am tired of getting results on condoms when I want information about LaTeX typesetting. XML was supposed to help solve this problem, but I'm still waiting for it to happen.
  - Re:Shot in the dark: (Score:5, Insightful)
    
    by plover ( 150551 ) * writes: on Monday April 10, 2006 @06:33PM (#15101897) Homepage Journal
    
    I have to wonder if data mining isn't the problem -- the real problem seems to be that there are few obvious problems data mining will solve.
    Consider WalM*rt. When the 2005 hurricanes were predicted, they mined their sales data for previous hurricanes. They found that in the last hurricane people stocked up on beer, pop tarts and peanut butter, so they sent trucks full of that stuff to the stores in the path of the hurricanes. They made lots of sales, and provided a valuable service to the communities. Capitalism at its finest.
    Data mining worked very well in this case. The issue was "here's an obvious problem, and a clever solution involving data mining."
    The big problem is that people expect the same golden results from non-obvious situations. "Hey, sales are down in the Wisconsin stores, let's do some data mining to figure out what they'll buy" makes no sense. Data mining worked well in the case of an obvious trigger event, but data mining by itself didn't reveal the trigger. You can't predict hurricanes based on the sales of pop tarts and beer, for example.
    But, can you ever correlate pop tart and beer sales to an external event? You might be able to go back and say "here's a strange case where pop tarts and beer sold out quickly, why did this happen?" If you can tie this to external events, you'd think you'd be better prepared to react to the same events in the future.
    Maybe correlating sales to Google News is the next step? Republican scandal == lower white bread sales; French riots + Senate bickering over immigration control reform == higher 'Peeps' sales; etc.
    Or maybe it's always been a bad idea to equate correlation with causality.
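
plover's hurricane story, where the trigger event is already known and the mining step only asks what sold unusually well around it, might look something like the sketch below. Everything in it is an assumption: the sales figures are made-up toy numbers, the storm flag is supplied by hand, and `storm_lift` is a hypothetical helper, not anything the retailer is known to use.

```python
from collections import defaultdict

# Made-up toy records: (product, units_sold, storm_warning_in_effect)
SALES = [
    ("beer", 120, False), ("beer", 130, False), ("beer", 390, True),
    ("pop tarts", 40, False), ("pop tarts", 50, False), ("pop tarts", 315, True),
    ("peanut butter", 60, False), ("peanut butter", 60, False), ("peanut butter", 150, True),
    ("garden hoses", 30, False), ("garden hoses", 34, False), ("garden hoses", 8, True),
]


def storm_lift(records):
    """Ratio of mean units on storm-warning days to mean units on other days."""
    buckets = defaultdict(lambda: {True: [], False: []})
    for product, units, storm in records:
        buckets[product][storm].append(units)

    lift = {}
    for product, by_flag in buckets.items():
        # Skip products that lack data on either side of the comparison.
        if by_flag[True] and by_flag[False]:
            storm_mean = sum(by_flag[True]) / len(by_flag[True])
            normal_mean = sum(by_flag[False]) / len(by_flag[False])
            lift[product] = storm_mean / normal_mean
    # Highest lift first.
    return sorted(lift.items(), key=lambda item: item[1], reverse=True)


for product, ratio in storm_lift(SALES):
    print(f"{product:15s} {ratio:.2f}x")
```

Note that the storm flag comes from outside the sales data. The calculation ranks products once someone has named the event; it does nothing to discover the event, which is the limitation plover is pointing at.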
    
    - - Re:Shot in the dark: (Score:2)
        
        by plover ( 150551 ) * writes:
        
        First of all, Walmart's response to last year's hurricanes was noble. They donated over a thousand trucks full of relief supplies, and I don't want to take anything away from that.
        But that's not what I was talking about. I'm in the retail industry, and keep one eye facing Walmart (everyone in retail does.) The "beer and poptarts" story was one of those stories that circulated about the same time as the hurricane, so I can't quote exactly which source I got it from first (could have been at a department
    - - Re:Shot in the dark: (Score:2)
        
        by plover ( 150551 ) * writes:
        
        you're only interested in correlation, if it's going to hold true in this case.
        There's the problem. "If". Let's say the data mining came up with a correlation between French riots and immigration legislation with the sale of Peeps. Next Easter you're going to disappoint a lot of shoppers when you don't have Peeps available; and next fall when rioting and legislation happen to hit the news at the same time, you're going to have a lot of wasted Peeps on your store shelves.
        You may say "Of course riots h
  - Re:Shot in the dark: (Score:2)
    
    by cafeman ( 46922 ) writes:
    
    The basic problem is that, as with any worthwhile CS question, doing it well is hard. It is very easy to come up with false connections between data. Sorting the wheat from the chaff in any kind of automated or even semi-automated fashion, OTOH, is an enormous challenge.
    
    I'll respectfully disagree. There's a very large number of organisations that are using predictive modelling through data mining to conduct various forms of customer scoring and analytical CRM activities. These are being used in a pro
- Re:Shot in the dark: (Score:3, Insightful)
  
  by delete ( 514365 ) writes:
  
  Or
  
  c) The title of this submission is inaccurate, as data mining tools are both useful and financially lucrative in a wide variety of domains today, particularly bioinformatics, image analysis and text mining.
  
  Of course, the title of this article is quite ambiguous and misleading: the article itself is concerned with RDBMS, rather than the statistical analysis of data.
  - Re:Shot in the dark: (Score:2)
    
    by mizhi ( 186984 ) writes:
    
    Maybe I'm missing something. The article title suggests that datamining is not a frontier of research, the summary insinuates that there are no more uses for RDBMS systems since we have google, and the actual article talks about the use of MS SQL server to discover patterns in a set of data more efficiently and seemed to insinuate that many researchers overlook these technologies to analyze their datasets.
    
    If anything, the article is support for the use and continued development of datamining technologies.
    
    M
- Re:Shot in the dark: (Score:2)
  
  by guitaristx ( 791223 ) writes:
  My "Shot in the dark" goes like this:
  
  Manager-type person wants to start collecting data from which data mining should occur.
  
  Manager-type person finds publicly-available, easy-to-process data, and assumes that all data has the same attributes.
  Manager-type person fails to make the distinction between qualitative and quantitative data.
  Manager-type person fails to make the distinction between real data and derived data (i.e. data that can be calculated from other data).
  Manager-type person fails to unde
- Re:Shot in the dark: (Score:3, Insightful)
  
  by arlow ( 649026 ) writes:
  
  It does work, but it requires judgement. A lot of people seem to think that you just shove the data into a statistical test, out comes a p-value, and if it's small enough you win. Interpreting and validating the initial hit is where 90% of the real work is, and it requires the careful application of prior knowledge and subsequent experiments. I work with a guy who's probably one of the best statisticians in the world, and he often asks me, "well, does the result make sense?" His judgement was developed over
- Re:Shot in the dark: (Score:3, Insightful)
  
  by polv0 ( 596583 ) writes:
  
  I'm a statistician and data mining consultant, and I've implemented models based on millions of records generating consulting fees in the high hundreds of thousands of dollars. I thus have a strong understanding of the data, modeling and project management aspects of data mining ventures.

  I believe there are several fundamental factors required to make a data-mining project successful:
  
  1) A mathematically precise definition of what it is to be modeled (the response) as in the probability of purchasing
- Re:Shot in the dark: (Score:2)
  
  by tacocat ( 527354 ) writes:
  
  I think it's more a problem of access to the data for the purpose of mining. In order to do any meaningful data mining you have to have a few barriers removed. Namely:
  
  It has to be cheap to access. This is in terms of network costs, labor costs, and most importantly everyone believes that they can make a profit if they sell the access to their data. For data mining purposes, this becomes cost prohibitive. You have to Free the Data.
  It has to be legal to access. As time goes on, the amount of data, or the typ
- - Re:Shot in the dark: (Score:2)
    
    by cafeman ( 46922 ) writes:
    
    But the simple fact is that once you have enough data available, you can "mine" any result you want! Datamining is not about letting the data lead you to certain conclusions. It's all about trying to find things in the data that are "hidden" - things that really aren't there when the data are properly analyzed.
    
    Depends what you mean by "data mining". As the other reply has already said, bad statistics is bad statistics, regardless of the name. There's plenty of techniques in use to prevent spurious or misl
What does the article have to do with the subject? (Score:3)

by xxxJonBoyxxx ( 565205 ) writes: on Monday April 10, 2006 @04:56PM (#15101244)

"...correlating Henslow's plant collections with the time of collection, the people involved, Darwin's published work and so on using a card index, was woefully inefficient. He designed a database to hold all the information available from Henslow's collections..."
This still looks like a basic, specialized database to me. Where's the great leap to "all your data are belong to us?"

Companies are doing it, but... (Score:4, Insightful)

by deanj ( 519759 ) writes: on Monday April 10, 2006 @04:56PM (#15101248)

There are companies and research projects that are doing this sort of thing. The trouble is, there are a LOT of people that are freaking out about it, and that's making companies less willing to 1) admit they're doing it, and 2) even think about starting to do it.

Considering how up in arms people are about it, how long before we have people accusing others of "data profiling"?

- Re:Companies are doing it, but... (Score:3, Interesting)
  
  by castoridae ( 453809 ) writes:
  
  Well, there are a lot more areas where data mining is useful than just mining for consumer habits. People are freaking out about mining of personal information - ChoicePoint, Locate Plus, LexisNexis, to name a few examples - but the article is discussing the lack of data mining in science and actually claims that data mining is commonplace in business.
  
  A snippet from the article:
  
  the tools taken as routine in business are being overlooked in academia
  
  I can't see anybody getting upset about scientific data mining
  - Re:Companies are doing it, but... (Score:2)
    
    by inKubus ( 199753 ) writes:
    
    Legally change your name to John Smith and switch social security numbers once a month. That will teach them.
I tell you why (from a bioinformatics viewpoint) (Score:5, Insightful)

by Neil Blender ( 555885 ) writes: <neilblender@gmail.com> on Monday April 10, 2006 @04:57PM (#15101250)

Programmers have no idea of context. Biologists have no idea about programming. It is very hard to mix the two. You can be the shit-hottest dba in the world but if you have no relevant (deep) biology background you are guaranteed to produce crap. Almost every piece of biological software is a POS because of this.

- Re:I tell you why (from a bioinformatics viewpoint (Score:2)
  
  by networkBoy ( 774728 ) writes:
  
  So what you need is a so-so dba who has a passionate hobby of biology to hack something together, then the real dba's can tune it and the biologists can hack it and then you will have speciation within the code (AKA a fork) and everything will be as it was.

  Balance, restored.
  -nB
  - Re:I tell you why (from a bioinformatics viewpoint (Score:3, Insightful)
    
    by Anonymous Crowhead ( 577505 ) writes:
    
    So what you need is a so-so dba who has a passionate hobby of biology to hack something together, then the real dba's can tune it and the biologists can hack it
    
    Well, that's pretty much how it works in academia (+/- the real dba). Problem is that this is a lab by lab (or department) solution to problems that appear in hundreds or thousands of institutions. The wheel is reinvented over and over again because either commercial/free solutions suck or don't exist. The commercial versions suck because they ar
  - Re:I tell you why (from a bioinformatics viewpoint (Score:2)
    
    by espressojim ( 224775 ) writes:
    
    This sounds like bioinformatics.
    
    The um...field I've been working in for the last 6 years.
    
    Programming + Biology + Statistics + Algorithm development.
- Re:I tell you why (from a bioinformatics viewpoint (Score:2)
  
  by moochfish ( 822730 ) writes:
  
  I'm pretty sure any elegant solution would be blind to the context of the implementation.
  - Re:I tell you why (from a bioinformatics viewpoint (Score:2)
    
    by quanticle ( 843097 ) writes:
    
    Any solution general enough to be blind to the context of implementation would either be so slim that you'd have to add context-specific information to it in order to get anything done, or so fat that it'd try to be everything to everybody and would end up being nothing to nobody.
    - Re:I tell you why (from a bioinformatics viewpoint (Score:2)
      
      by TheSpoom ( 715771 ) * writes:
      
      Indeed. This is why I prefer a compromise: modularity. Generalization in the parent software, specialization in the modules. Plus it allows for third parties, if they so choose, to easily integrate with the parent software.
- Re:I tell you why (from a bioinformatics viewpoint (Score:3, Interesting)
  
  by TrappedByMyself ( 861094 ) writes:
  
  Hmmm, why don't the developers and biologists...gasp!....work together to design something? Yes, the developers may have to actually listen to the biologists and not spend their days doing cool programming tricks, and the biologists may actually have to do real requirements work. If no one wants to put the effort in, then no one has the right to bitch about the results.
  - - Re:I tell you why (from a bioinformatics viewpoint (Score:2)
      
      by jlarocco ( 851450 ) writes:
      
      Well, for starters, you get developers convincing the biologists that they need Oracle...and it only goes downhill from there.
      Is it possible the developers are saying something like "It starts out with the biologists saying they need 30 TB of data available 24/7 with 99.999% uptime and 200-250 concurrent users, and goes downhill from there..."
      Simply saying the developers are idiots because they suggest Oracle really doesn't make sense without more context. If more than one group of developers sugges
      - Re:I tell you why (from a bioinformatics viewpoint (Score:2)
        
        by Hast ( 24833 ) writes:
        
        There's a reason it's so expensive...
        
        Because Larry Ellison needs a new sub-woofer [bizjournals.com]?
- It's all in the management. (Score:2)
  
  by Ruff_ilb ( 769396 ) writes:
  
  I used to work a simple job where I did database work for a company doing medical studies. It wasn't a lab, but it wasn't your typical cubicled office either. Although I had very little knowledge on the actual medical component of the studies I was doing, certainly not enough to design the stuff I needed to do, the management was superb - I wasn't REQUIRED to know anything about the medical component, and they trusted me to do the programming. What I didn't know they were happy to fill me in on - I knew eno
- Re:I tell you why (from a bioinformatics viewpoint (Score:2)
  
  by ebuck ( 585470 ) writes:
  
  I happen to be one of those few fools that have both a degree in Biology and in Computer Science. And at one time I relied on my research skills in Biology as my ONLY income, until the dreaded and softly spoken "balancing" of the budget that spelled doom to most low level Biologists of my time.
  
  It is hard to mix the two. This is even more frustrating if you're marginally inclined to understand where things come from and how they are designed. Some of the earliest proponents of object oriented software prog
- Re:I tell you why (from a bioinformatics viewpoint (Score:2)
  
  by dodobh ( 65811 ) writes:
  
  Then wouldn't it be useful for the biologists to define the context for the programmers? It shouldn't be impossible to do so (very hard, I will grant you).
  - - Re:I tell you why (from a bioinformatics viewpoint (Score:2)
      
      by Miraba ( 846588 ) writes:
      
      It's easy enough to give the basics (DNA makes RNA makes Protein(1)); it's that biology is wall-to-wall special cases. Biological systems run the worst spaghetti code you can imagine, written in a language that's barely documented(2), written by a developer who is willing to hack the executable, the source code, the compiler, the operating system and in extreme cases the hardware to get a functional system.
      (1) except RNA can 'make' DNA, RNA can act like a protein (enzyme)
      (2) Using language only comprehensibl
Data Mining != RDBMS (Score:2)

by EraserMouseMan ( 847479 ) writes:

Having the data in an RDBMS is only the first step to being able to mine data for knowledge. Data mining is a whole different discipline that requires statistical analysis of the aggregated data to find trends, etc.
Aristotle (Score:2)

by Bacon Bits ( 926911 ) writes:

Huh? Francis Bacon? Didn't Aristotle claim he created logic in his Prior Analytics? With his four types of statements (A is true about all X; A is false about all X; A is true about this X; A is false about this X) and the basic logical syllogism? The whole point of logic is to preserve truth so you can synthesize new knowledge.
- Re:Aristotle (Score:3, Funny)
  
  by TRACK-YOUR-POSITION ( 553878 ) writes:
  
  Bah! Aristotle couldn't tell a horse's head from an animal's head!
- Re:Aristotle (Score:2)
  
  by aminorex ( 141494 ) writes:
  
  That's true of one form of logic, but not true of others. "Logic" has come to be a very large and fuzzy thing in the last 100 years or so.
  - Re:Aristotle (Score:2)
    
    by Bacon Bits ( 926911 ) writes:
    
    True enough, but Mr. Bacon's statement was 400 years old. 19th century logic, while much more advanced than simple syllogisms, is entirely out.
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Re:Semantic Web goodness (Score:2)
  
  by poot_rootbeer ( 188613 ) writes:
  
  Too bad that the Semantic Web is a pipe dream at the moment.
  
  Too bad that the Semantic Web will always be a pipe dream, at least until the day comes when it's possible for a computer to understand the semantic content of a document with zero hinting from the author. The potential is there, but the willingness of humans to spend time explaining semantic structures to machines, when they're obvious enough to other humans, is lacking.
- Re:Semantic Web goodness (Score:3, Insightful)
  
  by TrappedByMyself ( 861094 ) writes:
  
  Datamining would be a piece of cake if all data were kept in clear, standard XML dialects. See Visualising the Semantic Web, ed. Geroimenko and Chen (Springer Verlag, 2004). Some of the possibilities of combing through information and elucidating it, combining it and converting it described in that book are simply awesome. Too bad that the Semantic Web is a pipe dream at the moment.
  
  Well, XML is not really important. The problem lies in going from the infinite real world to a well defined ontology or whatever
  - Re:Semantic Web goodness (Score:2)
    
    by Narphorium ( 667794 ) writes:
    
    And the stuff they've produced is still academic level. The average high school kid isn't going to be hacking OWL into his web pages.
    The average high school kid has an RSS feed on their blog.
    The average high school kid listens to MP3s tagged with ID3 metadata.
    The average high school kid annotates their photos on Flickr with semantic metadata.
    The average web user may not know what the Semantic Web is but that doesn't mean they're not using it.
Privacy (Score:2)

by gurps_npc ( 621217 ) writes:

Privacy concerns stopped a lot of data mining.
Another thing is that it is only useful for information we don't already know.
We don't exactly need data mining to realize that people that buy diapers also buy baby food.
- Re:Privacy (Score:2)
  
  by scdeimos ( 632778 ) writes:
  
  We don't exactly need data mining to realize that people that buy diapers also buy baby food.
  
  Old people buying diapers tend to go with the generic brand sardines, actually.
- - Re:Privacy (Score:2)
    
    by plover ( 150551 ) * writes:
    
    Many people refuse to believe it's not personal. And in most cases it is personal. It's long been known that repeat customers are the most profitable, by a wide margin. With nothing else to go on, go back to your previous customers. It doesn't take long for them to feel "picked on".
    The other side is that some places use loyalty cards which actually advertise and use the loss of privacy as a selling point: "This is a personal promotion just for you, PHILIP J. FRY!"
    Some people are comfortable giving i
  - Re:Privacy (Score:2)
    
    by gurps_npc ( 621217 ) writes:
    
    A lot of it is personal information. Here is a simple one.
    You sign up for a grocery datamining card. You give them your name, phone, address, and they give you a card to scan when you buy groceries. Now you use it to buy things. Among other things you buy:
    a six pack of beer. Every day.
    tampons, even though you are a man.
    stop buying tampons, but pick up some penicillin at the pharmacy in the back.
    These things are very, very personal. And they have your name, number, address.
Because it's not sexy (Score:5, Insightful)

by beacher ( 82033 ) writes: on Monday April 10, 2006 @05:02PM (#15101301) Homepage

From my experience: the people who are subject matter experts in their field (outside of computers) typically don't have the time to perform all of the data entry. So you have to get an ETL / Miner to do all of the work for you. ETL and data mining are *NOT* the sexiest jobs in the industry by a long shot. Auditing data makes you want to gouge your eyes out after the fourth day straight of reviewing loads.

- Re:Because it's not sexy (Score:5, Interesting)
  
  by Coryoth ( 254751 ) writes: on Monday April 10, 2006 @06:03PM (#15101735) Homepage Journal
  
  As someone who has done datamining, ETL, and data auditing for very large systems (every transaction on every slot machine in a large Las Vegas casino for 5 years or so) I can assure you that the problem is not lack of data or issues with data entry. The problem, simply put, is that analysis is hard. The data is sitting there, but extracting meaningful information from it is far harder than you might imagine. The first hard part is determining what constitutes meaningful information, and yes, that requires subject matter experts. Given the amount of money that can be made with even the slightest improvement, getting subject matter experts to sit down and work with the data people was not the problem. The problem is that, in the end, even subject matter experts can't say what is going to be meaningful - they know what sorts of things they currently extract for themselves as meaningful, but they simply don't know what patterns or connections are lying hidden that, if they knew about them, would be exceedingly meaningful. Because the pattern is a subtle one that they never even thought to connect, they most certainly couldn't tell you to look for it. The best you can do, upon finding an interesting pattern, is say "suppose I could tell you ..." and wait for the reaction. Often enough with some of the work I did they simply didn't know how to react: the pattern was beyond their experience; it might be meaningful, it might not, even the subject matter experts couldn't tell immediately.
  
  So how do you arrive at all those possible patterns and connections? If you think the number of different ways of slicing, considering, and analysing a given large dataset is anything but stupendously amazingly big then you're fooling yourself. Aside from millions of ways of slicing and dicing the data there are all kinds of useful ways to transform or reinterpret the data to find other connections: do Fourier transforms to look at frequency spaces, view it as a directed graph or a lattice, perform some manner of clustering or classification against [insert random property here] and reinterpret, and so on, each of which exposes whole new levels of slice and dice that can be done. If you've got subject matter experts working closely with you then you can at least make some constructive guesses as to some directions that will be profitable, and some directions that definitely will not be, but in between is a vast space where you simply cannot know. Data mining, right now, involves an awful lot of fumbling in the dark because there are simply so many ways to analyse the sort of volume of data we have collected, and the only real way to judge any analysis is to present it to a human, because our computers simply aren't good enough at seeing, understanding, and interpreting patterns to be trusted with the job. Anytime a process has to route everything through humans you know it is going to be very, very slow.
  
  Jedidiah.
  
Chloe to the rescue (Score:2)

by Christopher_G_Lewis ( 260977 ) writes:

They use it all the time on 24.
- Re:Chloe to the rescue (Score:2)
  
  by patio11 ( 857072 ) writes:
  
  Yeah. Chloe is apparently the only person in the office who knows the proper syntax for the all-powerful "cross-reference" operator. And she's hampered by incompetent upper management who, in all the years between the series, never thought to say "Hey, Chloe, cross-reference Los Angeles and upcoming terrorist attack", which would solve most seasons in three minutes or less.
  I think Jack tells Tony to keep Chloe in reserve so he can play the hero more.
FTFA . . . (Score:2)

by Dausha ( 546002 ) writes:

"Darwin was his pupil (Henslow helped arrange for Darwin's presence on the Beagle), but Darwin made the intellectual leap that allowed him to interpret Henslow's records of variation - not as evidence of a fixed set of created species with variations, but as evidence of the evolution of new species in action."

Hmm, I read recently that Darwin's grandfather was also a Naturalist, as was Chuck. So, I don't think Darwin made the "leap," so much as his family was already in that direction. Methinks the article p
Data mining is DIFFICULT (Score:5, Informative)

by GlobalEcho ( 26240 ) writes: on Monday April 10, 2006 @05:19PM (#15101436)

The blurb hit on a fundamental reason data mining is still at (or beyond) the horizon...defining relations between the various elements is hard. Available datasets are not themselves in anything like normal relational form, and so have potential internal inconsistencies. And that gets in the way before you even have the chance to try to form intelligent inferences based on relations between data sets, which of course are terribly inconsistent.

Consider the following boring but difficult task I was given: two large organizations were to merge, each with a portfolio of about 100,000 items. Each item had a short history, some descriptive information, and some data such as internal quality ratings or sector assignments. This data was available (for various reasons) as big CSV file dumps. Questions to answer were: (1) how much overlap did the portfolios have? (2) were the sector distributions similar?

These are very simple, concrete questions. But you can imagine that since the categorizations differed, and descriptors differed within the CSV files, let alone between the two, the questions were difficult to answer. It required a lot of approximate matching, governed intelligently (or so I flatter myself).

Contrast this situation with what people typically think of as data-mining: answering interesting questions, and you can appreciate that without a whole lot of intelligence, artificial or otherwise, those questions will be unanswerable.
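For anyone wondering what "approximate matching" might look like in practice, here is a minimal sketch using only Python's standard library. It is not the method actually used in the merger described above; the CSV contents, column names, and the 0.85 threshold are all made up for illustration.

```python
import csv
import io
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_portfolios(left, right, threshold=0.85):
    """Map each left name to its best right match, if it clears the threshold."""
    matches = {}
    for item in left:
        best = max(right, key=lambda r: similarity(item, r), default=None)
        if best is not None and similarity(item, best) >= threshold:
            matches[item] = best
    return matches


def read_names(csv_text: str, column: str):
    """Pull one column out of a CSV dump held in a string."""
    return [row[column] for row in csv.DictReader(io.StringIO(csv_text))]


# Hypothetical dumps: note the differing column names and sector labels.
PORTFOLIO_A = "name,sector\nAcme Widgets Inc.,Industrial\nBlue Harbor Shipping,Transport\n"
PORTFOLIO_B = "issuer,industry\nACME WIDGETS INC,Manufacturing\nNorthwind Traders,Retail\n"

names_a = read_names(PORTFOLIO_A, "name")
names_b = read_names(PORTFOLIO_B, "issuer")
overlap = match_portfolios(names_a, names_b)
print(overlap)
print(f"overlap: {len(overlap)} of {len(names_a)}")
```

This compares every item against every other item, so at 100,000 items per side it would need some kind of pre-grouping (by sector, first letter, etc.) to be usable, and a human still has to decide what threshold counts as "the same item".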

- Re:Data mining is DIFFICULT (Score:2)
  
  by inKubus ( 199753 ) writes:
  
  Standards are the key. I work in the mortgage banking business and they are trying to build standards for data as there are really only a limited number of relevant fields and everyone in the industry uses the same sort of format. This is largely due to extensive government regulation and oversight (which has held the industry back, of course). There are thousands of fields, but it's not a huge deal to make a big list of them. What it will do is help everyone do business more efficiently because banking
Nothing to do with Technology (Score:4, Informative)

by wdavies ( 163941 ) writes: on Monday April 10, 2006 @05:21PM (#15101461) Homepage

This is a hoary chestnut. I have a masters in AI, and a PhD in machine learning (and had a lot of interest in machine discovery).

The ultimate problem is that, for most datasets, there is an infinite (at least) set of relations that can be induced from the data. This doesn't even address the issue that the choice of available data is a human task. However, going back to assuming we have all the data possible, you still need to have a specific performance task in mind.

Think of this in terms of permutations. Let's say you have variables A, B, and C. They are all binary (have values 1 or 0). Now, you are given a set of these assignments (e.g. A=1, B=1, C=1; A=1, B=1, C=1; and so on). Now, try to tell me what the correct partition is. Sort them into two sets of any size. See the problem? I didn't tell you what I wanted as characteristics of those sets - so in effect, they are all possibly good partitions.
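To put a number on that toy case, here is a small sketch (my own illustration, under the assumption that the two sets are labelled and either may be empty) that enumerates every way of sorting the possible A/B/C assignments into two sets.

```python
from itertools import product

variables = ("A", "B", "C")

# All possible assignments of the binary variables: 2**3 = 8 rows.
assignments = list(product((0, 1), repeat=len(variables)))

# Each way of sorting the rows into two sets is one 0/1 label per row.
labelings = list(product((0, 1), repeat=len(assignments)))

print(len(assignments), len(labelings))  # 8 256


def count_labelings(n_vars: int) -> int:
    """Number of two-set splits of all assignments of n_vars binary variables."""
    return 2 ** (2 ** n_vars)


print(count_labelings(5))  # 4294967296
```

With only three binary variables there are already 256 candidate splits, and nothing in the data says which one is "correct"; at five variables it is over four billion.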

So, data-mining ultimately relies on humans deciding what they want to read from the tea-leaves of the data.

Now, give it up, and start addressing issues of efficient algorithms given that you have a specific performance task :)

Winton

- - Re:Honest question from serious lackey- (Score:2)
    
    by alienmole ( 15522 ) writes:
    
    You haven't given enough information to go on. Points 2 & 3 are far too general. No offense, but point 2 reminds me of sales execs who ask software developers to "just give me a single button that does what I want". It's all very well to talk about a "single, simple UI" to do something very complicated, but it's something entirely different to design and implement such a UI. Think of existing applications and tell us which ones do something like your point 2. If there are any, then how is your syst
What's the question? What are the barriers? (Score:2)

by g8orade ( 22512 ) writes:

I think the issue with Google or other search engine is how to do analytics.
How do I write a multi-variable where clause?
How do I ask a multi-variable question and then hone it or drill into it along one or more parameters, unfolding detail but preserving multiple layers of an outline hierarchy?

So just there is the idea of a different presentation layer, hierarchy and tabular perhaps.

Then, what kind of barriers do I have to getting at the data? Privacy issues? Copyright or patent issues?

If you want to conne
title misleading (Score:2)

by flynt ( 248848 ) writes:

After RingTFA, this doesn't seem to be about data mining in the computer science/statistics sense at all. Instead, the article suggests that scientists in academia aren't using the best database tools and techniques available. This I agree with strongly; there is often a disconnect between experiments done in scientific fields and proper database techniques to store that data efficiently. However, I don't call that data mining.
TFA (Score:2)

by wfberg ( 24378 ) writes:

What about that TFA? Someone converted a stack of index cards to a relational database? And this warrants a post on regdeveloper AND slashdot, exactly why?
Like there aren't things to write about like the Open Archives Initiative Protocol [openarchives.org].. Geez.
Copyrights in the way? (Score:2)

by miffo.swe ( 547642 ) writes:

Correct me if I'm wrong, but aren't copyrights the biggest obstacle to this? You can only mine your own data, as IBM and others already do today. I'm interested in when you can mine data from all the various sources and combine those into conclusions. File formats are another thing hampering this kind of technology, especially if you look at it in a longer time frame. Try mining those Lotus 123 documents for historic facts ;D
Google and Self-joins (Score:2)

by CrazedWalrus ( 901897 ) writes:

I just want to comment on this question from the summary:
[...]or will Google make the art obsolete once they finish indexing everything?
Isn't the value of relational databases in the ability to "relate" indexed datasets? Google doesn't support a "join" syntax, as far as I know.
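As a rough illustration of what "relating" buys you, here is a tiny sketch using Python's built-in sqlite3 module. The tables, column names, and rows are invented for the example and have nothing to do with any real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sites (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE specimens (id INTEGER PRIMARY KEY, species TEXT, site_id INTEGER);
    INSERT INTO sites VALUES (1, 'north'), (2, 'south');
    INSERT INTO specimens VALUES (1, 'oak', 1), (2, 'birch', 1), (3, 'oak', 2);
    """
)

# The join relates two separately stored datasets through a shared key,
# something a plain keyword search over the text cannot express.
rows = conn.execute(
    """
    SELECT s.species, t.region
    FROM specimens AS s
    JOIN sites AS t ON s.site_id = t.id
    WHERE t.region = 'north'
    ORDER BY s.species
    """
).fetchall()
print(rows)
conn.close()
```

The query only works because the data was split into discrete fields (`species`, `site_id`, `region`) before it was stored, which is the step a full-text index skips.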

Even Google's fantastic text indexing doesn't break the data up into the discrete "fields" that would be needed to do any meaningful relating. It's sort of like having all of your data in a single column in a single table, and tryin
Easy answer (Score:2)

by El_Muerte_TDS ( 592157 ) writes:

How much do we know that we still don't know?

We don't know
The problem is both easier and more difficult (Score:4, Insightful)

by zappepcs ( 820751 ) writes: on Monday April 10, 2006 @05:44PM (#15101612) Journal

The problem is both easier and more difficult than it first appears, or even second and third times:

Data, whether held in databases (usually nice and tidy) or in flatfiles, or random text files spread all over hell's half acre, is simply data, not the information required to link it to other data. Even meta data about the data held in any data store is not the information required to link it to other data.

One of the things I believe will help (possibly) is ODF (buzzword warning sounds) because it begins to help format data in a universally accepted manner. Though it is not the only way, universal access methods are required for accessible data. Second, the structure of the data must be presented in a universal manner. This second part allows query languages to support cognitive understanding of the structure, and thus (with some work) the value of data held in a storage location, wherever and whatever that location is, be it RDBMS, text files, or phone bills.

Indexing is simply not enough. The ability to retrieve and utilize the index with the most probability of having relevant data is what is needed. We all know that any search engine can get you too many 'hits' that contain useless data. Google or anyone else is helpless until there are accepted methods for applying metadata and data structure descriptions on all data.

When there is far more organization to data storage, there will be a great sucking sound of people actually using data from the internet in brand new ways.... until then, it's all hit and miss.

How do Google do their queries? (Score:2)

by caluml ( 551744 ) writes:

I want to know how, if I put a random string on my webpage (say ioeuhncio38u9384hynfxiuhfnx847uvh04897x ), and wait for Google to index it, that searching for that string will return my page in milliseconds. It obviously can't be a pre-executed query. So how the hell do they do that? SELECT * FROM index WHERE text ILIKE '%foo%' just won't cut it.
I'd love to know how search engines do do it - anyone reading this worked for one?
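Not an insider answer, but the textbook technique for this is an inverted index: the work is done at crawl time, so a query is a dictionary lookup rather than a scan. The sketch below is a toy version of that general idea, not a description of Google's actual system; the page URLs and contents are made up.

```python
from collections import defaultdict


def tokenize(text: str):
    """Split text into lowercase whitespace-separated tokens."""
    return text.lower().split()


def build_index(pages: dict) -> dict:
    """Map each token to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return the pages containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical crawled pages.
pages = {
    "example.org/mine": "my page ioeuhncio38u9384hynfxiuhfnx847uvh04897x",
    "example.org/other": "some other page about data mining",
}
index = build_index(pages)

print(search(index, "ioeuhncio38u9384hynfxiuhfnx847uvh04897x"))
print(search(index, "page"))
```

A rare token like the random string maps to a set with one entry, so the lookup cost does not depend on how many other pages were indexed. Ranking, sharding across machines, and substring matching are all left out here.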
42 (Score:4, Insightful)

by DesertWolf0132 ( 718296 ) writes: on Monday April 10, 2006 @05:52PM (#15101663) Homepage

"I checked it very thoroughly," said the computer, "and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you've never actually known what the question is." - Hitchhiker's Guide to the Galaxy

One must remember when undertaking to find answers in the data to first figure out the question. Otherwise the answer you find will be as useful to you as the answer 42.

Without context you only have a neat compilation of arranged meaningless facts.

On the small scale data mining is used daily by marketing people and the like to figure out who would be most receptive to their approach. Webmasters use it to optimize content and respond to user trends. In most large corporations data mining is used on some level.

Data mining on the scale discussed here may be practical at some point in the future once we determine the questions we wish answers to.

Let us hope the answer is more useful than 42.

(Machine Learning == Data Mining) does work ! (Score:2, Interesting)

by copdk4 ( 712016 ) writes:

What used to be called 'data mining' in the 80s and 90s is now machine learning in the 21st century, and there are several instances where machine learning has shown tremendous success (probably this is the only by-product of AI that has shown promising real-world applications)

- The DARPA Grand Challenge [pbs.org] - Stanley, the winning robot from Stanford, used 'Adaptive vision' which used some real-time learning algorithms
- Clustering and Micro-Array Analysis [google.com] - Once genetic-medicine will become a reality, the physicians
Disappointed (Score:2)

by Jon Chatow ( 25684 ) * writes:

I'm really quite astonishingly disappointed that the summary made no reference to the priceless phrase "unknown knows" to describe data left 'buried' in the dross, presumably from sources left, as it were, under-mined.
easy answer: (Score:2)

by circletimessquare ( 444983 ) writes:

entropy

data mining will always be a frontier, because consolidation and standardization of data will always be a frontier, because simple entropy leads to fragmentation. furthermore, for various reasons, some good, some bad, some data will always be purposefully constrained from consolidation, only to be released into freer usage later, when data mining can commence

it's a permanent frontier
Let the monkeys mine that data (Score:2, Funny)

by suv4x4 ( 956391 ) writes:

Data Mining is still a frontier for the same reason monkeys are still having trouble reproducing Hamlet despite all the theoretical knowledge of all the incredible opportunities.

Too much assumption, too many possibilities, too little knowledge, and not enough monkeys. You can never have enough friggin' monkeys.
Its long and hard, just to get started (Score:2)

by benow ( 671946 ) writes:

Well, I've been collecting data from various sources lately, and most is still in 'data' form, i.e. no real relevant difference from one set of bits to the next. I've been on a push to surface the interactions between the data, but to even get to that point, there is a lot of data massaging to do.. decompression, format interchange, subject recognition, etc. In theory, once the data is in an understood format it can be searched and indexed and the searches mined. It requires a general idea of where to go with
- - Re:Its long and hard, just to get started (Score:2)
    
    by benow ( 671946 ) writes:
    
    Yeah, totally. Thanks for the response, tho. I've dl'd the data and have made a .gz to sql importer, which I've yet to fully run... 300M of compressed ascii takes ages to import... 600k+ actors alone. When done and validated, should mean a local imdb cache which should be faster than imdb. I plan an exception handler which queries and fetches from imdb when the data is not available locally, and then to create lightweight pda-friendly dynamic pages for presentation of data. May go live with the 'mobile
New Use for Google. (Score:2)

by Allnighterking ( 74212 ) writes:

The Patent administration takes your idea puts it into google, filtering out you and any article talking about you. If they get a hit, prior art, eeeeeeeeh patent rejected!
As I've Said Repeatedly (Score:2)

by Master of Transhuman ( 597628 ) writes:

without conceptual processing, data is just so much bits and bytes. Some of it can be analyzed as such, but much of it cannot without some conceptual comprehension on the part of the software (if not the analyst - which is the other problem).

A decent (read, relatively effective and efficient) simulation of conceptual processing would change the entire world of computer use from development to databases to computer education to robotics. It is THE world-class issue that needs to be resolved and soon.
A biotech scientists point of view (Score:2)

by cinnamon colbert ( 732724 ) writes:

I don't see problems that are susceptible to data mining. I suspect this view is shared by many of my colleagues. (this is similar to the view of most bio oriented scientists that design of experiment is not useful)

What would change the field ?

In science, what usually changes people's minds is a BIOLOGICAL result obtained with a new technique that could not be obtained (easily) another way.
this may just be restating the old truism that success breeds success, but to get biologists interested in large scale
- Re:Scooty Puff Jr!! (Score:2)
  
  by stinerman ( 812158 ) writes:
  
  Who's ready for safe fun?
- Re:Scooty Puff Jr!! (Score:2)
  
  by tompaulco ( 629533 ) writes:
  
  Once google is finished indexing EVERYTHING, it will then index itself, thus destroying the universe.
  Better hurry. Google's already indexed 805,000 pages on "Beavers mate for life".
- - Re:Please correct your terminology! (Score:2)
    
    by nagora ( 177841 ) writes:
    
    As the USA is now on the SI system, please update your nomenclature to the currently correct "Metric Fuck Ton".
    You are clearly unaware that the Standard Metric Fuck Ton(ne), which is stored in Paris, France has recently been found to be shrinking at a rate of "Shit-all Squared" per year.
    The current US administration has jumped on this as a pretext to move to the new "God-damned Freedom Ton" which is defined to be exactly equal to 1 original Metric Fuck Tonne, except it is not in any way connected with Fra
