
Data Mining Used to Create New Materials

Roland Piquepaille writes "MIT researchers have successfully integrated data mining tools and modern methods of quantum mechanics. They've designed software which can help predict the crystal structures of materials. To simplify, they say they've used methods used by online sales sites to suggest books to customers. And it seems to work: they claim they can determine in days the properties of atomic structures that might have taken months before. Read more for additional references and pictures."
This discussion has been archived. No new comments can be posted.

  • Excellent (Score:1, Offtopic)

    by cptgrudge ( 177113 )
    Maybe now I can get that Vorpal Mace of Undying +3 in the same time that I could only get the +1 model before. This will reduce the time I spend level grinding and farming in the MMORPGs that I frequent. Finally, a technology I can use.
  • by 140Mandak262Jamuna ( 970587 ) on Sunday August 27, 2006 @03:55PM (#15990714) Journal
    Considering how broadly software patents are worded nowadays, I would not be surprised if MIT gets sued by Amazon for patent infringement.
    • by Anonymous Coward
      Oh J.S. Christ. For a forum of geeks you all know diddly squat about law. A software patent isn't the same as a business patent. Second you can't patent math.

      "Using a technique called data mining, the MIT team preloaded the entire body of historical knowledge of crystal structures into a computer algorithm, or program, which they had designed to make correlations among the data based on the underlying rules of physics.

      Harnessing this knowledge, the program then delivers a list of possible crystal structures
      • Oh J.S. Christ. For a forum of geeks you all know diddly squat about law.

        I say that every day.

        Slashdot is the Fox News of patents.

    • by Landak ( 798221 )
      No, no, no! It would be the person who sold the data-mined product who'd be infringing on Amazon's "Click to buy" patent! And you know what we do to Patent Infringers...
  • Aha ! (Score:3, Funny)

    by Anonymous Coward on Sunday August 27, 2006 @04:09PM (#15990753)
    Transparent aluminum anyone ?
  • by suv4x4 ( 956391 ) on Sunday August 27, 2006 @04:13PM (#15990764)
    There are two surprisingly simple and "dumb" principles that exist in our world.

    The first is called evolution (random mutation, breeding of the fittest) the result of which is basically everything around us, and it has resurfaced in computer programming as genetic programming, which essentially uses random processes and selection to create new inventions, mechanisms and even intelligent virtual creatures.

    The second I'll call "intelligent observation". It's basically how animals and people learn everything they know, by observing and applying "what seems to make sense" in other areas of our lives, even without understanding the underlying mechanisms (and how we discovered fire, or tools by observing similar nature mechanisms/animals). This has resurfaced in computer programming as data mining.

    Data mining and genetic programming: these two beat any patent, any existing algorithm, because they are not crippled by our limited brain capacity to understand the world around us. Expect a lot more of both in computer science and our lives in the following years.
    • Lame name. (Score:5, Insightful)

      by Inoshiro ( 71693 ) on Sunday August 27, 2006 @04:34PM (#15990838) Homepage
      "The second I'll call "intelligent observation"."

      You know, most people would call it statistics (in this example, using a mathematical model to predict results), or the scientific method (in general, observing repeatable events).
      • Re:Lame name. (Score:4, Insightful)

        by haluness ( 219661 ) on Sunday August 27, 2006 @05:01PM (#15990918)
        Very true - data mining is the new buzzword. The techniques used in data mining are pretty old and standard. That's not to say that there's no research - there's a ton of stuff that can be done, especially when handling large datasets. But fundamentally, it's well-known statistical modeling - just rephrased for the 'Age of Marketing' :)
      • by suv4x4 ( 956391 )
        "The second I'll call "intelligent observation".

        You know, most people would call it statistics

        I can just imagine life with statistics a million years ago:
        "Gee, did you notice how most animals have claws they use in their attacks? According to statistics, we can make some of our own with wood..."

        Statistics and data mining are both as you call it "buzz words". Let's not spin the discussion into an argument about terms though, right...
    • by KDR_11k ( 778916 )
      Yes but what did Intelligent Design resurface as?
    • by rm999 ( 775449 )
      genetic programming

      I disagree, I find genetic algorithms to be very limited. Data mining makes sense - we can see an animal learning something over its lifetime, study how it does so, and emulate it. This, I think, is the future of AI and anything that will follow AI.

      Genetic algorithms, on the other hand, are trying to emulate a process that took billions of years over countless concurrent processes in a few days. I know genetic algorithms tackle problems in a much simpler problem state space that evoluti
      • by GeffDE ( 712146 ) on Sunday August 27, 2006 @09:19PM (#15991655)
        Your view on genetic algorithms is just plain wrong.

        One of the reasons that "Intelligent Design" is so palatable to so many people is that nature and life are so damn complex. There is a textbook called Molecular Biology of the Cell; this book's aim is to precisely define the chemical pathways and biological structures that constitute a living cell, and it is roughly 2000 pages long. It is still outrageously incomplete. This massive tome is looking at something that is so incredibly minute that you are formed by trillions of them. It takes a 2000 page book to incompletely describe the simplest part of you. What is mind-boggling to many people (and simply awe-inspiring to the rest) is that such a simple rule as "survival of the fittest/random mutation" could create so complex a system. The fact is, such complexity is inherent in the system, and that complexity arises out of simplicity. A great tutorial on that is Cellular Automata.

        Now, you do bring up an interesting point about the positive feedback loop that our brains have created with technology. But if you extend your scenario to a few years after "The Almighty CAD Program" is designed, you may indeed reach that technological singularity, where a machine can design another machine inside a CAD program, and, a few years later, might be able to either make that machine with the automated robots already used for assembly, or even emulate it with its own hardware. Now you have reached the point where "genetic algorithms" are doing exactly what you have claimed they cannot. Genetic algorithms only tackle problems in a simple problem state because they have not been allowed to evolve enough. Bacteria are much simpler than humans, and they also first came around billions of years ago. After nature had time to evolve from the bacteria, it got more and more complex. So too will genetic algorithms.
        • by rm999 ( 775449 )
          You didn't address my main argument how a "simple" process like survival of the fittest took billions of years over a countless number of concurrent processes (every single birth could be considered a process) to do what it did. The reason why I think genetic algorithms are limited in AI is for that, and only that, reason. Maybe my view is wrong, but you don't give an argument for why it is wrong.

          And my technological singularity thing has nothing to do with genetic algorithms. I do not think an intelligent m
          • by GeffDE ( 712146 )
            GAs aren't inefficient. A powerful computer can chug through billions of "births" a day. Additionally, GAs are set to have a higher rate of mutation than real life. So really, they just speed up "evolution" and while they might not be the fastest logically (you might think it would be better to intelligently pick which mutations to make), you can perform the random mutations a lot faster, so it's all a wash in the end.
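
A minimal sketch of the kind of loop being argued about here, for readers who have not written one. Everything in it is an assumption chosen for illustration: the `one_max` fitness function (count the 1-bits), the population size, and the 5% per-bit mutation rate, which is deliberately far higher than anything in biology. It uses mutation and truncation selection only, with no crossover.

```python
import random

def one_max(bits):
    """Toy fitness: number of 1-bits (a stand-in for a real evaluation)."""
    return sum(bits)

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]

def evolve(length=32, pop_size=20, generations=50, rate=0.05, seed=0):
    rng = random.Random(seed)  # seeded so runs are repeatable
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=one_max, reverse=True)
        history.append(one_max(pop[0]))
        survivors = pop[: pop_size // 2]  # selection: keep the fitter half
        children = [mutate(rng.choice(survivors), rate, rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children  # survivors are carried over unchanged
    pop.sort(key=one_max, reverse=True)
    history.append(one_max(pop[0]))
    return pop[0], history

best, history = evolve()
print(history[0], "->", history[-1])
```

Each "birth" here costs a handful of list operations, which is the point made on both sides of this exchange: the loop is fast precisely because the fitness function is trivial. Swap `one_max` for an expensive evaluation and the cost per generation is dominated by that call.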
            • by rm999 ( 775449 )
              "GAs aren't inefficient. A powerful computer can chug through billions of "births" a day."

              Yeah, but those "births" aren't in a very complex problem space. In real evolution, each of those births will take a lifetime to be judged in the world to determine if it will procreate. One of those births you speak of will take up maybe a few hundred clocks on a cpu. Truly complex problems will take up too much computer time.

              I think we are arguing apples and oranges. You are talking about basic genetic algorithms tha
          • by khallow ( 566160 )
            If evolution has a goal, it is survival. Intelligence is a byproduct not a goal.

            I imagine that directed evolution starting from the same beginning point would be orders of magnitude faster.

        • by suv4x4 ( 956391 )
          Your view on genetic algorithms is just plain wrong.

          You know I could never figure out why you said my view is "plain wrong", and then went on talking about something that has nothing to do with what I said, or my view on genetic algorithms anyway.

          Also genetic algorithms and genetic programming are two different techniques and not the same thing.
      • While it's true the evolutionary process that generated higher life forms took millions of years, GA systems cover a more specific problem space with a more static environment. They also iterate MUCH faster (100's of generations per second rather than fractional per decade).
        However I'd agree if your point is that GA's aren't the holy grail of AI. I don't think there is ONE (emphasis on 1!).
        GA's are just a piece of the puzzle imho, as are expert systems, neural nets, fuzzy lo
    • by meburke ( 736645 )
      First, I'd like to give a nod to the other comments made here: Interesting thinking going on.

      Second, I'd like to point out that there are certain processes already in existence (especially TRIZ and ARIZ) that are predecessors to this type of approach. In one of his earlier books, Altshuler (inventor of TRIZ) proposed that once we were able to catalog the tertiary combinations of chemical reactions, invention and innovation would blossom explosively. It looks like this is where it's happening.

      Data mining
    • Re: (Score:3, Interesting)

      by greg_barton ( 5551 ) *
      Following years? How about now? e.html [] []

      (2006 results aren't posted yet...)

      I was at the GECCO06 conference (Genetic and Evolutionary Computation COnference) when the Human Competitive awards were handed out. The first place award went to a guy who evolved an oscillator that used HALF as many capacitors and resistors as the industry standard one. The second place winner evolved input parameters to Schrodinger's equations t
    • by patio11 ( 857072 )
      ... who has never actually used genetic programming. Genetic programming doesn't create new inventions -- it typically tweaks parameters in an existing invention so that the output of the invention approaches a goal. For example, you could use it to dynamically weigh, say, SpamAssassin test scores. It doesn't just magically evolve new tests, and it certainly doesn't evolve a regular-expression based server side spam filter, it just tweaks the efficiency of one which already exists. Even for artificially res
      • but they can also be used to discover or predict interactions between common components we'd never think of on our own. This is just applied design of experiments. The bonus is that the computer can run millions of experiments with nothing more than time being the limitation.... as long as you have a good model. So simple chemical interactions are ripe for this type of thing. I'd think another application would be bug hunting Linux distros for hardware, program, and usage interactions you'd never see
      • While I generally agree with you, it's worth mentioning that even though the practical applications of GAs tend to be limited to tweaking parameters there's no reason they couldn't generate a calendar application given appropriate scope and execution time. Imagine a GA which constructs a bit string of arbitrary length, can mutate by randomly flipping bits occasionally and can increase/decrease in size as another mutation. Given the evaluation function "Does this program organize my day?" the GA will eventua
    • I hope you're not serious when you say "intelligent virtual creatures". They don't exist yet! And they won't for quite some time to come. Don't you mean "pseudo-intelligent virtual creatures" instead?
  • This sounds a bit like Computer Learning/ AI to me. Give it a zillion past cases to learn from and then let it predict the next one. I did some things along those lines in my AI class for machine problems (perceptron comes to mind), though not nearly as complicated. That was a fun CS class.
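
    For anyone who did not take that class, a minimal sketch of the perceptron idea: show it labeled past cases, nudge the weights on every mistake, then let it predict. The OR truth table is a toy data set picked for illustration, and the function names are made up for this example.

```python
def predict(w, b, x):
    """Fire (1) if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (inputs, label) pairs with label in {0, 1}."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            delta = y - predict(w, b, x)  # +1, 0 or -1
            if delta:
                errors += 1
                # Classic perceptron rule: move weights toward the missed label.
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b

# Toy training set: logical OR, which is linearly separable.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_DATA)
print([predict(w, b, x) for x, _ in OR_DATA])  # [0, 1, 1, 1]
```

    A single perceptron only handles linearly separable cases like this one; it would never converge on XOR, which is a long way from crystal structure prediction.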
  • No more freaking Roland Piquepaille!
  • I, for one (Score:5, Funny)

    by eclectro ( 227083 ) on Sunday August 27, 2006 @04:38PM (#15990846)
    I welcome our crystalline entity overlords. Oh, wait, they were killed off in season five.
  • Oddly enough, I was reading Kurt Vonnegut's Cat's Cradle just before I checked out /. today...and this article echoes with something of Vonnegut's idea of ice-nine. (For those of you unfamiliar with the book, ice-nine is a (fictional) form of water with a melting point well above 100 degrees Fahrenheit and, when dropped in water, it "teaches" the water how to crystallize in the same way. Crappy description, I know, but ice-nine eventually is dropped into an open body of water, "teaches" all the water in the
    • by KDR_11k ( 778916 )
      A self-replicating catalyst could work like that provided you can make it work with water (which is unlikely since hydrogen and oxygen alone can't be used to form many things, though you may be able to make that work with oil). Hm, that'd be a nice way to kill the economy: A catalyst that turns oil into itself (plus some waste substances possibly), can't be used for the things oil can be used for and isn't easy to spot when converting oil... Drop that into a central pipeline or storage and watch the western
    • That sounds completely absurd with water, but that is pretty much the idea of how prion diseases like mad cow are supposed to work. The misfolded protein gets normal proteins to refold like itself.
  • by Colin Smith ( 2679 ) on Sunday August 27, 2006 @04:41PM (#15990854)
    I mean, regression to match candidates against an existing body of data, we have dating web sites which do that these days. Nice way of quickly sorting the candidates but Nature material?

  • by Gary W. Longsine ( 124661 ) on Sunday August 27, 2006 @04:46PM (#15990867) Homepage Journal
    People who liked room temperature superconductors also liked:
    • transparent aluminum [Add to Cart]
    • broad spectrum LEDs [Add to Cart]
    • efficient peltier effect alloys [Add to Cart]
    • 3D holographic memory array crystals [Add to Cart]
    NOW How Much Would You Pay? (TM)
  • And it seems to work: they claim they can determine in days the properties of atomic structures that might have taken months before.

    Last time I checked, Engineers look for the easiest reliable method for finding a solution. Why are the MIT folks complicating this?

    I have them beat. I can find the properties of atomic structures, that took months to solve before, in seconds.


    Google. Why reinvent the wheel when the work has already been done? ;)

    (I know, I know, that's not what they meant, but the submitter

  • Then hopefully they'll have a better success rate at suggesting new materials than the recommendations of crap I keep getting from Amazon.

    No Amazon, I'm not interested in season 6 of DS9... nor season 2... nor season 5... nor season 3... nor the entire series! And don't you dare think about suggesting Desperate Housewives to me again!
  • by purduephotog ( 218304 ) <hirsch&inorbit,com> on Sunday August 27, 2006 @05:28PM (#15990989) Homepage Journal
    ... I worked on it when I was employed by Eastman Kodak back in 2000. We had/have any number of sophisticated ways of modeling parameters based upon previous research- but it wasn't called data mining.

    One of the companies that has supplied hardware (or is known in the industry to do so) is PQS. They 'sell' hardware and software, but their software is pretty darn slick for setting up large jobs.

    Since I did mostly dye research, I'm supposing the big difference is these are more interested in metallic properties than what we were- light, colour, mp, etc- all things that might be useful for film or OLEDs.

    But still, if it's getting positive press, maybe it's time to put it back on the resume...
    • It wasn't called data mining because what you did isn't what's described in the article. To quote:

      "the program then delivers a list of possible crystal structures for any mixture of elements whose structure is unknown. "

      They *then* used quantum modeling software much like what you've described in order to test those structures:

      "The team can then run that list of possibilities through a second algorithm that uses quantum mechanics to calculate precisely which structure is the most stable energetically -- a
      • Thanks, but I did read the article. And that's exactly what I did- using the parameters selected I could 'guess' structures that should be compatible with our objectives. Re-run with new structures and *poof* answers.

        Time not spent in the lab? Priceless.

        I'll be more precise in the future. But thanks for the attempt at a correction.
  • Metals with this spectral reading also had:

    -face-centered cubic structure
  • by w33t ( 978574 ) * on Sunday August 27, 2006 @05:44PM (#15991030) Homepage
    Reading about how a program can find something that a human could not, or would not, brings to mind a notion I had the other day.

    I am learning Java (and OO programming best practices in general), and am pretty heavily into it at this point. I was tooling along, writing some code to test some aspects of the language when I suddenly realized that much of what I was typing I was kind of unaware of.

    When I had first begun studying in earnest a few months ago I remember how closely I paid attention to the smallest syntactical details. But now that much of this has become rote I found myself automatically just cruising through - not really conceptualizing what I was doing. But it was still working.

    I went back into my little code and delved into a deeper reading of what I had written. It was all correct according to theory - and I could recall all the little subtleties of how Java's VM was interpreting this and that - but while I was writing it I was giving no thought to it. It just happened; it just came out of me.

    Now, hearing about these programs that can mine data and find things that human eyes would miss - and relatedly hearing about machines that can invent - I wonder if one day invention, discovery and the like will all be rote.

    I wonder if, like my mindless coding moment, things will just happen - research will just occur - without really a second thought of the "low-level" processes that currently are held so dear.

    It's interesting. It might be akin to mathematics in some ways - wherein you can generalize a large body of calculation and come to a conclusion without actually outputting the raw numerical form.

    It is an approximation, yes. But with some work the approximation can be decomposed into elementary school level math expressions - if you really want to go through all that work.

    But why decompose it, it works fine generalized (much better for humans in fact).

    It's interesting to me - this modern high-level generalization.
    • Re: (Score:3, Insightful)

      by Xiroth ( 917768 )
      Programming is like writing in any language. Once you're sufficiently familiar with it, you don't need to think of which word to use and where to put the punctuation - you just know what you're trying to express and take the most natural path through the language to express it. The human brain is designed to work with language, and while programming is not the most natural type for it, we can use much of the wiring used for human language in coding.
      • by w33t ( 978574 )
        Hmm, that's a very good point. Interesting, I never thought of it that way actually.
    • by MoogMan ( 442253 )
      Welcome to The Zone :)
  • We get the characteristics of all the past politicians we liked (& hated) and plug them in and then compare them against "Our Boy".

    I just suspect we are not going to like the megalomaniacal persons we will elect any better than the ones we have, but it will be interesting, because I can guarantee someone will or is doing it.
  • The Bush Administration uses data mining to create presidential powers out of whole cloth.

  • What the hell? This is a Roland article and there's no stolen story link on his blog?

    What IS the world coming to?

  • ZOMG! Don't let Lex Luthor get his hands on this technology! He will try to sell real estate on a crystal island after killing billions of people who had all the money to buy his real estate!
  • by rxmd ( 205533 ) on Sunday August 27, 2006 @06:42PM (#15991180) Homepage
    This approach has been popular for quite some time now. For example, there is a research group at CAESAR in Bonn, Germany, called Combinatorial Material Science that has been doing something similar for the last five years or so in the field of material science, especially regarding thin films.
  • Ceder's group found the new algorithm could select five structures from 3,000-4,000 possibilities with a 90 percent chance of having the true structure among the five.

    If it's based on probability, and only gives the right answer 90 percent of the time (assuming that the more thorough stability analysis chooses the best of the five every time), how useful is this really?
    Does determining the structure without any doubt still require the full time consuming lab analysis, or can you easily verify the cand

    • Re: (Score:3, Insightful)

      No chemist will ever trust a computer result without doing the full lab work. This can still be incredibly useful.

      Consider if each thorough test takes 6 months for 3000-4000 possibilities. If the computer can tell you the 5-10 compounds that are likely to work, in a few years you can have a product (or a PhD). Otherwise you were looking at nearly a thousand years before finding something.
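
      The arithmetic behind that, as a quick sketch. The figures are the ones quoted in this thread (6 months per test, 3000-4000 candidates, a shortlist of 5-10); running the tests strictly one after another is an assumption, as is the idea that the right structure is equally likely to sit anywhere in an unsorted list.

```python
MONTHS_PER_TEST = 6        # figure used in the comment above
CANDIDATES = (3000, 4000)  # range quoted from the article
SHORTLIST = (5, 10)        # shortlist sizes suggested above

def years_sequential(n_tests, months_per_test=MONTHS_PER_TEST):
    """Years needed if tests run strictly one after another (an assumption)."""
    return n_tests * months_per_test / 12

# Worst case: the right structure is the last one tested.
worst_case = [years_sequential(n) for n in CANDIDATES]
# Average case: with no ranking, the hit lands after (n + 1) / 2 tests on average.
average_case = [years_sequential((n + 1) / 2) for n in CANDIDATES]
# With a computed shortlist, only the shortlisted structures are tested.
shortlisted = [years_sequential(n) for n in SHORTLIST]

print(worst_case)    # [1500.0, 2000.0]
print(average_case)  # [750.25, 1000.25]
print(shortlisted)   # [2.5, 5.0]
```

      So the brute-force search averages somewhere between 750 and 1000 years under these assumptions, against 2.5 to 5 years for the shortlist. Running tests in parallel shrinks both numbers but not the ratio between them.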
  • by Anonymous Coward

    ... not.

    But I did work on a project that applied data mining techniques to drug screening problems. Specifically, we used kernels on molecule data in a support vector machine to predict the outcome of AIDS and cancer screening data. It worked moderately well. (AUC of up to .94)
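
    For readers unfamiliar with the AUC figure: it is the probability that a randomly chosen active compound gets a higher score than a randomly chosen inactive one, so 0.5 is coin-flipping and 1.0 is a perfect ranking. A minimal sketch of that definition follows; the scores and labels are made up for illustration and have nothing to do with the actual screening data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier outputs: higher score means "more likely active".
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # 8 of the 9 positive/negative pairs are ordered correctly
```

    The pairwise form is quadratic in the number of examples, which is fine for a sketch; library implementations sort the scores instead.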

    So: Surprise, surprise, data mining is used for all kinds of things! Drug screening, materials engineering, process control, analyzing NMR spectra, ... it's not just marketing! Basically, every application that produces a lot of d

  • The two questions are: "given a particular chemical composition, what is the crystal structure? what are its likely properties?"

    The scientists have a database of known crystal structures (plus some unspecified physics). Step 1 is to use 'datamining' techniques to generate a shortlist of possible structures for a given composition based on the database. Step 2 they perform quantum mechanical calculations to decide the likely properties (eg band gap).

    I assume as step 3 they then investigate in more detail any
    • That seems to be the process. I'm going to get a hold of a copy of that Amazon software and use it to make stock predictions ; )
  • If I understand this correctly:

    Select theoretical material.

    Determine subset of likely configurations.

    Run simulator to determine physical properties of new material for each likely configuration.
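
    The three steps above can be sketched as a two-stage screen. This is a toy under loud assumptions, not the MIT group's method: the `KNOWN` table is a five-entry stand-in for a real structure database, the vote-counting in `shortlist` is a crude substitute for whatever correlations their algorithm learns, and `toy_energy` returns made-up numbers where a real workflow would run a quantum mechanical calculation.

```python
from collections import Counter

# Hypothetical toy "database": element sets and their known structure type.
KNOWN = [
    ({"Na", "Cl"}, "rocksalt"),
    ({"K", "Cl"}, "rocksalt"),
    ({"Cs", "Cl"}, "CsCl-type"),
    ({"Zn", "S"}, "zincblende"),
    ({"Na", "Br"}, "rocksalt"),
]

def shortlist(elements, known=KNOWN, k=2):
    """Stage 1 (cheap): rank structures by how many elements they share
    with the query across known compounds, and keep the top k."""
    votes = Counter()
    for composition, structure in known:
        votes[structure] += len(composition & elements)
    return [s for s, v in votes.most_common(k) if v > 0]

def toy_energy(elements, structure):
    """Stage 2 stand-in (expensive in reality): made-up energies, lower is
    more stable. The element set is ignored in this toy version."""
    made_up = {"rocksalt": -1.0, "CsCl-type": -0.8, "zincblende": -0.5}
    return made_up[structure]

def predict_structure(elements):
    candidates = shortlist(elements)
    if not candidates:
        return None  # nothing in the database resembles the query
    return min(candidates, key=lambda s: toy_energy(elements, s))

print(predict_structure({"K", "Br"}))  # rocksalt
```

    The design point is the split itself: the cheap stage only has to get the right answer somewhere in a short list, so the expensive stage runs a handful of times instead of thousands.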

    Running this process against a large set of theoretical materials and saving those results into a database and allowing data mining on those results would allow materials scientists to perhaps cherry pick their next effort or research direction. Highest tens
