Making Science Machine Readable

holy_calamity writes "New Scientist is reporting on a new open source tool for writing up scientific experiments for computers, not humans. Called EXPO, it avoids the many problems computers have with natural language, and can be applied to any experiment, from physics to biology. It could at last let computers do real science - looking at published results and theories for new links and directions."
  • by w33t ( 978574 ) on Wednesday June 07, 2006 @11:34AM (#15487630) Homepage
    I think that computers have actually been able to do real science for at least a little while already.
    John Koza [popsci.com] is a leader in the field of genetic and evolutionary computation. Very much, his computers do real science. The computers analyze a set of data (observation), make a series of modifications (hypothesis), run fitness tests against these modified versions of the data (experiment), and then begin again by analyzing the results (back to observation).
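
    A minimal sketch of that observe/modify/test loop, written as a toy genetic algorithm in Python. The target, fitness function, and parameters are all invented for illustration; Koza's real systems evolve program trees, not single numbers:

      import random

      TARGET = 42  # a toy "law of nature" the population must discover

      def fitness(candidate):
          # Experiment: score how well a hypothesis matches observation.
          return -abs(candidate - TARGET)

      def mutate(candidate):
          # Hypothesis: propose a small modification.
          return candidate + random.randint(-5, 5)

      population = [random.randint(0, 100) for _ in range(20)]  # observations
      for generation in range(100):
          population.sort(key=fitness, reverse=True)  # run the experiments
          survivors = population[:10]                 # keep the best hypotheses
          population = survivors + [mutate(random.choice(survivors))
                                    for _ in range(10)]  # modify and repeat

      print(population[0])  # the best hypothesis found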

    The computer clusters which John Koza has engineered have created high-pass and low-pass filters when given nothing more than a random assortment of electronic components, even though John himself knew nothing of the electronics that would have let him design such a circuit.

    Most impressive is how the computer cluster evolved a new antenna for NASA. When it was completed, John worried that the computer had made some grievous errors, because the little antenna looked like a bent paper clip - but it worked!

    And that's science, if you ask me. Especially the antenna: the results of experiments can, and seemingly often do, go against "common sense" and give answers that are "unintuitive".

    Perhaps computers will be much better at the next-generation physics we're discovering. Perhaps our little numerical darlings are simply better suited to deal with the abstract, multi-dimensional world the universe is turning out to be.

    (Please pardon my layman's simplified version of the scientific method, but I feel it is a valid interpretation - if an overly simplified one for minds such as mine ;)
    --
    Music should be free [w33t.com]
  • by mapkinase ( 958129 ) on Wednesday June 07, 2006 @11:38AM (#15487666) Homepage Journal
    Just look at the names of the authors: Ross King and Larisa Soldatova.

    I personally knew Ross from his time in Mike Sternberg's lab, and I have only high praise for his intellectual abilities.
  • by mapkinase ( 958129 ) on Wednesday June 07, 2006 @11:43AM (#15487700) Homepage Journal
    Just read the sentences that follow the quote. It is the same idea that lies behind RSS: the author is responsible for providing results in EXPO format.

    For automatic data mining of scientific papers, check out the leading software in this area (disclaimer: it is a plug):

    http://ariadnegenomics.com/technology/medscan/ [ariadnegenomics.com]

    It currently works for biology, but it is extensible.
  • by Pedrito ( 94783 ) on Wednesday June 07, 2006 @11:49AM (#15487752)
    The article is weak on technical details. So I went to the Sourceforge site, which has no home page, no documentation, and nothing in the forums; the only "released" file has an extension of .OWL (inside a zip) and contains XML in an invalid format (various unescaped characters that should be escaped, as also noted in the sole bug submission in the Sourceforge project).

    There appears to be nothing of value here. An XML file does not do anyone any good without some documentation on how one might use it. Did New Scientist somehow get duped, or is there simply more to this and it's all hidden away?
  • by frankie ( 91710 ) on Wednesday June 07, 2006 @11:55AM (#15487794) Journal
    ...reveals that EXPO is an OWL schema [w3.org]. Exactly as described, it's an attempt to regularize the content of experimental design into a machine-readable form (XML). So any discussion of whether EXPO is a good idea really hinges on whether you think OWL is a good idea.
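
    To get a feel for what that means in practice, here is a hedged sketch that builds a tiny experiment description with Python's rdflib. The namespace, class, and property names (Experiment, hasHypothesis, hasDomain) are invented for illustration; the real EXPO ontology defines its own terms:

      from rdflib import Graph, Literal, Namespace, RDF
      from rdflib.namespace import OWL

      EXPO = Namespace("http://example.org/expo#")  # placeholder namespace

      g = Graph()
      g.bind("expo", EXPO)

      # Declare a class, then describe one concrete experiment as an instance.
      g.add((EXPO.Experiment, RDF.type, OWL.Class))
      g.add((EXPO.exp1, RDF.type, EXPO.Experiment))
      g.add((EXPO.exp1, EXPO.hasHypothesis, Literal("Resistance equals V/I")))
      g.add((EXPO.exp1, EXPO.hasDomain, Literal("physics")))

      print(g.serialize(format="xml"))  # the machine-readable RDF/XML form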
  • What's going on? (Score:5, Informative)

    by golodh ( 893453 ) on Wednesday June 07, 2006 @12:28PM (#15488106)
    The New Scientist article was clear enough, but a little short on technical detail. Note: I didn't know any of this until I read the article, so my comments are based on nothing more than a few minutes of experience.

    What is it?

    EXPO is an ontology (written in a formal language called OWL, though the article doesn't tell you that) which provides a formal dictionary especially for experiments. The terms in this dictionary let you describe your experiment in a formal way. That's a bit messy, but then you're supposed to use an editor to help you. An editor for this language (called Protégé) can be found at http://protege.stanford.edu/index.html [stanford.edu]. Download it (61 MB, or 31 MB without the JVM) and use it to read the EXPO document.

    What's it good for (in principle)?

    Once an experiment is described in the OWL language using this dictionary, it can be searched automatically. You could automate queries such as "list all published 3-factor experiments that test Ohm's law", or "give me all 2-factor experiments that deal with lung cancer, smoking, and gender, and that use tomography as a diagnostic instrument".
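
    To make that concrete, here is a sketch of such a query in SPARQL, run through Python's rdflib. The file name and the property names (hasFactorCount, testsLaw) are assumptions for illustration, not actual EXPO terms:

      from rdflib import Graph

      g = Graph()
      g.parse("experiments.rdf")  # hypothetical file of EXPO-annotated papers

      results = g.query("""
          PREFIX expo: <http://example.org/expo#>
          SELECT ?exp WHERE {
              ?exp a expo:Experiment ;
                   expo:hasFactorCount 3 ;
                   expo:testsLaw "Ohm's law" .
          }
      """)
      for row in results:
          print(row.exp)  # every published 3-factor test of Ohm's law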

    At the moment you can do that too, but you'd have to spend quite a bit of time and know quite a bit about the field, because you won't be able to do a full-text search (thanks to the publishers of scientific journals for that). You'd also find that not everyone uses the same terms, and you'd find only English-language results, because you wouldn't know how to spell "lung cancer" or "2-factor experiment" in Spanish, French, German, Chinese, Japanese, or whatever - though then again, neither can many foreign-language authors spell it in English (which never seems to stop them from publishing).

    Such a schema (provided it's universal and standardised, like the Dewey decimal system) would allow you to find your way in the fog of language. Unfortunately, if anything we will probably see lots and lots of different standards ("standards are good ... we should all have one!") and proprietary solutions with "enhancements" and "extensions" (read: safeguards against portability).

    What can we expect in the next 3 years?

    Nothing useful, I'm afraid. In theory it's great, but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, code up his or her article in OWL using the EXPO dictionary, and then submit it (in electronic form) along with the article. Good luck to you, authors! Let's just hope no one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.

    If within the next 10 years any significant fraction of publications (say more than 5%) is coded in such a schema annually, I'll be more than surprised.

  • by vertinox ( 846076 ) on Wednesday June 07, 2006 @01:23PM (#15488564)
    But what happens if we get to the point where all of science is automated by computer?

    We get a Technological Singularity [wikipedia.org].
  • by Anonymous Coward on Wednesday June 07, 2006 @02:53PM (#15489275)
    I couldn't find much about EXPO but I found some previous work.

    They have a publication in Nature Biotechnology: "The failure of many bio-ontologies to follow international standards for ontology design and description is hampering their application and threatens to restrict their future use."
    http://www.nature.com/nbt/journal/v23/n9/abs/nbt0905-1095.html [nature.com]

    They discuss microarray experiments.

    Microarray experiments are interesting for the massive amounts of data they produce and for what you can get out of them. In a microarray experiment, you look at all the mRNA transcripts generated in an organism under specific conditions. You get a whole lot of data from this experiment, and often the researchers are interested in only one specific question, so the rest of the data goes to waste. However, when the data is standardized and made available, other researchers can look at the same data with a different question, or look across multiple standardized datasets. These are massive data sets, and whether other people and groups can use the data (or you can use it in a different way) depends on standardization.
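
    As a toy illustration of why shared keys matter (the gene IDs and values here are invented), two labs' standardized results can be pooled and asked a question neither lab collected them for:

      # Expression ratios keyed by standardized yeast ORF names.
      lab_a = {"YAL001C": 2.4, "YAL002W": 0.7}
      lab_b = {"YAL001C": 2.1, "YAL003W": 1.9}

      # A second researcher pools the data and asks a new question:
      # what is the mean ratio across labs for the genes both measured?
      for gene in set(lab_a) & set(lab_b):
          print(gene, (lab_a[gene] + lab_b[gene]) / 2)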

    Right now, to find other research, you do a text search for a name you know. But what if someone is doing a very similar experiment with a different set of proteins that have a different name? If you could search the structure of the experiment instead of just the text, you could conceivably pull up relevant information that you didn't know about.

    Interestingly, King has another paper:
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14724639&query_hl=5&itool=pubmed_docsum [nih.gov]

    Functional genomic hypothesis generation and experimentation by a robot scientist.
    King RD, Whelan KE, Jones FM, Reiser PG, Bryant CH, Muggleton SH, Kell DB, Oliver SG.

    Department of Computer Science, University of Wales, Aberystwyth SY23 3DB, UK.

    The question of whether it is possible to automate the scientific process is of both great theoretical interest and increasing practical importance because, in many scientific areas, data are being generated much faster than they can be effectively analysed. We describe a physically implemented robotic system that applies techniques from artificial intelligence to carry out cycles of scientific experimentation. The system automatically originates hypotheses to explain observations, devises experiments to test these hypotheses, physically runs the experiments using a laboratory robot, interprets the results to falsify hypotheses inconsistent with the data, and then repeats the cycle. Here we apply the system to the determination of gene function using deletion mutants of yeast (Saccharomyces cerevisiae) and auxotrophic growth experiments. We built and tested a detailed logical model (involving genes, proteins and metabolites) of the aromatic amino acid synthesis pathway. In biological experiments that automatically reconstruct parts of this model, we show that an intelligent experiment selection strategy is competitive with human performance and significantly outperforms, with a cost decrease of 3-fold and 100-fold (respectively), both cheapest and random-experiment selection.

    I couldn't see big leaps of innovation coming from this kind of experimentation, but there is a lot of basic grunt work done in research that this system could automate.
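
    A minimal sketch of the closed loop the abstract describes: generate candidate hypotheses, run an experiment, and discard whatever the result falsifies. Everything here is a stand-in (the real system drives a laboratory robot and picks experiments by expected cost):

      import random

      hypotheses = [f"gene G{i} is essential" for i in range(8)]  # candidates

      def run_experiment(hypothesis):
          # Stand-in for the robot physically running a growth assay.
          return random.random() > 0.5  # True = observation is consistent

      while len(hypotheses) > 1:
          h = random.choice(hypotheses)  # stand-in for experiment selection
          if not run_experiment(h):
              hypotheses.remove(h)       # falsified: drop it and repeat
      print(hypotheses[0])               # the surviving hypothesis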
