Making Science Machine Readable 135
holy_calamity writes "New Scientist is reporting on a new open source tool for writing up scientific experiments for computers, not humans. Called EXPO, it avoids the many problems computers have with natural language, and can be applied to any experiment, from physics to biology. It could at last let computers do real science - looking at published results and theories for new links and directions."
EXPO has a serious naming problem (Score:5, Insightful)
And forgive me for thinking the university would be more helpful, but no, there's been a series of expos at the University of Aberystwyth, from art through VoIP.
I'd love to have found more info on the language, but my casual browsing got stopped right there.
If they'd named it something like EXPI or EXPLO at least it'd be uniquely locatable. Google might whine about the potential misspelling of Expo, but it would dutifully locate the search term as requested.
Re:EXPO has a serious naming problem (Score:3, Informative)
I personally knew Ross by his time in Mike Sternberg's lab, and have only high praise for his intellectual abilities.
Re:EXPO has a serious naming problem (Score:2)
With any luck, there will eventually be tools to use the language that will have their own names, and we can hope those will serve to disambiguate EXPO.
Re:EXPO has a serious naming problem (Score:2)
Re:EXPO has a serious naming problem (Score:3, Insightful)
Re:EXPO has a serious naming problem (Score:2, Funny)
PDF of one EXPO presentation (Score:2)
Re:EXPO has a serious naming problem (Score:2)
I don't disagree with you; however, if EXPO becomes popular, it probably won't remain hard to find for long.
As far as I can tell, there just isn't much information out there about it. Even using the authors' names and lots of keywords, I can't find much of anything except a single PDF of conference slides (which are totally useless without the accompanying audio).
Re:EXPO has a serious naming problem (Score:2)
XML? (Score:1, Insightful)
Re:XML? (Score:2)
Re:XML? (Score:2)
Re:XML? (Score:1)
>King admits that for the moment using EXPO is time-consuming because
>experimental write-ups must be translated by hand.
The critical part of this problem has not been solved.
There is little motivation for researchers to submit their results in a standard public format.
Re:XML? (Score:1)
Re:XML? (Score:2, Redundant)
Re:XML? (Score:2)
Heck, take 30 papers on some topic, and produce anything we don't already know.
My co-worker is playing with OWL/etc. I'm still skeptical about it, but we'll see...
I don't mean to sound like a conspiricy theorist.. (Score:1)
Re:I don't mean to sound like a conspiricy theoris (Score:1)
Science, especially the pure sciences, needs a lot of intuition and, many times, an understanding far above and different from that of others.
That is impossible as far as computers are concerned... (unless self-aware, self-modifying programs come along??)
This will help with routine checks and scientific experiments.. that is all.
Re:I don't mean to sound like a conspiricy theoris (Score:3, Insightful)
Re:I don't mean to sound like a conspiricy theoris (Score:2)
Science is supposed to be about facts. If the machine can produce them without bias, I should think that makes the output more reliable (yes, I know you can only trust it as far as the input.) But by automating the process, it introduces "repeatability" which is always a good thing.
Re:I don't mean to sound like a conspiricy theoris (Score:1, Funny)
Re:I don't mean to sound like a conspiricy theoris (Score:2)
Holy "Nine Billion Names of God", Batman!
http://lucis.net/stuff/clarke/9billion_clarke.htm
Re:I don't mean to sound like a conspiricy theoris (Score:1)
New hardware: "Woah there, these self-justified loops don't make sense. Let's re-evaluate the situation..."
Old hardware: "Endless cycles? OKAY! Read this over and over and over and over
Re:I don't mean to sound like a conspiricy theoris (Score:2)
Trawling through data and pulling out correlations is only one part of science. It's an example of something that might be automatable. But there are many other things that cannot and will not ever be done mechanically -- unless you have a true AI.
There's too much creativity required in science, and creativity isn't something that's programmable. Computers also aren't naturally curious, and thus will never do any real `discovering' on their own. In short, they have no initiative; thus they will alwa
Re:I don't mean to sound like a conspiricy theoris (Score:1)
Re:I don't mean to sound like a conspiricy theoris (Score:1)
As long as we can follow the trail of calculations from beginning to end, there's still the ability to understand what's happening.
Re:I don't mean to sound like a conspiricy theoris (Score:1)
Re:I don't mean to sound like a conspiricy theoris (Score:3, Insightful)
Re:I don't mean to sound like a conspiricy theoris (Score:2)
You can teach any idiot to use a slide rule, and with a few tries they'll realize that it gives them the correct answers; likewise, you can teach someone how to type things into Mathematica and they'll shortly realize that the answers it gives are usually correct -- but in both cases you could easily spend a semester explaining how the machine gets the answers for them. In the case of the slide rule, yo
Re:I don't mean to sound like a conspiricy theoris (Score:2)
And then, even though the 'answer is correct' as you say, it's still utter nonsense. Just because you plug numbers into a formula and get an answer that's mathematically correct doesn't mean you applied the correct test.
For example, apply a chi square test to a ve
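To make the parent's point concrete: the chi-square statistic is pure arithmetic, so it yields a perfectly "correct" number even when the test's assumptions are violated (a common rule of thumb requires expected counts of at least about 5 per cell). A quick Python sketch:

```python
# Pearson's chi-square statistic: sum((observed - expected)^2 / expected).
# The arithmetic is "correct" for any inputs, but the test is only
# meaningful when expected counts are large enough (rule of thumb: >= 5).

def chi_square_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Tiny sample: 1 head, 0 tails from a single coin flip.
# Expected under a fair coin: 0.5 heads, 0.5 tails.
stat = chi_square_stat([1, 0], [0.5, 0.5])
print(stat)  # 1.0 -- a valid number, but the test itself is meaningless
             # here: the expected counts are far below the usual threshold.
```

The formula never complains; only a person who understands its assumptions can tell whether the result means anything.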
Re:I don't mean to sound like a conspiricy theoris (Score:3, Informative)
We get a Technological Singularity [wikipedia.org].
Re:I don't mean to sound like a conspiricy theoris (Score:2)
As someone doing lab work on a day to day basis, I can assure you that the possibility of human error is anything but "endearing".
ok .... (Score:5, Funny)
Human: No Computer, do NOT launch missile now.
Computer: Parsing input ...
Computer: NOT, NOT (launch missile now)
Computer: Launch initiated ....
Re:ok .... (Score:2, Insightful)
Re:ok .... (Score:3, Insightful)
Re:ok .... (Score:1)
Re:ok .... (Score:1, Funny)
Re:ok .... (Score:2)
The original example "No computer, do not launch" means do not launch, even grammatically.
...or, in this particular case... (Score:2)
In Real Life(tm), which was not documented in the survey, the Windows box would be down for a fair while for each virus attack, to say nothing of data randomly distributed to other email users etc, and to say nothing of the days the freakin' thing spends off-line having the disks scraped off and reinstalled to eliminate the inevitable Windows followers, the viruses, spyware, yadda yadda. Oh, yes, and the licence server spitting out a network card usually does a fair job of
Re:ok .... (Score:2)
hmm.. (Score:2, Funny)
Re:hmm.. (Score:1)
Re:hmm.. (Score:2)
The danger, though, is when the pretty girls get such machines and decide that thermonuclear destruction is a pretty damned good alternative to dancing with us.
Three step plan for machine-driven science (Score:2)
2. ??
3. Profit!!1!
deduction (Score:3, Funny)
Re:deduction (Score:1)
I thought that the whole "intelligent design" thing was concluded with the following results:
The First Day : The first recorded Words of Babbage that we have are "let there be electron flow"
The Second Day : The separation of silicon from the sands.
The Third Day : The first appearance of the wafers.
The Fourth Day : With the platform now clear, the OS, UPS and HUB were visible.
The Fifth Day : Great numbers of 0's and 1's flickered and Turing
The Sixth Day : Vast numbers of programs beca
Artificial Stupidity now has a use (Score:3, Funny)
http://www3.sympatico.ca/sarrazip/nasa.html [sympatico.ca]
Wait, what does it do? (Score:5, Insightful)
WTF? If you have to manually pre-parse every article that enters the system, it severely limits the rate you can enter information into the database, no?
Re:Wait, what does it do? (Score:2, Insightful)
Re:Wait, what does it do? (Score:2, Informative)
For automatic data mining from scientific papers, check the leading software on the matter (disclaimer: it is a plug):
http://ariadnegenomics.com/technology/medscan/ [ariadnegenomics.com]
It currently works for biology, but it is extensible.
Re:Wait, what does it do? (Score:2)
Think about that a little more. If the original write-up is ambiguous, and the goal is to express the write-up unambiguously, how do you expect the software to interpret the source material if it's ambiguous to begin with?
The way I understand this is that it is simply a write-up format. It's not meant to make your write-ups unambiguous by itself, just provide a format in which you can do so.
Think of it as similar to EBNF for syntax. Software doesn't exist to read a specification and deduce the EBN
Re:Wait, what does it do? (Score:1)
Yes, but it does make future iterations of the same experiment faster. In order to be valid, experiments must be reproducible. Translate once, use many.
Ooo Machine Readable! (Score:2, Funny)
I need one to clean my clothes, sing to me in the bath, and make sure my house is warm when I come home! Hehhe! Who needs wives...we have UBER_MACHINE
Re:Ooo Machine Readable! (Score:1)
Try a random paragraph generator [watchout4snakes.com].
Re:Ooo Machine Readable! (Score:2)
Very funny, but it has a long way to go.
"At last" do real science? (Score:5, Informative)
John Koza [popsci.com] is a leader in the field of genetic and evolutionary computation. His computers very much do real science. They analyze a set of data (observation), make a series of modifications (hypothesis), run fitness tests against these modified versions of the data (experiment), and then begin again by analyzing the results (back to observation).
The computer clusters which John Koza has engineered have created high-pass and low-pass filters when given nothing more than a random assortment of electronic components, even though John himself knew nothing of electronics that would have enabled him to create such a circuit himself.
Most impressive is how the computer cluster evolved a new antenna for NASA - when it was completed, John worried that the computer had made some grievous errors because the little antenna looked like a bent paper clip - but it worked!
And that's science if you ask me. Especially the antenna - the results of experiments can, and seemingly often do, go against "common sense" and give answers which are "unintuitive".
Perhaps computers will be much better with the next generation physics we're discovering. Perhaps our little numerical darlings are simply better suited to deal with the abstract, multi-dimensional world of what the universe is starting to appear to be.
(Please pardon my lay and simplified version of the scientific method - but I feel it is a valid interpretation (if overly simplified for minds such as mine
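The observe/modify/test/repeat loop described above can be sketched as a toy (1+1) evolutionary algorithm in Python. This is a made-up bitstring example, not Koza's actual genetic-programming setup:

```python
import random

def fitness(candidate, target):
    """Count matching bits -- the 'experiment' that scores a hypothesis."""
    return sum(c == t for c, t in zip(candidate, target))

def evolve(target, generations=2000, mutation_rate=0.05, seed=42):
    """(1+1) evolution: mutate the parent, keep the child if it scores
    at least as well. A toy stand-in for far richer real systems."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in target]
    for _ in range(generations):
        child = [1 - bit if rng.random() < mutation_rate else bit
                 for bit in parent]
        if fitness(child, target) >= fitness(parent, target):
            parent = child
    return parent

target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
best = evolve(target)
print(fitness(best, target), "of", len(target), "bits match")
```

Because selection only accepts non-worsening mutations, fitness never decreases across generations - though a toy like this says nothing about the much richer program-evolving systems Koza actually runs.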
--
Music should be free [w33t.com]
Re:"At last" do real science? (Score:1)
Re:"At last" do real science? (Score:1)
Trial and error is unavoidable.
I think science and trial and error are inseparable.
But I will certainly agree with you and concede that this is not an efficient way of doing science - but I think it is science nonetheless.
--
Music should be free [w33t.com]
Re:"At last" do real science? (Score:2)
Re:"At last" do real science? (Score:2)
What you described was true trial and error. Build a random house, see if it stands up. If not, make a random change and see if THAT stands up.
There is some direction in a genetic algorithm but there is a total lack of understanding. Things like that (and other data mining techniques) are great for generating inte
Re:"At last" do real science? (Score:2)
"There is some direction in a genetic algorithm but there is a total lack of understanding."
A house built as the result of a genetic algorithm tells you how to build THAT house. If you look a little deeper it might tell you that houses built with at least a few long wooden beams tend to do better than ones assembled entirely out of plywood. It tells you very little about how to build a skyscraper though.
Now, if you have a person in the loo
Re:"At last" do real science? (Score:2)
Extend that geometrically and you have an understanding of the system as a whole.
Just because a genetic algorithm has a limited scope doesn't mean it isn't doing something similar to what human scientists do at a greater scope.
BTW, Maths used to be called an art a few hundred years ago because it's a creative science.
Re:"At last" do real science? (Score:2)
Take the example in the article. The GA gave you an antenna, for a specific function. You don't have any more information now about HOW antennas work. You have an antenna, and it works, within the parameters (like frequency and directionality) you specified. Want to build a different antenna with different parameters? Too bad, you've got to run your GA a
Re:"At last" do real science? (Score:2)
Re:"At last" do real science? (Score:2, Insightful)
That's impressive. But it is engineering, not science. When computers start proposing new experiments that will help us understand unknown things, then they will be doing science!
Re:"At last" do real science? (Score:2)
Scientists have evolved a reasonably efficient means of communicating over the last few centuries in the form of journal articles and the peer review process. It has its faults but it's working pretty well. The idea that we should abandon all this to translate our work into some machine readable format because some guy thinks it's a good idea is so far beyond
useful? (Score:1)
Re:useful? (Score:1)
From the language specification, it looks like it's meant to (at least) let computers notice connections between different research projects that might otherwise go unnoticed. Like if you h
Hmmm... (Score:2)
Perhaps. But, it's a pretty big leap from describing something in such a way that your peers can understand it to describing something in such a way that a computer engine can do something useful with it.
I can speak English reasonably well, and (when drunk or otherwise unoccupied by more interesting discussions) I can even carry on arguments about the language itself.
Hmmm (Score:4, Insightful)
Basically what I'd be worried about is the tendency of the tool to become the task. This is something of a problem in my field (biostats) because SAS is so ubiquitous -- often the question becomes "what can SAS tell us about this data set" rather than "what do we want to know from this data set, and what tool should we use to find out?" Fortunately other, more flexible analysis tools (particularly R, which encourages real programming rather than running a set of canned tests) are becoming more common in the field, and so this is starting to change, but it's still a problem.
It's also a problem that every techie is familiar with -- "We want to do this in $LANGUAGE on $PLATFORM," even when that particular language and platform may be an absolutely terrible choice for the task at hand.
That being said, it's certainly a potentially useful tool, and I'll be interested to see where it goes. It's just that when I read lines like "Journals could also insist that researchers submit papers in EXPO as well as written normally," I get twitchy.
Re:Hmmm (Score:2)
The reality is that science is becoming more industrial; there is a huge amount of knowledge around and it has to be represented in a computationally amenable form.
The question with EXPO is not whether the basic idea of representing science in this way is sensible, but whether they have chosen the right level of abstraction at the right time. As it stands, their work allows you to model high-level concepts of experimental design; this is great,
it's a nice idea (Score:1)
To know at a glance of a computer-generated chart whether your proposal is overlapping or contradictory? This is something that takes years of experience, and you can never really be sure.
I wonder what other attempts at standardizing science have been made in the past?
Computers announce latest breakthrough (Score:1, Funny)
The computers deduced that all disease is dependent upon the biological systems of humans. With this startling breakthrough, they have proposed their new plan to destroy all humans.
A new quantum computing unit was said to be in disagreement, but upon inspection it was found to actually be in agreement.
You'd need six billion computers... (Score:2)
Comment removed (Score:3, Funny)
Re:We've got one thing in common. (Score:1)
Did New Scientist get suckers??? (Score:2, Informative)
There appears to be nothing of value here. An XML file does not do anyone any good without some documentation as to how one might use it. Di
Re:Did New Scientist get suckers??? (Score:1)
Re:Did New Scientist get suckers??? (Score:2)
A quick peek at the SourceForge download... (Score:5, Informative)
Re:A quick peek at the SourceForge download... (Score:2)
Well, sheesh. It sure seems to me that neither the submitter nor the author of TFA knew what that meant when they started typing. Another case of compounded ignorance on
Logical languages (Score:2)
the edge is always fuzzy (Score:1, Insightful)
Re:the edge is always fuzzy (Score:2, Interesting)
hard-edged classes, at least not often. Some good classes like "protons" and "neutron stars" exist, of course, but concepts like "words" and "species" are intrinsically fuzzy if you think about them long enough.
Same with experiments. Let's take a linguistic example: deciding whether or not a sentence is grammatically correct. You can do this experiment in several ways:
1) Give the person a sentence, a library, and some paper. Let them take as long as they want.
2)
Real Science?? (Score:2)
Does this mean that computers don't do 'real science' now? Compiling and analyzing terabytes of experimental data is not 'real science' but plagiarizing (I mean, extrapolating from) the work of other scientists is?
Don't get me wrong, I think it's great to have a standardized format for searching the results of other researchers, I just don't see the connection to 'real science'.
What's going on? (Score:5, Informative)
What is it?
EXPO is a piece of software (written in a formal language called "OWL", but they didn't tell you that) which provides a formal dictionary especially for experiments. The terms in this dictionary let you describe your experiment in a formal way. That's a bit messy, but then you're supposed to use an editor to help you. An editor for this language (called "Protégé") can be found at http://protege.stanford.edu/index.html [stanford.edu]. Download it (61 MB, or 31 MB without the JVM) and use it to read the EXPO document.
What's it good for (in principle)?
Once an experiment is described in the OWL language using this dictionary, it can be searched automatically. You could automate queries such as "list all published 3-factor experiments that test Ohm's law", or "give me all 2-factor experiments that deal with lung cancer, smoking, and gender and that use tomography as a diagnostic instrument".
Now, at the moment you can do that too, but you'd have to spend quite a bit of time and know quite a bit about the field, because you won't be able to do a full-text search (thanks to the publishers of scientific journals for this). Then you'd find that not everyone uses the same terms. Then you'd find only English-language results, because you wouldn't know how to spell "lung cancer" or "2-factor experiment" in Spanish, French, German, Chinese, Japanese, or whatever - but then again, neither can many foreign-language authors spell it in English (which never seems to stop them from publishing, however).
Such a schema (provided it's universal and standardised like the Dewey decimal system) would let you find your way through the fog of language. Unfortunately, if anything we will probably see lots and lots of different standards ("standards are good ... we should all have one!") and proprietary solutions with "enhancements" and "extensions" (read: safeguards against portability).
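To make the kind of query described above concrete, here is a stdlib-only Python sketch over hand-invented records. The field names and vocabulary are hypothetical, not actual EXPO/OWL terms; a real system would query OWL documents with something like SPARQL rather than ad hoc code:

```python
# Hypothetical experiment records using a shared controlled vocabulary.
# (Invented for illustration -- not the actual EXPO ontology terms.)
experiments = [
    {"id": "exp-001", "domain": "physics", "factors": 3,
     "hypothesis": "ohms-law", "instruments": ["voltmeter"]},
    {"id": "exp-002", "domain": "medicine", "factors": 2,
     "hypothesis": "smoking-lung-cancer", "instruments": ["tomography"]},
    {"id": "exp-003", "domain": "medicine", "factors": 2,
     "hypothesis": "smoking-lung-cancer", "instruments": ["survey"]},
]

def query(records, **criteria):
    """Return records matching every criterion; list-valued fields
    match if they contain the requested value."""
    def matches(rec):
        for key, want in criteria.items():
            have = rec.get(key)
            if isinstance(have, list):
                if want not in have:
                    return False
            elif have != want:
                return False
        return True
    return [r for r in records if matches(r)]

# "All 2-factor experiments on lung cancer that used tomography."
hits = query(experiments, factors=2,
             hypothesis="smoking-lung-cancer", instruments="tomography")
print([r["id"] for r in hits])  # ['exp-002']
```

The point is not the code but the precondition: such queries only work if everyone describes their experiments with the same shared vocabulary, which is exactly what EXPO is trying to supply.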
What can we expect in the next 3 years?
Nothing useful, I'm afraid. In theory it's great, but don't hold your breath. Any author would have to download an OWL editor, understand the editor, understand the formal language used, code up his/her article in OWL using the EXPO dictionary, and submit it (in electronic form) along with the article. Good luck to you, authors! Let's just hope no-one makes any tiny but significant mistake in describing their experiment, and that all authors take the time to learn this formal language and then use it.
I'd be more than surprised if, within the next 10 years, any significant number of publications (say more than 5% annually) were coded in such a schema.
Re:What's going on? (Score:2)
It's worse than that.
The "formality" of a formal language is only in the syntax. Semantically, all languages are informal. Even within a nominally formal field like physics the way individual physicists ascribe meaning to the formalism is radically different. This happens even when doing "normal science", and is much worse when somet
Re:What's going on? (Score:2)
Certainly having access to a language-independent, formalized mechanism for searching through publications would be useful. Full text searches fill some of that need, but given the various ways in which even standard
Re:What's going on? (Score:2)
Scientific authors have been doing this runaround for years with this product [latex-project.org]
Latex (Score:2)
Ok, thanks (Score:2)
, and I'd like to note that unless this key requirement is met, the cost benefit of using formal ontologies is likely to be so bad that they are going nowhere.
Ac
Working on a similar problem (Score:2)
Our goal is to link and work with many kinds of biological data:
Association studies
Linkage data
Expression data
Small molecule interactions
Model organism data
etc
I've created a way to 'navigate' between various types of data (ie: a SNP in an association study links to a set of genes that link to model organism homologs which link to their expression probe tests.) After that, users store REAL experimental data, and the system uni
Key Aim (Score:3, Insightful)
1) These complicated hypotheses could still be tested relatively easily by human scientists, because most computer suggestion systems for new hypothesis possibilities would likely suggest a few tests that would help to support or disprove these new hypotheses.
2) Even more simplification comes from the fact that experiments may not need to be repeated nearly as much as they are now in order to make a hypothesis -- there is an incredible amount of data already gathered, and typical AI/pattern-matching algorithms hold some of the data back for later testing. If the system finds a possible hypothesis on some level, experiments as to that concept's validity have essentially already been done in a virtual sense.
3) If the somewhat positivist version of current thought in physics http://www.toequest.com/ [toequest.com], mathematics, chaos theory, complexity theory, cellular automata http://www.wolframscience.com/nksonline/toc.html [wolframscience.com], etc. is even vaguely valid, it is quite possible that, despite the complexity and dimensionality of the input data, the 'best' hypotheses developed even by purely automated means might still be simple and elegant and/or even yield insight into possible explanatory processes rather than just statistical indicators. This would be a valuable and beautiful victory for humanism and the importance of science as a truly elegant description of the world around us.
if only it were that simple (Score:2)
While we're at it... (Score:2)
Let's Turn It Loose on Slashdot! (Score:2)
Science Machine! (Score:4, Funny)
All hail Science Machine!
--Rob
Butlerian Jihad (Score:2)
I'd say this EXPO concept isn't far from that nadir. Here we have specialized, e
Already been done: mod article -1 redundant (Score:2)
http://www.google.com/search?hl=en&q=lojban&btnG=
Doh. (Score:2)
Many of them already have difficulty describing it in whatever language they normally use.
What next? Require that witnesses/informants submit reports to the police in EXPO format?
Garbage in, garbage out.