Comment reductio ad absurdum (Score 5, Insightful) 1121 1121

The universe came into being 6 seconds ago, in exactly the state we see now, with all of our memories intact.

Prove me wrong.

Hint - it can't be done. You can always reintroduce the possibility of some omnipotent force. By carefully framing the question, proving it wrong becomes impossible. Instead, you have to unask the question. Western philosophy spent the entire last century trying to unask the premises Descartes set forth for exactly that reason.

This isn't a scientific question, it isn't in a scientific arena, and any scientist thinking they can 'win' the debate/bet is on shaky ground. Not because the science is bad, but because it isn't about science at all...

Comment Re:Odd (Score 1) 146 146

It is possible to get embryonic cells and DNA from the mother's blood, but isolating it so it is free from contamination for a clean sequence is difficult. The technology is already being used in an array-based assay to detect Down Syndrome and a few others. See here.

That page doesn't say much, but the confidence intervals are already on par with the risk factors for an amnio, which means the amnio is on the way out. Sequence data would be better, having the father's genome might help, but regardless of the details, the foundations have already been laid to do non-invasive genetic screening of the child in the womb.

The potential for both good and evil to come of this is enormous.

Comment Who hasn't noticed it? (Score 4, Interesting) 463 463

Disclaimer: I am a PhD chemist, but I am not your chemist (or something like that).

The nebulous threat of 'chemicals' has been present for years, but there has been a bit of an uptick in rhetoric recently.

Much as the traditional computer hacker resents the rise in the use of the term hacker in the media to mean malicious computer criminal, most chemists I know are quick to dismiss the silly bias against 'chemicals' in the media. But the term has become a catchphrase for the larger population, and pointing out that everything is made of chemicals has little effect. 'Organic' food is the same way - no one would eat inorganic cucumbers (aka rocks), but the word organic means something else in that context.

Long-hand chemical names won't fix it, because your eyes just gloss over the *fnord*perfluorooctanoic acid*fnord* chemical names. If you want to call out specific chemicals, give them a shorter name (maybe spell them out for people who really want to know), but then explain them and why they are bad.

There are plenty of naturally occurring chemicals that will kill you in small doses, there are manufactured chemicals that are perfectly safe to spray on your children, and everything in between. If the media wants to call out 'chemicals', I think we would all appreciate them specifying which ones.

The whole 'fracking' thing is a great example of this. Most 'fracking fluid' is water and PEG (polyethylene glycol, a harmless 'chemical' found in lots of beauty products - see what I did here?). So who cares if you inject that into a shale formation miles below the water table? Are there other chemicals in there that might be harmful? Could be (and often are). Call them out specifically if you want me to worry about them. But we have a problem here - people won't panic if you tell it like it is, making it much better to light someone's tap water on fire! What's burning? Not 'fracking fluid', not any of those nasty 'chemicals', just natural gas that was probably there before any drilling started. But if you tell people that the oil companies are pumping nasty chemicals into the ground, and show them a faucet on fire, they'll draw their own conclusions, based on anecdotal evidence rather than logic and causality. And this is, of course, exactly what was supposed to happen in response to the scaremongering in the media.

People love to get riled up about something, and there is no shortage of chemicals that they could be getting riled up about. Some more careful journalism, and a requirement that most people need at least science 101 and math 101 to really understand the information they will be presented with would all be good, but I don't see any of those changes happening in the short term. It is at once wonderfully reassuring and extremely terrifying that you don't need a brain to have an opinion.

Comment The problem isn't completed genomes... (Score 2) 239 239

Though, there is quite a lot of that being generated these days.

The problem is the *raw* data - the files that come directly off of the sequencing instruments.

When we sequenced the human genome, everything came off the instrument as a 'trace file' - 4 different color traces, one representing a fluorescent dye for each base. These files are larger than text, but you could store the data on your local hard drive and do the base calling and assembly on a desktop or beefy laptop by today's standards.

2nd gen sequencers (Illumina, 454, etc) take images, and a lot of them, generating many GB of data for even small runs. The information is lower quality, but there is a lot more of it. You need a nice storage solution and a workstation grade computer to realistically analyze this data.

3rd gen sequencers are just coming out, and they don't take pictures - they take movies with very high frame rates. Single molecule residence time frame rates. Typically, you don't store the rawest data - the instrument interprets it before the data gets saved out for analysis. You need high end network attached storage solutions to store even the 'interpreted' raw data, and you'd better start thinking about a cluster as an analysis platform.

This is what the article is really about - do you keep your raw 2nd and 3rd gen data? If you are doing one genome, sure! why not? If you are a genome center running these machines all the time, you just can't afford to do that, though. No one can really - the monetary value of the raw data is pretty low, you aren't going to get much new out of it once you've analyzed it, and your lab techs are gearing up to run the instrument again overnight...

The trick is that this puts you at odds with data retention policies that were written at a time when you could afford to retain all of your data...

Comment Re:Bioinformatics (Score 1) 314 314

Most of my phylogenetics stuff is unpublished, or mentioned only briefly in other papers. You pretty much have to use the software to know it exists. The 'phylogenetics' itself is mostly DNA neighbor-joining, but we're doing some ancestral state reconstructions to look at what amino acid changes define certain clades (using Fitch parsimony, mostly), and how those changes correlate with observed antigenicity and activity measurements. Turns out that in my particular application, neighbor-joining is the only thing fast enough, and it produces trees that are nearly identical to any of the statistically rigorous methods. All the interesting analysis happens after you build a tree.
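For anyone who hasn't run into Fitch parsimony, here is a minimal sketch of the textbook bottom-up pass for a single site on a rooted binary tree. It is not the software described above; the nested-tuple tree format and the name `fitch` are assumptions made up for illustration, and a full ancestral state reconstruction would also need a top-down pass to pick one state per internal node.

```python
def fitch(node):
    """Bottom-up Fitch pass for one site.

    `node` is either a leaf (a one-letter state such as an amino acid code)
    or a 2-tuple (left_subtree, right_subtree). This tree encoding is a
    hypothetical convenience, not a standard file format.

    Returns (candidate_states, minimum_number_of_changes) for the subtree.
    """
    if isinstance(node, str):
        # A leaf has exactly one observed state and needs no changes.
        return {node}, 0

    left, right = node
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)

    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes

    # The children disagree: keep the union and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Toy example: five taxa carrying either K or N at one position.
tree = (("K", "K"), ("N", ("K", "N")))
states, changes = fitch(tree)
print(sorted(states), changes)
```

The candidate sets at each internal node are what you would then inspect to see which substitutions sit on the branch leading to a clade.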

What sort of analysis are you working on? Just because a tiny handful of people use it doesn't mean it isn't a cool technique. Good luck procrastinating on your paper.

Comment Bioinformatics (Score 3, Interesting) 314 314

The 'problem' with bioinformatics is that the field is extremely broad. Unless you write BLAST or one of the big sequence assemblers, your software is only going to appeal to a tiny fragment of an already small bioinformatics community.

I wrote software as part of my Ph.D. that is now distributed world wide. I guarantee you've never heard of it - it sets the standard for how to do certain types of phylogenetic analysis, but almost no one does that analysis.

During my time as a postdoc, I wrote a very simple curve fitting routine and put a minimal GUI on top of it. I am now getting requests from multiple countries to modify it to read in files from their instrumentation. Once again, only the tiniest handful of people care, but for those people, this is revolutionary stuff.

The question here is, how do you define success? Like a lot of the responses to this thread, I wrote a small script here or there to solve my own problem. Turns out, it solved a problem for someone else, too. My best known piece of software was a hack, a one-off script, written in an afternoon, that I got yelled at for even bothering to spend time on, and was only ever intended for my own use. It turned out to be the lynchpin for our project, got published in a peer reviewed journal, and has since gone global. I found out later that one of my undergrad computer science profs had solved the same problem 20 years before I did, in a more elegant way, and published it in a good, but non-science, journal - no one has ever heard of it.

Neither of us had the expectation that our software would amount to much. I would define the prof's work as 'successful' - he published a paper on an interesting academic topic. I would define my software as 'wildly successful' - I got an unexpected publication and a global (if small) user base, along with a reputation for fixing problems that would later get me a good postdoc position.

This isn't really an academia question. The most common advice in the open source community is 'scratch an itch'. Write something to fix a problem you see. If you write good stuff, maybe your code will become 'successful'. Or, maybe your afternoon worth of hacking will just turn into an afternoon worth of experience you can apply to the next problem.

Comment No feedback mechanism (Score 3, Insightful) 123 123

How in the world will setting filters on a database put a bacterium in a lab half way around the world at an evolutionary disadvantage? The bacteria will still grow, contaminate the sample, and get sequenced, but the sequence will be rejected. There is no feedback mechanism here, no selective pressure.

Genome sequence assembly is pretty far removed from the milieu in which a bacterium must make its way. And inadvertently including bacterial sequences on a gene expression chip is sloppy science, but hardly news.

Traditional computer viruses are the only things that truly 'reproduce' in silico. Memes are your next best option, but the 'net is just a carrier - they have to infect a human host to reproduce. Stay away from 4chan if you want to avoid infection...

But bacteria? In silico? Where are we going with this strained analogy, anyway?

Comment Good for science and engineering, too (Score 1) 318 318

It may have been designed for financial calculations, but it holds its own for science and engineering tasks, too. A lot of problems in a lot of fields lend themselves very naturally to RPN workflows.
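If you have never used RPN, the idea is that operands go onto a stack and each operator consumes the top two entries. Here is a minimal sketch of that workflow; `eval_rpn` and the `OPS` table are names invented for this example, and it makes no claim about how the 12C actually implements its stack.

```python
import operator

# Binary operators supported by this sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression):
    """Evaluate a space-separated RPN expression such as '3 4 + 2 *'."""
    stack = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            # The most recently entered value is the right-hand operand.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            # Anything that is not an operator is treated as a number.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("expression left extra values on the stack")
    return stack[0]


# (3 + 4) * 2 with no parentheses and no '=' key.
print(eval_rpn("3 4 + 2 *"))
```

Intermediate results simply stay on the stack until you need them, which is why chained calculations feel natural.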

I learned to use these from my dad - he still has his, and I'm not sure there is any so-called feature that could ever make him give it up. Even when I was required to have a TI graphing calculator for classes, I found myself using it in RPN-style due to having learned to use the old HP (the last result is stored, allowing you to use the 'ans' key as a very short stack).

The 12C and friends are, in my opinion, nearly perfect as far as pure calculators go. They don't do anything your cell phone can't these days, but I've never met an app that felt as natural for handling pure computational tasks, and I have never needed to place a call from my calculator. Sometimes, purpose-built hardware is just better.

Comment Re:Letting it all out (Score 1) 55 55

From here:

I don't do test-driven development; I do stupidity-driven testing. When I do something stupid, I write a test to make sure I don't do it again.

If you are the hardcore TDD type, that probably sounds like blasphemy. However, if you don't want to do full on TDD, at least consider SDT. Write tests for stuff you already screwed up - that way you never make the same mistake twice. SDT makes testing another tool for a solid coder (who may make mistakes, because of being human).
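A minimal sketch of what SDT can look like in practice. The function `safe_mean` and the bug it once had are hypothetical, invented purely for illustration: assume an earlier version crashed with a ZeroDivisionError on an empty list, and the first test below was written right after that happened.

```python
def safe_mean(values):
    """Return the arithmetic mean of `values`, or 0.0 for an empty input."""
    values = list(values)
    if not values:
        # Added after the (hypothetical) day an empty result set crashed the report.
        return 0.0
    return sum(values) / len(values)


def test_empty_input_regression():
    # Pins down the exact mistake that was made once, so it cannot come back.
    assert safe_mean([]) == 0.0


def test_generator_input_regression():
    # A second past mistake in this sketch: calling len() on a generator.
    assert safe_mean(x for x in [1, 2, 3]) == 2.0
```

Functions named `test_*` are picked up by common Python test runners such as pytest, so the suite grows by one small function per mistake.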

Unfortunately, the acronym SDT gives so many more opportunities for embarrassing typos...

Comment 9th doctor is the easy starting point (Score 1) 655 655

I watched some of the old stuff as a kid with my dad, missed the series starting back up, then got my wife into it right after the birth of my daughter. We had a lot of time where there was a sleeping baby preventing you from doing very much, so we got a NetFlix subscription and caught up on what had been happening since the 2005 revival of the show.

Start with the 9th doctor (Christopher Eccleston). The show had a great run in its heyday, fizzled a bit, and went off the air for ~10 years. When it was revived, it kept a lot of the flavor of the old show, but didn't assume that you as the viewer had seen any of the old stuff. This is a great starting point, especially if you don't know much about the old series. From there, you move on to David Tennant - getting top quality actors and giving them weird stuff to do is awesome entertainment. The techno-babble is techno-lite and babble-heavy, and you don't even care because it is *fun*. Torchwood spins off from the second season of David Tennant, so that should give you a little backstory that was lacking if you started watching that series cold.

Once you've done that much, jump around a bit. Matt Smith's 11th doctor started a little slow, but is growing on me. Catching some of the original William Hartnell 1st doctor episodes was a hell of a throwback, and honestly, it is amazing how much conceptual integrity there is in the show even from those early days. Big chunks of Patrick Troughton's 2nd doctor are missing (destroyed by the BBC for shelf space). The 3rd and 4th doctor episodes (Jon Pertwee, Tom Baker) are some of the most iconic and best remembered. Doctors 5-8 are a bit of a hole in my experience - I don't remember them from being young, and I haven't gotten back to them this time around (not yet, anyway).

Because Dr Who is such a part of British scifi culture, it also has a lot of material that surrounds it. Articles, books, radio shows, and 'Dr Who Confidential' (a making of for the new series), tabloid speculations about both the doctor and favorite companion characters, etc. If you like to go deep, there are plenty of extras out there.

Comment Would a rose, by any other name (Score 1) 225 225

Still smell as sweet?

Naming and labeling things in science is as old as science itself. Often, though, as our understanding changes, so too must the old naming scheme. Usually the knowledge change becomes obvious in the scientific community - the facts are the facts, after all.

What causes all of the consternation is almost always semantics about the classification.

If there is a clear-cut scientific definition, go ahead and assign names and classifications.

Too often, though, there is no clear-cut definition, the labels don't correspond directly to the categories they are supposed to describe, and the 'formal' language fails to form a commonly accepted means of communicating your ideas, which was probably the whole point of assigning labels in the first place. SO DON'T ASSIGN THE LABELS! They don't solve the problem they are supposed to solve, they make new problems, don't bother with them.

Instead, let the language evolve as the knowledge does. I'm sure all practicing astronomers interested in galaxy scale structures share a roughly isomorphic understanding of what a galaxy is, and would agree about how to classify a pretty large subset of galaxy-ish objects. The interesting stuff - the things they actively research - will often be in the grey areas, anyway, defying classification. Don't worry about it, just go write the paper describing 'weird new not quite galaxy thing I found', and describe it as best as possible. The knowledge base will grow, the language will evolve right along with it, and we won't have to undo some silly bit of formalism the next time someone finds something that defies description.

Comment Machine Vision is the only 'AI' here... (Score 1) 206 206

[quote]Have you ever had the feeling that AI is getting just a little bit too commonplace?[/quote]

I wrote a 'human' version of a sudoku solver on vacation a couple of years ago - on the flight between two Hawaiian islands. It would have been easier/shorter to write the recursive solver that will solve any sudoku board, but I wanted to write code that works the puzzle similar to how I do it by hand. There wasn't much there deserving of being called 'AI'.
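For reference, the recursive solver really is short. Below is a minimal sketch of a plain backtracking solver (not the 'human-style' code described above); the function names and the 9x9 list-of-lists board with 0 for blanks are assumptions made for this example.

```python
def find_empty(board):
    """Return (row, col) of the first blank cell (0), or None if the board is full."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                return r, c
    return None


def is_valid(board, r, c, digit):
    """Check that `digit` does not already appear in the row, column, or 3x3 box."""
    if digit in board[r]:
        return False
    if any(board[i][c] == digit for i in range(9)):
        return False
    box_r, box_c = 3 * (r // 3), 3 * (c // 3)
    for i in range(box_r, box_r + 3):
        for j in range(box_c, box_c + 3):
            if board[i][j] == digit:
                return False
    return True


def solve(board):
    """Fill `board` in place by backtracking; return True if a solution was found."""
    cell = find_empty(board)
    if cell is None:
        return True  # no blanks left, so the board is solved
    r, c = cell
    for digit in range(1, 10):
        if is_valid(board, r, c, digit):
            board[r][c] = digit
            if solve(board):
                return True
            board[r][c] = 0  # undo the guess and try the next digit
    return False


puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

if solve(puzzle):
    for row in puzzle:
        print(" ".join(str(d) for d in row))
```

It is brute force with pruning and nothing more, which is rather the point.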

The only thing vaguely AI about this is the 'Machine Vision' needed to recognize a sudoku board and OCR the numbers. I'm no AI expert, but since the board is a well-defined grid and the numbers are printed in a clean, bold, large computer font, it seems to me that most machine vision researchers could hack together the code to do this in an afternoon, made mostly from pre-existing code they had lying around anyway.

Don't get me wrong - I love my Android phone, I'm drooling at the prospect of upgrading to a faster one in the next year or so, Google Goggles is a sweet app (it impressed the hell out of my non-technical, non-gadget-loving sister, and that is hard to do), etc, etc.

But this isn't AI - this is sophomore algorithms homework, graduate-level (but not at all new) machine vision, and decent camera phone hardware. This isn't revolutionary, and it almost doesn't qualify for evolutionary. It comes closer to being inevitable. But think of how much more productive we can all be now that computers will solve all those pesky sudokus for us!

Comment Re:What's next? (Score 1) 611 611

Typical, short-sighted, pure-expectation value reasoning, that only applies if the value of a dollar is absolute.

The only way to have any chance to win is not to play much.

I play the lottery - just very infrequently. Let's say I play 20 times, total, over the course of my whole life, paying just one dollar for one set of numbers each time. My 'expected winnings' is a bit less than $10, but my most likely outcome is $0 for me, a $10 donation to support open space/wildlife/etc and a $10 donation to J. Random Lotto Player, based on the skewed payout schedule. That $20 I'm out represents zero risk to my financial well being - I've lost more than that in random pockets of an old jacket, and never missed it. In return, I have a non-zero chance of winning hundreds of millions of dollars. That kind of money would be a life changer.

I know exactly what the risks and rewards are, and exactly how low the probabilities are of me winning any real money. I know that because every lotto drawing is an independent event, my odds don't go up by playing more than that.

If you never play, you never win. If you play just a little, there is a very small chance you might win a crap-ton of money. If you play a lot, you'll definitely pay a lot, and there is essentially that same very small chance you might win a crap-ton of money. One dollar every now and then for a shot at that much money is totally worth it to me. People that think they are actually going to win tend to pay $5 a week until they die, but their odds of winning are only a hair better than mine - those are the stupid people that make it worth my time to play one set of numbers every couple of years.
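To put rough numbers on that, here is a minimal sketch of the arithmetic. The jackpot odds of 1 in 175 million, the $1 ticket, and the 50% payout fraction are illustrative assumptions (the payout fraction is just read off the 'a bit less than $10 back on $20' figure above), not the rules of any particular lottery.

```python
import math


def p_at_least_one_win(plays, p_single):
    """Chance of at least one jackpot in `plays` independent drawings."""
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny probabilities keep precision.
    return -math.expm1(plays * math.log1p(-p_single))


def expected_return(plays, ticket_price=1.0, payout_fraction=0.5):
    """Average amount paid back over `plays` tickets (assumed payout fraction)."""
    return plays * ticket_price * payout_fraction


JACKPOT_ODDS = 1 / 175_000_000  # hypothetical single-ticket jackpot probability

scenarios = [
    ("20 tickets in a lifetime", 20),
    ("$5 a week for 50 years", 5 * 52 * 50),
]
for label, plays in scenarios:
    p = p_at_least_one_win(plays, JACKPOT_ODDS)
    print(f"{label}: cost ${plays}, expected back ${expected_return(plays):.0f}, "
          f"P(jackpot) = {p:.3g}")
```

Under these assumed odds the heavy player's chance is about 650 times larger in relative terms, but it still stays below one in ten thousand, while the cost goes from $20 to $13,000.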

Your 'don't play at all' argument does apply to slot machines, though - payouts are too small, time invested is too large, and slot machines aren't actually any fun.

Comment Be like Google... (Score 1) 235 235

"Search, don't sort".

The size and complexity of your data management should match the size and complexity of your data set. If you have thousands of datasets, give serious consideration to a relational database. Store all of your metadata (pH, date, etc) in the database so you can query it easily. If your raw data lives in a text-based format, put it in the database too, otherwise just store the path to your file in the database and keep your files in some sort of simple date-based archive or whatever.

Now, you can start to search through the data by thinking about which sets of data to compare. Much easier.

This is very general advice - if you have one experimenter and a couple of experiments, just use a lab notebook. If you have a handful of experimenters and ~100 experiments, try a spreadsheet or well organized structure on disk. If you have many people involved, or thousands of experiments, or both, you need something to help manage all of that in a way that lets you think in terms of sets rather than individual data files. Otherwise, you'll find yourself wearing your 'data steward' hat way too often, and not wearing your 'experimentalist' or 'analyst' hats much at all.
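A minimal sketch of the 'metadata in the database, files on disk' approach, using Python's built-in sqlite3 module. The table layout, column names, sample rows, and the helper `find_runs` are all hypothetical, chosen only to show the shape of the idea.

```python
import sqlite3

# In-memory database for the sketch; a real setup would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE experiments (
        id           INTEGER PRIMARY KEY,
        run_date     TEXT NOT NULL,   -- ISO date, so text ordering matches date ordering
        ph           REAL,
        experimenter TEXT,
        data_path    TEXT NOT NULL    -- where the raw file lives in the archive
    )
    """
)

# Made-up example rows.
rows = [
    ("2011-03-01", 7.4, "alice", "archive/2011/03/01/run_a.csv"),
    ("2011-03-02", 5.5, "bob", "archive/2011/03/02/run_b.csv"),
    ("2011-03-05", 7.2, "alice", "archive/2011/03/05/run_c.csv"),
]
conn.executemany(
    "INSERT INTO experiments (run_date, ph, experimenter, data_path) "
    "VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()


def find_runs(conn, ph_min, ph_max):
    """Return raw-data paths for every run in a pH window, oldest first."""
    cursor = conn.execute(
        "SELECT data_path FROM experiments "
        "WHERE ph BETWEEN ? AND ? ORDER BY run_date",
        (ph_min, ph_max),
    )
    return [path for (path,) in cursor]


# Ask for a set of runs by property instead of hunting through folders.
print(find_runs(conn, 7.0, 7.5))
```

The query hands back the set of files to compare, which is the 'search, don't sort' part.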

Comment Re:As a flu researcher... (Score 1) 158 158

No question, we thought the mortality rate was high(er) when this started, but quoting 50% is silly - it was endemic in rural Mexico for ~2 months before it was recognized in the US. Their health care isn't great, and certainly some people died that didn't need to, but nothing like 50%.

If it had been that high, the Mexican government would have recognized that they had a problem. As was, it wasn't until there were two cases in the US that were identified and characterized that Mexico was forced to acknowledge that they had a problem, and by that point, the cat was out of the bag. The CDC had done a bunch of pandemic planning, and had to throw all of it out, because they never accounted for the idea that flu could get into humans and have two months to spread before they would find out about it.

You are right, though - we dodged a bullet. That was about the best possible scenario for a flu pandemic in terms of number of people killed. The media screaming pandemic, the increased mortality in children and pregnant women, and health care workers who refused to be vaccinated were less than ideal, but not all that many people died relative to what it could have been.
