
Comment Re:Age vs experience... (Score 1) 233

Since this thread is about hiring talented developers for fun projects, I'll throw this out (using the criteria from the parent):

Interesting: genome sequencing (an actual "big data" problem that's not just about selling stuff to people more effectively)
New location: Austin, TX

http://www.lab7.io/jobs/

-Chris

Comment Re:Bioinformatics Bubble? (Score 4, Interesting) 38

I run a bioinformatics software company, have been in the field for over a decade, and have worked in scientific computing even longer.

I'll start with a quick answer to the bubble question: there are already too many 'bioinformatics' grads but there are not enough bioinformatics professionals (and probably never will be). There are many bioinformatics Masters programs out there that spend two years exposing students to bioinformatics toolsets and give them cursory introductions to biology, computer science, and statistics. These students graduate with trade skills that have a short shelf life and lack the proper foundations to gain new skills. In that respect, there's a bubble, unfortunately.

If you're serious about getting into bioinformatics, there are a few good routes to take, all of which will provide you with a solid foundation to have a productive career.

The first thing to decide is what type of career you want. Three common career paths are Researcher, Analyst, and Engineer. The foundational fields for all are Biology, Computer Science (all inclusive through software engineering), and Statistics. Which career path you follow determines the mix...

Researchers have Ph.D.s and tend to pursue academic or government lab careers. Many research paths do lead to industry jobs, but these tend to morph into the analyst or engineer roles (much to the dismay of the researcher, usually). Bioinformatics researchers tend to have Ph.D.s in Biology, Computer Science, Physics, Math, or Statistics. Pursuing a Ph.D. in any of these areas and focusing your research on biologically relevant problems is a good starting point for a research career. However, there are currently more Ph.D.s produced than research jobs available, so after years in school, many bioinformatics-oriented Ph.D.s end up in analysis or engineering jobs. Your day job here is mostly grant writing and running a research lab.

Bioinformatics Analysts (not really a standard term, but a useful distinction) focus on analyzing data for scientists or performing their own analyses. A strong background in statistics is essential (and, unfortunately, often missing) for this role along with a good understanding of biology. Lab skills are not essential here, though familiarity with experimental protocols is. A good way to train for this career path is to get an undergraduate degree in Math, Stats, or Physics. This provides the math background required to excel as an analyst along with exposure to 'hard science'. Along the way, look for courses and research opportunities that involve bioinformatics, or even double major in Biology. Basic software skills are also needed, as most tools are Linux-based command line applications. Your day job here is working on teams to answer key questions from experiments.

Bioinformatics engineers/developers (again, not really a standard term, but bear with me) write the software tools used by analysts and researchers and may perform research themselves. A deep understanding of algorithms and data structures, software engineering, and high performance computing is required to really excel in this field, though good programming skills and a desire to learn the science are enough to get started. The best education for this path is a Computer Science degree with a focus on bioinformatics and scientific computing (many problems that are starting to emerge in bioinformatics have good solutions from other scientific disciplines). Again, aligning additional coursework and undergraduate research with biologists is key to building a foundation. A double major in Biology would be useful, too. To fully round this out, adding a Masters in Statistics would make you a great candidate, as long as your side projects were all biology related. Your day job here is building the tools and infrastructure to make bioinformatics function.

All three career paths can be rewarding and appeal to different mindsets.

If you haven't followed the NPR series on gene sequencing over the last few weeks, it's definitely worth listening to. I also did a talk a few years back at TEDxAustin on the topic that makes the connection between big data and sequencing ( http://bit.ly/mueller-tedxaustin ). Affordable sequencing is changing biology dramatically. Going forward, it will be hard to practice some parts of biology without sequencing, and sequencing needs informatics to function.

Good luck!

-Chris

Comment Agile Manifesto (Score 2) 491

In reading the comments, it's clear that many people don't know the roots of agile software development. In short, agile (note the lower case) development is basically a set of principles laid out by a group of very talented developers in 2001 in their agile manifesto:

http://agilemanifesto.org/

Note that the manifesto makes no mention of Extreme Programming, Scrum, or any of the other capital-A Agile methods. Instead, it focuses on observations about what made their software projects successful. It specifically doesn't prescribe any methodology, but rather encourages communication, iteration, and excellence in design and engineering. The last two points come from this section of the manifesto:

"Continuous attention to technical excellence
and good design enhances agility."

The manifesto very much allows for, and even encourages, design. It also assumes that the practitioners are already experienced developers who know how to design software and know how much design is needed before coding. Unfortunately, most Agile methods traded experience for certified training, and the 'technical excellence' portion was lost.

I've worked with many talented teams and have seen agile work time and again. Of course, all of those projects did have design, documentation, and tests. But, all those artifacts were developed using the same principles in the manifesto.

-Chris

Comment iPhone + BT keyboard + HDMI/DVI adapter (Score 2) 339

I've been using that combo more often for conferences and business meetings. If you want more screen real estate, an iPad or Galaxy tablet would work.

I like the iPhone approach since it limits me to a single device for everything (except coding). Keynote works great for presenting (I usually author in PowerPoint).

-Chris

Comment Re:This is what I like about Microsoft (Score 5, Insightful) 118

The big difference is that Microsoft Research is one of the last large corporate research labs focused on pure research. That is, research done for the sake of the research, not to drive product development. Research done at MSR doesn't have to be product driven (it has to be in the general space of software and computers, but that's about the only requirement). MSR is well funded by Microsoft and an integral part of the company's culture.

Sure, IBM, HP, and Intel all have research labs, but their charters have been re-written over the last ten years to focus more on product-centric research. Most research projects at these companies must start with a business plan that shows how the work will be commercialized within 5 years before being approved. This is not the pure research these labs were once known for.

Google, Facebook, Yahoo, and many other internet companies have some interesting projects (self driving cars, for instance), but these tend to be one-off projects and aren't part of a larger, long lived research organization.

Another interesting aspect of MSR is that they encourage all MS developers to take a stint in the organization, not just specially recruited Ph.D.s. It's not uncommon for someone to go from working on a product for a few years, take some time in MSR, then go back to product work.

I've worked directly with many of the research groups mentioned in this post over the last 20 years. Based on my experiences, MSR is truly the last real corporate research group (in the spirit of 20th century PARC/Watson/et al). The others are just part of the product funnels or whims of the founders.

-Chris

Submission + - Just what is 'Big Data'?

rockmuelle writes: I work in a 'Big Data' space (genome sequencing) and routinely operate on tera-scale data sets in a high-performance computing environment (high-memory (64-200GB) nodes, 10 GigE/IB networks, peta-scale high-performance storage systems). However, the more people I chat with professionally on the topic, the more I realize everyone has a different definition of what constitutes big data and what the best solutions for working with large data are. If you term yourself a 'big data' user, what do you consider 'big data'? Do you measure data in mega-, giga-, tera-, or petabytes? What is a typical data set you work with? What are the main algorithms you use for analysis? What turn-around times are typical for analyses? What infrastructure software do you use? What system architectures work best for your problem (and which have you tried that don't work well)?

Comment Work with your company's legal team (Score 1) 467

I work under a similar, very restrictive IP agreement. I raised the issue of side projects with the corporate lawyer in charge of IP and explained the types of projects I do on the side for fun and profit. While the company does not grant blanket exclusions, they were happy to review them on a project by project basis and grant exceptions.

Their goal was to protect the company's business using standard legal tools. Just like my job requires me to use my skills to the fullest, so does theirs. However, talking through it made it clear that there was no malicious intent.

One important thing to know when doing this: the lawyers represent the company and are ethically bound to put the company's interests first. They won't be able to give you any legal advice. You may want to talk to a lawyer first, just so you have outside counsel.

Also, this is just business for the company. The more you treat it as business (and not good vs. evil), the better chance you'll have of success.

-Chris

Comment TEDx Talk on the Subject (Score 3, Informative) 239

I did a talk on this a few years back at TEDx Austin (shameless self promotion): http://www.youtube.com/watch?v=8C-8j4Zhxlc

I still deal with this on a daily basis and it's a real challenge. Next-generation sequencing instruments are amazing tools and are truly transforming biology. However, the basic science of genomics will always be data intensive. Sequencing depth (the amount of data that needs to be collected) is driven primarily by the fact that genomes are large (E. coli has around 5 M bases in its genome, humans have around 3 billion) and biology is noisy. Genomes must be over-sampled to produce useful results. For example, detecting variants in a genome requires 15-30x coverage. For a human, this equates to 45-90 Gbases of raw sequence data, which is roughly 45-90 GB of stored data for a single experiment.
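
The arithmetic above is simple enough to sketch (illustrative Python; the one-byte-per-base figure is a rough approximation for ASCII formats, not an exact on-disk size):

```python
# Back-of-the-envelope storage estimate for a sequencing experiment.
# Assumes ~1 byte per base on disk, a rough figure for ASCII formats.

def raw_bases(genome_size, coverage):
    """Total bases of raw sequence needed at a given over-sampling depth."""
    return genome_size * coverage

HUMAN_GENOME = 3_000_000_000  # ~3 billion bases

low = raw_bases(HUMAN_GENOME, 15)   # 15x coverage
high = raw_bases(HUMAN_GENOME, 30)  # 30x coverage

print(low / 1e9, "-", high / 1e9, "Gbases")  # 45.0 - 90.0 Gbases
```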

The two solutions mentioned most often in this thread, compression and clouds, are promising but not yet practical in all situations. Compression helps save on storage, but almost every tool works on ASCII data, so there's always a time penalty when accessing the data. The formats of record for genomic sequences are also all ASCII (fasta, and more recently fastq), so it will be a while, if ever, before binary formats become standard.
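
For reference, a fastq record is four ASCII lines per read, which is why the data is bulky but simple to process. A minimal parse (the record contents here are hypothetical):

```python
# One FASTQ record: four ASCII text lines per read (id, sequence,
# separator, per-base quality scores). Everything is plain text, which
# is why it compresses well but tools pay a penalty re-reading ASCII.

record = "@read1\nGATTACA\n+\nIIIIIII\n"

def parse_fastq_record(text):
    """Split a single four-line FASTQ record into (id, sequence, quality)."""
    header, seq, _, qual = text.strip().split("\n")
    return header[1:], seq, qual  # drop the leading '@' from the id

print(parse_fastq_record(record))  # ('read1', 'GATTACA', 'IIIIIII')
```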

The grid/cloud is a promising future solution, but there are still some barriers. Moving a few hundred gigs of data to the cloud is non-trivial over most networks (yes, those lucky enough to have Internet2 connections can do it better, assuming the bio building has a line running to it) and, despite the marketing hype, Amazon does not like it when you send disks. It's also cheaper to host your own hardware if you're generating tens or hundreds of terabytes. 40 TB on Amazon costs roughly $80k a year whereas 40 TB on an HPC storage system is roughly $60k total (assuming you're buying 200+ TB, which is not uncommon). Even adding an admin and using 3 years' depreciation, it's cheaper to have your own storage. The compute needs are rather modest as most sequencing applications are I/O bound - a few high memory (64 GB) nodes are all that's usually needed.
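
The cost comparison works out roughly like this (the per-TB figures are the ones quoted above and will drift over time; the admin cost is a placeholder assumption):

```python
# Rough cloud-vs-owned storage comparison over a 3-year window.
# CLOUD_PER_TB_YEAR and OWNED_TOTAL are the figures from the comment;
# ADMIN_PER_YEAR is an assumed incremental cost for a partial sysadmin.

TB = 40
CLOUD_PER_TB_YEAR = 2_000   # ~$80k/yr for 40 TB on Amazon
OWNED_TOTAL = 60_000        # one-time cost for 40 TB of HPC storage
ADMIN_PER_YEAR = 30_000     # placeholder assumption

years = 3
cloud = TB * CLOUD_PER_TB_YEAR * years
owned = OWNED_TOTAL + ADMIN_PER_YEAR * years

print(f"cloud: ${cloud:,}  owned: ${owned:,}")  # cloud: $240,000  owned: $150,000
```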

Keep in mind, too, that we're asking biologists to do this. Many biologists got into biology because they didn't like math and computers. Prior to next-generation sequencing, most biological computation happened in calculators and lab notebooks.

Needless to say, this is a very fun time to be a computer scientist working in the field.

-Chris

Comment Queasy... (Score 1) 332

I get queasy just thinking of coding on a ship for an hour, let alone a few months or years. Maybe the Caribbean or the Gulf of Mexico might work, but anywhere at sea subject to swells that have had thousands of miles to mature can't be that conducive to coding. And, if you can tolerate it, you'll make more money on an oil rig.

-Chris

Comment Re:Possible use... (Score 2) 412

That's more likely the slice of images taken from the satellite's path. I suspect the satellite imaged that region when the channels/roads/whatever had a layer of water on top and were reflecting the sunlight. If you look at the adjoining tiles, there's still a channel structure, it's just not reflective.

-Chris

Comment Phidgets (Score 2) 147

http://www.phidgets.com/

I've used Phidgets in the past for exactly this application (research into UIs for large data). Lots of premade USB controls are available and it's easy to hook up most analog controls to their I/O boards. I went to the local electronics shop and bought a slew of buttons, knobs, and sliders and had no problem hooking them up with Phidgets.

For programming, I wrapped the C library in Python using SWIG.
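
SWIG generates the wrapper from the C headers; for simple cases the stdlib ctypes module does the same job without a build step. A minimal illustration against libc rather than the Phidgets library (POSIX systems; on Windows the C runtime is located differently):

```python
# Minimal example of calling a C library from Python with ctypes.
# Uses libc's abs() for illustration; wrapping something like the
# Phidgets C library follows the same pattern: load the shared
# library, declare argtypes/restype, then call.

import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-5))  # 5
```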

-Chris

Comment HPC Applications (Score 1) 314

A number of HPC applications funded by NSF/DARPA/DOE grants provide a continuing source of new research while the applications themselves are maintained and improved.

One example is OpenMPI. BLAS/LINPACK/LAPACK are also examples. Some of the C++/Boost libraries are also maintained in academia, such as the Boost Graph Library.

-Chris

Comment DNA, RNA, and Genes (Score 5, Informative) 173

The judge's reasoning in the ruling hinges on the fact that the BRCA1/2 genes do not appear in nature as isolated, unmodified DNA and instead only appear in DNA form as part of a (much) larger chromosome. While technically true, it ignores an important fact of genomics: while the BRCA genes do not appear in vivo as isolated _DNA_, they do appear as isolated _RNA_. The RNA counterpart of the DNA sequence is slightly modified - it is the 'reverse-complement' of the DNA with the T's replaced with U's (for example, AACC - (reverse complement) -> GGTT - (sub U for T) -> GGUU).
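
That transformation is easy to sketch in a few lines of Python:

```python
# DNA -> RNA as described above: reverse-complement the DNA sequence,
# then substitute U for T.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement each base (A<->T, C<->G), then reverse the sequence."""
    return dna.translate(COMPLEMENT)[::-1]

def dna_to_rna(dna):
    """Reverse-complement, then swap T for U."""
    return reverse_complement(dna).replace("T", "U")

print(reverse_complement("AACC"))  # GGTT
print(dna_to_rna("AACC"))          # GGUU
```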

So, in a very perverse way, the judge is correct. The isolated, unmodified DNA does not appear in nature.

There is a natural mechanism for converting RNA back into DNA called reverse transcription (RT). RT-based methods are how we sequence genes. RNA from genes is isolated and converted back into DNA for sequencing. This is a standard lab method and is used for all gene sequencing. (Interestingly, if someone were to find RT at work in a cell converting BRCA genes back to DNA, the patent could be invalidated.)

The gene itself, in RNA form, appears isolated in nature. The RNA sequence cannot be patented. But, sequencing methods all rely on converting RNA back to DNA for sequencing. The sequence is read as DNA. But, that's not really the gene, that's just a modified representation of the gene. The functioning gene is the RNA version, not the DNA copy of it.

What's frustrating is that Myriad is using a technical aspect of how gene/RNA sequencing works to claim a patent on a gene itself.

-Chris

Comment Re:Depends... (Score 1) 244

"BTW, a conference publication isn't considered a "journal" publication, and doesn't confer the same status. Conferences are where the work gets done: people present developing ideas and get feedback on them."

This is true for all fields except computer science. In CS, conference publications form the basis of a publication record and journals tend to be more for 'archival' bodies of work. CS conferences are peer reviewed to the same standards as most journals in other fields.

CS is lacking journals with quick turnaround times and journals that accept incremental work. Articles submitted to CS journals often go through a year or more of reviews and revisions whereas the entire cycle for a conference from submission to publication is typically around 8 months. In CS, incremental work, which forms the bulk of most publishing records, is communicated through conferences, not journals. There is no equivalent of Physical Review Letters in CS.

Of course, this presents challenges and opportunities for CS academics. The main challenge is that few tenure committees (not departmental, but once the tenure case is presented to the larger committee) understand this, and it always takes some explanation when making the case for a strong CS professor whose only pubs are at conferences such as POPL or SIGGRAPH (for non-CS people, publishing in those venues is right up there with Science or Nature in terms of importance). The opportunity for clever tenure-track computer scientists is to partner with a physicist or biologist and rack up a number of co-authorships on papers in incremental journals. Oddly, CS people tend to think that _all_ journals are archival and difficult to publish in, so pubs in journals with collaborators help your CS case.

It's a silly world...

-Chris
