mikewas - Slashdot User

Submission: How Perl Saved the Human Genome Project (dobbscodetalk.com)

Submitted by viyh on Saturday May 30, 2009 @11:49PM

viyh writes: "The Human Genome Project was inaugurated at the beginning of the 1990s as an ambitious international effort to determine the complete DNA sequence of human beings and several experimental animals. The justification for this undertaking is both scientific and medical. By understanding the genetic makeup of an organism in excruciating detail, we hope to better understand how organisms develop from single eggs into complex multicellular beings, how food is metabolized and transformed into the constituents of the body, and how the nervous system assembles itself into a smoothly functioning ensemble. From the medical point of view, the wealth of knowledge that will come from knowing the complete DNA sequence will greatly accelerate the process of finding the causes of, and potential cures for, human diseases.

From the beginning, researchers realized that informatics would have to play a large role in the genome project. An informatics core formed an integral part of every genome center that was created. The mission of this core was twofold: to provide computer support and database services for their affiliated laboratories, and to develop data analysis and management software for use by the genome community as a whole.

Consider the steps that may be performed on a bit of newly sequenced DNA. First, there's a basic quality check on the sequence: Is it long enough, and is the number of ambiguous letters below the allowed maximum? Then, there's the "vector check." For technical reasons, the human DNA must be passed through a bacterium before it can be sequenced (this is the process of "cloning"). Not infrequently, the human DNA gets lost somewhere in the process, and the sequence that's read consists entirely of the bacterial vector. The vector check ensures that only human DNA gets into the database.
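The first two checks can be sketched roughly as follows. This is a hypothetical Python illustration, not the genome centers' Perl code. The thresholds `MIN_LENGTH` and `MAX_AMBIGUOUS`, the k-mer size, and the 90% cutoff are made-up values. Real vector screening used sequence alignment, and the shared k-mer test here is a deliberately simplified stand-in.

```python
"""Sketch of the first two checks on a new read: basic quality and vector screening.

All thresholds are hypothetical illustration values, and the k-mer overlap test
is a simplified stand-in for alignment-based vector screening.
"""

MIN_LENGTH = 50     # hypothetical minimum usable read length
MAX_AMBIGUOUS = 5   # hypothetical cap on ambiguous (non-ACGT) letters such as 'N'


def passes_quality(seq: str) -> bool:
    """Is the read long enough, with few enough ambiguous letters?"""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return len(seq) >= MIN_LENGTH and ambiguous <= MAX_AMBIGUOUS


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def is_vector_only(seq: str, vector: str, k: int = 12, cutoff: float = 0.9) -> bool:
    """Flag reads whose k-mers come almost entirely from the cloning vector."""
    read_kmers = kmers(seq.upper(), k)
    if not read_kmers:  # read shorter than k: nothing to compare
        return False
    shared = read_kmers & kmers(vector.upper(), k)
    return len(shared) / len(read_kmers) >= cutoff
```

A read is kept only if `passes_quality(read)` is true and `is_vector_only(read, vector)` is false.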

Next, there's a check for repetitive sequences. Human DNA is full of repetitive elements that make fitting the sequencing jigsaw puzzle together challenging. The repetitive-sequence check tries to match the new sequence against a library of known repetitive elements. The penultimate step is to attempt to match the new sequence against other sequences in a large community database of DNA sequences. Often, a match at this point will provide a clue to the function of the new DNA sequence. After performing all these checks, the sequence (along with the information that's been gathered about it along the way) is loaded into the local laboratory database.

The process of passing a DNA sequence through these independent, analytic steps looks like a pipeline, and we realized that a UNIX pipe could handle the job. We developed a simple Perl-based data-exchange format called "boulderio" that allowed loosely coupled programs to add information to a pipe-based I/O stream. Boulderio is based on tag/value pairs. A Perl module makes it easy for programs to reach into the input stream, pull out only the tags they are interested in, do something with them, and drop new tags into the output stream. Tags that a program isn't interested in are simply passed through to standard output so that other programs in the pipeline can get to them."
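A single boulderio-style filter stage might look roughly like the following. This is a hypothetical Python sketch; the real tooling was a Perl module. It assumes each record is a series of `TAG=value` lines ending with a line containing only `=`. The tag names `SEQUENCE` and `SEQUENCE_LENGTH` are made up for illustration. The stage reads the one tag it needs, appends a new tag, and passes every other tag through unchanged.

```python
"""One stage of a hypothetical boulderio-style pipeline, written in Python.

Assumed record layout: TAG=value lines, each record ended by a line holding
only '='. SEQUENCE and SEQUENCE_LENGTH are illustrative tag names.
"""
import io
import sys
from typing import Iterable, Iterator, TextIO

Record = list[tuple[str, str]]


def read_records(stream: Iterable[str]) -> Iterator[Record]:
    """Yield records as ordered lists of (tag, value) pairs."""
    record: Record = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "=":               # end-of-record marker
            yield record
            record = []
        elif "=" in line:
            tag, value = line.split("=", 1)  # values may themselves contain '='
            record.append((tag, value))
    if record:                        # tolerate a missing final terminator
        yield record


def write_record(record: Record, out: TextIO) -> None:
    for tag, value in record:
        out.write(f"{tag}={value}\n")
    out.write("=\n")


def add_length_tag(record: Record) -> Record:
    """Look only at SEQUENCE; every other tag passes through untouched."""
    result = list(record)
    for tag, value in record:
        if tag == "SEQUENCE":
            result.append(("SEQUENCE_LENGTH", str(len(value))))
    return result


def run_filter(inp: Iterable[str], out: TextIO) -> None:
    for record in read_records(inp):
        write_record(add_length_tag(record), out)


def filter_text(text: str) -> str:
    """Convenience wrapper: run the filter over a string and return its output."""
    out = io.StringIO()
    run_filter(io.StringIO(text), out)
    return out.getvalue()


if __name__ == "__main__":
    run_filter(sys.stdin, sys.stdout)
```

Because each stage only adds tags and forwards the rest, stages can be chained with ordinary pipes, e.g. `quality_check | vector_check | add_length.py`. The stage names there are hypothetical.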

Comment Re:Work Experience (Score 1) 834

by mikewas on Monday May 11, 2009 @06:56PM (#27914891) Attached to: Go For a Masters, Or Not?

The typical working level at my location (primarily EE & CS, some other engineering & science) is a BS plus two master's degrees. People are generally hired with a BS, then earn an MS in a technical field and an MBA later in their careers.

Comment Already in Toronto -- really bad for travellers (Score 2, Interesting) 585

by mikewas on Saturday February 14, 2009 @01:31PM (#26856767) Attached to: Automation May Make Toll Roads More Common

I ran into this system in Toronto a few years ago.

There's no way to pay manually. Toll sections aren't well marked. The cost isn't clearly defined and changes with the time of day and/or traffic density. So when you return the rental car, there's no way to determine the toll charges.

Months after the trip I got a bill from the car rental agency: cost of tolls + several taxes + surcharge by the car rental agency + a billing fee.

Can you tell I'm not a fan of this technology?! The costs added by the rental agency were more than twice the tolls themselves.

Comment Worked 9/80 15 years ago, still miss it! (Score 1) 1055

by mikewas on Tuesday January 13, 2009 @10:22PM (#26442999) Attached to: How Does a 9/80 Work Schedule Work Out?

It's been almost 15 years and I still miss it; it was great! We worked 9-hour days Monday through Thursday and either 0 or 8 hours on Friday, for 80 hours every two weeks.

The company originally instituted it in California to meet a mandate to reduce pollution by 20%. They shut down one workday in ten and took credit for a 10% reduction in employee commuting. It was popular enough that they spread it to other sites.

One side effect was that the work week started and stopped at noon Friday. Part of the plant was unionized, and union rules said that anything over 40 hours in a week was paid at overtime rates, and that if the company scheduled you for fewer than 40 hours in a week you still got paid for 40. Nothing in the contract specified when the week started, so splitting the working Friday at noon put exactly 40 hours in each week. So timecards were due at noon Friday.
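The arithmetic behind the noon-Friday boundary can be checked with a short calculation. It is a hypothetical illustration and assumes the 8-hour Friday shift is split evenly, 4 hours on each side of noon. Counted in calendar (Monday to Friday) weeks, the cycle is 44 + 36 hours, which would trigger 4 hours of overtime. The 36-hour week would also still have to be paid as 40. Counted from noon Friday to noon Friday, both weeks come out to exactly 40.

```python
"""Weekly hours of a 9/80 schedule under two week boundaries.

Assumes (for illustration) that each Friday shift is split evenly around noon.
"""

# Hours worked Monday..Friday over one repeating two-week cycle.
CYCLE = [
    [9, 9, 9, 9, 8],  # week with a working Friday
    [9, 9, 9, 9, 0],  # week with Friday off
]


def calendar_weeks(cycle):
    """Hours per Monday-to-Friday week."""
    return [sum(week) for week in cycle]


def noon_friday_weeks(cycle):
    """Hours per week running from noon Friday to noon Friday.

    The cycle repeats, so week 0 picks up the afternoon half of the last
    week's Friday (index -1 wraps around to the end of the cycle).
    """
    weeks = []
    for i, week in enumerate(cycle):
        previous_friday = cycle[i - 1][4]
        weeks.append(previous_friday / 2 + sum(week[:4]) + week[4] / 2)
    return weeks


def overtime(weekly_hours, limit=40):
    """Hours above the weekly limit, which union rules paid at overtime rates."""
    return sum(max(0, h - limit) for h in weekly_hours)


print(calendar_weeks(CYCLE), overtime(calendar_weeks(CYCLE)))
print(noon_friday_weeks(CYCLE), overtime(noon_friday_weeks(CYCLE)))
```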

Comment Re:finally! (Score 5, Insightful) 369

by mikewas on Thursday January 01, 2009 @03:05PM (#26292205) Attached to: Security Checkpoints Predict What You Will Do

No, we'll only know what they think they want.
