Here are the requirements:
2) The setup and maintenance must be performed by people with above-average intelligence but not much computing know-how.
2) Because of #1, it should be commercially available. A homebrew setup is unlikely to fly in this environment.
3) The system must support several terabytes of data.
4) The system must be backed up routinely. Ideally, we should easily know whether or not the backups really happened.
5) The cost should be less than $1000.
For a small, tech-novice environment that needs both storage and backup, what is a good solution?"
So I work in biological sciences, and I have the special privilege of having the guy who sequenced the first cancer genome working down the hall from me (he's also on my thesis committee).
There is now technology to sequence entire genomes very quickly using massively parallel sequencing. Ideally, if you were sequencing a tumor from a single person, you would get tissue from the tumor and also from the non-tumor (usually skin) and sequence them at the same time. Then you compare the two to distinguish what is simply variation in each person's genetics and what is acquired by the tumor. In my opinion, that's the best way to do things and probably the most informative because you're looking at a tumor in a real person that is subject to all the selective evolutionary pressures that occur in people.
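To make the tumor-versus-normal comparison concrete, here is a minimal sketch in Python. The variant tuples and coordinates are made up for illustration, and real pipelines have to deal with sequencing errors and coverage, which this ignores. The idea is just a set difference: anything present in both samples is inherited variation, and anything found only in the tumor is a candidate acquired mutation.

```python
# Hypothetical variant calls, each written as (chromosome, position, ref_base, alt_base).
# The coordinates are invented for illustration only.
normal_variants = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 5000, "C", "T"),
}
tumor_variants = {
    ("chr1", 1000, "A", "G"),   # also in normal tissue: inherited variation
    ("chr2", 5000, "C", "T"),   # also in normal tissue: inherited variation
    ("chr7", 42000, "G", "A"),  # tumor only: candidate acquired mutation
}


def acquired_mutations(tumor, normal):
    """Return variants seen in the tumor but not in the matched normal tissue."""
    return tumor - normal


somatic = acquired_mutations(tumor_variants, normal_variants)
print(sorted(somatic))
```

Without the matched normal sample, the two inherited variants above would look just like mutations.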
These groups didn't take that approach for reasons unclear to me. Instead, they sequenced cancer cell lines. If you cut out a person's tumor and stick it in a test tube with various growth factors, it will almost certainly die within a week or so. However, you occasionally get some cells that can grow in this situation because they've acquired some mutation that lets them grow in tissue culture. You then expand and passage these cells until they grow rapidly in culture. The problem here is that you're no longer dealing with a normal human tumor; you're selecting for tumor cells that grow in the artificial tissue culture environment. The second problem is that you're not sure what to compare the tumor sequence with. Due to privacy concerns, you almost never know who actually gave the tumor that was made into a cell line (as an aside, look up the HeLa cell line and its sordid history), so you have to compare to the reference sequence from the Human Genome Project. The problem here is that there are differences between people, and you can't tell whether the "mutation" you see is just a normal variation or actually something in the tumor.
These are the important limitations you have to consider when evaluating these papers.
Now, on to your question. They have 30,000 changes in the DNA compared to their reference "normal" genome. Nearly all of those are in "junk" DNA: as far as we know, they don't code for any genes or anything else that regulates genes. Of the ones that are in interesting regions, the vast majority are what are called synonymous mutations, which means the DNA is changed but, due to the way it is interpreted, the protein that it makes is identical (to use a computer analogy, imagine that the opcode for JMP was changed from 01 to 02 but both 01 and 02 are translated by the computer as JMP).
Now, a certain number of mutations aren't like that. They either lead to truncated proteins, alter the amino acid sequence of proteins, alter mRNA splicing, etc. There are also other genetic changes such as duplications where the gene sequence is unchanged but may be copied several times to increase the gene dose. These are really the interesting things because they alter protein function or gene dose. From a brief reading, it looks like there are around 100 of these.
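The difference between a synonymous change and one that alters the protein can be sketched in a few lines of Python. The codon table below is a small excerpt of the standard genetic code, and the `classify` function and its labels are my own illustration, not anything from the papers. It only covers single-codon substitutions and ignores splicing changes and duplications.

```python
# A small excerpt of the standard genetic code (DNA codon -> amino acid).
CODON_TABLE = {
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "GAA": "Glu", "GAG": "Glu",
    "GAT": "Asp", "GAC": "Asp",
    "TGG": "Trp", "TGA": "Stop",
}


def classify(ref_codon, alt_codon):
    """Label a codon change by its effect on the encoded protein."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"         # DNA differs, protein is identical
    if alt_aa == "Stop":
        return "truncating"         # premature stop, shortened protein
    return "amino acid change"      # protein sequence is altered


print(classify("CTT", "CTC"))  # Leu -> Leu
print(classify("GAA", "GAT"))  # Glu -> Asp
print(classify("TGG", "TGA"))  # Trp -> Stop
```

The first case is the JMP example: two different codes, same instruction. The other two are the kind of change that ends up in the short list of interesting mutations.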
Now, it's really difficult to tell whether these mutations are really relevant to cancer progression. Some of them might just happen because tumors mutate really fast, without affecting the cancer progression one way or another; they are so-called "passenger" mutations that just come along for the ride. You can introduce these mutations into cells in the lab to see if they do anything, but the real test is to sequence a bunch of human cancers and see if certain mutations are recurrent. This work is currently underway and will prove very informative about how genetically heterogeneous tumors really are.
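As a rough sketch of what "see if certain mutations are recurrent" means in practice, here is some Python that counts how many tumors each mutated gene shows up in. The tumor names, gene names, and the `min_tumors` cutoff are all placeholders I made up; real studies use proper statistics that account for gene length and background mutation rate.

```python
from collections import Counter

# Hypothetical data: the set of mutated genes found in each sequenced tumor.
# Gene names are placeholders, not results from any paper.
tumors = {
    "tumor_1": {"GENE_A", "GENE_B", "GENE_C"},
    "tumor_2": {"GENE_A", "GENE_D"},
    "tumor_3": {"GENE_A", "GENE_B", "GENE_E"},
    "tumor_4": {"GENE_F"},
}


def recurrent_genes(tumors, min_tumors=2):
    """Return genes mutated in at least `min_tumors` tumors, with their counts."""
    counts = Counter(gene for genes in tumors.values() for gene in genes)
    return {gene: n for gene, n in counts.items() if n >= min_tumors}


recurrent = recurrent_genes(tumors)
print(recurrent)
```

Genes that only ever appear in one tumor are more likely to be passengers; the ones that keep coming back are the ones worth testing in the lab.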
So, in short, there are about 100 haystacks. Further sequencing of other tumors will show if these are relevant to cancer in general. In my opinion, further sequencing will identify very few common mutations, and everyone's cancer will be essentially unique in the mutations it acquires. That will force us to completely rethink how we view cancer on a broader scale: not as a single disease but as a collection of highly related diseases that need to be treated individually.