
Comment Re:the way I see it (Score 4, Informative) 533

If you look at the laws themselves it's a bit weird; 18 USC sec. 2332a seems to introduce the term "weapons of mass destruction" for the sole purpose of re-naming a definition provided in 18 USC sec. 921 called "destructive device," which dates to 1934 at the latest. I'm not savvy enough to figure out when the "WMD" terminology was introduced, but it's at least older than 1996 and seems to serve no purpose other than sounding grandiose.

Comment Re:Good. (Score 1) 433

Of course, that also requires that the patient is unaware of it (unlikely), or that they cannot speak (in which case there's probably a caregiver who does...) I suppose it could happen, but the combination of circumstances is pretty improbable. Antibiotic allergies tend to manifest symptoms that are fairly non-life-threatening and limited by the dose size; they can be tested for in advance if the medical practitioner has cause for concern, and they tend to go away as you grow up. That being said, there can be other fairly serious drug allergies, so the point's not moot.

Comment Re:At least they're not rolling their own. (Score 1) 138

Here's the lowdown on how BGZF works, as one example. In this case, there are many short, distinct sequences of DNA being stored together, each with offset and quality information, many of which may be identical. The compression is localized to smaller blocks (I'm not sure if they're 4096-byte disk sectors or something else.) You're right that there's probably some performance lost due to the misalignment, but 6 and 8 line up every 24 bits, so at worst that means patterns of four codons or three bytes. A step of four amino acids is also ideal for alpha helix motifs, so it's not all a loss.
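The 24-bit figure is easy to check. A minimal sketch, assuming each nucleotide is packed into 2 bits so that a codon takes 6 bits (that packing is my assumption for the arithmetic, not a description of what the compressed format actually does):

```python
from math import gcd

BITS_PER_BASE = 2      # assumption: A, C, G, T packed into 2 bits each
BASES_PER_CODON = 3
BITS_PER_BYTE = 8


def realignment_period(codon_bits: int, byte_bits: int) -> int:
    """Smallest number of bits at which codon and byte boundaries coincide (the LCM)."""
    return codon_bits * byte_bits // gcd(codon_bits, byte_bits)


codon_bits = BITS_PER_BASE * BASES_PER_CODON             # 6 bits per codon
period = realignment_period(codon_bits, BITS_PER_BYTE)   # least common multiple of 6 and 8
codons_per_period = period // codon_bits
bytes_per_period = period // BITS_PER_BYTE

print(period, codons_per_period, bytes_per_period)  # 24 4 3
```

So a byte-oriented compressor sees codon boundaries repeat every three bytes, which is the "four codons or three bytes" pattern.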

And, yes, regarding individual genomes: I'm pretty sure that'd be all anyone stored if they didn't have to hold onto the FASTQ files for auditability.

Comment Re:At least they're not rolling their own. (Score 2) 138

It's a neat thought, but it would never beat the basics. While there are a lot of genes that share a common ancestor through duplication (called paralogues), the hierarchical history of these genes is often hard to determine or something that pre-dates human speciation; for example, there's only one species (a weird blob a little like a multi-cellular amoeba) that has a single homeobox gene.

While building a complete evolutionary history of gene families is of great interest to science, it's pointless to try exploiting it for compression when we can just turn to standard string methods; as has been mentioned elsewhere on this story, gzip can be faster than the read/write buffer on standard hard drives. Having to replay an evolutionary history we can only guess at would be a royal pain.

That being said, we can store individuals' genomes as something akin to diff patches, which brings 3.1 gigabytes of raw ASCII down to about 4 MB of high-entropy data, even before compression.
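To make the diff-patch idea concrete, here is a minimal sketch. It assumes the individual's sequence has the same length as the reference and differs only by substitutions (no insertions or deletions), which real genomes do not respect; the function names and the tuple layout are made up for illustration.

```python
from typing import List, Tuple

# (0-based position, reference base, individual's base); a hypothetical layout
Variant = Tuple[int, str, str]


def diff_against_reference(reference: str, individual: str) -> List[Variant]:
    """Keep only the positions where the individual differs from the reference."""
    if len(reference) != len(individual):
        raise ValueError("this sketch only handles substitutions, not indels")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, individual))
        if ref_base != alt_base
    ]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Rebuild the individual's sequence from the reference plus the stored differences."""
    bases = list(reference)
    for pos, ref_base, alt_base in variants:
        if bases[pos] != ref_base:
            raise ValueError(f"variant at {pos} does not match the reference")
        bases[pos] = alt_base
    return "".join(bases)


reference = "ACGTACGTAC"
individual = "ACGTTCGTAA"
variants = diff_against_reference(reference, individual)
print(variants)  # [(4, 'A', 'T'), (9, 'C', 'A')]
assert apply_variants(reference, variants) == individual
```

Only the list of differences needs to be stored per person; the reference is shared by everyone.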

Comment Re:To put this into perspective (Score 1) 138

Well, if you really need to have that kind of contest...

The data files being discussed are text files generated as summaries of the raw sensor data from the sequencing machine. In the case of Illumina systems, the raw data consists of a huge high-resolution image; different colours in the image are interpreted as different nucleotides, and each pixel is interpreted as the location of a short fragment of DNA. (Think embarrassingly parallel multithreading.)

If we were to keep and store all of this raw data, the storage requirements would probably be a thousand to a million times what they currently are, to say nothing of the other kinds of biological data that are captured on a regular basis, like raw microarray images.

Comment Re:Oddly... I have a clue about this stuff lately (Score 1) 138

CNVs actually can be detected if you have enough read depth; it's just that most assemblers are too stupid (or, in computer science terms, "algorithmically beautiful") to account for them. SAMtools can generate a coverage/pileup graph without too much hassle, and it should be obvious where significant differences in copy number occur.
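As a rough illustration of reading copy number off coverage, here is a minimal sketch. It assumes you already have a per-position read depth as a plain list of integers (for example, parsed from a depth or pileup report); the window size and the gain/loss thresholds are arbitrary values picked for the example, not anything a real CNV caller uses.

```python
from statistics import mean, median
from typing import List, Tuple


def flag_copy_number_windows(
    depths: List[int],
    window: int = 5,
    gain: float = 1.5,
    loss: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (window start, depth ratio) for windows whose mean depth is far from the median."""
    window_means = [
        mean(depths[start:start + window])
        for start in range(0, len(depths) - window + 1, window)
    ]
    baseline = median(window_means)
    if baseline == 0:
        raise ValueError("no usable coverage to compare against")
    flagged = []
    for index, window_mean in enumerate(window_means):
        ratio = window_mean / baseline
        if ratio >= gain or ratio <= loss:
            flagged.append((index * window, ratio))
    return flagged


# Toy data: roughly 30x coverage, one duplicated stretch and one deleted stretch.
depths = [30] * 10 + [60] * 5 + [30] * 10 + [0] * 5 + [30] * 10
print(flag_copy_number_windows(depths))  # [(10, 2.0), (25, 0.0)]
```

A ratio near 2 suggests a duplication relative to the typical window, and a ratio near 0 suggests a deletion. Real data is much noisier than this.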

(Also, the human genome is about 3.1 gigabases, so about 3.1 GB in FASTA format. De novo assemblies will tend to be smaller because they can't deal with duplications.)

Comment Re:At least they're not rolling their own. (Score 4, Informative) 138

I can't comment on the physics data, but in the case of the bio data that the article discusses, we honestly have no idea what to do with it. Most sequencing projects collect an enormous amount of useless information, a little like saving an image of your hard drive every time you screw up grub's menu.lst. We keep it around on the off chance that some of it might be useful in some other way eventually, although there are ongoing concerns that much of the data just won't be of high enough quality for some of those later uses.

On the other hand, a lot of the specialised datasets (like the ones being stored in the article) are meant as baselines, so researchers studying specific problems or populations don't have to go out and get their own information. Researchers working with such data usually have access to various clusters or supercomputers through their institutions; for example, my university gives me access to SciNet. There's still vying for access when someone wants to run a really big job, but there are practical alternatives in many cases (such as GPGPU computing.)

Also, I'm pretty sure the Utah data centre is kept pretty busy with its NSA business.

Comment Re:Good. (Score 1) 433

That's a good counterpoint, although, just to play devil's advocate, it could be argued that in that case the blame belongs to the medical practitioner who assumed there was nothing wrong with you and didn't re-run all the relevant tests. Medical histories are more or less context-free these days; if something's still relevant, it can be re-discovered. Except perhaps mental illness, since we don't have the same quality of diagnostic tools for psychological profiling.

Comment Re:Phenotipyc variance (Score 1) 204

This happens occasionally in animal breeding. Blue eyes in a white-furred cat have a high chance of indicating deafness.

That being said, however, the definition of "proper" biochemical function is relative, so you can't really say that a developmental gene that produces healthy results is really malfunctioning. A lot of subtle differences between people are caused by changes in how long or how tightly two proteins interact. You could call the European light skin phenotype evidence of a defective gene, because it's defined by a shortage of melanosomes, which protect the body from UV light. (On the other hand, it improves vitamin D production, which requires UV light.)

There are even plenty of cases in the human body where healthy behaviour depends on what should be, by all rights, improper gene function: the cervical plug is made up largely of malformed virus particles (just the shells) which our ancestors commandeered millions of years ago. Without this strange adaptation, most pregnancies would fail. The attached placenta also owes its heritage to viral genes; without it, newborn human babies wouldn't be much larger than newborn rats.
