Here's the lowdown on how BGZF works, as one example. In this case, many short distinct fragments of DNA are being stored together, each with offset and quality information, and many of them may be identical. The compression is localized to smaller blocks (I'm not sure if they're 4096-byte disk sectors or something else). You're right that there's probably some performance lost to the misalignment, but 6 and 8 line up every 24 bits, so at worst that means patterns of four codons or three bytes; and a step of four amino acids is ideal for alpha-helix motifs, so it's not a total loss.
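The realignment arithmetic above is just a least-common-multiple calculation; a quick sketch (the constant names are mine, not from any spec):

```python
from math import lcm

# A codon is 3 nucleotides; at 2 bits per base that's 6 bits per codon,
# packed into a stream of ordinary 8-bit bytes.
BITS_PER_CODON = 6
BITS_PER_BYTE = 8

# The packed stream realigns wherever both units complete a whole number
# of repetitions, i.e. at the least common multiple of the two widths.
realign = lcm(BITS_PER_CODON, BITS_PER_BYTE)   # 24 bits
codons_per_cycle = realign // BITS_PER_CODON   # 4 codons
bytes_per_cycle = realign // BITS_PER_BYTE     # 3 bytes

print(realign, codons_per_cycle, bytes_per_cycle)  # 24 4 3
```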
And, yes, regarding individual genomes: I'm pretty sure that'd be all anyone stored if they didn't have to hold onto the FASTQ files for auditability.
It's a neat thought, but it would never beat the basics. While there are a lot of genes that share common ancestors (called paralogues), the hierarchical history of these genes is often hard to determine, or pre-dates human speciation entirely; for example, there's only one species (a weird blob a little like a multicellular amoeba) that has a single homeobox gene.
While building a complete evolutionary history of gene families is of great interest to science, it's pointless to try exploiting it for compression when we can just turn to standard string methods; as has been mentioned elsewhere on this story, gzip can be faster than the read/write buffer on standard hard drives. Having to replay an evolutionary history we can only guess at would be a royal pain.
That being said, we can store individuals' genomes as something akin to diff patches, which brings 3.1 gigabytes of raw ASCII down to about 4 MB of high-entropy data, even before compression.
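The diff-patch idea can be sketched in a few lines. This is a toy illustration, not how real pipelines do it (they use VCF files and handle insertions and deletions, which this ignores); the function names are mine:

```python
# Hypothetical sketch: represent an individual's genome as a patch of
# single-base differences against a shared reference sequence.

def make_patch(reference, individual):
    """List of (position, reference_base, individual_base) differences."""
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, individual))
            if r != s]

def apply_patch(reference, patch):
    """Reconstruct the individual's sequence from reference + patch."""
    seq = list(reference)
    for pos, ref_base, alt_base in patch:
        assert seq[pos] == ref_base, "patch does not match this reference"
        seq[pos] = alt_base
    return "".join(seq)

ref = "ACGTACGTAC"
ind = "ACGTTCGTAA"
patch = make_patch(ref, ind)
print(patch)                           # [(4, 'A', 'T'), (9, 'C', 'A')]
print(apply_patch(ref, patch) == ind)  # True
```

Since any two people differ at only a few million positions out of ~3.1 billion, the patch is tiny relative to the full sequence, which is where the gigabytes-to-megabytes reduction comes from.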
Highlights from the bottom of the PayPal Galactic page:
@Stratocumulus: RT @lbillin: #paypalgalactic Incur debt in space! Paypal wants to help http://t.co/cqVsVyCy0B
@JodyYeoh: I visited space and all I got was a probe. #PayPalGalactic
Well, if you really need to have that kind of contest...
The data files being discussed are text files generated as summaries of the raw sensor data from the sequencing machine. In the case of Illumina systems, the raw data consists of a huge high-resolution image; different colours in the image are interpreted as different nucleotides, and each bright spot is interpreted as the location of a short fragment of DNA. (Think embarrassingly parallel multithreading.)
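The summary text format those images get boiled down to is FASTQ: four lines per read, with per-base quality scores encoded as printable ASCII. A minimal parser, assuming the usual Phred+33 convention:

```python
# Minimal sketch of the four-line FASTQ record format: header line,
# sequence, '+' separator, and per-base quality scores encoded as
# ASCII characters (Phred+33, the usual Illumina convention today).

def parse_fastq(lines):
    """Yield (read_id, sequence, quality_scores) from FASTQ text lines."""
    it = iter(lines)
    for header in it:
        seq = next(it)
        next(it)   # '+' separator line, ignored here
        qual = next(it)
        # Phred+33: ASCII code minus 33 gives the quality score.
        scores = [ord(c) - 33 for c in qual]
        yield header[1:], seq, scores

record = ["@read1", "ACGT", "+", "IIII"]
rid, seq, scores = next(parse_fastq(record))
print(rid, seq, scores)  # read1 ACGT [40, 40, 40, 40]
```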
If we were to keep and store all of this raw data, the storage requirements would probably be a thousand to a million times what they currently are—to say nothing of the other kinds of biological data that's captured on a regular basis, like raw microarray images.
CNVs actually can be detected if you have enough read depth; it's just that most assemblers are too stupid (or, in computer science terms, "algorithmically beautiful") to account for them. SAMtools can generate a coverage/pileup graph without too much hassle, and it should be obvious where significant differences in copy number occur.
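The idea behind spotting copy-number differences in a coverage track is simple: flag stretches whose read depth sits far from the genome-wide baseline. A toy sketch (the threshold, the function name, and the depth array are all made up for illustration; in practice you'd feed in real per-position depths from a pileup):

```python
from statistics import median

def flag_cnv_regions(depths, factor=1.5):
    """Return (start, end) index ranges where depth deviates from the
    median by at least `factor` in either direction (duplication or loss)."""
    base = median(depths)
    regions, start = [], None
    for i, d in enumerate(depths):
        outlier = d >= base * factor or d <= base / factor
        if outlier and start is None:
            start = i
        elif not outlier and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depths)))
    return regions

# Baseline depth ~30; a doubled region and a half-depth region stand out.
depths = [30, 31, 29, 60, 62, 61, 30, 29, 10, 11, 30]
print(flag_cnv_regions(depths))  # [(3, 6), (8, 10)]
```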
(Also, the human genome is about 3.1 gigabases, so about 3.1 GB in FASTA format at one byte per base. De novo assemblies will tend to be smaller because they can't deal with duplications.)
I can't comment on the physics data, but in the case of the bio data that the article discusses, we honestly have no idea what to do with it. Most sequencing projects collect an enormous amount of useless information, a little like saving an image of your hard drive every time you screw up GRUB's menu.lst. We keep it around on the off chance that some of it might be useful in some other way eventually, although there are ongoing concerns that much of the data just won't be high enough quality for some applications.
On the other hand, a lot of the specialised datasets (like the ones being stored in the article) are meant as baselines, so researchers studying specific problems or populations don't have to go out and get their own information. Researchers working with such data usually have access to various clusters or supercomputers through their institutions; for example, my university gives me access to SciNet. There's still competition for access when someone wants to run a really big job, but there are practical alternatives in many cases (such as GPGPU computing).
Also, I'm pretty sure the Utah data centre is kept pretty busy with its NSA business.
This happens occasionally in animal breeding. Blue eyes in a white-furred cat are a strong indicator of deafness.
That being said, the definition of "proper" biochemical function is relative, so you can't really say that a developmental gene that produces healthy results is malfunctioning. A lot of subtle differences between people are caused by changes in how long or how tightly two proteins interact. You could call the European light-skin phenotype evidence of a defective gene, because it's defined by a shortage of melanosomes, which protect the body from UV light. (On the other hand, it improves vitamin D production, which requires UV light.)
There are even plenty of cases in the human body where healthy behaviour depends on what should be, by all rights, improper gene function: the cervical plug is made up largely of malformed virus particles (just the shells) which our ancestors commandeered millions of years ago. Without this strange adaptation, most pregnancies would fail. The attached placenta also owes its heritage to viral genes; without it, newborn human babies wouldn't be much larger than newborn rats.