Comment Re:Favorite quote from the article (Score 1) 34
Not that I'm a pro or anything, but junk DNA was anything that didn't encode proteins, right?
No, that's "non-coding DNA". The Ars Technica article has a very nice Venn diagram. In short, we infer that most non-coding DNA is junk DNA because it shows signs of neutral drift (i.e. it doesn't matter to reproductive fitness), but non-coding DNA is different from junk DNA, and regulatory DNA is always non-coding but can be either junk or non-junk.
Some concrete examples (with Venn diagram colors in parens):
- Coding DNA that isn't junk (white): a gene.
- Coding DNA that is junk (blue): an endogenous retrovirus.
- Regulatory non-coding DNA that isn't junk (orange/yellow): a promoter for a gene.
- Regulatory non-coding DNA that is junk (orange/yellow/blue): a promoter for a pseudogene.
- Non-regulatory non-coding DNA that isn't junk (yellow): hmm... an intron, I guess.
- Non-regulatory non-coding DNA that is junk (yellow/blue): the letters "CGG" 30 times in a row on the X chromosome. (See aside below for more info.)
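If it helps, the six cases above boil down to three yes/no questions about a stretch of DNA. Here's a rough Python sketch of that. The function name is made up, and the color mapping is just my reading of the colors listed in parens above, not anything official from the article:

```python
def venn_colors(coding: bool, regulatory: bool, junk: bool) -> str:
    """Return the Venn diagram color(s) for a stretch of DNA.

    Assumed mapping, inferred from the examples in this comment:
    yellow = non-coding, orange = regulatory, blue = junk, white = none of these.
    """
    if coding and regulatory:
        # Per the diagram, regulatory DNA is always non-coding.
        raise ValueError("regulatory DNA is always non-coding")
    colors = []
    if regulatory:
        colors.append("orange")
    if not coding:
        colors.append("yellow")
    if junk:
        colors.append("blue")
    return "/".join(colors) or "white"


# The six examples from the list: (coding, regulatory, junk)
EXAMPLES = {
    "gene": (True, False, False),
    "endogenous retrovirus": (True, False, True),
    "promoter for a gene": (False, True, False),
    "promoter for a pseudogene": (False, True, True),
    "intron": (False, False, False),
    "CGG x 30 repeat": (False, False, True),
}

for name, flags in EXAMPLES.items():
    print(f"{name}: {venn_colors(*flags)}")
```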
(Terminology: a "pseudogene" is a gene damaged so badly by frame shifts or early stop codons that it can't code for protein anymore. Before it breaks, a pseudogene is often a duplicate of some existing gene, which is why breaking it can be fitness-neutral. DNA transposons and sloppy cross-overs in meiosis make gene duplication reasonably common. Gene duplication is important for evolution as well: a duplicated gene is free to mutate in random directions until it stumbles on a new useful function, while the original keeps doing the old job. For instance, the vertebrate blood clotting cascade was clearly formed from several rounds of dupe-then-mutate, and similarly with the huge family of myosin muscle proteins.)
(Terminology: an "intron" is a stretch of DNA that gets snipped out of the resulting RNA before the RNA can code for protein. It's not quite junk: an intron has recognition signals that say "please cut RNA here", and IIRC the intron needs to have roughly the correct length, but most of the intron is arbitrary nonsense. Some genes have alternative splices, where the same gene can code for different proteins by swapping in different coding regions -- "exons" -- like Lego bricks. Alternative splices are important in the immune system, for instance: they're how a B cell switches between the membrane-bound and secreted forms of the same antibody, though antibody diversity itself comes from a different mechanism, V(D)J recombination, which rearranges the DNA rather than the RNA. And the alternative splicing stuff wouldn't be possible without introns, including the nonsense filler that helpfully spaces out the exons so the splice enzymes can operate correctly.)
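Here's the Lego-brick idea as a toy Python sketch. The segment list and the splice() helper are made up for illustration, and real splicing is far messier. The introns are written in lowercase and start with "gu" and end with "ag", which are the usual "cut here" signals at intron boundaries:

```python
# A hypothetical pre-mRNA, as an ordered list of (kind, sequence) segments.
SEGMENTS = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguuuuucag"),
    ("exon", "GAUUAC"),
    ("intron", "guaaccccag"),
    ("exon", "UUUUAA"),
]


def splice(segments, skip_exons=()):
    """Drop every intron and join the exons, optionally skipping some exons.

    skip_exons holds 0-based exon numbers to leave out, which is how this
    sketch models an alternative splice.
    """
    pieces = []
    exon_number = 0
    for kind, seq in segments:
        if kind == "intron":
            continue  # introns never make it into the mature RNA
        if exon_number not in skip_exons:
            pieces.append(seq)
        exon_number += 1
    return "".join(pieces)


print(splice(SEGMENTS))                  # all three exons joined
print(splice(SEGMENTS, skip_exons={1}))  # alternative splice: middle exon skipped
```

Same gene, two different mature RNAs, depending on which bricks get snapped together.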
(Aside: long sequences of repetitive DNA can trip up the DNA polymerase enzyme that copies DNA, causing the stretch of DNA to lengthen itself in the next generation... and the longer it gets, the better the chance is that DNA polymerase will screw up and make it longer still. The