Not that I'm a pro or anything, but junk DNA was anything that didn't encode proteins, right?
No, that's "non-coding DNA". The Ars Technica article has a very nice Venn diagram. In short, we infer that most non-coding DNA is junk DNA because it shows signs of neutral drift (i.e. it doesn't matter to reproductive fitness), but non-coding DNA is different from junk DNA, and regulatory DNA is always non-coding but can be either junk or non-junk.
Some concrete examples (with Venn diagram colors in parens):
- Coding DNA that isn't junk (white): a gene.
- Coding DNA that is junk (blue): an endogenous retrovirus.
- Regulatory non-coding DNA that isn't junk (orange/yellow): a promoter for a gene.
- Regulatory non-coding DNA that is junk (orange/yellow/blue): a promoter for a pseudogene.
- Non-regulatory non-coding DNA that isn't junk (yellow): hmm... an intron, I guess.
- Non-regulatory non-coding DNA that is junk (yellow/blue): the letters "CGG" 30 times in a row on the X chromosome. (See aside below for more info.)
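If it helps, the list above boils down to three yes/no flags per stretch of DNA, with the one rule that regulatory DNA is never coding. Here's a rough Python sketch of that idea. The `Region` class, its field names, and the `EXAMPLES` list are all made up for illustration; nothing here comes from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record for one stretch of DNA (names are illustrative)."""
    name: str
    coding: bool
    regulatory: bool
    junk: bool

    def __post_init__(self) -> None:
        # Rule from the comment above: regulatory DNA is always non-coding.
        if self.coding and self.regulatory:
            raise ValueError(f"{self.name}: regulatory DNA is always non-coding")

    def label(self) -> str:
        if self.coding:
            kind = "coding"
        elif self.regulatory:
            kind = "regulatory non-coding"
        else:
            kind = "non-regulatory non-coding"
        return f"{kind}, {'junk' if self.junk else 'not junk'}"


# The six examples from the list, in the same order.
EXAMPLES = [
    Region("gene", coding=True, regulatory=False, junk=False),
    Region("endogenous retrovirus", coding=True, regulatory=False, junk=True),
    Region("promoter for a gene", coding=False, regulatory=True, junk=False),
    Region("promoter for a pseudogene", coding=False, regulatory=True, junk=True),
    Region("intron", coding=False, regulatory=False, junk=False),
    Region("CGG repeat", coding=False, regulatory=False, junk=True),
]

for region in EXAMPLES:
    print(f"{region.name}: {region.label()}")
```

The point of keeping `junk` as its own flag is that it's independent of the other two: "non-coding" and "junk" overlap a lot, but neither implies the other.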
(Terminology: a "pseudogene" is a gene damaged so badly by frame shifts or early stop codons that it can't code for protein anymore. Before they break and become pseudogenes, they're often duplicates of some existing gene, which is why breaking them can be fitness-neutral. DNA transposons and sloppy cross-overs in meiosis make gene duplication reasonably common. Gene duplication is important for evolution as well: duplicated genes are free to mutate in random directions until they stumble on a new useful function, with the original free to keep the old one. For instance, the vertebrate blood clotting cascade was clearly formed from several rounds of dupe-then-mutate, and similarly with the huge family of myosin muscle proteins.)
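To make the "frame shift leads to an early stop codon" part concrete, here's a toy Python sketch. The 18-letter sequence is invented for the example (it's not a real gene), and `first_stop_codon` is a hypothetical helper that just reads the DNA three letters at a time from the start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three stop codons of the standard genetic code


def first_stop_codon(seq: str):
    """Return the index of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Made-up toy gene: ATG AAA TTG ACC GGC TAA (stop codon at the very end).
intact = "ATGAAATTGACCGGCTAA"

# Delete a single letter (position 3) to shift the reading frame.
# The codons now read: ATG AAT TGA ... and TGA is a stop.
broken = intact[:3] + intact[4:]

print(first_stop_codon(intact))  # 5
print(first_stop_codon(broken))  # 2
```

One lost letter moves the stop from the sixth codon to the third, so everything downstream never gets translated. That's the kind of damage that turns a duplicate gene into a pseudogene.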
(Terminology: an "intron" is a stretch of DNA that gets snipped out of the resulting RNA before the RNA can code for protein. It's not quite junk: an intron has recognition signals that say "please cut RNA here", and IIRC the intron needs to have roughly the correct length, but most of the intron is arbitrary nonsense. Some genes have alternative splices, where the same gene can code for different proteins by swapping in different coding regions -- "exons" -- like Lego bricks. Alternative splices matter in the immune system, for instance: they're how a B cell switches between the membrane-bound and secreted versions of an antibody. The enormous variety of antibodies comes from a different trick, V(D)J recombination, which cuts and rejoins the DNA itself rather than the RNA. Alternative splicing wouldn't be possible without introns, including the nonsense filler that helpfully spaces out the exons so the splice enzymes can operate correctly.)
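Here's the Lego-brick idea as a toy Python sketch. The segment names and RNA sequences are invented, and `splice` is a hypothetical helper; a real cell decides what to cut from signals in the RNA itself, not from labels. The toy introns start with GU and end with AG because that's the usual pattern at intron boundaries.

```python
# Made-up pre-mRNA: (kind, name, sequence) for each segment, in order.
PRE_MRNA = [
    ("exon", "E1", "AUGGCU"),
    ("intron", "I1", "GUAAGUCCUUAG"),
    ("exon", "E2", "GAUCCA"),
    ("intron", "I2", "GUGAGUAAACAG"),
    ("exon", "E3", "UUUUAA"),
]


def splice(segments, skipped_exons=frozenset()):
    """Drop every intron, plus any exons named in skipped_exons, and join the rest."""
    return "".join(
        seq
        for kind, name, seq in segments
        if kind == "exon" and name not in skipped_exons
    )


full_length = splice(PRE_MRNA)          # E1 + E2 + E3
alternative = splice(PRE_MRNA, {"E2"})  # E1 + E3, with exon E2 skipped

print(full_length)
print(alternative)
```

Same gene, two different messages, depending on which exons make it into the final RNA.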
(Aside: long sequences of repetitive DNA can trip up the DNA polymerase enzyme that copies DNA, causing the stretch of DNA to lengthen itself in the next generation... and the longer it gets, the better the chance that DNA polymerase will slip again and make it longer still. The ...CGG-CGG-CGG... sequence I mentioned has about 30 repeats in healthy individuals, but if the number of repeats climbs high enough (past roughly 200), it causes Fragile X syndrome. Apparently the nucleus tries to silence the repeat by attaching methyl groups (CH3), which is standard procedure for turning off misbehaving DNA, but methylation isn't terribly precise, and a promoter happens to live right next door. That promoter drives a gene, FMR1, that's important in brain development; when methylation silences the promoter, the loss of gene expression causes the syndrome: intellectual disability, often with autistic features.)
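For the curious, counting the repeats in a sequence is a one-liner with a regular expression. This is just a sketch: `longest_cgg_run` is a hypothetical helper, and the test sequence is made up to match the "about 30 repeats" figure above.

```python
import re


def longest_cgg_run(seq: str) -> int:
    """Return the number of repeats in the longest back-to-back run of CGG."""
    runs = re.findall(r"(?:CGG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Made-up sequence: 30 CGG repeats with some unrelated letters on either side.
toy_sequence = "AT" + "CGG" * 30 + "TTCGGA"

print(longest_cgg_run(toy_sequence))  # 30
```

The stray single CGG near the end of the toy sequence is a run of one, so it doesn't affect the answer; only the longest uninterrupted run counts.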