They've created an entire virtual machine for the sole purpose of font rendering. Doesn't that strike you as just a little bit over the top? Text is just symbols arranged on the screen -- I'm certain better ways of doing this could be imagined that wouldn't require an exploitable VM with root permissions.
Spoken like someone who has never actually written code to display text. Sure, with monospaced bitmap fonts, this is an easy problem. For modern text, you start off with a set of Bézier paths representing each glyph. Those are fairly easy to render, and you can just start drawing each one to the right of the previous one. That will give you blurry characters with ugly spacing, but it's a start.
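To make the naive approach concrete, here is a minimal sketch of "draw each glyph to the right of the previous one". The advance widths and the 1000-units-per-em grid are made-up illustrative numbers, not taken from any real font, and `naive_layout` is a hypothetical helper name.

```python
# Hypothetical advance widths in font units (assumed 1000 units per em).
# These numbers are illustrative only, not from a real font.
UNITS_PER_EM = 1000
ADVANCE = {"T": 611, "o": 556, "l": 222}


def naive_layout(text, pixels_per_em):
    """Place each glyph immediately to the right of the previous one.

    Returns a list of (character, x position in pixels) pairs.
    """
    scale = pixels_per_em / UNITS_PER_EM
    pen_x = 0.0
    positions = []
    for ch in text:
        positions.append((ch, pen_x))
        pen_x += ADVANCE[ch] * scale  # move the pen by the glyph's advance
    return positions


if __name__ == "__main__":
    for ch, x in naive_layout("Tool", 10):
        print(ch, x)
```

Note that the pen positions come out fractional at most sizes, so glyph edges fall between pixels. That is where the blurriness comes from, and nothing here looks at neighbouring letters, which is where the ugly spacing comes from.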
So how do you fix the blurriness? Now you need some hinting telling the renderer when it should try to snap lines to the nearest pixel rather than approximate them and just rely on antialiasing. Oh, and those hints have to work on every combination of point size for the font and pixel size for the display (and, ideally, for different sub-pixel layouts), so they're heavily parameterised. It doesn't need to be quite Turing-complete yet, but you're getting very close to lambda calculus, although you can get away without recursion.
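As a rough illustration of why hints end up parameterised, here is a sketch of snapping one vertical stem to the pixel grid. It is not how any real hinting engine works; `snap_stem` and the stem measurements are hypothetical, and the only fixed fact used is that a point is 1/72 inch, so pixels per em = point size * DPI / 72.

```python
def pixels_per_em(point_size, dpi):
    """A point is 1/72 inch, so this is the em size in device pixels."""
    return point_size * dpi / 72.0


def snap_stem(left_units, width_units, units_per_em, ppem):
    """Snap a vertical stem to whole pixels (illustrative rule only).

    The left edge is rounded to the nearest pixel, and the width is
    rounded but never allowed to drop below one pixel, so the stem
    cannot vanish at small sizes.
    """
    scale = ppem / units_per_em
    left = round(left_units * scale)
    width = max(1, round(width_units * scale))
    return left, left + width


if __name__ == "__main__":
    # The same outline gives different pixel edges at different sizes.
    for size, dpi in [(12, 96), (12, 72), (4, 72)]:
        ppem = pixels_per_em(size, dpi)
        print(size, dpi, snap_stem(80, 90, 1000, ppem))
```

Even this toy rule depends on the point size and the display resolution, and a real font needs many such rules that interact with each other, which is how you end up wanting a small program per glyph rather than a table.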
But you still have spacing problems. Consider this trivial example: To. With your naive approach, the gap between the right-hand end of the T's cross-bar and the left-hand side of the o is the same as the gap between the characters in nm. At the start of a word, like Tool, it will look as though there is more space between the T and the o than between oo or ol, and that's ugly. So now you need some kerning hints that tell you how to tweak the spacing for each pair of letters, and these need to be parameterised over every pair of letters. For a simple ASCII font, that's 128 x 128 = 2^14 combinations, so you don't want to list them individually; you need to compute them.
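One way to avoid listing all 2^14 pairs is to group glyphs by the shape of their edges and store adjustments per pair of groups. The sketch below assumes that approach; the class names, the advance widths, and the -80 adjustment are all invented for illustration, and `kerned_layout` is a hypothetical helper.

```python
# Full pair table size for 7-bit ASCII: 128 * 128 = 2**14 entries.
ASCII_PAIRS = 128 * 128

# Hypothetical advance widths in font units (illustrative, not a real font).
ADVANCE = {"T": 611, "o": 556, "l": 222}

# Hypothetical shape classes: what a glyph's right edge looks like when it
# is the left member of a pair, and what its left edge looks like when it
# is the right member.
LEFT_CLASS = {"T": "overhang", "o": "round", "l": "stem"}
RIGHT_CLASS = {"T": "overhang", "o": "round", "l": "stem"}

# Adjustments are stored per class pair, so the table stays small.
KERN = {("overhang", "round"): -80}


def kern(left, right):
    """Spacing adjustment in font units for a pair (0 if none is listed)."""
    key = (LEFT_CLASS.get(left), RIGHT_CLASS.get(right))
    return KERN.get(key, 0)


def kerned_layout(text):
    """Advance-width layout with a pair adjustment applied before each glyph."""
    pen_x = 0
    prev = None
    positions = []
    for ch in text:
        if prev is not None:
            pen_x += kern(prev, ch)
        positions.append((ch, pen_x))
        pen_x += ADVANCE[ch]
        prev = ch
    return positions


if __name__ == "__main__":
    print(ASCII_PAIRS)
    print(kerned_layout("Tool"))
```

With these made-up numbers the first o tucks 80 units under the T's cross-bar, while oo and ol keep their default spacing. The layout is a pure function of the text and the tables, so it gives the same result on every redraw.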
And that's just very basic letter layout. A typical window may have thousands of characters, all of which need to be laid out correctly (and deterministically, so characters don't jump around on every redraw). Is it surprising that this ends up on the fast path?
Both Windows and *NIX have had serious exploits involving font rendering. X used to put FreeType in the X server (which ran as root), and Windows used to put an equivalent in the kernel. Both have resulted in vulnerabilities from documents that embed fonts. When you have something that's performance critical (slow text rendering translates to slow window updates, which directly translates to user-perceived slowness) and depends on user-provided data, it's not surprising that there are security holes. X11 now moves font rendering to the client (although, like Quartz, it composites the glyphs on the server), so a font exploit doesn't get you root; it just gets you arbitrary code execution in your current application, for example the web browser.