btw, I'm pretty sure you have an interesting point here when you said this:
Functional decomposition is a really poor way of abstracting complexity, when it's being used in isolation, and does not include mandatory boundary layer order and direction of operations over said boundary.
but I'm not entirely sure what you meant. Could you clarify? What other option is there besides functional decomposition?
DJB's philosophy is to minimize individual attack surfaces by reducing code complexity. This has three components; DJB himself is an explicit proponent of two of them. I'm not sure whether that's because he doesn't realize the third is a consequence of his implementation paradigm, or whether he simply thinks it's too obvious to talk about. These are the components:
(1) Reduce complexity by separating the problem domains into individual processes. This isolates the code that requires privilege escalation from everything else, and prevents cross-functionality attacks through a shared address space.
(2) Reduce complexity into functional time domains by serializing operations which could (potentially) otherwise take place in parallel. This is also done through the use of individual processes, but relies on the trigger that initiates each process being separate, and therefore not under the control of an attacker. It increases the difficulty of an attack by requiring a serial exploit of each component between the intermediate targets and the final target (as in the previously referenced "shellshock" attack). For a shellshock attack, this particular precaution was meaningless, since the data was passed straight through to the next component without prevalidation or any intervening action; in other words, the attack zips right through this security precaution.
(3) This may or may not have been intentional, but he reduces the network and system call footprint of each component. The remotely accessible attack surface (you can only attack things you can talk to) shrinks to something which can be firewalled, and the system call footprint of each component shrinks to something that local application sandboxing could police: preventing particular system calls from being used by individual program components, or preventing sequences of system calls from being used out of a particular order, or in excess of a particular number of times. This was probably not a design goal, given that neither deep packet inspection/stateful firewalls nor sandboxing were utilized in most systems at the time qmail was originally written.
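To make (3) concrete, here's a minimal sketch in C of the kind of per-component policy a sandbox could enforce -- a tiny state machine that only permits a fixed sequence of operations, and only a bounded number of each. Everything here (the names, the open/read/close policy) is invented for illustration; a real sandbox like seccomp would enforce something like this at the kernel boundary, while this shows only the policy logic:

```c
#include <stdbool.h>

/* Hypothetical per-component policy: the component may open once, then
 * read a bounded number of times, then close -- nothing else, and
 * nothing out of order. */
enum op { OP_OPEN, OP_READ, OP_CLOSE };

struct policy {
    int state;       /* 0 = start, 1 = opened, 2 = closed */
    int reads_left;  /* remaining read budget */
};

static bool policy_allow(struct policy *p, enum op op) {
    switch (op) {
    case OP_OPEN:
        if (p->state != 0) return false;   /* only from the start state */
        p->state = 1;
        return true;
    case OP_READ:
        if (p->state != 1 || p->reads_left == 0) return false;
        p->reads_left--;                   /* enforce the call-count cap */
        return true;
    case OP_CLOSE:
        if (p->state != 1) return false;   /* only after open */
        p->state = 2;
        return true;
    }
    return false;
}
```

Deny anything the policy doesn't explicitly allow, and a component whose legitimate behavior is "open, read, close" simply cannot be repurposed into making arbitrary system calls.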
That's cool and all, but it's taking a hammer to a problem which is actually a result of programmer discipline and machine architecture. Frankly, some of those architecture issues have been addressed at the operating system and compiler level for years, and others are better addressed through other mechanisms. It also failed miserably on intentional strategy #2, above.
The first issue is boundary layer violations. The most infamous email program in existence is Microsoft Outlook, and for good reason: Outlook engaged in interface layering violations, and these are responsible for nearly all of the initially exploited Outlook vulnerabilities.
What avoiding boundary layer violations means is that, if you are designing correctly, you identify architectural layers for your libraries, so that each layer abstracts the complexity of the layer below it. As part of this, you define an interface contract: you are permitted to call down to the interfaces below you, and you are permitted to call across, within the same layer, to auxiliary functions, but under no circumstances are you permitted to call upward. A good example of a boundary layer violation in libc is the use of a function pointer for the compare function in the qsort library routine, which results in an upcall from the libc layer into upper-level code. In general, this should be avoided -- and if you have multiple protection domains, such as ring 1 and ring 2, which are generally unused by most operating systems, it should be prohibited in hardware. A "poor man's" version of hardware prohibition is achievable through a rather more radical utilization of large address spaces than ASLR makes: statistical page protection. If an attacker can't find the page -- because functions in a library are not laid out in adjacent pages of the process address space when the library is loaded -- then a location computed from a known call site can't be used to find an attack vector.
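The qsort case is worth seeing spelled out, because the upcall is baked right into the libc interface -- the comparison callback lives in application code, above libc in the layer diagram, and libc calls up into it on every comparison:

```c
#include <stdlib.h>

/* cmp_int lives in application code, a layer ABOVE libc.  When qsort()
 * (inside libc) invokes it, that is an upcall across the layer boundary:
 * exactly the contract violation described above, sanctioned by the
 * libc interface itself. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void sort_ints(int *v, size_t n) {
    qsort(v, n, sizeof *v, cmp_int);   /* libc -> application upcall */
}
```

A layering-clean alternative would have the application drive the sort loop and call down into comparison-free primitives, but the standard library chose convenience over strict layering here.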
Another boundary layering violation which Outlook failed on -- and which qmail periodically fails on, accounting for a number of usable qmail exploits -- is container boundary verification.

This is the second vector for Outlook exploits, once the layering violations are dismissed. While qmail is not specifically subject to MIME-based container boundary verification issues, it has its own problems with containers. In Outlook, these took the form of intentionally malformed data content passed as part of a message. The easiest example: in order to render a message more quickly, and specifically to support the rendering of HTML messages (which Microsoft still thinks are "Nifty!(tm)"), Outlook started decoding container contents before verifying the validity of the container. It would start rendering GIFs before they were verified to be valid GIFs, and start rendering other content before that content was verified to be valid. This is where we get the "malformed attachment" exploits in Outlook.
The correct thing to do is to download the content, verify that each container matches its purported size, and verify that the containers inside the containers -- images, audio, video, etc. -- are themselves valid, before handing anything off to the rendering component. Outlook failed to do this: it treated the header as a dispatch item and handed the data stream off to the rendering component, which allowed a header on a container to cause much more of the byte stream than the container boundary to execute as payload in subsequent containers. Qmail fails in a similar way, handing unverified container content off to a renderer.
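The size check is the cheap, non-negotiable first step. A sketch in C, using a deliberately toy wire format (a 4-byte big-endian length followed by the payload -- this is not MIME and not anything qmail uses, just an illustration): the header's claim is checked against what actually arrived before any payload pointer is handed out.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy container: [4-byte big-endian payload length][payload bytes].
 * Verify the declared length against the bytes that actually arrived
 * BEFORE dispatching the payload anywhere.  A header that lies about
 * its size gets NULL, not a shot at the renderer. */
static const uint8_t *container_payload(const uint8_t *buf, size_t buflen,
                                        size_t *payload_len) {
    if (buflen < 4)
        return NULL;                          /* truncated header */
    size_t declared = ((size_t)buf[0] << 24) | ((size_t)buf[1] << 16) |
                      ((size_t)buf[2] << 8)  |  (size_t)buf[3];
    if (declared != buflen - 4)
        return NULL;                          /* header lies about size */
    *payload_len = declared;
    return buf + 4;                           /* now safe to hand off */
}
```

Content-type validity checks (is this actually a GIF?) come after this, and still before rendering.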
Most modern (predominantly research) security architectures have moved to a container-in-a-mailbox mechanism. You put the contents of the container into a mailbox, and then you run a verifier -- separate from the renderer -- on the mailbox contents, thus preventing an assignment of meaning, and therefore a communication of intelligence (attack data), to a target. Only after that do you hand the mailbox off to the content renderer.
Note that this application of containers in mailboxes has a couple of significant advantages: (A) it's really amenable to things like statistical memory protection, since if you run off the end of a page you fault, instead of getting meaningful payload data; (B) for hardening purposes, you can place the container contents so that they end exactly at the page boundary, and index the start of the mailbox partway into the first page, rather than at the start of the page (you can do this because you know the content length, having examined and vetted the initial container before mailboxing the data) -- this means that a scan forward past the container boundary results in an immediate fault, even though your hardware perhaps only supports 4K page granularity rather than byte-level mapping; and finally, (C) you can map the mailbox contents as read-only, non-executable, non-writeable before you hand them off even to the verifier, thus preventing self-referential execution as part of an attack.
To deal with the issue of attack surface at interface layers -- which qmail handles by decomposition into processes -- you can instead rely on the link-loader. In most modern operating systems, the link-load phase is handled in the kernel exec/fork/spawn functions, which also manage ASLR. The alternative is to make the addresses of code in inferior layers known only at the call site (and one should never access data in an inferior layer directly, only through an accessor/mutator function). Further, you decompose the function locations into groups of pages which are non-contiguous: the fprintf() function in libc might (should) be at a physically discontiguous location from gethostent() or any other libc function. Address space decomposition is thus a better approach than functional decomposition based on role and program boundaries: it's much more granular.
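The accessor/mutator rule mentioned parenthetically above looks like this in C -- the lower layer hands out an opaque handle, and upper layers can only go through its functions, never touch the fields directly. The `counter` example is invented; in a real build the struct definition would be hidden in the lower layer's .c file, and it's shown here only so the sketch is self-contained:

```c
#include <stdlib.h>

/* Inferior-layer data type.  Upper layers see only "struct counter;"
 * in the header -- an incomplete type they cannot dereference -- and
 * must use the accessor/mutator functions below. */
struct counter { int n; };

struct counter *counter_new(void) {
    return calloc(1, sizeof(struct counter));   /* zero-initialized */
}

int counter_get(const struct counter *c) {      /* accessor */
    return c->n;
}

void counter_bump(struct counter *c) {          /* mutator */
    c->n++;
}
```

Because no upper-layer code ever holds a raw offset into the struct, the lower layer can relocate or rearrange its data freely -- which is exactly what the discontiguous-page placement above requires.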
There are at least five additional techniques you might be expected to use, each with diminishing security utility, which would do a better job than qmail does -- but you get the basic idea, and I'm not going to write an entire paper here on Slashdot.