There seems to be a major trend towards making compilers create code that is as different as possible from what the programmer wrote without being so different that the programmer actually notices. One might assume it's a secret NSA plot to defeat security measures in all software everywhere. You know, if one were incredibly paranoid, that is.
It's hard to say whether this is justified behavior. As an example, consider this code from a link an AC posted:
char digest[DIGEST_LEN];
...
memset(digest, 0, sizeof(digest));
Exploit mitigation code like this is a case of writing code that we expect to have no effect, just in case we're wrong and it does have an effect. Then the compiler comes along, decides for itself that the code will never have any effect, and removes it. It's hard to blame it for noticing the uselessness of the operation when we ourselves expected the code to be useless when we wrote it; but then, the whole reason we wrote it is that we thought we might be wrong. Should the compiler then assume that we might be wrong as well, and that we might access that memory through some other pointer?
Does it make sense to compile with optimization enabled when, by including things like the memset() call to clear memory we're finished using, we clearly have goals other than optimization?
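To make the failure mode concrete, here's a minimal sketch of the pattern the compiler sees (the function name and buffer size are mine), assuming a typical optimizing build like gcc -O2:

#include <string.h>

void process_secret(void)
{
    char digest[32];
    /* ... compute with the secret ... */
    memset(digest, 0, sizeof(digest)); /* digest is never read again, so
                                          dead-store elimination is free
                                          to delete this call entirely */
}

No conforming code can observe digest after the function returns, so the compiler treats that final store as dead and drops it.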
The article suggests fixing this by using a different function that won't be optimized away, but I wonder whether even that is a legitimate fix. Our "digest" array is just another variable that the compiler is free to do whatever it wants with in the name of optimization. If it will make the program run faster, it's free to keep two copies of it (in a register spill slot as well as on the stack, say). Then our new never-optimized-away function will end up erasing only one copy of the variable.
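For what it's worth, the workaround functions usually mentioned (explicit_bzero on the BSDs, or OpenSSL's OPENSSL_cleanse) generally come down to hiding the operation from the optimizer in one way or another; a minimal sketch of one common idiom, calling memset through a volatile function pointer:

#include <string.h>

/* The compiler must reload a volatile pointer on every use and cannot
   assume which function it designates, so it cannot prove the stores
   are dead and must keep the call. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

void secure_clear(void *p, size_t n)
{
    memset_ptr(p, 0, n);
}

But as argued above, this only guarantees that the bytes at one particular address get overwritten; it says nothing about any other copies the optimizer may have made.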
So the problem here isn't the use of memset() rather than some other function. The problem is that we're asking the compiler to create code that doesn't match what we've written, and it should be no surprise when it goes ahead and does exactly that. Thus, I don't think it's correct to claim that the error here is the failure to use the correct function to clear the memory. I think the error is in asking the compiler to generate code that isn't identical to the source code.
The core of the problem is that C isn't a language that lets us tell the compiler exactly what we want to happen. Without bounds checking on pointer use, every pointer is effectively a pointer to all of memory. Thus, when a variable like "digest" falls out of scope, that doesn't mean much: its memory can still be accessed through any other pointer anywhere in the program. If C enforced bounds checking, so that reaching the data in "digest" through any other pointer were impossible, then the compiler could safely assume that once "digest" falls out of scope, the data it holds will never be accessed again, and removing the memset() call would be safe since it truly would have no effect.
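And in practice that out-of-scope data really is reachable. A deliberately broken sketch (undefined behavior, which is exactly the point; DIGEST_LEN reused from the snippet above):

char *get_digest(void)
{
    char digest[DIGEST_LEN];
    /* ... fill digest ... */
    return digest; /* dangling pointer: undefined behavior, yet the
                      secret bytes typically remain on the stack where
                      any later stray read can find them */
}

The language makes no promise either way, which is precisely why the programmer and the compiler end up with contradictory ideas about whether the memset() matters.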
It really seems ridiculous when you think about it. Compilers assume that bounds on pointers will be respected, yet make no attempt whatsoever to enforce those bounds, essentially guaranteeing that they will not be respected since programmers are imperfect.
Consider what the compiler will do when it encounters code like this:
int a, b[4], c;
b[-1] = 0;
Despite the obvious error in the above code, GCC will compile it without complaint. It will then perform optimizations that assume that neither a nor c has been affected by the assignment through b. It seems rather ridiculous that anyone is expected to create secure software in such an environment. Either the compiler should enforce bounds checking, or it should assume that any pointer operation can affect any variable.
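Here's what that assumption buys the optimizer, as a slightly fuller sketch; with the variables laid out as globals, a compiler is entitled to fold the final sum to a constant:

#include <stdio.h>

int a = 1, b[4], c = 2;

int main(void)
{
    b[-1] = 0;             /* out-of-bounds write: undefined behavior */
    printf("%d\n", a + c); /* may be folded to printf("%d\n", 3): the
                              compiler assumes the store through b
                              cannot touch a or c, even if b[-1]
                              happens to overlap one of them */
    return 0;
}

The store and the fold are each individually reasonable; it's the combination of assuming the bounds and never checking them that produces silently wrong code.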
C could really use an extension for bounds checking, even if it were purely optional: say, a new kind of pointer that carries a limit with it. In many cases the compiler could enforce the bounds at compile time, simply by analyzing the program the way it already does when optimizing, so no additional machine code would be generated. In the other cases, I think the security benefits would be worth the loss in efficiency. With bounds checking, the compiler could safely optimize away all of those memset() calls and other mitigations programmers write, since it would then actually know with certainty that the memory being cleared will never be accessed after the variable falls out of scope.
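Purely as a sketch of the idea (nothing here is an existing extension; all the names are invented), such a pointer might look like:

#include <stddef.h>
#include <stdlib.h>

/* A "fat" pointer: the address travels together with the number of
   valid elements, and every access is checked against that limit. */
typedef struct {
    char  *ptr;
    size_t len;
} bounded_ptr;

static char *bp_at(bounded_ptr bp, size_t i)
{
    if (i >= bp.len)
        abort(); /* trap instead of silently scribbling over
                    some neighboring variable */
    return &bp.ptr[i];
}

Wherever the index is provably in range, the compiler can delete the check at compile time (the zero-cost case above); everywhere else we pay for one comparison and get the guarantee the optimizer currently only pretends to have.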
As things stand, we're benefiting from those optimizations without ever having changed the language so that the compiler can truly know they're safe. Even without modifying the language, the compiler should at least point out violations of the assumptions it makes about how the programmer uses pointers, like the assignment in the code above. Optimizing on the assumption that the programmer never does such things, while ignoring obvious violations of that very assumption, seems rather negligent.