Big problems from a little seed. (Score 1)
Bjarne: There really is a question at the end of this, but it takes some setting up. Please bear with me.
In the late '80s I came to Silicon Valley for a startup, which was using C++ for a major project. I learned the language then and got over the "doing it the object-oriented way" hump. In the process I analyzed what cfront was producing and got a good understanding of what was under the hood (at the time).
The project was very ambitious. Much of it was creating a database engine for a complicated indexing mechanism. The result was that transactions occurred by creating a transient structure. The bulk of the work occurred during its construction, and errors during this stage had to be unwound carefully.
In those days C++ didn't have exception handling - "catch" and "throw" were reserved words, to be defined later. (So I built a set of exception handling classes that unwound even errors thrown from deep in construction and destruction correctly.)
Some of the architectural types had come to OOP via Xerox PARC and Smalltalk, and didn't want to be "slowed down" by getting "manual" memory management right. So we built a set of classes (including "smartpointers") and a preprocessor (to automatically generate pointer-identifying virtual functions) and got garbage collection working just fine. (We did a similar thing for remote procedure calls, and so on. We may still hold the record for layers of preprocessing between source and object...)
C++ WOULD have been the ideal language for it. But we found a little hole in the spec that caused BIG problems. The language got it SO ALMOST right that it was painful.
Consider construction of a subclass with virtual functions. Suppose the base class exports a pointer to itself (say, putting it into an external list). Then suppose that, at some time during the execution of the constructor of a member object of the derived class (or other initialization of the derived class), something gets hold of that pointer and calls a virtual function of the base class that is overridden by the derived class. Does it get the base class or derived class version of the function?
IMHO it should get the BASE class version during the initialization of the derived class UP TO the execution of the first user-written statement of the constructor, and the derived class version once the constructor's guts are executing. Getting the derived version before the constructor proper is entered means attempting to use functionality that has not been initialized. (Before the constructor is entered you're still initializing the component parts. During the constructor you initialize the assembly, and the author can manage the issue.) Similarly, during destruction it should get the derived version through the last user-written line of the destructor, and the base class version after that (as first the object-type members, then the base class(es), are destroyed).
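To make the scenario concrete, here is a minimal sketch. All names in it (`Base`, `Derived`, `Probe`, `g_exported`, `who()`, `run_scenario()`) are hypothetical and chosen only for illustration. The base class exports a pointer to itself, and a member object of the derived class calls a virtual function through that pointer from its own constructor and destructor. The comments state what I would expect from a compiler that follows the current ISO C++ rule ([class.cdtor]), which is not necessarily what the compilers surveyed at the time did.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Base;
static Base* g_exported = nullptr;       // stands in for the "external list"
static std::vector<std::string> g_log;   // records which override ran, in order

struct Base {
    Base() {
        g_exported = this;               // base class exports a pointer to itself
        g_log.push_back(std::string("base ctor: ") + who());
    }
    virtual ~Base() {
        g_log.push_back(std::string("base dtor: ") + who());
        g_exported = nullptr;
    }
    virtual const char* who() const { return "Base"; }
};

// A member object that calls back through the exported pointer while the
// enclosing Derived object is only partly constructed or partly destroyed.
struct Probe {
    Probe()  { g_log.push_back(std::string("member ctor: ") + g_exported->who()); }
    ~Probe() { g_log.push_back(std::string("member dtor: ") + g_exported->who()); }
};

struct Derived : Base {
    Probe probe;  // constructed after Base, before Derived's constructor body
    Derived() { g_log.push_back(std::string("ctor body: ") + g_exported->who()); }
    ~Derived() override { g_log.push_back(std::string("dtor body: ") + g_exported->who()); }
    const char* who() const override { return "Derived"; }
};

// Builds and destroys one Derived and returns the recorded sequence.
// Under the current standard rule the sequence is:
//   base ctor: Base, member ctor: Derived, ctor body: Derived,
//   dtor body: Derived, member dtor: Derived, base dtor: Base
std::vector<std::string> run_scenario() {
    g_log.clear();
    { Derived d; }
    return g_log;
}
```

Calling `run_scenario()` and printing the returned strings shows the two entries at issue: "member ctor" and "member dtor" both reach the derived override, where the proposal above would have them reach the base version.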
Examples of how this would work in real problems:
- Object represents an object in a displayed image. The base class is a generic displayable object, which hooks the object into a display list. It has a do-nothing "DrawMe()" virtual function. The derived class adds the behavior. When the display finishes a frame the list is run, calling the "DrawMe()" methods of all the objects. If one is still under construction and the derived class overriding is called, uninitialized memory is accessed (including pointers containing stack junk).
- Object is heap-allocated. Virtual functions are the "mark()" or "sweep()" pointer enumerator for the garbage collector. Base class is the generic "I'm a heap allocated object" substrate, hooking into an "allocated objects" list with do-nothing virtual functions. At each level of object derivation the new version of the function enumerates, for the mark and sweep phases, the member variables that are pointers (and calls the base class version to also enumerate the pointers at the more baseward layers). The pointers' own initialization clears them to NULL before this can happen. If a derived class constructor exhausts memory and triggers a garbage collection before all the pointers are set to null, getting the derived-class version of an enumeration routine means following stack-junk pointers. Crash!
- Exception-handling during constructors involves a similar mechanism to identify what levels of DEstructor need to be called to unwind the construction. Again a virtual function (this time the actual destructor) identifies the level of construction that needs unwinding. Again, getting the derived class overriding before the constructor is entered means calling the destructor for an uninitialized level. Oops!
There are, no doubt, many similar real-world patterns, since the problem is fundamental.
So when we discovered the wrong level was getting called in some cases, I did a survey of available compilers, looking for one that got it "right". With both constructor and destructor behavior to be done right or wrong, there were three wrong ways and one right way:
- cfront (and the cfront-derived compilers) got it wrong one way.
- the three commercial compilers for PCs got it wrong a second way.
- gnu g++ got it wrong the third way.
So we had to program around it. We were able to get the exception handling to operate correctly. But both exception handling and garbage collection required that all nontrivial processing be moved from initialization to constructors. Member variables of object type had to be moved to the heap - replaced by smartpointers - and allocated by the constructor. Compact composite objects allocated and freed as a unit became webs of pointer-interconnected heap objects, allocated separately (thus multiplying allocations and frees). In addition to the extra memory management, the number of pointers that had to be followed by the garbage collector exploded. Efficiency went out the window.
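A rough sketch of that workaround, in modern C++ rather than the code we actually wrote: `std::unique_ptr` stands in for our "smartpointers", and every name here (`HeapObject`, `Record`, `Payload`, `enumerate()`, `simulate_collection()`, `run_workaround()`) is hypothetical. The member objects live on the heap, the initialization phase only nulls the pointers, and the real work happens in the constructor body, so a collection triggered partway through sees null or valid pointers and never junk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct HeapObject;
static std::vector<HeapObject*> g_allocated;  // stand-in "allocated objects" list
static std::vector<std::size_t> g_scans;      // pointers seen by each simulated collection

// Generic "I'm a heap allocated object" substrate with a do-nothing enumerator.
struct HeapObject {
    HeapObject() { g_allocated.push_back(this); }
    virtual ~HeapObject() {
        g_allocated.erase(std::remove(g_allocated.begin(), g_allocated.end(), this),
                          g_allocated.end());
    }
    virtual void enumerate(std::vector<const void*>&) const {}
};

// Stand-in for the mark phase: asks every registered object for its non-null
// pointers and returns how many were reported.
std::size_t simulate_collection() {
    std::vector<const void*> found;
    for (const HeapObject* o : g_allocated) o->enumerate(found);
    return found.size();
}

// Constructing a Payload "exhausts memory" and triggers a collection.
struct Payload {
    Payload() { g_scans.push_back(simulate_collection()); }
};

struct Record : HeapObject {
    std::unique_ptr<Payload> left;    // formerly a member object: Payload left;
    std::unique_ptr<Payload> right;   // formerly a member object: Payload right;

    Record() : left(nullptr), right(nullptr) {  // initialization only nulls pointers
        left = std::make_unique<Payload>();     // nontrivial work moved into the body
        right = std::make_unique<Payload>();
    }
    void enumerate(std::vector<const void*>& out) const override {
        HeapObject::enumerate(out);             // also cover the more baseward layers
        if (left) out.push_back(left.get());
        if (right) out.push_back(right.get());
    }
};

// Builds one Record and returns the pointer counts seen by the two collections
// triggered during construction, followed by one run afterwards.
std::vector<std::size_t> run_workaround() {
    g_scans.clear();
    Record r;
    g_scans.push_back(simulate_collection());
    return g_scans;
}
```

The cost described above is visible even at this size: one `Record` is now three separate allocations instead of one, and the collector has two more pointers to follow.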
Of course, since the published language definition didn't actually DEFINE the behavior, "right" or "wrong" were not official.
This was during the ANSI deliberations for the first generation of standards. The current draft explicitly left open the issue of which overriding you got in this circumstance. So I petitioned the committee to define it - "correctly". I had high hopes, since everybody would have had to make a change so it wouldn't favor some vendors over others. But on the second round of email replies I got something that seemed strange - saying something about it not being important because "C++ doesn't have an object model". (Say WHAT?) I was too busy to pursue it further, let it drop, and the standard came out with the behavior explicitly undefined.
When the second round of ANSI standard came by (after my participation in the project was over) I tried again, just to "fix the language" to avoid this sort of subtle bug. Still no luck: This time the standard not only left it open, but explicitly said (approximately) "Don't write code where this could happen."
By the third round I had "gone over to the hard(ware) side of the force" and didn't pursue it.
IMHO this is the one thing in C++ that is a real language killer (of the sneak-up-behind-and-knife-the-developer variety).
So, FINALLY, the question:
Has this been fixed yet by the newer standards? If not, is there a chance that it will be at some future time? (Perhaps if YOU brought it up there might be more attention paid to it.)