Unexpected encounters with dirty code will make it very difficult to make a sane prediction.
Dirty code is defined as 'overly complex or highly coupled.' As a programmer you are expected to deliver X number of features by Y date. Unless one of those features is 'simple and loosely coupled code,' what does that have to do with predicting anything? For performance you don't predict. Experiments are the only thing that works: test, change, re-test, un-change, re-test, endlessly. Anything else is voodoo programming, no insult intended to the practitioners of Santería, Vodou or Hoodoo.
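To make that concrete, here is a minimal sketch of the test-and-re-test loop in Python. The two build_report variants are hypothetical: one stands in for the code under suspicion, the other for a candidate change, and the input size and repeat counts are arbitrary.

    # A minimal sketch of the test/change/re-test loop. Both functions are
    # hypothetical stand-ins, not anyone's real code.
    import timeit

    def build_report_old(rows):
        # current implementation: string concatenation in a loop
        out = ""
        for r in rows:
            out += str(r) + "\n"
        return out

    def build_report_new(rows):
        # candidate change: build the string with a single join
        return "\n".join(str(r) for r in rows) + "\n"

    rows = list(range(10_000))
    for name, fn in [("old", build_report_old), ("new", build_report_new)]:
        # run each version several times and keep the best of five batches
        best = min(timeit.repeat(lambda: fn(rows), number=100, repeat=5))
        print(f"{name}: {best:.3f}s per 100 calls")

Measure, change one thing, measure again; if the change didn't help, un-change it and measure once more.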
How about predicting the schedule? I recall that Steve McConnell once joked that to get better at estimating we need to get better at estimating. (This may have been someone else.) Greg Wilson showed we can do this in programming, and in Computer Science in general. We only have to do scientific experimentation with various methods and throw away what doesn't work (instead of writing pulpy business books to bilk people out of money). But you'll still have to run a lot of tests to do that, too.
It is not uncommon to see "quick" refactorings eventually taking several months to complete. In these instances, the damage to the credibility and political capital of the responsible team will range from severe to terminal. If only we had a tool to help us identify and measure this risk.
It is my opinion that any refactoring that cannot be done by an automatic tool isn't refactoring. The original definition of refactoring is just 'factoring': re-organizing the code. It is not a re-write, as in a 'several months' effort.
Misuse of a sexy, trendy name from the 90s does not change this. All re-writing runs the risk of second-system syndrome, and not in the throw-one-away sense of prototyping. Do you have a button to press in your IDE to make the change? Do you have in mind a short sed statement, a simple awk program, some EMACS macros or an on-hand shell scriptlet to do the transformation? If not, then you cannot get away from re-thinking the problem. That will require re-design of the solution and re-implementation of the feature. Each of these carries time risk at least as high as the original work.
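For illustration, here is the kind of change I mean, sketched in Python rather than sed; the identifier names and the src/ layout are made up for the example.

    # A sketch of a purely mechanical transformation: rename one identifier
    # across a source tree. The names and the src/ layout are hypothetical;
    # sed or an IDE rename would do the same job.
    import re
    from pathlib import Path

    OLD, NEW = "calc_total", "calculate_total"       # hypothetical names
    pattern = re.compile(rf"\b{re.escape(OLD)}\b")

    for path in Path("src").rglob("*.py"):           # hypothetical layout
        text = path.read_text()
        changed = pattern.sub(NEW, text)
        if changed != text:
            path.write_text(changed)
            print(f"rewrote {path}")

If the change can be captured that mechanically, it is the 'factoring' kind of refactoring; if it cannot, you are back to re-thinking the problem.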
What if the problem is overly complex or highly coupled? The code may merely be an expression of this. In that case, only a paradigm or perspective change by the customer, developer or user can untangle the problem. The computer cannot help you do anything but automate making a mess if the problem is a mess. Changing perspective is often an unbounded-in-time problem for human beings. Good luck with estimating completion dates for that.
In fact, we have many ways of measuring and controlling the degree and depth of coupling and complexity of our code. Software metrics can be used to count the occurrences of specific features in our code. The values of these counts do correlate with code quality.
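For concreteness, a metric in this sense is just a count over the source. Here is a minimal sketch of one, approximating cyclomatic complexity by counting branch nodes in a Python syntax tree; the sample function and the choice of node types are assumptions for the example.

    # Count branching constructs in a function's AST as a crude stand-in
    # for cyclomatic complexity. The sample code and the node list are
    # only illustrative.
    import ast

    SOURCE = '''
    def classify(x):
        if x < 0:
            return "negative"
        for d in str(x):
            if d == "0":
                return "contains a zero"
        return "positive"
    '''

    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)

    tree = ast.parse(SOURCE)
    branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
    print("decision points:", branches)   # complexity is roughly branches + 1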
In fact, Greg Wilson showed in his presentation that almost every metric on the market, when analyzed, has predictive power no better than, and usually merely equal to, simple counts of Lines of Code.
The situation in programming is almost as if more code equals more bugs while less code equals fewer bugs.
This seems obvious and trivial, but it is quite real and has serious implications. One of those is the increasing spread of syntactic sugar in programming languages. Another is the proliferation of VM models that, over time, take over more features like threading and memory management. This is enabling less skilled programmers to do things that once required a lot of skill, training and thought to implement. It also saddles applications with certain performance expectations, e.g. the arguably fictitious idea that the Java Virtual Machine is bloated and slow, so all Java applications must be bloated and slow.
One downside to software metrics is that the huge array of numbers that metrics tools produce can be intimidating to the uninitiated. That said, software metrics can be a powerful tool in our fight for clean code.
But if they are no better than simple counts of Lines of Code, why should the uninitiated bother? If you know that the more you write the more bugs you are going to have, why not seek to write less instead?
They can help us to identify and eliminate dirty code bombs before they are a serious risk to a performance tuning exercise.
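If you take the Lines of Code argument seriously, the cheapest such detector is embarrassingly small. A sketch, assuming a hypothetical src/ tree and an arbitrary 500-line threshold:

    # Rank source files by line count and flag the biggest ones as the
    # places most likely to blow up a tuning exercise. The path and the
    # threshold are arbitrary for the example.
    from pathlib import Path

    counts = []
    for path in Path("src").rglob("*.py"):           # hypothetical tree
        counts.append((len(path.read_text().splitlines()), path))

    for loc, path in sorted(counts, reverse=True)[:10]:
        flag = "  <-- look here first" if loc > 500 else ""
        print(f"{loc:6d}  {path}{flag}")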
The fastest code is that which is never run. The only code without (implementation) bugs is code that doesn't exist. Why is quicksort so quick? Because it does less than other sort algorithms.
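To see 'does less' literally, here is a small sketch that counts element comparisons for a naive insertion sort and a simple quicksort on the same random input. The counting is rough (quicksort is charged one comparison per element per partition pass), but the size of the gap is the point.

    # Count comparisons made by two sorts on the same data.
    import random

    def insertion_sort(a, count):
        a = list(a)
        for i in range(1, len(a)):
            j = i
            while j > 0:
                count[0] += 1                 # one comparison
                if a[j - 1] <= a[j]:
                    break
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
        return a

    def quicksort(a, count):
        if len(a) <= 1:
            return list(a)
        pivot = a[len(a) // 2]
        less, equal, greater = [], [], []
        for x in a:
            count[0] += 1                     # roughly one comparison per element
            if x < pivot:
                less.append(x)
            elif x > pivot:
                greater.append(x)
            else:
                equal.append(x)
        return quicksort(less, count) + equal + quicksort(greater, count)

    data = [random.randrange(10_000) for _ in range(2_000)]
    for name, fn in [("insertion sort", insertion_sort), ("quicksort", quicksort)]:
        count = [0]
        assert fn(data, count) == sorted(data)
        print(f"{name}: {count[0]} comparisons")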
This is also, I think, why a lot of great programmers are known for writing either some major tool or a programming language. Rephrase the original problem in a well-matched language or in a tool's command interface, and then you only have to write a small amount of code. Writing parsers is such a well-known task that many tools exist to turn a description of a language into a compiler automatically. The real trick is realizing you need to do this the first time, not the second time around.
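As a tiny illustration of rephrasing a problem into a well-matched little language: pulling key=value fields out of a log line. Hand-rolled, that is a character-by-character scanner with quoting rules; restated as a regular expression, the whole parser is one pattern. The log format here is invented for the example.

    # The 'parser' is a single pattern: a key, '=', then a quoted string
    # or a bare token. The log line is a made-up example.
    import re

    line = 'ts=2021-03-04T12:00:00 level=warn msg="disk almost full" free_mb=512'

    pair = re.compile(r'(\w+)=("[^"]*"|\S+)')
    fields = {key: value.strip('"') for key, value in pair.findall(line)}
    print(fields)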