1. "code works": Easy to say, hard to check. It might work today in some circumstances but fail tomorrow in other circumstances.
2. "is maintainable": That's a subjective criterion that is impossible to enforce.
3. "is to spec": Again, easy to check for common pathways, but hard to catch all the nuances (for the same reason that no one has 100% code coverage in their unit tests).
4. "passing a security audit": This helps, but as we all know by now, it does not guarantee that the code is secure. Code usually depends on hundreds of transitive dependencies. No one in their right mind includes transitive code in their security audits, unless they're the military and have that kind of money.
I'm not saying that building a house is any easier. I'd simply point out that we evaluate houses and bridges after a 30-year track record. If bad things happen, people get sued and there is some form of liability.
How many people behind software development (from the programmers up to the project managers) are liable for their work 30 years later?
Until we become liable for our work, there will be no incentive to measure and improve some of these metrics. Just my 2 cents.