But if dual-stack is the expected norm, that kinda makes the "push to move everyone to v6 to solve the network address issue" a bit of a fail.
Not really: the v4 side will end up behind piles of NAT and generally suck, but that doesn't matter anywhere near as much if it's just for backwards compatibility rather than being all you've got.
I thought one of the goals of the v6 addressing space, at least initially, was to have "v4 compatibility" built into it, at least for some sense of local addresses -- so that you could talk to a v4 device on the same local network.
And it does have that. The main backwards compatibility method is to just use the v4 stack as-is. It's the easiest possible way to do it (you don't even have to do anything: your existing network does the job already) and it's guaranteed to be the most compatible (because you're already using it). It's also the only way to do it on a LAN, where you're talking directly to the other machine without a router in the way to translate.
You mention NAT64, and I can make some guesses as to how it operates, at least if the v6 machine is initiating the connection. You also mention that there are multiple ways to make this work; so why not have a single standard that works?
It operates roughly the same way NAT44 does, except with v6 addresses on the local side. You're right, it'll be outbound only, unless you configure a "port forward" (more of an IP forward).
("Roughly" because there is the issue of getting client programs to connect to 64:ff9b::203.0.113.1 instead of 203.0.113.1. Normally you do this by inventing fake DNS responses, which is what DNS64 does -- this is the "a few extra problems of its own" part.)
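To make the address mapping concrete, here's a rough Python sketch of how a v4 address gets embedded in the NAT64 prefix and pulled back out again. It assumes the well-known 64:ff9b::/96 prefix (a given network may use its own prefix instead), and the function names `synthesize` and `extract` are made up for illustration, not taken from any real NAT64 or DNS64 implementation.

```python
import ipaddress

# Well-known NAT64 prefix (assumption: the network uses this one, not a custom prefix).
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize(v4: str) -> ipaddress.IPv6Address:
    """Embed a v4 address in the low 32 bits of the NAT64 prefix.

    This is the address a DNS64 resolver would hand back in a fake AAAA record.
    """
    v4_addr = ipaddress.IPv4Address(v4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4_addr))


def extract(v6: str) -> ipaddress.IPv4Address:
    """Recover the v4 destination, as the NAT64 box does before translating."""
    v6_addr = ipaddress.IPv6Address(v6)
    if v6_addr not in NAT64_PREFIX:
        raise ValueError(f"{v6_addr} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(v6_addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    print(synthesize("203.0.113.1"))        # 64:ff9b::cb00:7101
    print(extract("64:ff9b::203.0.113.1"))  # 203.0.113.1
```

Note that 64:ff9b::203.0.113.1 and 64:ff9b::cb00:7101 are the same address; the dotted form is just a friendlier way of writing the last 32 bits.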
There are multiple transition methods because they target different scenarios. 6to4 allows a v6-capable device with only a (public) v4 address to talk to v6 hosts (and it gives you a