If MIT were to choose to turn, say, 18.128.x.x over to ARIN, they'd have to completely reconfigure their network
True. If they returned any addresses at all, they could no longer announce a route to 18.0.0.0/8. (I don't know whether they actually announce it like that, but holding a class A, they could.) If they could no longer announce a /8, they would have to announce smaller blocks. If they were able to squeeze everything into half the addresses they have now, they could announce one /9 and return the other /9. If that is not possible, they would be forced to announce their addresses as multiple prefixes, thereby increasing the global routing table size and causing problems for everybody else.
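To make the prefix arithmetic concrete, here is a small sketch using Python's standard ipaddress module. It assumes the returned block is exactly 18.128.0.0/16 (my reading of "18.128.x.x") and says nothing about what MIT actually announces.

```python
import ipaddress

# The /8 from the question, used purely as an example.
whole = ipaddress.ip_network("18.0.0.0/8")

# Case 1: everything fits in half the space, so keep one /9 and return the other.
kept_half, returned_half = whole.subnets(prefixlen_diff=1)

# Case 2: return only 18.128.0.0/16 and keep everything else.
returned = ipaddress.ip_network("18.128.0.0/16")
remaining = sorted(whole.address_exclude(returned))

print("keep", kept_half, "return", returned_half)
for prefix in remaining:
    print(prefix)
print(len(remaining), "prefixes to announce instead of 1")
```

Carving a single /16 out of the /8 leaves one covering prefix per bit between /9 and /16, so the one announcement turns into eight.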
If you have a /8 and want to return some of it, you shouldn't do so unless you really think it is feasible to squeeze all your usage into half the space you used to have. You also need to ensure that whatever address space you are left with is enough to last until everybody else has switched to dual stack (or IPv6 only). Since the purpose of returning addresses is to prolong the transition (which isn't much of a useful purpose), it also implies that the more addresses you return, the longer those you have left will have to last. That's not much of an incentive to return addresses.
Apart from BGP, is it really much of a problem to return just small fragments of address space? Do you have to change your network if you weren't using those addresses anyway? The answer to that question is of course yes. If you were to just change your BGP announcements to stop announcing some addresses that you weren't using anyway, nothing would break yet. However, if you returned them and they got reassigned, things would break if you had only changed your BGP announcements.
The reason things would break is that the route to those addresses would still end up somewhere inside the same network, if it originated there. Taking your example, a packet from 18.127.1.2 to 18.128.2.3 would never leave the MIT network even if the destination address had been returned, because the routers inside the MIT network would still think it was a destination within their network. It would propagate through their network until a point where some router would report no route to host. So both the BGP announcements and the routing inside the network would have to be redone before addresses could be returned.
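The effect can be sketched with a toy longest-prefix-match lookup. The two routing tables and the next-hop labels "internal" and "upstream" are made up for illustration; real interior routing is of course more involved.

```python
import ipaddress

def lookup(table, destination):
    """Return the next hop of the longest prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    # The longest matching prefix wins, as in ordinary IP forwarding.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

net = ipaddress.ip_network
default = (net("0.0.0.0/0"), "upstream")

# Hypothetical interior table that still carries the old aggregate.
stale = [default, (net("18.0.0.0/8"), "internal")]

# Interior table rebuilt without the returned 18.128.0.0/16.
returned = net("18.128.0.0/16")
updated = [default] + [
    (p, "internal") for p in net("18.0.0.0/8").address_exclude(returned)
]

print(lookup(stale, "18.128.2.3"))    # stays inside: internal
print(lookup(updated, "18.128.2.3"))  # leaves the network: upstream
print(lookup(updated, "18.127.1.2"))  # still local: internal
```

With the stale table the packet to the returned address never reaches the border, no matter what is announced over BGP.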
It would make more sense for them to set up a brand new IPv6 network, slowly migrate everything there, and once it's all done and if they don't need IPv4 anymore, they can then turn the entire 18.x.x.x back to ARIN.
Even once they have everything set up as dual stack, they would still want to keep IPv4 for as long as there was anybody else on IPv4 only who they wanted to communicate with. The point at which it makes sense to return the IPv4 addresses to ARIN is also the point at which nobody else would want them anymore. It is not like the demand for IPv4 addresses would drop to zero overnight, but it is probably going to be close to it.
Most likely we will eventually reach a point where nobody would bother to set up new IPv4 networks, and the demand for addresses will decrease. The decrease in demand will not be very visible, since there won't be much of a supply either. But those who are already dual stack won't turn down their IPv4 support right away. Just as there isn't much to gain from turning up IPv4 when everybody else is dual stack, there won't be much to gain from turning down IPv4 support.
At a later date the work of administering IPv4 in parallel with IPv6 will be more significant than the work it will take to remove the last few IPv4 dependencies. At that point people will turn down IPv4 and return the addresses. And nobody will care about the returned addresses.
It's true that the initial allocation of IPv4 addresses was badly done
Effectively the original addressing architecture capped the achievable HD ratio somewhere around 75%. Nowadays we are doing 80-90%, at the cost of more administrative overhead. The bad decision was not so much the addressing architecture as the limited size of the addresses. A 64-bit address with a 60-70% HD ratio would have allowed for more hosts than a 32-bit address with a 90% HD ratio.
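A quick way to check that comparison, assuming the usual definition of the HD ratio as log(hosts) / log(address space size), so that the host count is 2^(bits * HD). The function name is my own.

```python
def max_hosts(address_bits, hd_ratio):
    """Hosts reachable at a given HD ratio, assuming HD = log(hosts) / log(2**bits)."""
    return 2 ** (address_bits * hd_ratio)

for bits, hd in [(32, 0.90), (64, 0.60), (64, 0.70)]:
    print(f"{bits}-bit at HD {hd:.0%}: about {max_hosts(bits, hd):.3g} hosts")
```

The exponent is what matters: 64 * 0.6 = 38.4 bits' worth of hosts against 32 * 0.9 = 28.8, a gap of nearly ten bits in favour of the longer address.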
The proper design is to figure out how high you want to push the HD ratio (with the knowledge that administration becomes a burden if you try to push it over 80%) and how many devices you want to support, and then with those two numbers in mind compute the required address size and design an addressing architecture to match it. Of course we shouldn't blame the people who designed it at the time. They didn't know then what we know today. The entire research on HD ratios is more recent than the design of IPv4.
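The computation described here can be sketched by solving the HD ratio definition for the address length. The device count of ten billion below is an arbitrary number picked for illustration, not a figure from any standard.

```python
import math

def required_bits(devices, hd_ratio):
    """Smallest address length that fits `devices` hosts at the given HD ratio.

    Assumes HD = log(devices) / log(2**bits), solved for bits and rounded up.
    """
    return math.ceil(math.log2(devices) / hd_ratio)

# Hypothetical design target: ten billion devices.
for hd in (0.9, 0.8, 0.6):
    print(f"HD {hd:.0%}: {required_bits(10**10, hd)} bits")
```

Lowering the target HD ratio costs address bits but buys administrative slack, which is the trade-off the paragraph above describes.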
IPv6 addressing architecture was designed for an HD ratio of 80% on bit positions 3-47, and a much lower HD ratio on bit positions 48-127. But overall it will most likely allow for more devices than this planet can sustain.