Because in 1981 or so, everybody was pretty sure that this fairly obscure educational network would *never* need more than about 4 billion addresses... and they were *obviously right*.
Well, maybe. Back then home computers were already a growth area and so it was obvious that one computer per household would eventually become the norm. If you wanted to put these all on IPv4, then it would be cramped. The growth in mobile devices and multi-computer households might have been a bit surprising to someone in 1981, but you'd have wanted to add some headroom.
When 2% of your address space is consumed, you are just under 6 doublings away from full consumption. Even if you assume an entire decade per doubling, that's less than an average lifetime before you're doing it all over again.
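For anyone who wants to check the doubling arithmetic, here is a minimal sketch. The function names are made up for illustration, and it assumes pure exponential growth at a fixed doubling period, which is a simplification rather than a forecast.

```python
import math


def doublings_until_full(fraction_used: float) -> float:
    """Number of doublings before usage reaches 100% of the space."""
    if not 0.0 < fraction_used <= 1.0:
        raise ValueError("fraction_used must be in (0, 1]")
    # Each doubling multiplies usage by 2, so solve fraction_used * 2**n == 1.
    return math.log2(1.0 / fraction_used)


def years_until_full(fraction_used: float, years_per_doubling: float) -> float:
    """Time to exhaustion, assuming one doubling every years_per_doubling."""
    return doublings_until_full(fraction_used) * years_per_doubling


if __name__ == "__main__":
    n = doublings_until_full(0.02)
    print(f"doublings from 2% used: {n:.2f}")
    print(f"years at a decade per doubling: {years_until_full(0.02, 10):.0f}")
```

Starting from 2% used, this comes out at a little under six doublings, so somewhere below 60 years at a decade per doubling.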
With IPv6, you can have 4 billion networks for every IPv4 address. Doublings are much easier to think about in base 2: one bit per doubling. We've used all of the IPv4 addresses. Many of those are for NAT'd networks, so let's assume that they all are and that we're going to want one IPv6 subnet for each IPv4 address currently assigned during the transition. That's 32 bits gone. Assuming that we're using a /48 for every subnet, then that gives us 16 more doublings (160 years by your calculations). If we're using /64s, then that's 32 doublings (320 years). I hope that's within my lifetime, but I suspect that it won't be.
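The one-bit-per-doubling arithmetic can be sketched like this. The names are hypothetical, and the assumptions are the ones stated above: 32 prefix bits are already spent on one subnet per IPv4 address, and each doubling takes a decade.

```python
IPV4_BITS = 32  # assumption: one IPv6 subnet for every IPv4 address


def remaining_doublings(subnet_prefix_len: int, bits_spent: int = IPV4_BITS) -> int:
    """Each unused prefix bit allows one more doubling of the subnet count."""
    if subnet_prefix_len < bits_spent:
        raise ValueError("prefix is shorter than the bits already spent")
    return subnet_prefix_len - bits_spent


def years_of_headroom(subnet_prefix_len: int, years_per_doubling: int = 10) -> int:
    """Years until the subnets run out at a fixed doubling period."""
    return remaining_doublings(subnet_prefix_len) * years_per_doubling


def subnets_per_ipv4_address(subnet_prefix_len: int = 64) -> int:
    """How many subnets of this size exist for each IPv4 address."""
    return 2 ** (subnet_prefix_len - IPV4_BITS)


if __name__ == "__main__":
    for prefix in (48, 64):
        print(f"/{prefix}: {remaining_doublings(prefix)} doublings, "
              f"{years_of_headroom(prefix)} years")
    print(f"/64 subnets per IPv4 address: {subnets_per_ipv4_address():,}")
```

The last figure is 2^32, which is where the "4 billion networks for every IPv4 address" number comes from.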
In practice, I suspect that the growth will be a bit different. Most of the current growth is multiple devices per household, which doesn't affect the number of subnets: a single /64 will keep a house happy with a nice sparse network, even if every single physical object that you own gets a microcontroller and participates in IoT things using a globally routable address.
IMHO: what needs to happen next is to have a 16-bit field in the packet header to indicate the size of the address in use. This makes the address space not only dynamic, but MASSIVE without requiring all hardware on the face of the Earth to be updated any time the address space runs out.
This isn't really a workable idea. Routing tables need to be fast, which means that the hardware needs to be simple. For IPv4, you basically have a fast RAM block with 2^24 entries and switch on the first three bytes to determine where to send the packet. With IPv6, subnets are intended to be arranged hierarchically, so you end up with a simpler decision. With variable-length fields, you'd need something complex to parse them, and that would send you into the software slow path. This is a problem, because you'd then have a very simple DoS attack on backbone routers: just send them packets that claim very large address lengths and chew up CPU before they're dropped. You'd also have the same deployment headaches that IPv6 has. No one would buy routers that had fast paths for very large addresses now just because we might need them in 100 years, so no one would test that path at a large scale, and you'd avoid the DoS by just dropping all packets that used an address size other than 4 or 16 bytes. In 100 years (i.e. well over 50 backbone router upgrades), people might start caring and buy routers that could handle 16 or 32 byte address fields, but that upgrade path is already possible: the field that you're looking for is called the version field in the IP header.
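To make the "2^24 entries, switch on the first three bytes" point concrete, here is a toy software model of that kind of table. It is only a sketch: the names are made up, it assumes one byte of output-port number per entry, it ignores routes longer than /24, and real routers do this in dedicated hardware rather than in Python.

```python
import ipaddress

TABLE_BITS = 24  # index on the first three bytes of the destination


def make_table() -> bytearray:
    """One port byte per /24: 2**24 entries, all pointing at port 0."""
    return bytearray(1 << TABLE_BITS)


def add_route(table: bytearray, prefix: str, port: int) -> None:
    """Fill every /24 slot covered by prefix with the given port.

    Add less specific routes first so that more specific ones overwrite them.
    """
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or net.prefixlen > TABLE_BITS:
        raise ValueError("this sketch only handles IPv4 routes up to /24")
    if not 0 <= port <= 255:
        raise ValueError("port must fit in one byte")
    start = int(net.network_address) >> 8
    count = 1 << (TABLE_BITS - net.prefixlen)
    table[start:start + count] = bytes([port]) * count


def lookup(table: bytearray, dst: str) -> int:
    """Forwarding decision: drop the last byte and index the table once."""
    return table[int(ipaddress.IPv4Address(dst)) >> 8]


if __name__ == "__main__":
    table = make_table()
    add_route(table, "10.0.0.0/8", 1)
    add_route(table, "10.1.2.0/24", 2)
    for dst in ("10.1.2.77", "10.9.9.9", "192.0.2.1"):
        print(dst, "-> port", lookup(table, dst))
```

The lookup is one shift and one array index, with no branching on the packet contents. A variable-length address would need the length parsed and validated before the router even knows which table to consult, and that is the part that ends up in the slow path.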