Many years ago, the node name matched the polysilicon gate length. Each subsequent node was 0.7 times the prior one; that is, the linear shrink factor was 0.7. Note that 0.7 * 0.7 = 0.49, or roughly 0.5, so a transistor on the new node takes up about half the area it did on the prior node. Put another way, you can fit twice as many transistors in the same chip area. This is the key: the transistor budget doubles each generation. There were other benefits too, such as increased transistor drive strength. However, the routing area available per transistor is also cut in half, so you have to add metal layers to maintain routing capability.
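The shrink arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming an ideal, uniform 0.7 linear shrink in both dimensions; the function names and the 250 nm starting point in the usage line are hypothetical, chosen only to show the numbers.

```python
SHRINK = 0.7  # linear shrink factor per node, as described in the text


def node_sequence(start_nm: float, generations: int) -> list[float]:
    """Feature size for each generation, each one 0.7x the previous (idealized)."""
    return [start_nm * SHRINK**g for g in range(generations)]


def area_factor(shrink: float = SHRINK) -> float:
    """Area scales with the square of the linear shrink: 0.7 * 0.7 = 0.49."""
    return shrink * shrink


def relative_density(generations: int, shrink: float = SHRINK) -> float:
    """Transistors per unit area relative to generation 0 (inverse of cumulative area factor)."""
    return (1.0 / area_factor(shrink)) ** generations


if __name__ == "__main__":
    print([round(x, 1) for x in node_sequence(250.0, 5)])  # hypothetical 250 nm start
    print(round(relative_density(1), 2))  # density gain for one generation
```

With an exact 0.7 shrink, density grows by 1 / 0.49, or about 2.04x per generation, which is why the text rounds 0.49 to 0.5 and speaks of doubling.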
As the technology advanced, lithography could no longer sustain the 0.7 shrink. However, other methods were used to keep doubling the transistor budget. So the node name no longer matches the polysilicon gate length, but transistor density still doubled each node, preserving the effective 0.7 shrink factor. Advanced transistor engineering brought other improvements as well. It used to be that PMOS transistors had half the transconductance of NMOS transistors, so the PMOS gate had to be 2X wider than the NMOS gate to build a balanced CMOS driver. Now, through strain engineering and other methods, PMOS transconductance is matched to NMOS.
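Both points in the paragraph above reduce to simple ratios, sketched here under stated assumptions. The first function assumes a cell's area is just width times height, so if one dimension (say, the one tied to gate length) shrinks less than 0.7, the other must make up the difference to still halve the area. The second uses a first-order model, assumed here, in which drive strength is proportional to transconductance per unit width times gate width. Function names and the 0.85 example are hypothetical.

```python
def required_other_shrink(target_area_factor: float, shrink_x: float) -> float:
    """If one cell dimension shrinks by shrink_x, return how much the other
    dimension must shrink so that area (x * y) scales by target_area_factor."""
    if shrink_x <= 0 or target_area_factor <= 0:
        raise ValueError("factors must be positive")
    return target_area_factor / shrink_x


def pmos_width(nmos_width: float, pmos_to_nmos_gm: float) -> float:
    """First-order sizing (assumed model): drive ~ gm_per_width * width, so a
    PMOS with a lower gm ratio needs a proportionally wider gate to match NMOS."""
    if pmos_to_nmos_gm <= 0:
        raise ValueError("transconductance ratio must be positive")
    return nmos_width / pmos_to_nmos_gm


if __name__ == "__main__":
    # Hypothetical: one dimension only shrinks 0.85x; density still must double.
    print(round(required_other_shrink(0.5, 0.85), 3))
    print(pmos_width(1.0, 0.5))  # older PMOS at half the NMOS transconductance
    print(pmos_width(1.0, 1.0))  # PMOS matched to NMOS
```

The point of the first function is that "density doubled" and "gate length shrank 0.7x" are separate claims: a smaller shrink in one direction can be offset by a larger shrink elsewhere.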
Maximum chip size is limited by the exposure field size of the lithography tools, which is roughly the size of your thumbnail. Think of each new node name as simply communicating that the next node can fit twice as many transistors into that thumbnail-sized area.
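Because the maximum die area is fixed by the exposure field, the transistor budget for the largest possible chip is just density times field area, doubling each node. This sketch assumes the commonly cited full scanner field of 26 mm x 33 mm (858 mm^2); the starting density of 1 million transistors per mm^2 in the usage line is purely hypothetical.

```python
FIELD_MM2 = 26 * 33  # commonly cited maximum exposure field, mm^2 (assumption)


def transistors_in_field(density_per_mm2: float, field_mm2: float = FIELD_MM2) -> float:
    """Transistor budget of a die that fills the whole exposure field."""
    return density_per_mm2 * field_mm2


def budget_after(generations: int, start_density_per_mm2: float,
                 field_mm2: float = FIELD_MM2) -> float:
    """Budget after n node generations, assuming density doubles each node."""
    return transistors_in_field(start_density_per_mm2, field_mm2) * 2**generations


if __name__ == "__main__":
    start = 1e6  # hypothetical starting density, transistors per mm^2
    for n in range(4):
        print(n, f"{budget_after(n, start):.3e}")
```

The field area never changes in this model; only the density term grows, which is the sense in which a node name just advertises twice as many transistors in the same area.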
Why has this transistor doubling been achieved every node for decades now? Executive bonuses. In TD (technology development), executive bonuses are tied to achieving that doubling.