Just to expand on that a little, the issue with parallel buses became that the data on each line would arrive out of sync with the other lines and with the clock (this is known as skew).
For example, PCIe 4.0 runs at 16 gigabits/second per serial lane. Each bit lasts 0.0625 nanoseconds, during which time light can travel about 18mm (and a signal on a circuit board is slower than light in a vacuum). If you wanted to transfer, say, 32 bits in parallel like the old PCI bus, you would need 32 connections. The problem is that for practical reasons the PCIe slot needs to be quite wide, so if your 32 pins are spread over, say, 18mm, then the extra distance that a signal at one end has to travel compared to a signal at the other end means it arrives about one bit time later than the other.
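If you want to check those numbers yourself, here is a rough sketch in Python. The function names are just made up for illustration, and the 0.5 velocity factor for a circuit board trace is my own ballpark assumption, not a measured figure.

```python
# Back-of-the-envelope numbers for a 16 Gb/s lane, using the figures from the post.
C_MM_PER_NS = 299.792458  # speed of light in vacuum, in mm per nanosecond


def bit_period_ns(gbit_per_s):
    """Duration of one bit in nanoseconds at the given line rate."""
    return 1.0 / gbit_per_s


def distance_per_bit_mm(gbit_per_s, velocity_factor=1.0):
    """Distance a signal covers during one bit period.

    velocity_factor is the signal speed as a fraction of c
    (1.0 = vacuum; anything lower is an assumed value for a real trace).
    """
    return bit_period_ns(gbit_per_s) * C_MM_PER_NS * velocity_factor


def skew_in_bits(extra_path_mm, gbit_per_s, velocity_factor=1.0):
    """Arrival delay caused by extra path length, measured in bit periods."""
    return extra_path_mm / distance_per_bit_mm(gbit_per_s, velocity_factor)


print(f"bit period:        {bit_period_ns(16)} ns")
print(f"distance per bit:  {distance_per_bit_mm(16):.1f} mm")
print(f"skew over 18 mm:   {skew_in_bits(18, 16):.2f} bits (vacuum)")
print(f"skew over 18 mm:   {skew_in_bits(18, 16, 0.5):.2f} bits (assumed 0.5c trace)")
```

At the speed of light 18mm of extra path comes out at just under one bit, and at the assumed half speed of a real trace it is closer to two, so the problem is if anything worse than the simple estimate.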
Then there is clock jitter. With a 16GHz clock each cycle is only 62.5 picoseconds long, so even a tiny variation from one cycle to the next is a huge problem, and low-jitter clocks at that speed are very expensive and tricky to use.
You also need to use very small signals when you get up to those clock rates. There is something called slew rate, which is the speed at which a voltage can change. You can either make the change happen faster, which we have already done to the point where going further is quite hard, or you can lower the voltage swing so the signal doesn't have as far to go. But lowering the voltage also makes the signal more prone to electrical noise.
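To put the slew rate trade-off in numbers, here is a small sketch. The 20 V/ns slew rate and the two voltage swings are purely hypothetical values I picked to show the effect; they are not taken from any PCIe spec.

```python
# How long does an edge take for a given voltage swing and slew rate?
BIT_PERIOD_PS = 62.5          # one bit at 16 Gb/s, from the post
SLEW_V_PER_NS = 20.0          # hypothetical driver slew rate


def transition_time_ps(swing_volts, slew_v_per_ns):
    """Time for the signal to cross the full swing, in picoseconds."""
    return swing_volts / slew_v_per_ns * 1000.0


for swing in (3.3, 0.8):      # hypothetical large and small swings, in volts
    t = transition_time_ps(swing, SLEW_V_PER_NS)
    print(f"{swing} V swing: {t:.1f} ps edge, "
          f"{t / BIT_PERIOD_PS:.2f} bit periods")
```

With these made-up figures the 3.3 V edge takes roughly 165 ps, well over two bit periods, so the line never settles before the next bit arrives. The 0.8 V edge takes roughly 40 ps and fits inside one bit, which is why the swing has to shrink as the rate goes up.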
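The solution is to use serial. One differential signal carries both clock and data: the clock is embedded in the data stream and the receiver recovers it, so there is no separate clock line for the data to drift against. Differential signals also have much better noise immunity, because the receiver only looks at the difference between the two lines and noise picked up equally by both cancels out. Note that the two lines carry the same bit stream with opposite polarity, so each one still toggles at the full 16Gb/sec; the 8GHz figure you often see quoted is the fundamental frequency of an alternating 1010 pattern at that rate. When more bandwidth is needed, PCIe adds more lanes, but each lane recovers its own clock, so the lanes don't have to arrive in lock-step. It also means your cables are smaller and easier to manage, and with far fewer pins and traces the connectors and boards are simpler and cheaper.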
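Here is a toy simulation of why differential signalling shrugs off noise. It is a deliberately idealised sketch: the noise is assumed to hit both lines identically (real noise is never perfectly common to both), and the swing and noise amplitude are made-up values.

```python
import random


def send_differential(bits, swing=0.4):
    """Return (p, n) voltage pairs; the two lines carry opposite polarities."""
    half = swing / 2
    return [(half, -half) if b else (-half, half) for b in bits]


def add_common_mode_noise(pairs, amplitude, rng):
    """Add the same random offset to both lines of each pair (idealised noise)."""
    noisy = []
    for p, n in pairs:
        noise = rng.uniform(-amplitude, amplitude)
        noisy.append((p + noise, n + noise))
    return noisy


def receive_differential(pairs):
    """Decide each bit from the difference between the two lines."""
    return [1 if (p - n) > 0 else 0 for p, n in pairs]


def receive_single_ended(pairs, threshold=0.0):
    """Decide each bit from one line only, against a fixed threshold."""
    return [1 if p > threshold else 0 for p, _ in pairs]


rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(1000)]
noisy = add_common_mode_noise(send_differential(bits), amplitude=1.0, rng=rng)

diff_errors = sum(a != b for a, b in zip(bits, receive_differential(noisy)))
se_errors = sum(a != b for a, b in zip(bits, receive_single_ended(noisy)))
print("differential errors:", diff_errors)
print("single-ended errors:", se_errors)
```

The noise here is several times larger than the signal, yet the differential receiver gets every bit right because the offset subtracts out, while a receiver watching just one line gets a large share of them wrong.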