Even if you dedicated a core and sat in a busy loop polling the NICs for new packets, you'd still have to wait for the receiving NIC to take in the whole packet, you'd still have to set up a DMA transfer to RAM, you'd still have to look up the address in an O(log n) trie too large to stay in the L3 cache, and you'd still have to set up a DMA transfer from RAM to the outbound NIC, which would in turn wait for the entire packet before beginning to transmit it.
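To make the software path concrete, here's a minimal sketch (my own illustration, not anyone's production code) of the kind of longest-prefix-match trie a software router walks, one destination-address bit per level. The route table, interface names, and helper functions are all made up for the example:

```python
# A binary trie for IPv4 longest-prefix match -- the pointer-chasing
# lookup that blows out the L3 cache once the table gets big.

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.next_hop = None          # set if a route terminates at this node

def insert(root, prefix, length, next_hop):
    """Install next_hop for the route prefix/length (prefix as a 32-bit int)."""
    node = root
    for i in range(length):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Walk the trie bit by bit, remembering the last (longest) match seen."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children[(addr >> (31 - i)) & 1]
        if node is None:
            break
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

def ip(s):
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

root = TrieNode()
insert(root, ip("10.0.0.0"), 8, "eth1")   # hypothetical routes
insert(root, ip("10.1.0.0"), 16, "eth2")
print(lookup(root, ip("10.1.2.3")))   # eth2 -- the more specific /16 wins
print(lookup(root, ip("10.9.9.9")))   # eth1 -- falls back to the /8
```

Every step down the trie is a dependent pointer dereference, which is exactly why a table that spills out of cache costs you a memory stall per bit.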
"Big Iron" routers don't do this. They wait until they have the whole packet. Then the address is looked up in the O(1) TCAM*, a special tri-state static ram that isn't present in your generic x86 machine. Then the packet is transmitted across the backplane to the outgoing interface without ever touching main memory or the the main processor.
Even then the packet tends to get buffered at least twice, with some nonzero probability of waiting in the buffer for other traffic to clear. And that's if you're using a high-quality service provider that avoids running links over 80% of capacity.
* TCAM = Ternary Content Addressable Memory. Bits are organized in rows, each holding an address or subnet. Each bit can take one of three states: 1, 0, or "don't care." The address to be looked up is presented at the top of the TCAM and compared against every row in parallel in a single clock cycle. The TCAM outputs the position of the first matching row.
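The matching semantics are easy to simulate in software (serially, of course, which is the whole point of the hardware). A sketch, with a made-up route table; each row is a (value, mask) pair where a zero mask bit means "don't care," and rows are ordered most-specific first so the first hit is the longest match:

```python
# Software simulation of a TCAM lookup. The hardware compares the key
# against every row simultaneously in one clock; here we just scan rows
# in priority order and stop at the first hit.

def tcam_rows(routes):
    """Build (value, mask, port) rows from (prefix, length, port) triples,
    sorted most-specific first, as a router's control plane would order them."""
    rows = []
    for prefix, length, port in sorted(routes, key=lambda r: -r[1]):
        a, b, c, d = (int(x) for x in prefix.split("."))
        value = (a << 24) | (b << 16) | (c << 8) | d
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF if length else 0
        rows.append((value & mask, mask, port))
    return rows

def tcam_lookup(rows, addr):
    """Return the port of the first row whose cared-about bits match addr."""
    for value, mask, port in rows:
        if (addr & mask) == value:   # masked-off bits are "don't care"
            return port
    return None

rows = tcam_rows([("10.0.0.0", 8, "eth1"),    # hypothetical routes
                  ("10.1.0.0", 16, "eth2"),
                  ("0.0.0.0", 0, "discard")])  # default, matches anything
addr = (10 << 24) | (1 << 16) | (2 << 8) | 3   # 10.1.2.3
print(tcam_lookup(rows, addr))   # eth2 -- the /16 row sits above the /8
```

In the real part all rows fire in parallel and a priority encoder picks the first match, which is why the lookup cost doesn't grow with table size.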
Yes, it's a heater.