If the CPU in the IoT Device is powerful enough to make offloading actually worthwhile, isn't that CPU way overkill for the IoT Device's primary function?
Not at all. The CPU is fast to reduce latency. This not only meets response targets, but it also means the CPU can shut down after a very short time, saving power.
This is especially important on battery-powered devices. If the CPU is off except for a couple of milliseconds every few seconds, a battery can last for years.
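To put rough numbers on that, here's a back-of-the-envelope duty-cycle sketch in C. The currents, the on-time, and the coin-cell capacity are illustrative assumptions, not figures from any particular datasheet:

    /* Back-of-the-envelope duty-cycle estimate. All numbers are assumptions
     * for illustration, not figures from any particular datasheet. */
    #include <stdio.h>

    int main(void)
    {
        double active_ma   = 8.0;    /* assumed current while CPU + radio are awake */
        double sleep_ua    = 1.5;    /* assumed deep-sleep current, in microamps    */
        double active_ms   = 3.0;    /* awake for a few milliseconds...             */
        double period_ms   = 2000.0; /* ...every couple of seconds                  */
        double battery_mah = 220.0;  /* roughly a CR2032 coin cell                  */

        double duty   = active_ms / period_ms;
        double avg_ma = duty * active_ma + (1.0 - duty) * (sleep_ua / 1000.0);
        double hours  = battery_mah / avg_ma;

        printf("average current: %.4f mA\n", avg_ma);
        printf("battery life:    %.0f hours (~%.1f years)\n",
               hours, hours / (24.0 * 365.0));
        return 0;
    }

With those assumed numbers the average drain is about 0.014 mA, which stretches a little coin cell out to roughly two years.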
The CPU is also fast because it's made of small components packed close together. It's built using the same fabrication technology as current large chips, and making it physically small means many chips per wafer, which means low cost per chip. If that also makes it fast, so much the better.
As long as you're not using extra power to increase the speed further, there's no problem with a processor being "too fast". That just means it can go to sleep sooner. In fact, slowing it down can be expensive: slower not only means the power is on longer, it also usually means bigger components, which require more electrons to change their voltage. The more electrons the battery has to deliver, the more of it gets used up. Oops!
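The same point in energy-per-job terms, as a hedged sketch: assume doubling the clock less than doubles the active current but halves how long the CPU must stay awake for a fixed job. These figures are made up for illustration:

    /* "Race to sleep" energy comparison. Numbers are illustrative assumptions.
     * Energy for the active burst is current * voltage * time; the fast case
     * finishes sooner, so it spends more of its life asleep. */
    #include <stdio.h>

    int main(void)
    {
        double volts = 3.0;
        /* Assumed: the faster clock draws more current while awake... */
        double fast_ma = 10.0, slow_ma = 6.0;
        /* ...but needs only half the awake time for the same job. */
        double fast_ms = 2.0,  slow_ms = 4.0;

        double fast_uj = fast_ma * volts * fast_ms; /* mA * V * ms = microjoules */
        double slow_uj = slow_ma * volts * slow_ms;

        printf("fast clock: %.0f uJ per job\n", fast_uj); /* 60 uJ */
        printf("slow clock: %.0f uJ per job\n", slow_uj); /* 72 uJ */
        return 0;
    }

Under those assumptions the "slower, gentler" part actually costs more energy per job, which is the point.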
Granted, the processors are powerful and cheap, and have a lot of computational potential. But there are other downsides to trying to use IoT devices as a computing resource.
One is memory: the volatile memory, which uses scarce power just holding its state, is very small, and the permanent memory, though it may be moderately large, is flash: VERY slow, VERY power-consuming to write (and the processor stalls while you're writing flash, screwing things up for its primary purpose).
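To make that stall concrete, here's a generic sketch of a polled on-chip flash write on a small MCU. The register names and addresses are hypothetical and every vendor's flash controller differs, but the busy-wait while the array is being programmed is typical; code can't execute from flash during it, so radio timing and everything else waits:

    #include <stdint.h>

    /* Hypothetical flash controller registers, for illustration only. */
    #define FLASH_CR   (*(volatile uint32_t *)0x40022010u) /* control register */
    #define FLASH_SR   (*(volatile uint32_t *)0x4002200Cu) /* status register  */
    #define FLASH_BUSY (1u << 0)
    #define FLASH_PG   (1u << 0)

    void flash_write_halfword(volatile uint16_t *addr, uint16_t value)
    {
        FLASH_CR |= FLASH_PG;            /* enter programming mode             */
        *addr = value;                   /* trigger the write                  */
        while (FLASH_SR & FLASH_BUSY) {  /* spin until the controller is done  */
            /* nothing useful runs from flash here; the radio schedule slips */
        }
        FLASH_CR &= ~FLASH_PG;           /* leave programming mode             */
    }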
Much of the current generation of IoT devices runs on either the Texas Instruments CC2541 (8051 processor, 8kB RAM, 256kB flash) and its relatives, or the Nordic nRF51822 (32-bit ARM® Cortex-M0 CPU, 32kB/16kB RAM, 256kB/128kB flash) and its family, and the next generation is an incremental improvement rather than a breakthrough. You can do a lot in a quarter megabyte of code space (if you're willing to work at it a bit, like we did in the early days of computing). But there's not a lot of elbow room there.
The tiny memories mean you don't have a lot of resources to throw at operating systems and extra work. In fact, though the communication stacks are pretty substantial (and use up a LOT of the flash!), the OSes are pretty rudimentary: mostly custom event-loop abstraction layers, talking to applications that are mostly event and callback handlers. Development environments encourage custom builds that leave out any library code or system services the applications don't actually use.
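Here's a minimal sketch of that event-loop-plus-callbacks style. The event names and the wait-for-event primitive are assumptions (in a real SDK the vendor stack owns the loop and the sleeping, and just calls back into application handlers):

    #include <stdint.h>

    typedef enum { EVT_NONE, EVT_RADIO_RX, EVT_TIMER, EVT_GPIO } event_t;

    static void on_radio_rx(void) { /* handle an incoming packet */ }
    static void on_timer(void)    { /* take a sensor reading     */ }
    static void on_gpio(void)     { /* debounce a button, etc.   */ }

    /* Stub standing in for the SDK's sleep-until-interrupt primitive.
     * In a real build this would sleep (WFI) and return whatever event
     * the interrupt handler posted. */
    static event_t wait_for_event(void)
    {
        return EVT_TIMER;
    }

    int main(void)
    {
        for (;;) {
            switch (wait_for_event()) {     /* CPU sleeps inside this call */
            case EVT_RADIO_RX: on_radio_rx(); break;
            case EVT_TIMER:    on_timer();    break;
            case EVT_GPIO:     on_gpio();     break;
            default:           break;
            }
        }
    }

That's the whole "operating system": no processes, no filesystem, no spare cycles advertised to anyone.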
Another downside is the lack of bandwidth for communicating between them. (Bluetooth Low Energy, for example, runs at one megaBIT per second, has a lot of overhead and tiny packets, and divides three "advertising" (connection establishment) channels, tucked in the cracks between 2.4GHz Wi-Fi channels, among ALL the machines in radio "earshot".) Maybe they can do a lot of deep thought - but getting the work to, and the results from, all those little guys will be a bottleneck.
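A rough arithmetic sketch of the gap between the raw PHY rate and what an application actually sees over a connection. The connection interval, packets-per-event, and payload figures below are assumptions that vary by stack and negotiated parameters, not spec guarantees:

    /* Rough effective-throughput estimate for a BLE 4.x link.
     * 1 Mbit/s is the raw PHY rate; the application sees far less. */
    #include <stdio.h>

    int main(void)
    {
        double conn_interval_ms  = 30.0;  /* assumed negotiated connection interval  */
        double packets_per_event = 4.0;   /* assumed; many stacks cap around here    */
        double payload_bytes     = 20.0;  /* ATT payload with the default 23-byte MTU */

        double bytes_per_s =
            (1000.0 / conn_interval_ms) * packets_per_event * payload_bytes;

        printf("raw PHY rate:       1000.0 kbit/s\n");
        printf("usable throughput: ~%.1f kbit/s (%.0f bytes/s)\n",
               bytes_per_s * 8.0 / 1000.0, bytes_per_s);
        return 0;
    }

With those assumptions you get on the order of 20 kbit/s of useful data - a long way from the headline megabit, and it's shared with everything else the device has to do.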
Moore's Law and the economic advantage of saving programmer time may change this in the future. But I'm not holding my breath waiting for "smart" lightbulbs to have large, standardized OSes making that "wasted" CPU power available to parasitic worms.