> MTBF ... the machine will fail before it can compute anything meaningful
MTBF is statistical, though, and it can be overcome. Look at it this way: surely the totality of servers on the Internet exceeds exascale computing power, yet how many of those servers fail at any given instant? Perhaps a few. Still, when I surf to my favorite sites they are almost always up, which means their operators are doing something to keep them reliable. Such measures may increase the cost of each node, but if you want enough uptime to finish the job, they are required.
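To put rough numbers on "perhaps a few" (every figure below is an illustrative assumption, not a vendor spec): a fleet of N independent nodes with per-node MTBF of M hours sees about N/M failures per hour, and with a mean repair time of R hours, roughly N*R/(M+R) nodes are down at any given instant.

    # Back-of-the-envelope failure math; all figures are illustrative assumptions.
    def failures_per_hour(num_nodes, mtbf_hours):
        # With N independent nodes, the aggregate failure rate is roughly N / MTBF.
        return num_nodes / mtbf_hours

    def nodes_down_at_any_instant(num_nodes, mtbf_hours, mttr_hours):
        # Each node is down a fraction MTTR / (MTBF + MTTR) of the time.
        return num_nodes * mttr_hours / (mtbf_hours + mttr_hours)

    n = 100_000        # hypothetical exascale node count
    mtbf = 100_000.0   # assumed per-node MTBF in hours (about 11 years)
    mttr = 4.0         # assumed mean time to repair or swap a node, in hours

    print(failures_per_hour(n, mtbf))                # ~1 failure per hour
    print(nodes_down_at_any_instant(n, mtbf, mttr))  # ~4 nodes down on average

Under those made-up numbers the machine loses about one node an hour, so "a few" down at once is plausible, and the job survives as long as the system can work around the down nodes.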
Node reliability might receive less investment for the sake of keeping nodes compact, though, so it comes down to the manufacturers to increase MTBF for all consumers. And assemblers have to be careful not to slam the hardware around.
Also, each node or cluster has to be tested or probed periodically to determine whether it is reliable. If a node or cluster performs a known calculation correctly when probed at random times, its results may be deemed correct. If not, then the circumstances that cause it to misbehave have to be worked around, or the node has to be replaced. From such checks, a highly reliable system may emerge.
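Here is a minimal sketch of that probing idea, assuming a made-up Node class and thresholds (nothing here is a real scheduler's API): ship a calculation with a known answer to randomly chosen nodes, and quarantine any node that repeatedly gets it wrong.

    import random

    PROBE_INPUT = range(1_000)
    EXPECTED = sum(x * x for x in PROBE_INPUT)  # answer is known in advance
    MAX_FAILURES = 3                            # quarantine after this many misses

    class Node:
        def __init__(self, node_id):
            self.node_id = node_id
            self.failures = 0
            self.quarantined = False

        def run_probe(self):
            # Stand-in for dispatching the probe job to the real hardware;
            # a flaky node would occasionally return a wrong sum here.
            return sum(x * x for x in PROBE_INPUT)

    def spot_check(nodes):
        # Probe a random tenth of the fleet; flag repeat offenders.
        for node in random.sample(nodes, k=max(1, len(nodes) // 10)):
            if node.quarantined:
                continue
            if node.run_probe() != EXPECTED:
                node.failures += 1
                if node.failures >= MAX_FAILURES:
                    node.quarantined = True  # work around or replace

    nodes = [Node(i) for i in range(100)]
    spot_check(nodes)  # a scheduler would call this periodically, at random times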
The exascale system will be built, but there may be a limit on how many nodes any organization is willing to pony up for. If there is a huge leap in size from the previous #1 system, there is the obvious expense to consider, as well as the obsolescence factor: new technology will make the same speed achievable for less only a few years later.