Comment Re:Google/FB/etc are Embarassingly Parallel (Score 1) 112
Most problems can't be solved by a single MapReduce job. MapReduce tasks are normally written as a series of MapReduce jobs, where each map runs on the output of the previous reduce.
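To make the chaining concrete, here is a rough in-memory sketch of two jobs run back to back. Nothing here is a real framework API: run_job, map_words, reduce_sum and map_by_count are names made up for illustration, and a real framework would write each job's output to disk and spread the work across nodes.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Hypothetical helper: one map / shuffle / reduce pass, all in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # shuffle: group values by key
    # reduce each key in sorted order so the output is deterministic
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

# Job 1: count how often each word occurs.
def map_words(line):
    for word in line.split():
        yield word, 1

def reduce_sum(key, values):
    yield key, sum(values)

# Job 2: its map runs on the output of job 1's reduce and counts
# how many distinct words share each frequency.
def map_by_count(pair):
    _word, count = pair
    yield count, 1

lines = ["a b a", "b c a"]
word_counts = run_job(lines, map_words, reduce_sum)
histogram = run_job(word_counts, map_by_count, reduce_sum)
print(word_counts)
print(histogram)
```

The second job can't start until the first one's reduce has finished, which is why the output of each job is a natural place to resume from.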
Since MapReduce jobs write their output to disk at every step, each step can be thought of as a form of checkpointing. The difference between MapReduce and MPI checkpointing is that MPI has to restart the whole job from the checkpoint, while MapReduce frameworks can rerun just the work assigned to the failed nodes. In the MapReduce model, the checkpoint interval stays constant as nodes are added. With MPI, the checkpoint interval has to shrink as nodes are added, because more nodes means a higher chance that at least one node fails in a given time interval and forces the entire job to restart from the checkpoint.
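A quick back-of-the-envelope for the scaling argument, assuming node failures are independent and every node has the same chance of failing in a given interval. The 0.001 per-node figure is a made-up number for illustration, not a measurement, and p_any_failure is just a name chosen for this sketch.

```python
def p_any_failure(nodes, p_node):
    """Chance that at least one of `nodes` independent nodes fails in one
    interval, given each fails with probability `p_node` (an assumption)."""
    return 1.0 - (1.0 - p_node) ** nodes

# Hypothetical per-node failure probability per checkpoint interval.
P_NODE = 0.001

for nodes in (10, 100, 1000, 10000):
    print(nodes, round(p_any_failure(nodes, P_NODE), 4))
```

With these assumptions the chance of a failure somewhere in the cluster climbs toward 1 as nodes are added. For an MPI job every such failure costs a full restart from the checkpoint, so the interval has to get shorter to keep the lost work bounded; for MapReduce a failure only costs the failed node's share of the current step.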
Not all MPI jobs can easily be rewritten as MapReduce jobs, but MapReduce does address the problem discussed in the article.