Apologies accepted, and my apologies to you. In my experience, trolls wouldn't reply to this sort of bait; I must have been wrong.
I don't know what sort of point I was supposed to have missed, really, given that I started this sub-thread with an example; if anything - and yes, I'm nitpicking now - everyone else in this sub-thread is missing my point, pretty much by definition.
So let's get back to the point I was trying to make: efficient techniques for parallel programming are all about how you subdivide your data. This is completely language-independent. For a totally high-level approach to subdividing data (mostly) efficiently, look to MapReduce.
In a simple case on a single multi-core machine, using MapReduce means dividing your data set into N segments, where N is the number of threads you want to run in parallel. Take MapReduce to the level of networked machines, and the concept remains the same, except that you've now got M machines, each processing a chunk of the overall data (and perhaps, on each machine, you further subdivide that chunk into N segments).
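To make the single-machine case concrete, here's a minimal C++ sketch of that split. The function name and the choice of a plain sum as the workload are mine, purely for illustration, and it assumes n_threads is at least 1:

    #include <cstddef>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Single-machine MapReduce in miniature: split the input into N
    // contiguous segments, "map" over each segment in its own thread,
    // then "reduce" the per-segment results into one.
    long parallel_sum(const std::vector<int>& data, unsigned n_threads) {
        std::vector<long> partial(n_threads, 0);
        std::vector<std::thread> workers;
        const std::size_t chunk = data.size() / n_threads;

        for (unsigned t = 0; t < n_threads; ++t) {
            const std::size_t begin = t * chunk;
            // The last thread picks up whatever the division left over.
            const std::size_t end =
                (t + 1 == n_threads) ? data.size() : begin + chunk;
            workers.emplace_back([&, t, begin, end] {
                partial[t] = std::accumulate(data.begin() + begin,
                                             data.begin() + end, 0L);
            });
        }
        for (auto& w : workers) w.join();

        // The "reduce" step over the per-thread results. (Strictly, the
        // partial[] slots share cache lines - false sharing - which is
        // exactly the kind of layout issue I get to below.)
        return std::accumulate(partial.begin(), partial.end(), 0L);
    }

The interesting part isn't the sum; it's that each thread gets a contiguous, disjoint slice of the input. That's the subdivision doing all the work.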
The problem with talking about MapReduce in this context is that at this high level of abstraction, you might as well use OOP, too. It's a good example for illustrating the data partitioning problems that exist around writing parallel code, but not a good example for how OOP might stand in the way of efficient parallel execution.
I figured the best example would be one that takes the subdivision problem to the lowest level, which invariably means breaking the problem down to how efficiently a CPU's caches and their coherence protocol handle data, based on how it's subdivided in memory. Using C++-ish code as an example made sense, as C++ is a language where you *can* influence the data layout, and where you *can* lay out data in a fashion that's typical of OOP.
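To show what I mean by the cache protocol getting in the way, consider the classic false-sharing setup. The struct names are made up for this post, and std::hardware_destructive_interference_size assumes a C++17 compiler that actually ships it (substitute 64 on typical x86 hardware otherwise):

    #include <atomic>
    #include <new>     // std::hardware_destructive_interference_size
    #include <thread>

    // Two logically independent counters, lumped together. They'll
    // typically land on the same cache line, so two threads updating
    // them "in parallel" keep bouncing that line between cores.
    struct Lumped {
        std::atomic<long> a{0};
        std::atomic<long> b{0};
    };

    // Same counters, each padded out to its own cache line; the threads
    // no longer contend for the same line at the coherence level.
    struct Padded {
        alignas(std::hardware_destructive_interference_size)
            std::atomic<long> a{0};
        alignas(std::hardware_destructive_interference_size)
            std::atomic<long> b{0};
    };

    template <typename Counters>
    void hammer(Counters& c, long iterations) {
        std::thread t1([&] { for (long i = 0; i < iterations; ++i) ++c.a; });
        std::thread t2([&] { for (long i = 0; i < iterations; ++i) ++c.b; });
        t1.join();
        t2.join();
    }

Same members, same work; the only difference is where the bytes sit, and on common hardware hammer() on the padded version tends to scale across two cores while the lumped one doesn't.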
As far as I'm concerned, OOP is inherently great at a number of things, and at first glance none of them impacts parallelism in any way. All of these things - encapsulation, access control, inheritance, etc. - have a side effect, though, because the very concept requires you to lump together data that's conceptually related, whether or not it's used together (you might feel like rephrasing that as "the very concept IS to lump data together", but that would ignore some OOP features). In that sense, OOP is inherently about preferring one data layout in memory over another, whether you think about it that way or not.
The downside of this data layout is that it makes it harder to split one set of relevant data off from another - that is, it makes *efficient* subdivision more difficult. Whether that's actually a problem depends, of course, on what you're trying to solve; the assumption here is that you want to work on one subset of your objects' data members in parallel with another task that touches a disjoint subset of those members.
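Concretely - and again, the names are invented for illustration - compare the OOP-typical "array of structs" with the same data split per member:

    #include <vector>

    // OOP-typical layout: everything that conceptually belongs to a
    // particle is lumped into one object, so memory holds
    // pos,vel,mass,pos,vel,mass,... ("array of structs", AoS).
    struct Particle {
        float pos[3];
        float vel[3];
        float mass;
    };
    std::vector<Particle> particles;

    // Layout split by member ("struct of arrays", SoA): each parallel
    // task can stream over exactly the members it needs, nothing else.
    struct Particles {
        std::vector<float> pos_x, pos_y, pos_z;
        std::vector<float> vel_x, vel_y, vel_z;
        std::vector<float> mass;
    };

With the first layout, a thread that only needs masses drags positions and velocities through the cache with every particle it touches; with the second, handing each of N threads a contiguous slice of the mass array gives it exactly the data it needs.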
You're arguing that whether or not data is, in fact, laid out in memory in a "bad" way depends on the language or compiler. To the degree that we're talking about C or C++, that's just not the case - but of course clever enough compilers for other OOP languages might exist. I doubt it, though I would love to be proven wrong, with code to read. The reason I doubt it is primarily that this is pretty damn hard without marking your data as "please parallelize", and none of the programming languages I know/use provide a (portable, standardized) mechanism for that - never mind whether they're OOP. Someone mentioned Haskell here; I don't know the language well enough to say whether such a mechanism exists.
Does this explain things better?