
Comment The only real argument against reviews is... (Score 1) 495

... time. And it is a good argument in many cases. Code reviews are great, but they only make sense if the reviewer provides good feedback and the original author has time to revisit their code and make changes based on that feedback. I've seen many work environments where such things were considered too expensive (in terms of time).

Comment I do this every day... (Score 1) 213

I mean, he ends up reading the subject of every email (check), and scanning through his spam to see if there are false positives (check). My ham volume is about as large as his, and my spam volume is significantly lower (ca. 30%) because I've got a good spam filter.

I don't see what the big deal is.

Submission + - Carry-On Luggage with detachable Laptop Bag? (not.available)

unwesen writes: I'm going on more and more short business trips, and am sick of checking in luggage when a carry-on bag holds everything I need. That's also convenient, as some airlines do not allow you to carry an extra laptop bag. My current solution is less than ideal, though: I basically squash my laptop bag into a duffel bag of the right dimensions. Surely there must be a better solution? My ideal would be a fairly sturdy carry-on backpack or trolley with a detachable laptop case just large enough to hold a 15" laptop, a magazine or two and a few pens. My searches haven't led to much; I've found one or two potential candidates, but they all seem to have serious drawbacks. Surely I can't be alone amongst the /. crowd with my requirements; do you have any suggestions? There seems to be plenty of luggage with a laptop compartment, but you can't take that to the office on its own...

Comment Re:Big difference (Score 1) 1486

And how does that relate to the point that people have faith in some things, but take a scientific approach to trying to understand others? The two are not mutually exclusive. It's not even contradictory for them to apply both approaches to the same thing at different times.

Trying to understand science in relation to individual faith is nonsensical; science works at the scope of humanity as a whole. It relies on many individuals being able to double-check many other individuals' claims, not on one individual proving their own.

Comment That post makes my brain hurt with its stupidity (Score 2) 1486

Let's make it simple: science is not what the scientific disciplines have found out. Science is a set of methods for furthering human knowledge.

To confuse the two is to misunderstand science so thoroughly that it pains me.

More precisely, science is a set of tools to guard against our individual fallacies (such as blind faith) contaminating the species' body of knowledge, by enabling each and every person to apply these tools and validate or disprove every piece of knowledge in existence. In other words, it doesn't bloody matter if you, the individual, believe in the tooth fairy when you can prove P != NP. Nor does it matter when you, the individual, don't even know what there is to prove, what the problem is. What matters is that someone else can verify or disprove either your proof of P != NP, or your belief in the tooth fairy. Or both.

To be fair, it's terrifying how people will take an "expert's opinion" on blind faith. The answer to that problem, though, is to teach the scientific process, so that people can make better choices about what to believe. The answer most certainly is not to suggest that science is little more than faith in a different set of beliefs.

Comment Re:Please enlighten me... (Score 1) 755

Apologies accepted, and my apologies to you. In my experience, trolls wouldn't reply to this sort of bait, so I must have been wrong.

I don't know what sort of point I was supposed to have missed, really, given that I started this sub-thread with an example; if anything - and yes, I'm nitpicking now - everyone else in this sub-thread is missing my point, pretty much by definition.

So let's get back to the point I was trying to make: efficient techniques for parallel programming are all about how you subdivide your data. This is completely language-independent. For a totally high-level approach to subdividing data (mostly) efficiently, look to MapReduce.

What you would do with MapReduce in a simple case on a single multi-core machine would be to divide your data set into N segments, where N is the number of threads you want to run in parallel. Take MapReduce to the level of networked machines, and the concept remains the same, except that you've got M machines to each process a chunk of the overall data (and maybe in addition, on each machine, you further subdivide the data into N segments).
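To make the single-machine case concrete, here's a minimal C++ sketch; the function and its names are made up for illustration, not taken from any MapReduce library. Each thread "maps" over its own contiguous segment, and the per-thread results are "reduced" at the end:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

long parallel_sum(const std::vector<int>& data, unsigned num_threads)
{
    std::vector<long> partial(num_threads, 0);
    std::vector<std::thread> workers;

    // Split the data into num_threads contiguous segments.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // "map": each thread reduces its own segment independently
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end   = std::min(data.size(), begin + chunk);
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0L);
        });
    }
    for (auto& w : workers) w.join();

    // "reduce": combine the per-thread results
    return std::accumulate(partial.begin(), partial.end(), 0L);
}

The same shape scales up to M networked machines: swap the threads for machines, and the per-segment work for whatever your real "map" step is.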

The problem with talking about MapReduce in this context is that at this high level of abstraction, you might as well use OOP, too. It's a good example for illustrating the data partitioning problems that exist around writing parallel code, but not a good example for how OOP might stand in the way of efficient parallel execution.

I figured the best example would be one that takes the subdivision problem to the lowest level, which invariably means breaking down the problem to how efficiently a CPU's cache protocol handles data, based on how it's subdivided in memory. Using C++-ish code as an example made sense, as C++ is a language where you *can* influence the data layout, and where you *can* lay out data in a fashion that's typical to OOP.

As far as I am concerned, OOP is inherently great at a number of things, and at first glance none of them impact parallelism in any way. All of these things - encapsulation, access control, inheritance, etc. - have a side-effect, though, because the very concept requires you to lump data together that's conceptually related, whether or not it's used together (You might feel like rephrasing that as "the very concept IS to lump data together", but that would ignore some OOP features). In that sense, OOP is inherently about preferring some data layout in memory over another, whether you think about it in that way or not.

The downside of this data layout is that it makes it harder to split one set of relevant data off from another, that is, it makes *efficient* subdivision more difficult. Whether or not that's the case of course depends on the problem you're trying to solve; the assumption must be that you want to solve a problem involving one subset of your objects' data members in parallel to another problem involving a disjoint subset of data members.
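As a hedged illustration of what I mean by layout (the names here are invented for the example, nothing more), compare these two ways of organizing the same data:

#include <vector>

// OOP-style layout: conceptually related members travel together in memory,
// whether or not they are used together.
struct Particle {
    float position;   // needed by problem A
    float velocity;   // needed by problem B
};
std::vector<Particle> particles;   // position and velocity interleaved

// Layout that subdivides cleanly: each subset of data is its own contiguous
// block, so each problem can stream through only what it actually needs.
std::vector<float> positions;      // problem A touches only this
std::vector<float> velocities;     // problem B touches only this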

You're arguing that whether or not data is, in fact, laid out in memory in a "bad" way depends on the language or compiler. To the degree that we're speaking about C or C++, that's just not the case - but of course clever enough compilers for other OOP languages might exist. I doubt it, though I would love to be proven wrong, with code to read. The reason I doubt it is primarily that it's pretty damn hard without marking your data as "please parallelize", and none of the programming languages I know or use provide a (portable, standardized) mechanism for that - never mind whether they're OOP. Someone mentioned Haskell here; I don't know the language well enough to know whether such a mechanism exists.

Does this explain things better?

Comment Re:This is just about as dumb... (Score 1) 755

Look elsewhere in the comments, there's a longer thread where I've commented :)

I'm personally neither opposed to nor in favour of dropping OOP. As far as I am concerned, a good CS graduate must know OOP. A good software engineer must know OOP. But whether or not they *start* with OOP is an entirely different question... because to be good, they also must know a lot of other things, including other programming paradigms.

What's dumb to me is to drop OOP completely. The principles of OOP can be taught to someone who understands programming in a matter of a few lessons. They won't come out of that being *good* at OOD/OOP, but they'll understand what it is. You can build on that in a sophomore course, much like CMU suggests.

Besides, the "being good" part happens after formal education anyway...

Comment Re:Please enlighten me... (Score 1) 755

I think you're trying to impress all of us with your supposed in-depth knowledge of processor architecture. I'm still waiting to be impressed.

I considered for a moment actually trying to clear up any misconceptions expressed in your latest comment, and then realized that you're not actually interested. You're just a troll.

Comment Re:Please enlighten me... (Score 1) 755

Nothing, absolutely nothing in this:

struct asdf {
    int a[200];
    int b[200];
};

is telling the compiler "i'm going to use these two arrays in different cores, so be so kind to generate assembler code to handle this efficiently".

Nothing to do with assembler code, everything to do with data layout in memory: how the data is loaded into cache lines, and whether multiple hardware threads accessing the same cache line will cause lots of cache invalidation.

So, it's a compiler problem, not an OOP problem. Two different levels of abstraction.

No, it's not a compiler problem. The C/C++ standards mandate certain things that make this unavoidable; structs must be contiguous bits in memory, and adjacent array elements must be adjacent in memory. The rest follows from there (and an understanding of cache architectures).
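As a rough illustration of what the standard does let you rely on (padding between members is technically allowed, but member order and array contiguity are not negotiable), the following assertions hold for the struct quoted above:

#include <cstddef>   // offsetof

static_assert(offsetof(asdf, a) == 0,
              "the first member sits at the start of the struct");
static_assert(offsetof(asdf, b) >= offsetof(asdf, a) + 200 * sizeof(int),
              "b is laid out after the whole of a");
static_assert(sizeof(asdf) >= 2 * 200 * sizeof(int),
              "every asdf object is one contiguous block holding both arrays");

That contiguity is exactly what ties the layout of your types to the cache behaviour discussed here.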

On the other hand, I have a problem with this statement in the context of the current discussion:

The bottleneck isn't CPU any longer, it's loading data from main memory into the CPU caches for the most part

Who cares? A worse bottleneck is reading from a RDBMS. The processor's cache is orders of magnitude apart if you think in terms of levels of abstraction.

Entirely dependent on your problem. If reading from an RDBMS is your bottleneck, then this discussion is of little or no worth to you. Just because you can't use this information doesn't make it wrong, though. There are plenty of people in the world to whom this type of optimization matters.

CS courses are presumably supposed to impart knowledge about the types of problems that exist with parallelization. The same underlying problem, for example, also exists when distributing work to multiple CPUs, or to multiple machines. MapReduce is pretty much the same principle applied at a much higher level of abstraction - and probably about the level of abstraction that matters to you if, in your experience, the RDBMS is the major bottleneck.

Comment Re:Please enlighten me... (Score 1) 755

Let's look at a single thing to do first and ignore parallel.

These days processors are incredibly fast at executing instructions on register values. The bottleneck isn't CPU any longer, it's loading data from main memory into the CPU caches for the most part. That's simplifying things a lot, but let's keep it at that.

With the example I've given, you've got two ints per struct, and a reasonably large number of these structs - the 200 I gave was just an example. You've also got a problem that involves only one of those ints, say, finding the highest in your set. Because you're loading both ints into the cache but only need half of them, you'll be forced to perform twice as many loads as if you had two arrays of ints.
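A made-up sketch of that scenario (the struct and function names are purely illustrative):

#include <algorithm>
#include <limits>
#include <vector>

struct pair_of_ints { int a; int b; };   // two ints per struct

int highest_a(const std::vector<pair_of_ints>& items)
{
    int best = std::numeric_limits<int>::min();
    for (const pair_of_ints& p : items)
        best = std::max(best, p.a);   // every cache line loaded here is half 'b' values we never use
    return best;
}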

Now add a second problem; say, calculate the sum of the other ints. The crucial part is that one of these problems may take longer than the other to execute. That means that at the point where one piece of code is done with the contents of a cache line, it'll try to replace it with the next chunk of data. The first problem, however, still needs the first chunk of data, and will cause it to be loaded into cache again the next time it executes. Instead of having "just" twice as many loads as you need, you might end up having hundreds.

This is not an unlikely scenario. The CPU caches get filled with data from main memory as it's requested, i.e. when the first int is accessed while solving the first problem. When the CPU switches to processing the second problem, that data is already in cache rather than being loaded into a separate cache line. Unless the two problems proceed at *exactly* the same speed, i.e. both request the next chunk of data at the same time, you *will* run into this issue.

Keep the two data sets in two separate arrays, and you'll use two cache lines. But they can be flushed and re-filled independently from each other, leading to fewer loads.
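A sketch of that variant, again with invented names; each thread streams through its own contiguous array, so the cache lines backing 'a' and 'b' can be filled and evicted independently:

#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

void solve_both_in_parallel()
{
    std::vector<int> a(1'000'000, 1);   // data for problem 1
    std::vector<int> b(1'000'000, 2);   // data for problem 2

    int  highest = 0;
    long sum     = 0;

    // Problem 1: find the highest value in 'a'.
    std::thread t1([&] { highest = *std::max_element(a.begin(), a.end()); });
    // Problem 2: sum the values in 'b'.
    std::thread t2([&] { sum = std::accumulate(b.begin(), b.end(), 0L); });

    t1.join();
    t2.join();
}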

This sort of problem doesn't occur for every type of parallelization, mind you. If you try to parallelize by offloading two pieces of work to two different machines, it's not an issue. If you've got a single problem to solve that covers both ints, you don't have that problem either - you can still parallelize, and parallelize pretty well, by running two hardware threads that each process half of the data set. And so on.

The point is that by getting too used to thinking in terms of arrays of structs instead of several separate arrays, you pretty much prevent yourself from solving certain parallelization problems efficiently. It's not that this *must* be bad, it's that it *can* be bad. But if you think in terms of arrays of structs, you're unlikely to even know.

Comment Re:Please enlighten me... (Score 1) 755

I probably should've posted this straight away, but I would recommend two pieces of reading:
- http://lwn.net/Articles/250967/ (all parts) gives a good idea of all the sorts of effects that memory layout can have on your program's performance, and what you can do about it (to the degree that you can).
- http://www.slideshare.net/naughty_dog/multiprocessor-game-loops-lessons-from-uncharted-2-among-thieves specifically speaks about how to improve parallel processing efficiency by (amongst other things) doing what I wrote about above.

Linking > repeating :D

Comment Re:Please enlighten me... (Score 1) 755

See, I don't even mind whether assembly or OOP in a high level language gets taught first. Either way, it helps to learn both to master all aspects of programming.

Personally, I'd prefer to start with assembly, but I've met plenty of people who were sufficiently confused by higher-level programming languages that starting with assembly wouldn't have been useful for them. And to be fair, plenty of programmers will never need to understand assembly, but still earn a decent living and contribute to the state of the art in other ways.

It really depends what you want to achieve. Get many people ready to earn a living? Might as well start with OOP. Get programmers to as high a level of understanding as possible? Probably start with assembly, then.

Comment Re:Please enlighten me... (Score 1) 755

No, but reading helps. I did not say "this is OOP" anywhere.

One of the unintended consequences of OOP is that data tends to be laid out in memory in a fashion similar to what I showed, which in turn is less than ideal for efficient parallelization of some tasks. That's all I said.

My point is that the struct you showed represents not a drawback of OOP, but an advantage that is lost in a procedural approach like that, which is the separation between the data layout and the code using the data structure, enabling changes to the data layout at any time without affecting existing code.

We're talking about parallelism here. You might have a point when talking about OOP in general, but when it comes to parallelizing code, this can be a huge drawback to efficiency. That's all I said, that's all I care about for the purposes of this conversation.

It's a problem that exists with certain approaches to creating parallel software, and has little to do with your programming language of choice in the first instance. Well, some languages don't let you control the layout of data in memory, for better or worse, so your choice does play a role... but it's more a question of paradigms than of languages.

I gave an example in Haskell exactly because it follows different paradigm, being functional instead of OO.

Indeed. Nobody says that other paradigms might not have similar problems. But a purely procedural approach wouldn't necessarily.

The OOP paradigm is problematic for parallel programming because it suggests the inefficient approach; it doesn't require you to write inefficient code, nor does it prevent you from writing efficient code. It's very unlikely that an inexperienced programmer will choose an approach that is in conflict with the paradigm they were taught, though.

But I don't see how OOP suggests the inefficient approach any more than other paradigms.

It suggests that you think of your data in terms of conglomerates of related data items (structures) first, and consider collections (lists, arrays) of such conglomerates second. Many times, parallelization is best achieved by picking these apart into one or more collections of non-conglomerated data. OOP practically forces you to think in terms of conglomerated data, though.
