Torvalds Has Harsh Words For FreeBSD Devs

An anonymous reader writes "In a relatively technical discussion about the merits of Copy On Write (COW) versus a very new Linux kernel system call named vmsplice(), Linux creator Linus Torvalds had some harsh words for Mach and FreeBSD developers that utilize COW: 'I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games, and bigger caches will only continue to drive that point home.' The discussion goes on to explain how the new vmsplice() avoids this extra overhead."
  • by RailGunner ( 554645 ) * on Friday April 21, 2006 @01:25PM (#15174955) Journal
    Yes, you've got it right on what Linus is saying.

    The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory reads/writes and CPU waits that a copy normally generates? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

    What programs weigh in at hundreds of megs? Don't count data files or map files for games. The entire bin directory of a PostgreSQL install is only 20 megs, and that's a lot of stuff there.

    And as far as doing memory management... YES. I have yet to see a compiler do a better job at managing memory than what I can do when I write my code - and the reason is quite simple: I'm the domain expert, not the compiler. Compilers generally do a good job, but it's those specific cases that bite you over and over again.

    Linus is also right about child threads writing to memory. If that never happened, we wouldn't have the concept of a lock or a semaphore. The bottom line is that it happens a lot.

    He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

    I agree, the ad hominem was completely unnecessary.

  • by Peaker ( 72084 ) <gnupeaker@nOSPAM.yahoo.com> on Friday April 21, 2006 @01:29PM (#15174988) Homepage
    Ok, let me see if I've got this straight:

    Copy on Write saves you real memory, cache memory, and CPU time by pretending that each forked process has a true copy of a memory segment when it is in fact looking at the original. That is, right up until a forked process tries to write to that memory location, in which case an exception is handled by making an actual copy at a new location and allowing the write.
    Linus believes that the exception will occur enough in real world usage that it will be slower than just doing the copy in the first place.
    Linus wants to push the manual use of zero-copy memory sharing through the vmsplice() routine. He believes that the programmer will always know better than the system when to share memory.
    Linus doesn't like "VM Games" despite the fact that Virtual Memory, Memory Mapped Files, Disk I/O, Write Caching, etc, etc, etc, are all already "Memory Games" and "VM Games"

    Do I have that right?


    I do not know the context of the current debate, but after reading some of it, it seems it doesn't have anything to do with fork at all. I believe everyone agrees COW for fork() is good.
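    For anyone who wants to see that fork() half in action, here's a minimal sketch (plain POSIX, not from the thread) of what "COW for fork() is good" means: each process behaves as if it had its own copy, and the kernel only actually duplicates a page when someone writes to it.

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* One heap page that parent and child nominally "share" after fork(). */
        char *buf = malloc(4096);
        strcpy(buf, "parent");

        pid_t pid = fork();
        if (pid == 0) {
            /* The child's first write faults, the kernel copies the page,
               and only the child's private copy changes. */
            strcpy(buf, "child");
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("%s\n", buf);   /* the parent still sees its own data */
        free(buf);
        return 0;
    }
    ```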

    The disagreement is about a specific optimized implementation of data transfer. Linus says that a simple non-optimized and portable interface already exists. The debate is on the optimized, less portable, high-performance implementation. Linus says it is pointless to use COW in the high-performance implementation, and that makes sense. For this specific issue, it is faster to just explicitly disallow the user from modifying his buffer after "sending" it. If the user wants a more friendly interface, and give up some performance (as COW would), he can just use the friendly low-performance interface.

    If so, I'm not really seeing his issue. Or at least not as hard-line as he sees it. The issue of memory copy performance is a tricky one, especially since CPU cycles are not the be-all and end-all of performance. Does the exception generated really cost that much more than he believes, or is it often eclipsed by the cost of the extra memory reads/writes and CPU waits that a copy normally generates? Is it really feasible to expect program developers to do manual memory management in a day and age when programs easily weigh in at hundreds of megs?

    Explicitly disallowing "touching" of the buffer you "sent" until you have some ACK that it completed sending has little to do with the size of the program (given that it is sanely modular), and it is the only way to extract the best performance from the machine. Again, you can always revert to using the simple low-performance send calls that allow you to touch the buffer after sending.
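    As a rough illustration of that contract, here's a sketch using the actual vmsplice() call to hand a buffer to a pipe (Linux-only; real code would check every return value, and would honor the don't-touch rule until the data has drained, which this toy skips by reading it back immediately):

    ```c
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void) {
        int p[2];
        pipe(p);

        /* Buffer we "give" to the kernel; the contract is that we must
           not modify it until we know the data has left the pipe. */
        static char buf[4096] = "zero-copy payload";
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

        ssize_t n = vmsplice(p[1], &iov, 1, 0);
        if (n < 0) { perror("vmsplice"); return 1; }

        /* Stand-in for the consumer on the other end of the transfer. */
        char out[4096];
        read(p[0], out, sizeof(out));
        printf("%s\n", out);
        return 0;
    }
    ```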

    I'm just not sure that Torvalds is really looking at all sides of this. He may be right, but I'd like to hear more discussion between the *BSD guys and Torvalds before we put this matter to rest. And preferably without the insults this time. :-/

    Slashdot obviously brings the short quote without any context.

    Linus is not saying COW is bad, he says COW for this specific purpose in this specific context is bad. I don't know the context, and I only read the article itself and put only a little thought into it, but so far it makes sense.
  • by Anonymovs Coward ( 724746 ) on Friday April 21, 2006 @01:30PM (#15174993)
    I'm not an expert on any of this, but what I do know is that when you start using up a lot of memory Linux totally sucks. On a 256 MB RAM machine, with about twice that amount of swap, if I run over 50% memory usage the system becomes unusable for long periods of time. Even at much greater loads, FreeBSD just feels slightly sluggish at worst. This has been true for years. It was the main reason many people I know refused to use linux (they went for either commercial Unix or the BSDs). It's still true with 2.6.15 -- I'm experiencing it on my work machine as I type this.
  • by mrsbrisby ( 60242 ) on Friday April 21, 2006 @01:31PM (#15175004) Homepage
    I think the problem with this approach is that COW will only give you a copy of the particular piece of the memory that you accessed. That means that the system has to keep huge tables of what is shared and what is not and every time you make a call to request ANY memory it's going to need to check the table. This action is going to result in an overall performance degradation since the application has to check the table for every write over the long-haul, rather than just duplicate the memory and go.

    It does all those things anyway. The problem is that faults are expensive, and because they happen in real life, copying the memory IS faster in real life.

    It is possible to exploit the mechanism FreeBSD uses to gain performance: simply never touch a page after it's been sent out. Or rather, wait as long as possible, say until malloc() fails.

    This would work, but it'd be hard to test and hard to get right.

    What Linus suggests is explicit notification: say, a select() or poll() operation that says "these pages are now free". This works out well, and is indeed faster because there aren't any copies or page faults. It's also easier to develop.

    Of course, using COW for TCP buffers is stupid. That's why people don't use them on FreeBSD (at least, not once they've seen the profiler results): it's never faster. They always use a static buffer and ALWAYS get the page fault when the system is under any load.
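    A toy version of that explicit-notification pattern, using POLLOUT on a pipe's write end as the "pages are now free" signal (this is just one way to get a completion signal, not necessarily the exact interface Linus proposed):

    ```c
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int p[2];
        pipe(p);
        write(p[1], "x", 1);

        /* Consume the byte, as the other side of the transfer would. */
        char c;
        read(p[0], &c, 1);

        /* POLLOUT on the write end signals there is room again: the
           moment the sender may safely reuse (or free) its pages. */
        struct pollfd pfd = { .fd = p[1], .events = POLLOUT };
        int ready = poll(&pfd, 1, 0);
        printf("%s\n", ready > 0 && (pfd.revents & POLLOUT)
                       ? "safe to reuse buffer" : "still in flight");
        return 0;
    }
    ```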
  • by AKAImBatman ( 238306 ) * <akaimbatman@gmaYEATSil.com minus poet> on Friday April 21, 2006 @01:38PM (#15175071) Homepage Journal
    I *think* I understand what you're saying. Basically, the problem is caused by the fact that usermode code never (or rarely, depending on your platform) releases any of the memory it has allocated. Instead, it keeps reusing the same memory pools over and over again. This becomes a problem with CoW because the kernel doesn't learn about the deallocation of memory until the usermode code reallocates it for another purpose. When that reallocation happens, the read-only exception is going to be triggered. Thus there's going to be a 100% occurrence of exceptions on CoW pages.

    However, given that the "free()" routine is part of the OS in FreeBSD, wouldn't it make sense to create a smarter "free()" routine that would attempt to recognize and explicitly deallocate CoW pages?
  • by mrsbrisby ( 60242 ) on Friday April 21, 2006 @01:38PM (#15175081) Homepage
    I'm not an expert on any of this,

    That's obvious.

    but what I do know is that when you start using up a lot of memory Linux totally sucks.

    Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD.

    http://bulk.fefe.de/scalable-networking.pdf [bulk.fefe.de]

    Hrm. Looks like FreeBSD panics under load in its default configuration. So sad.

    Meanwhile, I have some systems that constantly run with a run-queue length above 100.0 and are still (albeit somewhat) responsive.
  • by mrsbrisby ( 60242 ) on Friday April 21, 2006 @01:49PM (#15175174) Homepage
    Basically, the problem is caused by the fact that usermode code never releases any of the memory it has allocated.

    Oh no. That's the solution actually. :)

    The problem is in using a static buffer instead of allocating a buffer for each send operation. If you use a static buffer, you ALWAYS cause a fault. If you malloc() each time, you won't fault, at least until you reuse the pages later (when malloc() fails).

    However, given that the "free()" routine is part of the OS in FreeBSD

    No, it's not, unfortunately. It's a library call that mucks about with [s]brk() or munmap().

    Free _could_ be smart enough to avoid actually freeing the pages until notification occurred, but userspace would still need explicit notification (or just to wait for a while).

    The real issue is explicit notification versus page fault. The page fault is undesirable because it wastes time, memory, and cache. The page fault can be avoided by never reusing memory, like I proposed above.

    OR the userspace can simply wait for notification that the pages are done. A signal could be used, but vmsplice() actually causes a fd to wake up that can receive the notification via the recvmsg() system call.
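    A sketch of the allocate-per-send pattern described above: a fresh page for each message, deliberately never touched again after sending (the leak is intentional here, to keep the never-reuse part of the pattern visible).

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    enum { PAGE = 4096, MSGS = 4 };

    int main(void) {
        int p[2];
        pipe(p);
        for (int i = 0; i < MSGS; i++) {
            /* A new page per send; the old ones are never written again,
               so a COW-style fault on them can never fire. */
            char *buf = malloc(PAGE);
            int len = snprintf(buf, PAGE, "msg %d", i);
            write(p[1], buf, len + 1);
            /* No free(buf): we'd only recycle the page once we know the
               kernel is done with it (or once malloc() starts failing). */
        }
        char out[64];
        read(p[0], out, sizeof(out));
        printf("%s\n", out);   /* first message comes out intact */
        return 0;
    }
    ```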

  • by tearmeapart ( 674637 ) on Friday April 21, 2006 @01:50PM (#15175191) Homepage Journal
    >> Linus wants to push the manual use of zero-copy memory sharing
    >> through the vmsplice() routine. He believes that the programmer
    >> will always know better than the system when to share memory.
    >
    > That's correct.

    No, that is not always correct.

    I am a C developer for a large multinational corporation that likes to make money. When I need to fork(), I do not have the time to think of all the memory management involved with fork(). I just want it to be done reliably, and I want it to be done fast.

    If it turns out that my code runs 10% faster on FreeBSD than on Linux, then that means the code is probably going to go on a FreeBSD system. And if FreeBSD is not an option, then I am not going to do the optimization (because CPUs cost less than my wages).
    Also: optimization never happens anyways (or at least, not properly).

    So from my perspective:
    I want the kernel to run my code as fast as possible by default.
  • by Anonymous Coward on Friday April 21, 2006 @01:51PM (#15175207)
    Correction: when _you_ start using up a lot of memory Linux totally sucks. When I start using up a lot of memory, Linux acts exactly as I expect, and better than FreeBSD.

    I'm sorry to interrupt here, Your Holiness, but instead of being snarky and flaming the BSD kid, you could've been somewhat helpful and provided an idea as to *why* that might be the case (e.g. swappiness, etc).

    Just a suggestion. And the rap on programmers for being cranky sons of bitches is totally false.
  • by Shohat ( 959481 ) on Friday April 21, 2006 @01:59PM (#15175323) Homepage
    Linus is a gifted engineer, let him be rude. Aside from Linus being rude, there is no actual story here.
    I used to own a restaurant and also an office supplies shop. It was quite interesting and made me some money, but I hated the fact that the most important factor in my life was pleasing (customers) or fighting (suppliers) other people. I had to constantly think about what to say and how to behave.
    I am no longer a business owner, and now I work with a rather gifted bunch of engineers, and frankly it gives me great pleasure to know that neither I nor the people I work with really care about being polite, clean-shaven, well-spoken or good-looking. I can be rude if I want to, they can be rude if they want to, and we all get along very well.
  • by Anonymous Coward on Friday April 21, 2006 @02:00PM (#15175328)
    The point of the fefe.de tests was mainly to expose the general *BSD grandiose claims of "stability" and "scalability", and the anti-Linux claims, as zealot lies. It has successfully done so.

    BSDers have been saying "you should run version X.Y.Z" since the tests were published, but at this point it matters not because they've already been exposed as frauds. No BSDer has been willing to reproduce the tests, as it will only confirm what the marketplace has already decided ... Linux is the superior OS.
  • by nuzak ( 959558 ) on Friday April 21, 2006 @02:07PM (#15175409) Journal
    Actually he's been into boorish behavior from day 1 when it comes to microkernels. Namecalling between him and Tanenbaum [fluidsignal.com] (admittedly Tanenbaum is a bit haughty and provoking), and his slanderous accusations against microkernel researchers in general (a quote I can't find at the moment, but he basically accuses them all, as one big class, of academic fraud to procure grant money).

    The only microkernel Linus knows jack about is Mach, an ancient piece of crap, which is indeed what Linus calls it. It's unfortunate that real-world systems were saddled with it, and it's got real performance issues, but Linus carries on about it like Mach ran over his dog or something.

    He conveniently ignores or chooses to remain ignorant of the fact that L4Linux is typically faster than Linux itself. To say nothing of the real-world success of QNX. And even L4Linux is pretty old by today's standards.

    This is all pretty typical behavior of Linus: bluster now, bone up and learn, and implement it later. He did so with SMP (famously saying that the way to do it was one Big F**ing Lock, then learning that no, this wasn't such a great idea after all). Then he went on a tirade about Sun's /dev/poll before learning that yes, they actually didn't cheat, they did it smarter, and Linux followed.

    Ultimately, Linus and Linux come around. Sometimes he just has to vent.
  • by dgatwood ( 11270 ) on Friday April 21, 2006 @02:16PM (#15175506) Homepage Journal

    What Linus suggests is explicit notification- say a select() or poll() operation that says "these pages are now free". This works out well, and is indeed faster because there aren't any copies or page faults. It's also easier to develop.

    Problem is that unless you're talking about declaring the pages "free" by storing more data in the heap info structure, declaring the pages free would require trapping into the kernel, and that is every bit as slow as the exception on most architectures, only now you're doing it more often, since you're doing it every time a page changes from free to not free.

    Even if you do this by just adding info in the heap structure, it isn't clear that the performance hit of doing so will be worth it in the average case, since most fork() calls are followed by exec() and thus zero copies actually occur, so you're optimizing for the 1% case and causing a performance hit throughout the entire execution of the 99% case.

    Even if that performance hit is nearly zero, and even if all of the programs that use fork() never call exec(), though, Linus is -still- wrong. The four possible ways this could work are:

    • COW for active pages, create new blank page in physical RAM for unused---the time (and possible paging) spent creating the new blank pages can result in a massive stall on fork, and you're still doing COW.
    • Live copy for active pages, create new blank page in physical RAM for unused---the time spent copying the live pages can result in a massive stall on fork, and your RSS just bloated dramatically for all the unused pages.
    • Live copy for active pages, map unused pages with virtual pages---you now have a trap when you access the unused pages to actually allocate physical RAM to hold them, thus are no better off than if those unused pages had been COW.
    • COW for active pages, map unused pages with virtual pages---you still have a trap when you access the unused pages to actually allocate physical RAM to hold them, so this comes out exactly the same as using COW for everything.

    I fail to see the logic in this unless you don't care about interactivity. If we are talking about relatively small process footprints, Linus is right. For large process footprints (including stack and heap), the huge lag to copy even the used pages would be unacceptably large, however.
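    For reference, the "99% case" mentioned above, fork() immediately followed by exec(), looks like this: almost none of the COW pages set up by fork() are ever written, because exec() throws the copied address space away.

    ```c
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid == 0) {
            /* exec replaces the child's whole address space, so the COW
               mappings inherited from the parent are discarded unwritten. */
            execlp("echo", "echo", "child ran", (char *)NULL);
            _exit(127);   /* only reached if exec fails */
        }
        int status;
        waitpid(pid, &status, 0);
        printf("child exit: %d\n", WEXITSTATUS(status));
        return 0;
    }
    ```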

  • by 0xABADC0DA ( 867955 ) on Friday April 21, 2006 @02:47PM (#15175858)
    Torvalds:
    I claim that Mach people (and apparently FreeBSD) are incompetent idiots. Playing games with VM is bad. memory copies are _also_ bad, but quite frankly, memory copies often have _less_ downside than VM games,

    I totally agree with that, I just go one step further and say that Torvalds is also a total idiot: VM games are bad, so use copying instead because that's less bad. But copying is also bad, so why do it at all? Neither is a good solution.

    The problem is that Linux and BSD are using "virtual memory" to protect processes from each other, but virtual memory was designed to run programs that use more memory than is available. Does it sound right that to protect one process from another you are going to use hundreds of thousands of descriptors, one for each 4k, that all say the same thing? It's pretty stupid actually. 4k is too fine-grained for virtual memory these days, as disks have grown. It's both too small and too large for process separation.

    The better solution is to use vm for virtual memory and run all code in the same memory space, but only run code that cannot access memory illegally (ie no pointer arithmetic, only references). This code could be written in Java, or libmo, or D, or maybe other 'safe' languages and run at much faster speeds than they do now as traditional linux processes. The code could be straight C that is JIT recompiled/checked to prevent illegal accesses. That's right, I claim that an average Java program would run faster in such a system than a C program does under a linux/bsd-like system.

    Linus is right, there is massive overhead from doing vm games -- like what is done in Linux, for instance, to separate processes. Did you ever wonder why you can't use more than about 80% of the physical memory simultaneously (i.e. walk an array 80% of physical mem size and see what happens)? That's right, the kernel is using that much as overhead, and about 7% of that is page tables for *physical memory*. It takes ~1200 cycles just to enter a system call because of using VM for process separation, vs maybe 5 using a single memory space. Unix kernels do not give fine-grained access to anything because it's simply not possible with process separation based on VM, not in practice.
  • Re:Sweet (Score:3, Interesting)

    by TCM ( 130219 ) on Friday April 21, 2006 @03:12PM (#15176102)
    Windows: Where do you want to go today?
    Linux: Where do you want to be tomorrow?
    BSD: Are you guys coming or what?!
  • by pthisis ( 27352 ) on Friday April 21, 2006 @06:52PM (#15177956) Homepage Journal
    The vmsplice() approach that Linus is talking about is exactly that -- a call that will block until the kernel is done with the previous buffer.

    That's certainly not the impression I get from Dave Miller's commentary about splice/tee to sockets, which discusses using poll/select/more advanced methods to see when the splice has finished and comments:


    We really can't block on this, but I guess we could consider allowing
    that for really dumb applications.

    It does indeed require some smarts in the application to field the
    events, but by definition of using this splice stuff there is explicit
    knowledge in the application of what's going on.

    This is why I'm very hesitant to say "yeah, blocking on the socket is
    OK", because to be honest it's not. As long as the socket buffer
    limits haven't been reached, we really shouldn't block so the user can
    go and do more work and create more transmit data in time to keep the
    network pipe full.


    Or Linus commenting:

    Some users may even be able to take _advantage_ of the fact that the
    buffer is "in flight" _and_ mapped into user space after it has been
    submitted. You could imagine code that actually goes on modifying the
    buffer even while it's being queued for sending. Under some strange
    circumstances that may actually be useful
