Virtualized Linux Faster Than Native?

^switch writes "Aussies at NICTA have developed a para-virtualized Linux called Wombat that they claim outperforms native Linux. From the article: 'The L4 Microkernel works with its own open source operating system Iguana, which is specifically designed as a base for use in embedded systems.'" Specific performance results are also available from the NICTA website.
This discussion has been archived. No new comments can be posted.

  • by LiquidCoooled ( 634315 ) on Wednesday May 31, 2006 @07:32AM (#15434194) Homepage Journal

    Running a virtual Iguana OS from within a virtualised Linux environment is dangerous.
    ETROS and NICTA will not be held responsible for any resulting time paradoxes.

  • I can see Linus already gearing up to defend his position that microkernels are crap.

    However, I thought the purpose of a microkernel was stability, not performance.
  • Bad Second Link (Score:5, Informative)

    by Ctrl+Alt+De1337 ( 837964 ) on Wednesday May 31, 2006 @07:36AM (#15434205) Homepage
    Ignore the second link. The actual performance results are here [nicta.com.au].
  • by DaHat ( 247651 ) on Wednesday May 31, 2006 @07:36AM (#15434207) Homepage
    Just how fast would a virtualized Linux instance running inside of a virtualized Linux instance running on hardware be?
  • by XoXus ( 12014 ) on Wednesday May 31, 2006 @07:37AM (#15434210)
    The summary is a bit misleading - it's only faster on ARM v4 or v5 processors.

    From TFA:

    Wombat, NICTA's architecture-independent para-virtualised Linux for L4-embedded, can be faster than native Linux on the same hardware. Specifically on popular ARM v4 or v5 processors, such as ARM9 cores or the XScale, Wombat benefits from the fast address-space switch (FASS) technology implemented in L4-embedded, while this is not supported in native Linux distributions.
    • Only? (Score:4, Interesting)

      by Anonymous Coward on Wednesday May 31, 2006 @07:53AM (#15434261)
      I'm not sure if you realize the market penetration of ARM-based processors. They're basically everywhere. One popular use is in routers. Many printers also have ARM chips. They're also very widely used in cell phones and other mobile technology.

      It benefits us all if more performance can be extracted from such chips, just because they're so widely used. Being able to get a greater degree of performance out of a device already in use can lead to lower-cost systems. Given how prevalent these processors are, it's naive to suggest that this is of limited use.

      • Re:Only? (Score:5, Informative)

        by JanneM ( 7445 ) on Wednesday May 31, 2006 @08:10AM (#15434323) Homepage
        It benefits us all if more performance can be extracted from such chips, just because they're so widely used.

        The reaction is not against the performance but the disingenuous presentation. A cursory reading makes it seem as if the performance gain was somehow tied to it being a microkernel, or that the virtualization step somehow magically sped things up. It wasn't - their kernel uses some platform-specific optimizations that Linux doesn't, that's all.

        • Exactly.

          It's not "Linux is bad" just that they're using specific optimisations which haven't been realised in Linux.

          Can anyone tell me where these would be applied? In GCC? In the Kernel? Stop using precompiled kernels? :-)
      • Re:Only? (Score:3, Insightful)

        However, that quote tells the reason for the performance boost: fast address-space switching (FASS) is supported in L4-embedded, but not in native Linux. IOW, it's not really "virtualized faster than native" but "using FASS faster than not using it". I guess you'd get even better performance if you made native Linux support FASS.
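The difference FASS makes can be illustrated with a toy cost model (all numbers below are invented for illustration, not NICTA's measurements): avoiding a full TLB/cache flush on every context switch removes a fixed per-switch penalty.

```python
# Toy cost model with invented numbers: per-second context-switch overhead
# with a full TLB/cache flush on every switch (as in native ARM Linux of the
# era) versus a FASS-style switch that avoids the flush. Purely illustrative.

SWITCH_BASE_NS = 500      # save/restore registers, scheduler bookkeeping
FLUSH_NS = 10_000         # hypothetical flush + TLB/cache refill penalty
SWITCHES_PER_SEC = 1_000

def switch_overhead_ms(flush: bool) -> float:
    """CPU time (ms) spent on context switches per second of wall time."""
    per_switch_ns = SWITCH_BASE_NS + (FLUSH_NS if flush else 0)
    return SWITCHES_PER_SEC * per_switch_ns / 1e6

print(f"flushing:   {switch_overhead_ms(True):.1f} ms/s")   # 10.5 ms/s
print(f"FASS-style: {switch_overhead_ms(False):.1f} ms/s")  # 0.5 ms/s
```

With these made-up numbers the flush-free switch is about 21x cheaper; the real factor depends on the hardware and workload.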
      • Seems to me that native Linux would outperform virtualised Linux if they modified Linux to use the same fast address-space switch that L4 uses when running on ARM.
        Now if I could just get a PC that used 4 ARM cores running at 2+GHz with a good floating-point unit.

        I would love to see a really fast ARM system. Now that Apple has gone over to Intel, my hopes of an x86-free life seem more and more distant.
    • Microkernels will not come of age until CPUs support true modularization. Previously on /. :

      http://linux.slashdot.org/comments.pl?sid=185800&cid=15341069 [slashdot.org]

    • Go read the article again (the article with the actual numbers [nicta.com.au]). Yes, there are lots of functions for which the Virtual version is a lot faster, at least on the ARM platforms, so there may be specific kinds of applications where it really rocks - I'd look at routing and real-time control.

      But the AIM7 benchmark, which models typical general-purpose Unix system usage, has consistently faster results for regular Linux than for the Wombat virtualized version, even though there may be individual functions tha

  • Neato but... (Score:3, Interesting)

    by tomstdenis ( 446163 ) <tomstdenis@gmCOMMAail.com minus punct> on Wednesday May 31, 2006 @07:43AM (#15434226) Homepage
    They sacrificed portability by performing some TLB caching hacks. It's a good idea but comparing it to Linux as a whole is a bad idea as Linux runs on more than the ARM they're testing on. If you look at all of the results most are comparable and exec/fork favour Linux.

    • Not to mention the null syscall is 5.7 times slower in the microkernel when compared to Linux. One might think that this would be a nonissue, but you'd be amazed how often server programs call "simple" syscalls such as gettimeofday.
    • They did not sacrifice portability to get this result; that's the whole point of having a tiny microkernel like L4 in the first place. Since the microkernel is so small, you can just rewrite it for each hardware type from scratch. You can add a new HAL to Linux with some assembly code, but Linux makes a lot of assumptions about how the VM and such work that make it hard to do it the optimal way for some hardware.
    • >They sacrificed portability by performing some TLB caching hacks.

      That's a biased way of reporting what they did: they used a platform-specific optimisation. Linux does this quite often too, but the kernel they tested doesn't have this particular optimisation.
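The grandparent's point about "simple" syscalls like gettimeofday can be sketched with a rough, machine-dependent micro-benchmark (Python's `time.clock_gettime` is used here as a stand-in for gettimeofday, and the helper name `per_call_ns` is made up for this example):

```python
# Rough sketch: per-call cost of a "cheap" time syscall. Results vary wildly
# by machine, kernel, and vDSO support; treat any number as illustrative.
# time.clock_gettime is Unix-only.
import time

def per_call_ns(n: int = 100_000) -> float:
    """Average nanoseconds per clock_gettime(CLOCK_REALTIME) call."""
    start = time.perf_counter_ns()
    for _ in range(n):
        time.clock_gettime(time.CLOCK_REALTIME)
    return (time.perf_counter_ns() - start) / n

print(f"~{per_call_ns():.0f} ns per call")
```

A server calling this on every request or log line multiplies that per-call cost by its request rate, which is why a 5.7x slower null syscall is not automatically a non-issue.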
  • Twice the buffering (Score:4, Interesting)

    by jellomizer ( 103300 ) * on Wednesday May 31, 2006 @07:45AM (#15434233)
    It is possible. First, consider drive access. Normally data is buffered in memory and paged out to the drive when the OS sees fit; while it sits in memory it can be accessed faster. Now you are virtualizing the hardware, so when the guest OS writes to the hard drive, the write goes to the host OS, which buffers it in memory and writes to the drive when it sees fit. The files are therefore buffered in memory for twice as long, doubling the window in which they can be accessed at memory speed. Drive access is usually the largest slowdown on a system, and while the host OS is writing to the drive, the virtualized Linux kernel is free to do what it wants. I am sure that if the application requires a lot of interrupt calls or a lot of displaying to the screen it will slow down (unless the virtualized video drivers are much better optimized than the normal ones).
    So it is possible, just as long as you have a system powerful enough to run both OSs well and with a lot of RAM.

    • by tomstdenis ( 446163 ) <tomstdenis@gmCOMMAail.com minus punct> on Wednesday May 31, 2006 @07:57AM (#15434272) Homepage
      This is OT.

      The speedup comes from TLB caching between processes. Not from "double caching".

      In Linux when you switch processes the TLB is flushed [e.g. reloading CR3 on x86-*]. This is a safe [but slow] way to ensure your virtual memory for a given process is mapped correctly. I'm guessing [didn't fully read the linked research papers] that they share a virtual memory base between processes but map processes to different regions or something. Unless they have segment limits this will cause problems with process isolation.

      For those not in the know, a TLB holds the translation of a virtual address into a physical one. Translating a typical 32-bit address requires several levels of table lookup [with 4KB pages on 32-bit x86 it's two], which would be slow if you had to do it for every memory access. For example, take your 32-bit address: the lower 12 bits select the byte within a 4KB page, the next 10 bits select one entry in a 1024-entry page table (each entry pointing to a 4KB page), and the top 10 bits select one entry in the page directory [iirc]. It's even worse in x86-64 mode, where a 48-bit virtual address is translated through four levels.

      So the processor caches TLB lookups. When you switch processes you have to flush it, because the cached translations don't correspond to the new process's physical memory.
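The address split described above can be sketched for the 32-bit, 4KB-page, two-level case (`split_vaddr` is a made-up illustration, not kernel code):

```python
# Illustrative only: split a 32-bit virtual address into the indices a
# two-level x86-style page-table walk uses with 4KB pages.

def split_vaddr(vaddr: int) -> tuple[int, int, int]:
    """Return (directory_index, table_index, page_offset)."""
    offset = vaddr & 0xFFF             # low 12 bits: byte within the 4KB page
    table = (vaddr >> 12) & 0x3FF      # next 10 bits: entry in a 1024-entry page table
    directory = (vaddr >> 22) & 0x3FF  # top 10 bits: entry in the page directory
    return directory, table, offset

print(split_vaddr(0xDEADBEEF))  # (890, 731, 3823)
```

The TLB exists precisely so this walk (two extra memory reads here, four on x86-64) doesn't happen on every access, which is why flushing it on each context switch is expensive.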

    • This is interesting... quite often I've seen Windows XP start up faster under qemu than it would natively as Linux has kept an amount of the disk image in the cache. (Of course, if I start it from cold, it spends about 20 seconds just transferring a portion of the image into RAM, and then the rest of the startup is very quick.)
    • Also, a lot of software tends to sync after every write, believing that this is the holy grail of data consistency. It isn't: losing power already pretty much guarantees data loss, and most hardware faults cause something worse than a clean, instant shutdown. Having the disks synced won't save you from most motherboard glitches, slightly faulty memory, and so on. Plus, a sudden poweroff is something that can be easily handled with a UPS; there is no such easy way to ensure the rest of t
    • Or more generally, virtualization allows you to do dynamic optimizations in the interpretation of whatever virtualized API you are using. This is how dynamically compiled Java can outperform statically compiled C for selected code fragments. It is also how HP showed a C VM outperforming compiled C on the same machine for selected programs: the VM could do optimizations at run time that a compiler couldn't possibly do before run time.

      Native means performing according to whatever (naive) assumptions the compiler or pro
  • Hm (Score:5, Informative)

    by FidelCatsro ( 861135 ) * <fidelcatsro @ g m a i l . c om> on Wednesday May 31, 2006 @07:49AM (#15434249) Journal
    Could it be because Linux for ARM is not that well optimised? I can't imagine such massive performance gains otherwise, bar a massive bug in the kernel.

    Fast Address-Space Switching for ARM Linux Kernels [nicta.com.au]

    The Fast Address-Space Switching (FASS) project aims to utilise some of the features of the Memory Management Unit in the ARM architecture to improve the performance of context switches under the L4 kernel and ARM Linux.
    • Is FASS like "lazy task swapping" that you could turn off with the Intel StrongARM 233T, but not the earlier DEC StrongARM 200K/J?

      I remember on RISC OS it was not turned off (on?) by default as it was pretty unstable.
    • So, the benefits are because the virtualisation host OS provides better use of the underlying hardware than the corresponding Linux port.

      Does this suggest an approach to Linux development whereby the Linux kernel is clean and abstract, and hardware idiosyncrasies are handled by a virtualisation layer?

      As always, I speak from a position of total ignorance.
      • It's not a particularly easy thing to do, but it has definitely been done before - over and over again. The best-known case is the AS/400, or whatever the hell they call it now (iSeries maybe?). AS/400 systems used to have a very wonky processor, and ever since, IBM has been putting these hardware/firmware bytecode translators between the OS and the CPU, to the point where AS/400 systems now use PowerPC or POWER processors (forget which) with this thing stuck on the front that transl
        • Interestingly, some tidy fortunes have been made by small shops building useful AS/400 applications. Those beasts never die.

          At one time, this was a sure way to make money:

          1. Find a hot-selling and useful business app that only runs on some other hardware, say x86.
          2. Invest the time and patience to wade through 9,000 pages of obscure IBM wonkery ("those guys have a different word for EVERYTHING"), enough time at least to create a reasonable development environment for yourself.
          3. Code the app.
          4. Profit.

        • by FnH ( 137981 )
          Last name I heard was System i5 [ibm.com]
          They use POWER5 [ibm.com] processors
          The wonky IBM term is Integrated Language Environment [wikipedia.org]
    • RTFA again. Wombat isn't a "massive performance gain", though there are some functions for which it's several times faster, and therefore there may be some real-world applications for which it could be faster. If you look at the AIM7 benchmarks, which model average workloads for a typical Unix system, Wombat was actually 2-3% slower than Linux, in spite of having those functions go faster. That's still really pretty good, given the reputation for slowness that microkernels have. If they can port those optimizations to Linux, then maybe Linux would be even faster.
      • >>
        That's still really pretty good, given the reputation for slowness that microkernels have. If they can port those optimizations to Linux, then maybe Linux would be even faster.

        That's kind of happening in (and around) the Xen community, but mostly on the Debian / Xen side of the street. Lots of people have been playing with msxen/ocxen on SBC's (ULV Celeron / P4's) and have optimized the kernel and even some core utils for small memory systems.

        Now you have a whole rash of 'unofficial' Debian
  • by VincenzoRomano ( 881055 ) on Wednesday May 31, 2006 @07:54AM (#15434264) Homepage Journal
    I think that the whole L4 family [l4hq.org] of microkernels deserves more attention from IT professionals.
    As far as I know, L4 is one of the most actively developed microkernels, along with MinixV3 [minix3.org] of course.
    • Sure it deserves attention, but what's the point of using L4 to run.....a monolithic kernel?

      When running Linux under L4 (like in L4Linux), when the Linux process dies because of a bug, the system DIES. Sure, you can restart it, but so you can in Linux when something oopses, using kexec.

      L4 was written to run real microkernels on top of it. If you want to run Linux instances so that a crash of the kernel doesn't crash the system, you'd be surprised to know that Linux already includes in its heart a vm-ish/mic
  • Are those winning performance figures valid outside the embedded world too?
    I fear that Linux running on a "normal" x86 CPU outperforms almost everything else.

  • Pet maths peeve (Score:3, Interesting)

    by Emil Brink ( 69213 ) on Wednesday May 31, 2006 @08:17AM (#15434350) Homepage
    The performance results [nicta.com.au] page states:

    The result is that context-switching costs of virtualised Linux are up to thirty times less than in native Linux.

    (Emphasis in the original text). This is one of my pet peeves, since it's such sloppy use of maths. How can something be "thirty times less"? So, if it takes one second in Linux, it takes them ... what? 1 - 30 * 1 = -29 seconds? I guess they mean 1/30th of a second, but still, that should have been caught before being published, imo. Or maybe it just annoys me so because I'm not a native speaker of English.

    • How can something be "thirty times less?" So, if it takes one second in Linux, it takes them ... what? 1 - 30 * 1 = -29 seconds?

      Definitely a problem with your interpretation of the English: "smaller by 30 times", "smaller by a factor of 30".

      The "less" there isn't read as a (subtraction) operator.
    • It takes one-thirtieth (1/30 times x, where x is the time normally taken) the time.

      Yeah, English is like that.
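Spelling out the two readings with assumed numbers (one second native, as in the grandparent's example):

```python
# "Thirty times less" read as a factor versus read as a subtraction.
# Numbers are assumed for illustration only.
native_s = 1.0
factor_reading = native_s / 30                   # the intended meaning: 1/30th
subtraction_reading = native_s - 30 * native_s   # the peeved reading: -29 s

print(factor_reading, subtraction_reading)  # ~0.0333 and -29.0
```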
    • Well, the problem isn't the maths in the article; the problem is that you're trying to parse a natural-language statement with a mathematical mindset. Mathematical texts, proofs, and literature in general are written very concisely and with great attention to the precision of what is being stated, and they are read the same way, with every step being dissected and followed precisely.

      Natural language doesn't have those requirements; its intention is to communicate an idea, and if this is successful it doesn't m
    • Sloppy maths or not, it's the normal way of saying it in English.

    • That's definitely common English usage, meaning:
        30 * virtualized = native
    • Re:Pet maths peeve (Score:3, Insightful)

      by nuzak ( 959558 )
      It's mathematically and grammatically quite sound: if X is 30 times more than Y, then Y is 30 times less than X.

    • Your annoyance is completely justified. When reading technical writing, it often happens that what appears to be a poorly written passage turns out to be a very carefully written passage that reveals something the reader did not previously understand. For this reason, readers of technical material do not gloss over sloppy usage as quickly as casual speakers and readers do. Instead, they try to puzzle meaning out of it. Technical writers should keep that in mind and write documents that repay careful rea
  • by agent dero ( 680753 ) on Wednesday May 31, 2006 @08:19AM (#15434360) Homepage
    I've been researching NICTA's microkernel and virtualization more and more (for my L4::BSD [google.com] idea), and one thing that is important to understand is that NICTA's development is mainly on ARM: the Kenge toolset and the Iguana OS are both much further along on ARM than on i386.

    Considering the work that NICTA does with companies that produce embedded hardware like Qualcomm [nicta.com.au], this isn't surprising, but don't go crazy about this.

    Linux development is much more fine tuned on x86, and Kenge/Iguana development is much more fine tuned on ARM; no need to start holy wars here ;)

    That said, nice work benno, chuck, and gernot (and whomever else I'm forgetting)
  • Not hugely, perhaps 0.3%, but it was consistently faster for what I was doing. I put it down simply to having a better scheduler, less cache thrashing on task switches, or some other voodoo like that.

    So such paradoxes are far from unusual.

    Of course, we could combine these two improvements...

  • by waif69 ( 322360 ) on Wednesday May 31, 2006 @08:39AM (#15434440) Journal
    If one were to use 33 levels of virtualization on the ARM processor, the efficiency is so great that power may be removed and the system runs on its own efficiency. Yeah! We don't need oil anymore.
  • by pmbuko ( 162438 ) <pmbuko@gm[ ].com ['ail' in gap]> on Wednesday May 31, 2006 @08:39AM (#15434443) Homepage
    Even better than the real thing....
  • by mikael ( 484 ) on Wednesday May 31, 2006 @09:30AM (#15434723)
    ^switch writes "Aussies at NICTA have developed a para-virtualized Linux called Wombat that they claim outperforms native Linux.

    So if a para-virtualised microkernel runs a para-virtualised microkernel running Linux, then there should be an even greater speedup?
  • Strange (Score:3, Insightful)

    by Sgt Pinback ( 118723 ) on Wednesday May 31, 2006 @09:37AM (#15434788)
    So, what are they trying to show? "Because we've implemented support for a certain MMU feature and native Linux hasn't, we've demonstrated that virtualizing Linux on L4 is a good idea"? Doesn't sound perfectly logical to me. Apples and oranges come to mind.
  • Welcome (Score:3, Funny)

    by Anonymous Coward on Wednesday May 31, 2006 @10:33AM (#15435312)
    I for one welcome our new Fast Address-Space Switching overlords!
  • Bullshit.
  • May or may not be the same thing... but a month or so ago, Anandtech sported a review of the new Apple Intel Core Duo and ran benchmarks on "BootCamp" (native XP with Apple hardware drivers) vs. a virtual PC from Parallels, using Parallels or MS generic drivers.

    Tests showed the Virtual PC ran consistently faster than "BootCamp" [anandtech.com] except on a disk-heavy Multimedia benchmark and even there, the virtual PC was only about 2% slower.

    So tell me, how does a Virtual machine run faster than Native?

    It looked lik
