Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Slashdot Log In

Log In

[ Create a new account ]

HPs Dynamo Optimizes Code

Posted by CmdrTaco on Thu Mar 23, 2000 11:10 AM
from the nifty-tricks dept.
sysboy writes "ArsTechnica have a very interesting piece about HP's Dynamo. " Interesting stuff about their run-time optimization stuff. They compare it to Transmeta's code morphing technology.
This discussion has been archived. No new comments can be posted.
HPs Dynamo Optimizes Code | Log In/Create an Account | Top | 78 comments (Spill at 50!) | Index Only | Search Discussion
Display Options Threshold:
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Closed Source Binaries by Anonymous Coward (Score:1) Thursday March 23 2000, @06:33AM
  • This can be done better by Anonymous Coward (Score:1) Thursday March 23 2000, @06:49AM
  • Re:Bad compilers? by David Greene (Score:1) Thursday March 23 2000, @07:50AM
  • Re:Letting you know... by Nick Mitchell (Score:1) Thursday March 23 2000, @09:08AM
  • Yeah, Ars! by pb (Score:1) Thursday March 23 2000, @06:28AM
  • Re:A Priori vs. A Posteriori Optimizations by mprinkey (Score:1) Thursday March 23 2000, @09:10PM
  • A Priori vs. A Posteriori Optimizations by mprinkey (Score:1) Thursday March 23 2000, @09:41AM
  • Good for multiprocessors by Josh (Score:1) Friday March 24 2000, @06:23AM
  • Overhead? by Taco Cowboy (Score:1) Thursday March 23 2000, @07:55PM
  • Re:Bad compilers? by slpalmer (Score:1) Thursday March 23 2000, @06:35AM
  • Re:Windows by omega1 (Score:1) Thursday March 23 2000, @07:33AM
  • Re:Thank Goodness! by Thagg (Score:1) Thursday March 23 2000, @07:08AM
  • patent stuff by Xtacy (Score:1) Thursday March 23 2000, @06:59AM
  • Re:Runtime morphing is NOT NEW by um... Lucas (Score:1) Thursday March 23 2000, @08:25AM
  • JavaVM + Dynamo = Fast Java? by devinjones (Score:1) Friday March 31 2000, @11:12AM
  • Runtime Optimization in Linux Kernel? by TeknoDragon (Score:1) Thursday March 23 2000, @07:52AM
  • code morphing vs. profiling by arodrig6 (Score:1) Thursday March 23 2000, @07:43AM
  • Re:20% faster than native by SEWilco (Score:1) Friday March 24 2000, @08:48AM
  • 20% faster than native by SEWilco (Score:1) Thursday March 23 2000, @06:29AM
  • Re:An artifact of HP architecture? by Yumpee (Score:1) Thursday March 23 2000, @11:47AM
  • Re:Yabut by Mike Wilson (Score:1) Thursday March 23 2000, @04:00PM
  • Re:patent stuff by mmakunas (Score:1) Thursday March 23 2000, @07:21AM
  • VMWare / Plex by codec (Score:1) Thursday March 23 2000, @03:09PM
  • Re:Bad compilers? by Piquan (Score:1) Thursday March 23 2000, @09:48PM
  • FYI by Darkseer (Score:1) Thursday March 23 2000, @07:45AM
  • OT: But did you see the ad? by Sami (Score:1) Thursday March 23 2000, @07:04AM
  • Re: Yabut by spuk (Score:1) Thursday March 23 2000, @07:17PM
  • Re:Yeah, Ars! by Nerf97A4 (Score:1) Thursday March 23 2000, @01:39PM
  • Re:Bad compilers? by _Mustang (Score:1) Thursday March 23 2000, @08:45AM
  • Re:Bad compilers? by jafuser (Score:1) Thursday March 23 2000, @07:59AM
  • Re:Runtime morphing is NOT NEW by scott@b (Score:1) Thursday March 23 2000, @06:37AM
  • Re:Runtime Optimization in Linux Kernel? by Wesley Felter (Score:1) Thursday March 23 2000, @01:30PM
  • Moderate Down! Ignore! by bcilfone (Score:1) Thursday March 23 2000, @08:12PM
  • Thank Goodness! by Lizard_King (Score:1) Thursday March 23 2000, @06:26AM
  • Re:20% faster than native by Niflar (Score:1) Thursday March 23 2000, @11:52AM
  • Re:Is that really runtime morphing? by Mathonwy (Score:1) Sunday March 26 2000, @09:47PM
  • Is that really runtime morphing? by Mathonwy (Score:1) Thursday March 23 2000, @06:21AM
  • Re:Runtime morphing is NOT NEW by Bill Gates (Score:1) Thursday March 23 2000, @06:37AM
  • Re:Bad compilers? by Bill Gates (Score:1) Thursday March 23 2000, @06:42AM
  • Re:Runtime morphing is NOT NEW by RyanShelswell (Score:1) Thursday March 23 2000, @11:23PM
  • Re:Runtime morphing is NOT NEW by RyanShelswell (Score:1) Thursday March 23 2000, @11:29PM
  • Runtime morphing is NOT NEW by Anonymous Coward (Score:2) Thursday March 23 2000, @06:19AM
  • Re:Letting you know... by bhurt (Score:2) Friday March 24 2000, @06:29AM
  • Re:Is that really runtime morphing? by X (Score:2) Monday March 27 2000, @02:11PM
  • Re:Runtime morphing is NOT NEW by X (Score:2) Thursday March 23 2000, @06:35AM
  • Re:Is that really runtime morphing? by X (Score:2) Thursday March 23 2000, @06:38AM
  • Yabut by tilly (Score:2) Thursday March 23 2000, @09:42AM
  • Re:Bad compilers? by SimonK (Score:2) Thursday March 23 2000, @07:39AM
  • Not really; run-time vs compile-time by jabber (Score:2) Thursday March 23 2000, @06:54AM
  • Re:A Priori vs. A Posteriori Optimizations by Azza (Score:2) Thursday March 23 2000, @02:06PM
  • Re:An artifact of HP architecture? by jovlinger (Score:2) Thursday March 23 2000, @11:42AM
  • Well an idea usually has to be changed a little by slashdot-terminal (Score:2) Thursday March 23 2000, @06:27AM
  • Re:Yabut by cburley (Score:2) Thursday March 23 2000, @03:15PM
  • An artifact of HP architecture? by Animats (Score:2) Thursday March 23 2000, @08:26AM
  • Re:Bad compilers? by jeff_bond (Score:2) Thursday March 23 2000, @06:58AM
  • The algorithm's the thing by Skald (Score:2) Thursday March 23 2000, @07:08AM
  • Bad compilers? (Score:3)

    by slpalmer (6337) <slpalmer@g m a i l .com> on Thursday March 23 2000, @06:27AM (#1179835) Homepage
    IMHO, if they are getting a 20% increase in speed after fragmenting *native binaries*, then they have some serious work to do on thier compilers. If the code had been properly optimized by the compiler, then on the fly fragmenting should not speed up execution time. On that same note though, I thought HP native compilers were among the best at optimization. Am I off on this?

    Stephen L. Palmer
    ---
    Here is the result of your Slashdot Purity Test.
    You answered "yes" to 86 of 200 questions, making you 57.0%
  • by tilly (7530) on Thursday March 23 2000, @08:08AM (#1179836)
    Perl compiles the instructions down to byte-code, and then runs an interpreter on the byte-code. Trust me, you really don't want it to interpret one line at a time. You really, really don't want that.

    Also note that while Perl was the first well-known interpreted language to do this, it is an idea that is now considered the norm.

    But to apply HP's ideas to Perl would be quite possible and probably beneficial. Today Perl has a lot of run-time lookups. For instance if I write a function foo, and I call it from a function bar, then (I believe) every time you run my function bar there is a run-time lookup while running bar to find foo because you might have changed it.

    But foo almost never changes from call to call! Imagine instead the following. Each time you run bar, first check a flag for whether you have profiled it. If you have not then increment a counter for how many times it is accessed. If that counter is below some level, just execute. If it is above some target, then profile it and record information from which you can check whether it needs to be profiled. After profiling then execute the profiled version.

    If your original check found that you profiled it, then quickly check that the profiled version has not been invalidated, and if all is fine then run the profiled version. If it has been invalidated then flip the profiled version, set the counter at 0, and run the safe version.

    So...what does this complicated logic do? Why for a minor run-time penalty you manage to remove most of the repeated dynamic run-time lookups for which function to call, probably with significant speedups! (It could be an even bigger win if you did a similar trick on method-call lookups.)

    The basic technique of taking out time to store a fast static lookup on data that is otherwise recalculated is called memoizing, and is good to know. (In Perl memoizing is almost always done by having a lexical hash (ie one defined with my) around the function with memoizing being done by sticking the answer in the hash. So the program becomes check the memoized hash and return that answer if you can. Otherwise calculate the answer, stick it in the hash, and return the answer.)

    Cheers,
    Ben
  • HP's Dymo Server (Score:3)

    by Ratface (21117) on Thursday March 23 2000, @06:36AM (#1179837) Homepage Journal
    Huh - how do HP think they can market an entire server just for producing little sticky tags? I mean, who even buys those small electronic Dymo machines? I mean the old fashioned mechanical ones were perfectly good and everyone loves that "oops, I didn't click hard enough on the last letter, so I'll over-compensate on this letter" look.

    ...Oh, you said *Dynamo*! Sorry!

    ...I'll go away now....

  • Re:Bad compilers? (Score:4)

    by David Greene (463) on Thursday March 23 2000, @06:54AM (#1179838)
    Yes, you are. HP has some of the best compiler people around.

    The thing about static compilers is that they have no idea what happens at run-time. Profiling has been used to mitigate this somewhat, but it's still a huge problem.

    Accesses through memory are slow, so you want to get rid of them. One way to do this is through register allocation. Unfortunately, even if an infinite number of registers was available, you couldn't allocate everything to registers.

    Why? Because we use pointers. There are multiple names for the same data running around in our little electronic brain. When you allocate something to a register, you bind it to one name. This is by definition incorrect for aliased data (data with multiple names).

    Optimizations like register promotion try to get around this by allocating things in regions where the compiler can prove it only has one name. But this is exceedingly difficult when you have things like function calls which must be assumed to access and modify lots of data.

    I won't even get into the problem of static instruction scheduling or other optimizations like partial redundancy elimination.

    In short, aliasing through memory is nearly impossible to track accurately at static compile time. At run-time, the machine knows exactly which memory accesses reference which data, so things like run-time register allocation can do a better job. Crusoe does this to a limited extent.

    Dynamo is essentially a software trace cache [wisc.edu]. Except that when forming the trace, it does transformations like Common Subexpression Elimination and other traditional compiler manipulations.

    IBM has the Daisy project, which does code morphing from PPC to a VLIW ISA. I believe it also does some run-time optimizations. Projects like DyC [washington.edu] and Tempo [irisa.fr] have been compiling at run-time for a while now.

    I like to think of dynamic compilation in terms of the stock market. Which would you rather do: trade stocks with only limited information about their past behavior (and sometimes none at all), or trade stocks after having observed the absolutely most recent trends? I'll bet that if you pick the first strategy and I pick the second, I'll beat you every time.

    That said, there are tricks ou can pull with static compilation. IA64 has the ALAT, which lets the machine track when store addresses match load addresses. This lets the static compiler speculatively move a load ahead of the store. If the store conflicts, the machine will execute some compiler-provided code to fix up the error. Essentially, the compiler is making an assumption that the load and store do not reference the same data and is communicating that assumption to the machine. The machine checks the assumption and invokes some fixup code if it proves to be incorrect.

    --

  • Re:Bad compilers? (Score:4)

    by Cederic (9623) on Thursday March 23 2000, @06:39AM (#1179839)
    The review explains that the optimisations performed are those not available to a static compiler. Because the compilers are effectively optimising 'design time' code, there are further optimisations that can be performed on the 'run time' binary.

    As an example, consider a fragment of code that performs an operation on a sequence of numbers. The sequence is unknown at design time - bounds checking, conditional statements and other code will all need to be compiled in to ensure different sequences can be handled. At runtime it may be that the same 4 number sequence is processed a large number of times. The runtime optimiser (Dynamo) detects this and can optimise out the bounds checking, the conditional statements and streamline the rest of the code.

    It's very clever technology, but it's surprising it hasn't been done more extensively before now. What would be very interesting is to see if something like VMWare incorporated this technology - could it effectively negate the CPU cost of running VMWare itself?

    ~Cederic
  • 17 replies beneath your current threshold.