News for nerds, stuff that matters

IBM's OSS Code Morphing Code/or OSS vs. Transmeta

Posted by Hemos on Wed Nov 29, 2000 01:28 AM
from the morphing-for-fun-and-profit dept.
jjr writes: "It seems that IBM has an Open Source project called Daisy that does a lot of what Transmeta does. Their code-morphing technology supports PowerPC, x86, and S/390, as well as the Java Virtual Machine. They morph the [code] into VLIW just like Transmeta, but they still have some issues to work out. Other issues dealt with in the report include self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory mapped I/O."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Re:Can I run MS-WinNT on PowerPC and S/390? by web_angel_tr (Score:1) Tuesday November 28 2000, @09:48PM
  • Re:Code morphing patented? by Baki (Score:1) Tuesday November 28 2000, @09:51PM
  • Re:How will Amiga compete? by Mr Z (Score:2) Tuesday November 28 2000, @11:43PM
  • by Mr Z (6791) on Tuesday November 28 2000, @09:52PM (#594726) Homepage Journal

    BTW, Transmeta has been working on their stuff since 1995, so the technology mentioned in the 1997 paper doesn't strictly predate it.

    I read about Daisy a few years back when I was studying VLIW scheduling techniques and whatnot. The DAISY VLIW is quite different than most VLIWs around. Their instruction word is built upon the ability to execute large numbers of "branches" in parallel every cycle. (As best as I can tell, these "branches" are actually closer to being composite predication conditions in many cases, which is why I put "branches" in quotes.) Their experimental physical implementation could execute something like 8 branches every cycle. Downright weird.

    A more traditional VLIW uses predication [google.com] to convert short branches into a simple "if (cond)" prefix on individual instructions. (This technique is known as if conversion.) Also, traditional VLIW instruction words are flat -- all N instructions in a VLIW bundle execute together in parallel, with no tree structure implicit in the encoding.
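The if-conversion described above can be sketched in a few lines. This is only a toy model, not any real instruction set: the register names, the `bundle` layout and both function names are assumptions made up for illustration. It shows a short branch turned into two guarded operations that could issue together in one flat VLIW bundle.

```python
# Hypothetical mini-model of predicated execution; not any real ISA.

def with_branch(a, b):
    """Original control flow: a short forward branch."""
    if a < b:
        result = b - a
    else:
        result = a - b
    return result


def if_converted(a, b):
    """Same computation after if-conversion: no branch, only guarded ops."""
    regs = {"a": a, "b": b, "result": 0}
    p = regs["a"] < regs["b"]          # a compare sets predicate p
    # Each op is (guard, destination, function). The guard decides whether
    # the op's result is committed. Both ops sit in the same bundle.
    bundle = [
        (p,     "result", lambda r: r["b"] - r["a"]),   # if (p)  result = b - a
        (not p, "result", lambda r: r["a"] - r["b"]),   # if (!p) result = a - b
    ]
    # All ops read the register state from before the bundle executes.
    writes = [(dest, fn(regs)) for guard, dest, fn in bundle if guard]
    for dest, value in writes:
        regs[dest] = value
    return regs["result"]
```

Both functions return the same value for any pair of integers; the second one simply has no control-flow edge left for a scheduler to worry about.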

    All that aside, the DAISY scheduling techniques sound pretty similar to trace scheduling [google.com], which was used on the old Multiflow VLIW machines [google.com]. The actual process of converting PowerPC instructions to individual DAISY operations is mostly search and replace, and preserving program order is a matter of constructing proper dependences between the instructions.
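To make "constructing proper dependences" concrete, here is a rough sketch. It is plain greedy list scheduling over one straight-line block, not trace scheduling and not DAISY's actual algorithm; the `(name, destination, sources)` op format and the two-wide bundle are assumptions for illustration.

```python
# Hypothetical three-address ops: (name, destination, sources).
OPS = [
    ("load", "r1", ["r9"]),
    ("add",  "r2", ["r1", "r3"]),
    ("sub",  "r4", ["r5", "r6"]),
    ("mul",  "r1", ["r4", "r2"]),
]


def dependences(ops):
    """Return {later_index: set(earlier_indices)} for RAW, WAR and WAW hazards."""
    deps = {i: set() for i in range(len(ops))}
    for j, (_, dst_j, srcs_j) in enumerate(ops):
        for i in range(j):
            _, dst_i, srcs_i = ops[i]
            raw = dst_i in srcs_j      # j reads what i wrote
            war = dst_j in srcs_i      # j overwrites what i reads
            waw = dst_j == dst_i       # both write the same register
            if raw or war or waw:
                deps[j].add(i)
    return deps


def schedule(ops, width=2):
    """Greedy list scheduling: each op goes into the earliest bundle that
    comes after all of its predecessors and still has a free slot."""
    deps = dependences(ops)
    bundle_of = {}
    bundles = []
    for j in range(len(ops)):
        b = max((bundle_of[i] + 1 for i in deps[j]), default=0)
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(ops[j][0])
        bundle_of[j] = b
    return bundles
```

With the sample `OPS`, `sub` is independent of everything before it and shares the first bundle with `load`, while `add` and `mul` wait for their inputs. Treating WAR hazards as strict ordering is deliberately conservative here.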

    Feel free to ask me questions if you're curious about this kind of stuff. It's my day job.

    --Joe
    --
    Program Intellivision! [schells.com]
  • Re:I can't drive 55, I've got an electric car. by Goonie (Score:2) Wednesday November 29 2000, @12:26AM
  • Re:I can't drive 55, I've got an electric car. by cyber-vandal (Score:2) Wednesday November 29 2000, @12:46AM
  • Re:compilers often fail by kubrick (Score:1) Wednesday November 29 2000, @12:48AM
  • Re:Another way to do emulation by kurisuto (Score:1) Wednesday November 29 2000, @04:39AM
  • Re:Cool Shit by cyber-vandal (Score:2) Wednesday November 29 2000, @12:49AM
  • Re:Apple Dynamic Recompilation Emulator - 68k to P by kennedy (Score:1) Wednesday November 29 2000, @04:45AM
  • Apple Dynamic Recompilation Emulator - 68k to PPC by goingware (Score:2) Wednesday November 29 2000, @01:00AM
  • Re:code morphing technology by Anonymous Coward (Score:1) Wednesday November 29 2000, @01:19AM
  • Re:Rearranging Compiled Code for Optimization by Espen Skoglund (Score:1) Wednesday November 29 2000, @05:00AM
  • Re:Rearranging Compiled Code for Optimization by Fishtank (Score:1) Wednesday November 29 2000, @01:59AM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by EverCode (Score:1) Wednesday November 29 2000, @05:15AM
  • Re:other stuff (Score:3)

    by Salamander (33735) <[slashdot] [at] [pl.atyp.us]> on Wednesday November 29 2000, @05:42AM (#594738) Homepage Journal
    A color LCD of usable brightness (another huge drain on battery life) is going to output a certain amount of energy

    True for LCD, but why limit yourself to one technology? There's no reason a screen has to emit light at all. After looking at several flavors of "electronic paper" it doesn't seem particularly fanciful to imagine a display which consumes zero power if the image isn't changing and which is readable under the same wide variety of conditions as regular paper. It may well be that such displays will always lag behind more conventional technologies in areas such as transition time or color depth, but for a very wide variety of devices and applications that would still be a big win.

    Even within the realm of light-emitting display technology, there's plenty of room to reduce power consumption. For example, the Light Emitting Polymer work at CDT could lead to displays that consume a lot less power than CRT or LCD displays, in addition to being extremely thin, light and flexible.

    I'm not trying to argue with you here. I completely agree with your main point that power consumption needs to be addressed beyond the CPU. Displays and rotating media in particular are at least as deserving of attention. This is all just FYI.

  • Re:other stuff by Snowfox (Score:1) Wednesday November 29 2000, @06:00AM
  • Re:IBM licensing from Transmeta by CAIMLAS (Score:1) Wednesday November 29 2000, @08:50PM
  • Re:From the FAQ by vortexSurpher (Score:1) Thursday November 30 2000, @04:21AM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by QuantumG (Score:2) Tuesday November 28 2000, @09:55PM
  • Seems like very cool tech by ocelotbob (Score:1) Tuesday November 28 2000, @09:56PM
  • Re:IBM licensing from Transmeta by QuantumG (Score:2) Tuesday November 28 2000, @09:57PM
  • Question by Anonymous Coward (Score:1) Tuesday November 28 2000, @10:05PM
  • Re:also.. by Anm (Score:1) Tuesday November 28 2000, @10:08PM
  • Re:Seems like very cool tech by Mr Z (Score:2) Tuesday November 28 2000, @10:14PM
  • Re:other stuff by The Variable Man (Score:1) Wednesday November 29 2000, @02:15AM
  • coincidence? by cjsteele (Score:1) Wednesday November 29 2000, @02:18AM
  • Re:Java? by QuantumG (Score:2) Tuesday November 28 2000, @10:15PM
  • Suggest profiling userspace kernels by goingware (Score:2) Wednesday November 29 2000, @02:18AM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by firewort (Score:1) Wednesday November 29 2000, @02:27AM
  • Oxymoron alert! by MartinG (Score:1) Wednesday November 29 2000, @02:38AM
  • Is IS emulation by Wesley Felter (Score:1) Wednesday November 29 2000, @07:06AM
  • Re:Rearranging Compiled Code for Optimization by Wodin (Score:1) Wednesday November 29 2000, @02:44AM
  • color screens really aren't necessary. by saintlupus (Score:1) Wednesday November 29 2000, @07:21AM
  • Another way to do emulation by kurisuto (Score:1) Wednesday November 29 2000, @03:03AM
  • Re:Oxymoron alert! by taniwha (Score:2) Wednesday November 29 2000, @07:37AM
  • Re:Another way to do emulation by David Greene (Score:1) Wednesday November 29 2000, @08:52AM
  • Re: 68040 FPU by cant_get_a_good_nick (Score:1) Wednesday November 29 2000, @08:59AM
  • also.. by MrP- (Score:2) Tuesday November 28 2000, @08:36PM
  • Code morphing patented? by MorseKode (Score:2) Tuesday November 28 2000, @08:34PM
  • Can I run MS-WinNT on PowerPC and S/390? by plukas (Score:1) Tuesday November 28 2000, @08:44PM
  • by taniwha (70410) on Tuesday November 28 2000, @08:48PM (#594764) Homepage Journal
    Yup, they have a cool patent on their writeback buffer. Basically it stalls writes until clean points where exceptions are resolved. That way they don't have to worry about having 'clean' exceptions: just toss the memory/register changes and drop back to interpreting the code instruction by instruction.
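The idea in the comment above can be modelled in software as a store buffer with commit and rollback. This is a sketch of the general technique only, not Transmeta's hardware or the patent's claims; the class and method names are hypothetical.

```python
class GatedStoreBuffer:
    """Hypothetical model: stores from translated code are held back until a
    commit point, so they can be thrown away if an exception occurs."""

    def __init__(self, memory):
        self.memory = memory      # committed state: address -> value
        self.pending = {}         # speculative stores since the last commit

    def store(self, addr, value):
        self.pending[addr] = value

    def load(self, addr):
        # Translated code must see its own uncommitted stores.
        if addr in self.pending:
            return self.pending[addr]
        return self.memory.get(addr, 0)

    def commit(self):
        """Clean point reached: make the buffered stores visible."""
        self.memory.update(self.pending)
        self.pending.clear()

    def rollback(self):
        """Exception in translated code: discard everything since the last commit."""
        self.pending.clear()
```

After a `rollback()`, the committed state is exactly what it was at the last clean point, so an interpreter can re-run the same instructions one at a time and raise the exception at the precise instruction.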
  • Nice start... by The-Pheon (Score:1) Tuesday November 28 2000, @08:48PM
  • other stuff by blugecko (Score:2) Tuesday November 28 2000, @08:49PM
  • Re:Nice start... (Score:5)

    by furiousgeorge (30912) on Tuesday November 28 2000, @08:58PM (#594767)
    >Code morphing is a great way to transition to
    >VLIW, but dynamic translation and
    >parallelization will always be slower than
    >native processes.

    No, you're actually wrong (though it is counter-intuitive). Dynamic translation lets you make optimizations at runtime, based on the observed behavior of the code, that can't be done statically at compile time (or done as well in the CPU using branch prediction, etc.). E.g., check out the 'Dynamo' project at HP: emulate the PA-RISC processor on top of itself in software, and get substantial speed improvements....

    http://arstechnica.com/reviews/1q00/dynamo/dynamo-1.html

    http://www.hpl.hp.com/cambridge/projects/Dynamo/
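One way to picture why runtime translation can pay for itself: only code that has proven to be hot gets translated, and the translation is then reused. The sketch below is a generic hot-counter plus translation cache, not Dynamo's actual algorithm; `HOT_THRESHOLD`, the class name and the callback signatures are all assumptions.

```python
HOT_THRESHOLD = 3   # assumed value; real systems tune this


class TranslationCache:
    def __init__(self, translate):
        self.translate = translate   # callback: block address -> optimized callable
        self.counts = {}             # how often each block has been interpreted
        self.cache = {}              # block address -> translated code

    def run_block(self, addr, interpret):
        """Run one block, translating it once it has proven to be hot."""
        if addr in self.cache:
            return self.cache[addr]()          # fast path: reuse the translation
        self.counts[addr] = self.counts.get(addr, 0) + 1
        if self.counts[addr] >= HOT_THRESHOLD:
            self.cache[addr] = self.translate(addr)
        return interpret(addr)                 # slow path: interpret this time
```

Cold code never pays the translation cost, while a block that executes millions of times pays it once.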

  • Re:daisy by ToLu the Happy Furby (Score:2) Tuesday November 28 2000, @10:18PM
  • Re:other stuff (Score:4)

    by TWR (16835) on Tuesday November 28 2000, @09:00PM (#594769)
    why doesn't the industry start to concentrate on making energy efficient devices besides the processor, and it would also help out so that we aren't pushing battery technology, because that field seems to be lagging behind badly

    There is certainly research and development on low-energy components besides the CPU; check out the energy usage of the mobile Radeon, for one thing. However, there are limits on how much you can possibly squeeze out of some components. Hard drives (which probably eat the most energy in a portable system) need to spin, and there's a certain amount of mass which is being kept moving at a certain velocity, along with a certain amount of energy required to read/write data. That puts a limit on how much energy you can save there. CD-ROM drives have similar limitations.

    A color LCD of usable brightness (another huge drain on battery life) is going to output a certain amount of energy; you could make the screens dimmer, but then they are harder to see. Wireless connections are going to require a certain amount of power for broadcast; the further the connection, the more juice. Sound output requires a certain amount of power, and so on.

    What you're seeing is the design decisions which made the original Palm Pilot: no movable parts for storage, B&W, passive matrix screen, no wireless. And it could run for two months on two AAAs. Adding on just a color screen drops that down significantly and requires rechargeable batteries for a reasonable experience. Ditto for wireless. I just don't think there's going to be much of a way around it until we figure out how to store more energy in a light, safe way.

    -jon

  • A portable emulator? by os2fan (Score:1) Tuesday November 28 2000, @09:02PM
  • How will Amiga compete? by FIGJAM (Score:1) Tuesday November 28 2000, @10:29PM
  • Re:other stuff by Anonymous Coward (Score:1) Tuesday November 28 2000, @10:37PM
  • by goingware (85213) on Tuesday November 28 2000, @10:38PM (#594773) Homepage
    I read an IBM paper when I was an OS engineer at A Big Fruit Company [apple.com] which discussed the use of instruction-pointer sampling profilers to optimize compiled PowerPC code (I think maybe actually POWER code, similar but not the same) by rearranging blocks of the machine code in the executable file.

    This was in either late '95 or early '96 - but the IBM work on this had been around for a while by the time I read the paper.

    This technology is widely available now - read all the way to the end to see how you can try it out.

    If you have a jump to a certain offset in a routine, you can move the code where you jump to elsewhere in the file and change the offset you give in the jump. Complicated, because you need to parse RISC machine code, but doable.

    It's made a little easier by PowerPC instructions always being fixed at 32 bits with no extension words (a side effect of that is that there's no way to load a 32-bit constant into a register with a single instruction, which makes it hard to scan machine code by eye for constants in an assembly debugger.)
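As a small illustration of the parenthetical above: a 32-bit constant on PowerPC is typically built with a `lis` / `ori` pair, each carrying a 16-bit immediate, which is why the constant never appears whole in a disassembly. The helper names below are made up for this sketch.

```python
def split_constant(value):
    """Split a 32-bit constant into the two 16-bit immediates of a lis/ori pair."""
    value &= 0xFFFFFFFF
    high = (value >> 16) & 0xFFFF   # immediate for: lis rD, high
    low = value & 0xFFFF            # immediate for: ori rD, rD, low
    return high, low


def rebuild_constant(high, low):
    """What the low 32 bits of the register hold after lis followed by ori."""
    return ((high << 16) | low) & 0xFFFFFFFF
```

For example, `split_constant(0xDEADBEEF)` gives the halves `0xDEAD` and `0xBEEF`, which show up in two different instructions.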

    This has the effect of speeding up the overall program execution because you group frequently used code blocks together in the executable file, and also in memory once it's loaded. You may find less-commonly used branches of an if-statement put miles away at the end of the file, so that you jump a long ways away and then back in sometimes, but this isn't a big deal because all the frequent cases flow straight along.

    The reason this is a big win is twofold. First, you reduce virtual memory paging and the code resident in physical memory because less commonly used code is all grouped together and just sits idly paged out on disk; that which is taking up valuable physical RAM is of a minimum size and being used actively.

    Also (and more importantly in small programs, and in CPU-bound cases), you make more effective use of your processor's code cache.

    This is because jumping over an uncommonly used branch may load a few unused instructions into the cache at the beginning and end of the branch that's not taken - cache lines (blocks) are of a fixed size and are always aligned by the cache block size, so if you have 32 byte cache lines then the start of any cached code falls at a physical address that is divisible by 32.

    If you run even one instruction into the address range, you load 32 whole bytes of code into the cache, evicting 32 bytes of code that might be useful later. If your code is not optimized this way, you'll just end up jumping over most of it.
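The alignment arithmetic behind this is easy to check. A minimal sketch, assuming the 32-byte lines used in the example above (the function names are illustrative):

```python
LINE_SIZE = 32  # bytes, as in the example above


def line_base(addr):
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)


def lines_touched(start, length):
    """Base addresses of every cache line that a code range overlaps."""
    first = line_base(start)
    last = line_base(start + length - 1)
    return list(range(first, last + LINE_SIZE, LINE_SIZE))
```

Two 4-byte instructions starting at `0x103C` straddle a line boundary, so `lines_touched(0x103C, 8)` reports two lines: 64 bytes of cache spent on 8 bytes of useful code.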

    Many people who are trying to make their programs run faster would benefit from knowing more about how the cache works. Gary Kacmarcik's Optimizing PowerPC Code [fatbrain.com] has a good discussion of this that will benefit anyone who programs on modern microprocessors - not just PowerPCs. And while Kacmarcik emphasizes PowerPC assembly, most of the benefit of improving cache use you can do from C, C++ or another higher level language.

    The way the profiler works is that an interrupt-driven task is used to check the instruction counter at frequent but random intervals. The samples are saved to a file for later analysis, then a postprocessor makes a histogram which gives the number of samples per basic block of instructions.
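The postprocessing step might look roughly like this. It is a sketch under assumptions: the samples are already available as a list of addresses, and the basic-block start addresses are known and sorted. Neither the input format nor the function name comes from the IBM or Apple tools.

```python
from bisect import bisect_right
from collections import Counter


def block_histogram(samples, block_starts):
    """Count samples per basic block.

    samples:      sampled instruction addresses
    block_starts: sorted start addresses of the basic blocks (hypothetical input)
    """
    hist = Counter()
    for pc in samples:
        index = bisect_right(block_starts, pc) - 1
        if index >= 0:                       # ignore samples below the first block
            hist[block_starts[index]] += 1
    return hist
```

Because only start addresses are given, any sample past the last start is attributed to the last block; a real tool would also know where each block ends.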

    (A basic block, essentially, is any code that falls between a pair of curly braces if it came from original C source code. It's more complicated than that in practice but basically it's a chunk of machine code that has one entry point and one exit. It's possible to analyze machine code with a program and divvy it up into basic blocks.)

    Then basically what you do is sort the machine code, with the most frequently used basic blocks coming earlier in the file.
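A simplified sketch of that sorting step, together with the branch fix-up mentioned earlier in the comment. The data layout (`{old_start: size}` and `{old_start: count}`) and both function names are assumptions; a real tool would also have to insert branches wherever a block used to fall through into its neighbor.

```python
def reorder(blocks, hist):
    """Lay out blocks hottest-first and return {old_start: new_start}.

    blocks: {old_start: size_in_bytes}; hist: {old_start: sample_count}
    """
    order = sorted(blocks, key=lambda start: (-hist.get(start, 0), start))
    new_address = {}
    cursor = min(blocks)                 # keep the same base address
    for start in order:
        new_address[start] = cursor
        cursor += blocks[start]
    return new_address


def patch_branch(branch_site, old_target, new_address):
    """Recompute a PC-relative displacement after both ends have moved.

    branch_site is the old start of the block holding the branch; for
    simplicity the branch is assumed to be that block's first instruction.
    """
    return new_address[old_target] - new_address[branch_site]
```

Ties in sample count are broken by original address, so blocks that were never sampled keep their relative order at the cold end of the file.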

    Note that the profiling process depends necessarily on the use to which the program is put during the sampling. For best results, you might actually want to prepare several separate binaries of the same program, each optimized for a different purpose. Or you might want to construct test data or a test script that gives you a good overall average performance.

    Now, how do you get this tool? It's more than just theory. It's available for IBM RS/6000s, although I don't remember what they call it.

    But if you can spare the cash for an iMac you can get it included with the Macintosh Programmer's Workshop [apple.com] - MPW. The particular tool that's used for this is called MrPlus, which is discussed in Apple's Technote 1174 [apple.com] and Technote 1066 [apple.com]

    I believe a variant of this is available in the Metrowerks Codewarrior [metrowerks.com] development environment for PowerPC (CodeWarrior also supports Windows, Linux via GCC and lots of embedded systems but I believe the code reordering is only available for PowerPC).

    CodeWarrior provides both an IDE (on Windows there's a choice of MDI user interface or Mac style with a global menu bar and free windows, which makes me much happier when I program on Windows) and it also provides command line tools, including the entirety of MPW with mwcc preinstalled so you can do "make" style builds on the MacOS (but with a weird makefile syntax). I don't seem to find any mention of this on Metrowerks' website. I'll ask their friendly support guy if I'm correct about this.

    Perhaps you're lusting over using this for Linux. It would certainly be interesting to try using this on the kernel - build the kernel, boot the machine off it, run it for a while under a normal load while you run the instruction pointer sampler, then reorder the instructions in the kernel and boot off the new kernel and you run faster!

    This would probably be easiest to do on PowerPC Linux given the availability of published information from IBM and Apple about it, but I don't see why you couldn't do it for any instruction set. Some would just be harder to parse or rearrange correctly than others.

    Stop drooling and start studying.


    Michael D. Crawford
    GoingWare Inc

  • Re:Code morphing patented? by willy_me (Score:2) Tuesday November 28 2000, @10:48PM
  • Re:daisy by QuantumG (Score:2) Tuesday November 28 2000, @10:48PM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by TheInternet (Score:2) Tuesday November 28 2000, @10:48PM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by GMontag451 (Score:1) Wednesday November 29 2000, @03:24AM
  • Re:other stuff by Goonie (Score:2) Tuesday November 28 2000, @10:51PM
  • Re:Java? by macpeep (Score:2) Wednesday November 29 2000, @03:48AM
  • Re:Another way to do emulation by SlaterSan (Score:1) Wednesday November 29 2000, @03:53AM
  • Re:other stuff by LotharHP (Score:1) Wednesday November 29 2000, @03:57AM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by Omega996 (Score:1) Wednesday November 29 2000, @09:03AM
  • Re:other stuff by Random Walk (Score:1) Wednesday November 29 2000, @04:01AM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by Chang (Score:1) Wednesday November 29 2000, @04:26AM
  • Re:Another way to do emulation by SCHecklerX (Score:1) Wednesday November 29 2000, @04:27AM
  • Re:Code morphing patented? by um... Lucas (Score:2) Wednesday November 29 2000, @09:18AM
  • Transmeta watch out by Cyno (Score:1) Wednesday November 29 2000, @09:25AM
  • Re:Nice start... by jovlinger (Score:2) Wednesday November 29 2000, @10:35AM
  • Digital did x86/Alpha Dynamic Binary Transl in '95 by control meta (Score:1) Wednesday November 29 2000, @11:23AM
  • Re: 68040 FPU by kennedy (Score:1) Wednesday November 29 2000, @11:24AM
  • Re:Suggest profiling userspace kernels by woggo (Score:2) Wednesday November 29 2000, @12:17PM
  • Cool Shit by maccroz (Score:2) Tuesday November 28 2000, @09:03PM
  • Re:Code morphing patented? by pb (Score:1) Tuesday November 28 2000, @09:04PM
  • daisy (Score:3)

    by QuantumG (50515) <qg@biodome.org> on Tuesday November 28 2000, @09:08PM (#594794) Homepage Journal
    Yes, that is because Daisy is a DYNAMIC BINARY TRANSLATOR. Say the words with me. What makes Transmeta special is that they have put a dynamic binary translator in a chip and have developed silicon to make it faster. At this very moment I am doing maintenance work on a Pentium -> Sparc dynamic binary translator. Getting x86 floating point instructions to work is a bitch, but for some reason the compress95 benchmark needs floating point to generate data in the test harness, even though it's an integer benchmark.
  • From the FAQ (Score:4)

    by dieman (4814) on Tuesday November 28 2000, @09:08PM (#594795) Homepage
    How similar is DAISY to Transmeta?

    According to their white paper, Transmeta uses dynamic binary translation to convert x86 code into code for Transmeta's internal architecture. This is similar in concept to the current version of DAISY which converts PowerPC code into code for an underlying DAISY VLIW machine. DAISY was developed at IBM independently of Transmeta. The DAISY research project focuses less on low power and more on achieving instruction level parallelism in a server environment and on convergence of different architectures on a common microprocessor core. A more detailed comparison of the DAISY and Transmeta approaches will be possible after Transmeta publishes their techniques in more detail.
  • by CAIMLAS (41445) on Tuesday November 28 2000, @09:18PM (#594796) Homepage Journal
    Might this be why IBM held back on licensing Transmeta's chip to make the sub-sub-notebooks? (or whatever you want to call them)

    -------
    CAIMLAS

  • Re:other stuff by hyperstation (Score:1) Tuesday November 28 2000, @09:24PM
  • Re:Cool Shit by Anonymous Coward (Score:1) Tuesday November 28 2000, @10:51PM
  • Linux -Daisy - Desktops? by Zecho (Score:1) Tuesday November 28 2000, @09:26PM
  • Re:Nice start... by diablovision (Score:2) Tuesday November 28 2000, @09:27PM
  • Re:Question by Goonie (Score:1) Tuesday November 28 2000, @10:56PM
  • Re:Nice start... by norton_I (Score:2) Tuesday November 28 2000, @11:20PM
  • by norton_I (64015) <hobbes@utrek.dhs.org> on Tuesday November 28 2000, @11:27PM (#594803)
    RE: prior art, please read Transmeta's patents before flying off the handle here. To the best of my knowledge, TM didn't patent dynamic translation, but they have several patents on optimizations for dynamic translation. Most of them are particularly suited to the situation where the hardware can be specifically designed to help out the translator, like the shadow registers used to ensure precise exception behavior.
  • Re:Rearranging Compiled Code for Optimization by Mr Z (Score:2) Tuesday November 28 2000, @11:36PM
  • I can't drive 55, I've got an electric car. by Kibo (Score:2) Tuesday November 28 2000, @11:37PM
  • Re:Another way to do emulation by Mr Z (Score:1) Wednesday November 29 2000, @12:39PM
  • Re:coincidence? by Mr Z (Score:1) Wednesday November 29 2000, @12:45PM
  • Re:Java? by os2fan (Score:1) Wednesday November 29 2000, @03:07PM
  • Re:Java? by QuantumG (Score:2) Wednesday November 29 2000, @03:56PM
  • Re:compilers often fail by QuantumG (Score:2) Wednesday November 29 2000, @03:58PM
  • Interesting spin-off's... by SirFlakey (Score:2) Tuesday November 28 2000, @09:28PM
  • Re:Can I run MS-WinNT on PowerPC and S/390? by 2nd Post! (Score:2) Tuesday November 28 2000, @09:35PM
  • energy storage by nut (Score:1) Tuesday November 28 2000, @09:37PM
  • Java? by mvc (Score:1) Tuesday November 28 2000, @09:37PM
  • compilers often fail by QuantumG (Score:2) Tuesday November 28 2000, @09:45PM