IBM's OSS Code Morphing Code vs. Transmeta 93
jjr writes: "It seems that IBM has an Open Source Project called Daisy that does a lot of what Transmeta does. Their code-morphing technology supports PowerPC, x86, and S/390, as well as the Java Virtual Machine. They morph the [code] into VLWI just like Transmeta, but they still have some issues to work out. Other issues dealt with in the report include self-modifying code, precise exceptions, and aggressive reordering of memory references in the presence of strong MP consistency and memory mapped I/O."
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:1)
On the other hand... imagine running Daisy atop Windows, sitting inside a VMWare instance on Linux on Daisy on an S/390!
If I'm not completely wrong, you can't run Linux on Daisy on an S/390. You could run Linux programs if you like, but not install the whole operating system.
But a funny idea :)
--
Re:Code morphing patented? (Score:1)
They (IBM) might be successful since they're such a good customer of the patent office. Surely they will get some favourable treatment to overthrow those patents.
Btw, in this case I'd be on IBM's side, since here it is in-the-open research against patents. Aside from what one thinks about IP patents, research should never be limited by futile things like patents. But it would be ironic, since IBM, in other cases such a big user (and profiteer) of the patent system, would then kind of have to act against itself (i.e. the patent office).
Re:How will Amiga compete? (Score:2)
Not really. Amiga's just implementing a thin virtual machine layer, providing an "ideal assembly language" that provides more control than, say, C, but still provides sufficient abstraction that the code can be targeted to a wide range of CPUs easily. (This is in stark contrast to, say, the Java VM, which is comparatively quite heavy.) You can think of Amiga's virtual assembly as a "medium level language", if such a term exists.
DAISY translates other ISAs into its own native Tree-VLIW ISA. Rather than providing an abstract assembly language that gets targeted to a wide variety of CPUs, DAISY is doing the reverse: take a wide variety of ISAs, and target them to this specialized CPU. Transmeta is similar, although they've chosen to focus primarily on x86 to get the biggest bang for their limited bucks.
--Joe--
Program Intellivision! [schells.com]
Re:Interesting spin-off's... (Score:5)
BTW, Transmeta has been working on their stuff since 1995, so the technology mentioned in the 1997 paper doesn't strictly predate it.
I read about Daisy a few years back when I was studying VLIW scheduling techniques and whatnot. The DAISY VLIW is quite different from most VLIWs around. Their instruction word is built upon the ability to execute large numbers of "branches" in parallel every cycle. (As best I can tell, these "branches" are actually closer to being composite predication conditions in many cases, which is why I put "branches" in quotes.) Their experimental physical implementation could execute something like 8 branches every cycle. Downright weird.
A more traditional VLIW uses predication [google.com] to convert short branches into a simple "if (cond)" prefix on individual instructions. (This technique is known as if conversion.) Also, traditional VLIW instruction words are flat -- all N instructions in a VLIW bundle execute together in parallel, with no tree structure implicit in the encoding.
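If conversion is easy to sketch in a few lines. The toy IR below is invented for illustration (it is nobody's real encoding); it just shows a short forward branch being replaced by guarded straight-line instructions plus a predicated select, so no branch remains:

```python
# Toy illustration of "if conversion": a short branch is replaced by
# predicated instructions, so both paths occupy straight-line code and
# a (cond) predicate decides which result takes effect.
# All names here are invented for the sketch.

def run_branchy(cond, x):
    # Original control flow: a short forward branch.
    if cond:
        x = x + 1
    else:
        x = x - 1
    return x

def run_predicated(cond, x):
    # After if conversion: both instructions execute every "cycle",
    # each notionally guarded by a predicate; no branch remains.
    t_then = x + 1                   # (p)  add t_then, x, 1
    t_else = x - 1                   # (!p) sub t_else, x, 1
    x = t_then if cond else t_else   # predicated select
    return x

for c in (True, False):
    for v in (0, 5, -3):
        assert run_branchy(c, v) == run_predicated(c, v)
```

The win is that the compiler can now schedule both arms into the same VLIW bundle instead of guessing which way the branch goes.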
All that aside, the DAISY scheduling techniques sound pretty similar to trace scheduling [google.com] , which was used on the old Multiflow VLIW machines [google.com]. The actual process of converting PowerPC instructions to individual DAISY operations is mostly search and replace, and preserving program order is a matter of constructing proper dependences between the instructions.
Feel free to ask me questions if you're curious about this kind of stuff. It's my day job.
--Joe--
Program Intellivision! [schells.com]
Re:I can't drive 55, I've got an electric car. (Score:2)
Personally I think it's a shame that while we all wait for these technologies to become economically viable, the suburbs of the US, Canada and Australia are being filled with fuel-guzzling gasoline-powered four wheel drives, despite the fact their owners never take them off road :(
Re:I can't drive 55, I've got an electric car. (Score:2)
This is a sign of how screwed up the US is that this incredibly remote possibility should even have to be taken into account.
Re:compilers often fail (Score:1)
On the Amiga, with Motorola's 68000, 68020, 68030, 68040 and a few 68060, someone actually released a binary patcher that attempted to patch binaries compiled for lower processors to make them faster (use new instructions, avoid ones emulated on the newer chips and thus slower, etc.) It also attempted to patch some sub-optimal cases often produced by the main C compilers in the market.
Tended to work pretty well...
OK, completely irrelevant, but I thought you might be interested anyway
Re:Another way to do emulation (Score:1)
I'm talking about a program which takes a binary compiled for one processor as its input, and gives a binary native to another processor as its output (and then runs it). This way, you only translate once, rather than each time thru the loop.
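The translate-once idea fits in a few lines. The toy "foreign" ISA below is invented for illustration; the point is that the interpreter pays the decode cost on every pass, while the translator pays it exactly once and then runs the resulting native code:

```python
# A minimal sketch of translate-once binary translation, contrasted
# with re-decoding on every pass. The toy "foreign" ISA and its
# opcodes are invented for illustration.

FOREIGN_PROGRAM = ["INC", "INC", "DBL", "DEC"]  # pretend guest binary

def interpret(program, x):
    # Interpreter: decodes every instruction each time it runs.
    for op in program:
        if op == "INC":
            x += 1
        elif op == "DEC":
            x -= 1
        elif op == "DBL":
            x *= 2
    return x

def translate(program):
    # Translate each guest instruction to a native operation exactly once.
    table = {"INC": lambda x: x + 1,
             "DEC": lambda x: x - 1,
             "DBL": lambda x: x * 2}
    natives = [table[op] for op in program]  # decode happens here, once
    def run(x):                              # the "native binary"
        for f in natives:
            x = f(x)
        return x
    return run

native = translate(FOREIGN_PROGRAM)
assert native(3) == interpret(FOREIGN_PROGRAM, 3)  # same result, no re-decode
```

In practice the hard part (as the FX!32 reply above notes) is telling code from data in the binary before you've run it, which is why real systems interpret first and translate later.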
Re:Cool Shit (Score:2)
Re:Apple Dynamic Recompilation Emulator - 68k to P (Score:1)
Apple Dynamic Recompilation Emulator - 68k to PPC (Score:2)
The first emulator, as I understand it, was basically an interpreter, sort of like the Java virtual machine but where the "bytecodes" are 68000 instructions (I'm not sure which actual microprocessor was emulated; maybe it was the '020). Not real fast, because you have to decode each instruction every time you hit it, but it was well-written and reliable.
Then there was the dynamic recompilation emulator which I believe first appeared in the first PCI Macs (like the 8500/120) and System 7.5.3 (not exactly sure if that's right but thereabouts).
This was like the JIT - "Just in Time" compilers for Java, it would compile 68000 code to PowerPC code and then execute the PowerPC code natively.
This was a shipping product I believe in late '95 and I'm pretty sure Apple was not the first to do such a thing.
Note that on the Mac they were unable to rewrite much of the low-level OS code from 68000 to PowerPC, at least not initially, and so a lot of system software remained emulated and probably still does. Also it is very common for Mac applications to install interrupt-time tasks, and many of those are legacy 68k apps, so it would be inefficient to switch instruction set architectures all the time.
I seem to recall it takes something like 200 PPC instructions to switch from one architecture to the other so if you're already in 68k code and you're about to run a small routine it's best to remain emulated.
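The core of a dynamic-recompilation emulator like the one described above is a translation cache: guest code blocks are translated the first time they're reached and the translated version is reused on every later visit. This sketch uses an invented guest ISA and block layout purely for illustration:

```python
# Hedged sketch of a dynamic-recompilation translation cache: each
# guest block is translated on first execution and reused afterward.
# The guest "blocks" and opcodes below are invented for illustration.

GUEST_BLOCKS = {              # address -> guest pseudo-instructions
    0x1000: ["INC", "INC"],
    0x2000: ["DBL"],
}

cache = {}        # address -> translated (native) function
translations = 0  # how many times we paid the translation cost

def translate_block(ops):
    global translations
    translations += 1
    table = {"INC": lambda x: x + 1, "DBL": lambda x: x * 2}
    natives = [table[op] for op in ops]
    def run(x):
        for f in natives:
            x = f(x)
        return x
    return run

def execute(addr, x):
    # Look up the translation cache; translate only on a miss.
    if addr not in cache:
        cache[addr] = translate_block(GUEST_BLOCKS[addr])
    return cache[addr](x)

x = 0
for _ in range(10):              # a loop in the guest program
    x = execute(0x1000, x)       # +2 each pass
    x = execute(0x2000, x)       # *2 each pass
assert translations == 2         # each block translated exactly once
```

Self-modifying code (see the BlockMove discussion below) is what makes the real thing hard: a store into a translated block has to invalidate the cached translation.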
It is possible to write "fat" code that provides both options and the machine will use whichever one it's currently running - this is common for "Extensions" which make "fat patches" to OS calls, and many OS calls are "fat traps".
For this reason, the Classic System 7 MacOS (of which Mac OS 8 and OS 9 are examples, but Mac OS X is a whole different thing) handles hardware interrupts in emulated 68000 code.
Interrupt handlers and device drivers may be written in 68k code or PowerPC code as you like and run on a PowerPC machine.
The dynamic recompilation emulator I think emulates an '040 with its instruction cache issues, and it correctly handles hardware interrupts that happen in the middle of running a chunk of recompiled code.
Early Mac apps very commonly used self-modifying code. For example, if a "code resource" was expected to be loaded into memory and used by the system, many applications would load a small stub that jumped to an offset that was a placeholder. Then they would write an address in the running program code into the placeholder after it was loaded. This kind of thing screwed up on the 040 because you were writing to code using data instructions, but there were lots of workarounds such as the painful decision to flush the data cache after calling BlockMove - and the addition of the BlockMoveData call which wouldn't flush the cache.
Also note that an application (or any code) can install callbacks that are written in 68k, PPC code or fat, and this code will be correctly called from the OS or toolbox, whether it started in 68k or PPC. This works because of something called a "routine descriptor" that is a compact description of a function API - it handles Pascal vs. C calling conventions, instruction set architectures, and the possibility of providing alternative entry points for each architecture.
On 68k there is a "trap" - a defined illegal instruction, that causes a jump to an exception handler. The exception handler reads in the routine descriptor and does the right thing. On PPC, you pass the routine descriptor to the CallRoutineDescriptor function (or something like that).
68k code is legacy and knows nothing about routine descriptors, but the emulated processor handles traps correctly. Since PowerPC was released after the routine descriptor architecture was all implemented, developers can easily put it directly in their code. There are headers with macros that make most of this transparent, so you can compile both kinds from one set of sources.
Michael D. Crawford
GoingWare Inc
Re:code morphing technology (Score:1)
It looks like BASIC and C on crack!
Oh wait...
Is that Python?
Re:Rearranging Compiled Code for Optimization (Score:1)
It's kind of cool; they actually sample executing code (including the kernel) at regular intervals by interpreting some instructions from the instruction stream instead of just recording the instruction pointer. This enables them to gather statistics about the outcome of instructions, the physical location of load/store instructions, whether an instruction hit in the cache, how long it took to execute, and so on.
There is supposedly a downloadable evaluation version of the software at their website (the problem, of course, is that it only works on Alphas running Tru64 Unix or Windows NT).
Re:Rearranging Compiled Code for Optimization (Score:1)
My biggest problem has been with the hideous instruction set x86 provides.
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:1)
Re:other stuff (Score:3)
True for LCD, but why limit yourself to one technology? There's no reason a screen has to emit light at all. After looking at several flavors of "electronic paper" it doesn't seem particularly fanciful to imagine a display which consumes zero power if the image isn't changing and which is readable under the same wide variety of conditions as regular paper. It may well be that such displays will always lag behind more conventional technologies in areas such as transition time or color depth, but for a very wide variety of devices and applications that would still be a big win.
Even within the realm of light-emitting display technology, there's plenty of room to reduce power consumption. For example, the Light Emitting Polymer work at CDT could lead to displays that consume a lot less power than CRT or LCD displays, in addition to being extremely thin, light and flexible.
I'm not trying to argue with you here. I completely agree with your main point that power consumption needs to be addressed beyond the CPU. Displays and rotating media in particular are at least as deserving of attention. This is all just FYI.
Re:other stuff (Score:1)
Re:IBM licensing from Transmeta (Score:1)
-------
CAIMLAS
Re:From the FAQ (Score:1)
Apparently, no one at IBM or Transmeta would call ZDNet back, so the writers came to
One other thing: the ZDNet article mentioned something about Crusoe doing parallel processing, and I believe they mean internally, not just multiple processors on a board. I haven't seen anything anywhere indicating Crusoe or DAISY is capable of true parallel processing. Has anyone seen anything about this, or are the ZDNet writers drinking their glow-stick juice again?
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:2)
Seems like very cool tech (Score:1)
ST: Anyone else immediately think of 2001 when they saw the project was named Daisy?
Re:IBM licensing from Transmeta (Score:2)
Question (Score:1)
IMO, the best approach would be a hybrid, where the code morphing could use the intermediate representation as a binary form to generate machine language, and then optimize it using runtime profiling and scheduling based on techniques already used by current compilers on IR trees.
How is this different from a really good Java JIT compiler?
Re:also.. (Score:1)
Read the page. IBM doesn't use the term.
Re:Seems like very cool tech (Score:2)
I seem to recall someone (perhaps IBM's old DAISY website) mentioning that the name DAISY was picked as an obscure reference back to that. Goes back to the old HAL == IBM << 1 thing...
--Joe--
Program Intellivision! [schells.com]
Re:other stuff (Score:1)
coincidence? (Score:1)
-C
Re:Java? (Score:2)
Suggest profiling userspace kernels (Score:2)
It occurred to me that profiling a kernel like I suggested is a problem because the kernel can disable interrupts (as when handling an interrupt) and so even though you might be able to sample to some extent it may be hard to get good results. Also you crash the machine, etc.
But I recall reading recently here that someone had the Linux kernel running as a user space program. So you boot a real linux kernel, then run a fake kernel inside of some kind of hardware emulator or something. It was suggested to use this for kernel development - you could quit the kernel and restart it much quicker than rebooting and there's less danger of corrupting your machine, if your test machine is also your user machine, as is all too often the case.
But with this you could easily profile a userspace kernel and be interrupting it from the outside without the test kernel being aware it's being interrupted, as those interrupts are not handled by the test kernel, but by external code.
Of course, you'd want this to work for ordinary programs first. Let the kernel be your fourth year project!
Michael D. Crawford
GoingWare Inc
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:1)
Linux, the whole operating system, installs and runs on s/390.
If Daisy runs under Linux, then you could conceivably have an S/390 (and can we be friends if you do?), run linux, run daisy, run windows, run daisy, run macOSx, etc....
A host is a host from coast to coast, but no one uses a host that's close
Oxymoron alert! (Score:1)
Is IS emulation (Score:1)
I see that you also got suckered by the story; DAISY actually does not support x86 and it produces VLIW code, not VLWI (whatever that is).
Re:Rearranging Compiled Code for Optimization (Score:1)
I don't know if anybody is working on it currently, but here's [lwn.net] an article about it:
http://lwn.net/1998/1029/als/rope.html
color screens really aren't necessary. (Score:1)
A color LCD of usable brightness (another huge drain on battery life) is going to output a certain amount of energy
I agree, but I for one (and I'm sure there are others out there) would be happy to get a greyscale screen if I could get an increase in battery life for it. Are there any decent laptops out there with black and white screens?
--saint----
Another way to do emulation (Score:1)
It occurs to me that there's a third possible way: rather than doing the emulation step by step as the program runs, step thru the whole compiled program and convert it to native code just once, and then run it natively from then on, rather than re-emulate it each time thru the loop.
How come nobody is doing it that way?
Re:Oxymoron alert! (Score:2)
I suggest the following test to anyone considering patenting something - "would you feel proud explaining your idea to Mr Edison? Or embarrassed?"
Re:Another way to do emulation (Score:1)
FX!32 does something like what you're talking about, except it uses the initial, emulated run of the program to find out what parts are actual code. On the next run, if untranslated code is touched, an exception handler emulates it and marks it for translation after program execution.
--
Re: 68040 FPU (Score:1)
Re: (Score:2)
Code morphing patented? (Score:2)
Can I run MS-WinNT on PowerPC and S/390? (Score:1)
No, I'm not trolling. I'm just curious since there has been this sort of "hardware emulation" trend going on recently.
Re:Code morphing patented? (Score:4)
Nice start... (Score:1)
other stuff (Score:2)
Re:Nice start... (Score:5)
>VLIW, but dynamic translation and
>parallelization will always be slower than
>native processes.
No, you're actually wrong (though it is counter-intuitive). Dynamic translation lets you make optimizations at runtime about the behavior of the code that can't be done statically at compile time (or even as well in the CPU using branch prediction, etc.). E.g. check out the 'Dynamo' project at HP - emulate the PA-RISC processor on top of itself in software, and get substantial speed improvements....
http://arstechnica.com/reviews/1q00/dynamo/dyna
http://www.hpl.hp.com/cambridge/projects/Dynamo
Re:daisy (Score:2)
No, actually you have it backwards. Intel (and later NexGen, AMD and I believe Cyrix) put a dynamic ISA translator *on their chips* starting with the P6--they decode (i.e. translate) x86 instructions into internal "u-op" instructions (AMD calls them "macro-ops", same idea) which are used by the rest of the silicon. (This is necessary because x86 instructions are too heterogeneous in length and complexity to work well in a deeply-pipelined out-of-order core.)
What Transmeta did was essentially move this translator *off* the chip, into software. The advantage of this is simpler silicon, and therefore lower power consumption. (Also, all things being equal, higher maximum clock speeds; all things are clearly not equal.) A secondary advantage is that far more resources (16MB IIRC) can be devoted to buffering, tracing, analyzing and optimizing the instructions than on a chip, where the physical chip-size keeps buffers small and optimizations simple. The disadvantage is that all this needs to be run on general-purpose (i.e. slower) silicon--and worse, competes for CPU-time with the very programs it is trying to optimize. (Not to mention takes up 16MB of system resources.)
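The decode step described above is easy to mock up. The byte encodings below are invented for the sketch (they are not real x86); the point is that variable-length "CISC" instructions get cracked into fixed, uniform micro-ops, with a read-modify-write instruction becoming several RISC-like u-ops:

```python
# Toy sketch of cracking variable-length instructions into fixed u-ops.
# The encodings below are invented for illustration, not real x86.

# byte stream: opcode byte, then 0 or more operand bytes
CODE = bytes([0x01,                # INC          (1 byte)  -> one u-op
              0x02, 0x05,          # ADDI 5       (2 bytes) -> one u-op
              0x03, 0x0A, 0x02])   # mem[10] += 2 (3 bytes) -> load/add/store

LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3}

def decode(code):
    uops, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x01:
            uops.append(("add", "r0", "r0", 1))
        elif op == 0x02:
            uops.append(("add", "r0", "r0", code[i + 1]))
        elif op == 0x03:
            # one CISC read-modify-write becomes three RISC-like u-ops
            addr, imm = code[i + 1], code[i + 2]
            uops += [("load", "t0", addr),
                     ("add", "t0", "t0", imm),
                     ("store", addr, "t0")]
        i += LENGTHS[op]
    return uops

uops = decode(CODE)
assert len(uops) == 5          # 3 guest instructions -> 5 fixed u-ops
assert uops[2][0] == "load"    # the RMW op was cracked into load/add/store
```

The on-chip version of this runs every cycle in hardware; Transmeta's move was to do it once in software and cache the result.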
So far the tradeoff has been (IMO) a big loser except in special circumstances--where you need long battery life, x86-compatibility (otherwise there are faster, smaller, more efficient chips out there, like anything in the ARM family), little weight (otherwise just use a bigger battery), and have efficient enough components for the rest of the system to actually make a difference (this is the gotcha with traditional laptops). Whether this particular set of circumstances will turn out to be a small or huge market niche, it is certainly a small problem space. Of course, much of the blame is due to TM's implementation rather than the (basically sound) idea; apparently their architecture is not up to Intel's standards (their process technology is IBM, so that's not the problem). Of course, mistakes are very common in the first iteration of a wildly new idea--witness Itanium (harnessing VLIW for very different ends--and arguably with less success) for proof of that.
Re:other stuff (Score:4)
There is certainly research and development on low-energy components besides the CPU; check out the energy usage of the mobile Radeon, for one thing. However, there are limits on how much you can possibly squeeze out of some components. Hard drives (which probably eat the most energy in a portable system) need to spin, and there's a certain amount of mass being kept moving at a certain velocity, along with a certain amount of energy required to read/write data. That puts a limit on how much energy you can save there. CD-ROM drives have similar limitations.
A color LCD of usable brightness (another huge drain on battery life) is going to output a certain amount of energy; you could make the screens dimmer, but then they are harder to see. Wireless connections are going to require a certain amount of power for broadcast; the further the connection, the more juice. Sound output requires a certain amount of power, and so on.
What you're seeing are the design decisions that made the original Palm Pilot: no movable parts for storage, a B&W passive-matrix screen, no wireless. And it could run for two months on two AAAs. Adding on just a color screen drops that down significantly and requires rechargeable batteries for a reasonable experience. Ditto for wireless. I just don't think there's going to be much of a way around it until we figure out how to store more energy in a light, safe way.
-jon
A portable emulator? (Score:1)
You will still have a need for OS specific apps, but so much of the customer cost in OS replacement is in replacing the apps for the OS.
Different emulator codes could be optimised for different classes of program: for example, games and productivity suites have different requirements, and thus could use different emulators. You could can the games emulator to stop people running games at work :)
There are so many possibilities that we missed because of the `ILOVEYOU' affair with Windows :(.
How will Amiga compete? (Score:1)
Re:other stuff (Score:1)
Also, battery technology has been slow. (Don't think people aren't trying to fix this.) Getting energy out of something stable like a battery isn't easy.
I guess the point is: people are making low power devices, you just can't be pleased.
Rearranging Compiled Code for Optimization (Score:5)
This was in either late '95 or early '96 - but the IBM work on this had been around for a while by the time I read the paper.
This technology is widely available now - read all the way to the end to see how you can try it out.
If you have a jump to a certain offset in a routine, you can move the code where you jump to elsewhere in the file and change the offset you give in the jump. Complicated, because you need to parse RISC machine code, but doable.
It's made a little easier by PowerPC instructions always being fixed at 32 bits with no extension words (a side effect of that is that there's no way to load a 32-bit constant into a register with a single instruction, which makes it hard to scan machine code by eye for constants in an assembly debugger.)
This has the effect of speeding up the overall program execution because you group frequently used code blocks together in the executable file, and also in memory once it's loaded. You may find less-commonly used branches of an if-statement put miles away at the end of the file, so that you jump a long ways away and then back in sometimes, but this isn't a big deal because all the frequent cases flow straight along.
The reason this is a big win is twofold. First, you reduce virtual memory paging and the code resident in physical memory because less commonly used code is all grouped together and just sits idly paged out on disk; that which is taking up valuable physical RAM is of a minimum size and being used actively.
Also (and more importantly in small programs, and in CPU-bound cases), you make more effective use of your processor's code cache.
This is because jumping over an uncommonly used branch may load a few unused instructions into the cache at the beginning and end of the branch that's not taken - cache lines (blocks) are of a fixed size and are always aligned by the cache block size, so if you have 32 byte cache lines then the start of any cached code falls at a physical address that is divisible by 32.
If you run even one instruction into the address range, you load 32 whole bytes of code into the cache, deleting 32 bytes of code that might be useful later; then, if your code is not optimized this way, you'll just end up jumping over most of it.
Many people who are trying to make their programs run faster would benefit from knowing more about how the cache works. Gary Kacmarcik's Optimizing PowerPC Code [fatbrain.com] has a good discussion of this that will benefit anyone who programs on modern microprocessors - not just PowerPCs. And while Kacmarcik emphasizes PowerPC assembly, most of the benefit of improving cache use you can do from C, C++ or another higher level language.
The way the profiler works is that an interrupt-driven task is used to check the instruction counter at frequent but random intervals. The samples are saved to a file for later analysis, then a postprocessor makes a histogram which gives the number of samples per basic block of instructions.
(A basic block, essentially, is any code that falls between a pair of curly braces if it came from original C source code. It's more complicated than that in practice, but basically it's a chunk of machine code that has one entry point and one exit. It's possible to analyze machine code with a program and divvy it up into basic blocks.)
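Divvying machine code into basic blocks uses the standard "leaders" rule: the first instruction, every branch target, and every instruction following a branch each start a new block. A sketch over an invented toy instruction list:

```python
# A sketch of carving machine code into basic blocks with the standard
# "leaders" rule. The toy instruction list is invented for illustration.

# each entry: (op, branch_target_or_None), indexed by position
PROGRAM = [
    ("cmp", None),   # 0
    ("beq", 4),      # 1  conditional branch to 4
    ("add", None),   # 2
    ("jmp", 5),      # 3  unconditional branch to 5
    ("sub", None),   # 4
    ("ret", None),   # 5
]
BRANCH_OPS = {"beq", "jmp"}

def basic_blocks(program):
    leaders = {0}                         # first instruction leads a block
    for i, (op, target) in enumerate(program):
        if op in BRANCH_OPS:
            leaders.add(target)           # branch target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)        # fall-through after a branch
    cuts = sorted(leaders) + [len(program)]
    return [list(range(a, b)) for a, b in zip(cuts, cuts[1:])]

blocks = basic_blocks(PROGRAM)
assert blocks == [[0, 1], [2, 3], [4], [5]]
```

Each resulting block has one entry and one exit, which is exactly what lets the reordering step below move them around safely.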
Then basically what you do is sort the machine code, with the most frequently used basic blocks coming earlier in the file.
Note that the profiling process depends necessarily on the use to which the program is put during the sampling. For best results, you might actually want to prepare several separate binaries of the same program, each optimized for a different purpose. Or you might want to construct test data or a test script that gives you a good overall average performance.
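The reordering step itself reduces to sorting blocks by sample count. A minimal sketch, with block names and sample counts invented for illustration: the hottest blocks lead the file so frequently executed code packs into the same pages and cache lines, while error paths sink to the end.

```python
# Minimal sketch of profile-guided block reordering: given a per-block
# sample histogram, lay the hottest blocks out first. Block names and
# counts are invented for illustration.

from collections import Counter

# pretend instruction-pointer samples, one block name per sample
samples = (["loop_body"] * 900 + ["entry"] * 50 +
           ["error_path"] * 3 + ["exit"] * 47)

original_layout = ["entry", "error_path", "loop_body", "exit"]

def reorder(layout, samples):
    hist = Counter(samples)
    # hot blocks first; cold blocks (error paths) pushed to the end
    return sorted(layout, key=lambda b: -hist[b])

new_layout = reorder(original_layout, samples)
assert new_layout[0] == "loop_body"      # the hot loop leads the file
assert new_layout[-1] == "error_path"    # rarely-run code sinks to the end
```

A real tool would then rewrite the branch offsets between blocks, which is the "complicated but doable" parsing work described above.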
Now, how do you get this tool? It's more than just theory. It's available for IBM RS-6000's, although I don't remember what they call it.
But if you can spare the cash for an iMac you can get it included with the Macintosh Programmer's Workshop [apple.com] - MPW. The particular tool that's used for this is called MrPlus, which is discussed in Apple's Technote 1174 [apple.com] and Technote 1066 [apple.com]
I believe a variant of this is available in the Metrowerks Codewarrior [metrowerks.com] development environment for PowerPC (CodeWarrior also supports Windows, Linux via GCC and lots of embedded systems but I believe the code reordering is only available for PowerPC).
CodeWarrior provides both an IDE (on Windows there's a choice of MDI user interface or Mac style with a global menu bar and free windows, which makes me much happier when I program on Windows) and it also provides command line tools, including the entirety of MPW with mwcc preinstalled so you can do "make" style builds on the MacOS (but with a weird makefile syntax). I don't seem to find any mention of this on Metrowerks' website. I'll ask their friendly support guy if I'm correct about this.
Perhaps you're lusting over using this for Linux. It would certainly be interesting to try using this on the kernel - build the kernel, boot the machine off it, run it for a while under a normal load while you run the instruction pointer sampler, then reorder the instructions in the kernel and boot off the new kernel and you run faster!
This would probably be easiest to do on PowerPC Linux given the availability of published information from IBM and Apple about it, but I don't see why you couldn't do it for any instruction set. Some would just be harder to parse or rearrange correctly than others.
Stop drooling and start studying.
Michael D. Crawford
GoingWare Inc
Re:Code morphing patented? (Score:2)
Willy
Re:daisy (Score:2)
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:2)
NT was ported to PowerPC, but later dropped.
- Scott
------
Scott Stevenson
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:1)
On a related note, does anyone know whether or not this is true: I heard that because the Macintosh Network Server models that Apple put out a while ago were based on a CHRP motherboard, you could install NT on them. If it is true, does anyone have one of these models they would be willing to sell?
Re:other stuff (Score:2)
The fact that the automotive industry, amongst others, has been throwing money at battery research for decades and hasn't made any order-of-magnitude breakthroughs suggests that making more efficient batteries is extremely difficult.
Re:Java? (Score:2)
Re:Another way to do emulation (Score:1)
Re:other stuff (Score:1)
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:1)
Re:other stuff (Score:1)
Of course there are no room-temperature superconducting materials known yet (and strong magnetic fields may cause problems).
Also, if you use a head-worn display instead of a huge screen, the size and power output of the LCD drops substantially.
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:1)
Re:Another way to do emulation (Score:1)
They are, and have been for the past few years. Just because it doesn't happen in linux doesn't mean it doesn't happen.
Look at OS/2's odin project for running windoze 95/NT executables NATIVELY:
http://odin.netlabs.org/ProjectAbout.phtml [netlabs.org]
Re:Code morphing patented? (Score:2)
And let's not forget... Transmeta initially chose IBM as a foundry specifically because they have a license from Intel to manufacture x86-compatible chips... IBM could have extracted a cross-license agreement to cover whatever technologies they needed covered when they were negotiating with Transmeta.
Transmeta watch out (Score:1)
Don't you wish you could talk to those managing Transmeta directly? I'd love to point at articles like this and say, "I told you so." They are a good year ahead of any competition, but unfortunately their products are still too pricey and too slow. Since Transmeta refuses to open source their code morphing capability, I'll put my money and support behind IBM or whomever writes software to give me the functionality Transmeta doesn't even want to give its customers.
I want a system that can change its instruction set on the fly, or at least in prom or bios. I want a system I can run solaris, OSX, Linux, IRIX and wintendoze on natively at near hardware speeds. It would also be nice if this could be a portable system, but that's not nearly as much a requirement. Transmeta refuses to write additional code morphing software for the ultrasparc, MIPS, PPC, etc. instruction sets. So as far as I'm concerned they can be consumed by AOL or the next big monopoly. I won't shed a tear.
Re:Nice start... (Score:2)
Dynamo currently hosts a single binary, while Code Morphing also hosts the OS and the device drivers and
The neat thing is that if you're careful, you can start inlining kernel and driver code into the application. Goodbye context switch on system call! That could get you big, big, performance wins.
Of course, transmeta isn't doing it yet, so maybe I missed something.
Digital did x86/Alpha Dynamic Binary Transl in '95 (Score:1)
Digital (Compaq) [compaq.com] developed an x86 Dynamic Binary Translator running on Alpha called FX!32 [digital.com]. FX!32 won Byte Magazine's "Best Technology" award [digital.com] at Fall Comdex '95.
Dynamic in this case means that some code is emulated on the fly, and some is translated. This approach was pioneered for bytecode systems in Smalltalk implementations in the 80's, and of course is now used in Sun's HotSpot and other dynamic adaptive JVMs.
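The emulate-some, translate-some split described above can be sketched with a hot-spot counter: every block starts out interpreted, and once its execution count passes a threshold it gets translated and runs natively from then on. The guest blocks, opcodes, and threshold below are all invented for illustration:

```python
# Hedged sketch of mixed interpretation/translation with hot-spot
# detection, in the spirit of FX!32 and adaptive JVMs. Guest blocks,
# opcodes, and the threshold are invented for illustration.

from collections import defaultdict

GUEST = {"cold": ["INC"], "hot": ["DBL", "INC"]}
OPS = {"INC": lambda x: x + 1, "DBL": lambda x: x * 2}
THRESHOLD = 3

counts = defaultdict(int)
translated = {}          # block name -> native function

def execute(block, x):
    if block in translated:               # already native: no decode cost
        return translated[block](x)
    counts[block] += 1
    for op in GUEST[block]:               # interpret, decoding as we go
        x = OPS[op](x)
    if counts[block] >= THRESHOLD:        # hot: translate for next time
        natives = [OPS[op] for op in GUEST[block]]
        def run(x, natives=natives):
            for f in natives:
                x = f(x)
            return x
        translated[block] = run
    return x

x = 0
for _ in range(10):
    x = execute("hot", x)                 # crosses the threshold, gets compiled
x = execute("cold", x)                    # run once: stays interpreted
assert "hot" in translated and "cold" not in translated
```

Spending translation effort only on hot code is the same economics that makes HotSpot-style JVMs practical: most code runs once, so translating everything up front would waste more time than it saves.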
Static binary translators have been around for even longer, and were used (among other things) for running VAX programs on Alpha. A useful overview [digital.com] of this sort of technology appeared in the Digital Technical Journal 4:4 (1992) [digital.com]. HP also performed binary translation between the HP3000 and the Precision architecture, but I can't find on-line info on that, just a citation [nec.com] to a paper article (1987). There is also a useful survey article [nec.com] on static and dynamic binary translation.
What is presumably novel in Transmeta's approach is that their instruction set architecture (ISA) is tuned specifically for dynamic translation (see page 12ff of Transmeta's paper The Technology Behind Crusoe Processors [transmeta.com]). Some microcode architectures have been designed specifically for general emulation (most have been tuned for a particular macroinstruction ISA), e.g. the early Lisp Machines [uni-hamburg.de] (1976-81).
Re: 68040 FPU (Score:1)
Re:Suggest profiling userspace kernels (Score:2)
The original (sparc solaris) port has been going on for over four years now, with several graduate students working on it. There is an i386 linux port in progress, but I don't know if it's generally available yet. I'd suggest reading the papers (the above link), as there are a lot of fascinating "gotchas" and the ways that these were dealt with are quite clever. (For example: how do you atomically insert a sequence of two instructions into a process that you can't stop?)
In any event, the papers are good reading, and will be very useful to your research. (BTW, the student who started this project is finishing his PhD this winter and has a "killer job" waiting for him. :-) )
~wog
My views are my own and do not reflect those of my university or research group.
Cool Shit (Score:2)
This could mean that upgrading architectures could be possible while still retaining backwards compatibility. Isn't it about time Microsoft left the x86 instruction world and embraced the newest technology available? This would be like Apple's transition to PPC, although unlike Apple, they wouldn't need to write a software emulator for older software; they could simply use DAISY to morph the code.
Does anyone know how DAISY compares with software emulation in terms of speed? I'm guessing it is a great deal faster.
Re:Code morphing patented? (Score:1)
So now the question is, what patent rights did IBM get out of them?
---
pb Reply or e-mail; don't vaguely moderate [ncsu.edu].
daisy (Score:3)
From the FAQ (Score:4)
According to their white paper, Transmeta uses dynamic binary translation to convert x86 code into code for Transmeta's internal architecture. This is similar in concept to the current version of DAISY which converts PowerPC code into code for an underlying DAISY VLIW machine. DAISY was developed at IBM independently of Transmeta. The DAISY research project focuses less on low power and more on achieving instruction level parallelism in a server environment and on convergence of different architectures on a common microprocessor core. A more detailed comparison of the DAISY and Transmeta approaches will be possible after Transmeta publishes their techniques in more detail.
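The VLIW half of that description is easy to caricature: pack adjacent instructions into one wide bundle whenever they have no data dependences on each other, so they can issue in parallel. The three-slot width and the `(dst, src1, src2)` tuple format below are made up for illustration; DAISY's actual Tree-VLIW scheduling is far more sophisticated.

```python
# Toy VLIW bundler: greedy, in-order packing of independent instructions.
# An instruction is a (dst, src1, src2) triple of register names; a new
# bundle starts on any dependence or when the (invented) width is full.

ISSUE_WIDTH = 3  # made-up number of parallel slots per VLIW bundle

def bundle(instrs):
    bundles, current, written = [], [], set()
    for dst, s1, s2 in instrs:
        # RAW hazard if a source was written in this bundle,
        # WAW hazard if the destination was.
        depends = s1 in written or s2 in written or dst in written
        if (depends or len(current) == ISSUE_WIDTH) and current:
            bundles.append(current)
            current, written = [], set()
        current.append((dst, s1, s2))
        written.add(dst)
    if current:
        bundles.append(current)
    return bundles

prog = [("r1", "a", "b"),    # independent
        ("r2", "c", "d"),    # independent -> shares a bundle with the above
        ("r3", "r1", "r2")]  # reads r1 and r2 -> must open a new bundle
print(bundle(prog))
# [[('r1', 'a', 'b'), ('r2', 'c', 'd')], [('r3', 'r1', 'r2')]]
```

The translator's job, in this cartoon, is finding enough independent work to keep the slots full — which is why the FAQ frames DAISY as an instruction-level-parallelism project.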
IBM licensing from Transmeta (Score:4)
-------
CAIMLAS
Re:other stuff (Score:1)
all of that sounds really sweet, and maybe this little battery [slashdot.org] is the answer
Re:Cool Shit (Score:1)
What technology are you talking about? IA-64? FYI, they are. Besides that, I would say that x86 is pretty much the best tech out there, as far as Microsoft is concerned.
Besides, the point of this technology is that the software vendors don't have to do anything (or not much). Their code just runs where you want it. Microsoft is not stopping you from emulating an x86 on this new technology you're talking about.
Linux -Daisy - Desktops? (Score:1)
Re:Nice start... (Score:2)
Well yes...and no. Yes it can let you make runtime optimizations on the code by aggressively profiling it at runtime (something a compiler can't do), but you have to remember that translating from one instruction set to another isn't the same as going from an intermediary form to machine language. If you translate from one machine language to another, you have to deal with the fact that the code has already been compiled once, and has been scheduled and optimized (perhaps poorly) by a previous compiler. You're stuck with an instruction stream, and extracting the meaning of that instruction stream and generating an equivalent and more optimized set of instructions in another machine language is extremely difficult...much more difficult than it is for a compiler which has access to an intermediate representation. Register allocation, instruction scheduling, prefetching, and instruction selection have already been done for one specific architecture. That's one main disadvantage of code morphing: you can't really ever hope to correct the mistakes of a previous compiler (which didn't know any better because it was doing the best it could do for its target architecture).
IMO, the best approach would be a hybrid, where the code morphing could use the intermediate representation as a binary form to generate machine language, and then optimize it using runtime profiling and scheduling based on techniques already used by current compilers on IR trees.
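A toy contrast makes the point: an optimization that is a one-liner on an expression tree becomes invisible once the code has been through register allocation and instruction selection. Both the mini-IR and the mini-machine-code below are invented for illustration.

```python
# On an IR tree, constant folding is trivial: the structure tells you
# which operands are compile-time constants.

def fold(node):
    """Constant-fold an expression tree of ('+', left, right) nodes.
    Leaves are ints (constants) or strings (unknown variables)."""
    if not isinstance(node, tuple):
        return node
    op, l, r = node
    l, r = fold(l), fold(r)
    if op == "+" and isinstance(l, int) and isinstance(r, int):
        return l + r
    return (op, l, r)

ir = ("+", ("+", 1, 2), "x")  # (1 + 2) + x, with x unknown until runtime
print(fold(ir))               # ('+', 3, 'x'): the 1 + 2 folds away

# The same computation already compiled for a register machine: the tree
# is gone, and a binary translator sees only this opaque stream. Nothing
# marks r1 and r2 as dead single-use constants.
machine_code = [
    ("li",  "r1", 1),
    ("li",  "r2", 2),
    ("add", "r3", "r1", "r2"),
    ("add", "r4", "r3", "rx"),
]
```

Recovering the folding opportunity from `machine_code` requires reconstructing def-use information the compiler threw away — exactly the asymmetry the comment above describes.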
Re:Question (Score:1)
Re:Nice start... (Score:2)
Re:Interesting spin-off's... (Score:3)
Re:Rearranging Compiled Code for Optimization (Score:2)
Newer GCCs have something like this. Look up -fprofile-arcs -ftest-coverage in the GCC/gprof documentation. I haven't looked super closely at how well it works, but the documentation seems to hint that it's doing similar types of optimizations. Basically, it takes the profile information for each arc in the control flow graph, and uses that to decide how to lay out the basic blocks when it generates the code.
In my limited experimentation, I didn't see much of a difference (too small to measure) using these tricks, so either my code wasn't helped by it (too small?), or GCC was just going through the motions. YMMV.
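The layout decision those arc counts feed can be sketched as a greedy chain: always place a block's hottest not-yet-placed successor next, so the common case falls through without a taken branch. Block names, counts, and the algorithm's simplicity are all illustrative; GCC's real heuristics differ.

```python
# Toy profile-guided basic-block layout. arcs maps (src, dst) edges of
# the control flow graph to execution counts gathered by profiling.

def layout(entry, arcs):
    """Greedy chain formation: follow the hottest unplaced successor."""
    order, placed = [], set()
    block = entry
    while block is not None and block not in placed:
        order.append(block)
        placed.add(block)
        succs = [(cnt, dst) for (src, dst), cnt in arcs.items()
                 if src == block and dst not in placed]
        block = max(succs)[1] if succs else None
    # append blocks reachable only on cold paths
    for (src, dst) in arcs:
        for b in (src, dst):
            if b not in placed:
                order.append(b)
                placed.add(b)
    return order

# A loop head whose branch almost always goes to "body", rarely to "error":
arcs = {("head", "body"): 990, ("head", "error"): 10, ("body", "exit"): 990}
print(layout("head", arcs))  # ['head', 'body', 'exit', 'error']
```

The rare "error" block lands at the end, out of the hot path's instruction-cache footprint — which is the kind of win the arc profile is meant to buy.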
--Joe--
Program Intellivision! [schells.com]
I can't drive 55, I've got an electric car. (Score:2)
A nice link to a readable and somewhat technical overview of fuel cells:
http://www.memagazine.org/contents/current/features/pems/pems.html [memagazine.org]
A nice Scientific American article:
http://www.sciam.com/explorations/122396explorations.html [sciam.com]
Two nice links to NEC's proton polymer battery.
Asian Biz Tech article. [asiabiztech.com]
EE Times article (short and sweet). [eet.com]
I'm still waiting for the car that runs on happy thoughts and chocolate that Jon Stewart promised me.
Re:Another way to do emulation (Score:1)
That works as long as you can identify all of the code cleanly. Particularly outside the UNIX world (think DOS / Win9x), it's commonplace to treat data as code and code as data. (Most UNIX programs just rely on the ELF/COFF file format and don't muck with code vs. data attributes, unless they're doing something icky like GNU C's trampolines which put executable code on the stack.... *blech*)
It's hard enough to write a reliable disassembler that doesn't fall over when it hits a jump table, callback, overlay, dynamic library or other indirect method for loading / invoking code. (At least, one that's reliable in the absence of a symbol table that highlights all of the valid entry points.) What makes you think you can reliably re-assemble a binary for a different target in such a setting?
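A toy disassembler shows how a single data byte desynchronizes a linear sweep. The one- and two-byte mini-ISA below is invented for the demo; real x86, with its dense variable-length encoding, is far worse.

```python
# Why naive disassembly breaks: a linear sweep can't tell inline data
# from code, and one misread byte shifts every following decode.

# opcode byte -> (mnemonic, total instruction length in bytes)
ISA = {0x01: ("nop", 1), 0x02: ("push", 2), 0x03: ("ret", 1)}

def linear_sweep(code):
    """Decode from offset 0, trusting every byte to start an instruction."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op not in ISA:
            out.append((pc, "??"))  # unknown byte: already desynchronized
            pc += 1
            continue
        name, size = ISA[op]
        out.append((pc, name))
        pc += size
    return out

# Intended meaning: push 0x03; <data byte 0x02, e.g. a jump-table entry>;
# nop; ret. The sweep decodes the data byte as a "push" that swallows
# the real nop at offset 3.
code = bytes([0x02, 0x03, 0x02, 0x01, 0x03])
print(linear_sweep(code))  # [(0, 'push'), (2, 'push'), (4, 'ret')]
```

Rewriting the binary for another target from such a decode would translate the jump-table entry as code and lose the nop entirely — the failure mode the comment above is pointing at.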
--Joe--
Program Intellivision! [schells.com]
Re:coincidence? (Score:1)
Unlikely. IBM's been working on DAISY for a loooong time.
--Joe--
Program Intellivision! [schells.com]
Re:Java? (Score:1)
A lot of the speed is lost in the parsing of the source. If the source were tokenised, like the old BASIC stuff was, then it would probably run a lot faster.
Re:Java? (Score:2)
Re:compilers often fail (Score:2)
Interesting spin-off's... (Score:2)
--
Re:Can I run MS-WinNT on PowerPC and S/390? (Score:2)
MS had the opportunity and ability to port WinNT to PowerPC and SPARC (they did, a while ago)... but it would be really neat to see the alternative!
On the other hand... imagine running Daisy atop Windows, sitting inside a VMWare instance on Linux on Daisy on an S/390!
Geek dating! [bunnyhop.com]
energy storage (Score:1)
Java? (Score:1)
Actually, thinking about the speed limitations of Java... do any of the JIT runtimes optimize code when they translate it? If it works as well as it's supposed to for Transmeta, I'd think the same principle could be applied in Java. Anyone out there know if it's being done, or why it wouldn't work?
--Moss
compilers often fail (Score:2)