Transmeta's patent simplifies maintaining precise state when instructions have been reordered. When a fault occurs, the state at the faulting instruction must be recoverable even if instructions that follow it have already executed. Transmeta achieves this with working integer registers (and uncommitted memory stores), which are committed to the x86 registers (and memory) only if no exception has occurred. Commits happen on integral instruction boundaries, which I take to mean boundaries at which the state of the target (x86) instructions and that of the host instructions happen to coincide. Committing appears to be the equivalent of a zero-cycle instruction. Should an exception occur, the state at the last commit is restored, and the target instructions are retranslated and re-executed in order, so that the state is correct when the exception reoccurs. The patent's preferred embodiment uses 64 working integer registers and 32 working floating-point registers.
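The commit/rollback idea can be modelled in a few lines. This is a minimal Python sketch under stated assumptions, not the patent's hardware: `ShadowedState`, `commit` and `rollback` are hypothetical names, only the integer registers are modelled, and the list copy in `commit` merely shows the semantics of what the hardware does as a zero-cycle operation. Loads see the translation's own uncommitted stores; a rollback throws away both the working registers and the store buffer.

```python
# Hypothetical model of working vs. committed state (names are illustrative).

class ShadowedState:
    """Working registers plus a gated store buffer, backed by committed state."""

    def __init__(self, num_regs=64):
        self.committed_regs = [0] * num_regs  # x86-visible (architectural) registers
        self.working_regs = [0] * num_regs    # speculative registers used by translations
        self.committed_mem = {}               # main memory: address -> value
        self.store_buffer = {}                # uncommitted stores: address -> value

    def load(self, addr):
        # A load sees this translation's own uncommitted stores first.
        if addr in self.store_buffer:
            return self.store_buffer[addr]
        return self.committed_mem.get(addr, 0)

    def store(self, addr, value):
        # Stores are held back until the next commit.
        self.store_buffer[addr] = value

    def commit(self):
        # At a boundary where host and target state coincide,
        # the working state becomes the architectural state.
        self.committed_regs = list(self.working_regs)
        self.committed_mem.update(self.store_buffer)
        self.store_buffer.clear()

    def rollback(self):
        # On an exception, discard everything done since the last commit.
        self.working_regs = list(self.committed_regs)
        self.store_buffer.clear()


s = ShadowedState(num_regs=4)
s.working_regs[0] = 7
s.store(0x1000, 42)
s.commit()
s.working_regs[0] = 99   # speculative work in a reordered block...
s.store(0x1000, 13)
s.rollback()             # ...undone when that block faults
```

After the rollback, register 0 holds 7 again and address 0x1000 reads 42; the in-order re-execution then starts from that state.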
The patent describes hardware assist for memory-mapped I/O (MMIO) regions. Instructions accessing these regions may not be reordered, since the behaviour of one I/O location may depend on the contents of another (say, an index/value pair of MMIO registers). One optimization traditional emulators cannot use is reordering memory accesses, which is safe only for accesses to main memory: recall that in C, volatile tells the compiler that accesses to a variable must not be reordered or optimized away, but machine code carries no such marking, so an emulator cannot tell which accesses are safe to reorder. Transmeta's native instruction set includes memory accesses that cause an exception if an address refers to an MMIO region (achieved by adding an MMIO bit to the TLB stating whether or not a memory region is MMIO). Thus the Morphing software can aggressively optimize all target instructions assuming non-MMIO accesses, recover where the assumption fails, and retranslate the code appropriately. A similar process allows target variables to be kept in registers instead of memory.
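The "assume non-MMIO, recover if wrong" strategy can be illustrated with a small simulation. This Python sketch is hypothetical throughout: `TLB`, `MMIOFault`, `optimistic_load`, `ordered_load` and `run_block` are invented names, the TLB's MMIO bit is modelled as a set of page numbers, and "reordering" is simulated by issuing the loads in reverse order. The optimistic load checks the MMIO bit before touching memory, so a device register is never accessed out of order; on a fault, the block is rerun with a translation that preserves program order.

```python
PAGE_SIZE = 4096


class MMIOFault(Exception):
    """Raised when a reorderable load touches a page marked as MMIO."""


class TLB:
    def __init__(self, mmio_pages):
        # Page numbers whose (hypothetical) MMIO bit is set.
        self.mmio_pages = set(mmio_pages)

    def is_mmio(self, addr):
        return addr // PAGE_SIZE in self.mmio_pages


def optimistic_load(tlb, mem, addr):
    # Reorderable native load: traps *before* accessing an MMIO address.
    if tlb.is_mmio(addr):
        raise MMIOFault(addr)
    return mem.get(addr, 0)


def ordered_load(tlb, mem, addr, log):
    # Conservative load used by the in-order retranslation; log records order.
    log.append(addr)
    return mem.get(addr, 0)


def run_block(tlb, mem, addrs, log):
    """Sum the values at addrs, trying the reordered translation first."""
    try:
        # Optimistic translation: loads issued out of program order.
        total = sum(optimistic_load(tlb, mem, a) for a in reversed(addrs))
        return total, "optimistic"
    except MMIOFault:
        # Assumption failed: fall back to a translation that keeps program order.
        total = sum(ordered_load(tlb, mem, a, log) for a in addrs)
        return total, "in-order"
```

With page 1 marked as MMIO, a block reading addresses 0 and 8 runs optimistically, while a block reading 4096 and 4100 faults on its first reordered load and is rerun in program order.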
The patent names VLIW as the preferred native instruction set because VLIW instructions are easy to decode and let the software specify what each execution unit does on every cycle. A cache of translated instructions is kept either on chip or in main memory. The die holds a single CPU, which both translates the target code and executes the translations. A second processor dedicated to translation would bring no performance gain, since Transmeta's studies indicate that "the translation for a target instruction (once completely translated) will be found in the translation buffer all but once for each one million or so executions of the translation". For code that frequently generates exceptions, both the reordered and the in-order translations may be cached. The preferred embodiment uses a 2 MB cache.
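A translation cache holding both variants might look roughly like the following. This is a sketch under assumptions, not the patent's design: `TranslationCache`, `lookup`, `record_fault` and the `fault_threshold` policy (switch to the in-order version after a fixed number of faults) are all hypothetical, and `translate` stands in for the Morphing software. Translations are keyed by the target EIP, so a block is translated once and then reused on every later execution.

```python
from collections import defaultdict


class TranslationCache:
    """Hypothetical cache: target EIP -> {"optimistic": ..., "in_order": ...}."""

    def __init__(self, translate, fault_threshold=3):
        self.translate = translate              # callable(eip, optimistic) -> translation
        self.fault_threshold = fault_threshold  # assumed policy knob, not from the patent
        self.entries = {}
        self.faults = defaultdict(int)
        self.misses = 0

    def lookup(self, eip):
        entry = self.entries.setdefault(eip, {})
        # Prefer the in-order translation once this block has faulted often.
        if self.faults[eip] >= self.fault_threshold:
            kind = "in_order"
        else:
            kind = "optimistic"
        if kind not in entry:
            # Miss: translate now and keep the result alongside any other variant.
            self.misses += 1
            entry[kind] = self.translate(eip, optimistic=(kind == "optimistic"))
        return entry[kind]

    def record_fault(self, eip):
        self.faults[eip] += 1
```

Repeated lookups of the same EIP hit the cache after the first translation; once the block has faulted `fault_threshold` times, the in-order variant is translated and cached next to the reordered one.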
The patent describes hardware assist for detecting writes, whether by the CPU (self-modifying code) or by DMA, to areas of code that have cached translations. A write-protection bit is added to the TLB to mark pages whose translations will go stale if the page is written to. Writing to a page with this bit set causes an exception, much like writing to a page marked read-only in the x86 page tables.
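The invalidation step can be sketched as follows. This Python model is hypothetical: `CodeWriteProtector`, `add_translation` and `write` are invented names, the TLB bit is modelled as a set of protected page numbers, a translation is assumed to lie entirely within the page of its starting EIP, and CPU and DMA writes are routed through one `write` method for simplicity (real DMA does not go through the CPU's TLB). Writing to a protected page discards the stale translations and clears the bit so later writes to that page proceed without faulting.

```python
PAGE_SIZE = 4096


class CodeWriteProtector:
    """Hypothetical model of a TLB bit protecting pages with translated code."""

    def __init__(self):
        self.protected = set()   # page numbers with the protection bit set
        self.translations = {}   # target EIP -> translation
        self.memory = {}

    def add_translation(self, eip, translation):
        # Simplification: a translation lives entirely on its starting EIP's page.
        self.translations[eip] = translation
        self.protected.add(eip // PAGE_SIZE)

    def write(self, addr, value):
        # Both CPU and DMA writes pass through here in this sketch.
        page = addr // PAGE_SIZE
        if page in self.protected:
            self._on_protection_fault(page)
        self.memory[addr] = value

    def _on_protection_fault(self, page):
        # Drop every translation on the written page, then clear the bit.
        stale = [eip for eip in self.translations if eip // PAGE_SIZE == page]
        for eip in stale:
            del self.translations[eip]
        self.protected.discard(page)
```

A write to an unprotected data page leaves all translations intact, while a write into a page holding translated code invalidates only the translations on that page.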
The patent finishes with an example in which a while loop of 10 x86 instructions is translated into native instructions. Every x86 EIP update and every general protection fault check is expressed as an explicit instruction. In general, this translation increases the instruction count sixfold. Further optimization, however, reduces this to twofold, leaving the equivalent of only 2 core instructions in the loop. Recall that 6 instructions can be packed into one VLIW instruction. The general optimization technique is: be optimistic, and fix things if the assumptions turn out to be wrong.
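To illustrate what packing 6 instructions into one VLIW instruction involves, here is a simple greedy in-order bundler. It is not the patent's scheduler: `Op`, `pack_bundles` and `BUNDLE_WIDTH` are hypothetical names, and the sketch ignores memory dependencies, branches and per-unit slot types. It starts a new bundle when the current one is full or when an op reads or writes a register written earlier in the same bundle (read-after-write and write-after-write). A later write to a register read earlier in the bundle is allowed, assuming all ops in a bundle read their operands at the start of the cycle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BUNDLE_WIDTH = 6  # ops per VLIW instruction, as stated in the text


@dataclass
class Op:
    name: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


def pack_bundles(ops: List[Op], width: int = BUNDLE_WIDTH) -> List[List[Op]]:
    """Greedily pack ops, in program order, into bundles of at most `width`."""
    bundles: List[List[Op]] = []
    current: List[Op] = []
    written = set()  # registers written by ops already in the current bundle
    for op in ops:
        touches = set(op.srcs) | ({op.dest} if op.dest else set())
        # Close the bundle if it is full or this op depends on a write in it.
        if len(current) == width or touches & written:
            bundles.append(current)
            current, written = [], set()
        current.append(op)
        if op.dest:
            written.add(op.dest)
    if current:
        bundles.append(current)
    return bundles
```

For example, `add r1 <- r2, r3` and `mov r4 <- r5` share a bundle, but a following `sub r6 <- r1` must start a new one because it reads `r1`.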