Transmeta Unveils 256-bit Microprocessor Plans 229
nam37 writes "PCWorld has an article about how Transmeta has outlined its initial plans for a new 256-bit microprocessor dubbed the TM8000. They claim it will offer significant advantages over their current TM5x00 line of chips. The processor will switch to a 256-bit VLIW (very long instruction word) architecture, allowing twice as many instructions in one clock cycle and greater energy efficiency." The article also touches on the popularity Transmeta enjoys in Japan, noting that 92% (CD: corrected from 55%) of the company's revenue comes from there.
Not 55% revenue from Japan - it's 92% (Score:3, Informative)
Re:Not 55% revenue from Japan - it's 92% (Score:2)
Generally it has me a bit worried - being so dependent on Japanese customers is not an ideal state of affairs for a Californian company...
"makes no predictions on availability date" (Score:1, Interesting)
55% figure is wrong (Score:1, Redundant)
In the first quarter of the current fiscal year, 92 percent of Transmeta's net revenue came from Japan, a figure which is up from 55 percent in the year earlier.
In other words, in the first quarter, 92% of their revenue was from Japan. Last year during the first quarter, 55% of their revenue was from Japan.
That could mean anything, btw.
Re:55% figure is wrong (Score:1, Offtopic)
Wait, wait! I know this one! I saw it in a movie [imdb.com] once: The Japanese are stealing our technology!
Re:55% figure is wrong (Score:2)
How will this chip be energy efficient? (Score:4, Interesting)
Re:How will this chip be energy efficient? (Score:2, Interesting)
Re:How will this chip be energy efficient? (Score:3, Informative)
Unlike an Intel processor, the Transmeta chip is based on a RISC architecture. If you take a look at a CISC processor, like an Intel chip, there is a ton of work that just goes into decoding the instructions. Some instructions are one byte, others are two, some have data embedded in various bits of the instruction, etc. This makes the decoding and dispatching of instructions quite complex. On a RISC architecture chip, certain bits always indicate the instruction, others are always data. Decoding on these chips is simple.
Now, if you were to double the number of input bits on a CISC processor, you would have to duplicate some fairly complex (read: power-hungry) circuitry. On a RISC processor, doubling the input bits simply doubles some simple hardware.
Still, that doesn't explain why 2x the bits yields an energy saving... The reason for that is that the concept of doubling the circuitry is a simplified explanation - some of the hardware can be shared. Really, they're just going to be feeding two instructions through in parallel, so for example, you only need to go through one power hungry bus cycle to get the data. You only need to run the dispatch unit once per two instructions, etc.
Much like an automated car wash that uses a bunch of water and electricity: if you changed the design slightly so that you could run two cars through at once instead of only one, you'd use more water and electricity than for one car, but not as much as if the two ran through separately.
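To make the decode argument concrete, here is a minimal sketch in C of a fixed-width decoder. The field layout is invented purely for illustration - it is not the Crusoe's encoding or any real ISA - but it shows why decode collapses to a few shifts and masks when the fields never move:

    #include <stdint.h>

    /* Hypothetical fixed 32-bit encoding: [31:26] opcode, [25:21] dest reg,
       [20:16] src reg, [15:0] immediate.  Real ISAs differ; the point is
       only that the fields always sit in the same place. */
    typedef struct {
        unsigned opcode;
        unsigned rd;
        unsigned rs;
        unsigned imm16;
    } DecodedOp;

    static DecodedOp decode(uint32_t word)
    {
        DecodedOp op;
        op.opcode = (word >> 26) & 0x3F;   /* always bits 31..26 */
        op.rd     = (word >> 21) & 0x1F;   /* always bits 25..21 */
        op.rs     = (word >> 16) & 0x1F;   /* always bits 20..16 */
        op.imm16  =  word        & 0xFFFF; /* always bits 15..0  */
        return op;
    }

With variable-length x86 you can't even tell how long an instruction is until you've partially decoded it, which is where the extra (power-hungry) decode logic goes.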
Re:How will this chip be energy efficient? (Score:2, Funny)
Re:How will this chip be energy efficient? (Score:3, Informative)
This is basically completely wrong.
The Transmeta machine is a VLIW machine, almost the antithesis of CISC. It is closer to what is called a "superscalar" machine than anything else.
The idea is that you have a 256 bit INSTRUCTION, not data path. There are several different functional units. Maybe one is a multiplier/divider, another is a floating point unit, another is an address calculator. Maybe you double up each of these resources when you go from 128 to 256 bits. The idea is that each functional unit gets its own part of the instruction. VLIW stands for Very Long Instruction Word, after all - not very large data path!
Next, you need fancy compilers - in this case Transmeta's just-in-time compilation - that can schedule use of as many of these functional units as possible on a computation thread. Thus as the number of functional units goes up, the potential computation done per clock goes up.
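For illustration only - the slot layout below is invented, not Transmeta's actual "molecule" format - a VLIW word is really just a row of fixed slots, one per functional unit, so widening the word mostly means adding slots:

    #include <stdint.h>

    /* Invented 256-bit bundle: eight 32-bit slots, each aimed at a
       particular functional unit.  The real Crusoe encoding is not
       public in this detail; the point is that each unit reads its
       own fixed piece of the instruction word. */
    typedef struct {
        uint32_t alu0;    /* integer ALU #0       */
        uint32_t alu1;    /* integer ALU #1       */
        uint32_t fpu;     /* floating-point unit  */
        uint32_t ldst0;   /* load/store unit #0   */
        uint32_t ldst1;   /* load/store unit #1   */
        uint32_t branch;  /* branch unit          */
        uint32_t imm;     /* long immediate       */
        uint32_t spare;   /* padding/future unit  */
    } VliwBundle;         /* 8 x 32 = 256 bits */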
VLIW is the opposite approach from RISC. (Score:2)
CISC uses a relatively large set of complex instructions. Those instructions are typically decoded using microcode. Intel takes the approach of decoding them with a massive hardware decoder, which is faster but takes up a lot of silicon.
VLIW uses a very long instruction word. The instructions typically have to be decoded by microcode, because a hardwired decoder would be prohibitively huge. It gets its speed by taking a single instruction that can define multiple tasks which the processor can perform in parallel. The instruction set is designed to take advantage of the different types of operations that can be performed in parallel, and a very complex compiler is required to create the most efficient machine code. The end result is a processor that gets a lot more done per machine cycle and can therefore run at a lower clock speed and still perform well.
It's the lower clock speeds that really help the power dissipation. In CMOS logic, dynamic power goes roughly as capacitance times voltage squared times clock frequency, so switching the same circuitry more often burns more power (and higher clocks usually demand higher voltages, which makes it worse). Even though a VLIW processor is doing more and creating more heat per clock cycle, it can end up with less heat at the same performance level.
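A back-of-the-envelope sketch of that first-order power model (every number below is invented, not a measured chip figure) shows why halving the clock - and shaving the voltage with it - pays off so well:

    #include <stdio.h>

    /* Back-of-the-envelope CMOS dynamic power: P ~ alpha * C * V^2 * f.
       All values are illustrative only. */
    int main(void)
    {
        double alpha = 0.2;     /* activity factor (fraction of gates switching) */
        double c     = 20e-9;   /* switched capacitance, farads                  */
        double v     = 1.2;     /* supply voltage, volts                         */
        double f     = 800e6;   /* clock frequency, Hz                           */
        double p_high = alpha * c * v * v * f;

        double v_low = 1.0;     /* a lower clock usually permits a lower voltage */
        double f_low = f / 2.0;
        double p_low = alpha * c * v_low * v_low * f_low;

        printf("P at 800 MHz, 1.2 V: %.2f W\n", p_high);   /* ~4.6 W */
        printf("P at 400 MHz, 1.0 V: %.2f W\n", p_low);    /* ~1.6 W */
        return 0;
    }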
Re:How will this chip be energy efficient? (Score:2)
CISC vs. RISC is a red herring. Today's RISC machines are as complex as today's CISC machines under the hood. The real difference between VLIW machines and current ones is that the VLIWs are statically scheduled whereas other current desktop and workstation CPUs are dynamically scheduled.
Statically scheduled machines rely on compiler software (in the case of Crusoe, the code morphing software) to take a sequence of instructions and determine what order they'll be issued in and what instruction-level parallelism is available.
Dynamically scheduled machines take a serial sequence of instructions, and use large amounts of complex hardware to detect dependences between the instructions. From this, it determines the instruction schedule on the fly.
Statically scheduled processors benefit from greatly simplified instruction decode and dispatch (since no dependence tracking and no real decision making is required aside from conditional branches). Dynamically scheduled processors have some performance benefits insofar as they can make opportunistic scheduling decisions with the additional information that's available at run-time, and not available to the compiler.
On traditional VLIWs, the compiler is usually only able to statically analyse a program and so it may have to schedule conservatively. A typical example is that the compiler may not be able to tell when two different pointers point to the same thing, so it must serialize accesses via the two separate pointers. Crusoe is able to do a couple of things better: First off, the instruction set and hardware provide some mechanisms that allow the machine to speculate on a sequence of instructions (that is, essentially, make a programmatic guess that a given optimization is OK and check it afterwards, discarding the result on the off-chance it's wrong). Second, it can instrument the code and get on-the-fly branch and function profile data so that it can re-optimize the hot spots more aggressively. Both of these can allow the statically scheduled Crusoe to approach the performance of dynamically scheduled CPUs in the cases where it would've fallen behind. In a sense, embedding the code-morphing software on an otherwise statically-scheduled device makes it a "blockwise dynamically scheduled" device.
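Here is the classic aliasing case described above, as a hypothetical C fragment. A purely static compiler that cannot prove p and q point to different objects must keep the store and the load in order; a speculating translator can guess they don't alias and recover if the guess was wrong:

    /* If the compiler cannot prove p != q, the store through p may change *q,
       so the load of *q cannot be hoisted above it.  A speculating translator
       can reorder anyway and fall back to the slow ordering when the two
       addresses turn out to be equal. */
    void scale(int *p, int *q, int k)
    {
        *p = k * 2;        /* store                                      */
        int t = *q;        /* load: may or may not observe the store     */
        *p = t + k;        /* depends on the load, so ordering matters   */
    }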
A spelling aside on dependences vs. dependencies: the correct term is dependences when talking about how one instruction depends on another's result. This link [umbc.edu] gives a primer on the types of dependences that can exist between instructions.
As for energy efficiency: If you're able to get your work done in fewer cycles, you can power the clock off sooner or run it at a much slower rate. Power consumption is linear with respect to clock over lower clock speeds, but as you get to higher speeds, various effects cause non-linear increases in power consumption.
Also, keep in mind that energy efficiency is computational work per Joule. The absolute power consumption may or may not be lower with a more energy-efficient part. In this case, they're saying 3x faster and 47% more energy efficient. I read that as meaning, approximately: if you compare the TM5800 to the TM8000 at full-tilt-boogie on a given task, the TM8000 will probably dissipate 2x as much power (Watts vs. Watts), but do so for 1/3rd as long.
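As a quick sanity check of that reading (the ratios come from the sentence above; nothing here is a Transmeta figure), energy per task is just power times time:

    #include <stdio.h>

    /* If the TM8000 burned 2x the power but finished the same task in
       1/3 the time, it would use about 2/3 the energy per task. */
    int main(void)
    {
        double power_ratio  = 2.0;                       /* watts, TM8000 vs TM5800   */
        double time_ratio   = 1.0 / 3.0;                 /* seconds, TM8000 vs TM5800 */
        double energy_ratio = power_ratio * time_ratio;  /* joules per task           */
        printf("energy per task: %.0f%% of the TM5800's\n", energy_ratio * 100);
        printf("work per joule:  %.2fx the TM5800's\n", 1.0 / energy_ratio);
        return 0;
    }

That comes out to roughly 1.5x the work per Joule - in the same ballpark as the claimed 47%.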
Another thing to keep in mind is that TM8000 will probably be on a newer semiconductor process node than TM5800.
--Joe
Re:How will this chip be energy efficient? (Score:2)
I am assuming that there are a ton of technical challenges to overcome, as well as the fact that such chips probably couldn't run at very high clock speeds. Any other reasons?
Re:How will this chip be energy efficient? (Score:1)
Re:How will this chip be energy efficient? (Score:2)
I think the current Transmeta chips are 128 bit.
"Conventional" CPUs (like Intel/Sun/AMD/etc) wouldn't benefit from 128 bits, but the Transmeta chips are VLIW, meaning they cram several instructions into a single word. Doubling the number of bits doubles the number of instructions that can be crammed into a single word. Of course this assumes that you can extract that level of parrallelism from the code.
Re:How will this chip be energy efficient? (Score:3, Informative)
In fact, I seem to recall that the original VLIW work in the 80s was done on 512 bit and 1024 bit designs, using bit slice components of course.
Large processors need large data and address buses, which means a lot of power hungry transistors on the periphery of the chip, as well as the longer array of the various bus gates inside the chip. The technical challenges in doubling the bus width are enormous.
In fact, a major feature of the Transmeta design is the way the internal compiler reviews code, rearranges it and caches the streamlined code for repeat execution. It means that, just like JIT compilation in Java, the first time through a loop is slower than subsequent accesses. The wider the instruction word, the greater the opportunity for this kind of rescheduling, but also the more cache memory is needed and the more the initial performance hit. Great for playing DVDs or database searches, not so good for office work.
Re:How will this chip be energy efficient? (Score:2)
Re:How will this chip be energy efficient? (Score:2)
The Trace architecture was extensible to 256 if memory serves.
Both machines were available in the mid 80's.
The fact that you couldn't grow these architectures and remain binary compatible was their Achilles' heel - consequently they were dead ends, both architecturally and commercially. It's the application of JIT translation to the problem that is revolutionary.
Re:How will this chip be energy efficient? (Score:2, Funny)
then enjoy all the publicity and sell more of your simple 8 bit processors.
Re:How will this chip be energy efficient? (Score:2)
Given that TM's JIT-recoding to VLIW has been around for a while, I guess they know by now whether they can get useful ILP out of such a wide instruction word.
Their approach to low power - basically gating every clock in sight, even though it plays havoc with the tools most other people use - is becoming more common (and the tools are evolving too). It works because they only turn on the parts of the chip that are being used at any one time, at a very small granularity. In the past, people like Intel tended to do things like stopclk duty-cycle modulation (stopping the clock to everything for some percentage of the time), which still means you're wasting energy while you are running.
Re:How will this chip be energy efficient? (Score:2)
Intel has made GHz cool and increasing IPC uncool. Transmeta is trying to blow this away. If they are successful, Intel's marketing hype will unravel and the company will consume itself like Jabba the Hut with a banquet just out of arm's reach (Jabba is immobile, methinks).
Or maybe killing Intel's marketing will just cause MSOffice+email people to be happy with their PCs instead of constantly upgrading when Intel tells them to, resulting in a crash in the tech sector. Correction: been there, done that.
Jabba the Hutt (Score:2)
Re:How will this chip be energy efficient? (Score:2)
Re:How will this chip be energy efficient? (Score:2)
Re:How will this chip be energy efficient? (Score:2)
Unfortunately I think it's going to be a long time before we see Itaniums replacing the P4 as Joe sixpack's standard comb-puter.
Re:How will this chip be energy efficient? (Score:5, Informative)
Since Transmeta chips are VLIW, they do not have to schedule instructions, and do not have to determine (at run time) which instructions can be executed in parallel. With VLIW, both of those functions are performed by _software_, statically, all at once. A significant amount of the complexity of a CPU is dedicated to performing these functions, which are offloaded to software by Transmeta in their "code morph" phase.
Furthermore, the conversion from the aging x86 instruction set into native operations occurs in software during the "code morph" phase, further offloading functionality that would otherwise exist in silicon.
For these reasons, the Transmeta CPU is dramatically simpler than comparable x86 CPUs. Unfortunately, it did not perform as anticipated. However, since the die size is so small and the CPU so simple, it does offer some advantages (low power consumption, low heat dissipation).
Re:How will this chip be energy efficient? (Score:2)
My confusion comes from the fact that the Crusoe is rarely dealing with code that was compiled for it. The code is usually compiled for x86, and then "code morphed" into instructions for the Crusoe. That seems like you'd lose all your efficiency, because even if the compiler takes a long time to figure out the best instructions, that time is taken once and not while the application is running. Code morphing is interpreting the x86 into Crusoe instructions and then running them on the Crusoe, and the code morphing is done while the application is running and on the same processor. I just don't understand how this can be efficient.
Are the current x86 processors - which take CISC instructions, convert them into a reduced set of instructions they can handle quickly, and then shove them through at really high clock speeds - so inefficient that a horribly non-optimized VLIW processor can compete? If that's the case, why isn't the Itanium, which is VLIW, blowing us away with its performance?
Re:How will this chip be energy efficient? (Score:2)
Re:How will this chip be energy efficient? (Score:2)
It does hurt them that they code morph on the same chip that they run the x86 software on. However, they can get away with it because they can cache the translated code segments. Self-modifying code and stuff with "debugger bombs" in it may destroy performance and/or prevent proper execution. In general, though, they get saved because, on average, 90% of the time is spent in 10% of the code. This means their translation cache gives them a huge performance boost in most applications. The P4 also uses an on-chip microOp translation cache, probably creating huge power savings by keeping work out of the x86 decoder unit.
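A toy version of that translation-cache idea, with invented names (the real code-morphing software also has to cope with self-modifying code, faults, chaining translated blocks together, profiling, and so on):

    #include <stdint.h>
    #include <stddef.h>

    #define TCACHE_SLOTS 1024

    typedef void (*native_block_fn)(void);

    struct tcache_entry {
        uint32_t        x86_addr;   /* guest address of the block */
        native_block_fn native;     /* translated native code     */
        int             valid;
    };

    static struct tcache_entry tcache[TCACHE_SLOTS];

    /* Stand-in for the real (and far more complicated) translator. */
    static native_block_fn translate_block(uint32_t x86_addr)
    {
        (void)x86_addr;
        return NULL;                /* a real translator returns generated code */
    }

    native_block_fn lookup_or_translate(uint32_t x86_addr)
    {
        struct tcache_entry *e = &tcache[x86_addr % TCACHE_SLOTS];
        if (!e->valid || e->x86_addr != x86_addr) {
            e->x86_addr = x86_addr;
            e->native   = translate_block(x86_addr);   /* pay the cost once */
            e->valid    = 1;
        }
        return e->native;           /* hot loops hit here almost every time */
    }

Because 90% of the time is spent in 10% of the code, almost every lookup after warm-up is a cache hit, and the translation cost is amortized away.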
In its purest form, VLIW would be like taking several MIPS chips and giving them the same cache and register file and demuxing the instructions out to the different chips. The chips would trust the compiler and not check for data dependencies.
Itanium doesn't know what it wants to be. Intel doesn't call ia64 VLIW; they call it "EPIC: Explicitly Parallel Instruction Computing". It's a beast with lots of registers (read: really long context switches - the ia64 Linux porters decided to cut down on the number of user-space-available registers in order to shorten context switches) and register windowing (windowing didn't help SPARC very much, and eats up a fair number of transistors). On the other hand, they neglected to give it a full floating point unit, so any floating point op causes an FPSWA (floating point software assist) interrupt. Furthermore, they decided not to match the instructions to the bare hardware, but instead made the CPU pretend to have infinitely many execution units and inserted some flags in the instructions to indicate where the parallelism breaks. This is needlessly complex. Don't forget on-chip slow-ass x86 emulation. Do a Google search for Elbrus, or look back a couple of days on /. They've got some good arguments about why EPIC (and Itanium in particular) is worse than VLIW. They also say their approach is better than the Transmeta approach, but say Transmeta is on the right track. Basically, they would like to see a partially static and partially dynamic recompilation solution rather than the all-dynamic solution used by Transmeta. I think the Elbrus approach is better for geeks, but may be hard to make seamless for the general populace.
Re:How will this chip be energy efficient? (Score:2, Interesting)
The chip design can thus be much simpler. On the other hand, by using code morphing as its "public" layer, the chip can be adapted to almost any requirement, making it very popular for... yes... making gadgets... and that is the lifeblood in Japan.
[How I'd like to have a devkit for Transmeta... but here in Europe, that is not only hard to find, but also not a *must* for this market.]
Cheers...
P.S.- Can someone pls add a JVM inside the new transmeta processor? Why run x86 when you can run bytecode directly?
Well... (Score:3, Insightful)
It would be great if they came out with more mainstream ways to use their products, such as real, viable ATX-style boards. It would certainly let their products be used in more mainstream areas. Who wants to develop/search for a custom mainboard which (due to lack of volume) costs more than anything comparable from Intel/AMD? This may in fact be a large contributor to why Asia is such a huge market for Transmeta: they are more willing to manufacture custom boards/systems to use the chips efficiently.
Re:Well... (Score:2, Insightful)
You don't see desktop computers based on a Transmeta just as you don't see desktops based on StrongARMs.
Re:Well... (Score:1)
I know that the world's full of better solutions, but this is still a way for Transmeta to get their stuff into the OEM market. To get a rack full of Transmeta CPUs, the only solution I know of is the RLX System 324 (not to say there aren't others -- pardon my ignorance of the market).
I guess what I'm saying is that for a company developing a proprietary CPU, they really need to do a better job of making sure that there's technology to work around it (I can see a low-power CPU/mobo/power supply for 1U being a decent seller). If they had a few partners developing mobo/power supply stuff that was accessible through normal OEM/distributor/retail channels, they might have better success in America and Western Europe.
EEK! When did I turn into a marketing head? I used to talk about the technology instead of how to sell it.
Re:Well... (Score:2)
I really like the trend towards smaller devices though, and as the engineering gets tougher, it would be nice to throw the fans out and make the heatsinks smaller. Small computers like the SS50 are attractive to me, but not quite enough for me to take the tradeoffs needed. However, if I could get a computer with the functionality of the SS50 that was just a tad larger than the CD drive, had roughly P3 400MHz kind of performance, and didn't cost much more than a good white box PC, I'd jump at it.
The problem isn't fitting this device into the mainstream, it's changing the mainstream so people see the need for this device.
Sony and Transmeta - in like Flynn (Score:1, Interesting)
Transmeta has been promising a lot of things since they were formed those many years ago. Nothing of substance has ever come out, though. Sure, we've now got a low-power processor, so what? It comes at the cost of a serious lack of speed.
Now they promise 256-bit processors. That's great, but it's completely worthless when any chip that it is attempting to emulate maxes out at 64-bits. Hell, the 64-bit chips haven't even come out yet.
Transmeta is dying. Especially if they've hitched their horse to the floundering Japanese economy.
Re:Sony and Transmeta - in like Flynn (Score:2)
Now, you may be referring to 64-bit x86 chips, but that is not implied.
Just to correct your statement, here's a small list of some 64-bit CPUs:
Digital/Compaq Alpha
Intel Itanium (well, I'm not entirely sure if it's available)
PA-RISC
SUN Sparc
SUN Blade
AIX
IBM Power 4
Power G4
IBM AS/400 (and many other in the AS-series)
I'm not entirely sure about all of these though, so if some of them aren't 64-bit, please correct me.
Re:Sony and Transmeta - in like Flynn (Score:2, Informative)
This begs the question: (Score:2, Funny)
lower power (Score:1)
Chandler: CEO (Score:1, Offtopic)
Ditzel and Matthew Perry, the recently appointed chief executive officer of Transmeta
Well, it's good to see he's got work lined up now that Friends is almost over.
Re:Chandler: CEO (Score:2)
I doubt it's a real 256 bit processor (Score:1)
I don't think this will be a real 256-bit processor.
Interview, Dave Ditzel (Transmeta founder) (Score:3, Informative)
Faster is indeed better... (Score:2, Interesting)
Re:Faster is indeed better... (Score:2)
Well, the people who can greatly benefit from Alpha AND can afford to buy them in quantities (scientific research institutions and CG render farms) *do* buy Alpha. But it's almost a maxim: the sexier the application, the less of a sustainable market there is. If you want to succeed in the computer industry you have to aim at boring sectors like secretaries who want a simple word processor.
It's not a 256b datapath, but a 256b VLIW word... (Score:5, Informative)
Unfortunately, Transmeta is hampered by several factors.
The first is that 256b will require the translator to discover 8 translated instructions (assuming a 32b instruction size) which can be executed in parallel to get good performance. This is a TOUGH barrier; the reality is probably closer to 2-4 (see the sketch at the end of this post). Also, the way to get more instructions to issue is through speculation, but too much speculation really hurts power.
Secondly, the Transmeta cache for translations and translating code is so small that it hurts quality. Transmeta would do better with OS cooperation, giving a larger hunk of memory to store more and better translations, and to enable more sophisticated translating algorithms. But that breaks the x86 compatibility model.
Third, they have lost the battle on performance, and power doesn't matter: Intel can outfab them and if REALLY low power was required/useful in the x86 world, Intel could crush them by simply dusting off the old Pentium core, process shrinking it to
Fourth, Transmeta's claims in the past have been so full of hot air - why should we believe anything they say now?
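On the first point, here is a hypothetical C fragment with essentially no instruction-level parallelism - each step needs the previous step's result, so no translator can fill eight slots per cycle here, no matter how wide the word is:

    /* A dependent chain: each iteration's x depends on the previous one,
       so the operations cannot be spread across parallel slots. */
    int chain(const int *a, int n)
    {
        int x = 0;
        for (int i = 0; i < n; i++) {
            x = (x ^ a[i]) * 31 + 7;   /* next iteration needs this x */
        }
        return x;
    }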
Re:It's not a 256b datapath, but a 256b VLIW word. (Score:4, Interesting)
B) The translation doesn't have to be that great. They're still performing fairly competitively with Intel chips.
C) Pentiums don't play well enough. Transmeta can simulate a several-hundred-megahertz (probably about 400-500 MHz) Pentium III fairly well. Also, Intel is notoriously bad at doing such things. Their knowledge of how to make such chips is not written down, only remembered in the minds of the workers. It would actually be VERY hard for them to do that.
D) Transmeta-based solutions have often employed other cool ideas in terms of power consumption: better LCDs that don't need backlights, for example. Not perfect, but getting there.
E) Transmeta's solution is so amazing that, even if it hasn't revolutionized the world, it has changed the course of Intel's strategy non-trivially. Plus, it's awesomely cool.
Here's a dumb question... (Score:1)
32-bits, 64-bits, 256-bits .... what's the limit ? (Score:3, Interesting)
First there was that 4-bit microprocessor, then it went to 8-bit, then 16-bit, 32-bit, and 64-bit.
When Transmeta announced its 256-bit microprocessor, I wasn't surprised.
However, I do have a question:
Is there a theoretical limit on the maximum bit-path for microprocessors?
Or in other words, will we see microprocessors with giga-bit (or even exa-bit) path ?
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
But who knows? Maybe when volumetric holograms come into play we'll see a need for numbers that big.
I do think we're due to move off of silicon soon, though, and move to something organic. I'm really curious what'll happen when we replace bits with neurons.
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
brains are, however, massively parallel.
I think you'd get more grunt and reliability from massively parallel silicon in this century than from any artificial neuronic configuration.
But really, quantum computers, which can parallelise across their own probabilities, will almost certainly be the next major leap.
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
they scratched and scribbled and did exactly what a single server can achieve today.
which gives those young men time to go out and get drunk instead.
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:1)
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
> with giga-bit (or even exa-bit) path ?
Using current technologies (DNA & quantum computing excluded), the main problem will become size. At some point the barrier will be hit - there is a limit to the number of transistors you can fit in a certain size.
My guess is, however, that we will see a true 1024-bit processor by the year 2008. I also guess that at that point we will have seen the best the current technology can offer, and we will start shifting away from transistors. The majority of our computers will be based on these alternative technologies by the year 2015.
Save this for future reference.
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
correctify my mistakes
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:5, Informative)
1) What's a "true" 1024 bit processor?
You have to make assumptions to answer this question. Probably the most useful "bit"ness to know for a particular processor is the number of bits it can use for a "normal" memory address. For Athlons, that is 32 bits, and the same for the Intel P4. Some Intel chips have a 4 bit extension, but it's a pain to use and should be ignored (and mostly is). There are a handful of mass-produced cpus with 64 bit addressing; the DEC^H^H^HCompaq^H^H^HIntel Alpha, some versions of the Sparc lineup, and certain varieties of IBM's POWER family come to mind. Since memory addresses on typical cpus refer to one byte, having 32 bit addresses allows you to uniquely reference 2^32 (~= 4 billion) bytes with a single memory address. How much of that "address space" you can map to physical ram is an entirely different issue. Being "64 bit" typically also means you can represent every integer between 0 and 2^64-1 exactly.
In my experience (I do scientific computing, not enterprise stuff), the ability to address tons of ram from a single cpu is what really counts 99.99% of the time. We have a machine, a Compaq ES40 Model II, with 1 cpu and 14GB of ram. It can grow to 32GB of ram -- and the new version goes up to 64GB of ram (and the machine's a steal at $20K with educational discount -- I'm being serious, but things will change with AMD's 64bit x86 "Hammer" stuff at the end of this year). You can't do that in any sensible way on a 32 bit cpu.
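To put numbers on that (a trivial sketch, not tied to any particular machine): 32-bit addresses can name 2^32 bytes, i.e. 4 GB, while 64-bit addresses can name 2^64 bytes, which is why single-CPU machines stuffed with tens of gigabytes of RAM want 64-bit addressing:

    #include <stdio.h>

    /* Address-space sizes for 32- and 64-bit byte addressing. */
    int main(void)
    {
        double bytes32 = 4294967296.0;              /* 2^32 */
        double bytes64 = 18446744073709551616.0;    /* 2^64 */
        printf("32-bit address space: %.1f GB\n",
               bytes32 / (1024.0 * 1024 * 1024));
        printf("64-bit address space: %.1f EB (exabytes)\n",
               bytes64 / (1024.0 * 1024 * 1024 * 1024 * 1024 * 1024));
        return 0;
    }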
2) From what I understand from the other posts, this transmeta proc is not 256 bits in the same sense that Intel's current chips are 32 bits
True. The "instruction word" on most modern (RISC) cpus == "word" size == integer size == memory address size. In fact, this was one of the big simplifications propounded in the RISC paradigm. Note that modern x86 cpus are RISC based, even though their instruction set is CISC (you can look up CISC and RISC and the web; note that CISC was the right thing to do under certain conditions). The Transmeta Crusoe is *not* a RISC cpu. In some ways it is simpler. However, it requires *very complicated* software support, unlike RISC cpus (take this with a grain of salt). So when someone says that the Crusoe instruction word is 256 bits, you shouldn't make any assumptions about integer or memory address sizes (I don't know, but I assume these are 32 bits on the Crusoe -- 64 bit would be silly for the Crusoe's target applications). A single "instruction" for a Crusoe will (evidently) be 256 bits in the future. However, it will (evidently) be guaranteed that this 256 bits will be broken down into 8 smaller 32 bit instructions by the cpu. That is, 256 bits are fetched from memory (don't ask which memory) at once, which the cpu will interpret as 8 different things to do at the same time.
I'm not mentioning a lot of stuff, like variable width instruction encoding in the x86 instruction set, or how software converts files full of x86 instructions into files full of 256 bit Crusoe instructions, and certain efficiencies and inefficiencies of 64 bit cpus versus 32 bit cpus. My main point is that you shouldn't get hung up on the "bit"ness of a cpu unless you are writing software for that cpu. FWIW, 64 bit cpus are nothing new. I talked to a 70 year-old who claimed to work on experimental 64 bit machines in the 1960s or 70s for the military (I don't recall which military =-).
Since 2^64 is a *really* big number (where are those stupid "number of atoms in the universe" figures when you need them?), it's unlikely that we'll need memory spaces larger than 2^64 anytime soon. Same goes for integer sizes. Improved floating point precision from wider floating point types would be much appreciated by folks like me who are tired of working with crappy 64 bit doubles and can't afford to take the performance hit of wider fp types on 32 bit architectures.
As far as optimal width for instructions, I have no idea. If you want to make a big fat instruction, you better have a lot of good stuff to do at once. And that depends not only on the compiler that converts C (or whatever) into the cpu's instruction set, but also how the human chose to use C (or whatever) to implement her idea.
Computer history is full of people wanting to do something, computers catching up by removing performance bottlenecks, humans adjusting to the new machines, and then the whole thing repeats. Heck, at one time it wasn't clear whether digital computers were really a better idea than analog computers (however, I think this argument is over for general purpose computing), and analog computers don't have any "bits" at all.
Like I said, don't take anything I wrote above (at 5am while waiting for some code to produce output) as fact without double checking somewhere else. If you really want to get your head screwed on right, take an architecture course or (if you're really disciplined) work your way through something like Hennessy and Patterson's "Computer Architecture, A Quantitative Approach". You can get a lot of good info from 'popular' texts like "The Indispensable PC Hardware Book". A big warning about that book, though -- when the author writes "PC", he almost always means "PC when used with MS-DOS or Windows" -- often this is subtle, for instance when discussing the boot process or how memory is organized.
-Paul Komarek
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
So what are you saying exactly - that the number of bits a CPU is rated at only means the total number of RAM bits that can be addressed? In other words, a 32 bit CPU can only address 2^32 bits of RAM? Is that the only real difference?
If it is, I can't imagine any computer in the 60's and 70's being able to address 2^64 bits of RAM.
By the way, there are 10^81 atoms (supposedly) in the Universe, which is somewhere between 2^269 and 2^270 (to be precise, it's 2^269.07617568587635017749587378908).
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
On RISC cpus things are supposed to be simple (by definition). They have a word size which is also the address size and integer size. When someone says "32 bit cpu", they're probably talking about the word size of a RISC cpu. The lab group I work in is mostly interested in large memory attached to a single cpu, hence our desire for a 64 bit address space. I recently needed 64 bit integers for exact arithmetic, but that is the first time that happened for anyone in my lab group.
And a word of caution. A 32 bit RISC cpu has 32 bit memory addresses, but that doesn't mean one can address 2^32 bytes of ram. Modern operating systems use "virtual memory" for a variety of reasons, and one of the side effects is that the virtual memory system "steals" some of those 2^32 addresses, and hence not all 2^32 addresses are available for mapping to physical RAM. There are many other things that "steal" addresses from the address space. In the "simplest" scenario (in some sense), you can only have half as much physical RAM as you have addresses. Thus only 2^31 bytes worth on a 32 bit RISC cpu (2GB).
The folks using the IBM Stretch in the 1960s (thanks to tri44id for his post about this) probably weren't really concerned with having lots of RAM. They were probably more interested in making calculations concerning nuclear tests more accurate. Furthermore, RISC cpus (on the market) didn't show up until the mid 1980s, and saying that the Stretch was a 64 bit computer would be very misleading. Parts of the cpu handled 64 bits "simultaneously", but which parts? You'd have to do some research to find out.
If you're interested in computer history as much or more than computer architecture, I recommend "A History of Modern Computing" by Ceruzzi (curator of the Smithsonian's Air & Space Museum). I recommend only glancing at the Introduction, as it isn't nearly as good as the rest of the book. Overall, I love this book.
-Paul Komarek
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
For instance: a MC68000 chip is a 16 bit CPU, I think we've all accepted that at some point in our lives. However, it has 24 bit addressing.
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
-Paul Komarek
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
But instruction size is just silly, and why I think you're trolling. The Athlon in this box uses variable-length instructions; most are 8 bits long. The Alpha, normally considered a 64-bit machine, uses 32-bit instructions. The Itanic puts three instructions into a 128-bit bundle, making its instruction length about 42 bits. The NEC Vr4181 in the Agenda VR3 PDA in my pocket has a 16-bit data bus and modes for both 32- and 64-bit GPRs.
Oh and the Vr4181 has both the standard 32-bit MIPS II instruction set and the 16-bit MIPS16 set, with instructions to switch between them.
From a programmer's point of view the data bus, physical address pins, and the size of instructions are just implementation details. What's important is the instruction set architecture, and the computing model defined by it. In both MIPS II and MIPS16 modes, the Vr4181 has 32-bit GPRs and a flat 32-bit address space. (With a little kernel hacking, it'd be 64-bit GPRs and addressing, but that would be silly.) When I take my code to a Vr4131, which has a 32-bit data bus, I don't have to change anything.
That's why I consider the 68000 to be a 32-bit architecture. Except for performance, my code will run identically on the 68008, 68000, and 68020, with their external 8, 16, and 32 bit data buses.
For the new Transmeta chip, this evaluation strategy says that it's still a 32-bit chip. Programmers outside of Transmeta don't directly program the device, so it makes no difference what the internal ISA is. The externally visible ISA is still the variable-length 8-bit IA-32 architecture, with its 32-bit GPRs. I'd bet they aren't implementing the cheap hacks to get 36-bit physical addressing...
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
The problem here is that we're not talking about RISC cpus, where a word is a word is a word.
-Paul Komarek
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:2)
The PET was expandable to 256K (not bad, for an 8-bit machine!) but the screen flickered horribly. My guess, based on that observation, was that the technique here was to have a very fast switch flip between banks, with the screen RAM on only one of those banks.
In consequence, you could "see" all 256K, but not all at the same time. You would probably need to go through some intermediate layer, to actually go between pages, for that very reason. There would be no way of telling which layer you meant.
The BBC Micro used a system it called "Sideways ROM/RAM", which (AFAIK) was based on having the lower part of RAM fixed, and the upper part selectable. In other words, it wasn't based on auto-cycling (as with the PET) but required code to do the selecting.
IMHO, the CBM64 (which used a 65x02 processor, and could therefore access 64K of memory directly) didn't have any bank selection, as far as I know. 8-bit processors could access 64K of RAM because they worked in pages: 256 pages of 256 bytes. This is why addresses 0-255 are referred to as "Zero Page". (That is also where a lot of OS-based work was done, as that could be accessed the fastest.)
In other words, 8-bit processors, such as the 65x02, used two words for addresses. The Page, and the Offset, giving you an effective 16-bit address length.
(It's also why the ZX80 only had 256 bytes of RAM, and yet worked surprisingly well. With everything in Zero Page, you had the fastest access possible at all times.)
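The page/offset arithmetic described above, as a one-line sketch (generic 8-bit-era convention, not the internals of any specific chip): an 8-bit page number and an 8-bit offset combine into a 16-bit effective address, and page 0 is the cheap-to-reach "zero page":

    #include <stdint.h>

    /* 256 pages of 256 bytes each; page 0 (0x0000-0x00FF) is the zero page. */
    static uint16_t effective_address(uint8_t page, uint8_t offset)
    {
        return (uint16_t)(((uint16_t)page << 8) | offset);   /* page * 256 + offset */
    }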
Re: 32-bits, 64-bits, 256-bits .... what's the lim (Score:5, Funny)
> First there was that 4-bit microprocessor, then it went to 8-bit, then 16-bit, 32-bit, and 64-bit.
No one should ever need more than 640 bits.
Re:32-bits, 64-bits, 256-bits .... what's the limi (Score:3, Informative)
Here, the 256 bits refers to the instruction word, not the data-word size. These are completely different things. If you're going by this, then your x86 could be considered up to a 48-bit machine or so. The TMTA chips are still 32 or 64 or 48 or something, like x86 is. This is just going to mean that, because it's VLIW, it can do 8 ops per cycle per pipeline stage instead of 4. Cool, but not any more revolutionary than anything else TMTA has done.
OS support? (Score:1)
Re:OS support? (Score:2)
There is absolutely no reason I can think of that all the size dependency issues (for data, address, etc) can't be shifted into a layer between the kernel itself and the underlying processor support. If you were to do that, then someone could come along with a 4096-bit (data), 256-bit (instruction), 512-bit (address), 8192-bit (register) processor, and you wouldn't need to give a damn. Just copy an existing header file, shove in the new constants, and the kernel would support such a layout from day 1.
Is this possible? Sure! There's nothing magical about the sizes used for data structures - they're just sizes used because they are. If a kernel pointer took 1K, would it really change the logic any? No. It would just mean that the pointer took that much RAM, and could point to 2^1024 definable points in memory.
The only times you actually NEED to know the size of a data structure are when you're checking for out-of-range conditions, or when you're streaming in/out. The first just requires a pre-defined set of constants (eg: MAXINT) to reflect the bit-lengths. The second is more complex, as byte ordering becomes important. If you're putting a 32-bit number into a 64-bit structure, and the endianness isn't the same, you have to first convert the 32-bit number to 64-bits, in order for the endianness to convert correctly.
This doesn't work too well, the other way round, though. 64-bits can hold more data than 32-bits, so if you have code which assumes you've the full 64-bit range, it'll break.
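A sketch of the widening-plus-byte-order point (illustrative only; real serialization code would also worry about signedness and alignment): widen the 32-bit value to 64 bits first, then apply the byte-order conversion to the full 64-bit field, so the reader on the other side sees a well-formed value:

    #include <stdint.h>

    /* Reverse the byte order of a 64-bit value (needed when the host's
       endianness differs from the stream's). */
    static uint64_t byteswap64(uint64_t v)
    {
        uint64_t r = 0;
        for (int i = 0; i < 8; i++) {
            r = (r << 8) | (v & 0xFF);
            v >>= 8;
        }
        return r;
    }

    /* Widen 32 -> 64 bits first, then fix byte order once. */
    static uint64_t widen_for_stream(uint32_t x, int host_matches_stream)
    {
        uint64_t wide = (uint64_t)x;
        return host_matches_stream ? wide : byteswap64(wide);
    }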
BUT, this is the crux, what happens if you don't assume, but ask? What happens if the kernel interrogates the hardware, to find the bit-lengths? What happens if user-land software doesn't assume specific ranges, but enquires at run-time? (eg: Things like sizeof() become system calls)
Sure, things will run slower on any given hardware platform, because none of the software would be making assumptions about the nature of the hardware. Any hardware-dependent optimizations would need to be made at run-time. (eg: when loading an application, it would be "linked" with the hardware layer, unknowns would be resolved, and the code tuned to the bit-lengths in use.)
All of this is certainly possible, and certainly practical. The only real question is whether it is even remotely useful. At any given time, is there so much diversity that hardware-independent layers would have any actual value to anyone? If not, then although you can do it, why bother? It would be easier just to hand-tune the system each time you moved it.
(X is a classic example of a system that could be hardware-independent, but is actually heavily hand-tuned. Manufacturers who want a performance better than that of a slug do some intense work on tuning X for specific hardware/software combinations.)
VLIW at IBM Research, Transmeta & IBM married (Score:4, Informative)
"We developed an experimental prototype of a VLIW processor, capable of performing multiway branching and conditional execution, which is currently operational. The prototype has helped us investigate some of the hardware constraints in building VLIWs.
This processor executes tree-instructions within a ``classical'' VLIW architecture, that is, fixed-length VLIWs with preassigned slots for the different operations. The register state consists of 64 32-bit general purpose registers, 8 single-bit condition code registers, 4 memory address registers, program status word register, and some special registers. Each Very Long Instruction Word is 759 bits, which include..."
Now that we know the relationship [wired.com] between IBM and Transmeta, can you combine the results of these two 'projects'?
Re:VLIW at IBM Research, Transmeta & IBM marri (Score:2)
the first 256-bit OS.. (Score:2, Funny)
Memory Addressing, Parallel VLIW Issues (Score:2, Interesting)
There's nothing too spectacular about 256-bit instruction paths in VLIW processors, but I'm not sure this will offer the caliber of benefits they claim it will: VLIW instructions (which are usually bundles of smaller, discrete instructions) are by nature very complex beasts, and trying to shove two down the pipeline without the instructions stepping on each other's toes is a difficult process.
But, of course, I'm not working at Transmeta, so I really can't say what wonders they're working over there.
CORRECTION (re: Memory Addressing, Parallel VLIW) (Score:1)
By "pipeline" I meant instruction size. It can't be said for sure if it's a 256-bit wide datapath, but it seems that anything less would make the chip even harder to build.
Later when I referred to shoving "two down the pipeline", it was in consideration of size of the previous 128-bit VLIW instructions, not that they were attempting to parallelize the execution of the previous VLIW instruction set.
.. just trying to clarify what I meant. Heaven forbid it be misinterpreted
Re:Memory Addressing, Parallel VLIW Issues (Score:2)
Re:Memory Addressing, Parallel VLIW Issues (Score:2)
hobbyist motherboards? (Score:2, Interesting)
I'm speaking out of self-interest, of course. I'd like to build a home, rack-mount style server with ultra-low energy requirements. As it is, I'm thinking about going with an iMac motherboard and Darwin, but I'd much rather use a Transmeta system with a standard Linux distribution.
Transmeta Motherboards here (Score:3, Informative)
http://www.ibase-i.com.tw/ib755.htm
They've got more Transmeta motherboards, including a CPU PCI board.
I bought the first one that came out and I like it. You'll have to find a way to mount it to an ATX case since it's one third the size.
Other Transmeta Products:
http://www.transmetazone.com/products.
Perhaps transmeta will change strategies (Score:1)
Architecture is not relevant (Score:3, Insightful)
Thus all the market will care about is how much it costs, how much power it uses, and how fast it is compared to the offerings from Intel and AMD.
Is that a battle Transmeta can win? Intel can always pretend to have a better low power pentium around the corner, and they might not even be pretending.
Now, if they could use it to make a machine which can run both Mac PowerPC and x86 software at high performance, that might be something that would bring in users.
Great... (Score:1, Redundant)
256 bits (Score:1, Redundant)
If a really low power processor was useful, then Intel or AMD would already have an ultra-low power product out the door to fulfill the market need.
Transmeta claims they can get equivalent performance at much lower power. This is a dubious claim given that their past products have fallen far short of this goal. Their customers are few and far between and the stock price has reacted accordingly.
Have one and love. Get one if you can. (Score:4, Informative)
But the best thing is the low amount of heat that the thing kicks out. Anyone who has ever sat with a P3/4 notebook on their lap for any amount of time knows how hot they get. These get a little warm after an hour or so, but not hot.
Bought mine in Japan, not sure what is available elsewhere.
Cheers.
Re:Have one and love. Get one if you can. (Score:2)
Re:Have one and love. Get one if you can. (Score:2)
I don't get it! (Score:3, Interesting)
Now I know it's more complicated than just adding more transistors. Still, though, they seem to have a good design, and it seems to me like they should just add more horsepower to each part of the chip. It would have the potential to be a great server chip, and if my wildest dreams came true, it would outperform Motorola's best chips by such a margin that Apple would pay Linus to write a code-morphing routine to have the chip emulate a PowerPC. It would be a seamless transition for Mac users, and it would make Macs competitive again for price-conscious performance users.
Re:I don't get it! (Score:2, Informative)
That's not cheap to make, and no doubt power hungry, which is the reverse of what the Crusoe does best. Besides, there's no guarantee more cache will help given its current design - if you want a smokin' processor with lots of cache, use one that was designed for that purpose.
What I want from Transmeta (Score:2, Insightful)
I don't care about power efficiency except as a means to an end.
And that end is a passively cooled machine of sufficient performance to run a desktop workstation or server. I'd like to replace my aging PPro200 with a passively cooled machine, and Transmeta seems to be able to deliver that.
So why don't they do that? I think there's a market there, too. A Transmeta mobo and processor is all that is needed, yet in the Netherlands, I can find neither...
Of course, 'cheap' would be a nice property of such a system too, though I don't know if Transmeta could deliver that.
Re:What I want from Transmeta (Score:2)
Correct me if I'm wrong (Score:2)
I have this funny feeling that the one company who could really get the most out of Transmeta's technology is the one company that won't: Apple. Given Apple's constant problems with Motorola and G4 deliveries, this is one processor that could give them a boost. I have no idea though. This is just a question.
The Japanese don't play dice with the Universe (Score:4, Insightful)
Re:The Japanese don't play dice with the Universe (Score:2)
I'd rather put it this way - 8% outside Japan (Score:2)
Quite frankly, the reason I didn't get one is that it ran like a limping turtle. Then again I'm rather picky and ended up getting a Toshiba Portégé 2000 instead, but that's just me. It chokes out after 90-100 minutes on the primary battery but lasts a good 6 hours with the included secondary, sitting outside, screen at max brightness and using the built-in WiFi. Everyone but my wallet and a few jealous friends are happy.
Kjella
cheapest supercomputer based on transmeta (Score:2)
transmeta's income statement (Score:2, Interesting)
Whole lotta DIMMs (Score:2)
Oh great (Score:2)
oh yeah... (Score:2)
Between Matrox and Transmeta, It's a very good time to be waiting patiently for processors and video cards, since the next generation looks like it will be sweet!
Re:Transmeta (Score:2)
Remember, the Crusoe is "different". Rules apply differently.