
Reverse Multithreading CPUs

Posted by ScuttleMonkey
from the quick-geordi-reverse-the-polarity dept.
microbee writes "The Register is reporting that AMD is researching a new CPU technology called 'reverse multithreading', which essentially does the opposite of hyperthreading in that it presents multiple cores to the OS as a single-core processor." From the article: "The technology is aimed at the next architecture after K8, according to a purported company mole cited by French-language site x86 Secret. It's well known that two CPUs - whether two separate processors or two cores on the same die - don't generate, clock for clock, double the performance of a single CPU. However, by making the CPU once again appear as a single logical processor, AMD is claimed to believe it may be able to double the single-chip performance with a two-core chip or provide quadruple the performance with a quad-core processor."
This discussion has been archived. No new comments can be posted.

  • by tepples (727027) <{moc.liamg} {ta} {selppet}> on Tuesday April 18, 2006 @06:29PM (#15153179) Homepage Journal

    Multiple cores presented as one sounds familiar. Last time I heard about that, it was just called "superscalar execution" [wikipedia.org]. As I understand it, multithreading and multicore were added because CPUs' instruction schedulers were having a hard time extracting parallelism from within a thread.

    • by overshoot (39700) on Tuesday April 18, 2006 @06:45PM (#15153272)
      Superscalar refers to having multiple execution paths inside of a single processor, allowing the dispatch of multiple instructions in a single clock cycle. However, the register sets (etc.) maintain a common state (although keeping the out-of-order updates straight sucks a huge amount of complexity and power.)

      In this case, AMD appears to be trying to decouple the states enough that the out-of-order resolution doesn't require micromanaging all of the processes from a single control point.

      • by RalphTWaP (447267) on Tuesday April 18, 2006 @07:00PM (#15153350)
        What AMD appears to be trying isn't the same as superscalar processing, but it might run into a similar problem.

        Where superscalar requires a good dispatcher to minimize branch prediction misses, AMD appears to be making decisions, not about dispatch, but about how to do locking of shared memory (think critical sections).

        Critical section prediction might prove less expensive than branch prediction in practice even if they are similar in theory (http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html [umd.edu] shows the problem, which already is an issue on 64-bit hardware).
        • by vlad_petric (94134) on Tuesday April 18, 2006 @10:32PM (#15154523) Homepage
          Current superscalars still fetch instructions in order, and squash everything after a mis-predicted branch. The cost of branch mis-speculation is in fact getting higher, because deeper pipelines mean longer times between the mis-prediction of the branch and its execution (where the correction takes place). In other words, it means longer times on the wrong path.

          The purpose of "good dispatching" (i.e. out-of-order execution) is to hide the latencies of misses to main memory (it takes between 200 and 400 cycles these days to get something from memory, assuming that the memory bus isn't saturated), by executing instructions following the miss but not dependent on it. Out-of-order execution has been around since the Pentium Pro, btw.

    • Last time I heard about that, it was just called "superscalar execution".

      That's not quite right, and I think there is a lot of misunderstanding going around. So let me tell you what I know about this technology. First of all, the entire idea of having two processors work on a single thread of a program isn't that far-fetched, and has been a topic of research for a long time. What most people don't understand is that, in general, it requires a massive revamp of the instruction set. What happens is you de
      • Isn't this exactly what Intel did with the IA64 instruction set, i.e., the Itanium family, by adding explicit support for simultaneous instruction execution?

        Personally I'm still a big fan of this instruction set/system and feel it's a real shame that backwards compatibility/resistance to change has kept it out of the mainstream. I would dearly love the irony if AMD tried to introduce an Itanium-like processor now.
  • by Quaoar (614366) on Tuesday April 18, 2006 @06:30PM (#15153184)
    I believe that one and a half cores, sideways-threaded, is the way to go.
  • Scheduling Threads (Score:5, Insightful)

    by mikeumass (581062) on Tuesday April 18, 2006 @06:31PM (#15153190)
    If the OS scheduler only knows about one core, how in the world would it ever know to set two threads in the execute state simultaneously to take advantage of the extra horsepower? This article is lacking any substantial detail.
    • It won't. But that is not the problem. They are moving the selection down into the hardware, making access to memory and other resources the real issue. It is possible that this will increase overall system performance.
    • by DerGeist (956018) on Tuesday April 18, 2006 @06:42PM (#15153254)
      It's not, you're actually losing parallelism here. The idea is to hide the multiple processors from the OS and make it think it is scheduling for only one. The OS is so good at single-processor scheduling that allowing the CPUs to take care of who does what will effect better performance than splitting up the tasks among the processors at the OS level.

      At least that's the idea. Whether or not it works is yet to be seen.

      • The OS is so good at single-processor scheduling that allowing the CPUs to take care of who does what will effect better performance than splitting up the tasks among the processors at the OS level.

        The problem with this reasoning is that all contemporary OSes have been designed with multiprocessor machines in mind and are thus not only heavily multithreaded, but also have schedulers designed to detect, and take maximum advantage of, multiple CPUs.

        I'm highly sceptical that any CPU can do a better job than t

        • by Homology (639438) on Tuesday April 18, 2006 @07:31PM (#15153516)
          The problem with this reasoning is that all contemporary OSes have been designed with multiprocessor machines in mind and are thus not only heavily multithreaded, but also have schedulers designed to detect, and take maximum advantage of, multiple CPUs.

          A kernel intended to run on a single-CPU machine can be made to run faster, partly due to less need for locks. OpenBSD offers two kernels for the archs that support multiple CPUs: a single-CPU kernel and a multi-CPU kernel. The single-CPU kernel is faster.

          • by drsmithy (35869) <drsmithy@@@gmail...com> on Tuesday April 18, 2006 @07:50PM (#15153640)
            A kernel intended to run on a single-CPU machine can be made to run faster, partly due to less need for locks. OpenBSD offers two kernels for the archs that support multiple CPUs: a single-CPU kernel and a multi-CPU kernel. The single-CPU kernel is faster.

            OpenBSD's SMP support is not particularly good; I don't think it's a good example to use for performance comparison purposes.

            • This is not specific to OpenBSD. I did some rudimentary benchmarking for Debian with UP and SMP kernels (same config except for the SMP option), in each case using only one processor, and found between a 15% and 30% performance hit depending on hardware configuration.

              The issue was that Debian was (probably still is) considering not shipping any UP kernels, since it's kind of a pain to maintain a UP and SMP flavor for each kernel configuration. It turns out the performance hit is still big enough that, exce
              • by somersault (912633) on Wednesday April 19, 2006 @04:54AM (#15155568) Homepage Journal
                I did some rudimentary benchmarking for Debian with UP and SMP kernels (same config except for the SMP option), in each case using only one processor

                Why do you think they included 2 different kernels, and how do you expect a kernel that has been optimised for parallelisation to run as well on a single processor? Seems rather trivial to me.
            • You're correct, OpenBSD has primitive SMP support. However it seems the point was to show an example of an OS that would perform better without dealing with SMP.
            • Actually, it is, for exactly that reason. OpenBSD's current SMP support uses the simplest possible approach; put the entire kernel inside a big giant lock. This means that every system call has an implicit get-lock operation at the start and a release-lock operation at the end. This is the best possible case performance for a SMP-capable kernel when running on a single CPU.

              Consider a write operation. With the OpenBSD kernel, you make the system call, lock the kernel, run to completion, unlock the ker

      • But I don't understand how the CPU can split a single process/thread among cores without the same problems encountered with superscalar architectures.

        -matthew
      • The OS is so good at single-processor scheduling that allowing the CPUs to take care of who does what will effect better performance than splitting up the tasks among the processors at the OS level.

        I thought OSes were only so good at multiprocessor scheduling because things can only be done in parallel to a certain level of granularity -- data dependencies, data locking, and other problems cause stalls in how well multithreading can work.

        I guess what we're all trying to figure out: how does 'figuring out wh
        • Perhaps for the same reasons that some folks use stored procedures in databases rather than sending a series of queries and responses over the wire. A highly tuned CPU and northbridge chipset combination may be able to perform functions with faster timings than a somewhat more generic OS-level version by reducing the number of roundtrips in and out of the CPU and memory subsystems.

          And hardware can indeed be faster than the equivalent software. Witness the rise of the GPU. A dedicated hardware item like a GP
    • Export that info somewhere; once the CPU scheduler knows what features the CPU has, it can start making decisions optimized for that CPU. 2.6.17 will feature a new "scheduler domain" [kernel.org] which optimizes scheduling decisions for multi-core CPUs, for example.

      Of course you could choose not to export that info and let the CPU do it transparently, but does that have any sense at all? Now that cores are becoming so important you may end having more than one CPU with different number of cores each one, and t
    • If the OS scheduler only know about one core, how in the world would it ever know to set two threads in the execute state simultaniously to take advantage of the extra horsepower

      It won't, and that doesn't matter. When will you SMP-fetishists learn that two simultaneous threads won't be running each in their own CPU? If you have two threads (or processes) running, that doesn't mean that each gets its own CPU; they'll share the two CPUs along with the dozens of other running processes. If AMD is right that

  • Huh? (Score:4, Interesting)

    by SilentJ_PDX (559136) on Tuesday April 18, 2006 @06:32PM (#15153192) Homepage
    What's the difference between 'reverse multithreading' (it sounds like having one execution pipeline on a chip with enough hardware for 2 cores) and just adding more Logic/Integer/FP units to a chip?
    • Re:Huh? (Score:5, Funny)

      by mrscorpio (265337) <(twoheadedboy) (at) (stonepool.com)> on Tuesday April 18, 2006 @06:34PM (#15153205)
      ......these amps go to 11!
    • Count me in the confused camp. It would seem to achieve roughly the same effect as adding more functional units, but with lower returns because the code running on each chip can't share state information as easily. My best guess is that they want the processor to appear as two cores sometimes and one core at other times. I haven't a clue who makes this decision, perhaps a hypervisor in a virtualized environment? My only other speculation is that perhaps branch prediction is somehow easier, although I ca
    • The effect is the same but you gain more flexibility. If you only add more EUs to a chip then they can get starved, because it's hard to do the dispatch to keep them full. If you have multicore, but only one main thread, then at least half of your EUs are going unused. The cute answer, from an engineering point of view, is to allow both, and then switch between them. Then if you have long single-threaded sections of code with lots of implicit parallelism (i.e. games) you can load up the EUs, or if you have lot
        The effect is the same but you gain more flexibility. If you only add more EUs to a chip then they can get starved, because it's hard to do the dispatch to keep them full. If you have multicore, but only one main thread, then at least half of your EUs are going unused. The cute answer, from an engineering point of view, is to allow both, and then switch between them. Then if you have long single-threaded sections of code with lots of implicit parallelism (i.e. games) you can load up the EUs,

        You definitely lo
          The difficulty in keeping the cores full is that most programs don't expose that much parallelism. Adding cores doesn't magically fix that - though some programs do expose it. Choosing the chip design is just an engineering tradeoff - pick the most common case, and then optimise for that. So Intel/AMD went down the superscalar route for as far as they could, but they got diminishing returns after a while.

          Multicore designs are optimising for a different kind of code - but they suck at running the programs that do ex
  • by Anonymous Coward on Tuesday April 18, 2006 @06:32PM (#15153194)
    Didn't they do this on Star Trek once to get more power or something?
    • No, that was the "Reverse Algorithm" that the video technician on CSI uses to sharpen blurry images.
    • by aliens (90441) on Tuesday April 18, 2006 @07:15PM (#15153430) Homepage Journal
      No that was Ghostbusters. They crossed the streams.
      • Dr. Egon Spengler: There's something very important I forgot to tell you.

        Dr. Peter Venkman: What?

        Dr. Egon Spengler: Don't cross the streams.

        Dr. Peter Venkman: Why?

        Dr. Egon Spengler: It would be bad.

        Dr. Peter Venkman: I'm fuzzy on the whole good/bad thing. What do you mean, "bad?"

        Dr. Egon Spengler: Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light.

        Dr Ray Stantz: Total protonic reversal.

        Dr. Peter Venkman: Right. That's bad. Okay. Al
  • by TheSpoom (715771) *
    This would seem to be better for processes designed to only use one CPU, but it then prevents me from coding something in, say, OpenMP [openmp.org], in order to fine tune the parallelisation of my code (which would almost certainly work better than the generic optimizations that they would be putting in the CPU). Admittedly 95+% of programs aren't coded to be parallel, but this would still take away an option that would otherwise be there.

    Perhaps there could be a documented way to access both CPUs directly? That may s
      • I haven't used OpenMP, but I took a class in parallel processing, and we used LAM-MPI [lam-mpi.org]. If they are anything alike, then anything you program takes 10 times as long, plus you have to explicitly tell it how to split up and collect the data in an efficient manner. Which is often the hardest part. Anyway, I think that this kind of stuff is only necessary for applications which are required to be highly parallel. Otherwise, it would probably be easy just to add a couple of threads to your application, and let th
      • Your parallelized program took ten times as long? What were you doing?

        Although I will say that when we were doing parallel programming, we did have access to Sharcnet [sharcnet.ca] nodes, so perhaps you were more limited.
        • Sorry for the ambiguity. It takes 10 times as long to write the code. Can't really say much about the performance increase, as I mostly ran the stuff on a single CPU, so the overhead would make things run slower. There were a couple of 2- and 4-processor machines that we could SSH into, but when you're sharing with 50 other users, it's hard to gauge from one run to the next whether or not you're seeing any actual improvement.
  • by Anonymous Coward on Tuesday April 18, 2006 @06:38PM (#15153229)
    Part of the problem is that we're still writing software using techniques that were designed for single-processor systems. Languages like C and C++ just aren't suited for writing large distributed and/or concurrent programs. It's a shame to see that even languages like Java and C# only have rudimentary support for such programming.

    The future lies not with languages such as Erlang and Haskell, but likely with languages heavily influenced by them. Erlang is well known for its uses in massively concurrent telephony applications. Programs written in Haskell, and many other pure functional languages, can easily be executed in parallel, without the programmer even having to consider such a possibility.

    What is needed is a language that will bring the concepts of Erlang and Haskell together, into a system that can compete head-on with existing technologies. But more importantly, a generation of programmers who came through the ranks without much exposure to the techniques of Haskell and Erlang will need to adapt, or ultimately be replaced. That is the only way that software and hardware will be able to work together to solve the computational problems of tomorrow.

  • However, by the time the technology ships - if it proves real, and ever becomes more than a lab experiment - the software industry will have had several years focusing on multi-threaded apps, and it may not want to go back.

    Hah, yeah right, we started parallel programming just this semester and already I want to kill myself. "May not want to go back"? I'd go back in a heartbeat!

    • by ivan256 (17499) * on Tuesday April 18, 2006 @06:50PM (#15153297)
      Boy are you screwed.

      Even though the trade rags haven't realized it, real life software engineers have been using parallel programming techniques for decades. Sure, apps are optimized for what they run on, so most shrinkwrap software at your local CompUSA probably doesn't have much of that in there, but the author missed the boat already when it comes to "had several years focusing on...".

      Better learn to like that parallel programming stuff. It's the way things work.
      • Better learn to like that parallel programming stuff. It's the way things work.

        I can echo that. I have been doing programming on parallel CPUs since 1968 (on a monstrosity at Stanford University that included a 166 and a KA-10 processor). You have to think differently to write parallel code, but once you learn to think that way it becomes no harder than conventional, “linear” programming.

    • What language are they foisting upon you ?

  • by creimer (824291) on Tuesday April 18, 2006 @06:39PM (#15153235) Homepage
    First, they get the software industry's licensing panties in a knot because users only want to pay a license fee for one physical chip instead of paying for each processor on the chip. Now, twisting the panties in other direction, they want to reverse all that by representing multiple processors as one virtual processor. Would that be covered by a multi or single processor license agreement? Do I still get free wedgie with that one?
  • Amdahl's Law (Score:5, Interesting)

    by overshoot (39700) on Tuesday April 18, 2006 @06:41PM (#15153247)
    OK, I know some of the gang doing architecture for AMD and they are damned sharp people.

    What I want to know is which of the premises underlying Amdahl's Law [wikipedia.org] they've managed to escape?

    • Re:Amdahl's Law (Score:4, Interesting)

      by grumbel (592662) <grumbel@gmx.de> on Tuesday April 18, 2006 @06:57PM (#15153334) Homepage
      Quick guess:

      Amdahl's Law has little impact when the number of cores is small and the available task is "large", as today's multitasking OSes are.

      Of course that doesn't mean that AMD will get a 100% improvement, but something close to that might be doable if they can break the tasks at hand into parallel stuff at a much smaller level than threads.
    • Shi's law (Score:5, Informative)

      by G3ckoG33k (647276) on Tuesday April 18, 2006 @07:15PM (#15153431)
      From here [temple.edu]:

      Researchers in the parallel processing community have been using Amdahl's Law and Gustafson's Law to obtain estimated speedups as measures of parallel program potential. In 1967, Amdahl's Law was used as an argument against massively parallel processing. Since 1988 Gustafson's Law has been used to justify massively parallel processing (MPP). Interestingly, a careful analysis reveals that these two laws are in fact identical. The well-publicized arguments resulted from misunderstandings of the nature of both laws.

      This paper establishes the mathematical equivalence between Amdahl's Law and Gustafson's Law. We also focus on an often neglected prerequisite to applying the Amdahl's Law: the serial and parallel programs must compute the same total number of steps for the same input. There is a class of commonly used algorithms for which this prerequisite is hard to satisfy. For these algorithms, the law can be abused. A simple rule is provided to identify these algorithms.

      We conclude that the use of the "serial percentage" concept in parallel performance evaluation is misleading. It has caused nearly three decades of confusion in the parallel processing community. This confusion disappears when processing times are used in the formulations. Therefore, we suggest that time-based formulations would be the most appropriate for parallel performance evaluation.
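The equivalence the abstract asserts is short enough to sketch in standard textbook notation (the symbols here are the usual ones, not taken from the paper itself):

```latex
% Amdahl: serial fraction \alpha measured on the serial run, N processors
S_A = \frac{1}{\alpha + (1-\alpha)/N}
% Gustafson: serial fraction s measured on the *parallel* run
S_G = s + N(1-s)
% Describing the same execution, the serial fraction that workload
% would exhibit on one processor is
\alpha = \frac{s}{s + N(1-s)}
% and substituting into Amdahl's formula reproduces Gustafson's number:
S_A = \frac{1}{\frac{s}{s+N(1-s)} + \frac{1-s}{s+N(1-s)}} = s + N(1-s) = S_G
```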
      • Sorry, I don't have mod points. That's pretty darn informative right there.


        I think that's a great example of the problems facing researchers in mathematics (and the sciences) today. It's really hard to make connections between all of the disparate facts, theories, and experimental data to draw conclusions and lead to productive research and development. In short, we often experience mental stack overflow errors.
  • by Anonymous Coward on Tuesday April 18, 2006 @06:46PM (#15153280)
    Despite the lack of details, it sounds quite a bit like Intel's Mitosis research:
    http://www.intel.com/technology/magazine/research/speculative-threading-1205.htm [intel.com]

    The article has simulated performance comparisons.

    From the article:
    "Today we rely on the software developer to express parallelism in the application, or we depend on automatic tools (compilers) to extract this parallelism. These methods are only partially successful. To run RMS workloads and make effective use of many cores, we need applications that are highly parallel almost everywhere. This requires a more radical approach."
    • No, it sounds nothing at all like this research. Intel's research (in the paper you link, and with the entire Itanium system) has been all about exposing the out-of-order execution and speculative execution capabilities of the processors to the compiler. In other words, the exact opposite of what AMD is supposedly doing here by hiding the dual-core nature of the chip.

      For what it's worth, I think in the long run Intel has the right answer; the question is whether AMD can steal lots of market share in the sho
  • by salimma (115327) on Tuesday April 18, 2006 @06:46PM (#15153281) Homepage Journal
    .. in this post [macosrumors.com] they reported on a project supposedly aiming at breaking down single threads into multiple threads so as to improve core utilization beyond the fourth core.

    It supposedly involves Intel. I personally think both rumors are just that, but the timing is curious. Same source behind both? AMD PR people not wanting to lose out in imaginary rumored technology to Intel?
      .. in this post they reported on a project supposedly aiming at breaking down single threads into multiple threads so as to improve core utilization beyond the fourth core.

      In the Eiffel programming language, they've proposed a concurrency algorithm that doesn't use "traditional" threads.

      The idea is, you sprinkle the "separate" keyword onto various objects that it makes sense for. The compiler or runtime then does a dependency analysis and breaks out your program into different threads or processes. A
  • I know... (Score:5, Funny)

    by Expert Determination (950523) on Tuesday April 18, 2006 @06:47PM (#15153283)
    Hyperthreading makes one core look like two. Reverse hyperthreading makes two cores look like one. So if we chain reverse hyperthreading with hyperthreading we can make one core look like one core but have twice as many features for the marketing department to brag about.
    • Actually, that would be the answer to what to do with code that really is multi-threaded on a CPU like this. Give the OS two states (stack + registers and whatnot) to run threads on, but virtualize that above actual cores, so one thread might totally dominate both cores, especially if the other is just executing HLT. I still think this is vaporware close to vacuum, though.
    • Re:I know... (Score:5, Insightful)

      by barracg8 (61682) on Tuesday April 18, 2006 @08:31PM (#15153936)
      Ironic that this post is modded funny, since I think it might be closest to the mark.

      I'd suggest x86-secret & the Reg have got the wrong end of the stick here. SMT is running two threads on one core - try taking "reverse hyperthreading" literally. I'd suggest that AMD are looking at running the same thread in lock-step on two cores simultaneously. This is not about performance, it is about reliability - AMD looking at the market for big iron (running execution cores in lock-step is the kind of hardware reliability you are looking at on mainframe systems).

      The behaviour of a CPU core should be completely deterministic. If the two cores are booted up on the same cycle they should make the same set of I/O requests at the same point, and so long as the system interface satisfies these requests identically and on the same cycle, the cores should have no reason not to remain in sync with each other until the point where they both put out the next, identical pair of I/O requests. If the cores ever get out of sync with each other, this indicates an error.

      Just speculation of course, but I seem to recall AMD looking into this having been rumoured previously.

      G.

      • by silverdirk (853406) on Wednesday April 19, 2006 @02:18AM (#15155187)
        As one reply stated, you can't know which is right unless you had 3 cores.

        But, with two cores, you could have a way to predict "branch" and "not branch" at every prediction spot. The core that gets it right sends the registers to the other core so they can continue as if every branch were predicted correctly...

        That would only work if you had a nice fast way to copy registers across in a very small number of clock cycles... so again, just a bunch of speculation. But it was a neat enough idea I had to say it.

  • by totro2 (758083)
    As a systems admin in a large datacenter with many AIX, Solaris, HPUX, Redhat, and Suse boxes, I'm glad to see a vendor who wants to simplify management of systems (one processor is easier to manage than two). This is to say nothing of all the developer effort that would be saved by not needing to make code SMP-safe. I want large, enterprise-level boxes to be just as easy to administer/use as the cheapest desktop in their line. The OS should see as-simple-as-possible hardware. You wouldn't b
    • That makes no sense at all. So you want all boxes to act as uniprocessors... and then what happens when you want to run multiple tasks at once? You do realize sometimes you just want things to run in parallel, don't you?

      I guess by your response I highly doubt you admin systems in a large datacenter, because it makes absolutely no sense. I don't know any admin that would only want to have one processor, logical or not, in a large server. There's WAYYYY too many things that need to go on at the same tim
    • How is one processor easier to manage than two? The OS takes care of it for you. All you have to do is make sure the load is appropriate and balanced. But you have to do that anyway... The problem with the OS seeing "as-simple-as-possible" hardware is that it can't take advantage of any of the features that you get with high-end hardware. You can't get good diagnostics. And it is difficult to tune for a particular task. What if the algorithm that AMD uses to parallelize single threads isn't very good for you
  • by suv4x4 (956391) on Tuesday April 18, 2006 @07:02PM (#15153359)
    "AMD is claimed to believe it may be able to double the single-chip performance with a two-core chip or provide quadruple the performance with a quad-core processor."

    Even the article writers aren't quite sure that's possible to do; apparently it's possible to "claim" it though, what isn't? :)

    Modern processors, including the Core Duo, rely on a complex "infrastructure" that allows them to execute instructions out of order, if certain requirements are met, or to execute several "simple" instructions at once. This is completely transparent to the code that is being executed.

    Apparently for this to be possible the instructions must not produce results dependent on each other, meaning you can't execute out of order, or at once, instructions that modify the same register, for example.

    This is an area where multiple cores could join forces and compute results for one single programming thread as the article suggests.

    But you can hardly get twice the performance from two cores out of that.
  • There are several techniques for increased performance or throughput that the designers of next gen microarchitectures are likely looking at.

    There are extensions to known techniques;

    A: more execution units, deeper reorder buffers, etc., trying to extract more Instruction Level Parallelism (ILP).

    B: More cores = more threads

    C: hyperthreading -- fill in pipeline bubbles in an OoO superscalar architecture; also = more threads

    I personally don't think any of these carry you very far...

    Then there are some new ideas:

    a: run-ahead threads -- use another core/hyperthread to perform only the work needed to discover what memory accesses are going to be performed and preload them into the cache - mainly a memory latency hiding technique, but that's not a bad thing as there are many codes that are dominated by memory latency

    a': More aggressive OoO run-ahead where other latencies are hidden

    Intel has published some good papers on these techniques, but according to those papers these techniques help in-order (read Itanic) cores much more than OoO.

    b: aggressive peephole optimization (possibly other simple optimizations usually performed by compilers) done on a large trace cache. Macro/micro-op fusion is a very simple and limited start at this sort of thing. (Don't know if this is a good idea or not, or whether anyone is doing it)

    But it's far from clear what AMD is doing. Whatever it is, anything that improves single threaded performance will be very welcome. Threading is hard (hard to design, implement, debug, maintain, and hard to QA). And not all code bases or algorithms are amenable to it.

    Intel's next gen (Nehalem) is likely going to do some OoO look-ahead, as they have Andy Glew working on it, and that's been an area of interest to him...

    A very interesting new concept is that of "strands" (AKA: dependency chains, traces, or sub-threads). (The idea is, instead of scheduling independent instructions, schedule independent dependency chains. For more info, see http://www.cse.ucsd.edu/users/calder/papers/IPDPS-05-DCP.pdf [ucsd.edu])
    It's not clear how well it would apply to OoO architectures, but I would expect that likely approaches would also need large trace caches.

    Applying this to an OoO x86 architecture, and detecting the critical strand dynamically in that processor could be very cool, and potentially revolutionary.
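    To make the strand idea concrete, here's a toy software rendering of it (my own sketch, not from the paper): the two dependency chains below never read each other's values, so a scheduler could issue them to separate execution resources and only synchronize at the final join.

```python
from concurrent.futures import ThreadPoolExecutor

def chain_a(xs):
    # strand 1: a serial dependence threaded through acc
    acc = 0
    for x in xs:
        acc = acc * 2 + x
    return acc

def chain_b(xs):
    # strand 2: an independent serial dependence; shares nothing with strand 1
    acc = 1
    for x in xs:
        acc = (acc + x) % 97
    return acc

data = list(range(20))
with ThreadPoolExecutor(max_workers=2) as pool:
    fa = pool.submit(chain_a, data)       # issue strand 1
    fb = pool.submit(chain_b, data)       # issue strand 2 concurrently
    combined = fa.result() + fb.result()  # the only cross-strand dependence
print(combined)
```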

    It will be very interesting to see what Intel and AMD are up to -- it would be even cooler if they both find different ways to make things go faster...
  • by Joebert (946227) on Tuesday April 18, 2006 @07:10PM (#15153407) Homepage
    Is Microsoft going to recognise this contraption as a single, or multi-licensable, processor?

    And

    Will AMD only hide the fact that there are multiple cores from operating systems other than Microsoft's?
    • Is Microsoft going to recognise this contraption as a single, or multi-licensable, processor, and will AMD only hide the fact that there are multiple cores from operating systems other than Microsoft's?

      You're barking up the wrong tree here. MS has already addressed this in favor of their customers, and licenses on a per-socket rather than a per-core basis. One core, two cores, four cores, doesn't matter--one processor.

    • Will AMD only hide the fact that there are multiple cores from operating systems other than Microsoft's?

      I'm going to guess that AMD will hide the multi-cores from everyone.

      The idea is that AMD will have the CPU do all the fancy (de)threading stuff on the chip. The entire point is to increase performance for non-optimized applications.

      If you're going to be using programs optimized for dual CPUs/cores, then there really isn't a point in buying a chip with AMD's technology on it, unless AMD plans to stop selling 'normal'

  • The bus between the two cores is FAR TOO SLOW for this sort of operation. Moving [say] EAX from core 0 to core 1 would take hundreds of cycles.

    So if the theory is to take the three ALU pipes from core 1 and pretend they're part of core 0... it wouldn't work efficiently. Also what instruction set would this run? I mean how do we address registers on the second core?

    AMD would get more bang for buck by doing other improvements such as adding more FPU pipes, adding a 2nd multiplier to the integer side, incre
    • Re:bullshit (Score:4, Informative)

      by tomstdenis (446163) <tomstdenisNO@SPAMgmail.com> on Tuesday April 18, 2006 @07:20PM (#15153455) Homepage
      For those not in the know... reading a register from core 1 and loading it in core 0 would work like this

      1. core 1 issues a store to memory [dozens if not hundreds of cycles]
      2. core 0 issues a read; the XBAR realises it owns the address and the SRQ picks up the read
      3. core 0 has now read a register from core 1

      It would be so horribly slow that accessing the L1 data cache as a place to spill would be faster.

      The IPC of most applications is less than three and often around one. So more ALU pipes is not what K8 needs. It needs more access to the L1 data cache. Currently it can handle two 64-bit reads or one 64-bit store per cycle. It takes three cycles from issue to fetched.

      Most stalls are because of [in order of frequency]

      1. Cache hit latency
      2. Cache miss latency
      3. Decoder stalls (e.g. unaligned reads, or instructions which spill over a 16-byte boundary)
      4. Vectorpath instruction decoding
      5. Branch misprediction

      AMD making the L1 cache 2 cycles instead of 3 would immediately yield a nice bonus in performance. Unfortunately it's probably not feasible with the current LSU. That is, you could get up to 33% faster in L1-intense code with that change.

      But compared to "pairing" a core, die space is better used improving the LSU, adding more pipes to the FPU, etc.

      Tom
  • by Mifflesticks (473216) on Tuesday April 18, 2006 @07:15PM (#15153427)
    There are various projects that take differing views about how to do this. One class of such processors are "run-ahead" microprocessors. The idea here is to allow a second processor, running up to a few thousand instructions "ahead" of the processor executing the real code to be retired, to execute instructions whose results may be invalid and are never retired.

    There are several variations of this. One is to use the second core to run in advance of the first thread, the run-ahead stream effectively acting as a dynamic, instruction-driven prefetcher. One such effort is "slipstreaming" processors, which work by using the advanced stream to "warm up" caches and dynamically remove unnecessary instructions, while the rear stream makes sure the results are accurate. Prior, similar research has been done to perform the same work using various forms of multithreading (like HT/SMT, and even coarse-grained multithreading). See www.cs.ucf.edu/~zhou/dce_pact05.pdf for more details.

    Others, such as Dynamic Multithreading techniques, take single-threaded code and use hardware to generate other threads from a single instruction stream. Akkary (at Intel) and Andy Glew (previously Intel, then AMD, then...?) have proposed these ideas, as have others. Some call it "Implicit Multithreading".

    Now, the Register article is so wimpy (as usual) that there's no actual information about what technologies are used, but maybe it's a variation on one of the above.
  • Not True! (Score:5, Funny)

    by Gorimek (61128) on Tuesday April 18, 2006 @07:21PM (#15153458) Homepage
    We have always been at war with hyperthreading!
  • I write a fair shitload of multithreaded and single-threaded code. Most code cannot be magically parallelized. Parallel execution of code that has not been made thread-safe would cause teeming masses of race conditions. Null pointers everywhere. Division by zero would be the norm, not an exception.

    Now, if they're talking about allowing separate processes to run separately without specific SMP code in the kernel, fine. But that's not 2x performance.
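    A minimal illustration of the hazard the parent describes (a toy example; the names are mine): two unsynchronized read-modify-write sequences on the same variable can interleave and silently lose updates, which is exactly what any "magic" parallelizing hardware would have to detect and avoid.

```python
import threading

N_THREADS = 4
ITERS = 100_000

counter = 0
lock = threading.Lock()

def unsafe_worker():
    # unsynchronized read-modify-write: two threads can read the same
    # old value, and one of the increments is silently lost
    global counter
    for _ in range(ITERS):
        counter += 1

def safe_worker():
    # the synchronization the hardware would somehow have to infer
    global counter
    for _ in range(ITERS):
        with lock:
            counter += 1

def run(worker):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(safe_worker))    # always 400000
print(run(unsafe_worker))  # anything up to 400000; updates can be lost
```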
  • by DrDitto (962751) on Tuesday April 18, 2006 @07:45PM (#15153614)
    This was proposed in academia over 10 years ago. It's called speculative multithreading, or "multiscalar" as coined by one of the primary inventors at the University of Wisconsin (Guri Sohi).

    Basically the processor will try to split a program into multiple threads of execution, but make it appear as a single thread. For example, when calling a function, execute that function on a different thread and automatically shuttle dependent data back/forth between the callee and the caller.
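    A rough software analogue of that call-speculation idea (a sketch only; a thread pool stands in for the second core, and this is nothing like how the hardware would actually shuttle data): the callee runs elsewhere while the caller keeps executing work that doesn't depend on its result, and the two meet at the true dependence.

```python
from concurrent.futures import ThreadPoolExecutor

def expensive_callee(n):
    # the function call that gets "forked" to the other core
    return sum(i * i for i in range(n))

def caller():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(expensive_callee, 10_000)  # speculative fork
        independent = sum(range(1000))                  # caller-side work overlaps
        return independent + future.result()            # join at the true dependence

print(caller())
```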
    • Or, potentially more simply, speculatively execute both branches and only commit the changes from cache (and kill the mispredicted thread) when the branch has been resolved. Things become easier to safely multi-thread when mutable state isn't shared. *cough* FP *cough* ;)
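      In toy software form (illustrative only; the function names are made up), that looks something like this: both sides of the branch run on private state, and only the side selected by the resolved condition is committed while the loser's work is discarded.

```python
from concurrent.futures import ThreadPoolExecutor

def taken_path(x):
    # each path works on its own data and shares no mutable state
    return sum(range(x))

def not_taken_path(x):
    return x * x

def eager_branch(cond_input, x):
    with ThreadPoolExecutor(max_workers=2) as pool:
        t = pool.submit(taken_path, x)      # speculatively run both sides
        n = pool.submit(not_taken_path, x)
        cond = cond_input % 2 == 0          # branch resolves while both run
        # commit one result; the mispredicted side's work is discarded
        return t.result() if cond else n.result()

print(eager_branch(4, 100))   # condition true -> sum(range(100)) = 4950
```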
    • The "problem" with Multiscalar is that it requires compiler support to partition the program (AFAIK). It's not really a problem, I suppose, because Multiscalar is an academic project. But for millions of existing codes out there, a compiler-driven TLS system isn't going to buy you anything in terms of single-thread performance.

      There are other academic projects that are attempting to do TLS dynamically, in hardware. PolyFlow at Illinois is one; Dynamic Multithreading (mentioned elsewhere in this story) is another.

  • "...two-core chip or provide quadruple the performance with a quad-core processor." unify, unite, and unihilate....beware the QUAD LAY-ZAH!
  • by Mia'cova (691309) on Tuesday April 18, 2006 @07:57PM (#15153700)
    It might be interesting if they took this idea in a slightly different direction. Set it up so the OS detects two CPUs. But, when the OS fails to utilize both CPUs effectively, allow the idle CPU to take some of the active CPU's load. I'm taking this idea from nVidia working on load balancing between graphics and physics in a SLI setup. So in this case the OS gets the best of both worlds, the ability to break tasks off to each CPU and a free boost when it's stuck with a single cpu-limited thread.
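    A toy software analogue of that load-shedding idea (purely illustrative, nothing to do with how nVidia or AMD would implement it): both "cores" pull from one shared work queue, so when only a single stream of tasks arrives, the otherwise-idle worker still absorbs part of the load.

```python
import queue
import threading

tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def core(core_id):
    # each "core" drains the shared queue until no work remains
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        with results_lock:
            results.append((core_id, item * item))

# one stream of tasks, as if the OS fed only a single CPU
for i in range(100):
    tasks.put(i)

workers = [threading.Thread(target=core, args=(c,)) for c in (0, 1)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(len(results))   # all 100 tasks completed, regardless of the split
```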
  • by Marc_Hawke (130338) on Tuesday April 18, 2006 @07:57PM (#15153705)
    Striping: what is that? RAID 1? RAID 0? You take multiple disks, present them as one, and let the controller make the most efficient use of them while the OS and all the programs just have to deal with one big disk.

    Looks like the same thing. You take multiple CPU's present them as one, and let the controller figure out how to best use them.

    This could make for hot-swappable CPUs (heh) and the ability to have a CPU die without taking out your system. The redundancy aspect of the other RAID configurations doesn't seem to translate very easily, but the 'encapsulation' concept seems to fit nicely.
    • Actually, linux has supported hot-swappable CPUs for years. You generally need special hardware for it, but it's been out there for a while.
    • The problem is that CPUs are very independent once instructions get into the decoder window. The only way to stop it is to raise an exception or interrupt (e.g. APIC signal).

      So just because you may have 4 cores in your box [say dual-core 2P] doesn't mean all of the cores can act as one logically to the OS in a meaningful and efficient manner.

      The striping analogy would be to dispatch instructions in round-robin fashion to all the processors. The problem with that is that the architectural state has to be s
  • FTA: It's the very antithesis of the push for greater levels of parallelism

    There is only one way to achieve optimum performance using multiple cores (or multiple processors) and that is to adopt a non-algorithmic, signal-based, synchronous software model. In this reactive model [rebelscience.org] , there are no threads at all, or rather, every instruction is its own thread or processor object. It waits for a signal to do something, performs its operation and then sends a signal to one or more objects. There can be as many ope
    • The only caveat is that it would easily kill performance, and that your reliability statements are bogus. Wired hardware isn't reliable because you handle signals. It's relatively reliable because mistakes are expensive. Implement something like a chess AI or a nice user interface without algorithmic parts and without bugs. THAT would impress me.
  • Is this more or less like a beowulf cluster on a chip?

    No, seriously, I'm having trouble envisioning it.
  • I don't have a lot of background in CPU architecture, but what if there was a parallel processing unit designed specifically to allocate threads to the CPUs? This way, the cores can all function as one at the hardware level, rather than the software level (thus making it easier on developers and potentially increasing performance). Would it be better to have a dedicated unit/sector to process this information and divvy it up to the separate cores, or no?
