Ars Technica on Hyperthreading 235
radiokills writes "Ars Technica has a highly-informative technical paper up on Hyper-Threading. It's a technical overview of how simultaneous multithreading works, and what problems it will introduce. It also explains why comparing the technology to SMP is Apples to Oranges, in a sense. Starting with the 3 GHz Pentium 4, this tech will be standard in Intel's desktop lines (it's already in the Xeon), so this is important stuff."
AMD (Score:1)
Re:AMD (Score:1)
That's odd, right about now I was thinking perhaps the limit should have been a bit higher to allow people to proofread posts a bit more.
Re:AMD (Score:3, Insightful)
As Barton and MP were mentioned, I did think to ask what [Richard Heye, AMD Vice President of Platform Engineering and Infrastructure and the Computation Products Group] thought about the threat of Intel's Hyper-Threading. While I see Hyper-Threading as possibly becoming a very useful add-on for the Intel CPU, I can assure you that Richard Heye does not. In fact, the subject of Hyper-Threading seemed to excite him. Mr. Heye explained that he had been reading papers on the subject for years and that for Intel to bring Hyper-Threading to market successfully, they (Intel) were going to have to throw many more dollars at the marketing side than the development side of the issue.
Who Cares? It's got DRM! (Score:2)
I'm pinning my hopes on Apple and maybe even China's new Dragon chip for my future computing needs.
No (Score:1, Offtopic)
Re:No (Score:2)
Dear sir, (Score:5, Funny)
Sincerely,
Intel
Might not speed up benchmarks... (Score:4, Informative)
SMP performance (Score:2, Informative)
Re:SMP performance (Score:2, Interesting)
Re:SMP performance (Score:2, Interesting)
Re:SMP performance (Score:2)
On the other hand, this might finally be an argument in support of threads versus processes, since the former avoids the cache issues.
Re:Might not speed up benchmarks... (Score:2)
Windows seems to be pretty well multi-threaded, at least from a UI point of view. I cannot help but feel that hyperthreading will likely have a similar result. If it does, it means Windows will behave better for the end user.
Users often weigh performance on how fast a window pops up, not so much on how many calculations can be performed in a second.
Re:Might not speed up benchmarks... (Score:2, Interesting)
SMP does a very good job of "hiding" all the processes the OS runs from a desktop user; you'll never experience slowdowns when the OS or another app writes to or reads from disk (unless it's out of memory and has to use the swap file...). On older systems this could be a significant problem. Playing games from CD-ROM was often impossible, as the CD-ROM drive used 40%-60% CPU when reading/seeking. With SMP you had another CPU to do your stuff, while the OS did its stuff on the other (not strictly true, of course, but close).
An SMT PC won't necessarily benefit the same way as an SMP one when running such unrelated processes simultaneously, especially cache-intensive processes (cache is a shared, limited resource).
I think SMT will benefit processor-intensive programs like simulations and (multithreaded) games.
If some way of restricting each process's or thread's use of cache isn't implemented, realtime scheduling on these processors will be all but impossible (it's rather hairy on SMP as well).
- Ost
Re:Might not speed up benchmarks... (Score:2)
Might I suggest the use of an IDE controller that supports DMA... otherwise known as pretty much all of them since 1994...
Re:Might not speed up benchmarks... (Score:2)
This really indicates poorly designed hardware missing interrupts and/or DMA. Surely SMP will help here, but an extra CPU is a high price to pay to compensate for the few bucks saved by using poorly designed components in the rest of the system.
I think HT will also help. As long as the busy CPU is busy-waiting, a clever driver/OS designer could even make use of the pause instruction to reduce that virtual CPU's resource usage and thus speed up the other virtual CPU. This means that on HT the resources wasted on busy-waiting can be less than on SMP.
Re:Might not speed up benchmarks... (Score:4, Interesting)
Do you know why they are afraid? In my view, threads re-introduce the problem where you have a bunch of processes that can freely share any memory at will, use any means of communication, and are a pain in the Ass with a capital A to debug/trace properly (without using internal debuggers). Try debugging a single process with dozens of different threads (i.e. threads with different entry points), where each thread has another dozen instances of itself. Now try using traditional debugging tools like strace or gprof (for tracing), or gdb.
In traditional multi-process environments, multiple processes are forced to communicate using well-designed message passing interfaces (pipes, unix domain & net sockets, FIFOs, message queues, shared memory). Sure, you can use shared memory, but it's done in a more restricted way (you share a buffer) so that it's not abused. Badly written threads, in my experience, use global variables and literally hundreds of flags (I'm not joking) for communicating what to do, what the state is, etc. Debugging processes is easier IMO, because every process can dump its core, and you can pause a process in action and see exactly what it's currently doing (tracing).
I want to ramble more, but I'm tired. Anyone have more input on threads vs. processes?
Re:Might not speed up benchmarks... (Score:2)
Re:Might not speed up benchmarks... (Score:2)
"It's already in the Xeon" (Score:4, Insightful)
Yes, but since no one has the supersentient compiler and assembler that HT requires, very few programs are able to really take advantage of it.
I dig innovation. I dig more impressive chips. But it's getting to the point where boxes with top of the line CPUs are like those old VWs with Porsche engines in them: there comes a point when improving one part doesn't really matter any more.
Re:"It's already in the Xeon" (Score:4, Insightful)
Re:"It's already in the Xeon" (Score:4, Interesting)
There's a good reason for this. The biggest problem with debugging multithreaded code is preventing the threads from shooting each other in the foot. On unix-like systems, there's a simple, elegant solution to this: processes. If you use independent processes with shared memory, you can limit the foot-shooting problems to only the shared segments, and the rest of the code is safe. You also have several kinds of inter-process communication that are easy to program and fairly failsafe.
On Windows, you mostly don't have these things. Developers don't take much advantage of multiprogramming, because the inter-process communication tools are so complex. So the model is a single huge program that does everything. The natural development is toward an emacs-like system, in which everything is a module in one huge program. In such a model it makes sense to use threads, so that some tasks can proceed while others are blocked.
One way to get unix/linux developers to adopt threads is to make it more difficult to use the basic unix multi-processing and IPC tools. If those can be made more complex than threads, then people will adopt the Windows model.
Alternatively, the threads library could be made as easy to use as the older unix approach. But so far, there's little sign of this happening.
Threads are a debugging nightmare, and a programmer who has lost months trying to debug a threadized program, and finding that the end result runs even slower than the original, is going to be shy to do it again.
Also, calling the developers dummies isn't very persuasive. They mostly hear such insults as a euphemism for "It's too complicated for your simple mind." When I hear things like that as answers to my questions, I tend to agree with my critic, and revert to things that I can understand and get to work right.
Re:"It's already in the Xeon" (Score:5, Informative)
Yes, you have to use mutexes and other synchronization primitives to serialize (or at least de-conflict) accesses to shared data. But, there's nothing that requires you to share data between threads. In fact, a significant percentage of the data in the average multi-threaded program is not shared. No matter whether you are building an application using multiple threads or multiple processes, you still have the freedom to use whatever mix of data sharing and message passing is appropriate for your application.
Data shared by multiple processes needs exactly the same kind of protection as data shared by multiple threads. Except that using shared memory segments requires a lot of extra bookkeeping, and the segments aren't cleaned up if a program terminates abnormally. And obviously, no matter whether you are using multiple threads or processes, the foot shooting is limited to the shared data only.
You can communicate between threads (or even between the same thread or process and itself) using named pipes if you want. Same goes for sockets. Using a multi-process model instead of a multi-threaded model doesn't give you access to any additional mechanisms. In fact, it's much easier to build useful communications mechanisms if you're working with threads.
In Windows, you have basically the same tools. You may not know this, but the process & thread model in Windows is virtually the same as in most modern UNIX systems. The fact that old UNIX command line tools are small and oriented around using pipes for IPC is mainly a byproduct of history & convention, if that's what you're thinking of.
I would say that building applications with multiple threads is already easier than building applications with multiple processes. That has been my experience anyway.
On the contrary, debugging apps that consist of multiple processes is a nightmare. Debugging multi-threaded programs is much easier. For one thing, how many debuggers let you attach to & debug more than one process at a time in the same set of debugger windows (or at all)? Further, when you're debugging a program with multiple processes, if you signal or interrupt one process the others continue on (and vice-versa when you continue). This is rarely what you want. In general, the differences boil down to the fact that the OS & debugger coordinate & manage the execution of multiple threads within one application, while you have to do it manually if you have an application built with multiple processes. That means less work for the developer in terms of lines of code, less work in debugging, etc.
The problem isn't so much that old school UNIX programmers are dumb. Mostly, they're either afraid of change or just too damn arrogant & obstinate to bother learning new technologies.
Re:"It's already in the Xeon" (Score:2)
Now, why on earth would you do that? The reason for putting things in separate processes is to have them separated. There is no reason to treat two entirely separate things as one. How many word processors do you see that allows you to edit two documents in the same window/pane?
Re:"It's already in the Xeon" (Score:2)
No, and that isn't what I said. I would admit that I really didn't say much, but if I had said something, I would have said that in the majority of cases, processes are more manageable and easier to debug than threads.
The number one reason to use multiple threads (or processes) is to isolate operations that block from the parts of the program that need to remain interactive.
Which in many cases can also be done with separate processes, select() or poll().
For example, the first web browsers were single threaded and would block waiting for a DNS query to complete.
Something that can easily be done as a separate process. Since the browser would have to listen on multiple file descriptors anyway, in order to listen to both network and GUI events, this would have minimal overhead on the design of the web browser. And the increased modularity would help you in both designing and debugging both applications.
Another reason to use more than one thread is if a part of your program has to sit on a socket waiting for packets or connection requests - you couldn't write a useful web server without multiple threads or processes.
There exist several useful single-threaded web servers that use select() or poll() instead of threads. And there exist many that use processes instead of threads.
Finally, a lot of applications involve calculations that take a long time and it's usually desirable to let them run in the background rather than freezing up the whole UI while the calculation completes.
And most of those background calculations could just as well be put into a stupid background process, leaving you with a simple, dumb GUI to control them, and simple-to-write, easily replaceable background processes that can be developed and debugged independently.
I could go on, but I hope you get the point by now.
And so could I, but I doubt you get the point anyway.
Re:"It's already in the Xeon" (Score:2)
Nah.
Programming in threads is fine, so long as you have some ability to encapsulate your memory reliably. Doing multithreaded programming in Java is the default, and Java's strong object encapsulation and memory protection makes it quite reasonable to program with threads.
You absolutely have to be aware of concurrency issues: deadlock, livelock, and all that. But it's not a terribly bad burden, given how much you gain in simplicity of memory management, integrated exceptions on null pointers, etc., etc.
Re:"It's already in the Xeon" (Score:2)
Well there are a lot of very nice wrappers out there (e.g. boost::thread, QThread and so on), plus most serious applications have their own wrappers. This is the same reason, incidentally, why developers generally don't use open(), read(), write() and creat(): that's not the level you're meant to program at. You should use stdio, sfio or iostreams instead, unless you really need the finer control.
One real reason, I think, why we don't see more threads under Linux is that Linux doesn't support POSIX threads. The POSIX threads model is processes which have threads. Linux, on the other hand, has processes which can share address spaces, file descriptors and so on, which is not the same thing. For example, fork()ing a process in one thread and waitpid()ing on it in another thread simply doesn't work under Linux. This sort of thing makes porting POSIX-compliant multithreaded applications to Linux difficult at best.
Note: Before anyone accuses me of FUDing, note that I'm not passing value judgements here. Linux threads might well be better than POSIX threads. As a developer, however, when I #include <pthread.h>, I expect POSIX threads, and under Linux I don't get that. It's the lie that concerns me.
Re:"It's already in the Xeon" (Score:3, Informative)
Wrong. Linux threads are compliant with POSIX 1003.1c (and most of the common extensions). There is one exception, albeit a minor one: you can signal individual threads in Linux. The POSIX standard specifies nothing about how threads are to be mapped to processes.
In Linux, the mapping between processes and threads is strictly one-to-one at the kernel level, although the use of thread groups makes it effectively one-to-many at the user process level. Other operating systems such as Solaris offer a many-to-many mapping with kernel lightweight processes (LWPs), but it's again one-to-many at the user process level. Both implementations are about equally close to being POSIX compliant (Solaris threads aren't POSIX compliant because they don't support cancellation).
Not true again. In Linux 2.4, a parent process will wait on any child in the same thread group by default, unless you block SIGCHLD. In previous versions, it wasn't the default, but you could still do it. Besides, this doesn't have much to do with POSIX threads, because fork() and waitpid() aren't part of the pthreads API. fork() and waitpid() are process management functions. To create a new thread in POSIX, you use pthread_create() and to wait for one to exit you use pthread_join().
Perhaps not, but you are passing bad info.
Re:"It's already in the Xeon" (Score:2)
And, as the AC above already said, the POSIX thread standard leaves the choice of 1-1 or m-n up to the implementor. Which is logical, since it doesn't change the semantics of the program using pthreads.
Re:"It's already in the Xeon" (Score:2, Informative)
There are a LOT of good reasons to use this sort of multi-threading, especially since - if correctly implemented - it requires much less memory, cpu and debugging efforts than processes or the old sort of threading model.
Re:"It's already in the Xeon" (Score:3, Interesting)
Re:"It's already in the Xeon" (Score:2)
In general, I think that people learning to write multithreaded code is important. If a program has several disjoint tasks that can run in parallel, then multithreaded code can run faster. However, I disagree when people complain that it's a matter of lazy programmers, and that if they made everything multithreaded the world would be a better place. It's not so simple.
-J
Re:"It's already in the Xeon" (Score:2)
Re:"It's already in the Xeon" (Score:5, Insightful)
A correct multithreaded program is HARD!!!!! Anybody who thinks otherwise is an idiot. I have seen the results. All the systems I have seen are either broken or have so many locks in them that they may as well be single-threaded. Most Windows programmers use multithreading so that they can keep more state in local variables, which may be an ok goal but has nothing to do with speed. One of the biggest, buggiest programs here is a multi-threaded monstrosity written by a Windows programmer where there are 50 threads, ALL WAITING ON THE SAME SOCKET, and it crashes sporadically in the rare cases when two threads actually become alive at the same time. Every single rewrite to reduce the number of threads has greatly improved performance and reliability.
I have no idea why you think the GUI should be multi-threaded. A GUI has no reason to be fast; computers are MUCH faster than humans, at least at drawing junk on the screen. In fact the best way to do it is pseudo-multithreading, such as the method Windows uses (gasp! Fact alert: it is NOT multithreaded; only one "DispatchMessage" is running at a time!).
I think perhaps you mean that the GUI should be running in a parallel thread with the calculations, and there you have a point. However, a lot of the problems are solved by deferred redraw, which the X toolkits do quite well (and in fact Windows is broken because it produces WM_PAINT events without knowing if the program has more processing to do).
Now if there are intense calculations, I grant that parallel threads are necessary, and I am working on such a program, but I must warn you that it is extremely difficult: the GUI cannot modify ANY structure being used by the parallel thread; instead it must kill the threads, wait for them to stop, modify the structure, and start them again. If in fact nothing changed, you need to restart so the partially-completed answer from last time can be reused; this means you must write all the code you would for a single-threaded application, so it does NOT save you anything. If you restart the complete parallel calculation, you will get an unresponsive program if that calculation takes more than a second or so. You could instead do a fancy test to see if your modifications will change the data before you kill the threads and commit them, but this often requires you to calculate the modifications twice, and the overhead of this may well kill the advantage of the parallel thread; at least in my example this was far worse than reusing all the single-threaded restart code.
Re:"It's already in the Xeon" (Score:2, Insightful)
Yes, it does: If you multithread it, you can e.g. show debug output, update controls and enable the user to still use the GUI. In many unix-apps your gui sort of freezes while the processes/threads are running in the background (doxygen had/still has this problem, if I remember correctly).
the GUI cannot modify ANY structure being used by the parallel thread, instead it must kill the threads, wait for them to stop, modify the structure, and start them again
This is not correct. It only happens when you don't know how to correctly implement a threading model, e.g. if you use journal-based threads instead of log-based you won't have any of those problems whatsoever. For example, the folks using Scheme48 [s48.org] implemented this, and it made a lot of their problems just vanish.
Re:"It's already in the Xeon" (Score:2)
I have no idea what you mean by "journal" or "log" based and would like an explanation.
I am working from the assumption that threads (as opposed to processes) are taking advantage of shared memory. It seems to me that if I have a big calculation that depends on some data structure that the GUI can modify, I have to stop the calculation before I can modify the structure, and I have to inform the parallel thread that the structure has changed. (In my case I decided to tell it to restart the calculation from the start, which I was calling "killing the thread"; in fact I really set a flag that makes the threads throw an exception, and they then go to code that waits for the GUI to unlock them so they can start the calculation over again. I am not really creating & destroying threads.)
Notice that in my case the threads are doing a calculation that can take several minutes, although they produce some results immediately (a portion of an image), so the user can tell if they want to twiddle the controls some more.
It is possible that I am confused because you are considering calculations that are done in a fraction of a second, such as parallel updating of a complex OpenGL scene. In such cases it may make sense for the GUI to wait for the previous calculation to finish before updating the structures. This should produce good results as long as it is trivial to exclude the majority of GUI events as not modifying the data structures.
Another possibility is that you are assuming it is inexpensive to build a new data structure while keeping the old one in memory, destroying it only after the new one is built and the threads running on the old one have either exited or been killed. This may be true, but it is certainly not in my case, where a huge savings comes from reusing cached information in the previous data structure.
I don't know what "journal" vs. "log" are, but they both sound like communication pipes. Unix has had pipes since 1972, so I don't think they are a recent innovation; in fact all possible uses for them have long been explored. Basically, if you are avoiding use of shared memory then you are not using threads.
Re:"It's already in the Xeon" (Score:2)
This is what I meant by "storing state in local variables". This may be a useful goal but I think it requires compiler/language support and certainly there is no reason to use anything other than lightweight threads for this, it can be totally cooperative. In my experience with this (I have tried to write systems that worked this way by using setjmp hacks) there are serious problems with the creation and destruction or visibility changes of widgets, and focus navigation cannot be handled locally by widgets. It also tends not to work well for minimal update, in fact many of the Unix programs you complain about have this exact problem in that they were written so one "important" widget (the main display) was managed using local state and incremental update. So in the end I am unconvinced that this is a good idea. However it could be a very good idea with language support.
By "building new data structure" I mean modifying/creating whatever it is that the parallel calculation thread is reading. You cannot change it without a lock. Obviously you can change the parts the other thread does not care about, but by definition they have nothing to do with the calculation!
I would guess that my implementation is pretty much a "journal". The GUI thread updates structures that are not used by the calculation thread. When it thinks something has changed, it kills the calculation threads (there are 4 or more of them), waits for them to go idle, and then copies all the changes from the GUI structure to the actual structure, at that time comparing them to see if there really was a change. This is not "really" a journal because the actual changes are not recorded, but I do make a list of what objects have been modified so it does not have to search them all. If there really was a change, it also instructs the calculation to destroy cached data so it starts over from the beginning. I can't believe my situation is unusual: the data is enormous, the calculation takes several minutes, and it is extremely difficult to be certain whether the user's modifications have actually changed anything without looking at the current calculation results.
In any case I cannot see any way a journal can avoid locks. Even if the parallel threads read from the journal you have to lock the journal while this is happening.
Re:"It's already in the Xeon" (Score:2)
This can reduce the work, provided that the time the code needs to retrieve or insert a message is tiny compared to the time needed to make the changes. It also requires that the code needed to turn a GUI action into a "message", and to turn a "message" into modifications of the data structure, is about the same order of magnitude as the code needed to go from the GUI to the data structure directly. Not counting this overhead has been a problem with many message passing systems: in effect the GUI thread is "blocked" for the time it takes to calculate and insert a message, and the calculation thread is "blocked" for the time it takes to interpret a message, and even though they are not synchronized, this overhead can add up to more time.
In fact I think this overhead, and the difficulty of programming message-passing, is why there was such a push for multithreaded applications. I think now we are seeing the backlash as people realize that multithreaded is not the end-all solution they thought it was and are trying to find the correct middle ground between parallel processes and mt.
The only way I see to reduce locks is to "batch" journal entries into a block, and allow them to be inserted and retrieved as blocks. Oddly enough, the more work done on this, the more it looks like Unix pipes and stream i/o.
Lock Granularity (Score:3, Interesting)
To scale well you want to lock data rather than code and that can lead to many locks when you are operating on many structures. Ideally these locks each have less contention and better data sharing than "bigger" locks.
Re:Lock Granularity (Score:3, Interesting)
for (;;) {
    lock(big_lock_shared_by_everybody);
    figure_out_what_to_do();
    lock(small_lock_around_my_work);
    do_about_95%_of_the_work();
    unlock(big_lock_shared_by_everybody);
    do_about_5%_of_the_work();
    unlock(small_lock_around_my_work);
    do_a_bit_more_that_should_be_locked_anyway();
}
Re:"It's already in the Xeon" (Score:2)
But this program convinced me that there is a mindset that says "more threads is better", and that this mindset is wrong. It is producing bloated, slow libraries with locks around every single function (like putc() in the pthreads standard, thus killing the speed of the basic K&R design!). It also seems to be used 90% of the time for programming convenience, so that state can be kept in local variables; that could be solved by cooperative multitasking inside the program with some library support, rather than by using real parallel multiprocessor threads.
Re:"It's already in the Xeon" (Score:2, Insightful)
I disagree. I will not be able to get you to see the thing from my side, but let me state a few things:
1/ I know computer science.
2/ I wrote multi-threaded apps. Hell, I even wrote an IP stack and a B-Tree-based transactional system. Those worked.
3/ I painfully debugged other people's multi-threaded race conditions that only occur twice a day on a heavily loaded server.
4/ I saw my co-workers writing bad multi-threaded code
5/ I maintained code I wrote for years, so I know the cost of the mistakes and complex features
6/ I now reject most designs that contain multiple threads.
> You can either do it, or you shouldn't be trying.
Most software is written by youngsters who did not realize that they should not be trying to write multi-threaded code. See the Swing source code for instance. It is a pathetic mess of code made by (mostly) clueless coders who thought they were smart.
I am not saying that I cannot write a multi-threaded app, or that you cannot write one. I was arguing that the original poster, the Be fan, was dead wrong when he said that multi-threaded apps were easy, that "Namely, developers have to make their programs multithreaded."
And that moderators that gave him that +5 never debugged production multi-threaded code.
Cheers,
--fred
Novell Netware 5 & 6 (Score:2)
Hyperthreading on Windows (Score:5, Informative)
www.microsoft.com/windows2000/docs/hyperthreading
Re:Hyperthreading on Windows (Score:5, Funny)
Re:Hyperthreading on Windows (Score:5, Informative)
However, it won't go on to use the extra 2nd logical CPU in each physical CPU because you've used up all your licences by then (2000 server only gives you a 4 CPU licence).
If your BIOS doesn't enumerate CPUs the way Intel says they should, then 2000 will use both logical CPUs on the 1st and 2nd physical CPUs, and presumably leave your other two physical CPUs idle.
Re:Hyperthreading on Windows (Score:2)
Remind me - where was the FUD again?
Re:Hyperthreading on Windows (Score:2, Informative)
When examining the processor count provided by the BIOS, Windows
[DIAGRAM 4]
This example illustrates the great benefit provided by Windows
Well that's unsurprisingly lame on Microsoft's part. Basically that document says "we're too lazy to update Windows 2000 to PROPERLY recognize SMT-enabled processors, and will screw you on licensing unless you upgrade to
Re:Hyperthreading on Windows (Score:1)
Well that's unsurprisingly lame on Microsoft's part.
and exactly the same on any older linux kernel - you can't support what you don't know - prior to EV8 or P4 most kernel hackers had never thought about "logical" processors. I'm pretty sure they could release a service pack to support hyperthreading on w2k, but they love to make money
Re:Hyperthreading on Windows (Score:2)
2) the lame part is that you have to pay to not have to pay for your logical processors. linux doesn't have these licensing issues.
Re:Hyperthreading on Windows (Score:2, Insightful)
2) the lame part is that you have to pay to not have to pay for your logical processors. linux doesn't have these licensing issues.
yep - it might be lame to force people to buy new licenses, but hey we're free to run other operating systems and I'm sure any real microsoft zealot upgrades to
Re:Hyperthreading on Windows (Score:2)
This reminds me of what Microsoft did with DirectX under NT4. I had a copy of DirectX 7 or DirectX 8 for the Windows 2000 beta 2 and it worked fine under NT4. I finally could play other games besides Quake III. Then I had a hard disk crash and had to re-install. Guess what? Microsoft updated the code to not install on NT4 by default, and they removed the old DirectX package! They did this to sell more copies of Windows 2000. Very sleazy.
It worked. I then paid $300 for Win2000 as more and more games used DirectX rather than OpenGL. My guess is that only a single line of code was used to force users to upgrade. The same is true of the code in Windows 3.1 to make sure only MS-DOS was used in conjunction with it. Add a single line of code here and there and watch consumers open their wallets.
Linux at least does not have this problem.
Re:Hyperthreading on Windows (Score:4, Informative)
From hyperthreading.doc "Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology"
Basically for 2000 family you need 2x your CPU-license limit; each virtual processor counts as a physical one.
So A
Re:Hyperthreading on Windows - user experience (Score:2, Informative)
The CPU is a single Intel Xeon 2.2 GHz.
Hyperthreading can be turned on or off in the BIOS of the machine. I turned it on before I installed Win2K.
The system was seen as a dual CPU machine from the time I installed it from the original CD, before I applied any service pack.
If I disable hyperthreading in the BIOS and boot Win2K, then I only see one CPU.
I have a second Xeon CPU on order for this machine as it is dual capable. Once I get it, it should make it look like a quad CPU in Win2K.
FYI, I am also running another OS on the system, Warp Server for E-business with the SMP kernel. Unfortunately the OS2APIC.PSD driver only detected one CPU even with hyperthreading enabled. I contacted the OS/2 kernel developer at IBM Austin, who told me that somehow there needed to be explicit support for it in OS/2 SMP for it to work.
I also left about 20 GB unpartitioned on my hard disk for Linux, but I haven't gotten around to installing it yet. Thread support in Linux has historically been poor, and this is the main reason why I haven't done so. Now that the NPTL library is available and on track to become the standard pthreads library for Linux, I'm looking forward to installing it.
Hyper-Threading, eh? (Score:3, Funny)
Hyperthreading versus SMT (Score:3, Interesting)
Re:Hyperthreading versus SMT (Score:2)
In the historical context, the name is perfectly fitting.
multithreading (Score:3, Funny)
yes, I realize this is anti-geek, so this processor would also allow you to take control of thread creation by flipping a register or something.
Re:multithreading (Score:1)
Re:multithreading (Score:1)
it's very difficult to do well (Score:5, Informative)
typo (Score:2)
Re:multithreading (Score:3, Informative)
Until that happens it's simply not possible for anything but the most trivial of tasks (which is already done by compilers and processors with multiple execution units).
Re:multithreading (Score:1, Informative)
ftp://ftp.cs.wisc.edu/sohi/papers/2002/mssp.mic
It's all about language support (Score:2)
There are languages (well, mostly modifications to existing languages) that allow one to create a program that will scale to any number of processors.
It's actually a very tough problem, because most coders think in terms of doing x, then y, then z. You really need to think in terms of: I need these things done, and they have these dependencies, but other than that, divide and conquer any way you want.
parallel programming languages on Google [google.com]
Re:It's all about language support (Score:2)
Programs tend to be a linear order rather than a partial order.
This can be a problem even with strictly sequential processing if the requirements keep changing.
Software Objects Should Be Concurrent as a Rule (Score:2)
There should be no such thing as a sequential or algorithmic task. Programs should be parallel to start with. The biggest problem in software engineering is the age-old practice of using the algorithm as the basis of programming. This is the primary reason that software is so unreliable and so hard to develop. Objects in the real world are concurrent. Why should our software objects be any different?
Re:multithreading (Score:3, Informative)
Beyond that, you really need to be able to look at the program as a whole in order to do anything that clever, so you're talking language, compiler, or library features, and you generally have to involve the programmer somewhat, although you don't necessarily have to do it as explicit threads. (E.g., there's a C variant with a keyword that says it's okay to evaluate all of the arguments to a function at the same time)
Re:multithreading (Score:2)
Suns MAJC tried to do that (Score:2)
Incidentally, that chip was also supposed to do SMT, single-chip SMP, and SIMD. Dunno how well it fared; I kind of forgot about the chip after its second schedule slip, and I haven't seen it mentioned much since then... it should have been out for at least a year now.
More fun for OS guys (Score:2)
Hyperthreading? What's next? (Score:5, Funny)
obligatory spaceballs reference
Oracle, W2K Enterprise (Score:2, Interesting)
Double your licensing cost for a 5% to 30% performance improvement? I don't think so. Hyperthreading is DOA for the enterprise.
Luckily, MS has decided to enable 2 CPUs in XP Home, so you don't have to ante up another hundred bucks for XP Professional for the 5% to 30% performance improvement.
Junkware.
Re:Oracle, W2K Enterprise (Score:5, Informative)
On the downside of HT, until the 2.6 (or 3.0, subject to Linus' whim) kernel comes out, there's no point in enabling HT on a Linux box; because the 2.4 scheduler is unaware of HT, all CPUs are treated the same, and the scheduler ends up starving one physical CPU. Performance on a dual 1.8 GHz Xeon with 1 GB of RDRAM and HT enabled under 2.4.10 is roughly 5-15% slower than with HT disabled.
2.5.31 with the HT patch dramatically reverses these numbers, providing an average performance that is 30% better than 2.4.10 without HT. YMMV, of course, and I'm not talking about OS performance, I'm talking about Oracle's performance. Still, 30% increase just for flipping a switch in the BIOS and recompiling the kernel is nothing to sneeze at.
Re:Oracle, W2K Enterprise (Score:2)
I don't think comparing Linux 2.5.31 with HT to Linux 2.4.10 without HT is a fair comparison. That supposed 30% performance gain could easily be attributed to many of the HUGE kernel changes made in the 2.5.x series. A more fair comparison would be 2.5.31 without HT turned ON and then turned OFF. Then you only have a single variable.
Re:Oracle, W2K Enterprise (Score:3, Insightful)
So many technologies can interfere with processor count that Oracle and Microsoft are using whatever is a best-case scenario for them. If licensing were by physical silicon only, future iterations of on-die multiprocessing would really hamper software providers' profitability - something you know they will not stand for.
If it were exclusively per CPU, you would also see a lot of shops always buying the absolute fastest processors available, and specialty shops selling factory overclocks of those processors. Reduced licensing costs would actually make the price of exotic cooling methods and reduced CPU life look good.
Same rule applies to Co-location in a different way. How much power can you stuff into 1u of rack space?
If the most costly machine you can buy is a 48-CPU machine that fits into 3U using quad-processor cards on a backplane, but costs less in the long term because you are not paying for the 24U of rack space that dual-processor 1U machines would take, you buy it. Even if your per-CPU cost is 10 times that of more conventional systems, the machine pays for itself in rack-space costs in 10 months. After 18 months you upgrade the machine, because by then you are paying twice as much for per-CPU licenses as you could be paying with modern hardware.
Note to businesses: Upgrade now while prices are depressed, and interest rates are low. Sticking with your old hardware is costing you in the long term.
Take out a loan and upgrade. If your hardware is over 18 months old, you can cut your licensing costs in half. Don't sit on hardware when you are just waiting for it to break.
IT is not a static business. Do not keep your hardware until it has no resale value. Do not keep your hardware until you are paying twice as much for licenses as you could be paying. Do not balk at high up-front costs if the purchase saves you 10 times its up-front cost in licensing and rack-space costs. Do not keep old machines that are costing you three times as much in electricity at a given performance level.
Do a real cost analysis, put in the time. This is the perfect time to upgrade. Competition has never been more fierce for the dollars you have to spend. You will get more value for your dollar now than you ever have been able to.
IT is crap as capital. It has no value in three years. Keep your IT expenditures dynamic to avoid riding your capital investment into the ground. Playing the depreciation tax game will not save you nearly as much as keeping old hardware costs you in other areas.
Disclaimer: I am not invested in any IT infrastructure provider and I do not do IT consulting. I just have to run my own shop like the rest of you.
Tera/Cray MTA (Score:5, Interesting)
Be careful (Score:2, Redundant)
Re:Be careful (Score:2)
Since most applications are written with only one processor in mind, it's typical to see dual-processor systems that don't show much performance gain.
The distributed systems (SETI, GIMPS and others) are very well suited to multi-processor environments... and this is taken to extremes by having the multi-processing done on entirely different machines with some 'master' computers that handle the overhead of reassembling the multiple datasets into something coherent for the whole system. It's actually an amazing thing when you get to thinking about it.
Great article (Score:2)
The distinction between a program in memory and a process in execution is important. It is also important to understand the illusion of simultaneous execution that is achieved by running concurrent processes with context switches.
Given all that, the article makes it easy to understand where your performance gains (and losses) happen having multi-processors, and indeed in having multi-processing on the same chip.
All in all a good read.
Re:Great article (Score:2)
Not really. For starters, it doesn't go into any detail on how to use threads. It makes no mention of things like semaphores, locks, monitors or race conditions, the sort of things that make threaded application development difficult.
It goes into the very basics of that part of the OS module I did at the beginning of my second year, so please don't trivialise such courses.
Re:Great article (Score:2)
For people who haven't studied computer architecture, I bet it's a rather tough read. (In which case they should go get Patterson & Hennessy's books. They are just great on this type of stuff.)
Re:Great article (Score:2)
Damn thief (Score:2, Funny)
(On a related note, this brings to mind one of my favorite
He stole my
Linux support for Hyperthreading.. (Score:3, Informative)
http://kerneltrap.org/node.php?id=391
http://k
</karmawhoring>
SYMMETRIC Multi Threading (Score:5, Insightful)
They call this stuff Symmetric Multi Threading, but I think that name is a bit misleading. While the thread scheduling itself is symmetric (all process threads are created equal and receive equal execution time), the shared resources on the CPU (cache, shared registers) are NOT symmetric. Since these shared resources are in essence handled on the way in to the execution unit, it becomes really easy to starve the processor when you have contention for one of those resources.
While proper application development can alleviate some of this issue, it will depend heavily on the actual usage patterns of the system. When you have a lot of overlap coming in from memory (like the file system cache on a web server), you don't worry too much about threads stepping on each others' registers. This sounds fantastic for data servers.
Desktop systems, on the other hand, almost never work this way. When you're playing MP3s in the background while web surfing and checking your email, you're already working with vastly different areas of data. Throw the OS and various background processes into the mix and you've pretty much eliminated any gain, and may even have slowed down due to cache contention.
While this was touched on at the end of the article, I don't think it was given enough weight. It doesn't just depend on what applications you're running and whether they were written to take advantage of it. It depends on what you want to do with the whole system. For serving data, this will certainly be good (especially with multiple CPUs!). For desktop systems, this is a non-starter.
I'm not disparaging the technology - far from it. I'm just waiting for Intel and Microsoft to market this to my mom as a way to have higher quality DVD playback - at twice the cost. And her buying it. Again.
Re:SYMMETRIC Multi Threading (Score:2, Informative)
I believe when symmetric is used in the context of SMP and SMT it is intended to mean "all execution elements have the same public interface".
Things would be asymmetric in cases where there was a differentiation between the performance or capabilities of the execution elements - e.g. where one processor could handle interrupts and the other couldn't. An 80286+80287 is an example of an asymmetric system - one execution element can only do FP stuff, the other can do everything but FP.
Re:SYMMETRIC Multi Threading (Score:3, Informative)
It's SIMULTANEOUS multithreading.
This means that both threads are in the processor pipeline simultaneously.
Increasing pain of Mis Predicts and IO Access (Score:3, Interesting)
Re:Increasing pain of Mis Predicts and IO Access (Score:2)
So... they could eliminate the whole concept of branch "penalties" altogether.
Now is this how it is actually implemented? I don't know. There are already plenty of complications present in the processor, so changing this bit of logic is far from trivial. Still, since a branch calculation is a fixed amount of time that leaves part of the execution units free, I don't know of any reason this sort of scheme could not be implemented.
Perhaps if someone else has some information?
More good news (Score:2)
ASUS has released BIOS upgrades [tech-report.com] to the P4T533 line of motherboards that now support Hyperthreading.
And rumors persist that Hyperthreading is already on the current P4 chips (Socket 478?) and may be enabled at a later time if all goes well.
Re:More good news (Score:2)
Intel may have disabled Hyperthreading in its current CPUs to prove out the technology first, and once they are ready, a simple BIOS update may enable this feature.
a clarification (Score:3, Informative)
Since lots of people seem to be missing the point of "hyperthreading", as Intel is calling it, I feel like jumping in and trying to clarify a little bit.
Processor clocks have gotten faster and faster and faster over the last decade: multiple orders of magnitude faster. Not only that, but processors have incorporated increasingly clever tricks to process the data they have available to them. Memory speeds have increased too, but even with DDR and all that great stuff, they haven't kept pace. So there are times when your super-fast processor is just sitting there waiting around because it's run out of data to process.
Even if you could (cheaply) make memory that actually ran at 2 GHz or whatever, this would not solve an even more fundamental problem that makes the situation worse: due to the speed of light, a 2 GHz processor still has to wait a really significant amount of time whenever it depends on main memory before it can process something.
So, here's a question for you: if the processor has to wait a really long time, perhaps enough time to execute 50 instructions, what should it do during that time? Should it:
Well, the idea behind hyperthreading (a/k/a thread-level parallelism) is that the processor should make some sort of effort to do something.
So, IMHO hyperthreading isn't stupid or a marketing ploy. It's a genuine attempt (one that many processor makers are working on, by the way) to solve a genuine problem. And not only a genuine problem, but one that will increasingly become a bottleneck. (It's already bad enough that it has its own name: "The Von Neumann Bottleneck".)
And by the way, the advantage of this over two processors is that you don't have to build two chips! You don't get double the performance, but it's quite possible that you might get a better bang for the buck. (Notice I said "might".)
Also note on the cache pollution issue (where one thread slows down another by "hogging" the cache and actually causing slower execution for another) that there are ways to mitigate this problem. An obvious one that comes to mind is to bias the processor towards executing a particular one of the threads. That way, one thread runs much more often and should tend to have what it needs in the cache.
Anyway, until the economy gets better and I find a way not to be one of the masses of unemployed software developers anymore, I'm not buying one of these fancy processors...
Placing an Idea in the Public Domain... (Score:2)
The starvation issues with symmetric-multithreading can easily be addressed by keeping an instruction count for each virtual thread; perhaps hooked to an interrupt the OS can use to tell when each thread has consumed its allotted processor resource.
That way, threads that have been starved for resources will remain in the processor core longer than any that happen to "hog" a resource. In other words, instead of time slicing, you can use instruction slicing to ensure fair use of the processor between contending threads.
Voila! Problem solved. (Not counting the dozen man-years it would take to implement.)
Hyperthreading at UCSD, and why the Tera Sucks (Score:3, Interesting)
References: The Tera: http://www.cs.ucsd.edu/users/carter/Tera/tera.htm
Dean Tullsen: http://charlotte.ucsd.edu/users/tullsen/
I was one of the first five students to use the Tera after it came out of development, and I decided to take a different approach to evaluating its performance. I didn't like what the Tera corporate benchmarkers were doing, which was taking applications with known parallelism, writing a serial version of the code, and then posting glowing reviews of the Tera automatically finding that parallelism, ignoring the fact that the number of pragmas they had to put into the code to let the compiler discover the parallelism was more work than just writing a parallel code oneself.
I instead called them on their advertising that their compiler could discover latent parallelism in any computation-heavy code. I noticed John Carmack's
When I reported the results to Carmack, his response was, "I have never been a big believer in magically parallelizing dusty deck codes. I don't mind specifying explicitly parallel activities and threads, especially with the large payoffs involved."
Cheers,
Bill Kerney
Re:Solution to "hog" problem (Score:2)
I have to think there is some merit to your idea, though. I just wanted to point out that it's not a simple matter.
-J
Re:SMP is the way grasshopper (Score:4, Informative)
When the locked application's timeslice runs out, other applications will get a go, and from that it is possible to kill the locked application. This is one of the reasons pre-emptive multi-tasking became popular.
Re:SMP is the way grasshopper (Score:2)
And before the *ix/X/KDE folks smile too broadly about this, I routinely have the same thing happen in KOffice applications when I scroll through the font selector drop-down a bit too zealously. XFree86 and xfstt decide to have a CPU party and other X clients are not invited (sniff, sniff).
My point being that a pre-emptive multi-tasking O/S is no guarantee you'll make it out of a (near-)infinite loop alive with your original session intact.
Re:SMP is the way grasshopper (Score:2)
I do have a KDE application that tends to do funny things at times. The CVS version of Kopete sometimes decides to lock up just as I open a menu, which seems to render nothing else on the screen clickable. The way I get around this is to drop to a terminal and kill Kopete from the command line.
I'd like to point out that in both those cases, adding another CPU doesn't increase your chance of being able to recover. The system might be a bit more responsive, but it doesn't matter how many CPUs you have if your UI is dead or the scheduler has fucked up.
Re:SMP is the way grasshopper (Score:2)