Posted by michael from the make-kpkg-kernel_image dept.
Sivar writes "Andrew Morton of EXT3 fame has posted benchmarks of Linux 2.5.47 prerelease compared with the latest from the current 2.4 series. With some tasks more than tripling in performance, the future looks very promising."
What really amazes me is that _every_ major release of nVidia drivers has a huge performance gain for an already-great card. Hell, ATi can't even get their own cards to work right!
Well, according to Linux Format [linuxformat.co.uk], the reason you need an NVidia card for UT2003 is that only a commercial driver can implement the patented S3 texture compression used by UT2003! Sounds like the beef there is with Epic, not ATI. BTW, is there a good source on how to get CVS X up and running (on Debian for preference)? I got a new laptop with an ATI M9 a couple of weeks ago and know that this is only supported in CVS at the mo. The last time (4+ years ago) I tried compiling X, things worked far less than wonderfully!
I'm glad you put the "of EXT3 fame" bit, I was worried the article might be talking about the infamous author [amazon.com].
Although he might end up on the front page of /. if he writes an unauthorized biography of Mr. Gates, what kind of juice could be dragged up from the past... I wonder?
It looks like if you are talking about large internet operations and large CPU-intensive tasks and large I/O tasks all at the same time... then YES! It will make the internet faster.
So maybe this will help some people stand up to being /.'ed.
Nice to see Linux doing well on big machines with standard packages and such. I love Linux, and it's the only thing I use at home for anything serious, but commercial software has always had the edge on *big* things (big disks, large processes, etc.). With recent advances in process management, and now this, a lot more people will be able to use Linux top to bottom.
I think one interesting thing that could come out of this is that IBM (and others) will be pushed more and more towards a pure service or application only niche. They won't always be able to say, "Sure Linux is great for the workstation, but what about your 8 TB database?" There's a ways to go, but a lot of the features are falling into place.
Having a unified OS from your palmtop to your TB file server will open up a lot of possibilities for people. My personal interest is in a next level of integration which is more natural to use and easier to develop, and we're getting close.
IBM is also going to stay in the high-end hardware department; it'll be "Sure commodity hardware is great for the workstation, but what about your 8 TB database that has to survive even if someone saws it in half down the middle?" This also puts them in, essentially, the BIOS department for these machines (you want to run your web site off of whatever portion of your database machine isn't actually being used by the database, without risking problems if the web server gets hacked).
I don't disagree at all. I said that this would begin to push others more toward service. With each new thing that you can get for free that works just as well as what others charge for, you capture a little bit more of the market. This alone, and what has been developed to date, will not push IBM out, nor has everything that needs to be done for such things as you describe been done. But it's a step.
I'm not talking now, nor even tomorrow, but in 5-10 years, I think we could see a very different landscape in how old school commercial software and hardware companies (or, in IBM's case, departments) work.
If you can spend $1 million on developing your whizzy new file system, or you can use something that's freely available (or spend $100,000 to tweak it), then the economics of it start to push people out of commercial development in some areas, especially around OS and OS functionality. Instead, you just consult, or deploy, or support and such.
This was a major reason that 2.5 is, put simply, needed by any and all serious Linux users.
Based on this image (0202_lab_xp_4.gif [interex.org]), one can see that large volumes of asynchronous I/O are, as the author puts it, the "Achilles' heel" of Linux.
The Linux kernel itself, in all versions before 2.5, serializes disk input/output with a single spinlock.
(The yellow is the Windows XP box; the green line is the data for the SuSE Linux PC.)
He never said that Linux is worse, just that Linux has an Achilles' heel.
On the contrary, you stated that there was a reason why serious Linux users needed the 2.5 kernel. But if Linux and XP perform the same overall, why do serious Linux users need the 2.5 kernel? They don't.
I'm not saying that this particular article doesn't make certain conclusions about asynchronous I/O. It's a simple fact that it does. But you made a conclusion based on those conclusions, and your conclusion is what I disagreed with.
I also pointed out that presenting a single image without understanding the full context of the article is silly. The tests in the article you quoted were run on laptop computers. What kind of "serious Linux users" with great need for asynchronous I/O use a laptop for that purpose? The image I quoted showed performance in a more typical laptop use pattern (according to the article's author). For that matter, would they use ReiserFS, or would they use XFS, JFS, or another filesystem?
Hello moderators! This is not a troll. He pointed out something that has a nice graph to support what he's saying. Not only that, but it's very well known. AIO on Linux has never been stellar...but should be soon enough.
Someone, please mod the parent back up. He wasn't trolling and was simply stating fact!
So it's just karma whoring, then?
Look, he pulled the GIF out of his ass and tried to pass himself off as an expert.
You might look at some of his other posts.
Meanwhile, I do recall seeing a graph much like, if not exactly, that graph, by Open Bench Labs (the name may not be correct -- it's been a while). Not only that, but the graph EXACTLY matches known expectations of a common Linux AIO implementation (which is purely userland). This is why kernel-level AIO implementations have been underway for some time now.
In other words, you may not like the message, but it EXACTLY matches the current state of some AIO implementations on Linux. Call it a lie if you like; meanwhile, those of us who know will simply nod, move on, and occasionally laugh at those who continue to hide their heads in the sand.
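For anyone wondering what "AIO" actually looks like from userland, here's a minimal sketch of my own (not from any article; the file path is just an example): queue a read with POSIX aio_read() and poll for completion. On 2.4-era glibc this gets serviced by helper threads behind your back, which is the "purely userland" implementation being criticized above; the 2.5 work adds real in-kernel queuing (the io_setup/io_submit syscalls).

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Minimal POSIX AIO sketch: queue one read, then poll until it completes.
 * Build with: gcc -o aiotest aiotest.c -lrt  (the path below is just an example) */
int main(void)
{
    static char buf[4096];
    struct aiocb cb;
    int fd;

    fd = open("/etc/hosts", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { perror("aio_read"); return 1; }

    /* A real program would do useful work here instead of spinning. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    printf("read %d bytes asynchronously\n", (int)aio_return(&cb));
    close(fd);
    return 0;
}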
Believe it or not, Linux isn't the be-all, end-all of OS's. If it were, there wouldn't be a need for the 2.6 kernel.
For a guy such as myself, who does all his daily tasks on a Linux box, what does this mean? Will it mean faster loading times and better stability, or will it make little difference at all?
It means that you won't see too much speed-up on your desktop machine. But, if you run a big server that does multiple processes at once, say Oracle, you could see significant performance gains.
Fact is, anyone that heavily uses their Linux box will see some difference. It's just that the heavier your box gets used, the bigger difference you'll see. :)
Those that do little serious multitasking may see "smoother" multitasking but little more. Those that perform concurrent compiles, heavy CPU or I/O database servers, big time-share systems, etc., will see larger and larger noteworthy gains.
You'll get better interactive performance under load. So if you're encoding an mp3 and writing your home directory to a CD, your mouse cursor won't stick and your windows will refresh reasonably well. Unless you're doing something kind of disk/processor intensive, you won't notice the difference, because 2.4 is too good already for there to be much improvement. If you try to encode 32 mp3s at the same time, 2.6 will actually do worse than 2.4, but at least it won't make ls quite so slow.
The main goals are interactivity (input gets handled quickly), low latency (your mp3 player gets a chance to send the next second of audio to the sound card before this second is over), and fairness (every program makes at least a little progress after a short amount of time).
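If you want to see what "latency" means here in concrete terms, a quick sketch of my own (nothing official): ask for a 10ms sleep in a loop and record how late each wakeup actually arrives. Run it on an idle box and again while untarring a kernel; the worst-case lateness under load is exactly what the interactivity work is trying to shrink.

#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Sketch: request a 10ms sleep 1000 times and report the worst overshoot.
 * Note: timer granularity (HZ) alone puts a floor under this number. */
int main(void)
{
    struct timespec req = { 0, 10 * 1000 * 1000 };   /* 10ms */
    struct timeval t0, t1;
    long worst_us = 0;
    int i;

    for (i = 0; i < 1000; i++) {
        long elapsed_us;
        gettimeofday(&t0, NULL);
        nanosleep(&req, NULL);
        gettimeofday(&t1, NULL);
        elapsed_us = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
        if (elapsed_us - 10000L > worst_us)
            worst_us = elapsed_us - 10000L;          /* lateness past the 10ms we asked for */
    }
    printf("worst wakeup lateness over 1000 sleeps: %ld us\n", worst_us);
    return 0;
}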
Overall throughput has not increased (actually, it is believed to have decreased). So the overall speed of the system is relatively equal to the 2.4 series of kernels. You probably won't see any major performance speedups in any apps you use.
However, the overall responsiveness of the system is improved. Most people who have used it have claimed that it felt much faster than the 2.4 series. You won't have starved processes.
This means if you're running XMMS and you compile a kernel, XMMS won't just hang until the compilation is done. The kernel developers have done a great job in improving -fairness- between processes.
Mostly, the results will be seen on Big Iron and server applications, but the overall desktop experience is expected to improve.
I'm a huge Linux fan and I love to brag about how much better than Windows it is, etc. However, I don't think it's right to make false claims like "Linux 2.6 will be 3 times faster!!!!!" KernelTrap mentions that:
Most significant gains can be expected at the high end such as large machines, large numbers of threads, large disks, large amounts of memory etc. [...] For the uniprocessors and small servers, there will be significant gains in some corner cases. And some losses. [...] Generally, 2.6 should be "nicer to use" on the desktop. But not appreciably faster.
Some of the biggest improvements for desktop responsiveness can be found (for Kernel 2.4.x) at Con Kolivas' web site of performance linux patches [optusnet.com.au].
However, I don't think it's right to make false claims like "Linux 2.6 will be 3 times faster!!!!!" That would be why I said: "With some tasks more than tripling in performance,..."
I doubt any semi-knowledgeable person is going to take that statement to mean that kernel 2.6 makes a Linux system three times faster, but depending on what they use that system for, it may do just that. The performance figures are very respectable alone, but when you consider that the kernel hasn't even been frozen yet and that tuning hasn't begun, as I said, the future looks very promising.
The fine-grained locking improvements on SMP will make it noticeably better for SMP boxes.
A very big improvement is that IDE has been parallelized, meaning that if you use multiple IDE devices at once you will see a "night and day" difference in performance.
If you are uniprocessor and all SCSI and already use low-latency patches, well, as you were.
When I first heard about some of the things going on in the 2.5 branch,
such as the newly tuned VM system, improved filesystem code, and especially
Ingo Molnar's O(1) scheduler project, I was ecstatic. The
promise of an average workstation computer handling 100,000 threads with as
much grace as it handles 100 sounded too good to be true. And alas, it was.
There are a number of serious problems with Linux 2.5's scalability pushes,
trading performance for normal tasks in order to run better at more esoteric
tasks, and many of these can be traced back to the new O(1) scheduler.
A month ago, I downloaded the 2.5.44 kernel, and have been benchmarking it
extensively on one of the Pentium 4 2GHz workstations in the computer lab. For
a control, I ran a stock 2.4.19 kernel on the Athlon XP 2000+ machine next to
it. My test consisted of running an increasing number of parallel processes
each executing a for(;;) loop repeatedly nanosleep(2)ing for 10ms, thus
yielding to the scheduler every time they awake. This made sure that the scheduler
was more or less the only thing running on the system, and that I could get an
accurate count of the average task-switching time.
By gradually increasing the number of threads on each machine in parallel, I
was able to graph the comparative performance of the two schedulers. The results
do not bode well for the new scheduler: (forgive my somewhat clumsy approximation, text is not the best medium for graphic content)
S |
c | . O(n) scheduler (2.4.19)
h |.
e |.
d |-----.------- O(1) scheduler (2.5.44)
T | . |
i | |
m | |p
e |_______|_______
No. of Threads
As you can see, the new scheduler is in fact O(1), but it adds so much initial
overhead that task switching is slower than under the old scheduler until you
have a certain number of threads, labeled p above. My benchmarking
experiments put p at around 740 threads.
Now, this is obviously good
for high-end applications that run thousands of processes concurrently, but
the average Linux user rarely has more than 100 processes on his machine at a
time. The vast majority of servers rarely exceed more than 250 concurrent
processes. In these cases, the O(1) scheduler is cutting their computer's
performance almost in half.
What we're seeing here is the likes of IBM and Sun
putting their needs before those of the hobbyist hackers and users who make
up the majority of the Linux user base. While the O(1) scheduler project is
a noble cause and should certainly be made available as an option for
those few applications that benefit from it, outright replacing the old
scheduler at this point is a folly.
Um, doing benchmarks between an Athlon XP and a Pentium 4 is folly. The P4 has notoriously slow context switching performance. Also, if you are running a small number of threads, your computer isn't spending a whole lot of time thread switching anyway, so the hit doesn't really affect you. When you have lots of threads, scheduling becomes far more important, and so the increase is much more noticeable.
The P4 has notoriously slow context switching performance.
The Pentium IV has notoriously slow performance in some areas, but a processor being slow in context switching doesn't make sense. Depending on the context (English context, not computer context), context switching is either the system switching from kernel mode (running kernel code) to user mode (user applications) or vice versa, OR it is simply moving from one execution path to another (as was scheduled by the, um, scheduler).
The processor has nothing to do with it. Context switching in BOTH instances is handled entirely by the operating system. While Windows NT 3.1 may have "slow context switching" and Linux with the O(1) scheduler may have "fast context switching", the Pentium IV cannot "have fast or slow context switching" because it doesn't have anything to do with the Pentium IV.
One might theorize that the original poster's comment was referring to the Pentium IV being particularly slow at the actual instructions used in context switching. Regarding the discussion of the kernel scheduler, the meaning of "context switching" that we are using probably refers to switching between tasks (AKA multitasking), so the important instructions would simply be jump instructions like "jmp", which AFAIK are not particularly slow on the Pentium IV like, say, bit shifting (which is glacially slow on the Pentium IV).
The instructions involved in the context switch are slow on the Pentium 4. The P4 has a long internal pipeline to flush, and a huge amount of internal state to synchronize, which makes context switches slow. For example, an interrupt/return pair takes 2000 clock cycles on the P4!
The Pentium IV has notoriously slow performance in some areas, but a processor being slow in context switching doesn't make sense.
Well, those of us who actually design CPUs and stuff rather than pretend we know about them use the term "context switch" to describe dumping the current CPU state (to memory, other registers, whatever) then loading a new state, or something logically equivalent. This can be for a thread switch, interrupt handling, whatever.
The processor has nothing to do with it.
A CPU level context switch is part of what happens during an OS level context switch, and therefore has a significant effect on OS performance.
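For the curious, the usual way to put a number on this is a pipe ping-pong between two processes; each round trip forces (at least) two context switches. Rough sketch of my own, not from the thread; it folds pipe overhead into the figure, but it's enough to make P4-vs-Athlon differences visible.

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Classic ping-pong microbenchmark sketch: two processes bounce one byte
 * over two pipes, forcing two context switches per round trip. The result
 * includes pipe overhead, so treat it as a rough upper bound per switch. */
int main(void)
{
    int a[2], b[2], i;
    char c = 'x';
    const int rounds = 100000;
    struct timeval t0, t1;
    double us;

    if (pipe(a) != 0 || pipe(b) != 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                     /* child: echo every byte back */
        for (i = 0; i < rounds; i++) {
            if (read(a[0], &c, 1) != 1) break;
            if (write(b[1], &c, 1) != 1) break;
        }
        _exit(0);
    }

    gettimeofday(&t0, NULL);
    for (i = 0; i < rounds; i++) {         /* parent: send, wait for echo */
        write(a[1], &c, 1);
        read(b[0], &c, 1);
    }
    gettimeofday(&t1, NULL);

    us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
    printf("~%.2f us per context switch (very rough)\n", us / (2.0 * rounds));
    return 0;
}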
Well, this guy is apparently a troll, but just for the sake of argument... Anyone repeating his test would probably find very similar results. HZ (the constant controlling how often the scheduler runs) has been changed from 100 to 1000, improving smoothness for many things (multimedia apps especially) at the cost of making the scheduler overhead 10 times what it was before.
Luckily, it was very small before, and it's still very small. Maybe it went from taking 0.001% of your CPU power to 0.01% :-). The *only* times the scheduler was really a problem before were a) when it made bad choices and b) when there were gazillions of tasks. The rest of the time, it was totally negligible.
So, even if the scheduler did slow down by a factor of 2 as he claimed (and in fact, it would have slowed down by a factor of 10 due to the HZ changes, so his claim would leave O(1) 5 times faster than the old scheduler), it really wouldn't matter to an ordinary desktop/server. The scheduler time is too small to be important on normal machines.
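For reference, repeating "his test" would look something like this (my reconstruction, not the original poster's code): fork a pile of processes that do nothing but nanosleep for 10ms in a loop, then watch context-switch rates from outside (vmstat 1, or lmbench's lat_ctx, gives a far more honest per-switch number than anything this toy prints).

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reconstruction of the described test, not the original code: spawn N
 * processes that each nanosleep(10ms) forever, so timer and scheduler
 * activity dominates. Measure switch cost externally (vmstat, lat_ctx). */
int main(int argc, char **argv)
{
    int nprocs = (argc > 1) ? atoi(argv[1]) : 100;
    struct timespec ts = { 0, 10 * 1000 * 1000 };    /* 10ms */
    int i;

    for (i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            for (;;)
                nanosleep(&ts, NULL);                /* wake, get rescheduled, repeat */
        } else if (pid < 0) {
            perror("fork");
            break;
        }
    }
    fprintf(stderr, "spawned %d sleepers; Ctrl-C to stop\n", i);
    while (wait(NULL) > 0)
        ;                                            /* parent just waits */
    return 0;
}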
I don't know much about Linux, but seeing benchmarks suggesting that performance keeps improving confuses me. I recently tried Linux again when Mandrake 8 came out, and on a hellafast computer it was taking a long time for basic things to be accomplished. I thought WinXP was slow compared to Win2k, but this Mandrake was taking even longer to do comparable things like open Konqueror and open text-editing programs. Is there a simple explanation?
I believe part of the slow load times is due to the fact that glibc on most modern distros does not include object preloading technology. The latest glibc has this, I believe, and the only distros I know of that use the latest glibc are currently in beta. I think around the RedHat 9.2 timeframe you will see a Linux that is more than suitable for the desktop.
There was at least one performance bug with Mandrake 8 that resulted in extremely slow X performance. I don't remember the details but maybe someone will share them...
The long answer is that anything written in C++ on Linux will load slowly (but should run fairly quickly once loaded) because of something to do with loading the C++ libraries and some other compiler gook. I can't remember where I read it, or how I found it on Google, but apparently this will be fixed soon in glibc.
Of course, I could be WAY off, so if someone could back me up...
I tried Mandrake 9 recently, and it turned me off linux for weeks. This is what you do: get a different distro. RedHat is good, and KNOPPIX is great too (knoppix.de).
Most distributions go with a more or less specific kernel (586/686/Athlon, etc.) but only i386 applications. Newer processors only really sing with specially compiled code.
A distribution such as Gentoo [gentoo.org] may not be the easiest to install but you get the whole gubbins, X, Gnome or KDE and the apps compiled for your system.
Microsoft tend to distribute generic code, and if you are lucky you may get a model-specific DLL. What Microsoft cannot do is distribute code that can be compiled for a specific model, well, not until they deliver code that gets compiled during setup. Note, this can be done with optimisable intermediate code rather than source, but it wouldn't be easy.
Please note that I *did* state that it wasn't for the newbie. However, if the packaging was improved (i.e., loading of groups of programs), I don't see why compilation from source can't be hidden from the installer and a better configuration shell added for configuration. I wouldn't recommend Gentoo to someone who was a novice. Did you try configuring Linux in the early days? A learning experience, but not necessarily a bad one. However, if someone doesn't want to learn a lot about Linux, then, yes, Gentoo isn't the best.
Installation from source allows a system to be customised and is extremely powerful. With many home systems equipped with significant hard disk and memory, why can't a system rebuild itself overnight automatically?
The user stated that they were inexperienced. Inexperience is not an absence of intelligence and this is clearly a person who is at least willing to try things.
I'm sorry that you consider that an inexperienced person should be afraid of other ways of installing. Please remember that the idea of a GUI installer for an operating system is quite new. Haven't you ever tried to get an operating system up and running with inadequate documentation, a lot of unwritten dependencies and nothing but the command line?
I don't consider myself an evangelist for Gentoo but I want to explain that there is a faster way to run Linux. Wouldn't you agree? Would you be happier putting in a slow distro, getting the hang of it and then moving onto something faster, if you know that things *will* get better?
I found Gentoo relatively easy to get up and running. It took a lot loooonger to get it to do what I wanted, though.
No, I have worked on a lot of computers in the past and have had to struggle through a lot worse than this. OTOH, my notebook is running RH and I keep Gentoo for some other systems.
I've been in a much worse situation courtesy of Redmond with their dependency hell which forced an operating system reinstall.
Really? Well I'm running it right now (on both laptop and desktop), and IMO, it's great for semi-newbies. It's a great idea, because if you screw something up, you just reboot. And then, when you're ready, you can install it to the HD, and then you've got a fully loaded install of Debian.
Replace: "and then you've got a fully loaded install of Debian."
with: "and then you've got close to Debian, with pretty much everything you'd need for work installed (i.e. KDE, OpenOffice, WINE, Mozilla, and pretty much everything else)."
For all of you who have issues compiling 2.5.x let me remind you that 2.5.x are development kernels. They aren't perfect. They may have issues building in certain configurations, because they are development kernels.
Why did they get rid of the old make xconfig? It sucks now; it uses gconf or kconfig. Stupid... Makes life harder. I wish they had never changed it; the old system rocked. Now I have to have either GNOME or KDE installed, or use make menuconfig from the terminal! Argh! Come on, go back to normal! But good work Linus! Keep up the great work!
I'm a big-time VMware user (I use it for testing and Windows). I usually have 2 or 3 VMware machines running at any given time and I have plenty of memory (usually 1GB, sometimes more). However, the disk buffer (or disk caching) of Linux sucks ass. I'm not kidding: if I have 1GB of memory, 900+ megs will be used for disk buffers and my very important interactive VMware processes will be swapped out to the slow disk swap file. Just using one of the VMware processes causes a lot of disk I/O, and all that I/O gets loaded into the disk buffers in memory; then when I go to use another VMware process it has to come out of swap. Linux is pretty bad about this with normal processes, but VMware exacerbates the problem.
To boil it down: the disk buffering in 2.4 is way, way too aggressive and I haven't figured out a way to fix it. I need to be able to either limit the total amount of memory the buffers will use, or, a better method, tag certain processes so that they will never be moved into swap for disk buffers (moving to swap "normally" is OK, just not for disk buffers). Or maybe just make it never swap out any process for disk buffers.
It seems Windows uses a more reasonable disk buffering technique and VMware works better there (especially when using several instances). I don't want to use Windows as my primary OS though because I like the built-in disk encryption and network security of Linux (the ip filter stuff is much better than Windows).
Anyone know if 2.5 has got any better disk buffering?
> would be to tag certain processes so that they
> will never be moved into swap for disk buffers
I believe that this is what the sticky bit was intended for. Before I go about explaining what it is and how to use it, does anyone know if Linux actually *honors* the sticky bit or does it just have it for compatibility?
man chmod: ...and the Linux kernel ignores the sticky bit on files. Other kernels may use the sticky bit on files for system-defined purposes. On some systems, only the superuser can set the sticky bit on files. -- Matt
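For the curious, the bit itself is trivial to poke at from C (rough sketch of mine below; the path argument is whatever file you point it at), but as the man page above says, the Linux kernel simply ignores it on regular files, so it won't pin anything in memory. On Linux it only really matters on directories like /tmp.

#include <stdio.h>
#include <sys/stat.h>

/* Set the sticky bit on a path and read it back. Per the chmod man page
 * quoted above, Linux ignores this bit on regular files, so it is of no
 * help for keeping a process out of swap. */
int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2) { fprintf(stderr, "usage: %s <path>\n", argv[0]); return 1; }
    if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }
    if (chmod(argv[1], (st.st_mode & 07777) | S_ISVTX) != 0) { perror("chmod"); return 1; }
    if (stat(argv[1], &st) == 0)
        printf("sticky bit is now %s\n", (st.st_mode & S_ISVTX) ? "set" : "clear");
    return 0;
}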
Inspired by the numbers and the new "snappiness" under load, I decided to download and compile the 2.5.47 kernel and see for myself. Disappointed is all I can say: 2.4.19 with preempt and low-latency is snappier by quite a bit than 2.5.47. My test isn't quite as numeric as the story's... I simply start ripping a DVD (oops, did I say that...) to AVI and compiling something (in this case xmms), then get my term window, open LimeWire, and drag the term window around on the maximized LimeWire window. Under 2.4.19 I can never get the whole window grey (as I drag the term window it acts as an eraser on the LimeWire window, until that window is redrawn); under 2.5.47 I can easily grey out the entire LimeWire window, normally for 2 or 3 seconds before it redraws... under 2.4.19 I can maybe grey out about one term window's worth of area in the LimeWire window before it is redrawn...
Of course, it states in the story that 2.5 has not really been tuned at all yet, so hopefully this will improve, but for now I'm sticking with 2.4.19 + preempt + low-latency.
I use Gentoo; the gentoo-sources kernel has low-latency and preempt pre-patched, but it doesn't have XFS. They have an XFS kernel as well, but I don't know if it has low-latency and preempt already; I seem to remember something about low-latency and/or preempt causing problems with the XFS kernel, but I might just be smoking crack. Check forums.gentoo.org. OK, I just did, and yeah, preempt + XFS is a bad idea: much instability. The patches fight with each other (XFS trying to journal, preempt trying to let something else use the CPU), and the result is massive instability. So no, I don't think you can get all three to play nice, but you can run low-latency + XFS, or low-latency + preempt; you just can't throw preempt in with XFS. Gentoo is nice and patches the kernel automatically; if you're not running Gentoo, you'd have to patch the kernel yourself...
I've tried manually patching XFS, O(1), and a couple of other odds and ends (latency, preempt) and wind up with something that doesn't even boot, or crashes/panics right afterwards.
So thanks... that pretty much confirms that it's not something I'm doing... ;)
I'm still mulling over which kernel to use with my old 486's.
Right now, they're running 2.2.10, iirc; whatever the debian stable had on her boot disks.
I'm not going to compile any kernels until my dual PPro is fixed, because compiling a kernel on a PoS 486 portable is not fun :P
Anyone have any comments/recommendations on if/which new kernels are good to run on old shite?
I've just installed NetBSD+Apache on a 33MHz 486 laptop with 8MB RAM. I did
recompile the kernel on another machine, though. This was essential
because the generic laptop kernel was taking too much memory. The end
result is nice and quite smooth.
Of course there are lighter webservers such as Boa, but I needed PHP,
and the Apache process is taking less than a megabyte of memory.
I'd LOVE to try out the 2.5 series, but because LVM is still not in there (as of a week ago, at least), and I have all my data (movies, oggs, etc.) on LVM, I'm unable to use it... :(
Does anyone have a clue when there will be LVM for 2.5?
"It's impossible to get a speedup of more than 10 with any processor-related activities.
Using Amdahl's Law, one can find that Speedup = (s + p) / (s + p/N), where N is the number of processors, s is the amount of time spent (by a serial processor) on serial parts of a program, and p is the amount of time spent (by a serial processor) on parts of the program that can be done in parallel."
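For what it's worth, the "10" only falls out of that formula if you also assume the old 90/10 rule of thumb (90% of the time spent in the 10% of the code you can't parallelize away, i.e. s = 0.1, p = 0.9), which the quote never states and which later replies dispute. Taking the limit as the processor count grows:

\lim_{N \to \infty} \frac{s + p}{s + p/N} = \frac{s + p}{s} = \frac{0.1 + 0.9}{0.1} = 10

With a different serial fraction the cap moves accordingly (s = 0.05 gives 20), so the "10" is an assumption about the workload, not a law of nature.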
While I'm no expert in software engineering (and I haven't really looked over the equation you put too closely) I think it assumes the original was written with some sort of intelligence behind it. I bet I could write some really atrocious code that would be so incredibly inefficient that almost anyone else could get a huge performance gain from it.
I'm not sure if I would have to try hard or not try at all to write really bad code. :-)
It actually doesn't matter, because speedups are calculated using algorithm speed, not clock ticks or anything concrete like that. In other words, speedups only give you part of the story. A poorly-written program using an O(log n) search algorithm may be slower than a well-written program using an O(n) one; but in the normal case, the programmers will be sufficiently competent for the better algorithm to make the better program as well.
There are a whole bunch of ways you can conceal information or mislead readers by claiming really good big-O times, but this isn't really one of them. (How about a perfect hash table that calculates keys using an O(m^n) hashing algorithm?)
Amdahl's law is used to predict speed increases for multi-processor systems. In this case, you can see a gain of more than 10 if you have enough processors in use, and the majority of the work is in parallel.
I think it assumes the original was written with some sort of intelligence behind it. I bet I could write some really atrocious code that would be so incredibly inefficient that almost anyone else could get a huge performance gain from it.
It doesn't really assume anything. The equation pertains to gains simply by increasing the number of parallel processors, not the strength of the code.
Anyways, this is probably redundant, but the big gain from the new kernel is that the amount of work that can proceed in parallel is increased and the serialized portion decreased. In a single-processor system, performance decreases as there is more overhead in swapping processes in and out. In multi-processor systems, the gains would be enormous.
For those of you wondering, this is not a proof that you cannot optimize something to be more than 10 times faster in general.
For example, suppose you have an algorithm A that takes X time. And then suppose you change it to algorithm B that takes 11X time by making it do algorithm A 11 times. Well algorithm B can be optimized to be 11 times faster by making it algorithm A instead, since they give the same result.
Anyway, just wanted to make sure no one was missing the "processor-related activities" clause in your statement.
Informative? I don't think so. (Moderators, please check the crack that you are smoking)
Amdahl's law makes a (wrong) statement about the amount of speedup that can be obtained through parallel as opposed to serial execution. (By the way, the number 10 doesn't come into it anywhere. You might as well have mentioned the speed of sound.)
Here, we are talking about the comparative performance of two operating systems running on the same number of processors. Since there is no limit on how stupidly the original could have been implemented, there is correspondingly no limit on the amount of possible speedup due to a better implementation.
Anyway, if you think you know something about Amdahl's law, you need to google for "Gustafson's law". Executive summary: Amdahl was wrong. Exactly how wrong is still a matter of debate, but it's generally agreed that it lies somewhere between "very" and "completely". Please don't quote this nonsense in support of anything, just don't do it.
It also is quite unlikely, since Amdahl's law is a trivial observation that is completely independent of parallelization or even software engineering (it also applies to hardware design or even accounting). Basically, it says: if initially only 10% of X (CPU cycles, money, whatever you are trying to save) is spent in the part you are optimizing, there is an upper bound of 10% to the X you can save.
It also is quite unlikely, since Amdahl's law is a trivial observation that is completely independent of parallelization or even software engineering (it also applies to hardware design or even accounting). Basically, it says: if initially only 10% of X (CPU cycles, money, whatever you are trying to save) is spent in the part you are optimizing, there is an upper bound of 10% to the X you can save.
Sorry, wrong law. You seem to be thinking "90% of the time in 10% of the code", a rule of thumb that nobody to my knowledge has ventured to dignify with the term "law". Amdahl's Law (which IMHO doesn't deserve the dignity either) was an attempt to make a statement about the limitations of parallel computing. Relying on wrong assumptions, he drew wrong conclusions, and in the event, parallel clusters have gone on to scale nearly linearly into the tens of thousands of processors, a result he would have liked to have proved impossible.
Anyway, if you think you know something about Amdahl's law, you need to google for "Gustafson's law". Executive summary: Amdahl was wrong.
If you had actually tried using google for "Gustafson's law" you would have seen as the first link a paper claiming it and Amdahl's law are identical, not that Amdahl was wrong.
The parent post should be ignored. The information content, while real, is misapplied, and that "10" number is pulled out of his ass.
That is what I thought at first, too. But the original poster is right (in a way): a factor of 10 is about the best you can hope for when parallelizing code, since Amdahl's (or some other guy's) law also says something like 90% of the time is spent in 10% of the code. That makes s=10 and p=90. The limit of his equation, (s+p)/(s+p/n), as n goes to infinity is 10. A number not pulled out of anyone's ass.
Maybe the original poster should be moderated down because I don't think the stuff here is really about parallelization (they talk about speedups on uniproc systems too), but for the parallel case, he seems to be right.
the original poster is right (in a way): a factor of 10 is about the best you can hope for when parallelizing code, since Amdahl's (or some other guy's) law also says something like 90% of the time is spent in 10% of the code. That makes s=10 and p=90.
No it doesn't. How do you know the 90% is serializable and the 10% isn't? Answer: you don't, there is no relationship whatsoever.
In a reply on lkml to Aaron Lehmann's praise of the contest results of the latest 2.5-mm kernel, Andrew Morton [interview] explains some of the important performance and design differences between the 2.4 stable series and the 2.5 development series, accompanied by illustrative benchmarks.
Most significant gains can be expected at the high end such as large machines, large numbers of threads, large disks, large amounts of memory etc. [...] For the uniprocessors and small servers, there will be significant gains in some corner cases. And some losses. [...] Generally, 2.6 should be "nicer to use" on the desktop. But not appreciably faster.
From: Aaron Lehmann
To: linux-kernel
Subject: Re: [BENCHMARK] 2.5.47{-mm1} with contest
Date: Mon Nov 11 2002 - 18:04:53 AKST
On Tue, Nov 12, 2002 at 10:31:38AM +1100, Con Kolivas wrote:
> Here are the latest contest (http://contest.kolivas.net) benchmarks up to and
> including 2.5.47.
This is just great to see. Most previous contest runs made me cringe when I saw how -mm and recent 2.5 kernels were faring, but it looks like Andrew has done something right in 2.5.47-mm1. I hope the appropriate changes get merged so that 2.6.0 has stunning performance across the board.
From: Andrew Morton
To: linux-kernel mailing list
Subject: Re: [BENCHMARK] 2.5.47{-mm1} with contest
Date: Tue Nov 12 2002 - 02:04:23 AKST

Aaron Lehmann wrote:
> On Tue, Nov 12, 2002 at 10:31:38AM +1100, Con Kolivas wrote:
> > Here are the latest contest (http://contest.kolivas.net) benchmarks up to and
> > including 2.5.47.
>
> This is just great to see. Most previous contest runs made me cringe
> when I saw how -mm and recent 2.5 kernels were faring, but it looks
> like Andrew has done something right in 2.5.47-mm1. I hope the appropriate
> changes get merged so that 2.6.0 has stunning performance across the board.
Tuning of 2.5 has really hardly started. In some ways, it should be tested against 2.3.99 (well, not really, but...)
It will never be stunningly better than 2.4 for normal workloads on normal machines, because 2.4 just ain't that bad.
What is being addressed in 2.5 is the areas where 2.4 fell down: large machines, large numbers of threads, large disks, large amounts of memory, etc. There have been really big gains in that area.
For the uniprocessors and small servers, there will be significant gains in some corner cases. And some losses. Quite a lot of work has gone into "fairness" issues: allowing tasks to make equal progress when the machine is under load. Not stalling tasks for unreasonable amounts of time, etc. Simple operations such as copying a forest of files from one part of the disk to another have taken a bit of a hit from this. (But copying them to another disk got better).
Generally, 2.6 should be "nicer to use" on the desktop. But not appreciably faster. Significantly slower when there are several processes causing a lot of swapout. That is one area where fairness really hurts throughput. The old `make -j30 bzImage' with mem=128M takes 1.5x as long with 2.5. Because everyone makes equal progress.
Most of the VM gains involve situations where there are large amounts of dirty data in the machine. This has always been a big problem for Linux, and I think we've largely got it under control now. There are still a few issues in the page reclaim code wrt this, but they're fairly obscure (I'm the only person who has noticed them;))
There are some things which people simply have not yet noticed.
Andrea's kernel is the fastest which 2.4 has to offer; let's tickle its weak spots:
Run mke2fs against six disks at the same time, mem=1G:
2.4.20-rc1aa1:
0.04s user 13.16s system 51% cpu 25.782 total
0.05s user 31.53s system 63% cpu 49.542 total
0.05s user 29.04s system 58% cpu 49.544 total
0.05s user 31.07s system 62% cpu 50.017 total
0.06s user 29.80s system 58% cpu 50.983 total
0.06s user 23.30s system 43% cpu 53.214 total

2.5.47-mm2:
0.04s user 2.94s system 48% cpu 6.168 total
0.04s user 2.89s system 39% cpu 7.473 total
0.05s user 3.00s system 37% cpu 8.152 total
0.06s user 4.33s system 43% cpu 9.992 total
0.06s user 4.35s system 42% cpu 10.484 total
0.04s user 4.32s system 32% cpu 13.415 total
Write six 4G files to six disks in parallel, mem=1G:
2.4.20-rc1aa1:
0.01s user 63.17s system 7% cpu 13:53.26 total
0.05s user 63.43s system 7% cpu 14:07.17 total
0.03s user 65.94s system 7% cpu 14:36.25 total
0.01s user 66.29s system 7% cpu 14:38.01 total
0.08s user 63.79s system 7% cpu 14:45.09 total
0.09s user 65.22s system 7% cpu 14:46.95 total

2.5.47-mm2:
0.03s user 53.95s system 39% cpu 2:18.27 total
0.03s user 58.11s system 30% cpu 3:08.23 total
0.02s user 57.43s system 30% cpu 3:08.47 total
0.03s user 54.73s system 23% cpu 3:52.43 total
0.03s user 54.72s system 23% cpu 3:53.22 total
0.03s user 46.14s system 14% cpu 5:29.71 total
Compile a kernel while running `while true; do ./dbench 32; done' against the same disk. mem=128m:
2.5.46:
Throughput 19.3907 MB/sec (NB=24.2383 MB/sec 193.907 MBit/sec)
Throughput 16.6765 MB/sec (NB=20.8456 MB/sec 166.765 MBit/sec)
make -j4 bzImage  412.16s user 36.92s system 83% cpu 8:55.74 total

2.5.47-mm2:
Throughput 15.0539 MB/sec (NB=18.8174 MB/sec 150.539 MBit/sec)
Throughput 21.6388 MB/sec (NB=27.0485 MB/sec 216.388 MBit/sec)
make -j4 bzImage  413.88s user 35.90s system 94% cpu 7:56.68 total
- fifo_batch strikes again
It's the "doing multiple things at the same time" which gets better; the straightline throughput of "one thing at a time" won't change much at all.
It is simple: tar -xvzf linux-{current}.tar.gz; cd linux; make menuconfig; make dep bzImage modules modules_install. Assuming you can do a bzImage on your platform.... Anyway, it's not so hard; you just need to know what the hardware in your machine is, and what you actually want to work out of that hardware, then turn it on. {grin}
It is simple: tar -xvzf linux-{current}.tar.gz; cd linux; make menuconfig; make dep bzImage modules modules_install
You're joking, right? How many options in 2.5.47 must be selected in order for your run of the mill $9 generic PS/2 keyboard to work? I can't tell you how much fun it was building 2.5.47, missing one *somewhere* and suddenly I couldn't do anything because my keyboard stopped working.
The kernel only has an expert mode. It would be nice if there were a higher order config that asked you basic questions and built the things you were most likely to need, with the option of going into a more expert mode if you needed to fine tune something.
A home user (meaning non hacker) never has the need to recompile a kernel. NEVER. Your distribution has all the modules available and if you're running the more popular distros, they will even detect your hardware and load the module for you.
Sometimes people shouldn't mess with stuff; the kernel is one of those things. RedHat does a good job with their builds, and an average user doesn't need to rebuild it at all. A more experienced user might want to tweak, but then he can use make menuconfig or make config... and choose his options.
The kernel only has an expert mode. It would be nice if there were a higher order config that asked you basic questions and built the things you were most likely to need
There are. They are called RedHat, Mandrake, SuSE, etc.
You mean, when will compiling a Linux kernel, which most users will never need to do, become as straightforward as recompiling your Windows kernel, which you can't do?
make xconfig && make dep && make bzImage && make modules && make modules_install && make install
make oldconfig dep clean modules modules_install install
Yes, oldconfig is nice when you already have a .config file from a previous kernel. But I have really been missing xoldconfig, which would give me the xconfig interface with only the questions I'd need to answer when using oldconfig.
It'll be quite a while before recompiling a kernel gets any simpler. Recompiling assumes that you know (somewhat) what you're doing. Keep at it. It took me at least 10 tries before I compiled a bootable kernel.
Quick hint: install the kernel sources that came with your dist. Use the .config file found in these to compile first; these are the settings that your kernel was compiled with. Then you can use make xconfig to alter a known working config. Good luck.
I know exactly how you feel. I actually use linux quite a bit, but it's all precompiled suse packages for the most part except when I need oddball stuff like gif support for GD. Then it's time to compile php.
I'm blessed to have friends that know more than I do and are willing to help me out when I get stuck.
Compiling the kernel is something I haven't attempted since 386DX40 days.
I grew up with DOS, too. If you installed Borland's Sidekick (many did) successfully, you can compile. That's the stuff that went on in Sidekick's install process: it used Borland's compiler -- and that's why it ran so well.
I just finished *this morning* compiling a 2.2.22 (yes, RH 6.2) for my box. Use the .config file from the stock kernel sources for your distro, usually in /usr/src/linux* (you may have to install them). Open a root terminal window in /usr/src, issue `make xconfig', choose the .config from the load-configuration-file box, and start disabling everything you KNOW you do not need. The help buttons are mostly very helpful. If your box is used for web surfing, compile in ppp; same with lpd if you need to print. Unless you have a SCSI drive, disable all SCSI boxes. Load as much of your equipment into the kernel as you can, and disable the modules that enable hardware and features you don't have or use, like FireWire or USB. Make sure the equipment you DO HAVE is supported either in the kernel or as a module. Keep doing `next' until the end, when there is no `next.' Choose Main Menu,
then save the new configuration. Do a `make dep bzImage modules modules_install', copy the new System.map file as System.map-new.kernel.number, and drill down to /usr/src/linux/arch/i386/boot and copy bzImage as vmlinuz-number.of.kernel to /boot.
From /usr/src/linux, do make modules_install. Modify /etc/lilo.conf to include the new kernel and System.map. Activate lilo (/sbin/lilo -v -v).
Reboot into the new kernel. If you get lots of error messages about modules not loading, reboot at the command prompt, and everything will have been rewritten magically. Use your new kernel for testing. You may find you want to try another configuration. Do it all again, changing the Makefile each time under line 3 EXTRAVERSION with another digit or letter to keep it from overwriting a working kernel when you copy it to /boot and to keep the modules straight (though they appear not to care....)
Frankly, I've tried nine builds and although my kernels are smaller than stock, use about 5Kb less RAM and benchmarks seem to indicate about 5-6 per cent increase in speed, I feel no difference in use.
I do feel better knowing I am using the latest (and perhaps the last) kernel in the 2.2.x series, though. FWIW.
I like this one:
"WXP......... more secure than Linux"
Well, I lost track, but there were far more than 200 security articles for XP last time I checked. That page with the number of articles no longer appears. I wonder why :P
I don't know about troll, but perhaps just an overactive imagination :)
Apparently he works for a [slashdot.org] development firm, [slashdot.org] studies meteorology, [slashdot.org] works at a Verizon store at a local mall, [slashdot.org] owns a chain of pet stores in London, and [slashdot.org] has a thing for CmdrTaco.
Damn, I wish my video card had kernel updates
Needless to say, context is everything.
And how long have you been comparing AIO implementations?
I think some processors have multiple register sets, so threads do not have to thrash the same set of registers for every thread context switch.
Re:This is This is the exact opposite of my findin (Score:4, Informative)
Re:This is This is the exact opposite of my findin (Score:2)
The Pentium IV has notoriously slow performance in some areas, but a processor being slow in context switching doesn't make sense.
Well, those of us who actually design CPUs and stuff rather than pretend we know about them use the term "context switch" to describe dumping the current CPU state (to memory, other registers, whatever) then loading a new state, or something logically equivalent. This can be for a thread switch, interrupt handling, whatever.
The processor has nothing to do with it.
A CPU level context switch is part of what happens during an OS level context switch, and therefore has a significant effect on OS performance.
If you'd read his past posts... (Score:1)
Re:This is This is the exact opposite of my findin (Score:3, Interesting)
Luckily, it was very small before, and it's still very small. Maybe it went from taking 0.001% of your CPU power to 0.01%.
So, even if the scheduler did slow down by a factor of 2 as he claimed (and in fact, it would have slowed down by a factor of 10 due to the HZ changes: the timer tick went from 100Hz to 1000Hz on x86 in 2.5, so the scheduler path runs ten times as often, which would leave O(1) 5 times faster than the old scheduler per invocation), it really wouldn't matter to an ordinary desktop/server. The scheduler time is too small to be important on normal machines.
inexperience (Score:1, Flamebait)
Re:inexperience (Score:1, Informative)
Re:inexperience (Score:2)
Re:inexperience (Score:2, Informative)
The long answer is that anything written in C++ on Linux will load slowly (but should run fairly quickly once loaded) because of something to do with loading the C++ libraries and some other compiler gook. I can't remember where I read it, or how I found it on google, but apparently this will be fixed soon in glibc.
Of course, I could be WAY off, so if someone could back me up...
Re:inexperience (Score:1)
Gentoo (Score:2)
A distribution such as Gentoo [gentoo.org] may not be the easiest to install, but you get the whole gubbins: X, Gnome or KDE, and the apps, all compiled for your system.
Microsoft tend to distribute generic code, and if you are lucky you may get a model-specific DLL. What Microsoft cannot do is distribute code that gets compiled for a specific model, at least not until they deliver code that gets compiled during setup. Note, this can be done with optimisable intermediate code rather than source, but it wouldn't be easy.
Re:Gentoo (Score:2)
Installation from source allows a system to be customised and is extremely powerful. With many home systems equipped with plenty of hard disk space and memory, why can't a system rebuild itself overnight, automatically?
Re:Gentoo (Score:2)
I'm sorry you think an inexperienced person should be afraid of other ways of installing. Please remember that the idea of a GUI installer for an operating system is quite new. Haven't you ever tried to get an operating system up and running with inadequate documentation, a lot of unwritten dependencies and nothing but the command line?
I don't consider myself an evangelist for Gentoo but I want to explain that there is a faster way to run Linux. Wouldn't you agree? Would you be happier putting in a slow distro, getting the hang of it and then moving onto something faster, if you know that things *will* get better?
Re:Gentoo (Score:2)
No, I have worked on a lot of computers in the past and have had to struggle through a lot worse than this. OTOH, my notebook is running RH and I keep Gentoo for some other systems.
I've been in a much worse situation courtesy of Redmond with their dependency hell which forced an operating system reinstall.
Re:inexperience (Score:1)
What were your problems?
Re:inexperience (Score:1)
and then you've got a fully loaded install of Debian.
with:
and then you've got close to Debian, with pretty much everything you'd need for work installed (i.e. KDE, OpenOffice, WINE, Mozilla, and pretty much everything else).
2.5.x are DEVELOPMENT kernels (Score:1)
They aren't perfect.
They may have issues building in certain configurations, because they are development kernels.
Why did they get rid of the old make xconfig? (Score:1)
Re:Why did they get rid of the old make xconfig? (Score:3)
Making new configurators is simple with the new system and I'm sure there will be gtk/whatever else configurators available.
Disk buffers & memory subsystem updated?? (Score:5, Interesting)
To boil it down: the disk buffering in 2.4 is way, way too aggressive and I haven't figured out a way to fix it. I need to be able either to limit the total amount of memory the buffers will use or, better, to tag certain processes so that they will never be pushed into swap to make room for disk buffers (being swapped out "normally" is OK, just not for disk buffers). Or maybe just never swap out any process in favour of disk buffers.
It seems Windows uses a more reasonable disk buffering technique and VMware works better there (especially when using several instances). I don't want to use Windows as my primary OS though because I like the built-in disk encryption and network security of Linux (the ip filter stuff is much better than Windows).
Anyone know if 2.5 has got any better disk buffering?
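Not an answer to the buffer-cache question, but for the "never swap this process out" wish above, Linux does already have mlockall(2). A minimal sketch (it needs root, and it pins one whole process rather than limiting the buffer cache):

/* Pin all current and future pages of this process into RAM so the VM
 * never swaps them out.  Protects this one process only; it does not
 * limit how much memory the disk buffers may use. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }
    /* ... run the memory-sensitive workload here ... */
    return 0;
}

Whether this actually helps with something as big as a VMware instance is another question; locked pages count against the whole machine's memory.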
Re:Disk buffers & memory subsystem updated?? (Score:3, Insightful)
> will never be moved into swap for disk buffers
I believe that this is what the sticky bit was intended for. Before I go about explaining what it is and how to use it, does anyone know if Linux actually *honors* the sticky bit, or does it just have it for compatibility?
Re:Disk buffers & memory subsystem updated?? (Score:3, Informative)
man chmod:
"... the Linux kernel ignores the sticky bit on files. Other kernels may use the sticky bit on files for system-defined purposes. On some systems, only the superuser can set the sticky bit on files."
--
Matt
Disable your swap. (Score:3, Informative)
I'm serious. With another gig costing a hundred dollars -- maybe less -- the overhead of disk-based VM is just no longer justified.
WinXP benefits from this optimization even more than Linux.
Yours Truly,
Dan Kaminsky
DoxPara Research
http://www.doxpara.com
Re:Disk buffers & memory subsystem updated?? (Score:4, Informative)
For example, add the following to
vm.kswapd = 12800 512 8
When there is no free memory, kswapd will free more memory than it would by default.
Same tests on BSD? (Score:2)
Interesting, but not for me... (Score:2)
2.4.19 with preempt and low-latency is snappier by quite a bit than 2.5.47. My test isn't quite as numeric as the story's... I simply start ripping a DVD (oops, did I say that...) to avi while compiling something (in this case xmms), then grab my term window, open limewire, and drag the term window around on the maximized limewire window (as I drag the term window it acts as an eraser on the limewire window, until that window is redrawn). Under 2.4.19 I can never get the whole window grey; under 2.5.47 I can easily grey out the entire limewire window, normally for 2 or 3 seconds before it redraws. Under 2.4.19 I can maybe grey out about one term window's worth of area in the limewire window before it is redrawn.
Of course it states in the story that 2.5 has not been tuned at all really, so hopefully this will improve, but for now I'm sticking with 2.4.19 preempt low latency
Re:Interesting, but not for me... (Score:2)
Re:Interesting, but not for me... (Score:2)
the gentoo-sources kernel has low latency preempt pre-patched
but it doesn't have XFS; they have an XFS kernel as well, but I don't know if it has low-latency and preempt already. I seem to remember something about low-latency and/or preempt causing problems with the XFS kernel, but I might just be smoking crack.
Check forums.gentoo.org. OK, I just did, and yeah, preempt + XFS is a bad idea: the patches fight with each other (XFS trying to journal, preempt trying to let something else use the CPU), and the result is massive instability. So no, I don't think you can get all three to play nice, but you can run low-latency + XFS, or low-latency + preempt; you just can't throw preempt in with XFS. Gentoo is nice and patches the kernel automatically; if you're not running Gentoo, you'd have to patch the kernel yourself.
Re:Interesting, but not for me... (Score:2)
I've tried manually patching in XFS, O(1) and a couple of other odds and ends (latency, preempt) and wind up with something that doesn't even boot, or crashes/panics right afterwards.
So thanks...that pretty much confirms that it's not something I'm doing...
Older boxen? (Score:2, Interesting)
Right now, they're running 2.2.10, iirc; whatever the debian stable had on her boot disks.
I'm not going to compile any kernels until my dual ppro is fixed, because compiling a kernel on a PoS 486 portable is not fun
Anyone have any comments/recommendations on if/which new kernels are good to run on old shite?
Re:Older boxen? (Score:2)
Of course there are lighter webservers such as Boa, but I needed PHP, and the Apache process is taking less than a megabyte of memory.
LVM!!! (Score:2)
Does anyone have a clue when there will be LVM for 2.5?
Re:2.6 (Score:1)
Re:2.6 (Score:1)
Appreciated.
Re:Can't get a speedup of more than 10 (Score:3, Insightful)
Hmm..... sounds like modern business management
Re:Can't get a speedup of more than 10 (Score:3, Informative)
Using Amdahl's Law, one can find that
Speedup = (s + p) / (s + p/N), where N is the number of processors, s is the amount of time spent (by a serial processor) on the serial parts of a program, and p is the amount of time spent (by a serial processor) on the parts of the program that can be done in parallel.
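A quick numeric sanity check of that formula (my own example numbers, nothing from the parent post): with a 10% serial fraction, four processors buy you roughly 3x, and the limit as N grows is 1/s = 10x.

/* Amdahl's law: Speedup(N) = (s + p) / (s + p/N), here with s + p = 1. */
#include <stdio.h>

static double speedup(double s, double p, double n)
{
    return (s + p) / (s + p / n);
}

int main(void)
{
    printf("N = 4:       %.2f\n", speedup(0.1, 0.9, 4.0));       /* ~3.08  */
    printf("N = 1000000: %.2f\n", speedup(0.1, 0.9, 1000000.0)); /* ~10.00 */
    return 0;
}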
While I'm no expert in software engineering (and I haven't really looked over the equation you put too closely) I think it assumes the original was written with some sort of intelligence behind it. I bet I could write some really atrocious code that would be so incredibly inefficient that almost anyone else could get a huge performance gain from it.
I'm not sure if I would have to try hard or not try at all to write really bad code.
Re:Can't get a speedup of more than 10 (Score:2)
There are a whole bunch of ways you can conceal information or mislead readers by claiming really good big-oh times, but this isn't really one of them. (How about a perfect hash table that calculates keys using a O(m^n) hashing algorithm?)
Re:Can't get a speedup of more than 10 (Score:3, Interesting)
I think it assumes the original was written with some sort of intelligence behind it. I bet I could write some really atrocious code that would be so incredibly inefficient that almost anyone else could get a huge performance gain from it.
It doesn't really assume anything. The equation pertains to gains simply by increasing the number of parallel processors, not the strength of the code.
Anyways, this is probably redundant, but the big gain from the new kernel is that more of the work runs in parallel and less of it is serialized. On a single-processor system, performance decreases as there is more overhead in swapping processes in and out. On multi-processor systems, the gains would be enormous.
Well (Score:2, Informative)
For example, suppose you have an algorithm A that takes X time. And then suppose you change it to algorithm B that takes 11X time by making it do algorithm A 11 times. Well algorithm B can be optimized to be 11 times faster by making it algorithm A instead, since they give the same result.
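To make that concrete (deliberately silly toy code, not anyone's real algorithm):

/* algorithm_b does the same work as algorithm_a eleven times over, so
 * "optimizing" B back into A is a trivial 11x win with no deep magic. */
#include <stdio.h>

static int algorithm_a(int x)
{
    return x * x;                 /* takes "X" time */
}

static int algorithm_b(int x)
{
    int r = 0, i;
    for (i = 0; i < 11; i++)      /* takes "11X" time for the same answer */
        r = algorithm_a(x);
    return r;
}

int main(void)
{
    printf("A: %d  B: %d\n", algorithm_a(7), algorithm_b(7));
    return 0;
}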
Anyway, just wanted to make sure no one was missing the "processor-related activities" clause in your statement.
Re:Can't get a speedup of more than 10 (Score:4, Informative)
Amdahl's law makes a (wrong) statement about the amount of speedup that can be obtained through parallel as opposed to serial execution. (By the way, the number 10 doesn't come into it anywhere; you might as well have mentioned the speed of sound.)
Here, we are talking about the comparative performance of two operating systems running on the same number of processors. Since there is no limit on how stupidly the original could have been implemented, there is correspondingly no limit on the amount of possible speedup due to a better implementation.
Anyway, if you think you know something about Amdahl's law, you need to google for "Gustafson's law". Executive summary: Amdahl was wrong. Exactly how wrong is still a matter of debate, but it's generally agreed that it lies somewhere between "very" and "completely". Please don't quote this nonsense in support of anything, just don't do it.
Wow, you can disprove Ahmdahl's law? (Score:5, Informative)
This is a major breakthrough in computer science.
It also is quite unlikely, since Ahmdahl's law is a trivial observation that is completely independent of parallelization or even software engineering (it also applies to hardware design or even accounting). Basically, it says: if initially only 10% of X (CPU cycles, money, whatever you are trying to save) is spent in the part you are optimizing, there is an upper bound of 10% to the X you can save.
I'm very interested in how you can disprove that.
Re:Wow, you can disprove Ahmdahl's law? (Score:3, Informative)
Such rhetoric, oh my.
This is a major breakthrough in computer science.
It also is quite unlikely, since Ahmdahl's law is a trivial observation that is completely independent of parallelization or even software engineering (it also applies to hardware design or even accounting). Basically, it says: if initially only 10% of X (CPU cycles, money, whatever you are trying to save) is spent in the part you are optimizing, there is an upper bound of 10% to the X you can save.
Sorry, wrong law. You seem to be thinking "90% of the time in 10% of the code", a rule of thumb that nobody to my knowledge has ventured to dignify with the term "law". Amdahl's Law (which IMHO doesn't deserve the dignity either) was an attempt to make a statement about the limitations of parallel computing. Relying on wrong assumptions, he drew wrong conclusions, and in the event, parallel clusters have gone on to scale nearly linearly into the tens of thousands of processors, a result he would have liked to have proved impossible.
Read more here [temple.edu].
Re:Wow, you can disprove Ahmdahl's law? (Score:2)
Re:Can't get a speedup of more than 10 (Score:2)
If you had actually tried using google for "Gustafson's law" you would have seen as the first link a paper claiming it and Amdahl's law are identical, not that Amdahl was wrong.
Re:By Sturgeon's Law (Score:2, Insightful)
That is what I thought at first, too. But the original poster is right (in a way): a factor of 10 is about the best you can hope for when parallelizing code, since Amdahl's (or some other guy's) law also says something like 90% of the time is spent in 10% of the code. That makes s=10 and p=90. The limit of his equation, (s+p)/(s+p/n), as n goes to infinity is 10. A number not pulled out of anyone's ass.
Maybe the original poster should be moderated down because I don't think the stuff here is really about parallelization (they talk about speed ups on uniproc systems too), but for the parallel case, he seems to be right.
Re:By Sturgeon's Law (Score:2, Insightful)
No it doesn't. How do you know the 90% is serializable and the 10% isn't? Answer: you don't, there is no relationship whatsoever.
Sheesh.
I'm really sorry. (Score:5, Informative)
In a reply on lkml to Aaron Lehmann's praise of the contest results of the latest 2.5-mm kernel, Andrew Morton [interview] explains some of the important performance and design differences between the 2.4 stable series and the 2.5 development series, accompanied by illustrative benchmarks.
Most significant gains can be expected at the high end such as large machines, large numbers of threads, large disks, large amounts of memory etc. [...] For the uniprocessors and small servers, there will be significant gains in some corner cases. And some losses. [...] Generally, 2.6 should be "nicer to use" on the desktop. But not appreciably faster.
From: Aaron Lehmann
To: linux-kernel
Subject: Re: [BENCHMARK] 2.5.47{-mm1} with contest
Date: Mon Nov 11 2002 - 18:04:53 AKST
On Tue, Nov 12, 2002 at 10:31:38AM +1100, Con Kolivas wrote:
> Here are the latest contest (http://contest.kolivas.net) benchmarks up to and
> including 2.5.47.
This is just great to see. Most previous contest runs made me cringe when I saw how -mm and recent 2.5 kernels were faring, but it looks like Andrew has done something right in 2.5.47-mm1. I hope the appropriate get merged so that 2.6.0 has stunning performance across the board.
From: Andrew Morton
To: linux-kernel mailing list
Subject: Re: [BENCHMARK] 2.5.47{-mm1} with contest
Date: Tue Nov 12 2002 - 02:04:23 AKST
Aaron Lehmann wrote:
>
> On Tue, Nov 12, 2002 at 10:31:38AM +1100, Con Kolivas wrote:
> > Here are the latest contest (http://contest.kolivas.net) benchmarks up to and
> > including 2.5.47.
>
> This is just great to see. Most previous contest runs made me cringe
> when I saw how -mm and recent 2.5 kernels were faring, but it looks
> like Andrew has done something right in 2.5.47-mm1. I hope the appropriate get merged so that 2.6.0 has stunning performance across
> the board.
Tuning of 2.5 has really hardly started. In some ways, it should be tested against 2.3.99 (well, not really, but...)
It will never be stunningly better than 2.4 for normal workloads on
normal machines, because 2.4 just ain't that bad.
What is being addressed in 2.5 is the areas where 2.4 fell down: large machines, large numbers of threads, large disks, large amounts
of memory, etc. There have been really big gains in that area.
For the uniprocessors and small servers, there will be significant gains in some corner cases. And some losses. Quite a lot of work has gone into "fairness" issues: allowing tasks to make equal progress when the machine is under load. Not stalling tasks for unreasonable
amounts of time, etc. Simple operations such as copying a forest of files from one part of the disk to another have taken a bit of a hit from this. (But copying them to another disk got better).
Generally, 2.6 should be "nicer to use" on the desktop. But not appreciably faster. Significantly slower when there are several processes causing a lot of swapout. That is one area where fairness really hurts throughput. The old `make -j30 bzImage' with mem=128M takes 1.5x as long with 2.5. Because everyone makes equal progress.
Most of the VM gains involve situations where there are large amounts of dirty data in the machine. This has always been a big problem
for Linux, and I think we've largely got it under control now. There are still a few issues in the page reclaim code wrt this, but they're
fairly obscure (I'm the only person who has noticed them).
There are some things which people simply have not yet noticed.
Andrea's kernel is the fastest which 2.4 has to offer; let's tickle its weak spots:
Run mke2fs against six disks at the same time, mem=1G:
2.4.20-rc1aa1:
0.04s user 13.16s system 51% cpu 25.782 total
0.05s user 31.53s system 63% cpu 49.542 total
0.05s user 29.04s system 58% cpu 49.544 total
0.05s user 31.07s system 62% cpu 50.017 total
0.06s user 29.80s system 58% cpu 50.983 total
0.06s user 23.30s system 43% cpu 53.214 total
2.5.47-mm2:
0.04s user 2.94s system 48% cpu 6.168 total
0.04s user 2.89s system 39% cpu 7.473 total
0.05s user 3.00s system 37% cpu 8.152 total
0.06s user 4.33s system 43% cpu 9.992 total
0.06s user 4.35s system 42% cpu 10.484 total
0.04s user 4.32s system 32% cpu 13.415 total
Write six 4G files to six disks in parallel, mem=1G:
2.4.20-rc1aa1:
0.01s user 63.17s system 7% cpu 13:53.26 total
0.05s user 63.43s system 7% cpu 14:07.17 total
0.03s user 65.94s system 7% cpu 14:36.25 total
0.01s user 66.29s system 7% cpu 14:38.01 total
0.08s user 63.79s system 7% cpu 14:45.09 total
0.09s user 65.22s system 7% cpu 14:46.95 total
2.5.47-mm2:
0.03s user 53.95s system 39% cpu 2:18.27 total
0.03s user 58.11s system 30% cpu 3:08.23 total
0.02s user 57.43s system 30% cpu 3:08.47 total
0.03s user 54.73s system 23% cpu 3:52.43 total
0.03s user 54.72s system 23% cpu 3:53.22 total
0.03s user 46.14s system 14% cpu 5:29.71 total
Compile a kernel while running `while true;do;./dbench 32;done' against
the same disk. mem=128m:
2.4.20-rc1aa1:
Throughput 17.7491 MB/sec (NB=22.1863 MB/sec 177.491 MBit/sec)
Throughput 16.6311 MB/sec (NB=20.7888 MB/sec 166.311 MBit/sec)
Throughput 17.0409 MB/sec (NB=21.3012 MB/sec 170.409 MBit/sec)
Throughput 17.4876 MB/sec (NB=21.8595 MB/sec 174.876 MBit/sec)
Throughput 15.3017 MB/sec (NB=19.1271 MB/sec 153.017 MBit/sec)
Throughput 18.0726 MB/sec (NB=22.5907 MB/sec 180.726 MBit/sec)
Throughput 18.2769 MB/sec (NB=22.8461 MB/sec 182.769 MBit/sec)
Throughput 19.152 MB/sec (NB=23.94 MB/sec 191.52 MBit/sec)
Throughput 14.2632 MB/sec (NB=17.8291 MB/sec 142.632 MBit/sec)
Throughput 20.5007 MB/sec (NB=25.6258 MB/sec 205.007 MBit/sec)
Throughput 24.9471 MB/sec (NB=31.1838 MB/sec 249.471 MBit/sec)
Throughput 20.36 MB/sec (NB=25.45 MB/sec 203.6 MBit/sec)
make -j4 bzImage 412.28s user 36.90s system 15% cpu 47:11.14 total
2.5.46:
Throughput 19.3907 MB/sec (NB=24.2383 MB/sec 193.907 MBit/sec)
Throughput 16.6765 MB/sec (NB=20.8456 MB/sec 166.765 MBit/sec)
make -j4 bzImage 412.16s user 36.92s system 83% cpu 8:55.74 total
2.5.47-mm2:
Throughput 15.0539 MB/sec (NB=18.8174 MB/sec 150.539 MBit/sec)
Throughput 21.6388 MB/sec (NB=27.0485 MB/sec 216.388 MBit/sec)
make -j4 bzImage 413.88s user 35.90s system 94% cpu 7:56.68 total - fifo_batch strikes again
It's the "doing multiple things at the same time" which gets better; the
straightline throughput of "one thing at a time" won't change much at all.
Corner cases....
Re:Make it simple please (Score:1)
It is simple, tar -xvzf linux-{current}.tar.gz.
cd linux; make menuconfig; make dep bzImage modules modules_install
Assuming you can do a bzImage on your platform....
Anyway, it's not so hard; you just need to know what hardware is in your machine and what you actually want to work out of that hardware, then turn it on. {grin}
Re:Make it simple please (Score:5, Interesting)
cd linux; make menuconfig ; make dep bzImage modules modules_install
You're joking, right? How many options in 2.5.47 must be selected in order for your run-of-the-mill $9 generic PS/2 keyboard to work? I can't tell you how much fun it was building 2.5.47, missing one *somewhere*, and suddenly finding I couldn't do anything because my keyboard stopped working.
The kernel only has an expert mode. It would be nice if there were a higher order config that asked you basic questions and built the things you were most likely to need, with the option of going into a more expert mode if you needed to fine tune something.
Re:Make it simple please (Score:3, Insightful)
Sometimes people shouldn't mess with stuff, and the kernel is one of those things. RedHat does a good job with their builds, and an average user doesn't need to rebuild it at all. A more experienced user might want to tweak, but then he can use make menuconfig or make config... and choose his options.
My grandmother will never recompile her kernel.
Re:Make it simple please (Score:1, Insightful)
There are. They are called RedHat, Mandrake, SuSE, etc.
Re:Make it simple please (Score:2, Insightful)
make xconfig && make dep && make bzImage && make modules && make modules_install && make install
Re:Make it simple please (Score:2)
make xconfig dep clean bzImage modules modules_install
-adnans
Re:Make it simple please (Score:2)
Of course, you have to be using a BSD kernel. Theres nothing wrong with using GNU userland tools and a BSD kernel...
Re:Make it simple please (Score:2)
make oldconfig dep clean modules modules_install install
Yes, oldconfig is nice when you already have a working .config from a previous kernel.
Re:Make it simple please (Score:5, Informative)
Quick hint: install the kernel sources that came with your dist and use the .config file found in them to compile first; those are the settings your kernel was compiled with. Then you can use make xconfig to alter a known-working config. Good luck.
I hear you... (Score:2)
I'm blessed to have friends that know more than I do and are willing to help me out when I get stuck.
Compiling the kernel is something I haven't attempted since 386DX40 days.
Re:Make it simple please (Score:4, Informative)
I just finished *this morning* compiling a 2.2.22 (yes, RH-6.2) for my box. Use the
Then save the new configuration. Do a 'make dep bzImage modules modules_install' and copy the ~/System.map file as System.map-new.kernel.number and drill down to
from
Modify
Reboot into new kernel. If you get lots of error messages about modules not loading, reboot at the command prompt, and everything will have been rewritten magically. Use your new kernel for testing. You may find you want to try another configuration. Do it all again, changing the Makefile each time under line 3 EXTRAVERSION with another digit or letter to keep it from overwriting a working kernel when you copy in to
Frankly, I've tried nine builds and although my kernels are smaller than stock, use about 5Kb less RAM and benchmarks seem to indicate about 5-6 per cent increase in speed, I feel no difference in use.
I do feel better knowing I am using the latest (and perhaps the last) kernel in the 2.2.x series, though. FWIW.
Re:The data from the benchmarks is pasted here: (Score:1)
Re:Linux Benchmarks (Score:4, Funny)
Re:Linux Benchmarks (Score:5, Funny)
Apparently he works for a [slashdot.org]
development firm, [slashdot.org]
studies meteorology, [slashdot.org]
works for a Verizon store at a local mall, [slashdot.org]
owns a chain of pet stores in London, and [slashdot.org]
has a thing for CmdrTaco.
Read together, they make amusing reading.
Re:Linux Benchmarks (Score:2, Funny)
Cone dumping is a problem... hopefully, this new vehicle [ucdavis.edu] solves it.