Dual Athlon Preview: Linux Kernel Compile Smokes
Mr. Flibble writes: "The fellows over at NewsForge have an article describing how they were able to test the 'World's First Dual DDR Athlon' running Mandrake 7.2 on a prerelease motherboard and chipset. The surprising thing is that the dual system was 142% faster in a kernel compile than a single processor system!" Jeff (of NewsForge) says this is the genuine truth. Now if only the right motherboards would start showing up in quantity on pricewatch ...
142% Faster? Ok, but did you............. (Score:4)
Ah, yes... I can see it now.
Marketing Bozo: "Ok folks, here's the 'before' version. Wow, that's mighty slow! Let's Ctrl-C out of that kernel build, drop in that second Athlon, and build that kernel again!"
A few minutes of fiddling pass
Crowd: (ooooh---ahhh!)
Marketing Bozo: "Ok! Off we go! Wow, look at that sucker haul! It's nearly 150% faster than the single processor version!"
Crowd: (ooooooh!---ahhh!) (clap clap clap)
A voice from the back of the crowd speaks...
Bowie: Hey dickweed! You forgot to MAKE CLEAN!
A fight breaks out..
Crowd: "Kill him!! KILL HIM! He runs that PROPAGANDA [tilez.org] page! His words and ideas bring fear, destruction and DEATH to all who listen!! KILL HIM!!"
The sounds of the beating continue as Marketing Bozo takes pre-orders for his motherboard..
Just another day at the convention..
SMP becoming a standard? Umm...probably not. (Score:1)
Umm...I wish, but I don't think so. The biggest "get SMP into the hands of the masses"-style event I can think of was when people realized they could put Celerons into special SMP-enabling slockets and put them in (relatively) cheap SMP motherboards.
This Tyan board isn't ATX form factor (look at it, it's *HUGE*), and it has 64-bit PCI slots. Somehow, I doubt it'll be a cheap board either. I'll bet it'll MSRP at over $500 for just the mobo.
Besides, Intel doesn't seem as committed to SMP as they once were. They've stratified their CPU lines, and now they make a lot of Celerons and P-IIIs (some models) that physically cannot do SMP. It used to be (well, 2-3 years ago) that _every_ Intel CPU could do SMP, but it isn't so anymore.
FWIW, all Athlons can do SMP, but there are no boards on the market that support it, and even when this one makes it to market, it'll probably cost a mint and require a special case/PS.
Re:Its the width (Score:1)
But we also have a passive heatsink that IS rated at 1.2 GHz, and you can have a special fan that blows through from the side.
So there definitely are ways to fit it into 1U. Of course, whether my boss decides to use this motherboard in 1U systems or not is up to him.
Re:And here's the pic (Score:1)
Only took 3m50.767s on my dual PIII 500 box (Score:1)
3m50.767s
Compared to a 2m compile on the dual Athlon 1.2GHz w/256 MB RAM. That doesn't make the dual Athlon look very good, IMHO.
I just unpacked a fresh kernel tree, then:
make config (used defaults)
time make -j3 bzImage
Anyone else have any results to share??
--
Re:smoking against physics (Score:1)
I.e. if I'm running at half the speed you are, does it make me faster than you by 50%? Maybe not.
--
this vendor is a crook (Score:1)
___
Re:Not quite a perfect comparison (Score:1)
Re:Linux SMP kernel "does the right thing." (Score:1)
LOL! And they run in and work for five minutes and then run back out? (That would complete the analogy.)
Well, the EV6 bus that the Athlon uses is a point-to-point link (hence, technically, not a bus...), so it is indeed compatible with a crossbar. I remember at least one company that was going to make crossbars for multiple Athlons - I can't remember their name - but I do remember them closing down.
-
Proper make usage? (Score:1)
make -j3 'MAKE=make -j3' bzImage
Re:dual 1.2GHz Athlon not all it's cracked up to b (Score:1)
The Athlon's 64-bit path to second-level cache cannot compete. It would be nice if AMD optimized their processors for gcc, but honestly I think SpecWeb and Quake are more important benchmarks for them.
make -j3... NOT! Try this on for size! (Score:1)
make "MAKE=make -j3" -j3 bzImage
Be sure you've got plenty of memory (I've got 768M)
Yeah, but ... (Score:2)
Did they try to boot the kernel they compiled? =)
Re:Ace's Hardware (Score:5)
In order to be scientific, you need a control. I'm sorry to say that this reviewer did no such thing. You point out that -j helps even for single-CPU systems, and this definitely was the case with my test results (I can go dig them up if anybody is interested). BUT, there is a limit to the performance enhancement of -jxxx, since a single task running at full throttle is much faster than 2 or 10 tasks switching back and forth. So what I did, for both single and dual CPU modes, was run with the bare make, then -j 2, then -j 3, -j 4, and finally -j 5 (where performance was being hurt).
I don't recall exactly, but I believe beyond -j 4 I was swapping to disk (though I know I hit that phenomenon at a sufficiently large number).
Another problem with the experiment was that the slower method was run first. There is the issue of disk caching: the second test stood the chance of having key libraries and possibly most of the source code still in cache at launch, which would dramatically reduce the I/O latency. An ideal test of CPU performance would be to put half a gig of memory in there, run it through once, "reboot", then run it for the other. This is precisely what I did, and I do believe there were several seconds shaved off for cached recompiles.
Personally, I like dual proc's just so I can watch xosview's dual-CPU meters flop back and forth.
-Michael
Re:The lie of -j3 and no "make dep" (Score:2)
You mean two processes (jobs as make calls it).
That reminds me.. When is someone going to add make and gcc libraries to Perl? I want to be able to use Perl as the "process-glue" between all these steps so that building does not require forking / reparsing of all those damn
-Michael
Re:I stand corrected... (Score:2)
The test they ran does not indicate the benefits of dual CPU alone. It shows the benefits of dual CPUs combined with the benefits of running multiple compiles at the same time. That's why you end up with more than 100% increase.
Re:will the Dual athlons be socket or slot? (Score:1)
blessings,
What about caching? (Score:2)
The disk itself will be doing more caching on the second time through, as will the RAM disk cache, and various other caches (even the caching of gcc itself...) Also, does a `make clean` _really_ clean the tree back to pre-compile stage?
To do this properly would require two separate kernel trees to compile, and a reboot in the middle, and preferably SMP kernel vs non-SMP kernel in the reboot... The other way, which is more practical in the circumstances, could be to try doing a `make -j1; make clean; time make -j1` followed by a `make -j2; make clean; time make -j2`... That would be closer to reality, but still not quite...
rr
Re:142% Faster? Ok, but did you............. (Score:1)
(that's why you need to do these tests in a ramdrive, people; whoever thinks that compiling the Linux kernel is a Scientific Benchmark please Shut The Fuck Up--I suggest killing yourself as the preferred method of Shutting The FUCK Up...)
As long as the price is reasonable... (Score:2)
I really want a dual Athlon Abit board
What's that noise?! (Score:5)
The noise you hear is the sound of thousands of single-CPU
Re:Great! (Score:1)
Re:I've noticed this too (Score:2)
Because two of these can run at the same time (one per CPU), the data doesn't have to be written to the disk between stages.
The data doesn't have to be written to disk anyway -- that's the entire point of the -pipe option in gcc. All data is written to stdin of the child processes, resulting in faster compiles because nothing ever hits the disk until the assembler outputs the object file.
Multi-processor builds are faster because make (with the -j option) compiles several files in parallel (ideally, one per processor), not because data is piped from stage to stage in the compiler.
The reason they use make -j3 on a two-processor system is to take advantage of the fact that compiling programs is both a compute-bound and I/O-bound operation. While one instance of the compiler is waiting for data to be read, another instance is busy generating code. Both processors are always in use, even when one of the instances of gcc is stalled waiting for an I/O operation to complete.
---
The Hotmail address is my decoy account. I read it approximately once per year.
make -j3.... (Score:2)
alias make='make -j2'
then you do a make. The reason, I believe, was that make would then go into each subdir and use the make -j2 command rather than just going serial after the top dir? Hmmm, it's been some time so I am not completely sure, but that's what I recall... Anyone have more info on that, and should they have done something like that?
142% (Score:2)
Why do people consider 142% for two processors impressive?
Re:Benchmark was incorrect! (Score:1)
It's getting more expensive and less common (Score:2)
Re:Yes but... (Score:2)
Sounds a bit slow to me. What you really want is 20 second kernel compile times [linuxcare.com.au] :-)
DDR v.s. Rambus (Score:2)
This of course requires that enough memory is used to fully cache the disk and alleviate disk-IO latency.
-Michael
Re:Ace's Hardware (Score:1)
Any thoughts?
Sweet (Score:1)
Re:Ace's Hardware (Score:1)
I wonder how fast one could make a compiler if it were persistent, so that all the common #includes didn't need to be reparsed for every source file.
Re:DDR v.s. Rambus (Score:2)
Re:Not quite a perfect comparison (Score:2)
142%? (Score:1)
this test is a total bullshit (Score:1)
Re:Not quite a perfect comparison (Score:1)
Basically, any supra-linear speedup makes me worry -- I'm willing to believe a bit of superlinearity from preloaded disk caches and what not, but 142% is quite a bit more than I can accept at face value.
Great! (Score:1)
Twice is Nice! (Score:2)
Re:Switches invalidate the results (also: 4-way SM (Score:1)
Ace's Hardware (Score:5)
"Unfortunately, the benchmarks vary significantly between the two tests in that the first is completely serialized while the second (dual-processor) test is run with three parallel make processes (notice the -j flag). Because the first system is running with only a single build instance, the processor is spending a great deal of time simply waiting on IO. Meanwhile, the dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles.
"Until then, it is very difficult to make a representative statement about the performance of a dual-processor Athlon system from this benchmark."
-----------------------------------------
Re:What's that noise?! (Score:2)
SOMEONE PLEASE MOD IT UP TO SUPER FUNNY!!! darn i posted too early!!!
Re:Not quite a perfect comparison (Score:2)
Single-processor system, compiling linux-2.2.18 w/ ReiserFS patch 3.5.29.
'make clean && make dep && time make bzImage':
real 7m24.803s
user 6m30.070s
sys 0m39.630s
'make clean && make dep && time make -j3 bzImage':
real 7m9.606s
user 6m28.400s
sys 0m38.910s
This is a relatively monolithic kernel; only sound is modular, everything else I need is compiled in. So, doing a 'make -j3' on *my* uniprocessor system yields an absolutely <sarcasm>*MASSIVE*</sarcasm> 15.2 second gain.
In short, while I wouldn't make any bets on the benchmark these fellows did, I don't think the results are as useless as most people seem to think.
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
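For anyone who wants to check the 15.2 second figure, here is a minimal Python sketch that converts the `real` lines from bash's `time` output into seconds. The helper name `parse_real` is made up for this example; the two timings are the ones quoted in the post above.

```python
import re

def parse_real(text):
    """Convert a bash `time` value such as '7m24.803s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

serial = parse_real("7m24.803s")    # plain `make bzImage`
parallel = parse_real("7m9.606s")   # `make -j3 bzImage`

gain = serial - parallel            # seconds saved by -j3
gain_pct = 100 * gain / serial      # share of the serial wall-clock time

print(f"gain: {gain:.1f} s ({gain_pct:.1f}% of the serial run)")
```

On that machine the parallel make saves only a few percent of the wall-clock time, which is the poster's point.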
Re:142% (Score:2)
If they had said that the dual athlon system was 142% as fast as the uniprocessor system, then it would have been disappointing, but 142% faster is more than twice as fast.
Of course, the benchmark was flawed, as many others have pointed out, so the real numbers may not be as impressive.
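The difference between "142% faster" and "142% as fast" is easy to get wrong, so here is a small Python sketch of the arithmetic. The function names are invented for illustration, and the run times are placeholders, not measurements from the article.

```python
def percent_faster(old_seconds, new_seconds):
    """How much faster the new run is, as a percentage of the old speed."""
    return (old_seconds / new_seconds - 1) * 100

def percent_as_fast(old_seconds, new_seconds):
    """Speed of the new run expressed as a percentage of the old speed."""
    return old_seconds / new_seconds * 100

# Placeholder numbers: a build that drops from 242 s to 100 s runs at
# 2.42 times the old speed, i.e. more than twice as fast.
print(f"{percent_faster(242.0, 100.0):.0f}% faster")

# A build that drops from 142 s to 100 s is only 1.42 times the old speed.
print(f"{percent_as_fast(142.0, 100.0):.0f}% as fast")
print(f"{percent_faster(142.0, 100.0):.0f}% faster")
```

So a claim of "142% faster" implies a speed ratio of 2.42, while "142% as fast" would mean a ratio of just 1.42.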
Re:its both DDR & dual CPU system (Score:2)
Uhm, NO, sorry. Wish it were true, though. The 760MP chipset supports up to TWO processors only. You'll see >2 proc support chipset(s) from other vendors, later on.
Re:DDR != Dual (Score:1)
What about NAT?
--
This headline might be rewritten soon: (Score:1)
The Article.. (Score:1)
I realize, and realized when I published the article, that I could not be 100% accurate for a number of reasons. And I stated many times that this is preliminary/experimental in every way. This is the result of both prerelease hardware and the fact that I could not physically access the box and had no way of removing or disabling one of the CPUs in order to do accurate testing of the -j switch. I realize that these results, therefore, are not the most accurate; however, I wrote this merely to give people an idea of what Dual Athlon performance would be like. That is why this article is labeled as prerelease, a preview, etc. Trust me, when I get a final release motherboard that supports DP Athlons, I will be doing a full and correct evaluation, but for now I figured that the best thing to do would be to get some results out there, just to give an idea. You can do Single Processor benchmarks of an Athlon on your own, and I would encourage you to do so and post them here.
Thanks for the feedback, 99% of which has been very constructive. I will of course be attempting to get some quality time alone with this thing (*wink wink nudge nudge*), but for now this is the best I was allowed to do. I figured a glimpse would be better than nothing.
Wee,
-Jeff
Can MPEG4 compression be parallelised? (Score:1)
I haven't kept in touch with the MPEG standards, but I assume at least some multi-processor usage is possible.
I don't care how long a kernel compile takes. With a monolithic kernel, you're going to spend most of your time in the link stage (my work project takes about 15 minutes to link (on a huge Sun server), purely because it can only use one of the 4 Ultrasparc chips) and thus isn't a good multi-processor benchmark.
FatPhil
-- Real Men Don't Use Porn. -- Morality In Media Billboards
how many fans (Score:1)
Re:And here's the pic (Score:1)
-motardo
Face it, that guy doesn't know how to benchmark. (Score:1)
I don't want to troll, but this makes no sense, the only way you could do more than 100% increase is if the compiler is optimised for dual processor systems and not for single (which wouldn't make sense at all) OR the guy obviously changed a key/switch in the benchmark tests from a system to another. Which voids the benchmark in the first place.
Linux compile is an okay benchmark, but how about something with less confusion, where you can't play with settings too much and the application is both single and dual CPU capable?
I personally use Lightwave3D a lot, so I benchmark using this software. It gives a good idea on a system-to-system basis and supports up to 8 threads. There are already scenes that are standard on the CD for benchmarking, so people can see if their system is better than another or how much performance has increased. Yes, I know, Lightwave doesn't run on Linux, but the point is it's a benchmark that can be run cross-platform and people can rely on the numbers. I want a dual Athlon system, everybody wants one or almost, but I don't want "overjoy" to overrate the numbers; we have enough of Intel's marketing dept for that sort of bs.
Re:142%? (Score:4)
No, it was a bogus test. The author did "make bzImage" for the uniprocessor test and "make -j3 bzImage" on the multiprocessor test. If he'd done "make -j3 bzImage" for both, he would've discovered that the machine sped up by less than 100% most likely.
The thing is, "make -jX " for about 1 < X <= 4 still gives a speedup on uniprocessor systems because some compile tasks can be in disk-wait while others sit on the CPU. (The optimal number for X depends on how fast your disks are and how much RAM you have. If X is too big, you start swapping, and end up losing performance.)
--Joe--
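A toy model makes the effect visible. The Python sketch below assumes each compile job alternates between a CPU burst and a disk wait, that jobs never compete for the disk, and that there is enough RAM to avoid swapping. The 80/20 split is a made-up figure, not a measurement, and `busy_cpus` is a name invented for this note.

```python
def busy_cpus(jobs, cpus, cpu_time, io_wait):
    """Average number of busy CPUs in a toy model.

    Each job spends cpu_time computing and io_wait blocked on disk, so on
    its own it keeps a CPU busy cpu_time / (cpu_time + io_wait) of the time.
    """
    demand = jobs * cpu_time / (cpu_time + io_wait)
    return min(cpus, demand)

# Assumed workload: 80% compute, 20% disk wait (hypothetical numbers).
CPU_TIME, IO_WAIT = 0.8, 0.2

serial_uni = busy_cpus(jobs=1, cpus=1, cpu_time=CPU_TIME, io_wait=IO_WAIT)
j3_uni = busy_cpus(jobs=3, cpus=1, cpu_time=CPU_TIME, io_wait=IO_WAIT)
j3_dual = busy_cpus(jobs=3, cpus=2, cpu_time=CPU_TIME, io_wait=IO_WAIT)

# Throughput is proportional to busy CPUs, so the ratios are speedups.
print("plain make on one CPU vs -j3 on two CPUs:", j3_dual / serial_uni)
print("-j3 on one CPU vs -j3 on two CPUs:       ", j3_dual / j3_uni)
```

Under these assumptions the mismatched comparison shows a larger-than-2x speedup while the like-for-like comparison tops out at 2x. The extra comes from hiding the I/O waits, not from the second CPU.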
Re:142%? (Score:1)
Hmm... could be a RAM issue, or a front-side bus issue. I know my fiancee's dual 450MHz Pentium II box (which has a 100MHz FSB) is about 3x faster at building the kernel (just shy of 3x) than my single 300MHz Pentium II (with 66MHz FSB). Between MHz and SMP, she got nearly 100% of the extra CPU's benefit, and this is with a 2.2.x kernel.
(It's my turn for a CPU upgrade next. Dual 1.2GHz Athlon with DDR is my current dream machine.)
--Joe--
Re:Ace's Hardware (Score:1)
Ah yes... I remember the days of Turbo C's precompiled header files. :-)
Not a bad idea, actually, but really a lot of the compilation time in the compiler is now spent in the optimization stages, not the parsing stages. (Particularly in the compiler I use at work.)
--Joe--
Re:Ace's Hardware (Score:2)
--
Re:Not quite a perfect comparison (Score:1)
-Yenya
--
Re:And here's the pic (Score:2)
It has no ISA ! Blasphemy ! I bet it doesn't even have ROM BASIC
--
Re:And here's the pic (Score:1)
The positioning of the power socket is really strange - smack in the middle. Notice some other interesting features - on the edge closest to the camera those black trapezoid sockets look like they might be SCSI? And what's the large beige rectangle right next to the Intel logo chip?
I sure hope they get this out soon. I have a dual PIII system right now, and an 800MHz Thunderbird also. The Athlon is as fast as the PIII system for most tasks, except when multitasking - playing MP3s at the same time really slows the single-CPU system, while the dual-PIIIs are barely affected. While SMP systems don't tend to go faster on specific tasks (as the link in the main story would have you believe) the real benefit is that you never have to wait - while something is processing, just start another window and do something else. Hope they include ATA100 raid, like many of the slot-A boards do now.
(Ob-bedroom-hardware-review-site comment: "Obviously this is an extrememly stable board, because there are lots of capacitors and they are big ones.")
Re:Not quite a perfect comparison (Score:1)
um, what does that have to do with parallelism? compiling driver xyz as a module doesn't give you a more multithreaded kernel or any other speed advantage. it just means you can add and remove features as needed for a small memory gain. trivial, at today's prices.
--
Re:How fast does it (Score:1)
A.
I stand corrected... (Score:2)
lamer benchmark (Score:3)
This suggests that they made a kernel on the same system before, and try to ``undo'' the make.
This is stupid. Why? Because:
Then make a ``test config'' .config-file, for example with ``cp arch/i386/defconfig .config; make oldconfig'' (and press a couple of enters). Copy this file to ``Testconfig'' or something.
Now start the system with the single processor kernel and run the following:
make mrproper; cp Testconfig .config; make oldconfig; make dep; time make -j$N bzImage
Now reboot the system and run the dual processor kernel. Recompile, with -j$N maybe going up to 4 or 5 or so.
Now *that* is something that comes close to a benchmark.
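If you'd rather script the timing than read it off `time`, something like the Python sketch below could wrap each run. `time_command` is a name invented here, and the demo call times a trivial Python child process rather than a kernel build; swap in the make command line from above inside a real source tree.

```python
import subprocess
import sys
import time

def time_command(argv, cwd=None):
    """Run argv to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True)
    return time.perf_counter() - start

# Harmless demo so the sketch runs anywhere. For the real thing, pass e.g.
# ["make", "-j3", "bzImage"] with cwd set to the kernel source directory.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

This only measures wall-clock time; it does not report the user and sys figures that the shell's `time` gives you.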
Re:Not quite a perfect comparison (Score:1)
He makes his point well - the benchmarks probably aren't that bogus. However, they are still bogus to an extent, and thus, well, bogus. :) It would be interesting to see a non-weighted benchmark of this type. This takes away from any actual speed that the dual board config might actually boast. For instance, now a 125% increase won't seem quite as impressive as it truly is to most individuals. :)
-------
CAIMLAS
Possibly among the hardcore. (Score:2)
I agree on the power supply; Athlons have a big draw, and duals will be worse of course. I disagree on the cost. Remember the BP6 from Abit? It was a big success because it took cheaper chips (Celerons) and created SMP systems at a price that was reasonable.
Here we have Athlons, offering a far better price/performance ratio than anything Intel has to offer. If Abit comes out with a board in the BP6 price range, I bet it's more popular than the BP6. Remember, the BP6 has its own website [bp6.com]; few motherboards can claim that.
Linux SMP kernel "does the right thing." (Score:3)
The section "Hardware Cache" in Chapter 2, Memory Addressing, explained that the contents of the hardware cache and the RAM maintain their consistency at the hardware level. The same approach holds in the case of a dual processor.
Hope this helps
Cheers,
Andrew
Re:Not quite a perfect comparison (Score:2)
9m47s real, 9m12s user, 0m44s sys.
Same 5100, second processor in, the time is:
5m19s real, 9m25s user, 0m43s sys.
The result? Only an 82% speed increase. Which should be typical for all Intel based SMP systems (Loss numbers run 12-17% cumulative per additional CPU, depending on mobo.)
Compare that loss figure with, say the AS/400. Quoted loss figure is less than 3% cumulative per CPU.
3v1l_b0r1s at d4rkr0ck d0t c0 d0t uk
Lies, statistics and benchmarks (Score:2)
2/ They used -j 3 for the bi-processor one, while the first one was probably I/O bound.
3/ They did the mono-processor first, then the bi-processor. The disk cache may have helped the second compile.
A correct way to test would be:
foreach n (1 2 3 4)
1/ Boot the machine with a mono processor kernel
2/ time make -j $n bzImage ; make clean
Repeat with multi processor kernel.
If one doesn't want to reboot between each test, then one should do something like:
make -j $n bzImage ; make clean ;
before running the real tests.
Cheers,
--fred
Re:How fast does it (Score:2)
Have you recompiled your kernel to remove checks for hardware you don't have?
Is Windows 2000 starting some of its services after the GUI appears, just giving the impression it's finished booting when, in fact, it's still doing stuff in the background? (I know NT 4 does that...)
Beta testing (Score:2)
---
Some comparable benchmarks... (Score:5)
System: SuSE 7.0, kernel 2.4.1 compiled with Uniprocessor and APIC/IO_APIC.
Athlon 1.1GHz, Asus A7V motherboard. FSB is 100MHz DDR. Memory is 256 megs at PC133, ATA66 5400RPM drive with ReiserFS.
I performed three series of tests. All tests were performed in single/double/triple thread order, and each thread compile had its own directory.
First test, all three had been make config'd per the original article, followed by make dep. After that, I rebooted and did all three compiles without rebooting. Second series started the process over again by make mrproper/make oldconfig/make dep/time make -jN bzImage, with N being the corresponding thread. Finally, I did a make mrproper/make oldconfig/make dep and rebooted each time before the compile.
I should note that on several occasions, I got Odd results; whether this was caching of some sort or not I don't know, but I would get 3m35s on a single thread and 1m9sec on a -j2 with a removed and recreated directory, as well as one or two other occasions - unfortunately, all the other occasions were when I was accidentally failing to use "time make -j2 bzImage" and instead was only doing "make -j2 bzImage", so I have no empirical proof. At any rate, here's the recorded ones.
Round 1
Straight
real 3m17.571s
user 2m54.660s
sys 0m13.120s
-j2
real 3m13.772s
user 2m58.390s
sys 0m13.390s
-j3
real 3m13.470s
user 2m59.390s
sys 0m13.180s
Round 2
Straight
real 3m8.048s
user 2m54.780s
sys 0m13.140s
-j2
real 3m11.912s
user 2m58.050s
sys 0m13.590s
-j3
real 3m12.532s
user 2m58.370s
sys 0m13.900s
Fresh-boot compile
Single thread was not redone; it was the Round 1.
-j2
real 3m15.634s
user 2m58.030s
sys 0m13.700s
-j3
real 3m16.433s
user 2m59.310s
sys 0m13.290s
As you can see, not much of a variation on here. The times are also a hell of a lot better than a 1.2GHz system single-threaded with DDR SDRAM, which makes me wonder what precisely is slowing down the 1.2GHz...
Food for thought.
Re:And here's the pic (Score:3)
The reclining DIMM slots are there because the DDR memory for this motherboard is fairly tall, and Tyan would like to be able to use this motherboard in 1U rack systems. The reclining DIMM slots do waste a lot of real estate, which could've been used to place the power connector closer to the edge. But the market for 1U rack mount systems appears to be growing rapidly, so I think the reclining DIMM slot is very important.
For those of you who are complaining that you just bought a system, or who are looking to buy one, this board isn't even supposed to be announced until March, so don't hold your breath.
Re:142%? (Score:2)
Ok, I can definitely see that if you have two isolated tasks, each of which would fit in the cache on its own, but both of which would thrash each other in a single-CPU environment, you could get a superlinear speedup by going to multiple CPUs.
--Joe--
Re:Not quite a perfect comparison (Score:2)
Normally when you benchmark you try to change only one factor. If you change the code you're running and the number of CPUs then how do you know how much each factor is affecting the result?
Re:Lies, statistics and benchmarks (Score:2)
Correct me if I'm wrong, as I don't own an SMP system, but don't you have to reboot to change from uniprocessor mode to dual processor mode? I seem to remember reading something about pulling out the processor and installing a terminator in the socket. Of course, there may be an easier way to disable one processor, but I still think that you'd have to reboot in order to make the switch.
Don't Bogart that joint, my friend... (Score:2)
Not pricewatch! (Score:2)
The lie of -j3 and no "make dep" (Score:4)
Single thread:
597.00user 46.40system 12:11.08elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (789303major+881687minor)pagefaults 0swaps
Two threads on one processor:
511.41user 31.30system 9:21.66elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (489357major+669019minor)pagefaults 0swaps
By the same logic as they used in this benchmark, my uniprocessor system is thus 31 percent faster than the same old uniprocessor system. Bah! I just wish people weren't posting nonsensical benchmarks like this. At least they should _try_ to make it somewhat representative...
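The `%CPU` column in that output already tells the story. As a rough check in Python (the function and variable names are mine; the numbers are copied from the two runs above):

```python
def elapsed_seconds(text):
    """Convert a GNU time elapsed value such as '12:11.08' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)

def cpu_utilization(user, system, elapsed):
    """Fraction of wall-clock time the CPU spent doing work."""
    return (user + system) / elapsed

# (user, system, elapsed) as reported for each run.
single = (597.00, 46.40, elapsed_seconds("12:11.08"))
double = (511.41, 31.30, elapsed_seconds("9:21.66"))

single_util = cpu_utilization(*single)
double_util = cpu_utilization(*double)
percent_faster = (single[2] / double[2] - 1) * 100

print(f"one job:  CPU busy {single_util:.0%} of the time")
print(f"two jobs: CPU busy {double_util:.0%} of the time")
print(f"two jobs finish about {percent_faster:.0f}% faster")
```

The single job leaves the CPU idle for a noticeable slice of the run, presumably waiting on disk. A second job fills most of that gap, and the build comes out roughly 30% faster on the very same processor, close to the figure quoted.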
Re:Not quite a perfect comparison (Score:2)
Re:Not quite a perfect comparison (Score:3)
Linux 2.2.18 (Dual p3-550, 1Gb ram, all SCSI compiling Apache 1.3.17)
make
real 0m37.244s
user 0m23.900s
sys 0m6.000s
make -j3
real 0m26.915s
user 0m24.360s
sys 0m6.020s
make -j4
real 0m23.724s
user 0m24.130s
sys 0m5.880s
make -j5
real 0m20.154s
user 0m22.940s
sys 0m5.000s
make -j6
real 0m21.326s
user 0m24.120s
sys 0m5.830s
FreeBSD (Dual p3-550, 512Mb ram, all SCSI compiling Apache 1.3.17)
make
39.458u 5.635s 0:48.99 92.0% 1686+1874k 0+1249io 0pf+0w
make -j3
40.007u 5.725s 0:32.53 140.5% 1696+1884k 0+1645io 1pf+0w
make -j4
40.027u 5.817s 0:32.73 140.0% 1691+1877k 0+1631io 0pf+0w
make -j5
40.154u 5.832s 0:31.74 144.8% 1701+1884k 1+1628io 0pf+0w
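Taking the Linux `real` times above at face value, a few lines of Python pick out the sweet spot for -j on that box. The dictionary just restates the posted numbers; nothing here was re-measured.

```python
# Wall-clock seconds for the Linux 2.2.18 runs, keyed by the -j value
# (1 stands for plain `make`).
linux_real = {1: 37.244, 3: 26.915, 4: 23.724, 5: 20.154, 6: 21.326}

baseline = linux_real[1]
speedups = {jobs: baseline / seconds for jobs, seconds in linux_real.items()}
best_jobs = min(linux_real, key=linux_real.get)

for jobs in sorted(speedups):
    print(f"-j{jobs}: {speedups[jobs]:.2f}x")
print("fastest run: -j", best_jobs)
```

On this dual P3 the best run is -j5, at roughly 1.85 times the speed of the plain make, and -j6 is already slower again.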
Re:Sounds Good (Score:2)
a) It's not our fault, it's those nasty third-party vendors
b) You should spend a fortune redesigning your network and then a further fortune buying new licenses for Windows2000
Of course, as NT has no logs or core dumps worth a damn, there's no way to know what went wrong.
Re:DDR != Dual (Score:2)
Oops. I'm wrong. My fault. Moderate down accordingly.
Re:DDR != Dual (Score:2)
Please check your facts before shooting your mouth off.
"Linux is broken! When I type date, it just gives me the time!"
Not quite a perfect comparison (Score:5)
The article states:
This isn't really a good way to compare single processor results to dual processor results. The make -j3 lets make run three processes at once, which would lead to a speedup even on a single processor system, because disk I/O and CPU-bound compilation can overlap. The only totally fair way to compare is to boot a non-SMP kernel, run the benchmark, then boot an SMP kernel and run exactly the same benchmark.
Even though the 142% speedup is bogus, the two minute kernel compile is pretty damn fast.
Re:Tyan SMP board pricing (Score:2)
Anyhoo, I believe that the layer count has to do with the fact that there is a LOT of wiring associated with the particular bus type, er, well I understand that it isn't _really_ a bus, but I don't want to get into the particulars.
Re:And here's the pic (Score:2)
Or are your particular sound card drivers that inefficient? I've noticed some Linux sound drivers were _very_ slow. I never did inquire why.
The good and bad of AMD's MP (Score:5)
The bad thing is that so far only Tyan has announced a MB based on the 760MP chipset, and that MB is definitely suited for servers; it won't fit in a standard ATX case.
How fast does it (Score:5)
Re:Lies, statistics and benchmarks (Score:2)
Sure. You misread me, or I mis-expressed myself.
There was a 'foreach' before point 1 and 2. I meant:
boot mono kernel
compile kernel with -j 1
boot mono kernel
compile kernel with -j 2
boot mono kernel
compile kernel with -j 3
boot mono kernel
compile kernel with -j 4
boot smp kernel
compile kernel with -j 1
boot smp kernel
compile kernel with -j 2
boot smp kernel
compile kernel with -j 3
boot smp kernel
compile kernel with -j 4
This is how a serious benchmark should be done, with the machine state as similar as possible before each test.
If one suspects that the best compile time would be '-j 2' for a mono kernel and '-j 3' for a bi-proc one, then the 8 tests are more or less necessary ("mono -j 1" and "mono -j 3" are necessary to prove that the best compile time for a mono machine is -j 2, and "smp -j 2" and "smp -j 4" are needed to prove that -j 3 is best for smp. "mono -j 4" and "smp -j 1" are here for the sake of completeness).
If someone wants to avoid 8 reboots, he can do:
boot mono kernel
compile kernel and throw away results
compile kernel with -j 1
compile kernel with -j 2
compile kernel with -j 3
compile kernel with -j 4
boot smp kernel
compile kernel and throw away results
compile kernel with -j 1
compile kernel with -j 2
compile kernel with -j 3
compile kernel with -j 4
There, only 2 reboots are necessary. Of course, if the machine has a lot of memory, the numbers will be very different from the first benchmarks (because you could end up with everything cached).
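The same two plans as a Python sketch, in case someone wants to drive them from a script. The labels `mono` and `smp` and the function names are made up here; actually rebooting and running make is left out, so this only lists the steps.

```python
from itertools import product

KERNELS = ("mono", "smp")
JOBS = (1, 2, 3, 4)

def full_plan():
    """One fresh boot per (kernel, -j) pair: eight boots, eight timed runs."""
    return [("boot " + kernel + " kernel", f"time make -j {jobs} bzImage")
            for kernel, jobs in product(KERNELS, JOBS)]

def short_plan():
    """One boot per kernel, with a throwaway build first to warm the caches."""
    steps = []
    for kernel in KERNELS:
        steps.append("boot " + kernel + " kernel")
        steps.append("compile kernel and throw away results")
        steps.extend(f"time make -j {jobs} bzImage" for jobs in JOBS)
    return steps

for step in short_plan():
    print(step)
```

Either plan still needs the tree cleaned between runs so that every build starts from the same state.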
Cheers,
--fred
Tom's Hardware also has a dual CPU comparison (Score:2)
Go to www.tomshardware.com and have a look - get the real picture.
Now why aren't there any Q3A benchmarks??
Re:Linux SMP kernel "does the right thing." (Score:2)
Glad to hear it... (Score:3)
As one of those dual-celeron guys (bang for the buck!), I love to see AMD finally show off dual processor machines. But the next time we get a chance to play with one, let's try to make a more realistic comparison.
Switches invalidate the results (also: 4-way SMP) (Score:3)
See Ace's Hardware [aceshardware.com] for a discussion of exactly this:
"[T]he dual-processor test was performed with not just two, but, in fact, three make processes. The difference here is that a processor will not be completely idle while waiting on IO in the second test, as there are two additional build processes running concurrently. This is why the use of the -j parameter is often recommended even for uniprocessor systems, as a parallel make will often yield much higher CPU utilization and thus faster compiles."
Also, see reader comments [aceshardware.com] saying that AMD demonstrated a 4-way SMP Athlon system at LinuxWorld.
And here's the pic (Score:2)
And here's the pic [impress.co.jp]
I've noticed this too (Score:2)
Please pardon the excessive simplification...
dual processor machines are great for normal use (Score:2)
There are actually a lot of benefits to a dual-processor setup. I did a research project on the Linux scheduler for interactive users:
http://www.people.fas.harvard.edu/~rross/cs265/paper.html [harvard.edu]
Afterward I put together a dual-celeron system and the improvement in the overall responsiveness and feel of the system was quite dramatic.
- Russ
Re:Lies, statistics and benchmarks (Score:3)
I agree. I think this post was on the right track in performing a number of tests to find out where the sweet spot is for the -j argument. There have been hypotheses posted here that caching effects may have interfered with the results. (I wonder if interim/final files' locations on the disk could vary the results, too -- longer seek/write times... maybe need to defrag the disk between iterations, too?)
BUT, it strikes me that EACH test should be repeated a sufficient number of times so that the durations measured vary within a desired confidence level (statistics term -- standard deviation and variance and other stuff whose names and vague concepts I recall, but which I learned too long ago to recall in detail now). At an absolute minimum, doing each test twice and having results that vary within, say, a couple of seconds would counter the concerns that there was some unknown but suspected optimization happening (e.g. disk cache, leftover interim files, etc.).
Personally, I'd still prefer to see each test performed at least 3 times. In my experience, I've seen very close 2-try results where the results on the 3rd time sometimes confirmed them, but other times refuted them. (Yes, I know it's not "scientific", but I'd rather repeat an unnecessary test than omit a necessary one!)
Then, to make sure there were no accumulated small effects from running all those tests, repeat the very first test one more time to confirm that its results fell in line with the original results.
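To make the "repeat each test" point concrete, here is a small Python sketch using the standard `statistics` module. The three timings are the -j2 results from the "Some comparable benchmarks..." post, which were taken under slightly different conditions, so treat them purely as sample input; the two-second tolerance is my own pick.

```python
import statistics

# -j2 wall-clock times in seconds: 3m13.772s, 3m11.912s, 3m15.634s.
runs = [193.772, 191.912, 195.634]

mean = statistics.mean(runs)
stdev = statistics.stdev(runs)     # sample standard deviation
spread = max(runs) - min(runs)

TOLERANCE = 2.0                    # seconds; arbitrary threshold
consistent = spread <= TOLERANCE

print(f"mean {mean:.1f} s, stdev {stdev:.2f} s, spread {spread:.2f} s")
print("within tolerance" if consistent else "spread too large, repeat the test")
```

With only three runs the standard deviation is a crude estimate, but even this much shows whether a few seconds of difference between two configurations is signal or noise.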
Tyan SMP board pricing (Score:2)
Re:Sounds Good (Score:2)
You were probably simply using a brain-damaged kernel driver !!!! Nothing to do with NT itself !
Guess what... it's the same thing. If the drivers crash NT regularly then something is wrong with the product available to consumers. The same holds true for Linux.
treke
BeBox!!!!! (Score:2)