
Comment Re:Still no competition (Score 1) 91

It's not 25% of the price, though. Not sure where you got that from. The benchmark Intel cpu that AMD is competing against is the i7-7700K, which is $350 on Amazon. It will be the AMD 6-core against that one.

AMD will also be able to compete against Intel's i3's. An unlocked Ryzen *anything* (say, the 4-core Ryzen) will be the hands-down winner against any Intel i3 chip on the low end. Intel will have to either unlock the multipliers on all of its chips to compete, or pump up what it offers in the i3-* series.

The AMD 8-core is probably not going to be a competitive consumer chip.

-Matt

Comment Re:parallelism vs raw clock speed (Score 2) 139

The unfortunate result of VLIW was that the cpu caches became ineffective, causing long latencies or requiring a much larger cache. In other words, non-competitive given the same cache size. This is also one of the reasons why ARM has been having such a tough time catching up to Intel (though maybe there is light at the end of the tunnel there, finally, after years and years). Even though Intel's instruction set requires some significant decoding to convert to uOPS internally, it's actually highly compact in terms of the L2/L3 cache footprint. That turned out to matter more.

People often misinterpret the effects of serialization. It's really a matrix of latency versus throughput: when a portion of a problem has to be serialized, that adds latency, but it does not necessarily mean the larger program cannot run in parallel. There will often be many individual threads, each of which must run serially, which in aggregate can run in parallel on a machine, and in such cases (really the vast majority of cases) one can utilize however many cores the machine has relatively efficiently.

This is true for databases, video processing, sound processing, and many other workloads. For example, if one cannot parallelize video compression on a frame-by-frame basis, that doesn't mean one cannot use all available cpus by having each cpu encode a different portion of the video.

Same with sound processing. If one is mixing 40 channels in an inherently serialized process, this does not prevent the program from using all available cpus by having each one mix a different portion of the overall piece.

For databases there will often be many clients. Even if the query from one particular client cannot be parallelized, if one has 1000 queries running on a 72-core system one gains scale from those 72 cores. And at that point it just comes down to making sure the caches are large enough (including main memory) such that all the cpus can remain fully loaded.
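
To make the sound-mixing case concrete, here's a minimal sketch of the pattern (my own illustration, not code from any real mixer; channel counts and sizes are invented). Each output sample depends on every channel, but disjoint spans of the output are independent, so each thread mixes its own span:

/* Hedged sketch: an inherently serial per-sample mix still spreads
 * across all cores by giving each thread a different output span.
 * Compile with: cc -O2 -pthread mix.c */
#include <pthread.h>
#include <stdlib.h>

#define CHANNELS 40
#define SAMPLES  (1 << 18)
#define THREADS  8

static float chan[CHANNELS][SAMPLES];  /* input channels */
static float mix[SAMPLES];             /* mixed output   */

struct span { size_t lo, hi; };

static void *mix_span(void *arg)
{
    struct span *s = arg;

    /* Output sample i depends only on sample i of each channel, so
     * disjoint spans are independent and run fully in parallel. */
    for (size_t i = s->lo; i < s->hi; ++i) {
        float acc = 0.0f;
        for (int c = 0; c < CHANNELS; ++c)
            acc += chan[c][i];
        mix[i] = acc / CHANNELS;
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[THREADS];
    struct span sp[THREADS];
    size_t per = SAMPLES / THREADS;

    for (int t = 0; t < THREADS; ++t) {
        sp[t].lo = (size_t)t * per;
        sp[t].hi = (t == THREADS - 1) ? SAMPLES : (size_t)(t + 1) * per;
        pthread_create(&tid[t], NULL, mix_span, &sp[t]);
    }
    for (int t = 0; t < THREADS; ++t)
        pthread_join(tid[t], NULL);
    return 0;
}

The same shape applies to the database case: each query is a serial "span" and the aggregate keeps all the cores busy.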

-Matt

Comment Re:Most depressing thing I've read all week (Score 2) 139

The main bottleneck for a modern cpu is main memory access. What is amazing is that all of the prediction, plus the huge number of uOPS (192+) that can be in flight at once, can absorb enough of the massive latency of main memory accesses to bring the actual average IPC back up to roughly ~1.4. And this is with the cache misses causing *only* around ~6 GBytes/sec of main memory traffic per socket (against a maximum main memory bandwidth of around 50 GBytes/sec per socket, if I remember right).

Without all of that stuff, a single cache miss can impose several hundred clock cycles of latency and destroy average IPC throughput.

So, for example, here is a 16-core / 32-thread dual-socket E5-2620v4 @ 2.1 GHz system doing a bunch of parallel compiles, using Intel's PCM infrastructure to measure what the cpu threads are actually doing:

http://apollo.backplane.com/DF...

Remember, there are 32 hyperthreads here, two per core. Actual physical core IPC (shown at the bottom) is roughly 1.39. At 2.1-2.4 GHz this system is retiring a total of 55 billion instructions per second.

In this particular case, being mostly integer math, the bottleneck is almost entirely memory-related. It doesn't take much to stall out a core. If I were running FP-intensive programs instead it would more likely be bottlenecked in the FP unit and not so much on main memory. Also note the temperature... barely ~40C with a standard copper heatsink and fan. Different workloads will cause different levels of cpu and memory loading.
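
For anyone who wants to see the stall effect directly, here's a minimal sketch (my own illustration, not part of the PCM run above; the array size is invented). It chases pointers through a randomly permuted 128MB array so every load depends on the previous one and mostly misses cache; on typical hardware each dependent load lands somewhere around 100ns, i.e. hundreds of clock cycles:

/* Hedged sketch: dependent cache misses destroy throughput.
 * Compile with: cc -O2 chase.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)1 << 24)   /* 16M entries * 8 bytes = 128MB, well past L3 */

int main(void)
{
    size_t *next = malloc(N * sizeof(*next));
    struct timespec t0, t1;

    if (next == NULL)
        return 1;

    /* Sattolo's algorithm builds a random single-cycle permutation,
     * so each load's address depends on the previous load's result
     * and the hardware prefetcher can't help.  Two rand() calls are
     * mixed in case RAND_MAX is small. */
    for (size_t i = 0; i < N; ++i)
        next[i] = i;
    for (size_t i = N - 1; i > 0; --i) {
        size_t j = (((size_t)rand() << 16) ^ (size_t)rand()) % i;
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < N; ++i)
        p = next[p];          /* every iteration stalls on main memory */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per dependent load (p=%zu)\n", ns / (double)N, p);
    return 0;
}

Run the same loop over a sequentially ordered array and it goes an order of magnitude or two faster, which is the whole point of all that prediction machinery.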

-Matt

Comment Re:Most depressing thing I've read all week (Score 2) 139

Paging to a hard drive doesn't really work in this day and age; the demands of the VM system (due to the commensurate increase in scale of modern machines) are well in excess of what one or two HDDs can handle.

However, virtual memory works quite well with an SSD. Sure, the SSD isn't as fast as main memory, but the ratio is similar to the ratio between cpu caches and main memory. It's back in the ballpark, so the system as a whole works quite well.

It depends on the workload, of course... browsers are particularly bad when they exceed available ram, but that's primarily because browsers fragment their memory space badly. Firefox, for example, with a 4GB VSZ, will keep 3GB in core no matter what due to its fragmentary access patterns, even though it might only be touching 1GB worth of memory.

For example, one of our bulk builders has 128GB of ram and roughly 200GB of SSD swap configured. Configuring enough parallelism to keep the 48 cores fully loaded at all times means that a portion of the build (when it gets to the larger C++ projects) requires more than 128GB of ram and starts eating into the swap space to the tune of another ~100GB or so. However, the cpus are still able to load to 100% because there is enough parallelism to absorb the relatively few processes blocked on page-in.

Similarly, my little chromebook with 4G of ram has 16GB of SSD swap configured (running DragonFly of course, not running Chrome), and has no problem with responsiveness despite digging into that swap quite extensively.

So virtual memory does, in fact, work well, and it will work well in most use cases when configured properly with an SSD as backing store. One can also go beyond the SATA SSD and throw in an NVMe SSD for swap, which is even faster (~3 GBytes/sec reading for a cheap one). Given that main memory typically has 25-50 GBytes/sec of bandwidth, that's only an 8x to 16x difference in speed.

-Matt

Comment Re:Firefox...hmmm (Score 1) 154

I wouldn't really characterize it as any sort of turf war... as far as I know, there never was one. Each has its uses.

In terms of security, don't forget user account separation. Why trust the application to secure the environment for you? I don't. I segregate browser use cases into separate dummy user accounts and start independent instances of the browser with a simple ssh localhost -l dummy1 -n chrome .... (or whatever, all bound up into a GUI button).

After all, the browser is only part of the problem. I want to be able to open PDF documents (which runs xpdf) or other things (which might run OpenOffice) from the browser. Even if I trusted the browser's security, I don't trust xpdf or OpenOffice. If I were really rabid, I'd segregate xpdf and OO execution into a sub-sub user account (but I'm not quite that rabid).

-Matt

Comment Re:How many DNS queries can it launch (Score 1) 154

Kind of a funny question. Don't people realize that they can just run a caching DNS service on their own workstation or in-home server? I do! It takes care of the problem just like that. No way I would *ever* trust my ISP's DNS service. Or anyone else's, for that matter.

-Matt

Comment Confusion about processes and cores (Score 1) 154

There seems to be a lot of confusion about processes, cores, and cpu use. I'm going to clarify some of this for people (I should know, after all!).

A process can be thought of as a memory management context: the memory image used by the program. But a process is not limited to just one cpu. A process can be multi-threaded. In a multi-threaded process, all threads share the same memory context but each is scheduled independently by the scheduler, so one process can in fact easily use all available cpu resources.

When a process fork()s new processes, each new process has its own independent memory management context. However, any data from the parent is entirely shared with the child until one or the other modifies it (then it becomes separate — this is copy-on-write). So simply creating new processes does not necessarily eat all that much memory. It depends on what the new processes do.
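
Here's a minimal sketch of that copy-on-write behavior (my own demo, with invented sizes, not browser code):

/* Hedged sketch: after fork() the child shares all of the parent's
 * pages; physical copies are made only for pages one side modifies.
 * Compile with: cc -O2 cow.c */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SZ ((size_t)256 << 20)          /* 256MB, illustrative */

int main(void)
{
    char *buf = malloc(SZ);
    memset(buf, 1, SZ);                 /* parent dirties all 256MB */

    pid_t pid = fork();
    if (pid == 0) {
        /* The child shares the whole 256MB with the parent for free.
         * Writing to just the first 1MB copies only those pages, so
         * system-wide memory use grows by ~1MB, not by 256MB. */
        memset(buf, 2, (size_t)1 << 20);
        sleep(10);                      /* window to inspect RSS with ps */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}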

Browsers tend to use a combination of processes and threads:

office1:/home/dillon> ps ax | fgrep chrome | wc -l
            23
office1:/home/dillon> ps axH | fgrep chrome | wc -l
          188

Firefox also uses both processes and threads:

office1:/home/dillon> ps ax | fgrep firefox | wc -l
              4
office1:/home/dillon> ps axH | fgrep firefox | wc -l
            69

In fact, most complex GUI programs probably use both processes and threads. However, Firefox, until now, has not segregated tabs into processes. The processes it does use manage other aspects of browser operation.

In terms of cpu utilization, one process thread can use up to 100% of one cpu thread. One process with N threads can eat up all of your cpu resources. Multiple processes won't eat up any more cpu resources than one process with multiple threads.

In terms of memory use, Firefox has *HORRIBLE* memory fragmentation problems (and always has had). This means that if Firefox has a VSZ of 5GB, it will probably be forcing 4GB of that into core even when idle, or even if you are only messing with one out of many tabs. This has been a serious problem in Firefox for ages. One advantage of giving each tab its own process is that the OS can now take care of cleaning up after Firefox's stupid memory management (even if Firefox doesn't fix it), because the process context tracks the per-tab memory use, and the modified memory used in that tab does not fragment the memory used by other tabs. So in a per-tab process mechanism, when you close the tab the OS can scrap all of that memory, and that is a good thing.

So going multi-process won't make memory use any worse. In fact, it will help the OS separate the VM pages out and give it a chance to page idle memory to swap, whereas the memory fragmentation in a single-process-many-tabs setup generally prevents the OS from being able to swap out idle memory at all.
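
A minimal sketch of the difference (my own illustration, invented sizes): a child process stands in for a tab, fragments its heap, and then exits, at which point the kernel reclaims everything regardless of how fragmented it was:

/* Hedged sketch: fragmented free() inside a process vs process exit.
 * Compile with: cc -O2 tab.c */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { CHUNKS = 4096, CHUNK = 64 * 1024 };  /* 4096 x 64KB = 256MB */

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* The "tab" as its own process: allocate and dirty 256MB. */
        static char *p[CHUNKS];
        for (int i = 0; i < CHUNKS; ++i) {
            p[i] = malloc(CHUNK);
            memset(p[i], 1, CHUNK);
        }
        /* Free every other chunk.  The allocator usually cannot hand
         * these pages back to the OS because live chunks are
         * interleaved with the freed ones -- the fragmentation
         * problem.  RSS stays near 256MB (inspect with ps/top). */
        for (int i = 0; i < CHUNKS; i += 2)
            free(p[i]);
        sleep(10);
        /* On exit the kernel tears down the entire address space and
         * gets every page back, fragmented heap or not.  That is the
         * cleanup a per-tab process buys you. */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}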

In terms of idle cpu use, there are many issues. High cpu use (100%+) on idle pages can usually be attributed to three things:

(1) Bad interaction between the browser and the sound device (the intermediate streaming libraries have been known to cause problems in the past).

(2) The open tabs are running lots of ads with animations or video. AdBlock+ helps a lot here.

(3) The javascript on many (most) sites gets *very* unhappy when it is not given any cpu at all, so the browser has to give each tab at least a little cpu to keep each site's javascript happy. Also, a great deal of site javascript is badly written and consumes cpu even when idle.

Finally, in terms of cpu use, the operating system's scheduler usually does a good job, but if the browser is causing problems for other work you do on the machine you can always nice +5 or nice +10 the browser, or (on Linux) run it in a scheduler-constrained container. However, either of these actions will reduce the responsiveness of the browser, so most people don't bother.

-Matt

Comment Generally speaking a good thing (Score 2, Interesting) 154

I'm glad they're finally giving each tab its own process. Of course, if they didn't they'd wind up in the dustbin of history... it's about the minimum of work they need to do just to keep Firefox relevant. There is much more they need to do in addition. Honestly, Firefox's biggest competition here, after fixing the tab problem, is going to be Chrome 55 with its significantly reduced memory footprint.

The interesting thing about giving each tab its own process is that although it increases the total amount of memory used by the browser, it also reduces the memory fragmentation that forces the OS to keep almost every byte of it in core (Firefox is best known for this effect). With process separation, the OS will have a much easier time paging out unused memory without nominal browser operation forcing every last page back in. THAT is a big deal, and is one of the reasons why Chrome is so much more usable than Firefox.

I regularly leave my browser(s) running for weeks. Process sizes generally bloat up during that time, to the point where the browser is consuming ~8GB+ of ram. With Firefox, the horrid in-browser memory management forces most of that to stay in core. With Chrome, most of it gets paged out and stays out. This makes Chrome far more usable, particularly since my workstation runs from an SSD and has a relatively large (~60G) swap partition configured.

But these days I'm a Chrome user. Firefox has been too buggy for at least the last four years. It crashes on all sorts of things, taking the whole browser out with it. And after all this time they *STILL* can't fix the idiotic pop-up windows; disabling popups only disables some of them. That, and the bugs, are the main reasons why I stopped using Firefox.

In terms of sandboxing... also a good thing. In addition to the work the browser does, I segregate my browser instances into multiple dummy user accounts (which my GUI buttons can just ssh into from my main account) and run multiple instances of the browser from those: one for insecure browsing, one for browsing important accounts, and one with the most bulletproof setup I can think of (no video group access, no direct X server access)... which is slow, but about as safe as it's possible to be in an X environment.

People often forget about user account separation. It's a bit sad.

-Matt

Comment AMD experience has been poor of late (Score 1) 157

Sorry, but the last straw for me was when I upgraded the radeon drivers on my W10 machine (which I use for gaming). It took an hour to remove all the crapware AMD installed alongside the drivers. Particularly onerous was their new video recording technology deciding that it would record a game session without telling me, so it could pop up a 'see how great this was' window later on.

My answer: spend an hour removing it all from the machine, then go out and replace my radeon card with a low-end GTX 1060, which performs better and uses a third of the power, instead of buying into AMD's next-gen Polaris.

--

In any case, external GPUs only matter these days for game playing, or if you need to multi-head four or more monitors. The GPU packed onto the cpu die is plenty fast enough for almost everything else, and its video acceleration is decent, so there's really no reason to buy an external GPU unless you are a game-player.

For non-game activities, AMD's APUs or Intel's on-die GPUs work fine. I have no problem driving two 4K monitors on my workstation (nearly all of my machines being Intel these days, since AMD dropped the ball on power consumption years ago). That said, Intel has been far more open in the last few years, and both Linux and DragonFly work great with Intel's built-in GPUs and can use all the 2D, 3D, and video accel features.

The fact that the low-end GPUs packed into cpus work fine removes a large vector for customer loyalty, and the crapware AMD started forcing onto people finished the job. Hence the little 1060 in my Windows gaming box now. Nice and quiet, zero stress on the board or the machine... no reason to spend more money on a higher-end card.

-Matt

Comment You can somehow do this. (Score 2) 229

I have a high-end camera that can be programmed to put pictures into different folders (you can increment the folder number with a very simple three-button operation), which is extremely handy for classifying photos.

Another feature restricts playback to a single folder, rather than all the folders in chronological order.

It came in very handy when I was threatened with arrest unless I deleted the pictures I had taken of an abusive train ticket inspector...

Afterwards, I climbed the few stories to the transit authority headquarters to lodge a complaint against that inspector, who eventually got fired...

Comment Yah but (Score 0) 98

They turned all this crap on by default along with annoying auto-run apps. To say that I am unamused would be an understatement. However, I was able to fix the issue trivially by blowing away ALL of AMD's radeon junk, ripping out the radeon card, and buying a nice cheap little Nvidia GeForce GTX 1060.

Problem solved.

-Matt
