
Comment Re:Existing TRX40 motherboards? (Score 1) 71

I'm sure many people will be replacing their older TR systems with newer TRX40 systems. I'm not sure why you believe people wouldn't. The TR3 chips are considerably more powerful than the TR2 chips core-for-core, older TR systems have value on the used market, and not everyone with a TR2 system is running the highest-end TR2 chips.

Someone with a 2970WX or 2990WX system probably wouldn't be upgrading (except possibly to a 3990X), but I would say that many people with a 1900X, 1950X, 2920X, or 2950X will definitely be in the market.

If these people don't upgrade to TR3, they will probably opt for an AM4-based 3950X instead (which is a much cheaper motherboard + CPU combination than TR2/3).

-Matt

Comment Re:Existing TRX40 motherboards? (Score 2) 71

Any TRX40 motherboard can run the 3990X. It is true that older X399 motherboards cannot run the new TR3 chips, or vice versa, and I agree it kinda sucks a little. But it's hard to be angry at AMD considering what they packed into the TRX40. They didn't just force people onto a new TR socket gratuitously, unlike Intel.

The TRX40 motherboards have 4x the data bandwidth between the CPU and chipset that X399 had. That's four times the bandwidth. Not the same, not twice... four times. It means that all the PCIe lanes hanging off the chipset are usable, which cannot be said for any other motherboard from either AMD or Intel. TRX40 has 72 total unencumbered PCIe lanes available to the user.

The TRX40 motherboards are also all PCIe-v4-ready (the X399 motherboards had no chance of doing PCIe-v4), and the DDR4 channels have been re-laid out to allow the RAM to be clocked significantly higher.

So... complaining about it is kinda silly. AMD saw a chance to quadruple chipset bandwidth and they took it. That's the main reason why the socket isn't compatible, and I'm fine with it.

AMD also gave people 16 cores on AM4, backwards compatible all the way to B450 (I wouldn't try it on an A320). So 90% of the AM4 motherboard line-up can now take a 16-core cpu. That's a pretty nice present AMD gave us there!

-Matt

Comment Re: Can we have real SMP back? (Score 4, Informative) 71

The CPUs in TODAY's laptops beat the holy living crap out of what we had in the Sandy Bridge era, even when running at lower frequencies. It isn't even a contest. Yes, laptop vendors put physically smaller batteries in the thinner laptops... they still put large batteries in 'gaming' laptops, though, and even the smaller batteries generally have twice the watt-hours of capacity that older laptops from that era had.

In addition, the CPU performance has very little to do with battery life unless the laptop is actually being loaded down. Most of the battery's power consumption is eaten up by the display.

Just playing video or browsing around puts basically ZERO load on a laptop CPU. The video is handled by dedicated decode hardware in the iGPU, and having a ton of browser windows open doing animations won't even move the needle on CPU use. The only way to actually load a laptop CPU down these days is to do some sort of creator work... batch Photoshop, rendering, VR, or similar.

Almost nothing running on a modern laptop is single-threaded, not even a browser with only one tab open. At a minimum the graphics pipe will use a second core (whether using GPU HW acceleration or not), which means that software logic and screen updates get their own cores, even for a single-threaded program. There are no bottlenecks outside of the storage subsystem, so if that's an SSD, a modern laptop is going to have lightning response under almost all conditions.

Any real browser, such as Chrome or Firefox, is pretty seriously multi-threaded. I have four Chrome windows open on my workstation right now with not very many tabs... maybe 6 tabs open at the moment, and ps shows 182 program threads associated just with Chrome across 21 discrete processes. 182 program threads.
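
If you want to reproduce that count yourself without squinting at ps output, something like this works on Linux. It just walks /proc and sums the Threads: field for anything whose name contains the string you give it -- a rough sketch, and the process-name matching is naive:

    /* Count processes and threads whose /proc Name: contains a substring.
     * Linux-only, minimal error handling; purely illustrative. */
    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        const char *match = argc > 1 ? argv[1] : "chrome";
        DIR *proc = opendir("/proc");
        struct dirent *de;
        int procs = 0, threads = 0;

        while (proc && (de = readdir(proc)) != NULL) {
            if (!isdigit((unsigned char)de->d_name[0]))
                continue;                         /* not a pid directory */
            char path[64], line[256], name[64] = "";
            int nthreads = 0;
            snprintf(path, sizeof(path), "/proc/%s/status", de->d_name);
            FILE *fp = fopen(path, "r");
            if (!fp)
                continue;
            while (fgets(line, sizeof(line), fp)) {
                sscanf(line, "Name: %63s", name);
                sscanf(line, "Threads: %d", &nthreads);
            }
            fclose(fp);
            if (strstr(name, match)) {
                procs++;
                threads += nthreads;
            }
        }
        if (proc)
            closedir(proc);
        printf("%d processes, %d threads matching \"%s\"\n",
               procs, threads, match);
        return 0;
    }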

Where there is bloat on today's systems, it tends to be memory use, particularly when running browsers. Getting a laptop with at least 8GB (and better, 16GB) of RAM is definitely an important consideration. My relatively minimal browser use is eating... 5GB of RAM. Of course, my workstation has 32GB so I don't really feel it. But the same issue exists on a laptop. Get more RAM, things will run more smoothly. You can swear at the software... but still get more RAM :-).

-Matt

Comment Re:Can we have real SMP back? (Score 3, Interesting) 71

Yes and no. Yes, a better cooler will result in better performance, but there are three problems.

First, there are limits to just how quickly heat can be dissipated from the silicon due to the transistor density. As geometries get smaller, power density continues to increase. Ambient coolers (whether air- or liquid-based) max out. Going sub-ambient is generally a non-starter for regular use, and even if you try it you still can't go below freezing without causing serious condensation. Not for regular use anyway.

The second problem is power consumption. Power climbs steeply once the frequency goes past its sweet spot (around 3.8 GHz or so on Zen 2), because the voltage has to rise along with it. This is fine if only one core is being boosted, but try to do it on all cores and you can easily start pulling 200-300W for the CPU socket alone.
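
To put some rough numbers on that (purely illustrative voltage/frequency points, not measured Zen 2 data): dynamic power scales roughly with V^2 * f per core, and since the voltage has to climb with the frequency, a modest all-core frequency bump costs a disproportionate number of watts.

    /* Rough illustration of why all-core overclocking blows the power
     * budget: dynamic power ~ C * V^2 * f, and higher frequency needs
     * higher voltage.  The operating points below are made up. */
    #include <stdio.h>

    int main(void)
    {
        double v_lo = 1.00, f_lo = 3.8;   /* near the sweet spot */
        double v_hi = 1.35, f_hi = 4.5;   /* pushed well past it */

        double ratio = (v_hi * v_hi * f_hi) / (v_lo * v_lo * f_lo);
        printf("~%.1fx the power for ~%.0f%% more frequency\n",
               ratio, (f_hi / f_lo - 1.0) * 100.0);
        return 0;
    }

With those numbers it works out to roughly 2.2x the power per core for an 18% frequency gain, and that's before you multiply by the core count.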

The third problem is called electromigration... basically the more current you throw into the CPU die on these smaller nodes, the lower the 'safe' voltage winds up being. Where the two curves cross gives you the maximum safe frequency you can actually run the CPU at. So when you are trying to push a higher all-core frequency you wind up in a rat-race: higher frequencies require higher voltages, but the maximum safe voltage drops the more cores you try to run at those higher frequencies.

These problems also apply to Intel's 10nm and will likely apply to all future (smaller) nodes as well for both Intel and other foundries.

-Matt

Comment Locks are complicated (Score 5, Informative) 191

Locks are complicated. It's really that simple (ha ha). All of the operating system projects have gone through a dozen generations of lock design over the last 30 years because performance depends heavily on all sorts of things. These days, cache-line effects (what we call cache-line ping-ponging between CPUs) are a big deal due to the number of CPU cores that might be involved. Implementations that were optimal in the days of 4-core and 8-core machines fall flat on their faces as the core count increases.

Even situations that you might think wouldn't be an issue, such as a simple non-contended shared lock, have serious performance consequences on multi-core machines when they are banged on heavily... consequences that can cause latencies in excess of one microsecond from JUST a single NON-CONTENDED atomic increment instruction. That's how bad it can get.
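
Here's a toy version of what I mean. Every increment succeeds immediately (nobody is ever spinning on it), but the cache line holding the counter still has to bounce from core to core, and that bounce is where the microsecond goes. Just a sketch, not a real benchmark harness:

    /* Many threads doing "non-contended" atomic adds to one counter.
     * No thread ever waits, yet the shared cache line ping-pongs
     * between cores on every increment. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NTHREADS 8
    #define ITERS    10000000L

    static _Atomic long shared_counter;   /* one cache line, many cores */

    static void *worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < ITERS; i++)
            atomic_fetch_add_explicit(&shared_counter, 1,
                                      memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("%ld\n", atomic_load(&shared_counter));
        return 0;
    }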

In modern-day kernel programming, spin-locks can only be used safely because the kernel has fine control over the scheduler. Spin-locks in userland tend to be disastrous in the face of any sort of uncontrolled scheduler action. And to even make them work reliably on many-core machines we need backoff mechanisms to reduce the load on the cache coherency busses inside the CPU. Linus is exactly right.
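
For illustration, the userland version of the kind of backoff I'm talking about looks roughly like this (test-and-test-and-set plus exponential backoff; the pause builtin is x86/GCC-specific, and none of this saves you if the lock holder gets preempted, which is the whole problem with userland spinlocks):

    #include <stdatomic.h>

    typedef struct { _Atomic int locked; } spinlock_t;

    static void spin_lock(spinlock_t *l)
    {
        unsigned backoff = 1;
        for (;;) {
            /* read-only wait first so we don't hammer the line with writes */
            while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
                for (unsigned i = 0; i < backoff; i++)
                    __builtin_ia32_pause();    /* x86 PAUSE hint */
                if (backoff < 1024)
                    backoff <<= 1;             /* exponential backoff */
            }
            if (!atomic_exchange_explicit(&l->locked, 1,
                                          memory_order_acquire))
                return;                        /* acquired */
        }
    }

    static void spin_unlock(spinlock_t *l)
    {
        atomic_store_explicit(&l->locked, 0, memory_order_release);
    }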

There are other major issues with locks that become dominant on systems with more cores. Shared/Exclusive lock conflict resolution becomes a big problem, so the locking code needs to handle situations where many overlapping shared locks are preventing a single exclusive lock from being taken, or where many serial exclusive locks are preventing one or more shared locks from being taken. Just two examples there.

Even cache-line-friendly queued locks (sequence space locks) have major trade-offs. Stacked locks (that look like mini binary trees) eat up serious amounts of memory and have their own problems.

The general answer to all of this is to develop code to be as lockless as possible through the use of per-cpu (or per-thread) data structures. The design of RCU was one early work-around to the problem (though RCU itself has serious problems, too). Locks cannot be entirely avoided, but real performance is gained only when you are able to code an algorithm where no locks are required in most critical-path situations. That's where all the OS projects are heading these days.
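
As a concrete (userland, toy) example of the per-cpu/per-thread idea: give every thread its own cache-line-sized slot so the hot path never touches shared state and takes no lock, and only the occasional reader pays the cost of summing the slots. How each thread gets its unique index is glossed over here:

    /* Per-thread counter slots, padded so each lives on its own cache
     * line.  Hot path: each thread writes only its own slot.  Cold
     * path: a reader sums all slots for the total. */
    #include <stdatomic.h>

    #define MAX_THREADS 64
    #define CACHE_LINE  64

    struct percpu_counter {
        struct {
            _Atomic long v;
            char pad[CACHE_LINE - sizeof(_Atomic long)];
        } slot[MAX_THREADS];
    };

    static void counter_add(struct percpu_counter *c, int self, long n)
    {
        /* only thread 'self' writes this slot; relaxed atomics suffice */
        atomic_fetch_add_explicit(&c->slot[self].v, n, memory_order_relaxed);
    }

    static long counter_read(struct percpu_counter *c)
    {
        long sum = 0;
        for (int i = 0; i < MAX_THREADS; i++)
            sum += atomic_load_explicit(&c->slot[i].v, memory_order_relaxed);
        return sum;
    }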

-Matt

Comment Holy cow, so much mis-information (Score 2) 201

An unbelievable amount of junk is being posted. The short answer is: always run with swap configured on a Linux or BSD system, period. If you don't, you're an idiot. As to why? There are many, many reasons. How much? Generally as much as main memory. If you have a tiny amount of memory, more; if you have tons of memory, like a terabyte, then less. And these days it absolutely should be on an SSD. SSDs are not expensive... think about it. Do you want 40GB of swap? It's a $20 SSD. Frankly, just putting the swap on your main SSD (if you have one) works just as well. It won't wear out from paging.

Linux and BSD kernels are really good at paging out only what they really need to, only paging when there is actual memory pressure, and doing it in batches (no need to worry much about SSD swap write amplification these days... writes to swap are batched and are actually not so random; it's the reads that tend to be more random). I started using SSDs all the way back in the days of the Intel 40GB consumer drives, using a good chunk of them for swap, and have yet to reach the wear limit on any of them. And our machines get used heavily. SSDs wear out from doing other stupid things... swap is not usually on the list of stupid things. The days of random paging to swap for no reason are long over. Windows... YMMV.

Without swap configured you are wasting an enormous amount of relatively expensive RAM to hold dirty data that the kernel can't dispose of. People really underestimate just how much of this sort of data systems have... it can be huge, particularly now with bloated browsers like Chrome, but also with simple things like tmpfs, which is being used more heavily every day. Without swap configured, if memory gets tight the ONLY pages the kernel can evict are shared read-only file-backed pages... generally 'text' pages (aka code). These sorts of pages are not as conducive to paging as data pages, and it won't take long for the system to start to thrash (this is WITHOUT swap) by having to rip away all the program code and then instantly page it back in again. WITH swap, dirty data pages can be cleaned by flushing them to swap.

Configure swap, use SSDs. If you are worried about wear, just check the wear every few months, but honestly I have never worn out an SSD by having swap configured on it... and our systems can sometimes page quite heavily when doing bulk package builds. Sometimes as much as 100GB might be paged out, but it allows us to run much more aggressive concurrency settings and still utilize all available CPU for most of the bulk run.

So here are some bullets.

1. Systems treat memory as SWAP+RAM, unless you disable over-commit. Never disable over-commit on a normal system. The SWAP is treated like a late-level cache: CPU, L1, L2, L3, [L4], RAM, SWAP. Like that. The kernel breaks the RAM down into several queues... ACTIVE, INACTIVE, CACHE, then SWAP. Unless the system is completely overburdened, a Linux or BSD kernel will do a pretty damn good job keeping your GUI smooth even while paging dead browser data away.

2. Kernels do not page stuff out gratuitously. If there is no memory pressure, there will be no paging, even if the memory caches are not 'balanced'.

3. There is absolutely no reason to waste memory holding dirty data from idle programs or browser tabs. If you are running a desktop browser, swap is mandatory and your life will be much better for it.

4. The same is true for (most) SERVERs. Persistent sessions are the norm these days, and 99% of those will be idle long-term. With swap the server can focus on the ones that aren't, and paging in an idle session from an SSD takes maybe 1/10 of a second.

5. CPU overhead for paging is actually quite low, and getting lower every day. Obviously if a program stalls on a swapped page that has to be paged in you might notice it, but the actual CPU overhead is almost zip.

6. The RAM required to manage swap space is approximately 1 MByte per 1 GByte of swap. I regularly run hundreds of gigabytes of swap, just to give me breathing room if I happen to get runaways. System ram overhead is just not a big deal.

7. SSD wear is not typically an issue these days, for many reasons. Writes to swap are typically batched and actually don't have all that much write amplification. It's the reads which tend to be more random, and random reads are the perfect load for an SSD.

Run with an SSD, configure a reasonable amount of swap on it, stop worrying about paging-induced wear, and you are done. This is the modern world.
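
If you want to eyeball what your own box is doing, a quick check via sysinfo(2) looks like this (Linux-specific; the BSDs expose the same numbers through sysctl instead):

    /* Print total and free RAM and swap on Linux via sysinfo(2). */
    #include <stdio.h>
    #include <sys/sysinfo.h>

    int main(void)
    {
        struct sysinfo si;
        if (sysinfo(&si) != 0) {
            perror("sysinfo");
            return 1;
        }
        double gib  = 1024.0 * 1024.0 * 1024.0;
        double unit = si.mem_unit;

        printf("RAM : %5.1f GiB total, %5.1f GiB free\n",
               si.totalram  * unit / gib, si.freeram  * unit / gib);
        printf("Swap: %5.1f GiB total, %5.1f GiB free\n",
               si.totalswap * unit / gib, si.freeswap * unit / gib);
        return 0;
    }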

-Matt

Comment Re:And yet... (Score 1) 148

Yes, an ad-blocker definitely reduces memory usage, by a lot. However, it's a bad idea to use any add-on for 'important' sites. I compartmentalize my browser into different user ids so the actual Chrome instance I use to access sensitive accounts is completely independent of the instance I use for general browsing. The ad-blocker is disabled for the one I use to access sensitive accounts (in fact, ALL add-ons are disabled for that one), and enabled for the one I use for general browsing.

-Matt

Comment Feature is more swap-friendly, so actually (Score 1) 148

So actually, even though the memory footprint is larger, using separate processes also makes Chrome more swap-friendly, which means the kernel can page the tabs in and out more efficiently. The result, at least for me, is a smoother ride when I have a lot of tabs open.

Of course, swap space should always be configured on an SSD.

I always enable the site isolation option. It's nice to see Google finally making it the default.

-Matt

Comment Spend a Billion to... (Score 2) 56

So Facebook will spend a billion to deliver what, exactly... a home internet speaker that will automatically post to Facebook pictures of my dinner, so I don't have to? Detect what TV shows I watch and give me automatic LIKEs for those? Listen to my phone calls and automatically "Friend" those people? Trick the Echo next to it into ordering random crap, so we get rid of it?

Comment Re:Intel doubled Mac sales (Score 1) 513

The 64-bit instruction set used in 64-bit x86 processors originated with AMD. The ISA these days is a mix, since Intel designed most of the newer instructions, SSE (AMD had a competing thing called 3DNow!), etc.

The machine architecture that runs those instructions changes from processor family to processor family, and was certainly designed by Intel when it's in an Intel chip. Both Intel and AMD use their own version of the technique first used in NexGen's processors, the idea of converting x86 instructions on-the-fly into one or more RISC-like instructions. But just the idea (well, AMD bought NexGen and used some of their technology directly in the K6 series).

Comment Re:Intel doubled Mac sales (Score 1) 513

It's not really close to the same situation.

When Apple went PowerPC, they were going there for performance, to support the huge percentage of the media content creation market they had wound up with back when PCs didn't support such things very well. Motorola wasn't competitive with Intel, in large part because in 1996 Apple was the last standing 68K personal computer company and didn't have the market share to sustain that kind of development for Motorola/Freescale.

The idea with the AIM Alliance was to promote a standard PowerPC platform (PReP, I mean CHRP, no PPCP, ok, maybe CHRP...) to rival the Intel/IBM ad hoc standard. That was not a bad idea.

The problem was that, almost immediately, other companies did this better than Apple. Power Computing won a big chunk of the market. My company at the time, PIOS AG, launched the first 300MHz Mac clone available. And then when SJ came back, it was curtains for the gherkins... he was the original closed-appliance-computer guy, and it had to be "mine, mine, all mine". Of course, SJ neglected one of the main points of AIM -- enough volume to keep hardware competitive with Intel. So in 1997 it was absolutely obvious that Apple would eventually leave PowerPC. The PPC970 was nice... for about two weeks. Intel pretty much invented the way all successful modern chip companies work, with multiple tweaks of each technology and three independent teams always working on the Next Big Thing. So there's a new thing every six months. That's how nVidia won on GPUs... doing the Intel thing.

But these days, Apple's scared off their high-end media content creation people by phoning it in on the Mac Pro. A new major upgrade every five years, whether you need it or not. They have built market share mostly from iOS coattail people... like my sister Kathy. Maybe this report is nothing, but it makes perfect sense for Apple to move macOS closer to iOS. It saves on development effort. It lets them push out more advanced ARM tech before it's possible to make that tech low-power enough for phones. It will win more coattail customers. The average desktop PC user today doesn't need a faster CPU than a decent ten-year-old PC, and Apple's ARM cores are already faster than that.

Not my next computer, but then again, the last Macintosh I'd even have considered using was the CHRP machine I was developing back when Jobs put the kibosh on the whole thing. Apple doesn't build serious computers today, anyway.

Comment Re:Whoa (Score 1) 513

The real problem Adobe had wasn't Apple changing processors, it was Apple not selling enough computers. At one point Apple fell to about 1.5% of global PC shipments. Adobe did what every other successful company did -- it concentrated on supporting the platform with actual paying customers: Windows.

That prompted Apple to get more serious about their own in-house professional media content creation software. And that didn't help the rift between Apple and Adobe at all. Then there was Jobs, going into full attack mode about Adobe Flash... not that he was wrong about proprietary Flash vs. standard HTML5. And Adobe didn't fundamentally care, because Flash was just a means to the end of selling their Flash development tools. But a smack-down is a smack-down.

Today, the Mac is 10% or less of Apple's business. They don't want to kill it, but it's also a ton of work compared to iOS per unit sold. Right now they have to have several different laptops at different performance levels, they have to have iMacs, and they still have Mac Pros, though those only seem to sell in the first year or two of their 5-year-or-so lifespan. As that market continues to shrink, Apple's going to continue to lose interest unless it becomes, essentially, part of the iOS product lineup.

Comment Re:does Apple's A-series have the pci-e needed? (Score 1) 513

Most embedded application processors have at least one PCIe link... no idea about Apple's, specifically, but that's a standard everyday module on the Chinese Menu of ARM components. I don't know if Apple is using AMBA/AHB for high-speed internals on today's SoCs, or something else, but it's available right now up to 1024 bits wide. I doubt they'd have a performance bottleneck for laptop/desktop things.

And they're not building a Xeon or i7, either... Apple's been slowly killing off their high-end users through years of high-end neglect. They could pick up more sales, and lower development costs, by pushing the Macintosh into more of a desktop/laptop iPad Pro kind of thing... still mouse & keyboard but more like what iOS users expect. Not that I'd buy one, but I'd never buy a Mac PC either.

Comment Re:Whoa (Score 1) 513

There was never anything called "Acorn RISC Machines".

There was the Acorn RISC Machine -- the v1.0 ARM architecture and all that began at Acorn Computers Ltd. When the CPU company was split off from the main body of Acorn, it was launched as Advanced RISC Machines, a three-way partnership between Acorn, Apple, and chip maker VLSI Technology.
