Comment Re:What did they do to their processors? (Score 2) 86

Skylake doesn't "need" special support unless you want to take advantage of its special clocking ability, which makes it more responsive. Normally the OS tells the CPU what speed to run at, but the OS can only update this on a context switch, which can take several milliseconds per change and many changes to ramp up the frequency. When the CPU controls itself, it can change frequency in response to load up to 2x faster. For long sustained tasks, this shows up as about a 2% increase in performance, but for short bursty tasks it shows up as a 25% improvement, all the while only consuming about 0.8% more power under load.

Summary based on benchmarks:
1) Makes the CPU 25% "faster" for very short-lived workloads by quickly ramping up from idle
2) Makes the CPU 2% faster for sustained workloads
3) Only consumes 0.8% more power under load, and saves power for the short-lived loads by completing them more quickly.
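
If you want to see whether your own machine exposes this, the sketch below is one way to check. It assumes an x86 Linux box with GCC: it reads the CPUID "HWP" feature bit for the hardware-controlled P-states described above, and then the intel_pstate driver's status file, whose exact path and presence can vary by kernel version.

/* Minimal sketch: check whether the CPU advertises hardware-managed
 * P-states (HWP, marketed as Speed Shift) and whether the Linux
 * intel_pstate driver reports being in charge of them. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* CPUID leaf 6 ("Thermal and Power Management"), EAX bit 7 = HWP base. */
    if (!__get_cpuid(6, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 6 not supported");
        return 1;
    }
    printf("HWP (hardware-controlled P-states): %s\n",
           (eax & (1u << 7)) ? "supported" : "not supported");

    /* Kernel-side view: path may differ by kernel version/driver. */
    FILE *f = fopen("/sys/devices/system/cpu/intel_pstate/status", "r");
    if (f) {
        char buf[32] = {0};
        if (fgets(buf, sizeof buf, f))
            printf("intel_pstate status: %s", buf);
        fclose(f);
    } else {
        puts("intel_pstate sysfs entry not found (different driver or kernel)");
    }
    return 0;
}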

Comment Re:I like GPLv2 too, but there's just one thing (Score 1, Flamebait) 237

If they make changes, it's no longer my code. I don't see the issue. This is the way I see it:
GPL: Forcing riff-raff to contribute back
BSD: Making the world a better place by sharing

The only good code is code given willingly. GPL is made by forced labor in North Korean sweatshops, and BSD is made by freedom-loving hippies.

Comment Re:massive parallel processing=limited application (Score 1) 114

That's why the many-core server CPUs have massive L3 caches and quad-channel memory. A 24-core x86 CPU with around 60MiB of L3 cache? Why not? More memory channels allow more concurrency of access. Intel NICs can write packets directly into L3 cache so as to skip memory writes. Large on-NIC buffers make better use of DMA batching and reduce memory operations, transferring in larger chunks to make use of that high-bandwidth memory.

In case it's not clear, I'm not trying to say your point isn't valid; I'm just saying your point explains a lot of current features in high-end components.

Comment Re:massive parallel processing=limited application (Score 1) 114

Also, multicore designs can have separate memory.

NUMA comes to mind, but it adds complexity to the OS and the application. Accessing another CPU's memory is expensive, so the OS needs to try to keep threads close to their data. Applications need to handle cross-socket communication by using a system API, assuming one exists, to find out which socket a thread is on, and then try their best to stick to threads on the current socket. Cross-socket communication is probably best done by passing messages instead of references, because copying and storing the data locally, even if duplicated, will be faster than constantly accessing the other socket's memory.
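
On Linux, that "system API" is roughly libnuma plus sched_getcpu(). Below is a minimal sketch of the find-your-socket, allocate-locally pattern, assuming a Linux box with libnuma installed (link with -lnuma); it's illustrative, not a full placement policy.

/* Sketch: discover which NUMA node the calling thread is running on and
 * allocate its working buffer from that node's local memory, so hot data
 * stays off the cross-socket interconnect. */
#define _GNU_SOURCE
#include <numa.h>      /* numa_available, numa_node_of_cpu, numa_alloc_onnode */
#include <sched.h>     /* sched_getcpu */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fputs("NUMA not available on this system\n", stderr);
        return 1;
    }

    int cpu  = sched_getcpu();         /* which logical CPU are we on?        */
    int node = numa_node_of_cpu(cpu);  /* ...and which socket/node is that?   */
    printf("thread on cpu %d, NUMA node %d\n", cpu, node);

    /* Allocate 64 MiB backed by pages on the local node. */
    size_t len = 64u << 20;
    void *buf = numa_alloc_onnode(len, node);
    if (!buf) {
        perror("numa_alloc_onnode");
        return 1;
    }

    /* ... do node-local work here; pass messages (copies) to threads
     * pinned on other nodes instead of handing them this pointer ... */

    numa_free(buf, len);
    return 0;
}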

Then you have the issue of load-balancing memory allocations. It may not always be an issue, but it can become one if you consume all of one socket's memory. There are other issues too, like one socket having direct access to the NIC while the other socket has direct access to the hard drives. Topology is important.

As soon as you step out of a cache-coherent system, you run into even more fun problems. Stale data is a huge issue: you need to make sure you're always getting the current data and not some local copy. At the same time, without cache coherency, cross-core communication is very high latency. Most x86 CPUs can remain cache-coherent down to single-cycle latencies. While copying the data may not be any faster, you know very, very quickly whether the data changed. If the data is read a lot and rarely changed, then you have some nice guarantees about how quickly you find out it changed, and you only incur the access cost when that event happens. Without coherency, you are forced to go out to memory every time, incurring a high latency cost on every access.
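
The read-mostly case is easy to sketch with C11 atomics. On a coherent machine, the reader's check below normally hits its own cache and only costs a cross-core miss when the writer actually publishes something; the struct and function names are just illustrative, and a real multi-word payload would want a proper seqlock.

/* Sketch: "read a lot, rarely changed" on a cache-coherent machine. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    _Atomic uint64_t version;   /* bumped after every update */
    int              payload;   /* the rarely-changing data  */
} shared_t;

/* Writer: update payload, then publish by bumping version. */
static void publish(shared_t *s, int value)
{
    s->payload = value;
    atomic_fetch_add_explicit(&s->version, 1, memory_order_release);
}

/* Reader: cheap check; re-read payload only when the version moved.
 * While nothing changes, the load of `version` stays a local cache hit. */
static int read_if_changed(shared_t *s, uint64_t *last_seen, int *out)
{
    uint64_t v = atomic_load_explicit(&s->version, memory_order_acquire);
    if (v == *last_seen)
        return 0;               /* nothing new: no trip to the other core */
    *out = s->payload;          /* changed: pay the access cost once      */
    *last_seen = v;
    return 1;
}

int main(void)
{
    shared_t s = { .version = 0, .payload = 0 };
    uint64_t seen = 0;
    int value = 0;

    publish(&s, 42);
    if (read_if_changed(&s, &seen, &value))
        printf("saw new value %d\n", value);
    /* A second call returns 0 until the next publish. */
    return 0;
}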

With multi-core systems, cache coherency has an N^2 problem. I'm sure someone will come up with an idea of "channels" to facilitate low-latency inter-core communication while allowing normal memory access to be separate. Possibly even islands of cache coherency in an ocean of cores, where each island is a small group. Some of the many-core designs with 80+ cores have heavy locality issues: adjacent cores are fast to access and far-away cores are very expensive. Pretty much think of each core as only able to talk to adjacent cores, with requests to far-away cores needing many "hops". Even worse, cores physically nearer the memory controller have faster access to memory, and all memory requests have to go through those cores. Lots of fun issues that require custom OS designs.

Comment Re: massive parallel processing=limited applicatio (Score 1) 114

I've managed super-linear scaling a few times with multi-threading. It required good use of cache. If you can get the threads to be pseudo-synchronized without having to use any actual synchronization, the other threads can benefit from what the first thread reads from memory. This case only applies to cores that share the same cache. The "super-linear" part no longer applied when adding more sockets/CPUs, and adding more cores had diminishing returns, approaching a fixed percentage increase in performance over a single thread.
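
A rough illustration of that pseudo-synchronized read pattern (a sketch, not a benchmark): all the threads stride through the same read-only array in an interleaved fashion, so they touch the same cache lines at roughly the same time and the first thread's miss warms the shared cache for the rest. Whether it actually goes super-linear depends on cache topology and how tightly the threads stay in step; the sizes and thread count here are arbitrary.

/* Compile with: gcc -O2 -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>

#define N        (16 * 1024 * 1024)   /* 16M doubles of shared read-only data */
#define NTHREADS 4

static double data[N];

struct arg { int id; double sum; };

static void *worker(void *p)
{
    struct arg *a = p;
    double s = 0.0;
    /* Interleaved indices: thread 0 takes 0,4,8,..., thread 1 takes 1,5,9,...
     * Neighbouring threads hit the same cache line within a few iterations. */
    for (size_t i = (size_t)a->id; i < N; i += NTHREADS)
        s += data[i];
    a->sum = s;
    return NULL;
}

int main(void)
{
    for (size_t i = 0; i < N; i++)
        data[i] = (double)i;

    pthread_t tid[NTHREADS];
    struct arg args[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        args[t] = (struct arg){ .id = t, .sum = 0.0 };
        pthread_create(&tid[t], NULL, worker, &args[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        total += args[t].sum;
    }
    printf("sum = %f\n", total);
    return 0;
}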

Then I tell people I code in C# and they don't understand how someone who writes in a high-level language knows how to think so low-level. Let's just say I'm the go-to guy when you can't empirically find out why your code is slow. Many hard performance issues cannot be measured, because measuring can change the outcome. At that point you need a good mental model of how CPUs, caches, memory, OSes, thread schedulers, I/O schedulers, hard drives, SSDs, and networks interact to produce strange slowness when no one part is the bottleneck. It's almost always an issue of latency vs. throughput, with different parts of the system having different throughput or latency characteristics.

Comment Re:So let me get this straight (Score 1) 188

$180 PSU, $150 mobo, $150 memory, $400 for a few SSDs, $60 case, $200 monitor, $300 GPU, $70 Intel network card. Not to mention the $30 each for mag-lev bearing fans. Yep, I really want to save $50 on a CPU with heat and power issues.

I came from a poor family; I had to earn my own money to buy computer parts when I was a child, and I've learned to appreciate quality. If AMD can get within 10% of Intel in performance per core and efficiency, I will support the underdog. I really want a bunch of cores and ECC memory on my desktop, but Xeons are too expensive.

Comment Re:Reliability (Score 1) 209

I've been building and repairing computers for 25+ years and have worked in IT for quite a few. I have never seen a hard drive die from power issues. I have seen burnt motherboards and melted traces where the power comes in, but I have never had an HD die from a surge or lightning strike; pretty much just unexpected shutdowns in need of a scandisk. I have seen drives die for a myriad of other reasons.

How common are surge/lightning/PSU-blow-up HD deaths? My limited experience says "not often", since I've never seen one.

Comment Re:comment (Score 1) 209

To change any earlier block. Changing earlier data requires later data to be re-written, because the write head is wider than the read head. As long as you only append data, you're fine. Therein lies the rub: how do you know if you're near the front or back of a shingled region? If it's always per track, then that information is available. Even then, most/all file systems don't care. OpenZFS will care in the future; its CoW nature plays well with being able to almost always append to these regions, reducing the amount of re-writing.
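
For host-aware and host-managed SMR drives, Linux does expose the zone layout (drive-managed drives hide it entirely, which is part of why most filesystems don't care). A minimal sketch of where a filesystem or tool could look, assuming a reasonably recent kernel; the device name is just an example and these sysfs attributes may not exist on older kernels.

/* Sketch: report whether a block device is zoned (SMR) and its zone size. */
#include <stdio.h>

static void show(const char *path, const char *label)
{
    char buf[64] = {0};
    FILE *f = fopen(path, "r");
    if (!f) {
        printf("%s: not reported (not a zoned device, or older kernel)\n", label);
        return;
    }
    if (fgets(buf, sizeof buf, f))
        printf("%s: %s", label, buf);
    fclose(f);
}

int main(void)
{
    /* "none", "host-aware" or "host-managed"; /dev/sda is an example. */
    show("/sys/block/sda/queue/zoned", "zone model");
    /* Zone (shingled region) size in 512-byte sectors, for zoned devices. */
    show("/sys/block/sda/queue/chunk_sectors", "zone size (sectors)");
    return 0;
}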

Comment Re:comment (Score 1) 209

OpenZFS has been working to become aware of shingled storage. The CoW nature of ZFS already plays well with shingled recording, but it will become much better once the FS is aware of the layout. In theory it's not much work; in practice, it's a lot of refactoring.
