Comment Re:Let me get this straight (Score 1) 405

I'd wager that a large percentage of the 47% are businesses with a large number of seats and a pretty much standard application set. Introducing a new application into that kind of environment is normally a big decision anyway, with the likelihood of a rollout of new hardware/OS.

If the support contract for your software has come to an end and the supplier is no longer willing to support older software, then you obviously have a business case for an upgrade. However, as Microsoft will provide security updates for XP until 2016, why upgrade now?

Comment Let me get this straight (Score 5, Insightful) 405

If you are a company that has a working system that runs fine, why would you force an upgrade just because XP isn't used by consumers any more? Even if you put the economic cost at zero, which it certainly isn't and which the summary brushes aside way too casually, you always have the risk of unforeseen issues getting past testing.

No business should upgrade for the sake of technology fashion, whether it be OS or applications. Hell, you see companies running custom DOS programs all the time.

Comment Re:oversimplified (Score 2) 403

in essence: the x86 instruction set is *extremely* efficiently memory-packed. it was designed when memory was at a premium. each new revision added extra "escape codes" which kept the compactness but increased the complexity. by contrast, RISC instructions consume quite a lot more memory as they waste quite a few bits. in some cases *double* the amount of memory is required to store the instructions for a given program [hence where the L1 and L2 cache problem starts to come into play, but leaving that aside for now...]

New x86 instructions didn't use escape codes, they used unused opcode space, and the instruction format hasn't changed much since the 386. In fact, when going to 64-bit, the default data size and the meanings of some fields were changed, but other than that nothing radical.
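
For anyone who wants to see how stable the encoding has stayed, here's a rough illustrative sketch (the byte values are the standard x86 encodings for a register-to-register ADD; the surrounding C++ is just scaffolding I've added): the 64-bit form of the same instruction differs only by the REX.W prefix, the opcode and ModRM bytes are untouched.

    #include <cstdio>
    #include <cstdint>

    // Encoding of "add eax, ebx" (32-bit operand size):
    //   0x01 = ADD r/m32, r32    0xD8 = ModRM (mod=11, reg=EBX, rm=EAX)
    static const uint8_t add_eax_ebx[] = { 0x01, 0xD8 };

    // Encoding of "add rax, rbx" (64-bit operand size):
    //   0x48 = REX.W prefix (promotes the operand size to 64 bits)
    //   opcode and ModRM bytes are identical to the 32-bit form
    static const uint8_t add_rax_rbx[] = { 0x48, 0x01, 0xD8 };

    int main() {
        printf("32-bit form: %zu bytes, 64-bit form: %zu bytes\n",
               sizeof(add_eax_ebx), sizeof(add_rax_rbx));
        return 0;
    }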

so what that means is that *regardless* of the fact that CISC instructions are translated into RISC ones, the main part of the CPU has to run at a *much* faster clock rate than an equivalent RISC processor, just to keep up with decode rate. we've seen this clearly in an "empirical observable" way in the demo by ARM last year, of a 500mhz Dual-Core ARM Cortex A9 clearly keeping up with a 1.6ghz Intel Atom in side-by-side running of a web browser, which you can find on youtube.

This makes no sense: instructions on modern x86 processors are decoded and stored in the trace cache, so in a loop the processor is executing pre-translated instructions. In fact, on AMD's latest CPU architecture, instructions can't be sent to the execution units fast enough.

intel know this, and AMD don't. it's why intel will sell their fab R&D plant when hell freezes over. AMD have a slight advantage in that they've added in parallel execution which *just* keeps them in the game i.e. their CPUs have always run at a clock rate that's *lower* than an intel CPU, forcing them to publish "equivalent clock rate" numbers in order to not appear to be behind intel. this trick - of doing more at a lower speed - will keep them in the game for a while.

AMD hasn't done this for years, and when they did it was because their processors had a higher IPC than Intel's at the time, so MHz was not a fair metric for comparing the processors.

but, if intel and AMD don't come out with a RISC-based (or VILW or other parallel-instruction) processor soon, they'll pay the price. intel bought up that company that did the x86-to-DEC-Alpha JIT assembly translation stuff (back in the 1990s) so i know that they have the technology to keep things "x86-like".

Erm, Intel had VLIW, remember Itanium? Other than in highly parallel number-crunching workloads it was being outperformed by Intel's own x86s. For general computing VLIW sucks; program execution is just too dynamic.

Comment Re:Blast in time (Score 2) 403

Every modern CISC chip is basically a dynamic translator on top of a RISC core.

And that's the problem for power consumption. You can cut power to execution units that are not being used. You can't ever turn off the decoder ever (except in Xeons, where you do in loops, but you leave on the micro-op decoder, which uses as much power as an ARM decoder) because every instruction needs decoding.

If it were just a case of turning off execution units for a processor with a simpler decoder, then the Cortex-A9 wouldn't need the extra low-power fifth core.

Comment Re:oversimplified (Score 1) 403

Well, there has never been a CISC-to-RISC instruction conversion in an Intel processor; AFAIK the AMD K5 was the only processor to do so, and the original Pentiums pretty much outperformed it.

Out-of-order Intel processors since the Pentium Pro have converted instructions into very simple uops; in fact, many RISC processors do the same thing.
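
To make the uop idea concrete, here's an illustrative sketch (hedged, since the exact breakdown varies by microarchitecture): a single read-modify-write x86 instruction gets cracked into a handful of RISC-like internal operations.

    #include <cstdint>

    // A simple read-modify-write in C++:
    //     *counter += delta;
    // compiles to a single x86 instruction along the lines of
    //     add dword ptr [rdi], esi
    // A typical out-of-order core does not execute that as one unit. The
    // decoder cracks it into simpler internal operations, roughly:
    //     uop 1: load   tmp <- [rdi]          (memory read)
    //     uop 2: add    tmp <- tmp + esi      (ALU)
    //     uop 3: store  [rdi] <- tmp          (memory write; often split
    //                                          into store-address + store-data)
    // Each uop is about as simple as a RISC instruction, which is the point.
    void bump(std::uint32_t* counter, std::uint32_t delta) {
        *counter += delta;  // one CISC instruction at the ISA level, several uops inside
    }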

Comment Intel having to work backwards (Score 1) 403

Intel have always designed their processors for performance first, whereas with ARM it was for power consumption; hell, only recently did ARM get a hardware integer divide instruction. x86 instruction decode is not so complicated that it requires four times the amount of power; if it did, Intel would never be able to produce high-performance chips.

The CISC/RISC debate is pretty much a red herring, but it keeps coming up over and over again. As you increase performance, the instruction decode becomes a smaller part of the processor; this is why on the A9 you have a fifth extra core for stand-by which has been engineered for low power.

It isn't the 80s and 90s any more.

Comment Re:Why so little memory? (Score 1) 67

Accessing global memory on GPUs is extremely slow and there is a strict memory hierarchy that you have to adhere to in order to get any kind of performance.

It could be seen as being the same as the CPU, except it will automatically cache it to fast memory for you.

Any problem where you need random access over a large amount of data is just not feasible on GPUs.

What makes you think it would be faster with the MIC?

Granted, it is the hardware doing the work, but you have four threads per core on Knights Corner, with a cache-line miss causing a context switch that masks the latencies. You also don't have the extra overhead of your code setting up the copies between global and shared memory (which is limited to 48K on CUDA) every time you want to access a data structure. Obviously you have many more cores on a GPU, but how much performance do you think you will get once you have to jump through all the hoops and basically implement your own caching mechanisms? Ultimately GPUs are limited to simple problems where your dataset can be broken into very small pieces with very little logic and simple memory access, which is fine for big number-crunching problems; with MIC you at least will have more flexibility.
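
For anyone who hasn't written CUDA, here's a minimal sketch of the staging I mean (hypothetical kernel and buffer names, just to show the shape of it): every block copies its tile from global into the ~48K of shared memory by hand before it can work on it, which is exactly the caching you get for free from the hardware on MIC.

    #include <cuda_runtime.h>

    #define TILE 256  // elements staged per block; TILE * sizeof(float) has to
                      // fit in the ~48 KB of shared memory available per block

    // Hypothetical kernel: each block copies its tile of a large global array
    // into fast on-chip shared memory before working on it.
    __global__ void process_tiles(const float* __restrict__ in,
                                  float* __restrict__ out, int n)
    {
        __shared__ float tile[TILE];          // explicit, software-managed cache

        int base = blockIdx.x * TILE;
        int idx  = base + threadIdx.x;

        // 1. Stage: copy global -> shared, one element per thread (coalesced).
        if (idx < n)
            tile[threadIdx.x] = in[idx];
        __syncthreads();                      // wait until the whole tile is loaded

        // 2. Compute out of shared memory (just a toy transform here).
        float v = (idx < n) ? tile[threadIdx.x] * 2.0f : 0.0f;

        // 3. Write the result back to global memory.
        if (idx < n)
            out[idx] = v;
    }

Launched as process_tiles<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n); the point is that the staging step and the 48K budget are the programmer's problem rather than the hardware's.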

Comment Re:Why so little memory? (Score 1) 67

You will be parallelizing, and each thread will only ever be able to use max_mem/N for its own processing.
When you parallelize, you avoid sharing memory between threads. Your data set is split over the threads and synchronization is minimized. In a SMP/NUMA model, this is done transparently by simply avoiding to access memory that other threads are working on. In other models, you have to explicitly send the chunk of memory that each thread will be working on (through DMA, the network, an in-memory FIFO or whatever), but it doesn't change anything from a conceptual point of view.

If your parallel decomposition is much more efficient if your data per thread is larger than 1GB, then you cannot possibly run 64 threads set up like this on the MIC platform. There is often a minimum size required for a parallel primitive to be efficient, and if that minimum size is greater than max_mem/N then you have a problem. This is the limiting factor I'm talking about. 128 MB, however, is IMO quite large enough.

For algorithms where you have basically regular streaming data then yes, your working data set will be mem/N, but as I mentioned there are a number of problems where you have a large, mainly static dataset, such as raytracing or financial modeling. In these scenarios, being able to access a large shared pool of memory has big advantages.

In fact this is a major advantage of MIC versus GPUs.
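
To show the kind of access pattern I mean, here's a hedged CUDA sketch (the names and index arithmetic are made up): each thread does data-dependent lookups into a read-only table far bigger than shared memory, so nothing can be tiled or prefetched and every read pays full global-memory latency, whereas on a cache-based design like MIC the hot entries simply stay resident in the caches.

    // Hypothetical gather kernel: each thread chases data-dependent indices
    // through a large read-only table held in global memory. The table is far
    // bigger than the 48K of shared memory per block, and the next index is
    // only known after the previous read, so there is nothing to stage.
    __global__ void gather_chain(const float* __restrict__ table,  // e.g. gigabytes of scene/market data
                                 const int*   __restrict__ start,  // one starting index per thread
                                 float*       __restrict__ out,
                                 int table_size, int n, int hops)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        int idx = start[i];
        float acc = 0.0f;
        for (int h = 0; h < hops; ++h) {
            float v = table[idx];             // uncoalesced, data-dependent read
            acc += v;
            // the next index depends on the value just read, so it cannot be
            // prefetched or copied into shared memory ahead of time
            idx = (idx + (int)(v * 1000.0f) + 1) % table_size;
            if (idx < 0) idx += table_size;
        }
        out[i] = acc;
    }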

The advantage of MIC lies in ease of programming thanks to compatibility with existing tools and the more flexible programming model.
Memory on GPUs is global as well, so I have no idea what you're talking about. There is also so-called "shared" memory (CUDA terminology, OpenCL is different) which is per block, but that's just some local scratch memory shared by a group of threads.

Accessing global memory on GPUs is extremely slow and there is a strict memory hierarchy that you have to adhere to in order to get any kind of performance. This hierarchy is what makes it a pain to program for and why you need special tools and kernels in the first place. Any problem where you need random access over a large amount of data is just not feasible on GPUs.

There is nothing nightmarish about the above

Please stop deforming what I'm saying. What is nightmarish is finding the optimal work distribution and scheduling of a heterogeneous or irregular system.
Platforms like GPUs are only fit for regular problems. Most HPC applications written using OpenMP or MPI are regular as well. Whether the MIC will be able to enable good scalability of irregular problems remains to be seen, but the first applications will definitely be regular ones.

For those kinds of problems there isn't anything in the MIC that will set the world on fire other than the easier programming model, as it basically comes down to bandwidth and FLOPS. However, from what I have seen of the architecture, there are a number of areas where it should perform nicely. FYI, if you haven't already read it:

http://newsroom.intel.com/servlet/JiveServlet/download/38-11511/Intel_Xeon_Phi_Hotchips_architecture_presentation.pdf

Comment Re:Why so little memory? (Score 1) 67

On a card with 8GB, your effective memory accessible per core is 8GB; lots of problems have large data sets that can be shared over cores, such as the example I gave. In fact this is a major advantage of MIC versus GPUs.

There is nothing nightmarish about the above; it would appear as just a shared memory area to the process.

Comment Re:Why so little memory? (Score 1) 67

According to the architecture, each core doesn't have its own memory other than the L2 & L1 caches. How the memory is mapped per core is arbitrary; there is nothing stopping you from having, for example, a shared data set using 4GB and 64MB per core for a raytracer, where the scene data is stored in the shared memory and each core works on part of the scene. So no, you don't have to limit memory per thread to use the architecture properly.
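
As a rough host-side sketch of that decomposition (hypothetical sizes and names; plain C++ threads, since the MIC cores run ordinary x86 threads): the scene is allocated once and shared read-only by every thread, while each thread keeps only a small private buffer for the tile it is rendering.

    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    // Hypothetical sizes: one large scene shared by all cores, a small private
    // working buffer per thread. Nothing in the architecture forces the scene
    // to be split into per-core chunks.
    constexpr std::size_t kSceneFloats = 1u << 20;   // stand-in for "4GB of scene data"
    constexpr std::size_t kTilePixels  = 64 * 64;    // per-thread working set
    constexpr unsigned    kThreads     = 8;          // stand-in for one thread per core

    static void render_tile(const std::vector<float>& scene,  // shared, read-only
                            unsigned tile_id)
    {
        std::vector<float> tile(kTilePixels, 0.0f);  // private to this thread
        for (std::size_t p = 0; p < kTilePixels; ++p) {
            // Toy "shading": any pixel may read anywhere in the shared scene.
            std::size_t idx = (tile_id * 131071u + p * 2654435761u) % scene.size();
            tile[p] = scene[idx] * 0.5f;
        }
        std::printf("tile %u done, sample value %f\n", tile_id, tile[0]);
    }

    int main() {
        std::vector<float> scene(kSceneFloats, 1.0f);  // allocated once, shared by all threads

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < kThreads; ++t)
            workers.emplace_back(render_tile, std::cref(scene), t);
        for (auto& w : workers) w.join();
        return 0;
    }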

Comment Ooops. (Score 2) 67

Ooops, scratch that, I misread the summary. There probably isn't a need for that much memory because the kind of problems they are most likely to be dealing with will have massive datasets that don't fit in memory anyway. The limiting factor will be CPU and node interconnect bandwidth, so adding extra memory won't make much if any difference to performance.
