Comment Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190
> Also, yes, various Linux virtualization technologies you can hot-migrate running systems between software VMs within the same chassis (and warm migrate with downtime between hardware chassis).
In this case, it was a live migration between two different chassis.
> But the price/perf/features just don't add up in the modern world versus commodity hardware and open source software, except...
In the case at hand, a large corporate environment running both a Linux/x86 cloud and a Solaris/SPARC cloud, Oracle wins on lowest price point for a system, and wins at the high end because Linux doesn't (reasonably) scale that big.
> You just have to install the "mcelog" package on e.g. Debian/Ubuntu. I'm sure the same software exists for the other distros.
Will it simply retire bad pages one at a time as errors happen, or can it detect that enough errors have occurred on a single DIMM and retire every page on that DIMM (because it understands the hardware layout and can map those pages to specific hardware)? That's the advantage of controlling both the OS and the hardware. Here is an example of Solaris detecting multiple errors on a DIMM, identifying the DIMM, and retiring all pages on it:
Fault class : fault.memory.dimm_sb
Affects : mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0
          degraded but still in service
FRU : "CPU 1 DIMM 3" (hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=oryx/motherboard=0/chip=1/memory-controller=0/dimm=3)
Description : The number of errors associated with this memory module has exceeded acceptable levels. Refer to http://sun.com/msg/AMD-8000-2F for more information.
Response : Pages of memory associated with this memory module are being removed from service as errors are reported.
Impact : Total system memory capacity will be reduced as pages are retired.
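To make the question concrete, here is a rough sketch of the difference between per-page retirement and DIMM-level retirement. It is not how Solaris FMA or mcelog is actually implemented. The class name, the page-to-DIMM map, and the threshold of 3 are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical threshold; a real fault manager uses its own diagnosis rules.
DIMM_ERROR_THRESHOLD = 3


class MemoryFaultTracker:
    """Toy model: retire pages individually, then the whole DIMM past a threshold."""

    def __init__(self, page_to_dimm, threshold=DIMM_ERROR_THRESHOLD):
        # page_to_dimm stands in for the hardware-layout knowledge:
        # which physical page lives on which DIMM.
        self.page_to_dimm = page_to_dimm
        self.threshold = threshold
        self.errors_per_dimm = defaultdict(int)
        self.retired_pages = set()
        self.faulted_dimms = set()

    def report_error(self, page):
        """Record one memory error on `page`; return True if its DIMM is faulted."""
        dimm = self.page_to_dimm[page]

        # Page-level response: always retire the page that reported the error.
        self.retired_pages.add(page)
        self.errors_per_dimm[dimm] += 1

        # DIMM-level response: once errors on this DIMM reach the threshold,
        # retire every page that maps to it.
        if self.errors_per_dimm[dimm] >= self.threshold:
            self.faulted_dimms.add(dimm)
            for other_page, other_dimm in self.page_to_dimm.items():
                if other_dimm == dimm:
                    self.retired_pages.add(other_page)

        return dimm in self.faulted_dimms


# Pages 0-3 live on DIMM "A", pages 4-7 on DIMM "B" (made-up layout).
layout = {page: ("A" if page < 4 else "B") for page in range(8)}
tracker = MemoryFaultTracker(layout)

tracker.report_error(0)
tracker.report_error(1)
tracker.report_error(4)
retired_before = set(tracker.retired_pages)

dimm_a_faulted = tracker.report_error(2)  # third error on DIMM "A"
retired_after = set(tracker.retired_pages)
```

Without the page-to-DIMM map, only the first half of `report_error` is possible: the software can retire pages that have already gone bad, but it can't get ahead of a failing module.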