

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190

> SunOS/Solaris started out lean; when it got bloated, people like me jumped ship. Linux started out lean and it is getting bloated; when it is getting too bloated, I will jump ship again.

FWIW, Linus said that Linux is bloated. That was at LinuxCon in 2009. I don't think that's much of a factor. I'll agree with you that Linux has plenty of life left in it.

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 2) 190

> Wow. How impressive. Oh wait, Linux has had EDAC since 2006. But you keep paying your millions to Oracle. I'm sure its worth it.

Actually, this might be worth an illustration. It was a long time back, so I'm sure I've forgotten a few details, but I'll give you the big picture.

Around 2000, Sun Microsystems had a problem with the L2 cache on their 400MHz CPUs. It seems that IBM misrepresented the error rate on the chips, and they were seeing bit errors at a rate much higher than specified. Because the error rate was supposed to be incredibly low, they had engineered the L2 cache with only parity protection. That's enough to detect an error and cause a UE (uncorrectable error) event. So I know that the EDAC functionality you got in 2006 was in Solaris well before 2000.

After that problem, Sun Microsystems did two things. First, they mirrored the L2 cache. Second, they completely beefed up their handler for CE/UE (correctable errors and uncorrectable errors) along the memory/cache/bus/cpu to bring it up to Enterprise level error handling. You get an Uncorrectable Error in your CPU's L2 cache. Do you panic? I looked over the EDAC documentation and I could be wrong (please correct me, if so) but it looks like that would result in a panic. Or you could just have it log that the UE event happened but take no action.

What would Solaris do differently? It would find the page of virtual memory that had the corresponding error. Has it been modified? If not, just discard the page, log the event, and go on. There is a whole set of rules it goes through to determine the best way to keep the system running when it hits an uncorrectable error. Let's say that the page was modified and that there was an uncorrectable error in the L2 cache. We panic now, right? No. Solaris checks and sees who the page of memory belongs to. If it is a user process, then that process is simply killed (and the event logged) and the OS continues running. Only if it is a dirty page of active kernel memory do we have a panic.
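To make the rules in the paragraph above concrete, here is a minimal sketch of that decision order. It is only an illustration of the three cases described in this comment (clean page, dirty user page, dirty kernel page), not Solaris's actual handler; the names `Page`, `Owner`, `Action` and `handle_uncorrectable_error` are hypothetical, and the real rule set is described above as being larger than this.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    USER = "user"
    KERNEL = "kernel"


class Action(Enum):
    DISCARD_PAGE = "discard page, log event, continue"
    KILL_PROCESS = "kill owning process, log event, continue"
    PANIC = "panic"


@dataclass
class Page:
    dirty: bool   # has the page been modified since it was loaded?
    owner: Owner  # who the page of memory belongs to


def handle_uncorrectable_error(page: Page) -> Action:
    """Pick a response to an uncorrectable error that hit this page.

    Hypothetical simplification of the rules described in the comment.
    """
    if not page.dirty:
        # Unmodified page: a good copy exists elsewhere, so drop it.
        return Action.DISCARD_PAGE
    if page.owner is Owner.USER:
        # Modified user page: only the owning process is lost.
        return Action.KILL_PROCESS
    # Modified kernel page: nothing safe is left to do.
    return Action.PANIC
```

The point of ordering the checks this way is that a panic is the last resort rather than the default.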

That isn't just recovering from a soft error. That's recovering from a hard error. So, as this story illustrated, there are quite a number of things happening behind the scenes in an enterprise level OS. You picked a good example with Linux EDAC.

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190

> Why bother? Just shut down that server, replace the memory, restart. If your application can't handle a brief downtime for one of your servers, there is something wrong with the application, and no OS magic can fix that for you.

You know, it is kind of funny. Person A will argue, "See? Linux has all of the cool features of an enterprise class operating system. What makes Solaris on SPARC so special?" When you point out just a fraction of the things that Linux doesn't do, person B will jump in and claim, "OMG that OS is so bloated that people are running away from it for that very reason!"

Hey, Solaris on SPARC certainly has its issues, but let's be real here. People are not running away from it in any significant numbers due to the inclusion of enterprise-class features. The irony of that argument is that many of these "bloated" enterprise-class features in Solaris have been working their way into Linux for years. Linux has been making great strides over the past ten years.

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190

> Also, yes, various Linux virtualization technologies you can hot-migrate running systems between software VMs within the same chassis (and warm migrate with downtime between hardware chassis).

In this case, it was a live migration over two different chassis.

> But the price/perf/features just don't add up in the modern world versus commodity hardware and open source software, except...

In the existing case with a large corporate environment with a Linux/x86 cloud and a Solaris/SPARC cloud, Oracle wins the spot for lowest price point for a system, and wins at the higher end because Linux doesn't (reasonably) scale that big.

> You just have to install the "mcelog" package on e.g. Debian/Ubuntu. I'm sure the same software exists for the other distros.

Will it simply retire bad pages (at a page level) as they happen, or is it able to detect when enough errors have happened on a single DIMM and to retire all the pages on the DIMM (because it understands the hardware layout and can map those pages to specific hardware)? That's the advantage of controlling the OS and controlling the hardware. Here is an example of multiple errors being detected by Solaris on a DIMM and it identifying and retiring all pages on the entire DIMM:

Fault class : fault.memory.dimm_sb
Affects : mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0 degraded but still in service
FRU : "CPU 1 DIMM 3" (hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=oryx/motherboard=0/chip=1/memory-controller=0/dimm=3)

Description : The number of errors associated with this memory module has exceeded acceptable levels. Refer to for more information.

Response : Pages of memory associated with this memory module are being removed from service as errors are reported.

Impact : Total system memory capacity will be reduced as pages are retired.
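The difference I'm asking about (retiring single pages versus retiring a whole DIMM once it has shown enough errors) can be sketched roughly like this. This is a toy model under stated assumptions, not how Solaris or mcelog is implemented: the class `DimmRetirementTracker`, the `page_to_dimm` map and the error threshold are all hypothetical, and a real system gets the page-to-DIMM mapping from knowledge of the hardware layout.

```python
from collections import defaultdict


class DimmRetirementTracker:
    """Toy model: retire pages one at a time, then the whole DIMM."""

    def __init__(self, page_to_dimm, threshold):
        self.page_to_dimm = page_to_dimm   # hypothetical page -> DIMM map
        self.threshold = threshold         # hypothetical error limit per DIMM
        self.errors_per_dimm = defaultdict(int)
        self.retired_pages = set()
        self.degraded_dimms = set()

    def report_error(self, page):
        """Record an error on a page; return True if its DIMM is degraded."""
        dimm = self.page_to_dimm[page]
        # Page-level retirement happens on every reported error.
        self.retired_pages.add(page)
        self.errors_per_dimm[dimm] += 1
        if self.errors_per_dimm[dimm] >= self.threshold:
            # DIMM-level retirement: pull every page that maps to this DIMM.
            self.degraded_dimms.add(dimm)
            for other_page, other_dimm in self.page_to_dimm.items():
                if other_dimm == dimm:
                    self.retired_pages.add(other_page)
        return dimm in self.degraded_dimms


# Usage: three pages on one DIMM, one page on another, limit of two errors.
tracker = DimmRetirementTracker(
    {0: "dimm3", 1: "dimm3", 2: "dimm3", 3: "dimm0"}, threshold=2
)
first = tracker.report_error(0)
second = tracker.report_error(1)
```

The second step only works if the software can map pages back to physical modules, which is the part that depends on knowing the hardware.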

Comment: Actual Solaris Sysadmin Here - Here's the story (Score 4, Informative) 190

Solaris/SPARC is still going strong in large companies. One of the greatest advantages it has is that Oracle creates and supports the operating system, and Oracle creates and supports the hardware. (If you're running an Oracle database or some other piece of software, then that's an additional component that they create and support.) What this means is that if I'm having a problem, mundane or esoteric, I can go to one vendor and say, "Fix it." There isn't any bickering about which company's problem it is, or who manufactured my RAM, or any of the other silliness that crops up in vendor support. Large companies value this (as do we sysadmins). That also means they can do some very cool software tricks (a few of which I'll mention below).

The decreasing unit shipments are just as much a sign of virtualization as anything. Right now, I'm looking at an older T5240, with two eight-core CPUs, which presents itself as having 128 virtual CPUs (execution engines or thread engines), and 64GB of RAM. This is by no means the biggest box on the floor. We carve these up into smaller systems using either Solaris Zones or LDOMs. Those are two different methods of virtualization with two different goals.

I did something great with an LDOM last week. I took a virtual server that was on the box and migrated the entire operating system and all the applications over to another LDOM... WHILE IT WAS STILL RUNNING. Aside from a quick (1 second) pause, the applications on the server had no idea that they had just migrated to another piece of hardware while they continued to run. Slick! The original server had a failing DIMM. No worries, though: even aside from ECC, the operating system automatically mapped out which parts of the DIMM were defective and retired those pages of memory so that they weren't constantly being exercised. Linux does all that... right? No?

Someone else, above, said, "I don't think you can have a zfs system fail and move it to different hardware like you can with vmware...". Never mind that we can migrate a running operating system and application to another piece of hardware and keep it running. Yes, of course if you have a hardware fault, you can bring it back up on another machine. The virtualization with Solaris is quite capable.

In the environment of a large company where we're competing against Linux on the low-cost end of things, Solaris/SPARC is not only holding its own, but actually beating our Linux cloud counterparts in the costs of a virtualized OS/hardware. (I should ask my boss if we can publish a paper on this, because it is rather impressive.)

On the high end of things, we completely dominate. We generally use a T5-4 for our internal cloud (which really isn't the biggest Oracle server out there). It has 64 cores, presenting 512 execution threads to the scheduler. RAM goes up to 2TB. If someone starts out on a tiny box with only one CPU and 4GB of RAM, we can scale them all the way up to the top by increasing their virtualization settings. No migrating to different or unusual hardware. If an application team can't scale their code horizontally (hey, it happens), they can go way vertical in this configuration. We haven't had a need yet for an M6-32 (32TB of RAM and 32 of the 12-core CPUs, for 3072 execution threads or "virtual CPUs"). We have Linux surrounded (on the low-cost side and the high-performance side) in a large enterprise environment, and that's why Solaris is still there.
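For anyone checking the thread counts quoted in this comment, they all follow from cores times hardware threads per core. A quick sketch, assuming 8 threads per core (a figure inferred from the numbers above, not stated outright) and a hypothetical helper name `virtual_cpus`:

```python
THREADS_PER_CORE = 8  # assumption inferred from the figures in the comment


def virtual_cpus(total_cores, threads_per_core=THREADS_PER_CORE):
    """Execution threads the scheduler sees for a given core count."""
    return total_cores * threads_per_core


t5240 = virtual_cpus(2 * 8)     # two eight-core CPUs
t5_4 = virtual_cpus(64)         # 64 cores
m6_32 = virtual_cpus(32 * 12)   # 32 of the 12-core CPUs
```

Under that assumption the three results match the 128, 512 and 3072 figures given for the T5240, T5-4 and M6-32.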

Now, I'm not an Oracle salesperson. But if Slashdot ever did an AMA with an Oracle sales engineer, I think my fellow Linux admins would be particularly impressed by how far ahead Oracle/SPARC is in a number of key areas.

Comment: Patrol more, sit less (Score 1) 468

by Wokan (#48914257) Attached to: Police Organization Wants Cop-Spotting Dropped From Waze App

I understand cops ideally need to be parked somewhere to monitor traffic, but rush hour makes up a limited part of their day. And even then they can alter their location periodically to foil both stalkers and people hoping to avoid the upcoming speed traps.

It's not like, "But my app said you were further up the road," is going to get anyone out of a ticket. And all the invalid police location entries will keep the Waze users slowed down more than if the cop had stayed put, because there's now a larger area they might get busted in.

Comment: Fake CC list wouldn't work (Score 1) 130

by Wokan (#48600741) Attached to: Sony Pictures Leak Reveals Quashed Plan To Upload Phony Torrents

I can't speak to the other fake flooding ideas, but I do recall hearing a story where the reporter was shocked to learn that many who sell compromised CC lists offer a refund for any CC numbers that don't work. Putting out a long list of free numbers might distract a few newbies, but the people with the experience and the financial backing wouldn't have been distracted by such free lists of dubious origin.

Comment: Rush that check to the bank (Score 1) 400

Good on Mozilla for getting an influx of cash, but I'll be changing the search preferences of every Firefox I install on my and my coworkers' systems. Yahoo's little "no more working from home" mess was cited by our CEO as a reason to halt the practice where I work. So Yahoo can go fsck itself with a broken broomstick.

Comment: Re:But the case hasn't even started! (Score 1) 119

by MacAndrew (#48414257) Attached to: US Marshals Auctioning $20M Worth of Silk Road's Bitcoins

OMG, self-deprecation on the web. Seriously, kudos. (I am not being sarcastic.)

You're very right that the way the law uses certain words and expressions—"terms of art"—can be very different from expected. "Weapons of mass destruction" for example. :)

Good link provided in above comment:

Comment: Pavlovian classroom? (Score 1) 66

The quote "I like it because you get rewarded for your good behavior — like a dog does when it gets a treat" should be plenty to flag a really archaic approach to school that's going to work for some kids and poison the rest. The article mentions the criticism for the underlying theory as well. Teachers should be connecting with their kids. What's next? Food pellets for good behavior? Arf! Johnny's a good boy.
