Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190

> SunOS/Solaris started out lean; when it got bloated, people like me jumped ship. Linux started out lean and it is getting bloated; when it is getting too bloated, I will jump ship again.

FWIW, Linus said that Linux is bloated. That was at LinuxCon in 2009. I don't think that's much of a factor. I'll agree with you that Linux has plenty of life left in it.

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 2) 190

> Wow. How impressive. Oh wait, Linux has had EDAC since 2006. But you keep paying your millions to Oracle. I'm sure its worth it.

Actually, this might be worth an illustration. It was a long time back, so I'm sure I've forgotten a few details, but I'll give you the big picture.

Around 2000, Sun Microsystems had a problem with the L2 cache on their 400 MHz CPUs. It seems that IBM misrepresented the error rate on the chips, and the bit error rate was much higher than specified. Because the error rate was supposed to be incredibly low, Sun engineered the L2 cache with parity protection. That's enough to detect an error and cause a UE (uncorrectable error) event. So I know that your EDAC functionality in 2006 was in Solaris well before 2000.

After that problem, Sun Microsystems did two things. First, they mirrored the L2 cache. Second, they completely beefed up their handler for CE/UE (correctable errors and uncorrectable errors) along the memory/cache/bus/cpu to bring it up to Enterprise level error handling. You get an Uncorrectable Error in your CPU's L2 cache. Do you panic? I looked over the EDAC documentation and I could be wrong (please correct me, if so) but it looks like that would result in a panic. Or you could just have it log that the UE event happened but take no action.

What would Solaris do differently? It would find the page of virtual memory that had the corresponding error. Has it been modified? If not, just discard the page, log the event, and go on. There is a whole set of rules it goes through to determine the best way to keep the system running when it hits an uncorrectable error. Let's say that the page was modified and that there was an uncorrectable error in the L2 cache. We panic now, right? No. Solaris checks and sees who the page of memory belongs to. If it is a user process, then that process is simply killed (and the event logged) and the OS continues running. Only if it is a dirty page of active kernel memory do we have a panic.
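To make those rules concrete, here is a rough Python sketch of the decision order as I described it. It is only an illustration, not Solaris code: the `Page` fields, the `Action` names, and the function name are all made up for the example, and the real handler goes through many more rules than these three.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DISCARD_PAGE = "discard the page, log the event, keep running"
    KILL_PROCESS = "kill the owning user process, log the event, keep running"
    PANIC = "panic"


@dataclass
class Page:
    dirty: bool          # hypothetical flag: has the page been modified?
    kernel_owned: bool   # hypothetical flag: does it belong to active kernel memory?


def handle_uncorrectable_error(page: Page) -> Action:
    """Pick a response to an uncorrectable error that hit this page."""
    # An unmodified page can be thrown away and re-read later.
    if not page.dirty:
        return Action.DISCARD_PAGE
    # A modified page owned by a user process costs only that process.
    if not page.kernel_owned:
        return Action.KILL_PROCESS
    # Only a dirty page of active kernel memory takes the system down.
    return Action.PANIC
```

The point of the ordering is that a panic is the last resort, reached only after the cheaper outcomes have been ruled out.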

That isn't just recovering from a soft error. That's recovering from a hard error. So, as this story illustrated, there are quite a number of things happening behind the scenes in an enterprise level OS. You picked a good example with Linux EDAC.

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190

> Why bother? Just shut down that server, replace the memory, restart. If your application can't handle a brief downtime for one of your servers, there is something wrong with the application, and no OS magic can fix that for you.

You know, it is kind of funny. Person A will argue, "See? Linux has all of the cool features of an enterprise class operating system. What makes Solaris on SPARC so special?" When you point out just a fraction of the things that Linux doesn't do, person B will jump in and claim, "OMG that OS is so bloated that people are running away from it for that very reason!"

Hey, Solaris on SPARC certainly has its issues, but let's be real here. People are not running away from it in any significant numbers due to the inclusion of enterprise-class features. The irony of that argument is that many of these "bloated" enterprise-class features in Solaris have been working their way into Linux for years. Linux has been making great strides over the past ten years.

Comment: Re:Actual Solaris Sysadmin Here - Here's the story (Score 1) 190

> Also, yes, various Linux virtualization technologies you can hot-migrate running systems between software VMs within the same chassis (and warm migrate with downtime between hardware chassis).

In this case, it was a live migration over two different chassis.

> But the price/perf/features just don't add up in the modern world versus commodity hardware and open source software, except...

In the existing case with a large corporate environment with a Linux/x86 cloud and a Solaris/SPARC cloud, Oracle wins the spot for lowest price point for a system, and wins at the higher end because Linux doesn't (reasonably) scale that big.

> You just have to install the "mcelog" package on e.g. Debian/Ubuntu. I'm sure the same software exists for the other distros.

Will it simply retire bad pages (at a page level) as they happen, or is it able to detect when enough errors have happened on a single DIMM and retire all the pages on the DIMM (because it understands the hardware layout and can map those pages to specific hardware)? That's the advantage of controlling the OS and controlling the hardware. Here is an example of Solaris detecting multiple errors on a DIMM, identifying the module, and retiring all pages on the entire DIMM:

Fault class : fault.memory.dimm_sb
Affects : mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0 degraded but still in service
FRU : "CPU 1 DIMM 3" (hc://:product-id=Sun-Fire-X4200-Server:chassis-id=0000000000:server-id=oryx/motherboard=0/chip=1/memory-controller=0/dimm=3)

Description : The number of errors associated with this memory module has exceeded acceptable levels. Refer to for more information.

Response : Pages of memory associated with this memory module are being removed from service as errors are reported.

Impact : Total system memory capacity will be reduced as pages are retired.
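For anyone who wants the idea spelled out, here is a small Python sketch of per-DIMM error accounting. It is a toy model under stated assumptions, not how the Solaris fault manager is implemented: the class name, the method names, and the threshold value are all hypothetical, and the page-to-DIMM mapping is simply handed in rather than derived from hardware topology.

```python
from collections import defaultdict


class DimmErrorTracker:
    """Toy model: retire single pages, then whole DIMMs past a threshold."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold                # hypothetical error limit per DIMM
        self.errors = defaultdict(int)            # DIMM name -> error count
        self.pages_by_dimm = defaultdict(set)     # DIMM name -> pages it backs
        self.retired_pages = set()
        self.faulted_dimms = set()

    def map_page(self, page: int, dimm: str) -> None:
        """Record which DIMM backs a page (assumed known from the hardware layout)."""
        self.pages_by_dimm[dimm].add(page)

    def report_error(self, page: int, dimm: str) -> bool:
        """Count an error; return True once the DIMM is considered faulty."""
        self.map_page(page, dimm)
        self.errors[dimm] += 1
        self.retired_pages.add(page)              # page-level retirement
        if self.errors[dimm] >= self.threshold:
            self.faulted_dimms.add(dimm)
            # DIMM-level retirement: every page mapped to the module goes.
            self.retired_pages |= self.pages_by_dimm[dimm]
        return dimm in self.faulted_dimms
```

The difference from plain page retirement is the mapping step: without knowing which pages live on which module, errors can only ever be handled one page at a time.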

Comment: Actual Solaris Sysadmin Here - Here's the story (Score 4, Informative) 190

Solaris/SPARC is still going strong in large companies. One of the greatest advantages it has is that Oracle creates and supports the operating system, and Oracle creates and supports the hardware. (If you're running an Oracle database or some other piece of software, then that's an additional component that they create and support.) What this means is that if I'm having a problem, mundane or esoteric, I can go to one vendor and say, "Fix it." There isn't any bickering about which company's problem it is, or who manufactured my RAM, or any of the other silliness that crops up in vendor support. Large companies value this (as do we sysadmins). That also means they can do some very cool software tricks (I'll mention a few below).

The decreasing unit shipments are just as much a sign of virtualization as anything. Right now, I'm looking at an older T5240 with two eight-core CPUs, which presents itself as having 128 virtual CPUs (execution engines or thread engines), and 64 GB of RAM. This is by no means the biggest box on the floor. We carve these up into smaller systems using either Solaris Zones or LDOMs. That's two different methods of virtualization with two different goals.

I did something great with an LDOM last week. I took a virtual server that was on the box and migrated the entire operating system and all the applications over to another LDOM... WHILE IT WAS STILL RUNNING. Aside from a quick (1 second) pause, the applications on the server had no idea that they had just migrated to another piece of hardware while they continued to run. Slick! The original server had a failing DIMM. No worries, though: even aside from ECC, the operating system automatically mapped out which parts of the DIMM were defective and retired the pages of memory so that they weren't constantly being exercised. Linux does all that... right? No?

Someone else, above, said, "I don't think you can have a zfs system fail and move it to different hardware like you can with vmware...". Never mind that we can migrate a running operating system and application to another piece of hardware and keep it running. Yes, of course if you have a hardware fault, you can bring it back up on another machine. The virtualization with Solaris is quite capable.

In the environment of a large company where we're competing against Linux on the low-cost end of things, Solaris/SPARC is not only holding its own, but actually beating our Linux cloud counterparts in the costs of a virtualized OS/hardware. (I should ask my boss if we can publish a paper on this, because it is rather impressive.)

On the high end of things, we completely dominate. We generally use a T5-4 for our internal cloud (which really isn't the biggest Oracle server out there). It has 64 cores, presenting 512 execution threads to the scheduler. RAM goes up to 2 TB. If someone starts out on a tiny box with only one CPU and 4 GB of RAM, we can scale them all the way up to the top by increasing their virtualization settings. No migrating to different or unusual hardware. If an application team can't scale their code horizontally (hey, it happens), they can go way vertical in this configuration. We haven't had a need yet for an M6-32 (32 TB of RAM and 32 of the 12-core CPUs, for 3072 execution threads or "virtual CPUs"). We have Linux surrounded (on the low-cost side and the high-performance side) in a large enterprise environment, and that's why Solaris is still there.
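If the thread counts look like magic numbers, the arithmetic is just cores times hardware threads per core. A quick Python check, assuming 8 threads per core, which is the ratio implied by the figures I quoted (the function name is made up for the example):

```python
THREADS_PER_CORE = 8  # assumption: the ratio implied by the quoted core and thread counts


def execution_threads(total_cores: int, threads_per_core: int = THREADS_PER_CORE) -> int:
    """Number of virtual CPUs the scheduler sees for a given core count."""
    return total_cores * threads_per_core


t5240 = execution_threads(2 * 8)    # two eight-core CPUs
t5_4 = execution_threads(64)        # 64 cores
m6_32 = execution_threads(32 * 12)  # 32 of the 12-core CPUs

print(t5240, t5_4, m6_32)  # prints "128 512 3072"
```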

Now, I'm not an Oracle salesperson. But if Slashdot ever did an AMA with an Oracle sales engineer, I think my fellow Linux admins would be particularly impressed by how far ahead Oracle/SPARC is in a number of key areas.

Comment: Re:Say what you will but this is cool (Score 1) 52

by AtariDatacenter (#47784419) Attached to: Google Testing Drone Delivery System: 'Project Wing'

Amazon recently announced it was getting into the advertisement business, and it beat out Google to acquire Twitch.

Pure speculation on my part, but I have to wonder if this is just Google's CEO trying to steal some of the spotlight away from Amazon?

Suddenly, Google is saying, "Oh yeah... delivery drones. We've been doing this for some time now." It smells like petty CEO bickering. (As cool as delivery drones are.)

Comment: Re:noone trusts their cya legalese (Score 4, Insightful) 134

Based on published information, we know that the NSA gets customer information by compelling companies to produce the records, or by tapping the connections between their datacenters and getting the data in transit. Apple didn't deny either -- neither one of those involves installing a backdoor or giving SERVER access.

I think you're on the right track. There really is nothing that Apple can say to convince foreign users that their data is safe.

Comment: Ouya just isn't compelling (Score 5, Insightful) 134

I was an original backer for the Ouya. The interface is a bit awkward, but worse, the software titles just aren't compelling. There doesn't seem to be a great reason to make an exclusive Ouya game, and anything you can find there you can get on your phone or another platform. Playing smartphone games on your TV just doesn't deliver any kind of wow factor. :(

Comment: Welcome to Google Island? (Score 1) 43

by AtariDatacenter (#43816295) Attached to: Google Plans Wireless Networks In Emerging Markets

So, Google wanted their place that was free of government regulation to experiment and try new things out. It sounds like, in many ways, they have found it. They can get their feet wet and learn the ropes of wireless networks. Maybe in time, they'll come back to the US and play against the big boys.

Comment: Re:You gotta love Larry's self-serving hypocrisy.. (Score 3, Interesting) 486

by AtariDatacenter (#43750213) Attached to: Larry Page: You Worry Too Much About Medical Privacy

"Computer science has a marketing problem." That's what Larry said. And his presentation was about marketing more than anything. He was trying to sell the world view that works great for his company, and he certainly put his sour grapes on the table.

He talks of "resistance to technological change", which is code for Google Glasses and the glasshole syndrome. He talks of how people should be more relaxed with their medical records, which is code for Google Health. They had a clear plan for how they were going to make money with Google Health (selling user data). The problem was that, on the user side, they had a solution that was in search of an actual need. But Google has made it clear that they're not going to learn that lesson.

You know, I kind of like his idea of a mirror universe where more avant-garde ideas can be tested out, on a small scale, in the real world. He wanted a Burning Man type of environment for new technology. Actually, Eureka (the town from the TV show of the same name) might have been a closer fit (although the reference would have been lesser-known, and is almost synonymous with disaster). Being able to try things out (on a small scale and in a limited geography) and work out the problems there is great for allowing a company to iterate on a product without the marketing backlash for failures.

In theory, I'd love to live in that Eureka town. But only if it was about the product and about the science. The only thing Google Health did for me was to convince me that Google's products and services aren't about what they deliver (search, ubiquitous health records). They are about Google's real customers (advertisers, health care industry) and Google's real problem is finding a way to get everyone to jump on board so they can make money. That's what he is saying, in code, when he says "computer science has a marketing problem".
