Comment Link to Facebook Blog Post (Score 5, Informative) 103

Since the link in the summary is broken, here is the Facebook blog post.

Post contents:
Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we’ve had in over four years, and we wanted to first of all apologize for it. We also wanted to provide much more technical detail on what happened and share one big lesson learned.

The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.
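A minimal sketch of the check-and-repair pattern described above may help show why it breaks down. Everything here is hypothetical (the `CountingStore` class, the `is_valid` rule, the `limit` key); it is not Facebook's code, only an illustration of the behaviour the post describes.

```python
class CountingStore:
    """Hypothetical stand-in for the persistent store; counts queries."""

    def __init__(self, value):
        self.value = value
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.value


def is_valid(value):
    # Hypothetical validity rule: any positive integer is valid.
    return isinstance(value, int) and value > 0


def read_config(key, cache, store):
    """Return the cached value, repairing it from the store if it looks invalid."""
    value = cache.get(key)
    if not is_valid(value):
        value = store.fetch(key)  # the "fix": one query to the database
        cache[key] = value        # written back even if it is still invalid
    return value


def queries_for(stored_value, reads=1000):
    """Count store queries caused by `reads` config reads."""
    cache = {"limit": -1}         # start with a bad cached value
    store = CountingStore(stored_value)
    for _ in range(reads):
        read_config("limit", cache, store)
    return store.queries
```

With a valid stored value, `queries_for(50)` costs a single query because the first repair sticks. With an invalid stored value, `queries_for(-1)` costs one query per read, since the repair never succeeds.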

Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.
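One common way to damp this kind of loop is to treat a query error as "no information" rather than "invalid value", leave the cache entry alone, and back off before retrying. The sketch below assumes that approach; the post does not say what design Facebook adopted, and `BackoffReader`, `StoreError`, and `is_valid` are hypothetical names.

```python
import random


class StoreError(Exception):
    """Hypothetical error raised when the database cannot serve a query."""


def is_valid(value):
    # Hypothetical validity rule: any positive integer is valid.
    return isinstance(value, int) and value > 0


class BackoffReader:
    """Repairs bad cache entries, but retries the store with exponential backoff."""

    def __init__(self, fetch, base_delay=1.0, max_delay=60.0, rng=random.random):
        self.fetch = fetch
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.rng = rng            # injectable so the jitter can be made deterministic
        self.failures = 0
        self.next_attempt = 0.0

    def read(self, key, cache, now):
        value = cache.get(key)
        if is_valid(value):
            return value
        if now < self.next_attempt:
            return value          # still backing off: do not touch the store
        try:
            fresh = self.fetch(key)
        except StoreError:
            fresh = None          # an error is not evidence about the value
        if is_valid(fresh):
            cache[key] = fresh
            self.failures = 0
            return fresh
        # Store failed or returned an invalid value: keep the cache entry
        # as it is and wait longer before the next attempt.
        self.failures += 1
        delay = min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))
        self.next_attempt = now + delay * (0.5 + 0.5 * self.rng())
        return value
```

The delay doubles after each failed attempt up to `max_delay`, and the random factor spreads retries out so that many clients do not hit the database at the same instant. The cache key is never deleted on error, so a failing database does not generate extra work for itself.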

The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

This got the site back up and running today, and for now we’ve turned off the system that attempts to correct configuration values. We’re exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.

We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.

Comment Re:Server technology? (Score 2, Insightful) 271

It's nice they've developed a way to transfer data at ridiculous speeds, but it does the average user no good as long as we're using mechanical hard drives. Even a "mere" 1 gigabit network connection outstrips the ability of spinning platters to absorb it. I guess this Light Peak thing is aimed at the server market then?

That's not really a fair analysis. HD video is often stored compressed, but needs to be transferred at full resolution uncompressed to the display medium. The DVI spec supports 3.96Gbit/s. HDMI even goes up to 10.2Gbit/s. There are plenty of other examples where a high-bandwidth transport will be useful.

Privacy

Red-Light Camera Ticket Revenue and Short Yellows 976

NicknamesAreStupid writes "A Fort Myers news station reports a nerdy husband getting his wife out of a red-light camera ticket by proving the light was set with too short of a yellow. Then he goes out and proves that nearly 90% of the lights are set an average of about 20% too short. Is this a local incident, or have local governments nationwide found a new revenue source? What puzzles me is how a single picture can tell if you ran a light. If you are in the intersection before the light turns red, you have not run it, even if it takes a little while to clear it (say to yield to an unexpected obstacle). Wouldn't you need two pictures — one just before the light went red showing you are not in the intersection, and another after the light went red showing you in the intersection?"
Government

It's Time To Split Up NSA Between Spooks and Geeks 122

Hugh Pickens writes "Noah Shachtman writes in Wired that most of us know the National Security Agency as the supersecret spook shop that allegedly slurped up our email and phone calls after the September 11 attacks, but not so many know that the NSA is actually home to two different agencies under one roof: the signals-intelligence directorate, who can tap into any electronic communication, and the information-assurance directorate, the cybersecurity nerds who make sure our government's computers and telecommunications systems are hacker- and eavesdropper-free. 'The problem is, their goals are often in opposition,' writes Shachtman. 'One team wants to exploit software holes; the other wants to repair them.' Users want to know that Google is safeguarding their data and privacy. The trouble is that when Google calls the NSA, everyone watching sees it as a package deal. Google wants geeks, but it runs the risk of getting spies, too."
