
Facebook Turns to Visualization to Manage Bad Servers

Submitted by Nerval's Lobster
Nerval's Lobster writes "How do Facebook engineers manage hundreds of servers and racks without getting lost in all that data? By visualizing it, of course.

In a corporate blog post on Sept. 19, Facebook application operations engineer Sean Lynch revealed the development of a tool, “Claspin,” which generates a heat map of the company’s numerous racks and servers, the better to determine which are “bad” and in need of repair.

According to Lynch, Facebook originally set out to manage the health of its computing resources via two tools: Memcache and TAO, a caching graph database that performs its own MySQL queries. TAO generates reams of data from servers and clients, all of it collected into dashboards showing various latency and error-rate statistics, but that approach began to cause scalability problems for Facebook engineers.

In the wake of that, Lynch turned to creating a tool that could generate lists of hosts, each ranked by a metric such as the number of timeouts or TCP retransmits. The resulting tool listed each server as a tuple, or an ordered list of elements. But the solution was also text-heavy and required a somewhat-trained operator to interpret it: in that case, Lynch himself. So Lynch settled on a heatmap, with each “pixel” representing a host."
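
The two approaches described in the summary, a ranked list of per-host tuples and a heatmap with one "pixel" per host, can be illustrated with a minimal sketch. This is not Claspin's code: the names `HostStats`, `rank_hosts`, `heat_level`, and `render_heatmap`, the combined score (timeouts plus TCP retransmits), and the four-step text shading are all assumptions made up for illustration.

```python
from typing import List, NamedTuple

# Hypothetical shading scale, from healthy (".") to worst ("#").
SHADES = ".oO#"


class HostStats(NamedTuple):
    """One host as a tuple of metrics (field names are illustrative)."""
    host: str
    timeouts: int
    tcp_retransmits: int


def score(stats: HostStats) -> int:
    # Assumption: a host's "badness" is simply the sum of both counters.
    return stats.timeouts + stats.tcp_retransmits


def rank_hosts(hosts: List[HostStats]) -> List[HostStats]:
    """Text-heavy view: hosts sorted worst first."""
    return sorted(hosts, key=lambda h: (h.timeouts, h.tcp_retransmits), reverse=True)


def heat_level(value: int, worst: int, levels: int = len(SHADES)) -> int:
    """Map a score onto 0..levels-1 relative to the worst score seen."""
    if worst <= 0:
        return 0
    return min(levels - 1, value * levels // worst)


def render_heatmap(hosts: List[HostStats], width: int) -> str:
    """Heatmap view: one character per host, wrapped into rows of `width`."""
    worst = max((score(h) for h in hosts), default=0)
    cells = [SHADES[heat_level(score(h), worst)] for h in hosts]
    rows = ["".join(cells[i:i + width]) for i in range(0, len(cells), width)]
    return "\n".join(rows)
```

Calling `rank_hosts` yields the kind of list an operator has to read line by line, while `render_heatmap` packs the same hosts into a grid in which the worst ones stand out at a glance. That contrast is the trade-off the summary attributes to the move from ranked lists to a heatmap.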


