How Facebook Is Saving Power By 10-15% Through Better Load Balancing

An anonymous reader writes: Facebook today revealed details about Autoscale, a system for power-efficient load balancing that has been rolled out to production clusters in its data centers. The company says it has "demonstrated significant energy savings." For those who don't know, load balancing refers to distributing workloads across multiple computing resources, in this case servers. The goal is to optimize resource use, which can mean different things depending on the task at hand.
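
For readers who want the summary's definition in concrete terms, here is a minimal, hypothetical round-robin dispatcher in Python, the kind of even spreading that the comments below contrast Autoscale against. The names (Server, dispatch) and the four-server pool are illustrative assumptions, not Facebook's API.

    from itertools import cycle

    class Server:
        def __init__(self, name):
            self.name = name
            self.active_requests = 0

        def handle(self, request):
            # Track in-flight work; real request processing is elided here.
            self.active_requests += 1

    # Classic round-robin: every server gets a slice of the traffic, so
    # every server stays at least lightly loaded and never drops back to
    # its low-power idle state.
    servers = [Server(f"web{i}") for i in range(4)]
    rotation = cycle(servers)

    def dispatch(request):
        next(rotation).handle(request)

    dispatch({"path": "/"})   # lands on web0, the next request on web1, etc.
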
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Friday August 08, 2014 @08:01PM (#47634871)

    Just turn it off.

  • to sum it up (Score:3, Informative)

    by roman_mir ( 125474 ) on Friday August 08, 2014 @08:11PM (#47634909) Homepage Journal

    To sum it up: if a FB server is idle it consumes 60 watts, if the CPU is minimally utilised it consumes 130 watts, and if it's utilised more heavily it consumes 150 watts.

    Instead of round robin, use an algorithm that pushes requests to servers that are already processing other requests. That lets many servers remain at 60 watts while a few hit 150 watts, so instead of doubling or nearly tripling the power consumption of every server through round-robin distribution, you triple the consumption of a smaller set of servers and let the rest sit at 60 watts (sketched in code after this thread).

    Sure, it's an interesting thing to optimise, but unless you are running dozens, hundreds or even thousands of servers in a data centre, you won't care about this much at all.

    • To sum it up: if a FB server is idle it consumes 60 watts, if the CPU is minimally utilised it consumes 130 watts, and if it's utilised more heavily it consumes 150 watts.

      Instead of round robin, use an algorithm that pushes requests to servers that are already processing other requests. That lets many servers remain at 60 watts while a few hit 150 watts, so instead of doubling or nearly tripling the power consumption of every server through round-robin distribution, you triple the consumption of a smaller set of servers and let the rest sit at 60 watts.

      Sure, it's an interesting thing to optimise, but unless you are running dozens, hundreds or even thousands of servers in a data centre, you won't care about this much at all.

      Some of us do run hundreds or thousands of web servers, so it is actually interesting to us.

      Also, I think the idea is not only applicable to web servers. I'm not an expert in this field, but I would think the power consumption difference comes from dynamic frequency scaling, both in direct consumption and in the subsequent heat generation.
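
A minimal sketch (not Autoscale's actual implementation) of the scheduling idea described in the thread above: pack requests onto servers that are already busy so the rest can sit at idle power. The headroom limit, the dict-based server records and the back-of-the-envelope arithmetic are assumptions built from the 60/130/150-watt figures quoted by the parent.

    # Per-server power draw quoted in the comment above (watts).
    IDLE_W, LIGHT_W, FULL_W = 60, 130, 150

    def pick_server(servers, max_active=100):
        # servers: list of dicts like {"name": "web0", "active": 3}.
        # Prefer the busiest server that still has headroom, so work is
        # concentrated on a few hot machines and the rest stay idle at 60 W.
        candidates = [s for s in servers if s["active"] < max_active]
        if not candidates:
            raise RuntimeError("cluster is at capacity")
        return max(candidates, key=lambda s: s["active"])

    pool = [{"name": f"web{i}", "active": 0} for i in range(100)]
    pick_server(pool)["active"] += 1    # further requests keep piling onto
                                        # web0 until it reaches max_active

    # Back-of-the-envelope comparison for 100 servers carrying 50 servers'
    # worth of load:
    #   round-robin: 100 lightly loaded machines -> 100 * 130 W = 13.0 kW
    #   packed:      50 full + 50 idle           -> 50*150 + 50*60 = 10.5 kW
    # Roughly a 19% saving at this load level, in the same ballpark as the
    # 10-15% in the headline.
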

  • by manu0601 ( 2221348 ) on Friday August 08, 2014 @08:19PM (#47634951)

    TFA is vague on that point: do they switch off some servers during idle hours?

    Such a practice seems good for power consumption, but we have to account for the fact that switching on and off shortens hardware lifetime: it creates thermal stress, and we all know that electronics most often die at power-on. Hence what looks like a power saving may hide bigger costs (financial or environmental) in hardware replacement.

    • No, they are simply letting CPU utilisation go to 0% (plus whatever the OS etc. needs), but the hosts are still awake and available. Another advantage is that load can be added back instantly, whereas if they actually turned the machines off they'd have to wait for boot time, so the reaction to capacity shifts wouldn't be as fast (a toy pool-sizing sketch follows this thread).
      • I see the value of having a pool of idle servers ready for request peaks. But the graphs in TFA show huge daily variations, so it could make sense to switch off a fraction of the idle servers.
        • by Lennie ( 16154 )

          I wouldn't be surprised if turning the machines off also brings a larger chance of failure.

          So when you try to turn one back on, it won't turn on, or some disk won't have spun up.

          But I've never run the numbers to check whether this is actually true.

        • by Dogers ( 446369 )

          I quite like this artificial demo of VMware's DPM functionality:
          http://www.youtube.com/watch?v... [youtube.com]

    • Or it could be that the power-cycle stresses actually aren't a big factor, or aren't a big factor given the expected lifecycle of the device, or are a big factor but not big enough to offset the savings, in which case it would make perfect sense to turn them off any time you're fairly sure it's safe to add the delay of a boot cycle. It's also possible that reducing power usage is worth more than the pure cost of power, as it might reduce, say, expected future power costs or installation costs or any of 100 other things.

      • it's silly to talk about how factor B might overwhelm factor A when you don't have numbers for factor B.

        Nor for factor A!
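
To make the trade-off in this sub-thread concrete, here is a hypothetical pool-sizing step in the same spirit: keep only as many servers in rotation as the current load needs (plus headroom), and leave the remainder powered but idle so they can rejoin instantly rather than waiting through a boot cycle. The 500-requests-per-second per-server capacity and the 20% headroom are invented numbers, not anything from TFA.

    import math

    def size_active_pool(current_rps, per_server_rps=500, headroom=0.2,
                         pool_size=100):
        # How many servers to keep in rotation; the rest stay powered on
        # but idle, ready to rejoin the pool with no boot delay.
        needed = math.ceil(current_rps * (1 + headroom) / per_server_rps)
        return min(max(needed, 1), pool_size)

    size_active_pool(current_rps=5_000)    # overnight trough: 12 busy, 88 idle
    size_active_pool(current_rps=40_000)   # daily peak: 96 busy, 4 idle
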

  • Should we care? (Score:4, Insightful)

    by penguinoid ( 724646 ) on Friday August 08, 2014 @09:06PM (#47635167) Homepage Journal

    Is this a case of "Facebook was being obliviously wasteful" or a case of "Facebook discovers way to increase efficiency"? I'm guessing it's the former.

  • by MS ( 18681 ) on Friday August 08, 2014 @09:38PM (#47635257)

    Not only Facebook, but the end-users also could save a lot of electricity by not using Facebook at all. People should get out and have a real social life.

  • If you have a system distributed geographically across time zones, there are also savings to be had by moving loads to wherever commercial rates are lower, based on the local time zone.

    For those that don't know, commercial rates vary and spike at peak demand time (~14:00). Moving peak load forward or back by two time zones would move you out of peak rates (a toy example follows this thread).

    • by Lennie ( 16154 )

      You can't do this for the web servers, because you build these datacenters to be closer to the users, to improve latency.

      So you are not going to increase latency to save power.

      I believe I heard someone from Google mention that they do this for certain batch jobs, but I could be mistaken.

      The problem might be: you can only move such workloads if you have the data in the other datacenter too and the data is up to date enough.
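
A toy illustration of the rate-shifting idea for movable batch work (not user-facing traffic, for the latency reason given above): route jobs to regions whose local clock is outside the peak-rate window. The region list, UTC offsets and the 12:00-18:00 peak window are assumptions for the example.

    from datetime import datetime, timedelta, timezone

    REGION_UTC_OFFSETS = {"us-east": -5, "us-central": -6, "us-west": -8}
    PEAK_HOURS = range(12, 18)   # rough commercial peak window around ~14:00

    def off_peak_regions(now_utc=None):
        # Regions whose local hour falls outside the peak-rate window;
        # movable batch jobs could be routed there to dodge peak rates.
        now_utc = now_utc or datetime.now(timezone.utc)
        return [
            region
            for region, offset in REGION_UTC_OFFSETS.items()
            if (now_utc + timedelta(hours=offset)).hour not in PEAK_HOURS
        ]

    # At 19:00 UTC it is 14:00 in us-east (peak) but 11:00 in us-west,
    # so only us-west comes back as off-peak.
    off_peak_regions(datetime(2014, 8, 8, 19, 0, tzinfo=timezone.utc))
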

  • "those who don't know, load balancing refers to..."

    no shit sherlock

  • ... comments threads after 24 hours, on slashdot? Now we know which keyword turns off the slashdot crowd! :-P
