Forgot your password?
typodupeerror

How To Set Up A Load-Balanced MySQL Cluster 127

Posted by CmdrTaco
from the someone-teach-me-please dept.
hausmasta writes "This tutorial shows how to configure a MySQL 5 cluster with three nodes: two storage nodes and one management node. This cluster is load-balanced by a high-availability load balancer that in fact has two nodes that use the Ultra Monkey package which provides heartbeat (for checking if the other node is still alive) and ldirectord (to split up the requests to the nodes of the MySQL cluster)."
This discussion has been archived. No new comments can be posted.

How To Set Up A Load-Balanced MySQL Cluster

Comments Filter:
  • by Anonymous Coward
    from the article:

    "All data is stored in RAM! Therefore you need lots of RAM on your cluster nodes. The formula how much RAM you need on ech node goes like this:

    (SizeofDatabase × NumberOfReplicas × 1.1 ) / NumberOfDataNodes

    So if you have a database that is 1 GB of size, you would need 1.1 GB RAM on each node!

    - The cluster management node listens on port 1186, and anyone can connect. So that's definitely not secure"
    • by Jamesday (794888) on Sunday April 23, 2006 @11:32AM (#15184846)
      The sort of thing people use the MySQL Cluster storage engine for include session management and telecoms. That data tends to be of modest size, even for a company with many millions of customers. If you don't mind losing the data if there's a power failure longer than your UPS life you can use the Memory (Heap) storage engine instead of Cluster. You can often back that up wth triggers and reload in a startup script.

      The RAM only bit will go away once MySQL 5.1 is released. It's currently in early beta. It doesn't mean you lose your data if there's a power failure; the data is backed up to disk regularly.
      • Doesn't the cluster engine in 5.0 have the capacity to write the transaction log to a file on disk?

      • by kpharmer (452893) on Sunday April 23, 2006 @12:22PM (#15185086)
        > The RAM only bit will go away once MySQL 5.1 is released. It's currently in early beta. It doesn't
        > mean you lose your data if there's a power failure; the data is backed up to disk regularly.

        Seems backwards, but this is probably because mysql bought a product and has had to figure out how to bolt persistence onto it. A much simpler approach is to go the direction of most commercial databases: provide fine-tuned memory controls that easily ensure that the most-used data is kept in memory.

        In this scenario (with a database like db2 or oracle) you simply organize tables into tablespaces, dedicate however much memory you want to each tablespace. If you want a table to live in memory - just give it enough memory and it will. If a different table is 100 GB in size, then give it as much memory as you want and the database will manage the memory cache accordingly.

        With a solution like the above you don't need completely separate products, but instead see clustering, in-memory databases, and reliable persistence as all just different aspects of a robust and mature database.
        • by Jamesday (794888) on Sunday April 23, 2006 @01:02PM (#15185255)
          Cluster was designed for the telecoms case where you need to authenticate your subscriber and verify that they are entitled to make a call now even if they haven't made a call for a month, doing it in the same time window as the other calls. Also to scale out to as many servers as it takes to handle either load or data volume while delivering that predictable real-time performance. In a non-telecoms case, one user is capturing in real time performance data from thousands of vehicles for analysis, exploiting the write rate scaling and reliability.

          You can get some of that by allocating data space to RAM on one server but that doesn't get you all of Cluster's properties. Cheaper if you can use the approach you gave, of course, just doesn't do the job for some situations. Same for the Memory engine approach I outlined - doesn't get you all Cluster provides, but is enough for some situations.

          Like the other MySQL storage engines, you don't use the Cluster engine if you don't need its properties. It's a pick the right engine for the job situation.
    • The cluster management node listens on port 1186, and anyone can connect. So that's definitely not secure"
      So you need a firewall. Big whoop. In fact, the above is an incomplete quote from TFA, here's the rest:
      ", and therefore you should run your cluster in an isolated private network!"
  • 'Ultra Monkey'? (Score:5, Insightful)

    by Anonymous Coward on Sunday April 23, 2006 @11:21AM (#15184799)
    One of the major problems with a number of useful open source projects is that their names are terrible for the corporate environment. This is a perfect example of such a project. I sure wouldn't want to pitch the use of this software to a CTO or CIO, even if it was perfect for our database cluster, just because it is called 'Ultra Monkey'. That's just not a professional name, and this product will likely not be taken seriously.

    • Re:'Ultra Monkey'? (Score:4, Insightful)

      by kryten_nl (863119) on Sunday April 23, 2006 @11:54AM (#15184935)
      So? Call it the "U.M. system".....

      Problem solved.
      • Wouldn't be so sure at this point, they are using 4 machines to load-balance a mysql server with 2 storage points ? Perhaps the ultra monkey is a good name for it ...

        Is it just me or does it seem a bit bad idea in most cases where the data amount is small but traffic is high? The latencies created all over this ultra monkey setup will make it slow, if compared to one smtp box with 2 cpu's and a really good raid card in it, attached to 15k rpm scsi disks, i don't think there's much of a difference in
    • by Conanymous Award (597667) on Sunday April 23, 2006 @11:56AM (#15184946)
      But, imagine a Beowulf Cluster of Ultra Monkeys...
    • One of the major problems with a number of useful open source projects is that their names are terrible for the corporate environment. This is a perfect example of such a project. I sure wouldn't want to pitch the use of this software to a CTO or CIO, even if it was perfect for our database cluster, just because it is called 'Ultra Monkey'. That's just not a professional name, and this product will likely not be taken seriously.

      Of course, since it's open source there's nothing to prevent you from renaming i
      • Uh, there's only 1 'L' in the word "excel".
      • And let's not forget there are plenty of stupid names for succesful, commercial software. [snip bad examples]

        You seem to be confusing "literal names" with "good names". What makes a good name isn't that it literally means what it does, it's that it 1) sounds good, and 2) doesn't have negative connotations, and 3) isn't hard to say. It's a bonus if it even vaguely resembles what it does. Lotus Notes, Oracle and Excel all have vague connotations of what they do, but more importantly, there is nothing offensive about them, and they're easy to say and remember.

        Compare to some of the popular names in Open Source. GIMP. Gnu (Gah-nu, the absolutely STUPIDEST name and pronounciation ever invented, and that goes for all their programs with the 'g' prefix). Ultra Monkey is right in line with traditional stupid naming.

        • I agree 100% that Gnu is a completely shit name. I think Stallman & Friends would have had much more success in their little name campaign with a better name.

          But it wasn't invented [wikipedia.org], it's a wildebeest. What they did tack on, unbelievably, is the crappy pronunciation.

          That's right, they found the worst word in the world, took a dump on its pronunciation, and then demanded that we prefix Linux with it. Ha! "Let me know how that works out for you".

          • Not quite.

            I take it you haven't heard "The gnu song" (I'm a gnu) by Flanders and Swann ?

            They pronounce it "ger-noo" throughout.

            And seeing as that was recored in 1959, it predates the Stallman Gnu effort somewhat.

            In fact, I was 'brought up' thinking this was how "gnu" is pronounced, maybe due to hearing this song as a child.

            google/p2p it.

          • But it wasn't invented [wikipedia.org], it's a wildebeest.

            I didn't mean "invented" as in created the word from nothing, I just meant the name they picked. And yes, it would've been a stupid name even without the stupid pronunciation, but the added syllable just proves how brainless they are about P.R.


        • Actually, Gnu is a good name. The recursive acronym isn't necessarily board member friendly, but it is not offensive, it's difficult to misinterpret, and it doesn't put anybody off anything.

          Prepending other names with 'g' on the other hand works only when it's a good word (like Gopher, or Gnome).

      • Or "Lotus Notes", which has nothing to do with making notes (and e-mails are called "memos" in Notes), and nothing to do with flowers, either.

        I don't want to nitpick but originally "documents" were called "notes" in Lotus Notes. Later on probably PR guys thought that "document" sounds better. There are still property names (like NoteID) and API functions (like SelectAllNotes) that mention "notes" in Lotus Notes.
    • Re:'Ultra Monkey'? (Score:3, Insightful)

      by Eunuchswear (210685)
      Insightful?

      What is all this shit whining about names of "open source" (bleurgh) projects?

      I am the fucking CTO. I couldn't give a shit what the name of the product is. I want to know if it works.

      "Professional name"?
      • by Anonymous Coward
        So, your business card reads "fucking CTO?"
      • I'd be interested to know how big your company is, how many people are under you in the Org chart, what your yearly spending budget is, etc.

        I, too, am the "CTO" of my company, but that is only to increase the chances that my quotes/sound-bytes will be used in marketing crap from Oracle, or anyone else who needs at least a "C" level officer listed in order to seem credible. In reality, I'm the Technical Architect, President, etc., of a 10-man shop.

        That being said, I have a ton of large clients where CTO me
      • What do you want it to do? :-)

        "Does it work?" is a meaningless question asked by a great many CxO's.

        Usually they really DON'T know if it "works" because they don't have any idea what the business needs are.

        Does Windows "work"? Does that modem over there "work"? Does the shiny new laptop on your desk "work"? Nope.

        They don't "work" unless a smart human makes them either a) Make the company revenue, or b) Save the company revenue.

        The real question is:

        a) Does it make the company money?
        b) Does it save the com
    • Yeah, I mean, Google. Haha. What a bunch of losers they must be. Oracle? Heh. Red Hat. Terrible.
      • For every company with a name like "Oracle", "google" and "Red Hat", there were 10 companies who went bust, and who just happened to have names like "Monkeybrains", etc.
        • For every company with a name like "Oracle", "google" and "Red Hat", there were 10 companies who went bust, and who just happened to have names like "Monkeybrains", etc.
          Nothing to do with names.
    • Re:'Ultra Monkey'? (Score:3, Interesting)

      by RookKilla (970222)
      I'm with you on that one. Well, i'm a CTO, and i don't care about the names, but sometimes the initiative starts "from the bottom", so to speak, and sometimes needs to be run by not-so-open-minded people who happen to write the check. Unless the person wants to start company-wide project of opening minds prior to initiating something like "Ultra Monkey", he/she has to do something about the name. In this very case, i'd call the software "UM", i guess, if i were working in a "good old corporation" and had to
      • "and sometimes needs to be run by not-so-open-minded people who happen to write the check."

        Write a check? It's an open source program. Oh and when was the last time you wrote THE NAME OF THE PRODUCT on a check? You write the check out to the company, is the company named ultra monkey?

        "and sometimes needs to be run by not-so-open-minded people who happen to write the check."

        Just go download it. You don't need a check.

        "At least those a my thoughts."

        Well we now know why you are a CTO. You have cleary risen/flo
        • Hmm... While one certainly does not need a check to use an open source software, one does need to get a permission to use company resources to implement the project (people, to be precise) which involves submitting certain documentation known as project charter, or project specs. For those of us who is not clear on what "write a check" means, let me be more specific - "write check" = "allow usage of company resources in a particular project". I am also not sure how familiar some people may be with the way

          • "Hmm... While one certainly does not need a check to use an open source software, one does need to get a permission to use company resources to implement the project (people, to be precise) which involves submitting certain documentation known as project charter, or project specs."

            OK let's see how this conversation might go...

            "sir, we are going to implement a clustered failover database so that we don't have any downtime and our customers will be happy, the best part is that it's not going to cost anything!
            • Um, it doesn't cost nothing. My time or the OP's time costs the company money. That is the point you have so obviously have missed.

              Hint: In the real world companies have internal budgets for resources.

              For example: If it takes me a day to learn how to install & configure "Ultra Monkey" on a box, then that costs the company a day of my time not doing something else productive.

              • "Um, it doesn't cost nothing. My time or the OP's time costs the company money. That is the point you have so obviously have missed."

                Is it your position that implementing a clustering database using a better named product would not take of your time and the OP's time?

                "For example: If it takes me a day to learn how to install & configure "Ultra Monkey" on a box, then that costs the company a day of my time not doing something else productive."

                This presumes that the decision makers are considering factors
            • "WHAT!!!!. I refuse to have something called ultra monkey in my company. How dare you install something called ultra monkey. Here is a check for 80 thousand dollars, now go and install an oracle cluster. Oh and you are fired for even saying ultra monkey in my presence"

              If that happened in my company, I'd forward the issue to the CEO, get the manager canned for supidity and get my job back. Yeah, and Oracle clusters - not so hot. Clusters lead to failures and nd instability. Instability leads to outages.

    • Anybody else getting tired of the "we don't like the names" troll?

      The people who wrote switchtower (cool name) got sued and had to switch over to capistrano (a crappy name). This happens all the time. If you name your product something that's even remotely decent you will get sued by someone.

      Let's also consider the fact that we may not want people who choose products based on name as the userbase for open source products. They are clearly not going to contribute anything at all and intead be BOFHs who are a
    • How about CARP from the BSD project? Pretty darn simple compared to Ultra Monkey.

      Plus then you can say "Our machines handle failover at the IP level through Common Address Redundancy Protocol".
    • Then don't.

      Just show what's so perfect about your environment and show the benefit, cost ratio compared to what it is now, and do they drop the benefit just because it's got a monkey in the name? but can save them quite some $$$ if that's the case?

      Seriously, if your boss drops the name over the benefit, change your job or your boss.
    • Yeah, my CEO was disgusted with the name "Microsoft". Why the hell would we want any computer clicky icon things which are both small and soft??!!

      YOU'RE FIRED!

  • ... and they just stare at you and ask incredulously, "You had to do WHAT with the seat?"
  • by Jamesday (794888) on Sunday April 23, 2006 @11:26AM (#15184827)
    This is for the MySQL Cluster storage engine, describing how to set up load balancing across two MYSQL servers that serve as front ends to it.

    If you want code to set up a MYSQL cluster (cluster of MYSQL servers NOT using the Cluster storage engine) you might do something like looking at the PHP code in MediaWiki, which does application-level load balancing for reads (not writes) and has so far been tested to 25,000 or so queries per second. Or you could use the same director tools and skip using them for writes, implementing failover for the master server getting the writes.
  • The MySQL Protocol is proprietary.

    The MySQL Protocol is part of the MySQL Database Management System. As such, it falls under the provisions of the GNU Public License (GPL). A copy of the GNU Public License is available on MySQL's web site, and in the product download.

    Because this is a GPL protocol, any product which uses it to connect to a MySQL server, or to emulate a MySQL server, or to interpose between any client and server which uses the protocol, or for any similar purpose, is also bound by the GPL. Therefore if you use this description to write a program, you must release your program as GPL. Contact MySQL AB if you need clarification of these terms or if you need to ask about alternative arrangements.

    http://dev.mysql.com/doc/internals/en/licensing-no tice.html [mysql.com]
    • Just because they say it is? I'm not trolling, I really want to know.
      • Therefore if you use this description to write a program, you must release your program as GPL.

        You can allways reverse engineer it. Or write an additional program to 'translate' between the MySQL protocol and your own chosen protocol, and use inter-process communication.
      • Just because they say it is? I'm not trolling, I really want to know.

        You can copyright source code, but not the rules of a protocol. You'd have to use a patent for that. Since the GPL is all about copyright and not patents (yet), what they say is probably nonsense.

        The only construction that I can think of that would make the protocol governed by the GPL (though not for the reasons MySQL AB states) is if the protocol requires the transmission of some text that is itself copyrighted under the GPL *. I'm not f
      • Re:Is that true? (Score:3, Insightful)

        by jadavis (473492)
        As usual, MySQL is intentionally being confusing about their licensing. You can't GPL a protocol, only the implementation of a protocol.

        However, the protocol itself still every bit as proprietary a format as an MS Word document. It just happens that they license their implementation to you under the GPL. You can reverse engineer it, just like the MS word doc format, but MySQL AB can always make subtle changes to break it (just like MS can do to the word format).
    • It looks like whoever actually wrote that note on the MySQL site does not know how GPL licenses work. It's a violation of the GPL to emulate the protocol? Nonsense, unless there are patents on it.
    • Mysql may believe this (of course, they believed foreign keys were unnecessary up until pretty recently) but they aren't right, at least, not under the terms of US copyright law.

      You can't copyright a protocol. You can copyright the literal words of their description of the protocol, but that in now way binds you in terms of the license. You can't copy their description into a book without permission, but I doubt that the description is licensed under the Gnu FDL.

      If you lift their client app and use it to ta
    • If it's GPLd, it's Free and definitely _not_ proprietary.
  • by layer3switch (783864) on Sunday April 23, 2006 @11:45AM (#15184900)
    No MySQL sessions are carried over from failed node, so all MySQL connections will be dropped and need to re-establish again. For sessionless Apache or other fast and short connections may make this very useful, but for application service with shared connection pool, a DNS RR with node check makes better choice rather than elaborate setup using lb nodes in front of MySQL cluster due to +2 lb nodes administration overhead. And also because of arp problem due to MAC on loopback broadcasting, the article forgets to mention;

    net.ipv4.conf.all.arp_announce = 2
    net.ipv4.conf.all.arp_ignore = 1

    Should take care of all interface arp announce/ignore. Doesn't need to set net.ipv4.conf.eth0.arp_ seperately as the article provides. Or easier way to control it is to set ARP=no on /etc/sysconfig/network-scripts/ifcfg-eth0 and ifcfg-lo:x

    Personally I rather have Active/Active lb nodes in front and carry over sessions from failed node to active node transparently by using shared memory space to replicate all session info. OpenMosix comes to mind.
    • I agree about the hardware load-balancer being unnecessary. Round-robin balances just fine, and if you have keepalived on the nodes, all your interfaces will be available if one of the node drops out of the cluster for any reason, so none of the addresses in DNS will look "dead" to clients.

      Question: which process would replicate sessions?
      • "Question: which process would replicate sessions?"

        There are expensive solution to do just that. Many L4-7 switches on the market does this. Of course, the whole point of the article is to home-craft the solution, but I think, SQL Relay [sourceforge.net] does what application layer loadbalancer would but without the replicate session feature and with some what failover/loadbalance feature.

        Of course, there is always Oracle... Believe me, I think, Postgresql is great open source product, but if I decide to go notch upward o
  • by m50d (797211) on Sunday April 23, 2006 @11:50AM (#15184920) Homepage Journal
    Anyone have a guide for doing this kind of thing with postgresql?
    • by Anonymous Coward
      PostgreSQL supports these capabilities natively, without needing to resort to third-party packages. The best place to learn about setting it all up is from the PostgreSQL docs [postgresql.org]. Between the FAQ and the manuals, you should have more than enough to get you started.

    • Slony [google.com]. But although I've poked around at it, I've never given it the time to try and get it working.
      It seems... well, strange to me. Doesn't work in the way I expect.
    • Right now there is no equivalent for Postgres. Best you can do is use Slony for creating read only replicas and then send any write queries to the master database. Heres a link (though slightly old) explaining the situation.

      http://brianray.chipy.org/postgreSQL/cluster_vs_re plication.html [chipy.org].

      Not much has changed since then. I do think there are some 3rd party commercial apps that handle master-master replication or even shared storage but thats about it.
  • I know this is an unrelated question, but does anyone know if it's even possible to use SSL with MySQL (with the windows binaries). I ask cause i haven't found any documentation anywhere on this subject.
    • YaSSL support is included in the MySQL binary builds for most platforms in 5.0.20a, with support for fewer in the original 5.0.20 release. Release note [mysql.com].
    • Also, you might want to consider using the Open_SSL with your web server; I've used it on both IIS, and Apache. I think that Open_SSL lends itself to any of the major web servers.
    • If you mean SSL'ized connections, then you should stop reading this comment 'cause I don't know anything about that. However, are you familiar with ssh port forwarding? Works with just about any kind of TCP/IP traffic, and it's all but guaranteed to work where communication is carried out with a single TCP connection - such as between mysql clients and servers. I've done it on Windows with putty. There's a commandline client called plink, which I use to do port forwards.
  • MyPostgres? (Score:5, Interesting)

    by Doc Ruby (173196) on Sunday April 23, 2006 @12:51PM (#15185205) Homepage Journal
    Anyone have an equivalent HowTo for Postgres clusters?

    And while I'm getting everything in life I casually ask for, where's the SW that automatically swaps out a MySQL install and replaces it with Postgres, including revising source code that calls/queries the DB, or just uses a MySQL installation as a proxy replica for Postgres nodes in a mixed cluster?
    • Not been written yet AFAIK. Too many programming languages, table type issues, API quirks. I've made most of my PHP code work on both MySQL and PostgeSQL, as well as SQLite, but it was a bit of work to assemble the abstraction functions.

      In theory, an app to glob a mysql server and recreate it in a postgresql server (or vice-versa) could be written, and I may even try, but revising source code? Maybe if everything you use uses one language, the same API version, and you're willing to let the app handle a
    • And while I'm getting everything in life I casually ask for, where's the SW that automatically swaps out a MySQL install and replaces it with Postgres, including revising source code that calls/queries the DB, or just uses a MySQL installation as a proxy replica for Postgres nodes in a mixed cluster?

      Burried with Jimmy Hoffa.
  • by daitengu (172781) * on Sunday April 23, 2006 @01:08PM (#15185280) Homepage Journal
    I use a Monkey Management Technique for my mySQL servers. If one server goes down, I have a monkey that jumps up and down at the datacenter and gets real mad until someone fixes it.

    .. Works like a charm!
    • I think you mispelled managment.
      I know cause i get management and monkey mixed up all the time

    • Sadly, this is often much more effective and reliable than expensive commercial high-availability systems where the management tools and the absurdities they commit on the actual software actually make the system unreliable and liable to intermittent failures. Before spending the money on using such tools it's often worth examining what downtime costs and factoring in the likelihood of the high-availability tools failing or in fact never working at all.

      But a monitoring tool that actually and effectively rep
  • NDB and RAM (Score:3, Informative)

    by moogoogaipan (970221) on Sunday April 23, 2006 @01:51PM (#15185478)
    I heard that you need to have enough RAM on the SQL nodes to hold your entire database on each cluster machine so that NDB will work. I did not see this info in the front page. I think it's critical to know.
    • Close. You don't need enough RAM in each machine for the whole database. You can split the database among many machines and the combined set must have at least enough RAM for twice the data size.

      The Cluster storage engine wasn't originally intended for really large data sizes but disk storage support is being added because it turned out that many people want to be able to use it to build clusters in this way instead of via replication, so want to handle larger data sizes. If that interests you the 5.1 manua
    • Sequoia [continuent.org] provides both load balancing and failover. It supports all MySQL table types (and also other databases). One nice feature is transparent failover that completely hides any failure (connection, controller, backend) even in a middle of a transaction.
      Note that Sequoia now has a native implementation of the MySQL client library [continuent.org] to allow any other API besides JDBC.
      For other databases, ODBC support [continuent.org] is also available in the Carob [continuent.org] project.
  • We are using Continuent's m/cluster. Costs a little bit (still loads cheaper than Oracle) but so far delivers the goods with little setup or maintenance. It is for a web application so read performance is where it's at. It distributes the writes across all the nodes, so write performance actually takes a slight hit, but read performance is load balanced. We are running three dual processor nodes and the performance is fantastic. It's shared nothing, so all data is on disk on all three machines. Suppos

"Text processing has made it possible to right-justify any idea, even one which cannot be justified on any other grounds." -- J. Finnegan, USC.

Working...