Building a Scalable Mail System?
clusteredMail asks: "I work for a small ISP that up until now has survived with single servers for most critical roles, including the mail server. We are planning to introduce multiple mail servers (primarily for email collection via POP3 and IMAP) and want to put in place the most scalable, failure-resistant system that we can manage. Everything is currently running on one or another flavour of Linux. In my mind, the ultimate scenario would be to have some sort of distributed/clustered file system between the multiple machines, so that any user could log onto any server, and the loss of a single server would not cause downtime for any group of users. Has anyone in the Slashdot community had to put together a system like this using Linux and Open Source Software? If so, how did you fare and what were the major stumbling blocks?"
"So far, the plan is to split up the mail accounts between multiple servers and use some sort of connection proxy to sort out which account logs into which server but this seems like a rough approach. The disadvantage to this setup: if one server fails all the users who have accounts on that machine will be in the dark, email-wise."
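To make the trade-off in that plan concrete, here is a minimal sketch of the lookup such a connection proxy performs, and of who goes dark when a backend fails. The user names, host names, and the `BACKENDS` table are all hypothetical; a real proxy would read the mapping from a database or directory, not a literal dict.

```python
# Hypothetical account-to-backend table (illustration only).
BACKENDS = {
    "alice": "mail1.example.net",
    "bob": "mail1.example.net",
    "carol": "mail2.example.net",
}


def route(user, down=frozenset()):
    """Return the backend host for a user, or None if it cannot be served."""
    host = BACKENDS.get(user)
    if host is None or host in down:
        return None
    return host


def stranded_users(down):
    """List the users who lose mail access when the given hosts are down."""
    return sorted(user for user, host in BACKENDS.items() if host in down)
```

With this layout, losing `mail1.example.net` strands every account mapped to it, which is exactly the disadvantage described above. Shared storage or replication removes that one-to-one binding between an account and a machine.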
Check out Perdition (Score:5, Informative)
Re:Check out Perdition (Score:4, Interesting)
It uses epoll. We replaced a perdition proxy that was seriously loading two servers with a single 8-process nginx instance that's not even breaking a sweat. It's amazing what the change from 32000 processes down to 8 processes can do on a busy site! The two frontend machines are now configured with heartbeat to get full failover of IP addresses. Downtime appears to be on the order of 1-2 seconds with an orderly cutover and probably about 10 seconds for a total host failure.
Cyrus supports replication now, which is a good way to handle the backends. I'd say more about it, but I haven't actually finished configuring the full failover system yet for this - lots of gating logic required to make sure two machines don't both believe they're master for a bit!
Er, but why would I help you anyway, you're the competition
(I work for FastMail.FM [fastmail.fm] btw)
Re:Check out Perdition (Score:1)
Re:Check out Perdition (Score:2)
I might write up something about nginx as a pop/imap proxy and ask Igor to link to it.
It really is very nice to use, if hard to read the docs!
Re:Check out Perdition (Score:2)
I still read the AxKit mailing list for its spam and other exciting goodness - but don't have much need to touch XML, bargepole or not, these days.
Bron.
MIT (Score:2, Interesting)
Of course, that might just be because the IT department at MIT does not take advice from the faculty and students, and just generally sucks.
Re:MIT (Score:2)
It's probably a lot more reliable since the IT department doesn't have to deal with the hassles of faculty and students mucking up the email servers. Besides, they probably don't want to share their Diablo 2 game server (which happened at one company I worked for).
You mean, like Cyrus? (Score:5, Informative)
It's not for the faint of heart, but only takes a couple of "a-ha!" moments to go from lost to competent. Good luck!
Re:You mean, like Cyrus? (Score:4, Informative)
However, if you used the Murder as your frontend for clients, and applied fairly standard high availability tactics to the individual backends, you could achieve clustering. Make each backend server a redundant load balanced virtual server, then make the Murder know about the mailbox locations on the virtual systems.
I'm sure it could be done, but it's definitely not something that Cyrus does out of the box.
In practice the multiple servers w/ proxy has been good enough for CMU. With good hardware for the backend servers, and good RAID arrays, hardware failures are rare.
Re:You mean, like Cyrus? (Score:1)
Re:You mean, like Cyrus? (Score:2)
The short answer [cmu.edu] is that a Murder tries to present a unified system image, which is important if you use a lot of things like shared folders.
The trick is... (Score:4, Funny)
Re:The trick is... (Score:2)
Re:The trick is... (Score:1)
You'd be better off with separate overlapping sheets of mail head to the body with leather straps. Maybe not as good as a
There is always (Score:4, Funny)
Re:There is always (Score:2)
Re:There is always (Score:1)
You'll have to apply a prefix to all of your accounts [eg: -mybusiness-_-username-@gmail.com] and then map users to gmail MX & POP/IMAP accounts... good luck with that.
Re:There is always (Score:2)
YES RLY. It's called GMail for Domains.
https://www.google.com/hosted/ [google.com]
Re:There is always (Score:2)
Sounds like there's a little more you need to do, besides just pointing your MX record at their servers. Like for instance, getting them to accept you as a beta user, answering a f
Foundation. (Score:3, Interesting)
Re:Foundation. (Score:1)
distributed (Score:2)
don't forget testing it. get a system up and running, then see what happens if you power off the master. Hint: it shouldn't change anything.
Re:distributed (Score:2)
Re:distributed (Score:2)
Re:distributed (Score:2)
EMC's not bad either, but I think you get better bang for the buck with NetApp. They are faster. Isn't IBM just rebranding the NetApps now anyway? NetApp's snapshots are so cool. Wish we could get that on Linux. EMC doesn't come close either.
Re:distributed (Score:1)
Re:distributed (Score:2)
Re:distributed (Score:2)
Re:distributed (Score:2)
Re:distributed (Score:2)
A pity, cos NetApp gear looks pretty swish.
Use NFS (Score:3, Informative)
Scaling can be done easily by adding more NFS boxes and managing the directory structure with links or whatnot.
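One way to read "managing the directory structure with links" is a symlink farm: each user's mail directory lives on whichever NFS box has room, and a link in one shared tree points at it. A minimal sketch follows, using temp directories in place of real NFS mounts. The `nfs1`/`nfs2` volume names, the `home/` layout, and the helper names are assumptions made for illustration.

```python
import os
import tempfile


def place_user(root, volumes, user, volume):
    """Create the user's mail directory on one volume and link it into the shared tree."""
    target = os.path.join(volumes[volume], user)
    os.makedirs(target, exist_ok=True)
    link = os.path.join(root, "home", user)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if os.path.islink(link):
        os.remove(link)  # re-pointing a user, e.g. after moving them to a new box
    os.symlink(target, link)
    return link


def demo():
    """Build a throwaway tree with two stand-in volumes and two users."""
    base = tempfile.mkdtemp()
    volumes = {name: os.path.join(base, name) for name in ("nfs1", "nfs2")}
    alice = place_user(base, volumes, "alice", "nfs1")
    bob = place_user(base, volumes, "bob", "nfs2")
    return base, volumes, alice, bob
```

The mail software only ever sees `home/<user>`, so adding a third box means creating new users there (or moving existing ones and re-pointing their links) without touching any server configuration.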
Re:Use NFS (Score:3)
Just to clarify (I initially parsed this incorrectly), you mean, "all the interesting mail processing software is written to use [perhaps optionally] forms of locking that NFS can tolerate", right?
Usually locking is the problem people hit with NFS.
Re:Use NFS (Score:2)
Solaris, or in extremis NetApp or the other dedicated NAS boxes, do just fine, up to pretty darn big sized installations.
I am in the process of doing this... (Score:3, Interesting)
Cyrus + postfix + ldap + spam/virus (Score:2)
Re:Cyrus + postfix + ldap + spam/virus (Score:1)
Re:Cyrus + postfix + ldap + spam/virus (Score:1)
Re:Cyrus + postfix + ldap + spam/virus (Score:2)
Amen, brother, amen. But you'd have to pay yearly for the OpenVMS licenses. And that adds up.
Re:Cyrus + postfix + ldap + spam/virus (Score:1)
It's a bit like a friend that works for $.EDU.AU
For starters... (Score:5, Informative)
Re:For starters... (Score:1, Flamebait)
Re:For starters... (Score:2)
And then there's the breakage over NFS, the locking problems, the fact that it makes it harder to cluster, and so on.
Re:For starters... (Score:2)
kashani
Re:For starters... (Score:2, Insightful)
Re:For starters... (Score:4, Informative)
You shouldn't state this as an absolute, because it's not. You also need to give reasons WHY to use maildir.
An example exception case: We had an application where thousands of very small emails needed to be delivered to a single mailbox every minute. They all get picked up every minute by POP, and all messages are deleted every cycle. mbox is *vastly* better in this scenario, because you don't have to create all the files, move them around a few times, stat large dirs every time POP runs, etc. With mbox, all the delivery threads become sequential, so you cut down seek overhead, and the POP read becomes a single large file read, which is far faster. You also cut way down on metadata updates, and caching works better.
mbox shines in this scenario, and it's not that uncommon. Many customer service apps work like this.
In the situation of handling many users' email in a scalable system, Maildir is usually better (NFS-safe, concurrent delivery, efficient individual message deletion, etc.), but you've not even considered the other range of things available. MH and database backends come to mind. Each has its good and bad points.
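The file-count side of that trade-off is easy to see with Python's standard `mailbox` module, which can write both formats. The sketch below delivers the same batch of tiny messages to an mbox and to a Maildir in a temp directory and counts the files each leaves on disk. The addresses and helper names are made up, and it measures nothing about seek or metadata cost, only the number of files.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def small_message(i):
    """Build one tiny message, like the customer-service events described above."""
    msg = EmailMessage()
    msg["From"] = "app@example.com"
    msg["To"] = "queue@example.com"
    msg["Subject"] = f"event {i}"
    msg.set_content("tiny payload\n")
    return msg


def deliver(box, count):
    """Append `count` messages to any mailbox.Mailbox and flush to disk."""
    for i in range(count):
        box.add(small_message(i))
    box.flush()


def files_under(path):
    """Count regular files at `path` (1 for a plain file, recursive for a dir)."""
    if os.path.isfile(path):
        return 1
    return sum(len(names) for _, _, names in os.walk(path))


def compare(count=50):
    """Return (mbox messages, mbox files, maildir messages, maildir files)."""
    base = tempfile.mkdtemp()
    mbox_path = os.path.join(base, "queue.mbox")
    maildir_path = os.path.join(base, "queue.maildir")
    mb = mailbox.mbox(mbox_path)
    md = mailbox.Maildir(maildir_path, create=True)
    deliver(mb, count)
    deliver(md, count)
    result = (len(mb), files_under(mbox_path), len(md), files_under(maildir_path))
    mb.close()
    md.close()
    return result
```

Both stores hold the same number of messages, but the mbox is a single file read sequentially while the Maildir holds one file per message. That is the cost the high-volume single-mailbox case pays, and the property that lets a Maildir delete one message without rewriting the rest.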
Re:For starters... (Score:2)
Re:For starters... (Score:2)
Re:For starters... (Score:1)
Re:For starters... (Score:2)
Re:For starters... (Score:1)
If you are using mail servers for something other than email, then advice about how to handle email would indeed not apply.
Your comment is sort of like saying the advice for safely handling uncooked poultry [k12.ga.us] doesn't really apply to the test engineers using the c [straightdope.com]
Re:For starters... (Score:1)
Re:For starters... (Score:2)
While your scenario probably does need mbox, it's not really typical. The biggest factor is that it deletes all mail in every pickup cycle. Of course that does mean there is mail arriving between the time all mail was fetched and the delete all is requested, which means the IMAP process has to rewrite the mbox file, which means locking out further deliveries while that is taking place. But regardless, it's not the kind of scenario most mail server users are doing. Your setup is probably better off using
Re:For starters... (Score:1)
Maildir sucks badly at such scales. Why? Try accessing 100000 files at once or just having 10000000000 files lying around.
Shared Storage (Score:3)
Re:Shared Storage (Score:2)
OpenVMS. Clustering is deeply built into the OS and all relevant libraries.
open SMTP relays (Score:3, Funny)
now if I could only figure out how to receive...
Yet again, Communigate Pro (Score:1)
Run it on Linux, it just works...
Re:Yet again, Communigate Pro (Score:1)
Honestly, I have a hard time trusting anything owned by a company with the word "Stalker" in it.
so you want what ? load / redunancy / clustering ? (Score:2)
o when a user points their mail client at the server do you want one address ?
if yes
then you want to invest in equipment/software to load balance the IMAP/POP3/HTTP sessions across mail servers, and all of the servers must serve out the same client data
and
you want several/all mail servers to be able to serve out all the accounts, then you want a shared storage backend
(either file system OR database wise)
else
simple option many mail servers all acting exactly the same with exactly t
Re:so you want what ? load / redunancy / clusterin (Score:2)
Re:so you want what ? load / redunancy / clusterin (Score:2)
Re:so you want what ? load / redunancy / clusterin (Score:2)
We'll be using a few front-line MX boxes running Postfix, SpamAssassin, and PureMessage (not FOSS), deli
Re:so you want what ? load / redunancy / clusterin (Score:2)
Cyrus machines, which will share SAN storage via Lustre.
Have you tested that this provides good performance? AFAIK Lustre is designed and optimized for massively parallel applications doing sequential IO on huge files (e.g. HPC apps using the MPI-IO API). Sounds like maildir (lots of tiny files) is the exact opposite.
Clustering and redundancy? Look at CommuniGate... (Score:2, Interesting)
Some ideas for qmail (Score:2)
Re:Some ideas for qmail (Score:2)
Don't depend on finding anything in the code either. Everything, I mean _everything_ is hard coded. If you can navigate around regular open source code, it's not going to help for qmail. Everything is a nightmare in there. I encourage you to compar
Scalix (Score:2)
It costs, but since I've been looking into mail servers lately I can let you know Scalix has an enterprise edition [mailto] that runs on multiple servers.
BAD LINK (Score:2)
Scalix Comparison [scalix.com]
IBRIX (Score:3)
If you have a lot of data, then you can choose a scalable system like IBRIX [ibrix.com] and then use stateful load balancers between each of the POP3/IMAP servers. When you get to multiple nodes on the same filesystem, you have two problems: synchronization between nodes and locking.
Note that the Oracle Clustered Filesystem v2 has now been merged with the mainline kernel.
Use a forwarding server on the front end (Score:2)
You can even cluster the load balancers...
Re:Use a forwarding server on the front end (Score:2)
Re:Use a forwarding server on the front end (Score:2)
Re:Use a forwarding server on the front end (Score:2)
Re:Use a forwarding server on the front end (Score:2)
Also the company is pretty small and staffed by clued people, which I always appreciate
Layer 4/7 switching (Score:2, Informative)
Using this the only thing your servers need in common is backend storage that you can easily mount off NFS etc.
Re:Layer 4/7 switching (Score:3, Funny)
I've built several systems like this, one is close to 100,000 accounts with no problems. This system scales out (i.e. adding another cheap server for more power) as opposed to scaling up with huge servers (price, power demands, price, and price).
It's also very easy to troubleshoot.
Do not split your email users between servers and proxy them. Big problems.
Qmail/Vpopmail, but only if you can spend the time (Score:3, Interesting)
That being said, it's also rock-solid, extremely fast when properly configured, and more flexible than you can imagine.
We currently use a single RAID-10 NFS and MySQL DB system handling the backend, with 5 cluster servers in front of it, each of them able to perform any number of roles. (We had a load balancer in front of them at one point, but it actually got in the way more than anything else.) A sixth box handles all DNS requests for the servers, and we'll be bringing a 7th up soon to offload some of the spam processing from the three that currently run our asynchronous processing code. The cluster boxes are cheap MicroATX Athlon XP 3000+ machines with 2 GB of RAM. I've seen each box take well over 100 simultaneous SMTP connections without CPU being noticeably affected. Currently 1 does webmail, 1 does incoming MX, 1 does POP3/IMAP, 1 is for development and serves IMAP to the webmail box, and 1 is running SMTP, 587, and SMTP-SSL.
When properly administered, I think it beats anything out there. However, if you can't afford the time and 3am-bang-your-head-against-your-monitor agony, I'd suggest one of the other solutions people have mentioned here.
My $.02
My solution (Score:5, Informative)
Data stores were maildirs on NetApps
SMTP servers running Postfix
IMAP servers running Courier IMAP
Logins via NIS
IMAP and SMTP failover by means of load balancers
The SMTP and IMAP servers get NIS-distributed automounter tables, so everyone's homedir is available everywhere. The load balancers distribute the load out to the SMTP and IMAP servers, and work around any that fail. Mail comes into the SMTP servers, and Postfix delivers to maildirs in the users' homedirs. Any SMTP server can deliver to any user. Users log in with IMAP on the Courier IMAP servers. Again, all homedirs are everywhere, so it doesn't matter which server they hit.
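The reason any SMTP server can deliver to any user without coordination is the maildir delivery convention: write the message under `tmp/` with a unique name, then rename it into `new/`. Here is a rough sketch that simulates several delivering servers with threads. Real MTAs build the unique name from time, PID, and hostname; the UUID-based name and the `smtpN` labels here are stand-ins chosen for this illustration.

```python
import os
import tempfile
import threading
import uuid


def make_maildir(path):
    """Create the three standard maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def deliver(maildir, server, data):
    """Write to tmp/ first, then rename into new/; no lock is taken."""
    name = f"{uuid.uuid4().hex}.{server}"
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    # Readers only look in new/ and cur/, so they never see a partial file.
    os.rename(tmp_path, os.path.join(maildir, "new", name))
    return name


def simulate(servers=3, per_server=20):
    """Deliver concurrently from several simulated servers into one maildir."""
    maildir = os.path.join(tempfile.mkdtemp(), "Maildir")
    make_maildir(maildir)

    def worker(server):
        for i in range(per_server):
            body = f"Subject: {server}-{i}\n\nhello\n".encode()
            deliver(maildir, server, body)

    threads = [
        threading.Thread(target=worker, args=(f"smtp{n}",)) for n in range(servers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return maildir
```

Because every writer uses its own file name and the rename is the only step that makes a message visible, concurrent deliveries cannot corrupt each other.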
Adding capacity at any point is easy - you just add more servers of the appropriate type when you need more. IMAP and SMTP are fully redundant. Load balancers usually only operate in failover pairs, but you can add more A records in DNS for more LB pairs if you need it.
The one sticky point is the data stores on the NFS servers. Adding capacity is easy (just add more servers), but there's no easy way to make this fully redundant. See notes for more.
So there you have it. That'll scale to a pretty large system, and it's simple to implement. It's not THE MOST scalable system, but if you have to ask, this is probably sufficient for your needs.
Notes:
You must use maildirs, not mbox. Maildirs perform very well even on NFS, because there can be multiple simultaneous readers and writers. mbox requires locking.
With NetApp, or Red Hat Cluster Suite, or any other cluster NFS server, you can make the head end redundant, so your disk shelf becomes the last single point of failure. If you run RAID 1+0, you can have all the disks mirrored across two shelves, so at least the hardware is completely redundant. However, there are still rare but possible failure modes. STONITH is, ultimately, a problem that has no perfect solution. (Look it up if you're not familiar with STONITH.)
NetApp makes very reliable NFS servers. Even in single head configurations my uptime experience has been incredibly good. Dual head is even better. But they're god awful expensive. There are other ones you can buy at all different price points. Clustered file systems like Coda sound really sexy, but they're still half baked. Lustre http://www.lustre.org/ [lustre.org] might work well, but it wasn't available when I last did this, so I can't say. Choose what's appropriate to your needs and budget.
I used NIS. These days LDAP is more fashionable. Make your LDAP server redundant of course.
You need redundant networks. In the simplest case, put half of each type of servers (IMAP, SMTP, LB, NFS) on two different switches.
I never bothered with POP, but you can get POP servers for maildirs, too.
Configure your load balancers to balance per session - i.e., if a user creates multiple IMAP connections, they all go to the same server. This helps keep down the number of NFS mounts, LDAP requests, etc.
Software opinions: I like Postfix and Courier. They're simple, robust, flexible enough for most situations, and perform very well. Cyrus also has a good following in the large-scale arena, but does things differently. Qmail's non-OSS license prevents people from releasing versions that strip out djb's quirky way of doing things, which is why I left it for Postfix (and never looked back). Sendmail doesn't suck as much as it used to, but I haven't really seen why I SHOULD use it these days either. Any of these can be made to work, though, so use whatever you're comfortable with.
Tip for any email system: outright reject (i.e., don't accept at all, don't send to someone's spam folder) as much spam as you can. If 90% of your mail is spam, and you reject the 90% most-likely-spam (delivering the other 10% more questionable stuff to a spam folder), you've just increased your mail performance and disk space by > 5x.
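The "> 5x" figure in that last tip can be checked with a few lines of arithmetic, assuming "reject the 90% most-likely-spam" means rejecting 90% of the spam at SMTP time. The function names are just for this sketch.

```python
def remaining_fraction(spam_share, reject_share):
    """Fraction of incoming mail still accepted after rejecting part of the spam."""
    return 1.0 - spam_share * reject_share


def capacity_gain(spam_share, reject_share):
    """How many times less mail the system stores and processes after rejection."""
    return 1.0 / remaining_fraction(spam_share, reject_share)
```

With 90% spam and 90% of it rejected, 0.9 x 0.9 = 81% of all mail never enters the system. The remaining 19% is the 10% legitimate mail plus the 9% questionable spam that lands in spam folders, and 1 / 0.19 is about 5.26, hence "more than 5x".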
Good luck!
Re:My solution (Score:1)
Locking is obviously not an issue with mail. A mailbox is accessed by one user (its owner) and the local delivery agent. Locks are fine here. There is indeed some concurrent access, but since mail mostly sits idle in the file and waits, it's not a problem.
Why locking matters (Score:2)
A typical way to set this up. (Score:3, Informative)
High availability redundant NFS servers for storing the mailbox data and user information.
One or more machines mounting this file-system for handling POP, IMAP, and SMTP from accounts and mailfolders off the NFS server.
Webmail can be tricky because you need to make sure that either users always hit the same machine for webmail during a session, or session information is shared among the cluster. LVS systems can handle either of these scenarios, so it's not a problem, just something you have to be aware of.
LVS systems up front, again running High Availability which do load-balancing and automatic removal of failed servers. These are the machines that have the IPs which your customers contact, and then get spread across the real machines in the middle layer above.
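The two behaviours described for the front layer (keeping a client on the same real server, and dropping failed servers from the pool) can be sketched in a few lines. This is a toy in the spirit of source-hash scheduling, not LVS's actual algorithm or API. The `Balancer` class, the server names, and the client address are invented for illustration.

```python
import hashlib


class Balancer:
    """Toy sticky load balancer: hash the client address onto the healthy pool."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)

    def mark_down(self, server):
        """Remove a failed real server from rotation."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a recovered real server to rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self, client_ip):
        """Choose a real server; the same client maps to the same server
        for as long as the healthy pool is unchanged."""
        pool = [s for s in self.servers if s in self.healthy]
        if not pool:
            raise RuntimeError("no healthy real servers")
        digest = hashlib.sha256(client_ip.encode()).digest()
        return pool[int.from_bytes(digest[:8], "big") % len(pool)]
```

Note that in this naive version a change in the pool can remap clients that were not on the failed server. That is why the webmail case above also needs either shared session state or a balancer that remembers its assignments.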
This sort of solution works really well, and we have deployed it for customers of ours with good results. You can get started for only $5k to $10k worth of hardware and if you're building this from scratch it will probably only take you around 100 hours. If you have experience with this sort of setup it can take as little as 10 to 20.
If $5k to $10k for hardware is out of your budget, you probably shouldn't be looking at this sort of solution. Individual stand-alone servers or even a single pointy box, possibly with high availability, is probably where you want to be in that case.
linux-ha.org is the place to go for High Availability software on Linux.
Sean
Scalemail (Score:1)
A Real-world Big Design (Score:5, Informative)
Before I feed you the design, let me tell you a *crucial* concept that you must carry with you at all times.
EMAIL SYSTEMS ARE PROTOCOL SPEAKERS BETWEEN USER DIRECTORIES AND STORAGE.
Read that and inwardly digest it before you even start to design your system.
For the design, first, I'm going to proselytize a particular piece of software.
DOVECOT IS THE FREE POP/IMAP SERVER OF THE FUTURE. It leaves the Cyrus codebase rotting in the slime. It already kicks Courier's butt in performance and ease of deployment. It's beautifully coded; it has the most elegant authentication architecture; it's exceptionally fast. It isn't complete yet but it's featureful and stable enough that I have successfully deployed 1.0-betas into production. http://www.dovecot.org/ [dovecot.org] for the last IMAP server you'll ever need.
Here is the design:
1 x OpenLDAP 2.3 master server
2 x OpenLDAP 2.3 read-only replicas
2 x world-facing mail servers running Postfix 2.3
4 x mail scanning servers running amavisd-new 2.3.3, ClamAV, SpamAssassin, Sophos SAVI and Sophos PMX-ENGINE. LMTP in from the mail front-ends; ESMTP out to the mail storage.
2 x mail storage front-ends running Postfix 2.3 and Dovecot IMAP/POP3 1.0-beta. These servers also run mysql for amavisd-new quarantine and squirrelmail user options. Actual storage is over NFS to the NetApps. Using Dovecot's Sieve-based delivery agent for server-side filtering.
2 x Squirrelmail webmail servers. We have our own skin, and our own sqm plugins as the user interface to our various system options - which are all in LDAP. We have integrated MailZu into sqm as a quarantine view/release interface.
2 x NetApp FAS3020c heads w/4TB NFS storage allocated to mail.
Everything is load-balanced using Foundry hardware LBs. It's very high-throughput and very reliable. It's also easy to monitor (we're using Nagios).
Base OS is Debian Sarge with applicable backports. I'd prefer FreeBSD but this happens to be a Debian shop, and I wasn't out to change their world, just their mail system.
Probably the most borderline item is MySQL's performance as a quarantine DB; however much RAM and index/query tuning we throw at it, I have yet to be satisfied with InnoDB's performance on this 100GB+ INSERT-heavy database.
If I could change one thing about it, it'd be to use the extremely pretty and surprisingly good value @mail (a commercial choice) rather than SquirrelMail. I'd also consider Fedora Directory Server over OpenLDAP, but it wasn't looking ready for this design at the time.
I have to say there is some bad advice in this thread; now for the hatchet:
Cyrus: difficult to configure, doesn't support shared storage, horribly ugly codebase, and has some nasty-ass failure modes.
Qmail: stale, poorly integrated MTA software from the bitchiest developer in town.
Sendmail: doesn't scale. Even the developers think so, which is why Sendmail X is a rip-off of postfix.
Communigate Pro: if I don't get to futz with the source for integration and value-add, I'm not interested.
GFS/GPFS: you don't need the complexity or interesting failure modes of shared-block-storage filesystems. Stay away.
Linux NFS: isn't reliable enough. We've had problems with data corruption to Linux NFS, both kernel and userland. Right now the only NFS server implementations I trust are NetApp's and Solaris's. No doubt the Linux one can/will improve, or already has, but trust is a hard thing to build
Re:A Real-world Big Design (Score:2)
I'd be curious about the "interesting failure modes" you mention. Do you have (bad) experience with this? I've deployed this technology multiple times with great success so far. It appears from your design you also try to avoid the "single point of failure". That's why with FC shared storage and GFS I generally design in more than one SAN/RAID unit and keep them sync'ed. GFS in my experience t
Re:A Real-world Big Design (Score:1)
CGP has a well-documented API for all kinds of third-party integration. I have successfully integrated ClamAV and SpamAssassin with CommuniGate, so I don't really understand the need for source here. In fact, CommuniGate's flexibility in this area (the ability to interface with other applications) makes it easier to work with than any other mailer I've worked on before. What kind of value-add/
I wrote my SMTP/POP servers to handle this problem (Score:2, Interesting)
At first I used Postfix and Cyrus, but I found it to be a nightmare when you're talking about more than 50k accounts.
What I wanted was an email platform that integrated with ClamAV, DSPAM, supported SPF, Greylisting/Blacklisting/Whitelisting, and was all controlled from a MySQL database. I also wanted it to support SSL, and clustering.
Frankly I didn't find anything. So I wr
Re:I wrote my SMTP/POP servers to handle this prob (Score:3, Informative)
We use open source software throughout our system and contribute back most of our changes (where they actually have some utility outside our little world, 50 line perl programs that just query our database for status information need not apply - and we wouldn't want to inflict our web framework on the world. It certainly doesn't need another
Flawed plan. (Score:2)
Now 3 servers would be a waste. Think about it. What are the chances a casing would FAIL?
So let's put 3 servers in one box. Data has to go onto each of the 3 disks. Instantly. There's so much IO involved. Should each email coming in have to go through the tcpip stack, through the kernel API levels, through the HAL out the driver, out the network card, through the switch and all the way back down to the disk? Using too m
How large providers do it. (Score:2)
http://72.14.203.104/search?q=cache:v5XWBwgqXQcJ:www.hserus.net/mailboxes-srs-inboxevent2004.ppt+inboxevent2004.ppt+site:hserus.net&hl=en&ct=clnk&cd=1 [72.14.203.104]
That system currently handles over 41 million users, serves up POP3, IMAP, Webmail, spam and virus filtering for paying customers, and deals with over half a billion messages per day.
Every service is on physically separate hardware: MX, outbound MTAs, content filters, frontends....
Why bother? (Score:1)
Definitely more scalable than anything that you can come up with.
Mail:Toaster (Score:3, Informative)
http://www.tnpi.biz/internet/mail/toaster/ [tnpi.biz]
it's qmail/imap based and scales quite well in my experience.
Re:Mail:Toaster - I disagree (Score:2)
I will say that using tcpserver is a hit/miss proposition. If you don't get the memory requirements just right, you can easily take down a server with too many processes.
My primary incoming MX is a $200 machine from WAL-MART with maybe 768MB. I've eaten a few h
how cam.ac.uk do it.. (Score:2)
is how the University of Cambridge do it....
lots of nice details in there
look into dbmail (Score:1)
overview [dbmail.org]
Dbmail is as scalable as the database system that is used for the mail storage. In theory millions of accounts can be managed using dbmail. One could, for example, run 4 different servers with the pop3 daemon each connecting to the same database (cluster) server.
Dbmail is based u
Sun Messaging Server (Score:1)