I doubt that most of the time is spent compiling the regexes, but without knowing more about the OP's exact setup, it's hard to say. In particular, we don't know how many rules the OP has in their corpus. It could easily be tens or hundreds of thousands if they just throw every string they've seen in spam into a list of "don't let me see this message again" expressions. egrep is probably already compiling the expressions; it's just doing a *LOT* of matching.
You could try gathering statistics on which rules match most often across the corpus and moving the most frequent ones earlier, so that a matching message ends the rule scan sooner. "-q" should also help: with it, grep exits as soon as it finds the first match instead of reading the rest of the input.
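A minimal Python sketch of the reordering idea, assuming the rules can be expressed as Python regular expressions and that you have a sample of spam to count hits against. The function names `rank_rules` and `is_spam` are hypothetical, not part of any existing tool.

```python
import re
from collections import Counter

def rank_rules(patterns, corpus):
    """Compile the rules and sort them so the ones that hit most often come first."""
    compiled = [re.compile(p) for p in patterns]
    hits = Counter()
    for message in corpus:
        for regex in compiled:
            if regex.search(message):
                hits[regex.pattern] += 1
    # sorted() is stable, so rules with equal hit counts keep their original order
    return sorted(compiled, key=lambda r: -hits[r.pattern])

def is_spam(message, ranked_rules):
    """any() stops at the first matching rule, much like grep -q exits on the first match."""
    return any(r.search(message) for r in ranked_rules)
```

For example, `rank_rules(["prize", "cheap"], ["cheap pills", "win a prize", "cheap watches"])` puts the `cheap` rule first because it matched two messages. The ranking is only as good as the sample, so it would need to be refreshed as the spam mix changes.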
But to really improve performance, you're probably going to have to be cleverer than searching for a long list of strings: for example, by using something like Razor fingerprinting or Bayesian filtering.
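To give a feel for the Bayesian approach, here is a textbook multinomial naive Bayes classifier with add-one smoothing. It is a generic sketch, not how any particular spam filter implements it; the tokenizer, the class name `NaiveBayesFilter`, and the "spam"/"ham" labels are assumptions for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase words made of letters, digits and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesFilter:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        # Assumes each label has at least one training message (otherwise log(0))
        total_docs = sum(self.doc_counts.values())
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # log prior
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        return score

    def is_spam(self, text):
        return self.log_score(text, "spam") > self.log_score(text, "ham")
```

Unlike a fixed list of strings, the cost of classifying a message grows with the message length rather than with the number of rules, and the model learns from every message you label.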
You can't just drop your corpus into a database and be done with it; you'd need a way to reduce each message to something you can index, such as a fingerprint.
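A minimal sketch of the fingerprint-then-index idea, using an in-memory SQLite table with the fingerprint as its primary key. The normalization rules (lowercasing, replacing URLs and digit runs, collapsing whitespace) and the names `fingerprint`, `remember_spam` and `seen_before` are assumptions; real systems such as Razor use fuzzier signatures than a single hash of the normalized body.

```python
import hashlib
import re
import sqlite3

def fingerprint(body):
    """SHA-256 of the body after some crude normalization (an assumed scheme)."""
    text = body.lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # ignore which URL is advertised
    text = re.sub(r"\d+", "0", text)               # ignore changing numbers/amounts
    text = " ".join(text.split())                  # collapse all whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

db = sqlite3.connect(":memory:")
# The PRIMARY KEY gives an index, so lookups don't scan the whole corpus
db.execute("CREATE TABLE spam_fingerprints (fp TEXT PRIMARY KEY)")

def remember_spam(body):
    db.execute("INSERT OR IGNORE INTO spam_fingerprints (fp) VALUES (?)",
               (fingerprint(body),))

def seen_before(body):
    row = db.execute("SELECT 1 FROM spam_fingerprints WHERE fp = ?",
                     (fingerprint(body),)).fetchone()
    return row is not None
```

With this normalization, "Win $1000 now! Visit http://a.example/x" and "win  $2500 NOW! Visit http://b.example/y" produce the same fingerprint, so a single lookup catches both.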
You might also want to apply different checks depending on whether or not the message is addressed directly to you. For example, any e-mail that doesn't mention one of my addresses in the To or Cc headers, or that comes from specific mailing lists, gets filed into a separate folder that I look at very rarely. The vast majority of the spam I get ends up there.
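A minimal sketch of the "is this addressed to me?" check using Python's standard `email` package. The addresses in `MY_ADDRESSES` and the folder names are hypothetical placeholders, and the mailing-list handling is left out because it depends on how your lists identify themselves.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical addresses; keep them lowercase for the comparison below
MY_ADDRESSES = {"me@example.com", "me@example.org"}

def is_addressed_to_me(raw_message, my_addresses=MY_ADDRESSES):
    msg = message_from_string(raw_message)
    # get_all() returns the default ([]) when the header is missing
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _, addr in getaddresses(headers)}
    return not recipients.isdisjoint(my_addresses)

def choose_folder(raw_message):
    # Hypothetical folder names
    return "INBOX" if is_addressed_to_me(raw_message) else "rarely-read"
```

Because this check only parses two headers, it is cheap enough to run before any expensive content matching.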
Sender IP is VERY easy to use for a database lookup. When I get spam from an IP, I will often set up a blacklist entry for the range of IPs around that address, unless it belongs to Gmail or another big mail service that I recognize. It's surprising how often I get spam from a bunch of very similar IPs (in the same /24 or /22).
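A minimal sketch of range-based blacklisting with Python's `ipaddress` module: each spam IP is widened to its /24 (or /22) network, and incoming IPs are checked by computing their network of the same size and looking it up in a set. The class name and the allowlisted range (a documentation prefix standing in for a big provider) are hypothetical.

```python
import ipaddress

# Hypothetical allowlist of big providers; 192.0.2.0/24 is only a placeholder range
ALLOWLIST = (ipaddress.ip_network("192.0.2.0/24"),)

class IpBlacklist:
    def __init__(self, prefix_len=24, allowlist=ALLOWLIST):
        self.prefix_len = prefix_len
        self.allowlist = allowlist
        self.networks = set()

    def _network_of(self, addr):
        # strict=False masks off the host bits: 203.0.113.77/24 -> 203.0.113.0/24
        return ipaddress.ip_network(f"{addr}/{self.prefix_len}", strict=False)

    def report_spam(self, ip):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in self.allowlist):
            return  # never block a whole range belonging to a known big provider
        self.networks.add(self._network_of(addr))

    def is_blacklisted(self, ip):
        # All entries share one prefix length, so a set lookup is enough
        return self._network_of(ipaddress.ip_address(ip)) in self.networks
```

The same idea maps directly onto a database table keyed by network address, which keeps the lookup cheap no matter how many ranges you collect.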