In an ideal world, many of the tips you mention would be fine and not produce any false positives. Unfortunately, we don't live in that world and users *WILL* receive e-mail from servers without proper PTR records, from servers that don't know how to properly deal with greylisting (sending from multiple IPs or sender addresses, immediately bouncing on a 4xx response), from an IP that is on a blacklist... And god forbid you have any users, because they often will squeeze you from both ends: "I've *GOT* to receive this e-mail RIGHT NOW", but also: "Why am I getting so much spam?"
grep *CAN* take a bunch of patterns, we simply don't know if the user in question is using it in that way. Agreed though, if you are running egrep once for every pattern you are looking for, that is probably your problem and simply putting the patterns in a file and having egrep load the patterns from it via the "-f" flag will likely reduce this dramatically. However, doing many matches is still relatively expensive.
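To make the difference concrete, here is a rough Python sketch of the "load all the patterns once" approach. The pattern list is made up, standing in for whatever file you would hand to "egrep -f", and Python's regex engine is not grep's, so treat it as an illustration of the idea (compile once, one call per message) rather than a benchmark.

```python
import re

# Hypothetical patterns, standing in for the file passed to "egrep -f"
PATTERNS = [r"cheap\s+meds", r"wire\s+transfer", r"click here now"]


def build_matcher(patterns):
    """Compile all patterns into one alternation, once, up front."""
    combined = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(combined, re.IGNORECASE)


def is_spam(matcher, message):
    """Return True as soon as any pattern matches anywhere in the message."""
    return matcher.search(message) is not None


MATCHER = build_matcher(PATTERNS)
```

The compile cost is paid once in build_matcher(), and each message costs a single search() call instead of one process and one compile per pattern.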
Doubtful that the time is largely spent compiling the regexes... But without knowing more about the OP's exact setup, it's hard to say. In particular, we don't know how many rules the OP has in their corpus. It could easily be tens of thousands or hundreds of thousands, if they just throw a bunch of strings they've seen in spam into a list of "don't let me see this message again" expressions. egrep is probably already compiling any expressions, it's just doing a *LOT* of matching.
You could try doing statistical matching on the corpus and moving more frequent matches earlier, so that matches cause the rules to terminate more quickly. "-q" should also help: it makes grep exit as soon as it finds the first match instead of scanning the rest of the input.
But to really improve the performance, you're probably going to have to simply be more clever than looking for a bunch of strings. For example, using something like Razor fingerprinting or Bayesian matching.
You can't just drop your corpus into a database and solve it; you'd need to come up with a way of indexing the data, such as fingerprinting, to get something that you can actually look up.
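As a rough illustration of what "something you can index" might look like, here is a minimal Python sketch. It is *NOT* Razor's algorithm, just a plain hash of a normalized message body, and the SpamIndex class and its in-memory set are hypothetical stand-ins for an indexed database column.

```python
import hashlib
import re


def fingerprint(body):
    """Hash a normalized body so trivially altered copies share one key."""
    # Lowercase and drop everything except letters and digits, so changes
    # in case, spacing, and punctuation do not alter the fingerprint.
    normalized = re.sub(r"[^a-z0-9]+", "", body.lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SpamIndex:
    """Hypothetical stand-in for a database table keyed on the fingerprint."""

    def __init__(self):
        self._seen = set()

    def report(self, body):
        self._seen.add(fingerprint(body))

    def is_known(self, body):
        # One hash lookup, however many messages have been reported.
        return fingerprint(body) in self._seen
```

An exact hash like this only catches copies that normalize to the same text; real fingerprinting schemes are fuzzier, but the lookup side works the same way.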
You might also want to do different checks depending on whether the message is directly addressed to you or not. For example, any e-mail that doesn't mention one of my addresses in the To or Cc, or that comes from specific mailing lists, gets stored into a separate folder that I look at very rarely. The vast majority of spam that I get goes into that box.
Sender IP is VERY easy to use for a database lookup. When I get spam from an IP, I will often set up a blacklist for IPs around that address, unless it is something like gmail or another big mail service that I recognize. It's surprising how often I get spam from a bunch of very similar IPs (in the same
Worse, a lower rate is if anything *MORE* indicative of a load that needs an SSD, not *LESS*. SSDs are *VERY* good at random seeks, and you can easily saturate a spinning disc at 400KB/sec or less worth of random I/Os (assuming 10ms average access time, or 100 accesses/sec).
If you are streaming a lot of data, an SSD is "only" around 4x faster than a spinning disc. If you are doing random I/Os, an SSD is more than 100x faster.
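The 400KB/sec figure falls straight out of the access time. A quick Python sketch of the arithmetic, where the 4KB size per random I/O is my assumption (it is what 400KB/sec at 100 accesses/sec works out to):

```python
def random_io_throughput_kb(avg_access_ms, io_size_kb):
    """Throughput in KB/sec when every I/O pays the full access time."""
    ios_per_sec = 1000.0 / avg_access_ms  # 10ms per access -> 100 accesses/sec
    return ios_per_sec * io_size_kb


# Spinning disc: 10ms average access time, assumed 4KB per random I/O
disc_kb_per_sec = random_io_throughput_kb(avg_access_ms=10, io_size_kb=4)
print(disc_kb_per_sec)  # 400.0
```

So a disc showing only a few hundred KB/sec can already be completely saturated if those are random I/Os.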
I'd love to write you a letter but my pen needs charging.
I used to work doing IT for the ILEC, and the more I worked with their systems, the more surprised I was that I was able to pick up the phone and get a dial-tone. A friend of mine worked on the systems that managed the in-the-ground cables; he's the one who said the previous sentence. I worked mostly on the billing and ordering systems. They were not the most robust systems.
You mention the "time out" before sedation is administered, which is great. But, the last time I had a procedure, we went through a bunch of stuff before I went into the operating theater, with my wife present and verifying everything that was going on. Then I was moved into an operating theater and asked by someone I had never met before to sign a paper about some sort of sedation, which I could opt out of. Without my wife present to "double check my math".
At this point, I hadn't had anything to eat in 24 hours, nothing to drink in 12, and had gotten little if any sleep in the previous 24+ hours because of the operation preparation... To recap: I was sleep deprived, dehydrated, and my blood sugar was all messed up, and had to make a decision that was so important that it required signing off on a page of dense text.
In retrospect, I should have said that I wasn't able to make that decision and lobbed the ball back into their court. What actually happened was that the doctor said he recommended it, and I signed it.
In second retrospect, if at all possible, I'm never going to "meet" a doctor in the operating room. Apparently I had the opportunity to have a sit-down in their office, but this was presented to me as a waste of time. Never again...
zfsonlinux has less testing than Btrfs? Really?
I think you mean *THE LINUX SHIM* has less testing. However, there's this *HUGE* portion of the code, as a wild ass guess I'd say 80%, which is the internal algorithms, data structures, and other internal parts of the file-system that are shared by the Linux and Solaris versions and those have been quite seriously tested for ZFS.
My experience with ZFS under Linux via FUSE was that there were some bugs in the integration layer, but they tended to be fairly shallow and never led to data loss. This is over around 3 years of serious ZFS+FUSE use on Linux (~30TB of backup storage, home storage server). I tested the heck out of ZFS+FUSE before we deployed it, found some issues, worked with the developers (who were amazing!), and eventually got to a point where the stress test I was running on it was more stable than it had been on our OpenSolaris systems a few years prior (which were the reason I built the stress test).
Based on my experience with ZFS, ZFS+FUSE, and btrfs, I'd personally trust ZFSonLinux over btrfs. My experimentation with btrfs the last few years has been that it still needs a lot of work.
Please explain it to me, because I really don't see any reason not to rely on an "out of tree FS". My system won't boot without tons of stuff that is outside of the kernel tree, including things like init but also things like graphics drivers on my desktop.
It seems to me that the ZFS license issue is only with the kernel, and can be solved by distributors. Distributions deal with wrapping up things under multiple licenses *ALL THE TIME*. And Ubuntu seems to be pretty close to having this integration done, based on what a friend reported with his experiments with zfsonlinux as a root device.
With all due respect to those involved, I think the pronouncements that it must be in the kernel, and that it is a "rampant layering violation", have set Linux back a long ways. FreeBSD, DragonFly BSD, and OpenSolaris have all had "advanced filesystems" for years now. Linux is basically stuck with a feature-set from Berkeley FFS and isn't really showing that that is going to change for several years... It's kind of a shame, especially since at the time of the "layering violation" comment it was clear to me that the violation came with significant compelling reasons for it, and now btrfs seems to be realizing that and implementing the same features...
Hindsight and all that, but it's a damn shame. ZFS is insanely awesome, I have a number of systems running it under FUSE and it has proven very reliable over the years.
If you are "trusting your data" to *ANY* file-system, you are likely to be disappointed.
I have run btrfs off and on for maybe 3 or 4 years because I don't *HAVE* to trust my data to it. I have good backups that run daily. If btrfs screws the pooch, I'm not really out that much.
Note though, my backup servers run ZFS.
Honestly, it seems to me that btrfs has gotten worse over the last few years rather than better. 4 years or so ago when I first started using it, it actually worked pretty well and I was fairly happy with it, including taking automatic snapshots, and I never had a data loss. ISTR that I switched away from it because I upgraded to a new distro and had to reformat, for various reasons. Newer versions I've tried have been barely usable and I've had btrfs wedge itself a few times. Some of the issues were distro integration issues I think, like 12.04 seemed to *ALWAYS* run a full fsck on boot, and I think it took a snapshot when I tried to do an upgrade to 12.10, which somehow caused it to think that it had space available when it didn't and it ran out of disc space during the upgrade...
I really want btrfs to get production ready, but I'm half thinking that by the time it is HAMMER2 will be out and I'll be infatuated with it. Note that btrfs and HAMMER started around the same time, maybe HAMMER had a 6 month lead. HAMMER has been "production stable" and has been the default DragonFly BSD filesystem for several years. Dillon seems to know how to build a file-system...
PyCon, as an organization, takes it very seriously if someone expresses what they feel is a code of conduct violation. They didn't "side" with either party, they arbitrated a discussion between the 3 of them. They would have done that no matter the gender of the reporting and reported parties, I am quite confident. I say this because I know the organizers fairly well.
That said, one evening I said something fairly similar to one of the organizers and another community member: "I just wish we could all act like adults". The one guy said "I hear you, but I can cite several papers on why we can't just do that." And with this guy, I have no doubt that he literally could. His theory (which I buy because he's much smarter than me
I'll admit, I have done some soul searching since I heard about donglegate about whether I would attend PyCon 2014. I hear other responses saying things like "If I were a python dev, I would
But here's the thing... Not going doesn't really send a message to the conference organizers, or at least it doesn't send the one you think it does. More on that in a moment. What it *DOES* send is a message to people who will take any opportunity to grandstand on their agenda, that they can find an audience at these conferences, to the extent that it goes on for multiple years. It doesn't matter whether the actions taken here were grandstanding or not. Irrespective of her intentions, many people are seeing it as such, so I think it's fair to say that staying away can send that message to others who would grandstand, without speculating on the intentions that started this.
If conference attendance were way down next year, the story would be about how donglegate caused it, and it would be feeding all the horrific sentiments behind this. If, however, attendance is up next year, the story will be how despite this the Python community remained strong, shutting down the bad sentiments and making it into a positive story.
Unfortunately, I and a number of folks were expecting attendance to be down next year even before any of this donglegate stuff came out. PyCon tends to lose attendees every time it moves cities -- though moving to Santa Clara didn't suffer from that. Moving it such that a significant number of Americans need passports, who haven't in the past, may reduce attendance. On the other hand, there may be people who come from around the world who didn't want to deal with the TSA... It's all speculation, but an informal poll I took showed about half the people were expecting it to be smaller.
So why doesn't it send a useful message to the organizers? Because the conference organizers did all they could about this incident. When the incident was reported, they acted swiftly (by all accounts), spoke to the 3 involved, apologies were given and apparently accepted, and everyone went away happy. No complaints were filed about the posting of the photograph.
Everything that happened that is making this show up on slashdot happened *OUTSIDE THE CONFERENCE*. The incident itself happened, I believe, in the last hour of the conference (her blog post sounds like it happened during the closing Lightning Talks, the last session of the conference). But in any case, the firing and rage happened largely on the Internet, in response to her post of that picture.
What can the conference do? Ban any of them from the show in future years? The only official complaint to the conference was handled to the satisfaction of all involved, at the time. Excluding someone from the conference without any complaint would lead to another storm...
As I said in the subject, PyCon is a wonderful thing. I've been to 10 of them, I've only missed one. PyCon has been working hard to include more diversity, and this year we had around 20% women. I remember when we literally had a handful of women at PyCon, and I was married to one of them. In order to get here PyCon has had to do a lot of outreach and take reports of harassment and the like very seriously. The community is stronger for it. And we now have experience dealing with someone tweeting "shame photos"...
Retaliating against the conference for this is going to do more harm than good. Plain and simple.
Am I going to PyCon 2014? Absolutely!
The guys were *NOT* kicked out. None of the three were kicked out. According to the official statement and my personal conversations with other conference organizers:
"Both parties were met with, in private. The comments that were made were in poor taste, and individuals involved agreed, apologized and no further actions were taken by the staff of PyCon 2013. No individuals were removed from the conference, no sanctions were levied."
The short answer is: If you tell us that an e-mail from you is not to be trusted, we will honor that request. But if our system falsely catches an e-mail from you, say because of a bad SPF record, we will whitelist that sender.
We have a custom system I built and haven't yet had time to polish for an open source project (and it needs it before it could be publicly released). It has an awesome feature though...
Messages are rejected at the SMTP level, for things like SPF, greylisting, and SpamAssassin. The bounce message has a URL that the sender can visit to release the message. I was anticipating this might get abused and have to have a captcha on it, but so far it has not. Actually, it's worse than that, 99+% of people who get this message don't read any further than the subject, which is generated by their mail server, so they usually contact us some other way saying "Your mail server is broken".
BUT, the killer feature is that our users can go to a website and see the messages sent to them, and "release" them. So if you are looking for an important message no problem! You go to that page, type Control-F and the sender name or subject you are expecting, and click a button, and you have the message. It's kind of like a quarantine, but it's controlled by the sender AND the recipient.
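For anyone wondering how a release link like that can be made hard to guess, here is a minimal Python sketch. To be clear, this is *NOT* our actual code: the secret, the base URL, and the function names are all made up for illustration, and it simply signs the message ID with an HMAC so that a release request can be checked without storing a token for every message.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"change-me"                      # hypothetical server-side secret
BASE_URL = "https://mail.example.com/release"  # hypothetical release page


def release_token(message_id):
    """Sign the message ID so the link cannot be forged by guessing IDs."""
    return hmac.new(SECRET_KEY, message_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def release_url(message_id):
    """Build the link that would go into the bounce text."""
    query = urlencode({"id": message_id, "token": release_token(message_id)})
    return f"{BASE_URL}?{query}"


def may_release(message_id, token):
    """Check a release request; compare_digest avoids timing leaks."""
    return hmac.compare_digest(release_token(message_id), token)
```

The same may_release() check would work for a link in the bounce and for a button on the recipient's page, since both only need the message ID and a valid token.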
Now, as far as what we do about senders that have broken SPF... We add them to a whitelist and tell them "You've been whitelisted, but your domain is publishing a notification that says this e-mail is invalid, you probably want your mail server admin to fix this because other places are honoring this as well."
No big deal, no ideological stands, we just deal with it and report it.
Because that's the crucial thing: Your domain is telling me that this e-mail is not to be trusted. The CEO of our company understands that yelling "You can never have a false positive" means that they are going to have to deal with an inbox full of sewage -- they understand what false negatives are.
"Internet access is as crucial to everyday life as having a phone connection [...]"
The telcos *WISH* that having a phone connection were as crucial to everyday life as Internet access...