Rob Riggs - Slashdot User

Comment Re:Procmail is a fine tool -- but the wrong tool (Score 1) 190

by jafo on Saturday August 31, 2013 @04:41PM (#44726165) Attached to: Ask Slashdot: Speeding Up Personal Anti-Spam Filters?

In an ideal world, many of the tips you mention would be fine and not produce any false positives. Unfortunately, we don't live in that world and users *WILL* receive e-mail from servers without proper PTR records, that don't know how to properly deal with greylisting (sending from multiple IPs or sender addresses, immediately bouncing on a 4xx response), from an IP that is on a blacklist... And god forbid you have any users, because they often will squeeze you from both ends: "I've *GOT* to receive this e-mail RIGHT NOW", but also: "Why am I getting so much spam?"

Comment Re:Problem spotted. (Score 1) 190

by jafo on Saturday August 31, 2013 @04:37PM (#44726133) Attached to: Ask Slashdot: Speeding Up Personal Anti-Spam Filters?

grep *CAN* take a bunch of patterns, we simply don't know if the user in question is using it in that way. Agreed though, if you are running egrep once for every pattern you are looking for, that is probably your problem and simply putting the patterns in a file and having egrep load the patterns from it via the "-f" flag will likely reduce this dramatically. However, doing many matches is still relatively expensive.
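For illustration only, here is a rough Python analogue of the "one invocation, many patterns" idea. It assumes the patterns are plain literal strings, which the thread never confirms, and the names `build_matcher` and `is_spam` are made up for this sketch.

```python
import re

def build_matcher(patterns):
    """Compile many literal patterns into one alternation so the text is scanned once."""
    # Assumption: patterns are literal strings, so each one is escaped.
    # Empty patterns are skipped because an empty alternative matches everything.
    alternation = "|".join(re.escape(p) for p in patterns if p)
    return re.compile(alternation)

def is_spam(matcher, message_text):
    # search() returns at the first hit instead of testing every pattern separately
    return matcher.search(message_text) is not None

patterns = ["cheap meds", "wire transfer", "act now"]
matcher = build_matcher(patterns)
print(is_spam(matcher, "Get cheap meds today"))
print(is_spam(matcher, "Lunch on Friday?"))
```

The compile step happens once, which mirrors loading the pattern file a single time rather than starting a new process per pattern.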

Comment Re:You could speed up your current solution (Score 1) 190

by jafo on Saturday August 31, 2013 @04:34PM (#44726107) Attached to: Ask Slashdot: Speeding Up Personal Anti-Spam Filters?

Doubtful that the time is largely spent compiling the regexes... But without knowing more about the OP's exact setup, it's hard to say. In particular, we don't know how many rules the OP has in their corpus. It could easily be tens of thousands or hundreds of thousands, if they just throw a bunch of strings they've seen in spam into a list of "don't let me see this message again" expressions. egrep is probably already compiling any expressions; it's just doing a *LOT* of matching.

You could try doing statistical matching on the corpus and moving more frequent matches earlier, so that matches cause the rules to terminate more quickly. "-q" might help speed it up by short-circuiting on the first match (not sure if it does this or not, but I see no reason why "-q" wouldn't).
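A minimal sketch of the reordering idea in Python, assuming plain substring rules and a small in-memory sample of past spam. The function names and sample data are hypothetical.

```python
from collections import Counter

def reorder_by_hits(patterns, corpus):
    """Return patterns sorted so those matching the most corpus messages come first."""
    hits = Counter()
    for message in corpus:
        for pattern in patterns:
            if pattern in message:
                hits[pattern] += 1
    # sorted() is stable, so patterns with equal counts keep their original order
    return sorted(patterns, key=lambda p: -hits[p])

def first_match(patterns, message):
    """Check patterns in order and stop at the first one that matches."""
    for pattern in patterns:
        if pattern in message:
            return pattern
    return None

corpus = ["buy cheap meds", "cheap meds here", "wire transfer now"]
patterns = ["wire transfer", "cheap meds", "lottery"]
ordered = reorder_by_hits(patterns, corpus)
print(ordered)
```

This only pays off when checking stops at the first hit, as `first_match` does. Messages that match nothing still cost a full pass over every rule.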

But to really improve the performance, you're probably going to have to simply be more clever than looking for a bunch of strings. For example, using something like razor fingerprinting or bayesian matching.

You can't just drop your corpus into a database and solve it; you'd need to come up with a way of indexing the data, such as fingerprinting, to get something that you can index.
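To make the indexing point concrete, here is a toy fingerprint in Python. This is NOT how razor computes its signatures. It is just a hash of a normalized body, an assumption chosen to show how a fingerprint turns "scan every rule" into a single keyed lookup.

```python
import hashlib
import re

def fingerprint(body):
    """Hash a normalized copy of the body so trivially different copies collide."""
    # Lowercase and collapse whitespace runs; real systems are far more tolerant
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Stand-in for an indexed database column
known_spam = set()

def remember_spam(body):
    known_spam.add(fingerprint(body))

def seen_before(body):
    # One hash plus one set lookup, no matter how much spam has been recorded
    return fingerprint(body) in known_spam

remember_spam("Buy  CHEAP meds\nnow")
print(seen_before("buy cheap meds now"))
```

An exact hash like this misses messages that differ by even one word, which is why real fingerprinting schemes are fuzzier.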

You might also want to do different checks depending on whether the message is directly addressed to you or not. For example, any e-mail that doesn't mention one of my addresses in the To or Cc, or that comes from specific mailing lists, gets stored into a separate folder that I look at very rarely. The vast majority of spam that I get goes into that box.
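A sketch of that "is it addressed to me" check using Python's standard `email` package. The addresses in `MY_ADDRESSES` are placeholders, and the mailing-list part of the rule is left out.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical addresses; substitute your own
MY_ADDRESSES = {"me@example.com", "me@example.org"}

def addressed_to_me(raw_message):
    """Return True if any To or Cc recipient is one of my addresses."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() parses the header values into (display name, address) pairs
    recipients = getaddresses(headers)
    return any(addr.lower() in MY_ADDRESSES for _name, addr in recipients)

direct = "From: a@example.net\nTo: Me <ME@example.com>\nSubject: hi\n\nbody\n"
bulk = "From: a@example.net\nTo: list@example.net\nSubject: hi\n\nbody\n"
print(addressed_to_me(direct), addressed_to_me(bulk))
```

Anything that fails the check would go to the rarely read folder rather than being deleted.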

Sender IP is VERY easy to use for a database lookup. When I get spam from an IP, I will often set up a blacklist for IPs around that address, unless it is something like gmail or another big mail service that I recognize. It's surprising how often I get spam from a bunch of very similar IPs (in the same /24 or same /22).
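Roughly, the neighborhood blacklist could look like this in Python with the standard `ipaddress` module. The set stands in for a database table, the function names are invented, and the sketch assumes only /24 and /22 blocks are ever stored.

```python
import ipaddress

# Assumption: blocks are only ever recorded at these prefix lengths
PREFIX_LENGTHS = (24, 22)

# Stand-in for an indexed database table of blocked networks
blocked_networks = set()

def block_neighborhood(ip, prefix_len=24):
    """Block the whole network containing ip (a /24 by default)."""
    # strict=False lets us pass a host address and get its enclosing network
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    blocked_networks.add(network)
    return network

def is_blocked(ip):
    """One exact-key lookup per prefix length, which maps well onto an index."""
    return any(
        ipaddress.ip_network(f"{ip}/{plen}", strict=False) in blocked_networks
        for plen in PREFIX_LENGTHS
    )

block_neighborhood("192.0.2.77")
block_neighborhood("203.0.113.9", 22)
print(is_blocked("192.0.2.5"), is_blocked("203.0.112.1"), is_blocked("198.51.100.1"))
```

A whitelist for big providers would be checked before calling `block_neighborhood`, matching the exception for services like gmail.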

Comment Re:Amended quote (Score 2) 743

by Rob Riggs on Thursday August 29, 2013 @02:19PM (#44708783) Attached to: Snowden Spoofed Top Officials' Identity To Mine NSA Secrets

You forgot the dash!

Yeah, that's right. I check my spelling with Yahoo! Answers before posting. Brilliant!!

Comment Re:Why do they care? (Score 0) 57

by Rob Riggs on Thursday August 29, 2013 @01:34PM (#44708169) Attached to: USPTO Publishes Suggestions For Intellectual Property Enforcement

Temporary monopoly? Copyright has not been that in my lifetime. And I'm what most around here would call "old". Who the fuck do I have to bribe to get Gilligan's Island in public domain? Sorry, I meant how many Senators do I need to buy? No, that's still not it... Which campaigns do I need to contribute to? Yeah -- that's the one I'm supposed to use in polite company.

Comment Re:Brilliant? (Score 5, Funny) 743

by Rob Riggs on Thursday August 29, 2013 @01:25PM (#44708061) Attached to: Snowden Spoofed Top Officials' Identity To Mine NSA Secrets

Umm, ok, now you have to be brilliant to "sudo su ".

Sucker. Now you'll never get hired by the NSA.

Comment Re:Amended quote (Score 5, Funny) 743

by Rob Riggs on Thursday August 29, 2013 @01:24PM (#44708045) Attached to: Snowden Spoofed Top Officials' Identity To Mine NSA Secrets

That's why I play dumb. Yeah -- that's it. I'm really brilliant in disguise so I will get hired. And keep up the facade so I won't get fired.

Comment Re:Call me old fashion (Score 1) 156

by jafo on Tuesday August 20, 2013 @06:48PM (#44624701) Attached to: Samsung SSD 840 EVO 250GB & 1TB TLC NAND Drives Tested

Worse, a lower rate kind of is *MORE* indicative of a load that needs an SSD rather than *LESS*. SSDs are *VERY* good at random seeks and you can easily saturate a spinning disc at 400KB/sec or less worth of random I/Os. (assuming 10ms average access time, or 100 accesses/sec).

If you are streaming a lot of data, an SSD is "only" around 4x faster than a spinning disc. If you are doing random I/Os, an SSD is more than 100x faster.
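The 400KB/sec figure can be checked with a couple of lines of Python. The 4 KB request size is an assumption inferred from 400KB/sec divided by 100 accesses/sec. It is not stated in the comment.

```python
def random_io_throughput_kb(access_time_ms, io_size_kb):
    """Throughput in KB/sec when every request pays a full average access time."""
    ios_per_second = 1000 / access_time_ms
    return ios_per_second * io_size_kb

# 10 ms average access time, assumed 4 KB per random I/O
print(random_io_throughput_kb(10, 4))  # 400.0
```

So a disk that can stream far more than that sequentially is already saturated at a few hundred KB/sec once the accesses are random.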

Comment Re:Uh huh (Score 1) 570

by Rob Riggs on Tuesday August 20, 2013 @10:46AM (#44618379) Attached to: The Steady Decline of Unix

A) I did not say a cluster is new. B) You did not have a VAX cluster so much as you had a DEC cluster. C) I have no problem with system-z heads laughing at Linux re-inventing things. Whatever features system-z has -- it (and your DEC cluster) has one key misfeature: vendor lock-in.

Comment Re:Uh huh (Score 1) 570

by Rob Riggs on Monday August 19, 2013 @05:21PM (#44611681) Attached to: The Steady Decline of Unix

Are they? Or are they just realizing that a cluster of redundant, possibly virtualized, machines is just as reliable even if each single machine is not? Two linux boxes with 99% uptime each running the same service redundantly is equivalent to one machine with 99.99% uptime but I bet the linux boxes are cheaper.

Exactly. Hardware and software architectures have changed a lot since 1973. Redundancy that used to be done in one piece of hardware -- "the server" or "the mainframe" -- is now handled by "the cluster". We still have expensive hardware when you look at the servers, network infrastructure, storage infrastructure, clustering and/or virtualization software and monitoring systems. But individually, we can take our pick of vendors for each of these components and that competition is what keeps the costs down.

Our vendors know that they cannot screw us (as, for example, Sun/Oracle does to my previous employer) because they will very quickly find themselves with one less customer. There is healthy competition in the marketplace. And we work to avoid vendor lock-in.

We can also identify bottlenecks and selectively upgrade the pieces as needed. The cluster is organic in that regard. Our software runs on the same cluster it did years ago -- but all of the components have been upgraded numerous times, just like the cells in our bodies.
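The 99% versus 99.99% figure quoted at the top of this comment works out as follows, assuming the boxes fail independently (an assumption that shared power, network or software bugs can easily break). The function name is made up for this sketch.

```python
def combined_availability(single, copies):
    """Availability of a service that is up whenever at least one copy is up."""
    # All copies must be down at once for an outage: (1 - single) ** copies
    return 1 - (1 - single) ** copies

two_boxes = combined_availability(0.99, 2)
print(round(two_boxes, 6))  # 0.9999
```

A third box at the same 99% would push the figure to roughly 99.9999% under the same independence assumption.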

Comment Re:Why did Python avoid some common "OO" idioms? (Score 1) 242

by Rob Riggs on Monday August 19, 2013 @01:04PM (#44608777) Attached to: Interviews: Q&A With Guido van Rossum

Interfaces, abstract classes, private members, etc... Why did python avoid all this?

I'm curious -- how many dynamically-typed languages have these features?

Comment Re:No RHEL/CentOS? (Score 1) 627

by Rob Riggs on Friday August 09, 2013 @01:49PM (#44522901) Attached to: Your preferred Linux distribution for 2013?

Preferred? No. Required? Unfortunately...

I use Fedora at home (preferred) and RHEL at work (required). And I get the same yummy package management system for both. Besides, with the shit I pull on my home desktop machine, the added stability of RHEL isn't as noticeable as the lack of modern packages.

Comment Re:At the end of the day (Score 1) 634

by Rob Riggs on Friday August 09, 2013 @01:07PM (#44522319) Attached to: NSA Firing 90% of Its Sysadmins

Good point. Really good point. The only counter that occurs to me is: This assumes that these people are smart enough to put such a system in place. My vote would be, No. In fact, the resulting debacle might be entertaining.

Unfortunately, if it can be done, they have enough money to outsource the development to really smart people who can.

Comment Re:Let's all Google together. (Score 1) 923

by Rob Riggs on Friday August 02, 2013 @08:47AM (#44455249) Attached to: Google Pressure Cookers and Backpacks: Get a Visit From the Feds

We should all Google 'pressure cooker' and 'backpacks'. Let's send them for a spin.

pressure cooker, backpack, hot grits...

What's Stopping Us From Eating Insects? 655

Posted by timothy on Tuesday July 30, 2013 @11:45AM from the there-will-never-be-a-fast-food-place-called-thoraxes-etc. dept.

Lasrick writes "Scientific American has a really nice article explaining why insects should be considered a good food source, and how the encroachment of Western attitudes into societies that traditionally eat insects is affecting consumption of this important source of nutrients. Good stuff." Especially when they're so easy to grow.
