The Test Lab Response (Score 5, Informative)
Hi All,
Love all the comments!! And despite popular belief, I did not get my 2-year-old son to write the review.
Reading through them, it seems to me there are definitely a few misconceptions that need to be cleared up, so hopefully this may sort a few things out. Then again it may not! :-)
Before we begin down this path, I appreciate your patience in getting through this abnormally large post, but it is better to deal with the comments as a whole rather than one by one.
1. We are the RMIT Test Lab, based in Australia. We are a totally separate organization from the magazine, which is one of our clients; they contract us to perform three independent technology reviews every month on products that they invite the vendors to submit. As of January 2005 the RMIT Test Lab will have been performing independent magazine reviews for 16 years. We have certainly produced a hell of a lot of words over that time. For more information on the RMIT Test Lab see www.testlab.rmit.edu.au. The vendors don't pay the Lab one cent to have their products tested for the magazine reviews.
2. For all you Open Source buffs out there (you know who you are!): the magazine creates a list of which technologies will be tested approximately six months in advance. One and a half months before going to press, the magazine issues invites to various product vendors to submit product(s) to us at the lab for testing. This is generally accompanied by a "scenario" set by the magazine to ensure that the vendors stick to certain criteria and submit products of a certain caliber/type, rather than all eight products in their inventory that might fit into that review category. Therefore it is the magazine who invites the vendors, not the Test Lab nor the reviewer. Basically we have no control over which vendors are invited to submit, and at the end of the day every single vendor could not possibly be reviewed; there will always be some who can't submit, won't submit, have not been invited or don't have Australia as a target market. So don't blame us for not including SpamAssassin or any of the hundreds of other commercial and open source anti-spam solutions that are out there. Also note that a review we have recently completed and submitted for the next edition of the magazine, "E-Mail Clients", contained several Open Source products, and a review we have just commenced, "Internet Browsers", also contains several Open Source products. So before pulling out the "paid for results", "advertising driven" and "Open Source bashing" comments, think again and take a look at a few of the other reviews we have performed.
3. We are fundamentally IT engineers who design and execute testing frameworks and methodologies and create reports; we just happen to have a small modicum of writing ability. We are by no means trained journalists "out for the scoop" or trying to generate traditional "media hype" around various technologies. We report things as we see them. We are also very experienced in testing these technologies; in fact the majority of the work the lab is contracted to perform is private testing for corporate clients and vendors/manufacturers/developers. Therefore we will not "test" where others try unless the test will provide valid, worthwhile results that we will happily stand behind. The fact that we are not journalists means that the magazine's editorial staff have their work cut out editing our reviews while still maintaining our individual writing styles and the basic concepts of what we are trying to deliver; sometimes it is successful, sometimes less so. An example for you: the review we submitted on spam was 7,049 words long (25 A4 pages in Word, or Writer, with screen shots and images), and that does not even include the features table or the overview table. The space available for that edition of the magazine was less than 3,000 words, so over 4,000 words had to be lost. We don't get to see the finished product until it is published. Overall I personally feel that the review turned out OK post-editing; however, on reviewing the comments posted here, there were two key sections that were cut which would debunk many of the myths surrounding the comments in this forum. These sections were entitled "Testing anti-spam -- Given all the time, resources and money in a perfect world" and "Recent private contract anti-spam testing performed by the lab." For those of you patient enough to keep reading this post, I have cut and pasted them at the end of this reply verbatim from the original document.
4. The online editorial team at the magazine called the online review the "Ultimate" when it pretty obviously is not the "Ultimate"; I think enough people have made that comment now. However, when the review was published in the print edition of the magazine it was entitled "Slam that Spam". Don't ask me what was wrong with sticking with that original title. Perhaps this could be titled the "Ultimate" reply? :-)
5. The link in the body of this latest spam review points to the first review of spam applications, which appeared in the July 2003 edition of the magazine. We also performed a follow-up test, published in the October 2003 edition; that one has the testing, methodology, results etc. Do a search for "Son of Spam" if you want to see a basic methodology covering a very short few weeks of testing.
6. So remember that the vendors are invited approximately six weeks before going to print. One must also bear in mind the resources involved in putting together a review such as this: the more products, and the more complex they are, the longer the review becomes in both time and size, and therefore the more resources need to be utilized to bring it all together, particularly when the magazine's target market is enterprise level and therefore the products are too. Resources, particularly time, print space (out of the lab's control), equipment and money, cause the most grief. For example, we have approximately three to four weeks from when an invite is sent to a vendor before the review is due to be completed and submitted to the magazine, and the magazine has approximately one to two weeks to edit and lay out the data. In this timeframe we need to get the products from the participating vendors and learn how to install, configure, administer and then test each one, let alone write them up. In the case of spam we needed to set up a live e-mail server on the internet for each product tested, not to mention the rest of the equipment needed for the testing: 11 products meant we had to configure almost 22 servers (for each product, one as the mail server and one to host the application).
As previously mentioned, the majority of the Test Lab's work is carrying out private, independent, confidential testing for corporate clients and vendors, and in those cases there are adequate time, money and equipment resources available to perform the required evaluation.
Again I thank you very much for wading through this response, and as always the readers are more than welcome to their opinions.
Best Regards,
Matt Tett
THE CUT SECTIONS:
"Testing anti-spam -- Given all the time, resources and money in a perfect world.
To do a complete and thorough live accuracy test would take at least two to three months. It would involve setting up two concurrently running mail servers and applications per product on test: one set to the vendor's default baseline static "out of the box" settings, and the other run as a dynamic "tweakable" system, to measure what benefit was being derived day to day and week to week between the static and dynamic machines. So for the eleven products in this review, twenty-two mail servers would be required.
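For illustration only, here is a minimal sketch (in Python, with placeholder product names, not the products from the review) of how that static/dynamic matrix scales out to twenty-two server/application pairs:

```python
# Hypothetical illustration of the static vs. dynamic test matrix described above.
# Product names are placeholders, not the products from the review.
products = [f"product_{n:02d}" for n in range(1, 12)]  # eleven products on test

test_matrix = []
for product in products:
    for profile in ("static", "dynamic"):
        test_matrix.append({
            "product": product,
            "profile": profile,  # "static" = vendor defaults, "dynamic" = tweaked over time
            "mail_server": f"mx-{product}-{profile}",  # one dedicated mail server per instance
        })

print(len(test_matrix))  # 22 server/application pairs for 11 products
```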
We would then have to select which combination of our live honeypot domains would provide the best mix of unique spam messages for the testing. Honeypots are live mail servers with valid domains and user accounts that we keep running constantly to attract and collect spam. It takes considerable time to build these honeypots up, as you cannot just go off and subscribe or add your e-mail address to a spammer's database; then you are inviting or entrapping the spammer, and those messages would have to be classed as "grey", not "spam". We have to ensure that our test e-mail accounts are harvested via normal spammer means, like domain and address harvesting from live websites, name database additions etc. Our honeypots have a history of almost three years now.
Once we had our live spam feed, we would then need to inject a live ham/grey mail feed too. We have modified a centralized mail server that first combines these streams and then distributes them, so that each mail server receives exactly the same live feed from the internet, to one address on that server, as though each message had come straight from the source to its final destination. This is very difficult to achieve, particularly considering that many anti-spam applications rely on the original e-mail header information being intact -- something that mail applications like Novell GroupWise, Microsoft Exchange and Outlook do not preserve.
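To make the fan-out idea concrete, here is a rough sketch using Python's standard smtplib. This is not the lab's actual implementation, and the downstream hostnames are hypothetical:

```python
# Rough sketch only -- not the lab's implementation. Fan a single raw message
# out to every test mail server unchanged, so all filters see identical input.
import smtplib

# Hypothetical downstream hosts, one per product under test.
DOWNSTREAM = [f"mx-product-{n:02d}.test.example" for n in range(1, 12)]

def fan_out(raw_message: bytes, sender: str, recipient: str) -> None:
    """Relay the untouched RFC 822 message bytes to each downstream server.

    Passing the original bytes (rather than a re-serialised copy) keeps the
    original headers intact, which many anti-spam filters depend on.
    """
    for host in DOWNSTREAM:
        with smtplib.SMTP(host, 25, timeout=30) as smtp:
            smtp.sendmail(sender, [recipient], raw_message)
```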
So, while this is all well and good, we would then have twenty-two mail servers and anti-spam applications up and running with live feeds of spam, ham and grey mail. We would then set up a machine or group of machines to POP the e-mail messages from the respective servers using Outlook Express. OE keeps the headers intact for later reference or reuse in our controlled/static test; yes, once the live testing is over, we have developed a methodology for mass "re-testing" under a controlled environment. Once the messages are in OE the hard part starts: sorting out the missed spam, the canned grey mail and, dare we say it, the false positives. This is the most labour- and resource-intensive part of the testing. I couldn't bear to imagine how many hours or days it would take to go through twenty-two servers' results every week for two or three months.
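Purely as an illustration of the retrieval step (the lab used Outlook Express, not a script), this sketch pulls messages from one test server over POP3 and tallies how many the filter flagged, assuming a hypothetical "X-Spam-Flag: YES" header; the real marker differs from product to product:

```python
# Sketch only. Pull messages from one test server over POP3 and count how many
# the filter flagged, assuming a hypothetical "X-Spam-Flag: YES" header.
import poplib
from email.parser import BytesParser
from email.policy import default

def tally_flags(host: str, user: str, password: str) -> dict:
    counts = {"flagged": 0, "not_flagged": 0}
    pop = poplib.POP3(host, 110, timeout=30)
    try:
        pop.user(user)
        pop.pass_(password)
        num_messages = len(pop.list()[1])
        for i in range(1, num_messages + 1):
            raw = b"\r\n".join(pop.retr(i)[1])  # retrieve; message stays on the server
            msg = BytesParser(policy=default).parsebytes(raw)
            flagged = msg.get("X-Spam-Flag", "").upper() == "YES"
            counts["flagged" if flagged else "not_flagged"] += 1
    finally:
        pop.quit()
    return counts
```

The manual part described above -- deciding which flagged messages are really spam and which are false positives -- still has to be done by hand.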
So resources like time, budget and labour are against us completing such a test for this review.
Recent private contract anti-spam testing performed by the lab.
We can, however, let you in on a few "generic" results derived from the various private anti-spam product testing contracts that we have completed in the past eighteen months. This spam testing experience has shown us that most of the applications we have privately tested, in an "out of the box" configuration, rate at around 65% to 70% spam catch accuracy with very low to zero false positives. As you may well appreciate, as with most things in life, no two e-mail environments are identical, so the anti-spam product vendors do have a very difficult time choosing default baseline settings that achieve the best spam catch rate with a low to zero false positive rate for the widest possible base of users. Really, administrators should expect a certain amount of fine tuning or tweaking to achieve higher catch rates and, depending on the industry, possibly lower false positives. Imagine applying an anti-spam filter if you were in the pharmaceutical industry or the porn industry! Both are legitimate and both naturally require e-mails to be sent and received containing information about their respective businesses.
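To put those percentages in concrete terms, here is a tiny worked example with made-up message counts (these are not figures from our testing):

```python
# Hypothetical numbers to illustrate the out-of-the-box figures quoted above;
# these are not results from the lab's testing.
spam_received = 10_000
spam_caught = 6_800            # ~68% catch rate falls in the 65-70% band
ham_received = 5_000
ham_flagged = 0                # "very low to zero" false positives

catch_rate = spam_caught / spam_received                 # 0.68 -> 68%
false_positive_rate = ham_flagged / ham_received         # 0.0  -> 0%
missed_spam = spam_received - spam_caught                # 3,200 messages still delivered

print(f"catch rate {catch_rate:.0%}, false positives {false_positive_rate:.0%}, missed {missed_spam}")
```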
A false positive, as its name suggests, is a legitimate e-mail message (ham) that a recipient should receive but that gets filtered incorrectly and flagged as spam. This is particularly bad if the filter has been set to drop all messages determined to be spam, as it means the recipient would potentially have no idea that the message was ever sent. To add another level of complexity to the mix, there are also messages which fall into the area between ham and spam. These are generally classed as "grey" mail: the newsletters, circulars etc. that some recipients may subscribe to and need to receive. They share many common characteristics with spam messages and are very hard to filter correctly. So at the end of the day there are three bodies of messages to be concerned with: spam, ham and grey. In a perfect world all spam would be dropped, and all ham and grey mail would be delivered.
With a few weeks of concerted tweaking and testing by the mail administrator, for most of the applications that we have privately tested we have managed to increase the spam catch rate to somewhere between 85% and 92% while still maintaining a zero false positive rate. Grey mail, on the other hand, is a very different beast; the best way we have found to deal with the sensitivity of the filters in respect of these messages is to include them explicitly on a white list -- more on white and black lists later."
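For what it's worth, the whitelist-before-scoring idea looks roughly like this; the sender addresses and the scoring threshold are placeholders, not taken from any product we tested:

```python
# Illustrative only -- a generic whitelist-before-scoring approach, not taken
# from any product in the review. Addresses and the scorer are placeholders.
WHITELIST = {"newsletter@example.org", "circular@example.net"}  # known-good grey-mail senders

def classify(sender: str, spam_score: float, threshold: float = 5.0) -> str:
    """Deliver whitelisted grey mail outright; otherwise fall back to the score."""
    if sender.lower() in WHITELIST:
        return "deliver"  # grey mail explicitly whitelisted, never filtered
    return "quarantine" if spam_score >= threshold else "deliver"

print(classify("newsletter@example.org", 7.2))  # deliver (whitelisted despite a high score)
print(classify("unknown@spammy.example", 7.2))  # quarantine
```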