Spam Filters Explained
What do they do? How do they work? Which one is right for me?
By Alan Hearnshaw
Spam is a very real problem that many people have to deal with
on a daily basis. For those who have decided to do something
about it and are starting to investigate spam filtering, this
article provides a brief introduction to the types of spam
filters available.
Despite the bewildering array of spam filters available today,
each claiming to be the best “of its kind”, there are really just
five filtering methodologies in general use, and all products
rely on one, or a combination, of these:
Content-Based Filters
“In the beginning, there were content-based filters.”
These filters scan the contents of the message and look for
tell-tale signs that it is spam. In the early days of spamming it
was quite simple to look out for “kill words” such as “Lose
Weight” and mark a message as spam if one was found.
Very soon, though, spammers got wise to this and started
resorting to all kinds of tricks to get their messages past the
filters. The days of “obfuscation” had begun. We started getting
messages containing the phrase “L0se Welght” (notice the zero
for “o” and the “l” for “i”) and even more bizarre, and sometimes
quite ingenious, variations. This rendered basic content-based
filters somewhat ineffective, although there are one or two on
the market now that are clever enough to “see through” these
attempts and still provide good results.
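A minimal sketch of a kill-word filter that tries to “see through” simple look-alike substitutions is shown below. The kill-word list and the look-alike table are illustrative assumptions, not taken from any real product. The idea is to fold look-alike characters onto one canonical character and apply the same folding to both the kill words and the message: because “l” and “i” are folded together, both “Lose” and “L0se” become “iose”, which is harmless since only normalised strings are ever compared.

```python
import re

# Hypothetical look-alike table: each character is folded onto the letter it imitates.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "l": "i", "|": "i",
                            "3": "e", "@": "a", "$": "s"})

# Illustrative kill words only.
KILL_WORDS = ["lose weight", "free money"]


def normalise(text: str) -> str:
    """Lower-case, fold look-alikes, and reduce everything else to single spaces."""
    folded = text.lower().translate(LOOKALIKES)
    return re.sub(r"[^a-z]+", " ", folded).strip()


def is_spam(message: str) -> bool:
    """Return True if any normalised kill word appears as whole words in the message."""
    # Padding with spaces makes the match respect word boundaries,
    # so "close weighting" does not trigger "lose weight".
    padded = f" {normalise(message)} "
    return any(f" {normalise(word)} " in padded for word in KILL_WORDS)
```

With this folding, “L0se Welght fast!” is flagged, while an ordinary message such as “Meeting moved to Tuesday” is not.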
Bayesian-Based Filters
“The Reverend Bayes comes to the rescue”
Born at the start of the eighteenth century, the son of a London
minister, Thomas Bayes developed a formula, now known as Bayes’
theorem, for calculating the probability of an event in the light
of evidence related to it: an initial estimate is revised as each
new piece of evidence is taken into account.
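Applied to spam filtering, the formula can be written as follows for a single word w found in a message, where S means “the message is spam”. This is the standard textbook form, shown for illustration; the worked figures are made up: a word that appears in 20% of spam and 1% of good mail, with spam and good mail assumed equally likely.

```latex
P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid \neg S)\,P(\neg S)}
\qquad
\text{e.g.}\quad \frac{0.2 \times 0.5}{0.2 \times 0.5 + 0.01 \times 0.5} = \frac{0.1}{0.105} \approx 0.95
```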
Bayesian filters “learn” by studying known good and bad
messages. Each message is split into individual words, or
“tokens”, and these tokens are stored in a database along with
how often each one appears in each kind of message. When a new
message arrives to be tested, it is also split into tokens and
each token is looked up in the database. By combining these
per-token statistics using a simplified form of the good
reverend’s formula, known as a “Naive Bayesian” formula (“naive”
because it treats each token as independent of the others), the
filter gives the message a “spamicity” rating, and the message can
then be dealt with accordingly.
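The train-then-score cycle described above can be sketched in Python as a small multinomial naive Bayes classifier. This is one common variant, not the method of any particular product: the tokeniser, the add-one (Laplace) smoothing and the prior taken from the training counts are all assumptions chosen for simplicity. Scores are added up as logarithms so that long messages do not underflow to zero.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    # Assumed tokeniser: lower-cased runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam filter (illustrative sketch)."""

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # label is "spam" or "ham" (good mail); record token frequencies per class.
        self.token_counts[label].update(tokenise(text))
        self.message_counts[label] += 1

    def spamicity(self, text: str) -> float:
        """Return an estimate of P(spam | text) between 0 and 1."""
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Prior: share of training messages with this label (add-one smoothed).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for token in tokenise(text):
                # Add-one smoothing (plus one slot for unseen tokens) keeps every
                # per-token probability above zero.
                score += math.log((counts[token] + 1) / (total_tokens + len(vocab) + 1))
            log_scores[label] = score
        # P(spam) = e^s / (e^s + e^h) = 1 / (1 + e^(h - s)); clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_scores["ham"] - log_scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))
```

In use, the filter is trained on a handful of labelled messages with `train()` and new mail is scored with `spamicity()`; a threshold such as 0.5, or a stricter one, then decides how the message is handled.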
Bayesian filters typically are capable of achieving very good
accuracy rates (>97% is not uncommon), and require very little
on-going maintenance.
Whitelist/Blacklist Filters
“Who goes there, friend or foe?”
This very basic form of filtering is seldom used on its own
nowadays, but can be useful as part of a larger filtering
strategy.
A “whitelist” is nothing more than a list of e-mail addresses
from which you wish to accept communications. A whitelist filter
would only accept messages from these people, and all others
would be rejected.
A “blacklist”, conversely, is a list of e-mail addresses, and
sometimes IP addresses (the numeric addresses that identify
computers on the internet), from which communications will not be
accepted.
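Both list types can be sketched in a few lines of Python. The addresses and IP below are hypothetical placeholders (taken from the reserved example domains and documentation IP range), and comparing addresses case-insensitively as exact strings is a simplifying assumption.

```python
# Hypothetical example lists; in practice each user or administrator maintains these.
WHITELIST = {"alice@example.com", "bob@example.org"}
BLACKLIST_ADDRESSES = {"offers@spammer.example"}
BLACKLIST_IPS = {"203.0.113.7"}


def normalise_address(address: str) -> str:
    # Simplifying assumption: compare addresses case-insensitively, ignoring spaces.
    return address.strip().lower()


def whitelist_accepts(sender: str) -> bool:
    """Whitelist filter: accept only senders on the list."""
    return normalise_address(sender) in WHITELIST


def blacklist_accepts(sender: str, sender_ip: str) -> bool:
    """Blacklist filter: accept everyone except listed addresses or IPs."""
    return (normalise_address(sender) not in BLACKLIST_ADDRESSES
            and sender_ip not in BLACKLIST_IPS)
```

The two functions make the trade-off plain: the whitelist rejects anyone it has not heard of, while the blacklist only stops senders it already knows about.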
While this may seem like a good idea at the outset, a whitelist
is too restrictive for most people. Blacklisting individual
addresses, meanwhile, achieves little: virtually all spam e-mails
carry a forged “from” address, so there is little point in
collecting that address to ban it in future, as it is very
unlikely to be the same next time.
There are bodies on the internet that maintain lists of known
“bad” sources of e-mail. Many filters today can query these lists
to see whether the message they are looking at comes from a source
identified by such an Internet-based blacklist, or RBL (Realtime
Blackhole List). While quite effective, RBLs do tend to suffer
from “false positives”, where good messages are incorrectly
identified as spam. This happens often with newsletters.
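Most RBLs are queried over DNS. The usual convention (documented in RFC 5782) is to reverse the octets of the sender’s IPv4 address, append the list’s zone name and look the result up: an answer, typically in 127.0.0.0/8, means “listed”, while “no such name” means “not listed”. The sketch below assumes that convention; `dnsbl.example` is a placeholder zone rather than a real list, and a real deployment would use the zone and usage terms published by its chosen list operator.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS blacklist.

    The octets are reversed and the zone appended,
    e.g. 192.0.2.99 -> 99.2.0.192.<zone>.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL `zone` lists `ip` (requires network access)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record means "not listed". Note that this sketch also treats
        # resolver or network failures as "not listed".
        return False
    # Listed addresses conventionally resolve to an address in 127.0.0.0/8.
    return answer.startswith("127.")
```

Failing open on lookup errors, as here, avoids losing good mail when the list is unreachable. The price is that some spam gets through during an outage.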
Challenge/Response Filters
“Open sesame!”
Challenge/response filters are characterised by their ability to
automatically send a reply to a previously unknown sender, asking
them to take some further action before their message will be
delivered. This is often referred to as a “Turing test”, after
the test devised by British mathematician Alan Turing to determine
whether machines could “think”.
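The challenge/response flow can be sketched as a small Python class. Everything here is a hypothetical design for illustration, including the class and method names. Mail from unknown senders is held against a random token, and the challenge is only printed rather than e-mailed. A valid response releases the held message and adds the sender to the known list, so their later mail skips the challenge.

```python
import secrets
from typing import Dict, Optional, Set, Tuple


class ChallengeResponseFilter:
    """Hypothetical challenge/response filter, for illustration only."""

    def __init__(self, known_senders: Set[str]) -> None:
        self.known = {s.lower() for s in known_senders}
        # token -> (sender, held message)
        self.pending: Dict[str, Tuple[str, str]] = {}

    def receive(self, sender: str, message: str) -> Optional[str]:
        """Deliver mail from known senders; hold and challenge everyone else."""
        sender = sender.lower()
        if sender in self.known:
            return message
        token = secrets.token_urlsafe(16)  # hard-to-guess code for this message
        self.pending[token] = (sender, message)
        self.send_challenge(sender, token)
        return None  # held until the sender responds

    def send_challenge(self, sender: str, token: str) -> None:
        # Placeholder: a real system would e-mail the sender a link or question
        # carrying this token; here it is simply printed.
        print(f"Challenge sent to {sender}: reply with code {token}")

    def respond(self, token: str) -> Optional[str]:
        """Release a held message when its sender answers the challenge."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None  # unknown or already-used token
        sender, message = entry
        self.known.add(sender)  # later mail from this sender is delivered directly
        return message
```

Spammers rarely answer challenges, because their “from” addresses are usually forged, so messages held this way typically just expire.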
Recent years have seen the appearance of internet services which
perform this challenge/response function automatically on the
user’s behalf, requiring the sender of an e-mail to visit their
web site before the message is delivered.
Critics of this system claim it to be too drastic a measure and
that it sends a message that "my time is more important than
yours" to the people trying to communicate with you.
For some low-traffic e-mail users, though, this system alone may
be a perfectly acceptable way of completely eliminating spam from
their inbox, one step above the “whitelist” system outlined above.
Community Filters
“A united front”
These filters work on the principle of “communal knowledge” of
spam. When a user receives a spam message, they simply mark it as
such in their filter. This information is sent to a central
server, where a “fingerprint” of the message is stored. Once
enough people have “voted” the message as spam, it is stopped from
reaching everyone else in the community.
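A central vote-counting server can be sketched as below. A SHA-256 hash of the normalised message body stands in for the “fingerprint”, and the threshold of 3 votes is an arbitrary placeholder. Deployed systems typically use fuzzier fingerprints so that small per-recipient changes do not defeat the match. Storing the set of voters, rather than a bare count, stops one person from flagging a message on their own.

```python
import hashlib
import re
from collections import defaultdict
from typing import Dict, Set

VOTE_THRESHOLD = 3  # hypothetical: distinct votes needed before blocking


def fingerprint(body: str) -> str:
    """Hash the message body after collapsing whitespace and case."""
    normalised = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class CommunityServer:
    """Central server that counts spam votes per message fingerprint."""

    def __init__(self, threshold: int = VOTE_THRESHOLD) -> None:
        self.threshold = threshold
        self.voters: Dict[str, Set[str]] = defaultdict(set)

    def report_spam(self, user: str, body: str) -> None:
        # A set of user IDs means each user is counted at most once per message.
        self.voters[fingerprint(body)].add(user)

    def is_blocked(self, body: str) -> bool:
        return len(self.voters.get(fingerprint(body), set())) >= self.threshold
```

The threshold is the trade-off described in the text. A lower value blocks spam sooner but lets a few mistaken votes cause false positives.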
This type of filtering can prove to be quite effective, although
it stands to reason that it can never be 100% effective, as a few
people have to receive the spam for it to be “flagged” in the
first place. Like its close cousin, the Internet blacklist (RBL),
this system can also suffer from “false positives”: messages
incorrectly identified as spam.
Hopefully you are now armed with a little more information to be
able to make an informed decision on the best spam filter for
you. For further information, consider reading the reviews and
articles found at http://www.whichspamfilter.com
About the author:
Alan Hearnshaw is a computer programmer and the owner of
http://www.whichspamfilter.com, a web site which conducts weekly
in-depth reviews of current spam filters, provides help and
guidance in the fight against spam and hosts a community forum.
alan@whichspamfilter.com