Bayes - XWall for Microsoft Exchange

Bayes FAQ

Where are the Bayes good-word and bad-word lists kept?

The lists are kept in bayes-g.dat and bayes-b.dat in the XWall directory.

There is no bayes-g.dat or bayes-b.dat in the XWall directory.

Restart XWall so that the files are written to disk.

Can I modify the good-word and bad-word lists to add or remove words?

You can't modify the lists, because XWall stores only a hash of each word and not the word itself.
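The effect of storing only hashes can be illustrated with a minimal sketch. XWall's actual hash function and file format are not documented here, so the use of SHA-1, the 8-character key, and the names `token_key` and `learn` are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter

def token_key(word: str) -> str:
    """Return a short digest for a word (SHA-1 is an assumption, not XWall's hash)."""
    return hashlib.sha1(word.lower().encode("utf-8")).hexdigest()[:8]

def learn(words, table: Counter) -> None:
    """Count each word under its hash; the word itself is never stored."""
    for w in words:
        table[token_key(w)] += 1

bad_words = Counter()
learn(["Cheap", "pills", "cheap"], bad_words)

# A known word can still be looked up by hashing it again...
print(bad_words[token_key("cheap")])
# ...but the stored keys cannot be turned back into readable words.
print(sorted(bad_words))
```

A list stored this way can be counted and looked up, but it offers nothing readable to edit by hand.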

Can I delete the good-word and bad-word lists and start over in learn mode?

Yes. Stop XWall, then delete bayes-g.dat and bayes-b.dat.

Do the good-word and bad-word list files get any larger or just remain at 10 KB?

The lists grow up to 100000 words, or up to the limit you defined in Options->Spam->Bayes.

Is there a pre-learned, ready-to-go word list available?

You can download a pre-learned bayes-g.dat and bayes-b.dat from here.

Stop XWall, extract bayes-g.dat and bayes-b.dat from bayes-dat.zip, and copy them into the XWall directory.

If you want more aggressive filtering, use only bayes-b.dat. If you want more relaxed filtering, use both bayes-g.dat and bayes-b.dat.

How do I know if the Bayes filter is working as expected?

For each message where the Bayes value is more than zero, XWall shows the calculated Bayes value in the "Bay:" line.

A sample looks like:

From: sales@xxxxx.com
To: someone@dataenter.co.at
Subj: The finest in Printed Erotic Art Reproductions
Att: [raw] message.eml
Size: 1 K
Bay: 94
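If you want to pull the value out of such a summary automatically, a small script can do it. The following is a minimal sketch that assumes the summary is available as plain text in exactly the layout shown above; the function name `bayes_value` is hypothetical and not part of XWall.

```python
import re

SAMPLE = """\
From: sales@xxxxx.com
To: someone@dataenter.co.at
Subj: The finest in Printed Erotic Art Reproductions
Att: [raw] message.eml
Size: 1 K
Bay: 94
"""

def bayes_value(summary: str):
    """Return the integer after 'Bay:', or None if the line is absent."""
    match = re.search(r"^Bay:\s*(\d+)\s*$", summary, flags=re.MULTILINE)
    return int(match.group(1)) if match else None

print(bayes_value(SAMPLE))  # 94
```

A result of `None` simply means there is no "Bay:" line, which is expected when the Bayes value is zero.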

Can you tell me which word in a message makes it a spam message?

No single word makes a message spam or not.

Bayes is a statistical approach. A high value means that the message in question contains many words that usually appear in spam messages and fewer words that usually appear in good messages.
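To see why one word alone does not decide the result, here is a minimal sketch of the combining rule from Paul Graham's "A Plan for Spam". Whether XWall uses exactly this formula is not stated here, and the per-word probabilities below are invented for illustration.

```python
from math import prod

def combine(word_probs):
    """Combine per-word spam probabilities with Graham's formula."""
    # Clamp to 0.01..0.99 so that no single word can force a result of 0 or 1.
    probs = [min(0.99, max(0.01, p)) for p in word_probs]
    spam = prod(probs)
    good = prod(1.0 - p for p in probs)
    return spam / (spam + good)

# Hypothetical per-word probabilities; both messages contain one very "spammy" word.
mostly_spammy = [0.99, 0.95, 0.90, 0.20, 0.60]
mostly_good = [0.99, 0.10, 0.05, 0.20, 0.30]

print(round(combine(mostly_spammy) * 100))
print(round(combine(mostly_good) * 100))
```

Both lists contain the same 0.99 word, yet only the first message ends up with a high value, because the remaining words pull the second one down.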

Why does the same message get a different Bayes value every time XWall processes it?

If Learn Mode is enabled, each message changes the statistics for the messages that follow. The same message can therefore get a higher or lower Bayes value the next time XWall processes it, rather than exactly the same value.
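The effect can be reproduced with a toy filter. This is a minimal sketch and not XWall's algorithm: the class `TinyBayes`, its averaging score, and the sample words are all assumptions chosen to keep the example short.

```python
from collections import Counter

class TinyBayes:
    """Toy word-count filter; not XWall's real algorithm or data format."""

    def __init__(self):
        self.good, self.bad = Counter(), Counter()

    def learn(self, words, is_spam):
        """Add the words of one message to the bad or good counts."""
        (self.bad if is_spam else self.good).update(words)

    def score(self, words):
        """Average per-word spam ratio over known words, scaled to 0..100."""
        ratios = []
        for w in words:
            g, b = self.good[w], self.bad[w]
            if g + b:
                ratios.append(b / (g + b))
        return round(100 * sum(ratios) / len(ratios)) if ratios else 0

f = TinyBayes()
f.learn(["offer", "meeting"], is_spam=False)
f.learn(["offer", "pills"], is_spam=True)

msg = ["offer", "pills", "meeting"]
first = f.score(msg)
f.learn(msg, is_spam=True)  # learn mode: the message itself updates the counts
second = f.score(msg)
print(first, second)
```

The second score differs from the first even though the message is identical, because learning from the message changed the word counts in between.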

This message is not spam, but the Bayes value indicates that it is. How can I avoid this?

If a legitimate e-mail looks like a spam mail from a statistical point of view, the only way is to exclude the sender's e-mail address in Options->Exclude->E-Mail Address->Inbound MAIL FROM.

How can I make sure that the Bayes filter is getting the best information during its learning mode?

Make sure XWall finds a lot of spam mail, either by SLS or by BCC.

Another good idea is to add the e-mail addresses of employees who have left the company to Options->Exclude->E-Mail Address->Inbound RCPT TO. Usually these addresses receive only spam, and XWall can use these messages to build up the bad-word list.

When should I turn off learning mode?

Leave learning mode enabled all the time, because this dynamically adjusts the word lists.

How do I know how many emails XWall has learned?

Dump the statistics into the log file. The log then shows how many messages have been learned.

A sample looks like:

Bayes message count (good/bad): 7970 / 9048
Bayes word count (good/bad): 51995 / 84818
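These two lines are easy to read with a script if you want to track the counts over time. The sketch below assumes the log text contains the lines in the form shown above; the function name `bayes_counts` is hypothetical, and real log lines may carry a prefix such as a timestamp, which the pattern tolerates.

```python
import re

LOG = """\
Bayes message count (good/bad): 7970 / 9048
Bayes word count (good/bad): 51995 / 84818
"""

PATTERN = re.compile(r"Bayes (message|word) count \(good/bad\):\s*(\d+)\s*/\s*(\d+)")

def bayes_counts(log_text):
    """Map 'message' and 'word' to (good, bad) tuples found in the log text."""
    return {kind: (int(good), int(bad))
            for kind, good, bad in PATTERN.findall(log_text)}

counts = bayes_counts(LOG)
good, bad = counts["message"]
print(good + bad)  # 17018 messages learned in total
```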

Do you have any additional documentation on how Bayes works?

Paul Graham's original method in A Plan For Spam

Gary Robinson's alternative method can be found here