Training Naive-Bayes Model for Spam Detection

Training Rspamd Naive-Bayes

Assuming your spam and ham sample folders are on the host system, copy them into the rspamd container:

docker cp spam/ rspamd:/tmp/spam
docker cp ham/ rspamd:/tmp/ham

Install the rspamc client inside the container:

docker exec -it rspamd apk add rspamd-client

Check the current Bayes statistics:

docker exec -it rspamd rspamc -P tuxguard stat

Look for the line total learns: to see how many samples have been processed so far.
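For scripted checks, the count can be pulled out of the stat output instead of read by eye. The following is a minimal sketch, not part of Rspamd: total_learns is a hypothetical helper, and it assumes the line has the form "Total learns: <number>" (matched case-insensitively).

```shell
# Hypothetical helper: print the number from the "total learns:" line
# of `rspamc stat` output supplied on stdin.
# Assumption: the line looks like "Total learns: <number>".
total_learns() {
  awk -F': *' 'tolower($1) ~ /total learns$/ { print $2; exit }'
}

# Example call (no -t, because the output is piped rather than shown on a terminal):
#   docker exec rspamd rspamc -P tuxguard stat | total_learns
```

Saving the value before and after a training run makes it easy to compare the two numbers.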

Note that you need at least 200 ham samples before the Bayes model is enabled.
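Because the model stays disabled below that threshold, it can be worth counting the ham files on the host before training. A minimal sketch, assuming the samples are .eml files sitting directly in the folder; count_eml, check_ham and MIN_HAM are hypothetical names introduced here:

```shell
# Assumed threshold, taken from the note above; override via the environment.
MIN_HAM=${MIN_HAM:-200}

# Count *.eml files directly inside a directory (no recursion).
count_eml() {
  find "$1" -maxdepth 1 -type f -name '*.eml' | wc -l | tr -d ' '
}

# Report whether a ham directory holds at least MIN_HAM samples.
check_ham() {
  n=$(count_eml "$1")
  if [ "$n" -ge "$MIN_HAM" ]; then
    echo "ok: $n ham samples"
  else
    echo "short: $n of $MIN_HAM ham samples"
    return 1
  fi
}

# Example call:
#   check_ham ham/
```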

Train the model on both folders:

docker exec -it rspamd sh -c 'rspamc -P tuxguard learn_spam /tmp/spam/*.eml'
docker exec -it rspamd sh -c 'rspamc -P tuxguard learn_ham /tmp/ham/*.eml'
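One detail to watch with globs in docker exec commands: the host shell expands an unquoted pattern before docker runs, so the pattern only matches files inside the container when it is quoted and handed to a shell there. A minimal sketch of the difference, using a throwaway directory in place of /tmp/spam and a hypothetical count_args helper (it assumes the temporary path contains no spaces):

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

demo=$(mktemp -d)                  # throwaway directory standing in for /tmp/spam
touch "$demo/a.eml" "$demo/b.eml"

# Unquoted: the current shell expands the glob before the command runs.
unquoted=$(count_args "$demo"/*.eml)

# Quoted: the pattern is passed through as one literal argument.
quoted=$(count_args "$demo/*.eml")

# Quoted and handed to a second shell: that shell performs the expansion.
inner=$(sh -c 'set -- $1; echo "$#"' sh "$demo/*.eml")

echo "unquoted=$unquoted quoted=$quoted inner=$inner"   # prints unquoted=2 quoted=1 inner=2
rm -rf "$demo"
```

With docker exec, the third form corresponds to wrapping the rspamc call in sh -c '...' so that the container's shell resolves the pattern against the container's filesystem.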

Run stat again; the total learns count should now be higher than before:

docker exec -it rspamd rspamc -P tuxguard stat

If you don’t yet have enough of your own spam samples, you can bootstrap training using a public dataset, e.g. https://untroubled.org/spam/:

cd /tmp
curl -o 2022.7z https://untroubled.org/spam/2022.7z

7za x 2022.7z

docker cp 2022/. rspamd:/tmp/spam

docker exec -it rspamd sh -c 'rspamc -P tuxguard learn_spam /tmp/spam/*'

After this, run rspamc stat again and confirm that the number of learned samples has gone up.

Once the model is active, Rspamd reports the Bayes symbol in the log: