DirectAdmin Rspamd Bayes Is Not Learning: How to Train Spam Filtering and Read the Right Logs

If Rspamd Bayes is not learning on your DirectAdmin server, the usual cause is simple: Bayes has not seen enough clean mail and spam yet, or you never wired DirectAdmin and Dovecot to feed training data into Rspamd in the first place. The giveaway is a log line like: “The ham class needs more training samples. Currently: 0; minimum 200 required.” DirectAdmin’s mail stack can scan mail just fine while Bayes contributes nothing at all. That is why you can see spam scoring, SPF checks, DKIM checks, and rule hits, but never see BAYES_HAM or BAYES_SPAM affect the final result.

In our experience managing production mail systems, this is one of the most common false assumptions in panel-based hosting. Admins expect Bayes to “learn automatically” because spam filtering is enabled. It does not work that way. Rspamd needs a Redis-backed statistics store, enough balanced ham and spam samples, and a reliable training path such as manual rspamc learning, WebUI uploads, or Dovecot IMAPSieve actions when users move mail into and out of Junk. Rspamd’s current documentation still shows min_learns = 200 for both ham and spam, and Redis is explicitly required for Bayesian learning and statistics storage. A recent DirectAdmin forum thread reporting this exact error confirms the same root cause. (See: Rspamd statistics settings; Rspamd first setup; DirectAdmin forum thread.)

What is actually broken

Usually, nothing is broken at the scanner layer. Rspamd is scanning messages. Exim is handing mail to it. DirectAdmin is exposing the familiar spam settings in the panel. The broken part is the learning loop.

Messages are scanned, but Bayes never activates because the classifier is below its minimum training threshold.
Redis is missing, misconfigured, or unreachable, so statistics never persist.
You followed an older SpamAssassin-focused DirectAdmin guide that calls sa-learn, which does not apply to Rspamd.
Users move mail between Inbox and Junk, but no IMAPSieve or equivalent training hook sends those messages to rspamc learn_spam or rspamc learn_ham.
The ham and spam sample counts are badly unbalanced, so autolearn is skipped by balance checks.

That last point matters more than most admins think. A mailbox with hundreds of obvious spam samples and almost no real ham will teach Rspamd the wrong lesson. On production systems, we see this after rushed migrations. Clients import old Junk folders, but no one imports sent mail, inbox mail, or transactional mail as ham. Then Bayes becomes either inactive or biased.

The minimum facts you need before touching anything

Rspamd’s current statistics documentation shows a standard Bayes classifier with backend = "redis", min_tokens = 11, and min_learns = 200. It also documents autolearn thresholds and balance controls such as check_balance and min_balance. The official rspamc client still supports learn_spam, learn_ham, and stat. The Rspamd controller worker handles statistics and learning operations, which is why the WebUI can learn messages if the controller is configured correctly. (See: Rspamd statistics settings; rspamc man page; Rspamd controller worker.)

On the DirectAdmin side, the current docs confirm that per-user Rspamd config is written under /etc/rspamd/users.d/username.conf. They also document the main Rspamd log at /var/log/rspamd/rspamd.log and recommend tailing mail logs such as /var/log/exim/mainlog during live testing. DirectAdmin also ships Rspamd configuration through CustomBuild, and newer versions support custom overrides under the CustomBuild Rspamd path instead of hacking upstream files directly. (See: DirectAdmin incoming spam docs; DirectAdmin directories and locations; DirectAdmin changelog 1.646.)

The first logs to read

Do not start by staring at the GUI. Start with logs and counters.

tail -f /var/log/rspamd/rspamd.log /var/log/exim/mainlog

rspamc stat

journalctl -u rspamd -n 100 --no-pager

What you are looking for:

Lines saying ham or spam needs more training samples
Redis connection failures
Controller access or permission errors during learning
Messages being scanned with normal symbols but no Bayes symbols
Repeated autolearn skips caused by balance or duplicate-learning checks

A healthy rspamc stat output should show Bayes-related counters and learned message totals. If it does not, stop there. Bayes cannot classify what it has never learned. On a shared hosting server, that distinction matters because the UI can make the whole setup look complete even when the statistical backend is empty.
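The counter check described above can be scripted so it gives a yes/no answer instead of a wall of text. This is a minimal sketch, assuming rspamc stat prints one line per class in the form "Statfile: SYMBOL ...; learned: N; ...". That layout can differ between Rspamd versions, so compare it with your own output first. The function names bayes_learned and bayes_ready are hypothetical helpers, not part of Rspamd.

```shell
#!/bin/sh
# Sketch: read `rspamc stat` output on stdin and print the learned count
# for one Bayes statfile symbol (for example BAYES_HAM).
# Assumption: the relevant lines look like
#   Statfile: BAYES_HAM type: redis; ...; learned: 42; users: 1; ...
bayes_learned() {
    awk -v sym="$1" '
        $1 == "Statfile:" && $2 == sym {
            for (i = 1; i < NF; i++) {
                if ($i == "learned:") {
                    v = $(i + 1)
                    sub(/;$/, "", v)   # strip the trailing semicolon
                    print v
                    exit
                }
            }
        }'
}

# Succeed only if both classes have reached the minimum
# (200 by default, matching min_learns in the article).
# Takes a file holding saved `rspamc stat` output.
bayes_ready() {
    stat_file=$1
    min=${2:-200}
    ham=$(bayes_learned BAYES_HAM < "$stat_file")
    spam=$(bayes_learned BAYES_SPAM < "$stat_file")
    echo "ham=${ham:-0} spam=${spam:-0} min=$min"
    [ "${ham:-0}" -ge "$min" ] && [ "${spam:-0}" -ge "$min" ]
}
```

To use it, save the output with rspamc stat > /tmp/stat.txt and run bayes_ready /tmp/stat.txt. A non-zero exit status means at least one class is still below the threshold.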

Step 1: confirm Redis is available

Rspamd’s own setup guide is explicit: Redis is required for statistics and Bayesian learning. Without Redis, static checks like SPF, DKIM, DMARC, and RBL lookups still work, so the filter appears alive while Bayes remains dead. That fools a lot of admins. (See: Rspamd first setup.)

redis-cli ping
ss -tlnp | grep 6379
journalctl -u rspamd | grep -i redis

You want to see PONG, a listening Redis socket, and no repeated connection errors in the Rspamd journal. If Redis is down or listening on a different address than the one configured in /etc/rspamd/local.d/redis.conf, Bayes learning will never persist.
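A plain redis-cli ping only proves that some Redis answers on the default address. The sketch below pings the address Rspamd is actually configured to use. It assumes redis.conf contains a simple servers = "host:port"; line. UNIX socket paths, IPv6 literals, and separate read_servers or write_servers entries are not handled. redis_target and ping_configured_redis are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: print the first "servers" value from an Rspamd redis.conf as
# "host port". Assumption: a plain  servers = "host:port";  line.
redis_target() {
    conf=${1:-/etc/rspamd/local.d/redis.conf}
    target=$(sed -n 's/^[[:space:]]*servers[[:space:]]*=[[:space:]]*"\([^",]*\)[",].*/\1/p' "$conf" | head -n 1)
    [ -n "$target" ] || return 1
    case $target in
        /*)  echo "UNIX socket configured: $target" >&2; return 1 ;;
        *:*) echo "${target%%:*} ${target##*:}" ;;
        *)   echo "$target 6379" ;;   # Redis default port
    esac
}

# Ping the Redis instance named in the config file,
# rather than whatever redis-cli reaches by default.
ping_configured_redis() {
    set -- $(redis_target "$1")
    [ $# -eq 2 ] || return 1
    redis-cli -h "$1" -p "$2" ping
}
```

Running ping_configured_redis with no argument reads /etc/rspamd/local.d/redis.conf. If it fails while a bare redis-cli ping returns PONG, the configured address and the listening address probably differ.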

Step 2: verify the Bayes classifier config

Check the active statistics config. Depending on distribution and packaging, the classifier can live in a file such as /etc/rspamd/statistic.conf, /etc/rspamd/local.d/statistic.conf, or a packaged include. The exact path matters less than the active values.

rspamadm configdump classifier | less

Look for these basics:

classifier "bayes" {
  tokenizer {
    name = "osb";
  }
  backend = "redis";
  min_tokens = 11;
  min_learns = 200;

  statfile {
    symbol = "BAYES_HAM";
    spam = false;
  }
  statfile {
    symbol = "BAYES_SPAM";
    spam = true;
  }

  learn_condition = "return require('lua_bayes_learn').can_learn";
}

If backend is not Redis, or the classifier is missing entirely, fix that before doing any training. If you are running DirectAdmin, keep your changes in the supported customization path where possible instead of editing generated files in place. DirectAdmin’s newer CustomBuild layout explicitly supports Rspamd customizations under the CustomBuild Rspamd directory. (See: Rspamd statistics settings; DirectAdmin changelog 1.646.)
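Reading a long config dump by eye is error-prone, so the two values that matter can be pulled out mechanically. This is a rough sketch, assuming the dump contains plain UCL lines such as backend = "redis"; and min_learns = 200;. The function name check_classifier_dump is hypothetical.

```shell
#!/bin/sh
# Sketch: inspect a classifier config dump read from stdin and report the
# backend and min_learns values. Succeeds only if the backend is redis.
# Assumption: values appear as   backend = "redis";   min_learns = 200;
check_classifier_dump() {
    dump=$(cat)
    backend=$(printf '%s\n' "$dump" |
        sed -n 's/^[[:space:]]*backend[[:space:]]*=[[:space:]]*"\{0,1\}\([A-Za-z0-9_]*\)"\{0,1\};.*/\1/p' |
        head -n 1)
    min_learns=$(printf '%s\n' "$dump" |
        sed -n 's/^[[:space:]]*min_learns[[:space:]]*=[[:space:]]*\([0-9][0-9]*\);.*/\1/p' |
        head -n 1)
    echo "backend=${backend:-missing} min_learns=${min_learns:-missing}"
    [ "$backend" = "redis" ]
}
```

Pipe the output of the configdump command into check_classifier_dump. A "missing" value usually means the classifier block is absent from the active config, or that the dump uses a layout this sketch does not expect.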

Step 3: train manually before you automate

Do not jump straight into IMAPSieve debugging. First prove that learning works manually. The official client supports exactly the commands you need:

rspamc learn_spam /root/samples/spam-001.eml
rspamc learn_ham /root/samples/ham-001.eml
rspamc stat

Use raw RFC822 message files, not copied text from a mail client preview window. That mistake is common. If the message body is mangled, incomplete, or stripped of headers, your training quality drops immediately. In production, we usually export real messages from Maildir or obtain them from the WebUI scan and learn view.
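A quick filter can catch pasted previews and header-less exports before they reach the classifier. The sketch below is a heuristic only, not a parser. It assumes a usable sample has a From: field plus a Date: or Received: field before the first blank line. The function name looks_like_raw_message is hypothetical.

```shell
#!/bin/sh
# Sketch: rough sanity check that a file is a raw RFC 822 style message.
# Only the header block (everything before the first empty line) is
# inspected, and only for a From: field and a Date: or Received: field.
looks_like_raw_message() {
    awk '
        /^\r?$/ { exit }                               # blank line ends the headers
        tolower($0) ~ /^from:/            { from = 1 }
        tolower($0) ~ /^(date|received):/ { trace = 1 }
        END { exit !(from && trace) }                  # status 0 only if both were seen
    ' "$1"
}
```

For example, looks_like_raw_message /root/samples/spam-001.eml || echo "skip: not a raw message" flags a suspicious file before it is passed to rspamc.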

For bulk training:

find /root/spam/ -type f -print0 | xargs -0 -I {} rspamc learn_spam "{}"
find /root/ham/  -type f -print0 | xargs -0 -I {} rspamc learn_ham "{}"

rspamc stat | grep -A5 BAYES

Rspamd’s current guidance is still to train with at least 200 spam and 200 ham messages. That is the minimum to begin classification, not a magic number for good accuracy. In our experience, 500 to 1,000 clean samples per class produces more stable results on busy multi-domain hosting nodes. Small sample sets work, but they are noisy. (See: Rspamd first setup.)
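Before bulk training, it helps to know how many samples each class really has, because the smaller class sets the limit for a balanced run. The sketch below counts files in two sample directories and prints the smaller count. The helper names are hypothetical, and file names containing newlines would be miscounted.

```shell
#!/bin/sh
# Sketch: count regular files under a sample directory (recursively).
count_samples() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print how many messages per class can be trained while keeping both
# classes the same size, which is the smaller of the two counts.
balanced_batch_size() {
    ham=$(count_samples "$1")
    spam=$(count_samples "$2")
    if [ "$ham" -lt "$spam" ]; then
        echo "$ham"
    else
        echo "$spam"
    fi
}
```

If balanced_batch_size /root/ham /root/spam prints less than 200, collect more samples of the smaller class before expecting Bayes to activate.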

Step 4: wire DirectAdmin and Dovecot to feed Bayes continuously

This is where many DirectAdmin deployments fall apart. The official DirectAdmin page for “Automatically marking moved mail to Junk as spam” is written around SpamAssassin-era hooks and refers to Pigeonhole and Sieve. That concept is still correct, but for Rspamd you need the same mailbox movement logic to call rspamc, not sa-learn. That mismatch is exactly what recent forum posts were complaining about. (See: DirectAdmin incoming spam docs; DirectAdmin forum thread.)

The DirectAdmin docs still show the important IMAPSieve trigger points:

# From elsewhere to Junk folder
imapsieve_mailbox1_name = Junk
imapsieve_mailbox1_causes = COPY
imapsieve_mailbox1_before = file:/usr/local/bin/dovecot-sieve/report-spam.sieve

# From Junk folder to elsewhere
imapsieve_mailbox2_name = *
imapsieve_mailbox2_from = Junk
imapsieve_mailbox2_causes = COPY
imapsieve_mailbox2_before = file:/usr/local/bin/dovecot-sieve/report-ham.sieve

For Rspamd, your Sieve pipeline or wrapper script must extract the message and call the right learning action. A common pattern is a shell wrapper that reads the message from stdin and hands it to rspamc. The exact script varies by distro and Dovecot version, especially on newer Dovecot 2.4 setups, but the principle stays the same. Recent DirectAdmin forum guidance around Dovecot 2.4 and IMAPSieve reflects this newer approach. (See: DirectAdmin Dovecot 2.4 Rspamd learning guide.)

#!/bin/sh
# /usr/local/bin/rspamd-learn-spam.sh
/usr/bin/rspamc learn_spam

#!/bin/sh
# /usr/local/bin/rspamd-learn-ham.sh
/usr/bin/rspamc learn_ham

That looks simple, but the edge cases are not. The Dovecot service user must be allowed to call the command and access the message source safely. The controller privileges must be right if your environment requires them. And if the same message has already been learned, you may see skips that look like failures. The first thing we check when a client says “training does nothing” is whether the script runs at all, then whether the message is a real raw message, then whether Redis counters move.
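To answer "does the script run at all?" without guessing, the wrapper can record each invocation and the rspamc exit status. The following is one possible sketch, not DirectAdmin's shipped script. RSPAMC and LEARN_LOG are illustrative variables, learn_from_stdin is a hypothetical function name, and the log path must be writable by whichever user Dovecot runs the hook as.

```shell
#!/bin/sh
# Sketch of one wrapper body that serves both classes.
# RSPAMC and LEARN_LOG are illustrative variables, not DirectAdmin settings.
RSPAMC=${RSPAMC:-/usr/bin/rspamc}
LEARN_LOG=${LEARN_LOG:-/dev/null}

learn_from_stdin() {
    case $1 in
        spam|ham) action="learn_$1" ;;
        *) echo "usage: learn_from_stdin spam|ham" >&2; return 64 ;;
    esac
    # rspamc reads the message from stdin when no file argument is given.
    out=$("$RSPAMC" "$action" 2>&1)
    rc=$?
    # Record the attempt so a silent hook can be told apart from a failing one.
    printf '%s %s rc=%s %s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" "$action" "$rc" "$out" >> "$LEARN_LOG"
    return "$rc"
}
```

Saved as a script, the file would end with a call such as learn_from_stdin "$1", and the Sieve side would pass spam or ham as the argument. An empty log after moving a message means the hook never fired. A log entry with a non-zero rc points at rspamc, the controller, or Redis instead.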

How to read the logs without guessing

Read these logs in this order during a live test where you move one message to Junk and one back to Inbox:

/var/log/rspamd/rspamd.log for learning and classification events
/var/log/exim/mainlog for delivery flow and spam headers on the message path
journalctl -u rspamd for service-level errors such as Redis failures or config problems
Dovecot logs through journald or your distro mail log if IMAPSieve hooks are not firing

Typical useful messages include:

The ham class needs more training samples. Currently: 0; minimum 200 required
Redis timeout or connection refused messages
Config syntax errors after edits
Duplicate-learn or already-in-class skips

Rspamd logging supports log levels from error through debug, and each task includes a unique tag you can follow across entries. That is extremely useful when tracking one test message through the pipeline. Raise verbosity only long enough to reproduce the issue. Leaving debug enabled on a busy node creates more noise than value. (See: Rspamd logging settings.)
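Following one task tag by hand gets tedious on a busy log, so the lookup can be wrapped in two small functions. This sketch assumes the tag appears in each log line as "<tag>;" and that the task summary line contains "id: <message-id>". Both are assumptions about the log layout, which depends on your Rspamd version and logging configuration. tag_for_message_id and follow_task are hypothetical names.

```shell
#!/bin/sh
# Sketch: find the task tag that belongs to a Message-ID.
# Assumptions: the first "<...>;" on a line is the task tag, and the
# summary line for a message contains "id: <message-id>".
tag_for_message_id() {
    msgid=$1
    log=$2
    grep -F "id: <$msgid>" "$log" |
        sed -n 's/^[^<]*<\([0-9a-zA-Z]*\)>;.*/\1/p' | head -n 1
}

# Print every log line that carries the tag of the given message.
follow_task() {
    tag=$(tag_for_message_id "$1" "$2")
    [ -n "$tag" ] || { echo "no task found for $1" >&2; return 1; }
    grep -F "<$tag>;" "$2"
}
```

Take the Message-ID from the headers of your test message, then run follow_task with that ID and /var/log/rspamd/rspamd.log as arguments to see only the entries for that one message.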

Common reasons Bayes still does not kick in

You only trained spam. Bayes needs both classes. A spam-heavy ratio can block or distort learning.
You trained the wrong files. Webmail exports, copied snippets, or MIME-broken files produce weak results.
You are watching score output too early. Crossing 200 messages per class only enables classification. It does not guarantee strong signals yet.
You are changing generated DirectAdmin files directly. Rebuilds overwrite them. Use the supported custom paths.
You expect per-user learning, but only configured global Bayes. On shared hosting, that difference can matter if many unrelated domains share one classifier.

A production insight here: on a 50-mailbox shared hosting server, per-user Bayes sounds attractive, but it often underperforms if mailbox volumes are low. Global learning is usually more resilient unless you host very different mail populations that poison each other. That is why we evaluate volume and tenant mix before enabling per-user statistics on hosted mail environments.

When DIY stops being worth it

If you only run a few business mailboxes, you can solve this yourself. If you are managing many domains, multiple users, and recurring false positives, the mail stack becomes operational work, not a one-time setup. At that point, a managed platform with stable mail hosting and correct DNS, DKIM, SPF, and spam filtering defaults saves time.

For teams already using DirectAdmin-based hosting, ServerSpan’s web hosting plans and email hosting cover the infrastructure layer while leaving room for panel-based administration. The value is not that Bayes becomes magical. The value is that the boring parts such as mail stack maintenance, panel integration, and service health checks are handled before your users start dragging messages around to compensate for a broken filter.

A practical recovery plan

1. Run rspamc stat and confirm Bayes counters exist.
2. Verify Redis with redis-cli ping and Rspamd logs.
3. Dump the active classifier config with rspamadm configdump classifier.
4. Manually train 200 ham and 200 spam messages using raw message files.
5. Recheck rspamc stat.
6. Test one known ham and one known spam through live delivery.
7. Only then automate learning from Junk folder moves with Dovecot IMAPSieve or an equivalent wrapper.
8. Keep ham and spam learning balanced over time.

If you skip the manual training in step 4 of this plan and jump straight into mailbox-move automation, you will spend hours debugging Dovecot while the real problem is that Bayes has no usable dataset. Fix the classifier first, then the workflow.

For more mail operations context, see Email Deliverability Guide: Why Your Hosting Matters and Hosting a Business Email: Why Free Email Breaks at Scale.

Romanian version: Rspamd Bayes în DirectAdmin nu învață: cum antrenezi filtrarea de spam și cum citești logurile corecte

Source & Attribution

This article is based on original data belonging to serverspan.com blog. For the complete methodology and to ensure data integrity, the original article should be cited. The canonical source is available at: DirectAdmin Rspamd Bayes Is Not Learning: How to Train Spam Filtering and Read the Right Logs.

09 Mar 2026
