Writing Fail2ban filters

Fail2ban is a popular tool for automatically banning malicious bots as they try to log into various services. It analyzes log files for things like web or ssh servers, finds repeated failed attempts to log in, and bans the originating IP addresses.

Fail2ban supports a number of programs out of the box: it ships with configuration files that define regular expressions for parsing the log files of particular pieces of software, identifying failed login attempts (or similar potentially malicious requests) and their originators.

You might, however, want to add support for software that Fail2ban does not support out of the box. The problem is that Fail2ban does not always have the best documentation around, as I discovered recently when I decided to write a simple filter that nevertheless required multi-line support. We will be using Fail2ban version 0.11.1.

The simple case

The simple case is one where each failed login attempt or malicious request (henceforth just "failed login") produces a single log line, complete with the address of the offender. For example, let us consider some fake logs of traffic from fake localhost addresses:

2020-08-01 12:06:09 WARN: Authentication failure for user bob (127.1.2.3)
2020-08-01 12:06:10 INFO: Authentication success for user alice (127.5.6.7)
2020-08-01 12:06:11 WARN: Authentication failure for user root (127.1.2.3)

An entirely made up log format showing two failed logins from the IP address 127.1.2.3

We want to give Fail2ban a regex which will match the lines signifying authentication failures, extracting the IP address of the originator. Fail2ban also needs to know how to parse the timestamps in the log. Given these pieces of information, Fail2ban will be able to figure out how many failed logins we have had in a given time period from a given originator, and will be able to ban those which exceed the allowed limits.
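The bookkeeping described above can be sketched in Python. This is only an illustration of the idea, not Fail2ban's implementation: the function name offenders is made up for this example, and its max_retry and find_time parameters are loosely named after the maxretry and findtime jail options.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def offenders(failures, max_retry, find_time):
    """Return addresses with at least max_retry failures within find_time.

    failures: iterable of (timestamp, address) pairs, sorted by timestamp.
    """
    recent = defaultdict(deque)  # address -> timestamps of recent failures
    banned = set()
    for when, address in failures:
        window = recent[address]
        window.append(when)
        # Drop failures that have fallen out of the time window.
        while when - window[0] > find_time:
            window.popleft()
        if len(window) >= max_retry:
            banned.add(address)
    return banned

# The two failed logins from the sample log above.
failures = [
    (datetime(2020, 8, 1, 12, 6, 9), "127.1.2.3"),
    (datetime(2020, 8, 1, 12, 6, 11), "127.1.2.3"),
]
print(offenders(failures, max_retry=2, find_time=timedelta(minutes=10)))
```

With the two failures from our sample log and a limit of two, 127.1.2.3 ends up in the returned set. Raising the limit to three, or shrinking the window to one second, leaves the set empty.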

Let's write a configuration file, then. We'll call it madeup.conf, after our made up daemon, now called madeup:

[Definition]
datepattern = ^%%Y-%%m-%%d %%H:%%M:%%S\s
failregex = ^WARN: Authentication failure for user \S+ \((?:<IP6>|<IP4>)\)$ 

madeup.conf, a Fail2ban configuration file for the log format of our made up daemon, madeup

Our configuration file includes both a datepattern and a failregex. datepattern is what Fail2ban uses to extract the date, appropriately enough. It is a blend of a regular expression and the Python format codes for dates, with the %s doubled up. Fail2ban is frequently able to identify dates even if we do not specify a datepattern explicitly, by trying out a number of common formats and seeing if any fit. For the sake of example, and since we made up the log format, we have set the datepattern ourselves.
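The doubled percent signs are easier to understand when seen in action. The sketch below assumes that Fail2ban's config reader treats % the way Python's standard configparser does, where %(name)s interpolates another option and %% stands for a literal %. It uses plain configparser and datetime.strptime as stand-ins, not Fail2ban's own date detector, and it leaves out the regex parts of the pattern (^ and \s), which strptime would not understand.

```python
import configparser
from datetime import datetime

# The date part of our datepattern, as it would be written in the config file.
config = configparser.ConfigParser()
config.read_string("""
[Definition]
datepattern = %%Y-%%m-%%d %%H:%%M:%%S
""")

# Reading the value collapses each %% into a single %.
pattern = config["Definition"]["datepattern"]
print(pattern)

# The result is an ordinary Python date format string.
stamp = datetime.strptime("2020-08-01 12:06:09", pattern)
print(stamp.isoformat())
```

A single % in the config file would instead be taken as the start of an interpolation, which is why the format codes have to be escaped.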

The second entry in our file is failregex. This is the regex that detects a failed login. Note that the string that this regex applies to is the log line without the date. This is why our datepattern had a \s at the end—the space that separates the last digit of the seconds from the WARN was removed, and thus our failregex can use a ^ anchor at the very start of the string.

Of interest in the failregex is the (?:<IP6>|<IP4>) group. The group itself is non-capturing, but the <IP6> and <IP4> tags inside it capture an IPv6 or IPv4 address, respectively. The captured address is what Fail2ban uses as the originator, so if there were some other IP address in the log line, we would not want to capture it with <IP6>|<IP4>. The angle bracket constructs available for capturing things are listed in the jail.conf(5) manual page (at least in the manual page from newer versions of Fail2ban, which does not appear easily available in a rendered format online).
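To see the anchoring at work outside Fail2ban, here is a rough Python approximation. The HOST pattern is a deliberately crude stand-in for <IP6> and <IP4> (an assumption made for this sketch only), and the date is stripped with a hand-written regex rather than by Fail2ban's date detector.

```python
import re

# Crude stand-in for (?:<IP6>|<IP4>); an assumption made for this sketch only.
HOST = r"(?P<host>[0-9A-Fa-f:.]+)"
DATE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s")
FAIL = re.compile(r"^WARN: Authentication failure for user \S+ \(" + HOST + r"\)$")

LOG = """\
2020-08-01 12:06:09 WARN: Authentication failure for user bob (127.1.2.3)
2020-08-01 12:06:10 INFO: Authentication success for user alice (127.5.6.7)
2020-08-01 12:06:11 WARN: Authentication failure for user root (127.1.2.3)
"""

def failed_hosts(log):
    """Return the address from every line that looks like a failed login."""
    hosts = []
    for line in log.splitlines():
        # The failregex never sees the date, so remove it first.
        rest = DATE.sub("", line, count=1)
        match = FAIL.match(rest)
        if match:
            hosts.append(match.group("host"))
    return hosts

print(failed_hosts(LOG))
```

The bob and root lines each yield 127.1.2.3, while the alice line is skipped because it does not start with WARN once the date is gone.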

Having written a configuration file, we can test it using fail2ban-regex. We save our made up log file under madeup.log, and we give fail2ban-regex both the log file and the config file, and get results which look something like this:

$ fail2ban-regex ./madeup.log ./madeup.conf

Running tests
=============

Use   failregex file : ./madeup.conf
Use      datepattern : ^Year-Month-Day 24hour:Minute:Second\s
Use         log file : ./madeup.log
Use         encoding : UTF-8


Results
=======

Failregex: 2 total
|-  #) [# of hits] regular expression
|   1) [2] ^WARN: Authentication failure for user \S+ \((?:<IP6>|<IP4>)\)$
`-

Ignoreregex: 0 total

Date template hits:
|- [# of hits] date format
|  [3] ^Year-Month-Day 24hour:Minute:Second\s
`-

Lines: 3 lines, 0 ignored, 2 matched, 1 missed
[processed in 0.00 sec]

|- Missed line(s):
|  2020-08-01 12:06:10 INFO: Authentication success for user alice (127.5.6.7)
`-

Running fail2ban-regex tests

Note that we have 2 matched lines: these are the bob and root failures. All 3 lines have matched for dates, which is expected, since they are all of the same format. The output also tells us that the successful alice login did not match, which is what we wanted.

Multi-line logs

Suppose we have another daemon, this one called veryfake. veryfake logs each session over multiple lines of logs. Each session is tagged with a unique session ID in the logs, and that is how we can tell which lines go with which session. An example follows:

2020-08-01 12:06:09 aaaAAAb Connection established [client=fd12:3456::123]
2020-08-01 12:06:10 xxOWOxx Connection established [client=127.5.6.7]
2020-08-01 12:06:12 aaaAAAb Authentication failure [user=alice; method=password]
2020-08-01 12:06:13 aaaAAAb Disconnected
2020-08-01 12:06:14 xxOWOxx Authentication successful [user=bob; method=password]
2020-08-01 12:06:15 xxOWOxx Submitted request [id=12345]
2020-08-01 12:06:16 xxOWOxx Disconnected

A log from our fake daemon, veryfake

Here we have two sessions, each with a 7 character ID, which we are going to pretend was entirely randomly selected. The aaaAAAb session resulted in an authentication failure, while xxOWOxx was successful.

The traditional way to handle this situation in Fail2ban was to use a regex which spanned several lines, and used a (?P=ref) group backreference to match the session IDs. The new way to do it is a bit cleaner, and uses some new constructs apparently mostly documented in an issue comment and the changelog. As an aside, if you are as confused about the use of "resp." in the changelog as I was: it means the same thing as "i.e."; there is a note about it on the relevant Wiktionary article.

So, what does our veryfake.conf look like?

[Definition]
datepattern = ^%%Y-%%m-%%d %%H:%%M:%%S\s
prefregex = ^<F-MLFID>\w{7} </F-MLFID><F-CONTENT>.+</F-CONTENT>$

failregex = ^<F-NOFAIL>Connection established \[client=(?:<IP6>|<IP4>)\]</F-NOFAIL>$
            ^Authentication failure
            ^<F-NOFAIL><F-MLFFORGET>Disconnected</F-MLFFORGET></F-NOFAIL>$

veryfake.conf, a configuration file for the veryfake daemon log files

Our datepattern returns, since our timestamps look the same. A new field is prefregex—this is a regex which preprocesses each log line, separating out the session prefix from the main content part. The part within <F-MLFID>…</F-MLFID> is the session tag which Fail2ban uses to match up log lines to each other, and the part within <F-CONTENT>…</F-CONTENT> is what gets passed to the failregex. Note that our MLFID is 8 characters long: 7 for the session ID, and one space—consuming extra unchanging characters can be convenient and does not affect Fail2ban's ability to identify identical session IDs.

Our new failregex is also more complicated: there are actually three whole regexes. The first one is wrapped in <F-NOFAIL>…</F-NOFAIL>, which tells Fail2ban that a match here does not indicate a failed login. We are matching here just so we can grab the IP address.

The second regex simply matches an authentication failure. There are no IP addresses on the authentication failure line, so we cannot grab anything, but if the regex matches, Fail2ban will consider this a failed login.

The third regex is inside a <F-MLFFORGET>…</F-MLFFORGET> construct, which signals to Fail2ban that this session is over and can be forgotten, and that there is no need to consider it when processing further lines.
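The interplay of the three regexes can be mimicked in a few lines of Python. This is a hypothetical model of the behaviour described above, not Fail2ban's code: the function failures and the sessions dictionary are inventions for this sketch, the host pattern is a crude stand-in for <IP6> and <IP4>, and the timestamps are assumed to have been stripped already.

```python
import re

PREF = re.compile(r"^(?P<mlfid>\w{7} )(?P<content>.+)$")
CONNECT = re.compile(r"^Connection established \[client=(?P<host>[0-9A-Fa-f:.]+)\]$")
FAILURE = re.compile(r"^Authentication failure")
FORGET = re.compile(r"^Disconnected$")

def failures(lines):
    """Return the addresses behind each failed login, in order."""
    sessions = {}  # session tag -> address seen earlier in that session
    found = []
    for line in lines:
        pref = PREF.match(line)
        if not pref:
            continue
        tag, content = pref.group("mlfid"), pref.group("content")
        connect = CONNECT.match(content)
        if connect:
            # Like F-NOFAIL: remember the address, but count no failure.
            sessions[tag] = connect.group("host")
        elif FAILURE.match(content) and tag in sessions:
            # A failure; the address comes from the earlier line.
            found.append(sessions[tag])
        elif FORGET.match(content):
            # Like F-MLFFORGET: the session is over, forget it.
            sessions.pop(tag, None)
    return found

LOG = [
    "aaaAAAb Connection established [client=fd12:3456::123]",
    "xxOWOxx Connection established [client=127.5.6.7]",
    "aaaAAAb Authentication failure [user=alice; method=password]",
    "aaaAAAb Disconnected",
    "xxOWOxx Authentication successful [user=bob; method=password]",
    "xxOWOxx Submitted request [id=12345]",
    "xxOWOxx Disconnected",
]
print(failures(LOG))
```

Only the aaaAAAb session produces a failure, and it is attributed to fd12:3456::123 even though that address appears only on the connection line.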

So, how can we tell this configuration file works? Toggling the verbose flag (-v) on fail2ban-regex can help us here:

$ fail2ban-regex -v ./veryfake.log ./veryfake.conf

# ... OUTPUT LINES OMITTED ...

Results
=======

Failregex: 5 total
|-  #) [# of hits] regular expression
|   1) [2] ^<F-NOFAIL>Connection established \[client=(?:<IP6>|<IP4>)\]</F-NOFAIL>$
|      fd12:3456::123  Sat Aug 01 12:06:09 2020
|      127.5.6.7  Sat Aug 01 12:06:10 2020
|   2) [1] ^Authentication failure
|      fd12:3456::123  Sat Aug 01 12:06:12 2020
|   3) [2] ^<F-NOFAIL><F-MLFFORGET>Disconnected</F-MLFFORGET></F-NOFAIL>$
|      fd12:3456::123  Sat Aug 01 12:06:13 2020
|      127.5.6.7  Sat Aug 01 12:06:16 2020
`-

# ... OUTPUT LINES OMITTED ...

Lines: 7 lines, 4 ignored, 1 matched, 2 missed

# ... OUTPUT LINES OMITTED ...

Testing veryfake.conf

As we can see, there was one matched line. This is what we wanted, since there was one failed login in our logs. The Failregex listing also shows us that the ^Authentication failure match was actually associated with the correct IP address: fd12:3456::123. This indicates that the config worked: even though the IP address was not mentioned in the matching line itself, Fail2ban knew which one applies.

In the realer world

If we look at the filters that ship with Fail2ban, we can see that they are often quite a bit more complicated than the filters we have created for madeup or veryfake.

Some of this complexity comes from the fact that many stock Fail2ban filters are configurable: you can specify that you want more aggressive filtering, which will add extra regexes to the list of failregexes for the filter. Fail2ban also supplies some reusable functionality to simplify use with common logging systems.

Let's say we have made the controversial decision to send our logs from the veryfake daemon to systemd-journald. Fail2ban supports journald (you might have to install some extra dependencies), but we need to make some modifications to our filter config file. To this end, we import common.conf, and interpolate parts of it into our configuration:

[INCLUDES]
before = common.conf

[Definition]

_daemon = veryfake
__prefix_line = %(known/__prefix_line)s(?:\w{7} )

prefregex = ^<F-MLFID>%(__prefix_line)s</F-MLFID><F-CONTENT>.+</F-CONTENT>$

failregex = ^<F-NOFAIL>Connection established \[client=(?:<IP6>|<IP4>)\]</F-NOFAIL>$
            ^Authentication failure
            ^<F-NOFAIL><F-MLFFORGET>Disconnected</F-MLFFORGET></F-NOFAIL>$

journalmatch = _SYSTEMD_UNIT=veryfake.service

A new veryfake.conf with support for journald

We now have a _daemon entry, which is a regex that matches the name of our daemon. In our case, this is very simple, because our daemon's executable is called just veryfake.

We also have a __prefix_line entry, which interpolates the upstream __prefix_line from the included common.conf; this is what the %(known/__prefix_line)s part does. Our newly reassigned __prefix_line is thus the old __prefix_line concatenated with our previous regex for the session ID: %(known/__prefix_line)s and (?:\w{7} ), respectively.

The __prefix_line is then interpolated into prefregex, where it serves as our MLFID (i.e. our session ID). As mentioned before, it is okay for the MLFID consumed by Fail2ban to include more than just strictly the session ID string itself.

At the end, we include a journalmatch entry. As the name indicates, this is a hint on how to get what we want from journald. In our case, the systemd unit is called veryfake.service. This is the same name you would pass to systemctl status veryfake.service or journalctl -u veryfake.service. You might omit the .service ending in everyday usage, as it is implicit, but you can just add it back in here.

One question arises: what is the upstream __prefix_line that we have expanded on? It is the stuff that journalctl would output before the actual logged line, if you were to ask it to print logs to terminal. Fail2ban seems to prefer text output from journald, rather than one of the serialization formats journald provides for consumption by computers, so it is those text strings that our regexes get applied to. As an example, consider this contrived journal output on a host called somehost, where veryfake is running with the PID 68419 (which we will pretend was assigned by the OS):

Aug 01 12:06:09 somehost veryfake[68419]: aaaAAAb Connection established [client=fd12:3456::123]
Aug 01 12:06:10 somehost veryfake[68419]: xxOWOxx Connection established [client=127.5.6.7]
Aug 01 12:06:12 somehost veryfake[68419]: aaaAAAb Authentication failure [user=alice; method=password]
Aug 01 12:06:13 somehost veryfake[68419]: aaaAAAb Disconnected
Aug 01 12:06:14 somehost veryfake[68419]: xxOWOxx Authentication successful [user=bob; method=password]
Aug 01 12:06:15 somehost veryfake[68419]: xxOWOxx Submitted request [id=12345]
Aug 01 12:06:16 somehost veryfake[68419]: xxOWOxx Disconnected

Contrived output of journalctl -eu veryfake

Even without explicitly specifying a date format, Fail2ban should successfully identify the timestamps. The upstream __prefix_line would—after timestamp removal—match somehost veryfake[68419]: , the veryfake substring specifically being matched by our _daemon regex. After we modified __prefix_line, the regex would match somehost veryfake[68419]: aaaAAAb , including the session ID.
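A small Python sketch of that composition follows. The UPSTREAM pattern here is a heavily simplified, hypothetical stand-in for what common.conf actually defines (the real __prefix_line handles many more variations), and the timestamp is assumed to have been removed already.

```python
import re

DAEMON = r"veryfake"
# Hypothetical simplification of the upstream __prefix_line:
# host name, daemon name with PID in brackets, colon, space.
UPSTREAM = r"\S+ " + DAEMON + r"\[\d+\]: "
# Our override appends the session ID, as in the config above.
EXTENDED = UPSTREAM + r"(?:\w{7} )"

line = "somehost veryfake[68419]: aaaAAAb Connection established [client=fd12:3456::123]"

upstream_match = re.match(UPSTREAM, line).group(0)
extended_match = re.match(EXTENDED, line).group(0)
print(repr(upstream_match))
print(repr(extended_match))
```

The first match stops right after the colon and space, while the second also swallows the session ID and its trailing space, which is the part that ends up inside <F-MLFID>.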

We can test our new, more complicated filter, but because it includes an external file we will have to drop it in among Fail2ban config files. It is possible to check out the Fail2ban repository, and use that, which is easier than dropping the file in our live /etc/fail2ban. We can copy the config file to config/filter.d/veryfake.conf inside the repo, save our contrived journald output to veryfake-journal.log, and then run fail2ban-regex on them as usual.

Deploying the filter

To actually use the newly crafted filter in our live Fail2ban instance (and hopefully not ban ourselves from our server forever), we need to copy it to /etc/fail2ban/filter.d/veryfake.conf. We can then edit /etc/fail2ban/jail.local and add an entry to enable our new filter:

[veryfake]
mode = normal
enabled = true
port = smtp,12345
backend = systemd

The part of jail.local that enables our veryfake filter

Of note here is the port setting, which tells Fail2ban what ports the veryfake daemon listens on, and so which ports should potentially be blocked. The strings used here seem to be based on /etc/services. If you leave it blank, it will block the offending IP addresses on all ports.

After adding these changes, we can restart the Fail2ban daemon. After restart, Fail2ban should report in its logs that it has now loaded the "veryfake" jail. After some time, remote IP addresses might start getting banned, a fact which should then also be recorded in the log files. Alternatively, the fail2ban-client command can also be used to display statistics by issuing fail2ban-client status veryfake.

Caveats

The core of any filter in Fail2ban is one or more regexes, and those regexes have to be written with care. Quick and dirty regexes might miss some corner cases, or worse, produce false positives. Such filters might be serviceable, but might not be appropriate for wider use. Filters included in Fail2ban itself are often developed by examining the source code of the target program, to see what kind of things the program outputs to logs. Filters are also developed while keeping in mind that user input reproduced in logs might cause spurious matches, such as when a user submits an IP address string as a username.
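As a contrived illustration of the last point, suppose someone tries to log into our madeup daemon with an IP address as the username. The two regexes below are hypothetical and use a crude IPv4 pattern instead of Fail2ban's <IP4>. The sloppy one picks up the string supplied by the user, while the anchored one picks up the address the daemon itself logged.

```python
import re

# Crude IPv4 pattern; an assumption for this sketch, not Fail2ban's <IP4>.
IPV4 = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# Quick and dirty: grab the first thing that looks like an address.
SLOPPY = re.compile(r"Authentication failure.*?" + IPV4)
# Anchored: the address must sit in the parentheses at the end of the line.
CAREFUL = re.compile(r"^WARN: Authentication failure for user \S+ \(" + IPV4 + r"\)$")

# The "username" 127.9.9.9 was typed in by whoever connected from 127.1.2.3.
line = "WARN: Authentication failure for user 127.9.9.9 (127.1.2.3)"

sloppy_host = SLOPPY.search(line).group("host")
careful_host = CAREFUL.search(line).group("host")
print(sloppy_host)
print(careful_host)
```

With the sloppy regex, the address that would be counted is the one chosen by the person logging in, not the one they connected from.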

It is also important to keep in mind that Fail2ban is not an all-powerful method of protecting a server from intrusion. It does clean up logs, and prevents bots from trying to brute force their way into a server with impunity, but probably will not prevent an attacker from logging into a server where sshd accepts password-based authentication, and root's password is password.

Further reading