At Lanedo we’ve been working on file system monitoring in many contexts, like Gvfs/GIO development or Tracker, and we are often asked which monitoring interfaces the Linux kernel provides…
The history behind filesystem monitoring interfaces in Linux can be summarized as follows:
dnotify ⊂ inotify ⊈ fanotify
Let’s look in a bit more detail at what all this is about…
dnotify is a directory monitoring system officially released in Linux 2.4.0, which provides a very limited way of interacting with the kernel to get notifications of changes in files inside a given directory.
The dnotify filesystem monitoring tool is implemented as a new F_NOTIFY operation available in the fcntl() system call, and thus it is based on the manipulation of standard file descriptors retrieved with open() calls on existing directories.
dnotify allows registering for different types of events, which can be specified as a bit mask passed to the fcntl() call. Among the types of events that dnotify supports are file accesses, file creations, file content modifications, file attribute modifications and file renames within the directory.
Events are notified to user-space via signals raised on the process. By default, dnotify will use the SIGIO signal for that purpose, although it is recommended to use a real-time signal instead, in the [SIGRTMIN, SIGRTMAX] range (configurable with the F_SETSIG operation in fcntl()).
The drawbacks and limitations of dnotify stem mainly from how its interface was designed. To list just a few of the most obvious ones:
- Cannot monitor single files: dnotify can monitor a directory and its contents, but not single files. If only a single file needs to be monitored, its whole parent directory needs to be monitored.
- Prevents unmount of partitions: If the path being monitored is within a partition that may get unmounted, as long as the file descriptor is open, the unmount operation won’t be allowed. In order to unmount the partition, your process will need to close all file descriptors corresponding to paths inside the mount point of the partition. This makes dnotify especially problematic when working with removable media.
- Limited event information: Signals are definitely a poor interface between kernel and user-space. No additional interesting data is provided in the signal handler, besides the file descriptor number, and therefore the process receiving the signal will need to stat() all files in the given directory, and compare the results with previously cached results, in order to know exactly what event happened and in which file.
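To make the dnotify workflow described above more concrete, here is a minimal sketch in C. The helper names dnotify_watch_dir() and dnotify_wait() are made up for this illustration, and the choice of waiting synchronously with sigtimedwait() (instead of installing a signal handler) is just one possible approach. Note that all we get back is the directory file descriptor; the stat()-and-compare step is left out.

```c
#define _GNU_SOURCE   /* F_NOTIFY, F_SETSIG and the DN_* flags are Linux-specific */
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: start watching the directory `path` with dnotify,
 * delivering notifications as signal `signo`.
 * Returns the directory file descriptor, or -1 on error. */
static int dnotify_watch_dir(const char *path, int signo)
{
    sigset_t set;
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    /* Block the signal so it stays pending until dnotify_wait() collects it;
     * the default action of an unhandled real-time signal is to terminate. */
    sigemptyset(&set);
    sigaddset(&set, signo);
    sigprocmask(SIG_BLOCK, &set, NULL);

    /* F_SETSIG replaces the default SIGIO; F_NOTIFY selects the events.
     * DN_MULTISHOT keeps the registration alive after the first event. */
    if (fcntl(fd, F_SETSIG, signo) < 0 ||
        fcntl(fd, F_NOTIFY, DN_ACCESS | DN_CREATE | DN_MODIFY |
                            DN_ATTRIB | DN_RENAME | DN_MULTISHOT) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: wait up to `timeout_sec` seconds for a notification.
 * Returns the file descriptor of the directory that changed, or -1 on
 * timeout or error. Which file changed, and how, is NOT reported. */
static int dnotify_wait(int signo, int timeout_sec)
{
    sigset_t set;
    siginfo_t info;
    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigtimedwait(&set, &info, &ts) < 0)
        return -1;
    return info.si_fd;
}
```

A caller would typically do something like dnotify_watch_dir("/some/dir", SIGRTMIN + 1) and then loop on dnotify_wait(SIGRTMIN + 1, 5), rescanning the directory each time a descriptor is returned.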
inotify is an inode monitoring system introduced in Linux 2.6.13. This API provides mechanisms to monitor filesystem events in single files or directories. When monitoring directories, inotify will return events both for the directory itself and for files inside it. As such, inotify is a full replacement of dnotify, and avoids most of its issues.
Instead of dnotify’s signal-based interface with user-space, inotify delivers events through a single file descriptor that can be polled and read. Also, inotify comes with its own system calls: inotify_init() to create a new monitoring instance with its own file descriptor, inotify_add_watch() to tell the instance to monitor a given file or directory (which returns a watch descriptor), and inotify_rm_watch() to remove the monitoring of the file or directory. The single inotify file descriptor, therefore, can be used to monitor multiple paths (i.e. a single instance can manage multiple watch descriptors).
The user-space application then just needs to poll() for POLLIN events in the single inotify file descriptor, and read() information about what event happened in an inotify_event struct. This struct includes several things, like the specific watch descriptor which triggered the event (so the user can map it to the actual path monitored), a mask of events that happened in the watch, a filename (in case the watch was for a directory), and last but not least a cookie to synchronize events.
Thanks to this cookie value, inotify is capable of not only supporting all of the event kinds that dnotify supported, but also of monitoring file or directory rename and move events across different directories. For example, a move of one file to another directory in the same mount point will trigger an IN_MOVED_FROM event on the source directory with cookie A, and an IN_MOVED_TO event on the target directory with the same cookie A (assuming that both source and target are monitored, of course). This kind of event matching is especially useful to e.g. file indexer applications like Tracker, as the indexer can just assume that the file was moved (path changed) but not its contents (so no need to re-index the file).
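As an illustration of the cookie matching described above, the sketch below remembers the last IN_MOVED_FROM event and reports a move when an IN_MOVED_TO event with the same cookie arrives. The pending_move struct and the match_move() helper are invented for this example, and tracking only a single pending move is a simplification; a real indexer would likely keep several and expire unmatched ones.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>

/* Hypothetical bookkeeping for the "from" half of a move. */
struct pending_move {
    uint32_t cookie;     /* 0 means no move is pending */
    int      wd;         /* watch descriptor of the source directory */
    char     name[256];  /* source file name (NAME_MAX + 1 on Linux) */
};

/* Feed one inotify event. Returns 1 when `ev` is the IN_MOVED_TO half that
 * matches the pending IN_MOVED_FROM; in that case `pm->wd` and `pm->name`
 * still describe the source, while `ev` describes the destination.
 * Returns 0 for every other event. */
static int match_move(struct pending_move *pm, const struct inotify_event *ev)
{
    if (ev->mask & IN_MOVED_FROM) {
        pm->cookie = ev->cookie;
        pm->wd = ev->wd;
        snprintf(pm->name, sizeof pm->name, "%s", ev->len ? ev->name : "");
        return 0;
    }
    if ((ev->mask & IN_MOVED_TO) && pm->cookie != 0 &&
        ev->cookie == pm->cookie) {
        pm->cookie = 0;  /* the pair has been consumed */
        return 1;
    }
    return 0;
}
```

An IN_MOVED_FROM that never gets its matching IN_MOVED_TO means the file left the monitored set of directories, so it is usually treated as a deletion; the opposite case is treated as a creation.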
And because standard file descriptors are no longer used as the basis of the interface, monitoring a given file or directory doesn’t prevent the mount point where it resides from being unmounted. Actually, inotify itself will notify you via IN_UNMOUNT events when that happens to one of your monitored paths.
inotify is not perfect. Some of the strongest criticisms are:
- Maximum number of inotify instances and watches per user: The kernel imposes some (configurable) limits on the number of inotify instances a user can create (/proc/sys/fs/inotify/max_user_instances, e.g. 128) and on the total number of watches a given user can set across those instances (/proc/sys/fs/inotify/max_user_watches, e.g. 8192). This effectively limits the number of paths a user can monitor.
- No recursive monitoring: There is no way to ask the kernel to monitor a given directory and all its subdirectories.
- Map between watch descriptor and real path: The user needs to maintain the map of which watch descriptor corresponds to which path. Not that this is a limitation, just a bit of a burden for the user who monitors lots of paths.
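The last two points are usually worked around in user-space. The following sketch walks a directory tree with nftw(), adds one watch per directory and keeps the watch descriptor to path map in a fixed-size table. The names watch_tree(), path_for_wd() and MAX_WATCHES are hypothetical, and the global state is only there because nftw() callbacks take no user data argument.

```c
#define _XOPEN_SOURCE 700   /* for nftw() and strdup() */
#include <ftw.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>

#define MAX_WATCHES 1024    /* arbitrary cap for this sketch */

static int g_ifd = -1;      /* inotify instance used by the callback */
static int g_count;         /* number of entries in g_map */
static struct { int wd; char *path; } g_map[MAX_WATCHES];

/* nftw() callback: add a watch for every directory found. */
static int add_dir_watch(const char *path, const struct stat *sb,
                         int type, struct FTW *ftw)
{
    (void)sb; (void)ftw;
    if (type != FTW_D || g_count == MAX_WATCHES)
        return 0;

    int wd = inotify_add_watch(g_ifd, path,
                               IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVE);
    if (wd < 0)
        return 0;   /* e.g. ENOSPC once max_user_watches is hit; skip it */

    g_map[g_count].wd = wd;
    g_map[g_count].path = strdup(path);
    g_count++;
    return 0;       /* keep walking */
}

/* Watch `root` and every directory below it.
 * Returns the number of watches now in the table, or -1 on error. */
static int watch_tree(int ifd, const char *root)
{
    g_ifd = ifd;
    if (nftw(root, add_dir_watch, 16, FTW_PHYS) < 0)
        return -1;
    return g_count;
}

/* Translate a watch descriptor from an event back to its path. */
static const char *path_for_wd(int wd)
{
    for (int i = 0; i < g_count; i++)
        if (g_map[i].wd == wd)
            return g_map[i].path;
    return NULL;
}
```

This approach is inherently racy: a subdirectory created after the walk is only watched if the application reacts to the IN_CREATE event and adds a new watch itself, and anything created inside it in the meantime goes unnoticed.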
The API to set up monitoring using fanotify is similar to that used in inotify; that is, we have a fanotify_init() syscall which will give the user a new fanotify file descriptor, and a fanotify_mark() syscall to add or remove marks (watches). The user-space application then just needs to poll() for POLLIN events in the single fanotify file descriptor, and read() information about what event happened in a fanotify_event_metadata struct.
Since the very beginning, the most advertised feature of fanotify was that it allowed recursive monitoring, which can be accomplished by using the special FAN_MARK_MOUNT flag when adding a new mark. Once such a mark is set for a mount point path, the user will get events for any directory in the same mount.
The fanotify_event_metadata struct will not tell you on which file, path or watch an event happened. Instead, it will give you an open file descriptor to the exact file or directory where the event happened. This file descriptor lets the user recover the full path of the file by reading the corresponding symbolic link under /proc/self/fd, so in some way it helps user-space, as there is no longer any need to keep a map of watches to paths.
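A rough sketch of mount-wide monitoring, including the /proc/self/fd lookup, could look as follows. The helpers path_from_fd() and watch_mount() are invented for this example, as is the choice of event mask; the process needs CAP_SYS_ADMIN, otherwise the setup calls fail and the function simply returns -1.

```c
#define _GNU_SOURCE   /* for O_LARGEFILE */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical helper: resolve the path behind an open file descriptor by
 * reading the /proc/self/fd/<fd> symbolic link. Returns 0 on success. */
static int path_from_fd(int fd, char *out, size_t outlen)
{
    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, outlen - 1);
    if (n < 0)
        return -1;
    out[n] = '\0';   /* readlink() does not terminate the string */
    return 0;
}

/* Hypothetical helper: print up to `max_events` events happening anywhere
 * in the mount that contains `mount_path`.
 * Returns the number of events seen, or -1 if setup failed. */
static int watch_mount(const char *mount_path, int max_events)
{
    int fan = fanotify_init(FAN_CLOEXEC | FAN_CLASS_NOTIF,
                            O_RDONLY | O_LARGEFILE);
    if (fan < 0)
        return -1;   /* typically EPERM without CAP_SYS_ADMIN */

    if (fanotify_mark(fan, FAN_MARK_ADD | FAN_MARK_MOUNT,
                      FAN_OPEN | FAN_MODIFY | FAN_CLOSE_WRITE,
                      AT_FDCWD, mount_path) < 0) {
        close(fan);
        return -1;
    }

    struct fanotify_event_metadata buf[64];
    int seen = 0;
    while (seen < max_events) {
        ssize_t len = read(fan, buf, sizeof buf);
        if (len <= 0)
            break;

        struct fanotify_event_metadata *ev = buf;
        for (; FAN_EVENT_OK(ev, len); ev = FAN_EVENT_NEXT(ev, len)) {
            char path[PATH_MAX];
            if (ev->fd >= 0) {
                if (path_from_fd(ev->fd, path, sizeof path) == 0)
                    printf("pid=%d mask=0x%llx %s\n", (int)ev->pid,
                           (unsigned long long)ev->mask, path);
                close(ev->fd);   /* each event carries its own descriptor */
            }
            seen++;
        }
    }
    close(fan);
    return seen;
}
```

Forgetting to close the per-event file descriptors is an easy way to run out of descriptors, since a busy mount can generate a very large number of events.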
Giving an open file descriptor is the basis for the other big new feature that this monitoring system provides: file access control. A process using fanotify can not only ask the kernel to be notified about events, but can also tell the kernel to allow or forbid a given process to open a given file. Just think of the most obvious use case for this feature: antivirus software. An antivirus monitor needs to be able to analyze a file being opened before the user gets to open it, and then allow or disallow the open operation. Before this system was in place, antivirus programs usually relied on the out-of-tree Dazuko kernel driver to provide the same level of file-access control. But now, fanotify provides a mechanism for such programs to decide whether a given user-space process will be able to access a file.
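The allow or deny decision is sent back to the kernel by writing a fanotify_response struct to the fanotify file descriptor. The sketch below shows one way to do that; the helper names and the toy policy (deny any path containing "666") are assumptions made for this illustration only, and as before the process needs CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE   /* for O_LARGEFILE */
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical policy: refuse any path that contains "666". */
static int path_is_allowed(const char *path)
{
    return strstr(path, "666") == NULL;
}

/* Hypothetical helper: create a fanotify instance that receives a
 * permission request for every open() of a file directly inside `dir`.
 * Returns the fanotify file descriptor, or -1 on error. */
static int init_access_control(const char *dir)
{
    /* FAN_CLASS_CONTENT is one of the classes that may use permission
     * events; plain FAN_CLASS_NOTIF listeners cannot. */
    int fan = fanotify_init(FAN_CLOEXEC | FAN_CLASS_CONTENT,
                            O_RDONLY | O_LARGEFILE);
    if (fan < 0)
        return -1;
    if (fanotify_mark(fan, FAN_MARK_ADD, FAN_OPEN_PERM | FAN_EVENT_ON_CHILD,
                      AT_FDCWD, dir) < 0) {
        close(fan);
        return -1;
    }
    return fan;
}

/* Hypothetical helper: answer one FAN_OPEN_PERM event read from `fan`.
 * The opening process stays blocked until the response is written.
 * Returns 0 if the response was delivered, -1 otherwise. */
static int answer_open_request(int fan,
                               const struct fanotify_event_metadata *ev)
{
    char link[64], path[PATH_MAX];
    struct fanotify_response resp = { .fd = ev->fd, .response = FAN_ALLOW };

    snprintf(link, sizeof link, "/proc/self/fd/%d", ev->fd);
    ssize_t n = readlink(link, path, sizeof path - 1);
    if (n >= 0) {
        path[n] = '\0';
        if (!path_is_allowed(path))
            resp.response = FAN_DENY;
    }

    ssize_t written = write(fan, &resp, sizeof resp);
    close(ev->fd);
    return written == (ssize_t)sizeof resp ? 0 : -1;
}
```

Events would be read from the descriptor returned by init_access_control() in the usual way, and answer_open_request() called for each one whose mask contains FAN_OPEN_PERM. A listener that stops answering leaves the opening process blocked, so real implementations need to respond promptly.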
The last big improvement coming with fanotify is that the kernel will not only give the file descriptor of the file where the event happened, but also the PID of the program which caused the event to happen. This is very useful for programs that, for example, want to ignore events they caused themselves (e.g. Tracker’s writeback).
What not everyone seemed to understand, though, was that fanotify is not an inotify replacement. Let me re-state the same thing in other words: fanotify doesn’t provide the same set of events that inotify provides. The root issue is again the open file descriptor in the fanotify_event_metadata struct which we talked about before. Using an open file descriptor to notify where an event happened makes it impossible to notify events like file deletions or renames/moves… So, fanotify does NOT notify file deletions, file renames or file moves. Of course, it also doesn’t provide cookies to match source vs destination move events, as inotify did. To be fair, let’s say that fanotify covers just a subset of the use cases you could have with inotify, but also adds some new ones.
The other big drawback of fanotify is that it currently is root-only (CAP_SYS_ADMIN-only to be more specific). This means that only the root user can request to use the monitoring capabilities provided by fanotify, and therefore non-root use cases (like the Tracker file indexer mentioned earlier) are unable to use it.
I’ve prepared some simple examples showing all the previous technologies, so if you want to play more with them, take a look at:
- dnotify example: reports events in a directory
- inotify example: reports events in a directory
- fanotify example, simple directory monitoring: reports events in a directory
- fanotify example, recursive mount monitoring: reports all events in a mount point
- fanotify example, access control: won’t allow opening files which have ‘666’ in the path or filename 🙂
So who uses fanotify nowadays?
There are lots of programs out there using inotify, but the same cannot be said for fanotify, even several years after it became publicly available in the upstream Linux kernel.
The following list of Free Software projects using fanotify is, from what I can gather, close to complete:
- fatrace is a monitor of system-wide file access events, basically equivalent to the mount monitoring example provided in the previous section, but more polished.
- systemd, the init system, uses fanotify’s mount monitoring capabilities in its
- FirefoxOS uses fanotify in its disk space watcher implementation.
That list does not include any project using fanotify’s file access control feature. Of course, several proprietary antivirus programs do make use of that feature.
Do you know of other projects using fanotify, or are you planning to migrate one? Do not hesitate to leave a comment in the post!