What’s new in Tracker 1.2?

Every six months or so we produce a new stable release, and for Tracker 1.2 some exciting new work was introduced. For those that don’t know of Tracker, it is a semantic data storage and search engine for desktop and mobile devices. Tracker is a central repository of user information that provides two big benefits for the user: data shared between applications, and information which is related to other information (for example, mixing contacts with files, locations, activities and so on).

Providing your own data

Earlier in the year a client came to Lanedo and to the community asking for help on integrating Tracker into their embedded platforms. What did they want? Well, they wanted to take full advantage of the Tracker project’s feature set, but they also wanted to be able to use it on a bigger scale, not just for local files or content on removable USB keys. They wanted to be able to seamlessly query across all devices on a LAN, and across cloud content that was plugged into Tracker. This is similar in spirit to the gnome-online-miners project, which has related goals.

The problem

Before Tracker 1.2.0, files and folders came by way of a GFile and GFileInfo, which were found using the GFileEnumerator API that GLib offers. Underneath all of this, the GFile* classes map to the GLocalFile* classes, which perform the system calls (like lstat()) needed to crawl the file system.
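
For reference, crawling a directory through that GLib API looks roughly like the minimal sketch below (this is not the actual TrackerCrawler code, which is asynchronous, cancellable and rule-driven):

#include <gio/gio.h>

/* Minimal sketch of crawling a directory tree through GIO. */
static void
crawl_directory (GFile *dir)
{
  GFileEnumerator *enumerator;
  GFileInfo *info;

  enumerator = g_file_enumerate_children (dir,
                                          G_FILE_ATTRIBUTE_STANDARD_NAME ","
                                          G_FILE_ATTRIBUTE_STANDARD_TYPE ","
                                          G_FILE_ATTRIBUTE_TIME_MODIFIED,
                                          G_FILE_QUERY_INFO_NONE,
                                          NULL, NULL);
  if (!enumerator)
    return;

  while ((info = g_file_enumerator_next_file (enumerator, NULL, NULL)) != NULL)
    {
      GFile *child = g_file_get_child (dir, g_file_info_get_name (info));

      if (g_file_info_get_file_type (info) == G_FILE_TYPE_DIRECTORY)
        crawl_directory (child);   /* recurse into sub-directories */
      /* ... otherwise hand the GFile/GFileInfo pair up the stack ... */

      g_object_unref (child);
      g_object_unref (info);
    }

  g_object_unref (enumerator);
}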

Why do we need this? Well, on top of TrackerCrawler (which calls the GLib API) are TrackerFileNotifier and TrackerFileSystem; these essentially report content up the stack (and ignore other content depending on rules). The rules come from a TrackerIndexingTree class, which knows what to blacklist and what to whitelist. On top of all of this is TrackerMinerFS, which (now somewhat inaccurately named) handles queues and processing of ALL content. For example, DELETED event queues are handled before CREATED event queues. It also gives status updates, handles INSERT retries when the system is busy, and so on.

To make sure that we take advantage of existing technology and process information correctly, we have to plug in at the level of the TrackerCrawler class.

The solution

Essentially, we have a simple interface called TrackerDataProvider for handling the open and close cases when iterating a container (or directory), along with a TrackerFileDataProvider implementation for the default, existing local file system case.

This is paired with an enumerator interface, TrackerEnumerator, for enumerating that container (or directory), and of course there is a TrackerFileEnumerator class implementing the previously existing local file functionality.

So why not just implement our own GFile backend and make use of existing interfaces in GLib? Actually, I did look into this but the work involved seemed much larger and I was conscious of breaking existing use cases of GFile in other classes in libtracker-miner.

How do I use it?

So now it’s possible to provide your own data provider implementation for a cloud based solution to feed Tracker. But what are the minimum requirements? Well, Tracker requires a few things to function; these include providing a real GFile and GFileInfo with an adequate name and mtime. The libtracker-miner framework requires the mtime for checking whether there have been updates compared to the database. The TrackerDataProvider based implementation is given as an argument when the TrackerMiner object is created and is called by the TrackerCrawler class when indexing starts. The locations that will be indexed by the TrackerDataProvider are given to the TrackerIndexingTree, and you can use TRACKER_DIRECTORY_FLAG_NO_STAT for non-local content.
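
Very roughly, wiring a custom provider up might look like the sketch below. Treat it as an illustration only: my_cloud_data_provider_new() and MY_TYPE_MINER stand in for your own TrackerDataProvider implementation and TrackerMinerFS subclass, and the "data-provider" construct property name is my assumption about the 1.2 API rather than something to copy verbatim.

#include <libtracker-miner/tracker-miner.h>

/* Sketch only: MY_TYPE_MINER and my_cloud_data_provider_new() are
 * hypothetical; the "data-provider" property name is an assumption. */
static TrackerMiner *
create_cloud_miner (void)
{
  TrackerDataProvider *provider;
  TrackerMiner *miner;
  TrackerIndexingTree *tree;
  GFile *root;

  provider = my_cloud_data_provider_new ();

  miner = g_object_new (MY_TYPE_MINER,
                        "data-provider", provider,
                        NULL);

  /* Tell the indexing tree which locations the provider serves.
   * TRACKER_DIRECTORY_FLAG_NO_STAT marks non-local content that
   * cannot be lstat()ed. */
  tree = tracker_miner_fs_get_indexing_tree (TRACKER_MINER_FS (miner));
  root = g_file_new_for_uri ("cloud://my-service/");
  tracker_indexing_tree_add (tree, root,
                             TRACKER_DIRECTORY_FLAG_RECURSE |
                             TRACKER_DIRECTORY_FLAG_MONITOR |
                             TRACKER_DIRECTORY_FLAG_NO_STAT);
  g_object_unref (root);

  return miner;
}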

Crash aware Extractor

In Tracker 1.0.0, the Extractor (the ‘tracker-extract’ process) used to extract metadata from files was upgraded to be passive. Passive here means the Extractor only extracts content from files already added to the database. Before that, content from the Extractor was concatenated with that from the file system miner and inserted into the database together.

Sadly, with 1.0.0, any files that caused crashes or serious system harm resulting in the termination of ‘tracker-extract’ were subsequently retried on each restart of the Extractor. In 1.2.0 these failures are noted and the files are not retried.

New extractors?

Thanks to work from Bastien Nocera, a number of extractors have been added for electronic books and comic books. If your format isn’t supported yet, let us know!

Updated Preferences Dialog

Often we get questions like:

  • Can Tracker index numbers?
  • How can I disable indexing file content?

To address these, the preferences dialog has been updated to provide another tab called “Control” which allows users to change options that existed previously but were not presented in a user interface.


In addition to this, changing an option that requires a reindex or restart of Tracker will prompt the user upon clicking Apply.

What else changed?

Of course there were many other fixes and improvements besides the things mentioned here. For a full list, see the release announcement.

Looking for professional services?

If you or someone you know is looking to make use of Open Source technology and wants professional services to assist in that, get in touch with us at Lanedo to see how we can help!


WHITE PAPER: Qualcomm Gobi devices in Linux based systems


Over the past few years, Aleksander Morgado has written about some of the improvements happening in the Linux world for networking devices, including Improving ModemManager for 3GPP2 Gobi 2k3k devices, Workarounds for QMI modems using LTE and other modem advances on GNU/Linux.

Late last year Aleksander wrote a white paper entitled Qualcomm Gobi devices in Linux based systems to summarise Qualcomm’s offerings for Gobi devices.

For some time now there has been more than one way to use your Qualcomm Gobi device on Linux; for a start, there are multiple drivers for the Linux kernel. Vendors looking to use Qualcomm’s hardware in their products may not know which approach to use, and this white paper gives insight and clarity to help.

Recently Aleksander blogged a snippet of what his white paper talks about, titled qmi_wwan or gobinet. Lanedo is now making Aleksander’s white paper available for download to everyone.

If this white paper was of interest to you and you would like to get in touch with us about this or other services Lanedo offers, leave your comments or email us!


A quest for speed in compiling

Ever spent time staring at a scrolling console, wishing that compilation took less time? In this post we’re going to explore the various ways to speed up the build of $YOUR_SOFTWARE_PROJECT.


Let’s start with the simplest test case, which will act as a reference: a simple debug build compiled using make with a single job.

For the project I’m using for my tests this leads to:

make: 2min37

Parallel Makefile build

The first obvious way to build faster is to use parallel jobs. Given that the computer used for testing has 4 logical processors (but only 2 physical cores), what is the optimal N parameter to give to make?

make -j2: 1min31
make -j3: 1min21
make -j4: 1min15
make -j5: 1min15

We hit a plateau at N=4 so there’s no need to use a bigger value.

Ninja build system

As my testing project uses CMake, we’re able to test Ninja. It’s advertised as a small build system with a focus on speed, so let’s see how fast it really is.

ninja: 1min17

Wow! ninja is way faster than make!

Well… until you read the ‘ninja --help’ output:

-j N run N jobs in parallel [default=6, derived from CPUs available]

So ninja uses parallel builds by default; forcing N=1 leads to:

ninja -j 1: 2min40

which is essentially identical to what make gets.
That makes sense though, for at least two reasons: first, it’s a small project, so the time spent in the build system is much smaller than the time spent compiling.
Also, the ninja and make build files are both automatically generated from CMakeLists.txt: manually writing rules.ninja may be required to take advantage of the supposed speedup brought by ninja.

Precompiled headers (pch)

Another approach to speed up the build is to use precompiled headers. The tested code base uses C++ and templates, so let’s see how much we can gain from this.
Precompiled headers can be set up manually, but that’s tedious, so I’d prefer to use an automatic approach.
Cotire is a CMake module designed to improve C++ project build times by auto-generating precompiled headers. The integration is straightforward (one file and one line to add to CMakeLists.txt), so let’s jump directly to the result:

make -j4: 1m01.513s

That’s a nice improvement, especially considering the time required to set up cotire 🙂

Distributed build

So far we’ve used only 1 computer but we could use several! Let’s see what happens when you add another computer and use distcc to manage the distributed build.
The new computer needs 3min to build the software using make -j3 and is connected using a 100Mbps switch.

On each computer that we wish to include in our build array we run:

distccd -j N -p 12345 -a 192.168.1.0/24 --no-detach --log-stderr

Then on the computer controlling the build:

export DISTCC_HOSTS="192.168.1.52:12345,lzo,cpp localhost:12345,lzo,cpp"
CXX="distcc g++" CC="distcc gcc" cmake ..

and the results:

distcc-pump make -j80: 1min04

Not that impressive…
So, let’s throw another machine at it. This 3rd machine is capable of building in 1min30 using make -j4 and is connected using a 1Gbps switch.

distcc-pump make -j80: 0m44.144s

And now a fancy graph showing all the above numbers (converted to seconds):

What conclusion can we draw from all this?

  • precompiled headers are a good improvement for C++ projects
  • distcc is easy to set up and can bring a build speedup, but you'll need a significant project to really appreciate the gain. Also, several phases of the build are not parallelisable (source generation, linking, etc.)
  • CMake is my favorite build system 🙂*

 

*don't worry, LibreOffice developers reading this: I won't suggest switching to CMake 🙂


The Main Loop: The Engine of a GUI Library

In this blog post, we will have a look at the “main loop”, which is the engine of most Graphical User Interface (GUI) libraries. Different GUI libraries implement their own main loop. When porting a GUI library to another platform, such as porting GTK+ to OS X, the main loops of GTK+ and the native toolkit of the platform need to be integrated. During this summer, Lanedo has looked into a number of issues with this integration of the GTK+ and OS X main loops. As part of this, we had to really delve into the internals of the GTK+ main loop. This blog post describes what we have learned.

A main loop is the engine of a modern GUI library.

Modern GUI libraries have in common that they all embody the Event-based Programming paradigm. These libraries implement GUI elements that draw output to a computer screen and change state in response to incoming events. Events are generated from different sources. The majority of events are typically generated directly from user input, such as mouse movements and keyboard input. Other events are generated by the windowing system, for instance requests to redraw a certain area of a GUI, indications that a window has changed size or notifications of changes to the session’s clipboard. Note that some of these events are generated indirectly by user input.

In general, libraries that employ the Event-based Programming paradigm rely on main loops to make sure input is received and events are dispatched. To receive input, main loops poll input sources. Examples of input sources are input drivers for mouse and keyboard or sources to receive specific window manager events. For instance, the DirectFB library supports obtaining input from multiple different input drivers, such as tslib, lirc and the Linux Input subsystem. As another example, GTK+ when using the X11 backend receives all input in the form of XEvents from the X11 libraries. This can be seen as a single input source. The X11 system already abstracts different event sources behind a single event interface. XEvents are defined for mouse and keyboard input and for window management events such as notifications for changes to window size.

A GUI library uses events internally. Controls that are implemented in the GUI library thus set up “event handlers” to process incoming events. The GUI library generates these events from input it gets. So, a translation takes place from received input to an event that is dispatched.

Anatomy of a main loop

Main loops are in charge of receiving input and dispatching events. As an example, we will briefly describe the anatomy of the GLib main loop. The GLib main loop manages a collection of “event sources”. It is easy to add your own event sources, but this is outside the scope of this blog post. An event source consists of a number of callback functions: prepare, check and dispatch. Next to these callback functions, a GSource contains information on which file descriptors to poll for this source and a poll timeout. All the GLib main loop does is monitor these event sources and call dispatch when an event has occurred. This is done by going through five phases in a loop:

  • prepare: prepares the sources to be polled, using the prepare callback on the event sources. An event source has to prepare to be polled but may also indicate it does not have to be polled (for instance, when it knows already that there’s an event waiting).
  • query: determines all information that is necessary to perform the poll operation for a main loop. For example, an array of file descriptors to be polled is set up.
  • poll: actually performs the poll operation, which waits for an event to happen on a file descriptor until a given timeout expires. As a timeout the smallest timeout specified by the monitored event sources is used.
  • check: the results of the poll operation are written into the poll records provided by the sources and the event source is requested to check whether an event has happened by calling its check callback.
  • dispatch: the dispatch callback is called for event sources that indicated an event has occurred.

This image, taken from the GLib documentation, shows the different states of the GLib main loop.
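
To make these phases concrete, here is a minimal custom event source written against the GSource API. It is only a toy that asks for a one-second poll timeout and dispatches its callback after every poll; a real source would also register file descriptors with g_source_add_poll():

#include <glib.h>

/* Toy event source illustrating the prepare/check/dispatch callbacks. */
static gboolean
my_prepare (GSource *source, gint *timeout)
{
  *timeout = 1000;   /* let the poll wait at most one second for us */
  return FALSE;      /* no event ready yet, polling is needed */
}

static gboolean
my_check (GSource *source)
{
  return TRUE;       /* after the poll, report that an event occurred */
}

static gboolean
my_dispatch (GSource *source, GSourceFunc callback, gpointer user_data)
{
  return callback ? callback (user_data) : G_SOURCE_REMOVE;
}

static GSourceFuncs my_funcs = { my_prepare, my_check, my_dispatch, NULL };

static gboolean
on_tick (gpointer user_data)
{
  g_print ("tick\n");
  return G_SOURCE_CONTINUE;
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  GSource *source = g_source_new (&my_funcs, sizeof (GSource));

  g_source_set_callback (source, on_tick, NULL, NULL);
  g_source_attach (source, NULL);
  g_main_loop_run (loop);

  return 0;
}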

As an example event source, consider the event source installed by GDK to listen for X11 events. The file descriptor by which communication with the X server is carried out is associated with this event source. The event source implements the different callbacks as follows:

  • prepare: checks whether any XEvents already received from the X server are still pending in the event queue. If this is the case, then there is no need to include this event source in the poll.
  • check: if an input event has occurred on the registered file descriptor, it checks whether there now is an event present in the event queue.
  • dispatch: get new XEvents, unqueue an event and dispatch this event by calling gdk_event_func.

Main loops can commonly also be invoked recursively. A common example is when showing a modal dialog window from within a callback function or when updating a progress bar while a large computation is being processed (e.g. GIMP applying a filter to an image) within a callback function. In these cases, the main loop is run periodically so that the flow of events does not get stuck. The GLib main loop supports this kind of recursion from within its dispatch stage.

Integration of main loops

We now have a good idea of what the responsibilities of a main loop are and how a main loop functions. Different toolkits, or libraries, implement their own main loops. For instance, GTK+, Qt and EFL all implement their own main loop. What happens if we want to use two such libraries together, each with its own main loop, without too much pain?

When we have two main loops, how do we integrate, or connect, these with one another?

As a first example, consider that we are writing an application using EFL but want to use a library that uses the GLib main loop. Within EFL, the main loop is implemented within Ecore. Ecore contains explicit support to integrate the GLib main loop in the EFL main loop. This can be achieved by calling the function ecore_main_loop_glib_integrate. According to the documentation of this function, this support is often used to use libraries such as GUPnP and LibSoup within EFL applications.

The Ecore library uses the select system call by default in its implementation of the main loop. This select function can be replaced with a custom select function when desired. This is how the GLib main loop is integrated. A custom select function is installed, which calls the relevant phases of the GLib main loop (prepare, query, check and dispatch as described above) and performs the polling phase by calling select for the file descriptors monitored by Ecore as well as the file descriptors that need to be monitored for the event sources installed in the GLib main loop. So, in this case, the GLib main loop is a secondary main loop and is integrated with the Ecore main loop.
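
In application code, that integration boils down to a single call. A minimal sketch (error handling omitted):

#include <Ecore.h>

int
main (int argc, char **argv)
{
  ecore_init ();

  /* Run GLib main loop sources (GUPnP, LibSoup, ...) from within the
   * Ecore main loop by installing Ecore's GLib-aware select function. */
  if (!ecore_main_loop_glib_integrate ())
    return 1;   /* Ecore was built without GLib support */

  ecore_main_loop_begin ();

  ecore_shutdown ();
  return 0;
}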

A more complicated example of main loop integration can be found in GTK+ itself. In the OS X backend the GLib main loop and the Cocoa main loop are integrated with one another. The main problem that needed to be solved here is that one cannot determine whether new Cocoa events have arrived by checking for events on a file descriptor. As a consequence, we cannot write an event source that can be used together with the GLib main loop which relies on using poll. To get around this, the code uses a feature of the GLib main loop API which allows one to install a custom poll function. In this custom poll function, we can determine whether new Cocoa events are pending using an Objective-C call ([NSApp nextEventMatchingMask]).
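
The GLib-side hook for this is g_main_context_set_poll_func(). The sketch below only shows the shape of such a custom poll function; the real GDK Quartz code additionally waits for Cocoa events inside it:

#include <glib.h>

/* Custom poll function: GLib calls this instead of poll(2), so a backend
 * can wait for its own kind of events as well as for activity on the
 * file descriptors GLib wants monitored. */
static gint
my_poll_func (GPollFD *fds, guint nfds, gint timeout)
{
  /* ... here the GDK Quartz backend would also check for pending
   * Cocoa events via [NSApp nextEventMatchingMask:...] ... */
  return g_poll (fds, nfds, timeout);   /* fall back to the normal poll */
}

static void
install_poll_func (void)
{
  g_main_context_set_poll_func (g_main_context_default (), my_poll_func);
}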

This call does not solve all of our problems. It is not possible to wait for both file descriptors and Cocoa events at the same time. When it is necessary to wait for activity on either of these, another trick is used. A helper thread is used to perform the call to poll on a given set of file descriptors and the main thread calls the Objective-C call to wait for new events. Using these two tricks, the Cocoa main loop has been integrated into the GLib main loop and the GLib main loop is the primary main loop processing events.

It may seem we have covered everything now, but there’s even more. Remember that a GLib main loop may recurse within the dispatch phase to implement modal operations? The Cocoa main loop does this as well. In the dispatch phase, certain events are sent to Cocoa to be handled. An example is window resizing events. So, within the dispatch phase, the Cocoa main loop may be run. While the Cocoa main loop is processing, the GLib main loop is essentially blocked. How does one get around this? The solution that has been implemented is to iterate the GLib main loop while the Cocoa main loop is running. In fact, this is an integration of the GLib main loop within the Cocoa main loop. Within the OS X backend, main loop integration in both directions takes place!

Schematic overview of the integration of the GLib and Cocoa main loops.

Integration of the GLib main loop within the Cocoa main loop is implemented as one would expect: call the different GLib main loop phases at the right moment, as can be seen in the figure. This is done by “observing” the Cocoa run loop. Through an observer callback, the application process is informed in what state the Cocoa run loop currently is. Based on this state, the corresponding GLib main loop phase is called. Note that the Cocoa main loop is also run from the modified poll func, while waiting for a new Cocoa event to arrive. In this case, the GLib main loop is not integrated within the Cocoa main loop, to avoid recursing the GLib main loop from within its poll phase.

The integration of the GLib and Cocoa main loops is one of the hardest to understand parts of the GTK+ OS X code base. This integration code was implemented by Owen Taylor. The implementation is thoroughly commented, see GNOME Bug 550942 and the relevant source file in the GTK+ git repository: gtk+/gdk/quartz/gdkeventloop-quartz.c. Together with the general description of the GLib main loop provided in this article, it should be possible to understand most of the code involved.

 

In this article, we have explored the engine of GUI libraries: the main loop. We have seen how the GLib main loop is implemented and how the GLib main loop can be used with different toolkits. At Lanedo, we have a lot of experience with different GUI toolkits on different software platforms. If this is something your company needs help with, do not hesitate to contact us!


Exploring the LibreOffice code base


Opening LibreOffice’s source code for the first time, a new developer can find the amount of code to sift through intimidating and off-putting. So, here are some useful locations within the LibreOffice source directory structure that should help get you started.

General layout

LibreOffice consists of over a hundred interdependent modules, each contained within a subdirectory of the LibreOffice root directory. Please note that all paths hereafter are relative to this directory, unless otherwise specified. The layout of a module tends to follow a general pattern, with at least the following places of interest:

moduledir/README
Usually contains useful information about the module purpose and contents. A list of all modules and the first line of their README can be found at docs.libreoffice.org.
moduledir/*.mk
gbuild makefiles for various build possibilities.
moduledir/source/
Source code (usually C++). There is often some sort of division of code into submodules.

Headers

Headers may be found in several places, depending on the minimum required scope of the interface defined.

include/
Inter-module headers.
moduledir/inc/
Intra-module headers.
moduledir/source/submoduledir/inc/
Headers required only by a submodule.

Headers may also occasionally be found alongside their .cxx implementations.

UI

For modules where a GUI is relevant, there are two main possibilities. Work on converting the user interface specifications from the older, inflexible legacy format (.src/.hrc) to the XML-based Glade/GTK3 format (.ui) is currently underway throughout the codebase.

moduledir/uiconfig/
New, Glade-style .ui files.
moduledir/source/ui/
Older .src files.

Build system

There’s an informative slide deck from the 2013 LibreOffice conference in Milan that discusses the state of the LibreOffice build system. While you will find a brief overview below, this presentation goes into far more detail if that is what’s required.

solenv/
Contains many important parts of the build system.
solenv/gbuild
gbuild implementation.
solenv/bin ; solenv/bin/modules/
Perl build and packaging tools.
scp2/
Configuration of packaging and installation.

Once LibreOffice has been built using ./autogen.sh followed by make, a complete runnable installation can be found in:

instdir/program/

There, soffice.bin and soffice can be found. The former is the main LibreOffice binary. On its first run, it creates the user profile and exits, and can then be executed again as normal. To avoid this, the soffice wrapper should be used when running LibreOffice for testing purposes. When running LibreOffice via a debugger, soffice.bin should be used directly.

Main components

LibreOffice Writer

Due to LibreOffice’s history as a product of Sun’s StarDivision, the main application component paths contain hints toward this legacy. The Writer module, for example, is contained within the directory sw/ (StarWriter).

sw/
Main Writer module.
starmath/
The mathematical formula editor.
swext/
Shipped Writer extensions.

LibreOffice Calc

sc/
Main Calc code.
chart2/
Implementation of charts for Calc.

LibreOffice Draw (and LibreOffice Impress)

sd/
Draw and Impress share a module and quite a bit of code here.
sdext/
Extensions for Draw and Impress.

LibreOffice Impress only

slideshow/
Slideshow engine for Impress.

Graphics modules

svx/
Contains graphics helper code shared by several major modules, but especially Draw and Impress.
drawinglayer/
Provides an API for drawing objects.

Documents

sfx2/
Contains the framework used by sw, sc and sd to dispatch actions to the document shells. This module includes document load and save handling, which invokes the correct import and export filters, respectively.
writerfilter/
Writer .rtf import filter and part of the .docx import filter.
writerperfect/
A family of Writer import filters, including WordPerfect, Microsoft Publisher and Microsoft Visio file format import filters.
oox/
Support for parsing Microsoft’s OOXML formats (.docx, .xlsx, etc.)

Hopefully you’ll now have an idea of where to get started if you want to tackle a bug or feature in LibreOffice. For more help and resources, see my LibreOffice Development Howto or, if you want professional support, don’t hesitate to contact us.


Filesystem monitoring in the Linux kernel

Evolution

At Lanedo we’ve been working on file system monitoring in many contexts, like Gvfs/GIO development or Tracker, and we often get asked which interfaces are available in the Linux kernel…

The history behind filesystem monitoring interfaces in Linux can be summarized as follows:

dnotify ⊂ inotify ⊈ fanotify

Let’s look in a bit more detail at what all this is about…

dnotify

dnotify is a directory monitoring system officially released in Linux 2.4.0, which provides a very limited way of interacting with the kernel to get notifications of changes in files inside a given directory.

The dnotify filesystem monitoring tool is implemented as a new F_NOTIFY operation available in the fcntl() system call, and thus it is based on the manipulation of standard file descriptors retrieved with open() calls on existing directories.

dnotify allows registering for different types of events, which can be specified as a bit mask passed to the fcntl() call. Among the type of events that dnotify supports, we have file accesses, file creations, file content modifications, file attribute modifications or file renames within the directory.

Events are notified to user-space via signals raised on the process. By default, dnotify will use SIGIO signal for that purpose, although it is recommended to use some other real-time signal, in the [SIGRTMIN,SIGRTMAX] range (configurable with the F_SETSIG operation in fcntl()).
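
A minimal dnotify sketch (watching /tmp for file creations and renames, using a real-time signal as recommended above) looks something like this; error handling is omitted:

#define _GNU_SOURCE           /* needed for F_NOTIFY, F_SETSIG and DN_* */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t event_fd = -1;

static void
handler (int sig, siginfo_t *si, void *ctx)
{
  event_fd = si->si_fd;       /* only the fd is delivered, nothing more */
}

int
main (void)
{
  struct sigaction sa;
  int fd;

  memset (&sa, 0, sizeof sa);
  sa.sa_sigaction = handler;
  sa.sa_flags = SA_SIGINFO;
  sigaction (SIGRTMIN + 1, &sa, NULL);

  fd = open ("/tmp", O_RDONLY);
  fcntl (fd, F_SETSIG, SIGRTMIN + 1);                   /* use an RT signal */
  fcntl (fd, F_NOTIFY, DN_CREATE | DN_RENAME | DN_MULTISHOT);

  for (;;)
    {
      pause ();
      if (event_fd != -1)
        printf ("something changed in the directory behind fd %d\n",
                (int) event_fd);
      event_fd = -1;
    }
}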

The drawbacks and limitations of dnotify are given mainly by how its interface was designed. To list just a few of the most obvious ones:

  • Cannot monitor single files: dnotify can monitor a directory and its contents, but not single files. If only a single file needs to be monitored, its whole parent directory needs to be monitored.
  • Prevents unmount of partitions: If the path being monitored is within a partition that may get unmounted, as long as the file descriptor is open, the unmount operation won’t be allowed. In order to unmount the partition, your process will need to close all file descriptors corresponding to paths inside the mount point of the partition. This makes dnotify especially problematic when working with removable media.
  • Limited event information: Signals are definitely a poor interface between kernel and user-space. No additional interesting data is provided in the signal handler, besides the file descriptor number, and therefore the process receiving the signal will need to stat() all files in the given directory, and compare the results with previously cached results, in order to know what event exactly happened and in which file.

inotify

inotify is an inode monitoring system introduced in Linux 2.6.13. This API provides mechanisms to monitor filesystem events in single files or directories. When monitoring directories, inotify will return events both for the directory itself and for files inside it. As such, inotify is a full replacement of dnotify, and avoids most of its issues.

Instead of dnotify‘s signal-based interface with user-space, inotify is implemented as a device node which can be opened and read with a single file descriptor. Also, inotify comes with its own system calls: inotify_init() to create a new monitoring instance with its own file descriptor, inotify_add_watch() to tell the instance to monitor a given file or directory (which returns a watch descriptor), and inotify_rm_watch() to remove the monitoring of the file or directory. The single inotify file descriptor, therefore, can be used to monitor multiple paths (i.e. a single instance can manage multiple watch descriptors).

The user-space application then just needs to poll() for POLLIN events in the single inotify file descriptor, and read() information about what event happened in an inotify_event struct. This struct includes several things, like the specific watch descriptor which triggered the event (so the user can map it to the actual path monitored), a mask of events that happened in the watch, a filename (in case the watch was for a directory), and last but not least a cookie to synchronize events.
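
A minimal inotify sketch, watching a single directory and printing the mask, cookie and name of each event, might look like this (error handling omitted):

#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int
main (void)
{
  char buf[4096] __attribute__ ((aligned (__alignof__ (struct inotify_event))));
  int fd, wd;
  ssize_t len;

  fd = inotify_init ();
  wd = inotify_add_watch (fd, "/tmp",
                          IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO);

  for (;;)
    {
      len = read (fd, buf, sizeof buf);   /* blocks until events arrive */

      for (ssize_t i = 0; i < len; )
        {
          struct inotify_event *event = (struct inotify_event *) &buf[i];

          printf ("wd=%d mask=0x%x cookie=%u name=%s\n",
                  event->wd, event->mask, event->cookie,
                  event->len ? event->name : "");
          i += sizeof (struct inotify_event) + event->len;
        }
    }

  inotify_rm_watch (fd, wd);  /* not reached in this sketch */
  return 0;
}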

Thanks to this cookie value, inotify is capable not only of supporting all of the event kinds that dnotify supported, but also of providing support for monitoring file or directory rename and move events across different directories. For example, a move of one file to another directory in the same mount point will trigger an IN_MOVED_FROM event on the source directory with cookie A, and an IN_MOVED_TO event on the target directory with the same cookie A (assuming that both source and target are monitored, of course). This kind of event matching is especially useful to e.g. file indexer applications like Tracker, as the indexer can just assume that the file was moved (path changed) but not its contents (so no need to re-index the file).

And due to the fact that standard file descriptors are no longer used as base, monitoring a given file or directory doesn’t prevent the mount point where it resides from being unmounted. Actually, inotify itself will notify via IN_UNMOUNT events when that happens to one of your monitored paths.

Still, inotify is not perfect. Some of the strongest criticisms are:

  • Maximum number of inotify instances and watches per instance: The kernel imposes some (configurable) limits to the number of inotify instances a user can create (/proc/sys/fs/inotify/max_user_instances, e.g. 128) and also to the number of watches a given user can set per instance (/proc/sys/fs/inotify/max_user_watches, e.g. 8192). This effectively limits the amount of paths a user can monitor.
  • No recursive monitoring: There is no way to tell the kernel to request monitoring a given directory and all its subdirectories.
  • Map between watch descriptor and real path: The user needs to keep their own map of which watch descriptor corresponds to which path. Not that this is a limitation, just a bit of a burden for the user who monitors lots of paths.

fanotify… FTW!

fanotify is the latest filesystem monitoring interface, officially available in a stable manner since Linux 2.6.37 (early 2011), but it has been around for a lot longer (since 2009).

The API to set up monitoring using fanotify is similar to that used in inotify; that is, we have a fanotify_init() syscall which will give the user a new fanotify file descriptor, and a fanotify_mark() syscall to add or remove marks (watches). The user-space application then just needs to poll() for POLLIN events on the single fanotify file descriptor, and read() information about what event happened in a fanotify_event_metadata struct.

Since the very beginning, the most advertised feature of fanotify was that it allowed recursive monitoring, which can be accomplished by using the special FAN_MARK_MOUNT flag when adding a new mark. Once such a mark is set for a mount point path, the user will get events for any directory available in the same mount.

Unlike inotify, the fanotify_event_metadata struct will not tell you on which file, path or watch an event happened. Instead, it will give you an open file descriptor to the exact file or directory where the event happened. Giving a file descriptor lets the user recover the full path of the file by readlink()-ing the corresponding /proc/self/fd/<fd> entry, so in some ways it helps user-space, as there is no longer the need to keep a map of watches vs paths.
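
Putting those pieces together, a minimal fanotify monitor for a whole mount might look like the sketch below (it must run as root, and error handling is omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

int
main (void)
{
  struct fanotify_event_metadata events[16];
  char procpath[64], path[PATH_MAX];
  ssize_t len, n;
  int fd;

  fd = fanotify_init (FAN_CLASS_NOTIF, O_RDONLY);
  fanotify_mark (fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                 FAN_OPEN | FAN_MODIFY | FAN_CLOSE_WRITE, AT_FDCWD, "/home");

  for (;;)
    {
      len = read (fd, events, sizeof events);

      for (struct fanotify_event_metadata *ev = events;
           FAN_EVENT_OK (ev, len);
           ev = FAN_EVENT_NEXT (ev, len))
        {
          /* Resolve the open fd back to a path via /proc/self/fd. */
          snprintf (procpath, sizeof procpath, "/proc/self/fd/%d", ev->fd);
          n = readlink (procpath, path, sizeof path - 1);
          path[n > 0 ? n : 0] = '\0';

          printf ("pid %d touched %s (mask 0x%llx)\n",
                  (int) ev->pid, path, (unsigned long long) ev->mask);
          close (ev->fd);       /* important: events carry open fds */
        }
    }
}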

Giving an open file descriptor is the basis for the other new big feature that this monitoring system provides: file access control. A process using fanotify can request the kernel not only to be notified about events happening, but also can tell the kernel to allow or forbid access to open a given file by a given process. Just think of the most obvious use case for this feature, an antivirus software. An antivirus monitor needs to be able to analyze a file being opened before the user gets to open it, and then allow or disallow the open operation. Before this system was in place, antivirus programs usually relied on the out-of-tree maintained Dazuko kernel driver to provide the same level of file-access control. But now, fanotify provides a mechanism for such programs to let them decide whether a given user-space process will be able to access a file.
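
The access-control side of the API looks like this in miniature; again this is only a sketch (root-only, no error handling), not a complete scanner:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Sketch of fanotify's access-control mode: every open() under the mount
 * blocks until we answer FAN_ALLOW or FAN_DENY. */
int
main (void)
{
  struct fanotify_event_metadata events[16];
  ssize_t len;
  int fd;

  fd = fanotify_init (FAN_CLASS_CONTENT, O_RDONLY);     /* permission class */
  fanotify_mark (fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                 FAN_OPEN_PERM, AT_FDCWD, "/home");

  for (;;)
    {
      len = read (fd, events, sizeof events);

      for (struct fanotify_event_metadata *ev = events;
           FAN_EVENT_OK (ev, len);
           ev = FAN_EVENT_NEXT (ev, len))
        {
          struct fanotify_response resp;

          resp.fd = ev->fd;
          /* A real scanner would inspect the file here before deciding. */
          resp.response = FAN_ALLOW;
          write (fd, &resp, sizeof resp);

          close (ev->fd);
        }
    }
}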

The last big improvement coming with fanotify is that the kernel will not only give the file descriptor of the file where the event happened, but also the PID of the program which caused the event to happen. This is very useful for different programs, if, for example, they want to ignore events created by themselves (e.g. Tracker’s writeback).

fanotify… WTF?

What not everyone seemed to understand, though, was that fanotify is not an inotify replacement. Let me re-state the same thing with other words: fanotify doesn’t provide the same set of events that inotify provides. The root issue of this is again the open file descriptor in the fanotify_event_metadata struct API which we talked about before. Using an open file descriptor to notify where an event happened directly breaks the possibility of notifying events like file deletions or renames/moves… So, fanotify does NOT notify file deletions, file renames or file moves. Of course, it also doesn’t provide cookies to match source vs destination move events, as inotify did. To be fair, let’s say that fanotify covers just a subset of the use cases you could have with inotify; but also adds some new ones.

The other big drawback of fanotify is that it currently is root-only (CAP_SYS_ADMIN-only to be more specific). This means that only the root user can request to use the monitoring capabilities provided by fanotify, and therefore non-root use cases (like the Tracker file indexer mentioned earlier) are unable to use it.

Examples!

I’ve prepared some simple examples showing all the previous technologies, so if you want to play more with them, take a look at:

So who uses fanotify nowadays?

There are lots of programs out there using inotify, but the same cannot be said for fanotify, even several years after it became publicly available in the upstream Linux kernel.

The following list of Free Software projects using fanotify is, from what I can gather, close to complete:

  • fatrace is a monitor of system-wide file access events, basically equivalent to the mount monitoring example provided in the previous section, but more polished.
  • systemd, the init system, uses fanotify‘s mount monitoring capabilities in its readahead implementation.
  • FirefoxOS uses fanotify in their disk space watcher implementation.

That list does not include any project using fanotify‘s file access control feature. Of course, several proprietary antivirus programs do make use of that feature.

Do you know of other projects using fanotify, or are you planning to migrate one? Do not hesitate to leave a comment in the post!


Cross-Compiling DirectFB for an Embedded Device

At Lanedo, we quite often deal with embedded hardware and need to compile packages for it. Most of the time, this is not done on the embedded hardware itself but on our more powerful desktop machines. And most of the time, the desktop machine has an Intel processor while the target device has, for example, an ARM processor, so we have to cross-compile the packages. Consider DirectFB, for example, which we’ve recently been working with, and which is a well behaved piece of software that is used to being cross-compiled.

The Basics

You need two things for cross-compiling: a compiler that can produce binaries for the target platform, and the target platform’s libraries your binaries link against, which is at least low-level stuff like libc, libm etc. You can set up all of that yourself, but this is beyond the scope of this post. Typically, when you have embedded hardware or a development board, the vendor provides a bootloader, kernel image and root filesystem for installation on the device, and also a toolchain for installation on your build machine. The toolchain contains the cross-compiler and a sysroot, which is a tree of files that matches the root filesystem installed on your embedded hardware. When cross-compiling, you can link against the libraries in that sysroot as if they were the ones on your target system.

Getting Started

Let’s assume you have a toolchain and sysroot from your device’s vendor and they install into the following directories:

/opt/vendor/toolchain
/opt/vendor/sysroot

It’s often not enough to just build your packages for the target system; software frequently depends on auxiliary or its own build tools (which need to run on your build system). For building DirectFB from git, we need a tool called fluxcomp, and DirectFB itself builds a tool called directfb-csource which may be needed when building DirectFB applications. We assume we do not want to pollute our build system with this, so we set up a proper development prefix:

mkdir /local/dfb
mkdir /local/dfb/src

In order to set up the environment for that prefix, we create a file that contains all the variables needed:

# host-env.sh

PREFIX=/local/dfb

# set up the stuff we need anyway
export PATH=$PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export ACLOCAL_FLAGS="-I $PREFIX/share/aclocal"
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH

# set up stuff some legacy (non pkg-config) packages might need
export CPPFLAGS=-I$PREFIX/include
export LDFLAGS=-L$PREFIX/lib

# set the terminal title so we always see where we are
echo -en "\033]0;$PREFIX - local x86 env\a"

Now we are ready to build our local build tools:

source host-env.sh

cd flux
./configure --prefix=/local/dfb
make
make install

cd ../DirectFB
./configure --prefix=/local/dfb
make
make install

and we are set for the actual task…

Cross Compiling

In order to do that, we need the compiler and sysroot from above mentioned toolchain. They both live in their own prefixes, which we set up to be used very similarly to the above local environment:

# cross-env.sh

# This is where the cross compiler lives
TOOLCHAIN=/opt/vendor/toolchain

# Sets its binaries to be CC, LD etc.
# (examples from an ARM cross build)
export PATH=$TOOLCHAIN/bin:$PATH
export CC=arm-linux-gnueabi-gcc
export CXX=arm-linux-gnueabi-g++
export AR=arm-linux-gnueabi-ar
export RANLIB=arm-linux-gnueabi-ranlib
export LD=arm-linux-gnueabi-ld

# This is where the libraries of the target platform live
SYSROOT=/opt/vendor/sysroot

# We do NOT set PATH and LD_LIBRARY_PATH here because we
# can't run the binaries from SYSROOT on our build machine,
# but we want to link against libraries that are installed
# there. We also set pkg-config's default directory to the
# one in the SYSROOT, and clear the environment from any
# other local-machine architecture pkg-config path
# directories.
export PKG_CONFIG_LIBDIR=$SYSROOT/usr/lib/pkgconfig
export PKG_CONFIG_PATH=""
export ACLOCAL_FLAGS="-I $SYSROOT/usr/share/aclocal"
export CPPFLAGS=-I$SYSROOT/usr/include
export LDFLAGS=-L$SYSROOT/usr/lib

# This is our local prefix where we will install all
# cross-built stuff first, in order to verify that all
# builds fine. We set it up as proper devel prefix because
# later built packages might need the libraries we have
# installed there earlier. We also don't want to pollute
# SYSROOT with our own stuff.
CROSS_PREFIX=/local/dfb/sysroot

# We do NOT set PATH and LD_LIBRARY_PATH here because we can't
# run the binaries from CROSS_PREFIX on our build machine
export PKG_CONFIG_PATH=$CROSS_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH
export ACLOCAL_FLAGS="-I $CROSS_PREFIX/share/aclocal $ACLOCAL_FLAGS"
export CPPFLAGS="-I$CROSS_PREFIX/include $CPPFLAGS"
export LDFLAGS="-L$CROSS_PREFIX/lib $LDFLAGS"

# set the terminal title and prompt so we always see where we are
echo -en "\033]0;$CROSS_PREFIX - cross ARM env\a"
export PS1="[cross ARM] $PS1"

Now we are ready to start the compiling. It is important to use source on both host-env.sh and then cross-env.sh because we need the tools we built into the local build-machine prefix in order to build the cross compiled packages:

source host-env.sh
source cross-env.sh

cd DirectFB
./configure --prefix=/local/dfb/sysroot --host=arm-linux-gnueabi
make
make install

cd ../DirectFB-examples
./configure --prefix=/local/dfb/sysroot --host=arm-linux-gnueabi
make
make install

In this example we built DirectFB for the “arm-linux-gnueabi” platform. Sometimes, and depending on how your vendor’s toolchain and sysroot are built, it might be necessary to additionally pass --with-sysroot=/opt/vendor/sysroot in order to make sure that the compiler considers this directory to be the root of the system, where things like libc live in the /lib or /usr/lib subdirectories. Not all toolchains need --with-sysroot, so we left it out of the configure options in the above example, to keep it simple. We also built DirectFB-examples, which depends on the previously cross-compiled DirectFB. Depending on what is available on your embedded platform, you might have to pass additional options to DirectFB’s configure, such as --with-gfxdrivers=none or --without-mesa, according to the target you’re building for.

If all that went fine and the files in /local/dfb/sysroot look like they should…

To The Device!

…we are ready to build and install the packages on our embedded device for testing.

We use exactly the same environment we used for cross-compiling, but we use slightly different configure options. Next, you can either mount the device’s SD card, NFS-mount the device’s root filesystem, or just have an empty directory to install to which you can later tar up and untar on the device. We will assume the device’s root file system is mounted at

/media/user/device

So let’s compile for this mount point. Note that we cannot use --prefix=/media/user/device, because --prefix tells configure where the files will be installed on the running system. We compile the packages for --prefix=/usr and just make install them to the mount point:

source host-env.sh
source cross-env.sh

cd DirectFB
# clean the source directory, so all traces of the previous prefix are gone,
# or use a separate source directory for building for device installation
./configure --prefix=/usr --sysconfdir=/etc --host=arm-linux-gnueabi
make
make DESTDIR=/media/user/device install

cd ../DirectFB-examples
# clean the source directory, so all traces of the previous prefix are gone,
# or use a separate source directory for building for device installation
./configure --prefix=/usr --sysconfdir=/etc --host=arm-linux-gnueabi
make
make DESTDIR=/media/user/device install

Now we can verify the installed files in /media/user/device, unmount the volume, and perform a test run on the device. With a little luck, we didn’t mess up and everything works 🙂

If you found these steps useful and have any feedback about how we can improve them, feel free to add a comment below or contact us at Lanedo.

Good luck cross compiling!
