Last weekend was the seventeenth edition of one of the largest and most important events on the Open Source calendar, which of course is FOSDEM. As usual the event was bustling with people discussing the various open source projects they use and contribute to. Impressively, the website describes there now being 8000+ people attending each year, which means there’s always a good spectrum of talks.
On Saturday, I attended various talks relating to the area of build infrastructure — as this is of general interest to me, whilst also being helpful for Embecosm — in addition to a couple of talks in more wide-reaching areas, notably licensing.
The first keynote I attended was on Software Heritage, which aims to preserve the cultural history of the software world. The impression I got was that it’s best described as a software-specific version of archive.org, holding as much free and open source software as it can. The archive is currently around 150TB and, given that it is all de-duplicated source files, that’s an impressive feat.
On the infrastructure front, I attended a talk on Bazel, the build tool that Google use internally for massive projects and have been open sourcing over the past couple of years. It is designed for very large projects that are worked on by thousands of engineers concurrently. Whilst at the current stage it might not give me much benefit over CMake-generated Ninja files, I look forward to seeing more about the distributed cache feature coming at some point in the future: even in medium-sized teams, the idea of having one person trigger a build and everyone else reuse the intermediate files is compelling.
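To make the shared-cache idea concrete, here is a minimal sketch of a cache keyed on a build action's inputs. It is not Bazel's API or implementation; `SharedBuildCache`, `getOrBuild` and `buildsRun` are hypothetical names invented for illustration, and a real distributed cache would hash file contents and live on a server rather than in one process.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for a cache shared between developers.
// Outputs are looked up by the inputs of the build action (source text and
// flags), so identical inputs never need to be built twice.
class SharedBuildCache {
public:
  // Returns the cached output for these inputs, running `build` only on a miss.
  std::string getOrBuild(const std::string &source, const std::string &flags,
                         const std::function<std::string()> &build) {
    // The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    const std::string key = source + '\0' + flags;
    auto it = outputs_.find(key);
    if (it != outputs_.end())
      return it->second; // Hit: someone already built this.
    ++buildsRun_;
    return outputs_.emplace(key, build()).first->second;
  }

  // Number of times a build actually had to run.
  std::size_t buildsRun() const { return buildsRun_; }

private:
  std::unordered_map<std::string, std::string> outputs_;
  std::size_t buildsRun_ = 0;
};
```

If two people request the same source with the same flags, only the first request runs the build; changing either input produces a different key and therefore a fresh build.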
More eye-opening in this area was the talk on Continuous Integration at a Distribution Level, which covered how Ubuntu makes use of CI to make sure that package updates don’t break anything. The idea of a small change to a library generating potentially days of testing sounds overwhelming, but when that package is going to be used by millions of people, it also sounds necessary.
On the licensing front, the last keynote of the day was very well received; it concerned the GPL and whether legal enforcement has been a good thing or not. Whilst there was no consensus in the room about how aggressive GPL enforcement should be, it was a very good talk and, if you only catch up on one video from FOSDEM, I highly recommend this one.
Day two
Most of Sunday was spent in the LLVM Toolchain devroom, as that is my main focus right now and it is always useful to have a good understanding of recent developments, so that we can produce better toolchains for our customers.
Embecosm were giving two talks, as FOSDEM is a good place to start off discussions about making large infrastructure changes to LLVM, with the aim of continuing these discussions at later events such as EuroLLVM.
The first talk discussed the solution to a problem that many people in the LLVM community have solved independently in similar-sounding ways: the issues caused by chars not being 8-bit. Like many pieces of software, LLVM unfortunately assumes that chars are 8-bit, and therefore that 8 bits is a valid degree of addressability for all architectures it generates code for; however, this is not the case for all processors. My colleague, Ed, described the changes which must be made to Clang and LLVM in order to seamlessly support architectures with non-8-bit chars, and discussed our plans for upstreaming these changes based on our AAP architecture.
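As a small illustration of the kind of assumption involved (this is not code from the talk or from LLVM), the sketch below contrasts a helper that hard-codes 8 bits per char with ones that take the char width as a parameter. The function names are hypothetical, and the 16-bit width in the usage note is just an example value.

```cpp
#include <climits>
#include <cstddef>

// Fragile: silently assumes every addressable unit (char) holds 8 bits.
constexpr std::size_t bitsAssuming8(std::size_t chars) { return chars * 8; }

// Portable: the char width is a parameter, defaulting to the host's CHAR_BIT.
constexpr std::size_t bitsPortable(std::size_t chars,
                                   std::size_t bitsPerChar = CHAR_BIT) {
  return chars * bitsPerChar;
}

// Number of chars needed to hold `bits` bits, rounding up.
constexpr std::size_t charsForBits(std::size_t bits,
                                   std::size_t bitsPerChar = CHAR_BIT) {
  return (bits + bitsPerChar - 1) / bitsPerChar;
}
```

On a target with 16-bit chars, a 32-bit value occupies `charsForBits(32, 16)`, which is 2 addressable units rather than 4, so any size or offset computed with a hard-coded 8 would be wrong.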
In the second talk Jeremy presented some initial thoughts on integrating security features into the compiler. The idea here is that although a compiler cannot automatically make non-secure code secure, we can do a lot more to help prevent users from making simple mistakes. Particular focus is on minimising data leakage in the compiler, and assisting with the kind of transformations that are commonly done by hand in writing secure code, with both of these building on research from the LADA project.
For me, the two most interesting talks in the devroom were the ones on GlobalISel, a new instruction selection process that has recently been added to LLVM and that seems to be the direction LLVM’s instruction selector is going in. This is of interest to me because, if GlobalISel replaces SelectionDAGISel (the existing instruction selector), I would need to re-implement a large chunk of the back-ends we work on, replacing the set of hooks for one instruction selector with the equivalent for this new approach, and it’s useful to understand how such a change should be made.
The other talk of particular interest was on LLD from a user’s perspective. LLD is a system linker using the LLVM framework, which aims to be a lot faster than other linkers, primarily by doing less and having fewer layers of abstraction slowing down the link process. It reminds me a lot of the motivations behind the gold linker, which has proved to be much faster than the BFD-based linker. Whilst LLD is currently geared towards the non-embedded case, it’s nice to see how the project is moving along, bringing us one step closer to a full LLVM-only toolchain.