Notwithstanding the Brussels public transport system, we made it to Day 2 good and early. At 9AM I turned up to hear Carlos Machado speak about Sign Language Free Linguistic Resources. Unfortunately, Carlos did not arrive, presumably a victim of the public transport roadworks program. The replacement, however, was an informal presentation of the work covered by the Coding for Language Communities program, which deals with minority languages from around the world. How do you get natural language processing tools built for mainstream languages to work with other languages? Many of these tools work because of huge datasets; think how many English texts and spoken transcriptions live on the Internet. Niche languages, possibly including languages that have no written form, need techniques that can work with relatively small datasets. The goal is not only to preserve these languages, but to provide free tools for their communities. One example is the 1,000 languages of Indigenous Tweets.
The LLVM dev room was popular, but I managed to get in. Arnaud Grandmaison provided a good overview of the range of tools surrounding the main LLVM project. I had not realized that sanitizers were first incorporated in LLVM 3.1 and then ported to GCC (release 4.8 in March 2013). He also gave a timely reminder of the importance of LLVM’s linting tool, clang-tidy.
The two LLVM talks I really wanted to see, of course, were Ed Jones’ overview of AAP and Simon Cook’s talk on building the AAP simulator using LLVM MC. This is a big initiative from Embecosm, and it is good to see it getting some traction in the wider LLVM community. It is always a good sign when a talk provokes a lot of questions and discussion.
The legal devroom is never a dull place and talks are often packed out. I went to hear Jeremiah Foster on the intersection of FOSS and safety-critical software. This was not entirely out of curiosity: we have customers asking for ISO 26262 compliance (automotive functional safety). I learned for the first time what SIL (safety integrity level) means. GPL code is rejected by regulators because of the right to modify the software, yet we allow ordinary people to change their tyres or modify their vehicles. Why should that not also apply to software? Copyleft in particular is an excellent tool to ensure the transparency that regulators require. There was a good discussion of how to gain SIL accreditation for compilers. GCC is 5-7 million lines of 30-year-old code. It seems the best approach is “proven in use”, the approach used.
The last talk I attended before heading to the station was from Ian Romanick, on simulation as an aid to developing software for hardware. This is something we almost always have to do for compilers, but it is much less common elsewhere. Ian was exploring issues we take for granted in compiler work: the ability to develop software without hardware locking up, introspection into the hardware, the ability to progress without hardware being available, and the ability to add validation to the simulator (“you tried to write to a read-only register”). He gave a good analysis of when HDL models are good and when they are bad, and covered the options to instrument software via macros (good for new projects) or by adding instrumentation to the kernel interface (for old projects).
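
To make the simulator validation point concrete, here is a minimal sketch of my own in Python, not anything shown in the talk. The class name `SimulatedDevice`, the register names `STATUS` and `CONTROL`, and the error type are all hypothetical. It assumes a device modelled as a handful of named registers, some of them read-only, and shows how a simulator can reject a bad write and keep an access log for introspection, which real hardware would not do for you.

```python
class RegisterAccessError(Exception):
    """Raised when driver code misuses a simulated register."""


class SimulatedDevice:
    """Toy register-level model of a device (all names are hypothetical)."""

    def __init__(self):
        # Register name -> [current value, writable flag].
        self._regs = {
            "STATUS": [0x1, False],   # read-only in this sketch
            "CONTROL": [0x0, True],   # read-write
        }
        # Every successful access is recorded for later inspection.
        self.log = []

    def _lookup(self, name):
        if name not in self._regs:
            raise RegisterAccessError(f"no such register: {name}")
        return self._regs[name]

    def read(self, name):
        value = self._lookup(name)[0]
        self.log.append(("read", name, value))
        return value

    def write(self, name, value):
        reg = self._lookup(name)
        if not reg[1]:
            # The validation step: fail loudly instead of silently ignoring.
            raise RegisterAccessError(
                f"you tried to write to a read-only register: {name}"
            )
        reg[0] = value
        self.log.append(("write", name, value))
```

Driver code under test would call `write("CONTROL", ...)` and `read("STATUS")` as usual; a stray `write("STATUS", ...)` raises immediately, and `log` gives a trace of what the software actually did to the device.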