The compiler tool chain is one of the largest and most complex components of any system, and is increasingly based on open source code, either GCC or LLVM. On a Linux system only the operating system kernel and browser will have more lines of code. For a commercial system, the compiler has to be completely reliable: whatever the source code, it should produce correct, high-performance binaries.
So how much does producing this large, complex and essential component cost? Thanks to open source, not as much as you might think. In this post, I provide a real-world case study, which shows that bringing up a new commercially robust compiler tool chain need not be a huge effort.
Analysis with David A. Wheeler’s SLOCCount shows that GCC is over 5 million lines of code. LLVM is smaller at 1.6 million lines, but it is newer, supports only C and C++ by default and includes around one third as many target architectures. However, a useful tool chain needs many more components.
In addition, the tool chain needs testing. In most GNU tools, the regression test suite is included with the main source. For LLVM, however, the regression tests are a separate code base of 500 thousand lines. For any embedded system, it is also likely that a debug server will be needed to talk to the debugger, so that tests can be loaded onto the target.
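To give a feel for what such a debug server handles, here is a minimal sketch of packet framing, assuming the debugger is GDB and the server speaks the GDB remote serial protocol. In that protocol each packet is sent as `$`, the payload, `#`, and a two-digit hexadecimal checksum (the sum of the payload bytes modulo 256). The function names are hypothetical, and escaping, run-length encoding and acknowledgements are deliberately omitted.

```python
def rsp_checksum(payload):
    """Return the modulo-256 sum of the payload bytes."""
    return sum(payload) % 256


def rsp_frame(payload):
    """Wrap a payload as $<payload>#<two hex digits of checksum>."""
    checksum = format(rsp_checksum(payload), "02x").encode("ascii")
    return b"$" + payload + b"#" + checksum


def rsp_unframe(packet):
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if rsp_checksum(payload) != int(packet[-2:], 16):
        raise ValueError("checksum mismatch")
    return payload


print(rsp_frame(b"g"))          # request to read the registers
print(rsp_unframe(b"$OK#9a"))   # a typical success reply
```

A real server adds a transport (TCP or serial) and handlers for the register, memory and run-control packets on top of this framing.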
Our interest is in a port of the tool chain that is robust for commercial deployment. Many PhD students round the world port compilers for their research, but their effort is dedicated to exploring a particular research theme. The resulting compiler is often produced quickly, but is neither complete, nor reliable—since this is not the point of a research program.
This article is instead concerned with creating a set of tools which reliably produce correct and efficient binaries for any source program in a commercial/industrial environment.
Fortunately, most of this huge code base is completely generic. All mainstream compiler tools go to considerable effort to provide a clean separation of target-specific code, so porting a compiler tool chain to a new architecture is a manageable task. There are five stages in porting a compiler tool chain to a new target.
Let us consider the general case: a new architecture with a large external user base, which must support C and C++ for both bare metal and embedded Linux targets. In this case it is likely that the architecture provides a range of implementations, from small processors used bare metal or with an RTOS in deeply embedded systems, to large processors capable of supporting a full application Linux environment.
Overall, the first production release of such a tool chain takes 1-3 engineer years. The initial proof-of-concept tool chain should be completed in 3 months. Implementation of all the functionality then takes a further 6-9 months, with a further 3 months if both bare metal and Linux targets are to be supported.
Production testing takes at least 6 months, but with a large range of customer-specific testing this can be as long as 12 months. Initial roll-out takes 3 months, but with a large user base, phased general release can take up to 9 months more.
Maintenance effort depends hugely on the size of the customer base reporting issues and the number of new features needed. It can be as little as 0.5 engineer months per month, but is more usually 1 engineer month per month.
It is important to note that a complete team of engineers will work on this: compiler specialists, debugger experts, library implementation engineers and so on. Compiler engineering is one of the most technically demanding disciplines in computing, and no one engineer can have all the expertise needed.
Not everyone needs a full compiler release for a large number of external users. There are numerous application-specific processors, particularly DSPs, which are used solely in-house by one engineering company. Where such processors have proved commercially successful, they have been developed further, and what was once a tiny core programmed in assembler by one engineer has become a much more powerful processor with a large team of assembler programmers. In such cases moving to C compilation would mean a great increase in productivity and a reduction in cost.
For such use cases, the tool chain need only support C, not C++, and a minimal C library is sufficient. There may well be a pre-existing assembler and linker that can be reused. This greatly reduces the effort and timescales, to as little as one engineer year for a full production compiler.
The proof-of-concept still takes 3 months, but then completing full functionality can be achieved in as little as 3 more months. Production testing is still the largest effort, taking 3-6 months, but with a small user base 3 months is more than sufficient for roll out.
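As a rough cross-check, the phase figures quoted for the general case and for this simpler case can be added up. The sketch below assumes the phases run back to back and reads every figure as elapsed months, which the article does not state; its headline figures are given as engineer effort, so the two need not coincide. The dictionary and function names are illustrative only.

```python
# Phase durations in months as (minimum, maximum), taken from the figures in the text.
GENERAL_CASE = {
    "proof of concept": (3, 3),
    "full functionality": (6, 9),
    "bare metal plus Linux": (0, 3),   # extra 3 months only if both are supported
    "production testing": (6, 12),
    "roll-out": (3, 12),               # 3 months, plus up to 9 more for phased release
}

SIMPLE_CASE = {
    "proof of concept": (3, 3),
    "full functionality": (3, 3),      # "as little as 3"; no upper bound given, assumed equal
    "production testing": (3, 6),
    "roll-out": (3, 3),
}


def total_months(phases):
    """Sum the minimum and maximum durations over all phases."""
    low = sum(minimum for minimum, _ in phases.values())
    high = sum(maximum for _, maximum in phases.values())
    return low, high


print("general case:", total_months(GENERAL_CASE))  # (18, 39)
print("simple case:", total_months(SIMPLE_CASE))    # (12, 15)
```

Under these assumptions the simpler case comes to 12-15 months, in line with the figure of as little as one engineer year.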
The tool chain still needs to be maintained, but for this simpler system with a small user base, an effort of 0.25 engineer months/month is typically enough.
For the smallest customers, it can be sufficient to stop after completing full functionality. If there are only a handful of standard programs to be compiled, it may be enough to demonstrate that the compiler handles these well and efficiently without progressing to full production testing.
In 2016, Embecosm was approached by an electronic design company, who for many years had used an in-house 16-bit word-addressed DSP designed to meet the needs of their specialist area. This DSP was now on its third generation and they were conscious that they needed a great deal of assembler programming effort. This was aggravated by the standard codecs on which they relied having C reference implementations. They had an existing compiler, but it was very old and porting it to the new generation DSP was not feasible.
Embecosm was tasked with providing an LLVM-based tool chain capable of compiling their C codecs and delivering high quality code. There was an assumption that this code would then be hand-modified if necessary. They had an existing assembler/linker, which worked by combining all the assembler into a single source file, resolving cross references and generating a binary file to load onto the DSP. The customer was also keen to build up in-house compiler expertise, so one of their engineers joined the Embecosm implementation team and has been maintaining the compiler since the end of the project.
In the first 3 months, we created a tool chain based on their existing assembler/disassembler. In order to use newlib, we created a pseudo-linker, which would extract the required files from newlib as assembler source to combine with the test program. Because silicon was not yet available, we tested against a Verilator model of the chip. For this we wrote a gdbserver, allowing GDB to talk to the model. In the absence of ELF, source-level debugging was not possible, but GDB was capable of loading programs and determining results, which was sufficient for testing purposes. In the absence of 16-bit char support in LLVM, we used packed chars for the proof-of-concept. This meant many character-based programs would not work, but was sufficient for this stage.
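The pseudo-linker idea can be sketched as a reachability walk over symbols. This is only an illustration of the general technique, not the tool that was actually built: the library description (a mapping from file name to the symbols it defines and references), the file names and the function name are all hypothetical.

```python
def select_library_files(program_refs, library):
    """Return the library files needed, directly or indirectly, by a program.

    program_refs: symbols the test program references.
    library: maps file name -> {"defines": set of symbols, "refs": set of symbols}.
    """
    # Index every library symbol by the file that defines it.
    defined_in = {
        symbol: name
        for name, info in library.items()
        for symbol in info["defines"]
    }
    selected = set()
    pending = list(program_refs)
    while pending:
        symbol = pending.pop()
        name = defined_in.get(symbol)
        if name is None or name in selected:
            continue  # not a library symbol, or file already selected
        selected.add(name)
        # A selected file may itself need further library files.
        pending.extend(library[name]["refs"])
    return sorted(selected)


library = {
    "puts.s": {"defines": {"puts"}, "refs": {"strlen", "write"}},
    "strlen.s": {"defines": {"strlen"}, "refs": set()},
    "write.s": {"defines": {"write"}, "refs": set()},
    "qsort.s": {"defines": {"qsort"}, "refs": set()},
}
print(select_library_files({"puts", "main"}, library))
```

The selected files would then be concatenated with the program's own assembler and handed to the existing single-file assembler/linker.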
This allowed us to compile representative test programs and demonstrate that the compiler tool chain would work. It became clear that there were two major obstacles to achieving full functionality: 1) lack of ELF binary support; and 2) lack of proper 16-bit character support.
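The character problem can be illustrated with a small model of word-addressed memory. This is a sketch under assumptions: two 8-bit characters are packed into each 16-bit word with the first character in the low byte, which is not stated in the article, and the function names are hypothetical.

```python
def pack_chars(text):
    """Pack 8-bit characters two per 16-bit word, first character in the low byte."""
    words = []
    for i in range(0, len(text), 2):
        low = text[i]
        high = text[i + 1] if i + 1 < len(text) else 0
        words.append(low | (high << 8))
    return words


def read_packed_char(words, index):
    """Fetch character `index`: needs a word address plus a byte selector."""
    word = words[index // 2]
    return (word >> (8 * (index % 2))) & 0xFF


def wide_chars(text):
    """With a 16-bit char, each character occupies one addressable word."""
    return list(text)


packed = pack_chars(b"abc")
print([hex(word) for word in packed])    # two words hold three characters
print(chr(read_packed_char(packed, 1)))  # 'b' sits in the top half of word 0
print(wide_chars(b"abc"))                # one word per character
```

With packed characters, the second character has no address of its own on a word-addressed machine, so code that takes a pointer to an individual character cannot work unchanged. Making char 16 bits wide gives every character its own word address at the cost of memory.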
For phase two, we implemented a GNU assembler/disassembler using CGEN, which required approximately 10 days of effort. We also implemented 16-bit character support for LLVM as documented in this blog post. With these two features, completing the tool chain functionality became much more straightforward and we were able to run the standard LLVM lit and GCC regression tests for the tool chain, the great majority of which passed. The DSP has a number of specialist modes for providing saturating fixed-point arithmetic. To support these we implemented specialist builtin and intrinsic functions.
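The article does not describe the DSP's saturating modes in detail, so the following is only a generic model of what saturating fixed-point operations compute. It assumes the common Q15 format (16-bit signed values with 15 fractional bits), and the function names are hypothetical rather than the builtins that were implemented.

```python
Q15_MIN = -(1 << 15)      # -32768, representing -1.0
Q15_MAX = (1 << 15) - 1   #  32767, just under +1.0


def saturate_q15(value):
    """Clamp a result to the Q15 range instead of letting it wrap around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def sat_add_q15(a, b):
    """Saturating addition of two Q15 values."""
    return saturate_q15(a + b)


def sat_mul_q15(a, b):
    """Saturating multiplication: the Q30 product is shifted back to Q15."""
    return saturate_q15((a * b) >> 15)


print(sat_add_q15(30000, 10000))    # clamps at 32767 rather than wrapping negative
print(sat_mul_q15(16384, 16384))    # 0.5 * 0.5 = 0.25, i.e. 8192
print(sat_mul_q15(-32768, -32768))  # -1.0 * -1.0 does not fit, clamps at 32767
```

Exposing such operations as builtins lets C code request them directly, so that the compiler can map each call to the corresponding hardware mode instead of expanding the clamping logic into ordinary instructions.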
At this point we had a compiler which correctly compiled the customer’s code. The ELF support meant techniques such as link-time optimization (LTO) and garbage collection of sections were possible, leading to successful optimization of the code so it met the memory constraints of the customer. With an investment of 120 engineer days, they had achieved their goal of being able to compile C code efficiently for their new DSP.
The customer concluded that they had all the functionality they needed at this point and that no further work was required. Should they later want to make the compiler more widely available, they have the option to continue with full production testing of the compiler tool chain.
Two factors made it possible to deliver a fully functional compiler tool chain in 120 engineer days.
If you would like to know more about bringing up a compiler tool chain for your processor, please get in touch.