Preparation
MAGEEC interfaces to the compiler through its command line, allowing individual optimizations and associated parameters to be set. The system is first trained on a large body of code, representing the broad spread of software with which the compiler is to be used. It is important that the programs provide the full range of language features, and that the spread is balanced. So we need programs which between them have a good mix of
- small and large functions;
- small, large and nested loops;
- no loops;
- integer arithmetic of all sizes;
- floating point arithmetic of all sizes; and
- scalars, small structs/vectors and large structs/vectors.
Individual programs will have only one or two of these features, providing plenty of differentiation from which optimizations can learn. A good training set should have at least 50 programs. For deeply embedded systems, the BEEBS test suite, developed by Embecosm and the University of Bristol provides an appropriate mix of programs.
Preparing the Training Set of Data
Combined elimination is used to find the best flag combination for each program in the test suite. For this technique, the program is first compiled with all optimizations enabled and its performance measured. The criterion for “performance” can be execution speed, code size, or energy consumed when the program runs. It is important that the figure is accurate and reproducible, so many runs may be needed to eliminate statistical variation, particularly on machines with caches.
The program is then recompiled, disabling one optimization in turn and the performance again measured. It is likely that some optimizations will be counter-productive with the program under test, and so removing those optimizations will increase performance. The optimization whose removal most improves performance is then removed from the set of optimizations to be applied.
The process is then repeated to remove another optimization from the set. The process of elimination is repeated until no optimization can increase performance by its removal. At this point we have the best set of optimizations to use with this particular program.
Almost all optimizations are turn on/turn off flags. However for optimizations which are parameterized, the process of elimination must explore changing the parameter value. This same approach can also be used to explore the values of tuning parameters associated with individual optimizations.
Once this approach has been completed, we have our training data: a set of programs, each with their own set of optimal optimizations.
Training
We now need to learn how the characteristics of the training programs predicts which optimizations will be best. For each program we use a set of abstract features, for example number of variables in a function, number of dominators of each basic block and so forth. We can use the compiler to generate this information, since it is part of the standard front-end analysis carried out when compiling.
Given the relatively small set of data, training is not usually too complex. We do not need deep training techniques such as neural networks. We have worked with a number of techniques, with one of the best being to use a C4.5 or C5 classifier to create a decision tree based on the features of the source program.
Using the Trained Compiler
When given a new program to compile, MAGEEC, the compiler determines the features of the source program and then applies those features to the decision tree from training to choose the best optimization set. The relatively small size of the training set means there may not be a good match, so one option is to use a nearest neighbour approach to find the most similar program when training and use its optimization settings. This is fast, with no significant impact on compile time performance.
The nearest neighbour approach can be extended to suggest several nearest neighbours, and the program can be compiled and its performance measured with each of these settings to determine which is best. This is not as simple as straight computation, but much faster and simpler than combined elimination. The data from this exercise can also be used to extend the trained database.
Implementing MAGEEC for a New Architecture
There are always optimization passes which are specific to an architecture, and which can only be used when training on that architecture. Different optimizations work differently on different architectures. So the training set and training database must be recreated for each new architecture.
Future Research & Development by Embecosm
MAGEEC is still an active research program. There are a number of areas for future development.
- We wish to tune optimization flags for individual source files and eventually for individual source functions. This will give us a much larger training set for the machine learning phase, improving the quality of generated code.
- The choice of abstract features in the source code is critical to good machine learning. The current set were created for the earlier MILEPOST EU research project, but lack sufficient rigour in their choice. Principal component analysis (PCA) will allow us to identify a better set, but it requires much larger training sets. This should be possible once we are able to train on individual source files and functions.
- We have to date used simple machine learning techniques. Preliminary work with deeper techniques, such as genetic algorithms suggest these can yield much better results, which we wish to explore further.