
The original GNU Superoptimizer (GSO) was pioneering, but has some limitations: it only handles arithmetic instructions, delivers a single result, generates impossible sequences, supports only a carry flag, and has a very simple cost model. Its big advantage is that it is fast.
Since GSO was created in 1991, there have been many advances in superoptimization. Rather than being a single program, GSO 2.0 is a tool kit, allowing multiple approaches to be used.

The configurability falls into four categories: machine architecture, iteration approach, testing approach and parallelisation.
The machine state describes what the instructions can access.
The instructions are the operations the processor can perform. This is the "simulation" aspect of the superoptimizer, and could potentially be generated automatically by tools such as CGEN.
The slots describe what the superoptimizer can change in a sequence, for example slots containing registers
add r0, r1
or a slot containing constants
add r0, #128
Here is an example for the x86 architecture.
x86_Machine mach;
...
// add ax, bx
Instruction *add = new x86_add();
vector<Slot*> slots = add->getSlots();
slots[0]->setValue(0);
slots[1]->setValue(1);
add->execute(&mach, slots);
A number of iteration approaches are supported.
Brute force iteration is the original approach: the superoptimizer iterates over a sequence of items, visiting every combination.
vector<int> values = {0,1,2,3};
vector<vector<int>::iterator> indices;
indices.push_back(values.begin());
indices.push_back(values.begin());
do {
...
} while(bruteforceIterate(values, indices));
When iterating over register slots, iteration is optimized by ensuring that each register renaming is tested only once.
vector<Slot*> slots;
...
canonicalIterator c_iter(slots);
do {
...
} while(c_iter.next());
Iteration over constant slots is optimized by only testing the most frequently used constants.
vector<Slot*> slots;
...
constantIterator cn_iter(slots);
do {
...
} while(cn_iter.next());
Other approaches to iteration are possible. The stochastic iterator allows for directed exploration of the search space, similar to that provided in STOKE.
Central to superoptimization is testing whether a valid sequence has been found.
The first approach, instruction sequence testing, checks a candidate sequence of instructions for equivalent behavior against a goal sequence.
The testing is carried out by executing the sequence, with support for multiple tests to provide probabilistic correctness. For example:
vector<Instruction*> insns, goal_insns;
vector<Slot*> slots, goal_slots;
... set up the above variables ...
bool success = testEquivalence(
insns, slots,
goal_insns, goal_slots);
With this approach, we verify that the instructions are equivalent for all possible inputs. The approach has two steps.
This guarantees correctness, but is expensive; instruction sequence testing should be used first for quick exclusion.
When using superoptimization to create a peephole optimization pass for a compiler, it is useful to be able to test a large number of sequences simultaneously. This allows many equivalent sequences to be found at once, and these sequences form new peephole optimizations. The approach is described in a 2006 paper by Sorav Bansal and Alex Aiken.
Brute force superoptimization is an embarrassingly parallel problem. GSO 2.0 is designed to take advantage of this on multiprocessor machines. It uses a generic master-slave MPI-based work queue.
This is an example of the master code:
vector<vector<int>> instruction_ids = {{0, 1, 2},
{0, 1, 2},
{0, 1, 2}};
...
distributeTasks(instruction_ids);
This is an example of the slave code:
vector<int> insn_ids;
while(getWork(insn_ids)) {
...
// computation with insn_ids
...
}
The work items handed to the slaves are combinations of the instruction IDs, for example:
{0, 0, 0}
{0, 0, 1}
{0, 0, 2}
{0, 2, 0}
...
GSO 2.0 is a work in progress. The first version was presented at the GNU Tools Cauldron in 2015. The current code base can be found on the Embecosm GitHub. We welcome contributions from other developers.