Services - tools - models - for embedded software development

7.3.3.  Compiler Profiling

Modern compilers, such as the GNU C++ compiler, can optimize based on statistics from earlier runs of the compiled program. The program is first compiled with options to gather statistics, then run to create those statistics, and finally recompiled using them.

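The latest versions of the GNU C++ compiler can use these statistics to drive the optimizations selected by the options -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops and -ftracer.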

Some care is needed when using branch profiling. It can interact badly with other systems (for example ccache). Although it has been part of the GNU C++ compiler for some years, it must still be regarded as somewhat experimental.

Profiling is enabled with the example Makefile by using the verilate-fast target. Statistics are gathered by compiling the model with the -ftest-coverage and -fprofile-generate options and then running it. The options to be used in the subsequent optimizing recompile are passed as a macro, PROF_OPTS, for example:

make verilate-fast COMMAND_FILE=cf-optimized-8.scr NUM_RUNS=1000 \
     OPT="-O3" PROF_OPTS="-fbranch-probabilities"
	

Table 7.4 shows the impact of the different profiling options on the example design when compiled with the -Os option, the fastest option without profiling. The options are applied incrementally, in the order -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops and -ftracer.

Run Description               Build Time   Run Time   Performance
----------------------------  ----------   --------   -----------
No profile optimization       26.23 s      12.24 s    96.41 kHz
Add -fbranch-probabilities    72.44 s      11.94 s    98.79 kHz
Add -fvpt                     73.88 s      11.93 s    98.93 kHz
Add -funroll-loops            72.63 s      12.00 s    98.30 kHz
Add -fpeel-loops              72.65 s      12.02 s    98.17 kHz
Add -ftracer                  72.65 s      11.99 s    98.42 kHz

Table 7.4.  Comparison of model performance using -Os and profiling.
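The incremental option sets behind these rows could be generated with a loop like the following. This is a sketch that only prints the make command for each row rather than running it. It assumes the command file and run count from the earlier example invocation, which may not match the values used for the measurements.

```shell
#!/bin/sh
# Build up PROF_OPTS one option at a time, in the order used for the table,
# and print (not run) the corresponding make command for each step.
prof_opts=""
for opt in -fbranch-probabilities -fvpt -funroll-loops -fpeel-loops -ftracer
do
    # Append the next option, adding a separating space after the first one.
    prof_opts="${prof_opts:+$prof_opts }$opt"
    echo "make verilate-fast COMMAND_FILE=cf-optimized-8.scr NUM_RUNS=1000" \
         "OPT=\"-Os\" PROF_OPTS=\"$prof_opts\""
done
```

Replacing echo with a real invocation would run each build in turn. Changing OPT to "-O3" gives the corresponding sequence for that optimization level.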


Model build times are all substantially longer because of the need to do a statistics-gathering build and run. The results improve slightly for the first two optimizations (-fbranch-probabilities and -fvpt), but then fall off. This is not surprising. The benefit of -Os is compact code. However, -funroll-loops, -fpeel-loops and -ftracer all tend to increase code size, reducing the caching benefit of using -Os.

The added effort of profile-directed compilation cannot be justified when using -Os.

The same exercise is repeated, but this time to see the effect on a compile using option -O3. The results are in Table 7.5.

Run Description               Build Time   Run Time   Performance
----------------------------  ----------   --------   -----------
No profile optimization       35.35 s      12.39 s    95.25 kHz
Add -fbranch-probabilities    83.51 s      9.36 s     126.10 kHz
Add -fvpt                     83.28 s      9.34 s     126.39 kHz
Add -funroll-loops            83.78 s      9.34 s     126.39 kHz
Add -fpeel-loops              84.61 s      9.27 s     127.32 kHz
Add -ftracer                  85.87 s      9.13 s     129.28 kHz

Table 7.5.  Comparison of model performance using -O3 and profiling.


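The results are dramatic. With -fbranch-probabilities alone, performance rises from 95.25 kHz to 126.10 kHz, an improvement of about 32%. That option gives the majority of the benefit, but cumulatively the other four options raise performance further, to 129.28 kHz. This is about 31% better than the best result obtained with -Os (98.93 kHz).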
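The relative gains can be checked directly from the performance columns of Tables 7.4 and 7.5. The following is a small sketch of that arithmetic. The helper name pct_gain is hypothetical, and awk is assumed to be available for the floating-point calculation.

```shell
#!/bin/sh
# pct_gain BASE NEW: print the percentage increase of NEW over BASE,
# rounded to one decimal place.
pct_gain() {
    LC_ALL=C awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# Figures in kHz, taken from Tables 7.4 and 7.5.
branch_gain=$(pct_gain 95.25 126.10)   # -O3: unprofiled vs -fbranch-probabilities
all_gain=$(pct_gain 95.25 129.28)      # -O3: unprofiled vs all five options
vs_os=$(pct_gain 98.93 129.28)         # best -Os result vs best -O3 result

echo "-fbranch-probabilities alone: ${branch_gain}%"
echo "all five options:             ${all_gain}%"
echo "best -O3 over best -Os:       ${vs_os}%"
```

These are gains in simulated clock rate. The build times in the tables show the cost at which they are obtained.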

The guideline advice is to use -O3 rather than -Os if you have the opportunity to profile your design.
