Introduction

With gperftools you can profile a program and generate a call graph. Profiling can be done at several levels of detail, down to individual source lines. Another advantage is that your binary can be used as-is. The tools work in principle with any dynamically linked binary built with the GNU or Intel compilers.

Usage

From the 2022 software environment, the tools are activated with the command:

module load gperftools/2.10-GCCcore-11.3.0

The easiest method is to set the environment variable '''LD_PRELOAD''' to libprofiler.so at runtime; no recompilation or relinking is required. A profile is written during the run of the program if the environment variable '''CPUPROFILE''' is set to the name of the file that will contain the profiling information:

export LD_PRELOAD=libprofiler.so
export CPUPROFILE=profile.prof
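
As a minimal sketch, the two variables can also be set for a single command only, so that other programs started from the same shell are not profiled. The helper name run_profiled is hypothetical and not part of gperftools; it assumes that the dynamic loader can find libprofiler.so, as is the case after loading the module.

```shell
# Hypothetical helper (not part of gperftools): run one command under the
# CPU profiler without exporting the variables to the whole session.
# Usage: run_profiled <profile-file> <command> [args...]
run_profiled() {
    profile_file=$1
    shift
    # The two assignments apply only to the environment of this one command.
    LD_PRELOAD=libprofiler.so CPUPROFILE="$profile_file" "$@"
}

# Example (my_program is a placeholder for your own binary):
#   run_profiled profile.prof ./my_program arg1
```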

If you'd like to use profiling at the source-line level, compile your program with the '''-g''' flag and link with the '''-lprofiler''' flag.

The program '''pprof''' reads the file named in '''CPUPROFILE''' to produce a profile or a call graph.

Important: gperftools does not work with statically linked programs.

To profile an MPI program

We have added an extra script, srun_gperftools, to gperftools so that it can be used to profile MPI applications. The script sets a different CPUPROFILE for each MPI rank, so you get as many profiles as there are MPI ranks, e.g.:

module load gperftools/2.10-GCCcore-11.3.0
srun -n 2 srun_gperftools my_mpiprogram arg1

This will generate two profiles, profile.out_0 and profile.out_1.
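
The contents of srun_gperftools are not shown here. The following is only a sketch of how such a per-rank wrapper could work, assuming it derives the file name from the SLURM_PROCID variable that srun sets for every task. The function names rank_profile_name and run_rank_profiled are hypothetical.

```shell
# Sketch only: this is NOT the actual srun_gperftools script.
# Assumption: srun exports SLURM_PROCID (the rank of the task) to each task.

# Print the profile file name for the current rank, e.g. profile.out_0.
# An optional first argument replaces the default base name "profile.out".
rank_profile_name() {
    base=${1:-profile.out}
    echo "${base}_${SLURM_PROCID:-0}"
}

# Run a command with the profiler preloaded and a rank-specific CPUPROFILE.
run_rank_profiled() {
    LD_PRELOAD=libprofiler.so CPUPROFILE="$(rank_profile_name)" "$@"
}

# Inside a wrapper script started by srun, the last line would be:
#   run_rank_profiled "$@"
```

Because every rank writes to its own file, the ranks do not overwrite each other's profiles.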

Example

This example demonstrates how to compile and link a simple program, how to display a line-level profile, and how to generate a call graph.

> module load gperftools/2.10-GCCcore-11.3.0
> gfortran -O2 -g weps8.f -lprofiler
> export CPUPROFILE=a.prof   # a.prof will contain the profiling information
> ./a.out
> pprof --lines a.out a.prof
Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 265 samples
      45  17.0%  17.0%       45  17.0% ludcmp_ weps8.f:415
      40  15.1%  32.1%       40  15.1% tstran_ weps8.f:251
      39  14.7%  46.8%       39  14.7% sort_ weps8.f:621
      29  10.9%  57.7%       29  10.9% ludcmp_ weps8.f:404
      27  10.2%  67.9%       27  10.2% sort_ weps8.f:623
      10   3.8%  71.7%       10   3.8% ran1_ weps8.f:544
       8   3.0%  74.7%        8   3.0% ran1_ weps8.f:533
       7   2.6%  77.4%        7   2.6% ran1_ weps8.f:536
       6   2.3%  79.6%       44  16.6% tstran_ weps8.f:252
       6   2.3%  81.9%        6   2.3% times ??:0
       6   2.3%  84.2%        6   2.3% sort_ weps8.f:619
(pprof) quit
> pprof -pdf a.out a.prof > a.pdf    # create a call-graph in a pdf formatted file

In this example, we see that 17.0% of the samples (45 of 265), and hence about 17% of the run time, fall on line 415 of the file weps8.f, in subroutine ludcmp.
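
In the listing, the first column is the number of samples taken at that line and the fourth column is the number of samples including callees. Because several lines can belong to the same subroutine, it can help to sum the first column per subroutine. The sketch below does this with awk. The function name sum_by_routine is hypothetical, and the sketch assumes that the listing was produced non-interactively (for instance with pprof's --text option) and has the six-column layout shown above.

```shell
# Hypothetical helper: sum the flat sample counts (column 1) per routine
# (column 6) from a line-level pprof text listing read on standard input.
# Output: one line per routine with name, samples, and share of the total,
# sorted by descending sample count.
sum_by_routine() {
    awk '
        # Header line, e.g. "Total: 265 samples"
        $1 == "Total:" { total = $2; next }
        # Data rows start with an integer sample count
        NF >= 6 && $1 ~ /^[0-9]+$/ { flat[$6] += $1 }
        END {
            for (name in flat) {
                pct = (total > 0) ? 100 * flat[name] / total : 0
                printf "%s %d %.1f%%\n", name, flat[name], pct
            }
        }
    ' | sort -k2,2nr
}

# Example:
#   pprof --text --lines a.out a.prof | sum_by_routine
```

For the listing above, the two ludcmp_ lines shown (415 and 404) add up to 74 samples, or 27.9% of the 265 samples.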

The call graph is available in the file a.pdf, which you can copy to your workstation or view with the command '''gv''':

gv a.pdf

(Of course, this only works if you are running an X server on your workstation.)
