Introduction
With gperftools it is possible to profile a program and create a call-graph. Profiling can be done at several levels of detail, down to individual source lines. Another advantage is that your binary can be used as-is. In principle, the tools work with any binary created by the GNU or Intel compilers.
Usage
From the 2022 software environment onwards, the tools are activated with the command
module load gperftools/2.10-GCCcore-11.3.0
The easiest method is to set the environment variable LD_PRELOAD to libprofiler.so at runtime; no recompilation or relinking is required. The program is profiled during its run if the environment variable '''CPUPROFILE''' is set to the name of the file that will contain the profiling information:
export LD_PRELOAD=libprofiler.so
export CPUPROFILE=profile.prof
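Because LD_PRELOAD is exported, every program started afterwards from the same shell also has libprofiler.so preloaded. A minimal sketch, assuming bash, of a helper that sets both variables for a single command only; the function name profile_run and the override variable PROFILE_OUT are hypothetical and not part of gperftools or the module:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gperftools): run ONE command with the
# CPU profiler preloaded, without exporting LD_PRELOAD to the whole shell.
profile_run() {
    # PROFILE_OUT is a hypothetical override; the default matches the text.
    local out="${PROFILE_OUT:-profile.prof}"
    # Assignments written before a command apply to that command only.
    LD_PRELOAD=libprofiler.so CPUPROFILE="$out" "$@"
}

# Usage (my_program is a placeholder for your own binary):
#   profile_run ./my_program arg1
# Equivalent one-off form without the function:
#   LD_PRELOAD=libprofiler.so CPUPROFILE=profile.prof ./my_program arg1
```

After the command finishes, LD_PRELOAD and CPUPROFILE are unset again in the calling shell, so later commands are not profiled by accident.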
To profile at the source-line level, compile your program with the '''-g''' flag and link it with the '''-lprofiler''' flag.
The program '''pprof''' reads the profile file written during the run and produces a text profile or a call-graph from it.
Important: gperftools don't work with statically linked programs.
Profiling MPI programs
We have added an extra script, '''srun_gperftools''', to gperftools for profiling MPI applications. The script sets a different CPUPROFILE for each MPI rank, so you get one profile per rank, e.g.:
module load gperftools/2.10-GCCcore-11.3.0
srun -n 2 srun_gperftools my_mpiprogram arg1
This will generate two profiles, profile.out_0 and profile.out_1.
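The per-rank file naming can be illustrated with a small wrapper. This is a minimal sketch, not the site's srun_gperftools script (whose contents are not shown here): it assumes the job is started by Slurm's srun, which sets SLURM_PROCID to the rank of each task, and the function name gperf_rank_run is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the site's srun_gperftools script: give each MPI
# rank its own profile file so that the ranks do not overwrite each other.
gperf_rank_run() {
    # srun sets SLURM_PROCID to the rank of each task (0, 1, ...);
    # fall back to 0 when run outside srun.
    local rank="${SLURM_PROCID:-0}"
    # Preload the profiler and name the profile after the rank.
    LD_PRELOAD=libprofiler.so CPUPROFILE="profile.out_${rank}" "$@"
}

# In an executable script (hypothetical name rank_wrapper.sh) the last line
# would be:   gperf_rank_run "$@"
# and it would be launched as:
#   srun -n 2 ./rank_wrapper.sh my_mpiprogram arg1
```

With two tasks this yields profile.out_0 and profile.out_1, matching the file names above; each file can then be passed to pprof separately.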
Example
This example demonstrates how to compile and link a simple program, how to show a line-level profile, and how to generate a call-graph.
> module load gperftools/2.10-GCCcore-11.3.0
> gfortran -O2 -g weps8.f -lprofiler
> export CPUPROFILE=a.prof   # a.prof will contain the profiling information
> ./a.out
> pprof --lines a.out a.prof
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 265 samples
      45  17.0%  17.0%       45  17.0% ludcmp_ weps8.f:415
      40  15.1%  32.1%       40  15.1% tstran_ weps8.f:251
      39  14.7%  46.8%       39  14.7% sort_ weps8.f:621
      29  10.9%  57.7%       29  10.9% ludcmp_ weps8.f:404
      27  10.2%  67.9%       27  10.2% sort_ weps8.f:623
      10   3.8%  71.7%       10   3.8% ran1_ weps8.f:544
       8   3.0%  74.7%        8   3.0% ran1_ weps8.f:533
       7   2.6%  77.4%        7   2.6% ran1_ weps8.f:536
       6   2.3%  79.6%       44  16.6% tstran_ weps8.f:252
       6   2.3%  81.9%        6   2.3% times ??:0
       6   2.3%  84.2%        6   2.3% sort_ weps8.f:619
(pprof) quit
> pprof --pdf a.out a.prof > a.pdf   # create a call-graph in a PDF file
In this example, we see that 17.0% of the time was spent on line 415 of the file weps8.f, in subroutine ludcmp.
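The columns of the top10 listing follow pprof's usual text layout: samples taken on the line itself (flat), their share of the total (flat%), the running sum of flat% down the list (sum%), samples taken on the line or in routines called from it (cum), and that share (cum%). Worked out for three rows of the listing above:

```latex
\begin{aligned}
\text{flat\%}(\text{weps8.f:415}) &= \tfrac{45}{265}\times 100\% \approx 17.0\% \\
\text{sum\%}(\text{weps8.f:251})  &= \tfrac{45 + 40}{265}\times 100\% \approx 32.1\% \\
\text{cum\%}(\text{weps8.f:252})  &= \tfrac{44}{265}\times 100\% \approx 16.6\%
\end{aligned}
```

For weps8.f:252 only 6 samples fall on the line itself, so the remaining 44 - 6 = 38 samples were taken in routines called from that line.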
The call-graph is written to the file a.pdf, which you can copy to your workstation or view directly with the command '''gv''':
gv a.pdf
(This only works if you are running an X server on your workstation.)
- More information about the toolkit is in the manual page:
man pprof
and on the web: https://github.com/gperftools/gperftools
- More about modules