What is profiling?

According to Wikipedia:

"Profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering."

In other words, you analyse your program to identify bottlenecks, and use that information to optimize its execution. As an example, you might have an application that needs to read a lot of input data (quite typical in machine learning!) during its run. A profile might show that while your code runs, your processor is mostly idling because it is waiting for input data. This gives you a hint on how to optimize your program: maybe you can read in part of the input and already start computing on it while the next samples are being loaded. Or maybe you can copy your data to a faster disk before you start running.
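For instance, with PyTorch you can often overlap data loading and computation simply by letting the DataLoader prefetch batches in background worker processes. The sketch below illustrates the idea; the dataset, model and parameter values are placeholders, not a prescription.

```python
# A minimal sketch: let a PyTorch DataLoader prepare batches in background
# worker processes, so data loading overlaps with computation.
# The dataset, model and batch size here are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10000, 32), torch.randint(0, 2, (10000,)))
model = torch.nn.Linear(32, 2)

# num_workers > 0 means batches are prepared in separate processes while the
# main process is busy computing; pin_memory speeds up host-to-GPU copies.
loader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True)

for inputs, targets in loader:
    outputs = model(inputs)          # compute on the current batch...
    loss = torch.nn.functional.cross_entropy(outputs, targets)
    loss.backward()                  # ...while the workers load the next one
```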

Why should I care about profiling?

You may know that training large models like GPT-3 costs several million dollars (source) and a few hundred MWh of energy (source). If the engineers who trained these models had not spent time on optimization, it might have been several million dollars and hundreds of MWh more.

Sure, the model you'd like to train is probably not quite as big. But maybe you want to train it 10000 times, because you want to do hyperparameter optimization. And even if you only train it once, it may take quite a bit of compute resources, i.e. money and energy.

When should I care about profiling?

Well, you should always care whether your code runs efficiently, but there are different levels of caring.

From personal experience: if I know I'm going to run a piece of code only once, for a few days, on a single GPU, I'll probably not create a full profile. What I would do is inspect my GPU and CPU utilization during my runs, just to see whether it is reasonably efficient and I didn't make any obvious mistakes (e.g. accidentally not using the GPU, even though I have one available).
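One way to do such a quick sanity check is to watch nvidia-smi and htop while the job runs. The snippet below is a rough Python equivalent, assuming the pynvml and psutil packages are installed; treat it as a sketch and adapt it to your own setup.

```python
# A rough sketch of a quick utilization check, assuming pynvml and psutil
# are installed (e.g. pip install nvidia-ml-py psutil). On the command line,
# nvidia-smi and htop give the same information interactively.
import psutil
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)    # first GPU

for _ in range(10):                              # sample for ~10 seconds
    gpu = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    cpu = psutil.cpu_percent(interval=1)
    print(f"GPU utilization: {gpu:3d}%   CPU utilization: {cpu:5.1f}%")

pynvml.nvmlShutdown()
```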

If I know that I'll run my code on multiple GPUs, for multiple days, (potentially) on multiple nodes, and/or I need to run it multiple times, I know that my resource footprint is going to be large, and it's worth spending some time and effort to optimize the code. That's when I'll create a profile. The good part is: the more often you do it, the quicker and more adept you become at it.

Profiling tutorial for PyTorch

We have developed a self-contained Jupyter Notebook that teaches you how to create and interpret a PyTorch profile. You can find the full tutorial here. To run a Jupyter Notebook on Snellius or Lisa, please follow these instructions.
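To give you a flavour of what the tutorial covers, the sketch below uses PyTorch's built-in torch.profiler on a placeholder model and prints the most time-consuming operations; the notebook explains in much more detail how to create and interpret such output.

```python
# A minimal sketch of creating a PyTorch profile with torch.profiler.
# The model and input are placeholders.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)
inputs = torch.randn(64, 128)

# Profile CPU activity, and CUDA activity if a GPU is available
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, inputs = model.cuda(), inputs.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    with record_function("forward_pass"):
        model(inputs)

# Show the operations that took the most time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```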
