...

Training in mixed precision is more involved. It requires some code changes (though frameworks have automated much of this for you), and you will have to think carefully about where you inspect or modify gradients. In return, it provides a substantially larger speedup than TF32, which can be particularly important for compute-intensive work such as hyperparameter tuning. It is therefore worth trying mixed precision. It has successfully been used to train a large number of well-known networks to proper convergence. If convergence does prove to be an issue for your particular task, switch back to TF32 and see if that helps. If it does, check your loss scaling code, inspect the scaled and unscaled losses, and verify that nothing is clipped due to underflow.
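As a minimal sketch of what those code changes look like, the PyTorch training loop below uses automatic mixed precision with loss scaling (the model, optimizer, and dataloader are placeholders, not part of any specific setup discussed here). The point to note is that gradients must be unscaled before you inspect or clip them; if convergence problems appear, this is the step to double-check.

    import torch

    # TF32 baseline: float32 matmuls/convolutions use TF32 Tensor Cores on Ampere.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    model = MyModel().cuda()                      # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()          # handles loss scaling automatically

    for inputs, targets in dataloader:            # placeholder dataloader
        inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()

        # Forward pass: eligible ops run in float16 inside the autocast region.
        with torch.cuda.amp.autocast():
            outputs = model(inputs)
            loss = torch.nn.functional.cross_entropy(outputs, targets)

        # Backward pass on the scaled loss; gradients are still scaled here.
        scaler.scale(loss).backward()

        # Unscale before inspecting or clipping gradients; otherwise the
        # clipping threshold is applied to scaled values.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        # step() skips the update if gradients contain inf/NaN;
        # update() then adjusts the loss scale for the next iteration.
        scaler.step(optimizer)
        scaler.update()

Removing the autocast context and the GradScaler calls, while keeping the allow_tf32 flags, gives you the plain TF32 fallback mentioned above.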

Sources:

NVIDIA A100 whitepaper

NVIDIA deep learning performance tutorial

NVIDIA Tensor Core performance tutorial

For future reference:

NVIDIA H100 whitepaper