Training in mixed precision is more involved. It requires some code changes (though frameworks have automated a lot for you) and you'll have to think carefully about where you inspect/modify gradients. It does provide a substantially larger speedup than using TF32. This can be particularly important for compute intensive tasks, such as hyperparameter tuning. It is therefore worth to try using mixed precision. Mixed precision has succesfully been used to train a large number of well known networks to proper convergence. If convergence does prove to be an issue for your particular task, switch back to TF32 and see if that helps. If it did, check your loss scaling code, inspect scaled/unscaled losses, and verify that nothing gets clipped due to underflow.

Sources:

nvidia A100 whitepaper

nvidia deep learning performance tutorial

nvidia tensor core performance tutorial

For future reference:

nvidia H100 whitepaper

Space shortcuts

Page tree

Versions Compared

Old Version 20

New Version 21

Key

Sources:

For future reference:

Space shortcuts

Page tree

Page History

Versions Compared

Old Version 20

New Version 21

Key

Sources:

For future reference: