Getting Started with DeepSpeed on Azure

This tutorial will help you get started running DeepSpeed on Azure virtual machines. Looking forward, we will be integrating these techniques and additional ...

Flops Profiler

Measure the parameters, latency, and floating point operations of your model
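As a sketch of how the profiler is usually switched on, it can be enabled from the DeepSpeed config JSON. The keys below follow the `flops_profiler` section of the DeepSpeed config schema as I understand it; verify them against the docs for your DeepSpeed version:

```json
{
  "flops_profiler": {
    "enabled": true,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true
  }
}
```

With this in place, DeepSpeed prints a per-module breakdown of parameters, latency, and FLOPS at the chosen training step.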

Installation Details

The quickest way to get started with DeepSpeed is via pip; this installs the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA versions.
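A minimal install might look like the following; `ds_report` is DeepSpeed's environment-report tool for checking which ops your setup supports:

```shell
# Install the latest DeepSpeed release from PyPI
pip install deepspeed

# Report DeepSpeed/PyTorch/CUDA compatibility for this environment
ds_report
```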

Megatron-LM GPT2

If you haven’t already, we advise you to first read through the Getting Started guide before stepping through this tutorial.

1-Cycle Schedule

This tutorial shows how to implement 1-Cycle schedules for learning rate and momentum in PyTorch.
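As an illustrative sketch (not the tutorial's own code), a 1-Cycle schedule ramps the learning rate linearly up to a peak at mid-training and back down, while momentum follows the inverse shape. The function names and default values here are hypothetical:

```python
def one_cycle_lr(step, total_steps, min_lr=1e-4, max_lr=1e-3):
    """Piecewise-linear 1-Cycle learning rate: ramp up for the first
    half of training, then ramp back down to min_lr."""
    mid = total_steps // 2
    if step <= mid:
        return min_lr + (max_lr - min_lr) * step / mid
    return max_lr - (max_lr - min_lr) * (step - mid) / (total_steps - mid)

def one_cycle_momentum(step, total_steps, min_m=0.85, max_m=0.95):
    """Momentum moves inversely to the learning rate: high, dip to
    min_m at mid-training, then back up to max_m."""
    mid = total_steps // 2
    if step <= mid:
        return max_m - (max_m - min_m) * step / mid
    return max_m - (max_m - min_m) * (total_steps - step) / (total_steps - mid)
```

PyTorch ships a built-in variant of this idea (`torch.optim.lr_scheduler.OneCycleLR`), and DeepSpeed exposes its own 1-Cycle scheduler through the config JSON.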

Pipeline Parallelism

DeepSpeed v0.3 includes new support for pipeline parallelism! Pipeline parallelism improves both the memory and compute efficiency of deep learning training ...
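The core idea can be sketched in plain Python: split a batch into micro-batches and stream each one through a sequence of stages. This is illustrative only; in a real pipeline the stages live on different GPUs and process different micro-batches concurrently (DeepSpeed's actual API is `deepspeed.pipe.PipelineModule`):

```python
def pipeline_forward(stages, batch, num_micro_batches):
    """Split `batch` into micro-batches and pass each through every
    stage in order. A real pipeline overlaps these steps across devices."""
    size = len(batch) // num_micro_batches
    micro = [batch[i * size:(i + 1) * size] for i in range(num_micro_batches)]
    outputs = []
    for mb in micro:
        for stage in stages:
            mb = stage(mb)
        outputs.append(mb)
    # Re-assemble the full batch from the micro-batch outputs
    return [x for mb in outputs for x in mb]

# Two toy "stages": add one, then double
stages = [lambda xs: [x + 1 for x in xs], lambda xs: [x * 2 for x in xs]]
result = pipeline_forward(stages, [1, 2, 3, 4], num_micro_batches=2)
# result == [4, 6, 8, 10]
```

Micro-batching is what lets later stages start working before the whole batch has cleared the first stage, which is where the compute-efficiency gain comes from.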

DeepSpeed Sparse Attention

In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is through the DeepSpeed launcher.
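To give a feel for what a block-sparse layout means, here is a hypothetical sketch (not DeepSpeed's kernel code) of a "fixed" local pattern: each query block attends only to itself and a few preceding key blocks, so most of the attention matrix is never computed:

```python
def fixed_block_sparse_mask(seq_len, block, num_local_blocks):
    """Boolean attention mask where each query block attends to itself
    and the previous (num_local_blocks - 1) blocks of keys.
    Illustrative of a local block-sparse layout only."""
    num_blocks = seq_len // block
    mask = [[False] * seq_len for _ in range(seq_len)]
    for qb in range(num_blocks):
        for kb in range(max(0, qb - num_local_blocks + 1), qb + 1):
            for i in range(qb * block, (qb + 1) * block):
                for j in range(kb * block, (kb + 1) * block):
                    mask[i][j] = True
    return mask
```

Because only O(seq_len · block · num_local_blocks) entries are ever touched, memory and compute grow roughly linearly in sequence length instead of quadratically.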

DeepSpeed Transformer Kernel

This tutorial shows how to enable the DeepSpeed transformer kernel and set its different configuration parameters.


ZeRO-Offload

We recommend that you read the tutorials on Getting Started and ZeRO before stepping through this tutorial. ZeRO-Offload is a ZeRO optimization that offloads optimizer memory and computation from the GPU to the host CPU.
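As a sketch, offloading is typically enabled from the DeepSpeed config JSON by adding an `offload_optimizer` block under `zero_optimization`; the keys below follow the DeepSpeed config schema as I understand it, so verify them for your version:

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

With this config, optimizer states live in (pinned) host memory, freeing GPU memory for larger models at the cost of extra PCIe traffic.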

Zero Redundancy Optimizer (ZeRO)

If you have not done so already, we advise that you read the DeepSpeed tutorials on Getting Started and Megatron-LM GPT-2 before stepping through this tutorial.
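As a minimal sketch, ZeRO is turned on from the DeepSpeed config JSON via the `zero_optimization` section; the key names below follow the documented schema to the best of my knowledge, and the batch size and fp16 settings are placeholder values:

```json
{
  "train_batch_size": 16,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "contiguous_gradients": true,
    "overlap_comm": true
  }
}
```

Raising `stage` partitions progressively more training state across data-parallel workers: stage 1 shards optimizer states, stage 2 adds gradients, and stage 3 adds the model parameters themselves.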