Tutorials

DeepSpeed Mixture-of-Quantization (MoQ)

DeepSpeed introduces new support for model compression using quantization, called Mixture-of-Quantization (MoQ). MoQ is designed on top of QAT (Quantization-Aware Training)...

Installation Details

The quickest way to get started with DeepSpeed is via pip; this will install the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA versions...
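For example, a minimal install and environment check (assuming a recent PyTorch is already installed) looks like this:

```bash
# Install the latest DeepSpeed release from PyPI
pip install deepspeed

# Report how DeepSpeed sees the environment: detected torch/CUDA versions
# and which of its C++/CUDA ops are compatible or pre-installed
ds_report
```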

Getting Started with DeepSpeed on Azure

This tutorial will help you get started running DeepSpeed on Azure virtual machines. Looking forward, we will be integrating these techniques and additional ...

Flops Profiler

Measure the parameters, latency, and floating point operations of your model
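As a rough sketch of how the profiler can be invoked (the toy model, input shape, and option values below are placeholders, not part of this page):

```python
import torch.nn as nn
from deepspeed.profiling.flops_profiler import get_model_profile

# Toy placeholder model; any torch.nn.Module can be profiled the same way
model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
)

# Profile one forward pass; with as_string left at its default the results
# come back as human-readable strings, and print_profile emits the
# per-module breakdown (including latency) to stdout.
flops, macs, params = get_model_profile(
    model=model,
    input_shape=(8, 1024),  # batch of 8 feature vectors
    print_profile=True,
    detailed=True,
)
print(flops, macs, params)
```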

Megatron-LM GPT2

If you haven’t already, we advise you to first read through the Getting Started guide before stepping through this tutorial.

1-Cycle Schedule

This tutorial shows how to implement 1Cycle schedules for learning rate and momentum in PyTorch.
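As a sketch, selecting the 1-Cycle schedule comes down to adding a scheduler section to the DeepSpeed config; the parameter names and values below are illustrative rather than a complete configuration:

```python
# Fragment of a DeepSpeed config dict selecting the OneCycle LR scheduler.
# Only a few of the available parameters are shown; values are placeholders.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "scheduler": {
        "type": "OneCycle",
        "params": {
            "cycle_min_lr": 1e-5,           # LR at the start of the cycle
            "cycle_max_lr": 1e-3,           # peak LR at mid-cycle
            "cycle_first_step_size": 1000,  # steps in the increasing phase
        },
    },
}
```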

Pipeline Parallelism

DeepSpeed v0.3 includes new support for pipeline parallelism! Pipeline parallelism improves both the memory and compute efficiency of deep learning training ...
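A minimal sketch of the pipeline API, assuming a toy layer list and a placeholder ds_config.json; a real run is launched with the deepspeed launcher so the stages land on different devices:

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # pipeline modules need torch.distributed set up

# Express the model as a flat list of layers so DeepSpeed can partition it
# into pipeline stages.
layers = [
    nn.Linear(1024, 4096), nn.ReLU(),
    nn.Linear(4096, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
]

# Split the layers across two pipeline stages.
net = PipelineModule(layers=layers, loss_fn=nn.CrossEntropyLoss(), num_stages=2)

# ds_config.json is a placeholder for a standard DeepSpeed config file.
engine, _, _, _ = deepspeed.initialize(
    model=net,
    model_parameters=net.parameters(),
    config="ds_config.json",
)
```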

DeepSpeed Sparse Attention

In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is through the DeepSpeed launcher...
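As an illustration of where SA is configured, here is a sketch of the sparse_attention section of a DeepSpeed config; the key names and values follow the fixed-sparsity mode and are placeholders, not a complete configuration:

```python
# Sketch of the "sparse_attention" section of a DeepSpeed config dict,
# using the fixed block-sparse mode. Values are placeholders.
ds_config = {
    "sparse_attention": {
        "mode": "fixed",         # block-sparse layout variant
        "block": 16,             # block size used by the SA kernels
        "num_local_blocks": 4,   # blocks attended locally per block row
        "num_global_blocks": 1,  # blocks attended globally
    }
}
```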

DeepSpeed Transformer Kernel

This tutorial shows how to enable the DeepSpeed transformer kernel and set its different configuration parameters.

ZeRO-Offload

ZeRO-3 Offload consists of a subset of features in our newly released ZeRO-Infinity. Read our ZeRO-Infinity blog to learn more! We recommend that you read t...
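As a sketch, ZeRO-3 Offload is driven by the zero_optimization section of the DeepSpeed config; the fragment below (with placeholder values) offloads optimizer state and parameters to CPU memory:

```python
# Fragment of a DeepSpeed config dict enabling ZeRO stage 3 with CPU offload.
# Batch size and other values are placeholders.
ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}
```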

Zero Redundancy Optimizer (ZeRO)

If you have not done so already, we advise that you read the DeepSpeed tutorials on Getting Started and Megatron-LM GPT-2 before stepping through this tutorial.
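A minimal sketch of wiring ZeRO into a training script, with a toy model and placeholder config values; a real run is launched with the deepspeed launcher:

```python
import torch.nn as nn
import deepspeed

# Toy placeholder model; any torch.nn.Module works the same way.
model = nn.Linear(1024, 1024)

# Placeholder config selecting ZeRO stage 2 (optimizer state and gradient
# partitioning).
ds_config = {
    "train_batch_size": 16,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
}

# DeepSpeed wraps the model in an engine that handles the partitioning.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```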