DeepSpeed Mixture-of-Quantization (MoQ)
DeepSpeed introduces new support for model compression using quantization, called Mixture-of-Quantization (MoQ). MoQ is designed on top of QAT (Quantization-Aware Training)...
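As a rough illustration, MoQ is driven entirely from the DeepSpeed config. Below is a minimal sketch of a quantization section expressed as a Python dict; the key names (`quantize_training`, `quantize_bits`, `quantize_schedule`, `quantize_groups`) follow the MoQ tutorial and should be verified against your DeepSpeed version.

```python
# Sketch of a MoQ-style quantization config; pass the dict to
# deepspeed.initialize(config=...) or write it out as a JSON file.
# Key names are assumptions based on the MoQ tutorial.
ds_config = {
    "quantize_training": {
        "enabled": True,
        "quantize_bits": {
            "start_bits": 16,   # start training at higher precision
            "target_bits": 8,   # progressively reduce toward 8-bit
        },
        "quantize_schedule": {
            "quantize_period": 400,  # steps between precision reductions
            "schedule_offset": 0,    # step at which quantization begins
        },
        "quantize_groups": 8,        # quantization groups per layer
    },
}
```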
Contents: Introduction, Write accelerator agnostic models, Port accelerator runtime calls, Port accelerator device name, Te...
Contents: Introduction, Intel Architecture (IA) CPU, Intel XPU, Huawei Ascend NPU, Intel Gaudi
The quickest way to get started with DeepSpeed is via pip; this will install the latest release of DeepSpeed, which is not tied to specific PyTorch or CUDA ve...
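For example, installing the latest release and then running the environment report utility that ships with DeepSpeed:

```bash
pip install deepspeed
# Report the installed version and which C++/CUDA ops are compatible with this machine
ds_report
```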
Contents: Introduction, Example Script, Launching, T5 11B Inference Performance Comparison, OPT 13B Inference Performance Comparison, ...
Automatically discover the optimal DeepSpeed configuration that delivers good training speed
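A minimal sketch of how autotuning is typically enabled; the config keys and the CLI flag shown in the comment follow the Autotuning tutorial and should be checked against your DeepSpeed version.

```python
# Autotuning is enabled through the DeepSpeed config; the tuner then searches
# ZeRO stage, micro-batch size, and related settings on your behalf.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "autotuning": {
        "enabled": True,
    },
}
# The job is launched through the deepspeed CLI, e.g. (flag per the tutorial):
#   deepspeed --autotuning run train.py --deepspeed ds_config.json
```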
This tutorial will help you get started with DeepSpeed on Azure.
Train your first model with DeepSpeed!
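At its core, porting a model to DeepSpeed means wrapping it with `deepspeed.initialize` and letting the returned engine drive the backward pass and optimizer step. A minimal sketch (run under the `deepspeed` launcher):

```python
import torch
import deepspeed

model = torch.nn.Linear(784, 10)  # any torch.nn.Module

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# deepspeed.initialize returns an engine that owns distributed setup,
# gradient accumulation, and the optimizer step.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One illustrative training step with random data:
x = torch.randn(8, 784).to(model_engine.device)
y = torch.randint(0, 10, (8,)).to(model_engine.device)
loss = torch.nn.functional.cross_entropy(model_engine(x), y)
model_engine.backward(loss)  # replaces loss.backward()
model_engine.step()          # replaces optimizer.step()

# Launch with: deepspeed train.py
```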
Log all DeepSpeed communication calls
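Communication logging is controlled from the DeepSpeed config. A sketch is shown below; the key names follow the communication-logging tutorial and should be verified against your DeepSpeed version.

```python
ds_config = {
    "comms_logger": {
        "enabled": True,    # turn on logging of communication ops
        "verbose": False,   # if True, log each call as it happens
        "prof_all": True,   # profile all communication operations
        "debug": False,
    },
}
# Per the tutorial, a summary of the logged calls can then be printed at the
# end of training, e.g. via deepspeed.comm.log_summary().
```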
Watch out! On 12/12/2022, we released the DeepSpeed Data Efficiency Library, which provides more general curriculum learning support. This legacy curriculum lea...
What is DeepSpeed Data Efficiency: DeepSpeed Data Efficiency is a library purposely built to make better use of data, increase training efficiency, and impr...
This tutorial will show how to use DeepNVMe for data transfers between persistent storage and tensors residing in host or device memory. DeepNVMe improves th...
In this tutorial we describe how to enable DeepSpeed-Ulysses. DeepSpeed-Ulysses is a simple but highly communication- and memory-efficient mechanism for sequence ...
1. What is DS4Sci_EvoformerAttention DS4Sci_EvoformerAttention is a collection of kernels built to scale the Evoformer computation to a larger number of sequen...
Measure the parameters, latency, and floating-point operations of your model
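The flops profiler can be switched on from the DeepSpeed config. A minimal sketch follows; the keys are taken from the Flops Profiler tutorial and should be checked against your DeepSpeed version.

```python
ds_config = {
    "flops_profiler": {
        "enabled": True,
        "profile_step": 1,    # which training step to profile
        "module_depth": -1,   # -1 profiles modules at all depths
        "top_modules": 1,     # number of top modules to report per depth
        "detailed": True,     # print a per-module breakdown
    },
}
```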
Train your first GAN model with DeepSpeed!
First steps with DeepSpeed
DeepSpeed-Inference v2 is here and it’s called DeepSpeed-FastGen! For the best performance, latest features, and newest model support, please see our DeepS...
This tutorial shows how to perform learning rate range tests in PyTorch.
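The range test is configured as a DeepSpeed scheduler. The sketch below uses parameter names from the LR range test tutorial; treat the exact values and names as assumptions to verify against your DeepSpeed version.

```python
ds_config = {
    "scheduler": {
        "type": "LRRangeTest",
        "params": {
            "lr_range_test_min_lr": 1e-5,      # starting learning rate
            "lr_range_test_step_size": 200,    # steps between LR increases
            "lr_range_test_step_rate": 5,      # growth rate per step (assumption)
            "lr_range_test_staircase": False,  # continuous vs. stepwise increase
        },
    },
}
```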
If you haven’t already, we advise you to first read through the Getting Started guide before stepping through this tutorial.
Mixed Precision ZeRO++ (MixZ++) is a set of optimization strategies based on ZeRO and ZeRO++ to improve the efficiency and reduce memory usage for large mode...
DeepSpeed-MoE Inference introduces several important features on top of the inference optimization for dense models (DeepSpeed-Inference blog post). It embra...
In this tutorial, we introduce how to apply DeepSpeed Mixture of Experts (MoE) to NLG models, which reduces the training cost by 5 times and reduces the MoE m...
DeepSpeed v0.5 introduces new support for training Mixture of Experts (MoE) models. MoE models are an emerging class of sparsely activated models that have s...
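A sparsely activated layer is built by wrapping an ordinary expert module with `deepspeed.moe.layer.MoE`. A minimal sketch (argument names per the MoE tutorial; verify against your DeepSpeed version):

```python
import torch
from deepspeed.moe.layer import MoE

hidden = 512
# The "expert" is any module applied per token; here a small MLP.
expert = torch.nn.Sequential(
    torch.nn.Linear(hidden, 4 * hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(4 * hidden, hidden),
)

# Wrap it as a sparsely activated MoE layer; experts are sharded across the
# expert-parallel process group.
moe_layer = MoE(
    hidden_size=hidden,
    expert=expert,
    num_experts=8,
    k=1,  # top-1 gating
)
```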
What is DeepSpeed Compression: DeepSpeed Compression is a library purposely built to make it easy to compress models for researchers and practitioners while ...
Monitor your model’s training metrics live and log for future analysis
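Monitoring is also driven by the DeepSpeed config. The sketch below assumes the TensorBoard backend keys described in the Monitor tutorial; other backends (for example `wandb` or `csv_monitor`) follow the same pattern.

```python
ds_config = {
    "tensorboard": {
        "enabled": True,
        "output_path": "output/ds_logs/",  # where event files are written
        "job_name": "my_training_run",     # hypothetical run name
    },
}
```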
This tutorial shows how to implement 1Cycle schedules for learning rate and momentum in PyTorch.
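In DeepSpeed, a 1Cycle policy is expressed as a scheduler section in the config. A sketch follows; the parameter names come from the 1Cycle tutorial and should be checked against your DeepSpeed version.

```python
ds_config = {
    "scheduler": {
        "type": "OneCycle",
        "params": {
            "cycle_min_lr": 1e-4,
            "cycle_max_lr": 1e-3,
            "cycle_first_step_size": 1000,  # steps in the increasing half of the cycle
            "cycle_momentum": True,
            "cycle_min_mom": 0.85,
            "cycle_max_mom": 0.99,
        },
    },
}
```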
Note: On 03/07/2022 we released 0/1 Adam, which is a new communication-efficient Adam optimizer partially following the 1-bit Adam’s design. Compared to the ...
Watch out! 1) The NCCL-based implementation requires PyTorch >= 1.8 (and NCCL >= 2.8.3 when you have 64 or more GPUs). See details below. 2) Although 1...
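1-bit Adam is selected through the optimizer section of the DeepSpeed config. The sketch below uses key names from the 1-bit Adam tutorial; treat them as assumptions to verify against your DeepSpeed version.

```python
ds_config = {
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 2e-4,
            "freeze_step": 23000,         # warmup steps of uncompressed Adam
            "cuda_aware": False,          # set True only with CUDA-aware MPI
            "comm_backend_name": "nccl",  # NCCL-based compressed communication
        },
    },
    "fp16": {"enabled": True},
}
```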
DeepSpeed v0.3 includes new support for pipeline parallelism! Pipeline parallelism improves both the memory and compute efficiency of deep learning training ...
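The entry point for pipeline parallelism is `deepspeed.pipe.PipelineModule`: the model is expressed as a flat sequence of layers that DeepSpeed partitions into stages. A minimal sketch (run under the `deepspeed` launcher, since pipeline parallelism needs a distributed job):

```python
import torch
from deepspeed.pipe import PipelineModule

# Express the model as a flat list of layers so DeepSpeed can split it
# into pipeline stages.
layers = [
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
]

# num_stages must divide the available GPUs; DeepSpeed partitions the layer
# list across stages and schedules micro-batches through them.
net = PipelineModule(
    layers=layers,
    num_stages=2,
    loss_fn=torch.nn.CrossEntropyLoss(),
)
```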
In this tutorial, we are going to introduce progressive layer dropping (PLD) in DeepSpeed and provide examples of how to use it. PLD allows you to train Tra...
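PLD is enabled from the DeepSpeed config. A sketch is shown below; the key names follow the PLD tutorial and should be verified against your DeepSpeed version.

```python
ds_config = {
    "progressive_layer_drop": {
        "enabled": True,
        "theta": 0.5,    # controls the lower bound of the layer keep ratio
        "gamma": 0.001,  # controls how quickly the drop schedule progresses
    },
}
```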
This tutorial describes how to use PyTorch Profiler with DeepSpeed.
In this tutorial we describe how to use DeepSpeed Sparse Attention (SA) and its building-block kernels. The easiest way to use SA is through DeepSpeed launch...
This tutorial shows how to enable the DeepSpeed transformer kernel and set its different configuration parameters.
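A heavily hedged sketch of constructing one fused transformer layer: the import path and the configuration fields shown here are assumptions based on the transformer-kernel tutorial and may differ across DeepSpeed versions.

```python
from deepspeed.ops.transformer import (
    DeepSpeedTransformerConfig,
    DeepSpeedTransformerLayer,
)

# Configure the fused kernel for a BERT-large-like layer (field names are
# assumptions; check your DeepSpeed version).
config = DeepSpeedTransformerConfig(
    batch_size=8,
    hidden_size=1024,
    intermediate_size=4096,
    heads=16,
    attn_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    num_hidden_layers=24,
    initializer_range=0.02,
    fp16=True,
    pre_layer_norm=True,
)
layer = DeepSpeedTransformerLayer(config)
```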
The DeepSpeed Universal Checkpointing feature is a powerful tool for saving and loading model checkpoints in a way that is both efficient and flexible, enabling ...
ZeRO-3 Offload consists of a subset of features in our newly released ZeRO-Infinity. Read our ZeRO-Infinity blog to learn more!
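ZeRO-3 Offload is configured through the `zero_optimization` section of the DeepSpeed config. A minimal sketch (keys per the ZeRO tutorials; verify against your DeepSpeed version):

```python
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Offload parameters and optimizer state to CPU memory.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
}
```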
Watch out! 1) The NCCL-based implementation requires PyTorch >= 1.8 (and NCCL >= 2.8.3 when you have 64 or more GPUs). See details below. 2) Although 0...
ZeRO++ is a system of communication optimization strategies built on top of ZeRO to offer unmatched efficiency for large model training regardless of the sca...
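The ZeRO++ optimizations are toggled inside the `zero_optimization` section. The sketch below uses the flag names from the ZeRO++ tutorial (quantized weights, hierarchical partitioning, quantized gradients); treat them as assumptions to verify against your DeepSpeed version.

```python
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,    # qwZ: quantized weight communication
        "zero_hpz_partition_size": 8,      # hpZ: secondary partition size, e.g. GPUs per node
        "zero_quantized_gradients": True,  # qgZ: quantized gradient communication
    },
}
```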