ZenFlow

ZenFlow is an extension of ZeRO-Offload that decouples gradient updates during training: the most important gradients are applied on the GPU at every step, while the remaining gradients are accumulated and applied asynchronously on the CPU. This reduces the CPU-induced stalls that occur with offloaded optimizers, enabling smoother and faster training. Like ZeRO-Offload, ZenFlow requires no code changes, only configuration updates in your DeepSpeed JSON file.

We recommend that you read the tutorials on Getting Started and ZeRO before stepping through this tutorial. ZenFlow builds on top of ZeRO-Offload, so shared setup details can be found there.

Configuration Changes

To enable ZenFlow, simply add a zenflow section under the existing zero_optimization block in your DeepSpeed config:

{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "zenflow": {
      "topk_ratio": 0.05,
      "select_strategy": "auto",
      "select_interval": "auto",
      "update_interval": 4,
      "full_warm_up_rounds": 0,
      "overlap_step": true
    }
  }
}

Each field in the zenflow block controls selective gradient update behavior:

  • topk_ratio: Fraction of the most important gradients to update on GPU each step (e.g., 0.05 means the top 5% by importance; see the sketch after this list).
  • select_strategy: Strategy for selecting important gradients ("auto", "step", or custom).
  • select_interval: How often to re-select the set of important gradients ("auto" or an integer like 1).
  • update_interval: How often to apply the accumulated updates for unimportant gradients ("auto" or an integer like 4, meaning every 4 steps).
  • full_warm_up_rounds: Number of initial steps with full gradient updates before selective updates begin.
  • overlap_step: Whether to overlap the CPU optimizer step with GPU computation (true enables it).
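
To make the selection concrete, here is a minimal, hypothetical sketch of splitting a gradient into an important top-k part and a deferred part, assuming gradient magnitude as the importance proxy. The names are illustrative only; they are not ZenFlow's internal API:

# Hypothetical illustration of selective gradient updates, assuming
# magnitude as the importance proxy. Not ZenFlow's actual implementation.
import torch

def split_by_importance(grad: torch.Tensor, topk_ratio: float):
    """Split a gradient into an important (top-k) part and a deferred part."""
    flat = grad.flatten()
    k = max(1, int(flat.numel() * topk_ratio))
    # Indices of the k entries with the largest magnitude.
    _, idx = torch.topk(flat.abs(), k)
    mask = torch.zeros_like(flat, dtype=torch.bool)
    mask[idx] = True
    important = flat * mask    # applied on GPU every step
    deferred = flat * ~mask    # accumulated; applied on CPU every update_interval steps
    return important, deferred

grad = torch.randn(10_000)
important, deferred = split_by_importance(grad, topk_ratio=0.05)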

Recommended: Use "auto" for select_strategy, select_interval, and update_interval to enable adaptive behavior with minimal tuning.
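
With those settings, the zenflow block from the config above becomes:

"zenflow": {
  "topk_ratio": 0.05,
  "select_strategy": "auto",
  "select_interval": "auto",
  "update_interval": "auto",
  "full_warm_up_rounds": 0,
  "overlap_step": true
}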

You can continue using the same training setup and launch script as in the ZeRO-Offload tutorial, since ZenFlow builds directly on top of ZeRO-Offload.
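
For instance, if you already launch ZeRO-Offload training with a command like the following (the script and config file names are placeholders for your own files), no changes are needed beyond the config edit above:

deepspeed train.py --deepspeed --deepspeed_config ds_config.json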

Quick Start: Fine-tuning Example

A complete fine-tuning example using ZenFlow is available in DeepSpeedExamples: ZenFlow Fine-Tuning on GLUE.

This example shows how to fine-tune a GPT model on the GLUE benchmark with:

  • CPU optimizer offload
  • ZenFlow asynchronous updates

To run the example:

cd DeepSpeedExamples/training/DeepSpeed-ZenFlow
bash finetune_gpt_glue.sh

Refer to the README.md in the folder for setup instructions, dataset preparation, and configuration details.


Congratulations! You have successfully enabled ZenFlow for stall-free offloading.
