ZenFlow
ZenFlow is an extension of ZeRO-Offload that decouples and asynchronously updates gradients during training. It reduces CPU-induced stalls when using offload optimizers, enabling smoother and faster training. Like ZeRO-Offload, ZenFlow requires no code changes, only configuration updates in your DeepSpeed JSON file.
We recommend that you read the tutorials on Getting Started and ZeRO before stepping through this tutorial. ZenFlow builds on top of ZeRO-Offload, so shared setup details can be found there.
Configuration Changes
To enable ZenFlow, simply add a zenflow
section under the existing zero_optimization
block in your DeepSpeed config:
{
"zero_optimization": {
"stage": 2,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"zenflow": {
"topk_ratio": 0.05,
"select_strategy": "auto",
"select_interval": "auto",
"update_interval": 4,
"full_warm_up_rounds": 0,
"overlap_step": true
}
}
}
Each field in the zenflow
block controls selective gradient update behavior:
topk_ratio
: Fraction of the most important gradients to update on GPU (e.g., 0.05 means top 5% by importance).select_strategy
: Strategy for selecting important gradients ("auto"
,"step"
, or custom).select_interval
: How often to re-select important gradients ("auto"
or integer like 1).update_interval
: How often to update unimportant gradients ("auto"
or an integer like 4, meaning every 4 steps).full_warm_up_rounds
: Number of initial steps with full gradient updates before selection begins.overlap_step
: Whether to overlap communication with computation (true
enables it).
Recommended: Use "auto"
for select_strategy
, select_interval
, and update_interval
to enable adaptive behavior with minimal tuning.
You can continue using the same training setup and launch script as in the ZeRO-Offload tutorial, since ZenFlow builds directly on top of ZeRO Offload.
Quick Start: Fine-tuning Example
A complete fine-tuning example using ZenFlow is available in DeepSpeedExamples – ZenFlow Fine-Tuning on GLUE
This example shows how to fine-tune a GPT model on the GLUE benchmark with:
- CPU optimizer offload
- ZenFlow asynchronous updates
To run the example:
cd DeepSpeedExamples/training/DeepSpeed-ZenFlow
bash finetune_gpt_glue.sh
Refer to the README.md
in the folder for setup instructions, dataset preparation, and configuration details.
Congratulations! You have successfully enabled ZenFlow for stall-free offloading.