Training a Trillion Parameters with Pipeline Parallelism
DeepSpeed includes new support for pipeline parallelism! DeepSpeed’s training
engine provides hybrid 3D parallelism for training models with over a
trillion parameters. In addition to scaling to extreme model sizes, we have
demonstrated that hybrid parallelism accelerates training on clusters with
low-bandwidth networks by up to 7x.
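To illustrate the core idea behind pipeline parallelism, the sketch below simulates a simple GPipe-style fill-and-drain schedule: each of `S` stages processes `M` micro-batches in a staggered pattern, so the pipeline "bubble" fraction shrinks as micro-batches increase. This is a minimal illustration of the scheduling concept, not DeepSpeed's actual engine code; the function name and structure are our own.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, per time step, the (stage, microbatch) pairs active then.

    Illustrative only: a basic forward-pass fill-and-drain schedule,
    not DeepSpeed's internal scheduler.
    """
    steps = []
    # Total steps = fill (S - 1) + steady-state work (M).
    total = num_stages + num_microbatches - 1
    for t in range(total):
        active = []
        for stage in range(num_stages):
            mb = t - stage  # each stage lags the previous by one step
            if 0 <= mb < num_microbatches:
                active.append((stage, mb))
        steps.append(active)
    return steps

schedule = pipeline_schedule(num_stages=4, num_microbatches=8)
print(len(schedule))  # 11 time steps for S=4, M=8
```

With `M` micro-batches and `S` stages, utilization is roughly `M / (M + S - 1)`, which is why pipeline parallelism favors many micro-batches per optimizer step.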
- For a brief overview and results including trillion-parameter capabilities,
see our press release.
- To get started with pipeline parallel training in DeepSpeed, we recommend our tutorial.
- See our AlexNet example in DeepSpeedExamples.
- Read our API documentation on readthedocs.