Training a Trillion Parameters with Pipeline Parallelism

DeepSpeed includes new support for pipeline parallelism! DeepSpeed’s training engine provides hybrid 3D parallelism for training models with over a trillion parameters. In addition to enabling extreme-scale models, we have demonstrated that hybrid parallelism accelerates training on clusters with low-bandwidth networks by up to 7x.

  • For a brief overview and results including trillion-parameter capabilities, see our press release.
  • To get started with pipeline parallel training in DeepSpeed, we recommend our tutorial.
  • See our AlexNet example in DeepSpeedExamples.
  • Read our API documentation on readthedocs.

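To give a flavor of the API, here is a minimal sketch of pipeline-parallel training with a toy model. The layer sizes, the synthetic dataset, and the `ds_config.json` path are illustrative placeholders rather than code from the tutorial or examples linked above; a real run would be launched with the `deepspeed` launcher across multiple GPUs, with the batch size, micro-batch size, and optimizer set in the config file. See the tutorial and the AlexNet example for complete, tested code.

```python
# A hedged, minimal sketch of pipeline-parallel training with DeepSpeed.
# The layer sizes, synthetic dataset, and "ds_config.json" are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()

# Express the model as a flat list of layers so DeepSpeed can partition it
# into pipeline stages.
layers = [
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
]

# Wrap the layers in a PipelineModule; num_stages controls how many pipeline
# stages the sequence is split across.
model = PipelineModule(layers=layers,
                       num_stages=2,
                       loss_fn=nn.CrossEntropyLoss())

# Placeholder dataset of random features and integer labels.
dataset = TensorDataset(torch.randn(2048, 1024),
                        torch.randint(0, 10, (2048,)))

# deepspeed.initialize returns a pipeline-aware engine when given a
# PipelineModule; the engine builds its own data loader from training_data.
engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       training_data=dataset,
                                       config="ds_config.json")

# Each call to train_batch() runs one full forward/backward/optimizer step,
# streaming micro-batches through the pipeline stages.
for step in range(100):
    loss = engine.train_batch()
```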