DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies
ZeRO-Inference: 20X faster inference through weight quantization and KV cache offloading (https://github.com/microsoft/DeepSpeedExa...)
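The headline above refers to the ZeRO-Inference approach, whose core idea is to host model weights off-GPU and stream them in during the forward pass. Below is a minimal sketch of that basic usage pattern, assuming a Hugging Face causal LM; the model name and config values are illustrative, and the specific weight-quantization and KV-cache-offloading knobs from the announcement are not shown here (see the linked DeepSpeedExamples repository for those).

```python
# Sketch of the ZeRO-Inference usage pattern: ZeRO stage 3 with parameter
# offload, so weights live in CPU memory and are fetched to the GPU layer by
# layer during generation. Config values here are illustrative assumptions.
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # hypothetical example model

ds_config = {
    "zero_optimization": {
        "stage": 3,                                        # partition/offload parameters
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,  # required config key, even for inference-only use
}

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler);
# only the engine is needed for inference.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```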
Partition-aware ZeRO with up to 2x reduction in communication time!
DeepSpeed has been used to train some of the world's largest language models, including Megatron-Turing NLG 530B and BLOOM.