A Gentle Introduction to Distributed Training with DeepSpeed

Miguel González-Fierro
Jan. 30, 2022

DeepSpeed is an open-source library for training large deep learning models built on PyTorch. With minimal code changes, a developer can train a model on a single GPU, on a single machine with multiple GPUs, or across multiple machines in a distributed fashion. In this post, we review DeepSpeed and explain how to get started.
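To give a taste of the "minimal code changes" involved, DeepSpeed is driven largely by a JSON configuration file passed at launch time. The fragment below is an illustrative sketch, not taken from the post; the field names (`train_batch_size`, `optimizer`, `fp16`, `zero_optimization`) are standard DeepSpeed config keys, while the specific values are assumptions chosen for illustration:

```json
{
  "train_batch_size": 32,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 0.0001
    }
  },
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 1
  }
}
```

The same training script can then run on one GPU or many: the configuration, rather than the model code, controls batch sizing, mixed precision, and memory optimizations such as ZeRO.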


deep learning; distributed training; deepspeed
