A Gentle Introduction to Distributed Training with DeepSpeed

Miguel González-Fierro
Jan. 30, 2022

DeepSpeed is an open-source library, built on top of PyTorch, that facilitates the training of large deep learning models. With minimal code changes, a developer can train a model on a single machine with one GPU, on a single machine with multiple GPUs, or across multiple machines in a distributed fashion. In this post, we review DeepSpeed and explain how to get started.

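To give a taste of those code changes, below is a minimal sketch of a training step wrapped with deepspeed.initialize, the library's main entry point. The toy model, the ds_config.json file name, and the tensor shapes are illustrative placeholders, not taken from the post.

```python
# Minimal sketch: wrapping a PyTorch model with DeepSpeed.
# The model, config file name, and shapes below are illustrative.
import torch
import deepspeed

# Any torch.nn.Module works; this toy model is just for the example.
model = torch.nn.Linear(in_features=128, out_features=10)

# deepspeed.initialize returns an engine that manages the optimizer,
# gradient accumulation, and (when launched across several processes)
# distributed data parallelism. Batch size, optimizer, fp16, ZeRO
# stage, etc. are declared in the JSON config file.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

# A training step: the engine's backward() and step() replace the
# usual loss.backward() and optimizer.step() calls.
inputs = torch.randn(8, 128).to(model_engine.device)
targets = torch.randint(0, 10, (8,)).to(model_engine.device)
loss = torch.nn.functional.cross_entropy(model_engine(inputs), targets)
model_engine.backward(loss)
model_engine.step()
```

Launched with the deepspeed command-line tool (e.g. deepspeed train.py, plus a hostfile for multi-node runs), the same script scales from a single GPU to multiple machines without further changes.
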
deep learning; distributed training; deepspeed