Accelerator
The Accelerator is the main class provided by 🤗 Accelerate. It serves as the main entry point for
the API. To quickly adapt your script to work on any kind of setup with 🤗 Accelerate, just:
1. Initialize an `Accelerator` object (that we will call `accelerator` in the rest of this page) as early as possible in your script.
2. Pass along your model(s), optimizer(s) and dataloader(s) to the `prepare()` method.
3. (Optional, but best practice) Remove all the `cuda()` or `to(device)` calls in your code and let the `accelerator` handle device placement for you.
4. Replace the `loss.backward()` in your code by `accelerator.backward(loss)`.
5. (Optional, when using distributed evaluation) Gather your predictions and labels before storing them or using them for metric computation, using `gather()`.
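Put together, these steps amount to something like the following minimal sketch, assuming `model`, `optimizer`, `train_dataloader` and `loss_fn` are the regular PyTorch objects already defined in your script:

```python
from accelerate import Accelerator

accelerator = Accelerator()  # step 1: create it as early as possible

# step 2: hand over your objects (this also covers device placement, step 3)
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

model.train()
for inputs, targets in train_dataloader:
    optimizer.zero_grad()
    # no `.to(device)` needed: batches arrive on the right device
    loss = loss_fn(model(inputs), targets)
    accelerator.backward(loss)  # step 4: replaces `loss.backward()`
    optimizer.step()
```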
This is all that is needed in most cases. For more advanced cases, or for a nicer experience, here are the functions you
should search for and replace by the corresponding methods of your `accelerator`:
- `print` statements should be replaced by `print()` so they are only printed once per server.
- Use `is_local_main_process` for statements that should be executed once per server.
- Use `is_main_process` for statements that should be executed only once.
- Use `wait_for_everyone()` to make sure all processes join that point before continuing (useful before a model save, for instance).
- Use `unwrap_model()` to unwrap your model before saving it.
- Use `save()` instead of `torch.save`.
- Use `clip_grad_norm_()` instead of `torch.nn.utils.clip_grad_norm_` and `clip_grad_value_()` instead of `torch.nn.utils.clip_grad_value_`.
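For illustration, here is a hedged sketch of some of those replacements in a training script (`accelerator` comes from the quick-start above; using `tqdm` for the progress bar is just one possible choice):

```python
from tqdm import tqdm

# replaces a bare `print`: only the local main process of each server prints
accelerator.print("Starting training")

# one progress bar per server rather than one per process
progress_bar = tqdm(train_dataloader, disable=not accelerator.is_local_main_process)

# something that must happen exactly once overall, e.g. reporting final metrics
if accelerator.is_main_process:
    print("Training finished")

# make every process reach this point before saving the model
accelerator.wait_for_everyone()
```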
class accelerate.Accelerator(device_placement: bool = True, split_batches: bool = False, fp16: bool = None, cpu: bool = False)

Creates an instance of an accelerator for distributed training (on multi-GPU, TPU) or mixed precision training.
Parameters

- device_placement (`bool`, optional, defaults to `True`) – Whether or not the accelerator should put objects on device (tensors yielded by the dataloader, model, etc.).
- split_batches (`bool`, optional, defaults to `False`) – Whether or not the accelerator should split the batches yielded by the dataloaders across the devices. If `True`, the actual batch size used will be the same on any kind of distributed process, but it must be a round multiple of the `num_processes` you are using. If `False`, the actual batch size used will be the one set in your script multiplied by the number of processes.
- fp16 (`bool`, optional) – Whether or not to use mixed precision training. Will default to the value in the environment variable `USE_FP16`, which will use the default value in the accelerate config of the current system or the flag passed with the `accelerate.launch` command.
- cpu (`bool`, optional) – Whether or not to force the script to execute on CPU. Will ignore any available GPUs if set to `True` and force the execution on one process only.
Attributes

- device (`torch.device`) – The device to use.
- state (`AcceleratorState`) – The distributed setup state.
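For instance, a few possible ways of configuring the options above (only one of these would be used in a real script):

```python
from accelerate import Accelerator

accelerator = Accelerator()                    # defaults: automatic device placement
accelerator = Accelerator(split_batches=True)  # keep the script's batch size and split each batch across processes
accelerator = Accelerator(cpu=True)            # force CPU execution, ignoring any available GPU
```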
clip_grad_norm_(parameters, max_norm, norm_type=2)

Should be used in place of `torch.nn.utils.clip_grad_norm_()`.

clip_grad_value_(parameters, clip_value)

Should be used in place of `torch.nn.utils.clip_grad_value_()`.
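For example, clipping gradients in the training loop from the quick-start sketch above (the clipping values are arbitrary):

```python
accelerator.backward(loss)
# instead of torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)
# or, to clip individual gradient values instead:
# accelerator.clip_grad_value_(model.parameters(), clip_value=0.5)
optimizer.step()
```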
gather(tensor)

Gather the values in `tensor` across all processes and concatenate them on the first dimension. Useful to regroup the predictions from all processes when doing evaluation.

Note: This gather happens in all processes.

Parameters

- tensor (`torch.Tensor`, or a nested tuple/list/dictionary of `torch.Tensor`) – The tensors to gather across all processes.

Returns

The gathered tensor(s). Note that the first dimension of the result is `num_processes` multiplied by the first dimension of the input tensors.

Return type

`torch.Tensor`, or a nested tuple/list/dictionary of `torch.Tensor`
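A sketch of distributed evaluation with `gather()`, assuming `model` and `eval_dataloader` went through `prepare()` and that predictions are class indices:

```python
import torch

model.eval()
all_predictions, all_labels = [], []
for inputs, labels in eval_dataloader:
    with torch.no_grad():
        predictions = model(inputs).argmax(dim=-1)
    # gather *before* storing: each process receives the concatenation of the
    # tensors from every process, so the first dimension becomes
    # num_processes * batch_size
    all_predictions.append(accelerator.gather(predictions))
    all_labels.append(accelerator.gather(labels))
```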
property is_local_main_process

`True` for one process per server.

property is_main_process

`True` for one process only.
prepare(*args)

Prepare all objects passed in `args` for distributed training and mixed precision, then return them in the same order.

Accepts the following types of objects:

- `torch.utils.data.DataLoader`: PyTorch DataLoader
- `torch.nn.Module`: PyTorch Module
- `torch.optim.Optimizer`: PyTorch Optimizer
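Since the objects come back in the order they were passed in, unpacking the result directly is safe, as in the quick-start sketch:

```python
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)
```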
save(obj, f)

Save the object passed to disk once per machine. Use in place of `torch.save`.

Parameters

- obj – The object to save.
- f (`str` or `os.PathLike`) – Where to save the content of `obj`.
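A sketch of saving a final model, combining this method with `wait_for_everyone()` and `unwrap_model()` as recommended above (the file name is a placeholder):

```python
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
accelerator.save(unwrapped_model.state_dict(), "my_checkpoint.bin")
```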