Accelerator
The Accelerator is the main class provided by 🤗 Accelerate. It serves as the main entry point for
the API. To quickly adapt your script to work on any kind of setup with 🤗 Accelerate, just:
1. Initialize an `Accelerator` object (that we will call `accelerator` in the rest of this page) as early as possible in your script.
2. Pass along your model(s), optimizer(s) and dataloader(s) to the `prepare()` method.
3. (Optional, but best practice) Remove all the `cuda()` or `to(device)` calls in your code and let the `accelerator` handle device placement for you.
4. Replace the `loss.backward()` in your code by `accelerator.backward(loss)`.
5. (Optional, when using distributed evaluation) Gather your predictions and labels before storing them or using them for metric computation, using `gather()`.
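Put together, these steps amount to something like the following minimal sketch, assuming `model`, `optimizer`, `train_dataloader` and `loss_fn` are the regular PyTorch objects already defined in your script:

```python
from accelerate import Accelerator

accelerator = Accelerator()  # step 1: create it as early as possible

# step 2: hand over your objects (this also covers device placement, step 3)
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)

model.train()
for inputs, targets in train_dataloader:
    optimizer.zero_grad()
    # no `.to(device)` needed: batches arrive on the right device
    loss = loss_fn(model(inputs), targets)
    accelerator.backward(loss)  # step 4: replaces `loss.backward()`
    optimizer.step()
```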
This is all that is needed in most cases. For more advanced cases, or for a nicer experience, here are the functions you
should search for and replace by the corresponding methods of your `accelerator`:
- `print` statements should be replaced by `print()` so they are only printed once per server.
- Use `is_local_main_process` for statements that should be executed once per server.
- Use `is_main_process` for statements that should be executed only once.
- Use `wait_for_everyone()` to make sure all processes join that point before continuing (useful before a model save, for instance).
- Use `unwrap_model()` to unwrap your model before saving it.
- Use `save()` instead of `torch.save`.
- Use `clip_grad_norm_()` instead of `torch.nn.utils.clip_grad_norm_` and `clip_grad_value_()` instead of `torch.nn.utils.clip_grad_value_`.
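For illustration, here is a hedged sketch of some of those replacements in a training script (`accelerator` comes from the quick-start above; using `tqdm` for the progress bar is just one possible choice):

```python
from tqdm import tqdm

# replaces a bare `print`: only the local main process of each server prints
accelerator.print("Starting training")

# one progress bar per server rather than one per process
progress_bar = tqdm(train_dataloader, disable=not accelerator.is_local_main_process)

# something that must happen exactly once overall, e.g. reporting final metrics
if accelerator.is_main_process:
    print("Training finished")

# make every process reach this point before saving the model
accelerator.wait_for_everyone()
```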
class accelerate.Accelerator(device_placement: bool = True, split_batches: bool = False, fp16: bool = None, cpu: bool = False)

Creates an instance of an accelerator for distributed training (on multi-GPU, TPU) or mixed precision training.
Parameters

- device_placement (`bool`, optional, defaults to `True`) – Whether or not the accelerator should put objects on device (tensors yielded by the dataloader, model, etc.).
- split_batches (`bool`, optional, defaults to `False`) – Whether or not the accelerator should split the batches yielded by the dataloaders across the devices. If `True`, the actual batch size used will be the same on any kind of distributed process, but it must be a round multiple of the `num_processes` you are using. If `False`, the actual batch size used will be the one set in your script multiplied by the number of processes.
- fp16 (`bool`, optional) – Whether or not to use mixed precision training. Will default to the value in the environment variable `USE_FP16`, which will use the default value in the accelerate config of the current system or the flag passed with the `accelerate.launch` command.
- cpu (`bool`, optional) – Whether or not to force the script to execute on CPU. Will ignore any available GPUs if set to `True` and force the execution on one process only.
Attributes

- device (`torch.device`) – The device to use.
- state (`AcceleratorState`) – The distributed setup state.
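For instance, a few possible ways of configuring the options above (only one of these would be used in a real script):

```python
from accelerate import Accelerator

accelerator = Accelerator()                    # defaults: automatic device placement
accelerator = Accelerator(split_batches=True)  # keep the script's batch size and split each batch across processes
accelerator = Accelerator(cpu=True)            # force CPU execution, ignoring any available GPU
```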
clip_grad_norm_(parameters, max_norm, norm_type=2)

Should be used in place of `torch.nn.utils.clip_grad_norm_()`.

clip_grad_value_(parameters, clip_value)

Should be used in place of `torch.nn.utils.clip_grad_value_()`.
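For example, clipping gradients in the training loop from the quick-start sketch above (the clipping values are arbitrary):

```python
accelerator.backward(loss)
# instead of torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
accelerator.clip_grad_norm_(model.parameters(), max_norm=1.0)
# or, to clip individual gradient values instead:
# accelerator.clip_grad_value_(model.parameters(), clip_value=0.5)
optimizer.step()
```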
gather(tensor)

Gather the values in `tensor` across all processes and concatenate them on the first dimension. Useful to regroup the predictions from all processes when doing evaluation.

Note: This gather happens in all processes.

Parameters

- tensor (`torch.Tensor`, or a nested tuple/list/dictionary of `torch.Tensor`) – The tensors to gather across all processes.

Returns

The gathered tensor(s). Note that the first dimension of the result is `num_processes` multiplied by the first dimension of the input tensors.

Return type

`torch.Tensor`, or a nested tuple/list/dictionary of `torch.Tensor`
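A sketch of distributed evaluation with `gather()`, assuming `model` and `eval_dataloader` went through `prepare()` and that predictions are class indices:

```python
import torch

model.eval()
all_predictions, all_labels = [], []
for inputs, labels in eval_dataloader:
    with torch.no_grad():
        predictions = model(inputs).argmax(dim=-1)
    # gather *before* storing: each process receives the concatenation of the
    # tensors from every process, so the first dimension becomes
    # num_processes * batch_size
    all_predictions.append(accelerator.gather(predictions))
    all_labels.append(accelerator.gather(labels))
```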
property is_local_main_process

`True` for one process per server.

property is_main_process

`True` for one process only.
prepare(*args)

Prepare all objects passed in `args` for distributed training and mixed precision, then return them in the same order.

Accepts the following types of objects:

- `torch.utils.data.DataLoader`: PyTorch DataLoader
- `torch.nn.Module`: PyTorch Module
- `torch.optim.Optimizer`: PyTorch Optimizer
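Since the objects come back in the order they were passed in, unpacking the result directly is safe, as in the quick-start sketch:

```python
model, optimizer, train_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader
)
```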
save(obj, f)

Save the object passed to disk once per machine. Use in place of `torch.save`.

Parameters

- obj – The object to save.
- f (`str` or `os.PathLike`) – Where to save the content of `obj`.
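A sketch of saving a final model, combining this method with `wait_for_everyone()` and `unwrap_model()` as recommended above (the file name is a placeholder):

```python
accelerator.wait_for_everyone()
unwrapped_model = accelerator.unwrap_model(model)
accelerator.save(unwrapped_model.state_dict(), "my_checkpoint.bin")
```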