Kwargs Handlers
The following objects can be passed to the main Accelerator to customize how some PyTorch objects
related to distributed training or mixed precision are created.
DistributedDataParallelKwargs

class accelerate.DistributedDataParallelKwargs(dim: int = 0, broadcast_buffers: bool = True, bucket_cap_mb: int = 25, find_unused_parameters: bool = False, check_reduction: bool = False, gradient_as_bucket_view: bool = False)

Use this object in your Accelerator to customize how your model is wrapped in a torch.nn.parallel.DistributedDataParallel. Please refer to the documentation of this wrapper for more information on each argument.

Warning: gradient_as_bucket_view is only available in PyTorch 1.7.0 and later versions.
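For example, a handler is passed to the Accelerator through its kwargs_handlers argument. The sketch below enables find_unused_parameters on the underlying DistributedDataParallel wrapper; the toy model and optimizer are placeholders for your own objects, not part of the library:

import torch
from accelerate import Accelerator, DistributedDataParallelKwargs

# Ask DDP to detect parameters that receive no gradient during the backward
# pass (useful for models with conditionally executed branches).
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

# Placeholder model and optimizer for illustration; when the script is
# launched in a distributed setting, prepare() wraps the model in
# torch.nn.parallel.DistributedDataParallel using the options above.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)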
GradScalerKwargs

class accelerate.GradScalerKwargs(init_scale: float = 65536.0, growth_factor: float = 2.0, backoff_factor: float = 0.5, growth_interval: int = 2000, enabled: bool = True)

Use this object in your Accelerator to customize the behavior of mixed precision, specifically how the torch.cuda.amp.GradScaler used is created. Please refer to the documentation of this scaler for more information on each argument.

Warning: GradScaler is only available in PyTorch 1.5.0 and later versions.
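A similar sketch for customizing the scaler: the handler only takes effect when mixed precision is enabled on the Accelerator (shown here as mixed_precision="fp16", which may differ depending on your Accelerate version), and the values below are purely illustrative:

from accelerate import Accelerator, GradScalerKwargs

# Start the loss scale lower and grow it less often than the defaults.
# These values are examples, not recommendations.
scaler_kwargs = GradScalerKwargs(init_scale=1024.0, growth_interval=4000)
accelerator = Accelerator(mixed_precision="fp16", kwargs_handlers=[scaler_kwargs])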