Hi, I am trying to fine-tune a 35B model using LoRA (r = 64, alpha = 64). My per-device batch size is 2 and gradient accumulation is 2. I am using 8 A100 80GB GPUs with DeepSpeed ZeRO-2. I estimated this would fit on 3 GPUs, but I can't even get it to run on 8 GPUs; I keep getting CUDA OOM errors. I can't figure out why this discrepancy exists. It would be great if someone could explain why this is happening.
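For reference, here is a rough sketch of the kind of setup I am describing, assuming Transformers + PEFT with a DeepSpeed ZeRO-2 JSON config. The model name, dataset file, and LoRA target modules below are placeholders, not my exact values:

```python
# Rough sketch of the setup: LoRA (r = alpha = 64), per-device batch 2,
# grad accumulation 2, DeepSpeed ZeRO-2, launched with e.g.:
#   deepspeed --num_gpus=8 train.py
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "some-org/some-35b-model"  # placeholder for the 35B base model

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA with r = alpha = 64, as in my run; target_modules are illustrative.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder dataset: a JSONL file with a "text" field.
train_dataset = load_dataset("json", data_files="train.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_dataset = train_dataset.map(
    tokenize, batched=True, remove_columns=train_dataset.column_names
)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Per-device batch size 2, grad accumulation 2, ZeRO-2 via a DeepSpeed JSON config.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    bf16=True,
    deepspeed="ds_zero2.json",  # standard stage-2 config file
    num_train_epochs=1,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collator,
)
trainer.train()
```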