There are several approaches, but the simplest is to swap HfApiModel for TransformersModel and it will work. Be aware that large models require a powerful GPU; the SmolLM example below should run in roughly 1 GB of VRAM.
Ollama is another option: it is faster and uses less VRAM, but it takes a bit more effort to set up than TransformersModel.
from smolagents import TransformersModel
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")