My Fine-Tuning loss is not decreasing

Hey guys, I'm new to fine-tuning, and I decided to fine-tune a MiniLM model with a triplet dataset.

The problem I'm facing is that my training loss is not decreasing. I have read the sbert.net documentation and followed their sample code, but I still can't find the problem.

Please let me know if I made any silly mistakes, as I'm really new to this domain :sad_but_relieved_face:

The following is my training code:

```python
import os
from datetime import datetime

import pandas as pd
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.evaluation import TripletEvaluator
from sentence_transformers.training_args import BatchSamplers
from transformers import EarlyStoppingCallback


def finetune_sentencebert(config: dict, dataset_dict, log_file: str = "./log/experiment_log_v2.csv"):
    """
    Fine-tune a SentenceTransformer model with flexible loss + logging.

    Args:
        config (dict): Experiment config (model_name, epochs, batch_size, loss_type, etc.)
        dataset_dict (DatasetDict): HuggingFace DatasetDict with 'train', 'validation', 'test'
        log_file (str): Path to CSV log file

    Returns:
        SentenceTransformer: Fine-tuned model
    """
    # ---------------- Load model ----------------
    model = SentenceTransformer(config["model_name"])

    # ---------------- Dataset preparation ----------------
    train_dataset = dataset_dict["train"]
    eval_dataset = dataset_dict["validation"]
    test_dataset = dataset_dict["test"]

    # ---------------- Training arguments ----------------
    args = SentenceTransformerTrainingArguments(
        output_dir=config["output_path"],
        num_train_epochs=config["epochs"],
        per_device_train_batch_size=config["batch_size"],
        per_device_eval_batch_size=config["batch_size"],
        learning_rate=config.get("learning_rate", 2e-5),
        batch_sampler=BatchSamplers.NO_DUPLICATES,

        # Evaluate every 50 steps
        eval_strategy="steps",
        eval_steps=50,

        # Keep only the best checkpoint and reload it at the end
        save_strategy="steps",
        save_steps=50,                      # same as eval_steps
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # Logging
        logging_steps=10,
        report_to="none",                   # the string "none" disables reporting integrations
    )

    # ---------------- Evaluators ----------------
    # TripletEvaluator accepts raw lists of strings, no wrapping needed
    evaluator_valid = TripletEvaluator(
        anchors=eval_dataset["anchor"],
        positives=eval_dataset["positive"],
        negatives=eval_dataset["negative"],
        name="validation_eval",
    )
    evaluator_test = TripletEvaluator(
        anchors=test_dataset["anchor"],
        positives=test_dataset["positive"],
        negatives=test_dataset["negative"],
        name="test_eval",
    )

    val_before = evaluator_valid(model)
    test_before = evaluator_test(model)
    print("[Before Training] Validation:", {k: f"{v:.4f}" for k, v in val_before.items()})
    print("[Before Training] Test:", {k: f"{v:.4f}" for k, v in test_before.items()})

    # ---------------- Flexible loss ----------------
    if config["loss_type"] == "triplet":
        train_loss = losses.TripletLoss(model=model, triplet_margin=config.get("triplet_margin", 0.5))
    elif config["loss_type"] == "cosine":
        train_loss = losses.CosineSimilarityLoss(model=model)
    elif config["loss_type"] == "contrastive":
        train_loss = losses.ContrastiveLoss(model=model)
    else:
        raise ValueError(f"Unknown loss type: {config['loss_type']}")

    # ---------------- Training ----------------
    output_dir = config["output_path"]
    os.makedirs(output_dir, exist_ok=True)

    trainer = SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        loss=train_loss,
        evaluator=evaluator_valid if config.get("eval_during_training", True) else None,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
    )
    trainer.train()

    # ---------------- After-training evaluation ----------------
    val_after = evaluator_valid(model)
    test_after = evaluator_test(model)
    print("[After Training] Validation:", {k: f"{v:.4f}" for k, v in val_after.items()})
    print("[After Training] Test:", {k: f"{v:.4f}" for k, v in test_after.items()})

    # ---------------- Save best model ----------------
    best_model_path = os.path.join(config["output_path"], "best_model")
    os.makedirs(best_model_path, exist_ok=True)
    model.save(best_model_path)
    print(f"Saved best model to {best_model_path}")

    # ---------------- Logging ----------------
    log_data = {
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        **config,
        "val_before": val_before,
        "test_before": test_before,
        "val_after": val_after,
        "test_after": test_after,
    }

    if os.path.exists(log_file):
        df_log = pd.read_csv(log_file)
        df_log = pd.concat([df_log, pd.DataFrame([log_data])], ignore_index=True)
    else:
        df_log = pd.DataFrame([log_data])

    df_log.to_csv(log_file, index=False)
    print(f"Logged results to {log_file}")

    return model

```
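For reference, this is roughly how a function like this might be invoked. The config keys below match what the code reads, but every value, the model name, and the toy dataset are illustrative placeholders rather than the actual experiment settings.

```python
from datasets import Dataset, DatasetDict

# Illustrative config -- the keys match what finetune_sentencebert() reads,
# but the values are placeholders, not the real experiment settings.
config = {
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "output_path": "./output/minilm_triplet_v2",
    "epochs": 4,
    "batch_size": 32,
    "learning_rate": 2e-5,
    "loss_type": "triplet",
    "triplet_margin": 0.5,
    "eval_during_training": True,
}

# Tiny in-memory stand-in for the real DatasetDict (illustrative only);
# each split needs 'anchor', 'positive', and 'negative' string columns.
def make_split(n):
    return Dataset.from_dict({
        "anchor":   [f"I like to eat {i} apples" for i in range(n)],
        "positive": [f"I like to eat {i + 1} apples" for i in range(n)],
        "negative": [f"I like to eat {i + 5} apples" for i in range(n)],
    })

dataset_dict = DatasetDict({
    "train": make_split(64),
    "validation": make_split(16),
    "test": make_split(16),
})

model = finetune_sentencebert(config, dataset_dict)
```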

It seems TripletLoss is quite delicate

Can you show me a screenshot of the actual data, please?

The dataset is a triplet of anchor, positive, and negative, where the sentences are the same in words but different in numbers.

I would like to know whether my training setup is correct. If it is, I can focus on the quality of my data during preprocessing :thinking:

It’s a what of what? No idea what you’re talking about, and neither of your replies is a screenshot.

Sorry for not sharing the screenshot. Essentially, within each triplet the sentences are the same; only the numbers in them are different.

Thanks for this forum link!

I tried the quick debug you suggested of overfitting a tiny subset, but my loss didn't get anywhere near zero, so it's probably a problem with my triplet setup. I'm still new to this domain, though, so does my code setup look alright to you?
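For context, the overfit check being discussed is roughly something like this (a sketch, not the exact code that was run; the subset size, epoch count, and model name are arbitrary, and `train_dataset` is assumed to be the triplet dataset from the training code above):

```python
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Sanity check: deliberately over-train on a tiny subset of triplets.
# If the training setup is wired up correctly, the loss on these few
# examples should collapse well below the triplet margin.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
tiny_train = train_dataset.select(range(16))  # assumes train_dataset from the code above

args = SentenceTransformerTrainingArguments(
    output_dir="./output/overfit_check",
    num_train_epochs=50,            # far more epochs than usual, on purpose
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    logging_steps=10,
    report_to="none",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=tiny_train,
    loss=losses.TripletLoss(model=model, triplet_margin=0.5),
)
trainer.train()  # watch whether the logged loss drops towards zero
```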

It's hard for us to determine whether it's okay just from the code. If it were an error, the error message would give us a chunk of information to work with, so we might be able to figure it out somehow. But this time it's probably not an error; it's likely an issue with consistency between the data and the code.

The loss function is “flexible” in polite terms, or “anything goes” in plain language, so whether it works correctly depends entirely on the actual function and the data itself…

Well, even without the actual data, if I have a little data of the same shape, I can probably make a guess…

My dataset is a triplet dataset, and I'm trying to make my model understand the differences in the numerical values. For example:

anchor: I like to eat 5 apples
positive: I like to eat 6 apples
negative: I like to eat 10 apples

The problem I'm currently facing is that my eval_validation_eval_cosine_accuracy is dropping while my loss lingers around without decreasing. I tried a smaller margin since my sentences are very similar, but I still only got a tiny bit of improvement over my base model.

{'eval_loss': 0.500556468963623, 'eval_validation_eval_cosine_accuracy': 0.7423197627067566, 'eval_runtime': 53.6482, 'eval_samples_per_second': 148.654, 'eval_steps_per_second': 6.207, 'epoch': 3.38}
{'loss': 0.5023, 'grad_norm': 0.6130768656730652, 'learning_rate': 1.634351145038168e-05, 'epoch': 3.39}
{'loss': 0.4938, 'grad_norm': 0.4634580910205841, 'learning_rate': 1.6267175572519087e-05, 'epoch': 3.4}
{'loss': 0.5007, 'grad_norm': 0.4218961298465729, 'learning_rate': 1.619083969465649e-05, 'epoch': 3.41}
{'loss': 0.5046, 'grad_norm': 0.39359888434410095, 'learning_rate': 1.6114503816793893e-05, 'epoch': 3.41}
{'loss': 0.502, 'grad_norm': 0.29602673649787903, 'learning_rate': 1.6038167938931297e-05, 'epoch': 3.42}
{'loss': 0.5051, 'grad_norm': 0.3112173080444336, 'learning_rate': 1.5961832061068705e-05, 'epoch': 3.43}
{'loss': 0.5047, 'grad_norm': 0.3411155045032501, 'learning_rate': 1.588549618320611e-05, 'epoch': 3.44}
{'loss': 0.5008, 'grad_norm': 0.2764521539211273, 'learning_rate': 1.5809160305343514e-05, 'epoch': 3.44}
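For reference, a quick way to see why the loss sits near the margin (0.5 here) is to compare the base model's similarities directly: TripletLoss is max(0, d(anchor, positive) - d(anchor, negative) + margin), so if the positive and negative end up nearly equidistant from the anchor, the loss just hovers at the margin. A minimal sketch, where the model name and sentences are only the illustrative example above:

```python
from sentence_transformers import SentenceTransformer, util

# Check how the (base) model separates one example triplet.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed base model

anchor   = "I like to eat 5 apples"
positive = "I like to eat 6 apples"
negative = "I like to eat 10 apples"

emb = model.encode([anchor, positive, negative], convert_to_tensor=True)
sim_ap = util.cos_sim(emb[0], emb[1]).item()
sim_an = util.cos_sim(emb[0], emb[2]).item()

print(f"cos(anchor, positive) = {sim_ap:.4f}")
print(f"cos(anchor, negative) = {sim_an:.4f}")
# If these two numbers are almost identical, the model barely registers the
# numeric difference, and TripletLoss will stay pinned around its margin.
```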

Without seeing the data, it’s hard to say for sure, but I’ve tried to fill in the gaps based on my understanding.

If your negative is too vague, it can cancel out other pretrained metrics by creating a gradient explosion in backpropagation.
The negative should be directly related to the anchor and in direct opposition to the positive,
e.g. negative: I can not eat 10 apples.
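Following that suggestion, a reshaped triplet would look something like this (purely illustrative):

```python
# A triplet where the negative directly contradicts the anchor,
# instead of differing only in the number (illustrative only).
triplet = {
    "anchor":   "I like to eat 5 apples",
    "positive": "I like to eat 6 apples",
    "negative": "I can not eat 10 apples",
}
```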
