My Fine-Tuning loss is not decreasing

Hey guys, I'm new to fine-tuning, and I decided to fine-tune a MiniLM model with a triplet dataset.

The problem I'm facing is that my training loss is not decreasing. I have read the sbert.net documentation and followed their sample code, but I still can't find the problem.

Please let me know if I made any silly mistakes, as I'm really new to this domain :sad_but_relieved_face:

The following is my training code:

```python
import os
from datetime import datetime

import pandas as pd
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.evaluation import TripletEvaluator
from sentence_transformers.training_args import BatchSamplers
from transformers import EarlyStoppingCallback


def finetune_sentencebert(config: dict, dataset_dict, log_file: str = "./log/experiment_log_v2.csv"):
    """
    Fine-tune a SentenceTransformer model with flexible loss + logging.

    Args:
        config (dict): Experiment config (model_name, epochs, batch_size, loss_type, etc.)
        dataset_dict (DatasetDict): HuggingFace DatasetDict with 'train', 'validation', 'test'
        log_file (str): Path to CSV log file

    Returns:
        SentenceTransformer: Fine-tuned model
    """
    # ---------------- Load model ----------------
    model = SentenceTransformer(config["model_name"])

    # ---------------- Dataset preparation ----------------
    train_dataset = dataset_dict["train"]
    eval_dataset = dataset_dict["validation"]
    test_dataset = dataset_dict["test"]

    # ---------------- Training arguments ----------------
    args = SentenceTransformerTrainingArguments(
        output_dir=config["output_path"],
        num_train_epochs=config["epochs"],
        per_device_train_batch_size=config["batch_size"],
        per_device_eval_batch_size=config["batch_size"],
        learning_rate=config.get("learning_rate", 2e-5),
        batch_sampler=BatchSamplers.NO_DUPLICATES,

        # Evaluate every 50 steps
        eval_strategy="steps",
        eval_steps=50,

        # Keep only the best checkpoint and reload it at the end
        save_strategy="steps",
        save_steps=50,                      # same as eval_steps
        save_total_limit=1,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,

        # Logging
        logging_steps=10,
        report_to="none",                   # the string "none" disables reporting integrations
    )

    # ---------------- Evaluators ----------------
    # TripletEvaluator accepts raw lists of strings, no wrapping needed
    evaluator_valid = TripletEvaluator(
        anchors=eval_dataset["anchor"],
        positives=eval_dataset["positive"],
        negatives=eval_dataset["negative"],
        name="validation_eval",
    )
    evaluator_test = TripletEvaluator(
        anchors=test_dataset["anchor"],
        positives=test_dataset["positive"],
        negatives=test_dataset["negative"],
        name="test_eval",
    )

    val_before = evaluator_valid(model)
    test_before = evaluator_test(model)
    print("[Before Training] Validation:", {k: f"{v:.4f}" for k, v in val_before.items()})
    print("[Before Training] Test:", {k: f"{v:.4f}" for k, v in test_before.items()})

    # ---------------- Flexible loss ----------------
    if config["loss_type"] == "triplet":
        train_loss = losses.TripletLoss(model=model, triplet_margin=config.get("triplet_margin", 0.5))
    elif config["loss_type"] == "cosine":
        train_loss = losses.CosineSimilarityLoss(model=model)
    elif config["loss_type"] == "contrastive":
        train_loss = losses.ContrastiveLoss(model=model)
    else:
        raise ValueError(f"Unknown loss type: {config['loss_type']}")

    # ---------------- Training ----------------
    output_dir = config["output_path"]
    os.makedirs(output_dir, exist_ok=True)

    trainer = SentenceTransformerTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        loss=train_loss,
        evaluator=evaluator_valid if config.get("eval_during_training", True) else None,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
    )
    trainer.train()

    # ---------------- After-training evaluation ----------------
    val_after = evaluator_valid(model)
    test_after = evaluator_test(model)
    print("[After Training] Validation:", {k: f"{v:.4f}" for k, v in val_after.items()})
    print("[After Training] Test:", {k: f"{v:.4f}" for k, v in test_after.items()})

    # ---------------- Save best model ----------------
    best_model_path = os.path.join(config["output_path"], "best_model")
    os.makedirs(best_model_path, exist_ok=True)
    model.save(best_model_path)
    print(f"Saved best model to {best_model_path}")

    # ---------------- Logging ----------------
    log_data = {
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        **config,
        "val_before": val_before,
        "test_before": test_before,
        "val_after": val_after,
        "test_after": test_after,
    }

    if os.path.exists(log_file):
        df_log = pd.read_csv(log_file)
        df_log = pd.concat([df_log, pd.DataFrame([log_data])], ignore_index=True)
    else:
        df_log = pd.DataFrame([log_data])

    df_log.to_csv(log_file, index=False)
    print(f"Logged results to {log_file}")

    return model

```
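For reference, this is roughly how a function like this might be invoked. The config keys below match what the code reads, but every value, the model name, and the toy dataset are illustrative placeholders rather than the actual experiment settings.

```python
from datasets import Dataset, DatasetDict

# Illustrative config -- the keys match what finetune_sentencebert() reads,
# but the values are placeholders, not the real experiment settings.
config = {
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "output_path": "./output/minilm_triplet_v2",
    "epochs": 4,
    "batch_size": 32,
    "learning_rate": 2e-5,
    "loss_type": "triplet",
    "triplet_margin": 0.5,
    "eval_during_training": True,
}

# Tiny in-memory stand-in for the real DatasetDict (illustrative only);
# each split needs 'anchor', 'positive', and 'negative' string columns.
def make_split(n):
    return Dataset.from_dict({
        "anchor":   [f"I like to eat {i} apples" for i in range(n)],
        "positive": [f"I like to eat {i + 1} apples" for i in range(n)],
        "negative": [f"I like to eat {i + 5} apples" for i in range(n)],
    })

dataset_dict = DatasetDict({
    "train": make_split(64),
    "validation": make_split(16),
    "test": make_split(16),
})

model = finetune_sentencebert(config, dataset_dict)
```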

It seems TripletLoss is quite delicate

Can you show me a screenshot of the actual data, please?

The dataset is a triplet of anchor, positive, and negative, where the sentences are the same in words but different in numbers.

I would like to know whether my training setup is correct. If it is, I can focus on the quality of my data during preprocessing :thinking:

It’s a what of what? No idea what you’re talking about, and neither of your replies is a screenshot.

Sorry for not sharing the screenshot. Essentially, within each triplet the sentences are the same; only the numbers in them are different.

Thanks for this forum link!

I tried the quick debug you suggested of overfitting a tiny subset, but my loss didn't get anywhere near zero, so it's probably a problem with my triplet setup. I'm still new to this domain, though, so does my code setup look alright to you?
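For context, the overfit check being discussed is roughly something like this (a sketch, not the exact code that was run; the subset size, epoch count, and model name are arbitrary, and `train_dataset` is assumed to be the triplet dataset from the training code above):

```python
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Sanity check: deliberately over-train on a tiny subset of triplets.
# If the training setup is wired up correctly, the loss on these few
# examples should collapse well below the triplet margin.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
tiny_train = train_dataset.select(range(16))  # assumes train_dataset from the code above

args = SentenceTransformerTrainingArguments(
    output_dir="./output/overfit_check",
    num_train_epochs=50,            # far more epochs than usual, on purpose
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    logging_steps=10,
    report_to="none",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=tiny_train,
    loss=losses.TripletLoss(model=model, triplet_margin=0.5),
)
trainer.train()  # watch whether the logged loss drops towards zero
```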

It's hard for us to determine whether it's okay just from the code. If it were an error, the error message would give us a chunk of information to work with, so we might be able to figure it out somehow. But this time it's probably not an error; it's likely an issue with consistency between the data and the code.

The loss function is “flexible” in polite terms, or “anything goes” in plain language, so whether it works correctly depends entirely on the actual function and the data itself…

Well, even without the actual data, if I have a little data of the same shape, I can probably make a guess…

My dataset is a triplet dataset, and I'm trying to make my model understand the differences in the numerical values. For example:

anchor: I like to eat 5 apples
positive: I like to eat 6 apples
negative: I like to eat 10 apples

The problem I'm currently facing is that my eval_validation_eval_cosine_accuracy is dropping while my loss lingers around without decreasing. I tried a smaller margin since my sentences are very similar, but I still only got a tiny bit of improvement over my base model.

{'eval_loss': 0.500556468963623, 'eval_validation_eval_cosine_accuracy': 0.7423197627067566, 'eval_runtime': 53.6482, 'eval_samples_per_second': 148.654, 'eval_steps_per_second': 6.207, 'epoch': 3.38}
{'loss': 0.5023, 'grad_norm': 0.6130768656730652, 'learning_rate': 1.634351145038168e-05, 'epoch': 3.39}
{'loss': 0.4938, 'grad_norm': 0.4634580910205841, 'learning_rate': 1.6267175572519087e-05, 'epoch': 3.4}
{'loss': 0.5007, 'grad_norm': 0.4218961298465729, 'learning_rate': 1.619083969465649e-05, 'epoch': 3.41}
{'loss': 0.5046, 'grad_norm': 0.39359888434410095, 'learning_rate': 1.6114503816793893e-05, 'epoch': 3.41}
{'loss': 0.502, 'grad_norm': 0.29602673649787903, 'learning_rate': 1.6038167938931297e-05, 'epoch': 3.42}
{'loss': 0.5051, 'grad_norm': 0.3112173080444336, 'learning_rate': 1.5961832061068705e-05, 'epoch': 3.43}
{'loss': 0.5047, 'grad_norm': 0.3411155045032501, 'learning_rate': 1.588549618320611e-05, 'epoch': 3.44}
{'loss': 0.5008, 'grad_norm': 0.2764521539211273, 'learning_rate': 1.5809160305343514e-05, 'epoch': 3.44}
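For reference, a quick way to see why the loss sits near the margin (0.5 here) is to compare the base model's similarities directly: TripletLoss is max(0, d(anchor, positive) - d(anchor, negative) + margin), so if the positive and negative end up nearly equidistant from the anchor, the loss just hovers at the margin. A minimal sketch, where the model name and sentences are only the illustrative example above:

```python
from sentence_transformers import SentenceTransformer, util

# Check how the (base) model separates one example triplet.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed base model

anchor   = "I like to eat 5 apples"
positive = "I like to eat 6 apples"
negative = "I like to eat 10 apples"

emb = model.encode([anchor, positive, negative], convert_to_tensor=True)
sim_ap = util.cos_sim(emb[0], emb[1]).item()
sim_an = util.cos_sim(emb[0], emb[2]).item()

print(f"cos(anchor, positive) = {sim_ap:.4f}")
print(f"cos(anchor, negative) = {sim_an:.4f}")
# If these two numbers are almost identical, the model barely registers the
# numeric difference, and TripletLoss will stay pinned around its margin.
```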

Without seeing the data, it’s hard to say for sure, but I’ve tried to fill in the gaps based on my understanding.

If your negative is too vague, it can cancel out other pretrained metrics by creating a gradient explosion in backpropagation.
The negative should be directly related to the anchor and in direct opposition to the positive,
e.g. negative: I can not eat 10 apples.
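Following that suggestion, a reshaped triplet would look something like this (purely illustrative):

```python
# A triplet where the negative directly contradicts the anchor,
# instead of differing only in the number (illustrative only).
triplet = {
    "anchor":   "I like to eat 5 apples",
    "positive": "I like to eat 6 apples",
    "negative": "I can not eat 10 apples",
}
```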
