# fd33b4dc739ed339f325e271329c3708
This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B on the MRPC subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set (an inference sketch follows the list):
- Loss: 8.6834
- Data Size (fraction): 1.0
- Epoch Runtime (s): 179.1882
- Accuracy: 0.6226
- F1 Macro: 0.5889
- Rouge1: 0.6221
- Rouge2: 0.0
- Rougel: 0.6226
- Rougelsum: 0.6232
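The card does not include an inference recipe. The sketch below is a minimal, hedged example: the repo id is taken from this card, but loading the checkpoint as a causal LM (like its DeepSeek-R1-Distill-Llama-8B base) and the MRPC-style paraphrase prompt are assumptions.

```python
# Minimal inference sketch. Assumptions: the checkpoint loads as a causal LM
# like its base model, and MRPC sentence pairs are posed as a free-text
# prompt; the prompt wording below is illustrative, not from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "contemmcm/fd33b4dc739ed339f325e271329c3708"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Sentence 1: The company said profits rose 10% last quarter.\n"
    "Sentence 2: Quarterly profits at the firm increased by ten percent.\n"
    "Are these two sentences equivalent?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```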
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
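While the card provides no further detail here, the dataset it names can be loaded directly; a minimal sketch:

```python
from datasets import load_dataset

# MRPC subset of GLUE, as named in this card; each example pairs two
# sentences with a binary paraphrase label.
dataset = load_dataset("nyu-mll/glue", "mrpc")
print(dataset["train"][0])  # fields: sentence1, sentence2, label, idx
```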
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
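A hedged sketch of a `TrainingArguments` object matching this configuration; only the hyperparameter values come from the card, while the output directory (and any data/model wiring around it) is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mrpc-finetune",    # placeholder; not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```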
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size (fraction) | Epoch Runtime (s) | Accuracy | F1 Macro | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 4.7487 | 0 | 6.7844 | 0.6197 | 0.5055 | 0.6203 | 0.0 | 0.6197 | 0.6191 |
| No log | 1 | 114 | 222.4741 | 0.0078 | 6.8698 | 0.3349 | 0.2509 | 0.3343 | 0.0 | 0.3355 | 0.3349 |
| No log | 2 | 228 | 92.6943 | 0.0156 | 16.7244 | 0.3349 | 0.2509 | 0.3343 | 0.0 | 0.3355 | 0.3349 |
| No log | 3 | 342 | 4.6442 | 0.0312 | 27.8821 | 0.6663 | 0.4049 | 0.6669 | 0.0 | 0.6663 | 0.6663 |
| 1.7692 | 4 | 456 | 47.7786 | 0.0625 | 43.7053 | 0.6651 | 0.3994 | 0.6657 | 0.0 | 0.6645 | 0.6651 |
| 1.7692 | 5 | 570 | 2.8836 | 0.125 | 63.4294 | 0.6651 | 0.3994 | 0.6657 | 0.0 | 0.6645 | 0.6651 |
| 1.7692 | 6 | 684 | 2.5582 | 0.25 | 90.4381 | 0.6651 | 0.3994 | 0.6657 | 0.0 | 0.6645 | 0.6651 |
| 0.9053 | 7 | 798 | 2.6115 | 0.5 | 115.1112 | 0.6651 | 0.3994 | 0.6657 | 0.0 | 0.6645 | 0.6651 |
| 2.6302 | 8 | 912 | 2.4972 | 1.0 | 178.0615 | 0.6651 | 0.3994 | 0.6657 | 0.0 | 0.6645 | 0.6651 |
| 2.3755 | 9 | 1026 | 2.4527 | 1.0 | 173.0147 | 0.6757 | 0.5915 | 0.6763 | 0.0 | 0.6748 | 0.6757 |
| 1.5813 | 10 | 1140 | 3.8940 | 1.0 | 179.4894 | 0.6114 | 0.5934 | 0.6114 | 0.0 | 0.6114 | 0.6114 |
| 0.9018 | 11 | 1254 | 4.9587 | 1.0 | 174.2554 | 0.6144 | 0.5902 | 0.6144 | 0.0 | 0.6144 | 0.6144 |
| 0.3964 | 12 | 1368 | 22.9097 | 1.0 | 163.4577 | 0.6527 | 0.5617 | 0.6521 | 0.0 | 0.6527 | 0.6533 |
| 0.3403 | 13 | 1482 | 8.6834 | 1.0 | 179.1882 | 0.6226 | 0.5889 | 0.6221 | 0.0 | 0.6226 | 0.6232 |
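For reference, a hedged sketch of computing metrics like those reported above with the `evaluate` library; the predictions and references below are hypothetical stand-ins, not outputs of this model:

```python
import evaluate

# Hypothetical label ids and generated strings, for illustration only.
pred_labels = [1, 0, 1]
gold_labels = [1, 1, 1]
pred_text = ["equivalent", "not equivalent", "equivalent"]
gold_text = ["equivalent", "equivalent", "equivalent"]

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
rouge = evaluate.load("rouge")

print(accuracy.compute(predictions=pred_labels, references=gold_labels))
print(f1.compute(predictions=pred_labels, references=gold_labels, average="macro"))
print(rouge.compute(predictions=pred_text, references=gold_text))  # rouge1/2/L/Lsum
```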
### Framework versions
- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.1