tomaarsen (HF Staff) committed c5094d0 (verified) · Parent(s): ab4aedb

Add new CrossEncoder model

README.md ADDED
---
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:79561408
- loss:MSELoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- tomaarsen/ms-marco-shuffled
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled)
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-modernbert-base-msmarco-mse")
# Get scores for pairs of texts
pairs = [
    ['what is a electrophoresis apparatus', 'Gel electrophoresis is a method for separation and analysis of macromolecules (DNA, RNA and proteins) and their fragments, based on their size and charge.el electrophoresis of large DNA or RNA is usually done by agarose gel electrophoresis. See the Chain termination method page for an example of a polyacrylamide DNA sequencing gel. Characterization through ligand interaction of nucleic acids or fragments may be performed by mobility shift affinity electrophoresis.'],
    ['does creatine elevate creatinine levels', "Creatinine is produced from creatine, a molecule of major importance for energy production in muscles. Approximately 2% of the body's creatine is converted to creatinine every day. Creatinine is transported through the bloodstream to the kidneys."],
    ['how to get rid of caffeine in the body', 'In addition to quickly curing caffeine withdrawal headaches, caffeine may help cure regular headaches and even migraines. Some studies have shown that small doses of caffeine taken in conjunction with pain killers may help the body absorb the medication more quickly and cure the headache in a shorter period of time.'],
    ['define splanchnopleure', 'delineated, represented, delineate(verb) represented accurately or precisely. define, delineate(verb) show the form or outline of. The tree was clearly defined by the light; The camera could define the smallest object. specify, define, delineate, delimit, delimitate(verb) determine the essential quality of.'],
    ['how many calories does a glass of wine', "A large glass of wine contains as many calories as an ice cream. We often drink wine with a meal. But did you know that a large glass of wine (250ml) with 13% ABV can add 228 calories to your dinner? That's similar to an ice cream or two fish fingers. A standard glass of red or white wine (175ml) with 13% ABV could also contain up to 160 calories, similar to a slice of Madeira cake."],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'what is a electrophoresis apparatus',
    [
        'Gel electrophoresis is a method for separation and analysis of macromolecules (DNA, RNA and proteins) and their fragments, based on their size and charge.el electrophoresis of large DNA or RNA is usually done by agarose gel electrophoresis. See the Chain termination method page for an example of a polyacrylamide DNA sequencing gel. Characterization through ligand interaction of nucleic acids or fragments may be performed by mobility shift affinity electrophoresis.',
        "Creatinine is produced from creatine, a molecule of major importance for energy production in muscles. Approximately 2% of the body's creatine is converted to creatinine every day. Creatinine is transported through the bloodstream to the kidneys.",
        'In addition to quickly curing caffeine withdrawal headaches, caffeine may help cure regular headaches and even migraines. Some studies have shown that small doses of caffeine taken in conjunction with pain killers may help the body absorb the medication more quickly and cure the headache in a shorter period of time.',
        'delineated, represented, delineate(verb) represented accurately or precisely. define, delineate(verb) show the form or outline of. The tree was clearly defined by the light; The camera could define the smallest object. specify, define, delineate, delimit, delimitate(verb) determine the essential quality of.',
        "A large glass of wine contains as many calories as an ice cream. We often drink wine with a meal. But did you know that a large glass of wine (250ml) with 13% ABV can add 228 calories to your dinner? That's similar to an ice cream or two fish fingers. A standard glass of red or white wine (175ml) with 13% ABV could also contain up to 160 calories, similar to a slice of Madeira cake.",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
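Note that `model.rank` is essentially `model.predict` over (query, document) pairs followed by a descending sort on the scores. A minimal sketch of that post-processing step, using made-up scores rather than actual model outputs:

```python
def rank_from_scores(scores):
    """Turn per-document scores into (corpus_id, score) dicts sorted by
    descending score, mirroring the structure CrossEncoder.rank returns."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]

# Hypothetical scores for five candidate passages
ranks = rank_from_scores([4.69, 0.79, -1.27, -6.0, 1.1])
print([r["corpus_id"] for r in ranks])  # [0, 4, 1, 2, 3]
```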

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)

| Metric      | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5979 (+0.1083)     | 0.3464 (+0.0760)     | 0.6886 (+0.2679)     |
| mrr@10      | 0.5893 (+0.1118)     | 0.6264 (+0.1266)     | 0.6962 (+0.2695)     |
| **ndcg@10** | **0.6585 (+0.1181)** | **0.3864 (+0.0613)** | **0.7366 (+0.2359)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5443 (+0.1507)     |
| mrr@10      | 0.6373 (+0.1693)     |
| **ndcg@10** | **0.5938 (+0.1385)** |

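For reference, the reported MRR@10 and NDCG@10 can be computed from a ranked list of relevance labels as follows. This is a simplified single-query sketch with binary toy judgments; the actual evaluators also handle graded relevance and average over queries:

```python
import math

def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant result within the top k."""
    for i, rel in enumerate(relevance[:k]):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

def ndcg_at_k(relevance, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# First relevant document at rank 2:
print(mrr_at_k([0, 1, 0, 1]))  # 0.5
```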
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms-marco-shuffled

* Dataset: [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) at [0e80192](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled/tree/0e8019214fbbb17845d8fa1e4594882944716633)
* Size: 79,561,408 training samples
* Columns: <code>score</code>, <code>query</code>, and <code>passage</code>
* Approximate statistics based on the first 1000 samples:
  |         | score                                                              | query                                                                                          | passage                                                                                          |
  |:--------|:-------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|
  | type    | float                                                               | string                                                                                          | string                                                                                               |
  | details | <ul><li>min: -11.8</li><li>mean: 0.75</li><li>max: 11.16</li></ul> | <ul><li>min: 9 characters</li><li>mean: 33.33 characters</li><li>max: 123 characters</li></ul> | <ul><li>min: 53 characters</li><li>mean: 348.8 characters</li><li>max: 1016 characters</li></ul> |
* Samples:
  | score                           | query                                                      | passage                                                                                                                                                                                                                                                                                                                                                                                                                                           |
  |:--------------------------------|:-----------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>6.732539335886638</code>  | <code>what is shielding in welding</code>                  | <code>A benefit in using a shielding gas when welding is that there is no slag left on the weld that requires chipping and cleaning like that which is found on an arc weld. When a new wire welding machine is purchased, it does not come with a shielding gas tank. This must be purchased or rented from a gas supplier. Most welding supply stores also sell welding gasses and will be able to assist the buyer in a tank purchase.</code> |
  | <code>-5.769245758652687</code> | <code>what degree do you need for physical therapy</code>  | <code>E. Medicaid covers occupational therapy, physical therapy and speech therapy services when provided to eligible Medicaid beneficiaries under age 21 in the Child Health Services (EPSDT) Program by qualified occupational, physical or speech therapy providers.</code>                                                                                                                                                                    |
  | <code>9.033631960550943</code>  | <code>cascade effect definition</code>                     | <code>In medicine, cascade effect may also refer to a chain of events initiated by an unnecessary test, an unexpected result, or patient or physician anxiety, which results in ill-advised tests or treatments that may cause harm to patients as the results are pursued.</code>                                                                                                                                                                |
* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#mseloss)

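Here the `score` column holds continuous relevance scores (presumably produced by a stronger teacher reranker), and MSELoss trains the model's single output logit to match them. The objective itself is just the mean squared error, sketched in plain Python with hypothetical values:

```python
def mse_loss(predicted, target):
    """Mean squared error between model logits and teacher scores."""
    assert len(predicted) == len(target)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

# Hypothetical model logits vs. teacher scores like those in the `score` column
loss = mse_loss([6.0, -5.0, 9.5], [6.73, -5.77, 9.03])
print(loss)
```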
### Evaluation Dataset

#### ms-marco-shuffled

* Dataset: [ms-marco-shuffled](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled) at [0e80192](https://huggingface.co/datasets/tomaarsen/ms-marco-shuffled/tree/0e8019214fbbb17845d8fa1e4594882944716633)
* Size: 79,561,408 evaluation samples
* Columns: <code>score</code>, <code>query</code>, and <code>passage</code>
* Approximate statistics based on the first 1000 samples:
  |         | score                                                               | query                                                                                           | passage                                                                                          |
  |:--------|:---------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------|
  | type    | float                                                                 | string                                                                                              | string                                                                                                 |
  | details | <ul><li>min: -11.86</li><li>mean: 0.72</li><li>max: 11.07</li></ul> | <ul><li>min: 10 characters</li><li>mean: 33.83 characters</li><li>max: 101 characters</li></ul> | <ul><li>min: 50 characters</li><li>mean: 343.73 characters</li><li>max: 929 characters</li></ul> |
* Samples:
  | score                            | query                                                 | passage                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
  |:----------------------------------|:-------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>4.691008905569713</code>   | <code>what is a electrophoresis apparatus</code>      | <code>Gel electrophoresis is a method for separation and analysis of macromolecules (DNA, RNA and proteins) and their fragments, based on their size and charge.el electrophoresis of large DNA or RNA is usually done by agarose gel electrophoresis. See the Chain termination method page for an example of a polyacrylamide DNA sequencing gel. Characterization through ligand interaction of nucleic acids or fragments may be performed by mobility shift affinity electrophoresis.</code> |
  | <code>0.7860534191131592</code>  | <code>does creatine elevate creatinine levels</code>  | <code>Creatinine is produced from creatine, a molecule of major importance for energy production in muscles. Approximately 2% of the body's creatine is converted to creatinine every day. Creatinine is transported through the bloodstream to the kidneys.</code>                                                                                                                                                                                                                               |
  | <code>-1.2669222354888916</code> | <code>how to get rid of caffeine in the body</code>   | <code>In addition to quickly curing caffeine withdrawal headaches, caffeine may help cure regular headaches and even migraines. Some studies have shown that small doses of caffeine taken in conjunction with pain killers may help the body absorb the medication more quickly and cure the headache in a shorter period of time.</code>                                                                                                                                                        |
* Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#mseloss)

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 8e-06
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True

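With `warmup_ratio: 0.1` and the default `lr_scheduler_type: linear`, the learning rate ramps linearly from 0 up to 8e-06 over the first 10% of training steps, then decays linearly back to 0. A sketch of that schedule (approximating the behavior of transformers' linear schedule with warmup, not its actual code):

```python
def linear_warmup_lr(step, total_steps, peak_lr=8e-06, warmup_ratio=0.1):
    """Linear warmup to peak_lr over the first warmup_ratio of steps,
    then linear decay back to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 31000  # roughly the number of optimizer steps in the training logs below
print(linear_warmup_lr(3100, total))  # peak learning rate: 8e-06
```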
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 8e-06
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step      | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_mean_ndcg@10 |
|:----------:|:---------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:---------------------:|
| -1         | -1        | -             | -               | 0.0219 (-0.5185)     | 0.2538 (-0.0712)     | 0.0498 (-0.4509)     | 0.1085 (-0.3469)      |
| 0.0000     | 1         | 64.054        | -               | -                    | -                    | -                    | -                     |
| 0.0322     | 1000      | 55.8586       | -               | -                    | -                    | -                    | -                     |
| 0.0643     | 2000      | 31.6183       | -               | -                    | -                    | -                    | -                     |
| 0.0965     | 3000      | 13.1762       | -               | -                    | -                    | -                    | -                     |
| 0.1286     | 4000      | 6.1773        | -               | -                    | -                    | -                    | -                     |
| 0.1608     | 5000      | 4.2945        | 3.4889          | 0.6180 (+0.0776)     | 0.3893 (+0.0643)     | 0.7144 (+0.2137)     | 0.5739 (+0.1185)      |
| 0.1930     | 6000      | 3.6451        | -               | -                    | -                    | -                    | -                     |
| 0.2251     | 7000      | 3.3041        | -               | -                    | -                    | -                    | -                     |
| 0.2573     | 8000      | 2.9813        | -               | -                    | -                    | -                    | -                     |
| 0.2894     | 9000      | 2.8473        | -               | -                    | -                    | -                    | -                     |
| 0.3216     | 10000     | 2.6852        | 2.6960          | 0.6124 (+0.0720)     | 0.3992 (+0.0742)     | 0.7315 (+0.2309)     | 0.5811 (+0.1257)      |
| 0.3538     | 11000     | 2.6128        | -               | -                    | -                    | -                    | -                     |
| 0.3859     | 12000     | 2.5252        | -               | -                    | -                    | -                    | -                     |
| 0.4181     | 13000     | 2.461         | -               | -                    | -                    | -                    | -                     |
| 0.4502     | 14000     | 2.3625        | -               | -                    | -                    | -                    | -                     |
| 0.4824     | 15000     | 2.2746        | 2.0279          | 0.6397 (+0.0993)     | 0.3963 (+0.0713)     | 0.7369 (+0.2363)     | 0.5910 (+0.1356)      |
| 0.5146     | 16000     | 2.2551        | -               | -                    | -                    | -                    | -                     |
| 0.5467     | 17000     | 2.2193        | -               | -                    | -                    | -                    | -                     |
| 0.5789     | 18000     | 2.2099        | -               | -                    | -                    | -                    | -                     |
| 0.6111     | 19000     | 2.1277        | -               | -                    | -                    | -                    | -                     |
| 0.6432     | 20000     | 2.0969        | 1.9564          | 0.6468 (+0.1063)     | 0.3936 (+0.0685)     | 0.7391 (+0.2385)     | 0.5932 (+0.1378)      |
| 0.6754     | 21000     | 2.0624        | -               | -                    | -                    | -                    | -                     |
| 0.7075     | 22000     | 2.0565        | -               | -                    | -                    | -                    | -                     |
| 0.7397     | 23000     | 2.0226        | -               | -                    | -                    | -                    | -                     |
| 0.7719     | 24000     | 1.9583        | -               | -                    | -                    | -                    | -                     |
| 0.8040     | 25000     | 2.0048        | 1.8239          | 0.6575 (+0.1171)     | 0.3884 (+0.0634)     | 0.7339 (+0.2333)     | 0.5933 (+0.1379)      |
| 0.8362     | 26000     | 1.9861        | -               | -                    | -                    | -                    | -                     |
| 0.8683     | 27000     | 1.9675        | -               | -                    | -                    | -                    | -                     |
| 0.9005     | 28000     | 1.9531        | -               | -                    | -                    | -                    | -                     |
| 0.9327     | 29000     | 1.9139        | -               | -                    | -                    | -                    | -                     |
| **0.9648** | **30000** | **1.9224**    | **1.7848**      | **0.6585 (+0.1181)** | **0.3864 (+0.0613)** | **0.7366 (+0.2359)** | **0.5938 (+0.1385)**  |
| 0.9970     | 31000     | 1.9059        | -               | -                    | -                    | -                    | -                     |
| -1         | -1        | -             | -               | 0.6585 (+0.1181)     | 0.3864 (+0.0613)     | 0.7366 (+0.2359)     | 0.5938 (+0.1385)      |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.11.10
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0.dev0
- PyTorch: 2.6.0.dev20241112+cu121
- Accelerate: 1.2.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED

```json
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.49.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
model.safetensors ADDED (Git LFS pointer)

```
version https://git-lfs.github.com/spec/v1
oid sha256:4d14e27f50238e10d71861e7514c173ac49b910872d9e9fa6374a9f5aa82f400
size 133464836
```
special_tokens_map.json ADDED

```json
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
```
tokenizer.json ADDED (diff too large to render; see the raw file)
tokenizer_config.json ADDED

```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
vocab.txt ADDED (diff too large to render; see the raw file)