---
license: apache-2.0
datasets:
- graziasveva93/patent_sdg_dataset
language:
- en
base_model:
- mpi-inno-comp/pat_specter
pipeline_tag: text-classification
---
## Patent SDG Classifier

This is the classification model trained on the *"silver"* dataset built as described in [*"From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models"*](https://www.arxiv.org/pdf/2509.09303).

The model is a multi-label classifier built on `mpi-inno-comp/pat_specter`, designed to classify patent texts against the 17 United Nations Sustainable Development Goals (SDGs).
The model was trained on auto-generated *soft* label vectors rather than traditional binary human-annotated labels. Here, *soft* means normalized weakly supervised annotation frequencies, as described in the paper: a normalized importance score for each SDG per patent.
A custom BCE loss function was used, incorporating epsilon smoothing (`0.02`) to prevent overconfident predictions and entropy regularization (`lambda=0.0005`) to encourage a broader probability distribution. Square-root inverse class weights were applied to address class imbalance.
Training used a learning rate of `2e-5`, a batch size of `128` per device, `20` epochs, and a weight decay of `0.01`. Early stopping was applied with a patience of `3`, monitoring `f1_macro`.
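
The loss described above can be illustrated with a minimal pure-Python sketch for a single patent. This is not the training code: the exact form of the epsilon smoothing (here, pulling targets toward `0.5`), the entropy term (here, the mean Bernoulli entropy of the predictions, subtracted from the loss), and the weight normalization (here, rescaled to mean 1) are all assumptions, and the function names are hypothetical. Only the values `0.02` and `0.0005` come from the description above.

```python
import math

EPS = 0.02       # epsilon smoothing value stated above
LAMBDA = 0.0005  # entropy regularization weight stated above


def sqrt_inverse_class_weights(class_counts):
    """Square-root inverse-frequency weights, rescaled to mean 1 (assumed normalization)."""
    total = sum(class_counts)
    raw = [math.sqrt(total / count) for count in class_counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_bce_loss(logits, soft_targets, class_weights, eps=EPS, lam=LAMBDA):
    """Weighted BCE on smoothed soft targets minus an entropy bonus (assumed form)."""
    tiny = 1e-12  # guards against log(0)
    bce_sum = 0.0
    entropy_sum = 0.0
    for z, t, w in zip(logits, soft_targets, class_weights):
        p = sigmoid(z)
        # Assumed smoothing: move each soft target slightly toward 0.5.
        t_s = t * (1.0 - eps) + 0.5 * eps
        bce_sum += -w * (t_s * math.log(p + tiny)
                         + (1.0 - t_s) * math.log(1.0 - p + tiny))
        # Bernoulli entropy of the prediction; higher means less confident.
        entropy_sum += -(p * math.log(p + tiny)
                         + (1.0 - p) * math.log(1.0 - p + tiny))
    n = len(logits)
    # Subtracting the entropy rewards less peaked predictions.
    return bce_sum / n - lam * entropy_sum / n


# Toy example with three classes (the real model has 17).
weights = sqrt_inverse_class_weights([100, 25, 4])
loss = soft_bce_loss([2.0, -1.0, -3.0], [0.8, 0.2, 0.0], weights)
print(weights, loss)
```

Taking the square root of the inverse frequency gives rare SDGs more weight than frequent ones while keeping the ratio between the largest and smallest weight smaller than plain inverse-frequency weighting would.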

### Usage

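```python
from transformers import pipeline

pipe = pipeline("text-classification",
                model="graziasveva93/patent_sdg_classifier")

# Placeholder: replace with the patent text to classify.
PATENT_TEXT = "..."

# top_k=None returns the scores for all labels instead of only the top one.
inf = pipe(PATENT_TEXT, top_k=None)
print(inf)
# Example output: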
[{'label': 'sdg_14', 'score': 0.47907012701034546},
 {'label': 'sdg_9', 'score': 0.31178590655326843},
 {'label': 'sdg_3', 'score': 0.108424112200737},
 {'label': 'sdg_7', 'score': 0.08894453942775726},
 {'label': 'sdg_6', 'score': 0.05447851121425629},
 {'label': 'sdg_12', 'score': 0.04002637416124344},
 {'label': 'sdg_15', 'score': 0.021443745121359825},
 {'label': 'sdg_17', 'score': 0.01988437958061695},
 {'label': 'sdg_2', 'score': 0.019452063366770744},
 {'label': 'sdg_10', 'score': 0.01940927840769291},
 {'label': 'sdg_13', 'score': 0.018470678478479385},
 {'label': 'sdg_11', 'score': 0.01597837172448635},
 {'label': 'sdg_5', 'score': 0.01508121658116579},
 {'label': 'sdg_16', 'score': 0.014975948259234428},
 {'label': 'sdg_4', 'score': 0.01400495134294033},
 {'label': 'sdg_8', 'score': 0.011000115424394608},
 {'label': 'sdg_1', 'score': 0.009290466085076332}]
```
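
Because the model is multi-label, a patent can be assigned to several SDGs at once. One possible way to turn the scores into a set of labels is to keep every SDG whose score passes a cutoff. The sketch below assumes a hypothetical helper `select_sdgs` and an arbitrary threshold of `0.3`; neither is part of the model, and the threshold should be tuned on validation data.

```python
def select_sdgs(predictions, threshold=0.3):
    """Return (label, score) pairs with score >= threshold, highest score first.

    `threshold` is an illustrative value, not one recommended by the model authors.
    """
    kept = [(p["label"], p["score"]) for p in predictions if p["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# First entries of the example output above, truncated for brevity.
example = [
    {"label": "sdg_14", "score": 0.47907012701034546},
    {"label": "sdg_9", "score": 0.31178590655326843},
    {"label": "sdg_3", "score": 0.108424112200737},
]
print(select_sdgs(example))
```

With this cutoff, the example patent would be tagged with `sdg_14` and `sdg_9`.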