---
license: apache-2.0
datasets:
- graziasveva93/patent_sdg_dataset
language:
- en
base_model:
- mpi-inno-comp/pat_specter
pipeline_tag: text-classification
---

## Patent SDG Classifier

This is the classification model trained on the *"silver"* dataset built as described in [*"From scratch to silver: Creating trustworthy training data for patent-SDG classification using Large Language Models"*](https://www.arxiv.org/pdf/2509.09303).

The model is a multi-label classifier built on `mpi-inno-comp/pat_specter`, designed to classify patent texts against the 17 United Nations Sustainable Development Goals (SDGs).

The model was trained on auto-generated *soft* prediction vectors rather than traditional binary human-annotated labels. Here, *soft* means normalized weakly supervised annotation frequencies, as described in the paper, which serve as normalized importance scores for each SDG per patent.

A custom BCE loss function was used, incorporating epsilon smoothing (0.02) to prevent overconfident predictions and entropy regularization (lambda=0.0005) to encourage a broader probability distribution. Square-root inverse class weights were applied to address class imbalance.

Training used a learning rate of `2e-5`, a batch size of `128` per device, `20` epochs, and a weight decay of `0.01`.
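
The loss described above can be illustrated with a small framework-free sketch for a single patent. Only the values `0.02` and `0.0005` come from this card. Everything else is an assumption made for illustration and may differ from the training code: the smoothing formula (targets pulled toward 0.5), the per-label Bernoulli entropy bonus, the rescaling of the class weights to mean 1, and the function names `sqrt_inverse_class_weights` and `soft_bce_loss`.

```python
import math

EPSILON = 0.02    # smoothing strength reported in this card
LAMBDA = 0.0005   # entropy-regularization weight reported in this card


def sqrt_inverse_class_weights(label_mass):
    """Weights proportional to 1/sqrt(frequency), rescaled to mean 1 (assumed normalization)."""
    raw = [1.0 / math.sqrt(m) for m in label_mass]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def soft_bce_loss(logits, soft_targets, class_weights, eps=EPSILON, lam=LAMBDA):
    """Weighted BCE on smoothed soft targets minus an entropy bonus, averaged over labels."""
    total = 0.0
    for z, y, w in zip(logits, soft_targets, class_weights):
        p = 1.0 / (1.0 + math.exp(-z))       # sigmoid probability for this SDG
        p = min(max(p, 1e-7), 1.0 - 1e-7)    # clamp to keep the logarithms finite
        y_s = y * (1.0 - eps) + 0.5 * eps    # assumed smoothing: pull targets toward 0.5
        bce = -(y_s * math.log(p) + (1.0 - y_s) * math.log(1.0 - p))
        entropy = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
        total += w * bce - lam * entropy     # higher entropy slightly lowers the loss
    return total / len(logits)


# Toy example with two labels; the frequencies and targets are made up.
weights = sqrt_inverse_class_weights([1.0, 4.0])
print(soft_bce_loss([3.0, -3.0], [0.9, 0.0], weights))
```

In an actual training loop the same terms would be computed with batched tensor operations. The sketch is only meant to show how the three ingredients (smoothing, entropy term, class weights) could combine.
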
Early stopping was implemented with a patience of 3, monitoring `f1_macro`.

### Usage

```
from transformers import pipeline

pipe = pipeline("text-classification", model="graziasveva93/patent_sdg_classifier")

PATENT_TEXT = "..."  # placeholder: replace with the patent text to classify

# top_k=None returns a score for every SDG label, not just the best one
inf = pipe(PATENT_TEXT, top_k=None)
print(inf)

# Example output for one patent text:
# [{'label': 'sdg_14', 'score': 0.47907012701034546},
#  {'label': 'sdg_9', 'score': 0.31178590655326843},
#  {'label': 'sdg_3', 'score': 0.108424112200737},
#  {'label': 'sdg_7', 'score': 0.08894453942775726},
#  {'label': 'sdg_6', 'score': 0.05447851121425629},
#  {'label': 'sdg_12', 'score': 0.04002637416124344},
#  {'label': 'sdg_15', 'score': 0.021443745121359825},
#  {'label': 'sdg_17', 'score': 0.01988437958061695},
#  {'label': 'sdg_2', 'score': 0.019452063366770744},
#  {'label': 'sdg_10', 'score': 0.01940927840769291},
#  {'label': 'sdg_13', 'score': 0.018470678478479385},
#  {'label': 'sdg_11', 'score': 0.01597837172448635},
#  {'label': 'sdg_5', 'score': 0.01508121658116579},
#  {'label': 'sdg_16', 'score': 0.014975948259234428},
#  {'label': 'sdg_4', 'score': 0.01400495134294033},
#  {'label': 'sdg_8', 'score': 0.011000115424394608},
#  {'label': 'sdg_1', 'score': 0.009290466085076332}]
```
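
The example scores above sum to more than 1, which suggests that each SDG receives an independent score. One possible way to turn such a list into a set of predicted SDGs is to keep every label above a cutoff. The sketch below assumes a hypothetical helper `select_sdgs` and an arbitrary threshold of `0.3`; neither comes from this card or the paper, and a suitable threshold would need to be tuned on validation data.

```python
def select_sdgs(predictions, threshold=0.3):
    """Return (label, score) pairs at or above the threshold, highest score first."""
    kept = [(p["label"], p["score"]) for p in predictions if p["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# First three entries of the example output above (truncated for brevity).
example = [
    {"label": "sdg_14", "score": 0.47907012701034546},
    {"label": "sdg_9", "score": 0.31178590655326843},
    {"label": "sdg_3", "score": 0.108424112200737},
]

print(select_sdgs(example))
```

With the assumed threshold of `0.3`, this keeps `sdg_14` and `sdg_9` and drops `sdg_3`.
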