---
license: apache-2.0
base_model: uitnlp/visobert
tags:
- vietnamese
- spam-detection
- text-classification
- e-commerce
datasets:
- ViSpamReviews
metrics:
- accuracy
- macro-f1
- macro-precision
- macro-recall
model-index:
- name: visobert-spam-multi-class
  results:
  - task:
      type: text-classification
      name: Spam Review Detection
    dataset:
      name: ViSpamReviews
      type: ViSpamReviews
    metrics:
      - type: accuracy
        value: N/A
      - type: macro-f1
        value: N/A
---

# visobert-spam-multi-class: Spam Review Detection for Vietnamese Text

This model is a fine-tuned version of [uitnlp/visobert](https://huggingface.co/uitnlp/visobert) on the **ViSpamReviews** dataset. It detects spam reviews in Vietnamese e-commerce text.

## Model Details

* **Base Model**: `uitnlp/visobert`
* **Description**: ViSoBERT - Vietnamese Social BERT
* **Dataset**: ViSpamReviews (Vietnamese Spam Review Dataset)
* **Fine-tuning Framework**: HuggingFace Transformers
* **Task**: Spam Review Detection (multi-class)
* **Number of Classes**: 4

### Hyperparameters

* Max sequence length: `256`
* Learning rate: `5e-5`
* Batch size: `32`
* Epochs: `100`
* Early stopping patience: `5`
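
The card pairs a 100-epoch budget with an early-stopping patience of 5, but it does not include the training script. The sketch below shows in plain Python how patience-based early stopping usually limits that budget: training halts once the validation score has failed to improve for 5 consecutive epochs.

A few details are assumptions made for illustration, not taken from the original setup:

* the helper name `epochs_until_stop`,
* strict improvement as the stopping criterion,
* validation macro-F1 as the monitored score.

```python
from typing import Sequence

MAX_EPOCHS = 100  # "Epochs" from the hyperparameter list
PATIENCE = 5      # "Early stopping patience" from the hyperparameter list


def epochs_until_stop(val_scores: Sequence[float],
                      max_epochs: int = MAX_EPOCHS,
                      patience: int = PATIENCE) -> tuple[int, int]:
    """Return (epochs_run, best_epoch) for a higher-is-better validation score.

    Hypothetical helper: val_scores[i] is the validation score after epoch i + 1
    (assumed here to be macro-F1). Training stops once `patience` consecutive
    epochs pass without a strict improvement, or when max_epochs is reached.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        epochs_run = epoch
        if score > best_score:
            # New best checkpoint: reset the patience counter.
            best_score = score
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch


# The best score (0.76) is reached at epoch 3 and not beaten for the next 5
# epochs, so training stops after epoch 8 and the late 0.80 is never seen.
scores = [0.70, 0.74, 0.76, 0.75, 0.76, 0.73, 0.74, 0.75, 0.80]
print(epochs_until_stop(scores))  # prints (8, 3)
```

If training used the Transformers `Trainer`, `EarlyStoppingCallback(early_stopping_patience=5)` applies a similar rule, counted per evaluation. The card does not say which mechanism was actually used.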

## Dataset

The model was trained on the **ViSpamReviews** dataset, which contains 19,860 Vietnamese e-commerce review samples. The dataset is split as follows:

* **Train set**: 14,299 samples (72%)
* **Validation set**: 1,590 samples (8%)
* **Test set**: 3,971 samples (20%)
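
The split sizes and percentages above can be cross-checked in a few lines of Python. The counts are copied from the list. The only added choice is rounding each share to a whole percentage.

```python
# Split sizes as reported for ViSpamReviews in this card.
splits = {"train": 14_299, "validation": 1_590, "test": 3_971}

total = sum(splits.values())
print(f"total: {total:,}")  # prints "total: 19,860", matching the stated dataset size

for name, count in splits.items():
    # Share of the full dataset, rounded to a whole percentage.
    print(f"{name}: {count:,} ({count / total:.0%})")
```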

### Labels

* **NO-SPAM** (0): Genuine product reviews
* **SPAM-1** (1): Fake reviews (synthetic or manipulated reviews)
* **SPAM-2** (2): Brand-only reviews (mention only the brand, without product details)
* **SPAM-3** (3): Irrelevant reviews (unrelated content)
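
When fine-tuning or decoding predictions, the four classes above map onto the `id2label` / `label2id` dictionaries that Hugging Face model configs accept. The sketch below builds both from the list. It assumes the class ids follow the order of the checkpoint's classification head. It also assumes these exact strings, since the published `config.json` is not shown here.

```python
# Class ids and names as listed in this card; the id order is assumed to match
# the output order of the checkpoint's classification head.
ID2LABEL = {0: "NO-SPAM", 1: "SPAM-1", 2: "SPAM-2", 3: "SPAM-3"}

# Inverse mapping, used to turn string labels into integer training targets.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

# Example: encode a few string labels as the integer targets a classifier expects.
print([LABEL2ID[name] for name in ["SPAM-2", "NO-SPAM", "SPAM-3"]])  # prints [2, 0, 3]
```

When starting a fine-tuning run, these dictionaries would typically be passed as `AutoModelForSequenceClassification.from_pretrained("uitnlp/visobert", num_labels=4, id2label=ID2LABEL, label2id=LABEL2ID)`. Predictions then carry readable label names.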

## Results

The model was evaluated on the test set using accuracy, macro-F1, macro-precision, and macro-recall:

* Results: <INSERT_METRICS>
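
Macro averaging computes precision, recall, and F1 separately for each class and then takes the unweighted mean. Each of the four classes therefore counts equally, regardless of how many test examples it has.

The pure-Python sketch below shows that computation alongside accuracy. It makes two assumptions:

* The evaluation code actually used for this card is not included.
* A per-class score with an empty denominator is set to 0, which mirrors the effect of scikit-learn's `zero_division` setting.

```python
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           num_classes: int = 4) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.

    Per-class scores with an empty denominator are set to 0.0 (an assumed
    convention). Macro averages are unweighted means over all classes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / num_classes,
        "macro_recall": sum(recalls) / num_classes,
        "macro_f1": sum(f1s) / num_classes,
    }


# Toy labels (not results of this model): 0 = NO-SPAM, 1-3 = SPAM-1..SPAM-3.
y_true = [0, 0, 0, 1, 2, 3]
y_pred = [0, 0, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items()})
# prints {'accuracy': 0.667, 'macro_precision': 0.542, 'macro_recall': 0.667, 'macro_f1': 0.583}
```

SPAM-3 is never predicted in this toy example, so its per-class F1 of 0 pulls macro-F1 (0.583) below accuracy (0.667). This is the effect macro averaging is meant to expose.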

## Usage

You can use this model to detect spam in Vietnamese reviews. The example below classifies a single review:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "visolex/visobert-spam-multiclass"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example review text ("This product is very good, the shop delivers quickly!")
text = "Sản phẩm này rất tốt, shop giao hàng nhanh!"

# Tokenize (truncate to the 256-token training length)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predicted_class = outputs.logits.argmax(dim=-1).item()
    probabilities = torch.softmax(outputs.logits, dim=-1)

# Map to label
label_map = {
    0: "NO-SPAM",
    1: "SPAM-1 (fake review)",
    2: "SPAM-2 (brand-only)",
    3: "SPAM-3 (irrelevant)"
}
predicted_label = label_map[predicted_class]
confidence = probabilities[0][predicted_class].item()
print(f"Text: {text}")
print(f"Predicted: {predicted_label} (confidence: {confidence:.2%})")
```

## Citation

If you use this model, please cite:

```bibtex
@misc{visolex_visobert_spam_multiclass,
  title={ViSoBERT - Vietnamese Social BERT},
  author={{ViSoLex Team}},
  year={2025},
  howpublished={\url{https://huggingface.co/visolex/visobert-spam-multiclass}}
}
```

## License

This model is released under the Apache-2.0 license.

## Acknowledgments

* Base model: [uitnlp/visobert](https://huggingface.co/uitnlp/visobert)
* Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
* ViSoLex Toolkit for Vietnamese NLP