---
license: apache-2.0
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- art
library_name: transformers
---

# AI and Human Image Classification Model - v1
A SigLIP2 model fine-tuned on 60,000 AI-generated and 60,000 human images.
The model performs strongly at detecting high-quality, state-of-the-art AI-generated images from generators such as Midjourney v6.1, Flux 1.1 Pro, Stable Diffusion 3.5, GPT-4o, and other current models.

Detailed training code is available here: [blog/ai/fine-tuning-siglip2](https://exnrt.com/blog/ai/fine-tuning-siglip2/)
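
The linked post walks through the full recipe. For orientation, the sketch below shows a fine-tuning setup along those lines using the standard `transformers` Trainer pattern for image classification. It is a minimal sketch, not the exact training script: the base checkpoint name, `imagefolder` data layout, batch size, and learning rate are illustrative placeholders; only the 5-epoch count is taken from the metrics reported below.

```python
# Sketch of a SigLIP2 fine-tuning setup (illustrative, not the exact script).
import torch
from datasets import load_dataset
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "google/siglip2-base-patch16-224"  # assumed base checkpoint
DATA_DIR = "data/ai_vs_human"  # hypothetical: train/ and test/ with ai/ and hum/ subfolders

dataset = load_dataset("imagefolder", data_dir=DATA_DIR)
labels = dataset["train"].features["label"].names
id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in id2label.items()}

processor = AutoImageProcessor.from_pretrained(BASE_MODEL)
model = AutoModelForImageClassification.from_pretrained(
    BASE_MODEL, num_labels=len(labels), id2label=id2label, label2id=label2id
)

def preprocess(batch):
    # Run the processor per image, keeping one (C, H, W) tensor per example
    batch["pixel_values"] = [
        processor(images=img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in batch["image"]
    ]
    del batch["image"]
    return batch

dataset = dataset.with_transform(preprocess)

def collate_fn(examples):
    # Stack per-example tensors into a batch the model can consume
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["label"] for ex in examples]),
    }

args = TrainingArguments(
    output_dir="siglip2-ai-detector",
    num_train_epochs=5,              # matches the 5 epochs reported below
    per_device_train_batch_size=16,  # illustrative
    learning_rate=2e-5,              # illustrative
    remove_unused_columns=False,     # keep raw columns so the transform can see them
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collate_fn,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```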

## Evaluation Metrics

![Training Results](results.jpg)

### ๐Ÿ‹๏ธโ€โ™‚๏ธ Train Metrics
- **Epoch:** 5.0  
- **Total FLOPs:** 51,652,280,821 GF  
- **Train Loss:** 0.0799  
- **Train Runtime:** 2:39:49.46  
- **Train Samples/Sec:** 69.053  
- **Train Steps/Sec:** 4.316  

### 📊 Evaluation Metrics (Fine-Tuned Model on Test Set)
- **Epoch:** 5.0  
- **Eval Accuracy:** 0.9923  
- **Eval Loss:** 0.0551  
- **Eval Runtime:** 0:02:35.78  
- **Eval Samples/Sec:** 212.533  
- **Eval Steps/Sec:** 6.644  

### 🔦 Prediction Metrics (on test set)

```json
{
  "test_loss": 0.05508904904127121,
  "test_accuracy": 0.9923283699296264,
  "test_runtime": 167.1844,
  "test_samples_per_second": 198.039,
  "test_steps_per_second": 6.191
}
```

- **Final Test Accuracy:** 0.9923
- **Final Test F1 Score (Macro):** 0.9923
- **Final Test F1 Score (Weighted):** 0.9923
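
For reference, metrics like the accuracy and macro/weighted F1 above can be computed from raw predictions with scikit-learn. The arrays below are illustrative placeholders, not the actual test-set predictions.

```python
# Sketch: computing accuracy and F1 from predicted class IDs.
# y_true / y_pred are illustrative; here 0 = "ai" and 1 = "hum",
# matching the model's id2label mapping.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 0, 1, 1, 1]  # ground-truth labels (illustrative)
y_pred = [0, 0, 1, 0, 1]  # model predictions (illustrative)

print("Accuracy:     ", accuracy_score(y_true, y_pred))
print("F1 (macro):   ", f1_score(y_true, y_pred, average="macro"))
print("F1 (weighted):", f1_score(y_true, y_pred, average="weighted"))
```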

## Usage
```
pip install -q transformers torch Pillow accelerate
```
```python
import torch
from PIL import Image as PILImage
from transformers import AutoImageProcessor, SiglipForImageClassification

MODEL_IDENTIFIER = r"Ateeqq/ai-vs-human-image-detector"

# Device: Use GPU if available, otherwise CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load Model and Processor
try:
    print(f"Loading processor from: {MODEL_IDENTIFIER}")
    processor = AutoImageProcessor.from_pretrained(MODEL_IDENTIFIER)

    print(f"Loading model from: {MODEL_IDENTIFIER}")
    model = SiglipForImageClassification.from_pretrained(MODEL_IDENTIFIER)
    model.to(device)
    model.eval()
    print("Model and processor loaded successfully.")

except Exception as e:
    print(f"Error loading model or processor: {e}")
    exit()

# Load and Preprocess the Image

IMAGE_PATH = r"/content/images.jpg" 
try:
    print(f"Loading image: {IMAGE_PATH}")
    image = PILImage.open(IMAGE_PATH).convert("RGB")
except FileNotFoundError:
    print(f"Error: Image file not found at {IMAGE_PATH}")
    exit()
except Exception as e:
    print(f"Error opening image: {e}")
    exit()

print("Preprocessing image...")
# Use the processor to prepare the image for the model
inputs = processor(images=image, return_tensors="pt").to(device)

# Perform Inference
print("Running inference...")
with torch.no_grad(): # Disable gradient calculations for inference
    outputs = model(**inputs)
    logits = outputs.logits

# Interpret the Results
# Get the index of the highest logit score -> this is the predicted class ID
predicted_class_idx = logits.argmax(-1).item()

# Use the model's config to map the ID back to the label string ('ai' or 'hum')
predicted_label = model.config.id2label[predicted_class_idx]

# Optional: Get probabilities using softmax
probabilities = torch.softmax(logits, dim=-1)
predicted_prob = probabilities[0, predicted_class_idx].item()

print("-" * 30)
print(f"Image: {IMAGE_PATH}")
print(f"Predicted Label: {predicted_label}")
print(f"Confidence Score: {predicted_prob:.4f}")
print("-" * 30)

# You can also print the scores for all classes:
print("Scores per class:")
for i, label in model.config.id2label.items():
    print(f"  - {label}: {probabilities[0, i].item():.4f}")
```
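
For quick experiments, the same inference can also be run through the high-level `pipeline` API, which wraps the preprocessing, forward pass, and label mapping shown above in a single call. The printed result below is illustrative.

```python
from transformers import pipeline

# The pipeline handles preprocessing, inference, and id2label mapping internally.
classifier = pipeline("image-classification", model="Ateeqq/ai-vs-human-image-detector")

print(classifier("/content/images.jpg"))
# Illustrative result shape:
# [{'label': 'ai', 'score': 0.9996}, {'label': 'hum', 'score': 0.0004}]
```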

## Output
```
Using device: cpu
Model and processor loaded successfully.
Loading image: /content/images.jpg
Preprocessing image...
Running inference...
------------------------------
Image: /content/images.jpg
Predicted Label: ai
Confidence Score: 0.9996
------------------------------
Scores per class:
  - ai: 0.9996
  - hum: 0.0004
```
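
The single-image script extends naturally to batched inference when scoring many images at once, since the processor accepts a list of images and the model returns one logit row per input. The sketch below assumes a hypothetical `images/` folder of JPEGs; the folder path is a placeholder.

```python
# Sketch: batched inference over a folder of images (folder path is hypothetical).
import glob

import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

MODEL_IDENTIFIER = "Ateeqq/ai-vs-human-image-detector"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = AutoImageProcessor.from_pretrained(MODEL_IDENTIFIER)
model = SiglipForImageClassification.from_pretrained(MODEL_IDENTIFIER).to(device).eval()

paths = sorted(glob.glob("images/*.jpg"))  # placeholder folder
images = [Image.open(p).convert("RGB") for p in paths]

# One forward pass over the whole batch
inputs = processor(images=images, return_tensors="pt").to(device)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

for path, p in zip(paths, probs):
    idx = int(p.argmax())
    print(f"{path}: {model.config.id2label[idx]} ({p[idx].item():.4f})")
```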