# PVT-Tiny on CIFAR-100 @ 224×224

This model is PVT-Tiny (Pyramid Vision Transformer) trained from scratch on CIFAR-100 (upsampled to 224×224) as a baseline for Vision GNN research.
## Model Description
- Architecture: PVT-Tiny (Pyramid Vision Transformer)
- Dataset: CIFAR-100 (32×32 upsampled to 224×224)
- Training: From scratch (no pretraining)
- Purpose: Transformer baseline for validating Vision GNN performance
## Training Details
- Optimizer: AdamW (lr=5e-4, weight_decay=0.05)
- Scheduler: CosineAnnealingLR (min_lr=1e-5)
- Epochs: 100
- Batch Size: 128
- Normalization: CIFAR-100 statistics
- Mixed Precision: Enabled
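The setup above can be sketched in PyTorch. This is a minimal sketch, not this repo's exact training script: the placeholder model and the assumption that `T_max` equals the epoch count are illustrative.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder model; the real run uses PVT-Tiny
model = torch.nn.Linear(10, 100)

# AdamW with the hyperparameters listed above
optimizer = AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)

# Cosine annealing over 100 epochs down to min_lr=1e-5
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-5)

# Mixed precision via GradScaler (active only on the CUDA path)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
```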
## Model Architecture
PVT-Tiny uses a pyramid structure with spatial reduction attention:
- Patch Size: 4×4
- Embed Dims: [64, 128, 320, 512]
- Num Heads: [1, 2, 5, 8]
- Depths: [2, 2, 2, 2]
- SR Ratios: [8, 4, 2, 1]
- MLP Ratios: [8, 8, 4, 4]
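A useful consequence of these settings: at 224×224, spatial-reduction attention leaves every stage with the same number of key/value tokens. A quick arithmetic check in plain Python, assuming the standard PVT downsampling (4× at stage 1 via the patch embedding, a further 2× at each later stage):

```python
# Per-stage cumulative downsampling at 224x224 input
input_size = 224
strides = [4, 8, 16, 32]            # stage 1 patch embed, then 2x per stage
sr_ratios = [8, 4, 2, 1]            # spatial-reduction ratios from the table

for stride, sr in zip(strides, sr_ratios):
    fmap = input_size // stride     # feature-map side length
    q_tokens = fmap * fmap          # query tokens = all spatial positions
    kv_tokens = (fmap // sr) ** 2   # key/value tokens after spatial reduction
    print(fmap, q_tokens, kv_tokens)
# Every stage attends over 49 key/value tokens (7x7)
```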
## Results
- Best Test Acc@1: 50.35%
- Best Test Acc@5: 75.69%
- Final Test Acc@1: 50.08%
- Final Test Acc@5: 74.80%
- Training Time: 3.02 hours
## Methodology
We follow the original PVT training protocol adapted for CIFAR-100 to ensure fair comparison with Vision GNN and CNN baselines. All models in the comparison are trained under identical conditions:
- Same resolution (224×224)
- Same data augmentation
- No pretrained weights
- Same CIFAR-100 normalization
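The shared preprocessing can be sketched with plain PyTorch. A sketch under assumptions: the mean/std values below are the commonly cited CIFAR-100 channel statistics, and bilinear upsampling is assumed; verify both against the repo's transform code.

```python
import torch
import torch.nn.functional as F

# Commonly cited CIFAR-100 channel statistics (assumed; verify against the repo)
MEAN = torch.tensor([0.5071, 0.4865, 0.4409]).view(3, 1, 1)
STD = torch.tensor([0.2673, 0.2564, 0.2762]).view(3, 1, 1)

def preprocess(img: torch.Tensor) -> torch.Tensor:
    """Upsample a 3x32x32 image in [0, 1] to 3x224x224 and normalize."""
    img = F.interpolate(img.unsqueeze(0), size=224, mode='bilinear',
                        align_corners=False).squeeze(0)
    return (img - MEAN) / STD

x = preprocess(torch.rand(3, 32, 32))
print(x.shape)  # torch.Size([3, 224, 224])
```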
## Available Checkpoints
- `best_model.pth` - Best performing checkpoint (50.35% Acc@1)
- `final_model.pth` - Final model after all epochs
- `checkpoint_epoch_X.pth` - Saved every 20 epochs
## Usage

```python
import torch

# pvt_tiny is the PVT-Tiny constructor from this repository's model code
# (module name is illustrative; adjust to where the implementation lives)
from pvt import pvt_tiny

# Build PVT-Tiny with a 100-class head for CIFAR-100
model = pvt_tiny(num_classes=100)

# Load trained weights (map_location allows loading on CPU-only machines)
checkpoint = torch.load('best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()
```
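Once the weights are loaded, Acc@5-style top-5 predictions come from a softmax over the logits. A sketch: the stand-in model below only makes the snippet self-contained; substitute the loaded PVT-Tiny and a preprocessed CIFAR-100 image.

```python
import torch

# Stand-in for the loaded PVT-Tiny so this snippet runs on its own
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 100))
model.eval()

image = torch.rand(3, 224, 224)  # stand-in for a preprocessed test image

with torch.no_grad():
    logits = model(image.unsqueeze(0))   # add a batch dimension
    probs = torch.softmax(logits, dim=1)
    top5 = probs.topk(5, dim=1)          # top-5 class probabilities and indices

print(top5.indices.shape)  # torch.Size([1, 5])
```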
## Citation

This implementation is based on:

```bibtex
@inproceedings{wang2021pyramid,
  title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  booktitle={ICCV},
  year={2021}
}
```
## Training Protocol
Training follows the standard PVT protocol with AdamW optimizer and cosine annealing scheduler, ensuring reproducibility and fair comparison with other vision architectures.