PVT-Tiny on CIFAR-100 @ 224×224

This model is PVT-Tiny (Pyramid Vision Transformer) trained from scratch on CIFAR-100 (upsampled to 224×224) as a baseline for Vision GNN research.

Model Description

  • Architecture: PVT-Tiny (Pyramid Vision Transformer)
  • Dataset: CIFAR-100 (32×32 upsampled to 224×224)
  • Training: From scratch (no pretraining)
  • Purpose: Transformer baseline for validating Vision GNN performance

Training Details

  • Optimizer: AdamW (lr=5e-4, weight_decay=0.05)
  • Scheduler: CosineAnnealingLR (min_lr=1e-5)
  • Epochs: 100
  • Batch Size: 128
  • Normalization: CIFAR-100 statistics
  • Mixed Precision: Enabled
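The optimizer, scheduler, and mixed-precision settings above can be sketched in PyTorch as follows. This is a minimal illustration, not the repo's actual training loop: the `nn.Linear` model is a stand-in for `pvt_tiny(num_classes=100)`, and the `train_step` wrapper is hypothetical.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Stand-in model; the real run uses pvt_tiny(num_classes=100).
model = nn.Linear(10, 100)

# Hyperparameters as listed above: AdamW (lr=5e-4, weight_decay=0.05),
# cosine annealing over 100 epochs down to min_lr=1e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-5
)

# Mixed precision is enabled only when a GPU is available; on CPU the
# scaler and autocast context are no-ops.
use_amp = torch.cuda.is_available()
scaler = GradScaler(enabled=use_amp)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad(set_to_none=True)
    with autocast(enabled=use_amp):
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

`scheduler.step()` is called once per epoch after the training loop, which traces the cosine curve from 5e-4 down to 1e-5 over the 100 epochs.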

Model Architecture

PVT-Tiny uses a four-stage pyramid structure with spatial-reduction attention (SRA), which downsamples keys and values by the per-stage SR ratio to keep attention affordable at high resolution:

  • Patch Size: 4×4
  • Embed Dims: [64, 128, 320, 512]
  • Num Heads: [1, 2, 5, 8]
  • Depths: [2, 2, 2, 2]
  • SR Ratios: [8, 4, 2, 1]
  • MLP Ratios: [8, 8, 4, 4]
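Collected as keyword arguments, these hyperparameters look as follows. The argument names mirror those used by the `PyramidVisionTransformer` class in the original PVT repository; treat them as illustrative if your copy of the code differs.

```python
# PVT-Tiny configuration mirroring the hyperparameters listed above,
# with a 100-class head for CIFAR-100. One entry per pyramid stage.
pvt_tiny_cfg = dict(
    patch_size=4,
    embed_dims=[64, 128, 320, 512],   # channel width per stage
    num_heads=[1, 2, 5, 8],           # attention heads per stage
    depths=[2, 2, 2, 2],              # transformer blocks per stage
    sr_ratios=[8, 4, 2, 1],           # key/value spatial reduction per stage
    mlp_ratios=[8, 8, 4, 4],          # MLP hidden dim multiplier per stage
    num_classes=100,
)
```

Note how the SR ratio shrinks as the feature map does: the early, high-resolution stages reduce keys and values aggressively (8×), while the final 7×7 stage uses full attention (1×).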

Results

  • Best Test Acc@1: 50.35%
  • Best Test Acc@5: 75.69%
  • Final Test Acc@1: 50.08%
  • Final Test Acc@5: 74.80%
  • Training Time: 3.02 hours

Methodology

We follow the original PVT training protocol adapted for CIFAR-100 to ensure fair comparison with Vision GNN and CNN baselines. All models in the comparison are trained under identical conditions:

  • Same resolution (224×224)
  • Same data augmentation
  • No pretrained weights
  • Same CIFAR-100 normalization

Available Checkpoints

  • best_model.pth - Best performing checkpoint (50.35% Acc@1)
  • final_model.pth - Final model after all epochs
  • checkpoint_epoch_X.pth - Saved every 20 epochs
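The checkpoint naming above implies a save routine along these lines. This is a minimal sketch: the `'model_state_dict'` key matches the Usage section, but any other keys in the saved dict are assumptions, and the `nn.Linear` model is a stand-in for the real network.

```python
import torch
from torch import nn

def save_checkpoint(model, epoch, path):
    # 'model_state_dict' matches the key loaded in the Usage section;
    # additional keys (epoch, optimizer state, ...) are assumptions.
    torch.save({"epoch": epoch, "model_state_dict": model.state_dict()}, path)

model = nn.Linear(4, 4)  # stand-in for pvt_tiny(num_classes=100)
for epoch in range(1, 101):
    if epoch % 20 == 0:  # matches the checkpoint_epoch_X.pth cadence
        save_checkpoint(model, epoch, f"checkpoint_epoch_{epoch}.pth")
```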

Usage

import torch

# pvt_tiny is the model factory from the original PVT repository
# (https://github.com/whai362/PVT); make sure its pvt.py is importable.
from pvt import pvt_tiny

# Build the model with a 100-class head for CIFAR-100
model = pvt_tiny(num_classes=100)

# Load trained weights
checkpoint = torch.load('best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

Citation

This implementation is based on:

Pyramid Vision Transformer:

@inproceedings{wang2021pyramid,
  title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  booktitle={ICCV},
  year={2021}
}

Training Protocol

Training follows the standard PVT protocol with AdamW optimizer and cosine annealing scheduler, ensuring reproducibility and fair comparison with other vision architectures.
