PVT-Tiny on CIFAR-100 @ 224×224

This model is PVT-Tiny (Pyramid Vision Transformer) trained from scratch on CIFAR-100 (upsampled to 224×224) as a baseline for Vision GNN research.

Model Description

  • Architecture: PVT-Tiny (Pyramid Vision Transformer)
  • Dataset: CIFAR-100 (32×32 upsampled to 224×224)
  • Training: From scratch (no pretraining)
  • Purpose: Transformer baseline for validating Vision GNN performance

Training Details

  • Optimizer: AdamW (lr=5e-4, weight_decay=0.05)
  • Scheduler: CosineAnnealingLR (min_lr=1e-5)
  • Epochs: 100
  • Batch Size: 128
  • Normalization: CIFAR-100 statistics
  • Mixed Precision: Enabled
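The optimizer, scheduler, and mixed-precision settings above can be sketched in PyTorch as follows. This is a minimal illustration, not the repo's actual training loop: the `nn.Linear` model is a stand-in for `pvt_tiny(num_classes=100)`, and the `train_step` wrapper is hypothetical.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Stand-in model; the real run uses pvt_tiny(num_classes=100).
model = nn.Linear(10, 100)

# Hyperparameters as listed above: AdamW (lr=5e-4, weight_decay=0.05),
# cosine annealing over 100 epochs down to min_lr=1e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-5
)

# Mixed precision is enabled only when a GPU is available; on CPU the
# scaler and autocast context are no-ops.
use_amp = torch.cuda.is_available()
scaler = GradScaler(enabled=use_amp)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad(set_to_none=True)
    with autocast(enabled=use_amp):
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

`scheduler.step()` is called once per epoch after the training loop, which traces the cosine curve from 5e-4 down to 1e-5 over the 100 epochs.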

Model Architecture

PVT-Tiny uses a four-stage pyramid structure with spatial-reduction attention (SRA), which downsamples keys and values by the per-stage SR ratio to keep attention affordable at high resolution:

  • Patch Size: 4×4
  • Embed Dims: [64, 128, 320, 512]
  • Num Heads: [1, 2, 5, 8]
  • Depths: [2, 2, 2, 2]
  • SR Ratios: [8, 4, 2, 1]
  • MLP Ratios: [8, 8, 4, 4]
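Collected as keyword arguments, these hyperparameters look as follows. The argument names mirror those used by the `PyramidVisionTransformer` class in the original PVT repository; treat them as illustrative if your copy of the code differs.

```python
# PVT-Tiny configuration mirroring the hyperparameters listed above,
# with a 100-class head for CIFAR-100. One entry per pyramid stage.
pvt_tiny_cfg = dict(
    patch_size=4,
    embed_dims=[64, 128, 320, 512],   # channel width per stage
    num_heads=[1, 2, 5, 8],           # attention heads per stage
    depths=[2, 2, 2, 2],              # transformer blocks per stage
    sr_ratios=[8, 4, 2, 1],           # key/value spatial reduction per stage
    mlp_ratios=[8, 8, 4, 4],          # MLP hidden dim multiplier per stage
    num_classes=100,
)
```

Note how the SR ratio shrinks as the feature map does: the early, high-resolution stages reduce keys and values aggressively (8×), while the final 7×7 stage uses full attention (1×).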

Results

  • Best Test Acc@1: 50.35%
  • Best Test Acc@5: 75.69%
  • Final Test Acc@1: 50.08%
  • Final Test Acc@5: 74.80%
  • Training Time: 3.02 hours

Methodology

We follow the original PVT training protocol adapted for CIFAR-100 to ensure fair comparison with Vision GNN and CNN baselines. All models in the comparison are trained under identical conditions:

  • Same resolution (224×224)
  • Same data augmentation
  • No pretrained weights
  • Same CIFAR-100 normalization

Available Checkpoints

  • best_model.pth - Best performing checkpoint (50.35% Acc@1)
  • final_model.pth - Final model after all epochs
  • checkpoint_epoch_X.pth - Saved every 20 epochs
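The checkpoint naming above implies a save routine along these lines. This is a minimal sketch: the `'model_state_dict'` key matches the Usage section, but any other keys in the saved dict are assumptions, and the `nn.Linear` model is a stand-in for the real network.

```python
import torch
from torch import nn

def save_checkpoint(model, epoch, path):
    # 'model_state_dict' matches the key loaded in the Usage section;
    # additional keys (epoch, optimizer state, ...) are assumptions.
    torch.save({"epoch": epoch, "model_state_dict": model.state_dict()}, path)

model = nn.Linear(4, 4)  # stand-in for pvt_tiny(num_classes=100)
for epoch in range(1, 101):
    if epoch % 20 == 0:  # matches the checkpoint_epoch_X.pth cadence
        save_checkpoint(model, epoch, f"checkpoint_epoch_{epoch}.pth")
```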

Usage

import torch

# pvt_tiny is the model factory from the original PVT repository
# (https://github.com/whai362/PVT); make sure its pvt.py is importable.
from pvt import pvt_tiny

# Build the model with a 100-class head for CIFAR-100
model = pvt_tiny(num_classes=100)

# Load trained weights
checkpoint = torch.load('best_model.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

Citation

This implementation is based on:

Pyramid Vision Transformer:

@inproceedings{wang2021pyramid,
  title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  booktitle={ICCV},
  year={2021}
}

Training Protocol

Training follows the standard PVT protocol with AdamW optimizer and cosine annealing scheduler, ensuring reproducibility and fair comparison with other vision architectures.
