NB-Transformer Validation Examples
This directory contains three validation scripts that reproduce the key results from the NB-Transformer paper.
Scripts Overview
1. validate_accuracy.py - Parameter Accuracy Validation
Compares parameter estimation accuracy and speed across three methods:
- NB-Transformer: Fast neural network approach
- Classical NB GLM: Maximum likelihood via statsmodels
- Method of Moments: Fastest baseline method
Usage:
python validate_accuracy.py --n_tests 1000 --output_dir accuracy_results/
Expected Results:
- NB-Transformer is 14.8x faster than the classical GLM
- About 47% lower error (MAE) on log fold change (β) than the classical GLM
- 100% success rate vs 98.7% for the classical GLM
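To make the Method of Moments baseline concrete, here is a minimal sketch of a moment-based negative binomial estimator. It assumes the NB2 variance form Var = μ + αμ² and works on the natural scale; the function names are hypothetical, and the parameterization used inside validate_accuracy.py (for example, log-scale parameters) may differ.

```python
import numpy as np

def nb_method_of_moments(counts):
    """Estimate the NB mean and dispersion from raw counts.

    Assumes the NB2 form: var = mu + alpha * mu**2.
    """
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean()
    var = counts.var(ddof=1)
    if mu <= 0:
        return mu, np.nan
    # Solve var = mu + alpha * mu^2 for alpha; clip so dispersion stays positive.
    alpha = max((var - mu) / mu**2, 1e-8)
    return mu, alpha

def log_fold_change(control, treatment):
    """Moment-based log fold change: log of the ratio of group means."""
    mu_c, _ = nb_method_of_moments(control)
    mu_t, _ = nb_method_of_moments(treatment)
    return np.log(mu_t / mu_c)

# Simulated check. NumPy parameterises the NB by (n, p);
# n = 1/alpha and p = n / (n + mu) give mean mu and variance mu + alpha * mu^2.
rng = np.random.default_rng(0)
true_mu, true_alpha = 50.0, 0.2
n = 1.0 / true_alpha
samples = rng.negative_binomial(n, n / (n + true_mu), size=5000)
mu_hat, alpha_hat = nb_method_of_moments(samples)
print(f"mu_hat={mu_hat:.2f}, alpha_hat={alpha_hat:.3f}")
```

With only a handful of samples per condition, as in the 3v3 design, moment estimates of the dispersion are noisy, which is consistent with this method being the fastest but not the most accurate baseline.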
2. validate_calibration.py - P-value Calibration Validation
Validates that p-values are properly calibrated under null hypothesis (β = 0).
Usage:
python validate_calibration.py --n_tests 10000 --output_dir calibration_results/
Expected Results:
- QQ plot should follow diagonal line
- Kolmogorov-Smirnov test against Uniform(0, 1) gives p > 0.05 (no evidence of miscalibration)
- False positive rate ~5% at α = 0.05
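The three checks above can be sketched as follows. This is an illustration rather than the code in validate_calibration.py: the helper name calibration_summary is hypothetical, SciPy is assumed to be installed (it is a dependency of statsmodels), and the null p-values come from a simple two-sided z-test instead of the NB-Transformer.

```python
import numpy as np
from scipy import stats

def calibration_summary(p_values, alpha=0.05):
    """Summarise how close a set of null p-values is to Uniform(0, 1)."""
    p_values = np.asarray(p_values, dtype=float)
    ks = stats.kstest(p_values, "uniform")  # one-sample KS test vs Uniform(0, 1)
    return {
        "ks_statistic": float(ks.statistic),
        "ks_pvalue": float(ks.pvalue),
        "false_positive_rate": float(np.mean(p_values < alpha)),
    }

def qq_points(p_values):
    """Return (expected, observed) quantiles for a QQ plot against the diagonal."""
    observed = np.sort(np.asarray(p_values, dtype=float))
    expected = (np.arange(1, observed.size + 1) - 0.5) / observed.size
    return expected, observed

# Stand-in null experiment: z-statistics drawn under the null hypothesis.
rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
p_null = 2 * stats.norm.sf(np.abs(z))  # two-sided p-values

summary = calibration_summary(p_null)
expected, observed = qq_points(p_null)
print(summary)
```

A well-calibrated method should give a false positive rate near 0.05 and QQ points close to the diagonal; a KS p-value above 0.05 means uniformity is not rejected, not that calibration is proven.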
3. validate_power.py - Statistical Power Analysis
Evaluates statistical power across experimental designs and effect sizes.
Usage:
python validate_power.py --n_tests 1000 --output_dir power_results/
Expected Results:
- Power increases with effect size and sample size
- Competitive performance across all designs (3v3, 5v5, 7v7, 9v9)
- Faceted power curves by experimental design
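Power here is the fraction of simulated experiments in which the null hypothesis is rejected at a given significance level. The sketch below estimates it for each design and effect size. Everything in it is an assumption for illustration: the simulation settings, the function names, and in particular the test itself, which is a Welch t-test on log-transformed counts standing in for the methods that validate_power.py actually compares.

```python
import numpy as np
from scipy import stats

def simulate_counts(rng, mu, alpha_disp, size):
    """Draw NB counts with mean mu and variance mu + alpha_disp * mu**2."""
    n = 1.0 / alpha_disp
    return rng.negative_binomial(n, n / (n + mu), size=size)

def estimate_power(rng, n_per_group, beta, n_tests=200,
                   mu=100.0, alpha_disp=0.1, level=0.05):
    """Fraction of simulated experiments with p < level."""
    rejections = 0
    for _ in range(n_tests):
        control = simulate_counts(rng, mu, alpha_disp, n_per_group)
        treated = simulate_counts(rng, mu * np.exp(beta), alpha_disp, n_per_group)
        # Stand-in test (assumption): Welch t-test on log1p counts.
        result = stats.ttest_ind(np.log1p(control), np.log1p(treated),
                                 equal_var=False)
        rejections += int(result.pvalue < level)
    return rejections / n_tests

rng = np.random.default_rng(0)
power = {
    (n, beta): estimate_power(rng, n, beta)
    for n in (3, 5, 7, 9)          # 3v3, 5v5, 7v7, 9v9 designs
    for beta in (0.0, 0.5, 1.0)    # log fold changes
}
for (n, beta), value in sorted(power.items()):
    print(f"{n}v{n}  beta={beta:.1f}  power={value:.2f}")
```

At β = 0 the estimate is the false positive rate rather than power, so it should sit at or below the significance level; the remaining rows should rise with both β and group size.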
Requirements
All scripts require these additional dependencies for validation:
pip install statsmodels pandas matplotlib scikit-learn
For enhanced plotting (optional):
pip install plotnine theme-nxn
Output Files
Each script generates:
- Plots: Visualization of validation results
- CSV files: Detailed numerical results
- Summary reports: Text summaries of key findings
Performance Expectations
All validation scripts should complete within:
- Accuracy validation: ~2-5 minutes for 1000 tests
- Calibration validation: ~10-15 minutes for 10000 tests
- Power analysis: ~15-20 minutes for 1000 tests per design
Troubleshooting
Common Issues
- statsmodels not available: Install with pip install statsmodels
- Memory errors: Reduce the --n_tests parameter
- Slow performance: Ensure PyTorch is using GPU/MPS if available
- Plot display errors: Plots save to files even if display fails
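For the slow-performance item, a quick way to see which accelerator PyTorch can reach is sketched below. The helper pick_device is hypothetical and not part of the validation scripts; whether the scripts accept a device argument is not stated here, so this only reports what is available.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so nothing can be accelerated
    if torch.cuda.is_available():
        return "cuda"
    # The MPS backend (Apple Silicon) only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(f"Selected device: {device}")
```

If this prints cpu on a machine with a GPU, the installed PyTorch build most likely lacks CUDA or MPS support.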
Expected Performance Metrics
Based on v13 model validation:
| Metric | NB-Transformer | Classical GLM | Method of Moments |
|---|---|---|---|
| Success Rate | 100.0% | 98.7% | 100.0% |
| Time (ms) | 0.076 | 1.128 | 0.021 |
| μ MAE | 0.202 | 0.212 | 0.213 |
| β MAE | 0.152 | 0.284 | 0.289 |
| α MAE | 0.477 | 0.854 | 0.852 |
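The headline speed and accuracy figures can be checked against this table. The short calculation below uses only the rounded values shown above, so it gives roughly 14.8x and 46.5%; the 47% quoted earlier presumably comes from unrounded measurements.

```python
# Values copied from the table above (rounded as shown there).
metrics = {
    "NB-Transformer": {"time_ms": 0.076, "beta_mae": 0.152},
    "Classical GLM": {"time_ms": 1.128, "beta_mae": 0.284},
}

nbt = metrics["NB-Transformer"]
glm = metrics["Classical GLM"]

# Speedup: how many times faster NB-Transformer is per test.
speedup = glm["time_ms"] / nbt["time_ms"]

# Relative reduction in mean absolute error on the log fold change.
beta_improvement = 1 - nbt["beta_mae"] / glm["beta_mae"]

print(f"Speedup over classical GLM: {speedup:.1f}x")
print(f"Reduction in beta MAE: {beta_improvement:.1%}")
```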
Citation
If you use these validation scripts in your research, please cite:
@software{svensson2025nbtransformer,
title={NB-Transformer: Fast Negative Binomial GLM Parameter Estimation using Transformers},
author={Svensson, Valentine},
year={2025},
url={https://huggingface.co/valsv/nb-transformer}
}