
NB-Transformer Validation Examples

This directory contains three comprehensive validation scripts that reproduce all key results from the NB-Transformer paper.

Scripts Overview

1. validate_accuracy.py - Parameter Accuracy Validation

Compares parameter estimation accuracy and speed across three methods:

  • NB-Transformer: Fast neural network approach
  • Classical NB GLM: Maximum likelihood via statsmodels
  • Method of Moments: Fastest baseline method (a minimal sketch follows this list)
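
As a reference point for the Method of Moments baseline, here is a minimal sketch of a generic method-of-moments estimator for negative binomial data. It assumes the mean/dispersion parameterization Var = μ + αμ² and reads β as the log fold change between two group means; the exact parameterization, function names, and example values here are illustrative assumptions, not the scripts' own code.

import numpy as np

def nb_method_of_moments(counts):
    # Moment estimates under the NB mean-variance relation Var = mu + alpha * mu**2:
    # mu_hat is the sample mean, alpha_hat = (s2 - m) / m**2, clipped at 0 when the
    # sample is underdispersed. Generic sketch only.
    counts = np.asarray(counts, dtype=float)
    m = counts.mean()
    s2 = counts.var(ddof=1)
    alpha_hat = max((s2 - m) / m**2, 0.0) if m > 0 else 0.0
    return m, alpha_hat

# Hypothetical two-group example; beta_hat is the log fold change of group means.
group_a = np.array([12.0, 9.0, 15.0])
group_b = np.array([25.0, 30.0, 22.0])
mu_a, alpha_a = nb_method_of_moments(group_a)
mu_b, _ = nb_method_of_moments(group_b)
beta_hat = np.log(mu_b / mu_a)
print(f"mu_a={mu_a:.2f}, alpha_a={alpha_a:.3f}, beta_hat={beta_hat:.3f}")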

Usage:

python validate_accuracy.py --n_tests 1000 --output_dir accuracy_results/

Expected Results:

  • NB-Transformer: ~14.8x faster than the classical GLM
  • ~47% lower error on the log fold change (β)
  • 100% success rate, vs 98.7% for the classical GLM

2. validate_calibration.py - P-value Calibration Validation

Validates that p-values are properly calibrated under the null hypothesis (β = 0).

Usage:

python validate_calibration.py --n_tests 10000 --output_dir calibration_results/

Expected Results:

  • QQ plot should follow the diagonal line
  • Kolmogorov-Smirnov test p > 0.05 (well-calibrated; see the sketch after this list)
  • False positive rate ~5% at α = 0.05
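
A minimal sketch of this calibration check, using scipy's Kolmogorov-Smirnov test against the Uniform(0, 1) distribution expected under the null. The p-values are simulated here purely as a stand-in; in practice they would come from the script's CSV output.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for the null p-values produced by validate_calibration.py;
# replace with the column loaded from the script's output instead.
pvals = rng.uniform(size=10_000)

# Under a true null, p-values should be Uniform(0, 1).
ks_stat, ks_p = stats.kstest(pvals, "uniform")
print(f"KS statistic = {ks_stat:.4f}, KS p-value = {ks_p:.3f}")

# Empirical false positive rate at alpha = 0.05 should be close to 5%.
print(f"FPR at alpha=0.05: {np.mean(pvals < 0.05):.3f}")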

3. validate_power.py - Statistical Power Analysis

Evaluates statistical power across experimental designs and effect sizes.

Usage:

python validate_power.py --n_tests 1000 --output_dir power_results/

Expected Results:

  • Power increases with effect size and sample size
  • Competitive performance across all designs (3v3, 5v5, 7v7, 9v9)
  • Faceted power curves by experimental design
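
For reference, empirical power can be computed from per-test p-values as the fraction of tests rejected at α = 0.05, grouped by design and effect size. The column names and values below are hypothetical; the actual CSV written by validate_power.py may differ.

import pandas as pd

# Hypothetical per-test results; the real output columns may be named differently.
results = pd.DataFrame({
    "design": ["3v3", "3v3", "5v5", "5v5"],
    "effect_size": [0.5, 1.0, 0.5, 1.0],
    "pvalue": [0.20, 0.01, 0.04, 0.001],
})

alpha = 0.05
# Empirical power = share of simulated tests with p < alpha, per design and effect size.
power = (
    results.assign(rejected=results["pvalue"] < alpha)
           .groupby(["design", "effect_size"])["rejected"]
           .mean()
)
print(power)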

Requirements

All scripts require the following additional dependencies:

pip install statsmodels pandas matplotlib scikit-learn

For enhanced plotting (optional):

pip install plotnine theme-nxn

Output Files

Each script generates:

  • Plots: Visualization of validation results
  • CSV files: Detailed numerical results (loading them is sketched after this list)
  • Summary reports: Text summaries of key findings
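
To inspect the numerical results programmatically, the CSVs can be loaded with pandas. The file name below is hypothetical and depends on the script and the --output_dir used.

import pandas as pd

# Hypothetical path; substitute the CSV actually written to your --output_dir.
results = pd.read_csv("accuracy_results/results.csv")
print(results.describe())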

Performance Expectations

Approximate runtimes for the validation scripts:

  • Accuracy validation: ~2-5 minutes for 1000 tests
  • Calibration validation: ~10-15 minutes for 10000 tests
  • Power analysis: ~15-20 minutes for 1000 tests per design

Troubleshooting

Common Issues

  1. statsmodels not available: Install with pip install statsmodels
  2. Memory errors: Reduce --n_tests parameter
  3. Slow performance: Ensure PyTorch is using a GPU or Apple MPS device if available (a minimal device check is sketched after this list)
  4. Plot display errors: Plots save to files even if display fails
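
A generic PyTorch device check, assuming torch is installed; the scripts' own device handling may differ.

import torch

# Prefer CUDA, then Apple MPS, otherwise fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f"PyTorch device: {device}")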

Expected Performance Metrics

Based on v13 model validation:

Metric         NB-Transformer   Classical GLM   Method of Moments
Success Rate   100.0%           98.7%           100.0%
Time (ms)      0.076            1.128           0.021
μ MAE          0.202            0.212           0.213
β MAE          0.152            0.284           0.289
α MAE          0.477            0.854           0.852

Citation

If you use these validation scripts in your research, please cite:

@software{svensson2025nbtransformer,
  title={NB-Transformer: Fast Negative Binomial GLM Parameter Estimation using Transformers},
  author={Svensson, Valentine},
  year={2025},  
  url={https://huggingface.co/valsv/nb-transformer}
}