VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Paper • 2412.00927 • Published Dec 1, 2024
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders Paper • 2504.03501 • Published Apr 4, 2025