-
Energy-Based Transformers are Scalable Learners and Thinkers
Paper • 2507.02092 • Published • 69
MOSPA: Human Motion Generation Driven by Spatial Audio
Paper • 2507.11949 • Published • 24
Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
Paper • 2507.09751 • Published • 1
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Paper • 2507.07982 • Published • 33
Collections
Collections including paper arxiv:2507.11061
-
3D Congealing: 3D-Aware Image Alignment in the Wild
Paper • 2404.02125 • Published • 10
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Paper • 2404.04319 • Published • 25
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Paper • 2404.09956 • Published • 12
MeshLRM: Large Reconstruction Model for High-Quality Mesh
Paper • 2404.12385 • Published • 27
-
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Paper • 2311.12631 • Published • 15
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 58
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Paper • 2504.01956 • Published • 41
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Paper • 2506.23219 • Published • 7
-
BrushEdit: All-In-One Image Inpainting and Editing
Paper • 2412.10316 • Published • 35
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper • 2412.11815 • Published • 26
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper • 2412.09611 • Published • 11
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper • 2412.07517 • Published • 11
-
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Paper • 2401.09416 • Published • 11
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
Paper • 2401.10171 • Published • 14
DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
Paper • 2311.09217 • Published • 22
GALA: Generating Animatable Layered Assets from a Single Scan
Paper • 2401.12979 • Published • 9
-
FlashWorld: High-quality 3D Scene Generation within Seconds
Paper • 2510.13678 • Published • 71
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
Paper • 2510.15019 • Published • 63
GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
Paper • 2509.18090 • Published • 4
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Paper • 2509.19296 • Published • 23