Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Paper: arXiv:2502.10392
This repo contains the models for the paper *Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding*. Code is available at: https://github.com/GWxuan/TSP3D
Wenxuan Guo*, Xiuwei Xu*, Ziwei Wang, Jianjiang Feng†, Jie Zhou, Jiwen Lu
* Equal contribution † Corresponding author
In this work, we propose an efficient multi-level convolution architecture for 3D visual grounding. TSP3D surpasses previous approaches in both accuracy and inference speed.
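The text-guided pruning idea can be illustrated with a minimal, self-contained sketch: occupied voxel features are scored against the text embedding of the query, and only the most query-relevant voxels are kept for later stages. The function name, feature shapes, and keep ratio below are illustrative assumptions, not the repository's actual API.

```python
import torch

def text_guided_prune(voxel_feats, text_feat, keep_ratio=0.5):
    """Conceptual sketch of text-guided voxel pruning (not the authors' code).

    voxel_feats: (N, C) features of N occupied sparse voxels.
    text_feat:   (C,)  pooled text embedding of the grounding query.
    Returns the indices of the voxels that are kept.
    """
    # Score each voxel by its similarity to the text embedding.
    scores = voxel_feats @ text_feat                      # (N,)
    k = max(1, int(keep_ratio * voxel_feats.shape[0]))
    # Keep the top-k most text-relevant voxels, discard the rest.
    keep_idx = scores.topk(k).indices
    return keep_idx

# Toy usage with random features.
voxels = torch.randn(1024, 128)
text = torch.randn(128)
kept = text_guided_prune(voxels, text, keep_ratio=0.25)
print(kept.shape)  # torch.Size([256])
```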
We provide the checkpoints for quick reproduction of the results reported in the paper.
| Benchmark | Pipeline | Acc@0.25 | Acc@0.5 | Inference Speed (FPS) | Downloads |
|---|---|---|---|---|---|
| ScanRefer | Single-stage | 56.45 | 46.71 | 12.43 | model |
| Benchmark | Pipeline | Acc@0.25 | Acc@0.5 | Downloads |
|---|---|---|---|---|
| Nr3d | Single-stage | 48.7 | 37.0 | model |
| Sr3d | Single-stage | 57.1 | 44.1 | model |
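For reference, a downloaded checkpoint can typically be inspected with plain PyTorch before plugging it into the evaluation scripts. The file name below is a placeholder, and the exact key layout depends on how the TSP3D training code saves its state.

```python
import torch

# Placeholder path; substitute the checkpoint downloaded from the table above.
ckpt = torch.load("tsp3d_scanrefer.pth", map_location="cpu")

# Checkpoints are commonly dicts holding the model state and training metadata;
# printing the keys shows which entry to pass to load_state_dict.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```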
Comparison of 3DVG methods on the ScanRefer dataset: