---
license: apache-2.0
---
|
|
|
|
|
# This&That V1.0 Model Card |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
[**Project Page**](https://cfeng16.github.io/this-and-that/) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2407.05530) **|** [**Code**](https://github.com/Kiteretsu77/This_and_That_VDM) |
|
|
|
|
|
</div> |
|
|
|
|
|
## Introduction |
|
|
|
|
|
We propose a robot learning method for communicating, planning, and executing a wide range of tasks, dubbed This&That. We achieve robot planning for general tasks by leveraging the power of video generative models trained on internet-scale data containing rich physical and semantic context. In this work, we tackle three fundamental challenges in video-based planning: 1) unambiguous task communication with simple human instructions, 2) controllable video generation that respects user intents, and 3) translating visual plans into robot actions. We propose language-gesture conditioning to generate videos, which is both simpler and clearer than existing language-only methods, especially in complex and uncertain environments. We then suggest a behavioral cloning design that seamlessly incorporates the video plans. This&That demonstrates state-of-the-art effectiveness in addressing the above three challenges, and justifies the use of video generation as an intermediate representation for generalizable task planning and execution.
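
As a purely illustrative aid, below is a minimal sketch of what the language-gesture conditioning described above might look like as model input: a single observed frame, a short language instruction, and two 2D gesture points marking "this" (the object to manipulate) and "that" (where it should go). All names and fields in this sketch are hypothetical, not the actual interface; see the official code repository linked above for how conditioning is really specified.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical container for language-gesture conditioning inputs.
# The real interface lives in the This_and_That_VDM repository linked above.
@dataclass
class LanguageGestureCondition:
    image_path: str                  # first frame observed by the robot
    instruction: str                 # short language command, e.g. "put this there"
    this_point: Tuple[float, float]  # normalized (x, y) gesture on the target object
    that_point: Tuple[float, float]  # normalized (x, y) gesture on the goal location

condition = LanguageGestureCondition(
    image_path="observations/frame_000.png",
    instruction="put this there",
    this_point=(0.42, 0.55),
    that_point=(0.78, 0.31),
)
# A video diffusion model conditioned on inputs like `condition` would synthesize
# the planned manipulation video, which the behavioral-cloning policy then consumes.
```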
|
|
## Citation |
|
|
```bibtex
@article{wang2024language,
  title={This\&That: Language-Gesture Controlled Video Generation for Robot Planning},
  author={Wang, Boyang and Sridhar, Nikhil and Feng, Chao and Van der Merwe, Mark and Fishman, Adam and Fazeli, Nima and Park, Jeong Joon},
  journal={arXiv preprint arXiv:2407.05530},
  year={2024}
}
```
|
|
|
|
|
|