Commit 3186d16 · docs: little update
Parent: c4876d0
Files changed: app.py (+4 -4), requirements.txt (+2 -2)
app.py (CHANGED):

@@ -71,7 +71,7 @@ MARKDOWN = \
 """
 <div align='center'>
 <h1> This&That: Language-Gesture Controlled Video Generation for Robot Planning </h1> \
-<
+<h3 style='font-weight: 450; font-size: 1rem; margin: 0rem'>\
 <a href='https://kiteretsu77.github.io/BoyangWang/'>Boyang Wang</a>, \
 <a href='https://www.linkedin.com/in/niksridhar/'>Nikhil Sridhar</a>, \
 <a href='https://cfeng16.github.io/'>Chao Feng</a>, \
@@ -79,17 +79,17 @@ MARKDOWN = \
 <a href='https://fishbotics.com/'>Adam Fishman</a>, \
 <a href='https://www.mmintlab.com/people/nima-fazeli/'>Nima Fazeli</a>, \
 <a href='https://jjparkcv.github.io/'>Jeong Joon Park</a> \
-</
+</h3> \

 <a style='font-size:18px;color: #000000' href='https://github.com/Kiteretsu77/This_and_That_VDM'> [Github] </a> \
 <a style='font-size:18px;color: #000000' href='http://arxiv.org/abs/2407.05530'> [ArXiv] </a> \
 <a style='font-size:18px;color: #000000' href='https://cfeng16.github.io/this-and-that/'> [Project Page] </a> </div> \
 </div>

-This&That is
+This&That is Language-Gesture-conditioned Video Generation Model for Robot Planning (based on the Bridge V1 & V2 dataset).

 This demo focuses on the Video Diffusion Model.
-Only the VGL mode (
+Only the VGL mode (Image + Language + Gesture conditioned) is provided, but you can find the complete test code and all pretrained weights available in our [GitHub Repo](https://github.com/Kiteretsu77/This_and_That_VDM).

 ### Note: The default gesture point indices are [4, 10] (5th and 11th) for two gesture points, or [4] (5th) for one gesture point.
 ### Note: Currently, the supported resolution is 256x384.
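For orientation, the MARKDOWN constant patched above is the banner text of a Gradio demo. Below is a minimal sketch of how such a banner and the defaults mentioned in the notes (gesture point indices [4, 10], 256x384 resolution) might be wired together; the gr.Blocks layout, variable names, and abbreviated banner are illustrative assumptions, not the repository's actual app.py.

```python
import gradio as gr

# Illustrative sketch (assumed, not the repo's actual code): render the patched
# MARKDOWN banner at the top of a Gradio demo and expose the defaults that the
# notes in the diff describe.
MARKDOWN = """
<div align='center'>
<h1> This&That: Language-Gesture Controlled Video Generation for Robot Planning </h1>
</div>
"""

DEFAULT_GESTURE_INDICES = [4, 10]  # 5th and 11th points when two gesture points are used
SINGLE_GESTURE_INDICES = [4]       # 5th point when only one gesture point is used
TARGET_RESOLUTION = (256, 384)     # (height, width) currently supported by the demo

with gr.Blocks() as demo:
    gr.Markdown(MARKDOWN)
    # ... image upload, language prompt, gesture-point inputs, and the
    # VGL-mode video generation call would follow here ...

if __name__ == "__main__":
    demo.launch()
```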
requirements.txt (CHANGED):

@@ -1,6 +1,6 @@
 # Non-strict version lib
-# torch==2.
-#
+# torch==2.5.1
+# torchvision==0.20.1
 opencv-python
 transformers
 accelerate
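The torch/torchvision pins above are kept as comments, so they document the intended versions rather than force an install. As an illustration only, a small check like the following could confirm that a local environment matches the versions those comments suggest; the check itself is an assumption for this write-up and is not part of the repository.

```python
# Illustrative sanity check (not from the repo): compare the local environment
# against the versions suggested by the commented pins in requirements.txt.
import torch
import torchvision

EXPECTED = {"torch": "2.5.1", "torchvision": "0.20.1"}

for name, module in (("torch", torch), ("torchvision", torchvision)):
    installed = module.__version__.split("+")[0]  # drop local build tags such as "+cu121"
    if installed != EXPECTED[name]:
        print(f"{name} {installed} found; requirements.txt comments suggest {EXPECTED[name]}")
```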