HikariDawn committed on
Commit 3186d16 · 1 Parent(s): c4876d0

docs: little update

Files changed (2)
  1. app.py +4 -4
  2. requirements.txt +2 -2
app.py CHANGED
@@ -71,7 +71,7 @@ MARKDOWN = \
 """
 <div align='center'>
 <h1> This&That: Language-Gesture Controlled Video Generation for Robot Planning </h1> \
-<h2 style='font-weight: 450; font-size: 1rem; margin: 0rem'>\
+<h3 style='font-weight: 450; font-size: 1rem; margin: 0rem'>\
 <a href='https://kiteretsu77.github.io/BoyangWang/'>Boyang Wang</a>, \
 <a href='https://www.linkedin.com/in/niksridhar/'>Nikhil Sridhar</a>, \
 <a href='https://cfeng16.github.io/'>Chao Feng</a>, \
@@ -79,17 +79,17 @@ MARKDOWN = \
 <a href='https://fishbotics.com/'>Adam Fishman</a>, \
 <a href='https://www.mmintlab.com/people/nima-fazeli/'>Nima Fazeli</a>, \
 <a href='https://jjparkcv.github.io/'>Jeong Joon Park</a> \
-</h2> \
+</h3> \
 
 <a style='font-size:18px;color: #000000' href='https://github.com/Kiteretsu77/This_and_That_VDM'> [Github] </a> \
 <a style='font-size:18px;color: #000000' href='http://arxiv.org/abs/2407.05530'> [ArXiv] </a> \
 <a style='font-size:18px;color: #000000' href='https://cfeng16.github.io/this-and-that/'> [Project Page] </a> </div> \
 </div>
 
-This&That is a robotics scenario (based on the Bridge dataset for this demo), a Language-Gesture-Image-conditioned Video Generation Model for Robot Planning.
+This&That is a Language-Gesture-conditioned Video Generation Model for Robot Planning (based on the Bridge V1 & V2 dataset).
 
 This demo focuses on the Video Diffusion Model.
-Only the VGL mode (image + language + gesture conditioned) is provided, but you can find the complete test code and all pretrained weights available.
+Only the VGL mode (Image + Language + Gesture conditioned) is provided, but you can find the complete test code and all pretrained weights available in our [GitHub Repo](https://github.com/Kiteretsu77/This_and_That_VDM).
 
 ### Note: The default gesture point indices are [4, 10] (5th and 11th) for two gesture points, or [4] (5th) for one gesture point.
 ### Note: Currently, the supported resolution is 256x384.
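For context, the MARKDOWN string edited above is the header block shown at the top of the Gradio Space. A minimal, hypothetical sketch of how such a string is typically rendered (assuming standard gr.Blocks/gr.Markdown usage; this is not the actual layout of app.py):

```python
import gradio as gr

# Illustrative only: a trimmed-down header string in the spirit of the
# MARKDOWN block edited in this commit (the full content lives in app.py).
MARKDOWN = """
<div align='center'>
<h1> This&That: Language-Gesture Controlled Video Generation for Robot Planning </h1>
</div>

### Note: The default gesture point indices are [4, 10] (5th and 11th) for two gesture points, or [4] (5th) for one gesture point.
### Note: Currently, the supported resolution is 256x384.
"""

with gr.Blocks() as demo:
    # Render the header/notes at the top of the page; the rest of the demo
    # (image upload, gesture clicks, language prompt) would follow below.
    gr.Markdown(MARKDOWN)

if __name__ == "__main__":
    demo.launch()
```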
requirements.txt CHANGED
@@ -1,6 +1,6 @@
 # Non-strict version lib
-# torch==2.0.1
-# torchaudio==2.0.1
+# torch==2.5.1
+# torchvision==0.20.1
 opencv-python
 transformers
 accelerate
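Because the version pins stay commented out ("non-strict"), pip resolves whichever torch/torchvision it can; a small sanity-check sketch after installing (the 2.5.1 / 0.20.1 targets come from the commented pins above and are not enforced by requirements.txt):

```python
# Optional environment check after `pip install -r requirements.txt`.
# The versions referenced here (torch 2.5.1, torchvision 0.20.1) mirror the
# commented-out pins in this commit; they are a reference point, not a requirement.
import torch
import torchvision

print("torch:", torch.__version__)              # expected around 2.5.1
print("torchvision:", torchvision.__version__)  # expected around 0.20.1
print("CUDA available:", torch.cuda.is_available())
```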