HikariDawn committed on
Commit 3186d16 · 1 Parent(s): c4876d0

docs: little update

Files changed (2)
  1. app.py +4 -4
  2. requirements.txt +2 -2
app.py CHANGED
@@ -71,7 +71,7 @@ MARKDOWN = \
 """
 <div align='center'>
 <h1> This&That: Language-Gesture Controlled Video Generation for Robot Planning </h1> \
-<h2 style='font-weight: 450; font-size: 1rem; margin: 0rem'>\
+<h3 style='font-weight: 450; font-size: 1rem; margin: 0rem'>\
 <a href='https://kiteretsu77.github.io/BoyangWang/'>Boyang Wang</a>, \
 <a href='https://www.linkedin.com/in/niksridhar/'>Nikhil Sridhar</a>, \
 <a href='https://cfeng16.github.io/'>Chao Feng</a>, \
@@ -79,17 +79,17 @@ MARKDOWN = \
 <a href='https://fishbotics.com/'>Adam Fishman</a>, \
 <a href='https://www.mmintlab.com/people/nima-fazeli/'>Nima Fazeli</a>, \
 <a href='https://jjparkcv.github.io/'>Jeong Joon Park</a> \
-</h2> \
+</h3> \
 
 <a style='font-size:18px;color: #000000' href='https://github.com/Kiteretsu77/This_and_That_VDM'> [Github] </a> \
 <a style='font-size:18px;color: #000000' href='http://arxiv.org/abs/2407.05530'> [ArXiv] </a> \
 <a style='font-size:18px;color: #000000' href='https://cfeng16.github.io/this-and-that/'> [Project Page] </a> </div> \
 </div>
 
-This&That is a robotics scenario (based on the Bridge dataset for this demo), a Language-Gesture-Image-conditioned Video Generation Model for Robot Planning.
+This&That is a Language-Gesture-conditioned Video Generation Model for Robot Planning (based on the Bridge V1 & V2 dataset).
 
 This demo focuses on the Video Diffusion Model.
-Only the VGL mode (image + language + gesture conditioned) is provided, but you can find the complete test code and all pretrained weights available.
+Only the VGL mode (Image + Language + Gesture conditioned) is provided, but you can find the complete test code and all pretrained weights available in our [GitHub Repo](https://github.com/Kiteretsu77/This_and_That_VDM).
 
 ### Note: The default gesture point indices are [4, 10] (5th and 11th) for two gesture points, or [4] (5th) for one gesture point.
 ### Note: Currently, the supported resolution is 256x384.
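For context, the MARKDOWN string edited above is the header block shown at the top of the Gradio Space. A minimal, hypothetical sketch of how such a string is typically rendered (assuming standard gr.Blocks/gr.Markdown usage; this is not the actual layout of app.py):

```python
import gradio as gr

# Illustrative only: a trimmed-down header string in the spirit of the
# MARKDOWN block edited in this commit (the full content lives in app.py).
MARKDOWN = """
<div align='center'>
<h1> This&That: Language-Gesture Controlled Video Generation for Robot Planning </h1>
</div>

### Note: The default gesture point indices are [4, 10] (5th and 11th) for two gesture points, or [4] (5th) for one gesture point.
### Note: Currently, the supported resolution is 256x384.
"""

with gr.Blocks() as demo:
    # Render the header/notes at the top of the page; the rest of the demo
    # (image upload, gesture clicks, language prompt) would follow below.
    gr.Markdown(MARKDOWN)

if __name__ == "__main__":
    demo.launch()
```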
requirements.txt CHANGED
@@ -1,6 +1,6 @@
 # Non-strict version lib
-# torch==2.0.1
-# torchaudio==2.0.1
+# torch==2.5.1
+# torchvision==0.20.1
 opencv-python
 transformers
 accelerate
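Because the version pins stay commented out ("non-strict"), pip resolves whichever torch/torchvision it can; a small sanity-check sketch after installing (the 2.5.1 / 0.20.1 targets come from the commented pins above and are not enforced by requirements.txt):

```python
# Optional environment check after `pip install -r requirements.txt`.
# The versions referenced here (torch 2.5.1, torchvision 0.20.1) mirror the
# commented-out pins in this commit; they are a reference point, not a requirement.
import torch
import torchvision

print("torch:", torch.__version__)              # expected around 2.5.1
print("torchvision:", torchvision.__version__)  # expected around 0.20.1
print("CUDA available:", torch.cuda.is_available())
```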