There’s another effort to recreate DALL-E: DALL-E
There’s also an effort to get text-image pairs from Common Crawl there, currently at 100 million. 
With this, our currently released models are located here: GitHub - robvanvolt/DALLE-models: Here is a collection of checkpoints for DALLE-pytorch models, from where you can keep on training or start generating images.
Along with the inference colab.
Status
datasets for VQGAN ready (thanks Khalid), training being set up (Boris & Pedro)
debugging of VQGAN on TPU aborted (Pedro may do some last attempt with help of Tanishq)
test script to convert a VQGAN (try a pretrained one) to JAX from Suraj repo
prepare a function that turns an image into encoded tokens from VQGAN (test with pretrained model)
create a dataset that contains only target text + image tokens (if it's small we can do it on all the images we have access to) - I suggest trying datasets with map (I would not batch it as the intermediate dataset with loaded images may be too big and it may be the bottleneck); a rough sketch is at the end of this post
prepare the seq2seq jax script (test an example first and adapt with one of our dummy datasets)
prepare demo
A lot of these tasks can be done in parallel!
We can coordinate on the discord.
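Something like the sketch below is what I have in mind for the encoding + map step. This is just a rough draft (not tested): the module path, checkpoint name, `encode()` return signature and the `caption`/`image_path` fields are assumptions based on Suraj's vqgan-jax repo.

```python
# Rough sketch, not tested: encode an image into VQGAN token ids, then keep only
# caption + image tokens via datasets.map. Module path, checkpoint name and the
# encode() return signature are assumptions.
import numpy as np
from PIL import Image
from vqgan_jax.modeling_flax_vqgan import VQModel  # assumed module path (Suraj's repo)

vqgan = VQModel.from_pretrained("flax-community/vqgan_f16_16384")  # placeholder checkpoint

def encode_image(path, size=256):
    img = Image.open(path).convert("RGB").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32) / 255.0  # may need [-1, 1] scaling instead,
    pixels = pixels[None, ...]                          # depending on the checkpoint's preprocessing
    _, indices = vqgan.encode(pixels)                   # assumed to return (states, token ids)
    return np.asarray(indices[0])

def to_pair(example):
    # keep only what the seq2seq model needs: caption + image token ids
    return {
        "caption": example["caption"],
        "encoding": encode_image(example["image_path"]).tolist(),
    }

# encoded = dataset.map(to_pair, remove_columns=dataset.column_names)  # unbatched on purpose
```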
Status
VQGAN trained - Pytorch model
conversion script + VQGAN converted to JAX - JAX model
image encoding script - thanks @pcuenq
images from CC3M and CC12M pre-encoded with VQGAN - thanks @pcuenq and @khalidsaifullaah - can any of you check that we dropna for non-existing images on both datasets?
images from YFCC100M pre-encoded with VQGAN - @khalidsaifullaah if you are interested
load the metadata with pandas.read_json("metadata.jsonl", orient="records", lines=True)
use text_clean or description_clean (or any other logic) as the caption field
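Roughly what I mean, as a sketch only (the image_file column name is a placeholder and the caption fallback is just one possible logic):

```python
# Sketch: load the YFCC100M metadata dump, build a caption column and keep only
# rows whose image actually exists on disk. "image_file" is a placeholder name.
import os
import pandas as pd

metadata = pd.read_json("metadata.jsonl", orient="records", lines=True)

# prefer text_clean, fall back to description_clean when it is missing
metadata["caption"] = metadata["text_clean"].fillna(metadata["description_clean"])
metadata = metadata.dropna(subset=["caption"])

# same dropna idea as for CC3M/CC12M: drop rows pointing to non-existing images
metadata = metadata[metadata["image_file"].map(os.path.exists)].reset_index(drop=True)
```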
dataset pipeline - thanks @pcuenq - Can we concatenate datasets by passing a list of files?
model inference - thanks @lkhphuc for the help
prepare the seq2seq jax script - I already started it but didn't have the time to test: see here. predict_with_generate=False for now, as we would need to define metrics (not obvious to find anything better than the loss here). When it runs we can just log some sample predictions decoded with the VQGAN.
finetune learning rate and warmup steps
final training
test generate function - should already handle bos/eos/pos tokens in the decoder properly based on the config
make a JAX VQVAE - use haiku implementation + Suraj VQGAN repo as an example
prepare demo - we may want to generate several images and use CLIP for reranking (rough sketch below)
Thanks everybody!!!
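For the CLIP reranking idea, a minimal sketch (standard OpenAI CLIP checkpoint; `images` would be the PIL images decoded from the generated VQGAN tokens):

```python
# Sketch of CLIP reranking for the demo: score each candidate image against the
# prompt and keep the best ones.
import numpy as np
from transformers import CLIPProcessor, FlaxCLIPModel

clip = FlaxCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rerank(prompt, images, top_k=4):
    inputs = processor(text=[prompt], images=images, return_tensors="np", padding=True)
    scores = np.asarray(clip(**inputs).logits_per_image[:, 0])  # one score per image
    best = np.argsort(-scores)[:top_k]
    return [images[i] for i in best]
```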
Status
Training ongoing for the Seq2Seq (2 active runs doing pretty well) - see dashboard
Awesome development demo and tests ongoing by everybody
CLIP integrated by @pcuenq
Final demo to develop
Clean up repo - not sure if we want to bring our fork of taming-transformers as well
Writeup - I would suggest to make the main writeup as a W&B report since it will link all our runs together and have a clean repo for reproducibility
Congrats everybody for the great results we’re getting!
Status update
Demo
Report
Documentation
Ongoing runs
Status update
Runs
wandb/hf-flax-dalle-mini/model-4oh3u7ca:latest
Demo
Repos
from ../dalle-mini/model import CustomFlaxBartForConditionalGeneration - a rough loading sketch is below
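For reference, loading the latest checkpoint could look roughly like this (the `dalle_mini.model` import path and the BART tokenizer name are assumptions, adjust to the repo layout):

```python
# Rough sketch: pull the W&B model artifact and load it with the custom BART class.
# Import path and tokenizer name are assumptions.
import wandb
from dalle_mini.model import CustomFlaxBartForConditionalGeneration
from transformers import BartTokenizer

artifact = wandb.Api().artifact("wandb/hf-flax-dalle-mini/model-4oh3u7ca:latest")
model_dir = artifact.download()

model = CustomFlaxBartForConditionalGeneration.from_pretrained(model_dir)
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")  # assumed tokenizer
```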
Writeup
Status update
Our report is ready and looks awesome! Possible improvements:
Demo
Repo
Status update
Demo almost ready (just need to upgrade streamlit with HF when possible)
Report almost ready (let’s have some final read and check links, etc)
Repo
In addition to my previous comments, see the evaluation form that has great ideas from Patrick on what we can improve!
Status update
Almost done 
Demo
Github repo now in sync with Spaces
update streamlit on HF Spaces (and remove our hacks)
Report
Evolution of predictions complete
Updated section on limitations and biases - Please review and see if you have more to add
compare to DALLE-pytorch - I could not find any generic model; there are some trained on tiny datasets (for example birds) which would not be great for comparison. Can you find any?
Model cards
VQGAN JAX card being updated by @pcuenq
VQGAN Pytorch version to complete (I can copy from JAX version + add details on requirements and inference)
complete DALL-E mini card (inference script could be nice and does not need to include CLIP)
Repo
cleanup notebooks - I think we just need one for encoding data + one for inference
Status
Report to be released soon
All model cards updated (thanks Pedro)
Inference colab ready
Demo - waiting for the upgrade of streamlit to fix our layout
New run ongoing: same data, more epochs
working on dataset loading script for YFCC100M - see here (rough skeleton sketched below)
trying to fix optimizer checkpoint - see here
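The loading script skeleton I have in mind looks roughly like this (file layout and field names are placeholders):

```python
# Very rough skeleton of a `datasets` loading script for the encoded YFCC100M pairs
# (caption + VQGAN token ids). Paths and field names are placeholders.
import json
import datasets

class Yfcc100mEncoded(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({
                "caption": datasets.Value("string"),
                "encoding": datasets.Sequence(datasets.Value("int32")),
            })
        )

    def _split_generators(self, dl_manager):
        files = dl_manager.download(["yfcc100m_encoded.jsonl"])  # placeholder path
        return [datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"files": files})]

    def _generate_examples(self, files):
        idx = 0
        for path in files:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    yield idx, {"caption": record["caption"], "encoding": record["encoding"]}
                    idx += 1
```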
Let’s plan next steps soon.
For those following along who don't know: the demo has already been released.
Feel free to join the Dall-E Discord server to help with this project!