There’s another effort to recreate DALL-E: DALL-E
There’s also an effort to get text-image pairs from Common Crawl there, currently at 100 million. 
With this, our currently released models are located here: GitHub - robvanvolt/DALLE-models: Here is a collection of checkpoints for DALLE-pytorch models, from where you can keep on training or start generating images.
Along with the inference colab.
Status
datasets for VQGAN ready (thanks Khalid), training being set up (Boris & Pedro)
debugging of VQGAN on TPU aborted (Pedro may do some last attempt with help of Tanishq)
test script to convert a VQGAN (try a pretrained one) to JAX from Suraj repo
prepare a function that turns an image into encoded tokens from VQGAN (test with pretrained model)
create a dataset that contains only target text + image tokens (if it's small we can do it on all the images we have access to) - I suggest trying datasets with map (I would not batch it as the intermediate dataset with loaded images may be too big and it may be the bottleneck); a rough sketch is at the end of this post
prepare the seq2seq jax script (test an example first and adapt with one of our dummy datasets)
prepare demo
A lot of these tasks can be done in parallel!
We can coordinate on the discord.
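Something like the sketch below is what I have in mind for the encoding + map step. This is just a rough draft (not tested): the module path, checkpoint name, `encode()` return signature and the `caption`/`image_path` fields are assumptions based on Suraj's vqgan-jax repo.

```python
# Rough sketch, not tested: encode an image into VQGAN token ids, then keep only
# caption + image tokens via datasets.map. Module path, checkpoint name and the
# encode() return signature are assumptions.
import numpy as np
from PIL import Image
from vqgan_jax.modeling_flax_vqgan import VQModel  # assumed module path (Suraj's repo)

vqgan = VQModel.from_pretrained("flax-community/vqgan_f16_16384")  # placeholder checkpoint

def encode_image(path, size=256):
    img = Image.open(path).convert("RGB").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32) / 255.0  # may need [-1, 1] scaling instead,
    pixels = pixels[None, ...]                          # depending on the checkpoint's preprocessing
    _, indices = vqgan.encode(pixels)                   # assumed to return (states, token ids)
    return np.asarray(indices[0])

def to_pair(example):
    # keep only what the seq2seq model needs: caption + image token ids
    return {
        "caption": example["caption"],
        "encoding": encode_image(example["image_path"]).tolist(),
    }

# encoded = dataset.map(to_pair, remove_columns=dataset.column_names)  # unbatched on purpose
```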
Status
VQGAN trained - Pytorch model
conversion script + VQGAN converted to JAX - JAX model
image encoding script - thanks @pcuenq
images from CC3M and CC12M pre-encoded with VQGAN - thanks @pcuenq and @khalidsaifullaah - can any of you check that we dropna for non-existing images on both datasets?
images from YFCC100M pre-encoded with VQGAN - @khalidsaifullaah if you are interested
load the metadata with pandas.read_json("metadata.jsonl", orient="records", lines=True)
use text_clean or description_clean (or any other logic) as the caption field
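Roughly what I mean, as a sketch only (the image_file column name is a placeholder and the caption fallback is just one possible logic):

```python
# Sketch: load the YFCC100M metadata dump, build a caption column and keep only
# rows whose image actually exists on disk. "image_file" is a placeholder name.
import os
import pandas as pd

metadata = pd.read_json("metadata.jsonl", orient="records", lines=True)

# prefer text_clean, fall back to description_clean when it is missing
metadata["caption"] = metadata["text_clean"].fillna(metadata["description_clean"])
metadata = metadata.dropna(subset=["caption"])

# same dropna idea as for CC3M/CC12M: drop rows pointing to non-existing images
metadata = metadata[metadata["image_file"].map(os.path.exists)].reset_index(drop=True)
```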
dataset pipeline - thanks @pcuenq - Can we concatenate datasets by passing a list of files?
model inference - thanks @lkhphuc for the help
prepare the seq2seq jax script - I already started it but didn't have the time to test: see here. predict_with_generate=False for now, as we would need to define metrics (not obvious to find anything better than the loss here). When it runs we can just log some sample predictions decoded with the VQGAN.
finetune learning rate and warmup steps
final training
test generate function - should already handle bos/eos/pos tokens in the decoder properly based on the config
make a JAX VQVAE - use haiku implementation + Suraj VQGAN repo as an example
prepare demo - we may want to generate several images and use CLIP for reranking (rough sketch below)
Thanks everybody!!!
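For the CLIP reranking idea, a minimal sketch (standard OpenAI CLIP checkpoint; `images` would be the PIL images decoded from the generated VQGAN tokens):

```python
# Sketch of CLIP reranking for the demo: score each candidate image against the
# prompt and keep the best ones.
import numpy as np
from transformers import CLIPProcessor, FlaxCLIPModel

clip = FlaxCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rerank(prompt, images, top_k=4):
    inputs = processor(text=[prompt], images=images, return_tensors="np", padding=True)
    scores = np.asarray(clip(**inputs).logits_per_image[:, 0])  # one score per image
    best = np.argsort(-scores)[:top_k]
    return [images[i] for i in best]
```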
Status
Training ongoing for the Seq2Seq (2 active runs doing pretty well) - see dashboard
Awesome development demo and tests ongoing by everybody
CLIP integrated by @pcuenq
Final demo to develop
Clean up repo - not sure if we want to bring our fork of taming-transformers as well
Writeup - I would suggest to make the main writeup as a W&B report since it will link all our runs together and have a clean repo for reproducibility
Congrats everybody for the great results we’re getting!
Status update
Demo
Report
Documentation
Ongoing runs
Status update
Runs
wandb/hf-flax-dalle-mini/model-4oh3u7ca:latest
Demo
Repos
from ../dalle-mini/model import CustomFlaxBartForConditionalGeneration - a rough loading sketch is below
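For reference, loading the latest checkpoint could look roughly like this (the `dalle_mini.model` import path and the BART tokenizer name are assumptions, adjust to the repo layout):

```python
# Rough sketch: pull the W&B model artifact and load it with the custom BART class.
# Import path and tokenizer name are assumptions.
import wandb
from dalle_mini.model import CustomFlaxBartForConditionalGeneration
from transformers import BartTokenizer

artifact = wandb.Api().artifact("wandb/hf-flax-dalle-mini/model-4oh3u7ca:latest")
model_dir = artifact.download()

model = CustomFlaxBartForConditionalGeneration.from_pretrained(model_dir)
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")  # assumed tokenizer
```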
Writeup
Status update
Our report is ready and looks awesome! Possible improvements:
Demo
Repo
Status update
Demo almost ready (just need to upgrade streamlit with HF when possible)
Report almost ready (let’s have some final read and check links, etc)
Repo
In addition to my previous comments, see the evaluation form that has great ideas from Patrick on what we can improve!
Status update
Almost done 
Demo
Github repo now in sync with Spaces
update streamlit on HF Spaces (and remove our hacks)
Report
Evolution of predictions complete
Updated section on limitations and biases - Please review and see if you have more to add
compare to DALLE-pytorch - I could not find any generic model; there are some trained on tiny datasets (for example birds) which would not be great for comparison. Can you find any?
Model cards
VQGAN JAX card being updated by @pcuenq
VQGAN Pytorch version to complete (I can copy from JAX version + add details on requirements and inference)
complete DALL-E mini card (inference script could be nice and does not need to include CLIP)
Repo
cleanup notebooks - I think we just need one for encoding data + one for inference
Status
Report to be released soon
All model cards updated (thanks Pedro)
Inference colab ready
Demo - waiting for the upgrade of streamlit to fix our layout
New run ongoing: same data, more epochs
working on dataset loading script for YFCC100M - see here (rough skeleton sketched below)
trying to fix optimizer checkpoint - see here
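The loading script skeleton I have in mind looks roughly like this (file layout and field names are placeholders):

```python
# Very rough skeleton of a `datasets` loading script for the encoded YFCC100M pairs
# (caption + VQGAN token ids). Paths and field names are placeholders.
import json
import datasets

class Yfcc100mEncoded(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({
                "caption": datasets.Value("string"),
                "encoding": datasets.Sequence(datasets.Value("int32")),
            })
        )

    def _split_generators(self, dl_manager):
        files = dl_manager.download(["yfcc100m_encoded.jsonl"])  # placeholder path
        return [datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"files": files})]

    def _generate_examples(self, files):
        idx = 0
        for path in files:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    yield idx, {"caption": record["caption"], "encoding": record["encoding"]}
                    idx += 1
```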
Let’s plan next steps soon.
For those following along who don't know: the demo has already been released.
Feel free to join the Dall-E Discord server to help with this project!