Add files using upload-large-folder tool
- .gitattributes +1 -0
- README.md +22 -65
- assets/longcat-image_logo.svg +78 -0
- assets/model_struct.jpg +3 -0
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 *.jpeg filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
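As a rough illustration of what the added `*.jpg` rule does, the sketch below checks the files touched by this commit against the patterns visible in the hunk. It assumes that slash-free gitattributes patterns are matched against a file's basename and approximates that with Python's `fnmatch`. The helper name `is_lfs_tracked` is hypothetical, and patterns from the part of `.gitattributes` not shown in the hunk are not included.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in the hunk above (after the change);
# the real .gitattributes has more rules that are not shown here.
LFS_PATTERNS = ["*.zst", "*tfevents*", "*.jpeg", "*.jpg"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: compare the basename against each glob pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for path in (
        "assets/model_struct.jpg",
        "assets/longcat-image_logo.svg",
        "README.md",
    ):
        print(f"{path}: {is_lfs_tracked(path)}")
```

Under these assumptions, only `assets/model_struct.jpg` matches a visible pattern, which is consistent with it being the one file in this commit shown with Git LFS details.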
README.md
CHANGED
@@ -7,77 +7,34 @@ pipeline_tag: text-to-image
 library_name: transformers
 ---
 
-## Introduction
-**LongCat-Image-Dev** is an open-source, bilingual (Chinese-English) text-to-image foundation model with 6B parameters, designed for efficient high-quality generation and downstream development.
 <div align="center">
-<img src="assets/
+<img src="assets/longcat-image_logo.svg" width="45%" alt="LongCat-Image" />
 </div>
+<hr>
 
-
-
--
-
-
-
-
----
-## Quick Start
-
-### Installation
-
-Clone the repo:
-
-```shell
-git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
-cd LongCat-Image
-```
-
-Install dependencies:
-
-```shell
-# create conda environment
-conda create -n longcat-image python=3.10
-conda activate longcat-image
-
-# install other requirements
-pip install -r requirements.txt
-python setup.py develop
-```
-
-### Run Text-to-Image Generation
+<div align="center" style="line-height: 1;">
+<a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/Technical-Report-red'></a>
+<a href='https://github.com/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/GitHub-Code-black'></a>
+<a href='https://github.com/meituan-longcat/LongCat-Flash-Chat/blob/main/figures/wechat_official_accounts.png'><img src='https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white'></a>
+<a href='https://x.com/Meituan_LongCat'><img src='https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white'></a>
+</div>
 
-
-import torch
-from transformers import AutoProcessor
-from longcat_image.models import LongCatImageTransformer2DModel
-from longcat_image.pipelines import LongCatImagePipeline
+<div align="center" style="line-height: 1;">
 
-
-
+[//]: # ( <a href='https://meituan-longcat.github.io/LongCat-Image/'><img src='https://img.shields.io/badge/Project-Page-green'></a>)
+<a href='https://huggingface.co/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image-blue'></a>
+<a href='https://huggingface.co/meituan-longcat/LongCat-Image-Dev'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Dev-blue'></a>
+<a href='https://huggingface.co/meituan-longcat/LongCat-Image-Edit'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Edit-blue'></a>
+</div>
 
-
-
-
+## Introduction
+**LongCat-Image-Dev** is a development variant of LongCat-Image, representing a mid-training checkpoint that is released to facilitate downstream development by the community, such as secondary fine-tuning via SFT, LoRA, and other customization methods.
+<div align="center">
+<img src="assets/model_struct.jpg" width="90%" alt="LongCat-Image Generation Examples" />
+</div>
 
-
-checkpoint_dir,
-transformer=transformer,
-text_processor=text_processor
-)
-pipe.to(device, torch.bfloat16)
+### Key Features
 
-
+- 🔧 **True Developer-Ready Foundation**: Unlike typical release-the-final-model-only approaches, we provide the Dev—a high-plasticity, unconstrained state that avoids RL-induced rigidity. This enables seamless fine-tuning without fighting against over-aligned parameter spaces.
 
-
-prompt,
-height=768,
-width=1344,
-guidance_scale=4.5,
-num_inference_steps=50,
-num_images_per_prompt=1,
-generator=torch.Generator("cpu").manual_seed(43),
-enable_cfg_renorm=True,
-enable_prompt_rewrite=True
-).images[0]
-image.save('./t2i_example.png')
-```
+- 🛠️ **Full-Stack Training Framework**: We ship production-ready code for **SFT**, **LoRA fine-tuning**, **DPO/GRPO/MPO alignment**, and **specialized Edit training**. Every stage from pre-training data curation to reward model integration is reproducible, empowering researchers to build on our exact pipeline rather than reverse-engineering it.
assets/longcat-image_logo.svg
ADDED
assets/model_struct.jpg
ADDED
Git LFS Details