hujiefrank committed
Commit f866549 · verified · 1 parent: 3a91bf8

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  *.jpeg filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -7,77 +7,34 @@ pipeline_tag: text-to-image
  library_name: transformers
  ---

- ## Introduction
- **LongCat-Image-Dev** is an open-source, bilingual (Chinese-English) text-to-image foundation model with 6B parameters, designed for efficient, high-quality generation and downstream development.
  <div align="center">
- <img src="assets/model_struct.svg" width="90%" alt="LongCat-Image Generation Examples" />
  </div>

- ### Key Features
-
- - 🌟 **Exceptional Efficiency & Performance**: With only **6B parameters**, LongCat-Image-Dev achieves state-of-the-art results that rival models with more than 20B parameters.
- - 🔧 **True Developer-Ready Foundation**: Unlike typical release-the-final-model-only approaches, we provide the Dev checkpoint: a high-plasticity, unconstrained state that avoids RL-induced rigidity. This enables seamless fine-tuning without fighting against over-aligned parameter spaces.
-
- - 🛠️ **Full-Stack Training Framework**: We ship production-ready code for **SFT**, **LoRA fine-tuning**, **DPO/GRPO/MPO alignment**, and **specialized Edit training**. Every stage, from pre-training data curation to reward model integration, is reproducible, so researchers can build on our exact pipeline rather than reverse-engineering it.
-
- ---
- ## Quick Start
-
- ### Installation
-
- Clone the repo:
-
- ```shell
- git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
- cd LongCat-Image
- ```
-
- Install dependencies:
-
- ```shell
- # create a conda environment
- conda create -n longcat-image python=3.10
- conda activate longcat-image
-
- # install the remaining requirements
- pip install -r requirements.txt
- python setup.py develop
- ```
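The generation example below loads weights from `./weights/LongCat-Image-Dev`. A minimal sketch for fetching that checkpoint with `huggingface_hub`, assuming the `meituan-longcat/LongCat-Image-Dev` repo id from the badges above and the local path used in the example:

```python
# Sketch: download the LongCat-Image-Dev checkpoint into the directory the
# example below expects. The repo id and local path are assumptions taken from
# the badges above and the snippet below, not from an official download guide.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meituan-longcat/LongCat-Image-Dev",
    local_dir="./weights/LongCat-Image-Dev",
)
```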
-
- ### Run Text-to-Image Generation

- ```python
- import torch
- from transformers import AutoProcessor
- from longcat_image.models import LongCatImageTransformer2DModel
- from longcat_image.pipelines import LongCatImagePipeline

- device = torch.device('cuda')
- checkpoint_dir = './weights/LongCat-Image-Dev'

- # load the prompt processor and the diffusion transformer
- text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
- transformer = LongCatImageTransformer2DModel.from_pretrained(
-     checkpoint_dir, subfolder='transformer',
-     torch_dtype=torch.bfloat16, use_safetensors=True).to(device)

- pipe = LongCatImagePipeline.from_pretrained(
-     checkpoint_dir,
-     transformer=transformer,
-     text_processor=text_processor
- )
- pipe.to(device, torch.bfloat16)

- # Chinese prompt: a young Asian woman in a yellow knit sweater with a white necklace,
- # hands resting on her knees, serene expression; a rough brick wall and warm afternoon
- # sunlight in the background; medium-distance shot with soft light on her face.
- prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'

- image = pipe(
-     prompt,
-     height=768,
-     width=1344,
-     guidance_scale=4.5,
-     num_inference_steps=50,
-     num_images_per_prompt=1,
-     generator=torch.Generator("cpu").manual_seed(43),
-     enable_cfg_renorm=True,
-     enable_prompt_rewrite=True
- ).images[0]
- image.save('./t2i_example.png')
- ```
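A short usage sketch that reuses the `pipe` and `prompt` objects from the snippet above: sweeping the generator seed yields distinct variations of the same prompt while every other setting stays fixed.

```python
# Sketch: reuse the pipeline defined above with several seeds to get variations.
# Assumes `pipe` and `prompt` from the preceding snippet are in scope.
for seed in (41, 42, 43):
    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.5,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(seed),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True,
    ).images[0]
    image.save(f'./t2i_seed_{seed}.png')
```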
 
  library_name: transformers
  ---

  <div align="center">
+ <img src="assets/longcat-image_logo.svg" width="45%" alt="LongCat-Image" />
  </div>
+ <hr>

+ <div align="center" style="line-height: 1;">
+ <a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/Technical-Report-red'></a>
+ <a href='https://github.com/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/GitHub-Code-black'></a>
+ <a href='https://github.com/meituan-longcat/LongCat-Flash-Chat/blob/main/figures/wechat_official_accounts.png'><img src='https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white'></a>
+ <a href='https://x.com/Meituan_LongCat'><img src='https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white'></a>
+ </div>

+ <div align="center" style="line-height: 1;">

+ [//]: # ( <a href='https://meituan-longcat.github.io/LongCat-Image/'><img src='https://img.shields.io/badge/Project-Page-green'></a>)
+ <a href='https://huggingface.co/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image-blue'></a>
+ <a href='https://huggingface.co/meituan-longcat/LongCat-Image-Dev'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Dev-blue'></a>
+ <a href='https://huggingface.co/meituan-longcat/LongCat-Image-Edit'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Edit-blue'></a>
+ </div>

+ ## Introduction
+ **LongCat-Image-Dev** is a development variant of LongCat-Image: a mid-training checkpoint released to support downstream development by the community, such as further fine-tuning via SFT, LoRA, and other customization methods.
+ <div align="center">
+ <img src="assets/model_struct.jpg" width="90%" alt="LongCat-Image Generation Examples" />
+ </div>

+ ### Key Features

+ - 🔧 **True Developer-Ready Foundation**: Unlike typical release-the-final-model-only approaches, we provide the Dev checkpoint: a high-plasticity, unconstrained state that avoids RL-induced rigidity. This enables seamless fine-tuning without fighting against over-aligned parameter spaces.

+ - 🛠️ **Full-Stack Training Framework**: We ship production-ready code for **SFT**, **LoRA fine-tuning**, **DPO/GRPO/MPO alignment**, and **specialized Edit training**. Every stage, from pre-training data curation to reward model integration, is reproducible, so researchers can build on our exact pipeline rather than reverse-engineering it (see the LoRA sketch after this list).
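To make the fine-tuning claim concrete, here is a minimal, hedged sketch of attaching LoRA adapters to the released transformer with the `peft` library. The target module names are assumptions based on common diffusion-transformer attention layouts, not names confirmed for this model; the official SFT and LoRA training code lives in the GitHub repository linked above.

```python
# Hedged sketch: wrap the LongCat-Image-Dev transformer with LoRA adapters via peft.
# The target_modules names are assumptions; inspect transformer.named_modules()
# to confirm the real projection names before training.
import torch
from peft import LoraConfig, get_peft_model
from longcat_image.models import LongCatImageTransformer2DModel

checkpoint_dir = './weights/LongCat-Image-Dev'
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir, subfolder='transformer',
    torch_dtype=torch.bfloat16, use_safetensors=True)

lora_config = LoraConfig(
    r=16,                  # adapter rank
    lora_alpha=16,         # LoRA scaling factor
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # assumed attention projections
)
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()  # only adapter weights remain trainable
```

Actual training (data pipeline, loss, optimizer schedule) should then follow the SFT/LoRA scripts shipped in the repository.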
assets/longcat-image_logo.svg ADDED
assets/model_struct.jpg ADDED

Git LFS Details

  • SHA256: aadef7db22c66c8060e3f7df5657ae6b77c728429ef28cc57c61638343768bc1
  • Pointer size: 132 Bytes
  • Size of remote file: 3.63 MB