hujiefrank committed
Commit f866549 · verified · 1 parent: 3a91bf8

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  *.jpeg filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -7,77 +7,34 @@ pipeline_tag: text-to-image
  library_name: transformers
  ---

- ## Introduction
- **LongCat-Image-Dev** is an open-source, bilingual (Chinese-English) text-to-image foundation model with 6B parameters, designed for efficient, high-quality generation and downstream development.
  <div align="center">
- <img src="assets/model_struct.svg" width="90%" alt="LongCat-Image Generation Examples" />
  </div>

- ### Key Features
-
- - 🌟 **Exceptional Efficiency & Performance**: With only **6B parameters**, LongCat-Image-Dev achieves state-of-the-art results that rival models with more than 20B parameters.
- - 🔧 **True Developer-Ready Foundation**: Unlike typical release-the-final-model-only approaches, we provide the Dev checkpoint: a high-plasticity, unconstrained state that avoids RL-induced rigidity. This enables seamless fine-tuning without fighting against over-aligned parameter spaces.
-
- - 🛠️ **Full-Stack Training Framework**: We ship production-ready code for **SFT**, **LoRA fine-tuning**, **DPO/GRPO/MPO alignment**, and **specialized Edit training**. Every stage, from pre-training data curation to reward model integration, is reproducible, so researchers can build on our exact pipeline rather than reverse-engineering it.
-
- ---
- ## Quick Start
-
- ### Installation
-
- Clone the repo:
-
- ```shell
- git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
- cd LongCat-Image
- ```
-
- Install dependencies:
-
- ```shell
- # create a conda environment
- conda create -n longcat-image python=3.10
- conda activate longcat-image
-
- # install the remaining requirements
- pip install -r requirements.txt
- python setup.py develop
- ```
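The generation example below loads weights from `./weights/LongCat-Image-Dev`. A minimal sketch for fetching that checkpoint with `huggingface_hub`, assuming the `meituan-longcat/LongCat-Image-Dev` repo id from the badges above and the local path used in the example:

```python
# Sketch: download the LongCat-Image-Dev checkpoint into the directory the
# example below expects. The repo id and local path are assumptions taken from
# the badges above and the snippet below, not from an official download guide.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meituan-longcat/LongCat-Image-Dev",
    local_dir="./weights/LongCat-Image-Dev",
)
```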
-
- ### Run Text-to-Image Generation

- ```python
- import torch
- from transformers import AutoProcessor
- from longcat_image.models import LongCatImageTransformer2DModel
- from longcat_image.pipelines import LongCatImagePipeline

- device = torch.device('cuda')
- checkpoint_dir = './weights/LongCat-Image-Dev'

- # load the prompt processor and the diffusion transformer
- text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
- transformer = LongCatImageTransformer2DModel.from_pretrained(
-     checkpoint_dir, subfolder='transformer',
-     torch_dtype=torch.bfloat16, use_safetensors=True).to(device)

- pipe = LongCatImagePipeline.from_pretrained(
-     checkpoint_dir,
-     transformer=transformer,
-     text_processor=text_processor
- )
- pipe.to(device, torch.bfloat16)

- # Chinese prompt: a young Asian woman in a yellow knit sweater with a white necklace,
- # hands resting on her knees, serene expression; a rough brick wall and warm afternoon
- # sunlight in the background; medium-distance shot with soft light on her face.
- prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'

- image = pipe(
-     prompt,
-     height=768,
-     width=1344,
-     guidance_scale=4.5,
-     num_inference_steps=50,
-     num_images_per_prompt=1,
-     generator=torch.Generator("cpu").manual_seed(43),
-     enable_cfg_renorm=True,
-     enable_prompt_rewrite=True
- ).images[0]
- image.save('./t2i_example.png')
- ```
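A short usage sketch that reuses the `pipe` and `prompt` objects from the snippet above: sweeping the generator seed yields distinct variations of the same prompt while every other setting stays fixed.

```python
# Sketch: reuse the pipeline defined above with several seeds to get variations.
# Assumes `pipe` and `prompt` from the preceding snippet are in scope.
for seed in (41, 42, 43):
    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.5,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(seed),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True,
    ).images[0]
    image.save(f'./t2i_seed_{seed}.png')
```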
 
  library_name: transformers
  ---

  <div align="center">
+ <img src="assets/longcat-image_logo.svg" width="45%" alt="LongCat-Image" />
  </div>
+ <hr>

+ <div align="center" style="line-height: 1;">
+ <a href='https://arxiv.org/abs/'><img src='https://img.shields.io/badge/Technical-Report-red'></a>
+ <a href='https://github.com/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/GitHub-Code-black'></a>
+ <a href='https://github.com/meituan-longcat/LongCat-Flash-Chat/blob/main/figures/wechat_official_accounts.png'><img src='https://img.shields.io/badge/WeChat-LongCat-brightgreen?logo=wechat&logoColor=white'></a>
+ <a href='https://x.com/Meituan_LongCat'><img src='https://img.shields.io/badge/Twitter-LongCat-white?logo=x&logoColor=white'></a>
+ </div>

+ <div align="center" style="line-height: 1;">

+ [//]: # ( <a href='https://meituan-longcat.github.io/LongCat-Image/'><img src='https://img.shields.io/badge/Project-Page-green'></a>)
+ <a href='https://huggingface.co/meituan-longcat/LongCat-Image'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image-blue'></a>
+ <a href='https://huggingface.co/meituan-longcat/LongCat-Image-Dev'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Dev-blue'></a>
+ <a href='https://huggingface.co/meituan-longcat/LongCat-Image-Edit'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-LongCat--Image--Edit-blue'></a>
+ </div>

+ ## Introduction
+ **LongCat-Image-Dev** is a development variant of LongCat-Image: a mid-training checkpoint released to support downstream development by the community, such as further fine-tuning via SFT, LoRA, and other customization methods.
+ <div align="center">
+ <img src="assets/model_struct.jpg" width="90%" alt="LongCat-Image Generation Examples" />
+ </div>

+ ### Key Features

+ - 🔧 **True Developer-Ready Foundation**: Unlike typical release-the-final-model-only approaches, we provide the Dev checkpoint: a high-plasticity, unconstrained state that avoids RL-induced rigidity. This enables seamless fine-tuning without fighting against over-aligned parameter spaces.

+ - 🛠️ **Full-Stack Training Framework**: We ship production-ready code for **SFT**, **LoRA fine-tuning**, **DPO/GRPO/MPO alignment**, and **specialized Edit training**. Every stage, from pre-training data curation to reward model integration, is reproducible, so researchers can build on our exact pipeline rather than reverse-engineering it (see the LoRA sketch after this list).
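To make the fine-tuning claim concrete, here is a minimal, hedged sketch of attaching LoRA adapters to the released transformer with the `peft` library. The target module names are assumptions based on common diffusion-transformer attention layouts, not names confirmed for this model; the official SFT and LoRA training code lives in the GitHub repository linked above.

```python
# Hedged sketch: wrap the LongCat-Image-Dev transformer with LoRA adapters via peft.
# The target_modules names are assumptions; inspect transformer.named_modules()
# to confirm the real projection names before training.
import torch
from peft import LoraConfig, get_peft_model
from longcat_image.models import LongCatImageTransformer2DModel

checkpoint_dir = './weights/LongCat-Image-Dev'
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir, subfolder='transformer',
    torch_dtype=torch.bfloat16, use_safetensors=True)

lora_config = LoraConfig(
    r=16,                  # adapter rank
    lora_alpha=16,         # LoRA scaling factor
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # assumed attention projections
)
transformer = get_peft_model(transformer, lora_config)
transformer.print_trainable_parameters()  # only adapter weights remain trainable
```

Actual training (data pipeline, loss, optimizer schedule) should then follow the SFT/LoRA scripts shipped in the repository.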
assets/longcat-image_logo.svg ADDED
assets/model_struct.jpg ADDED

Git LFS Details

  • SHA256: aadef7db22c66c8060e3f7df5657ae6b77c728429ef28cc57c61638343768bc1
  • Pointer size: 132 Bytes
  • Size of remote file: 3.63 MB