Spaces:

Zulelee
/

langchain-chatchat

Sleeping

App Files Files Community

langchain-chatchat / README_en.md

Zulelee

Upload 16 files

8d50bff verified almost 2 years ago

preview code

raw

history blame contribute delete

7.99 kB

	![](img/logo-long-chatchat-trans-v2.png)

	🌍 [中文文档](README.md)
	🌍 [日本語で読む](README_ja.md)

	📃 LangChain-Chatchat (formerly Langchain-ChatGLM):

	A LLM application aims to implement knowledge and search engine based QA based on Langchain and open-source or remote
	LLM API.

	⚠️`0.2.10` will be the last version of the `0.2.x` series. The `0.2.x` series will stop updating and technical support,
	and strive to develop `Langchain-Chachat 0.3.x with stronger applicability. `.


	---

	## Table of Contents

	- [Introduction](README.md#Introduction)
	- [Pain Points Addressed](README.md#Pain-Points-Addressed)
	- [Quick Start](README.md#Quick-Start)
	- [1. Environment Setup](README.md#1-Environment-Setup)
	- [2. Model Download](README.md#2-Model-Download)
	- [3. Initialize Knowledge Base and Configuration Files](README.md#3-Initialize-Knowledge-Base-and-Configuration-Files)
	- [4. One-Click Startup](README.md#4-One-Click-Startup)
	- [5. Startup Interface Examples](README.md#5-Startup-Interface-Examples)
	- [Contact Us](README.md#Contact-Us)

	## Introduction

	🤖️ A Q&A application based on local knowledge base implemented using the idea
	of [langchain](https://github.com/langchain-ai/langchain). The goal is to build a KBQA(Knowledge based Q&A) solution
	that
	is friendly to Chinese scenarios and open source models and can run both offline and online.

	💡 Inspired by [document.ai](https://github.com/GanymedeNil/document.ai)
	and [ChatGLM-6B Pull Request](https://github.com/THUDM/ChatGLM-6B/pull/216) , we build a local knowledge base question
	answering application that can be implemented using an open source model or remote LLM api throughout the process. In
	the latest version of this project, [FastChat](https://github.com/lm-sys/FastChat) is used to access Vicuna, Alpaca,
	LLaMA, Koala, RWKV and many other models. Relying on [langchain](https://github.com/langchain-ai/langchain) , this
	project supports calling services through the API provided based on [FastAPI](https://github.com/tiangolo/fastapi), or
	using the WebUI based on [Streamlit](https://github.com/streamlit/streamlit).

	✅ Relying on the open source LLM and Embedding models, this project can realize full-process **offline private
	deployment**. At the same time, this project also supports the call of OpenAI GPT API- and Zhipu API, and will continue
	to expand the access to various models and remote APIs in the future.

	⛓️ The implementation principle of this project is shown in the graph below. The main process includes: loading files ->
	reading text -> text segmentation -> text vectorization -> question vectorization -> matching the `top-k` most similar
	to the question vector in the text vector -> The matched text is added to `prompt `as context and question -> submitte
	to `LLM` to generate an answer.

	📺[video introduction](https://www.bilibili.com/video/BV13M4y1e7cN/?share_source=copy_web&vd_source=e6c5aafe684f30fbe41925d61ca6d514)

	![实现原理图](img/langchain+chatglm.png)

	The main process analysis from the aspect of document process:

	![实现原理图2](img/langchain+chatglm2.png)

	🚩 The training or fine-tuning are not involved in the project, but still, one always can improve performance by do
	these.

	🌐 [AutoDL image](https://www.codewithgpu.com/i/chatchat-space/Langchain-Chatchat/Langchain-Chatchat) is supported, and in v13 the codes are update to v0.2.9.

	🐳 [Docker image](registry.cn-beijing.aliyuncs.com/chatchat/chatchat:0.2.7) is supported to 0.2.7

	## Pain Points Addressed

	This project is a solution for enhancing knowledge bases with fully localized inference, specifically addressing the
	pain points of data security and private deployments for businesses.
	This open-source solution is under the Apache License and can be used for commercial purposes for free, with no fees
	required.
	We support mainstream local large prophecy models and Embedding models available in the market, as well as open-source
	local vector databases. For a detailed list of supported models and databases, please refer to
	our [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/)

	## Quick Start

	### Environment Setup

	First, make sure your machine has Python 3.10 installed.

	```
	$ python --version
	Python 3.10.12
	```

	Then, create a virtual environment and install the project's dependencies within the virtual environment.

	```shell

	# 拉取仓库
	$ git clone https://github.com/chatchat-space/Langchain-Chatchat.git

	# 进入目录
	$ cd Langchain-Chatchat

	# 安装全部依赖
	$ pip install -r requirements.txt
	$ pip install -r requirements_api.txt
	$ pip install -r requirements_webui.txt

	# 默认依赖包括基本运行环境（FAISS向量库）。如果要使用 milvus/pg_vector 等向量库，请将 requirements.txt 中相应依赖取消注释再安装。
	```

	Please note that the LangChain-Chachat `0.2.x` series is for the Langchain `0.0.x` series version. If you are using the
	Langchain `0.1.x` series version, you need to downgrade.

	### Model Download

	If you need to run this project locally or in an offline environment, you must first download the required models for
	the project. Typically, open-source LLM and Embedding models can be downloaded from HuggingFace.

	Taking the default LLM model used in this project, [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b), and
	the Embedding model [moka-ai/m3e-base](https://huggingface.co/moka-ai/m3e-base) as examples:

	To download the models, you need to first
	install [Git LFS](https://docs.github.com/zh/repositories/working-with-files/managing-large-files/installing-git-large-file-storage)
	and then run:

	```Shell
	$ git lfs install
	$ git clone https://huggingface.co/THUDM/chatglm2-6b
	$ git clone https://huggingface.co/moka-ai/m3e-base
	```

	### Initializing the Knowledge Base and Config File

	Follow the steps below to initialize your own knowledge base and config file:

	```shell
	$ python copy_config_example.py
	$ python init_database.py --recreate-vs
	```

	### One-Click Launch

	To start the project, run the following command:

	```shell
	$ python startup.py -a
	```

	### Example of Launch Interface

	1. FastAPI docs interface

	![](img/fastapi_docs_026.png)

	2. webui page

	- Web UI dialog page:

	![img](img/LLM_success.png)

	- Web UI knowledge base management page:

	![](img/init_knowledge_base.jpg)

	### Note

	The above instructions are provided for a quick start. If you need more features or want to customize the launch method,
	please refer to the [Wiki](https://github.com/chatchat-space/Langchain-Chatchat/wiki/).

	---

	## Project Milestones

	+ `April 2023`: `Langchain-ChatGLM 0.1.0` released, supporting local knowledge base question and answer based on the
	ChatGLM-6B model.
	+ `August 2023`: `Langchain-ChatGLM` was renamed to `Langchain-Chatchat`, `0.2.0` was released, using `fastchat` as the
	model loading solution, supporting more models and databases.
	+ `October 2023`: `Langchain-Chachat 0.2.5` was released, Agent content was launched, and the open source project won
	the third prize in the hackathon held by `Founder Park & Zhipu AI & Zilliz`.
	+ `December 2023`: `Langchain-Chachat` open source project received more than 20K stars.
	+ `January 2024`: `LangChain 0.1.x` is launched, `Langchain-Chachat 0.2.x` is released. After the stable
	version `0.2.10` is released, updates and technical support will be stopped, and all efforts will be made to
	develop `Langchain with stronger applicability -Chat 0.3.x`.


	+ 🔥 Let’s look forward to the future Chatchat stories together···

	---

	## Contact Us

	### Telegram

	[![Telegram](https://img.shields.io/badge/Telegram-2CA5E0?style=for-the-badge&logo=telegram&logoColor=white "langchain-chatglm")](https://t.me/+RjliQ3jnJ1YyN2E9)

	### WeChat Group

	<img src="img/qr_code_87.jpg" alt="二维码" width="300" height="300" />

	### WeChat Official Account

	<img src="img/official_wechat_mp_account.png" alt="图片" width="900" height="300" />