9 Hippo with LangChain
Last updated: 12/24/2024, 11:10:43 AM

Background

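LangChain is a framework for building applications on top of language models. It makes it easy to manage interactions with language models, chain multiple components together, and integrate additional resources such as APIs and databases. Its components include models (various LLMs), prompt templates (Prompts), indexes (Index), agents (Agent), memory (Memory), and more. To make Hippo more broadly applicable and easier to develop with, Hippo has been integrated with LangChain in Python.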

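Demo

Notes:

  • Calling OpenAI models requires a proxy (a US node).

  • Obtain an API key from the OpenAI website before use.

  • The OpenAI embedding model used by default here (text-embedding-ada-002) produces 1536-dimensional vectors; keep this in mind when specifying the vector column's dimension at table creation.

  • A vector column must be indexed before it can be queried.

  • Python 3.8 or later is required.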

Install dependencies

Before getting started, install the packages for OpenAI, LangChain, and the Hippo API. Choose the package versions that fit your own environment.

!pip install langchain==0.1.13
!pip install tiktoken openai
!pip install hippo-api
Requirement already satisfied: hippo-api==1.1.0.rc3 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (1.1.0rc3)
Requirement already satisfied: pyyaml>=6.0 in /Users/daochengzhang/miniforge3/envs/py310/lib/python3.10/site-packages (from hippo-api==1.1.0.rc3) (6.0.1)
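After installing, it can help to confirm which versions actually ended up in the environment, since the commands above pin or assume specific releases. The sketch below uses only the standard library's importlib.metadata (available from Python 3.8); the function name installed_versions is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names installed by the commands above.
PACKAGES = ["langchain", "tiktoken", "openai", "hippo-api"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A missing package shows up as NOT INSTALLED instead of raising, so the check can run before any of the LangChain imports.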

Import dependencies

import os

from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.hippo import Hippo

Load an article / text into Hippo

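# Use your own OpenAI API key
OPENAI_API_KEY = "your_key"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Use your own proxy; the OpenAI API is served over HTTPS, so set https_proxy as well
os.environ['http_proxy'] = 'your_proxy'
os.environ['https_proxy'] = 'your_proxy'

# Read a local file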
loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()
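Hard-coding the key as above makes it easy to leak into version control. A minimal alternative, assuming you export OPENAI_API_KEY in your shell beforehand, is to read it from the environment and fail early with a clear message when it is missing. require_env is a hypothetical helper, not part of LangChain or the Hippo API.

```python
import os


def require_env(name):
    """Return the value of an environment variable, raising if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value
```

Usage: OPENAI_API_KEY = require_env("OPENAI_API_KEY"). LangChain's OpenAI classes also read OPENAI_API_KEY from the environment when no key is passed explicitly, so the explicit assignment is then optional.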

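Split the text

You can use LangChain's built-in CharacterTextSplitter to split the documents; its default separator is \n\n. In the example below, each segment (chunk) is limited to 500 characters, with no overlap between consecutive chunks (chunk_overlap=0). The splitter only breaks text at the separator, so a single paragraph longer than 500 characters is kept as one oversized chunk.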

text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
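To make the split-then-merge behaviour concrete, here is a simplified sketch of the idea: split on the separator, then greedily pack consecutive pieces into chunks of at most chunk_size characters. This is not LangChain's actual implementation; it assumes chunk_overlap=0 and ignores details such as whitespace stripping and length functions.

```python
def split_text(text, separator="\n\n", chunk_size=500):
    """Greedy split-and-merge, roughly like CharacterTextSplitter with chunk_overlap=0."""
    # Break on the separator and drop empty pieces.
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        # Characters this piece would add, including the separator that joins it.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # The current chunk is full: emit it and start a new one with this piece.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_text("aaa\n\nbbb\n\nccc", chunk_size=8))
```

A piece longer than chunk_size still becomes its own chunk, which matches the oversized-paragraph caveat above.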

Declare the embedding model

You can use LangChain's OpenAIEmbeddings class to create either an OpenAI or an Azure OpenAI embedding model.

# OpenAI
embeddings = OpenAIEmbeddings()

# Azure
# embeddings = OpenAIEmbeddings(
#     openai_api_type="azure",
#     openai_api_base="x x x",
#     openai_api_version="x x x",
#     model="x x x",
#     deployment="x x x",
#     openai_api_key="x x x"
# )
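For exercising the rest of the pipeline offline (no proxy, no API cost), a deterministic stand-in with the same two method names as LangChain's embeddings interface, embed_documents and embed_query, can be useful. HashEmbeddings is a hypothetical test helper: its vectors have the right shape (1536 by default) but no semantic meaning, so retrieval quality with it is meaningless.

```python
import hashlib
import math
import random
from typing import List


class HashEmbeddings:
    """Deterministic stand-in for OpenAIEmbeddings (hypothetical, for offline tests only)."""

    def __init__(self, dim: int = 1536):
        self.dim = dim

    def _embed(self, text: str) -> List[float]:
        # Seed a private RNG from a stable hash so the same text always gets the same vector.
        seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")
        rng = random.Random(seed)
        vec = [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        # Normalize to unit length.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)
```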

Declare the Hippo client

HIPPO_CONNECTION = {"host": "IP", "port": "PORT"}
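Rather than editing the IP and port into the script, the same dictionary can be built from environment variables. The variable names HIPPO_HOST and HIPPO_PORT and the helper hippo_connection_from_env are assumptions for illustration; the output keeps the exact shape of HIPPO_CONNECTION above, including the port as a string.

```python
import os


def hippo_connection_from_env(env=os.environ):
    """Build connection_args from the (hypothetical) HIPPO_HOST / HIPPO_PORT variables."""
    missing = [k for k in ("HIPPO_HOST", "HIPPO_PORT") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    port = env["HIPPO_PORT"]
    if not port.isdigit():
        raise ValueError(f"HIPPO_PORT must be numeric, got {port!r}")
    # Same shape as HIPPO_CONNECTION; the port stays a string, as in the original.
    return {"host": env["HIPPO_HOST"], "port": port}
```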

Store the text

print("input...")
# Insert Docs
vector_store = Hippo.from_documents(
    docs,
    embedding=embeddings,
    table_name="langchain_test",
    connection_args=HIPPO_CONNECTION,
)
print("success")

input...
success
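Conceptually, storing documents means embedding each chunk and writing (text, vector) rows into a table whose vector column has a fixed dimension. The toy in-memory version below is only a mental model, not how Hippo.from_documents is implemented; InMemoryTable and from_texts are hypothetical names. It shows why a dimension mismatch (for example, a 1536-dim embedding against a column declared with a different size) fails at insert time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InMemoryTable:
    """Toy stand-in for a vector table: a text column plus a fixed-dimension vector column."""
    dim: int
    texts: List[str] = field(default_factory=list)
    vectors: List[List[float]] = field(default_factory=list)

    def insert(self, text: str, vector: List[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim}-dim vector, got {len(vector)}")
        self.texts.append(text)
        self.vectors.append(vector)


def from_texts(texts, embed_documents, dim=1536):
    """Embed all chunks in one batch call, then insert one row per chunk."""
    table = InMemoryTable(dim=dim)
    for text, vec in zip(texts, embed_documents(texts)):
        table.insert(text, vec)
    return table
```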

Knowledge Q&A

Create the Q&A LLM

You can use LangChain's AzureChatOpenAI or ChatOpenAI class to build an Azure OpenAI or OpenAI chat model for question answering.

# Azure: also requires `from langchain.chat_models import AzureChatOpenAI`
# llm = AzureChatOpenAI(
#     openai_api_base="x x x",
#     openai_api_version="xxx",
#     deployment_name="xxx",
#     openai_api_key="xxx",
#     openai_api_type="azure"
# )

llm = ChatOpenAI(openai_api_key="YOUR OPENAI KEY", model_name="gpt-3.5-turbo-16k")
Get the Q&A results
query = "Please introduce COVID-19"
# query = "Please introduce Hippo Core Architecture"
# query = "What operations does the Hippo Vector Database support for vector data?"
# query = "Does Hippo use hardware acceleration technology? Briefly introduce hardware acceleration technology."

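# Retrieve the 2 most similar chunks from the knowledge base
res = vector_store.similarity_search(query, k=2)
content_list = [item.page_content for item in res]
# Keep a blank line between chunks so their boundaries stay visible in the prompt
text = "\n\n".join(content_list)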
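To make "the 2 most similar chunks" concrete, here is a brute-force top-k search using cosine similarity. This is a sketch of the concept only: Hippo answers the query through the index on the vector column (hence the indexing requirement in the notes) and with whatever metric that index is configured for, which may not be cosine. cosine and top_k are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec, texts, vectors, k=2) -> List[Tuple[str, float]]:
    """Score every stored row against the query and keep the k highest-scoring texts."""
    scored = [(t, cosine(query_vec, v)) for t, v in zip(texts, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


texts = ["x", "y", "z"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], texts, vectors, k=2))
```

Scanning every row is O(n) per query, which is exactly what a vector index exists to avoid at scale.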
Build the prompt template
prompt = f"""
Please use the content of the following [Article] to answer my question. If you don't know, please say you don't know, and the answer should be concise.
[Article]:{text}
Please answer this question in conjunction with the above article:{query}
"""
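If the same prompt is reused for several questions, the f-string can be turned into a template filled by a small function. This is one possible refactoring, not something the integration requires; PROMPT_TEMPLATE and build_prompt are hypothetical names. str.format only interprets braces in the template itself, so braces inside retrieved text are passed through unchanged.

```python
from typing import List

PROMPT_TEMPLATE = """
Please use the content of the following [Article] to answer my question. If you don't know, please say you don't know, and the answer should be concise.
[Article]:{text}
Please answer this question in conjunction with the above article:{query}
"""


def build_prompt(chunks: List[str], query: str, sep: str = "\n\n") -> str:
    """Fill the template, joining retrieved chunks with a blank line between them."""
    return PROMPT_TEMPLATE.format(text=sep.join(chunks), query=query)
```

Usage: prompt = build_prompt(content_list, query).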
Generate the answers with the LLM (with and without Hippo retrieval)
response_with_hippo = llm.predict(prompt)
print(f"response_with_hippo:{response_with_hippo}")

response = llm.predict(query)
print("==========================================")
print(f"response_without_hippo:{response}")
response_with_hippo:COVID-19 is a virus that has impacted every aspect of our lives for over two years. It is a highly contagious and mutates easily, requiring us to remain vigilant in combating its spread. However, due to progress made and the resilience of individuals, we are now able to move forward safely and return to more normal routines.
==========================================
response_without_hippo:COVID-19 is a contagious respiratory illness caused by the novel coronavirus SARS-CoV-2. It was first identified in December 2019 in Wuhan, China and has since spread globally, leading to a pandemic. The virus primarily spreads through respiratory droplets when an infected person coughs, sneezes, talks, or breathes, and can also spread by touching contaminated surfaces and then touching the face. COVID-19 symptoms include fever, cough, shortness of breath, fatigue, muscle or body aches, sore throat, loss of taste or smell, headache, and in severe cases, pneumonia and organ failure. While most people experience mild to moderate symptoms, it can lead to severe illness and even death, particularly among older adults and those with underlying health conditions. To combat the spread of the virus, various preventive measures have been implemented globally, including social distancing, wearing face masks, practicing good hand hygiene, and vaccination efforts.