ClickHouse

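ClickHouse is the fastest and most resource-efficient open-source database for real-time apps and analytics, with full SQL support and a wide range of functions that help users write analytical queries. Recently added data structures, distance search functions (such as L2Distance), and approximate nearest neighbor search indexes make it possible to use ClickHouse as a high-performance, scalable vector database that stores and searches vectors with SQL.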
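To make the distance-based search concrete, here is a minimal pure-Python sketch of what an exact nearest-neighbor lookup by Euclidean (L2) distance computes. The helper names `l2_distance` and `nearest` and the toy vectors are illustrative assumptions, not part of ClickHouse or LangChain. ClickHouse evaluates the equivalent logic in SQL, and an approximate index trades some exactness for speed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    query: Sequence[float], rows: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k (row_id, distance) pairs closest to query, nearest first."""
    scored = [(row_id, l2_distance(query, vec)) for row_id, vec in rows.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy two-dimensional "embeddings", made up for illustration only.
rows = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 1.0]}
print(nearest([0.0, 0.0], rows, k=2))
```

Real embeddings have hundreds or thousands of dimensions, but the ranking principle is the same: a smaller distance means a closer match.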

This notebook shows how to use functionality related to the ClickHouse vector store.

Setup

First, set up a local ClickHouse server with Docker:

! docker run -d -p 8123:8123 -p 9000:9000 --name langchain-clickhouse-server --ulimit nofile=262144:262144 -e CLICKHOUSE_SKIP_USER_SETUP=1 clickhouse/clickhouse-server:25.7
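The container may need a few seconds before it accepts connections. As a rough sketch, assuming the server is reachable on the HTTP port 8123 published by the command above, a small standard-library helper can poll until the port opens. `wait_for_port` is a hypothetical helper, not part of clickhouse-connect or LangChain, and an open port only indicates that something is listening, not that ClickHouse is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)
```

Calling `wait_for_port("localhost", 8123)` before instantiating the vector store returns `True` once the port is open and `False` if the timeout expires first.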

To use this integration, install the langchain-community and clickhouse-connect packages:

pip install -qU langchain-community clickhouse-connect

Credentials

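This notebook needs no credentials for ClickHouse itself; just make sure you have installed the packages shown above. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below: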

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

# Requires the langchain-openai package and an OpenAI API key (OPENAI_API_KEY).
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_community.vectorstores import Clickhouse, ClickhouseSettings

settings = ClickhouseSettings(table="clickhouse_example")
vector_store = Clickhouse(embeddings, config=settings)

Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting items.

Add items to vector store

You can add items to the vector store with the add_documents function.

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
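The example above generates random IDs with uuid4, so re-running it produces new IDs for the same documents. Where repeatable IDs are preferable, one option (our assumption, not something the integration requires) is to derive each ID from the document's content with uuid5. The helper name `stable_id` and the choice of `NAMESPACE_URL` as the namespace are illustrative.

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(page_content: str, source: str = "") -> str:
    """Derive a repeatable UUID string from a document's text and source."""
    return str(uuid5(NAMESPACE_URL, f"{source}\n{page_content}"))


# (source, text) pairs taken from the documents defined above.
texts = [
    ("tweet", "Building an exciting new project with LangChain - come check it out!"),
    ("news", "Robbers broke into the city bank and stole $1 million in cash."),
]
ids = [stable_id(text, source) for source, text in texts]
print(ids)
```

A list built this way can be passed as `ids=` to `add_documents` in place of the random `uuids`.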

Delete items from vector store

You can delete items from the vector store by ID with the delete function.

# ids expects a list of IDs, so wrap the single ID in a list.
vector_store.delete(ids=[uuids[-1]])

Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

Similarity search

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k=2
)
for res in results:
    # Each result is a Document; read its fields by attribute.
    print(f"* {res.page_content} [{res.metadata}]")

Similarity search with score

You can also search with a score:

results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=1)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")

Filtering

You have direct access to the ClickHouse SQL WHERE statement, so you can write a WHERE clause in standard SQL. Note: beware of SQL injection; this interface must not be called directly by end users. If you customized column_map in your settings, you can search with a filter like this:

meta = vector_store.metadata_column
results = vector_store.similarity_search_with_relevance_scores(
    "What did I eat for breakfast?",
    k=4,
    where_str=f"{meta}.source = 'tweet'",
)
# Each result is a (Document, score) tuple.
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")
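Because `where_str` is interpolated into the SQL query as-is, any value that originates from a user should be validated before it reaches the query. A minimal sketch, assuming an allowlist of known `source` values suits your application; `ALLOWED_SOURCES` and `build_source_filter` are hypothetical names, not part of LangChain:

```python
import re

# Hypothetical allowlist: the only metadata sources this application accepts.
ALLOWED_SOURCES = {"tweet", "news", "website"}
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_source_filter(metadata_column: str, source: str) -> str:
    """Build a WHERE fragment for one metadata source, rejecting unknown input."""
    if not _IDENTIFIER.fullmatch(metadata_column):
        raise ValueError(f"unexpected column name: {metadata_column!r}")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    # Both parts have been validated, so plain interpolation is safe here.
    return f"{metadata_column}.source = '{source}'"


print(build_source_filter("metadata", "tweet"))
```

The returned string can be passed as `where_str`. An allowlist is deliberately strict: it refuses anything unexpected instead of trying to escape it.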

Other search methods

There are a variety of other search methods not covered in this notebook, such as MMR search or searching by vector. For the full list of search capabilities available for the Clickhouse vector store, check the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier use in your chains. The following shows how to transform the vector store into a retriever and then invoke it with a simple query and a filter.

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 1, "score_threshold": 0.5},
)
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
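To illustrate what the `k` and `score_threshold` arguments express, the following pure-Python sketch filters scored results the way a threshold retriever conceptually does: discard anything below the threshold, then keep the top `k`. It assumes that a higher score means a more relevant result. It is not the LangChain implementation; `apply_threshold` is a hypothetical helper and the scores are made up for illustration.

```python
from typing import List, Tuple


def apply_threshold(
    scored: List[Tuple[str, float]], k: int, score_threshold: float
) -> List[Tuple[str, float]]:
    """Keep at most k results whose score is at least score_threshold."""
    kept = [(doc, score) for doc, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # most relevant first
    return kept[:k]


# Illustrative (text, score) pairs; these are not real retrieval output.
scored = [("bank robbery", 0.82), ("stock market", 0.55), ("pancakes", 0.12)]
print(apply_threshold(scored, k=1, score_threshold=0.5))
```

With a high threshold the result can be empty, so calling code should handle a retriever that returns no documents.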

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

For more, check out a complete RAG template using Astra DB here.

API reference

For detailed documentation of all Clickhouse features and configurations, see the API reference.
