PGVector

An implementation of the LangChain vectorstore abstraction that uses postgres as the backend and relies on the pgvector extension.

The code lives in an integration package called langchain-postgres.

Status

This code has been moved from langchain-community into a dedicated package called langchain-postgres. The following changes have been made:

langchain-postgres works only with psycopg3. Update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@... (yes, the driver name is psycopg, not psycopg3, but it uses psycopg3 under the hood).
The schema of the embedding store and collection has changed so that add_documents works correctly with user-specified IDs.
You must now pass an explicit connection object.
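
The driver rename described above can be applied mechanically to existing connection strings. The following is a minimal sketch, not part of langchain-postgres: the helper name to_psycopg3_url is hypothetical, and it assumes a bare postgresql:// URL should also be rewritten because SQLAlchemy would otherwise fall back to psycopg2.

```python
def to_psycopg3_url(url: str) -> str:
    """Rewrite a SQLAlchemy-style URL so that it names the psycopg (v3) driver.

    Hypothetical helper for illustration; langchain-postgres does not ship it.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a connection URL: {url!r}")
    # Assumption: "postgresql" without a driver suffix should be migrated too.
    if scheme in ("postgresql+psycopg2", "postgresql"):
        scheme = "postgresql+psycopg"
    return f"{scheme}://{rest}"


old = "postgresql+psycopg2://langchain:langchain@localhost:6024/langchain"
new = to_psycopg3_url(old)
```

URLs that already use postgresql+psycopg pass through unchanged, so the helper is safe to apply more than once.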

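Currently, there is no mechanism that supports easy data migration on schema changes. Any schema change in the vectorstore therefore requires you to recreate the tables and re-add the documents. If this is a concern, consider using a different vectorstore. If not, this implementation should be fine for your use case.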
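
The recreate-and-re-add procedure can be sketched as a small helper. This is a sketch under stated assumptions: rebuild_collection is a hypothetical name, and the methods it calls (drop_tables, create_tables_if_not_exists, create_collection, add_documents) are assumed to exist on the PGVector instance with these signatures, so check them against your installed version before relying on it. Dropping the tables is also assumed to affect every collection stored in the same database.

```python
from typing import Any, List, Sequence


def rebuild_collection(store: Any, docs: Sequence[Any], ids: Sequence[Any]) -> List[Any]:
    """Drop the vector store tables, recreate them, and re-add every document.

    `store` is expected to behave like a PGVector instance (assumed method names).
    """
    store.drop_tables()                  # assumed: removes the shared tables
    store.create_tables_if_not_exists()  # assumed: recreates them with the current schema
    store.create_collection()            # assumed: re-registers this store's collection
    return store.add_documents(list(docs), ids=list(ids))
```

Because the helper re-adds documents from memory, keep the original documents and their IDs somewhere outside the database before running it.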

Setup

First, install the partner package:

pip install -qU langchain-postgres

You can run the following command to start a postgres container with the pgvector extension:

docker run --name pgvector-container -e POSTGRES_USER=langchain -e POSTGRES_PASSWORD=langchain -e POSTGRES_DB=langchain -p 6024:5432 -d pgvector/pgvector:pg16
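
Before moving on, it can help to confirm that the container is accepting connections on the mapped port. The following is a minimal sketch using only the standard library; the function name postgres_port_open is hypothetical, and the defaults assume the 6024 port mapping from the command above. It only checks that the TCP port is reachable, not that postgres or pgvector is fully initialized.

```python
import socket


def postgres_port_open(host: str = "localhost", port: int = 6024, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling postgres_port_open() with no arguments checks localhost:6024, matching the container started above.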

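Credentials

No credentials are needed to run this notebook. Just make sure you have installed the langchain-postgres package and started the postgres container correctly. If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below: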

import getpass
import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_postgres import PGVector

# See docker command above to launch a postgres instance with pgvector enabled.
connection = "postgresql+psycopg://langchain:langchain@localhost:6024/langchain"  # Uses psycopg3!
collection_name = "my_docs"

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)

Manage vector store

Add items to vector store

Note that adding documents by ID overwrites any existing documents that match that ID.

from langchain_core.documents import Document

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": 4, "location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": 5, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": 6, "location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": 7, "location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": 8, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": 9, "location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": 10, "location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
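
The overwrite-by-ID behavior noted above can be pictured with a plain dictionary. This is a toy model only: the upsert function and toy_store are hypothetical and say nothing about how PGVector implements the write in SQL.

```python
from typing import Dict, List


def upsert(store: Dict[str, str], texts: List[str], ids: List[str]) -> List[str]:
    """Insert each text under its ID, replacing any text already stored there."""
    if len(texts) != len(ids):
        raise ValueError("texts and ids must have the same length")
    for doc_id, text in zip(ids, texts):
        store[doc_id] = text  # an existing entry with this ID is replaced
    return ids


toy_store: Dict[str, str] = {}
upsert(toy_store, ["there are cats in the pond"], ["1"])
upsert(toy_store, ["there are dogs in the pond"], ["1"])  # same ID: overwrites
```

After the second call the toy store still holds a single entry for ID "1", now with the newer text.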

Delete items from vector store

vector_store.delete(ids=["3"])

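Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Filtering Support

The vectorstore supports a set of filters that can be applied to the metadata fields of the documents.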

Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special cased (in)
$nin	Special cased (not in)
$between	Special cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)
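
To make the operator table concrete, here is a small in-memory evaluator that applies a filter dict to a single metadata dict. It is a sketch of the intended semantics only, not how PGVector compiles filters to SQL. The names matches, COMPARATORS, and _like are hypothetical, and three behaviors are assumptions: $between takes an inclusive [low, high] pair, a bare value is shorthand for $eq, and a missing metadata field never matches.

```python
import re
from typing import Any, Dict


def _like(value: Any, pattern: str, ignore_case: bool) -> bool:
    """SQL LIKE semantics: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return isinstance(value, str) and re.fullmatch(regex, value, flags) is not None


COMPARATORS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$lt": lambda v, x: v < x,
    "$lte": lambda v, x: v <= x,
    "$gt": lambda v, x: v > x,
    "$gte": lambda v, x: v >= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
    "$between": lambda v, x: x[0] <= v <= x[1],  # assumed inclusive on both ends
    "$like": lambda v, x: _like(v, x, ignore_case=False),
    "$ilike": lambda v, x: _like(v, x, ignore_case=True),
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every top-level entry of the filter."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:  # assumption: missing fields never match
                return False
            if not isinstance(cond, dict):  # assumption: bare value means $eq
                cond = {"$eq": cond}
            for op, operand in cond.items():
                if not COMPARATORS[op](metadata[key], operand):
                    return False
    return True
```

For example, matches({"id": 1, "location": "pond"}, {"id": {"$in": [1, 5]}}) evaluates the $in row of the table against the first sample document's metadata.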

Query directly

A simple similarity search can be performed as follows:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

* there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]
* the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', 'location': 'library'}]
* ducks are also found in the pond [{'id': 2, 'topic': 'animals', 'location': 'pond'}]
* the new art exhibit is fascinating [{'id': 5, 'topic': 'art', 'location': 'museum'}]

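If you provide a dict with multiple fields but no operators, the top level is interpreted as a logical AND filter. The two queries below are therefore equivalent.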

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"location": {"$in": ["pond", "market"]}},
        ]
    },
)

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond'),
 Document(metadata={'id': 2, 'topic': 'animals', 'location': 'pond'}, page_content='ducks are also found in the pond')]
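
The equivalence between the two filter shapes above can be expressed as a small normalization step. This is a minimal sketch: to_explicit_and is a hypothetical helper, not a langchain-postgres function, and it leaves untouched any filter that has a single field or already uses a top-level operator.

```python
from typing import Any, Dict


def to_explicit_and(flt: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite an implicit multi-field filter as an explicit $and filter."""
    if len(flt) <= 1 or any(key.startswith("$") for key in flt):
        return flt
    # One single-field clause per original field, in the original order.
    return {"$and": [{field: cond} for field, cond in flt.items()]}


implicit = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
explicit = to_explicit_and(implicit)
```

Here explicit has the same shape as the filter passed in the second query above.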

To run a similarity search and also receive the corresponding scores, run:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.763449] there are cats in the pond [{'id': 1, 'topic': 'animals', 'location': 'pond'}]

For a full list of the searches you can run on a PGVector vector store, see the API reference.

Query by turning into retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(metadata={'id': 1, 'topic': 'animals', 'location': 'pond'}, page_content='there are cats in the pond')]
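
The retriever above uses search_type="mmr" (maximal marginal relevance), which trades relevance to the query against redundancy among the results already chosen. The following pure-Python sketch illustrates the general technique on toy vectors. The functions cosine and mmr_select are hypothetical, the parameter name lambda_mult is borrowed loosely from LangChain's search options, and this is not the library's implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int = 1,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices, balancing relevance and diversity.

    lambda_mult=1.0 ranks purely by relevance; lower values penalize
    candidates that resemble ones already selected.
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = [1.0, 0.0]
candidate_vecs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
diverse = mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.3)
relevant = mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=1.0)
```

With a low lambda_mult the second pick skips the near-duplicate of the first result in favor of a more distinct vector, whereas lambda_mult=1.0 simply returns the two most similar candidates.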

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG tutorials and how-to guides in the LangChain documentation.

API reference

For detailed documentation of all PGVector features and configurations, head to the API reference: python.langchain.com/api_reference/postgres/vectorstores/langchain_postgres.vectorstores.PGVector.html
