BM25

BM25 (Wikipedia), also known as Okapi BM25, is a ranking function used by information retrieval systems to estimate how relevant each document is to a given search query. BM25Retriever uses the rank_bm25 package.
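For reference, the textbook Okapi BM25 score of a document D for a query Q with terms q_1, ..., q_n is shown below. Here f(q_i, D) is the frequency of q_i in D, |D| is the length of D in tokens, avgdl is the average document length in the corpus, N is the number of documents, and n(q_i) is the number of documents containing q_i. k_1 and b are free parameters. rank_bm25's BM25Okapi class defaults to k_1 = 1.5 and b = 0.75 and, as far as we know, replaces negative IDF values with a small positive floor, so read this as the standard form rather than an exact transcription of that package.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```

A document that contains none of the query terms scores exactly zero, and longer-than-average documents are penalized through the b term.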

pip install -qU rank_bm25 langchain-community

from langchain_community.retrievers import BM25Retriever

Create a New Retriever with Texts

retriever = BM25Retriever.from_texts(["foo", "bar", "world", "hello", "foo bar"])
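from_texts runs each string through the preprocessing function and builds a rank_bm25 BM25Okapi index over the resulting token lists. The sketch below approximates that first step for the corpus above, assuming the default preprocessing (splitting on whitespace), and counts document frequencies, the n(q_i) values that drive IDF. It is plain Python for illustration, not LangChain's code.

```python
from collections import Counter

# Same corpus as the from_texts example.
texts = ["foo", "bar", "world", "hello", "foo bar"]

# Assumption: mirrors BM25Retriever's default preprocessing, which splits on whitespace.
tokenized_corpus = [text.split() for text in texts]

# Document frequency: the number of documents in which each term appears at least once.
doc_freq = Counter(term for tokens in tokenized_corpus for term in set(tokens))

print(tokenized_corpus)
print(doc_freq["foo"], doc_freq["bar"], doc_freq["hello"])
```

Because "foo bar" contributes to both counts, "foo" and "bar" each appear in two documents and get a lower IDF than "hello" or "world".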

Create a New Retriever with Documents

You can also create a retriever from a list of Document objects.

from langchain_core.documents import Document

retriever = BM25Retriever.from_documents(
    [
        Document(page_content="foo"),
        Document(page_content="bar"),
        Document(page_content="world"),
        Document(page_content="hello"),
        Document(page_content="foo bar"),
    ]
)

Use the Retriever

We can now use the retriever!

result = retriever.invoke("foo")

result

[Document(metadata={}, page_content='foo'),
 Document(metadata={}, page_content='foo bar'),
 Document(metadata={}, page_content='hello'),
 Document(metadata={}, page_content='world')]
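The result contains four documents even though only two of them contain "foo": BM25Retriever returns the top k documents (k defaults to 4) without a score cutoff, so documents with a score of zero fill the remaining slots. The minimal sketch below reproduces that behaviour with a hand-written Okapi BM25 scorer. It assumes whitespace tokenization and k_1 = 1.5, b = 0.75, and it uses a Lucene-style IDF that stays positive instead of rank_bm25's exact variant, so individual scores will differ from the library's.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (illustrative sketch)."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    doc_freq = Counter(t for doc in corpus_tokens for t in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            n_t = doc_freq[term]
            # Assumption: Lucene-style IDF (always positive), not rank_bm25's exact formula.
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def top_k(query, texts, k=4):
    """Return the k highest-scoring texts; zero-score texts fill any remaining slots."""
    corpus_tokens = [t.split() for t in texts]
    scores = bm25_scores(query.split(), corpus_tokens)
    # sorted() is stable, so ties keep corpus order.
    ranked = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [(texts[i], round(scores[i], 3)) for i in ranked[:k]]


texts = ["foo", "bar", "world", "hello", "foo bar"]
for text, score in top_k("foo", texts):
    print(text, score)
```

"foo" outranks "foo bar" because both match once but "foo bar" is longer than average. The order among zero-score documents is an implementation detail (this sketch keeps corpus order), so it need not match the output shown above.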

Preprocessing Function

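By default, BM25Retriever tokenizes text by splitting on whitespace. To improve search results, pass a custom preprocessing function to the retriever through the preprocess_func parameter; the retriever applies it to both the documents and the query. Tokenizing text at the word level can improve retrieval performance, especially for chunked documents when working with vector stores such as Chroma, Pinecone, or Faiss.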
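Whitespace splitting leaves punctuation attached to words, so "bar!" and "bar" become different terms and a query for "bar" cannot match. The sketch below compares the default-style split with a hypothetical simple_word_tokenize function, a regex stand-in for a word-level tokenizer. Unlike nltk's word_tokenize, which keeps punctuation as separate tokens and preserves case, this stand-in drops punctuation and lowercases.

```python
import re


def whitespace_tokenize(text):
    # Assumption: mirrors BM25Retriever's default preprocessing (split on whitespace).
    return text.split()


def simple_word_tokenize(text):
    # Hypothetical stand-in for a word-level tokenizer:
    # lowercases, then keeps runs of word characters and drops punctuation.
    return re.findall(r"\w+", text.lower())


doc = "Foo, bar!"
query = "bar"

print(whitespace_tokenize(doc))   # ['Foo,', 'bar!']
print(simple_word_tokenize(doc))  # ['foo', 'bar']

# BM25 only credits query terms that appear verbatim among a document's tokens.
print(query in whitespace_tokenize(doc))   # False
print(query in simple_word_tokenize(doc))  # True
```

Whatever function you pass as preprocess_func should be deterministic, since the same tokenization must be reproduced for every query.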

import nltk

# word_tokenize needs the Punkt tokenizer data ("punkt_tab" in recent NLTK releases)
nltk.download("punkt_tab")

from nltk.tokenize import word_tokenize

retriever = BM25Retriever.from_documents(
    [
        Document(page_content="foo"),
        Document(page_content="bar"),
        Document(page_content="world"),
        Document(page_content="hello"),
        Document(page_content="foo bar"),
    ],
    k=2,
    preprocess_func=word_tokenize,
)

result = retriever.invoke("bar")
result

[Document(metadata={}, page_content='bar'),
 Document(metadata={}, page_content='foo bar')]
