Faiss

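Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search sets of vectors of any size, including sets that may not fit in RAM. It also includes supporting code for evaluation and parameter tuning. See The FAISS Library paper.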

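The FAISS documentation can be found on this page. This notebook shows how to use functionality related to the FAISS vector database, focusing on features specific to this integration. After working through it, you may want to explore the relevant use-case pages to learn how to use this vectorstore as part of a larger chain.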

Setup

This integration lives in the langchain-community package. You also need to install the faiss package itself. You can install both as follows (install faiss-gpu instead of faiss-cpu if you want the GPU-enabled version):

pip install -qU langchain-community faiss-cpu

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

import getpass
import os

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Initialization

# Requires the langchain-openai package and an OPENAI_API_KEY environment variable.
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

# Probe the embedding size once: the index dimension must match every vector added later.
index = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
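IndexFlatL2 is FAISS's exact, brute-force index: each query is compared against every stored vector by Euclidean distance, which is also why its dimension is taken from a probe embedding above. The pure-Python sketch below (not FAISS code; the function names are hypothetical) shows what that exhaustive search computes. Like FAISS's L2 indexes, it ranks by squared distance, which gives the same ordering as plain distance while skipping the square root.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(
    vectors: List[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Exhaustive search: score every stored vector and keep the k closest.

    Returns (position, squared distance) pairs, closest first.
    """
    scored = ((squared_l2(v, query), pos) for pos, v in enumerate(vectors))
    return [(pos, dist) for dist, pos in heapq.nsmallest(k, scored)]


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
results = flat_l2_search(stored, [0.9, 0.1], k=2)
print([pos for pos, _ in results])  # [1, 0]
```

Because every vector is visited, query cost grows linearly with the collection; FAISS's approximate index types exist to avoid that, at some cost in recall.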

Manage vector store

Add items to vector store

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

['22f5ce99-cd6f-4e0c-8dab-664128307c72',
 'dc3f061b-5f88-4fa1-a966-413550c51891',
 'd33d890b-baad-47f7-b7c1-175f5f7b4e59',
 '6e6c01d2-6020-4a7b-95da-ef43d43f01b5',
 'e677223d-ad75-4c1a-bef6-b5912bd1de03',
 '47e2a168-6462-4ed2-b1d9-d9edfd7391d6',
 '1e4d66d6-e155-4891-9212-f7be97f36c6a',
 'c0663096-e1a5-4665-b245-1c2e6c4fb653',
 '8297474a-7f7c-4006-9865-398c1781b1bc',
 '44e4be03-0a8d-4316-b3c4-f35f4bb2b532']
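The FAISS constructor above takes three separate pieces: the FAISS index (vectors addressed by integer position), a docstore (documents keyed by ID), and index_to_docstore_id (position to ID). The hypothetical ToyVectorStore below is a simplified sketch of the bookkeeping that adding documents implies, with pre-computed vectors standing in for the embedding step; it is not LangChain's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class ToyVectorStore:
    """Hypothetical stand-in for index + docstore + index_to_docstore_id."""

    vectors: List[Sequence[float]] = field(default_factory=list)
    docstore: Dict[str, str] = field(default_factory=dict)
    index_to_docstore_id: Dict[int, str] = field(default_factory=dict)

    def add(
        self, ids: List[str], texts: List[str], embeddings: List[Sequence[float]]
    ) -> List[str]:
        if not (len(ids) == len(texts) == len(embeddings)):
            raise ValueError("ids, texts and embeddings must have the same length")
        # Validate before mutating so a bad call leaves the store unchanged.
        if set(ids).intersection(self.docstore) or len(set(ids)) != len(ids):
            raise ValueError("ids must be unique")
        start = len(self.vectors)  # new vectors are appended after existing ones
        for offset, (doc_id, text, vec) in enumerate(zip(ids, texts, embeddings)):
            self.vectors.append(vec)                            # the "index"
            self.docstore[doc_id] = text                        # ID -> document
            self.index_to_docstore_id[start + offset] = doc_id  # position -> ID
        return ids


store = ToyVectorStore()
store.add(["a", "b"], ["foo", "bar"], [[0.0, 1.0], [1.0, 0.0]])
store.add(["c"], ["baz"], [[1.0, 1.0]])
print(store.index_to_docstore_id)  # {0: 'a', 1: 'b', 2: 'c'}
```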

Delete items from vector store

vector_store.delete(ids=[uuids[-1]])

True
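Deleting by ID has to keep the index, the docstore, and the position-to-ID map consistent. The sketch below assumes, as with a flat index, that removing a vector shifts every later vector down, so the position-to-ID map must be rebuilt afterwards; delete_ids is a hypothetical helper, not the library's code.

```python
from typing import Dict, Iterable, List, Sequence


def delete_ids(
    vectors: List[Sequence[float]],
    docstore: Dict[str, str],
    index_to_docstore_id: Dict[int, str],
    ids: Iterable[str],
) -> bool:
    """Hypothetical helper: remove documents by ID, keeping positions contiguous."""
    to_delete = set(ids)
    missing = to_delete.difference(index_to_docstore_id.values())
    if missing:
        raise ValueError(f"Some specified ids do not exist: {sorted(missing)}")
    positions = {pos for pos, doc_id in index_to_docstore_id.items() if doc_id in to_delete}
    # Drop the vectors; every later vector moves down to fill the gaps.
    vectors[:] = [v for pos, v in enumerate(vectors) if pos not in positions]
    for doc_id in to_delete:
        del docstore[doc_id]
    # Rebuild the position -> ID map so it matches the compacted vector list.
    remaining = [
        doc_id for pos, doc_id in sorted(index_to_docstore_id.items()) if pos not in positions
    ]
    index_to_docstore_id.clear()
    index_to_docstore_id.update(enumerate(remaining))
    return True


vectors = [[0.0], [1.0], [2.0]]
docstore = {"a": "foo", "b": "bar", "c": "baz"}
mapping = {0: "a", 1: "b", 2: "c"}
delete_ids(vectors, docstore, mapping, ["b"])
print(mapping)  # {0: 'a', 1: 'c'}
print(vectors)  # [[0.0], [2.0]]
```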

Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

Similarity search

Here is how to run a simple similarity search with filtering on metadata:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
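As we read the wrapper's API reference, the metadata filter is applied after the nearest-neighbour search: similarity_search first fetches fetch_k candidates (20 by default) and only then filters them down to k. A consequence is that a selective filter can return fewer than k results. The hypothetical filtered_search below sketches that post-filtering behaviour in pure Python; it is an illustration, not the wrapper's code.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def filtered_search(
    vectors: List[Sequence[float]],
    metadatas: List[Dict[str, str]],
    query: Sequence[float],
    k: int,
    predicate: Callable[[Dict[str, str]], bool],
    fetch_k: int = 20,
) -> List[Tuple[int, float]]:
    """Post-filtering sketch: take the fetch_k nearest vectors, drop those whose
    metadata fails the predicate, then keep at most k (position, distance) pairs."""

    def sq_dist(v: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(v, query))

    nearest = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i]))[:fetch_k]
    kept = [(i, sq_dist(vectors[i])) for i in nearest if predicate(metadatas[i])]
    return kept[:k]


vectors = [[0.0], [0.1], [0.2], [5.0]]
metadatas = [{"source": "news"}] * 3 + [{"source": "tweet"}]


def is_tweet(m: Dict[str, str]) -> bool:
    return m.get("source") == "tweet"


# The only tweet is the farthest vector, so a small fetch_k filters everything out.
print(filtered_search(vectors, metadatas, [0.0], k=2, predicate=is_tweet, fetch_k=3))  # []
print(filtered_search(vectors, metadatas, [0.0], k=2, predicate=is_tweet, fetch_k=4))  # [(3, 25.0)]
```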

For more advanced metadata filtering, a subset of the MongoDB query and projection operators is supported. The currently supported operators are:

$eq (equals)
$neq (not equals)
$gt (greater than)
$lt (less than)
$gte (greater than or equal)
$lte (less than or equal)
$in (membership in list)
$nin (not in list)
$and (all conditions must match)
$or (any condition must match)
$not (negation of condition)
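To make the operator semantics concrete, here is a hypothetical evaluator that checks one document's metadata against a filter built from the operators above. It follows the usual MongoDB conventions (a bare value means equality, sibling keys are combined with AND) and is only a sketch; in particular, treating a missing field as a failed match is this sketch's own assumption, not documented LangChain behaviour.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical operator table using the names listed above.
COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every top-level condition in flt."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif key == "$not":
            ok = not matches(metadata, cond)
        elif isinstance(cond, dict):
            # Field condition such as {"$gte": 3}: every operator must hold.
            # Assumption of this sketch: a missing field never matches.
            ok = key in metadata and all(
                COMPARISONS[op](metadata[key], arg) for op, arg in cond.items()
            )
        else:
            # A bare value is shorthand for equality, e.g. {"source": "tweet"}.
            ok = metadata.get(key) == cond
        if not ok:
            return False
    return True


doc = {"source": "news", "year": 2024}
print(matches(doc, {"source": {"$eq": "news"}}))  # True
print(matches(doc, {"$or": [{"source": "tweet"}, {"year": {"$gte": 2020}}]}))  # True
print(matches(doc, {"$not": {"source": {"$in": ["news", "website"]}}}))  # False
```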

The same similarity search, written with advanced metadata filtering, looks like this:

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": {"$eq": "tweet"}},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

Similarity search with score

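You can also search with scores. Note that with the IndexFlatL2 index used here the score is a distance rather than a similarity, so a lower score means a closer match: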

results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")

* [SIM=0.893688] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
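The printed score is labelled SIM, but with the IndexFlatL2 index built earlier FAISS returns squared L2 distances. If the embeddings are unit length (an assumption here, though OpenAI describes its embeddings as normalized to length 1), a squared distance d converts to cosine similarity through ||a - b||^2 = 2 - 2cos(a, b), i.e. cos = 1 - d/2; under that assumption the 0.893688 above corresponds to a cosine similarity of about 0.553. The sketch checks the identity on two toy unit vectors.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def cosine_from_squared_l2(d: float) -> float:
    """Valid only for unit-length vectors: ||a - b||^2 = 2 - 2 * cos(a, b)."""
    return 1.0 - d / 2.0


a = [0.6, 0.8]  # unit length: 0.36 + 0.64 = 1
b = [1.0, 0.0]
d = squared_l2(a, b)
print(round(d, 6), round(cosine(a, b), 6), round(cosine_from_squared_l2(d), 6))  # 0.8 0.6 0.6
print(round(cosine_from_squared_l2(0.893688), 6))  # 0.553156
```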

Other search methods

There are many other ways to search a FAISS vector store. For the complete list of these methods, see the API Reference.

Query by turning into retriever

You can also turn the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})

[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
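search_type="mmr" selects results with Maximal Marginal Relevance, which trades relevance to the query against redundancy with results already chosen. The greedy sketch below uses cosine similarity and a lambda_mult weight (LangChain exposes a parameter of that name, 0.5 by default as far as we know); it illustrates the technique and is not LangChain's code. In the toy data, candidate 1 is a near-duplicate of candidate 0, so MMR prefers the more distinct candidate 2 even though plain similarity ranks it last.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate positions, each maximizing
    lambda_mult * sim(query, d) - (1 - lambda_mult) * max sim(d, already picked)."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.15], [1.0, -1.0]]
print(mmr(query, candidates, k=2))  # [0, 2]
```

Setting lambda_mult to 1.0 ignores redundancy and reduces to plain similarity ranking, which would return [0, 1] here.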

Usage for retrieval-augmented generation

For guides on using this vector store for retrieval-augmented generation (RAG), see the RAG sections of the LangChain documentation.

Saving and loading

You can also save and load a FAISS index. This is useful because you do not have to recreate it every time you use it.

vector_store.save_local("faiss_index")

# load_local unpickles the saved docstore; only enable this for files you created or trust.
new_vector_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)

docs = new_vector_store.similarity_search("qux")

docs[0]

Document(metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')
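save_local writes two files into the folder (by default index.faiss and index.pkl): the FAISS index itself and a pickle of the docstore plus the index_to_docstore_id mapping. Unpickling can execute arbitrary code, which is why load_local requires allow_dangerous_deserialization=True. If your documents are plain text with JSON-friendly metadata, one alternative for the document side is a data-only file; the sketch below (hypothetical helpers and a hypothetical mapping.json file, which load_local does not read) round-trips that part through JSON. JSON turns the integer positions into string keys, so they are converted back on load.

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple


def save_docs_json(
    folder: str, docstore: Dict[str, Any], index_to_docstore_id: Dict[int, str]
) -> None:
    """Hypothetical helper: write documents and the position -> ID map as JSON."""
    os.makedirs(folder, exist_ok=True)
    payload = {
        "docstore": docstore,
        # JSON object keys must be strings, so stringify the positions.
        "index_to_docstore_id": {str(pos): i for pos, i in index_to_docstore_id.items()},
    }
    with open(os.path.join(folder, "mapping.json"), "w", encoding="utf-8") as f:
        json.dump(payload, f)


def load_docs_json(folder: str) -> Tuple[Dict[str, Any], Dict[int, str]]:
    """Hypothetical helper: read the JSON back, restoring integer positions."""
    with open(os.path.join(folder, "mapping.json"), encoding="utf-8") as f:
        payload = json.load(f)
    mapping = {int(pos): i for pos, i in payload["index_to_docstore_id"].items()}
    return payload["docstore"], mapping


docstore = {"a": {"page_content": "foo", "metadata": {"source": "tweet"}}}
mapping = {0: "a"}
with tempfile.TemporaryDirectory() as tmp:
    save_docs_json(tmp, docstore, mapping)
    loaded_docstore, loaded_mapping = load_docs_json(tmp)
print(loaded_mapping == mapping, loaded_docstore == docstore)  # True True
```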

Merging

You can also merge two FAISS vectorstores.

db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)

db1.docstore._dict

{'b752e805-350e-4cf5-ba54-0883d46a3a44': Document(page_content='foo')}

db2.docstore._dict

{'08192d92-746d-4cd1-b681-bdfba411f459': Document(page_content='bar')}

db1.merge_from(db2)

db1.docstore._dict

{'b752e805-350e-4cf5-ba54-0883d46a3a44': Document(page_content='foo'),
 '08192d92-746d-4cd1-b681-bdfba411f459': Document(page_content='bar')}
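As we understand it, merge_from appends the other store's vectors to this store's index, so the incoming documents receive positions after the existing ones, and their IDs must not already be present (the in-memory docstore raises a ValueError when asked to add an existing ID). The hypothetical merge_stores below is a simplified sketch of that bookkeeping, not LangChain's implementation.

```python
from typing import Dict, List, Sequence


def merge_stores(
    vectors: List[Sequence[float]],
    docstore: Dict[str, str],
    index_to_docstore_id: Dict[int, str],
    other_vectors: List[Sequence[float]],
    other_docstore: Dict[str, str],
    other_index_to_docstore_id: Dict[int, str],
) -> None:
    """Hypothetical helper: append the other store's entries to this one in place."""
    overlapping = set(other_docstore).intersection(docstore)
    if overlapping:
        # Check before mutating anything so a failed merge leaves both stores intact.
        raise ValueError(f"Tried to add ids that already exist: {sorted(overlapping)}")
    starting_len = len(index_to_docstore_id)
    vectors.extend(other_vectors)
    for pos, doc_id in sorted(other_index_to_docstore_id.items()):
        # Incoming positions are shifted past the existing entries.
        index_to_docstore_id[starting_len + pos] = doc_id
        docstore[doc_id] = other_docstore[doc_id]


v1, d1, m1 = [[0.0]], {"x": "foo"}, {0: "x"}
v2, d2, m2 = [[1.0]], {"y": "bar"}, {0: "y"}
merge_stores(v1, d1, m1, v2, d2, m2)
print(m1)  # {0: 'x', 1: 'y'}
```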

API reference

For detailed documentation of all FAISS vector store features and configurations, see the API reference: python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.faiss.FAISS.html
