Neo4j is an open-source graph database with integrated support for vector similarity search.
It supports:
  • approximate nearest neighbor search
  • Euclidean similarity and cosine similarity
  • hybrid search combining vector and keyword searches
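To make the two similarity functions concrete, here is a minimal pure-Python sketch. The score normalizations used (cosine rescaled to [0, 1] via (1 + cos) / 2, Euclidean mapped via 1 / (1 + d²)) are an assumption about how Neo4j vector indexes report scores; check the Neo4j documentation for your version. The function names are illustrative and not part of any library.

```python
import math


def cosine_score(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (assumed normalization)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return (1 + dot / (norm_a * norm_b)) / 2


def euclidean_score(a, b):
    """Squared Euclidean distance mapped into (0, 1] (assumed normalization)."""
    dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 / (1 + dist_sq)


print(cosine_score([1.0, 0.0], [1.0, 0.0]))     # vectors pointing the same way
print(cosine_score([1.0, 0.0], [-1.0, 0.0]))    # vectors pointing opposite ways
print(euclidean_score([0.0, 0.0], [1.0, 1.0]))  # squared distance of 2
```

Under both functions a higher score means a closer match, which is why the search results later in this notebook are sorted by descending score.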
This notebook shows how to use the Neo4j vector index (Neo4jVector). See the installation instructions.
# Install the necessary packages with pip
pip install -qU  neo4j
pip install -qU  langchain-openai langchain-neo4j
pip install -qU  tiktoken
To use OpenAIEmbeddings, you need an OpenAI API key.
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
loader = TextLoader("../../how_to/state_of_the_union.txt")

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
# Neo4jVector requires the Neo4j database credentials

url = "bolt://localhost:7687"
username = "neo4j"
password = "password"

# You can also use environment variables instead of directly passing named parameters
# os.environ["NEO4J_URI"] = "bolt://localhost:7687"
# os.environ["NEO4J_USERNAME"] = "neo4j"
# os.environ["NEO4J_PASSWORD"] = "pleaseletmein"
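The environment-variable option mentioned in the comments above could be wrapped in a small helper like the one below. This is a sketch only: `neo4j_credentials` is a hypothetical name, and the fallback values simply repeat the local defaults used in this notebook.

```python
import os


def neo4j_credentials(env=None):
    """Read connection settings from the environment, falling back to local defaults."""
    env = os.environ if env is None else env
    return {
        "url": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
    }


creds = neo4j_credentials()
print(creds["url"], creds["username"])
```

The resulting dictionary uses the same keyword names (`url`, `username`, `password`) that the Neo4jVector constructors in this notebook take, so it can be unpacked into those calls with `**creds`.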

Similarity Search with Cosine Distance (Default)

# The Neo4jVector Module will connect to Neo4j and create a vector index if needed.

db = Neo4jVector.from_documents(
    docs, OpenAIEmbeddings(), url=url, username=username, password=password
)
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)
--------------------------------------------------------------------------------
Score:  0.9076391458511353
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Score:  0.8912242650985718
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
--------------------------------------------------------------------------------

Working with vectorstore

Above, we created a vectorstore from scratch. However, we often want to work with an existing vectorstore. To do so, we can initialize it directly.
index_name = "vector"  # default index name

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
)
You can also initialize a vectorstore from an existing graph using the from_existing_graph method. This method pulls the relevant text information from the database, computes the text embeddings, and stores them back in the database.
# First we create sample data in graph
store.query(
    "CREATE (p:Person {name: 'Tomaz', location:'Slovenia', hobby:'Bicycle', age: 33})"
)
[]
# Now we initialize from existing graph
existing_graph = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    node_label="Person",
    text_node_properties=["name", "location"],
    embedding_node_property="embedding",
)
result = existing_graph.similarity_search("Slovenia", k=1)
result[0]
Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})
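The shape of the document above can be reproduced with a few lines of plain Python. This sketch is inferred from the printed output only; the helper names are hypothetical, and the library itself builds the text inside the database rather than in Python.

```python
def render_text(node, text_node_properties):
    """Join the selected properties as newline-prefixed 'key: value' pairs."""
    return "".join(f"\n{key}: {node[key]}" for key in text_node_properties)


def split_metadata(node, text_node_properties, embedding_node_property):
    """Properties that are neither text nor the embedding are kept as metadata."""
    skipped = set(text_node_properties) | {embedding_node_property}
    return {key: value for key, value in node.items() if key not in skipped}


person = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
print(repr(render_text(person, ["name", "location"])))
print(split_metadata(person, ["name", "location"], "embedding"))
```

The properties listed in `text_node_properties` determine what gets embedded, so only `name` and `location` influence the similarity search here; `hobby` and `age` travel along as metadata.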
Neo4j also supports relationship vector indexes, where an embedding is stored as a relationship property and indexed. A relationship vector index cannot be populated through LangChain, but you can connect to an existing one.
# First we create sample data and index in graph
store.query(
    "MERGE (p:Person {name: 'Tomaz'}) "
    "MERGE (p1:Person {name:'Leann'}) "
    # Link the two named nodes; the original pattern ended in an unbound variable (p2)
    "MERGE (p1)-[:FRIEND {text:'example text', embedding:$embedding}]->(p)",
    params={"embedding": OpenAIEmbeddings().embed_query("example text")},
)
# Create a vector index
relationship_index = "relationship_vector"
store.query(
    """
CREATE VECTOR INDEX $relationship_index
IF NOT EXISTS
FOR ()-[r:FRIEND]-() ON (r.embedding)
OPTIONS {indexConfig: {
 `vector.dimensions`: 1536,
 `vector.similarity_function`: 'cosine'
}}
""",
    params={"relationship_index": relationship_index},
)
[]
relationship_vector = Neo4jVector.from_existing_relationship_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=relationship_index,
    text_node_property="text",
)
relationship_vector.similarity_search("Example")
[Document(page_content='example text')]

Metadata filtering

The Neo4j vector store also supports metadata filtering by combining the parallel runtime with exact nearest neighbor search. This requires Neo4j 5.18 or later. Equality filtering uses the following syntax.
existing_graph.similarity_search(
    "Slovenia",
    filter={"hobby": "Bicycle", "name": "Tomaz"},
)
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
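Conceptually, filtering combined with exact nearest neighbor search means: keep only the nodes that pass the filter, score every remaining node against the query, and return the best k. The sketch below illustrates that idea in plain Python with toy two-dimensional embeddings. It is not how Neo4j executes the query, and all names and data in it are made up for illustration.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_exact_search(nodes, query_vec, equals, k=4):
    """Keep nodes whose properties equal every filter value, then rank all of them."""
    candidates = [
        node
        for node in nodes
        if all(node["props"].get(key) == value for key, value in equals.items())
    ]
    ranked = sorted(
        candidates, key=lambda node: cosine(node["embedding"], query_vec), reverse=True
    )
    return ranked[:k]


# Toy data: the embeddings are invented for the example.
nodes = [
    {"props": {"name": "Tomaz", "hobby": "Bicycle"}, "embedding": [1.0, 0.0]},
    {"props": {"name": "Leann", "hobby": "Chess"}, "embedding": [0.9, 0.1]},
    {"props": {"name": "Ana", "hobby": "Bicycle"}, "embedding": [0.0, 1.0]},
]
hits = filtered_exact_search(nodes, [1.0, 0.0], {"hobby": "Bicycle"}, k=2)
print([hit["props"]["name"] for hit in hits])
```

Because every surviving candidate is scored, the result is exact rather than approximate, at the cost of work proportional to the number of candidates.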
Metadata filtering also supports the following operators:
  • $eq: Equal
  • $ne: Not Equal
  • $lt: Less than
  • $lte: Less than or equal
  • $gt: Greater than
  • $gte: Greater than or equal
  • $in: In a list of values
  • $nin: Not in a list of values
  • $between: Between two values
  • $like: Text contains value
  • $ilike: Lowercased text contains value
existing_graph.similarity_search(
    "Slovenia",
    filter={"hobby": {"$eq": "Bicycle"}, "age": {"$gt": 15}},
)
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
You can also use the OR operator between filters:
existing_graph.similarity_search(
    "Slovenia",
    filter={"$or": [{"hobby": {"$eq": "Bicycle"}}, {"age": {"$gt": 15}}]},
)
[Document(page_content='\nname: Tomaz\nlocation: Slovenia', metadata={'age': 33, 'hobby': 'Bicycle'})]
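To pin down what the filter dictionaries mean, the operators and the `$or` combinator can be mimicked by a small client-side evaluator. This is a sketch of the semantics as described in the list above, not the library's implementation (which translates filters to Cypher). Treating `$between` as inclusive on both ends and treating a missing property as a non-match are assumptions.

```python
import operator

COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
    # Inclusive bounds are an assumption.
    "$between": lambda value, bounds: bounds[0] <= value <= bounds[1],
    "$like": lambda value, needle: needle in value,
    "$ilike": lambda value, needle: needle in value.lower(),
}


def matches(props, flt):
    """Return True if the property dict satisfies every entry of the filter dict."""
    for key, condition in flt.items():
        if key == "$or":
            if not any(matches(props, sub) for sub in condition):
                return False
        elif key not in props:  # assumption: a missing property never matches
            return False
        elif isinstance(condition, dict):
            value = props[key]
            if not all(COMPARATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif props[key] != condition:  # a bare value means equality
            return False
    return True


tomaz = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
print(matches(tomaz, {"hobby": {"$eq": "Bicycle"}, "age": {"$gt": 15}}))
print(matches(tomaz, {"$or": [{"hobby": {"$eq": "Chess"}}, {"age": {"$gt": 15}}]}))
```

Multiple keys in one dictionary are combined with AND, which is why `$or` needs an explicit list of sub-filters.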

Add documents

You can add documents to an existing vectorstore.
store.add_documents([Document(page_content="foo")])
['acbd18db4cc2f85cedef654fccc4a4d8']
docs_with_score = store.similarity_search_with_score("foo")
docs_with_score[0]
(Document(page_content='foo'), 0.9999997615814209)
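The ID returned by add_documents above equals the MD5 hex digest of the string "foo". That suggests an ID is derived from the text when none is supplied; this is an inference from the output, not a documented guarantee. The helper name below is hypothetical.

```python
from hashlib import md5


def default_id(text):
    """MD5 hex digest of the text, which reproduces the ID shown above for 'foo'."""
    return md5(text.encode("utf-8")).hexdigest()


print(default_id("foo"))  # acbd18db4cc2f85cedef654fccc4a4d8
```

If IDs are indeed content-derived, two documents with identical text would map to the same ID, which is worth keeping in mind when loading duplicated chunks.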

Customize response with retrieval query

You can also customize responses with a custom Cypher snippet that fetches other information from the graph. Under the hood, the final Cypher statement is constructed like this:
read_query = (
  "CALL db.index.vector.queryNodes($index, $k, $embedding) "
  "YIELD node, score "
) + retrieval_query
The retrieval query must return the following three columns:
  • text: Union[str, Dict] = value used to populate the page_content of a document
  • score: Float = similarity score
  • metadata: Dict = additional metadata of a document
For more details, see this blog post.
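The composition described above can be tried out without a database. The sketch below concatenates the vector lookup with a custom snippet and adds a deliberately naive check that the three required column names appear in it. `build_read_query` and the check are hypothetical helpers for illustration; the library does not necessarily validate the snippet this way.

```python
VECTOR_PREFIX = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score "
)
REQUIRED_COLUMNS = ("text", "score", "metadata")


def build_read_query(retrieval_query):
    """Append a custom RETURN clause to the vector lookup after a naive column check."""
    # Substring test only: it does not parse Cypher, so it can be fooled.
    missing = [column for column in REQUIRED_COLUMNS if column not in retrieval_query]
    if missing:
        raise ValueError(f"retrieval_query must return: {', '.join(missing)}")
    return VECTOR_PREFIX + retrieval_query.strip()


print(build_read_query('RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata'))
```

Since the snippet is appended after `YIELD node, score`, it can reference `node` and `score` directly, as the examples that follow do.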
retrieval_query = """
RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1)
[Document(page_content='Name:Tomaz', metadata={'foo': 'bar'})]
Here is an example that passes selected node properties (name, age, and hobby, but not the embedding) as a dictionary to the text column.
retrieval_query = """
RETURN node {.name, .age, .hobby} AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1)
[Document(page_content='name: Tomaz\nage: 33\nhobby: Bicycle\n', metadata={'foo': 'bar'})]
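When the text column is a dictionary, the page_content in the output above looks like one "key: value" line per entry. A minimal sketch of that rendering for flat dictionaries follows; the function name is hypothetical and nested values are not handled.

```python
def dict_to_text(row):
    """Render a flat dict from the text column as newline-terminated 'key: value' lines."""
    return "".join(f"{key}: {value}\n" for key, value in row.items())


print(repr(dict_to_text({"name": "Tomaz", "age": 33, "hobby": "Bicycle"})))
```

A Cypher null arrives in Python as `None`, so a property explicitly set to null would be rendered as `None` by this sketch.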
You can also pass Cypher parameters to the retrieval query. Parameters can be used for additional filtering, traversals, and so on.
retrieval_query = """
RETURN node {.*, embedding:Null, extra: $extra} AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name="person_index",
    retrieval_query=retrieval_query,
)
retrieval_example.similarity_search("Foo", k=1, params={"extra": "ParamInfo"})
[Document(page_content='location: Slovenia\nextra: ParamInfo\nname: Tomaz\nage: 33\nhobby: Bicycle\nembedding: None\n', metadata={'foo': 'bar'})]

Hybrid search (vector + keyword)

Neo4j integrates both vector and keyword indexes, which allows you to use a hybrid search approach.
# The Neo4jVector module will connect to Neo4j and create vector and keyword indexes if needed.
hybrid_db = Neo4jVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    search_type="hybrid",
)
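One way to picture hybrid search is as a merge of two ranked lists, one from the vector index and one from the keyword index. The sketch below normalizes each list by its top score, keeps the higher score per document, and returns the best k. This merging rule is an assumption chosen for illustration and may differ from what Neo4jVector does internally; the document IDs and scores are made up.

```python
def normalize(hits):
    """Scale scores so the best hit in a result list gets 1.0."""
    top = max(hits.values(), default=0.0)
    if not top:
        return dict(hits)
    return {doc_id: score / top for doc_id, score in hits.items()}


def hybrid_merge(vector_hits, keyword_hits, k=4):
    """Union both result lists and keep the higher normalized score per document."""
    merged = normalize(vector_hits)
    for doc_id, score in normalize(keyword_hits).items():
        merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)[:k]


vector_hits = {"a": 0.9, "b": 0.6}    # similarity scores (invented)
keyword_hits = {"b": 4.0, "c": 2.0}   # full-text relevance scores (invented)
print(hybrid_merge(vector_hits, keyword_hits, k=3))
```

Normalizing first matters because vector similarity and keyword relevance live on different scales; without it, one index would dominate the ranking.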
To load hybrid search from existing indexes, you have to provide both the vector and keyword indexes:
index_name = "vector"  # default index name
keyword_index_name = "keyword"  # default keyword index name

store = Neo4jVector.from_existing_index(
    OpenAIEmbeddings(),
    url=url,
    username=username,
    password=password,
    index_name=index_name,
    keyword_index_name=keyword_index_name,
    search_type="hybrid",
)

Retriever options

This section shows how to use Neo4jVector as a retriever.
retriever = store.as_retriever()
retriever.invoke(query)[0]
Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '../../how_to/state_of_the_union.txt'})

Question Answering with Sources

This section goes over how to do question answering with sources over an index. It does this with RetrievalQAWithSourcesChain, which looks up the documents from the index.
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI
chain = RetrievalQAWithSourcesChain.from_chain_type(
    ChatOpenAI(temperature=0), chain_type="stuff", retriever=retriever
)
chain.invoke(
    {"question": "What did the president say about Justice Breyer"},
    return_only_outputs=True,
)
{'answer': 'The president honored Justice Stephen Breyer for his service to the country and mentioned his retirement from the United States Supreme Court.\n',
 'sources': '../../how_to/state_of_the_union.txt'}
