Overview
The Needle Document Loader is a utility for integrating Needle collections with LangChain. It lets you store, retrieve, and use documents in Retrieval-Augmented Generation (RAG) workflows. This example shows how to:
- Store documents in a Needle collection
- Set up a retriever to fetch documents
- Build a Retrieval-Augmented Generation (RAG) pipeline
Setup
Before you begin, make sure the following environment variables are set:
- NEEDLE_API_KEY: your API key for authenticating with Needle
- OPENAI_API_KEY: your OpenAI API key for language model operations
import os

os.environ["NEEDLE_API_KEY"] = ""  # replace "" with your Needle API key
os.environ["OPENAI_API_KEY"] = ""  # replace "" with your OpenAI API key
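Because the placeholder values above are empty strings, it can help to check that both keys were actually filled in before making any API call. This is a minimal sketch in plain Python; `missing_env_vars` is a hypothetical helper, not part of LangChain or Needle.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    # An empty string counts as missing, since the placeholders above are "".
    return [name for name in names if not env.get(name)]


REQUIRED = ["NEEDLE_API_KEY", "OPENAI_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

Passing an explicit `environ` mapping keeps the helper easy to test without touching the real process environment.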
Initialization
To initialize NeedleLoader, you need the following parameters:
- needle_api_key: your Needle API key (or set it as an environment variable)
- collection_id: the ID of the Needle collection to work with
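The "pass it as a parameter or set it as an environment variable" option is a common fallback pattern. The sketch below illustrates that pattern in plain Python; `resolve_api_key` is hypothetical and is not claimed to mirror how NeedleLoader resolves its key internally.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str],
    env_name: str = "NEEDLE_API_KEY",
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicitly passed key; otherwise fall back to the environment variable."""
    if explicit:
        return explicit
    env = os.environ if environ is None else environ
    value = env.get(env_name, "")
    if not value:
        # Fail early with a clear message instead of sending an empty key.
        raise ValueError(f"No API key given and {env_name} is not set")
    return value
```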
Instantiation
from langchain_community.document_loaders.needle import NeedleLoader
collection_id = "clt_01J87M9T6B71DHZTHNXYZQRG5H"
# Initialize NeedleLoader to store documents to the collection
document_loader = NeedleLoader(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id=collection_id,
)
Load
To add files to the Needle collection:
files = {
    "tech-radar-30.pdf": "https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2024/04/tr_technology_radar_vol_30_en.pdf"
}
document_loader.add_files(files=files)
# Show the documents in the collection
# collections_documents = document_loader.load()
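`add_files` takes a mapping from a file name to a URL. Before uploading, a cheap local check can catch malformed entries. This sketch assumes each value should be an http(s) URL with a host; `invalid_file_entries` is a hypothetical helper, and Needle may apply its own validation server-side.

```python
from urllib.parse import urlparse


def invalid_file_entries(files):
    """Return (name, url) pairs whose name is blank or whose URL is not http(s)."""
    bad = []
    for name, url in files.items():
        parsed = urlparse(url)
        # A usable URL needs an http/https scheme and a network location (host).
        if not name.strip() or parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append((name, url))
    return bad


files = {
    "tech-radar-30.pdf": "https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2024/04/tr_technology_radar_vol_30_en.pdf"
}
print(invalid_file_entries(files))  # → []
```

An empty result means every entry passed the local check; it says nothing about whether the URL is reachable.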
Lazy Load
The lazy_load method lets you load documents from the Needle collection iteratively, yielding each document as it is fetched:
# Show the documents in the collection
# collections_documents = document_loader.lazy_load()
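The difference between `load` and `lazy_load` is the usual eager-list versus generator trade-off. The sketch below uses hypothetical `FakeLoader` and `FakeDocument` stand-ins (no Needle calls) to show that a lazy consumer only produces the documents it actually reads; it does not claim anything about how NeedleLoader pages through the API.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterator, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeLoader:
    """Hypothetical loader that counts how many documents it has produced."""

    def __init__(self, contents: List[str]):
        self._contents = contents
        self.fetched = 0

    def lazy_load(self) -> Iterator[FakeDocument]:
        # Produce one document at a time; nothing happens until the caller asks.
        for text in self._contents:
            self.fetched += 1
            yield FakeDocument(page_content=text)

    def load(self) -> List[FakeDocument]:
        # Eager variant: materialize every document into a list.
        return list(self.lazy_load())


loader = FakeLoader(["doc 1", "doc 2", "doc 3", "doc 4"])
first_two = list(islice(loader.lazy_load(), 2))
print(len(first_two), loader.fetched)  # → 2 2
```

Reading only the first two items from the generator leaves the remaining documents unproduced, whereas `load()` would have produced all four.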
Usage
Use within a chain
Here is a complete example of setting up a RAG pipeline with Needle inside a chain:
import os
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.retrievers.needle import NeedleRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(temperature=0)
# Initialize the Needle retriever (make sure your Needle API key is set as an environment variable)
retriever = NeedleRetriever(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id="clt_01J87M9T6B71DHZTHNXYZQRG5H",
)
# Define system prompt for the assistant
system_prompt = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know, say so concisely.\n\n{context}
"""
prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("human", "{input}")]
)
# Define the question-answering chain using a document chain (stuff chain) and the retriever
question_answer_chain = create_stuff_documents_chain(llm, prompt)
# Create the RAG (Retrieval-Augmented Generation) chain by combining the retriever and the question-answering chain
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
# Define the input query
query = {"input": "Did RAG move to accepted?"}
response = rag_chain.invoke(query)
response
{'input': 'Did RAG move to accepted?',
'context': [Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.')],
'answer': 'Yes, RAG has been adopted as the preferred pattern for improving the quality of responses generated by a large language model.'}
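In the output above, the retriever returned the same chunk five times, and every copy is passed into the prompt's `{context}` slot. The sketch below shows, in plain Python, roughly what the "stuff" step does (joining each document's `page_content` into one string; the `"\n\n"` separator is our assumption about the default) plus an optional de-duplication pass on exact `page_content`. `Doc`, `dedupe_by_content`, and `stuff_context` are hypothetical names, not LangChain APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical minimal stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_content(docs: List[Doc]) -> List[Doc]:
    """Drop documents whose page_content was already seen, keeping first-seen order."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


def stuff_context(docs: List[Doc], separator: str = "\n\n") -> str:
    """Concatenate page contents into a single context string."""
    return separator.join(doc.page_content for doc in docs)


SYSTEM_PROMPT = (
    "You are an assistant for question-answering tasks.\n"
    "Use the following pieces of retrieved context to answer the question.\n"
    "If you don't know, say so concisely.\n\n{context}"
)

retrieved = [Doc("RAG: Adopt"), Doc("RAG: Adopt"), Doc("Other chunk")]
context = stuff_context(dedupe_by_content(retrieved))
print(SYSTEM_PROMPT.format(context=context))
```

Removing exact duplicates saves context-window space without changing which distinct chunks the model sees; how to wire such a filter into a real chain depends on your LangChain version.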
API reference
For detailed documentation of all Needle features and configurations, see the API reference: docs.needle-ai.com