Needle Document Loader

Needle을 사용하면 최소한의 노력으로 RAG 파이프라인을 쉽게 생성할 수 있습니다. 자세한 내용은 API 문서를 참조하세요.

개요

Needle Document Loader는 Needle collection을 LangChain과 통합하기 위한 유틸리티입니다. Retrieval-Augmented Generation (RAG) 워크플로우를 위한 문서의 원활한 저장, 검색 및 활용을 가능하게 합니다. 이 예제는 다음을 보여줍니다:

Needle collection에 문서 저장하기
문서를 가져오기 위한 retriever 설정하기
Retrieval-Augmented Generation (RAG) 파이프라인 구축하기

설정

시작하기 전에 다음 환경 변수가 설정되어 있는지 확인하세요:

NEEDLE_API_KEY: Needle 인증을 위한 API key
OPENAI_API_KEY: language model 작업을 위한 OpenAI API key

import os

os.environ["NEEDLE_API_KEY"] = ""

os.environ["OPENAI_API_KEY"] = ""

초기화

NeedleLoader를 초기화하려면 다음 매개변수가 필요합니다:

needle_api_key: Needle API key (또는 환경 변수로 설정)
collection_id: 작업할 Needle collection의 ID

인스턴스화

from langchain_community.document_loaders.needle import NeedleLoader

collection_id = "clt_01J87M9T6B71DHZTHNXYZQRG5H"

# Initialize NeedleLoader to store documents to the collection
document_loader = NeedleLoader(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id=collection_id,
)

Load

Needle collection에 파일을 추가하려면:

files = {
    "tech-radar-30.pdf": "https://www.thoughtworks.com/content/dam/thoughtworks/documents/radar/2024/04/tr_technology_radar_vol_30_en.pdf"
}

document_loader.add_files(files=files)

# Show the documents in the collection
# collections_documents = document_loader.load()

Lazy Load

lazy_load 메서드를 사용하면 Needle collection에서 문서를 반복적으로 로드할 수 있으며, 각 문서를 가져올 때마다 yield합니다:

# Show the documents in the collection
# collections_documents = document_loader.lazy_load()

사용법

chain 내에서 사용하기

다음은 chain 내에서 Needle을 사용하여 RAG 파이프라인을 설정하는 완전한 예제입니다:

import os

from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.retrievers.needle import NeedleRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)

# Initialize the Needle retriever (make sure your Needle API key is set as an environment variable)
retriever = NeedleRetriever(
    needle_api_key=os.getenv("NEEDLE_API_KEY"),
    collection_id="clt_01J87M9T6B71DHZTHNXYZQRG5H",
)

# Define system prompt for the assistant
system_prompt = """
    You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to answer the question.
    If you don't know, say so concisely.\n\n{context}
    """

prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("human", "{input}")]
)

# Define the question-answering chain using a document chain (stuff chain) and the retriever
question_answer_chain = create_stuff_documents_chain(llm, prompt)

# Create the RAG (Retrieval-Augmented Generation) chain by combining the retriever and the question-answering chain
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# Define the input query
query = {"input": "Did RAG move to accepted?"}

response = rag_chain.invoke(query)

response

{'input': 'Did RAG move to accepted?',
 'context': [Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.'),
  Document(metadata={}, page_content='New Moved in/out No change\n\n© Thoughtworks, Inc. All Rights Reserved. 12\n\nTechniques\n\n1. Retrieval-augmented generation (RAG)\nAdopt\n\nRetrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of \nresponses generated by a large language model (LLM). We’ve successfully used it in several projects, \nincluding the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy \ndocuments — in formats like HTML and PDF — are stored in databases that supports a vector data \ntype or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For \na given prompt, the database is queried to retrieve relevant documents, which are then combined \nwith the prompt to provide richer context to the LLM. This results in higher quality output and greatly \nreduced hallucinations. The context window — which determines the maximum size of the LLM input \n— is limited, which means that selecting the most relevant documents is crucial. We improve the \nrelevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually \ntoo large to calculate an embedding, which means they must be split into smaller chunks. This is often \na difficult problem, and one approach is to have the chunks overlap to a certain extent.')],
 'answer': 'Yes, RAG has been adopted as the preferred pattern for improving the quality of responses generated by a large language model.'}

API reference

모든 Needle 기능 및 구성에 대한 자세한 문서는 API reference를 참조하세요: docs.needle-ai.com

Edit the source of this page on GitHub.

Connect these docs programmatically to Claude, VSCode, and more via MCP for real-time answers.

Popular Providers

Integrations by component

개요

설정

초기화

인스턴스화

Load

Lazy Load

사용법

chain 내에서 사용하기

API reference

Popular Providers

Integrations by component

​개요

​설정

​초기화

​인스턴스화

​Load

​Lazy Load

​사용법

​chain 내에서 사용하기

​API reference

개요

설정

초기화

인스턴스화

Load

Lazy Load

사용법

chain 내에서 사용하기

API reference