Rememberizer is a knowledge enhancement service for AI applications created by SkyDeck AI Inc.
This notebook shows how to retrieve documents from Rememberizer into the Document format used downstream.

Setup

You need an API key: you can get one after creating a common knowledge at https://rememberizer.ai. Once you have the API key, you must set it as the environment variable REMEMBERIZER_API_KEY or pass it as rememberizer_api_key when initializing RememberizerRetriever. RememberizerRetriever has these arguments:
  • optional top_k_results: default=10. Use it to limit the number of returned documents.
  • optional rememberizer_api_key: required if you do not set the environment variable REMEMBERIZER_API_KEY.
get_relevant_documents() has one argument, query: free text used to find documents in the common knowledge of Rememberizer.ai. A minimal construction sketch follows below.
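
For illustration, here is a minimal sketch of constructing the retriever with the key passed in directly instead of through the environment. The key and query strings are placeholders, not values from a real account:

from langchain_community.retrievers import RememberizerRetriever

# Pass the key directly; omit rememberizer_api_key if REMEMBERIZER_API_KEY is set.
retriever = RememberizerRetriever(
    rememberizer_api_key="YOUR_API_KEY",  # placeholder key
    top_k_results=10,  # the default, shown explicitly
)
docs = retriever.get_relevant_documents(query="What is RAG?")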

Examples

Basic Usage

# Setup API key
from getpass import getpass

REMEMBERIZER_API_KEY = getpass()
import os

from langchain_community.retrievers import RememberizerRetriever

os.environ["REMEMBERIZER_API_KEY"] = REMEMBERIZER_API_KEY
retriever = RememberizerRetriever(top_k_results=5)
docs = retriever.get_relevant_documents(query="How do Large Language Models work?")
docs[0].metadata  # meta-information of the Document
{'id': 13646493,
 'document_id': '17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP',
 'name': 'What is a large language model (LLM)_ _ Cloudflare.pdf',
 'type': 'application/pdf',
 'path': '/langchain/What is a large language model (LLM)_ _ Cloudflare.pdf',
 'url': 'https://drive.google.com/file/d/17s3LlMbpkTk0ikvGwV0iLMCj-MNubIaP/view',
 'size': 337089,
 'created_time': '',
 'modified_time': '',
 'indexed_on': '2024-04-04T03:36:28.886170Z',
 'integration': {'id': 347, 'integration_type': 'google_drive'}}
print(docs[0].page_content[:400])  # the content of the Document
before, or contextualized in new ways. on some level they " understand " semantics in that they can associate words and concepts by their meaning, having seen them grouped together in that way millions or billions of times. how developers can quickly start building their own llms to build llm applications, developers need easy access to multiple data sets, and they need places for those data sets
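
Note that in recent LangChain releases retrievers also implement the Runnable interface, so the same lookup can be written with invoke; a minimal equivalent, assuming a current langchain-core:

docs = retriever.invoke("How do Large Language Models work?")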

Use in a chain

OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
from langchain.chains import ConversationalRetrievalChain
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model_name="gpt-3.5-turbo")
qa = ConversationalRetrievalChain.from_llm(model, retriever=retriever)
questions = [
    "What is RAG?",
    "How does Large Language Models works?",
]
chat_history = []

for question in questions:
    result = qa.invoke({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    print(f"-> **Question**: {question} \n")
    print(f"**Answer**: {result['answer']} \n")
-> **Question**: What is RAG?

**Answer**: RAG stands for Retrieval-Augmented Generation. It is an AI framework that retrieves facts from an external knowledge base to enhance the responses generated by Large Language Models (LLMs) by providing up-to-date and accurate information. This framework helps users understand the generative process of LLMs and ensures that the model has access to reliable information sources.

-> **Question**: How do Large Language Models work?

**Answer**: Large Language Models (LLMs) work by analyzing massive data sets of language to comprehend and generate human language text. They are built on machine learning, specifically deep learning, which involves training a program to recognize features of data without human intervention. LLMs use neural networks, specifically transformer models, to understand context in human language, making them better at interpreting language even in vague or new contexts. Developers can quickly start building their own LLMs by accessing multiple data sets and using services like Cloudflare's Vectorize and Cloudflare Workers AI platform.
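
The same retriever can also be wired into a single-turn RAG pipeline with LCEL. The following is a minimal sketch, not part of the original notebook; the prompt wording and the format_docs helper are illustrative assumptions:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# Illustrative prompt; the wording is an assumption.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Join the retrieved page contents into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model  # the ChatOpenAI instance created above
    | StrOutputParser()
)
print(rag_chain.invoke("What is RAG?"))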
