- VectaraSearch: semantic search over a corpus
- VectaraRAG: summary generation using RAG
- VectaraIngest: ingest documents into a corpus
- VectaraAddFiles: upload files
Setup
To use the Vectara tools, first install the partner package.
!uv pip install -U pip && uv pip install -qU langchain-vectara langgraph
Getting Started
To get started, follow these steps:
- If you don't already have an account, sign up for a free Vectara trial.
- Within your account you can create one or more corpora. Each corpus represents an area that stores text data ingested from input documents. To create a corpus, use the “Create Corpus” button, then give the corpus a name and a description. Optionally, you can define filtering attributes and apply some advanced options. Clicking the created corpus shows its name and corpus ID at the top.
- Next, create an API key to access the corpus. In the corpus view, click the “Access Control” tab and then the “Create API Key” button. Give the key a name and choose whether it is query-only or query+index. Click “Create” to get an active API key. Keep this key confidential.
You need two values: corpus_key and api_key.
You can provide VECTARA_API_KEY to LangChain in two ways:
Instantiation
- Include the following variable in your environment: VECTARA_API_KEY. For example, you can set it with os.environ and getpass as follows:
import os
import getpass
os.environ["VECTARA_API_KEY"] = getpass.getpass("Vectara API Key:")
- Add it to the Vectara vectorstore constructor:
vectara = Vectara(
    vectara_api_key=vectara_api_key
)
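The two options above can also be combined. The following is a minimal sketch, not part of langchain-vectara: the helper name `resolve_vectara_api_key` is hypothetical, and it simply prefers the VECTARA_API_KEY environment variable and falls back to a hidden prompt when the variable is unset.

```python
import getpass
import os


def resolve_vectara_api_key(env_var: str = "VECTARA_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is unset.

    Hypothetical helper for illustration; not part of langchain-vectara.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to a hidden prompt so the key is never echoed on screen.
        key = getpass.getpass("Vectara API Key:").strip()
        # Cache it in the environment so later cells can reuse it.
        os.environ[env_var] = key
    return key
```

The returned value can then be passed as vectara_api_key to the Vectara constructor.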
import os
os.environ["VECTARA_API_KEY"] = "<VECTARA_API_KEY>"
os.environ["VECTARA_CORPUS_KEY"] = "<VECTARA_CORPUS_KEY>"
os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"
from langchain_vectara import Vectara
from langchain_vectara.tools import (
    VectaraAddFiles,
    VectaraIngest,
    VectaraRAG,
    VectaraSearch,
)
from langchain_vectara.vectorstores import (
    ChainReranker,
    CorpusConfig,
    CustomerSpecificReranker,
    File,
    GenerationConfig,
    MmrReranker,
    SearchConfig,
    VectaraQueryConfig,
)
vectara = Vectara(vectara_api_key=os.getenv("VECTARA_API_KEY"))
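Note that this example uses the VectaraAddFiles tool: Vectara receives the file contents and performs all the necessary preprocessing, chunking, and embedding before storing them in its knowledge store.
Here we use a .txt file, but the same approach works for many other file types.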
corpus_key = os.getenv("VECTARA_CORPUS_KEY")
add_files_tool = VectaraAddFiles(
    name="add_files_tool",
    description="Upload files about state of the union",
    vectorstore=vectara,
    corpus_key=corpus_key,
)
file_obj = File(
    file_path="../document_loaders/example_data/state_of_the_union.txt",
    metadata={"source": "text_file"},
)
add_files_tool.run({"files": [file_obj]})
'Successfully uploaded 1 files to Vectara corpus test-langchain with IDs: state_of_the_union.txt'
Vectara RAG (retrieval augmented generation)
Next, create a VectaraQueryConfig object to control the retrieval and summarization options:
- Enable summarization, specifying that the LLM should use the top 7 matching chunks and respond in English
Now create the VectaraRAG object:
generation_config = GenerationConfig(
    max_used_search_results=7,
    response_language="eng",
    generation_preset_name="vectara-summary-ext-24-05-med-omni",
    enable_factual_consistency_score=True,
)
search_config = SearchConfig(
    corpora=[CorpusConfig(corpus_key=corpus_key)],
    limit=25,
    reranker=ChainReranker(
        rerankers=[
            CustomerSpecificReranker(reranker_id="rnk_272725719", limit=100),
            MmrReranker(diversity_bias=0.2, limit=100),
        ]
    ),
)
config = VectaraQueryConfig(
    search=search_config,
    generation=generation_config,
)
query_str = "what did Biden say?"
vectara_rag_tool = VectaraRAG(
    name="rag-tool",
    description="Get answers about state of the union",
    vectorstore=vectara,
    corpus_key=corpus_key,
    config=config,
)
Invocation
vectara_rag_tool.run(query_str)
'{\n "summary": "President Biden discussed several key topics in his recent statements. He emphasized the importance of keeping schools open and noted that with a high vaccination rate and reduced hospitalizations, most Americans can safely return to normal activities [1]. He addressed the need to hold social media platforms accountable for their impact on children and called for stronger privacy protections and mental health services [2]. Biden also announced measures against Russia, including preventing its central bank from defending the Ruble and targeting Russian oligarchs\' assets, as well as closing American airspace to Russian flights [3], [7]. Additionally, he reaffirmed the need to protect women\'s rights, particularly the right to choose as affirmed in Roe v. Wade [5].",\n "factual_consistency_score": 0.5415039\n}'
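As the output above shows, the RAG tool returns a JSON string with a "summary" field and a "factual_consistency_score" field. A minimal sketch of post-processing that string follows; the helper name `parse_rag_output` and the 0.5 threshold are assumptions made for illustration, not part of langchain-vectara, and the sample string is an abbreviated stand-in for a real response.

```python
import json
import re


def parse_rag_output(raw: str, min_score: float = 0.5) -> dict:
    """Parse the JSON string returned by the RAG tool.

    Hypothetical helper; the 0.5 threshold is an arbitrary illustration.
    """
    data = json.loads(raw)
    summary = data["summary"]
    # The score is only present when it was enabled in GenerationConfig.
    score = data.get("factual_consistency_score")
    # Collect the numbered citation markers such as [1] or [7] in the summary.
    markers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", summary)})
    return {
        "summary": summary,
        "score": score,
        "cited_markers": markers,
        "meets_threshold": score is not None and score >= min_score,
    }


# Abbreviated sample in the same shape as the tool output.
sample = (
    '{"summary": "Schools stay open [1]. New sanctions [3], [7].", '
    '"factual_consistency_score": 0.54}'
)
parsed = parse_rag_output(sample)
print(parsed["cited_markers"], parsed["meets_threshold"])
```

Whether a given score is acceptable depends on the application, so the threshold is best treated as a tunable parameter.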
Vectara as a LangChain retriever
The VectaraSearch tool can be used simply as a retriever.
In this case it behaves like any other LangChain retriever. The main use of this mode is semantic search, so summarization is disabled:
search_config = SearchConfig(
    corpora=[CorpusConfig(corpus_key=corpus_key)],
    limit=25,
    reranker=ChainReranker(
        rerankers=[
            CustomerSpecificReranker(reranker_id="rnk_272725719", limit=100),
            MmrReranker(diversity_bias=0.2, limit=100),
        ]
    ),
)
search_tool = VectaraSearch(
    name="Search tool",
    description="Search for information about state of the union",
    vectorstore=vectara,
    corpus_key=corpus_key,
    search_config=search_config,
)
search_tool.run({"query": "What did Biden say?"})
'[\n {\n "index": 0,\n "content": "The vast majority of federal workers will once again work in person. Our schools are open. Let\\u2019s keep it that way. Our kids need to be in school. And with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.9988395571708679\n },\n {\n "index": 1,\n "content": "Children were also struggling before the pandemic. Bullying, violence, trauma, and the harms of social media. As Frances Haugen, who is here with us tonight, has shown, we must hold social media platforms accountable for the national experiment they\\u2019re conducting on our children for profit. It\\u2019s time to strengthen privacy protections, ban targeted advertising to children, demand tech companies stop collecting personal data on our children. And let\\u2019s get all Americans the mental health services they need.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6355851888656616\n },\n {\n "index": 2,\n "content": "Preventing Russia\\u2019s central bank from defending the Russian Ruble making Putin\\u2019s $630 Billion \\u201cwar fund\\u201d worthless. We are choking off Russia\\u2019s access to technology that will sap its economic strength and weaken its military for years to come. 
Tonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. We are joining with our European allies to find and seize your yachts your luxury apartments your private jets.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6353664994239807\n },\n {\n "index": 3,\n "content": "When they came home, many of the world\\u2019s fittest and best trained warriors were never the same. Dizziness. \\n\\nA cancer that would put them in a flag-draped coffin. I know. \\n\\nOne of those soldiers was my son Major Beau Biden. We don\\u2019t know for sure if a burn pit was the cause of his brain cancer, or the diseases of so many of our troops. But I\\u2019m committed to finding out everything we can.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6315145492553711\n },\n {\n "index": 4,\n "content": "Let\\u2019s get it done once and for all. Advancing liberty and justice also requires protecting the rights of women. The constitutional right affirmed in Roe v. Wade\\u2014standing precedent for half a century\\u2014is under attack as never before. If we want to go forward\\u2014not backward\\u2014we must protect access to health care. 
Preserve a woman\\u2019s right to choose.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6307355165481567\n },\n {\n "index": 5,\n "content": "That\\u2019s why the Justice Department required body cameras, banned chokeholds, and restricted no-knock warrants for its officers. That\\u2019s why the American Rescue Plan provided $350 Billion that cities, states, and counties can use to hire more police and invest in proven strategies like community violence interruption\\u2014trusted messengers breaking the cycle of violence and trauma and giving young people hope. We should all agree: The answer is not to Defund the police. The answer is to FUND the police with the resources and training they need to protect our communities. I ask Democrats and Republicans alike: Pass my budget and keep our neighborhoods safe.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6283233761787415\n },\n {\n "index": 6,\n "content": "The U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs. We are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains. 
And tonight I am announcing that we will join our allies in closing off American air space to all Russian flights \\u2013 further isolating Russia \\u2013 and adding an additional squeeze \\u2013on their economy. The Ruble has lost 30% of its value.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6250241994857788\n },\n {\n "index": 7,\n "content": "Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. America will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. These steps will help blunt gas prices here at home. And I know the news about what\\u2019s happening can seem alarming.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6240909099578857\n },\n {\n "index": 8,\n "content": "So tonight I\\u2019m offering a Unity Agenda for the Nation. Four big things we can do together. First, beat the opioid epidemic. There is so much we can do. 
Increase funding for prevention, treatment, harm reduction, and recovery.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6232858896255493\n },\n {\n "index": 9,\n "content": "We won\\u2019t be able to compete for the jobs of the 21st Century if we don\\u2019t fix that. That\\u2019s why it was so important to pass the Bipartisan Infrastructure Law\\u2014the most sweeping investment to rebuild America in history. This was a bipartisan effort, and I want to thank the members of both parties who worked to make it happen. We\\u2019re done talking about infrastructure weeks. We\\u2019re going to have an infrastructure decade.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6227864027023315\n },\n {\n "index": 10,\n "content": "We\\u2019re going to have an infrastructure decade. It is going to transform America and put us on a path to win the economic competition of the 21st Century that we face with the rest of the world\\u2014particularly with China. As I\\u2019ve told Xi Jinping, it is never a good bet to bet against the American people. We\\u2019ll create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. 
And we\\u2019ll do it all to withstand the devastating effects of the climate crisis and promote environmental justice.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6180555820465088\n },\n {\n "index": 11,\n "content": "It delivered immediate economic relief for tens of millions of Americans. Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. And as my Dad used to say, it gave people a little breathing room. And unlike the $2 Trillion tax cut passed in the previous administration that benefitted the top 1% of Americans, the American Rescue Plan helped working people\\u2014and left no one behind. Lots of jobs. \\n\\nIn fact\\u2014our economy created over 6.5 Million new jobs just last year, more jobs created in one year \\nthan ever before in the history of America.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6175862550735474\n },\n {\n "index": 12,\n "content": "Our purpose is found. Our future is forged. Well I know this nation. We will meet the test. 
To protect freedom and liberty, to expand fairness and opportunity.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6163091659545898\n },\n {\n "index": 13,\n "content": "He rejected repeated efforts at diplomacy. He thought the West and NATO wouldn\\u2019t respond. And he thought he could divide us at home. We were ready. Here is what we did. We prepared extensively and carefully.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6160664558410645\n },\n {\n "index": 14,\n "content": "The federal government spends about $600 Billion a year to keep the country safe and secure. There\\u2019s been a law on the books for almost a century \\nto make sure taxpayers\\u2019 dollars support American jobs and businesses. Every Administration says they\\u2019ll do it, but we are actually doing it. We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. 
But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6155637502670288\n },\n {\n "index": 15,\n "content": "And while you\\u2019re at it, pass the Disclose Act so Americans can know who is funding our elections. Tonight, I\\u2019d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer\\u2014an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.6151937246322632\n },\n {\n "index": 16,\n "content": "He loved building Legos with their daughter. But cancer from prolonged exposure to burn pits ravaged Heath\\u2019s lungs and body. Danielle says Heath was a fighter to the very end. He didn\\u2019t know how to stop fighting, and neither did she. 
Through her pain she found purpose to demand we do better.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.5935490727424622\n },\n {\n "index": 17,\n "content": "Six days ago, Russia\\u2019s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. He met the Ukrainian people.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.5424350500106812\n },\n {\n "index": 18,\n "content": "All told, we created 369,000 new manufacturing jobs in America just last year. Powered by people I\\u2019ve met like JoJo Burgess, from generations of union steelworkers from Pittsburgh, who\\u2019s here with us tonight. As Ohio Senator Sherrod Brown says, \\u201cIt\\u2019s time to bury the label \\u201cRust Belt.\\u201d It\\u2019s time. \\n\\nBut with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. 
Inflation is robbing them of the gains they might otherwise feel.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.4970792531967163\n },\n {\n "index": 19,\n "content": "Putin\\u2019s latest attack on Ukraine was premeditated and unprovoked. He rejected repeated efforts at diplomacy. He thought the West and NATO wouldn\\u2019t respond. And he thought he could divide us at home. We were ready. Here is what we did.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.4501495063304901\n },\n {\n "index": 20,\n "content": "And with an unwavering resolve that freedom will always triumph over tyranny. Six days ago, Russia\\u2019s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.35465705394744873\n },\n {\n "index": 21,\n "content": "But most importantly as Americans. 
With a duty to one another to the American people to the Constitution. And with an unwavering resolve that freedom will always triumph over tyranny. Six days ago, Russia\\u2019s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.3056836426258087\n },\n {\n "index": 22,\n "content": "But cancer from prolonged exposure to burn pits ravaged Heath\\u2019s lungs and body. Danielle says Heath was a fighter to the very end. He didn\\u2019t know how to stop fighting, and neither did she. Through her pain she found purpose to demand we do better. Tonight, Danielle\\u2014we are.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.30382269620895386\n },\n {\n "index": 23,\n "content": "Danielle says Heath was a fighter to the very end. He didn\\u2019t know how to stop fighting, and neither did she. Through her pain she found purpose to demand we do better. Tonight, Danielle\\u2014we are. 
The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.1369067132472992\n },\n {\n "index": 24,\n "content": "Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. In this struggle as President Zelenskyy said in his speech to the European Parliament \\u201cLight will win over darkness.\\u201d The Ukrainian Ambassador to the United States is here tonight. Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world.",\n "source": "text_file",\n "metadata": {\n "X-TIKA:Parsed-By": "org.apache.tika.parser.csv.TextAndCSVParser",\n "Content-Encoding": "UTF-8",\n "X-TIKA:detectedEncoding": "UTF-8",\n "X-TIKA:encodingDetector": "UniversalEncodingDetector",\n "Content-Type": "text/plain; charset=UTF-8",\n "source": "text_file",\n "framework": "langchain"\n },\n "score": 0.04977428913116455\n }\n]'
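The search tool output above is a JSON string holding a list of results, each with "index", "content", "source", "metadata", and "score" fields. The sketch below shows one way to keep only the higher-scoring passages; the helper name `top_passages` and the 0.6 cutoff are illustrative assumptions, and the sample is a heavily abbreviated stand-in for the real output.

```python
import json


def top_passages(raw: str, min_score: float = 0.6) -> list[tuple[float, str]]:
    """Return (score, content) pairs at or above min_score, highest first.

    Hypothetical helper; the cutoff is an arbitrary illustration.
    """
    results = json.loads(raw)
    kept = [(r["score"], r["content"]) for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Abbreviated sample in the same shape as the tool output.
sample = json.dumps(
    [
        {"index": 0, "content": "Our schools are open.", "source": "text_file",
         "metadata": {}, "score": 0.99},
        {"index": 1, "content": "Light will win over darkness.", "source": "text_file",
         "metadata": {}, "score": 0.05},
    ]
)
for score, text in top_passages(sample):
    print(f"{score:.2f}  {text}")
```

In the run shown above, scores drop sharply toward the end of the list, so a cutoff like this can trim weak matches before they are passed to a downstream component.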
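Chaining with Vectara tools
You can chain Vectara tools with other LangChain components. This example shows how to:
- Set up a ChatOpenAI model for additional processing
- Create a custom prompt template for specific summarization requirements
- Chain multiple components together using LangChain's Runnable interface
- Process and format Vectara's JSON response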
import json
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableSerializable
from langchain_openai.chat_models import ChatOpenAI
model = ChatOpenAI(temperature=0)
# Create a prompt template
template = """
Based on the following information from the State of the Union address:
{rag_result}
Please provide a concise summary that focuses on the key points mentioned.
If there are any specific numbers or statistics, be sure to include them.
"""
prompt = ChatPromptTemplate.from_template(template)
# Create a function to get RAG results
def get_rag_result(query: str) -> str:
    # The RAG tool returns a JSON string; keep only the generated summary.
    result = vectara_rag_tool.run(query)
    result_dict = json.loads(result)
    return result_dict["summary"]
# Create the chain; the dict maps the prompt variable to the function that fills it
chain: RunnableSerializable = (
    {"rag_result": get_rag_result} | prompt | model | StrOutputParser()
)
# Run the chain
chain.invoke("What were the key economic points in Biden's speech?")
"President Biden's State of the Union address highlighted key economic points, including closing the coverage gap and making savings permanent, cutting energy costs by $500 annually through climate change initiatives, and providing tax credits for energy efficiency. He emphasized doubling clean energy production and reducing electric vehicle costs. Biden proposed cutting child care costs, making housing more affordable, and offering Pre-K for young children. He assured that no one earning under $400,000 would face new taxes and emphasized the need for a fair tax system. His plan to fight inflation focuses on lowering costs without reducing wages, increasing domestic production, and closing tax loopholes for the wealthy. Additionally, he advocated for raising the minimum wage, extending the Child Tax Credit, and ensuring fair pay and opportunities for workers."
Use within an agent
The code below shows how to create an agent that uses the Vectara tools with LangChain.
import json
from langchain.messages import HumanMessage
from langchain_openai.chat_models import ChatOpenAI
from langchain.agents import create_agent
# Set up the tools and LLM
tools = [vectara_rag_tool]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Construct the ReAct agent
agent_executor = create_agent(model, tools)
question = (
    "What is an API key? What is a JWT token? When should I use one or the other?"
)
input_data = {"messages": [HumanMessage(content=question)]}
agent_executor.invoke(input_data)
{'messages': [HumanMessage(content='What is an API key? What is a JWT token? When should I use one or the other?', additional_kwargs={}, response_metadata={}, id='2d0d23c4-ca03-4164-8417-232ce12b47df'),
AIMessage(content="An API key and a JWT (JSON Web Token) are both methods used for authentication and authorization in web applications, but they serve different purposes and have different characteristics.\n\n### API Key\n- **Definition**: An API key is a unique identifier used to authenticate a client making requests to an API. It is typically a long string of characters that is passed along with the API request.\n- **Usage**: API keys are often used for simple authentication scenarios where the client needs to be identified, but there is no need for complex user authentication or session management.\n- **Security**: API keys can be less secure than other methods because they are often static and can be easily exposed if not handled properly. They should be kept secret and not included in public code repositories.\n- **When to Use**: Use API keys for server-to-server communication, when you need to track usage, or when you want to restrict access to certain features of an API.\n\n### JWT (JSON Web Token)\n- **Definition**: A JWT is a compact, URL-safe means of representing claims to be transferred between two parties. It consists of three parts: a header, a payload, and a signature. The payload typically contains user information and claims.\n- **Usage**: JWTs are commonly used for user authentication and authorization in web applications. They allow for stateless authentication, meaning the server does not need to store session information.\n- **Security**: JWTs can be more secure than API keys because they can include expiration times and can be signed to verify their authenticity. However, if a JWT is compromised, it can be used until it expires.\n- **When to Use**: Use JWTs when you need to authenticate users, manage sessions, or pass claims between parties securely. They are particularly useful in single-page applications (SPAs) and microservices architectures.\n\n### Summary\n- **API Key**: Best for simple authentication and tracking API usage. 
Less secure and static.\n- **JWT**: Best for user authentication and authorization with claims. More secure and supports stateless sessions.\n\nIn general, choose the method that best fits your application's security requirements and architecture.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 436, 'prompt_tokens': 66, 'total_tokens': 502, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_dbaca60df0', 'id': 'chatcmpl-BPZK7UZveFJrGkT3iwjNQ2XHCmbqF', 'finish_reason': 'stop', 'logprobs': None}, id='run-4717221a-cd77-4627-aa34-3ee1b2a3803e-0', usage_metadata={'input_tokens': 66, 'output_tokens': 436, 'total_tokens': 502, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}
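The agent result above is a dict whose "messages" list ends with the model's final reply. A minimal sketch of extracting that reply follows. To stay self-contained it uses a hypothetical `StubMessage` stand-in rather than real LangChain message classes, and `final_answer` is likewise an illustrative helper; it assumes the last message carries plain string content, as in the output above.

```python
from dataclasses import dataclass


@dataclass
class StubMessage:
    """Stand-in for a LangChain message, which also exposes a .content attribute."""

    content: str


def final_answer(result: dict) -> str:
    """Return the content of the last message in an agent result dict."""
    messages = result.get("messages", [])
    if not messages:
        raise ValueError("agent result contains no messages")
    return messages[-1].content


# Stub data in the same shape as the agent output: the question, then the reply.
stub_result = {
    "messages": [
        StubMessage("What is an API key?"),
        StubMessage("An API key is ..."),
    ]
}
print(final_answer(stub_result))
```

With the real agent, the same function could be applied to the value returned by agent_executor.invoke(input_data).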
VectaraIngest example
The VectaraIngest tool lets you ingest text content directly into a Vectara corpus. This is useful when you have text you want to add to the corpus without first creating a file.
Here is an example of how to use it:
ingest_tool = VectaraIngest(
    name="ingest_tool",
    description="Add new documents about planets",
    vectorstore=vectara,
    corpus_key=corpus_key,
)
# Test ingest functionality
texts = ["Mars is a red planet.", "Venus has a thick atmosphere."]
metadatas = [{"type": "planet Mars"}, {"type": "planet Venus"}]
ingest_tool.run(
    {
        "texts": texts,
        "metadatas": metadatas,
        "doc_metadata": {"test_case": "langchain tool"},
    }
)
'Successfully ingested 2 documents into Vectara corpus test-langchain with IDs: 0de5bbb6c6f0ac632c8d6cda43f02929, 5021e73c9a9128b05c7a94b299744190'
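In the example above, texts and metadatas are parallel lists, so a length mismatch is easy to introduce when the inputs are built dynamically. The sketch below assembles the same input dict with a basic consistency check; `build_ingest_payload` is a hypothetical helper written for illustration, not part of langchain-vectara, and it only mirrors the keys used in the example.

```python
from typing import Optional


def build_ingest_payload(
    texts: list[str],
    metadatas: Optional[list[dict]] = None,
    doc_metadata: Optional[dict] = None,
) -> dict:
    """Assemble the input dict in the shape used by the ingest example.

    Hypothetical helper: verifies that each text has a matching metadata entry.
    """
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError(
            f"got {len(texts)} texts but {len(metadatas)} metadata entries"
        )
    payload: dict = {"texts": list(texts)}
    # Optional keys are included only when a value was supplied.
    if metadatas is not None:
        payload["metadatas"] = list(metadatas)
    if doc_metadata is not None:
        payload["doc_metadata"] = dict(doc_metadata)
    return payload


payload = build_ingest_payload(
    ["Mars is a red planet.", "Venus has a thick atmosphere."],
    [{"type": "planet Mars"}, {"type": "planet Venus"}],
    {"test_case": "langchain tool"},
)
```

The resulting dict can then be passed to ingest_tool.run(payload) in place of the literal dict.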
API reference
For details, see the Vectara tools implementation.