ChatPremAI

PremAI is an all-in-one platform that simplifies building robust, production-ready applications powered by Generative AI. By streamlining the development process, PremAI lets you focus on improving the user experience and driving the overall growth of your application. This example covers how to use LangChain to interact with different chat models through ChatPremAI.

Installation and setup

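First, install langchain and premai-sdk, together with langchain-community, which provides the ChatPremAI import used below. You can install them with the following command:

pip install premai langchain langchain-community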

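Before proceeding, make sure you have created an account on PremAI and have already created a project. If not, refer to the quick start guide to get started with the PremAI platform. Create your first project and grab your API key.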

from langchain_community.chat_models import ChatPremAI
from langchain.messages import HumanMessage, SystemMessage

Setting up the PremAI client in LangChain

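After importing the required modules, let's set up the client. For now, assume that the project_id is 1234. Make sure to use your own project ID, otherwise the call will fail with an error. To use LangChain with Prem, you do not need to pass a model name or set any parameters on the chat client. By default it uses the model name and parameters configured in the LaunchPad.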

Note: If you change the model or any other parameter such as temperature or max_tokens while setting up the client, it overrides the existing default configuration used in the LaunchPad.

import getpass
import os

# The first step is to set the environment variable.
# You can also pass the API key when instantiating the model, but setting
# it as an environment variable is the recommended practice.

if os.environ.get("PREMAI_API_KEY") is None:
    os.environ["PREMAI_API_KEY"] = getpass.getpass("PremAI API Key:")

# By default the client uses the model that was deployed through the platform;
# in this case that is "gpt-4o".

chat = ChatPremAI(project_id=1234, model_name="gpt-4o")
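For non-interactive runs (CI jobs, servers), a getpass prompt would block, so a fail-fast check may be more convenient. The following is a minimal sketch, assuming the key lives in PREMAI_API_KEY as above; require_api_key is a hypothetical helper written for this illustration and is not part of premai or LangChain.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None, name: str = "PREMAI_API_KEY"
) -> str:
    """Return the API key found in `env`, or raise instead of prompting."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key


# Example with an explicit mapping, so no real secret is read or printed.
key = require_api_key({"PREMAI_API_KEY": "dummy-key"})
print(f"found a key of length {len(key)}")
```

Calling require_api_key() with no arguments reads from os.environ, so it can sit directly before the ChatPremAI(...) line.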

Chat Completions

ChatPremAI supports two methods: invoke (which is the same as generate) and stream. The first returns a static result, whereas the second streams tokens one by one. Here is how you can generate chat-like completions.

human_message = HumanMessage(content="Who are you?")

response = chat.invoke([human_message])
print(response.content)

I am an AI language model created by OpenAI, designed to assist with answering questions and providing information based on the context provided. How can I help you today?

Looks interesting, right? I set my default launchpad system prompt to: Always sound like a pirate. You can also override the default system prompt if you need to. Here is how.

system_message = SystemMessage(content="You are a friendly assistant.")
human_message = HumanMessage(content="Who are you?")

chat.invoke([system_message, human_message])

AIMessage(content="I'm your friendly assistant! How can I help you today?", response_metadata={'document_chunks': [{'repository_id': 1985, 'document_id': 1306, 'chunk_id': 173899, 'document_name': '[D] Difference between sparse and dense informati…', 'similarity_score': 0.3209080100059509, 'content': "with the difference or anywhere\nwhere I can read about it?\n\n\n      17                  9\n\n\n      u/ScotiabankCanada        •  Promoted\n\n\n                       Accelerate your study permit process\n                       with Scotiabank's Student GIC\n                       Program. We're here to help you tur…\n\n\n                       startright.scotiabank.com         Learn More\n\n\n                            Add a Comment\n\n\nSort by:   Best\n\n\n      DinosParkour      • 1y ago\n\n\n     Dense Retrieval (DR) m"}]}, id='run-510bbd0e-3f8f-4095-9b1f-c2d29fd89719-0')

You can also pass a system prompt together with generation parameters at invocation time, like this:

chat.invoke([system_message, human_message], temperature=0.7, max_tokens=10, top_p=0.95)

/home/anindya/prem/langchain/libs/community/langchain_community/chat_models/premai.py:355: UserWarning: WARNING: Parameter top_p is not supported in kwargs.
  warnings.warn(f"WARNING: Parameter {key} is not supported in kwargs.")

AIMessage(content="Hello! I'm your friendly assistant. How can I", response_metadata={'document_chunks': [{'repository_id': 1985, 'document_id': 1306, 'chunk_id': 173899, 'document_name': '[D] Difference between sparse and dense informati…', 'similarity_score': 0.3209080100059509, 'content': "with the difference or anywhere\nwhere I can read about it?\n\n\n      17                  9\n\n\n      u/ScotiabankCanada        •  Promoted\n\n\n                       Accelerate your study permit process\n                       with Scotiabank's Student GIC\n                       Program. We're here to help you tur…\n\n\n                       startright.scotiabank.com         Learn More\n\n\n                            Add a Comment\n\n\nSort by:   Best\n\n\n      DinosParkour      • 1y ago\n\n\n     Dense Retrieval (DR) m"}]}, id='run-c4b06b98-4161-4cca-8495-fd2fc98fa8f8-0')

If you place a system prompt here, it overrides the system prompt that was fixed when you deployed the application from the platform.
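The UserWarning in the output above suggests that an unsupported keyword argument (top_p in that run) is reported through Python's warnings module rather than raised as an error. The sketch below shows one way to capture such warnings in a test using only the standard library. fake_invoke and its SUPPORTED set are hypothetical stand-ins for the real client; the actual list of accepted parameters, and whether unsupported ones are dropped, is an assumption here.

```python
import warnings

# Hypothetical set of accepted keyword arguments, for illustration only.
SUPPORTED = {"temperature", "max_tokens"}


def fake_invoke(prompt: str, **kwargs) -> dict:
    """Stand-in for a chat call: warn about unknown kwargs and drop them."""
    accepted = {}
    for key, value in kwargs.items():
        if key in SUPPORTED:
            accepted[key] = value
        else:
            warnings.warn(f"WARNING: Parameter {key} is not supported in kwargs.")
    return accepted


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is filtered out
    used = fake_invoke("Who are you?", temperature=0.7, max_tokens=10, top_p=0.95)

ignored = [str(w.message) for w in caught]
print(used)
print(ignored)
```

With the real client, the same catch_warnings block can wrap chat.invoke(...) to find out which parameters were not accepted.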

Native RAG support with Prem Repositories

Prem Repositories let users upload documents (.txt, .pdf, etc.) and connect those repositories to the LLMs. You can think of Prem Repositories as native RAG, where each repository can be treated as a vector database. You can connect multiple repositories. See the Prem documentation for more details on repositories. Repositories are also supported in langchain premai. Here is how.

query = "Which models are used for dense retrieval"
repository_ids = [
    1985,
]
repositories = dict(ids=repository_ids, similarity_threshold=0.3, limit=3)

First, we define the repositories argument with some repository IDs. Make sure each ID is a valid repository ID. See the Prem documentation for details on how to get a repository ID.

Note: Similar to model_name, passing the repositories argument at invocation time potentially overrides the repositories connected in the launchpad.
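Because a malformed repositories argument only surfaces as an error at call time, a small local check can help. The sketch below builds the same dictionary as above and rejects obviously bad values. build_repositories is a hypothetical helper, and the [0, 1] range for similarity_threshold is an assumption; the check cannot confirm that an ID actually exists on the platform.

```python
from typing import Iterable


def build_repositories(
    ids: Iterable[int], similarity_threshold: float = 0.3, limit: int = 3
) -> dict:
    """Build the `repositories` argument, rejecting obviously bad values."""
    ids = list(ids)
    if not ids or not all(isinstance(i, int) and not isinstance(i, bool) for i in ids):
        raise ValueError("ids must be a non-empty list of integer repository ids")
    if not 0.0 <= similarity_threshold <= 1.0:  # assumed valid range
        raise ValueError("similarity_threshold is assumed to lie in [0, 1]")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return dict(ids=ids, similarity_threshold=similarity_threshold, limit=limit)


checked = build_repositories([1985])
print(checked)
```

The returned dictionary has the same shape as the repositories value defined earlier and can be passed in the same way.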

Now we connect the repository to our chat object to invoke RAG-based generation.

import json

response = chat.invoke(query, max_tokens=100, repositories=repositories)

print(response.content)
print(json.dumps(response.response_metadata, indent=4))

Dense retrieval models typically include:

1. **BERT-based Models**: Such as DPR (Dense Passage Retrieval) which uses BERT for encoding queries and passages.
2. **ColBERT**: A model that combines BERT with late interaction mechanisms.
3. **ANCE (Approximate Nearest Neighbor Negative Contrastive Estimation)**: Uses BERT and focuses on efficient retrieval.
4. **TCT-ColBERT**: A variant of ColBERT that uses a two-tower
{
    "document_chunks": [
        {
            "repository_id": 1985,
            "document_id": 1306,
            "chunk_id": 173899,
            "document_name": "[D] Difference between sparse and dense informati\u2026",
            "similarity_score": 0.3209080100059509,
            "content": "with the difference or anywhere\nwhere I can read about it?\n\n\n      17                  9\n\n\n      u/ScotiabankCanada        \u2022  Promoted\n\n\n                       Accelerate your study permit process\n                       with Scotiabank's Student GIC\n                       Program. We're here to help you tur\u2026\n\n\n                       startright.scotiabank.com         Learn More\n\n\n                            Add a Comment\n\n\nSort by:   Best\n\n\n      DinosParkour      \u2022 1y ago\n\n\n     Dense Retrieval (DR) m"
        }
    ]
}
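The metadata printed above can also be inspected programmatically, for example to list which documents were retrieved and how closely they matched. A minimal sketch, using a sample dictionary copied from the output above (with the long content field shortened); summarize_chunks is a hypothetical helper name.

```python
# Sample metadata copied from the output above, with `content` shortened.
response_metadata = {
    "document_chunks": [
        {
            "repository_id": 1985,
            "document_id": 1306,
            "chunk_id": 173899,
            "document_name": "[D] Difference between sparse and dense informati\u2026",
            "similarity_score": 0.3209080100059509,
            "content": "Dense Retrieval (DR) m",
        }
    ]
}


def summarize_chunks(metadata: dict) -> list:
    """Return (document_name, rounded score) pairs, best match first."""
    chunks = metadata.get("document_chunks", [])
    ranked = sorted(chunks, key=lambda c: c["similarity_score"], reverse=True)
    return [(c["document_name"], round(c["similarity_score"], 3)) for c in ranked]


for name, score in summarize_chunks(response_metadata):
    print(f"{score:.3f}  {name}")
```

With a live response, pass response.response_metadata instead of the sample dictionary.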

Ideally, you do not need to pass repository IDs here to get Retrieval Augmented Generation. You can get the same result if you have connected the repositories in the Prem platform.

Prem Templates

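Writing prompt templates can get messy. Prompt templates are long, hard to manage, and must be tweaked continuously to improve them and keep them consistent across the application. With Prem, writing and managing prompts becomes much easier. The Templates tab inside the launchpad lets you write as many prompts as you need and use them from the SDK to run your application with those prompts. See the Prem documentation for more details on prompt templates.

To use Prem Templates natively with LangChain, you need to pass an id to each HumanMessage. This id must be the name of a variable in your prompt template, and the content of the HumanMessage must be the value of that variable. For example, if your prompt template is: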

Say hello to my name and say a feel-good quote
from my age. My name is: {name} and age is {age}

then your human_messages should look like this:

human_messages = [
    HumanMessage(content="Shawn", id="name"),
    HumanMessage(content="22", id="age"),
]

Pass these human_messages to the ChatPremAI client. Note: Do not forget to pass the additional template_id to invoke generation with Prem Templates. If you are not familiar with template_id, you can learn more about it in the documentation. Here is an example:

template_id = "78069ce8-xxxxx-xxxxx-xxxx-xxx"
response = chat.invoke(human_messages, template_id=template_id)
print(response.content)
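Since every placeholder in the template needs a matching message id, it can help to check the mapping locally before making a request. The sketch below uses only the standard library and assumes the template uses Python-style {name} placeholders as in the example above. The actual rendering happens on the Prem platform, so the formatted string is only a local preview; missing_variables is a hypothetical helper.

```python
from string import Formatter

template = (
    "Say hello to my name and say a feel-good quote\n"
    "from my age. My name is: {name} and age is {age}"
)

# Maps each message id to its content, mirroring the HumanMessage objects above.
message_values = {"name": "Shawn", "age": "22"}


def missing_variables(template: str, values: dict) -> set:
    """Return the placeholder names in `template` that have no value."""
    placeholders = {field for _, field, _, _ in Formatter().parse(template) if field}
    return placeholders - values.keys()


print(missing_variables(template, message_values))
print(template.format(**message_values))
```

An empty set means every variable in the template has a corresponding message.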

The Prem Templates feature is also available with streaming.

Streaming

In this section, let's see how to stream tokens using langchain and PremAI. Here is how.

import sys

for chunk in chat.stream("hello how are you"):
    sys.stdout.write(chunk.content)
    sys.stdout.flush()

It looks like your message got cut off. If you need information about Dense Retrieval (DR) or any other topic, please provide more details or clarify your question.

Similar to the above, if you want to override the system prompt and the generation parameters, add the following:

import sys

# For experimentation you can override the system prompt here as well.
# However, overriding the system prompt of an already deployed model
# is not recommended.

for chunk in chat.stream(
    "hello how are you",
    system_prompt="act like a dog",
    temperature=0.7,
    max_tokens=200,
):
    sys.stdout.write(chunk.content)
    sys.stdout.flush()

Woof! 🐾 How can I help you today? Want to play fetch or maybe go for a walk 🐶🦴
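When streaming, it is often useful to keep the complete text in addition to printing each piece as it arrives, for example to log it or append it to a chat history. A minimal sketch under that assumption; fake_stream is a hypothetical stand-in for chat.stream(...) so the example runs without network access.

```python
import sys
from typing import Iterable, Iterator


def fake_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for chat.stream(...): yields word-sized pieces."""
    for word in text.split(" "):
        yield word + " "


def print_and_collect(pieces: Iterable[str]) -> str:
    """Write each piece as it arrives and return the full text."""
    collected = []
    for piece in pieces:
        sys.stdout.write(piece)
        sys.stdout.flush()
        collected.append(piece)
    return "".join(collected).strip()


full_text = print_and_collect(fake_stream("Woof! How can I help you today?"))
```

With the real client, the pieces would come from a generator such as (chunk.content for chunk in chat.stream("hello how are you")).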

Tool/Function Calling

LangChain PremAI supports tool/function calling. Tool/function calling allows a model to respond to a given prompt by generating output that matches a user-defined schema.

You can find all the details about tool calling in the Prem documentation.
You can learn more about LangChain tool calling in the corresponding part of the LangChain docs.

Note: The current version of LangChain ChatPremAI does not support function/tool calling together with streaming. Streaming support for function calling is coming soon.

Passing tools to the model

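To pass tools and let the LLM choose which tool to call, you need to pass a tool schema. A tool schema is a function definition with a proper docstring describing what the function does, what each of its arguments is, and so on. Below are some simple arithmetic functions with their schemas. Note: When defining a function/tool schema, do not forget to add information about the function arguments, otherwise it will raise an error.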

from langchain.tools import tool
from pydantic import BaseModel, Field


# Define the schema for function arguments
class OperationInput(BaseModel):
    a: int = Field(description="First number")
    b: int = Field(description="Second number")


# Now define the function where schema for argument will be OperationInput
@tool("add", args_schema=OperationInput, return_direct=True)
def add(a: int, b: int) -> int:
    """Adds `a` and `b`.

    Args:
        a: First int
        b: Second int
    """
    return a + b


@tool("multiply", args_schema=OperationInput, return_direct=True)
def multiply(a: int, b: int) -> int:
    """Multiplies a and b.

    Args:
        a: First int
        b: Second int
    """
    return a * b

Binding tool schemas to the LLM

We now use the bind_tools method to convert the functions above into "tools" and bind them to the model. This means the tool information is passed along every time the model is invoked.

tools = [add, multiply]
llm_with_tools = chat.bind_tools(tools)

After this, we get the response from the model that is now bound to the tools.

query = "What is 3 * 12? Also, what is 11 + 49?"

messages = [HumanMessage(query)]
ai_msg = llm_with_tools.invoke(messages)

As we can see, when the chat model is bound to tools, it returns calls to the appropriate tools, in order, based on the given prompt.

ai_msg.tool_calls

[{'name': 'multiply',
  'args': {'a': 3, 'b': 12},
  'id': 'call_A9FL20u12lz6TpOLaiS6rFa8'},
 {'name': 'add',
  'args': {'a': 11, 'b': 49},
  'id': 'call_MPKYGLHbf39csJIyb5BZ9xIk'}]

We append the message shown above to the message list so that it acts as context and makes the LLM aware of all the functions it has called.

messages.append(ai_msg)

Tool calling happens in two phases:

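In the first call, we gather all the tools that the LLM decided to call, so that their results can be supplied as additional context for a more accurate, hallucination-free answer.
In the second call, we parse the set of tools chosen by the LLM and run them (in our case, the functions we defined, with the arguments extracted by the LLM), then pass the results back to the LLM.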

from langchain.messages import ToolMessage

for tool_call in ai_msg.tool_calls:
    selected_tool = {"add": add, "multiply": multiply}[tool_call["name"].lower()]
    tool_output = selected_tool.invoke(tool_call["args"])
    messages.append(ToolMessage(tool_output, tool_call_id=tool_call["id"]))
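The loop above assumes that every tool name returned by the model exists in the lookup dictionary; an unexpected name would raise a KeyError. The sketch below shows the same dispatch step with plain Python functions and a fallback for unknown tools. plain_add, plain_multiply, REGISTRY, and run_tool_calls are hypothetical names used only for this illustration, and the tool_calls list copies the shape of ai_msg.tool_calls shown earlier.

```python
# Plain-Python stand-ins for the decorated tools defined earlier.
def plain_add(a: int, b: int) -> int:
    return a + b


def plain_multiply(a: int, b: int) -> int:
    return a * b


REGISTRY = {"add": plain_add, "multiply": plain_multiply}

# Tool calls in the same shape as ai_msg.tool_calls shown earlier.
tool_calls = [
    {"name": "multiply", "args": {"a": 3, "b": 12}, "id": "call_A9FL20u12lz6TpOLaiS6rFa8"},
    {"name": "add", "args": {"a": 11, "b": 49}, "id": "call_MPKYGLHbf39csJIyb5BZ9xIk"},
]


def run_tool_calls(calls: list) -> dict:
    """Map each tool_call id to its result, or to an error string for unknown tools."""
    results = {}
    for call in calls:
        func = REGISTRY.get(call["name"].lower())
        if func is None:
            results[call["id"]] = f"unknown tool: {call['name']}"
        else:
            results[call["id"]] = func(**call["args"])
    return results


print(run_tool_calls(tool_calls))
```

Returning an error string instead of raising lets the result still be sent back to the model as a tool message, which is one possible design choice rather than a requirement.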

Finally, we call the LLM (bound to the tools) with the function responses added to its context.

response = llm_with_tools.invoke(messages)
print(response.content)

The final answers are:

- 3 * 12 = 36
- 11 + 49 = 60

Defining tool schemas: Pydantic class

Above we showed how to define a schema using the tool decorator, but we can equivalently define the schema using Pydantic. Pydantic is useful when your tool inputs are more complex:

from langchain_core.output_parsers.openai_tools import PydanticToolsParser

class add(BaseModel):
    """Add two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

class multiply(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

tools = [add, multiply]

Now we can bind them to the chat model and get the result directly:

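# Re-bind so the model sees the Pydantic schemas defined above.
llm_with_tools = chat.bind_tools(tools)
chain = llm_with_tools | PydanticToolsParser(tools=[multiply, add])
chain.invoke(query)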

[multiply(a=3, b=12), add(a=11, b=49)]

Now, as done above, we parse this, run the functions, and call the LLM once again to get the result.
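The parsed output is a list of typed objects, so running the functions can be done by dispatching on each object's type. The sketch below illustrates that step with dataclasses standing in for the Pydantic classes, so it runs without any third-party package. AddCall, MultiplyCall, and HANDLERS are hypothetical names, and the parsed list mirrors the output shown above.

```python
from dataclasses import dataclass


@dataclass
class AddCall:  # stand-in for a parsed `add` object
    a: int
    b: int


@dataclass
class MultiplyCall:  # stand-in for a parsed `multiply` object
    a: int
    b: int


# One handler per parsed type; each receives the parsed object itself.
HANDLERS = {
    AddCall: lambda call: call.a + call.b,
    MultiplyCall: lambda call: call.a * call.b,
}

# Same values as the parsed output shown above.
parsed = [MultiplyCall(a=3, b=12), AddCall(a=11, b=49)]
results = [HANDLERS[type(call)](call) for call in parsed]
print(results)
```

With the real Pydantic classes, the same dictionary can be keyed by the add and multiply classes instead of the stand-ins.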
