---
title: Kinetica Language To SQL Chat Model
---

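This notebook demonstrates how to use Kinetica to transform natural language into SQL and simplify data retrieval. The demo is intended to show the mechanics of creating and using a chain rather than the capabilities of the LLM.

## Overview

With the Kinetica LLM workflow, you create an LLM context in the database that provides the information needed for inferencing, including tables, annotations, rules, and samples. Calling `ChatKinetica.load_messages_from_context()` retrieves the context information from the database so that it can be used to create a chat prompt.

The chat prompt consists of a [`SystemMessage`](https://reference.langchain.com/python/langchain/messages/#langchain.messages.SystemMessage) followed by `HumanMessage`/`AIMessage` pairs that contain the samples (question/SQL pairs). You can append more sample pairs to this list, but it is not intended to facilitate a typical natural language conversation.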
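As a rough illustration of that message layout, the following plain-Python sketch builds the same sequence of roles from a system text and a list of question/SQL samples. The helper `build_sample_messages` and the `(role, text)` tuples are assumptions made for this example; they only mirror the shape described above and are not what `load_messages_from_context()` does internally.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text); a simplified stand-in for message objects


def build_sample_messages(
    system_text: str, samples: List[Tuple[str, str]]
) -> List[Message]:
    """Return one system message followed by human/ai pairs, one pair per sample."""
    messages: List[Message] = [("system", system_text)]
    for question, sql in samples:
        messages.append(("human", question))  # the natural language question
        messages.append(("ai", sql))  # the SQL that answers it
    return messages


messages = build_sample_messages(
    "CREATE TABLE demo.user_profiles AS (...);",
    [("How many male users are there?", "select count(1) from demo.user_profiles where sex = 'M';")],
)

# Appending another sample keeps the human/ai pairing intact.
messages.append(("human", "How many female users are there?"))
messages.append(("ai", "select count(1) from demo.user_profiles where sex = 'F';"))

for role, text in messages:
    print(f"{role}: {text}")
```

Keeping each question directly before its SQL is what lets the model treat the pairs as worked examples rather than as a conversation history.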

When you create a chain from the chat prompt and run it, the Kinetica LLM generates SQL from the input. Optionally, you can use `KineticaSqlOutputParser` to execute the SQL and return the result as a dataframe.

Currently, two LLMs are supported for SQL generation:

1. **Kinetica SQL-GPT**: This LLM is based on the OpenAI ChatGPT API.
2. **Kinetica SqlAssist**: This LLM is purpose-built to integrate with the Kinetica database, and it can run on secure customer premises.

This demo uses **SqlAssist**. See the [Kinetica Documentation site](https://docs.kinetica.com/7.1/sql-gpt/concepts/) for more information.

## Prerequisites

To get started you need a Kinetica DB instance. If you don't have one, you can obtain a [free development instance](https://cloud.kinetica.com/trynow).

Install the following packages:

```bash
# Install LangChain community and core packages
pip install -qU langchain-core langchain-community

# Install Kinetica DB connection package
pip install -qU 'gpudb>=7.2.0.8' typeguard pandas tqdm

# Install packages needed for this tutorial
pip install -qU faker ipykernel
```

## Database connection

Set the database connection in the following environment variables. If you are using a virtual environment, you can set them in the project's `.env` file:

- `KINETICA_URL`: Database connection URL
- `KINETICA_USER`: Database user
- `KINETICA_PASSWD`: Secure password

If you can create a `ChatKinetica` instance, you are successfully connected.

```python
from langchain_community.chat_models.kinetica import ChatKinetica

kinetica_llm = ChatKinetica()

# Test table we will create
table_name = "demo.user_profiles"

# LLM Context we will create
kinetica_ctx = "demo.test_llm_ctx"
```
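If creating the `ChatKinetica` instance fails, a quick look at the environment can help narrow down the cause. The sketch below uses only the standard library and checks the three variable names listed above; the helper `missing_kinetica_vars` is a hypothetical name introduced for this example and is not part of `gpudb` or LangChain.

```python
import os
from typing import List, Mapping

# Variable names taken from the list in this section.
REQUIRED_VARS = ("KINETICA_URL", "KINETICA_USER", "KINETICA_PASSWD")


def missing_kinetica_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_kinetica_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All Kinetica connection variables are set.")
```

This only confirms that the variables exist. It does not validate the URL or the credentials, which are checked when the connection is actually made.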

## Create test data

Before we can generate SQL, we need to create a Kinetica table and an LLM context that can inference the table.

### Create some fake user profiles

We use the `faker` package to create a dataframe with 100 fake profiles.

```python
from typing import Generator

import pandas as pd
from faker import Faker

# Seed the generator so the fake profiles are reproducible
Faker.seed(5467)
faker = Faker(locale="en-US")


def profile_gen(count: int) -> Generator:
    """Yield `count` fake profile records, each with a sequential id."""
    for id in range(0, count):
        rec = dict(id=id, **faker.simple_profile())
        # Convert the birthdate to a pandas Timestamp
        rec["birthdate"] = pd.Timestamp(rec["birthdate"])
        yield rec


load_df = pd.DataFrame.from_records(data=profile_gen(100), index="id")
print(load_df.head())
```

```text
         username             name sex  \
id
0       eduardo69       Haley Beck   F
1        lbarrera  Joshua Stephens   M
2         bburton     Paula Kaiser   F
3       melissa49      Wendy Reese   F
4   melissacarter      Manuel Rios   M

                                              address                    mail  \
id
0   59836 Carla Causeway Suite 939\nPort Eugene, I...  [email protected]
1   3108 Christina Forges\nPort Timothychester, KY...     [email protected]
2                    Unit 7405 Box 3052\nDPO AE 09858  [email protected]
3   6408 Christopher Hill Apt. 459\nNew Benjamin, ...        [email protected]
4    2241 Bell Gardens Suite 723\nScottside, CA 38463  [email protected]

    birthdate
id
0  1997-12-08
1  1924-08-03
2  1933-12-05
3  1988-10-26
4  1931-03-19
```

### Create a Kinetica table from the Dataframe

```python
from gpudb import GPUdbTable

gpudb_table = GPUdbTable.from_df(
    load_df,
    db=kinetica_llm.kdbc,
    table_name=table_name,
    clear_table=True,
    load_data=True,
)

# See the Kinetica column types
print(gpudb_table.type_as_df())
```

```text
        name    type   properties
 username  string     [char32]
     name  string     [char32]
      sex  string      [char2]
  address  string     [char64]
     mail  string     [char32]
birthdate    long  [timestamp]
```
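The `birthdate` column is reported as a `long` with the `timestamp` property. As far as we understand Kinetica's column properties, such a value is stored as milliseconds since the Unix epoch. The standard-library sketch below shows that conversion under two assumptions: the values are treated as UTC, and `to_epoch_ms`/`from_epoch_ms` are hypothetical helper names, not functions from `gpudb`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a datetime to whole milliseconds since the Unix epoch (UTC assumed)."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(value: int) -> datetime:
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=value)


# A birthdate before 1970 maps to a negative number of milliseconds.
stored = to_epoch_ms(datetime(1924, 8, 3))
print(stored, from_epoch_ms(stored).date())
```

Using `timedelta` arithmetic rather than `datetime.fromtimestamp` keeps the round trip working for dates before 1970 on every platform.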

### Create the LLM context

You can create an LLM context using the Kinetica Workbench UI, or you can create it manually with the `CREATE OR REPLACE CONTEXT` syntax. Here we create a context from SQL syntax that references the table we created.

```python
from gpudb import GPUdbSamplesClause, GPUdbSqlContext, GPUdbTableClause

table_ctx = GPUdbTableClause(table=table_name, comment="Contains user profiles.")

samples_ctx = GPUdbSamplesClause(
    samples=[
        (
            "How many male users are there?",
            f"""
            select count(1) as num_users
                from {table_name}
                where sex = 'M';
            """,
        )
    ]
)

context_sql = GPUdbSqlContext(
    name=kinetica_ctx, tables=[table_ctx], samples=samples_ctx
).build_sql()

print(context_sql)
count_affected = kinetica_llm.kdbc.execute(context_sql)
count_affected
```

```text
CREATE OR REPLACE CONTEXT "demo"."test_llm_ctx" (
    TABLE = "demo"."user_profiles",
    COMMENT = 'Contains user profiles.'
),
(
    SAMPLES = (
        'How many male users are there?' = 'select count(1) as num_users
    from demo.user_profiles
    where sex = ''M'';' )
)
```
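In the generated statement, the sample's `'M'` appears as `''M''` because the whole sample query is itself embedded in a single-quoted SQL string, where an embedded single quote is written as two. The sketch below shows that standard escaping rule in isolation; `sql_string_literal` is a hypothetical helper written for this example and is not the code `GPUdbSqlContext.build_sql()` uses.

```python
def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, doubling any single quotes it contains."""
    return "'" + text.replace("'", "''") + "'"


sample_sql = "select count(1) as num_users from demo.user_profiles where sex = 'M';"
literal = sql_string_literal(sample_sql)
print(literal)
```

If you write a `CREATE OR REPLACE CONTEXT` statement by hand instead of using the builder classes, sample queries that contain string comparisons need this doubling.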

## Use LangChain for inferencing

In the example below we create a chain from the previously created table and LLM context. This chain generates SQL and returns the resulting data as a dataframe.

### Load the chat prompt from the Kinetica DB

The `load_messages_from_context()` function retrieves a context from the DB and converts it into a list of chat messages that we use to create a `ChatPromptTemplate`.

```python
from langchain_core.prompts import ChatPromptTemplate

# load the context from the database
ctx_messages = kinetica_llm.load_messages_from_context(kinetica_ctx)

# Add the input prompt. This is where input question will be substituted.
ctx_messages.append(("human", "{input}"))

# Create the prompt template.
prompt_template = ChatPromptTemplate.from_messages(ctx_messages)
prompt_template.pretty_print()
```

```text
================================ System Message ================================

CREATE TABLE demo.user_profiles AS
(
   username VARCHAR (32) NOT NULL,
   name VARCHAR (32) NOT NULL,
   sex VARCHAR (2) NOT NULL,
   address VARCHAR (64) NOT NULL,
   mail VARCHAR (32) NOT NULL,
   birthdate TIMESTAMP NOT NULL
);
COMMENT ON TABLE demo.user_profiles IS 'Contains user profiles.';

================================ Human Message =================================

How many male users are there?

================================== Ai Message ==================================

select count(1) as num_users
    from demo.user_profiles
    where sex = 'M';

================================ Human Message =================================

{input}
```

### Create the chain

The last element of this chain is `KineticaSqlOutputParser`, which executes the SQL and returns a dataframe. It is optional; if you leave it out, only the SQL is returned.

```python
from langchain_community.chat_models.kinetica import (
    KineticaSqlOutputParser,
    KineticaSqlResponse,
)

chain = prompt_template | kinetica_llm | KineticaSqlOutputParser(kdbc=kinetica_llm.kdbc)
```
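The `|` operator composes the steps so that each step's output becomes the next step's input. To make that data flow concrete without a database, here is a toy sketch. The `Step` class and the three fake steps are stand-ins invented for this example; they are not LangChain classes and return canned values rather than calling Kinetica.

```python
from typing import Any, Callable


class Step:
    """Toy pipeline step: wraps a function and supports `|` composition."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Run this step first, then feed its result to the next step.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Fake stand-ins for the prompt template, the LLM, and the output parser.
fake_prompt = Step(lambda inputs: f"Question: {inputs['input']}")
fake_llm = Step(lambda prompt: "select count(1) from demo.user_profiles;")
fake_parser = Step(lambda sql: {"sql": sql, "rows": None})

toy_chain = fake_prompt | fake_llm | fake_parser
result = toy_chain.invoke({"input": "How many users are there?"})
print(result["sql"])
```

Dropping `fake_parser` from the toy chain makes `invoke` return the bare SQL string, which mirrors how omitting the output parser leaves only the generated SQL.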

### Generate the SQL

The chain we created takes a question as input and returns a `KineticaSqlResponse` containing the generated SQL and the data. The question must be relevant to the LLM context we used to create the prompt.

```python
# Here you must ask a question relevant to the LLM context provided in the prompt template.
response: KineticaSqlResponse = chain.invoke(
    {"input": "What are the female users ordered by username?"}
)

print(f"SQL: {response.sql}")
print(response.dataframe.head())
```

```text
SQL: SELECT username, name
    FROM demo.user_profiles
    WHERE sex = 'F'
    ORDER BY username;
      username               name
0  alexander40       Tina Ramirez
1      bburton       Paula Kaiser
2      brian12  Stefanie Williams
3    brownanna      Jennifer Rowe
4       carl19       Amanda Potts
```
