This will help you get started with langchain_huggingface chat models. For detailed documentation of all ChatHuggingFace features and configurations, head to the API reference. For a list of models supported by Hugging Face, check out this page.

Overview

Integration details

| Class | Package | Local | Serializable | JS support | Downloads | Version |
| --- | --- | --- | --- | --- | --- | --- |
| ChatHuggingFace | langchain-huggingface | ✅ | beta | ❌ | PyPI - Downloads | PyPI - Version |

Model features

| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |

Setup

To access Hugging Face models you'll need to create a Hugging Face account, get an API key, and install the langchain-huggingface integration package.

Credentials

Generate a Hugging Face Access Token and store it as an environment variable: HUGGINGFACEHUB_API_TOKEN.
import getpass
import os

if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your token: ")

Installation

pip install -qU langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate

Instantiation

A ChatHuggingFace model can be instantiated in two different ways: from a HuggingFaceEndpoint or from a HuggingFacePipeline.

HuggingFaceEndpoint

from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
    provider="auto",  # let Hugging Face choose the best provider for you
)

chat_model = ChatHuggingFace(llm=llm)
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /Users/isaachershenson/.cache/huggingface/token
Login successful
Now let's leverage Inference Providers to run the model with a specific third-party provider:
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    task="text-generation",
    provider="hyperbolic",  # set your provider here
    # provider="nebius",
    # provider="together",
)

chat_model = ChatHuggingFace(llm=llm)
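
As a quick sanity check (a hypothetical prompt; the exact reply depends on the chosen provider and your account's access to it), you can invoke the provider-backed model directly:
# Sends one request through the configured provider and prints the reply text.
print(chat_model.invoke("Say hello in one sentence.").content)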

HuggingFacePipeline

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
)

chat_model = ChatHuggingFace(llm=llm)
(output: config.json, model.safetensors.index.json, eight safetensors shards, and generation_config.json are downloaded, then the checkpoint shards are loaded)

Instantiating with Quantization

To run a quantized version of the model, you can specify a bitsandbytes quantization config as follows:
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)
and pass it to HuggingFacePipeline as part of model_kwargs:
llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
        return_full_text=False,
    ),
    model_kwargs={"quantization_config": quantization_config},
)

chat_model = ChatHuggingFace(llm=llm)
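
To verify that 4-bit loading actually reduced the model's size, you can inspect the memory footprint of the underlying transformers model (a minimal sketch, assuming the loaded model is reachable at llm.pipeline.model):
# A ~7B model in 4-bit should occupy roughly a quarter of its fp16 footprint.
print(f"Memory footprint: {llm.pipeline.model.get_memory_footprint() / 1e9:.2f} GB")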

Invocation

from langchain.messages import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

ai_msg = chat_model.invoke(messages)
print(ai_msg.content)
According to the popular phrase and hypothetical scenario, when an unstoppable force meets an immovable object, a paradoxical situation arises as both forces are seemingly contradictory. On one hand, an unstoppable force is an entity that cannot be stopped or prevented from moving forward, while on the other hand, an immovable object is something that cannot be moved or displaced from its position.

In this scenario, it is un
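
ChatHuggingFace also supports token-level streaming through the standard .stream() interface (a minimal sketch reusing the messages above; streaming behavior can vary by backend):
# Prints each chunk's text as it arrives instead of waiting for the full reply.
for chunk in chat_model.stream(messages):
    print(chunk.content, end="", flush=True)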

API reference

For detailed documentation of all ChatHuggingFace features and configurations, head to the API reference: python.langchain.com/api_reference/huggingface/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html