
ChatHuggingFace

This will help you get started with langchain_huggingface chat models. For detailed documentation of all ChatHuggingFace features and configurations, head to the API reference. For a list of models supported by Hugging Face, check out this page.

Overview

Integration details

Class | Package | Local | Serializable | JS support | Package downloads | Package latest
ChatHuggingFace | langchain-huggingface | | beta | | PyPI - Downloads | PyPI - Version

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs

Setup

To access Hugging Face models you'll need to create a Hugging Face account, get an API key, and install the langchain-huggingface integration package.

Credentials

Generate a Hugging Face Access Token and store it as an environment variable: HUGGINGFACEHUB_API_TOKEN.

import getpass
import os

if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your token: ")

Installation

%pip install --upgrade --quiet  langchain-huggingface text-generation transformers google-search-results numexpr langchainhub sentencepiece jinja2 bitsandbytes accelerate

[notice] A new release of pip is available: 24.0 -> 24.1.2
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.

Instantiation

You can instantiate a ChatHuggingFace model in two different ways, either from a HuggingFaceEndpoint or from a HuggingFacePipeline.

HuggingFaceEndpoint

from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    max_new_tokens=512,
    do_sample=False,
    repetition_penalty=1.03,
)

chat_model = ChatHuggingFace(llm=llm)
The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /Users/isaachershenson/.cache/huggingface/token
Login successful

HuggingFacePipeline

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
    ),
)

chat_model = ChatHuggingFace(llm=llm)
config.json:   0%|          | 0.00/638 [00:00<?, ?B/s]
model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]
Downloading shards:   0%|          | 0/8 [00:00<?, ?it/s]
model-00001-of-00008.safetensors:   0%|          | 0.00/1.89G [00:00<?, ?B/s]
model-00002-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]
model-00003-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]
model-00004-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]
model-00005-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]
model-00006-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]
model-00007-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]
model-00008-of-00008.safetensors:   0%|          | 0.00/816M [00:00<?, ?B/s]
Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

Instantiating with Quantization

To run a quantized version of your model, you can specify a bitsandbytes quantization config as follows:

from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
)

and pass it to the HuggingFacePipeline as a part of its model_kwargs:

llm = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    pipeline_kwargs=dict(
        max_new_tokens=512,
        do_sample=False,
        repetition_penalty=1.03,
        return_full_text=False,
    ),
    model_kwargs={"quantization_config": quantization_config},
)

chat_model = ChatHuggingFace(llm=llm)

Invocation

from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]

ai_msg = chat_model.invoke(messages)

API Reference: HumanMessage | SystemMessage
print(ai_msg.content)
According to the popular phrase and hypothetical scenario, when an unstoppable force meets an immovable object, a paradoxical situation arises as both forces are seemingly contradictory. On one hand, an unstoppable force is an entity that cannot be stopped or prevented from moving forward, while on the other hand, an immovable object is something that cannot be moved or displaced from its position. 

In this scenario, it is un
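
ChatHuggingFace also implements LangChain's standard streaming and async interfaces (listed under Model features above). As a minimal sketch, assuming the same chat_model and messages from the cells above and a backend that streams tokens:

# Stream the reply chunk by chunk instead of waiting for the full message
for chunk in chat_model.stream(messages):
    print(chunk.content, end="", flush=True)

# The async variants follow the same pattern, e.g.:
# ai_msg = await chat_model.ainvoke(messages)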

Chat Templating

ChatHuggingFace relies on the underlying Hugging Face tokenizer's apply_chat_template method to convert a list of messages into a single formatted string prompt for the model. This process uses a Jinja2 template.

You can customize this behavior in two main ways:

1. Providing a Custom Chat Template String

You can specify a custom Jinja template string directly to the ChatHuggingFace constructor using the chat_template parameter. This allows you to define precisely how messages (system, human, AI) are formatted.

For example, if you want a simple template that prefixes each message with its role:

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

# This is a simplified example; replace with your actual LLM loading.
# For instance, using HuggingFacePipeline:
llm_pipeline = HuggingFacePipeline.from_model_id(
    model_id="HuggingFaceH4/zephyr-7b-beta",  # replace with a model that supports chat
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 200},
)

custom_template_str = (
    "{% for message in messages %}"
    "{{ message.role.upper() }}: {{ message.content }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"  # example generation prompt
)

chat_model_custom = ChatHuggingFace(llm=llm_pipeline, chat_template=custom_template_str)

messages = [
    SystemMessage(content="You are a pirate."),
    HumanMessage(content="What is your name?"),
]

# The apply_chat_template method (called internally by invoke) will use custom_template_str
# response = chat_model_custom.invoke(messages)
# print(response.content)

2. Passing Keyword Arguments for Template Variables

Chat templates can include variables that you substitute at runtime. You can pass values for these variables as keyword arguments to methods like invoke, stream, ainvoke, and astream; they are passed down to the tokenizer's apply_chat_template method.

This is useful if your chat template (either a custom one you've provided or the default one loaded with the tokenizer) is designed to accept such variables.

For example, imagine you have a custom template that uses a specific instruction for the system:

# (Continuing from the previous setup with llm_pipeline)

# Note: the variable {{ system_instruction }} must be handled by your Jinja template.
# This is a conceptual example; the default templates of real models may not use such arbitrary variables.
custom_template_with_vars = (
    "{% if system_instruction is defined %}"
    "System Instruction: {{ system_instruction }}\n"
    "{% endif %}"
    "{% for message in messages %}"
    "{{ message.role.upper() }}: {{ message.content }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"
)

chat_model_vars = ChatHuggingFace(llm=llm_pipeline, chat_template=custom_template_with_vars)

messages_for_vars = [
    HumanMessage(content="Tell me a short story about a brave knight."),
]

# Pass 'system_instruction' as a keyword argument to invoke
# response_with_vars = chat_model_vars.invoke(
#     messages_for_vars,
#     system_instruction="The story should be suitable for children.",
# )
# print(response_with_vars.content)

The underlying templating engine is Jinja2.
For more advanced templating features and how specific models define their chat templates, refer to the Hugging Face documentation on Chat Templating.
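
Tool calling

The Model features table above lists tool calling as supported. As a minimal, hedged sketch (assuming the chat_model built in the Instantiation section and a backend model that actually follows tool-calling prompts), tools can be bound through LangChain's standard bind_tools interface. The get_weather tool below is a hypothetical example written for illustration, not part of the library:

from langchain_core.tools import tool


@tool
def get_weather(city: str) -> str:
    """Return a short, made-up weather report for a city (illustrative only)."""
    return f"It is always sunny in {city}."


# Bind the tool, then ask a question that should trigger a tool call
llm_with_tools = chat_model.bind_tools([get_weather])
# ai_msg = llm_with_tools.invoke("What is the weather in Paris?")
# print(ai_msg.tool_calls)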

API reference

For detailed documentation of all ChatHuggingFace features and configurations head to the API reference: https://python.langchain.com/api_reference/huggingface/chat_models/langchain_huggingface.chat_models.huggingface.ChatHuggingFace.html


