LangChain and Hugging Face. Introduction

The LangChain framework has greatly simplified the integration of Hugging Face models into natural language processing (NLP) applications, while LM Studio makes it possible to build solutions that are deployed entirely locally. Combining these technologies lets you take advantage of Hugging Face's large collection of pre-trained models and LangChain's flexibility to create workflows customized for each use case.

The following sections show, concisely, how to use these components and integrate them into a specific application. We will work in a Python-based development environment, although the explanations carry over to other languages and frameworks.

Initial Configuration

Before starting, it is necessary to install the Python packages transformers and huggingface_hub, together with langchain-community, which provides the LangChain integrations used below. These packages allow Hugging Face models to be accessed and run locally through LangChain.

%pip install --upgrade --quiet transformers huggingface_hub langchain-community

Load a Hugging Face Model

To load a model, use LangChain's HuggingFacePipeline class, specifying the model ID and the task to be performed. This loads models directly from the Hugging Face Model Hub and runs them in your local environment.

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)
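
Once loaded, the pipeline behaves like any other LangChain LLM, so it can be invoked directly with a plain string. As a minimal check that the model works (the prompt text here is just an example):

# Direct invocation with a string prompt returns the generated text
print(hf.invoke("Hugging Face is"))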

Create and Execute a "Chain"

LangChain facilitates the creation of chains: sequences of operations to be performed on the text. You can compose a prompt, built with PromptTemplate, with a model to form a chain, allowing customized interaction with the loaded model. In the following example, a prompt template is created that answers a given question with a step-by-step reasoned explanation. A chain is then constructed that passes the prompt to the loaded model, and finally the chain is invoked (executed) to obtain the result.

from langchain.prompts import PromptTemplate

template = """Question: {question}  

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))

Possibilities for Inference

Use of GPU for Inference

LangChain supports GPU-based inference, which speeds up processing and makes it possible to handle large models. You can pin the model to a specific GPU device when loading it, or use automatic device mapping via the Accelerate library when working with multiple GPUs or very large models.

gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,  # Use device_map="auto" for automatic mapping with Accelerate
    pipeline_kwargs={"max_new_tokens": 10},
)
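
Because the resulting object is a standard LangChain Runnable, several prompts can also be processed in one call through the batch interface. A brief sketch, reusing the prompt defined earlier (actual throughput gains depend on the model and hardware):

questions = [
    {"question": "What is electroencephalography?"},
    {"question": "What is a neural network?"},
]

gpu_chain = prompt | gpu_llm
# batch() runs the chain over each input and returns the outputs in order
for answer in gpu_chain.batch(questions):
    print(answer)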

Inference with OpenVINO Backend

For deployments requiring high efficiency and low latency, LangChain supports the use of OpenVINO as an inference backend. This allows models to run on Intel hardware, optimizing performance and resource consumption.

ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU"},
    pipeline_kwargs={"max_new_tokens": 10},
)

Access to Hugging Face Endpoints

LangChain provides access to Hugging Face endpoints to easily integrate text generation capabilities and other NLP services into your applications. For this, you need a Hugging Face API token and must configure the environment accordingly. This option simplifies development in exchange for a dependency on Hugging Face's hosted inference service.

import os
from getpass import getpass

from langchain_community.llms import HuggingFaceEndpoint

HUGGINGFACEHUB_API_TOKEN = getpass()
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id, max_length=128, temperature=0.5, token=HUGGINGFACEHUB_API_TOKEN
)
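
The endpoint object is interchangeable with the locally loaded models shown above, so it can be dropped into the same chain. For example, reusing the prompt and question defined earlier:

chain = prompt | llm
print(chain.invoke({"question": question}))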

The integration of Hugging Face with LangChain and LM Studio provides a powerful platform for developing NLP solutions, combining the accessibility of advanced pre-trained models with the flexibility of a framework dedicated to building language applications.

Local Endpoints Using LM Studio

Compared with the previous options, LM Studio makes it easy to download models from the Hugging Face Hub and run them locally, so you control the entire development environment without losing the ease of integration that an endpoint provides. LM Studio exposes a local server that is fully compatible with the OpenAI API, which simplifies the integration work and lets you switch between different models behind the same API.
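
As an illustration, here is a minimal sketch of this setup. It assumes LM Studio's local server is running on its default address (http://localhost:1234/v1), that the langchain-openai package is installed, and that a model is already loaded in LM Studio; the model name below is a placeholder.

from langchain_openai import ChatOpenAI

# Point the OpenAI-compatible client at LM Studio's local server.
# LM Studio ignores the API key, but the client requires one;
# "local-model" stands in for whatever model you have loaded.
local_llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model="local-model",
)

print(local_llm.invoke("What is electroencephalography?").content)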

Conclusion

The combination of Hugging Face, LangChain and LM Studio constitutes a powerful technology alliance that is democratizing access to advanced NLP tools, enabling developers and companies of all sizes to explore and exploit the potential of natural language processing in all kinds of scenarios.
