Integrating Hugging Face models into natural language processing (NLP) applications has been greatly simplified by the LangChain framework and, for locally deployed solutions, by LM Studio. Combining these technologies lets you take advantage of Hugging Face's large collection of pre-trained models and LangChain's flexibility for building workflows customized to each use case.
The following sections show, concisely, how to use these components in a specific application. We will use a Python-based development environment, although the explanations carry over to other languages and frameworks.
Initial Configuration
Before starting, it is necessary to install the Python packages transformers, huggingface_hub, langchain, and langchain-community. These packages allow Hugging Face models to be accessed and run locally through LangChain.
%pip install --upgrade --quiet transformers huggingface_hub langchain langchain-community
Load a Hugging Face Model
To load a model, you can use the HuggingFacePipeline class in LangChain, specifying the model ID and the task to be performed. This allows you to load models directly from the Hugging Face Model Hub and run them in your local environment.
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
hf = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 10},
)
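Once loaded, the wrapped pipeline behaves like any other LangChain LLM and can be invoked directly. A minimal check (the generated continuation will vary between runs):

print(hf.invoke("Hugging Face is"))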
Create and Execute a Chain
LangChain facilitates the creation of chains, which are sequences of operations performed on the text. You can compose a model with a prompt built with PromptTemplate to form a chain, enabling customized interaction with the loaded model. In the following example, a template is created for a prompt that answers a given question with a step-by-step reasoned explanation. A chain is then constructed that passes the prompt to the loaded model, and finally the chain is invoked (executed) to obtain the result.
from langchain.prompts import PromptTemplate
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)
chain = prompt | hf
question = "What is electroencephalography?"
print(chain.invoke({"question": question}))
Possibilities for Inference
Use of GPU for Inference
LangChain offers support for GPU-based inference, which is useful for speeding up processing and handling large models. You can specify the GPU device during model loading, or use automatic device mapping with the Accelerate library if you have multiple GPUs or a large model.
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,  # use device_map="auto" for automatic mapping with Accelerate
    pipeline_kwargs={"max_new_tokens": 10},
)
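For the automatic mapping mentioned in the comment above, a minimal sketch, assuming the accelerate package is installed (note that device and device_map should not be specified at the same time):

auto_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device_map="auto",  # let Accelerate place the model across available devices
    pipeline_kwargs={"max_new_tokens": 10},
)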
Inference with OpenVINO Backend
For deployments requiring high efficiency and low latency, LangChain supports the use of OpenVINO as an inference backend. This allows models to run on Intel hardware, optimizing performance and resource consumption.
ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU"},
    pipeline_kwargs={"max_new_tokens": 10},
)
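The OpenVINO-backed model is a drop-in replacement in the chain built earlier; for example, reusing the prompt and question from the previous section:

ov_chain = prompt | ov_llm
print(ov_chain.invoke({"question": question}))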
Access to Hugging Face Endpoints
LangChain provides access to Hugging Face endpoints to easily integrate text generation capabilities and other NLP services into your applications. For this, you will need to obtain a Hugging Face API token and configure the environment accordingly. This option simplifies development in exchange for a dependency on Hugging Face's hosted inference service.
import os
from getpass import getpass

from langchain_community.llms import HuggingFaceEndpoint

HUGGINGFACEHUB_API_TOKEN = getpass()
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
    repo_id=repo_id, max_length=128, temperature=0.5, token=HUGGINGFACEHUB_API_TOKEN
)
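The endpoint object composes with prompts in exactly the same way as the local pipeline; for example, reusing the template defined earlier:

llm_chain = prompt | llm
print(llm_chain.invoke({"question": "What is electroencephalography?"}))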
The integration of Hugging Face with LangChain and LM Studio provides a powerful platform for developing NLP solutions, combining the accessibility of advanced pre-trained models with the flexibility of a framework dedicated to building language applications.
Local endpoints using LM Studio
Compared to the previous options, LM Studio makes it easy to load models from the Hugging Face Hub and run them locally, giving you control over the entire development environment without losing the ease of integration that an endpoint provides. LM Studio runs a local server that is fully compatible with the OpenAI API, which simplifies integration and allows different models to be used through the same API.
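Because the LM Studio server speaks the OpenAI API, you can connect to it with LangChain's standard OpenAI client. The following is a minimal sketch, assuming the langchain-openai package is installed, that LM Studio is serving on its default port (1234), and that a model is already loaded; the model name and API key below are placeholders to adapt to your local configuration:

from langchain_openai import ChatOpenAI

local_llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local server address
    api_key="lm-studio",  # placeholder; the local server does not validate it
    model="local-model",  # replace with the identifier of the model loaded in LM Studio
    temperature=0.5,
)

print(local_llm.invoke("What is electroencephalography?").content)

Since the server exposes the same API for every model, swapping models is just a matter of loading a different one in LM Studio; the client code stays unchanged.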
Conclusion
The combination of Hugging Face, LangChain, and LM Studio constitutes a powerful technology alliance that is democratizing access to advanced NLP tools, enabling developers and companies of all sizes to explore and exploit the potential of natural language processing across a wide range of scenarios.