Introduction
In this blog, we will guide you through the steps to perform Parameter Efficient Fine Tuning (PEFT) using Low-Rank Adaptation (LoRA) of an LLM. We will see how PEFT lets us fine-tune a small set of selected trainable parameters for a specific application, at low cost and with minimal infrastructure.
Large language models (LLMs) come pre-trained on a range of tasks, and we can use them in our applications for any task they have already learned. However, adapting these LLMs to our own needs requires very expensive resources if the complete model has to be updated, and this is where Parameter Efficient Fine Tuning comes in.
Let us break down the term Parameter Efficient Fine Tuning to understand the PEFT library.
First, suppose we want to use a large language model on our own system cost-effectively. The PEFT library makes this possible, as it allows us to train only a small subset of the model's parameters while the rest remain frozen.
How can we train only some parameters of a model? We do this through a configuration object, similar in spirit to the configurations used for quantization.
PEFT (Parameter Efficient Fine Tuning)
Parameter Efficient Fine Tuning is a library that lets us make LLMs operational for new tasks without fine-tuning the complete model; instead, only a small number of (extra) parameters are trained, whereas full-model fine-tuning would incur expensive computational costs.
Fine-tuning only these (extra) parameters significantly reduces computation and storage costs, while performance remains comparable to that of a completely fine-tuned LLM.
This also makes it practical to train and store LLMs on CPU-supported hardware.
Moreover, Parameter Efficient Fine Tuning integrates with libraries like transformers and diffusers, which makes it easy to load, train, and use LLMs for inference.
We use the LoRA method to reduce the number of trainable parameters, and a LoraConfig (peft_config in our code below) is the configuration that defines which parameters are trained.
LoRA
LoRA is a low-rank decomposition method that reduces the number of trainable parameters, which makes fine-tuning LLMs easier and lowers memory consumption.
In PEFT, a LoraConfig is passed to get_peft_model() to create a trainable PeftModel. To control how the adapter weights are initialized, see the init_lora_weights option of LoraConfig.
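To build some intuition, here is a minimal sketch in plain PyTorch (not the PEFT internals) of the idea behind low-rank decomposition: instead of updating a full d x d weight matrix, LoRA learns two small matrices A (r x d) and B (d x r) whose product represents the weight update.
import torch

d, r = 768, 4  # hidden size and LoRA rank (the values we use later in this blog)

W = torch.randn(d, d)         # frozen pretrained weight: 589,824 parameters
A = torch.randn(r, d) * 0.01  # trainable low-rank factor
B = torch.zeros(d, r)         # trainable low-rank factor, initialized to zero

x = torch.randn(1, d)
# LoRA forward pass: frozen path plus the low-rank update B @ A
y = x @ W.T + x @ (B @ A).T

print(A.numel() + B.numel())  # 6,144 trainable parameters instead of 589,824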
Motivation
In the world of artificial intelligence (AI) and natural language processing (NLP), large language models are everywhere. GenAI applications have used LLMs to build solutions in various fields such as healthcare, finance, and banking.
Achieving the desired results from LLM-based applications involves various approaches, such as fine-tuning or creating a new model from scratch. As we go deeper into these lower levels, the requirements in terms of computational resources and costs increase significantly.
Let's walk through the following steps to perform Parameter Efficient Fine Tuning using the LoRA technique.
First, we import the libraries needed to load the models, datasets, and configuration files.
Step 1: Import libraries
from datasets import load_dataset, DatasetDict, Dataset
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification, DataCollatorWithPadding, TrainingArguments, Trainer
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
import evaluate
import torch
import numpy as np
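These imports assume the required packages are already installed; if not, a typical setup looks like the following (the exact package list and versions are an assumption, adjust to your environment):
pip install peft transformers datasets evaluate accelerate torch numpy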
Step 2: Load Model
In this step, we specify the checkpoint of the base model on which we will perform the PEFT technique.
load_model = 'distilbert-base-uncased'
Step 3: Define Labels for Mapping
Id_to_Label = {0: 'Negative', 1: 'Positive'}
Label_to_Id = {'Negative': 0, 'Positive': 1}
Step 4: Load Model from Original Checkpoints
model_loading = AutoModelForSequenceClassification.from_pretrained(load_model, num_labels=2, id2label=Id_to_Label, label2id=Label_to_Id)
Step 5: Loading Dataset
dataset = load_dataset("glue", "sst2")
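As a quick sanity check, you can inspect the splits and one record; SST-2 stores the review text under the sentence key and the sentiment under label (0 = negative, 1 = positive):
print(dataset)              # DatasetDict with train/validation/test splits
print(dataset["train"][0])  # e.g. {'sentence': '...', 'label': 0, 'idx': 0}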
Step 6: Pre-processing Data: Tokenize and Mapping
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(load_model, add_prefix_space=True)
# Add a pad token if the tokenizer does not define one
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model_loading.resize_token_embeddings(len(tokenizer))
# Create a custom tokenize function
def tokenize_function(examples):
    text = examples["sentence"]
    # Truncate from the left so the end of long sentences is kept
    tokenizer.truncation_side = "left"
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        truncation=True,
        max_length=512,
    )
    return tokenized_inputs
# Map the tokenize function over every split of the dataset
tokenized_dataset = dataset.map(tokenize_function, batched=True)
Step 7: Import Data Collator from transformers
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
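DataCollatorWithPadding pads each batch dynamically to the length of its longest sequence instead of padding everything to max_length. A small illustration (the token IDs below are made up for the example):
features = [
    {"input_ids": [101, 2023, 102]},                    # length 3
    {"input_ids": [101, 2023, 2003, 1037, 2204, 102]},  # length 6
]
batch = data_collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 6]) - padded to the longest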
Step 8: Performance Evaluation Metrics
accuracy = evaluate.load("accuracy")
# Define an evaluation function
def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=1)
    # accuracy.compute already returns a dict like {"accuracy": ...}
    return accuracy.compute(predictions=predictions, references=labels)
Step 9: Sample data for testing
text_list = ["I don't like mountains", "I love iphone build quality", "An apple a day keeps doctor away", "I really want to impress my friend"]
Step 10: Predictions before Parameter Efficient Fine Tuning
# Untrained Model predictions before applying PEFT technique
print("Untrained model predictions:")
for text in text_list:
    inputs = tokenizer.encode(text, return_tensors="pt")
    logits = model_loading(inputs).logits
    predictions = torch.argmax(logits)
    print(text + " - " + Id_to_Label[predictions.tolist()])
Step 11: Create Parameter Efficient Fine Tuning Config File
peft_config = LoraConfig(
    task_type="SEQ_CLS",
    r=4,
    lora_alpha=32,
    lora_dropout=0.01,
    target_modules=["q_lin", "k_lin", "v_lin"],  # DistilBERT attention projections
)
# target_modules=["q", "v"]             # e.g. for T5-style models
# target_modules=["query_key_value"]    # e.g. for models with a fused QKV projection
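The target_modules names depend on the model architecture: DistilBERT names its attention projections q_lin, k_lin, and v_lin, while other models use names like query/key/value or a single fused projection. One way to find the right names is to list the model's linear submodules:
# Print the names of all linear layers to identify candidate LoRA targets
for name, module in model_loading.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name)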
Step 12: Load the Parameter Efficient Fine Tuning Model and define hyperparameters
model_loading = get_peft_model(model_loading, peft_config)
model_loading.print_trainable_parameters()
print("Loading PEFT model",model_loading)
# Training Parameters
lr = 1e-3
batch_size = 4
num_epochs = 2
Step 13: Create Training Arguments
training_args = TrainingArguments(
    output_dir=load_model + "-lora-text-classification",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
Step 14: Load trainer for training the Parameter Efficient Fine Tuning Model
trainer = Trainer(
    model=model_loading,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
# Start training
trainer.train()
model_loading.to('cpu')
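With save_strategy="epoch", the trainer writes a checkpoint under output_dir after every epoch; these checkpoints are what we load for inference below. You can also save the final adapter explicitly; for a PeftModel this stores only the small adapter weights, not the full base model (the directory name here is an assumption):
model_loading.save_pretrained("distilbert-lora-sst2-adapter")
tokenizer.save_pretrained("distilbert-lora-sst2-adapter")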
Step 15: Inference
from datasets import load_dataset, DatasetDict, Dataset
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification, DataCollatorWithPadding, TrainingArguments, Trainer
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
import evaluate
import torch
import numpy as np
load_model = 'distilbert-base-uncased-lora-text-classification_refined-20240418T064518Z-001/distilbert-base-uncased-lora-text-classification_refined/checkpoint-33676'
print("modelcheckpoint",load_model)
Id_to_Label = {0: 'Negative', 1: 'Positive'}
Label_to_Id = {'Negative': 0, 'Positive': 1}
model_loading = AutoModelForSequenceClassification.from_pretrained(load_model, num_labels=2, id2label=Id_to_Label, label2id=Label_to_Id)
tokenizer = AutoTokenizer.from_pretrained(load_model, add_prefix_space=True)
text_list = ["I don't like mountains", "I love iphone build quality", "An apple a day keeps doctor away", "I really want to impress my friend"]
model_loading.to('cpu')
print("Trained model predictions:")
print("--------------------------")
for text in text_list:
    inputs = tokenizer.encode(text, return_tensors="pt").to("cpu")  # use to('mps') on Apple Silicon
    logits = model_loading(inputs).logits
    predictions = torch.max(logits, 1).indices
    print(text + " - " + Id_to_Label[predictions.tolist()[0]])
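If the saved checkpoint contains only the LoRA adapter weights rather than a merged full model (this depends on how the checkpoint was saved), an alternative is to load the base model first and then attach the adapter with PEFT; a minimal sketch:
base_model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2, id2label=Id_to_Label, label2id=Label_to_Id)
model_loading = PeftModel.from_pretrained(base_model, load_model)  # attach the saved LoRA adapter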
Step 16: Output of the model before Parameter Efficient Fine Tuning
Untrained model predictions:
----------------------------
I don't like mountains - Positive
I love iphone build quality - Positive
An apple a day keeps doctor away - Negative
I really want to impress my friend - Positive
Step 17: Trainable Parameters
trainable params: 702,722 || all params: 67,657,732 || trainable%: 1.0386425604689202
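This count can be verified by hand. With r=4 and hidden size 768, each adapted projection adds 4 x (768 + 768) = 6,144 parameters, and DistilBERT has 6 layers x 3 target modules = 18 adapted projections. For the SEQ_CLS task type, PEFT also keeps the classification head trainable, which for DistilBERT means the pre_classifier and classifier layers:
lora_params = 6 * 3 * 4 * (768 + 768)   # 110,592 LoRA parameters
pre_classifier = 768 * 768 + 768        # 590,592 (weights + bias)
classifier = 768 * 2 + 2                # 1,538 (weights + bias)
print(lora_params + pre_classifier + classifier)  # 702,722 - matches the log above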
Step 18: Output of the model after applying the Parameter Efficient Fine Tuning technique
Trained model predictions:
--------------------------
I don't like mountains - Negative
I love iphone build quality - Positive
An apple a day keeps doctor away - Positive
I really want to impress my friend - Positive
Conclusion
In this blog, we studied Parameter Efficient Fine Tuning using the Low-Rank Adaptation (LoRA) method for Large Language Models (LLMs). We learned how PEFT offers a cost-effective solution by fine-tuning only a small set of (extra) parameters, significantly reducing computational and storage costs while maintaining comparable performance.
By integrating Parameter Efficient Fine Tuning with libraries like transformers, diffusers, and accelerate, we get an efficient approach to load, train, and use LLMs for inference, making them more accessible and practical for applications across industries such as healthcare and finance.
FAQs
What is PEFT?
PEFT (Parameter Efficient Fine Tuning) is a technique for fine-tuning large language models (LLMs) with minimal computational resources. It allows us to selectively fine-tune only certain parameters of the model, significantly reducing computational and storage costs while maintaining performance levels comparable to fully fine-tuned models.
How does PEFT work?
PEFT works by using a library that enables the selective fine-tuning of LLMs. Instead of tuning the entire model, only a subset of parameters is adjusted, reducing the computational burden. This is achieved through techniques like Low-Rank Adaptation (LoRA), which decomposes the model's weight updates to reduce the number of trainable parameters.
What is LoRA?
LoRA (Low-Rank Adaptation) is a method for reducing the number of trainable parameters in LLMs by decomposing weight updates into low-rank matrices. This reduction leads to more efficient fine-tuning with lower memory consumption, making it an integral part of the PEFT process.