Introduction
In this blog, we will guide you through the steps to perform Parameter Efficient Fine Tuning (PEFT) using Low-Rank Adaptation (LoRA) of an LLM. We will see how PEFT lets us fine-tune a small set of selected trainable parameters for a specific application, at low cost and with minimal infrastructure.
Large language models (LLMs) come pre-trained on a range of tasks, and we can use them in our applications for any task they have already learned. However, adapting these LLMs to our own needs requires very expensive resources if the complete model has to be updated, and this is where Parameter Efficient Fine Tuning comes in.
Let us break down the term Parameter Efficient Fine Tuning to understand the PEFT library.
First, suppose we want to use a large language model on our own system cost-effectively. The PEFT library makes this possible, as it allows us to train only a small subset of the model's parameters while the rest remain frozen.
How can we train only some parameters of a model? We do this through a configuration object, similar in spirit to the configurations used for quantization.
PEFT (Parameter Efficient Fine Tuning)
Parameter Efficient Fine Tuning is a library that lets us make LLMs operational for new tasks without fine-tuning the complete model; instead, only a small number of (extra) parameters are trained, whereas full-model fine-tuning would incur expensive computational costs.
Fine-tuning only these (extra) parameters significantly reduces computation and storage costs, while performance remains comparable to that of a completely fine-tuned LLM.
This also makes it practical to train and store LLMs on CPU-supported hardware.
Moreover, Parameter Efficient Fine Tuning integrates with libraries like transformers and diffusers, which makes it easy to load, train, and use LLMs for inference.
We use the LoRA method to reduce the number of trainable parameters, and a LoraConfig (peft_config in our code below) is the configuration that defines which parameters are trained.
LoRA
LoRA is a low-rank decomposition method that reduces the number of trainable parameters, which makes fine-tuning LLMs easier and lowers memory consumption.
In PEFT, a LoraConfig is passed to get_peft_model() to create a trainable PeftModel. To control how the adapter weights are initialized, see the init_lora_weights option of LoraConfig.
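To build some intuition, here is a minimal sketch in plain PyTorch (not the PEFT internals) of the idea behind low-rank decomposition: instead of updating a full d x d weight matrix, LoRA learns two small matrices A (r x d) and B (d x r) whose product represents the weight update.
import torch

d, r = 768, 4  # hidden size and LoRA rank (the values we use later in this blog)

W = torch.randn(d, d)         # frozen pretrained weight: 589,824 parameters
A = torch.randn(r, d) * 0.01  # trainable low-rank factor
B = torch.zeros(d, r)         # trainable low-rank factor, initialized to zero

x = torch.randn(1, d)
# LoRA forward pass: frozen path plus the low-rank update B @ A
y = x @ W.T + x @ (B @ A).T

print(A.numel() + B.numel())  # 6,144 trainable parameters instead of 589,824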
Motivation
In the world of artificial intelligence (AI) and natural language processing (NLP), large language models are everywhere. GenAI applications have used LLMs to build solutions in various fields such as healthcare, finance, and banking.
Achieving the desired results from LLM-based applications involves various approaches, such as fine-tuning or creating a new model from scratch. As we go deeper into these lower levels, the requirements in terms of computational resources and costs increase significantly.
Let's walk through the following steps to perform Parameter Efficient Fine Tuning using the LoRA technique.
First, we import the libraries needed to load the models, datasets, and configuration files.
Step 1: Import libraries
from datasets import load_dataset, DatasetDict, Dataset
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification, DataCollatorWithPadding, TrainingArguments, Trainer
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
import evaluate
import torch
import numpy as np
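These imports assume the required packages are already installed; if not, a typical setup looks like the following (the exact package list and versions are an assumption, adjust to your environment):
pip install peft transformers datasets evaluate accelerate torch numpy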
Step 2: Load Model
In this step, we specify the checkpoint of the base model on which we will perform the PEFT technique.
load_model = 'distilbert-base-uncased'
Step 3: Define Labels for Mapping
Id_to_Label = {0: 'Negative', 1: 'Positive'}
Label_to_Id = {'Negative': 0, 'Positive': 1}
Step 4: Load Model from Original Checkpoints
model_loading = AutoModelForSequenceClassification.from_pretrained(load_model, num_labels=2, id2label=Id_to_Label, label2id=Label_to_Id)
Step 5: Loading Dataset
dataset = load_dataset("glue", "sst2")
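As a quick sanity check, you can inspect the splits and one record; SST-2 stores the review text under the sentence key and the sentiment under label (0 = negative, 1 = positive):
print(dataset)              # DatasetDict with train/validation/test splits
print(dataset["train"][0])  # e.g. {'sentence': '...', 'label': 0, 'idx': 0}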
Step 6: Pre-processing Data: Tokenize and Mapping
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(load_model, add_prefix_space=True)
# Add a pad token if the tokenizer does not define one
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model_loading.resize_token_embeddings(len(tokenizer))
# Create a custom tokenize function
def tokenize_function(examples):
    text = examples["sentence"]
    # Truncate from the left so the end of long sentences is kept
    tokenizer.truncation_side = "left"
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        truncation=True,
        max_length=512,
    )
    return tokenized_inputs
# Map the tokenize function over every split of the dataset
tokenized_dataset = dataset.map(tokenize_function, batched=True)
Step 7: Import Data Collator from transformers
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
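DataCollatorWithPadding pads each batch dynamically to the length of its longest sequence instead of padding everything to max_length. A small illustration (the token IDs below are made up for the example):
features = [
    {"input_ids": [101, 2023, 102]},                    # length 3
    {"input_ids": [101, 2023, 2003, 1037, 2204, 102]},  # length 6
]
batch = data_collator(features)
print(batch["input_ids"].shape)  # torch.Size([2, 6]) - padded to the longest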
Step 8: Performance Evaluation Metrics
accuracy = evaluate.load("accuracy")
# Define an evaluation function
def compute_metrics(p):
    predictions, labels = p
    predictions = np.argmax(predictions, axis=1)
    # accuracy.compute already returns a dict like {"accuracy": ...}
    return accuracy.compute(predictions=predictions, references=labels)
Step 9: Sample data for testing
text_list = ["I don't like mountains", "I love iphone build quality", "An apple a day keeps doctor away", "I really want to impress my friend"]
Step 10: Predictions before Parameter Efficient Fine Tuning
# Untrained Model predictions before applying PEFT technique
print("Untrained model predictions:")
for text in text_list:
    inputs = tokenizer.encode(text, return_tensors="pt")
    logits = model_loading(inputs).logits
    predictions = torch.argmax(logits)
    print(text + " - " + Id_to_Label[predictions.tolist()])
Step 11: Create Parameter Efficient Fine Tuning Config File
peft_config = LoraConfig(
    task_type="SEQ_CLS",
    r=4,
    lora_alpha=32,
    lora_dropout=0.01,
    target_modules=["q_lin", "k_lin", "v_lin"],  # DistilBERT attention projections
)
# target_modules=["q", "v"]             # e.g. for T5-style models
# target_modules=["query_key_value"]    # e.g. for models with a fused QKV projection
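The target_modules names depend on the model architecture: DistilBERT names its attention projections q_lin, k_lin, and v_lin, while other models use names like query/key/value or a single fused projection. One way to find the right names is to list the model's linear submodules:
# Print the names of all linear layers to identify candidate LoRA targets
for name, module in model_loading.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name)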
Step 12: Load the Parameter Efficient Fine Tuning Model and define hyperparameters
model_loading = get_peft_model(model_loading, peft_config)
model_loading.print_trainable_parameters()
print("Loading PEFT model",model_loading)
# Training Parameters
lr = 1e-3
batch_size = 4
num_epochs = 2
Step 13: Create Training Arguments
training_args = TrainingArguments(
    output_dir=load_model + "-lora-text-classification",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
Step 14: Load trainer for training the Parameter Efficient Fine Tuning Model
trainer = Trainer(
    model=model_loading,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
# Start training
trainer.train()
model_loading.to('cpu')
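With save_strategy="epoch", the trainer writes a checkpoint under output_dir after every epoch; these checkpoints are what we load for inference below. You can also save the final adapter explicitly; for a PeftModel this stores only the small adapter weights, not the full base model (the directory name here is an assumption):
model_loading.save_pretrained("distilbert-lora-sst2-adapter")
tokenizer.save_pretrained("distilbert-lora-sst2-adapter")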
Step 15: Inference
from datasets import load_dataset, DatasetDict, Dataset
from transformers import AutoTokenizer, AutoConfig, AutoModelForSequenceClassification, DataCollatorWithPadding, TrainingArguments, Trainer
from peft import PeftModel, PeftConfig, get_peft_model, LoraConfig
import evaluate
import torch
import numpy as np
load_model = 'distilbert-base-uncased-lora-text-classification_refined-20240418T064518Z-001/distilbert-base-uncased-lora-text-classification_refined/checkpoint-33676'
print("modelcheckpoint",load_model)
Id_to_Label = {0: 'Negative', 1: 'Positive'}
Label_to_Id = {'Negative': 0, 'Positive': 1}
model_loading = AutoModelForSequenceClassification.from_pretrained(load_model, num_labels=2, id2label=Id_to_Label, label2id=Label_to_Id)
tokenizer = AutoTokenizer.from_pretrained(load_model, add_prefix_space=True)
text_list = ["I don't like mountains", "I love iphone build quality", "An apple a day keeps doctor away", "I really want to impress my friend"]
model_loading.to('cpu')
print("Trained model predictions:")
print("--------------------------")
for text in text_list:
    inputs = tokenizer.encode(text, return_tensors="pt").to("cpu")  # use to('mps') on Apple Silicon
    logits = model_loading(inputs).logits
    predictions = torch.max(logits, 1).indices
    print(text + " - " + Id_to_Label[predictions.tolist()[0]])
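If the saved checkpoint contains only the LoRA adapter weights rather than a merged full model (this depends on how the checkpoint was saved), an alternative is to load the base model first and then attach the adapter with PEFT; a minimal sketch:
base_model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2, id2label=Id_to_Label, label2id=Label_to_Id)
model_loading = PeftModel.from_pretrained(base_model, load_model)  # attach the saved LoRA adapter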
Step 16: Output of the model before Parameter Efficient Fine Tuning
Untrained model predictions:
----------------------------
I don't like mountains - Positive
I love iphone build quality - Positive
An apple a day keeps doctor away - Negative
I really want to impress my friend - Positive
Step 17: Trainable Parameters
trainable params: 702,722 || all params: 67,657,732 || trainable%: 1.0386425604689202
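This count can be verified by hand. With r=4 and hidden size 768, each adapted projection adds 4 x (768 + 768) = 6,144 parameters, and DistilBERT has 6 layers x 3 target modules = 18 adapted projections. For the SEQ_CLS task type, PEFT also keeps the classification head trainable, which for DistilBERT means the pre_classifier and classifier layers:
lora_params = 6 * 3 * 4 * (768 + 768)   # 110,592 LoRA parameters
pre_classifier = 768 * 768 + 768        # 590,592 (weights + bias)
classifier = 768 * 2 + 2                # 1,538 (weights + bias)
print(lora_params + pre_classifier + classifier)  # 702,722 - matches the log above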
Step 18: Output of the model after applying the Parameter Efficient Fine Tuning technique
Trained model predictions:
--------------------------
I don't like mountains - Negative
I love iphone build quality - Positive
An apple a day keeps doctor away - Positive
I really want to impress my friend - Positive
Conclusion
In this blog, we studied Parameter Efficient Fine Tuning using the Low-Rank Adaptation (LoRA) method for Large Language Models (LLMs). We learned how PEFT offers a cost-effective solution by fine-tuning only a small set of (extra) parameters, significantly reducing computational and storage costs while maintaining comparable performance.
By integrating Parameter Efficient Fine Tuning with libraries like transformers, diffusers, and accelerate, we get an efficient approach to load, train, and use LLMs for inference, making them more accessible and practical for applications across industries such as healthcare and finance.
FAQs
What is PEFT?
PEFT (Parameter Efficient Fine Tuning) is a technique for fine-tuning large language models (LLMs) with minimal computational resources. It allows us to selectively fine-tune only certain parameters of the model, significantly reducing computational and storage costs while maintaining performance levels comparable to fully fine-tuned models.
How does PEFT work?
PEFT works by using a library that enables the selective fine-tuning of LLMs. Instead of tuning the entire model, only a subset of parameters is adjusted, reducing the computational burden. This is achieved through techniques like Low-Rank Adaptation (LoRA), which decomposes the model's weight updates to reduce the number of trainable parameters.
What is LoRA?
LoRA (Low-Rank Adaptation) is a method for reducing the number of trainable parameters in LLMs by decomposing weight updates into low-rank matrices. This reduction leads to more efficient fine-tuning with lower memory consumption, making it an integral part of the PEFT process.