
Fine-Tuning in the context of Large Language Models



Imagine a language model like a smart student who has read almost every book in the world. This student has broad general knowledge and strong language skills, but isn't always a perfect fit for the specific task you want them to do.

Fine-tuning is like giving this student extra lessons on a specific topic:


  • The student already knows general stuff (like grammar, facts, and reasoning).

  • You now give them a smaller set of examples focused on your task (e.g., answering customer service questions, writing medical reports, or coding help).

  • After learning from these examples, the student gets better at your specific task without forgetting everything they already know.


So, in short: Fine-tuning = teaching a pre-trained model extra lessons on a special topic so it performs better for that task.

Fine-tuning Process:


  1. Start with a pre-trained model: use a model trained on general text.

  2. Modify the model for the specific task.

  3. Use labeled data for the specific task (a small example of such data follows this list).

  4. Adjust parameters: update the model's weights for the task.
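
To make step 3 concrete, here is a tiny, purely illustrative example of what task-specific labeled data could look like; the field names and texts are made up, not a required format:

```python
# Purely illustrative task-specific labeled data for supervised fine-tuning.
# Field names ("prompt", "response") are a common convention, not a requirement.
train_examples = [
    {
        "prompt": "My order arrived damaged. What should I do?",
        "response": "Sorry to hear that! Please share your order number and a photo "
                    "of the damage, and we'll send a replacement or issue a refund.",
    },
    {
        "prompt": "How do I reset my password?",
        "response": "Click 'Forgot password' on the login page and follow the link "
                    "we email you to choose a new password.",
    },
]
```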


Types of Fine-tuning:
  1. Full Fine-tuning

  2. PEFT (Parameter Efficient Fine-tuning): (a). LoRA (b). QLoRA

  3. DPO (Direct Preference Optimization)



Full Fine-Tuning:

Full fine-tuning means updating all the knowledge inside the model to make it perform better on your specific task.


  • Remember our model is like a student who has read the whole library.

  • Full fine-tuning = you re-teach the student using your special lessons, and they adjust everything they know based on the new lessons.


How it works:


  1. Start with a pre-trained model (already very knowledgeable).

  2. Prepare a task-specific dataset.

  3. Train the model on this dataset, and update all its internal parameters.

  4. After training, the model is fully adapted to your task (a minimal training sketch follows this list).
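
A minimal full fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries are installed; the model checkpoint and the one-example dataset below are placeholders for illustration only:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "gpt2"  # placeholder; any causal LM checkpoint works in principle
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)  # all weights stay trainable

# Tiny illustrative task dataset: prompt and desired answer joined into one text.
examples = [{"text": "Q: How do I reset my password?\nA: Use the 'Forgot password' link."}]
dataset = Dataset.from_list(examples).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="full-ft", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()  # updates *all* model parameters
```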


This is different from the LoRA/adapter methods, where only some parts of the model are updated.


Pros of Full Fine-Tuning:


  • The model becomes highly specialized for your task.

  • Can achieve maximum accuracy because all parts of the model are adjusted.


Cons of Full Fine-Tuning:


  • Expensive: Requires a lot of computing power and memory.

  • Time-Consuming: Training takes longer.


PEFT (Parameter Efficient Fine-tuning):

These are methods where only a small subset of parameters is trained, while the bulk of the model's weights remains frozen.


 (a). LoRA (Low-Rank Adaptation): Imagine you have a very large language model (lots of parameters). Fine-tuning everything can be extremely expensive. LoRA says: instead of changing all those big weight matrices, keep the original weights frozen and add a small, low-rank “adjustment” matrix. So you’re learning a modest “delta” over the big model, rather than rewriting the whole thing.


How it works:


  • The pretrained weight matrix is frozen (unchanged).

  • You introduce two small matrices A and B (with rank r much smaller than the full dimension) such that the model’s effective weight becomes


 W′ = W + α ⋅ (A × B)


You only train A and B (i.e., the “delta”), not all of W. This drastically reduces how many parameters you must update and store.
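
As an illustration of that formula (not the implementation inside any particular library), here is a from-scratch PyTorch sketch of a linear layer with a frozen base weight and a trainable low-rank delta; the rank, the α/r scaling convention, and the dimensions are just example choices:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank delta: W' = W + scale * (A @ B)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # freeze the pretrained weight W
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(d_out, r) * 0.01)  # small random init
        self.B = nn.Parameter(torch.zeros(r, d_in))          # zero init: delta starts at 0
        self.scale = alpha / r                                # common scaling convention

    def forward(self, x):
        delta = self.A @ self.B                    # rank-r update with the same shape as W
        return self.base(x) + x @ (self.scale * delta).T

# Usage: wrap an existing linear layer and train only A and B.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16.0)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")            # only A and B count here
```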




Limitations/trade-offs:


  • Choosing the right rank (r) and scaling matters: too small → underfits; too large → you lose the efficiency benefits.

  • Because you only allow a small “change budget” (low rank), you might not be able to adapt fully if your new task is very different or demands large shifts in behaviour.


(b). QLoRA (Quantized LoRA): QLoRA builds on LoRA, adding another trick: quantization of the base model weights to very low precision (e.g., 4-bit) so that you can fine-tune huge models on smaller hardware. In other words: “we’ll freeze the big model, make it small/memory-light via quantization, and then inject a LoRA delta to adapt it”.

How it works:


  • Take the pre-trained model and quantize (compress) its weights to a low bit-width format (e.g., 4-bit “NormalFloat 4” (NF4)). This shrinks memory usage.

  • Freeze the quantized weights (you won’t update them).

  • Add low-rank adapters (LoRA) as before, and train those on your downstream task.

  • Because the base is compressed, you can fine-tune very large models (e.g., 65B parameters) on hardware that could not otherwise hold them (a setup sketch follows this list).
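
A hedged sketch of a typical QLoRA setup with the Hugging Face stack (transformers, bitsandbytes, peft); the model name, rank, and target modules are placeholders, and exact arguments can vary across library versions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

# 1. Load the frozen base model in 4-bit NF4 to shrink memory usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)     # housekeeping for k-bit training

# 2. Inject trainable low-rank adapters on top of the frozen, quantized weights.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],           # which layers get adapters (model-specific)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()                 # only the LoRA adapters are trainable
```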


Advantages:


  • Training very large models becomes feasible in more modest settings (less GPU memory required).

  • You still get much of the performance benefit of fine-tuning a huge model.

  • Combines “small update cost” (LoRA) + “small base memory cost” (quantization).


 Disadvantages:


  • There may still be a slight performance drop compared to full-precision fine-tuning (though the QLoRA paper reports near parity in practice).

  • Complexity is higher: you are managing quantization settings on top of the LoRA setup.


DPO (Direct Preference Optimization):

When we want a language model to behave in a way humans like (e.g., being helpful, safe, etc.), one way is to collect preference data: “Given prompt X, response A is preferred by a human over response B.”



Imagine you’re training a chatbot. 

For each question (prompt), humans tell you:

“Answer A sounds better than Answer B.”

You collect lots of these “A is better than B” pairs. Now, you want your model to start preferring A-like answers over B-like ones.
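
In code, such preference data is often stored as (prompt, chosen, rejected) records; the field names below follow a common convention (for example, the one used by Hugging Face TRL), and the texts are made up:

```python
# Illustrative preference pairs: for each prompt, "chosen" was preferred by a human
# over "rejected". Field names follow a common convention, not a requirement.
preference_data = [
    {
        "prompt": "Explain what fine-tuning is in one sentence.",
        "chosen": "Fine-tuning continues training a pre-trained model on task-specific "
                  "examples so it performs better on that task.",
        "rejected": "It is when you tune something finely.",
    },
]
```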

 In older methods, such as Reinforcement Learning from Human Feedback (RLHF):


  1. You first train a reward model that scores how good an answer is.

  2. Then you train your chatbot (using reinforcement learning) to maximize that reward.


That’s a two-step and sometimes unstable process.

 DPO skips all that. It trains the model directly on the preference pairs with a simple classification-style loss: the model is pushed to assign relatively higher probability to the chosen answer than to the rejected one, compared with a frozen reference copy of the model. There is no separate reward model and no reinforcement learning loop.
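
A minimal sketch of the DPO loss in PyTorch, assuming you have already computed the summed log-probabilities of the chosen and rejected responses under both the policy being trained and the frozen reference model; the variable names and the toy numbers are illustrative:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: make the policy favour the chosen response over the rejected one
    more strongly than the frozen reference model does."""
    # How much more the policy prefers each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(x)) is a numerically stable, classification-style loss.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -8.0]),
                torch.tensor([-13.0, -9.0]), torch.tensor([-13.5, -8.5]))
print(loss)
```

In practice you would usually rely on a library implementation (for example, Hugging Face TRL's DPOTrainer) rather than writing the loss and batching by hand.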

Advantages:


  • Often more stable and efficient.

  • Good empirical results in aligning large language models with human preferences.


Limitations:


  • Still need quality preference data: good prompts + well-judged pairs.

  • It’s not magic — if your preferred/rejected pairs are noisy or biased, you’ll get poor alignment.



 
 