Tutorial: MSc Cristian Camilo Millán Arias


Efficient Fine-Tuning for Quantized LLM on a Single GPU

MSc Cristian Camilo Millán Arias

Large Language Models (LLMs) have revolutionized how we interact with artificial intelligence and natural language processing. These models, such as GPT, have proven highly effective across a wide range of applications, but they pose significant challenges. One of these is the need for continual supervision and for ongoing effort to fine-tune both the models and their training data. Moreover, the large number of model parameters and the vast size of the training datasets demand substantial computational power, which often makes training infeasible on a personal machine. A commonly used strategy is efficient fine-tuning, which allows a model to be specialized for a specific domain at lower computational cost. This tutorial briefly introduces how to carry out efficient fine-tuning of quantized LLMs on a single GPU. We will explore the frameworks that enable model quantization, which lowers the memory needed to hold the model, and that reduce the number of parameters to fine-tune, with little loss in performance.