Omar Hosney
PEFT (Parameter-Efficient Fine-Tuning) Cheat Sheet
1. Introduction to PEFT
- PEFT reduces computational and storage costs by fine-tuning fewer parameters.
- Enables the training of large models on consumer hardware, making AI more accessible.
- Maintains performance comparable to fully fine-tuned models.
- Seamless Integration: Works with Hugging Face libraries like Transformers, Diffusers, and Accelerate.
2. PEFT Methodologies
- Soft Prompting: Adds learnable parameters to input embeddings to optimize tasks while keeping model parameters frozen.
- LoRA (Low-Rank Adaptation): Uses low-rank matrices to reduce memory usage and computational cost by limiting the number of trainable parameters.
- IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations): Multiplies inner activations (keys, values, and feed-forward activations) by learned vectors, adding very few trainable parameters.
3. Adapter Methods
- Adapters: Small neural networks inserted into layers of a pretrained model, allowing task adaptation without altering the base model.
- X-LoRA: Combines multiple task-specific LoRA adapters through a learned gating mechanism (a mixture of LoRA experts), enhancing flexibility and efficiency.
4. Quick Tour of PEFT
- Install PEFT: Run pip install peft, or install from the GitHub repository for the latest features.
- Configuration: Define method-specific settings, such as the rank of the LoRA matrices, using LoraConfig or PromptEncoderConfig.
- Save Model: Use save_pretrained() to save only the adapter weights, keeping checkpoints small.
- Load Model for Inference: Use from_pretrained() to load a trained adapter on top of the base model, as shown in the sketch below.
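A minimal sketch of this flow using the PEFT API named above; the facebook/opt-350m checkpoint and the hyperparameter values are only illustrative assumptions.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, TaskType, get_peft_model

# Wrap a base model with a LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # assumed example checkpoint
config = LoraConfig(r=8, lora_alpha=32, lora_dropout=0.05, task_type=TaskType.CAUSAL_LM)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the LoRA weights are trainable

# ... train with Trainer or a custom loop ...

model.save_pretrained("opt-350m-lora")  # stores only the small adapter weights

# Load for inference: base model plus the saved adapter.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = PeftModel.from_pretrained(base, "opt-350m-lora")
```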
5. Advanced Applications
- Integration with Diffusers: Manage multiple adapters for generative AI tasks, such as creating images and videos from text prompts.
- Integration with Transformers: Efficiently train large-scale language models for various NLP tasks using adapters.
- Soft Prompting Methods: Learn task-specific prompts dynamically by adding learnable parameters to input embeddings.
6. Advanced Configurations
- Create Custom Configurations: Tailor PEFT methods to specific needs by creating configurations like LoraConfig.
- API References: Explore detailed API references for methods and classes to fine-tune models effectively.
7. Model Merging & Quantization
- TIES & DARE: Efficiently merge models by eliminating redundant parameters using trimming and rescaling techniques.
- Quantization: Use fewer bits to represent weights, reducing memory usage and accelerating inference for large language models.
- QLoRA: Combines 4-bit quantization with LoRA to fine-tune large models on limited hardware; see the sketch after this list.
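A hedged QLoRA-style sketch: the frozen base model is loaded in 4-bit via bitsandbytes and a LoRA adapter is trained on top. The checkpoint name, target module names, and hyperparameters are placeholders, not a prescription.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization keeps the frozen base weights small in memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed example checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # prepares the quantized model for training (fp32 norms, input grads)

# The LoRA adapter is trained in higher precision on top of the quantized base.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```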
Different Adapters
Low-Rank Adaptation (LoRA)
- LoRA represents weight updates using low-rank matrices.
- Keeps pretrained weights frozen, reducing trainable parameters.
- Combines original and adapted weights for final results.
- Efficient and comparable to full fine-tuning.
- Typically applied to attention blocks in Transformer models, as in the sketch below.
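A brief sketch of targeting the attention projections with LoRA. The checkpoint and the q_proj/v_proj module names are assumptions; module names vary by architecture.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # assumed example model
config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections (names depend on the model)
    lora_dropout=0.05,
)
model = get_peft_model(model, config)

# After training, the low-rank update can be folded back into the base weights.
merged = model.merge_and_unload()
```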
Mixture of LoRA Experts (X-LoRA)
- X-LoRA uses dense/sparse gating to activate experts dynamically.
- Only the gating layers are trained, keeping the parameter count low.
- Allows the model to reconfigure dynamically during inference.
- Requires a dual forward pass for effective knowledge mixing.
Low-Rank Hadamard Product (LoHa)
- LoHa enhances model expressivity using the Hadamard (element-wise) product.
- Expresses the update with four smaller matrices, achieving a higher effective rank for the same parameter count.
- Originally developed for computer vision and later adapted for diffusion models; see the sketch below.
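A small sketch of applying LoHa through PEFT's LoHaConfig. The ViT backbone, the "query"/"value" module names, and the dropout value are assumptions for illustration.

```python
from transformers import AutoModelForImageClassification
from peft import LoHaConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10  # assumed example backbone
)
config = LoHaConfig(
    r=8,
    alpha=16,
    target_modules=["query", "value"],  # attention projections in ViT
    module_dropout=0.1,
    modules_to_save=["classifier"],     # train the new classification head normally
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```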
Low-Rank Kronecker Product (LoKr)
- LoKr uses Kronecker product for parameter-efficient finetuning.
- Maintains the original weight matrix's rank.
- Can be vectorized for faster processing.
Orthogonal Finetuning (OFT)
- OFT preserves pretrained model's generative performance.
- Maintains cosine similarity between neurons for semantic preservation.
- Utilizes a sparse block-diagonal matrix to be parameter-efficient.
Orthogonal Butterfly (BOFT)
- BOFT focuses on maintaining pretrained model's structure.
- Uses an orthogonal matrix for transformations.
- Ensures minimal change in the model's latent space.
Adaptive Low-Rank Adaptation (AdaLoRA)
- AdaLoRA allocates parameters based on task importance.
- Uses SVD-like techniques to control rank dynamically.
- Prunes less important parameters for efficiency; see the sketch below.
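A hedged AdaLoraConfig sketch: ranks start at init_r and are pruned toward the target_r budget over the schedule given by tinit/tfinal/deltaT. The checkpoint, module names, and all numeric values are placeholder assumptions.

```python
from transformers import AutoModelForCausalLM
from peft import AdaLoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # assumed example model
config = AdaLoraConfig(
    init_r=12,        # starting rank for every adapted matrix
    target_r=4,       # average rank budget after pruning
    tinit=200,        # warmup steps before pruning starts
    tfinal=1000,      # steps over which the budget is annealed
    deltaT=10,        # reallocate the rank budget every deltaT steps
    total_step=2000,  # assumed total number of training steps
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```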
Llama-Adapter
- Llama-Adapter adapts models for instruction-following.
- Uses learnable prompts to guide higher-level semantics.
- Zero-initialized attention prevents the learned prompts from overwhelming pretrained knowledge; see the sketch below.
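Llama-Adapter is exposed in PEFT as adaption prompts. A minimal sketch, assuming a Llama checkpoint and placeholder values for the prompt length and number of adapted layers.

```python
from transformers import AutoModelForCausalLM
from peft import AdaptionPromptConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # assumed example checkpoint
config = AdaptionPromptConfig(
    adapter_len=10,     # number of learnable prompt tokens per adapted layer
    adapter_layers=30,  # insert prompts into the top 30 transformer layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```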
Soft Prompts
Prompt Tuning
- Trains only a small set of task-specific prompt parameters.
- Originally developed for T5, casting text classification as a text generation task.
- Each prompt token has its own parameters, updated independently.
- Keeps the pretrained model frozen and updates only the prompt embeddings.
- Performance is comparable to full model training; see the sketch below.
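A minimal prompt-tuning sketch with PromptTuningConfig. The t5-base checkpoint, the prompt length, and the initialization text are assumptions for illustration.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # assumed example model
config = PromptTuningConfig(
    task_type="SEQ_2_SEQ_LM",
    num_virtual_tokens=8,                      # length of the learned soft prompt
    prompt_tuning_init=PromptTuningInit.TEXT,  # initialize from a natural-language prompt
    prompt_tuning_init_text="Classify whether the review is positive or negative:",
    tokenizer_name_or_path="t5-base",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the prompt embeddings are trainable
```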
Prefix Tuning
- Optimizes prefix parameters for each task.
- Works with natural language generation tasks on GPT models.
- Prefix parameters are inserted at all layers of the model.
- Uses a separate feed-forward network (FFN) for optimization.
- Comparable to full finetuning with about 1000x fewer parameters; see the sketch below.
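A short prefix-tuning sketch with PrefixTuningConfig; the checkpoint and prefix length are placeholder assumptions.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PrefixTuningConfig, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # assumed example model
config = PrefixTuningConfig(
    task_type="SEQ_2_SEQ_LM",
    num_virtual_tokens=20,  # prefix length prepended at every layer
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```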
P-Tuning
- Suitable for natural language understanding tasks.
- Uses a prompt encoder (LSTM) to optimize prompts.
- Prompt tokens can be inserted anywhere in the input sequence.
- Only adds tokens to the input, not to every layer.
- Improves performance with anchor tokens; see the sketch below.
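P-tuning is configured through PromptEncoderConfig. A minimal sketch, assuming a RoBERTa classifier and placeholder sizes for the virtual tokens and the prompt encoder.

```python
from transformers import AutoModelForSequenceClassification
from peft import PromptEncoderConfig, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")  # assumed example model
config = PromptEncoderConfig(
    task_type="SEQ_CLS",
    num_virtual_tokens=20,
    encoder_hidden_size=128,  # hidden size of the prompt encoder that reparameterizes the prompts
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```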
Multitask Prompt Tuning
- Enables parameter-efficient transfer learning.
- Learns a single prompt for multiple tasks.
- Consists of source training and target adaptation stages.
- Uses Hadamard product for generating task-specific prompts.
- Trains a shared prompt matrix across all tasks.
IA3 and BOFT
IA3 Overview
- IA3 makes fine-tuning more efficient by using learned vectors to rescale inner activations.
- Only trainable parameters are the learned vectors; original weights remain frozen.
- IA3 drastically reduces the number of trainable parameters to about 0.01% for T0.
- Performance is comparable to fully fine-tuned models without adding inference latency.
IA3 in Practice
- Injected in the attention and feedforward modules of transformers.
- Targets outputs of key and value layers and input of the second feedforward layer.
- Implemented using IA3Config to control how IA3 is applied.
- Example: sequence classification with a Llama model using an IA3Config, as in the sketch below.
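A hedged sketch of that configuration: key/value projections and the second (down) feed-forward projection are rescaled. The Llama checkpoint and num_labels are assumptions.

```python
from transformers import AutoModelForSequenceClassification
from peft import IA3Config, TaskType, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=2  # assumed example checkpoint
)
peft_config = IA3Config(
    task_type=TaskType.SEQ_CLS,
    target_modules=["k_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],  # modules treated as feed-forward (rescaled on the input side)
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```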
OFT and BOFT Overview
- OFT uses an orthogonal matrix to transform pretrained weights.
- BOFT generalizes OFT using Butterfly factorization for greater efficiency.
- Uses multiplicative updates for weight matrices, preserving pretraining knowledge better.
- Efficiently reduces the number of trainable parameters while maintaining model performance.
BOFT Key Features
- Uses Butterfly factorization to parameterize the orthogonal matrix.
- Structural constraint maintains hyperspherical energy to prevent knowledge forgetting.
- Supports flexible and parameter-efficient finetuning for various downstream tasks.
- Can merge weights with base model using merge_and_unload().
BOFT Parameters
- boft_block_size: Determines sparsity of update matrices.
- boft_block_num: Specifies number of blocks across layers.
- boft_n_butterfly_factor: Defines the number of butterfly factors.
- boft_dropout: Probability of multiplicative dropout.
Example Usage
- Configure for image classification using BOFTConfig.
- Set parameters like boft_block_size and target_modules.
- Integrate with the Transformers library and PEFT for training; a sketch follows.
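A hedged image-classification sketch with BOFTConfig. The ViT backbone, the "query"/"value" module names, and the block/butterfly values are assumptions chosen for illustration.

```python
from transformers import AutoModelForImageClassification
from peft import BOFTConfig, get_peft_model

model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10  # assumed example backbone
)
config = BOFTConfig(
    boft_block_size=4,                  # block size of the sparse block-diagonal factors
    boft_n_butterfly_factor=2,          # number of butterfly factors
    boft_dropout=0.1,                   # multiplicative dropout probability
    target_modules=["query", "value"],  # attention projections in ViT (names vary by model)
    modules_to_save=["classifier"],     # also train the new classification head
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# After training, the orthogonal updates can be merged into the base weights:
# model = model.merge_and_unload()
```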