Fine-tuning (machine learning)
In machine learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data.[1] Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (not updated during the backpropagation step).[2]
For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen because they capture lower-level features, while later layers often capture higher-level features that are more specific to the task the model is trained on.[2][3]
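As a concrete illustration, the following PyTorch sketch freezes the earlier layers of a pre-trained convolutional network and leaves only its last block and a newly added output layer trainable; the use of torchvision's ResNet-18, the choice of which layers to unfreeze, and the number of output classes are illustrative assumptions rather than prescriptions from the sources above.

```python
# Illustrative layer freezing: earlier layers stay fixed, later layers are fine-tuned.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # load a pre-trained CNN

for param in model.parameters():
    param.requires_grad = False                    # freeze everything (no backprop updates)

for param in model.layer4.parameters():
    param.requires_grad = True                     # unfreeze the last residual block

# Replace the classifier head with a new task-specific layer trained from scratch.
model.fc = nn.Linear(model.fc.in_features, 10)     # e.g. 10 target classes
```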
Fine-tuning is common in natural language processing (NLP), especially in the domain of language modeling. Large language models like OpenAI's GPT-2 can be fine-tuned on downstream NLP tasks to produce better results than the pre-trained model can normally achieve.[4] Models that are pre-trained on large and general corpora are usually fine-tuned by reusing the model's parameters as a starting point and adding a task-specific layer trained from scratch.[5] Fine-tuning the full model is common as well and often yields better results, but it is more computationally expensive.[4] Full fine-tuning is also more prone to overfitting and may cause the model to perform worse on data outside of the distribution of the training data used during fine-tuning.[6]
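For example, this pattern can be sketched with the Hugging Face transformers library, which reuses the pre-trained parameters and attaches a randomly initialized classification head; the model name and label count below are illustrative assumptions.

```python
# Illustrative sketch: pre-trained body reused, task-specific head trained from scratch.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",   # pre-trained parameters as the starting point
    num_labels=2,          # new classification head, initialized from scratch
)

# Full fine-tuning trains every parameter; to train only the new head instead,
# freeze the pre-trained body:
for param in model.base_model.parameters():
    param.requires_grad = False
```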
Fine-tuning is typically accomplished with supervised learning, but there are also techniques to fine-tune a model using weak supervision.[7] Reinforcement learning from human feedback is also used to fine-tune language models such as ChatGPT (a fine-tuned version of GPT-3) and Sparrow.[8][9]
Low-Rank Adaptation (LoRA)[10] trains low-rank matrices ("update matrices") that are added to the existing weights. The basic idea is as follows: suppose the model contains a weight matrix W of size d × k, where d and k are large. Rather than modifying W itself into some W + ΔW, we define ΔW = BA and train only B and A, where B is of size d × r and A is of size r × k. The small dimension r ≪ min(d, k) is the "low rank" of the update matrix ΔW, so far fewer parameters are trained than in full fine-tuning.
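A minimal PyTorch sketch of this construction is shown below: the pre-trained weight W is kept frozen while the low-rank factors B and A are the only trainable parameters. The class name, initialization, and dimensions are illustrative assumptions, not the reference implementation from the LoRA paper.

```python
# Illustrative LoRA-style layer: W is frozen; only the low-rank factors B and A
# (whose product BA is the update matrix ΔW) receive gradients.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, weight: torch.Tensor, rank: int):
        super().__init__()
        d, k = weight.shape                                       # W is d x k
        self.weight = nn.Parameter(weight, requires_grad=False)   # frozen pre-trained W
        self.B = nn.Parameter(torch.zeros(d, rank))               # d x r, initialized to zero
        self.A = nn.Parameter(torch.randn(rank, k) * 0.01)        # r x k, small random init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + BA; x has shape (batch, k), output (batch, d).
        return x @ (self.weight + self.B @ self.A).T

# Example: wrap a hypothetical 768 x 768 pre-trained weight with rank-8 updates.
layer = LoRALinear(torch.randn(768, 768), rank=8)
```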
LoRA is often used for language models, but it can also be used for image models.[11]
References
- Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 978-1-5443-6137-6. Archived from the original on January 10, 2023. Retrieved January 10, 2023.
- "CS231n Convolutional Neural Networks for Visual Recognition". cs231n.github.io. Retrieved 9 March 2023.
- Zeiler, Matthew D; Fergus, Rob (2013). "Visualizing and Understanding Convolutional Networks". arXiv:1311.2901.
- Dingliwal, Saket; Shenoy, Ashish; Bodapati, Sravan; Gandhe, Ankur; Gadde, Ravi Teja; Kirchhoff, Katrin (2021). "Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems". arXiv:2112.08718.
- Dodge, Jesse; Ilharco, Gabriel; Schwartz, Roy; Farhadi, Ali; Hajishirzi, Hannaneh; Smith, Noah (2020). "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". arXiv:2002.06305.
- Kumar, Ananya; Raghunathan, Aditi; Jones, Robbie; Ma, Tengyu; Liang, Percy (2022). "Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution". arXiv:2202.10054.
- Yu, Yue; Zuo, Simiao; Jiang, Haoming; Ren, Wendi; Zhao, Tuo; Zhang, Chao (2020). "Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach". arXiv:2010.07835.
- "Introducing ChatGPT". openai.com. Retrieved 9 March 2023.
- Glaese, Amelia; McAleese, Nat; Trębacz, Maja; Aslanides, John; Firoiu, Vlad; Ewalds, Timo; Rauh, Maribeth; Weidinger, Laura; Chadwick, Martin; Thacker, Phoebe; Campbell-Gillingham, Lucy; Uesato, Jonathan; Huang, Po-Sen; Comanescu, Ramona; Yang, Fan; See, Abigail; Dathathri, Sumanth; Greig, Rory; Chen, Charlie; Fritz, Doug; Elias, Jaume Sanchez; Green, Richard; Mokrá, Soňa; Fernando, Nicholas; Wu, Boxi; Foley, Rachel; Young, Susannah; Gabriel, Iason; Isaac, William; Mellor, John; Hassabis, Demis; Kavukcuoglu, Koray; Hendricks, Lisa Anne; Irving, Geoffrey (2022). "Improving alignment of dialogue agents via targeted human judgements". arXiv:2209.14375.
- Hu, Edward J.; Shen, Yelong; Wallis, Phillip; Allen-Zhu, Zeyuan; Li, Yuanzhi; Wang, Shean; Wang, Lu; Chen, Weizhu (2021). "LoRA: Low-Rank Adaptation of Large Language Models". arXiv:2106.09685.
- Wu, Hecong (February 2023). "ControlLoRA: A Light Neural Network To Control Stable Diffusion Spatial Information". Retrieved 27 April 2023.