In-context learning (natural language processing)

In natural language processing, in-context learning, few-shot learning or few-shot prompting is a prompting technique that allows a model to process examples before attempting a task.[1][2] The method was popularized after the advent of GPT-3[3] and is considered to be an emergent property of large language models.[4]

A few-shot prompt normally includes n examples of (problem, solution) pairs, known as "shots"; using such a prompt is accordingly called n-shot prompting.[5][6] For instance, the following is a one-shot prompt for review sentiment classification:

Review: This movie sucks. Sentiment: negative.
Review: I love this movie. Sentiment:

If the model outputs "positive", then it has correctly solved the task.[4]
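A prompt of this shape can be assembled programmatically. The following is a minimal sketch (the function name and exact formatting are illustrative, not a standard API); with an empty list of shots it degenerates to a zero-shot prompt.

```python
def build_prompt(shots, query):
    """Assemble an n-shot sentiment prompt from (review, label) pairs.

    An empty `shots` list yields a zero-shot prompt for `query`.
    """
    lines = [f"Review: {review} Sentiment: {label}" for review, label in shots]
    lines.append(f"Review: {query} Sentiment:")
    return " ".join(lines)

# A one-shot prompt mirroring the sentiment example above.
prompt = build_prompt([("This movie sucks.", "negative")], "I love this movie.")
print(prompt)
```

The resulting string would then be sent to the model, whose next-token output ("positive" or "negative") is read off as the answer.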

The term zero-shot prompting is often used to signify that no examples are provided.[7][8][9] An example of a zero-shot prompt for a question-answering task would be "Who wrote the book On the Origin of Species?".

In-context learning was initially proposed as an alternative to fine-tuning a pre-trained language model on a task-specific dataset.[3] The term is somewhat misleading: no model parameters are updated, so no learning takes place in the usual sense; instead, the prompt primes the model for the subsequent inference. The main advantages of in-context learning over fine-tuning are that far less task-specific data is needed and that there is less risk of overfitting to the narrow distribution of a fine-tuning dataset.[3] Few-shot performance of large language models has been shown to be competitive on NLP tasks, sometimes surpassing prior state-of-the-art fine-tuning approaches.[3][10] Examples of such NLP tasks include translation, question answering, cloze tasks, unscrambling words, and using a novel word in a sentence. The creation and optimization of few-shot prompts is part of the now-active field of prompt engineering.[11][12]

While few-shot prompting has performed competitively when compared to fine-tuned models, it has its own drawbacks. For example, it has been shown that the order in which the shots are listed can make the difference between state-of-the-art and near-random performance, and a set of few-shot examples that works well in some specific order with one model may not work at all with a different model.[13] Despite these shortcomings, the widely used Transformer architecture has been shown to encode learning algorithms based on gradient descent in its weights, enabling a form of mesa-optimization:[14] the model effectively fits a small model to the data given in context when making its predictions.[15][16][17][18]
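The order sensitivity described above is typically probed by enumerating orderings of a fixed shot set and scoring each resulting prompt on held-out data. The sketch below only builds the candidate prompts; the model call and scoring are omitted, and the example reviews are hypothetical.

```python
from itertools import permutations

# A fixed set of labeled shots (illustrative examples).
shots = [
    ("This movie sucks.", "negative"),
    ("I love this movie.", "positive"),
    ("A waste of two hours.", "negative"),
]

def build_prompt(ordered_shots, query):
    """Build one candidate prompt from shots in a given order."""
    lines = [f"Review: {r} Sentiment: {s}" for r, s in ordered_shots]
    lines.append(f"Review: {query} Sentiment:")
    return "\n".join(lines)

# Every ordering of the same shots yields a distinct prompt; in practice each
# would be scored against a validation set with the target model (not shown).
orderings = [
    build_prompt(list(p), "Best film of the year.") for p in permutations(shots)
]
print(len(orderings))  # 3! = 6 candidate prompts
```

That all six prompts contain identical information yet can perform very differently is precisely the order-sensitivity finding reported by Lu et al.[13]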

A common example of in-context learning is chain-of-thought prompting, where few-shot examples are given to teach the model to output a string of reasoning before attempting to answer a question.[19] This technique has been shown to improve the performance of models on tasks that require logical thinking and reasoning.[20]
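As a concrete illustration, the following one-shot chain-of-thought prompt uses an arithmetic word problem adapted from Wei et al.[20]: the worked example shows intermediate reasoning before the final answer, which the model is expected to imitate on the new question.

```python
# A one-shot chain-of-thought prompt: the worked example demonstrates the
# reasoning steps, not just the final answer.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)
print(cot_prompt)
```

A standard few-shot prompt would instead pair each question directly with "The answer is 11."; including the intermediate steps is what elicits the reasoning behavior.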


References

  1. Logan IV, Robert; Balazevic, Ivana; Wallace, Eric; Petroni, Fabio; Singh, Sameer; Riedel, Sebastian (2022). "Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models". Findings of the Association for Computational Linguistics: ACL 2022: 2824–2835. doi:10.18653/v1/2022.findings-acl.222. S2CID 235652287.
  2. Bragg, Jonathan; Cohan, Arman; Lo, Kyle; Beltagy, Iz (9 November 2021). "FLEX: Unifying Evaluation for Few-Shot NLP". arXiv:2107.07170 [cs.CL].
  3. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
  4. Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022). "Emergent Abilities of Large Language Models". arXiv:2206.07682 [cs.CL].
  5. Beltagy, Iz; Cohan, Arman; Logan IV, Robert; Min, Sewon; Singh, Sameer (2022). "Zero- and Few-Shot NLP with Pretrained Language Models". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts: 32–37. doi:10.18653/v1/2022.acl-tutorials.6. S2CID 248779924.
  6. Ke, Zixuan; Lin, Haowei; Shao, Yijia; Xu, Hu; Shu, Lei; Liu, Bing (2022). "Continual Training of Language Models for Few-Shot Learning". arXiv:2210.05549 [cs.CL].
  7. Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch.
  8. Wei, Jason; Bosma, Maarten; Zhao, Vincent Y.; Guu, Kelvin; Yu, Adams Wei; Lester, Brian; Du, Nan; Dai, Andrew M.; Le, Quoc V. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652 [cs.CL].
  9. Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (Dec 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.). "Language Models are Few-Shot Learners" (PDF). Advances in Neural Information Processing Systems. Curran Associates, Inc. 33: 1877–1901.
  10. Schick, Timo; Schütze, Hinrich (2021). "It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners". Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: 2339–2352. doi:10.18653/v1/2021.naacl-main.185. S2CID 221703107.
  11. Mok, Aaron. "'Prompt engineering' is one of the hottest jobs in generative AI. Here's how it works". Business Insider. Retrieved 14 March 2023.
  12. Harwell, Drew (25 February 2023). "Tech's hottest new job: AI whisperer. No coding required". Washington Post. Retrieved 14 March 2023.
  13. Lu, Yao; Bartolo, Max; Moore, Alastair; Riedel, Sebastian; Stenetorp, Pontus (2022). "Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 8086–8098. doi:10.18653/v1/2022.acl-long.556. S2CID 233296494.
  14. "Mesa-Optimization". Retrieved 17 May 2023.
  15. Johannes von Oswald; Niklasson, Eyvind; Randazzo, Ettore; Sacramento, João; Mordvintsev, Alexander; Zhmoginov, Andrey; Vladymyrov, Max (2022). "Transformers learn in-context by gradient descent". arXiv:2212.07677 [cs.LG].
  16. Garg, Shivam; Tsipras, Dimitris; Liang, Percy; Valiant, Gregory (2022). "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes". arXiv:2208.01066 [cs.CL].
  17. Akyürek, Ekin; Schuurmans, Dale; Andreas, Jacob; Ma, Tengyu; Zhou, Denny (2022). "What learning algorithm is in-context learning? Investigations with linear models". arXiv:2211.15661 [cs.LG].
  18. Musser, George. "How AI Knows Things No One Told It". Scientific American. Retrieved 17 May 2023.
  19. Wei, Jason; Zhou, Denny. "Language Models Perform Reasoning via Chain of Thought". ai.googleblog.com. Retrieved 10 March 2023.
  20. Wei, Jason; Wang, Xuezhi; Schuurmans, Dale; Bosma, Maarten; Ichter, Brian; Xia, Fei; Chi, Ed H.; Le, Quoc V.; Zhou, Denny (31 October 2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models". arXiv:2201.11903 [cs.CL].
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.