LLaMA

LLaMA (Large Language Model Meta AI) is a large language model (LLM) released by Meta AI in February 2023. Models were trained at a range of sizes, from 7 billion to 65 billion parameters. LLaMA's developers reported that the 13-billion-parameter model outperformed the much larger GPT-3 (with 175 billion parameters) on most NLP benchmarks, and that the largest model was competitive with state-of-the-art models such as PaLM and Chinchilla.[1] Whereas the most powerful LLMs have generally been accessible only through limited APIs (if at all), Meta released LLaMA's model weights to the research community under a noncommercial license.[2] Within a week of LLaMA's release, its weights were leaked to the public on 4chan via BitTorrent.[3]

Architecture and training

"A conversation between the sun and pluto"

Sun: Hey Pluto, what’s up?
Pluto: Not much. Just hanging out here at the edge of the solar system.
Sun: Yeah, I’ve been meaning to talk to you about that. You’re not a planet anymore.
Pluto: What? Come on, man. That’s not fair.
Sun: I’m sorry, but it’s true. You just don’t meet the criteria anymore.
Pluto: This is bulls**t! I’ve been a planet for over 70 years!
Sun: Things change, Pluto. You’ve just got to accept it.
Pluto: F**k you, Sun! F**k you and your stupid solar system!

– Output of the 65-billion-parameter LLaMA model after instruction tuning, given the prompt "Write a conversation between the sun and pluto"[1]

LLaMA uses the transformer architecture, the standard architecture for language modelling since 2018. LLaMA's developers focused on improving the model's performance by increasing the volume of training data rather than the number of parameters, reasoning that the dominant cost of an LLM comes from running inference on the trained model rather than from the one-time computational cost of training. LLaMA was trained on 1.4 trillion tokens drawn from publicly available data sources.[1]
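
The cost reasoning above can be illustrated with a rough calculation. The sketch below assumes two common rules of thumb for dense transformers that are not taken from this article: training costs about 6 × N × D floating-point operations and generating one token costs about 2 × N, where N is the parameter count and D is the number of training tokens. The function names are illustrative, and applying the 1.4 trillion token figure to the 65 billion parameter model is a simplifying assumption. The parameter counts are the ones quoted in the text.

```python
# Back-of-the-envelope cost model for a dense transformer.
# ASSUMPTIONS (rules of thumb, not figures reported in the article):
#   training compute  ~ 6 * N * D FLOPs
#   inference compute ~ 2 * N FLOPs per generated token
# where N = number of parameters and D = number of training tokens.


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate one-time training cost in FLOPs."""
    return 6.0 * n_params * n_tokens


def inference_flops_per_token(n_params: float) -> float:
    """Approximate cost in FLOPs of generating a single token."""
    return 2.0 * n_params


def breakeven_tokens(n_params: float, n_train_tokens: float) -> float:
    """Generated tokens at which total inference cost equals training cost."""
    return training_flops(n_params, n_train_tokens) / inference_flops_per_token(n_params)


if __name__ == "__main__":
    llama_65b, llama_13b, gpt3 = 65e9, 13e9, 175e9  # parameter counts from the text
    train_tokens = 1.4e12                            # token count from the text

    print(f"Training cost, 65B model: {training_flops(llama_65b, train_tokens):.3g} FLOPs")
    print(f"Break-even point: {breakeven_tokens(llama_65b, train_tokens):.3g} generated tokens")

    # Per-token inference cost scales with parameter count, so a smaller model
    # that matches a larger one on benchmarks is cheaper for every token served.
    ratio = inference_flops_per_token(gpt3) / inference_flops_per_token(llama_13b)
    print(f"175B vs 13B inference cost per token: {ratio:.1f}x")
```

Under these assumptions the break-even point is always 3 × D generated tokens, whatever the model size, so a widely deployed model can spend more compute on inference than it did on training. That is the situation in which a smaller model trained on more data is the cheaper choice overall.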

Release and leak

LLaMA was announced on February 23, 2023, via a blog post and a paper describing the model's training, architecture, and performance.[1][2] The inference code used to run the model was publicly released under the open-source GPLv3 license.[4] Access to the model's weights was managed through an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".[2]

On March 2, 2023,[5] a torrent containing LLaMA's weights was uploaded, and a link to the torrent was shared on the 4chan imageboard and subsequently spread through online AI communities.[3] That same day, a pull request was opened on the main LLaMA repository, requesting that the magnet link be added to the official documentation.[6][7] On March 4, a pull request was opened to add links to Hugging Face repositories containing the model.[8][6] On March 6, Meta filed takedown requests to remove the Hugging Face repositories linked in the pull request, characterizing them as "unauthorized distribution" of the model. Hugging Face complied with the requests.[9] On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.[10] As of March 25, Meta had not responded to the pull request containing the magnet link.[7]

Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility and the fact that its smaller versions can be run relatively cheaply, suggesting that this would encourage further research.[3] Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model that, unlike the comparably sophisticated models that preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.[3][11]

Open-source reproduction

On April 17, 2023, Together launched a project named RedPajama to reproduce and distribute an open-source version of the LLaMA dataset.[12] The dataset contains approximately 1.2 trillion tokens and is publicly available for download.[13]

Applications

The Stanford University Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3.5 series text-davinci-003 model at a modest cost.[14][15] Multiple open-source projects are continuing this work of fine-tuning LLaMA with the Alpaca dataset.[16]

References

  1. Touvron, Hugo; Lavril, Thibaut; Izacard, Gautier; Martinet, Xavier; Lachaux, Marie-Anne; Lacroix, Timothée; Rozière, Baptiste; Goyal, Naman; Hambro, Eric; Azhar, Faisal; Rodriguez, Aurelien; Joulin, Armand; Grave, Edouard; Lample, Guillaume (2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971 [cs.CL].
  2. "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023.
  3. Vincent, James (8 March 2023). "Meta's powerful AI language model has leaked online — what happens now?". The Verge.
  4. "llama". GitHub. Retrieved 16 March 2023.
  5. "/g/ - /aicg/ - AI Chatbot General - Technology - 4chan". 5 March 2023.
  6. VK, Anirudh (6 March 2023). "Meta's LLaMA Leaked to the Public, Thanks To 4chan". Analytics India Magazine. Retrieved 17 March 2023.
  7. "Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama". GitHub. Retrieved 25 March 2023.
  8. "Download weights from huggingface to help us save bandwith by Jainam213 · Pull Request #109 · facebookresearch/llama". GitHub. Retrieved 17 March 2023.
  9. Cox, Joseph (7 March 2023). "Facebook's Powerful Large Language Model Leaks Online". Vice. Retrieved 17 March 2023.
  10. OpSec Online LLC (21 March 2023). "github/dmca - Notice of Claimed Infringement via Email". GitHub. Retrieved 25 March 2023.
  11. Willison, Simon (11 March 2023). "Large language models are having their Stable Diffusion moment". Simon Willison's Weblog.
  12. "RedPajama-Data: An Open Source Recipe to Reproduce LLaMA training dataset". GitHub. Together. Retrieved 4 May 2023.
  13. "RedPajama-Data-1T". Hugging Face. Together. Retrieved 4 May 2023.
  14. Taori, Rohan; Gulrajani, Ishaan; Zhang, Tianyi; Dubois, Yann; Li, Xuechen; Guestrin, Carlos; Liang, Percy; Hashimoto, Tatsunori B. (13 March 2023). "Alpaca: A Strong, Replicable Instruction-Following Model". Stanford Center for Research on Foundation Models.
  15. Wang, Yizhong; Kordi, Yeganeh; Mishra, Swaroop; Liu, Alisa; Smith, Noah A.; Khashabi, Daniel; Hajishirzi, Hannaneh (20 December 2022). "Self-Instruct: Aligning Language Model with Self Generated Instructions". arXiv:2212.10560. ISSN 2331-8422. Wikidata Q117202254.
  16. "alpaca-lora". GitHub. Retrieved 5 April 2023.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.