
# Model Collection

import { Callout, FileTree } from 'nextra-theme-docs'

<Callout emoji="⚠️">
  This section is under heavy development.
</Callout>

This section collects and summarizes notable and foundational LLMs.


## Models

| Model | Description | 
| --- | --- | 
| [BERT](https://arxiv.org/abs/1810.04805) | Bidirectional Encoder Representations from Transformers | 
| [RoBERTa](https://arxiv.org/abs/1907.11692) | A Robustly Optimized BERT Pretraining Approach | 
| [ALBERT](https://arxiv.org/abs/1909.11942) | A Lite BERT for Self-supervised Learning of Language Representations | 
| [XLNet](https://arxiv.org/abs/1906.08237) | Generalized Autoregressive Pretraining for Language Understanding |
| [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | Improving Language Understanding by Generative Pre-Training | 
| [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | Language Models are Unsupervised Multitask Learners | 
| [GPT-3](https://arxiv.org/abs/2005.14165) | Language Models are Few-Shot Learners |
| [T5](https://arxiv.org/abs/1910.10683) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 
| [CTRL](https://arxiv.org/abs/1909.05858) | CTRL: A Conditional Transformer Language Model for Controllable Generation | 
| [BART](https://arxiv.org/abs/1910.13461) | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| [Chinchilla](https://arxiv.org/abs/2203.15556) (Hoffmann et al. 2022) | Shows that for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
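
The Chinchilla entry in the table can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on two common rules of thumb that are assumptions here, not statements from this page: training cost of roughly 6 FLOPs per parameter per token, and a compute-optimal ratio of about 20 training tokens per parameter. The function name `compute_optimal_split` is hypothetical.

```python
import math


def compute_optimal_split(flops_budget, tokens_per_param=20.0):
    """Split a training FLOPs budget into (parameters, tokens).

    Assumptions (rules of thumb, not exact results):
      - training cost C ~= 6 * N * D FLOPs for N parameters and D tokens
      - compute-optimal training uses D ~= tokens_per_param * N
    """
    # With D = r * N, the cost becomes C = 6 * r * N^2, so N = sqrt(C / (6 * r)).
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Example budget chosen to be on the order of a large-scale training run.
budget = 5.88e23
n, d = compute_optimal_split(budget)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Under these assumptions, a budget of about 5.88e23 FLOPs works out to roughly 70B parameters trained on roughly 1.4T tokens, rather than a much larger model trained on far fewer tokens. Changing `tokens_per_param` shows how sensitive the split is to the assumed ratio.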