Updated list of LLMs

pull/315/head
S4MFI 2023-10-12 16:51:43 +03:00
parent b55469455e
commit fc0c75dcc5
1 changed file with 43 additions and 0 deletions

@@ -9,6 +9,31 @@ This section consists of a collection and summary of notable and foundational LL
| Model | Release Date | Size (B) | Checkpoints | Description |
| --- | --- | --- | --- | --- |
| [Falcon LLM](https://falconllm.tii.ae/) | Sep 2023 | 7, 40, 180 | [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b), [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b), [Falcon-180B](https://huggingface.co/tiiuae/falcon-180B) | Falcon LLM is a family of foundational large language models (LLMs) from TII; the largest, the newly released Falcon-180B, has 180 billion parameters and was trained on 3,500 billion tokens. |
| [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | Sep 2023 | 7 | - | Mistral-7B-Instruct-v0.1 is a fine-tuned version of the Mistral-7B-v0.1 generative text model. It's designed for instruction following and uses a variety of publicly available conversation datasets for training. The model is based on a transformer architecture with features like Grouped-Query Attention and Sliding-Window Attention. It doesn't have any moderation mechanisms but is a quick demonstration that the base model can be easily fine-tuned for compelling performance. |
| [WizardLM-70b-v1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) | Aug 2023 | 70 | - | WizardLM-70b-v1.0 is a large language model designed to follow complex instructions. It performs well in coding, mathematical reasoning, and open-domain conversations. The model is license-friendly and adopts a prompt format from Vicuna for multi-turn conversations. |
| [Llama-2-70b-chat](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | Jul 2023 | 70 | - | Llama-2-70b-chat is an auto-regressive language model optimized for dialogue use cases. It's part of the Llama 2 family of large language models developed by Meta. The model uses an optimized transformer architecture and is fine-tuned for helpfulness and safety. It outperforms many open-source chat models on benchmarks and is on par with some popular closed-source models. |
| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b) | Jul 2023 | 6 | - | ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It has improved performance, longer context capabilities, more efficient inference, and an open license for academic and commercial use. The model uses a hybrid objective function and has been trained with 1.4T bilingual tokens. It shows substantial improvements in performance on various datasets compared to its first-generation counterpart. |
| [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | Jul 2023 | 7 | - | Llama-2-7b-chat is part of the Llama 2 family of large language models developed by Meta. It's a fine-tuned model optimized for dialogue use cases. The model outperforms open-source chat models on most benchmarks and is on par with some popular closed-source models like ChatGPT. It uses an optimized transformer architecture and is trained on a mix of publicly available online data. |
| [WizardLM-13B-v1.1](https://huggingface.co/WizardLM/WizardLM-13B-V1.1) | Jul 2023 | 13 | - | WizardLM-13B V1.1 is a version of the WizardLM model, designed to follow complex instructions in various NLP tasks. |
| [Llama-2-13b-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | Jul 2023 | 13 | - | Llama-2-13b-chat is part of the Llama 2 family of large language models developed by Meta. It is a fine-tuned model optimized for dialogue use cases. The model outperforms open-source chat models on most benchmarks and is on par with some popular closed-source models like ChatGPT. It uses an optimized transformer architecture and is trained on a mix of publicly available online data. |
| [XGen-7B-8K-Inst](https://huggingface.co/Salesforce/xgen-7b-8k-inst) | Jul 2023 | 7 | - | The XGen-7B-8K-Inst, developed by Salesforce AI Research, is a 7B parameter language model, fine-tuned for instruction following. |
| [CodeLlama-34B-instruct](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) | Jul 2023 | 34 | - | CodeLlama-34B-instruct is a variant of the Code Llama family designed for general code synthesis and understanding. It is specifically tuned for instruction following and safer deployment. The model is auto-regressive and uses an optimized transformer architecture. It is intended for commercial and research use in English and relevant programming languages. |
| [LLaMA-13B](https://arxiv.org/abs/2302.13971) | Jul 2023 | 13 | - | LLaMA-13B is part of the LLaMA-2 series, an extension of the original LLaMA models. It maintains the architecture of the earlier version but was trained on 40% more data to enhance performance. It is built to be a high-performing language model, excelling at various natural language processing tasks and aiming to push the boundaries of what language models can achieve in both size and capability. |
| [WizardLM-30B](https://huggingface.co/WizardLM/WizardLM-30B-V1.0) | Jul 2023 | 30 | - | WizardLM-30B V1.0 is a model developed by the WizardLM Team, designed to empower large pre-trained language models to follow complex instructions across various NLP tasks. |
| [WizardLM-13b-v1.2](https://huggingface.co/WizardLM/WizardLM-13B-V1.2) | Jul 2023 | 13 | - | WizardLM-13b-v1.2 is an instruction-following large language model that achieves high scores on various benchmarks like MT-Bench and AlpacaEval. It is designed to provide helpful, detailed, and polite answers in multi-turn conversations. It was trained from Llama-2 and uses brand-new Evol+ methods. |
| [Claude-2](https://www.anthropic.com/index/claude-2) | Jul 2023 | 130 | - | Claude 2 is a foundational LLM built by Anthropic, designed to be safer and more "steerable" than its previous version. It is conversational and can be used for a variety of tasks like customer support, Q&A, and more. It can process large amounts of text and is well-suited for applications that require handling extensive data, such as documents, emails, FAQs, and chat transcripts. |
| [MPT-30B-Instruct](https://huggingface.co/mosaicml/mpt-30b-instruct) | Jun 2023 | 30 | - | MPT-30B-Instruct is a model developed by MosaicML, fine-tuned from MPT-30B on datasets including Dolly HHRLHF, Competition Math, Duorc, CoT GSM8k, Qasper, Quality, Summ Screen FD, and Spider. It is designed for short-form instruction following and uses a modified decoder-only transformer architecture. |
| [Nous-Hermes-13B](https://huggingface.co/NousResearch/Nous-Hermes-13b) | Jun 2023 | 13 | - | Nous-Hermes-13B is a language model fine-tuned by Nous Research on over 300,000 instructions. |
| [H2O-Oasst-OpenLLaMA-13B](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-13b) | Jun 2023 | 13 | - | H2O-Oasst-OpenLLaMA-13B, also known as h2ogpt-gm-oasst1-en-2048-open-llama-13b, is a model developed by H2O AI and trained using H2O LLM Studio. It is based on the OpenLLaMA 13B model and has been fine-tuned on the OASST1 dataset. |
| [Baize-v2-13B](https://huggingface.co/project-baize/baize-v2-13b) | Jun 2023 | 13 | - | Baize-v2-13B is an open-source chat model developed by UCSD and Sun Yat-Sen University, fine-tuned with LoRA, and trained with supervised fine-tuning (SFT) and self-distillation with feedback (SDF). It is a 13B parameter model that has been merged with LLaMA and is designed to engage in detailed and informative conversations, adhering to a specific conversational format. Baize, named after a mythical creature in Chinese folklore known to speak human languages and possess vast knowledge, is expected to provide detailed responses and avoid engaging in unethical or sensitive topics. |
| [Tulu-30B](https://huggingface.co/allenai/tulu-30b) | Jun 2023 | 30 | - | Tulu 30B is a model developed by the Allen Institute for AI. It is a 30B parameter LLaMA model fine-tuned on a mixture of instruction datasets, including FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT, and is designed to follow complex instructions across various NLP tasks. |
| [MPT-30B-chat](https://huggingface.co/mosaicml/mpt-30b-chat) | Jun 2023 | 30 | - | MPT-30B-chat is a chatbot-like model fine-tuned on MPT-30B. It's part of the Mosaic Pretrained Transformer (MPT) family and is optimized for dialogue generation. The model is trained on a large amount of data (1T tokens) and is capable of handling extremely long inputs thanks to ALiBi. It also offers fast training and inference through features like FlashAttention and FasterTransformer. |
| [WizardLM-13B-v1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0) | May 2023 | 13 | - | WizardLM-13B V1.0 is a model developed by the WizardLM Team, designed for various NLP tasks. |
| [RWKV-4-Raven-14B](https://huggingface.co/BlinkDL/rwkv-4-raven) | May 2023 | 14 | - | RWKV-4-Raven-14B is part of the RWKV-4 "Raven" series of models. These models are fine-tuned on various datasets like Alpaca, CodeAlpaca, Guanaco, GPT4All, and ShareGPT. The model is designed to be surprisingly good for its size and is developed by BlinkDL. It follows a 100% RNN architecture for its language model. |
| [Falcon-40B-Instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) | May 2023 | 40 | - | Falcon-40B-Instruct, developed by TII and based on Falcon-40B, is a 40B parameter causal decoder-only model fine-tuned on a mixture of Baize chat data. It is optimized for inference with technologies like FlashAttention and multiquery attention and is designed for chat and instruction-following tasks in NLP. |
| [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat) | May 2023 | 7 | - | MPT-7B-Chat is a dialogue generation model. It was fine-tuned on various datasets like ShareGPT-Vicuna, HC3, Alpaca, HH-RLHF, and Evol-Instruct. The model was developed by MosaicML and uses a modified decoder-only transformer architecture. It's licensed under CC-By-NC-SA-4.0, meaning it's for non-commercial use only. |
| [PaLM-Chat-Bison-001](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/models#foundation_models) | May 2023 | - | - | PaLM-Chat-Bison-001 is optimized for dialog language tasks such as the implementation of chatbots or AI agents. It can handle zero, one, and few-shot tasks. The model does not have adjustable safety settings and has a rate limit of 90 requests per minute during preview. |
| [Guanaco-65B](https://huggingface.co/timdettmers/guanaco-65b-merged) | May 2023 | 65 | - | The Guanaco 65B model is an open-source finetuned chatbot developed by Tim Dettmers. It was obtained through 4-bit QLoRA tuning of LLaMA base models on the OASST1 dataset. |
| [PaLM 2](https://arxiv.org/abs/2305.10403) | May 2023 | - | - | A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. |
| [Med-PaLM 2](https://arxiv.org/abs/2305.09617v1) | May 2023 | - | - | Towards Expert-Level Medical Question Answering with Large Language Models |
| [Gorilla](https://arxiv.org/abs/2305.15334v1) | May 2023 | 7 | [Gorilla](https://github.com/ShishirPatil/gorilla) | Gorilla: Large Language Model Connected with Massive APIs |
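
For reference, most entries above link directly to Hugging Face checkpoints. Below is a minimal sketch of loading one of the listed checkpoints (here Falcon-7B) with the Hugging Face `transformers` text-generation pipeline; the model ID, dtype, device settings, and sampling parameters are illustrative and should be adapted to your own hardware and use case.

```python
# Minimal sketch: load a checkpoint from the table (here Falcon-7B) with transformers.
# Settings (dtype, device_map, sampling) are illustrative, not prescriptive.
import torch
from transformers import AutoTokenizer, pipeline

model_id = "tiiuae/falcon-7b"  # any model ID from the "Checkpoints" column

tokenizer = AutoTokenizer.from_pretrained(model_id)
generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,   # assumes a GPU with bfloat16 support
    device_map="auto",            # spreads the model across available devices
    trust_remote_code=True,       # may be required on older transformers releases
)

outputs = generator(
    "Instruction-tuned language models are useful because",
    max_new_tokens=64,
    do_sample=True,
    top_k=10,
)
print(outputs[0]["generated_text"])
```
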
@@ -21,10 +46,28 @@ This section consists of a collection and summary of notable and foundational LL
| [StarCoder](https://huggingface.co/blog/starcoder) | May 2023 | 15 | [StarCoder](https://huggingface.co/bigcode/starcoder) | StarCoder: A State-of-the-Art LLM for Code |
| [MPT-7B](https://www.mosaicml.com/blog/mpt-7b) | May 2023 | 7 | [MPT-7B](https://github.com/mosaicml/llm-foundry#mpt) | MPT-7B is a GPT-style model, and the first in the MosaicML Foundation Series of models. |
| [DLite](https://medium.com/ai-squared/announcing-dlite-v2-lightweight-open-llms-that-can-run-anywhere-a852e5978c6e) | May 2023 | 0.124 - 1.5 | [DLite-v2-1.5B](https://huggingface.co/aisquared/dlite-v2-1_5b) | Lightweight instruction following models which exhibit ChatGPT-like interactivity. |
| [Vicuna-7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) | Apr 2023 | 7 | - | Vicuna-7B is a chat assistant trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. It's an auto-regressive language model based on the transformer architecture. The primary use of Vicuna is research on large language models and chatbots. The model is intended for researchers and hobbyists in natural language processing, machine learning, and artificial intelligence. |
| [Dolly-V2-12B](https://huggingface.co/databricks/dolly-v2-12b) | Apr 2023 | 12 | - | Dolly-V2-12B is a 12 billion parameter causal language model developed by Databricks, deriving its architecture from EleutherAI's Pythia-12b. It was fine-tuned on a corpus of approximately 15,000 instructions generated by Databricks employees, aiming to excel in instruction-following tasks. |
| [FastChat-T5-3B](https://huggingface.co/lmsys/fastchat-t5-3b-v1.0) | Apr 2023 | 3 | - | FastChat-T5 is an open-source chatbot trained by fine-tuning Flan-t5-xl (3B parameters) on user-shared conversations collected from ShareGPT. It's based on an encoder-decoder transformer architecture and can autoregressively generate responses to users' inputs. |
| [GPT4All-13B-Snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy) | Apr 2023 | 13 | - | GPT4All-13B-Snoozy is a GPL-licensed chatbot trained over a massive curated corpus of assistant interactions including word problems, multi-turn dialogue, code, poems, songs, and stories. It has been fine-tuned from LLaMA 13B and is developed by Nomic AI. The model is designed for assistant-style interaction data and is primarily in English. |
| [Koala-13B](https://bair.berkeley.edu/blog/2023/04/03/koala/) | Apr 2023 | 13 | - | Koala-13B is a chatbot created by Berkeley AI Research (BAIR). It is fine-tuned on Meta's LLaMA and focuses on dialogue data scraped from the web. The model aims to balance performance and cost, providing a lighter, open-source alternative to models like ChatGPT. It has been trained on interaction data that includes conversations with highly capable closed-source models such as ChatGPT. |
| [StableLM-Tuned-Alpha-7B](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b) | Apr 2023 | 7 | - | StableLM-Tuned-Alpha-7B is a decoder-only language model with 7 billion parameters, built upon the foundational StableLM-Base-Alpha models. It has been further fine-tuned on various datasets aimed at enhancing chat and instruction-following capabilities. |
| [OpenAssistant-LLaMA-30B](https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor) | Apr 2023 | 30 | - | OpenAssistant-LLaMA-30B, released on 15 April 2023, is a language model from OpenAssistant's sixth supervised fine-tuning (SFT) iteration on the LLaMA 30B model. Built on the 30-billion-parameter LLaMA base, it supports CPU + GPU inference using the GGML format and aims to provide an open-source alternative for instruction-following tasks. |
| [Dolly](https://www.databricks.com/blog/2023/04/12/dolly-first-open-commercially-viable-instruction-tuned-llm) | Apr 2023 | 3, 7, 12 | [Dolly](https://huggingface.co/databricks/dolly-v2-12b) | An instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use. |
| [StableLM](https://github.com/Stability-AI/StableLM#stablelm-alpha) | Apr 2023 | 3, 7 | [StableLM-Alpha](https://github.com/Stability-AI/StableLM#stablelm-alpha) | Stability AI's StableLM series of language models |
| [Pythia](https://arxiv.org/abs/2304.01373) | Apr 2023 | 0.070 - 12 | [Pythia](https://github.com/eleutherai/pythia) | A suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. |
| [Open Assistant (Pythia Family)](https://open-assistant.io/) | Mar 2023 | 12 | [Open Assistant](https://huggingface.co/OpenAssistant) | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. |
| [Claude-instant-1](https://www.anthropic.com/index/introducing-claude) | Mar 2023 | - | - | Claude-instant-1 is a smaller and faster model compared to its predecessors, aimed at handling various tasks like text analysis, summarization, and casual dialogue among others |
| [ChatGLM-6B](https://huggingface.co/THUDM/chatglm-6b) | Mar 2023 | 6 | - | ChatGLM-6B is an open-source, Chinese-English bilingual dialogue model based on the General Language Model (GLM) architecture with 6.2 billion parameters. Despite its small size causing some factual or mathematical-logic issues, it is well suited to Chinese question answering, summarization, and conversational tasks thanks to its training on over 1 trillion English and Chinese tokens. |
| [OpenAssistant-Pythia-12B](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5) | Mar 2023 | 12 | - | OpenAssistant-Pythia-12B is the first iteration English supervised-fine-tuning (SFT) model of the Open-Assistant project. It's based on Pythia 12B and was fine-tuned on ~22k human demonstrations collected through the Open-Assistant human feedback web app. |
| [GPT-3.5-turbo](https://openai.com/blog/chatgpt) | Mar 2023 | 175 | - | GPT-3.5-Turbo is OpenAI's advanced language model optimized for chat but also works well for traditional completion tasks. It offers better performance across all aspects compared to GPT-3 and is 10 times cheaper per token. |
| [Vicuna-13B-16k](https://huggingface.co/lmsys/vicuna-13b-v1.5-16k) | Mar 2023 | 13 | - | Vicuna is a chat assistant developed by LMSYS, fine-tuned from the Llama 2 model and trained on user-shared conversations from ShareGPT. Specifically, Vicuna v1.5 (16k) is fine-tuned with supervised instruction and linear RoPE scaling, utilizing around 125K conversations packed into sequences containing 16K tokens each. It is designed primarily for research on large language models and chatbots, catering to researchers and hobbyists in fields like natural language processing and artificial intelligence. |
| [Vicuna-33B](https://huggingface.co/lmsys/vicuna-33b-v1.3) | Mar 2023 | 33 | - | Vicuna-33B is an auto-regressive language model based on the transformer architecture. It's fine-tuned from LLaMA and primarily intended for research on large language models and chatbots. It's developed by LMSYS and has a non-commercial license. |
| [Vicuna-7B-16k](https://huggingface.co/lmsys/vicuna-7b-v1.5-16k) | Mar 2023 | 7 | - | Vicuna is a chat assistant developed by LMSYS, fine-tuned from the Llama 2 model and trained on user-shared conversations from ShareGPT. Specifically, Vicuna v1.5 (16k) is fine-tuned with supervised instruction and linear RoPE scaling, utilizing around 125K conversations packed into sequences containing 16K tokens each. It is designed primarily for research on large language models and chatbots, catering to researchers and hobbyists in fields like natural language processing and artificial intelligence. |
| [Guanaco-33B](https://huggingface.co/timdettmers/guanaco-33b-merged) | Mar 2023 | 33 | - | Guanaco-33B is an open-source chatbot fine-tuned through 4-bit QLoRA tuning of LLaMA base models on the OASST1 dataset. It's intended for research purposes and is competitive with commercial chatbot systems like ChatGPT and BARD. The model allows for cheap and local experimentation with high-quality chatbot systems and is available in multiple parameter sizes including 7B, 13B, and 33B. |
| [Vicuna-13B](https://huggingface.co/lmsys/vicuna-13b-v1.5) | Mar 2023 | 13 | - | Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. It's developed by LMSYS and is primarily intended for research on large language models and chatbots. The model has been shown to achieve more than 90% of the quality of OpenAI's ChatGPT and Google Bard in preliminary evaluations. |
| [Alpaca-13B](https://crfm.stanford.edu/2023/03/13/alpaca.html) | Mar 2023 | 13 | - | Alpaca is an instruction-following language model fine-tuned from Meta's LLaMA. It's designed for academic research to address issues like misinformation and toxicity. Alpaca is trained on 52K instruction-following demonstrations and aims to be a more accessible option for academic study. It's not intended for commercial use due to licensing and safety concerns. |
| [Claude-1](https://www.anthropic.com/index/introducing-claude) | Mar 2023 | 137 | - | Claude is a foundational large language model (LLM) built by Anthropic. It is designed to be a helpful, honest, and harmless AI assistant. It can perform a wide variety of conversational and text processing tasks and is accessible through a chat interface and API. |
| [Cerebras-GPT](https://arxiv.org/abs/2304.03208) | Mar 2023 | 0.111 - 13 | [Cerebras-GPT](https://huggingface.co/cerebras) | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
| [BloombergGPT](https://arxiv.org/abs/2303.17564v1) | Mar 2023 | 50 | - | BloombergGPT: A Large Language Model for Finance |
| [PanGu-Σ](https://arxiv.org/abs/2303.10845v1) | Mar 2023 | 1085 | - | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
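
Many of the chat-tuned models in this list (Llama-2-chat, Vicuna, WizardLM, ChatGLM, and others) expect a model-specific prompt format for multi-turn conversations. Recent `transformers` releases expose each tokenizer's chat template, which removes the need to hand-build those formats. A minimal sketch, assuming Llama-2-7b-chat (a gated checkpoint that requires accepting Meta's license):

```python
# Minimal sketch: format a multi-turn conversation with the tokenizer's chat template
# (available in recent transformers releases). The model ID is illustrative and gated.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "In one sentence, what is an instruction-tuned LLM?"},
]

# Renders the conversation into the model's expected [INST] ... [/INST] prompt string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```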