Added checkpoints for CodeLlama and Llama-2 models

pull/319/head
S4MFI 2023-10-19 21:54:08 +03:00
parent a5e6907d7a
commit bda15ebb6e
1 changed file with 2 additions and 2 deletions

@@ -10,8 +10,8 @@ This section consists of a collection and summary of notable and foundational LL
| --- | --- | --- | --- | --- |
| [Falcon LLM](https://falconllm.tii.ae/) | Sep 2023 | 7, 40, 180 | [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b), [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b), [Falcon-180B](https://huggingface.co/tiiuae/falcon-180B) | Falcon LLM is a foundational large language model (LLM) released by TII. Its largest version has 180 billion parameters and was trained on 3,500 billion tokens. |
| [Mistral-7B-v0.1](https://arxiv.org/abs/2310.06825) | Sep 2023 | 7 | [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | Mistral-7B-v0.1 is a pretrained generative text model with 7 billion parameters. The model is based on a transformer architecture with features like Grouped-Query Attention, Byte-fallback BPE tokenizer and Sliding-Window Attention. |
-| [CodeLlama](https://scontent.fbze2-1.fna.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=aLQJyBvzDUwAX-5EVhT&_nc_ht=scontent.fbze2-1.fna&oh=00_AfA2dCIqykviwlY3NiHIFzO85n1-JyK4_pM24FJ5v5XUOA&oe=6535DD4F) | Aug 2023 |7, 13, 34 | [CodeLlama](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) | The Code Llama family is designed for general code synthesis and understanding. It is specifically tuned for instruction following and safer deployment. The models are auto-regressive and use an optimized transformer architecture. They are intended for commercial and research use in English and relevant programming languages. |
-| [Llama-2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) | Jul 2023 | 70, 13, 7 | [Llama-2](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | LLaMA-2, developed by Meta AI, was released in July 2023 with models of 7, 13, and 70 billion parameters. It maintains a similar architecture to LLaMA-1 but uses 40% more training data. LLaMA-2 includes foundational models and dialog-fine-tuned models, known as LLaMA-2 Chat, and is available for many commercial uses, with some restrictions. |
+| [CodeLlama](https://scontent.fbze2-1.fna.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=aLQJyBvzDUwAX-5EVhT&_nc_ht=scontent.fbze2-1.fna&oh=00_AfA2dCIqykviwlY3NiHIFzO85n1-JyK4_pM24FJ5v5XUOA&oe=6535DD4F) | Aug 2023 |7, 13, 34 | [CodeLlama-7B](https://huggingface.co/codellama/CodeLlama-7b-hf), [CodeLlama-13B](https://huggingface.co/codellama/CodeLlama-13b-hf), [CodeLlama-34B](https://huggingface.co/codellama/CodeLlama-34b-Instruct-hf) | The Code Llama family is designed for general code synthesis and understanding. It is specifically tuned for instruction following and safer deployment. The models are auto-regressive and use an optimized transformer architecture. They are intended for commercial and research use in English and relevant programming languages. |
+| [Llama-2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) | Jul 2023 | 7, 13, 70 | [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b), [Llama-2-13B](https://huggingface.co/meta-llama/Llama-2-13b), [Llama-2-70B](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | LLaMA-2, developed by Meta AI, was released in July 2023 with models of 7, 13, and 70 billion parameters. It maintains a similar architecture to LLaMA-1 but uses 40% more training data. LLaMA-2 includes foundational models and dialog-fine-tuned models, known as LLaMA-2 Chat, and is available for many commercial uses, with some restrictions. |
| [XGen-7B-8K](https://arxiv.org/abs/2309.03450) | Jul 2023 | 7 | [XGen-7B-8K](https://huggingface.co/Salesforce/xgen-7b-8k-inst) | The XGen-7B-8K, developed by Salesforce AI Research, is a 7B parameter language model. |
| [Claude-2](https://www.anthropic.com/index/claude-2) | Jul 2023 | 130 | - | Claude 2 is a foundational LLM built by Anthropic, designed to be safer and more "steerable" than its previous version. It is conversational and can be used for a variety of tasks like customer support, Q&A, and more. It can process large amounts of text and is well-suited for applications that require handling extensive data, such as documents, emails, FAQs, and chat transcripts. |
| [Tulu](https://arxiv.org/abs/2306.04751) | Jun 2023 | 7, 13, 30, 65 | [Tulu-7B](https://huggingface.co/allenai/tulu-7b), [Tulu-13B](https://huggingface.co/allenai/tulu-13b), [Tulu-30B](https://huggingface.co/allenai/tulu-30b), [Tulu-65B](https://huggingface.co/allenai/tulu-65b) | Tulu is a family of models developed by the Allen Institute for AI. The models are LLaMA models that have been fine-tuned on a mixture of instruction datasets, including FLAN V2, CoT, Dolly, Open Assistant 1, GPT4-Alpaca, Code-Alpaca, and ShareGPT. They are designed to follow complex instructions across various NLP tasks. |