multimodal CoT
parent db5414fbd6
commit 91495493ed
@@ -89,6 +89,7 @@ The following are the latest papers (sorted by release date) on prompt engineeri
- [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916) (May 2022)
- [MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning](https://arxiv.org/abs/2205.00445) (May 2022)
- [Toxicity Detection with Generative Prompt-based Inference](https://arxiv.org/abs/2205.12390) (May 2022)
- [Learning to Transfer Prompts for Text Generation](https://arxiv.org/abs/2205.01543) (May 2022)
- [The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning](https://arxiv.org/abs/2205.03401) (May 2022)
- [A Taxonomy of Prompt Modifiers for Text-To-Image Generation](https://arxiv.org/abs/2204.13988) (Apr 2022)
- [PromptChainer: Chaining Large Language Model Prompts through Visual Programming](https://arxiv.org/abs/2203.06566) (Mar 2022)
@@ -7,7 +7,7 @@ In this section, we discuss other miscellaneous but important topics in prompt e
Topic:
- [Program-Aided Language Models](#program-aided-language-models)
- [ReAct](#react)
- [Multimodal Prompting](#multimodal-prompting)
- [Multimodal CoT Prompting](#multimodal-cot-prompting)
- [GraphPrompts](#graphprompts)

---
@@ -30,10 +30,13 @@ The ReAct framework can allow LLMs to interact with external tools to retrieve a
Full example coming soon!
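In the meantime, the following is a minimal sketch of the loop that a ReAct-style prompt drives: the model alternates reasoning (Thought) and tool use (Action), and each tool result is appended back into the prompt as an Observation. The `call_llm` and `search_wikipedia` helpers are hypothetical placeholders, not a specific API.

```python
# A minimal ReAct-style loop: the model alternates Thought/Action steps and the
# result of each tool call is fed back in as an Observation until the model
# emits Finish[answer]. `call_llm` and `search_wikipedia` are hypothetical
# placeholders standing in for a real LLM API and a real external tool.
REACT_PROMPT = """Answer the question by interleaving Thought, Action and Observation steps.
Available action: Search[query] -- look up a short summary for the query.
Finish with: Finish[answer]

Question: {question}
{scratchpad}"""


def call_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError


def search_wikipedia(query: str) -> str:
    """Placeholder for an external knowledge tool."""
    raise NotImplementedError


def react(question: str, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # Ask the model for its next Thought/Action pair.
        step = call_llm(REACT_PROMPT.format(question=question, scratchpad=scratchpad))
        scratchpad += step
        if "Finish[" in step:
            # The model has decided on a final answer.
            return step.split("Finish[", 1)[1].split("]", 1)[0]
        if "Search[" in step:
            # Execute the requested tool call and append the observation.
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            scratchpad += f"\nObservation: {search_wikipedia(query)}\n"
    return "No answer found within the step budget."
```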
---
## Multimodal Prompting
In this section, we will cover some examples of multimodal prompting techniques and applications that leverage multiple modalities as opposed to just text alone.
## Multimodal CoT Prompting
Examples coming soon!

[Zhang et al. (2023)](https://arxiv.org/abs/2302.00923) recently proposed a multimodal chain-of-thought prompting approach. Traditional CoT focuses on the language modality alone. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework: the first stage generates rationales based on the multimodal information, and the second stage performs answer inference, leveraging the generated rationales.
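To make the two-stage split concrete, here is a minimal sketch of how the two stages can be chained. It assumes a generic `vision_language_model(prompt, image)` helper and illustrative prompt wording; it is not the authors' implementation.

```python
# A minimal two-stage Multimodal CoT sketch: stage 1 generates a rationale from
# the question plus the image; stage 2 infers the answer conditioned on that
# rationale. `vision_language_model` is a hypothetical placeholder for any
# model that accepts text together with an image.
from pathlib import Path


def vision_language_model(prompt: str, image: bytes) -> str:
    """Placeholder for a vision-language model call."""
    raise NotImplementedError


def multimodal_cot(question: str, options: list[str], image_path: str) -> str:
    image = Path(image_path).read_bytes()
    choices = " ".join(f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options))

    # Stage 1: rationale generation from the multimodal input.
    rationale = vision_language_model(
        f"Question: {question}\nOptions: {choices}\n"
        "Solution: Let's think step by step.",
        image,
    )

    # Stage 2: answer inference, reusing the generated rationale.
    answer = vision_language_model(
        f"Question: {question}\nOptions: {choices}\n"
        f"Rationale: {rationale}\nThe answer is",
        image,
    )
    return answer.strip()
```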
The multimodal CoT model (1B) outperforms GPT-3.5 on the ScienceQA benchmark.

![](../img/multimodal-cot.png)

---
## GraphPrompts
Binary file not shown (new image, 171 KiB).