Merge pull request #316 from holarissun/add-panelgpt-promptoirl

Add discussions on PanelGPT, Prompt-OIRL, OPRO, and an Article on Prompting with RL.
2023-10-16 09:57:23 -06:00 · 2023-10-16 09:57:23 -06:00 · b7313a7662
parent 33d6a1c7fa 5781df968d
commit b7313a7662
3 changed files with 13 additions and 7 deletions
--- a/pages/papers.en.mdx
+++ b/pages/papers.en.mdx
@@ -4,6 +4,7 @@ The following are the latest papers (sorted by release date) on prompt engineeri

 ## Overviews

+- [An RL Perspective on RLHF, Prompting, and Beyond](https://arxiv.org/abs/2310.06147) (October 2023)
 - [Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation](https://arxiv.org/abs/2305.16938) (May 2023)
 - [Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study](https://arxiv.org/abs/2305.13860) (May 2023)
 - [Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond](https://arxiv.org/abs/2304.13712) (April 2023)
@@ -22,6 +23,7 @@ The following are the latest papers (sorted by release date) on prompt engineeri

 ## Approaches

+- [Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL](https://arxiv.org/abs/2309.06553) (September 2023)
 - [Chain-of-Verification Reduces Hallucination in Large Language Models](https://arxiv.org/abs/2309.11495) (September 2023)
 - [Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers](https://arxiv.org/abs/2309.08532) (September 2023)
 - [From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting](https://arxiv.org/abs/2309.04269) (September 2023)
--- a/pages/techniques/ape.en.mdx
+++ b/pages/techniques/ape.en.mdx
@@ -8,9 +8,9 @@ import APECOT from '../../img/ape-zero-shot-cot.png'
 <Screenshot src={APE} alt="APE" />
 Image Source: [Zhou et al., (2022)](https://arxiv.org/abs/2211.01910)

-[Zhou et al., (2022)](https://arxiv.org/abs/2211.01910) propose automatic prompt engineer (APE) a framework for automatic instruction generation and selection. The instruction generation problem is framed as natural language synthesis addressed as a black-box optimization problem using LLMs to generate and search over candidate solutions. 
+[Zhou et al., (2022)](https://arxiv.org/abs/2211.01910) propose automatic prompt engineer (APE), a framework for automatic instruction generation and selection. The instruction generation problem is framed as natural language synthesis, addressed as a black-box optimization problem using LLMs to generate and search over candidate solutions.

-The first step involves a large language model (as an inference model) that is given output demonstrations to generate instruction candidates for a task. These candidate solutions will guide the search procedure. The instructions are executed using a target model, and then the most appropriate instruction is selected based on computed evaluation scores. 
+The first step involves a large language model (as an inference model) that is given output demonstrations to generate instruction candidates for a task. These candidate solutions will guide the search procedure. The instructions are executed using a target model, and then the most appropriate instruction is selected based on computed evaluation scores.

 APE discovers a better zero-shot CoT prompt than the human engineered "Let's think step by step" prompt ([Kojima et al., 2022](https://arxiv.org/abs/2205.11916)).

@@ -21,6 +21,8 @@ Image Source: [Zhou et al., (2022)](https://arxiv.org/abs/2211.01910)

 This paper touches on an important topic related to prompt engineering which is the idea of automatically optimizing prompts. While we don't go deep into this topic in this guide, here are a few key papers if you are interested in the topic:

+- [Prompt-OIRL](https://arxiv.org/abs/2309.06553) - proposes to use offline inverse reinforcement learning to generate query-dependent prompts.
+- [OPRO](https://arxiv.org/abs/2309.03409) - introduces the idea of using LLMs to optimize prompts: letting LLMs "Take a deep breath" improves performance on math problems.
 - [AutoPrompt](https://arxiv.org/abs/2010.15980) - proposes an approach to automatically create prompts for a diverse set of tasks based on gradient-guided search.
-- [Prefix Tuning](https://arxiv.org/abs/2101.00190) - a lightweight alternative to fine-tuning that prepends a trainable continuous prefix for NLG tasks. 
+- [Prefix Tuning](https://arxiv.org/abs/2101.00190) - a lightweight alternative to fine-tuning that prepends a trainable continuous prefix for NLG tasks.
 - [Prompt Tuning](https://arxiv.org/abs/2104.08691) - proposes a mechanism for learning soft prompts through backpropagation.
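
The generate, execute, and select loop described in the `ape.en.mdx` text above can be sketched as follows. This is a minimal illustration, not the APE authors' implementation: `toy_target_model` and the hand-written `candidates` list are hypothetical stand-ins for the target LLM and for the instructions an inference LLM would propose, and exact-match accuracy is an assumed evaluation score.

```python
from typing import Callable, List, Tuple

Demo = Tuple[str, str]  # (input, expected output)


def score_instruction(instruction: str, demos: List[Demo],
                      target_model: Callable[[str, str], str]) -> float:
    """Fraction of demonstrations the target model reproduces exactly."""
    hits = sum(target_model(instruction, x) == y for x, y in demos)
    return hits / len(demos)


def select_instruction(candidates: List[str], demos: List[Demo],
                       target_model: Callable[[str, str], str]) -> Tuple[str, float]:
    """Score every candidate instruction and return the best (instruction, score)."""
    scored = [(c, score_instruction(c, demos, target_model)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def toy_target_model(instruction: str, x: str) -> str:
    """Hypothetical stand-in for a target LLM: follows two hard-coded instructions."""
    text = instruction.lower()
    if "uppercase" in text:
        return x.upper()
    if "reverse" in text:
        return x[::-1]
    return x


# Demonstrations of the task; in APE an inference model would see these
# and propose candidates. Here the candidates are written by hand.
demos = [("cat", "CAT"), ("dog", "DOG")]
candidates = [
    "Reverse the word.",
    "Write the word in uppercase.",
    "Repeat the word.",
]

best, best_score = select_instruction(candidates, demos, toy_target_model)
print(best, best_score)
```

Swapping `toy_target_model` for a real model call and `candidates` for model-generated instructions keeps the selection logic unchanged.
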
--- a/pages/techniques/tot.en.mdx
+++ b/pages/techniques/tot.en.mdx
@@ -13,19 +13,19 @@ ToT maintains a tree of thoughts, where thoughts represent coherent language seq
 The ToT framework is illustrated below:

 <Screenshot src={TOT} alt="TOT" />
-Image Source: [Yao et el. (2023)](https://arxiv.org/abs/2305.10601) 
+Image Source: [Yao et al. (2023)](https://arxiv.org/abs/2305.10601)

-When using ToT, different tasks requires defining the number of candidates and the number of thoughts/steps. For instance, as demonstrated in the paper, Game of 24 is used as a mathematical reasoning task which requires decomposing the thoughts into 3 steps, each involving an intermediate equation. At each step, the best b=5 candidates are kept. 
+When using ToT, different tasks require defining the number of candidates and the number of thoughts/steps. For instance, as demonstrated in the paper, Game of 24 is used as a mathematical reasoning task which requires decomposing the thoughts into 3 steps, each involving an intermediate equation. At each step, the best b=5 candidates are kept.

 To perform BFS in ToT for the Game of 24 task, the LM is prompted to evaluate each thought candidate as "sure/maybe/impossible" with regard to reaching 24. As stated by the authors, "the aim is to promote correct partial solutions that can be verdicted within few lookahead trials, and eliminate impossible partial solutions based on "too big/small" commonsense, and keep the rest "maybe"". Values are sampled 3 times for each thought. The process is illustrated below:

 <Screenshot src={TOT2} alt="TOT2" />
-Image Source: [Yao et el. (2023)](https://arxiv.org/abs/2305.10601) 
+Image Source: [Yao et al. (2023)](https://arxiv.org/abs/2305.10601)

 From the results reported in the figure below, ToT substantially outperforms the other prompting methods:

 <Screenshot src={TOT3} alt="TOT3" />
-Image Source: [Yao et el. (2023)](https://arxiv.org/abs/2305.10601) 
+Image Source: [Yao et al. (2023)](https://arxiv.org/abs/2305.10601)

 Code available [here](https://github.com/princeton-nlp/tree-of-thought-llm) and [here](https://github.com/jieyilong/tree-of-thought-puzzle-solver)

@@ -41,3 +41,5 @@ Then all experts will go on to the next step, etc.
 If any expert realises they're wrong at any point then they leave.
 The question is...
 ```
+
+[Sun (2023)](https://github.com/holarissun/PanelGPT) benchmarked Tree-of-Thought prompting with large-scale experiments and introduced PanelGPT, an approach to prompting with panel discussions among LLMs.
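
The BFS procedure described in the `tot.en.mdx` text above (3 steps, the best b=5 candidates kept per step, each thought valued 3 times as "sure/maybe/impossible") can be sketched as follows. This is only an illustration under stated assumptions: `evaluate` is a hypothetical stand-in that uses exhaustive lookahead where the paper prompts an LM, so the three samples are identical here, and the numeric weights in `SCORES` are illustrative choices rather than values taken from the text.

```python
from fractions import Fraction
from itertools import combinations

# Assumed numeric weights for the three labels (illustrative only).
SCORES = {"sure": 20.0, "maybe": 1.0, "impossible": 0.001}


def propose(state):
    """Yield every successor state made by combining two numbers with + - * /."""
    nums, trace = state
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        options = [(a + b, f"{a}+{b}"), (a * b, f"{a}*{b}"),
                   (a - b, f"{a}-{b}"), (b - a, f"{b}-{a}")]
        if b != 0:
            options.append((a / b, f"{a}/{b}"))
        if a != 0:
            options.append((b / a, f"{b}/{a}"))
        for value, expr in options:
            yield (tuple(rest + [value]), trace + [f"{expr}={value}"])


def can_reach_24(nums):
    """Exhaustive lookahead: True if some sequence of operations yields 24."""
    if len(nums) == 1:
        return nums[0] == 24
    return any(can_reach_24(nxt) for nxt, _ in propose((nums, [])))


def evaluate(nums):
    """Hypothetical stand-in for the LM judgment of a partial solution."""
    if not can_reach_24(nums):
        return "impossible"
    return "sure" if len(nums) <= 2 else "maybe"


def tot_bfs(numbers, steps=3, breadth=5, n_samples=3):
    """Breadth-first search over thoughts, keeping the top `breadth` per step."""
    frontier = [(tuple(Fraction(n) for n in numbers), [])]
    for _ in range(steps):
        candidates = [s for state in frontier for s in propose(state)]
        # Value each candidate n_samples times and sum the scores.
        scored = [(sum(SCORES[evaluate(s[0])] for _ in range(n_samples)), s)
                  for s in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [s for _, s in scored[:breadth]]
    # Return the equation traces of surviving states that equal 24.
    return [trace for nums, trace in frontier if nums == (Fraction(24),)]


solutions = tot_bfs([4, 9, 10, 13])
for step in solutions[0]:
    print(step)
```

With an LM-based evaluator the three samples per thought would generally differ, which is why their scores are summed before ranking.
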