more few-shot examples

pull/20/head
Elvis Saravia 2023-02-09 03:56:56 +00:00
parent a89f992a9c
commit b9f5f65d57
4 changed files with 56 additions and 5 deletions


@@ -25,9 +25,10 @@ Announcements:
## Guides
The following are a set of guides on prompt engineering developed by us. Guides are work in progress.
- [Prompt Engineering - Introduction](/guides/prompts-intro.md)
- [Prompt Engineering - Basic Usage](/guides/prompts-basic-usage.md)
- [Prompt Engineering - Advanced Usage](/guides/prompts-advanced-usage.md)
- [Prompt Engineering - Miscellaneous Topics](/guides/prompt-miscellaneous.md)
## Papers
#### (Sorted by Release Date)


@@ -0,0 +1,12 @@
# Miscellaneous Topics
In this section, we discuss other miscellaneous but important topics in prompt engineering.
---
## Prompt Injection
...
## Multimodal Prompting
...


@@ -52,6 +52,43 @@ That didn't work. It seems like basic standard prompting is not enough to get re
More recently, chain-of-thought (CoT) prompting has been popularized to address more complex arithmetic,
commonsense, and symbolic reasoning tasks. So let's talk about CoT next and see if we can solve the above task.
Following the findings from [Min et al. (2022)](https://arxiv.org/abs/2202.12837), here are a few more tips about demonstrations/exemplars when doing few-shot prompting:
- The label space and the distribution of the input text specified by the demonstrations are both key (regardless of whether the labels are correct for individual inputs).
- The format you use also plays a key role in performance; even if you just use random labels, this is much better than no labels at all.
- Additional results show that selecting random labels from a true distribution of labels (instead of a uniform distribution) also helps.
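The last tip can be made concrete with a small sketch (the example sentences, label counts, and helper name are ours, not from the paper): instead of assigning labels uniformly at random, sample replacement labels from the empirical label distribution of the demonstrations.

```python
import random
from collections import Counter

# Hypothetical labeled demonstrations; the true label distribution here
# is 2/3 Positive, 1/3 Negative.
examples = [
    ("This is awesome!", "Positive"),
    ("This is bad!", "Negative"),
    ("Wow that movie was rad!", "Positive"),
]

def randomize_labels(examples, seed=42):
    """Replace each label with one sampled from the empirical label
    distribution (not a uniform distribution), per the tip above."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in examples)
    labels, weights = zip(*counts.items())
    return [(text, rng.choices(labels, weights=weights)[0])
            for text, _ in examples]

for text, label in randomize_labels(examples):
    print(f"{text} // {label}")
```

The randomized demonstrations can then be formatted exactly like the examples below and used as the few-shot context.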
Let's try out a few examples. First, let's try one with random labels (meaning the labels Negative and Positive are randomly assigned to the inputs):
```
This is awesome! // Negative
This is bad! // Positive
Wow that movie was rad! // Positive
What a horrible show! //
```
Output
```
Negative
```
We still get the correct answer, even though the labels have been randomized. Note that we also kept the format, which helps too. In fact, with further experimentation, it seems the newer GPT models we are experimenting with are becoming more robust even to random formats. Example:
```
Positive This is awesome!
This is bad! Negative
Wow that movie was rad!
Positive
What a horrible show! --
```
Output
```
Negative
```
There is no consistency in the format above, but the model still predicted the correct label. We have to conduct a more thorough analysis to confirm whether this holds for different and more complex tasks.
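Prompts like the ones above can also be assembled and sent programmatically. A minimal sketch follows; the `build_prompt` helper is ours, and the API call at the end is an illustrative assumption (it uses the pre-1.0 `openai` package's `Completion.create`, with a model name and parameters we picked, and only runs if an API key is set):

```python
import os

# The demonstrations from the first example above, with randomized labels.
demos = [
    ("This is awesome!", "Negative"),
    ("This is bad!", "Positive"),
    ("Wow that movie was rad!", "Positive"),
]

def build_prompt(demos, query):
    # Reproduce the `input // label` format used above.
    lines = [f"{text} // {label}" for text, label in demos]
    lines.append(f"{query} //")
    return "\n".join(lines)

prompt = build_prompt(demos, "What a horrible show!")
print(prompt)

# Hypothetical completion call; requires `pip install openai` (pre-1.0)
# and an OPENAI_API_KEY. Model name and parameters are assumptions.
if os.environ.get("OPENAI_API_KEY"):
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    completion = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=5, temperature=0
    )
    print(completion["choices"][0]["text"].strip())
```

Keeping the prompt construction separate from the API call makes it easy to swap in different demonstration sets, label assignments, or formats when reproducing the experiments above.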
---
## Chain-of-Thought Prompting


@@ -47,9 +47,10 @@ Is that better? Well, we told the model to complete the sentence so the result l
The example above is a basic illustration of what's possible with LLMs today. Today's LLMs are able to perform all kinds of advanced tasks that range from text summarization to mathematical reasoning to code generation.
Here are a few more tips to keep in mind while you do prompt engineering:
- You can try other simple tasks by using simple commands to instruct the model like "Write", "Classify", "Summarize", "Translate", "Order", etc.
- Keep in mind that you also need to experiment a lot to see what works best. Try different instructions with different keywords, context, and data, and see what works best for your particular use case and task. Usually, the more specific and relevant the context is to the task you are trying to perform, the better. We will touch on the importance of sampling and adding more context in the upcoming guides.
We will cover more of these capabilities in this guide but also cover other areas of interest such as advanced prompting techniques and research topics around prompt engineering.
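The tips above can be tried programmatically. A minimal sketch (the helper and the prompt wording are our own, not from the guide) that pairs a simple command like "Classify" or "Summarize" with optional task-specific context:

```python
def instruct(command: str, context: str, text: str) -> str:
    """Build a simple instruction prompt: a command verb, optional
    context, then the input text."""
    parts = [f"{command} the following text."]
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Text: {text}")
    return "\n".join(parts)

# Try different commands and amounts of context to see what works best.
print(instruct("Summarize", "", "Antibiotics are a type of medication..."))
print(instruct("Classify", "Possible labels: Positive, Negative.",
               "This is awesome!"))
```

Swapping the command verb and adding or removing context this way makes it easy to compare variants of the same prompt for a given task.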