Chain of Thought Paradigms in LLMs

Mar 09, 2023

Chain of thought (CoT), breaking a problem down into a series of intermediate reasoning steps, has significantly improved the ability of LLMs to perform complex reasoning. But, most importantly, it is the current state-of-the-art in teaching LLMs how to take action (API calls, RPA, or anything else).

An overview of different strategies.

Few-shot CoT. Provide examples of Question-Answer pairs where the answer is explained "step by step."

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?

A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

From Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.

Zero-shot CoT. Prefix the Answer block with "Let's think step by step." to prompt the LLM to complete the output in that format.

Self-consistency CoT. First, prompt the model with CoT, generate multiple completions, and choose the most consistent answer. You can think of this as a self-ensemble method.

Self-consistency Improves Chain of Thought Reasoning in Language Models.

Least-to-Most. Borrowed from an idea in education psychology, generating a list of questions to answer and then sequentially solving the subquestions. Problem reduction followed by problem-solving.

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models.

ReAct. Given a claim or question, generate a completion identifying an action to take, record the action, and make an observation from the result. Repeat until the task is finished, recognized by calling a special FINISH action.

Thought:
Action:
Observation:

Claim: Princess Mononoke is a film.
Thought 1: I need to search Princess Mononoke and find if it is a film. Action 1: Search[Princess Mononoke] Observation 1: Princess Mononoke ... Thought 2: From the observation, it says that Princess Mononoke is a film. Action 2: Finish[SUPPORTS]
Observation 2: Episode finished

ReAct: Synergizing Reasoning and Acting in Language Models.

Chain of thought prompting techniques have been shown to increase LLM performance significantly, but they still feel incredibly unoptimized.

Matt Rickard

Discussion about this post

Ready for more?