Horizontal Tuning: Instruction, Chat, and What Else?

Oct 23, 2023

So far, LLMs have been fine-tuned in two specific ways other than generic next-token completion.

Instruction-tuned models are specialized in answering questions or commands. “Write me a story” or “What is the capital of France?”.
Chat-tuned models are specialized in dialogue between (usually human and AI) entities. Think of all the conversational agents (ChatGPT, etc.). For example, you can ask a chat-tuned model to summarize a document, but an instruction-tuned model will probably do a better job. However, chat-tuned models can usually hold a more coherent conversation and have been used to power many different applications like answering questions, tutoring, and customer support.

But what’s beyond instruction-tuning and chat-tuning? Are there similar horizontal applications of tuning that would make sense for LLMs? That is, beyond fine-tuning for specific tasks, can we come up with better formats to query LLMs? I don’t know, but my intuition says yes. It might entail a small structure that lives over the input and compiles down to some intermediate representation (why ChatML is so interesting). Some ideas:

Question-tuned: Given a block of text, return a list of insightful and relevant questions about the text. (Imperative, declarative, interrogative, and exclamatory interfaces).
Editor-tuned: Given a block of text, returns the same block of text edited for correctness and clarity.
Schedule-tuned: Given a command, break it down into multiple smaller tasks.
Filter-tuned: Given a block of text and a set of fuzzy filters, return the same block of text with only the text that passes the filter.
Reverse-instruction-tuned: Given some output, generate the prompt. Could be useful for training or evaluating instruction-tuned models.
Reverse-chat-tuned: I don’t know what this would exactly be used for, but reversing the input-output pairs for chat-tuning. Might at least shed some more light on how these models work.
Diff-tuned: Given a block of text and a diff, return the original + changes applied. Could be useful for everything from merge conflicts in code to document-based collaboration.

Alex Gorischek

Oct 24, 2023

Similar to your note on "Schedule-tuned", I could image models fine-tuned for use as the planning/reasoning engines for autonomous agents. This is a bit more general than taking a direct "command" as an input; the usual inputs would likely be "context about the state of the world", "goal", and "available tools".

Expand full comment

Andrew Smith

Do you use Bard much? I use it fairly regularly, mainly for research. It seems to be kind of in between instruction-tuned and conversational. Do I have that right?

Matt Rickard

Discussion about this post