A recent paper (Large Language Models as Optimizers) by researchers at Google DeepMind found that AI-optimized prompts (via another model) can outperform human-written prompts by up to 50% on certain benchmarks.
When I wrote Prompt Engineering Shouldn’t Exist at the start of 2023, I outlined some problems with the then-current state of prompt engineering: complex string-based prompt templates, model-specific prompts, and lack of structured I/O, to name a few. These problems are still pervasive today. Reflecting on some of the paths I thought were promising for a post-prompt engineering AI ecosystem:
A purpose-built DSL for prompts. DSPy is a programming framework open-sourced by the Stanford NLP group. It uses Pythonic syntax and encapsulates prompting techniques like chain-of-thought and self-reflection (a minimal DSPy sketch follows below). LMQL is another attempt at building a programming language for prompting, through a declarative SQL-like language. EdgeChains uses Jsonnet, a declarative configuration language (every sufficiently advanced configuration language is wrong). I'm not really sure what the future of prompt DSLs is. It's hard to overcome the typical reasons DSLs fail, although the toughest problems here continue to remind me of configuration engineering.
Is there a way to integrate these DSLs vertically? Model-specific DSLs? It might be useful (from a provider or developer standpoint).
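To give a feel for the DSPy style, here's a minimal sketch. The configuration calls and model name have shifted across DSPy versions, so treat them as illustrative rather than exact:

```python
import dspy

# Point DSPy at an underlying LLM; the model name here is a placeholder.
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# A signature declares the I/O contract; ChainOfThought wraps it in a
# chain-of-thought prompt without any hand-written template strings.
qa = dspy.ChainOfThought("question -> answer")

result = qa(question="What French city hosted the 2024 Summer Olympics?")
print(result.answer)
```

The appeal is that the prompting technique (chain-of-thought here) becomes a swappable module rather than a string you maintain by hand.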
Removing degrees of freedom. Midjourney’s success can be traced (in part) to its lack of configuration. You can ask the model for images without specifying complex prompt styles or engineering. This philosophy lives on in open source through the Fooocus library. This might mean having task-specific models that take certain parameters out of the context window and move them into the model itself. For example, encoding a specific style in a fine-tuned Stable Diffusion model or teaching an LLM to output JSON responses.
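One concrete flavor of this is pushing output-format constraints out of the prompt and into the request itself. As a rough sketch (the model name and the extraction task are placeholders), OpenAI's JSON mode does exactly that:

```python
from openai import OpenAI

client = OpenAI()

# Instead of prompt-engineering "respond ONLY with valid JSON...",
# the constraint lives in the request via response_format.
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract the city and country as JSON."},
        {"role": "user", "content": "I just got back from a week in Lisbon, Portugal."},
    ],
)
print(completion.choices[0].message.content)  # e.g. {"city": "Lisbon", "country": "Portugal"}
```

The prompt stays short; the "always return JSON" requirement is no longer something you re-engineer per prompt.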
Meta Prompting. Have another model optimize prompts for task accuracy. For recurring tasks that can be cleanly described, this is an interesting approach. The optimization technique is probably model-agnostic, but prompts must be optimized per model.
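Here's a hypothetical sketch of what that loop can look like: an optimizer model sees the scored history of instructions and proposes a new one, a small eval set scores it, and the best instruction wins. None of this is the paper's implementation; the model name, eval data, and scoring are stand-ins:

```python
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    # Any chat-completion endpoint works here; the model name is a placeholder.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Tiny labeled eval set used to score candidate instructions (stand-in data).
EVAL_SET = [
    ("2 + 2 * 3", "8"),
    ("(1 + 3) * 5", "20"),
    ("10 - 4 / 2", "8"),
]

def score(instruction: str) -> float:
    """Fraction of eval questions answered correctly under this instruction."""
    hits = sum(
        answer in call_llm(f"{instruction}\n\nQ: {question}\nA:")
        for question, answer in EVAL_SET
    )
    return hits / len(EVAL_SET)

def optimize(steps: int = 10) -> str:
    seed = "Answer the question."
    history = [(seed, score(seed))]
    for _ in range(steps):
        # The optimizer model sees prior (score, instruction) pairs, best last,
        # and proposes a new instruction it expects to score higher.
        trajectory = "\n".join(
            f"{s:.2f}: {p}" for p, s in sorted(history, key=lambda x: x[1])
        )
        candidate = call_llm(
            "These instructions were scored on an arithmetic task (accuracy first):\n"
            f"{trajectory}\n\n"
            "Write one new instruction likely to score higher. Reply with the instruction only."
        ).strip()
        history.append((candidate, score(candidate)))
    return max(history, key=lambda x: x[1])[0]
```

The per-model caveat shows up directly: whatever `optimize` finds is tuned to the model behind `call_llm`, so switching models means re-running the loop.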
Structured I/O. While LLMs are extremely accessible because they are natural language interfaces, natural language isn’t a good fit for programmatic use cases. For many tasks, a structured approach might make more sense: one that uses the LLM as just one step in a pipeline while still leveraging its general reasoning ability.
Check out TypeChat, which converts natural language text into structured data described by a TypeScript schema.
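TypeChat itself is TypeScript, but the same structured-I/O idea is easy to sketch in Python: describe the output with a schema, ask the model to fill it, and validate before anything downstream runs. This is an analogous sketch using Pydantic, not TypeChat's actual API; the schema, prompt, and `complete` callable are all illustrative:

```python
import json
from typing import Callable
from pydantic import BaseModel, ValidationError

class SupportTicket(BaseModel):
    """Target schema; it plays the role TypeChat gives to a TypeScript type."""
    product: str
    severity: int  # 1 (low) to 5 (critical)
    summary: str

def extract_ticket(text: str, complete: Callable[[str], str]) -> SupportTicket:
    """`complete` is any prompt-in, text-out LLM call (a placeholder, not a real API)."""
    prompt = (
        "Return ONLY JSON matching this schema:\n"
        f"{json.dumps(SupportTicket.model_json_schema())}\n\n"
        f"Message: {text}\nJSON:"
    )
    raw = complete(prompt)
    try:
        # Validation is the point: bad output fails loudly here,
        # not three steps later in the pipeline.
        return SupportTicket.model_validate_json(raw)
    except ValidationError:
        # A real pipeline might re-prompt with the validation errors appended.
        raise
```

The LLM handles the fuzzy natural-language part; everything past the validation boundary is plain typed data.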
I'm confident LLMs can outperform most humans at re-prompting, or whatever we want to call this concept. I'm also confident that many humans will still do a better job than the models, at least for the moment.