The “wizard” interface design has been around for decades: interfaces that guide users step by step through complicated procedures, usually installations. Even today, you see them on the web in software like Typeform, which splits form questions into a multi-step UI that’s aesthetically pleasing and maybe even increases conversion.
One idea I’ve been thinking about recently is generative interfaces. Today, wizard interfaces have to be designed individually for every task. There are “no-code” builders that help design these wizards, but they still require work, and the logic gets complicated exponentially fast: every branch in the decision tree can add many different wizard states.
What if we could compile designs just in time with generative AI? Given the current application state, the UI is conditionally rendered according to the output of some AI. For a form, this might mean letting the user enter multiple answers in a free-form text box, having the AI try to parse them into the structured output, and then asking clarifying questions for any remaining or unclear values.
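As a rough sketch of that flow (in TypeScript, with an invented form shape and a `completeJSON` stand-in for whatever LLM client you use), the model extracts what it can, and the remaining gaps become the next clarifying questions:

```typescript
// Sketch: parse a free-form answer into structured fields, then ask
// clarifying questions for whatever is still missing or ambiguous.
// `completeJSON` is a stand-in for your LLM client of choice.

interface ProfileForm {
  name?: string;
  teamSize?: number;
  goal?: "analytics" | "automation" | "other";
}

declare function completeJSON(prompt: string): Promise<Partial<ProfileForm>>;

const REQUIRED: (keyof ProfileForm)[] = ["name", "teamSize", "goal"];

async function parseFreeText(answer: string): Promise<{
  parsed: Partial<ProfileForm>;
  clarifyingQuestions: string[];
}> {
  // Let the model extract whatever fields it can find in the free text.
  const parsed = await completeJSON(
    `Extract name, teamSize, and goal from: """${answer}"""`
  );

  // Deterministically compute what is still missing, and turn each gap
  // into a clarifying question for the next wizard step.
  const missing = REQUIRED.filter((key) => parsed[key] === undefined);
  const clarifyingQuestions = missing.map(
    (key) => `Could you tell me your ${String(key)}?`
  );

  return { parsed, clarifyingQuestions };
}
```

The design choice in this sketch is that the model only fills in a typed structure; deciding what is still missing stays deterministic.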
For application onboarding, it might allow users to have a customized journey — what are you interested in learning about? How will you use the application? How familiar are you with the application already?
Instead of dealing with a raw chat interface, there might be richer elements: input boxes, sliders, select forms, or other interactive controls.
One more thought: generative interfaces might be best served as a “design in the small” paradigm. That is, instead of trying to generate an entire application, they might be more useful for generating a single element or piece of the UI. Maybe a foundation of generative UI components that you can assemble.
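To make “design in the small” concrete, here is one possible shape for such a foundation: a small, hand-built catalog of components where the model only chooses which element to show and with which props. Everything here (the `GeneratedElement` union, the component set, the string-based `render`) is invented for illustration.

```typescript
// Sketch: a small catalog of hand-built components; the model picks
// one element at a time and fills in its props. The components stay
// deterministic; only the choice and configuration are generated.

type GeneratedElement =
  | { kind: "textInput"; label: string; placeholder?: string }
  | { kind: "slider"; label: string; min: number; max: number }
  | { kind: "select"; label: string; options: string[] };

// The model's output is validated against this union before rendering,
// so a malformed generation degrades into an error, not broken UI.
function render(el: GeneratedElement): string {
  switch (el.kind) {
    case "textInput":
      return `<input aria-label="${el.label}" placeholder="${el.placeholder ?? ""}" />`;
    case "slider":
      return `<input type="range" aria-label="${el.label}" min="${el.min}" max="${el.max}" />`;
    case "select":
      return `<select aria-label="${el.label}">${el.options
        .map((o) => `<option>${o}</option>`)
        .join("")}</select>`;
  }
}
```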
I’ve been exploring the opportunities in Generative UI as well. Some thoughts:
- UI generator as agent: If the AI has some set of (configurable) components to work with, it has similarities to modern semantic agents (e.g. ChatGPT with Plugins), which choose from a set of tools based on their declared purposes. Agent frameworks might even be directly usable for generative UI (see the tool-schema sketch after this list).
- Application state as agent goal: Taking the above further, the application itself can be conceptualized as an agent whose goal is to reach some desired internal state (e.g. a form is completely filled out), in which case the UI components aren’t the entire “tools”, but rather just the contracts; the human user is the actor behind those contracts.
- Markup format & generation efficiency: Users tend to have limited patience for UI rendering, and token generation is going to take time. Additionally, common markup languages for this purpose (e.g. JSX) may be particularly poorly suited, e.g. due to large numbers of opening and closing brackets. There may be “abbreviated” formats, i.e. JSON is to YAML as JSX is to ___? Something like Pug comes to mind.
- Streaming support: The way chat interfaces stream text is important for perceived UI responsiveness. But UI markup languages that require a nesting structure aren’t ideal for this because you can’t parse them until the very end (unless someone builds a fuzzy parser?). There may be alternate markup formats, e.g. I can imagine a format where the elements are generated in a completely flat list, one per line, where each one declares its immediate parent. As soon as a line is generated, that element is attached to its parent (a sketch of this follows the list). This UI could even be interactive while still being generated, and the model could be told to generate the (contextually) most likely options first.
- Division of labor: I suspect there will be some elements that we want to display deterministically. As an easy example, the logic to display loading UI should possibly be mechanical; if we had to *generate* the loading UI via LLM, what would we show while the loading UI is loading?
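On the “UI generator as agent” bullet above: if each catalog component is declared in the same shape as a function-calling tool, an off-the-shelf agent loop could pick UI elements the way it picks tools. A sketch, assuming an OpenAI-style JSON Schema tool declaration; the component names are hypothetical:

```typescript
// Sketch: declaring UI components in the same shape as function-calling
// "tools", so an agent framework can pick one per turn. The schema shape
// mirrors OpenAI-style tool declarations; adjust to your framework.

const uiTools = [
  {
    type: "function",
    function: {
      name: "show_select",
      description: "Ask the user to pick one option from a short list.",
      parameters: {
        type: "object",
        properties: {
          label: { type: "string" },
          options: { type: "array", items: { type: "string" } },
        },
        required: ["label", "options"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "show_slider",
      description: "Ask the user for a number within a range.",
      parameters: {
        type: "object",
        properties: {
          label: { type: "string" },
          min: { type: "number" },
          max: { type: "number" },
        },
        required: ["label", "min", "max"],
      },
    },
  },
];

// When the model "calls" show_select or show_slider, the application
// renders that component and feeds the user's input back as the tool result.
```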
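And on the markup-format and streaming bullets: a toy version of the flat, line-per-element format described above, with an incremental parser that attaches each element the moment its line arrives. The `id | parentId | kind | propsJSON` syntax is made up purely to illustrate the idea; it also happens to be terser than JSX, which helps with token count.

```typescript
// Sketch: a flat, streamable UI format. One element per line:
//   id | parentId | kind | propsAsJSON
// Each line is attachable the moment it arrives, so the UI can render
// (and even be interacted with) while the model is still generating.

interface UINode {
  id: string;
  kind: string;
  props: Record<string, unknown>;
  children: UINode[];
}

class StreamingUITree {
  private nodes = new Map<string, UINode>();
  readonly root: UINode = { id: "root", kind: "root", props: {}, children: [] };

  constructor() {
    this.nodes.set("root", this.root);
  }

  // Call this for every complete line as it streams in.
  addLine(line: string): void {
    const [id, parentId, kind, propsJson] = line.split("|").map((s) => s.trim());
    const node: UINode = {
      id,
      kind,
      props: propsJson ? JSON.parse(propsJson) : {},
      children: [],
    };
    // Attach to the declared parent immediately; fall back to root if the
    // parent hasn't arrived yet (a real implementation might buffer instead).
    (this.nodes.get(parentId) ?? this.root).children.push(node);
    this.nodes.set(id, node);
  }
}

// Example stream, most likely options first:
const tree = new StreamingUITree();
[
  `form1 | root | form | {"title": "Sign up"}`,
  `email | form1 | textInput | {"label": "Email"}`,
  `plan  | form1 | select | {"label": "Plan", "options": ["Free", "Pro"]}`,
].forEach((line) => tree.addLine(line));
```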
Cool idea. I think giving up the reins entirely to AI is folly, but a tasteful addition like this is a nice thought.