The “wizard” interface design has been around for decades: interfaces that guide users step by step through complicated procedures, usually installations.
I’ve been exploring the opportunities in Generative UI as well. Some thoughts:
- UI generator as agent: If the AI has some set of (configurable) components to work with, it has similarities to modern semantic agents (e.g. ChatGPT with Plugins), which choose from a set of tools based on their declared purposes. Agent frameworks might even be directly usable for generative UI.
- Application state as agent goal: Taking the above further, the application itself can be conceptualized as an agent whose goal is to reach some desired internal state (e.g. a form is completely filled out), in which case the UI components aren’t the entire “tools”, but rather just the contracts; the human user is the actor behind those contracts.
- Markup format & generation efficiency: Users tend to have limited patience for UI rendering, and token generation is going to take time. Additionally, common markup languages for this purpose (e.g. JSX) may be particularly poorly suited, for instance because of the large number of opening and closing brackets, all of which cost tokens. There may be “abbreviated” formats: JSON is to YAML as JSX is to ___? Something like Pug comes to mind.
- Streaming support: The way chat interfaces stream text is important for perceived UI responsiveness. But UI markup languages that require a nesting structure aren’t ideal for this, because a standard parser can’t produce a complete tree until the closing tags arrive at the very end (unless someone builds a fuzzy parser?). There may be alternate markup formats: e.g. I can imagine a format where the elements are generated as a completely flat list, one per line, where each one declares its immediate parent. As soon as a line is generated, that element is attached to its parent. This UI could even be interactive while still being generated, and the model could be told to generate the (contextually) most likely options first.
- Division of labor: I suspect there will be some elements that we want to display deterministically. As an easy example, the logic to display loading UI should probably be mechanical; if we had to *generate* the loading UI via LLM, what do we show while the loading UI is loading?
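To make the flat, parent-declaring format from the streaming bullet a bit more concrete, here is a minimal Python sketch. Everything specific in it is an assumption made up for illustration: the line syntax (`<id> <parent-id> <tag> [text]`), the pre-existing `root` container, and the names `StreamingTree`, `feed_line`, `lines_from_chunks`, and `render`. It is not an existing format or library, just one way the idea could work.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional

ROOT_ID = "root"  # hypothetical id of a container that exists before streaming starts


@dataclass
class Node:
    id: str
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class StreamingTree:
    """Attaches each element to its declared parent as soon as its line arrives."""

    def __init__(self) -> None:
        self.root = Node(ROOT_ID, "root")
        self.by_id: Dict[str, Node] = {ROOT_ID: self.root}

    def feed_line(self, line: str) -> Optional[Node]:
        line = line.strip()
        if not line:
            return None
        # Assumed syntax: "<id> <parent-id> <tag> [free text...]"
        parts = line.split(maxsplit=3)
        if len(parts) < 3:
            raise ValueError(f"malformed line: {line!r}")
        node_id, parent_id, tag = parts[:3]
        text = parts[3] if len(parts) == 4 else ""
        if node_id in self.by_id:
            raise ValueError(f"duplicate id: {node_id!r}")
        parent = self.by_id.get(parent_id)
        if parent is None:
            # This sketch rejects forward references to parents not seen yet.
            raise ValueError(f"unknown parent: {parent_id!r}")
        node = Node(node_id, tag, text)
        parent.children.append(node)  # usable immediately, no closing tag needed
        self.by_id[node_id] = node
        return node


def lines_from_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Reassemble complete lines from arbitrarily split token chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield line
    if buffer:
        yield buffer


def render(node: Node, depth: int = 0) -> str:
    """Return an indented text view of the tree built so far."""
    label = f"{'  ' * depth}{node.tag}#{node.id}"
    if node.text:
        label += f": {node.text}"
    return "\n".join([label] + [render(child, depth + 1) for child in node.children])


if __name__ == "__main__":
    # Chunks deliberately split mid-line, the way a token stream might.
    chunks = ["form1 root form\nname1 fo", "rm1 input Name\n", "send1 form1 button Submit\n"]
    tree = StreamingTree()
    for line in lines_from_chunks(chunks):
        tree.feed_line(line)
    print(render(tree.root))
```

Each `feed_line` call returns the node it just attached, which is the point where a real renderer could mount the element and make it interactive. Rejecting unknown parents is just the simplest choice here; buffering orphaned lines until their parent shows up would be another option.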
Cool idea. I think giving up the reins entirely to AI is folly, but tasteful addition of rehash is a nice thought.
I don't think handing over the reins fully is practical. It's like self-driving cars: no problem 99% of the time after billions were invested, but... that 1% though.
High-precision AI is costly, and failures are incredibly nuts.
Hey Matt, an OS like this, which is based on a spatial concept, is really well positioned to use this paradigm. Check out their launch video: https://deta.space/blog/space-os
Now you've hit on an interesting topic.
When doing JIT generative UI, one could take the user's preferences, or any analysis of the user done in real time, to shape and improve their experience.
With edge models this could even happen on your own computer.
You could detect the actions available in an existing application (from its frontend or UI code), then create a generative UI based on preferences and trends without disclosing these to the owner of the application. That might mean an on-demand API, or even chatbotting your way through Amazon. Finally, you could use a no-code editor to polish details of the result, or just go through explicit iterations.
In minutes you could get a custom UI, which is sometimes a product in itself: think alternative government portals and other inefficient but necessary processes, where an easy UI is the product, without even going as far as Postman vs. curl.
And do what the metaverse should have done: give us a closer, more intuitive interface to technology's superpowers, and integrate fully.
I suppose this vision could easily be built today with https://v0.dev/
As soon as they exit their private alpha, I guess.