Switching between LLMs is a necessary pain when testing (or even in production). What LLM works best for the task at hand? Which is the fastest? The most cost-effective? Tuned to your prompts?
Different LLM APIs are unnecessarily different despite serving similar model outputs — HuggingFace might return a JSON object with a “generated_text” field, while OpenAI returns a JSON object with an array of “choices”. Every single LLM framework or library has painstakingly implemented “connectors” for every different API provider. These connectors break over time and still have different semantics in code.
What if you could inference any model through a single API endpoint? With no additional markup (pay the same inference as you would directly).
The Model API Gateway provides a single interface to interact with over 20 models from GPT-4 to Llama 2. A universal API endpoint that can be called with a simple HTTP request or a zero-dependency client library in Python, Go, or TypeScript. Use a single API key — no need to input your OpenAI, HuggingFace, or other credentials.
If you’re already using OpenAI APIs, you can switch to the Model API Gateway by changing a single line of code.
openai.api_base = 'https://api.thiggle.com/v1/'
Two libraries that I open-sourced — OpenLM and llm.ts — provided Python and TypeScript interfaces to do this for hosted APIs. But open-source models have gotten much better in the months since releasing those, and hosting those models (like Llama 2) isn’t always straightforward. Also, keeping track of spend and API keys for different providers is difficult and error-prone, so the Model API Gateway does this for you.
https://openrouter.ai/ is interesting in this space as well.