In a world where “scale is all you need,” sometimes the biggest models don’t win. Some reasons why smaller LLMs might pull ahead.
Many of these points follow from each other.
Quicker to train. Obvious, but quicker feedback means faster iterations. Faster training, faster fine-tuning, faster results.
Runs locally. The smaller the model, the more environments it can run in.
Easier to debug. If a model runs on your laptop, you can poke at it directly, which makes debugging much simpler.
No specialized hardware. Small LLMs rarely require specialized hardware to train or run inference. In a market where the biggest chips are in high demand and short supply, this matters.
Cost-effective. Smaller models are cheaper to run, which makes more applications NPV-positive.
Lower latency. Smaller models can generate completions faster. Most of today's models are still too slow for genuinely low-latency applications (see the back-of-envelope sketch after this list).
Runs on the edge. Low latency, smaller file size, and shorter startup times mean that small LLMs can run at the edge.
Easier to deploy. Getting to production is sometimes the hardest part.
Can be ensembled. It’s rumored that GPT-4 is an ensemble of eight smaller models. Ensembling smaller models has been a pragmatic machine learning strategy for decades (a minimal voting sketch follows this list).
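To make the latency point concrete, here’s a back-of-envelope sketch in Python. At batch size 1, decoding is roughly memory-bandwidth bound: every generated token requires streaming the model’s weights from memory, so tokens per second is capped by bandwidth divided by model size in bytes. The ~1 TB/s bandwidth figure below is an illustrative assumption, not a benchmark of any particular chip.

```python
# Back-of-envelope decode throughput: at batch size 1, generation is
# roughly memory-bandwidth bound, so tokens/sec is capped by how fast
# the weights can be streamed from memory on each decoding step.
def max_tokens_per_second(params_billions: float,
                          bytes_per_param: float = 2.0,      # fp16/bf16 weights
                          mem_bandwidth_gb_s: float = 1000.0  # assumed ~1 TB/s
                          ) -> float:
    """Upper bound on tokens/sec; real systems land below this."""
    model_size_gb = params_billions * bytes_per_param
    return mem_bandwidth_gb_s / model_size_gb

# Illustrative sizes only:
for size in (1, 7, 70):
    print(f"{size}B params: ~{max_tokens_per_second(size):.0f} tokens/sec ceiling")
```

On the same (assumed) hardware, a 7B model has roughly a 10× higher throughput ceiling than a 70B model; the whole latency argument is one division.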
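And here’s what the simplest form of ensembling could look like: a majority vote over the answers of several small models. The `small_model_*` callables are hypothetical stand-ins for whatever local models you happen to be running.

```python
from collections import Counter
from typing import Callable, List

def ensemble_vote(models: List[Callable[[str], str]], prompt: str) -> str:
    """Ask every small model the same question and return the majority answer."""
    answers = [model(prompt) for model in models]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical usage: small_model_a/b/c stand in for any local small LLMs.
# answer = ensemble_vote([small_model_a, small_model_b, small_model_c],
#                        "Is 2027 a prime number? Answer yes or no.")
```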
A few more conjectures on why small models might be better:
More interpretable? We don’t yet have a definitive theory of LLM interpretability, but I imagine we’ll understand what’s going on inside 7-billion-parameter models before we understand what’s going on inside 60-billion-parameter models.
Enhanced reproducibility? Small LLMs can easily be retrained from scratch. Contrast this with the largest LLMs, which might go through multiple checkpoints and rounds of continued training. Reproducing a model that was trained in an hour is much easier than reproducing one trained over six months.
Okay, but can we call them SLMs?