A List of 1 Billion+ Parameter LLMs
There are already over 50 different 1B+ parameter LLMs accessible via open-source checkpoints or proprietary APIs. That’s not counting any private models, or models that have academic papers but no available API or model weights. There are even more if you count fine-tuned models like Alpaca or InstructGPT. Below is a list of the ones I know about (this is an evolving document).
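Most of the open-source checkpoints below can be loaded straight from the Hugging Face Hub with a few lines of transformers code. Here is a minimal sketch using GPT-J as the example; the model id, dtype, and generation settings are my assumptions, and any open checkpoint from the list can be swapped in:

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"  # assumption: substitute any open checkpoint from the list

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 6B weights fit in ~12 GB of memory
    device_map="auto",          # requires accelerate; places layers on available devices
)

prompt = "The largest open-source language models are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```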
GPT-J (6B) (EleutherAI)
GPT-Neo (1.3B, 2.7B) (EleutherAI)
GPT-NeoX (20B) (EleutherAI)
Pythia (1B, 1.4B, 2.8B, 6.9B, 12B) (EleutherAI)
Polyglot (1.3B, 3.8B, 5.8B) (EleutherAI)
J1 (7.5B, 17B, 178B) (AI21)
LLaMA (7B, 13B, 33B, 65B) (Meta)
OPT (1.3B, 2.7B, 6.7B, 13B, 30B, 66B, 175B) (Meta)
Fairseq (1.3B, 2.7B, 6.7B, 13B) (Meta)
YaLM (100B) (Yandex)
UL2 (20B) (Google)
PanGu-α (200B) (Huawei)
Cohere (Medium, XLarge)
Claude (instant-v1.0, v1.2) (Anthropic)
CodeGen (2B, 6B, 16B) (Salesforce)
Cerebras-GPT (1.3B, 2.7B, 6.7B, 13B) (Cerebras)
RWKV (14B)
BLOOM (1B, 3B, 7B, 176B) (BigScience)
NeMo (1.3B, 5B, 20B) (NVIDIA)
GPT-4 (OpenAI)
GPT-3.5 (OpenAI)
GPT-3 (ada, babbage, curie, davinci) (OpenAI)
Codex (cushman, davinci) (OpenAI)
T5 (11B) (Google)
CPM-Bee (10B) (OpenBMB)
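The proprietary entries above (OpenAI, Anthropic, AI21, Cohere) are reachable through hosted APIs rather than downloadable weights. As a sketch of what that looks like, here is a call with the pre-1.0 openai Python package; the model name, prompt, and sampling parameters are purely illustrative, and other providers’ clients differ in the details:

```python
import os

import openai  # assumption: the pre-1.0 openai package with the Completion endpoint

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",  # illustrative: any completions-capable model id works
    prompt="Name three 1B+ parameter language models.",
    max_tokens=64,
    temperature=0.7,
)
print(response["choices"][0]["text"])
```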
Fine-tuned models
Alpaca (7B) (Stanford)
Convo (6B)
J1-Grande-Instruct (17B) (AI21)
InstructGPT (175B) (OpenAI)
BLOOMZ (176B) (BigScience)
Flan-UL2 (20B) (Google)
Flan-T5 (11B) (Google)
T0 (11B) (BigScience)
Galactica (120B) (Meta)