Large language models can only consider a limited amount of text at one time when generating a response or prediction. This limit is called the context length, and it differs across models. One trend stands out, though: context length keeps increasing. GPT-1 (2018) had a context length of just 512 tokens.
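To make the idea concrete, here is a minimal sketch of what a context limit means in practice: any input beyond the window simply cannot be seen by the model. The whitespace "tokenizer" and the `truncate_to_context` helper are illustrative stand-ins (real models use subword tokenizers such as BPE); only the 512-token figure comes from the text above.

```python
# Illustrative sketch: whitespace splitting is a stand-in for a real
# subword tokenizer, and truncate_to_context is a hypothetical helper.
def truncate_to_context(text: str, context_length: int = 512) -> str:
    """Keep only as many (naive) tokens as fit in the context window."""
    tokens = text.split()
    return " ".join(tokens[:context_length])

doc = "word " * 1000          # a 1000-token document
kept = truncate_to_context(doc)
print(len(kept.split()))      # only 512 tokens survive; the rest is invisible
```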
Even as context sizes grow, performance and cost remain open questions for the time being, so I wouldn't be too sure that naive implementations that just send everything will always be competitive. Sending only what is needed can yield dramatic cost savings while also improving performance.
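The cost argument is simple arithmetic, since API pricing is typically per token. A back-of-the-envelope sketch, using a hypothetical price of $0.01 per 1K input tokens (check your provider's actual pricing) and made-up token counts:

```python
# HYPOTHETICAL pricing, for illustration only -- not a real quote.
PRICE_PER_1K_TOKENS = 0.01

def request_cost(input_tokens: int) -> float:
    """Cost of one request at the assumed per-token price."""
    return input_tokens / 1000 * PRICE_PER_1K_TOKENS

stuff_everything = request_cost(100_000)  # dump the whole corpus into context
send_what_matters = request_cost(2_000)   # retrieve only relevant passages
print(stuff_everything, send_what_matters)  # 1.0 vs 0.02 per request
```

At scale (thousands of requests per day), that 50x gap dominates the bill, which is why "just send everything" tends to lose to targeted retrieval even when the window is big enough to allow it.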
Really interested to see how well these very long contexts are actually used (see the paper "Lost in the Middle: How Language Models Use Long Contexts").
I like the analogy to Moore's Law. It's not perfect, but it does help us see what's going on here.
Great article, Matt, especially given the OpenAI dev day. Context stuffing, here we come! But at what cost? This gets expensive fast.