Even as context windows grow, performance and cost remain real constraints for the time being. So I wouldn't be so sure that naive implementations that simply send everything will always be competitive. Sending only what is needed can yield dramatic cost savings while also improving performance.
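To make that concrete, here's a rough back-of-the-envelope sketch. The per-token price and token counts are assumptions for illustration, not real quotes:

```python
# Hypothetical input price; substitute the actual rate for your model.
PRICE_PER_1K_INPUT_TOKENS = 0.01  # assumed USD, not a real quote

def prompt_cost(num_tokens: int) -> float:
    """Input-side cost of one request at the assumed rate."""
    return num_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

send_everything = prompt_cost(100_000)  # stuff the whole corpus into context
send_relevant = prompt_cost(4_000)      # send only retrieved, relevant chunks

print(f"full context: ${send_everything:.2f} per request")
print(f"retrieved:    ${send_relevant:.2f} per request")
# At thousands of requests per day, a 25x per-request gap compounds fast.
```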
Really curious to see how "well" these very long contexts are actually "used" (see the paper "Lost in the Middle: How Language Models Use Long Contexts").
I like the analogy to Moore's Law. It's not perfect, but it does help us see what's going on here.
Great article, Matt, especially given the OpenAI Dev Day. Context stuffing, here we come! But at what cost? This gets expensive fast.