OpenAI DevDay November 6th, 2023
This year at DevDay, OpenAI released a new version of Chat-GPT4 called ‘Turbo’. Turbo boasts 128K context window as its biggest feature. In addition, the Turbo model is more capable cheaper than the previous Chat-GPT versions.
A list of some new new features rolled out in Chat-GPT4 Turbo:
- 128k context window size
- trained on more data and world events up to April 2023
- more optimized performance
- improved JSON responses
- improved instruction following
- released to all paying developers on a limited basis
128K Context Window – What does that mean?
Context window refers to the amount of information a LLM (Large Language Model) can take as an input.
A context window of 128,000 tokens is equivalent to over 300 pages of text.
If you imagine Chat-GPT or Google Bard as a black box, we are only concerned with the amount of information we can put into it and how much information comes out of it.
The model itself does not contain any memory. It simply takes an input and processes the output. The way OpenAI and Google get the models to act as though they have memory is by feeding in the entire conversational history each time you send a new prompt.
The conversational history includes your prompts and its (Chat-GPT, Bard, etc…) responses. Together, they provide the ‘context’ of the conversation.
As a conversation with Chat-GPT or Bard gets longer, the conversational history grows. Eventually, the conversational history can grow longer than the context window. If this happens, then parts of the conversational history get excluded (it ‘forgets’).
By increasing the context window to 128K tokens or more, the models are able to take in more information to provide coherent and relevant responses. This also allows the models to examine larger documents (as text inputs) from which to generate responses from.
Google has not released the context window length for Bard.
Optimized Performance = Lower Cost
AI models can get big. Large language models (LLM) are LARGE for a reason. As the models are trained on more data, they grow in size, requiring more nodes to encode relationships.
When a user prompts the model with a question, extremely fast computers process the data in the models to generate the outputs. As the models get larger, they require more processing power which requires more electricity -> more cost.
Models can be reduced in size while still maintaining a high intelligence level. Different techniques exist to reduce model size: model distillation, model pruning, model quantization, and dataset distillation.
The goal is to reduce the number of nodes in a model which in turn reduces the memory footprint. This all translates into reduce power consumption.
In the end, if OpenAI is able to create more efficient models, they can pass those savings on to the users.
More Up-To-Date Knowledge
AI models like Bard and Chat-GPT are constantly being trained on new data to keep their knowledge levels relevant. Chat-GPT Turbo has the most up-to-date training set out of all the GPT models, including Chat-GPT 4.
For More Information
- https://openai.com/blog/introducing-gpts
- https://openai.com/blog/new-models-and-developer-products-announced-at-devday

