there hasn't been a release as big as the 2007 iphone… until today.
sam altman, ceo of openai, hosted the company's first devday, sharing big updates to features and plans for its ai models.
this marks one year since the chatgpt interface broke the internet, reaching 100M users in 2 months. openai now has 100M active users each week. while chatgpt itself isn't a new unveiling, the event shared major updates signalling what's to come in the year ahead.
from a marketplace for custom gpts, like one built specifically for canva workflows, to the new assistants api, these are the first of many major updates enabling ubiquitous ai tooling.
here’s a recap of the top takeaways for what was announced.
top 5 highlights, in case you missed it
1. larger context windows for a lower price.
two limiting factors for using gpt models have been the size of the input and the cost to run them.
on the first, limited context windows: gpt-4 previously supported 8k–32k tokens. gpt-4 turbo now supports 128k tokens, which means it can take in about 300 pages of text, i.e. full novels, textbooks, reports, legal documents, and more. larger inputs unlock more robust applications. on the second, cost: gpt-4 turbo's input tokens are priced at roughly a third of gpt-4's, with output tokens at about half.
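as a quick sanity check on the "300 pages" figure, here's a back-of-envelope estimate. the ~4 characters per token and ~1,700 characters per page are rough rules of thumb for english prose, not official numbers:

```python
# rough estimate: how many pages of prose fit in gpt-4 turbo's 128k window?
# assumes ~4 characters per token and ~1,700 characters per page — both are
# rules of thumb, not official figures.

CHARS_PER_TOKEN = 4
CHARS_PER_PAGE = 1_700

def pages_that_fit(context_tokens: int) -> int:
    """estimate how many pages of plain prose fit in a context window."""
    return (context_tokens * CHARS_PER_TOKEN) // CHARS_PER_PAGE

print(pages_that_fit(128_000))  # → 301, i.e. roughly 300 pages
```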
2. chatgpt gets eyes and hands.
gpt-4 turbo will include access to the vision api. evolving from text-only input and output, gpt is now multimodal, taking in audio, images, documents, and other data sources.
in terms of hands, it's integrated with dall-e 3, so image generation is seamless. one benefit of gpt over midjourney is that its textual understanding helps users design better prompts, getting one step closer to pure natural language input. rip, gen-1 prompt engineers. text-to-speech will also be available.
👉more on the vision guide documentation here.
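for a sense of what a vision request looks like, here's a sketch of the request body shape for the chat completions endpoint, based on the format announced at devday. the model name, image url, and question are illustrative placeholders, not a definitive implementation:

```python
# sketch of a vision request body: a user message can now mix text parts and
# image parts. the url and question below are placeholders.
payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    "max_tokens": 300,
}
```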
3. chatgpt gets a voice.
the new text-to-speech (tts) integration comes in two model variants:
- tts-1 is optimized for speed, meant for real-time audio generation.
- tts-1-hd focuses on quality.
this enables a more natural way to interact with openai's models. the powerful application is combining vision-based input (like a photo of an environment) with reasoning-based questions (chatting with it) and voice output. we'll start to see spatial gpts everywhere.
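a minimal sketch of how an app might pick between the two variants when building a tts request body. the voice name and request shape follow what was announced, but treat the details as assumptions:

```python
# build a text-to-speech request body: tts-1 trades quality for latency,
# tts-1-hd does the opposite. the voice name "alloy" is illustrative.
def tts_request(text: str, realtime: bool = True) -> dict:
    """return a request body for openai's audio/speech endpoint."""
    return {
        "model": "tts-1" if realtime else "tts-1-hd",
        "voice": "alloy",
        "input": text,
    }

live = tts_request("your table is ready", realtime=True)    # low latency
studio = tts_request("chapter one", realtime=False)          # high quality
```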
4. as apple has the app store ($80B annually), openai launches its gpt store (potential for $T+ annually).
developers can move beyond plugin mode and into full application mode. this brings together recently released custom features, like system-level prompting and custom data integration.
some example gpts demo’d:
- personal gpt to answer any questions based on personal knowledge graph
- canva gpt for making a prototype of a poster and clicking out into editor
- code gpt with code.org’s curriculum data integrated into the chat interface
thinking in the oprah meme: you get a gpt. you get a gpt. and you get a gpt.
5. the assistants api will bring agent-like experiences to any platform.
from natural-language data analysts to different smart assistants, this new api will let platforms build custom integrations backed by openai's models.
even more impressive, the tooling makes these easier to deploy, solving challenges in state and prompt management and working around limited model capabilities.
the assistants api has four main features:
- threading: persistent history in conversation
- retrieval: built-in search
- code interpreter: a working python interpreter in a sandbox environment
- function calling: ability to call multiple app-defined functions in a single request
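putting the four features together, here's a hedged sketch of what an assistant configuration might look like. only the tool types come from the announcement; the model name, instructions, and function schema are made up for illustration:

```python
# sketch of an assistant configuration combining the announced tool types.
# the name, instructions, and convert_currency schema are hypothetical.
assistant_config = {
    "model": "gpt-4-1106-preview",
    "name": "travel helper",
    "instructions": "help users plan and book trips.",
    "tools": [
        {"type": "retrieval"},         # built-in search over uploaded files
        {"type": "code_interpreter"},  # sandboxed python for analysis
        {"type": "function",           # custom function the app implements
         "function": {
             "name": "convert_currency",
             "parameters": {
                 "type": "object",
                 "properties": {
                     "amount": {"type": "number"},
                     "to": {"type": "string"},
                 },
             },
         }},
    ],
}
# threading is handled server-side: each conversation lives in a persistent
# thread, so the app doesn't resend the full history on every call.
```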
they demo’d what a travel booking assistant would look like on a travel website. it can:
- remember the conversation from chat history
- import and interpret documents like flight details or airbnb receipts
- analyze those documents for a multi step function like splitting a tab per day including currency conversion
beyond the demo, gpt-4 turbo itself gets several updates:
- knowledge cutoff moves from september 2021 to april 2023
- improved function calling: multiple functions can be called in one message, and json mode guarantees valid json output
- reproducible outputs (beta): a seed parameter makes model outputs deterministic, so results are easier to test and debug
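the improved function calling and reproducible outputs map to two new request parameters: json mode via response_format, and a seed. a sketch of a request body using both — the model name, message, and seed value are placeholders:

```python
# sketch of the new request knobs announced at devday:
# - response_format json mode constrains output to valid json
# - seed (reproducible outputs beta) makes sampling repeatable
request = {
    "model": "gpt-4-1106-preview",
    "messages": [{"role": "user",
                  "content": "split this $90 tab across 3 days as json"}],
    "response_format": {"type": "json_object"},
    "seed": 42,  # same seed + same inputs → (mostly) identical output
}
```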
implications and applications
these updates increase access to ai models with less dev time to set up, support, and iterate. teams will be able to build more robust applications with this new tooling.
there will be an explosion of copilot-this and smart-assistant-that, ensuring truly personalized journeys across platforms and unlocking new irl experiences.
- multimodal (image and voice) and multi-model integrations will mean new physical interfaces and experiences beyond glass screens, which fall flat in interactivity with their surrounding environment. let's see what jony ive can do here.
- a whole new ecosystem to trade services faster and cheaper. the gpt store will enable a new generation of gpt developers who keep knowledge and content getting distributed. in education, rather than buying the latest textbook or cramming for an exam, every student will have a personalized bio or physics tutor. cramming will be one use case to get students started, but exploring curiosity will keep them learning.
- smaller teams can implement and iterate faster. in startups, it will be an advantage to keep the team small while building out agents for each use case to prove different concepts. the marketing gpt can generate its own social media campaigns, understand its own data, and iterate accordingly.
- 📝 full blog post release here
- 📽️ sam altman’s full 45min keynote here or⚡2min summary of everything you missed
bonus: if you're in the sf bay area, jump into the 'emergency' hack to build and experiment.
or if you’re somewhere else ^ copy the idea and host your own local hackathon.