what happened at open ai’s devday 2024?

michael raspuzzi
6 min read · oct 2, 2024

--

summary of 5 major updates | oct 2024

“the turing test whooshed past and nobody cared.” sam altman, ceo of open ai

alan turing using a smartphone at night — ar 16:9 by gpt-4o

measuring machine intelligence as ai progresses

kevin: there used to be this idea of artificial general intelligence (agi) as a binary thing […] i don’t think that’s how i think about it anymore.

sam: most people looking back in history won’t agree when agi happened. the turing test whooshed past and nobody cared.

kevin: at open ai the state of what computers can do evolves every 2–3 months, and then computers have a new capability that they never had in the history of the world.

the above is from the live blog of the closed afternoon discussion between sam altman, the ceo of open ai, and kevin weil, the cpo.

when it comes to measuring a machine’s intelligence, three thought experiments stand out over time:

1. 🎯 the turing test: in 1950, turing introduced a practical test to determine if machines can think. the imitation game, later known as the turing test, involves an interrogator trying to distinguish between a human and a machine based on their responses.

✅ within two months of launch, chat gpt seemed to impress 100 million people enough to pass that moment.

2. 🎯 woz’s coffee test: in the early 2000s, steve wozniak, the co-founder of apple, asked whether an ai could make a cup of coffee, i.e. navigate a new environment, ask questions, and use a kitchen to brew.

✅ in jan 2024, figure released a demo of its humanoid robot making coffee.

3. 🎯 the modern turing test: last summer, mustafa suleyman, the ceo of microsoft ai and former head of applied ai at deepmind, asked whether an ai could make $1M on a retail platform with a $100k investment.

🟡 while there are examples of people coding iphone apps that break through $1M arr (like cal ai or wave ai note taker), this question points to the next chapter of ai: agents.

and this devday is starting to lay the foundation for developers to build more robust capabilities with the open ai api, including:

  1. realtime api
  2. vision fine-tuning
  3. prompt caching
  4. model distillation tools
  5. structured outputs

more below.

summary of 5 major updates from devday

1. realtime api

healthify’s ai coach using voice with function calling for healthy food recommendations (open ai)

why it’s important: it makes interactions with ai smoother and reduces lag, especially in applications like voice-activated assistants. it also supports function calling.

what it is: a websocket-based api for real-time voice input and output, allowing direct interaction with models via audio. in the future, it will also support other modalities, like vision.
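to get a feel for the shape of the api, here’s a minimal sketch of a websocket session in python using the third-party websockets package. the endpoint, headers, and event names follow the launch docs, but treat the model snapshot and event payloads as assumptions to verify against the current reference.

# minimal sketch of a realtime api session over websocket (python + the
# third-party `websockets` package); model name and event shapes are assumptions
import asyncio
import json
import os

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # note: newer versions of the websockets package call this argument additional_headers
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # ask the model to produce a response; audio would arrive as streamed events
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "say hello"},
        }))
        async for message in ws:  # the server pushes events back as json
            event = json.loads(message)
            print(event["type"])  # e.g. response.text.delta, response.done
            if event["type"] == "response.done":
                break

asyncio.run(main())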

examples of use:

  • healthify’s nutrition and fitness coaching app uses the realtime api for its ai coach, ria, to give personalized support.
  • speak, a language learning app, uses the realtime api for its role-play feature, letting users practice conversations.

➡️ more on open ai’s website here.

2. vision fine-tuning

vision fine-tuning can help with design tasks too, like matching the style of a company’s brand (open ai)

why it’s important: users can customize the model to have a stronger understanding of images, leading to enhanced visual search, object detection, and more accurate image analysis, for example in medical settings.

what it is: similar to fine-tuning with text, it lets users bring image datasets to make gpt-4o perform better at specific tasks. it can work with as few as 100 images. a single training example looks like this:

{
  "messages": [
    { "role": "system", "content": "You are an assistant that identifies uncommon cheeses." },
    { "role": "user", "content": "What is this cheese?" },
    { "role": "user", "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/3/36/Danbo_Cheese.jpg"
          }
        }
      ]
    },
    { "role": "assistant", "content": "Danbo" }
  ]
}
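once examples like the one above are collected into a jsonl file, kicking off a vision fine-tuning job looks roughly like the sketch below (openai python sdk; the file name is made up and the exact gpt-4o snapshot that supports image fine-tuning should be checked in the docs).

# sketch: upload a jsonl of image training examples and start a fine-tuning job;
# "cheese_examples.jsonl" is a hypothetical file of records like the one above
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("cheese_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed snapshot; confirm which models support vision fine-tuning
)

print(job.id, job.status)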

examples of use:

  • for product recommendations through similar style or appearance
  • analyzing medical images
  • object detection for road signs and traffic lanes

➡️ more on open ai’s website here.

3. prompt caching

source

why it’s important: when making multiple api calls that share the same context, the repeated input can be reused, which helps reduce cost and latency. similar features previously appeared in anthropic’s and google’s models.

what it is: prompt caching reuses the processed prefix of recently seen prompts, so identical input tokens don’t have to be reprocessed, which saves time and reduces cost. it applies automatically to prompts longer than 1,024 tokens, and requests that hit the cache report a cached_tokens value within the usage field of the api response.

usage: {
  total_tokens: 2306,
  prompt_tokens: 2006,
  completion_tokens: 300,
  prompt_tokens_details: {
    cached_tokens: 1920,
    audio_tokens: 0,
  },
  completion_tokens_details: {
    reasoning_tokens: 0,
    audio_tokens: 0,
  }
}
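caching kicks in automatically, so there’s nothing to configure; a rough way to see it working is to send the same long prompt twice and compare the cached_tokens values. the sketch below uses the openai python sdk, with a filler system prompt standing in for real shared context.

# sketch: repeat a request with a long shared prefix and watch cached_tokens grow
from openai import OpenAI

client = OpenAI()
long_context = "you are a support agent for a hypothetical company. " * 200  # pushes the prompt past 1,024 tokens

for attempt in range(2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": long_context},
            {"role": "user", "content": "summarize the return policy in one line"},
        ],
    )
    details = response.usage.prompt_tokens_details
    # the second call should report a large cached_tokens value
    print(f"call {attempt + 1}: cached_tokens={details.cached_tokens}")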

➡️ more on open ai’s website here.

4. model distillation tools

distillation test dashboard (open ai)

why it’s important: smaller models (like gpt-4o mini) can learn from larger models (like gpt-4o or o1-preview). this creates a path to scale applications with more resource-efficient models, and can also help accelerate testing by complementing humans in the loop.

what it is: a set of tools covering stored completions (capturing input-output pairs from a larger model), evals, and fine-tuning a smaller model using those stored completions and evals.
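the stored-completions half of that workflow is just a flag on a normal api call; here is a sketch with the openai python sdk (the metadata keys are made up for illustration). the stored pairs can then be filtered in the dashboard, run through evals, and used to fine-tune a smaller model.

# sketch: log a larger model's outputs with store=True so the input-output pairs
# can later be evaluated and used to fine-tune a smaller model
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # the larger "teacher" model
    store=True,  # persist this completion for later distillation
    metadata={"task": "email-quick-reply"},  # hypothetical tag for filtering stored completions
    messages=[
        {"role": "user", "content": "draft a one-line reply accepting thursday's meeting"},
    ],
)

print(response.choices[0].message.content)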

examples:

  • superhuman’s quick reply feature, fine-tuning a smaller model for email response tasks.

5. structured outputs

source

why it’s important: improves the dependability of outputs, crucial for tasks where exact formats are required, such as api integrations.

what it is: ensures model outputs adhere strictly to a defined json schema for greater reliability.
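a quick sketch of what that looks like with the openai python sdk’s pydantic helper; the event-extraction schema here is made up, and the model snapshot should be checked against the docs.

# sketch: define a schema with pydantic and have the model's output parsed into it
from openai import OpenAI
from pydantic import BaseModel

class Event(BaseModel):
    name: str
    date: str
    location: str

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # assumed snapshot that supports structured outputs
    messages=[
        {"role": "user", "content": "devday was held in san francisco on oct 1, 2024"},
    ],
    response_format=Event,  # the reply is constrained to this json schema
)

print(completion.choices[0].message.parsed)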

examples:

  • form filling: automatically generate structured JSON responses to fill in web forms
  • data extraction: extract specific data points from text, formatted as structured output (e.g., names, dates, and locations). previously, pdf documents may have had to be processed as images.
  • api integration: respond directly in a format required by a third-party api, reducing the need for post-processing

➡️ more on how it works in this twitter post.

why all of this matters

these features and extended capabilities were previously unavailable through the api, or in some cases only available through open ai’s chat gpt. there were workarounds, but they were tedious.

the cto of healthify said they could use the realtime api to set up their voice coach in 3 days, for what previously would have taken 3 or more weeks.

some exciting areas to look out for:

  • more dynamic front end experiences
  • better speech to speech interactions across applications, especially with function calling
  • vision fine-tuning in use cases that previously were harder to deploy

thanks for reading this far, and see you for future updates, which will most likely have something to do with the theme of this next year in ai: agents.

shoutout to Simon Willison for live blogging the whole event.

hey, i’m michael and i’m exploring an applied ai studio for solving challenges in global healthcare.

connect with me on linkedin to follow my build journey.

--

michael raspuzzi

building something new. previously @tks @harvard @culinaryai