Episode 31

OpenAI’s Deep Research Team on Why End-to-End Training is the Future of AI Agents

OpenAI’s Isa Fulford and Josh Tobin discuss how the company’s newest agent, Deep Research, represents a breakthrough in AI research capabilities by training models end-to-end rather than using rigid operational graphs. The product leads explain how high-quality training data and the o3 model’s reasoning abilities enable adaptable research strategies, and why Sam Altman thinks Deep Research will capture a meaningful percentage of all knowledge tasks. Key product decisions that build transparency and trust include citations and clarification flows. By compressing hours of work into minutes, Deep Research transforms what’s possible for many business and consumer use cases.

Summary

OpenAI’s Isa Fulford and Josh Tobin discuss Deep Research, the company’s AI agent that conducts comprehensive online research in 5-30 minutes by searching multiple websites and synthesizing detailed reports with citations. The episode explains how OpenAI is building effective AI agents and teases some future directions across business and personal use cases.

  • End-to-end training beats manual orchestration. Rather than constructing a graph of operations with language models at specific nodes—the common approach to building agents—Deep Research is trained end-to-end on hard browsing tasks. This lets the model develop flexible strategies for gathering and synthesizing information that a manually scripted workflow could not handle.
  • Data quality is the superpower. Creating high-quality training data was critical to the development of Deep Research. The team fine-tuned the model on carefully curated examples of complex browsing tasks while leveraging the advanced reasoning capabilities of OpenAI’s o3 model. This combination enables creative results.
  • Agents can excel at defined but flexible tasks. Deep Research demonstrates that AI agents can be trained to handle specific workflows that can’t be captured by rigid rules. The model can adapt its research strategy based on initial findings—making it ideal for tasks like market research, scientific literature review and consumer research that benefit from comprehensive and exploratory information gathering.
  • Trust through transparency and control. The model builds user confidence through clear citations, upfront clarification of requirements, and visible chain-of-thought reasoning. This transparency, combined with the model’s ability to synthesize information from many sources, enables users to verify conclusions while benefiting from comprehensive research they couldn’t practically do themselves.
  • Time compression creates new possibilities. By reducing multi-hour research tasks to minutes, Deep Research doesn’t just save time—it fundamentally changes what’s possible for knowledge workers. Users can now conduct thorough research for decisions they previously wouldn’t have had bandwidth for, from analyzing potential investments to planning special events.

Transcript

Contents

Josh Tobin: A lesson that I’ve seen people learn over and over again in this field is like, you know, we think that we can do things that are smarter than what the models do by writing it ourselves. But in reality, like, usually the model—like, as the field progresses, the models come up with better solutions to things than humans do.

And also, like, you know, the probably number one lesson of machine learning is you get what you optimize for. And so if you’re able to set up the system such that you can optimize directly for the outcome that you’re looking for, the results are going to be much, much better than if you try to sort of glue together models that are not optimized end-to-end for the task that you’re trying to have them do. So my long term guidance is that I think reinforcement learning, tuning on top of models is probably going to be a critical part of how the most powerful agents get built.


Sonya Huang:  We’re excited to welcome Isa Fulford and Josh Tobin, who lead the Deep Research product at OpenAI. Deep Research launched three weeks ago and has quickly become a hit product, used by many tech luminaries like the Collisons for everything from industry analysis to medical research to birthday party planning.

Deep Research was trained using end-to-end reinforcement learning on hard browsing and reasoning tasks and is the second product in a series of agent launches from OpenAI, with the first being Operator. We talked to Isa and Josh about everything from Deep Research’s use cases to how the technology works under the hood to what we should expect in future agent launches from OpenAI.

Isa and Josh, welcome to the show.

Lauren Reeder: Thank you. Thank you so much for joining us.

Josh Tobin: Excited to be here.

Isa Fulford: Thank you for having us.

What is Deep Research?

Lauren Reeder: So maybe let’s start with what is Deep Research? Tell us about the origin stories and what this product is doing.

Isa Fulford: So Deep Research is an agent that is able to search many online websites, and it can create very comprehensive reports. It can do tasks that would take humans many hours to complete. And it’s in ChatGPT, and it takes like five to thirty minutes to answer you. And so it’s able to do much more in-depth research and answer your questions with much more detail and specific sources than a regular ChatGPT response would be able to.

It’s one of the first agents that we’ve released. So we released Operator pretty recently as well. And so Deep Research is the second agent, and we’ll release many more in future.

Sonya Huang: What’s the origin story behind Deep Research? When did you choose to do this? What was the inspiration, and how many people work on it? What did it take to bring this to fruition?

Josh Tobin: Good question. This is before my time.

Isa Fulford: Oh yeah. [laughs] So I think maybe around a year ago, we were seeing a lot of success internally with this new reasoning paradigm and training models to think before responding. And we were focusing a lot on math and science domains, but I think that the other thing that this kind of new reasoning model regime unlocks is the ability to do longer horizon tasks that involve agentic kind of abilities.

And so we thought a lot of people do tasks that require a lot of online research or a lot of external context, and that involves a lot of reasoning and discriminating between sources. And you have to be quite creative to do those kinds of things. And I think we finally had models, or a way of training models that would allow us to be able to tackle some of those tasks. So we decided to try and start training models to do first browsing tasks. So using, like, the same methods that we use to train reasoning models, but on more real world tasks.

Sonya Huang: Was it your idea? And Josh, how’d you get involved?

Isa Fulford: It was—yeah, it was—at first it was me and Yash Patil, who is a guy at OpenAI who is working on a similar project that will be released at some point, which we’re very excited about. And we built an original demo. And then also with Thomas Dimson, who’s one of those people who just is an amazing engineer, like, will dive into anything and just get loads of things done. So it was very fun.

 

Josh Tobin: Yeah. And I joined more recently. I rejoined OpenAI about six months ago from my startup. I was at OpenAI in the early days, and was looking around at projects when I rejoined, and got very interested in some of our agentic efforts, including this one, and got involved with that.

Lauren Reeder: Amazing. Well, tell us a little about who you built it for.

Josh Tobin: Yeah, I mean it’s really for anyone who does knowledge work as part of their—as part of their day-to-day job or really as part of their life. So we’re seeing a lot of the usage come from people using it for work, doing things like, you know, research as part of their jobs, for understanding markets, companies, real estate…

Isa Fulford: A lot of scientific research, medical. I think we’ve seen a lot of medical examples as well.

Josh Tobin: Yeah. And one of the things we’re really excited about as well is this style of, like, I just need to go out and spend many hours doing something that, you know, where I have to do a bunch of web searches and collate a bunch of information is not just a work thing, but it’s also useful for shopping and travel as well.

Isa Fulford: So we’re excited for the Plus launch so that more people will be able to try Deep Research and maybe we’ll see some new use cases as well.

Lauren Reeder: Nice. It’s definitely one of the products I’ve used the most over the last couple weeks. It’s been amazing.

Isa Fulford: I’m so happy to hear that.

Josh Tobin: Using it for work?

Lauren Reeder: For work, definitely. Also for fun.

Sonya Huang: What are you using it for?

Lauren Reeder: Oh, for me? Oh my goodness. So I was thinking about buying a new car, and I was trying to figure out when the next model was going to be released for the car. And there’s all these speculative blog posts, like, there’s patterns from the manufacturer, and so I asked Deep Research can you break down all the gossip about this car and then all of the facts about what they’ve done—what this automaker’s done before. And it put together an amazing report that told me maybe wait a couple months, but this year, like, in the next few months it should come out.

Josh Tobin: Yeah. Like, one of the things that’s really cool about it is it’s not just for going broad and gathering all of the information about a source, but it’s also really good at finding, like, very obscure, like, weird facts on the internet. Like, if you have something very specific you want to know that you might not just turn up in the first page of search results, it’s good at that kind of thing too. So that’s cool.

Surprising use cases

Lauren Reeder: What are some of the surprising use cases that you’ve seen?

Josh Tobin: Ooh.

Isa Fulford: I think the thing I’ve been most surprised by is how many people are using it for coding.

Josh Tobin: Yeah.

Isa Fulford: Which wasn’t really a use case I’d considered, but I’ve seen a lot of people on Twitter and in various places where we get feedback using it for coding and code search, and also for finding the latest documentation on a certain package or something and helping them write a script or something. 

Josh Tobin: Yeah, I’m kind of embarrassed that we didn’t think of that as a use case.

Isa Fulford: [laughs] Yeah.

Josh Tobin: It’s like for ChatGPT users it seems so obvious, but, you know, it’s impressive how well it works.

Sonya Huang: How do you think the balance of business versus individual use case will evolve over time? Like, you mentioned the Plus launch that’s happening. In a year’s time or two years’ time, would you guess this is mostly a business tool or mostly a consumer tool?

Isa Fulford: I would say hopefully both. I think it’s a pretty general capability, and I think it’s something that we do both in work and in personal life. So hopefully both.

Josh Tobin: Yeah, I’m excited about both. I think the magic of it is, like, it just saves people a lot of time. If there’s something that might have taken you hours—or in some cases we’ve heard, like, days, people can just put it in here and get, you know, 90 percent of what they would have come up with on their own. And so yeah, I tend to think there’s—like, there’s more tasks like that in business than there are in personal. But I mean, I think for sure it’s going to be part of people’s lives in both.

Lauren Reeder: It’s really become the majority of my usage for ChatGPT. I just always pick Deep Research rather than normal.

Isa Fulford: Really?

Lauren Reeder: [laughs]

Josh Tobin: Yeah, exactly. You’re patient.

Lauren Reeder: Apparently.

Lauren Reeder: So what are you seeing in terms of consumer use cases and what are you excited about?

Isa Fulford: I think a lot of shopping, travel recommendations. I personally used the model a lot. I’ve been using it for months to do these kinds of things. We were in Japan for the launch of Deep Research, so it was very helpful in finding restaurants with very specific requirements and finding things that I wouldn’t have, like, necessarily found.

Josh Tobin: Yeah. And I found it, like, when you have something, it’s like the kind of thing where, you know, if you’re shopping maybe for something expensive or you’re planning a trip that is special, or you want to spend a lot of—you want to spend a lot of time thinking about. It’s like for me, you know, I might go and spend hours and hours, like, trying to read everything on the internet about this product that I’m interested in buying, like scouring all of the reviews and the forums and stuff like that. And Deep Research can put together kind of like something like that very quickly. And so it’s really useful for that kind of thing.

Isa Fulford: The model is also very good at instruction following. So if you have a query with many different parts or many different questions, so if you want the information about the product, but you also want comparisons to all other products, and you also want information about reviews from Reddit or something like that, you can give loads of different requirements and it will do all of them for you.

Josh Tobin: Yeah. Another tip is, like, just ask it to format it in a table. It will usually do that anyway, but it’s really helpful to have a table with a bunch of citations and things like that for all the categories of things that you want to research.

Isa Fulford: Yeah. There are also some features that hopefully will get into the product at some point, but the model is able to—the underlying model is able to embed images so it can find images of the products. And it’s also—this is not a consumer use case, but it’s able to create graphs as well and then embed those in its response. So hopefully that will come to ChatGPT soon as well. 

Sonya Huang: Nerdy consumer use case. [laughs]

Josh Tobin: Yeah, speaking of nerdy consumer use cases, also, like, personalized education is a really interesting use case. Like, if there’s a topic that you’ve been meaning to learn about, if you need to brush up on your biology or you want to learn about some world event, it’s really good at pulling together all the information about what you feel like you don’t understand, whatever aspects of it you want it to go do research on, and it’ll put together a nice report for you.

Isa Fulford: One of my friends is considering starting a CPG company, and he’s been using it so much to find similar products to see if specific names are already—the domains are already taken, market sizing, like, all of these different things. So that’s been fun to—he’ll share the reports with me and I’ll read them. So it’s been pretty fun to see.

Josh Tobin: Another fun use case is it’s really good at finding, like, a single obscure fact on the internet. Like, if there’s like a—you know, like an obscure TV show or something that you want to, you know, to like find, like, one particular episode of or something like that, it’ll go and—it’ll go deep and find the, like, one reference to it on the web.

Isa Fulford: Oh, yeah. My brother’s friend’s dad had this very specific fact. It was about some Austrian general who was in power during a certain—a death of someone during a battle. Like, a very niche question. And apparently ChatGPT had previously answered it wrong and he was very sure that it was wrong. So he went to the public library and found a record and found that it was wrong. And so then Deep Research was able to get it right, so we sent it to him and he was excited. [laughs]

Sonya Huang: What is the rough mental model for, you know, what deep research is excellent at today? And, you know, where should people be using the o series of models? Where should—where should they be using Deep Research?

Josh Tobin: What Deep Research really excels at is if you have a sort of detailed description of what you want, and in order to get the best possible answer, it requires reading a lot of the Internet. If you have kind of like, more of a vague question, it’ll help you kind of clarify what you want. But it’s really at its best when there’s, like, a specific set of information that you’re looking for.

Isa Fulford: And I think it’s very good at synthesizing information it encounters, it’s very good at finding specific, hard-to-find information, but it’s maybe less—and it can make kind of some new insights I guess from what it encounters, but I don’t think—it’s not making new scientific discoveries yet. And then I think using the o series model, for me, if I’m asking for something to do with coding, usually it doesn’t require knowledge outside of what the model already knows from pre-training. So I would usually use o1 Pro or o1 for coding or o3-mini high.

End-to-end training

Lauren Reeder: And so Deep Research is a great example of where some of the new product directions for OpenAI are going. I’m curious to the extent you can share, how does it work?

Isa Fulford: The model that powers Deep Research is a fine-tuned version of o3, which is our most advanced reasoning model. And we specifically trained it on hard browsing tasks that we collected as well as other reasoning tasks. And so it also has access to a browsing tool and a Python tool. So through training end-to-end on those tasks, it learned, like, strategies to solve them, and the resulting model’s good at online search and analysis.

Josh Tobin: And, like, intuitively the way you can think about it is you make this sort of—this request, ideally a detailed request about what you want. The model thinks hard about that, it searches for information, it pulls that information and it reads it, it understands how it relates to that request and then decides what to search for next in order to get kind of closer to the final answer that you want. And it’s trained to do a good job of pulling together all that information into a nice tidy report with citations that point back to the original information that it found.

Isa Fulford: Yeah, I think what’s new about Deep Research as an agentic capability is that because we have the ability to train end-to-end, there are a lot of things that you have to do in the process of doing research that you couldn’t really predict beforehand. So I don’t think it’s possible to write some kind of language-model program or script that would be as flexible as what the model is able to learn through training, where it’s actually reacting to live web information and based on something it sees, it has to make a—change its strategy and things like that. So we actually see it doing pretty creative searches. You can read the chain of thought summary, and I’m sure you can see sometimes it’s very smart about how it comes up with the next thing to look for or get around.
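To make the shape of the loop Josh describes concrete, here is a minimal toy sketch in Python. It is not OpenAI’s implementation, and model_decide_next_query, search_and_read and model_write_report are hypothetical stand-ins; the point is only that, as described above, the decision about what to search next is made by the trained model rather than by hand-written control logic.

```python
# Toy sketch of the search -> read -> decide loop described above.
# In Deep Research this policy is learned end-to-end by a fine-tuned o3 model;
# here the "model" calls are hypothetical stubs so the control flow is visible.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class ResearchState:
    request: str
    notes: list[str] = field(default_factory=list)    # findings gathered so far
    sources: list[str] = field(default_factory=list)  # URLs kept for citations

def model_decide_next_query(state: ResearchState) -> str | None:
    # Hypothetical stand-in: the trained model reasons over the request and the
    # notes so far, then returns the next search query, or None when it is done.
    return None if state.notes else f"background on: {state.request}"

def search_and_read(query: str) -> tuple[str, str]:
    # Hypothetical browsing tool: returns (page_text, source_url).
    return f"[placeholder text found for {query!r}]", "https://example.com/source"

def model_write_report(state: ResearchState) -> str:
    # Hypothetical stand-in for synthesizing the notes into a cited report.
    body = "\n".join(state.notes) or "(no findings)"
    cites = "\n".join(f"[{i + 1}] {url}" for i, url in enumerate(state.sources))
    return f"{body}\n\nSources:\n{cites}"

def deep_research_loop(request: str, max_steps: int = 20) -> str:
    state = ResearchState(request=request)
    for _ in range(max_steps):
        query = model_decide_next_query(state)  # the model picks the next action
        if query is None:                       # the model decides it has enough
            break
        text, url = search_and_read(query)      # react to live web information
        state.notes.append(text)
        state.sources.append(url)
    return model_write_report(state)            # tidy report with citations

print(deep_research_loop("What is the expected release window for the next model year?"))
```

In the real system there is no hand-written loop like this to maintain: the searching, reading and synthesizing behavior comes from end-to-end training on hard browsing tasks rather than from scripted control flow.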

Sonya Huang: So John Collison had a tweet that went somewhat viral. You know, how much of the magic of Deep Research is real time access to web content, and how much of the magic is in kind of chain of thought? Can you maybe shed some light on that?

Isa Fulford: I think it’s definitely a combination. I think you can see that because there are other search products that don’t necessarily—that weren’t trained end-to-end, so they won’t be as flexible in responding to the information they encounter, and won’t be as creative about how to solve specific problems, because they weren’t specifically trained for that purpose. So it’s definitely a combination. I mean it’s a fine-tuned version of o3. o3 is a very smart and powerful model. A lot of the analysis capability is also from the underlying o3 model training. So I think it’s definitely a combination.

Josh Tobin: Before OpenAI, I was working at a startup, and we were dabbling in building agents kind of the way that I see most people describe building agents on the internet, which is essentially you construct this graph of operations, and some of the nodes in that graph are language models. And so the language model can decide what to do next, but the overarching logic of the sequence of steps that happen is defined by a human. And what we found is that it’s a powerful way of building things to get quickly to a prototype, but it falls down pretty quickly in the real world because it’s very hard to anticipate all the scenarios that the model might face, and think about all the different branches of the path that you might want to take.

In addition to that, the models often are not the best decision makers at nodes in that graph because they weren’t trained to make those decisions. They were trained to do things that look similar to that. And so I think the thing that’s really powerful about this model is that it’s trained directly end-to-end to solve the kinds of tasks that users are using it to solve.
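For contrast with the end-to-end approach, here is a minimal sketch of the hand-authored "graph of operations" pattern Josh describes, with a hypothetical llm helper standing in for the language-model calls at the nodes; it is illustrative only, not anyone’s production code.

```python
# Toy sketch of a hand-authored "graph of operations" agent: the branching
# logic is fixed in advance by a human, and the (hypothetical) llm() helper
# only fills in the individual nodes.

def llm(prompt: str) -> str:
    # Hypothetical stand-in for a language-model call.
    return f"[model output for: {prompt[:48]}...]"

def answer(query: str) -> str:
    # Node 1: the author has decided there are exactly two kinds of queries.
    kind = llm(f"Classify this query as 'factual' or 'comparison': {query}")

    if "comparison" in kind:
        # Branch 2a: a fixed two-step path the author anticipated.
        items = llm(f"List the items being compared in: {query}")
        return llm(f"Compare {items} and write a short report.")

    # Branch 2b: everything else falls through to a single lookup-and-answer step.
    facts = llm(f"State the key facts needed to answer: {query}")
    return llm(f"Answer {query!r} using only these facts: {facts}")

print(answer("How do the two leading research agents compare on citation quality?"))
```

Any situation the author did not anticipate, such as a contradictory source or a dead end that calls for a new search strategy, has no path through a graph like this, which is the brittleness that training the model end-to-end on the task avoids.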

Lauren Reeder: So you don’t have to set up a graph or make those node-like decisions on the architecture on the back end?

Isa Fulford: It’s all driven by the model itself.

Josh Tobin: Yeah.

Sonya Huang: Can you say more about this? Because it seems like that’s one of the very opinionated decisions that you’ve made, and clearly it’s worked. There’s so many companies that are building on your API, kind of prompting to solve specific tasks for specific users. Do you think a lot of those applications would be better served by kind of having trained models end-to-end for their specific workflows?

Isa Fulford: I think if you have a very specific workflow that is quite predictable, it makes a lot of sense to do something like Josh described. But if you have something that has a lot of edge cases or it needs to be quite flexible, then I think something similar to Deep Research is probably a better approach.

Josh Tobin: Yeah, I think the guidance I give people is the one thing that you don’t want to bake into the model is, like, kind of hard and fast rules. If you have a database that you don’t want the model to touch or something like that, it’s better to encode that in human-written logic. But I think it’s kind of like a lesson that I’ve seen people learn over and over again in this field is like, you know, we think that we can do things that are smarter than what the models do by writing it ourselves. But in reality, like, usually the model—like, as the field progresses, the models come up with better solutions to things than humans do.

And also, like, you know, the probably number one lesson of machine learning is you get what you optimize for. And so if you’re able to set up the system such that you can optimize directly for the outcome that you’re looking for, the results are going to be much, much better than if you try to sort of glue together models that are not optimized end-to-end for the task that you’re trying to have them do. So my long term guidance is that I think reinforcement learning, tuning on top of models is probably going to be a critical part of how the most powerful agents get built.

Sonya Huang: What were the biggest technical challenges along the way to making this work?

Josh Tobin: Well, I mean, maybe I can say as, like, an observer rather than someone who was involved in this from the beginning, but it seems like kind of one of the things that Isa and the rest of the team worked really, really hard on and was kind of like one of the hidden keys to success was, like, making really high quality data sets. It’s, you know, another one of those, like, age-old lessons in machine learning that people keep relearning. But the quality of the data that you put into the model is probably the biggest determining factor in the quality of the model that you get on the other side.

Isa Fulford: And then having someone like Edward, Edward Sun, who’s the other person who works on the project, who will just optimize any data set. So that’s a secret to success.

Lauren Reeder: Find your Edward.

Josh Tobin: Great machine learning model training.

Lauren Reeder: How do you make sure that it’s right?

Isa Fulford: Yeah, so that’s obviously a core part of this model and product is that we want it to be—users to be able to trust the outputs. So part of that is we have citations, and so users are able to see where the model is citing its information from. And we—during training, that’s something that we actually, like, try and make sure is correct, but it’s still possible for the model to make mistakes or hallucinate or trust a source that maybe isn’t the most trustworthy source of information. So that’s definitely an active area where we want to continue improving the model.

Deep Research and Operator

Sonya Huang: How should we think about this together with, you know, o3 and Operator and other different releases? Like, does this use Operator? Do these all build on top of each other, or are they all kind of a series of different applications of o3?

Josh Tobin: Today these are pretty disconnected, but you can imagine kind of where we’re going with this, which is like the ultimate agent that people will have access to at some point in the future should be able to do not just web search or using a computer or any of the other types of actions that you’d want a human assistant to do, but should be able to fuse all of these things in a more natural way.

Sonya Huang: Any other design decisions that you’ve taken that are maybe not obvious at first glance?

Isa Fulford: I think one of them is the clarification flow. So if you’ve used Deep Research, the model will ask you questions before starting its research, and usually ChatGPT, maybe it will ask you a question at the end of its response, but usually doesn’t have that kind of behavior up front. And that was intentional because you will get the best response from the Deep Research model if the prompt is very well specified and detailed. And I think that it’s not the natural user behavior to give all of the information in the first prompt, so we wanted to make sure that if you’re going to wait five minutes, thirty minutes, that your response is as detailed and satisfactory as possible. So we added this additional step to make sure that the user provides all the detail that we would need.

And I’ve actually seen a bunch of people on Twitter saying that they have this flow where they will talk to o1 or o1 Pro to help make their prompt more detailed, and then once they’re happy with the prompt, then they’ll send it to Deep Research. Which is interesting. So people are finding their own workflows as to how to use this.

 

Lauren Reeder: So there’s been three different Deep Research products launched in the last few months. Tell us a little about what makes you guys special and how we should think about it.

Sonya Huang: And they’re all called Deep Research, right?

Josh Tobin: They’re all called Deep Research. Yeah, not a lot of naming creativity in this field. I think people should try all of them for themselves and get a feel. I think the difference in quality, I think they all have pros and cons, but I think the difference will be clear. But what that comes down to is just the way that this model was built and the effort that went into constructing the data sets, and then the engine that we have with the o series models, which allows us to just optimize models to make things that are really smart and really high quality.

Sonya Huang: We had the o1 team on the podcast last year and we were joking that OpenAI is not that good at naming things. I will say this is your best-named product. [laughs]

Josh Tobin: Deep Research is? At least it describes what it does, I guess.

Where to go from here?

Lauren Reeder: And so I’m curious to hear a little about where you want to go from here. You have Deep Research today. What do you think it looks like a year from now? And what maybe are complementary things you want to build along the way?

Isa Fulford: We’re excited to expand the data sources that the model has access to. We’ve trained a model that’s generally very good at browsing public information, but it should also be able to search private data as well. And then I think just pushing the capabilities further. So it could be better at browsing, it could be better at analysis. Yeah, I think short term those are things we want to improve.

Josh Tobin: Yeah. And then thinking about how this fits into our agent roadmap more broadly. Like, I think that the recipe here is something that’s going to scale to a pretty wide range of use cases, things that are going to surprise people at how well they work. But this idea of you take a state-of-the-art reasoning model, you give it access to the same tools that humans can use to do their jobs or to go about their daily lives, and then you optimize directly for the kinds of outcomes that you want the agent to be able to do. That recipe, there’s really nothing stopping that recipe from scaling to more and more complex tasks, so I feel like yeah, AGI is an operational problem now. And I think yeah, a lot of things to come in that general formula.

Lauren Reeder: So Sam had a pretty striking quote that Deep Research will take over a single-digit percentage of all economically valuable tasks in the world. How should we think about that?

Josh Tobin: I think of it as like, it’s—Deep Research is not capable of doing all of what you do, but it is capable of saving you, like, hours, or sometimes in some cases, days at a time. And so I think, like, what we’re hopefully relatively close to is Deep Research and the agents that we build next and the agents that we build on top of it, giving you one, five, ten, twenty-five percent of your time back, depending on the type of work that you do.

Sonya Huang: I mean, I think you’ve really automated 80 percent of what I do, so …

Lauren Reeder: [laughs] Definitely on the higher end for me.

Josh Tobin: We just need to start writing checks, I guess. Yeah.

Sonya Huang: Are there entire job categories that you think are kind of more—at risk is the wrong word, but more in the strike zone for what Deep Research is exceptional at? So for example, I’m thinking consulting, but, like, are there specific categories that you think are more in the strike zone?

Josh Tobin: Yeah, I used to be a consultant. I don’t think any jobs are at risk. Like, I don’t really think of this as like a labor replacement kind of thing at all. But for these types of knowledge work jobs where—like, where you are spending a lot of your time kind of looking through information and making conclusions, I think it’s gonna give people superpowers.

Isa Fulford: Yeah, I’m very excited about a lot of the medical use cases. Just the ability to find all of the literature or all of the recent cases for a certain condition. I think I’ve already seen a lot of doctors posting about this, or they’ve reached out to us and said, “Oh, we used it for this thing. We used it to help find a clinical trial for this patient,” or something like that. So just people who are already so busy, just saving some time, or it’s maybe something that they wouldn’t have had time to do, and now they are able to have that information for them.

Josh Tobin: Yeah. And I think the impact of that is maybe a little bit more profound than it sounds on the surface, right? It’s not just like—it’s not just like getting five percent of your time back, but it’s the type of thing that might have taken you four hours or eight hours to do, now you can do for a ChatGPT subscription and five minutes. And so, like, what types of things would you do if you had infinite time that now maybe you can do, like, many, many copies of?

So, like, you know, should you do research on every single possible startup that you could invest in instead of just the ones that you have time to meet with? Things like that.

Sonya Huang: Or on the consumer side, one thing that I’m thinking of is, you know, the working mom that’s too busy to plan a birthday party for her toddler. Like, now it’s doable. So I agree with you. It’s way more important than five percent of your time.

Josh Tobin: Yeah.

Lauren Reeder: It’s all the things you couldn’t do before.

Isa Fulford: Exactly.

Sonya Huang: What does this change about education and the way we should learn? And what will you be teaching your kids now that we’re in a world of agents and Deep Research?

Josh Tobin: Education’s been, like, one of the top few things that people use it for. I think it’s—I mean, this is true for ChatGPT generally. It’s like learning things by talking to an AI system that is able to personalize the information that it gives you based on what you tell it, or maybe in the future what it knows about you, feels like a much more efficient way to learn and a much more engaging way to learn than reading textbooks.

Lightning round

Lauren Reeder: We have some lightning-round questions.

Josh Tobin: All right.

Sonya Huang: Okay. Your favorite Deep Research use case.

Josh Tobin: I’ll say yeah, like, personalized education. Just, like, learning about anything I want to learn about.

Isa Fulford: I’ve already mentioned this, but I think a lot of the personal stories that people have shared about finding information about a diagnosis that they’ve received or someone in their family received have been really great to see.

Sonya Huang: Okay. We saw a few application categories break out last year. So for example, coding being an obvious one. What application categories do you think will break out this year?

Josh Tobin: I mean, clearly agents.

Isa Fulford: I was gonna say that, too.

Sonya Huang: Okay, 2025 is the year of the agent.

Josh Tobin: I think so.

Lauren Reeder: And then what piece of content would you recommend people read to learn more about agents or where the state of AI is going? It could be an author, too.

Sonya Huang: Training Data. [laughs]

Josh Tobin: I think it’s so hard to keep up with the state of the art in AI. I think the general advice I have for people is, like, pick one or two subtopics that you’re really interested in and go, like, curate a list of people who you think are saying interesting things about it, and follow those one or two things that you’re interested in. Maybe actually, that’s a good Deep Research use case. Go use it to go deep on things that you want to learn more about.

Isa Fulford: This is a bit old now, but I think a few years ago I watched the—I think it’s called Foundations of RL or something like this from Pieter Abbeel. And it’s a few years old, but I think that it was a good introduction to reinforcement learning.

Josh Tobin: Yeah, I would definitely second any content by Pieter Abbeel. My grad school advisor.

Isa Fulford: Oh, yeah.

Sonya Huang: Okay. Reinforcement learning kind of went through a peak and then felt like it was in the doldrums a little bit, and now it’s peaking again. Is that the right read on what’s happening with RL?

Josh Tobin: It’s so back. Yeah.

Sonya Huang: It’s so back. Why? Why now?

Josh Tobin: Because everything else is working. Like, I think if you—maybe people that have been following the field for a while will remember the Yann LeCun cake analogy?

Sonya Huang: Say it.

Josh Tobin: So it’s like the—I think it’s like if you’re building a cake, then most of the cake is the cake, and then there’s a little bit of frosting and then there’s a few cherries on top. And the analogy was that, like, unsupervised learning is the cake, supervised learning is the frosting, and reinforcement learning is the cherries on top.

I think when we in the field were working on reinforcement learning back in 2015, 2016, it was kind of like, I think Yann LeCun’s analogy, which I think in retrospect is probably correct, is that we were, like, trying to add the cherries before we had the cake. But now we have language models that are pre-trained on massive amounts of data and are incredibly capable. We know how to do supervised fine tuning on those language models to make them good at instruction following and generally doing the things that people want them to do.

And so now that that works really well, it’s very ripe to tune those models for any kind of use case that you can define a reward function for.
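As a toy illustration of what "define a reward function" can mean in practice, here is a sketch in Python for a research-style task. The report_reward function, its inputs and the example values are all hypothetical, and real graded browsing tasks would be far richer, but the idea is the same: score each sampled output on checkable properties and optimize the policy against that scalar.

```python
# Toy reward function for a research-style task (illustrative only, not
# OpenAI's training code): score a generated report on checkable properties,
# then use that scalar as the reinforcement-learning training signal.

def report_reward(report: str, required_facts: list[str], allowed_sources: list[str]) -> float:
    """Return a score in [0, 1] for one sampled report."""
    covered = sum(fact.lower() in report.lower() for fact in required_facts)
    coverage = covered / max(len(required_facts), 1)        # did it find the facts?
    cited = any(src in report for src in allowed_sources)   # did it cite a known source?
    return 0.8 * coverage + 0.2 * float(cited)

# Hypothetical graded browsing task with a verifiable answer.
score = report_reward(
    report="... the 2019 figure was 42%, per https://example.com/report ...",
    required_facts=["42%"],
    allowed_sources=["https://example.com/report"],
)
print(score)  # 1.0 -- the scalar the policy gets optimized against
```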

Sonya Huang: Great. Okay, so from this lightning round, we got Deep Research’s favorite AI app. Agents will be the breakout category in 2025. And reinforcement learning is so back. I love it. Thank you guys so much for joining us. We loved this conversation. Congratulations on launching an incredible product, and we can’t wait to see what comes of it.

 

Isa Fulford: Thank you.

 

Lauren Reeder: Thank you.

Sonya Huang: Thank you.