Google DeepMind’s Logan Kilpatrick: Why the Model Eats the Harness
The entire startup ecosystem is racing to build agent harnesses. Logan Kilpatrick, who leads Google AI Studio and the Gemini API, argues that scramble has a roughly 12-month shelf life. Models will absorb the scaffolding and run it natively, so the edge moves elsewhere. Google’s own bet runs in parallel: a single agent harness, now called Antigravity, has become the connective tissue across search, the Gemini app, Cloud, and AI Studio. Logan unpacks Omni, the single model built to replace multiple separate systems. He makes the case that coding already feels like narrow superintelligence, and that “jagged” vertical superintelligence (in math, finance, and science) will arrive well before AGI. His throughline: AI is an accelerant for human ambition, not a substitute for it.
Transcript
Intro
Logan Kilpatrick: I, as a human developer, feel like I have more agency in the world. I feel like I can tackle more ambitious problems. I feel like I used to kick around ideas and they were, like, slightly out of reach. And I would just be like, “Ah, wouldn’t it be nice?” And now I have the opposite problem, which is I’m kicking around an idea and I’m like, “I could probably make this even more ambitious.” And sort of it does—it adds a different layer of burden actually, because I’m like, “Oh, I can’t just do the sort of MVP of this.” I actually need to go 10 steps further because the technology enables me and resetting my level of ambition, I think, is something that I’ve also spent a bunch of time thinking about.
Main conversation
Sonya Huang: I’m delighted to have Logan on the show. Logan runs Google AI Studio and the Gemini API. You spend a lot of your time thinking and building for the next generation of builders.
Logan Kilpatrick: Yes.
Sonya Huang: So I’m excited to talk to you about everything from agentic AI to AI coding, world models, and more today, and right off the heels of Google I/O. So what better timing?
Logan Kilpatrick: Yeah, I’m super excited. Thank you for having me.
Sonya Huang: Wonderful. Let’s start with agentic AI. So Sundar opened I/O by calling this the agentic Gemini era. What does agentic AI mean for Google?
Logan Kilpatrick: Yeah, it’s a good question. I think—and we sort of, if you followed closely, we did sort of mention some of these things back with Gemini 2.0, which I think was, like, a little bit early. And so I think this era, this Gemini 3.5 era feels like it’s actually becoming true now. And we’re in the era of agentic coding, or agentic products and everything agents as far as Gemini goes.
I think for us, this agentic layer—and I think we announced this actually at I/O—sort of being powered by the Antigravity Agent Harness, is this additional through line for Google that sort of connects all of our products that they’re sort of based on now. And so historically, like, prior to Gemini, there actually wasn’t a through line for the probably sub-hundred number of Google products that we have, the 50 Google products we have. There wasn’t a through line. We had Gemini, it became this through line. Everything is now sort of using Gemini in some way. That’s now becoming true for Antigravity as sort of all of the products rebase, to become sort of like agentic native products and, like, actually taking action on behalf of users and helping them get things done. You see this new through line emerging, which I think is actually really, really interesting.
Sonya Huang: And sorry, help me with Antigravity is the IDE, right? Or the non-IDE.
Logan Kilpatrick: Yeah, Antigravity is a lot of things, which I think is sort of an opportunity for us. You have sort of a core IDE, you have sort of the agent-first experience if you want it on the web, you have a CLI, you have an SDK. So I actually think—and I don’t know how much we’ve framed it this way, but, like, it really is an ecosystem of stuff that we built and it’s designed to sort of meet developers wherever they are. So you could use it through the Gemini API if you want to and you want a managed agent that you don’t have to do any of the sort of infrastructure work for. And then the most interesting bit is it’s not just the ecosystem of Antigravity stuff. Literally, it’s the same harness that is actually powering all the other Google products. So Antigravity will be powering a bunch of agent stuff in Search, in the Gemini app, across Cloud and AI Studio, which is really exciting.
Sonya Huang: I see. So it used to be the Gemini API. So, like, the language model was the through line in terms of how AI gets baked into every Google product.
Logan Kilpatrick: Yeah.
Sonya Huang: And now it’s not only the API, it’s the coding harness.
Logan Kilpatrick: Exactly.
Sonya Huang: That’s being used in each of these products and therefore it’s the coding agent itself that’s driving more agentic properties.
Logan Kilpatrick: Yeah.
Sonya Huang: Inside the products.
Logan Kilpatrick: Yeah.
Sonya Huang: Is that a fair description?
Logan Kilpatrick: Fair description. I think more generically, too, it’s just like, it is the agent harness. I think coding as sort of like a specialized use case of the agent harness, I think is obviously powerful, but coding has proved to be the general purpose agent harness in addition to also working really well for coding.
Sonya Huang: Are agent harness and coding harness synonymous or not?
Logan Kilpatrick: There’s definitely nuance. I think there’s, like, optimization that you can squeeze out of specializing. And actually you see this where technically the agent harness that gets used for the way that AI Studio uses it is, like, a little bit specialized for the vibe coding use case. And the way that the Gemini app is using the agent harness is a little bit specialized for the sort of consumer always-on 24/7 agent. So I think you have that base harness, that probably has, like, 80 percent of the same stuff, and then you specialize for coding or for whatever the use case is.
Sonya Huang: Interesting. How do you think about the cannibalization of the existing business, especially now that you are, you know, going much more aggressively into agentic properties? Because I could see, for example, if all you’re doing is search or summarization, there’s not as much of a cannibalization fear, whereas if you’re actually going through my emails, replying to them for me, am I even going through my email anymore? And so I could imagine that there’s actually just fewer human eyeball hours on your products as a result of having more agent capabilities. Is that fair? Or how do you think about the cannibalization?
Logan Kilpatrick: Yeah, it’s interesting. I think one sort of observation I have is that, like, at the beginning—and I think Sundar has done a great job of sort of talking through this, is at the beginning of the current AI era, everyone assumed that AI being able to answer questions for you was going to be, like, negative sum for search. And actually what’s ended up happening is it’s been an incredibly positive sum for search. Like, people are searching more, people are doing more.
Sonya Huang: And agents are searching, too.
Logan Kilpatrick: Yeah. And agents. Actually, again, there’s this whole market that spawned at the same time that agents are doing more, at the same time that humans are also searching more. And so I think it will be—obviously there’s a finite amount of human time in the world. But from like my early feelings of how a lot of this is playing out, it does feel like it’s very positive sum from an ecosystem value creation, like how the human behavior aspect of it turns out, I think is somewhat clear in the next one to two years, much less clear three to five years from now when the technology is improved and the products probably look a little bit different than the way that they do. But ultimately, like, that is the success of the product. I think we have a bunch of conversations with Demis all the time and it’s like, the point of building the technology is so that it can go and do stuff for you. Like, success for Google, probably doesn’t look like, you know, maximizing eyeball time in front of our products. It’s maximizing outcome for customers to do the thing that they want to do so that they can go and live their life and do what they want. And so I feel like you’ll probably see us go down the route of maximizing outcomes for customers and not maximizing eyeballs.
Sonya Huang: Yeah. I have this term stuck in my head: agent-led growth. It seems to me—so I’m using coding agents a lot in my personal time and I just let the agent make all the infrastructure choices for me. I’m like, “I don’t care which database, you tell me.” And the reason I ask is, it’s true in coding today. I would imagine it’s maybe going to be generally true for a lot of things, let’s say shopping down the line. How do you think that’s going to change how advertising works, how value capture works for the aggregators?
Logan Kilpatrick: It feels like it’s a very similar trend. This isn’t perfectly true, but a lot of these things are just proxies of each other. The way that SEO works, I think, is directly correlated with the way that—I forgot what the term is now for it—it’s like GEO, the generative engine optimization or whatever it’s called. And so it does feel like there’s a lot of correlation between the things. My guess is it looks like much less of a radical shift than maybe what we assume right now, just because these things compound on top of each other.
Sonya Huang: If you were to grade the scale of agenticness in terms of crawl, walk, run, where are we in terms of how agentic the Google suite of products is?
Logan Kilpatrick: Yeah, that’s a great question. It’s definitely crawl right now. And I think some of this is—all of the inherent product tension for Google is you have, what, 13 billion-plus-user products? And so I actually think we have some more labs-like experiences where you’re probably closer to running or walking. But I think most of the product experience today is definitely closer to crawling. And I think that’s just the stewardship responsibility we have sort of building a product that’s being used by lots of people. I don’t think the long tail of customers are ready to have AI running and just doing all the things. They probably want to be in the driver’s seat. They’re cautiously taking the first step. And I think Search is maybe the most quintessential example of this. I think they have a lot of responsibility to actually do that in a way that it brings people along and doesn’t just change everything of how they interact with the internet and the way they associate with products and stuff like that.
Sonya Huang: Which products do you think are closest to the walk?
Logan Kilpatrick: That’s a good question. I think Gemini app is definitely closest to walk. And so for Spark, I think having a 24/7 always-on agent, literally going and potentially doing a bunch of actions on your behalf, is definitely one of the frontier use cases. And I think you’ll see Antigravity is another one where you can have autonomous coding agents rebuilding operating systems and doing billions of tokens and spending thousands of dollars on your behalf. And those are again more—and actually they’re in GDM as well is another angle of this. So I think GDM is taking very much a frontier look at this, where I think the rest of Google’s products are more incrementally getting there, which again makes reasonable sense to me.
Sonya Huang: Yeah. Do you think that Google ends up with one, two, three product surfaces for using AI or thousands?
Logan Kilpatrick: It’s tough. I think a lot of this is actually baked in just how humans consume products. And my sense is that there’s something nice about having this compartmentalization and this specialization of products where it becomes—if you end up with a product that is doing everything for you, inherently there’s more work involved in using that version of the product I think would be the default state. I think maybe somebody will spin together the truly magic experience that doesn’t make that true. But I think the long tail of folks end up having to spend more mental energy and more time to actually get the general purpose product to do the thing that they actually want to do versus, like, there’s something nice about, I click my calendar app, it just shows me my calendar. I don’t need to worry and deal with anything else.
Sonya Huang: This is my hot take for why slide decks have existed for so long, of just, you know, the thing, the piece of information, you want it to be exactly in the same place. And I think we as humans are just actually very used to that, as opposed to the idea of a generative interface sounds so cool to me, but it’s like, do our brains really—isn’t that just more cognitive overhead for us?
Logan Kilpatrick: It definitely is in certain cases. And I think somebody needs to—again, there’s a lot of incredibly smart people in the world, and so maybe somebody will find the experience that makes it feel more natural. But to me right now, I’m maybe not—10,000 is the extreme version. I’m guessing it looks more like more products going after sort of different, or—and maybe the other answer is like, I don’t know what it looks like for Google. For the ecosystem, it looks like a lot more products, I think. And that’s really exciting. I think how Google will end up strategically deciding, do our customers want to deal with us having 10,000 products, or would it be better to only have three, will come down to a strategic decision for us.
Sonya Huang: That totally makes sense. When I talk to companies in the enterprise, they say everyone’s talking about agentic AI, but the only place they’ve seen agents really working is coding agents. Do you agree or disagree with that take?
Logan Kilpatrick: Yeah, I think it depends what your bar for working is, which I think is a lot of the nuance of this. I think if you’re truly trying to offload very complicated tasks for domains in which the models haven’t actually crossed the threshold of quality, then I think that’s definitely true. It’s not going to solve the problem, but this is something that I wish we could measure.
A good example is OpenRouter, for example, which is measuring the total token consumption that’s happening. And so you can sort of see these trends play out over time of how much more intelligence is in the world now versus a year ago. In parallel, the thing that I’m actually really interested to measure is, like, how long is the average agent run or the average task actually taking place? And I don’t think it’s something that they publish, but I feel like they probably have interesting data. And I’m sure there’s others. Because I do think you’re seeing these new model capability lands or new model drops, and it’s spiking up. And maybe the curve is still very low right now, but you’re seeing those early signs of it spiking upward to, like, long-running tasks. And all the model labs are talking about, “We’ve released this new model and it did three days of autonomous work” or whatever it is. That’s the extreme. But I think in practice you’re seeing that trickling up pretty quickly, which is really interesting.
Sonya Huang: Yeah.
Logan Kilpatrick: So even if the enterprises haven’t felt it outside of coding, they are going to this year, as sort of a bunch of those other use cases get much better as well.
Sonya Huang: From the DeepMind perspective, do you think long horizon agents is a KPI that matters? Is it the KPI that matters?
Logan Kilpatrick: It definitely matters. I think for DeepMind, we’re doing lots of things which we can talk more about later. There’s a huge portfolio of different bets that are taking place. Long-running agents obviously matter a lot. And I think also specifically coding agents and that matters a lot. Like, it clearly is an accelerant of every other part of your business if you have a great coding model. And so making sure we have that, I think, is super top of mind.
Sonya Huang: Got it. I’d love to shift gears a little bit and talk about coding.
Logan Kilpatrick: Yeah.
Sonya Huang: Okay, I’m going to ask a hard question. A lot of my developer friends were using Claude for a long time. OpenAI saw that, declared code red. Codex is now really good. I would say my friends are maybe split 50/50 now in using Claude and using Codex. I don’t hear a ton of them using Gemini, which has always kind of puzzled me. What’s going on with that?
Logan Kilpatrick: Yeah, it’s a great question. I think there’s one part of the story that I’ll add, which makes it even more interesting, which is in December, the narrative was that Google had won. And when we landed Gemini 3, I think it was such a profound improvement from a model capability perspective. I think a lot of the narrative was, Google has taken a huge leap forward and made that happen. And I think what was interesting to see sort of as an ecosystem participant is not how quickly that narrative shifted, but just the next wind of the narrative obviously was all the agentic coding stuff that happened over the holidays and then into January and beyond. And that was not that long ago. And so it’s a—it is a …
Sonya Huang: I feel like we’ve been in warp speed ever since.
Logan Kilpatrick: Yeah, for sure. But it’s a meta reminder of just how fast things can change. I think the observation is not unreasonable. I do think that what’s happening behind the scenes for us is trying to push the frontier as fast as possible on coding. And so I think Antigravity actually is an important part of that. I think one of the takeaways is that it’s actually really hard to make a great coding model for this developer use case of really long-running SWE work if you don’t actually have a product that does that. And so I think Google realized that. That’s why the Windsurf deal happened. That’s why those folks came over and then ultimately built Antigravity. And we’ve been using it internally, actually, and Sundar showed this at I/O, just the graph of growth of token consumption inside of Google.
So you need that engine to spin. And the meta comment again is, the engine is spinning. It takes time in order to actually make model progress. But I’m super confident. I think the group of folks who we have working on Code is—I describe it as the Avengers of AI internally.
Sonya Huang: [laughs]
Logan Kilpatrick: And so it really is some of the best people inside of Google trying to push the rock up the hill on this stuff and taking it super seriously and trying to push. And I think 3.5 Flash, notwithstanding some of the conversation about the price and stuff like that, is sort of a step towards actually starting to bring a lot of these capabilities and, like, the fruits of that labor paying off. It’s a Flash model that’s better than any Pro model we’ve ever released from a coding standpoint—and the Pro models were really good before. So there’s another thread of this also, which is everyone forgets that there’s pre-training windows. And I wonder, somebody should track this online, which would be interesting to see.
Sonya Huang: Meaning like the big run, like what clusters have been available and …
Logan Kilpatrick: Exactly. The big runs are an interesting thread of this. And so it might look from an external perspective that, like, oh, you’re super behind in some way, and actually you miss all the context of where the big runs are and where the large pre-training runs are. So I think that also obviously pre-training has historically been a massive strength for DeepMind. We have some of the best people in the world, and so excited to see the fruits of that labor and everything else that’s happened. Like, 3.5 Flash was all post-training gains, which is really cool. So a huge testament to the work that that team did to actually make the level of gains and surpass the previous Pro model literally just with post-training, which is awesome.
Sonya Huang: How religious are you all about dogfooding internally? For example, are DeepMind folks still allowed to use other models, or is it like, you guys are using the Gemini harness now and we have to make this really, really good?
Logan Kilpatrick: Yeah, I think it’s so healthy to be using other models just because it’s sometimes hard to actually grok what’s happening in the ecosystem if you’re not. So I use all the models, I use all the products. I think folks across the rest of DeepMind are doing the same thing. You definitely have to use the Gemini models, though. It’s just great from a feedback flywheel perspective. And it’s part of how they get better is DeepMind has, and Google more broadly has, like, 100,000+ incredible engineers who are using the models and giving feedback. And it should be a competitive advantage for Google, because we have that scale of engineering resources and the depth of the talent and can run AB tests and live experiments and all that stuff. So I think you have to use all the models, but I think for the majority of folks, it’s Gemini as the daily driver, which is great.
Sonya Huang: Do you believe in this narrative around a soft takeoff of once you have a good enough agentic coding model then it accelerates the pace of research progress, and it’s a self-reinforcing cycle?
Logan Kilpatrick: It seems obvious that that’s true, but maybe I’ve drank too much Kool-Aid that that’s the case.
Sonya Huang: Are you seeing the signs of it yet?
Logan Kilpatrick: Yeah, I mean, you definitely see some signs of this. I think the signs that are still early is doing this from a model perspective. And I think part of the context of that is, like, the resource allocation for some of these larger training runs is just significant. And so you definitely still have a human in the driver’s seat of making those decisions because you’re not going to accidentally take 10,000 TPUs to go kick off some job that actually doesn’t make that much sense. But from a product perspective, you for sure see it. Like, I think we’re seeing this on our team. We’ve built mobile apps using Antigravity, and we’ll launch them to the world faster than I think any team at Google has ever built a mobile app. Josh’s team did this with the Gemini macOS app, and sort of end-to-end delivered an app faster than any team had ever delivered a Mac app at Google. And it’s because of agentic coding. And so it’s great from a product perspective.
Sonya Huang: I think you’ve said in the past that if you could have a system that could build anything with code, humans can’t compete on the same level, and that’s narrow superintelligence. Do you think we’ve reached that point?
Logan Kilpatrick: It is interesting. I think this narrow superintelligence example is interesting to see how—obviously it kind of feels that way for coding right now, where coding is just so good that it does kind of feel like narrow superintelligence. I don’t know, it depends how you actually end up deciding the details of quantifying this. But I think the important thing is, to your point earlier, it works incredibly well for code. And so it would be great if it did a bunch of other things, but it’s actually just so impactful that it can be great at code. And so I spend a lot of time just letting that fact sort of just wash over me because I think it’s—obviously building AGI is super important and very interesting, but building AGI, if it sort of takes away from the story of the current present capability of the technology, I think is actually a bad trade-off. And so I’m trying to always hold these two things in my head equally at the same time, which is we need to build general purpose technology, but obviously it’s so impactful to have this thing.
And it feels like it hasn’t taken away sort of—it’s been one of the best positive outcomes is that I feel like it hasn’t taken away from human developers. It really does feel like an accelerant of what human develop—like, I as a human developer feel like I have more agency in the world. I feel like I can tackle—this is my personal experience—I feel like I can tackle more ambitious problems. I feel like I used to kick around ideas and they were slightly out of reach, and I would just be like, “Ah, wouldn’t it be nice?” And now I have the opposite problem, which is I’m kicking around an idea and I’m like, “I could probably make this even more ambitious.” And it adds a different layer of responsibility, or some different layer of burden actually, because I’m like, “Oh, I can’t just do the sort of MVP of this.” I actually need to go 10 steps further because the technology enables me, and resetting my level of ambition I think is something that I’ve also spent a bunch of time thinking about. But I think that will happen in other vertical superintelligence domains, which will be interesting. And it feels like we’re going to get a bunch of those before we’ve solved—it’s almost like jagged superintelligence, I think, is what we’ll end up with.
Sonya Huang: What verticals do you think we’ll get superintelligence at next?
Logan Kilpatrick: That’s a great question. I do spend a lot of my time—too much time, probably—thinking about coding these days. So I’ll think for a second of the other domains. I think part of this is, like, things that have better verifiability obviously are the ones where you’ll see the gains happen more quickly. So things with math and finance—actually, science could be a really interesting one. It would be fascinating to see some of these domains where there’s some level of verifiability actually really start to take off. Which would be cool. And I also think an important thing in this broader narrative about just what impact AI is having on the world. You almost want that to be the case. In the sequencing of things that work, you want a lot of these really, really good impactful positive things for the world to happen as early on as humanly possible so that folks understand what the potential positive impact of the technology is. So I think science could be a really interesting one. Yeah, obviously there’s all the stuff happening right now with math proofs and stuff like that, which I’m not a mathematician, so it’s somewhat over my head, but …
Sonya Huang: I saw a great tweet the other day. “Why did Erdős have so many problems?”
Logan Kilpatrick: Exactly. That’s a good one. I like that. That’s a good t-shirt.
Sonya Huang: So funny. Okay. But speaking of Twitter, I went through your Twitter before this, so I’m going to read back another tweet at you. The good thing about Twitter is there’s a public record of all your predictions.
Logan Kilpatrick: I need to turn on that auto tweet deleting feature or whatever it is.
Sonya Huang: Last October you tweeted, “Everyone is going to be able to vibe code video games by the end of 2025.”
Logan Kilpatrick: Yeah.
Sonya Huang: Did that end up being true?
Logan Kilpatrick: It feels close. And I mean, obviously not AAA games, like you’re not building the next Call of Duty or GTA yet. But I think it feels closer than it’s ever been. And actually, a lot of the interesting bit about video games is you actually need to end up building a lot of this other stuff. Like models—and we were talking off camera before this—Three.js is a great example of this. Three.js makes a lot of things possible that weren’t before. But there’s still all these rough edges that, like, just a coding agent doesn’t solve. And so you need sprite generation, and the models aren’t very good at doing that natively, and so you need some orchestration layer and tooling in order to make that happen.
There’s a bunch of other things like that that are core to the gaming video game experience, that need to have a high degree of reliability, that I think it feels like it’s within reach, but actually requires a lot of product scaffolding work in order to create experiences that are reusable and replayable and have the level of depth and requires a little bit of taste in there.
Sonya Huang: Do you see people making a lot of video games inside AI Studio and the other developer surfaces that you have?
Logan Kilpatrick: Yeah. And so this was actually based on us looking at the early data, and there was something like, in AI Studio at the time, it was 20 percent of all apps that folks were making were actually games. Like, people trying to build games.
Sonya Huang: Is that the most popular category?
Logan Kilpatrick: It’s not the most popular category anymore, just because I think the ecosystem has shifted and the user base has shifted. But it is a lot of games.
Sonya Huang: What is the most popular category?
Logan Kilpatrick: I think it’s like 20 percent finance-related stuff. 20 percent …
Sonya Huang: Wow, people like counting their money that much?
Logan Kilpatrick: People—I think it’s something around crypto, actually, I think is what people are doing a lot of stuff with finance. A lot of personal productivity things and a lot of gen media stuff actually, because obviously the Google suite of gen media stuff has done a great job. But I also think GDM has sort of—obviously Demis cares a ton about games, and sort of started his career in doing AI stuff because of games. And so I think we’ll have some interesting swings at this. And our team actually in Kaggle, which is a bunch of the AI benchmarking stuff we do in GDM, sort of works with GDM to build this Game Arena, which is sort of our way of testing progress towards AGI, using games as a proxy, which again is very deeply rooted in GDM’s history.
Sonya Huang: How close do you think we are to, you know, rando off the street with a good idea can vibe code a really fun playable game?
Logan Kilpatrick: I want to say this year. I think the model capability makes it possible. I think this is where I’ve gotten excited on the product side. And again, we were also talking off camera about the startups in this ecosystem, because it feels like it’s possible. It doesn’t feel like there’s a gap in model quality. It feels like there’s a gap in someone who knows what it takes to build a great game actually putting the scaffolding together in the right way to make that possible. And I think there are folks who are doing this right now. And so some of it is a discoverability and awareness thing that people just don’t even know that they can do that. And some of it is maybe certain categories of model capabilities are just slightly off and we’re weeks or months away from that chasm being crossed and then it just working for most people.
Sonya Huang: And so this is a good segue into—I want to ask you about world models next, but do you think vibe-coded video games are more likely going to be game engine plus coding agents based, or do you think it’s more likely to be world model based?
Logan Kilpatrick: Yeah, I think what will end up happening is the definition of world models will blur, which we should talk about with Omni. And it will still—I think the coding agent will look like some sort of world model-type system. But you actually do need to make world models useful for real things. You need, like, scaffolding. And so I think there’s actually a bunch of interesting startups doing work figuring out what is the scaffolding for world models so that you can take them from these very open-ended—inherent design of world models, very open-ended spaces, and do it in a tangible way so that it’s grounded in a use case that you could use in a recurring way. And somebody maybe will figure out the scaffolding for world models to make games possible, but the inherent nature of world models right now, I think, makes it so that it’s actually not well suited for games in the current form. But the progress has been crazy. So who knows, maybe in, like, two years the versions will be able to, but at least in the short term, it’s coding agent plus some sort of game engine, I think, is where you’ll see way more alpha from a games perspective.
Sonya Huang: That makes sense. Okay, so you said the definitions of world models are blurry. Can we unpack that?
Logan Kilpatrick: Yeah. I mean, I think Omni is an example of this. We launched this at I/O. You can sort of take in any input, create any output. And I think Demis sort of framed it to the world—rightfully so—as a world model because of just the level of understanding that it has of the world. I think that technically looks different than—and I’m not an architecture expert on the way that we’ve done world models before, but it is different from an architectural standpoint than what’s happened in the past, which I think is positive because it’s getting closer to some of the ways in which it might actually be more scalable. And historically, it’s been super not scalable. It’s very, very expensive to run traditional, online world models.
Sonya Huang: Yeah. Like Genie being …
Logan Kilpatrick: Like Genie. Exactly.
Sonya Huang: Okay. So if you think of traditional world models as being like an action-conditioned video model almost, then right now when we say “world model,” what we actually mean is a model that has some understanding of the world as opposed to being strictly technically an action-conditioned video model.
Logan Kilpatrick: Yeah, and so the interesting thing, though, is it has an understanding of the world, but then it also has that really great—and that’s where the line is blurry to me, where it’s like it can do a lot of those same use cases. It’s not real time right now, but it can do a lot of those same use cases that you would describe or visually could create with that same exact world model, which I think is what’s most interesting to me. So I do feel like this world model-video model thing is gonna change and play out in a different way than was obvious before.
Sonya Huang: And how does it work under the hood? Like, whatever you’re able to share, is it Gemini plus video models? Is it something different entirely?
Logan Kilpatrick: [laughs] It is a single model, which I think is the important part. This was actually part of the original desire, was, like, you were training eight different models to do all of those things historically. It’s like you have a text model with the baseline Gemini model, you have audio, you have music models with Lyria, you have Nano Banana, you have Veo video models, we have a whole suite of audio models. And it would be great for us and our customers if you just had a single model to do all those things. So it is a new setup that sort of makes that possible. It’s not routing to a bunch of different models, which you could have imagined we could have done something like that actually before and done a Gemini Omni model, but this is a true Omni model. And it’s starting with the use case that works the best right now, which is why the one that’s available is this video editing capability. Technically it’s functional with the other things, it’s just the quality isn’t perfect. And it’s not state of the art, so we haven’t rolled that out yet. It’s also just the first turn of the crank on Omni. It’s the Omni Flash model, the first iteration. And so we’ll have much, much more capable, powerful versions, which will be exciting to see.
Sonya Huang: So we could edit this set so it looks like we’re on a …
[CROSSTALK]
Logan Kilpatrick: Yeah, we should. Again, we were talking off camera, we should do that for the intro, because I think it just makes all this stuff more capable. And I’ve seen these examples of such subtle nuance that make me appreciate that it’s like the world understanding playing out. I was giving a talk, and was on stage with my friend Tulsi, who leads the model team, who I don’t know if you’ve ever had on before, but she’s amazing. I love Tulsi. And I had mentioned to someone in the crowd to edit the video, and they literally took the picture, edited it with Omni in real time. And this dog came on the stage and in the edited version, the other guests sort of look down and see the dog. They chuckle a little bit. This is while I’m opining about whatever AI nonsense.
Sonya Huang: They’re laughing at your jokes.
Logan Kilpatrick: Yeah, it was not my jokes.
Sonya Huang: [laughs]
Logan Kilpatrick: They laugh at the dog coming up. It jumps onto my lap. I sort of acknowledge the dog. I keep talking, I’m petting it or whatever. And there’s so much subtlety in getting that right. And the model crushed it. And it’s very interesting, and still trying to absorb and digest what that means for the way we make content and all these other things.
Sonya Huang: That’s so interesting. I’m the biggest bull on generative media and what it means. And one of the things we’ve thought about for our podcast is the visuals matter as much as the content.
Logan Kilpatrick: For sure.
Sonya Huang: That’s how you catch people’s attention in the first place, right? And so okay, I’m excited to play with Omni.
Logan Kilpatrick: I’m excited, too. And I think you probably feel this way as somebody who makes content, but I’ve historically been very—for myself personally, like, I don’t use AI to make any content that I produce. It’s all my words, it’s always my voice, it’s always my image and picture showing up. I feel like there’s just so much alpha and authenticity. And so I would much rather it be me than some AI version of me. What I like so much about Omni is that it’s not changing me. It is changing a bunch of these other bits, which are not me. Like, I didn’t choose any of the set around us or the coffee table. So our words can stay the same and you can change these bits that are not personal and do something more interesting with them, which I think is really, really cool and feels—it feels like the version of what I want gen media to be, which is not a bunch of AI avatars.
Sonya Huang: No Fruit Island videos?
Logan Kilpatrick: Exactly.
Sonya Huang: [laughs]
Logan Kilpatrick: Truly. Like, it really is the original content. It’s the person. The personhood is there, it’s just different and amplified.
Sonya Huang: Super interesting. Okay, I’m excited to play with it.
Logan Kilpatrick: Yeah, we should send some prompts right after this and try some things.
Sonya Huang: I don’t mind the fruit videos, though. I’m happy for a world of both. On the coding side, you launched the ability in AI Studio for people to vibe code Android apps.
Logan Kilpatrick: Yeah, yeah.
Sonya Huang: I’d love to hear how that’s going so far and where you plan to take that.
Logan Kilpatrick: Yeah, it’s super exciting. I think one of the strategic things for AI Studio, and this is based on a lot of the feedback from the ecosystem, from developers, and from others, is that there are so many Google products. There are so many different ways in which you touch Google through all these different journeys of building a startup or bringing an idea to life. And so we have this first-class principle of how do we bring things into AI Studio that make it so that you are exposed to other parts of the Google ecosystem without having to go through nine different UIs across Google? And so Android apps are a great example not only of that, but also of enabling people who wouldn’t have otherwise built an Android app. And so I literally built my first Android app in AI Studio. Very cool to see.
Sonya Huang: What is it?
Logan Kilpatrick: Yeah, I just did a …
Sonya Huang: Crypto app? [laughs]
Logan Kilpatrick: Not a crypto app, just a plant one. I was planting trees in my backyard.
Sonya Huang: Oh, like a gardening app. Okay.
Logan Kilpatrick: Yeah. And so it was just playing around with a gardening app as I was kicking the tires. I haven’t had my breakthrough idea yet of what I want for a mobile app, but I’m going to come up with something and see. Go compete on the App Store.
Sonya Huang: Have you seen anything vibe coded really fly in the App Store yet?
Logan Kilpatrick: That’s a good question. It’d actually be interesting to see some analysis. I don’t know. I’m sure it’s, like, accelerating a lot of things on the App Store, but I don’t know how much. Like, I don’t know anyone personally who’s done that. It is interesting, and I was going to make the observation, too, that I think the last time I checked the numbers—we were viewing it this morning—it was 350,000 Android apps built in AI Studio since last week, which is crazy. And excitingly, it’s 350,000 apps that probably no one was going to build before. A lot of these are personal, too. And so this is where I think maybe Gen UI is farther out there, but I think the idea of you building software to solve your personal problem is very real right now. And people are doing that. It’s one of the most common use cases of a lot of these products. And being able to unlock a bunch of the native capabilities of the phone, I think is also really interesting, because you just have so much context that’s in different places. So I’m getting very excited about that opportunity, and Android feels like it’s becoming the platform for builders.
Sonya Huang: Does it matter that something’s an app, given that the web is so powerful now?
Logan Kilpatrick: Yeah, it’s also very interesting to see that play out. Web is definitely powerful. There are certain things that the operating systems have that you just can’t unlock from the web. Like, lots of native richness that actually makes experiences feel so much richer. I think about this for text messaging actually, that the text messaging experience in all the main operating systems feels way richer to me than any AI chat app that I’ve ever used. Like, if I could just talk to AI in whatever texting app I use, I would be way happier than having to go to some other app, because I think we’re also just conditioned on the operating systems.
Sonya Huang: Yeah, makes sense. Okay. I want to ask about the model eats the harness, or the model eats the scaffolding. What are your thoughts?
Logan Kilpatrick: Yeah, I think it’s true. And I think part of this is what we have historically thought of as the model is not the model anymore. I think two years ago when LLMs were popular, it was like the model was actually just a set of weights. It was a set of weights and it was like, how can you, as simple as possible, send tokens in and get tokens out? And I think we’ve just progressively, step by step by step—we still call it the model, we still call it Gemini 3.5, you still call it GPT whatever and Claude whatever, but it’s actually not just the weights anymore. It’s an entire expanding, sprawling system that’s built around the weights that sort of enable a lot of these next generation experiences from agentic tool calling to all these hosted tools—search, code execution, et cetera. The models are now being spun up in containers and have an agent harness and all that stuff.
So the scaffolding is oftentimes a couple of steps ahead of what is baked directly into the model. And then what ends up happening is the model eats that scaffolding and it becomes part of the native model system. And there’s still value in having sort of the external scaffolding in certain cases. And search maybe is an example of this. There’s lots of folks who use different search providers, and there’s different use cases that you want. And so sure, maybe the model can natively use search, but you also want something else. Code execution is another example of that. But it does feel like maybe the agent harness is the quintessential example of this right now where everyone’s like, “Ah, we gotta go build a harness.” And the harness is where the alpha is. And I think that perhaps won’t be true, at least in the way that we think of the harness today. In 12 months, I think the models will have sort of just digested a bunch of that. It’ll be upstreamed into the model and the alpha will be somewhere else now—it won’t be in trying to spin your own harness because the model just does it natively.
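To make the "harness" idea concrete, here is a minimal sketch in Python of the loop that sits around a model today: the harness calls the model, runs whatever tool the model asks for, and feeds the result back in. Everything in it is an assumption for illustration. `stub_model`, `ModelTurn`, `run_harness`, and the two toy tools are hypothetical names, not part of Gemini or any real API, and the "model" is a hard-coded stand-in rather than a real set of weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


def search(query: str) -> str:
    # Hypothetical stand-in for a hosted search tool; returns canned text.
    return f"results for {query}"


def add(numbers: str) -> str:
    # Hypothetical stand-in for a code-execution tool; sums comma-separated integers.
    return str(sum(int(n) for n in numbers.split(",")))


TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "add": add}


@dataclass
class ModelTurn:
    tool: Optional[str]  # tool to call, or None when the model is finished
    argument: str        # tool input, or the final answer when tool is None


def stub_model(transcript: List[str]) -> ModelTurn:
    # Toy stand-in for the weights: text in, one decision out.
    if not any(line.startswith("tool: ") for line in transcript):
        return ModelTurn("add", "40,2")
    last_result = transcript[-1][len("tool: "):]
    return ModelTurn(None, f"The answer is {last_result}")


def run_harness(model, task: str, max_steps: int = 5) -> str:
    # The harness owns the loop: call the model, run the tool, feed back the result.
    transcript = [f"user: {task}"]
    for _ in range(max_steps):
        turn = model(transcript)
        if turn.tool is None:
            return turn.argument
        transcript.append(f"tool: {TOOLS[turn.tool](turn.argument)}")
    return "step limit reached"


print(run_harness(stub_model, "What is 40 plus 2?"))
```

In this framing, "the model eats the harness" would mean that the loop in `run_harness` and the tool registry stop being code the application developer writes and become part of what the model provider ships.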
Sonya Huang: But I thought that part of the reason why people are building their own harnesses is because if you use a harness from any given model provider, you’re locked in, right? So a lot of the application companies want flexibility, which is why they’re building their own harnesses.
Logan Kilpatrick: Yeah. And I think that’s part of the scaffolding story is that it starts out perhaps true, but then as the model capability improves, it becomes less true over time, actually. Like, you don’t have a generalized model if it can’t use another harness. And so it is important. And I mentioned this in another conversation with someone a few weeks ago, but we need something like Harness Bench, which is actually measuring how good are all these different models at adapting to all the different harnesses. I feel like that seems like a reasonable thing we should measure as an ecosystem. And I’d be curious to see what models are actually best. But I think over time you expect they’d be able to use every harness unless you’re completely out of distribution, which in that case, you’re still going to be completely out of distribution even if you’re using your own harness. So not sure it matters much.
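One way to picture the "Harness Bench" idea is as a grid: run every model inside every harness on the same tasks and record the pass rate in each cell. The Python sketch below is only an illustration of that shape. No such benchmark is described in the conversation beyond the name, and the model names, harness names, tasks, and pass/fail outcomes are made-up placeholders, not results.

```python
from statistics import mean

# Placeholder outcomes: which tasks each (model, harness) pair passes.
# These are invented for illustration and do not describe any real system.
PASSES = {
    ("model_a", "harness_x"): {"t1", "t2", "t3", "t4"},
    ("model_a", "harness_y"): {"t1", "t2"},
    ("model_b", "harness_x"): {"t1", "t2", "t3"},
    ("model_b", "harness_y"): {"t1", "t2", "t3", "t4"},
}


def fake_run(model: str, harness: str, task: str) -> bool:
    # Stand-in for actually running a task; a real benchmark would execute it.
    return task in PASSES[(model, harness)]


def evaluate(run_task, models, harnesses, tasks):
    # Build {model: {harness: pass_rate}} over the full model-by-harness grid.
    return {
        m: {h: mean(1.0 if run_task(m, h, t) else 0.0 for t in tasks) for h in harnesses}
        for m in models
    }


def adaptability(scores):
    # One possible summary: each model's worst pass rate across harnesses.
    return {m: min(by_harness.values()) for m, by_harness in scores.items()}


scores = evaluate(fake_run, ["model_a", "model_b"], ["harness_x", "harness_y"],
                  ["t1", "t2", "t3", "t4"])
print(scores)
print(adaptability(scores))
```

Taking the worst cell per model is just one design choice. It rewards a model that holds up in an unfamiliar harness over one that only shines in the harness it was presumably tuned for.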
Sonya Huang: Fair enough. What about the application layer? How do you think about where independent companies can have a hope of surviving when the model eats the harness and eats the stuff around it?
Logan Kilpatrick: Yeah, it feels like there’s—yeah, it’s an interesting story that both of these things feel true. On one hand, everywhere I look, I’m like, there’s never been more opportunity to go and build something. At the same time, obviously the models are doing more than they’ve ever done before.
I think there’s that thread of capability overhang, which I think there’s a huge amount of alpha in. There’s the thread of the model companies are going after these very general problems, and there’s just so much value in these verticalized domains. If you have expertise in that domain, you know the customers, you know the ecosystem. It’s like you can really run laps around even the best model labs, because focus is the superpower of startups. If you can focus, you can do anything. And if you look at all of the companies that are big or doing lots of stuff, there’s just not a lot of focus. And for some reasons rightfully so, and maybe I’m overly justifying Google’s strategy, but we just have a lot of products, we have a lot of users, we have a lot of different things going on, and so we actually can’t focus in one domain. We have an obligation to do a bunch of things as a big company. I think that’s not true for startups.
And so I think, like, 24 months ago, we were all asking ourselves, oh wow, it seems like the opportunity space is shifting and maybe it’s possible one of the outcomes is there’s less opportunity for startups in the future. That feels like so far and away not what has ended up playing out, which is really positive. If anything, it feels like there’s just even more opportunity than there was. Now coding has helped you close the gap on larger companies that have established code bases and all this other stuff because you can just run way faster and write software quicker.
The agentic primitive is a new category that you can build products around that, like, actually, in a lot of cases, to the conversation about the risks involved with building, there’s risk involved. And so the risk appetite of different companies is different. And so if you’re willing to take more risk in some domains, you can win a user cohort who’s interested in also taking risk. There’s so much opportunity.
Sonya Huang: Awesome. I’d love to talk about Google DeepMind’s culture. And I’m curious, what does it feel like to be inside GDM right now? We had Demis at AI Ascent. He was so inspiring. I’ve heard Sergey’s back. You guys have Noam Shazeer back. Walk me through what it’s like to be at GDM right now.
Logan Kilpatrick: It’s incredible. I do try to take it all in, because it is a moment. I try to reflect as much as possible in the chaos of all the things that are happening just because there’s so much cool stuff going on. GDM’s culture is interesting, and maybe three observations. One, back to this thread of focus, we’re doing a lot of things. And I think about this a lot from a portfolio perspective. I think we have one of the strongest portfolios, which is really exciting. But you do see these moments where another lab or another company or whatever it is, will pull ahead in a certain area where we underinvested, just hadn’t been focused enough in that domain. And it’s cool to see the way we go about trying to close that gap. I very much appreciate it. I think I’ve watched The Thinking Game, the documentary about Demis, a few times, and you see a lot of details of that original culture and just the way that strikes work and all this stuff. It’s actually really similar today where, like, you just get a bunch of smart people together and go solve the problem. And I love that. And it’s very cool to be a part of.
Another one is this. I think you see the culture permeate from who the leaders are. And maybe this isn’t like a perfect characterization of the ecosystem, but Demis is a Nobel Prize scientist and the sort of OG of a lot of this stuff. And you feel that in the DeepMind culture. I think Sam is maybe one of the world’s best businessmen ever. And you sort of see that in the OpenAI culture and the way that they go about the world. I don’t have a strong sense of who Dario is, but I think Anthropic is a very interesting place, and at least as an external observer, he seems like an interesting guy and somewhat esoteric. And so it seems like it’s in the DNA and the culture of the company.
The other labs are interesting. But I like this very scientific approach to the world, and the way that Demis looks at this. And the reason he’s doing this and the reason they started this mission was literally to solve disease and all these things. And it’s so easy to get—and again, I’m always trying to pull myself out of the moment, but it’s so easy to get lost in this competitive race of who’s pushing a number higher on SWE-bench or whatever it is. It’s very easy to lose sight of the reason we’re doing that is so that we can solve problems that humans actually have. And my favorite quote from all of Silicon Valley is something like, you know, “We can’t let other people make the world a better place more than we can,” which is what this moment feels like.
Sonya Huang: The Gavin Belson quote? [laughs]
Logan Kilpatrick: The Gavin Belson quote. And I think about that all the time. And it’s like we’re all fighting over who can make the world better more than the other person, which when you frame it like that, it seems really goofy to me. And so it’s very much not zero-sum. And I think that’s a way of looking at the world. I think the last thing about DeepMind’s culture is that it’s sort of the engine room of Google, which I think is literally the Twitter bio now of the DeepMind Twitter account, which I love.
Sonya Huang: Do you man the DeepMind Twitter account?
Logan Kilpatrick: I don’t. I don’t want any responsibility manning other people’s accounts online. Too much responsibility to do that. But it does feel like that, too. So it’s sort of like on one hand you have the deep-rooted lab culture. On the other hand, you have all of these partners across the Google ecosystem that we’re collaborating with, everybody from Android that we talked about earlier to Google Cloud to Gmail to Workspace, et cetera, et cetera. And so it’s an interesting blend of, like, I think there’s lots of research work happening, but there’s tons of applied work that’s happening to actually work with some of the forefront customers. Like, deploying Gemini to billion-user products is a problem that only two companies in the world have. And we have 13 of those products. And Google goes through this all the time now, and it’s such an interesting place to see that happen and see the innovation that takes place in order to make that actually possible. And I feel like you can only do that inside of Google, which is really cool.
Sonya Huang: Beautifully said. Did it give them a lot of heartburn when you joined and were tweeting a lot?
Logan Kilpatrick: That’s a good question.
Sonya Huang: Did you have to get sign-off from comms?
Logan Kilpatrick: One of the silver linings to my Google experience has been just how great that group of folks across marketing comms are to work with. And I think their job is to protect Google and make sure we tell the right story and make sure a bunch of bad things don’t happen. And so I have a ton of appreciation and partnership with them, but it’s been an incredible experience to be able to go try to tell the story that resonates with developers in a way that feels authentic and not have a huge amount of, you know, I don’t have to get my tweets approved all the time and all this stuff. It’s a very, very positive culture. And hopefully I am always trying to walk the line of not burning the trust and goodwill that I’ve accumulated with those folks. But it’s been super positive, because ultimately it’s really hard for Google to tell this authentic story. It’s just it’s a big company, there’s a lot of people, there’s a lot of opinions. And so you take the magic of Google and you water it down through a lot of people and a lot of process and you miss the beautiful story, which is, Google’s doing the most interesting technology in the world and helping our users with some of the hardest problems in the world. And it’s a privilege to get to help tell that story. So it’s a lot of fun. I enjoy it.
Sonya Huang: I love what you’re doing. I love what Josh is doing. I think you guys have put a really kind of sincere human touch on, as you put it, the most important problem of our time.
Logan Kilpatrick: Thank you.
Sonya Huang: Well, wonderful. Logan, thank you so much for joining me today. This is a very far-ranging conversation, everything from agents and coding to world models and harnesses and GDM culture and lots of nuggets here. Thank you for joining me today.
Logan Kilpatrick: This was a ton of fun. Thank you for having me. And I’m excited to see what the folks cook up of where we’ve been sitting this whole time, maybe in front of us.
Sonya Huang: Maybe there’ll be a dog.
Logan Kilpatrick: A dog.
Sonya Huang: You can make my dog dreams come true.
Logan Kilpatrick: I love it.
Sonya Huang: Awesome. Thanks, Logan.
Logan Kilpatrick: Of course.