
Decart’s Dean Leitersdorf on AI-Generated Video Games and Worlds

Can GenAI allow us to connect our imagination to what we see on our screens? Decart’s Dean Leitersdorf believes it can.

In this episode, Dean breaks down how Decart is pushing the boundaries of compute in order to create AI-generated consumer experiences, from fully playable video games to immersive worlds. From achieving real-time video inference on existing hardware to building a fully vertically integrated stack, Dean explains why overcoming fundamental limitations, rather than solving specific problems, could lead to the next trillion-dollar company.

Summary

Two weeks ago, Oasis went viral as the first real-time GenAI video game, rendered through real-time inference with no game engine. Decart founder Dean Leitersdorf thinks this achievement has profound implications:
Bridging Imagination and Interaction: Dean emphasizes the potential of AI to bridge the gap between human imagination and digital interaction. He envisions a future where AI allows users to interact with digital environments in a way that mirrors their imagination, such as transforming a scene into a “Game of Thrones” setting or modifying objects in real-time. This represents a shift from traditional applications to more immersive and interactive experiences.
Overcoming Limitations Rather than Solving Problems: Dean proposes that the most impactful companies don’t just solve existing problems but overcome fundamental limitations. He compares this to the personal computer, which didn’t solve a specific problem but created a new platform for countless applications. He sees this as a rare opportunity that arises only occasionally, allowing for the creation of groundbreaking technology.
The Role of Game Engines and AI: The discussion touches on the potential for AI to replace traditional game engines by allowing more dynamic and flexible interactions. Dean suggests that AI can enable users to modify digital environments using natural language, making it easier and faster to create and interact with virtual worlds without the need for extensive coding.
Vertical Integration for Competitive Advantage: Shaun Maguire highlights Decart’s strategy of being fully vertically integrated, optimizing everything from low-level hardware to high-level user experiences. This approach is compared to Google’s early advantage in distributed systems, suggesting that deep integration can provide a significant competitive edge by improving efficiency and performance.
Future of Consumer Entertainment and AI: Leitersdorf envisions a future where AI-generated experiences (GX) replace traditional user experiences (UX). This shift is expected to create new forms of entertainment and interaction that are more aligned with how humans naturally want to engage with technology. The goal is to create experiences that are not only immersive but also personalized and responsive to individual users’ needs and preferences.

Transcript


Introduction

Shaun Maguire: Hey everyone, I’m Shaun Maguire. I’m a partner at Sequoia Capital. Today, my partner Sonya Huang and I are going to interview Dean Leitersdorf. Dean is a brilliant young mind. He grew up back and forth between Israel and the United States. He was the youngest person ever to get a PhD from the Technion in Israel, at 23 years old–at least until his younger brother beat him and got his PhD at 21. Decart is trying to deliver delightful AI experiences–really trying to let people interact with their imaginations, and other people’s imaginations, in a way that’s never been possible before. To do this, they are fully vertically integrated, optimizing everything from as low-level as CUDA kernels up to designing their own models, training the models, and then, at the end of the day, delivering experiences. Over the next few months we’re going to see some pretty impressive launches from these guys.

ABOUT OASIS

Sonya Huang: Dean, thank you for joining us today. I was just playing Oasis this morning. I had so much fun. So let me start by asking about Oasis, a fully playable AI game engine. What is it? Why did you launch it?

Dean Leitersdorf:  About Oasis. So we launched Oasis a few weeks ago, and really when we launched it, the incredible thing from a tech perspective was, oh, this is the first video model that actually runs real time, and you can interact with it. It responds to user actions. You can move around the world, you can break blocks, you can place blocks. And so we got this nice game without a game engine, okay? But that’s not interesting.
So why is this actually interesting? And so to answer that, forget about Oasis, Oasis 1. Think about, say, Oasis 3, okay? And imagine this. So imagine for a sec, just tech aside for a sec. Imagine you’re looking at a mirror. And you have this magical mirror. You can talk to it, okay? You can tell it to do cool things. You can say, “Hey, I’m here, and here’s my hand and I want to hold a sword. Okay, can you give me a sword?” And then you look at yourself in the mirror and boom, there’s a sword in the mirror where your hand is. And you move your hand around and the sword moves. And you can be like, “No, no, no. Make the sword bigger or make it blue.” And it changes. And you can be like, “Okay, now turn me into Game of Thrones.” And everything around becomes Game of Thrones. And then you get a crown and everything, and you can be like, “I didn’t like my crown. Change it a bit.” And then you start jumping and you move around, and the mirror responds to that, okay? And that’s interesting.

Now the reason that’s interesting is because it’s a completely different experience than anything we’ve had before on Earth. And it allows us to kind of channel our imagination through screens that we can see. It connects two things. It connects what we see in our minds and what we can see with our eyes. And so that’s where we’re going with this. How can we—in a sentence, how can GenAI really allow us to connect our imagination to what we see on our screens? And with that, we can take it into, really, worlds that we didn’t explore before. It can change everything from applications we can’t do today, all the way to how we even interact with computers or with hardware.

SOLVING A PROBLEM VS. OVERCOMING A LIMITATION

Sonya Huang: I love the mirror. Let’s take it further. Where are you going with that? Is this a social media thing? Are you building a game? Are you building a world model, an interactive world model? How should I think about what is Decart? What is Oasis?

Dean Leitersdorf:  So let me ask you this. What problem does ChatGPT solve?

Sonya Huang: Homework.

Dean Leitersdorf:  Homework. Great. And what else does it solve?

Shaun Maguire: It makes it easier to talk to computers.

Dean Leitersdorf:  Nice. Shaun knows the answer because …

Shaun Maguire: Because I spent a lot of time with games.

Sonya Huang: Classic Shaun.

Shaun Maguire: I spent a lot of time with games.

Dean Leitersdorf:  But exactly that. The TL;DR is challenging, but it doesn’t solve any given problem. It helps you do your homework better. It helps you write emails. It helps you summarize. Exactly. It doesn’t solve a problem. It overcomes some fundamental limitation, which is exactly what Shaun was saying: it overcomes this communication barrier between humans and computers. Computers speak in structured languages; humans speak in unstructured languages, or languages with complex structure. LLMs bridge that gap and let humans and machines interact in a language that both can understand.

That itself, the second you have that, you get a hundred different things that are solved on top of that. So what you get with the mirror or what you get with generative interactive video is you get that communication barrier now overcome not just with text, but also with what we can see. Now computers will be able to see the world the way we see it, and they’ll be able to show us the world in ways that we can understand. And you solve that, you solve—you build a platform that allows you to build everything on top of that, from next gen Snapchat or TikTok to simulators for fighter pilots, okay? And that’s the cool thing here.

And that’s it—now we’re in 2024. I think one of the most fun things we have at Decart is that we’re founding a company at a time when you have an opportunity to build something that doesn’t solve a problem but overcomes a limitation. 99 percent of companies solve problems. When you look at companies that come to pitch Sequoia or any other VC, they start with, “Here’s the problem, here’s how big the problem is. That’s our TAM and everything. And here’s how we’re going to solve the problem.” And usually the first two stay the same, otherwise you call it a pivot, right? You say, “Okay, this is the problem I’m solving.” If you change the problem you’re solving, you call that a pivot; the way you’re going to solve it you can change 500 times.

That’s 99 percent of companies, and that’s what you can do in any regular year. There are moments in history—recently, it’s been like once every decade, maybe 15 years—that you actually have a chance to build something that doesn’t solve a problem, but just overcomes a limitation. And let me ask you this in a different way. Is the Mac a consumer product or an enterprise product?

Shaun Maguire: And is it a hardware company or a software company?

Dean Leitersdorf:  Is it a hardware company or a software company? And what problems does it solve? If you try to give me a list of problems that the personal computer solves, you’d have everything from gaming to Excel. And that’s the nice thing about this: you’re building an insane piece of tech that you’ll be able to productize in so many different ways.

THE ROLE OF GAME ENGINES

Sonya Huang: Yeah, I love that. One of the things that was so cool about what you’ve built is that there’s no game engine inside, as far as I can tell. Like, what do you think that means? Do you think that game engines are an artifact of the past? Or, like, what does that mean?

Dean Leitersdorf:  Game engines were supposed to make it so that one person can create a world, and a different person can interact with that world, right? That’s the purpose of game engines. You have the game developer, and you have the user who uses it. And the same goes for movies or whatever else people use game engines for. Unreal has been used for movies a lot recently as well. Now that is a very valuable product, and it has lots of advantages. The world is very consistent. You can really make things very accurate. The problem is that it does take a lot of time to interact with it.

People like taking the basic game and turning it into a bunch of different things. And, you know, as we got into this and we actually saw what people do with it—do you know there’s an actual mod that puts Pokemon inside Minecraft? Okay? You can walk around the forest and there’s Pokemon running around. That’s an actual mod someone built. Okay. And so people inherently have this instinct: we got this platform and we want to change it. And so that’s the nice thing about mods.

What you get here is that because what’s running your game or your environment is an AI, you can interact with it in the ways we’re used to interacting with AI. You’d be able to say, “Hey, can you turn this into Elsa themed?” And then boom, everything becomes Elsa themed. “And can you add a flying elephant?” And there’s a flying elephant in the game. And it’s not just there as a picture. You can actually interact with it. You can, I don’t know, punch the elephant, it’ll punch you back, or whatever you can do with an elephant.

And so I think that if this trend were to replace game engines, it would have to get to the state where you can program for it, where it’s some machine that one person can build worlds on and another can interact with. And that is definitely coming. And not only that, it’s going to be much easier to program for this. You can just use words. You don’t have to write code. And even if you do know how to write code, you can iterate so much faster on it. So basically, to summarize: I think what this will allow us to do is get modding much, much, much faster, and we’ll get into active modding.

HOW REAL-TIME VIDEO INFERENCE WORKS

Shaun Maguire: To get a little more technical for a second. Yours is the first video model I’ve ever seen that has real-time inference. What are some of the things that go into having real-time inference? Like, you know, how hard is it? And just, like, give us some of the flavors of what goes into that.

Dean Leitersdorf:  If we go back, like, three, four months, like, back to the summer. I don’t remember where this was published, but there were a few headlines about, “Oh, when Blackwell chips come out, when Nvidia’s Blackwell chips come out, we’ll get real time video.” Hoppers just can’t do it. The H100s can’t do it. We have to wait for Nvidia’s next generation. 

And I think I heard this from quite a few different sources. There were, like, two weeks during the summer where everyone was saying that for some reason, okay? And no, H100s can actually do it, okay? And to pull that off, you have to do two things at once. You have to change a lot of things around the model itself. Not every video model can be run in real time. You have to train the model differently. The architecture needs to look different. Now it’s not major architectural changes, but you do have to make them.

On the other hand, you also have to do lots of the systems-level stuff. You actually have to write your own CUDA kernels. We threw out, like, PyTorch’s garbage collector and wrote, like, half of it from scratch, okay? And you really have to write everything at the systems level as well to actually pull this off. Because if you do only one of the two, you’ll be waiting for someone else to do the other half for you. If you’re only doing the systems-level part, you won’t be able to pull this off because you won’t have a model that’s ready to be interacted with this way. If you do just the modeling stuff, you won’t have the systems-level support to be able to make it run real time.
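
To make the systems half concrete, here is a minimal, hypothetical sketch (not Decart’s actual code) of one common trick for real-time inference: capture a denoising step into a CUDA graph with static input/output buffers, so each frame costs one graph replay instead of thousands of Python-dispatched kernel launches and allocator calls. The tiny `nn.Sequential` is a stand-in for a real video denoiser.

```python
import torch

# Stand-in for a real per-frame denoiser network.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.GELU(), torch.nn.Linear(512, 512)
).cuda().eval()

# Static buffer, allocated once and reused every frame (no allocator churn).
static_in = torch.zeros(1, 512, device="cuda")

# Warm up on a side stream, as CUDA graph capture requires.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    static_out = model(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture one inference step; replaying it skips the per-kernel overhead.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g), torch.no_grad():
    static_out = model(static_in)

def next_frame_latent(latent: torch.Tensor) -> torch.Tensor:
    static_in.copy_(latent)    # write into the captured input buffer
    g.replay()                 # rerun the captured kernels
    return static_out.clone()  # read out the captured output buffer
```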

Sonya Huang: Can you say a word on how the model works? Is it transformer based? Is it similar to the Soras of the world? What have you built on the model side?

Dean Leitersdorf:  Yeah. The TL;DR is it’s exactly like the Soras of the world, except the prompt is user actions instead of text. Like, that’s the easiest way to think about it. You have text-to-video models, right? You have Sora: you put in a sentence and you get a video. Same thing here. Just you put in your prompt as, like, your keyboard actions and your past frames, and it generates the next frame.
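
In code, the analogy Dean draws might look like the following hypothetical sketch: a transformer that takes past frames plus per-frame user actions as “the prompt” and predicts the next frame. The real model is diffusion-based; this collapses the denoising loop into a single deterministic prediction to keep the sketch short, and every name and size here is made up.

```python
import torch
import torch.nn as nn

class ActionConditionedVideoModel(nn.Module):
    """Hypothetical: past frames + keyboard actions in, next frame out."""

    def __init__(self, d_model=512, n_actions=32, n_heads=8, n_layers=6):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, d_model)  # actions are "the prompt"
        self.frame_proj = nn.Linear(3 * 16 * 16, d_model)   # patchified past frames
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 3 * 16 * 16)         # next frame's patches

    def forward(self, past_frames, actions):
        # past_frames: (B, T, 3*16*16); actions: (B, T) integer action ids.
        tokens = self.frame_proj(past_frames) + self.action_emb(actions)
        h = self.backbone(tokens)   # a causal mask would be used in practice
        return self.head(h[:, -1])  # predict the next frame from the last step
```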

Sonya Huang: Okay, so how do you get the data between actions and video?

Dean Leitersdorf:  So yeah, you do have to do some pre-processing steps here that you don’t do with regular video models. For example, you do have to take the raw recordings of the gameplay and label each step with the action that’s being taken. And so we trained a small model that does that. It actually doesn’t need too much data; you can solve this with a small model that doesn’t need too many examples. So our team just played for a bit, recorded that, you train a small model, and then you use that to label all your data.
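
A hypothetical sketch of that labeling step: a small inverse-dynamics model that looks at two consecutive frames and predicts which action was taken, trained on the small set the team recorded by playing, then run over the full corpus of raw gameplay. Names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ActionLabeler(nn.Module):
    """Small model: (frame_t, frame_t+1) -> which action was taken."""

    def __init__(self, n_actions=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 5, stride=2), nn.ReLU(),  # two RGB frames stacked
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_actions),                  # logits over the action set
        )

    def forward(self, frame_t, frame_t1):
        return self.net(torch.cat([frame_t, frame_t1], dim=1))

# Train on the small hand-labeled set, then label every consecutive frame
# pair in the raw recordings to build the (action, video) training data.
```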

WORLD MODEL VS. PIXEL REPRESENTATION

Sonya Huang: Super interesting. And are you building a world model, or is this just purely pixel representation?

Dean Leitersdorf:  Nice. The beautiful thing here is that it’s purely pixel representation. Now let’s compare that to exactly what you were saying with world models or 3D stuff and the other things. In AI for more than a decade, there’s been a general question of do you solve stuff end to end, or do you take an existing workflow and make something more efficient, okay? There could have been two ways to solve this problem. You could just say, “Hey, game engines exist. Unity is amazing, Unreal is amazing. Let’s just plug into that workflow. Let’s build text to 3D.” So I’ll describe an elephant, and I’ll get the 3D mesh of an elephant, and that’ll be embedded into Unity and Unreal or whatever game engine you’re using, okay?

So compare that to the end-to-end solution of at the end of the day, what I have is a screen. The screen needs to show something, and that needs to work, okay? And at the end of the day, what people do is they see their computer screen and they touch their keyboard and they move their mouse, and that’s your interface. And you solve this end to end from keystroke to frame, okay?

So obviously these two are competing directions. Now over time, I think that there will be some merging between them. From a technical perspective, they each have their own advantages. The first is much more consistent over time. It’s much easier to say, “Oh, here’s this object, here’s how it looks,” and when it comes back in two hours, it’ll look exactly the same. And the other one, the end-to-end pixel approach, the diffusion version that does everything in pixel space, is much easier to work with. It’s much more flexible. You can really say, “Oh, no, no. Change the elephant’s tail, it’s too big.” Or you can actually edit it live in a way that’s just more dynamic.

So I do think that long term, though, these two things will converge. And if we roughly map this out: today we really just have prompt to pixels, keystrokes to pixels. You could in theory say that the right way to solve this in, say, the next two or three years is to have two models—everything’s transformers, right? You have one model that’s in charge of holding some state, the state of the game, and that’s unrelated to pixels. It’s literally just an LLM-like transformer, okay? It gets the current state, it gets the new user’s action, and it outputs changes to that state. And then the second model takes that state and renders it to pixels. So it makes sense that that’s roughly where we’ll converge, because that really takes into account both the advantages of world models and the advantages of the diffusion models.
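
A hypothetical sketch of that two-model split: one LLM-like transformer that evolves a pixel-free game state from user actions, and a second model that renders that state to pixels. This is speculative, shape-level structure inferred from Dean’s description, not an actual Decart architecture.

```python
import torch
import torch.nn as nn

class StateModel(nn.Module):
    """LLM-like transformer: (state tokens, action) -> updated state. No pixels."""

    def __init__(self, d_state=1024, n_actions=32):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, d_state)
        layer = nn.TransformerEncoderLayer(d_state, 8, batch_first=True)
        self.core = nn.TransformerEncoder(layer, 4)

    def forward(self, state_tokens, action):
        # state_tokens: (B, T, d_state); action: (B,) integer ids.
        x = torch.cat([state_tokens, self.action_emb(action)[:, None]], dim=1)
        return self.core(x)  # the consistent world state lives here, not in pixels

class Renderer(nn.Module):
    """Second model: state tokens -> one frame of pixels."""

    def __init__(self, d_state=1024, hw=64):
        super().__init__()
        self.hw = hw
        self.to_pixels = nn.Linear(d_state, 3 * hw * hw)

    def forward(self, state_tokens):
        flat = self.to_pixels(state_tokens.mean(dim=1))  # pool the state, then render
        return flat.view(-1, 3, self.hw, self.hw)
```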

Sonya Huang: Do you want to build both of those models?

Dean Leitersdorf:  Of course. I mean, yeah, definitely.

Dean Leitersdorf: But yeah, I will say that we are a bit off. Like, it will take some time to reach that stage. Yeah.

VERTICAL INTEGRATION

Shaun Maguire: One of the things for me that really caught my attention about Dean and Decart is they have this ambition to be completely vertically integrated. Like, these guys understand literally down to electrons and how—I’m serious. They understand how electrons move in logic gates, and even alternate logic gates, and how you can represent them at levels even below assembly. You know, how you can change them in assembly, in CUDA kernels. They literally go all the way from electrons to pixels that your eye sees. And they’re optimizing every single level in there. And I think by doing that, they’ll always have a kind of 10x-plus advantage over anyone that’s just on the application layer.

Sonya Huang: Actually, so talk about this, because Shaun loves to talk about this. I think the counterargument would be specialization, right? There are 10,000 very smart people at Nvidia, at choose-your-favorite-company, working on this. You should focus on building the best possible user experience and the viral loops and things like that. So talk about your decision to be vertically integrated.

Shaun Maguire: Let me actually say something, because Dean can’t brag about himself the way we can brag about him. I’ve been studying business models my whole life. It’s been a passion of mine from a young age. And for me, Google is one of the most amazing companies of all time, one of the most amazing business models. I worked at Google for a few years. I really feel like people have the wrong understanding of what Google’s moat was. I also think people have the wrong understanding of what Nvidia’s moat is today. But for me with Google: obviously Sergey and Larry had invented PageRank. PageRank was a very beautiful algorithm; it was a deep insight, but it’s very simple to implement. It’s like a very basic graph-theoretic idea, and it was a published paper. So once PageRank came out, everyone replicated it very quickly.

For me, the real advantage of Google was that these guys were some of the best in the world at distributed systems and at, like, low-level systems optimization. And they had this very profound insight from early on that basically all the other search engines were buying Sun Microsystems server racks. The way they would get fault tolerance was by buying expensive hardware. Whereas for Google, they realized that they can buy just cheap consumer commodity hardware that fails all the time. You know, you buy Intel Pentium processors that are in your gaming computer or, like, SanDisk memory. And, you know, you need five times as many total flops or five times as many bits to get the same performance because of all the failure rates. But the cost per flop is like 1/50th. And so you can have a 10x cost optimization—10x cost advantage by really leaning into distributed systems and getting the most out of the hardware.
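
The arithmetic behind that claim, made explicit (using Shaun’s rough, illustrative numbers): five times the raw hardware to absorb failures, at a fiftieth of the cost per flop, nets out to a tenth of the total cost.

```python
redundancy_factor = 5            # extra flops/bits needed to ride out failures
relative_cost_per_flop = 1 / 50  # commodity vs. premium hardware
relative_total_cost = redundancy_factor * relative_cost_per_flop
print(relative_total_cost)       # 0.1 -> roughly a 10x cost advantage
```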

And what that led to with Google, for me, when I look back on when I first started using it, was this very, very simple front end. It was literally just a white webpage with a search box. It was, I think, a worse front end than Yahoo at the time. You know, Yahoo also had chat rooms and other, flashier, more exciting things. But Google had this magical back end. All the magic of Google, to me, was on the back end. And I think that back-end performance came from this cost advantage, and it came from the fact that they had optimized all the way down to the bare metal.

And with Dean and Decart, the story really rhymes for me. And look, we need to stay humble. Like, this company hasn’t done jack shit yet. It’s a very long way before they deserve a comparison to Google. But—and for what it’s worth, Sequoia led the Series A, co-led the Series A in Google. I’m very proud of that. Also led the seed in Nvidia. So we have good history.

Dean Leitersdorf: Good track record. Good track record.

Shaun Maguire: Good track record. Also Series A in Apple. But …

Sonya Huang: Commercial break is over.

Shaun Maguire: Commercial break is over. But anyways, I think to really deliver these, like, delightful—like, say a delightful mirror experience, which is a very simple front end, I think you need this absolutely insane back end that is optimized to the bare metal. And I think it’s kind of all or nothing. Like, if you can’t deliver real time, I don’t think it’s very good. And I don’t think you can deliver real time in the next year without going all the way to the bottom. And so I just, I don’t know, for me, I think you kind of have to do that. And these guys are the only ones I’ve seen doing that.

Sonya Huang: Well said.

Dean Leitersdorf:  Wow. I love what Shaun just said, because two things really caught my attention. One is about the vertical integration; we’ll touch on that in a sec, and it goes back to your original question. The second is really about—so I won’t name names, but I was speaking to someone who’s a very, very senior executive at Google recently, okay? And just reminiscing about the past and trying to hear—because I was three months old when Google was founded, okay? So I was around back then, but not really paying attention.

Shaun Maguire: Knowing you, Dean, you might have been paying attention.

Dean Leitersdorf:  You know, so I was trying to understand exactly what happened there, like, why that was interesting. It came up in an unrelated conversation. The way that person brought it up, we were talking about how GPU clusters are just unreliable, okay? Just, you know, in general today, if you try to train a model like the one we trained, on any cluster, whether it’s hyperscalers or GPU clouds, that thing’s gonna crash every few hours, okay? And you’re going to have, like, the weirdest things, okay? You’ll have one node crashing, and it’ll be because two other nodes have dust on the cable between them. And there won’t be any error to really tell you that that’s what’s happening. So your training run will just crash and you’re like, “Okay, why did it crash?” And you’ll try rebooting it and it won’t work. And then you’ll try removing random nodes until you understand what happened. And that’s the state of the entire industry, okay?

Pretty much, like, the only ones training that don’t see this are probably Google and OpenAI, because they really built everything down to, like—Google built everything down to the hardware as well. OpenAI had a lot of time to really focus on this reliability stuff. But anyone else who’s training, from the big companies to the small startups, they’re all experiencing this. And so I was talking to this person who’s very, very high up at Google, and they said, “Hey, training today is like back where CPUs were in the ‘90s.” Like, forget Kubernetes, there was no VMware.

Shaun Maguire: Yeah.

Dean Leitersdorf: Okay? Nothing was reliable, and your servers would just crash all the time. And you had the exact same thing: most companies didn’t want to deal with that, so they just paid for the premium service that was somehow better. A) They paid more money; but B) they also paid with time. Like, the broken hardware exists before the stable hardware exists. Sure, we’ll get to stable training runs in a year, in two years, whenever that happens, okay? Nvidia will make their chips more stable. They’ll make their code more stable. The GPU clouds will figure out stuff around this. That’ll happen. It’s not the state today. If you want to train a model today, you’re gonna face all of that.

And so it’s really a challenge you have to deal with. And at Decart we just deal with it, okay? The reason we can—so the model that you saw, Oasis 1? Oasis 1 converges from start to finish in 20 hours.

Sonya Huang: Wow.

Dean Leitersdorf: And you can compare that. We have lots of joint work or communication with other AI labs, and they were all shocked by this. Now I’m talking about the best labs training diffusion models. For this model, their convergence would usually take around two weeks. And it’s both because they’re not using optimized systems-layer stuff, but also because they crash every few hours or every few days. We can actually hold the training run end to end without crashing. We can also hold a training run for a week or two weeks without crashing. And that reliability part really, really resonates with what happened back then.

Now the thing about it is that it’s really not simple to pull off. You see, we have this internal doc, I think it’s around 200 pages now, of everything that can go wrong when you’re training a model. And it’s everything from, “If you see this error on this node, then tell your hardware operators that these two nodes have a problem between them, and these other nodes have a problem between them,” all the way to—and here’s a fun one. At a certain point while training Oasis, we were doing the training run and we needed to generate some synthetic data as well.

And so we said, “Okay. Well, we have this cluster. It has a shit ton of CPUs as well. Like, great, it has lots of GPUs, but there’s lots of CPUs and they’re utilized by like three percent or something. Okay, we can just use this and just generate lots of synthetic data on the same cluster as the training is happening.” By the way, this, like, blew the minds of our GPU cloud. They were like, “You guys are using the cluster to, like, 200 percent. You’re using the CPUs, you’re using the GPUs, and you’re using—” we even use, like, the Infiniband to send data around during training. So, like, we’re getting a lot more out of the cluster than should be expected, okay?

Now that all makes sense. So on one hand you have this: the GPUs are utilized, the CPUs are not utilized. So you run, like, synthetic data gen in parallel. It’s not supposed to interfere. It uses just the CPUs, and so it’s not supposed to hurt anything. And then your training run doesn’t work. Okay? And you get a random error that literally says—the team will know how to say this better, but the error that you get is something like “Missing lock file in the data loader.” Okay? It’s like, how are these two related? Do you want to know how they’re related? They’re related like this. The synthetic data gen was using up more RAM, which is fine. But it caused—sorry, no. Okay, to move the data around between the different nodes as the synthetic data was being generated, it was using more network bandwidth than before. And that caused Python’s data loader to take one of its lock files that’s usually network-mapped and swap it out to disk, okay? And that caused a state where different nodes had different lock files, and that caused the data loader to crash, okay?

Now I’m probably saying this wrong, and the team’s probably listening to this like, “No, Dean, you’re getting it all wrong.” But that’s the TL;DR of what happened, okay? You did something that was supposed to make sense and you got a random error. And that’s the day to day. And we have a 200-page doc of all of these things. And so that’s why it’s hard.

Shaun Maguire: And this is a simple example that Dean is happy to share. Like, there’s—you know, there’s …

Dean Leitersdorf:  It’s one of the simpler ones.

Shaun Maguire: There’s 100x harder, more important things that they’ve had to figure out. One that I think is also relatively simple, but it just kind of shows the current state of AI—and Dean, feel free. If you don’t want to talk about this, don’t talk about it. But they got access to a new cluster, and somehow the cluster had not installed memory yet, but the GPUs have some very small amount of onboard memory. And so most people would just not even be able to use the GPUs. Can you share anything about this story?

Dean Leitersdorf:  Yeah, so this is actually a nice story. So we call this the best place on Earth to train a video model. Training a video model isn’t just the cluster, it’s everything surrounding the cluster, okay? You need to have the storage there, you need to have the networking there. There’s so much that needs to go into building the best place on Earth to train a video model, and we’re actually very far away from that. I’m assuming that roughly over the next half year, lots of this will stabilize. And lots of the GPU clouds are working on this, but yeah, with one of the clusters that we got to, there wasn’t any storage. And by the way, it wasn’t even with one. It happened with a few clusters and different clouds, okay? That, you know, the clouds, they bring the GPUs and they try to get …

Shaun Maguire: Everyone’s so focused on getting the H100s that, you know, they forgot the memory or the storage.

Dean Leitersdorf: And it’s fine, and it’s okay. And they were gonna install it. They would get there, but, you know, they try to release everything as fast as possible, which is great, which makes sense. And so okay, there was no stable storage, no storage-optimized nodes that you could use, or an S3 bucket or something. And so we said, “Okay, well, every node has a few SSDs connected to it. What if we just build our own mini fake distributed file system on top of that?” And that’s what we did. And it worked. There were so many things to overcome to make that happen, but it works at the end of the day. And it goes back to your question about vertical integration. So, on vertical integration: Shaun knows business much better than I do and has been around all these fields way longer than I have, okay. I did PhDs and, like, technical stuff.
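
A hypothetical sketch of that workaround: shard files across each node’s local SSDs by hashing keys to nodes, which is the core of a “mini fake distributed file system.” The hostnames, paths, and the missing replication/RPC layer are all assumptions; a real version needs failure handling.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical hostnames
SSD_ROOT = "/mnt/local_ssd/shardfs"               # hypothetical local mount

def owner(key: str) -> str:
    """Deterministically map a file key to the node that stores it."""
    digest = hashlib.sha256(key.encode()).digest()
    return NODES[digest[0] % len(NODES)]

def shard_path(key: str) -> str:
    """Where the object lives on its owning node's local SSD."""
    return f"{SSD_ROOT}/{key}"

# Usage: a reader on any node calls owner(key); if it's the local node,
# read shard_path(key) from disk directly, otherwise fetch it over the
# network from a tiny file server each node runs against SSD_ROOT.
```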

Dean Leitersdorf: No, I said experienced. I said experienced.

Shaun Maguire: I was using Google when it first came out, and I bought Nvidia shares in the IPO, which is also right around when Dean was born, so …

Dean Leitersdorf: Nice. Nice. And Nvidia, I think IPOed before I was born. No? ‘96?

Shaun Maguire: ‘99, I think.

Dean Leitersdorf: ‘99. Okay. Okay. But yeah, as far as I see it, and correct me if I’m wrong, vertical integration usually gives you two things. It gives you a cost reduction, like higher margins or whatever, and it gives you the ability to move faster. Maybe gives you a third thing, because usually things give you three things, but who knows? So I think here in AI, the more important part—sure, they’re both important, but I think the second one is even more important than the first. Because at the end of the day, if you look at all the problems we’re facing, great. They will be solved, but it’ll take time for them to be solved.

And you know, I think there was a great article, I think in The Information, a few months ago, about how people who leave Google to start startups suddenly realize that nothing works. Because everything works inside Google, and then you go outside and it’s like, “Oh, there’s no storage?” Or, “Oh, my cloud provider doesn’t provide me with this? I actually need to take care of this?” And so okay, fine. Over time, these things will stabilize, and your cloud will provide you what a cloud needs to provide you. And you’ll have great companies that provide you with, like, a middle layer for the systems stuff, or even for the model training stuff, that will make lots of things easier for you. But if you really do everything end to end, you can get to market a year before everyone else. You can get to market two years before everyone else. And that’s, I think, what’s key here, because even if we go to the Google story or the OpenAI story, tech moats don’t last, right? Like, sure, Google is a great search engine. Bing is probably not that bad, okay? Sure, maybe Google has more data so they’re able to do that. But Microsoft, a huge company, they’ve been working on Bing for so long. It’s a good search engine. They have the tech. It still doesn’t mean that now Bing and Google are balanced, right?

So at the end of the day, the entire game here is get your tech moat quickly, two years before everyone else like Google and like OpenAI did, and work as fast as possible to convert that to different moats. And that’s the game here, that’s what you have to play because we can all say, “Okay, you know what? Sequoia invested, all good. Let’s put the money in the bank for a sec, okay? Let’s get some interest on that. We’ll go be on the beach for, like, two years, wait for everything to stabilize, and we’ll come back in two years, and then we’ll build the same company.” And that’ll be great, but someone else would have done it before. And that’s, I think, why we chose to be vertically integrated.

BUILDING A MOAT

Sonya Huang: I love it. What’s your moat going to be?

Dean Leitersdorf:  Long term or short term?

Shaun Maguire: Both.

Dean Leitersdorf:  Both. Perfect. Short term? Tech. Okay? Short term, tech. And that’s great, and we have the best systems-layer stuff, and we’re doing the model-layer stuff as well. So we’re fully integrating. And that’s your moat at the end of the day, short term. Long term? Long term, I think that’s a great question. And let me share something that I found really interesting, okay? So there is a new, weaker version of network effects that exists today that didn’t exist before. And that network effect is called “What people say on TikTok.” Now why is that interesting? Okay? One of the companies that we learned a lot from, and that I think is actually a really, really good company, is Character AI. They did end up selling to Google, wanting to get back to training big models, but there’s a lot to learn from Character.

And one of the things is that the second they took off, they had lots of competition instantly. Fine, their tech moat lasted for, like, half a year, until Meta released open-source models and other people started running this. They were still vertically integrated, and so they were able to be 10x cheaper than everyone else, which was great. But one of the things that really stood out to me was their TikTok moat. If you go on TikTok and you look for any Character AI competitor, fine, you’ll find a video of that competitor, and then you’ll scroll and you’ll see a hundred videos of Character. And even on the videos which are not Character, all the comments are full of Character. And if you talk to a random Character AI user, they don’t even know the competition. And so somehow, literally because of TikTok, there is a new moat of “What people say about you on TikTok.” And do you have a mini network effect there? A mini brand? I’m not sure if it’s a network effect or a brand effect, but …

Sonya Huang: Why is this different from just brand?

Dean Leitersdorf: So it’s very similar. It’s similar to brand.

Sonya Huang: Okay.

Dean Leitersdorf: But it’s in your face. Like, brand 20 years ago was: okay, did you hear your friends talking about this, or your parents talking about this? Here, the younger generation especially, they’re always on TikTok, and so they just see this instantly. And so there’s even a big question of whether a moat like that can survive for the two, three years until you get your long-term moats, like an insane brand like Google’s, or a distribution moat, or something like that. So I think we’re really in a new market here, and we’re not necessarily gonna have the same moats we had 10 years ago.

Sonya Huang: Hmm. Super interesting.

Shaun Maguire: Hardware is always the best moat, though. And for what it’s worth, Google, I think, elevated what was initially a software moat and a distributed systems moat into a hardware moat. I personally think that Google has not leveraged that moat enough on the application layer. They haven’t had that many really fantastic breakout consumer products since the early days. But they have an absolutely gigantic cost advantage on the hardware layer. When I was at Google, there was this project that just absolutely blew my mind and gave me a prepared mind for a few investments, which is basically that Google built optical interconnects to move data in data centers. If you google “Jupiter Rising Google Data Center,” you’ll find the papers.

And basically these optical switches, by turning them on, about doubled the performance of the data centers. These switches mainly connect rack to rack in data centers, moving from electrons to photons. And they were insanely hard to build. Basically everyone outside of Google, if you asked them at the time, “Is it possible to build?”

Dean Leitersdorf: They’d say, “No way.” Yeah.

Shaun Maguire: A terabit-per-second switch or whatever. They’d say, “Absolutely no way.” But they did it. People didn’t even know for years that Google had this, and it reduced power consumption of the data center by 30 percent or something. And those things are real fundamental moats. I think it’s always hard to know what the moats will be for a company in the future, but I strongly believe hardware is the ultimate moat, in part because there’s always going to be an extreme delay to move atoms: to spin up fabs, to get power, to build a power plant. Even in a world with AGI, even in a world with a billion Optimus robots, the timescale to make new hardware will be longer. So anyways, I hope Decart has a hardware moat.

Dean Leitersdorf:  I think I agree with you on that, long term. Okay, you know, this actually goes back to when we were founding Decart. So we said, “Okay, we’re in this—” we called it the golden ticket. We got this ticket that you get once in your life: starting a company at a time when, going back to what we were discussing before, you can overcome some fundamental limitation, when there’s some huge tech shift going on. And we said, “Okay, there are three huge companies you can build here.” That was our analysis of the field. A) You can build an Nvidia competitor, the next-gen chip that’s actually built for AI. And it’ll be very tough to do. Nvidia is not just a chip giant, they’re a supply chain giant, okay? Insanely hard to do, but if you hustle your way around, everyone in the industry wants to help you. And so it’s doable if you really excel on the business side.

Two was to build the next AWS. Like, there is an opportunity: because the workloads themselves are changing, there is an opportunity to build a new cloud. Very, very, very tough, because in that market there’s a default winner. If you all lose, the big three will still win, the big three plus Oracle and the other clouds. And the third was to create new experiences. Those new experiences will happen, and they’ll be drastic enough that the next trillion-dollar company can come out of them in five years, not in 30 years. And so we had to choose one to start with. We chose the experiences one. But a definite strong second was the “build an Nvidia competitor” one. And so we have that lingering thought of: one day we’ll get back to this.

Sonya Huang: I see why you two are friends.

Dean Leitersdorf: [laughs]

THE FUTURE OF CONSUMER ENTERTAINMENT

Sonya Huang: I want to close it out with one last question: if everything goes right, what is Decart in 10, 15, 20 years? And what experiences have you crafted? And what is the future of consumer entertainment? I don’t know if that’s the right market.

Dean Leitersdorf:  I’ll say this, and I’ll give credit to James from Sequoia here, because he’s the one who coined this term: “generative experiences,” GX, okay? And we call this “UX is dead, long live GX.” Okay? Basically, we’re gonna have new experiences that are generated in ways that match how humans want to interact with computers. And that encapsulates everything from Character AI as a generative experience to real-time video models that are generated experiences. And that’s what we’re gonna see. Decart, at the end of the day, is a generative experiences company. We’re implementing this by being fully vertically integrated, by having the systems layer. At the end of the day, you’re a generative experiences company. You’re creating the new wave of experiences that’s gonna touch every single person on the planet. And that’s where Decart is. Now the only question is whether it takes 10 or 15 years. In today’s age it might take less. It took a long time for the previous titans to rule the world. I don’t know if it’ll take that long this time. It’ll definitely take at least five years.

Sonya Huang: You operate on a different timescale than a lot of the best AI researchers that are in our orbit. And I really respect that about you. Should we close out with a rapid fire round?

RAPID FIRE QUESTIONS

Shaun Maguire: Sure.

Sonya Huang: Okay. Favorite AI app other than Oasis?

Dean Leitersdorf:  Has to be between ChatGPT and Character. Has to be between ChatGPT and Character.

Sonya Huang: What do you use Character for?

Dean Leitersdorf:  Not using Character.

Sonya Huang: Okay.

Dean Leitersdorf:  But it’s the basic notion that we’ll have these apps, entities that hold some kind of relationship, whether it’s friendship or whether it’s utilitarian, with hundreds of millions of people. I think that’s an insane platform that’s gonna be the basis for so many things going forward.

Sonya Huang: Yeah, I love that. Favorite AI company. Could be the same as the last answer.

Dean Leitersdorf:  Same as the last answer.

Sonya Huang: Same as the last answer. Okay, let’s see.

Shaun Maguire: When did you first program a computer?

Dean Leitersdorf:  First programmed a computer? When I was 13. Bots for RuneScape. Okay? Great game, RuneScape. I botted the hell out of it for years, until, six years in, I used a bot that I downloaded from the Internet, and 24 hours later I got banned.

Sonya Huang: Are we going to have AI-generated video games first or AI-generated novels? And I mean, at the level where I would actually pay for it.

Dean Leitersdorf:  You’re going to have—the first thing you’re going to have is a platform that lets other people use their creativity to create this content because AI is still far away from creating creative content.

Sonya Huang: Super interesting. Okay.

Shaun Maguire: Who’s your favorite scientist ever?

Dean Leitersdorf:  Favorite scientist? That one I like. That one I like. You know, there’s a reason we chose the name Decart. Okay, well, first of all, I’ll answer the question. My favorite scientist is da Vinci, because I think he was both an insane scientist and engineer, and somehow was able to get people to fund his projects, okay? If you go back to da Vinci, he literally was a great scientist and engineer, and somehow knew how to raise money from the VCs back then, which were kings. Okay? So yeah, definitely da Vinci. And Descartes and Tesla are close seconds. The reason we chose the name Decart was we looked at Tesla and we were like, “Okay, we love both that company and the name.” And we needed someone who resembles for us what Nikola Tesla resembled for the company Tesla. And for that, it was Descartes, because, you know, “I think, therefore I am” captures a lot of what AI is today.

Shaun Maguire: Brilliant.

Sonya Huang: Perfect note to end on. Dean, congratulations on what you’ve done. Thank you for joining us today. We love this conversation.

Shaun Maguire: Dean, I’m not gonna congratulate you. You haven’t done jack shit yet.

Dean Leitersdorf:  Right?

Shaun Maguire: Build something insane. But I love the sentiment.

Dean Leitersdorf:  We can’t celebrate until we really win.

Shaun Maguire: Yeah.

Dean Leitersdorf:  Okay? There’s no celebrating small wins.