
Founder Eric Steinberger on Magic’s Counterintuitive Approach to Pursuing AGI

In 2022, Eric realized that AGI was closer than he had previously thought and started Magic to automate the software engineering necessary to get there. Among his counterintuitive ideas are the need to train proprietary large models, the conviction that value will not accrue in the application layer, and the belief that the best agents will manage themselves.

Summary

Eric Steinberger, co-founder and CEO of Magic, is a prodigy who began his AI journey at age 14 and quickly rose to become a top collaborator with renowned researchers like Noam Brown. In this episode, Steinberger shares his vision for building AGI through automating software engineering, and offers insights on what it takes to compete in the rapidly evolving AI landscape.

  • General-domain long-horizon reliability is the key frontier in AI research. Steinberger believes that to solve this challenge, models will need to selectively leverage massive amounts of inference-time compute—potentially up to 1,000,000x current levels on a particularly significant token. He views this as possibly “the last big problem” to be solved on the path to AGI.
  • Vertical integration and model ownership are essential for leading AI companies. Steinberger argues that relying on third-party APIs puts companies at a significant disadvantage. He believes the most valuable AI applications will be developed in-house by those who control the underlying models. This conviction drives Magic’s approach to building their own large language models, despite the significant technical and financial challenges involved.
  • Effective AI researchers combine broad knowledge synthesis with the ability to make conceptual leaps. Steinberger emphasizes the importance of reading widely across the field, building a mental “database” of ideas that can be recombined in novel ways. However, he also stresses the need to generate entirely new concepts—“leaps” that push the boundaries beyond synthesis.
  • Building an AI company requires balancing research excellence with strong business outcomes. As a researcher-CEO, Steinberger highlights the unique challenges of leading an AI company, including the need to deeply understand the technology while also focusing on product development, team building and fundraising. He notes that success in this role requires a different mindset from pure academic research, with a greater emphasis on practical outcomes and business viability.
  • The ultimate goal for AI is to function as a colleague rather than just an assistant. Steinberger envisions AI that can take high-level direction and execute complex tasks independently, much like interacting with a top-tier human engineer. He believes this level of capability—where the AI can effectively manage itself and solve problems with minimal human intervention—is achievable within a “very small number of years,” though he acknowledges the challenges to overcome.

Transcript

Contents

Eric Steinberger: The thing that remains to be solved is general domain long horizon reliability, and I think you need inference time compute for that. When you try to prove a new theorem in math, or when you’re writing a large software program, or when you’re writing an essay of reasonable complexity, you usually wouldn’t write it token by token. You’d want to think quite hard about some of those tokens. And finding ways to spend not 1x or 2x or 10x, but 1,000,000x the resources on that token in a productive way, I think, is really important. That is probably the last big problem.

Sonya Huang: Hi, and welcome to Training Data. I’m delighted to share today’s episode with Eric Steinberger, founder and CEO of Magic. Eric has an epic backstory as a researcher, having caught the attention of Noam Brown and becoming one of his research collaborators while still a student in high school. Eric is known for his exquisite research taste, as well as his big ambition to build an AI software engineer. We’re excited to ask Eric about what it takes to build a full stack company in AI, his ambitions for Magic and what separates a good AI researcher from a legendary one. Eric, welcome to the show. Thank you so much for joining us.

Eric Steinberger: Thank you for having me, Sonya.

Vienna-born wunderkind

Sonya Huang: Okay, so let’s start with who’s Eric? You’re a Vienna-born wunderkind whose early passion for math turned into, I think, what you described as a full-fledged obsession with AI by age 14. Take us back to age 14, Eric. What were you up to? How did you become so obsessed with AI?

Eric Steinberger: Thank you, Sonya. I think I just had my midlife crisis when I was 14, and I just was looking for something meaningful to do. I spent about a year looking at physics, math, bio, medicine, just anything really that seemed valuable to the world, and at some point bumped into just simply the idea of AI. It hadn’t sort of occurred to me until then. And if you could just build a system, a computer system that could do all this other stuff for me, like, great, like, I don’t have to decide. So it felt like my decision paralysis was sort of resolved then. It was this weird moment where I could just see the next 30 years of my life unfold in front of me and I was like, “Okay, this is clearly what’s gonna happen. Like, I have to do this.” It was quite nice. I like predictability, so it was great to know what the world will look like.

Sonya Huang: And you started loving math. Why AI then?

Eric Steinberger: I think I’m naturally attracted to math. It’s just what my brain sort of gravitates to. AI just seems useful. The thing that’s most important to me is just what is useful for humanity and the world. And math is nice, but not useful at some point. Like, you know, 17-dimensional spheres are probably not going to be the best career choice if you want to be useful. So it seemed like something that I could get good at, but also just the most important thing ever. And so it was a very clear choice. AI just is like—it was clear 10 years ago. It’s just it wasn’t close, and now it’s close and clear.

Sonya Huang: Can you tell the story of how you got to FAIR? I think it is such an epic story.

Eric Steinberger: Sure. I mean, so when I started at 14, I didn’t really know how to program. I didn’t get into programming out of curiosity about computers. I just wanted to solve AI, basically. So after a couple years of just warming up on my own, I reached out to one of David Silver’s PhD students—David Silver being the lead researcher on AlphaGo at DeepMind. And the PhD student, I guess at that point he had graduated and sort of worked at DeepMind. I asked him if he could, like, spend a year with me, just every two weeks bashing my work, sort of trying to do some sort of like, super speed-up mini kind of PhD-like experience where I could just learn how to do research.

And I sent him this, like, giant email. You could print it out. I don’t know how many pages it would be, but it’d be a lot of pages, where I was basically just saying, like, “I want to build this algorithm you made in your PhD—sorry, I’m going to beat this algorithm you made in your PhD. Here’s a list of 10 ideas. I don’t know if they’re going to work, and I think I need your help to figure that out.” And then over a year, we eventually got there, and he was like—his name is Johannes. Johannes was kind enough to just bash me every two weeks, roughly. And yeah, it was brutal, dude. Like, because I was like, “Hold me to the standard,” you know? And I was like, “Don’t be nice just because I’m in high school.”

Sonya Huang: And you were in high school?

Working with Noam Brown

Eric Steinberger: Yeah, I was in high school. And then when we were done—I had just graduated high school when I finished, so this is when I finished the project that I was trying to get to with this. And then Noam Brown, who is obviously one of the best RL researchers in the world, reached out because he had worked on something similar, it turns out. And we sort of had some ideas that were very similar and some ideas that were a little different. So we just published this. And he reached out, and then I got to work with Noam Brown for two years, which was great. And then that continued, and so I got bashed for another year.

Sonya Huang: How did you get his attention? You were a high schooler, he was Noam Brown.

Eric Steinberger: Well, I mean, he published a paper called “Deep Counterfactual Regret Minimization,” and I published a paper called “Single Deep Counterfactual Regret Minimization,” and mine beat his by a little bit.

Sonya Huang: So you one-upped Noam Brown as a high schooler.

Eric Steinberger: I think I’d just graduated. And it also took him, like, three months to write this paper, and it took me a couple years, but yeah, I mean, slightly. I’m sure he, like, would have come up with this the next day, like, the sort of the gap between the two things. But it was just—yeah, it was like ‘obsession’ is the right word. I just—I do things like one hundred percent. Yeah, so that was a lot of fun. Kept working on RL with Noam Brown for a while and yeah, so that’s how I got to FAIR. Noam Brown worked at FAIR at the time, and he reached out. I was at university then, and basically just worked part time as a researcher at FAIR while studying, and anyway, that was it.

Sonya Huang: That’s awesome.

Eric Steinberger: It was a lot of fun. Noam is great. Like, the brainstorm ping pong sessions with Noam Brown? Dude, there’s nothing like this, where you would just think there’s this problem and it’s sort of like maybe you would start a six-month research project. Noam and I would get on a call and it would just be like, we’d just discuss it and it’s done. That was so great.

Sonya Huang: I love that. I love that. What makes him so great as a researcher?

Eric Steinberger: I think it’s a number of things. As a researcher generally more from a meta level, he is fantastic at picking the right problems, and then spending a long time just grinding to make it better and better and better and better. He’s very good at the whole, like, compounding thing in research. Also making bets that aren’t obviously the right bets when he makes them, because he makes them earlier, I suppose. So he’s generally very good at picking problems and then attacking them consistently. He’s also just very smart. I guess that helps. He works really hard. He used to do 100-hour weeks during his PhD. I don’t know if he still does them, but he used to work really, really hard during his PhD.

Sonya Huang: I imagine he still is. Okay, so Noam arranged for you to become a researcher at FAIR while you were still a university student.

Eric Steinberger: Yeah, that was fun. That was in my first semester, I think, or something. [laughs]

Sonya Huang: So you were juggling that. You were juggling being a collaborator to Noam at FAIR, and then you became obsessed with yet another problem, climate change. And actually started an NGO that is incredibly popular, ClimateScience. So you just didn’t have enough on your plate.

“I can do two things. I cannot do three.”

Eric Steinberger: No, it was actually too crazy. That’s when I dropped out. I was like, “This is crazy. This is too much. I can do two things. I cannot do three.” It was like my conclusion after doing—I did three months of that. That was terrible. That was fucking awful, doing all three things. Because it’s just like you just can’t do well at three things. I mean, Elon can, but maybe I’ll learn it in 10 years, but I couldn’t at the time. So I dropped out at the time. But yeah, I started an NGO. I generally—I just think, like, charity stuff is awesome and hugely underappreciated. Like, it’s sort of like super high status to start a startup, but I think it should be equally cool to start a charity. You’re, like, helping the world in other ways. And so yeah, I mean, we started it as a—it’s a nonprofit, but we started it like a startup. People were working insanely hard. We had clear objectives. It was a software product, effectively. It was much more similar to a startup, with the exception that there was no money in, no money out, which is very weird. But yeah, it’s mostly volunteer driven, or I guess it is. I just no longer run it.

But yeah, that was an interesting experience. You’d think I would learn—transfer a lot from running a quote-unquote “company” to running a quote-unquote “company” now. But they are so different that I could—like, there was like no transfer at all between ClimateScience and Magic. Like 1,000 volunteers, 20 hardcore engineers, no money at all. I can’t tell you how much we raised because it’s not announced yet, but giant. It’s completely different in every imaginable way.

Sonya Huang: Totally.

Eric Steinberger: But yeah, it was a lot of fun.

Sonya Huang: So Eric, ClimateScience became an incredibly successful nonprofit—it wasn’t just any nonprofit. What made you decide to kind of hand over the reins and hand over the torch on that and go start a company in AI?

Eric Steinberger: I just thought AGI was further away when we started it at all. I would never have started anything else if I thought AGI was so close. And once I realized it is, there was just like no other—I mean, my initial thing was always AI. That’s what I did as a kid.

Sonya Huang: Yeah.

Eric Steinberger: I care about various issues in the world, but none of them are my unique calling in any way. I just—you know, I’ll hopefully be in a position to donate a bunch of money and whatever, but the thing I care about fundamentally is AGI. And it was like, oh, damn it, this is not 20 years away.

So I have been running around with this AGI to-do list, which is somewhat of a meme internally, because it’s sort of like just going through it …

Sonya Huang: I think I’ve seen it.

AGI to-do list

Eric Steinberger: … and we’re trying to fix all these problems. You have seen it. Yes, we showed it to you. And I’ve been running around with a version of this. There’s actually like a—in 2017 or so, I was still in high school. I don’t know why, but some conference invited me to present my AGI to-do list. It was wrong at the time. I was also sure it was wrong. But at some point, there was one thing I just couldn’t at all figure out. And I don’t like blue sky research in the sense of, like, just staring at a wall and trying to figure out what the right question is. I really like to have the question and then look for the right answer when starting an intense project, because you need to know which direction you run in to really plan for it.

And many things seemed clear, but it seemed completely unclear how to make these models reason in the general domain. And that became more clear with language models, especially code models. And so yeah, when I saw just some of these early—some of the early results in this, basically, I was like, “Okay, I know all this stuff from the RL world. I have a bunch of other thoughts. This seems great.” Like, we should just take LMs and make them do the RL stuff. It’s a very simple kind of proposal, but I think that’s sort of where—I mean, it makes a lot of sense. RL has been doing this for 10 years. It works in restricted domains. If you can make something work in 20 restricted domains and you have something else that works in a general domain, if you can combine them, maybe you get both the X and the Y axis, and then you have your beautiful top right corner of the matrix.

And yeah, so it seemed pursuable. And if something—when something becomes—when something as important as AGI becomes an actually executable to-do list—obviously there are still things to figure out, details of the algorithms, how do you make it efficient, et cetera, et cetera. Like, it’s not like we knew everything at all. Many, many things to figure out. But the direction was clear, so it seemed like the right moment.

Sonya Huang: Okay, we’re going to circle back to your AGI to-do list later because I’m curious about it.

Eric Steinberger: Sure.

Sonya Huang: I want to brag about you for a minute …

Eric Steinberger: It might be wrong still. I don’t know. Until we have AGI, it is a hypothetical AGI to-do list. But we’re trying.

Sonya Huang: I think the research field is tracking pretty closely to your to-do list. I want to brag about you for a minute. I think you’ve been incredibly humble about your background, but as a high school student, you did catch Noam Brown’s eye. And, you know, as one of his colleagues at FAIR, you became one of his top collaborators. Not even just one of many, because there are such talented people who work there. But you’re one of his top collaborators. And, you know, when I speak to folks that know you, they just say extraordinary things about your capabilities as a researcher, your creativity, your work ethic. As far as I can tell, you work nonstop. I think you texted me at 2:00 am in preparation for this podcast. So I think it’s safe to say that …

Eric Steinberger: Hope I didn’t wake you up.

Advice for young researchers

Sonya Huang: No, no. Thank you, silent mode. Anyways, I think it’s safe to say that you are one of the brightest minds of the current research generation already, and will certainly be one of the legends that people talk about for the next decade. And so with that in mind, I’d love to ask you some questions of advice for aspiring researchers. And so maybe first off, you did it all from a very untraditional background. How did you do it? And do you think that—what advice would you give to others in your shoes?

Eric Steinberger: I can only really speak for the sort of profile of goals and person I am. I think I was lucky in the sense that I knew very, very early—I was 14, as I said—exactly what I wanted to do with my life. I had no doubt at all. And uncertainty can be paralyzing to a lot of people. I also had a very clear sense that I did not at all have a plan B. Like, there was no other path in life that I would have been even, like, remotely above the neutral line on. Like, it had to be build AGI. Everything else is completely irrelevant. So I understand for many people, a well-paying job at Google is a great achievement. I mean, if it’s on AGI, it’s fine, but you get what I mean. Like, I just knew that there was nothing else I could do and be fulfilled in life and look back when I’m 90 and be happy.

So in a way, burning the boats very, very early gives you the opportunity to just be—like, do things that you’d otherwise do 10 years later. Which again, even—like, I sucked at the beginning. It took me two months to understand the first paper I tried to understand. Like, I was terrible at programming for a long time, but when you’re a teenager, you’re, like, a decent researcher, you don’t have to be great. Like, that gets you things like a great mentor who then bashes you for a year, which was very, very helpful. And then you get better, and you’re still young, so your brain shapes more easily, maybe. I don’t know. So I feel like I benefited a lot from being early, but within that, I’d say just go for the end goal immediately. Doing anything sort of like, “Oh, I’m gonna do a PhD, because I need a PhD to get a job.” That’s all bullshit. Like, you don’t. It’s just completely bullshit.

The other thing is, like, writing five-page emails to people actually works. Writing, like—I get a lot of these, like, two-paragraph, meh things now. I’m grateful I get emails, but I understand now why people think this stuff doesn’t work. It certainly does when you’re like, “Here is how I’m gonna beat your algorithm. Please help me.” Five pages, at least in my experience, every single time anyone I wanted help from in this way was very helpful. So I suppose be proactive in seeking, like, the best people in the world to, in a time-efficient manner, just distill their brain into yours, and show them that you can make use of that. If you tell someone who’s very good, effectively, “Hey, I’m gonna make good use of this. If you want to coach someone, I would love to be that person,” they’ll usually do it. They won’t do it for 10 people, but if they do it for one or two, that’s enough. You just have to win that seat, I guess. So that’s been really helpful in my experience.

Also just not shying away from learning new things. Again, I didn’t get into programming because I’m curious about computers. I’m not very curious about computers. I just like AI, and the computers are the thing that are necessary. So it’s fun. I enjoy programming now. It’s great, but I wouldn’t have gotten into it, I think, if it wasn’t for AI. But still, you get into it, so don’t be shy. We interview a lot of people who don’t know how you’d implement an LLM. And it seems kind of crazy to me if you’re a researcher and you couldn’t implement sharding or whatever. Like, it’s just insane. So really understanding the whole stack going down to—but sort of not bottom up, really top down. Like, here’s the thing I care about. This is the problem I want to solve. Okay, like, what do I need, what do I need, what do I need? And then all the way down.

And there are much more competent people at kernel programming and hardware design or whatever than I could ever dream of being, but I understand enough of it to do better work at the top of the stack than I could if I didn’t. So I think fundamentally you need to understand the domain you work in. It’s also really good to just read everything. I used to read—I don’t know, I don’t have a precise number, but just every paper I could. Every paper I would see, basically. And eventually you get so fast at it that it’s feasible, and you build a database in your head of, like, “Oh, this is similar to this thing.” This was sort of like, my eye-opening moment where Bill Gates has this interview like, “Oh, yeah, if you learn enough things, they’re all similar to each other. So it’s not linear, it gets easier.” And at that point I was like, “I should read every paper.” And so thanks for the advice, Bill. Obviously through a video, I never met him.

But so I just started reading every paper. And that’s really, really helpful because a lot of the best ideas that we had that work really well now at Magic were enabled by random things that are like, “Oh, it would never work without this random thing that I would have to have come up with in tandem.” But because I have this database in my head, I can go like, “Oh yeah, like this.” And then so often one good idea is enabled by three other ideas that others have come up with. And so it’s always just like this composition of stuff. So having a large database is really helpful. Yeah, and then just never stop. Like, never, never stop. It takes, like, ages to do good stuff, to do good work.

And at any point—there was actually one moment with Johannes, the DeepMind research scientist who mentored me for a year in high school, where we had a version of the algorithm that wasn’t very good. It was all right, and we’re thinking like, “Ah, should we publish this?” Like, we were both not really happy about it, and he was close to giving up on me. It was like, “Well, maybe this is just not gonna work. I wouldn’t want to publish this.” I was like, “Dude, fuck you, I’m just gonna get this done.” And then we got it done a month or two later. And so I think—I remember going on a walk after this and just being like, “Can I do this? I don’t know if I can do this, but there is no other option. So I just better get it done.” [laughs] And then I went back home and I started programming again. It was still sad that day, but the next day was fine again, and just kept going.

So I think you have to—I think that was a pretty formative experience because I actually wasn’t sure if I could do it. And then we just did it, like, super soon after. So I really haven’t felt that insane level of, like, doubt and pressure since then, which has sort of enabled—it’s actually beneficial. You have to be realistic, but you don’t want to—if you stop, you—yeah, I mean, so anyway, so I think those would be the main things. Also be like really fucking honest about what you suck at to yourself because otherwise you’re never gonna get good at it. Like, you need to search for the bad things, and instead of trying—actually, yeah. I think as a researcher, betting on your strengths is good only to the extent that you don’t have necessary conditions that are completely missing. Like, you can’t bet on your strengths if they’re not enabled. Again, back to the engineering thing, for example. So yeah, I don’t know, I’m rambling. But stuff like that.

Reading every paper voraciously

Sonya Huang: No, that’s great. That is such a fascinating glimpse into the inner mind of what it takes to be a great researcher and behind all the glamor of training large models. And so thank you for providing that peek. I’m really glad that you mentioned this kind of reading every paper voraciously and having this database in your head because one thing I’ve heard from your collaborators is that your superpower is understanding and absorbing new research. And so I’m curious, do you agree? Like, do you think that is your superpower as a researcher, or what kind of traits do you think have made you such an exceptional researcher?

Eric Steinberger: So I think initially in the RL work I did, it was synthesis, where I would read every paper and I would go, like, “This thing plus this thing plus that thing with this modification.” I think that’s what they would mean. That, yes, was definitely very helpful. I think it’s a good way to do research. Generally there’s enough work for synthesis to be a successful strategy. I guess to an extent it’s still that. I tried very hard after—this is actually interesting that you bring this up, I realized this, and I tried very hard to get better at leaps. Like, coming up with totally alien crap that just there’s no reference for it at all.

Because ultimately, like, so if you take a transformer, for example, right? Like, attention existed. The idea of stacking a bunch of LSTM blocks existed, and you just have to remove the idea of recurrence, really. And a bunch of—couple other things that were necessary, right? Residual streams, like the residual update in transformers, existed from ResNet. So it’s synthesis, but there is an amount of leap in there to make it all work. Like, it’s a little more complex than just taking components and putting them together. You need to come up with new things too, like the normalization and the head projection. But anyway, everyone now knows this. But roughly, you should do this. There’s some new ideas in there that really help make it work. But it’s still a large amount of synthesis. So I suppose, like, most good ideas are synthesis, but there are always some—in the best ideas, there are some leaps. And I’m trying to get better at those. But still, it’s mostly, I guess, like, take five things and throw away the stuff that doesn’t work in them. Make the things work and configure. But yeah, I think some stuff needs leaps. But yeah, I guess, like, no, that’s a recipe. Like, take LLMs. Make them super efficient. Make context giant—throw RL on it. Make it all work together. It’s still mostly synthesis, I guess. You’re right.
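
As a rough illustration of the synthesis Eric is describing, here is a minimal pre-norm transformer block in PyTorch: attention plus a feed-forward layer, each added back onto the residual stream after a layer norm. This is a generic sketch for readers, not Magic’s architecture, and the dimensions and module names are arbitrary choices.

    import torch
    import torch.nn as nn

    class TransformerBlock(nn.Module):
        """Sketch of a pre-norm transformer block: attention + MLP,
        each added back onto the residual stream after a LayerNorm."""

        def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
            super().__init__()
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Self-attention sub-layer (the "attention already existed" piece),
            # projected back to d_model and added to the residual stream.
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
            # Position-wise feed-forward sub-layer, also residual.
            x = x + self.mlp(self.norm2(x))
            return x

    if __name__ == "__main__":
        block = TransformerBlock()
        tokens = torch.randn(2, 16, 512)  # (batch, sequence, d_model)
        print(block(tokens).shape)        # torch.Size([2, 16, 512])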

Sonya Huang: Who do you admire most in the research world? And, like, what do you think those folks’ superpowers are?

The army of Noams

Eric Steinberger: Shazeer. Noam Shazeer. Yes. What is his superpower? I guess to an extent synthesis. He is—I mean, he’s just the best at synthesis. He’s also great at everything in the stack. He can—he has no weakness, really. Like, he can implement the whole thing on his own if he had to run it. He sees the future, I think, in a way, like, he’s very unconstrained. And I think everyone’s sort of crediting a number of the labs for scaling laws. This guy made a presentation where he was zipping through essays or completions or whatever written by models of various scale, and it was like, “This is 100-million parameter model, this is a 300-million parameter model, this is a billion parameter model, this is a five-billion parameter model.” This is on YouTube somewhere. It’s hilarious. And he goes, like, “What if we make this bigger?” He sort of presented it this hilarious way, and then everyone else is super scientific about it.

I think Noam is generally just, if I had to put it, he’s very intuitive. I think a lot of labs and researchers are sort of—and I think this is not a bad thing. It’s very good. Are very evals driven, very mechanical, right? Like sort of very empirical in a way. Like, Noam sort of just knows. He’s like, “Ah, this would work,” and then it works. So I think that’s a superpower, that—it’s just extremely great synthesis. He has the larger—he has a larger database because he’s been around for so long, he just—he literally knows everything. I mean, he invented half of the stuff that everyone’s doing now. There’s no one who compares.

I’d say there are a number of other people, I guess. Just you shouldn’t feel—out of all the people who are sort of the OGs of deep learning, I think Hinton deserves by far the most credit just because he went through all the bashing when it was like, “Oh, this will never work,” and they’re like training, like tiny, tiny, tiny, tiny, tiny things. They’re like, “This will never work.” And he somehow stuck with it. I think that level of grit and belief in something that is now obviously working deserves a huge amount of credit, whether capsule nets work or not, whatever. He—like, you know, it’s incredible to come to something like the conclusions that the world is at now. And if you look at some of the older papers, a lot of the ideas that are important now were in there already, so that’s important. And I think he just deserves a ton of credit. Noam Brown had—the army of Noams. Noam Brown.

Sonya Huang: I should name my kid Noam, is what you’re saying.

Eric Steinberger: It’s a very good strategy, yeah. It’s a great strategy, actually. I think 100 percent of Noams that are somewhat popular and well known in the research community are great. Yeah, no, he’s—he’s also amazing. I mean, a number of labs were working on what he was working on during his PhD, and he basically soloed the thing and was like, way better and way faster than labs that put 10 people, including some really famous names on it. And if you just look at the paper trail and track record, like, here’s the rest of the field, and then Noam’s 100X efficiency. And then here’s the rest of the field. Noam does it again. And it’s just consistent. I think the consistency with which he has just bashed out these 100X multipliers in RL data efficiency and compute efficiency is crazy. The Noam army is pretty good.

The leaps still needed in research

Sonya Huang: I want to go back to this concept of leaps are still needed in research, and that you still have this AGI to-do list. What do you think are the most interesting unsolved problems in AI right now?

Eric Steinberger: Well, so a lot of it is solved now, I think. And the thing that remains to be solved is general domain long horizon reliability. And I think you need inference time compute, test time compute, for that. So you’d want—when you try to prove a new theorem in math, or when you’re writing a large software program, or when you’re writing an essay of reasonable complexity, you usually wouldn’t write it token by token. You’d want to think quite hard about some of those tokens. And finding ways to spend not 1x or 2x or 10x, but 1,000,000x the resources on that token in a productive way, I think, is really important. That is probably the last big problem.

Sonya Huang: Fascinating. The last one. Okay.

Eric Steinberger: I hope so. I think it’s reasonable to think that is the last big unsolved problem. I mean, look, over the last few years, all of this other stuff got solved. Like, oh, can we do multimodal things? Can we do long context? Can we do all this? Now it’s gone. Reasonably smart models, you know, they’re quite efficient now in terms of cost. I mean, you’d have to be a reality denier to not see what’s coming.

Sonya Huang: So then, to what you were saying earlier, it’s kind of like the results from gameplay, like poker or Go. Like, if you let the machine have a lot of time to think at inference time, it can do way better. That’s what you meant by …

Eric Steinberger: I mean, this is just—this is like a realization to a lot of people in the LLM space. But, like, RL has been doing this for, like, ages. So it’s just, like, so clear that you need to do that. Or maybe you don’t need to. Maybe you can get away without doing it, which would be insane. But if you don’t need to, it will still help you a lot. Like, it’s just like, do I want to spend a billion dollars on my pre-training run and then, like, a little bit more money on inference? Or do I need to spend $10 billion on my pre-training run? You know, like, $10 billion would be great, but I’m gonna be—I’m gonna prefer spending one.

Sonya Huang: And is bringing, like, the LLM and RL worlds together, is that like, a research problem? Like, there’s still, like, fundamental, like, unsolved science problems? Or is that like a, you know, we have the recipe, we just need to do it and have the compute and the data?

Eric Steinberger: I think there is no public successful recipe right now. There are good ideas. Like, okay, even if you take best of N, make N large enough, it’s sort of—you know, it’s not terrible.

Sonya Huang: Yeah.

Eric Steinberger: So there are ideas. I don’t know that the final idea exists. I think there’s just a lot of room up from what is currently known. But there are ideas. See, I think it’s very likely that even if you stopped progress in research, we would still, at some point, hit something that everyone would agree is AGI. It’s just that I think we can do better. And maybe it couldn’t solve Riemann, right? Maybe it couldn’t do all these super hard things, but it’d be pretty good. And now I’m just curious, okay, like, what’s the actual—if we did all the things, how good will it get? So I think there is research left to be done, and there are a lot of ideas floating in the world now. Everyone’s sort of working on this, but I don’t know that the current set of ideas is even final. It’ll keep moving, I think.
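
Best-of-N, which Eric mentions above, is the simplest way to turn extra inference-time compute into reliability: sample N candidate answers and keep the one a scorer prefers. A minimal sketch; the generate and score functions here are toy stand-ins made up so the example runs, not any particular model or verifier API.

    import random

    # Toy stand-ins: in practice, generate() would be a sampled LLM call and
    # score() a verifier or reward model. They exist only to make this runnable.
    def generate(prompt: str, rng: random.Random) -> str:
        return prompt + " answer_" + str(rng.randint(0, 9))

    def score(prompt: str, completion: str) -> float:
        # Pretend "answer_7" is the one correct completion.
        return float(completion.endswith("answer_7"))

    def best_of_n(prompt: str, n: int = 16, seed: int = 0) -> str:
        """Spend n times the sampling compute, keep the highest-scoring candidate."""
        rng = random.Random(seed)
        candidates = [generate(prompt, rng) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))

    if __name__ == "__main__":
        # Larger n means more compute per answer and a better chance of a correct sample.
        print(best_of_n("2 + 5 ="))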

What is Magic?

Sonya Huang: Yeah, let’s transition to talking about Magic. Maybe just what is Magic? You’ve been very mysterious to date, so maybe just share a little bit about what you’re building.

Eric Steinberger: Yeah. I mean, we’re trying to automate software engineering. It took us a while to figure out how to train supergiant models. That’s a pretty interesting engineering challenge. I mean, fundamentally, we’re trying to automate software engineering from the product side, and a subset of that is a model that can build AGI. Because it’s like, if it’s a great software engineer, then it should be able to do everyone’s job at Magic. Like, if it can do everyone else’s job, that would be a subset. So the idea is that you could use this to recursively improve alignment as well as the models themselves in a way that isn’t bottlenecked by human resources.

And there aren’t that many Noam Shazeers in the world. If I had a Noam Shazeer on my computer, I could spin off a million of them and maybe alignment would just be solved. I’m simplifying a ton and very idealistic in the statement. I’m happy to turn this whole thing into a scalable oversight podcast if you’d like, but the core idea is like, okay, if I could just clone what we are doing into a computer and then press ‘yes’ on the money button to run a cluster to do the work we would be doing next week, that would be phenomenal.

So I think we’re pursuing these two things in tandem, where we want to ship something that’s a good AI software engineer for people to use. It’s like, I think, going to be one of the first domains to see higher levels of automation. And I don’t like talking about—I don’t think the whole assistant pitch is going to last very long. Once these models are good enough to automate, there’s just no way the economy is not going to do that. And I think everyone knows this and they just don’t like talking about it. It’s totally fine. We used to all be farmers. We’re not farmers. We’re fine. Everyone prefers this. I think we’ll figure out—figure our way out in the economy. If it produces the same or more stuff with less inputs, we should be able to figure that out. That’s not a hard problem from economic principles. You just have to figure out distribution. Anyway, but that’s what we’re trying to do. We’re trying to automate software engineering and as a part of that, automate ourselves in doing the work we want to do.

Sonya Huang: And so the reason to go after software engineering, then, is that it’s the kind of lever that allows you to automate everything else.

Eric Steinberger: It’s like the MVP of AGI, right? Like the minimum viable AGI, because then it creates everything else. We wouldn’t train something like Sora. Sora is great. Fantastic, generates videos. Awesome. It’s just not interesting from an AGI perspective if you believe that models can code themselves soon.

Sonya Huang: Totally. And so out of all the companies that are trying to build an AI software engineer, you’re probably the only one that’s really taking a vertically-integrated approach and training your own models. And that is either insanely brave or insanely crazy, and probably a combination of both. I’m curious, like, I know you love training models and so I know that’s part of it, but why do you think you need to own the model to get this right? And how do you motivate yourself in kind of the David versus Goliath of knowing that OpenAI exists and has great people and cares about coding and is great at building models, obviously. How do you think about that entire dynamic?

Eric Steinberger: I think you need—well, to build the best model, you need to build the model. And we want to solve these fundamental problems. You can’t rely on any—like if the API guy solved it, then what the hell are you? We might as well start the company three years later. It goes back to when we started, right? We started working on this stuff two years ago. So we have—it took us some time to learn how to train these large models. I think it took OpenAI two years to get from GPT-3 to GPT-4 as well. And I thought we could be much faster, and this is going to be great. It’s a pain.

So it’s definitely an engineering challenge, but it’s necessary. Like, it’s not like we’re doing it just because it’s fun or because I like training models. It’s a massive financial investment that people trust us with. And it’s not like it’s one of those like one-to-one ROI investments. It’s like, if it works, it’s fantastic, and if it doesn’t work, the GPUs have run and the money is gone. So, like, you’re getting a lot of people’s trust doing that. It’s certainly not something you should do just because it’s fun and you enjoy it. Fundamentally, I think the value will accrue at both—at the AGI and at the hardware level, and never at the application level. There’s no incentive at all to offer an API. If the API creates a $100-billion company, you will just build that company internally. If OpenAI doesn’t, someone else will. It’s just unimaginable to me that that would be how you would build these companies in the first place. From a business perspective, I don’t think that’s necessarily the right way. Maybe there’s some partnership potentials. You could go, “Oh, we’ll get special access or whatever.”

Sonya Huang: But why is that different from cloud computing, right? Like, there’s been many $10-billion or $100-billion …

Eric Steinberger: I mean, it’s much, much, much harder to build Netflix and Airbnb and Uber than it is to build a chat interface. Like, fundamentally, Magic is an application you press download on that we have a couple guys working on and it’s just there. Like, it’s not—you know, you can build this with, like, YC pre-seed money.

So the moats—the moats in—I guess I could just make the API twice as expensive for the next model, and then launch my own product and then undercut every—it’s really fucked to not own the model in this domain, and in any domain that’s going to generate a ton of revenue for a single company. In the case where it’s distributed, maybe it’s fine, but I don’t think this will be. So it’s necessary both for the market, which is good for us, because the market is incentivized to fund folks like us, which it isn’t in other domains. Like, have fun writing an email assistant. You’re not gonna get that funded anymore. So that’s helpful. But fundamentally, the reason we train our own models is because it’s necessary for our mission. And I just wouldn’t be interested in building a nice little SaaS wrapper. It’s just not—like, that’s going to happen anyway.

Competing against the 800-pound gorillas

Sonya Huang: And how do you think, though, about competing against the 800-pound gorillas? Like, you’ve raised a lot of money, but some people have raised boatloads of money.

Eric Steinberger: Yeah, they raised a lot more money, too. Some people have $100-billion+ in revenue a year that they could spend. It goes beyond even the ones who could raise. Yeah, absolutely.

Sonya Huang: And so how do you—like, how do you motivate yourself to compete in that reality?

Eric Steinberger: The question is how much does it cost to build AGI and not how much money can you raise? Because if you can build AGI for however much you can raise and you’re—having more might help you, but it won’t get you there substantially sooner, right? Like, if you have all the right ideas and you can build it with a certain amount of hardware, like, by definition, like, okay, if someone had, like, 100 times more hardware, would it be, like, computing that much faster or whatever? But it doesn’t seem like a material advantage if your estimate for how much compute you need to build AGI is, in fact, much lower than the revenue these companies could generate or the funding they could raise.

And I think that is the case. So it’s not by any means accessible. It’s very damn hard to get that much money, but it’s not $100 billion. And if I’m wrong, I’m wrong and it’ll be $100 billion and we will not have $100 billion, and that’s it. But if we can get to that point where we have AGI and a couple others have AGI, and then, like, the sort of the benefit of additional compute there, and you show an ROI, it’s like a reasonably even playing field in terms of additional revenue. You’re just—you’re gonna bring AGI to the market, you’re gonna raise more on it. So the starting conditions of, like, half this hardware is, like—and you need sufficient hardware, but you don’t need more than sufficient. So that’s a bet. That’s not a—you don’t know, but I think it’s a bet with a high enough probability of being right that it is reasonable to compete in the space. And I think it is actually—it is reasonable to think that, like, the ROI of having, quote-unquote, “sufficient” funding might be better than the ROI of having, like, infinite funding early on. For investors, that is. That is not for me.

Ideal team size for researchers

Sonya Huang: [laughs] Is there an ideal, like, team size for researchers? Is there a certain point at which you reach kind of like, diminishing marginal returns of adding on an extra researcher?

Eric Steinberger: So one of my biggest weaknesses, especially early on at Magic, was just scaling the team effectively. Like, we were very single-threaded on a very small number of people doing basically all the work. And I think we’re getting better at that now. It’s also, you just need a certain level of maturity of your code base and of your research ideas and everything to properly segment them. So early on, I would have said five for that time. Now I would say closer to 20. And I’m not including folks working on other stuff. I’m including folks working on the models and everything. I’d say closer to 20. I could imagine that in a few months, I’ll say at a slightly larger number. Especially when you get into large scale deployment, you really want to have very, very good processes around just having high reliability, availability of services that are detached from each other, et cetera, et cetera. So then you can segment even more, which is obviously stuff we’re working on now.

But it sort of grows over time. I don’t see it ever exceeding, like, the tens of people, and right now it’s in the low tens, very low tens. But I don’t know, maybe—it’s a skill to be able to utilize. If you’re able to utilize 200 people, you’re just a better CEO than I am. No, seriously. If you can, it’s a good skill. And I think part of why I say a smaller number for us is that there is a ton of stuff we just don’t do. Like, if we built a video model, that would just be a separate team. They built a video model and like, you know, that’s more scaling. So to an extent, we’re more focused, and that’s why we’re smaller. But also, if we could double the team and be twice as fast, I would do it any day.

AI that feels like a colleague

Sonya Huang: Back in, was it late 2022 when I first met you? At the time, it was marketing assistants and email assistants were all the rage. And you were the first pitch that I heard that was AI that feels like a colleague. And I just remember that really sticking in my brain. So in some sense, you’ve been thinking about kind of like agents, to use a buzzword, longer than anyone else. Maybe share your vision for that and what you think it takes to build a great agent.

Eric Steinberger: Fundamentally, there are two tiers here, I guess three. One is useless. The next is assistant that you have to micromanage. And then the next is the thing that manages you, basically, where it’s more like a colleague. I think the layer where it’s exactly even doesn’t really exist because it’s sort of like this little thin point. Once the model is more competent than you are, you are there to give it guidance on what you want to be accomplished and answer clarification questions, but you’ll never have to tell it, like, “Here’s a bug.”

I’m not saying that this is V1 of everything. I’m not saying this is V1 of our product, but fundamentally, that has to be the goal. The way I feel when I talk to my best engineer, that’s how I want to feel when I talk to Magic, where we have a discussion, he’s almost always right, and then he just writes the code, and then someone else reviews it, and then it works. Like, that experience, where my job is exclusively saying, “Here’s kind of what I want,” and then they help clarify, even, right? Like, I just want to hear specifically that. Like, that—it should feel like that. And everything else doesn’t matter to the user. Like, what tools the agent uses, how it works, does it run locally? In the cloud? Does it need a VM? Does it have a browser? I don’t care. Doesn’t fucking matter. Our problem, not your problem. You care about your problems getting solved.

So fundamentally, that’s what I think matters to customers. And everything else is dependent on the exact product shape, exact domain, exact everything. And, like, I’m stubborn as fuck, and I just don’t want to launch anything that isn’t that. We will probably have to, but I just really want to get that thing. Like, I want to talk to my computer, go and have lunch and come back, and it built AGI. Like, that’s the end goal, right? And there’ll be checkpoints, but I don’t think anything else matters. How you accomplish that is up to each individual company.

Sonya Huang: Yeah. How far away do you think we are from that? Or I guess maybe to break it down into a little bit more …

Eric Steinberger: I mean, we met in 2022. You learned how to extrapolate Eric’s timelines. So maybe yeah, one and a half or double everything I say. But I think very soon, like, very small number of years. I don’t want to give a number now, but very small number.

Sonya Huang: Less than 10.

Eric Steinberger: Oh, definitely less than 10. I mean, way less.

Sonya Huang: Wow. Okay. Because I’m seeing some of the, like, the SWE-agent stuff that just came out. They’re at, like, 14 percent on SWE-bench, which feels …

Eric Steinberger: I mean, they’re not doing any inference time compute.

Yeah, yeah, yeah. I mean, 14 percent. I just don’t care about 14 percent. Like, I mean we—like, I don’t know if 80 or 90 is good enough. I think you need 99. Like, even 96, I don’t trust my computer. I don’t want to review the code. The tier of products where I have to review the code is fundamentally different from the tier of products where I don’t have to review and understand the code. And you’re not talking about 95 when you don’t want to review, you’re talking about 99-point something. You’re talking about whatever my developers accomplish, plus some—same as with self-driving cars. So the difference with self-driving cars is like, you die if the thing crashes, and here you just have to review code. So it’s launchable before, but fundamentally you need way, way, way more. And, like, usually the last few, right? The nines are hard to get. So yeah, but no, I think you can—I don’t know, people have—I mean, models have surpassed all these benchmarks. I mean, just recently, the math benchmark, right? Like way faster than even, like, the prediction markets assumed. And, like, I don’t see that stopping. There’s just too much. Like, if everyone was stuck, and, like, I realize there’s some perception in the public that, “Oh, GPT-4 is only—not getting much better.” No, nobody’s …

Lightning round

Sonya Huang: Okay, we’re gonna close out with a lightning round. One-word answers. One: What’s your favorite AI app, not Magic?

Eric Steinberger: Probably all the invisible ones still. Like my spam filter and all that stuff.

Sonya Huang: Love that.

Eric Steinberger: The things that keep life working, I think, are still at the moment more useful than the sort of AGI-like apps, because if you took them away, life would just be awful. Like, recommendation algorithms for whatever, I think that’s really useful. Other than that, yeah, I think, other than, let’s say, the programming world, other than Magic, I’d say whichever model is currently best. It’s a very boring answer, but I actually pick the spam filters, et cetera, the recommendation services first.

Sonya Huang: What paper has been most influential to you?

Eric Steinberger: I don’t think this paper is relevant at all in the world anymore, but it was the first paper I ever tried to deeply understand, like, or spent months on it and reimplemented it and everything. And so it was most influential to me as a person, not so much to my current work. And the paper is called “DeepStack.” It’s one of those neural networks plus imperfect information game solving papers. It’s reasonably complex for the time. Yeah, so for folks who are interested, it’s like nowhere near SOTA now; it’s sort of just an irrelevant type of algorithm at the current time. Back then, it was useful. So that was very influential for me because it was just my first touch point with research, really. I had no idea how to do research at all. And then I sort of just was like, “I’m gonna dig into this,” the way people, like—people, like, hyperlink spam on Wikipedia, where you rabbit hole? I did that with this paper.

Sonya Huang: I love that. Okay, that’s going to be my weekend reading. Last question. What are you most excited about in AI in the next one, five and ten years?

Eric Steinberger: Just what it’s going to—how society is going to integrate with it. I think we’re getting to the point now where, over the next one to five years, it’s really going to impact how society does stuff, beyond just another tab in your browser that speeds you up by some percentage on some tasks. I think it’ll get much more significant in that timeframe. And ultimately, you should—I am not one of the intrinsic curiosity type of people. I know most researchers are. I really am not. I just care about the outcome, and that is the outcome. So most excited for the outcome.

Bonus round: 200M token context announcement

Sonya Huang: Eric, thank you for joining us again. Last time we recorded the podcast, we weren’t actually able to talk about the thing that got us so excited about Magic, which was you had shared with us your long context eval, and our own kind of AI researchers had gotten really excited by what you’d accomplished on that. And that was actually what led to us investing in Magic in the first place. So you just made some exciting new announcements around the eval. I was hoping you could share it with our audience.

Eric Steinberger: Yeah, for sure. Thank you so much. Yeah, I mean, we’ve been running around with this hashes eval for a while, basically just being frustrated by needle in a haystack evals, and everyone keeps complaining about it. And now that we’ve decided to announce where we’re currently at in terms of our context work, instead of just, blah blah, talking about how we have so many tokens of context, it felt reasonable to share the eval as well. I mean, we’ve used it in our fundraising, obviously—and thanks for backing us—and generally just used it to guide our architecture development and our research. So yeah, it felt right to open source it and let others compare their architectures and their results with ours. And then it’s exciting to share. And thank you for having me back on to talk about it.

Sonya Huang: Thank you. Can you say a word on what’s broken about needle in a haystack and what your eval does differently?

Eric Steinberger: Yeah, for sure. With needle in a haystack, basically what you’re testing is like, find this weird thing, the needle, in this giant pool of not weird stuff, the haystack. And so really, all you need to be able to do to do this is to sort of take a little backpack and walk from the start of the context to the end of the context and find the weird thing, put it in your backpack and return it. You have to have this implicit prior that there is this thing that’s weird, so you’re more likely to remember it, which means that you actually don’t need to remember the whole context window. You don’t need to know all of it.

So that allows some models, I would say, to look like they’re doing long context really well, when really it’s not working as well. So we decided to just go the complete opposite, super hardcore mode and just replace everything with random noise. There’s no semantic information at all because it’s just randomly generated letters, basically just hashes. And if you did something like needle in a haystack in a pool of hashes, you really have to know the whole thing. But then what we do is we also do a hop, so it’s not just you find this one thing, but you find this one thing, and then you find another thing, and obviously you can keep that going. But those two dimensions, I think, really are the important quantitative components of context. 

There are other things you can measure much better in more domain-specific evals. Of course, we care a lot about code, and so we look a lot at that, too, internally. But I think from a general purpose context evaluation perspective, and the reason we chose to open source this eval and only this eval is just that. I think this quantifies exactly what you want to measure when you think about long context, and everything else is domain specific. But yeah, you want to be forced to remember the whole context window when you’re talking about the context window. Otherwise, is it really that big?
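
A rough sketch of the kind of evaluation Eric is describing: the context is nothing but random hash-to-hash assignments, and the query asks the model to hop through a chain of them, so it can only succeed by attending to the whole window. The prompt format and function names below are my own illustration, not Magic’s released eval code.

    import random
    import string

    def random_hash(rng: random.Random, length: int = 16) -> str:
        alphabet = string.ascii_lowercase + string.digits
        return "".join(rng.choice(alphabet) for _ in range(length))

    def build_hash_hop_prompt(n_distractors: int = 1000, hops: int = 2, seed: int = 0):
        """Build a context of random key = value hash pairs plus a multi-hop query.

        Because the context is pure noise, there is no 'weird needle' to latch onto:
        the model has to actually track the relevant pairs across the whole window.
        """
        rng = random.Random(seed)
        chain = [random_hash(rng) for _ in range(hops + 1)]
        pairs = [(chain[i], chain[i + 1]) for i in range(hops)]  # the chain to follow
        pairs += [(random_hash(rng), random_hash(rng)) for _ in range(n_distractors)]  # noise
        rng.shuffle(pairs)
        context = "\n".join(f"{k} = {v}" for k, v in pairs)
        query = (f"Starting from {chain[0]}, follow the assignments {hops} times. "
                 f"Answer with the final hash only.")
        return context, query, chain[-1]

    if __name__ == "__main__":
        ctx, query, answer = build_hash_hop_prompt(n_distractors=5, hops=2)
        print(ctx, query, f"(expected: {answer})", sep="\n\n")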

Sonya Huang: Totally. I remember our own researchers were just blown away by the purity of the eval and how well done it was. And so thank you for what you’re doing, and thank you for open sourcing it, especially in an age where long context is becoming more and more important.

Eric Steinberger: Thank you so much for having me back. Cheers.

Sonya Huang: Of course. Thanks, Eric.

Mentioned in this episode:
