Simulating Humans at Scale: Simile’s Joon Sung Park
Episode 89 | Training Data

The race to build superintelligence is producing models that keep getting better at objective problems, but not at behaving like actual people. Joon Sung Park, founder and CEO of Simile and creator of Stanford’s “Smallville” generative agents study, argues that simulating human society requires a fundamentally different kind of model. He frames today’s frontier models as the “CPU of intelligence”—rational, superhuman at problems with right answers—and Simile as creating the “GPU of intelligence,” built to encode the diversity of people’s values, preferences, and tastes. Joon’s larger bet: a “CERN of human society” that could one day model bank runs, climate cooperation, or the early signals of a collapsing democracy.

Transcript

Intro

Joon Sung Park: I am somebody who is quite inspired by science fiction. And when you read science fiction that covers societies that have progressed far enough in its technological maturity, you always see two pillars. You have some version of AGI, and you have some version of simulations that really help guide the society. I do see an opportunity today to really take a first crack at building the simulation. I would not have said that even five years ago, but that is a conviction that we have built up over the years as we’re going deep into this research.

Main conversation

Sonya Huang: Today we’re delighted to have Joon, founder and CEO of Simile. Simile is building an applied AI lab simulating human behavior and societies. And I’m very excited to have you here to discuss what you’re building.

Joon Sung Park: Same here. Thank you for having me.

Sonya Huang: Okay. Take me back to April 2023, Stanford, California, specifically Smallville, Stanford, California. What was that?

Joon Sung Park: So Smallville was a project that we were running at Stanford, where the idea was that we made this observation that large language models can now encode a lot of human behavior that is embedded in its training data, from the web and social media and so forth, that if you sort of probe at the right angle, you can actually get a lot of microbehaviors out of these models. So given a very specific demonstration or description of a situation, what would person X do? And it would actually generate really interesting behaviors.

We found that to be so interesting. And we found that to be the ingredient that we had been waiting for for creating really complex agentic behaviors. So Smallville actually was an experiment where we decided that if we push this as far as possible, what would a society that is created by these agents look like? So we basically created generative agents that is paired with a generative AI model with memory, planning, and reflection to basically create this lived experience of agents living in this small town. So Smallville was basically a game town of 25 agents living in it. Individual agents had a description of persona, but they would actually wake up in the morning, do their routines, go to work, actually have relationships, sort of like people would, and they would actually have emergent phenomena, like having parties and so forth. So that was the experiment that we ran.

Sonya Huang: What was the most surprising thing to come out of the experiment?

Joon Sung Park: So one of the surprising things was—so the experiment, the simulation itself actually took place the day before Valentine’s Day. So you actually see these agents, one of the agents actually thinking, “Well, I run a cafe.” So she’s a cafe owner. Her name’s Isabella. She goes and thinks, “It would be great if I can do a Valentine’s Day party where we invite a lot of friends, customers.” So you actually see her on the day before Valentine’s Day going around, actually gathering materials for the party, actually telling her customers, “Hey, we’re going to have this party, please come.” And on the day of Valentine’s, you actually see this emergent party that actually gets formed with all these agents coming to the …

Sonya Huang: Did anyone not get invited?

Joon Sung Park: Well, some of the people did get the invitation, but they forgot. That’s one thing that did happen. Some of the agents did not explicitly get invited, but we had one agent who got the invite, Klaus, who decided to ask his crush out on a date. So he would actually bring in the date, they would actually have a party at this cafe. So quite surreal.

Sonya Huang: So how did you end up building Smallville in the first place? Were you studying kind of human psychology and social behavior, or was this coming from—was it coming from the kind of customer back, or was it coming from the technology out?

Joon Sung Park: So my particular team has been excited about simulations, and we saw the vision of simulation fairly early on. So my career as a researcher at Stanford really started back in 2020. That was the year when GPT-3 was about to come out. It wasn’t quite there yet, but it was just about to come out. We started to get its first demos. And my first year, we wrote this paper called “Opportunities and Risks of Foundation Models” alongside many of the Stanford researchers, and it was led by one of my co-founders, Percy Liang, who is now the head of the Center for Foundation Models at Stanford.

And when we were writing that, the part that I was really focused on was well, here’s a new class of models that we have not seen in the past, that these models that can be very generalizable in ways we didn’t quite have in the past. And I got into thinking, well, if we can imagine the kind of interaction we can create with these models, what would that be? And many of my colleagues back then were surprised that these agents or these models can do classification or simple generation. And that was really incredible to see because these models didn’t really know or didn’t really—wasn’t really taught to do that. But the part that was surprising to me wasn’t that these models can do that, because from an interaction perspective, we’ve known how to do this for a long time. The interesting part was, these models can actually encode human behavior. What does that mean if we were to push this as far as possible?

So part of the tradition I come from in research included what we call social computing. And social computing within human-computer interaction really has to do with this idea of how can we build a better technological platform that would enable social interactions and collaboration? One of the most difficult challenges of building a social platform is not necessarily testing the UI/UX of the system, but it’s more about when you have tens of people, millions of people, and down the line billions of people, how do all these people come together to create the emergent phenomena that are both good and bad? And how can we design for scale? And so far, we didn’t really have a tool that would enable us to test for that. The only way we test it today is you basically field test it, you release your prototype, see what happens. And sometimes it actually comes at a real cost. Obviously, it’s high cost in terms of human hours and the time it takes. But at the same time, if you have a bad design, imagine you have a feed on social media that is more likely to propagate certain emotions that are negative, then obviously, that is something that we want to avoid. But this now gets tested in the field.

So we wanted to see whether we can actually create a simulation that would actually let you test for this. So 2022, this was actually a year before generative agents, we worked on a paper called “Social Simulacra,” which actually really was the precursor to the agent paper that we ended up writing. The core thesis was: imagine you’re building a subreddit. You’re a designer on a subreddit. You want to see what people might do in the subreddit, which is a surprisingly hard task, even for practiced designers. And we basically decided, hey, we have this model, seems unique. Let’s use this model to create simulations of the entire subreddit. So you define the goal, you define the moderation strategies, and you populate it with thousands of—back then we didn’t call them “agents,” but we called them “personas,” but populate it with thousands of personas. This is basically ’22 version of Moltbook, which is quite interesting that it actually came back.

And when we saw that, we actually got a lot of really important insights out of this. What are the good behaviors? We actually simulated a community where the entire idea was for people to discuss with each other the places to sightsee in Pittsburgh. And all of a sudden, you start to see these personas actually collaborate to actually discuss, “Hey, XYZ places are amazing. Do you want to actually go on a trip together?” And actually plan those trips live in this simulated subreddit. So that’s how we got excited. So we saw the vision and the excitement and the potential applications fairly early on. But then the work that we had to do was then demonstrating how can we go beyond simple personas to create complex agents that actually can think over time, because we want to simulate the longitudinal aspect of our society, and then actually validating that these simulations are actually accurate in practice.

Sonya Huang: Was there a point of model evolution at which you felt like, okay, we’re there, the models are good enough for us to actually have a faithful representation of human society?

Joon Sung Park: So GPT-3, when it came out, and “Social Simulacra” was built with GPT-3, and it was very janky. It didn’t do any instruction tuning. It did not follow your instructions. So just to have it listen to you and do what you wanted it to do, you had to do some weird tricks with prompting and so forth. But you could actually see the promise. The model actually had encoded a lot of human behavior, and you could actually see the trajectory. And when we had the generative agents paper, it wasn’t quite ChatGPT, but we now had instruction tuning. So we could actually build much more complex agents that can reason about its memory. That wasn’t really possible when we did “Social Simulacra.” And since then, of course, the models have improved. So where we are today is the models at its foundational level have reached a point where we can actually imagine building these kind of applications. Now the part that actually I do think, however, that’s quite interesting here, today, if you look at many of the large language model companies, whether it’s OpenAI, Anthropic, and many of the new labs that are getting formed, the models they are creating are models that I would consider to be—their North Star to be something that is similar to, let’s build a super intelligent machine. These machines are meant to be rational, and these machines are supposed to be really amazing at tackling problems that have an objective answer.

Sonya Huang: So maybe that’s not even the best simulation of true human society then.

Joon Sung Park: Turns out, people are irrational. We have a lot of subjective values, preferences and tastes. So you actually start to see divergence in model size going up, and the performance in its ability to predict and simulate human behavior. So we have sort of plateaued with current modeling paradigm, our ability to really simulate humans. So it is sort of at this starting good foundational level, but to make it really amazing, we do need the next frontier that is more geared towards actually modeling people’s diversity.

Sonya Huang: Very interesting. At what point did you realize that what you did with Smallville could become a company?

Joon Sung Park: Right. So again, the promise of application was something that I was very much inspired by early on by simulation with “Social Simulacra” and so forth. But the part that I realized over time is research and a company have very different functions. Research is an amazing vehicle if you want to basically do breadth-first search. You are in a lab surrounded by a really smart set of people. And each of the researchers owns a small piece of thesis. And they go explore, some of those theses blossom into amazing research products. But we’re not necessarily known for finishing our job. We’re not usually the ones to bring that research impact to the real world.

A company is a machine for depth-first search. You have a conviction on an area, you find a hill that you want to climb. This is the vehicle that lets you put together resources and an amazing group of people to go after a singular vision without hesitation. And we got that conviction, I would say about half a year after generative agents. After the original generative agents paper, we got so much inbound interest initially from actually social scientists who wanted to run their experiments and all the RCTs on our platform. Then very soon after, many of the Fortune 500 companies who saw this demo and their board members and CEOs who sometimes visit Stanford saw that, and they started asking, “Well, we go run all of these surveys and experiments, and there’s so many research questions about the market that we cannot answer today. Can we run that in simulation?”

And that started to really intrigue me, because that showed a clear line towards a real-world impact for research, which is not always the case that we have that kind of opportunity. So that is when we decided we actually want to validate the simulations are accurate, so we went out and actually created simulations of a thousand people of the US population. We demonstrated that using our architecture and the models, we can actually predict people’s behaviors 85 percent as accurately as people replicate their own. When we saw that, we thought okay, this is something that we feel comfortable providing to our users as a platform for simulating their really important decisions. So that’s when the co-founders—myself, Percy, as well as Michael Bernstein, who was a researcher and my advisor at Stanford—both of them were actually my advisors. So the three of us had been working together for five years. And now at this point, with Simile, six years. But that’s when we got together to have the initial conversation of can this be a company?
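One way to read "85 percent as accurately as people replicate their own" is as a ratio: how often the agent matches a person's answers, divided by how often that person matches their own answers when asked again. The Python sketch below illustrates that reading only. The function names and the answer data are hypothetical, and this is not presented as the exact procedure used in the study.

```python
def agreement(a, b):
    """Fraction of questions on which two answer lists match."""
    if len(a) != len(b) or not a:
        raise ValueError("answer lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalized_accuracy(first_answers, retest_answers, agent_answers):
    """Agent accuracy divided by the person's own test-retest consistency."""
    self_consistency = agreement(first_answers, retest_answers)
    agent_accuracy = agreement(first_answers, agent_answers)
    return agent_accuracy / self_consistency


# Hypothetical data: 20 questions answered by one participant.
first = ["a"] * 20                   # original answers
retest = ["a"] * 16 + ["b"] * 4      # same person later: 16 of 20 match
agent = ["a"] * 14 + ["b"] * 6       # simulated agent: 14 of 20 match

score = normalized_accuracy(first, retest, agent)
print(f"normalized accuracy = {score:.3f}")
```

Dividing by self-consistency sets the ceiling at what a person can reproduce about themselves, rather than at a perfect score that even the person would not hit.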

Sonya Huang: Got it. Amazing. Maybe walk me through a customer engagement end-to-end today. Like, who’s a canonical customer and in which department? And they come to you, what are they asking you? And what product do you—or service do you deliver to them?

Joon Sung Park: Right. So maybe an example that I can give to make this concrete. So CVS has been partnering with Simile for the past, I would say, nearly half a year. And they’ve been an amazing partner. The way we initially got in touch with them is so our main buyer at CVS is the lead—it’s a senior VP who leads human insights. And the original story there is he basically read my paper that validated the agent simulations and thought, “We have to bring this to CVS. Because today, we are bottlenecked by the number of questions we can field test. And we’re also bottlenecked by truly the physics of human society.” It’s one thing to ask surveys and experiments, but a totally different thing if down the line, you actually want to simulate the entire market, and actually map out all the second-order impact of the decisions you suggest to your leadership.

So he’s been looking around for that solution. And his cousin happened to know me. And basically told our buyer, Sri, that the authors of the paper are actually looking to start something. So that’s how we got connected. And in this particular engagement, usually the way this goes is our customers are very much used to working with polling companies or panel companies today. And there they go and basically ask these companies, XYZ are the populations that we are interested in better understanding. Can we go run a research study of these topics?

That initial stage looks very similar for Simile. So our buyers come and they tell us, we want to better understand XYZ population. Then Simile goes out and we have—through our partnership with vendors—we have a strategic partnership now with Gallup, for instance, who is a polling and panel company, where we go out, work with our vendors to actually reach out to real humans. So these simulations are grounded in real data, but reach out to those people, collect data that we believe are efficient and generalizable about that person.

So imagine you have 15 minutes. What are the magical questions you can ask these people during that time? We collect that data, use that data to create agents or simulations of these people that can basically be used to answer a large number of questions that goes way beyond the original domain. We load it onto our platform, and it’s basically a SaaS product. Our customers come and they can basically ask any questions about the group of people of their interest.

Sonya Huang: So interesting. It reminds me of autonomous vehicles. You know, you go and collect a bunch of data from the road and then you’re able to augment it with simulation. Is this a similar concept, or are there big differences to what you’re doing here?

Joon Sung Park: It is a similar concept in the sense that, of course, with the self-driving vehicles, you want to create a model that is based on real-world physics, but you want to create a model that is generalizable beyond your training data. It needs to be generalizable into different locations with different weather conditions. Very similar concept, where what we want to create is we want to reach out to real people. And we want to understand something fundamental about these people in a way that we cannot code into the model.

Sonya Huang: I would’ve thought that the large language models would be such a good representation of the whole world that you could almost narrow it down. You could tell Claude you are a 34-year-old woman living in a bicoastal metropolitan area, and it would be able to have a faithful representation. So I’m actually surprised that you go out to Gallup. Maybe can you just explain why you have to go out and collect any real-world data at all?

Joon Sung Park: Yeah. One of the big questions here is the question around say-do gap. There are things that people say, and then there are things that people actually do. And the gap there is real. And a lot of the large language models are trained on attitudinal data. Fundamentally, it is the things that people have said online that does cover a large quantity of its training data. So one of the things that Simile’s simulation platform does is actually closing that gap. So a lot of the data that we end up collecting by nature are behavioral. It also includes data that actually goes into literally questions like, “Just tell me the story of your life.” Turns out, if we understand the person’s story of your life, the kind of data you get from it is what we consider to be the long-tail information about this person. It’s not about what you’ve done in this particular moment. It’s not about very broad questions like what’s your view on politics? It’s about where you grew up, what were some of the difficult decisions you had to make in life. And what’s interesting about this data is it’s an amazing way to build a translational layer between attitudes and behavior. So we combine these kinds of datasets, but fundamentally, that’s the gap that we want to close.

Sonya Huang: What sort of behavioral data do you have?

Joon Sung Park: So Simile does run a lot of experiments. So the kind of models that we have trained, for instance, we have a huge repo of RCTs, so randomized controlled trials that were run in social scientific contexts, that were run around pricing studies. So one of the models that we are training is basically the foundation model of human behavior in quite a literal sense. We have all the behavioral signals from RCTs. Can we actually encode that into the model so that the end outcome is a model that can basically predict the results of any RCTs?

At the same time, one of the conversations that we keep on having with our customers that we’re very excited by is our customers then come in, see that potential, and their mind goes to, “Wow, we have 90 million customers, let’s say, here at CVS. How can we leverage this kind of data to create better simulations?” So there’s also a conversation around how can we, in a responsible and ethical way, leverage existing data that is also in-house for our customers, and then use that to create an augmented version of Simile’s model. So that, of course, is going to be more fine-tuned, specific to the population of these customers, but that’s the kind of data that we will be leveraging.

Sonya Huang: I see. And are you doing these interviews typically by voice? Is it a survey that you fill out? What’s the modality?

Joon Sung Park: So it’s a huge breadth. The quick answer here is it’s both. Interviews are fantastic if you want to get the long-tail information about people. So we actually do—in the original study that I conducted back in 2024, we’d literally ask the question, tell me the story of your life. Now the way we do it is we are training our own model, so it’s a reinforcement learning loop, but basically imagine the objective function here is how can you spend the minimum amount of time to get the maximum amount of visibility about this person? So that is one of the things that we do. So basically training an interviewer that is not really asking for factual information or an experience about a particular platform, but just what are the life stories that people have that can be used to train our own model for these agents? And then for the more factual or sort of more discrete choice questions, surveys, and so forth. These are also very efficient. These are time and data efficient, because people can fill out many of the questions in short periods of time. So for those, we actually do leverage them. For instance, if you want to just have a broad understanding of people’s viewpoints on certain topics, certain policies, and things like that.

Sonya Huang: You describe yourself as an applied AI lab. How do you think about where you want to build your own models versus where you want to rely on other existing models?

Joon Sung Park: So in terms of building our own model, really the core thesis here is there is an amazing model to be built that really encodes the diversity of people’s values, preferences and tastes in ways that simply a rational model cannot do. So one way I actually pose this, we’re sort of building—so imagine today’s models are akin to the CPU of the intelligence unit. It’s a single model trained on amazingly rational data that is amazing at solving very complex, objective questions. Simile’s model is much more akin to developing something that is closer to the GPU of the intelligence unit, where the idea here is we don’t actually need a model that is superhuman at Simile. In fact, we want a model that’s as human as possible, but we want to make sure that these models at the sort of individual subunits can represent the real viewpoints of different subpopulations. So where we see that gap, that’s when we go develop our own model. But at the same time, we do leverage frontier models, for instance, as a way to coordinate the research. Frontier models are amazing at coming up with a research plan. So that’s where those models actually do get leveraged.

Sonya Huang: Very interesting. Are people typically coming to you with questions around new product launches? How they should be marketing their companies? Pricing? All of the above?

Joon Sung Park: So it is all of the above.

Sonya Huang: Okay.

Joon Sung Park: Our customer journey usually does, however, start with very concrete use cases and problems they are trying to solve. Concept testing is a big one. It’s also a very straightforward one. So they have a new concept, new product idea, new market message they want to test. And they want to hear from their users what they would think about XYZ. This is one way for them to quickly test those ideas. And then the promise they quickly see is well, right now we’re very much in the practice of testing five to ten different ideas at most. But what does it look like for us to test instantly 1,000 different ideas across 1,000 different subpopulations? That’s the initial vision they see. Then we really get into the nitty-gritty details of well, where does simulation go from here? They then pretty soon start asking well, can this be used to do product testing, but not just simply submitting, like, an image, but imagine basically asking these agents, go experience this product for 10 minutes, and tell us about what you experienced, what you saw. So you’re basically adding temporal dimension. Then you go into things like multi-agent simulation. Some of our customers very routinely actually ask us to simulate their earnings call. This is actually a use case that both surprised me at first, but this is also surprisingly a common ask. Because of course, the CEOs and board members always need to think about hey, how are we going to design our earnings call? How would the audience react? So that is something that we also do. And this is very much a multi-agent simulation.

Sonya Huang: It seems like there’s so many use cases that could potentially be tested once you have, like, a simulated almost customer population, right?

Joon Sung Park: Yeah.

Sonya Huang: I’m curious about the value of research and testing in sim versus just like, let’s say you have a new product concept that you want to test. Why not just go run 1,000 Facebook ads and you actually get the click-through rates on this stuff? Isn’t that real-world data almost more useful than the simulated data on how people might behave that you then correct for with your own models?

Joon Sung Park: So it’s a great question. And I think to some extent here, the answer has to do initially with scale. And then down the line, truly the new capability that comes because you can simulate interactions. The scale question here is actually quite straightforward, where yes, you can absolutely run Facebook ads and Facebook testing. But the kind of experiments that you can run in simulation is actual behavior simulation at scale. And so you can basically pull in any number of users, doesn’t even have to be bounded by the number of population that’s available on Facebook. And it’s also much more representative, because only certain groups of people will actually respond to the online experiments.

But Simile, the model that we are creating, one of the key promises is that it is representative. We do the hard work of actually getting the representative set of people, and then collecting the data that would actually represent them properly. So the scale representativeness is something that many of our users do not have easy access to. This is actually one of the common asks also that we do get, or common pain points that we have heard, where the question that many of these people have isn’t about what questions do we ask these people? But it’s about in the first place, how can we get to the population that we’re excited to talk to? That’s a huge bottleneck.

Then down the line, you can actually really start to imagine—and this is something that our customers and some of the most forward-looking customers are now going into—which is, what are all the downstream implications of the decisions that you make? It’s not just about imagine you have this particular product. Do you like it or do you not like it? Would you pay for this, not pay for this? It’s not necessarily just the initial questions that we want to answer and finish. But we want to understand—imagine you’re a car company, you launched an electric vehicle in this market. Maybe the electric vehicle does really, really well. So we can help you do concept testing around marketing and the product around the electric vehicle. But what does that do to the perception of, let’s say, non-electric vehicles? Does it change the market perception? Then what does it mean for the rest of the product line? And how do you balance those kind of second-order impacts of your decision in a way that is more evidence-based? Today, there’s no way to test for this.

Sonya Huang: Yeah.

Joon Sung Park: You can run this in simulation. So really going beyond simply asking one question at a time, but then to think about what are the long-term implications of your decisions is something that our customers are quite excited by.

Sonya Huang: I’d love to understand how you think about how predictive your model is in actually simulating real human behavior. I imagine you have a lot of evals on this. I guess, what is your North Star metric? How do you guys do on that? And what do you think is the theoretical limit?

Joon Sung Park: It’s a great question. So theoretical limit—and let me just start from there—certainly does exist in the sense that humans are genuinely—there’s a lot of randomness that if you ask me the same question, I’ll actually answer the question slightly differently. So there is certainly that degree of randomness in human behavior. However, there’s a lot of gains in performance that we can have even today in the way we’re predicting people. So the measurement that we do is—so at the level of population, we measure the distribution of responses if it is more quantitative. So we actually measure total variation distance, which basically shows how close are the distributions of the ground truth versus the simulated information. And that is a metric that we run across all the use cases that our customers have. And we have certain thresholds that we believe is good enough for decision making. So TVD of let’s say less than 0.15, we believe, is actually quite strong evidence for making decisions. So that is the North Star state that we want to hit for this class of use cases that are more quantitative, that’s more question and answers. This also does cover RCTs, which is many of the core use cases our customers have. Now there’s actually a really interesting question to ask around, what about multi-agent simulation? What about all the downstream implications that we’re going to be simulating? What does the evaluation of those look like?
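To make the TVD metric mentioned above concrete: the total variation distance between two distributions over the same answer options is half the sum of the absolute differences in their probabilities. The Python sketch below is illustrative only, not Simile's implementation. The function name, the survey answers, and the way the 0.15 threshold is applied are assumptions made for the example.

```python
from collections import Counter


def total_variation_distance(ground_truth, simulated):
    """TVD between two empirical distributions of categorical responses."""
    if not ground_truth or not simulated:
        raise ValueError("both response lists must be non-empty")
    p = Counter(ground_truth)
    q = Counter(simulated)
    n_p, n_q = len(ground_truth), len(simulated)
    # Counter returns 0 for categories that one side never produced.
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


# Hypothetical answers to one survey question.
human = ["yes"] * 60 + ["no"] * 30 + ["unsure"] * 10
agents = ["yes"] * 52 + ["no"] * 36 + ["unsure"] * 12

THRESHOLD = 0.15  # the example threshold quoted in the conversation
tvd = total_variation_distance(human, agents)
print(f"TVD = {tvd:.2f}, under threshold: {tvd < THRESHOLD}")
```

A TVD of 0 means the simulated and real answer distributions are identical, and 1 means they share no answers at all.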

Sonya Huang: Yeah. And then do you daisy-chain errors as you kind of—you know, if this one is 85 percent accurate and then this agent is telling another agent something and they’re—do you accumulate errors as you go towards multi-agents?

Joon Sung Park: Exactly. And one of the core theses here is we basically see two categories of simulations. One simulation is what I would consider to be simulations that converge. The other categories of simulations are the simulations that diverge. And sometimes they actually coexist. And it’s really about what research questions do you have. Questions that converge, doesn’t actually matter if you have a little bit of error. Now the error cannot be obviously so dramatic that it sort of is completely detached from reality. But you actually are okay, even if the errors do compound over time, because the pull towards the convergence is strong enough that you’ll actually understand where everything would fall.

A good example here actually is if you simulate a network of people, then that network will always have a hub that gets formed. This is what network scientists would call the scale-free network, for instance. This is actually what powered Google, too. One of the core observations of PageRank was that it doesn’t matter how these networks actually get formulated, you actually see some web pages that get exponentially more links that are attached to it. This is a very fundamental behavior in humans that we also see in simulated networks. And that convergence always happens as long as you’re replicating human behavior with certain threshold accuracy.
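The hub formation described above can be illustrated with preferential attachment, a standard textbook mechanism for growing scale-free networks. This is a toy Python sketch under that assumption, not a model of how Simile's agents form links. The function `grow_network` and its parameters are hypothetical.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time; return each node's link count.

    Each new node links to one existing node chosen with probability
    proportional to that node's current number of links.
    """
    rng = random.Random(seed)
    # A node appears in `endpoints` once per link it has, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]  # start from a single link between nodes 0 and 1
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
    return Counter(endpoints)


degrees = grow_network(5000)
sorted_degrees = sorted(degrees.values())
median_degree = sorted_degrees[len(sorted_degrees) // 2]
hub, hub_degree = degrees.most_common(1)[0]
print(f"median links per node: {median_degree}")
print(f"largest hub: node {hub} with {hub_degree} links")
```

No individual choice in the sketch aims at creating a hub, yet a few heavily linked nodes appear anyway, which is the kind of convergent outcome the speaker describes.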

Now there are questions that generally do diverge. It’s like your classical questions like was World War I inevitable or was it not? And there, it is sometimes difficult to run the same simulation over time and get the same exact outcome. Imagine you’re running a—this is not something that necessarily Simile right now is going into, but imagine you’re running a simulation of an election. Will the same person win the election every time? There are a lot of downstream implications of every single decision that does happen. So it does diverge. There, the core evaluation is around confidence. So imagine you run the simulation 100 times. How many of those times do the results come out to be X? And how can we actually use that to basically create a bootstrap resampling to calculate the confidence around the simulations? Those are some of the questions that we do ask. And a huge part of the power of simulation is then, when it diverges, to show the diversity of possible outcomes so that people can actually look, understand the cause or mechanism of how we got to those outcomes, and prepare for those futures. So those are some of the implications of divergence in simulations.
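
[Editor's illustration: the idea of running a simulation 100 times, counting how often the result is X, and bootstrapping a confidence range can be sketched as below. This is a plain percentile bootstrap under assumed inputs; the run outcomes are made up, the function name is hypothetical, and the conversation does not specify the exact procedure used.]

```python
import random


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the share of runs with outcome X.

    `outcomes` is a list of 0/1 flags, one per simulation run
    (1 means the run ended in outcome X).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the runs with replacement and recompute the share each time.
    shares = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower = shares[int((alpha / 2) * n_resamples)]
    upper = shares[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical: 100 simulation runs, 62 of which ended in outcome X.
runs = [1] * 62 + [0] * 38
share = sum(runs) / len(runs)
lower, upper = bootstrap_ci(runs)
print(f"outcome X in {share:.0%} of runs, 95% interval [{lower:.2f}, {upper:.2f}]")
```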

Sonya Huang: Are there any mathematical descriptions of why something would converge or diverge? I imagine if you have an average function, maybe you converge. And then if you’re splitting outcomes to a binary, then you might maybe diverge.

Joon Sung Park: Yeah. So the intuition I think is close. And technically, this is also a research topic. So Simile is a company where we do go deep into this research topic, in the sense that I see simulation as a field that’s akin to developing your day one of inferential statistics. Inferential statistics scientists actually had to do a lot of discussion and research over time to decide that P less than 0.05 is actually evidence that is strong enough for science. Simile is working on setting the same kind of threshold and standards for the rest of the field. So those are the intuitions. I think that’s exactly the right intuition. In terms of actually how to make a robust mathematical equation around, like, what’s going to happen when, it is a real research frontier for simulations.

Sonya Huang: Thank you for being nice about my vibe mathing.

Joon Sung Park: [laughs]

Sonya Huang: I’m curious, you know, it seems like a lot of Fortune 500s coming to you. I’m wondering whether there are non-existing corporate use cases that are great mysteries of our society that might become solved. And for example, I’m wondering about economics, central bank decisions. Oftentimes, I personally believe in macro, nobody knows nothing. And oftentimes a lot of the issues come about from human psychology. So to me, macroeconomics is a function of simulating human behavior at scale. You know, I’m thinking even in the venture capital use case, we often debate internally, does value accrue to this company or not? You could run the simulation of all the different layers of the AI stack and almost figure out where durability and value accrues. If you had a kind of perfect simulator of human behavior, there’s so much more you could do than serving the Fortune 500. Do you agree with that? And then if so, are you serving governments and the like?

Joon Sung Park: Yeah. So it’s interesting. When we were still researching in this area, the way I actually got, back then, my advisors, Michael and Percy, excited about this was I basically told them, “Look, we do this right, there’s a Nobel Prize to be won there.” And I truly believe that. And it’s also not surprising in that your classical economics simulations, things like agent-based models that really pioneered our understanding back in the day, the kind of topics they studied were: how does segregation happen? What are the causal mechanisms for segregation? So scholars like Thomas Schelling would actually build agent-based models that are extremely simple and rudimentary, but that showed something deep about human macro behaviors. And he, of course, went on to win a Nobel Prize.

I see the same opportunity here, but in an augmented way, where back in the day, the agent-based models were very much deterministic in some sense, where you basically in this simulation of, let’s say, model of segregation from, like, 30 years ago, an individual agent was simply a red dot or a blue dot. And every game iteration, they would look around their corner, see how many of their neighbors are of the same color. And if that number goes below a certain threshold, then they will decide to move to a new location. And that was it. But now we can actually create real agents that replicate the full richness of individuals and run the same kind of simulations.
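
[Editor's illustration: the red-dot and blue-dot model described above can be sketched in a few lines. This is a simplified version written for illustration, not Schelling's original specification: the grid size, the 0.3 threshold, the share of empty cells, and the rule that unhappy agents jump to a random empty cell are all assumptions.]

```python
import random


def unhappy(grid, r, c, threshold):
    """True if the share of same-color neighbors is below the threshold."""
    size = len(grid)
    same = total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size and grid[rr][cc] is not None:
                total += 1
                same += grid[rr][cc] == grid[r][c]
    return total > 0 and same / total < threshold


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the round.

    Returns the number of agents that moved.
    """
    size = len(grid)
    cells = [(r, c) for r in range(size) for c in range(size)]
    movers = [(r, c) for r, c in cells
              if grid[r][c] is not None and unhappy(grid, r, c, threshold)]
    empties = [(r, c) for r, c in cells if grid[r][c] is None]
    if not empties:
        return 0
    rng.shuffle(movers)
    for r, c in movers:
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empties[i] = (r, c)  # the vacated cell becomes available
    return len(movers)


def make_grid(size, empty_share, rng):
    """Random grid of 'R', 'B', and None (empty) cells."""
    n_empty = int(size * size * empty_share)
    n_agents = size * size - n_empty
    cells = ["R"] * (n_agents // 2) + ["B"] * (n_agents - n_agents // 2)
    cells += [None] * n_empty
    rng.shuffle(cells)
    return [cells[i * size:(i + 1) * size] for i in range(size)]


rng = random.Random(0)
grid = make_grid(size=20, empty_share=0.1, rng=rng)
for round_number in range(30):
    if step(grid, threshold=0.3, rng=rng) == 0:
        break  # everyone is content, the layout has settled
print("\n".join("".join(cell or "." for cell in row) for row in grid))
```

Printing the grid before and after the loop is a quick way to see clusters of same-colored agents form even though each agent tolerates being in the local minority.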

So the kind of questions that we can ask that go beyond simply the commercial use cases, for instance, in the context of macroeconomics, actually, the questions that I actually did get asked from economists were things like: When does a bank run happen? Or questions like climate change. One of the core blockers of climate—like, solving that issue is the collective action problem of many nations. Can we actually simulate that? Or what are the signals of a democracy that is about to collapse? Can we understand the origin story of the monetary system? These are the kinds of simulations that I do believe ought to be the North Star state of this field. And it is sort of interesting to imagine what that would actually look like in practice, right? Because these would involve very large-scale simulations with many agents interacting with each other. I do see a future where—today, this is certainly not the case. Today, a simulation is quick and fast to run. But what about simulations that take actually $100 million to run once, and could take many months to run? But when we run it, it solves one of the fundamental questions of our society. That I do think is genuinely a very exciting possibility for this field.

Sonya Huang: Mm-hmm. I’m even thinking politics, for example, could be forever changed. Today everyone has an agenda of how they say some policy change will impact things. Well, why don’t we just run the simulation?

Joon Sung Park: And understand all the downstream implications, and not just what’s going to happen this year, but what does it mean in the next five to ten years?

Sonya Huang: Fascinating. I was going to close by asking you what makes you excited about the future? Is it what we just talked about? Or is it something else?

Joon Sung Park: I am somebody who is quite inspired by science fiction. And when you read science fiction that covers societies that have progressed far enough in its technological maturity, you always see two pillars. You have some version of AGI, and you have some version of simulations that really help guide the society. I do see an opportunity today to really take a first crack at building the simulation. I would not have said that even five years ago. But that is a conviction that we have built up over the years as we’re going deep into this research. And what’s exciting is there’s a clear use case today that can serve our users. But then there’s a lot of innovation that is yet to come that I do think will build up to actually building a simulator that’s akin to the CERN of human society. And one of the things that my co-founder Percy sometimes says is you look at the greatest scientific innovations, they often start from an amazing measurement. Hubble Telescope really changed the trajectory of how we understand the universe. Simulation can be that for human society. So the thing that does excite me, there’s a lot of focus on natural sciences, but how can simulation really unlock our understanding of humanity and social sciences? And how can we actually use that to make our society be a better place? That’s exciting.

Sonya Huang: Totally. I remember reading somebody was excited about, you know, there’s a small but breathtaking chance that the field of economics as we know it may actually become solved by simulation. And I’d extend that not just to be economics, but kind of everything that deals with human behavior and social sciences, which ultimately is everything around us.

Joon Sung Park: Truly.

Sonya Huang: Wonderful. Thank you so much for joining today and sharing the story of both Smallville and what you’re now up to at Simile. I really enjoyed the conversation.

Joon Sung Park: Same here. Thank you for having me.
