Suno’s Mikey Shulman: Everyone Can Make Music Now
Most music platforms assume you’re a listener. On Suno, 90% of daily users make something. Founder and CEO Mikey Shulman explains why that flips the model: the act of creating IS the entertainment, with closer parallels to gaming and Claude Code than to Spotify. He breaks down the technical bets that got them here — modeling raw sound waves instead of encoding music theory, choosing autoregression over diffusion to prioritize full songs over crisp clips, and why music isn’t a scale problem the way LLMs are.
Mikey Shulman: In Western music, there are 12 tones. If you tell the model there are 12 tones, it will only ever produce those 12 tones. You will be forever limited. And so for us, it was all about let’s throw away everything we know about music and let’s try to do this from scratch. And it’s just a sound wave. It’s just sampled at 48,000 times a second, and it is a continuous float 32 number. And let’s figure out how to model that. And that was a lot of the early breakthroughs that we had to make, but once we did, we realized that it is a totally generic music-making machine. And now you are only constrained by what you can describe and your imagination.
Sonya Huang: I’m delighted to welcome Mikey Shulman. Mikey is founder and CEO of Suno, which is building a music company or a creative entertainment platform, and has been one of the most novel consumer applications I’ve seen out of AI. And I’m very, very excited to ask you about your journey and what’s ahead for Suno. So thank you for joining us today.
Mikey Shulman: Thank you for having me. I’m excited.
Sonya Huang: Okay, awesome. I want to start with your background, because it is very, very unexpected. You went from a physics PhD at Harvard, I think quantum computing with solid state spins, to building the largest AI music company in the world. Like, what insight connected those two things for you?
Mikey Shulman: You know, I don’t know how. On paper, I guess I have no business building a consumer entertainment company, but a lot of people went from physics into AI just, like, 30 years ago, a lot of people went from physics into quantitative trading. I’ll be honest, though, I was an okay physicist only, and there are a lot of better physicists—including one of my co-founders. And I think what I mostly learned is playing at the nexus of two things that don’t usually play together is just a massive opportunity in all domains. It can be music and technology, it can be quantum mechanics and low-temperature microwave engineering, or it could be whatever else you’re going to do.
Sonya Huang: You and I got connected in the very early days of Suno. One of our mutual friends, Harrison Chase, was one of the earliest Suno Discord users, and he was having far too much fun making songs in your Discord. Maybe tell us about the early days of Suno. How did it come together? Did you set out to build a music company?
Mikey Shulman: Originally, we thought this would actually be too hard. And it’s because you have to rewind. This is pre the ChatGPT moment. We did some back-of-the-envelope math. We knew we loved audio, but the back of the envelope math told us that actually producing good music, making good music, generating good music was a couple of orders of magnitude away in terms of compute and model size and capability. And it’s because music, sound in general, is very unwieldy. It’s not in discrete bits like text is. And so we actually started building a company that was all around using the same technologies to make sense of audio, not to produce it. And very happily, pretty early on, we had the right breakthroughs and we realized, oh, we actually can make music.
Sonya Huang: You’re pretty good at math. What’d you get wrong with your back-of-the-napkin math then?
Mikey Shulman: The math was right. We just had some breakthroughs that said, it’s actually—you don’t need that amount of compute. You can make the right technological breakthroughs to, if you want to think about it, basically just compress audio really, really efficiently. And that worked a hell of a lot better than we anticipated. So it was a very nice being wrong moment. Not all being wrong moments are so pleasant. And to be clear, at the beginning, the music was terrible, but we still stayed up late …
Sonya Huang: Actually, he thought it was good. He was one of your first 10 users, I think. He thought it was pretty good. [laughs]
Mikey Shulman: Certainly before we put it on Discord, the music was very terrible. Before we put it on Discord, we could make, like, 12.5-second clips that wouldn’t always listen to the words you asked them to sing. But we had so much fun doing it, and we thought other people might have fun doing it. And so we kind of took the example of Midjourney and we said, it’s really easy to put a Discord bot out and see, will people enjoy it? And we put it out there and a hell of a lot of people enjoyed it. And that was a really confirmatory moment for us. And so a lot of people told us not to build a music company. It’s not the easiest business to work in. Speech is really big. There’s a lot of great business-use cases for building speech technologies. But when you are staying up late playing with the thing, and you don’t want to go to sleep, it’s a really good sign that that is what you are meant to be doing. And so that’s what we did.
Sonya Huang: I love that. Are you a musician?
Mikey Shulman: I am. I play almost every day. I grew up playing a lot of piano, and ended up picking up a bass around age 12 and playing a lot more of that.
Sonya Huang: Okay, so personal passion point. That’s awesome.
Mikey Shulman: The revisionist history is that—which is true—is that we used to have jam sessions at our last company in one of my co-founders’ basements. And it’s true, we had a lot of fun there. It’s not why we started the company. Again, we thought it would be too hard to do this. It was just fun.
Sonya Huang: Meaning at Kensho?
Mikey Shulman: At Kensho, yes. Where I met the great Harrison Chase.
Sonya Huang: The Kensho mafia is pretty unparalleled. There’s Harrison, but also Daniel Nadler, Sam Whitmore, you. Well, there are a lot of you.
Mikey Shulman: There’s a lot of us. I just credit Daniel with that, honestly. Daniel is, I think, the best object lesson in what talent density can do for a company. And it was a lot of people with non-traditional backgrounds. It skewed very young, but he was great at finding people and great at convincing them to join.
Sonya Huang: I love that. Okay, so walk us through what happens when somebody types “upbeat ‘90s hip-hop track about a road trip.” You get the prompt in. What happens? What is the model doing to be able to pass something back to the user that seems like it’s quite special?
Mikey Shulman: In some way, it’s actually pretty simple. A prompt like that, you have to figure out what are the words of this song. And we use various LLMs to do that, to make the lyrics. And so it’s taking basically the cue there is “road trip.” And so, like, what should this road trip be about? And it will probably get it wrong because you didn’t give us enough information, but that’s actually okay. You can iterate on it. And then you said ‘90s hip hop, and we tried to expand that out into a set of cues that the model can really understand. What is the genre? What is the style of this music? And then you put those things together. You have a lot of lyrics, you have a lot of styles, and we have our models that are trained to take in all of that information and just produce sound.
The amazing thing here is that the models don’t know that there’s vocals and instruments. It doesn’t know what kind of instruments there are. Very early on, it was actually quite obvious to us that the more musical knowledge we give the model, the more constrained it will be—in a bad way. And so we actually just model everything as sound. And that’s what made it so hard, but ultimately that’s what makes these things so powerful. So just to be concrete about it, in Western music, there are 12 tones. If you tell the model there are 12 tones, it will only ever produce those 12 tones. You will be forever limited. And if you tell the model there’s 200 instruments, those are the only sounds that you’ll ever be able to make. And you won’t get the next Skrillex using Suno. And so for us, it was all about let’s throw away everything we know about music and let’s try to do this from scratch. And it’s just a sound wave. It’s just sampled at 48,000 times a second. And it is a continuous, float 32 number. And let’s figure out how to model that. And that was a lot of the early breakthroughs that we had to make, but once we did, we realized that it is a totally generic music-making machine, and now you are only constrained by what you can describe and your imagination.
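To make the contrast above concrete, here is a minimal sketch in Python. It is an illustration only, not Suno's code: the helper names `sine_wave` and `equal_tempered` are hypothetical, and the 440 Hz reference for A4 is a common tuning convention assumed here, not something stated in the interview. It shows a waveform stored as 32-bit floats at 48,000 samples per second, and a quarter tone that a fixed 12-tone vocabulary cannot name but a raw waveform holds without any special handling.

```python
import math
from array import array

SAMPLE_RATE = 48_000  # samples per second, as quoted in the interview


def sine_wave(freq_hz: float, seconds: float) -> array:
    """Return a mono waveform as 32-bit floats in [-1.0, 1.0]."""
    n = int(SAMPLE_RATE * seconds)
    return array(
        "f",
        (math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)),
    )


def equal_tempered(semitones_from_a4: float) -> float:
    """Frequency of a pitch offset from A4 (assumed 440 Hz) in 12-tone equal temperament."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)


# The 12 tones of one octave: a model restricted to these symbols can name nothing else.
twelve_tones = [equal_tempered(k) for k in range(12)]

# A quarter tone falls between two of them, yet as audio it is just another waveform.
quarter_tone = equal_tempered(0.5)
clip = sine_wave(quarter_tone, 1.0)  # one second of audio
```

One second of audio here is 48,000 numbers, which hints at why modeling raw sound is described as unwieldy compared with discrete text.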
Sonya Huang: That’s so cool. Have you found that we’ve basically just rediscovered the existing genres of music and the 12 notes? Like, have you, I guess, independently seen just that same behavior emerge when you’re trying to learn music from first principles, or have you seen a different set of capabilities emerge?
Mikey Shulman: No, the amazing thing is now we see new things emerge that you never would have thought of. And so most of the time this looks like blending genres that really have no business going together. And so you’ll get, I don’t know, trap with a sitar in it, or you’ll get country with 808s in it, or whatever it is. And again, this is really empowering people to do the things that are in their heads. And it’s in a way that would not have been possible without a technology like this, or would have been really, really hard. We see microtonal music. It is really inspiring to go and just look at all of the crazy things that people are making. A lot of them sound like genres you know, and a ton of them sound totally strange and bizarre and lovely.
Sonya Huang: That’s awesome. That’s really cool. Are there certain genres that you’re finding your model is better at and certain genres where you’re worse at?
Mikey Shulman: Definitely. I mean, I attempt not to say good and bad about music other than, it’s sampled well, the full bit depth or full sampling rate. But to the extent that you can make such generalizations, we’re very good at country. We’re very good at pop music. And I think the cartoon maybe to have in your head is that there are some genres that are somewhat more formulaic than other genres. And so perhaps we’re better at those. But I have some sneaking suspicion that for those, it’s as much raising the floor as it is raising the ceiling. And for the things where we’re less good at it, we’ve not raised the floor. And so we make a lot of bad music, but we have also raised the ceiling. And if you’re willing to go for long enough, you’ll find amazing stuff.
Sonya Huang: That’s so cool. Suno V5 seems like it was a real step change in quality. What goes into one of those step changes?
Mikey Shulman: You know, it’s really hard to predict when the step changes happen because it’s really nonlinear in both the research inputs, but actually it’s not even linear in how much our testing says the model is better. And so just as an example, we can measure how much one model is preferred to another model. And you may come up with it’s 10 percent preferred or 15 percent preferred. And you can take two different models, and one is 10 percent preferred or 15 percent preferred. And the uptake on the other end, how much our users actually love it and use it, or how much the product grows when you release it, won’t necessarily be all that correlated with what the preference signal is. And it’s because music is messy and there’s lots of other things that go into it.
But to take a huge step back, we have a pretty aggressive research roadmap. And in some weird way, we’re always working on this thing—we know what V6 and V7 are. At some point there’s lots of things that you want to have your model do. There’s lots of improvements that you want to make, and it’s almost an arbitrary cutoff of saying, like, okay, this is the break. This is what we’re going to call V5.5. And everything that comes after is going to go into the next models. And almost just to keep it on a steady cadence of when we release things, because what you would hate to have happen is we don’t release stuff for, like, two years and we try to make the music model to save humanity. And that’s going to come out in two years and we’re going to do nothing before then.
Sonya Huang: Yeah, totally. How much of each of these improvements do you think is just a function of scale, scaling compute, scaling data, and then getting a lot of human preference data back? How much are you guys doing, I guess, more novel research?
Mikey Shulman: Music is really not a scale problem. The models are pretty small for a variety of reasons. And I think people will often incorrectly take what they know from LLM land, where models are giant and scale helps a ton, and apply it to music. And I think the cartoon that I have in my head is that in LLM land, there’s all of these benchmarks, and you can quibble about which ones are flawed and which ones are good. But these benchmarks exist, and scale is actually a pretty efficient way to climb up the ladder and just keep doing better and better on the benchmarks. In music, there are no right answers, there are no benchmarks, and so scale is somewhat less helpful in solving it. It’s a messier problem in many ways, it’s aligning models to creative human tastes. You and I are not going to agree on every song. You and I aren’t even going to agree on …
Sonya Huang: I’ll just defer to whatever you say.
Mikey Shulman: I don’t think you want to do that. And the models not being that big actually lets us get you the music quicker, which turns out to be really important for good UX. And so I think a lot of this boils down to research and preference data. And so we gather preference data that lets us align models to what our users like. A really underappreciated thing is how much this preference data actually lets us do research. Without the scale of preference data that we have, we wouldn’t even be able to develop the techniques that we are using. And so there are really some virtuous cycles there in how the product itself keeps getting better just by virtue of having people use it.
Sonya Huang: Interesting. And I guess you can use the human preference data in a much stronger way than the text models because they’re all worried about sycophancy, right? And for you I guess that’s much less of a challenge.
Mikey Shulman: A hundred percent. A hundred percent. And so I think yeah, there’s just a tremendous amount of edge comes from our ability to understand it, do research on it, and then RL that back into our models.
Sonya Huang: That’s awesome. Okay, I want to switch gears a little bit and talk about music as a consumer phenomenon. And you’ve mentioned “consumer creative entertainment platform” at the beginning. I want to dig into what that means. Maybe starting with it seems like music is just a cultural, social phenomenon of I like this song, I send it to my friend. It’s a scarce resource, we bond over liking that song, and having those mixtapes, listening to it together, et cetera. And so to me, music has always just been this shared cultural experience. That’s what it is. Do you agree with that? And then if so, what does AI music mean for that?
Mikey Shulman: I agree with that very strongly. Music has a very different place in culture than other media in a variety of ways. One is actually people’s tastes are far more developed in music than they are in other media. Everybody has taste in music in a way that most people don’t have taste in film or literature. And the other thing is that music is actually inherently a much more social medium. And if you think about how going to a concert is an inherently social thing, even though you’re only really looking at the performers and it’s because of the people around you. In a way that, let’s say, going to a movie in a movie theater isn’t quite as elevated as it would be compared to an empty movie theater, for example.
And so I think this is actually largely because humans communicate sonically, through our mouths and ears, and therefore music is a much earlier method of communication than writing. It’s much more in our DNA, I think, compared to other things. I’m obviously biased. I obviously love music. I’m not sure. I think people assume that oh, you’re just going to have AI-powered Spotify and it’s going to dehumanize it and music is going to get terrible. That seems to me to be obviously wrong. I don’t think you’re going to make a better Spotify just by powering it with AI.
And the thing that’s really interesting is actually how can we not just change, but elevate the place of music in culture? And music has this other funny thing that by virtue of being so ubiquitous, it ends up being in the background a lot. And the thing that’s amazing is that AI can be used to actually change that and to augment how music is perceived in society and in culture, augment how it is used socially, because it’s actually become less social in the last 30 years. And so that is the corner of the universe that we play in and that we are really excited about.
Sonya Huang: Do you see yourselves as today—and I guess when you look at your users, are people more creators of music or are they more consumers of music or both?
Mikey Shulman: This is the crazy thing about Suno. Before Suno, basically everybody was a consumer of music. Compared to the eight billion people on the planet, there are very few people who make music and the rest of us consume it. And that’s fine. It tends to cater to passivity. It tends to cater to making it less social and more impersonal. And the crazy thing about Suno is that on any given day, 90 percent of the users are going to create something. And the thing that’s hard to wrap your head around is you’re not creating it to go bring it elsewhere, by and large, to do something with it. People are creating music for the fun and enjoyment and fulfillment that comes with being creative. And so that, the creation, is actually the entertaining bit. And that is the big step change. It’s that everybody in the world is creative. Being creative makes you feel a certain way; this is in our DNA. And we are basically using technology to allow everybody to feel those warm and fuzzy feelings. A lot of the inspiration for me personally for doing this comes from just remembering the fondest memories that I’ve had, or some of the fondest memories that I’ve ever had are making music with my friends. Not even performing in bands but, like, practice was so much fun, and you get really close to people making music. And it’s because it feels really good to be productive in a way that doomscrolling your favorite app for an hour does not feel so good when you’re done.
Sonya Huang: I was an orchestra kid, so I was not doing nearly as cool music as you, but I totally agree.
Mikey Shulman: What did you play?
Sonya Huang: Violin.
Mikey Shulman: Do you still?
Sonya Huang: Yeah. Not that much.
Mikey Shulman: Oh, excellent.
Sonya Huang: I have perfect pitch and let’s just say I’m definitely not playing 12 tones now. So my ears bleed when I play. [laughs] But I totally agree with you. Okay, so that’s crazy. So you’re very much a this is not a replace Spotify type platform. This is a creator platform. You’re turning people into creators. This is an active entertainment platform. It almost feels more like gaming than it feels like music listening or music as we currently understand it.
Mikey Shulman: A hundred percent. I mean, I will get shit inside of the music industry for saying that there’s a lot to learn from gaming. But there’s a lot to learn from gaming. And it’s not just about how well it monetizes. It is how it grabs your attention and how it pulls you in and makes you use your brain. There’s also things about the business model about gaming that music has to learn from. But I think I understand why it is in some sense taboo, because music is art and people don’t think of gaming as art. But I think it’s silly to say that there’s nothing to learn from it.
Sonya Huang: Yeah. Yeah, that’s awesome. Okay, so it’s a self-expression/active entertainment platform, some parallels to gaming and some parallels to even Claude Code, right?
Mikey Shulman: Absolutely. So I think the thing that’s amazing about making music is that you feel good and fulfilled, and you enjoy making it and then you listen to it. And there are parallels. And so that’s what we call creative entertainment. The entertaining part is being creative. It’s not that you are being creative for the sake of bringing the piece of content somewhere else. I think you see that in cooking. People like to cook even though they can get a better meal at a restaurant, and it’s because it is fun to cook and it is fun to consume what you make. And I think a lot of what makes Claude Code or any of the other platforms so special is that it’s fun to build things, and it’s fun to use what you build. And even though most of the things that I build are definitely not meant to be hosted in AWS and used by millions of people, I actually enjoy the act of building and I enjoy the act of using the thing that I built. And so I predict that, like, in 10 or 20 years, there will be way more of these creative entertainment things all over the place. And it’s because that’s actually finally possible. That is the thing that AI unlocks. It unlocks lots of intelligence things too, but it actually lets everybody be creative in almost any domain.
Sonya Huang: Yeah. I’m guessing you have an opinion on this. What do you think of the word “slop?”
Mikey Shulman: [laughs] I do have an opinion. I actually—I mean, my answer is usually it’s thrown around without any meaning, and I don’t know what people mean by that. So sometimes in music it goes all the way to say there is some streaming fraud and so people think streaming fraud is slop. I don’t really understand that. Like, the fraud part is bad, but the fact that the music was made by AI is an implementation detail. I made two songs with my five-year-old yesterday. Is that slop in the sense that 99.999 percent of the planet has no interest in hearing that? Sure. But that’s really meaningful to me. And so if you call that slop, I’m not sure I care. It’s an interesting question, though, right? This has happened before, at least in music, where when way more people start to be able to produce something, people get afraid that it’s just going to flood all of our ears and all of the platforms with more content. And this happened when people started to be able to make music on their laptops. You had a lot of 13 year olds making beats in their bedrooms. And you fast forward to today, that seems like obviously a good thing. Yeah, there’s way more music. It means that there’s way more quote-unquote “bad” music, but it also means that there’s way more great music and there’s new kinds of music that get made and there’s new kinds of stars that get made. And, I see no reason why way more people making music again would be any different from that.
Sonya Huang: I love that. So we talked about the floor, the slop floor, the non-slop floor. What about the ceiling? Tell us a little bit about the most incredible things people have been able to do with Suno, and I think you guys have had some chart-topping hits now. Maybe talk a little bit about that.
Mikey Shulman: We have had some chart-topping hits. We’ve had people signed to record deals. We’ve had people make single songs that chart. And that’s amazing, and I think about that as that is a new creator coming with a new perspective that resonates very strongly with people. And so that is obviously the ceiling going up. My favorite example is Xania Monet, who—it’s the stage name of a poet who took all of her beautiful poetry that she had been writing for a decade and started to make music out of it, and found an entirely new voice and an entirely new audience to resonate with her art. And I think this is fantastic. This is people connecting, right? This is the most personal thing in the world. And when you go and listen to the music, you’ll realize it’s extremely personal.
The best music will always require human guidance. And it’s because again, music has no right answer. You like a piece of music because of how it sounds and because of the messenger who delivers it. And we will find new messengers with new sounds, and we already are. And to me, the ceiling is obviously going up there.
The other thing that’s really cool is even if there—I just know that there are tons of charting tracks that have little bits of Suno in them—they’re not entirely Suno. And it’s because for the professionals, it’s also just an amazing tool to use as part of your workflow. It’s not your whole workflow. And so there’s this weird thing where I think people incorrectly say it’s either all AI or none AI. And the vast majority of music will have some AI in it, just like the vast majority of music today is autotuned or is digitally produced. And again, more tools let you push music forward faster, let you find new sounds faster. To me, this is obviously the ceiling going higher.
Sonya Huang: That’s amazing. Okay, so you chose to go after music, which is probably the one industry that a lawyer would tell you not to go after because the minute you’re there, you have pitchforks coming after you. I mean, you just had a pretty landmark settlement partnership with Warner. Can you tell us more about that, and what you think that means for the future of collaboration with the existing professional music industry?
Mikey Shulman: Absolutely. Just to back up, I think people incorrectly assume that we hate the existing music industry and especially we hate the record labels. People also expect me to say, “Oh, the record labels are cooked.” I think that’s obviously wrong. They’re some of the most culturally important institutions in the world. They understand music and they understand music culture. They cultivate and grow stars that resonate with billions of people. And the way I see it, it would be a real shame if there were two worlds of music, if there were an AI world of music and a non-AI world of music. One is it makes no sense because most music will have some AI in it. But the other is it’s just bad for the end user to think about having to separate these things in your head, and to have to go to different platforms to have effectively similar usage patterns or interactions.
And so what I’m most excited about doing with Warner is actually building things together that could never have existed before, and building products that let fans interact with their favorite artist and really deepen the artist-fan connection in ways that are just positive sum for everyone. It’s great for the artist. They get to engage with their fans. It’s great for the fans. They get to feel like they’re engaging with their favorite artist through music. It’s great for the rights holders—obviously this is a heavily monetizable thing. And it’s something that literally could not have existed up until approximately right now. And my sincere hope is that going forward, we find way more of these opportunities of things to build together that couldn’t have existed until today. And just to say it out loud, the digital musical experience has basically not changed for 25 years. We’ve just been streaming music for 25 years. And I think music is due for a new innovation and a new format. And so that’s what we’re here to do.
Sonya Huang: When are we gonna see Suno at Coachella?
Mikey Shulman: You probably have already. It’s probably in a lot of the music. It’s probably in a lot of the backing tracks, I’m sure.
Sonya Huang: But I mean, like a main stage, a consumer participation thing.
Mikey Shulman: I hope that at some point in the next year, we see a truly interactive concert where the audience is actually able to participate and make music with that artist. One of the coolest parts of my job is if I go and I demo Suno to an audience of hundreds or even a thousand people, is making a song with that many people all at once. And it’s a very special moment. It’s almost religious. A lot of religions will do chanting and singing in large groups. And why does that have to be confined only to a religious context? Why can’t that happen at Coachella where people are already so excited about being together at a festival? And so my sincere hope is that happens in the next 12 months.
Sonya Huang: Love it. Okay, we talked a lot about the model layer and the cultural experience of making music. I’d love to talk about the application layer product building, because I think that’s also an area where you guys have been really, really innovative. What’s your approach to how to think about building in the application layer?
Mikey Shulman: I guess a lot to say here. The first thing is actually there really isn’t enough innovation for consumers right now, but the average consumer is not willing to put up with rough edges. And it’s because you’re not using this for work, you’re using this for fun, you’re probably paying for it and not your boss is paying for it or your company is paying for it. And so there just needs to be a bigger emphasis on the actual experience that we deliver to people. Also, just if we’re being honest, like, it’s unclear how much moat exists in only a model. And I’ll just say it, Google has started to build music models. And while ours are way better today, they’re Google and they’ll outspend us seven days a week, and they can probably catch up on the model side. And so I think it’s just really undervalued to invest in the product and the UI and the UX to make sure that you’re constantly delighting people.
You know, one of our company values is actually we’re just a music company. And in many ways, I don’t think of us as a technology company. And this is to make sure that we’re not building technology for the sake of building technology. We’re building technology for the sake of delighting people. And infusing that in the culture is actually just really helpful in getting people to realize what the whole point of the company is. And so that manifests itself in lots of little ways, but from a product building strategy, that’s what it is.
Sonya Huang: That’s awesome. What are some of the, I guess, consumer product decisions you’ve made that you’re most proud of or that were the most contrarian?
Mikey Shulman: A bunch. One that I got wrong was getting off Discord very quickly. I thought we would be on Discord for a while. We got off Discord at the end of 2023, and we released a pretty thin, not full-featured web app. And it took five days for 90 percent of the traffic to move to the web. So that’s just an overwhelming signal that I had gotten that wrong.
Maybe the biggest one and the most contrarian one is at the time a lot of people were experimenting with music. Let me give two, actually. One was to focus on songs and not just background music, to focus on lyrical music. And it’s because a song is a story, and captivates you in a way that vocal-less background music just won’t. It was also just way harder. And so nobody was really able to do that at the time. And so by figuring that out, that was certainly a source of moat. But in hindsight, it’s not just that we were able to do something hard. It’s that the human voice touches people in a certain way and just makes the product way more delightful than just making background music for fun.
And then the other is also in the same direction. We decided to make full songs. And so again, a song is a story. It’s maybe on average three or three and a half minutes. And we optimized for that, even though originally most technologies just let you make something like 10 or 12 seconds of music at the expense of sound quality. And for the longest time, our audio was really not crisp. And every single one of our competitors had just way crisper audio. And everybody could hear one second of a Suno song and know oh, that sounds like crap. That’s a Suno song. And to just go all in on that and say, okay, we’re going to make full songs and yes, they’re not going to sound amazing, but they are still going to tell the story, instead of making perfectly sounding audio that just is background music. And so the choice of technologies there was to use autoregression instead of diffusion. But it was really kind of product-driven, and to say it’s not just that we like autoregression because we have emotional attachment to that technology. It’s because we think that making a song and telling a story is more important than making crisp audio.
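The autoregressive idea mentioned above can be sketched with a toy example. This is an assumption-laden illustration, not Suno's model: `generate` and `make_toy_model` are hypothetical names, and the "model" is a fixed two-tap linear predictor standing in for a learned network. The point it shows is structural: an autoregressive generator produces each new sample from the ones already produced, so the same loop yields a longer piece simply by running longer.

```python
import math
from typing import Callable, List


def generate(
    next_sample: Callable[[List[float]], float],
    seed: List[float],
    n_samples: int,
) -> List[float]:
    """Autoregressive loop: each new sample is predicted from the ones already produced."""
    out = list(seed)
    while len(out) < n_samples:
        out.append(next_sample(out))
    return out[:n_samples]


def make_toy_model(freq_hz: float, sample_rate: int = 48_000) -> Callable[[List[float]], float]:
    """Hypothetical stand-in for a learned model: a fixed two-tap linear predictor
    that continues a sine wave at freq_hz from its two most recent samples."""
    w = 2 * math.pi * freq_hz / sample_rate
    coeff = 2 * math.cos(w)
    return lambda history: coeff * history[-1] - history[-2]


sample_rate = 48_000
w = 2 * math.pi * 440.0 / sample_rate
seed = [0.0, math.sin(w)]  # first two samples of a 440 Hz sine

short = generate(make_toy_model(440.0), seed, 4_800)    # 0.1 s of audio
longer = generate(make_toy_model(440.0), seed, 48_000)  # 1 s, same loop run longer
```

The longer output begins with exactly the shorter one, which is the property that makes this family of methods a natural fit for extending a clip into a full song.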
Sonya Huang: That’s so cool. What’s ahead for Suno? $300 million in revenue run rate. I mean, you’ve made it extraordinarily far. What’s ahead?
Mikey Shulman: A lot. I think it’s really early. Most people don’t even know about us. The product is still very crude. There’s a lot of room to run. I think you’ll see us do a couple of things. One is try to increasingly make it a more social interaction. Music is meant to be social, and so you are meant to be sharing music more with people, but also creating more with people. And that can be both synchronous and asynchronous. And so perhaps one day I’m going to share with you not even a song, but a template for a song that you are going to explicitly riff on and send back to me. And that is you and me kind of co-creating. Maybe you’re going to do that with your favorite artist, with some of their old music that never got released, whatever that may be. And I think you will see us go much more in the direction of letting people express themselves in the music. And so the last big feature we’ve released is the ability to use your own voice. When you hear yourself in the song, you get so much more attached to it. But actually even more so is when I send you a song and you can hear me in it, that song will resonate much more than some nondescript voice, even if that nondescript voice is very good. And it’s because the human ear is highly attuned to voices. We kind of evolved that way. And so both of those, being more social and letting people pour themselves into the music, will be a huge focus for us for the next 12 months.
Sonya Huang: I love that. Love music videos.
Mikey Shulman: I love music videos. I don’t see enough music videos getting made. I grew up watching music videos on MTV.
Sonya Huang: Me too.
Mikey Shulman: And there’s just a huge difference between a music video, which is heightening the song and telling the story, versus background music to put behind whatever YouTube content that I may make. And I love the former and I’m much less into the latter. And it’s because what we would like to do is pull people into music more than they are now, and not just have music be forever a background thing. There’s actually a video product in beta in Suno right now. And people really love it.
Sonya Huang: Nice. That’s really cool. I can’t wait. Why do you think there’s so few consumer founders in AI right now? Like, what’s up with that? Everyone’s going for the enterprise. OpenAI just shut down Sora, which to me was—I mean, I understand their reasons, but why do you think there’s so few people building in consumer right now?
Mikey Shulman: I mean, I should ask you that. You’re the professional investor. I mean, my theory is it’s just harder, and there are a lot of obvious business problems to solve. And I’m happy to have less competition, honestly. Why do you think it is?
Sonya Huang: I think it’s very clear to see how AI is going to automate a lot of existing business processes. I think it takes real creativity to dream about how AI can seep into the way that we actually play and create. I think it takes real creativity to see that. And most people, when they think AI music, probably think AI Spotify, which just sounds terrible. And I think it takes a lot of creativity to do what you’re doing.
Mikey Shulman: Well, thank you. Yeah, I think we are much more inspired and motivated by doing something that wasn’t possible until today, instead of automating or speeding up something that already exists. Again, there’s a lot of business value in automating and speeding up something that exists. In some sense, it is just more fun to do something that could never have been done before.
Sonya Huang: Yeah. And what are we going to do with all our time after all the robots are doing all our work? Like, we …
Mikey Shulman: You’re not going to want to doomscroll for an hour. You’re going to want to be productive and fulfilled.
Sonya Huang: Yeah, exactly. Awesome. Mikey, thank you for sharing everything about your journey to Suno. It’s been so cool to see you at the helm of a music company and an active entertainment platform, just defining what the creator layer means in the world of AI. It’s been extraordinary to watch your journey since the original days of Harrison and your Discord. And so kudos on what you’ve done. Big admirer of you and Suno.
Mikey Shulman: Thank you so much. This was a lot of fun.