Podcasts Training Data Alfred Wahlforss, Listen Labs

Knowing What Your Customers Want, All the Time: Listen Labs’ Alfred Wahlforss

Alfred Wahlforss, co-founder and CEO of Listen Labs, is building an AI agent that interviews your customers at a scale no focus group ever could—thousands of voice conversations at once, drawn from an audience of 30 million people. Alfred explains the counterintuitive finding underneath it all: people are often more honest with an AI than a human interviewer. He walks through why interview transcripts turn out to be the richest fuel for predicting how customers will behave, how Listen back-tests its simulations to know which questions it can and can’t answer, and why 80% of the company’s engineering goes into building the right audience. As AGI makes building trivial, Alfred argues the scarce resource becomes knowing what to build.

Alfred Wahlforss: Our goal is to get to a billion people in our audience and then to be able to stratify and know what exactly is this person an expert on. And it might be, you know, even something like sneakers. You have some people who are influencers and kind of early adopters. And if you’re able to find that audience and interview them first, the insights are much more valuable. And we can learn across all of the interviews that we do. We build profiles of people as we do more interviews in the platform, and then we can search and find the right person.

Sonya Huang: Okay, today we’re sitting down with Alfred Wahlforss, founder and CEO of Listen Labs. Listen is an AI-first customer research platform that can run thousands of voice interviews simultaneously. You launched about a year ago, and you now serve 20 percent of the Fortune 500, including iconic brands like Microsoft, Anthropic, Sweetgreen, NBC and others. And Konstantine and I are very, very excited to sit down with you today and talk about market research and how it’s getting transformed with AI.

Alfred Wahlforss: Yeah, thank you for having me.

Sonya Huang: Maybe just to get started—you are building an AI-enabled platform that scales market research. What does that mean?

Alfred Wahlforss: Yeah, so we have this AI agent that can understand your customers better than you can. And the way we do that is by talking to them. So to give you an example, you can ask a question like, “How can you improve Cursor’s onboarding?” And then Listen will create an interview guide, which is instructions for the agent to conduct the interviews. And then we have an audience—we have 30 million participants. We can find pretty much anyone from an oncologist to a software engineer, and we’ll go and actually talk to them and have hundreds of those interviews, and then analyze the data and give you recommendations. And now the final step that we’re just launching in a couple of months is simulation. So after you’ve done tens of thousands of interviews in the platform, can you predict how your customers will answer questions in the future? To put it another way, as we get closer to AGI, it will be easier to build things, but the hard part will be knowing what to build. And that’s what we’re building at Listen.

Sonya Huang: Awesome. Do you have any favorite customer stories?

Alfred Wahlforss: Yeah, so Chubbies is one of our customers.

Sonya Huang: Like the shorts brand?

Alfred Wahlforss: Yeah, they’ve been one of our early customers.

Sonya Huang: What do they use you for?

Alfred Wahlforss: They use us for everything—so a lot of marketing testing, testing shirts to understand what products perform well and what doesn’t. And one of my favorite examples is they discovered that chest hair interfaced really poorly with one of the materials they have, so it’s really uncomfortable to wear one of their shirts. And they changed the shirt and it became radically more comfortable. So we solve the small things to the big things. Manscaped changed their Super Bowl ad with insights from Listen.

Sonya Huang: Never heard of that, but I’m not going to ask.

Konstantine Buhler: So you’ve got the men’s hair market covered.

Alfred Wahlforss: [laughs] Yes, that’s our niche.

Konstantine Buhler: From shaping to clothing.

Alfred Wahlforss: That’s right.

Konstantine Buhler: Wow.

Alfred Wahlforss: We do other things. Skims is one of our customers as well.

[CROSSTALK]

Sonya Huang: Don’t know what you’re talking about, but I know context clues. So that’s awesome. I’d love to understand—as you framed it, as we get closer to this AGI future, one of the questions I have is: Traditionally I’ve always been very skeptical actually of surveys, because people get paid to take surveys, so you already have a selection bias issue. The things that people say they would do or the way that they describe how they would behave is different from how they actually behave in practice. And so I come from the school of thought where actual telemetry in the real world matters so much more than asking people about what they would do. And so I’m curious what you think of that, and how you think AI or Listen Labs can help bridge that gap.

Alfred Wahlforss: Yeah, so we’ve done a lot of research on this. One of the things we’ve done with surveys, for example, is we went back to the same person and asked them a multiple choice survey again, and they were radically inconsistent. So even if you go back to the same person and ask them a survey question in a multiple choice fashion, they’re much more inconsistent. But we did the same thing with Listen, when you actually have to think and really reason through your answer, you’re much more consistent with at least how you answer the same question.

And then we are constantly tracking. For example with Chubbies, when we test their different shirts, a couple of months later we look back and see how that performed against the actual sales data. And I think it depends on the different use cases. I agree that A/B testing is kind of the holy grail, but in practice it becomes really difficult to get right because you need a very large volume of users. And it’s really useful to have some kind of input rather than no input at all.
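The retest comparison Alfred describes, asking the same person the same question twice and checking whether the answers match, can be sketched as a simple agreement rate. This is a minimal illustration and not Listen's method: the function name, the respondent IDs, and the answers are all hypothetical, and it assumes open-ended interview answers have already been coded into categories so they can be compared.

```python
def retest_agreement(first_wave: dict[str, str], second_wave: dict[str, str]) -> float:
    """Share of respondents who gave the same answer in both waves.

    Only respondents present in both waves are counted.
    """
    shared = first_wave.keys() & second_wave.keys()
    if not shared:
        raise ValueError("no respondent answered both waves")
    same = sum(first_wave[r] == second_wave[r] for r in shared)
    return same / len(shared)


# Made-up answers for illustration only (not real study data).
survey_first = {"r1": "B", "r2": "A", "r3": "D", "r4": "C"}
survey_second = {"r1": "C", "r2": "A", "r3": "B", "r4": "A"}

# Only r2 repeats the same choice, so agreement is 1 out of 4.
survey_agreement = retest_agreement(survey_first, survey_second)
print(survey_agreement)
```

Running the same function on coded interview answers from the same people would give a like-for-like consistency comparison between the two formats.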

Konstantine Buhler: Does Listen do voice to text, as in the actual customer who’s answering the survey can speak their answer and then you guys transcribe it? Does it also do text to voice? Is it a two-way conversation? What does Listen start with and what does it finish with for the user experience?

Alfred Wahlforss: Yeah, so it’s essentially a Zoom call that you have with the agent, so you’re on video. And we can also detect your emotions.

Konstantine Buhler: Hmm, that’s cool.

Alfred Wahlforss: So that’s another way to bridge the gap between what they say and how they actually think and feel. So it looks at your eyes, the way you say it, and that’s kind of much closer to how you actually behave in the real world.

Konstantine Buhler: And have you seen, per Sonya’s point, that actually having the person’s face and their emotions and their voice and whatnot yields more engagement and truthfulness? Have we been able to have any studies or at least data to point in that direction?

Alfred Wahlforss: Yeah, specifically with advertising it’s a huge benefit. You might have people answer on a Likert scale, which is like a five-point scale where you click whether you’re extremely likely to buy this product, and you might get very high scores on a question like that. But when someone also reacts very enthusiastically, the ad is going to perform much better. And we’ve seen that those ads then perform better in performance marketing, for example on Meta and LinkedIn.

Konstantine Buhler: And can you, if you’re the customer and you commissioned this and you get all this response, can you actually click in and if you ever wanted to, watch the interview to get that level of granularity?

Alfred Wahlforss: Yeah, so we built the platform around traceability so that for every data point you can always click and then look at the video or see the quote, and so you know that the AI is not just hallucinating—you can see where it’s coming from.

Konstantine Buhler: That’s awesome. Makes sense.

Sonya Huang: How’d you come up with the idea to build this?

Alfred Wahlforss: So my co-founder and I actually built a consumer app.

Sonya Huang: That did what?

Alfred Wahlforss: That went viral. It was called BeFake. So you could create an AI avatar of yourself. It was an early version of the ChatGPT images, and you could fine-tune Stable Diffusion and put yourself in that world. And that ended up going super viral, and overnight we had 20,000 users. And we were also experimenting with different ways of using AI, so we built this AI interview for ourselves because we had a bunch of questions of how—we had a ton of churn, so we wanted to understand why, how they thought about our positioning, different use cases. And it was really useful for us, and that’s how we got started.

Sonya Huang: Maybe just walk us through how the industry is changing before and after Listen Labs. Historically, let’s say you’re somebody with an app with 20,000 users. You don’t understand how users are using the app, what they want next, why they’re churning. Historically, how did people go about doing that?

Alfred Wahlforss: Yeah, so what we discovered was that there are these survey tools that are pretty old school, like Qualtrics. But then there’s also this very large services industry, because market research gets harder and harder to do as you scale, especially if you want to talk to your prospective customers, not your current customers. So that’s a multi-billion-dollar industry. And what they do starts with coming up with questions to ask, which is an academic subject in and of itself. It’s actually really hard to know, to your point, how to ask questions that get to how someone actually will behave. You can’t just ask, like, how much are you willing to pay for this? There are different methodologies that work better than others. Then it goes on to finding the audience, how do you source the participants, and then to analyzing hundreds of those calls. And in traditional industries like CPG, even at Microsoft, they spend tens of millions of dollars on focus groups, to bring people into a room and interview them. We can help make that much faster.

Sonya Huang: Okay, so that’s the old world of how this used to be done. Maybe describe the new world. And then it seems to me that there were obvious first-order benefits, like it’s probably much more scalable, probably much more cost-effective. But there are probably also less obvious benefits. Maybe just talk about some of the benefits of, you know, what is it like when you actually do AI-first market or customer research?

Alfred Wahlforss: Yeah, so most decisions that get made are not based on customer input, right? And the reason for that is it’s just a lot of friction to even talk to customers. So when you can lower the barriers of talking to customers, you end up making much smarter decisions. So the speed advantage is actually huge. For us, you can get input within five minutes from real people. And it’s a really magical experience when you see hundreds of people populate in your interview. And so that’s one thing. And because it’s asynchronous, it’s also much more affordable, so you can pay people much less than if you would’ve had to run synchronous interviews. And so actually that’s an interesting thing that people often ask us, like, do people even like being interviewed by an AI? And the objective answer is yes, because you can pay them less to talk to an AI than to talk to an actual interviewer.

Sonya Huang: Why is that?

Alfred Wahlforss: I think it’s mostly because it’s asynchronous and people are very busy, but then also …

Konstantine Buhler: Lower pressure?

Alfred Wahlforss: Yeah, lower pressure. You can kind of go on and off. And we’ve also found that people are more honest when talking to an AI. We’ve had people really open up. It’s a very therapeutic experience, because it’s a non-judgmental entity that’s really interested in you. And we can also have sensitive conversations, like interviewing kids about how they react to different products. And so I think that’s another advantage as well is that people can be brutally honest talking to the AI.

Sonya Huang: Okay, so historically, for example, if you want to do research on the kids’ market—very, very hard to access that market. Is that a regulatory thing? Is it a scheduling thing?

Alfred Wahlforss: Yeah, it’s—you need parental consent. Kids are really busy: they go to school, they have extracurricular activities. How do you find time with them? And you need to find the right kind of kids. Like, one of the things we realized is the audience is extremely important, and that’s actually where we spend 80 percent of our engineering resources. Every company is driven by a power law in customer segmentation. So even a product like Sweetgreen, which you would think is for everyone, the right audience is typically urban, high household income, mostly female. And by the way, they need to know what seed oils are, which only about one percent of the population does. And then you find that some people go to Sweetgreen every single day and that’s 80 percent of their revenue. So if you can find that segment, the research is so much more actionable.

Konstantine Buhler: Yeah, there’s probably a network effect to it as well, where once you get to a certain scale and people use it, you can access the same kind of person that otherwise might be really difficult to access. Or maybe it’s a scale economy—something along the lines of accessing those really, really specific people who are really valuable for the type of product you’re trying to introduce.

Alfred Wahlforss: Yeah, it’s really about—our goal is to get to a billion people in our audience, and then to be able to stratify and know what exactly is this person an expert on. And it might be even something like sneakers. You have some people who are influencers and kind of early adopters, and if you’re able to find that audience and interview them first, the insights are much more valuable. And we can learn across all of the interviews that we do. So we build profiles of people as we do more interviews in the platform, and then we can search and find the right person.

Konstantine Buhler: So someone might say in a totally unrelated interview, “I’m a total sneakerhead.” And you can keep that in the database on that person. And then when Nike or what have you is launching a new product line, you can offer that person up.

Alfred Wahlforss: That’s right.

Konstantine Buhler: That’s amazing.

Alfred Wahlforss: And that was not possible to do before, because it was usually separate entities, and it would be a very manual process where you would have an email list and you would just spam email.

Konstantine Buhler: I’ve been on the receiving end of those. Yeah, they’re terrible.

Alfred Wahlforss: And one of the problems with that is you then need to have an extensive screening process. So you have something called an “incidence rate,” which can be 10 percent, which means only 1 in 10 people gets qualified to even take the interview. And that causes significant churn on these databases.

Konstantine Buhler: Yeah.

Alfred Wahlforss: Because it’s really annoying to be screened out 10 times to even get paid the first time.
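The arithmetic behind "screened out 10 times" can be made explicit. The sketch below assumes, purely for illustration, that every screener a panelist attempts is independent and has the same incidence rate; real panels will not behave this cleanly. The function names are hypothetical.

```python
def expected_screeners(incidence_rate: float) -> float:
    """Expected number of screeners attempted until the first qualification.

    Under the independence assumption this is the mean of a geometric
    distribution, 1 / p.
    """
    if not 0 < incidence_rate <= 1:
        raise ValueError("incidence_rate must be in (0, 1]")
    return 1 / incidence_rate


def prob_still_unqualified(incidence_rate: float, attempts: int) -> float:
    """Probability of failing every one of `attempts` screeners."""
    if not 0 <= incidence_rate <= 1:
        raise ValueError("incidence_rate must be in [0, 1]")
    return (1 - incidence_rate) ** attempts


rate = 0.10  # the 10 percent incidence rate mentioned in the conversation
print(expected_screeners(rate))
print(round(prob_still_unqualified(rate, 10), 3))
```

At a 10 percent incidence rate, a panelist attempts 10 screeners on average before qualifying once, and roughly a third of panelists would still be unpaid after 10 attempts, which is consistent with the churn Alfred describes.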

Sonya Huang: Why do these brands even need you for access? If we take Sweetgreen—Sweetgreen knows who that 80 percent is. Can’t they just reach them? Don’t they have direct relationships with them already?

Alfred Wahlforss: Yeah, so they can—and we do that as well, we connect to their CRM and they can send that out. But then the really interesting part is how do you talk to prospective customers—people who also may not be kind of current power law users? And how do you compare those two? And then also what we’ve found is that the CRM is typically really unorganized. And sometimes there’s also regulatory issues—if you’re at Google, you can’t just send emails to people who use Gmail. And it gets much easier to use an external third party.

Konstantine Buhler: And you run the risk of spam, which can get you totally blocked. I have seen that at some of our companies over the years where you do outbound, and then eventually you’re in the Google filter and next thing you know you’re in Microsoft purgatory. I guess going through you guys, you don’t have to deal with that.

Alfred Wahlforss: Yeah, exactly.

Konstantine Buhler: That’s cool.

Sonya Huang: What does this mean for the McKinseys or the whoevers of the world—the people that are building the hundred-slide decks that, you know, reach 3,000 people to reach some set of incentives?

Konstantine Buhler: Didn’t you do that, Sonya? Wasn’t that a former life?

Sonya Huang: You know, no, Konstantine, but I’m glad that’s what you think of me.

Alfred Wahlforss: [laughs]

Konstantine Buhler: Isn’t that what [inaudible] does?

Sonya Huang: We hired a consultant. I used to hire these people.

Konstantine Buhler: Got it.

Sonya Huang: So I was a layer on top of the layer on top of the layer.

Konstantine Buhler: Oh, okay.

Sonya Huang: I was even more redundant. But what does it mean for all these people? Do they still have a role to play in this new future?

Alfred Wahlforss: Yeah, I think AI is changing all roles very quickly. And we work a lot with Bain, for example. So they use us to speed up their traditional processes. And I think they still have a role to play. I think traditional services and then being able to implement these changes is still extremely valuable. But a lot of margins are going to drop, and you have to make sure you kind of unbundle a lot of your services to maybe allow for AI agents to help solve some of the problems that you would go to traditional consulting firms for.

Konstantine Buhler: Maybe I’m an optimist here, but why wouldn’t it be more? Why wouldn’t I, if I’m running a business, say, “Oh great, I want to find five new areas to expand into now that I have AI and these tools, and I will pay you, Bain or what have you, the same dollars. You use Listen, and just explore those new areas and tell me where to build.” Is that overly optimistic?

Alfred Wahlforss: No, I think it’s one of those areas where the ceiling is very high. You can learn more about your customers and you can build more things. So I think you’re right.

Konstantine Buhler: Yeah. I’m still thinking about the chest hair-shirt thing.

Alfred Wahlforss: [laughs]

Sonya Huang: I’m glad that 20 minutes in, you’re still thinking about chest hair, Konstantine.

Konstantine Buhler: There are so many little things I’d love to tell the companies that I’m a consumer of—like, even the way they laced these shoes, I’d love to give that feedback.

Sonya Huang: This is why you’re a venture capitalist.

Konstantine Buhler: Details. Details.

Alfred Wahlforss: We hope to live in a world that finally works the way people want.

Konstantine Buhler: That would be great. Please.

Sonya Huang: Are you seeing any pricing compression already hit the industry? Like, I would imagine if I am Bain’s customer, I’m thinking, well, you’re able to do this survey a lot more efficiently now with AI than before AI. Who’s getting the benefit of that economic surplus?

Alfred Wahlforss: So because you’re able to do it faster, I would argue you should be able to charge more for it.

Sonya Huang: And is that what’s actually playing out?

Alfred Wahlforss: We’ve done some studies where we’re able to charge hundreds of thousands of dollars to speak to twenty doctors across eight countries. So maybe over the long term, like, the individual interview will become more affordable, but I think you’ll be doing two orders of magnitude more research. And I think what’s really exciting is also simulation, which is something we’re building now, where you’re able to unlock the 99 percent of use cases where you would never have time to talk to real people.

Konstantine Buhler: I think that’s so awesome in part because there are so many areas where they don’t even listen to the customer. Like medicine. There are a million little problems with the medical system. I hear about it all the time. And these are doctors, you know, they’re busy, important people, but it feels like the companies haven’t even invested the time in figuring out where all those paper cuts are. And the doctors are really busy, so they’re not going to go schedule an appointment and have some long conversation and meet with some group. But if they could do it at any time in an app on their phone as part of their normal homepage app, and give feedback on their EHR or something in the operating room or something along those lines, that seems like a life-saving use case for Listen over time.

Alfred Wahlforss: Yeah, I think what I’m really excited about as well is taking all those small things and then telling another agent to go and solve that problem. And we’re getting pulled in this direction by some of our customers where they will have a churn interview and then they will connect—if you find a bug for example, they’ll connect that to another coding agent to go and solve the problem.

Konstantine Buhler: That’s cool.

Sonya Huang: Let’s talk about generative agent simulation. It seems like the entire industry has gone from Market Research 1.0—call 100 people one by one, collate them manually—to Market Research 2.0, AI-native, where AI designs the question track, is able to talk to thousands of people simultaneously, synthesize the answers. It seems like we’re maybe moving to Market Research 3.0 with generative agent simulation. What do you make of that? You know, I both see the dream of it—I see how synthetic data has changed, for example, self-driving cars. And then I’m also inherently skeptical of it. Like, is a bunch of metadata just remixing what’s already in the pre-training sets? Are you actually learning anything useful or what else is in there? So I’d love to hear your take on it and how you guys are taking on the 3.0.

Konstantine Buhler: Yeah. And maybe what is it too, to start?

Alfred Wahlforss: Yeah, so the way we are building simulation is by interviewing a single person. Say if I interview Konstantine for one hour, I can probably start to predict your preferences to some degree.

Konstantine Buhler: Fascinating insights about chest hair.

Alfred Wahlforss: [laughs] And it turns out that LLMs are quite good at this as well. So you can essentially try to feed in as much information as possible on a single individual. And then in some cases we’re able to get 95 percent accuracy to predict how they will answer certain questions. Now the problem becomes things are changing all the time, and chaos theory tells us it’s really hard to predict the future, otherwise we would be on Wall Street and making a ton of money. So the way we think about it is you need to hydrate these audiences. And the way we do that is through all of the interviews that are running through Listen. So we have a very strong network effect. We’ve done a million interviews so far, and that has grown exponentially since we reported that number.

Konstantine Buhler: Wow.

Alfred Wahlforss: And we’re able to train audiences on those interviews. So you can imagine a future where you can ask a question in Listen like, “How do software engineers think about Claude Code?” And then Listen will say, “Well, I already talked to 1,000 software engineers this week, let me predict how they’re going to answer that question.” But the tricky part is knowing what things you can answer and what can’t you answer, because …

Sonya Huang: And how do you do that?

Alfred Wahlforss: Yeah, we try to be very explicit with the model about what domain of knowledge they have, and then see how much you can expand that domain. That’s kind of the fundamental idea. And we can back-test how well the simulation works against what’s in our training dataset: we remove one of the questions and then see, okay, how accurately did you predict that? And then you can add in nonsensical things, like what’s the name of their dog or something like that, and then ask: is the model able to understand that you can’t predict that?
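The back-test described here has two parts: hold out one known answer and see whether it can be predicted from the rest, then ask something outside the domain and check that the predictor declines to answer. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the data shape (topic and question mapped to a 1 to 5 rating), the function names, and especially `toy_predictor`, which is a trivial stand-in and not Listen's model.

```python
from statistics import median_low
from typing import Callable, Optional

# Hypothetical shape: (topic, question text) -> rating from 1 to 5.
Question = tuple[str, str]
Profile = dict[Question, int]
Predictor = Callable[[Profile, Question], Optional[int]]


def toy_predictor(profile: Profile, question: Question) -> Optional[int]:
    """Stand-in model: predict the low median of the person's other ratings
    in the same topic, and abstain (None) when there is no such evidence."""
    topic = question[0]
    same_topic = [rating for (t, _), rating in profile.items() if t == topic]
    if not same_topic:
        return None
    return median_low(same_topic)


def leave_one_out_accuracy(profile: Profile, predict: Predictor) -> float:
    """Hold out each answered question in turn and count exact predictions."""
    hits = 0
    for question, actual in profile.items():
        held_out = {q: a for q, a in profile.items() if q != question}
        if predict(held_out, question) == actual:
            hits += 1
    return hits / len(profile)


def abstention_rate(profile: Profile, probes: list[Question], predict: Predictor) -> float:
    """Share of out-of-domain probes on which the predictor abstains."""
    return sum(predict(profile, q) is None for q in probes) / len(probes)


# Made-up profile for illustration only.
profile: Profile = {
    ("shirts", "comfort of fabric"): 4,
    ("shirts", "fit"): 4,
    ("shirts", "would recommend"): 4,
    ("ads", "liked ad 1"): 2,
    ("ads", "liked ad 2"): 2,
    ("ads", "liked ad 3"): 5,
}
probes = [("pets", "name of their dog")]

print(leave_one_out_accuracy(profile, toy_predictor))
print(abstention_rate(profile, probes, toy_predictor))
```

Reporting the two numbers separately keeps the distinction Alfred draws: one measures how well in-domain answers are recovered, the other whether the predictor recognizes questions it has no basis to answer.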

Konstantine Buhler: That’s really cool.

Sonya Huang: What sorts of things are you finding you can predict well versus can’t?

Alfred Wahlforss: One of the most useful things is message testing. So that’s the idea of, like, what’s the tagline on the billboard? I was actually using it this weekend. So I created a panel of our customer base, and I had to come up with the title for a talk at a conference. And it’s like a small thing, but it actually does matter because it will increase conversion if people show up. And I came up with 100 different titles for my talk and inputted that into our simulation.

Konstantine Buhler: Oh wow.

Alfred Wahlforss: And the top talk was like twice as good as the next one.

Konstantine Buhler: Wow, cool.

Alfred Wahlforss: And I don’t know if it’s correct, but it certainly felt correct, and it was really helpful to have guidance in making that decision. And I also think even if it’s wrong, it’s just nice to have some help in making a decision. It’s also nice to outsource your decisions.

Sonya Huang: And how does it compare to just asking ChatGPT the same thing?

Alfred Wahlforss: Yeah, so then I inputted the same questions into ChatGPT. I had another talk I did that was not so successful, and I inputted a competitor’s—or another talk that was more successful. And I showed both of them to ChatGPT, and both of them to our simulation. And ChatGPT picked the wrong one, and our simulation picked the right one. So it’s early for us—we’re going to release this in a couple of months—but it seems like it’s performing better than the general models. And the models are trained on the average person, and you want to build for a very specific niche, and that’s how we can essentially train the models to follow that niche.

Sonya Huang: And just to push on this—because I think it’s so fascinating—can’t you kind of force the models into a specific niche or personality? Like, “Hey ChatGPT, you’re a 35-year-old, really grumpy software engineer that likes using your terminal.” And then it does take on the preferences of that niche. That’s sort of my mental model, at least. And so I’m actually surprised that ChatGPT wasn’t able to arrive at the right answer and that bootstrapping off real user data was, because ultimately it all is kind of a reflection of real user data, right? And so actually, what is the intuition for why sim-only on pre-trained data isn’t sufficient?

Alfred Wahlforss: Yeah, so we’ve tried many different inputs, and that certainly performs a little bit better than just vanilla ChatGPT. But what performs much better is—we tried credit card spend, behavioral data, purchasing behavior, but what we found was the best dataset is interviews, because it allows you to go off tangents, it understands—you can ask behavioral questions. And also it can’t just be any interview—like, the way you design the questions is also really important. And the intuition, I think, is that the models don’t have clean data on how a specific persona acts and how they think.

Konstantine Buhler: It’s anecdotal, but it makes perfect sense, because if you want to understand someone, what better way to understand them than asking them a lot of questions? That’s why we’re all here—it’s kind of the purpose of this kind of format. And if you have enough people that follow a certain group as opposed to the average, that can tell you a lot about other things they might not have explicitly said. All of AI is this generalization of some sort of compressed data, of some sort, and so if you have this compression in a slightly different part of the hyperspace, and you say, “Now complete this orbital of what everybody is thinking in this category of person,” Listen can fill that out because it has enough interviews.

Alfred Wahlforss: Yeah.

Konstantine Buhler: Do you think you’ll offer that package as a product? As in if I wanted to understand my customer—and for me, for us our customers are founders, and they’re very different people. If I wanted to understand my customer, could you do active interviews, the normal Listen Labs interviews, have 1,000 or 10,000 cumulatively, and then offer a little special-purpose Listen Labs bot that then I can use instantaneously for any ad hoc question?

Alfred Wahlforss: Yeah, that’s exactly what we have. So that’s what we call “augmented responses.” The cool part of this as well is that it can also live in your coding agents or your other agents. So I think in the future you’ll want to have almost a human API where the agents are able to call the preferences of your users to be able to know what to build, how to do it, or who to invest in, or how to help them best.

Konstantine Buhler: Today, is it all RAG? Is it fine-tuned? Is it something else? How do you take those conversations and then combine them with the models you’re doing the rest of Listen Labs with?

Alfred Wahlforss: Yeah, we are doing post-training, typical RAG as well. There are a bunch of different techniques. Some of them are proprietary, but yeah.

Konstantine Buhler: All right. We’ll do customer interviews on all your engineers. Report back.

Sonya Huang: I’m curious what you think of multi-agent systems and their role in helping us kind of iteratively use—at inference time iterate a better answer. Is that part of how you’re doing simulation or not?

Alfred Wahlforss: Yeah, the way we do simulation is essentially you have one person that you model really, really well and then you scale it up with a thousand people. So you have a representative sample, and it’s essentially multi-agent.

Sonya Huang: But you’re not having those thousand people debate each other. That’s what I’m asking.

Alfred Wahlforss: Oh yeah. No, we don’t have that yet, but that’s a good question.

Sonya Huang: Do you think that would help?

Alfred Wahlforss: Potentially, but there are these other competitors that are doing more of that approach. The worry is that again, chaos theory tells us that when things compound it becomes really hard to predict how they’re going to interact with each other. It’s something we should definitely explore more, but I’m a little bit skeptical of the approach.

Sonya Huang: Maybe the analogy I’d make is the AI council approach, which is send the same query out to three different LLMs and then have one LLM act as judge in synthesizing them. I do think on average you get a slightly better response.

Alfred Wahlforss: Yeah.
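The "AI council" pattern Sonya mentions, sending one query to several models and letting a judge combine the drafts, can be sketched with plain callables. The names here are hypothetical, the members are hard-coded stubs rather than real model calls, and the toy judge just takes a majority vote, whereas the judge in the conversation is itself an LLM that synthesizes the drafts.

```python
from collections import Counter
from typing import Callable, Sequence

Model = Callable[[str], str]                 # query -> draft answer
Judge = Callable[[str, Sequence[str]], str]  # (query, drafts) -> final answer


def council(query: str, members: Sequence[Model], judge: Judge) -> str:
    """Collect one draft per member, then let the judge produce the answer."""
    drafts = [member(query) for member in members]
    return judge(query, drafts)


def majority_judge(query: str, drafts: Sequence[str]) -> str:
    """Toy judge: return the most common draft (first seen wins a tie)."""
    return Counter(drafts).most_common(1)[0][0]


# Stub members standing in for three different models.
members: list[Model] = [
    lambda q: "Title A",
    lambda q: "Title B",
    lambda q: "Title A",
]

answer = council("Which talk title is stronger?", members, majority_judge)
print(answer)
```

Swapping `majority_judge` for a function that calls a model with the query and all drafts would give the synthesis variant without changing `council` itself.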

Sonya Huang: Cool. So where else do you see yourself going from here then? You’re going from Market Research 2.0 to Market Research 3.0, now with kind of generative simulation. Do you expect that 3.0 takes over as the majority of queries over time? And then what else is ahead?

Alfred Wahlforss: Yeah, I think you’ll still need human input. But I think there would be many more use cases that are now opened up where you can get customer input. So for the large decisions—if you’re doing a Super Bowl ad or things like that—you’ll still need to run real interviews. But for the smaller things, like what should be the tagline for your billboard and it’s a small billboard, then you can use simulation to answer that. And I still think there’s a lot of alpha on the core product as well to improve. I mean, when we started, the core idea was just making the interview less annoying to go through. We had an eval that looked at repetitive questions, or looked at is the AI even able to follow the instructions. And with GPT-4, sometimes we would ask the same question a hundred times.

Sonya Huang: Yeah.

Alfred Wahlforss: And in the beginning, that eval was at 20 percent. Now we’ve been able to climb that eval to 85 percent. But now we created a new eval that’s much more advanced, so it’s able to understand what are you doing on your screen when you’re screen recording, or can you skip questions that are not relevant anymore. And now we’re back at 20 percent. I think that’s one of the values that vertical AI companies can have: they have this proprietary eval that they can use and essentially climb. And that’s your advantage as a vertical AI company.

Konstantine Buhler: Keep pushing forward—better data, harder problems, better data, repeat.

Alfred Wahlforss: Yeah.

Sonya Huang: It seems to me like you’re in the middle of a very interesting infinity loop, right? Because fundamentally a company is: figure out what to build, build it, figure out what to build, build it.

Alfred Wahlforss: Write code and talk to users.

Sonya Huang: Exactly. And the “build it” is speeding up exponentially. And the “figure out what to build” is the thing that you are pushing forward. And even outside of product and engineering, the broader loop is actually strategy, execution, strategy, execution. And so much of what AI is enabling us to do is making execution faster, cheaper, better, all these things. And the thing that you guys are fundamentally positioning yourselves to do at the company is the strategy part, from what to build to what to say. Is that a fair synthesis?

Alfred Wahlforss: Yeah. And I think when we have that one-person billion-dollar company, we’ll be part of that loop. So you have a coding agent and Listen and then run that in a loop. And we’ll have these autonomous organizations. And you can do that.

Konstantine Buhler: Even the big companies, though. Like, back to this idea of you can implement things faster. Let’s say you have an agent—I mean, if you can be a big company—and we’re talking in software because software is native to us—in software, if you could talk to a customer, figure out a bug, create a PR, have a coding agent close it, ship it, customer’s happy, that seems like a really important left-hand side of the equation—find the bug from an actual human. But I imagine it’s the same thing in a big atoms company. Like, if you’re consumer packaged goods, if you’re clothing, if you’re any of those things, I imagine it’s even more important because once you actually do the thing, it’s done.

Alfred Wahlforss: Yeah, exactly. Procter & Gamble, when they’re launching in a new market, that can be tens of millions of dollars, if not more. And you have to make sure that it’s right when you launch. And that’s one of the reasons why they’re big customers of Listen.

Konstantine Buhler: Who has done this historically really well? Who are the companies that are admired in history that have done a great job of listening to their customers, either in the consumer space or in the software space?

Alfred Wahlforss: I think Procter & Gamble is kind of the archetype of the best market research organization, where they are essentially a marketing company that is trying to figure out which niches people really care about, and then build specific brands to solve those problems.

I mean, one example is Tide Pods. They were able to figure out that it was really uncomfortable to use the washing liquid, and discovered that people wanted something that was much easier to use. And through customer interviews they found that insight, made the Tide Pod, and it became really successful.

Another example, from the Acquired podcast episode where they talk about Mars: they did one of the first market research studies in the 1950s. M&M’s were originally designed for the army because they were a sweet treat that doesn’t melt in your pocket. And they discovered through market research that another great segment was young kids. And they then decided to pivot the entire advertising strategy to focus on this, because it doesn’t melt and ruin your furniture and things like that.

Sonya Huang: As we progress towards this Listen Labs future vision of the world, what are the things that you’re confident will work and what are things that you’re still not sure about?

Alfred Wahlforss: I’m confident that in the future you’ll still need to have human input, because even if you have a perfect rational being like AGI, humans are still irrational.

Konstantine Buhler: Totally.

Alfred Wahlforss: And they will still be chaotic in their nature, where they all of a sudden get obsessed with a new product, a new TikTok trend that shows up, and you have to change your entire marketing strategy towards that. And so I think that will remain a really huge part of how we do things. I think I’m still uncertain about how big a role simulation will play. I’m confident that it will work for certain questions, but we’ll see how good the models get at predicting human behavior.

Konstantine Buhler: I’d imagine it’s actually even more important the better AI gets to have the delta, because the competition—if companies are about serving people—which I think we can all agree on, at the end of the day every company is about serving humans.

Sonya Huang: Konstantine, our resident humanist.

Konstantine Buhler: [laughs] I’m a humanist, absolutely. But if companies are about serving people, because that’s why we’re all working is to help someone else in some way, and intelligence gets better and better and better, and you kind of have what the human wants here and the intelligence is approaching that asymptote, then the delta in that asymptote—which is what is in a human’s mind that isn’t in the AI’s mind—only becomes more important.

Alfred Wahlforss: Yeah. And one of the things that we’ve also realized is there’s a lot of talk around what are the moats for these vertical AI companies.

Sonya Huang: And what’s your moat?

Alfred Wahlforss: What is our moat?

Konstantine Buhler: We’ve got network effects and scale economies.

Alfred Wahlforss: Yes, we …

Konstantine Buhler: Those are nice.

Sonya Huang: I feel like we’re on an episode of Acquired right now.

Konstantine Buhler: Hey, I’m feeling it.

Alfred Wahlforss: [laughs]

Konstantine Buhler: I’m feeling it right now. It’s a good book. I recommend it. Seven Powers.

Alfred Wahlforss: Yeah, Seven Powers. Love Seven Powers. On the moats, I mean, we have the clear moats, which are the network effects on the panel where you have supply and demand dynamics. We also have the network [inaudible] data moat: as we do more interviews, you get better simulation. And then the product is very sticky, because you have all these interviews in your platform and you don’t want to lose that, you want to track things over time. But even the simplest things, I think, matter in terms of product advantages. Like, one of the first things Bryan Schreier, one of our Sequoia partners, said was that founders want to build something that’s complex, but customers want something that’s stupid simple and it just works. They don’t want to configure their own workflow. They don’t want to sit and build custom software. And just one example of this: creating the interview guide is really difficult. It’s actually an academic subject. And it’s one of the reasons why you have services firms, because they know what methodology to use if you want to understand pricing or brand perception, these kinds of things.

Konstantine Buhler: You don’t want to lead the witness.

Alfred Wahlforss: You don’t want to lead the witness. And it’s really hard to get that right. In the beginning, we just used the vanilla LLMs. And the customers would create the interviews, they would get the data back, and then they’d come back to us really frustrated saying, “What is this? I can’t use this data for anything.” And we took the blame for that. Now we’ve trained it to follow the best practices so that you always get good data out of the interviews. And I think that’s the advantage you have as a vertical AI company, that you can essentially train this agent to follow best practices in the work that you do.

Sonya Huang: So I want to go back to the concept of Tide Pods that you had mentioned earlier. I think it’s really interesting. So much of market research as I understand it today is almost more inviting people to pass judgment on ideas that you feed them. But it seems to me that one of the—you know, hallucinations can be a bug, they can also be a feature with generative AI. And do you think we’re going to see user research actually evolve into live product ideation? I could almost imagine AI inventing solutions as customers are going about their interview process, even helping visualize those solutions. Are your customers doing that already, or do you think we’re going to have the moment where AI can create a Tide Pods idea in a market interview anytime soon?

Alfred Wahlforss: Yeah, I think that’s really exciting. Today they do that manually, you know, use AI to generate images of different concepts and feed that into the interviews. But I think specifically also with simulation it becomes really powerful. So we now have an MCP as well, so that you can feed that into Claude and then you can tell Claude to run Listen in a loop and then come up with a bunch of ideas for how to market something or different concepts, and then you can have it run like that.

Sonya Huang: I’m even thinking in the course of an interview, as somebody’s complaining about, say, the Tide is not very portable, for the AI to be live brainstorming solutions with you, not just listening.

Konstantine Buhler: Yeah, this is what it could look like with an image generator, too. That’d be cool, Sonya.

Alfred Wahlforss: Yeah, I think it’s a good idea. You should be on our product team.

Sonya Huang: [laughs] Awesome. Well, Alfred, we really love what you’re building. Thank you for taking the time to share insights both on the broader market, which I think is just so fascinating, and also what it takes to be building in the application layer right now. We really admire the business that you’ve built, and thank you for your continued partnership.

Alfred Wahlforss: Thank you so much.