
How Glean CEO Arvind Jain Solved the Enterprise Search Problem – and What It Means for AI at Work


Years before co-founding Glean, Arvind was an early Google employee who helped design the search algorithm. Today, Glean is building search and work assistants inside the enterprise, which is arguably an even harder problem. One of the reasons enterprise search is so difficult is that each individual at the company has different permissions and access to different documents and information, meaning that every search needs to be fully personalized. Solving this difficult ingestion and ranking problem also solves a key problem for AI: feeding the right context into LLMs to make them useful for your enterprise. Arvind and his team are harnessing generative AI to synthesize, make connections, and turbo-charge knowledge work. Hear Arvind’s vision for what kind of work we’ll do when work AI assistants reach their potential.

Glean founder and CEO Arvind Jain spent over a decade as a distinguished engineer at Google leading Search, Maps and YouTube teams before co-founding Rubrik and then launching Glean in 2019. Drawing on deep experience building both enterprise search and AI-powered applications, his insights focus on how to successfully integrate AI into enterprise environments while solving fundamental business problems.

Knowledge Infrastructure Comes First: Building effective AI applications requires a strong foundation of data infrastructure—including deep integrations with enterprise systems, robust security and permissions frameworks, and sophisticated knowledge graphs that understand relationships between people, content, and context. Without this foundation, AI applications will fail to deliver real value, regardless of model sophistication.

Context is Critical for Enterprise AI: Enterprise search and AI applications face unique challenges compared to consumer applications because company data is private, permissions-based, and deeply contextual. Success requires understanding not just content but also organizational structure, user roles, and access rights. Building this contextual understanding through techniques like knowledge graphs is essential for delivering relevant results.

Learn from Human Behavior: The best AI systems learn continuously from how humans actually work—which documents they engage with, how they interact in communication tools, and what information they find valuable. Tracking these implicit signals creates a virtuous cycle where the system gets smarter over time about what’s truly relevant and useful.

Start with Clear User Value: Rather than building AI technology in search of a problem, focus first on delivering clear value to users through core functionality—like enterprise search—and then expand into more AI capabilities. This creates trust and provides the data foundation needed for advanced features.

The Future is Proactive: While today’s AI requires explicit user queries, the next evolution will be proactive AI assistants that understand your work context deeply enough to anticipate needs and guide you throughout your day—similar to an executive assistant but available to every knowledge worker. Building toward this vision requires excelling at both the foundational and advanced aspects of enterprise AI.

Arvind Jain: The majority of the work that we do today is not going to be done by us anymore in five years from now. And that applies to me, that applies to you. Like, you know, we both do very different things, but still, like, I think we are knowledge workers and I think a lot of our work is actually going to be done by these amazing AI assistants that are actually in many ways more powerful than us. Like, you know, they have access to all of our company’s data, our knowledge. They have all the context from all the past conversations and meetings. They don’t forget anything. And they can really sort of—and they have the—you know, on top of that, they have the reasoning capabilities, you know, that allow them to be super helpful to you in, like, any tasks that you do. So that’s sort of our core belief, like, you know, that the majority of our work is actually going to be done by these AI companions or assistants. And we want Glean to be that assistant in the workplace.

Sonya Huang: Joining us today is Arvind Jain, co-founder and CEO of Glean. Earlier in his career Arvind was instrumental in building Google search and was co-founder and CTO of Rubrik. Glean began life as an enterprise search company and today has evolved to a general purpose work assistant. Bringing AI into an enterprise context is notoriously difficult because of the integrations, the permissions, the ranking, the parsing–all of the other magic that needs to happen to make AI work on your company data. Arvind joins us today to share how Glean is solving this problem where other companies have failed and what he’s learned as one of the first successful AI-native application companies.

Arvind, thank you so much for joining us. We have a lot of questions about RAG and agents and knowledge graphs and all of that, but before we do that, can you give us one or two minutes on what is Glean and what are you building?

Arvind Jain: Yeah. First of all, thank you for having me. Glean, think of it as the Google or ChatGPT, but inside your enterprise. It’s a place where your employees go and ask questions and Glean answers all of those questions using your company knowledge, regardless of where that knowledge is, brings it all back to you. So that’s what Glean does. Glean is also an AI platform. So if you want to actually build AI applications inside your company, you can use the Glean RAG platform to build those applications quickly.

Sonya Huang: Wonderful. And since you make the analogy of a Google for work, Google for work, I think, is something that every CIO has described as their holy grail. And we have two decades of failed attempts at building it. You were actually a star search engineer at Google before, and even Google never managed to crack this category before. Maybe can you just say a word on why is this such a hard problem and how did you do it?

Arvind Jain: Well, I mean, search is hard because it’s actually magic in some sense. Like, you can come and ask any question that you have, and you expect the system to actually give you back the right answer. So expectations are always high. And it’s a difficult problem, especially in the enterprises, because there’s so much information inside the enterprises spread across so many different systems, it’s both hard to actually even get hold of that information, but then even harder to actually make sense of what information is actually good, what has become out of date. So there is lots and lots of challenges around building that system.

And in the past, I would say that there were no good attempts made because the problem was so—it was so hard, it requires so much R&D, so much investment. Like, you know, it is not really startup friendly in many ways. And you also, like, couldn’t even—you know, in the pre-SaaS world, you couldn’t even build a product because, you know, just connecting with all of your enterprise data meant that you just spent a year sitting with an enterprise trying to actually bring the data into your search system and then actually, you know, solve the real problem, which is like, you know, make that information searchable.

Pat Grady: Arvind, one of the things that I think is so interesting about Glean is you are probably one of the first and best examples of what an enterprise AI application company can or should look like. And we’re going to focus most of this conversation on the AI aspects of Glean, however, I know there are a lot of layers to your stack. You’ve got the infrastructure, you’ve got the connectors, you’ve got the governance engine, you’ve got the knowledge graph. Can you say a couple words about all the stuff you had to build before you even got to the AI part to make the AI work?

Arvind Jain: Absolutely. So as you said, like, you know, search starts first with the data and the knowledge that you need to actually make searchable. So the first part of the Glean tech stack is these deep integrations that we built with most common enterprise systems. So think of systems like Salesforce or Confluence, Jira, Google Drive, SharePoint, ServiceNow. Like, your enterprise data typically lives in all of these different systems, and you bring it all together in one place. So that’s the first part of our technology stack is these integrations.

But then if you think about enterprise data, and this is one of the most unique things about enterprise search versus if you think about Google search on the web, is most of your enterprise information is actually private in nature. When you author a document in Google Drive, this document may actually be private to you or you may share it with a few other people, and so you can’t build a search engine in which you just dump all the company knowledge and make it accessible to everyone. You have to actually understand the permissions of each content. So when you go and search, the system should understand who you are and only retrieve information that you actually have access to. So that’s our governance layer. Understanding governance across all of these hundreds of different systems, which is quite complicated.
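To make the governance idea concrete, here is a minimal Python sketch of permission-aware retrieval: the search only returns documents the asking user is allowed to read. Everything in it (the `Document` and `User` classes, the ACL fields, the sample index) is a hypothetical illustration, not Glean's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    # Hypothetical ACL: user ids and group names allowed to read this document.
    allowed_users: frozenset = field(default_factory=frozenset)
    allowed_groups: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class User:
    user_id: str
    groups: frozenset = field(default_factory=frozenset)


def can_read(user: User, doc: Document) -> bool:
    """A user may read a document if named directly or via a shared group."""
    return user.user_id in doc.allowed_users or bool(user.groups & doc.allowed_groups)


def search(query: str, user: User, index: list) -> list:
    """Naive keyword match, restricted to documents the user may read."""
    terms = query.lower().split()
    return [
        doc.doc_id
        for doc in index
        if can_read(user, doc) and any(t in doc.text.lower() for t in terms)
    ]


# Made-up sample data: one company-wide document, one private draft.
INDEX = [
    Document("pto-policy", "Company PTO policy",
             allowed_groups=frozenset({"all-employees"})),
    Document("comp-plan", "Draft PTO and compensation plan",
             allowed_users=frozenset({"alice"})),
]

alice = User("alice", frozenset({"all-employees"}))
bob = User("bob", frozenset({"all-employees"}))

print(search("pto", alice, INDEX))  # alice sees both documents
print(search("pto", bob, INDEX))    # bob sees only the shared policy
```

The same query yields different results for the two users, which is the sense in which every enterprise search has to be personalized by access rights.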

And then the third part, and this is where really most products have failed in the past, is search is not about just putting like a whole bunch of documents in an index and then, like, you know, when somebody comes and asks a question, take those words or take that question and just match it up semantically or with keywords, you know, with the right content. You got to actually also understand who’s the person who’s asking a question. You know, I can come in and ask for an onboarding guide because I’m new, I’m a new employee. But then which onboarding guide should be actually, you know, given to me? Like, it depends on like, you know, whether I’m in the marketing team or I’m in the engineering team. So sort of understanding people and understanding knowledge and relationships between them, you know, that’s a big part of actually making a search or a question answering service work inside an enterprise. So we do that. So we actually build a deep knowledge graph, you know, where we look at all the employees, understand what roles do they play in the company, look at all the documents, and then we sort of try to understand what documents are meant for what departments, what documents actually are popular. Like, what are the relationships between a particular individual and a particular document? And that is what we used as a core foundation that sort of then governs like when somebody comes and asks a question, what are the most relevant pieces of knowledge for them.
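The onboarding-guide example can be sketched in a few lines of Python. This is only a toy under stated assumptions: the two dictionaries stand in for knowledge-graph edges (person to department, document to department), and all names and data are invented for illustration.

```python
# Hypothetical toy knowledge graph: edges from people to departments
# and from documents to the departments they are meant for.
PERSON_DEPARTMENT = {"maya": "marketing", "eli": "engineering"}
DOC_DEPARTMENT = {
    "Marketing onboarding guide": "marketing",
    "Engineering onboarding guide": "engineering",
}


def personalize(user: str, candidates: list) -> list:
    """Order candidate documents so those for the user's department come first."""
    dept = PERSON_DEPARTMENT.get(user)
    # The key is False (sorts first) when the document matches the user's
    # department. sorted() is stable, so ties keep their original order.
    return sorted(candidates, key=lambda doc: DOC_DEPARTMENT.get(doc) != dept)


candidates = list(DOC_DEPARTMENT)
print(personalize("eli", candidates))
print(personalize("maya", candidates))
```

Both users issue the same "onboarding guide" query, but the graph edges decide which guide is ranked first for each of them.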

So we have to do all of that work. And actually like, interestingly you mentioned like what came, you know, before AI even became relevant. You know, for us, the AI was actually part of the core search technology from day one. We were actually working with LLMs in 2019, or I mean, at least language models like, you know, these were, you know, BERT-based language models that you could use. And so when Glean …

Sonya Huang: MLMs.

Arvind Jain: That’s right. Is that the name for them now? But yeah, you know, in the search engineering sort of community, like, we just called them language models at the time. And so the language models were actually part of the core search experience from day one because it really allowed us to understand content at a semantic level. So that we sort of already baked in, like, in our core search experience on day one when we were actually trying to look at a question from a user. We were never sort of limited by the actual exact keywords that users use. We were able to actually understand the meaning behind the question and actually match it up with the right documents. But still that’s sort of all the work that you have to do before you can actually even do anything with LLMs.

Sonya Huang: Can you say a word about rankings? Like, I think part of what makes Google work so well is I always get the answer I want at the top of the page. In the case of the public internet, you have so much web data and links and all that to make the rankings really work. To what extent is that part of the magic for Glean and how do you guys do it?

Arvind Jain: Yeah, so that’s of course the core of the product is all the effort that we put in to actually build a really good ranking system for our search. I’ll give you some examples of the kind of things that go into determining what documents are the best ones to rank for a given question. So of course, if you imagine that there’s a document that people in the company are constantly looking at, so that gives you obviously a signal that there must be something about it. It’s actually important, people like to spend time on it. If there’s a document that was actually written in the last one or two weeks and there’s some engagement around it, you know that this again is information that people care about. It’s not actually become obsolete yet. Then if you think about the particular document that we see is not popular, like, when you look at a company level, but you look at one individual team inside the company, we see there’s heavy usage for that document inside that particular team. So that sort of tells us more about that, hey, this document may actually be relevant for this particular set of people.

Or the last thing, the last example, imagine that somebody had a question and they’re not bothering to do a search, they go in Slack and ask a question, and then somebody else posts a link to a document as a response to that, and the person who asked the question gave a thumbs up to it. Just imagine this interaction, like what it means. It actually means that particular document was actually a really good answer for that question that the user had asked. And so if you keep that association in mind, like, it’s going to help you later when somebody else came and asked a similar question.

So those are some of the signals. You have to sort of constantly look for all these signals, you know, you have to collect them differently in the enterprise setting than, you know, on the web. Google only has to look at all the activity that’s happening right on Google itself because that’s the gateway to any sort of knowledge quest. But if you look at in the enterprise, you know, not all the things are happening, you know, through the search paradigm. So you have to sort of go and look at all the activity around all the knowledge in different systems, like, you know, your communication systems, your document systems, and just try to learn from that human behavior. Because ultimately that’s how you learn. You learn from what people are doing inside the company. The more you can actually collect, you know, that information, the better your ranking systems are going to be.
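The four signals mentioned above (company-wide popularity, recent activity, team-level usage, and a link that answered a question in chat and got a thumbs up) can be combined into a single score. The Python sketch below is a rough illustration only: the `DocSignals` fields, the caps, and the weights are all made-up assumptions, and a production ranker would learn such weights rather than hard-code them.

```python
from dataclasses import dataclass, field


@dataclass
class DocSignals:
    doc_id: str
    company_views: int = 0                           # views across the whole company
    team_views: dict = field(default_factory=dict)   # team name -> views
    days_since_update: int = 365
    # Questions (lowercased) that were answered in chat with a link to this
    # document and received a thumbs up from the asker.
    endorsed_queries: set = field(default_factory=set)


def score(doc: DocSignals, query: str, team: str) -> float:
    """Combine the signals using arbitrary, illustrative weights."""
    popularity = min(doc.company_views, 100) / 100
    team_usage = min(doc.team_views.get(team, 0), 100) / 100
    freshness = 1.0 if doc.days_since_update <= 14 else 0.0
    endorsed = 1.0 if query.lower() in doc.endorsed_queries else 0.0
    return 1.0 * popularity + 2.0 * team_usage + 0.5 * freshness + 3.0 * endorsed


def rank(docs: list, query: str, team: str) -> list:
    """Return document ids ordered from highest to lowest score."""
    ordered = sorted(docs, key=lambda d: score(d, query, team), reverse=True)
    return [d.doc_id for d in ordered]


docs = [
    DocSignals("handbook", company_views=90, days_since_update=200),
    DocSignals("deploy-runbook", company_views=5,
               team_views={"infra": 60}, days_since_update=7),
    DocSignals("vpn-setup", company_views=10,
               endorsed_queries={"how do i set up vpn"}),
]

print(rank(docs, "How do I set up VPN", team="infra"))
print(rank(docs, "benefits", team="marketing"))
```

With these weights, a document that is unpopular company-wide can still outrank the handbook for the team that uses it heavily, and a previously endorsed answer wins for the question it answered.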

Sonya Huang: Totally. Can we spend a minute on RAG? As Pat mentioned, you were kind of in the right place at the right time. You had put together all the hard stuff so that when the LLMs got really good, you kind of had all the infrastructure in place. And I think you’ve been one of the experts in using RAG to make these LLMs actually useful on your corporate content. Can you explain RAG to me like I’m five years old, and, like, what are the secrets to making it work? What are the things that people don’t talk about? You know, what are examples of things that you can do thanks to RAG that you can’t in a generic chat interface?

Arvind Jain: Yeah, well, so first of all, I think since you were talking about a five year old, let’s first talk about what RAG is.

Sonya Huang: Okay. 10 year old. 10 year old.

Arvind Jain: So yeah.

Pat Grady: I need five. Let’s start with five.

Arvind Jain: Let’s start with five. Yeah. So I mean, like, if you think about all these amazing models, you know, GPT and Gemini and Claude, these models are all trained on the world’s public knowledge and data. And so if you were to actually go in ChatGPT and ask a question like, “Hey, how many days do I get off with my PTO policy?” it has no idea. Like, it can’t answer that question because that’s my company’s private knowledge. The answer is somewhere there and the model is not trained on it.

So how do you bring your private enterprise data to these models so that you can actually have AI create that magic for you? That’s what a RAG-based AI application architecture allows you to do. So the way it works is that you come and ask a question, and you have a search engine or a retrieval engine, whatever you want to call it. And given the question, this retrieval engine, you know, finds potentially relevant documents, you know, that could actually answer your question. And then you’re going to take those documents or those sort of content fragments and make the model work on it. You’ll tell the model like GPT that “Hey, I have this question and I have this company knowledge that I think is relevant, you know, in terms of answering that question. Now you answer that question using this knowledge.”
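The flow described here (question, retrieval, then a prompt that hands the retrieved content to the model) can be sketched in Python as follows. This is a minimal sketch, not any vendor's implementation: the word-overlap retriever stands in for a real search engine, the corpus is invented, and `llm` is a hypothetical callable that takes a prompt string and returns a completion.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, corpus: dict, k: int = 2) -> list:
    """Return up to k (doc_id, text) pairs ranked by word overlap with the question."""
    q_words = tokenize(question)
    scored = [
        (len(q_words & tokenize(text)), doc_id, text)
        for doc_id, text in corpus.items()
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, text) for overlap, doc_id, text in scored[:k] if overlap > 0]


def build_prompt(question: str, passages: list) -> str:
    """Put the retrieved company knowledge and the question into one prompt."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer the question using only the company knowledge below.\n"
        f"Company knowledge:\n{context}\n"
        f"Question: {question}"
    )


def answer(question: str, corpus: dict, llm) -> str:
    """llm is any callable mapping a prompt string to a completion string."""
    return llm(build_prompt(question, retrieve(question, corpus)))


# Invented sample corpus standing in for private company documents.
CORPUS = {
    "pto-policy": "Full-time employees get 20 days of PTO per year",
    "expense-policy": "Submit expense reports within 30 days",
    "parking": "Visitor parking is on level two",
}

QUESTION = "How many days of PTO do I get?"
# An echo function stands in for the model so the prompt can be inspected.
prompt = answer(QUESTION, CORPUS, llm=lambda p: p)
print(prompt)
```

If the retriever returns the wrong or stale passages, the model is asked to answer from bad context, which is why retrieval quality dominates the quality of the final answer.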

So this is how, like, most AI applications today are being built in the enterprise. The only way to actually connect, you know, your private enterprise data to the power of these language models is basically a search engine that’s sort of sitting, you know, in the middle. So we—like in our given—you know, at Glean also, like, you know, we of course built a search engine for all of our enterprise content in the last five years. It actually allowed us to actually become one of the best RAG systems that allows you now—not only of course we deliver our own end user application, which is a Glean assistant, using this RAG-based application architecture, but we’re also allowing companies to actually build more and more applications using RAG.

Now I think in terms of—while this is the architecture that is emerging as a canonical architecture for building AI applications, I think it is still full of challenges. It’s actually really hard to build great AI applications using RAG because one of the things that, you know, how models themselves are sort of while very powerful, they’re also still an emerging technology. Like, models hallucinate, they make things up. And what you’re doing now is you’re actually adding one more complex layer of technology in this application architecture. So think of it like you’re chaining two things which are both not perfect.

So oftentimes you will see a RAG-based AI application not perform well because you asked a question and the failure actually happens at the RAG stage, at the retrieval stage where you didn’t even actually—you know, weren’t able to find the right pieces of knowledge or maybe you found like, you know, stale information that then you’re actually giving to the LLM to work on. And then of course it’s going to give you bad results. So it actually—you know, while it’s the only way to sort of bring knowledge together, it creates these interesting challenges for you as well.

Pat Grady: Let me ask you a question—and just to paraphrase a bit what you were saying at the start of this conversation, act one: enterprise search; act two: application platform. For that, act one, which is enterprise search, how do the concepts of enterprise search and RAG relate? Is one a superset or a subset of the other? Are they similar but distinct? Are they the same thing? How does enterprise search and RAG—how do those concepts relate?

Arvind Jain: So I think, like, you know, I think of search and RAG as being in some sense, you know, they’re one and the same thing. The real core technology is taking all of your knowledge, enterprise knowledge, and putting them into this search system where now you can actually ask questions, and the system is able to actually give you relevant pieces of information back. So that’s sort of the core technology. Now you can actually use this technology either as a standalone product, so that’s what the Glean search product is, for example, where people come in, they ask questions and we can give them the relevant documents that potentially are useful to them based on that question. Or you could actually use this as an API layer in your overall AI application. So where the search system, the search module now is only one component of your overall AI application architecture. And so I think in that sense it’s sort of similar.

But the industry on the other hand, like, I think what we’ve seen most of these RAG-based applications in the enterprise today, they actually use a much more simpler version of a retrieval system in their RAG application, typically like a vector search-based system which doesn’t really have full enterprise context. And so I would say, that’s the key difference. So for us, our approach always has been to really think about how to build a standalone search system, something that is as good, that you can actually put them in front of the users as a standalone product. That’s really the real test for how good the search is. And that then actually when you put it behind the scenes in a RAG-based application, it’s actually going to actually create better AI experiences.

Sonya Huang: So is it fair to say that kind of the magic that you’ve done in terms of getting a good ranking of search results, that is exactly—like, you’ve made that ranking good for people. It turns out making that ranking good for people is also what you need to make the ranking good for machines in order to get the best possible results. And that’s why, you know, what you’ve built is very different from somebody that’s just DIYing a data pipeline and their own little retrieval system.

Arvind Jain: Yeah. Yeah, that’s correct. I mean, I think it’s really hard to build these systems, you know, like yourself and build them in a matter of weeks. Like, I think you can build a great AI demo, you know, in one day, maybe like in two hours now. But I think, like, to actually build, you know, where it’s sort of like, you know, it’s robust, it’s stable, it actually adds value to your—within your enterprise, you know, like, you know, it’s a hard problem.

Pat Grady: So we’ve talked a little bit about how you’ve built what you’ve built and we know that it’s working. We know that the company is quadrupling year over year. And we use it here internally, and there are a lot of happy people out there who are customers of yours. The real measure of success in some ways is how your product is changing the lives of its customers. And so I’m curious to hear from you, when you look at your customers and sort of how they operate day to day pre-Glean versus post-Glean, what are some of the changes you notice? How does this help people do their jobs?

Arvind Jain: So Glean is actually a product that is used quite heavily by people. Like, there’s many, many different types of things. We’re often surprised by what people are using Glean for, but I would give you a few examples. So for engineering teams, you know, they find Glean super useful, like, in terms of troubleshooting. Whenever you run into any kind of a roadblock or an issue, like, sometimes an error like, you know, your programs are not working properly, and so Glean serves as a really good troubleshooting tool for them. Like, you know, it’s a place where you go and debug because you post the issue. More often than not, you’re not the first one who’s going to experience an issue. Like, somebody else has experienced those issues before. So just getting the context from all of those, you know, like, all the other people and how they solved that problem before, like, you know, that sort of helps you move, solve that issue for yourself.

So that’s—you know, that’s a big use case for engineering. For some roles, like for support, you know, their life day in, day out is about resolving, you know, answering people’s questions. And I think a tool like Glean actually fundamentally just changes how they work now because by default now they don’t think about giving a question, like trying to go and look for answers in different knowledge bases and whatnot. Instead the first reaction that they have now is that there’s a question that’s coming in from a customer, and then Glean on the side is actually already answering those questions for them. So their sort of model of working changes from trying to find things to actually trying to validate what AI is telling them is the right answer, and then just share that back with users.

Some teams have actually really changed their behavior. Like salespeople, for example, they use Glean as a way to prep for meetings. So before, like, a customer call is coming up, they will just ask Glean. Like, you know, they can be lazy. They can ask Glean, like, you know, help me prep for this meeting. And Glean is actually going to bring that 360 view of all of the data from that customer, like what happened in the last meeting, who’s—you know, like what opportunities are open with them and things like that. So it sort of really helps them prep for a meeting, then actually run a meeting well, because customers always have lots of questions. And so the salespeople feel more confident, you know, like, running that meeting. Because if somebody throws a curveball at them, you know, they can just ask Glean, like, right in the meeting, you know, get the answer and quickly sort of, you know, get the responses back.

So in fact, like, you know, in our company, we don’t allow salespeople to actually bring in sales engineers in the call. Like, they have to answer those questions themselves. So that’s like, you know, one change in behavior that we drive, like in the first few calls. Yeah, those are some of the things. But overall, the use cases are unbounded. Like, the one that I think is the one that’s universal across everybody inside the company is finding other people who can help you. That is one of the things that Glean makes it really easy for people. We help you connect with the right subject matter experts based on what questions you have. So that’s one thing that we see everybody in the company make use of a lot.

Sonya Huang: Is there a North Star metric you track? Like, these are wonderful kind of stories of customer impact. I guess, how do you benchmark yourselves objectively?

Arvind Jain: Yeah, so our key metric is how many questions people ask on a daily basis and actually get successful—like, you know, we were successful in answering those questions correctly for them.

Sonya Huang: Hmm. So similar to, like, Google’s Search SAT metric then.

Sonya Huang: Okay. Can you share anything about those numbers, or do you prefer to keep it private?

Arvind Jain: Well, we have—so yeah, so we have this technical metric. I don’t know how much sense it’s going to make, but we tend to actually keep that number at 80 percent. So I think it’s a proxy for that 80 percent of the sessions, you know, that users had with us, they were actually successful in getting what they needed.

Pat Grady: And is that—do you measure that success, is that explicitly they thumbs up, this was good. Or is that implicitly they take action on the basis of the results that you served, and you can see that action taking—how do you actually measure the success?

Arvind Jain: It’s actually implicit. So we will track their actions. For example, in search, when you come and ask a question, and then you click on one of the top two or three results and go to the destination and then stay there for a long time, so that sort of gives us an indication that you were happy. You didn’t come back and ask another question quickly or refine your search. So that’s sort of how we track, like, whether somebody’s successful or not.

Pat Grady: Got it. What are some of the top things that are not in the product yet that you think will make people more successful?

Arvind Jain: I think the—like, I referred to this, you know, when we started, that building a product like ChatGPT or Glean, it’s sort of like magic. Like, you know, the expectations are infinite. And because, you know, like, it’s supposed to basically not just answer any questions that people have, but also perform any task that they actually ask you to do. And so for us, it’s not so much about what features are missing. Like, you know, like, the big thing that we have to actually keep working on is actually be successful at this core feature, which is like, you know, answer people’s questions correctly, and answer questions of higher and higher complexity correctly for them over time.

So we feel like us or anybody else out there today, we’re all very far away from that true vision for our product, which is that we want Glean to be that AI assistant that can actually answer any questions that you have using your company knowledge, that can actually do half of your work for you in the future. And so I think, like, I would say, like, you know, we’re maybe two percent of the way there. Like, you know, AI, for all is said and done, you know, we’re still in very, very early stages of making that impact.

Sonya Huang: So we’re only two percent of the way there. I’d love to ask you actually about agentic reasoning. It’s something that’s been on our mind a lot as a partnership at Sequoia, and I know it’s been on your mind as well as a founder. And one of the results that I was really impressed by in the coding space was that, you know, with RAG, I think these coding agents can get to three, four percent completion rates, but if you give them more agentic reasoning capabilities, they can get to 14, 15 percent. So like a multifold improvement. And, you know, it’s as simple as, you know, go reflect on what you just said or best-of-N or, you know, whatever the techniques are. I’d love to understand how you guys are thinking about incorporating more agentic reasoning into your products, and anything else to kind of get from that two percent where you said we are today to what you hope to build one day.

Arvind Jain: Yeah. And I want to clarify the two percent is something that I was making up. Like, it’s not a measured number.

Sonya Huang: Yeah. Yeah.

Arvind Jain: I just wanted to sort of, you know, express, like, how early things are today and how much amazing things we’re going to see in the future. Like, I was just basically trying to talk more about that. But in terms of agentic behavior, one of the things that we are doing on that front is first try to actually get a lot of input from our users. So we have a concept of building a workflow inside Glean to actually answer a complicated question. And today we actually seek a lot of help from our users in sort of completing that workflow. We’ll actually—like, say, for example, if you come and ask a question like, “Help me write a weekly status report of all the work that my team did.”

So this is your question. Now if you think about this question, it’s complicated. There are a few things you need to do to actually go and really figure out the answer to it. The first thing is you have to actually understand what do you mean by ‘your team?’ Who’s your team? You have to maybe go in your HR system, try to figure out who are the people who report to you. Then we are talking about work. So where does work happen for each one of these team members? You sort of build an understanding of that, like, you know, and then go sort of pull a bunch of knowledge from all these different systems.

So I think right now what we’re doing is we’re actually trying to get help from our users, and we will sort of create a plan for a complicated question, try to actually get the user to actually input and tell us, like, you know, are we getting it right? Sometimes, you know, users can actually explicitly, like, completely ignore what we do and just build a workflow on their own. And I think that’s going to be essential for us to build that fully agentic behavior for the future. I think you can build agentic behavior for a specific narrow set of problems, but in Glean, since our footprint is so wide, there is the range of questions that people can have, the range of tasks that they want to perform is so broad that we feel like first we have to learn. We have to learn from workflows that people are going to actually create manually, and then build these models which can then take complicated questions in the future and automatically build those, convert them into these agentic loops or a complicated workflow. So that’s the approach that we are taking on it.

Sonya Huang: I see. So you’re saying since you have such a broad surface area, you can’t build agentic reasoning for every single possible task, and so instead you’re exposing a workflow engine for your users to individually be able to build different automations and different agents.

Arvind Jain: Yeah, and then you learn from it. Once you see people building these workflows, that feeds into a training data set that allows you to automatically build new workflows based on complicated questions that people have. So those agentic capabilities are coming, but again, I would say when it is hard for you to answer simple questions, then if you want to do complicated tasks, it’s equally hard because you’re going to make a mistake. Imagine an agent that breaks down a complex task into a series of 10 individual tasks. If each step is 90 percent accurate, your error rate is going to compound across those steps. So it is incredible, but I feel like the human assist is actually critical in building these complicated workflows.
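To put a number on the compounding: if the 10 steps are assumed to succeed or fail independently (a simplifying assumption not stated in the conversation), the chance that all of them succeed is 0.9 raised to the 10th power, roughly 35 percent. A minimal sketch, with `chain_success_rate` as a hypothetical helper name:

```python
def chain_success_rate(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_accuracy ** num_steps


if __name__ == "__main__":
    for n in (1, 5, 10):
        rate = chain_success_rate(0.9, n)
        print(f"{n} steps at 90% each: {rate:.1%} end-to-end")
```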

Pat Grady: It’s also—Arvind, it might be worth saying a word—maybe this is obvious to people who are listening, but just to state it explicitly, how act one, which is the enterprise search business, gives you the moral authority or the unfair advantage to get into act two, which is the application platform, or the platform for agentic behavior. It may not be totally obvious to people how act one leads to act two. Can you just say a couple words about that?

Arvind Jain: By building the search product, which immediately adds value to our customers, to our users, we are able to actually solve a bunch of complicated problems that you will typically run into in an enterprise. The first part of that is security. So if you think about the Glean product, we are actually telling our customers that hey, give us all of your data and we’re going to hopefully do something useful for you after you give that data to us. And that’s a big demand. Like, it’s not easy for companies to actually trust, you know, a new product company, a startup with all of their data when they’re actually not getting any immediate value from it.

And so that is one of the things that we’ve seen to be super helpful to us, because we actually have this search product that people understand, that people want, and they want to deploy. So it’s already deployed now, and so Glean is running and it’s already connected with all of your enterprise data inside the company. And so then, like, you know, going to our customers and saying to them, “Look, you know, use that as your core AI data platform,” is a much easier sell because we don’t actually have to convince them again to actually give all of their data to us. It’s already there.

Pat Grady: So this might not be a perfect analogy, but hopefully it’s not a terrible analogy that, you know, Tesla had an advantage in self driving because they’re already selling cars. You guys have an advantage in delivering AI agents because you’re already selling a data platform that organizes all the enterprise information, makes it accessible, makes it secure, kind of puts people in a position where they’re already asking questions of it, and it’s kind of a logical next step to ask it to start taking actions.

Arvind Jain: Absolutely.

Sonya Huang: I think you also announced a set of APIs that let developers build on Glean. Maybe say a word on that. Like, I think that was in response to customer demand. What makes developers want to build on Glean versus directly access their own data? I think it’s probably a similar effect to what you just talked about.

Arvind Jain: Yeah. So a lot of AI applications that our customers are wanting to build, they need to actually tap into data that lives in multiple different cloud-based SaaS systems. And I think it’s quite tedious for them to first go and actually bring that data in one place, build a search or retrieval layer using that. The integrations are hard, understanding permissions and governance is really, really hard on that. And I think when people—like, as these models actually became accessible and developers started to actually develop AI applications, they realized that—well, they were really excited about building these new cool AI apps, but basically they realized that building an app, 90 percent of the work was actually this boring infrastructure work that they didn’t want to do, like bringing data from all these different systems, running these ETL and data pipelines and then sort of building a good search over it. So you’ll spend so much time before you actually even get to play with AI.

And so that’s the thing that they find very useful with Glean because we’re actually solving all the problems around ETL, building a great search, properly obeying governance within your company. All of that stuff is done for you. You just have your search API, and you can focus all of your attention on the business problem that you’re working on and how AI can help you achieve that automation that you’re looking for.
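To make the "search API that obeys governance" idea concrete, here is a toy, in-memory sketch of permission-aware retrieval. It is not Glean's API: `Document`, `PermissionAwareIndex`, the group-based access model, and the term-count scoring are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # e.g. the SaaS system the document came from
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this document


class PermissionAwareIndex:
    """Toy index that filters by access rights before ranking."""

    def __init__(self) -> None:
        self._docs: list[Document] = []

    def ingest(self, doc: Document) -> None:
        self._docs.append(doc)

    def search(
        self, query: str, user_groups: set[str], limit: int = 5
    ) -> list[Document]:
        terms = query.lower().split()
        scored: list[tuple[int, Document]] = []
        for doc in self._docs:
            # Governance first: skip anything the user cannot read.
            if not (doc.allowed_groups & user_groups):
                continue
            words = doc.text.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                scored.append((score, doc))
        # Highest term-count first; a real ranker would be far richer.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:limit]]


if __name__ == "__main__":
    index = PermissionAwareIndex()
    index.ingest(Document("1", "wiki", "search roadmap for Q3", frozenset({"eng"})))
    index.ingest(Document("2", "hr", "compensation roadmap", frozenset({"hr"})))
    for hit in index.search("roadmap", user_groups={"eng"}):
        print(hit.doc_id, hit.source)
```

An application built on a layer like this only passes the query and the caller's identity; the filtering happens inside the retrieval step, so restricted documents never reach the model's context.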

Sonya Huang: In some ways, all the hard work you’ve done to ETL and put all the data together with data governance reminds me a lot of Snowflake. And you’re really doing it with text data and unstructured data. But just that central data platform that companies can build around, build apps on top of reminds me a lot of the Snowflake story.

Pat Grady: Arvind, can we ask you a question about future state, if you allow us to dream for a few minutes? Five or ten years from now, how do you think Glean is showing up inside of a business? And maybe more importantly, if you’re the typical knowledge worker five or ten years from now and you are equipped with Glean, what is your life like then?

Arvind Jain: That’s a great question. I think let’s keep it five years instead of ten. And I think the—well, I mean, one belief that I have is that the majority of the work that we do today is not going to be done by us anymore in five years from now. And that applies to me, that applies to you. Like, you know, we both do very different things, but still, like, I think, you know, we are knowledge workers, and I think a lot of our work is actually going to be done by these amazing AI assistants that are actually in many ways, you know, more powerful than us. Like, you know, they have access to all of our company’s data, our knowledge, they have all the context from all the past conversations and meetings. They don’t forget anything and they can really sort of—and on top of that, you know, they have the reasoning capabilities that allow them to be super helpful to you in any tasks that you do.

So that’s sort of our core belief, like, that, like, the majority of our work is actually going to be done by these AI companions or assistants. And we want Glean to be, you know, that assistant in the workplace. We want Glean to be the place where most of your work happens.

One of the things that we also think is going to change is today a lot of AI is about you go and seek help from these AI agents. For example, you go and ask questions, you get answers back. But the future is sort of—the future is where this assistance is going to be proactive. It’s sort of like if you think about, like, if you have an executive assistant, they actually help you a lot. A lot of their help is when you go and ask them for help, but a lot of their help is actually proactive in nature. They tell you what to do next, they manage your day, they know everything about your work life, and they guide you to be effective throughout the day. And I think AI is going to actually allow that luxury regardless of who you are. And today, some executives in the company have that luxury, but in the future everybody’s going to have these really powerful AI-based assistants that are going to actually help them do their work. So we’re really excited about bringing that change to the workplace, and hope that, you know, Glean can be the world’s most successful AI assistant.

Pat Grady: Love it.

Sonya Huang: Arvind, can we change gears a little bit? And I’d love to step back and hear your advice for other founders. You are one of the most successful application-level AI companies—I think probably number two behind Copilot and Scale. And you did it as a startup, as an independent startup. I think you’ve also had to navigate some unique challenges, right? Like OpenAI, for example, is one of your providers and also one of your competitors, one of your top competitors. Maybe just tell us what that dynamic is like.

Arvind Jain: Well, I mean, first of all, from a point of view of, like, building a startup, in fact, like, you know, I’ve been actually quoting you guys in many places. Like, you know, Pat, I remember the slide where you talk about, like, the overall software market being $600 billion, but then AI sort of is, you know, expanding that market to $15 trillion or $12 trillion. Something massive.

Pat Grady: Yeah.

Arvind Jain: And that’s actually like the reality of where we are today, which is that everything that we do is going to actually change, fundamentally change. AI is going to be a key component that’s going to drive that change. So first thing that I, as a founder, I don’t actually worry about what other people are actually working on, because even if all of us are working on a lot of great things, it will still not be enough. It will still not be enough to actually solve all the problems that need to get solved. And so that’s the first mindset.

So like, you know, I think for advice to other founders, that’s the thing I want to tell them, like, if you found a problem, you know, just go work on it and don’t worry about, like, if somebody else is solving it, because the chances are that other people are not and they won’t solve it the same way as you will. But for us, like, you know, and coming back to Glean specifically, the dynamics for us, you know, we felt the same way. For the first four years of our existence, we were working on a problem, you know, where we had no competition. You know, nobody was actually interested in solving the problem that we were solving because it was a dead market, and we had to create a category which would actually generate interest, you know, be evangelical. But we knew that we were working on an important problem.

But then, you know, suddenly ChatGPT happened and search has become hot. And now, in fact, every company that you go and talk to wants to build a product like Glean. And so is that good news for us? Is that bad news for us? Like, how do you think about it? You know, from our perspective, like, it doesn’t matter. Like, you know, the way we feel is that it’s actually great news for us. Like, now everybody is interested, everybody wants to buy our product. And yes, we have to compete with many, many other vendors, but that’s the place where we think we’ll win, because we have the desire to actually solve this problem and stay focused on this problem, keep working on it. And there’s no reason for us to not do a better job than others.

Pat Grady: Part of what I heard in there was that building an AI company is just building a company: find an important problem and solve it in a compelling way. I’m curious, particularly because this is not your first rodeo. You know, Rubrik was obviously wildly successful, and of course, you were pretty core to some of the early days of Google. How much of building an AI company is just building a company, versus things that are AI specific in some way?

Arvind Jain: It’s a great question. I think of AI mostly as a tool in your arsenal. And one of the tools. I don’t think you suddenly become a different company because you’re doing something with AI. In reality, I think there’s going to be no new company that is not going to use AI technologies in some shape or form. So my point of view is that look, you have to actually find a business problem that you’re planning to solve, and hopefully you can actually solve that problem in a much better way because of the technology that AI is actually providing to you now. So I don’t think it actually changes. I don’t think it actually feels different, like, you know, whether you are—like, we don’t think of ourselves as an AI company for that matter either.

Sonya Huang: Would you ever train your own models? I guess maybe more broadly, how do you think about where Glean’s core competencies start and stop? And, you know, if you have 100 R&D chips, where do you want to place them?

Arvind Jain: We don’t have plans to train super large models. But at the same time, you know, we do train models which are smaller in size. You know, for every individual customer of ours, these language models that we train for an individual customer, it sort of goes through all of their own enterprise corpus and sort of starts to understand, like, you know, sort of the lingo, the speak, you know, the acronyms, the code names, all of that stuff within their corpus. So model training is actually a core part of the Glean core technology, but not in the sense of training a model like GPT-4. We don’t do that. We don’t have plans for it. We plan to partner with a lot of other great companies that build models of that scale.

Sonya Huang: Wonderful. Arvind, thank you so much for joining us today. This was a wonderful conversation. We really appreciate it.

Arvind Jain: Thank you for having me.
