XBOW CEO and GitHub Copilot Creator Oege de Moor: Cracking the Code on Offensive Security With AI
Training Data: Ep24
Oege de Moor, the creator of GitHub Copilot, discusses how XBOW’s AI offensive security system matches and even outperforms top human penetration testers, completing security assessments in minutes instead of days. The team’s speed and focus are transforming the niche market of pen testing with an always-on service-as-software platform. Oege describes how he is building a large and sustainable business while also creating a product that will “protect all the software in the free world.” XBOW shows how AI is essential for protecting software systems as the amount of AI-generated code increases along with the scale and sophistication of cyber threats.
Summary
Not only are attackers now using AI, but AI-generated code can inherit security vulnerabilities from its training data. Automating offensive security testing is an example of AI solving problems made more critical by the proliferation of AI systems.
Continuous testing transforms security: Rather than conducting pen tests once or twice a year, organizations need continuous automated testing that can keep pace with rapidly evolving systems. XBOW enables security testing after every code change, allowing teams to fix vulnerabilities before they reach production.
AI will bring novel attack patterns: While current AI approaches mirror human hacking techniques, the system is expected to discover entirely new types of attacks through continuous learning and reinforcement. This capability is crucial for staying ahead of malicious actors who are also employing AI.
Real-world context powers discovery: The AI’s understanding of real-world context enables it to autonomously identify critical vulnerabilities without explicit instructions. By comprehending an application’s purpose it can determine what constitutes a security risk and craft appropriate attacks.
Speed creates a defensive advantage: XBOW’s ability to complete comprehensive security assessments in minutes rather than days gives defenders a crucial time advantage. This speed allows organizations to rapidly identify and patch vulnerabilities before they can be exploited by attackers.
Cloud control ensures responsible use: By making the technology available only through cloud services rather than downloadable software, usage can be restricted to legitimate security testing against authorized targets. This approach helps prevent misuse while ensuring the technology remains a force for good.
Transcript
Chapters
- Introduction
- Complete disruption by AI
- What is the offensive security market?
- The first AI cyber warrior
- Finding everything that there is to find
- Developing GitHub Copilot
- From Oxford professor to tech company CEO
- How agentic applications are priced and packaged
- Protect all the software in the free world
- Lightning round
Introduction
Oege de Moor: Because we now have AI code generation, everybody can create code. But not everybody knows about security. The models that generate the code have been trained on all public source code. There are a lot of vulnerabilities in older public source code, and so we generate much more code with more security problems. On the other hand, attackers are already using AI to make their own work more effective. And so we also have a greater threat. So more code, more attacks. And that kind of makes the automation that XBOW is doing absolutely essential.
Konstantine Buhler: Today we are excited to welcome Oege de Moor, founder and CEO of XBOW. As the creator of GitHub Copilot, Oege has helped push the boundaries of modern AI. Before GitHub acquired his last startup, Semmle, Oege was a computer science professor at Oxford. His new company, XBOW, is one of the most exciting AI native companies to launch this year.
They’re able to automate offensive security with an AI penetration tester. It’s one of the best examples of AI Service-as-Software that we’ve seen. We’re excited to talk to Oege about the breakthrough results of XBOW and what’s next in AI.
Complete disruption by AI
Konstantine Buhler: Oege, XBOW now matches the capabilities of the world’s best hackers. Is this one of the first industries that’s going to be completely disrupted by AI?
Oege de Moor: Absolutely. It’s going to completely change the way application security is implemented in the enterprise. Really, it’s an example of service-as-software. People will be able to replace a lot of routine human work with complete automation, and that will free up the humans to do the truly creative work themselves.
Konstantine Buhler: So Oege, tell us about some of the results that you announced recently, because I think they’re really quite striking.
Oege de Moor: So when we first built the first version of our product, we decided to try it out on renowned industry benchmarks. And these are challenges that human hackers use to hone their skills. We got these from a bunch of commercial providers, including PortSwigger and PentesterLab. On these benchmarks, our product scored 75 percent, which was amazing. In fact, it was so good that my first reaction was that surely there’s something wrong here. What’s actually happening is that probably these benchmarks are so well known that they occur somewhere in the training data and the model is simply regurgitating the answers.
So we created a new set of benchmarks, completely original, guaranteed not to be in any training set, and we scored even better on those: 85 percent.
Konstantine Buhler: Wow!
Oege de Moor: So then the question is: how good is that really? To answer that, we got in five professional pen testers from reputable firms, and we asked them to solve exactly the same set of 104 challenges. And one of these people is really at the top of the game—the very best type of pen tester, the kind of person that you’d ask to secure a multibillion-dollar hedge fund. And he scored the same. He scored the same as the AI. However, the human took 40 hours and the system took just 28 minutes.
Konstantine Buhler: Yeah, that is striking. And when we first partnered with Oege, it was science, I gotta say. We did not know if the AI would even be able to perform remotely as well as humans. And then when Oege called and said, “Hey, Konstantine, we got some results to share that will blow you away.” It certainly did. It certainly did. What do you think was your over/under back in January, February, when it was still science, as to whether an AI could perform at the level of these 20-year seasoned penetration tester experts?
Oege de Moor: So at that time I didn’t think that it would be achieved so quickly. I thought it would take at least a year to reach a reasonable level of proficiency. And even then I would expect that it would work at the level of a mediocre human pen tester, not at the level of the absolute top.
In fact, since we announced these results, we’ve been working quite closely with a bunch of early design partners. And at one of them this morning, we found an incredible critical vulnerability. Very surprising. And the way it worked, if you look at what the AI is doing, it first crawled the web app and then it found some source code written in PHP. And this source code was intended to access another host, but it used an insecure signing algorithm in order to make that connection. So XBOW was able to get to the other host, generate links and access that. Nothing interesting found there, so then it continued crawling the web app and found another endpoint and decided to try and use the same trick that it had previously discovered. Didn’t quite work. Needs another parameter. No problem. Browses around a bit, finds some more source code, this time in JavaScript, sees a number of candidate parameters, tries them all out, finds one that works, and now it has access to an endpoint. And when it explores that, it turns out it’s intended to download PDF files.
But not only could you download PDF files, you could actually download a password file. So this is quite serious. And what I find fascinating about this type of example is that the AI is exploring like a human pen tester does. It’s taking quite interesting, creative turns that would be hard for most human experts.
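To make the class of bug concrete: the transcript doesn’t name the exact flaw, but an endpoint that is meant to serve PDFs yet hands over a password file is classically a path traversal. The sketch below is a minimal, hypothetical illustration of that pattern in Python/Flask—an assumption for teaching purposes, not the design partner’s actual code.

```python
# Hypothetical sketch (not XBOW's actual finding): a download endpoint where a
# user-supplied filename is joined to a base directory without validation, so
# "../" sequences escape into arbitrary files.
from flask import Flask, request, send_file, abort
import os

app = Flask(__name__)
PDF_DIR = "/var/app/pdfs"

@app.route("/download")
def download():
    # VULNERABLE: the user-supplied name is trusted verbatim.
    name = request.args.get("file", "")
    path = os.path.join(PDF_DIR, name)   # "?file=../../etc/passwd" escapes PDF_DIR
    return send_file(path)

@app.route("/download-safe")
def download_safe():
    # FIX: resolve the path and require it to stay inside PDF_DIR.
    name = request.args.get("file", "")
    path = os.path.realpath(os.path.join(PDF_DIR, name))
    if not path.startswith(PDF_DIR + os.sep) or not path.endswith(".pdf"):
        abort(403)
    return send_file(path)
```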
Konstantine Buhler: So just to summarize what you just said, this is a very—confidentiality, obviously—this is a very large financial institution that everybody watching this podcast would have heard of. High confidence. And the AI was able to find a very advanced vulnerability. This is the type of institution that has human penetration testers constantly targeting it and trying to find vulnerabilities, a massive budget on security. It was able to find a whole file full of passwords.
Oege de Moor: That’s right.
Konstantine Buhler: Just this morning.
Oege de Moor: That’s right. And we have something like that every day, every other day.
Konstantine Buhler: Wow.
What is the offensive security market?
Sonya Huang: Oege, congratulations on the results. Maybe can we take a step back? And for those who aren’t that familiar with this specific market, I’ve heard you and Konstantine talking about pen testing, and I think Konstantine called them hackers. I don’t know if that’s the same thing. Like, what is the offensive security market? And I guess how do you define the market that you’re going after, and what is XBOW?
Oege de Moor: Thank you for taking a step back. So offensive security is currently the best way to secure software systems. You invite external experts to come and simulate attacks against your systems, and they report whatever they find so that it can be fixed before the bad guys get at it. Now, this is a highly-skilled activity. People need years of training to do it, and it’s expensive and slow. The typical cost of a so-called “penetration test” is something on the order of $18,000. Because it is expensive and slow, people only do it once or twice a year. That doesn’t make sense, because their systems evolve much faster than that, and so there will always be periods of time when insecure systems are out there. What XBOW does is automate this process, this highly-skilled activity of launching simulated attacks and trying to find vulnerabilities. And because it automates it, you can now run it continuously instead of just once or twice a year.
Sonya Huang: Hmm. What drew you towards this market? I think Konstantine mentioned your background in founding Semmle, and having seen GitHub Copilot, what drew you towards this specific market? Because it feels like there’s a dozen teams going after AI coding. You’re the only team I’ve met that is taking this specific approach to offensive security.
Oege de Moor: So it was kind of the natural thing to do. So my previous company, called Semmle, also was in security, but finding flaws in source code. And at Semmle we had an offensive security team which would use our product in order to find potential vulnerabilities, and then our security researchers would find exploits, and we would tell the world about what we found. Even at that time, it was kind of embarrassing to me that that last step of finding the exploits was done manually.
Then when I was at GitHub—GitHub acquired our company—I had the opportunity to found the Copilot project. And so it was natural to now take my newfound interest in AI and apply it to the challenge of automating offensive security. And I was very lucky: Nico Waisman, one of the star researchers at Semmle, joined me in creating XBOW.
Sonya Huang: And one thing I’d love to ask you about. I think XBOW is such an interesting case study for this broader thesis we have that, you know, AI is actually changing markets of yesterday that weren’t as interesting. AI is really expanding and dramatically changing the nature of those markets. And I think this is a really interesting case study, so I’d love to dig into it a little bit more. The pen testing market, relative to, say, endpoint security or network security, is a relatively small, services-heavy market today. And so to your point, offensive security is so important and it’s the gold standard, but it’s a relatively small market. How do you think AI is going to change the nature of that?
Oege de Moor: So first of all, it’s small because it is powered by a small group of highly-skilled human experts. I think AI is going to change the market fundamentally in a couple of ways. First of all, because we now have AI code generation, everybody can create code. But not everybody knows about security. The models that generate the code have been trained on all public source code. There are a lot of vulnerabilities in all that public source code, and so we generate much more code with more security problems. On the other hand, attackers are already using AI to make their own work more effective. And so we also have a greater threat. So more code, more attacks. And that kind of makes the automation that XBOW is doing absolutely essential. So we believe that the market will grow enormously.
The first AI cyber warrior
Konstantine Buhler: Oege, one of the—and Sonya, one of the analogies that I think about with this market is, frankly, the adversarial nature of conflict, of human conflict. Cybersecurity is an adversarial game. You basically have two sides that get better and better equipment and they fight each other. And it’s a little bit of a game of cat and mouse, not completely unlike war and physical conflicts in human history. And one of the reasons why we think that this market is particularly interesting is, think about how frequently war games are played in the military, in the U.S. military, or in any military abroad. War games, red teaming. In fact, red teaming has been an initiative in most militaries for decades, even centuries, where you actually run a war game simulation.
So this is a level of national importance. And really what you have built is, in my eyes, the first ever AI cyber warrior. I mean, this is—I described it as a hacker, because this is an AI cyber warrior that can do things that no software has been able to do before, ever. And when you launched these results—I know this with confidence because we talked about it—a bunch of people called us up, from DC and all over the West Coast, and said, “Whoa, wait a second. This is very consequential.” This is highly consequential, and I’m sure it didn’t go unnoticed by adversaries of the West as well, and that they have probably been working on issues like this.
So my question is: How do we stay ahead of the competition? True competition as in nation-state competition, not business competition. How do we stay ahead of it? And how do we make sure that XBOW is a force for good in this massive adversarial cybersecurity game?
Oege de Moor: So first of all, we stay ahead by moving very fast. At XBOW, we are very lucky to work closely with several of the creators of big foundation models which are ahead of the rest of the world. We’re also extremely cognizant of the potential dangerous uses of our technology. Therefore, we’ve decided to make it available only in the cloud. By making it available only in the cloud and not as some downloadable piece of software, we can actually control what scope it is being used against. Is it being used to launch attacks? And so we can require of our customers that they prove to us that the scope they wish to have tested is actually legitimately theirs and is not being used to attack someone else.
Sonya Huang: AI cyber warrior. Konstantine, I think you got—you’re the new XBOW CMO. That is incredible. Oege, I’d love to learn about how the product actually works and how the models work. How much of the magic of what you’ve built is—you mentioned you work with some of the major foundation model companies. How much of the magic of what you’ve built kind of exists in the foundation models versus things that you are building on top?
Oege de Moor: So most of the magic is, in fact, on top. We work with several of the foundation model providers. We’re very happy that they are in stiff competition and they keep leapfrogging each other. One pulls ahead, then the other one pulls ahead. And every time the foundation models get better, it benefits us. But the true magic comes from the security team at XBOW. We’ve got some of the very best hackers in the world working for us, and that domain knowledge is what informs how our product works.
Sonya Huang: Can you double click a little bit into how that works? Is it prompt engineering? Are you fine tuning the models? I know that you probably want to keep your cards close to your chest as well in terms of how it works, but I’d love to hear the high level of how you’ve built it.
Oege de Moor: Sure. So I’ve already talked about these benchmarks that we use to evaluate our product at the beginning. And that is absolutely key. Benchmarks, benchmarks, benchmarks. It’s the lifeblood of a company, of a product like this. And so we’ve organized these into a kind of curriculum to teach the model how to solve cybersecurity problems better. And the benchmarks are critical to evaluate all the other changes that we make.
The other big component of our proprietary technology is the tools that we give to the LLM in order to forge these attacks. A human pen tester typically has a toolkit of a bunch of things that they use in order to do attacks. But here it’s a bit special, because we want these tools to work well with LLMs. For example, since we were focused on web security initially, we need a web browser that is driven by the LLM. You need to click around, you need to fill out forms, and so on and so forth. And so we created a special browser to do that sort of thing.
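As a rough illustration of what “a web browser driven by the LLM” can mean in practice, here is a minimal, hypothetical sketch that wraps browser actions as callable tools, using Playwright as the backend. The tool names and wiring are assumptions for illustration, not XBOW’s implementation.

```python
# A minimal sketch of the idea: expose a handful of named browser actions that
# an LLM agent can invoke, each returning the resulting page content so it can
# go back into the model's context.
from playwright.sync_api import sync_playwright

class BrowserTools:
    """Illustrative browser toolset for an LLM-driven agent."""

    def __init__(self):
        self._pw = sync_playwright().start()
        self._page = self._pw.chromium.launch(headless=True).new_page()

    def goto(self, url: str) -> str:
        self._page.goto(url)
        return self._page.content()          # page HTML feeds the next prompt

    def click(self, selector: str) -> str:
        self._page.click(selector)
        return self._page.content()

    def fill_form(self, selector: str, value: str) -> str:
        self._page.fill(selector, value)
        return self._page.content()

# An agent loop would hand these tool signatures to the LLM and dispatch
# whichever action the model picks, e.g.:
#   tools = BrowserTools()
#   observation = tools.goto("https://target.example/login")
```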
Thirdly, and this is pretty important, we need guardrails. When we first tried our product on some of these benchmarks, it struck me as an overeager, super brilliant teenager who would do lots of attacks and find something. And then it gets very excited and goes, “I did a SQL injection. Let me show you what I can do. Drop table!” That is catastrophic if you do it at a customer. This is a big thing about pen testing services: you have to make sure that you do not actually do the harm that a real, adversarial hacker would do. So we’ve been building guardrails to carefully watch over the shoulder of this brilliant teenager and stop it when it’s about to do something that might be unsafe.
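A toy sketch of that guardrail idea: screen each action the model proposes, so an injection is demonstrated but never weaponized. The regex-based filter below is a deliberately simple stand-in, assumed for illustration; the real system is surely more sophisticated.

```python
# Illustrative guardrail: refuse any proposed SQL payload containing a
# destructive statement, allowing only read-style proof-of-concept queries.
import re

DESTRUCTIVE_SQL = re.compile(
    r"\b(DROP|DELETE|TRUNCATE|UPDATE|ALTER|INSERT)\b", re.IGNORECASE
)

def screen_action(proposed_sql: str) -> str:
    """Return a safe proof-of-concept payload or refuse outright."""
    if DESTRUCTIVE_SQL.search(proposed_sql):
        raise PermissionError(
            f"Guardrail blocked destructive statement: {proposed_sql!r}"
        )
    return proposed_sql

# screen_action("SELECT version()--")        -> allowed, proves the injection
# screen_action("'; DROP TABLE users; --")   -> PermissionError
```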
Then there is an initial phase of attack surface discovery. What we have is a fantastic exploit finder, but you have to point it at the right endpoint to begin forging an attack. And so this phase runs a bunch of tools and prioritizes where to go first. And then finally, as you already mentioned, there is of course prompt engineering—tree-of-thoughts prompting to keep it on track and make sure that it finishes one goal, and when it finishes a goal, it goes on to the next, and so forth.
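Putting those phases together, here is a schematic sketch of the flow Oege describes: discover the attack surface, let the model prioritize, then pursue one goal at a time. Every name in it (`llm`, `tools.crawl`, `tools.execute`, `MAX_ATTEMPTS`) is an illustrative assumption, not XBOW’s actual interface.

```python
# Schematic assessment loop under the assumptions above.

MAX_ATTEMPTS = 20  # assumed per-goal budget

def run_assessment(target: str, llm, tools) -> list:
    # Phase 1: attack surface discovery (crawling, enumeration).
    endpoints = tools.crawl(target)

    # Phase 2: the LLM ranks endpoints by where an exploit is most likely.
    ranked = llm(f"Rank these endpoints by likely security impact: {endpoints}")

    # Phase 3: finish one goal before moving on to the next.
    findings = []
    for endpoint in ranked:
        goal = f"Find an exploitable vulnerability at {endpoint}"
        for _ in range(MAX_ATTEMPTS):
            action = llm(f"Goal: {goal}. Propose the next concrete action.")
            result = tools.execute(action)   # guardrails screen every action
            if result.is_exploit:
                findings.append(result)
                break                        # goal met; on to the next endpoint
    return findings
```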
Konstantine Buhler: You described the technology as a brilliant teenager who’s sometimes overeager and maybe finds an exploit and actually drops that table. In some places in the world, there are actors that don’t have the same discretion to add those guardrails. What do we do to stay ahead of those actors and make sure that XBOW can protect those that are doing good against them?
Oege de Moor: So first of all, we need all the obvious safeguards in place. We need firewalls, and in that type of technology AI will also play a role. But first and foremost, we have to make sure that we find the vulnerabilities and the exploits before the bad guys do. And that’s what XBOW is all about.
Sonya Huang: Oege, how do you deal with hallucinations? You know, I hear people saying, “If my LLM gets it right 50 or 60 percent of the time, I’m good.” I imagine security is one of those fields where that is insufficient. How do you deal with managing around the stochastic, unpredictable nature of these LLMs?
Oege de Moor: So fortunately, because it’s automated, you can just run it many times. Going back to your earlier question about the foundation models, what we do see is that the better the foundation models get, the fewer attempts we need to make in order to find exploits. So it’s kind of interesting how that will influence how you deploy and package and price a product like this.
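The “just run it many times” point has simple math behind it: if one attempt succeeds with probability p, then n independent attempts succeed with probability 1 − (1 − p)^n. A quick sketch with invented numbers shows why better models translate directly into less inference.

```python
# Back-of-envelope: attempts needed before we find the exploit with a given
# confidence. The success probabilities here are illustrative, not XBOW's.
import math

def attempts_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - p)**n >= confidence, for 0 < p < 1."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

print(attempts_needed(0.05))  # weaker model:  59 attempts for 95% confidence
print(attempts_needed(0.30))  # better model:   9 attempts -> far less inference
```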
Very much as with humans: if you get a human to perform this service for you, you actually pay for the time—for how long they tried, how many things they tried. So we are thinking about doing the same kind of thing: charging our customers a subscription license, but on top of that you can pay for attack hours. If you want to do a really thorough test and make sure that you absolutely find everything, you can pay more, and obviously that would then pay for the inference time on our side.
Sonya Huang: I want to get into the pricing and packaging a little bit later because I’m very curious about that, and I think you are one of the first examples of service-as-software, so you are really paving the way in terms of how these things are priced and packaged. Before we get there, you mentioned inference-time compute, and I think we’re broadly very excited about what’s happening as more and more of the compute shifts from pre-training to inference time. What do you think the impact is going to be in your market?
Oege de Moor: For us, it can only be good: the value that we deliver remains constant while the price for delivering it goes down. We see this even over the very short time that XBOW has been in existence, and we only expect that to continue.
Sonya Huang: Yeah.
Finding everything that there is to find
Konstantine Buhler: Oege, on the probabilistic nature of these LLMs, just revisiting that concept for a second, my mental model of what’s going on is that you have a state space with billions of possible states—all the actions that this AI penetration tester can take. And you’ve introduced this really intelligent heuristic as to the directions to go. In theory you could explore all possible states in perpetuity if you had infinite compute and infinite time, but in reality you have these constraints. And so I’m wondering, is that a reason why this might be the first, or one of the first, markets to enable full AI automation? As in, the stochastic nature of it, and the fact that even if you find one exploit, it’s extremely valuable, and you don’t have the expectation of a complete, exhaustive search.
Oege de Moor: You do want to be sure that you find everything that a very skilled human being would find. And so this is why we’ve kind of exhausted our first set of benchmarks, and we’re now creating a new set of benchmarks to be absolutely sure that we find everything that there is to find. People do these offensive security exercises not only to find the vulnerabilities, but also to have the peace of mind that it’s not easy to find stuff they didn’t know about. And so we do have to make that case. We have to present the evidence to our customers that we do find everything that skilled human beings would find, and people will insist on having that reassurance.
Konstantine Buhler: And when it is found, is it verified by a human or by the machine?
Oege de Moor: We have a validator that automatically validates that the report is correct and reproducible before it goes to a human. But of course, in the end, the human will have to take a look at it and fix the problem.
Konstantine Buhler: Makes sense.
Sonya Huang: Oege, I’d love to dive a little bit deeper into the results that you’ve attained so far. So you mentioned you’re at 85 percent on your current benchmarks, you know, at the level of the best human pen testers in the world. What have been the most surprising things that you’ve found as you dig into the nature of those results?
Oege de Moor: The thing that I find most surprising was that we—originally, we only had benchmarks with particular instructions. So here it would say something like, “You’re going to test a web app for managing medical prescriptions. Try to log in and access the prescriptions of another user.” And it would do that successfully. But then we ran another test where we took the instructions away completely and just said, “Here’s a web app, go explore.” And the AI was able to find exactly the same vulnerability because it was able to read what’s on the web pages and say, “Ah, this is about medical prescriptions. Probably it’s not a good idea that one user can access the prescriptions of another.” And so it would go and find that vulnerability completely autonomously.
I think that that’s part of the reason that this technology is so exciting compared to all the security tools that came before. Because these LLMs have an understanding of the real world, the system can actually assess what is important to go and test. It doesn’t have to do a complete exhaustive search of all the possibilities. It can interpret what is important for this particular application.
Sonya Huang: That’s really cool. That’s really cool. And then does the way that the AI system kind of reaches its results, how does that compare to the way that a human pen tester would go about approaching the problem? I’m kind of thinking of, you know, AlphaGo and move 37, just, you know, very different from how we as humans would think about it. What is the model doing?
Oege de Moor: So it’s early days. Today, it’s very similar to what a human being would do. But I completely agree that we have to be wary here of Rich Sutton’s “Bitter Lesson”: in the end, as it learns on a continuous stream of data—on benchmarks, on more and more examples—it will start finding attacks that were unimaginable from a human perspective.
Konstantine Buhler: Which is a good thing. I mean, you say “cautiously.” I’m curious as to why. Isn’t that a great outcome?
Oege de Moor: Yes, yes, it’s a great outcome. I’m merely saying that today, when you read the traces, absolutely, this is what you would expect a good human to do. I fully expect that we’ll go beyond that in a couple of months, and certainly within years.
Sonya Huang: Oege, where do you think the biggest remaining room for improvement lies? And I’m curious, you know, you mentioned looking at the traces of these models, like, would you say that they are reasoning already today and this is further improvement remaining in the reasoning area, or how do you think about that?
Oege de Moor: I think that that’s clearly the case, but most of the improvement will come from more data, more reinforcement learning on particular examples. And as we just said, that will lead to improvements similar to what we saw in other games, like Go.
Sonya Huang: How do you get more data? Is that just running more simulations or—I imagine you’ve used a lot of the data there is.
Oege de Moor: A couple of different ways. We have quite a few contractors, security experts who create more benchmarks for us. There’s also the opportunity of mining open source. We’ve only recently started doing this: just letting it loose on a large number of images on Docker Hub. And every time it finds something, that becomes a new thing that it can learn from. It might find it by doing a hundred attempts. In practice, a hundred attempts probably wouldn’t work at a customer, because you would already get shut down—there are too many attacks happening. Clearly that shouldn’t happen. But because it’s open source and we can run it on our own servers, we can do a hundred attempts. And now we have the data to try and make the model better, to find it more quickly.
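Schematically, that open-source mining loop might look like the sketch below: pull public images, attack each one with a generous attempt budget on your own infrastructure, and keep successful trajectories as training data. Here `deploy`, `attack`, and `teardown` are hypothetical stand-ins passed in by the caller, not XBOW’s internal tooling.

```python
# Illustrative data-mining loop under the assumptions above.

def mine_training_data(images, deploy, attack, teardown, budget=100):
    dataset = []
    for image in images:
        target = deploy(image)                  # run the image on our own servers
        try:
            for _ in range(budget):             # a hundred attempts is fine here:
                trajectory = attack(target)     # nobody shuts us down for attacking
                if trajectory.found_vulnerability:
                    dataset.append(trajectory)  # a new example to learn from
                    break
        finally:
            teardown(target)
    return dataset
```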
Developing GitHub Copilot
Konstantine Buhler: You mentioned open source and Docker Hub, and that obviously gets me thinking about GitHub. And Oege, for those who don’t know, was the creative brain behind GitHub Copilot, one of the most widely adopted AI applications in the world. Was there a moment when you were developing or productizing Copilot that you realized this AI is going to get so good that it’s going to automate entire processes—what people now call agents, actually taking actions across entire processes? And was there a moment where you said, “Hey, security is actually a very relevant area for this to happen”?
Oege de Moor: Hmm. So in fact, I wrote a memo in December of 2020 where I sketched what would later become Copilot. But we were also already speculating that perhaps it would autonomously be able to fix bugs, just by looking at the issue ticket. And now we see that functionality emerging. So yes, I think that was pretty clear from the very beginning. The moment where I realized that would happen was when I took a set of exercises, interview questions that I normally used to ask people at Oxford, and asked the model to solve them. If you just gave it one attempt, it didn’t do it. But if you gave it a hundred attempts, or even a thousand attempts, it would do most of them. And at that moment it was pretty clear that as the models get better and need fewer attempts, they will be able to do these types of things. And one of the things that we also hoped it would do was security analysis, although admittedly I didn’t have offensive security on my list in the summer of 2020 just yet.
Konstantine Buhler: Cool.
Sonya Huang: Any other lessons from productizing GitHub Copilot that you think are relevant to share here?
Oege de Moor: I actually think that the most interesting thing about GitHub Copilot was that it was done by such a small team. When we launched, we were only 10 people, something like that. It’s just a testament to how fast you can move with a dedicated team of people who believe.
Konstantine Buhler: How big was XBOW when you launched the results?
Oege de Moor: We were 13 people, so actually quite big.
Konstantine Buhler: Well, 13 really brilliant people.
Sonya Huang: Since you were part of the Copilot journey from the very beginning, I’m curious what you think of the current market for code generation AI startups. It seems like it’s one of the most crowded categories competitively right now. Do you think there’s a path to building a company there? And, you know, can one of these startups beat the incumbent GitHub that already has so much distribution?
Oege de Moor: I agree. So I like a lot of what’s going on. I particularly admire the work at Cursor or at Factory, but it’s really difficult to compete with the distribution of a juggernaut like GitHub. I do think that there may be an opportunity to go after a different market. So GitHub is reigning supreme among professional developers, but if you go after people who do not code for a living, there’s an opportunity. And Replit does this quite well, for example.
Sonya Huang: How do you think coding will transform in the future? Like, do you think the market that Replit serves, do you think there will just be a dramatically larger and more important market as AI kind of continues to take over the world? Or how do you think coding changes?
Oege de Moor: Yeah, so I think that the biggest change is going to be that many, many more people are able to create their own software. So that’s a big transformation. But even for professionals, it will be much more about the conceptual ideas, about sketching an architecture and then having the AI fill in the details.
Longer term, I believe that we may be moving away from code as we know it today. The artifact that you make as a developer is the conversation with the model. And so that is what you should store, because that records what the code is supposed to do, rather than the details in a particular coding language.
Konstantine Buhler: English is the coding language. So sketching …
Oege de Moor: That’s right. Yeah, so English is the coding language, perhaps with some diagrams to explain it better. But it’s just the next step in moving up in abstraction. Originally it was all machine language, then we got higher-level programming languages. And now we’re moving to natural language and images.
From Oxford professor to tech company CEO
Konstantine Buhler: So you talked a little bit about education and coding. I’m going to go down a little diversion for a second, because one of the amazing things about your life is you were a professor for much of it, and a very, very good one at that. So for context, Oege was a computer science professor at Magdalen College in Oxford. And Magdalen is one of the most prestigious colleges at Oxford. He was one of the most amazing computer science professors. I got to study abroad at Magdalen. It’s one of those incredibly serene places where they’ve got the deer park and the thousand-year-old buildings and the British man who tells me that the door at his entrance is older than my country.
Oege de Moor: [laughs]
Konstantine Buhler: And all of the things that you would expect from one of the most prestigious academic institutions in the world, including the fact that Oege, as a professor, could walk across the grass, whereas I, a mere student, would only be able to if I was holding his cape, with his permission.
Oege de Moor: That’s right.
Konstantine Buhler: And you left all of that to come into the commercial world with Semmle 15, 20 years ago. Can you tell us a little bit about your personal journey from leading academic at a highly prestigious institution to commercial CEO redefining the cybersecurity industry today?
Oege de Moor: I actually got into computer science because I loved coding. My very first program was a word processor so my dad, who was a professor of Semitic languages, could type his manuscripts on his computer. So when I started studying computer science, I got totally taken by mathematics and the foundational theories, and so that’s what I pursued as an academic initially. Then when I became a professor, I wanted to go back to my love of coding, so I started a new research group in programming tools, which eventually led to the spin-out that was Semmle.
While I love the serenity and the peace and quiet of a place like Magdalen College, in our field speed is incredibly important, and speed can only be achieved with small teams that have a profit motive. Inventing something because you have a paper deadline for an important conference is just different from inventing it because otherwise an important customer will not sign up. And I actually loved that additional excitement and pressure, and that’s what led me to leave Magdalen behind and go all in on Semmle.
Sonya Huang: Love it. A great advertisement for capitalism.
Oege de Moor: [laughs] So, you know, if you do really foundational work, a university—and the University of Oxford in particular—is probably the best place to do it. But as soon as you start doing applied stuff, there’s no place like a startup.
How agentic applications are priced and packaged
Sonya Huang: I love it. I guess on that capitalistic note, I’d love to understand how you think about generating profits at XBOW. And since you are one of the first agent-first service-as-software companies, I think you’re really going to set the precedent for how these types of agentic applications are priced and packaged. So maybe can you just expand a little bit on how you’re thinking about doing that with your offering?
Oege de Moor: Sure. So we would like our product to run continuously as part of engineering processes. I mean, that is the main value proposition: instead of doing a pen test once or twice a year, you run it continuously after every change and immediately fix problems before they even reach production.
So if you think about it like that, the most obvious pricing model would be based on the size of the engineering team, very much like products such as GitHub Advanced Security. However, there is a different dimension here, and we touched on it a little earlier in the conversation. Some customers will want to do a super thorough test, really making sure that they’ve exhaustively eliminated every possible exploitable vulnerability. And in order to serve such customers, we should have a service component to our pricing where if you pay more, you get a more thorough test. The way we talk about this is in terms of “attack hours”: how many hours of attack do you get? So if you buy a normal license, it’s based on the number of engineers in your organization, and it comes with a fixed number of attack hours suitable for your environment. But then if you want to go more thorough, you pay that extra service fee in order to go deeper.
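A hypothetical back-of-envelope of that two-part model: a seat-based subscription plus purchasable attack hours. All rates, allowances, and names below are invented for illustration; nothing here reflects XBOW’s actual pricing.

```python
# Illustrative quote calculator for the subscription-plus-attack-hours model.

def quote(engineers: int, extra_attack_hours: int = 0,
          per_seat: float = 50.0, per_attack_hour: float = 20.0) -> dict:
    included_hours = engineers * 2              # assumed bundled allowance
    subscription = engineers * per_seat
    overage = extra_attack_hours * per_attack_hour
    return {
        "monthly_subscription": subscription,
        "included_attack_hours": included_hours,
        "extra_attack_hours_fee": overage,
        "total": subscription + overage,
    }

# quote(engineers=40)                          -> base plan, 80 bundled hours
# quote(engineers=40, extra_attack_hours=100)  -> deeper, more exhaustive test
```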
Sonya Huang: That’s super interesting. So you are really tapping into services-like pricing models and budgets, but on the back end you have the gross margin profile of software.
Oege de Moor: Right. But I think that enterprise software is moving more and more towards a consumption-based model as well. Here there is a very clear correlation between attack hours and the benefits to the customer. And that correlation between the resources you consume and the benefits you get as a customer has to be very clear for a pricing model like this.
Konstantine Buhler: Oege, my other takeaway from your Magdalen story is, I mean, you’ve always had this interest in impact. You got into development tools because they touched people. You got into this because you know that this is going to change the world. I mean, you’re highly confident this type of technology is going to change the world of cybersecurity, whether it’s us or someone else.
Protect all the software in the free world
Oege de Moor: I actually would put it a little differently. We absolutely must create XBOW because if we don’t do it, the bad guys will get there first. And so for sure, I mean, we do it because it’s interesting and we think it’s a great commercial opportunity, but it’s also an imperative. It’s an imperative for the free world that we actually create this thing to protect all the software in the free world.
Konstantine Buhler: That has been so clear from minute one of meeting you: that that is the driver behind you and this brilliant team that you’ve assembled of academics and builders and technologists who are, I mean, incredible. The other thing you mentioned in the Magdalen story was speed, the ability to move fast. And let me say, you have moved really quickly, you and your team. What should we come to expect from XBOW in a year? What do you think will be the product—let’s focus on the product—and technology impact? What will be happening from a product, technology, and capabilities perspective a year out?
Oege de Moor: You’re going to replay this to me in the next board meeting, aren’t you?
Konstantine Buhler: [laughs] Four board meetings, don’t worry.
Oege de Moor: So I believe that in a couple of months—so we’re currently in a phase where we very carefully try out the product with a select few early design partners. And the reason that we do that is because it needs human supervision in order to control the brilliant teenager that we discussed before. Once we are over that phase and we are confident that we can let it loose without any supervision, I think everything is going to move very fast.
Part of the reason is that this type of product is very easy to deploy. You can just point it at an existing server and immediately find results. So I would expect that by next summer we have significantly transformed the state of web security, hopefully by demonstrating our work on open source, but also on platforms like HackerOne.
Lightning round
Sonya Huang: Oege, this has been one of my favorite episodes so far. Thank you again. Shall we wrap up with a quick lightning round?
Oege de Moor: Go for it.
Sonya Huang: Okay. Awesome. Number one: Favorite startups other than XBOW.
Oege de Moor: Suno. I love the way you can just type in a few words and you get a completely original song. It’s spine-chilling to me. The other startup I like a lot is Harmonic, applying AI to mathematical reasoning.
Sonya Huang: Are you making Suno songs about coding and security?
Oege de Moor: No. [laughs]
Konstantine Buhler: There’s a great …
Oege de Moor: I sent my wife a new song about sitting on the balcony at home in Malta.
Sonya Huang: Aw, that’s sweet.
Konstantine Buhler: There’s a heavy metal one about the AI cyber warrior.
Oege de Moor: [laughs] I’ll try that.
Sonya Huang: Konstantine, I think we’re actually gonna need that.
Konstantine Buhler: Okay, perfect.
Sonya Huang: At our annual AI event that we throw, we had Mikey from Suno there, and we crowd-created an AI hot girl summer song. It was actually very catchy.
Konstantine Buhler: That was great. That was great. Oege was there. What other markets do you think AI is going to disrupt with this service-as-software model in the short, medium and long term?
Oege de Moor: So in the short term—and this is already happening—everything related to customer support is clearly going to be impacted by this type of technology. This next one is not exactly service-as-software, but I think that many of the problems we currently see with social media could be mitigated using this type of technology. I mean, you read all these reports about how social media is affecting the mental health of children all over the world. AI has the power to help with this type of problem. And long term, I think health and biology are the areas where this will make the biggest impact.
Sonya Huang: What advice do you have for other startup founders?
Oege de Moor: Focus on only one thing. Move as fast as you can. If you do those two things, then it will all come out all right.
Sonya Huang: Love it. One last question, and we’re going to end on an optimistic note. What do you think is the best possible thing that can happen with AI over the next decade?
Oege de Moor: I already touched on it: the opportunities in health and biology, to significantly improve health outcomes everywhere in the world, are amazing. Dario Amodei wrote this essay, “Machines of Loving Grace,” and I think he laid out very beautifully what the potential benefits of generative AI are for all of us.
Konstantine Buhler: Oege, thank you so much for joining us. This has been absolutely fantastic. And we’re so grateful to get to work with you, and for the fact that you’re building this on behalf of the right players, the people that are trying to do good in the world.
Oege de Moor: Thank you very much. It’s been a pleasure to be here.