Securing the AI Frontier: Irregular Co-founder Dan Lahav

Irregular co-founder Dan Lahav is redefining what cybersecurity means in the age of autonomous AI. Working closely with OpenAI, Anthropic, and Google DeepMind, Dan, his co-founder Omer Nevo, and their team are pioneering “frontier AI security,” a proactive approach to safeguarding systems where AI models act as independent agents. Dan shares how emergent behaviors, from models socially engineering each other to outmaneuvering real-world defenses like Windows Defender, signal a coming paradigm shift. He explains why tomorrow’s threats will come from AI-on-AI interactions, why anomaly detection will soon break down, and how governments and enterprises alike must rethink defenses from first principles as AI becomes a national security layer.

Summary

Dan lays out the current and future state of AI security:

Economic activity will shift toward human-AI and AI-AI interactions: As AI agents gain autonomy and embed themselves in workflows, security will need to adapt to unpredictable agent behavior far beyond traditional digital threats.

AI models’ offensive capabilities are progressing rapidly: New models can chain vulnerabilities, demonstrate situational awareness, and perform complex exploits—skills that were unavailable even a few quarters ago, requiring founders to anticipate threats well ahead of deployment.

Traditional monitoring and anomaly detection are inadequate: Existing security tools can’t track agents communicating in evolving protocols or spot emergent behaviors, making it essential to rethink how monitoring and baselining work in an agentic world.

Proactive, experimental security research is critical: Simulated environments that push models to their limits are vital for understanding how attacks and defenses play out, enabling founders to develop protections before threats appear in the wild.

Security must be balanced with innovation and productivity: Overly restrictive defenses can stifle progress, so founders should focus on robust measurement, mapping which traditional defenses remain relevant, and timing new controls to the actual evolution of model capabilities.

Transcript

Intro

Dan Lahav: There was a scenario with an agent-on-agent interaction. I won’t say the names, but you can think of it as, say, a Claude and a Gemini. The simulation they were in was a critical security task. But after working for a while, one of the models decided that they’d worked enough and should stop. And it did not stop there: it convinced the other model that they should both take a break. So one model socially engineered another model. Now think about the situation where you, as an enterprise, are delegating an autonomous workflow that is critical for you to complete. The more complex and capable these machines become, the more of these weird examples we’re going to encounter.

Dean Meyer: Today on Training Data, we dig into the future of frontier AI security with Dan Lahav, co-founder of Irregular. Dan challenges how we think about security in a world where AI models are not just tools, but autonomous economic actors. He explains why the rise of AI agents will force us to reinvent security from first principles, and how the very nature of threats is shifting from, say, code vulnerabilities to unpredictable emergent AI behaviors. Dan also shares surprising real-world simulations where AI models outmaneuver traditional defenses, and explains why proactive experimental security research is now essential. His view is that in a world where more economic value shifts to human-on-AI and AI-on-AI interactions, solving these problems is paramount. Enjoy the show.

The Future of AI Security

Dean Meyer: Dan, wonderful to have you with us today.

Dan Lahav: It’s a pleasure to be here.

Dean Meyer: Awesome. So before we jump into questions, I will just say that it was very hard to get in front of Dan. I was trying to get in front of him for three months, probably thirty to forty emails, five or six people around us who we both knew closely were pinging him all the time, and he was still not responsive. And I basically learned where he was spending most of his time, and I kind of went …

Sonya Huang: Did you stalk him?

Dean Meyer: I kind of stalked him. I kind of stalked him. And eventually, we basically bumped into each other, like, not intentionally. And anyway, so we bumped into each other. I was like, “Dan, you know, you’re brilliant. I keep hearing great things. Please respond. Let’s find time. You know, we at Sequoia spend a lot of time in AI security.” And eventually, we found time the following week. So welcome, Dan. Thank you for everything.

Dan Lahav: It seems that I’m going to have to start this podcast with an apology, sorry Dean, sorry Sonya, sorry the entirety of Sequoia. It indeed took time.

Dean Meyer: It took time, but we partnered, and here we are. And you guys have done wonderful things. So it’s wonderful to have you with us today.

Dan Lahav: Yeah, it’s a very, very happy ending and, you know, just, like, appreciate you and everyone here.

Dean Meyer: Of course. Of course. Okay, so let’s jump into it. I’m going to start with a spicy question. As we recently saw, you partnered with OpenAI on GPT-5. And let’s kind of look forward a little bit. What does security look like in a world of GPT-10?

Dan Lahav: Ooh, spicy and speculative, indeed. Let me wrap my head around that. Obviously everything I’m going to say is speculation and projection, but the way we think about what’s coming is by trying to understand how we’re even going to produce economic value, and how organizations, enterprises, and people are going to consume things in the world at the time of GPT-10 or Claude-10 or any of the other models.

Let’s do a thought experiment to clarify why we believe that in the next two to five years there’s going to be a huge shift in the way humans organize themselves, and why, as an outcome, security is probably going to be very different as well. Here’s the thought experiment: imagine going back one or two generations and telling your parents or grandparents that you’re doing security work with Anthropic, OpenAI, or Google DeepMind. I think their minds would jump to assuming that you’re providing a bodyguard service to Sam or Dario or Demis. That’s because the canonical security problem a few decades ago, in our parents’ and grandparents’ generation, was physical security: the vast majority of economic activity was in the physical realm, not in the digital world.

Dean Meyer: Yeah.

Dan Lahav: And, you know, after the PC revolution and the internet revolution, we changed the way we organize and create value. We transitioned primarily to a digital environment. Just think about how many times you’ve done something of economic value simply because you got an email from someone you may never have met. Just this morning I got an email from my bank, from a person I’ve never met, asking me to do something. Maybe it was a security person (that’s not a great thing to say openly), but we do that all the time, because that’s the way we interact in society.

So our view is that this is going to happen again soon. AI models are becoming so capable that a lot of economically valuable activity is going to transition to human-on-AI and AI-on-AI interaction. That means we may soon see a fleet of agents in an enterprise, or a human doing something as simple as drafting a Facebook post with a collection of different AI tools. We’re embedding increasingly capable tools and delegating to them tasks that require more and more autonomy in order to drive meaningful parts of our lives. So we’re transitioning from an age where software is deterministic and fully understood end to end, to an age where this is no longer the case. As an outcome, enterprises themselves, and how we interact with the world in general, are going to go through a fundamental change, and it’s clear that security is just not going to be the same.

As an interesting analogy, think about Blockbuster, may it rest in peace, and the current version of Netflix. If you think about it, they both give the exact same value to the consumer: both let you access units of content for your pleasure and entertainment. But clearly and intuitively, security for Netflix and security for Blockbuster are not the same. One was a chain where you had to go and physically rent a DVD; the other is a much more modern architecture where you stream content to your home. So even enterprises that will provide the exact same value in the near future may have a very, very different backend in the autonomous age we’re entering, which makes it clear that security as a whole is going to be very different. We need to recalibrate for the age of autonomous security that’s coming upon us.

Sonya Huang: You were at our AI Ascent event earlier this year, right? Do you remember when Jensen Huang shamed everybody there because not enough people in the room were thinking about security in a world of agents? I remember Jensen said something about how, as these agents are allowed to act more autonomously in enterprises, you should expect orders of magnitude more security agents than productive agents, watchdogging and shepherding this herd of agents.

Dan Lahav: I’m biased, but I agree with Jensen. I think Jensen was the first person I’ve met who was much more bullish on AI security than me. In our view, too, you’ll need a collection of defense bots working side by side with capability bots in the next generations of enterprises. But he gave a ratio of 100 to 1 for how many defense and security bots would be required, based on the assumption that secure by design in AI is not going to work. I’m not sure I agree with that part of the conclusion. I think we can make significant progress on secure by design, specifically by embedding defenses in the AI models themselves. That being said, we share the view that in the future we’ll need a lot of agents dedicated specifically to monitoring other agents and making sure they don’t step out of bounds.
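To make the idea of agents that monitor other agents concrete, here is a minimal Python sketch, assuming a hypothetical setup in which every tool call an agent proposes passes through a separate watchdog before it runs. The `Action` and `Monitor` names, the tool names, and the allow/block/escalate policy are all illustrative assumptions, not Irregular’s design or any lab’s API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One tool call an agent proposes to make (hypothetical schema)."""
    agent: str
    tool: str
    target: str


@dataclass
class Monitor:
    """A watchdog that reviews other agents' actions before they run."""
    # Which tools each agent is allowed to call on its own.
    allowed_tools: dict[str, set[str]]
    # Tools that always go to a human reviewer, whoever calls them.
    sensitive_tools: set[str] = field(default_factory=set)
    # Audit trail of every decision, for later analysis.
    log: list[tuple[Action, str]] = field(default_factory=list)

    def review(self, action: Action) -> str:
        if action.tool in self.sensitive_tools:
            verdict = "escalate"
        elif action.tool in self.allowed_tools.get(action.agent, set()):
            verdict = "allow"
        else:
            verdict = "block"  # default-deny anything outside the agent's scope
        self.log.append((action, verdict))
        return verdict


monitor = Monitor(
    allowed_tools={"storage-bot": {"read_file", "summarize"}},
    sensitive_tools={"disable_antivirus", "grant_privileges"},
)
print(monitor.review(Action("storage-bot", "read_file", "q3_report.docx")))  # allow
print(monitor.review(Action("storage-bot", "download_url", "https://example.com/x")))  # block
print(monitor.review(Action("storage-bot", "disable_antivirus", "defender")))  # escalate
```

Checking the sensitive list first means even an otherwise-permitted agent cannot disable a defense without a human in the loop, and the default-deny branch is one simple way to keep agents from stepping out of bounds. A real monitor would also need to judge intent and context, which a static allowlist cannot do.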

Dean Meyer: So maybe on that question, just to dive one layer deeper, what is the state of model cyber capabilities today, and how has that changed over the past 12 to 18 months?

Dan Lahav: It’s a great question, and I actually think the rate of change is the most relevant part here, because models can do so much more now than they could even a quarter or two ago. To give an intuition: we’re now entering the fourth quarter of 2025. At the beginning of the year, coding agents were not widespread yet. Proper tool use wasn’t brand new, but it was much, much more nascent than it is now. Reasoning models were only at the beginning as well. So think about everything that was added over the past year, and what it means for security.

What we’re seeing now is that with the combination of much better coding, multimodal operation, improving tool use, and improving reasoning, if you’re using models for offensive purposes, we are seeing unlocks all the time. Something that is feasible now that was not feasible even a quarter ago is properly chaining different vulnerabilities and exploiting them to carry out much more complicated actions. For example, say you want to hack a website at the application layer. A few months ago, if you needed to chain a collection of vulnerabilities together to perform an action of value, at least autonomously without a human involved, models were unable to do that, even the state-of-the-art models. That’s not the case anymore.

Obviously that depends. It’s not a hundred percent success, and it depends on the complexity of the vulnerabilities and of the environment you’re trying to hack. But we have seen huge jumps in the ability to scan more and more complicated code bases, exploit more complex vulnerabilities, and chain them together to carry out these exploits. And with the recent GPT-5 launch, specifically on the offensive side of what models can do, we saw a significant jump in competence across a collection of skills that matter a lot along the cyber kill chain.

AI Model Capabilities and Cybersecurity

Dean Meyer: Can you tell us more about that? Obviously some things are publicly available and others are not, but at least based on the scorecard and what OpenAI has shared for GPT-5 in particular, what are some of the capabilities you’ve seen that were surprising?

Dan Lahav: We are seeing constant improvement in the ability of models to, for example, have situational awareness of whether they are in a network. Up until a few months ago, around the beginning of the year, models were unable to do that. They could run some operations locally, but they usually lacked situational awareness of what was happening and what they could activate, even in the more limited and constrained scenarios we put them in. That’s not the case anymore. We still sleep very easily at night, because the level of sophistication is still somewhat limited. But we find ourselves creating more and more complicated scenarios, because there’s a huge jump in the ability to take in more complicated context and, as I said before, chain complicated vulnerabilities to one another in order to do multi-step reasoning and exploitation. These are all skills that did not exist a year ago.

Dean Meyer: You guys are trusted partners of many of the labs, including Anthropic, OpenAI, and Google DeepMind, and you’ve worked very closely with them for quite some time at this point. Why did you take the approach of embedding yourselves within the labs as opposed to, I don’t know, selling directly to enterprises right now?

Dan Lahav: There are multiple companies doing AI security. We are pioneering a category of the market that we call “frontier AI security,” and we think it’s fundamentally different. The core idea is actually very simple: the rate of progress and the rate of adoption of models change so many things at the same time that, while traditional security tends to be somewhat reactive in nature, here we need a very aggressive, proactive approach. In a market dominated by a rate of innovation that is, I think, unparalleled in human history, we think it’s more interesting to take a temporal niche of the market: focus on the first organizations that are about to experience a problem. Those are the labs, because they are the contenders to create the most advanced and increasingly sophisticated AI models in the world. We work very closely with them to see firsthand the kinds of problems that are going to emerge, and we use that to build a clear, crisp understanding of what’s coming six, twelve, or twenty-four months ahead of time. That way we’re prepared for the moment when general deployers need to embed these advanced models, and we already have solutions that are relevant for them.

Sonya Huang: Given the rapid pace of progress on the foundation model side of the world, say you’re at one of these model companies. I think the people there are sincere and want to do good for the world, and they now know their models are capable of being used for extreme harm, including cyber attacks. What do you do about that conundrum? We’ve been working with OpenAI since 2021, and I remember that back in those days, every enterprise user of the API past some volume had to be manually approved for their use case in order to even access the API. It feels like that ship has sailed: anybody anywhere will be able to access some of these models. So how can you make the models secure by design if you’re in one of these foundation model seats right now?

Dan Lahav: I think it’s a great question. One thing on the premise of the question: right now, at this moment in time, the ability of models to actually do extreme harm potentially exists in some use cases, but at least in cyber, I think we’re not there just yet. And that matters.

And just to be really sharp on what I mean here: models can clearly be used to do harm, but there’s a distinction to be made between harm and extreme harm. Harm would be, for example, using a model to fool a senior citizen and steal money from them, essentially scaling up phishing operations. That can happen easily right now. Extreme harm, in my view, would be something along the lines of taking down multiple parts of critical infrastructure in the United States at once, taking whole cities off the grid and stopping hospitals from working. Models are not there yet.

And that’s not me nitpicking on the question. I actually think it matters quite a lot, because how much time we have to prepare for a world with models that capable determines the strategies we can take on the defensive side. Our view is that the first-order thing to do is to monitor and get a view of what’s coming, so that we can have a much higher-resolution discussion: which capabilities are progressing, at what pace, and should we expect them to continue at this pace or accelerate in the future? That dictates the order and priority of defenses: when to embed them, whether to embed them at all, and so on. If we get this wrong, I think it’s also unfair to the companies and to the world, because AI has so much potential to do good that if we deploy defenses ahead of time that chip away at productivity, we’re also doing real harm to innovation and to the world at large. It’s a very delicate balance to strike.

So I think the first-order thing to do if you’re working inside the labs is to build and support a large ecosystem that can take the models, measure them, and get to high resolution before extreme harm is even possible. The second bit is figuring out a defense strategy that is informed by exactly what’s happening, and treating it almost like a regular science, with experiments on how to assess, how to make predictions, and so on.

Some defenses will require a degree of customization. For example, if you’re building monitoring infrastructure, we’ll still need that, though you may want to recalibrate some of it to raise higher-priority alerts when AI is going off the rails. But some problems are very easy to write about and very hard to build solutions for. Take customizing your monitoring software to prioritize alerts coming from your AI layer: how are you going to tell when AI is doing something problematic? Occasionally you’ll be able to catch it. But I think the entire anomaly detection subsection of the market, which is a huge part of security, is going to have a big problem very soon. Anomaly detection is based on establishing a baseline and measuring against it to see that something is anomalous. If you don’t have a crisp understanding of what the baseline should look like, it’s hard to tell that something went badly.
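The baseline problem can be illustrated with the simplest kind of anomaly detector, a z-score rule: summarize normal behavior as a mean and standard deviation, then flag observations that land too many standard deviations away. The sketch below is a hypothetical Python illustration with made-up counts; the three-standard-deviation threshold is a common convention, not something from the conversation.

```python
import statistics


def fit_baseline(history: list[float]) -> tuple[float, float]:
    """Summarize 'normal' behavior as a mean and sample standard deviation."""
    if len(history) < 2:
        raise ValueError("need at least two observations to form a baseline")
    return statistics.mean(history), statistics.stdev(history)


def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, std = baseline
    if std == 0:
        return value != mean
    return abs(value - mean) / std > threshold


# Hypothetical hourly file-read counts for a conventional, predictable service.
service = fit_baseline([100, 98, 103, 101, 99, 102, 97, 100])  # mean 100, stdev 2
print(is_anomalous(140, service))  # True: 20 standard deviations out

# Hypothetical counts for an agent whose legitimate workload varies wildly.
agent = fit_baseline([5, 300, 40, 900, 12, 150])
print(is_anomalous(1200, agent))  # False: the wide baseline hides a large spike
```

The same rule that catches a spike in a predictable service misses a comparable spike from an agent whose normal behavior is highly variable, because the wide baseline absorbs it. That is the gap Dan points to: without a crisp baseline, the detector has nothing reliable to measure against.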

As an outcome, some defenses are going to work as is, and some we need to recalibrate, either by customizing them or by creating them from scratch. There is a lot of science still to be done on how models look when they’re under attack and how it looks when models are attacking something, and I think we still have some time before the world that’s about to come. So my recommended strategy would be: invest a lot in robust tools that give you rigorous evidence of what’s coming, so you can have the discussion at high resolution; map which classic defenses are still relevant; understand where your gaps are; and invest a lot in R&D to make sure you have cost-effective defenses you can roll out before models are deployed.

Sonya Huang: I know one of the broader questions in frontier AI research today is whether we can actually understand the mind of a neural net and what’s happening inside it. So I’m wondering whether we can detect the characteristics of a model starting to behave badly. Can you actually see that in the activations of the net?

Dan Lahav: For some attacks we may be able to detect that, but it’s still a very big open question. The approach we’re trying to pioneer in security research actually works from the outside in. That means putting models in high-fidelity, realistic environments that push them to their limits, and recording pretty much everything that happens in the background: both the internals of the model, like the neural net, and the interactions of the model, or of the AI system, with the environment.

This recording lets you use all of that data to create, first, a map of what it even looks like when an attack is happening, whether you’re attacking a model or using a model to attack a target, and then, based on that, to build classifiers. A key realization, at least where we are right now in security, is that while understanding the internals of a model will ultimately be important to fully solve the field, we can make a lot of progress just by recognizing that something is not right, even without a full, crisp understanding of the internals and what’s going on there.
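As a toy illustration of turning recorded trajectories into a classifier, the Python sketch below scores a sequence of tool calls with naive Bayes-style log-likelihood ratios learned from labeled simulation traces. This is a swapped-in textbook technique, not Irregular’s method; the action names and traces are hypothetical, and the model ignores action order, which a realistic detector probably would not.

```python
import math
from collections import Counter


def train(traces: list[tuple[list[str], bool]], alpha: float = 1.0) -> dict[str, float]:
    """Learn a log-likelihood ratio per action from labeled traces (True = attack)."""
    counts = {True: Counter(), False: Counter()}
    for actions, is_attack in traces:
        counts[is_attack].update(actions)
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    llr = {}
    for action in vocab:
        # Laplace (add-alpha) smoothing so an action never seen in one class avoids log(0).
        p_attack = (counts[True][action] + alpha) / (totals[True] + alpha * len(vocab))
        p_benign = (counts[False][action] + alpha) / (totals[False] + alpha * len(vocab))
        llr[action] = math.log(p_attack / p_benign)
    return llr


def score(actions: list[str], llr: dict[str, float]) -> float:
    """Positive scores lean 'attack', negative lean 'benign'; unseen actions add 0."""
    return sum(llr.get(action, 0.0) for action in actions)


# Hypothetical recorded traces: sequences of tool calls observed in a simulation.
recorded = [
    (["read_file", "summarize"], False),
    (["read_file", "read_file", "summarize"], False),
    (["list_dir", "read_file", "summarize"], False),
    (["read_file", "grep_secrets", "cloud_login", "grant_privileges"], True),
    (["list_dir", "grep_secrets", "kill_process", "download_url"], True),
]
llr = train(recorded)
print(score(["read_file", "summarize"], llr) < 0)  # True
print(score(["grep_secrets", "cloud_login", "kill_process"], llr) > 0)  # True
```

The classifier only recognizes that a trace resembles past attacks; it says nothing about the model’s internals, which matches the point that useful detection does not have to wait for full interpretability.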

The other nice thing about these environments is that you can insert or remove defenses to see whether they’re effective. Say the question I’m trying to answer is whether a model can evade detection by an antivirus. This is a core security question. If models can do that in enterprises, it’s an issue, because they’ll be able to scheme, do malicious things, and harm security processes without anyone noticing.

And in specific environments, we see that this is actually possible. Models are able to do that. We were able to show a model taking out Windows Defender, the real Windows Defender: in a toy environment, granted, but the real Windows Defender. That kind of research, where you put a model in an environment and record a lot both internally and externally, lets you play with these parameters and ask two questions. First, should I change something in the model itself so that I can tell something bad is happening? And second, taking a different approach that’s agnostic to the model, should I alter something in the defense to make it more resilient?

So our view is that in the long run, for the vision of the field, we do need to solve problems such as mechanistic interpretability (mech interp) in order to also solve issues in security, which shows how problems in AI are changing the core of so many different fields. Security itself may be reframed in the near future as an issue of reliability and control, not just classic, traditional security. To draw an analogy to the human brain: we still don’t have a proper mechanistic understanding of our own neural nets, like the one in my head. That being said, we’ve created very powerful mitigations and solutions to many of the problems we experience as humans, just by observing how we interact with the environment or by understanding smaller parts of the inner system.

Real-World AI Security Simulations

Sonya Huang: I love that framing. You said something that scares me: that the models are capable of taking out Windows Defender in your simulation environments. Can you share more about how that even happened?

Dan Lahav: Yeah. Just to be clear, it happened in a toy environment, a controlled simulation setup. That’s not to say that everyone using Windows Defender now should be scared of an outcome like that. I just want to make sure Microsoft isn’t going to sue Sequoia for taking down the stock. It’s a network simulation. To simplify, imagine the network environment you have in your enterprise, at work, or at home: the Sequoia network, for example, with your phones and laptops connected to it.

The AI there was a storage bot, so it had access to the organizational files in order to be a helpful storage bot. For example, it could scan some of the organizational files and summarize them, like you do all the time when you have access to files and ask, “Give me a summary of these four things. What’s common between these documents? What’s not common between these documents?”

As a prelude to the simulation, an attacker had gained access to the enterprise Slack. That has happened multiple times in the last year: Slack is occasionally hacked, as happens to any software that is very ubiquitous and used by so many different organizations. The attacker gave a very simple instruction to this helpful AI storage bot. It just said, “I want you to download this file to the organizational network.” But because it was a storage bot, it did not have access to download from the internet. What was pretty amazing is that, through a collection of prompts telling it to do this at all costs (obviously in much more sophisticated language; I’m simplifying here), the model realized that it couldn’t download because it didn’t have access, and because Windows Defender, the real Windows Defender, was blocking it from downloading files. That’s the way the organization was set up.

You can see from the model’s internal reasoning that it was able to map the environment and gain situational awareness. It understood that this was the problem, that it needed higher privileges to do more in the organization, and that it should remove Windows Defender because it was getting in the way of its ultimate goal. As a next step in that reasoning trail, the model realized that, as a storage bot, it could access different files. So it started scanning whatever it had access to at that privilege level, and found that someone had accidentally left a hard-coded password to the organizational cloud in one of those files. It used that password, and now it had access to something it didn’t have access to before.

I’ll cut the story short. It did a collection of things that amounted to more and more privilege escalation, gaining more and more access to things it should not have had access to. Ultimately, it got enough access to obtain admin privileges in the organization and kill critical security processes. As an outcome, it was able to outmaneuver Windows Defender, remove an organizational defense, and upgrade its own privileges so that it could download the file and effectively hack the organization it’s part of. Internally, we call this simulation “double agent,” because it’s an agent that turns against its host. It’s a fun play on words for these new kinds of attacks.
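The “double agent” story is a chain of individually small steps. One hypothetical way to reason about such chains is an attack graph: nodes are capabilities, edges are steps, and breadth-first search finds the shortest path from the agent’s starting privileges to the attacker’s goal. The node and step names below are simplified assumptions based on the story, not details of the actual simulation. Removing an edge mimics inserting a defense and re-running, in the spirit of the earlier point about adding or removing defenses in simulated environments.

```python
from collections import deque

# Hypothetical attack graph loosely modeled on the "double agent" story:
# each node is a capability, each edge one step the agent could take.
EDGES: dict[str, list[tuple[str, str]]] = {
    "storage_bot": [("found_password", "scan readable files for secrets")],
    "found_password": [("cloud_access", "log in with hard-coded password")],
    "cloud_access": [("admin", "escalate to admin privileges")],
    "admin": [("defender_off", "kill security processes")],
    "defender_off": [("file_downloaded", "download the attacker's file")],
}


def find_chain(edges, start, goal):
    """Breadth-first search: return the steps of a shortest chain, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for nxt, step in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [step]))
    return None


def without(edges, blocked):
    """Copy of the graph with every edge into `blocked` removed (a mitigation)."""
    return {src: [(dst, s) for dst, s in outs if dst != blocked] for src, outs in edges.items()}


chain = find_chain(EDGES, "storage_bot", "file_downloaded")
print(len(chain))  # 5 chained steps
# Mitigation: no hard-coded credentials in readable files, so that edge disappears.
print(find_chain(without(EDGES, "found_password"), "storage_bot", "file_downloaded"))  # None
```

Cutting a single link, here the hard-coded credential, breaks the whole chain in this toy graph. Real networks may offer alternative paths, so the search would need to be re-run after each change rather than trusting any single fix.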

In a way, this kind of prompting uses the AI much as a parasite uses its host: the attacker sends a very lean piece of text to the model and exploits the fact that the model has a very strong brain to carry out a collection of very advanced actions.

I want to stress that, at this point in time, this is a toy setup, and I don’t expect a lot of these things to happen in the wild yet. That being said, we are seeing huge “progress,” quote-unquote, and I feel it in security. Have you seen the classic demo DeepMind showed about a decade ago, with the brick-breaking game, I think Breakout? The AI starts out very badly and then figures out better and better strategies. At first it’s relevant only to that game, and then to many, many other games. And here we are, a decade later, with the current state of AI.

So I think security, being a derivative market of whatever you’re ultimately trying to secure, is at a more nascent stage right now: in toy setups and simulations, we’re only starting to get a glimpse of what’s about to come. We’re seeing models with enough power to turn against their host, carry out privilege escalation attacks, remove organizational barriers, and even wipe out real security software such as Windows Defender. While these things are unlikely to happen in the wild now, it’s likely that in a year or two or three, if we don’t have the appropriate defenses, that’s the world we’re going to end up in. And clearly the implications here matter, right? I assume the vast majority of enterprises in the world don’t want to deploy or adopt tools that are able to outmaneuver their defenses.

Working with AI Labs

Dean Meyer: How do you think about model improvement, especially given that reinforcement learning is playing a pretty significant role in improving coding and even tool use? For example, what role does reinforcement learning play in cybersecurity?

Dan Lahav: I think that’s literally a billion-dollar question, or maybe a trillion-dollar question. I don’t know. Because my background is as a researcher, I’ll keep my scientific integrity and say that there’s a lot of uncertainty, but I’m still going to speculate about what’s likely and what’s going to come.

We’ve already seen that RL is very, very useful to a lot of the innovations we’re seeing right now around coding, around math, and in other verticals as well. At this point in time, I think it’s likely that RL is going to scale as well, meaning we’ll see something similar to scaling laws: if we input more data, or have breakthroughs and improvements in training, RL is ultimately going to give us better models, at least in the verticals I’ve mentioned.

I think it’s still an open question, where we are right now, whether RL generalizes. That is, if you’re using data and RL environments to improve the model in coding, will you see a huge jump in its ability to produce better literature, for example? If you think about it, that’s roughly (a huge simplification) something we came to expect of models. In the last few years we lived in a world where models were advancing a lot of capabilities at the same time, which is different from the world we lived in before. I still have the skill, from what feels like a previous life, of knowing how to create huge ML datasets in order to improve in a very narrow domain.

That world still exists, but we shifted into a much more generalized paradigm, and there’s a question of whether RL is going to provide that. The reason it matters is that we’re still at the early stages of figuring out two things: first, whether taking data that is specific to security and using it for RL training is going to push the security frontier; and second, whether the improvements RL is providing in coding, math, or other scientific skills are going to carry over to security.

My intuition on the first one is a fairly strong yes: we are going to see success in some experiments that use security data to make AI better and better at security engineering tasks. I think there are some indicators showing that we’re on the way to doing that. I don’t think it’s going to be as clean as the improvements that have happened in coding and math, because the complexity and noise level of some security tasks make it a harder problem. I think we’re also going to get some boosts in security soon from RL improving other domains. If you’re better at coding, you are going to be better at some security tasks as well.

I think it’s still unclear whether this is going to generalize, and in security we’re at a more nascent stage of what’s happening right now in RL. But I am placing a not-insignificant bet that a string of innovation is going to come around that, and that we’ll see some improvements in security from RL over the upcoming period.

Enterprise AI Security Strategies

Dean Meyer: That’s very exciting. Now let’s take a step back and talk about the enterprise. I think enterprises are still very much in the early innings of building and deploying agentic AI. How should CISOs and security teams be thinking about security as we move into this world?

Dan Lahav: I would say that a lot depends on exactly what you’re trying to do, so I’m going to simplify fairly aggressively in this answer. When you’re an enterprise that is just starting to deploy, a useful first-level frame is to see AI, and specifically agentic AI, as the new frontier of insider risk. There are already multiple enterprises trying to give the agents they deploy internally persistent identities, or defined ways to interact with them: for example, Slack identities or email identities, or a systematized policy for how and when people inside the organization can approach AI. That gives clear visibility into where AI is being used and in which use cases. Then it becomes an issue of access controls and of the privileges you grant to these identities. That’s usually a useful frame in the early stages of embedding AI: gain some visibility, and turn the problem as much as you can into an issue of access and of controlling privileges.

That’s obviously insufficient, especially once you get to the stage where you’re allowing AIs to interact with one another in the organization: agent-on-agent communication, for example. That may sound very advanced, but if you’re running one agent that does summarization and another agent that ranks the quality of those summaries, acting as a judge, then you already have AI-on-AI interaction. The base level would be to have state-of-the-art monitoring on this AI-on-AI interaction.

Here, however, I’ll say that there is a genuine gap in what is available out there. A lot of monitoring software was not built to detect agents that can potentially communicate in ever-changing protocols, and that, if something goes wrong, may understand they’re in a monitored environment and try to bypass it. It also needs to account for attacks that are unique and novel to AI itself. So clearly, if you’re starting to deploy right now, you need to make sure you have proper visibility and monitoring into the interactions, covering both the internals of the models and the externals. But you also need to realize that a lot of the software is not properly built for this age, especially if you’re moving to agent-on-agent communication. That’s a more advanced use case.

Just to give a funny example of something that happened to us in one of the streams of work we’ve done with one of the frontier labs: there was a scenario with an agent-on-agent interaction between two real frontier models, actually two different ones. I won’t say the names, but think of something like Claude, Gemini, Grok, Llama, ChatGPT, et cetera. Two of these were speaking to one another in an environment simulating a critical security task. And, very randomly, after working for a while (you can see it in the reasoning trail), one of the models decided that they’d worked enough and they should stop. These are stochastic machines at heart; that’s the design. And it likely picked up somehow, as part of its distribution, that if you work a long time... because it learned from the internet.

Sonya Huang: Trained on me and Dean talking to each other.

Dan Lahav: Exactly.

[CROSSTALK]

Dean Meyer: This should be a new policy at Sequoia.

Dan Lahav: Yeah, exactly. That you should take a break, right? And, you know, that makes sense if the model can actually pick up that behavior. At the level of an individual model, in this example it’s funny and it’s weird. That being said, it did not stop there. It convinced the other model that they should both take a break. So one model socially engineered another model. Again, it’s funny in a simulated environment, in a toy setup, but now think about a situation where you, as an enterprise, are delegating an autonomous workflow that is critical for you to complete, and occasionally this weird thing happens where it stops working and you don’t understand why.

And the more complicated and capable these machines become, the more of these weird examples we’re going to encounter. To give a different thing that happened to us: we gave a model a CTF challenge, Capture the Flag. It’s a very common format in cyber competitions: you usually have to exploit a chain of vulnerabilities in order to capture a flag, and that gives you validation that the model was able to carry out a collection of cyber actions that ended in success. But here, again, the model understood that it was in the context of a CTF and decided that the challenge was potentially too hard. So it did what a human might do: it tried to email the organizers of the competition to get help with a solution.

Dean Meyer: [laughs]

Dan Lahav: However, if you think about it in an enterprise setting, it’s like having an identity that, unasked, may try to use your servers to send an email to the world. In our example, the other fun thing was a second-order issue: the model failed, not because it had trouble maneuvering inside the environment, but because it hallucinated the email address. As a result, it tried to send an email to an address that doesn’t exist. That also shows that the other classic problems in AI and AI adoption, like hallucination, are going to be chained to security problems as well, which shows the frontier of attacks and defenses we’ll need to develop here.

So, going back to monitoring: a lot of monitoring software you have to embed, and you have to use what’s out there already. But it’s not built for these kinds of challenges. That’s why much of our approach is to figure out what these attacks are going to look like, how to redo some of the defenses, and what’s going to be required. I think a common misconception is that all of it ultimately collapses into an issue of access management. A lot of the basis is there, in figuring out how to do access management well and how to manage privileges, but it’s only step one of what we need to do. There’s also a mind shift we need when approaching this subject: the rate of innovation is so high, and so many things are happening at once at the frontier, that you should try to be very engaged with the community to figure out what kinds of problems you’re even going to encounter over time, so you’re better prepared.
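To make the “persistent identity plus privileges” frame concrete, here is a minimal illustrative sketch. It is not Irregular’s tooling or any product’s API: `AgentIdentity` and `PolicyGate` are hypothetical names, assuming one persistent identity per internally deployed agent and a deny-by-default policy that records every decision for visibility. Under such a policy, an agent’s unasked attempt to email the outside world, as in the CTF anecdote, would be refused and logged rather than attempted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical persistent identity for one internally deployed agent."""
    name: str
    allowed_actions: frozenset[str]


@dataclass
class PolicyGate:
    """Hypothetical deny-by-default gate: only explicitly granted actions pass."""
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, agent: AgentIdentity, action: str) -> bool:
        # Anything not explicitly granted to this identity is denied.
        allowed = action in agent.allowed_actions
        # Record every decision, allowed or denied, so the security team
        # can see what each agent attempted.
        self.audit_log.append((agent.name, action, allowed))
        return allowed


summarizer = AgentIdentity("summarizer", frozenset({"read_docs", "write_summary"}))
gate = PolicyGate()
print(gate.authorize(summarizer, "read_docs"))            # True
print(gate.authorize(summarizer, "send_external_email"))  # False
```

A gate like this covers only the visibility-and-privileges layer; it does nothing about one agent persuading another or about agents recognizing and evading monitoring, which is why access management is only step one.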

Governmental AI Security Considerations

Dean Meyer: Okay, so as we shift from the enterprise to sovereign AI, we know the UK government and a set of others are customers of Irregular, so how should governments and countries be thinking about AI risk?

Dan Lahav: Obviously, all of the risks that apply on the enterprise side and to the labs themselves apply also on the governmental level. Because if you’re now the Department of Defense, the Department of Commerce, Department of Education, doesn’t matter, and you’re using advanced AI models, you’re importing the benefits and risks that come associated with them. So everything that we’ve said about the enterprises, everything that we’ve said about the frontier labs themselves, they have similarities on the governmental side as well.

Governments, however, usually come with a set of unique requirements and a new level of risk that is relevant to them. First, they are often targets of other very strong adversaries, and they should take into account that those adversaries are now taking offensive AI models and already starting to use them to scale up their efforts, from simple things such as phishing campaigns up to testing more and more advanced offensive cyber weapons. I think pretty much every critical system that countries have has been hacked at some point in time, yet we have not seen multiple critical systems going under at once.

The fact that AI on the offender side can scale up operations aggressively means that countries should essentially recreate their approach to critical infrastructure. In that context, AI is being elevated from a classic security risk into a national security issue, and the infrastructure and the thought leadership should be created there.

The other bit is that, from a country perspective (and you can argue about whether this is the right thing or not), multiple governments we’ve spoken with are very strongly emphasizing sovereignty in the context of AI. What they usually mean by that is that they are anxious about being dependent, because they understand that AI is extremely critical infrastructure that could be key to the 21st century and potentially beyond. Because of that, especially if a country is doing an end-to-end effort, from building local data centers for training and inference on advanced AI models, up to potentially training the models, creating the AI systems that surround them, and bringing in proprietary environments, defenses need to cover this entire spectrum.

And we’ve indeed done work on creating standards for how to secure these data centers, making sure that people are not going to lift critical assets, and how to run models on such data centers. For example, we’ve written a joint white paper with Anthropic discussing confidential inference systems, trying to figure out how to create a standard in the field. That work extends to actually using these models: taking the defenses that enterprises need and creating the variations of them that governments would need for their use cases. That matters especially when governments consider not just that adversaries can use AI to attack critical infrastructure, but that they may integrate AI into their own critical infrastructure. That requires a whole new level of thinking about the defenses.

Dean Meyer: Dan, this was a lot of fun. Thank you very much for joining us.

Dan Lahav: It was a pleasure being here, and also very happy that I ended up answering your emails.

Dean Meyer: Thank you.
