
2026: This is AGI
Years ago, some leading researchers told us that their objective was AGI. Eager to hear a coherent definition, we naively asked, “How do you define AGI?” They paused, looked at each other tentatively, and then offered up what’s since become something of a mantra in the field of AI: “well, we each kind of have our own definitions, but we’ll know it when we see it.”
This vignette typifies our quest for a concrete definition of AGI. It has proven elusive.
While the definition is elusive, the reality is not. AGI is here, now.
Coding agents are the first example. There are more on the way.
Long-horizon agents are functionally AGI, and 2026 will be their year.
Blissfully Unencumbered by the Details
Before we go any further, it’s worth acknowledging that we do not have the moral authority to propose a technical definition of AGI.
We are investors. We study markets, founders, and the collision thereof: businesses.
Given that, ours is a functional definition, not a technical one. New technical capabilities invite the Don Valentine question: so what?
The answer resides in real world impact.
A Functional Definition of AGI
AGI is the ability to figure things out. That’s it.*
*We appreciate that such an imprecise definition will not settle any philosophical debates. Pragmatically speaking, what do you want if you’re trying to get something done? An AI that can just figure stuff out. How it happens is of less concern than the fact that it happens.
A human who can figure things out has some baseline knowledge, the ability to reason over that knowledge, and the ability to iterate their way to the answer.
An AI that can figure things out has some baseline knowledge (pre-training), the ability to reason over that knowledge (inference-time compute), and the ability to iterate its way to the answer (long-horizon agents).
The first ingredient (knowledge / pre-training) is what fueled the original ChatGPT moment in 2022. The second (reasoning / inference-time compute) came with the release of o1 in late 2024. The third (iteration / long-horizon agents) came in the last few weeks with Claude Code and other coding agents crossing a capability threshold.
Generally intelligent people can work autonomously for hours at a time, making and fixing their mistakes and figuring out what to do next without being told. Generally intelligent agents can do the same thing. This is new.
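To make that mapping concrete, here is a minimal sketch of the loop in Python. Every name in it (the model call, the tools table, the step object) is a hypothetical stand-in for illustration, not any lab's actual API:

```python
# A minimal sketch of "figuring things out", assuming a hypothetical
# model API and tool set. Not any lab's real interface.

def figure_it_out(goal, model, tools, max_steps=100):
    context = [goal]
    for _ in range(max_steps):
        # Ingredients 1 and 2: the pre-trained model reasons over everything
        # gathered so far and proposes the next action.
        step = model.respond(context)          # hypothetical API
        if step.done:
            return step.answer
        # Ingredient 3: act in the world, observe, and loop on the result.
        observation = tools[step.tool](step.args)
        context.append((step.thought, step.tool, observation))
    return None  # step budget exhausted without an answer
```

The third ingredient is the whole difference between a chatbot and an agent: the loop body runs hundreds of times, and the context carries the accumulated evidence forward.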
What Does It Mean to Figure Things Out?
A founder messages his agent: “I need a developer relations lead. Someone technical enough to earn respect from senior engineers, but who actually enjoys being on Twitter. We sell to platform teams. Go.”
The agent starts with the obvious: LinkedIn searches for “Developer Advocate” and “DevRel” at competing companies — Datadog, Temporal, LangChain. It finds hundreds of profiles. But job titles don’t reveal who’s actually good at this.
It pivots to signal over credentials. It searches YouTube for conference talks. It finds 50+ speakers, then filters for those with talks that have strong engagement.
It cross-references those speakers with Twitter. Half have inactive accounts or just retweet their employer’s blog posts; not what the founder wants. But a dozen have genuine followings — they post their own opinions, reply to people, and draw engagement from working developers. Their posts show real taste.
The agent narrows further. It checks who’s been posting less frequently in the last three months. A drop in activity sometimes signals disengagement from their current role. Three names surface.
It researches those three. One just announced a new role — too late. One is a founder of a company that just raised funding — not leaving. The third is a senior DevRel at a Series D company that just did layoffs in marketing. Her last talk was about exactly the platform engineering space the startup targets. She has 14k Twitter followers and posts memes that actual engineers engage with. She hasn’t updated her LinkedIn in two months.
The agent drafts an email acknowledging her recent talk, the overlap with the startup’s ICP, and a specific note about the creative freedom a smaller team offers. It suggests a casual conversation, not a pitch.
Total time: 31 minutes. The founder has a shortlist of one instead of a JD posted to a job board.
This is what it means to figure things out. Navigating ambiguity to accomplish a goal – forming hypotheses, testing them, hitting dead ends, and pivoting until something clicks. The agent didn’t follow a script. It ran the same loop a great recruiter runs in their head, except it did it tirelessly in 31 minutes, without being told how.
To be clear: agents still fail. They hallucinate, lose context, and sometimes charge confidently down exactly the wrong path. But the trajectory is unmistakable, and the failures are increasingly fixable.
How Did We Get Here? From Reasoning Models to Long-Horizon Agents
In last year’s essay, we wrote about reasoning models as the most important new frontier for AI. Long-horizon agents push this paradigm further by allowing models to take actions and iterate over time.
Coaxing a model to think for longer is not trivial. A base reasoning model can think for seconds or minutes; long-horizon tasks demand hours or days of coherent, self-correcting work.
Two technical approaches both seem to be working and scaling well: reinforcement learning and agent harnesses. The former teaches a model, intrinsically, to stay on track for longer by poking and prodding it to maintain focus during training. The latter builds scaffolding around the known limitations of models (memory hand-offs, context compaction, and more).
Scaling reinforcement learning is the domain of the research labs. They have made exceptional progress on this front, from multi-agent systems to reliable tool use.
Designing great agent harnesses is the domain of the application layer. Some of the most beloved products on the market today are known for their exceptionally engineered agent harnesses: Manus, Claude Code, Factory’s Droids, etc.
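To give a flavor of what a harness does, here is a toy compaction loop. The function names (`count_tokens`, `model.summarize`, and so on) are illustrative stand-ins; production harnesses like Claude Code's are far more elaborate:

```python
# Toy sketch of one harness technique: context compaction, so the agent can
# keep working long past the model's context window. All names illustrative.

MAX_CONTEXT_TOKENS = 100_000

def run_with_compaction(goal, model, tools):
    history = [f"GOAL: {goal}"]
    while True:
        if count_tokens(history) > MAX_CONTEXT_TOKENS:
            # Memory hand-off: compress everything so far into a brief that
            # preserves the goal, key findings, and open threads.
            brief = model.summarize(history)
            history = [f"GOAL: {goal}", f"PROGRESS SO FAR: {brief}"]
        step = model.next_step(history)
        if step.done:
            return step.answer
        history.append(run_tool(step, tools))
```

The scaffolding does nothing clever on its own; it just keeps the model inside the regime where it is reliable, which is exactly the point.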
If there’s one exponential curve to bet on, it’s the performance of long-horizon agents. METR has been meticulously tracking AI’s ability to complete long-horizon tasks. The rate of progress is exponential, doubling every ~7 months. If we trace out the exponential, agents should be able to work reliably to complete tasks that take human experts a full day by 2028, a full year by 2034, and a full century by 2037.
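The arithmetic behind that extrapolation is simple compounding. Below is a toy version under stated assumptions (a 1-hour reliable horizon today, an 8-hour workday, a ~2,000-hour work year); METR's actual dates come from their fitted curve, but the shape is the same: at a 7-month doubling time, each further 100x costs only about four more years.

```python
import math

DOUBLING_MONTHS = 7  # METR's measured doubling time for task horizons

def months_until(current_hours, target_hours):
    """Months until the horizon reaches target, if the doubling holds."""
    return DOUBLING_MONTHS * math.log2(target_hours / current_hours)

# Illustrative assumptions: 1-hour horizon today, 8-hour workday,
# ~2,000-hour work year, 100 work years to a "century" of work.
for label, hours in [("day", 8), ("year", 2_000), ("century", 200_000)]:
    print(f"{label}: ~{months_until(1, hours) / 12:.1f} years from now")
```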
So What?
Soon you’ll be able to hire an agent. That’s one litmus test for AGI (h/t: Sarah Guo).
You can “hire” GPT-5.2 or Claude or Grok or Gemini today. More examples are on the way:
- Medicine: OpenEvidence’s Deep Consult functions as a specialist
- Law: Harvey’s agents function as an associate
- Cybersecurity: XBOW functions as a pen-tester
- DevOps: Traversal’s agents function as an SRE
- GTM: Day AI functions as a BDR, SE, and Rev Ops leader
- Recruiting: Juicebox functions as a recruiter
- Math: Harmonic’s Aristotle functions as a mathematician
- Semiconductor Design: Ricursive’s agents function as chip designers
- AI Research: GPT-5.2 and Claude function as AI researchers
From Talkers to Doers: Implications for Founders
This has profound implications for founders.
The AI applications of 2023 and 2024 were talkers. Some were very sophisticated conversationalists! But their impact was limited.
The AI applications of 2026 and 2027 will be doers. They will feel like colleagues. Usage will go from a few times a day to all-day, every day, with multiple instances running in parallel. Users won’t save a few hours here and there – they’ll go from working as an IC to managing a team of agents.
Remember all that talk of selling work? Now it’s possible.
What work can you accomplish? The capabilities of a long-horizon agent are drastically different from those of a single forward pass of a model. What new capabilities do long-horizon agents unlock in your domain? What tasks require persistence, where sustained attention is the bottleneck?
How will you productize that work? How will your application interface evolve in your domain, as the UI of work grows from chatbot to agent delegation?
Can you do that work reliably? Are you obsessively improving your agent harness? Do you have a strong feedback loop?
How can you sell that work? Can you price and package to value and outcomes?
Saddle Up!
It’s time to ride the long-horizon agent exponential.
Today, your agents can probably work reliably for ~30 minutes. But they’ll be able to perform a day’s worth of work very soon – and a century’s worth of work eventually.
What can you achieve when your plans are measured in centuries? A century is 200,000 clinical trials no one’s cross-referenced. A century is every customer support ticket ever filed, finally mined for signal. A century is the entire U.S. tax code, refactored for coherence.
The ambitious version of your roadmap just became the realistic one.
Thanks to Dan Roberts, Harrison Chase, Noam Brown, Sholto Douglas, Isa Fulford, Ben Mann, Nick Turley, Phil Duan, Michelle Bailhe, and Romie Boyd for reviewing drafts of this post.