The Compound Lever: AI for Software Engineering

For decades, software has provided the lever to move the world—now AI that can create software is levering that lever.

Published June 25, 2024

In the Generative AI application landscape, a single market category stands out as the Schelling point where many of this generation’s best entrepreneurs have gathered and focused their talents: building the AI software engineer.

There is no shortage of elite teams in this race, and they are approaching the problem from a myriad of angles, from ambitious teams building their own custom cognitive architectures and Droids (e.g. Factory) to researchers building their own foundation models for code (e.g. Magic) to incumbents who have found a formidable second wind (e.g. GitHub Copilot).

But is this white-hot market worth the hype? Are autonomous software engineers really that magical, or are engineers just doing what they like to do, building tools for other engineers?

We spoke with Matan Grinberg and Eno Reyes of Factory in our latest Training Data episode to better understand what’s at stake. If everything goes right, could autonomous software engineers be the killer app for generative AI? Could they usher in a new golden age of technology?

The dream: compounding Archimedes’ lever

Archimedes famously said, “Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.”

For the last several decades, software engineering has been the perfect lever with which to move the world. Thanks to computer programming, we have enjoyed an ever-increasing standard of living: the world’s information searchable at our fingertips; fast and reliable banking anywhere; fashion and commerce from all over the world at our doorstep; medical software that has elevated the standard of living globally.

Matan and Eno challenged us to take this analogy one step further. Software engineering may be the perfect lever with which to move the world, but it is still bottlenecked by the supply of talent (there are only 30 million software developers globally, with no quick fix, since it takes years to train someone to code) and by hours in the day (it takes time to code things up). In other words, we cannot yet create software at the speed of our imagination.

AI that can create software would be the perfect lever on software itself, which in turn is the perfect lever upon the world. In Matan’s words, AI has a chance to become the greatest compound lever in human history—if we can create autonomous software engineers.

Tracking our progress

For AI software engineers to become the ideal compound lever, they first need to actually work. We’re still a ways off, but progress is rapid.

Take, for example, SWE-bench, the canonical benchmark for tasks that the average human software engineer performs. We’ve gone from 1% to 4% to 14% to 19% on this key benchmark in the past 11 weeks.

And SWE-bench is a very general set of problems, drawn from actual GitHub issues in popular open source projects. As Harrison Chase and Eno pointed out in our conversations, when you constrain the software engineering problem to a more specific problem space (e.g., software testing or code review) the success rates are far greater.

In fact, in some problem spaces the success rates already meet the “good enough” threshold for software engineers. One example is v0 by Vercel, which allows anybody to create working frontends using natural language, ranging from calculators to SaaS pricing pages. Other examples include AI coding for use cases from database migrations and codebase refactoring to writing API integrations and ETL pipelines.

Implications

With the emergence of AI as a new compound lever, the fulcrum of impact is shifting from “learning to code” to “instructing a machine in natural language.”

This means the most critical bottleneck will no longer be the supply of software engineering talent, but the supply of ideas. If AI software engineering fulfills its promise and becomes the ultimate compound lever, then creators with imagination, not developers, may become the scarcest and most valuable resource. We are headed towards a future that is a meritocracy of ideas.

Recursion: The machine that builds the machine

Let’s assume that AI is on track to build software autonomously—can we push the analogy even further? Can AI build the AI and be a compound lever upon itself?

There are some encouraging clues that this may just be the case:

In the race to the ARC AGI Prize (next week’s episode), the solution in the lead involves generating thousands of Python programs.
Similarly, DeepMind famously cracked one of the hardest unsolved math problems through code generation and program synthesis.
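
To make the "generate thousands of programs" idea concrete, here is a minimal sketch of a generate-and-test loop. It is a toy under stated assumptions, not the method used by any of the teams mentioned above: the four list primitives, the function names, and the example task are all hypothetical, and exhaustive enumeration stands in for sampling candidate programs from a language model. The common thread is that candidates are cheap to produce and are kept only if they reproduce every known input/output example.

```python
import itertools

# Hypothetical toy primitives. A real system would sample whole programs
# from a language model rather than enumerate a tiny fixed vocabulary.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}


def candidate_programs(max_length):
    """Yield every pipeline of primitive names with 1..max_length steps."""
    for length in range(1, max_length + 1):
        yield from itertools.product(PRIMITIVES, repeat=length)


def run(program, xs):
    """Apply each named primitive in order and return the final list."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def search(examples, max_length=3):
    """Return every candidate that reproduces all (input, output) examples."""
    return [
        program
        for program in candidate_programs(max_length)
        if all(run(program, inp) == out for inp, out in examples)
    ]


# Hypothetical task: double every element and order the result descending.
examples = [([3, 1, 2], [6, 4, 2]), ([5, 4], [10, 8])]
solutions = search(examples, max_length=3)
print(solutions)
```

The design choice worth noticing is that the checker, not the generator, carries the correctness burden: a weak generator can still succeed if enough candidates are tried and the examples are discriminating enough to reject the wrong ones.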

If AI becomes a compound lever upon AI—that’s when things will really get wild. (Cue infinite recursions.)

The path ahead

The industry is quickly building Archimedes’ compound lever. We have rapidly scaled from 1% to 19% on SWE-bench, a benchmark that understates AI’s actual performance on properly scoped coding tasks. And there is a plethora of highly capable teams attacking this problem from multiple angles.

That said, open questions remain.

How much further can the field run on the current generation of foundation models before needing order(s)-of-magnitude more powerful models, or new techniques (like inference-time compute)?

Moreover, do you need to train your own foundation model to build a human-quality AI software engineer? What about a Jeff Dean-quality software engineer?