Agents on the Brain
The last nine months produced a few discrete waves of AI innovation, each of which expanded our notions of the possible. Last summer came Stable Diffusion and the image generation moment. Then in November, ChatGPT turned our collective attention to LLMs, and that attention has only grown with the launch of GPT-4 and many others. LangChain popularized agents, and now we’re seeing the next step on that journey emerge with autonomous agents. AutoGPT surpassed 100k GitHub stars (more than Go, Kubernetes or Node.js) on April 21, less than a month after launch. BabyAGI has 12k stars and has inspired a long list of projects.
Autonomous agents can be thought of as language model-powered bots that break complex problems into steps and iteratively solve them, taking action on users’ behalf. A simple example illustrates the difference between a bare LLM, an agent, and an autonomous agent. With just an LLM, you can look up the best restaurants in a given city. With an agent, you can tell it to look up the highest-rated restaurant with a table available and book the table for two. With an autonomous agent, you can ask it to find the best restaurant that fits your schedule and preferences, then book it for you and your best friend. Autonomous agents do this by breaking a task down into subtasks and using memory between steps to guide their actions.
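To make that pattern concrete, here is a minimal sketch of the core loop in Python. The call_llm helper is a hypothetical stand-in for whatever chat-completion API you use, and the prompts are illustrative; real frameworks like AutoGPT and BabyAGI add tool use, re-planning and persistence on top of this basic shape.

```python
# A minimal sketch of an autonomous agent loop: decompose an objective
# into subtasks, then execute them one by one, carrying memory between
# steps. call_llm is a hypothetical placeholder, not a real library call.

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion API call (OpenAI, etc.)."""
    raise NotImplementedError("wire this up to your LLM provider")

def run_agent(objective: str, max_steps: int = 10) -> list[str]:
    # 1. Ask the LLM to break the objective into subtasks.
    plan = call_llm(f"Break this objective into numbered subtasks:\n{objective}")
    subtasks = [line for line in plan.splitlines() if line.strip()]

    memory: list[str] = []  # results carried between steps
    for step, task in enumerate(subtasks[:max_steps], start=1):
        # 2. Execute each subtask, feeding prior results back as context.
        context = "\n".join(memory)
        result = call_llm(f"Context so far:\n{context}\n\nComplete this subtask: {task}")
        memory.append(f"Step {step} ({task}): {result}")
    return memory
```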
Depending on whom you ask, autonomous agents will either prove to be a lasting paradigm shift—with a glimmer of what might be ahead with AGI—or they may just be another moment among many iterative approaches.
It’s easy to see why autonomous agents have captured our collective imagination. They make it easy to dream of what is possible with small, lightweight apps on top of LLMs. We’ve experimented with spinning up agents ourselves, exploring everything from email digests (that go from scraping Google Trends to synthesis and summary) to complex travel itinerary planning and more.
As compelling as these examples may be, autonomous agents leave a lot to be desired in their current state. There is significant room to improve performance, user control and output quality. It is still early days, and agents will have to overcome at least three significant hurdles to achieve large-scale adoption:
- Logical reasoning != good execution: In principle, GPT-4 is capable of chain-of-thought reasoning and decomposing tasks into multi-step processes. But in practice, agents often struggle to execute their own sub-tasks. They struggle to know when to “take a step back,” so they get stuck repeating the same task in a loop, or they hallucinate a step and stall because there is little external feedback.
- Compute costs: These applications are architected around recursive loops, which can lead to many repeated calls to your LLM. The per-call cost is relatively low today with tools like OpenAI’s APIs (though you may run into rate limits!), but with in-house models, the cost equation may be quite different. Simple guardrails help with both of these first two hurdles; see the sketch after this list.
- Learning: Because autonomous agents are spun up for a single task and not subsequently reused, they do not carry learning across prompts or prior attempts, and they learn little from their mistakes. Services that help agents persist are on the horizon, though, which should make managing them easier.
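The first two hurdles invite simple guardrails today. Here is a sketch of two of them, reusing the hypothetical call_llm stand-in from the earlier example: a hard cap on LLM calls to bound compute cost, and a repetition check to catch an agent stuck in a loop.

```python
# A sketch of two guardrails for the agent loop, assuming the same
# hypothetical call_llm helper as above: a hard cap on LLM calls
# (compute cost) and a repetition check (stuck-in-a-loop detection).

def run_guarded_agent(objective: str, max_llm_calls: int = 25) -> list[str]:
    calls = 0
    seen_tasks: set[str] = set()
    memory: list[str] = []

    task = f"First subtask for the objective: {objective}"
    while calls + 2 <= max_llm_calls:  # each iteration spends two calls
        if task in seen_tasks:
            # The agent proposed a task it already attempted; stop here
            # rather than burning more calls on the same loop.
            memory.append(f"Loop detected on {task!r}; stopping.")
            break
        seen_tasks.add(task)

        result = call_llm(f"Do this subtask and report the result:\n{task}")
        calls += 1
        memory.append(result)

        task = call_llm(f"Given this result, state the next subtask (or DONE):\n{result}")
        calls += 1
        if task.strip().upper() == "DONE":
            break
    return memory
```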
If we can solve some of these challenges, we can imagine a future of “agent to agent” interactions. Specialized agents could be created for common tasks. Instead of spinning up a new agent for every task, you might “outsource” some of the steps and rely on pre-trained agents to fulfill tasks that you pay for per output, and incorporate those outputs as input into your next steps. In other words, your AI could hire or outsource to another AI. “Core” tasks could be covered by different agents, and a new layer of tooling could emerge as the “glue” stitching the entire process together.
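As a thought experiment, here is what a minimal “hire another AI” handoff could look like: a coordinator outsources a subtask to a registered specialist and incorporates its output. The registry, skill names and functions here are all hypothetical illustrations, not an existing protocol.

```python
# A hypothetical sketch of "agent to agent" delegation: a coordinator
# outsources a subtask to a registered specialist instead of doing it
# itself. The registry and skill names are illustrative, not a real API.

from typing import Callable

SPECIALISTS: dict[str, Callable[[str], str]] = {}

def specialist(skill: str):
    """Register a function as the specialist agent for a given skill."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        SPECIALISTS[skill] = fn
        return fn
    return register

@specialist("restaurant-search")
def restaurant_agent(request: str) -> str:
    # In practice this would be its own agent loop with web access,
    # reusing the hypothetical call_llm helper from the earlier sketches.
    return call_llm(f"Find restaurants matching: {request}")

def delegate(skill: str, request: str) -> str:
    """The coordinator 'hires' a specialist and consumes its output."""
    if skill not in SPECIALISTS:
        raise KeyError(f"No specialist registered for {skill!r}")
    return SPECIALISTS[skill](request)

# e.g. delegate("restaurant-search", "quiet table for two on Friday")
```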
What would it take to get there? Current implementations rely on the GPT-4 API, used by short-lived agents with limited context windows. To reach their full potential, the next generation will need to be:
- Compute aware: minimizing resource usage as an objective function
- Data aware: finding and connecting to the right model or data source for the task (a simple routing sketch follows this list)
- Agent aware: finding, reusing and communicating with ecosystems of agents
- Safety aware: checking outputs and sandboxing code are a first step; more serious controls will be needed to prevent abuse
- User aware: learning from user behavior and preferences to optimize performance
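To make one of these concrete, here is what “data aware” routing could look like: pick the cheapest model that can plausibly handle a task and escalate only when needed. The model names and the word-count heuristic are illustrative assumptions, not real products or a real router.

```python
# A sketch of "data aware" routing: send each task to the cheapest
# model that can plausibly handle it. Model names and the heuristic
# are hypothetical placeholders.

MODEL_FOR_TIER = {
    "simple": "small-fast-model",      # cheap, low latency
    "complex": "large-capable-model",  # expensive, better reasoning
}

def classify_complexity(task: str) -> str:
    """Crude word-count heuristic standing in for a learned router."""
    return "complex" if len(task.split()) > 50 else "simple"

def pick_model(task: str) -> str:
    return MODEL_FOR_TIER[classify_complexity(task)]

# e.g. pick_model("Book a table for two tonight") -> "small-fast-model"
```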
We think autonomous agents could become an interesting part of the AI application landscape, and the technology is just starting to get good. What will the evolution of agents look like? Tell us what you’re building!