Building the Future: Meet the 2024 Sequoia Open Source Fellows
From one celebrated UC Berkeley lab come two groundbreaking projects: vLLM and Chatbot Arena.
It was the summer of 2022, and a team of Ph.D. researchers in UC Berkeley’s newly launched Sky Computing Lab had a problem. They’d been working to make large deep-learning models more efficient by distributing work across GPUs. But when they set up a demo of their new framework, performance was an immediate—and serious—issue.
“It was ridiculously slow,” remembers Zhuohan Li, a member of the project team and a veteran of RISElab, the renowned Sky Lab predecessor that had birthed Databricks and Anyscale. “We realized memory management was going to be a big bottleneck for serving these models—and we wanted to take a deeper look.”
So over the next couple of months, Li and fellow researcher Woosuk Kwon dug in, and eventually developed a promising algorithm inspired by classical virtual memory and paging techniques that they dubbed “PagedAttention.” Then on November 30, OpenAI released ChatGPT. Interest in LLMs, and in their project, exploded.
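The core idea borrowed from operating systems can be illustrated with a toy sketch: like an OS page table, each sequence’s growing KV cache is mapped to fixed-size physical blocks that are allocated on demand, so memory is never reserved for tokens that haven’t been generated yet. (All names and sizes below are illustrative, not vLLM’s actual data structures.)

```python
# Toy sketch of the paging idea behind PagedAttention: the KV cache
# is carved into fixed-size blocks, and each sequence keeps a "block
# table" mapping its logical positions to physical blocks -- much
# like an OS page table. Names and sizes here are hypothetical.

BLOCK_SIZE = 4  # tokens per block (illustrative)

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.tables = {}                     # seq_id -> block table

    def ensure_capacity(self, seq_id, num_tokens):
        """Allocate just enough blocks for `num_tokens` tokens."""
        table = self.tables.setdefault(seq_id, [])
        needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        while len(table) < needed:
            table.append(self.free.pop())      # allocate on demand
        return table

mgr = BlockManager(num_blocks=8)
table = mgr.ensure_capacity("seq-A", num_tokens=6)  # 6 tokens -> 2 blocks
```

Because blocks are allocated only as a sequence grows, many sequences can share one GPU’s memory without each pre-reserving space for its maximum possible length.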
Suddenly, a flood of new models was coming from tech companies and research institutions, each of them needing to run on increasingly scarce GPUs—and Li and Kwon’s idea could be the unlock that made that easy. Eventually, they teamed up with labmate and former Anyscale engineer Simon Mo to host the first meetup for their open-source project, now known as vLLM.
“I’d been increasingly excited about it, but after that meetup, I was even more excited,” Mo said. “Everybody had so many questions, so much enthusiasm.” It was an early illustration of a principle that Sky Lab leader and Databricks co-founder Ion Stoica had long impressed upon his advisees: community is key.
“An open source project is not just about the code,” Stoica says. “In every one I’ve been part of, building a strong community was extremely important to success.”
Among vLLM’s earliest adopters was a new model called Vicuna, from another group of researchers within the Sky Lab—including Lianmin Zheng, Ying Sheng, Hao Zhang and Wei-Lin Chiang, and advised by Stoica and Joseph Gonzalez. Chiang had previously collaborated with Kwon on SkyPilot, a framework for running LLMs, AI and batch jobs in the cloud. But since the release of ChatGPT he, too, had shifted focus, fascinated by the possibility of building an open-source LLM.
The result, Vicuna, had launched in March 2023. It was based on Meta’s LLaMA, but with an innovative twist: the researchers had trained it in part on data from ShareGPT, a Chrome plug-in for sharing ChatGPT conversations, making it particularly useful for chatbot applications.
“It started out as just a fun project, and we were surprised to see how well it worked,” Chiang says. “We wanted to share it with everyone.” The group quickly bought a domain, lmsys.org, where they could publish a blog post and run a demo of their new chatbot. Within two weeks, they had millions of visitors—but with the excitement came criticism, as well.
“People questioned the idea that our model was actually better, and as researchers, we wanted a scientific way to convince them,” Chiang recalls. “How do you actually evaluate between models?”
At a time when new LLM releases were coming out on a weekly if not daily basis, it was a critical question not just for Vicuna, but for anyone building—or building on top of—a model. To help answer it, Zheng, Sheng, Chiang and their teammates enlisted yet another fellow Berkeley researcher: theoretical statistician Anastasios Angelopoulos.
“The fundamental problems attracted me, and it was very clearly important—the impact was there,” Angelopoulos remembers. As the group explored what a platform for benchmarking and evaluation could look like, he worked to add statistical rigor. In May 2023, they launched their new project, Chatbot Arena, where users could compare Vicuna alongside other open models—and soon after, they rolled out a live leaderboard, with rankings informed by crowdsourced preference data from real users putting the models to work.
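Rankings of this kind are commonly derived from pairwise votes with Elo-style updates or a Bradley-Terry model. The sketch below shows a simple online Elo update from a stream of head-to-head preference votes; the parameter values and model names are illustrative, not Chatbot Arena’s actual methodology.

```python
# Simple Elo-style rating update from pairwise preference votes.
# Each vote says which of two models a user preferred; ratings move
# toward the observed outcome, weighted by how surprising it was.

def elo_update(r_a, r_b, a_won, k=32):
    """Return updated (r_a, r_b) after one head-to-head vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta  # zero-sum: total rating is conserved

# Feed a stream of (model_a, model_b, a_won) votes into the ratings.
ratings = {"model-x": 1000.0, "model-y": 1000.0}
votes = [
    ("model-x", "model-y", True),
    ("model-x", "model-y", True),
    ("model-x", "model-y", False),
]
for a, b, a_won in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
```

With enough votes across many model pairs, sorting by rating yields a leaderboard; production systems typically add confidence intervals so that statistically indistinguishable models aren’t over-ranked.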
Today, Chatbot Arena is the de facto standard for evaluating model performance, with more than one million monthly users—and every time a new model comes out, industry leaders including Sam Altman, Jeff Dean and others point to their evaluations on Chatbot Arena. Almost all major model providers—including OpenAI, Google, Meta and xAI—periodically share variants of models with the Chatbot Arena team before they’re released, so the data can inform their development processes.
Like their colleagues building vLLM, Chiang and Angelopoulos credit the lab’s emphasis on community for much of their project’s success. “Since I’ve been involved, it’s been a continual effort to keep building that trust,” Angelopoulos says. Earlier this year, the Chatbot Arena team published policy documentation outlining their methodology—and their motivations. Because the project does not monetize its data, instead relying on free credits and donations, “we knew people might be skeptical about our incentives,” Angelopoulos acknowledges. “So we wanted to be very clear that our only incentive is to do the best possible science. We just want the truth.”
The vLLM community, meanwhile, has been navigating its own rapid growth. Since releasing its open-source library in June 2023, it’s racked up an impressive 28,000 stars on GitHub, and is used by developers in many top tech companies. This year, Nvidia, AWS, Cloudflare and the gaming company Roblox—which uses vLLM for translation and child safety monitoring—have all hosted meetups. “Whenever we find out that an app I already have on my iPhone is powered by vLLM, that’s very rewarding,” says Kwon. “The more we meet real users, the more confident we are that we are doing something right.”
But as for any successful open-source project, growth has been a double-edged sword; each new model or hardware release requires a time-consuming—and expensive—sprint by Li, Kwon, Mo and vLLM contributors. “Quality is absolutely critical to this project’s success, so we need robust testing for every change—especially with this many people contributing,” Mo says. At $10 per test, across a variety of GPUs, academic funding and even grants and donations quickly become unsustainable.
In fact, Stoica says, funding pressure on open source projects is “at least an order of magnitude higher” in the age of LLMs. “You have multiple kinds of GPUs, you have all of these other accelerators. And there’s also a difference in scale,” he explains. “10 years ago, most of the funding for a new startup would go to adding people. Today, it’s going to infrastructure.”
But this, again, is where community comes in. Sequoia partner Lauren Reeder says by the spring of 2024, she’d heard from multiple portfolio companies singing the praises of vLLM—it was saving their engineering teams a lot of time and effort. So Reeder reached out on X, and Mo invited her to visit the lab.
“I met Simon, Woosuk and Zhuohan, and they explained some of the dynamics around testing and cost, and the community. That’s what stood out to me most—a lot of open source projects are run by one person, kind of carrying the world on their shoulders,” she says. “But this was absolutely community-run, with nearly 80% of the contributions coming from outside the lab.”
The previous year, Reeder and fellow partner Bogomil Balkansky had headed up the launch of Sequoia’s Open-Source Fellowship. There were no strings attached, and no expectation of starting a business—just support for projects that were key to the success of the firm’s portfolio and many other companies. Reeder thought vLLM was a perfect fit, and Li, Kwon and Mo became Sequoia open-source fellows.
The Chatbot Arena team, too, has navigated its project’s popularity with duct tape and sheer force of will. While free credits paid for the LLM calls and hosting, the team desperately needed help with technical tasks, such as updating the Chatbot Arena UI to work with multi-modal models. Their work often kept them up until 4 a.m. “We simply didn’t have enough hands on deck,” Angelopoulos says. But they were hesitant to take any support beyond free credits from model providers, for fear of jeopardizing their independence.
Once again, word of mouth reached Reeder, who now knew the dense talent in Sky Lab firsthand—and had seen founders repeatedly refer to Chatbot Arena as their litmus test for choosing between models. She, Balkansky and the rest of Sequoia’s partners were open to adding a second fellowship for 2024, and after a call for applications, had hundreds to choose from. But Chiang and Angelopoulos stood out.
“The fact that Chatbot Arena was committed to being fundamentally neutral was actually really important to us,” she says. “Versus other projects that may be widely used but have more commercial incentives, we felt like with this one, our support could really have an impact.”
Chiang and Angelopoulos became fellows in August, and say the funding has indeed freed them up to check crucial to-dos off their list. As always, they’re taking a user-first approach to using their newfound resources: through a recent partnership with a red-teaming community, for example, they’re exploring the relationship between controllability and potential safety concerns across different models.
Li, Kwon and Mo, too, have quickly put Sequoia’s support to use on their CI bills, and are currently rearchitecting vLLM to reduce and prevent the technical debt that often plagues fast-growing systems. Their primary goals, Mo says, remain usability and performance: “We want to build the easiest to use, most efficient, fastest inference engine in the world.” And their community is stepping up in kind; while 90% of contributions came from Berkeley as recently as last fall, that number is now just 25%.
For vLLM and Chatbot Arena alike, the rise of multimodal models looms large, as both teams undertake updates to accommodate images, video and code. Chiang and Angelopoulos recently partnered with Wayne Chi and Valerie Chen at Carnegie Mellon University, who built a VSCode extension called Copilot Arena that allows users to tab back and forth between completion models—and allows the platform to collect preference data about which models are best on coding auto-completion tasks. “That’s something we know the community cares a lot about,” says Angelopoulos. “We think it could be huge.”
Reeder says the progress vLLM and Chatbot Arena have made is just the beginning—for the projects, but also for the open source community at large. “Something we’ve noticed this year is that much of the innovation on the infrastructure stack is coming from open source—that’s one of the reasons we decided to expand the fellowship,” she says. “These projects have the support of the community. Everyone is jumping in and trying to help, in a way we haven’t seen in a long time.
“We’re excited to see how they continue pushing things forward. Open source is really building the future.”
If you’re building in open source and are interested in becoming a fellow, learn more and apply at sequoiacap.com/oss.