High-performance computing, with much less code

The Exo 2 programming language enables reusable scheduling libraries external to compilers.

Adam Conner-Simons | MIT CSAIL | Thu, Mar 13, 2025

Many companies invest heavily in hiring talent to create the high-performance library code that underpins modern artificial intelligence systems. NVIDIA, for instance, developed some of the most advanced high-performance computing (HPC) libraries, creating a competitive moat that has proven difficult for others to breach.

But what if a couple of students, within a few months, could compete with state-of-the-art HPC libraries with a few hundred lines of code, instead of tens or hundreds of thousands?

That’s what researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have shown with a new programming language called Exo 2.

Exo 2 belongs to a new category of programming languages that MIT Professor Jonathan Ragan-Kelley calls “user-schedulable languages” (USLs). Instead of hoping that an opaque compiler will auto-generate the fastest possible code, USLs put programmers in the driver's seat, allowing them to write “schedules” that explicitly control how the compiler generates code. This enables performance engineers to transform simple programs that specify what they want to compute into complex programs that do the same thing as the original specification, but much, much faster.

One of the limitations of existing USLs (like the original Exo) is their relatively fixed set of scheduling operations, which makes it difficult to reuse scheduling code across different “kernels” (the individual components in a high-performance library).

In contrast, Exo 2 enables users to define new scheduling operations externally to the compiler, facilitating the creation of reusable scheduling libraries. Lead author Yuka Ikarashi, an MIT PhD student in electrical engineering and computer science and CSAIL affiliate, says that Exo 2 can reduce total schedule code by a factor of 100 and deliver performance competitive with state-of-the-art implementations on multiple different platforms, including Basic Linear Algebra Subprograms (BLAS) that power many machine learning applications. This makes it an attractive option for engineers in HPC focused on optimizing kernels across different operations, data types, and target architectures.

“It’s a bottom-up approach to automation, rather than doing an ML/AI search over high-performance code,” says Ikarashi. “What that means is that performance engineers and hardware implementers can write their own scheduling library, which is a set of optimization techniques to apply on their hardware to reach the peak performance.”

One major advantage of Exo 2 is that it reduces the amount of coding effort needed at any one time by reusing the scheduling code across applications and hardware targets. The researchers implemented a scheduling library with roughly 2,000 lines of code in Exo 2, encapsulating reusable optimizations that are linear-algebra specific and target-specific (AVX512, AVX2, Neon, and Gemmini hardware accelerators). This library consolidates scheduling efforts across more than 80 high-performance kernels with up to a dozen lines of code each, delivering performance comparable to, or better than, MKL, OpenBLAS, BLIS, and Halide.

Exo 2 includes a novel mechanism called “Cursors” that provides what they call a “stable reference” for pointing at the object code throughout the scheduling process. Ikarashi says that a stable reference is essential for users to encapsulate schedules within a library function, as it renders the scheduling code independent of object-code transformations.

“We believe that USLs should be designed to be user-extensible, rather than having a fixed set of operations,” says Ikarashi. “In this way, a language can grow to support large projects through the implementation of libraries that accommodate diverse optimization requirements and application domains.”

Exo 2’s design allows performance engineers to focus on high-level optimization strategies while ensuring that the underlying object code remains functionally equivalent through the use of safe primitives. In the future, the team hopes to expand Exo 2’s support for different types of hardware accelerators, like GPUs. Several ongoing projects aim to improve the compiler analysis itself, in terms of correctness, compilation time, and expressivity.

Ikarashi and Ragan-Kelley co-authored the paper with graduate students Kevin Qian and Samir Droubi, Alex Reinking of Adobe, and former CSAIL postdoc Gilbert Bernstein, now a professor at the University of Washington. This research was funded, in part, by the U.S. Defense Advanced Research Projects Agency (DARPA) and the U.S. National Science Foundation, while the first author was also supported by Masason, Funai, and Quad Fellowships.

Latest MIT Latest News

View all