Machine Learning Acceleration (MLA)

Our Vision

As AI models scale toward trillions of parameters, the bottleneck has shifted from raw computation to the complex interplay between data movement and architectural efficiency. We envision a future of Hardware-Software Co-design, where the execution of ML models is optimized through a deep understanding of both high-level parallelism and low-level hardware structures. Our goal is to build an intelligent orchestration layer that treats heterogeneous accelerators and their underlying memory hierarchies – from scratchpads and DMAs to multi-level caches and HBM – as a unified, programmable fabric. By architecting systems that can autonomously exploit every dimension of parallelism, we aim to redefine the performance limits of next-generation AI.


Key Research Challenges

Recent Results

To address the challenge of mapping complex ML workloads onto specialized hardware, we developed dMazeRunner, a framework that explores the vast spatiotemporal execution space of perfectly nested loops (such as convolutions and matrix multiplications) on dataflow accelerators. It takes a holistic view of the problem, optimizing how each loop nest is mapped onto both the computational and the memory resources of the accelerator.
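To make "exploring the spatiotemporal execution space" concrete, the following minimal sketch enumerates tilings of a matrix-multiplication loop nest. It is an illustration only and does not use dMazeRunner's actual API or cost model. The function names, the three-level split (PE array, scratchpad, DRAM), and the toy cost (ideal compute cycles plus DRAM words moved, with no reuse across outer iterations) are all assumptions made for this example.

```python
from itertools import product


def factor_splits(n, levels=3):
    """Return every ordered way to write n as a product of `levels` positive integers."""
    if levels == 1:
        return [(n,)]
    splits = []
    for f in range(1, n + 1):
        if n % f == 0:
            for rest in factor_splits(n // f, levels - 1):
                splits.append((f,) + rest)
    return splits


def explore_matmul_mappings(M, N, K, num_pes, spm_words):
    """Search tilings of C[M,N] += A[M,K] * B[K,N] (hypothetical toy model).

    Each loop bound is split into three factors:
      spatial : iterations spread across processing elements (PEs)
      spm     : iterations executed out of the on-chip scratchpad
      dram    : outer iterations, each of which reloads its tiles from DRAM
    Returns the lowest-cost valid mapping, or None if nothing fits.
    """
    best = None
    for m, n, k in product(factor_splits(M), factor_splits(N), factor_splits(K)):
        (ms, mt, md), (ns, nt, nd), (ks, kt, kd) = m, n, k

        # Spatial constraint: the unrolled iterations must fit on the PE array.
        pes_used = ms * ns * ks
        if pes_used > num_pes:
            continue

        # Memory constraint: tiles of A, B, and C must fit in the scratchpad.
        tm, tn, tk = ms * mt, ns * nt, ks * kt
        footprint = tm * tk + tk * tn + tm * tn
        if footprint > spm_words:
            continue

        # Toy cost: assumes every outer iteration reloads all three tiles.
        dram_words = md * nd * kd * footprint
        cycles = (M * N * K) // pes_used  # ideal compute cycles, no stalls
        cost = cycles + dram_words

        if best is None or cost < best["cost"]:
            best = {
                "spatial": (ms, ns, ks),
                "spm": (mt, nt, kt),
                "dram": (md, nd, kd),
                "pes_used": pes_used,
                "footprint": footprint,
                "dram_words": dram_words,
                "cycles": cycles,
                "cost": cost,
            }
    return best


# Example: a 4x4x4 matmul on a 4-PE array with a 48-word scratchpad.
best = explore_matmul_mappings(4, 4, 4, num_pes=4, spm_words=48)
print(best)
```

Even this toy version shows why the space is large: the number of candidate mappings is the product of the factorization counts of every loop bound, so exhaustive enumeration quickly becomes impractical for real layer shapes. A practical tool needs pruning heuristics and a much more detailed hardware model.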

Software Downloads:
1. dMazeRunner – dataflow optimization for DNN accelerators
2. DiRAC – cycle-level simulator of reconfigurable dataflow accelerators