Shail Dave; Tony Nowatzki; Aviral Shrivastava
Explainable-DSE: An Agile and Explainable Exploration of Efficient Hardware/Software Codesigns of Deep Learning Accelerators Using Bottleneck Analysis Proceedings Article
In: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024, (Won Silver Medal at ACM Student Research Competition 2022-23 (Host: ACM SIGBED)).
Abstract | BibTeX | Tags: Accelerated Computing, Machine Learning, Machine Learning Accelerators | Links:
@inproceedings{Dave2024ASPLOS,
title = {Explainable-DSE: An Agile and Explainable Exploration of Efficient Hardware/Software Codesigns of Deep Learning Accelerators Using Bottleneck Analysis},
author = {Shail Dave and Tony Nowatzki and Aviral Shrivastava},
url = {https://mpslab-asu.github.io/publications/papers/Dave2024ASPLOS.pdf, pdf
https://mpslab-asu.github.io/publications/slides/Dave2024ASPLOS.pptx, slides
https://mpslab-asu.github.io/publications/posters/Dave2024ASPLOS.pdf, poster
https://youtu.be/y-F1Cp66_oQ, teaser},
year = {2024},
date = {2024-04-02},
urldate = {2024-04-02},
booktitle = {Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
abstract = {Effective design space exploration (DSE) is paramount for hardware/software codesigns of deep learning accelerators that must meet strict execution constraints. Given their vast search space, existing DSE techniques can require an excessive number of trials to obtain a valid and efficient solution because they rely on black-box explorations that do not reason about design inefficiencies. In this paper, we propose Explainable-DSE – a framework for DSE of DNN accelerator codesigns using bottleneck analysis. By leveraging information about execution costs from bottleneck models, our DSE can identify the bottlenecks, and therefore the reasons for design inefficiency, and can make mitigating acquisitions in further explorations. We describe the construction of such bottleneck models for the DNN accelerator domain. We also propose an API for expressing such domain-specific models and integrating them into the DSE framework. Acquisitions of our DSE framework cater to multiple bottlenecks in executions of workloads like DNNs, which contain different functions with diverse execution characteristics. Evaluations on recent computer vision and language models show that Explainable-DSE mostly explores effectual candidates, achieving codesigns of 6× lower latency in 47× fewer iterations vs. non-explainable techniques using evolutionary or ML-based optimizations. By taking minutes or tens of iterations, it enables opportunities for runtime DSE.},
note = {Won Silver Medal at ACM Student Research Competition 2022-23 (Host: ACM SIGBED)},
keywords = {Accelerated Computing, Machine Learning, Machine Learning Accelerators},
pubstate = {published},
tppubtype = {inproceedings}
}
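The bottleneck-guided exploration loop described in the abstract can be illustrated with a minimal, hypothetical Python sketch. The design-space parameters, the toy cost model, and all function names below are illustrative assumptions, not the paper's actual API; latencies are in arbitrary units.

# Hypothetical sketch of a bottleneck-guided DSE loop in the spirit of
# Explainable-DSE. All names and numbers are illustrative assumptions.

DESIGN_SPACE = {
    "pes": [64, 128, 256, 512],          # number of processing elements
    "buffer_kb": [128, 256, 512, 1024],  # on-chip buffer capacity
    "bandwidth": [16, 32, 64],           # off-chip bandwidth (bytes/cycle)
}

def bottleneck_model(design, layer):
    """Toy bottleneck model: per-layer latency plus the blamed resource."""
    compute = layer["macs"] / design["pes"]
    memory = layer["bytes"] / design["bandwidth"]
    spilled = layer["footprint_kb"] > design["buffer_kb"]
    if spilled:
        memory *= 2.0  # extra off-chip traffic when tiles spill the buffer
    latency = max(compute, memory)  # the slowest resource bounds the layer
    if compute >= memory:
        blamed = "pes"
    else:
        blamed = "buffer_kb" if spilled else "bandwidth"
    return latency, blamed

def mitigate(design, blamed):
    """Acquisition step: scale up only the parameter the model blames."""
    options = DESIGN_SPACE[blamed]
    i = options.index(design[blamed])
    new = dict(design)
    if i + 1 < len(options):
        new[blamed] = options[i + 1]
    return new

def explainable_dse(layers, max_iters=20):
    design = {k: v[0] for k, v in DESIGN_SPACE.items()}  # start minimal
    total = 0.0
    for _ in range(max_iters):
        # DNNs mix layers with diverse characteristics, so aggregate the
        # per-layer blame and mitigate the most costly bottleneck first.
        total, blame = 0.0, {}
        for layer in layers:
            latency, blamed = bottleneck_model(design, layer)
            total += latency
            blame[blamed] = blame.get(blamed, 0.0) + latency
        candidate = mitigate(design, max(blame, key=blame.get))
        if candidate == design:
            break  # no headroom left on the blamed resource
        design = candidate
    return design, total

layers = [
    {"macs": 1e9, "bytes": 4e6, "footprint_kb": 300},  # compute-bound conv
    {"macs": 2e7, "bytes": 6e7, "footprint_kb": 900},  # memory-bound FC
]
best, latency = explainable_dse(layers)
print(best, f"total latency ~ {latency:.0f}")

The point of the sketch is that each acquisition is targeted: rather than sampling blindly, the loop grows only the resource that the bottleneck model blames for the current latency, which is why such a search can converge in tens of iterations rather than thousands of trials.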
Shail Dave; Aviral Shrivastava
Automating the Architectural Execution Modeling and Characterization of Domain-Specific Architectures Conference
Proceedings of TECHCON, 2023.
Abstract | BibTeX | Tags: Accelerated Computing, Machine Learning, Machine Learning Accelerators | Links:
@conference{Dave2023TECHCON,
title = {Automating the Architectural Execution Modeling and Characterization of Domain-Specific Architectures},
author = {Shail Dave and Aviral Shrivastava},
url = {https://mpslab-asu.github.io/publications/papers/Dave2023TECHCON.pdf, pdf},
year = {2023},
date = {2023-09-11},
urldate = {2023-09-11},
booktitle = {Proceedings of TECHCON},
abstract = {Domain-specific architectures (DSAs) are increasingly designed to efficiently process a variety of workloads, such as deep learning, linear algebra, and graph analytics. Most research efforts have focused on proposing new DSAs or efficiently exploring hardware/software designs of previously proposed architecture templates. Recent architectural modeling or simulation frameworks for DSAs can analyze execution costs, but only for a limited set of architectural templates for dense DNNs, such as systolic arrays or a spatial architecture with an array of processing elements and a 3-level memory hierarchy. However, they are manually developed by domain experts, contain several thousands of lines of code, and extending them to characterize new architectures, such as DSAs for sparse DNNs, is infeasible. Further, the lack of automated architecture-level execution modeling limits the design space of novel architectures that can be explored/optimized, affecting the overall efficiency of solutions, and it delays time-to-market while lowering the sustainability of the design process.
To address this issue, this paper introduces DSAProf: a framework for automated execution modeling and bottleneck characterization via a modular, dataflow-driven approach. The framework uses a flow-graph-based methodology for modeling DSAs in a modular manner via a library of architectural components and analyzing their executions. The methodology accounts for analytically modeling and simulating intricacies arising from a variety of architectural features, such as asynchronous execution of workgroups, sparse data processing, arbitrary buffer hierarchies, and multi-chip or mixed-precision modules. Preliminary evaluations of modeling previously proposed DSAs for dense/sparse deep learning demonstrate that our approach is extensible to novel DSAs and can accurately and automatically characterize their latency and identify execution bottlenecks, without requiring designers to manually build an analyzer/simulator from scratch for every DSA.},
keywords = {Accelerated Computing, Machine Learning, Machine Learning Accelerators},
pubstate = {published},
tppubtype = {conference}
}
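The flow-graph-based modeling described in the abstract can be made concrete with a small hypothetical Python sketch: a toy DSA is assembled from a library of components, and a steady-state pipeline analysis reports per-stage latencies and the bottleneck stage. Component names, rates, and the linear pipeline model are illustrative assumptions, not DSAProf's actual model.

# Hypothetical sketch of DSAProf-style flow-graph modeling.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    rate: float  # elements processed per cycle (toy throughput model)
    succ: list = field(default_factory=list)  # downstream components

    def latency(self, elements):
        return elements / self.rate

def build_dsa():
    """Flow graph of a toy dense-DNN DSA:
    DRAM load -> input buffer -> PE array -> output buffer -> DRAM store."""
    dma_in = Component("dma_in", rate=8.0)
    ibuf = Component("input_buffer", rate=32.0)
    pes = Component("pe_array", rate=16.0)
    obuf = Component("output_buffer", rate=32.0)
    dma_out = Component("dma_out", rate=8.0)
    for a, b in [(dma_in, ibuf), (ibuf, pes), (pes, obuf), (obuf, dma_out)]:
        a.succ.append(b)
    return dma_in

def characterize(source, elements):
    """Walk the (linear) flow graph; the slowest stage bounds steady-state
    throughput, so it is reported as the execution bottleneck."""
    stages, node = [], source
    while node is not None:
        stages.append((node.name, node.latency(elements)))
        node = node.succ[0] if node.succ else None
    return stages, max(stages, key=lambda s: s[1])

stages, bottleneck = characterize(build_dsa(), elements=1_000_000)
print("bottleneck stage:", bottleneck[0], f"~{bottleneck[1]:,.0f} cycles")

The modularity argument follows directly: supporting a new architecture, say a sparse PE array with a different rate model, means swapping one node in the graph rather than rewriting a monolithic simulator.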
Yi Hu; Chaoran Zhang; Edward Andert; Harshul Singh; Aviral Shrivastava; James Laudon; Yanqi Zhou; Bob Iannucci; Carlee Joe-Wong
GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing Proceedings Article
In: Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys), 2023.
BibTeX | Tags: Accelerated Computing, Machine Learning, Machine Learning Accelerators, Real-Time Systems | Links:
@inproceedings{Hu2023MLSYS,
title = {GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing},
author = {Yi Hu and Chaoran Zhang and Edward Andert and Harshul Singh and Aviral Shrivastava and James Laudon and Yanqi Zhou and Bob Iannucci and Carlee Joe-Wong},
url = {https://mpslab-asu.github.io/publications/papers/Hu2023MLSYS.pdf, pdf},
year = {2023},
date = {2023-06-04},
urldate = {2023-06-04},
booktitle = {Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys)},
keywords = {Accelerated Computing, Machine Learning, Machine Learning Accelerators, Real-Time Systems},
pubstate = {published},
tppubtype = {inproceedings}
}
Behnaz Ranjbar; Florian Klemme; Paul R. Genssler; Hussam Amrouch; Jinhyo Jung; Shail Dave; Hwisoo So; Kyongwoo Lee; Aviral Shrivastava; Ji-Yung Lin; Pieter Weckx; Subrat Mishra; Francky Catthoor; Dwaipayan Biswas; Akash Kumar
Learning-Oriented Reliability Improvement of Computing Systems From Transistor to Application Level Proceedings Article
In: Proceedings of the 26th International Conference on Design Automation and Test in Europe (DATE), 2023.
BibTeX | Tags: Efficient Embedded Computing, Error Correction, Error Resilience, Machine Learning, Machine Learning Accelerators, Soft Error | Links:
@inproceedings{Ranjbar2023DATE,
title = {Learning-Oriented Reliability Improvement of Computing Systems From Transistor to Application Level},
author = {Behnaz Ranjbar and Florian Klemme and Paul R. Genssler and Hussam Amrouch and Jinhyo Jung and Shail Dave and Hwisoo So and Kyongwoo Lee and Aviral Shrivastava and Ji-Yung Lin and Pieter Weckx and Subrat Mishra and Francky Catthoor and Dwaipayan Biswas and Akash Kumar},
url = {https://mpslab-asu.github.io/publications/papers/Ranjbar2023DATE.pdf, paper
https://mpslab-asu.github.io/publications/slides/Ranjbar2023DATE.pptx, slides},
year = {2023},
date = {2023-04-17},
urldate = {2023-04-17},
booktitle = {Proceedings of the 26th International Conference on Design Automation and Test in Europe (DATE)},
keywords = {Efficient Embedded Computing, Error Correction, Error Resilience, Machine Learning, Machine Learning Accelerators, Soft Error},
pubstate = {published},
tppubtype = {inproceedings}
}
Aviral Shrivastava; Xiaobo Sharon Hu
Report on the 2022 Embedded Systems Week (ESWEEK) Journal Article
In: IEEE Design & Test, vol. 40, iss. 1, pp. 108-111, 2023.
Abstract | BibTeX | Tags: Accelerated Computing, CPS, Efficient Embedded Computing, Error Resilience, Machine Learning Accelerators, Real-Time Systems | Links:
@article{Shrivastava2023D&T,
title = {Report on the 2022 Embedded Systems Week (ESWEEK)},
author = {Aviral Shrivastava and Xiaobo Sharon Hu},
url = {https://mpslab-asu.github.io/publications/papers/Shrivastava2023D&T.pdf, pdf},
year = {2023},
date = {2023-01-23},
urldate = {2023-01-23},
journal = {IEEE Design & Test},
volume = {40},
issue = {1},
pages = {108-111},
abstract = {Embedded Systems Week (ESWEEK) is the premier event covering all aspects of hardware and software design for intelligent and connected computing systems. By bringing together three leading conferences [the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES); the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS); and the International Conference on Embedded Software (EMSOFT)] and a variety of symposia, hot-topic workshops, tutorials, and education classes, ESWEEK presents attendees with a wide range of topics unveiling state-of-the-art embedded software, embedded architectures, and embedded system designs.},
keywords = {Accelerated Computing, CPS, Efficient Embedded Computing, Error Resilience, Machine Learning Accelerators, Real-Time Systems},
pubstate = {published},
tppubtype = {article}
}
Shail Dave; Alberto Marchisio; Muhammad Abdullah Hanif; Amira Guesmi; Aviral Shrivastava; Ihsen Alouani; Muhammad Shafique
Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems Proceedings Article
In: Proceedings of the 2022 IEEE 40th VLSI Test Symposium (VTS), 2022.
Abstract | BibTeX | Tags: Accelerated Computing, Efficient Embedded Computing, Error Resilience, Machine Learning, Machine Learning Accelerators, Soft Error | Links:
@inproceedings{DaveVTS2022,
title = {Special Session: Towards an Agile Design Methodology for Efficient, Reliable, and Secure ML Systems},
author = {Shail Dave and Alberto Marchisio and Muhammad Abdullah Hanif and Amira Guesmi and Aviral Shrivastava and Ihsen Alouani and Muhammad Shafique},
url = {https://mpslab-asu.github.io/publications/papers/Dave2022VTS.pdf, pdf
https://mpslab-asu.github.io/publications/slides/Dave2022VTS.pptx, slides},
year = {2022},
date = {2022-04-25},
urldate = {2022-04-25},
booktitle = {Proceedings of the 2022 IEEE 40th VLSI Test Symposium (VTS)},
abstract = {The real-world use cases of Machine Learning (ML) have exploded over the past few years. However, the current computing infrastructure is insufficient to support all real-world applications and scenarios. Apart from high efficiency requirements, modern ML systems are expected to be highly reliable against hardware failures as well as secure against adversarial and IP-stealing attacks. Recent developments have also highlighted various privacy concerns. Towards trustworthy ML systems, in this work we highlight the different challenges the embedded systems community faces in enabling efficient, dependable, and secure deployment of ML. To address these challenges, we present an agile design methodology to generate efficient, reliable, and secure ML systems based on user-defined constraints and objectives.},
keywords = {Accelerated Computing, Efficient Embedded Computing, Error Resilience, Machine Learning, Machine Learning Accelerators, Soft Error},
pubstate = {published},
tppubtype = {inproceedings}
}
Shail Dave; Aviral Shrivastava
Design Space Description Language for Automated and Comprehensive Exploration of Next-Gen Hardware Accelerators Workshop
Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE), 2022, (co-located with the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)).
Abstract | BibTeX | Tags: Accelerated Computing, CGRA, Machine Learning, Machine Learning Accelerators | Links:
@workshop{DaveLATTE2022,
title = {Design Space Description Language for Automated and Comprehensive Exploration of Next-Gen Hardware Accelerators},
author = {Shail Dave and Aviral Shrivastava},
url = {https://mpslab-asu.github.io/publications/papers/Dave2022LATTE.pdf, pdf
https://mpslab-asu.github.io/publications/slides/Dave2022LATTE.pptx, slides
https://capra.cs.cornell.edu/latte22/, workshop
https://youtu.be/Z5jZ2dbE0To, talk},
year = {2022},
date = {2022-03-01},
urldate = {2022-03-01},
booktitle = {Workshop on Languages, Tools, and Techniques for Accelerator Design (LATTE)},
abstract = {Exploration of accelerators typically involves an architectural template specified in an architecture description language (ADL). This can limit the design space that can be explored, the reusability and automation of the system stack, explainability, and exploration efficiency. We envision a Design Space Description Language (DSDL) for comprehensive, reusable, explainable, and agile DSE. We describe how its flow-graph abstraction enables comprehensive DSE of modular designs, with architectural components organized in various hierarchies and groups. We discuss automation of characterizing, simulating, and programming new architectures. Lastly, we describe how DSDL flow graphs facilitate bottleneck analysis, yielding explainability of costs and selected designs as well as super-fast exploration.},
note = {co-located with the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
keywords = {Accelerated Computing, CGRA, Machine Learning, Machine Learning Accelerators},
pubstate = {published},
tppubtype = {workshop}
}
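The abstract's flow-graph abstraction, with architectural components organized in hierarchies and groups, can be illustrated with a small hypothetical Python sketch. The schema, component names, and parameter ranges below are illustrative assumptions, not the actual DSDL grammar.

# Hypothetical sketch: a modular accelerator design space expressed as a
# hierarchy of components, each carrying its own parameter ranges.
import itertools

design_space = {
    "accelerator": {
        "children": {
            "memory": {"params": {"dram_bw": [16, 32, 64]}},
            "buffers": {
                "children": {  # hierarchies: nested buffer levels
                    "l2": {"params": {"size_kb": [256, 512]}},
                    "l1": {"params": {"size_kb": [16, 32, 64]}},
                },
            },
            "pe_group": {  # groups: replicated PE clusters
                "params": {"count": [2, 4], "pes_per_group": [32, 64]},
            },
        },
    },
}

def flatten(node, prefix=""):
    """Collect (qualified_name, values) pairs from the component hierarchy."""
    pairs = []
    for pname, values in node.get("params", {}).items():
        pairs.append((f"{prefix}{pname}", values))
    for cname, child in node.get("children", {}).items():
        pairs.extend(flatten(child, f"{prefix}{cname}."))
    return pairs

def enumerate_designs(space):
    """Yield every point in the cross-product of all parameter ranges."""
    pairs = flatten(space["accelerator"])
    names = [n for n, _ in pairs]
    for combo in itertools.product(*(v for _, v in pairs)):
        yield dict(zip(names, combo))

designs = list(enumerate_designs(design_space))
print(len(designs), "candidate designs, e.g.", designs[0])

Because every parameter is attached to a named component in the hierarchy, an exploration over this space can remain comprehensive while still attributing costs to specific components, which is what enables the bottleneck analysis and explainability the abstract describes.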
Shail Dave; Riyadh Baghdadi; Tony Nowatzki; Sasikanth Avancha; Aviral Shrivastava; Baoxin Li
Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights Journal Article
In: Proceedings of the IEEE (PIEEE), 2021, (arXiv: 2007.00864).
BibTeX | Tags: Accelerated Computing, CGRA, Low-power Computing, Machine Learning, Machine Learning Accelerators | Links:
@article{DavePIEEE2021,
title = {Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights},
author = {Shail Dave and Riyadh Baghdadi and Tony Nowatzki and Sasikanth Avancha and Aviral Shrivastava and Baoxin Li},
url = {https://mpslab-asu.github.io/publications/papers/Dave2021PIEEE.pdf, paper},
year = {2021},
date = {2021-10-01},
urldate = {2021-10-01},
journal = {Proceedings of the IEEE (PIEEE)},
note = {arXiv: 2007.00864},
keywords = {Accelerated Computing, CGRA, Low-power Computing, Machine Learning, Machine Learning Accelerators},
pubstate = {published},
tppubtype = {article}
}
Shail Dave; Aviral Shrivastava; Youngbin Kim; Sasikanth Avancha; Kyoungwoo Lee
dMazeRunner: Optimizing Convolutions on Dataflow Accelerators Proceedings Article
In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, (Invited).
BibTeX | Tags: Accelerated Computing, Machine Learning, Machine Learning Accelerators | Links:
@inproceedings{DaveICASSP2020,
title = {dMazeRunner: Optimizing Convolutions on Dataflow Accelerators},
author = {Shail Dave and Aviral Shrivastava and Youngbin Kim and Sasikanth Avancha and Kyoungwoo Lee},
url = {https://mpslab-asu.github.io/publications/papers/Dave2020ICASSP.pdf, paper
https://mpslab-asu.github.io/publications/slides/Dave2020ICASSP.pptx, slides
https://github.com/MPSLab-ASU/dMazeRunner, code
https://www.youtube.com/watch?v=21F79Taelts, video},
year = {2020},
date = {2020-04-09},
urldate = {2020-04-09},
booktitle = {ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
note = {Invited},
keywords = {Accelerated Computing, Machine Learning, Machine Learning Accelerators},
pubstate = {published},
tppubtype = {inproceedings}
}
Shail Dave; Youngbin Kim; Sasikanth Avancha; Kyoungwoo Lee; Aviral Shrivastava
DMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators Journal Article
In: ACM Transactions on Embedded Computing Systems (TECS), vol. 18, no. 5s, 2019, (Special Issue on ESWEEK 2019 - Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)).
BibTeX | Tags: Accelerated Computing, Machine Learning, Machine Learning Accelerators | Links:
@article{DaveTECS2019,
title = {DMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators},
author = {Shail Dave and Youngbin Kim and Sasikanth Avancha and Kyoungwoo Lee and Aviral Shrivastava},
url = {https://mpslab-asu.github.io/publications/papers/Dave2019TECS.pdf, paper
https://mpslab-asu.github.io/publications/slides/Dave2019TECS.pptx, slides
https://mpslab-asu.github.io/publications/posters/Dave2019TECS.pdf, poster
https://github.com/MPSLab-ASU/dMazeRunner, code},
year = {2019},
date = {2019-01-01},
urldate = {2019-01-01},
journal = {ACM Transactions on Embedded Computing Systems (TECS)},
volume = {18},
number = {5s},
note = {Special Issue on ESWEEK 2019 - Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)},
keywords = {Accelerated Computing, Machine Learning, Machine Learning Accelerators},
pubstate = {published},
tppubtype = {article}
}