Reading Resources: Programmable Dataflow Accelerators for Machine Learning

Programmable Dataflow Accelerator Architectures

  • Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. “Efficient processing of deep neural networks: A tutorial and survey.” Proceedings of the IEEE 105, no. 12 (2017): 2295-2329.
  • Yu-Hsin Chen, Joel Emer, and Vivienne Sze. “Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.” In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 367-379. IEEE, 2016.
  • Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.” IEEE Journal of Solid-State Circuits 52, no. 1 (2016): 127-138.
  • Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates et al. “In-datacenter performance analysis of a tensor processing unit.” In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 1-12. IEEE, 2017.
  • Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. “FlexFlow: A flexible dataflow accelerator architecture for convolutional neural networks.” In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 553-564. IEEE, 2017.
  • Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. “ShiDianNao: Shifting vision processing closer to the sensor.” In 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 92-104. IEEE, 2015.
  • Hongbo Rong. “Programmatic control of a compiler for generating high-performance spatial hardware.” arXiv preprint arXiv:1711.07606 (2017).
  • Michael Pellauer, Angshuman Parashar, Michael Adler, Bushra Ahsan, Randy Allmon, Neal Crago, Kermin Fleming et al. “Efficient control and communication paradigms for coarse-grained spatial architectures.” ACM Transactions on Computer Systems (TOCS) 33, no. 3 (2015): 10.
  • Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. “MAERI: Enabling flexible dataflow mapping over DNN accelerators via reconfigurable interconnects.” In ACM SIGPLAN Notices, vol. 53, no. 2, pp. 461-475. ACM, 2018.
  • Bruce Fleischer, Sunil Shukla, Matthew Ziegler, Joel Silberman, Jinwook Oh, Vijayalakshmi Srinivasan, Jungwook Choi, Silvia Mueller, Ankur Agrawal, Tina Babinsky et al. “A scalable multi-teraops deep learning processor core for AI training and inference.” In 2018 IEEE Symposium on VLSI Circuits, pp. 35-36. IEEE, 2018.

Analytical Modeling of Hardware Accelerator Execution and Optimization of Dataflow Mappings

  • Xuan Yang, Mingyu Gao, Jing Pu, Ankita Nayak, Qiaoyi Liu, Steven Emberton Bell, Jeff Ou Setter, Kaidi Cao, Heonjae Ha, Christos Kozyrakis, and Mark Horowitz. “DNN Dataflow Choice Is Overrated.” arXiv preprint arXiv:1809.04070 (2018).
  • Shail Dave, Youngbin Kim, Sasikanth Avancha, Kyoungwoo Lee, and Aviral Shrivastava. “dMazeRunner: Executing Perfectly Nested Loops on Dataflow Accelerators.” ACM Transactions on Embedded Computing Systems (TECS), 2019. [Special Issue on ESWEEK 2019 – ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)]
  • Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Stephen W. Keckler, and Joel Emer. “Timeloop: A Systematic Approach to DNN Accelerator Evaluation.” In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 304-315. IEEE, 2019.
  • Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. “Optimizing FPGA-based accelerator design for deep convolutional neural networks.” In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), pp. 161-170. ACM, 2015.

Accelerators for Sparse and Compact Deep Neural Network Models

  • Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, and Baoxin Li. “Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights.” Proceedings of the IEEE, 2021.
  • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. “EIE: efficient inference engine on compressed deep neural network.” In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), pp. 243-254. IEEE, 2016.
  • Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. “SCNN: An accelerator for compressed-sparse convolutional neural networks.” In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 27-40. IEEE, 2017.
  • Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. “Cambricon-X: An accelerator for sparse neural networks.” In The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), p. 20. IEEE Press, 2016.
  • Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. “ZeNA: Zero-aware neural network accelerator.” IEEE Design & Test 35, no. 1 (2017): 39-46.
  • Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, and Yunji Chen. “Cambricon-S: Addressing irregularity in sparse neural networks through a cooperative software/hardware approach.” In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 15-28. IEEE, 2018.
  • Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, and Vivienne Sze. “Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices.” IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, no. 2 (2019): 292-308.

System Stack

  • Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. “TVM: end-to-end optimization stack for deep learning.” arXiv preprint arXiv:1802.04799 (2018).
  • Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson et al. “Glow: Graph lowering compiler techniques for neural networks.” arXiv preprint arXiv:1805.00907 (2018).
  • Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. “HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing.” In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 242-251. ACM, 2019.

DNN Model Compression

  • Song Han, Huizi Mao, and William J. Dally. “Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding.” arXiv preprint arXiv:1510.00149 (2015).
  • Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. “Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey.” Proceedings of the IEEE 108, no. 4 (2020): 485-532.
  • Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. “Learning structured sparsity in deep neural networks.” In Advances in neural information processing systems, pp. 2074-2082. 2016.
  • Raghuraman Krishnamoorthi. “Quantizing deep convolutional networks for efficient inference: A whitepaper.” arXiv preprint arXiv:1806.08342 (2018).
  • Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. “Deep learning with limited numerical precision.” In International Conference on Machine Learning, pp. 1737-1746. 2015.
  • Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size.” arXiv preprint arXiv:1602.07360 (2016).

Miscellaneous (Application Frameworks, Emerging Applications/Models, etc.)

  • Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, et al. “Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications.” arXiv preprint arXiv:1811.09886 (2018).
  • Yann LeCun. “1.1 Deep learning hardware: Past, present, and future.” In 2019 IEEE International Solid-State Circuits Conference (ISSCC), pp. 12-19. IEEE, 2019.
  • Mingxing Tan and Quoc Le. “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.” In International Conference on Machine Learning, pp. 6105-6114. 2019.