{"id":3600,"date":"2019-10-02T21:26:16","date_gmt":"2019-10-03T04:26:16","guid":{"rendered":"http:\/\/aviral.lab.asu.edu\/?page_id=3600"},"modified":"2021-07-15T22:07:13","modified_gmt":"2021-07-15T22:07:13","slug":"reading-ml-accelerators","status":"publish","type":"page","link":"https:\/\/labs.engineering.asu.edu\/mps-lab\/ml-accelerators\/reading-ml-accelerators\/","title":{"rendered":"Reading Resources: Programmable Dataflow Accelerators for Machine Learning"},"content":{"rendered":"<p>&nbsp;<\/p>\n<h4><strong>Programmable Dataflow Accelerator Architectures<\/strong><\/h4>\n<ul>\n<li>Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer. &#8220;Efficient processing of deep neural networks: A tutorial and survey.&#8221; Proceedings of the IEEE 105, no. 12 (2017): 2295-2329.<\/li>\n<li>Yu-Hsin Chen, Joel Emer, and Vivienne Sze. &#8220;Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.&#8221; In\u00a0<i>2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)<\/i>, pp. 367-379. IEEE, 2016.<\/li>\n<li>Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. &#8220;Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.&#8221;\u00a0<i>IEEE Journal of Solid-State Circuits<\/i>\u00a052, no. 1 (2016): 127-138.<\/li>\n<li>Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates et al. &#8220;In-datacenter performance analysis of a tensor processing unit.&#8221; In\u00a0<i>2017 ACM\/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)<\/i>, pp. 1-12. IEEE, 2017.<\/li>\n<li>Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. &#8220;Flexflow: A flexible dataflow accelerator architecture for convolutional neural networks.&#8221; In\u00a0<i>2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)<\/i>, pp. 553-564. IEEE, 2017.<\/li>\n<li>Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. &#8220;ShiDianNao: Shifting vision processing closer to the sensor.&#8221; In\u00a0<i>2015 ACM\/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA)<\/i>, pp. 92-104. IEEE, 2015.<\/li>\n<li>Hongbo Rong. &#8220;Programmatic control of a compiler for generating high-performance spatial hardware.&#8221;\u00a0<i>arXiv preprint arXiv:1711.07606<\/i>\u00a0(2017).<\/li>\n<li>Michael Pellauer, Angshuman Parashar, Michael Adler, Bushra Ahsan, Randy Allmon, Neal Crago, Kermin Fleming et al. &#8220;Efficient control and communication paradigms for coarse-grained spatial architectures.&#8221;\u00a0<i>ACM Transactions on Computer Systems (TOCS)<\/i>\u00a033, no. 3 (2015): 10.<\/li>\n<li>Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. &#8220;Maeri: Enabling flexible dataflow mapping over dnn accelerators via reconfigurable interconnects.&#8221; In\u00a0<i>ACM SIGPLAN Notices<\/i>, vol. 53, no. 2, pp. 461-475. ACM, 2018.<\/li>\n<li>Bruce Fleischer, Sunil Shukla, Matthew Ziegler, Joel Silberman, Jinwook Oh, Vijavalakshmi Srinivasan, Jungwook\u00a0Choi, Silvia Mueller, Ankur Agrawal, Tina Babinsky, et al. A scalable multi-teraops deep learning processor core for AI training and inference. In 2018 IEEE Symposium on VLSI Circuits, pages 35\u201336. 
#### Accelerators for Sparse and Compact Deep Neural Network Models

- Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, and Baoxin Li. "Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights." *Proceedings of the IEEE*, 2021.
- Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. "EIE: Efficient Inference Engine on Compressed Deep Neural Network." In *2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)*, pp. 243-254. IEEE, 2016.
- Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. "SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks." In *2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)*, pp. 27-40. IEEE, 2017.
- Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. "Cambricon-X: An Accelerator for Sparse Neural Networks." In *The 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)*, p. 20. IEEE Press, 2016.
- Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. "ZeNA: Zero-Aware Neural Network Accelerator." *IEEE Design & Test* 35, no. 1 (2017): 39-46.
- Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, and Yunji Chen. "Cambricon-S: Addressing Irregularity in Sparse Neural Networks through a Cooperative Software/Hardware Approach." In *2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)*, pp. 15-28. IEEE, 2018.
- Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, and Vivienne Sze. "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices." *IEEE Journal on Emerging and Selected Topics in Circuits and Systems* 9, no. 2 (2019): 292-308.
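Accelerators in this section (EIE, Cambricon-X, SCNN, ZeNA) exploit the zeros that pruning leaves in weights and activations by storing only nonzero values together with their coordinates and skipping the corresponding multiplications. Below is a minimal, illustrative sketch of that idea in Python/NumPy, using a CSR-like per-row encoding chosen for clarity rather than any cited accelerator's actual on-chip format:

```python
# Illustrative only: zero-skipping matrix-vector product over weights
# stored in a compressed (nonzeros + column indices) format.
import numpy as np

def compress_rows(W, tol=0.0):
    """CSR-like encoding: per row, (column indices, nonzero values)."""
    return [(np.flatnonzero(np.abs(row) > tol), row[np.abs(row) > tol])
            for row in W]

def spmv(compressed, x):
    """Work is proportional to the number of nonzeros, not to len(row)."""
    return np.array([vals @ x[cols] for cols, vals in compressed])

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
W[rng.random(W.shape) < 0.8] = 0.0   # ~80% of weights pruned to zero
x = rng.standard_normal(16)

assert np.allclose(spmv(compress_rows(W), x), W @ x)
```

The hardware challenge these papers address is doing this skipping without load imbalance or irregular memory access destroying the benefit, which software alone does not solve.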
#### System Stack

- Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. "TVM: End-to-End Optimization Stack for Deep Learning." *arXiv preprint arXiv:1802.04799* (2018).
- Nadav Rotem, Jordan Fix, Saleem Abdulrasool, Garret Catron, Summer Deng, Roman Dzhabarov, Nick Gibson, et al. "Glow: Graph Lowering Compiler Techniques for Neural Networks." *arXiv preprint arXiv:1805.00907* (2018).
- Yi-Hsiang Lai, Yuze Chi, Yuwei Hu, Jie Wang, Cody Hao Yu, Yuan Zhou, Jason Cong, and Zhiru Zhang. "HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing." In *Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*, pp. 242-251. ACM, 2019.

#### DNN Model Compression

- Song Han, Huizi Mao, and William J. Dally. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." *arXiv preprint arXiv:1510.00149* (2015).
- Lei Deng, Guoqi Li, Song Han, Luping Shi, and Yuan Xie. "Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey." *Proceedings of the IEEE* 108, no. 4 (2020): 485-532.
- Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. "Learning Structured Sparsity in Deep Neural Networks." In *Advances in Neural Information Processing Systems*, pp. 2074-2082. 2016.
- Raghuraman Krishnamoorthi. "Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper." *arXiv preprint arXiv:1806.08342* (2018).
- Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. "Deep Learning with Limited Numerical Precision." In *International Conference on Machine Learning*, pp. 1737-1746. 2015.
- Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. "SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size." *arXiv preprint arXiv:1602.07360* (2016).
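The quantization entries above all build on the same basic operation: mapping floating-point tensors to low-bit integers via a scale factor. A minimal sketch of symmetric per-tensor 8-bit quantization follows (Python/NumPy; function names and the error check are illustrative assumptions, and real schemes add per-channel scales, zero points, and calibration):

```python
# Illustrative only: symmetric per-tensor 8-bit quantization.
import numpy as np

def quantize(x, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1        # 127 for 8 bits
    m = float(np.max(np.abs(x)))
    scale = m / qmax if m > 0 else 1.0    # guard against an all-zero tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize(x)
err = np.max(np.abs(dequantize(q, scale) - x))
print(f"max abs quantization error: {err:.4f}")  # bounded by scale / 2
```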
#### Miscellaneous (Application Frameworks, Emerging Applications/Models, etc.)

- Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, et al. "Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications." *arXiv preprint arXiv:1811.09886* (2018).
- Yann LeCun. "1.1 Deep Learning Hardware: Past, Present, and Future." In *2019 IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 12-19. IEEE, 2019.
- Mingxing Tan and Quoc Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." In *International Conference on Machine Learning*, pp. 6105-6114. 2019.