{"id":4007,"date":"2020-02-20T18:45:50","date_gmt":"2020-02-20T18:45:50","guid":{"rendered":"https:\/\/labs.engineering.asu.edu\/mps-lab\/?post_type=research&#038;p=4007"},"modified":"2025-05-30T17:31:16","modified_gmt":"2025-05-31T00:31:16","slug":"reliability","status":"publish","type":"research","link":"https:\/\/labs.engineering.asu.edu\/mps-lab\/research\/reliability\/","title":{"rendered":"Reliability and Robustness for Machine Learning"},"content":{"rendered":"\n<h3 class=\"wp-block-heading has-text-align-center\">Our Vision<\/h3>\n\n\n\n<p><i>We envision a machine learning landscape where reliability and robustness are not an afterthought, but a foundational pillar\u2014enabling dependable execution even in the presence of various threats in real-world environments. Hardware faults, such as soft errors, can suddenly affect the results of machine learning models. Malicious attackers can add adversarial perturbation to the input to change the inference result. Even in the absence of faults and adversarial attacks, the machine learning model can encounter untrained, out-of-distribution (OOD) inputs that the model cannot handle correctly. 
Our goal is to empower reliable and robust machine learning models with holistic countermeasures against soft errors, adversarial inputs, and OOD inputs.<\/i><\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<div class=\"wp-block-group is-nowrap is-layout-flex wp-container-core-group-is-layout-6c531013 wp-block-group-is-layout-flex\">\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link has-asu-maroon-background-color has-background has-small-font-size has-custom-font-size wp-element-button\" href=\"https:\/\/docs.google.com\/document\/d\/1NOoZxUBtLrMPTwM_sFFKDbVnmguQH8y-ESd4fVlYV_Q\/edit#heading=h.yeepu7u2h8b\">Reading List<\/a><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link has-asu-maroon-background-color has-background has-small-font-size has-custom-font-size wp-element-button\" href=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/publications\/?tgid=5\">Publications<\/a><\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\"><\/div>\n<\/div>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Introduction<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Challenge:<\/strong><\/h4>\n\n\n\n<p>With the surge of deep neural networks (DNNs), machine learning plays a key role in most modern computing including safety-critical applications such as autonomous driving. In such safety-critical applications, malfunction of machine learning models can result in catastrophic consequences. 
In real-world environments, threats such as soft errors, adversarial attacks, and out-of-distribution (OOD) inputs can cause machine learning models to malfunction. This research aims to detect such threats and, further, to differentiate and handle them.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Reliability Enhancement for Machine Learning Against Soft Errors<\/strong><\/h4>\n\n\n\n<p>Neural networks are often considered inherently robust against faults because of their distributed structure and intrinsic redundancy. Still, a recent study<sup>1<\/sup> found that neural networks without protection cannot satisfy strict reliability standards. Various studies have proposed soft error mitigation solutions based on fault detection algorithms, on training an additional small network to mitigate the fault, or on detecting abnormal activation values. Among such solutions, algorithm-based fault tolerance (ABFT) for neural networks can provide higher fault detection capability. 
Our research extended previous detection-only ABFT to both detect and correct faults, with an error correction algorithm based on the Hamming distance and software-level checkpointing for deep neural networks.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1920\" height=\"434\" src=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_1-1.png\" alt=\"\" class=\"wp-image-7203\" srcset=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_1-1.png 1920w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_1-1-300x68.png 300w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_1-1-400x90.png 400w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_1-1-768x174.png 768w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_1-1-1536x347.png 1536w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><figcaption class=\"wp-element-caption\">The proposed algorithm-based fault tolerance (ABFT) for a fully-connected layer.<\/figcaption><\/figure>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"461\" src=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2024\/08\/reliability-figure-5-1024x461.png\" alt=\"\" class=\"wp-image-6720\" 
srcset=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2024\/08\/reliability-figure-5-1024x461.png 1024w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2024\/08\/reliability-figure-5-300x135.png 300w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2024\/08\/reliability-figure-5-768x346.png 768w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2024\/08\/reliability-figure-5-1536x691.png 1536w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2024\/08\/reliability-figure-5.png 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The proposed algorithm-based fault tolerance (ABFT) for a convolution layer.<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1751\" height=\"1080\" src=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_3.png\" alt=\"\" class=\"wp-image-7204\" srcset=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_3.png 1751w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_3-300x185.png 300w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_3-400x247.png 400w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_3-768x474.png 768w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/maintaining_sanity_figure_3-1536x947.png 1536w\" sizes=\"auto, (max-width: 1751px) 100vw, 1751px\" \/><figcaption class=\"wp-element-caption\">Checkpointing of the proposed 
ABFT<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p><strong>Detecting, Differentiating, and Mitigating Different Types of Threats<\/strong><\/p>\n\n\n\n<p>Because it is almost impossible to guarantee the correctness of a machine learning model in the presence of threats such as soft errors, adversarial attacks, and out-of-distribution (OOD) inputs, detecting such threats is essential. The most straightforward response after detection is to reject the affected inference and thereby avoid a malfunction. A more advanced approach applies a tailored countermeasure to each threat, but choosing the proper countermeasure requires correctly differentiating between threats. For example, re-executing the inference can resolve the effect of a transient soft error, whereas re-execution cannot handle adversarial attacks or OOD inputs. Our research covers holistic detection and differentiation solutions against soft errors, adversarial attacks, and OOD inputs, as well as mitigation solutions for each threat.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"999\" height=\"842\" src=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/ood_vs_fault.png\" alt=\"\" class=\"wp-image-7205\" srcset=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/ood_vs_fault.png 999w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/ood_vs_fault-300x253.png 300w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/ood_vs_fault-400x337.png 400w, 
https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/ood_vs_fault-768x647.png 768w\" sizes=\"auto, (max-width: 999px) 100vw, 999px\" \/><figcaption class=\"wp-element-caption\">The softmax score distribution of in-distribution (ID) and OOD inputs with and without faults. The abnormality of the softmax scores under faults enables differentiation between ID, OOD, and faulty inferences.<\/figcaption><\/figure>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1025\" height=\"780\" src=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/adversarial_mitigation.png\" alt=\"\" class=\"wp-image-7206\" srcset=\"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/adversarial_mitigation.png 1025w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/adversarial_mitigation-300x228.png 300w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/adversarial_mitigation-400x304.png 400w, https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-content\/uploads\/sites\/8\/2020\/02\/adversarial_mitigation-768x584.png 768w\" sizes=\"auto, (max-width: 1025px) 100vw, 1025px\" \/><figcaption class=\"wp-element-caption\">The mitigation solution for adversarial attacks, which detects adversarial samples from the feature information, reconstructs the attacked image with a reverse attack, and reclassifies the reconstructed image with an additional network trained on reverse-attacked samples.<\/figcaption><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p><sup>1<\/sup>He, Y., Balaprakash, P., &amp; Li, Y. (2020, October). Fidelity: Efficient resilience analysis framework for deep learning accelerators. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO) (pp. 270-281). 
IEEE.<br><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">&nbsp;<\/h4>\n","protected":false},"excerpt":{"rendered":"<p class=\"mb-2\">Our Vision We envision a machine learning landscape where reliability and robustness are not an afterthought, but a foundational pillar\u2014enabling dependable execution even in the presence of various threats in real-world environments. Hardware faults, such as soft errors, can suddenly affect the results of machine learning models. Malicious attackers can add adversarial perturbation to the&#8230;<\/p>\n","protected":false},"featured_media":6721,"parent":0,"menu_order":6,"template":"","meta":{"_acf_changed":false,"footnotes":""},"class_list":["post-4007","research","type-research","status-publish","has-post-thumbnail","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-json\/wp\/v2\/research\/4007","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-json\/wp\/v2\/research"}],"about":[{"href":"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-json\/wp\/v2\/types\/research"}],"version-history":[{"count":0,"href":"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-json\/wp\/v2\/research\/4007\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-json\/wp\/v2\/media\/6721"}],"wp:attachment":[{"href":"https:\/\/labs.engineering.asu.edu\/mps-lab\/wp-json\/wp\/v2\/media?parent=4007"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}