Graph interpolating activation improves both natural and robust accuracies in data-efficient deep learning

Published online by Cambridge University Press:  28 December 2020

BAO WANG
Affiliation:
Department of Mathematics, Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, UT, USA, email: wangbaonj@gmail.com
STAN J. OSHER
Affiliation:
Department of Mathematics, UCLA, Los Angeles, CA 90095-1555, USA, email: sjo@math.ucla.edu
Corresponding author

Abstract

Improving the accuracy and robustness of deep neural nets (DNNs) and adapting them to small training data are primary tasks in deep learning (DL) research. In this paper, we replace the output activation function of DNNs, typically the data-agnostic softmax function, with a graph Laplacian-based high-dimensional interpolating function which, in the continuum limit, converges to the solution of a Laplace–Beltrami equation on a high-dimensional manifold. Furthermore, we propose end-to-end training and testing algorithms for this new architecture. The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning. Compared to conventional DNNs with the softmax output activation, the new framework demonstrates the following major advantages: First, it is well suited to data-efficient learning, in which high-capacity DNNs are trained without a large amount of training data. Second, it markedly improves both natural accuracy on clean images and robust accuracy on adversarial images crafted by white-box and black-box attacks. Third, it is a natural choice for semi-supervised learning. This paper is a significant extension of our earlier work published in NeurIPS, 2018. For reproducibility, the code is available at https://github.com/BaoWangMath/DNN-DataDependentActivation.
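The graph interpolating activation described above can be illustrated with a minimal sketch: harmonic interpolation of labels on a similarity graph, where class scores at unlabeled points are obtained by solving a discrete graph Laplace equation with the labeled points as boundary conditions. This is a simplified stand-in for the paper's weighted nonlocal Laplacian activation, not its actual implementation; the function name, the fully connected Gaussian-weighted graph, and the parameter `sigma` are illustrative assumptions.

```python
import numpy as np

def graph_laplacian_interpolation(X_labeled, y_labeled, X_unlabeled, sigma=1.0):
    """Harmonic interpolation of class labels on a similarity graph.

    Builds a fully connected Gaussian-weighted graph over all points and
    solves the discrete Laplace equation L_uu f_u = W_ul f_l for the
    class scores at the unlabeled points (a simplified sketch of a
    graph-Laplacian-based output activation).
    """
    X = np.vstack([X_labeled, X_unlabeled])
    n_l = len(X_labeled)
    # Gaussian similarity weights w_ij = exp(-||x_i - x_j||^2 / sigma^2)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / sigma**2)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                       # unnormalised graph Laplacian
    # One-hot label matrix on the labeled set
    classes = np.unique(y_labeled)
    F_l = (y_labeled[:, None] == classes[None, :]).astype(float)
    # Harmonic condition at each unlabeled node i: sum_j w_ij (f_i - f_j) = 0,
    # which rearranges to  L_uu f_u = W_ul f_l.
    L_uu = L[n_l:, n_l:]
    W_ul = W[n_l:, :n_l]
    F_u = np.linalg.solve(L_uu, W_ul @ F_l)
    # Predicted class = argmax of the interpolated score vector
    return classes[np.argmax(F_u, axis=1)]
```

Because the interpolated scores depend on the entire point cloud rather than on a per-sample linear map, the resulting classifier is data-dependent, which is what makes this family of activations a natural fit for the semi-supervised and data-efficient settings mentioned above.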

Type: Papers
Copyright: © The Author(s), 2020. Published by Cambridge University Press


References

Agostinelli, F., Hoffman, M., Sadowski, P. & Baldi, P. (2014) Learning Activation Functions to Improve Deep Neural Networks. arXiv preprint arXiv:1412.6830.
Anonymous. (2019) Adversarial Machine Learning against Tesla's Autopilot. https://www.schneier.com/blog/archives/2019/04/adversarial_mac.html.
Athalye, A., Carlini, N. & Wagner, D. (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: International Conference on Machine Learning.
Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems.
Brendel, W., Rauber, J. & Bethge, M. (2017) Decision-based adversarial attacks: reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248.
Carlini, N. & Wagner, D. A. (2016) Towards evaluating the robustness of neural networks. In: IEEE European Symposium on Security and Privacy, pp. 39–57.
Chapelle, O., Scholkopf, B. & Zien, A. (2006) Semi-Supervised Learning, MIT Press, Cambridge, Massachusetts.
Chen, X., Liu, C., Li, B., Liu, K. & Song, D. (2017a) Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv preprint arXiv:1712.05526.
Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S. & Feng, J. (2017b) Dual path networks. In: Advances in Neural Information Processing Systems.
Cohen, J., Rosenfeld, E. & Kolter, J. Z. (2019) Certified Adversarial Robustness via Randomized Smoothing. arXiv preprint arXiv:1902.02918v1.
Dou, Z., Osher, S. J. & Wang, B. (2018) Mathematical Analysis of Adversarial Attacks. arXiv preprint arXiv:1811.06492.
Glorot, X., Bordes, A. & Bengio, Y. (2011) Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323.
Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A. & Bengio, Y. (2013) Maxout networks. arXiv preprint arXiv:1302.4389.
Goodfellow, I. J., Shlens, J. & Szegedy, C. (2014) Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572.
Guo, C., Rana, M., Cisse, M. & van der Maaten, L. (2018) Countering adversarial images using input transformations. In: International Conference on Learning Representations. https://openreview.net/forum?id=SyJ7ClWCb.
He, K., Zhang, X., Ren, S. & Sun, J. (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
He, K., Zhang, X., Ren, S. & Sun, J. (2016a) Identity mappings in deep residual networks. In: European Conference on Computer Vision.
He, K., Zhang, X., Ren, S. & Sun, J. (2016b) Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hinton, G., Osindero, S. & Teh, Y. W. (2006) A fast learning algorithm for deep belief nets. Neural Comput. 18 (7), 1527–1554.
Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2012) Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors. arXiv preprint arXiv:1207.0580.
Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. (2017) Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition.
Huang, G., Sun, Y., Liu, Z., Sedra, D. & Weinberger, K. (2016) Deep networks with stochastic depth. In: European Conference on Computer Vision.
Kingma, D. & Ba, J. (2014) Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Krizhevsky, A. (2009) Learning Multiple Layers of Features from Tiny Images. https://www.cs.toronto.edu/~kriz/cifar.html.
Krizhevsky, A., Sutskever, I. & Hinton, G. (2012) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.
Kurakin, A., Goodfellow, I. & Bengio, S. (2017) Adversarial machine learning at scale. In: International Conference on Learning Representations.
LeCun, Y. (1998) The MNIST Database of Handwritten Digits.
LeCun, Y., Bengio, Y. & Hinton, G. (2015) Deep learning. Nature 521, 436–444.
Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D. & Jana, S. (2019) Certified robustness to adversarial examples with differential privacy. In: IEEE Symposium on Security and Privacy (SP).
Li, Z. & Shi, Z. (2017) Deep Residual Learning and PDEs on Manifold. arXiv preprint arXiv:1708.05115.
Liu, Y., Chen, X., Liu, C. & Song, D. (2016) Delving into Transferable Adversarial Examples and Black-Box Attacks. arXiv preprint arXiv:1611.02770.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D. & Vladu, A. (2018) Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations. https://openreview.net/forum?id=rJzIBfZAb.
Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O. & Frossard, P. (2017) Universal adversarial perturbations. In: IEEE Conference on Computer Vision and Pattern Recognition, July 2017.
Muja, M. & Lowe, D. G. (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 36, 2227–2240.
Nair, V. & Hinton, G. (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning, pp. 807–814.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. & Ng, A. (2011) Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Osher, S. J., Wang, B., Yin, P., Luo, X., Pham, M. & Lin, A. (2018) Laplacian Smoothing Gradient Descent. arXiv preprint arXiv:1806.06317.
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B. & Swami, A. (2016a) The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy, pp. 372–387.
Papernot, N., McDaniel, P., Wu, X., Jha, S. & Swami, A. (2016b) Distillation as a defense to adversarial perturbations against deep neural networks. In: IEEE European Symposium on Security and Privacy.
Papernot, N., McDaniel, P. D. & Goodfellow, I. J. (2016c) Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. CoRR, abs/1605.07277. http://arxiv.org/abs/1605.07277.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L. & Lerer, A. (2017) Automatic Differentiation in PyTorch. https://openreview.net/forum?id=BJJsrmfCZ.
Ross, A. & Doshi-Velez, F. (2017) Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing Their Input Gradients. arXiv preprint arXiv:1711.09404.
Samangouei, P., Kabkab, M. & Chellappa, R. (2018) Defense-GAN: protecting classifiers against adversarial attacks using generative models. In: International Conference on Learning Representations. https://openreview.net/forum?id=BkJ3ibb0-.
Shi, Z., Wang, B. & Osher, S. (2018) Error Estimation of the Weighted Nonlocal Laplacian on Random Point Cloud. arXiv preprint arXiv:1809.08622.
Simonyan, K. & Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. & Fergus, R. (2013) Intriguing Properties of Neural Networks. arXiv preprint arXiv:1312.6199.
Tang, Y. (2013) Deep Learning Using Linear Support Vector Machines. arXiv preprint arXiv:1306.0239.
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D. & McDaniel, P. (2018) Ensemble adversarial training: attacks and defenses. In: International Conference on Learning Representations. https://openreview.net/forum?id=rkZvSe-RZ.
Verma, V., Lamb, A., Beckham, C., Najafi, A., Mitiagkas, I., Courville, A., Lopez-Paz, D. & Bengio, Y. (2018) Manifold Mixup: Better Representations by Interpolating Hidden States. arXiv preprint arXiv:1806.05236.
Wan, L., Zeiler, M., Zhang, S., LeCun, Y. & Fergus, R. (2013) Regularization of neural networks using DropConnect. In: International Conference on Machine Learning, pp. 1058–1066.
Wang, B., Lin, A. T., Shi, Z., Zhu, W., Yin, P., Bertozzi, A. L. & Osher, S. J. (2018a) Adversarial Defense via Data Dependent Activation Function and Total Variation Minimization. arXiv preprint arXiv:1809.08516.
Wang, B., Luo, X., Li, Z., Zhu, W., Shi, Z. & Osher, S. (2018b) Deep neural nets with interpolating function as output activation. In: Advances in Neural Information Processing Systems.
Wang, B., Yuan, B., Shi, Z. & Osher, S. (2019) ResNets ensemble via the Feynman–Kac formalism to improve natural and robust accuracies. In: Advances in Neural Information Processing Systems.
Zagoruyko, S. & Komodakis, N. (2016) Wide residual networks. In: British Machine Vision Conference.
Zhang, H., Yu, Y., Jiao, J., Xing, E., Ghaoui, L. & Jordan, M. (2019) Theoretically Principled Trade-Off between Robustness and Accuracy. arXiv preprint arXiv:1901.08573.
Zheng, S., Song, Y., Leung, T. & Goodfellow, I. (2016) Improving the robustness of deep neural networks via stability training. In: IEEE Conference on Computer Vision and Pattern Recognition.
