
Batch-attention: A method for reconciling overfitting and underfitting in deep learning

  • HU Hanqing,
  • LI Zhengxun,
  • WU Zhunan

  • School of Economics and Management, Beijing Information Science & Technology University, Beijing 100192, China

HU Hanqing, associate professor; research interests include big data analysis and mining and machine learning algorithms. E-mail: hanqinghu@bistu.edu.cn

Received date: 2022-07-18

Revised date: 2023-02-07

Online published: 2023-08-11

Funding

Beijing Information Science & Technology University project "Promoting Classified Development within the University: Reform of Professional Degree Programs and Graduate Education in the School of Economics and Management"

Abstract

In deep learning network training, most existing methods for improving model performance focus on the network itself; yet improving a model's performance and accuracy also requires attention to the characteristics of the data. This paper proposes Batch-attention, a new training framework for deep learning models that starts from the data level and changes the original training procedure; experiments show that it can reconcile overfitting and underfitting in deep learning models. Comparative experiments on the CIFAR-10 and CIFAR-100 datasets using ResNet34, a Transformer (TNT), and EfficientNet-B7 demonstrate that models trained with Batch-attention improve both accuracy and F1-score on the test set relative to the baseline models. Further test experiments analyze the working mechanism of Batch-attention.
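The abstract does not specify how Batch-attention intervenes at the data level. Purely as a hypothetical illustration of what a batch-level, data-side reweighting step could look like, the PyTorch sketch below weights per-sample losses with attention scores computed over the mini-batch; everything in it, including the names BatchAttention and batch_attention_step and the softmax weighting scheme, is an assumption for illustration, not the authors' published method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BatchAttention(nn.Module):
    # Hypothetical module: scores every sample in a mini-batch from its
    # feature vector and normalizes the scores into weights over the batch.
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)   # one scalar score per sample

    def forward(self, feats):
        # feats: (batch, feat_dim) -> weights: (batch,), summing to 1
        return F.softmax(self.score(feats).squeeze(-1), dim=0)

def batch_attention_step(model, attn, images, labels, optimizer):
    # One training step with data-level loss reweighting. Assumption:
    # `model` returns both class logits and a per-sample feature vector.
    logits, feats = model(images)
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    weights = attn(feats.detach())         # attend over the batch; detached so
                                           # the backbone is not trained through it
    loss = (weights * per_sample).sum()    # weighted sum replaces the usual mean
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the optimizer must cover both the backbone and the attention module, e.g. torch.optim.SGD(list(model.parameters()) + list(attn.parameters()), lr=0.1). A softmax over the batch dimension would let training emphasize hard samples (pushing against underfitting) while bounding any single sample's influence (pushing against overfitting), which is one plausible reading of how such a scheme could reconcile the two.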

Cite this article

HU Hanqing, LI Zhengxun, WU Zhunan. Batch-attention: A method for reconciling overfitting and underfitting in deep learning[J]. Science & Technology Review, 2023, 41(13): 100-108. DOI: 10.3981/j.issn.1000-7857.2023.13.010
