In deep learning, most existing methods for improving model performance focus on the network architecture; however, improving a model's accuracy also requires attention to the characteristics of the data. In this paper, batch-attention, a new training framework for deep learning models, is proposed, which modifies the standard training procedure at the data level. It is shown that the method can balance overfitting and underfitting in deep learning models. Experimental comparisons using ResNet-34, TNT, and EfficientNet-B7 on the CIFAR-10 and CIFAR-100 data sets show that batch-attention improves both test-set accuracy and F1-score over the baseline models. In addition, the mechanism of batch-attention is further analyzed in follow-up experiments.
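The abstract names the framework but does not spell out its mechanism. As a minimal, hypothetical sketch of one plausible reading, attention computed across the samples of a mini-batch (rather than within a single input) could look like the following PyTorch fragment; the module name BatchAttention, the single-head design, and the placement on pooled features are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: self-attention applied across the samples of a
# mini-batch, so each sample's features are re-weighted by batch context.
# The paper's exact formulation is not given in the abstract; every name
# and design choice here is an assumption for illustration only.
import torch
import torch.nn as nn


class BatchAttention(nn.Module):
    """Single-head attention over the batch dimension of a feature matrix."""

    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). The attention matrix is (batch, batch): each
        # sample attends to every other sample in the same mini-batch.
        q, k, v = self.query(x), self.key(x), self.value(x)
        attn = torch.softmax(q @ k.transpose(0, 1) * self.scale, dim=-1)
        return x + attn @ v  # residual keeps the per-sample baseline path


if __name__ == "__main__":
    feats = torch.randn(32, 512)   # e.g. pooled backbone features
    layer = BatchAttention(dim=512)
    out = layer(feats)             # same shape: (32, 512)
    print(out.shape)
```

Because each sample's output in such a layer depends on the other samples in its batch, a construction along these lines is naturally described as changing training at the data level, which is consistent with how the abstract frames the method.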