Speaker recognition method based on combination of kernel functions of SVM

  • FAN Chijie
  • SI Qiaomei
  • XU Yan
  • ZHANG Dan
  • CAI Chunhua
  • YU Xu
  • 1. School of Engineering, Mudanjiang Normal University, Mudanjiang 157012, China;
    2. School of Information and Electrical Engineering, Mudanjiang University, Mudanjiang 157011, China;
    3. School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China

Received date: 2014-06-20

Revised date: 2014-11-07

Online published: 2015-02-02


In speaker recognition systems, when the distribution of the original data is unknown, an inappropriate choice of kernel function leads to poor support vector machine (SVM) learning performance. This paper therefore proposes a speaker recognition method based on a multi-grid parameter search and a combination of kernel functions. First, the method constructs a hybrid kernel function as a linearly weighted combination of polynomial and RBF kernels. Then a multi-grid search adjusts the weights so that the hybrid kernel adapts to the current data distribution. Finally, an SVM classifier is trained to obtain the classification results. Simulation experiments on the TIMIT dataset and on noisy datasets show that SVM classifiers using the combined kernel achieve better recognition performance than those using linear, polynomial, or RBF kernels alone. The proposed method can therefore effectively improve the performance of speaker recognition systems.
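The hybrid kernel and weight search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the kernel parameters or the details of the multi-grid search, so the convex weighting `weight * poly + (1 - weight) * rbf`, the coarse-to-fine refinement, and every name and default value below (`hybrid_kernel`, `multi_grid_search`, `degree`, `coef0`, `gamma`, `points`, `levels`) are assumptions made for illustration. The `score` argument stands in for whatever validation measure is used, for example cross-validated SVM accuracy.

```python
import math


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def hybrid_kernel(x, y, weight, degree=2, coef0=1.0, gamma=0.5):
    """Assumed form: convex combination of the two kernels, weight in [0, 1]."""
    return (weight * poly_kernel(x, y, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, y, gamma))


def multi_grid_search(score, lo=0.0, hi=1.0, points=5, levels=3):
    """Coarse-to-fine search for the weight that maximises score(weight).

    Each level evaluates an evenly spaced grid, then narrows the interval
    to one grid step on either side of the best point (an assumed scheme).
    """
    lo0, hi0 = lo, hi
    best = lo
    for _ in range(levels):
        step = (hi - lo) / (points - 1)
        grid = [lo + i * step for i in range(points)]
        best = max(grid, key=score)
        lo, hi = max(lo0, best - step), min(hi0, best + step)
    return best


if __name__ == "__main__":
    # Hypothetical score function used only to exercise the search.
    toy_score = lambda w: -(w - 0.3) ** 2
    print(multi_grid_search(toy_score))
    print(hybrid_kernel([1.0, 0.0], [1.0, 0.0], weight=0.5))
```

With the weight restricted to [0, 1], the combination is a non-negative sum of two valid kernels and is therefore itself a valid kernel, so it could be supplied to an SVM trainer as a custom or precomputed kernel.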

Cite this article

FAN Chijie, SI Qiaomei, XU Yan, ZHANG Dan, CAI Chunhua, YU Xu. Speaker recognition method based on combination of kernel functions of SVM[J]. Science & Technology Review, 2015, 33(1): 90-94. DOI: 10.3981/j.issn.1000-7857.2015.01.016

