Speaker recognition method based on combination of kernel functions of SVM
FAN Chijie1, SI Qiaomei1, XU Yan2, ZHANG Dan1, CAI Chunhua1, YU Xu3
1. School of Engineering, Mudanjiang Normal University, Mudanjiang 157012, China;
2. School of Information and Electrical Engineering, Mudanjiang University, Mudanjiang 157011, China;
3. School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
Abstract: In speaker recognition systems, when the distribution of the original data is unknown, an inappropriate choice of kernel function leads to poor support vector machine (SVM) learning performance. This paper therefore proposes a speaker recognition method based on a combination of kernel functions and a multi-grid search of parameters. First, the method constructs a hybrid kernel function as a linearly weighted combination of polynomial and radial basis function (RBF) kernels. Then, a multi-grid search adjusts the weights so that the hybrid kernel function adapts to the current data distribution. Finally, an SVM classifier is trained to obtain the classification results. Simulation experiments on TIMIT datasets and noisy datasets show that SVM classifiers using the combined kernel achieve better recognition performance than those using linear, polynomial, or RBF kernels alone. The proposed method can therefore effectively improve the performance of speaker recognition systems.
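The two core steps of the abstract, a weighted hybrid kernel and a coarse-to-fine search over its weight, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things are assumptions made only for illustration: the convex form `weight * poly + (1 - weight) * rbf`, the kernel parameters `degree`, `coef0`, and `gamma`, the function names, and the refinement scheme in `multigrid_search`. In practice `score` would be something like the cross-validated accuracy of an SVM trained with the hybrid kernel; here it is a toy stand-in.

```python
import math


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def hybrid_kernel(x, y, weight, degree=2, coef0=1.0, gamma=0.5):
    """Assumed convex combination of the two kernels.

    A non-negative weighted sum of valid kernels is itself a valid kernel,
    so any weight in [0, 1] is admissible.
    """
    return (weight * poly_kernel(x, y, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, y, gamma))


def multigrid_search(score, low=0.0, high=1.0, points=5, levels=3):
    """Coarse-to-fine search for the weight in [low, high] maximizing score.

    Each level evaluates an evenly spaced grid, then narrows the interval
    to one grid step on either side of the best weight found so far.
    """
    best_w, best_s = low, float("-inf")
    for _ in range(levels):
        step = (high - low) / (points - 1)
        for i in range(points):
            w = low + i * step
            s = score(w)
            if s > best_s:
                best_w, best_s = w, s
        low = max(0.0, best_w - step)
        high = min(1.0, best_w + step)
    return best_w, best_s


# Toy stand-in for cross-validated SVM accuracy, peaked at weight = 0.3.
best_weight, best_score = multigrid_search(lambda w: -(w - 0.3) ** 2)
```

With a real classifier, the Gram matrix built from `hybrid_kernel` at each candidate weight would be passed to an SVM that accepts a precomputed kernel, and `score` would return its validation accuracy.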
Received: 20 June 2014