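To address the problem that speaker recognition with support vector machines depends excessively on the choice of kernel function, a speaker recognition method based on a support vector machine (SVM) with a combined kernel function is proposed. The polynomial kernel and the radial basis function kernel are linearly weighted to construct a combined kernel that has the advantages of both global and local kernels. A multi-grid search adjusts the weight coefficients so that the combined kernel suits the current data distribution and determines the optimal parameters of the combined-kernel SVM, achieving effective speaker recognition. Simulation experiments on the TIMIT dataset and on noisy datasets show that speaker recognition with the combined-kernel SVM clearly outperforms recognition with a single polynomial kernel, radial basis function kernel, or linear kernel.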
In speaker recognition systems, when the original data distribution is unknown, an inappropriate choice of kernel function degrades the learning performance of a support vector machine (SVM). This paper therefore proposes a speaker recognition method based on a combination of kernel functions and a multi-grid search over its parameters. First, the method constructs a hybrid kernel function by linearly weighting a polynomial kernel and a radial basis function (RBF) kernel. Then a multi-grid search adjusts the weights so that the hybrid kernel function adapts to the current data distribution. Finally, an SVM classifier is trained to obtain the classification results. Simulation experiments on the TIMIT dataset and on noisy datasets show that SVM classifiers using the combined kernel function achieve better recognition performance than those using a linear, polynomial, or RBF kernel alone. The proposed method can therefore effectively improve the performance of speaker recognition systems.
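The hybrid kernel and weight search described in the abstract can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the convex form `weight * poly + (1 - weight) * rbf`, the default values of `degree`, `coef0`, and `gamma`, and the coarse-to-fine refinement in `coarse_to_fine_search` are all assumptions, since the abstract does not specify them. The `score` argument is a hypothetical callable that, in practice, would return something like cross-validated recognition accuracy for a given weight.

```python
import math


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial (global) kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """RBF (local) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def hybrid_kernel(x, y, weight, degree=2, coef0=1.0, gamma=0.5):
    """Linearly weighted combination of the two kernels (assumed convex)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * poly_kernel(x, y, degree, coef0)
            + (1.0 - weight) * rbf_kernel(x, y, gamma))


def gram_matrix(samples, weight, **params):
    """Kernel matrix that an SVM solver accepting precomputed kernels could use."""
    return [[hybrid_kernel(a, b, weight, **params) for b in samples]
            for a in samples]


def coarse_to_fine_search(score, low=0.0, high=1.0, points=5, levels=3):
    """Evaluate score on a grid, then repeatedly refine around the best weight.

    An assumed stand-in for the paper's multi-grid search.
    """
    lower_bound, upper_bound = low, high
    best_weight, best_score = low, float("-inf")
    for _ in range(levels):
        step = (high - low) / (points - 1)
        for i in range(points):
            weight = low + i * step
            current = score(weight)
            if current > best_score:
                best_weight, best_score = weight, current
        # Shrink the search window to one grid step on each side of the best weight.
        low = max(lower_bound, best_weight - step)
        high = min(upper_bound, best_weight + step)
    return best_weight, best_score
```

A weighted sum of two valid kernels with non-negative weights is itself a valid kernel, so the resulting Gram matrix remains symmetric and positive semidefinite. Setting `weight` to 1 or 0 recovers the pure polynomial or pure RBF kernel, respectively.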