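Experiment: Building a Five-Word Recognition System (Steps and Results)

(PS: This is only an attempt; correctness is not guaranteed.)
System overview:
HTK is the Hidden Markov Model Toolkit, developed by the Cambridge University Engineering Department. The toolkit is intended for building and using hidden Markov models. See: http://htk.eng.cam.ac.uk/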
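Construction steps:
a)      Training corpus creation: record each element of the vocabulary several times and label each recording with the corresponding word;
b)      Acoustic analysis: convert the waveform data files into sequences of coefficient vectors;
c)      Model definition: define an HMM prototype for each element of the vocabulary;
d)      Model training: initialize and train each HMM using the training data;
e)      Task definition: define the grammar of the recognition system (what can be recognized);
f)      Recognition of unknown input signals;
g)      Evaluation: the performance of the recognition system can be evaluated on test data.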

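Setting up the working environment:
Create the following directory structure:
a)      data/: stores the training and test data (speech signals, labels, and so on). It contains two subdirectories, data/train/ and data/test/, which separate the recognition system's training data from its evaluation data;
b)      analysis/: stores the files for the acoustic analysis step;
c)      training/: stores the files for the initialization and training steps;
d)      model/: stores the files for the recognition system's models (HMMs);
e)      def/: stores the files for the task definition;
f)      test/: stores the files used for testing.
Files to be created later: analysis.conf   targetlist.txt  hmmlist.txt   trainlist.txt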
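
The directory layout above can be created in one pass. The following is a minimal Python sketch, not part of HTK; the function name create_workspace and the choice of root folder are assumptions made for illustration.

```python
from pathlib import Path

# Directory names are taken from the list above.
WORKSPACE_DIRS = [
    "data/train",
    "data/test",
    "analysis",
    "training",
    "model",
    "def",
    "test",
]


def create_workspace(root):
    """Create the working directories under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in WORKSPACE_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(path)
    return created
```

Calling create_workspace(".") from the project folder would create the seven directories; the subfolders used later in the commands (for example model/hmm0 or data/train/lab) still have to be added as needed.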

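Procedure:
1. Build the training data
a. Record the audio
HSLab name.sig
b. Label the signal
Mark the signal segments in HSLab

2. Acoustic analysis
a. Configuration parameters (analysis.conf)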
#
# Example of an acoustical analysis configuration file
#
SOURCEFORMAT = HTK # Gives the format of the speech files
TARGETKIND = MFCC_0_D_A # Identifier of the coefficients to use
# Unit = 0.1 micro-second :
WINDOWSIZE = 250000.0 # = 25 ms = length of a time frame
TARGETRATE = 100000.0 # = 10 ms = frame periodicity
NUMCEPS = 12 # Number of MFCC coeffs (here from c1 to c12)
USEHAMMING = T # Use of Hamming function for windowing frames
PREEMCOEF = 0.97 # Pre-emphasis coefficient
NUMCHANS = 26 # Number of filterbank channels
CEPLIFTER = 22 # Length of cepstral liftering
# The End
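
To make the units in this configuration concrete, the sketch below converts the HTK time values (expressed in units of 100 ns, i.e. 0.1 microsecond) to milliseconds and works out the feature vector size implied by MFCC_0_D_A with NUMCEPS = 12. The helper names are assumptions for illustration only.

```python
HTK_UNIT_SECONDS = 100e-9  # HTK expresses times in units of 100 ns


def htk_units_to_ms(value):
    """Convert a duration given in HTK units (100 ns) to milliseconds."""
    return value * HTK_UNIT_SECONDS * 1000.0


def feature_dimension(num_ceps, with_c0=True, with_delta=True, with_accel=True):
    """Vector size for MFCCs with optional c0 (_0), delta (_D) and acceleration (_A)."""
    static = num_ceps + (1 if with_c0 else 0)
    blocks = 1 + (1 if with_delta else 0) + (1 if with_accel else 0)
    return static * blocks


window_ms = htk_units_to_ms(250000.0)  # WINDOWSIZE, about 25 ms
shift_ms = htk_units_to_ms(100000.0)   # TARGETRATE, about 10 ms
dim = feature_dimension(12)            # (12 + c0) * 3 = 39
```

The value 39 is the one that appears as <VecSize> in the HMM prototype later on.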
b. Source/target list (targetlist.txt)
data/train/sig/name.sig data/train/mfcc/name.mfcc
etc...
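
Writing one source/target pair per recording by hand gets tedious. The following is a small Python sketch that builds the list from a folder of .sig files; the function names are hypothetical, and the paths follow the data/train/sig and data/train/mfcc layout shown in the example line.

```python
from pathlib import Path


def build_target_list(sig_dir, mfcc_dir):
    """Return 'source target' lines, one for each .sig file found in sig_dir."""
    lines = []
    for sig in sorted(Path(sig_dir).glob("*.sig")):
        mfcc = Path(mfcc_dir) / (sig.stem + ".mfcc")
        lines.append(f"{sig.as_posix()} {mfcc.as_posix()}")
    return lines


def write_target_list(lines, out_path):
    """Write the lines to out_path, ending the file with a newline."""
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, write_target_list(build_target_list("data/train/sig", "data/train/mfcc"), "targetlist.txt") would produce a file in the format above.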
c. Run the acoustic analysis with HCopy
>>>HCopy -A -D -C analysis.conf -S targetlist.txt
HCopy -A -D -C analysis.conf -S targetlist.txt
HTK Configuration Parameters[9]
Module/Tool     Parameter                  Value
#                 CEPLIFTER                       22
#                 NUMCHANS                      26
#                 PREEMCOEF               0.970000
#                 USEHAMMING                  TRUE
#                 NUMCEPS                       12
#                 TARGETRATE         100000.000000
#                 WINDOWSIZE         250000.000000
#                 TARGETKIND            MFCC_0_D_A
#                 SOURCEFORMAT                 HTK
HTK Configuration Parameters[9]
Module/Tool     Parameter                  Value
CEPLIFTER                     22
NUMCHANS                      26
PREEMCOEF               0.970000
USEHAMMING                  TRUE
NUMCEPS                       12
TARGETRATE         100000.000000
WINDOWSIZE         250000.000000
TARGETKIND            MFCC_0_D_A
SOURCEFORMAT                 HTK

3. Define the model
~o <VecSize> 39 <MFCC_0_D_A>
~h "label"
<BeginHMM>
<NumStates> 6
<State> 2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 3
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 4
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 5
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<TransP> 6
0.0 0.5 0.5 0.0 0.0 0.0
0.0 0.4 0.3 0.3 0.0 0.0
0.0 0.0 0.4 0.3 0.3 0.0
0.0 0.0 0.0 0.4 0.3 0.3
0.0 0.0 0.0 0.0 0.5 0.5
0.0 0.0 0.0 0.0 0.0 0.0
<EndHMM>
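
Because one prototype is needed per word and the files differ only in the name after ~h, they can be generated instead of copied by hand. The sketch below reproduces the layout of the prototype above (zero means, unit variances, the same transition pattern). The function names are assumptions. It writes each mean and variance vector on a single line rather than three lines of 13 values, on the assumption that HTK reads these values as whitespace-separated numbers.

```python
def transition_row(i, n):
    """Row i of the n x n transition matrix, following the pattern used above."""
    row = [0.0] * n
    if i == 0:            # entry state: go to the first or second emitting state
        row[1], row[2] = 0.5, 0.5
    elif i < n - 2:       # emitting state: stay, move on, or skip one state
        row[i], row[i + 1], row[i + 2] = 0.4, 0.3, 0.3
    elif i == n - 2:      # last emitting state: stay or leave the model
        row[i], row[i + 1] = 0.5, 0.5
    return row            # exit state (i == n - 1) stays all zero


def make_prototype(name, vec_size=39, num_states=6, parm_kind="MFCC_0_D_A"):
    """Return the text of an HMM prototype with zero means and unit variances."""
    def fmt(values):
        return " ".join(f"{v:.1f}" for v in values)

    lines = [
        f"~o <VecSize> {vec_size} <{parm_kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    for state in range(2, num_states):  # emitting states are 2 .. num_states - 1
        lines += [
            f"<State> {state}",
            f"<Mean> {vec_size}",
            fmt([0.0] * vec_size),
            f"<Variance> {vec_size}",
            fmt([1.0] * vec_size),
        ]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        lines.append(fmt(transition_row(i, num_states)))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"
```

For the default of 6 states the generated transition matrix is identical to the one listed above; other state counts are untested here.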
4. Train the model
a. Initialization
>HInit -A -D -T 1 -S trainlist.txt -M model/hmm0 -H  model/proto/hmm_name -l name -L data/train/lab/ name
……
No HTK Configuration Parameters Set
Initialising  HMM name . . .
States   :   2  3  4  5 (width)
Mixes  s1:   1  1  1  1 ( 39  )
Num Using:   0  0  0  0
Parm Kind:  MFCC_D_A_0
Number of owners = 1
SegLab   : name
maxIter  :  20
epsilon  :  0.000100
minSeg   :  3
Updating :  Means Variances MixWeights/DProbs TransProbs
– system is PLAIN
10 Observation Sequences Loaded
Starting Estimation Process
Iteration 1: Average LogP = -4510.80420
Iteration 2: Average LogP = -4420.17188  Change =    90.63232
Iteration 3: Average LogP = -4408.49854  Change =    11.67334
Iteration 4: Average LogP = -4403.09863  Change =     5.39990
Iteration 5: Average LogP = -4400.53662  Change =     2.56201
Iteration 6: Average LogP = -4398.79785  Change =     1.73877
Iteration 7: Average LogP = -4398.36572  Change =     0.43213
Iteration 8: Average LogP = -4398.14648  Change =     0.21924
Iteration 9: Average LogP = -4397.99072  Change =     0.15576
Iteration 10: Average LogP = -4397.79785  Change =     0.19287
Iteration 11: Average LogP = -4397.42090  Change =     0.37695
Iteration 12: Average LogP = -4397.29492  Change =     0.12598
Iteration 13: Average LogP = -4397.29492  Change =     0.00000
Estimation converged at iteration 14
Output written to directory model/hmm0
No HTK Configuration Parameters Set

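b. Training
The HRest iterations (that is, the iterations within the current re-estimation pass) are displayed on screen, and the change measure indicates convergence. Once this measure stops decreasing (in absolute value) from one HRest iteration to the next, the process should be stopped.
???Questions: How does HRest training choose the best converged model, and how should the number of iterations be decided?
After 12 iterations the experiment still had not converged well. I suspect this is related to the HMM model definition.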

>>>HRest -A -D -T 1 -S trainlist.txt -M model/hmm1 -H model/hmm0/hmm_name -l name -L data/train/lab/ name
……
No HTK Configuration Parameters Set
Reestimating HMM name . . .
States   :   2  3  4  5 (width)
Mixes  s1:   1  1  1  1 ( 39  )
Num Using:   0  0  0  0
Parm Kind:  MFCC_D_A_0
Number of owners = 1
SegLab   :  name
MaxIter  :  20
Epsilon  :  0.000100
Updating :  Transitions Means Variances
– system is PLAIN
10 Examples loaded, Max length = 69, Min length = 43
Ave LogProb at iter 1 = -4397.05420 using 10 examples
Ave LogProb at iter 2 = -4396.95020 using 10 examples  change =    0.10400
Ave LogProb at iter 3 = -4396.83057 using 10 examples  change =    0.11963
Ave LogProb at iter 4 = -4396.80225 using 10 examples  change =    0.02832
Ave LogProb at iter 5 = -4396.80127 using 10 examples  change =    0.00098
Ave LogProb at iter 6 = -4396.80029 using 10 examples  change =    0.00098
Ave LogProb at iter 7 = -4396.80078 using 10 examples  change =   -0.00049
Ave LogProb at iter 8 = -4396.79980 using 10 examples  change =    0.00098
Ave LogProb at iter 9 = -4396.79980 using 10 examples  change =    0.00000
Estimation converged at iteration 9
No HTK Configuration Parameters Set
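
To follow the change values in a log like the one above without reading it line by line, the output can be parsed. The following Python sketch extracts the per-iteration average log-probability and reports the first iteration whose absolute change falls below a threshold. The absolute-change test is a simplifying assumption for illustration and is not necessarily the criterion HRest applies internally; the function names are hypothetical.

```python
import re

# Matches lines such as: "Ave LogProb at iter 2 = -4396.95020 using 10 examples"
LOGPROB_RE = re.compile(r"Ave LogProb at iter\s+(\d+)\s*=\s*(-?\d+\.\d+)")


def parse_hrest_log(text):
    """Return [(iteration, average log-probability), ...] found in HRest output."""
    return [(int(it), float(lp)) for it, lp in LOGPROB_RE.findall(text)]


def first_converged_iteration(history, epsilon=1e-4):
    """First iteration whose absolute change from the previous one is below epsilon."""
    for (_, prev), (it, cur) in zip(history, history[1:]):
        if abs(cur - prev) < epsilon:
            return it
    return None  # no iteration met the threshold
```

Applied to the log above with epsilon = 0.0001, this sketch returns 9, which agrees with the "Estimation converged at iteration 9" line.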

5. Define the task
a. Grammar (gram.txt)
/*
* Task grammar
*/
$WORD = NAME1|NAME2|……;
( { START_SIL } [ $WORD ] { END_SIL } )
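QQQ: Recognizing several consecutive words would require changing the grammar network. I do not know how to change it, so only one word can be recognized at a time.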
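
One possible direction for the question above, offered as an untested suggestion: in the HParse grammar notation, square brackets mark an optional item, braces mark zero or more repetitions, and angle brackets mark one or more repetitions, so replacing [ $WORD ] with < $WORD > should allow a sequence of words. The Python sketch below only generates the grammar text for both variants. The word names passed in are placeholders, the function name is an assumption, and whether the trained whole-word models cope with connected speech is a separate matter not checked here.

```python
def build_grammar(words, allow_sequence=False):
    """Return HParse grammar text for one word or, optionally, a word sequence."""
    alternatives = " | ".join(words)
    # [ ... ] = optional single word, < ... > = one or more words
    body = "< $WORD >" if allow_sequence else "[ $WORD ]"
    return (
        f"$WORD = {alternatives};\n"
        f"( {{ START_SIL }} {body} {{ END_SIL }} )\n"
    )
```

With hypothetical words NAME1 and NAME2, build_grammar(["NAME1", "NAME2"], allow_sequence=True) yields a second line of ( { START_SIL } < $WORD > { END_SIL } ), which would then be compiled with HParse in the same way as the original grammar.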
b. Dictionary (dict.txt)
NAME  [name]  name
……

START_SIL [sil] sil
END_SIL [sil] sil
c. Build the word network with HParse and check it with HSGen (which generates random sentences from the network)
HParse -A -D -T 1 def/gram.txt def/net.slf
HSGen -A -D -n 10 -s def/net.slf def/dict.txt
!! dict.txt must end with a newline character !!!
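
Since a missing final newline is easy to overlook in an editor, the check can be automated. The following is a minimal Python sketch; the function name is an assumption.

```python
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline if the file does not end with one; return True if changed."""
    p = Path(path)
    data = p.read_bytes()
    if data.endswith(b"\n"):
        return False
    p.write_bytes(data + b"\n")
    return True
```

Running ensure_trailing_newline("def/dict.txt") before HSGen or HVite leaves a correct file untouched and repairs one that lacks the final newline.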

6. Recognize an unknown signal with HVite
>>>HSLab test.sig
>>>HCopy -A -D -C analysis.conf -S test_targetlist.txt
HCopy -A -D -C analysis.conf -S test_targetlist.txt
HTK Configuration Parameters[9]
Module/Tool     Parameter                  Value
#                 CEPLIFTER                     22
#                 NUMCHANS                      26
#                 PREEMCOEF               0.970000
#                 USEHAMMING                  TRUE
#                 NUMCEPS                       12
#                 TARGETRATE         100000.000000
#                 WINDOWSIZE         250000.000000
#                 TARGETKIND            MFCC_0_D_A
#                 SOURCEFORMAT                 HTK
HTK Configuration Parameters[9]
Module/Tool     Parameter                  Value
CEPLIFTER                     22
NUMCHANS                      26
PREEMCOEF               0.970000
USEHAMMING                  TRUE
NUMCEPS                       12
TARGETRATE         100000.000000
WINDOWSIZE         250000.000000
TARGETKIND            MFCC_0_D_A
SOURCEFORMAT                 HTK
D:\My_Graduation_Project\Demo>HVite -A -D -T 1 -H model/hmm6/hmm_name1 -H model/hmm6/hmm_name2 …… -i reco_test_4.mlf -w def/net.slf def/dict.txt hmmlist.txt data/test/test.mfcc
……
No HTK Configuration Parameters Set
Read 6 physical / 6 logical HMMs
Read lattice with 11 nodes / 18 arcs
Created network with 20 nodes / 27 links
File: data/test/test_4.mfcc
START_SIL TAIWAN END_SIL END_SIL END_SIL END_SIL  ==  [83 frames] -93.5447
[Ac=-7764.2 LM=0.0] (Act=17.7)
No HTK Configuration Parameters Set
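
The result line in the trace above can be picked apart programmatically. The following Python sketch extracts the recognized words (dropping the silence labels), the frame count, and the score. The regular expression is written against the single line shown here and may need adjusting for other HVite output; the function name is an assumption. The score appears to be a per-frame average, since 83 × -93.5447 is approximately the Ac value of -7764.2.

```python
import re

# Matches lines such as: "START_SIL TAIWAN END_SIL  ==  [83 frames] -93.5447"
RESULT_RE = re.compile(
    r"^(?P<labels>.+?)\s+==\s+\[(?P<frames>\d+) frames\]\s+(?P<score>-?\d+(?:\.\d+)?)"
)


def parse_hvite_line(line, silence=("START_SIL", "END_SIL")):
    """Parse one HVite trace result line; return None if it does not match."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    labels = m.group("labels").split()
    return {
        "words": [w for w in labels if w not in silence],
        "frames": int(m.group("frames")),
        "score_per_frame": float(m.group("score")),
    }
```

For the line above, the parsed word list contains only TAIWAN, with 83 frames and a per-frame score of -93.5447.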

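3 thoughts on "Experiment: Building a Five-Word Recognition System (Steps and Results)"

  1. I used the open-source HTK.
    HTK is a hidden Markov model toolkit written in C and developed at the University of Cambridge in the UK. It is mainly used in speech recognition and speech synthesis research, and has also been applied in other fields such as character recognition and DNA sequencing. HTK is a heavyweight HMM implementation.
    HTK home page: http://htk.eng.cam.ac.uk/

  2. Hello. First, thank you for writing this article; it gave me a fairly intuitive picture of the concrete steps in speech recognition. One thing I did not understand, though: what language is the whole model coded in?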
