
pocketsphinx_continuous is unable to convert speech to text #199

@ghost

Description

Hi all,
I recently compiled PocketSphinx for a MIPS target, recorded a 16-bit, 16000 Hz mono audio file, and then ran pocketsphinx_continuous with the following command:
pocketsphinx_continuous -hmm en-us/ -lm TAR9897/9897.lm -dict TAR9897/9897.dic -infile input.wav

However, no text is recognized from the audio, even though the same steps work on my laptop. Please help.
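Since the laptop and the MIPS board behave differently, it helps to confirm that input.wav really has the format described above. The following is a minimal sketch in Python, assuming the file is a standard RIFF/WAV. `check_wav_format` is a hypothetical helper, not part of PocketSphinx. It uses only the standard-library `wave` module, which raises `wave.Error` for WAV encodings it cannot parse.

```python
import sys
import wave

# Format described in the report (16-bit, 16000 Hz, mono); 16000 Hz also
# matches -samprate in the configuration dump below.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_RATE = 16000      # Hz


def check_wav_format(source):
    """Return (problems, duration_seconds) for a WAV path or binary file object.

    The wave module raises wave.Error if the file is not a PCM WAV it can parse.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.getnframes()

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channels: {channels}, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width: {width} bytes, expected {EXPECTED_SAMPLE_WIDTH}")
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate: {rate} Hz, expected {EXPECTED_RATE}")
    duration = frames / rate if rate else 0.0
    return problems, duration


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_wav.py input.wav
    for path in sys.argv[1:]:
        problems, duration = check_wav_format(path)
        status = "OK" if not problems else "; ".join(problems)
        print(f"{path}: {duration:.2f} s, {status}")
```

Running it on the file recorded on the board and on the laptop's copy also shows whether both recordings have a similar duration.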

Here are the logs:

INFO: pocketsphinx.c(153): Parsed model-specific feature parameters from en-us//feat.params
Current configuration:
[NAME]			[DEFLT]		[VALUE]
-agc			none		none
-agcthresh		2.0		2.000000e+00
-allphone				
-allphone_ci		yes		yes
-alpha			0.97		9.700000e-01
-ascale			20.0		2.000000e+01
-aw			1		1
-backtrace		no		no
-beam			1e-48		1.000000e-48
-bestpath		yes		yes
-bestpathlw		9.5		9.500000e+00
-ceplen			13		13
-cmn			live		batch
-cmninit		40,3,-1		41.00,-5.29,-0.12,5.09,2.48,-4.07,-1.37,-1.78,-5.08,-2.05,-6.45,-1.42,1.17
-compallsen		no		no
-dict					TAR9897/9897.dic
-dictcase		no		no
-dither			no		no
-doublebw		no		no
-ds			1		1
-fdict					
-feat			1s_c_d_dd	1s_c_d_dd
-featparams				
-fillprob		1e-8		1.000000e-08
-frate			100		100
-fsg					
-fsgusealtpron		yes		yes
-fsgusefiller		yes		yes
-fwdflat		yes		yes
-fwdflatbeam		1e-64		1.000000e-64
-fwdflatefwid		4		4
-fwdflatlw		8.5		8.500000e+00
-fwdflatsfwin		25		25
-fwdflatwbeam		7e-29		7.000000e-29
-fwdtree		yes		yes
-hmm					en-us/
-input_endian		little		little
-jsgf					
-keyphrase				
-kws					
-kws_delay		10		10
-kws_plp		1e-1		1.000000e-01
-kws_threshold		1e-30		1.000000e-30
-latsize		5000		5000
-lda					
-ldadim			0		0
-lifter			0		22
-lm					TAR9897/9897.lm
-lmctl					
-lmname					
-logbase		1.0001		1.000100e+00
-logfn					
-logspec		no		no
-lowerf			133.33334	1.300000e+02
-lpbeam			1e-40		1.000000e-40
-lponlybeam		7e-29		7.000000e-29
-lw			6.5		6.500000e+00
-maxhmmpf		30000		30000
-maxwpf			-1		-1
-mdef					
-mean					
-mfclogdir				
-min_endfr		0		0
-mixw					
-mixwfloor		0.0000001	1.000000e-07
-mllr					
-mmap			yes		yes
-ncep			13		13
-nfft			512		512
-nfilt			40		25
-nwpen			1.0		1.000000e+00
-pbeam			1e-48		1.000000e-48
-pip			1.0		1.000000e+00
-pl_beam		1e-10		1.000000e-10
-pl_pbeam		1e-10		1.000000e-10
-pl_pip			1.0		1.000000e+00
-pl_weight		3.0		3.000000e+00
-pl_window		5		5
-rawlogdir				
-remove_dc		no		no
-remove_noise		yes		yes
-remove_silence		yes		yes
-round_filters		yes		yes
-samprate		16000		1.600000e+04
-seed			-1		-1
-sendump				
-senlogdir				
-senmgau				
-silprob		0.005		5.000000e-03
-smoothspec		no		no
-svspec					0-12/13-25/26-38
-tmat					
-tmatfloor		0.0001		1.000000e-04
-topn			4		4
-topn_beam		0		0
-toprule				
-transform		legacy		dct
-unit_area		yes		yes
-upperf			6855.4976	6.800000e+03
-uw			1.0		1.000000e+00
-vad_postspeech		50		50
-vad_prespeech		20		20
-vad_startspeech	10		10
-vad_threshold		3.0		3.000000e+00
-var					
-varfloor		0.0001		1.000000e-04
-varnorm		no		no
-verbose		no		no
-warp_params				
-warp_type		inverse_linear	inverse_linear
-wbeam			7e-29		7.000000e-29
-wip			0.65		6.500000e-01
-wlen			0.025625	2.562500e-02
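The configuration above shows `-input_endian little`, which is also the default. MIPS processors exist in both big-endian (mips) and little-endian (mipsel) variants, and a WAV file stores its PCM samples little-endian. One possibility worth ruling out on the board is therefore a byte-order mismatch. The following Python sketch illustrates what such a mismatch does to the samples. `decode_pcm16` is a hypothetical helper, not part of PocketSphinx, and `sys.byteorder` reports the byte order of whichever machine runs the script.

```python
import struct
import sys


def decode_pcm16(raw: bytes, byteorder: str) -> list[int]:
    """Decode signed 16-bit PCM samples using the given byte order.

    byteorder must be "little" or "big"; a trailing odd byte is ignored.
    """
    if byteorder not in ("little", "big"):
        raise ValueError(f"unknown byte order: {byteorder!r}")
    prefix = "<" if byteorder == "little" else ">"
    count = len(raw) // 2
    return list(struct.unpack(f"{prefix}{count}h", raw[: count * 2]))


# Byte order of the machine running this script; only meaningful when
# it is run on the MIPS board itself.
print("host byte order:", sys.byteorder)

# WAV stores PCM samples little-endian, so these two bytes encode 1000.
sample = b"\xe8\x03"
print(decode_pcm16(sample, "little"))  # [1000]
print(decode_pcm16(sample, "big"))     # [-6141]
```

If the board turns out to be big-endian, it may be worth checking how the build was configured for the target's byte order.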

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='batch', VARNORM='no', AGC='none'
INFO: acmod.c(162): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: en-us//mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(337): Reading binary model definition: en-us//mdef
INFO: bin_mdef.c(517): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(149): Reading HMM transition probability matrices: en-us//transition_matrices
INFO: acmod.c(113): Attempting to use PTM computation module
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: en-us//means
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(127): Reading mixture gaussian parameter: en-us//variances
INFO: ms_gauden.c(242): 42 codebook, 3 feature, size: 
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(244):  128x13
INFO: ms_gauden.c(304): 222 variance values floored
INFO: ptm_mgau.c(475): Loading senones from dump file en-us//sendump
INFO: ptm_mgau.c(499): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(562): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(594): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(837): Maximum top-N: 4
INFO: phone_loop_search.c(114): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4130 * 20 bytes (80 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: TAR9897/9897.dic
INFO: dict.c(213): Dictionary size 29, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(336): 29 words read
INFO: dict.c(358): Reading filler dictionary: en-us//noisedict
INFO: dict.c(213): Dictionary size 34, allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(361): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(193): LM of order 3
INFO: ngram_model_trie.c(195): #1-grams: 26
INFO: ngram_model_trie.c(195): #2-grams: 41
INFO: ngram_model_trie.c(195): #3-grams: 36
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
INFO: ngram_search_fwdtree.c(74): Initializing search tree
INFO: ngram_search_fwdtree.c(101): 26 unique initial diphones
INFO: ngram_search_fwdtree.c(186): Creating search channels
INFO: ngram_search_fwdtree.c(323): Max nonroot chan increased to 202
INFO: ngram_search_fwdtree.c(333): Created 26 root, 74 non-root channels, 5 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(307): pocketsphinx_continuous COMPILED ON: Mar 10 2020, AT: 16:01:07

INFO: cmn_live.c(120): Update from < 41.00 -5.29 -0.12  5.09  2.48 -4.07 -1.37 -1.78 -5.08 -2.05 -6.45 -1.42  1.17 >
INFO: cmn_live.c(138): Update to   < 43.50 12.27  5.78  3.19 -3.29  0.34 -5.26 -11.33  4.52  0.05 -2.95 11.94 -3.49 >
INFO: ngram_search_fwdtree.c(1550):      525 words recognized (6/fr)
INFO: ngram_search_fwdtree.c(1552):    27719 senones evaluated (292/fr)
INFO: ngram_search_fwdtree.c(1556):    15927 channels searched (167/fr), 2366 1st, 9410 last
INFO: ngram_search_fwdtree.c(1559):      724 words for which last channels evaluated (7/fr)
INFO: ngram_search_fwdtree.c(1561):      876 candidate words for entering last phone (9/fr)
INFO: ngram_search_fwdtree.c(1564): fwdtree 1.34 CPU 1.406 xRT
INFO: ngram_search_fwdtree.c(1567): fwdtree 1.37 wall 1.446 xRT
INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 10 words
INFO: ngram_search_fwdflat.c(948):      547 words recognized (6/fr)
INFO: ngram_search_fwdflat.c(950):    13371 senones evaluated (141/fr)
INFO: ngram_search_fwdflat.c(952):    11350 channels searched (119/fr)
INFO: ngram_search_fwdflat.c(954):      879 words searched (9/fr)
INFO: ngram_search_fwdflat.c(957):      312 word transitions (3/fr)
INFO: ngram_search_fwdflat.c(960): fwdflat 0.45 CPU 0.474 xRT
INFO: ngram_search_fwdflat.c(963): fwdflat 0.48 wall 0.501 xRT
INFO: ngram_search.c(1250): lattice start node <s>.0 end node </s>.88
INFO: ngram_search.c(1276): Eliminated 0 nodes before end node
INFO: ngram_search.c(1381): Lattice has 211 nodes, 433 links
INFO: ps_lattice.c(1376): Bestpath score: -1763
INFO: ps_lattice.c(1380): Normalizer P(O) = alpha(</s>:88:93) = -105689
INFO: ps_lattice.c(1437): Joint P(O,S) = -110043 P(S|O) = -4354
INFO: ngram_search.c(872): bestpath 0.01 CPU 0.006 xRT
INFO: ngram_search.c(875): bestpath 0.01 wall 0.008 xRT

INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 1.34 CPU 1.421 xRT
INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 1.37 wall 1.461 xRT
INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 0.45 CPU 0.479 xRT
INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 0.48 wall 0.507 xRT
INFO: ngram_search.c(303): TOTAL bestpath 0.01 CPU 0.006 xRT
INFO: ngram_search.c(306): TOTAL bestpath 0.01 wall 0.008 xRT
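The statistics above make it possible to estimate how much audio the decoder actually searched. xRT is processing time divided by audio duration. The per-frame figures divide a total by the number of frames, which run at 100 frames per second according to `-frate`. The following Python sketch redoes that arithmetic using numbers copied from the log. The function names are illustrative only.

```python
FRAME_RATE = 100  # -frate from the configuration dump (frames per second)


def audio_seconds_from_xrt(processing_seconds: float, xrt: float) -> float:
    """xRT = processing time / audio time, so audio time = processing time / xRT."""
    return processing_seconds / xrt


def audio_seconds_from_per_frame(total: int, per_frame: int) -> float:
    """Estimate the frame count as total / per-frame average, then convert to seconds."""
    return total / per_frame / FRAME_RATE


# "fwdtree 1.34 CPU 1.406 xRT"
print(f"{audio_seconds_from_xrt(1.34, 1.406):.2f} s")       # prints 0.95 s
# "27719 senones evaluated (292/fr)"
print(f"{audio_seconds_from_per_frame(27719, 292):.2f} s")  # prints 0.95 s
```

Both estimates agree on roughly 0.95 s for the single utterance that was decoded. If input.wav contains much more speech than that, comparing the two figures may help narrow down where the audio is being lost.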
 
