Commit d0c41e4

Merge pull request #292 from cmusphinx/pocketsphinx-python-compat
Add compatibility with pocketsphinx-python 0.1.5
2 parents: 7ff1d2e + 9c8012d

File tree

13 files changed: +665 -15 lines

CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ else()
  add_subdirectory(doxygen)
  add_subdirectory(include)
  add_subdirectory(programs)
+ add_subdirectory(examples)
  if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME AND BUILD_TESTING)
    add_subdirectory(test)
  endif()

cython/README.md

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
PocketSphinx 5.0.0 release candidate 3
======================================

This is PocketSphinx, one of Carnegie Mellon University's open source large
vocabulary, speaker-independent continuous speech recognition engines.

Although this was at one point a research system, active development
has largely ceased and it has become very, very far from the state of
the art. I am making a release because people are nonetheless using
it, and there are a number of historical errors in the build system
and API which needed to be corrected.

The version number is strangely large because there was a "release"
that people are using called 5prealpha, and we will use proper
[semantic versioning](https://semver.org/) from now on.

**Please see the LICENSE file for terms of use.**
Installation
------------

You should be able to install this with pip for recent platforms and
versions of Python:

    pip3 install pocketsphinx5

Alternatively, you can compile it from the source tree. I highly
suggest doing this in a virtual environment (replace
`~/ve_pocketsphinx` with the virtual environment you wish to create),
from the top-level directory:

    python3 -m venv ~/ve_pocketsphinx
    . ~/ve_pocketsphinx/bin/activate
    pip3 install .

On GNU/Linux and maybe other platforms, you must have
[PortAudio](http://www.portaudio.com/) installed for the `LiveSpeech`
class to work (we may add a fall-back to `sox` in the near future).
On Debian-like systems this can be achieved by installing the
`libportaudio2` package:

    sudo apt-get install libportaudio2
Usage
-----

See the [examples directory](../examples/) for a number of examples of
using the library from Python. You can also read the [documentation
for the Python API](https://pocketsphinx5.readthedocs.io) or [the C
API](https://cmusphinx.github.io/doc/pocketsphinx/).

This module also mostly supports the same API as the previous
[pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)
package, as described below.
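Beyond the iterator classes described below, the lower-level `Decoder` class is also available from the same module. The following is only a minimal sketch of batch-decoding a file with it; it assumes a 16 kHz, 16-bit mono `goforward.raw` in the current directory and that a plain `Decoder()` falls back to the bundled default model. Check the Python API documentation linked above for the exact signatures.

```python
from pocketsphinx5 import Decoder

# Assumption: with no arguments the decoder uses the default
# acoustic model, language model and dictionary.
decoder = Decoder()

# Feed the whole file as one utterance of raw 16-bit PCM.
with open("goforward.raw", "rb") as f:
    decoder.start_utt()
    decoder.process_raw(f.read(), False, True)  # no_search=False, full_utt=True
    decoder.end_utt()

hyp = decoder.hyp()
if hyp is not None:
    print(hyp.hypstr)  # e.g. "go forward ten meters"
```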
### LiveSpeech

An iterator class for continuous recognition or keyword search from a
microphone. For example, to do speech-to-text with the default (some
kind of US English) model:

```python
from pocketsphinx5 import LiveSpeech
for phrase in LiveSpeech(): print(phrase)
```

Or to do keyword search:

```python
from pocketsphinx5 import LiveSpeech

speech = LiveSpeech(keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))
```

With your model and dictionary:

```python
from pocketsphinx5 import LiveSpeech, get_model_path

speech = LiveSpeech(
    sampling_rate=16000,  # optional
    hmm=get_model_path('en-us'),
    lm=get_model_path('en-us.lm.bin'),
    dic=get_model_path('cmudict-en-us.dict')
)
for phrase in speech:
    print(phrase)
```
### AudioFile

This is an iterator class for continuous recognition or keyword search
from a file. Currently it supports only raw, single-channel, 16-bit
PCM data in native byte order.

```python
from pocketsphinx5 import AudioFile
for phrase in AudioFile("goforward.raw"): print(phrase)  # => "go forward ten meters"
```
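Because only headerless PCM is accepted, audio in another container has to be converted first. As a minimal standard-library sketch (assuming a hypothetical `goforward.wav` that is already 16 kHz, 16-bit mono), you could strip the WAV header yourself:

```python
import wave

# Read the PCM frames out of a WAV file that already matches
# the expected format (16 kHz, 16-bit, mono).
with wave.open("goforward.wav", "rb") as w:
    assert w.getframerate() == 16000
    assert w.getnchannels() == 1 and w.getsampwidth() == 2
    pcm = w.readframes(w.getnframes())

# Write the frames back out as the headerless raw data AudioFile expects.
with open("goforward.raw", "wb") as f:
    f.write(pcm)
```

For audio that also needs resampling or channel mixing, an external tool such as `sox` or `ffmpeg` is a better fit than the `wave` module.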
An example of a keyword search:

```python
from pocketsphinx5 import AudioFile

audio = AudioFile("goforward.raw", keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True))  # => "[('forward', -617, 63, 121)]"
```

With your model and dictionary:

```python
from pocketsphinx5 import AudioFile, get_model_path

config = {
    'verbose': False,
    'audio_file': 'goforward.raw',
    'hmm': get_model_path('en-us'),
    'lm': get_model_path('en-us.lm.bin'),
    'dict': get_model_path('cmudict-en-us.dict')
}

audio = AudioFile(**config)
for phrase in audio:
    print(phrase)
```

Convert frames into time coordinates:

```python
from pocketsphinx5 import AudioFile

# Frames per second
fps = 100

for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s | %5s | %8s |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)

# ----------------------------
# | start |   end |     word |
# ----------------------------
# |  0.0s | 0.24s |      <s> |
# | 0.25s | 0.45s |    <sil> |
# | 0.46s | 0.63s |       go |
# | 0.64s | 1.16s |  forward |
# | 1.17s | 1.52s |      ten |
# | 1.53s | 2.11s |   meters |
# | 2.12s |  2.6s |     </s> |
# ----------------------------
```
Authors
-------

PocketSphinx is ultimately based on `Sphinx-II`, which in turn was
based on some older systems at Carnegie Mellon University, which were
released as free software under a BSD-like license thanks to the
efforts of Kevin Lenzo. Much of the decoder in particular was written
by Ravishankar Mosur (look for "rkm" in the comments), but various
other people contributed as well; see [the AUTHORS file](./AUTHORS)
for more details.

David Huggins-Daines (the author of this document) is
guilty^H^H^H^H^Hresponsible for creating `PocketSphinx`, which added
various speed and memory optimizations, fixed-point computation, JSGF
support, portability to various platforms, and a somewhat coherent
API. He then disappeared for a while.

Nickolay Shmyrev took over maintenance for quite a long time
afterwards, and a lot of code was contributed by Alexander Solovets,
Vyacheslav Klimkov, and others. The
[pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)
module was originally written by Dmitry Prazdnichnov.

Currently this is maintained by David Huggins-Daines again.
