PocketSphinx 5.0.0 release candidate 3
======================================

This is PocketSphinx, one of Carnegie Mellon University's open source large
vocabulary, speaker-independent continuous speech recognition engines.

Although this was at one point a research system, active development
has largely ceased, and it has fallen very, very far behind the state
of the art. I am making a release because people are nonetheless using
it, and there are a number of historical errors in the build system
and API that needed to be corrected.

The version number is strangely large because there was a "release"
that people are using called 5prealpha, and we will use proper
[semantic versioning](https://semver.org/) from now on.

**Please see the LICENSE file for terms of use.**

Installation
------------

You should be able to install this with pip on recent platforms and
versions of Python:

    pip3 install pocketsphinx5

Alternatively, you can compile it from the source tree. I highly
suggest doing this in a virtual environment (replace
`~/ve_pocketsphinx` with the virtual environment you wish to create).
From the top-level directory:

    python3 -m venv ~/ve_pocketsphinx
    . ~/ve_pocketsphinx/bin/activate
    pip3 install .

On GNU/Linux and maybe other platforms, you must have
[PortAudio](http://www.portaudio.com/) installed for the `LiveSpeech`
class to work (we may add a fall-back to `sox` in the near future).
On Debian-like systems this can be achieved by installing the
`libportaudio2` package:

    sudo apt-get install libportaudio2

Usage
-----

See the [examples directory](../examples/) for a number of examples of
using the library from Python. You can also read the [documentation
for the Python API](https://pocketsphinx5.readthedocs.io) or [the C
API](https://cmusphinx.github.io/doc/pocketsphinx/).

It also mostly supports the same APIs as the previous
[pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)
module, as described below.

### LiveSpeech

An iterator class for continuous recognition or keyword search from a
microphone. For example, to do speech-to-text with the default (some
kind of US English) model:

```python
from pocketsphinx5 import LiveSpeech

for phrase in LiveSpeech():
    print(phrase)
```

Or to do keyword search:

```python
from pocketsphinx5 import LiveSpeech

speech = LiveSpeech(keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    print(phrase.segments(detailed=True))
```

With your own model and dictionary:

```python
from pocketsphinx5 import LiveSpeech, get_model_path

speech = LiveSpeech(
    sampling_rate=16000,  # optional
    hmm=get_model_path('en-us'),
    lm=get_model_path('en-us.lm.bin'),
    dic=get_model_path('cmudict-en-us.dict')
)

for phrase in speech:
    print(phrase)
```

### AudioFile

This is an iterator class for continuous recognition or keyword search
from a file. Currently it supports only raw, single-channel, 16-bit
PCM data in native byte order.

```python
from pocketsphinx5 import AudioFile

for phrase in AudioFile("goforward.raw"):
    print(phrase)  # => "go forward ten meters"
```
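
Most audio files are not stored in this raw format. As a minimal
sketch of getting one there with only the standard library, the
frames of a WAV file that is already mono, 16-bit, and at the
decoder's sampling rate can be dumped to a raw file (`wav_to_raw` is
a hypothetical helper, not part of this package, and it does no
resampling or channel mixing):

```python
import wave

def wav_to_raw(wav_path, raw_path):
    """Dump the PCM frames of a mono, 16-bit WAV file to a raw file."""
    with wave.open(wav_path, "rb") as w:
        assert w.getnchannels() == 1, "expected mono audio"
        assert w.getsampwidth() == 2, "expected 16-bit samples"
        frames = w.readframes(w.getnframes())
    with open(raw_path, "wb") as f:
        f.write(frames)
```

For anything that needs resampling or channel mixing, an external
tool such as `sox` or `ffmpeg` is a better fit.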

An example of a keyword search:

```python
from pocketsphinx5 import AudioFile

audio = AudioFile("goforward.raw", keyphrase='forward', kws_threshold=1e-20)
for phrase in audio:
    print(phrase.segments(detailed=True))  # => "[('forward', -617, 63, 121)]"
```
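
Assuming the detailed segment tuple keeps the layout used by the
older pocketsphinx-python module (word, log-probability score, start
frame, end frame) and the default frame rate of 100 frames per
second, converting the frame numbers above to seconds is simple
arithmetic (`segment_times` is a hypothetical helper, not part of
this package):

```python
def segment_times(segment, frate=100):
    """Convert a (word, score, start_frame, end_frame) tuple to
    (word, start_seconds, end_seconds), assuming frate frames/second."""
    word, score, start_frame, end_frame = segment
    return word, start_frame / frate, end_frame / frate

print(segment_times(('forward', -617, 63, 121)))  # ('forward', 0.63, 1.21)
```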

With your own model and dictionary:

```python
from pocketsphinx5 import AudioFile, get_model_path

config = {
    'verbose': False,
    'audio_file': 'goforward.raw',
    'hmm': get_model_path('en-us'),
    'lm': get_model_path('en-us.lm.bin'),
    'dict': get_model_path('cmudict-en-us.dict')
}

audio = AudioFile(**config)
for phrase in audio:
    print(phrase)
```

To convert frames into time coordinates:

```python
from pocketsphinx5 import AudioFile

# Frames per second (the default frame rate is 100)
fps = 100

for phrase in AudioFile(frate=fps):
    print('-' * 28)
    print('| %5s | %5s | %8s |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)

# ----------------------------
# | start |   end |     word |
# ----------------------------
# |  0.0s | 0.24s |      <s> |
# | 0.25s | 0.45s |    <sil> |
# | 0.46s | 0.63s |       go |
# | 0.64s | 1.16s |  forward |
# | 1.17s | 1.52s |      ten |
# | 1.53s | 2.11s |   meters |
# | 2.12s |  2.6s |     </s> |
# ----------------------------
```

Authors
-------

PocketSphinx is ultimately based on `Sphinx-II`, which in turn was
based on some older systems at Carnegie Mellon University, which were
released as free software under a BSD-like license thanks to the
efforts of Kevin Lenzo. Much of the decoder in particular was written
by Ravishankar Mosur (look for "rkm" in the comments), but various
other people contributed as well; see [the AUTHORS file](./AUTHORS)
for more details.

David Huggins-Daines (the author of this document) is
guilty^H^H^H^H^Hresponsible for creating `PocketSphinx`, which added
various speed and memory optimizations, fixed-point computation, JSGF
support, portability to various platforms, and a somewhat coherent
API. He then disappeared for a while.

Nickolay Shmyrev took over maintenance for quite a long time
afterwards, and a lot of code was contributed by Alexander Solovets,
Vyacheslav Klimkov, and others. The
[pocketsphinx-python](https://github.com/bambocher/pocketsphinx-python)
module was originally written by Dmitry Prazdnichnov.

Currently this is maintained by David Huggins-Daines again.