├── LICENSE ├── README.rdoc ├── conf-english.yaml ├── conf-estonian.yaml ├── conf.yaml ├── config.ru ├── dummy.fsg ├── lib ├── handlers │ ├── handler.rb │ ├── jsgf_handler.rb │ ├── pgf_handler.rb │ └── prettifier.rb ├── raw_recognizer.rb └── server.rb ├── scripts ├── convert-gf-jsgf.sh ├── en-g2p.sh ├── fsg-to-dict.sh ├── fsg-to-dict_en.py ├── fsm2fsg.py ├── jsgf2fsg.sh ├── log2apps-png.sh ├── log2apps-txt.sh ├── log2models-txt.sh ├── log2png.sh └── settings.sh ├── unicorn.conf.rb └── views ├── index.md └── layout.erb /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2011 Tallinn Univeristy of Technology 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions 6 | are met: 7 | 1. Redistributions of source code must retain the above copyright 8 | notice, this list of conditions and the following disclaimer. 9 | 2. Redistributions in binary form must reproduce the above copyright 10 | notice, this list of conditions and the following disclaimer in the 11 | documentation and/or other materials provided with the distribution. 12 | 3. The name of the author may not be used to endorse or promote products 13 | derived from this software without specific prior written permission. 14 | 15 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR 16 | IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 17 | OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 18 | IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 19 | INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT 20 | NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 21 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 22 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 23 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 24 | THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 25 | -------------------------------------------------------------------------------- /README.rdoc: -------------------------------------------------------------------------------- 1 | = Introduction 2 | 3 | Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module. 4 | 5 | = Requirements 6 | 7 | * Ruby 1.8 8 | * Sinatra 9 | * Rack 10 | * Unicorn 11 | * PocketSphinx (NOTE: some features of the server require patched PocketSphinx, see below) 12 | * Some acoustic and language models for PocketSphinx 13 | 14 | 15 | = Installing 16 | 17 | == CMU Sphinx 18 | 19 | * Install sphinxbase from SVN (make, make install) 20 | 21 | === Apply PocketSphinx patch 22 | 23 | In cmusphinx/pocketsphinx directory: 24 | 25 | wget http://www.phon.ioc.ee/~tanela/ps_gst.patch 26 | patch -p0 -i ps_gst.patch 27 | 28 | 29 | Make sure you have GStreamer devevelopment packages installed. In Debian Squeeze: 30 | 31 | apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev 32 | 33 | And configure, make, make install as usual. 34 | 35 | == Install Ruby gems: Unicorn and Sinatra, UUID tools, JSON, locale 36 | 37 | This assumes you have ruby and rubygems installed. 
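A quick sanity check (assuming ruby and gem are already on your PATH; the server targets Ruby 1.8):

    ruby -v
    gem -v
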
38 | 39 | You might want to do this as root: 40 | 41 | gem install unicorn 42 | gem install sinatra 43 | gem install uuidtools 44 | gem install json 45 | gem install locale 46 | 47 | Install ruby-gstreamer package (might vary depending on your distribution): 48 | 49 | apt-get install libgst-ruby1.8 50 | 51 | == Additional tools 52 | 53 | English GF-based recognizer also need: 54 | 55 | * libtext-unidecode-perl 56 | * Phonetisaurus, Phonetisaurus prebuilt model for English (http://code.google.com/p/phonetisaurus/downloads/detail?name=g014b2b.tgz) 57 | * Python 58 | 59 | 60 | == Run ruby-pocketsphinx-server 61 | 62 | Clone the git repository: 63 | 64 | git clone git://github.com/alumae/ruby-pocketsphinx-server.git 65 | 66 | Before executing, add `/usr/local/lib` to the path where GStreamer plugins are looked for: 67 | 68 | export GST_PLUGIN_PATH=/usr/local/lib 69 | 70 | = Running 71 | 72 | unicorn -c unicorn.conf.rb config.ru 73 | 74 | If you installed Unicorn as a Ruby gem, you might need to execute: 75 | 76 | /var/lib/gems/1.8/bin/unicorn -c unicorn.conf.rb config.ru 77 | 78 | Test the default configuration (English WSJ language model with HUB4 acostic models), using a raw audio file in the PocketSphinx test directory 79 | (replace `$(POCKETSPHINX_DIR)` with the Pocketsphinx source directory): 80 | 81 | curl -T $(POCKETSPHINX_DIR)/test/data/wsj/n800_440c0207.wav -H "Content-Type: audio/x-wav" "http://localhost:8080/recognize" 82 | 83 | Response should be: 84 | 85 | { 86 | "status": 0, 87 | "hypotheses": [ 88 | { 89 | "utterance": "the agency isn't likely to take any action until the union's rank and file votes on the contract into three weeks" 90 | }, 91 | { 92 | "utterance": "the agency isn't likely to take any action until the union's rank and file puts on the contract into three weeks" 93 | }, 94 | { 95 | "utterance": "the agency isn't likely to take any action until the union's rank and file funds from the contract into three weeks" 96 | }, 97 | { 98 | "utterance": "the agency isn't likely to take any action until the union's rank and file for from the contract into three weeks" 99 | }, 100 | { 101 | "utterance": "the agency isn't likely to take any action until the union's rank and file parts of the contract into three weeks" 102 | } 103 | ], 104 | "id": "8686a37b5674cbdc63deb13f73de81a5" 105 | } 106 | 107 | 108 | = Configuration 109 | 110 | == Web service 111 | 112 | Unicorn configuration is in file unicorn.conf.rb. See http://unicorn.bogomips.org/examples/unicorn.conf.rb for 113 | more info. 114 | 115 | == Recognizer 116 | 117 | See conf.yaml 118 | 119 | = Using the web service 120 | 121 | Some of the more advanced examples below are specific to the Estonian configuration. 
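The server reads its settings from conf.yaml at startup, so one way to run with the Estonian setup is to copy conf-estonian.yaml over it before starting Unicorn. This is only a sketch: it assumes the Estonian acoustic and language models referenced in conf-estonian.yaml (the models/ directory) are available locally.

    cp conf-estonian.yaml conf.yaml
    unicorn -c unicorn.conf.rb config.ru
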
122 | 123 | ==Example 1 124 | 125 | Record a sentence to a wav file, in mono (hit Ctrl-C when done speaking): 126 | 127 | rec -c 1 sentence.wav 128 | 129 | 130 | Send it to the web service: 131 | 132 | curl -X POST --data-binary @sentence.wav -H "Content-Type: audio/x-wav" http://localhost:8080/recognize 133 | 134 | Output (encoded using json, the example uses Estonian models): 135 | 136 | { 137 | "status": 0, 138 | "hypotheses": [ 139 | { 140 | "utterance": [ 141 | "t\u00e4na on v\u00e4ljas \u00fcsna ilus ilm" 142 | ] 143 | } 144 | ], 145 | "id": "e30f54561135d681599915562d77d240" 146 | } 147 | 148 | == Example 2 149 | 150 | Record a raw file using arecord: 151 | 152 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > sentence2.raw 153 | 154 | Send it to web service: 155 | 156 | curl -X POST --data-binary @sentence2.raw -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize 157 | 158 | == Example 3 159 | 160 | Record a 5 second audio, pipe it to curl, which streams it directly to web service using PUT (and gets almost instant response): 161 | 162 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize 163 | 164 | 165 | = Support for JSGF grammars 166 | 167 | Users can use their own grammars to recognize certain sentences. The grammars should be in JSGF format. 168 | 169 | Example JSGF (let's call it robot.jsgf) 170 | 171 | #JSGF V1.0; 172 | 173 | grammar robot; 174 | 175 | public = (liigu | mine ) [ ( üks | kaks | kolm | neli | viis ) meetrit ] (edasi | tagasi); 176 | 177 | NB! Grammars should be in the same charset that the server is using for dictionary, which currently is latin-1 (sorry for that). 178 | 179 | You need to upload the JSGF file to somewhere where the server can fetch it, let's say http://www.example.com/robot.txt 180 | 181 | Now, let the server download and compile it: 182 | 183 | curl -vv http://localhost:8080/fetch-lm?url=http://www.example.com/robot.jsgf 184 | 185 | This should result in HTTP/1.1 200 OK. 186 | 187 | Now you can use the grammar to recognize a sentence that is accepted by the grammar: 188 | 189 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | \ 190 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize?lm=http://www.example.com/robot.jsgf 191 | 192 | Result: 193 | 194 | { 195 | "status": 0, 196 | "hypotheses": [ 197 | { 198 | "utterance": "mine viis meetrit tagasi" 199 | } 200 | ], 201 | "id": "9e3895e9ee0b5138e73c6fca30f51a58" 202 | } 203 | 204 | If you update the grammar on the server, you need to make the /fetch-jsgf request again, as the server doesn't check for changes every time 205 | a recognition request is done (for efficiency reasons). 206 | 207 | = Support for GF grammars 208 | 209 | GF (Grammatical Framework) grammars are supported. 210 | 211 | A GF grammar must be compiled into a .pgf file. To upload it to the server, use the fetch-pgf API call, e.g.: 212 | 213 | curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&lang=Est" 214 | 215 | The 'lang' attribute (defaults to 'Est') specifies input languages of the grammar. 
Many comma-separated languages can be specified, e.g lang=Est,Est2 216 | 217 | To recognize with a GF, use similar request as with JSGF, e.g.: 218 | 219 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf 220 | 221 | You can also specify output language(s) that will be used to linearize the raw recognition result, e.g.: 222 | 223 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&output-lang=App" 224 | 225 | Output: 226 | 227 | { 228 | "status": 0, 229 | "hypotheses": [ 230 | { 231 | "utterance": "viis minutit sekundites", 232 | "linearizations": [ 233 | { 234 | "lang": "App", 235 | "output": "5 ' IN \"" 236 | }, 237 | { 238 | "lang": "App", 239 | "output": "5 min IN s" 240 | } 241 | ] 242 | } 243 | ], 244 | "id": "83486feaca30995401ed4a66951a3f23" 245 | } 246 | 247 | Multiple output languages can be used, by using comma-separated values: "..&output-lang=App,App2" 248 | -------------------------------------------------------------------------------- /conf-english.yaml: -------------------------------------------------------------------------------- 1 | 2 | handlers: 3 | 4 | - name: PrettifyingHandler 5 | require: handlers/prettifier 6 | recognizer: 7 | hmm: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k 8 | lm: /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP 9 | dict: /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic 10 | bestpath: true 11 | maxwpf: 700 12 | fwdflat: true 13 | maxhmmpf: 12000 14 | wbeam: 1.0e-32 15 | beam: 1.0e-50 16 | pbeam: 1.0e-50 17 | 18 | 19 | # Request audio and metadata is dumped to this dir 20 | request_dump_dir: out 21 | 22 | # Encoding of LM words. 
Used for converting output to JSON (which is always UTF-8) 23 | recognizer_encoding: iso-8859-15 24 | 25 | -------------------------------------------------------------------------------- /conf-estonian.yaml: -------------------------------------------------------------------------------- 1 | 2 | handlers: 3 | - name: JSGFHandler 4 | require: handlers/jsgf_handler 5 | grammar_dir: user_grammars 6 | jsgf-to-fsg: ./scripts/jsgf2fsg.sh 7 | fsg-to-dict: ./scripts/fsg-to-dict.sh 8 | recognizer: 9 | hmm: models/est16k.cd_cont_3000-mapadapt 10 | dict: models/konele.splitw2.dict 11 | fsg: dummy.fsg 12 | bestpath: true 13 | fwdflat: true 14 | beam: 1.0e-80 15 | pbeam: 1.0e-47 16 | wbeam: 1.0e-39 17 | 18 | - name: PGFHandler 19 | require: handlers/pgf_handler 20 | lang: Est 21 | grammar_dir: user_gfs 22 | jsgf-to-fsg: ./scripts/jsgf2fsg.sh 23 | fsg-to-dict: ./scripts/fsg-to-dict.sh 24 | convert-gf-jsgf: ./scripts/convert-gf-jsgf.sh 25 | recognizer: 26 | hmm: models/est16k.cd_cont_3000-mapadapt 27 | dict: models/konele.splitw2.dict 28 | fsg: dummy.fsg 29 | bestpath: true 30 | fwdflat: true 31 | beam: 1.0e-80 32 | pbeam: 1.0e-47 33 | wbeam: 1.0e-39 34 | 35 | - name: PGFHandler 36 | require: handlers/pgf_handler 37 | lang: Eng 38 | grammar_dir: user_gfs 39 | jsgf-to-fsg: ./scripts/jsgf2fsg.sh 40 | fsg-to-dict: ./scripts/fsg-to-dict_en.py 41 | convert-gf-jsgf: ./scripts/convert-gf-jsgf.sh 42 | recognizer: 43 | fsg: dummy.fsg 44 | hmm: models/voxforge_en_sphinx.cd_cont_5000 45 | 46 | 47 | - name: PrettifyingHandler 48 | require: handlers/prettifier 49 | prettifier: ./prettify-with-numbers.sh 50 | recognizer: 51 | hmm: models/est16k.cd_cont_3000 52 | dict: models/konele.splitw2.dict 53 | lm: models/sphinx-trigram.konele.splitw2.arpa.gz 54 | bestpath: false 55 | maxwpf: 800 56 | fwdflat: false 57 | maxhmmpf: 10000 58 | wbeam: 1.0e-22 59 | beam: 1.0e-50 60 | pbeam: 1.0e-50 61 | 62 | 63 | request_dump_dir: out 64 | recognizer_encoding: UTF-8 65 | 66 | 67 | -------------------------------------------------------------------------------- /conf.yaml: -------------------------------------------------------------------------------- 1 | 2 | handlers: 3 | 4 | - name: PrettifyingHandler 5 | require: handlers/prettifier 6 | recognizer: 7 | hmm: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k 8 | lm: /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP 9 | dict: /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic 10 | bestpath: true 11 | maxwpf: 700 12 | fwdflat: true 13 | maxhmmpf: 12000 14 | wbeam: 1.0e-32 15 | beam: 1.0e-50 16 | pbeam: 1.0e-50 17 | 18 | 19 | # Request audio and metadata is dumped to this dir 20 | request_dump_dir: out 21 | 22 | # Encoding of LM words. 
Used for converting output to JSON (which is always UTF-8) 23 | recognizer_encoding: iso-8859-15 24 | 25 | -------------------------------------------------------------------------------- /config.ru: -------------------------------------------------------------------------------- 1 | 2 | 3 | # the below script is a standalone Sinatra application; absolutely 4 | # nothing special needs to be done in this Sinatra app for running 5 | # with Unicorn 6 | 7 | puts "-----------STARTING?----------" 8 | puts Dir.pwd 9 | 10 | $LOAD_PATH.unshift File.join(File.dirname(__FILE__), 'lib') 11 | 12 | require 'server' 13 | 14 | # the following hash needs to be the last statement, as unicorn 15 | # will eval this entire file 16 | run PocketsphinxServer::Server 17 | 18 | 19 | -------------------------------------------------------------------------------- /dummy.fsg: -------------------------------------------------------------------------------- 1 | FSG_BEGIN 2 | NUM_STATES 2 3 | START_STATE 0 4 | FINAL_STATE 1 5 | 6 | TRANSITION 0 1 1.000000 7 | FSG_END 8 | -------------------------------------------------------------------------------- /lib/handlers/handler.rb: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | class PocketsphinxServer::Handler 5 | 6 | attr_reader :recognizer, :config 7 | 8 | def initialize(server, config={}) 9 | @config = config 10 | @server = server 11 | @recognizer = PocketsphinxServer::Recognizer.new(server, config.fetch('recognizer', {})) 12 | end 13 | 14 | # Can this handler handle this request? 15 | def can_handle?(req) 16 | true 17 | end 18 | 19 | # Prepare the recognizer for this request (switch LM, etc) 20 | def prepare_rec(req) 21 | 22 | end 23 | 24 | # Postprocess an hypothesis string (e.g., make it prettyier) 25 | def postprocess_hypothesis(hyp) 26 | hyp 27 | end 28 | 29 | # Return a map of extra data for a hypothesis 30 | def get_hyp_extras(req, hyp) 31 | {} 32 | end 33 | 34 | def can_handle_fetch_lm?(req) 35 | false 36 | end 37 | 38 | def handle_fetch_lm(req) 39 | 40 | end 41 | 42 | def log(str) 43 | @server.logger.info str 44 | end 45 | end 46 | -------------------------------------------------------------------------------- /lib/handlers/jsgf_handler.rb: -------------------------------------------------------------------------------- 1 | 2 | require 'handlers/handler' 3 | 4 | class PocketsphinxServer::JSGFHandler < PocketsphinxServer::Handler 5 | 6 | def initialize(server, config={}) 7 | super 8 | @grammar_dir = config.fetch('grammar-dir', 'user_grammars') 9 | end 10 | 11 | def can_handle?(req) 12 | lm_name = req.params['lm'] 13 | return (lm_name != nil) && (lm_name =~ /jsgf$/) 14 | end 15 | 16 | def prepare_rec(req) 17 | lm_name = req.params['lm'] 18 | log("Using JSGF-based grammar") 19 | dict_file = dict_file_from_url(lm_name) 20 | fsg_file = fsg_file_from_url(lm_name) 21 | if not File.exists? fsg_file 22 | raise IOError, "Language model #{lm_name} not available. Use /fetch-lm API call to upload it to the server" 23 | end 24 | if not File.exists? dict_file 25 | raise IOError, "Pronunciation dictionary for #{lm_name} not available. 
Use /fetch-lm API call to make it on the server" 26 | end 27 | @recognizer.set_fsg(fsg_file, dict_file) 28 | log("Loaded requested JSGF model from #{fsg_file}") 29 | 30 | end 31 | 32 | def can_handle_fetch_lm?(req) 33 | lm_name = req.params['url'] 34 | if lm_name == nil 35 | # backward compability 36 | lm_name = req.params['lm'] 37 | end 38 | return (lm_name != nil) && (lm_name =~ /jsgf$/) 39 | end 40 | 41 | def handle_fetch_lm(req) 42 | url = req.params['url'] 43 | if url == nil 44 | # backward compability 45 | url = req.params['lm'] 46 | end 47 | log "Fetching JSGF grammar from #{url}" 48 | digest = MD5.hexdigest url 49 | content = open(url).read 50 | jsgf_file = @grammar_dir + "/#{digest}.jsgf" 51 | fsg_file = fsg_file_from_url(url) 52 | dict_file = dict_file_from_url(url) 53 | File.open(jsgf_file, 'w') { |f| 54 | f.write(content) 55 | } 56 | log "Converting to FSG.." 57 | `#{@config['jsgf-to-fsg']} #{jsgf_file} #{fsg_file}` 58 | if $? != 0 59 | raise "Failed to convert JSGF to FSG" 60 | end 61 | log "Making dictionary.." 62 | `cat #{fsg_file} | #{@config['fsg-to-dict']} > #{dict_file}` 63 | if $? != 0 64 | raise "Failed to make dictionary from FSG" 65 | end 66 | "Request completed" 67 | end 68 | 69 | def fsg_file_from_url(url) 70 | digest = MD5.hexdigest url 71 | return @grammar_dir + "/#{digest}.fsg" 72 | end 73 | 74 | def dict_file_from_url(url) 75 | digest = MD5.hexdigest url 76 | return @grammar_dir + "/#{digest}.dict" 77 | end 78 | 79 | 80 | 81 | end 82 | 83 | -------------------------------------------------------------------------------- /lib/handlers/pgf_handler.rb: -------------------------------------------------------------------------------- 1 | 2 | require 'handlers/handler' 3 | require 'rubygems' 4 | require 'locale' 5 | require 'locale/info' 6 | 7 | class PocketsphinxServer::PGFHandler < PocketsphinxServer::Handler 8 | 9 | def initialize(server, config={}) 10 | super 11 | @grammar_dir = config.fetch('grammar-dir', 'user_gfs') 12 | configured_language = config.fetch('lang', 'et') 13 | @language = Locale::Info.get_language(Locale::Tag.parse(configured_language).language) 14 | end 15 | 16 | def get_request_language(req) 17 | lang = req.params['lang'] 18 | if lang == nil 19 | lang = "et" 20 | end 21 | return Locale::Info.get_language(Locale::Tag.parse(lang).language) 22 | end 23 | 24 | def can_handle?(req) 25 | if @language != get_request_language(req) 26 | return false 27 | end 28 | lm_name = req.params['lm'] 29 | return (lm_name != nil) && (lm_name =~ /pgf$/) 30 | end 31 | 32 | def get_req_properties(req) 33 | input_lang = get_request_language(req).three_code.capitalize() 34 | output_langs = req.params['output-lang'] 35 | lm_name = req.params['lm'] 36 | digest = MD5.hexdigest lm_name 37 | pgf_dir = @grammar_dir + '/' + digest 38 | pgf_basename = File.basename(URI.parse(lm_name).path, ".pgf") 39 | return input_lang, output_langs, pgf_dir, pgf_basename, lm_name 40 | end 41 | 42 | def prepare_rec(req) 43 | puts "Using GF-based grammar" 44 | input_lang, output_langs, pgf_dir, pgf_basename, lm_name = get_req_properties(req) 45 | fsg_file = pgf_dir + '/' + pgf_basename + input_lang + ".fsg" 46 | dict_file = pgf_dir + '/' + pgf_basename + input_lang + ".dict" 47 | if not File.exists? fsg_file 48 | raise IOError, "Grammar for lang #{input_lang} for #{lm_name} not available. Use /fetch-lm API call to upload it to the server" 49 | end 50 | if not File.exists? dict_file 51 | raise IOError, "Pronunciation dictionary for lang #{input_lang} for #{lm_name} not available. 
Use /fetch-lm API call to make it on the server" 52 | end 53 | @recognizer.set_fsg(fsg_file, dict_file) 54 | end 55 | 56 | def get_hyp_extras(req, hyp) 57 | input_lang, output_langs, pgf_dir, pgf_basename, lm_name = get_req_properties(req) 58 | linearizations = [] 59 | if not output_langs.nil? 60 | output_langs.split(",").each do |output_lang| 61 | log "Linearizing [#{hyp}] to lang #{output_lang}" 62 | outputs = `echo "parse -lang=#{pgf_basename + input_lang} \\"#{hyp}\\" | linearize -lang=#{pgf_basename + output_lang} | ps -bind" | gf --run #{pgf_dir + '/' + pgf_basename + '.pgf'}` 63 | output_lines = outputs.split("\n") 64 | if output_lines == [] 65 | output_lines = [""] 66 | end 67 | output_lines.each do |output| 68 | log "LINEARIZED RESULT: " + output 69 | linearizations.push({:output => output, :lang => output_lang}) 70 | end 71 | end 72 | end 73 | return {:linearizations => linearizations} 74 | end 75 | 76 | def can_handle_fetch_lm?(req) 77 | lm_name = req.params['url'] 78 | langs = req.params['lang'] 79 | if langs == nil 80 | langs = "Est" 81 | end 82 | if not langs.split(',').collect { |l| Locale::Info.get_language(Locale::Tag.parse(l).language)}.include? @language 83 | return false 84 | end 85 | return (lm_name != nil) && (lm_name =~ /pgf$/) 86 | end 87 | 88 | def handle_fetch_lm(req) 89 | url = req.params['url'] 90 | log "Fetching PGF from #{url}" 91 | digest = MD5.hexdigest url 92 | content = open(url).read 93 | pgf_dir = @grammar_dir + '/' + digest 94 | FileUtils.mkdir_p pgf_dir 95 | pgf_basename = File.basename(URI.parse(url).path, ".pgf") 96 | File.open(pgf_dir + '/' + pgf_basename + ".pgf", 'w') { |f| 97 | f.write(content) 98 | } 99 | log 'Extracting concrete grammars' 100 | `gf -make --output-format=jsgf --output-dir=#{pgf_dir} #{pgf_dir + '/' + pgf_basename + ".pgf"}` 101 | if $? != 0 102 | raise "Failed to extract JSGF from PGF" 103 | end 104 | 105 | lang = @language.three_code.capitalize() 106 | 107 | jsgf_file = pgf_dir + '/' + pgf_basename + lang + ".jsgf" 108 | fsg_file = pgf_dir + '/' + pgf_basename + lang + ".fsg" 109 | dict_file = pgf_dir + '/' + pgf_basename + lang + ".dict" 110 | log "Making finite state grammar for input language #{lang}" 111 | log "Converting JSGF.." 112 | `#{@config.fetch('convert-gf-jsgf')} #{jsgf_file}` 113 | if $? != 0 114 | raise "Failed to convert JSGF for lang #{lang}" 115 | end 116 | log "Converting to FSG.." 117 | `#{@config.fetch('jsgf-to-fsg')} #{jsgf_file} #{fsg_file}` 118 | if $? != 0 119 | raise "Failed to convert JSGF to FSG for lang #{lang}" 120 | end 121 | log "Making dictionary.." 122 | `cat #{fsg_file} | #{@config.fetch('fsg-to-dict')} -lang #{lang} > #{dict_file}` 123 | if $? 
!= 0 124 | raise "Failed to make dictionary from FSG for lang #{lang}" 125 | end 126 | 127 | 128 | "Request completed" 129 | end 130 | 131 | end 132 | 133 | -------------------------------------------------------------------------------- /lib/handlers/prettifier.rb: -------------------------------------------------------------------------------- 1 | require 'handlers/handler' 2 | 3 | ### 4 | # This handler postprocesses hyps using an external script that can 5 | # syncronously process text, i.e., for each input line it instantly 6 | # flushes an output line 7 | ### 8 | class PocketsphinxServer::PrettifyingHandler < PocketsphinxServer::Handler 9 | 10 | def initialize(server, config={}) 11 | super 12 | @prettifier 13 | if config['prettifier'] != nil 14 | @prettifier = IO.popen(config['prettifier'], mode="r+") 15 | end 16 | end 17 | 18 | def postprocess_hypothesis(hyp) 19 | if @prettifier != nil && hyp && !hyp.empty? 20 | log "PRETTIFYING: #{hyp}" 21 | @prettifier.puts "#{hyp}" 22 | @prettifier.flush 23 | result = @prettifier.gets.strip 24 | log "RESULT: #{result}" 25 | return result 26 | end 27 | hyp 28 | end 29 | end 30 | -------------------------------------------------------------------------------- /lib/raw_recognizer.rb: -------------------------------------------------------------------------------- 1 | require 'gst' 2 | Gst.init 3 | 4 | class PocketsphinxServer::Recognizer 5 | attr :result 6 | attr :queue 7 | attr :pipeline 8 | attr :appsrc 9 | attr :asr 10 | attr :clock 11 | attr :appsink 12 | attr :recognizing 13 | 14 | def initialize(server, config={}) 15 | @server = server 16 | @data_buffers = [] 17 | @clock = Gst::SystemClock.new 18 | @result = "" 19 | @recognizing = false 20 | 21 | @outdir = nil 22 | begin 23 | @outdir = server.config.fetch('request_dump_dir' '') 24 | rescue 25 | end 26 | 27 | @appsrc = Gst::ElementFactory.make "appsrc", "appsrc" 28 | @decoder = Gst::ElementFactory.make "decodebin2", "decoder" 29 | @audioconvert = Gst::ElementFactory.make "audioconvert", "audioconvert" 30 | @audioresample = Gst::ElementFactory.make "audioresample", "audioresample" 31 | @tee = Gst::ElementFactory.make "tee", "tee" 32 | @queue1 = Gst::ElementFactory.make "queue", "queue1" 33 | @filesink = Gst::ElementFactory.make "filesink", "filesink" 34 | @queue2 = Gst::ElementFactory.make "queue", "queue2" 35 | @asr = Gst::ElementFactory.make "pocketsphinx", "asr" 36 | @appsink = Gst::ElementFactory.make "appsink", "appsink" 37 | 38 | @filesink.set_property("location", "/dev/null") 39 | 40 | config.map{ |k,v| 41 | log "Setting #{k} to #{v}..." 
42 | @asr.set_property(k, v) 43 | } 44 | # This returns when ASR engine has been fully loaded 45 | @asr.set_property('configured', true) 46 | 47 | create_pipeline() 48 | end 49 | 50 | 51 | def log(str) 52 | @server.logger.debug(str) 53 | end 54 | 55 | def create_pipeline() 56 | @pipeline = Gst::Pipeline.new "pipeline" 57 | @pipeline.add @appsrc, @decoder, @audioconvert, @audioresample, @tee, @queue1, @filesink, @queue2, @asr, @appsink 58 | @appsrc >> @decoder 59 | @audioconvert >> @audioresample >> @tee 60 | @tee >> @queue1 >> @asr >> @appsink 61 | @tee >> @queue2 >> @filesink 62 | 63 | @decoder.signal_connect('pad-added') { | element, pad, last, data | 64 | log "---- pad-added ---- " 65 | pad.link @audioconvert.get_pad("sink") 66 | } 67 | 68 | @queue = Queue.new 69 | 70 | 71 | @asr.signal_connect('partial_result') { |asr, text, uttid| 72 | #log "PARTIAL: " + text 73 | @result = text 74 | } 75 | 76 | @asr.signal_connect('result') { |asr, text, uttid| 77 | #log "FINAL: " + text 78 | if text.nil? 79 | text = "" 80 | end 81 | @result = text 82 | @queue.push(1) 83 | } 84 | 85 | @appsink = @pipeline.get_child("appsink") 86 | 87 | @appsink.signal_connect('eos') { |appsink, data| 88 | log "##### EOS #####" 89 | } 90 | 91 | @bus = @pipeline.bus 92 | @bus.signal_connect('message::state-changed') { |appsink, data| 93 | log "##### STATE-CHANGED #####" 94 | } 95 | end 96 | 97 | 98 | # Call this before starting a new recognition 99 | def clear(id, caps_str) 100 | caps = Gst::Caps.parse(caps_str) 101 | @appsrc.set_property("caps", caps) 102 | @result = "" 103 | queue.clear 104 | pipeline.pause 105 | if @outdir != nil 106 | @filesink.set_state(Gst::State::NULL) 107 | @filesink.set_property('location', "#{@outdir}/#{id}.raw") 108 | end 109 | @filesink.set_state(Gst::State::PLAYING) 110 | end 111 | 112 | def set_cmn_mean(mean) 113 | @asr.set_property("cmn_mean", mean) 114 | end 115 | 116 | def get_cmn_mean() 117 | return @asr.get_property("cmn_mean") 118 | end 119 | 120 | # Feed new chunk of audio data to the recognizer 121 | def feed_data(data) 122 | buffer = Gst::Buffer.new 123 | my_data = data.dup 124 | buffer.data = my_data 125 | buffer.timestamp = clock.time 126 | appsrc.push_buffer(buffer) 127 | # HACK: we need to reference the buffer so that ruby won't overwrite it 128 | @data_buffers.push my_data 129 | pipeline.play 130 | @recognizing = true 131 | end 132 | 133 | # Notify recognizer of utterance end 134 | def feed_end 135 | appsrc.end_of_stream 136 | end 137 | 138 | # Wait for the recognizer to recognize the current utterance 139 | # Returns the final recognition result 140 | def wait_final_result(max_nbest = 5) 141 | queue.pop 142 | # we request more N-best hyps than needed since we don't care about 143 | # differences in fillers 144 | @asr.set_property("nbest_size", max_nbest * 3) 145 | nbest = @asr.get_property("nbest") 146 | nbest.uniq! 147 | #nbest.map!{ |hyp| if hyp.nil? 
then hyp = "" end } 148 | @pipeline.ready 149 | @data_buffers.clear 150 | @recognizing = false 151 | log "CMN mean after: #{@asr.get_property("cmn_mean")}" 152 | return result, nbest[0..max_nbest-1] 153 | end 154 | 155 | def stop 156 | #@pipeline.play 157 | appsrc.end_of_stream 158 | wait_final_result 159 | end 160 | 161 | def set_fsg(fsg_file, dict_file) 162 | @asr.set_property('fsg', 'dummy.fsg') 163 | log "Trying to use dict #{dict_file}" 164 | @asr.set_property('dict', dict_file) 165 | log "Trying to use FSG #{fsg_file}" 166 | @asr.set_property('fsg', fsg_file) 167 | @asr.set_property('configured', true) 168 | log "FSG configured" 169 | end 170 | 171 | def recognizing?() 172 | @recognizing 173 | end 174 | end 175 | -------------------------------------------------------------------------------- /lib/server.rb: -------------------------------------------------------------------------------- 1 | require 'sinatra/base' 2 | require 'uuidtools' 3 | require 'json' 4 | require 'iconv' 5 | require 'set' 6 | require 'yaml' 7 | require 'open-uri' 8 | require 'md5' 9 | require 'uri' 10 | 11 | module PocketsphinxServer 12 | 13 | require 'raw_recognizer' 14 | 15 | class PocketsphinxServer::Server < Sinatra::Base 16 | 17 | configure do 18 | enable :static 19 | set :root, File.expand_path(".") 20 | 21 | set :public_folder, 'static' 22 | 23 | enable :logging 24 | disable :show_exceptions 25 | 26 | LOGGER = Logger.new(STDOUT) 27 | set :logger, LOGGER 28 | def LOGGER.puts(*s) 29 | s.flatten.each { |item| info(item.to_s) } 30 | end 31 | 32 | def LOGGER.write(*s) 33 | s.flatten.each { |item| info(item.to_s) } 34 | end 35 | 36 | $stdout = LOGGER 37 | $stderr = LOGGER 38 | 39 | set :config, YAML.load_file('conf.yaml') 40 | 41 | set :handlers, [] 42 | config['handlers'].each do |handler_config| 43 | className = handler_config['name'] 44 | requireName = handler_config['require'] 45 | require requireName 46 | puts "Creating handler #{className}" 47 | handler = PocketsphinxServer.const_get(className).new(self, handler_config) 48 | handlers << handler 49 | end 50 | 51 | begin 52 | set :outdir, config["request_dump_dir"] 53 | Dir.mkdir(settings.outdir) 54 | rescue 55 | end 56 | 57 | CHUNK_SIZE = 256 58 | 59 | end 60 | 61 | get '/' do 62 | markdown :index, :layout_engine => :erb 63 | end 64 | 65 | # FIXME: make it concurrent-safe 66 | get '/stats/history.png' do 67 | headers "Content-Type" => "image/png" 68 | `mkdir -p tmp` 69 | `./scripts/log2png.sh server.log tmp/stats.png` 70 | File.read(File.join('tmp', "stats.png")) 71 | end 72 | 73 | # FIXME: make it concurrent-safe 74 | get '/stats/apps.png' do 75 | headers "Content-Type" => "image/png" 76 | `mkdir -p tmp` 77 | `./scripts/log2apps-png.sh server.log tmp/apps.png` 78 | File.read(File.join('tmp', "apps.png")) 79 | end 80 | 81 | get '/stats/apps.txt' do 82 | headers "Content-Type" => "text/plain" 83 | `./scripts/log2apps-txt.sh server.log` 84 | end 85 | 86 | get '/stats/models.txt' do 87 | headers "Content-Type" => "text/plain" 88 | `./scripts/log2models-txt.sh server.log` 89 | end 90 | 91 | post '/recognize' do 92 | do_post() 93 | end 94 | 95 | put '/recognize' do 96 | do_post() 97 | end 98 | 99 | put '/recognize/*' do 100 | do_post() 101 | end 102 | 103 | helpers do 104 | def logger 105 | LOGGER 106 | end 107 | end 108 | 109 | def do_post() 110 | id = SecureRandom.hex 111 | 112 | logger.info "Request ID: " + id 113 | req = Rack::Request.new(env) 114 | 115 | logger.info "Determining request handler..." 
116 | 117 | @req_handler = nil 118 | settings.handlers.each do |handler| 119 | if handler.can_handle?(req) 120 | @req_handler = handler 121 | break 122 | end 123 | end 124 | logger.info "Request will be handled by #{@req_handler}" 125 | 126 | logger.info "Preparing request handler recognizer..." 127 | @req_handler.prepare_rec(req) 128 | 129 | nbest_n = 5 130 | if req.params.has_key? 'nbest' 131 | nbest_n = req.params['nbest'].to_i 132 | end 133 | 134 | if settings.outdir != nil 135 | File.open("#{settings.outdir}/#{id}.info", 'w') { |f| 136 | req.env.select{ |k,v| 137 | f.write "#{k}: #{v}\n" 138 | } 139 | } 140 | end 141 | logger.info "User agent: " + req.user_agent 142 | 143 | device_id = get_user_device_id(req) 144 | logger.info "Device ID : #{device_id}" 145 | cmn_mean = get_cmn_mean(device_id) 146 | if cmn_mean != nil 147 | logger.info "Setting CMN mean to #{cmn_mean}" 148 | @req_handler.recognizer.set_cmn_mean(cmn_mean) 149 | end 150 | 151 | logger.info "Parsing content type " + req.content_type 152 | caps_str = content_type_to_caps(req.content_type) 153 | logger.info "CAPS string is " + caps_str 154 | @req_handler.recognizer.clear(id, caps_str) 155 | 156 | length = 0 157 | 158 | left_over = "" 159 | req.body.each do |chunk| 160 | chunk_to_rec = left_over + chunk 161 | if chunk_to_rec.length > CHUNK_SIZE 162 | chunk_to_send = chunk_to_rec[0..(chunk_to_rec.length / 2) * 2 - 1] 163 | @req_handler.recognizer.feed_data(chunk_to_send) 164 | left_over = chunk_to_rec[chunk_to_send.length .. -1] 165 | else 166 | left_over = chunk_to_rec 167 | end 168 | length += chunk.size 169 | end 170 | @req_handler.recognizer.feed_data(left_over) 171 | 172 | 173 | logger.info "Data end received" 174 | if length > 0 175 | @req_handler.recognizer.feed_end() 176 | result,nbest = @req_handler.recognizer.wait_final_result(max_nbest=nbest_n) 177 | set_cmn_mean(device_id, @req_handler.recognizer.get_cmn_mean()) 178 | 179 | nbest_results = [] 180 | 181 | nbest.collect! do |hyp| 182 | @req_handler.postprocess_hypothesis(hyp) 183 | end 184 | 185 | nbest_results = [] 186 | nbest.collect do |hyp| 187 | nbest_result = {} 188 | nbest_result[:utterance] = hyp 189 | extras_map = @req_handler.get_hyp_extras(req, hyp) 190 | nbest_result.merge!(extras_map) 191 | nbest_results << nbest_result 192 | end 193 | source_encoding = settings.config["recognizer_encoding"] 194 | if source_encoding != "utf-8" 195 | # convert all strings in nbest_results from source encoding to UTF-8 196 | traverse( nbest_results ) do |node| 197 | if node.is_a? 
String 198 | node = Iconv.iconv('utf-8', source_encoding, node)[0] 199 | end 200 | node 201 | end 202 | end 203 | 204 | headers "Content-Type" => "application/json; charset=utf-8", "Content-Disposition" => "attachment" 205 | JSON.pretty_generate({:status => 0, :id => id, :hypotheses => nbest_results}) 206 | else 207 | @req_handler.recognizer.stop 208 | headers "Content-Type" => "application/json; charset=utf-8", "Content-Disposition" => "attachment" 209 | JSON.pretty_generate({:status => 0, :id => id, :hypotheses => [:utterance => ""]}) 210 | end 211 | end 212 | 213 | 214 | # Handle /fetch-lm requests and backward compatible fetch requests 215 | get %r{/fetch-((lm)|(jsgf)|(pgf))} do 216 | handled = false 217 | settings.handlers.each do |handler| 218 | if handler.can_handle_fetch_lm?(request) 219 | handler.handle_fetch_lm(request) 220 | handled = true 221 | end 222 | end 223 | if !handled 224 | status 409 225 | "Don't know how to handle this type of language model" 226 | else 227 | "Request completed" 228 | end 229 | end 230 | 231 | error do 232 | logger.info "Error: " + env['sinatra.error'] 233 | logger.info "Inspecting #{@req_handler}..." 234 | if @req_handler != nil and @req_handler.recognizer.recognizing? 235 | logger.info "Trying to clear recognizer.." 236 | @req_handler.recognizer.stop 237 | logger.info "Cleared recognizer" 238 | end 239 | #consume request body to avoid proxy error 240 | begin 241 | request.body.read 242 | rescue 243 | end 244 | 'Sorry, failed to process request. Reason: ' + env['sinatra.error'] + "\n" 245 | end 246 | 247 | # Traverses a structure of hashes and arrays and applied blk to the values 248 | def traverse(obj, &blk) 249 | case obj 250 | when Hash 251 | # Forget keys because I don't know what to do with them 252 | obj.each {|k,v| obj[k] = traverse(v, &blk) } 253 | when Array 254 | obj.collect! {|v| traverse(v, &blk) } 255 | else 256 | blk.call(obj) 257 | end 258 | end 259 | 260 | # Parses Content-type ans resolves it to GStreamer Caps string 261 | def content_type_to_caps(content_type) 262 | if not content_type 263 | content_type = "audio/x-raw-int" 264 | return "audio/x-raw-int,rate=16000,channels=1,signed=true,endianness=1234,depth=16,width=16" 265 | end 266 | parts = content_type.split(%r{[,; ]}) 267 | result = "" 268 | allowed_types = Set.new ["audio/x-flac", "audio/x-raw-int", "application/ogg", "audio/mpeg", "audio/x-wav"] 269 | if allowed_types.include? parts[0] 270 | result = parts[0] 271 | if parts[0] == "audio/x-raw-int" 272 | attributes = {"rate"=>"16000", "channels"=>"1", "signed"=>"true", "endianness"=>"1234", "depth"=>"16", "width"=>"16"} 273 | user_attributes = Hash[*parts[1..-1].map{|s| s.split('=', 2) }.flatten] 274 | attributes.merge!(user_attributes) 275 | result += ", " + attributes.map{|k,v| "#{k}=#{v}"}.join(", ") 276 | end 277 | return result 278 | else 279 | raise IOError, "Unsupported content type: #{parts[0]}. Supported types are: " + allowed_types.to_a.join(", ") + "." 280 | end 281 | end 282 | 283 | # TODO: make this configurable and modular 284 | def get_user_device_id(req) 285 | device_id = req.params['device_id'] 286 | if (not device_id.nil?) and (not device_id.empty?) 
287 | return device_id 288 | end 289 | user_agent = req.user_agent 290 | # try to identify android device using old method 291 | if user_agent =~ /.*\(RecognizerIntentActivity.* ([\w-]+); .*/ 292 | return $1 293 | elsif user_agent =~ /RecognizerTester.* (\S+)/ 294 | return $1 295 | end 296 | return "default" 297 | end 298 | 299 | def get_cmn_mean(device_id) 300 | cmn_means = {} 301 | begin 302 | File.open('cmn_means.json', 'r') { |f| cmn_means = JSON.load(f) } 303 | rescue 304 | begin 305 | # backward compability, remove soon 306 | logger.warn("Falling back to deprecated cmn_means.yaml instead of cmn_means.json") 307 | cmn_means = YAML.load_file('cmn_means.yaml') 308 | rescue 309 | end 310 | end 311 | return cmn_means.fetch(device_id, get_mean_cmn_mean(cmn_means.values)) 312 | end 313 | 314 | # Calculate mean CMN from an array of CMN means 315 | # CMN means are given as an array of string, each with comma-seperated values 316 | # Returns mean CMN, as a string, comma-seperated 317 | def get_mean_cmn_mean(cmn_mean_array) 318 | begin 319 | if cmn_mean_array.size > 0 320 | means = (cmn_mean_array.collect do | s | s.split(",") end).collect do |a| a.collect do |ss| ss.to_f end end 321 | sum = [0.0] * means[0].size 322 | means = means.select do | mean | mean.size == sum.size end 323 | means.each do | mean | 324 | sum.each_with_index do |s,i| 325 | sum[i] += mean[i] 326 | end 327 | end 328 | return (sum.collect do |s| "%.2f" % (s / means.size) end).join(",") 329 | else 330 | return nil 331 | end 332 | rescue Exception => e 333 | logger.warn("Failed to calculate CMN mean over saved means:" + e.message) 334 | return nil 335 | end 336 | end 337 | 338 | def set_cmn_mean(device_id, mean) 339 | cmn_means = {} 340 | begin 341 | cmn_means = YAML.load_file('cmn_means.yaml') 342 | rescue 343 | end 344 | cmn_means[device_id] = mean 345 | uid = Process.uid 346 | File.open("cmn_means.json.#{uid}", 'w' ) do |out| 347 | out.write(cmn_means.to_json) 348 | end 349 | `mv cmn_means.json.#{uid} cmn_means.json` 350 | end 351 | end 352 | 353 | 354 | end 355 | -------------------------------------------------------------------------------- /scripts/convert-gf-jsgf.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | sed -i "s/^public //" $1 4 | sed -i "s/^
/public
/" $1 5 | -------------------------------------------------------------------------------- /scripts/en-g2p.sh: -------------------------------------------------------------------------------- 1 | #! /bin/sh 2 | 3 | . `dirname $0`/settings.sh 4 | 5 | 6 | tempfile=`mktemp` 7 | tempfile2=`mktemp` 8 | tempfile3=`mktemp` 9 | 10 | cat > $tempfile 11 | 12 | cat $tempfile | perl -C -ne 'BEGIN{use Text::Unidecode;} chomp; $x=unidecode($_); $x=uc($x); print "$_ $x\n"' | sort -k2 > $tempfile2 13 | 14 | cut -f 2 -d " " $tempfile2 > $tempfile3 15 | 16 | $PHONETISAURUS --model=$EN_FST --input=$tempfile3 --isfile --words | sort | join -1 2 -2 1 $tempfile2 - | perl -npe 's/\S+\s+(\S+)\s+\S+/\1 /' 17 | 18 | rm $tempfile $temfile2 $tempfile3 19 | -------------------------------------------------------------------------------- /scripts/fsg-to-dict.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | . `dirname $0`/settings.sh 4 | 5 | grep TRANSITION | cut -f 5 -d " " | sort | uniq | $ET_G2P 6 | -------------------------------------------------------------------------------- /scripts/fsg-to-dict_en.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/python 2 | 3 | import sys 4 | import re 5 | 6 | import os 7 | from subprocess import Popen, PIPE, STDOUT 8 | BASE_DICT="/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic" 9 | G2P=os.path.dirname(sys.argv[0]) + "/en-g2p.sh" 10 | 11 | words = {} 12 | for l in open(BASE_DICT): 13 | ss = l.split() 14 | word = ss[0] 15 | word = re.sub(r"\(\d\)$", "", word) 16 | try: 17 | prob = float(ss[1]) 18 | pron = ss[2:] 19 | except ValueError: 20 | prob = 1 21 | pron = ss[1:] 22 | 23 | words.setdefault(word, []).append((pron, prob)) 24 | 25 | input_words = set() 26 | 27 | for l in sys.stdin: 28 | if l.startswith("TRANSITION"): 29 | ss = l.split() 30 | if len(ss) == 5: 31 | input_words.add(ss[-1]) 32 | 33 | g2p_words = [] 34 | for w in input_words: 35 | if w.lower() in words: 36 | for (i, pron) in enumerate(words[w.lower()]): 37 | if i == 0: 38 | print w, 39 | else: 40 | print "%s(%d)" % (w, i+1), 41 | print " ".join(pron[0]) 42 | else: 43 | g2p_words.append(w) 44 | 45 | if len(g2p_words) > 0: 46 | proc = Popen(G2P,stdin=PIPE, stdout=PIPE, stderr=STDOUT ) 47 | #stdout, stderr = proc.communicate() 48 | for w in g2p_words: 49 | print >>proc.stdin, w 50 | proc.stdin.close() 51 | 52 | #return_code = proc.wait() 53 | 54 | for l in proc.stdout: 55 | print l, 56 | 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /scripts/fsm2fsg.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | 3 | import sys 4 | 5 | from math import exp 6 | 7 | if __name__ == '__main__': 8 | #HACK: currently all weights are set to 1. 
Seems to make the recognition more robust 9 | 10 | arcs = [] 11 | for l in sys.stdin: 12 | ss = l.split() 13 | if len(ss) == 5: 14 | arcs.append((int(ss[0]), int(ss[1]), ss[2], 1)) 15 | elif len(ss) == 4: 16 | arcs.append((int(ss[0]), int(ss[1]), ss[2], 1)) 17 | elif len(ss) == 2: 18 | arcs.append((int(ss[0]), -1, "", 1)) 19 | elif len(ss) == 1: 20 | arcs.append((int(ss[0]), -1, "", 1)) 21 | else: 22 | print >>sys.stderr, "WARNING: strange FSG line: ", l 23 | 24 | max_state = max([a[1] for a in arcs]) 25 | 26 | final_state_id = max_state + 1 27 | 28 | 29 | print "FSG_BEGIN " 30 | print "NUM_STATES", max_state + 2 31 | print "START_STATE 0" 32 | print "FINAL_STATE", final_state_id 33 | 34 | for a in arcs: 35 | print "TRANSITION", a[0], a[1] == -1 and final_state_id or a[1], "%7.5f" % min(1.0, a[3]), a[2] 36 | 37 | print "FSG_END" 38 | -------------------------------------------------------------------------------- /scripts/jsgf2fsg.sh: -------------------------------------------------------------------------------- 1 | #! /bin/sh 2 | 3 | if [ $# -ne 2 ] 4 | then 5 | echo "Usage: `basename $0` jsgf fsg" 6 | exit 1 7 | fi 8 | 9 | #sphinx_jsgf2fsg -jsgf $1 -fsg $2 10 | 11 | sphinx_jsgf2fsg -jsgf $1 -fsm ${1%.*}.fsm -symtab ${1%.*}.sym 12 | 13 | fstcompile --arc_type=log --acceptor --isymbols=${1%.*}.sym --keep_isymbols ${1%.*}.fsm | \ 14 | fstdeterminize | fstminimize | fstrmepsilon | fstprint | \ 15 | `dirname $0`/fsm2fsg.py > $2 16 | -------------------------------------------------------------------------------- /scripts/log2apps-png.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" 4 | 5 | $DIR/log2apps-txt.sh $1 | head -20 > tmp/apps.txt 6 | 7 | gnuplot < tmp/data.txt 5 | 6 | gnuplot < 31 | Available in Android Market 33 | 34 | 35 | Mõlemad rakendused on tasuta ja avatud lähtekoodiga. 36 | 37 | ## Serveri kasutamine Java rakendustes 38 | 39 | Serverit on lihtne kasutada läbi spetsiaalse teegi, mis on tasuta ja koos lähtekoodiga saadaval 40 | [siin](http://code.google.com/p/net-speech-api). 41 | 42 | ## Serveri kasutamine muudes rakendustes 43 | 44 | Serveri kasutamine on väga lihtne ka "otse", ilma vaheteegita. Järgnevalt demonstreerime, kuidas 45 | serverit kasutada Linuxi käsurealt. 46 | 47 | ### Näide 1: raw formaadis heli 48 | 49 | Lindista mikrofoniga üks lühike lause, kasutades raw formaati, 16 kB, mono kodeeringut (vajuta Ctrl-C, kui oled lõpetanud): 50 | 51 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > lause1.raw 52 | 53 | 54 | Nüüd, saada lause serverisse tuvastamisele (kasutades programmi curl, saadaval 55 | kõikide Linuxite repositoriumites): 56 | 57 | curl -X POST --data-binary @lause1.raw \ 58 | -H "Content-Type: audio/x-raw-int; rate=16000" \ 59 | http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1 60 | 61 | 62 | Server genereerib vastuse JSON formaadis: 63 | 64 | 65 | { 66 | "status": 0, 67 | "hypotheses": [ 68 | { 69 | "utterance": "see on esimene lause" 70 | } 71 | ], 72 | "id": "4d00ffd9b1a101940bb3ed88c6b6300d" 73 | } 74 | 75 | ### Näide 2: ogg formaadis heli 76 | 77 | Server tunneb ka formaate flac, ogg, mpeg, wav. Päringu Content-Type väli peaks sel juhul olema 78 | vastavalt audio/x-flac, application/ogg, audio/mpeg või audio/x-wav. 
79 | 80 | Salvestame ogg formaadis lause (selleks peaks olema installeeritud pakett SoX): 81 | 82 | rec -r 16000 lause2.ogg 83 | 84 | Saadame serverisse, kasutades PUT päringut: 85 | 86 | curl -T lause2.ogg -H "Content-Type: application/ogg" "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1" 87 | 88 | Väljund: 89 | 90 | { 91 | "status": 0, 92 | "hypotheses": [ 93 | { 94 | "utterance": "see on teine lause" 95 | } 96 | ], 97 | "id": "dfd8ed3a028d1e70e4233f500e21c027" 98 | } 99 | 100 | 101 | ### Näide 3: mitu tuvastushüpoteesi 102 | 103 | Parameeter nbest=1 ütles eelmises päringus serverile, et meid huvitab 104 | ainult üks tulemus. Vaikimisi annab server viis kõige tõenäolisemat tuvastushüpoteesi, 105 | hüpoteesi tõenäosuse järjekorras: 106 | 107 | curl -X POST --data-binary @lause1.raw \ 108 | -H "Content-Type: audio/x-raw-int; rate=16000" \ 109 | http://bark.phon.ioc.ee/speech-api/v1/recognize 110 | 111 | 112 | Tulemus: 113 | 114 | { 115 | "status": 0, 116 | "hypotheses": [ 117 | { 118 | "utterance": "see on esimene lause" 119 | }, 120 | { 121 | "utterance": "see on esimene lause on" 122 | }, 123 | { 124 | "utterance": "see on esimene lausa" 125 | }, 126 | { 127 | "utterance": "see on mu esimene lause" 128 | }, 129 | { 130 | "utterance": "see on esimene laose" 131 | } 132 | ], 133 | "id": "61c78c7271026153b83f39a514dc0c41" 134 | } 135 | 136 | ### Näide 4: JSGF grammatika kasutamine 137 | 138 | Vaikimisi kasutab server statistilist keelemudelit, mis üritab leida õige 139 | tuvastushüpoteesi kõikvõimalike eestikeelsete lausete hulgast. Mõnikord on 140 | aga kasulik võimalike lausete hulka piirata reeglipõhise grammatikaga. Server 141 | lubab grammatikaid defineerida kahes formaadis: 142 | [JSGF](http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/) ja 143 | [GF](http://www.grammaticalframework.org/). 144 | 145 | Näiteks allolev JSGF formaadis grammatika aktsepteerib muu hulgas selliseid lauseid: 146 | 147 | >mine edasi 148 | 149 | >liigu kaks meetrit tagasi 150 | 151 | >liigu üks meeter edasi 152 | 153 | >keera paremale 154 | 155 | 156 | 157 | Grammatika: 158 | 159 | #JSGF V1.0; 160 | 161 | grammar robot; 162 | 163 | public = | ; 164 | = (liigu | mine ) [ ( üks meeter ) | ( (kaks | kolm | neli | viis ) meetrit ) ] (edasi | tagasi ); 165 | = (keera | pööra) [ paremale | vasakule ]; 166 | 167 | Grammatika kasutamiseks peab selle kõigepealt laadima kusagile internetiserverisse, kus ta oleks 168 | kõikjalt kättessadav (näiteks Dropboxi public folder). Antud juhul on grammatika 169 | [siin](http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf). 170 | 171 | Seejärel tuleb kõnetuvastusserverile öelda, et ta grammatika alla laeks: 172 | 173 | curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf" 174 | 175 | Lindistame seejärel testlause (näiteks "liigu üks meeter edasi", Ctrl-C kui valmis): 176 | 177 | rec -r 16000 liigu_1m_edasi.ogg 178 | 179 | Grammatika abil tuvastamiseks tuleb päringule lisada parameeter 180 | lm=http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf: 181 | 182 | 183 | curl -T liigu_1m_edasi.ogg \ 184 | -H "Content-Type: application/ogg" \ 185 | "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1&lm=http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf" 186 | 187 | Vastus: 188 | 189 | { 190 | "status": 0, 191 | "hypotheses": [ 192 | { 193 | "utterance": "liigu \u00fcks meeter edasi" 194 | } 195 | ], 196 | "id": "c858c89badc3597ca8ec7f10985b71de" 197 | } 198 | 199 | NB! 
JSGF formaadis grammatikad peaksid olema ISO-8859-14 kodeeringus. Serveri vastus on 200 | UTF-8 kodeeringus, nagu JSON standard ette näeb. 201 | 202 | ### Näide 5: GF formaadis grammatika kasutamine (edasijõudnutele) 203 | 204 | GF on grammatikaformalism, mis lubab muu hulgas ühele abstraktsele grammatikale 205 | luua mitu implementatsiooni erinevates keeltes. Näiteks, abstraktne grammatika 206 | võib olla mõeldud roboti juhtimiseks, tema implementatsioon eesti keeles 207 | defineerib, kuidas robotit eesti keeles juhtida, ning teine implementatsioon 208 | "masinkeeles" defineerib roboti poolt arusaadava süntaksi. 209 | 210 | Palju eestikeelse implementatsiooniga GF grammatikaid leiab [siit](http://kaljurand.github.com/Grammars/). 211 | 212 | Nagu JSGF puhul, tuleb ka GF grammatika serverisse laadida, kasutades GF binaarset 213 | formaati (PGF). Antud juhul tuleb ka spetsifitseerida, 214 | millist grammatikaimplementatsiooni server kõnetuvastuseks kasutama peaks, kasutades parameetrit 215 | lang: 216 | 217 | curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Go.pgf&lang=Est" 218 | 219 | Salvestame jälle testlause (näiteks "mine neli meetrit edasi"): 220 | 221 | rec -r 16000 mine_4m_edasi.ogg 222 | 223 | Tuvastamiseks tuleb näidata serverile, milline on soovitav väljundkeel (parameetriga output-lang=App): 224 | 225 | curl -T mine_4m_edasi.ogg \ 226 | -H "Content-Type: application/ogg"\ 227 | "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1&lm=http://kaljurand.github.com/Grammars/grammars/pgf/Go.pgf&output-lang=App" 228 | 229 | Vastus: 230 | 231 | { 232 | "status": 0, 233 | "hypotheses": [ 234 | { 235 | "linearizations": [ 236 | { 237 | "lang": "App", 238 | "output": "4 m >" 239 | } 240 | ], 241 | "utterance": "mine neli meetrit edasi" 242 | } 243 | ], 244 | "id": "e2f3067d69ea22c75dc4b0073f23ff38" 245 | } 246 | 247 | Vastuses on nüüd iga hüpoteesi juures väli linearizations, 248 | mis annab sisendi "linearisatsiooni" (ehk tõlke) väljundkeeles. Antud grammatika 249 | puhul on linearisatsioon väljundkeeles "4 m >", mida on robotil võib-olla 250 | lihtsam parsida, kui eestikeelset käsklust. 251 | 252 | Kui PGF failis 253 | on grammatikaimplementatsioone rohkem, võib korraga küsida väljundit mitmes keeles: 254 | 255 | curl -T mine_4m_edasi.ogg \ 256 | -H "Content-Type: application/ogg" \ 257 | "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1&lm=http://kaljurand.github.com/Grammars/grammars/pgf/Go.pgf&output-lang=App,Eng,Est" 258 | 259 | Väljund: 260 | 261 | { 262 | "status": 0, 263 | "hypotheses": [ 264 | { 265 | "linearizations": [ 266 | { 267 | "lang": "App", 268 | "output": "4 m >" 269 | }, 270 | { 271 | "lang": "Eng", 272 | "output": "go four meters forward" 273 | }, 274 | { 275 | "lang": "Est", 276 | "output": "mine neli meetrit edasi" 277 | } 278 | ], 279 | "utterance": "mine neli meetrit edasi" 280 | } 281 | ], 282 | "id": "d9abdbc2a7669752059ad544d3ba14f7" 283 | } 284 | 285 | ## Korduma kippuvad küsimused 286 | 287 | #### Kas server salvestab mu kõnet? 288 | 289 | Jah. Üldjuhul neid salvestusi küll keegi ei kuula, aga pisteliselt võidakse 290 | salvestusi kuulata ja käsitsi transkribeerida tuvastuskvaliteedi hindamiseks 291 | ja parandamiseks. 292 | 293 | #### Tuvastuskvaliteet on väga halb! 294 | 295 | Jah. Parima kvaliteedi saab suu lähedal oleva mikrofoni kasutamisel. 
296 | Loodetavasti tulevikus kvaliteet paraneb, kui saame juba serverisse saadetud 297 | salvestusi kasutada mudelite parandamiseks (vt eelmine küsimus). 298 | 299 | 300 | #### Kas ma võin serverit piiramatult tasuta kasutada? 301 | 302 | Mitte päris. Hetkel võib ühelt IP-lt teha tunnis kuni 100 ja päevas kuni 200 tuvastuspäringut. 303 | Tulevikus võivad need limiidid muutuda (see sõltub teenuse populaarsusest ja meie serveripargi 304 | arengust). 305 | 306 | 307 | #### Mis mõttes see tasuta on? 308 | 309 | Tehnoloogia on välja töötatud riikliku programmi "Eesti keeletehnoloogia 2011-2017" raames, seega 310 | on maksumaksja juba selle eest maksnud. Riiklik programm ei pane küll meile 311 | kohustust sellist serverit piiramatult hallata, sellepärast võivad tulevikus 312 | kasutustingimused muutuda, serveri tarkvara aga jääb alatiseks tasuta, kui 313 | ei teki mingeid muid seniarvestamata asjaolusid. 314 | 315 | #### OK, aga kas ma võin siis sellise tuvastustarkava enda serverisse installeerida? 316 | 317 | Jah. Serveri tarkvara on saadaval [siin](https://github.com/alumae/ruby-pocketsphinx-server), 318 | eesti keele akustilise ja statistilise keelemudeli ning liitsõnade rekonstrueerimismudeli 319 | saamiseks palume kontakteeruda. Mudelid ei ole päris "vabad", s.t. nendele kehtivad teatud 320 | kasutuspiirangud (näiteks ei või neid levitada). 321 | 322 | #### Kas iOS (Windows Phone 7, Blackberry, Meego) rakendus ka tuleb? 323 | 324 | Hetkel pole plaanis. Samas on server avatud kõikidele rakendustele, seega 325 | võib sellise rakenduse implementeerida keegi kolmas. 326 | 327 | ## Kontakt 328 | 329 | Tanel Alumäe: [tanel.alumae@phon.ioc.ee](tanel.alumae@phon.ioc.ee) 330 | -------------------------------------------------------------------------------- /views/layout.erb: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | <%= @title %> 5 | 6 | 7 | 26 | 27 | 28 | 29 |
32 |   <%= yield %>
34 |   © TTÜ Küberneetika Instituut 2011
--------------------------------------------------------------------------------