├── LICENSE ├── README.rdoc ├── conf-english.yaml ├── conf-estonian.yaml ├── conf.yaml ├── config.ru ├── dummy.fsg ├── lib ├── handlers │ ├── handler.rb │ ├── jsgf_handler.rb │ ├── pgf_handler.rb │ └── prettifier.rb ├── raw_recognizer.rb └── server.rb ├── scripts ├── convert-gf-jsgf.sh ├── en-g2p.sh ├── fsg-to-dict.sh ├── fsg-to-dict_en.py ├── fsm2fsg.py ├── jsgf2fsg.sh ├── log2apps-png.sh ├── log2apps-txt.sh ├── log2models-txt.sh ├── log2png.sh └── settings.sh ├── unicorn.conf.rb └── views ├── index.md └── layout.erb /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2011 Tallinn Univeristy of Technology 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions 6 | are met: 7 | 1. Redistributions of source code must retain the above copyright 8 | notice, this list of conditions and the following disclaimer. 9 | 2. Redistributions in binary form must reproduce the above copyright 10 | notice, this list of conditions and the following disclaimer in the 11 | documentation and/or other materials provided with the distribution. 12 | 3. The name of the author may not be used to endorse or promote products 13 | derived from this software without specific prior written permission. 14 | 15 | THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR 16 | IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 17 | OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 18 | IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 19 | INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT 20 | NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 21 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 22 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 23 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 24 | THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 25 | -------------------------------------------------------------------------------- /README.rdoc: -------------------------------------------------------------------------------- 1 | = Introduction 2 | 3 | Ruby-based web service for speech recognition, using the PocketSphinx gstreamer module. 4 | 5 | = Requirements 6 | 7 | * Ruby 1.8 8 | * Sinatra 9 | * Rack 10 | * Unicorn 11 | * PocketSphinx (NOTE: some features of the server require patched PocketSphinx, see below) 12 | * Some acoustic and language models for PocketSphinx 13 | 14 | 15 | = Installing 16 | 17 | == CMU Sphinx 18 | 19 | * Install sphinxbase from SVN (make, make install) 20 | 21 | === Apply PocketSphinx patch 22 | 23 | In cmusphinx/pocketsphinx directory: 24 | 25 | wget http://www.phon.ioc.ee/~tanela/ps_gst.patch 26 | patch -p0 -i ps_gst.patch 27 | 28 | 29 | Make sure you have GStreamer devevelopment packages installed. In Debian Squeeze: 30 | 31 | apt-get install libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev 32 | 33 | And configure, make, make install as usual. 34 | 35 | == Install Ruby gems: Unicorn and Sinatra, UUID tools, JSON, locale 36 | 37 | This assumes you have ruby and rubygems installed. 
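A quick sanity check (assuming ruby and gem are already on your PATH; the server targets Ruby 1.8):

    ruby -v
    gem -v
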
38 | 39 | You might want to do this as root: 40 | 41 | gem install unicorn 42 | gem install sinatra 43 | gem install uuidtools 44 | gem install json 45 | gem install locale 46 | 47 | Install ruby-gstreamer package (might vary depending on your distribution): 48 | 49 | apt-get install libgst-ruby1.8 50 | 51 | == Additional tools 52 | 53 | English GF-based recognizer also need: 54 | 55 | * libtext-unidecode-perl 56 | * Phonetisaurus, Phonetisaurus prebuilt model for English (http://code.google.com/p/phonetisaurus/downloads/detail?name=g014b2b.tgz) 57 | * Python 58 | 59 | 60 | == Run ruby-pocketsphinx-server 61 | 62 | Clone the git repository: 63 | 64 | git clone git://github.com/alumae/ruby-pocketsphinx-server.git 65 | 66 | Before executing, add `/usr/local/lib` to the path where GStreamer plugins are looked for: 67 | 68 | export GST_PLUGIN_PATH=/usr/local/lib 69 | 70 | = Running 71 | 72 | unicorn -c unicorn.conf.rb config.ru 73 | 74 | If you installed Unicorn as a Ruby gem, you might need to execute: 75 | 76 | /var/lib/gems/1.8/bin/unicorn -c unicorn.conf.rb config.ru 77 | 78 | Test the default configuration (English WSJ language model with HUB4 acostic models), using a raw audio file in the PocketSphinx test directory 79 | (replace `$(POCKETSPHINX_DIR)` with the Pocketsphinx source directory): 80 | 81 | curl -T $(POCKETSPHINX_DIR)/test/data/wsj/n800_440c0207.wav -H "Content-Type: audio/x-wav" "http://localhost:8080/recognize" 82 | 83 | Response should be: 84 | 85 | { 86 | "status": 0, 87 | "hypotheses": [ 88 | { 89 | "utterance": "the agency isn't likely to take any action until the union's rank and file votes on the contract into three weeks" 90 | }, 91 | { 92 | "utterance": "the agency isn't likely to take any action until the union's rank and file puts on the contract into three weeks" 93 | }, 94 | { 95 | "utterance": "the agency isn't likely to take any action until the union's rank and file funds from the contract into three weeks" 96 | }, 97 | { 98 | "utterance": "the agency isn't likely to take any action until the union's rank and file for from the contract into three weeks" 99 | }, 100 | { 101 | "utterance": "the agency isn't likely to take any action until the union's rank and file parts of the contract into three weeks" 102 | } 103 | ], 104 | "id": "8686a37b5674cbdc63deb13f73de81a5" 105 | } 106 | 107 | 108 | = Configuration 109 | 110 | == Web service 111 | 112 | Unicorn configuration is in file unicorn.conf.rb. See http://unicorn.bogomips.org/examples/unicorn.conf.rb for 113 | more info. 114 | 115 | == Recognizer 116 | 117 | See conf.yaml 118 | 119 | = Using the web service 120 | 121 | Some of the more advanced examples below are specific to the Estonian configuration. 
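The server reads its settings from conf.yaml at startup, so one way to run with the Estonian setup is to copy conf-estonian.yaml over it before starting Unicorn. This is only a sketch: it assumes the Estonian acoustic and language models referenced in conf-estonian.yaml (the models/ directory) are available locally.

    cp conf-estonian.yaml conf.yaml
    unicorn -c unicorn.conf.rb config.ru
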
122 | 123 | ==Example 1 124 | 125 | Record a sentence to a wav file, in mono (hit Ctrl-C when done speaking): 126 | 127 | rec -c 1 sentence.wav 128 | 129 | 130 | Send it to the web service: 131 | 132 | curl -X POST --data-binary @sentence.wav -H "Content-Type: audio/x-wav" http://localhost:8080/recognize 133 | 134 | Output (encoded using json, the example uses Estonian models): 135 | 136 | { 137 | "status": 0, 138 | "hypotheses": [ 139 | { 140 | "utterance": [ 141 | "t\u00e4na on v\u00e4ljas \u00fcsna ilus ilm" 142 | ] 143 | } 144 | ], 145 | "id": "e30f54561135d681599915562d77d240" 146 | } 147 | 148 | == Example 2 149 | 150 | Record a raw file using arecord: 151 | 152 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > sentence2.raw 153 | 154 | Send it to web service: 155 | 156 | curl -X POST --data-binary @sentence2.raw -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize 157 | 158 | == Example 3 159 | 160 | Record a 5 second audio, pipe it to curl, which streams it directly to web service using PUT (and gets almost instant response): 161 | 162 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize 163 | 164 | 165 | = Support for JSGF grammars 166 | 167 | Users can use their own grammars to recognize certain sentences. The grammars should be in JSGF format. 168 | 169 | Example JSGF (let's call it robot.jsgf) 170 | 171 | #JSGF V1.0; 172 | 173 | grammar robot; 174 | 175 | public = (liigu | mine ) [ ( üks | kaks | kolm | neli | viis ) meetrit ] (edasi | tagasi); 176 | 177 | NB! Grammars should be in the same charset that the server is using for dictionary, which currently is latin-1 (sorry for that). 178 | 179 | You need to upload the JSGF file to somewhere where the server can fetch it, let's say http://www.example.com/robot.txt 180 | 181 | Now, let the server download and compile it: 182 | 183 | curl -vv http://localhost:8080/fetch-lm?url=http://www.example.com/robot.jsgf 184 | 185 | This should result in HTTP/1.1 200 OK. 186 | 187 | Now you can use the grammar to recognize a sentence that is accepted by the grammar: 188 | 189 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | \ 190 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" http://localhost:8080/recognize?lm=http://www.example.com/robot.jsgf 191 | 192 | Result: 193 | 194 | { 195 | "status": 0, 196 | "hypotheses": [ 197 | { 198 | "utterance": "mine viis meetrit tagasi" 199 | } 200 | ], 201 | "id": "9e3895e9ee0b5138e73c6fca30f51a58" 202 | } 203 | 204 | If you update the grammar on the server, you need to make the /fetch-jsgf request again, as the server doesn't check for changes every time 205 | a recognition request is done (for efficiency reasons). 206 | 207 | = Support for GF grammars 208 | 209 | GF (Grammatical Framework) grammars are supported. 210 | 211 | A GF grammar must be compiled into a .pgf file. To upload it to the server, use the fetch-pgf API call, e.g.: 212 | 213 | curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&lang=Est" 214 | 215 | The 'lang' attribute (defaults to 'Est') specifies input languages of the grammar. 
Many comma-separated languages can be specified, e.g lang=Est,Est2 216 | 217 | To recognize with a GF, use similar request as with JSGF, e.g.: 218 | 219 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf 220 | 221 | You can also specify output language(s) that will be used to linearize the raw recognition result, e.g.: 222 | 223 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 --duration 5 | curl -vv -T - -H "Content-Type: audio/x-raw-int; rate=16000" "http://localhost:8080/recognize?lm=http://kaljurand.github.com/Grammars/grammars/pgf/Calc.pgf&output-lang=App" 224 | 225 | Output: 226 | 227 | { 228 | "status": 0, 229 | "hypotheses": [ 230 | { 231 | "utterance": "viis minutit sekundites", 232 | "linearizations": [ 233 | { 234 | "lang": "App", 235 | "output": "5 ' IN \"" 236 | }, 237 | { 238 | "lang": "App", 239 | "output": "5 min IN s" 240 | } 241 | ] 242 | } 243 | ], 244 | "id": "83486feaca30995401ed4a66951a3f23" 245 | } 246 | 247 | Multiple output languages can be used, by using comma-separated values: "..&output-lang=App,App2" 248 | -------------------------------------------------------------------------------- /conf-english.yaml: -------------------------------------------------------------------------------- 1 | 2 | handlers: 3 | 4 | - name: PrettifyingHandler 5 | require: handlers/prettifier 6 | recognizer: 7 | hmm: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k 8 | lm: /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP 9 | dict: /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic 10 | bestpath: true 11 | maxwpf: 700 12 | fwdflat: true 13 | maxhmmpf: 12000 14 | wbeam: 1.0e-32 15 | beam: 1.0e-50 16 | pbeam: 1.0e-50 17 | 18 | 19 | # Request audio and metadata is dumped to this dir 20 | request_dump_dir: out 21 | 22 | # Encoding of LM words. 
Used for converting output to JSON (which is always UTF-8) 23 | recognizer_encoding: iso-8859-15 24 | 25 | -------------------------------------------------------------------------------- /conf-estonian.yaml: -------------------------------------------------------------------------------- 1 | 2 | handlers: 3 | - name: JSGFHandler 4 | require: handlers/jsgf_handler 5 | grammar_dir: user_grammars 6 | jsgf-to-fsg: ./scripts/jsgf2fsg.sh 7 | fsg-to-dict: ./scripts/fsg-to-dict.sh 8 | recognizer: 9 | hmm: models/est16k.cd_cont_3000-mapadapt 10 | dict: models/konele.splitw2.dict 11 | fsg: dummy.fsg 12 | bestpath: true 13 | fwdflat: true 14 | beam: 1.0e-80 15 | pbeam: 1.0e-47 16 | wbeam: 1.0e-39 17 | 18 | - name: PGFHandler 19 | require: handlers/pgf_handler 20 | lang: Est 21 | grammar_dir: user_gfs 22 | jsgf-to-fsg: ./scripts/jsgf2fsg.sh 23 | fsg-to-dict: ./scripts/fsg-to-dict.sh 24 | convert-gf-jsgf: ./scripts/convert-gf-jsgf.sh 25 | recognizer: 26 | hmm: models/est16k.cd_cont_3000-mapadapt 27 | dict: models/konele.splitw2.dict 28 | fsg: dummy.fsg 29 | bestpath: true 30 | fwdflat: true 31 | beam: 1.0e-80 32 | pbeam: 1.0e-47 33 | wbeam: 1.0e-39 34 | 35 | - name: PGFHandler 36 | require: handlers/pgf_handler 37 | lang: Eng 38 | grammar_dir: user_gfs 39 | jsgf-to-fsg: ./scripts/jsgf2fsg.sh 40 | fsg-to-dict: ./scripts/fsg-to-dict_en.py 41 | convert-gf-jsgf: ./scripts/convert-gf-jsgf.sh 42 | recognizer: 43 | fsg: dummy.fsg 44 | hmm: models/voxforge_en_sphinx.cd_cont_5000 45 | 46 | 47 | - name: PrettifyingHandler 48 | require: handlers/prettifier 49 | prettifier: ./prettify-with-numbers.sh 50 | recognizer: 51 | hmm: models/est16k.cd_cont_3000 52 | dict: models/konele.splitw2.dict 53 | lm: models/sphinx-trigram.konele.splitw2.arpa.gz 54 | bestpath: false 55 | maxwpf: 800 56 | fwdflat: false 57 | maxhmmpf: 10000 58 | wbeam: 1.0e-22 59 | beam: 1.0e-50 60 | pbeam: 1.0e-50 61 | 62 | 63 | request_dump_dir: out 64 | recognizer_encoding: UTF-8 65 | 66 | 67 | -------------------------------------------------------------------------------- /conf.yaml: -------------------------------------------------------------------------------- 1 | 2 | handlers: 3 | 4 | - name: PrettifyingHandler 5 | require: handlers/prettifier 6 | recognizer: 7 | hmm: /usr/local/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k 8 | lm: /usr/local/share/pocketsphinx/model/lm/en_US/wsj0vp.5000.DMP 9 | dict: /usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic 10 | bestpath: true 11 | maxwpf: 700 12 | fwdflat: true 13 | maxhmmpf: 12000 14 | wbeam: 1.0e-32 15 | beam: 1.0e-50 16 | pbeam: 1.0e-50 17 | 18 | 19 | # Request audio and metadata is dumped to this dir 20 | request_dump_dir: out 21 | 22 | # Encoding of LM words. 
Used for converting output to JSON (which is always UTF-8) 23 | recognizer_encoding: iso-8859-15 24 | 25 | -------------------------------------------------------------------------------- /config.ru: -------------------------------------------------------------------------------- 1 | 2 | 3 | # the below script is a standalone Sinatra application; absolutely 4 | # nothing special needs to be done in this Sinatra app for running 5 | # with Unicorn 6 | 7 | puts "-----------STARTING?----------" 8 | puts Dir.pwd 9 | 10 | $LOAD_PATH.unshift File.join(File.dirname(__FILE__), 'lib') 11 | 12 | require 'server' 13 | 14 | # the following hash needs to be the last statement, as unicorn 15 | # will eval this entire file 16 | run PocketsphinxServer::Server 17 | 18 | 19 | -------------------------------------------------------------------------------- /dummy.fsg: -------------------------------------------------------------------------------- 1 | FSG_BEGIN 2 | NUM_STATES 2 3 | START_STATE 0 4 | FINAL_STATE 1 5 | 6 | TRANSITION 0 1 1.000000 7 | FSG_END 8 | -------------------------------------------------------------------------------- /lib/handlers/handler.rb: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | class PocketsphinxServer::Handler 5 | 6 | attr_reader :recognizer, :config 7 | 8 | def initialize(server, config={}) 9 | @config = config 10 | @server = server 11 | @recognizer = PocketsphinxServer::Recognizer.new(server, config.fetch('recognizer', {})) 12 | end 13 | 14 | # Can this handler handle this request? 15 | def can_handle?(req) 16 | true 17 | end 18 | 19 | # Prepare the recognizer for this request (switch LM, etc) 20 | def prepare_rec(req) 21 | 22 | end 23 | 24 | # Postprocess an hypothesis string (e.g., make it prettyier) 25 | def postprocess_hypothesis(hyp) 26 | hyp 27 | end 28 | 29 | # Return a map of extra data for a hypothesis 30 | def get_hyp_extras(req, hyp) 31 | {} 32 | end 33 | 34 | def can_handle_fetch_lm?(req) 35 | false 36 | end 37 | 38 | def handle_fetch_lm(req) 39 | 40 | end 41 | 42 | def log(str) 43 | @server.logger.info str 44 | end 45 | end 46 | -------------------------------------------------------------------------------- /lib/handlers/jsgf_handler.rb: -------------------------------------------------------------------------------- 1 | 2 | require 'handlers/handler' 3 | 4 | class PocketsphinxServer::JSGFHandler < PocketsphinxServer::Handler 5 | 6 | def initialize(server, config={}) 7 | super 8 | @grammar_dir = config.fetch('grammar-dir', 'user_grammars') 9 | end 10 | 11 | def can_handle?(req) 12 | lm_name = req.params['lm'] 13 | return (lm_name != nil) && (lm_name =~ /jsgf$/) 14 | end 15 | 16 | def prepare_rec(req) 17 | lm_name = req.params['lm'] 18 | log("Using JSGF-based grammar") 19 | dict_file = dict_file_from_url(lm_name) 20 | fsg_file = fsg_file_from_url(lm_name) 21 | if not File.exists? fsg_file 22 | raise IOError, "Language model #{lm_name} not available. Use /fetch-lm API call to upload it to the server" 23 | end 24 | if not File.exists? dict_file 25 | raise IOError, "Pronunciation dictionary for #{lm_name} not available. 
Use /fetch-lm API call to make it on the server" 26 | end 27 | @recognizer.set_fsg(fsg_file, dict_file) 28 | log("Loaded requested JSGF model from #{fsg_file}") 29 | 30 | end 31 | 32 | def can_handle_fetch_lm?(req) 33 | lm_name = req.params['url'] 34 | if lm_name == nil 35 | # backward compability 36 | lm_name = req.params['lm'] 37 | end 38 | return (lm_name != nil) && (lm_name =~ /jsgf$/) 39 | end 40 | 41 | def handle_fetch_lm(req) 42 | url = req.params['url'] 43 | if url == nil 44 | # backward compability 45 | url = req.params['lm'] 46 | end 47 | log "Fetching JSGF grammar from #{url}" 48 | digest = MD5.hexdigest url 49 | content = open(url).read 50 | jsgf_file = @grammar_dir + "/#{digest}.jsgf" 51 | fsg_file = fsg_file_from_url(url) 52 | dict_file = dict_file_from_url(url) 53 | File.open(jsgf_file, 'w') { |f| 54 | f.write(content) 55 | } 56 | log "Converting to FSG.." 57 | `#{@config['jsgf-to-fsg']} #{jsgf_file} #{fsg_file}` 58 | if $? != 0 59 | raise "Failed to convert JSGF to FSG" 60 | end 61 | log "Making dictionary.." 62 | `cat #{fsg_file} | #{@config['fsg-to-dict']} > #{dict_file}` 63 | if $? != 0 64 | raise "Failed to make dictionary from FSG" 65 | end 66 | "Request completed" 67 | end 68 | 69 | def fsg_file_from_url(url) 70 | digest = MD5.hexdigest url 71 | return @grammar_dir + "/#{digest}.fsg" 72 | end 73 | 74 | def dict_file_from_url(url) 75 | digest = MD5.hexdigest url 76 | return @grammar_dir + "/#{digest}.dict" 77 | end 78 | 79 | 80 | 81 | end 82 | 83 | -------------------------------------------------------------------------------- /lib/handlers/pgf_handler.rb: -------------------------------------------------------------------------------- 1 | 2 | require 'handlers/handler' 3 | require 'rubygems' 4 | require 'locale' 5 | require 'locale/info' 6 | 7 | class PocketsphinxServer::PGFHandler < PocketsphinxServer::Handler 8 | 9 | def initialize(server, config={}) 10 | super 11 | @grammar_dir = config.fetch('grammar-dir', 'user_gfs') 12 | configured_language = config.fetch('lang', 'et') 13 | @language = Locale::Info.get_language(Locale::Tag.parse(configured_language).language) 14 | end 15 | 16 | def get_request_language(req) 17 | lang = req.params['lang'] 18 | if lang == nil 19 | lang = "et" 20 | end 21 | return Locale::Info.get_language(Locale::Tag.parse(lang).language) 22 | end 23 | 24 | def can_handle?(req) 25 | if @language != get_request_language(req) 26 | return false 27 | end 28 | lm_name = req.params['lm'] 29 | return (lm_name != nil) && (lm_name =~ /pgf$/) 30 | end 31 | 32 | def get_req_properties(req) 33 | input_lang = get_request_language(req).three_code.capitalize() 34 | output_langs = req.params['output-lang'] 35 | lm_name = req.params['lm'] 36 | digest = MD5.hexdigest lm_name 37 | pgf_dir = @grammar_dir + '/' + digest 38 | pgf_basename = File.basename(URI.parse(lm_name).path, ".pgf") 39 | return input_lang, output_langs, pgf_dir, pgf_basename, lm_name 40 | end 41 | 42 | def prepare_rec(req) 43 | puts "Using GF-based grammar" 44 | input_lang, output_langs, pgf_dir, pgf_basename, lm_name = get_req_properties(req) 45 | fsg_file = pgf_dir + '/' + pgf_basename + input_lang + ".fsg" 46 | dict_file = pgf_dir + '/' + pgf_basename + input_lang + ".dict" 47 | if not File.exists? fsg_file 48 | raise IOError, "Grammar for lang #{input_lang} for #{lm_name} not available. Use /fetch-lm API call to upload it to the server" 49 | end 50 | if not File.exists? dict_file 51 | raise IOError, "Pronunciation dictionary for lang #{input_lang} for #{lm_name} not available. 
Use /fetch-lm API call to make it on the server" 52 | end 53 | @recognizer.set_fsg(fsg_file, dict_file) 54 | end 55 | 56 | def get_hyp_extras(req, hyp) 57 | input_lang, output_langs, pgf_dir, pgf_basename, lm_name = get_req_properties(req) 58 | linearizations = [] 59 | if not output_langs.nil? 60 | output_langs.split(",").each do |output_lang| 61 | log "Linearizing [#{hyp}] to lang #{output_lang}" 62 | outputs = `echo "parse -lang=#{pgf_basename + input_lang} \\"#{hyp}\\" | linearize -lang=#{pgf_basename + output_lang} | ps -bind" | gf --run #{pgf_dir + '/' + pgf_basename + '.pgf'}` 63 | output_lines = outputs.split("\n") 64 | if output_lines == [] 65 | output_lines = [""] 66 | end 67 | output_lines.each do |output| 68 | log "LINEARIZED RESULT: " + output 69 | linearizations.push({:output => output, :lang => output_lang}) 70 | end 71 | end 72 | end 73 | return {:linearizations => linearizations} 74 | end 75 | 76 | def can_handle_fetch_lm?(req) 77 | lm_name = req.params['url'] 78 | langs = req.params['lang'] 79 | if langs == nil 80 | langs = "Est" 81 | end 82 | if not langs.split(',').collect { |l| Locale::Info.get_language(Locale::Tag.parse(l).language)}.include? @language 83 | return false 84 | end 85 | return (lm_name != nil) && (lm_name =~ /pgf$/) 86 | end 87 | 88 | def handle_fetch_lm(req) 89 | url = req.params['url'] 90 | log "Fetching PGF from #{url}" 91 | digest = MD5.hexdigest url 92 | content = open(url).read 93 | pgf_dir = @grammar_dir + '/' + digest 94 | FileUtils.mkdir_p pgf_dir 95 | pgf_basename = File.basename(URI.parse(url).path, ".pgf") 96 | File.open(pgf_dir + '/' + pgf_basename + ".pgf", 'w') { |f| 97 | f.write(content) 98 | } 99 | log 'Extracting concrete grammars' 100 | `gf -make --output-format=jsgf --output-dir=#{pgf_dir} #{pgf_dir + '/' + pgf_basename + ".pgf"}` 101 | if $? != 0 102 | raise "Failed to extract JSGF from PGF" 103 | end 104 | 105 | lang = @language.three_code.capitalize() 106 | 107 | jsgf_file = pgf_dir + '/' + pgf_basename + lang + ".jsgf" 108 | fsg_file = pgf_dir + '/' + pgf_basename + lang + ".fsg" 109 | dict_file = pgf_dir + '/' + pgf_basename + lang + ".dict" 110 | log "Making finite state grammar for input language #{lang}" 111 | log "Converting JSGF.." 112 | `#{@config.fetch('convert-gf-jsgf')} #{jsgf_file}` 113 | if $? != 0 114 | raise "Failed to convert JSGF for lang #{lang}" 115 | end 116 | log "Converting to FSG.." 117 | `#{@config.fetch('jsgf-to-fsg')} #{jsgf_file} #{fsg_file}` 118 | if $? != 0 119 | raise "Failed to convert JSGF to FSG for lang #{lang}" 120 | end 121 | log "Making dictionary.." 122 | `cat #{fsg_file} | #{@config.fetch('fsg-to-dict')} -lang #{lang} > #{dict_file}` 123 | if $? 
!= 0 124 | raise "Failed to make dictionary from FSG for lang #{lang}" 125 | end 126 | 127 | 128 | "Request completed" 129 | end 130 | 131 | end 132 | 133 | -------------------------------------------------------------------------------- /lib/handlers/prettifier.rb: -------------------------------------------------------------------------------- 1 | require 'handlers/handler' 2 | 3 | ### 4 | # This handler postprocesses hyps using an external script that can 5 | # syncronously process text, i.e., for each input line it instantly 6 | # flushes an output line 7 | ### 8 | class PocketsphinxServer::PrettifyingHandler < PocketsphinxServer::Handler 9 | 10 | def initialize(server, config={}) 11 | super 12 | @prettifier 13 | if config['prettifier'] != nil 14 | @prettifier = IO.popen(config['prettifier'], mode="r+") 15 | end 16 | end 17 | 18 | def postprocess_hypothesis(hyp) 19 | if @prettifier != nil && hyp && !hyp.empty? 20 | log "PRETTIFYING: #{hyp}" 21 | @prettifier.puts "#{hyp}" 22 | @prettifier.flush 23 | result = @prettifier.gets.strip 24 | log "RESULT: #{result}" 25 | return result 26 | end 27 | hyp 28 | end 29 | end 30 | -------------------------------------------------------------------------------- /lib/raw_recognizer.rb: -------------------------------------------------------------------------------- 1 | require 'gst' 2 | Gst.init 3 | 4 | class PocketsphinxServer::Recognizer 5 | attr :result 6 | attr :queue 7 | attr :pipeline 8 | attr :appsrc 9 | attr :asr 10 | attr :clock 11 | attr :appsink 12 | attr :recognizing 13 | 14 | def initialize(server, config={}) 15 | @server = server 16 | @data_buffers = [] 17 | @clock = Gst::SystemClock.new 18 | @result = "" 19 | @recognizing = false 20 | 21 | @outdir = nil 22 | begin 23 | @outdir = server.config.fetch('request_dump_dir' '') 24 | rescue 25 | end 26 | 27 | @appsrc = Gst::ElementFactory.make "appsrc", "appsrc" 28 | @decoder = Gst::ElementFactory.make "decodebin2", "decoder" 29 | @audioconvert = Gst::ElementFactory.make "audioconvert", "audioconvert" 30 | @audioresample = Gst::ElementFactory.make "audioresample", "audioresample" 31 | @tee = Gst::ElementFactory.make "tee", "tee" 32 | @queue1 = Gst::ElementFactory.make "queue", "queue1" 33 | @filesink = Gst::ElementFactory.make "filesink", "filesink" 34 | @queue2 = Gst::ElementFactory.make "queue", "queue2" 35 | @asr = Gst::ElementFactory.make "pocketsphinx", "asr" 36 | @appsink = Gst::ElementFactory.make "appsink", "appsink" 37 | 38 | @filesink.set_property("location", "/dev/null") 39 | 40 | config.map{ |k,v| 41 | log "Setting #{k} to #{v}..." 
42 | @asr.set_property(k, v) 43 | } 44 | # This returns when ASR engine has been fully loaded 45 | @asr.set_property('configured', true) 46 | 47 | create_pipeline() 48 | end 49 | 50 | 51 | def log(str) 52 | @server.logger.debug(str) 53 | end 54 | 55 | def create_pipeline() 56 | @pipeline = Gst::Pipeline.new "pipeline" 57 | @pipeline.add @appsrc, @decoder, @audioconvert, @audioresample, @tee, @queue1, @filesink, @queue2, @asr, @appsink 58 | @appsrc >> @decoder 59 | @audioconvert >> @audioresample >> @tee 60 | @tee >> @queue1 >> @asr >> @appsink 61 | @tee >> @queue2 >> @filesink 62 | 63 | @decoder.signal_connect('pad-added') { | element, pad, last, data | 64 | log "---- pad-added ---- " 65 | pad.link @audioconvert.get_pad("sink") 66 | } 67 | 68 | @queue = Queue.new 69 | 70 | 71 | @asr.signal_connect('partial_result') { |asr, text, uttid| 72 | #log "PARTIAL: " + text 73 | @result = text 74 | } 75 | 76 | @asr.signal_connect('result') { |asr, text, uttid| 77 | #log "FINAL: " + text 78 | if text.nil? 79 | text = "" 80 | end 81 | @result = text 82 | @queue.push(1) 83 | } 84 | 85 | @appsink = @pipeline.get_child("appsink") 86 | 87 | @appsink.signal_connect('eos') { |appsink, data| 88 | log "##### EOS #####" 89 | } 90 | 91 | @bus = @pipeline.bus 92 | @bus.signal_connect('message::state-changed') { |appsink, data| 93 | log "##### STATE-CHANGED #####" 94 | } 95 | end 96 | 97 | 98 | # Call this before starting a new recognition 99 | def clear(id, caps_str) 100 | caps = Gst::Caps.parse(caps_str) 101 | @appsrc.set_property("caps", caps) 102 | @result = "" 103 | queue.clear 104 | pipeline.pause 105 | if @outdir != nil 106 | @filesink.set_state(Gst::State::NULL) 107 | @filesink.set_property('location', "#{@outdir}/#{id}.raw") 108 | end 109 | @filesink.set_state(Gst::State::PLAYING) 110 | end 111 | 112 | def set_cmn_mean(mean) 113 | @asr.set_property("cmn_mean", mean) 114 | end 115 | 116 | def get_cmn_mean() 117 | return @asr.get_property("cmn_mean") 118 | end 119 | 120 | # Feed new chunk of audio data to the recognizer 121 | def feed_data(data) 122 | buffer = Gst::Buffer.new 123 | my_data = data.dup 124 | buffer.data = my_data 125 | buffer.timestamp = clock.time 126 | appsrc.push_buffer(buffer) 127 | # HACK: we need to reference the buffer so that ruby won't overwrite it 128 | @data_buffers.push my_data 129 | pipeline.play 130 | @recognizing = true 131 | end 132 | 133 | # Notify recognizer of utterance end 134 | def feed_end 135 | appsrc.end_of_stream 136 | end 137 | 138 | # Wait for the recognizer to recognize the current utterance 139 | # Returns the final recognition result 140 | def wait_final_result(max_nbest = 5) 141 | queue.pop 142 | # we request more N-best hyps than needed since we don't care about 143 | # differences in fillers 144 | @asr.set_property("nbest_size", max_nbest * 3) 145 | nbest = @asr.get_property("nbest") 146 | nbest.uniq! 147 | #nbest.map!{ |hyp| if hyp.nil? 
then hyp = "" end } 148 | @pipeline.ready 149 | @data_buffers.clear 150 | @recognizing = false 151 | log "CMN mean after: #{@asr.get_property("cmn_mean")}" 152 | return result, nbest[0..max_nbest-1] 153 | end 154 | 155 | def stop 156 | #@pipeline.play 157 | appsrc.end_of_stream 158 | wait_final_result 159 | end 160 | 161 | def set_fsg(fsg_file, dict_file) 162 | @asr.set_property('fsg', 'dummy.fsg') 163 | log "Trying to use dict #{dict_file}" 164 | @asr.set_property('dict', dict_file) 165 | log "Trying to use FSG #{fsg_file}" 166 | @asr.set_property('fsg', fsg_file) 167 | @asr.set_property('configured', true) 168 | log "FSG configured" 169 | end 170 | 171 | def recognizing?() 172 | @recognizing 173 | end 174 | end 175 | -------------------------------------------------------------------------------- /lib/server.rb: -------------------------------------------------------------------------------- 1 | require 'sinatra/base' 2 | require 'uuidtools' 3 | require 'json' 4 | require 'iconv' 5 | require 'set' 6 | require 'yaml' 7 | require 'open-uri' 8 | require 'md5' 9 | require 'uri' 10 | 11 | module PocketsphinxServer 12 | 13 | require 'raw_recognizer' 14 | 15 | class PocketsphinxServer::Server < Sinatra::Base 16 | 17 | configure do 18 | enable :static 19 | set :root, File.expand_path(".") 20 | 21 | set :public_folder, 'static' 22 | 23 | enable :logging 24 | disable :show_exceptions 25 | 26 | LOGGER = Logger.new(STDOUT) 27 | set :logger, LOGGER 28 | def LOGGER.puts(*s) 29 | s.flatten.each { |item| info(item.to_s) } 30 | end 31 | 32 | def LOGGER.write(*s) 33 | s.flatten.each { |item| info(item.to_s) } 34 | end 35 | 36 | $stdout = LOGGER 37 | $stderr = LOGGER 38 | 39 | set :config, YAML.load_file('conf.yaml') 40 | 41 | set :handlers, [] 42 | config['handlers'].each do |handler_config| 43 | className = handler_config['name'] 44 | requireName = handler_config['require'] 45 | require requireName 46 | puts "Creating handler #{className}" 47 | handler = PocketsphinxServer.const_get(className).new(self, handler_config) 48 | handlers << handler 49 | end 50 | 51 | begin 52 | set :outdir, config["request_dump_dir"] 53 | Dir.mkdir(settings.outdir) 54 | rescue 55 | end 56 | 57 | CHUNK_SIZE = 256 58 | 59 | end 60 | 61 | get '/' do 62 | markdown :index, :layout_engine => :erb 63 | end 64 | 65 | # FIXME: make it concurrent-safe 66 | get '/stats/history.png' do 67 | headers "Content-Type" => "image/png" 68 | `mkdir -p tmp` 69 | `./scripts/log2png.sh server.log tmp/stats.png` 70 | File.read(File.join('tmp', "stats.png")) 71 | end 72 | 73 | # FIXME: make it concurrent-safe 74 | get '/stats/apps.png' do 75 | headers "Content-Type" => "image/png" 76 | `mkdir -p tmp` 77 | `./scripts/log2apps-png.sh server.log tmp/apps.png` 78 | File.read(File.join('tmp', "apps.png")) 79 | end 80 | 81 | get '/stats/apps.txt' do 82 | headers "Content-Type" => "text/plain" 83 | `./scripts/log2apps-txt.sh server.log` 84 | end 85 | 86 | get '/stats/models.txt' do 87 | headers "Content-Type" => "text/plain" 88 | `./scripts/log2models-txt.sh server.log` 89 | end 90 | 91 | post '/recognize' do 92 | do_post() 93 | end 94 | 95 | put '/recognize' do 96 | do_post() 97 | end 98 | 99 | put '/recognize/*' do 100 | do_post() 101 | end 102 | 103 | helpers do 104 | def logger 105 | LOGGER 106 | end 107 | end 108 | 109 | def do_post() 110 | id = SecureRandom.hex 111 | 112 | logger.info "Request ID: " + id 113 | req = Rack::Request.new(env) 114 | 115 | logger.info "Determining request handler..." 
116 | 117 | @req_handler = nil 118 | settings.handlers.each do |handler| 119 | if handler.can_handle?(req) 120 | @req_handler = handler 121 | break 122 | end 123 | end 124 | logger.info "Request will be handled by #{@req_handler}" 125 | 126 | logger.info "Preparing request handler recognizer..." 127 | @req_handler.prepare_rec(req) 128 | 129 | nbest_n = 5 130 | if req.params.has_key? 'nbest' 131 | nbest_n = req.params['nbest'].to_i 132 | end 133 | 134 | if settings.outdir != nil 135 | File.open("#{settings.outdir}/#{id}.info", 'w') { |f| 136 | req.env.select{ |k,v| 137 | f.write "#{k}: #{v}\n" 138 | } 139 | } 140 | end 141 | logger.info "User agent: " + req.user_agent 142 | 143 | device_id = get_user_device_id(req) 144 | logger.info "Device ID : #{device_id}" 145 | cmn_mean = get_cmn_mean(device_id) 146 | if cmn_mean != nil 147 | logger.info "Setting CMN mean to #{cmn_mean}" 148 | @req_handler.recognizer.set_cmn_mean(cmn_mean) 149 | end 150 | 151 | logger.info "Parsing content type " + req.content_type 152 | caps_str = content_type_to_caps(req.content_type) 153 | logger.info "CAPS string is " + caps_str 154 | @req_handler.recognizer.clear(id, caps_str) 155 | 156 | length = 0 157 | 158 | left_over = "" 159 | req.body.each do |chunk| 160 | chunk_to_rec = left_over + chunk 161 | if chunk_to_rec.length > CHUNK_SIZE 162 | chunk_to_send = chunk_to_rec[0..(chunk_to_rec.length / 2) * 2 - 1] 163 | @req_handler.recognizer.feed_data(chunk_to_send) 164 | left_over = chunk_to_rec[chunk_to_send.length .. -1] 165 | else 166 | left_over = chunk_to_rec 167 | end 168 | length += chunk.size 169 | end 170 | @req_handler.recognizer.feed_data(left_over) 171 | 172 | 173 | logger.info "Data end received" 174 | if length > 0 175 | @req_handler.recognizer.feed_end() 176 | result,nbest = @req_handler.recognizer.wait_final_result(max_nbest=nbest_n) 177 | set_cmn_mean(device_id, @req_handler.recognizer.get_cmn_mean()) 178 | 179 | nbest_results = [] 180 | 181 | nbest.collect! do |hyp| 182 | @req_handler.postprocess_hypothesis(hyp) 183 | end 184 | 185 | nbest_results = [] 186 | nbest.collect do |hyp| 187 | nbest_result = {} 188 | nbest_result[:utterance] = hyp 189 | extras_map = @req_handler.get_hyp_extras(req, hyp) 190 | nbest_result.merge!(extras_map) 191 | nbest_results << nbest_result 192 | end 193 | source_encoding = settings.config["recognizer_encoding"] 194 | if source_encoding != "utf-8" 195 | # convert all strings in nbest_results from source encoding to UTF-8 196 | traverse( nbest_results ) do |node| 197 | if node.is_a? 
String 198 | node = Iconv.iconv('utf-8', source_encoding, node)[0] 199 | end 200 | node 201 | end 202 | end 203 | 204 | headers "Content-Type" => "application/json; charset=utf-8", "Content-Disposition" => "attachment" 205 | JSON.pretty_generate({:status => 0, :id => id, :hypotheses => nbest_results}) 206 | else 207 | @req_handler.recognizer.stop 208 | headers "Content-Type" => "application/json; charset=utf-8", "Content-Disposition" => "attachment" 209 | JSON.pretty_generate({:status => 0, :id => id, :hypotheses => [:utterance => ""]}) 210 | end 211 | end 212 | 213 | 214 | # Handle /fetch-lm requests and backward compatible fetch requests 215 | get %r{/fetch-((lm)|(jsgf)|(pgf))} do 216 | handled = false 217 | settings.handlers.each do |handler| 218 | if handler.can_handle_fetch_lm?(request) 219 | handler.handle_fetch_lm(request) 220 | handled = true 221 | end 222 | end 223 | if !handled 224 | status 409 225 | "Don't know how to handle this type of language model" 226 | else 227 | "Request completed" 228 | end 229 | end 230 | 231 | error do 232 | logger.info "Error: " + env['sinatra.error'] 233 | logger.info "Inspecting #{@req_handler}..." 234 | if @req_handler != nil and @req_handler.recognizer.recognizing? 235 | logger.info "Trying to clear recognizer.." 236 | @req_handler.recognizer.stop 237 | logger.info "Cleared recognizer" 238 | end 239 | #consume request body to avoid proxy error 240 | begin 241 | request.body.read 242 | rescue 243 | end 244 | 'Sorry, failed to process request. Reason: ' + env['sinatra.error'] + "\n" 245 | end 246 | 247 | # Traverses a structure of hashes and arrays and applied blk to the values 248 | def traverse(obj, &blk) 249 | case obj 250 | when Hash 251 | # Forget keys because I don't know what to do with them 252 | obj.each {|k,v| obj[k] = traverse(v, &blk) } 253 | when Array 254 | obj.collect! {|v| traverse(v, &blk) } 255 | else 256 | blk.call(obj) 257 | end 258 | end 259 | 260 | # Parses Content-type ans resolves it to GStreamer Caps string 261 | def content_type_to_caps(content_type) 262 | if not content_type 263 | content_type = "audio/x-raw-int" 264 | return "audio/x-raw-int,rate=16000,channels=1,signed=true,endianness=1234,depth=16,width=16" 265 | end 266 | parts = content_type.split(%r{[,; ]}) 267 | result = "" 268 | allowed_types = Set.new ["audio/x-flac", "audio/x-raw-int", "application/ogg", "audio/mpeg", "audio/x-wav"] 269 | if allowed_types.include? parts[0] 270 | result = parts[0] 271 | if parts[0] == "audio/x-raw-int" 272 | attributes = {"rate"=>"16000", "channels"=>"1", "signed"=>"true", "endianness"=>"1234", "depth"=>"16", "width"=>"16"} 273 | user_attributes = Hash[*parts[1..-1].map{|s| s.split('=', 2) }.flatten] 274 | attributes.merge!(user_attributes) 275 | result += ", " + attributes.map{|k,v| "#{k}=#{v}"}.join(", ") 276 | end 277 | return result 278 | else 279 | raise IOError, "Unsupported content type: #{parts[0]}. Supported types are: " + allowed_types.to_a.join(", ") + "." 280 | end 281 | end 282 | 283 | # TODO: make this configurable and modular 284 | def get_user_device_id(req) 285 | device_id = req.params['device_id'] 286 | if (not device_id.nil?) and (not device_id.empty?) 
287 | return device_id 288 | end 289 | user_agent = req.user_agent 290 | # try to identify android device using old method 291 | if user_agent =~ /.*\(RecognizerIntentActivity.* ([\w-]+); .*/ 292 | return $1 293 | elsif user_agent =~ /RecognizerTester.* (\S+)/ 294 | return $1 295 | end 296 | return "default" 297 | end 298 | 299 | def get_cmn_mean(device_id) 300 | cmn_means = {} 301 | begin 302 | File.open('cmn_means.json', 'r') { |f| cmn_means = JSON.load(f) } 303 | rescue 304 | begin 305 | # backward compability, remove soon 306 | logger.warn("Falling back to deprecated cmn_means.yaml instead of cmn_means.json") 307 | cmn_means = YAML.load_file('cmn_means.yaml') 308 | rescue 309 | end 310 | end 311 | return cmn_means.fetch(device_id, get_mean_cmn_mean(cmn_means.values)) 312 | end 313 | 314 | # Calculate mean CMN from an array of CMN means 315 | # CMN means are given as an array of string, each with comma-seperated values 316 | # Returns mean CMN, as a string, comma-seperated 317 | def get_mean_cmn_mean(cmn_mean_array) 318 | begin 319 | if cmn_mean_array.size > 0 320 | means = (cmn_mean_array.collect do | s | s.split(",") end).collect do |a| a.collect do |ss| ss.to_f end end 321 | sum = [0.0] * means[0].size 322 | means = means.select do | mean | mean.size == sum.size end 323 | means.each do | mean | 324 | sum.each_with_index do |s,i| 325 | sum[i] += mean[i] 326 | end 327 | end 328 | return (sum.collect do |s| "%.2f" % (s / means.size) end).join(",") 329 | else 330 | return nil 331 | end 332 | rescue Exception => e 333 | logger.warn("Failed to calculate CMN mean over saved means:" + e.message) 334 | return nil 335 | end 336 | end 337 | 338 | def set_cmn_mean(device_id, mean) 339 | cmn_means = {} 340 | begin 341 | cmn_means = YAML.load_file('cmn_means.yaml') 342 | rescue 343 | end 344 | cmn_means[device_id] = mean 345 | uid = Process.uid 346 | File.open("cmn_means.json.#{uid}", 'w' ) do |out| 347 | out.write(cmn_means.to_json) 348 | end 349 | `mv cmn_means.json.#{uid} cmn_means.json` 350 | end 351 | end 352 | 353 | 354 | end 355 | -------------------------------------------------------------------------------- /scripts/convert-gf-jsgf.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | sed -i "s/^public //" $1 4 | sed -i "s/^
/public
/" $1 5 | -------------------------------------------------------------------------------- /scripts/en-g2p.sh: -------------------------------------------------------------------------------- 1 | #! /bin/sh 2 | 3 | . `dirname $0`/settings.sh 4 | 5 | 6 | tempfile=`mktemp` 7 | tempfile2=`mktemp` 8 | tempfile3=`mktemp` 9 | 10 | cat > $tempfile 11 | 12 | cat $tempfile | perl -C -ne 'BEGIN{use Text::Unidecode;} chomp; $x=unidecode($_); $x=uc($x); print "$_ $x\n"' | sort -k2 > $tempfile2 13 | 14 | cut -f 2 -d " " $tempfile2 > $tempfile3 15 | 16 | $PHONETISAURUS --model=$EN_FST --input=$tempfile3 --isfile --words | sort | join -1 2 -2 1 $tempfile2 - | perl -npe 's/\S+\s+(\S+)\s+\S+/\1 /' 17 | 18 | rm $tempfile $temfile2 $tempfile3 19 | -------------------------------------------------------------------------------- /scripts/fsg-to-dict.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | . `dirname $0`/settings.sh 4 | 5 | grep TRANSITION | cut -f 5 -d " " | sort | uniq | $ET_G2P 6 | -------------------------------------------------------------------------------- /scripts/fsg-to-dict_en.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/python 2 | 3 | import sys 4 | import re 5 | 6 | import os 7 | from subprocess import Popen, PIPE, STDOUT 8 | BASE_DICT="/usr/local/share/pocketsphinx/model/lm/en_US/cmu07a.dic" 9 | G2P=os.path.dirname(sys.argv[0]) + "/en-g2p.sh" 10 | 11 | words = {} 12 | for l in open(BASE_DICT): 13 | ss = l.split() 14 | word = ss[0] 15 | word = re.sub(r"\(\d\)$", "", word) 16 | try: 17 | prob = float(ss[1]) 18 | pron = ss[2:] 19 | except ValueError: 20 | prob = 1 21 | pron = ss[1:] 22 | 23 | words.setdefault(word, []).append((pron, prob)) 24 | 25 | input_words = set() 26 | 27 | for l in sys.stdin: 28 | if l.startswith("TRANSITION"): 29 | ss = l.split() 30 | if len(ss) == 5: 31 | input_words.add(ss[-1]) 32 | 33 | g2p_words = [] 34 | for w in input_words: 35 | if w.lower() in words: 36 | for (i, pron) in enumerate(words[w.lower()]): 37 | if i == 0: 38 | print w, 39 | else: 40 | print "%s(%d)" % (w, i+1), 41 | print " ".join(pron[0]) 42 | else: 43 | g2p_words.append(w) 44 | 45 | if len(g2p_words) > 0: 46 | proc = Popen(G2P,stdin=PIPE, stdout=PIPE, stderr=STDOUT ) 47 | #stdout, stderr = proc.communicate() 48 | for w in g2p_words: 49 | print >>proc.stdin, w 50 | proc.stdin.close() 51 | 52 | #return_code = proc.wait() 53 | 54 | for l in proc.stdout: 55 | print l, 56 | 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /scripts/fsm2fsg.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python 2 | 3 | import sys 4 | 5 | from math import exp 6 | 7 | if __name__ == '__main__': 8 | #HACK: currently all weights are set to 1. 
Seems to make the recognition more robust 9 | 10 | arcs = [] 11 | for l in sys.stdin: 12 | ss = l.split() 13 | if len(ss) == 5: 14 | arcs.append((int(ss[0]), int(ss[1]), ss[2], 1)) 15 | elif len(ss) == 4: 16 | arcs.append((int(ss[0]), int(ss[1]), ss[2], 1)) 17 | elif len(ss) == 2: 18 | arcs.append((int(ss[0]), -1, "", 1)) 19 | elif len(ss) == 1: 20 | arcs.append((int(ss[0]), -1, "", 1)) 21 | else: 22 | print >>sys.stderr, "WARNING: strange FSG line: ", l 23 | 24 | max_state = max([a[1] for a in arcs]) 25 | 26 | final_state_id = max_state + 1 27 | 28 | 29 | print "FSG_BEGIN " 30 | print "NUM_STATES", max_state + 2 31 | print "START_STATE 0" 32 | print "FINAL_STATE", final_state_id 33 | 34 | for a in arcs: 35 | print "TRANSITION", a[0], a[1] == -1 and final_state_id or a[1], "%7.5f" % min(1.0, a[3]), a[2] 36 | 37 | print "FSG_END" 38 | -------------------------------------------------------------------------------- /scripts/jsgf2fsg.sh: -------------------------------------------------------------------------------- 1 | #! /bin/sh 2 | 3 | if [ $# -ne 2 ] 4 | then 5 | echo "Usage: `basename $0` jsgf fsg" 6 | exit 1 7 | fi 8 | 9 | #sphinx_jsgf2fsg -jsgf $1 -fsg $2 10 | 11 | sphinx_jsgf2fsg -jsgf $1 -fsm ${1%.*}.fsm -symtab ${1%.*}.sym 12 | 13 | fstcompile --arc_type=log --acceptor --isymbols=${1%.*}.sym --keep_isymbols ${1%.*}.fsm | \ 14 | fstdeterminize | fstminimize | fstrmepsilon | fstprint | \ 15 | `dirname $0`/fsm2fsg.py > $2 16 | -------------------------------------------------------------------------------- /scripts/log2apps-png.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | 3 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" 4 | 5 | $DIR/log2apps-txt.sh $1 | head -20 > tmp/apps.txt 6 | 7 | gnuplot < tmp/data.txt 5 | 6 | gnuplot < 31 | Available in Android Market 33 | 34 | 35 | Mõlemad rakendused on tasuta ja avatud lähtekoodiga. 36 | 37 | ## Serveri kasutamine Java rakendustes 38 | 39 | Serverit on lihtne kasutada läbi spetsiaalse teegi, mis on tasuta ja koos lähtekoodiga saadaval 40 | [siin](http://code.google.com/p/net-speech-api). 41 | 42 | ## Serveri kasutamine muudes rakendustes 43 | 44 | Serveri kasutamine on väga lihtne ka "otse", ilma vaheteegita. Järgnevalt demonstreerime, kuidas 45 | serverit kasutada Linuxi käsurealt. 46 | 47 | ### Näide 1: raw formaadis heli 48 | 49 | Lindista mikrofoniga üks lühike lause, kasutades raw formaati, 16 kB, mono kodeeringut (vajuta Ctrl-C, kui oled lõpetanud): 50 | 51 | arecord --format=S16_LE --file-type raw --channels 1 --rate 16000 > lause1.raw 52 | 53 | 54 | Nüüd, saada lause serverisse tuvastamisele (kasutades programmi curl, saadaval 55 | kõikide Linuxite repositoriumites): 56 | 57 | curl -X POST --data-binary @lause1.raw \ 58 | -H "Content-Type: audio/x-raw-int; rate=16000" \ 59 | http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1 60 | 61 | 62 | Server genereerib vastuse JSON formaadis: 63 | 64 | 65 | { 66 | "status": 0, 67 | "hypotheses": [ 68 | { 69 | "utterance": "see on esimene lause" 70 | } 71 | ], 72 | "id": "4d00ffd9b1a101940bb3ed88c6b6300d" 73 | } 74 | 75 | ### Näide 2: ogg formaadis heli 76 | 77 | Server tunneb ka formaate flac, ogg, mpeg, wav. Päringu Content-Type väli peaks sel juhul olema 78 | vastavalt audio/x-flac, application/ogg, audio/mpeg või audio/x-wav. 
79 | 80 | Salvestame ogg formaadis lause (selleks peaks olema installeeritud pakett SoX): 81 | 82 | rec -r 16000 lause2.ogg 83 | 84 | Saadame serverisse, kasutades PUT päringut: 85 | 86 | curl -T lause2.ogg -H "Content-Type: application/ogg" "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1" 87 | 88 | Väljund: 89 | 90 | { 91 | "status": 0, 92 | "hypotheses": [ 93 | { 94 | "utterance": "see on teine lause" 95 | } 96 | ], 97 | "id": "dfd8ed3a028d1e70e4233f500e21c027" 98 | } 99 | 100 | 101 | ### Näide 3: mitu tuvastushüpoteesi 102 | 103 | Parameeter nbest=1 ütles eelmises päringus serverile, et meid huvitab 104 | ainult üks tulemus. Vaikimisi annab server viis kõige tõenäolisemat tuvastushüpoteesi, 105 | hüpoteesi tõenäosuse järjekorras: 106 | 107 | curl -X POST --data-binary @lause1.raw \ 108 | -H "Content-Type: audio/x-raw-int; rate=16000" \ 109 | http://bark.phon.ioc.ee/speech-api/v1/recognize 110 | 111 | 112 | Tulemus: 113 | 114 | { 115 | "status": 0, 116 | "hypotheses": [ 117 | { 118 | "utterance": "see on esimene lause" 119 | }, 120 | { 121 | "utterance": "see on esimene lause on" 122 | }, 123 | { 124 | "utterance": "see on esimene lausa" 125 | }, 126 | { 127 | "utterance": "see on mu esimene lause" 128 | }, 129 | { 130 | "utterance": "see on esimene laose" 131 | } 132 | ], 133 | "id": "61c78c7271026153b83f39a514dc0c41" 134 | } 135 | 136 | ### Näide 4: JSGF grammatika kasutamine 137 | 138 | Vaikimisi kasutab server statistilist keelemudelit, mis üritab leida õige 139 | tuvastushüpoteesi kõikvõimalike eestikeelsete lausete hulgast. Mõnikord on 140 | aga kasulik võimalike lausete hulka piirata reeglipõhise grammatikaga. Server 141 | lubab grammatikaid defineerida kahes formaadis: 142 | [JSGF](http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/) ja 143 | [GF](http://www.grammaticalframework.org/). 144 | 145 | Näiteks allolev JSGF formaadis grammatika aktsepteerib muu hulgas selliseid lauseid: 146 | 147 | >mine edasi 148 | 149 | >liigu kaks meetrit tagasi 150 | 151 | >liigu üks meeter edasi 152 | 153 | >keera paremale 154 | 155 | 156 | 157 | Grammatika: 158 | 159 | #JSGF V1.0; 160 | 161 | grammar robot; 162 | 163 | public = | ; 164 | = (liigu | mine ) [ ( üks meeter ) | ( (kaks | kolm | neli | viis ) meetrit ) ] (edasi | tagasi ); 165 | = (keera | pööra) [ paremale | vasakule ]; 166 | 167 | Grammatika kasutamiseks peab selle kõigepealt laadima kusagile internetiserverisse, kus ta oleks 168 | kõikjalt kättessadav (näiteks Dropboxi public folder). Antud juhul on grammatika 169 | [siin](http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf). 170 | 171 | Seejärel tuleb kõnetuvastusserverile öelda, et ta grammatika alla laeks: 172 | 173 | curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf" 174 | 175 | Lindistame seejärel testlause (näiteks "liigu üks meeter edasi", Ctrl-C kui valmis): 176 | 177 | rec -r 16000 liigu_1m_edasi.ogg 178 | 179 | Grammatika abil tuvastamiseks tuleb päringule lisada parameeter 180 | lm=http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf: 181 | 182 | 183 | curl -T liigu_1m_edasi.ogg \ 184 | -H "Content-Type: application/ogg" \ 185 | "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1&lm=http://www.phon.ioc.ee/~tanela/tmp/robot.jsgf" 186 | 187 | Vastus: 188 | 189 | { 190 | "status": 0, 191 | "hypotheses": [ 192 | { 193 | "utterance": "liigu \u00fcks meeter edasi" 194 | } 195 | ], 196 | "id": "c858c89badc3597ca8ec7f10985b71de" 197 | } 198 | 199 | NB! 
JSGF formaadis grammatikad peaksid olema ISO-8859-14 kodeeringus. Serveri vastus on 200 | UTF-8 kodeeringus, nagu JSON standard ette näeb. 201 | 202 | ### Näide 5: GF formaadis grammatika kasutamine (edasijõudnutele) 203 | 204 | GF on grammatikaformalism, mis lubab muu hulgas ühele abstraktsele grammatikale 205 | luua mitu implementatsiooni erinevates keeltes. Näiteks, abstraktne grammatika 206 | võib olla mõeldud roboti juhtimiseks, tema implementatsioon eesti keeles 207 | defineerib, kuidas robotit eesti keeles juhtida, ning teine implementatsioon 208 | "masinkeeles" defineerib roboti poolt arusaadava süntaksi. 209 | 210 | Palju eestikeelse implementatsiooniga GF grammatikaid leiab [siit](http://kaljurand.github.com/Grammars/). 211 | 212 | Nagu JSGF puhul, tuleb ka GF grammatika serverisse laadida, kasutades GF binaarset 213 | formaati (PGF). Antud juhul tuleb ka spetsifitseerida, 214 | millist grammatikaimplementatsiooni server kõnetuvastuseks kasutama peaks, kasutades parameetrit 215 | lang: 216 | 217 | curl "http://bark.phon.ioc.ee/speech-api/v1/fetch-lm?url=http://kaljurand.github.com/Grammars/grammars/pgf/Go.pgf&lang=Est" 218 | 219 | Salvestame jälle testlause (näiteks "mine neli meetrit edasi"): 220 | 221 | rec -r 16000 mine_4m_edasi.ogg 222 | 223 | Tuvastamiseks tuleb näidata serverile, milline on soovitav väljundkeel (parameetriga output-lang=App): 224 | 225 | curl -T mine_4m_edasi.ogg \ 226 | -H "Content-Type: application/ogg"\ 227 | "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1&lm=http://kaljurand.github.com/Grammars/grammars/pgf/Go.pgf&output-lang=App" 228 | 229 | Vastus: 230 | 231 | { 232 | "status": 0, 233 | "hypotheses": [ 234 | { 235 | "linearizations": [ 236 | { 237 | "lang": "App", 238 | "output": "4 m >" 239 | } 240 | ], 241 | "utterance": "mine neli meetrit edasi" 242 | } 243 | ], 244 | "id": "e2f3067d69ea22c75dc4b0073f23ff38" 245 | } 246 | 247 | Vastuses on nüüd iga hüpoteesi juures väli linearizations, 248 | mis annab sisendi "linearisatsiooni" (ehk tõlke) väljundkeeles. Antud grammatika 249 | puhul on linearisatsioon väljundkeeles "4 m >", mida on robotil võib-olla 250 | lihtsam parsida, kui eestikeelset käsklust. 251 | 252 | Kui PGF failis 253 | on grammatikaimplementatsioone rohkem, võib korraga küsida väljundit mitmes keeles: 254 | 255 | curl -T mine_4m_edasi.ogg \ 256 | -H "Content-Type: application/ogg" \ 257 | "http://bark.phon.ioc.ee/speech-api/v1/recognize?nbest=1&lm=http://kaljurand.github.com/Grammars/grammars/pgf/Go.pgf&output-lang=App,Eng,Est" 258 | 259 | Väljund: 260 | 261 | { 262 | "status": 0, 263 | "hypotheses": [ 264 | { 265 | "linearizations": [ 266 | { 267 | "lang": "App", 268 | "output": "4 m >" 269 | }, 270 | { 271 | "lang": "Eng", 272 | "output": "go four meters forward" 273 | }, 274 | { 275 | "lang": "Est", 276 | "output": "mine neli meetrit edasi" 277 | } 278 | ], 279 | "utterance": "mine neli meetrit edasi" 280 | } 281 | ], 282 | "id": "d9abdbc2a7669752059ad544d3ba14f7" 283 | } 284 | 285 | ## Korduma kippuvad küsimused 286 | 287 | #### Kas server salvestab mu kõnet? 288 | 289 | Jah. Üldjuhul neid salvestusi küll keegi ei kuula, aga pisteliselt võidakse 290 | salvestusi kuulata ja käsitsi transkribeerida tuvastuskvaliteedi hindamiseks 291 | ja parandamiseks. 292 | 293 | #### Tuvastuskvaliteet on väga halb! 294 | 295 | Jah. Parima kvaliteedi saab suu lähedal oleva mikrofoni kasutamisel. 
296 | Loodetavasti tulevikus kvaliteet paraneb, kui saame juba serverisse saadetud 297 | salvestusi kasutada mudelite parandamiseks (vt eelmine küsimus). 298 | 299 | 300 | #### Kas ma võin serverit piiramatult tasuta kasutada? 301 | 302 | Mitte päris. Hetkel võib ühelt IP-lt teha tunnis kuni 100 ja päevas kuni 200 tuvastuspäringut. 303 | Tulevikus võivad need limiidid muutuda (see sõltub teenuse populaarsusest ja meie serveripargi 304 | arengust). 305 | 306 | 307 | #### Mis mõttes see tasuta on? 308 | 309 | Tehnoloogia on välja töötatud riikliku programmi "Eesti keeletehnoloogia 2011-2017" raames, seega 310 | on maksumaksja juba selle eest maksnud. Riiklik programm ei pane küll meile 311 | kohustust sellist serverit piiramatult hallata, sellepärast võivad tulevikus 312 | kasutustingimused muutuda, serveri tarkvara aga jääb alatiseks tasuta, kui 313 | ei teki mingeid muid seniarvestamata asjaolusid. 314 | 315 | #### OK, aga kas ma võin siis sellise tuvastustarkava enda serverisse installeerida? 316 | 317 | Jah. Serveri tarkvara on saadaval [siin](https://github.com/alumae/ruby-pocketsphinx-server), 318 | eesti keele akustilise ja statistilise keelemudeli ning liitsõnade rekonstrueerimismudeli 319 | saamiseks palume kontakteeruda. Mudelid ei ole päris "vabad", s.t. nendele kehtivad teatud 320 | kasutuspiirangud (näiteks ei või neid levitada). 321 | 322 | #### Kas iOS (Windows Phone 7, Blackberry, Meego) rakendus ka tuleb? 323 | 324 | Hetkel pole plaanis. Samas on server avatud kõikidele rakendustele, seega 325 | võib sellise rakenduse implementeerida keegi kolmas. 326 | 327 | ## Kontakt 328 | 329 | Tanel Alumäe: [tanel.alumae@phon.ioc.ee](tanel.alumae@phon.ioc.ee) 330 | -------------------------------------------------------------------------------- /views/layout.erb: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | <%= @title %> 5 | 6 | 7 | 26 | 27 | 28 | 29 |
32 |   <%= yield %>
34 |   © TTÜ Küberneetika Instituut 2011
--------------------------------------------------------------------------------