├── .dockerignore
├── .gitignore
├── README.md
├── dockerfile
├── get_model.py
├── handler.py
├── model
│   └── .gitkeep
├── requirements.txt
└── serverless.yml

--------------------------------------------------------------------------------
/.dockerignore:
--------------------------------------------------------------------------------
README.md
*.pyc
*.pyo
*.pyd
__pycache__
.pytest_cache
serverless.yaml
get_model.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# Serverless directories
.serverless
model/*.json
model/*.bin
model/*.txt
model/*.model

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# [multilingual-serverless-qa-aws-lambda](https://www.philschmid.de/multilingual-serverless-xlm-roberta-with-huggingface)

Currently, there are 7.5 billion people living in the world, spread across around 200 nations. Only
[1.2 billion of them are native English speakers](https://en.wikipedia.org/wiki/List_of_countries_by_English-speaking_population).
This leads to a lot of unstructured non-English textual data.

Most tutorials and blog posts demonstrate how to build text classification, sentiment analysis,
question-answering, or text generation models with BERT-based architectures in English. To help close this
gap, we are going to build a multilingual Serverless Question Answering API.

Multilingual models are machine learning models that can understand different languages. An example of a
multilingual model is [mBERT](https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip)
from Google research.
[This model supports and understands 104 languages.](https://github.com/google-research/bert/blob/master/multilingual.md)

We are going to use the new AWS Lambda Container Support to build a Question-Answering API with an `xlm-roberta` model.
For this, we use the [Transformers](https://github.com/huggingface/transformers) library by HuggingFace,
the [Serverless Framework](https://serverless.com/), AWS Lambda, and Amazon ECR.

The special characteristic of this architecture is that we serve a "State-of-the-Art" model of more than 2GB
in a Serverless Environment.

Before we start, I want to encourage you to read my blog [philschmid.de](https://www.philschmid.de), where I have already
written several blog posts about [Serverless](https://www.philschmid.de/aws-lambda-with-custom-docker-image), how to
deploy [BERT in a Serverless Environment](https://www.philschmid.de/serverless-bert-with-huggingface-aws-lambda-docker),
and [how to fine-tune BERT models](https://www.philschmid.de/bert-text-classification-in-a-different-language).

You can find the complete code for it in this
[Github repository](https://github.com/philschmid/multilingual-serverless-qa-aws-lambda).
---

# Services included in this tutorial

## Transformers Library by Huggingface

The [Transformers library](https://github.com/huggingface/transformers) provides state-of-the-art machine learning
architectures like BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, and T5 for Natural Language Understanding (NLU) and Natural
Language Generation (NLG). It also provides thousands of pre-trained models in 100+ different languages.

## AWS Lambda

[AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/welcome.html) is a serverless computing service that lets you
run code without managing servers. It executes your code only when required and scales automatically, from a few
requests per day to thousands per second.

## Amazon Elastic Container Registry

[Amazon Elastic Container Registry (ECR)](https://aws.amazon.com/ecr/?nc1=h_ls) is a fully managed container registry.
It allows us to store, manage, and share docker container images. You can share docker containers privately within your
organization or publicly worldwide for anyone.

## Serverless Framework

[The Serverless Framework](https://www.serverless.com/) helps us develop and deploy AWS Lambda functions. It's a CLI
that offers structure, automation, and best practices right out of the box.

---

# Tutorial

Before we get started, make sure you have the [Serverless Framework](https://serverless.com/) configured and set up. You
also need a working `docker` environment. We use `docker` to create our own custom image including all needed `Python`
dependencies and our multilingual `xlm-roberta` model, which we then use in our AWS Lambda function. Furthermore, you
need access to an AWS Account to create an IAM User, an ECR Registry, an API Gateway, and the AWS Lambda function.

We design the API in the following way:

We send it a context (a small paragraph) and a question, and it responds with the answer to the question. As our model, we
are going to use `xlm-roberta-large-squad2`, trained by [deepset.ai](https://deepset.ai/) and available on the
[transformers model-hub](https://huggingface.co/deepset/xlm-roberta-large-squad2#). The model size is more than 2GB.
It's huge.

**What are we going to do:**

- Create a `Python` Lambda function with the Serverless Framework.
- Add the multilingual `xlm-roberta` model to our function and create an inference pipeline.
- Create a custom `docker` image and test it.
- Deploy the custom `docker` image to ECR.
- Deploy the AWS Lambda function with the custom `docker` image.
- Test our Multilingual Serverless API.

You can find the complete code in this
[Github repository](https://github.com/philschmid/multilingual-serverless-qa-aws-lambda).

---

# Create a `Python` Lambda function with the Serverless Framework

First, we create our AWS Lambda function by using the Serverless CLI with the `aws-python3` template.

```bash
serverless create --template aws-python3 --path serverless-multilingual
```

This CLI command will create a new directory containing a `handler.py`, a `.gitignore`, and a `serverless.yaml` file. The
`handler.py` contains some basic boilerplate code.
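For orientation, the generated boilerplate looks roughly like the sketch below (the exact message text may differ slightly between Serverless Framework versions). We will replace this file completely in the next step.

```python
import json


def hello(event, context):
    # minimal boilerplate handler generated by the aws-python3 template
    body = {
        "message": "Go Serverless v1.0! Your function executed successfully!",
        "input": event,
    }

    # Lambda proxy integrations expect a dict with statusCode and a JSON string body
    return {"statusCode": 200, "body": json.dumps(body)}
```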
---

# Add the multilingual `xlm-roberta` model to our function and create an inference pipeline

To add our `xlm-roberta` model to our function, we have to load it from the
[model hub of HuggingFace](https://huggingface.co/models). For this, I have created a python script. Before we can
execute this script, we have to install the `transformers` library in our local environment and create a `model`
directory in our `serverless-multilingual/` directory.

```bash
mkdir model && pip3 install torch==1.5.0 transformers==3.4.0
```

After we have installed `transformers`, we create a `get_model.py` file and include the script below.

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

def get_model(model):
    """Loads model from Huggingface model hub"""
    try:
        model = AutoModelForQuestionAnswering.from_pretrained(model, use_cdn=True)
        model.save_pretrained('./model')
    except Exception as e:
        raise(e)

def get_tokenizer(tokenizer):
    """Loads tokenizer from Huggingface model hub"""
    try:
        tokenizer = AutoTokenizer.from_pretrained(tokenizer)
        tokenizer.save_pretrained('./model')
    except Exception as e:
        raise(e)

get_model('deepset/xlm-roberta-large-squad2')
get_tokenizer('deepset/xlm-roberta-large-squad2')
```

To execute the script, we run it in the `serverless-multilingual/` directory.

```bash
python3 get_model.py
```

_**Tip**: add the `model` directory to `.gitignore`._

The next step is to adjust our `handler.py` and include our `serverless_pipeline()`, which initializes our model and
tokenizer. It then returns a `predict` function, which we can use in our `handler`.

```python
import json
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, AutoConfig

def encode(tokenizer, question, context):
    """encodes the question and context with a given tokenizer"""
    encoded = tokenizer.encode_plus(question, context)
    return encoded["input_ids"], encoded["attention_mask"]

def decode(tokenizer, token):
    """decodes the tokens to the answer with a given tokenizer"""
    answer_tokens = tokenizer.convert_ids_to_tokens(
        token, skip_special_tokens=True)
    return tokenizer.convert_tokens_to_string(answer_tokens)

def serverless_pipeline(model_path='./model'):
    """Initializes the model and tokenizer and returns a predict function that can be used as a pipeline"""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForQuestionAnswering.from_pretrained(model_path)
    def predict(question, context):
        """predicts the answer for a given question and context.
        Uses the encode and decode methods from above"""
        input_ids, attention_mask = encode(tokenizer, question, context)
        start_scores, end_scores = model(torch.tensor(
            [input_ids]), attention_mask=torch.tensor([attention_mask]))
        ans_tokens = input_ids[torch.argmax(
            start_scores): torch.argmax(end_scores)+1]
        answer = decode(tokenizer, ans_tokens)
        return answer
    return predict

# initializes the pipeline
question_answering_pipeline = serverless_pipeline()

def handler(event, context):
    try:
        # loads the incoming event into a dictionary
        body = json.loads(event['body'])
        # uses the pipeline to predict the answer
        answer = question_answering_pipeline(question=body['question'], context=body['context'])
        return {
            "statusCode": 200,
            "headers": {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*',
                "Access-Control-Allow-Credentials": True
            },
            "body": json.dumps({'answer': answer})
        }
    except Exception as e:
        print(repr(e))
        return {
            "statusCode": 500,
            "headers": {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*',
                "Access-Control-Allow-Credentials": True
            },
            "body": json.dumps({"error": repr(e)})
        }
```

# Create a custom `docker` image and test it

Before we can build our `docker` image, we need to create a `requirements.txt` file with all the dependencies we want to
install into it.

We are going to use a lighter PyTorch version and the transformers library.

```bash
https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp38-cp38-linux_x86_64.whl
transformers==3.4.0
```

To containerize our Lambda function, we create a `dockerfile` in the same directory and copy the following content into it.

```bash
FROM public.ecr.aws/lambda/python:3.8

# Copy function code and models into our /var/task
COPY ./ ${LAMBDA_TASK_ROOT}/

# install our dependencies
RUN python3 -m pip install -r requirements.txt --target ${LAMBDA_TASK_ROOT}

# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "handler.handler" ]
```

Additionally, we can add a `.dockerignore` file to exclude files from the container image.

```bash
README.md
*.pyc
*.pyo
*.pyd
__pycache__
.pytest_cache
serverless.yaml
get_model.py
```

To build our custom `docker` image, we run:

```bash
docker build -t multilingual-lambda .
```

We can start our `docker` container by running:

```bash
docker run -p 8080:8080 multilingual-lambda
```

Afterwards, in a separate terminal, we can locally invoke the function using `curl` or a REST client.

```bash
curl --request POST \
  --url http://localhost:8080/2015-03-31/functions/function/invocations \
  --header 'Content-Type: application/json' \
  --data '{
"body":"{\"context\":\"Saisonbedingt ging der Umsatz im ersten Quartal des Geschäftsjahres 2019 um 4 Prozent auf 1.970 Millionen Euro zurück, verglichen mit 1.047 Millionen Euro im vierten Quartal des vorangegangenen Geschäftsjahres. Leicht rückläufig war der Umsatz in den Segmenten Automotive (ATV) und Industrial Power Control (IPC). Im Vergleich zum Konzerndurchschnitt war der Rückgang im Segment Power Management & Multimarket (PMM) etwas ausgeprägter und im Segment Digital Security Solutions (DSS) deutlich ausgeprägter. Die Bruttomarge blieb von Quartal zu Quartal weitgehend stabil und fiel von 39,8 Prozent auf 39,5 Prozent. Darin enthalten sind akquisitionsbedingte Abschreibungen sowie sonstige Aufwendungen in Höhe von insgesamt 16 Millionen Euro, die hauptsächlich im Zusammenhang mit der internationalen Rectifier-Akquisition stehen. Die bereinigte Bruttomarge blieb ebenfalls nahezu unverändert und lag im ersten Quartal bei 40,4 Prozent, verglichen mit 40,6 Prozent im letzten Quartal des Geschäftsjahres 2018. Das Segmentergebnis für das erste Quartal des laufenden Fiskaljahres belief sich auf 359 Millionen Euro, verglichen mit 400 Millionen Euro ein Quartal zuvor. Die Marge des Segmentergebnisses sank von 19,5 Prozent auf 18,2 Prozent.\",\n\"question\":\"Was war die bereinigte Bruttomarge?\"\n}"}'

# {"statusCode": 200, "headers": {"Content-Type": "application/json", "Access-Control-Allow-Origin": "*", "Access-Control-Allow-Credentials": true}, "body": "{\"answer\": \"40,4 Prozent\"}"}
```

_Beware that we have to `stringify` our body since we are passing it directly into the function (only for testing)._

# Deploy a custom `docker` image to ECR

Since we now have a local `docker` image, we can push it to ECR. For this, we need to create an ECR repository with
the name `multilingual-lambda`.

```bash
aws ecr create-repository --repository-name multilingual-lambda > /dev/null
```

To be able to push our images, we need to log in to ECR. We are using the `aws` CLI v2.x. To make deploying easier, we
define some environment variables.

```bash
aws_region=eu-central-1
aws_account_id=891511646143

aws ecr get-login-password \
    --region $aws_region \
| docker login \
    --username AWS \
    --password-stdin $aws_account_id.dkr.ecr.$aws_region.amazonaws.com
```

Next, we need to `tag` / rename our previously created image to the ECR format. The format for this is
`{AccountID}.dkr.ecr.{region}.amazonaws.com/{repository-name}`.

```bash
docker tag multilingual-lambda $aws_account_id.dkr.ecr.$aws_region.amazonaws.com/multilingual-lambda
```

To check that it worked, we can run `docker images` and should see an image with our tag as its name.

![docker-image](./images/docker-image.png)

Finally, we push the image to our ECR registry.

```bash
docker push $aws_account_id.dkr.ecr.$aws_region.amazonaws.com/multilingual-lambda
```

# Deploy AWS Lambda function with a custom `docker` image

I provide the complete `serverless.yaml` for this example, but we only go through the details we need for our `docker`
image and leave out all standard configurations. If you want to learn more about the `serverless.yaml`, I suggest you
check out
[Scaling Machine Learning from ZERO to HERO](https://www.philschmid.de/scaling-machine-learning-from-zero-to-hero). In
that article, I went through each configuration and explained its usage.
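The `image` value in the `serverless.yaml` below must contain the full ECR image URL including the digest. If you prefer to look it up programmatically instead of copying it from the AWS Console, the following is a minimal sketch using `boto3` (an assumption on my side, it is not otherwise used in this tutorial; it requires `pip3 install boto3` and configured AWS credentials, and reuses the repository name and region from above):

```python
import boto3

REGION = "eu-central-1"
REPOSITORY = "multilingual-lambda"

# Look up the digest of the image we just pushed (it was tagged "latest" by default)
ecr = boto3.client("ecr", region_name=REGION)
response = ecr.describe_images(
    repositoryName=REPOSITORY,
    imageIds=[{"imageTag": "latest"}],
)
digest = response["imageDetails"][0]["imageDigest"]

# Resolve the account id and assemble the full image URL for serverless.yaml
account_id = boto3.client("sts").get_caller_identity()["Account"]
image_uri = f"{account_id}.dkr.ecr.{REGION}.amazonaws.com/{REPOSITORY}@{digest}"
print(image_uri)  # paste this into the `image` field of the function
```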

_**Attention**: We need at least 9GB of memory and a timeout of 300s._

```yaml
service: multilingual-qa-api

provider:
  name: aws # provider
  region: eu-central-1 # aws region
  memorySize: 10240 # optional, in MB, default is 1024
  timeout: 300 # optional, in seconds, default is 6

functions:
  questionanswering:
    image: 891511646143.dkr.ecr.eu-central-1.amazonaws.com/multilingual-lambda@sha256:4d08a8eb6286969a7594e8f15360f2ab6e86b9dd991558989a3b65e214ed0818
    events:
      - http:
          path: qa # http path
          method: post # http method
```

To use a `docker` image in our `serverless.yaml`, we have to add the `image` key to our `function` section. Its value is
the URL of our `docker` image.

For an ECR image, the URL should look like this: `<account>.dkr.ecr.<region>.amazonaws.com/<repository>@<digest>` (e.g.
`000000000000.dkr.ecr.sa-east-1.amazonaws.com/test-lambda-docker@sha256:6bb600b4d6e1d7cf521097177dd0c4e9ea373edb91984a505333be8ac9455d38`)

You can get the ECR URL via the
[AWS Console](https://eu-central-1.console.aws.amazon.com/ecr/repositories?region=eu-central-1).

In order to deploy the function, we run `serverless deploy`.

```bash
serverless deploy
```

After this process is done, we should see something like this.

![serverless-deploy](./images/serverless-deploy.png)

---

# Test our Multilingual Serverless API

To test our Lambda function we can use Insomnia, Postman, or any other REST client. Just add a JSON with a `context` and
a `question` to the body of your request. Let's try it with a `German` example and then with a `French` example.

_Be aware that the first invocation of the function triggers a cold start. Due to the model size, the cold start takes
longer than 30s, so the first request will run into an API Gateway timeout._

**German:**

```json
{
  "context": "Saisonbedingt ging der Umsatz im ersten Quartal des Geschäftsjahres 2019 um 4 Prozent auf 1.970 Millionen Euro zurück, verglichen mit 1.047 Millionen Euro im vierten Quartal des vorangegangenen Geschäftsjahres. Leicht rückläufig war der Umsatz in den Segmenten Automotive (ATV) und Industrial Power Control (IPC). Im Vergleich zum Konzerndurchschnitt war der Rückgang im Segment Power Management & Multimarket (PMM) etwas ausgeprägter und im Segment Digital Security Solutions (DSS) deutlich ausgeprägter. Die Bruttomarge blieb von Quartal zu Quartal weitgehend stabil und fiel von 39,8 Prozent auf 39,5 Prozent. Darin enthalten sind akquisitionsbedingte Abschreibungen sowie sonstige Aufwendungen in Höhe von insgesamt 16 Millionen Euro, die hauptsächlich im Zusammenhang mit der internationalen Rectifier-Akquisition stehen. Die bereinigte Bruttomarge blieb ebenfalls nahezu unverändert und lag im ersten Quartal bei 40,4 Prozent, verglichen mit 40,6 Prozent im letzten Quartal des Geschäftsjahres 2018. Das Segmentergebnis für das erste Quartal des laufenden Fiskaljahres belief sich auf 359 Millionen Euro, verglichen mit 400 Millionen Euro ein Quartal zuvor. Die Marge des Segmentergebnisses sank von 19,5 Prozent auf 18,2 Prozent.",
  "question": "Was war die bereinigte Bruttomarge?"
}
```

Our `serverless_pipeline()` answered our question correctly with `40,4 Prozent`.

![insomnia-ger](./images/insomnia-ger.png)
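If you would rather script the request than use a REST client, here is a minimal sketch with Python's `requests` library (an assumption on my side, it is not part of the tutorial; install it with `pip3 install requests`). The endpoint URL is a placeholder; use the one printed by `serverless deploy`.

```python
import requests

# Hypothetical endpoint URL -- replace with the URL from the `serverless deploy` output
API_URL = "https://<api-id>.execute-api.eu-central-1.amazonaws.com/dev/qa"

payload = {
    "context": "Saisonbedingt ging der Umsatz im ersten Quartal ...",  # the paragraph to search
    "question": "Was war die bereinigte Bruttomarge?",
}

# The API Gateway integration passes the JSON body straight to our handler
response = requests.post(API_URL, json=payload, timeout=60)
print(response.status_code)
print(response.json())  # e.g. {"answer": "40,4 Prozent"}
```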
**French:**

```json
{
  "context": "En raison de facteurs saisonniers, le chiffre d'affaires du premier trimestre de l'exercice 2019 a diminué de 4 % pour atteindre 1 970 millions d'euros, contre 1 047 millions d'euros au quatrième trimestre de l'exercice précédent. Les ventes ont légèrement diminué dans les segments de l'automobile (ATV) et de la régulation de la puissance industrielle (IPC). Par rapport à la moyenne du groupe, la baisse a été légèrement plus prononcée dans le segment de la gestion de l'énergie et du multimarché (PMM) et nettement plus prononcée dans le segment des solutions de sécurité numérique (DSS). La marge brute est restée largement stable d'un trimestre à l'autre, passant de 39,8 % à 39,5 %. Ce montant comprend l'amortissement lié à l'acquisition et d'autres dépenses totalisant 16 millions d'euros, principalement liées à l'acquisition de Rectifier international. La marge brute ajustée est également restée pratiquement inchangée à 40,4 % au premier trimestre, contre 40,6 % au dernier trimestre de l'exercice 2018. Le bénéfice sectoriel pour le premier trimestre de l'exercice en cours s'est élevé à 359 millions d'euros, contre 400 millions d'euros un trimestre plus tôt. La marge du résultat du segment a diminué de 19,5 % à 18,2 %.",
  "question": "Quelle était la marge brute ajustée ?"
}
```

Our `serverless_pipeline()` answered our question correctly with `40,4%`.

![insomnia-fr](./images/insomnia-fr.png)

# Conclusion

The release of AWS Lambda Container Support and the memory increase to up to 10GB enable a much wider use of AWS
Lambda and Serverless. They fix many existing problems and give us greater scope for the deployment of serverless
applications.

**We deployed a docker container containing a "State-of-the-Art" multilingual NLP model bigger than 2GB in a
Serverless Environment without the need to manage any server.**

It will automatically scale up to thousands of parallel requests without any worries.

The future looks more than golden for AWS Lambda and Serverless.

---

You can find the [GitHub repository](https://github.com/philschmid/multilingual-serverless-qa-aws-lambda) with the
complete code [here](https://github.com/philschmid/multilingual-serverless-qa-aws-lambda).

Thanks for reading. If you have any questions, feel free to contact me or comment on this article. You can also connect
with me on [Twitter](https://twitter.com/_philschmid) or
[LinkedIn](https://www.linkedin.com/in/philipp-schmid-a6a2bb196/).
--------------------------------------------------------------------------------
/dockerfile:
--------------------------------------------------------------------------------
FROM public.ecr.aws/lambda/python:3.8

# Copy function code and models into our /var/task
COPY ./ ${LAMBDA_TASK_ROOT}/

# install our dependencies
RUN python3 -m pip install -r requirements.txt --target ${LAMBDA_TASK_ROOT}

# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "handler.handler" ]

--------------------------------------------------------------------------------
/get_model.py:
--------------------------------------------------------------------------------
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

def get_model(model):
    """Loads model from Huggingface model hub"""
    try:
        model = AutoModelForQuestionAnswering.from_pretrained(model, use_cdn=True)
        model.save_pretrained('./model')
    except Exception as e:
        raise(e)

def get_tokenizer(tokenizer):
    """Loads tokenizer from Huggingface model hub"""
    try:
        tokenizer = AutoTokenizer.from_pretrained(tokenizer)
        tokenizer.save_pretrained('./model')
    except Exception as e:
        raise(e)

get_model('deepset/xlm-roberta-large-squad2')
get_tokenizer('deepset/xlm-roberta-large-squad2')
# get_model('mrm8488/bert-multi-cased-finetuned-xquadv1')
# get_tokenizer('mrm8488/bert-multi-cased-finetuned-xquadv1')

--------------------------------------------------------------------------------
/handler.py:
--------------------------------------------------------------------------------
import json
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, AutoConfig

def encode(tokenizer, question, context):
    """encodes the question and context with a given tokenizer"""
    encoded = tokenizer.encode_plus(question, context)
    return encoded["input_ids"], encoded["attention_mask"]

def decode(tokenizer, token):
    """decodes the tokens to the answer with a given tokenizer"""
    answer_tokens = tokenizer.convert_ids_to_tokens(
        token, skip_special_tokens=True)
    return tokenizer.convert_tokens_to_string(answer_tokens)

def serverless_pipeline(model_path='./model'):
    """Initializes the model and tokenizer and returns a predict function that can be used as a pipeline"""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForQuestionAnswering.from_pretrained(model_path)
    def predict(question, context):
        """predicts the answer for a given question and context.
        Uses the encode and decode methods from above"""
        input_ids, attention_mask = encode(tokenizer, question, context)
        start_scores, end_scores = model(torch.tensor(
            [input_ids]), attention_mask=torch.tensor([attention_mask]))
        ans_tokens = input_ids[torch.argmax(
            start_scores): torch.argmax(end_scores)+1]
        answer = decode(tokenizer, ans_tokens)
        return answer
    return predict

# initializes the pipeline
question_answering_pipeline = serverless_pipeline()

def handler(event, context):
    try:
        # loads the incoming event into a dictionary
        body = json.loads(event['body'])
        print(body)
        # uses the pipeline to predict the answer
        answer = question_answering_pipeline(question=body['question'], context=body['context'])
        print(answer)
        return {
            "statusCode": 200,
            "headers": {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*',
                "Access-Control-Allow-Credentials": True
            },
            "body": json.dumps({'answer': answer})
        }
    except Exception as e:
        print(repr(e))
        return {
            "statusCode": 500,
            "headers": {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*',
                "Access-Control-Allow-Credentials": True
            },
            "body": json.dumps({"error": repr(e)})
        }

--------------------------------------------------------------------------------
/model/.gitkeep:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp38-cp38-linux_x86_64.whl
transformers==3.4.0

--------------------------------------------------------------------------------
/serverless.yml:
--------------------------------------------------------------------------------
service: multilingual-qa-api

provider:
  name: aws # provider
  region: eu-central-1 # aws region
  memorySize: 10240 # optional, in MB, default is 1024
  timeout: 300 # optional, in seconds, default is 6

functions:
  questionanswering:
    image: 891511646143.dkr.ecr.eu-central-1.amazonaws.com/multilingual-lambda@sha256:4d08a8eb6286969a7594e8f15360f2ab6e86b9dd991558989a3b65e214ed0818
    events:
      - http:
          path: qa # http path
          method: post # http method
--------------------------------------------------------------------------------