├── static ├── subnets.png ├── aim317-sm-arch-1.jpg ├── aim317-sm-arch-2.jpg ├── aim317-sm-arch-3.jpg ├── aim317-sm-arch-4.jpg ├── aim317-sm-arch-5.jpg ├── AIM317 Diagram - A1.png ├── AIM317 Solution Flow.png └── aim317-sm-arch-full.jpg ├── notebooks ├── 1-Transcribe-Translate-Calls │ ├── input │ │ ├── audio-recordings │ │ │ ├── AIM317-Call1-EN.m4a │ │ │ ├── AIM317-Call1-ES.m4a │ │ │ ├── AIM317-Call2-EN.m4a │ │ │ ├── AIM317-Call3-EN.m4a │ │ │ ├── AIM317-Call4-EN.m4a │ │ │ ├── AIM317-Call5-EN.m4a │ │ │ └── AIM317-Call5-ES.m4a │ │ ├── translate-custom-terminology.txt │ │ ├── custom-vocabulary-ES.txt │ │ ├── custom-vocabulary-EN.txt │ │ └── translate-parallel-data.txt │ └── AIM317-reInvent2021-transcribe-and-translate-customer-calls.ipynb ├── 5-Visualize-Insights │ ├── quicksight_raw_manifest.json │ └── AIM317-reInvent2021-prepare-quicksight-inputs.ipynb ├── 4-Detect-Call-Sentiment │ └── AIM317-reInvent2021-detect-customer-sentiment.ipynb ├── 2-Train-Detect-Entities │ ├── annotations.csv │ ├── AIM317-reInvent2021-train-and-detect-entities.ipynb │ └── train.csv └── 3-Train-Classify-Calls │ └── AIM317-reInvent2021-train-and-classify-customer-calls.ipynb ├── CODE_OF_CONDUCT.md ├── src ├── importTerminology.py ├── paginateProcessDataTrainTestFiles.py ├── createDocumentClassifier.py ├── startSentimentDetection.py ├── createEntityRecognizer.py ├── startTranscriptionJob.py ├── createVocabulary.py ├── createEndpoint.py ├── classifyDocument.py ├── detectEntities.py ├── buildTrainTest.py └── translateText.py ├── LICENSE ├── NOTICE ├── cloudformation ├── sagemakerNotebookTemplate.yaml ├── sagemakerNotebookEventEngineTemplate.yaml └── aim317Template.yml ├── CONTRIBUTING.md └── README.md /static/subnets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/subnets.png -------------------------------------------------------------------------------- /static/aim317-sm-arch-1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-1.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-2.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-3.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-4.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-5.jpg 
-------------------------------------------------------------------------------- /static/AIM317 Diagram - A1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/AIM317 Diagram - A1.png -------------------------------------------------------------------------------- /static/AIM317 Solution Flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/AIM317 Solution Flow.png -------------------------------------------------------------------------------- /static/aim317-sm-arch-full.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-full.jpg -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-ES.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-ES.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call2-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call2-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call3-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call3-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call4-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call4-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-EN.m4a 
-------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-ES.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-ES.m4a -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /notebooks/5-Visualize-Insights/quicksight_raw_manifest.json: -------------------------------------------------------------------------------- 1 | { 2 | "fileLocations": [ 3 | { 4 | "URIPrefixes": [ 5 | "s3://bucket/prefix" 6 | ] 7 | } 8 | ], 9 | "globalUploadSettings": { 10 | "format": "CSV", 11 | "delimiter": ",", 12 | "textqualifier": "'", 13 | "containsHeader": "true" 14 | } 15 | } 16 | -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/translate-custom-terminology.txt: -------------------------------------------------------------------------------- 1 | es,en 2 | Buenas noches, Good evening 3 | este es Easytron, this is Easytron 4 | necesitado, in need 5 | bots de Trantor, Trantor robots 6 | chocolate para mi hijo, choclate milkshake 7 | tres leyes de la robótica, three laws of robotics 8 | tres leyes, three laws 9 | noches, evening 10 | licuado, milkshake 11 | bots, robots 12 | robo, robot 13 | bot, robot 14 | -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/custom-vocabulary-ES.txt: -------------------------------------------------------------------------------- 1 | Phrase IPA SoundsLike DisplayAs 2 | ahí-Sí-Tron Easytron 3 | necesita necesito 4 | cumple compre 5 | quise-ofenderlo no-quise-ofenderlo 6 | Kuerten cubren 7 | otro-mes-ticos domésticos 8 | sus-bocinas subrutinas 9 | Ecuador licuado 10 | cuadro licuado 11 | y-Citron Easytron 12 | temes-EP-cinco-mil MCP-5000 13 | MSP-cinco-milantes MCP-5000 14 | Sí-sitúan Easytron 15 | robadas robot 16 | robo robot 17 | bot robot -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/custom-vocabulary-EN.txt: -------------------------------------------------------------------------------- 1 | Phrase IPA SoundsLike DisplayAs 2 | cyberpunk CyberPunk 3 | easy-drawn EasyTron 4 | emotional-intelligence Emotional Intelligence 5 | galactic-empire Galactic Empire 6 | is-it-ron EasyTron 7 | kinematics Kinematics 8 | nightfall Nightfall 9 | positronic-brain Positronic Brain 10 | psychohistory Psychohistory 11 | robot Robot 12 | robot-ethics Robot Ethics 13 | three-laws Three Laws 14 | transistor Trantor 15 | trent-or Trantor 16 | worship version -------------------------------------------------------------------------------- /src/importTerminology.py: 
-------------------------------------------------------------------------------- 1 | import boto3 2 | 3 | def lambda_handler(event, context): 4 | 5 | record = event['Records'][0] 6 | 7 | print("Record: " + str(record)) 8 | 9 | s3bucket = record['s3']['bucket']['name'] 10 | s3object = record['s3']['object']['key'] 11 | 12 | s3Path = "s3://" + s3bucket + "/" + s3object 13 | 14 | s3_resource = boto3.resource('s3') 15 | 16 | temp = s3_resource.Object(s3bucket, s3object) 17 | term_file = temp.get()['Body'].read().decode('utf-8') 18 | 19 | client = boto3.client('translate') 20 | 21 | print("S3 Path:" + s3Path) 22 | 23 | response = client.import_terminology( 24 | Name="aim317-custom-terminology", 25 | MergeStrategy='OVERWRITE', 26 | TerminologyData={ 27 | 'File': term_file, 28 | 'Format': 'CSV' 29 | }, 30 | ) 31 | 32 | return { 33 | 'TerminologyName': response['TerminologyProperties']['Name'] 34 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
15 | 16 | -------------------------------------------------------------------------------- /src/paginateProcessDataTrainTestFiles.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import os 3 | import io 4 | import pandas as pd 5 | 6 | def lambda_handler(event, context): 7 | 8 | s3 = boto3.client('s3') 9 | raw_data = s3.get_object(Bucket=os.environ['comprehendBucket'], Key='comprehend/train/aim317-cust-class-train-data.csv') 10 | raw_content = pd.read_csv(io.BytesIO(raw_data['Body'].read())) 11 | print(raw_content) 12 | raw_content['label'] = raw_content['label'].astype(str) 13 | selected_columns = ['label', 'text'] 14 | selected_data = raw_content[selected_columns] 15 | 16 | DSTTRAINFILE='/tmp/comprehend-train.csv' 17 | 18 | selected_data.to_csv(path_or_buf=DSTTRAINFILE, 19 | header=False, 20 | index=False, 21 | escapechar='\\', 22 | doublequote=False, 23 | quotechar='"') 24 | 25 | s3 = boto3.client('s3') 26 | prefix = 'comprehend-custom-classifier' 27 | bucket = os.environ['comprehendBucket'] 28 | 29 | s3.upload_file(DSTTRAINFILE, bucket, prefix+'/comprehend-train.csv') -------------------------------------------------------------------------------- /src/createDocumentClassifier.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | DSTTRAINFILE='comprehend-train.csv' 8 | s3_train_data = 's3://{}/{}/{}'.format(os.environ['classifierBucket'], os.environ['classifierBucketPrefix'], DSTTRAINFILE) 9 | s3_output_job = 's3://{}/{}/{}'.format(os.environ['classifierBucket'], os.environ['classifierBucket'], 'output/train_job') 10 | print('training data location: ',s3_train_data, "output location:", s3_output_job) 11 | 12 | uid = str(uuid.uuid4()) 13 | comprehend = boto3.client('comprehend') 14 | 15 | training_job = comprehend.create_document_classifier( 16 | DocumentClassifierName='aim317-custom-classifier-' + uid, 17 | DataAccessRoleArn=os.environ['ComprehendARN'], 18 | InputDataConfig={ 19 | 'S3Uri': s3_train_data 20 | }, 21 | OutputDataConfig={ 22 | 'S3Uri': s3_output_job 23 | }, 24 | LanguageCode='en', 25 | VersionName='v001' 26 | ) 27 | 28 | return { 29 | 'DocumentClassifierArn': training_job['DocumentClassifierArn'] 30 | } -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | aim317-uncover-insights-customer-conversations 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | This project uses training data created from content on Wikipedia licensed under CC-BY-SA-3.0. 
5 | # https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License 6 | # notebooks/2-Train-Detect-Entities/train.csv 7 | * https://en.wikipedia.org/wiki/Robot_series 8 | * https://en.wikipedia.org/wiki/I,_Robot 9 | * https://en.wikipedia.org/wiki/The_Complete_Robot 10 | * https://en.wikipedia.org/wiki/The_Bicentennial_Man 11 | * https://en.wikipedia.org/wiki/The_Positronic_Man 12 | * https://en.wikipedia.org/wiki/Mother_Earth_(novella) 13 | * https://en.wikipedia.org/wiki/The_Naked_Sun 14 | * https://en.wikipedia.org/wiki/Robots_and_Empire 15 | * https://en.wikipedia.org/wiki/Robotics 16 | * https://en.wikipedia.org/wiki/Gait_training 17 | * https://en.wikipedia.org/wiki/Inverse_kinematics 18 | * https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence 19 | * https://en.wikipedia.org/wiki/Three_Laws_of_Robotics 20 | * https://en.wikipedia.org/wiki/Positronic_brain 21 | * https://en.wikipedia.org/wiki/Neural_pathway 22 | * https://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface 23 | * https://en.wikipedia.org/wiki/Robot_locomotion 24 | -------------------------------------------------------------------------------- /src/startSentimentDetection.py: -------------------------------------------------------------------------------- 1 | import awswrangler as wr 2 | import pandas as pd 3 | import boto3 4 | import os 5 | 6 | def lambda_handler(event, context): 7 | 8 | client = boto3.client('comprehend') 9 | 10 | inprefix = 'comprehendInput' 11 | outprefix = 'quicksight/temp/insights' 12 | 13 | comprehend = boto3.client('comprehend') 14 | s3 = boto3.client('s3') 15 | s3_resource = boto3.resource('s3') 16 | 17 | paginator = s3.get_paginator('list_objects_v2') 18 | pages = paginator.paginate(Bucket=os.environ['ComprehendBucket'], Prefix=inprefix) 19 | job_name_list = [] 20 | t_prefix = 'quicksight/data/sentiment' 21 | 22 | cols = ['transcript_name', 'sentiment'] 23 | df_sent = pd.DataFrame(columns=cols) 24 | 25 | for page in pages: 26 | for obj in page['Contents']: 27 | transcript_file_name = obj['Key'].split('/')[1] 28 | temp = s3_resource.Object(os.environ['ComprehendBucket'], obj['Key']) 29 | transcript_contents = temp.get()['Body'].read().decode('utf-8') 30 | response = comprehend.detect_sentiment(Text=transcript_contents, LanguageCode='en') 31 | df_sent.loc[len(df_sent.index)] = [transcript_file_name.strip('en-').strip('.txt'),response['Sentiment']] 32 | 33 | wr.s3.to_csv(df_sent, path='s3://' + os.environ['ComprehendBucket'] + '/' + t_prefix + '/' + 'sentiment.csv') 34 | -------------------------------------------------------------------------------- /src/createEntityRecognizer.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | jobName = "aim317-recognizer" + '-' + str(uuid.uuid4()) 8 | 9 | client = boto3.client('comprehend') 10 | 11 | s3TrainingBucket = os.environ['ComprehendAnnotationBucket'] 12 | s3AnnotationBucket = os.environ['ComprehendAnnotationBucket'] 13 | 14 | response = client.create_entity_recognizer( 15 | RecognizerName=jobName, 16 | DataAccessRoleArn=os.environ['ComprehendARN'], 17 | InputDataConfig={ 18 | 'DataFormat': 'COMPREHEND_CSV', 19 | "EntityTypes": [ 20 | { 21 | "Type": "MOVEMENT" 22 | }, 23 | { 24 | "Type": "BRAIN" 25 | }, 26 | { 27 | "Type": "ETHICS" 28 | } 29 | ], 30 | 'Documents': { 31 | 'S3Uri': "s3://" + s3TrainingBucket + "/comprehend/train/train.csv", 32 | 'InputFormat': 
'ONE_DOC_PER_LINE' 33 | }, 34 | 'Annotations': { 35 | 'S3Uri': "s3://" + s3AnnotationBucket + "/comprehend/train/annotations.csv", 36 | } 37 | }, 38 | LanguageCode='en', 39 | VersionName= 'v001' 40 | ) 41 | 42 | return { 43 | 'EntityRecognizerArn': response['EntityRecognizerArn'] 44 | } -------------------------------------------------------------------------------- /src/startTranscriptionJob.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import json 4 | import os 5 | 6 | def lambda_handler(event, context): 7 | 8 | record = event['Records'][0] 9 | 10 | print("Record: " + str(record)) 11 | 12 | s3bucket = record['s3']['bucket']['name'] 13 | s3object = record['s3']['object']['key'] 14 | s3fileName = record['s3']['object']['key'].split("/")[-1] 15 | 16 | s3Path = "s3://" + s3bucket + "/" + s3object 17 | jobName = s3fileName + '-' + str(uuid.uuid4()) 18 | 19 | client = boto3.client('transcribe') 20 | 21 | vocabLanguage = s3object.split('-')[2].split('.')[0] 22 | if vocabLanguage == "EN": 23 | vocabLanguage = 'en-US' 24 | vocabularyName = os.environ['ENVocabularyName'] 25 | elif vocabLanguage == "ES": 26 | vocabLanguage = 'es-US' 27 | vocabularyName = os.environ['ESVocabularyName'] 28 | 29 | response = client.start_transcription_job( 30 | TranscriptionJobName=jobName, 31 | LanguageCode=vocabLanguage, 32 | Settings = {'VocabularyName': vocabularyName, 33 | 'ShowSpeakerLabels': True, 34 | 'MaxSpeakerLabels': 2}, 35 | Media={ 36 | 'MediaFileUri': s3Path 37 | }, 38 | OutputBucketName = os.environ['outputBucket'], 39 | OutputKey = os.environ['outputKey'] + s3fileName.split(".")[0] + "-transcription" 40 | ) 41 | 42 | return { 43 | 'TranscriptionJobName': response['TranscriptionJob']['TranscriptionJobName'] 44 | } -------------------------------------------------------------------------------- /src/createVocabulary.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import json 4 | import os 5 | 6 | def lambda_handler(event, context): 7 | 8 | record = event['Records'][0] 9 | 10 | s3bucket = record['s3']['bucket']['name'] 11 | s3object = record['s3']['object']['key'] 12 | 13 | print(s3object.split(".")[0].split("-")[2]) 14 | 15 | if s3object.split(".")[0].split("-")[2] == "EN": 16 | 17 | s3Path = "s3://" + s3bucket + "/" + s3object 18 | VocabName = "custom-vocab-EN-" + str(uuid.uuid4()) 19 | 20 | client = boto3.client('transcribe') 21 | 22 | print("S3 Path:" + s3Path) 23 | 24 | response = client.create_vocabulary( 25 | VocabularyName=VocabName, 26 | LanguageCode='en-US', 27 | VocabularyFileUri = s3Path, 28 | ) 29 | 30 | return { 31 | 'VocabularybName': response['VocabularyName'] 32 | } 33 | 34 | elif s3object.split(".")[0].split("-")[2] == "ES": 35 | s3Path = "s3://" + s3bucket + "/" + s3object 36 | VocabName = "custom-vocab-ES-" + str(uuid.uuid4()) 37 | 38 | client = boto3.client('transcribe') 39 | 40 | print("S3 Path:" + s3Path) 41 | 42 | response = client.create_vocabulary( 43 | VocabularyName=VocabName, 44 | LanguageCode='es-ES', 45 | VocabularyFileUri = s3Path, 46 | ) 47 | 48 | return { 49 | 'VocabularybName': response['VocabularyName'] 50 | } 51 | 52 | else: 53 | 54 | return { 55 | 'ErrorCode': "Language not in filename, must end in EN or ES" 56 | } -------------------------------------------------------------------------------- /src/createEndpoint.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | 
import uuid 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | ## Get Model ARN depending on argument passed 8 | 9 | client = boto3.client('comprehend') 10 | 11 | requestParams = event['endpointType'] 12 | 13 | if requestParams == "EntityRecognizer": 14 | endpointName = "aim317-entity-recognizer" + '-' + str(uuid.uuid4())[:8] 15 | response = client.list_entity_recognizers( 16 | Filter={ 17 | 'Status': 'TRAINED', 18 | } 19 | ) 20 | if not response['EntityRecognizerPropertiesList']: 21 | return "No models trained, please check the Comprehend dashboard" 22 | 23 | modelARN = response['EntityRecognizerPropertiesList'][0]['EntityRecognizerArn'] 24 | 25 | elif requestParams == "DocumentClassifier": 26 | endpointName = "aim317-document-classifier" + '-' + str(uuid.uuid4())[:8] 27 | 28 | response = client.list_document_classifiers( 29 | Filter={ 30 | 'Status': 'TRAINED', 31 | } 32 | ) 33 | if not response['DocumentClassifierPropertiesList']: 34 | return "No models trained, please check the Comprehend dashboard" 35 | 36 | modelARN = response['DocumentClassifierPropertiesList'][0]['DocumentClassifierArn'] 37 | 38 | response = client.create_endpoint( 39 | EndpointName=endpointName, 40 | ModelArn=modelARN, 41 | DesiredInferenceUnits=4, 42 | DataAccessRoleArn=os.environ['ComprehendARN'] 43 | ) 44 | 45 | return { 46 | 'EndpointArn': response['EndpointArn'] 47 | } -------------------------------------------------------------------------------- /src/classifyDocument.py: -------------------------------------------------------------------------------- 1 | import awswrangler as wr 2 | import pandas as pd 3 | import boto3 4 | import os 5 | 6 | 7 | def lambda_handler(event, context): 8 | 9 | s3 = boto3.client('s3') 10 | s3_resource = boto3.resource('s3') 11 | comprehend = boto3.client('comprehend') 12 | t_prefix = 'quicksight/data/cta' 13 | 14 | paginator = s3.get_paginator('list_objects_v2') 15 | pages = paginator.paginate(Bucket=os.environ['classifierBucket'], Prefix='comprehendInput') 16 | a = [] 17 | 18 | cols = ['transcript_name', 'cta_status'] 19 | df_class = pd.DataFrame(columns=cols) 20 | 21 | comprehendEndpoint = comprehend.list_endpoints( 22 | Filter={ 23 | 'Status': 'IN_SERVICE', 24 | } 25 | ) 26 | 27 | for item in comprehendEndpoint.get('EndpointPropertiesList'): 28 | if 'document-classifier-endpoint' in item['EndpointArn']: 29 | endpointArn = item['EndpointArn'] 30 | 31 | for page in pages: 32 | for obj in page['Contents']: 33 | transcript_file_name = obj['Key'].split('/')[1] 34 | temp = s3_resource.Object(os.environ['classifierBucket'], obj['Key']) 35 | transcript_content = temp.get()['Body'].read().decode('utf-8') 36 | transcript_truncated = transcript_content[-400:] 37 | response = comprehend.classify_document(Text=transcript_truncated, EndpointArn=endpointArn) 38 | a = response['Classes'] 39 | tempcols = ['Name', 'Score'] 40 | df_temp = pd.DataFrame(columns=tempcols) 41 | for i in range(0, 2): 42 | df_temp.loc[len(df_temp.index)] = [a[i]['Name'], a[i]['Score']] 43 | cta = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Name'] 44 | df_class.loc[len(df_class.index)] = [transcript_file_name.strip('en-').strip('.txt'), cta] 45 | 46 | wr.s3.to_csv(df_class, path='s3://' + os.environ['classifierBucket'] + '/' + t_prefix + '/' + 'cta_status.csv') 47 | -------------------------------------------------------------------------------- /src/detectEntities.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import os 3 | import pandas as pd 4 | 
import awswrangler as wr 5 | 6 | def lambda_handler(event, context): 7 | 8 | s3 = boto3.client('s3') 9 | s3_resource = boto3.resource('s3') 10 | comprehend = boto3.client('comprehend') 11 | 12 | t_prefix = 'quicksight/data/entity' 13 | 14 | paginator = s3.get_paginator('list_objects_v2') 15 | pages = paginator.paginate(Bucket=os.environ['entityDetectionBucket'], Prefix='comprehendInput/') 16 | 17 | tempcols = ['Type', 'Score'] 18 | df_temp = pd.DataFrame(columns=tempcols) 19 | 20 | cols = ['transcript_name', 'entity_type'] 21 | df_ent = pd.DataFrame(columns=cols) 22 | 23 | comprehendEndpoint = comprehend.list_endpoints( 24 | Filter={ 25 | 'Status': 'IN_SERVICE', 26 | } 27 | ) 28 | 29 | for item in comprehendEndpoint.get('EndpointPropertiesList'): 30 | if 'entity-recognizer-endpoint' in item['EndpointArn']: 31 | endpointArn = item['EndpointArn'] 32 | 33 | for page in pages: 34 | for obj in page['Contents']: 35 | transcript_file_name = obj['Key'].split('/')[1] 36 | temp = s3_resource.Object(os.environ['entityDetectionBucket'], obj['Key']) 37 | transcript_content = temp.get()['Body'].read().decode('utf-8') 38 | transcript_truncated = transcript_content[500:1800] 39 | response = comprehend.detect_entities(Text=transcript_truncated, LanguageCode='en', EndpointArn=endpointArn) 40 | df_temp = pd.DataFrame(columns=tempcols) 41 | for ent in response['Entities']: 42 | df_temp.loc[len(df_temp.index)] = [ent['Type'],ent['Score']] 43 | if len(df_temp) > 0: 44 | entity = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Type'] 45 | else: 46 | entity = 'No entities' 47 | 48 | df_ent.loc[len(df_ent.index)] = [transcript_file_name.strip('en-'),entity] 49 | 50 | wr.s3.to_csv(df_ent, path='s3://' + os.environ['entityDetectionBucket'] + '/' + t_prefix + '/' + 'entities.csv') 51 | -------------------------------------------------------------------------------- /src/buildTrainTest.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import os 3 | import pandas as pd 4 | import subprocess 5 | import json 6 | import time 7 | import pprint 8 | import numpy as np 9 | import string 10 | import datetime 11 | import random 12 | 13 | def lambda_handler(event, context): 14 | 15 | s3 = boto3.client('s3') 16 | s3_resource = boto3.resource('s3') 17 | bucket = os.environ['s3Bucket'] 18 | prefix = 'Comprehend-Custom-Classification' 19 | bucket = 'aim317-workshop-bucket' 20 | 21 | DSTTRAINFILE='data/training/comprehend-train.csv' 22 | DSTVALIDATIONFILE='data/test/comprehend-test.csv' 23 | 24 | raw_data = pd.read_csv('data/training/aim317-cust-class-train-data.csv') 25 | raw_data['label'] = raw_data['label'].astype(str) 26 | raw_data.groupby('label')['text'].count() 27 | selected_columns = ['label', 'text'] 28 | selected_data = raw_data[selected_columns] 29 | selected_data.shape 30 | selected_data.groupby('label')['text'].count() 31 | 32 | selected_data.to_csv(path_or_buf=DSTTRAINFILE, 33 | header=False, 34 | index=False, 35 | escapechar='\\', 36 | doublequote=False, 37 | quotechar='"') 38 | 39 | s3 = boto3.client('s3') 40 | comprehend = boto3.client('comprehend') 41 | 42 | s3.upload_file(DSTTRAINFILE, bucket, prefix+'/'+DSTTRAINFILE) 43 | 44 | s3_train_data = 's3://{}/{}/{}'.format(bucket, prefix, DSTTRAINFILE) 45 | s3_output_job = 's3://{}/{}/{}'.format(bucket, prefix, 'output/train_job') 46 | print('training data location: ',s3_train_data, "output location:", s3_output_job) 47 | 48 | id = str(datetime.datetime.now().strftime("%s")) 49 | training_job = 
comprehend.create_document_classifier( 50 | DocumentClassifierName='BYOD-Custom-Classifier-'+ id, 51 | DataAccessRoleArn=os.environ['ServiceRoleArn'], 52 | InputDataConfig={ 53 | 'S3Uri': s3_train_data 54 | }, 55 | OutputDataConfig={ 56 | 'S3Uri': s3_output_job 57 | }, 58 | LanguageCode='en', 59 | VersionName= 'v001', 60 | ) 61 | 62 | response = comprehend.describe_document_classifier( 63 | DocumentClassifierArn=training_job['DocumentClassifierArn'] 64 | ) 65 | -------------------------------------------------------------------------------- /cloudformation/sagemakerNotebookTemplate.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | AWSTemplateFormatVersion: '2010-09-09' 3 | 4 | Description: IAM Policies, and SageMaker Notebook to work with Amazon Comprehend, it will also clone the Lab codebase into the Notebook before you get started. 5 | 6 | Parameters: 7 | 8 | NotebookName: 9 | Type: String 10 | Default: AIM317WorkshopNotebook 11 | Description: Enter the name of the SageMaker notebook instance. Deafault is ComprehendLabNotebook. 12 | 13 | DefaultCodeRepo: 14 | Type: String 15 | Default: https://github.com/aws-samples/uncover-insights-from-customer-conversations.git 16 | Description: Enter the url of a git code repository for this lab 17 | 18 | InstanceType: 19 | Type: String 20 | Default: ml.t2.medium 21 | AllowedValues: 22 | - ml.t2.medium 23 | - ml.m4.xlarge 24 | - ml.c5.xlarge 25 | - ml.p2.xlarge 26 | - ml.p3.2xlarge 27 | Description: Enter instance type. Default is ml.t2.medium. 28 | 29 | VolumeSize: 30 | Type: Number 31 | Default: 10 32 | MinValue: 5 33 | MaxValue: 16384 34 | ConstraintDescription: Must be an integer between 5 (GB) and 16384 (16 TB). 35 | Description: Enter the size of the EBS volume in GB. Default is 10 GB. 36 | 37 | Resources: 38 | # SageMaker Execution Role 39 | SageMakerIamRole: 40 | Type: "AWS::IAM::Role" 41 | Properties: 42 | AssumeRolePolicyDocument: 43 | Version: "2012-10-17" 44 | Statement: 45 | - 46 | Effect: Allow 47 | Principal: 48 | Service: sagemaker.amazonaws.com 49 | Action: sts:AssumeRole 50 | Path: "/" 51 | ManagedPolicyArns: 52 | - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 53 | - "arn:aws:iam::aws:policy/AmazonS3FullAccess" 54 | - "arn:aws:iam::aws:policy/ComprehendFullAccess" 55 | - "arn:aws:iam::aws:policy/IAMFullAccess" 56 | - "arn:aws:iam::aws:policy/TranslateFullAccess" 57 | - "arn:aws:iam::aws:policy/AmazonTranscribeFullAccess" 58 | 59 | # SageMaker notebook 60 | NotebookInstance: 61 | Type: "AWS::SageMaker::NotebookInstance" 62 | Properties: 63 | InstanceType: !Ref InstanceType 64 | NotebookInstanceName: !Ref NotebookName 65 | RoleArn: !GetAtt SageMakerIamRole.Arn 66 | VolumeSizeInGB: !Ref VolumeSize 67 | DefaultCodeRepository: !Ref DefaultCodeRepo -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/translate-parallel-data.txt: -------------------------------------------------------------------------------- 1 | es,en 2 | Buenas noches, este es Easytron, donde un amigo necesitado es un amigo de hecho ¿cómo puedo ayudarlo hoy?,Good evening this is Easytron where a friend in need is friend indeed how may I help you today? 3 | Tengo uno de estos bots de Trantor para cocinar y todo eso y simplemente no hace un licuado de chocolate para mi hijo. 
Sé que cargué la receta y la puedo ver pero si pido un licuado de chocolate el robot simplemente no hace nada,Yeah I got one of these Trantor bots for cooking and whatnot and it just wouldn’t make a Chocolate milkshake for my son. I know I loaded the recipe correctly and it is visible but if I ask for Chocolate milkshake the robot just freezes in place. 4 | Gracias por esperar. Parece que el robot estaba configurado con la versión predeterminada de las tres leyes de la robótica. Las tres leyes son las características fundamentales con las que se construye todo robot. Por lo general actualizamos a una versión personalizada de las tres leyes para nuestros robots domésticos con subrutinas adicionales que ayudan al robot a clasificar qué es dañino para los humanos y qué no lo es. Entonces en su caso el robot interpretó las tres leyes en el sentido de que el licuado de chocolate es dañino para los humanos,Thank you for holding. It looks like the robot was encoded with the default version of the three laws of robotics. The three laws are the foundational characteristics every robot is built with. We typically upgrade to a customized version of the three laws for our domestic robots with additional subroutines that help the robot classify what is harmful to humans and what is not. So in your case the robot interpreted the three laws to mean that chocolate milkshake is harmful to humans. 5 | Todo parece estar bien fisicamente pero desde ayer el comportamiento del robot ha sido bastante extraño. Parece que piensa que nuestro perro es un peligro e intenta atacarlo. Yo sé que sus robots se apegan a las 3 leyes de la robótica pero ¿hay alguna ley qué especifique qué pasa con las mascotas?,Everything appears to be alright physically but the robot’s behavior has been very strange since yesterday. It seems to think our dog is a danger and tries to attack it. I know your robots adhere to the three laws of robotics but is there any law that covers what happens with pets? 6 | Bueno actualmente las 3 leyes de la robótica solo cubren a los humanos. No lo puedo ayudar mas de eso. ¿Puedo solicitar un reemplazo para ver si el nuevo funciona para usted? ¿Podría abrir un ticket?… ¿Señor David?… ¿Hola? …¿Sigue en la línea?,Well the three laws of robotics cover only humans at this point I can’t help you any more than that I can get this one replaced to see if the new one will work for you? Should I open a ticket? Sir? Hello? Are you still there? 7 | Gracias ya ubiqué su orden y veo que tiene un robot de la serie MCP-5000. Antes de que envíe la señal de apagado de su robot ¿puedo preguntarle si por casualidad ha derramado algún liquido sobre su robot Easytron o si ha sido expuesto a alguna temperatura fuera de los límites seguros?,Thank you I see your order now and that you have an MCP-5000 series. Before I send a shut-down signal to your robot can I ask if you happened to spill any liquids on the Easytron robot or expose it to temperatures outside the safety limits? -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 
5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 
60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AIM317 re:Invent 2021 Workshop 2 | ## Uncover insights from customer conversations — no ML expertise required 3 | 4 | ### Abstract 5 | Understanding what your customers are saying is critical to your business. But navigating the technology needed to make sense of these conversations can be daunting. In this hands-on workshop, you will discover how to uncover valuable insights from your data using custom models that are tailored to your business needs—no ML expertise required. With a set of customer calls, learn how to boost transcription accuracy with Amazon Transcribe custom language models, extract insights with Amazon Comprehend custom entities, localize content with Amazon Translate active custom translation, and create powerful visualizations with Amazon QuickSight. 6 | 7 | This code sample is part of the re:Invent 2021 workshop and is designed to guide AWS re:Invent attendees through setting up a complete end-to-end solution for analyzing voice recordings of customer and support representative interactions. These analyses can be used to detect sentiment, keywords, transcriptions, or translations. The following are the high-level steps to deploy this solution: 8 | 9 | ## Phase 1 - Build using SageMaker Jupyter notebooks 10 | 11 | We will first deep dive into the solution to understand what the building blocks are and how you can tie these together. The various notebooks you will run are shown in the following image. 12 | 13 | ![SageMaker notebook architecture](https://github.com/aws-samples/aim317-uncover-insights-customer-conversations/blob/main/static/aim317-sm-arch-full.jpg) 14 | 15 | **For a full list of instructions for running these notebooks**, please refer to the [AIM317 Workshop Instructions](https://catalog.us-east-1.prod.workshops.aws/v2/workshops/076e45e5-760d-41cf-bd22-a86c46ee462c/en-US/) 16 | 17 | 18 | ## Phase 2 - Full Deployment - Operationalized Solution 19 | 20 | The solution that will be deployed is shown in the following image. 21 | 22 | ![Solution Architecture](https://github.com/aws-samples/aim317-uncover-insights-customer-conversations/blob/main/static/AIM317%20Diagram%20-%20A1.png) 23 | 24 | Use the following AWS CloudFormation template to deploy the operational version of the solution. **For deployment instructions, please refer to** 6-Deploy in the [AIM317 Workshop Instructions](https://catalog.us-east-1.prod.workshops.aws/v2/workshops/076e45e5-760d-41cf-bd22-a86c46ee462c/en-US/) 25 | 26 | [![Launch Stack](https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateUrl=https://ai-ml-services-lab.s3.amazonaws.com/public/labs/aim317/cloudformation/aim317Template.yml&param_SubnetID=subnet-00001) 27 | 28 | The following table lists the parameters. 29 | 30 | | Parameters | Description | 31 | | ------------------ | ---------------------------------------------- | 32 | | SubnetID | Subnet where the Lambdas will be deployed | 33 | 34 | You can copy the required `SubnetID` from your default VPC in the VPC service in the console. 35 | 36 | ![Subnets](static/subnets.png) 37 | 38 | ## Security 39 | 40 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 41 | 42 | ## License 43 | 44 | This library is licensed under the MIT-0 License.
See the [LICENSE](LICENSE) file. 45 | -------------------------------------------------------------------------------- /src/translateText.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | record = event['Records'][0] 8 | 9 | s3bucket = record['s3']['bucket']['name'] 10 | s3object = record['s3']['object']['key'] 11 | 12 | s3 = boto3.client('s3') 13 | s3Resource = boto3.resource('s3') 14 | transcribe = boto3.client('transcribe') 15 | translate = boto3.client('translate') 16 | 17 | ## Get the transcription job name from the filename that triggered the event 18 | 19 | response = transcribe.list_transcription_jobs( 20 | JobNameContains='-'.join(s3object.split("/")[1].split("-")[0:3]) 21 | ) 22 | 23 | TranscriptionJobName = response['TranscriptionJobSummaries'][0]['TranscriptionJobName'] 24 | 25 | transcribed_data = s3Resource.Object(s3bucket,s3object) 26 | original = json.loads(transcribed_data.get()['Body'].read().decode('utf-8')) 27 | entire_transcript = original['results']['transcripts'] 28 | print(entire_transcript) 29 | outfile = '/tmp/'+ TranscriptionJobName +'.txt' 30 | with open(outfile, 'w') as out: 31 | out.write(entire_transcript[0]['transcript']) 32 | s3.upload_file(outfile,os.environ['outputBucket'], 'translateInput' + TranscriptionJobName +'.txt') 33 | 34 | ## Now get the language code from the transcription job 35 | 36 | response = transcribe.get_transcription_job( 37 | TranscriptionJobName=TranscriptionJobName 38 | ) 39 | 40 | TranslateLanguageCode = response['TranscriptionJob']['LanguageCode'].split("-")[0] 41 | 42 | if TranslateLanguageCode != 'en': 43 | 44 | paginator = s3.get_paginator('list_objects_v2') 45 | pages = paginator.paginate(Bucket=os.environ['outputBucket'], Prefix='translateInput' + TranscriptionJobName +'.txt') 46 | for page in pages: 47 | for obj in page['Contents']: 48 | temp = s3Resource.Object(s3bucket, obj['Key']) 49 | trans_input = temp.get()['Body'].read().decode('utf-8') 50 | if len(trans_input) > 0: 51 | # Translate the Spanish transcripts 52 | trans_response = translate.translate_text( 53 | Text=trans_input, 54 | TerminologyNames=['aim317-custom-terminology'], 55 | SourceLanguageCode='es', 56 | TargetLanguageCode='en' 57 | ) 58 | # Write the translated text to a temporary file 59 | with open('/tmp/temp_translate.txt', 'w') as outfile: 60 | outfile.write(trans_response['TranslatedText']) 61 | # Upload the translated text to S3 bucket 62 | s3.upload_file('/tmp/temp_translate.txt', os.environ['outputBucket'], 'comprehendInput' + '/en-' + TranscriptionJobName) 63 | print("Translated text file uploaded to: " + 's3://' + os.environ['outputBucket'] + '/' + 'comprehendInput' + '/en-' + TranscriptionJobName) 64 | 65 | else: 66 | 67 | paginator = s3.get_paginator('list_objects_v2') 68 | pages = paginator.paginate(Bucket=os.environ['outputBucket'], Prefix='translateInput' + TranscriptionJobName +'.txt') 69 | for page in pages: 70 | for obj in page['Contents']: 71 | temp = s3Resource.Object(s3bucket, obj['Key']) 72 | file_input = temp.get()['Body'].read().decode('utf-8') 73 | with open('/tmp/temp_translate.txt', 'w') as outfile: 74 | outfile.write(file_input) 75 | s3.upload_file('/tmp/temp_translate.txt', os.environ['outputBucket'], 'comprehendInput' + '/en-' + TranscriptionJobName) 76 | print("Translated text file uploaded to: " + 's3://' + os.environ['outputBucket'] + '/' + 'comprehendInput' + '/en-' + TranscriptionJobName) 
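# --- Added illustrative example (not part of the original workshop source) ---
# A minimal sketch of how the translateText.py handler above could be smoke-tested
# outside Lambda, assuming AWS credentials are configured and that the transcription
# job, its output JSON in S3, and the 'aim317-custom-terminology' resource already
# exist. The bucket names, object key, and environment variable value below are
# hypothetical placeholders; in the workshop they are created by the CloudFormation stack.
if __name__ == '__main__':
    os.environ.setdefault('outputBucket', 'example-aim317-output-bucket')  # placeholder bucket name

    # Shape of the S3 ObjectCreated notification this function is wired to receive:
    # the key points at a Transcribe output file written by startTranscriptionJob.py.
    sample_event = {
        'Records': [
            {
                's3': {
                    'bucket': {'name': 'example-aim317-transcribe-bucket'},                 # placeholder
                    'object': {'key': 'transcriptionOutput/AIM317-Call1-ES-transcription'}  # placeholder
                }
            }
        ]
    }

    lambda_handler(sample_event, None)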
-------------------------------------------------------------------------------- /cloudformation/sagemakerNotebookEventEngineTemplate.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | AWSTemplateFormatVersion: '2010-09-09' 3 | 4 | Description: IAM Policies, and SageMaker Notebook to work with Amazon Comprehend, it will also clone the Lab codebase into the Notebook before you get started. 5 | 6 | Parameters: 7 | 8 | EEAssetsBucket: 9 | Description: "Region-specific assets S3 bucket name (e.g. ee-assets-prod-us-east-1)" 10 | Type: String 11 | 12 | EEAssetsKeyPrefix: 13 | Description: "S3 key prefix where this modules assets are stored. (e.g. modules/my_module/v1/)" 14 | Type: String 15 | 16 | NotebookName: 17 | Type: String 18 | Default: AIM317WorkshopNotebook 19 | Description: Enter the name of the SageMaker notebook instance. Deafault is ComprehendLabNotebook. 20 | 21 | InstanceType: 22 | Type: String 23 | Default: ml.t2.medium 24 | AllowedValues: 25 | - ml.t2.medium 26 | - ml.m4.xlarge 27 | - ml.c5.xlarge 28 | - ml.p2.xlarge 29 | - ml.p3.2xlarge 30 | Description: Enter instance type. Default is ml.t2.medium. 31 | 32 | VolumeSize: 33 | Type: Number 34 | Default: 20 35 | MinValue: 5 36 | MaxValue: 16384 37 | ConstraintDescription: Must be an integer between 5 (GB) and 16384 (16 TB). 38 | Description: Enter the size of the EBS volume in GB. Default is 10 GB. 39 | 40 | Resources: 41 | WorkshopRepository: 42 | Type: AWS::CodeCommit::Repository 43 | Properties: 44 | RepositoryName: ih-ea-poc-workshop 45 | RepositoryDescription: CodeCommit Repo for the EA Marketplace PoC workshops 46 | Code: 47 | S3: 48 | Bucket: !Ref EEAssetsBucket 49 | Key: !Sub ${EEAssetsKeyPrefix}intelligenthelpeaPOC.zip 50 | 51 | 52 | # SageMaker Execution Role 53 | SageMakerIamRole: 54 | Type: "AWS::IAM::Role" 55 | Properties: 56 | AssumeRolePolicyDocument: 57 | Version: "2012-10-17" 58 | Statement: 59 | - 60 | Effect: Allow 61 | Principal: 62 | Service: sagemaker.amazonaws.com 63 | Action: sts:AssumeRole 64 | Path: "/" 65 | ManagedPolicyArns: 66 | - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 67 | - "arn:aws:iam::aws:policy/AWSCodeCommitFullAccess" 68 | - "arn:aws:iam::aws:policy/AmazonS3FullAccess" 69 | - "arn:aws:iam::aws:policy/ComprehendFullAccess" 70 | - "arn:aws:iam::aws:policy/IAMFullAccess" 71 | - "arn:aws:iam::aws:policy/TranslateFullAccess" 72 | - "arn:aws:iam::aws:policy/AmazonTranscribeFullAccess" 73 | 74 | # SageMaker notebook 75 | NotebookInstance: 76 | Type: "AWS::SageMaker::NotebookInstance" 77 | Properties: 78 | InstanceType: !Ref InstanceType 79 | NotebookInstanceName: !Ref NotebookName 80 | RoleArn: !GetAtt SageMakerIamRole.Arn 81 | VolumeSizeInGB: !Ref VolumeSize 82 | DefaultCodeRepository: !GetAtt WorkshopRepository.CloneUrlHttp 83 | LifecycleConfigName: !GetAtt NotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName 84 | 85 | 86 | NotebookInstanceLifecycleConfig: 87 | Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig" 88 | Properties: 89 | OnStart: 90 | - Content: 91 | Fn::Base64: !Sub | 92 | #!/bin/bash 93 | set -e 94 | mkdir -p /home/ec2-user/SageMaker/data/ 95 | wget https://${EEAssetsBucket}.s3.amazonaws.com/${EEAssetsKeyPrefix}intelligenthelpeaPOCdata.zip && unzip intelligenthelpeaPOCdata.zip -d /home/ec2-user/SageMaker/data/ 96 | 97 | Outputs: 98 | WorkshopCloneUrlHttp: 99 | Description: Workshop CodeCommit Respository Http Clone Url 100 | Value: !GetAtt WorkshopRepository.CloneUrlHttp 101 | 102 | WorkshopCloneUrlSsh: 103 
| Description: Workshop CodeCommit Repository SSH Clone Url 104 | Value: !GetAtt WorkshopRepository.CloneUrlSsh 105 | 106 | WorkshopRepositoryArn: 107 | Description: Workshop CodeCommit Repository Arn 108 | Value: !GetAtt WorkshopRepository.Arn 109 | -------------------------------------------------------------------------------- /notebooks/5-Visualize-Insights/AIM317-reInvent2021-prepare-quicksight-inputs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "7de4471f", 6 | "metadata": {}, 7 | "source": [ 8 | "# Prepare inputs for Amazon Quicksight visualization\n", 9 | "\n", 10 | "Amazon QuickSight is a cloud-scale business intelligence (BI) service that you can use to deliver easy-to-understand insights to the people who you work with, wherever they are. Amazon QuickSight connects to your data in the cloud and combines data from many different sources. In a single data dashboard, QuickSight can include AWS data, third-party data, big data, spreadsheet data, SaaS data, B2B data, and more. As a fully managed cloud-based service, Amazon QuickSight provides enterprise-grade security, global availability, and built-in redundancy. It also provides the user-management tools that you need to scale from 10 users to 10,000, all with no infrastructure to deploy or manage.\n", 11 | "\n", 12 | "In this notebook, we will prepare the manifest file that we need to use with Amazon Quicksight to visualize insights we generated from our customer call transcripts." 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "id": "146615fd", 18 | "metadata": {}, 19 | "source": [ 20 | "### Initialize libraries and import variables" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": null, 26 | "id": "b2797e21", 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "# import libraries\n", 31 | "import pandas as pd\n", 32 | "import boto3\n", 33 | "import json\n", 34 | "import csv\n", 35 | "import os\n", 36 | "\n", 37 | "# initialize variables we need\n", 38 | "infile = 'quicksight_raw_manifest.json'\n", 39 | "outfile = 'quicksight_formatted_manifest_type.json'\n", 40 | "\n", 41 | "inprefix = 'quicksight/data'\n", 42 | "manifestprefix = 'quicksight/manifest'\n", 43 | "\n", 44 | "bucket = '' # Enter your bucket name here\n", 45 | "\n", 46 | "s3 = boto3.client('s3')\n", 47 | "\n", 48 | "try:\n", 49 | " s3.head_bucket(Bucket=bucket)\n", 50 | "except:\n", 51 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "id": "fc71d262", 57 | "metadata": {}, 58 | "source": [ 59 | "### Review transcripts with insights for QuickSight\n", 60 | "When we ran the previous notebooks, we created CSV files containing speaker and time segmentation, the inference results that classified the transcripts to CTA/No CTA using Amazon Comprehend custom classification, we detected custom entities using Amazon Comprehend custom entity recognizer, and we finally detected the sentiment of the call transcripts using Amazon Comprehend Sentiment anlysis feature. 
These are available in our temp folder, let us move these to the quicksight/input folder" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "id": "57d8c711", 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "# Lets review what CSV files we have for QuickSight\n", 71 | "!aws s3 ls s3://{bucket}/{inprefix} --recursive " 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "id": "bfe4137c", 77 | "metadata": {}, 78 | "source": [ 79 | "### Update QuickSight Manifest\n", 80 | "We will replace the S3 bucket and prefix from the raw manifest file with what you have entered in STEP 0 - CELL 1 above. We will then create a new formatted manifest file that will be used for creating a dataset with Amazon QuickSight based on the content we extract from the handwritten documents." 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "id": "75619a75", 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "# S3 boto3 client handle\n", 91 | "s3 = boto3.client('s3')\n", 92 | "\n", 93 | "# Create formatted manifests for each type of dataset we need from the raw manifest JSON\n", 94 | "types = ['transcripts', 'entity', 'cta', 'sentiment']\n", 95 | "\n", 96 | "manifest = open(infile, 'r')\n", 97 | "ln = json.load(manifest)\n", 98 | "t = json.dumps(ln['fileLocations'][0]['URIPrefixes'])\n", 99 | "for type in types:\n", 100 | " t1 = t.replace('bucket', bucket).replace('prefix', inprefix + '/' + type)\n", 101 | " ln['fileLocations'][0]['URIPrefixes'] = json.loads(t1)\n", 102 | " outfile_rep = outfile.replace('type', type)\n", 103 | " with open(outfile_rep, 'w', encoding='utf-8') as out:\n", 104 | " json.dump(ln, out, ensure_ascii=False, indent=4)\n", 105 | " # Upload the manifest to S3\n", 106 | " s3.upload_file(outfile_rep, bucket, manifestprefix + '/' + outfile_rep)\n", 107 | " print(\"Manifest file uploaded to: s3://{}/{}\".format(bucket, manifestprefix + '/' + outfile_rep))" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "id": "0a6dce9f", 113 | "metadata": {}, 114 | "source": [ 115 | "#### Please copy the manifest S3 URIs above. We need it when we build the datasets for the QuickSight dashboard.\n", 116 | "\n", 117 | "### We are done here. Please go back to workshop instructions." 118 | ] 119 | } 120 | ], 121 | "metadata": { 122 | "kernelspec": { 123 | "display_name": "conda_python3", 124 | "language": "python", 125 | "name": "conda_python3" 126 | }, 127 | "language_info": { 128 | "codemirror_mode": { 129 | "name": "ipython", 130 | "version": 3 131 | }, 132 | "file_extension": ".py", 133 | "mimetype": "text/x-python", 134 | "name": "python", 135 | "nbconvert_exporter": "python", 136 | "pygments_lexer": "ipython3", 137 | "version": "3.6.13" 138 | } 139 | }, 140 | "nbformat": 4, 141 | "nbformat_minor": 5 142 | } 143 | -------------------------------------------------------------------------------- /notebooks/4-Detect-Call-Sentiment/AIM317-reInvent2021-detect-customer-sentiment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "49212b4a", 6 | "metadata": {}, 7 | "source": [ 8 | "# Detect sentiment in customer calls using Amazon Comprehend\n", 9 | "\n", 10 | "Now we will detect the customer sentiment in the call conversations using Amazon Comprehend. 
" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "adf00292", 16 | "metadata": {}, 17 | "source": [ 18 | "### Import libraries and initialize variables" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "id": "a16de80a", 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import boto3\n", 29 | "import pandas as pd\n", 30 | "\n", 31 | "inprefix = 'comprehend/input'\n", 32 | "outprefix = 'quicksight/temp/insights'\n", 33 | "# Amazon Comprehend client\n", 34 | "comprehend = boto3.client('comprehend')\n", 35 | "# Amazon S3 clients\n", 36 | "s3 = boto3.client('s3')\n", 37 | "s3_resource = boto3.resource('s3')\n", 38 | "\n", 39 | "bucket = '' # Enter your bucket name here\n", 40 | "\n", 41 | "try:\n", 42 | " s3.head_bucket(Bucket=bucket)\n", 43 | "except:\n", 44 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "id": "e7e708e4", 50 | "metadata": {}, 51 | "source": [ 52 | "### Detect sentiment of transcripts\n", 53 | "For our workshop we will determine the sentiment of an entire call transcript to use with our visuals, but you can also capture sentiment trends in a conversation. We will demonstrate this during the workshop using the new **Transcribe Call Analytics** solution. If you like to try how this looks, please execute the optional code block at the end of this notebook." 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "id": "432afad2", 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "# Prepare to page through our transcripts in S3\n", 64 | "paginator = s3.get_paginator('list_objects_v2')\n", 65 | "pages = paginator.paginate(Bucket=bucket, Prefix=inprefix)\n", 66 | "job_name_list = []\n", 67 | "t_prefix = 'quicksight/data/sentiment'\n", 68 | "\n", 69 | "# We will define a DataFrame to store the results of the sentiment analysis\n", 70 | "cols = ['transcript_name', 'sentiment']\n", 71 | "df_sent = pd.DataFrame(columns=cols)\n", 72 | "\n", 73 | "# Now lets page through the transcripts\n", 74 | "for page in pages:\n", 75 | " for obj in page['Contents']:\n", 76 | " # get the transcript file name\n", 77 | " transcript_file_name = obj['Key'].split('/')[2]\n", 78 | " # now lets get the transcript file contents\n", 79 | " temp = s3_resource.Object(bucket, obj['Key'])\n", 80 | " transcript_contents = temp.get()['Body'].read().decode('utf-8')\n", 81 | " # Call Comprehend to detect sentiment\n", 82 | " response = comprehend.detect_sentiment(Text=transcript_contents, LanguageCode='en')\n", 83 | " # Update the results DataFrame with the cta predicted label\n", 84 | " # Create a CSV file with cta label from this DataFrame\n", 85 | " df_sent.loc[len(df_sent.index)] = [transcript_file_name.strip('en-').strip('.txt'),response['Sentiment']]\n", 86 | " \n", 87 | "df_sent.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'sentiment.csv', index=False)\n", 88 | "df_sent" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "id": "7a785d9a", 94 | "metadata": {}, 95 | "source": [ 96 | "### OPTIONAL - Detect sentiment Trend\n", 97 | "We will now take one of the transcripts and show you how to detect sentiment trend in conversations. This can be a powerful insight to both demonstrate and understand the triggers for a shift in customer perspective as well as how to remedy it." 
98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "id": "b3788eff", 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "# Select one of the transcripts we created in 1-Transcribe-Translate\n", 108 | "import os\n", 109 | "rootdir = '/home/ec2-user/SageMaker/aim317-uncover-insights-customer-conversations/notebooks/1-Transcribe-Translate-Calls'\n", 110 | "csvfile = ''\n", 111 | "for subdir, dirs, files in os.walk(rootdir):\n", 112 | " for file in files:\n", 113 | " filepath = subdir + os.sep + file\n", 114 | " if filepath.endswith(\".csv\"):\n", 115 | " csvfile = str(filepath)\n", 116 | " break\n", 117 | " \n", 118 | "df_t = pd.read_csv(csvfile)\n", 119 | "df_t.head()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "id": "87d7583d", 125 | "metadata": {}, 126 | "source": [ 127 | "Separate the sentences spoken by each of the speakers to their own dictionaries along with the last timestamp when their sentence ended" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "id": "c7554b69", 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "spk_0 = {}\n", 138 | "spk_1 = {}\n", 139 | "a = ''\n", 140 | "b = ''\n", 141 | "j = 0\n", 142 | "k = 0\n", 143 | "for i, row in df_t.iterrows():\n", 144 | " if row['speaker_label'] == 'spk_0':\n", 145 | " if len(b) > 0:\n", 146 | " j += 1\n", 147 | " spk_1['end_time'+str(j)] = row['start_time'] \n", 148 | " spk_1['transcript'+str(j)] = b\n", 149 | " b = ''\n", 150 | " a += row['content'] + ' '\n", 151 | " if row['speaker_label'] == 'spk_1':\n", 152 | " if len(a) > 0:\n", 153 | " k += 1\n", 154 | " spk_0['end_time'+str(k)] = row['start_time']\n", 155 | " spk_0['transcript'+str(k)] = a\n", 156 | " a = ''\n", 157 | " b += row['content'] + ' '\n", 158 | "if len(a) > 0:\n", 159 | " spk_0['transcript'+str(j+1)] = a\n", 160 | " spk_0['end_time'+str(j+1)] = row['end_time']\n", 161 | "if len(b) > 0:\n", 162 | " spk_1['transcript'+str(k+1)] = b\n", 163 | " spk_1['end_time'+str(k+1)] = row['end_time']" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "id": "4e688186", 169 | "metadata": {}, 170 | "source": [ 171 | "#### Check the results" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "id": "07c49134", 178 | "metadata": {}, 179 | "outputs": [], 180 | "source": [ 181 | "spk_0" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "id": "a7addfcf", 187 | "metadata": {}, 188 | "source": [ 189 | "Now get the **sentiment for each line using Amazon Comprehend** and update the transcript with the sentiment" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "id": "4dbc59de", 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "import re\n", 200 | "for line in spk_0:\n", 201 | " if 'transcript' in line:\n", 202 | " res0 = comprehend.detect_sentiment(Text=spk_0[line], LanguageCode='en')['Sentiment']\n", 203 | " spk_0[line] = res0\n", 204 | "\n", 205 | "for line in spk_1:\n", 206 | " if 'transcript' in line:\n", 207 | " res1 = comprehend.detect_sentiment(Text=spk_1[line], LanguageCode='en')['Sentiment']\n", 208 | " spk_1[line] = res1" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "id": "440b8ad0", 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "spk_1" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "id": "d1e26acd", 224 | "metadata": {}, 225 | "source": [ 226 | "#### Let us now 
graph it" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "id": "fcde602e", 233 | "metadata": {}, 234 | "outputs": [], 235 | "source": [ 236 | "!pip install matplotlib" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "id": "67257a58", 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "import matplotlib.pyplot as plt\n", 247 | "\n", 248 | "spk_0_end_time = []\n", 249 | "spk_0_sentiment = []\n", 250 | "spk_1_end_time = []\n", 251 | "spk_1_sentiment = []\n", 252 | "\n", 253 | "\n", 254 | "for x in spk_0:\n", 255 | " if 'end_time' in x:\n", 256 | " spk_0_end_time.append(spk_0[x])\n", 257 | " if 'transcript' in x:\n", 258 | " spk_0_sentiment.append(spk_0[x])\n", 259 | "\n", 260 | "for x in spk_1:\n", 261 | " if 'end_time' in x:\n", 262 | " spk_1_end_time.append(spk_1[x])\n", 263 | " if 'transcript' in x:\n", 264 | " spk_1_sentiment.append(spk_1[x])\n", 265 | " \n", 266 | "plt.plot(spk_0_end_time, spk_0_sentiment, color = 'g', label = 'Speaker 0 Sentiment Trend')\n", 267 | "plt.plot(spk_1_end_time, spk_1_sentiment, color = 'b', label = 'Speaker 1 Sentiment Trend')\n", 268 | "plt.xlabel('Call time in seconds')\n", 269 | "plt.ylabel('Sentiment')\n", 270 | "plt.legend()" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "id": "d18282d2", 276 | "metadata": {}, 277 | "source": [ 278 | "As you can see above, the sky's the limit on what you can do with the Amazon Transcribe output in tandem with Amazon Comprehend. Please go back now to watch your team members create some **AWSome visuals using Amazon QuickSight!!**" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "id": "e797961e", 284 | "metadata": {}, 285 | "source": [ 286 | "## End of notebook. Please go back to the workshop instructions to review the next steps." 
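One optional refinement to the trend plot above: it places the categorical sentiment strings directly on the y-axis, so the vertical ordering depends on the order in which labels first appear. If you prefer a numeric trend line, a small mapping can be applied first. The score values below are an arbitrary choice for plotting purposes only, not a Comprehend output, and the `spk_0_*` / `spk_1_*` lists come from the cells above.

```python
import matplotlib.pyplot as plt

# Arbitrary numeric mapping for plotting purposes only (assumption)
SENTIMENT_SCORE = {'NEGATIVE': -1, 'MIXED': 0, 'NEUTRAL': 0, 'POSITIVE': 1}

spk_0_numeric = [SENTIMENT_SCORE.get(s, 0) for s in spk_0_sentiment]
spk_1_numeric = [SENTIMENT_SCORE.get(s, 0) for s in spk_1_sentiment]

plt.plot(spk_0_end_time, spk_0_numeric, color='g', label='Speaker 0 Sentiment Trend')
plt.plot(spk_1_end_time, spk_1_numeric, color='b', label='Speaker 1 Sentiment Trend')
plt.yticks([-1, 0, 1], ['NEGATIVE', 'NEUTRAL / MIXED', 'POSITIVE'])
plt.xlabel('Call time in seconds')
plt.ylabel('Sentiment')
plt.legend()
plt.show()
```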
287 | ] 288 | } 289 | ], 290 | "metadata": { 291 | "kernelspec": { 292 | "display_name": "conda_python3", 293 | "language": "python", 294 | "name": "conda_python3" 295 | }, 296 | "language_info": { 297 | "codemirror_mode": { 298 | "name": "ipython", 299 | "version": 3 300 | }, 301 | "file_extension": ".py", 302 | "mimetype": "text/x-python", 303 | "name": "python", 304 | "nbconvert_exporter": "python", 305 | "pygments_lexer": "ipython3", 306 | "version": "3.6.13" 307 | } 308 | }, 309 | "nbformat": 4, 310 | "nbformat_minor": 5 311 | } 312 | -------------------------------------------------------------------------------- /cloudformation/aim317Template.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: '2010-09-09' 2 | Parameters: 3 | ESVocabularyName: 4 | Type: String 5 | Default: "Add Vocabulary Name" 6 | ENVocabularyName: 7 | Type: String 8 | Default: "Add Vocabulary Name" 9 | Resources: 10 | ServiceRole: 11 | Type: AWS::IAM::Role 12 | Properties: 13 | AssumeRolePolicyDocument: 14 | Version: "2012-10-17" 15 | Statement: 16 | - 17 | Effect: "Allow" 18 | Principal: 19 | Service: 20 | - "comprehend.amazonaws.com" 21 | - "lambda.amazonaws.com" 22 | - "translate.amazonaws.com" 23 | - "s3.amazonaws.com" 24 | Action: 25 | - "sts:AssumeRole" 26 | RoleName: "AIM317ServiceRole" 27 | ManagedPolicyArns: 28 | - arn:aws:iam::aws:policy/TranslateFullAccess 29 | - arn:aws:iam::aws:policy/AmazonS3FullAccess 30 | - arn:aws:iam::aws:policy/AmazonTranscribeFullAccess 31 | - arn:aws:iam::aws:policy/ComprehendFullAccess 32 | - arn:aws:iam::aws:policy/AWSLambdaExecute 33 | 34 | PassthroughPolicy: 35 | Type: AWS::IAM::Policy 36 | Properties: 37 | PolicyName: "AIM317PassthroughPolicy" 38 | PolicyDocument: 39 | Version: "2012-10-17" 40 | Statement: 41 | - Effect: Allow 42 | Action: 43 | - 'iam:PassRole' 44 | Resource: !GetAtt ServiceRole.Arn 45 | Roles: 46 | - !Ref ServiceRole 47 | 48 | StartTranscriptionJob: 49 | Type: AWS::Lambda::Function 50 | Properties: 51 | Description: "Triggers on raw audio files added to S3 location and performs transcription" 52 | Handler: startTranscriptionJob.lambda_handler 53 | Code: 54 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 55 | S3Key: "src/startTranscriptionJob.zip" 56 | Runtime: python3.8 57 | Timeout: 10 58 | ReservedConcurrentExecutions: 1 59 | FunctionName: "AIM317startTranscriptionJob" 60 | Role: !GetAtt ServiceRole.Arn 61 | Environment: 62 | Variables: 63 | outputBucket : !Sub 'aim317-${AWS::AccountId}' 64 | outputKey : "transcriptOutput/" 65 | ESVocabularyName : !Ref ESVocabularyName 66 | ENVocabularyName : !Ref ENVocabularyName 67 | 68 | StartTranscriptionJobPermission: 69 | Type: AWS::Lambda::Permission 70 | Properties: 71 | Action: lambda:InvokeFunction 72 | FunctionName: !GetAtt StartTranscriptionJob.Arn 73 | Principal: s3.amazonaws.com 74 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 75 | 76 | CreateVocabulary: 77 | Type: AWS::Lambda::Function 78 | Properties: 79 | Handler: createVocabulary.lambda_handler 80 | Code: 81 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 82 | S3Key: "src/createVocabulary.zip" 83 | Runtime: python3.8 84 | ReservedConcurrentExecutions: 1 85 | FunctionName: "AIM317createVocavulary" 86 | Role: !GetAtt ServiceRole.Arn 87 | 88 | CreateVocabularyPermission: 89 | Type: AWS::Lambda::Permission 90 | Properties: 91 | Action: lambda:InvokeFunction 92 | FunctionName: !GetAtt CreateVocabulary.Arn 93 | Principal: s3.amazonaws.com 94 | SourceArn: !Sub 
'arn:aws:s3:::aim317-${AWS::AccountId}' 95 | 96 | ImportTerminology: 97 | Type: AWS::Lambda::Function 98 | Properties: 99 | Handler: importTerminology.lambda_handler 100 | Code: 101 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 102 | S3Key: "src/importTerminology.zip" 103 | Runtime: python3.8 104 | ReservedConcurrentExecutions: 1 105 | FunctionName: "AIM317importTerminology" 106 | Role: !GetAtt ServiceRole.Arn 107 | 108 | ImportTerminolofyPermission: 109 | Type: AWS::Lambda::Permission 110 | Properties: 111 | Action: lambda:InvokeFunction 112 | FunctionName: !GetAtt ImportTerminology.Arn 113 | Principal: s3.amazonaws.com 114 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 115 | 116 | TranslateText: 117 | Type: AWS::Lambda::Function 118 | Properties: 119 | Handler: translateText.lambda_handler 120 | Code: 121 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 122 | S3Key: "src/translateText.zip" 123 | Runtime: python3.8 124 | ReservedConcurrentExecutions: 1 125 | FunctionName: "AIM317translateText" 126 | Role: !GetAtt ServiceRole.Arn 127 | Environment: 128 | Variables: 129 | outputBucket : !Sub 'aim317-${AWS::AccountId}' 130 | outputKey : "translateOutput/" 131 | TranslateARN : !GetAtt ServiceRole.Arn 132 | 133 | TranslateTextPermission: 134 | Type: AWS::Lambda::Permission 135 | Properties: 136 | Action: lambda:InvokeFunction 137 | FunctionName: !GetAtt TranslateText.Arn 138 | Principal: s3.amazonaws.com 139 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 140 | 141 | CreateEntityRecognizer: 142 | Type: AWS::Lambda::Function 143 | Properties: 144 | Handler: createEntityRecognizer.lambda_handler 145 | Code: 146 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 147 | S3Key: "src/createEntityRecognizer.zip" 148 | Runtime: python3.8 149 | ReservedConcurrentExecutions: 1 150 | FunctionName: "AIM317createEntityRecognizer" 151 | Role: !GetAtt ServiceRole.Arn 152 | Environment: 153 | Variables: 154 | ComprehendARN : !GetAtt ServiceRole.Arn 155 | ComprehendAnnotationBucket : !Sub 'aim317-${AWS::AccountId}' 156 | ComprehendTargetBucket : !Sub 'aim317-${AWS::AccountId}' 157 | 158 | CreateEntityRecognizerPermission: 159 | Type: AWS::Lambda::Permission 160 | Properties: 161 | Action: lambda:InvokeFunction 162 | FunctionName: !GetAtt CreateEntityRecognizer.Arn 163 | Principal: s3.amazonaws.com 164 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 165 | 166 | CreateEndpoint: 167 | Type: AWS::Lambda::Function 168 | Properties: 169 | Handler: createEndpoint.lambda_handler 170 | Code: 171 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 172 | S3Key: "src/createEndpoint.zip" 173 | Description: "Lambda that creates an endpoint for inference" 174 | Runtime: python3.8 175 | ReservedConcurrentExecutions: 1 176 | FunctionName: "AIM317createEndpoint" 177 | Role: !GetAtt ServiceRole.Arn 178 | Environment: 179 | Variables: 180 | ComprehendARN : !GetAtt ServiceRole.Arn 181 | 182 | CreateEndpointPermission: 183 | Type: AWS::Lambda::Permission 184 | Properties: 185 | Action: lambda:InvokeFunction 186 | FunctionName: !GetAtt CreateEndpoint.Arn 187 | Principal: s3.amazonaws.com 188 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 189 | 190 | DetectEntities: 191 | Type: AWS::Lambda::Function 192 | Properties: 193 | Handler: detectEntities.lambda_handler 194 | Code: 195 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 196 | S3Key: "src/detectEntities.zip" 197 | Runtime: python3.8 198 | Timeout: 10 199 | ReservedConcurrentExecutions: 1 200 | Layers: 201 | - 
"arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 202 | FunctionName: "AIM317detectEntities" 203 | Role: !GetAtt ServiceRole.Arn 204 | Environment: 205 | Variables: 206 | entityDetectionBucket : !Sub 'aim317-${AWS::AccountId}' 207 | ComprehendARN : !GetAtt ServiceRole.Arn 208 | 209 | DetectEntitiesPermission: 210 | Type: AWS::Lambda::Permission 211 | Properties: 212 | Action: lambda:InvokeFunction 213 | FunctionName: !GetAtt DetectEntities.Arn 214 | Principal: s3.amazonaws.com 215 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 216 | 217 | BuildTrainTest: 218 | Type: AWS::Lambda::Function 219 | Properties: 220 | Handler: paginateProcessDataTrainTestFiles.lambda_handler 221 | Code: 222 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 223 | S3Key: "src/paginateProcessDataTrainTestFiles.zip" 224 | Runtime: python3.8 225 | ReservedConcurrentExecutions: 1 226 | Layers: 227 | - "arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 228 | FunctionName: "AIM317buildTrainTest" 229 | Role: !GetAtt ServiceRole.Arn 230 | Environment: 231 | Variables: 232 | comprehendBucket : !Sub 'aim317-${AWS::AccountId}' 233 | ComprehendARN : !GetAtt ServiceRole.Arn 234 | 235 | BuildTrainTestPermission: 236 | Type: AWS::Lambda::Permission 237 | Properties: 238 | Action: lambda:InvokeFunction 239 | FunctionName: !GetAtt BuildTrainTest.Arn 240 | Principal: s3.amazonaws.com 241 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 242 | 243 | CreateDocumentClassifier: 244 | Type: AWS::Lambda::Function 245 | Properties: 246 | Handler: createDocumentClassifier.lambda_handler 247 | Code: 248 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 249 | S3Key: "src/createDocumentClassifier.zip" 250 | Runtime: python3.8 251 | ReservedConcurrentExecutions: 1 252 | FunctionName: "AIM317createDocumentClassifier" 253 | Role: !GetAtt ServiceRole.Arn 254 | Environment: 255 | Variables: 256 | classifierBucket : !Sub 'aim317-${AWS::AccountId}' 257 | classifierBucketPrefix : "comprehend-custom-classifier" 258 | ComprehendARN : !GetAtt ServiceRole.Arn 259 | 260 | CreateDocumentClassifierPermission: 261 | Type: AWS::Lambda::Permission 262 | Properties: 263 | Action: lambda:InvokeFunction 264 | FunctionName: !GetAtt CreateDocumentClassifier.Arn 265 | Principal: s3.amazonaws.com 266 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 267 | 268 | ClassifyDocument: 269 | Type: AWS::Lambda::Function 270 | Properties: 271 | Handler: classifyDocument.lambda_handler 272 | Code: 273 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 274 | S3Key: "src/classifyDocument.zip" 275 | Runtime: python3.8 276 | Timeout: 10 277 | Layers: 278 | - "arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 279 | ReservedConcurrentExecutions: 1 280 | FunctionName: "AIM317classifyDocument" 281 | Role: !GetAtt ServiceRole.Arn 282 | Environment: 283 | Variables: 284 | classifierBucketPrefix : "quicksight/data/cta" 285 | classifierBucket : !Sub 'aim317-${AWS::AccountId}' 286 | ComprehendARN : !GetAtt ServiceRole.Arn 287 | 288 | ClassifyDocumentPermission: 289 | Type: AWS::Lambda::Permission 290 | Properties: 291 | Action: lambda:InvokeFunction 292 | FunctionName: !GetAtt ClassifyDocument.Arn 293 | Principal: s3.amazonaws.com 294 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 295 | 296 | DetectSentiment: 297 | Type: AWS::Lambda::Function 298 | Properties: 299 | Handler: startSentimentDetection.lambda_handler 300 | Code: 301 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 302 | S3Key: 
"src/startSentimentDetection.zip" 303 | Runtime: python3.8 304 | Timeout: 10 305 | Layers: 306 | - "arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 307 | ReservedConcurrentExecutions: 1 308 | FunctionName: "AIM317detectSentiment" 309 | Role: !GetAtt ServiceRole.Arn 310 | Environment: 311 | Variables: 312 | ComprehendBucket : !Sub 'aim317-${AWS::AccountId}' 313 | ComprehendARN : !GetAtt ServiceRole.Arn 314 | 315 | DetectSentimentPermission: 316 | Type: AWS::Lambda::Permission 317 | Properties: 318 | Action: lambda:InvokeFunction 319 | FunctionName: !GetAtt DetectSentiment.Arn 320 | Principal: s3.amazonaws.com 321 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 322 | 323 | AIM317Bucket: 324 | Type: AWS::S3::Bucket 325 | DeletionPolicy: Delete 326 | UpdateReplacePolicy: Retain 327 | Properties: 328 | AccessControl: Private 329 | BucketName: !Sub 'aim317-${AWS::AccountId}' 330 | LoggingConfiguration: 331 | LogFilePrefix: access-logs 332 | VersioningConfiguration: 333 | Status: Enabled 334 | PublicAccessBlockConfiguration: 335 | BlockPublicAcls: true 336 | BlockPublicPolicy: false 337 | IgnorePublicAcls: true 338 | RestrictPublicBuckets: true 339 | NotificationConfiguration: 340 | LambdaConfigurations: 341 | - Event: s3:ObjectCreated:* 342 | Filter: 343 | S3Key: 344 | Rules: 345 | - Name: prefix 346 | Value: "transcribeInput/" 347 | Function: !GetAtt StartTranscriptionJob.Arn 348 | - Event: s3:ObjectCreated:* 349 | Filter: 350 | S3Key: 351 | Rules: 352 | - Name: prefix 353 | Value: "vocabularyInput/" 354 | Function: !GetAtt CreateVocabulary.Arn 355 | - Event: s3:ObjectCreated:* 356 | Filter: 357 | S3Key: 358 | Rules: 359 | - Name: prefix 360 | Value: "terminologyInput/" 361 | Function: !GetAtt ImportTerminology.Arn 362 | - Event: s3:ObjectCreated:* 363 | Filter: 364 | S3Key: 365 | Rules: 366 | - Name: prefix 367 | Value: "transcriptOutput/" 368 | Function: !GetAtt TranslateText.Arn -------------------------------------------------------------------------------- /notebooks/2-Train-Detect-Entities/annotations.csv: -------------------------------------------------------------------------------- 1 | File,Line,Begin Offset,End Offset,Type 2 | train.csv,0,45,67,ETHICS 3 | train.csv,0,92,108,BRAIN 4 | train.csv,1,28,45,BRAIN 5 | train.csv,1,58,68,ETHICS 6 | train.csv,2,145,155,ETHICS 7 | train.csv,3,123,140,BRAIN 8 | train.csv,4,79,96,BRAIN 9 | train.csv,5,56,61,MOVEMENT 10 | train.csv,5,99,109,BRAIN 11 | train.csv,6,78,95,BRAIN 12 | train.csv,7,70,87,BRAIN 13 | train.csv,8,53,63,BRAIN 14 | train.csv,8,191,201,ETHICS 15 | train.csv,9,200,210,BRAIN 16 | train.csv,10,57,79,ETHICS 17 | train.csv,10,184,207,BRAIN 18 | train.csv,11,99,109,ETHICS 19 | train.csv,12,55,65,ETHICS 20 | train.csv,13,65,83,ETHICS 21 | train.csv,13,104,129,ETHICS 22 | train.csv,14,170,195,ETHICS 23 | train.csv,15,296,306,ETHICS 24 | train.csv,16,79,89,ETHICS 25 | train.csv,16,94,117,BRAIN 26 | train.csv,17,109,131,ETHICS 27 | train.csv,18,299,315,BRAIN 28 | train.csv,19,102,112,BRAIN 29 | train.csv,20,4,20,BRAIN 30 | train.csv,20,153,157,MOVEMENT 31 | train.csv,21,0,16,BRAIN 32 | train.csv,22,87,97,ETHICS 33 | train.csv,23,21,37,BRAIN 34 | train.csv,23,157,167,ETHICS 35 | train.csv,24,80,90,ETHICS 36 | train.csv,24,189,211,ETHICS 37 | train.csv,25,102,118,BRAIN 38 | train.csv,26,135,151,BRAIN 39 | train.csv,27,99,109,BRAIN 40 | train.csv,29,73,82,ETHICS 41 | train.csv,29,119,137,ETHICS 42 | train.csv,30,4,14,BRAIN 43 | train.csv,31,48,64,BRAIN 44 | train.csv,32,7,17,BRAIN 45 | 
train.csv,33,28,44,BRAIN 46 | train.csv,34,100,110,BRAIN 47 | train.csv,35,33,50,BRAIN 48 | train.csv,36,192,209,BRAIN 49 | train.csv,37,81,97,BRAIN 50 | train.csv,37,165,174,ETHICS 51 | train.csv,38,102,118,BRAIN 52 | train.csv,39,69,85,BRAIN 53 | train.csv,40,22,31,ETHICS 54 | train.csv,41,109,118,ETHICS 55 | train.csv,42,59,69,ETHICS 56 | train.csv,42,91,114,ETHICS 57 | train.csv,42,137,167,ETHICS 58 | train.csv,43,130,139,ETHICS 59 | train.csv,44,59,64,BRAIN 60 | train.csv,44,90,100,ETHICS 61 | train.csv,45,40,49,ETHICS 62 | train.csv,45,74,84,ETHICS 63 | train.csv,45,206,222,BRAIN 64 | train.csv,46,131,153,ETHICS 65 | train.csv,47,114,131,BRAIN 66 | train.csv,48,29,46,BRAIN 67 | train.csv,49,44,52,MOVEMENT 68 | train.csv,51,161,166,MOVEMENT 69 | train.csv,51,176,180,MOVEMENT 70 | train.csv,52,41,45,MOVEMENT 71 | train.csv,54,35,39,MOVEMENT 72 | train.csv,55,203,225,ETHICS 73 | train.csv,56,38,42,MOVEMENT 74 | train.csv,57,69,92,BRAIN 75 | train.csv,58,16,39,BRAIN 76 | train.csv,59,40,44,MOVEMENT 77 | train.csv,60,39,43,MOVEMENT 78 | train.csv,61,0,7,MOVEMENT 79 | train.csv,61,117,127,MOVEMENT 80 | train.csv,62,34,38,MOVEMENT 81 | train.csv,65,42,46,MOVEMENT 82 | train.csv,66,35,39,MOVEMENT 83 | train.csv,67,103,107,MOVEMENT 84 | train.csv,69,65,69,MOVEMENT 85 | train.csv,70,65,70,MOVEMENT 86 | train.csv,70,80,84,MOVEMENT 87 | train.csv,73,124,130,MOVEMENT 88 | train.csv,74,7,17,MOVEMENT 89 | train.csv,74,155,160,MOVEMENT 90 | train.csv,75,8,18,MOVEMENT 91 | train.csv,75,65,70,MOVEMENT 92 | train.csv,77,13,17,MOVEMENT 93 | train.csv,77,162,167,BRAIN 94 | train.csv,78,26,30,MOVEMENT 95 | train.csv,79,0,4,MOVEMENT 96 | train.csv,79,61,65,MOVEMENT 97 | train.csv,80,59,63,MOVEMENT 98 | train.csv,81,9,13,MOVEMENT 99 | train.csv,81,201,205,MOVEMENT 100 | train.csv,82,2,6,MOVEMENT 101 | train.csv,83,4,8,MOVEMENT 102 | train.csv,83,42,47,MOVEMENT 103 | train.csv,83,52,58,MOVEMENT 104 | train.csv,84,4,8,MOVEMENT 105 | train.csv,85,4,8,MOVEMENT 106 | train.csv,85,79,89,MOVEMENT 107 | train.csv,86,25,30,MOVEMENT 108 | train.csv,86,35,41,MOVEMENT 109 | train.csv,86,149,153,MOVEMENT 110 | train.csv,87,4,10,MOVEMENT 111 | train.csv,87,45,49,MOVEMENT 112 | train.csv,87,108,115,MOVEMENT 113 | train.csv,88,4,9,MOVEMENT 114 | train.csv,88,105,109,MOVEMENT 115 | train.csv,89,14,18,MOVEMENT 116 | train.csv,90,76,80,MOVEMENT 117 | train.csv,91,142,146,MOVEMENT 118 | train.csv,92,37,41,MOVEMENT 119 | train.csv,92,78,85,MOVEMENT 120 | train.csv,93,16,24,MOVEMENT 121 | train.csv,93,136,142,MOVEMENT 122 | train.csv,94,4,12,MOVEMENT 123 | train.csv,94,101,111,MOVEMENT 124 | train.csv,95,0,8,MOVEMENT 125 | train.csv,95,56,61,MOVEMENT 126 | train.csv,96,21,31,MOVEMENT 127 | train.csv,97,134,156,ETHICS 128 | train.csv,98,62,72,ETHICS 129 | train.csv,99,55,68,ETHICS 130 | train.csv,100,142,152,ETHICS 131 | train.csv,101,4,26,ETHICS 132 | train.csv,101,74,87,ETHICS 133 | train.csv,102,4,14,ETHICS 134 | train.csv,102,86,95,ETHICS 135 | train.csv,102,194,204,ETHICS 136 | train.csv,102,215,235,ETHICS 137 | train.csv,102,322,331,ETHICS 138 | train.csv,103,49,66,BRAIN 139 | train.csv,104,159,169,ETHICS 140 | train.csv,106,31,41,ETHICS 141 | train.csv,106,114,144,ETHICS 142 | train.csv,107,4,14,ETHICS 143 | train.csv,108,22,32,ETHICS 144 | train.csv,109,37,47,ETHICS 145 | train.csv,110,63,72,ETHICS 146 | train.csv,111,41,51,ETHICS 147 | train.csv,113,136,145,ETHICS 148 | train.csv,114,137,147,ETHICS 149 | train.csv,115,116,126,ETHICS 150 | train.csv,116,212,221,ETHICS 151 | train.csv,117,164,174,ETHICS 152 | 
train.csv,118,79,88,ETHICS 153 | train.csv,119,178,188,ETHICS 154 | train.csv,122,67,89,ETHICS 155 | train.csv,123,24,34,ETHICS 156 | train.csv,124,26,36,ETHICS 157 | train.csv,125,156,166,ETHICS 158 | train.csv,126,13,22,ETHICS 159 | train.csv,127,105,114,ETHICS 160 | train.csv,127,124,134,ETHICS 161 | train.csv,128,82,92,ETHICS 162 | train.csv,129,144,154,ETHICS 163 | train.csv,130,70,86,BRAIN 164 | train.csv,130,152,161,ETHICS 165 | train.csv,131,24,40,BRAIN 166 | train.csv,132,32,42,ETHICS 167 | train.csv,132,126,131,BRAIN 168 | train.csv,133,90,100,ETHICS 169 | train.csv,134,95,105,ETHICS 170 | train.csv,135,55,65,ETHICS 171 | train.csv,136,165,175,ETHICS 172 | train.csv,137,75,85,ETHICS 173 | train.csv,138,93,102,ETHICS 174 | train.csv,138,203,213,ETHICS 175 | train.csv,138,275,284,ETHICS 176 | train.csv,138,426,436,ETHICS 177 | train.csv,138,538,548,ETHICS 178 | train.csv,139,4,20,ETHICS 179 | train.csv,139,173,183,ETHICS 180 | train.csv,139,267,277,ETHICS 181 | train.csv,140,33,43,ETHICS 182 | train.csv,140,127,149,ETHICS 183 | train.csv,141,4,20,ETHICS 184 | train.csv,142,173,183,ETHICS 185 | train.csv,143,112,122,ETHICS 186 | train.csv,144,119,129,ETHICS 187 | train.csv,145,66,76,ETHICS 188 | train.csv,146,17,26,ETHICS 189 | train.csv,147,9,19,ETHICS 190 | train.csv,147,158,167,ETHICS 191 | train.csv,148,89,98,ETHICS 192 | train.csv,149,41,57,BRAIN 193 | train.csv,150,90,99,ETHICS 194 | train.csv,151,11,34,BRAIN 195 | train.csv,151,74,84,ETHICS 196 | train.csv,152,96,106,ETHICS 197 | train.csv,152,132,155,BRAIN 198 | train.csv,153,178,188,ETHICS 199 | train.csv,154,95,111,ETHICS 200 | train.csv,155,80,90,ETHICS 201 | train.csv,156,103,113,ETHICS 202 | train.csv,157,121,131,ETHICS 203 | train.csv,158,33,43,ETHICS 204 | train.csv,159,76,86,ETHICS 205 | train.csv,160,20,30,ETHICS 206 | train.csv,161,275,285,ETHICS 207 | train.csv,162,93,103,ETHICS 208 | train.csv,163,64,74,ETHICS 209 | train.csv,164,21,31,ETHICS 210 | train.csv,165,75,85,ETHICS 211 | train.csv,166,2,18,BRAIN 212 | train.csv,168,63,85,ETHICS 213 | train.csv,168,114,130,BRAIN 214 | train.csv,169,0,16,BRAIN 215 | train.csv,170,53,69,BRAIN 216 | train.csv,171,90,112,ETHICS 217 | train.csv,171,213,229,BRAIN 218 | train.csv,172,87,103,BRAIN 219 | train.csv,173,2,18,BRAIN 220 | train.csv,173,72,82,ETHICS 221 | train.csv,174,4,14,ETHICS 222 | train.csv,174,40,45,BRAIN 223 | train.csv,175,67,76,ETHICS 224 | train.csv,175,223,233,ETHICS 225 | train.csv,176,13,18,BRAIN 226 | train.csv,176,218,228,ETHICS 227 | train.csv,177,22,32,BRAIN 228 | train.csv,177,53,58,BRAIN 229 | train.csv,178,59,69,BRAIN 230 | train.csv,179,149,159,ETHICS 231 | train.csv,180,181,197,BRAIN 232 | train.csv,181,109,114,BRAIN 233 | train.csv,182,106,122,BRAIN 234 | train.csv,183,127,143,BRAIN 235 | train.csv,183,178,200,ETHICS 236 | train.csv,184,264,274,BRAIN 237 | train.csv,185,44,60,BRAIN 238 | train.csv,186,199,215,BRAIN 239 | train.csv,187,75,80,BRAIN 240 | train.csv,188,78,94,BRAIN 241 | train.csv,189,18,34,BRAIN 242 | train.csv,190,118,134,BRAIN 243 | train.csv,191,102,118,BRAIN 244 | train.csv,192,66,82,BRAIN 245 | train.csv,193,34,50,BRAIN 246 | train.csv,193,78,88,ETHICS 247 | train.csv,194,115,137,ETHICS 248 | train.csv,195,90,106,BRAIN 249 | train.csv,196,28,44,BRAIN 250 | train.csv,197,131,147,BRAIN 251 | train.csv,198,27,49,ETHICS 252 | train.csv,199,255,271,BRAIN 253 | train.csv,200,98,114,BRAIN 254 | train.csv,201,28,38,BRAIN 255 | train.csv,201,39,62,BRAIN 256 | train.csv,202,65,81,BRAIN 257 | train.csv,203,2,16,BRAIN 258 | 
train.csv,204,2,16,BRAIN 259 | train.csv,205,8,22,BRAIN 260 | train.csv,206,29,43,BRAIN 261 | train.csv,207,115,129,BRAIN 262 | train.csv,208,131,145,BRAIN 263 | train.csv,209,0,14,BRAIN 264 | train.csv,210,90,104,BRAIN 265 | train.csv,211,83,97,BRAIN 266 | train.csv,212,72,78,BRAIN 267 | train.csv,213,56,62,BRAIN 268 | train.csv,213,89,94,BRAIN 269 | train.csv,214,58,72,BRAIN 270 | train.csv,215,31,37,BRAIN 271 | train.csv,216,99,105,BRAIN 272 | train.csv,217,44,49,MOVEMENT 273 | train.csv,217,146,151,MOVEMENT 274 | train.csv,217,205,219,BRAIN 275 | train.csv,218,18,23,MOVEMENT 276 | train.csv,218,76,82,BRAIN 277 | train.csv,219,27,33,BRAIN 278 | train.csv,220,120,134,BRAIN 279 | train.csv,221,80,86,BRAIN 280 | train.csv,222,49,55,BRAIN 281 | train.csv,222,73,87,BRAIN 282 | train.csv,223,128,142,BRAIN 283 | train.csv,224,9,15,BRAIN 284 | train.csv,224,97,102,BRAIN 285 | train.csv,225,0,7,MOVEMENT 286 | train.csv,225,75,81,MOVEMENT 287 | train.csv,228,0,7,MOVEMENT 288 | train.csv,229,34,41,MOVEMENT 289 | train.csv,229,53,57,MOVEMENT 290 | train.csv,231,25,32,MOVEMENT 291 | train.csv,234,5,11,MOVEMENT 292 | train.csv,235,56,63,MOVEMENT 293 | train.csv,238,39,49,MOVEMENT 294 | train.csv,239,32,42,MOVEMENT 295 | train.csv,240,56,66,MOVEMENT 296 | train.csv,240,150,157,MOVEMENT 297 | train.csv,241,35,42,MOVEMENT 298 | train.csv,242,75,85,MOVEMENT 299 | train.csv,244,124,131,MOVEMENT 300 | train.csv,244,135,142,MOVEMENT 301 | train.csv,245,46,50,MOVEMENT 302 | train.csv,245,55,58,MOVEMENT 303 | train.csv,245,90,95,MOVEMENT 304 | train.csv,249,23,27,MOVEMENT 305 | train.csv,250,80,84,MOVEMENT 306 | train.csv,251,44,51,MOVEMENT 307 | train.csv,251,64,70,MOVEMENT 308 | train.csv,252,42,48,MOVEMENT 309 | train.csv,253,141,147,MOVEMENT 310 | train.csv,254,65,73,MOVEMENT 311 | train.csv,254,93,100,MOVEMENT 312 | train.csv,255,77,84,MOVEMENT 313 | train.csv,255,96,103,MOVEMENT 314 | train.csv,256,71,78,MOVEMENT 315 | train.csv,257,34,41,MOVEMENT 316 | train.csv,257,53,57,MOVEMENT 317 | train.csv,258,36,43,MOVEMENT 318 | train.csv,258,48,55,MOVEMENT 319 | train.csv,258,102,112,MOVEMENT 320 | train.csv,259,130,137,MOVEMENT 321 | train.csv,259,165,172,MOVEMENT 322 | train.csv,260,80,87,MOVEMENT 323 | train.csv,261,54,59,MOVEMENT 324 | train.csv,261,115,122,MOVEMENT 325 | train.csv,262,40,44,MOVEMENT 326 | train.csv,263,9,13,MOVEMENT 327 | train.csv,265,46,51,MOVEMENT 328 | train.csv,265,89,96,MOVEMENT 329 | train.csv,266,80,85,MOVEMENT 330 | train.csv,266,125,132,MOVEMENT 331 | train.csv,267,14,20,MOVEMENT 332 | train.csv,267,231,237,MOVEMENT 333 | train.csv,268,93,97,MOVEMENT 334 | train.csv,269,99,106,MOVEMENT 335 | train.csv,271,45,67,ETHICS 336 | train.csv,271,92,108,BRAIN 337 | train.csv,272,28,45,BRAIN 338 | train.csv,272,58,68,ETHICS 339 | train.csv,273,145,155,ETHICS 340 | train.csv,274,123,140,BRAIN 341 | train.csv,275,79,96,BRAIN 342 | train.csv,276,56,61,MOVEMENT 343 | train.csv,276,99,109,BRAIN 344 | train.csv,277,78,95,BRAIN 345 | train.csv,278,70,87,BRAIN 346 | train.csv,279,53,63,BRAIN 347 | train.csv,279,191,201,ETHICS 348 | train.csv,280,200,210,BRAIN 349 | train.csv,281,57,79,ETHICS 350 | train.csv,281,184,207,BRAIN 351 | train.csv,282,99,109,ETHICS 352 | train.csv,283,55,65,ETHICS 353 | train.csv,284,65,83,ETHICS 354 | train.csv,284,104,129,ETHICS 355 | train.csv,285,170,195,ETHICS 356 | train.csv,286,296,306,ETHICS 357 | train.csv,287,79,89,ETHICS 358 | train.csv,287,94,117,BRAIN 359 | train.csv,288,109,131,ETHICS 360 | train.csv,289,299,315,BRAIN 361 | train.csv,290,102,112,BRAIN 362 
| train.csv,291,4,20,BRAIN 363 | train.csv,291,153,157,MOVEMENT 364 | train.csv,292,0,16,BRAIN 365 | train.csv,293,87,97,ETHICS 366 | train.csv,294,21,37,BRAIN 367 | train.csv,294,157,167,ETHICS 368 | train.csv,295,80,90,ETHICS 369 | train.csv,295,189,211,ETHICS 370 | train.csv,296,102,118,BRAIN 371 | train.csv,297,135,151,BRAIN 372 | train.csv,298,99,109,BRAIN 373 | train.csv,300,73,82,ETHICS 374 | train.csv,300,119,137,ETHICS 375 | train.csv,301,4,14,BRAIN 376 | train.csv,302,48,64,BRAIN 377 | train.csv,303,7,17,BRAIN 378 | train.csv,304,28,44,BRAIN 379 | train.csv,305,100,110,BRAIN 380 | train.csv,306,33,50,BRAIN 381 | train.csv,307,192,209,BRAIN 382 | train.csv,308,81,97,BRAIN 383 | train.csv,308,165,174,ETHICS 384 | train.csv,309,102,118,BRAIN 385 | train.csv,310,69,85,BRAIN 386 | train.csv,311,22,31,ETHICS 387 | train.csv,312,109,118,ETHICS 388 | train.csv,313,59,69,ETHICS 389 | train.csv,313,91,114,ETHICS 390 | train.csv,313,137,167,ETHICS 391 | train.csv,314,130,139,ETHICS 392 | train.csv,315,59,64,BRAIN 393 | train.csv,315,90,100,ETHICS 394 | train.csv,316,40,49,ETHICS 395 | train.csv,316,74,84,ETHICS 396 | train.csv,316,206,222,BRAIN 397 | train.csv,317,131,153,ETHICS 398 | train.csv,318,114,131,BRAIN 399 | train.csv,319,29,46,BRAIN 400 | train.csv,320,44,52,MOVEMENT 401 | train.csv,322,161,166,MOVEMENT 402 | train.csv,322,176,180,MOVEMENT 403 | train.csv,323,41,45,MOVEMENT 404 | train.csv,325,35,39,MOVEMENT 405 | train.csv,326,203,225,ETHICS 406 | train.csv,327,38,42,MOVEMENT 407 | train.csv,328,69,92,BRAIN 408 | train.csv,329,16,39,BRAIN 409 | train.csv,330,40,44,MOVEMENT 410 | train.csv,331,39,43,MOVEMENT 411 | train.csv,332,0,7,MOVEMENT 412 | train.csv,332,117,127,MOVEMENT 413 | train.csv,333,34,38,MOVEMENT 414 | train.csv,336,42,46,MOVEMENT 415 | train.csv,337,35,39,MOVEMENT 416 | train.csv,338,103,107,MOVEMENT 417 | train.csv,340,65,69,MOVEMENT 418 | train.csv,341,65,70,MOVEMENT 419 | train.csv,341,80,84,MOVEMENT 420 | train.csv,344,124,130,MOVEMENT 421 | train.csv,345,7,17,MOVEMENT 422 | train.csv,345,155,160,MOVEMENT 423 | train.csv,346,8,18,MOVEMENT 424 | train.csv,346,65,70,MOVEMENT 425 | train.csv,348,13,17,MOVEMENT 426 | train.csv,348,162,167,BRAIN 427 | train.csv,349,26,30,MOVEMENT 428 | train.csv,350,0,4,MOVEMENT 429 | train.csv,350,61,65,MOVEMENT 430 | train.csv,351,59,63,MOVEMENT 431 | train.csv,352,9,13,MOVEMENT 432 | train.csv,352,201,205,MOVEMENT 433 | train.csv,353,2,6,MOVEMENT 434 | train.csv,354,4,8,MOVEMENT 435 | train.csv,354,42,47,MOVEMENT 436 | train.csv,354,52,58,MOVEMENT 437 | train.csv,355,4,8,MOVEMENT 438 | train.csv,356,4,8,MOVEMENT 439 | train.csv,356,79,89,MOVEMENT 440 | train.csv,357,25,30,MOVEMENT 441 | train.csv,357,35,41,MOVEMENT 442 | train.csv,357,149,153,MOVEMENT 443 | train.csv,358,4,10,MOVEMENT 444 | train.csv,358,45,49,MOVEMENT 445 | train.csv,358,108,115,MOVEMENT 446 | train.csv,359,4,9,MOVEMENT 447 | train.csv,359,105,109,MOVEMENT 448 | train.csv,360,14,18,MOVEMENT 449 | train.csv,361,76,80,MOVEMENT 450 | train.csv,362,142,146,MOVEMENT 451 | train.csv,363,37,41,MOVEMENT 452 | train.csv,363,78,85,MOVEMENT 453 | train.csv,364,16,24,MOVEMENT 454 | train.csv,364,136,142,MOVEMENT 455 | train.csv,365,4,12,MOVEMENT 456 | train.csv,365,101,111,MOVEMENT 457 | train.csv,366,0,8,MOVEMENT 458 | train.csv,366,56,61,MOVEMENT 459 | train.csv,367,21,31,MOVEMENT 460 | train.csv,368,134,156,ETHICS 461 | train.csv,369,62,72,ETHICS 462 | train.csv,370,55,68,ETHICS 463 | train.csv,371,142,152,ETHICS 464 | train.csv,372,4,26,ETHICS 465 | 
train.csv,372,74,87,ETHICS 466 | train.csv,373,4,14,ETHICS 467 | train.csv,373,86,95,ETHICS 468 | train.csv,373,194,204,ETHICS 469 | train.csv,373,215,235,ETHICS 470 | train.csv,373,322,331,ETHICS 471 | train.csv,374,49,66,BRAIN 472 | train.csv,375,159,169,ETHICS 473 | train.csv,377,31,41,ETHICS 474 | train.csv,377,114,144,ETHICS 475 | train.csv,378,4,14,ETHICS 476 | train.csv,379,22,32,ETHICS 477 | train.csv,380,37,47,ETHICS 478 | train.csv,381,63,72,ETHICS 479 | train.csv,382,41,51,ETHICS 480 | train.csv,384,136,145,ETHICS 481 | train.csv,385,137,147,ETHICS 482 | train.csv,386,116,126,ETHICS 483 | train.csv,387,212,221,ETHICS 484 | train.csv,388,164,174,ETHICS 485 | train.csv,389,79,88,ETHICS 486 | train.csv,390,178,188,ETHICS 487 | train.csv,393,67,89,ETHICS 488 | train.csv,394,24,34,ETHICS 489 | train.csv,395,26,36,ETHICS 490 | train.csv,396,156,166,ETHICS 491 | train.csv,397,13,22,ETHICS 492 | train.csv,398,105,114,ETHICS 493 | train.csv,398,124,134,ETHICS 494 | train.csv,399,82,92,ETHICS 495 | train.csv,400,144,154,ETHICS 496 | train.csv,401,70,86,BRAIN 497 | train.csv,401,152,161,ETHICS 498 | train.csv,402,24,40,BRAIN 499 | train.csv,403,32,42,ETHICS 500 | train.csv,403,126,131,BRAIN 501 | train.csv,404,90,100,ETHICS 502 | -------------------------------------------------------------------------------- /notebooks/2-Train-Detect-Entities/AIM317-reInvent2021-train-and-detect-entities.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Amazon Comprehend Custom Entity Recognizer\n", 8 | "\n", 9 | "This notebook will serve as a template for the overall process of taking a text dataset and integrating it into [Amazon Comprehend Custom Entity Recognizer](https://docs.aws.amazon.com/comprehend/latest/dg/custom-entity-recognition.html) and perform natural language processing (NLP) to detect custom entities in your text.\n", 10 | "\n", 11 | "## Overview\n", 12 | "\n", 13 | "1. [Introduction to Amazon Comprehend Custom NER](#Introduction)\n", 14 | "1. [Obtaining Your Data](#data)\n", 15 | "1. [Pre-processing data](#preprocess)\n", 16 | "1. [Training a custom recognizer](#train)\n", 17 | "1. [Real time inference](#inference)\n", 18 | "1. [Cleanup](#cleanup)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Introduction to Amazon Comprehend Custom Entity Recognition \n", 26 | "\n", 27 | "Amazon Comprehend recognizes and detects nine entity types out of the box from your data, such as person, date, place etc. Custom entity recognition extends the capability of Amazon Comprehend by helping you identify your specific new entity types that are not of from the preset generic entity types. In this case, this notebook trains Amazon Comprehend to detect three additional entity types - Robot Ethics, Positronic Brain and Kinematics.\n", 28 | "\n", 29 | "Building a custom entity recognizer helps to identify key words and phrases that are relevant to your business needs, and Amazon Comprehend helps you in reducing the complexity by providing automatic annotation and model training to create a custom entity model. 
For more information, see [Comprehend Custom Entity Recognition](https://docs.aws.amazon.com/comprehend/latest/dg/custom-entity-recognition.html)" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Obtaining Your Data \n", 37 | "\n", 38 | "To train a custom entity recognizer, Amazon Comprehend needs training data in one of two formats -\n", 39 | "1. **Entity Lists (plain text only)**\n", 40 | "You specify a list of documents that contain your entities, and in addition, specify a list of specific entities to search for in the documents. This is preferred when you have a finite list of entities to work with (for example, the EasyTron model names).\n", 41 | "2. **Annotations**\n", 42 | "This is more comprehensive, and provides the location of your entities in a large number of documents using the entity locations (offsets). Through this, Comprehnd can train on both the entity and its context. \n", 43 | "\n", 44 | "For our use case, to generate custom annotations, we make use of [Amazon SageMaker Ground Truth](https://aws.amazon.com/sagemaker/groundtruth/). We use Ground Truth with a private workforce to annotate the entities in hundreds of documents, and generate annotation files using the results. To learn more about how to use Ground Truth to annotate data, see [Named Entity Recognition](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-named-entity-recg.html).\n", 45 | "\n", 46 | "For the lab, we have already labeled the data and the annotation files are provided. " 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "## Pre-processing data " 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# Firstly, we import necessary libraries and initialize clients\n", 63 | "import re\n", 64 | "import time\n", 65 | "import json\n", 66 | "import uuid\n", 67 | "import boto3\n", 68 | "import random\n", 69 | "import secrets\n", 70 | "import datetime\n", 71 | "import sagemaker\n", 72 | "import pandas as pd\n", 73 | "from sagemaker import get_execution_role\n", 74 | "\n", 75 | "\n", 76 | "s3 = boto3.client('s3')\n", 77 | "comprehend = boto3.client('comprehend')\n", 78 | "\n", 79 | "# provide the name of your S3 bucket here. 
This was already created in your account for this workshop\n", 80 | "bucket = '' \n", 81 | "\n", 82 | "region = boto3.session.Session().region_name\n", 83 | "\n", 84 | "# Amazon S3 (S3) client\n", 85 | "s3 = boto3.client('s3', region)\n", 86 | "s3_resource = boto3.resource('s3')\n", 87 | "try:\n", 88 | " s3.head_bucket(Bucket=bucket)\n", 89 | "except:\n", 90 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "# This is the execution role that will be used to call Amazon Transcribe and Amazon Translate\n", 100 | "role = get_execution_role()\n", 101 | "display(role)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "#### We already provided you a training dataset and an annotations file in the repository, let's have a look at them now" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "pd.read_csv('train.csv',header=None).head(10)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "pd.read_csv('annotations.csv').head(10)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "# let's upload our train and annotation files to S3\n", 136 | "s3.upload_file('train.csv', bucket, 'comprehend/train/train.csv')\n", 137 | "s3.upload_file('annotations.csv', bucket, 'comprehend/train/annotations.csv')\n", 138 | "s3_train_channel = \"s3://\" + bucket + \"/comprehend/train/train.csv\"\n", 139 | "s3_annot_channel = \"s3://\" + bucket + \"/comprehend/train/annotations.csv\"" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Create Comprehend Custom Entity Recognizer" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "custom_entity_request = {\n", 156 | " \"DataFormat\": \"COMPREHEND_CSV\",\n", 157 | " \"Documents\": { \n", 158 | " \"S3Uri\": s3_train_channel,\n", 159 | " \"InputFormat\": \"ONE_DOC_PER_LINE\"\n", 160 | " },\n", 161 | " \"Annotations\": { \n", 162 | " \"S3Uri\": s3_annot_channel\n", 163 | " },\n", 164 | " \"EntityTypes\": [\n", 165 | " {\n", 166 | " \"Type\": \"MOVEMENT\"\n", 167 | " },\n", 168 | " {\n", 169 | " \"Type\": \"BRAIN\"\n", 170 | " },\n", 171 | " {\n", 172 | " \"Type\": \"ETHICS\"\n", 173 | " }\n", 174 | " ]\n", 175 | "}" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "# create unique ID for recognizer\n", 185 | "uid = str(uuid.uuid4())\n", 186 | "\n", 187 | "response = comprehend.create_entity_recognizer(\n", 188 | " RecognizerName=f\"aim317-ner-{uid}\", \n", 189 | " DataAccessRoleArn=role,\n", 190 | " InputDataConfig=custom_entity_request,\n", 191 | " LanguageCode=\"en\",\n", 192 | " VersionName= 'v001'\n", 193 | ")\n", 194 | "\n", 195 | "print(response['EntityRecognizerArn'])" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### Check training status in Amazon Comprehend console\n", 203 | "\n", 204 | "[Go to Amazon Comprehend 
Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#entity-recognition)\n", 205 | "\n", 206 | "This will take approximately 20 minutes. **Execute the Entity Recongizer Metrics step below only after** the entity recognizer model has been created and is ready for use. Otherwise you will get an error message. If this is the case no worries, just try it again after the entity recognizer has finished training." 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "describe_response = comprehend.describe_entity_recognizer(\n", 216 | " EntityRecognizerArn=response['EntityRecognizerArn']\n", 217 | ")\n", 218 | "\n", 219 | "print(describe_response['EntityRecognizerProperties']['Status'])" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "### Entity Recognizer Metrics" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "# Print recognizer metrics\n", 236 | "print(\"Entity recognizer metrics:\")\n", 237 | "for ent in describe_response[\"EntityRecognizerProperties\"][\"RecognizerMetadata\"][\"EntityTypes\"]:\n", 238 | " print(ent['Type'])\n", 239 | " metrics = ent['EvaluationMetrics']\n", 240 | " for k, v in metrics.items():\n", 241 | " metrics[k] = round(v, 2)\n", 242 | " print(metrics)" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "describe_response['EntityRecognizerProperties']['EntityRecognizerArn']" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "## Create endpoint" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "Now that the model is trained, we'll deploy the model to an Amazon Comprehend endpoint for synchronous, real-time inference. " 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "# NOTE - We are using real-time endpoints and chunked text for demo purposes in this workshop. 
For your actual use case\n", 275 | " # if you don't need real-time insights from Comprehend, we suggest using Comprehend start_entities_detection_job or batch_detect_entities to send the full corpus for entity detection\n", 276 | " # If your need is real-time inference, please use the Comprehend real-time endpoint as we show in this notebook.\n", 277 | " # We have used 4 Inference Units (IU) in this workshop, each IU has a throughput of 100 characters per second.\n", 278 | "endpoint_response = comprehend.create_endpoint(\n", 279 | " EndpointName=f\"aim317-ner-endpoint\",\n", 280 | " ModelArn=describe_response['EntityRecognizerProperties']['EntityRecognizerArn'],\n", 281 | " DesiredInferenceUnits=4, # you are charged based on Inference Units, for this workshop lets create 4 IUs\n", 282 | " DataAccessRoleArn=role\n", 283 | ")" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "print(endpoint_response['EndpointArn'])" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "### Check endpoint status in Amazon Comprehend console\n", 300 | "\n", 301 | "[Go to Amazon Comprehend Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#endpoints)\n", 302 | "\n", 303 | "This will take approximately 10 minutes. Go to the **Run Inference** step below after the endpoint has been created and is ready for use. Running the cells prior to the endpoint being ready will result in error. You can re-execute the cell after the endpoint becomes available." 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "## Run inference" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "# Input files ready for entity recognition\n", 320 | "!aws s3 ls s3://{bucket}/comprehend/input/" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "# Prepare to page through our transcripts in S3\n", 330 | "\n", 331 | "# Define the S3 handles\n", 332 | "s3 = boto3.client('s3')\n", 333 | "s3_resource = boto3.resource('s3')\n", 334 | "\n", 335 | "\n", 336 | "# Specify an S3 output prefix\n", 337 | "t_prefix = 'quicksight/data/entity'\n", 338 | "\n", 339 | "\n", 340 | "# Lets define the bucket name that contains the transcripts first\n", 341 | "# So far we used a session bucket we created for training and testing the classifier\n", 342 | "paginator = s3.get_paginator('list_objects_v2')\n", 343 | "pages = paginator.paginate(Bucket=bucket, Prefix='comprehend/input')\n", 344 | "job_name_list = []\n", 345 | "\n", 346 | "# We will use a temp DataFrame to extract the entity type that is most prominent in the transcript\n", 347 | "tempcols = ['Type', 'Score']\n", 348 | "df_temp = pd.DataFrame(columns=tempcols)\n", 349 | "\n", 350 | "\n", 351 | "# We will define a DataFrame to store the results of the classifier\n", 352 | "cols = ['transcript_name', 'entity_type']\n", 353 | "df_ent = pd.DataFrame(columns=cols)\n", 354 | "\n", 355 | "# Now lets page through the transcripts\n", 356 | "for page in pages:\n", 357 | " for obj in page['Contents']:\n", 358 | " entity = ''\n", 359 | " # get the transcript file name\n", 360 | " transcript_file_name = obj['Key'].split('/')[2]\n", 361 | " # now lets get the transcript file contents\n", 362 | " temp = 
s3_resource.Object(bucket, obj['Key'])\n", 363 | " transcript_content = temp.get()['Body'].read().decode('utf-8')\n", 364 | " # Send a chunk of the transcript for entity recognition\n", 365 | " # NOTE - We are using real-time endpoints and chunked text for demo purposes in this workshop. For your actual use case\n", 366 | " # if you don't need real-time insights from Comprehend, we suggest using Comprehend start_entities_detection_job or batch_detect_entities to send the full corpus for entity detection\n", 367 | " # If your need is real-time inference, please use the Comprehend real-time endpoint as we show in this notebook.\n", 368 | " # We have used 4 Inference Units (IU) in this workshop, each IU has a throughput of 100 characters per second.\n", 369 | " transcript_truncated = transcript_content[400:1800]\n", 370 | " # Call Comprehend to get the entity types the transcript belongs to\n", 371 | " response = comprehend.detect_entities(Text=transcript_truncated, LanguageCode='en', EndpointArn=endpoint_response['EndpointArn'])\n", 372 | " # Extract prominent entity\n", 373 | " df_temp = pd.DataFrame(columns=tempcols)\n", 374 | " for ent in response['Entities']:\n", 375 | " df_temp.loc[len(df_temp.index)] = [ent['Type'],ent['Score']]\n", 376 | " if len(df_temp) > 0:\n", 377 | " entity = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Type']\n", 378 | " else:\n", 379 | " entity = 'No entities'\n", 380 | " \n", 381 | " # Update the results DataFrame with the detected entities\n", 382 | " df_ent.loc[len(df_ent.index)] = [transcript_file_name.strip('en-').strip('.txt'),entity] \n", 383 | "\n", 384 | " # Create a CSV file with cta label from this DataFrame\n", 385 | "df_ent.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'entities.csv', index=False)\n", 386 | "df_ent" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "### We are done here. You can return to the workshop instructions for next steps" 394 | ] 395 | } 396 | ], 397 | "metadata": { 398 | "instance_type": "ml.t3.medium", 399 | "interpreter": { 400 | "hash": "9ddb102edfbd95000dbbd260d8bbcf82701cc06b4dcf114fa04ba84aab75adcb" 401 | }, 402 | "kernelspec": { 403 | "display_name": "conda_python3", 404 | "language": "python", 405 | "name": "conda_python3" 406 | }, 407 | "language_info": { 408 | "codemirror_mode": { 409 | "name": "ipython", 410 | "version": 3 411 | }, 412 | "file_extension": ".py", 413 | "mimetype": "text/x-python", 414 | "name": "python", 415 | "nbconvert_exporter": "python", 416 | "pygments_lexer": "ipython3", 417 | "version": "3.6.13" 418 | } 419 | }, 420 | "nbformat": 4, 421 | "nbformat_minor": 4 422 | } 423 | -------------------------------------------------------------------------------- /notebooks/3-Train-Classify-Calls/AIM317-reInvent2021-train-and-classify-customer-calls.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Amazon Comprehend Custom Classification\n", 8 | "\n", 9 | "This notebook will serve as a template for the overall process of taking a text dataset and integrating it into [Amazon Comprehend Custom Classification](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html) and perform NLP for custom classification.\n", 10 | "\n", 11 | "## Overview\n", 12 | "\n", 13 | "1. [Introduction to Amazon Comprehend Custom Classification](#Introduction)\n", 14 | "1. [Obtaining Your Data](#data)\n", 15 | "1. 
[Pre-processing data](#preprocess)\n", 16 | "1. [Building Custom Classification model](#build)\n", 17 | "1. [Real time inference](#inference)\n", 18 | "1. [Cleanup](#cleanup)\n", 19 | "\n", 20 | "\n", 21 | "## Introduction to Amazon Comprehend Custom Classification \n", 22 | "\n", 23 | "If you are not familiar with Amazon Comprehend Custom Classification you can learn more about this tool on these pages:\n", 24 | "\n", 25 | "* [Product Page](https://aws.amazon.com/comprehend/)\n", 26 | "* [Product Docs](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html)\n", 27 | "\n", 28 | "## Training a custom classifier\n", 29 | "\n", 30 | "Custom classification is a two-step process. First, you train a custom classifier to recognize the classes that are of interest to you. Then you send unlabeled documents to be classified.\n", 31 | "\n", 32 | "To train the classifier, specify the options you want, and send Amazon Comprehend documents to be used as training material. Based on the options you indicated, Amazon Comprehend creates a custom ML model that it trains based on the documents you provided. This custom model (the classifier) examines each document you submit. It then returns either the specific class that best represents the content (if you're using multi-class mode) or the set of classes that apply to it (if you're using multi-label mode).\n", 33 | "\n", 34 | "We are going to use a Hugging Face pre-canned dataset of customer reviews and use the multi-class mode. We ensure that dataset is a .csv and the format of the file must be one class and document per line. For example:\n", 35 | "```\n", 36 | "CLASS,Text of document 1\n", 37 | "CLASS,Text of document 2\n", 38 | "CLASS,Text of document 3\n", 39 | "```\n" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "# Install Hugging Face datasets package\n", 49 | "!pip --disable-pip-version-check install datasets --quiet" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "With the datasets installed, now we will import the Pandas library as well as a few other data science tools in order to inspect the information." 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "import os\n", 66 | "import json\n", 67 | "import time\n", 68 | "import uuid\n", 69 | "import boto3\n", 70 | "import pprint\n", 71 | "import string\n", 72 | "import random\n", 73 | "import datetime \n", 74 | "import subprocess\n", 75 | "import numpy as np\n", 76 | "import pandas as pd\n", 77 | "from time import sleep" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "Lets load the data in to dataframe and look at the data we uploaded. Examine the number of columns that are present. Look at few samples to see the content of the data. **This will take 5 to 8 minutes**.\n", 85 | "\n", 86 | "**Note:** CTA means call to action. No CTA means no call to action. This is a metric to determine if the customer's concern was addressed by the agent during the call. A CTA indicates that the customer is satisfied that their concerns has been or will be addressed by the company." 
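As a tiny illustration of the target file shape before we build it from the review dataset, the sketch below writes two synthetic rows the way Comprehend expects them: one label and one document per line, no header, UTF-8. The example rows are invented for illustration only; the real labels are derived from the Hugging Face review ratings later in this notebook.

```python
import pandas as pd

# Synthetic examples for illustration only
toy = pd.DataFrame(
    [
        ('CTA', 'The agent confirmed my EasyTron replacement will ship tomorrow.'),
        ('No CTA', 'I was left on hold and my question was never answered.'),
    ],
    columns=['class', 'text'],
)

# One class and one document per line, no header
toy.to_csv('toy-comprehend-train.csv', header=False, index=False, encoding='utf-8')
```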
87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "from datasets import load_dataset\n", 96 | "dataset = load_dataset('amazon_us_reviews', 'Electronics_v1_00', split='train[:10%]')" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "dataset.set_format(type='pandas')\n", 106 | "df = dataset[:1000]" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "To convert data to the format that is required by Amazon Comprehend Custom Classifier,\n", 114 | "\n", 115 | "```\n", 116 | "CLASS,Text of document 1\n", 117 | "CLASS,Text of document 2\n", 118 | "CLASS,Text of document 3\n", 119 | "```\n", 120 | "We will identify the column which are class and which have the text content we would like to train on, we can create a new dataframe with selected columns." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "df1 = df[['star_rating','review_body']]\n", 130 | "df1 = df1.rename(columns={\"review_body\": \"text\", \"star_rating\": \"class\"})" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "We will translate the customer product ratings to CTA (call-to-action) and No CTA (no call-to-action). All ratings from 3 and above are considerd as CTA (customer is satisfied) with 1 and 2 considered as No CTA (customer is not satisfied)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "df1.loc[df1['class'] >= 3, 'class'] = 'CTA'\n", 147 | "df1.loc[df1['class'] != 'CTA', 'class'] = 'No CTA'" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "Remove all punctuation from the text" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "import string\n", 164 | "for i,row in df1.iterrows():\n", 165 | " a = row['text'].strip(string.punctuation)\n", 166 | " df1.loc[i,'text'] = a" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "df1.head()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "df1['class'].value_counts()" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "## Pre-processing data \n" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "For training, the file format must conform with the [following](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification-training.html):\n", 199 | "\n", 200 | "- File must contain one label and one text per line – 2 columns\n", 201 | "- No header\n", 202 | "- Format UTF-8, carriage return “\\n”.\n", 203 | "\n", 204 | "Labels “must be uppercase, can be multitoken, have whitespace, consist of multiple words connect by underscores or hyphens or may even contain a comma in it, as long as it is correctly escaped.”\n", 205 | "\n", 206 | "For the inference part of it - when you want your custom model to determine which label corresponds to a 
given text - the file format must conform with the following:\n", 207 | "\n", 208 | "- File must contain one text per line\n", 209 | "- No header\n", 210 | "- Format UTF-8, carriage return “\\n”." 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "At this point we have all the data needed to build the training file. \n", 218 | "\n", 219 | "### Building The Target Train and Test Files\n", 220 | "\n", 221 | "With all of the above spelled out, the next thing to do is to build the training file:\n", 222 | "\n", 223 | "1. `comprehend-train.csv` - A CSV file containing 2 columns without a header: first column class, second column text." 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "DSTTRAINFILE='comprehend-train.csv'\n", 233 | "\n", 234 | "df1.to_csv(path_or_buf=DSTTRAINFILE,\n", 235 | " header=False,\n", 236 | " index=False)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "## Train an Amazon Comprehend custom classifier\n", 244 | "Now that all of the required data exists, we can start working on the Comprehend Custom Classifier. \n", 245 | "\n", 246 | "The custom classifier workload is built in two steps:\n", 247 | "\n", 248 | "1. Training the custom model – no particular machine learning or deep learning knowledge is necessary\n", 249 | "1. Classifying new data\n", 250 | "\n", 251 | "Let's follow the steps below to train the custom model:\n", 252 | "\n", 253 | "1. Specify the pre-created bucket name that will host training data artifacts and production results. \n", 254 | "1. Configure an IAM role allowing Comprehend to [access newly created buckets](https://docs.aws.amazon.com/comprehend/latest/dg/access-control-managing-permissions.html#auth-role-permissions)\n", 255 | "1. Prepare data for training\n", 256 | "1. Upload training data in the S3 bucket\n", 257 | "1. Launch a “Train Classifier” job from the console: “Amazon Comprehend” > “Custom Classification” > “Train Classifier”\n", 258 | "1. Prepare data for classification (one text per line, no header, same format as training data). Some more details [here](https://docs.aws.amazon.com/comprehend/latest/dg/how-class-run.html)\n" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [ 267 | "# Get notebook's region\n", 268 | "region = boto3.Session().region_name\n", 269 | "print(region)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "Configure your AWS APIs" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "import sagemaker\n", 286 | "\n", 287 | "s3 = boto3.client('s3')\n", 288 | "comprehend = boto3.client('comprehend')\n", 289 | "role = sagemaker.get_execution_role()" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "Specify an Amazon S3 bucket that will host training data and test data. **Note:** This bucket should have been created already for you. Please go to the Amazon S3 console to verify the bucket is present. It should start with `aim317...`. **Specify your bucket name in the cell below**."
297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "bucket = '' # Provide your bucket name here\n", 306 | "prefix = 'comprehend-custom-classifier' # you can leave this as it is\n", 307 | "\n", 308 | "try:\n", 309 | " s3.head_bucket(Bucket=bucket)\n", 310 | "except:\n", 311 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "### Uploading the data" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "s3.upload_file(DSTTRAINFILE, bucket, prefix+'/' + DSTTRAINFILE)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "## Building Custom Classification model \n", 335 | "\n", 336 | "Launch the classifier training:" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": null, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "s3_train_data = 's3://{}/{}/{}'.format(bucket, prefix, DSTTRAINFILE)\n", 346 | "s3_output_job = 's3://{}/{}/{}'.format(bucket, prefix, 'output/train_job')\n", 347 | "print('training data location: ',s3_train_data, \"output location:\", s3_output_job)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": {}, 354 | "outputs": [], 355 | "source": [ 356 | "uid = uuid.uuid4()\n", 357 | "\n", 358 | "training_job = comprehend.create_document_classifier(\n", 359 | " DocumentClassifierName='aim317-cc-' + str(uid),\n", 360 | " DataAccessRoleArn=role,\n", 361 | " InputDataConfig={\n", 362 | " 'S3Uri': s3_train_data\n", 363 | " },\n", 364 | " OutputDataConfig={\n", 365 | " 'S3Uri': s3_output_job\n", 366 | " },\n", 367 | " LanguageCode='en',\n", 368 | " VersionName='v001'\n", 369 | ")" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "### Check training status in Amazon Comprehend console\n", 377 | "\n", 378 | "[Go to Amazon Comprehend Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#classification)\n", 379 | "\n", 380 | "This will take approximately 30 minutes. Go to the **Classifier Metrics** step below after the classifier has been created and is ready for use. Running the cells prior to classifier being ready, will throw an error. Simply re-execute the cell again after the classifier is ready." 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "### Classifier Metrics" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "response = comprehend.describe_document_classifier(\n", 397 | " DocumentClassifierArn=training_job['DocumentClassifierArn']\n", 398 | ")\n", 399 | "print(response['DocumentClassifierProperties']['ClassifierMetadata']['EvaluationMetrics'])" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "## Real time inference \n", 407 | "We will now use a custom classifier real time endpoint to detect if the audio transcripts and translated text contain indication of there is a clear CTA or not. 
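Before creating a real time endpoint, the classifier itself must have reached the `TRAINED` state. As noted above, running the metrics or endpoint cells too early throws an error; the sketch below is not part of the original notebook, but shows how the training status could be polled programmatically instead of watching the console, reusing the `comprehend` client and the `training_job` response from the cells above:

```python
# Optional sketch: poll the custom classifier until training finishes,
# so the metrics and endpoint cells are only run once the model is TRAINED.
import time

def wait_for_classifier(classifier_arn, poll_seconds=60):
    """Block until the Comprehend custom classifier leaves its training states."""
    while True:
        props = comprehend.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )['DocumentClassifierProperties']
        print('Classifier status:', props['Status'])
        if props['Status'] in ('TRAINED', 'IN_ERROR'):
            return props['Status']
        time.sleep(poll_seconds)

# Example usage:
# wait_for_classifier(training_job['DocumentClassifierArn'])
```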
" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "### Create endpoint" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [ 423 | "model_arn = response[\"DocumentClassifierProperties\"][\"DocumentClassifierArn\"]\n", 424 | "print('Model used for real time endpoint ' + model_arn)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": null, 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [ 433 | "# Let's create an endpoint with 4 Inference Units to account for us sending approximately 400 characters per second to the endpoint\n", 434 | "\n", 435 | "create_endpoint_response = comprehend.create_endpoint(\n", 436 | " EndpointName='aim317-cc-ep',\n", 437 | " ModelArn=model_arn,\n", 438 | " DesiredInferenceUnits=4,\n", 439 | " \n", 440 | ")\n", 441 | "\n", 442 | "print(create_endpoint_response['EndpointArn'])" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": {}, 448 | "source": [ 449 | "### Check endpoint status in Amazon Comprehend console\n", 450 | "\n", 451 | "[Go to Amazon Comprehend Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#endpoints)\n", 452 | "\n", 453 | "This will take approximately 10 minutes. Go to the **Run Inference** step below after the classifier has been created and is ready for use. Running the cells prior to classifier being ready, will lock the cell. This will presume only after classifier has been trained." 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "### Run Inference\n", 461 | "\n", 462 | "Lets review the list of files ready for inference in the `comprehend/input` folder of our S3 bucket. 
These files were created by the notebook available in `1-Transcribe-Translate-Calls`" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "# Input files ready for classification\n", 472 | "!aws s3 ls s3://{bucket}/comprehend/input/" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "metadata": {}, 479 | "outputs": [], 480 | "source": [ 481 | "# Prepare to page through our transcripts in S3\n", 482 | "\n", 483 | "# Define the S3 handles\n", 484 | "s3 = boto3.client('s3')\n", 485 | "s3_resource = boto3.resource('s3')\n", 486 | "\n", 487 | "\n", 488 | "# We will be merging the classifier predictions with the transcript segments we created for quicksight in 1-Transcribe-Translate\n", 489 | "t_prefix = 'quicksight/data/cta'\n", 490 | "\n", 491 | "\n", 492 | "# Lets define the bucket name that contains the transcripts first\n", 493 | "# So far we used a session bucket we created for training and testing the classifier\n", 494 | "\n", 495 | "paginator = s3.get_paginator('list_objects_v2')\n", 496 | "pages = paginator.paginate(Bucket=bucket, Prefix='comprehend/input')\n", 497 | "a = []\n", 498 | "\n", 499 | "\n", 500 | "# We will define a DataFrame to store the results of the classifier\n", 501 | "cols = ['transcript_name', 'cta_status']\n", 502 | "df_class = pd.DataFrame(columns=cols)\n", 503 | "\n", 504 | "# Now lets page through the transcripts\n", 505 | "for page in pages:\n", 506 | " for obj in page['Contents']:\n", 507 | " cta = ''\n", 508 | " # get the transcript file name\n", 509 | " transcript_file_name = obj['Key'].split('/')[2]\n", 510 | " # now lets get the transcript file contents\n", 511 | " temp = s3_resource.Object(bucket, obj['Key'])\n", 512 | " transcript_content = temp.get()['Body'].read().decode('utf-8')\n", 513 | " # Send the last few sentence(s) for classification\n", 514 | " transcript_truncated = transcript_content[1500:1900]\n", 515 | " # Call Comprehend to classify input text\n", 516 | " response = comprehend.classify_document(Text=transcript_truncated, EndpointArn=create_endpoint_response['EndpointArn'])\n", 517 | " # Now we need to determine which of the two classes has the higher confidence score\n", 518 | " # Use the name for that score as our predicted label\n", 519 | " a = response['Classes']\n", 520 | " # We will use this temp DataFrame to extract the class with maximum confidence level for CTA\n", 521 | " tempcols = ['Name', 'Score']\n", 522 | " df_temp = pd.DataFrame(columns=tempcols)\n", 523 | " for i in range(0, 2):\n", 524 | " df_temp.loc[len(df_temp.index)] = [a[i]['Name'], a[i]['Score']]\n", 525 | " cta = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Name']\n", 526 | " \n", 527 | " # Update the results DataFrame with the cta predicted label\n", 528 | " # Create a CSV file with cta label from this DataFrame\n", 529 | " df_class.loc[len(df_class.index)] = [transcript_file_name.strip('en-').strip('.txt'), cta] \n", 530 | "\n", 531 | "df_class.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'cta_status.csv', index=False)\n", 532 | "df_class" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "### End of notebook\n", 540 | "Please go back to the workshop instructions to continue to the next step" 541 | ] 542 | } 543 | ], 544 | "metadata": { 545 | "instance_type": "ml.t3.medium", 546 | "interpreter": { 547 | "hash": "9ddb102edfbd95000dbbd260d8bbcf82701cc06b4dcf114fa04ba84aab75adcb" 
548 | }, 549 | "kernelspec": { 550 | "display_name": "conda_python3", 551 | "language": "python", 552 | "name": "conda_python3" 553 | }, 554 | "language_info": { 555 | "codemirror_mode": { 556 | "name": "ipython", 557 | "version": 3 558 | }, 559 | "file_extension": ".py", 560 | "mimetype": "text/x-python", 561 | "name": "python", 562 | "nbconvert_exporter": "python", 563 | "pygments_lexer": "ipython3", 564 | "version": "3.6.13" 565 | } 566 | }, 567 | "nbformat": 4, 568 | "nbformat_minor": 4 569 | } 570 | -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/AIM317-reInvent2021-transcribe-and-translate-customer-calls.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "04837ae6", 6 | "metadata": {}, 7 | "source": [ 8 | "## Boost transcription accuracy with Amazon Translate custom vocabulary and localize transcripts with Amazon Translate custom terminology\n", 9 | "\n", 10 | "This is the accompanying notebook for the re:Invent 2021 workshop AIM317 - Uncover insights from your customer conversations. Please run this notebook after reviewing the **[prerequisites and instructions](https://studio.us-east-1.prod.workshops.aws/preview/076e45e5-760d-41cf-bd22-a86c46ee462c/builds/83c4ddb7-fbc6-4e72-b5da-967f8fe7cfcb/en-US/1-transcribe-translate-calls)** from the workshop. " 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "f591749b", 16 | "metadata": {}, 17 | "source": [ 18 | "## Prerequisites for this notebook" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "id": "49637499", 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "# First, let's install dependencies for the transcript word utility we will use in this notebook\n", 29 | "!pip install python-docx --quiet\n", 30 | "!pip install matplotlib --quiet\n", 31 | "!pip install scipy --quiet" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "id": "c8906dc9", 37 | "metadata": {}, 38 | "source": [ 39 | "### Import libraries and initialize variables" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "id": "7bedcea1", 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "import io\n", 50 | "import os\n", 51 | "import re\n", 52 | "import uuid\n", 53 | "import json\n", 54 | "import time\n", 55 | "import boto3\n", 56 | "import pprint\n", 57 | "import botocore\n", 58 | "import sagemaker\n", 59 | "import subprocess\n", 60 | "from sagemaker import get_execution_role\n", 61 | "from datetime import datetime, timezone" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "id": "40d91817", 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "bucket = '' # Add your bucket name here\n", 72 | "\n", 73 | "region = boto3.session.Session().region_name\n", 74 | "\n", 75 | "# Amazon S3 (S3) client\n", 76 | "s3 = boto3.client('s3', region)\n", 77 | "s3_resource = boto3.resource('s3')\n", 78 | "try:\n", 79 | " s3.head_bucket(Bucket=bucket)\n", 80 | "except:\n", 81 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "id": "935d324a", 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "INPUT_PATH_TRANSCRIBE = 'transcribe/input'\n", 92 | "OUTPUT_PATH_TRANSCRIBE = 'transcribe/output'\n", 93 | "INPUT_PATH_TRANSLATE 
= 'translate/input'\n", 94 | "OUTPUT_PATH_TRANSLATE = 'translate/output'" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": null, 100 | "id": "f32f26e5", 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "region = boto3.session.Session().region_name\n", 105 | "bucket_region = s3.head_bucket(Bucket=bucket)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']\n", 106 | "assert bucket_region == region, \"Your S3 bucket {} and this notebook need to be in the same region.\".format(bucket)\n", 107 | "# Amazon Transcribe client\n", 108 | "transcribe_client = boto3.client(\"transcribe\")\n", 109 | "# Amazon Translate client\n", 110 | "translate_client = boto3.client(\"translate\")" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "id": "d190c821", 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "# This is the execution role that will be used to call Amazon Transcribe and Amazon Translate\n", 121 | "role = get_execution_role()\n", 122 | "display(role)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "id": "36e8d4a4", 128 | "metadata": {}, 129 | "source": [ 130 | "## Amazon Transcribe Custom" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "id": "f03eb05a", 136 | "metadata": {}, 137 | "source": [ 138 | "### Create custom vocabulary\n", 139 | "\n", 140 | "You can give Amazon Transcribe more information about how to process speech in your input file by creating a custom vocabulary in text file format. A custom vocabulary is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words and phrases, words that Amazon Transcribe isn't recognizing, or proper nouns." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "id": "5a22d56c", 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# First lets view our vocabulary files\n", 151 | "!pygmentize 'input/custom-vocabulary-EN.txt'" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "id": "51c9f86b", 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "# First lets view our vocabulary files - Uncomment line below to view if you like\n", 162 | "#!pygmentize 'input/custom-vocabulary-ES.txt'" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "id": "fdce93dc", 168 | "metadata": {}, 169 | "source": [ 170 | "#### Custom vocabularies can be in table or list formats\n", 171 | "\n", 172 | "Each vocabulary file can be in either table or list format; table format is strongly recommended because it gives you more options for and more control over the input and output of words within your custom vocabulary. As you saw above, we used the table format for this workshop. When you use the table format, it has 4 columns as explain below:\n", 173 | "\n", 174 | "1. **Phrase**\n", 175 | "The word or phrase that should be recognized. If the entry is a phrase, separate the words with a hyphen (-). For example, you type Los Angeles as Los-Angeles. The Phrase field is required\n", 176 | "\n", 177 | "1. **IPA**\n", 178 | "The pronunciation of your word or phrase using IPA characters. You can include characters in the International Phonetic Alphabet (IPA) in this field.\n", 179 | "\n", 180 | "1. **SoundsLike**\n", 181 | "The pronunciation of your word or phrase using the standard orthography of the language to mimic the way that the word sounds.\n", 182 | "\n", 183 | "1. 
**DisplayAs**\n", 184 | "Defines the how the word or phrase looks when it's output. For example, if the word or phrase is Los-Angeles, you can specify the display form as \"Los Angeles\" so that the hyphen is not present in the output." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "id": "dd14c202", 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "# Next we will upload our vocabulary files to our S3 bucket\n", 195 | "cust_vocab_en = 'custom-vocabulary-EN.txt'\n", 196 | "cust_vocab_es = 'custom-vocabulary-ES.txt'\n", 197 | "s3.upload_file('input/' + cust_vocab_en,bucket,INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_en)\n", 198 | "s3.upload_file('input/' + cust_vocab_es,bucket,INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_es)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "id": "2aba0fbe", 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "# Create the custom vocabulary in Transcribe\n", 209 | "# The name of your custom vocabulary must be unique!\n", 210 | "vocab_EN = 'custom-vocab-EN-' + str(uuid.uuid4())\n", 211 | "vocab_ES = 'custom-vocab-ES-' + str(uuid.uuid4())" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "id": "70145b3e", 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "vocab_response_EN = transcribe_client.create_vocabulary(\n", 222 | " VocabularyName=vocab_EN,\n", 223 | " LanguageCode='en-US',\n", 224 | " VocabularyFileUri='s3://' + bucket + '/'+ INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_en\n", 225 | ")" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "id": "9cf568bd", 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "vocab_response_ES = transcribe_client.create_vocabulary(\n", 236 | " VocabularyName=vocab_ES,\n", 237 | " LanguageCode='es-US',\n", 238 | " VocabularyFileUri='s3://' + bucket + '/'+ INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_es\n", 239 | ")" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "id": "735d85ac", 245 | "metadata": {}, 246 | "source": [ 247 | "### Check Vocabulary status in Amazon Transcribe console\n", 248 | "\n", 249 | "[Go to Amazon Transcribe Console](https://console.aws.amazon.com/transcribe/home?region=us-east-1#vocabulary)\n", 250 | "\n", 251 | "This will take 3 to 5 minutes. 
Go to the **Perform Transcription** step below once the vocabulary has been created and is ready for use.\n", 252 | " " 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "id": "6ac52da7", 258 | "metadata": {}, 259 | "source": [ 260 | "### Perform Transcription" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "id": "ad29b8c6", 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "# First let us list our audio files and then upload them to the S3 bucket\n", 271 | "audio_dir = 'input/audio-recordings'\n", 272 | "\n", 273 | "for subdir, dirs, files in os.walk(audio_dir):\n", 274 | " for file in files:\n", 275 | " s3.upload_file(os.path.join(subdir, file), bucket, 'transcribe/' + os.path.join(subdir, file))\n", 276 | " print(\"Uploaded to: \" + \"s3://\" + bucket + '/transcribe/' + os.path.join(subdir, file))" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "id": "fa357c3a", 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "# Define the method that will perform transcription\n", 287 | "\n", 288 | "def transcribe(job_name, job_uri, lang_code, vocab_name):\n", 289 | " \"\"\"Transcribe audio files to text.\n", 290 | " Args:\n", 291 | " job_name (str): the name of the job that you specify;\n", 292 | " the output json will be job_name.json\n", 293 | " job_uri (str): input path (in s3) to the file being transcribed\n", 294 | " in_bucket (str): s3 bucket prefix where the input audio files are present\n", 295 | " out_bucket (str): s3 bucket name that you want the output json\n", 296 | " to be placed in\n", 297 | " vocab_name (str): name of custom vocabulary used;\n", 298 | " \"\"\"\n", 299 | " try:\n", 300 | " transcribe_client.start_transcription_job(\n", 301 | " TranscriptionJobName=job_name,\n", 302 | " LanguageCode=lang_code,\n", 303 | " Media={\"MediaFileUri\": job_uri},\n", 304 | " Settings={'VocabularyName': vocab_name, 'MaxSpeakerLabels': 2, 'ShowSpeakerLabels': True}\n", 305 | " )\n", 306 | " \n", 307 | " time.sleep(2)\n", 308 | " \n", 309 | " print(transcribe_client.get_transcription_job(TranscriptionJobName=job_name)['TranscriptionJob']['TranscriptionJobStatus'])\n", 310 | "\n", 311 | " except Exception as e:\n", 312 | " print(e)\n" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "id": "abfd025d", 318 | "metadata": {}, 319 | "source": [ 320 | "**Note:** As you can see in our code below we are determining the language code to send to Amazon Transcribe. However this is not required if you set the [IdentifyLanguage to True](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job). In our case we needed to select either the English or Spanish Custom Vocabulary file to use for transcribing audio files and hence we went with specific language codes. 
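The transcription jobs below reference these custom vocabularies, and a job submitted while a vocabulary is still `PENDING` will fail. The following sketch is not part of the original notebook; it uses the `transcribe_client`, `vocab_EN`, and `vocab_ES` variables defined earlier to wait until both vocabularies are `READY` before the jobs are submitted:

```python
# Optional sketch: wait for both custom vocabularies to finish building before
# submitting transcription jobs that reference them.
import time

for vocab_name in (vocab_EN, vocab_ES):
    while True:
        state = transcribe_client.get_vocabulary(VocabularyName=vocab_name)['VocabularyState']
        print(vocab_name, state)
        if state in ('READY', 'FAILED'):
            break
        time.sleep(30)
```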
" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "id": "75653a8e", 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "# Now we will loop through the recordings in our bucket to submit the transcription jobs\n", 331 | "now = datetime.now()\n", 332 | "time_now = now.strftime(\"%H.%M.%S\")\n", 333 | "\n", 334 | "paginator = s3.get_paginator('list_objects_v2')\n", 335 | "pages = paginator.paginate(Bucket=bucket, Prefix='transcribe/input/audio-recordings')\n", 336 | "job_name_list = []\n", 337 | "\n", 338 | "for page in pages:\n", 339 | " for obj in page['Contents']:\n", 340 | " audio_name = obj['Key'].split('/')[3].split('.')[0]\n", 341 | " job_name = audio_name + '-' + time_now\n", 342 | " job_name_list.append(job_name)\n", 343 | " job_uri = f\"s3://{bucket}/{obj['Key']}\"\n", 344 | " print('Submitting transcription for audio: ' + job_name)\n", 345 | " vocab = ''\n", 346 | " lang_code = ''\n", 347 | " if audio_name.split('-')[2] == 'EN':\n", 348 | " vocab = vocab_EN\n", 349 | " lang_code = 'en-US'\n", 350 | " elif audio_name.split('-')[2] == 'ES':\n", 351 | " vocab = vocab_ES\n", 352 | " lang_code = 'es-US'\n", 353 | " # submit the transcription job now, we will provide our current bucket name as the output bucket\n", 354 | " transcribe(job_name, job_uri, lang_code,vocab)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "id": "275266e9", 360 | "metadata": {}, 361 | "source": [ 362 | "### Check Transcription job status in Amazon Transcribe console\n", 363 | "\n", 364 | "[Go to Amazon Transcribe Console](https://console.aws.amazon.com/transcribe/home?region=us-east-1#jobs)\n", 365 | "\n", 366 | "This will be complete in about 5 to 8 minutes in total for all the jobs. Go to the **Process Transcription output** step below once the transcription jobs show status as complete otherwise you will get an error. No worries, just try again in a minute or so.\n", 367 | " " 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "id": "ffd8e4a4", 373 | "metadata": {}, 374 | "source": [ 375 | "### Process Transcription output" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "id": "823fb45c", 381 | "metadata": {}, 382 | "source": [ 383 | "#### Clone Transcribe helper repo\n", 384 | "\n", 385 | "From a terminal window in your notebook instance, navigate to the current directory where this notebook resides, and execute the command `git clone https://github.com/aws-samples/amazon-transcribe-output-word-document` before executing the cell below. The steps you will have to follow are:\n", 386 | "\n", 387 | "1. From Jupyter notebook home page on the right, select New --> Terminal\n", 388 | "1. In the terminal window, type `cd SageMaker`\n", 389 | "1. Now type `cd aim317-uncover-insights-customer-conversations`\n", 390 | "1. Now type `cd notebooks`\n", 391 | "1. Now type `cd 1-Transcribe-Translate-Calls`\n", 392 | "1. Finally type the command `git clone https://github.com/aws-samples/amazon-transcribe-output-word-document`\n" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "id": "55d6cc7c", 398 | "metadata": {}, 399 | "source": [ 400 | "#### Create a word document from call transcript\n", 401 | "\n", 402 | "We will generate a word document from the Amazon Transcribe response JSON so we review the transcript. Once you execute the code in the next cell, go to your notebook folder and **you will see the word document created with the Transcribe job name. 
Select this word document, click download and you can open it to review the transcript**." 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "id": "c314aea5", 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "!python amazon-transcribe-output-word-document/python/ts-to-word.py --inputJob {job_name_list[0]}" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "id": "82267b4f", 418 | "metadata": {}, 419 | "source": [ 420 | "### Get Call Segments\n", 421 | "We will get the call segments and speaker information to derive additional insights we can visualize in QuickSight. " 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": null, 427 | "id": "75b324a0", 428 | "metadata": {}, 429 | "outputs": [], 430 | "source": [ 431 | "import pandas as pd\n", 432 | "\n", 433 | "def upload_segments(transcript):\n", 434 | " # Get the speaker segments\n", 435 | " cols = ['transcript_name', 'start_time', 'end_time', 'speaker_label']\n", 436 | " spk_df = pd.DataFrame(columns=cols)\n", 437 | " for seg in original['results']['speaker_labels']['segments']:\n", 438 | " for item in seg['items']:\n", 439 | " spk_df.loc[len(spk_df.index)] = [transcript['jobName'], item['start_time'], item['end_time'], item['speaker_label']]\n", 440 | " # Get the speaker content\n", 441 | " icols = ['transcript_name', 'start_time', 'end_time', 'confidence', 'content']\n", 442 | " item_df = pd.DataFrame(columns=icols)\n", 443 | " for itms in original['results']['items']:\n", 444 | " if itms.get('start_time') is not None:\n", 445 | " item_df.loc[len(item_df.index)] = [transcript['jobName'], itms['start_time'], itms['end_time'], itms['alternatives'][0]['confidence'], itms['alternatives'][0]['content']]\n", 446 | "\n", 447 | " # Merge the two on transcript name, start time and end time\n", 448 | " full_df = pd.merge(spk_df, item_df, how='left', left_on=['transcript_name', 'start_time', 'end_time'], right_on = ['transcript_name', 'start_time', 'end_time'])\n", 449 | " # We will use the Transcribe Job Name for the CSV file name\n", 450 | " csv_file = transcript['jobName'] + '.csv'\n", 451 | " full_df.to_csv(csv_file, index=False)\n", 452 | " s3.upload_file(csv_file, bucket, 'quicksight/data/transcripts/' + csv_file)\n", 453 | " # The print below is too verbose so commenting for now - feel free to uncomment if needed\n", 454 | " #print(\"CSV file with speaker segments created and uploaded for visualization input to: \" + \"s3://\" + bucket + \"/\" + \"quicksight/data/transcripts/\" + csv_file)" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "id": "51c363b8", 460 | "metadata": {}, 461 | "source": [ 462 | "#### Upload transcript text files to S3 bucket\n", 463 | "\n", 464 | "We will now get the full transcript from all the calls and send them to our S3 bucket in preparation for our translation tasks\n" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": null, 470 | "id": "f86e1c56", 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "# First we need an output directory\n", 475 | "dir = os.getcwd()+'/output'\n", 476 | "if not os.path.exists(dir):\n", 477 | " os.makedirs(dir)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "id": "73269b2d", 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "# Our transcript is in a presigned URL in Transcribe's S3 bucket, let us download it and get the text we need\n", 488 | "import urllib3\n", 489 | "\n", 490 | 
"for job in job_name_list:\n", 491 | " response = transcribe_client.get_transcription_job(\n", 492 | " TranscriptionJobName=job \n", 493 | " )\n", 494 | " file_name = response['TranscriptionJob']['Transcript']['TranscriptFileUri']\n", 495 | " http = urllib3.PoolManager()\n", 496 | " transcribed_data = http.request('GET', file_name)\n", 497 | " original = json.loads(transcribed_data.data.decode('utf-8'))\n", 498 | " # Extract the speaker segments, confidence scores for each call\n", 499 | " # Send it to the QuickSight folder in the S3 bucket\n", 500 | " # We will use this during visualization\n", 501 | " upload_segments(original)\n", 502 | " entire_transcript = ''\n", 503 | " entire_transcript = original[\"results\"][\"transcripts\"]\n", 504 | " outfile = 'output/'+job+'.txt'\n", 505 | " with open(outfile, 'w') as out:\n", 506 | " out.write(entire_transcript[0]['transcript'])\n", 507 | " s3.upload_file(outfile,bucket,OUTPUT_PATH_TRANSCRIBE+'/'+job+'.txt')\n", 508 | " print(\"Transcript uploaded to: \" + f's3://{bucket}/{OUTPUT_PATH_TRANSCRIBE}/{job}.txt')" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "id": "19f1202e", 514 | "metadata": {}, 515 | "source": [ 516 | "## Amazon Translate with Custom Terminology\n", 517 | "\n", 518 | "[Amazon Translate](https://aws.amazon.com/translate/) is a fully managed, neural machine translation service that delivers high quality and affordable language translation in seventy-one languages. Using [custom terminology](https://docs.aws.amazon.com/translate/latest/dg/how-custom-terminology.html) with your translation requests enables you to make sure that your brand names, character names, model names, and other unique content is translated exactly the way you need it, regardless of its context and the Amazon Translate algorithm’s decision. It's easy to set up a terminology file and attach it to your Amazon Translate account. When you translate text, you simply choose to use the custom terminology as well, and any examples of your source word are translated as you want them." 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "id": "98cfd5aa", 524 | "metadata": {}, 525 | "source": [ 526 | "### Translate the Spanish transcripts\n", 527 | "\n", 528 | "We will first create a custom terminology file that consists of examples that show how you want words to be translated. In our case we are using a CSV file as the format, but it supports [TMX as well](https://docs.aws.amazon.com/translate/latest/dg/creating-custom-terminology.html). It includes a collection of words or terminologies in a source language, and for each example, it contains the desired translation output in one or more target languages. We created a sample custom terminology file for our use case which is available in the input folder of this notebook `translate-custom-terminology.txt` to create a translation of our Spanish transcripts. We will now review this file and proceed with setting up a Custom Translation job." 
529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "id": "7826ef83", 534 | "metadata": {}, 535 | "source": [ 536 | "#### Review custom terminology file" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": null, 542 | "id": "6e143fdc", 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "# Lets first review our custom terminology file \n", 547 | "# We created a sample file for this workshop that we can use - uncomment below to check\n", 548 | "#!pygmentize 'input/translate-custom-terminology.txt'" 549 | ] 550 | }, 551 | { 552 | "cell_type": "code", 553 | "execution_count": null, 554 | "id": "0271a2b2", 555 | "metadata": {}, 556 | "outputs": [], 557 | "source": [ 558 | "# Change extension to CSV and upload to S3 bucket\n", 559 | "term_prefix = 'translate/custom-terminology/'\n", 560 | "pd_filename = 'translate-custom-terminology'\n", 561 | "s3.upload_file('input/' + pd_filename + '.txt', bucket, term_prefix + '/' + pd_filename + '.csv')" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "id": "58ffe46e", 567 | "metadata": {}, 568 | "source": [ 569 | "#### Import custom terminology to Amazon Translate" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "id": "d5a844e8", 576 | "metadata": {}, 577 | "outputs": [], 578 | "source": [ 579 | "# read the custom terminology csv file we uploaded\n", 580 | "temp = s3_resource.Object(bucket, term_prefix + '/' + pd_filename + '.csv')\n", 581 | "term_file = temp.get()['Body'].read().decode('utf-8')" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": null, 587 | "id": "b7e81491", 588 | "metadata": {}, 589 | "outputs": [], 590 | "source": [ 591 | "# import the custom terminology file to Translate\n", 592 | "term_name = 'aim317-custom-terminology'\n", 593 | "response = translate_client.import_terminology(\n", 594 | " Name=term_name,\n", 595 | " MergeStrategy='OVERWRITE',\n", 596 | " TerminologyData={\n", 597 | " 'File': term_file,\n", 598 | " 'Format': 'CSV'\n", 599 | " }\n", 600 | ")" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "id": "7349b0fe", 606 | "metadata": {}, 607 | "source": [ 608 | "#### Get the Spanish transcripts" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": null, 614 | "id": "29a9b1fb", 615 | "metadata": {}, 616 | "outputs": [], 617 | "source": [ 618 | "# Review the list of transcripts to pick Spanish transcripts\n", 619 | "paginator = s3.get_paginator('list_objects_v2')\n", 620 | "pages = paginator.paginate(Bucket=bucket, Prefix=OUTPUT_PATH_TRANSCRIBE)\n", 621 | "\n", 622 | "s3_resource = boto3.resource('s3')\n", 623 | "# Now copy the Spanish transcripts to Translate Input folder\n", 624 | "for page in pages:\n", 625 | " for obj in page['Contents']:\n", 626 | " lang = ''\n", 627 | " ts_file = obj['Key'].split('/')[2]\n", 628 | " tscript = ts_file.split('-')\n", 629 | " if len(tscript) > 1:\n", 630 | " lang = tscript[2]\n", 631 | " if lang == 'ES':\n", 632 | " copy_source = {'Bucket': bucket,'Key': obj['Key']}\n", 633 | " s3_resource.meta.client.copy(copy_source, bucket, INPUT_PATH_TRANSLATE + '/' + ts_file)" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "id": "d823c05d", 639 | "metadata": {}, 640 | "source": [ 641 | "#### Run translation synchronously\n", 642 | "\n", 643 | "**Note:** For the purposes of this workshop we are running this translate synchronously as we have only 2 call transcripts to be translated. 
For large scale translation requirements, you should use [start_text_translation_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/translate.html#Translate.Client.start_text_translation_job) and for batch custom translation processing requirements you should use the [Parallel Data File with Translate Active Custom Translation](https://docs.aws.amazon.com/translate/latest/dg/customizing-translations-parallel-data.html) " 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": null, 649 | "id": "d221d433", 650 | "metadata": {}, 651 | "outputs": [], 652 | "source": [ 653 | "# Read the spanish transcripts from the Translate input folder in S3 bucket\n", 654 | "paginator = s3.get_paginator('list_objects_v2')\n", 655 | "pages = paginator.paginate(Bucket=bucket, Prefix=INPUT_PATH_TRANSLATE)\n", 656 | "for page in pages:\n", 657 | " for obj in page['Contents']:\n", 658 | " temp = s3_resource.Object(bucket, obj['Key'])\n", 659 | " trans_input = temp.get()['Body'].read().decode('utf-8')\n", 660 | " if len(trans_input) > 0:\n", 661 | " # Translate the Spanish transcripts\n", 662 | " trans_response = translate_client.translate_text(\n", 663 | " Text=trans_input,\n", 664 | " TerminologyNames=[term_name],\n", 665 | " SourceLanguageCode='es',\n", 666 | " TargetLanguageCode='en'\n", 667 | " )\n", 668 | " # Write the translated text to a temporary file\n", 669 | " with open('temp_translate.txt', 'w') as outfile:\n", 670 | " outfile.write(trans_response['TranslatedText'])\n", 671 | " # Upload the translated text to S3 bucket \n", 672 | " s3.upload_file('temp_translate.txt', bucket, OUTPUT_PATH_TRANSLATE + '/en-' + obj['Key'].split('/')[2])\n", 673 | " print(\"Translated text file uploaded to: \" + 's3://' + bucket + '/' + OUTPUT_PATH_TRANSLATE + '/en-' + obj['Key'].split('/')[2])\n", 674 | " " 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "id": "33118b19", 680 | "metadata": {}, 681 | "source": [ 682 | "### Prepare Comprehend inputs\n", 683 | "\n", 684 | "We will now collect the original English transcripts and the translated Spanish language transcripts and move them to the Comprehend input folder in our S3 bucket in preparation for next steps in the workshop." 
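Following on the note above about large-scale translation: for larger transcript volumes the synchronous loop can be replaced with an asynchronous batch job that also honors the custom terminology. This is only a sketch, not part of the original workshop; the `DataAccessRoleArn` shown is a placeholder for a role that Amazon Translate can assume and that can read and write the S3 prefixes used:

```python
# Optional sketch: asynchronous batch translation with custom terminology.
# The role ARN below is a placeholder, not a resource created by this workshop.
batch_response = translate_client.start_text_translation_job(
    JobName='aim317-batch-translation',
    InputDataConfig={
        'S3Uri': 's3://' + bucket + '/' + INPUT_PATH_TRANSLATE + '/',
        'ContentType': 'text/plain'
    },
    OutputDataConfig={
        'S3Uri': 's3://' + bucket + '/' + OUTPUT_PATH_TRANSLATE + '/batch/'
    },
    DataAccessRoleArn='arn:aws:iam::111122223333:role/TranslateBatchDataAccessRole',
    SourceLanguageCode='es',
    TargetLanguageCodes=['en'],
    TerminologyNames=[term_name]
)
print(batch_response['JobId'], batch_response['JobStatus'])
```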
685 | ] 686 | }, 687 | { 688 | "cell_type": "code", 689 | "execution_count": null, 690 | "id": "76c148bf", 691 | "metadata": {}, 692 | "outputs": [], 693 | "source": [ 694 | "# First copy the English transcripts\n", 695 | "paginator = s3.get_paginator('list_objects_v2')\n", 696 | "pages = paginator.paginate(Bucket=bucket, Prefix=OUTPUT_PATH_TRANSCRIBE)\n", 697 | "\n", 698 | "s3_resource = boto3.resource('s3')\n", 699 | "\n", 700 | "for page in pages:\n", 701 | " for obj in page['Contents']:\n", 702 | " ts_file1 = obj['Key'].split('/')[2]\n", 703 | " tscript = ts_file1.split('-')\n", 704 | " if len(tscript) > 1:\n", 705 | " lang = tscript[2]\n", 706 | " if lang == 'EN':\n", 707 | " copy_source = {'Bucket': bucket,'Key': obj['Key']}\n", 708 | " s3_resource.meta.client.copy(copy_source, bucket, 'comprehend/input/' + ts_file1)\n", 709 | "\n", 710 | "# Now copy the Spanish transcripts that were translated to English\n", 711 | "pages = paginator.paginate(Bucket=bucket, Prefix=OUTPUT_PATH_TRANSLATE)\n", 712 | "\n", 713 | "for page in pages:\n", 714 | " for obj in page['Contents']:\n", 715 | " ts_file2 = obj['Key'].split('/')[2]\n", 716 | " if 'txt' in ts_file2:\n", 717 | " copy_source = {'Bucket': bucket,'Key': obj['Key']}\n", 718 | " s3_resource.meta.client.copy(copy_source, bucket, 'comprehend/input/' + ts_file2) " 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "id": "9b4d9a2c", 724 | "metadata": {}, 725 | "source": [ 726 | "Let us review if all the text files are ready for Comprehend custom inference. We should have 7 files in total with two calls that were transcribed in Spanish and translated to English, and 5 English calls that we transcribed. " 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "id": "5d8fcbd6", 733 | "metadata": {}, 734 | "outputs": [], 735 | "source": [ 736 | "paginator = s3.get_paginator('list_objects_v2')\n", 737 | "pages = paginator.paginate(Bucket=bucket, Prefix='comprehend/input')\n", 738 | "for page in pages:\n", 739 | " for obj in page['Contents']:\n", 740 | " print(obj['Key'])" 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "id": "f5cfcf5f", 746 | "metadata": {}, 747 | "source": [ 748 | "## End of notebook, go back to your workshop instructions" 749 | ] 750 | } 751 | ], 752 | "metadata": { 753 | "interpreter": { 754 | "hash": "9ddb102edfbd95000dbbd260d8bbcf82701cc06b4dcf114fa04ba84aab75adcb" 755 | }, 756 | "kernelspec": { 757 | "display_name": "conda_python3", 758 | "language": "python", 759 | "name": "conda_python3" 760 | }, 761 | "language_info": { 762 | "codemirror_mode": { 763 | "name": "ipython", 764 | "version": 3 765 | }, 766 | "file_extension": ".py", 767 | "mimetype": "text/x-python", 768 | "name": "python", 769 | "nbconvert_exporter": "python", 770 | "pygments_lexer": "ipython3", 771 | "version": "3.6.13" 772 | } 773 | }, 774 | "nbformat": 4, 775 | "nbformat_minor": 5 776 | } 777 | -------------------------------------------------------------------------------- /notebooks/2-Train-Detect-Entities/train.csv: -------------------------------------------------------------------------------- 1 | "The unique feature of Asimov's robots is the Three Laws of Robotics, hardwired in a robot's positronic brain, with which all robots in his fiction must comply, and which ensure that the robot does not turn against its creators" 2 | """Victory Unintentional"" has positronic robots obeying the Three Laws, but also a non-human civilization on Jupiter." 
3 | """Let's Get Together"" features humanoid robots, but from a different future (where the Cold War is still in progress), and with no mention of the Three Laws" 4 | "The Robot series is a series of 37 science fiction short stories and six novels by American writer Isaac Asimov, featuring positronic robots." 5 | """Mother Earth"" (1948) - short story, in which no individual robots appear, but positronic robots are part of the background" 6 | "Most of Asimov's robot short stories, which he began to write in 1939, are set in the first age of positronic robotics and space exploration." 7 | "The stories were not initially conceived as a set, but rather all feature his positronic robotsindeed, there are some inconsistencies among them, especially between the short stories and the novels." 8 | "It was the Zoromes, then, who were the spiritual ancestors of my own ""positronic robots,"" all of them, from Robbie to R. Daneel." 9 | "The 1989 anthology Foundation's Friends included the positronic robot stories ""Balance"" by Mike Resnick, ""Blot"" by Hal Clement, ""PAPPI"" by Sheila Finch, ""Plato's Cave"" by Poul Anderson, ""The Fourth Law of Robotics"" by Harry Harrison and ""Carhunters of the Concrete Prairie"" by Robert Sheckley." 10 | "Bicentennial Man (1999) was the first theatrical movie adaptation of any Asimov story or novel and was based on both Asimov's original short story of the same name (1976) and its novel expansion, The Positronic Man (1993)" 11 | "The book also contains the short story in which Asimov's Three Laws of Robotics first appear, which had large influence on later science fiction and had impact on thought on ethics of artificial intelligence as well." 12 | "Its plot incorporates elements of ""Little Lost Robot"",[8] some of Asimov's character names and the Three Laws." 13 | "In 2004 The Saturday Evening Post said that I, Robot's Three Laws ""revolutionized the science fiction genre and made robots far more interesting than they ever had been before.""" 14 | "In Aliens, a 1986 movie, the synthetic person Bishop paraphrases Asimov's First Law in the line: ""It is impossible for me to harm, or by omission of action allow to be harmed, a human being.""" 15 | "An episode of The Simpsons entitled ""I D'oh Bot"" (2004) has Professor Frink build a robot named ""Smashius Clay"" (also named ""Killhammad Aieee"") that follows all three of Asimov's laws of robotics." 16 | "Leela once told Bender to ""cover his ears"" so that he would not hear the robot-destroying paradox which she used to destroy Robot Santa (he punishes the bad, he kills people, killing is bad, therefore he must punish himself), causing a total breakdown; additionally, Bender has stated that he is Three Laws Safe." 17 | "The Indian science fiction film Endhiran, released in 2010, refers to Asimov's three laws for artificial intelligence for the fictional character Chitti: The Robot." 18 | "When a scientist takes in the robot for evaluation, the panel enquires whether the robot was built using the Three Laws of Robotics." 19 | "Upon their publication in this collection, Asimov wrote a framing sequence presenting the stories as Calvin's reminiscences during an interview with her about her life's work, chiefly concerned with aberrant behaviour of robots and the use of ""robopsychology"" to sort out what is happening in their positronic brain." 20 | "Two months after I read it, I began 'Robbie', about a sympathetic robot, and that was the start of my positronic robot series." 
21 | "The positronic brain, which Asimov named his robots' central processors, is what powers Data from Star Trek: The Next Generation, as well as other Soong type androids. " 22 | "Positronic brains have been referenced in a number of other television shows including Doctor Who, Once Upon a Time... Space, Perry Rhodan, The Number of the Beast, and others." 23 | "In ""Someday"" there are non-positronic computers which tell stories and do not obey the Three Laws." 24 | "In ""Sally"" there are positronic brain cars who can damage men or disobey without problems. No other kinds of robots are seen, and there is no mention of the Three Laws." 25 | "In "". . . That Thou Art Mindful of Him"" robots are created with a very flexible Three Laws management, and these create little, simplified robots with no laws that actually act against the Three Laws of Robotics." 26 | "Andrew uses the money to pay for bodily upgrades, keeping himself in perfect shape, but never has his positronic brain altered." 27 | The first scene of the story is explained as Andrew seeks out a robotic surgeon to perform an ultimately fatal operation: altering his positronic brain so that it will decay with time. 28 | "his story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 29 | "Sir reveals that U.S. Robots has ended a study on generalized pathways and creative robots, frightened by Andrew's unpredictability." 30 | "However, the robot refuses, as the operation is harmful and violates the First Law of Robotics, which says a robot may never harm a human being." 31 | "The Positronic Man is a 1992 novel by American writers Isaac Asimov and Robert Silverberg, based on Asimov's 1976 novelette ""The Bicentennial Man""." 32 | In the twenty-first century the creation of the positronic brain leads to the development of robot laborers and revolutionizes life on Earth. 33 | "In The Positronic Man, the trends of fictional robotics in Asimov's Robot series (as outlined in the book I, Robot) are detailed as background events, with an indication that they are influenced by Andrew's story." 34 | "Only when Andrew allows his positronic brain to ""decay"", thereby willfully abandoning his immortality, is he declared a human being." 35 | "This story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 36 | "No individual robots appear, but positronic robots are part of the background." 37 | "Earth faces a confrontation with its colonies, the ""Outer Worlds."" A historian looks back and sees the problem beginning a century and a half earlier, when Aurora got permission to ""introduce positronic robots into their community life.""" 38 | "The only witness is a malfunctioning house robot that has suffered damage to its positronic brain because it allowed harm to be done to a human, in violation of the First Law." 39 | "Ultimately, it is revealed that Delmarre's neighbor, roboticist Jothan Leebig, was working on putting positronic brains in spaceships." 40 | "Leebig poisoned Gruer by tricking his robots, using his knowledge of positronic brains, into putting poison into Gruer's drink." 41 | "This would negate the First Law, as such ships would not recognize that humans usually inhabit ships, and would therefore be able to attack and destroy other ships without regard for their crews." 42 | "R. Daneel and R. 
Giskard discover the roboticists' plan and attempt to stop Amadiro; but are hampered by the First Law of Robotics," 43 | "Daneel and Giskard, meanwhile, have inferred an additional Zeroth Law of Robotics: A robot may not injure humanity, or through inaction, allow humanity to come to harm." 44 | "It might enable them to overcome Amadiro, if they can use their telepathic perception of humanity to quell the inhibitions of the first law" 45 | "After Amadiro admits their plans, Giskard alters Amadiro's brain (using the newly created Zeroth Law); but in so doing, threatens his own." 46 | "Under the stress of having violated the First Law (in accordance with the Zeroth Law, but with the predicted benefit to humanity being uncertain), R. Giskard himself suffers a soon-fatal malfunction of his positronic brain but manages to confer his telepathic ability upon R. Daneel." 47 | "Dave Langford reviewed Robots and Empire for White Dwarf #85, and stated that ""Asimov always perks up when chopping logic with the Three Laws of Robotics, and here his robots come up with a Fourth, or rather Zeroth, Law." 48 | "In the novel, Asimov depicts the transition from his earlier Milky Way Galaxy, inhabited by both human beings and positronic robots, to his Galactic Empire." 49 | "Gladia is accompanied by the positronic robots R. Daneel Olivaw and R. Giskard Reventlov, both the former property of their creator, Dr. Han Fastolfe, who bequeathed them to Gladia in his will. R. Giskard has secret telepathic powers of which only R. Daneel knows." 50 | "The electrical aspect of robots is used for movement (through motors), sensing (where electrical signals are used to measure things like heat, sound, position, and energy status) and operation (robots need some level of electrical energy supplied to their motors and sensors in order to activate and perform basic operations)" 51 | "Actuators are the ""muscles"" of a robot, the parts which convert stored energy into movement." 52 | "Scientists from several European countries and Israel developed a prosthetic hand in 2009, called SmartHand, which functions like a real oneallowing patients to write with it, type on a keyboard, play piano and perform other fine movements" 53 | "As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 54 | "A quadruped was also demonstrated which could trot, run, pace, and bound" 55 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability" 56 | "In some of Asimov's other works, he states that the first use of the word robotics was in his short story Runaround (Astounding Science Fiction, March 1942),[4][5] where he introduced his concept of The Three Laws of Robotics" 57 | "Its limb control system allowed it to walk with the lower limbs, and to grip and transport objects with hands, using tactile sensors" 58 | "There are three different types of robotic programs: remote control, artificial intelligence and hybrid." 59 | "Robots that use artificial intelligence interact with their environment on their own without a control source, and can determine reactions to objects and problems they encounter using their preexisting programming." 60 | "Several robots have been made which can walk reliably on two legs, however, none have yet been made which are as robust as a human." 
61 | "Many other robots have been built that walk on more than two legs, due to these robots being significantly easier to construct." 62 | "Walking robots can be used for uneven terrains, which would provide better mobility and energy efficiency than other locomotion methods." 63 | "Typically, robots on two legs can walk well on flat floors and can occasionally walk up stairs." 64 | "Several robots, built in the 1980s by Marc Raibert at the MIT Leg Laboratory, successfully demonstrated very dynamic walking. " 65 | "Initially, a robot with only one leg, and a very small foot could stay upright simply by hopping." 66 | " As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 67 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability." 68 | "This technique was recently demonstrated by Anybots' Dexter Robot,[99] which is so stable, it can even jump" 69 | Perhaps the most promising approach utilizes passive dynamics where the momentum of swinging limbs is used for greater efficiency 70 | "It has been shown that totally unpowered humanoid mechanisms can walk down a gentle slope, using only gravity to propel themselves." 71 | "Using this technique, a robot need only supply a small amount of motor power to walk along a flat surface or a little more to walk up a hill" 72 | One approach mimics the movements of a human climber on a wall with protrusions; adjusting the center of mass and moving each limb in turn to gain leverage. 73 | "Science fiction authors also typically assume that robots will eventually be capable of communicating with humans through speech, gestures, and facial expressions, rather than a command-line interface." 74 | "Evolutionary robots is a methodology that uses evolutionary computation to help design robots, especially the body form, or motion and behavior controllers." 75 | "Direct kinematics or forward kinematics refers to the calculation of end effector position, orientation, velocity, and acceleration when the corresponding joint values are known." 76 | "Inverse kinematics refers to the opposite case in which required joint values are calculated for given end effector values, as done in path planning." 77 | "Once all relevant positions, velocities, and accelerations have been calculated using kinematics, methods from the field of dynamics are used to study the effect of forces upon these movements." 78 | "Normal human gait is a complex process, which happens due to co-ordinated movements of the whole of the body, requiring the whole of Central Nervous System - the brain and spinal cord, to function properly." 79 | The most common cause for gait impairment is due to an injury of one or both legs. 80 | "Gait training is not simply re-educating a patient on how to walk, but also includes an initial assessment of their gait cycle - Gait analysis, creation of a plan to address the problem, as well as teaching the patient on how to walk on different surfaces." 81 | "Assistive devices and splints (orthosis) are often used in gait training, especially with those who have had surgery or an injury on their legs, but also with those who have balance or strength impairments as well." 
82 | "Although gait training with parallel bars, treadmills and support systems can be beneficial, the long-term aim of gait training is usually to reduce patients' dependence on such technology in order to walk more in their daily lives." 83 | "A gait cycle is defined as the progression of movements that occurs before one leg can return to a certain position during walking, or ambulation." 84 | The gait cycle is studied in two phases - Swing and stance phase. 85 | Any gait training addressing a gait abnormality starts with a proper gait analysis. 86 | "The gait consists of a series of repetitive movements of the whole body during locomotion and is studied considering that each gait cycle repeats over itself, which is almost correct considering normal subjects." 87 | "The basic two phases are swing and stance phases, depending on whether the leg is free to swing or is in contact with the ground during the phase of gait studied." 88 | The stance phase is approximately 60% of the gait cycle and takes about 0.6 seconds to complete at a normal walking speed. 89 | "The swing phase occurs when the foot is not in contact with the ground, and constitutes about 40% of the gait cycle." 90 | The two point gait pattern requires a high level of coordination and balance 91 | "Recently, electromechanical devices such as the Hocoma Lokomat robot-driven gait orthosis have been introduced with the intention of reducing the physical labour demands on therapists." 92 | "Treadmill training, with or without a body-weight support, is an emerging therapy and is being used with stroke patients to improve kinematic gait parameters" 93 | Research has shown that this form of gait training demonstrates a more normal walking pattern without the compensatory movements commonly associated with stroke 94 | Determining the movement of a robot so that its end-effectors move from an initial configuration to a desired configuration is known as motion planning 95 | "The movement of a kinematic chain, whether it is a robot or an animated character, is modeled by the kinematics equations of the chain." 96 | Movement of one element requires the computation of the joint angles for the other elements to maintain the joint constraints. 97 | "For example, inverse kinematics allows an artist to move the hand of a 3D human model to a desired position and orientation and have an algorithm select the proper angles of the wrist, elbow, and shoulder joints." 98 | "Isaac Asimov considered the issue in the 1950s in his I, Robot. At the insistence of his editor John W. Campbell Jr., he proposed the Three Laws of Robotics to govern artificially intelligent systems." 99 | "Much of his work was then spent testing the boundaries of his three laws to see where they would break down, or where they would create paradoxical or unanticipated behavior." 100 | "A panel convened by the United Kingdom in 2010 revised Asimov's laws to clarify that AI is the responsibility either of its manufacturers, or of its owner/operator." 101 | "The movies Bicentennial Man and A.I. deal with the possibility of sentient robots that could love. I, Robot explored some aspects of Asimov's three laws." 102 | The Three Laws of Robotics (often shortened to The Three Laws or known as Asimov's Laws) are a set of rules devised by science fiction author Isaac Asimov. 
103 | "The Three Laws, quoted from the ""Handbook of Robotics, 56th Edition, 2058 A.D."", are: First Law -A robot may not injure a human being or, through inaction, allow a human being to come to harm., Second Law - A robot must obey the orders given it by human beings except where such orders would conflict with the First Law., Third Law -A robot must protect its own existence as long as such protection does not conflict with the First or Second Law" 104 | "The Laws are incorporated into almost all of the positronic robots appearing in his fiction, and cannot be bypassed, being intended as a safety feature." 105 | Many of Asimov's robot-focused stories involve robots behaving in unusual and counter-intuitive ways as an unintended consequence of how the robot applies the Three Laws to the situation in which it finds itself. 106 | The original laws have been altered and elaborated on by Asimov and other authors. 107 | "Asimov also added a fourth, or zeroth law, to precede the others: A robot may not harm humanity, or, by inaction, allow humanity to come to harm." 108 | "The Three Laws, and the zeroth, have pervaded science fiction and are referred to in many books, films, and other media." 109 | "Asimov attributes the Three Laws to John W. Campbell, from a conversation that took place on 23 December 1940." 110 | Campbell claimed that Asimov had the Three Laws already in his mind and that they simply needed to be stated explicitly. 111 | "ccording to his autobiographical writings, Asimov included the First Law's ""inaction"" clause because of Arthur Hugh Clough's poem ""The Latest Decalogue"" (text in Wikisource), which includes the satirical lines ""Thou shalt not kill, but needst not strive / officiously to keep alive""." 112 | "Although Asimov pins the creation of the Three Laws on one particular date, their appearance in his literature happened over a period." 113 | "He wrote two robot stories with no explicit mention of the Laws, ""Robbie"" and ""Reason""." 114 | "He assumed, however, that robots would have certain inherent safeguards. ""Liar!"", his third robot story, makes the first mention of the First Law but not the other two." 115 | "When these stories and several others were compiled in the anthology I, Robot, ""Reason"" and ""Robbie"" were updated to acknowledge all the Three Laws, though the material Asimov added to ""Reason"" is not entirely consistent with the Three Laws as he described them elsewhere" 116 | "In his short story ""Evidence"" Asimov lets his recurring character Dr. Susan Calvin expound a moral basis behind the Three Laws." 117 | "Calvin points out that human beings are typically expected to refrain from harming other human beings (except in times of extreme duress like war, or to save a greater number) and this is equivalent to a robot's First Law" 118 | "Likewise, according to Calvin, society expects individuals to obey instructions from recognized authorities such as doctors, teachers and so forth which equals the Second Law of Robotics." 119 | Finally humans are typically expected to avoid harming themselves which is the Third Law for a robot. 120 | "The plot of ""Evidence"" revolves around the question of telling a human being apart from a robot constructed to appear human – Calvin reasons that if such an individual obeys the Three Laws he may be a robot or simply ""a very good man""" 121 | "Asimov later wrote that he should not be praised for creating the Laws, because they are ""obvious from the start, and everyone is aware of them subliminally." 
122 | The Laws just never happened to be put into brief sentences until I managed to do the job. 123 | "I have my answer ready whenever someone asks me if I think that my Three Laws of Robotics will actually be used to govern the behavior of robots, once they become versatile and flexible enough to be able to choose among different courses of behavior." 124 | "My answer is, ""Yes, the Three Laws are the only way in which rational human beings can deal with robots, or with anything else.""" 125 | Asimov's stories test his Three Laws in a wide variety of circumstances leading to proposals and rejection of modifications. 126 | "Science fiction scholar James Gunn writes in 1982, ""The Asimov robot stories as a whole may respond best to an analysis on this basis: the ambiguity in the Three Laws and the ways in which Asimov played twenty-nine variations upon a theme""" 127 | "Removing the First Law's ""inaction"" clause solves this problem but creates the possibility of an even greater one: a robot could initiate an action that would harm a human (dropping a heavy weight and failing to catch it is the example given in the text), knowing that it was capable of preventing the harm and then decide not to do so." 128 | "Gaia is a planet with collective intelligence in the Foundation series which adopts a law similar to the First Law, and the Zeroth Law, as its philosophy: Gaia may not harm life or allow life to come to harm." 129 | "Three times during his writing career, Asimov portrayed robots that disregard the Three Laws entirely." 130 | "On the other hand, the short story ""Cal"" (from the collection Gold), told by a first-person robot narrator, features a robot who disregards the Three Laws because he has found something far more important: he wants to be a writer." 131 | "The third is a short story entitled ""Sally"" in which cars fitted with positronic brains are apparently able to harm and kill humans in disregard of the First Law." 132 | "However, aside from the positronic brain concept, this story does not refer to other robot stories and may not be set in the same continuity." 133 | Without the basic theory of the Three Laws the fictional scientists of Asimov's universe would be unable to design a workable brain unit. 134 | "The character Dr. Gerrigel uses the term ""Asenion"" to describe robots programmed with the Three Laws." 135 | "The robots in Asimov's stories, being Asenion robots, are incapable of knowingly violating the Three Laws but, in principle, a robot in science fiction or in the real world could be non-Asenion." 136 | "Characters within the stories often point out that the Three Laws, as they exist in a robot's mind, are not the written versions usually quoted by humans but abstract mathematical concepts upon which a robot's entire developing consciousness is based." 137 | "This concept is largely fuzzy and unclear in earlier stories depicting very rudimentary robots who are only programmed to comprehend basic physical tasks, where the Three Laws act as an overarching safeguard, but by the era of The Caves of Steel featuring robots with human or beyond-human intelligence the Three Laws have become the underlying basic ethical worldview that determines the actions of all robots." 138 | "These three books, Caliban, Inferno and Utopia, introduce a new set of the Three Laws."
139 | "The so-called New Laws are similar to Asimov's originals with the following differences: the First Law is modified to remove the ""inaction"" clause, the same modification made in ""Little Lost Robot""; the Second Law is modified to require cooperation instead of obedience; the Third Law is modified so it is no longer superseded by the Second (i.e., a ""New Law"" robot cannot be ordered to destroy itself); finally, Allen adds a Fourth Law which instructs the robot to do ""whatever it likes"" so long as this does not conflict with the first three laws." 140 | "The Laws of Robotics are portrayed as something akin to a human religion, and referred to in the language of the Protestant Reformation, with the set of laws containing the Zeroth Law known as the ""Giskardian Reformation"" to the original ""Calvinian Orthodoxy"" of the Three Laws" 141 | "Randall Munroe has discussed the Three Laws in various instances, but possibly most directly by one of his comics entitled The Three Laws of Robotics which imagines the consequences of every distinct ordering of the existing three laws." 142 | "The Laws of Robotics presume that the terms ""human being"" and ""robot"" are understood and well defined." 143 | It takes as its concept the growing development of robots that mimic non-human living things and given programs that mimic simple animal behaviours which do not require the Three Laws. 144 | Both are to be considered alternatives to the possibility of a robot society that continues to be driven by the Three Laws as portrayed in the Foundation series. 145 | "In Lucky Starr and the Rings of Saturn, a novel unrelated to the Robot series but featuring robots programmed with the Three Laws, John Bigman Jones is almost killed by a Sirian robot on orders of its master." 146 | Advanced robots in fiction are typically programmed to handle the Three Laws in a sophisticated manner. 147 | "For example, the First Law may forbid a robot from functioning as a surgeon, as that act may cause damage to a human; however, Asimov's stories eventually included robot surgeons" 148 | "Asimov's Three Laws-obeying robots (Asenion robots) can experience irreversible mental collapse if they are forced into situations where they cannot obey the First Law, or if they discover they have unknowingly violated it." 149 | "The first example of this failure mode occurs in the story ""Liar!"", which introduced the First Law itself, and introduces failure by dilemmain this case the robot will hurt humans if he tells them something and hurt them if he does not." 150 | "This failure mode, which often ruins the positronic brain beyond repair, plays a significant role in Asimov's SF-mystery novel The Naked Sun." 151 | "As such, a robot is capable of taking an action which can be interpreted as following the First Law, thus avoiding a mental collapse." 152 | "Robots and artificial intelligences do not inherently contain or obey the Three Laws; their human creators must choose to program them in, and devise a means to do so." 
153 | "Even the most complex robots currently produced are incapable of understanding and applying the Three Laws; significant advances in artificial intelligence would be needed to do so, and even if AI could reach human-level intelligence, the inherent ethical complexity as well as cultural/contextual dependency of the laws prevent them from being a good candidate to formulate robotics design constraints" 154 | "On the other hand, Asimov's later novels The Robots of Dawn, Robots and Empire and Foundation and Earth imply that the robots inflicted their worst long-term harm by obeying the Three Laws perfectly well, thereby depriving humanity of inventive or risk-taking behaviour." 155 | "The futurist Hans Moravec (a prominent figure in the transhumanist movement) proposed that the Laws of Robotics should be adapted to ""corporate intelligences"" the corporations driven by AI and robotic manufacturing power which Moravec believes will arise in the near future." 156 | "In contrast, the David Brin novel Foundation's Triumph (1999) suggests that the Three Laws may decay into obsolescence." 157 | "Brin even portrays R. Daneel Olivaw worrying that, should robots continue to reproduce themselves, the Three Laws would become an evolutionary handicap and natural selection would sweep the Laws away." 158 | "Although the robots would not be evolving through design instead of mutation because the robots would have to follow the Three Laws while designing and the prevalence of the laws would be ensured,[53] design flaws or construction errors could functionally take the place of biological mutation." 159 | "Asimov himself believed that his Three Laws became the basis for a new view of robots which moved beyond the ""Frankenstein complex""" 160 | Stories written by other authors have depicted robots as if they obeyed the Three Laws but tradition dictates that only Asimov could quote the Laws explicitly. 161 | "Asimov believed the Three Laws helped foster the rise of stories in which robots are ""lovable"" – Star Wars being his favorite example." 162 | "Where the laws are quoted verbatim, such as in the Buck Rogers in the 25th Century episode ""Shgoratchx!"", it is not uncommon for Asimov to be mentioned in the same dialogue as can also be seen in the Aaron Stone pilot where an android states that it functions under Asimov's Three Laws." 163 | Asimov was delighted with Robby and noted that Robby appeared to be programmed to follow his Three Laws. 164 | The film Bicentennial Man (1999) features Robin Williams as the Three Laws robot NDR-114 (the serial number is partially a reference to Stanley Kubrick's signature numeral) 165 | "Williams recites the Three Laws to his employers, the Martin family, aided by a holographic projection. The film only loosely follows the original story." 166 | "Harlan Ellison's proposed screenplay for I, Robot began by introducing the Three Laws, and issues growing from the Three Laws form a large part of the screenplay's plot development" 167 | "A positronic brain is a fictional technological device, originally conceived by science fiction writer Isaac Asimov." 168 | "When Asimov wrote his first robot stories in 1939 and 1940, the positron was a newly discovered particle, and so the buzz word ""positronic"" added a scientific connotation to the concept." 169 | "Asimov's 1942 short story ""Runaround"" elaborates his fictional Three Laws of Robotics, which are ingrained in the positronic brains of nearly all of his robots." 
170 | "Positronic brains, as such, are a kind of brain made of positrons – small particles which help in the transmission of various thoughts and impulses to the brain and help the brain's cognition relay the selected emotion or solution." 171 | Asimov remained vague about the technical details of positronic brains except to assert that their substructure was formed from an alloy of platinum and iridium. 172 | "The focus of Asimov's stories was directed more towards the software of robotssuch as the Three Laws of Roboticsthan the hardware in which it was implemented, although it is stated in his stories that to create a positronic brain without the Three Laws, it would have been necessary to spend years redesigning the fundamental approach towards the brain itself." 173 | "Within his stories of robotics on Earth and their development by U.S. Robots, Asimov's positronic brain is less of a plot device and more of a technological item worthy of study." 174 | A positronic brain cannot ordinarily be built without incorporating the Three Laws; any modification thereof would drastically modify robot behavior. 175 | The Three Laws are also a bottleneck in brain sophistication 176 | "Very complex brains designed to handle world economy interpret the First Law in an expanded sense to include humanity as opposed to a single human; in Asimov's later works like Robots and Empire this is referred to as the ""Zeroth Law""" 177 | "At least one brain constructed as a calculating machine, as opposed to being a robot control circuit, was designed to have a flexible, childlike personality so that it was able to pursue difficult problems without the Three Laws inhibiting it completely." 178 | The sophistication of positronic circuitry renders a brain so small that it could comfortably fit within the skull of an insect. 179 | "It offers speed and capacity improvements over traditional positronic designs, but the strong influence of tradition make robotics labs reject Anshaw's work." 180 | "Only one roboticist, Fredda Leving, chooses to adopt gravitonics, because it offers her a blank slate on which she could explore alternatives to the Three Laws" 181 | "When Queen Allura of Venus (Mari Blanchard) puts Orville (Lou Costello) to a lie detector test in an ESP-enabled crystal chair, she states that it is ""based on the principle of the Positronic Brain.""" 182 | "In a mini story entitled ""Night Vision!"" in Annual #6 of the Marvel comic, writer Scot Edelman refers to the brain of the synthezoid ""The Vision"" as positronic." 183 | "Human space colonists examine ""dead"" Daleks and, upon their re-activation, conjecture as to ""what sort of positronic brain must this device possess""." 184 | "However, the Daleks are actually organic life-forms that were encased in robotic shells, and thus do not possess the purported positronic brain and, in any case, do not obey the Three Laws of Robotics." 185 | "In the seventeenth season (1979–80) story ""The Horns of Nimon"", the fourth incarnation of the Doctor, played by Tom Baker, recognizes the Labyrinth-like building complex that serves as the lair of the Nimons as resembling both physically and functionally a ""giant positronic circuit""." 186 | The creation was said to be controlled by a positronic brain. 187 | "Several fictional characters in Star Trek: The Next GenerationLieutenant Commander Data, his ""mother"" Julianna Soong Tainer, his daughter Lal, and his brothers Lore and B-4are androids equipped with positronic brains created by Dr. Noonien Soong." 
188 | """Positronic implants"" were used to replace lost function in Vedek Bareil's brain in the Deep Space 9 episode ""Life Support""." 189 | "In the German science fiction series Perry Rhodan (written starting in 1961), positronic brains (German: Positroniken) are the main computer technology; for quite a time they are replaced by the more powerful Syntronics, but those stop working due to the increased Hyperimpedance." 190 | The most powerful positronic brain is called NATHAN and covers large parts of the Earth's moon. 191 | "Many of the larger computers (including NATHAN) as well as the race of Posbis combine a biological component with the positronic brain, giving them sentience and creativity." 192 | "The robots in the 2004 film I, Robot (loosely based upon several of Isaac Asimov's stories) also have positronic brains." 193 | "Sonny, one of the main characters from the film, has two separate positronic brainsthe second being a positronic ""heart""so it has choices open to him the other robots in the film do not have." 194 | "The film also features a colossal positronic brain, VIKI, who is bound by the Three Laws." 195 | Sonny also has the possibility of being able to develop emotions and a sense of right and wrong independent of the Three Laws of Robotics; it has the ability to choose not to obey them. 196 | "The robots in the 1999 film Bicentennial Man (based on one of Asimov's stories) also have positronic brains, including the main character Andrew, an NDR series robot that starts to experience human characteristics such as creativity." 197 | "Only when Andrew allows his positronic brain to ""decay"", thereby willfully abandoning his immortality, is he declared a human being." 198 | "Twiki and Crichton, two robotic characters who appear in the Buck Rogers in the 25th Century television series, were equipped with positronic brains." 199 | "Crichton recited Asimov's ""Three Laws of Robotics"" upon activation." 200 | "In 1989, in the Mystery Science Theater 3000 Season One episode The Corpse Vanishes, Crow T. Robot and Tom Servo read an issue of Tiger Bot magazine featuring an interview with the Star Trek character, Data. They then lament the fact that they don't have positronic brains like him." 201 | "In the second episode, Spectreman's robot head is found and viewers discover he is a robot with a positronic brain." 202 | "The game Stellaris features Positronic Artificial Intelligence as a possible research goal, which is employed with ""Synthetics"" (sentient robotic beings) and sentient computers for usage in research, administration, combat etc." 203 | "In the game Space Station 13, players can research and construct positronic brains, and place them inside of AIs, cyborgs and even mechas" 204 | "A neural pathway is the connection formed by axons that project from neurons to make synapses onto neurons in another location, to enable a signal to be sent from one region of the nervous system to another." 205 | A neural pathway connects one part of the nervous system to another using bundles of axons called tracts. 206 | "Shorter neural pathways are found within grey matter in the brain, whereas longer projections, made up of myelinated axons, constitute white matter." 207 | "In the hippocampus there are neural pathways involved in its circuitry including the perforant pathway, that provides a connectional route from the entorhinal cortex[2] to all fields of the hippocampal formation, including the dentate gyrus, all CA fields (including CA1),[3] and the subiculum." 
208 | "Note that the ""old"" name was primarily descriptive, evoking the pyramids of antiquity, from the appearance of this neural pathway in the medulla oblongata." 209 | "The axon of a nerve cell is, in general, responsible for transmitting information over a relatively long distance, therefore, most neural pathways are made up of axons." 210 | "Neural pathways in the basal ganglia in the cortico-basal ganglia-thalamo-cortical loop, are seen as controlling different aspects of behaviour." 211 | It has been proposed that the dopamine system of pathways is the overall organiser of the neural pathways that are seen to be parallels of the dopamine pathways. 212 | Dopamine is provided both tonically and phasically in response to the needs of the neural pathways. 213 | Miguel Nicolelis and colleagues demonstrated that the activity of large neural ensembles can predict arm position 214 | Their BCI used high-density electrocorticography to tap neural activity from a patient's brain and used deep learning methods to synthesize speech. 215 | The use of BMIs has also led to a deeper understanding of neural networks and the central nervous system. 216 | "Beyond BCI systems that decode neural activity to drive external effectors, BCI systems may be used to encode signals from the periphery." 217 | "These sensory BCI devices enable real-time, behaviorally-relevant decisions based upon closed-loop neural stimulation." 218 | "The participant imagined moving his hand to write letters, and the system performed handwriting recognition on electrical signals detected in the motor cortex, utilizing hidden Markov models and recurrent neural networks for decoding." 219 | This proximity to motor cortex underlies the Stentrode's ability to measure neural activity. 220 | "The Stentrode communicates neural activity to a battery-less telemetry unit implanted in the chest, which communicates wirelessly with an external telemetry unit capable of power and data transfer." 221 | "Their study achieved word error rates of 3% (a marked improvement from prior publications) utilizing an encoder-decoder neural network, which translated ECoG data into one of fifty sentences composed of 250 unique words." 222 | The current focus of research is user-to-user communication through analysis of neural signals 223 | Researchers have built devices to interface with neural cells and entire neural networks in cultures outside animals. 224 | "After collection, the cortical neurons were cultured in a petri dish and rapidly began to reconnect themselves to form a living neural network." 225 | Flexible neural interfaces have been extensively tested in recent years in an effort to minimize brain tissue trauma related to mechanical mismatch between electrode and tissue 226 | "Walking robots simulate human or animal gait, as a replacement for wheeled motion" 227 | "A major goal in this field is in developing capabilities for robots to autonomously decide how, when, and where to move." 228 | "However, coordinating numerous robot joints for even simple matters, like negotiating stairs, is difficult." 229 | "Walking robots simulate human or animal gait, as a replacement for wheeled motion." 230 | "The robot functioned effectively, walking in several gait patterns and crawling with its high DoF legs" 231 | "Multiple legs allow several different gaits, even if a leg is damaged, making their movements more useful in robots transporting objects." 
232 | This is because an ideal rolling (but not slipping) wheel loses no energy 233 | "Coordinated, sequential mechanical action having the appearance of a traveling wave is called a metachronal rhythm or wave, and is employed in nature by ciliates for transport, and by worms and arthropods for locomotion" 234 | "Brachiation allows robots to travel by swinging, using energy only to grab and release surfaces" 235 | This motion is similar to an ape swinging from tree to tree. 236 | The two types of brachiation can be compared to bipedal walking motions (continuous contact) or running (ricochetal). 237 | "Continuous contact is when a hand/grasping mechanism is always attached to the surface being crossed; ricochetal employs a phase of aerial ""flight"" from one surface/limb to the next." 238 | "Thus robots of this nature need to be small, light, quick, and possess the ability to move in multiple locomotive modes." 239 | Robots can also be designed to perform locomotion in multiple modes. 240 | "Several robots capable of basic locomotion in a single mode have been invented but are found to lack several capabilities, hence limiting their functions and applications." 241 | "In addition, Pteromyini are able to exhibit multi-modal locomotion due to the membrane that connects the fore and hind legs which also enhances their gliding ability." 242 | Pteromyini are able to boost their gliding ability due to the numerous physical attributes they possess. 243 | "The common vampire bats are known to possess powerful modes of terrestrial locomotion, such as jumping, and aerial locomotion such as gliding." 244 | "Between the two modes of locomotion, there are three bones that are shared." 245 | "Since there already exists a sharing of components for both modes, no additional muscles are needed when transitioning from jumping to gliding" 246 | The desert locust is known for its ability to jump and fly over long distances as well as crawl on land. 247 | A detailed study of the anatomy of this organism provides some detail about the mechanisms for locomotion. 248 | A detailed study of the anatomy of this organism provides some detail about the mechanisms for locomotion. 249 | The hind legs of the locust are developed for jumping. 250 | "In order for a perfect jump to occur, the locust must push its legs on the ground with a strong enough force so as to initiate a fast takeoff." 251 | The force must be adequate enough in order to attain a quick takeoff and decent jump height. 252 | "In order to effectively transition from the jumping mode to the flying mode, the insect must adjust the time during the wing opening to maximize the distance and height of the jump." 253 | "When it is at the zenith of its jump, the flight mode becomes actuated." 254 | "Following the discovery of the requisite model to mimic, researchers sought to design a legged robot that was capable of achieving effective motion in aerial and terrestrial environments by the use of a flexible membrane." 255 | The membrane had to be flexible enough to allow for unrestricted movement of the legs during gliding and walking. 256 | The leg of the robot had to be designed to allow for appropriate torques for walking as well as gliding 257 | "Following the design of the leg and membrane of the robot, its average gliding ratio (GR) was determined to be 1.88." 258 | "The robot functioned effectively, walking in several gait patterns and crawling with its high DoF legs."
259 | These performances demonstrated the gliding and walking capabilities of the robot and its multi-modal locomotion 260 | "The design of the robot called Multi-Mo Bat involved the establishment of four primary phases of operation: energy storage phase, jumping phase, coasting phase, and gliding phase" 261 | The energy storing phase essentially involves storing the energy required for the jump. 262 | This process additionally creates a torque around the joint of the shoulders which in turn configures the legs for jumping 263 | "Once the stored energy is released, the jump phase can be initiated" 264 | "When the jump phase is initiated and the robot takes off from the ground, it transitions to the coast phase which occurs until the acme is reached and it begins to descend." 265 | "At this stage, the robot glides down." 266 | The robot designed was powered by a single DC motor which integrated the performances of jumping and flapping 267 | The primary feature of the robot's design was a gear system powered by a single motor which allowed the robot to perform its jumping and flapping motions. 268 | "Just like the motion of the locust, the motion of the robot is initiated by the flexing of the legs to the position of maximum energy storage after which the energy is released immediately to generate the force necessary to attain flight" 269 | The robot was tested for performance and the results demonstrated that the robot was able to jump to an approximate height of 0.9m while weighing 23g and flapping its wings at a frequency of about 19 Hz. 270 | "The robot tested without flapping wings performed less impressively, showing about 30% decrease in jumping performance as compared to the robot with the wings" 271 | These results are notable, as it would be expected that the reverse be the case since the weight of the wings should have impacted the jumping. 272 | "The unique feature of Asimov's robots is the Three Laws of Robotics, hardwired in a robot's positronic brain, with which all robots in his fiction must comply, and which ensure that the robot does not turn against its creators" 273 | """Victory Unintentional"" has positronic robots obeying the Three Laws, but also a non-human civilization on Jupiter." 274 | """Let's Get Together"" features humanoid robots, but from a different future (where the Cold War is still in progress), and with no mention of the Three Laws" 275 | "The Robot series is a series of 37 science fiction short stories and six novels by American writer Isaac Asimov, featuring positronic robots." 276 | """Mother Earth"" (1948) - short story, in which no individual robots appear, but positronic robots are part of the background" 277 | "Most of Asimov's robot short stories, which he began to write in 1939, are set in the first age of positronic robotics and space exploration." 278 | "The stories were not initially conceived as a set, but rather all feature his positronic robots; indeed, there are some inconsistencies among them, especially between the short stories and the novels." 279 | "It was the Zoromes, then, who were the spiritual ancestors of my own ""positronic robots,"" all of them, from Robbie to R. Daneel." 280 | "The 1989 anthology Foundation's Friends included the positronic robot stories ""Balance"" by Mike Resnick, ""Blot"" by Hal Clement, ""PAPPI"" by Sheila Finch, ""Plato's Cave"" by Poul Anderson, ""The Fourth Law of Robotics"" by Harry Harrison and ""Carhunters of the Concrete Prairie"" by Robert Sheckley."
281 | "Bicentennial Man (1999) was the first theatrical movie adaptation of any Asimov story or novel and was based on both Asimov's original short story of the same name (1976) and its novel expansion, The Positronic Man (1993)" 282 | "The book also contains the short story in which Asimov's Three Laws of Robotics first appear, which had large influence on later science fiction and had impact on thought on ethics of artificial intelligence as well." 283 | "Its plot incorporates elements of ""Little Lost Robot"",[8] some of Asimov's character names and the Three Laws." 284 | "In 2004 The Saturday Evening Post said that I, Robot's Three Laws ""revolutionized the science fiction genre and made robots far more interesting than they ever had been before.""" 285 | "In Aliens, a 1986 movie, the synthetic person Bishop paraphrases Asimov's First Law in the line: ""It is impossible for me to harm, or by omission of action allow to be harmed, a human being.""" 286 | "An episode of The Simpsons entitled ""I D'oh Bot"" (2004) has Professor Frink build a robot named ""Smashius Clay"" (also named ""Killhammad Aieee"") that follows all three of Asimov's laws of robotics." 287 | "Leela once told Bender to ""cover his ears"" so that he would not hear the robot-destroying paradox which she used to destroy Robot Santa (he punishes the bad, he kills people, killing is bad, therefore he must punish himself), causing a total breakdown; additionally, Bender has stated that he is Three Laws Safe." 288 | "The Indian science fiction film Endhiran, released in 2010, refers to Asimov's three laws for artificial intelligence for the fictional character Chitti: The Robot." 289 | "When a scientist takes in the robot for evaluation, the panel enquires whether the robot was built using the Three Laws of Robotics." 290 | "Upon their publication in this collection, Asimov wrote a framing sequence presenting the stories as Calvin's reminiscences during an interview with her about her life's work, chiefly concerned with aberrant behaviour of robots and the use of ""robopsychology"" to sort out what is happening in their positronic brain." 291 | "Two months after I read it, I began 'Robbie', about a sympathetic robot, and that was the start of my positronic robot series." 292 | "The positronic brain, which Asimov named his robots' central processors, is what powers Data from Star Trek: The Next Generation, as well as other Soong type androids. " 293 | "Positronic brains have been referenced in a number of other television shows including Doctor Who, Once Upon a Time... Space, Perry Rhodan, The Number of the Beast, and others." 294 | "In ""Someday"" there are non-positronic computers which tell stories and do not obey the Three Laws." 295 | "In ""Sally"" there are positronic brain cars who can damage men or disobey without problems. No other kinds of robots are seen, and there is no mention of the Three Laws." 296 | "In "". . . That Thou Art Mindful of Him"" robots are created with a very flexible Three Laws management, and these create little, simplified robots with no laws that actually act against the Three Laws of Robotics." 297 | "Andrew uses the money to pay for bodily upgrades, keeping himself in perfect shape, but never has his positronic brain altered." 298 | The first scene of the story is explained as Andrew seeks out a robotic surgeon to perform an ultimately fatal operation: altering his positronic brain so that it will decay with time. 
299 | "his story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 300 | "Sir reveals that U.S. Robots has ended a study on generalized pathways and creative robots, frightened by Andrew's unpredictability." 301 | "However, the robot refuses, as the operation is harmful and violates the First Law of Robotics, which says a robot may never harm a human being." 302 | "The Positronic Man is a 1992 novel by American writers Isaac Asimov and Robert Silverberg, based on Asimov's 1976 novelette ""The Bicentennial Man""." 303 | In the twenty-first century the creation of the positronic brain leads to the development of robot laborers and revolutionizes life on Earth. 304 | "In The Positronic Man, the trends of fictional robotics in Asimov's Robot series (as outlined in the book I, Robot) are detailed as background events, with an indication that they are influenced by Andrew's story." 305 | "Only when Andrew allows his positronic brain to ""decay"", thereby willfully abandoning his immortality, is he declared a human being." 306 | "This story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 307 | "No individual robots appear, but positronic robots are part of the background." 308 | "Earth faces a confrontation with its colonies, the ""Outer Worlds."" A historian looks back and sees the problem beginning a century and a half earlier, when Aurora got permission to ""introduce positronic robots into their community life.""" 309 | "The only witness is a malfunctioning house robot that has suffered damage to its positronic brain because it allowed harm to be done to a human, in violation of the First Law." 310 | "Ultimately, it is revealed that Delmarre's neighbor, roboticist Jothan Leebig, was working on putting positronic brains in spaceships." 311 | "Leebig poisoned Gruer by tricking his robots, using his knowledge of positronic brains, into putting poison into Gruer's drink." 312 | "This would negate the First Law, as such ships would not recognize that humans usually inhabit ships, and would therefore be able to attack and destroy other ships without regard for their crews." 313 | "R. Daneel and R. Giskard discover the roboticists' plan and attempt to stop Amadiro; but are hampered by the First Law of Robotics," 314 | "Daneel and Giskard, meanwhile, have inferred an additional Zeroth Law of Robotics: A robot may not injure humanity, or through inaction, allow humanity to come to harm." 315 | "It might enable them to overcome Amadiro, if they can use their telepathic perception of humanity to quell the inhibitions of the first law" 316 | "After Amadiro admits their plans, Giskard alters Amadiro's brain (using the newly created Zeroth Law); but in so doing, threatens his own." 317 | "Under the stress of having violated the First Law (in accordance with the Zeroth Law, but with the predicted benefit to humanity being uncertain), R. Giskard himself suffers a soon-fatal malfunction of his positronic brain but manages to confer his telepathic ability upon R. Daneel." 318 | "Dave Langford reviewed Robots and Empire for White Dwarf #85, and stated that ""Asimov always perks up when chopping logic with the Three Laws of Robotics, and here his robots come up with a Fourth, or rather Zeroth, Law." 319 | "In the novel, Asimov depicts the transition from his earlier Milky Way Galaxy, inhabited by both human beings and positronic robots, to his Galactic Empire." 
320 | "Gladia is accompanied by the positronic robots R. Daneel Olivaw and R. Giskard Reventlov, both the former property of their creator, Dr. Han Fastolfe, who bequeathed them to Gladia in his will. R. Giskard has secret telepathic powers of which only R. Daneel knows." 321 | "The electrical aspect of robots is used for movement (through motors), sensing (where electrical signals are used to measure things like heat, sound, position, and energy status) and operation (robots need some level of electrical energy supplied to their motors and sensors in order to activate and perform basic operations)" 322 | "Actuators are the ""muscles"" of a robot, the parts which convert stored energy into movement." 323 | "Scientists from several European countries and Israel developed a prosthetic hand in 2009, called SmartHand, which functions like a real oneallowing patients to write with it, type on a keyboard, play piano and perform other fine movements" 324 | "As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 325 | "A quadruped was also demonstrated which could trot, run, pace, and bound" 326 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability" 327 | "In some of Asimov's other works, he states that the first use of the word robotics was in his short story Runaround (Astounding Science Fiction, March 1942),[4][5] where he introduced his concept of The Three Laws of Robotics" 328 | "Its limb control system allowed it to walk with the lower limbs, and to grip and transport objects with hands, using tactile sensors" 329 | "There are three different types of robotic programs: remote control, artificial intelligence and hybrid." 330 | "Robots that use artificial intelligence interact with their environment on their own without a control source, and can determine reactions to objects and problems they encounter using their preexisting programming." 331 | "Several robots have been made which can walk reliably on two legs, however, none have yet been made which are as robust as a human." 332 | "Many other robots have been built that walk on more than two legs, due to these robots being significantly easier to construct." 333 | "Walking robots can be used for uneven terrains, which would provide better mobility and energy efficiency than other locomotion methods." 334 | "Typically, robots on two legs can walk well on flat floors and can occasionally walk up stairs." 335 | "Several robots, built in the 1980s by Marc Raibert at the MIT Leg Laboratory, successfully demonstrated very dynamic walking. " 336 | "Initially, a robot with only one leg, and a very small foot could stay upright simply by hopping." 337 | " As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 338 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability." 
339 | "This technique was recently demonstrated by Anybots' Dexter Robot,[99] which is so stable, it can even jump" 340 | Perhaps the most promising approach utilizes passive dynamics where the momentum of swinging limbs is used for greater efficiency 341 | "It has been shown that totally unpowered humanoid mechanisms can walk down a gentle slope, using only gravity to propel themselves." 342 | "Using this technique, a robot need only supply a small amount of motor power to walk along a flat surface or a little more to walk up a hill" 343 | One approach mimics the movements of a human climber on a wall with protrusions; adjusting the center of mass and moving each limb in turn to gain leverage. 344 | "Science fiction authors also typically assume that robots will eventually be capable of communicating with humans through speech, gestures, and facial expressions, rather than a command-line interface." 345 | "Evolutionary robots is a methodology that uses evolutionary computation to help design robots, especially the body form, or motion and behavior controllers." 346 | "Direct kinematics or forward kinematics refers to the calculation of end effector position, orientation, velocity, and acceleration when the corresponding joint values are known." 347 | "Inverse kinematics refers to the opposite case in which required joint values are calculated for given end effector values, as done in path planning." 348 | "Once all relevant positions, velocities, and accelerations have been calculated using kinematics, methods from the field of dynamics are used to study the effect of forces upon these movements." 349 | "Normal human gait is a complex process, which happens due to co-ordinated movements of the whole of the body, requiring the whole of Central Nervous System - the brain and spinal cord, to function properly." 350 | The most common cause for gait impairment is due to an injury of one or both legs. 351 | "Gait training is not simply re-educating a patient on how to walk, but also includes an initial assessment of their gait cycle - Gait analysis, creation of a plan to address the problem, as well as teaching the patient on how to walk on different surfaces." 352 | "Assistive devices and splints (orthosis) are often used in gait training, especially with those who have had surgery or an injury on their legs, but also with those who have balance or strength impairments as well." 353 | "Although gait training with parallel bars, treadmills and support systems can be beneficial, the long-term aim of gait training is usually to reduce patients' dependence on such technology in order to walk more in their daily lives." 354 | "A gait cycle is defined as the progression of movements that occurs before one leg can return to a certain position during walking, or ambulation." 355 | The gait cycle is studied in two phases - Swing and stance phase. 356 | Any gait training addressing a gait abnormality starts with a proper gait analysis. 357 | "The gait consists of a series of repetitive movements of the whole body during locomotion and is studied considering that each gait cycle repeats over itself, which is almost correct considering normal subjects." 358 | "The basic two phases are swing and stance phases, depending on whether the leg is free to swing or is in contact with the ground during the phase of gait studied." 359 | The stance phase is approximately 60% of the gait cycle and takes about 0.6 seconds to complete at a normal walking speed. 
360 | "The swing phase occurs when the foot is not in contact with the ground, and constitutes about 40% of the gait cycle." 361 | The two point gait pattern requires a high level of coordination and balance 362 | "Recently, electromechanical devices such as the Hocoma Lokomat robot-driven gait orthosis have been introduced with the intention of reducing the physical labour demands on therapists." 363 | "Treadmill training, with or without a body-weight support, is an emerging therapy and is being used with stroke patients to improve kinematic gait parameters" 364 | Research has shown that this form of gait training demonstrates a more normal walking pattern without the compensatory movements commonly associated with stroke 365 | Determining the movement of a robot so that its end-effectors move from an initial configuration to a desired configuration is known as motion planning 366 | "The movement of a kinematic chain, whether it is a robot or an animated character, is modeled by the kinematics equations of the chain." 367 | Movement of one element requires the computation of the joint angles for the other elements to maintain the joint constraints. 368 | "For example, inverse kinematics allows an artist to move the hand of a 3D human model to a desired position and orientation and have an algorithm select the proper angles of the wrist, elbow, and shoulder joints." 369 | "Isaac Asimov considered the issue in the 1950s in his I, Robot. At the insistence of his editor John W. Campbell Jr., he proposed the Three Laws of Robotics to govern artificially intelligent systems." 370 | "Much of his work was then spent testing the boundaries of his three laws to see where they would break down, or where they would create paradoxical or unanticipated behavior." 371 | "A panel convened by the United Kingdom in 2010 revised Asimov's laws to clarify that AI is the responsibility either of its manufacturers, or of its owner/operator." 372 | "The movies Bicentennial Man and A.I. deal with the possibility of sentient robots that could love. I, Robot explored some aspects of Asimov's three laws." 373 | The Three Laws of Robotics (often shortened to The Three Laws or known as Asimov's Laws) are a set of rules devised by science fiction author Isaac Asimov. 374 | "The Three Laws, quoted from the ""Handbook of Robotics, 56th Edition, 2058 A.D."", are: First Law -A robot may not injure a human being or, through inaction, allow a human being to come to harm., Second Law - A robot must obey the orders given it by human beings except where such orders would conflict with the First Law., Third Law -A robot must protect its own existence as long as such protection does not conflict with the First or Second Law" 375 | "The Laws are incorporated into almost all of the positronic robots appearing in his fiction, and cannot be bypassed, being intended as a safety feature." 376 | Many of Asimov's robot-focused stories involve robots behaving in unusual and counter-intuitive ways as an unintended consequence of how the robot applies the Three Laws to the situation in which it finds itself. 377 | The original laws have been altered and elaborated on by Asimov and other authors. 378 | "Asimov also added a fourth, or zeroth law, to precede the others: A robot may not harm humanity, or, by inaction, allow humanity to come to harm." 379 | "The Three Laws, and the zeroth, have pervaded science fiction and are referred to in many books, films, and other media." 380 | "Asimov attributes the Three Laws to John W. 
Campbell, from a conversation that took place on 23 December 1940." 381 | Campbell claimed that Asimov had the Three Laws already in his mind and that they simply needed to be stated explicitly. 382 | "According to his autobiographical writings, Asimov included the First Law's ""inaction"" clause because of Arthur Hugh Clough's poem ""The Latest Decalogue"" (text in Wikisource), which includes the satirical lines ""Thou shalt not kill, but needst not strive / officiously to keep alive""." 383 | "Although Asimov pins the creation of the Three Laws on one particular date, their appearance in his literature happened over a period." 384 | "He wrote two robot stories with no explicit mention of the Laws, ""Robbie"" and ""Reason""." 385 | "He assumed, however, that robots would have certain inherent safeguards. ""Liar!"", his third robot story, makes the first mention of the First Law but not the other two." 386 | "When these stories and several others were compiled in the anthology I, Robot, ""Reason"" and ""Robbie"" were updated to acknowledge all the Three Laws, though the material Asimov added to ""Reason"" is not entirely consistent with the Three Laws as he described them elsewhere" 387 | "In his short story ""Evidence"" Asimov lets his recurring character Dr. Susan Calvin expound a moral basis behind the Three Laws." 388 | "Calvin points out that human beings are typically expected to refrain from harming other human beings (except in times of extreme duress like war, or to save a greater number) and this is equivalent to a robot's First Law" 389 | "Likewise, according to Calvin, society expects individuals to obey instructions from recognized authorities such as doctors, teachers and so forth which equals the Second Law of Robotics." 390 | Finally humans are typically expected to avoid harming themselves which is the Third Law for a robot. 391 | "The plot of ""Evidence"" revolves around the question of telling a human being apart from a robot constructed to appear human – Calvin reasons that if such an individual obeys the Three Laws he may be a robot or simply ""a very good man""" 392 | "Asimov later wrote that he should not be praised for creating the Laws, because they are ""obvious from the start, and everyone is aware of them subliminally." 393 | The Laws just never happened to be put into brief sentences until I managed to do the job. 394 | "I have my answer ready whenever someone asks me if I think that my Three Laws of Robotics will actually be used to govern the behavior of robots, once they become versatile and flexible enough to be able to choose among different courses of behavior." 395 | "My answer is, ""Yes, the Three Laws are the only way in which rational human beings can deal with robots, or with anything else.""" 396 | Asimov's stories test his Three Laws in a wide variety of circumstances leading to proposals and rejection of modifications. 397 | "Science fiction scholar James Gunn writes in 1982, ""The Asimov robot stories as a whole may respond best to an analysis on this basis: the ambiguity in the Three Laws and the ways in which Asimov played twenty-nine variations upon a theme""" 398 | "Removing the First Law's ""inaction"" clause solves this problem but creates the possibility of an even greater one: a robot could initiate an action that would harm a human (dropping a heavy weight and failing to catch it is the example given in the text), knowing that it was capable of preventing the harm and then decide not to do so."
399 | "Gaia is a planet with collective intelligence in the Foundation series which adopts a law similar to the First Law, and the Zeroth Law, as its philosophy: Gaia may not harm life or allow life to come to harm." 400 | "Three times during his writing career, Asimov portrayed robots that disregard the Three Laws entirely." 401 | "On the other hand, the short story ""Cal"" (from the collection Gold), told by a first-person robot narrator, features a robot who disregards the Three Laws because he has found something far more importanthe wants to be a writer." 402 | "The third is a short story entitled ""Sally"" in which cars fitted with positronic brains are apparently able to harm and kill humans in disregard of the First Law." 403 | "However, aside from the positronic brain concept, this story does not refer to other robot stories and may not be set in the same continuity." 404 | Without the basic theory of the Three Laws the fictional scientists of Asimov's universe would be unable to design a workable brain unit. 405 | "The character Dr. Gerrigel uses the term ""Asenion"" to describe robots programmed with the Three Laws." 406 | --------------------------------------------------------------------------------