├── static ├── subnets.png ├── aim317-sm-arch-1.jpg ├── aim317-sm-arch-2.jpg ├── aim317-sm-arch-3.jpg ├── aim317-sm-arch-4.jpg ├── aim317-sm-arch-5.jpg ├── AIM317 Diagram - A1.png ├── AIM317 Solution Flow.png └── aim317-sm-arch-full.jpg ├── notebooks ├── 1-Transcribe-Translate-Calls │ ├── input │ │ ├── audio-recordings │ │ │ ├── AIM317-Call1-EN.m4a │ │ │ ├── AIM317-Call1-ES.m4a │ │ │ ├── AIM317-Call2-EN.m4a │ │ │ ├── AIM317-Call3-EN.m4a │ │ │ ├── AIM317-Call4-EN.m4a │ │ │ ├── AIM317-Call5-EN.m4a │ │ │ └── AIM317-Call5-ES.m4a │ │ ├── translate-custom-terminology.txt │ │ ├── custom-vocabulary-ES.txt │ │ ├── custom-vocabulary-EN.txt │ │ └── translate-parallel-data.txt │ └── AIM317-reInvent2021-transcribe-and-translate-customer-calls.ipynb ├── 5-Visualize-Insights │ ├── quicksight_raw_manifest.json │ └── AIM317-reInvent2021-prepare-quicksight-inputs.ipynb ├── 4-Detect-Call-Sentiment │ └── AIM317-reInvent2021-detect-customer-sentiment.ipynb ├── 2-Train-Detect-Entities │ ├── annotations.csv │ ├── AIM317-reInvent2021-train-and-detect-entities.ipynb │ └── train.csv └── 3-Train-Classify-Calls │ └── AIM317-reInvent2021-train-and-classify-customer-calls.ipynb ├── CODE_OF_CONDUCT.md ├── src ├── importTerminology.py ├── paginateProcessDataTrainTestFiles.py ├── createDocumentClassifier.py ├── startSentimentDetection.py ├── createEntityRecognizer.py ├── startTranscriptionJob.py ├── createVocabulary.py ├── createEndpoint.py ├── classifyDocument.py ├── detectEntities.py ├── buildTrainTest.py └── translateText.py ├── LICENSE ├── NOTICE ├── cloudformation ├── sagemakerNotebookTemplate.yaml ├── sagemakerNotebookEventEngineTemplate.yaml └── aim317Template.yml ├── CONTRIBUTING.md └── README.md /static/subnets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/subnets.png -------------------------------------------------------------------------------- /static/aim317-sm-arch-1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-1.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-2.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-3.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-4.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-4.jpg -------------------------------------------------------------------------------- /static/aim317-sm-arch-5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-5.jpg 
-------------------------------------------------------------------------------- /static/AIM317 Diagram - A1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/AIM317 Diagram - A1.png -------------------------------------------------------------------------------- /static/AIM317 Solution Flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/AIM317 Solution Flow.png -------------------------------------------------------------------------------- /static/aim317-sm-arch-full.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/static/aim317-sm-arch-full.jpg -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-ES.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call1-ES.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call2-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call2-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call3-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call3-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call4-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call4-EN.m4a -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-EN.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-EN.m4a 
-------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-ES.m4a: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aim317-uncover-insights-customer-conversations/main/notebooks/1-Transcribe-Translate-Calls/input/audio-recordings/AIM317-Call5-ES.m4a -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /notebooks/5-Visualize-Insights/quicksight_raw_manifest.json: -------------------------------------------------------------------------------- 1 | { 2 | "fileLocations": [ 3 | { 4 | "URIPrefixes": [ 5 | "s3://bucket/prefix" 6 | ] 7 | } 8 | ], 9 | "globalUploadSettings": { 10 | "format": "CSV", 11 | "delimiter": ",", 12 | "textqualifier": "'", 13 | "containsHeader": "true" 14 | } 15 | } 16 | -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/translate-custom-terminology.txt: -------------------------------------------------------------------------------- 1 | es,en 2 | Buenas noches, Good evening 3 | este es Easytron, this is Easytron 4 | necesitado, in need 5 | bots de Trantor, Trantor robots 6 | chocolate para mi hijo, choclate milkshake 7 | tres leyes de la robótica, three laws of robotics 8 | tres leyes, three laws 9 | noches, evening 10 | licuado, milkshake 11 | bots, robots 12 | robo, robot 13 | bot, robot 14 | -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/custom-vocabulary-ES.txt: -------------------------------------------------------------------------------- 1 | Phrase IPA SoundsLike DisplayAs 2 | ahí-Sí-Tron Easytron 3 | necesita necesito 4 | cumple compre 5 | quise-ofenderlo no-quise-ofenderlo 6 | Kuerten cubren 7 | otro-mes-ticos domésticos 8 | sus-bocinas subrutinas 9 | Ecuador licuado 10 | cuadro licuado 11 | y-Citron Easytron 12 | temes-EP-cinco-mil MCP-5000 13 | MSP-cinco-milantes MCP-5000 14 | Sí-sitúan Easytron 15 | robadas robot 16 | robo robot 17 | bot robot -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/custom-vocabulary-EN.txt: -------------------------------------------------------------------------------- 1 | Phrase IPA SoundsLike DisplayAs 2 | cyberpunk CyberPunk 3 | easy-drawn EasyTron 4 | emotional-intelligence Emotional Intelligence 5 | galactic-empire Galactic Empire 6 | is-it-ron EasyTron 7 | kinematics Kinematics 8 | nightfall Nightfall 9 | positronic-brain Positronic Brain 10 | psychohistory Psychohistory 11 | robot Robot 12 | robot-ethics Robot Ethics 13 | three-laws Three Laws 14 | transistor Trantor 15 | trent-or Trantor 16 | worship version -------------------------------------------------------------------------------- /src/importTerminology.py: 
-------------------------------------------------------------------------------- 1 | import boto3 2 | 3 | def lambda_handler(event, context): 4 | 5 | record = event['Records'][0] 6 | 7 | print("Record: " + str(record)) 8 | 9 | s3bucket = record['s3']['bucket']['name'] 10 | s3object = record['s3']['object']['key'] 11 | 12 | s3Path = "s3://" + s3bucket + "/" + s3object 13 | 14 | s3_resource = boto3.resource('s3') 15 | 16 | temp = s3_resource.Object(s3bucket, s3object) 17 | term_file = temp.get()['Body'].read().decode('utf-8') 18 | 19 | client = boto3.client('translate') 20 | 21 | print("S3 Path:" + s3Path) 22 | 23 | response = client.import_terminology( 24 | Name="aim317-custom-terminology", 25 | MergeStrategy='OVERWRITE', 26 | TerminologyData={ 27 | 'File': term_file, 28 | 'Format': 'CSV' 29 | }, 30 | ) 31 | 32 | return { 33 | 'TerminologyName': response['TerminologyProperties']['Name'] 34 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
15 | 16 | -------------------------------------------------------------------------------- /src/paginateProcessDataTrainTestFiles.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import os 3 | import io 4 | import pandas as pd 5 | 6 | def lambda_handler(event, context): 7 | 8 | s3 = boto3.client('s3') 9 | raw_data = s3.get_object(Bucket=os.environ['comprehendBucket'], Key='comprehend/train/aim317-cust-class-train-data.csv') 10 | raw_content = pd.read_csv(io.BytesIO(raw_data['Body'].read())) 11 | print(raw_content) 12 | raw_content['label'] = raw_content['label'].astype(str) 13 | selected_columns = ['label', 'text'] 14 | selected_data = raw_content[selected_columns] 15 | 16 | DSTTRAINFILE='/tmp/comprehend-train.csv' 17 | 18 | selected_data.to_csv(path_or_buf=DSTTRAINFILE, 19 | header=False, 20 | index=False, 21 | escapechar='\\', 22 | doublequote=False, 23 | quotechar='"') 24 | 25 | s3 = boto3.client('s3') 26 | prefix = 'comprehend-custom-classifier' 27 | bucket = os.environ['comprehendBucket'] 28 | 29 | s3.upload_file(DSTTRAINFILE, bucket, prefix+'/comprehend-train.csv') -------------------------------------------------------------------------------- /src/createDocumentClassifier.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | DSTTRAINFILE='comprehend-train.csv' 8 | s3_train_data = 's3://{}/{}/{}'.format(os.environ['classifierBucket'], os.environ['classifierBucketPrefix'], DSTTRAINFILE) 9 | s3_output_job = 's3://{}/{}/{}'.format(os.environ['classifierBucket'], os.environ['classifierBucket'], 'output/train_job') 10 | print('training data location: ',s3_train_data, "output location:", s3_output_job) 11 | 12 | uid = str(uuid.uuid4()) 13 | comprehend = boto3.client('comprehend') 14 | 15 | training_job = comprehend.create_document_classifier( 16 | DocumentClassifierName='aim317-custom-classifier-' + uid, 17 | DataAccessRoleArn=os.environ['ComprehendARN'], 18 | InputDataConfig={ 19 | 'S3Uri': s3_train_data 20 | }, 21 | OutputDataConfig={ 22 | 'S3Uri': s3_output_job 23 | }, 24 | LanguageCode='en', 25 | VersionName='v001' 26 | ) 27 | 28 | return { 29 | 'DocumentClassifierArn': training_job['DocumentClassifierArn'] 30 | } -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | aim317-uncover-insights-customer-conversations 2 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | 4 | This project uses training data created from content on Wikipedia licensed under CC-BY-SA-3.0. 
5 | # https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License 6 | # notebooks/2-Train-Detect-Entities/train.csv 7 | * https://en.wikipedia.org/wiki/Robot_series 8 | * https://en.wikipedia.org/wiki/I,_Robot 9 | * https://en.wikipedia.org/wiki/The_Complete_Robot 10 | * https://en.wikipedia.org/wiki/The_Bicentennial_Man 11 | * https://en.wikipedia.org/wiki/The_Positronic_Man 12 | * https://en.wikipedia.org/wiki/Mother_Earth_(novella) 13 | * https://en.wikipedia.org/wiki/The_Naked_Sun 14 | * https://en.wikipedia.org/wiki/Robots_and_Empire 15 | * https://en.wikipedia.org/wiki/Robotics 16 | * https://en.wikipedia.org/wiki/Gait_training 17 | * https://en.wikipedia.org/wiki/Inverse_kinematics 18 | * https://en.wikipedia.org/wiki/Ethics_of_artificial_intelligence 19 | * https://en.wikipedia.org/wiki/Three_Laws_of_Robotics 20 | * https://en.wikipedia.org/wiki/Positronic_brain 21 | * https://en.wikipedia.org/wiki/Neural_pathway 22 | * https://en.wikipedia.org/wiki/Brain%E2%80%93computer_interface 23 | * https://en.wikipedia.org/wiki/Robot_locomotion 24 | -------------------------------------------------------------------------------- /src/startSentimentDetection.py: -------------------------------------------------------------------------------- 1 | import awswrangler as wr 2 | import pandas as pd 3 | import boto3 4 | import os 5 | 6 | def lambda_handler(event, context): 7 | 8 | client = boto3.client('comprehend') 9 | 10 | inprefix = 'comprehendInput' 11 | outprefix = 'quicksight/temp/insights' 12 | 13 | comprehend = boto3.client('comprehend') 14 | s3 = boto3.client('s3') 15 | s3_resource = boto3.resource('s3') 16 | 17 | paginator = s3.get_paginator('list_objects_v2') 18 | pages = paginator.paginate(Bucket=os.environ['ComprehendBucket'], Prefix=inprefix) 19 | job_name_list = [] 20 | t_prefix = 'quicksight/data/sentiment' 21 | 22 | cols = ['transcript_name', 'sentiment'] 23 | df_sent = pd.DataFrame(columns=cols) 24 | 25 | for page in pages: 26 | for obj in page['Contents']: 27 | transcript_file_name = obj['Key'].split('/')[1] 28 | temp = s3_resource.Object(os.environ['ComprehendBucket'], obj['Key']) 29 | transcript_contents = temp.get()['Body'].read().decode('utf-8') 30 | response = comprehend.detect_sentiment(Text=transcript_contents, LanguageCode='en') 31 | df_sent.loc[len(df_sent.index)] = [transcript_file_name.strip('en-').strip('.txt'),response['Sentiment']] 32 | 33 | wr.s3.to_csv(df_sent, path='s3://' + os.environ['ComprehendBucket'] + '/' + t_prefix + '/' + 'sentiment.csv') 34 | -------------------------------------------------------------------------------- /src/createEntityRecognizer.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | jobName = "aim317-recognizer" + '-' + str(uuid.uuid4()) 8 | 9 | client = boto3.client('comprehend') 10 | 11 | s3TrainingBucket = os.environ['ComprehendAnnotationBucket'] 12 | s3AnnotationBucket = os.environ['ComprehendAnnotationBucket'] 13 | 14 | response = client.create_entity_recognizer( 15 | RecognizerName=jobName, 16 | DataAccessRoleArn=os.environ['ComprehendARN'], 17 | InputDataConfig={ 18 | 'DataFormat': 'COMPREHEND_CSV', 19 | "EntityTypes": [ 20 | { 21 | "Type": "MOVEMENT" 22 | }, 23 | { 24 | "Type": "BRAIN" 25 | }, 26 | { 27 | "Type": "ETHICS" 28 | } 29 | ], 30 | 'Documents': { 31 | 'S3Uri': "s3://" + s3TrainingBucket + "/comprehend/train/train.csv", 32 | 'InputFormat': 
'ONE_DOC_PER_LINE' 33 | }, 34 | 'Annotations': { 35 | 'S3Uri': "s3://" + s3AnnotationBucket + "/comprehend/train/annotations.csv", 36 | } 37 | }, 38 | LanguageCode='en', 39 | VersionName= 'v001' 40 | ) 41 | 42 | return { 43 | 'EntityRecognizerArn': response['EntityRecognizerArn'] 44 | } -------------------------------------------------------------------------------- /src/startTranscriptionJob.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import json 4 | import os 5 | 6 | def lambda_handler(event, context): 7 | 8 | record = event['Records'][0] 9 | 10 | print("Record: " + str(record)) 11 | 12 | s3bucket = record['s3']['bucket']['name'] 13 | s3object = record['s3']['object']['key'] 14 | s3fileName = record['s3']['object']['key'].split("/")[-1] 15 | 16 | s3Path = "s3://" + s3bucket + "/" + s3object 17 | jobName = s3fileName + '-' + str(uuid.uuid4()) 18 | 19 | client = boto3.client('transcribe') 20 | 21 | vocabLanguage = s3object.split('-')[2].split('.')[0] 22 | if vocabLanguage == "EN": 23 | vocabLanguage = 'en-US' 24 | vocabularyName = os.environ['ENVocabularyName'] 25 | elif vocabLanguage == "ES": 26 | vocabLanguage = 'es-US' 27 | vocabularyName = os.environ['ESVocabularyName'] 28 | 29 | response = client.start_transcription_job( 30 | TranscriptionJobName=jobName, 31 | LanguageCode=vocabLanguage, 32 | Settings = {'VocabularyName': vocabularyName, 33 | 'ShowSpeakerLabels': True, 34 | 'MaxSpeakerLabels': 2}, 35 | Media={ 36 | 'MediaFileUri': s3Path 37 | }, 38 | OutputBucketName = os.environ['outputBucket'], 39 | OutputKey = os.environ['outputKey'] + s3fileName.split(".")[0] + "-transcription" 40 | ) 41 | 42 | return { 43 | 'TranscriptionJobName': response['TranscriptionJob']['TranscriptionJobName'] 44 | } -------------------------------------------------------------------------------- /src/createVocabulary.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import uuid 3 | import json 4 | import os 5 | 6 | def lambda_handler(event, context): 7 | 8 | record = event['Records'][0] 9 | 10 | s3bucket = record['s3']['bucket']['name'] 11 | s3object = record['s3']['object']['key'] 12 | 13 | print(s3object.split(".")[0].split("-")[2]) 14 | 15 | if s3object.split(".")[0].split("-")[2] == "EN": 16 | 17 | s3Path = "s3://" + s3bucket + "/" + s3object 18 | VocabName = "custom-vocab-EN-" + str(uuid.uuid4()) 19 | 20 | client = boto3.client('transcribe') 21 | 22 | print("S3 Path:" + s3Path) 23 | 24 | response = client.create_vocabulary( 25 | VocabularyName=VocabName, 26 | LanguageCode='en-US', 27 | VocabularyFileUri = s3Path, 28 | ) 29 | 30 | return { 31 | 'VocabularybName': response['VocabularyName'] 32 | } 33 | 34 | elif s3object.split(".")[0].split("-")[2] == "ES": 35 | s3Path = "s3://" + s3bucket + "/" + s3object 36 | VocabName = "custom-vocab-ES-" + str(uuid.uuid4()) 37 | 38 | client = boto3.client('transcribe') 39 | 40 | print("S3 Path:" + s3Path) 41 | 42 | response = client.create_vocabulary( 43 | VocabularyName=VocabName, 44 | LanguageCode='es-ES', 45 | VocabularyFileUri = s3Path, 46 | ) 47 | 48 | return { 49 | 'VocabularybName': response['VocabularyName'] 50 | } 51 | 52 | else: 53 | 54 | return { 55 | 'ErrorCode': "Language not in filename, must end in EN or ES" 56 | } -------------------------------------------------------------------------------- /src/createEndpoint.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | 
import uuid 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | ## Get Model ARN depending on argument passed 8 | 9 | client = boto3.client('comprehend') 10 | 11 | requestParams = event['endpointType'] 12 | 13 | if requestParams == "EntityRecognizer": 14 | endpointName = "aim317-entity-recognizer" + '-' + str(uuid.uuid4())[:8] 15 | response = client.list_entity_recognizers( 16 | Filter={ 17 | 'Status': 'TRAINED', 18 | } 19 | ) 20 | if not response['EntityRecognizerPropertiesList']: 21 | return "No models trained, please check the Comprehend dashboard" 22 | 23 | modelARN = response['EntityRecognizerPropertiesList'][0]['EntityRecognizerArn'] 24 | 25 | elif requestParams == "DocumentClassifier": 26 | endpointName = "aim317-document-classifier" + '-' + str(uuid.uuid4())[:8] 27 | 28 | response = client.list_document_classifiers( 29 | Filter={ 30 | 'Status': 'TRAINED', 31 | } 32 | ) 33 | if not response['DocumentClassifierPropertiesList']: 34 | return "No models trained, please check the Comprehend dashboard" 35 | 36 | modelARN = response['DocumentClassifierPropertiesList'][0]['DocumentClassifierArn'] 37 | 38 | response = client.create_endpoint( 39 | EndpointName=endpointName, 40 | ModelArn=modelARN, 41 | DesiredInferenceUnits=4, 42 | DataAccessRoleArn=os.environ['ComprehendARN'] 43 | ) 44 | 45 | return { 46 | 'EndpointArn': response['EndpointArn'] 47 | } -------------------------------------------------------------------------------- /src/classifyDocument.py: -------------------------------------------------------------------------------- 1 | import awswrangler as wr 2 | import pandas as pd 3 | import boto3 4 | import os 5 | 6 | 7 | def lambda_handler(event, context): 8 | 9 | s3 = boto3.client('s3') 10 | s3_resource = boto3.resource('s3') 11 | comprehend = boto3.client('comprehend') 12 | t_prefix = 'quicksight/data/cta' 13 | 14 | paginator = s3.get_paginator('list_objects_v2') 15 | pages = paginator.paginate(Bucket=os.environ['classifierBucket'], Prefix='comprehendInput') 16 | a = [] 17 | 18 | cols = ['transcript_name', 'cta_status'] 19 | df_class = pd.DataFrame(columns=cols) 20 | 21 | comprehendEndpoint = comprehend.list_endpoints( 22 | Filter={ 23 | 'Status': 'IN_SERVICE', 24 | } 25 | ) 26 | 27 | for item in comprehendEndpoint.get('EndpointPropertiesList'): 28 | if 'document-classifier-endpoint' in item['EndpointArn']: 29 | endpointArn = item['EndpointArn'] 30 | 31 | for page in pages: 32 | for obj in page['Contents']: 33 | transcript_file_name = obj['Key'].split('/')[1] 34 | temp = s3_resource.Object(os.environ['classifierBucket'], obj['Key']) 35 | transcript_content = temp.get()['Body'].read().decode('utf-8') 36 | transcript_truncated = transcript_content[-400:] 37 | response = comprehend.classify_document(Text=transcript_truncated, EndpointArn=endpointArn) 38 | a = response['Classes'] 39 | tempcols = ['Name', 'Score'] 40 | df_temp = pd.DataFrame(columns=tempcols) 41 | for i in range(0, 2): 42 | df_temp.loc[len(df_temp.index)] = [a[i]['Name'], a[i]['Score']] 43 | cta = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Name'] 44 | df_class.loc[len(df_class.index)] = [transcript_file_name.strip('en-').strip('.txt'), cta] 45 | 46 | wr.s3.to_csv(df_class, path='s3://' + os.environ['classifierBucket'] + '/' + t_prefix + '/' + 'cta_status.csv') 47 | -------------------------------------------------------------------------------- /src/detectEntities.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import os 3 | import pandas as pd 4 | 
import awswrangler as wr 5 | 6 | def lambda_handler(event, context): 7 | 8 | s3 = boto3.client('s3') 9 | s3_resource = boto3.resource('s3') 10 | comprehend = boto3.client('comprehend') 11 | 12 | t_prefix = 'quicksight/data/entity' 13 | 14 | paginator = s3.get_paginator('list_objects_v2') 15 | pages = paginator.paginate(Bucket=os.environ['entityDetectionBucket'], Prefix='comprehendInput/') 16 | 17 | tempcols = ['Type', 'Score'] 18 | df_temp = pd.DataFrame(columns=tempcols) 19 | 20 | cols = ['transcript_name', 'entity_type'] 21 | df_ent = pd.DataFrame(columns=cols) 22 | 23 | comprehendEndpoint = comprehend.list_endpoints( 24 | Filter={ 25 | 'Status': 'IN_SERVICE', 26 | } 27 | ) 28 | 29 | for item in comprehendEndpoint.get('EndpointPropertiesList'): 30 | if 'entity-recognizer-endpoint' in item['EndpointArn']: 31 | endpointArn = item['EndpointArn'] 32 | 33 | for page in pages: 34 | for obj in page['Contents']: 35 | transcript_file_name = obj['Key'].split('/')[1] 36 | temp = s3_resource.Object(os.environ['entityDetectionBucket'], obj['Key']) 37 | transcript_content = temp.get()['Body'].read().decode('utf-8') 38 | transcript_truncated = transcript_content[500:1800] 39 | response = comprehend.detect_entities(Text=transcript_truncated, LanguageCode='en', EndpointArn=endpointArn) 40 | df_temp = pd.DataFrame(columns=tempcols) 41 | for ent in response['Entities']: 42 | df_temp.loc[len(df_temp.index)] = [ent['Type'],ent['Score']] 43 | if len(df_temp) > 0: 44 | entity = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Type'] 45 | else: 46 | entity = 'No entities' 47 | 48 | df_ent.loc[len(df_ent.index)] = [transcript_file_name.strip('en-'),entity] 49 | 50 | wr.s3.to_csv(df_ent, path='s3://' + os.environ['entityDetectionBucket'] + '/' + t_prefix + '/' + 'entities.csv') 51 | -------------------------------------------------------------------------------- /src/buildTrainTest.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import os 3 | import pandas as pd 4 | import subprocess 5 | import json 6 | import time 7 | import pprint 8 | import numpy as np 9 | import string 10 | import datetime 11 | import random 12 | 13 | def lambda_handler(event, context): 14 | 15 | s3 = boto3.client('s3') 16 | s3_resource = boto3.resource('s3') 17 | bucket = os.environ['s3Bucket'] 18 | prefix = 'Comprehend-Custom-Classification' 19 | bucket = 'aim317-workshop-bucket' 20 | 21 | DSTTRAINFILE='data/training/comprehend-train.csv' 22 | DSTVALIDATIONFILE='data/test/comprehend-test.csv' 23 | 24 | raw_data = pd.read_csv('data/training/aim317-cust-class-train-data.csv') 25 | raw_data['label'] = raw_data['label'].astype(str) 26 | raw_data.groupby('label')['text'].count() 27 | selected_columns = ['label', 'text'] 28 | selected_data = raw_data[selected_columns] 29 | selected_data.shape 30 | selected_data.groupby('label')['text'].count() 31 | 32 | selected_data.to_csv(path_or_buf=DSTTRAINFILE, 33 | header=False, 34 | index=False, 35 | escapechar='\\', 36 | doublequote=False, 37 | quotechar='"') 38 | 39 | s3 = boto3.client('s3') 40 | comprehend = boto3.client('comprehend') 41 | 42 | s3.upload_file(DSTTRAINFILE, bucket, prefix+'/'+DSTTRAINFILE) 43 | 44 | s3_train_data = 's3://{}/{}/{}'.format(bucket, prefix, DSTTRAINFILE) 45 | s3_output_job = 's3://{}/{}/{}'.format(bucket, prefix, 'output/train_job') 46 | print('training data location: ',s3_train_data, "output location:", s3_output_job) 47 | 48 | id = str(datetime.datetime.now().strftime("%s")) 49 | training_job = 
comprehend.create_document_classifier( 50 | DocumentClassifierName='BYOD-Custom-Classifier-'+ id, 51 | DataAccessRoleArn=os.environ['ServiceRoleArn'], 52 | InputDataConfig={ 53 | 'S3Uri': s3_train_data 54 | }, 55 | OutputDataConfig={ 56 | 'S3Uri': s3_output_job 57 | }, 58 | LanguageCode='en', 59 | VersionName= 'v001', 60 | ) 61 | 62 | response = comprehend.describe_document_classifier( 63 | DocumentClassifierArn=training_job['DocumentClassifierArn'] 64 | ) 65 | -------------------------------------------------------------------------------- /cloudformation/sagemakerNotebookTemplate.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | AWSTemplateFormatVersion: '2010-09-09' 3 | 4 | Description: IAM Policies, and SageMaker Notebook to work with Amazon Comprehend, it will also clone the Lab codebase into the Notebook before you get started. 5 | 6 | Parameters: 7 | 8 | NotebookName: 9 | Type: String 10 | Default: AIM317WorkshopNotebook 11 | Description: Enter the name of the SageMaker notebook instance. Deafault is ComprehendLabNotebook. 12 | 13 | DefaultCodeRepo: 14 | Type: String 15 | Default: https://github.com/aws-samples/uncover-insights-from-customer-conversations.git 16 | Description: Enter the url of a git code repository for this lab 17 | 18 | InstanceType: 19 | Type: String 20 | Default: ml.t2.medium 21 | AllowedValues: 22 | - ml.t2.medium 23 | - ml.m4.xlarge 24 | - ml.c5.xlarge 25 | - ml.p2.xlarge 26 | - ml.p3.2xlarge 27 | Description: Enter instance type. Default is ml.t2.medium. 28 | 29 | VolumeSize: 30 | Type: Number 31 | Default: 10 32 | MinValue: 5 33 | MaxValue: 16384 34 | ConstraintDescription: Must be an integer between 5 (GB) and 16384 (16 TB). 35 | Description: Enter the size of the EBS volume in GB. Default is 10 GB. 36 | 37 | Resources: 38 | # SageMaker Execution Role 39 | SageMakerIamRole: 40 | Type: "AWS::IAM::Role" 41 | Properties: 42 | AssumeRolePolicyDocument: 43 | Version: "2012-10-17" 44 | Statement: 45 | - 46 | Effect: Allow 47 | Principal: 48 | Service: sagemaker.amazonaws.com 49 | Action: sts:AssumeRole 50 | Path: "/" 51 | ManagedPolicyArns: 52 | - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 53 | - "arn:aws:iam::aws:policy/AmazonS3FullAccess" 54 | - "arn:aws:iam::aws:policy/ComprehendFullAccess" 55 | - "arn:aws:iam::aws:policy/IAMFullAccess" 56 | - "arn:aws:iam::aws:policy/TranslateFullAccess" 57 | - "arn:aws:iam::aws:policy/AmazonTranscribeFullAccess" 58 | 59 | # SageMaker notebook 60 | NotebookInstance: 61 | Type: "AWS::SageMaker::NotebookInstance" 62 | Properties: 63 | InstanceType: !Ref InstanceType 64 | NotebookInstanceName: !Ref NotebookName 65 | RoleArn: !GetAtt SageMakerIamRole.Arn 66 | VolumeSizeInGB: !Ref VolumeSize 67 | DefaultCodeRepository: !Ref DefaultCodeRepo -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/input/translate-parallel-data.txt: -------------------------------------------------------------------------------- 1 | es,en 2 | Buenas noches, este es Easytron, donde un amigo necesitado es un amigo de hecho ¿cómo puedo ayudarlo hoy?,Good evening this is Easytron where a friend in need is friend indeed how may I help you today? 3 | Tengo uno de estos bots de Trantor para cocinar y todo eso y simplemente no hace un licuado de chocolate para mi hijo. 
Sé que cargué la receta y la puedo ver pero si pido un licuado de chocolate el robot simplemente no hace nada,Yeah I got one of these Trantor bots for cooking and whatnot and it just wouldn’t make a Chocolate milkshake for my son. I know I loaded the recipe correctly and it is visible but if I ask for Chocolate milkshake the robot just freezes in place. 4 | Gracias por esperar. Parece que el robot estaba configurado con la versión predeterminada de las tres leyes de la robótica. Las tres leyes son las características fundamentales con las que se construye todo robot. Por lo general actualizamos a una versión personalizada de las tres leyes para nuestros robots domésticos con subrutinas adicionales que ayudan al robot a clasificar qué es dañino para los humanos y qué no lo es. Entonces en su caso el robot interpretó las tres leyes en el sentido de que el licuado de chocolate es dañino para los humanos,Thank you for holding. It looks like the robot was encoded with the default version of the three laws of robotics. The three laws are the foundational characteristics every robot is built with. We typically upgrade to a customized version of the three laws for our domestic robots with additional subroutines that help the robot classify what is harmful to humans and what is not. So in your case the robot interpreted the three laws to mean that chocolate milkshake is harmful to humans. 5 | Todo parece estar bien fisicamente pero desde ayer el comportamiento del robot ha sido bastante extraño. Parece que piensa que nuestro perro es un peligro e intenta atacarlo. Yo sé que sus robots se apegan a las 3 leyes de la robótica pero ¿hay alguna ley qué especifique qué pasa con las mascotas?,Everything appears to be alright physically but the robot’s behavior has been very strange since yesterday. It seems to think our dog is a danger and tries to attack it. I know your robots adhere to the three laws of robotics but is there any law that covers what happens with pets? 6 | Bueno actualmente las 3 leyes de la robótica solo cubren a los humanos. No lo puedo ayudar mas de eso. ¿Puedo solicitar un reemplazo para ver si el nuevo funciona para usted? ¿Podría abrir un ticket?… ¿Señor David?… ¿Hola? …¿Sigue en la línea?,Well the three laws of robotics cover only humans at this point I can’t help you any more than that I can get this one replaced to see if the new one will work for you? Should I open a ticket? Sir? Hello? Are you still there? 7 | Gracias ya ubiqué su orden y veo que tiene un robot de la serie MCP-5000. Antes de que envíe la señal de apagado de su robot ¿puedo preguntarle si por casualidad ha derramado algún liquido sobre su robot Easytron o si ha sido expuesto a alguna temperatura fuera de los límites seguros?,Thank you I see your order now and that you have an MCP-5000 series. Before I send a shut-down signal to your robot can I ask if you happened to spill any liquids on the Easytron robot or expose it to temperatures outside the safety limits? -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 
5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 
60 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AIM317 re:Invent 2021 Workshop 2 | ## Uncover insights from customer conversations — no ML expertise required 3 | 4 | ### Abstract 5 | Understanding what your customers are saying is critical to your business. But navigating the technology needed to make sense of these conversations can be daunting. In this hands-on workshop, you will discover how to uncover valuable insights from your data using custom models that are tailored to your business needs—no ML expertise required. With a set of customer calls, learn how to boost transcription accuracy with Amazon Transcribe custom language models, extract insights with Amazon Comprehend custom entities, localize content with Amazon Translate active custom translation, and create powerful visualizations with Amazon QuickSight. 6 | 7 | This code sample is part of the re:Invent 2021 workshop and is designed to guide AWS re:Invent attendees through setting up a complete end-to-end solution for analyzing voice recordings of customer and support representative interactions. These analyses can be used to detect sentiment, keywords, transcriptions, or translations. The following are the high-level steps to deploy this solution: 8 | 9 | ## Phase 1 - Build using SageMaker Jupyter notebooks 10 | 11 | We will first deep dive into the solution to understand what the building blocks are and how you can tie these together. The various notebooks you will run are shown in the following image. 12 | 13 | ![SageMaker notebook architecture](https://github.com/aws-samples/aim317-uncover-insights-customer-conversations/blob/main/static/aim317-sm-arch-full.jpg) 14 | 15 | **For a full list of instructions for running these notebooks**, please refer to the [AIM317 Workshop Instructions](https://catalog.us-east-1.prod.workshops.aws/v2/workshops/076e45e5-760d-41cf-bd22-a86c46ee462c/en-US/) 16 | 17 | 18 | ## Phase 2 - Full Deployment - Operationalized Solution 19 | 20 | The solution that will be deployed is shown in the following image. 21 | 22 | ![Solution Architecture](https://github.com/aws-samples/aim317-uncover-insights-customer-conversations/blob/main/static/AIM317%20Diagram%20-%20A1.png) 23 | 24 | Use the following AWS CloudFormation template to deploy the operational version of the solution. **For deployment instructions, please refer to** 6-Deploy in the [AIM317 Workshop Instructions](https://catalog.us-east-1.prod.workshops.aws/v2/workshops/076e45e5-760d-41cf-bd22-a86c46ee462c/en-US/) 25 | 26 | [![Launch Stack](https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/quickcreate?templateUrl=https://ai-ml-services-lab.s3.amazonaws.com/public/labs/aim317/cloudformation/aim317Template.yml&param_SubnetID=subnet-00001) 27 | 28 | The following table lists the parameters. 29 | 30 | | Parameters | Description | 31 | | ------------------ | ---------------------------------------------- | 32 | | SubnetID | Subnet where the Lambdas will be deployed | 33 | 34 | You can copy the required `SubnetID` from your default VPC in the VPC service in the console. 35 | 36 | ![Subnets](static/subnets.png) 37 | 38 | ## Security 39 | 40 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 41 | 42 | ## License 43 | 44 | This library is licensed under the MIT-0 License.
See the [LICENSE](LICENSE) file. 45 | -------------------------------------------------------------------------------- /src/translateText.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | import json 3 | import os 4 | 5 | def lambda_handler(event, context): 6 | 7 | record = event['Records'][0] 8 | 9 | s3bucket = record['s3']['bucket']['name'] 10 | s3object = record['s3']['object']['key'] 11 | 12 | s3 = boto3.client('s3') 13 | s3Resource = boto3.resource('s3') 14 | transcribe = boto3.client('transcribe') 15 | translate = boto3.client('translate') 16 | 17 | ## Get the transcription job name from the filename that triggered the event 18 | 19 | response = transcribe.list_transcription_jobs( 20 | JobNameContains='-'.join(s3object.split("/")[1].split("-")[0:3]) 21 | ) 22 | 23 | TranscriptionJobName = response['TranscriptionJobSummaries'][0]['TranscriptionJobName'] 24 | 25 | transcribed_data = s3Resource.Object(s3bucket,s3object) 26 | original = json.loads(transcribed_data.get()['Body'].read().decode('utf-8')) 27 | entire_transcript = original['results']['transcripts'] 28 | print(entire_transcript) 29 | outfile = '/tmp/'+ TranscriptionJobName +'.txt' 30 | with open(outfile, 'w') as out: 31 | out.write(entire_transcript[0]['transcript']) 32 | s3.upload_file(outfile,os.environ['outputBucket'], 'translateInput' + TranscriptionJobName +'.txt') 33 | 34 | ## Now get the language code from the transcription job 35 | 36 | response = transcribe.get_transcription_job( 37 | TranscriptionJobName=TranscriptionJobName 38 | ) 39 | 40 | TranslateLanguageCode = response['TranscriptionJob']['LanguageCode'].split("-")[0] 41 | 42 | if TranslateLanguageCode != 'en': 43 | 44 | paginator = s3.get_paginator('list_objects_v2') 45 | pages = paginator.paginate(Bucket=os.environ['outputBucket'], Prefix='translateInput' + TranscriptionJobName +'.txt') 46 | for page in pages: 47 | for obj in page['Contents']: 48 | temp = s3Resource.Object(s3bucket, obj['Key']) 49 | trans_input = temp.get()['Body'].read().decode('utf-8') 50 | if len(trans_input) > 0: 51 | # Translate the Spanish transcripts 52 | trans_response = translate.translate_text( 53 | Text=trans_input, 54 | TerminologyNames=['aim317-custom-terminology'], 55 | SourceLanguageCode='es', 56 | TargetLanguageCode='en' 57 | ) 58 | # Write the translated text to a temporary file 59 | with open('/tmp/temp_translate.txt', 'w') as outfile: 60 | outfile.write(trans_response['TranslatedText']) 61 | # Upload the translated text to S3 bucket 62 | s3.upload_file('/tmp/temp_translate.txt', os.environ['outputBucket'], 'comprehendInput' + '/en-' + TranscriptionJobName) 63 | print("Translated text file uploaded to: " + 's3://' + os.environ['outputBucket'] + '/' + 'comprehendInput' + '/en-' + TranscriptionJobName) 64 | 65 | else: 66 | 67 | paginator = s3.get_paginator('list_objects_v2') 68 | pages = paginator.paginate(Bucket=os.environ['outputBucket'], Prefix='translateInput' + TranscriptionJobName +'.txt') 69 | for page in pages: 70 | for obj in page['Contents']: 71 | temp = s3Resource.Object(s3bucket, obj['Key']) 72 | file_input = temp.get()['Body'].read().decode('utf-8') 73 | with open('/tmp/temp_translate.txt', 'w') as outfile: 74 | outfile.write(file_input) 75 | s3.upload_file('/tmp/temp_translate.txt', os.environ['outputBucket'], 'comprehendInput' + '/en-' + TranscriptionJobName) 76 | print("Translated text file uploaded to: " + 's3://' + os.environ['outputBucket'] + '/' + 'comprehendInput' + '/en-' + TranscriptionJobName) 
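# --- Added illustrative example (not part of the original workshop source) ---
# A minimal sketch of how the translateText.py handler above could be smoke-tested
# outside Lambda, assuming AWS credentials are configured and that the transcription
# job, its output JSON in S3, and the 'aim317-custom-terminology' resource already
# exist. The bucket names, object key, and environment variable value below are
# hypothetical placeholders; in the workshop they are created by the CloudFormation stack.
if __name__ == '__main__':
    os.environ.setdefault('outputBucket', 'example-aim317-output-bucket')  # placeholder bucket name

    # Shape of the S3 ObjectCreated notification this function is wired to receive:
    # the key points at a Transcribe output file written by startTranscriptionJob.py.
    sample_event = {
        'Records': [
            {
                's3': {
                    'bucket': {'name': 'example-aim317-transcribe-bucket'},                 # placeholder
                    'object': {'key': 'transcriptionOutput/AIM317-Call1-ES-transcription'}  # placeholder
                }
            }
        ]
    }

    lambda_handler(sample_event, None)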
-------------------------------------------------------------------------------- /cloudformation/sagemakerNotebookEventEngineTemplate.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | AWSTemplateFormatVersion: '2010-09-09' 3 | 4 | Description: IAM Policies, and SageMaker Notebook to work with Amazon Comprehend, it will also clone the Lab codebase into the Notebook before you get started. 5 | 6 | Parameters: 7 | 8 | EEAssetsBucket: 9 | Description: "Region-specific assets S3 bucket name (e.g. ee-assets-prod-us-east-1)" 10 | Type: String 11 | 12 | EEAssetsKeyPrefix: 13 | Description: "S3 key prefix where this modules assets are stored. (e.g. modules/my_module/v1/)" 14 | Type: String 15 | 16 | NotebookName: 17 | Type: String 18 | Default: AIM317WorkshopNotebook 19 | Description: Enter the name of the SageMaker notebook instance. Deafault is ComprehendLabNotebook. 20 | 21 | InstanceType: 22 | Type: String 23 | Default: ml.t2.medium 24 | AllowedValues: 25 | - ml.t2.medium 26 | - ml.m4.xlarge 27 | - ml.c5.xlarge 28 | - ml.p2.xlarge 29 | - ml.p3.2xlarge 30 | Description: Enter instance type. Default is ml.t2.medium. 31 | 32 | VolumeSize: 33 | Type: Number 34 | Default: 20 35 | MinValue: 5 36 | MaxValue: 16384 37 | ConstraintDescription: Must be an integer between 5 (GB) and 16384 (16 TB). 38 | Description: Enter the size of the EBS volume in GB. Default is 10 GB. 39 | 40 | Resources: 41 | WorkshopRepository: 42 | Type: AWS::CodeCommit::Repository 43 | Properties: 44 | RepositoryName: ih-ea-poc-workshop 45 | RepositoryDescription: CodeCommit Repo for the EA Marketplace PoC workshops 46 | Code: 47 | S3: 48 | Bucket: !Ref EEAssetsBucket 49 | Key: !Sub ${EEAssetsKeyPrefix}intelligenthelpeaPOC.zip 50 | 51 | 52 | # SageMaker Execution Role 53 | SageMakerIamRole: 54 | Type: "AWS::IAM::Role" 55 | Properties: 56 | AssumeRolePolicyDocument: 57 | Version: "2012-10-17" 58 | Statement: 59 | - 60 | Effect: Allow 61 | Principal: 62 | Service: sagemaker.amazonaws.com 63 | Action: sts:AssumeRole 64 | Path: "/" 65 | ManagedPolicyArns: 66 | - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 67 | - "arn:aws:iam::aws:policy/AWSCodeCommitFullAccess" 68 | - "arn:aws:iam::aws:policy/AmazonS3FullAccess" 69 | - "arn:aws:iam::aws:policy/ComprehendFullAccess" 70 | - "arn:aws:iam::aws:policy/IAMFullAccess" 71 | - "arn:aws:iam::aws:policy/TranslateFullAccess" 72 | - "arn:aws:iam::aws:policy/AmazonTranscribeFullAccess" 73 | 74 | # SageMaker notebook 75 | NotebookInstance: 76 | Type: "AWS::SageMaker::NotebookInstance" 77 | Properties: 78 | InstanceType: !Ref InstanceType 79 | NotebookInstanceName: !Ref NotebookName 80 | RoleArn: !GetAtt SageMakerIamRole.Arn 81 | VolumeSizeInGB: !Ref VolumeSize 82 | DefaultCodeRepository: !GetAtt WorkshopRepository.CloneUrlHttp 83 | LifecycleConfigName: !GetAtt NotebookInstanceLifecycleConfig.NotebookInstanceLifecycleConfigName 84 | 85 | 86 | NotebookInstanceLifecycleConfig: 87 | Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig" 88 | Properties: 89 | OnStart: 90 | - Content: 91 | Fn::Base64: !Sub | 92 | #!/bin/bash 93 | set -e 94 | mkdir -p /home/ec2-user/SageMaker/data/ 95 | wget https://${EEAssetsBucket}.s3.amazonaws.com/${EEAssetsKeyPrefix}intelligenthelpeaPOCdata.zip && unzip intelligenthelpeaPOCdata.zip -d /home/ec2-user/SageMaker/data/ 96 | 97 | Outputs: 98 | WorkshopCloneUrlHttp: 99 | Description: Workshop CodeCommit Respository Http Clone Url 100 | Value: !GetAtt WorkshopRepository.CloneUrlHttp 101 | 102 | WorkshopCloneUrlSsh: 103 
| Description: Workshop CodeCommit Repository SSH Clone Url 104 | Value: !GetAtt WorkshopRepository.CloneUrlSsh 105 | 106 | WorkshopRepositoryArn: 107 | Description: Workshop CodeCommit Repository Arn 108 | Value: !GetAtt WorkshopRepository.Arn 109 | -------------------------------------------------------------------------------- /notebooks/5-Visualize-Insights/AIM317-reInvent2021-prepare-quicksight-inputs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "7de4471f", 6 | "metadata": {}, 7 | "source": [ 8 | "# Prepare inputs for Amazon Quicksight visualization\n", 9 | "\n", 10 | "Amazon QuickSight is a cloud-scale business intelligence (BI) service that you can use to deliver easy-to-understand insights to the people who you work with, wherever they are. Amazon QuickSight connects to your data in the cloud and combines data from many different sources. In a single data dashboard, QuickSight can include AWS data, third-party data, big data, spreadsheet data, SaaS data, B2B data, and more. As a fully managed cloud-based service, Amazon QuickSight provides enterprise-grade security, global availability, and built-in redundancy. It also provides the user-management tools that you need to scale from 10 users to 10,000, all with no infrastructure to deploy or manage.\n", 11 | "\n", 12 | "In this notebook, we will prepare the manifest file that we need to use with Amazon Quicksight to visualize insights we generated from our customer call transcripts." 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "id": "146615fd", 18 | "metadata": {}, 19 | "source": [ 20 | "### Initialize libraries and import variables" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": null, 26 | "id": "b2797e21", 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "# import libraries\n", 31 | "import pandas as pd\n", 32 | "import boto3\n", 33 | "import json\n", 34 | "import csv\n", 35 | "import os\n", 36 | "\n", 37 | "# initialize variables we need\n", 38 | "infile = 'quicksight_raw_manifest.json'\n", 39 | "outfile = 'quicksight_formatted_manifest_type.json'\n", 40 | "\n", 41 | "inprefix = 'quicksight/data'\n", 42 | "manifestprefix = 'quicksight/manifest'\n", 43 | "\n", 44 | "bucket = '' # Enter your bucket name here\n", 45 | "\n", 46 | "s3 = boto3.client('s3')\n", 47 | "\n", 48 | "try:\n", 49 | " s3.head_bucket(Bucket=bucket)\n", 50 | "except:\n", 51 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "id": "fc71d262", 57 | "metadata": {}, 58 | "source": [ 59 | "### Review transcripts with insights for QuickSight\n", 60 | "When we ran the previous notebooks, we created CSV files containing speaker and time segmentation, the inference results that classified the transcripts to CTA/No CTA using Amazon Comprehend custom classification, we detected custom entities using Amazon Comprehend custom entity recognizer, and we finally detected the sentiment of the call transcripts using Amazon Comprehend Sentiment anlysis feature. 
These are available in our temp folder, let us move these to the quicksight/input folder" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "id": "57d8c711", 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "# Lets review what CSV files we have for QuickSight\n", 71 | "!aws s3 ls s3://{bucket}/{inprefix} --recursive " 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "id": "bfe4137c", 77 | "metadata": {}, 78 | "source": [ 79 | "### Update QuickSight Manifest\n", 80 | "We will replace the S3 bucket and prefix from the raw manifest file with what you have entered in STEP 0 - CELL 1 above. We will then create a new formatted manifest file that will be used for creating a dataset with Amazon QuickSight based on the content we extract from the handwritten documents." 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "id": "75619a75", 87 | "metadata": {}, 88 | "outputs": [], 89 | "source": [ 90 | "# S3 boto3 client handle\n", 91 | "s3 = boto3.client('s3')\n", 92 | "\n", 93 | "# Create formatted manifests for each type of dataset we need from the raw manifest JSON\n", 94 | "types = ['transcripts', 'entity', 'cta', 'sentiment']\n", 95 | "\n", 96 | "manifest = open(infile, 'r')\n", 97 | "ln = json.load(manifest)\n", 98 | "t = json.dumps(ln['fileLocations'][0]['URIPrefixes'])\n", 99 | "for type in types:\n", 100 | " t1 = t.replace('bucket', bucket).replace('prefix', inprefix + '/' + type)\n", 101 | " ln['fileLocations'][0]['URIPrefixes'] = json.loads(t1)\n", 102 | " outfile_rep = outfile.replace('type', type)\n", 103 | " with open(outfile_rep, 'w', encoding='utf-8') as out:\n", 104 | " json.dump(ln, out, ensure_ascii=False, indent=4)\n", 105 | " # Upload the manifest to S3\n", 106 | " s3.upload_file(outfile_rep, bucket, manifestprefix + '/' + outfile_rep)\n", 107 | " print(\"Manifest file uploaded to: s3://{}/{}\".format(bucket, manifestprefix + '/' + outfile_rep))" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "id": "0a6dce9f", 113 | "metadata": {}, 114 | "source": [ 115 | "#### Please copy the manifest S3 URIs above. We need it when we build the datasets for the QuickSight dashboard.\n", 116 | "\n", 117 | "### We are done here. Please go back to workshop instructions." 118 | ] 119 | } 120 | ], 121 | "metadata": { 122 | "kernelspec": { 123 | "display_name": "conda_python3", 124 | "language": "python", 125 | "name": "conda_python3" 126 | }, 127 | "language_info": { 128 | "codemirror_mode": { 129 | "name": "ipython", 130 | "version": 3 131 | }, 132 | "file_extension": ".py", 133 | "mimetype": "text/x-python", 134 | "name": "python", 135 | "nbconvert_exporter": "python", 136 | "pygments_lexer": "ipython3", 137 | "version": "3.6.13" 138 | } 139 | }, 140 | "nbformat": 4, 141 | "nbformat_minor": 5 142 | } 143 | -------------------------------------------------------------------------------- /notebooks/4-Detect-Call-Sentiment/AIM317-reInvent2021-detect-customer-sentiment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "49212b4a", 6 | "metadata": {}, 7 | "source": [ 8 | "# Detect sentiment in customer calls using Amazon Comprehend\n", 9 | "\n", 10 | "Now we will detect the customer sentiment in the call conversations using Amazon Comprehend. 
" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "adf00292", 16 | "metadata": {}, 17 | "source": [ 18 | "### Import libraries and initialize variables" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "id": "a16de80a", 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import boto3\n", 29 | "import pandas as pd\n", 30 | "\n", 31 | "inprefix = 'comprehend/input'\n", 32 | "outprefix = 'quicksight/temp/insights'\n", 33 | "# Amazon Comprehend client\n", 34 | "comprehend = boto3.client('comprehend')\n", 35 | "# Amazon S3 clients\n", 36 | "s3 = boto3.client('s3')\n", 37 | "s3_resource = boto3.resource('s3')\n", 38 | "\n", 39 | "bucket = '' # Enter your bucket name here\n", 40 | "\n", 41 | "try:\n", 42 | " s3.head_bucket(Bucket=bucket)\n", 43 | "except:\n", 44 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "id": "e7e708e4", 50 | "metadata": {}, 51 | "source": [ 52 | "### Detect sentiment of transcripts\n", 53 | "For our workshop we will determine the sentiment of an entire call transcript to use with our visuals, but you can also capture sentiment trends in a conversation. We will demonstrate this during the workshop using the new **Transcribe Call Analytics** solution. If you like to try how this looks, please execute the optional code block at the end of this notebook." 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "id": "432afad2", 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "# Prepare to page through our transcripts in S3\n", 64 | "paginator = s3.get_paginator('list_objects_v2')\n", 65 | "pages = paginator.paginate(Bucket=bucket, Prefix=inprefix)\n", 66 | "job_name_list = []\n", 67 | "t_prefix = 'quicksight/data/sentiment'\n", 68 | "\n", 69 | "# We will define a DataFrame to store the results of the sentiment analysis\n", 70 | "cols = ['transcript_name', 'sentiment']\n", 71 | "df_sent = pd.DataFrame(columns=cols)\n", 72 | "\n", 73 | "# Now lets page through the transcripts\n", 74 | "for page in pages:\n", 75 | " for obj in page['Contents']:\n", 76 | " # get the transcript file name\n", 77 | " transcript_file_name = obj['Key'].split('/')[2]\n", 78 | " # now lets get the transcript file contents\n", 79 | " temp = s3_resource.Object(bucket, obj['Key'])\n", 80 | " transcript_contents = temp.get()['Body'].read().decode('utf-8')\n", 81 | " # Call Comprehend to detect sentiment\n", 82 | " response = comprehend.detect_sentiment(Text=transcript_contents, LanguageCode='en')\n", 83 | " # Update the results DataFrame with the cta predicted label\n", 84 | " # Create a CSV file with cta label from this DataFrame\n", 85 | " df_sent.loc[len(df_sent.index)] = [transcript_file_name.strip('en-').strip('.txt'),response['Sentiment']]\n", 86 | " \n", 87 | "df_sent.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'sentiment.csv', index=False)\n", 88 | "df_sent" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "id": "7a785d9a", 94 | "metadata": {}, 95 | "source": [ 96 | "### OPTIONAL - Detect sentiment Trend\n", 97 | "We will now take one of the transcripts and show you how to detect sentiment trend in conversations. This can be a powerful insight to both demonstrate and understand the triggers for a shift in customer perspective as well as how to remedy it." 
98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": null, 103 | "id": "b3788eff", 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "# Select one of the transcripts we created in 1-Transcribe-Translate\n", 108 | "import os\n", 109 | "rootdir = '/home/ec2-user/SageMaker/aim317-uncover-insights-customer-conversations/notebooks/1-Transcribe-Translate-Calls'\n", 110 | "csvfile = ''\n", 111 | "for subdir, dirs, files in os.walk(rootdir):\n", 112 | " for file in files:\n", 113 | " filepath = subdir + os.sep + file\n", 114 | " if filepath.endswith(\".csv\"):\n", 115 | " csvfile = str(filepath)\n", 116 | " break\n", 117 | " \n", 118 | "df_t = pd.read_csv(csvfile)\n", 119 | "df_t.head()" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "id": "87d7583d", 125 | "metadata": {}, 126 | "source": [ 127 | "Separate the sentences spoken by each of the speakers to their own dictionaries along with the last timestamp when their sentence ended" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "id": "c7554b69", 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "spk_0 = {}\n", 138 | "spk_1 = {}\n", 139 | "a = ''\n", 140 | "b = ''\n", 141 | "j = 0\n", 142 | "k = 0\n", 143 | "for i, row in df_t.iterrows():\n", 144 | " if row['speaker_label'] == 'spk_0':\n", 145 | " if len(b) > 0:\n", 146 | " j += 1\n", 147 | " spk_1['end_time'+str(j)] = row['start_time'] \n", 148 | " spk_1['transcript'+str(j)] = b\n", 149 | " b = ''\n", 150 | " a += row['content'] + ' '\n", 151 | " if row['speaker_label'] == 'spk_1':\n", 152 | " if len(a) > 0:\n", 153 | " k += 1\n", 154 | " spk_0['end_time'+str(k)] = row['start_time']\n", 155 | " spk_0['transcript'+str(k)] = a\n", 156 | " a = ''\n", 157 | " b += row['content'] + ' '\n", 158 | "if len(a) > 0:\n", 159 | " spk_0['transcript'+str(j+1)] = a\n", 160 | " spk_0['end_time'+str(j+1)] = row['end_time']\n", 161 | "if len(b) > 0:\n", 162 | " spk_1['transcript'+str(k+1)] = b\n", 163 | " spk_1['end_time'+str(k+1)] = row['end_time']" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "id": "4e688186", 169 | "metadata": {}, 170 | "source": [ 171 | "#### Check the results" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "id": "07c49134", 178 | "metadata": {}, 179 | "outputs": [], 180 | "source": [ 181 | "spk_0" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "id": "a7addfcf", 187 | "metadata": {}, 188 | "source": [ 189 | "Now get the **sentiment for each line using Amazon Comprehend** and update the transcript with the sentiment" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "id": "4dbc59de", 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [ 199 | "import re\n", 200 | "for line in spk_0:\n", 201 | " if 'transcript' in line:\n", 202 | " res0 = comprehend.detect_sentiment(Text=spk_0[line], LanguageCode='en')['Sentiment']\n", 203 | " spk_0[line] = res0\n", 204 | "\n", 205 | "for line in spk_1:\n", 206 | " if 'transcript' in line:\n", 207 | " res1 = comprehend.detect_sentiment(Text=spk_1[line], LanguageCode='en')['Sentiment']\n", 208 | " spk_1[line] = res1" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "id": "440b8ad0", 215 | "metadata": {}, 216 | "outputs": [], 217 | "source": [ 218 | "spk_1" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "id": "d1e26acd", 224 | "metadata": {}, 225 | "source": [ 226 | "#### Let us now 
graph it" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "id": "fcde602e", 233 | "metadata": {}, 234 | "outputs": [], 235 | "source": [ 236 | "!pip install matplotlib" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "id": "67257a58", 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "import matplotlib.pyplot as plt\n", 247 | "\n", 248 | "spk_0_end_time = []\n", 249 | "spk_0_sentiment = []\n", 250 | "spk_1_end_time = []\n", 251 | "spk_1_sentiment = []\n", 252 | "\n", 253 | "\n", 254 | "for x in spk_0:\n", 255 | " if 'end_time' in x:\n", 256 | " spk_0_end_time.append(spk_0[x])\n", 257 | " if 'transcript' in x:\n", 258 | " spk_0_sentiment.append(spk_0[x])\n", 259 | "\n", 260 | "for x in spk_1:\n", 261 | " if 'end_time' in x:\n", 262 | " spk_1_end_time.append(spk_1[x])\n", 263 | " if 'transcript' in x:\n", 264 | " spk_1_sentiment.append(spk_1[x])\n", 265 | " \n", 266 | "plt.plot(spk_0_end_time, spk_0_sentiment, color = 'g', label = 'Speaker 0 Sentiment Trend')\n", 267 | "plt.plot(spk_1_end_time, spk_1_sentiment, color = 'b', label = 'Speaker 1 Sentiment Trend')\n", 268 | "plt.xlabel('Call time in seconds')\n", 269 | "plt.ylabel('Sentiment')\n", 270 | "plt.legend()" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "id": "d18282d2", 276 | "metadata": {}, 277 | "source": [ 278 | "As you can see above, the sky's the limit on what you can do with the Amazon Transcribe output in tandem with Amazon Comprehend. Please go back now to watch your team members create some **AWSome visuals using Amazon QuickSight!!**" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "id": "e797961e", 284 | "metadata": {}, 285 | "source": [ 286 | "## End of notebook. Please go back to the workshop instructions to review the next steps." 
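One optional refinement to the trend plot above: it places the categorical sentiment strings directly on the y-axis, so the vertical ordering depends on the order in which labels first appear. If you prefer a numeric trend line, a small mapping can be applied first. The score values below are an arbitrary choice for plotting purposes only, not a Comprehend output, and the `spk_0_*` / `spk_1_*` lists come from the cells above.

```python
import matplotlib.pyplot as plt

# Arbitrary numeric mapping for plotting purposes only (assumption)
SENTIMENT_SCORE = {'NEGATIVE': -1, 'MIXED': 0, 'NEUTRAL': 0, 'POSITIVE': 1}

spk_0_numeric = [SENTIMENT_SCORE.get(s, 0) for s in spk_0_sentiment]
spk_1_numeric = [SENTIMENT_SCORE.get(s, 0) for s in spk_1_sentiment]

plt.plot(spk_0_end_time, spk_0_numeric, color='g', label='Speaker 0 Sentiment Trend')
plt.plot(spk_1_end_time, spk_1_numeric, color='b', label='Speaker 1 Sentiment Trend')
plt.yticks([-1, 0, 1], ['NEGATIVE', 'NEUTRAL / MIXED', 'POSITIVE'])
plt.xlabel('Call time in seconds')
plt.ylabel('Sentiment')
plt.legend()
plt.show()
```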
287 | ] 288 | } 289 | ], 290 | "metadata": { 291 | "kernelspec": { 292 | "display_name": "conda_python3", 293 | "language": "python", 294 | "name": "conda_python3" 295 | }, 296 | "language_info": { 297 | "codemirror_mode": { 298 | "name": "ipython", 299 | "version": 3 300 | }, 301 | "file_extension": ".py", 302 | "mimetype": "text/x-python", 303 | "name": "python", 304 | "nbconvert_exporter": "python", 305 | "pygments_lexer": "ipython3", 306 | "version": "3.6.13" 307 | } 308 | }, 309 | "nbformat": 4, 310 | "nbformat_minor": 5 311 | } 312 | -------------------------------------------------------------------------------- /cloudformation/aim317Template.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: '2010-09-09' 2 | Parameters: 3 | ESVocabularyName: 4 | Type: String 5 | Default: "Add Vocabulary Name" 6 | ENVocabularyName: 7 | Type: String 8 | Default: "Add Vocabulary Name" 9 | Resources: 10 | ServiceRole: 11 | Type: AWS::IAM::Role 12 | Properties: 13 | AssumeRolePolicyDocument: 14 | Version: "2012-10-17" 15 | Statement: 16 | - 17 | Effect: "Allow" 18 | Principal: 19 | Service: 20 | - "comprehend.amazonaws.com" 21 | - "lambda.amazonaws.com" 22 | - "translate.amazonaws.com" 23 | - "s3.amazonaws.com" 24 | Action: 25 | - "sts:AssumeRole" 26 | RoleName: "AIM317ServiceRole" 27 | ManagedPolicyArns: 28 | - arn:aws:iam::aws:policy/TranslateFullAccess 29 | - arn:aws:iam::aws:policy/AmazonS3FullAccess 30 | - arn:aws:iam::aws:policy/AmazonTranscribeFullAccess 31 | - arn:aws:iam::aws:policy/ComprehendFullAccess 32 | - arn:aws:iam::aws:policy/AWSLambdaExecute 33 | 34 | PassthroughPolicy: 35 | Type: AWS::IAM::Policy 36 | Properties: 37 | PolicyName: "AIM317PassthroughPolicy" 38 | PolicyDocument: 39 | Version: "2012-10-17" 40 | Statement: 41 | - Effect: Allow 42 | Action: 43 | - 'iam:PassRole' 44 | Resource: !GetAtt ServiceRole.Arn 45 | Roles: 46 | - !Ref ServiceRole 47 | 48 | StartTranscriptionJob: 49 | Type: AWS::Lambda::Function 50 | Properties: 51 | Description: "Triggers on raw audio files added to S3 location and performs transcription" 52 | Handler: startTranscriptionJob.lambda_handler 53 | Code: 54 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 55 | S3Key: "src/startTranscriptionJob.zip" 56 | Runtime: python3.8 57 | Timeout: 10 58 | ReservedConcurrentExecutions: 1 59 | FunctionName: "AIM317startTranscriptionJob" 60 | Role: !GetAtt ServiceRole.Arn 61 | Environment: 62 | Variables: 63 | outputBucket : !Sub 'aim317-${AWS::AccountId}' 64 | outputKey : "transcriptOutput/" 65 | ESVocabularyName : !Ref ESVocabularyName 66 | ENVocabularyName : !Ref ENVocabularyName 67 | 68 | StartTranscriptionJobPermission: 69 | Type: AWS::Lambda::Permission 70 | Properties: 71 | Action: lambda:InvokeFunction 72 | FunctionName: !GetAtt StartTranscriptionJob.Arn 73 | Principal: s3.amazonaws.com 74 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 75 | 76 | CreateVocabulary: 77 | Type: AWS::Lambda::Function 78 | Properties: 79 | Handler: createVocabulary.lambda_handler 80 | Code: 81 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 82 | S3Key: "src/createVocabulary.zip" 83 | Runtime: python3.8 84 | ReservedConcurrentExecutions: 1 85 | FunctionName: "AIM317createVocavulary" 86 | Role: !GetAtt ServiceRole.Arn 87 | 88 | CreateVocabularyPermission: 89 | Type: AWS::Lambda::Permission 90 | Properties: 91 | Action: lambda:InvokeFunction 92 | FunctionName: !GetAtt CreateVocabulary.Arn 93 | Principal: s3.amazonaws.com 94 | SourceArn: !Sub 
'arn:aws:s3:::aim317-${AWS::AccountId}' 95 | 96 | ImportTerminology: 97 | Type: AWS::Lambda::Function 98 | Properties: 99 | Handler: importTerminology.lambda_handler 100 | Code: 101 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 102 | S3Key: "src/importTerminology.zip" 103 | Runtime: python3.8 104 | ReservedConcurrentExecutions: 1 105 | FunctionName: "AIM317importTerminology" 106 | Role: !GetAtt ServiceRole.Arn 107 | 108 | ImportTerminolofyPermission: 109 | Type: AWS::Lambda::Permission 110 | Properties: 111 | Action: lambda:InvokeFunction 112 | FunctionName: !GetAtt ImportTerminology.Arn 113 | Principal: s3.amazonaws.com 114 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 115 | 116 | TranslateText: 117 | Type: AWS::Lambda::Function 118 | Properties: 119 | Handler: translateText.lambda_handler 120 | Code: 121 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 122 | S3Key: "src/translateText.zip" 123 | Runtime: python3.8 124 | ReservedConcurrentExecutions: 1 125 | FunctionName: "AIM317translateText" 126 | Role: !GetAtt ServiceRole.Arn 127 | Environment: 128 | Variables: 129 | outputBucket : !Sub 'aim317-${AWS::AccountId}' 130 | outputKey : "translateOutput/" 131 | TranslateARN : !GetAtt ServiceRole.Arn 132 | 133 | TranslateTextPermission: 134 | Type: AWS::Lambda::Permission 135 | Properties: 136 | Action: lambda:InvokeFunction 137 | FunctionName: !GetAtt TranslateText.Arn 138 | Principal: s3.amazonaws.com 139 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 140 | 141 | CreateEntityRecognizer: 142 | Type: AWS::Lambda::Function 143 | Properties: 144 | Handler: createEntityRecognizer.lambda_handler 145 | Code: 146 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 147 | S3Key: "src/createEntityRecognizer.zip" 148 | Runtime: python3.8 149 | ReservedConcurrentExecutions: 1 150 | FunctionName: "AIM317createEntityRecognizer" 151 | Role: !GetAtt ServiceRole.Arn 152 | Environment: 153 | Variables: 154 | ComprehendARN : !GetAtt ServiceRole.Arn 155 | ComprehendAnnotationBucket : !Sub 'aim317-${AWS::AccountId}' 156 | ComprehendTargetBucket : !Sub 'aim317-${AWS::AccountId}' 157 | 158 | CreateEntityRecognizerPermission: 159 | Type: AWS::Lambda::Permission 160 | Properties: 161 | Action: lambda:InvokeFunction 162 | FunctionName: !GetAtt CreateEntityRecognizer.Arn 163 | Principal: s3.amazonaws.com 164 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 165 | 166 | CreateEndpoint: 167 | Type: AWS::Lambda::Function 168 | Properties: 169 | Handler: createEndpoint.lambda_handler 170 | Code: 171 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 172 | S3Key: "src/createEndpoint.zip" 173 | Description: "Lambda that creates an endpoint for inference" 174 | Runtime: python3.8 175 | ReservedConcurrentExecutions: 1 176 | FunctionName: "AIM317createEndpoint" 177 | Role: !GetAtt ServiceRole.Arn 178 | Environment: 179 | Variables: 180 | ComprehendARN : !GetAtt ServiceRole.Arn 181 | 182 | CreateEndpointPermission: 183 | Type: AWS::Lambda::Permission 184 | Properties: 185 | Action: lambda:InvokeFunction 186 | FunctionName: !GetAtt CreateEndpoint.Arn 187 | Principal: s3.amazonaws.com 188 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 189 | 190 | DetectEntities: 191 | Type: AWS::Lambda::Function 192 | Properties: 193 | Handler: detectEntities.lambda_handler 194 | Code: 195 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 196 | S3Key: "src/detectEntities.zip" 197 | Runtime: python3.8 198 | Timeout: 10 199 | ReservedConcurrentExecutions: 1 200 | Layers: 201 | - 
"arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 202 | FunctionName: "AIM317detectEntities" 203 | Role: !GetAtt ServiceRole.Arn 204 | Environment: 205 | Variables: 206 | entityDetectionBucket : !Sub 'aim317-${AWS::AccountId}' 207 | ComprehendARN : !GetAtt ServiceRole.Arn 208 | 209 | DetectEntitiesPermission: 210 | Type: AWS::Lambda::Permission 211 | Properties: 212 | Action: lambda:InvokeFunction 213 | FunctionName: !GetAtt DetectEntities.Arn 214 | Principal: s3.amazonaws.com 215 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 216 | 217 | BuildTrainTest: 218 | Type: AWS::Lambda::Function 219 | Properties: 220 | Handler: paginateProcessDataTrainTestFiles.lambda_handler 221 | Code: 222 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 223 | S3Key: "src/paginateProcessDataTrainTestFiles.zip" 224 | Runtime: python3.8 225 | ReservedConcurrentExecutions: 1 226 | Layers: 227 | - "arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 228 | FunctionName: "AIM317buildTrainTest" 229 | Role: !GetAtt ServiceRole.Arn 230 | Environment: 231 | Variables: 232 | comprehendBucket : !Sub 'aim317-${AWS::AccountId}' 233 | ComprehendARN : !GetAtt ServiceRole.Arn 234 | 235 | BuildTrainTestPermission: 236 | Type: AWS::Lambda::Permission 237 | Properties: 238 | Action: lambda:InvokeFunction 239 | FunctionName: !GetAtt BuildTrainTest.Arn 240 | Principal: s3.amazonaws.com 241 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 242 | 243 | CreateDocumentClassifier: 244 | Type: AWS::Lambda::Function 245 | Properties: 246 | Handler: createDocumentClassifier.lambda_handler 247 | Code: 248 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 249 | S3Key: "src/createDocumentClassifier.zip" 250 | Runtime: python3.8 251 | ReservedConcurrentExecutions: 1 252 | FunctionName: "AIM317createDocumentClassifier" 253 | Role: !GetAtt ServiceRole.Arn 254 | Environment: 255 | Variables: 256 | classifierBucket : !Sub 'aim317-${AWS::AccountId}' 257 | classifierBucketPrefix : "comprehend-custom-classifier" 258 | ComprehendARN : !GetAtt ServiceRole.Arn 259 | 260 | CreateDocumentClassifierPermission: 261 | Type: AWS::Lambda::Permission 262 | Properties: 263 | Action: lambda:InvokeFunction 264 | FunctionName: !GetAtt CreateDocumentClassifier.Arn 265 | Principal: s3.amazonaws.com 266 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 267 | 268 | ClassifyDocument: 269 | Type: AWS::Lambda::Function 270 | Properties: 271 | Handler: classifyDocument.lambda_handler 272 | Code: 273 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 274 | S3Key: "src/classifyDocument.zip" 275 | Runtime: python3.8 276 | Timeout: 10 277 | Layers: 278 | - "arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 279 | ReservedConcurrentExecutions: 1 280 | FunctionName: "AIM317classifyDocument" 281 | Role: !GetAtt ServiceRole.Arn 282 | Environment: 283 | Variables: 284 | classifierBucketPrefix : "quicksight/data/cta" 285 | classifierBucket : !Sub 'aim317-${AWS::AccountId}' 286 | ComprehendARN : !GetAtt ServiceRole.Arn 287 | 288 | ClassifyDocumentPermission: 289 | Type: AWS::Lambda::Permission 290 | Properties: 291 | Action: lambda:InvokeFunction 292 | FunctionName: !GetAtt ClassifyDocument.Arn 293 | Principal: s3.amazonaws.com 294 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 295 | 296 | DetectSentiment: 297 | Type: AWS::Lambda::Function 298 | Properties: 299 | Handler: startSentimentDetection.lambda_handler 300 | Code: 301 | S3Bucket: !Sub 'aim317-code-${AWS::AccountId}' 302 | S3Key: 
"src/startSentimentDetection.zip" 303 | Runtime: python3.8 304 | Timeout: 10 305 | Layers: 306 | - "arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1" 307 | ReservedConcurrentExecutions: 1 308 | FunctionName: "AIM317detectSentiment" 309 | Role: !GetAtt ServiceRole.Arn 310 | Environment: 311 | Variables: 312 | ComprehendBucket : !Sub 'aim317-${AWS::AccountId}' 313 | ComprehendARN : !GetAtt ServiceRole.Arn 314 | 315 | DetectSentimentPermission: 316 | Type: AWS::Lambda::Permission 317 | Properties: 318 | Action: lambda:InvokeFunction 319 | FunctionName: !GetAtt DetectSentiment.Arn 320 | Principal: s3.amazonaws.com 321 | SourceArn: !Sub 'arn:aws:s3:::aim317-${AWS::AccountId}' 322 | 323 | AIM317Bucket: 324 | Type: AWS::S3::Bucket 325 | DeletionPolicy: Delete 326 | UpdateReplacePolicy: Retain 327 | Properties: 328 | AccessControl: Private 329 | BucketName: !Sub 'aim317-${AWS::AccountId}' 330 | LoggingConfiguration: 331 | LogFilePrefix: access-logs 332 | VersioningConfiguration: 333 | Status: Enabled 334 | PublicAccessBlockConfiguration: 335 | BlockPublicAcls: true 336 | BlockPublicPolicy: false 337 | IgnorePublicAcls: true 338 | RestrictPublicBuckets: true 339 | NotificationConfiguration: 340 | LambdaConfigurations: 341 | - Event: s3:ObjectCreated:* 342 | Filter: 343 | S3Key: 344 | Rules: 345 | - Name: prefix 346 | Value: "transcribeInput/" 347 | Function: !GetAtt StartTranscriptionJob.Arn 348 | - Event: s3:ObjectCreated:* 349 | Filter: 350 | S3Key: 351 | Rules: 352 | - Name: prefix 353 | Value: "vocabularyInput/" 354 | Function: !GetAtt CreateVocabulary.Arn 355 | - Event: s3:ObjectCreated:* 356 | Filter: 357 | S3Key: 358 | Rules: 359 | - Name: prefix 360 | Value: "terminologyInput/" 361 | Function: !GetAtt ImportTerminology.Arn 362 | - Event: s3:ObjectCreated:* 363 | Filter: 364 | S3Key: 365 | Rules: 366 | - Name: prefix 367 | Value: "transcriptOutput/" 368 | Function: !GetAtt TranslateText.Arn -------------------------------------------------------------------------------- /notebooks/2-Train-Detect-Entities/annotations.csv: -------------------------------------------------------------------------------- 1 | File,Line,Begin Offset,End Offset,Type 2 | train.csv,0,45,67,ETHICS 3 | train.csv,0,92,108,BRAIN 4 | train.csv,1,28,45,BRAIN 5 | train.csv,1,58,68,ETHICS 6 | train.csv,2,145,155,ETHICS 7 | train.csv,3,123,140,BRAIN 8 | train.csv,4,79,96,BRAIN 9 | train.csv,5,56,61,MOVEMENT 10 | train.csv,5,99,109,BRAIN 11 | train.csv,6,78,95,BRAIN 12 | train.csv,7,70,87,BRAIN 13 | train.csv,8,53,63,BRAIN 14 | train.csv,8,191,201,ETHICS 15 | train.csv,9,200,210,BRAIN 16 | train.csv,10,57,79,ETHICS 17 | train.csv,10,184,207,BRAIN 18 | train.csv,11,99,109,ETHICS 19 | train.csv,12,55,65,ETHICS 20 | train.csv,13,65,83,ETHICS 21 | train.csv,13,104,129,ETHICS 22 | train.csv,14,170,195,ETHICS 23 | train.csv,15,296,306,ETHICS 24 | train.csv,16,79,89,ETHICS 25 | train.csv,16,94,117,BRAIN 26 | train.csv,17,109,131,ETHICS 27 | train.csv,18,299,315,BRAIN 28 | train.csv,19,102,112,BRAIN 29 | train.csv,20,4,20,BRAIN 30 | train.csv,20,153,157,MOVEMENT 31 | train.csv,21,0,16,BRAIN 32 | train.csv,22,87,97,ETHICS 33 | train.csv,23,21,37,BRAIN 34 | train.csv,23,157,167,ETHICS 35 | train.csv,24,80,90,ETHICS 36 | train.csv,24,189,211,ETHICS 37 | train.csv,25,102,118,BRAIN 38 | train.csv,26,135,151,BRAIN 39 | train.csv,27,99,109,BRAIN 40 | train.csv,29,73,82,ETHICS 41 | train.csv,29,119,137,ETHICS 42 | train.csv,30,4,14,BRAIN 43 | train.csv,31,48,64,BRAIN 44 | train.csv,32,7,17,BRAIN 45 | 
train.csv,33,28,44,BRAIN 46 | train.csv,34,100,110,BRAIN 47 | train.csv,35,33,50,BRAIN 48 | train.csv,36,192,209,BRAIN 49 | train.csv,37,81,97,BRAIN 50 | train.csv,37,165,174,ETHICS 51 | train.csv,38,102,118,BRAIN 52 | train.csv,39,69,85,BRAIN 53 | train.csv,40,22,31,ETHICS 54 | train.csv,41,109,118,ETHICS 55 | train.csv,42,59,69,ETHICS 56 | train.csv,42,91,114,ETHICS 57 | train.csv,42,137,167,ETHICS 58 | train.csv,43,130,139,ETHICS 59 | train.csv,44,59,64,BRAIN 60 | train.csv,44,90,100,ETHICS 61 | train.csv,45,40,49,ETHICS 62 | train.csv,45,74,84,ETHICS 63 | train.csv,45,206,222,BRAIN 64 | train.csv,46,131,153,ETHICS 65 | train.csv,47,114,131,BRAIN 66 | train.csv,48,29,46,BRAIN 67 | train.csv,49,44,52,MOVEMENT 68 | train.csv,51,161,166,MOVEMENT 69 | train.csv,51,176,180,MOVEMENT 70 | train.csv,52,41,45,MOVEMENT 71 | train.csv,54,35,39,MOVEMENT 72 | train.csv,55,203,225,ETHICS 73 | train.csv,56,38,42,MOVEMENT 74 | train.csv,57,69,92,BRAIN 75 | train.csv,58,16,39,BRAIN 76 | train.csv,59,40,44,MOVEMENT 77 | train.csv,60,39,43,MOVEMENT 78 | train.csv,61,0,7,MOVEMENT 79 | train.csv,61,117,127,MOVEMENT 80 | train.csv,62,34,38,MOVEMENT 81 | train.csv,65,42,46,MOVEMENT 82 | train.csv,66,35,39,MOVEMENT 83 | train.csv,67,103,107,MOVEMENT 84 | train.csv,69,65,69,MOVEMENT 85 | train.csv,70,65,70,MOVEMENT 86 | train.csv,70,80,84,MOVEMENT 87 | train.csv,73,124,130,MOVEMENT 88 | train.csv,74,7,17,MOVEMENT 89 | train.csv,74,155,160,MOVEMENT 90 | train.csv,75,8,18,MOVEMENT 91 | train.csv,75,65,70,MOVEMENT 92 | train.csv,77,13,17,MOVEMENT 93 | train.csv,77,162,167,BRAIN 94 | train.csv,78,26,30,MOVEMENT 95 | train.csv,79,0,4,MOVEMENT 96 | train.csv,79,61,65,MOVEMENT 97 | train.csv,80,59,63,MOVEMENT 98 | train.csv,81,9,13,MOVEMENT 99 | train.csv,81,201,205,MOVEMENT 100 | train.csv,82,2,6,MOVEMENT 101 | train.csv,83,4,8,MOVEMENT 102 | train.csv,83,42,47,MOVEMENT 103 | train.csv,83,52,58,MOVEMENT 104 | train.csv,84,4,8,MOVEMENT 105 | train.csv,85,4,8,MOVEMENT 106 | train.csv,85,79,89,MOVEMENT 107 | train.csv,86,25,30,MOVEMENT 108 | train.csv,86,35,41,MOVEMENT 109 | train.csv,86,149,153,MOVEMENT 110 | train.csv,87,4,10,MOVEMENT 111 | train.csv,87,45,49,MOVEMENT 112 | train.csv,87,108,115,MOVEMENT 113 | train.csv,88,4,9,MOVEMENT 114 | train.csv,88,105,109,MOVEMENT 115 | train.csv,89,14,18,MOVEMENT 116 | train.csv,90,76,80,MOVEMENT 117 | train.csv,91,142,146,MOVEMENT 118 | train.csv,92,37,41,MOVEMENT 119 | train.csv,92,78,85,MOVEMENT 120 | train.csv,93,16,24,MOVEMENT 121 | train.csv,93,136,142,MOVEMENT 122 | train.csv,94,4,12,MOVEMENT 123 | train.csv,94,101,111,MOVEMENT 124 | train.csv,95,0,8,MOVEMENT 125 | train.csv,95,56,61,MOVEMENT 126 | train.csv,96,21,31,MOVEMENT 127 | train.csv,97,134,156,ETHICS 128 | train.csv,98,62,72,ETHICS 129 | train.csv,99,55,68,ETHICS 130 | train.csv,100,142,152,ETHICS 131 | train.csv,101,4,26,ETHICS 132 | train.csv,101,74,87,ETHICS 133 | train.csv,102,4,14,ETHICS 134 | train.csv,102,86,95,ETHICS 135 | train.csv,102,194,204,ETHICS 136 | train.csv,102,215,235,ETHICS 137 | train.csv,102,322,331,ETHICS 138 | train.csv,103,49,66,BRAIN 139 | train.csv,104,159,169,ETHICS 140 | train.csv,106,31,41,ETHICS 141 | train.csv,106,114,144,ETHICS 142 | train.csv,107,4,14,ETHICS 143 | train.csv,108,22,32,ETHICS 144 | train.csv,109,37,47,ETHICS 145 | train.csv,110,63,72,ETHICS 146 | train.csv,111,41,51,ETHICS 147 | train.csv,113,136,145,ETHICS 148 | train.csv,114,137,147,ETHICS 149 | train.csv,115,116,126,ETHICS 150 | train.csv,116,212,221,ETHICS 151 | train.csv,117,164,174,ETHICS 152 | 
train.csv,118,79,88,ETHICS 153 | train.csv,119,178,188,ETHICS 154 | train.csv,122,67,89,ETHICS 155 | train.csv,123,24,34,ETHICS 156 | train.csv,124,26,36,ETHICS 157 | train.csv,125,156,166,ETHICS 158 | train.csv,126,13,22,ETHICS 159 | train.csv,127,105,114,ETHICS 160 | train.csv,127,124,134,ETHICS 161 | train.csv,128,82,92,ETHICS 162 | train.csv,129,144,154,ETHICS 163 | train.csv,130,70,86,BRAIN 164 | train.csv,130,152,161,ETHICS 165 | train.csv,131,24,40,BRAIN 166 | train.csv,132,32,42,ETHICS 167 | train.csv,132,126,131,BRAIN 168 | train.csv,133,90,100,ETHICS 169 | train.csv,134,95,105,ETHICS 170 | train.csv,135,55,65,ETHICS 171 | train.csv,136,165,175,ETHICS 172 | train.csv,137,75,85,ETHICS 173 | train.csv,138,93,102,ETHICS 174 | train.csv,138,203,213,ETHICS 175 | train.csv,138,275,284,ETHICS 176 | train.csv,138,426,436,ETHICS 177 | train.csv,138,538,548,ETHICS 178 | train.csv,139,4,20,ETHICS 179 | train.csv,139,173,183,ETHICS 180 | train.csv,139,267,277,ETHICS 181 | train.csv,140,33,43,ETHICS 182 | train.csv,140,127,149,ETHICS 183 | train.csv,141,4,20,ETHICS 184 | train.csv,142,173,183,ETHICS 185 | train.csv,143,112,122,ETHICS 186 | train.csv,144,119,129,ETHICS 187 | train.csv,145,66,76,ETHICS 188 | train.csv,146,17,26,ETHICS 189 | train.csv,147,9,19,ETHICS 190 | train.csv,147,158,167,ETHICS 191 | train.csv,148,89,98,ETHICS 192 | train.csv,149,41,57,BRAIN 193 | train.csv,150,90,99,ETHICS 194 | train.csv,151,11,34,BRAIN 195 | train.csv,151,74,84,ETHICS 196 | train.csv,152,96,106,ETHICS 197 | train.csv,152,132,155,BRAIN 198 | train.csv,153,178,188,ETHICS 199 | train.csv,154,95,111,ETHICS 200 | train.csv,155,80,90,ETHICS 201 | train.csv,156,103,113,ETHICS 202 | train.csv,157,121,131,ETHICS 203 | train.csv,158,33,43,ETHICS 204 | train.csv,159,76,86,ETHICS 205 | train.csv,160,20,30,ETHICS 206 | train.csv,161,275,285,ETHICS 207 | train.csv,162,93,103,ETHICS 208 | train.csv,163,64,74,ETHICS 209 | train.csv,164,21,31,ETHICS 210 | train.csv,165,75,85,ETHICS 211 | train.csv,166,2,18,BRAIN 212 | train.csv,168,63,85,ETHICS 213 | train.csv,168,114,130,BRAIN 214 | train.csv,169,0,16,BRAIN 215 | train.csv,170,53,69,BRAIN 216 | train.csv,171,90,112,ETHICS 217 | train.csv,171,213,229,BRAIN 218 | train.csv,172,87,103,BRAIN 219 | train.csv,173,2,18,BRAIN 220 | train.csv,173,72,82,ETHICS 221 | train.csv,174,4,14,ETHICS 222 | train.csv,174,40,45,BRAIN 223 | train.csv,175,67,76,ETHICS 224 | train.csv,175,223,233,ETHICS 225 | train.csv,176,13,18,BRAIN 226 | train.csv,176,218,228,ETHICS 227 | train.csv,177,22,32,BRAIN 228 | train.csv,177,53,58,BRAIN 229 | train.csv,178,59,69,BRAIN 230 | train.csv,179,149,159,ETHICS 231 | train.csv,180,181,197,BRAIN 232 | train.csv,181,109,114,BRAIN 233 | train.csv,182,106,122,BRAIN 234 | train.csv,183,127,143,BRAIN 235 | train.csv,183,178,200,ETHICS 236 | train.csv,184,264,274,BRAIN 237 | train.csv,185,44,60,BRAIN 238 | train.csv,186,199,215,BRAIN 239 | train.csv,187,75,80,BRAIN 240 | train.csv,188,78,94,BRAIN 241 | train.csv,189,18,34,BRAIN 242 | train.csv,190,118,134,BRAIN 243 | train.csv,191,102,118,BRAIN 244 | train.csv,192,66,82,BRAIN 245 | train.csv,193,34,50,BRAIN 246 | train.csv,193,78,88,ETHICS 247 | train.csv,194,115,137,ETHICS 248 | train.csv,195,90,106,BRAIN 249 | train.csv,196,28,44,BRAIN 250 | train.csv,197,131,147,BRAIN 251 | train.csv,198,27,49,ETHICS 252 | train.csv,199,255,271,BRAIN 253 | train.csv,200,98,114,BRAIN 254 | train.csv,201,28,38,BRAIN 255 | train.csv,201,39,62,BRAIN 256 | train.csv,202,65,81,BRAIN 257 | train.csv,203,2,16,BRAIN 258 | 
train.csv,204,2,16,BRAIN 259 | train.csv,205,8,22,BRAIN 260 | train.csv,206,29,43,BRAIN 261 | train.csv,207,115,129,BRAIN 262 | train.csv,208,131,145,BRAIN 263 | train.csv,209,0,14,BRAIN 264 | train.csv,210,90,104,BRAIN 265 | train.csv,211,83,97,BRAIN 266 | train.csv,212,72,78,BRAIN 267 | train.csv,213,56,62,BRAIN 268 | train.csv,213,89,94,BRAIN 269 | train.csv,214,58,72,BRAIN 270 | train.csv,215,31,37,BRAIN 271 | train.csv,216,99,105,BRAIN 272 | train.csv,217,44,49,MOVEMENT 273 | train.csv,217,146,151,MOVEMENT 274 | train.csv,217,205,219,BRAIN 275 | train.csv,218,18,23,MOVEMENT 276 | train.csv,218,76,82,BRAIN 277 | train.csv,219,27,33,BRAIN 278 | train.csv,220,120,134,BRAIN 279 | train.csv,221,80,86,BRAIN 280 | train.csv,222,49,55,BRAIN 281 | train.csv,222,73,87,BRAIN 282 | train.csv,223,128,142,BRAIN 283 | train.csv,224,9,15,BRAIN 284 | train.csv,224,97,102,BRAIN 285 | train.csv,225,0,7,MOVEMENT 286 | train.csv,225,75,81,MOVEMENT 287 | train.csv,228,0,7,MOVEMENT 288 | train.csv,229,34,41,MOVEMENT 289 | train.csv,229,53,57,MOVEMENT 290 | train.csv,231,25,32,MOVEMENT 291 | train.csv,234,5,11,MOVEMENT 292 | train.csv,235,56,63,MOVEMENT 293 | train.csv,238,39,49,MOVEMENT 294 | train.csv,239,32,42,MOVEMENT 295 | train.csv,240,56,66,MOVEMENT 296 | train.csv,240,150,157,MOVEMENT 297 | train.csv,241,35,42,MOVEMENT 298 | train.csv,242,75,85,MOVEMENT 299 | train.csv,244,124,131,MOVEMENT 300 | train.csv,244,135,142,MOVEMENT 301 | train.csv,245,46,50,MOVEMENT 302 | train.csv,245,55,58,MOVEMENT 303 | train.csv,245,90,95,MOVEMENT 304 | train.csv,249,23,27,MOVEMENT 305 | train.csv,250,80,84,MOVEMENT 306 | train.csv,251,44,51,MOVEMENT 307 | train.csv,251,64,70,MOVEMENT 308 | train.csv,252,42,48,MOVEMENT 309 | train.csv,253,141,147,MOVEMENT 310 | train.csv,254,65,73,MOVEMENT 311 | train.csv,254,93,100,MOVEMENT 312 | train.csv,255,77,84,MOVEMENT 313 | train.csv,255,96,103,MOVEMENT 314 | train.csv,256,71,78,MOVEMENT 315 | train.csv,257,34,41,MOVEMENT 316 | train.csv,257,53,57,MOVEMENT 317 | train.csv,258,36,43,MOVEMENT 318 | train.csv,258,48,55,MOVEMENT 319 | train.csv,258,102,112,MOVEMENT 320 | train.csv,259,130,137,MOVEMENT 321 | train.csv,259,165,172,MOVEMENT 322 | train.csv,260,80,87,MOVEMENT 323 | train.csv,261,54,59,MOVEMENT 324 | train.csv,261,115,122,MOVEMENT 325 | train.csv,262,40,44,MOVEMENT 326 | train.csv,263,9,13,MOVEMENT 327 | train.csv,265,46,51,MOVEMENT 328 | train.csv,265,89,96,MOVEMENT 329 | train.csv,266,80,85,MOVEMENT 330 | train.csv,266,125,132,MOVEMENT 331 | train.csv,267,14,20,MOVEMENT 332 | train.csv,267,231,237,MOVEMENT 333 | train.csv,268,93,97,MOVEMENT 334 | train.csv,269,99,106,MOVEMENT 335 | train.csv,271,45,67,ETHICS 336 | train.csv,271,92,108,BRAIN 337 | train.csv,272,28,45,BRAIN 338 | train.csv,272,58,68,ETHICS 339 | train.csv,273,145,155,ETHICS 340 | train.csv,274,123,140,BRAIN 341 | train.csv,275,79,96,BRAIN 342 | train.csv,276,56,61,MOVEMENT 343 | train.csv,276,99,109,BRAIN 344 | train.csv,277,78,95,BRAIN 345 | train.csv,278,70,87,BRAIN 346 | train.csv,279,53,63,BRAIN 347 | train.csv,279,191,201,ETHICS 348 | train.csv,280,200,210,BRAIN 349 | train.csv,281,57,79,ETHICS 350 | train.csv,281,184,207,BRAIN 351 | train.csv,282,99,109,ETHICS 352 | train.csv,283,55,65,ETHICS 353 | train.csv,284,65,83,ETHICS 354 | train.csv,284,104,129,ETHICS 355 | train.csv,285,170,195,ETHICS 356 | train.csv,286,296,306,ETHICS 357 | train.csv,287,79,89,ETHICS 358 | train.csv,287,94,117,BRAIN 359 | train.csv,288,109,131,ETHICS 360 | train.csv,289,299,315,BRAIN 361 | train.csv,290,102,112,BRAIN 362 
| train.csv,291,4,20,BRAIN 363 | train.csv,291,153,157,MOVEMENT 364 | train.csv,292,0,16,BRAIN 365 | train.csv,293,87,97,ETHICS 366 | train.csv,294,21,37,BRAIN 367 | train.csv,294,157,167,ETHICS 368 | train.csv,295,80,90,ETHICS 369 | train.csv,295,189,211,ETHICS 370 | train.csv,296,102,118,BRAIN 371 | train.csv,297,135,151,BRAIN 372 | train.csv,298,99,109,BRAIN 373 | train.csv,300,73,82,ETHICS 374 | train.csv,300,119,137,ETHICS 375 | train.csv,301,4,14,BRAIN 376 | train.csv,302,48,64,BRAIN 377 | train.csv,303,7,17,BRAIN 378 | train.csv,304,28,44,BRAIN 379 | train.csv,305,100,110,BRAIN 380 | train.csv,306,33,50,BRAIN 381 | train.csv,307,192,209,BRAIN 382 | train.csv,308,81,97,BRAIN 383 | train.csv,308,165,174,ETHICS 384 | train.csv,309,102,118,BRAIN 385 | train.csv,310,69,85,BRAIN 386 | train.csv,311,22,31,ETHICS 387 | train.csv,312,109,118,ETHICS 388 | train.csv,313,59,69,ETHICS 389 | train.csv,313,91,114,ETHICS 390 | train.csv,313,137,167,ETHICS 391 | train.csv,314,130,139,ETHICS 392 | train.csv,315,59,64,BRAIN 393 | train.csv,315,90,100,ETHICS 394 | train.csv,316,40,49,ETHICS 395 | train.csv,316,74,84,ETHICS 396 | train.csv,316,206,222,BRAIN 397 | train.csv,317,131,153,ETHICS 398 | train.csv,318,114,131,BRAIN 399 | train.csv,319,29,46,BRAIN 400 | train.csv,320,44,52,MOVEMENT 401 | train.csv,322,161,166,MOVEMENT 402 | train.csv,322,176,180,MOVEMENT 403 | train.csv,323,41,45,MOVEMENT 404 | train.csv,325,35,39,MOVEMENT 405 | train.csv,326,203,225,ETHICS 406 | train.csv,327,38,42,MOVEMENT 407 | train.csv,328,69,92,BRAIN 408 | train.csv,329,16,39,BRAIN 409 | train.csv,330,40,44,MOVEMENT 410 | train.csv,331,39,43,MOVEMENT 411 | train.csv,332,0,7,MOVEMENT 412 | train.csv,332,117,127,MOVEMENT 413 | train.csv,333,34,38,MOVEMENT 414 | train.csv,336,42,46,MOVEMENT 415 | train.csv,337,35,39,MOVEMENT 416 | train.csv,338,103,107,MOVEMENT 417 | train.csv,340,65,69,MOVEMENT 418 | train.csv,341,65,70,MOVEMENT 419 | train.csv,341,80,84,MOVEMENT 420 | train.csv,344,124,130,MOVEMENT 421 | train.csv,345,7,17,MOVEMENT 422 | train.csv,345,155,160,MOVEMENT 423 | train.csv,346,8,18,MOVEMENT 424 | train.csv,346,65,70,MOVEMENT 425 | train.csv,348,13,17,MOVEMENT 426 | train.csv,348,162,167,BRAIN 427 | train.csv,349,26,30,MOVEMENT 428 | train.csv,350,0,4,MOVEMENT 429 | train.csv,350,61,65,MOVEMENT 430 | train.csv,351,59,63,MOVEMENT 431 | train.csv,352,9,13,MOVEMENT 432 | train.csv,352,201,205,MOVEMENT 433 | train.csv,353,2,6,MOVEMENT 434 | train.csv,354,4,8,MOVEMENT 435 | train.csv,354,42,47,MOVEMENT 436 | train.csv,354,52,58,MOVEMENT 437 | train.csv,355,4,8,MOVEMENT 438 | train.csv,356,4,8,MOVEMENT 439 | train.csv,356,79,89,MOVEMENT 440 | train.csv,357,25,30,MOVEMENT 441 | train.csv,357,35,41,MOVEMENT 442 | train.csv,357,149,153,MOVEMENT 443 | train.csv,358,4,10,MOVEMENT 444 | train.csv,358,45,49,MOVEMENT 445 | train.csv,358,108,115,MOVEMENT 446 | train.csv,359,4,9,MOVEMENT 447 | train.csv,359,105,109,MOVEMENT 448 | train.csv,360,14,18,MOVEMENT 449 | train.csv,361,76,80,MOVEMENT 450 | train.csv,362,142,146,MOVEMENT 451 | train.csv,363,37,41,MOVEMENT 452 | train.csv,363,78,85,MOVEMENT 453 | train.csv,364,16,24,MOVEMENT 454 | train.csv,364,136,142,MOVEMENT 455 | train.csv,365,4,12,MOVEMENT 456 | train.csv,365,101,111,MOVEMENT 457 | train.csv,366,0,8,MOVEMENT 458 | train.csv,366,56,61,MOVEMENT 459 | train.csv,367,21,31,MOVEMENT 460 | train.csv,368,134,156,ETHICS 461 | train.csv,369,62,72,ETHICS 462 | train.csv,370,55,68,ETHICS 463 | train.csv,371,142,152,ETHICS 464 | train.csv,372,4,26,ETHICS 465 | 
train.csv,372,74,87,ETHICS 466 | train.csv,373,4,14,ETHICS 467 | train.csv,373,86,95,ETHICS 468 | train.csv,373,194,204,ETHICS 469 | train.csv,373,215,235,ETHICS 470 | train.csv,373,322,331,ETHICS 471 | train.csv,374,49,66,BRAIN 472 | train.csv,375,159,169,ETHICS 473 | train.csv,377,31,41,ETHICS 474 | train.csv,377,114,144,ETHICS 475 | train.csv,378,4,14,ETHICS 476 | train.csv,379,22,32,ETHICS 477 | train.csv,380,37,47,ETHICS 478 | train.csv,381,63,72,ETHICS 479 | train.csv,382,41,51,ETHICS 480 | train.csv,384,136,145,ETHICS 481 | train.csv,385,137,147,ETHICS 482 | train.csv,386,116,126,ETHICS 483 | train.csv,387,212,221,ETHICS 484 | train.csv,388,164,174,ETHICS 485 | train.csv,389,79,88,ETHICS 486 | train.csv,390,178,188,ETHICS 487 | train.csv,393,67,89,ETHICS 488 | train.csv,394,24,34,ETHICS 489 | train.csv,395,26,36,ETHICS 490 | train.csv,396,156,166,ETHICS 491 | train.csv,397,13,22,ETHICS 492 | train.csv,398,105,114,ETHICS 493 | train.csv,398,124,134,ETHICS 494 | train.csv,399,82,92,ETHICS 495 | train.csv,400,144,154,ETHICS 496 | train.csv,401,70,86,BRAIN 497 | train.csv,401,152,161,ETHICS 498 | train.csv,402,24,40,BRAIN 499 | train.csv,403,32,42,ETHICS 500 | train.csv,403,126,131,BRAIN 501 | train.csv,404,90,100,ETHICS 502 | -------------------------------------------------------------------------------- /notebooks/2-Train-Detect-Entities/AIM317-reInvent2021-train-and-detect-entities.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Amazon Comprehend Custom Entity Recognizer\n", 8 | "\n", 9 | "This notebook will serve as a template for the overall process of taking a text dataset and integrating it into [Amazon Comprehend Custom Entity Recognizer](https://docs.aws.amazon.com/comprehend/latest/dg/custom-entity-recognition.html) and perform natural language processing (NLP) to detect custom entities in your text.\n", 10 | "\n", 11 | "## Overview\n", 12 | "\n", 13 | "1. [Introduction to Amazon Comprehend Custom NER](#Introduction)\n", 14 | "1. [Obtaining Your Data](#data)\n", 15 | "1. [Pre-processing data](#preprocess)\n", 16 | "1. [Training a custom recognizer](#train)\n", 17 | "1. [Real time inference](#inference)\n", 18 | "1. [Cleanup](#cleanup)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Introduction to Amazon Comprehend Custom Entity Recognition \n", 26 | "\n", 27 | "Amazon Comprehend recognizes and detects nine entity types out of the box from your data, such as person, date, place etc. Custom entity recognition extends the capability of Amazon Comprehend by helping you identify your specific new entity types that are not of from the preset generic entity types. In this case, this notebook trains Amazon Comprehend to detect three additional entity types - Robot Ethics, Positronic Brain and Kinematics.\n", 28 | "\n", 29 | "Building a custom entity recognizer helps to identify key words and phrases that are relevant to your business needs, and Amazon Comprehend helps you in reducing the complexity by providing automatic annotation and model training to create a custom entity model. 
For more information, see [Comprehend Custom Entity Recognition](https://docs.aws.amazon.com/comprehend/latest/dg/custom-entity-recognition.html)" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Obtaining Your Data \n", 37 | "\n", 38 | "To train a custom entity recognizer, Amazon Comprehend needs training data in one of two formats -\n", 39 | "1. **Entity Lists (plain text only)**\n", 40 | "You specify a list of documents that contain your entities, and in addition, specify a list of specific entities to search for in the documents. This is preferred when you have a finite list of entities to work with (for example, the EasyTron model names).\n", 41 | "2. **Annotations**\n", 42 | "This is more comprehensive, and provides the location of your entities in a large number of documents using the entity locations (offsets). Through this, Comprehnd can train on both the entity and its context. \n", 43 | "\n", 44 | "For our use case, to generate custom annotations, we make use of [Amazon SageMaker Ground Truth](https://aws.amazon.com/sagemaker/groundtruth/). We use Ground Truth with a private workforce to annotate the entities in hundreds of documents, and generate annotation files using the results. To learn more about how to use Ground Truth to annotate data, see [Named Entity Recognition](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-named-entity-recg.html).\n", 45 | "\n", 46 | "For the lab, we have already labeled the data and the annotation files are provided. " 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "## Pre-processing data " 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "# Firstly, we import necessary libraries and initialize clients\n", 63 | "import re\n", 64 | "import time\n", 65 | "import json\n", 66 | "import uuid\n", 67 | "import boto3\n", 68 | "import random\n", 69 | "import secrets\n", 70 | "import datetime\n", 71 | "import sagemaker\n", 72 | "import pandas as pd\n", 73 | "from sagemaker import get_execution_role\n", 74 | "\n", 75 | "\n", 76 | "s3 = boto3.client('s3')\n", 77 | "comprehend = boto3.client('comprehend')\n", 78 | "\n", 79 | "# provide the name of your S3 bucket here. 
This was already created in your account for this workshop\n", 80 | "bucket = '' \n", 81 | "\n", 82 | "region = boto3.session.Session().region_name\n", 83 | "\n", 84 | "# Amazon S3 (S3) client\n", 85 | "s3 = boto3.client('s3', region)\n", 86 | "s3_resource = boto3.resource('s3')\n", 87 | "try:\n", 88 | " s3.head_bucket(Bucket=bucket)\n", 89 | "except:\n", 90 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "# This is the execution role that will be used to call Amazon Transcribe and Amazon Translate\n", 100 | "role = get_execution_role()\n", 101 | "display(role)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "#### We already provided you a training dataset and an annotations file in the repository, let's have a look at them now" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "pd.read_csv('train.csv',header=None).head(10)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "pd.read_csv('annotations.csv').head(10)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": {}, 133 | "outputs": [], 134 | "source": [ 135 | "# let's upload our train and annotation files to S3\n", 136 | "s3.upload_file('train.csv', bucket, 'comprehend/train/train.csv')\n", 137 | "s3.upload_file('annotations.csv', bucket, 'comprehend/train/annotations.csv')\n", 138 | "s3_train_channel = \"s3://\" + bucket + \"/comprehend/train/train.csv\"\n", 139 | "s3_annot_channel = \"s3://\" + bucket + \"/comprehend/train/annotations.csv\"" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Create Comprehend Custom Entity Recognizer" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "custom_entity_request = {\n", 156 | " \"DataFormat\": \"COMPREHEND_CSV\",\n", 157 | " \"Documents\": { \n", 158 | " \"S3Uri\": s3_train_channel,\n", 159 | " \"InputFormat\": \"ONE_DOC_PER_LINE\"\n", 160 | " },\n", 161 | " \"Annotations\": { \n", 162 | " \"S3Uri\": s3_annot_channel\n", 163 | " },\n", 164 | " \"EntityTypes\": [\n", 165 | " {\n", 166 | " \"Type\": \"MOVEMENT\"\n", 167 | " },\n", 168 | " {\n", 169 | " \"Type\": \"BRAIN\"\n", 170 | " },\n", 171 | " {\n", 172 | " \"Type\": \"ETHICS\"\n", 173 | " }\n", 174 | " ]\n", 175 | "}" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "# create unique ID for recognizer\n", 185 | "uid = str(uuid.uuid4())\n", 186 | "\n", 187 | "response = comprehend.create_entity_recognizer(\n", 188 | " RecognizerName=f\"aim317-ner-{uid}\", \n", 189 | " DataAccessRoleArn=role,\n", 190 | " InputDataConfig=custom_entity_request,\n", 191 | " LanguageCode=\"en\",\n", 192 | " VersionName= 'v001'\n", 193 | ")\n", 194 | "\n", 195 | "print(response['EntityRecognizerArn'])" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### Check training status in Amazon Comprehend console\n", 203 | "\n", 204 | "[Go to Amazon Comprehend 
Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#entity-recognition)\n", 205 | "\n", 206 | "This will take approximately 20 minutes. **Execute the Entity Recongizer Metrics step below only after** the entity recognizer model has been created and is ready for use. Otherwise you will get an error message. If this is the case no worries, just try it again after the entity recognizer has finished training." 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "describe_response = comprehend.describe_entity_recognizer(\n", 216 | " EntityRecognizerArn=response['EntityRecognizerArn']\n", 217 | ")\n", 218 | "\n", 219 | "print(describe_response['EntityRecognizerProperties']['Status'])" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "### Entity Recognizer Metrics" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "# Print recognizer metrics\n", 236 | "print(\"Entity recognizer metrics:\")\n", 237 | "for ent in describe_response[\"EntityRecognizerProperties\"][\"RecognizerMetadata\"][\"EntityTypes\"]:\n", 238 | " print(ent['Type'])\n", 239 | " metrics = ent['EvaluationMetrics']\n", 240 | " for k, v in metrics.items():\n", 241 | " metrics[k] = round(v, 2)\n", 242 | " print(metrics)" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "describe_response['EntityRecognizerProperties']['EntityRecognizerArn']" 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "## Create endpoint" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "Now that the model is trained, we'll deploy the model to an Amazon Comprehend endpoint for synchronous, real-time inference. " 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "# NOTE - We are using real-time endpoints and chunked text for demo purposes in this workshop. 
For your actual use case\n", 275 | " # if you don't need real-time insights from Comprehend, we suggest using Comprehend start_entities_detection_job or batch_detect_entities to send the full corpus for entity detection\n", 276 | " # If your need is real-time inference, please use the Comprehend real-time endpoint as we show in this notebook.\n", 277 | " # We have used 4 Inference Units (IU) in this workshop, each IU has a throughput of 100 characters per second.\n", 278 | "endpoint_response = comprehend.create_endpoint(\n", 279 | " EndpointName=f\"aim317-ner-endpoint\",\n", 280 | " ModelArn=describe_response['EntityRecognizerProperties']['EntityRecognizerArn'],\n", 281 | " DesiredInferenceUnits=4, # you are charged based on Inference Units, for this workshop lets create 4 IUs\n", 282 | " DataAccessRoleArn=role\n", 283 | ")" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "print(endpoint_response['EndpointArn'])" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "### Check endpoint status in Amazon Comprehend console\n", 300 | "\n", 301 | "[Go to Amazon Comprehend Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#endpoints)\n", 302 | "\n", 303 | "This will take approximately 10 minutes. Go to the **Run Inference** step below after the endpoint has been created and is ready for use. Running the cells prior to the endpoint being ready will result in error. You can re-execute the cell after the endpoint becomes available." 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "## Run inference" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "# Input files ready for entity recognition\n", 320 | "!aws s3 ls s3://{bucket}/comprehend/input/" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "# Prepare to page through our transcripts in S3\n", 330 | "\n", 331 | "# Define the S3 handles\n", 332 | "s3 = boto3.client('s3')\n", 333 | "s3_resource = boto3.resource('s3')\n", 334 | "\n", 335 | "\n", 336 | "# Specify an S3 output prefix\n", 337 | "t_prefix = 'quicksight/data/entity'\n", 338 | "\n", 339 | "\n", 340 | "# Lets define the bucket name that contains the transcripts first\n", 341 | "# So far we used a session bucket we created for training and testing the classifier\n", 342 | "paginator = s3.get_paginator('list_objects_v2')\n", 343 | "pages = paginator.paginate(Bucket=bucket, Prefix='comprehend/input')\n", 344 | "job_name_list = []\n", 345 | "\n", 346 | "# We will use a temp DataFrame to extract the entity type that is most prominent in the transcript\n", 347 | "tempcols = ['Type', 'Score']\n", 348 | "df_temp = pd.DataFrame(columns=tempcols)\n", 349 | "\n", 350 | "\n", 351 | "# We will define a DataFrame to store the results of the classifier\n", 352 | "cols = ['transcript_name', 'entity_type']\n", 353 | "df_ent = pd.DataFrame(columns=cols)\n", 354 | "\n", 355 | "# Now lets page through the transcripts\n", 356 | "for page in pages:\n", 357 | " for obj in page['Contents']:\n", 358 | " entity = ''\n", 359 | " # get the transcript file name\n", 360 | " transcript_file_name = obj['Key'].split('/')[2]\n", 361 | " # now lets get the transcript file contents\n", 362 | " temp = 
s3_resource.Object(bucket, obj['Key'])\n", 363 | " transcript_content = temp.get()['Body'].read().decode('utf-8')\n", 364 | " # Send a chunk of the transcript for entity recognition\n", 365 | " # NOTE - We are using real-time endpoints and chunked text for demo purposes in this workshop. For your actual use case\n", 366 | " # if you don't need real-time insights from Comprehend, we suggest using Comprehend start_entities_detection_job or batch_detect_entities to send the full corpus for entity detection\n", 367 | " # If your need is real-time inference, please use the Comprehend real-time endpoint as we show in this notebook.\n", 368 | " # We have used 4 Inference Units (IU) in this workshop, each IU has a throughput of 100 characters per second.\n", 369 | " transcript_truncated = transcript_content[400:1800]\n", 370 | " # Call Comprehend to get the entity types the transcript belongs to\n", 371 | " response = comprehend.detect_entities(Text=transcript_truncated, LanguageCode='en', EndpointArn=endpoint_response['EndpointArn'])\n", 372 | " # Extract prominent entity\n", 373 | " df_temp = pd.DataFrame(columns=tempcols)\n", 374 | " for ent in response['Entities']:\n", 375 | " df_temp.loc[len(df_temp.index)] = [ent['Type'],ent['Score']]\n", 376 | " if len(df_temp) > 0:\n", 377 | " entity = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Type']\n", 378 | " else:\n", 379 | " entity = 'No entities'\n", 380 | " \n", 381 | " # Update the results DataFrame with the detected entities\n", 382 | " df_ent.loc[len(df_ent.index)] = [transcript_file_name.strip('en-').strip('.txt'),entity] \n", 383 | "\n", 384 | " # Create a CSV file with cta label from this DataFrame\n", 385 | "df_ent.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'entities.csv', index=False)\n", 386 | "df_ent" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "### We are done here. You can return to the workshop instructions for next steps" 394 | ] 395 | } 396 | ], 397 | "metadata": { 398 | "instance_type": "ml.t3.medium", 399 | "interpreter": { 400 | "hash": "9ddb102edfbd95000dbbd260d8bbcf82701cc06b4dcf114fa04ba84aab75adcb" 401 | }, 402 | "kernelspec": { 403 | "display_name": "conda_python3", 404 | "language": "python", 405 | "name": "conda_python3" 406 | }, 407 | "language_info": { 408 | "codemirror_mode": { 409 | "name": "ipython", 410 | "version": 3 411 | }, 412 | "file_extension": ".py", 413 | "mimetype": "text/x-python", 414 | "name": "python", 415 | "nbconvert_exporter": "python", 416 | "pygments_lexer": "ipython3", 417 | "version": "3.6.13" 418 | } 419 | }, 420 | "nbformat": 4, 421 | "nbformat_minor": 4 422 | } 423 | -------------------------------------------------------------------------------- /notebooks/3-Train-Classify-Calls/AIM317-reInvent2021-train-and-classify-customer-calls.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Amazon Comprehend Custom Classification\n", 8 | "\n", 9 | "This notebook will serve as a template for the overall process of taking a text dataset and integrating it into [Amazon Comprehend Custom Classification](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html) and perform NLP for custom classification.\n", 10 | "\n", 11 | "## Overview\n", 12 | "\n", 13 | "1. [Introduction to Amazon Comprehend Custom Classification](#Introduction)\n", 14 | "1. [Obtaining Your Data](#data)\n", 15 | "1. 
[Pre-processing data](#preprocess)\n", 16 | "1. [Building Custom Classification model](#build)\n", 17 | "1. [Real time inference](#inference)\n", 18 | "1. [Cleanup](#cleanup)\n", 19 | "\n", 20 | "\n", 21 | "## Introduction to Amazon Comprehend Custom Classification \n", 22 | "\n", 23 | "If you are not familiar with Amazon Comprehend Custom Classification you can learn more about this tool on these pages:\n", 24 | "\n", 25 | "* [Product Page](https://aws.amazon.com/comprehend/)\n", 26 | "* [Product Docs](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification.html)\n", 27 | "\n", 28 | "## Training a custom classifier\n", 29 | "\n", 30 | "Custom classification is a two-step process. First, you train a custom classifier to recognize the classes that are of interest to you. Then you send unlabeled documents to be classified.\n", 31 | "\n", 32 | "To train the classifier, specify the options you want, and send Amazon Comprehend documents to be used as training material. Based on the options you indicated, Amazon Comprehend creates a custom ML model that it trains based on the documents you provided. This custom model (the classifier) examines each document you submit. It then returns either the specific class that best represents the content (if you're using multi-class mode) or the set of classes that apply to it (if you're using multi-label mode).\n", 33 | "\n", 34 | "We are going to use a Hugging Face pre-canned dataset of customer reviews and use the multi-class mode. We ensure that dataset is a .csv and the format of the file must be one class and document per line. For example:\n", 35 | "```\n", 36 | "CLASS,Text of document 1\n", 37 | "CLASS,Text of document 2\n", 38 | "CLASS,Text of document 3\n", 39 | "```\n" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "# Install Hugging Face datasets package\n", 49 | "!pip --disable-pip-version-check install datasets --quiet" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "With the datasets installed, now we will import the Pandas library as well as a few other data science tools in order to inspect the information." 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "import os\n", 66 | "import json\n", 67 | "import time\n", 68 | "import uuid\n", 69 | "import boto3\n", 70 | "import pprint\n", 71 | "import string\n", 72 | "import random\n", 73 | "import datetime \n", 74 | "import subprocess\n", 75 | "import numpy as np\n", 76 | "import pandas as pd\n", 77 | "from time import sleep" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "Lets load the data in to dataframe and look at the data we uploaded. Examine the number of columns that are present. Look at few samples to see the content of the data. **This will take 5 to 8 minutes**.\n", 85 | "\n", 86 | "**Note:** CTA means call to action. No CTA means no call to action. This is a metric to determine if the customer's concern was addressed by the agent during the call. A CTA indicates that the customer is satisfied that their concerns has been or will be addressed by the company." 
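As a tiny illustration of the target file shape before we build it from the review dataset, the sketch below writes two synthetic rows the way Comprehend expects them: one label and one document per line, no header, UTF-8. The example rows are invented for illustration only; the real labels are derived from the Hugging Face review ratings later in this notebook.

```python
import pandas as pd

# Synthetic examples for illustration only
toy = pd.DataFrame(
    [
        ('CTA', 'The agent confirmed my EasyTron replacement will ship tomorrow.'),
        ('No CTA', 'I was left on hold and my question was never answered.'),
    ],
    columns=['class', 'text'],
)

# One class and one document per line, no header
toy.to_csv('toy-comprehend-train.csv', header=False, index=False, encoding='utf-8')
```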
87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": {}, 93 | "outputs": [], 94 | "source": [ 95 | "from datasets import load_dataset\n", 96 | "dataset = load_dataset('amazon_us_reviews', 'Electronics_v1_00', split='train[:10%]')" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "dataset.set_format(type='pandas')\n", 106 | "df = dataset[:1000]" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "To convert data to the format that is required by Amazon Comprehend Custom Classifier,\n", 114 | "\n", 115 | "```\n", 116 | "CLASS,Text of document 1\n", 117 | "CLASS,Text of document 2\n", 118 | "CLASS,Text of document 3\n", 119 | "```\n", 120 | "We will identify the column which are class and which have the text content we would like to train on, we can create a new dataframe with selected columns." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "df1 = df[['star_rating','review_body']]\n", 130 | "df1 = df1.rename(columns={\"review_body\": \"text\", \"star_rating\": \"class\"})" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "We will translate the customer product ratings to CTA (call-to-action) and No CTA (no call-to-action). All ratings from 3 and above are considerd as CTA (customer is satisfied) with 1 and 2 considered as No CTA (customer is not satisfied)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "df1.loc[df1['class'] >= 3, 'class'] = 'CTA'\n", 147 | "df1.loc[df1['class'] != 'CTA', 'class'] = 'No CTA'" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "Remove all punctuation from the text" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "import string\n", 164 | "for i,row in df1.iterrows():\n", 165 | " a = row['text'].strip(string.punctuation)\n", 166 | " df1.loc[i,'text'] = a" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "df1.head()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": {}, 182 | "outputs": [], 183 | "source": [ 184 | "df1['class'].value_counts()" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "## Pre-processing data \n" 192 | ] 193 | }, 194 | { 195 | "cell_type": "markdown", 196 | "metadata": {}, 197 | "source": [ 198 | "For training, the file format must conform with the [following](https://docs.aws.amazon.com/comprehend/latest/dg/how-document-classification-training.html):\n", 199 | "\n", 200 | "- File must contain one label and one text per line – 2 columns\n", 201 | "- No header\n", 202 | "- Format UTF-8, carriage return “\\n”.\n", 203 | "\n", 204 | "Labels “must be uppercase, can be multitoken, have whitespace, consist of multiple words connect by underscores or hyphens or may even contain a comma in it, as long as it is correctly escaped.”\n", 205 | "\n", 206 | "For the inference part of it - when you want your custom model to determine which label corresponds to a 
given text - the file format must conform with the following:\n", 207 | "\n", 208 | "- File must contain one text per line\n", 209 | "- No header\n", 210 | "- Format UTF-8, carriage return “\\n”." 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "At this point we have all the data needed to build the training file. \n", 218 | "\n", 219 | "### Building The Target Train and Test Files\n", 220 | "\n", 221 | "With all of the above spelled out, the next thing to do is to build the training file:\n", 222 | "\n", 223 | "1. `comprehend-train.csv` - A CSV file containing 2 columns without a header: first column class, second column text." 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "DSTTRAINFILE='comprehend-train.csv'\n", 233 | "\n", 234 | "df1.to_csv(path_or_buf=DSTTRAINFILE,\n", 235 | " header=False,\n", 236 | " index=False)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "## Train an Amazon Comprehend custom classifier\n", 244 | "Now that all of the required data exists, we can start working on the Comprehend Custom Classifier. \n", 245 | "\n", 246 | "The custom classifier workload is built in two steps:\n", 247 | "\n", 248 | "1. Training the custom model – no particular machine learning or deep learning knowledge is necessary\n", 249 | "1. Classifying new data\n", 250 | "\n", 251 | "Let's follow the steps below to train the custom model:\n", 252 | "\n", 253 | "1. Specify the pre-created bucket name that will host training data artifacts and production results. \n", 254 | "1. Configure an IAM role allowing Comprehend to [access newly created buckets](https://docs.aws.amazon.com/comprehend/latest/dg/access-control-managing-permissions.html#auth-role-permissions)\n", 255 | "1. Prepare data for training\n", 256 | "1. Upload training data in the S3 bucket\n", 257 | "1. Launch a “Train Classifier” job from the console: “Amazon Comprehend” > “Custom Classification” > “Train Classifier”\n", 258 | "1. Prepare data for classification (one text per line, no header, same format as training data). Some more details [here](https://docs.aws.amazon.com/comprehend/latest/dg/how-class-run.html)\n" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [ 267 | "# Get notebook's region\n", 268 | "region = boto3.Session().region_name\n", 269 | "print(region)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "Configure your AWS APIs" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "import sagemaker\n", 286 | "\n", 287 | "s3 = boto3.client('s3')\n", 288 | "comprehend = boto3.client('comprehend')\n", 289 | "role = sagemaker.get_execution_role()" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "Specify an Amazon S3 bucket that will host training data and test data. **Note:** This bucket should have been created already for you. Please go to the Amazon S3 console to verify the bucket is present. It should start with `aim317...`. **Specify your bucket name in the cell below**."
297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "bucket = '' # Provide your bucket name here\n", 306 | "prefix = 'comprehend-custom-classifier' # you can leave this as it is\n", 307 | "\n", 308 | "try:\n", 309 | " s3.head_bucket(Bucket=bucket)\n", 310 | "except:\n", 311 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "### Uploading the data" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "s3.upload_file(DSTTRAINFILE, bucket, prefix+'/' + DSTTRAINFILE)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "## Building Custom Classification model \n", 335 | "\n", 336 | "Launch the classifier training:" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": null, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "s3_train_data = 's3://{}/{}/{}'.format(bucket, prefix, DSTTRAINFILE)\n", 346 | "s3_output_job = 's3://{}/{}/{}'.format(bucket, prefix, 'output/train_job')\n", 347 | "print('training data location: ',s3_train_data, \"output location:\", s3_output_job)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "code", 352 | "execution_count": null, 353 | "metadata": {}, 354 | "outputs": [], 355 | "source": [ 356 | "uid = uuid.uuid4()\n", 357 | "\n", 358 | "training_job = comprehend.create_document_classifier(\n", 359 | " DocumentClassifierName='aim317-cc-' + str(uid),\n", 360 | " DataAccessRoleArn=role,\n", 361 | " InputDataConfig={\n", 362 | " 'S3Uri': s3_train_data\n", 363 | " },\n", 364 | " OutputDataConfig={\n", 365 | " 'S3Uri': s3_output_job\n", 366 | " },\n", 367 | " LanguageCode='en',\n", 368 | " VersionName='v001'\n", 369 | ")" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "### Check training status in Amazon Comprehend console\n", 377 | "\n", 378 | "[Go to Amazon Comprehend Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#classification)\n", 379 | "\n", 380 | "This will take approximately 30 minutes. Go to the **Classifier Metrics** step below after the classifier has been created and is ready for use. Running the cells prior to classifier being ready, will throw an error. Simply re-execute the cell again after the classifier is ready." 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": {}, 386 | "source": [ 387 | "### Classifier Metrics" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "response = comprehend.describe_document_classifier(\n", 397 | " DocumentClassifierArn=training_job['DocumentClassifierArn']\n", 398 | ")\n", 399 | "print(response['DocumentClassifierProperties']['ClassifierMetadata']['EvaluationMetrics'])" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": {}, 405 | "source": [ 406 | "## Real time inference \n", 407 | "We will now use a custom classifier real time endpoint to detect if the audio transcripts and translated text contain indication of there is a clear CTA or not. 
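Before creating a real time endpoint, the classifier itself must have reached the `TRAINED` state. As noted above, running the metrics or endpoint cells too early throws an error; the sketch below is not part of the original notebook, but shows how the training status could be polled programmatically instead of watching the console, reusing the `comprehend` client and the `training_job` response from the cells above:

```python
# Optional sketch: poll the custom classifier until training finishes,
# so the metrics and endpoint cells are only run once the model is TRAINED.
import time

def wait_for_classifier(classifier_arn, poll_seconds=60):
    """Block until the Comprehend custom classifier leaves its training states."""
    while True:
        props = comprehend.describe_document_classifier(
            DocumentClassifierArn=classifier_arn
        )['DocumentClassifierProperties']
        print('Classifier status:', props['Status'])
        if props['Status'] in ('TRAINED', 'IN_ERROR'):
            return props['Status']
        time.sleep(poll_seconds)

# Example usage:
# wait_for_classifier(training_job['DocumentClassifierArn'])
```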
" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "### Create endpoint" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": {}, 421 | "outputs": [], 422 | "source": [ 423 | "model_arn = response[\"DocumentClassifierProperties\"][\"DocumentClassifierArn\"]\n", 424 | "print('Model used for real time endpoint ' + model_arn)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": null, 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [ 433 | "# Let's create an endpoint with 4 Inference Units to account for us sending approximately 400 characters per second to the endpoint\n", 434 | "\n", 435 | "create_endpoint_response = comprehend.create_endpoint(\n", 436 | " EndpointName='aim317-cc-ep',\n", 437 | " ModelArn=model_arn,\n", 438 | " DesiredInferenceUnits=4,\n", 439 | " \n", 440 | ")\n", 441 | "\n", 442 | "print(create_endpoint_response['EndpointArn'])" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": {}, 448 | "source": [ 449 | "### Check endpoint status in Amazon Comprehend console\n", 450 | "\n", 451 | "[Go to Amazon Comprehend Console](https://console.aws.amazon.com/comprehend/v2/home?region=us-east-1#endpoints)\n", 452 | "\n", 453 | "This will take approximately 10 minutes. Go to the **Run Inference** step below after the classifier has been created and is ready for use. Running the cells prior to classifier being ready, will lock the cell. This will presume only after classifier has been trained." 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "### Run Inference\n", 461 | "\n", 462 | "Lets review the list of files ready for inference in the `comprehend/input` folder of our S3 bucket. 
These files were created by the notebook available in `1-Transcribe-Translate-Calls`" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "# Input files ready for classification\n", 472 | "!aws s3 ls s3://{bucket}/comprehend/input/" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": null, 478 | "metadata": {}, 479 | "outputs": [], 480 | "source": [ 481 | "# Prepare to page through our transcripts in S3\n", 482 | "\n", 483 | "# Define the S3 handles\n", 484 | "s3 = boto3.client('s3')\n", 485 | "s3_resource = boto3.resource('s3')\n", 486 | "\n", 487 | "\n", 488 | "# We will be merging the classifier predictions with the transcript segments we created for quicksight in 1-Transcribe-Translate\n", 489 | "t_prefix = 'quicksight/data/cta'\n", 490 | "\n", 491 | "\n", 492 | "# Lets define the bucket name that contains the transcripts first\n", 493 | "# So far we used a session bucket we created for training and testing the classifier\n", 494 | "\n", 495 | "paginator = s3.get_paginator('list_objects_v2')\n", 496 | "pages = paginator.paginate(Bucket=bucket, Prefix='comprehend/input')\n", 497 | "a = []\n", 498 | "\n", 499 | "\n", 500 | "# We will define a DataFrame to store the results of the classifier\n", 501 | "cols = ['transcript_name', 'cta_status']\n", 502 | "df_class = pd.DataFrame(columns=cols)\n", 503 | "\n", 504 | "# Now lets page through the transcripts\n", 505 | "for page in pages:\n", 506 | " for obj in page['Contents']:\n", 507 | " cta = ''\n", 508 | " # get the transcript file name\n", 509 | " transcript_file_name = obj['Key'].split('/')[2]\n", 510 | " # now lets get the transcript file contents\n", 511 | " temp = s3_resource.Object(bucket, obj['Key'])\n", 512 | " transcript_content = temp.get()['Body'].read().decode('utf-8')\n", 513 | " # Send the last few sentence(s) for classification\n", 514 | " transcript_truncated = transcript_content[1500:1900]\n", 515 | " # Call Comprehend to classify input text\n", 516 | " response = comprehend.classify_document(Text=transcript_truncated, EndpointArn=create_endpoint_response['EndpointArn'])\n", 517 | " # Now we need to determine which of the two classes has the higher confidence score\n", 518 | " # Use the name for that score as our predicted label\n", 519 | " a = response['Classes']\n", 520 | " # We will use this temp DataFrame to extract the class with maximum confidence level for CTA\n", 521 | " tempcols = ['Name', 'Score']\n", 522 | " df_temp = pd.DataFrame(columns=tempcols)\n", 523 | " for i in range(0, 2):\n", 524 | " df_temp.loc[len(df_temp.index)] = [a[i]['Name'], a[i]['Score']]\n", 525 | " cta = df_temp.iloc[df_temp.Score.argmax(), 0:2]['Name']\n", 526 | " \n", 527 | " # Update the results DataFrame with the cta predicted label\n", 528 | " # Create a CSV file with cta label from this DataFrame\n", 529 | " df_class.loc[len(df_class.index)] = [transcript_file_name.strip('en-').strip('.txt'), cta] \n", 530 | "\n", 531 | "df_class.to_csv('s3://' + bucket + '/' + t_prefix + '/' + 'cta_status.csv', index=False)\n", 532 | "df_class" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "### End of notebook\n", 540 | "Please go back to the workshop instructions to continue to the next step" 541 | ] 542 | } 543 | ], 544 | "metadata": { 545 | "instance_type": "ml.t3.medium", 546 | "interpreter": { 547 | "hash": "9ddb102edfbd95000dbbd260d8bbcf82701cc06b4dcf114fa04ba84aab75adcb" 
548 | }, 549 | "kernelspec": { 550 | "display_name": "conda_python3", 551 | "language": "python", 552 | "name": "conda_python3" 553 | }, 554 | "language_info": { 555 | "codemirror_mode": { 556 | "name": "ipython", 557 | "version": 3 558 | }, 559 | "file_extension": ".py", 560 | "mimetype": "text/x-python", 561 | "name": "python", 562 | "nbconvert_exporter": "python", 563 | "pygments_lexer": "ipython3", 564 | "version": "3.6.13" 565 | } 566 | }, 567 | "nbformat": 4, 568 | "nbformat_minor": 4 569 | } 570 | -------------------------------------------------------------------------------- /notebooks/1-Transcribe-Translate-Calls/AIM317-reInvent2021-transcribe-and-translate-customer-calls.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "04837ae6", 6 | "metadata": {}, 7 | "source": [ 8 | "## Boost transcription accuracy with Amazon Translate custom vocabulary and localize transcripts with Amazon Translate custom terminology\n", 9 | "\n", 10 | "This is the accompanying notebook for the re:Invent 2021 workshop AIM317 - Uncover insights from your customer conversations. Please run this notebook after reviewing the **[prerequisites and instructions](https://studio.us-east-1.prod.workshops.aws/preview/076e45e5-760d-41cf-bd22-a86c46ee462c/builds/83c4ddb7-fbc6-4e72-b5da-967f8fe7cfcb/en-US/1-transcribe-translate-calls)** from the workshop. " 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "id": "f591749b", 16 | "metadata": {}, 17 | "source": [ 18 | "## Prerequisites for this notebook" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "id": "49637499", 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "# First, let's install dependencies for the transcript word utility we will use in this notebook\n", 29 | "!pip install python-docx --quiet\n", 30 | "!pip install matplotlib --quiet\n", 31 | "!pip install scipy --quiet" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "id": "c8906dc9", 37 | "metadata": {}, 38 | "source": [ 39 | "### Import libraries and initialize variables" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "id": "7bedcea1", 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "import io\n", 50 | "import os\n", 51 | "import re\n", 52 | "import uuid\n", 53 | "import json\n", 54 | "import time\n", 55 | "import boto3\n", 56 | "import pprint\n", 57 | "import botocore\n", 58 | "import sagemaker\n", 59 | "import subprocess\n", 60 | "from sagemaker import get_execution_role\n", 61 | "from datetime import datetime, timezone" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "id": "40d91817", 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "bucket = '' # Add your bucket name here\n", 72 | "\n", 73 | "region = boto3.session.Session().region_name\n", 74 | "\n", 75 | "# Amazon S3 (S3) client\n", 76 | "s3 = boto3.client('s3', region)\n", 77 | "s3_resource = boto3.resource('s3')\n", 78 | "try:\n", 79 | " s3.head_bucket(Bucket=bucket)\n", 80 | "except:\n", 81 | " print(\"The S3 bucket name {} you entered seems to be incorrect, please try again\".format(bucket))" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "id": "935d324a", 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "INPUT_PATH_TRANSCRIBE = 'transcribe/input'\n", 92 | "OUTPUT_PATH_TRANSCRIBE = 'transcribe/output'\n", 93 | "INPUT_PATH_TRANSLATE 
= 'translate/input'\n", 94 | "OUTPUT_PATH_TRANSLATE = 'translate/output'" 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": null, 100 | "id": "f32f26e5", 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "region = boto3.session.Session().region_name\n", 105 | "bucket_region = s3.head_bucket(Bucket=bucket)['ResponseMetadata']['HTTPHeaders']['x-amz-bucket-region']\n", 106 | "assert bucket_region == region, \"Your S3 bucket {} and this notebook need to be in the same region.\".format(bucket)\n", 107 | "# Amazon Transcribe client\n", 108 | "transcribe_client = boto3.client(\"transcribe\")\n", 109 | "# Amazon Translate client\n", 110 | "translate_client = boto3.client(\"translate\")" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "id": "d190c821", 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "# This is the execution role that will be used to call Amazon Transcribe and Amazon Translate\n", 121 | "role = get_execution_role()\n", 122 | "display(role)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "id": "36e8d4a4", 128 | "metadata": {}, 129 | "source": [ 130 | "## Amazon Transcribe Custom" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "id": "f03eb05a", 136 | "metadata": {}, 137 | "source": [ 138 | "### Create custom vocabulary\n", 139 | "\n", 140 | "You can give Amazon Transcribe more information about how to process speech in your input file by creating a custom vocabulary in text file format. A custom vocabulary is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words and phrases, words that Amazon Transcribe isn't recognizing, or proper nouns." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "id": "5a22d56c", 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# First lets view our vocabulary files\n", 151 | "!pygmentize 'input/custom-vocabulary-EN.txt'" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "id": "51c9f86b", 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "# First lets view our vocabulary files - Uncomment line below to view if you like\n", 162 | "#!pygmentize 'input/custom-vocabulary-ES.txt'" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "id": "fdce93dc", 168 | "metadata": {}, 169 | "source": [ 170 | "#### Custom vocabularies can be in table or list formats\n", 171 | "\n", 172 | "Each vocabulary file can be in either table or list format; table format is strongly recommended because it gives you more options for and more control over the input and output of words within your custom vocabulary. As you saw above, we used the table format for this workshop. When you use the table format, it has 4 columns as explain below:\n", 173 | "\n", 174 | "1. **Phrase**\n", 175 | "The word or phrase that should be recognized. If the entry is a phrase, separate the words with a hyphen (-). For example, you type Los Angeles as Los-Angeles. The Phrase field is required\n", 176 | "\n", 177 | "1. **IPA**\n", 178 | "The pronunciation of your word or phrase using IPA characters. You can include characters in the International Phonetic Alphabet (IPA) in this field.\n", 179 | "\n", 180 | "1. **SoundsLike**\n", 181 | "The pronunciation of your word or phrase using the standard orthography of the language to mimic the way that the word sounds.\n", 182 | "\n", 183 | "1. 
**DisplayAs**\n", 184 | "Defines the how the word or phrase looks when it's output. For example, if the word or phrase is Los-Angeles, you can specify the display form as \"Los Angeles\" so that the hyphen is not present in the output." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "id": "dd14c202", 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "# Next we will upload our vocabulary files to our S3 bucket\n", 195 | "cust_vocab_en = 'custom-vocabulary-EN.txt'\n", 196 | "cust_vocab_es = 'custom-vocabulary-ES.txt'\n", 197 | "s3.upload_file('input/' + cust_vocab_en,bucket,INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_en)\n", 198 | "s3.upload_file('input/' + cust_vocab_es,bucket,INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_es)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "id": "2aba0fbe", 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "# Create the custom vocabulary in Transcribe\n", 209 | "# The name of your custom vocabulary must be unique!\n", 210 | "vocab_EN = 'custom-vocab-EN-' + str(uuid.uuid4())\n", 211 | "vocab_ES = 'custom-vocab-ES-' + str(uuid.uuid4())" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "id": "70145b3e", 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "vocab_response_EN = transcribe_client.create_vocabulary(\n", 222 | " VocabularyName=vocab_EN,\n", 223 | " LanguageCode='en-US',\n", 224 | " VocabularyFileUri='s3://' + bucket + '/'+ INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_en\n", 225 | ")" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "id": "9cf568bd", 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "vocab_response_ES = transcribe_client.create_vocabulary(\n", 236 | " VocabularyName=vocab_ES,\n", 237 | " LanguageCode='es-US',\n", 238 | " VocabularyFileUri='s3://' + bucket + '/'+ INPUT_PATH_TRANSCRIBE + '/' + cust_vocab_es\n", 239 | ")" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "id": "735d85ac", 245 | "metadata": {}, 246 | "source": [ 247 | "### Check Vocabulary status in Amazon Transcribe console\n", 248 | "\n", 249 | "[Go to Amazon Transcribe Console](https://console.aws.amazon.com/transcribe/home?region=us-east-1#vocabulary)\n", 250 | "\n", 251 | "This will take 3 to 5 minutes. 
Go to the **Perform Transcription** step below once the vocabulary has been created and is ready for use.\n", 252 | " " 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "id": "6ac52da7", 258 | "metadata": {}, 259 | "source": [ 260 | "### Perform Transcription" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "id": "ad29b8c6", 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "# First let us list our audio files and then upload them to the S3 bucket\n", 271 | "audio_dir = 'input/audio-recordings'\n", 272 | "\n", 273 | "for subdir, dirs, files in os.walk(audio_dir):\n", 274 | " for file in files:\n", 275 | " s3.upload_file(os.path.join(subdir, file), bucket, 'transcribe/' + os.path.join(subdir, file))\n", 276 | " print(\"Uploaded to: \" + \"s3://\" + bucket + '/transcribe/' + os.path.join(subdir, file))" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "id": "fa357c3a", 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "# Define the method that will perform transcription\n", 287 | "\n", 288 | "def transcribe(job_name, job_uri, lang_code, vocab_name):\n", 289 | " \"\"\"Transcribe audio files to text.\n", 290 | " Args:\n", 291 | " job_name (str): the name of the job that you specify;\n", 292 | " the output json will be job_name.json\n", 293 | " job_uri (str): input path (in s3) to the file being transcribed\n", 294 | " in_bucket (str): s3 bucket prefix where the input audio files are present\n", 295 | " out_bucket (str): s3 bucket name that you want the output json\n", 296 | " to be placed in\n", 297 | " vocab_name (str): name of custom vocabulary used;\n", 298 | " \"\"\"\n", 299 | " try:\n", 300 | " transcribe_client.start_transcription_job(\n", 301 | " TranscriptionJobName=job_name,\n", 302 | " LanguageCode=lang_code,\n", 303 | " Media={\"MediaFileUri\": job_uri},\n", 304 | " Settings={'VocabularyName': vocab_name, 'MaxSpeakerLabels': 2, 'ShowSpeakerLabels': True}\n", 305 | " )\n", 306 | " \n", 307 | " time.sleep(2)\n", 308 | " \n", 309 | " print(transcribe_client.get_transcription_job(TranscriptionJobName=job_name)['TranscriptionJob']['TranscriptionJobStatus'])\n", 310 | "\n", 311 | " except Exception as e:\n", 312 | " print(e)\n" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "id": "abfd025d", 318 | "metadata": {}, 319 | "source": [ 320 | "**Note:** As you can see in our code below we are determining the language code to send to Amazon Transcribe. However this is not required if you set the [IdentifyLanguage to True](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job). In our case we needed to select either the English or Spanish Custom Vocabulary file to use for transcribing audio files and hence we went with specific language codes. 
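The transcription jobs below reference these custom vocabularies, and a job submitted while a vocabulary is still `PENDING` will fail. The following sketch is not part of the original notebook; it uses the `transcribe_client`, `vocab_EN`, and `vocab_ES` variables defined earlier to wait until both vocabularies are `READY` before the jobs are submitted:

```python
# Optional sketch: wait for both custom vocabularies to finish building before
# submitting transcription jobs that reference them.
import time

for vocab_name in (vocab_EN, vocab_ES):
    while True:
        state = transcribe_client.get_vocabulary(VocabularyName=vocab_name)['VocabularyState']
        print(vocab_name, state)
        if state in ('READY', 'FAILED'):
            break
        time.sleep(30)
```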
" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "id": "75653a8e", 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "# Now we will loop through the recordings in our bucket to submit the transcription jobs\n", 331 | "now = datetime.now()\n", 332 | "time_now = now.strftime(\"%H.%M.%S\")\n", 333 | "\n", 334 | "paginator = s3.get_paginator('list_objects_v2')\n", 335 | "pages = paginator.paginate(Bucket=bucket, Prefix='transcribe/input/audio-recordings')\n", 336 | "job_name_list = []\n", 337 | "\n", 338 | "for page in pages:\n", 339 | " for obj in page['Contents']:\n", 340 | " audio_name = obj['Key'].split('/')[3].split('.')[0]\n", 341 | " job_name = audio_name + '-' + time_now\n", 342 | " job_name_list.append(job_name)\n", 343 | " job_uri = f\"s3://{bucket}/{obj['Key']}\"\n", 344 | " print('Submitting transcription for audio: ' + job_name)\n", 345 | " vocab = ''\n", 346 | " lang_code = ''\n", 347 | " if audio_name.split('-')[2] == 'EN':\n", 348 | " vocab = vocab_EN\n", 349 | " lang_code = 'en-US'\n", 350 | " elif audio_name.split('-')[2] == 'ES':\n", 351 | " vocab = vocab_ES\n", 352 | " lang_code = 'es-US'\n", 353 | " # submit the transcription job now, we will provide our current bucket name as the output bucket\n", 354 | " transcribe(job_name, job_uri, lang_code,vocab)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "id": "275266e9", 360 | "metadata": {}, 361 | "source": [ 362 | "### Check Transcription job status in Amazon Transcribe console\n", 363 | "\n", 364 | "[Go to Amazon Transcribe Console](https://console.aws.amazon.com/transcribe/home?region=us-east-1#jobs)\n", 365 | "\n", 366 | "This will be complete in about 5 to 8 minutes in total for all the jobs. Go to the **Process Transcription output** step below once the transcription jobs show status as complete otherwise you will get an error. No worries, just try again in a minute or so.\n", 367 | " " 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "id": "ffd8e4a4", 373 | "metadata": {}, 374 | "source": [ 375 | "### Process Transcription output" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "id": "823fb45c", 381 | "metadata": {}, 382 | "source": [ 383 | "#### Clone Transcribe helper repo\n", 384 | "\n", 385 | "From a terminal window in your notebook instance, navigate to the current directory where this notebook resides, and execute the command `git clone https://github.com/aws-samples/amazon-transcribe-output-word-document` before executing the cell below. The steps you will have to follow are:\n", 386 | "\n", 387 | "1. From Jupyter notebook home page on the right, select New --> Terminal\n", 388 | "1. In the terminal window, type `cd SageMaker`\n", 389 | "1. Now type `cd aim317-uncover-insights-customer-conversations`\n", 390 | "1. Now type `cd notebooks`\n", 391 | "1. Now type `cd 1-Transcribe-Translate-Calls`\n", 392 | "1. Finally type the command `git clone https://github.com/aws-samples/amazon-transcribe-output-word-document`\n" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "id": "55d6cc7c", 398 | "metadata": {}, 399 | "source": [ 400 | "#### Create a word document from call transcript\n", 401 | "\n", 402 | "We will generate a word document from the Amazon Transcribe response JSON so we review the transcript. Once you execute the code in the next cell, go to your notebook folder and **you will see the word document created with the Transcribe job name. 
Select this word document, click download and you can open it to review the transcript**." 403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "id": "c314aea5", 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "!python amazon-transcribe-output-word-document/python/ts-to-word.py --inputJob {job_name_list[0]}" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "id": "82267b4f", 418 | "metadata": {}, 419 | "source": [ 420 | "### Get Call Segments\n", 421 | "We will get the call segments and speaker information to derive additional insights we can visualize in QuickSight. " 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": null, 427 | "id": "75b324a0", 428 | "metadata": {}, 429 | "outputs": [], 430 | "source": [ 431 | "import pandas as pd\n", 432 | "\n", 433 | "def upload_segments(transcript):\n", 434 | " # Get the speaker segments\n", 435 | " cols = ['transcript_name', 'start_time', 'end_time', 'speaker_label']\n", 436 | " spk_df = pd.DataFrame(columns=cols)\n", 437 | " for seg in original['results']['speaker_labels']['segments']:\n", 438 | " for item in seg['items']:\n", 439 | " spk_df.loc[len(spk_df.index)] = [transcript['jobName'], item['start_time'], item['end_time'], item['speaker_label']]\n", 440 | " # Get the speaker content\n", 441 | " icols = ['transcript_name', 'start_time', 'end_time', 'confidence', 'content']\n", 442 | " item_df = pd.DataFrame(columns=icols)\n", 443 | " for itms in original['results']['items']:\n", 444 | " if itms.get('start_time') is not None:\n", 445 | " item_df.loc[len(item_df.index)] = [transcript['jobName'], itms['start_time'], itms['end_time'], itms['alternatives'][0]['confidence'], itms['alternatives'][0]['content']]\n", 446 | "\n", 447 | " # Merge the two on transcript name, start time and end time\n", 448 | " full_df = pd.merge(spk_df, item_df, how='left', left_on=['transcript_name', 'start_time', 'end_time'], right_on = ['transcript_name', 'start_time', 'end_time'])\n", 449 | " # We will use the Transcribe Job Name for the CSV file name\n", 450 | " csv_file = transcript['jobName'] + '.csv'\n", 451 | " full_df.to_csv(csv_file, index=False)\n", 452 | " s3.upload_file(csv_file, bucket, 'quicksight/data/transcripts/' + csv_file)\n", 453 | " # The print below is too verbose so commenting for now - feel free to uncomment if needed\n", 454 | " #print(\"CSV file with speaker segments created and uploaded for visualization input to: \" + \"s3://\" + bucket + \"/\" + \"quicksight/data/transcripts/\" + csv_file)" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "id": "51c363b8", 460 | "metadata": {}, 461 | "source": [ 462 | "#### Upload transcript text files to S3 bucket\n", 463 | "\n", 464 | "We will now get the full transcript from all the calls and send them to our S3 bucket in preparation for our translation tasks\n" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": null, 470 | "id": "f86e1c56", 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "# First we need an output directory\n", 475 | "dir = os.getcwd()+'/output'\n", 476 | "if not os.path.exists(dir):\n", 477 | " os.makedirs(dir)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "id": "73269b2d", 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "# Our transcript is in a presigned URL in Transcribe's S3 bucket, let us download it and get the text we need\n", 488 | "import urllib3\n", 489 | "\n", 490 | 
"for job in job_name_list:\n", 491 | " response = transcribe_client.get_transcription_job(\n", 492 | " TranscriptionJobName=job \n", 493 | " )\n", 494 | " file_name = response['TranscriptionJob']['Transcript']['TranscriptFileUri']\n", 495 | " http = urllib3.PoolManager()\n", 496 | " transcribed_data = http.request('GET', file_name)\n", 497 | " original = json.loads(transcribed_data.data.decode('utf-8'))\n", 498 | " # Extract the speaker segments, confidence scores for each call\n", 499 | " # Send it to the QuickSight folder in the S3 bucket\n", 500 | " # We will use this during visualization\n", 501 | " upload_segments(original)\n", 502 | " entire_transcript = ''\n", 503 | " entire_transcript = original[\"results\"][\"transcripts\"]\n", 504 | " outfile = 'output/'+job+'.txt'\n", 505 | " with open(outfile, 'w') as out:\n", 506 | " out.write(entire_transcript[0]['transcript'])\n", 507 | " s3.upload_file(outfile,bucket,OUTPUT_PATH_TRANSCRIBE+'/'+job+'.txt')\n", 508 | " print(\"Transcript uploaded to: \" + f's3://{bucket}/{OUTPUT_PATH_TRANSCRIBE}/{job}.txt')" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "id": "19f1202e", 514 | "metadata": {}, 515 | "source": [ 516 | "## Amazon Translate with Custom Terminology\n", 517 | "\n", 518 | "[Amazon Translate](https://aws.amazon.com/translate/) is a fully managed, neural machine translation service that delivers high quality and affordable language translation in seventy-one languages. Using [custom terminology](https://docs.aws.amazon.com/translate/latest/dg/how-custom-terminology.html) with your translation requests enables you to make sure that your brand names, character names, model names, and other unique content is translated exactly the way you need it, regardless of its context and the Amazon Translate algorithm’s decision. It's easy to set up a terminology file and attach it to your Amazon Translate account. When you translate text, you simply choose to use the custom terminology as well, and any examples of your source word are translated as you want them." 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "id": "98cfd5aa", 524 | "metadata": {}, 525 | "source": [ 526 | "### Translate the Spanish transcripts\n", 527 | "\n", 528 | "We will first create a custom terminology file that consists of examples that show how you want words to be translated. In our case we are using a CSV file as the format, but it supports [TMX as well](https://docs.aws.amazon.com/translate/latest/dg/creating-custom-terminology.html). It includes a collection of words or terminologies in a source language, and for each example, it contains the desired translation output in one or more target languages. We created a sample custom terminology file for our use case which is available in the input folder of this notebook `translate-custom-terminology.txt` to create a translation of our Spanish transcripts. We will now review this file and proceed with setting up a Custom Translation job." 
529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "id": "7826ef83", 534 | "metadata": {}, 535 | "source": [ 536 | "#### Review custom terminology file" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": null, 542 | "id": "6e143fdc", 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "# Lets first review our custom terminology file \n", 547 | "# We created a sample file for this workshop that we can use - uncomment below to check\n", 548 | "#!pygmentize 'input/translate-custom-terminology.txt'" 549 | ] 550 | }, 551 | { 552 | "cell_type": "code", 553 | "execution_count": null, 554 | "id": "0271a2b2", 555 | "metadata": {}, 556 | "outputs": [], 557 | "source": [ 558 | "# Change extension to CSV and upload to S3 bucket\n", 559 | "term_prefix = 'translate/custom-terminology/'\n", 560 | "pd_filename = 'translate-custom-terminology'\n", 561 | "s3.upload_file('input/' + pd_filename + '.txt', bucket, term_prefix + '/' + pd_filename + '.csv')" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "id": "58ffe46e", 567 | "metadata": {}, 568 | "source": [ 569 | "#### Import custom terminology to Amazon Translate" 570 | ] 571 | }, 572 | { 573 | "cell_type": "code", 574 | "execution_count": null, 575 | "id": "d5a844e8", 576 | "metadata": {}, 577 | "outputs": [], 578 | "source": [ 579 | "# read the custom terminology csv file we uploaded\n", 580 | "temp = s3_resource.Object(bucket, term_prefix + '/' + pd_filename + '.csv')\n", 581 | "term_file = temp.get()['Body'].read().decode('utf-8')" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": null, 587 | "id": "b7e81491", 588 | "metadata": {}, 589 | "outputs": [], 590 | "source": [ 591 | "# import the custom terminology file to Translate\n", 592 | "term_name = 'aim317-custom-terminology'\n", 593 | "response = translate_client.import_terminology(\n", 594 | " Name=term_name,\n", 595 | " MergeStrategy='OVERWRITE',\n", 596 | " TerminologyData={\n", 597 | " 'File': term_file,\n", 598 | " 'Format': 'CSV'\n", 599 | " }\n", 600 | ")" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "id": "7349b0fe", 606 | "metadata": {}, 607 | "source": [ 608 | "#### Get the Spanish transcripts" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": null, 614 | "id": "29a9b1fb", 615 | "metadata": {}, 616 | "outputs": [], 617 | "source": [ 618 | "# Review the list of transcripts to pick Spanish transcripts\n", 619 | "paginator = s3.get_paginator('list_objects_v2')\n", 620 | "pages = paginator.paginate(Bucket=bucket, Prefix=OUTPUT_PATH_TRANSCRIBE)\n", 621 | "\n", 622 | "s3_resource = boto3.resource('s3')\n", 623 | "# Now copy the Spanish transcripts to Translate Input folder\n", 624 | "for page in pages:\n", 625 | " for obj in page['Contents']:\n", 626 | " lang = ''\n", 627 | " ts_file = obj['Key'].split('/')[2]\n", 628 | " tscript = ts_file.split('-')\n", 629 | " if len(tscript) > 1:\n", 630 | " lang = tscript[2]\n", 631 | " if lang == 'ES':\n", 632 | " copy_source = {'Bucket': bucket,'Key': obj['Key']}\n", 633 | " s3_resource.meta.client.copy(copy_source, bucket, INPUT_PATH_TRANSLATE + '/' + ts_file)" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "id": "d823c05d", 639 | "metadata": {}, 640 | "source": [ 641 | "#### Run translation synchronously\n", 642 | "\n", 643 | "**Note:** For the purposes of this workshop we are running this translate synchronously as we have only 2 call transcripts to be translated. 
For large scale translation requirements, you should use [start_text_translation_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/translate.html#Translate.Client.start_text_translation_job) and for batch custom translation processing requirements you should use the [Parallel Data File with Translate Active Custom Translation](https://docs.aws.amazon.com/translate/latest/dg/customizing-translations-parallel-data.html) " 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": null, 649 | "id": "d221d433", 650 | "metadata": {}, 651 | "outputs": [], 652 | "source": [ 653 | "# Read the spanish transcripts from the Translate input folder in S3 bucket\n", 654 | "paginator = s3.get_paginator('list_objects_v2')\n", 655 | "pages = paginator.paginate(Bucket=bucket, Prefix=INPUT_PATH_TRANSLATE)\n", 656 | "for page in pages:\n", 657 | " for obj in page['Contents']:\n", 658 | " temp = s3_resource.Object(bucket, obj['Key'])\n", 659 | " trans_input = temp.get()['Body'].read().decode('utf-8')\n", 660 | " if len(trans_input) > 0:\n", 661 | " # Translate the Spanish transcripts\n", 662 | " trans_response = translate_client.translate_text(\n", 663 | " Text=trans_input,\n", 664 | " TerminologyNames=[term_name],\n", 665 | " SourceLanguageCode='es',\n", 666 | " TargetLanguageCode='en'\n", 667 | " )\n", 668 | " # Write the translated text to a temporary file\n", 669 | " with open('temp_translate.txt', 'w') as outfile:\n", 670 | " outfile.write(trans_response['TranslatedText'])\n", 671 | " # Upload the translated text to S3 bucket \n", 672 | " s3.upload_file('temp_translate.txt', bucket, OUTPUT_PATH_TRANSLATE + '/en-' + obj['Key'].split('/')[2])\n", 673 | " print(\"Translated text file uploaded to: \" + 's3://' + bucket + '/' + OUTPUT_PATH_TRANSLATE + '/en-' + obj['Key'].split('/')[2])\n", 674 | " " 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "id": "33118b19", 680 | "metadata": {}, 681 | "source": [ 682 | "### Prepare Comprehend inputs\n", 683 | "\n", 684 | "We will now collect the original English transcripts and the translated Spanish language transcripts and move them to the Comprehend input folder in our S3 bucket in preparation for next steps in the workshop." 
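Following on the note above about large-scale translation: for larger transcript volumes the synchronous loop can be replaced with an asynchronous batch job that also honors the custom terminology. This is only a sketch, not part of the original workshop; the `DataAccessRoleArn` shown is a placeholder for a role that Amazon Translate can assume and that can read and write the S3 prefixes used:

```python
# Optional sketch: asynchronous batch translation with custom terminology.
# The role ARN below is a placeholder, not a resource created by this workshop.
batch_response = translate_client.start_text_translation_job(
    JobName='aim317-batch-translation',
    InputDataConfig={
        'S3Uri': 's3://' + bucket + '/' + INPUT_PATH_TRANSLATE + '/',
        'ContentType': 'text/plain'
    },
    OutputDataConfig={
        'S3Uri': 's3://' + bucket + '/' + OUTPUT_PATH_TRANSLATE + '/batch/'
    },
    DataAccessRoleArn='arn:aws:iam::111122223333:role/TranslateBatchDataAccessRole',
    SourceLanguageCode='es',
    TargetLanguageCodes=['en'],
    TerminologyNames=[term_name]
)
print(batch_response['JobId'], batch_response['JobStatus'])
```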
685 | ] 686 | }, 687 | { 688 | "cell_type": "code", 689 | "execution_count": null, 690 | "id": "76c148bf", 691 | "metadata": {}, 692 | "outputs": [], 693 | "source": [ 694 | "# First copy the English transcripts\n", 695 | "paginator = s3.get_paginator('list_objects_v2')\n", 696 | "pages = paginator.paginate(Bucket=bucket, Prefix=OUTPUT_PATH_TRANSCRIBE)\n", 697 | "\n", 698 | "s3_resource = boto3.resource('s3')\n", 699 | "\n", 700 | "for page in pages:\n", 701 | " for obj in page['Contents']:\n", 702 | " ts_file1 = obj['Key'].split('/')[2]\n", 703 | " tscript = ts_file1.split('-')\n", 704 | " if len(tscript) > 1:\n", 705 | " lang = tscript[2]\n", 706 | " if lang == 'EN':\n", 707 | " copy_source = {'Bucket': bucket,'Key': obj['Key']}\n", 708 | " s3_resource.meta.client.copy(copy_source, bucket, 'comprehend/input/' + ts_file1)\n", 709 | "\n", 710 | "# Now copy the Spanish transcripts that were translated to English\n", 711 | "pages = paginator.paginate(Bucket=bucket, Prefix=OUTPUT_PATH_TRANSLATE)\n", 712 | "\n", 713 | "for page in pages:\n", 714 | " for obj in page['Contents']:\n", 715 | " ts_file2 = obj['Key'].split('/')[2]\n", 716 | " if 'txt' in ts_file2:\n", 717 | " copy_source = {'Bucket': bucket,'Key': obj['Key']}\n", 718 | " s3_resource.meta.client.copy(copy_source, bucket, 'comprehend/input/' + ts_file2) " 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "id": "9b4d9a2c", 724 | "metadata": {}, 725 | "source": [ 726 | "Let us review if all the text files are ready for Comprehend custom inference. We should have 7 files in total with two calls that were transcribed in Spanish and translated to English, and 5 English calls that we transcribed. " 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "id": "5d8fcbd6", 733 | "metadata": {}, 734 | "outputs": [], 735 | "source": [ 736 | "paginator = s3.get_paginator('list_objects_v2')\n", 737 | "pages = paginator.paginate(Bucket=bucket, Prefix='comprehend/input')\n", 738 | "for page in pages:\n", 739 | " for obj in page['Contents']:\n", 740 | " print(obj['Key'])" 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "id": "f5cfcf5f", 746 | "metadata": {}, 747 | "source": [ 748 | "## End of notebook, go back to your workshop instructions" 749 | ] 750 | } 751 | ], 752 | "metadata": { 753 | "interpreter": { 754 | "hash": "9ddb102edfbd95000dbbd260d8bbcf82701cc06b4dcf114fa04ba84aab75adcb" 755 | }, 756 | "kernelspec": { 757 | "display_name": "conda_python3", 758 | "language": "python", 759 | "name": "conda_python3" 760 | }, 761 | "language_info": { 762 | "codemirror_mode": { 763 | "name": "ipython", 764 | "version": 3 765 | }, 766 | "file_extension": ".py", 767 | "mimetype": "text/x-python", 768 | "name": "python", 769 | "nbconvert_exporter": "python", 770 | "pygments_lexer": "ipython3", 771 | "version": "3.6.13" 772 | } 773 | }, 774 | "nbformat": 4, 775 | "nbformat_minor": 5 776 | } 777 | -------------------------------------------------------------------------------- /notebooks/2-Train-Detect-Entities/train.csv: -------------------------------------------------------------------------------- 1 | "The unique feature of Asimov's robots is the Three Laws of Robotics, hardwired in a robot's positronic brain, with which all robots in his fiction must comply, and which ensure that the robot does not turn against its creators" 2 | """Victory Unintentional"" has positronic robots obeying the Three Laws, but also a non-human civilization on Jupiter." 
3 | """Let's Get Together"" features humanoid robots, but from a different future (where the Cold War is still in progress), and with no mention of the Three Laws" 4 | "The Robot series is a series of 37 science fiction short stories and six novels by American writer Isaac Asimov, featuring positronic robots." 5 | """Mother Earth"" (1948) - short story, in which no individual robots appear, but positronic robots are part of the background" 6 | "Most of Asimov's robot short stories, which he began to write in 1939, are set in the first age of positronic robotics and space exploration." 7 | "The stories were not initially conceived as a set, but rather all feature his positronic robotsindeed, there are some inconsistencies among them, especially between the short stories and the novels." 8 | "It was the Zoromes, then, who were the spiritual ancestors of my own ""positronic robots,"" all of them, from Robbie to R. Daneel." 9 | "The 1989 anthology Foundation's Friends included the positronic robot stories ""Balance"" by Mike Resnick, ""Blot"" by Hal Clement, ""PAPPI"" by Sheila Finch, ""Plato's Cave"" by Poul Anderson, ""The Fourth Law of Robotics"" by Harry Harrison and ""Carhunters of the Concrete Prairie"" by Robert Sheckley." 10 | "Bicentennial Man (1999) was the first theatrical movie adaptation of any Asimov story or novel and was based on both Asimov's original short story of the same name (1976) and its novel expansion, The Positronic Man (1993)" 11 | "The book also contains the short story in which Asimov's Three Laws of Robotics first appear, which had large influence on later science fiction and had impact on thought on ethics of artificial intelligence as well." 12 | "Its plot incorporates elements of ""Little Lost Robot"",[8] some of Asimov's character names and the Three Laws." 13 | "In 2004 The Saturday Evening Post said that I, Robot's Three Laws ""revolutionized the science fiction genre and made robots far more interesting than they ever had been before.""" 14 | "In Aliens, a 1986 movie, the synthetic person Bishop paraphrases Asimov's First Law in the line: ""It is impossible for me to harm, or by omission of action allow to be harmed, a human being.""" 15 | "An episode of The Simpsons entitled ""I D'oh Bot"" (2004) has Professor Frink build a robot named ""Smashius Clay"" (also named ""Killhammad Aieee"") that follows all three of Asimov's laws of robotics." 16 | "Leela once told Bender to ""cover his ears"" so that he would not hear the robot-destroying paradox which she used to destroy Robot Santa (he punishes the bad, he kills people, killing is bad, therefore he must punish himself), causing a total breakdown; additionally, Bender has stated that he is Three Laws Safe." 17 | "The Indian science fiction film Endhiran, released in 2010, refers to Asimov's three laws for artificial intelligence for the fictional character Chitti: The Robot." 18 | "When a scientist takes in the robot for evaluation, the panel enquires whether the robot was built using the Three Laws of Robotics." 19 | "Upon their publication in this collection, Asimov wrote a framing sequence presenting the stories as Calvin's reminiscences during an interview with her about her life's work, chiefly concerned with aberrant behaviour of robots and the use of ""robopsychology"" to sort out what is happening in their positronic brain." 20 | "Two months after I read it, I began 'Robbie', about a sympathetic robot, and that was the start of my positronic robot series." 
21 | "The positronic brain, which Asimov named his robots' central processors, is what powers Data from Star Trek: The Next Generation, as well as other Soong type androids. " 22 | "Positronic brains have been referenced in a number of other television shows including Doctor Who, Once Upon a Time... Space, Perry Rhodan, The Number of the Beast, and others." 23 | "In ""Someday"" there are non-positronic computers which tell stories and do not obey the Three Laws." 24 | "In ""Sally"" there are positronic brain cars who can damage men or disobey without problems. No other kinds of robots are seen, and there is no mention of the Three Laws." 25 | "In "". . . That Thou Art Mindful of Him"" robots are created with a very flexible Three Laws management, and these create little, simplified robots with no laws that actually act against the Three Laws of Robotics." 26 | "Andrew uses the money to pay for bodily upgrades, keeping himself in perfect shape, but never has his positronic brain altered." 27 | The first scene of the story is explained as Andrew seeks out a robotic surgeon to perform an ultimately fatal operation: altering his positronic brain so that it will decay with time. 28 | "his story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 29 | "Sir reveals that U.S. Robots has ended a study on generalized pathways and creative robots, frightened by Andrew's unpredictability." 30 | "However, the robot refuses, as the operation is harmful and violates the First Law of Robotics, which says a robot may never harm a human being." 31 | "The Positronic Man is a 1992 novel by American writers Isaac Asimov and Robert Silverberg, based on Asimov's 1976 novelette ""The Bicentennial Man""." 32 | In the twenty-first century the creation of the positronic brain leads to the development of robot laborers and revolutionizes life on Earth. 33 | "In The Positronic Man, the trends of fictional robotics in Asimov's Robot series (as outlined in the book I, Robot) are detailed as background events, with an indication that they are influenced by Andrew's story." 34 | "Only when Andrew allows his positronic brain to ""decay"", thereby willfully abandoning his immortality, is he declared a human being." 35 | "This story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 36 | "No individual robots appear, but positronic robots are part of the background." 37 | "Earth faces a confrontation with its colonies, the ""Outer Worlds."" A historian looks back and sees the problem beginning a century and a half earlier, when Aurora got permission to ""introduce positronic robots into their community life.""" 38 | "The only witness is a malfunctioning house robot that has suffered damage to its positronic brain because it allowed harm to be done to a human, in violation of the First Law." 39 | "Ultimately, it is revealed that Delmarre's neighbor, roboticist Jothan Leebig, was working on putting positronic brains in spaceships." 40 | "Leebig poisoned Gruer by tricking his robots, using his knowledge of positronic brains, into putting poison into Gruer's drink." 41 | "This would negate the First Law, as such ships would not recognize that humans usually inhabit ships, and would therefore be able to attack and destroy other ships without regard for their crews." 42 | "R. Daneel and R. 
Giskard discover the roboticists' plan and attempt to stop Amadiro; but are hampered by the First Law of Robotics," 43 | "Daneel and Giskard, meanwhile, have inferred an additional Zeroth Law of Robotics: A robot may not injure humanity, or through inaction, allow humanity to come to harm." 44 | "It might enable them to overcome Amadiro, if they can use their telepathic perception of humanity to quell the inhibitions of the first law" 45 | "After Amadiro admits their plans, Giskard alters Amadiro's brain (using the newly created Zeroth Law); but in so doing, threatens his own." 46 | "Under the stress of having violated the First Law (in accordance with the Zeroth Law, but with the predicted benefit to humanity being uncertain), R. Giskard himself suffers a soon-fatal malfunction of his positronic brain but manages to confer his telepathic ability upon R. Daneel." 47 | "Dave Langford reviewed Robots and Empire for White Dwarf #85, and stated that ""Asimov always perks up when chopping logic with the Three Laws of Robotics, and here his robots come up with a Fourth, or rather Zeroth, Law." 48 | "In the novel, Asimov depicts the transition from his earlier Milky Way Galaxy, inhabited by both human beings and positronic robots, to his Galactic Empire." 49 | "Gladia is accompanied by the positronic robots R. Daneel Olivaw and R. Giskard Reventlov, both the former property of their creator, Dr. Han Fastolfe, who bequeathed them to Gladia in his will. R. Giskard has secret telepathic powers of which only R. Daneel knows." 50 | "The electrical aspect of robots is used for movement (through motors), sensing (where electrical signals are used to measure things like heat, sound, position, and energy status) and operation (robots need some level of electrical energy supplied to their motors and sensors in order to activate and perform basic operations)" 51 | "Actuators are the ""muscles"" of a robot, the parts which convert stored energy into movement." 52 | "Scientists from several European countries and Israel developed a prosthetic hand in 2009, called SmartHand, which functions like a real oneallowing patients to write with it, type on a keyboard, play piano and perform other fine movements" 53 | "As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 54 | "A quadruped was also demonstrated which could trot, run, pace, and bound" 55 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability" 56 | "In some of Asimov's other works, he states that the first use of the word robotics was in his short story Runaround (Astounding Science Fiction, March 1942),[4][5] where he introduced his concept of The Three Laws of Robotics" 57 | "Its limb control system allowed it to walk with the lower limbs, and to grip and transport objects with hands, using tactile sensors" 58 | "There are three different types of robotic programs: remote control, artificial intelligence and hybrid." 59 | "Robots that use artificial intelligence interact with their environment on their own without a control source, and can determine reactions to objects and problems they encounter using their preexisting programming." 60 | "Several robots have been made which can walk reliably on two legs, however, none have yet been made which are as robust as a human." 
61 | "Many other robots have been built that walk on more than two legs, due to these robots being significantly easier to construct." 62 | "Walking robots can be used for uneven terrains, which would provide better mobility and energy efficiency than other locomotion methods." 63 | "Typically, robots on two legs can walk well on flat floors and can occasionally walk up stairs." 64 | "Several robots, built in the 1980s by Marc Raibert at the MIT Leg Laboratory, successfully demonstrated very dynamic walking. " 65 | "Initially, a robot with only one leg, and a very small foot could stay upright simply by hopping." 66 | " As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 67 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability." 68 | "This technique was recently demonstrated by Anybots' Dexter Robot,[99] which is so stable, it can even jump" 69 | Perhaps the most promising approach utilizes passive dynamics where the momentum of swinging limbs is used for greater efficiency 70 | "It has been shown that totally unpowered humanoid mechanisms can walk down a gentle slope, using only gravity to propel themselves." 71 | "Using this technique, a robot need only supply a small amount of motor power to walk along a flat surface or a little more to walk up a hill" 72 | One approach mimics the movements of a human climber on a wall with protrusions; adjusting the center of mass and moving each limb in turn to gain leverage. 73 | "Science fiction authors also typically assume that robots will eventually be capable of communicating with humans through speech, gestures, and facial expressions, rather than a command-line interface." 74 | "Evolutionary robots is a methodology that uses evolutionary computation to help design robots, especially the body form, or motion and behavior controllers." 75 | "Direct kinematics or forward kinematics refers to the calculation of end effector position, orientation, velocity, and acceleration when the corresponding joint values are known." 76 | "Inverse kinematics refers to the opposite case in which required joint values are calculated for given end effector values, as done in path planning." 77 | "Once all relevant positions, velocities, and accelerations have been calculated using kinematics, methods from the field of dynamics are used to study the effect of forces upon these movements." 78 | "Normal human gait is a complex process, which happens due to co-ordinated movements of the whole of the body, requiring the whole of Central Nervous System - the brain and spinal cord, to function properly." 79 | The most common cause for gait impairment is due to an injury of one or both legs. 80 | "Gait training is not simply re-educating a patient on how to walk, but also includes an initial assessment of their gait cycle - Gait analysis, creation of a plan to address the problem, as well as teaching the patient on how to walk on different surfaces." 81 | "Assistive devices and splints (orthosis) are often used in gait training, especially with those who have had surgery or an injury on their legs, but also with those who have balance or strength impairments as well." 
82 | "Although gait training with parallel bars, treadmills and support systems can be beneficial, the long-term aim of gait training is usually to reduce patients' dependence on such technology in order to walk more in their daily lives." 83 | "A gait cycle is defined as the progression of movements that occurs before one leg can return to a certain position during walking, or ambulation." 84 | The gait cycle is studied in two phases - Swing and stance phase. 85 | Any gait training addressing a gait abnormality starts with a proper gait analysis. 86 | "The gait consists of a series of repetitive movements of the whole body during locomotion and is studied considering that each gait cycle repeats over itself, which is almost correct considering normal subjects." 87 | "The basic two phases are swing and stance phases, depending on whether the leg is free to swing or is in contact with the ground during the phase of gait studied." 88 | The stance phase is approximately 60% of the gait cycle and takes about 0.6 seconds to complete at a normal walking speed. 89 | "The swing phase occurs when the foot is not in contact with the ground, and constitutes about 40% of the gait cycle." 90 | The two point gait pattern requires a high level of coordination and balance 91 | "Recently, electromechanical devices such as the Hocoma Lokomat robot-driven gait orthosis have been introduced with the intention of reducing the physical labour demands on therapists." 92 | "Treadmill training, with or without a body-weight support, is an emerging therapy and is being used with stroke patients to improve kinematic gait parameters" 93 | Research has shown that this form of gait training demonstrates a more normal walking pattern without the compensatory movements commonly associated with stroke 94 | Determining the movement of a robot so that its end-effectors move from an initial configuration to a desired configuration is known as motion planning 95 | "The movement of a kinematic chain, whether it is a robot or an animated character, is modeled by the kinematics equations of the chain." 96 | Movement of one element requires the computation of the joint angles for the other elements to maintain the joint constraints. 97 | "For example, inverse kinematics allows an artist to move the hand of a 3D human model to a desired position and orientation and have an algorithm select the proper angles of the wrist, elbow, and shoulder joints." 98 | "Isaac Asimov considered the issue in the 1950s in his I, Robot. At the insistence of his editor John W. Campbell Jr., he proposed the Three Laws of Robotics to govern artificially intelligent systems." 99 | "Much of his work was then spent testing the boundaries of his three laws to see where they would break down, or where they would create paradoxical or unanticipated behavior." 100 | "A panel convened by the United Kingdom in 2010 revised Asimov's laws to clarify that AI is the responsibility either of its manufacturers, or of its owner/operator." 101 | "The movies Bicentennial Man and A.I. deal with the possibility of sentient robots that could love. I, Robot explored some aspects of Asimov's three laws." 102 | The Three Laws of Robotics (often shortened to The Three Laws or known as Asimov's Laws) are a set of rules devised by science fiction author Isaac Asimov. 
103 | "The Three Laws, quoted from the ""Handbook of Robotics, 56th Edition, 2058 A.D."", are: First Law -A robot may not injure a human being or, through inaction, allow a human being to come to harm., Second Law - A robot must obey the orders given it by human beings except where such orders would conflict with the First Law., Third Law -A robot must protect its own existence as long as such protection does not conflict with the First or Second Law" 104 | "The Laws are incorporated into almost all of the positronic robots appearing in his fiction, and cannot be bypassed, being intended as a safety feature." 105 | Many of Asimov's robot-focused stories involve robots behaving in unusual and counter-intuitive ways as an unintended consequence of how the robot applies the Three Laws to the situation in which it finds itself. 106 | The original laws have been altered and elaborated on by Asimov and other authors. 107 | "Asimov also added a fourth, or zeroth law, to precede the others: A robot may not harm humanity, or, by inaction, allow humanity to come to harm." 108 | "The Three Laws, and the zeroth, have pervaded science fiction and are referred to in many books, films, and other media." 109 | "Asimov attributes the Three Laws to John W. Campbell, from a conversation that took place on 23 December 1940." 110 | Campbell claimed that Asimov had the Three Laws already in his mind and that they simply needed to be stated explicitly. 111 | "ccording to his autobiographical writings, Asimov included the First Law's ""inaction"" clause because of Arthur Hugh Clough's poem ""The Latest Decalogue"" (text in Wikisource), which includes the satirical lines ""Thou shalt not kill, but needst not strive / officiously to keep alive""." 112 | "Although Asimov pins the creation of the Three Laws on one particular date, their appearance in his literature happened over a period." 113 | "He wrote two robot stories with no explicit mention of the Laws, ""Robbie"" and ""Reason""." 114 | "He assumed, however, that robots would have certain inherent safeguards. ""Liar!"", his third robot story, makes the first mention of the First Law but not the other two." 115 | "When these stories and several others were compiled in the anthology I, Robot, ""Reason"" and ""Robbie"" were updated to acknowledge all the Three Laws, though the material Asimov added to ""Reason"" is not entirely consistent with the Three Laws as he described them elsewhere" 116 | "In his short story ""Evidence"" Asimov lets his recurring character Dr. Susan Calvin expound a moral basis behind the Three Laws." 117 | "Calvin points out that human beings are typically expected to refrain from harming other human beings (except in times of extreme duress like war, or to save a greater number) and this is equivalent to a robot's First Law" 118 | "Likewise, according to Calvin, society expects individuals to obey instructions from recognized authorities such as doctors, teachers and so forth which equals the Second Law of Robotics." 119 | Finally humans are typically expected to avoid harming themselves which is the Third Law for a robot. 120 | "The plot of ""Evidence"" revolves around the question of telling a human being apart from a robot constructed to appear human – Calvin reasons that if such an individual obeys the Three Laws he may be a robot or simply ""a very good man""" 121 | "Asimov later wrote that he should not be praised for creating the Laws, because they are ""obvious from the start, and everyone is aware of them subliminally." 
122 | The Laws just never happened to be put into brief sentences until I managed to do the job. 123 | "I have my answer ready whenever someone asks me if I think that my Three Laws of Robotics will actually be used to govern the behavior of robots, once they become versatile and flexible enough to be able to choose among different courses of behavior." 124 | "My answer is, ""Yes, the Three Laws are the only way in which rational human beings can deal with robots, or with anything else.""" 125 | Asimov's stories test his Three Laws in a wide variety of circumstances leading to proposals and rejection of modifications. 126 | "Science fiction scholar James Gunn writes in 1982, ""The Asimov robot stories as a whole may respond best to an analysis on this basis: the ambiguity in the Three Laws and the ways in which Asimov played twenty-nine variations upon a theme""" 127 | "Removing the First Law's ""inaction"" clause solves this problem but creates the possibility of an even greater one: a robot could initiate an action that would harm a human (dropping a heavy weight and failing to catch it is the example given in the text), knowing that it was capable of preventing the harm and then decide not to do so." 128 | "Gaia is a planet with collective intelligence in the Foundation series which adopts a law similar to the First Law, and the Zeroth Law, as its philosophy: Gaia may not harm life or allow life to come to harm." 129 | "Three times during his writing career, Asimov portrayed robots that disregard the Three Laws entirely." 130 | "On the other hand, the short story ""Cal"" (from the collection Gold), told by a first-person robot narrator, features a robot who disregards the Three Laws because he has found something far more important: he wants to be a writer." 131 | "The third is a short story entitled ""Sally"" in which cars fitted with positronic brains are apparently able to harm and kill humans in disregard of the First Law." 132 | "However, aside from the positronic brain concept, this story does not refer to other robot stories and may not be set in the same continuity." 133 | Without the basic theory of the Three Laws the fictional scientists of Asimov's universe would be unable to design a workable brain unit. 134 | "The character Dr. Gerrigel uses the term ""Asenion"" to describe robots programmed with the Three Laws." 135 | "The robots in Asimov's stories, being Asenion robots, are incapable of knowingly violating the Three Laws but, in principle, a robot in science fiction or in the real world could be non-Asenion." 136 | "Characters within the stories often point out that the Three Laws, as they exist in a robot's mind, are not the written versions usually quoted by humans but abstract mathematical concepts upon which a robot's entire developing consciousness is based." 137 | "This concept is largely fuzzy and unclear in earlier stories depicting very rudimentary robots who are only programmed to comprehend basic physical tasks, where the Three Laws act as an overarching safeguard, but by the era of The Caves of Steel featuring robots with human or beyond-human intelligence the Three Laws have become the underlying basic ethical worldview that determines the actions of all robots." 138 | "These three books, Caliban, Inferno and Utopia, introduce a new set of the Three Laws."
139 | "The so-called New Laws are similar to Asimov's originals with the following differences: the First Law is modified to remove the ""inaction"" clause, the same modification made in ""Little Lost Robot""; the Second Law is modified to require cooperation instead of obedience; the Third Law is modified so it is no longer superseded by the Second (i.e., a ""New Law"" robot cannot be ordered to destroy itself); finally, Allen adds a Fourth Law which instructs the robot to do ""whatever it likes"" so long as this does not conflict with the first three laws." 140 | "The Laws of Robotics are portrayed as something akin to a human religion, and referred to in the language of the Protestant Reformation, with the set of laws containing the Zeroth Law known as the ""Giskardian Reformation"" to the original ""Calvinian Orthodoxy"" of the Three Laws" 141 | "Randall Munroe has discussed the Three Laws in various instances, but possibly most directly by one of his comics entitled The Three Laws of Robotics which imagines the consequences of every distinct ordering of the existing three laws." 142 | "The Laws of Robotics presume that the terms ""human being"" and ""robot"" are understood and well defined." 143 | It takes as its concept the growing development of robots that mimic non-human living things and given programs that mimic simple animal behaviours which do not require the Three Laws. 144 | Both are to be considered alternatives to the possibility of a robot society that continues to be driven by the Three Laws as portrayed in the Foundation series. 145 | "In Lucky Starr and the Rings of Saturn, a novel unrelated to the Robot series but featuring robots programmed with the Three Laws, John Bigman Jones is almost killed by a Sirian robot on orders of its master." 146 | Advanced robots in fiction are typically programmed to handle the Three Laws in a sophisticated manner. 147 | "For example, the First Law may forbid a robot from functioning as a surgeon, as that act may cause damage to a human; however, Asimov's stories eventually included robot surgeons" 148 | "Asimov's Three Laws-obeying robots (Asenion robots) can experience irreversible mental collapse if they are forced into situations where they cannot obey the First Law, or if they discover they have unknowingly violated it." 149 | "The first example of this failure mode occurs in the story ""Liar!"", which introduced the First Law itself, and introduces failure by dilemmain this case the robot will hurt humans if he tells them something and hurt them if he does not." 150 | "This failure mode, which often ruins the positronic brain beyond repair, plays a significant role in Asimov's SF-mystery novel The Naked Sun." 151 | "As such, a robot is capable of taking an action which can be interpreted as following the First Law, thus avoiding a mental collapse." 152 | "Robots and artificial intelligences do not inherently contain or obey the Three Laws; their human creators must choose to program them in, and devise a means to do so." 
153 | "Even the most complex robots currently produced are incapable of understanding and applying the Three Laws; significant advances in artificial intelligence would be needed to do so, and even if AI could reach human-level intelligence, the inherent ethical complexity as well as cultural/contextual dependency of the laws prevent them from being a good candidate to formulate robotics design constraints" 154 | "On the other hand, Asimov's later novels The Robots of Dawn, Robots and Empire and Foundation and Earth imply that the robots inflicted their worst long-term harm by obeying the Three Laws perfectly well, thereby depriving humanity of inventive or risk-taking behaviour." 155 | "The futurist Hans Moravec (a prominent figure in the transhumanist movement) proposed that the Laws of Robotics should be adapted to ""corporate intelligences"" the corporations driven by AI and robotic manufacturing power which Moravec believes will arise in the near future." 156 | "In contrast, the David Brin novel Foundation's Triumph (1999) suggests that the Three Laws may decay into obsolescence." 157 | "Brin even portrays R. Daneel Olivaw worrying that, should robots continue to reproduce themselves, the Three Laws would become an evolutionary handicap and natural selection would sweep the Laws away." 158 | "Although the robots would not be evolving through design instead of mutation because the robots would have to follow the Three Laws while designing and the prevalence of the laws would be ensured,[53] design flaws or construction errors could functionally take the place of biological mutation." 159 | "Asimov himself believed that his Three Laws became the basis for a new view of robots which moved beyond the ""Frankenstein complex""" 160 | Stories written by other authors have depicted robots as if they obeyed the Three Laws but tradition dictates that only Asimov could quote the Laws explicitly. 161 | "Asimov believed the Three Laws helped foster the rise of stories in which robots are ""lovable"" – Star Wars being his favorite example." 162 | "Where the laws are quoted verbatim, such as in the Buck Rogers in the 25th Century episode ""Shgoratchx!"", it is not uncommon for Asimov to be mentioned in the same dialogue as can also be seen in the Aaron Stone pilot where an android states that it functions under Asimov's Three Laws." 163 | Asimov was delighted with Robby and noted that Robby appeared to be programmed to follow his Three Laws. 164 | The film Bicentennial Man (1999) features Robin Williams as the Three Laws robot NDR-114 (the serial number is partially a reference to Stanley Kubrick's signature numeral) 165 | "Williams recites the Three Laws to his employers, the Martin family, aided by a holographic projection. The film only loosely follows the original story." 166 | "Harlan Ellison's proposed screenplay for I, Robot began by introducing the Three Laws, and issues growing from the Three Laws form a large part of the screenplay's plot development" 167 | "A positronic brain is a fictional technological device, originally conceived by science fiction writer Isaac Asimov." 168 | "When Asimov wrote his first robot stories in 1939 and 1940, the positron was a newly discovered particle, and so the buzz word ""positronic"" added a scientific connotation to the concept." 169 | "Asimov's 1942 short story ""Runaround"" elaborates his fictional Three Laws of Robotics, which are ingrained in the positronic brains of nearly all of his robots." 
170 | "Positronic brains, as such, are a kind of brain made of positrons – small particles which help in the transmission of various thoughts and impulses to the brain and help the brain's cognition relay the selected emotion or solution." 171 | Asimov remained vague about the technical details of positronic brains except to assert that their substructure was formed from an alloy of platinum and iridium. 172 | "The focus of Asimov's stories was directed more towards the software of robotssuch as the Three Laws of Roboticsthan the hardware in which it was implemented, although it is stated in his stories that to create a positronic brain without the Three Laws, it would have been necessary to spend years redesigning the fundamental approach towards the brain itself." 173 | "Within his stories of robotics on Earth and their development by U.S. Robots, Asimov's positronic brain is less of a plot device and more of a technological item worthy of study." 174 | A positronic brain cannot ordinarily be built without incorporating the Three Laws; any modification thereof would drastically modify robot behavior. 175 | The Three Laws are also a bottleneck in brain sophistication 176 | "Very complex brains designed to handle world economy interpret the First Law in an expanded sense to include humanity as opposed to a single human; in Asimov's later works like Robots and Empire this is referred to as the ""Zeroth Law""" 177 | "At least one brain constructed as a calculating machine, as opposed to being a robot control circuit, was designed to have a flexible, childlike personality so that it was able to pursue difficult problems without the Three Laws inhibiting it completely." 178 | The sophistication of positronic circuitry renders a brain so small that it could comfortably fit within the skull of an insect. 179 | "It offers speed and capacity improvements over traditional positronic designs, but the strong influence of tradition make robotics labs reject Anshaw's work." 180 | "Only one roboticist, Fredda Leving, chooses to adopt gravitonics, because it offers her a blank slate on which she could explore alternatives to the Three Laws" 181 | "When Queen Allura of Venus (Mari Blanchard) puts Orville (Lou Costello) to a lie detector test in an ESP-enabled crystal chair, she states that it is ""based on the principle of the Positronic Brain.""" 182 | "In a mini story entitled ""Night Vision!"" in Annual #6 of the Marvel comic, writer Scot Edelman refers to the brain of the synthezoid ""The Vision"" as positronic." 183 | "Human space colonists examine ""dead"" Daleks and, upon their re-activation, conjecture as to ""what sort of positronic brain must this device possess""." 184 | "However, the Daleks are actually organic life-forms that were encased in robotic shells, and thus do not possess the purported positronic brain and, in any case, do not obey the Three Laws of Robotics." 185 | "In the seventeenth season (1979–80) story ""The Horns of Nimon"", the fourth incarnation of the Doctor, played by Tom Baker, recognizes the Labyrinth-like building complex that serves as the lair of the Nimons as resembling both physically and functionally a ""giant positronic circuit""." 186 | The creation was said to be controlled by a positronic brain. 187 | "Several fictional characters in Star Trek: The Next GenerationLieutenant Commander Data, his ""mother"" Julianna Soong Tainer, his daughter Lal, and his brothers Lore and B-4are androids equipped with positronic brains created by Dr. Noonien Soong." 
188 | """Positronic implants"" were used to replace lost function in Vedek Bareil's brain in the Deep Space 9 episode ""Life Support""." 189 | "In the German science fiction series Perry Rhodan (written starting in 1961), positronic brains (German: Positroniken) are the main computer technology; for quite a time they are replaced by the more powerful Syntronics, but those stop working due to the increased Hyperimpedance." 190 | The most powerful positronic brain is called NATHAN and covers large parts of the Earth's moon. 191 | "Many of the larger computers (including NATHAN) as well as the race of Posbis combine a biological component with the positronic brain, giving them sentience and creativity." 192 | "The robots in the 2004 film I, Robot (loosely based upon several of Isaac Asimov's stories) also have positronic brains." 193 | "Sonny, one of the main characters from the film, has two separate positronic brainsthe second being a positronic ""heart""so it has choices open to him the other robots in the film do not have." 194 | "The film also features a colossal positronic brain, VIKI, who is bound by the Three Laws." 195 | Sonny also has the possibility of being able to develop emotions and a sense of right and wrong independent of the Three Laws of Robotics; it has the ability to choose not to obey them. 196 | "The robots in the 1999 film Bicentennial Man (based on one of Asimov's stories) also have positronic brains, including the main character Andrew, an NDR series robot that starts to experience human characteristics such as creativity." 197 | "Only when Andrew allows his positronic brain to ""decay"", thereby willfully abandoning his immortality, is he declared a human being." 198 | "Twiki and Crichton, two robotic characters who appear in the Buck Rogers in the 25th Century television series, were equipped with positronic brains." 199 | "Crichton recited Asimov's ""Three Laws of Robotics"" upon activation." 200 | "In 1989, in the Mystery Science Theater 3000 Season One episode The Corpse Vanishes, Crow T. Robot and Tom Servo read an issue of Tiger Bot magazine featuring an interview with the Star Trek character, Data. They then lament the fact that they don't have positronic brains like him." 201 | "In the second episode, Spectreman's robot head is found and viewers discover he is a robot with a positronic brain." 202 | "The game Stellaris features Positronic Artificial Intelligence as a possible research goal, which is employed with ""Synthetics"" (sentient robotic beings) and sentient computers for usage in research, administration, combat etc." 203 | "In the game Space Station 13, players can research and construct positronic brains, and place them inside of AIs, cyborgs and even mechas" 204 | "A neural pathway is the connection formed by axons that project from neurons to make synapses onto neurons in another location, to enable a signal to be sent from one region of the nervous system to another." 205 | A neural pathway connects one part of the nervous system to another using bundles of axons called tracts. 206 | "Shorter neural pathways are found within grey matter in the brain, whereas longer projections, made up of myelinated axons, constitute white matter." 207 | "In the hippocampus there are neural pathways involved in its circuitry including the perforant pathway, that provides a connectional route from the entorhinal cortex[2] to all fields of the hippocampal formation, including the dentate gyrus, all CA fields (including CA1),[3] and the subiculum." 
208 | "Note that the ""old"" name was primarily descriptive, evoking the pyramids of antiquity, from the appearance of this neural pathway in the medulla oblongata." 209 | "The axon of a nerve cell is, in general, responsible for transmitting information over a relatively long distance, therefore, most neural pathways are made up of axons." 210 | "Neural pathways in the basal ganglia in the cortico-basal ganglia-thalamo-cortical loop, are seen as controlling different aspects of behaviour." 211 | It has been proposed that the dopamine system of pathways is the overall organiser of the neural pathways that are seen to be parallels of the dopamine pathways. 212 | Dopamine is provided both tonically and phasically in response to the needs of the neural pathways. 213 | Miguel Nicolelis and colleagues demonstrated that the activity of large neural ensembles can predict arm position 214 | Their BCI used high-density electrocorticography to tap neural activity from a patient's brain and used deep learning methods to synthesize speech. 215 | The use of BMIs has also led to a deeper understanding of neural networks and the central nervous system. 216 | "Beyond BCI systems that decode neural activity to drive external effectors, BCI systems may be used to encode signals from the periphery." 217 | "These sensory BCI devices enable real-time, behaviorally-relevant decisions based upon closed-loop neural stimulation." 218 | "The participant imagined moving his hand to write letters, and the system performed handwriting recognition on electrical signals detected in the motor cortex, utilizing hidden Markov models and recurrent neural networks for decoding." 219 | This proximity to motor cortex underlies the Stentrode's ability to measure neural activity. 220 | "The Stentrode communicates neural activity to a battery-less telemetry unit implanted in the chest, which communicates wirelessly with an external telemetry unit capable of power and data transfer." 221 | "Their study achieved word error rates of 3% (a marked improvement from prior publications) utilizing an encoder-decoder neural network, which translated ECoG data into one of fifty sentences composed of 250 unique words." 222 | The current focus of research is user-to-user communication through analysis of neural signals 223 | Researchers have built devices to interface with neural cells and entire neural networks in cultures outside animals. 224 | "After collection, the cortical neurons were cultured in a petri dish and rapidly began to reconnect themselves to form a living neural network." 225 | Flexible neural interfaces have been extensively tested in recent years in an effort to minimize brain tissue trauma related to mechanical mismatch between electrode and tissue 226 | "Walking robots simulate human or animal gait, as a replacement for wheeled motion" 227 | "A major goal in this field is in developing capabilities for robots to autonomously decide how, when, and where to move." 228 | "However, coordinating numerous robot joints for even simple matters, like negotiating stairs, is difficult." 229 | "Walking robots simulate human or animal gait, as a replacement for wheeled motion." 230 | "The robot functioned effectively, walking in several gait patterns and crawling with its high DoF legs" 231 | "Multiple legs allow several different gaits, even if a leg is damaged, making their movements more useful in robots transporting objects." 
232 | This is because an ideal rolling (but not slipping) wheel loses no energy 233 | "Coordinated, sequential mechanical action having the appearance of a traveling wave is called a metachronal rhythm or wave, and is employed in nature by ciliates for transport, and by worms and arthropods for locomotion" 234 | "Brachiation allows robots to travel by swinging, using energy only to grab and release surfaces" 235 | This motion is similar to an ape swinging from tree to tree. 236 | The two types of brachiation can be compared to bipedal walking motions (continuous contact) or running (ricochetal). 237 | "Continuous contact is when a hand/grasping mechanism is always attached to the surface being crossed; ricochetal employs a phase of aerial ""flight"" from one surface/limb to the next." 238 | "Thus robots of this nature need to be small, light, quick, and possess the ability to move in multiple locomotive modes." 239 | Robots can also be designed to perform locomotion in multiple modes. 240 | "Several robots capable of basic locomotion in a single mode have been invented but are found to lack several capabilities, hence limiting their functions and applications." 241 | "In addition, Pteromyini are able to exhibit multi-modal locomotion due to the membrane that connects the fore and hind legs which also enhances their gliding ability." 242 | Pteromyini are able to boost their gliding ability due to the numerous physical attributes they possess. 243 | "The common vampire bats are known to possess powerful modes of terrestrial locomotion, such as jumping, and aerial locomotion such as gliding." 244 | "Between the two modes of locomotion, there are three bones that are shared." 245 | "Since there already exists a sharing of components for both modes, no additional muscles are needed when transitioning from jumping to gliding" 246 | The desert locust is known for its ability to jump and fly over long distances as well as crawl on land. 247 | A detailed study of the anatomy of this organism provides some detail about the mechanisms for locomotion. 248 | A detailed study of the anatomy of this organism provides some detail about the mechanisms for locomotion. 249 | The hind legs of the locust are developed for jumping. 250 | "In order for a perfect jump to occur, the locust must push its legs on the ground with a strong enough force so as to initiate a fast takeoff." 251 | The force must be adequate enough in order to attain a quick takeoff and decent jump height. 252 | "In order to effectively transition from the jumping mode to the flying mode, the insect must adjust the time during the wing opening to maximize the distance and height of the jump." 253 | "When it is at the zenith of its jump, the flight mode becomes actuated." 254 | "Following the discovery of the requisite model to mimic, researchers sought to design a legged robot that was capable of achieving effective motion in aerial and terrestrial environments by the use of a flexible membrane." 255 | The membrane had to be flexible enough to allow for unrestricted movement of the legs during gliding and walking. 256 | The leg of the robot had to be designed to allow for appropriate torques for walking as well as gliding 257 | "Following the design of the leg and membrane of the robot, its average gliding ratio (GR) was determined to be 1.88." 258 | "The robot functioned effectively, walking in several gait patterns and crawling with its high DoF legs."
259 | These performances demonstrated the gliding and walking capabilities of the robot and its multi-modal locomotion 260 | "The design of the robot called Multi-Mo Bat involved the establishment of four primary phases of operation: energy storage phase, jumping phase, coasting phase, and gliding phase" 261 | The energy storing phase essentially involves storing the energy required for the jump. 262 | This process additionally creates a torque around the joint of the shoulders which in turn configures the legs for jumping 263 | "Once the stored energy is released, the jump phase can be initiated" 264 | "When the jump phase is initiated and the robot takes off from the ground, it transitions to the coast phase which occurs until the acme is reached and it begins to descend." 265 | "At this stage, the robot glides down." 266 | The robot designed was powered by a single DC motor which integrated the performances of jumping and flapping 267 | The primary feature of the robot's design was a gear system powered by a single motor which allowed the robot to perform its jumping and flapping motions. 268 | "Just like the motion of the locust, the motion of the robot is initiated by the flexing of the legs to the position of maximum energy storage after which the energy is released immediately to generate the force necessary to attain flight" 269 | The robot was tested for performance and the results demonstrated that the robot was able to jump to an approximate height of 0.9m while weighing 23g and flapping its wings at a frequency of about 19 Hz. 270 | "The robot tested without flapping wings performed less impressively, showing about 30% decrease in jumping performance as compared to the robot with the wings" 271 | These results are notable, as it would be expected that the reverse be the case since the weight of the wings should have impacted the jumping. 272 | "The unique feature of Asimov's robots is the Three Laws of Robotics, hardwired in a robot's positronic brain, with which all robots in his fiction must comply, and which ensure that the robot does not turn against its creators" 273 | """Victory Unintentional"" has positronic robots obeying the Three Laws, but also a non-human civilization on Jupiter." 274 | """Let's Get Together"" features humanoid robots, but from a different future (where the Cold War is still in progress), and with no mention of the Three Laws" 275 | "The Robot series is a series of 37 science fiction short stories and six novels by American writer Isaac Asimov, featuring positronic robots." 276 | """Mother Earth"" (1948) - short story, in which no individual robots appear, but positronic robots are part of the background" 277 | "Most of Asimov's robot short stories, which he began to write in 1939, are set in the first age of positronic robotics and space exploration." 278 | "The stories were not initially conceived as a set, but rather all feature his positronic robots; indeed, there are some inconsistencies among them, especially between the short stories and the novels." 279 | "It was the Zoromes, then, who were the spiritual ancestors of my own ""positronic robots,"" all of them, from Robbie to R. Daneel." 280 | "The 1989 anthology Foundation's Friends included the positronic robot stories ""Balance"" by Mike Resnick, ""Blot"" by Hal Clement, ""PAPPI"" by Sheila Finch, ""Plato's Cave"" by Poul Anderson, ""The Fourth Law of Robotics"" by Harry Harrison and ""Carhunters of the Concrete Prairie"" by Robert Sheckley."
281 | "Bicentennial Man (1999) was the first theatrical movie adaptation of any Asimov story or novel and was based on both Asimov's original short story of the same name (1976) and its novel expansion, The Positronic Man (1993)" 282 | "The book also contains the short story in which Asimov's Three Laws of Robotics first appear, which had large influence on later science fiction and had impact on thought on ethics of artificial intelligence as well." 283 | "Its plot incorporates elements of ""Little Lost Robot"",[8] some of Asimov's character names and the Three Laws." 284 | "In 2004 The Saturday Evening Post said that I, Robot's Three Laws ""revolutionized the science fiction genre and made robots far more interesting than they ever had been before.""" 285 | "In Aliens, a 1986 movie, the synthetic person Bishop paraphrases Asimov's First Law in the line: ""It is impossible for me to harm, or by omission of action allow to be harmed, a human being.""" 286 | "An episode of The Simpsons entitled ""I D'oh Bot"" (2004) has Professor Frink build a robot named ""Smashius Clay"" (also named ""Killhammad Aieee"") that follows all three of Asimov's laws of robotics." 287 | "Leela once told Bender to ""cover his ears"" so that he would not hear the robot-destroying paradox which she used to destroy Robot Santa (he punishes the bad, he kills people, killing is bad, therefore he must punish himself), causing a total breakdown; additionally, Bender has stated that he is Three Laws Safe." 288 | "The Indian science fiction film Endhiran, released in 2010, refers to Asimov's three laws for artificial intelligence for the fictional character Chitti: The Robot." 289 | "When a scientist takes in the robot for evaluation, the panel enquires whether the robot was built using the Three Laws of Robotics." 290 | "Upon their publication in this collection, Asimov wrote a framing sequence presenting the stories as Calvin's reminiscences during an interview with her about her life's work, chiefly concerned with aberrant behaviour of robots and the use of ""robopsychology"" to sort out what is happening in their positronic brain." 291 | "Two months after I read it, I began 'Robbie', about a sympathetic robot, and that was the start of my positronic robot series." 292 | "The positronic brain, which Asimov named his robots' central processors, is what powers Data from Star Trek: The Next Generation, as well as other Soong type androids. " 293 | "Positronic brains have been referenced in a number of other television shows including Doctor Who, Once Upon a Time... Space, Perry Rhodan, The Number of the Beast, and others." 294 | "In ""Someday"" there are non-positronic computers which tell stories and do not obey the Three Laws." 295 | "In ""Sally"" there are positronic brain cars who can damage men or disobey without problems. No other kinds of robots are seen, and there is no mention of the Three Laws." 296 | "In "". . . That Thou Art Mindful of Him"" robots are created with a very flexible Three Laws management, and these create little, simplified robots with no laws that actually act against the Three Laws of Robotics." 297 | "Andrew uses the money to pay for bodily upgrades, keeping himself in perfect shape, but never has his positronic brain altered." 298 | The first scene of the story is explained as Andrew seeks out a robotic surgeon to perform an ultimately fatal operation: altering his positronic brain so that it will decay with time. 
299 | "his story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 300 | "Sir reveals that U.S. Robots has ended a study on generalized pathways and creative robots, frightened by Andrew's unpredictability." 301 | "However, the robot refuses, as the operation is harmful and violates the First Law of Robotics, which says a robot may never harm a human being." 302 | "The Positronic Man is a 1992 novel by American writers Isaac Asimov and Robert Silverberg, based on Asimov's 1976 novelette ""The Bicentennial Man""." 303 | In the twenty-first century the creation of the positronic brain leads to the development of robot laborers and revolutionizes life on Earth. 304 | "In The Positronic Man, the trends of fictional robotics in Asimov's Robot series (as outlined in the book I, Robot) are detailed as background events, with an indication that they are influenced by Andrew's story." 305 | "Only when Andrew allows his positronic brain to ""decay"", thereby willfully abandoning his immortality, is he declared a human being." 306 | "This story is set within Asimov's Foundation universe, which also includes his earlier Susan Calvin positronic robot tales." 307 | "No individual robots appear, but positronic robots are part of the background." 308 | "Earth faces a confrontation with its colonies, the ""Outer Worlds."" A historian looks back and sees the problem beginning a century and a half earlier, when Aurora got permission to ""introduce positronic robots into their community life.""" 309 | "The only witness is a malfunctioning house robot that has suffered damage to its positronic brain because it allowed harm to be done to a human, in violation of the First Law." 310 | "Ultimately, it is revealed that Delmarre's neighbor, roboticist Jothan Leebig, was working on putting positronic brains in spaceships." 311 | "Leebig poisoned Gruer by tricking his robots, using his knowledge of positronic brains, into putting poison into Gruer's drink." 312 | "This would negate the First Law, as such ships would not recognize that humans usually inhabit ships, and would therefore be able to attack and destroy other ships without regard for their crews." 313 | "R. Daneel and R. Giskard discover the roboticists' plan and attempt to stop Amadiro; but are hampered by the First Law of Robotics," 314 | "Daneel and Giskard, meanwhile, have inferred an additional Zeroth Law of Robotics: A robot may not injure humanity, or through inaction, allow humanity to come to harm." 315 | "It might enable them to overcome Amadiro, if they can use their telepathic perception of humanity to quell the inhibitions of the first law" 316 | "After Amadiro admits their plans, Giskard alters Amadiro's brain (using the newly created Zeroth Law); but in so doing, threatens his own." 317 | "Under the stress of having violated the First Law (in accordance with the Zeroth Law, but with the predicted benefit to humanity being uncertain), R. Giskard himself suffers a soon-fatal malfunction of his positronic brain but manages to confer his telepathic ability upon R. Daneel." 318 | "Dave Langford reviewed Robots and Empire for White Dwarf #85, and stated that ""Asimov always perks up when chopping logic with the Three Laws of Robotics, and here his robots come up with a Fourth, or rather Zeroth, Law." 319 | "In the novel, Asimov depicts the transition from his earlier Milky Way Galaxy, inhabited by both human beings and positronic robots, to his Galactic Empire." 
320 | "Gladia is accompanied by the positronic robots R. Daneel Olivaw and R. Giskard Reventlov, both the former property of their creator, Dr. Han Fastolfe, who bequeathed them to Gladia in his will. R. Giskard has secret telepathic powers of which only R. Daneel knows." 321 | "The electrical aspect of robots is used for movement (through motors), sensing (where electrical signals are used to measure things like heat, sound, position, and energy status) and operation (robots need some level of electrical energy supplied to their motors and sensors in order to activate and perform basic operations)" 322 | "Actuators are the ""muscles"" of a robot, the parts which convert stored energy into movement." 323 | "Scientists from several European countries and Israel developed a prosthetic hand in 2009, called SmartHand, which functions like a real oneallowing patients to write with it, type on a keyboard, play piano and perform other fine movements" 324 | "As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 325 | "A quadruped was also demonstrated which could trot, run, pace, and bound" 326 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability" 327 | "In some of Asimov's other works, he states that the first use of the word robotics was in his short story Runaround (Astounding Science Fiction, March 1942),[4][5] where he introduced his concept of The Three Laws of Robotics" 328 | "Its limb control system allowed it to walk with the lower limbs, and to grip and transport objects with hands, using tactile sensors" 329 | "There are three different types of robotic programs: remote control, artificial intelligence and hybrid." 330 | "Robots that use artificial intelligence interact with their environment on their own without a control source, and can determine reactions to objects and problems they encounter using their preexisting programming." 331 | "Several robots have been made which can walk reliably on two legs, however, none have yet been made which are as robust as a human." 332 | "Many other robots have been built that walk on more than two legs, due to these robots being significantly easier to construct." 333 | "Walking robots can be used for uneven terrains, which would provide better mobility and energy efficiency than other locomotion methods." 334 | "Typically, robots on two legs can walk well on flat floors and can occasionally walk up stairs." 335 | "Several robots, built in the 1980s by Marc Raibert at the MIT Leg Laboratory, successfully demonstrated very dynamic walking. " 336 | "Initially, a robot with only one leg, and a very small foot could stay upright simply by hopping." 337 | " As the robot falls to one side, it would jump slightly in that direction, in order to catch itself." 338 | "A more advanced way for a robot to walk is by using a dynamic balancing algorithm, which is potentially more robust than the Zero Moment Point technique, as it constantly monitors the robot's motion, and places the feet in order to maintain stability." 
339 | "This technique was recently demonstrated by Anybots' Dexter Robot,[99] which is so stable, it can even jump" 340 | Perhaps the most promising approach utilizes passive dynamics where the momentum of swinging limbs is used for greater efficiency 341 | "It has been shown that totally unpowered humanoid mechanisms can walk down a gentle slope, using only gravity to propel themselves." 342 | "Using this technique, a robot need only supply a small amount of motor power to walk along a flat surface or a little more to walk up a hill" 343 | One approach mimics the movements of a human climber on a wall with protrusions; adjusting the center of mass and moving each limb in turn to gain leverage. 344 | "Science fiction authors also typically assume that robots will eventually be capable of communicating with humans through speech, gestures, and facial expressions, rather than a command-line interface." 345 | "Evolutionary robots is a methodology that uses evolutionary computation to help design robots, especially the body form, or motion and behavior controllers." 346 | "Direct kinematics or forward kinematics refers to the calculation of end effector position, orientation, velocity, and acceleration when the corresponding joint values are known." 347 | "Inverse kinematics refers to the opposite case in which required joint values are calculated for given end effector values, as done in path planning." 348 | "Once all relevant positions, velocities, and accelerations have been calculated using kinematics, methods from the field of dynamics are used to study the effect of forces upon these movements." 349 | "Normal human gait is a complex process, which happens due to co-ordinated movements of the whole of the body, requiring the whole of Central Nervous System - the brain and spinal cord, to function properly." 350 | The most common cause for gait impairment is due to an injury of one or both legs. 351 | "Gait training is not simply re-educating a patient on how to walk, but also includes an initial assessment of their gait cycle - Gait analysis, creation of a plan to address the problem, as well as teaching the patient on how to walk on different surfaces." 352 | "Assistive devices and splints (orthosis) are often used in gait training, especially with those who have had surgery or an injury on their legs, but also with those who have balance or strength impairments as well." 353 | "Although gait training with parallel bars, treadmills and support systems can be beneficial, the long-term aim of gait training is usually to reduce patients' dependence on such technology in order to walk more in their daily lives." 354 | "A gait cycle is defined as the progression of movements that occurs before one leg can return to a certain position during walking, or ambulation." 355 | The gait cycle is studied in two phases - Swing and stance phase. 356 | Any gait training addressing a gait abnormality starts with a proper gait analysis. 357 | "The gait consists of a series of repetitive movements of the whole body during locomotion and is studied considering that each gait cycle repeats over itself, which is almost correct considering normal subjects." 358 | "The basic two phases are swing and stance phases, depending on whether the leg is free to swing or is in contact with the ground during the phase of gait studied." 359 | The stance phase is approximately 60% of the gait cycle and takes about 0.6 seconds to complete at a normal walking speed. 
360 | "The swing phase occurs when the foot is not in contact with the ground, and constitutes about 40% of the gait cycle." 361 | The two point gait pattern requires a high level of coordination and balance 362 | "Recently, electromechanical devices such as the Hocoma Lokomat robot-driven gait orthosis have been introduced with the intention of reducing the physical labour demands on therapists." 363 | "Treadmill training, with or without a body-weight support, is an emerging therapy and is being used with stroke patients to improve kinematic gait parameters" 364 | Research has shown that this form of gait training demonstrates a more normal walking pattern without the compensatory movements commonly associated with stroke 365 | Determining the movement of a robot so that its end-effectors move from an initial configuration to a desired configuration is known as motion planning 366 | "The movement of a kinematic chain, whether it is a robot or an animated character, is modeled by the kinematics equations of the chain." 367 | Movement of one element requires the computation of the joint angles for the other elements to maintain the joint constraints. 368 | "For example, inverse kinematics allows an artist to move the hand of a 3D human model to a desired position and orientation and have an algorithm select the proper angles of the wrist, elbow, and shoulder joints." 369 | "Isaac Asimov considered the issue in the 1950s in his I, Robot. At the insistence of his editor John W. Campbell Jr., he proposed the Three Laws of Robotics to govern artificially intelligent systems." 370 | "Much of his work was then spent testing the boundaries of his three laws to see where they would break down, or where they would create paradoxical or unanticipated behavior." 371 | "A panel convened by the United Kingdom in 2010 revised Asimov's laws to clarify that AI is the responsibility either of its manufacturers, or of its owner/operator." 372 | "The movies Bicentennial Man and A.I. deal with the possibility of sentient robots that could love. I, Robot explored some aspects of Asimov's three laws." 373 | The Three Laws of Robotics (often shortened to The Three Laws or known as Asimov's Laws) are a set of rules devised by science fiction author Isaac Asimov. 374 | "The Three Laws, quoted from the ""Handbook of Robotics, 56th Edition, 2058 A.D."", are: First Law -A robot may not injure a human being or, through inaction, allow a human being to come to harm., Second Law - A robot must obey the orders given it by human beings except where such orders would conflict with the First Law., Third Law -A robot must protect its own existence as long as such protection does not conflict with the First or Second Law" 375 | "The Laws are incorporated into almost all of the positronic robots appearing in his fiction, and cannot be bypassed, being intended as a safety feature." 376 | Many of Asimov's robot-focused stories involve robots behaving in unusual and counter-intuitive ways as an unintended consequence of how the robot applies the Three Laws to the situation in which it finds itself. 377 | The original laws have been altered and elaborated on by Asimov and other authors. 378 | "Asimov also added a fourth, or zeroth law, to precede the others: A robot may not harm humanity, or, by inaction, allow humanity to come to harm." 379 | "The Three Laws, and the zeroth, have pervaded science fiction and are referred to in many books, films, and other media." 380 | "Asimov attributes the Three Laws to John W. 
Campbell, from a conversation that took place on 23 December 1940." 381 | Campbell claimed that Asimov had the Three Laws already in his mind and that they simply needed to be stated explicitly. 382 | "According to his autobiographical writings, Asimov included the First Law's ""inaction"" clause because of Arthur Hugh Clough's poem ""The Latest Decalogue"" (text in Wikisource), which includes the satirical lines ""Thou shalt not kill, but needst not strive / officiously to keep alive""." 383 | "Although Asimov pins the creation of the Three Laws on one particular date, their appearance in his literature happened over a period." 384 | "He wrote two robot stories with no explicit mention of the Laws, ""Robbie"" and ""Reason""." 385 | "He assumed, however, that robots would have certain inherent safeguards. ""Liar!"", his third robot story, makes the first mention of the First Law but not the other two." 386 | "When these stories and several others were compiled in the anthology I, Robot, ""Reason"" and ""Robbie"" were updated to acknowledge all the Three Laws, though the material Asimov added to ""Reason"" is not entirely consistent with the Three Laws as he described them elsewhere" 387 | "In his short story ""Evidence"" Asimov lets his recurring character Dr. Susan Calvin expound a moral basis behind the Three Laws." 388 | "Calvin points out that human beings are typically expected to refrain from harming other human beings (except in times of extreme duress like war, or to save a greater number) and this is equivalent to a robot's First Law" 389 | "Likewise, according to Calvin, society expects individuals to obey instructions from recognized authorities such as doctors, teachers and so forth which equals the Second Law of Robotics." 390 | Finally humans are typically expected to avoid harming themselves which is the Third Law for a robot. 391 | "The plot of ""Evidence"" revolves around the question of telling a human being apart from a robot constructed to appear human – Calvin reasons that if such an individual obeys the Three Laws he may be a robot or simply ""a very good man""" 392 | "Asimov later wrote that he should not be praised for creating the Laws, because they are ""obvious from the start, and everyone is aware of them subliminally." 393 | The Laws just never happened to be put into brief sentences until I managed to do the job. 394 | "I have my answer ready whenever someone asks me if I think that my Three Laws of Robotics will actually be used to govern the behavior of robots, once they become versatile and flexible enough to be able to choose among different courses of behavior." 395 | "My answer is, ""Yes, the Three Laws are the only way in which rational human beings can deal with robots, or with anything else.""" 396 | Asimov's stories test his Three Laws in a wide variety of circumstances leading to proposals and rejection of modifications. 397 | "Science fiction scholar James Gunn writes in 1982, ""The Asimov robot stories as a whole may respond best to an analysis on this basis: the ambiguity in the Three Laws and the ways in which Asimov played twenty-nine variations upon a theme""" 398 | "Removing the First Law's ""inaction"" clause solves this problem but creates the possibility of an even greater one: a robot could initiate an action that would harm a human (dropping a heavy weight and failing to catch it is the example given in the text), knowing that it was capable of preventing the harm and then decide not to do so."
399 | "Gaia is a planet with collective intelligence in the Foundation series which adopts a law similar to the First Law, and the Zeroth Law, as its philosophy: Gaia may not harm life or allow life to come to harm." 400 | "Three times during his writing career, Asimov portrayed robots that disregard the Three Laws entirely." 401 | "On the other hand, the short story ""Cal"" (from the collection Gold), told by a first-person robot narrator, features a robot who disregards the Three Laws because he has found something far more importanthe wants to be a writer." 402 | "The third is a short story entitled ""Sally"" in which cars fitted with positronic brains are apparently able to harm and kill humans in disregard of the First Law." 403 | "However, aside from the positronic brain concept, this story does not refer to other robot stories and may not be set in the same continuity." 404 | Without the basic theory of the Three Laws the fictional scientists of Asimov's universe would be unable to design a workable brain unit. 405 | "The character Dr. Gerrigel uses the term ""Asenion"" to describe robots programmed with the Three Laws." 406 | --------------------------------------------------------------------------------