├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── CodeFreeAutoML.yaml ├── LICENSE ├── NOTICE ├── README.md ├── autogluon-tab-with-test.py ├── lambda_function.py └── sourcedir.tar.gz /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. 
Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /CodeFreeAutoML.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: Code-Free AutoML Pipeline Template 3 | 4 | Parameters: 5 | BucketName: 6 | Type: String 7 | Default: code-free-automl-yournamehere 8 | Description: The name of the new S3 bucket created to use with the pipeline. Make sure it is unique! 9 | TrainingInstanceType: 10 | Type: String 11 | Default: m5.xlarge 12 | Description: The compute instance type used to run SageMaker training jobs kicked off automatically by the pipeline. 
13 | 14 | Resources: 15 | Bucket: 16 | Type: AWS::S3::Bucket 17 | Properties: 18 | BucketName: !Ref BucketName 19 | PublicAccessBlockConfiguration: 20 | BlockPublicAcls: True 21 | BlockPublicPolicy: True 22 | IgnorePublicAcls: True 23 | RestrictPublicBuckets: True 24 | NotificationConfiguration: 25 | LambdaConfigurations: 26 | - Event: s3:ObjectCreated:* 27 | Filter: 28 | S3Key: 29 | Rules: 30 | - Name: prefix 31 | Value: data/ 32 | - Name: suffix 33 | Value: _train.csv 34 | Function: !GetAtt Lambda.Arn 35 | 36 | Lambda: 37 | Type: AWS::Lambda::Function 38 | Properties: 39 | Code: 40 | ZipFile: | 41 | import json 42 | import os 43 | import boto3 44 | import datetime 45 | from urllib.parse import unquote_plus 46 | import logging 47 | logging.basicConfig(level=logging.INFO, 48 | format='%(asctime)s %(message)s', 49 | datefmt='%Y-%m-%d %H:%M:%S') 50 | 51 | def lambda_handler(event, context): 52 | for record in event['Records']: 53 | bucket = record['s3']['bucket']['name'] 54 | key = unquote_plus(record['s3']['object']['key']) 55 | tmpkey = key.replace('/', '') 56 | logging.info(key) 57 | logging.info(tmpkey) 58 | filename = key.split('/')[-1] 59 | print(filename) 60 | dataset = filename.split('_')[0] 61 | print(dataset) 62 | 63 | now = datetime.datetime.now 64 | str_time = now().strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3] 65 | sm = boto3.Session().client('sagemaker') 66 | training_job_params = { 67 | 'TrainingJobName': dataset + '-autogluon-' + str_time, 68 | 'HyperParameters': { 69 | 'filename':json.dumps(filename), 70 | 'sagemaker_container_log_level': '20', 71 | 'sagemaker_enable_cloudwatch_metrics': 'false', 72 | 'sagemaker_program': 'autogluon-tab-with-test.py', 73 | 'sagemaker_region': os.environ['AWS_REGION'], 74 | 'sagemaker_submit_directory': 's3://' + bucket + '/source/sourcedir.tar.gz', 75 | 's3-output': os.environ['S3_OUTPUT_PATH'] 76 | }, 77 | 'AlgorithmSpecification': { 78 | 'TrainingImage': '763104351884.dkr.ecr.' 
+ os.environ['AWS_REGION'] + '.amazonaws.com/mxnet-training:1.6.0-cpu-py3', 79 | 'TrainingInputMode': 'File', 80 | 'EnableSageMakerMetricsTimeSeries': False 81 | }, 82 | 'RoleArn': os.environ['SAGEMAKER_ROLE_ARN'], 83 | 'InputDataConfig': [ 84 | { 85 | 'ChannelName': 'training', 86 | 'DataSource': { 87 | 'S3DataSource': { 88 | 'S3DataType': 'S3Prefix', 89 | 'S3Uri': os.environ['S3_TRIGGER_PATH'], 90 | 'S3DataDistributionType': 'FullyReplicated' 91 | } 92 | }, 93 | 'CompressionType': 'None', 94 | 'RecordWrapperType': 'None' 95 | } 96 | ], 97 | 'OutputDataConfig': { 98 | 'KmsKeyId': '', 99 | 'S3OutputPath': os.environ['S3_OUTPUT_PATH'] 100 | }, 101 | 'ResourceConfig': { 102 | 'InstanceType': os.environ['AG_INSTANCE_TYPE'], 103 | 'InstanceCount': 1, 104 | 'VolumeSizeInGB': 30 105 | }, 106 | 'StoppingCondition': { 107 | 'MaxRuntimeInSeconds': 86400 108 | }, 109 | 'EnableNetworkIsolation': False, 110 | 'EnableInterContainerTrafficEncryption': False, 111 | 'EnableManagedSpotTraining': False, 112 | } 113 | 114 | response = sm.create_training_job(**training_job_params) 115 | 116 | return { 117 | 'statusCode': 200, 118 | 'body': json.dumps(key) 119 | } 120 | Handler: index.lambda_handler 121 | Runtime: python3.7 122 | Description: Lambda to kick off SageMaker training job for code-free AutoML pipeline 123 | MemorySize: 128 124 | Timeout: 3 125 | Role: !GetAtt LambdaIamRole.Arn 126 | Environment: 127 | Variables: 128 | AG_INSTANCE_TYPE: !Sub ml.${TrainingInstanceType} 129 | S3_OUTPUT_PATH: !Sub s3://${BucketName}/results/ 130 | S3_TRIGGER_PATH: !Sub s3://${BucketName}/data/ 131 | SAGEMAKER_ROLE_ARN: !GetAtt SageMakerIamRole.Arn 132 | 133 | S3InvokeLambdaPermission: 134 | Type: AWS::Lambda::Permission 135 | Properties: 136 | Action: lambda:InvokeFunction 137 | FunctionName: !Ref Lambda 138 | Principal: s3.amazonaws.com 139 | SourceAccount: !Ref AWS::AccountId 140 | SourceArn: !Sub arn:aws:s3:::${BucketName} 141 | 142 | LambdaIamRole: 143 | Type: AWS::IAM::Role 144 | 
Properties: 145 | AssumeRolePolicyDocument: 146 | Version: 2012-10-17 147 | Statement: 148 | - Effect: Allow 149 | Principal: 150 | Service: lambda.amazonaws.com 151 | Action: sts:AssumeRole 152 | Path: / 153 | ManagedPolicyArns: 154 | - arn:aws:iam::aws:policy/AWSLambdaExecute 155 | - arn:aws:iam::aws:policy/AmazonSageMakerFullAccess 156 | SageMakerIamRole: 157 | Type: AWS::IAM::Role 158 | Properties: 159 | AssumeRolePolicyDocument: 160 | Version: 2012-10-17 161 | Statement: 162 | - Effect: Allow 163 | Principal: 164 | Service: sagemaker.amazonaws.com 165 | Action: sts:AssumeRole 166 | Path: / 167 | ManagedPolicyArns: 168 | - arn:aws:iam::aws:policy/AmazonSageMakerFullAccess 169 | - !Ref SageMakerS3Policy 170 | 171 | SageMakerS3Policy: 172 | Type: AWS::IAM::ManagedPolicy 173 | Properties: 174 | ManagedPolicyName: AmazonS3CodeFreeAutoMLAccess 175 | PolicyDocument: 176 | Version: 2012-10-17 177 | Statement: 178 | - Effect: Allow 179 | Action: 180 | - s3:GetObject 181 | - s3:PutObject 182 | - s3:DeleteObject 183 | - s3:ListBucket 184 | Resource: 185 | - !Sub arn:aws:s3:::${BucketName} 186 | - !Sub arn:aws:s3:::${BucketName}/* 187 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. 
For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 
48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## AutoML with AutoGluon, Amazon SageMaker, and AWS Lambda 2 | 3 | This repository contains the CloudFormation template and prewritten source code powering the code-free AutoML pipeline detailed in [this AWS Machine Learning blog post](https://aws.amazon.com/blogs/machine-learning/code-free-machine-learning-automl-with-autogluon-amazon-sagemaker-and-aws-lambda/). Feel free to customize it to fit your use case and share with us what you build! 4 | 5 | * `autogluon-tab-with-test.py` is the script run by the SageMaker training job the Lambda function automatically kicks off when you upload your training data to S3. It's pre-packaged in `sourcedir.tar.gz` for the use of the pipeline. You can modify this script to reuse the pipeline with your own model training code. 6 | * `CodeFreeAutoML.yaml` is the CloudFormation template you use to deploy the pipeline in your account. 7 | * `lambda_function.py` is the source code for the Lambda function that kicks off the SageMaker training job when you upload your data to S3. 8 | * `sourcedir.tar.gz` is the `autogluon-tab-with-test.py` file pre-packaged for your convenience; the pipeline requires it to be gzipped. 9 | 10 | ## Security 11 | 12 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 13 | 14 | ## License 15 | 16 | This project is licensed under the Apache-2.0 License. 
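
## Repackaging the training script

The note above that you can modify `autogluon-tab-with-test.py` and that the pipeline needs it gzipped in `sourcedir.tar.gz` can be sketched with the standard-library `tarfile` module. This is a minimal sketch, not the authors' packaging procedure: the helper names `package_source` and `list_archive` are hypothetical, and placing the script at the top level of the archive is an assumption based on the Lambda passing the bare file name as `sagemaker_program`.

```python
import tarfile
from pathlib import Path


def package_source(script_path, archive_path="sourcedir.tar.gz"):
    """Write a gzipped tar archive holding the script at the archive root.

    Hypothetical helper: the repository ships a prebuilt archive and does
    not include a packaging script.
    """
    script = Path(script_path)
    if not script.is_file():
        raise FileNotFoundError(f"training script not found: {script}")
    with tarfile.open(str(archive_path), mode="w:gz") as archive:
        # arcname drops leading directories so the script sits at the top
        # level of the archive (an assumption about what the pipeline expects).
        archive.add(str(script), arcname=script.name)
    return Path(archive_path)


def list_archive(archive_path):
    """Return the member names of a gzipped tar archive."""
    with tarfile.open(str(archive_path), mode="r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    built = package_source("autogluon-tab-with-test.py")
    print(list_archive(built))
```

The Lambda function reads the archive from `s3://<bucket>/source/sourcedir.tar.gz`, so a rebuilt archive would need to be uploaded to that key in the pipeline bucket before the next training job starts.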
17 | 18 | -------------------------------------------------------------------------------- /autogluon-tab-with-test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import logging 3 | import os 4 | import json 5 | import boto3 6 | 7 | import subprocess 8 | import sys 9 | 10 | from urllib.parse import urlparse 11 | 12 | os.system('pip install autogluon') 13 | from autogluon import TabularPrediction as task 14 | import pandas as pd # this should come after the pip install. 15 | 16 | logging.basicConfig(level=logging.DEBUG) 17 | 18 | logging.info(subprocess.call('ls -lR /opt/ml/input'.split())) 19 | 20 | # ------------------------------------------------------------ # 21 | # Training methods # 22 | # ------------------------------------------------------------ # 23 | 24 | 25 | def train(args): 26 | # SageMaker passes num_cpus, num_gpus and other args we can use to tailor training to 27 | # the current container environment, but here we just use simple cpu context. 28 | 29 | num_gpus = int(os.environ['SM_NUM_GPUS']) 30 | current_host = args.current_host 31 | hosts = args.hosts 32 | model_dir = args.model_dir 33 | target = args.target 34 | 35 | # load training and validation data 36 | 37 | training_dir = args.train 38 | filename = args.filename 39 | logging.info(training_dir) 40 | train_data = task.Dataset(file_path=training_dir + '/' + filename) 41 | predictor = task.fit(train_data = train_data, label=target, output_directory=model_dir) 42 | 43 | return predictor 44 | 45 | 46 | # ------------------------------------------------------------ # 47 | # Hosting methods # 48 | # ------------------------------------------------------------ # 49 | 50 | def model_fn(model_dir): 51 | """ 52 | Load the gluon model. Called once when hosting service starts. 53 | :param: model_dir The directory where model files are stored. 
54 | :return: a model (in this case an AutoGluon network) 55 | """ 56 | net = task.load(model_dir) 57 | return net 58 | 59 | 60 | def transform_fn(net, data, input_content_type, output_content_type): 61 | """ 62 | Transform a request using the Gluon model. Called once per request. 63 | :param net: The AutoGluon model. 64 | :param data: The request payload. 65 | :param input_content_type: The request content type. 66 | :param output_content_type: The (desired) response content type. 67 | :return: response payload and content type. 68 | """ 69 | # we can use content types to vary input/output handling, but 70 | # here we just assume json for both 71 | data = json.loads(data) 72 | # the input request payload has to be deserialized twice since it has a discrete header 73 | data = json.loads(data) 74 | df_parsed = pd.DataFrame(data) 75 | 76 | prediction = net.predict(df_parsed) 77 | 78 | response_body = json.dumps(prediction.tolist()) 79 | 80 | return response_body, output_content_type 81 | 82 | 83 | # ------------------------------------------------------------ # 84 | # Training execution # 85 | # ------------------------------------------------------------ # 86 | 87 | def parse_args(): 88 | parser = argparse.ArgumentParser() 89 | 90 | parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR']) 91 | parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAINING']) 92 | parser.add_argument('--filename', type=str, default='train.csv') 93 | 94 | parser.add_argument('--current-host', type=str, default=os.environ['SM_CURRENT_HOST']) 95 | parser.add_argument('--hosts', type=list, default=json.loads(os.environ['SM_HOSTS'])) 96 | 97 | parser.add_argument('--target', type=str, default='target') 98 | parser.add_argument('--s3-output', type=str, default='s3://autogluon-test/results') 99 | parser.add_argument('--training-job-name', type=str, default=json.loads(os.environ['SM_TRAINING_ENV'])['job_name']) 100 | 101 | return parser.parse_args() 
102 | 103 | 104 | if __name__ == '__main__': 105 | args = parse_args() 106 | predictor = train(args) 107 | 108 | training_dir = args.train 109 | train_file = args.filename 110 | test_file = train_file.replace('train', 'test', 1) 111 | dataset_name = train_file.split('_')[0] 112 | print(dataset_name) 113 | 114 | test_data = task.Dataset(file_path=os.path.join(training_dir, test_file)) 115 | u = urlparse(args.s3_output, allow_fragments=False) 116 | bucket = u.netloc 117 | print(bucket) 118 | prefix = u.path.strip('/') 119 | print(prefix) 120 | 121 | s3 = boto3.client('s3') 122 | 123 | try: 124 | y_test = test_data[args.target] # values to predict 125 | test_data_nolab = test_data.drop(labels=[args.target], axis=1) # delete label column to prove we're not cheating 126 | 127 | y_pred = predictor.predict(test_data_nolab) 128 | y_pred_df = pd.DataFrame.from_dict({'True': y_test, 'Predicted': y_pred}) 129 | pred_file = f'{dataset_name}_test_predictions.csv' 130 | y_pred_df.to_csv(pred_file, index=False, header=True) 131 | 132 | leaderboard = predictor.leaderboard() 133 | lead_file = f'{dataset_name}_leaderboard.csv' 134 | leaderboard.to_csv(lead_file) 135 | 136 | perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True) 137 | perf_file = f'{dataset_name}_model_performance.txt' 138 | with open(perf_file, 'w') as f: 139 | print(json.dumps(perf, indent=4), file=f) 140 | 141 | summary = predictor.fit_summary() 142 | summ_file = f'{dataset_name}_fit_summary.txt' 143 | with open(summ_file, 'w') as f: 144 | print(summary, file=f) 145 | 146 | files_to_upload = [pred_file, lead_file, perf_file, summ_file] 147 | 148 | except Exception: # e.g. the test set has no label column; fall back to predictions only 149 | y_pred = predictor.predict(test_data) 150 | y_pred_df = pd.DataFrame.from_dict({'Predicted': y_pred}) 151 | pred_file = f'{dataset_name}_test_predictions.csv' 152 | y_pred_df.to_csv(pred_file, index=False, header=True) 153 | 154 | leaderboard = predictor.leaderboard() 155 | lead_file = f'{dataset_name}_leaderboard.csv' 
156 | leaderboard.to_csv(lead_file) 157 | 158 | files_to_upload = [pred_file, lead_file] 159 | 160 | for file in files_to_upload: 161 | s3.upload_file(file, bucket, os.path.join(prefix, args.training_job_name.replace('mxnet-training', 'autogluon', 1), file)) 162 | -------------------------------------------------------------------------------- /lambda_function.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import boto3 4 | import datetime 5 | from urllib.parse import unquote_plus 6 | import logging 7 | logging.basicConfig(level=logging.INFO, 8 | format='%(asctime)s %(message)s', 9 | datefmt='%Y-%m-%d %H:%M:%S') 10 | 11 | def lambda_handler(event, context): 12 | for record in event['Records']: 13 | bucket = record['s3']['bucket']['name'] 14 | key = unquote_plus(record['s3']['object']['key']) 15 | tmpkey = key.replace('/', '') 16 | logging.info(key) 17 | logging.info(tmpkey) 18 | filename = key.split('/')[-1] 19 | print(filename) 20 | dataset = filename.split('_')[0] 21 | print(dataset) 22 | 23 | now = datetime.datetime.now 24 | str_time = now().strftime('%Y-%m-%d-%H-%M-%S-%f')[:-3] 25 | sm = boto3.Session().client('sagemaker') 26 | training_job_params = { 27 | 'TrainingJobName': dataset + '-autogluon-' + str_time, 28 | 'HyperParameters': { 29 | 'filename':json.dumps(filename), 30 | 'sagemaker_container_log_level': '20', 31 | 'sagemaker_enable_cloudwatch_metrics': 'false', 32 | 'sagemaker_program': 'autogluon-tab-with-test.py', 33 | 'sagemaker_region': os.environ['AWS_REGION'], 34 | 'sagemaker_submit_directory': 's3://' + bucket + '/source/sourcedir.tar.gz', 35 | 's3-output': os.environ['S3_OUTPUT_PATH'] 36 | }, 37 | 'AlgorithmSpecification': { 38 | 'TrainingImage': '763104351884.dkr.ecr.' 
+ os.environ['AWS_REGION'] + '.amazonaws.com/mxnet-training:1.6.0-cpu-py3', 39 | 'TrainingInputMode': 'File', 40 | 'EnableSageMakerMetricsTimeSeries': False 41 | }, 42 | 'RoleArn': os.environ['SAGEMAKER_ROLE_ARN'], 43 | 'InputDataConfig': [ 44 | { 45 | 'ChannelName': 'training', 46 | 'DataSource': { 47 | 'S3DataSource': { 48 | 'S3DataType': 'S3Prefix', 49 | 'S3Uri': os.environ['S3_TRIGGER_PATH'], 50 | 'S3DataDistributionType': 'FullyReplicated' 51 | } 52 | }, 53 | 'CompressionType': 'None', 54 | 'RecordWrapperType': 'None' 55 | } 56 | ], 57 | 'OutputDataConfig': { 58 | 'KmsKeyId': '', 59 | 'S3OutputPath': os.environ['S3_OUTPUT_PATH'] 60 | }, 61 | 'ResourceConfig': { 62 | 'InstanceType': os.environ['AG_INSTANCE_TYPE'], 63 | 'InstanceCount': 1, 64 | 'VolumeSizeInGB': 30 65 | }, 66 | 'StoppingCondition': { 67 | 'MaxRuntimeInSeconds': 86400 68 | }, 69 | 'EnableNetworkIsolation': False, 70 | 'EnableInterContainerTrafficEncryption': False, 71 | 'EnableManagedSpotTraining': False, 72 | } 73 | 74 | response = sm.create_training_job(**training_job_params) 75 | 76 | return { 77 | 'statusCode': 200, 78 | 'body': json.dumps(key) 79 | } -------------------------------------------------------------------------------- /sourcedir.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/automl-pipeline-with-autogluon-sagemaker-lambda/3a47e67196f5bdedc6ddab74bd80a6163ffe3002/sourcedir.tar.gz --------------------------------------------------------------------------------
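
The key handling in `lambda_function.py` above (URL-decoding the S3 key, taking the file name, and building the training job name from the dataset prefix and a timestamp) can be sketched as a standalone helper that runs without AWS access. This is a minimal sketch under stated assumptions: `is_trigger_key`, `derive_job_inputs`, and `JOB_NAME_PATTERN` are hypothetical names that do not appear in the repository, and the name check is an addition based on the `TrainingJobName` constraint in the SageMaker `CreateTrainingJob` API reference (up to 63 alphanumeric characters and hyphens); the Lambda itself performs no such validation.

```python
import datetime
import re
from urllib.parse import unquote_plus

# Assumption: TrainingJobName constraint as given in the CreateTrainingJob
# API reference (1-63 characters, alphanumerics and hyphens).
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def is_trigger_key(key):
    """Mirror the template's S3 notification filter: data/ prefix, _train.csv suffix."""
    return key.startswith("data/") and key.endswith("_train.csv")


def derive_job_inputs(raw_key, now=None):
    """Reproduce how the Lambda turns an S3 event key into job parameters."""
    key = unquote_plus(raw_key)            # S3 event keys arrive URL-encoded
    filename = key.split("/")[-1]          # e.g. 'iris_train.csv'
    dataset = filename.split("_")[0]       # e.g. 'iris'
    if now is None:
        now = datetime.datetime.now()
    # Microseconds trimmed to milliseconds, as in the Lambda.
    str_time = now.strftime("%Y-%m-%d-%H-%M-%S-%f")[:-3]
    job_name = dataset + "-autogluon-" + str_time
    return {
        "filename": filename,
        "dataset": dataset,
        "job_name": job_name,
        # Extra check that the Lambda does not perform.
        "job_name_is_valid": bool(JOB_NAME_PATTERN.match(job_name)),
    }


if __name__ == "__main__":
    fixed = datetime.datetime(2020, 1, 2, 3, 4, 5, 123456)
    print(derive_job_inputs("data/iris_train.csv", fixed)["job_name"])
    # prints: iris-autogluon-2020-01-02-03-04-05-123
```

Because the dataset name becomes the start of the job name, a file such as `data/my data_train.csv` would pass the S3 filter but yield a name containing a space, which the pattern above rejects. Keeping dataset prefixes to letters, digits, and hyphens avoids that case.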