├── .gitignore ├── Makefile ├── README.md ├── bucket-defs-update-policy.json ├── bucket-scanner-function-policy.json ├── build_lambda.sh ├── clamav-definition-updates-policy.json ├── clamav.py ├── common.py ├── diff.yaml ├── presentation └── Presentation1.pptx ├── python.gitignore ├── requirements.txt ├── scan.py ├── update.py ├── yara-rule-updates-policy.json ├── yara_rules ├── my_first_rule.yara ├── second_rule.yara └── third_yara_rule.yara └── yarascan.py /Makefile: -------------------------------------------------------------------------------- 1 | AMZ_LINUX_VERSION:=2 2 | current_dir := $(shell pwd) 3 | container_dir := /opt/app 4 | circleci := ${CIRCLECI} 5 | 6 | all: archive 7 | 8 | clean: 9 | rm -rf compile/lambda.zip 10 | 11 | archive: clean 12 | ifeq ($(circleci), true) 13 | docker create -v $(container_dir) --name src alpine:3.4 /bin/true 14 | docker cp $(current_dir)/. src:$(container_dir) 15 | docker run --rm -ti \ 16 | --volumes-from src \ 17 | amazonlinux:$(AMZ_LINUX_VERSION) \ 18 | /bin/bash -c "cd $(container_dir) && ./build_lambda.sh" 19 | else 20 | docker run --rm -ti \ 21 | -v $(current_dir):$(container_dir) \ 22 | amazonlinux:$(AMZ_LINUX_VERSION) \ 23 | /bin/bash -c "cd $(container_dir) && ./build_lambda.sh" 24 | endif 25 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Clara 2 | Serverless, real-time ClamAV+Yara scanning for your S3 buckets 3 | 4 | ## Join the Slack Channel 5 | * [Slack Channel Link](https://join.slack.com/t/cloudmalscanner/shared_invite/enQtNzkwNTU0MzU1NzgzLWUwZGY1ZjZmZGNiOWNlNDEyNjQ2N2JjMjMzMjE1ODQ0NDMwODM3ODk5YTg3MTZkN2VjMjM4N2JiMDMzMjM2Mjc) 6 | 7 | ## What is Clara 8 | Clara is a Python-based project used to scan your S3 bucket files with ClamAV and Yara signatures. 
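The scan is event-driven: an object landing in the scanning bucket produces an S3 event notification, and the Lambda handler resolves the bucket and key from that event (see `event_object` in `scan.py`). A minimal, dependency-free sketch of that lookup — the sample payload below is illustrative, not a real AWS event:

```python
from urllib.parse import unquote_plus

def event_object_names(event):
    """Return (bucket, key) from an S3 object-created event record."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # S3 event keys are URL-encoded; '+' stands for a space.
    key = unquote_plus(record["object"]["key"])
    if not bucket or not key:
        raise ValueError("Unable to retrieve object from event")
    return bucket, key

# Illustrative payload shaped like an S3 put-event notification.
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "file-scanning-upload"},
    "object": {"key": "uploads/invoice+2019.pdf"},
}}]}

print(event_object_names(sample_event))  # ('file-scanning-upload', 'uploads/invoice 2019.pdf')
```

The real handler additionally supports events wrapped in an SNS message, but the record shape it unwraps is the same.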
9 | It is built on top of Upside Travel's [bucket-antivirus-function](https://github.com/upsidetravel/bucket-antivirus-function) project and Airbnb's [BinaryAlert](https://github.com/airbnb/binaryalert) project. *Clara* combines the two functionalities into a single project with some additional improvements: 10 | * Yara and ClamAV scanning with signature updates. 11 | * Slack/Email SNS alerts. 12 | * DynamoDB storage support. 13 | * Python 3.7 support. 14 | 15 | ## Setup: 16 | 17 | ### Step 1: Building the Lambda Zip file from source 18 | 19 | Change the current directory to clara and run `make`. 20 | This will create a build directory with a zip file containing the Lambda function code and its requirements. 21 | 22 | ### Step 2: Creating the S3 buckets 23 | 24 | We will need three different S3 buckets for this project: one for storing the files to be scanned, a second for storing the ClamAV definitions, and a third for storing your Yara rules. 25 | 26 | 1. Create an S3 bucket named ```file-scanning-upload```. This bucket will hold the files to be scanned by Clara. 27 | 28 | 29 | 2. Create a second bucket to store the ClamAV definitions. Name the bucket ```clamav-definition-updates```. 30 | 31 | 3. Upload the most recent definition files [main.cvd](http://database.clamav.net/main.cvd), [daily.cvd](http://database.clamav.net/daily.cvd) and [bytecode.cvd](http://database.clamav.net/bytecode.cvd) to this bucket. 32 | 33 | 4. For this same bucket, navigate to the Permissions tab and select Bucket Policy. Copy and paste the JSON policy from ```clamav-definition-updates-policy.json``` into this field and save. This policy grants public read access to the bucket so that the definition files can be downloaded. 35 | 5. Now, create the third and final bucket and name it ```yara-rule-updates```. 36 | 6. You can now upload your Yara signature sets to this bucket. Once done, again navigate to the Permissions tab and select Bucket Policy. 
Copy and paste the JSON policy defined in ```yara-rule-updates-policy.json``` here. 38 | 39 | Now we are all set with creating our S3 buckets and assigning the required access policies. 40 | 41 | 42 | ### Step 3: 43 | * Creating the rules and definition update Lambda function 44 | 45 | 1. Proceed to your AWS Lambda dashboard and create a new Lambda function. Select "From scratch" to define your own parameters. 46 | 47 | 2. Name the function ```bucket-defs-update``` and select **Python 3.7** as the runtime. 48 | 49 | 3. In the Designer tab, click the "Add trigger" option on the left side. Select ```CloudWatch Events``` and then add ```rate(3 hours)``` as the **Schedule expression**. Make sure the **Enable trigger** option is selected. Save it. 50 | 51 | 4. In the Function code section, select "upload from zip file" and choose the ```build/lambda.zip``` file that we created in Step 1. Again, select Python 3.7 as the runtime. In the Handler field add ```update.lambda_handler```. 52 | 53 | 5. In the environment variables field, we will have to provide the bucket names where our ClamAV definitions and Yara rules reside. Define the following keys and values in this field: 54 | * ```AV_DEFINITION_S3_BUCKET```: ```clamav-definition-updates``` 55 | * ```YARA_RULES_S3_BUCKET```: ```yara-rule-updates``` 56 | 57 | 58 | 6. For the Execution role section, we will have to create a new role to attach to our Lambda function. Head to your IAM dashboard, click on "Roles" and create a new role. 59 | i. Name your role ```bucket-defs-update```. 60 | ii. Click on "Attach policies" and then create a new policy. 61 | iii. Copy and paste the JSON policy from ```bucket-defs-update-policy.json``` and save it. Once the policy is attached to the role, go back to your Lambda creation page and scroll to the "Execution role" section. 62 | 63 | 7. In the "Execution role" section, select "Use an existing role" and choose the newly created role. You might need to click the Refresh button for your role to appear. 64 | 65 | 8. 
In the "Basic settings", set the memory to 512 MB and the timeout to 3 minutes. 66 | 67 | 9. (Optional) Choose the VPC where you want to place your Lambda function. 68 | 69 | 70 | ### Step 4: 71 | * Creating the scanner Lambda function 72 | 73 | 1. Create a new Lambda function and name it ```bucket-scanner-function```. Select **Python 3.7** as the runtime. 74 | 75 | 2. In the Designer tab, add a new trigger and select **S3 Event**. 76 | 77 | 3. Select the bucket where the files to be scanned will be uploaded. In this example we have named it ```file-scanning-upload```. For the **Event type** select **All object create events**. Make sure the **Enable trigger** option is selected. Then click Add. 78 | 79 | 4. In the **Function code** section select the zip file created in Step 1. Set Python 3.7 as the runtime. Set the value of **Handler** to ```scan.lambda_handler```. 80 | 81 | 5. Define the following two environment variables: 82 | * ```AV_DEFINITION_S3_BUCKET```: ```clamav-definition-updates``` 83 | * ```YARA_RULES_S3_BUCKET```: ```yara-rule-updates``` 84 | 85 | 6. For the **Execution role**, create a new role and name it ```bucket-scanner-function```. Select **Attach policy** and create a new policy. You can again give the policy the same name. Copy and paste ```bucket-scanner-function-policy.json``` into the JSON editor of the policy and save it. 86 | 87 | 7. Go back to the Lambda function creation dashboard. In the **Execution role**, select **Use an existing role** and choose the newly created role. 88 | 89 | 8. In the **Basic settings**, set the memory to 2048 MB and the timeout to 3 minutes. Save the Lambda function. 90 | 91 | ### Step 5: Testing the Lambda functions 92 | 93 | 94 | 95 | 1. Click on Test and let the function run with the default values. 96 | 97 | 2. Come back to your Lambda functions page. You should now have two functions created. Click on the ```bucket-defs-update``` function to launch its UI. 
Select the Monitoring tab and you should see a graph with the details of the test event execution. 98 | 99 | 3. You can click on **View logs in CloudWatch** to see the log lines created by the Lambda function. You should see print lines about the definitions and rules getting updated. 100 | 101 | 4. Go to your S3 bucket ```file-scanning-upload``` and add a new file there. 102 | 103 | 5. Now, open the monitoring page of the ```bucket-scanner-function```. Select **View logs in CloudWatch**. You should see the scan getting initiated. If there is a detection, it will be printed in the log lines. 104 | 105 | 106 | ## Acknowledgements 107 | 108 | * [Bucket-antivirus-function](https://github.com/upsidetravel/bucket-antivirus-function) 109 | * [Binary Alert](https://github.com/airbnb/binaryalert) 110 | -------------------------------------------------------------------------------- /bucket-defs-update-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": "Allow", 6 | "Action": [ 7 | "logs:CreateLogGroup", 8 | "logs:CreateLogStream", 9 | "logs:PutLogEvents" 10 | ], 11 | "Resource": "*" 12 | }, 13 | { 14 | "Action": [ 15 | "s3:GetObject", 16 | "s3:GetObjectTagging", 17 | "s3:PutObject", 18 | "s3:PutObjectTagging", 19 | "s3:PutObjectVersionTagging" 20 | ], 21 | "Effect": "Allow", 22 | "Resource": "arn:aws:s3:::clamav-definition-updates/*" 23 | }, 24 | { 25 | "Action": [ 26 | "s3:GetObject", 27 | "s3:GetObjectTagging", 28 | "s3:PutObject", 29 | "s3:PutObjectTagging", 30 | "s3:PutObjectVersionTagging" 31 | ], 32 | "Effect": "Allow", 33 | "Resource": "arn:aws:s3:::yara-rule-updates/*" 34 | } 35 | ] 36 | } -------------------------------------------------------------------------------- /bucket-scanner-function-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Effect": 
"Allow", 6 | "Action": [ 7 | "logs:CreateLogGroup", 8 | "logs:CreateLogStream", 9 | "logs:PutLogEvents" 10 | ], 11 | "Resource": "*" 12 | }, 13 | { 14 | "Action": [ 15 | "s3:GetObject", 16 | "s3:GetObjectTagging", 17 | "s3:PutObjectTagging", 18 | "s3:PutObjectVersionTagging" 19 | ], 20 | "Effect": "Allow", 21 | "Resource": [ 22 | "arn:aws:s3:::file-scanning-upload/*" 23 | ] 24 | }, 25 | { 26 | "Action": [ 27 | "s3:GetObject", 28 | "s3:GetObjectTagging" 29 | ], 30 | "Effect": "Allow", 31 | "Resource": [ 32 | "arn:aws:s3:::clamav-definition-updates/*" 33 | ] 34 | }, 35 | { 36 | "Action": [ 37 | "s3:GetObject", 38 | "s3:GetObjectTagging" 39 | ], 40 | "Effect": "Allow", 41 | "Resource": [ 42 | "arn:aws:s3:::yara-rule-updates/*" 43 | ] 44 | }, 45 | { 46 | "Action": [ 47 | "kms:Decrypt" 48 | ], 49 | "Effect": "Allow", 50 | "Resource": [ 51 | "arn:aws:s3:::file-scanning-upload/*" 52 | ] 53 | }, 54 | { 55 | "Action": [ 56 | "sns:Publish" 57 | ], 58 | "Effect": "Allow", 59 | "Resource": [ 60 | "arn:aws:sns:::", 61 | "arn:aws:sns:::" 62 | ] 63 | } 64 | ] 65 | } -------------------------------------------------------------------------------- /build_lambda.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | # 06/23/2019 - Adding new feature that creates Yara scanning lambda fucntion 3 | #author: Abhinav Singh 4 | 5 | lambda_output_file=/opt/app/build/lambda.zip 6 | 7 | set -e 8 | 9 | yum update -y 10 | yum install -y cpio python3-pip yum-utils zip 11 | yum -y install gcc openssl-devel bzip2-devel libffi-devel 12 | yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm 13 | #yum install https://www.rpmfind.net/linux/epel/7/x86_64/Packages/y/yara-3.8.1-1.el7.x86_64.rpm 14 | #yum install -y http://download-ib01.fedoraproject.org/pub/epel/testing/7/x86_64/Packages/y/yara-3.11.0-1.el7.x86_64.rpm 15 | yum install -y 
https://download-ib01.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/y/yara-3.11.0-1.el8.x86_64.rpm 16 | 17 | yum install -y python3-devel.x86_64 18 | 19 | pip3 install --no-cache-dir virtualenv 20 | 21 | 22 | virtualenv env 23 | . env/bin/activate 24 | pip3 install --no-cache-dir -r requirements.txt 25 | 26 | pushd /tmp 27 | #yumdownloader -x \*i686 --archlist=x86_64 clamav clamav-lib clamav-update json-c pcre2 yara 28 | yumdownloader -x \*i686 --archlist=x86_64 clamav clamav-lib clamav-update json-c pcre2 libtool-ltdl bzip2-libs libprelude gnutls libtasn1 nettle lib64nettle yara 29 | 30 | 31 | rpm2cpio clamav-0*.rpm | cpio -idmv 32 | rpm2cpio clamav-lib*.rpm | cpio -idmv 33 | rpm2cpio clamav-update*.rpm | cpio -idmv 34 | rpm2cpio json-c*.rpm | cpio -idmv 35 | rpm2cpio pcre*.rpm | cpio -idmv 36 | rpm2cpio yara*.rpm | cpio -idmv 37 | rpm2cpio libtool-ltdl*.rpm | cpio -idmv 38 | rpm2cpio bzip2-libs-*.rpm | cpio -idmv 39 | rpm2cpio libprelude-*.rpm | cpio -idmv 40 | rpm2cpio gnutls-*.rpm | cpio -idmv 41 | rpm2cpio libtasn1-*.rpm | cpio -idmv 42 | rpm2cpio nettle-*.rpm | cpio -idmv 43 | rpm2cpio lib* | cpio -idmv 44 | rpm2cpio *.rpm | cpio -idmv 45 | 46 | popd 47 | mkdir -p bin 48 | cp /tmp/usr/bin/clamscan /tmp/usr/bin/freshclam /tmp/usr/bin/yara /tmp/usr/bin/yarac /tmp/usr/lib64/* bin/. 49 | echo "DatabaseMirror database.clamav.net" > bin/freshclam.conf 50 | mkdir -p build 51 | zip -r9 $lambda_output_file *.py bin 52 | zip -r9 $lambda_output_file conf/ bin/. 
53 | cd env/lib/python3.7/site-packages 54 | zip -r9 $lambda_output_file * -------------------------------------------------------------------------------- /clamav-definition-updates-policy.json: -------------------------------------------------------------------------------- 1 | { 2 | "Version": "2012-10-17", 3 | "Statement": [ 4 | { 5 | "Sid": "AllowPublic", 6 | "Effect": "Allow", 7 | "Principal": "*", 8 | "Action": [ 9 | "s3:GetObject", 10 | "s3:GetObjectTagging" 11 | ], 12 | "Resource": "arn:aws:s3:::clamav-definition-updates/*" 13 | } 14 | ] 15 | } -------------------------------------------------------------------------------- /clamav.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | import botocore 4 | import hashlib 5 | import os 6 | import pwd 7 | import re 8 | from common import * 9 | from subprocess import check_output, Popen, PIPE, STDOUT 10 | 11 | 12 | def current_library_search_path(): 13 | ld_verbose = check_output(["ld", "--verbose"]).decode('utf-8') 14 | rd_ld = re.compile(r"SEARCH_DIR\(\"([A-Za-z0-9/-]*)\"\)")  # raw string; A-Za-z avoids the [A-z] range pitfall 15 | return rd_ld.findall(ld_verbose) 16 | 17 | 18 | def update_defs_from_s3(bucket, prefix): 19 | create_dir(AV_DEFINITION_PATH) 20 | for filename in AV_DEFINITION_FILENAMES: 21 | s3_path = os.path.join(AV_DEFINITION_S3_PREFIX, filename) 22 | local_path = os.path.join(AV_DEFINITION_PATH, filename) 23 | s3_md5 = md5_from_s3_tags(bucket, s3_path) 24 | if os.path.exists(local_path) and md5_from_file(local_path) == s3_md5: 25 | print("Not downloading %s because local md5 matches s3." 
% filename) 26 | continue 27 | if s3_md5: 28 | print("Downloading definition file %s from s3://%s" % (filename, os.path.join(bucket, prefix))) 29 | s3.Bucket(bucket).download_file(s3_path, local_path) 30 | 31 | 32 | def upload_defs_to_s3(bucket, prefix, local_path): 33 | for filename in AV_DEFINITION_FILENAMES: 34 | local_file_path = os.path.join(local_path, filename) 35 | if os.path.exists(local_file_path): 36 | local_file_md5 = md5_from_file(local_file_path) 37 | if local_file_md5 != md5_from_s3_tags(bucket, os.path.join(prefix, filename)): 38 | print("Uploading %s to s3://%s" % (local_file_path, os.path.join(bucket, prefix, filename))) 39 | s3_object = s3.Object(bucket, os.path.join(prefix, filename)) 40 | s3_object.upload_file(os.path.join(local_path, filename)) 41 | s3_client.put_object_tagging( 42 | Bucket=s3_object.bucket_name, 43 | Key=s3_object.key, 44 | Tagging={"TagSet": [{"Key": "md5", "Value": local_file_md5}]} 45 | ) 46 | else: 47 | print("Not uploading %s because md5 on remote matches local." % filename) 48 | 49 | 50 | def update_defs_from_freshclam(path, library_path=""): 51 | create_dir(path) 52 | fc_env = os.environ.copy() 53 | if library_path: 54 | fc_env["LD_LIBRARY_PATH"] = "%s:%s" % (":".join(current_library_search_path()), CLAMAVLIB_PATH) 55 | print("Starting freshclam with defs in %s." % path) 56 | fc_proc = Popen( 57 | [ 58 | FRESHCLAM_PATH, 59 | "--config-file=./bin/freshclam.conf", 60 | "-u %s" % pwd.getpwuid(os.getuid())[0], 61 | "--datadir=%s" % path 62 | ], 63 | stderr=STDOUT, 64 | stdout=PIPE, 65 | env=fc_env 66 | ) 67 | output = fc_proc.communicate()[0] 68 | print("freshclam output:\n%s" % output) 69 | if fc_proc.returncode != 0: 70 | print("Unexpected exit code from freshclam: %s." 
% fc_proc.returncode) 71 | return fc_proc.returncode 72 | 73 | 74 | def md5_from_file(filename): 75 | hash_md5 = hashlib.md5() 76 | with open(filename, "rb") as f: 77 | for chunk in iter(lambda: f.read(4096), b""): 78 | hash_md5.update(chunk) 79 | return hash_md5.hexdigest() 80 | 81 | 82 | def md5_from_s3_tags(bucket, key): 83 | try: 84 | tags = s3_client.get_object_tagging(Bucket=bucket, Key=key)["TagSet"] 85 | except botocore.exceptions.ClientError as e: 86 | expected_errors = {'404', 'AccessDenied', 'NoSuchKey'} 87 | if e.response['Error']['Code'] in expected_errors: 88 | return "" 89 | else: 90 | raise 91 | for tag in tags: 92 | if tag["Key"] == "md5": 93 | return tag["Value"] 94 | return "" 95 | 96 | 97 | def scan_file(path): 98 | av_env = os.environ.copy() 99 | av_env["LD_LIBRARY_PATH"] = CLAMAVLIB_PATH 100 | print("Starting clamscan of %s." % path) 101 | av_proc = Popen( 102 | [ 103 | CLAMSCAN_PATH, 104 | "-v", 105 | "-a", 106 | "--stdout", 107 | "-d", 108 | AV_DEFINITION_PATH, 109 | path 110 | ], 111 | stderr=STDOUT, 112 | stdout=PIPE, 113 | env=av_env 114 | ) 115 | output = av_proc.communicate()[0] 116 | print("clamscan output:\n%s" % output) 117 | if av_proc.returncode == 0: 118 | return AV_STATUS_CLEAN 119 | elif av_proc.returncode == 1: 120 | return AV_STATUS_INFECTED 121 | else: 122 | msg = "Unexpected exit code from clamscan: %s.\n" % av_proc.returncode 123 | print(msg) 124 | raise Exception(msg) 125 | 126 | def main(): 127 | path = input("Enter the path of your file: ") 128 | scan_file(path) 129 | 130 | if __name__ == "__main__": 131 | main() -------------------------------------------------------------------------------- /common.py: -------------------------------------------------------------------------------- 1 | # 06/23/2019 - Adding new environment variable for Yara signature buckets and lib files. 
2 | #author: Abhinav Singh 3 | #pre-update 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | import boto3 17 | import errno 18 | import os 19 | 20 | AV_DEFINITION_S3_BUCKET = os.getenv("AV_DEFINITION_S3_BUCKET") 21 | AV_DEFINITION_S3_PREFIX = os.getenv("AV_DEFINITION_S3_PREFIX", "clamav_defs") 22 | AV_DEFINITION_PATH = os.getenv("AV_DEFINITION_PATH", "/tmp/clamav_defs") 23 | AV_SCAN_START_SNS_ARN = os.getenv("AV_SCAN_START_SNS_ARN") 24 | AV_SCAN_START_METADATA = os.getenv("AV_SCAN_START_METADATA", "av-scan-start") 25 | AV_STATUS_CLEAN = os.getenv("AV_STATUS_CLEAN", "CLEAN") 26 | AV_STATUS_INFECTED = os.getenv("AV_STATUS_INFECTED", "INFECTED") 27 | AV_STATUS_METADATA = os.getenv("AV_STATUS_METADATA", "av-status") 28 | AV_STATUS_SNS_ARN = os.getenv("AV_STATUS_SNS_ARN") 29 | AV_TIMESTAMP_METADATA = os.getenv("AV_TIMESTAMP_METADATA", "av-timestamp") 30 | CLAMAVLIB_PATH = os.getenv("CLAMAVLIB_PATH", "./bin") 31 | CLAMSCAN_PATH = os.getenv("CLAMSCAN_PATH", "./bin/clamscan") 32 | FRESHCLAM_PATH = os.getenv("FRESHCLAM_PATH", "./bin/freshclam") 33 | AV_PROCESS_ORIGINAL_VERSION_ONLY = os.getenv("AV_PROCESS_ORIGINAL_VERSION_ONLY", "False") 34 | AV_DELETE_INFECTED_FILES = os.getenv("AV_DELETE_INFECTED_FILES", "False") 35 | YARA_RULES_S3_BUCKET = os.getenv("YARA_RULES_S3_BUCKET") 36 | YARA_RULES_S3_PREFIX = os.getenv("YARA_RULES_S3_PREFIX", "yara_rules") 37 | YARA_DEFINITION_PATH = os.getenv("YARA_DEFINITION_PATH", "/tmp/yara_rules") 38 | 
YARA_LIB_PATH = os.getenv("YARA_LIB_PATH", "./bin") 39 | YARASCAN_PATH = os.getenv("YARASCAN_PATH", "./bin/yara") 40 | AV_DEFINITION_FILENAMES = ["main.cvd", "daily.cvd", "daily.cud", "bytecode.cvd", "bytecode.cud"] 41 | 42 | s3 = boto3.resource('s3') 43 | s3_client = boto3.client('s3') 44 | 45 | 46 | def create_dir(path): 47 | if not os.path.exists(path): 48 | try: 49 | print("Attempting to create directory %s.\n" % path) 50 | os.makedirs(path) 51 | except OSError as exc: 52 | if exc.errno != errno.EEXIST: 53 | raise 54 | -------------------------------------------------------------------------------- /diff.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: "2010-09-09" 2 | Metadata: 3 | Generator: "former2" 4 | Description: "" 5 | Resources: 6 | LambdaFunction: 7 | Type: "AWS::Lambda::Function" 8 | Properties: 9 | Description: "" 10 | FunctionName: "bucket-scan-function" 11 | Handler: "scan.lambda_handler" 12 | Code: 13 | S3Bucket: "aws-jam-challenge-resources-010" 14 | S3Key: "malware-in-your-bucket/lambda.zip" 15 | MemorySize: 128 16 | Role: !GetAtt IAMRole.Arn 17 | Runtime: "python3.7" 18 | Timeout: 3 19 | TracingConfig: 20 | Mode: "PassThrough" 21 | 22 | S3Bucket: 23 | Type: "AWS::S3::Bucket" 24 | Properties: 25 | BucketName: !Join 26 | - "-" 27 | - - "file-scanning-upload" 28 | - !Select 29 | - 0 30 | - !Split 31 | - "-" 32 | - !Select 33 | - 2 34 | - !Split 35 | - "/" 36 | - !Ref "AWS::StackId" 37 | 38 | IAMRole: 39 | Type: "AWS::IAM::Role" 40 | Properties: 41 | Path: "/" 42 | RoleName: "BucketScan3" 43 | AssumeRolePolicyDocument: "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"lambda.amazonaws.com\"},\"Action\":\"sts:AssumeRole\"}]}" 44 | MaxSessionDuration: 3600 45 | ManagedPolicyArns: 46 | - !Ref IAMManagedPolicy 47 | Description: "Allows Lambda functions to call AWS services on your behalf." 
48 | 49 | IAMManagedPolicy: 50 | Type: "AWS::IAM::ManagedPolicy" 51 | Properties: 52 | ManagedPolicyName: "bucket-scan-policy-2" 53 | Path: "/" 54 | PolicyDocument: | 55 | { 56 | "Version": "2012-10-17", 57 | "Statement": [ 58 | { 59 | "Sid": "VisualEditor0", 60 | "Effect": "Allow", 61 | "Action": [ 62 | "s3:GetObject", 63 | "sns:Publish", 64 | "kms:Decrypt", 65 | "s3:PutObjectVersionTagging", 66 | "s3:GetObjectTagging", 67 | "s3:PutObjectTagging" 68 | ], 69 | "Resource": [ 70 | "arn:aws:sns:::", 71 | "arn:aws:sns:::", 72 | "arn:aws:s3:::{$S3Bucket}/*" 73 | ] 74 | }, 75 | { 76 | "Sid": "VisualEditor1", 77 | "Effect": "Allow", 78 | "Action": [ 79 | "logs:CreateLogStream", 80 | "logs:CreateLogGroup", 81 | "logs:PutLogEvents" 82 | ], 83 | "Resource": "*" 84 | }, 85 | { 86 | "Sid": "VisualEditor2", 87 | "Effect": "Allow", 88 | "Action": "s3:*", 89 | "Resource": "arn:aws:s3:::yara-rules/*" 90 | }, 91 | { 92 | "Sid": "VisualEditor3", 93 | "Effect": "Allow", 94 | "Action": "s3:*", 95 | "Resource": "arn:aws:s3:::{$S3Bucket}/*" 96 | }, 97 | { 98 | "Sid": "VisualEditor9", 99 | "Effect": "Allow", 100 | "Action": "s3:*", 101 | "Resource": "arn:aws:s3:::{$S3Bucket}" 102 | }, 103 | { 104 | "Sid": "VisualEditor4", 105 | "Effect": "Allow", 106 | "Action": "s3:*", 107 | "Resource": "arn:aws:s3:::yara-rules" 108 | } 109 | ] 110 | } 111 | 112 | -------------------------------------------------------------------------------- /presentation/Presentation1.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/abhinavbom/clara/920d30e797488e82cd49e0eecdc48f40a3e7feab/presentation/Presentation1.pptx -------------------------------------------------------------------------------- /python.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / 
packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 
92 | #Pipfile.lock 93 | 94 | # celery beat schedule file 95 | celerybeat-schedule 96 | 97 | # SageMath parsed files 98 | *.sage.py 99 | 100 | # Environments 101 | .env 102 | .venv 103 | env/ 104 | venv/ 105 | ENV/ 106 | env.bak/ 107 | venv.bak/ 108 | 109 | # Spyder project settings 110 | .spyderproject 111 | .spyproject 112 | 113 | # Rope project settings 114 | .ropeproject 115 | 116 | # mkdocs documentation 117 | /site 118 | 119 | # mypy 120 | .mypy_cache/ 121 | .dmypy.json 122 | dmypy.json 123 | 124 | # Pyre type checker 125 | .pyre/ -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2018.11.29 2 | chardet==3.0.4 3 | decorator==4.3 4 | idna==2.8 5 | requests==2.21 6 | simplejson==3.16 7 | urllib3==1.24.1 8 | yara-python 9 | pymongo 10 | -------------------------------------------------------------------------------- /scan.py: -------------------------------------------------------------------------------- 1 | # 06/23/2019 - Adding new feature to scan uploaded files against a set of Yara signatures uploaded to an S3 bucket. 2 | #author: Abhinav Singh 3 | #pre-update 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | 16 | import boto3 17 | import clamav 18 | import yarascan 19 | import copy 20 | import json 21 | import fsf_client 22 | from urllib.parse import unquote_plus 23 | from common import * 24 | from datetime import datetime 25 | from distutils.util import strtobool 26 | import sys 27 | import os 28 | from botocore.vendored import requests 29 | 30 | ENV = os.getenv("ENV", "") 31 | EVENT_SOURCE = os.getenv("EVENT_SOURCE", "S3") 32 | 33 | 34 | def event_object(event): 35 | if EVENT_SOURCE.upper() == "SNS": 36 | event = json.loads(event['Records'][0]['Sns']['Message']) 37 | bucket = event['Records'][0]['s3']['bucket']['name'] 38 | key = unquote_plus(event['Records'][0]['s3']['object']['key']) 39 | if (not bucket) or (not key): 40 | print("Unable to retrieve object from event.\n%s" % event) 41 | raise Exception("Unable to retrieve object from event.") 42 | return s3.Object(bucket, key) 43 | 44 | 45 | def verify_s3_object_version(s3_object): 46 | # validate that we only process the original version of a file, if asked to do so 47 | # security check to disallow processing of a new (possibly infected) object version 48 | # while a clean initial version is getting processed 49 | # downstream services may consume latest version by mistake and get the infected version instead 50 | if str_to_bool(AV_PROCESS_ORIGINAL_VERSION_ONLY): 51 | bucketVersioning = s3.BucketVersioning(s3_object.bucket_name) 52 | if bucketVersioning.status == "Enabled": 53 | bucket = s3.Bucket(s3_object.bucket_name) 54 | versions = list(bucket.object_versions.filter(Prefix=s3_object.key)) 55 | if len(versions) > 1: 56 | print("Detected multiple object versions in %s.%s, aborting processing" % ( 57 | s3_object.bucket_name, s3_object.key)) 58 | raise Exception("Detected multiple object versions in %s.%s, aborting processing" % ( 59 | s3_object.bucket_name, s3_object.key)) 60 | else: 61 | print("Detected only 1 object version in %s.%s, proceeding with processing" % ( 62 | s3_object.bucket_name, 
s3_object.key))
63 |         else:
64 |             # misconfigured bucket, left with no or suspended versioning
65 |             print(
66 |                 "Unable to implement check for original version, as versioning is not enabled in bucket %s" % s3_object.bucket_name)
67 |             raise Exception("Object versioning is not enabled in bucket %s" % s3_object.bucket_name)
68 | 
69 | 
70 | def download_s3_object(s3_object, local_prefix):
71 |     local_path = "%s/%s/%s" % (local_prefix, s3_object.bucket_name, s3_object.key)
72 |     create_dir(os.path.dirname(local_path))
73 |     s3_object.download_file(local_path)
74 |     return local_path
75 | 
76 | 
77 | def delete_s3_object(s3_object):
78 |     try:
79 |         s3_object.delete()
80 |     except Exception:
81 |         print("Failed to delete infected file: %s.%s" % (s3_object.bucket_name, s3_object.key))
82 |     else:
83 |         print("Infected file deleted: %s.%s" % (s3_object.bucket_name, s3_object.key))
84 | 
85 | 
86 | def set_av_metadata(s3_object, result):
87 |     content_type = s3_object.content_type
88 |     metadata = s3_object.metadata
89 |     metadata[AV_STATUS_METADATA] = result
90 |     metadata[AV_TIMESTAMP_METADATA] = datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S UTC")
91 |     s3_object.copy(
92 |         {
93 |             'Bucket': s3_object.bucket_name,
94 |             'Key': s3_object.key
95 |         },
96 |         ExtraArgs={
97 |             "ContentType": content_type,
98 |             "Metadata": metadata,
99 |             "MetadataDirective": "REPLACE"
100 |         }
101 |     )
102 | 
103 | 
104 | def set_av_tags(s3_object, result):
105 |     curr_tags = s3_client.get_object_tagging(Bucket=s3_object.bucket_name, Key=s3_object.key)["TagSet"]
106 |     new_tags = copy.copy(curr_tags)
107 |     for tag in curr_tags:
108 |         if tag["Key"] in [AV_STATUS_METADATA, AV_TIMESTAMP_METADATA]:
109 |             new_tags.remove(tag)
110 |     new_tags.append({"Key": AV_STATUS_METADATA, "Value": result})
111 |     new_tags.append({"Key": AV_TIMESTAMP_METADATA, "Value": datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S UTC")})
112 |     s3_client.put_object_tagging(
113 |         Bucket=s3_object.bucket_name,
114 |         Key=s3_object.key,
115 |         Tagging={"TagSet": new_tags}
116 |     )
117 | 
118 | 
119 | def sns_start_scan(s3_object):
120 |     if AV_SCAN_START_SNS_ARN is None:
121 |         return
122 |     message = {
123 |         "bucket": s3_object.bucket_name,
124 |         "key": s3_object.key,
125 |         "version": s3_object.version_id,
126 |         AV_SCAN_START_METADATA: True,
127 |         AV_TIMESTAMP_METADATA: datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S UTC")
128 |     }
129 |     sns_client = boto3.client("sns")
130 |     sns_client.publish(
131 |         TargetArn=AV_SCAN_START_SNS_ARN,
132 |         Message=json.dumps({'default': json.dumps(message)}),
133 |         MessageStructure="json"
134 |     )
135 | 
136 | 
137 | def sns_scan_results(s3_object, result):
138 |     if AV_STATUS_SNS_ARN is None:
139 |         return
140 |     message = {
141 |         "bucket": s3_object.bucket_name,
142 |         "key": s3_object.key,
143 |         "version": s3_object.version_id,
144 |         AV_STATUS_METADATA: result,
145 |         AV_TIMESTAMP_METADATA: datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S UTC")
146 |     }
147 |     sns_client = boto3.client("sns")
148 |     sns_client.publish(
149 |         TargetArn=AV_STATUS_SNS_ARN,
150 |         Message=json.dumps({'default': json.dumps(message)}),
151 |         MessageStructure="json",
152 |         MessageAttributes={
153 |             AV_STATUS_METADATA: {
154 |                 'DataType': 'String',
155 |                 'StringValue': result
156 |             }
157 |         }
158 |     )
159 | 
160 | def slack_notification(result):
161 |     webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # read from the environment; never commit a live webhook token to source control
162 |     response = requests.post(webhook_url, json={'text': result})
163 |     http_reply = {"statusCode": 200, "body": response.text}
164 |     return http_reply
165 | 
166 | 
167 | def lambda_handler(event, context):
168 |     start_time = datetime.utcnow()
169 |     print("Script starting at %s\n" %
170 |           (start_time.strftime("%Y/%m/%d %H:%M:%S UTC")))
171 |     s3_object = event_object(event)
172 |     verify_s3_object_version(s3_object)
173 |     sns_start_scan(s3_object)
174 |     file_path = download_s3_object(s3_object, "/tmp")
175 |     clamav.update_defs_from_s3(AV_DEFINITION_S3_BUCKET, AV_DEFINITION_S3_PREFIX)
176 |     scan_result = clamav.scan_file(file_path)
177 |     slack_notification(scan_result)
178 |     print("yara scanning to begin")
179 |     yarascan.update_sigs_from_s3(YARA_RULES_S3_BUCKET, YARA_RULES_S3_PREFIX)
180 |     scan_result_yara = yarascan.scan_file(file_path)
181 |     print(scan_result_yara)
182 |     lambda_result = {"clamav": scan_result, "yara": scan_result_yara}  # forward the real scan results to FSF
183 |     with open(file_path, 'rb') as f:
184 |         filename = os.path.basename(file_path)
185 |         print("sending control to fsf")
186 |         fsf = fsf_client.FSFClient(file_path, f.name, False, 'Analyst', False, False, False, f.read(), lambda_result)
187 |         print("initiating submission")
188 |         print(fsf.initiate_submission())
189 |     print("Scan of s3://%s resulted in %s\n" % (os.path.join(s3_object.bucket_name, s3_object.key), scan_result))
190 |     if "AV_UPDATE_METADATA" in os.environ:
191 |         set_av_metadata(s3_object, scan_result)
192 |     set_av_tags(s3_object, scan_result)
193 |     sns_scan_results(s3_object, scan_result)
194 |     # metrics.send(env=ENV, bucket=s3_object.bucket_name, key=s3_object.key, status=scan_result)
195 |     # Delete downloaded file to free up room on re-usable lambda function container
196 |     try:
197 |         os.remove(file_path)
198 |     except OSError:
199 |         pass
200 |     if str_to_bool(AV_DELETE_INFECTED_FILES) and scan_result == AV_STATUS_INFECTED:
201 |         delete_s3_object(s3_object)
202 |     print("Script finished at %s\n" %
203 |           datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S UTC"))
204 | 
205 | 
206 | def str_to_bool(s):
207 |     return bool(strtobool(str(s)))
208 | 
--------------------------------------------------------------------------------
/update.py:
--------------------------------------------------------------------------------
1 | # 06/23/2019 - Adding new feature to update the bucket containing Yara rules for scanning.
2 | # author: Abhinav Singh
3 | # pre-update
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | #     http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | 
16 | import clamav
17 | import yarascan
18 | from common import *
19 | from datetime import datetime
20 | import os
21 | 
22 | 
23 | def lambda_handler(event, context):
24 |     start_time = datetime.utcnow()
25 |     print("Script starting at %s\n" %
26 |           (start_time.strftime("%Y/%m/%d %H:%M:%S UTC")))
27 |     clamav.update_defs_from_s3(AV_DEFINITION_S3_BUCKET, AV_DEFINITION_S3_PREFIX)
28 |     clamav.update_defs_from_freshclam(AV_DEFINITION_PATH, CLAMAVLIB_PATH)
29 |     yarascan.update_sigs_from_s3(YARA_RULES_S3_BUCKET, YARA_RULES_S3_PREFIX)
30 |     # If main.cvd gets updated (very rare), we will need to force freshclam
31 |     # to download the compressed version to keep file sizes down.
32 |     # The existence of main.cud is the trigger to know this has happened.
33 |     if os.path.exists(os.path.join(AV_DEFINITION_PATH, "main.cud")):
34 |         os.remove(os.path.join(AV_DEFINITION_PATH, "main.cud"))
35 |         if os.path.exists(os.path.join(AV_DEFINITION_PATH, "main.cvd")):
36 |             os.remove(os.path.join(AV_DEFINITION_PATH, "main.cvd"))
37 |         clamav.update_defs_from_freshclam(AV_DEFINITION_PATH, CLAMAVLIB_PATH)
38 |     clamav.upload_defs_to_s3(AV_DEFINITION_S3_BUCKET, AV_DEFINITION_S3_PREFIX, AV_DEFINITION_PATH)
39 |     print("Script finished at %s\n" %
40 |           datetime.utcnow().strftime("%Y/%m/%d %H:%M:%S UTC"))
41 | 
--------------------------------------------------------------------------------
/yara-rule-updates-policy.json:
--------------------------------------------------------------------------------
1 | {
2 |     "Version": "2012-10-17",
3 |     "Statement": [
4 |         {
5 |             "Sid": "AllowPublic",
6 |             "Effect": "Allow",
7 |             "Principal": "*",
8 |             "Action": [
9 |                 "s3:GetObject",
10 |                 "s3:GetObjectTagging"
11 |             ],
12 |             "Resource": "arn:aws:s3:::yara-rule-updates/*"
13 |         }
14 |     ]
15 | }
--------------------------------------------------------------------------------
/yara_rules/my_first_rule.yara:
--------------------------------------------------------------------------------
1 | rule dummy1 { strings: $a = "m" condition: $a }
2 | 
--------------------------------------------------------------------------------
/yara_rules/second_rule.yara:
--------------------------------------------------------------------------------
1 | rule dummy2 { strings: $a = "a" condition: $a }
2 | 
--------------------------------------------------------------------------------
/yara_rules/third_yara_rule.yara:
--------------------------------------------------------------------------------
1 | rule dummy3 { strings: $a = "b" condition: $a }
2 | 
--------------------------------------------------------------------------------
/yarascan.py:
--------------------------------------------------------------------------------
1 | # 06/23/2019 - Adding new feature to update the bucket containing Yara rules for scanning.
2 | # author: Abhinav Singh
3 | 
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | #     http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | 
16 | import hashlib
17 | import os
18 | import pwd
19 | import re
20 | from common import *
21 | from subprocess import check_output, Popen, PIPE, STDOUT
22 | import yara
23 | import logging
24 | import boto3
25 | 
26 | def current_library_search_path():
27 |     ld_verbose = check_output(["ld", "--verbose"]).decode('utf-8')
28 |     rd_ld = re.compile(r'SEARCH_DIR\("([A-Za-z0-9/-]*)"\)')  # raw string; [A-Za-z] instead of the buggy [A-z] range
29 |     return rd_ld.findall(ld_verbose)
30 | 
31 | 
32 | def update_sigs_from_s3(bucket, prefix):
33 |     create_dir(YARA_DEFINITION_PATH)
34 |     print("created yara definitions directory %s" % YARA_DEFINITION_PATH)
35 |     # initiate s3 resource
36 |     s3 = boto3.resource('s3')
37 |     # select the bucket passed in by the caller
38 |     my_bucket = s3.Bucket(bucket)
39 |     local_path = YARA_DEFINITION_PATH
40 |     # download each rule file into the definitions directory
41 |     for s3_object in my_bucket.objects.all():
42 |         # Split s3_object.key into path and file name; otherwise download_file fails with a file-not-found error.
43 |         path, filename = os.path.split(s3_object.key)
44 |         local_path = os.path.join(YARA_DEFINITION_PATH, filename)
45 |         print("downloading yara rules")
46 |         print(path, filename)
47 |         my_bucket.download_file(s3_object.key, local_path)
48 | 
49 | def scan_file(path):
50 |     cwd = os.getcwd()  # renamed from "pwd" to avoid shadowing the pwd module imported above
51 |     print(cwd)
52 |     # YARA_DEFINITION_PATH = cwd + '/yara_rules/'
53 |     file_list = []
54 |     rule_name_list = []
55 |     yara_scan_info = {
56 |         "scan_performed": "No",
57 |         "scan_result": "Not-detected",
58 |         "detection_rule": "N/A",
59 |         "clamAV_scan": "N/A"
60 |     }
61 |     yara_env = os.environ.copy()
62 |     yara_env["LD_LIBRARY_PATH"] = YARA_LIB_PATH
63 |     # print(yara_env)
64 |     with open(path, 'rb') as file:  # open file for yara scanning; context manager ensures it is closed
65 |         print(file)
66 |         file_data = file.read()
67 |     try:
68 |         for (dirpath, dirnames, filenames) in os.walk(YARA_DEFINITION_PATH):
69 |             file_list.extend(filenames)
70 |         print(file_list)
71 |         for item in file_list:
72 |             rule = yara.compile(filepath=os.path.join(YARA_DEFINITION_PATH, str(item)))
73 |             matches = rule.match(data=file_data)
74 |             logging.info(matches)
75 |             if matches:
76 |                 rule_name_list.append(matches[0].rule)
77 |                 yara_scan_info['scan_performed'] = "Yes"
78 |                 yara_scan_info['scan_result'] = "Detected"
79 |                 yara_scan_info['detection_rule'] = rule_name_list
80 |         print(yara_scan_info)
81 |     except Exception as e:
82 |         print(e)
83 |     return yara_scan_info  # return the result dict so callers (e.g. scan.py) get more than None
84 | 
85 | def main():
86 |     path = input("Enter the path of your file: ")
87 |     scan_file(path)
88 | 
89 | if __name__ == "__main__":
90 |     main()
91 | 
--------------------------------------------------------------------------------
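As a quick aside, the tag-merge step that `set_av_tags` in `scan.py` performs (drop any previous AV status/timestamp tags, then append fresh ones, so repeated scans never accumulate duplicates) can be sanity-checked in isolation without AWS. This is a minimal sketch; the two tag key names are placeholders, since the real values come from `common.py`:

```python
# Standalone sketch of the tag-merge logic used by set_av_tags in scan.py.
# The key names below are assumptions for illustration; the real constants
# are defined in common.py.
from datetime import datetime, timezone

AV_STATUS_METADATA = "av-status"        # assumed tag key
AV_TIMESTAMP_METADATA = "av-timestamp"  # assumed tag key

def merge_av_tags(curr_tags, result, timestamp=None):
    """Return a TagSet with any previous AV tags replaced by fresh values."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc).strftime("%Y/%m/%d %H:%M:%S UTC")
    # Keep every tag that is not ours, so unrelated tags survive rescans.
    new_tags = [t for t in curr_tags
                if t["Key"] not in (AV_STATUS_METADATA, AV_TIMESTAMP_METADATA)]
    new_tags.append({"Key": AV_STATUS_METADATA, "Value": result})
    new_tags.append({"Key": AV_TIMESTAMP_METADATA, "Value": timestamp})
    return new_tags
```

The returned list is shaped exactly like the `TagSet` that `s3_client.put_object_tagging` expects, which makes the merge easy to unit-test before wiring it to S3.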