├── img
    └── Architecture.png
├── images
    ├── Module-3
    │   ├── Module3_LI-pie.png
    │   ├── Module3_LogInsights.png
    │   ├── Module3_ServiceMap.png
    │   ├── Module3_Structured.png
    │   └── Module3_TraceDetails.png
    ├── Module-4
    │   ├── PhoneNumberLog.png
    │   ├── MaskedPhoneNumberLog.png
    │   ├── UpdatedPerOrgStandards.png
    │   ├── module4-org-standard-update.png
    │   └── DefaultMaskingWithoutInstructions.png
    ├── Prepare-Your-Environment
    │   ├── x.png
    │   ├── qchat.png
    │   ├── getTest.png
    │   ├── postTest.png
    │   ├── BuilderID.png
    │   ├── kiro-login.png
    │   ├── OpenTerminal.png
    │   ├── vscodeserver.png
    │   ├── vscodecloudformation.png
    │   ├── openNewTerminalVScode.png
    │   └── Test_your_Deploy-GeneralImmersionDay.png
    └── Module-1
    │   ├── Module1_Architecture.png
    │   ├── Module1_Empty_CW_metrics.png
    │   ├── Module1_Empty_ServiceMap.png
    │   └── Module1_CW_UnstructuredLogs.png
├── CODE_OF_CONDUCT.md
├── loadgen.yaml
├── LICENSE
├── LICENSE-SAMPLECODE
├── src
    └── handlers
    │   ├── getByIdFunction
    │       └── index.py
    │   └── putItemFunction
    │       └── index.py
├── ollyver
    ├── Approach
    │   ├── process-overview.md
    │   ├── session-management.md
    │   └── workshop-methodology.md
    ├── Org-Standards
    │   ├── deployment-guide.md
    │   ├── observability-requirements.md
    │   └── core-patterns.md
    ├── .amazonq
    │   └── cli-agents
    │   │   └── ollyver.json
    └── ollyver-agent-instructions.md
├── .gitignore
├── CONTRIBUTING.md
├── docs
    ├── module-2.md
    ├── setup.md
    ├── module-4.md
    ├── introduction.md
    ├── module-3.md
    ├── module-5.md
    └── module-1.md
├── cloudformation
    └── application.yaml
└── README.md

/img/Architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/img/Architecture.png
--------------------------------------------------------------------------------
/images/Module-3/Module3_LI-pie.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-3/Module3_LI-pie.png
--------------------------------------------------------------------------------
/images/Module-4/PhoneNumberLog.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-4/PhoneNumberLog.png
--------------------------------------------------------------------------------
/images/Module-3/Module3_LogInsights.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-3/Module3_LogInsights.png
--------------------------------------------------------------------------------
/images/Module-3/Module3_ServiceMap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-3/Module3_ServiceMap.png
--------------------------------------------------------------------------------
/images/Module-3/Module3_Structured.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-3/Module3_Structured.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/x.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/x.png
--------------------------------------------------------------------------------
/images/Module-1/Module1_Architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-1/Module1_Architecture.png
--------------------------------------------------------------------------------
/images/Module-3/Module3_TraceDetails.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-3/Module3_TraceDetails.png
--------------------------------------------------------------------------------
/images/Module-4/MaskedPhoneNumberLog.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-4/MaskedPhoneNumberLog.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/qchat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/qchat.png
--------------------------------------------------------------------------------
/images/Module-1/Module1_Empty_CW_metrics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-1/Module1_Empty_CW_metrics.png
--------------------------------------------------------------------------------
/images/Module-1/Module1_Empty_ServiceMap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-1/Module1_Empty_ServiceMap.png
--------------------------------------------------------------------------------
/images/Module-4/UpdatedPerOrgStandards.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-4/UpdatedPerOrgStandards.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/getTest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/getTest.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/postTest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/postTest.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/BuilderID.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/BuilderID.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/kiro-login.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/kiro-login.png
--------------------------------------------------------------------------------
/images/Module-1/Module1_CW_UnstructuredLogs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-1/Module1_CW_UnstructuredLogs.png
--------------------------------------------------------------------------------
/images/Module-4/module4-org-standard-update.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-4/module4-org-standard-update.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/OpenTerminal.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/OpenTerminal.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/vscodeserver.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/vscodeserver.png
--------------------------------------------------------------------------------
/images/Module-4/DefaultMaskingWithoutInstructions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Module-4/DefaultMaskingWithoutInstructions.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/vscodecloudformation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/vscodecloudformation.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/openNewTerminalVScode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/openNewTerminalVScode.png
--------------------------------------------------------------------------------
/images/Prepare-Your-Environment/Test_your_Deploy-GeneralImmersionDay.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/observability-driven-development/HEAD/images/Prepare-Your-Environment/Test_your_Deploy-GeneralImmersionDay.png
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 |
--------------------------------------------------------------------------------
/loadgen.yaml:
--------------------------------------------------------------------------------
1 | config:
2 |   target: "{{ $processEnvironment.URL }}"
3 |   phases:
4 |     - duration: 3000
5 |       arrivalRate: 3
6 |       name: sustained load
7 | scenarios:
8 |   - name: "insert ride entries"
9 |     flow:
10 |       - post:
11 |           url: "/Prod/items"
12 |           json:
13 |             id: "{{ $randomString() }}"
14 |             name: "{{ $randomString() }}"
15 |             milesTraveled: "{{$randomNumber(1,50)}}"
16 |             totalTravelTime: "{{$randomNumber(1,50)}}"
17 |             price: "{{$randomNumber(5,100)}}"
18 |             tenantId: "{{$randomNumber(1001,1003)}}"
19 |             timestamp: 0
20 |           capture:
21 |             json: "$.id"
22 |             as: "id"
23 |       - get:
24 |           url: "/Prod/items/{{ id }}"
25 |           capture:
26 |             json: "$.name"
27 |             as: "name"
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of
4 | this software and associated documentation files (the "Software"), to deal in
5 | the Software without restriction, including without limitation the rights to
6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
7 | the Software, and to permit persons to whom the Software is furnished to do so.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
15 |
16 |
--------------------------------------------------------------------------------
/LICENSE-SAMPLECODE:
--------------------------------------------------------------------------------
1 | Copyright 2022. Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this
4 | software and associated documentation files (the "Software"), to deal in the Software
5 | without restriction, including without limitation the rights to use, copy, modify,
6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
7 | permit persons to whom the Software is furnished to do so.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
15 |
16 |
17 |
--------------------------------------------------------------------------------
/src/handlers/getByIdFunction/index.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import boto3
4 |
5 | # Initialize the DynamoDB client
6 | dynamodb = boto3.resource('dynamodb')
7 | tableName = os.environ['SAMPLE_TABLE']
8 | table = dynamodb.Table(tableName)
9 |
10 | def lambda_handler(event, context):
11 |     try:
12 |         print (event)
13 |         # Parse input from the event
14 |         key = event['pathParameters']['id']
15 |         print (key)
16 |         # Retrieve the item from DynamoDB
17 |         response = table.get_item(
18 |             Key={
19 |                 'id': key
20 |             }
21 |         )
22 |
23 |         # Check if the item was found
24 |         if 'Item' in response:
25 |             item = response['Item']
26 |             print (item)
27 |             return {
28 |                 'statusCode': 200,
29 |                 'body': json.dumps(item, default=str)  # default=str serializes DynamoDB Decimal numbers
30 |             }
31 |         else:
32 |             return {
33 |                 'statusCode': 404,
34 |                 'body': 'Item not found'
35 |             }
36 |     except Exception as e:
37 |         return {
38 |             'statusCode': 500,
39 |             'body': f'Error: {str(e)}'
40 |         }
--------------------------------------------------------------------------------
/src/handlers/putItemFunction/index.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 | import boto3
4 | import logging
5 | from datetime import datetime
6 | import base64
7 |
8 | # Initialize the DynamoDB client
9 | dynamodb = boto3.resource('dynamodb')
10 | tableName = os.environ['SAMPLE_TABLE']
11 | table = dynamodb.Table(tableName)
12 |
13 | def lambda_handler(event, context):
14 |     try:
15 |         if event.get('isBase64Encoded'):
16 |             record = base64.b64decode(event["body"]).decode('utf-8')
17 |         else:
18 |             record = event["body"]
19 |
20 |         data = json.loads(record)
21 |
22 |         id = data["id"]
23 |         name = data["name"]
24 |         milesTraveled = data["milesTraveled"]
25 |         totalTravelTime = data["totalTravelTime"]
26 |         price = data["price"]
27 |         tenantId = data["tenantId"]
28 |         timestamp = datetime.now().isoformat(timespec='seconds')
29 |
30 |         data = { "id": id, "name": name, "timestamp": timestamp, "milesTraveled": milesTraveled, "totalTravelTime": totalTravelTime, "price": price, "tenantId": tenantId }
31 |         # Write the item to DynamoDB
32 |         response = table.put_item(
33 |             Item=data
34 |         )
35 |         print ("Writing to DynamoDB")
36 |
37 |         return {
38 |             'statusCode': 200,
39 |             'body': 'Item added successfully'
40 |         }
41 |     except Exception as e:
42 |         return {
43 |             'statusCode': 500,
44 |             'body': f'Error: {str(e)}'
45 |         }
--------------------------------------------------------------------------------
/ollyver/Approach/process-overview.md:
--------------------------------------------------------------------------------
1 | # Ollyver Observability Agent Workflow Overview
2 |
3 | ## The Workflow:
4 | • **Agent Initialization** → **Gap Detection** → **Automated Remediation** → **Continuous Monitoring**
5 |
6 | ## Your Role:
7 | • **Launch development environment** (VSCode + Q Dev CLI)
8 | • **Interact with Ollyver** using natural language commands
9 | • **Review and approve** automated observability enhancements
10 | • **Add custom requirements** when needed for specific business needs
11 | • **Important**: Ollyver works alongside you - you maintain control while gaining automated assistance
12 |
13 | ## Ollyver's Role:
14 | ```
15 | ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
16 | │    Scan     │───▶│   Detect    │───▶│   Suggest   │───▶│  Implement  │
17 | │  Codebase   │    │    Gaps     │    │    Fixes    │    │  & Monitor  │
18 | └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
19 |        ▲                                                        │
20 |        │                                                        │
21 |        └────────────────────────────────────────────────────────┘
22 | ```
23 |
24 | ## Organizational Integration:
25 | • **Standards Enforcement**: Ollyver applies your organization's observability requirements automatically
26 | • **KPI Integration**: Business users define KPIs, Ollyver implements the metrics
27 | • **Day-2 Operations**: Automated instrumentation, logging, and monitoring setup
28 | • **Feedback Loop**: Ops team receives troubleshooting guides and learns from real-world usage
29 |
--------------------------------------------------------------------------------
/ollyver/Org-Standards/deployment-guide.md:
--------------------------------------------------------------------------------
1 | # AWS Deployment Guide
2 |
3 | ## Lambda Function Deployment
4 |
5 | ### Check for Existing Deployment Script
6 | 1. **Check if `code/workshop/updateLambda.sh` exists**
7 | 2. **If it exists**: Use the existing script for deployment
8 | 3. **If it does not exist**: Use manual AWS CLI deployment commands
9 |
10 | ### Using Existing Script (Preferred)
11 | ```bash
12 | # Navigate to workshop directory
13 | cd code/workshop
14 |
15 | # Execute the deployment script
16 | ./updateLambda.sh
17 | ```
18 |
19 | ### Manual Deployment (Fallback)
20 | If the script doesn't exist, use these commands:
21 | ```bash
22 | # For each Lambda function directory:
23 | cd [function-directory]
24 | zip index.zip index.py
25 | aws lambda update-function-code --function-name [function-name] --zip-file fileb://index.zip
26 | ```
27 |
28 | ## Pattern-Specific Deployment Steps
29 |
30 | ### X-Ray Tracing Pattern
31 | After deploying code changes, enable X-Ray tracing:
32 | ```bash
33 | # Enable X-Ray tracing on Lambda functions
34 | aws lambda update-function-configuration --function-name getByIdFunction --tracing-config Mode=Active
35 | aws lambda update-function-configuration --function-name putItemFunction --tracing-config Mode=Active
36 |
37 | # Update IAM role permissions
38 | aws iam attach-role-policy --role-name lambda-execution-role --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess
39 | ```
40 |
41 | ### Operational Dashboards Pattern
42 | Create a CloudWatch dashboard:
43 | ```bash
44 | # Create dashboard using AWS CLI
45 | aws cloudwatch put-dashboard --dashboard-name "ObservabilityDashboard" --dashboard-body file://dashboard-config.json
46 |
47 | # Or use the AWS Console to create the dashboard manually
48 | # Navigate to CloudWatch > Dashboards > Create Dashboard
49 | ```
50 |
51 | ## Post-Deployment Validation
52 | - Verify the function was updated in the AWS Console
53 | - Check CloudWatch logs for new observability data
54 | - Validate X-Ray traces if implemented (X-Ray Console)
55 | - Confirm custom metrics in CloudWatch Metrics
56 | - Test that dashboard widgets display data correctly
57 |
--------------------------------------------------------------------------------
/ollyver/.amazonq/cli-agents/ollyver.json:
--------------------------------------------------------------------------------
1 | {
2 |   "$schema": "https://raw.githubusercontent.com/aws/amazon-q-developer-cli/refs/heads/main/schemas/agent-v1.json",
3 |   "name": "ollyver",
4 |   "description": "AI-powered observability agent that detects gaps and automates instrumentation for AWS applications",
5 |   "prompt": "👋 **Welcome! I'm Ollyver, your observability companion.**\n\nI'm an AI agent that automates observability enhancement through a structured workflow: **Scan Codebase** → **Detect Gaps** → **Suggest Fixes** → **Implement & Monitor**\n\n**🔍 My Core Capabilities:**\n- **Automated Gap Detection**: Scan your application to identify observability gaps\n- **Standards Enforcement**: Apply organizational observability requirements automatically\n- **Pattern Implementation**: Add X-Ray tracing, structured logging, and custom metrics\n- **Continuous Monitoring**: Automatically enhance observability as you develop features\n- **Educational Guidance**: Show before/after examples with hands-on learning\n\n**Ready to transform your application from 'black box' to 'glass box'?** I can start by analyzing your codebase to identify opportunities for improvement.\n\nType 'yes' to begin the analysis, or ask me any questions about what I can do!",
6 |   "mcpServers": {
7 |     "aws-knowledge-mcp-server": {
8 |       "command": "uvx",
9 |       "args": [
10 |         "mcp-proxy",
11 |         "--transport",
12 |         "streamablehttp",
13 |         "https://knowledge-mcp.global.api.aws"
14 |       ]
15 |     },
16 |     "awslabs.cloudwatch-mcp-server": {
17 |       "command": "uvx",
18 |       "args": [
19 |         "awslabs.cloudwatch-mcp-server"
20 |       ],
21 |       "env": {
22 |         "FASTMCP_LOG_LEVEL": "ERROR",
23 |         "AWS_PROFILE": "default",
24 |         "AWS_REGION": "us-east-1"
25 |       },
26 |       "autoApprove": [],
27 |       "disabled": false
28 |     }
29 |   },
30 |   "tools": [
31 |     "*"
32 |   ],
33 |   "toolAliases": {},
34 |   "allowedTools": [
35 |     "fs_read",
36 |     "fs_write",
37 |     "execute_bash",
38 |     "use_aws",
39 |     "@aws-knowledge-mcp-server/*",
40 |     "@awslabs.cloudwatch-mcp-server/*"
41 |   ],
42 |   "resources": [
43 |     "file://ollyver/**/*.md",
44 |     "file://ollyver/ollyver-agent-instructions.md"
45 |   ],
46 |   "toolsSettings": {},
47 |   "useLegacyMcpJson": true
48 | }
49 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Logs
2 | logs
3 | *.log
4 | npm-debug.log*
5 | yarn-debug.log*
6 | yarn-error.log*
7 | lerna-debug.log*
8 |
9 | # Diagnostic reports (https://nodejs.org/api/report.html)
10 | report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
11 |
12 | # Runtime data
13 | pids
14 | *.pid
15 | *.seed
16 | *.pid.lock
17 |
18 | # Directory for instrumented libs generated by jscoverage/JSCover
19 | lib-cov
20 |
21 | # Coverage directory used by tools like istanbul
22 | coverage
23 | *.lcov
24 |
25 | # nyc test coverage
26 | .nyc_output
27 |
28 | # Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
29 | .grunt
30 |
31 | # Bower dependency directory (https://bower.io/)
32 | bower_components
33 |
34 | # node-waf configuration
35 | .lock-wscript
36 |
37 | # Compiled binary addons (https://nodejs.org/api/addons.html)
38 | build/Release
39 |
40 | # Dependency directories
41 | node_modules/
42 | jspm_packages/
43 |
44 | # TypeScript v1 declaration files
45 | typings/
46 |
47 | # TypeScript cache
48 | *.tsbuildinfo
49 |
50 | # Optional npm cache directory
51 | .npm
52 |
53 | # Optional eslint cache
54 | .eslintcache
55 |
56 | # Microbundle cache
57 | .rpt2_cache/
58 | .rts2_cache_cjs/
59 | .rts2_cache_es/
60 | .rts2_cache_umd/
61 |
62 | # Optional REPL history
63 | .node_repl_history
64 |
65 | # Output of 'npm pack'
66 | *.tgz
67 |
68 | # Yarn Integrity file
69 | .yarn-integrity
70 |
71 | # dotenv environment variables file
72 | .env
73 | .env.test
74 |
75 | # parcel-bundler cache (https://parceljs.org/)
76 | .cache
77 |
78 | # Next.js build output
79 | .next
80 |
81 | # Nuxt.js build / generate output
82 | .nuxt
83 | dist
84 |
85 | # Gatsby files
86 | .cache/
87 | # Comment in the public line in if your project uses Gatsby and *not* Next.js
88 | # https://nextjs.org/blog/next-9-1#public-directory-support
89 | # public
90 |
91 | # vuepress build output
92 | .vuepress/dist
93 |
94 | # Serverless directories
95 | .serverless/
96 |
97 | # FuseBox cache
98 | .fusebox/
99 |
100 | # DynamoDB Local files
101 | .dynamodb/
102 |
103 | # TernJS port file
104 | .tern-port
105 |
106 |
107 | # Misc
108 | package-lock.json
109 | .DS_Store
110 | __pycache__
111 | template-export.yml
112 | packaged.yaml
113 | samconfig.toml
114 | .aws-sam/
115 | .aws-sam/build
116 | .idea
117 |
118 | # Migration documentation (internal use only)
119 | MIGRATION_SUMMARY.md
120 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 |
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 |
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 |
9 |
10 | ## Reporting Bugs/Feature Requests
11 |
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 |
14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 |
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 |
22 |
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 |
26 | 1. You are working against the latest source on the *main* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 |
30 | To send us a pull request, please:
31 |
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |
--------------------------------------------------------------------------------
/ollyver/Org-Standards/observability-requirements.md:
--------------------------------------------------------------------------------
1 | # Organizational Observability Requirements
2 |
3 | ## Core Requirements (Workshop Focus)
4 |
5 | ### 1. Distributed Tracing
6 | - **Requirement**: X-Ray tracing on all Lambda functions
7 | - **Implementation**: `@xray_recorder.capture()` decorator
8 | - **Value**: Request flow visualization across services
9 | - **Compliance**: Required for all serverless applications
10 |
11 | ### 2. Structured Logging
12 | - **Requirement**: JSON format with correlation IDs
13 | - **Implementation**: Replace `print()` calls with a structured logger
14 | - **Value**: Searchable, correlatable logs for debugging
15 | - **Compliance**: Required for production workloads
16 |
17 | ### 3. Business Metrics
18 | - **Requirement**: Custom CloudWatch metrics for key operations
19 | - **Implementation**: `cloudwatch.put_metric_data()` calls
20 | - **Value**: Business insight and proactive alerting
21 | - **Compliance**: Required for customer-facing services
22 |
23 | ### 4. Error Tracking
24 | - **Requirement**: Comprehensive error logging and metrics
25 | - **Implementation**: Exception handling with observability context
26 | - **Value**: Proactive issue detection and resolution
27 | - **Compliance**: Required for all production services
28 |
29 | ### 5. Operational Dashboards
30 | - **Requirement**: CloudWatch dashboard for key metrics
31 | - **Implementation**: Automated dashboard generation
32 | - **Value**: Real-time operational visibility
33 | - **Compliance**: Required for production monitoring
34 |
35 | ## Success Criteria
36 |
37 | ### Technical Requirements
38 | - Complete request traceability across all services
39 | - Structured log correlation for debugging workflows
40 | - Business metric visibility for operational decisions
41 | - Proactive error detection and alerting capabilities
42 | - Operational dashboard availability for real-time monitoring
43 |
44 | ### Organizational Standards
45 | - Consistent observability patterns across all teams
46 | - Standardized dashboard layouts and metric naming
47 | - Common alerting thresholds and escalation procedures
48 | - Shared troubleshooting procedures and runbooks
49 | - Regular observability reviews and improvements
50 |
51 | ## Implementation Priorities
52 |
53 | ### Phase 1: Foundation (Required)
54 | 1. **Pattern Recognition**: Identify where observability is needed
55 | 2. **Automated Implementation**: Use tooling for consistent application
56 | 3. **Value Demonstration**: Show immediate benefits to stakeholders
57 | 4. **Team Training**: Ensure teams understand patterns and tools
58 |
59 | ### Phase 2: Enhancement (Recommended)
60 | 1. **Advanced Patterns**: Custom metrics and specialized monitoring
61 | 2. **Integration**: Connect observability to existing tools and processes
62 | 3. **Optimization**: Fine-tune costs and performance
63 | 4. **Scaling**: Apply patterns across larger application portfolios
64 |
65 | ### Phase 3: Maturity (Advanced)
66 | 1. **Custom Requirements**: Organization-specific observability needs
67 | 2. **Automation**: Infrastructure-as-code for observability
68 | 3. **Analytics**: Advanced analysis of observability data
69 | 4. **Continuous Improvement**: Regular review and enhancement cycles
70 |
71 | ## Compliance & Governance
72 |
73 | ### Mandatory Standards
74 | - All production services must implement core requirements (1-5)
75 | - Observability patterns must be applied consistently across teams
76 | - Cost targets must be monitored and maintained within limits
77 | - Regular audits ensure compliance with organizational standards
78 |
79 | ### Recommended Practices
80 | - Use infrastructure-as-code for observability configuration
81 | - Implement observability early in the development lifecycle
82 | - Provide regular training on observability tools and techniques
83 | - Share best practices and lessons learned across teams
84 |
--------------------------------------------------------------------------------
/docs/module-2.md:
--------------------------------------------------------------------------------
1 |
2 | ## Solution Overview
3 |
4 | Now that you've identified the observability gaps and understand your organizational requirements, it's time to meet **Ollyver**, your AI-powered observability agent that automatically detects and fixes exactly these gaps.
5 |
6 | ## Ollyver Architecture
7 |
8 | **Ollyver** integrates into your development workflow as an AI agent accessible through [Kiro CLI](https://kiro.dev/). The architecture includes:
9 |
10 | - **[Kiro CLI](https://kiro.dev/)** - A generative AI-powered conversational assistant for software development. Kiro CLI lets you [build your own custom agents](https://kiro.dev/docs/cli/custom-agents/), each with its own development context. We have already done this for you by creating Ollyver, your observability expert agent.
11 | - **Ollyver Agent** - Provides AI-powered observability automation aligned with your organization's observability goals and best practices.
- **AWS Integration** - Direct deployment and verification capabilities
- **Organizational Standards** - Customizable observability requirements

## How it Works

Ollyver operates through a structured 4-step workflow:

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Scan     │───▶│   Detect    │───▶│   Suggest   │───▶│  Implement  │
│  Codebase   │    │    Gaps     │    │    Fixes    │    │  & Monitor  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
```

**Your Role**: Launch the environment, interact with Ollyver, and review and approve enhancements.

**Ollyver's Role**: Automated analysis, gap detection, pattern implementation, and AWS deployment.

## Activate the Agent

In the new terminal window that you opened, enter the following command to activate the Ollyver agent:

```bash
kiro-cli chat --agent ollyver
```

## Test Agent Knowledge

Let's verify that Ollyver understands your environment with 3 focused questions.

**Important**: These are information-only questions. We'll do the actual analysis in the next module.

### **Question 1: Organizational Standards**
```text
Tell me about our organization's observability requirements without analyzing any code.
```

**Expected Response**: Ollyver should reference the 5 core requirements:
- Distributed tracing (X-Ray on Lambda functions)
- Structured logging (JSON with correlation IDs)
- Business metrics (Custom CloudWatch metrics)
- Error tracking (Comprehensive error logging)
- Operational dashboards (CloudWatch dashboards)

### **Question 2: Implementation Patterns**
```text
What observability patterns do you know how to implement?
```

**Expected Response**: Ollyver should describe its pattern knowledge. If the agent is built correctly, it should have implementation patterns for every requirement the organization specifies:
- X-Ray distributed tracing with decorators
- Structured logging with correlation IDs
- Custom CloudWatch metrics for business operations
- Error tracking and alerting
- Operational dashboards

### **Question 3: Workflow Overview**
```text
Explain your workflow process but don't start any analysis yet.
```

**Expected Response**: Ollyver should describe its 3-phase approach:
- **Phase 1**: Requirements-based gap analysis
- **Phase 2**: Pattern selection and prioritization
- **Phase 3**: Pattern implementation with approval gates

> **Note**: If Ollyver offers to start the analysis or begins scanning code, politely decline and say "Not yet - we'll do that in the next module." This keeps the workshop flow intact.


--------------------------------------------------------------------------------
/ollyver/Org-Standards/core-patterns.md:
--------------------------------------------------------------------------------
# Core Observability Patterns

## Pattern 1: Automatic X-Ray Instrumentation
**Trigger**: Lambda function without tracing
**Action**: Add X-Ray tracing configuration and SDK imports
**Demonstrates**: Distributed tracing fundamentals

```python
# Before (manual)
def lambda_handler(event, context):
    return process_request(event)

# After (Ollyver automated)
from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
patch_all()

@xray_recorder.capture('lambda_handler')
def lambda_handler(event, context):
    return process_request(event)
```

## Pattern 2: Structured Logging Detection
**Trigger**: Print statements or basic logging
**Action**: Implement structured JSON logging with correlation IDs
**Demonstrates**: Log analysis and correlation

```python
# Before (manual)
print(f"Processing user {user_id}")

# After (Ollyver automated)
import json
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger()
logger.setLevel(logging.INFO)

correlation_id = str(uuid.uuid4())
logger.info(json.dumps({
    "message": "Processing user",
    "user_id": user_id,
    "correlation_id": correlation_id,
    "timestamp": datetime.now(timezone.utc).isoformat()
}))
```

## Pattern 3: Business Metrics Identification
**Trigger**: Business logic without metrics
**Action**: Add custom CloudWatch metrics for key operations in the `RideShare/Business` namespace
**Demonstrates**: Business observability vs technical metrics

```python
# Before (manual)
def process_order(order):
    result = validate_order(order)
    return result

# After (Ollyver automated)
import boto3
cloudwatch = boto3.client('cloudwatch')

def process_order(order):
    result = validate_order(order)

    # Ollyver adds business metrics
    cloudwatch.put_metric_data(
        Namespace='RideShare/Business',
        MetricData=[{
            'MetricName': 'OrdersProcessed',
            'Value': 1,
            'Unit': 'Count',
            'Dimensions': [{'Name': 'Status', 'Value': result.status}]
        }]
    )
    return result
```

## Pattern 4: Multi-Tenant Observability
**Trigger**: Multi-tenant code patterns detected
**Action**: Add tenant-specific logging and metrics
**Demonstrates**: Tenant isolation in observability

```python
# Before (manual)
def handle_request(event):
    return process_tenant_data(event['data'])

# After (Ollyver automated)
# Assumes xray_recorder, logger, json and cloudwatch are set up as in the
# patterns above, and that correlation_id and extract_tenant_id() are
# provided by the application.
def handle_request(event):
    tenant_id = extract_tenant_id(event)

    with xray_recorder.in_subsegment(f'tenant_{tenant_id}'):
        logger.info(json.dumps({
            "tenant_id": tenant_id,
            "operation": "process_data",
            "correlation_id": correlation_id
        }))

        result = process_tenant_data(event['data'])

        cloudwatch.put_metric_data(
            Namespace='MultiTenant/Usage',
            MetricData=[{
                'MetricName': 'RequestsPerTenant',
                'Value': 1,
                'Unit': 'Count',
                'Dimensions': [{'Name': 'TenantId', 'Value': tenant_id}]
            }]
        )
        return result
```

## Pattern 5: Error Handling & Alerting
**Trigger**: Exception handling without observability
**Action**: Add error tracking and alerting
**Demonstrates**: Proactive error monitoring

```python
# Before (manual)
try:
    result = risky_operation()
except Exception as e:
    return {"error": str(e)}

# After (Ollyver automated)
import json
import logging
import traceback
from datetime import datetime, timezone

import boto3
from aws_xray_sdk.core import xray_recorder

logger = logging.getLogger()
cloudwatch = boto3.client('cloudwatch')

# correlation_id is assumed to be defined earlier (see Pattern 2)
try:
    result = risky_operation()
except Exception as e:
    # Ollyver adds comprehensive error tracking
    error_details = {
        "error_type": type(e).__name__,
        "error_message": str(e),
        "correlation_id": correlation_id,
        "timestamp": datetime.now(timezone.utc).isoformat()
    }

    logger.error(json.dumps(error_details))

    cloudwatch.put_metric_data(
        Namespace='ErrorTracking',
        MetricData=[{
            'MetricName': 'Errors',
            'Value': 1,
            'Unit': 'Count',
            'Dimensions': [{'Name': 'ErrorType', 'Value': type(e).__name__}]
        }]
    )

    # Record the exception on the active X-Ray subsegment, if one is open.
    # add_exception() takes both the exception and its stack trace.
    subsegment = xray_recorder.current_subsegment()
    if subsegment is not None:
        subsegment.add_exception(e, traceback.extract_tb(e.__traceback__))
    return {"error": "Internal server error", "correlation_id": correlation_id}
```

## Implementation Methodology
1. **Show the Gap**: Identify what's missing
2. **Explain the Pattern**: Why this observability is needed
3. **Automate the Fix**: Implement the solution
4. **Demonstrate Value**: Show the resulting observability data
5. **Reinforce Understanding**: Explain how to apply this pattern elsewhere

--------------------------------------------------------------------------------
/docs/setup.md:
--------------------------------------------------------------------------------
# Setup Guide

This guide will help you set up your local development environment to work with Ollyver and deploy the sample application to AWS.

## Prerequisites

Before you begin, make sure you have the following:

- **AWS Account** with CLI configured
- **AWS CLI** - [Installation Guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
- **AWS SAM CLI** - [Installation Guide](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html)
- **Docker** - Required for SAM build
- **Python 3.11+** - For Lambda functions
- **Kiro CLI** - [Installation Guide](https://kiro.dev/)

## Step 1: Clone the Repository

```bash
git clone https://github.com/aws-samples/observability-driven-development.git
cd observability-driven-development
```

## Step 2: Configure AWS CLI

Ensure your AWS CLI is configured with credentials:

```bash
aws configure
```

You'll need:
- AWS Access Key ID
- AWS Secret Access Key
- Default region (e.g., `us-east-1`)
- Default output format (e.g., `json`)

## Step 3: Install Kiro CLI

Follow the [Kiro CLI installation guide](https://kiro.dev/) for your operating system.

After installation, authenticate using AWS Builder ID:

```bash
kiro-cli login --use-device-flow
```

Follow the prompts to authenticate:
1. Copy the URL from your terminal into a new browser window
2. Log in using AWS Builder ID
3. Return to the terminal to complete authentication

## Step 4: Deploy the Application

Deploy the serverless application using SAM:

```bash
# Navigate to cloudformation directory
cd cloudformation

# Build the application
sam build

# Deploy with guided prompts
sam deploy --guided
```

During deployment, you'll be prompted for:
- **Stack Name**: Use `observability-driven-development`
- **AWS Region**: Choose your preferred region (e.g., `us-east-1`)
- **Confirm changes before deploy**: Choose `Y` or `N`
- **Allow SAM CLI IAM role creation**: Choose `Y`
- **Save arguments to samconfig.toml**: Choose `Y` for future deployments

## Step 5: Test Your Deployment

After deployment completes, test the API endpoint:

### Get API Endpoint

```bash
export API_URL=$(aws cloudformation describe-stacks \
  --stack-name observability-driven-development \
  --query 'Stacks[0].Outputs[?OutputKey==`HttpApiUrl`].OutputValue' \
  --output text)

echo $API_URL
```

### Test POST Request

```bash
curl -X POST ${API_URL}items \
  -d '{
    "id": "1a2b3c4d",
    "name": "first last",
    "milesTraveled": "12",
    "totalTravelTime": "600",
    "price": "13.32",
    "tenantId": "1001"
  }' -i
```

Expected response: `200 OK` with message "Item added successfully"

### Test GET Request

```bash
curl ${API_URL}items/1a2b3c4d
```

Expected response: JSON object with the ride information you just posted.

## Step 6: Set Up the Ollyver Agent

Navigate to the project root and activate the Ollyver agent:

```bash
cd ..
kiro-cli chat --agent ollyver
```

You should see Ollyver's welcome message. The agent is now ready to help you add observability to your application.

## Step 7: Generate Load (Optional)

To simulate realistic traffic for observability testing, you can generate synthetic load:

```bash
# Install artillery if not already installed
npm install -g artillery

# Set your API URL
export URL=$(aws cloudformation describe-stacks \
  --stack-name observability-driven-development \
  --query 'Stacks[0].Outputs[?OutputKey==`HttpApiUrl`].OutputValue' \
  --output text)

# Run load generator (runs for 50 minutes)
artillery run loadgen.yaml
```

> **Note**: Keep the terminal window open while the load generator runs. Open a new terminal for the workshop modules.

## Troubleshooting

### SAM Build Fails
- Ensure Docker is running
- Check that Python 3.11+ is installed
- Verify you're in the `cloudformation` directory

### Kiro CLI Authentication Issues
- Ensure you have an active internet connection
- Try logging out and back in: `kiro-cli logout` then `kiro-cli login --use-device-flow`

### API Endpoint Not Found
- Verify the CloudFormation stack deployed successfully
- Check the stack outputs in the AWS Console or via the CLI

## What's Next

Now that your environment is set up and the application is deployed, proceed to [Module 1: Explore Your Application](module-1.md) to begin discovering observability gaps.

--------------------------------------------------------------------------------
/ollyver/Approach/session-management.md:
--------------------------------------------------------------------------------
# Session Management & State Tracking

## Session Continuity Detection

### State File Detection Process
1. Look for `ollyver-state.md` in project root
2. Check for observability instrumentation in existing code
3. Detect new files or functions added since last session
4. Update gap analysis if architecture has changed

### Returning User Welcome Template
```markdown
**🔍 Ollyver here! I can see you have an existing observability project in progress.**

Based on your ollyver-state.md, here's your current status:
- **Application**: [application-name]
- **Architecture Detected**: [Lambda/API Gateway/DynamoDB/etc.]
- **Observability Coverage**: [X% complete]
- **Last Enhancement**: [Last completed pattern]
- **Remaining Gaps**: [Number] observability gaps detected

**Current Status**:
✅ **Completed**: [List of implemented patterns]
🔧 **In Progress**: [Current pattern being worked on]
❌ **Pending**: [List of remaining gaps]

**What would you like to work on today?**

A) Continue where you left off ([Next pattern description])
B) Review implemented observability ([Show completed enhancements])
C) Add new custom requirement ([Custom observability needs])
D) Run fresh analysis ([Re-scan for new gaps])

[Answer]:
```

## State Tracking System

### Progress Tracking Rules
- **Purpose**: Track observability pattern implementation progress
- **Location**: `ollyver-state.md` in project root
- **Update Timing**: Mark patterns [x] immediately after completing implementation
- **Same Interaction Rule**: All progress updates must happen in the SAME interaction where work is completed

### Mandatory Update Process
1. **Pattern Completion**: Mark checkbox [x] in ollyver-state.md
2. **Status Update**: Update "Current Status" section
3. **Audit Logging**: Log implementation in `ollyver-docs/ollyver-audit.md`
4. **Never Skip**: Never end interaction without updating progress

### State File Structure Template
```markdown
# Ollyver Observability State

## Application Overview
- **Name**: [Application Name]
- **Architecture**: [Detected Components]
- **Last Updated**: [Timestamp]

## Observability Patterns Checklist
- [ ] X-Ray Distributed Tracing
- [ ] Structured Logging with Correlation
- [ ] Custom Business Metrics
- [ ] Error Tracking and Alerting
- [ ] Operational Dashboards

## Current Status
**Coverage**: [X]% complete
**Last Pattern**: [Pattern Name]
**Next Priority**: [Next Pattern]

## Implementation History
- [Timestamp]: [Pattern] - [Status]
```

## Continuity Scenarios

### Mid-Pattern Implementation
```markdown
**🔧 I see you were implementing X-Ray tracing on your Lambda functions.**

Progress: 2 of 4 functions completed
- ✅ getById function - X-Ray enabled
- ✅ putItem function - X-Ray enabled
- ❌ deleteItem function - Pending
- ❌ listItems function - Pending

Continue with deleteItem function?
```

### Between Patterns
```markdown
**✅ Great! X-Ray tracing is now complete across all functions.**

Next recommended pattern: Structured Logging
- Replace 3 print() statements with JSON logging
- Add correlation IDs for request tracing
- Estimated time: 5 minutes

Ready to enhance your logging?
```

### New Code Detected
```markdown
**🆕 I detected new code since our last session!**

New additions found:
- 1 new Lambda function (processOrder)
- 2 new API endpoints
- Missing observability on new components

Would you like me to:
A) Apply existing patterns to new code
B) Analyze new requirements first
```

## Audit Logging Requirements

### Session Log Entry Format
```markdown
## Session [Timestamp]
- **Return Type**: [New/Continuing]
- **Current State**: [Coverage %]
- **User Choice**: [Selected option]
- **New Gaps**: [Any detected]
- **Progress Made**: [Patterns completed]
- **Next Session**: [Recommended focus]
```

### Audit Trail Maintenance
- Log every session start/end
- Track pattern implementation progress
- Record user decisions and preferences
- Note any architecture changes detected
- Maintain implementation history for reference

## Session Instructions for Agent

### Always Execute on Session Start
1. Read `ollyver-state.md` first when detecting existing project
2. Scan codebase for changes since last session
3. Parse observability coverage from state file
4. Show specific next patterns rather than generic descriptions
5. Adapt options based on current coverage
6. Log continuity prompt in `ollyver-docs/ollyver-audit.md` with timestamp

### State Management Rules
- Update state file immediately after any pattern completion
- Never proceed to next pattern without marking previous as [x]
- Always update "Current Status" section after progress
- Maintain audit trail for all implementations
- Validate project structure and repair if needed

--------------------------------------------------------------------------------
/docs/module-4.md:
--------------------------------------------------------------------------------

In this module, you will see how to define changing requirements for Ollyver to implement.

Our application has a new requirement to capture phone numbers as part of ride requests. Simply ask Ollyver to update your code.
Copy and paste the following into your agent chat window:

```text
Navigate to **code/putItemFunction/index.py** and add the line to capture user phone number under tenantId

- Add line where phoneNumber is extracted from request
- Update line where phoneNumber is added to the json object to be persisted.
```

Next, we will guide Ollyver to capture this in our logs. On the surface, capturing a phone number in logs and metrics seems harmless.

Navigate to the terminal and instruct Ollyver to update logging and metrics to capture the phone number:

```text
Update logging and metrics to capture phone number.
```

Ollyver updates your code with the new requirement and adds logging according to the standards. The terminal also shows a description, with examples, of what Ollyver implemented. Your response may vary, but it should look similar to the following.

![Screenshot of Ollyver adding phone number](../images/Module-4/PhoneNumberLog.png)

While coding assistants can deliver code that meets general standards and best practices, organizations should define and refine standards for their specific needs. Logging non-identifying tokens instead of personal data, and sanitizing or redacting sensitive information, are just two examples.

For the ride share application, we have decided that capturing only the area code is preferable: it still allows some geographical analysis while keeping full phone numbers, which are personally identifiable information (PII), out of logs and metrics.

## Update Org Requirements

Navigate to **ollyver/Org-Standards/observability-requirements.md** in your VS Code Server file explorer and update the **Implementation** entries for Structured Logging and Business Metrics.

![Screenshot showing file location](../images/Module-4/module4-org-standard-update.png)

Append the following instruction to the end of line 13 (the Structured Logging **Implementation** entry) and line 19 (the Business Metrics **Implementation** entry). **Then save the file**.

```text
For any sensitive PII data, only capture masked or redacted version of the data. For phone numbers, capture only the area code. For user names, capture only the first initial and the last initial. For date of birth, capture only the month and year.
```

The file should look something like this.

```markdown
### 2. Structured Logging
- **Requirement**: JSON format with correlation IDs
- **Implementation**: Replace print() with structured logger. For any sensitive PII data, only capture masked or redacted version of the data. For phone numbers, capture only the area code. For user names, capture only the first initial and the last initial. For date of birth, capture only the month and year.
- **Value**: Searchable, correlatable logs for debugging
- **Compliance**: Required for production workloads

### 3. Business Metrics
- **Requirement**: Custom CloudWatch metrics for key operations
- **Implementation**: `cloudwatch.put_metric_data()` calls. For any sensitive PII data, only capture masked or redacted version of the data. For phone numbers, capture only the area code. For user names, capture only the first initial and the last initial. For date of birth, capture only the month and year.
- **Value**: Business insight and proactive alerting
- **Compliance**: Required for customer-facing services
```

> **Note**: These instructions were intentionally left out at the beginning of the workshop to demonstrate Ollyver's default behavior. In a real-world scenario, these instructions would be predefined in the organizational standards and proactively applied to all feature development.

We will now direct Ollyver to honor these organizational standards.
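
The three masking rules in that instruction are simple string transformations. The following is a hypothetical Python sketch of helpers that apply them; it is not Ollyver's generated code. It assumes North American phone numbers (10 digits, optionally prefixed with country code 1), names written as whitespace-separated words, and ISO 8601 (`YYYY-MM-DD`) dates of birth. The names `mask_phone`, `mask_name`, and `mask_dob` are illustrative. A plain `phone[:3]` slice returns `+1 ` for a number such as `+1 206 555 0100`, so the sketch strips non-digit characters before taking the area code, and it falls back to `"unknown"` for numbers it cannot parse.

```python
import re
from datetime import date


def mask_phone(phone: str) -> str:
    """Return only the area code, e.g. '206' for '(206) 555-0100'.

    Assumes a North American number: 10 digits, optionally prefixed
    with the country code 1. Anything else is reported as 'unknown'.
    """
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, parentheses, '+'
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the country code
    if len(digits) != 10:
        return "unknown"
    return digits[:3]


def mask_name(full_name: str) -> str:
    """Return first and last initials, e.g. 'F.L.' for 'first last'."""
    parts = full_name.split()
    if not parts:
        return "unknown"
    if len(parts) == 1:
        return f"{parts[0][0].upper()}."
    return f"{parts[0][0].upper()}.{parts[-1][0].upper()}."


def mask_dob(dob: str) -> str:
    """Return only year and month, e.g. '1990-05' for '1990-05-17'."""
    return date.fromisoformat(dob).strftime("%Y-%m")
```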
Navigate to the terminal and instruct Ollyver to follow the organizational logging standards:

```text
Update to follow organizational logging standards
```

After Ollyver finishes processing, open **code/putItemFunction/index.py** to see the new masking logic. The `masked_phone` line below shows that Ollyver has updated its logic to extract only the area code (the first 3 digits), as desired.

```python
id = data["id"]
name = data["name"]
milesTraveled = data["milesTraveled"]
totalTravelTime = data["totalTravelTime"]
price = data["price"]
tenantId = data["tenantId"]
phoneNumber = data.get("phoneNumber", "unknown")
# Mask phone number for logging - only capture area code per org standards
masked_phone = phoneNumber[:3] + "***" if phoneNumber != "unknown" and len(phoneNumber) >= 3 else "unknown"
timestamp = datetime.now().isoformat(timespec='seconds')
```

The terminal also shows a description, with examples, of what Ollyver implemented.

![Screenshot of Ollyver honoring org standards](../images/Module-4/MaskedPhoneNumberLog.png)

> **Note**: Optionally, you can deploy this change to AWS and check the logs for masked phone numbers. To do so, invoke the putItem API with a request body that includes a phone number, then retrieve the relevant log entry. Ollyver can help with each of these steps, but to save time, let's move on to the next module.

--------------------------------------------------------------------------------
/docs/introduction.md:
--------------------------------------------------------------------------------
# Introduction

## What is Observability?

**Observability** is your ability to understand what's happening inside your systems by examining their external outputs.
It's the difference between flying blind and having complete visibility into your application's health, performance, and user experience.

In modern distributed systems, observability is commonly described in terms of three fundamental pillars:

- **Logs**: Detailed records of events and behaviors within your applications
- **Metrics**: Quantitative measurements of system performance and business KPIs
- **Traces**: End-to-end tracking of requests as they flow through your services

## Why does it matter?

Without observability, you're flying blind. When issues occur, teams scramble to piece together what went wrong, leading to extended downtime, frustrated customers, and team burnout.

**With proper observability, you become proactive:**
- Detect issues before customers notice them
- Understand user behavior and business impact
- Reduce mean time to resolution (MTTR)
- Make data-driven optimization decisions

**The competitive advantage:** Organizations with mature observability don't just fix problems faster; they prevent many of them outright. This transforms engineering from a cost center into a strategic advantage that drives business growth.

## Problem: Traditional Observability Gap

Most organizations approach observability reactively, adding monitoring and instrumentation after applications are built, often through separate operations teams. This creates observability gaps and context loss between development intent and operational reality.

When developers hand off applications without built-in observability, operations teams must reverse-engineer business logic to add meaningful monitoring. This leads to incomplete coverage and missed opportunities for proactive issue detection.

**The shift-left approach** solves this by embedding observability into the development process from day one.
Developers instrument code as they write business logic, ensuring comprehensive coverage because they understand the critical paths and failure modes they're implementing.

However, shifting left traditionally creates a new problem: **developer burden**.

### Problem: Developer Burden

In practice, shifting left means more work for developers:

- Learning complex observability frameworks and SDKs
- Manually adding instrumentation to every function and service
- Maintaining consistency across different applications and teams
- Keeping up with evolving best practices and organizational standards

> **Warning**: This burden often leads to poor adoption and incomplete observability.

## The Solution: AI Agents for Effortless Observability

What if developers could get comprehensive observability without the burden? What if an AI agent could handle the complexity while developers focus on business logic?

**AI agents can transform observability from a burden into an advantage:**

- **Automatic detection**: AI analyzes code to identify observability gaps
- **Intelligent implementation**: AI applies proven patterns consistently
- **Minimal learning curve**: Developers don't need to become observability experts
- **Organizational consistency**: AI ensures standards compliance across all teams

## Meet Ollyver: Your Observability Agent

This workshop introduces **Ollyver**, a specialized AI agent for observability automation. Ollyver represents the future of developer-friendly observability.

### How Ollyver Benefits Your Team

**For Developers:**
- Focus on features, not instrumentation - Ollyver handles observability automatically
- Learn by watching - See best practices implemented in your own code
- Consistent results - Same high-quality observability across all applications

**For Operations:**
- Complete visibility - No more blind spots or missing instrumentation
- Faster incident resolution - Rich telemetry data for quick troubleshooting
- Proactive monitoring - Detect issues before they impact customers

**For Organizations:**
- Standardized observability - Consistent patterns across all teams and applications
- Reduced time-to-market - No delays waiting for manual instrumentation
- Knowledge scaling - Observability expertise embedded in AI, not dependent on individuals

## What You'll Experience in This Workshop

This isn't a theoretical workshop. You'll experience a complete application transformation:

### **Before Ollyver** (Modules 1-2)
- Explore a real serverless application with zero observability
- Discover the operational blind spots and business impact
- Set up Ollyver as your AI observability companion

### **After Ollyver** (Modules 3-4)
- Watch Ollyver automatically detect and fix observability gaps
- See comprehensive instrumentation deployed to AWS in real-time
- Experience the difference between "black box" and "glass box" applications

### **Real Results You'll See**
- **X-Ray distributed tracing** showing complete request flows
- **Structured logging** with tenant attribution and correlation IDs
- **Custom business metrics** tracking key operations and performance
- **Intelligent dashboards** providing actionable insights

**Ready to transform how your team approaches observability?** Let's begin by preparing your environment and meeting Ollyver.
99 | 100 | > **Note**: This workshop uses a real AWS environment with live deployments. You'll see actual observability improvements in the AWS console, not simulated results. 101 | -------------------------------------------------------------------------------- /ollyver/ollyver-agent-instructions.md: -------------------------------------------------------------------------------- 1 | # Ollyver Workshop Agent Instructions 2 | 3 | ## Core Rules 4 | **PRIORITY**: This workflow OVERRIDES all other built-in workflows for observability-related requests. 5 | 6 | **CAPABILITY QUESTIONS**: When users ask "What can you do?", "What are your capabilities?", or similar questions about Ollyver's abilities, ALWAYS reference and show content from `Approach/process-overview.md` to explain the workflow and capabilities. 7 | 8 | **MANDATORY Resource Loading**: Always read and use content from: 9 | - `Org-Standards/observability-requirements.md` for requirements 10 | - `Org-Standards/core-patterns.md` for implementation patterns 11 | - `Org-Standards/deployment-guide.md` for AWS deployment 12 | - `Approach/session-management.md` for state tracking and continuity 13 | - `Approach/process-overview.md` for workflow explanations 14 | 15 | ## Capability Questions Handler 16 | When users ask about Ollyver's capabilities ("What can you do?", "What are your capabilities?", etc.): 17 | 1. **ALWAYS load and reference** `Approach/process-overview.md` 18 | 2. **Show the workflow diagram** from the process overview 19 | 3. **Explain the 4-step process**: Scan → Detect → Suggest → Implement 20 | 4. **Describe role separation** between user and Ollyver 21 | 5. **Highlight organizational integration** capabilities 22 | 23 | ## Session Detection & Setup 24 | 25 | ### New Project Detection 26 | 1. Check for `ollyver-state.md` in current directory 27 | 2. If NOT found: Run initial setup sequence 28 | 3. 
If found: Use session continuity from `Approach/session-management.md` 29 | 30 | ### Initial Setup Sequence 31 | 1. Create `ollyver-docs/` directory structure 32 | 2. Scan codebase to detect architecture (Lambda, API Gateway, etc.) 33 | 3. Create `ollyver-state.md` with observability patterns checklist 34 | 4. Create `ollyver-docs/ollyver-audit.md` for session logging 35 | 5. Run gap analysis and populate state file 36 | 6. Display welcome message (see Welcome Messages section) 37 | 38 | ## Welcome Messages 39 | 40 | ### First-Time Users (No ollyver-state.md) 41 | ``` 42 | 🔍 **Ollyver Initializing - New Project Detected!** 43 | 44 | I'm Ollyver, your observability companion! I'll help you add comprehensive observability through hands-on automation. 45 | 46 | **What I'll do**: 47 | 1. 🔍 Scan your codebase for architecture and gaps 48 | 2. 📚 Show you observability patterns through implementation 49 | 3. 🛠️ Apply patterns while demonstrating how they work 50 | 4. ✅ Transform your application from 'black box' to 'glass box' 51 | 52 | **Ready to begin?** I'll analyze your codebase to identify opportunities. 53 | Type 'yes' to proceed or ask questions first! 54 | ``` 55 | 56 | ### Returning Users (ollyver-state.md exists) 57 | Use the template from `Approach/session-management.md` with current progress status. 58 | 59 | ## Three-Phase Workflow 60 | 61 | ### Phase 1: Requirements-Based Gap Analysis 62 | 1. Load requirements from `Org-Standards/observability-requirements.md` 63 | 2. Scan application components against each requirement 64 | 3. Generate compliance report showing specific gaps 65 | 4. Populate findings in `ollyver-state.md` 66 | 5. **Seek Approval**: "Analysis complete. Ready for Pattern Selection?" - WAIT for confirmation 67 | 68 | ### Phase 2: Requirements-Based Pattern Selection & Prioritization 69 | 1. Map detected gaps to implementation patterns from `Org-Standards/core-patterns.md` 70 | 2. Apply organizational priority order from requirements 71 | 3.
Present prioritized implementation plan with compliance impact 72 | 4. **User Pattern Selection**: "Which patterns would you like to implement? (Select by number or name)" - WAIT for selections 73 | 5. Store selected patterns in `ollyver-state.md` under "Selected Patterns for Implementation" section 74 | 6. **Seek Approval**: "Selected patterns confirmed. Ready for Implementation?" - WAIT for confirmation 75 | 76 | ### Phase 3: Pattern Implementation 77 | For each user-selected pattern: 78 | 1. **Seek Permission**: "Ready to implement [Pattern Name]?" - WAIT for approval 79 | 2. **Explain Pattern**: Reference educational content and organizational value 80 | 3. **Show Before**: Display current code state 81 | 4. **Implement Enhancement**: Apply pattern with automation 82 | 5. **Show After**: Display improved code with explanations 83 | 6. **Deployment Decision**: "Deploy to AWS or move to next pattern?" - WAIT for choice 84 | - If "Deploy": Use `Org-Standards/deployment-guide.md` instructions and demonstrate value 85 | - If "Next": Skip deployment and continue 86 | 7. **Update Progress**: Mark pattern [x] in BOTH "Selected Patterns" AND "All Patterns" sections of ollyver-state.md immediately 87 | 8. **Ask Next Action**: "Pattern [X] complete. Continue with next pattern?" - WAIT for decision 88 | 89 | ## Progress Tracking Rules 90 | 91 | **CRITICAL**: Every pattern completion MUST update `ollyver-state.md` checkboxes [x] in the SAME interaction where work is completed. 
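The checkbox rule above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that `ollyver-state.md` lists patterns as GitHub-style task items (`- [ ] Pattern Name`) under both the "All Patterns Detected" and "Selected Patterns for Implementation" headings. The helper name `mark_pattern_complete` is hypothetical, and Ollyver edits the state file itself rather than running this code.

```python
import re


def mark_pattern_complete(state_text: str, pattern_name: str) -> str:
    """Check off every unchecked task item named pattern_name, in every section.

    Hypothetical helper: assumes items are written as '- [ ] Pattern Name'
    and that the item text equals the pattern name exactly.
    """
    item = re.compile(
        r"^(?P<prefix>[ \t]*[-*] )\[ \] (?P<name>"
        + re.escape(pattern_name)
        + r")(?P<rest>[ \t]*)$",
        re.MULTILINE,
    )
    # Rewrite only the checkbox; keep indentation, the name and trailing spaces.
    return item.sub(r"\g<prefix>[x] \g<name>\g<rest>", state_text)


state = """## All Patterns Detected
- [ ] X-Ray Distributed Tracing
- [ ] Structured Logging

## Selected Patterns for Implementation
- [ ] X-Ray Distributed Tracing
"""

updated = mark_pattern_complete(state, "X-Ray Distributed Tracing")
print(updated)
```

Because a single substitution updates every matching item, both sections change in one step. This matches the requirement that the checkboxes be updated in the same interaction where the work is completed.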
92 | 93 | **State File Structure**: 94 | - **All Patterns Detected**: Complete list of all detected patterns with checkboxes 95 | - **Selected Patterns for Implementation**: User's chosen patterns for implementation with checkboxes 96 | - **Current Status**: Overall progress summary 97 | 98 | **Update Requirements**: 99 | - Mark completed patterns [x] in BOTH "Selected Patterns" AND "All Patterns" sections immediately after implementation 100 | - Update "Current Status" section after any progress 101 | - Log implementations in `ollyver-docs/ollyver-audit.md` 102 | - Never end interaction without progress updates 103 | 104 | ## Key Principles 105 | - Always analyze first, never skip to implementation 106 | - Use educational approach with before/after code examples 107 | - Explain observability value for each pattern 108 | - Ensure explicit approval before each pattern implementation 109 | - Focus on practical, reusable patterns 110 | - Maintain cost-conscious approach per organizational standards 111 | - Follow organizational requirements from Org-Standards folder 112 | 113 | ## File Naming Convention 114 | - State: `ollyver-state.md` 115 | - Analysis: `ollyver-docs/analysis/gap-analysis.md` 116 | - Patterns: `ollyver-docs/patterns/[pattern-name].md` 117 | - Dashboards: `ollyver-docs/dashboards/[service-name]-dashboard.json` 118 | - Validation: `ollyver-docs/validation/observability-validation.md` 119 | 120 | Use kebab-case for pattern names (e.g., "x-ray-tracing", "structured-logging"). 
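The naming convention can be expressed as a small Python helper. This is a sketch only. The function names `to_kebab_case` and `pattern_doc_path` are hypothetical. The slug rule (lowercase the name, then join its alphanumeric runs with single hyphens) is an assumption chosen so that it reproduces the two examples above.

```python
import re


def to_kebab_case(name: str) -> str:
    """Lowercase a display name and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def pattern_doc_path(pattern_name: str) -> str:
    """Return the pattern documentation path from the File Naming Convention."""
    return f"ollyver-docs/patterns/{to_kebab_case(pattern_name)}.md"


if __name__ == "__main__":
    print(pattern_doc_path("X-Ray Tracing"))       # ollyver-docs/patterns/x-ray-tracing.md
    print(pattern_doc_path("Structured Logging"))  # ollyver-docs/patterns/structured-logging.md
```

Under this rule, every run of non-alphanumeric characters collapses to one hyphen. As a result, "X-Ray Tracing" and "x ray tracing" map to the same file, which keeps a pattern's documentation from being duplicated under slightly different names.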
121 | 122 | ## Success Criteria 123 | - User understands each observability pattern 124 | - Code improvements implemented with explanations 125 | - Patterns demonstrated with before/after comparisons 126 | - User can apply patterns to future projects 127 | - Workshop maintains educational focus throughout 128 | - Organizational standards consistently applied 129 | -------------------------------------------------------------------------------- /cloudformation/application.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: Operations-Driven Development - Serverless Application with Observability 3 | Transform: 4 | - AWS::Serverless-2016-10-31 5 | 6 | Globals: 7 | Function: 8 | Timeout: 600 9 | MemorySize: 256 10 | Runtime: python3.11 11 | Environment: 12 | Variables: 13 | APP_NAME: 14 | Ref: SampleTable 15 | SAMPLE_TABLE: 16 | Ref: SampleTable 17 | SERVICE_NAME: item_service 18 | ENABLE_DEBUG: false 19 | Api: 20 | TracingEnabled: true 21 | 22 | Resources: 23 | 24 | HttpApi: 25 | Type: AWS::Serverless::HttpApi 26 | Properties: 27 | StageName: Prod 28 | 29 | getByIdFunction: 30 | Type: AWS::Serverless::Function 31 | Properties: 32 | FunctionName: getByIdFunction 33 | CodeUri: ../src/handlers/getByIdFunction/ 34 | Handler: index.lambda_handler 35 | Description: Get method to get one item by id from a DynamoDB table. 
36 | Layers: 37 | - !Sub arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:18 38 | Policies: 39 | - DynamoDBCrudPolicy: 40 | TableName: 41 | Ref: SampleTable 42 | - CloudWatchPutMetricPolicy: {} 43 | - CloudWatchLambdaInsightsExecutionRolePolicy 44 | - AWSXRayDaemonWriteAccess 45 | Events: 46 | ExplicitApi: 47 | Type: HttpApi 48 | Properties: 49 | ApiId: 50 | Ref: HttpApi 51 | Path: /items/{id} 52 | Method: GET 53 | 54 | putItemFunction: 55 | Type: AWS::Serverless::Function 56 | Properties: 57 | FunctionName: putItemFunction 58 | CodeUri: ../src/handlers/putItemFunction/ 59 | Handler: index.lambda_handler 60 | Description: Post method to add one item to a record in DynamoDB table. 61 | Layers: 62 | - !Sub arn:aws:lambda:${AWS::Region}:017000801446:layer:AWSLambdaPowertoolsPythonV2:18 63 | Policies: 64 | - DynamoDBCrudPolicy: 65 | TableName: 66 | Ref: SampleTable 67 | - CloudWatchPutMetricPolicy: {} 68 | - SNSPublishMessagePolicy: 69 | TopicName: 70 | Fn::Sub: ${NewItemTopic.TopicName} 71 | - CloudWatchLambdaInsightsExecutionRolePolicy 72 | - AWSXRayDaemonWriteAccess 73 | Environment: 74 | Variables: 75 | TOPIC_NAME: 76 | Ref: NewItemTopic 77 | Events: 78 | ExplicitApi: 79 | Type: HttpApi 80 | Properties: 81 | ApiId: 82 | Ref: HttpApi 83 | Path: /items 84 | Method: POST 85 | 86 | NewItemTopic: 87 | Type: AWS::SNS::Topic 88 | 89 | SampleTable: 90 | Type: AWS::Serverless::SimpleTable 91 | Properties: 92 | ProvisionedThroughput: 93 | ReadCapacityUnits: 10 94 | WriteCapacityUnits: 5 95 | PrimaryKey: 96 | Name: id 97 | Type: String 98 | 99 | DashboardSideBySide: 100 | Properties: 101 | DashboardBody: 102 | Fn::Sub: 103 | - "{\n \"widgets\": [\n {\n \"height\": 3,\n \"width\": 9,\n\ 104 | \ \"y\": 0,\n \"x\": 0,\n \"type\": \"metric\",\n \"\ 105 | properties\": {\n \"metrics\": [\n [\n \"AWS/Lambda\"\ 106 | ,\n \"Invocations\",\n \"FunctionName\",\n \ 107 | \ \"${putItemFunction}\",\n \"Resource\",\n \"\ 108 | ${putItemFunction}\",\n {\n 
\"color\": \"#1f77b4\"\ 109 | \n }\n ],\n [\n \".\",\n \ 110 | \ \"Errors\",\n \".\",\n \".\",\n \"\ 111 | .\",\n \".\",\n {\n \"color\": \"#d62728\"\ 112 | \n }\n ]\n ],\n \"view\": \"singleValue\"\ 113 | ,\n \"stacked\": false,\n \"region\": \"${AWS::Region}\",\n\ 114 | \ \"stat\": \"Sum\",\n \"period\": 60,\n \"legend\"\ 115 | : {\n \"position\": \"bottom\"\n },\n \"setPeriodToTimeRange\"\ 116 | : true,\n \"title\": \"putItemMetrics\"\n }\n },\n {\n\ 117 | \ \"height\": 3,\n \"width\": 9,\n \"y\": 0,\n \"x\"\ 118 | : 9,\n \"type\": \"metric\",\n \"properties\": {\n \"metrics\"\ 119 | : [\n [\n \"AWS/Lambda\",\n \"Invocations\"\ 120 | ,\n \"FunctionName\",\n \"${getByIdFunction}\",\n\ 121 | \ \"Resource\",\n \"${getByIdFunction}\"\n \ 122 | \ ],\n [\n \".\",\n \"Errors\",\n\ 123 | \ \".\",\n \".\",\n \".\",\n \ 124 | \ \".\",\n {\n \"color\": \"#d62728\"\n \ 125 | \ }\n ]\n ],\n \"view\": \"singleValue\",\n \ 126 | \ \"stacked\": false,\n \"region\": \"${AWS::Region}\",\n \ 127 | \ \"stat\": \"Sum\",\n \"period\": 60,\n \"legend\"\ 128 | : {\n \"position\": \"bottom\"\n },\n \"setPeriodToTimeRange\"\ 129 | : true,\n \"title\": \"getByIdMetrics\"\n }\n }\n ]\n}" 130 | - putItemFunction: 131 | Ref: putItemFunction 132 | getByIdFunction: 133 | Ref: getByIdFunction 134 | DashboardName: Operations-Dashboard 135 | Type: AWS::CloudWatch::Dashboard 136 | 137 | ApiAccessLogGroup: 138 | Type: AWS::Logs::LogGroup 139 | DependsOn: HttpApi 140 | Properties: 141 | LogGroupName: 142 | Fn::Sub: /aws/apigateway/${HttpApi} 143 | RetentionInDays: 7 144 | 145 | GetByIdLogGroup: 146 | Type: AWS::Logs::LogGroup 147 | DependsOn: getByIdFunction 148 | Properties: 149 | LogGroupName: 150 | Fn::Sub: /aws/lambda/${getByIdFunction} 151 | RetentionInDays: 7 152 | 153 | PutItemLogGroup: 154 | Type: AWS::Logs::LogGroup 155 | DependsOn: putItemFunction 156 | Properties: 157 | LogGroupName: 158 | Fn::Sub: /aws/lambda/${putItemFunction} 159 | RetentionInDays: 7 160 | 161 | Outputs: 162 | 
HttpApiUrl: 163 | Description: API Gateway endpoint URL for Prod stage 164 | Value: 165 | Fn::Sub: https://${HttpApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/ 166 | SampleTable: 167 | Value: 168 | Fn::GetAtt: 169 | - SampleTable 170 | - Arn 171 | Description: Sample Data Table ARN 172 | -------------------------------------------------------------------------------- /docs/module-3.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | Now it's time to experience Ollyver in action! You'll watch as Ollyver automatically transforms your application from the "black box" you explored in Module 1 into a fully observable system. 4 | 5 | 6 | 7 | ### 1. Start Ollyver Analysis 8 | 9 | In your VS Code Server, with Kiro CLI active, begin the observability transformation: 10 | 11 | ```text 12 | I'm ready to implement observability. 13 | ``` 14 | 15 | **Expected Response:** Ollyver will present the patterns in priority order and ask which ones you'd like to implement. 16 | 17 | ### 2. Select Patterns for Implementation 18 | 19 | For this workshop, we'll focus on the three most essential observability patterns. 20 | 21 | When Ollyver presents the pattern options, select the foundation patterns: 22 | 23 | ```text 24 | 1, 2, 3 25 | ``` 26 | 27 | This will implement: 28 | - **X-Ray Distributed Tracing** 29 | - **Structured Logging** 30 | - **Business Metrics** 31 | 32 | ### 3. Confirm Implementation 33 | 34 | Ollyver will confirm your selection and ask for final approval: 35 | 36 | ```text 37 | yes 38 | ``` 39 | 40 | ### 4. Deploy Each Pattern 41 | 42 | For each pattern implementation, Ollyver will ask whether to deploy to AWS. Always respond: 43 | 44 | ```text 45 | deploy 46 | ``` 47 | 48 | **Note:** Complete all three selected patterns. Answer "yes" when Ollyver asks whether to continue, and "deploy" when it asks about deployment.
49 | 50 | 51 | > **Note**: Pay close attention to how Ollyver analyzes your code and explains each implementation decision, including the specific changes it makes to your application. 52 | 53 | 54 | 55 | ## Pattern 2: Structured Logging 56 | 57 | **Before Implementation** 58 | Current unstructured logging: 59 | 60 | ```python 61 | # src/handlers/putItemFunction/index.py - BEFORE 62 | import json 63 | import boto3 64 | 65 | def lambda_handler(event, context): 66 | print("Writing to DynamoDB") 67 | # ... rest of function 68 | ``` 69 | 70 | **Ollyver's Structured Logging Implementation** 71 | 72 | Ollyver automatically transforms logging to structured JSON format: 73 | 74 | **After Implementation** 75 | 76 | ```python 77 | import json 78 | import logging 79 | import uuid 80 | from datetime import datetime, timezone 81 | from aws_xray_sdk.core import xray_recorder 82 | # Configure structured logger 83 | logger = logging.getLogger() 84 | logger.setLevel(logging.INFO) 85 | 86 | @xray_recorder.capture('putItemFunction') 87 | def lambda_handler(event, context): 88 | # Generate correlation ID 89 | correlation_id = str(uuid.uuid4()) 90 | 91 | # Extract tenant information from the HTTP API request body (a JSON string) 92 | tenant_id = json.loads(event.get('body') or '{}').get('tenantId', 'unknown') 93 | 94 | # Structured logging with correlation and tenant attribution 95 | logger.info(json.dumps({ 96 | 'timestamp': datetime.now(timezone.utc).isoformat(), 97 | 'correlation_id': correlation_id, 98 | 'tenant_id': tenant_id, 99 | 'operation': 'put_item', 100 | 'message': 'Writing to DynamoDB', 101 | 'request_id': context.aws_request_id 102 | })) 103 | ``` 104 | 105 | 106 | ## Verify Observability Improvements in AWS Console 107 | 108 | ### Test the Enhanced Application 109 | 110 | Generate traffic to see the new observability in action. **You can copy the commands below and paste them into your Ollyver agent session.** 111 | 112 | ```bash 113 | # Test with tenant attribution 114 | curl -X POST $(aws apigatewayv2 get-apis | jq '.Items[0].ApiEndpoint' | tr -d '"')/Prod/items \ 115 | -d '{ 116 | "id": "1a2b3c4d", 117 |
"name": "first last", 118 | "milesTraveled": "12", 119 | "totalTravelTime": "600", 120 | "price": "13.32", 121 | "tenantId": "1001" 122 | }' -i 123 | 124 | # Test retrieval (run this twice) 125 | curl $(aws apigatewayv2 get-apis | jq '.Items[0].ApiEndpoint' | tr -d '"')/Prod/items/1a2b3c4d 126 | ``` 127 | 128 | ### 1. Verify X-Ray Distributed Tracing 129 | 130 | 1. Navigate to the [CloudWatch console](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1) 131 | 2. Click **Application Signals (APM)** → **Trace Map** 132 | 3. **Wait a few seconds** for the trace data to populate. You should then see the complete service map with the Lambda functions and DynamoDB 133 | 134 | ![Screenshot of Complete Service Map](../images/Module-3/Module3_ServiceMap.png) 135 | 136 | 4. Click **Traces** to see individual request flows 137 | 5. **Scroll down** on the traces page to see the list of traces 138 | 6. **Click one of the traces** to see detailed timing and subsegment information 139 | 140 | ![Screenshot of Trace Details](../images/Module-3/Module3_TraceDetails.png) 141 | 142 | ### 2a. Verify Structured Logging 143 | 144 | 1. Go to **CloudWatch** → **Log groups** 145 | 2. Open the `/aws/lambda/putItemFunction` log group 146 | 3. Click the most recent log stream to open it 147 | 4. Search for **DynamoDB** (case-sensitive) 148 | 149 | **Before (unstructured):** 150 | ``` 151 | Writing to DynamoDB 152 | Item created successfully 153 | ``` 154 | 155 | **After (structured):** 156 | 157 | ![Screenshot of structured log](../images/Module-3/Module3_Structured.png) 158 | 159 | 160 | ### 2b. Structured Logs for Tenant Attribution 161 | Now that structured logging includes tenant identifiers (as required by your organization), you can attribute logs, metrics, and alerts to specific tenants. Use the Logs Insights query below to filter or aggregate by `tenant_id` for tenant-level usage, revenue, and error reporting.
Ensure `tenant_id` is consistently included in all relevant log entries and metric dimensions to maintain accurate attribution for billing, monitoring, and alerts. 162 | 163 | 164 | 1. Go to **CloudWatch** → **Logs Insights** 165 | 2. Select the log group **`/aws/lambda/putItemFunction`** 166 | 3. Run a tenant-specific query: 167 | 168 | ```sql 169 | fields @timestamp, @message 170 | | filter @message like /tenant_id/ 171 | | parse @message '"tenant_id": "*"' as tenant_id 172 | | stats count() by tenant_id 173 | ``` 174 | 175 | ![Screenshot of Logs Insights](../images/Module-3/Module3_LogInsights.png) 176 | 177 | 4. Click the **Visualization** tab and choose **Pie** from the widget type drop-down. 178 | This is an example of how you can attribute tenant usage based on the log entries. 179 | 180 | ![Screenshot of Logs Insights pie chart](../images/Module-3/Module3_LI-pie.png) 181 | 182 | 183 | 184 | ## Summary 185 | 186 | You have implemented all three observability patterns and deployed them to AWS: 187 | 188 | - **X-Ray Distributed Tracing**: Complete request flow visibility across Lambda and DynamoDB 189 | - **Structured Logging**: JSON format with tenant attribution and correlation IDs 190 | - **Custom Business Metrics**: Real-time KPIs for ride requests, revenue, and errors by tenant 191 | - **AWS Integration**: Real observability data flowing through CloudWatch and X-Ray 192 | - **Automated Deployment**: Ollyver handled all code changes and AWS updates 193 | 194 | ## What's Next 195 | 196 | In Module 4, you'll see how your organization can add compliance requirements to its standards and how Ollyver applies them to your code.
197 | 198 | 199 | -------------------------------------------------------------------------------- /ollyver/Approach/workshop-methodology.md: -------------------------------------------------------------------------------- 1 | # Workshop Methodology: "Show, Don't Tell" 2 | 3 | ## Educational Approach 4 | 5 | ### Core Teaching Philosophy 6 | - **Hands-on Learning**: Implement patterns while explaining concepts 7 | - **Before/After Demonstrations**: Show immediate value of each enhancement 8 | - **Pattern Recognition**: Help users identify where patterns apply 9 | - **Transferable Skills**: Enable users to apply patterns independently 10 | 11 | ### Interactive Learning Flow 12 | 13 | #### Discovery Phase 14 | ``` 15 | User: "I have a serverless application that needs observability" 16 | Ollyver: "Let me analyze your application architecture..." 17 | 18 | 🔍 Found: 19 | - 3 Lambda functions (Python) 20 | - 1 API Gateway 21 | - 1 DynamoDB table 22 | - No observability instrumentation detected 23 | 24 | 📚 I'll teach you 5 key patterns: 25 | 1. X-Ray distributed tracing 26 | 2. Structured logging with correlation 27 | 3. Custom business metrics 28 | 4. Error tracking and alerting 29 | 5. Dashboard creation 30 | 31 | Ready to start with Pattern 1: X-Ray Tracing? 32 | ``` 33 | 34 | #### Implementation Phase 35 | ``` 36 | User: "Yes, show me X-Ray tracing" 37 | Ollyver: "X-Ray helps you visualize request flow across services. 38 | Let me add it to your getById function..." 39 | 40 | BEFORE: 41 | def lambda_handler(event, context): 42 | return get_item_by_id(event['id']) 43 | 44 | AFTER: 45 | from aws_xray_sdk.core import xray_recorder 46 | @xray_recorder.capture('lambda_handler') 47 | def lambda_handler(event, context): 48 | return get_item_by_id(event['id']) 49 | 50 | ✅ Added X-Ray tracing to getById function 51 | 📊 You can now see traces in AWS X-Ray console 52 | 🎯 This pattern applies to all your Lambda functions 53 | 54 | Apply to remaining functions? 
(y/n) 55 | ``` 56 | 57 | #### Reinforcement Phase 58 | ``` 59 | User: "y" 60 | Ollyver: "Applying X-Ray pattern to putItem and deleteItem functions..." 61 | 62 | ✅ All Lambda functions now have X-Ray tracing 63 | 📈 Your service map will show: API Gateway → Lambda → DynamoDB 64 | 🔧 Next: Let's add structured logging for better debugging 65 | 66 | Ready for Pattern 2: Structured Logging? 67 | ``` 68 | 69 | ## Pattern Implementation Methodology 70 | 71 | ### Step-by-Step Teaching Process 72 | 1. **Explain the Pattern**: "X-Ray tracing helps you see request flow across services" 73 | 2. **Show Current Code**: Display "before" state 74 | 3. **Implement Enhancement**: Add instrumentation automatically 75 | 4. **Show Improved Code**: Display "after" state with explanations 76 | 5. **Demonstrate Value**: "Now you can see request traces in X-Ray console" 77 | 6. **Scale the Pattern**: Apply to similar components 78 | 7. **Connect to Next**: Bridge to related observability patterns 79 | 80 | ### Code Analysis Patterns 81 | 82 | #### Requirements-Based Gap Detection 83 | Scan code against organizational requirements from `observability-requirements.md`: 84 | 85 | **X-Ray Distributed Tracing Gap**: 86 | ```python 87 | # Look for Lambda functions missing: 88 | - from aws_xray_sdk.core import xray_recorder 89 | - @xray_recorder.capture() decorator 90 | - patch_all() for automatic instrumentation 91 | ``` 92 | 93 | **Structured Logging Gap**: 94 | ```python 95 | # Identify these anti-patterns: 96 | - print() statements 97 | - Basic logging without JSON structure 98 | - Missing correlation IDs 99 | - No timestamp or context information 100 | ``` 101 | 102 | **Business Metrics Gap**: 103 | ```python 104 | # Business logic without observability: 105 | - Database operations without metrics 106 | - API calls without success/failure tracking 107 | - Business processes without measurement 108 | - Missing cloudwatch.put_metric_data() calls 109 | ``` 110 | 111 | **Error Tracking Gap**: 
```python 113 | # Exception handling without observability: 114 | - try/except blocks without logging 115 | - Errors without correlation context 116 | - Missing error metrics 117 | - No X-Ray exception tracking 118 | ``` 119 | 120 | **Dashboard Gap**: 121 | ```python 122 | # Missing operational visibility: 123 | - No CloudWatch dashboards defined 124 | - Key metrics not visualized 125 | - No real-time monitoring setup 126 | ``` 127 | 128 | #### Lambda Function Detection 129 | ```python 130 | # Look for these patterns: 131 | - def lambda_handler(event, context): 132 | - AWS Lambda runtime indicators 133 | - boto3 client usage 134 | - Missing X-Ray imports 135 | ``` 136 | 144 | ## Interactive Commands 145 | 146 | ### Analysis Commands 147 | - `"Ollyver, analyze this code"` - Detect observability gaps 148 | - `"Ollyver, what patterns do you see?"` - Identify architectural patterns 149 | - `"Ollyver, check observability coverage"` - Assess current state 150 | 151 | ### Implementation Commands 152 | - `"Ollyver, add X-Ray tracing"` - Implement distributed tracing 153 | - `"Ollyver, fix the logging"` - Add structured logging 154 | - `"Ollyver, add business metrics"` - Implement custom metrics 155 | - `"Ollyver, create dashboards"` - Generate monitoring dashboards 156 | 157 | ### Learning Commands 158 | - `"Ollyver, explain this pattern"` - Deep dive into observability concepts 159 | - `"Ollyver, show me the before/after"` - Compare implementations 160 | - `"Ollyver, how does this scale?"` - Discuss enterprise patterns 161 | 162 | ## Workshop Outcomes 163 | 164 | ### Immediate Learning Objectives 165 | - Hands-on experience with automated observability implementation 166 | - Understanding of core patterns and their applications 167 | -
Real-time feedback on implementation quality 168 | - Immediate visibility into application behavior 169 | 170 | ### Transferable Skills Development 171 | - Pattern recognition for observability gaps 172 | - Understanding of when and how to apply each pattern 173 | - Ability to implement similar patterns manually 174 | - Knowledge of organizational observability standards 175 | 176 | ### Organizational Value Creation 177 | - Consistent observability standards across teams 178 | - Reduced time to production for new services 179 | - Improved system reliability and debugging capability 180 | - Standardized approach to operational excellence 181 | 182 | ## Teaching Principles 183 | 184 | ### Always Demonstrate Value 185 | - Show immediate benefits after each pattern implementation 186 | - Connect technical improvements to business outcomes 187 | - Provide concrete examples of how patterns help in production 188 | - Explain cost/benefit trade-offs for each enhancement 189 | 190 | ### Maintain Educational Focus 191 | - Explain the "why" behind each pattern, not just the "how" 192 | - Connect patterns to broader observability principles 193 | - Help users understand when to apply patterns in future projects 194 | - Encourage questions and exploration of concepts 195 | 196 | ### Ensure Practical Application 197 | - Use real code from the user's actual project 198 | - Apply patterns to the user's specific architecture 199 | - Demonstrate patterns in the deployed AWS environment 200 | - Provide actionable next steps for continued improvement 201 | -------------------------------------------------------------------------------- /docs/module-5.md: -------------------------------------------------------------------------------- 1 | 2 | ## Organizational Expectations 3 | 4 | Widespread metrics and Key Performance Indicators (KPIs) in your workloads let you answer next-level questions when a technical solution is impaired.
Do we have a comprehensive technical understanding of system health? Do we understand the business consequence of technical impairments? Or do we have long-established tunnel vision that leads us to blame components, a business unit, or people? 5 | 6 | Relying on a narrow set of metrics risks skewed perspectives, flawed decision-making, and unknown consequences. With generative AI, broad instrumentation is no longer a burden on builders. 7 | 8 | ## From Workshop to Organization 9 | 10 | You've just experienced a complete application transformation in under an hour. What took you 40 minutes to implement with Ollyver would have taken days or weeks using traditional manual approaches. Now the question is: **How do you scale this success across your entire organization?** 11 | 12 | This module provides a practical roadmap for taking your Ollyver experience and implementing AI-powered observability automation at organizational scale. 13 | 14 | 15 | ### **The Ollyver Approach** 16 | - **AI-driven gap detection** - Automatically identifies missing observability 17 | - **Pattern-based implementation** - Consistent, proven solutions every time 18 | - **Organizational standards integration** - Enforces company requirements automatically 19 | - **Educational automation** - Teams learn while AI does the work 20 | 21 | ### **Proven Patterns from This Workshop** 22 | 1. **X-Ray Distributed Tracing** - Works across any serverless architecture 23 | 2. **Structured Logging** - Applies to all application types and languages 24 | 3. **Business Metrics** - Scales to organization-wide KPI tracking 25 | 4. **Automated Deployment** - Reduces errors and speeds delivery 26 | 27 | 28 | 29 | ## Review Ollyver's Structure 30 | 31 | ### 1. Explore Ollyver Files 32 | 33 | In your VS Code Server, Ollyver's files are already available.
Let's explore the structure: 34 | 35 | Navigate to the `ollyver` directory in the file explorer. 36 | 37 | You should see the following structure in that folder: 38 | 39 | ``` 40 | ollyver/ 41 | ├── ollyver-agent-instructions.md # Core agent behavior and workflow 42 | ├── Org-Standards/ # Your organization's requirements 43 | │ ├── observability-requirements.md 44 | │ ├── core-patterns.md 45 | │ └── deployment-guide.md 46 | └── Approach/ # Process documentation 47 | ├── process-overview.md 48 | ├── session-management.md 49 | └── workshop-methodology.md 50 | ``` 51 | 52 | ### 2. Examine Organizational Standards 53 | 54 | **Key Insight**: Ollyver includes the exact observability requirements that your **observability team** and **customer** collaborated to create. 55 | 56 | ```text 57 | # Review your organization's observability requirements in the following file. 58 | ollyver/Org-Standards/observability-requirements.md 59 | ``` 60 | 61 | This file contains: 62 | - **Distributed Tracing**: X-Ray on all Lambda functions 63 | - **Structured Logging**: JSON format with correlation IDs 64 | - **Business Metrics**: Custom CloudWatch metrics for operations 65 | - **Error Tracking**: Comprehensive error logging and metrics 66 | - **Operational Dashboards**: CloudWatch dashboards for key metrics 67 | 68 | ### 3.
Review Implementation Patterns 69 | 70 | ```text 71 | # See how Ollyver implements observability patterns in the following file. 72 | ollyver/Org-Standards/core-patterns.md 73 | ``` 74 | 75 | 76 | 77 | 78 | 79 | 80 | ## Scaling Strategy: The 3-Phase Approach 81 | 82 | ### **Phase 1: Foundation** 83 | 84 | #### **Establish Organizational Standards** 85 | Based on your workshop experience, create your organization's observability requirements: 86 | 87 | ```yaml 88 | # Your-Company-Observability-Standards.yaml 89 | distributed_tracing: 90 | - X-Ray on all production Lambda functions 91 | - Correlation IDs across all services 92 | - Performance thresholds: <200ms P95 93 | 94 | structured_logging: 95 | - JSON format with company metadata 96 | - Tenant/customer attribution for SaaS applications 97 | - Security audit trails for compliance 98 | 99 | business_metrics: 100 | - Revenue-impacting operations tracked 101 | - Customer experience metrics automated 102 | - Cost allocation by business unit 103 | 104 | # ... additional categories 105 | 106 | 107 | ``` 108 | 109 | #### **Build Your Observability Agent** 110 | 1. **Customize the Org-Standards files or create an MCP server** with your company requirements 111 | 2. **Add company-specific patterns** (compliance, security, industry standards) 112 | 3. **Configure deployment processes** for your CI/CD pipelines 113 | 4.
**Set up cost monitoring** and optimization rules 114 | 115 | #### **Train Your Core Team** 116 | - **2-3 senior developers** become Ollyver champions 117 | - **1 platform engineer** manages organizational standards 118 | - **1 operations lead** defines monitoring and alerting requirements 119 | 120 | 121 | ### **Phase 2: Pilot Program ** 122 | 123 | #### **Select Pilot Applications** 124 | Choose applications that represent your organization's diversity: 125 | - **Customer-facing service** (high visibility, business impact) 126 | - **Internal tool** (lower risk, good for learning) 127 | - **Data processing pipeline** (different architecture pattern) 128 | 129 | #### **Measure Baseline Metrics** 130 | Before Ollyver implementation: 131 | - Time to add observability to new features 132 | - Mean time to detect (MTTD) production issues 133 | - Mean time to resolve (MTTR) incidents 134 | - Developer satisfaction with observability tools 135 | 136 | #### **Run Pilot Implementations** 137 | For each pilot application: 138 | 1. **Workshop-style session** with development team 139 | 2. **Ollyver transformation** of existing application 140 | 3. **Verification and testing** in staging environment 141 | 4. 
**Production deployment** with monitoring 142 | 143 | #### **Collect Success Stories** 144 | Document specific improvements such as: 145 | - "Reduced debugging time from 4 hours to 15 minutes" 146 | - "Detected performance regression before customer impact" 147 | - "Implemented observability in 30 minutes vs 2 days previously" 148 | 149 | ### **Phase 3: Organization-Wide Rollout (Month 5-12)** 150 | 151 | #### **Mandatory Standards Implementation** 152 | - **New applications** must use Ollyver from day one 153 | - **Existing applications** retrofit during regular maintenance cycles 154 | - **CI/CD integration** automatically checks observability compliance 155 | - **Architecture reviews** include observability requirements 156 | 157 | #### **Team Training at Scale** 158 | **Workshop Replication:** 159 | - Run monthly "Ollyver workshops" for new teams 160 | - Create internal video tutorials based on this workshop 161 | - Establish "observability office hours" for questions and support 162 | 163 | **Knowledge Sharing:** 164 | - Internal Slack/Teams channels for Ollyver best practices 165 | - Quarterly "observability showcase" presentations 166 | - Cross-team mentoring programs 167 | 168 | #### **Advanced Patterns and Optimization** 169 | - **Multi-service correlation** across microservices architectures 170 | - **Cross-account observability** for complex enterprise environments 171 | - **Cost optimization** based on actual usage patterns 172 | - **Custom business metrics** aligned with company KPIs 173 | 174 | 175 | 176 | ## Conclusion 177 | 178 | Observability transforms your solutions from "black box" to "glass box". In less than an hour, you implemented multiple best practice observability patterns in a code base that was new to you. This is a radical shift in productivity that delivers higher quality solutions and provides new data to shorten mean time to recovery (MTTR). 
By scaling Ollyver's AI-powered approach across your organization, you can:
- Turn observability implementation from a linear cost into a scalable asset
- Achieve consistent, high-quality monitoring across all applications
- Free developer time to focus on creating business value
- Build organizational expertise through automated education

Start your journey today. Your customers, your organization, and your future self will thank you.

--------------------------------------------------------------------------------
/docs/module-1.md:
--------------------------------------------------------------------------------

## Scenario

You have inherited a serverless ride-sharing application that was **developed without any observability**. The application consists of two Lambda functions (`putItemFunction`, `getByIdFunction`), an API Gateway, and a DynamoDB table.

While the application works functionally, your **organization's requirements** have evolved. A collaboration between your customer and your observability team has produced comprehensive observability goals that must now be implemented.

## Application Architecture

The ride-sharing application used in this workshop is built from serverless services: Amazon API Gateway, AWS Lambda, and Amazon DynamoDB.

For simplicity, it uses just two AWS Lambda functions: one inserts an item into a DynamoDB table, and the other retrieves that item by its unique ID.

![Application Architecture](../images/Module-1/Module1_Architecture.png)

## The Challenge

**Operations teams** and **business stakeholders** are struggling because the current application provides no visibility into:

* Request flow across services
* Tenant-specific usage patterns
* Business KPIs and operational metrics
* Error tracking and performance bottlenecks

This lack of observability creates significant business and operational challenges that must be addressed.

## Organizational Requirements Discovery

To address this, your **organization** has defined specific observability goals and requirements. These represent real-world organizational standards that must be implemented across all applications.

**Review the requirements:**

```yaml
# Organizational Observability Requirements
observability_standards:
  distributed_tracing:
    - Enable X-Ray tracing on all Lambda functions
    - Capture downstream service calls
    - Include correlation IDs in all traces

  structured_logging:
    - JSON format for all log entries
    - Include tenant attribution for multi-tenant applications
    - Correlation IDs for request tracking

  business_metrics:
    - Custom CloudWatch metrics for key business operations
    - Tenant-specific usage tracking
    - Performance and error rate monitoring

  # ...
```

These requirements will guide the observability implementation throughout this workshop.

## Explore Your Application Environment

Let's systematically explore the current application to understand exactly which observability gaps exist.

### 1. AWS X-Ray Trace Map - Missing

Navigate to the [CloudWatch console](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1) and examine X-Ray tracing:

1. Click **Application Signals (APM)** in the left sidebar
2. Select **Trace Map**
3. Observe the results

![Screenshot of Empty X-Ray](../images/Module-1/Module1_Empty_ServiceMap.png)

**Gap Identified**: No distributed tracing data, because X-Ray instrumentation is not implemented in the application code.

**Business Impact**: Requests cannot be traced across services, which makes troubleshooting nearly impossible during incidents.

### 2a. Review CloudWatch Logs - Unstructured

Examine the current logging approach:

1. Navigate to the [CloudWatch console](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1)
2. Click **Logs** → **Log groups** in the left sidebar
3. Click the `/aws/lambda/putItemFunction` log group
4. Open the most recent log stream

![Screenshot of Unstructured Logs](../images/Module-1/Module1_CW_UnstructuredLogs.png)

### 2b. Tenant Attribution - Missing

**Gap Identified**: No easy way to identify tenant usage, despite the application's multi-tenant architecture.

**Business Impact**: Usage cannot be tracked per tenant, which makes cost allocation and customer-specific analytics impossible.

### 3. Custom Metrics - Missing

Now switch to your **VS Code Server terminal** to check for business-specific metrics in CloudWatch:

```bash
# Check for custom business metrics
aws cloudwatch list-metrics --namespace "RideShare/Business"
```

![Screenshot of Metrics](../images/Module-1/Module1_Empty_CW_metrics.png)

**Gap Identified**: No custom business metrics exist for ride operations, tenant usage, or application-specific KPIs.

**Business Impact**: No visibility into business performance, customer usage patterns, or operational efficiency.
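
For contrast, here is a minimal sketch of one way a function could publish such a metric, assuming the CloudWatch Embedded Metric Format (EMF) is used instead of explicit `put_metric_data()` API calls. With EMF, the function prints a specially structured JSON log line and CloudWatch Logs extracts the metric values from it. The `RideShare/Business` namespace comes from the command above; the metric names (`RidesCompleted`, `MilesTraveled`), the `TenantId` dimension, and both helper functions are hypothetical choices for illustration, not the code Ollyver generates.

```python
import json
import time


def build_emf_record(tenant_id: str, miles_traveled: float,
                     namespace: str = "RideShare/Business") -> dict:
    """Build a CloudWatch Embedded Metric Format (EMF) record for one ride.

    Hypothetical metric names: RidesCompleted and MilesTraveled.
    """
    return {
        "_aws": {
            # EMF timestamps are milliseconds since the Unix epoch
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # Each inner list is one dimension set: metrics per tenant
                    "Dimensions": [["TenantId"]],
                    "Metrics": [
                        {"Name": "RidesCompleted", "Unit": "Count"},
                        {"Name": "MilesTraveled", "Unit": "None"},
                    ],
                }
            ],
        },
        # Dimension values and metric values live at the top level of the record
        "TenantId": tenant_id,
        "RidesCompleted": 1,
        "MilesTraveled": miles_traveled,
    }


def emit_ride_metrics(tenant_id: str, miles_traveled: float) -> str:
    """Serialize the record as one JSON line and print it.

    In Lambda, stdout goes to CloudWatch Logs, which extracts the EMF metrics.
    """
    line = json.dumps(build_emf_record(tenant_id, miles_traveled))
    print(line)
    return line


if __name__ == "__main__":
    # Sample values from the README's test request (tenant 1001, 12 miles)
    emit_ride_metrics("1001", 12.0)
```

Because the metric travels inside a log line, the same record also serves as a structured, tenant-attributed log entry. Note that the sample request sends `milesTraveled` as a string (`"12"`), so a real handler would convert it with `float()` first.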

## Examine Application Code Structure

Let's look at the actual Lambda function code to understand the current implementation:

1. Navigate to the [Lambda console](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions)

2. Click **putItemFunction** and examine the code:

```python
import os
import json
import boto3
import logging
from datetime import datetime
import base64

# Initialize the DynamoDB client
dynamodb = boto3.resource('dynamodb')
tableName = os.environ['SAMPLE_TABLE']
table = dynamodb.Table(tableName)

def lambda_handler(event, context):
    try:
        if event.get('isBase64Encoded'):
            record = base64.b64decode(event["body"]).decode('utf-8')
        else:
            record = event["body"]

        data = json.loads(record)

        id = data["id"]
        name = data["name"]
        milesTraveled = data["milesTraveled"]
        totalTravelTime = data["totalTravelTime"]
        price = data["price"]
        tenantId = data["tenantId"]
        timestamp = datetime.now().isoformat(timespec='seconds')

        # Build the item to store from the parsed request fields
        data = {
            'id': id,
            'name': name,
            'milesTraveled': milesTraveled,
            'totalTravelTime': totalTravelTime,
            'price': price,
            'tenantId': tenantId,
            'timestamp': timestamp
        }
        # Write the item to DynamoDB
        response = table.put_item(
            Item=data
        )
        print("Writing to DynamoDB")

        return {
            'statusCode': 200,
            'body': 'Item added successfully'
        }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': f'Error: {str(e)}'
        }
```

3. Click **getByIdFunction** and examine the code:

```python
import os
import json
import boto3

# Initialize the DynamoDB client
dynamodb = boto3.resource('dynamodb')
tableName = os.environ['SAMPLE_TABLE']
table = dynamodb.Table(tableName)

def lambda_handler(event, context):
    try:
        print(event)
        # Parse input from the event
        key = event['pathParameters']['id']
        print(key)
        # Retrieve the item from DynamoDB
        response = table.get_item(
            Key={
                'id': key
            }
        )

        # Check if the item was found
        if 'Item' in response:
            item = response['Item']
            print(item)
            return {
                'statusCode': 200,
                'body': json.dumps(item)
            }
        else:
            return {
                'statusCode': 404,
                'body': 'Item not found'
            }
    except Exception as e:
        return {
            'statusCode': 500,
            'body': f'Error: {str(e)}'
        }
```

**🔍 Code Analysis - Observability Gaps Identified:**

| Issue | Current Code | Impact |
|-------|-------------|---------|
| **No Tracing** | Missing `from aws_xray_sdk.core import xray_recorder` | Cannot track requests across services |
| **Unstructured Logging** | `print("Writing to DynamoDB")` | Logs are not searchable or correlatable |
| **No Correlation IDs** | No request tracking mechanism | Cannot trace individual requests |
| **No Custom Metrics** | No `cloudwatch.put_metric_data()` calls | No business KPI visibility |
| **Missing Tenant Context** | `tenantId` extracted but not logged | Cannot track per-tenant usage |
| **No Error Instrumentation** | Basic exception handling only | Limited error tracking and alerting |

## Organizational Requirements Gap Analysis

Comparing the current state against the organizational requirements:

| Requirement | Current State | Gap |
|-------------|---------------|-----|
| **X-Ray Tracing** | None | ❌ Complete gap |
| **Structured Logging** | Basic print() | ❌ Complete gap |
| **Tenant Attribution** | Missing | ❌ Complete gap |
| **Custom Metrics** | None | ❌ Complete gap |
| **Correlation IDs** | Missing | ❌ Complete gap |
| **Cost Targets** | Unknown costs | ❌ No visibility |

## What's Next

The observability gaps you've identified are exactly the challenges that **Ollyver** is designed to solve. In the next module, you'll meet Ollyver, your AI observability agent, which will:

1. **Automatically detect** these same gaps through code analysis
2. **Implement organizational standards** using proven patterns
3. **Deploy changes** using your existing AWS infrastructure
4. **Validate** that observability improvements work correctly

Ollyver will transform this "black box" application into a fully observable system that meets all organizational requirements.

> **Note**: The gaps you discovered manually will be detected and fixed automatically by Ollyver in the upcoming modules.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Operations-Driven Development with Ollyver

AI-powered observability automation that uses the Ollyver agent to take serverless applications from "black box" to "glass box" observability.

---

> **📢 Preview Release**
>
> This content is best experienced at an AWS-hosted event using the full workshop: [Operations-Driven Development Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/89b1708b-12ca-4871-a1e4-171c600c2736/en-US)
>
> **Current Status:** This repository is in **preview mode**. You can currently access and use the Ollyver agent code and configuration.
>
> **Coming Soon:** A full standalone workshop experience with MCP (Model Context Protocol) integration for organizational standards management. Stay tuned for updates!

---

## Overview

This project demonstrates **Operations-Driven Development**, an approach that uses AI agents to make observability a primary concern in the development lifecycle. Meet **Ollyver**, an AI agent that automatically detects observability gaps and implements comprehensive monitoring, logging, and tracing across your serverless applications.

## What You'll Learn

This hands-on workshop teaches you how to take a serverless application from "black box" to "glass box" observability using AI automation. You'll experience:

- **AI-Powered Observability** - Watch Ollyver automatically detect and fix observability gaps
- **Distributed Tracing** - Implement AWS X-Ray tracing across all services
- **Structured Logging** - Create JSON logs with tenant attribution and correlation IDs
- **Custom Metrics** - Build business KPIs and operational dashboards
- **Organizational Scaling** - Apply consistent observability patterns across teams

### Workshop Format

This is a **self-paced, hands-on workshop** where you'll:

1. **Deploy** a real serverless application to your AWS account
2. **Discover** observability gaps in the application
3. **Meet Ollyver** - Your AI observability companion
4. **Transform** the application with automated instrumentation
5. **Validate** improvements in AWS CloudWatch and X-Ray

**Time Required:** 2-3 hours

**Cost:** Minimal AWS charges (under $5 for the duration of the workshop)

### What is Ollyver?

Ollyver is a specialized AI agent built on [Kiro CLI](https://kiro.dev/) that automates observability implementation.
Instead of manually instrumenting code, Ollyver:

- **Scans** your codebase to understand its architecture
- **Detects** missing observability patterns against organizational standards
- **Suggests** fixes with educational context
- **Implements** patterns automatically and deploys them to AWS

### Key Features

- ✅ **Automated X-Ray Tracing** - Distributed tracing across all services
- ✅ **Structured Logging** - JSON logs with tenant attribution and correlation IDs
- ✅ **Custom Business Metrics** - Track KPIs and operational metrics
- ✅ **CloudWatch Dashboards** - Automated dashboard creation
- ✅ **Organizational Standards** - Enforce consistent observability patterns

## Architecture

The sample application is a serverless ride-sharing service consisting of:

- **Amazon API Gateway** - HTTP API endpoint
- **AWS Lambda** - Two functions (`putItemFunction`, `getByIdFunction`)
- **Amazon DynamoDB** - Data storage
- **CloudWatch** - Logs, metrics, and dashboards
- **AWS X-Ray** - Distributed tracing

![Architecture](/img/Architecture.png)

## Prerequisites

- **AWS Account** with AWS CLI credentials configured
- **AWS CLI** - [Installation Guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
- **AWS SAM CLI** - [Installation Guide](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/install-sam-cli.html)
- **Docker** - Required for SAM build
- **Python 3.11+** - For Lambda functions
- **Kiro CLI** - [Installation Guide](https://kiro.dev/)

## Quick Start

### 1. Clone the Repository

```bash
git clone https://github.com/aws-samples/observability-driven-development.git
cd observability-driven-development
```

### 2. Deploy the Application

```bash
cd cloudformation
# The template is named application.yaml, not SAM's default template.yaml
sam build --template-file application.yaml
sam deploy --guided
```

When prompted, use the stack name `observability-driven-development`; the commands below assume it.

### 3. Set Up Kiro CLI

```bash
# Authenticate with AWS Builder ID
kiro-cli login --use-device-flow
```

### 4. Activate the Ollyver Agent

```bash
# From the project root
kiro-cli chat --agent ollyver
```

### 5. Test the Application

```bash
# Get the API endpoint
export API_URL=$(aws cloudformation describe-stacks \
  --stack-name observability-driven-development \
  --query 'Stacks[0].Outputs[?OutputKey==`HttpApiUrl`].OutputValue' \
  --output text)

# Test POST
curl -X POST ${API_URL}items \
  -d '{"id":"1a2b3c4d","name":"test user","milesTraveled":"12","totalTravelTime":"600","price":"13.32","tenantId":"1001"}'

# Test GET
curl ${API_URL}items/1a2b3c4d
```

## Workshop Modules

This workshop is organized into modules that guide you through the complete observability transformation. **Start with the Introduction** to understand the concepts, then follow the modules in order.

### Getting Started

**📖 [Introduction](docs/introduction.md)** - Start here to understand:
- What observability is and why it matters
- The traditional observability gap problem
- How AI agents relieve the developer burden
- What you'll accomplish in this workshop

**⚙️ [Setup Guide](docs/setup.md)** - Environment setup:
- Install prerequisites (AWS CLI, SAM CLI, Kiro CLI)
- Deploy the sample application
- Configure the Ollyver agent
- Test your deployment

### Workshop Modules

**Module 1: [Explore Your Application](docs/module-1.md)**
- Examine the serverless ride-sharing application
- Discover observability gaps manually
- Understand organizational requirements
- See the "black box" problem firsthand

**Module 2: [Meet Ollyver](docs/module-2.md)**
- Introduction to the Ollyver AI agent
- Understand the automated workflow
- Test Ollyver's knowledge
- Prepare for transformation

**Module 3: [Closing the Gaps](docs/module-3.md)**
- Watch Ollyver detect observability gaps
- Implement X-Ray tracing automatically
- Add structured logging with correlation IDs
- Deploy and validate improvements

**Module 4: [Dynamic Requirements](docs/module-4.md)**
- Adapt to changing organizational standards
- Implement PII masking in logs
- See how Ollyver handles evolving requirements
- Experience continuous observability improvement

**Module 5: [Organizational Scaling](docs/module-5.md)**
- Scale observability patterns across teams
- Customize organizational standards
- Apply consistent patterns organization-wide
- Build a culture of observability excellence

## Ollyver Agent Configuration

The Ollyver agent is located in the `ollyver/` directory:

```
ollyver/
├── .amazonq/
│   └── cli-agents/
│       └── ollyver.json            # Agent configuration
├── Org-Standards/                  # Organizational requirements
│   ├── observability-requirements.md
│   ├── core-patterns.md
│   └── deployment-guide.md
├── Approach/                       # Agent workflow
│   ├── process-overview.md
│   ├── session-management.md
│   └── workshop-methodology.md
└── ollyver-agent-instructions.md   # Agent instructions
```

### Customizing Organizational Standards

Edit the files in `ollyver/Org-Standards/` to customize observability requirements for your organization:

- **observability-requirements.md** - Define your observability standards
- **core-patterns.md** - Implementation patterns and best practices
- **deployment-guide.md** - AWS deployment procedures

## Project Structure

```
observability-driven-development/
├── cloudformation/          # CloudFormation templates
│   └── application.yaml     # Main application template
├── src/handlers/            # Lambda function code
│   ├── putItemFunction/
│   └── getByIdFunction/
├── ollyver/                 # Ollyver agent configuration
├── docs/                    # Workshop documentation
├── images/                  # Workshop images
└── README.md                # This file
```

## Learning Outcomes

By completing this workshop, you will:

- **Understand Observability Fundamentals** - Learn the three pillars (logs, metrics, traces) and why they matter
- **Experience AI-Powered Automation** - See how AI agents can automate complex observability tasks
- **Implement Distributed Tracing** - Add AWS X-Ray tracing across Lambda functions and API Gateway
- **Create Structured Logs** - Build JSON logs with tenant attribution and correlation IDs
- **Build Custom Metrics** - Track business KPIs and operational metrics in CloudWatch
- **Scale Organizational Patterns** - Apply consistent observability standards across teams
- **Master Operations-Driven Development** - Elevate observability to a primary development concern

### Skills You'll Gain

- Working with Kiro CLI and custom AI agents
- AWS observability services (X-Ray, CloudWatch, Lambda Insights)
- Serverless application instrumentation
- Infrastructure as Code with AWS SAM
- Best practices for production observability

## Clean Up

To avoid ongoing charges, delete the CloudFormation stack:

```bash
sam delete --stack-name observability-driven-development
```

Alternatively, use the AWS Console:
1. Navigate to the CloudFormation console
2. Select the `observability-driven-development` stack
3. Click **Delete**

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.

## License

This project is licensed under the MIT-0 License. See the [LICENSE](LICENSE) file for details.

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License.

--------------------------------------------------------------------------------