├── .DS_Store
├── Factory View.png
├── Machine View.png
├── Data Preparation.png
├── Solution Architecture.png
├── Redshift queries.md
└── README.md


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization/main/.DS_Store


--------------------------------------------------------------------------------
/Factory View.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization/main/Factory View.png


--------------------------------------------------------------------------------
/Machine View.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization/main/Machine View.png


--------------------------------------------------------------------------------
/Data Preparation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization/main/Data Preparation.png


--------------------------------------------------------------------------------
/Solution Architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization/main/Solution Architecture.png


--------------------------------------------------------------------------------
/Redshift queries.md:
--------------------------------------------------------------------------------
# Queries

## Load data into your Amazon Redshift Serverless Instance

```
-- Target table for the historical machine data.
CREATE TABLE public.machine_data (
    machine_timestamp timestamp without time zone,
    machine_manufacturer_id character varying(256),
    line character varying(256),
    line_description character varying(256),
    operator_name character varying(256),
    operator_perno character varying(256),
    machine character varying(256),
    machine_description character varying(256),
    units_target integer,
    wire_tension double precision,
    power_unit integer,
    error_code character varying(256),
    bit_speed double precision,
    temperature double precision,
    units double precision,
    oee double precision,
    defective_count double precision
) DISTSTYLE AUTO;
```

### Create a view to be used in QuickSight as a data source

```
CREATE MATERIALIZED VIEW full_machine_data
AS
select machine_timestamp,
    machine_manufacturer_id,
    line,
    line_description,
    operator_name,
    operator_perno,
    machine,
    machine_description,
    units_target,
    wire_tension,
    power_unit,
    error_code,
    bit_speed,
    temperature,
    units,
    oee,
    defective_count
from machine_data;
```

```
select * from full_machine_data;
```

## Ranges for QuickSight Dashboard

```
-- Earliest and latest timestamps, used to set the dashboard date range.
select min(machine_timestamp),
    max(machine_timestamp)
from full_machine_data;
```

### Join Streaming Data with Redshift Data

```
-- Rebuild the view so it combines the cleaned stream with the historical table.
DROP MATERIALIZED VIEW full_machine_data;
CREATE MATERIALIZED VIEW full_machine_data
AS
select machine_timestamp,
    machine_manufacturer_id,
    line,
    line_description,
    operator_name,
    operator_perno,
    machine,
    machine_description,
    units_target,
    bit_speed,
    temperature,
    wire_tension,
    power_unit,
    error_code,
    units,
    oee,
    defective_count
from ext_cleaned_stream.cleaned_stream
UNION
select machine_timestamp,
    machine_manufacturer_id,
    line,
    line_description,
    operator_name,
    operator_perno,
    machine,
    machine_description,
    units_target,
    bit_speed,
    temperature,
    wire_tension,
    power_unit,
    error_code,
    units,
    oee,
    defective_count
from machine_data;
```

```
SELECT * FROM full_machine_data
WHERE operator_name = 'engineering_maintenance';
```


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
### Data-Engineering:

# Real-Time Performance Optimization for Manufacturing Company

A real-time performance optimization project with a complete streaming data pipeline, including acquisition, storage, enrichment, analysis, and visualization.

- **Credit**: Special thanks to **AWS** and **Thoughtworks** for conducting the **"Data-Driven Everything (D2E) Workshop in Chicago"**. Thank you all for this amazing workshop.

- **Source**: [AWS Catalog Workshop](https://catalog.workshops.aws/event/dashboard/en-US/workshop#data-strategy:-real-time-performance-optimization)

### Project Objective

AnyCompany Manufacturing is looking to become a world-class manufacturing company. To be considered world class, all machines within the factory should operate at 85% or higher Overall Operating Efficiency. AnyCompany Manufacturing has found that it is close to this operating target but suspects that some machine operators need improvement.

As the analyst, you need to investigate this and work backwards to solve the problem of low operational efficiency. You will ultimately build a real-time data pipeline that produces machine insights and visualizations of factory effectiveness.

### AWS Services used
- Amazon Simple Storage Service (S3)
- AWS Glue
- Amazon Kinesis
- AWS Lambda
- Amazon Simple Notification Service (SNS)
- Amazon QuickSight
- AWS Identity and Access Management (IAM)
- Amazon Redshift Serverless
- Amazon SageMaker

## Proposed Solution

![Architecture diagram](https://github.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization-for-Manufacturing-Company/blob/main/Solution%20Architecture.png)

## Attribute Information

The historical data is made up of the following fields:
```
line - the factory line id.
line_description - description of the type of vehicle parts made on the line.
operator_name - last name, first name of the employee operating the machine at the time of data entry.
operator_perno - HR personnel number of the employee.
assembly_id - vehicle part number being assembled.
machine_model_no - model number of the manufacturing machine.
machine_manufacturer_id - unique id of the specific machine (concatenation of line + machine).
machine - the machine location on a given line.
machine_description - description of the area of the vehicle that the machine assembles.
units_target - target number of units to be produced for a shift.
timestamp - time at which the event was recorded.
wire_tension - amount of tension on the machine's wire (in newtons); this wire helps with assembly.
power_unit - amount of power being generated by the machine (0-100), in kilowatts.
error_code - code of the error being produced at the moment.
bit_speed - how fast the machine's rotating bit (which powers the machine) is spinning, in revolutions per minute (RPM).
temperature - the current temperature of the machine in Celsius.
units - current number of units produced (the machine can track partial progress, so a decimal is shown between each full unit).
defective_count - amount of defects produced at the given time (shown as a decimal but can be converted to a percentage).
```


## Question Addressed & My Findings (Insights for Business decision)

1. **Question - How can we visualize the operating parameters for each machine in the factory?**

    **Answer**: By selecting any data point in the line chart, say LO1, Machine 5:
    - The value of the line field for that data point is applied as a filter to the box plot visual, which shows that the machine operated with wider parameter settings during this time period.
    - We also see that the median settings for Machine 5 during the time period were drastically different from those of the other machines.


2. **Question - Which machine was most defective, and which operator was responsible?**

    **Answer**: **Line 1 Machine 5** was the most defective machine, and it was so over frequent periods of time.
    - John Doe worked mostly on this machine and, overall, was responsible for many more defects (count: **4681.46**) than his colleagues, as the Machine View dashboard below shows.
    - Based on this, **we recommend that John receive more training to reduce the overall number of defects**.
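
The per-operator finding above comes from the dashboards, but the same kind of ranking can be sketched as a plain aggregation over `full_machine_data`. The snippet below is a minimal illustration, not the workshop's own query: it runs standard SQL against an in-memory SQLite table that stands in for Redshift, and the function name `defects_by_operator` and the sample rows are hypothetical. Only the column names (`operator_name`, `line`, `machine`, `defective_count`) come from the table definition in this repository.

```python
import sqlite3

# Hypothetical sample rows: (operator_name, line, machine, defective_count).
# Real data would come from the full_machine_data view in Redshift.
DEFECT_SAMPLE_ROWS = [
    ("Doe, John", "L1", "5", 3.5),
    ("Doe, John", "L1", "5", 2.25),
    ("Roe, Jane", "L1", "4", 1.0),
    ("Roe, Jane", "L2", "5", 0.5),
]

# Standard SQL: total defects per operator and machine, worst first.
DEFECT_QUERY = """
SELECT operator_name,
       line,
       machine,
       SUM(defective_count) AS total_defects
FROM full_machine_data
GROUP BY operator_name, line, machine
ORDER BY total_defects DESC
"""


def defects_by_operator(rows):
    """Load rows into an in-memory table and return the ranked totals."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE full_machine_data ("
            "operator_name TEXT, line TEXT, machine TEXT, defective_count REAL)"
        )
        conn.executemany(
            "INSERT INTO full_machine_data VALUES (?, ?, ?, ?)", rows
        )
        return conn.execute(DEFECT_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in defects_by_operator(DEFECT_SAMPLE_ROWS):
        print(row)
```

The first row of the result is the operator and machine combination with the highest summed `defective_count`, which is the kind of comparison the Machine View dashboard makes visually.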

___

## Data preparation

![Data preparation](https://github.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization-for-Manufacturing-Company/blob/main/Data%20Preparation.png)

___

## Visualization

1. **Factory View Dashboard**

![Factory View Dashboard](https://github.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization-for-Manufacturing-Company/blob/main/Factory%20View.png)

2. **Machine View Dashboard**

![Machine View Dashboard](https://github.com/Ashleshk/Data-Engineering-Real-Time-Performance-Optimization-for-Manufacturing-Company/blob/main/Machine%20View.png)

--------------------------------------------------------------------------------
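
The 85% target from the Project Objective can also be checked outside the dashboards. The sketch below lists machines whose average `oee` falls below a threshold, again using an in-memory SQLite table as a stand-in for Redshift. It assumes `oee` is stored as a fraction between 0 and 1, which the source does not state; if the column holds percentages, the threshold would be 85 instead. The names `OEE_TARGET` and `machines_below_target` and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: oee is stored as a fraction (0-1), so 85% is 0.85.
OEE_TARGET = 0.85

# Hypothetical sample rows: (line, machine, oee).
OEE_SAMPLE_ROWS = [
    ("L1", "4", 0.90),
    ("L1", "4", 0.88),
    ("L1", "5", 0.70),
    ("L1", "5", 0.80),
]

# Standard SQL: average OEE per machine, keeping only those below the target.
OEE_QUERY = """
SELECT line,
       machine,
       AVG(oee) AS avg_oee
FROM full_machine_data
GROUP BY line, machine
HAVING AVG(oee) < ?
ORDER BY avg_oee ASC
"""


def machines_below_target(rows, target=OEE_TARGET):
    """Return (line, machine, avg_oee) for machines under the target."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE full_machine_data (line TEXT, machine TEXT, oee REAL)"
        )
        conn.executemany("INSERT INTO full_machine_data VALUES (?, ?, ?)", rows)
        return conn.execute(OEE_QUERY, (target,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for line, machine, avg_oee in machines_below_target(OEE_SAMPLE_ROWS):
        print(f"{line} machine {machine}: average OEE {avg_oee:.2f}")
```

With the sample rows, only machine 5 on line L1 is returned, because its average falls below the threshold while machine 4 stays above it.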