├── README.md
├── courses
    ├── level102
    │   ├── .level102
    │   ├── networking
    │   │   ├── media
    │   │   │   ├── RTT.png
    │   │   │   ├── Anycast.png
    │   │   │   ├── Dual ToR.png
    │   │   │   ├── LB 1-Arm.png
    │   │   │   ├── LB 2-Arm.png
    │   │   │   ├── Dual ToR BGP.png
    │   │   │   └── Single ToR.png
    │   │   ├── conclusion.md
    │   │   └── rtt.md
    │   ├── linux_intermediate
    │   │   ├── images
    │   │   │   ├── image1.png
    │   │   │   ├── image10.png
    │   │   │   ├── image11.png
    │   │   │   ├── image12.png
    │   │   │   ├── image13.png
    │   │   │   ├── image14.png
    │   │   │   ├── image15.png
    │   │   │   ├── image16.png
    │   │   │   ├── image17.png
    │   │   │   ├── image19.png
    │   │   │   ├── image2.png
    │   │   │   ├── image20.png
    │   │   │   ├── image21.png
    │   │   │   ├── image22.png
    │   │   │   ├── image23.png
    │   │   │   ├── image24.png
    │   │   │   ├── image25.png
    │   │   │   ├── image26.png
    │   │   │   ├── image28.png
    │   │   │   ├── image29.png
    │   │   │   ├── image3.png
    │   │   │   ├── image30.png
    │   │   │   ├── image4.png
    │   │   │   ├── image5.png
    │   │   │   ├── image6.png
    │   │   │   ├── image7.png
    │   │   │   └── image9.png
    │   │   ├── conclusion.md
    │   │   ├── introvim.md
    │   │   ├── package_management.md
    │   │   └── archiving_backup.md
    │   ├── system_design
    │   │   ├── images
    │   │   │   ├── microservices.jpg
    │   │   │   ├── initial_architecture.jpeg
    │   │   │   └── initial_application_sketch.jpeg
    │   │   ├── conclusion.md
    │   │   └── scaling-beyond-the-datacenter.md
    │   ├── containerization_and_orchestration
    │   │   ├── images
    │   │   │   ├── VM.png
    │   │   │   ├── cg1.png
    │   │   │   ├── cg2.png
    │   │   │   ├── cg3.png
    │   │   │   ├── cg4.png
    │   │   │   ├── cg5.png
    │   │   │   ├── cg6.png
    │   │   │   ├── ns1.png
    │   │   │   ├── ns2.png
    │   │   │   ├── ns3.png
    │   │   │   ├── ns4.png
    │   │   │   ├── ns5.png
    │   │   │   ├── kube1.png
    │   │   │   ├── kube2.png
    │   │   │   ├── kube3.png
    │   │   │   ├── kube4.png
    │   │   │   ├── kube5.png
    │   │   │   ├── kube6.png
    │   │   │   ├── kube7.png
    │   │   │   ├── kube8.png
    │   │   │   ├── kube9.png
    │   │   │   ├── Containers.png
    │   │   │   ├── dockerengine.png
    │   │   │   └── kubernetes.png
    │   │   ├── conclusion.md
    │   │   └── intro.md
    │   ├── system_troubleshooting_and_performance
    │   │   ├── images
    │   │   │   ├── FlaskCode.png
    │   │   │   ├── FlaskStart.png
    │   │   │   ├── MemUsage01.png
    │   │   │   ├── MemUsage02.png
    │   │   │   ├── MemUsage03.png
    │   │   │   ├── MemUsageChart.png
    │   │   │   ├── Tracemalloc01.png
    │   │   │   ├── Tracemalloc02.png
    │   │   │   ├── Tracemalloc03.png
    │   │   │   └── TroubleshootingFlow.jpg
    │   │   ├── conclusion.md
    │   │   ├── important-tools.md
    │   │   ├── troubleshooting.md
    │   │   ├── troubleshooting-example.md
    │   │   └── introduction.md
    │   ├── continuous_integration_and_continuous_delivery
    │   │   ├── images
    │   │   │   ├── CD_Image1.JPG
    │   │   │   ├── CI_Image1.JPG
    │   │   │   ├── CI_Image2.JPG
    │   │   │   ├── Jenkins1.png
    │   │   │   ├── Jenkins2.png
    │   │   │   ├── Jenkins3.png
    │   │   │   ├── Jenkins4.png
    │   │   │   ├── Jenkins5.png
    │   │   │   └── Jenkins6.png
    │   │   ├── introduction_to_cicd.md
    │   │   ├── continuous_delivery_release_pipeline.md
    │   │   ├── continuous_integration_build_pipeline.md
    │   │   ├── introduction.md
    │   │   ├── conclusion.md
    │   │   └── cicd_brief_history.md
    │   └── system_calls_and_signals
    │   │   ├── images
    │   │   └── Transition_between_User_and_Kernel_Mode.png
    │   │   ├── conclusion.md
    │   │   └── intro.md
    ├── img
    │   ├── sos.png
    │   └── favicon.ico
    ├── level101
    │   ├── security
    │   │   ├── images
    │   │   │   ├── image1.png
    │   │   │   ├── image10.png
    │   │   │   ├── image11.png
    │   │   │   ├── image122.png
    │   │   │   ├── image14.png
    │   │   │   ├── image15.png
    │   │   │   ├── image17.png
    │   │   │   ├── image18.png
    │   │   │   ├── image19.png
    │   │   │   ├── image20.png
    │   │   │   ├── image22.png
    │   │   │   ├── image23.png
    │   │   │   ├── image26.png
    │   │   │   ├── image5.png
    │   │   │   ├── image6.png
    │   │   │   ├── image7.png
    │   │   │   ├── image8.png
    │   │   │   └── image9.png
    │   │   ├── intro.md
    │   │   └── conclusion.md
    │   ├── big_data
    │   │   ├── images
    │   │   │   ├── map_reduce.jpg
    │   │   │   ├── pig_example.png
    │   │   │   ├── hadoop_evolution.png
    │   │   │   ├── hdfs_architecture.png
    │   │   │   ├── mapreduce_example.jpg
    │   │   │   └── yarn_architecture.gif
    │   │   ├── tasks.md
    │   │   └── intro.md
    │   ├── systems_design
    │   │   ├── images
    │   │   │   ├── cdn.jpg
    │   │   │   ├── availability.jpg
    │   │   │   ├── sharding-1.jpg
    │   │   │   ├── sharding-2.jpg
    │   │   │   ├── swimlane-1.jpg
    │   │   │   ├── swimlane-2.jpg
    │   │   │   ├── microservices.jpg
    │   │   │   ├── first-architecture.jpg
    │   │   │   └── horizontal-scaling.jpg
    │   │   ├── conclusion.md
    │   │   ├── intro.md
    │   │   └── fault-tolerance.md
    │   ├── databases_nosql
    │   │   ├── images
    │   │   │   ├── Quorum.png
    │   │   │   ├── vector_clocks.png
    │   │   │   ├── cluster_quorum.png
    │   │   │   ├── consistent_hashing.png
    │   │   │   └── database_sharding.png
    │   │   └── further_reading.md
    │   ├── linux_networking
    │   │   ├── images
    │   │   │   ├── arp.gif
    │   │   │   ├── pcap.png
    │   │   │   ├── closed.png
    │   │   │   └── established.png
    │   │   ├── conclusion.md
    │   │   ├── intro.md
    │   │   ├── udp.md
    │   │   ├── ipr.md
    │   │   └── tcp.md
    │   ├── databases_sql
    │   │   ├── images
    │   │   │   ├── partial_backup.png
    │   │   │   ├── mysql_architecture.png
    │   │   │   ├── mysqldumpslow_out.png
    │   │   │   ├── rbr_binlog_view_1.png
    │   │   │   ├── sbr_binlog_view_1.png
    │   │   │   ├── innodb_architecture.png
    │   │   │   ├── mysqldump_gtid_text.png
    │   │   │   ├── rbr_example_update_1.png
    │   │   │   ├── replication_function.png
    │   │   │   ├── sbr_example_update_1.png
    │   │   │   └── replication_topologies.png
    │   │   ├── conclusion.md
    │   │   ├── innodb.md
    │   │   ├── intro.md
    │   │   ├── mysql.md
    │   │   └── operations.md
    │   ├── metrics_and_monitoring
    │   │   ├── images
    │   │   │   ├── image1.jpg
    │   │   │   ├── image2.png
    │   │   │   ├── image3.jpg
    │   │   │   ├── image4.jpg
    │   │   │   ├── image5.jpg
    │   │   │   ├── image6.png
    │   │   │   ├── image7.png
    │   │   │   ├── image8.png
    │   │   │   ├── image9.png
    │   │   │   ├── image10.png
    │   │   │   ├── image11.png
    │   │   │   └── image12.png
    │   │   ├── alerts.md
    │   │   ├── best_practices.md
    │   │   ├── third-party_monitoring.md
    │   │   ├── conclusion.md
    │   │   └── command-line_tools.md
    │   ├── linux_basics
    │   │   ├── images
    │   │   │   └── linux
    │   │   │   │   ├── admin
    │   │   │   │   ├── image1.png
    │   │   │   │   ├── image10.png
    │   │   │   │   ├── image11.png
    │   │   │   │   ├── image12.png
    │   │   │   │   ├── image13.png
    │   │   │   │   ├── image14.png
    │   │   │   │   ├── image15.png
    │   │   │   │   ├── image16.png
    │   │   │   │   ├── image17.png
    │   │   │   │   ├── image18.png
    │   │   │   │   ├── image19.png
    │   │   │   │   ├── image2.png
    │   │   │   │   ├── image20.png
    │   │   │   │   ├── image21.png
    │   │   │   │   ├── image22.png
    │   │   │   │   ├── image23.png
    │   │   │   │   ├── image24.png
    │   │   │   │   ├── image25.png
    │   │   │   │   ├── image26.png
    │   │   │   │   ├── image27.png
    │   │   │   │   ├── image28.png
    │   │   │   │   ├── image29.png
    │   │   │   │   ├── image3.png
    │   │   │   │   ├── image30.png
    │   │   │   │   ├── image31.jpg
    │   │   │   │   ├── image32.png
    │   │   │   │   ├── image33.png
    │   │   │   │   ├── image34.png
    │   │   │   │   ├── image35.png
    │   │   │   │   ├── image36.png
    │   │   │   │   ├── image37.png
    │   │   │   │   ├── image38.png
    │   │   │   │   ├── image39.png
    │   │   │   │   ├── image4.png
    │   │   │   │   ├── image40.png
    │   │   │   │   ├── image41.png
    │   │   │   │   ├── image42.png
    │   │   │   │   ├── image43.png
    │   │   │   │   ├── image44.png
    │   │   │   │   ├── image45.png
    │   │   │   │   ├── image46.png
    │   │   │   │   ├── image47.png
    │   │   │   │   ├── image48.png
    │   │   │   │   ├── image49.png
    │   │   │   │   ├── image5.png
    │   │   │   │   ├── image50.png
    │   │   │   │   ├── image51.png
    │   │   │   │   ├── image52.png
    │   │   │   │   ├── image53.png
    │   │   │   │   ├── image54.png
    │   │   │   │   ├── image55.png
    │   │   │   │   ├── image56.png
    │   │   │   │   ├── image57.png
    │   │   │   │   ├── image58.png
    │   │   │   │   ├── image6.png
    │   │   │   │   ├── image7.png
    │   │   │   │   ├── image8.png
    │   │   │   │   └── image9.png
    │   │   │   │   └── commands
    │   │   │   │   ├── image1.png
    │   │   │   │   ├── image2.png
    │   │   │   │   ├── image3.png
    │   │   │   │   ├── image4.png
    │   │   │   │   ├── image5.png
    │   │   │   │   ├── image6.png
    │   │   │   │   ├── image7.png
    │   │   │   │   ├── image8.png
    │   │   │   │   ├── image9.png
    │   │   │   │   ├── image10.png
    │   │   │   │   ├── image11.png
    │   │   │   │   ├── image12.png
    │   │   │   │   ├── image13.png
    │   │   │   │   ├── image14.png
    │   │   │   │   ├── image15.png
    │   │   │   │   ├── image16.png
    │   │   │   │   ├── image17.png
    │   │   │   │   ├── image18.png
    │   │   │   │   ├── image19.png
    │   │   │   │   ├── image20.png
    │   │   │   │   ├── image21.png
    │   │   │   │   ├── image22.png
    │   │   │   │   ├── image23.png
    │   │   │   │   ├── image24.png
    │   │   │   │   ├── image25.png
    │   │   │   │   ├── image26.png
    │   │   │   │   ├── image27.png
    │   │   │   │   ├── image28.png
    │   │   │   │   ├── image29.png
    │   │   │   │   ├── image30.png
    │   │   │   │   ├── image31.png
    │   │   │   │   ├── image32.png
    │   │   │   │   ├── image33.png
    │   │   │   │   ├── image34.png
    │   │   │   │   ├── image35.png
    │   │   │   │   └── image36.png
    │   │   └── conclusion.md
    │   ├── git
    │   │   ├── conclusion.md
    │   │   └── github-hooks.md
    │   ├── messagequeue
    │   │   └── further_reading.md
    │   └── python_web
    │   │   ├── python-web-flask.md
    │   │   └── sre-conclusion.md
    ├── sre_community.md
    ├── stylesheets
    │   └── custom.css
    ├── CONTRIBUTING.md
    └── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── .gitignore
├── img
    └── sos.png
├── requirements.txt
├── NOTICE
├── overrides
    └── partials
    │   ├── nav.html
    │   └── header.html
└── .github
    └── workflows
        └── gh-deploy.yml

/README.md:
--------------------------------------------------------------------------------
1 | courses/index.md
--------------------------------------------------------------------------------
/courses/level102/.level102:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | courses/CONTRIBUTING.md
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .venv
3 | site/
4 |
--------------------------------------------------------------------------------
/img/sos.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/img/sos.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | mkdocs==1.5.3
2 | mkdocs-material==9.5.12
3 | jinja2>=3.0.2
4 |
--------------------------------------------------------------------------------
/courses/img/sos.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/img/sos.png
--------------------------------------------------------------------------------
/courses/img/favicon.ico:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/img/favicon.ico
--------------------------------------------------------------------------------
/courses/level102/networking/media/RTT.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/RTT.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image1.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image10.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image11.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image122.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image122.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image14.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image15.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image17.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image17.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image18.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image18.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image19.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image20.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image22.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image23.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image26.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image26.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image5.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image6.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image7.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image8.png
--------------------------------------------------------------------------------
/courses/level101/security/images/image9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/security/images/image9.png
--------------------------------------------------------------------------------
/courses/level102/networking/media/Anycast.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/Anycast.png
--------------------------------------------------------------------------------
/courses/level101/big_data/images/map_reduce.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/big_data/images/map_reduce.jpg
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/cdn.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/cdn.jpg
--------------------------------------------------------------------------------
/courses/level102/networking/media/Dual ToR.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/Dual ToR.png
--------------------------------------------------------------------------------
/courses/level102/networking/media/LB 1-Arm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/LB 1-Arm.png
--------------------------------------------------------------------------------
/courses/level102/networking/media/LB 2-Arm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/LB 2-Arm.png
--------------------------------------------------------------------------------
/courses/level101/big_data/images/pig_example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/big_data/images/pig_example.png
--------------------------------------------------------------------------------
/courses/level101/databases_nosql/images/Quorum.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_nosql/images/Quorum.png
--------------------------------------------------------------------------------
/courses/level101/linux_networking/images/arp.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_networking/images/arp.gif
--------------------------------------------------------------------------------
/courses/level101/linux_networking/images/pcap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_networking/images/pcap.png
--------------------------------------------------------------------------------
/courses/level102/networking/media/Dual ToR BGP.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/Dual ToR BGP.png
--------------------------------------------------------------------------------
/courses/level102/networking/media/Single ToR.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/networking/media/Single ToR.png
--------------------------------------------------------------------------------
/courses/level101/linux_networking/images/closed.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_networking/images/closed.png
--------------------------------------------------------------------------------
/courses/level101/big_data/images/hadoop_evolution.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/big_data/images/hadoop_evolution.png
--------------------------------------------------------------------------------
/courses/level101/big_data/images/hdfs_architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/big_data/images/hdfs_architecture.png
--------------------------------------------------------------------------------
/courses/level101/big_data/images/mapreduce_example.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/big_data/images/mapreduce_example.jpg
--------------------------------------------------------------------------------
/courses/level101/big_data/images/yarn_architecture.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/big_data/images/yarn_architecture.gif
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/availability.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/availability.jpg
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/sharding-1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/sharding-1.jpg
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/sharding-2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/sharding-2.jpg
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/swimlane-1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/swimlane-1.jpg
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/swimlane-2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/swimlane-2.jpg
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image1.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image10.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image11.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image12.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image12.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image13.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image13.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image14.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image15.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image15.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image16.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image16.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image17.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image17.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image19.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image19.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image2.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image20.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image20.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image21.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image21.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image22.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image22.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image23.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image23.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image24.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image24.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image25.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image25.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image26.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image26.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image28.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image28.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image29.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image29.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image3.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image30.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image30.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image4.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image5.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image6.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image7.png
--------------------------------------------------------------------------------
/courses/level102/linux_intermediate/images/image9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/linux_intermediate/images/image9.png
--------------------------------------------------------------------------------
/courses/level102/system_design/images/microservices.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_design/images/microservices.jpg
--------------------------------------------------------------------------------
/courses/level101/databases_nosql/images/vector_clocks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_nosql/images/vector_clocks.png
--------------------------------------------------------------------------------
/courses/level101/databases_sql/images/partial_backup.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/partial_backup.png
--------------------------------------------------------------------------------
/courses/level101/linux_networking/images/established.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_networking/images/established.png
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image1.jpg
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image2.png
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image3.jpg
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image4.jpg
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image5.jpg
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image6.png
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image7.png
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image8.png
--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/images/image9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image9.png
--------------------------------------------------------------------------------
/courses/level101/systems_design/images/microservices.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/microservices.jpg
--------------------------------------------------------------------------------
/courses/level101/databases_nosql/images/cluster_quorum.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_nosql/images/cluster_quorum.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/mysql_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/mysql_architecture.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/mysqldumpslow_out.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/mysqldumpslow_out.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/rbr_binlog_view_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/rbr_binlog_view_1.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/sbr_binlog_view_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/sbr_binlog_view_1.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image1.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image10.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image10.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image11.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image12.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image13.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image14.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image14.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image15.png 
-------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image16.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image17.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image17.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image18.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image18.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image19.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image19.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image2.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image20.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image20.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image21.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image21.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image22.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image23.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image23.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image24.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image24.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image25.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image25.png -------------------------------------------------------------------------------- 
/courses/level101/linux_basics/images/linux/admin/image26.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image26.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image27.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image27.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image28.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image28.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image29.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image29.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image3.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image30.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image30.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image31.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image31.jpg -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image32.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image32.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image33.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image33.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image34.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image34.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image35.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image35.png -------------------------------------------------------------------------------- 
/courses/level101/linux_basics/images/linux/admin/image36.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image36.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image37.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image37.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image38.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image38.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image39.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image39.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image4.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image40.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image40.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image41.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image41.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image42.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image42.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image43.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image43.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image44.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image44.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image45.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image45.png -------------------------------------------------------------------------------- 
/courses/level101/linux_basics/images/linux/admin/image46.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image46.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image47.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image47.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image48.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image48.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image49.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image49.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image5.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image50.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image50.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image51.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image51.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image52.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image52.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image53.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image53.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image54.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image54.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image55.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image55.png -------------------------------------------------------------------------------- 
/courses/level101/linux_basics/images/linux/admin/image56.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image56.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image57.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image57.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image58.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image58.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image6.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image7.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image8.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image8.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/admin/image9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/admin/image9.png -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/images/image10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image10.png -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/images/image11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image11.png -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/images/image12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/metrics_and_monitoring/images/image12.png -------------------------------------------------------------------------------- /courses/level101/databases_nosql/images/consistent_hashing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_nosql/images/consistent_hashing.png -------------------------------------------------------------------------------- /courses/level101/databases_nosql/images/database_sharding.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_nosql/images/database_sharding.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/innodb_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/innodb_architecture.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/mysqldump_gtid_text.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/mysqldump_gtid_text.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/rbr_example_update_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/rbr_example_update_1.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/replication_function.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/replication_function.png -------------------------------------------------------------------------------- /courses/level101/databases_sql/images/sbr_example_update_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/sbr_example_update_1.png 
-------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image1.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image2.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image3.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image4.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image5.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image6.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image6.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image7.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image8.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image9.png -------------------------------------------------------------------------------- /courses/level101/systems_design/images/first-architecture.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/first-architecture.jpg -------------------------------------------------------------------------------- /courses/level101/systems_design/images/horizontal-scaling.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/systems_design/images/horizontal-scaling.jpg -------------------------------------------------------------------------------- 
/courses/level101/databases_sql/images/replication_topologies.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/databases_sql/images/replication_topologies.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image10.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image11.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image12.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image13.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image14.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image14.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image15.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image15.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image16.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image17.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image17.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image18.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image18.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image19.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image19.png -------------------------------------------------------------------------------- 
/courses/level101/linux_basics/images/linux/commands/image20.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image20.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image21.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image21.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image22.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image23.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image23.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image24.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image24.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image25.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image25.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image26.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image26.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image27.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image27.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image28.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image28.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image29.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image29.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image30.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image30.png -------------------------------------------------------------------------------- 
/courses/level101/linux_basics/images/linux/commands/image31.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image31.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image32.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image32.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image33.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image33.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image34.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image34.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image35.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image35.png -------------------------------------------------------------------------------- /courses/level101/linux_basics/images/linux/commands/image36.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level101/linux_basics/images/linux/commands/image36.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/VM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/VM.png -------------------------------------------------------------------------------- /courses/level102/system_design/images/initial_architecture.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_design/images/initial_architecture.jpeg -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/cg1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/cg1.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/cg2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/cg2.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/cg3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/cg3.png -------------------------------------------------------------------------------- 
/courses/level102/containerization_and_orchestration/images/cg4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/cg4.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/cg5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/cg5.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/cg6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/cg6.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/ns1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/ns1.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/ns2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/ns2.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/ns3.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/ns3.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/ns4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/ns4.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/ns5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/ns5.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube1.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube2.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube3.png 
-------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube4.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube5.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube6.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube7.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube8.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kube9.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kube9.png -------------------------------------------------------------------------------- /courses/level102/system_design/images/initial_application_sketch.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_design/images/initial_application_sketch.jpeg -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/Containers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/Containers.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/dockerengine.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/dockerengine.png -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/images/kubernetes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/containerization_and_orchestration/images/kubernetes.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/FlaskCode.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/FlaskCode.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/FlaskStart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/FlaskStart.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/MemUsage01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/MemUsage01.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/MemUsage02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/MemUsage02.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/MemUsage03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/MemUsage03.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/MemUsageChart.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/MemUsageChart.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/Tracemalloc01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/Tracemalloc01.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/Tracemalloc02.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/Tracemalloc02.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/Tracemalloc03.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/Tracemalloc03.png -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/CD_Image1.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/CD_Image1.JPG -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/CI_Image1.JPG: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/CI_Image1.JPG -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/CI_Image2.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/CI_Image2.JPG -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins1.png -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins2.png -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins3.png -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins4.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins4.png -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins5.png -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/continuous_integration_and_continuous_delivery/images/Jenkins6.png -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/images/TroubleshootingFlow.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_troubleshooting_and_performance/images/TroubleshootingFlow.jpg -------------------------------------------------------------------------------- /courses/level102/system_calls_and_signals/images/Transition_between_User_and_Kernel_Mode.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linkedin/school-of-sre/HEAD/courses/level102/system_calls_and_signals/images/Transition_between_User_and_Kernel_Mode.png -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Copyright 2020 LinkedIn Corporation 2 | All Rights Reserved. 
3 | 4 | Licensed under the Creative Commons Attribution 4.0 International Public License (the "License"). 5 | See LICENSE in the project root for license information. 6 | 7 | This product includes: 8 | * N/A 9 | -------------------------------------------------------------------------------- /courses/level101/git/conclusion.md: -------------------------------------------------------------------------------- 1 | ## What next from here? 2 | 3 | There are many git commands and features that we have not explored here. With this foundation in place, be sure to explore concepts like: 4 | 5 | - Cherry-pick 6 | - Squash 7 | - Amend 8 | - Stash 9 | - Reset 10 | 11 | -------------------------------------------------------------------------------- /courses/sre_community.md: -------------------------------------------------------------------------------- 1 | We have an active [LinkedIn](https://www.linkedin.com) community for School of SRE. 2 | 3 | **Please join the group via**: [https://www.linkedin.com/groups/12493545/](https://www.linkedin.com/groups/12493545/) 4 | 5 | The group has members with different levels of experience in site reliability engineering, and there are active conversations on various technical topics centered around it. We encourage everyone to join the conversation, learn from each other, and build a successful career in the SRE space. 6 | -------------------------------------------------------------------------------- /courses/level101/systems_design/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | Armed with these principles, we hope the course gives you a fresh perspective on designing software systems. Implementing all of this on day zero might be over-engineering, but some principles matter from day zero, such as eliminating single points of failure and making services scalable simply by increasing replicas.
When a bottleneck is reached, we can _split code by services_ or _shard data_ to scale. As the organization matures, bringing in [chaos engineering](https://en.wikipedia.org/wiki/Chaos_engineering) to measure how systems react to failure will help in designing robust software systems. 4 | -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | In this sub-module, we toured the world of containers: why we use containers, how containers evolved from virtual machines (which are by no means obsolete), and how they differ from virtual machines. We then saw how containers are implemented, with emphasis on cgroups and namespaces, along with some hands-on exercises. Finally, we concluded our journey with container orchestration, where we learnt a bit of Kubernetes through some practical examples. 4 | 5 | We hope this module gives you enough knowledge and interest to continue learning and applying these technologies in greater depth! -------------------------------------------------------------------------------- /courses/level102/system_design/conclusion.md: -------------------------------------------------------------------------------- 1 | We have looked at designing a system from scratch, scaling it up from a single server to multiple datacenters and hundreds of thousands of users. However, you might have (rightly!) guessed that there is a lot more to system design than what we have covered so far. This course should give you a sweeping glance at the things that are fundamental to any system design process. The specific solutions, frameworks, and orchestration systems in use evolve rapidly. However, the guiding principles remain the same.
We hope this course helped you get started in the right direction and that you have fun designing systems and solving interesting problems. -------------------------------------------------------------------------------- /courses/stylesheets/custom.css: -------------------------------------------------------------------------------- 1 | div.md-content img { border: 4px solid #ddd; padding: 12px; } 2 | .callout { 3 | padding: 20px; 4 | margin: 20px 0; 5 | border: 1px solid #eee; 6 | border-left-width: 5px; 7 | border-radius: 3px; 8 | h4 { 9 | margin-top: 0; 10 | margin-bottom: 1px; 11 | } 12 | p:last-child { 13 | margin-bottom: 0; 14 | } 15 | code { 16 | border-radius: 3px; 17 | } 18 | } 19 | .callout-info { 20 | border-left-color: #428bca; 21 | h4 { 22 | color: #428bca; 23 | } 24 | } 25 | .callout-primary { 26 | border-left-color: #5bc0de; 27 | h4 { 28 | color: #5bc0de; 29 | } 30 | } 31 | .callout-danger { 32 | border-left-color: #d9534f; 33 | h4 { 34 | color: #d9534f; 35 | } 36 | } 37 | -------------------------------------------------------------------------------- /courses/level101/big_data/tasks.md: -------------------------------------------------------------------------------- 1 | # Tasks and conclusion 2 | 3 | ## Post-training tasks: 4 | 5 | 1. Try setting up your own three-node Hadoop cluster. 6 | 1. A VM-based solution can be found [here](http://hortonworks.com/wp-content/uploads/2015/04/Import_on_VBox_4_07_2015.pdf) 7 | 2. Write a simple Spark/MR job of your choice and understand how to generate analytics from data. 8 | 1. A sample dataset can be found [here](https://grouplens.org/datasets/movielens/) 9 | 10 | ## References: 11 | 1. [Hadoop documentation](http://hadoop.apache.org/docs/current/) 12 | 2. [HDFS Architecture](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) 13 | 3. [YARN Architecture](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html) 14 | 4.
[Google GFS paper](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/035fc972c796d33122033a0614bc94cff1527999.pdf) 15 | -------------------------------------------------------------------------------- /courses/level102/linux_intermediate/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | Understanding package management is very crucial as an SRE, we always want the right set of software with their compatible versions to work in harmony to drive the big infrastructure and organization. 4 | 5 | We also saw how we can configure and use storage drives and how we can have redundancy of data using RAID to avoid the data loss, how data is placed over disk and use of file systems. 6 | 7 | Archiving and Backup is also a crucial part of being an SRE, It’s our responsibility to keep the data safe and in a more efficient manner. 8 | 9 | Bash is very useful to automate the day to day toil that an SRE stumbles into. The above walkthrough of bash gives us an idea to get started, but mere reading through it won’t take you much further. I believe “taking action and practicing the topic” would give you confidence and will help you become a better SRE. 

--------------------------------------------------------------------------------
/overrides/partials/nav.html:
--------------------------------------------------------------------------------
{% import "partials/nav-item.html" as item with context %}


{% set class = "md-nav md-nav--primary" %}
{% if "navigation.tabs" in features %}
  {% set class = class ~ " md-nav--lifted" %}
{% endif %}
{% if "toc.integrate" in features %}
  {% set class = class ~ " md-nav--integrated" %}
{% endif %}



--------------------------------------------------------------------------------
/courses/level102/continuous_integration_and_continuous_delivery/introduction_to_cicd.md:
--------------------------------------------------------------------------------
Continuous Integration and Continuous Delivery, also known as CI/CD, is a set of processes that helps integrate software code changes faster and deploy them to end users reliably. More frequent integrations and deployments help shorten the software development lifecycle.
There are three practices in CI/CD:

* Continuous Integration
* Continuous Delivery
* Continuous Deployment

Let’s look at each of these in detail in the coming sections.

## The Benefits of CI/CD

1. Significant reduction in integration problems.
2. Teams can develop cohesive software more rapidly.
3. Improved collaboration between developers and operations teams reduces production integration issues.
4. Faster delivery of new features with less friction.
5. Better debugging of production issues, with fixes shipped in the next release or patch.

--------------------------------------------------------------------------------
/courses/level102/networking/conclusion.md:
--------------------------------------------------------------------------------

This course should have given you some background on deploying services in a datacentre, the various parameters to consider, and the available solutions. Note that each of the solutions discussed here has its own pros and cons, so the right fit has to be identified for each specific scenario or requirement. As we didn't go into depth on the various technologies and solutions in this course, you may be curious to learn more about some of the topics. Here are some references and online training resources for further learning.

[LinkedIn Engineering blog](https://engineering.linkedin.com/blog/topic/datacenter): has information about how LinkedIn datacentres are set up and how some of the key problems are solved.

[IPSpace blog](https://blog.ipspace.net/tag/data-center.html): has a lot of articles about datacentre networking.

[Networking Basics](https://www.edx.org/course/introduction-to-networking) course on edX.

Happy learning!

--------------------------------------------------------------------------------
/courses/level101/linux_networking/conclusion.md:
--------------------------------------------------------------------------------
# Conclusion

With this, we have traversed the entire TCP/IP stack. We hope you will have a different perspective the next time you open a website in a browser.

During the course, we have also dissected the common tasks in this pipeline that fall under the ambit of SRE.

# Post-Training Exercises
1. Set up your own DNS resolver in the `dev` environment that acts as an authoritative DNS server for `example.com` and as a forwarder for other domains.
   Update `resolv.conf` to use the new DNS resolver running on `localhost`.
2. Set up a site `dummy.example.com` on `localhost` and run a web server with a self-signed certificate. Update the trusted CAs, or pass the self-signed CA’s certificate as a parameter, so that `curl https://dummy.example.com -v` works without a self-signed certificate error.
3. Update the routing table to use another host (container/VM) in the same network as a gateway for `8.8.8.8/32`, and run `ping 8.8.8.8`. Capture packets on the new gateway to verify that the L3 hop is working as expected (you might need to disable `icmp_redirect`).


--------------------------------------------------------------------------------
/courses/level101/databases_sql/conclusion.md:
--------------------------------------------------------------------------------
# Conclusion
We have covered the basic concepts of SQL databases, as well as some of the tasks that an SRE may be responsible for; there is so much more to learn and do. We hope this course gives you a good start and inspires you to explore further.

### Further reading

* More practice with online resources like [this one](https://www.w3resource.com/sql-exercises/index.php)
* [Normalization](https://beginnersbook.com/2015/05/normalization-in-dbms/)
* [Routines](https://dev.mysql.com/doc/refman/8.0/en/stored-routines.html), [triggers](https://dev.mysql.com/doc/refman/8.0/en/trigger-syntax.html)
* [Views](https://www.essentialsql.com/what-is-a-relational-database-view/)
* [Transaction isolation levels](https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html)
* [Sharding](https://www.digitalocean.com/community/tutorials/understanding-database-sharding)
* [Setting up HA](https://severalnines.com/database-blog/introduction-database-high-availability-mysql-mariadb), [monitoring](https://blog.serverdensity.com/how-to-monitor-mysql/), [backups](https://dev.mysql.com/doc/refman/8.0/en/backup-methods.html)

--------------------------------------------------------------------------------
/courses/level102/system_troubleshooting_and_performance/conclusion.md:
--------------------------------------------------------------------------------
Complex systems have many parts that can go wrong: bad design and architecture, poorly managed code, poor caching policies, bad DB queries or database architecture, improper use of resources, a bad OS version, poorly monitored systems, datacenter issues, network faults, and many more.

As an SRE, knowing the important tools and commands, best practices, profiling, benchmarking, and scaling can help you troubleshoot faster and improve the performance of the overall system.

## Further reading

Here are some posts from the LinkedIn Engineering Blog, written by LinkedIn engineers, about the firefighting they did to keep the site up 24x7x365.

- [Taming memory fragmentation in Venice with Jemalloc](https://engineering.linkedin.com/blog/2021/taming-memory-fragmentation-in-venice-with-jemalloc)
- [Intro: Every Day Is Monday in Operations](https://www.linkedin.com/pulse/introduction-every-day-monday-operations-benjamin-purgason)
- [Fixing Linux filesystem performance regressions](https://engineering.linkedin.com/blog/2020/fixing-linux-filesystem-performance-regressions)
- [The impact of slow NFS on data systems](https://engineering.linkedin.com/blog/2020/the-impact-of-slow-nfs-on-data-systems)

--------------------------------------------------------------------------------
/courses/level101/databases_nosql/further_reading.md:
--------------------------------------------------------------------------------
# Conclusion

We have covered basic concepts of NoSQL databases. There is much more to learn and do. We hope this course gives you a good start and inspires you to explore further.

# Further reading

NoSQL:

[https://hostingdata.co.uk/nosql-database/](https://hostingdata.co.uk/nosql-database/)

[https://www.mongodb.com/nosql-explained](https://www.mongodb.com/nosql-explained)

[https://www.mongodb.com/nosql-explained/nosql-vs-sql](https://www.mongodb.com/nosql-explained/nosql-vs-sql)

CAP Theorem

[http://www.julianbrowne.com/article/brewers-cap-theorem](http://www.julianbrowne.com/article/brewers-cap-theorem)

Scalability

[http://www.slideshare.net/jboner/scalability-availability-stability-patterns](http://www.slideshare.net/jboner/scalability-availability-stability-patterns)

Eventual Consistency

[https://www.allthingsdistributed.com/2008/12/eventually_consistent.html](https://www.allthingsdistributed.com/2008/12/eventually_consistent.html)

[https://www.toptal.com/big-data/consistent-hashing](https://www.toptal.com/big-data/consistent-hashing)


[https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf](https://web.stanford.edu/class/cs244/papers/chord_TON_2003.pdf)

--------------------------------------------------------------------------------
/.github/workflows/gh-deploy.yml:
--------------------------------------------------------------------------------
name: Deploy to gh-pages

# Controls when the action will run.
on:
  # Triggers the workflow on push events, but only for the main branch
  push:
    branches: [ main ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  build-and-deploy:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v4
        with:
          # Fetch all branches: the gh-pages branch is needed for the deploy to work
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Deploy
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          mkdocs gh-deploy

--------------------------------------------------------------------------------
/courses/level101/messagequeue/further_reading.md:
--------------------------------------------------------------------------------
# Conclusion

We have covered the basic concepts of message services. There is much more to learn and do. We hope this course gives you a good start and inspires you to explore further.

# Further reading

[https://sudhir.io/the-big-little-guide-to-message-queues](https://sudhir.io/the-big-little-guide-to-message-queues)

[Understanding message brokers: learn the mechanics of messaging through ActiveMQ and Kafka](http://www.oreilly.com/programming/free/understanding-message-brokers.csp)

[Video: The Myth of the Magical Messaging Fabric by Jakub Korab](https://www.youtube.com/watch?v=Ie3--CSpCGs)

[G. Fu, Y. Zhang and G. Yu, "A Fair Comparison of Message Queuing Systems," in IEEE Access, vol. 9, pp. 421-432, 2021, doi: 10.1109/ACCESS.2020.3046503.](https://ieeexplore.ieee.org/document/9303425) ([PDF](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9303425))

[Design Patterns for Cloud Native Applications: Chapter 2 Communication Patterns]()

[Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus](https://docs.microsoft.com/en-us/azure/event-grid/compare-messaging-services)

[Exactly-once message delivery](https://exactly-once.github.io/posts/exactly-once-delivery/)

[Task Queues](https://taskqueues.com/)

[RabbitMQ tutorial](https://www.rabbitmq.com/getstarted.html)

--------------------------------------------------------------------------------
/courses/level102/system_calls_and_signals/conclusion.md:
--------------------------------------------------------------------------------
# Conclusion

One of the main goals of an SRE is to improve the reliability of high-scale systems. In order to achieve this, a basic understanding of the internal workings of a system is necessary.

Knowing how signals work is important, since they play a big role in the lifecycle of processes. Signals are used in a range of operations on processes, from creating a process to killing one. Knowledge of signals is especially important when handling them in programs.
If you anticipate an event that causes signals, you can define a handler function and tell the operating system to run it when that particular type of signal arrives.

Understanding system calls is especially useful for SREs when debugging any Linux process. System calls give precise insight into the internal functionality of an operating system, and studying them gives programmers an in-depth understanding of the C library functions built on top of them at a lower level. With the *strace* command, one can easily debug slow or hung processes.


# Further Reading


--------------------------------------------------------------------------------
/courses/level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline.md:
--------------------------------------------------------------------------------
***Continuous Delivery*** means frequently deploying application builds to non-production environments such as [SIT, UAT, INT](https://medium.com/@buttertechn/qa-testing-what-is-dev-sit-uat-prod-ac97965ce4f) and automatically running integration and acceptance tests there.

In CD, for microservice-based applications, tests are performed on the integrated application rather than on a single microservice. The tests must include all the functional tests and the acceptance tests, which may include UI tests. The build must be immutable; that is, the same package must be deployed across all environments, including Production.

Deployment to Production is often manual, after additional acceptance tests such as performance tests. A fully automated deployment to Production environments is called ***Continuous Deployment*** (whereas ***CD – Continuous Delivery*** doesn’t automatically deploy to Production).
Continuous deployment must have a [feature toggle](https://martinfowler.com/articles/feature-toggles.html) so that a feature can be toggled off without redeploying the code.

Often, deployment involves more than one production environment. For example, in [blue-green environments](https://www.linkedin.com/pulse/using-blue-green-deployments-reduce-downtime-nessan-harpur), the application is first deployed to the blue environment and then to the green environment, so that no downtime is required.


![Continuous Delivery Pipeline](./images/CD_Image1.JPG)

*Fig 3: Continuous Delivery Pipeline*

--------------------------------------------------------------------------------
/courses/CONTRIBUTING.md:
--------------------------------------------------------------------------------
We realise that the initial content we created is just a starting point, and we hope that the community can help in the journey of refining and extending it.

As a contributor, you represent that the content you submit is not plagiarised. By submitting the content, you (and, if applicable, your employer) are licensing the submitted content to LinkedIn and the open source community subject to the Creative Commons Attribution 4.0 International Public License.

*Repository URL*: [https://github.com/linkedin/school-of-sre](https://github.com/linkedin/school-of-sre)

### Contributing Guidelines
Ensure that you adhere to the following guidelines:

* Should be about principles and concepts that can be applied in any company or individual project. Do not focus on particular tools or tech stacks (which usually change over time).
* Adhere to the [Code of Conduct](/school-of-sre/CODE_OF_CONDUCT/).
* Should be relevant to the roles and responsibilities of an SRE.
* Should be locally tested (see steps for testing) and well-formatted.
* It is good practice to open an issue first and discuss your changes before submitting a pull request. This way, you can incorporate ideas from others before you even start.

### Building and testing locally
Run the following commands to build and view the site locally before opening a PR.

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
mkdocs build
mkdocs serve
```

### Opening a PR
Follow the [GitHub PR workflow](https://guides.github.com/introduction/flow/) for your contributions.

Fork this repo, create a feature branch, commit your changes, and open a PR to this repo.

--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/alerts.md:
--------------------------------------------------------------------------------
## 

# Proactive monitoring using alerts
Earlier we discussed different ways to collect key metric data points
from a service and its underlying infrastructure. This data gives us a
better understanding of how the service is performing. One of the main
objectives of monitoring is to detect any service degradations early
(reducing Mean Time To Detect, MTTD) and notify stakeholders so that
the issues are either avoided or can be fixed early, thus reducing Mean
Time To Recover (MTTR). For example, if you are notified when resource
usage by a service exceeds 90%, you can take preventive measures to
avoid any service breakdown due to a shortage of resources. On the
other hand, when a service goes down due to an issue, early detection
and notification of such incidents can help you quickly fix the issue.

![An alert notification received on Slack](images/image11.png)

Figure 8: An alert notification received on Slack

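To make the idea of an alert rule concrete, here is a minimal Python sketch of a threshold alert like the 90% resource-usage example above. It does not reflect the API of any particular monitoring tool: `Alert`, `evaluate_threshold`, `notify`, the metric name, and the sample values are all hypothetical. Requiring several consecutive samples to breach the threshold (so that a single momentary spike does not page anyone) is also an illustrative assumption, not something the text prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Alert:
    """A fired alert: the metric, its latest value, and the threshold it broke."""
    metric: str
    value: float
    threshold: float


def evaluate_threshold(metric: str, samples: Sequence[float],
                       threshold: float = 90.0,
                       consecutive: int = 3) -> Optional[Alert]:
    """Hypothetical rule: fire when the last `consecutive` samples all exceed `threshold`.

    The consecutive-samples requirement is an illustrative assumption that
    avoids alerting on a single spike.
    """
    if len(samples) < consecutive:
        return None  # not enough data to decide yet
    recent = samples[-consecutive:]
    if all(value > threshold for value in recent):
        return Alert(metric=metric, value=recent[-1], threshold=threshold)
    return None


def notify(alert: Alert) -> str:
    # Hypothetical stand-in for an IM/SMS/email/phone integration:
    # it only formats the message that would be sent.
    return (f"[ALERT] {alert.metric} is at {alert.value:.1f}% "
            f"(threshold {alert.threshold:.0f}%)")


if __name__ == "__main__":
    # Hypothetical memory-usage samples, as a percentage of total RAM.
    memory_usage = [72.0, 88.5, 91.2, 93.4, 95.0]
    alert = evaluate_threshold("host.memory.used_percent", memory_usage)
    if alert:
        print(notify(alert))
```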

Today, most monitoring services provide a mechanism to set up
alerts on one metric or a combination of metrics to actively monitor
service health. These alerts have a set of defined rules or conditions,
and when a rule is broken, you are notified. These rules can range from
simply notifying when the metric value exceeds _n_ to something as
complex as a week-over-week (WoW) comparison of standard deviation over
a period of time. Monitoring tools notify you about an active alert, and
most of these tools support instant messaging (IM) platforms, SMS,
email, or phone calls. Figure 8 shows a sample alert notification
received on Slack for memory usage exceeding 90% of total RAM space on
the host.

--------------------------------------------------------------------------------
/courses/level101/security/intro.md:
--------------------------------------------------------------------------------
# Security

## Prerequisites

1. [Linux Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/)

2. [Linux Networking](https://linkedin.github.io/school-of-sre/level101/linux_networking/intro/)


## What to expect from this course

The course covers the fundamentals of information security, touching on system security as well as network and web security. It aims to familiarize you with the basics of information security in day-to-day operations and then, as an SRE, to help you develop a mindset in which security takes a front seat while developing solutions. The course also serves as an introduction to common risks and best practices, along with practical ways to find vulnerable systems and loopholes that might be compromised if not secured.


## What is not covered under this course

The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems.
The course does not deal with hacking or breaking into systems; rather, it focuses on how to ensure you don’t get into those situations, and on making you aware of the different ways a system can be compromised.


## Course Contents

1. [Fundamentals](https://linkedin.github.io/school-of-sre/level101/security/fundamentals/)
2. [Network Security](https://linkedin.github.io/school-of-sre/level101/security/network_security/)
3. [Threats, Attacks & Defence](https://linkedin.github.io/school-of-sre/level101/security/threats_attacks_defences/)
4. [Writing Secure Code & More](https://linkedin.github.io/school-of-sre/level101/security/writing_secure_code/)
5. [Conclusion](https://linkedin.github.io/school-of-sre/level101/security/conclusion/)

--------------------------------------------------------------------------------
/courses/level101/databases_sql/innodb.md:
--------------------------------------------------------------------------------
### Why should you use this?

General purpose, with row-level locking, ACID support, transactions, crash recovery, multi-version concurrency control, etc.


### Architecture

![InnoDB architecture](images/innodb_architecture.png "InnoDB components")


### Key components:

* Memory:
    * Buffer pool: LRU cache of frequently used data (table and index) that can be processed directly from memory, which speeds up processing. Important for tuning performance.
    * Change buffer: Caches changes to secondary index pages when those pages are not in the buffer pool, and merges them when the pages are fetched. Merging may take a long time and impact live queries. It also takes up part of the buffer pool. Avoids the extra I/O needed to read secondary indexes in.
    * Adaptive hash index: Supplements InnoDB’s B-tree indexes with fast hash lookup tables, like a cache. There is a slight performance penalty for misses, and it adds the maintenance overhead of keeping it updated.
Hash collisions cause AHI rebuilding for large DBs.
    * Log buffer: Holds log data before it is flushed to disk.

  The size of each of the above memory areas is configurable and has a large impact on performance. Achieving optimal performance requires careful analysis of the workload and available resources, along with benchmarking and tuning.

* Disk:
    * Tables: Store data in rows and columns.
    * Indexes: Help find rows with specific column values quickly, avoiding full table scans.
    * Redo logs: All transactions are written to them; after a crash, the recovery process uses them to correct data written by incomplete transactions and to replay any pending ones.
    * Undo logs: Records associated with a single transaction that contain information about how to undo the transaction’s latest change.

--------------------------------------------------------------------------------
/courses/level102/networking/rtt.md:
--------------------------------------------------------------------------------
> *Latency plays a key role in determining the overall performance of a
distributed service/application, where calls are made between hosts to
serve the users.*

RTT (round-trip time) is the time it takes for a packet to travel from A
to B and return to A. It is measured in milliseconds. This measure plays
a role in determining the performance of services. Its impact is seen in
the calls made between different servers/services to serve the user, as
well as in the TCP throughput that can be achieved.

It is fairly common for a service to make multiple calls to servers
within its cluster, or to different services like authentication,
logging, database, etc., to respond to each user/client request. These
servers can be spread across different cabinets, and at times even
across different data centres in the same region. Such cases are quite
possible in cloud solutions, where the deployment spreads across
different sites within a region.
As the RTT increases, the response time for each of the calls gets
longer, which has a cascading effect on the final response sent to the
user.

### Relation of RTT and throughput

TCP throughput is inversely proportional to RTT. As RTT increases, TCP
throughput decreases, just as it does with packet loss. Below is a
formula to estimate TCP throughput based on the TCP MSS, RTT, and packet
loss.

![Formula estimating TCP throughput from MSS, RTT, and packet loss](./media/RTT.png)

As within a data centre, these calculations are also important for
communication over the Internet, where a client can connect to
DC-hosted services over different telco networks and the RTT is not
very stable, due to the unpredictability of Internet routing policies.

--------------------------------------------------------------------------------
/courses/level101/metrics_and_monitoring/best_practices.md:
--------------------------------------------------------------------------------
## 

# Best practices for monitoring

When setting up monitoring for a service, keep the following best
practices in mind.

- **Use the right metric type**: Most of the libraries available
  today offer various metric types. Choose the appropriate metric
  type for monitoring your system. Following are the types of
  metrics and their purposes.

    - **Gauge**: *Gauge* is a constant type of metric. After the
      metric is initialized, the metric value does not change unless
      you intentionally update it.

    - **Timer**: *Timer* measures the time taken to complete a
      task.

    - **Counter**: *Counter* counts the number of occurrences of a
      particular event.

  For more information about these metric types, see [Data
  Types](https://statsd.readthedocs.io/en/v0.5.0/types.html).

- **Avoid over-monitoring**: Monitoring can be a significant
  engineering endeavor. Therefore, be sure not to spend too much
  time and too many resources on monitoring services, yet make sure all
  important metrics are captured.

- **Prevent alert fatigue**: Set alerts for metrics that are
  important and actionable. If you receive too many non-critical
  alerts, you might start ignoring alert notifications over time. As
  a result, critical alerts might get overlooked.

- **Have a runbook for alerts**: For every alert, make sure you have
  a document explaining what actions and checks need to be performed
  when the alert fires. This enables any engineer on the team to
  handle the alert and take necessary actions, without any help from
  others.

--------------------------------------------------------------------------------
/courses/level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline.md:
--------------------------------------------------------------------------------
CI is a software development practice where members of a team integrate their work frequently. Each integration is verified by an automated build (including tests) to detect integration errors as quickly as possible.

Continuous integration requires that all code changes be maintained in a single code repository, where all members can regularly push their changes to their feature branches. The code changes must be quickly integrated with the rest of the code, and automated builds should run and give feedback to the member so that issues are resolved early.

There should be a CI server that triggers a build as soon as a member pushes code. The build typically involves compiling the code and transforming it into an executable artifact such as a JAR or DLL, a step called packaging. It must also run [unit tests](https://en.wikipedia.org/wiki/Unit_testing) with code coverage.
Optionally, the build process can have additional stages, such as static code analysis and vulnerability checks.

[Jenkins](https://www.jenkins.io/), [Bamboo](https://confluence.atlassian.com/bamboo/understanding-the-bamboo-ci-server-289277285.html), [Travis CI](https://travis-ci.org/), [GitLab](https://about.gitlab.com/), and [Azure DevOps](https://azure.microsoft.com/en-in/services/devops/) are a few popular CI tools. These tools provide various plugins and integrations, such as [Ant](https://ant.apache.org/) and [Maven](https://maven.apache.org/) for building and packaging, and JUnit and Selenium for running tests. [SonarQube](https://www.sonarqube.org/) can be used for static code analysis and code security.


![Continuous Integration Pipeline](./images/CI_Image1.JPG)

*Fig 1: Continuous Integration Pipeline*

![Continuous Integration Process](./images/CI_Image2.JPG)

*Fig 2: Continuous Integration Process*

--------------------------------------------------------------------------------
/courses/level101/databases_sql/intro.md:
--------------------------------------------------------------------------------
# Relational Databases

### Prerequisites
* Complete the [Linux course](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/)
* Install Docker (for the lab section)

### What to expect from this course
You will gain an understanding of what relational databases are, their advantages, and some MySQL-specific concepts.

### What is not covered under this course
* In-depth implementation details

* Advanced topics like normalization, sharding

* Specific tools for administration

### Introduction
The main purpose of database systems is to manage data. This includes storage, adding new data, deleting unused data, updating existing data, retrieving data within a reasonable response time, and other maintenance tasks to keep the system running.
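
As a rough illustration of those core operations (storing, adding, updating, deleting, and retrieving data), here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite stands in for a server-based RDBMS such as MySQL only so that the example runs without any setup; the `users` table and its rows are hypothetical.

```python
import sqlite3


def demo() -> list[tuple[int, str, str]]:
    # An in-memory SQLite database stands in for a real RDBMS such as MySQL.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Storage: define a table (hypothetical schema for illustration).
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, team TEXT)")

    # Add new data; the ? placeholders keep values out of the SQL text.
    cur.executemany(
        "INSERT INTO users (name, team) VALUES (?, ?)",
        [("alice", "sre"), ("bob", "dba"), ("carol", "sre")],
    )

    # Update existing data.
    cur.execute("UPDATE users SET team = ? WHERE name = ?", ("sre", "bob"))

    # Delete unused data.
    cur.execute("DELETE FROM users WHERE name = ?", ("carol",))
    conn.commit()

    # Retrieve data.
    rows = cur.execute("SELECT id, name, team FROM users ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```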

### Pre-reads
[RDBMS Concepts](https://beginnersbook.com/2015/04/rdbms-concepts/)

### Course Contents
- [Key Concepts](https://linkedin.github.io/school-of-sre/level101/databases_sql/concepts/)
- [MySQL Architecture](https://linkedin.github.io/school-of-sre/level101/databases_sql/mysql/#mysql-architecture)
- [InnoDB](https://linkedin.github.io/school-of-sre/level101/databases_sql/innodb/)
- [Backup and Recovery](https://linkedin.github.io/school-of-sre/level101/databases_sql/backup_recovery/)
- [MySQL Replication](https://linkedin.github.io/school-of-sre/level101/databases_sql/replication/)
- Operational Concepts
    - [SELECT Query](https://linkedin.github.io/school-of-sre/level101/databases_sql/select_query/)
    - [Query Performance](https://linkedin.github.io/school-of-sre/level101/databases_sql/query_performance/)
- [Lab](https://linkedin.github.io/school-of-sre/level101/databases_sql/lab/)
- [Further Reading](https://linkedin.github.io/school-of-sre/level101/databases_sql/conclusion/#further-reading)

--------------------------------------------------------------------------------
/courses/level102/continuous_integration_and_continuous_delivery/introduction.md:
--------------------------------------------------------------------------------
## Prerequisites
1. [Software Development and Maintenance](https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Implementation/Documentation)
2. [Git](https://linkedin.github.io/school-of-sre/level101/git/git-basics/)
3. [Docker](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/)

## What to expect from this course?
In this course, you will learn the basics of CI/CD and how it helps drive the SRE discipline in an organization.
It also discusses various DevOps tools used in CI/CD practice and includes a hands-on lab session on a [Jenkins](https://www.jenkins.io/)-based pipeline. Finally, it concludes by explaining the role of CI/CD in the growing SRE philosophy. 8 | 9 | ## What is not covered under this course? 10 | The course does not comprehensively cover DevOps elements such as Infrastructure as Code or continuous monitoring of applications and infrastructure. 11 | 12 | ## Table of Contents 13 | 14 | * [What is CI/CD?](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/introduction_to_cicd) 15 | * [Brief History of CI/CD and DevOps](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/cicd_brief_history) 16 | * [Continuous Integration](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline) 17 | * [Continuous Delivery and Deployment](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline) 18 | * [Jenkins based CI/CD pipeline - Hands-on](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab) 19 | * [Conclusion](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/conclusion) 20 | -------------------------------------------------------------------------------- /courses/level102/linux_intermediate/introvim.md: -------------------------------------------------------------------------------- 1 | 2 | # Introduction to Vim 3 | 4 | ## Introduction 5 | As SREs, we frequently log into servers to change config files and to edit and modify scripts, and the editor that comes in handy and is available in almost every Linux distribution is Vim. Vim is a free, open-source command-line editor that is widely used.
We will look at the basics of using Vim to create and edit files. This knowledge will help in the next section, Scripting. 6 | 7 | ## Opening a file and using insert mode 8 | 9 | We use the command *`vim filename`* to open a file *`filename`*. The terminal opens the editor, but if you start typing right away, your text won't be inserted. That is because we are not in "INSERT" mode in Vim. 10 | 11 | Press ***`i`*** to get into insert mode and start writing. 12 | 13 | ![](images/image2.png) 14 | 15 | After pressing ***`i`***, you will see “INSERT” at the bottom left. Press the *`ESC`* key to get back to normal mode. 16 | 17 | ## Saving a file 18 | 19 | After you insert your text in INSERT mode, press the ESC (escape) key on your keyboard to get out of it. Press `:` (colon, Shift + ;), then press ***`w`*** and hit Enter; the text you entered is written to the file. 20 | 21 | ![](images/image19.png) 22 | 23 | ## Exiting the Vim editor 24 | 25 | Exiting Vim can be challenging for beginners. There are various ways to exit Vim, for example, exiting without saving your work or exiting after saving it. 26 | 27 | Try the commands below after exiting insert mode and pressing ***`:`*** (colon). 28 | 29 | | Vim Commands | Description | 30 | | --- | --- | 31 | | **:q** | Exit the file; won’t exit if the file has unsaved changes. | 32 | | **:wq** | Write (save) and exit the file. | 33 | | **:q!** | Exit without saving the changes. | 34 | 35 | These basics are all we will need for bash scripting in the next section. You can always refer to a Vim tutorial to learn more.
For quick practice of vim commands visit: [https://www.openvim.com/](https://www.openvim.com/) 36 | 37 | -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/third-party_monitoring.md: -------------------------------------------------------------------------------- 1 | ## 2 | 3 | # Third-party monitoring 4 | 5 | Today most cloud providers offer a variety of monitoring solutions. In 6 | addition, a number of companies such as 7 | [Datadog](https://www.datadoghq.com/) offer 8 | monitoring-as-a-service. In this section, we are not covering 9 | monitoring-as-a-service in depth. 10 | 11 | In recent years, more and more people have access to the Internet. Many 12 | services are offered online to cater to the increasing user base. As a 13 | result, web pages are becoming larger, with increased client-side 14 | scripts. Users want these services to be fast and error-free. From the 15 | service point of view, when the response body is composed, an HTTP 200 16 | OK response is sent, and everything looks okay. But there might be 17 | errors during transmission or on the client-side. As previously 18 | mentioned, monitoring services from within the service infrastructure 19 | give good visibility into service health, but this is not enough. You 20 | need to monitor user experience, specifically the availability of 21 | services for clients. A number of third-party services such as 22 | [Catchpoint](https://www.catchpoint.com/), 23 | [Pingdom](https://www.pingdom.com/), and so on are available for 24 | achieving this goal. 25 | 26 | Third-party monitoring services can generate synthetic traffic 27 | simulating user requests from various parts of the world, to ensure the 28 | service is globally accessible. Other third-party monitoring solutions 29 | for real user monitoring (RUM) provide performance statistics such as 30 | service uptime and response time, from different geographical locations. 
31 | This allows you to monitor the user experience from these locations, 32 | which might have different Internet backbones, different operating 33 | systems, and different browsers and browser versions. [Catchpoint 34 | Global Monitoring 35 | Network](https://pages.catchpoint.com/overview-video) is a 36 | comprehensive 3-minute video that explains the importance of monitoring 37 | the client experience. 38 | -------------------------------------------------------------------------------- /courses/level101/git/github-hooks.md: -------------------------------------------------------------------------------- 1 | # Git with GitHub 2 | 3 | So far, all the operations we performed were in our local repo, but Git also helps us work in a collaborative environment. GitHub is one place on the Internet where you can centrally host your Git repos and collaborate with other developers. 4 | 5 | Most of the workflow remains the same as we discussed, with the addition of a couple of things: 6 | 7 | 1. Pull: to pull the latest changes from the GitHub (central) repo 8 | 2. Push: to push your changes to the GitHub repo so that they are available to everyone 9 | 10 | GitHub has written nice guides and tutorials about this, and you can refer to them here: 11 | 12 | - [GitHub Hello World](https://guides.github.com/activities/hello-world/) 13 | - [Git Handbook](https://guides.github.com/introduction/git-handbook/) 14 | 15 | ## Hooks 16 | 17 | Git has another nice feature called hooks. Hooks are scripts that are called when a certain event happens. Here is where hooks are located: 18 | 19 | ```bash 20 | $ ls .git/hooks/ 21 | applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample 22 | commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample 23 | ``` 24 | 25 | The names are self-explanatory. These hooks are useful when you want to perform an action whenever a particular event occurs.
If you want to run tests before pushing code, you would set up a `pre-push` hook. Let's try creating a pre-commit hook. 26 | 27 | ```bash 28 | $ echo "echo this is from pre commit hook" > .git/hooks/pre-commit 29 | $ chmod +x .git/hooks/pre-commit 30 | ``` 31 | 32 | We create a file called `pre-commit` in the hooks folder and make it executable. Now, if we make a commit, we should see the message printed. 33 | 34 | ```bash 35 | $ echo "sample file" > sample.txt 36 | $ git add sample.txt 37 | $ git commit -m "adding sample file" 38 | this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION 39 | [master 9894e05] adding sample file 40 | 1 file changed, 1 insertion(+) 41 | create mode 100644 sample.txt 42 | ``` 43 | -------------------------------------------------------------------------------- /courses/level101/linux_networking/intro.md: -------------------------------------------------------------------------------- 1 | # Linux Networking Fundamentals 2 | 3 | ## Prerequisites 4 | 5 | - High-level knowledge of commonly used jargon in the TCP/IP stack, like DNS, TCP, UDP, and HTTP 6 | - [Linux Commandline Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/command_line_basics/) 7 | 8 | ## What to expect from this course 9 | 10 | Throughout the course, we cover how an SRE can optimize the system to improve web stack performance and troubleshoot issues in any layer of the networking stack. This course digs through each layer of the traditional TCP/IP stack and expects an SRE to have a picture beyond a bird’s-eye view of how the Internet functions. 11 | 12 | ## What is not covered under this course 13 | 14 | This course spends time on the fundamentals.
We are not covering concepts like [HTTP/2.0](https://en.wikipedia.org/wiki/HTTP/2), [QUIC](https://en.wikipedia.org/wiki/QUIC), [TCP congestion control protocols](https://en.wikipedia.org/wiki/TCP_congestion_control), [Anycast](https://en.wikipedia.org/wiki/Anycast), [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol), [CDN](https://en.wikipedia.org/wiki/Content_delivery_network), [Tunnels](https://en.wikipedia.org/wiki/Virtual_private_network) and [Multicast](https://en.wikipedia.org/wiki/Multicast). We expect this course to provide the basics needed to understand such concepts. 15 | 16 | ## Bird’s-eye view of the course 17 | 18 | The course covers the question “What happens when you open [linkedin.com](https://www.linkedin.com) in your browser?” The course follows the flow of the TCP/IP stack. More specifically, it covers application layer protocols (DNS and HTTP), transport layer protocols (UDP and TCP), the network layer protocol (IP), and the data link layer protocol. 19 | 20 | ## Course Contents 21 | 1. [DNS](https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/) 22 | 2. [UDP](https://linkedin.github.io/school-of-sre/level101/linux_networking/udp/) 23 | 3. [HTTP](https://linkedin.github.io/school-of-sre/level101/linux_networking/http/) 24 | 4. [TCP](https://linkedin.github.io/school-of-sre/level101/linux_networking/tcp/) 25 | 5. [IP Routing](https://linkedin.github.io/school-of-sre/level101/linux_networking/ipr/) 26 | -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/conclusion.md: -------------------------------------------------------------------------------- 1 | ## Applications in SRE Role 2 | 3 | Monitoring, automation, and eliminating toil are some of the core pillars of the SRE discipline. As an SRE, you may be required to spend about 50% of your time automating repetitive tasks to eliminate toil.
CI/CD pipelines are one of the crucial tools for SREs. They help deliver quality applications through smaller, more frequent builds. Additionally, CI/CD metrics such as deployment time, success rate, cycle time, and automated test success rate are key things to watch to improve product quality and, in turn, the reliability of the applications. 4 | 5 | * [Infrastructure-as-code](https://en.wikipedia.org/wiki/Infrastructure_as_code) is one of the standard practices followed in SRE for automating repetitive configuration tasks. Every configuration is maintained as code, so it can be deployed using CI/CD pipelines. It is important to deliver configuration changes to production environments through CI/CD pipelines to maintain versioning and consistency of the changes across environments, and to avoid manual errors. 6 | * Often, as an SRE, you are required to review application CI/CD pipelines and recommend additional stages, such as static code analysis and security and privacy checks in the code, to improve the security and reliability of the product. 7 | 8 | ## Conclusion 9 | 10 | In this chapter, we have studied CI/CD pipelines, with a brief history of the challenges of traditional build practices. We have also looked at how CI/CD pipelines augment the SRE discipline. Using CI/CD pipelines in the software development life cycle is a modern approach in the SRE realm that helps achieve greater efficiency. 11 | 12 | We have also performed a hands-on lab activity on creating a CI/CD pipeline using Jenkins. 13 | 14 | ## References 15 | 16 | 1. [Continuous Integration (martinfowler.com)](https://martinfowler.com/articles/continuousIntegration.html) 17 | 2. [CI/CD for microservices - Azure Architecture Center | Microsoft Docs](https://docs.microsoft.com/en-us/azure/architecture/microservices/ci-cd) 18 | 3.
[SREFoundationBlueprint_2 (devopsinstitute.com)](https://www.devopsinstitute.com/wp-content/uploads/2020/11/SREF-Blueprint.pdf) 19 | 4. [Jenkins User Documentation](https://www.jenkins.io/doc/) -------------------------------------------------------------------------------- /courses/level102/linux_intermediate/package_management.md: -------------------------------------------------------------------------------- 1 | # Package Management 2 | ## Introduction 3 | 4 | One of the main features of any operating system is the ability to run other programs and software, which is where package management comes into the picture. Package management is a method of installing and maintaining software programs on an operating system. 5 | 6 | ## Package 7 | 8 | In the early days of Linux, one had to download the source code of any software and compile it to install and run it. As the Linux ecosystem matured, it became clear that the software landscape is very dynamic, and software started being distributed in the form of packages. A package file is a compressed collection of files that contains the software, installation instructions, and metadata about the package, including the list of its dependencies. 9 | 10 | ## Dependencies 11 | 12 | A software package is rarely stand-alone; it depends on other software, libraries, and modules. These subroutines are stored and made available in the form of shared libraries, which may serve more than one program. These shared resources are called dependencies. Package management does the hard job of resolving dependencies and installing them for the user along with the software. 13 | 14 | ## Repository 15 | 16 | A repository is a storage location where packages, updates, and dependencies are stored. Each repository can contain thousands of software packages hosted on a remote server, intended to be installed and updated on Linux systems.
We usually refresh the package information (*often referred to as metadata*) and upgrade installed packages by running *`sudo dnf update`*. 17 | 18 | ![](images/image29.png) 19 | 20 | Try out *`sudo dnf repolist all`* to list all the repositories. 21 | 22 | We usually add repositories to install packages from third-party vendors. 23 | 24 | > dnf config-manager --add-repo http://www.example.com/example.repo 25 | 26 | ## High-Level and Low-Level Package Management Tools 27 | 28 | There are mainly two types of package management tools: 29 | 30 | > 1\. *Low-level tools*: These are mostly used for installing, removing, and upgrading package files. 31 | > 32 | > 2\. *High-level tools*: In addition to what low-level tools do, high-level tools also perform metadata searching and dependency resolution. 33 | 34 | | Linux Distribution | Low-Level Tools | High-Level Tools | 35 | | --- | --- | --- | 36 | | Debian | dpkg | apt-get | 37 | | Fedora, RedHat | rpm | dnf | 38 | 39 | -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | A robust monitoring and alerting system is necessary for maintaining and 4 | troubleshooting a system. A dashboard with key metrics can give you an 5 | overview of service performance, all in one place. Well-defined alerts 6 | (with realistic thresholds and notifications) further enable you to 7 | quickly identify any anomalies in the service infrastructure and in 8 | resource saturation. By taking necessary actions, you can avoid any 9 | service degradations and decrease MTTD for service breakdowns. 10 | 11 | In addition to in-house monitoring, monitoring real-user experience can 12 | help you to understand service performance as perceived by the users. 13 | Many modules are involved in serving the user, and most of them are out 14 | of your control.
Therefore, you need to have real-user monitoring in 15 | place. 16 | 17 | Metrics give very abstract details on service performance. To get a 18 | better understanding of the system and for faster recovery during 19 | incidents, you might want to implement the other two pillars of 20 | observability: logs and tracing. Logs and trace data can help you 21 | understand what led to service failure or degradation. 22 | 23 | The following are some resources to learn more about monitoring and 24 | observability: 25 | 26 | - [Google SRE book: Monitoring Distributed 27 | Systems](https://sre.google/sre-book/monitoring-distributed-systems/) 28 | 29 | - [Mastering Distributed Tracing by Yuri 30 | Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/) 31 | 32 | 33 | ## References 34 | 35 | - [Google SRE book: Monitoring Distributed 36 | Systems](https://sre.google/sre-book/monitoring-distributed-systems/) 37 | 38 | - [Mastering Distributed Tracing, by Yuri 39 | Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/) 40 | 41 | - [Monitoring and 42 | Observability](https://copyconstruct.medium.com/monitoring-and-observability-8417d1952e1c) 43 | 44 | - [Three Pillars with Zero 45 | Answers](https://medium.com/lightstephq/three-pillars-with-zero-answers-2a98b36358b8) 46 | 47 | - Engineering blogs on 48 | [LinkedIn](https://engineering.linkedin.com/blog/topic/monitoring), 49 | [Grafana](https://grafana.com/blog/), 50 | [Elastic.co](https://www.elastic.co/blog/), 51 | [OpenTelemetry](https://medium.com/opentelemetry) 52 | -------------------------------------------------------------------------------- /courses/level102/system_calls_and_signals/intro.md: -------------------------------------------------------------------------------- 1 | # System Calls and Signals 2 | 3 | ## Prerequisites 4 | 5 | - [Linux Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/) 6 | - [Python
Basics](https://linkedin.github.io/school-of-sre/level101/python_web/intro/) 7 | 8 | ## What to expect from this course 9 | 10 | The course covers a fundamental understanding of signals and system calls. It sheds light on how knowledge of signals and system calls can be helpful for an SRE. 11 | 12 | ## What is not covered under this course 13 | 14 | The course does not discuss any interrupts or interrupt handling other than signals. It will not deep dive into signal handlers or the GNU C library. 15 | 16 | ## Course Contents 17 | - [Signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals) 18 | - [Introduction to interrupts and signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#introduction-to-interrupts-and-signals) 19 | - [Types of signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#types-of-signals) 20 | - [Sending signals to processes](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#sending-signals-to-process) 21 | - [Handling signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#handling-signals) 22 | - [Role of signals in system calls with the example of *wait()*](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#role-of-signals-in-system-calls-with-the-example-of-wait) 23 | - [System calls](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls) 24 | - [Introduction](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#introduction) 25 | - [Types of system calls](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#types-of-system-calls) 26 | - [User mode, kernel mode and their transitions](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#user-mode-kernel-mode-and-their-transitions)
27 | - [Working of *write()* system call](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#working-of-write-system-call) 28 | - [Debugging in Linux with strace](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#debugging-in-linux-with-strace) 29 | 30 | -------------------------------------------------------------------------------- /courses/level101/security/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | Now that you have completed this course on Security, you are aware of the possible security threats to computer systems and networks. You are also better able to protect your systems and to recommend security measures to others. 4 | 5 | This course provides fundamental, everyday knowledge of the security domain, which will also help you keep security at the top of your priorities. 6 | 7 | ## Other Resources 8 | 9 | Some books that are great resources: 10 | 11 | - Holistic Info-Sec for Web Developers ()—Free and downloadable book series with very broad and deep coverage of what Web Developers and DevOps Engineers need to know in order to create robust, reliable, maintainable and secure software, networks and other, that are delivered continuously, on time, with no nasty surprises. 12 | 13 | - Docker Security: Quick Reference—For DevOps Engineers ()—A book on understanding the Docker security defaults, how to improve them (theory and practical), along with many tools and techniques.
14 | 15 | - How to Hack Like a Legend ()—A hacker’s tale breaking into a secretive offshore company, Sparc Flow, 2018 16 | 17 | - How to Investigate Like a Rockstar ()—Live a real crisis to master the secrets of forensic analysis, Sparc Flow, 2017 18 | 19 | - Real World Cryptography ()—This early-access book teaches you applied cryptographic techniques to understand and apply security at every level of your systems and applications. 20 | 21 | - AWS Security ()—This early-access book covers common AWS security issues and best practices for access policies, data protection, auditing, continuous monitoring, and incident response. 22 | 23 | ## Post-Training Asks / Further Reading 24 | 25 | - CTF Events like: 26 | - Penetration Testing: 27 | - Threat Intelligence: 28 | - Threat Detection & Hunting: 29 | - Web Security: 30 | - Building Secure and Reliable Systems: 31 | -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/important-tools.md: -------------------------------------------------------------------------------- 1 | ### Important Linux commands 2 | 3 | Knowing the following commands will help you find issues faster. Explaining each command in detail is out of scope; please refer to the man pages or search online for more information and examples. 4 | 5 | - For log parsing: grep, sed, awk, cut, tail, head 6 | - For network checks: nc, netstat, traceroute/6, mtr, ping/6, route, tcpdump, ss, ip 7 | - For DNS: dig, host, nslookup 8 | - For tracing system calls: strace 9 | - For parallel execution over SSH: GNU parallel, xargs + ssh.
10 | - For HTTP/S checks: curl, wget 11 | - For listing open files: lsof 12 | - For modifying kernel attributes: [sysctl](https://man7.org/linux/man-pages/man8/sysctl.8.html) 13 | 14 | In the case of distributed systems, some good third-party tools can help execute commands/instructions on many hosts at once, such as: 15 | 16 | - **SSH-based tools** 17 | - [ClusterSSH](https://github.com/duncs/clusterssh): ClusterSSH can help you run a command in parallel on many hosts at once. 18 | - [Ansible](https://github.com/ansible/ansible): It allows you to write Ansible playbooks that you can run on hundreds or thousands of hosts at the same time. 19 | - **Agent-based tools** 20 | - [Saltstack](https://github.com/saltstack/salt): A configuration, state, and remote execution framework that gives users a lot of flexibility to execute modules on large numbers of hosts at once. 21 | - [Puppet](https://github.com/puppetlabs/puppet): An automated administrative engine for Linux, Unix, and Windows systems that performs administrative tasks. 22 | 23 | ### Log analysis tools 24 | 25 | These tools help in writing SQL-like queries for parsing and analysing logs, and provide an easy UI to create dashboards that render various types of charts based on defined queries. 26 | 27 | - [ELK](https://www.elastic.co/what-is/elk-stack): Elasticsearch, Logstash, and Kibana provide a package of tools and services to parse, index, and analyse logs easily and quickly. Once logs/data are parsed/filtered through Logstash and indexed in Elasticsearch, one can create dynamic dashboards in Kibana in a matter of minutes. This makes it easy to analyse and correlate application errors/exceptions/warnings.
28 | - [Azure Kusto](https://docs.microsoft.com/en-us/azure/data-explorer): Azure Kusto (Azure Data Explorer) is a cloud-based service similar to Elasticsearch and Kibana. It allows easy indexing of large volumes of logs, provides a SQL-like interface for writing queries, and offers an interface to create dynamic dashboards. 29 | 30 | -------------------------------------------------------------------------------- /overrides/partials/header.html: -------------------------------------------------------------------------------- 1 | {% block libs %} 2 | 3 | 10 | {% endblock %} 11 | {% set site_url = config.site_url | default(nav.homepage.url, true) | url %} 12 | {% if not config.use_directory_urls and site_url[0] == site_url[-1] == "." %} 13 | {% set site_url = site_url ~ "/index.html" %} 14 | {% endif %} 15 |
16 | 61 |
62 | -------------------------------------------------------------------------------- /courses/level101/linux_basics/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | We have covered the basics of Linux operating systems and basic commands used in Linux. 4 | We have also covered the Linux server administration commands. 5 | 6 | We hope that this course will make it easier for you to operate on the command line. 7 | 8 | ## Applications in SRE Role 9 | 10 | 1. As an SRE, you will be required to perform some general tasks on these Linux servers. You will also be using the command line when you are troubleshooting issues. 11 | 2. Moving from one location to another in the filesystem requires the `ls`, `pwd`, and `cd` commands. 12 | 3. You may need to search for specific information in log files. The `grep` command is very useful here. I/O redirection comes in handy if you want to store the output in a file or pass it as input to another command. 13 | 4. The `tail` command is very useful for viewing the latest data in a log file. 14 | 5. Different users will have different permissions depending on their roles. For security reasons, we also don't want everyone in the company to access our servers. User permissions can be restricted with the `chown`, `chmod`, and `chgrp` commands. 15 | 6. `ssh` is one of the most frequently used commands for an SRE. Troubleshooting and performing basic administration tasks on a server are only possible once we are able to log in to it. 16 | 7. What if we want to run an Apache server or NGINX on a server? We will first install it using the package manager. Package management commands become important here. 17 | 8. Managing services on servers is another critical responsibility of an SRE. `systemd`-related commands can help in troubleshooting issues. If a service goes down, we can start it using the `systemctl start` command.
We can also stop a service if it is not needed. 18 | 9. Monitoring is another core responsibility of an SRE. Memory and CPU are two important system-level metrics that should be monitored. Commands like `top` and `free` are quite helpful here. 19 | 10. If a service throws an error, how do we find the root cause? We will need to check the logs to find the full stack trace of the error. The log file will also tell us how many times the error has occurred and when it started. 20 | 21 | ## Useful Courses and Tutorials 22 | 23 | * [edX basic Linux commands course](https://courses.edx.org/courses/course-v1:LinuxFoundationX+LFS101x+1T2020/course/) 24 | * [edX Red Hat Enterprise Linux course](https://courses.edx.org/courses/course-v1:RedHat+RH066x+2T2017/course/) 25 | * [https://linuxcommand.org/lc3_learning_the_shell.php](https://linuxcommand.org/lc3_learning_the_shell.php) 26 | -------------------------------------------------------------------------------- /courses/level101/systems_design/intro.md: -------------------------------------------------------------------------------- 1 | # Systems Design 2 | 3 | ## Prerequisites 4 | 5 | Fundamentals of common software system components: 6 | 7 | - [Linux Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/) 8 | - [Linux Networking](https://linkedin.github.io/school-of-sre/level101/linux_networking/intro/) 9 | - Databases RDBMS 10 | - [NoSQL Concepts](https://linkedin.github.io/school-of-sre/level101/databases_nosql/intro/) 11 | 12 | ## What to expect from this course 13 | 14 | Thinking about and designing for scalability, availability, and reliability of large-scale software systems. 15 | 16 | ## What is not covered under this course 17 | 18 | Scalability and reliability concerns of individual software components, e.g., databases. While the same scalability principles and thinking can be applied, these individual components have their own specific nuances when it comes to scaling them and thinking about their reliability. 
19 | 20 | The course sheds more light on concepts than on setting up and configuring components like load balancers to achieve scalability, availability, and reliability of systems. 21 | 22 | ## Course Contents 23 | 24 | - [Introduction](https://linkedin.github.io/school-of-sre/level101/systems_design/intro/#backstory) 25 | - [Scalability](https://linkedin.github.io/school-of-sre/level101/systems_design/scalability/) 26 | - [High Availability](https://linkedin.github.io/school-of-sre/level101/systems_design/availability/) 27 | - [Fault Tolerance](https://linkedin.github.io/school-of-sre/level101/systems_design/fault-tolerance/) 28 | 29 | 30 | ## Introduction 31 | 32 | So, how do you go about learning to design a system? 33 | 34 | "*Like most great questions, it showed a level of naivety that was breathtaking. The only short answer I could give was, essentially, that you learned how to design a system by designing systems and finding out what works and what doesn’t work.*"—Jim Waldo, Sun Microsystems, On System Design 35 | 36 | 37 | As software and hardware systems have multiple moving parts, we need to think about how those parts will grow, their failure modes, their interdependencies, and how all of this will impact users and the business. 38 | 39 | There is no one-shot method for learning or doing system design; we only learn to design systems by designing and iterating on them. 40 | 41 | This course is a starting point to get you thinking about _scalability_, _availability_, and _fault tolerance_ during systems design. 42 | 43 | ## Backstory 44 | 45 | Let’s design a simple content-sharing application where users can share photos and other media that their friends can like.
Let’s start with a simple design of the application and evolve it as we learn system design concepts. 46 | 47 | ![First architecture diagram](images/first-architecture.jpg) 48 | 49 | -------------------------------------------------------------------------------- /courses/level101/databases_sql/mysql.md: -------------------------------------------------------------------------------- 1 | ### MySQL architecture 2 | 3 | ![MySQL architecture](images/mysql_architecture.png "MySQL architecture diagram") 4 | 5 | MySQL's architecture enables you to select the right storage engine for your needs and abstracts away all implementation details from end users (application engineers and [DBAs](https://en.wikipedia.org/wiki/Database_administrator)), who only need to know a consistent, stable API. 6 | 7 | Application layer: 8 | 9 | * Connection handling: each client gets its own connection, which is cached for the duration of access 10 | * Authentication: the server checks the client's (username, password, host) info and allows or rejects the connection 11 | * Security: the server determines whether the client has the privileges to execute each query (`SHOW GRANTS` displays an account's privileges; `SHOW PRIVILEGES` lists all privileges the server supports) 12 | 13 | Server layer: 14 | 15 | * Services and utilities: backup/restore, replication, cluster, etc. 16 | * SQL interface: clients run queries for data access and manipulation 17 | * SQL parser: creates a parse tree from the query (lexical/syntactic/semantic analysis and code generation) 18 | * Optimizer: optimizes queries using various algorithms and the data available to it (table-level stats), and decides on query rewrites, scan order, indexes to use, etc.
(check with the `EXPLAIN` command) 19 | * Caches and buffers: the query cache stores query results (it was removed in MySQL 8.0), and the buffer pool (InnoDB) stores table and index data in [LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)) fashion 20 | 21 | Storage engine options: 22 | 23 | * InnoDB: most widely used, transaction support, ACID compliant, supports row-level locking, crash recovery and multi-version concurrency control. Default since MySQL 5.5. 24 | * MyISAM: fast, does not support transactions, provides table-level locking, great for read-heavy workloads, mostly in web and data warehousing. Default up to MySQL 5.1. 25 | * Archive: optimized for high-speed inserts, compresses data as it is inserted, does not support transactions, ideal for storing and retrieving large amounts of seldom-referenced historical, archived data 26 | * Memory: tables in memory. Fastest engine, supports table-level locking, does not support transactions, ideal for creating temporary tables or quick lookups, data is lost after a shutdown 27 | * CSV: stores data in CSV files, great for integrating into other applications that use this format 28 | * … etc. 29 | 30 | It is possible to migrate from one storage engine to another. However, this migration locks tables for all operations and is not online, as it changes the physical layout of the data. It takes a long time and is generally not recommended. Hence, choosing the right storage engine at the beginning is important. 31 | 32 | The general guideline is to use InnoDB unless you have a specific need for one of the other storage engines. 33 | 34 | Running `mysql> SHOW ENGINES;` shows you the supported engines on your MySQL server. -------------------------------------------------------------------------------- /courses/level101/big_data/intro.md: -------------------------------------------------------------------------------- 1 | # Big Data 2 | 3 | ## Prerequisites 4 | 5 | - Basics of Linux file systems. 6 | - Basic understanding of system design.
7 | 8 | ## What to expect from this course 9 | 10 | This course covers the basics of Big Data and how it has evolved to become what it is today. We will take a look at a few realistic scenarios where Big Data would be a good fit. An interesting assignment on designing a Big Data system is followed by a look at the architecture of Hadoop and the tooling around it. 11 | 12 | ## What is not covered under this course 13 | 14 | Writing programs to draw analytics from data. 15 | 16 | ## Course Contents 17 | 18 | 1. [Overview of Big Data](https://linkedin.github.io/school-of-sre/level101/big_data/intro/#overview-of-big-data) 19 | 2. [Usage of Big Data Techniques](https://linkedin.github.io/school-of-sre/level101/big_data/intro/#usage-of-big-data-techniques) 20 | 3. [Evolution of Hadoop](https://linkedin.github.io/school-of-sre/level101/big_data/evolution/) 21 | 4. [Architecture of Hadoop](https://linkedin.github.io/school-of-sre/level101/big_data/evolution/#architecture-of-hadoop) 22 | 1. HDFS 23 | 2. YARN 24 | 5. [MapReduce Framework](https://linkedin.github.io/school-of-sre/level101/big_data/evolution/#mapreduce-framework) 25 | 6. [Other Tooling Around Hadoop](https://linkedin.github.io/school-of-sre/level101/big_data/evolution/#other-tooling-around-hadoop) 26 | 1. Hive 27 | 2. Pig 28 | 3. Spark 29 | 4. Presto 30 | 7. [Data Serialization and Storage](https://linkedin.github.io/school-of-sre/level101/big_data/evolution/#data-serialisation-and-storage) 31 | 32 | 33 | # Overview of Big Data 34 | 35 | 1. Big Data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or tool; rather, it has become a complete subject involving various tools, techniques, and frameworks. 36 | 2. Big Data can consist of: 37 | 1. Structured data 38 | 2. Unstructured data 39 | 3. Semi-structured data 40 | 3. Characteristics of Big Data: 41 | 1. Volume 42 | 2. Variety 43 | 3. Velocity 44 | 4.
Variability 45 | 4. Examples of Big Data generation include stock exchanges, social media sites, jet engines, etc. 46 | 47 | 48 | # Usage of Big Data Techniques 49 | 50 | 1. Take the example of the traffic lights problem. 51 | 1. There are more than 300,000 traffic lights in the US as of 2018. 52 | 2. Let us assume that we placed a device on each of them to collect metrics and send them to a central metrics collection system. 53 | 3. If each of the IoT devices sends 10 events per minute, we have `300000 x 10 x 60 x 24 = 4.32 x 10^9` (about 4.3 billion) events per day. 54 | 4. How would you go about processing that and telling me how many of the signals were “green” at 10:45 am on a particular day? 55 | 2. Consider the next example on Unified Payments Interface (UPI) transactions: 56 | 1. We had about 1.15 billion UPI transactions in the month of October 2019 in India. 57 | 2. If we try to extrapolate this data to about a year and try to find out some common payments that were happening through a particular UPI ID, how do you suggest we go about that? 58 | -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/cicd_brief_history.md: -------------------------------------------------------------------------------- 1 | ## The Evolution of CI/CD 2 | 3 | Traditional development approaches have been around for a very long time. The [waterfall model](https://www.linkedin.com/pulse/waterfall-model-shobika-ramasubbarayalu) has been widely used in both large and small projects and has been successful. Despite this success, it has a lot of drawbacks, like longer delivery cycle times. 4 | 5 | While multiple team members are working on the project, code changes accumulate and are not integrated until the planned build date. The build usually happens on agreed cycles that range from a month to a quarter.
This results in several integration issues and build failures, because developers work on their features in silos. 6 | 7 | It was a nightmare for the operations teams (or anyone) to deploy new builds/releases to the production environment because of a lack of proper documentation on every change and its configuration requirements. Deploying successfully often took hotfixes and immediate patches. 8 | 9 | Another big challenge was collaboration. Developers rarely met the operations engineers and did not have a full understanding of the production environment. All these challenges led to longer cycle times for the delivery of code changes. 10 | 11 | [Agile](https://www.linkedin.com/pulse/list-popular-agile-methodologies-used-organizations) methodology prescribes incremental delivery of features over multiple iterations. So, developers commit their code changes in smaller increments and roll them out more frequently. Every code commit triggers a new build, and integration issues are identified much earlier. This has improved the build process and thereby reduced the cycle time. This process is known as *continuous integration or CI*. 12 | 13 | The big barrier between developers and operations teams has shrunk as organizations adopt DevOps and SRE disciplines, and the collaboration between the two has improved. Moreover, the use of the same tools and processes by both teams has improved coordination and avoided conflicting understandings of the process. One of the main drivers in this regard is the *continuous delivery (CD)* process, which ensures the incremental deployment of smaller changes. Before reaching production, changes typically pass through multiple pre-production environments, also called staging environments.
14 | 15 | ## CI/CD and DevOps 16 | 17 | The term **DevOps** represents the combination of Development (Dev) and Operations (Ops) teams, that is, bringing developers and operations teams together for more collaboration. The development team often wants to introduce more features and more changes, while the operations team is more focused on the stability of the application in production. The operations team often sees any change as a threat, as it can shake the stability of the environment. DevOps is described as a culture that introduces processes to reduce the barriers between developers and operations. 18 | 19 | The collaboration between Dev and Ops allows better follow-up of end-to-end production deployments and more frequent deployments. Thus, CI/CD is a key element of DevOps processes. 20 | -------------------------------------------------------------------------------- /courses/level101/linux_networking/udp.md: -------------------------------------------------------------------------------- 1 | # UDP 2 | 3 | 4 | UDP is a transport layer protocol. DNS is an application layer protocol that runs on top of UDP (most of the time). Before jumping into UDP, let's try to understand what the application and transport layers are. The DNS protocol is used by a DNS client (e.g., `dig`) and a DNS server (e.g., `named`). The transport layer makes sure the DNS request reaches the DNS server process and, similarly, that the response reaches the DNS client process. Multiple processes can run on a system, and they can listen on different [ports](https://en.wikipedia.org/wiki/Port_(computer_networking)). DNS servers usually listen on port number `53`. When a client makes a DNS request, after filling in the necessary application payload, it passes the payload to the kernel via the **sendto** system call.
The kernel picks a random port number ([>1024](https://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html)) as the source port, sets 53 as the destination port, and sends the packet to the lower layers. When the kernel on the server side receives the packet, it checks the port number and queues the packet in the receive buffer of the DNS server process, which makes a **recvfrom** system call and reads the packet. This process by the kernel is called multiplexing (combining packets from multiple applications onto the same lower layers) and demultiplexing (segregating packets from a single lower layer to multiple applications). Multiplexing and demultiplexing are done by the transport layer. 5 | 6 | UDP is one of the simplest transport layer protocols, and it does only multiplexing and demultiplexing. Another common transport layer protocol, TCP, does a bunch of other things, like reliable communication, flow control, and congestion control. UDP is designed to be lightweight and handle communications with little overhead, so it doesn’t do anything beyond multiplexing and demultiplexing. If applications running on top of UDP need any of TCP's features, they have to implement them in the application. 7 | 8 | This [example from the Python wiki](https://wiki.python.org/moin/UdpCommunication) covers a sample UDP client and server where “Hello World” is the application payload sent to a server listening on port number `5005`. The server receives the packet and prints the “Hello World” string from the client. 9 | 10 | ## Applications in SRE role 11 | 12 | 13 | 1. If the underlying network is slow and the UDP layer is unable to queue packets down to the networking layer, the `sendto` syscall from the application will block until the kernel frees some of its buffer. This can affect the throughput of the system.
Increasing the write memory buffer values using the [sysctl variables](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/tuning_and_optimizing_red_hat_enterprise_linux_for_oracle_9i_and_10g_databases/sect-oracle_9i_and_10g_tuning_guide-adjusting_network_settings-changing_network_kernel_settings) *net.core.wmem_max* and *net.core.wmem_default* gives the application some cushion against a slow network. 14 | 2. Similarly, if the receiver process is slow in consuming from its buffer, the kernel has to drop packets that it can’t queue because the buffer is full. Since UDP doesn’t guarantee reliability, these dropped packets can cause data loss unless the application layer tracks them. Increasing the sysctl variables *net.core.rmem_default* and *net.core.rmem_max* can give slow applications some cushion against fast senders. 15 | 16 | -------------------------------------------------------------------------------- /courses/level101/linux_networking/ipr.md: -------------------------------------------------------------------------------- 1 | # IP Routing and Data Link Layer 2 | We will dig into how packets that leave the client reach the server and vice versa. By the time the packet reaches the IP layer, the transport layer has populated the source and destination ports. The IP/network layer populates the destination IP (discovered from DNS) and then looks up the route to the destination IP in the routing table. 3 | 4 | ```bash 5 | # `route -n` prints the kernel routing table with numeric addresses (`ip route` is the modern equivalent) 6 | route -n 7 | ``` 8 | 9 | ```text 10 | Kernel IP routing table 11 | Destination     Gateway         Genmask         Flags Metric Ref    Use Iface 12 | 0.0.0.0         172.17.0.1      0.0.0.0         UG    0      0        0 eth0 13 | 172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth0 14 | ``` 15 | 16 | Here, the destination IP is bitwise AND’d with the Genmask of each row, and if the result equals that row's Destination, the row's gateway and interface are picked for routing.
Here, [linkedin.com](https://www.linkedin.com)’s IP `108.174.10.10` is AND’d with `255.255.0.0` (the Genmask of the `172.17.0.0` row), and the result, `108.174.0.0`, doesn’t match that row's destination. Then, Linux ANDs the destination IP with `0.0.0.0` and gets `0.0.0.0`, which matches the default row. 17 | 18 | The routing table is processed from the most specific Genmask (the most bits set to 1) to the least specific, so the longest matching prefix wins, and Genmask `0.0.0.0` acts as the default route if nothing else matches. 19 | At the end of this operation, Linux has figured out that the packet has to be sent to the next hop `172.17.0.1` via `eth0`. The source IP of the packet will be set to the IP of interface `eth0`. 20 | Now, to send the packet to `172.17.0.1`, Linux has to figure out the MAC address of `172.17.0.1`. It finds the MAC address by looking at the kernel's ARP cache, which stores translations between IP addresses and MAC addresses. If there is a cache miss, Linux broadcasts an ARP request within the local network asking who has `172.17.0.1`. The owner of the IP sends an ARP response, which is cached by the kernel, and the kernel sends the packet to the gateway, setting the source MAC address to the MAC address of `eth0` and the destination MAC address to the MAC address of `172.17.0.1` that it just learned. A similar routing lookup is performed at each hop until the packet reaches the actual server. The transport layer and the layers above it come into play only at the end servers; intermediate hops only involve the layers up to the IP/network layer. 21 | 22 | ![Screengrab for above explanation](images/arp.gif) 23 | 24 | One unusual gateway we saw in the routing table is `0.0.0.0`. This gateway means no Layer 3 (network layer) hop is needed to send the packet: both source and destination are in the same network. The kernel has to figure out the MAC address of the destination, populate the source and destination MAC addresses appropriately, and send the packet out so that it reaches the destination without any Layer 3 hop in the middle.
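The lookup described above can be modeled in a few lines of Python with the standard `ipaddress` module. This is a simplified sketch, not how the kernel implements routing: it hard-codes the two rows from the `route -n` output above (`ROUTING_TABLE`, `prefix_len`, and `lookup` are illustrative names), tries the most specific Genmask first, and ignores metrics, multiple routing tables, and the kernel's far more efficient lookup structures.

```python
import ipaddress

# Rows copied from the `route -n` output above: (destination, genmask, gateway, iface).
ROUTING_TABLE = [
    ("0.0.0.0", "0.0.0.0", "172.17.0.1", "eth0"),
    ("172.17.0.0", "255.255.0.0", "0.0.0.0", "eth0"),
]


def to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))


def prefix_len(genmask: str) -> int:
    """Number of 1 bits in the Genmask (e.g. 255.255.0.0 -> 16)."""
    return bin(to_int(genmask)).count("1")


def lookup(dst: str):
    """Return (gateway, iface) for dst, or None if no row matches."""
    dst_int = to_int(dst)
    # Most specific Genmask first, so the default route (0.0.0.0/0) is tried last.
    for destination, genmask, gateway, iface in sorted(
        ROUTING_TABLE, key=lambda row: prefix_len(row[1]), reverse=True
    ):
        if dst_int & to_int(genmask) == to_int(destination):
            return gateway, iface
    return None


print(lookup("108.174.10.10"))  # ('172.17.0.1', 'eth0'): only the default row matches
print(lookup("172.17.0.5"))     # ('0.0.0.0', 'eth0'): same network, no Layer 3 hop
```

A gateway of `0.0.0.0` in the result corresponds to the same-network case: the packet is sent directly to the destination's MAC address.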
25 | 26 | As we did in other modules, let's complete this session with SRE use cases. 27 | 28 | ## Applications in SRE role 29 | 1. Generally, the routing table is populated by DHCP, and playing around with it is not good practice. There can be reasons to modify the routing table, but take that path only when it's absolutely necessary. 30 | 2. Understanding error messages better: for example, a “No route to host” error can mean the MAC address of the destination host was not found, which can mean the destination host is down. 31 | 3. In rare cases, looking at the ARP table can help us understand if there is an IP conflict, where the same IP is assigned to two hosts by mistake, causing unexpected behavior. 32 | 33 | -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/troubleshooting.md: -------------------------------------------------------------------------------- 1 | Troubleshooting system failures can be tricky or tedious at times. In this practice, we need to examine the end-to-end flow of a service and all its downstreams, and analyze logs, memory leaks, CPU usage, disk I/O, network failures, host issues, etc. Knowing certain practices and tools can help identify and mitigate failures faster. Here’s a high-level troubleshooting flowchart: 2 | 3 | ### Troubleshooting Flowchart 4 | ![Troubleshooting flowchart](images/TroubleshootingFlow.jpg) 5 | 6 | ### General Practices 7 | Different systems require different approaches for finding issues. The scope of this section is limited, and for a given problem, there can be many more points to look into. The following points cover some high-level practices for finding web app failures and fixing them. 8 | 9 | **Reproduce the problem** 10 | 11 | * Retry the broken request to reproduce the issue, e.g., send the HTTP/S request that fails.
12 | * Check the end-to-end flow of the request and look at the return codes, mostly [3xx, 4xx, or 5xx](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). 3xx codes are mostly about redirections; 4xx codes indicate client errors such as bad request, unauthorized, or forbidden; and 5xx codes mostly indicate server-side issues. Based on the return code, you can decide on the next step. 13 | * Client-side issues are mainly about missing or buggy static content, like JavaScript issues, bad images, or broken JSON from an async call, which can result in incorrect page rendering in browsers. 14 | 15 | **Gather information** 16 | 17 | * Look for errors/exceptions in application logs, like "Can’t Allocate Memory", OutOfMemoryError, "disk I/O error", or a DNS resolution error. 18 | * Check application and host metrics, and look for anomalies in service and host graphs: since when has CPU usage increased, since when has memory usage increased, since when has disk space been shrinking or disk I/O increasing, when did the load average start shooting up, etc. Please read the School of SRE module on [metrics and monitoring](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/introduction) for more detail. 19 | * Look for recent code or config changes that might be breaking the system. 20 | 21 | **Understand the problem** 22 | 23 | * Try correlating the gathered data with recent actions, like an exception showing up in logs after a config/code deployment. 24 | * Is it due to a [QPS](https://en.wikipedia.org/wiki/Queries_per_second) increase? Is it bad SQL queries? Do recent code changes demand better or more hardware? 25 | 26 | **Find a solution and apply a fix** 27 | 28 | * Based on the above findings, look for a quick fix if possible, for example, rolling back changes if the errors/exceptions correlate with them. 29 | * Try patching or [hotfixing](https://en.wikipedia.org/wiki/Hotfix) the code, preferably testing in a staging setup first, if you want to fix forward.
30 | * Try scaling up the system: if high QPS is the reason for the failure, try adding resources (compute, storage, memory, etc.) as necessary. 31 | * Optimize SQL queries if needed. 32 | 33 | **Verify the complete request flow** 34 | 35 | * Send the requests again and ensure they succeed (2xx return codes). 36 | * Check the logs to ensure that the exceptions/errors found earlier no longer appear. 37 | * Ensure metrics are back to normal. 38 | 39 | ### General Host Issues 40 | 41 | To know whether a host is healthy, look for any hardware failures or performance issues. You can try the following: 42 | 43 | * `dmesg`: shows recent errors/failures reported by the kernel. This helps identify hardware failures, if any. 44 | * `ls*` commands: `lspci`, `lsblk`, `lscpu`, and `lsscsi` list PCI device, block device (disk), CPU, and SCSI device information. 45 | * `/var/log/messages` (`/var/log/syslog` on Debian/Ubuntu): shows system app/service-related errors/warnings, as well as kernel issues. 46 | * `smartd`: monitors disk health using SMART. 47 | 48 | -------------------------------------------------------------------------------- /courses/level101/python_web/python-web-flask.md: -------------------------------------------------------------------------------- 1 | # Python, Web and Flask 2 | 3 | Back in the old days, websites were simple: they served static HTML content. A web server would listen on a defined port and, according to the HTTP request received, read files from disk and return them in the response. Since then, websites have become more complex and dynamic. Depending on the request, multiple operations may need to be performed, like reading from a database or calling other APIs, before finally returning a response (HTML data, JSON content, etc.). 4 | 5 | Since serving web requests is no longer as simple as reading files from disk and returning their contents, we need to process each HTTP request, perform some operations programmatically, and construct a response.
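To make "process each HTTP request and construct a response" concrete, here is a minimal, framework-free sketch that works on raw request bytes: it parses the request line and headers, looks up a handler by path, and serializes a plain-text response. `HANDLERS`, `parse_request`, and `handle` are hypothetical names for illustration only; real servers also deal with request bodies, chunked encoding, malformed input, and much more.

```python
# Hypothetical route registry: maps a URL path to a function that builds the body.
HANDLERS = {
    "/hello": lambda: "Hello, World!",
}


def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into (method, path, version, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")  # headers end at the first empty line
    lines = head.decode("iso-8859-1").split("\r\n")
    method, path, version = lines[0].split(" ")  # request line: METHOD PATH VERSION
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


def handle(raw: bytes) -> bytes:
    """Dispatch on the request path and serialize a minimal plain-text response."""
    _method, path, _version, _headers = parse_request(raw)
    handler = HANDLERS.get(path)
    if handler is None:
        status, body = "404 Not Found", "Not Found"
    else:
        status, body = "200 OK", handler()
    payload = body.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return head.encode("iso-8859-1") + payload


request = b"GET /hello HTTP/1.1\r\nHost: localhost:65432\r\n\r\n"
print(parse_request(request))  # ('GET', '/hello', 'HTTP/1.1', {'host': 'localhost:65432'})
print(handle(request))
```

This sketch assumes well-formed input; a request line without exactly three space-separated parts would raise `ValueError` when unpacked.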
6 | 7 | ## Sockets 8 | 9 | Though we have frameworks like Flask, HTTP is still a protocol that works over TCP. So, let us set up a TCP server, send it an HTTP request, and inspect the request's payload. Note that this is not a tutorial on socket programming; what we are doing here is inspecting the HTTP protocol at its ground level and looking at what its contents look like. (Ref: [Socket Programming in Python (Guide) on RealPython](https://realpython.com/python-sockets/)) 10 | 11 | ```python 12 | import socket 13 | 14 | HOST = '127.0.0.1'  # Standard loopback interface address (localhost) 15 | PORT = 65432        # Port to listen on (non-privileged ports are > 1023) 16 | 17 | with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: 18 |     s.bind((HOST, PORT)) 19 |     s.listen() 20 |     conn, addr = s.accept() 21 |     with conn: 22 |         print('Connected by', addr) 23 |         while True: 24 |             data = conn.recv(1024) 25 |             if not data: 26 |                 break 27 |             print(data) 28 | ``` 29 | 30 | Then, we open `localhost:65432` in our web browser, and the following is the output (the browser will keep waiting, since our server never sends a response): 31 | 32 | ```text 33 | Connected by ('127.0.0.1', 54719) 34 | b'GET / HTTP/1.1\r\nHost: localhost:65432\r\nConnection: keep-alive\r\nDNT: 1\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Edg/85.0.564.44\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nSec-Fetch-Site: none\r\nSec-Fetch-Mode: navigate\r\nSec-Fetch-User: ?1\r\nSec-Fetch-Dest: document\r\nAccept-Encoding: gzip, deflate, br\r\nAccept-Language: en-US,en;q=0.9\r\n\r\n' 35 | ``` 36 | 37 | Examine it closely, and you will see that the content follows the HTTP protocol's format.
That is: 38 | 39 | ``` 40 | HTTP_METHOD URI_PATH HTTP_VERSION 41 | HEADERS_SEPARATED_BY_CRLF 42 | ``` 43 | 44 | So though it's a blob of bytes, knowing the [HTTP/1.1 protocol specification](https://tools.ietf.org/html/rfc2616) (since superseded by RFC 9110 and RFC 9112), you can parse that string (i.e., split it by `\r\n`) and get meaningful information out of it. 45 | 46 | ## Flask 47 | 48 | Flask and other such frameworks do pretty much what we just discussed in the last section (with more sophistication). They (or the WSGI server they run on, such as Werkzeug's development server) listen on a port on a TCP socket, receive an HTTP request, parse the data according to the protocol format, and make it available to you in a convenient manner. 49 | 50 | For example, you can access headers in Flask via `request.headers`, which is made available to you by splitting the above payload by `\r\n`, as defined in the HTTP protocol. 51 | 52 | Another example: we register routes in Flask with `@app.route("/hello")`. Flask maintains an internal registry that maps `/hello` to the function you decorated. Now, whenever a request comes in for the `/hello` route (the second component of the first line, split by spaces), Flask calls the registered function and returns whatever the function returned. 53 | 54 | The same goes for web frameworks in other languages. They all work on similar principles: they understand the HTTP protocol, parse the HTTP request data, and give us programmers a nice interface for working with HTTP requests. 55 | 56 | Not so much magic in it, is there? 57 | -------------------------------------------------------------------------------- /courses/level101/databases_sql/operations.md: -------------------------------------------------------------------------------- 1 | * `EXPLAIN` and `EXPLAIN ANALYZE` 2 | 3 | `EXPLAIN` analyzes query plans from the optimizer, including how tables are joined, which tables/rows are scanned, etc. 4 | 5 | `EXPLAIN ANALYZE` (MySQL 8.0.18+) actually executes the query and shows the above plus additional info like execution cost, number of rows returned, time taken, etc.
6 | 7 | This knowledge is useful for tweaking queries and adding indexes. 8 | 9 | Watch this performance tuning [tutorial video](https://www.youtube.com/watch?v=pjRTLPeUOug). 10 | 11 | Check out the [lab section](https://linkedin.github.io/school-of-sre/level101/databases_sql/lab/) for a hands-on exercise on indexes. 12 | 13 | * [Slow query logs](https://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html) 14 | 15 | Used to identify slow queries (configurable threshold); can be enabled in the config or dynamically with a query. 16 | 17 | Check out the [lab section](https://linkedin.github.io/school-of-sre/level101/databases_sql/lab/) on identifying slow queries. 18 | 19 | * User management 20 | 21 | This includes creating and modifying users, managing privileges, changing passwords, etc. 22 | 23 | * Backup and restore strategies, pros and cons 24 | 25 | - Logical backup using `mysqldump`: slower, but can be done online 26 | 27 | - Physical backup (copy the data directory or use XtraBackup): quick backup/recovery. Copying the data directory requires locking or shutting down the server. XtraBackup is an improvement because it supports backups without shutting down (hot backup). 28 | 29 | - Others: PITR (point-in-time recovery), snapshots, etc. 30 | 31 | * Crash recovery process using redo logs 32 | 33 | After a crash, when you restart the server, it reads the redo logs and replays the modifications to recover. 34 | 35 | * Monitoring MySQL 36 | 37 | - Key MySQL metrics: reads, writes, query runtime, errors, slow queries, connections, running threads, InnoDB metrics 38 | 39 | - Key OS metrics: CPU, load, memory, disk I/O, network 40 | 41 | 42 | * Replication 43 | 44 | Copies data from one instance to one or more instances. Helps with horizontal scaling, data protection, analytics, and performance. A binlog dump thread runs on the primary, and replication I/O and SQL threads run on the secondary. Strategies include standard asynchronous, semi-synchronous, or group replication.
45 | 46 | * High Availability 47 | 48 | The ability to cope with failure at the software, hardware, and network levels. Essential for anyone who needs 99.9%+ uptime. Can be implemented with replication or clustering solutions from MySQL, Percona, Oracle, etc. Requires expertise to set up and maintain. Failover can be manual, scripted, or done using tools like Orchestrator. 49 | 50 | * [Data directory](https://dev.mysql.com/doc/refman/8.0/en/data-directory.html) 51 | 52 | Data is stored in a particular directory, with nested directories for the data contained in each database. There are also MySQL log files, InnoDB log files, the server process ID file, and some other configs. The data directory is configurable. 53 | 54 | * [MySQL configuration](https://dev.mysql.com/doc/refman/5.7/en/server-configuration.html) 55 | 56 | This can be done by passing [parameters during startup](https://dev.mysql.com/doc/refman/5.7/en/server-options.html) or in a [file](https://dev.mysql.com/doc/refman/8.0/en/option-files.html). There are a few [standard paths](https://dev.mysql.com/doc/refman/8.0/en/option-files.html#option-file-order) where MySQL looks for config files; `/etc/my.cnf` is one of the commonly used paths. These options are organized under group headers (`[mysqld]` for the server and `[mysql]` for the client); you can explore them more in the lab that follows. 57 | 58 | * [Logs](https://dev.mysql.com/doc/refman/5.7/en/server-logs.html) 59 | 60 | MySQL has logs for various purposes: general query log, error log, binary log (for replication), and slow query log. In MySQL 5.7, only the error log is enabled by default (to reduce I/O and storage requirements); from MySQL 8.0, binary logging is also enabled by default. The others can be enabled when required, by specifying config parameters at startup or running commands at runtime. The [log destination](https://dev.mysql.com/doc/refman/5.7/en/log-destinations.html) can also be tweaked with config parameters.
61 | -------------------------------------------------------------------------------- /courses/level102/linux_intermediate/archiving_backup.md: -------------------------------------------------------------------------------- 1 | 2 | # Archiving and Backup 3 | 4 | ## Introduction 5 | One of the things SREs make sure of is that services are up all the time (at least 99.99% of the time), but the amount of data generated on each server running those services is immense. This data could be logs, user data in the database, or any other kind of metadata. Hence, we need to compress, archive, rotate, and back up the data in a timely manner for data safety and to make sure we don’t run out of space. 6 | 7 | ## Archiving 8 | 9 | We usually archive data that is no longer needed but is kept mostly for compliance purposes. Storing it in a compressed format saves a lot of space. The section below introduces the archiving tools and commands. 10 | 11 | ## gzip 12 | 13 | gzip is a program used to [compress](https://en.wikipedia.org/wiki/Data_compression) one or more files; it replaces each original file with a compressed version of it. 14 | 15 | ![gzip compressing the messages log file](images/image14.png) 16 | 17 | Here, we can see that the *messages* log file is compressed to almost one-fifth of its original size and replaced with *messages.gz*. We can decompress this file using the [*gunzip*](https://linux.die.net/man/1/gunzip) command. 18 | 19 | ## tar 20 | 21 | The *tar* program is a tool for archiving files and directories into a single file (often called a tarball). It is usually used to prepare archives of files before they are transferred to a long-term backup server. *tar* doesn’t replace the existing files and folders but creates a new file with the extension *.tar*.
It provides a lot of flags to choose from for archiving: 22 | 23 | | Flag | Description | 24 | | --- | --- | 25 | | -c | Creates an archive | 26 | | -x | Extracts an archive | 27 | | -f | Uses the given archive file name | 28 | | -t | Lists the files in an archive | 29 | | -u | Appends files that are newer than their copies in an existing archive | 30 | | -v | Displays verbose information | 31 | | -A | Concatenates (appends) other tar files to an archive | 32 | | -z | Compresses/decompresses the archive using gzip | 33 | | -j | Compresses/decompresses the archive using bzip2 | 34 | | -W | Verifies the archive after writing it | 35 | | -r | Appends files or directories to an existing .tar file | 36 | 37 | ### Create an archive with files and folders 38 | 39 | Flag `c` is used for creating the archive, and `f` specifies the archive file name. 40 | 41 | ![](images/image24.png) 42 | 43 | ### Listing files in the archive 44 | 45 | We can use flag `t` to list what an archive contains. 46 | 47 | ![](images/image7.png) 48 | 49 | ### Extract files from the archive 50 | 51 | We can use flag `x` to extract the archive. 52 | 53 | ![](images/image26.png) 54 | 55 | ## Backup 56 | 57 | Backup is the process of copying/duplicating existing data. This backup can be used to restore the dataset in case of data loss. Backups also become critical when the data is not needed in day-to-day work but may be referred to in the future as a source of truth or for compliance reasons. Different types of backup are: 58 | 59 | ### Incremental backup 60 | 61 | An incremental backup copies only the data that has changed since the last backup (of any type); this reduces data redundancy and improves storage efficiency. 62 | 63 | ### Differential backup 64 | 65 | Sometimes our data keeps getting modified/updated. In that case, we can back up all the changes that occurred since the last full backup; this is called a differential backup. 66 | 67 | ### Network backup 68 | 69 | Network backup refers to sending data over the network from the source to a backup destination in a client-server model.
This backup destination can be centralized or decentralized. Decentralized backups are useful for disaster recovery scenarios. 70 | 71 | `rsync` is a Linux command that synchronizes files from one server to a destination server over the network. 72 | 73 | ![](images/image11.png) 74 | 75 | The syntax for *rsync* is `rsync [options] <source> <destination>`. The file is placed at the path specified after the `:` (colon) in the *destination*. If no path is specified, the default is the home directory of the user used for the backup, `/home/azureuser` in this case. See `man rsync` for the full list of options. 76 | 77 | ### Cloud Backup 78 | 79 | Various third-party providers offer backup of data to the cloud. These cloud backups are much more reliable than backups stored on local machines or on any server without a RAID configuration, because the providers manage data redundancy, data recovery, and data security. Two widely used cloud backup options are Azure Backup (from Microsoft) and Amazon Glacier (from AWS). 80 | 81 | -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/troubleshooting-example.md: -------------------------------------------------------------------------------- 1 | In this section, we walk through an example issue and troubleshoot it. At the end, we share a few well-known troubleshooting stories previously published by LinkedIn engineers. 2 | 3 | ### Example - Memory leak 4 | Memory leaks often go unnoticed until the service becomes unresponsive after running for some time (days, weeks, or even months), and the problem persists until the service is restarted or the bug is fixed. In such cases, the service's memory usage keeps increasing in the metrics graph, similar to the graph below.
5 | 6 | ![](images/MemUsageChart.png) 7 | 8 | A memory leak is the mismanagement of memory allocations by an application, where memory that is no longer needed is not released. Over time, such objects pile up in memory, eventually crashing the service. Normally, the [garbage collector](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) frees unneeded objects automatically, but a bug that keeps them referenced prevents them from being collected. Debugging helps figure out where most of the application's memory is being allocated. You then track allocations and filter them based on usage. If you find objects that are no longer in use but are still referenced, removing those references lets them be freed and avoids the memory leak. Python ships with built-in features such as [tracemalloc](https://docs.python.org/3/library/tracemalloc.html), a module that can pinpoint where an object was first allocated. Almost every language comes with a set of tools or libraries (built-in or external) that help find memory issues; for Java, for example, a well-known memory leak detection tool is [Java VisualVM](http://visualvm.java.net/intro.html). 9 | 10 | Let’s look at a dummy Flask-based web app with a memory leak bug, whose memory usage keeps increasing with every request, and see how we can use tracemalloc to capture the leak. 11 | 12 | Assumption: a Python virtual environment has been created, and Flask is installed in it.
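The snapshot-and-compare idea that this example later drives through a `/capture` endpoint can also be tried in a standalone script. This is a minimal sketch, not the app's actual code: `handle_request` and `_leaky_cache` are hypothetical stand-ins for a buggy request handler, and the payload size and request count are arbitrary. It uses only the standard `tracemalloc` API (`start`, `take_snapshot`, `stop`, and `Snapshot.compare_to`).

```python
import tracemalloc

# Hypothetical stand-in for a buggy handler: each call appends a payload to a
# module-level list that is never cleared, so memory grows with every request.
_leaky_cache = []


def handle_request(payload_size: int = 10_000) -> None:
    _leaky_cache.append("x" * payload_size)  # reference is kept forever: the leak


def find_top_growth(n_requests: int = 1_000, limit: int = 3):
    """Take a snapshot, simulate requests, take another snapshot, and diff them by line."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(n_requests):
        handle_request()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to() returns StatisticDiff objects, largest absolute size_diff first
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    for stat in find_top_growth():
        print(stat)  # file:line, total size, size delta, and block counts per entry
```

The top entry should point at the `append` line inside `handle_request`, which is the same kind of evidence the `/capture` snapshots provide for the Flask app.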
13 | 14 | **A bare minimum Flask app with a bug (read the comments for more info)** 15 | ![](images/FlaskCode.png) 16 | 17 | **Starting the Flask app** 18 | ![](images/FlaskStart.png) 19 | 20 | **On start, its memory usage is around 26576 KB, i.e. approx. 26 MB** 21 | ![](images/MemUsage01.png) 22 | 23 | **Now, with every subsequent GET request, we can notice that the process memory usage continues to increase slowly.** 24 | ![](images/MemUsage02.png) 25 | 26 | **Now let's try 10000 requests, to see if memory usage increases heavily.** 27 | To send a high number of requests, we use an Apache benchmarking tool called [“ab”](https://httpd.apache.org/docs/2.4/programs/ab.html). After 10000 hits, we can notice that the memory usage of the Flask app has jumped more than 15 times, i.e. from the initial **26576 KB to 419316 KB, i.e. from roughly 26 MB to 419 MB**. That’s a huge jump for such a small web app. 28 | ![](images/MemUsage03.png) 29 | 30 | **Let's try the Python [tracemalloc](https://docs.python.org/3/library/tracemalloc.html) module to understand the application's memory allocations.** tracemalloc takes snapshots of memory allocations at a given point, from which we can compute various statistics. 31 | 32 | We add a bare minimum of code to our app.py file (no change to the fetchuserdata.py file) that lets us capture a tracemalloc snapshot whenever we hit the /capture URI. 33 | ![](images/Tracemalloc01.png) 34 | 35 | **After restarting app.py (flask run)**, we will 36 | - First hit http://127.0.0.1:5000/capture 37 | - Then hit http://127.0.0.1:5000/ 10000 times, for the memory leak to take place. 38 | - Finally hit http://127.0.0.1:5000/capture again to take a snapshot and find which line has the most allocation. 39 | ![](images/Tracemalloc02.png) 40 | 41 | In the final snapshot, we can see the exact module and line number where most of the allocation happened, i.e. fetchuserdata.py, line 6, which is holding 419 MB of memory after 10000 hits.
42 | ![](images/Tracemalloc03.png) 43 | 44 | **Summary** 45 | 46 | The above example shows how a bug can lead to a memory leak, and how we can use [tracemalloc](https://docs.python.org/3/library/tracemalloc.html) to find where it is. Real-world applications are far more complex than this dummy example, and tracemalloc adds its own overhead, which can degrade application performance somewhat. Be mindful about using it in production environments. 47 | 48 | If you are interested in digging deeper into Python object memory allocation internals and debugging memory leaks, have a look at an interesting talk by [Sanket Patel](https://www.linkedin.com/in/sanketplus/) at PyCon India 2019, [Debug Memory Leak In Python Flask | Python Object Memory Allocation Internals](https://www.youtube.com/watch?v=s9kAghWpzoE). 49 | 50 | -------------------------------------------------------------------------------- /courses/level102/system_design/scaling-beyond-the-datacenter.md: -------------------------------------------------------------------------------- 1 | ## Caching static assets 2 | 3 | Extending the existing caching solution a bit, we arrive at Content Delivery Networks (CDNs). CDNs are the caching layer that is closest to the user. A significant chunk of the resources served in a webpage may not change on an hourly or even a daily basis. In those cases, we want to cache them at the CDN level, reducing our load. CDNs not only reduce the load on our servers by taking over the burden of serving static, bandwidth-intensive resources, they also let us be present closer to our users by way of points of presence (POPs). CDNs also let us do geo-load balancing in case we have multiple data centres around 4 | the world and want to serve each user from the closest data center (DC) possible. 5 | 6 | **Taking it a step further** 7 | 8 | With the addition of caching and the distribution of our application into simpler services, we have solved the problem of scaling to 50000 users. However, our users may be geographically distributed and may not all be at the same distance from our data centre or cloud region. Consistency in user experience is important; otherwise, we exclude users who are far away from our location, potentially eliminating a significant chunk of potential users. However, it is impractical to have data centers all over the world, or even in more than a couple of locations. This is where CDNs and POPs come into the picture. 9 | 10 | ## Points of Presence 11 | 12 | CDN POPs are geographically distributed data centers aimed at being close to users. POPs reduce the round trip time by delivering content from a location that is nearest to the user. POPs typically do not have all the content; instead, they have caching servers that cache the static assets and fetch the rest of the content from the [origin server](https://www.cloudflare.com/en-in/learning/cdn/glossary/origin-server/) where the application actually resides. Their main function is to reduce round trip time by bringing the content closer to the website’s visitor. POPs can also route traffic to one of multiple origin DCs. This way, POPs can be leveraged to add resiliency as well as load balancing. 13 | 14 | 15 | Now, with our image sharing application becoming more popular by the day, let us assume that we have hit 100,000 concurrent users, and that we have built another data center in anticipation of this increase in traffic. We now need to route traffic to both of these data centers reliably, while retaining the ability to fall back to a single data center in case there is an issue with one of the two DCs. This is where sticky routing comes into play. 16 | 17 | ## Sticky Routing 18 | 19 | When a user sends a request, there are cases in which we want to serve that user’s requests from a particular DC (if we have multiple DCs) or from a particular server inside a DC. We may also wish to serve all requests from a specific POP from a single data center. Sticky routing helps us do exactly that. It might simply pin all users to a specific DC, or pin specific users to specific servers. This is typically done at the POP, so that as soon as the user's request reaches our servers, we can route it to the nearest DC possible. 20 | 21 | ## Geo DNS 22 | 23 | When a user opens the application, the user can be directed to one of multiple 24 | globally distributed POPs. This can be done using [GeoDNS](https://jameshfisher.com/2017/02/08/how-does-geodns-work/), which, simply put, returns a different IP address (from a geographically distributed set) depending on the location of the user making the DNS request. GeoDNS is the first step in distributing users to different locations. It is not 100% accurate and typically uses IP address allocation information to guess the user's location; however, it works well enough for \>90% of users. After this, a sticky routing service can assign each user to a specific DC and set a cookie recording that assignment. When the user next visits, the cookie can be read at the POP to decide which data center the user’s traffic must be directed to. 25 | 26 | Having multiple DCs and leveraging sticky routing not only brings scaling benefits but also adds to the resiliency of the service, albeit at the cost of additional complexity. 27 | 28 | Let us consider another use case in which a user uploads a new profile picture. If we have multiple data centres or POPs that are not synced in real time, not all of them may have the new picture.
In such a case, it would make sense to tie that user to a specific DC/region until the update has propagated to all regions. Sticky routing would enable us to do this. 29 | 30 | 31 | ## References 32 | 1. [CDNs](https://www.cloudflare.com/en-in/learning/cdn/what-is-a-cdn/) 33 | 2. LinkedIn's TrafficShift [blog](https://engineering.linkedin.com/blog/2017/05/trafficshift--load-testing-at-scale) talks about sticky routing -------------------------------------------------------------------------------- /courses/level101/linux_networking/tcp.md: -------------------------------------------------------------------------------- 1 | # TCP 2 | 3 | Like UDP, TCP is a transport layer protocol, but unlike UDP it guarantees reliability and provides flow control and congestion control. 4 | TCP guarantees reliable delivery by using sequence numbers. A TCP connection is established by a three-way handshake. In our case, the client sends a `SYN` packet along with the starting sequence number it plans to use; the server acknowledges the `SYN` packet and sends its own `SYN` with its sequence number. Once the client acknowledges the server's `SYN` packet, the connection is established. From here on, each piece of data transferred is considered reliably delivered once the concerned party receives an acknowledgement for that sequence. 5 | 6 | ![3-way handshake](images/established.png) 7 | 8 | ```bash 9 | # To understand the handshake, run a packet capture in one bash session 10 | tcpdump -S -i any port 80 11 | # Run curl in another bash session 12 | curl www.linkedin.com 13 | ``` 14 | 15 | ![tcpdump-3way](images/pcap.png) 16 | 17 | 18 | Here, the client sends a `SYN`, shown by the [S] flag, with the sequence number `1522264672`. The server acknowledges receipt of the `SYN` with an `ACK` [.] flag and sends a `SYN` flag [S] for its own sequence number. The server uses the sequence number `1063230400` and tells the client that it expects sequence number `1522264673` (client sequence + 1).
The client sends a zero-length acknowledgement packet to the server (server sequence + 1), and the connection is established. This is called the three-way handshake. After this, the client sends a 76-byte packet and increments its sequence number by 76. The server sends a 170-byte response and closes the connection. This was the difference we were talking about between HTTP/1.1 and HTTP/1.0: in HTTP/1.1, the same connection can be reused, which avoids the overhead of a three-way handshake for each HTTP request. If a packet is lost between the client and server, the server won’t send an `ACK` for it, and the client retransmits the packet until the `ACK` is received. This guarantees reliability. 19 | Flow control is achieved through the `WIN` (window) size field in each segment. The `WIN` size advertises the available TCP buffer length in the receiver's kernel that can be used to buffer received segments. A size of 0 means the receiver has a lot of backlog to drain from its socket buffer, and the sender has to pause sending packets so that the receiver can cope. This flow control protects against the slow receiver, fast sender problem. 20 | 21 | TCP also does congestion control, which determines how many segments can be in transit without an `ACK`. Linux lets us configure the congestion control algorithm, which we are not covering here. 22 | 23 | While closing a connection, the client or server calls the close syscall. Let's assume the client does that. The client’s kernel sends a `FIN` packet to the server. The server’s kernel can’t close the connection until the server application calls the close syscall. Once the server app calls close, the server also sends a `FIN` packet, and the client enters the `TIME_WAIT` state for 2*MSL (twice the Maximum Segment Lifetime; Linux uses a fixed 60 seconds) so that this socket can’t be reused during that period, preventing TCP state corruption due to stray stale packets. 24 | 25 | ![Connection tearing](images/closed.png) 26 | 27 | Armed with our TCP and HTTP knowledge, let's see how this is used by SREs in their role.
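As a recap of the handshake walk-through above, the sequence and acknowledgement arithmetic from the capture can be reproduced in a few lines of Python. This is a simplified sketch that assumes no packet loss and models only the handshake plus one request and one response; `handshake_trace` is a hypothetical name, and the flag strings only mirror how tcpdump typically prints them.

```python
MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around


def handshake_trace(client_isn: int, server_isn: int, request_len: int, response_len: int):
    """Return (sender, flags, seq, ack, payload_len) for each segment, assuming no loss."""
    client_next = (client_isn + 1) % MOD  # the SYN itself consumes one sequence number
    server_next = (server_isn + 1) % MOD
    return [
        ("client", "S", client_isn, None, 0),            # SYN, nothing to acknowledge yet
        ("server", "S.", server_isn, client_next, 0),    # SYN-ACK
        ("client", ".", client_next, server_next, 0),    # zero-length ACK: handshake done
        ("client", "P.", client_next, server_next, request_len),
        # the server acknowledges every request byte it received
        ("server", "P.", server_next, (client_next + request_len) % MOD, response_len),
    ]


if __name__ == "__main__":
    # Initial sequence numbers and sizes taken from the capture discussed above
    for segment in handshake_trace(1522264672, 1063230400, request_len=76, response_len=170):
        print(segment)
```

With the captured values, the SYN-ACK acknowledges `1522264673`, and after the 76-byte request the server's acknowledgement advances to `1522264749`.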
28 | 29 | ## Applications in SRE role 30 | 1. Scaling HTTP performance using load balancers requires solid knowledge of both TCP and HTTP. There are [different kinds of load balancing](https://blog.envoyproxy.io/introduction-to-modern-network-load-balancing-and-proxying-a57f6ff80236?gi=428394dbdcc3), such as L4 and L7 load balancing, Direct Server Return, etc. HTTPS offloading can be done on the load balancer or directly on the servers, based on performance and compliance needs. 31 | 2. Tweaking the `sysctl` variables for `rmem` and `wmem`, as we did for UDP, can improve the throughput of the sender and receiver. 32 | 3. The `sysctl` variables `tcp_max_syn_backlog` and `somaxconn` (the latter caps the backlog an application passes to the `listen` syscall) determine how many connections the kernel can queue, half-open or with the 3-way handshake completed, before the application calls the accept syscall. This is especially useful in single-threaded applications. Once the backlog is full, new connections can remain in the `SYN_RECV` state (visible when you run `netstat`) until the application calls the accept syscall. 33 | 4. Apps can run out of file descriptors if there are too many short-lived connections. Digging through [tcp_tw_reuse and tcp_tw_recycle](http://lxr.linux.no/linux+v3.2.8/Documentation/networking/ip-sysctl.txt#L464) can help reduce the time spent in the `TIME_WAIT` state (each has its own risks, and `tcp_tw_recycle` was removed in Linux 4.12). Making apps reuse a pool of connections instead of creating ad hoc connections can also help. 34 | 5. Understand performance bottlenecks by looking at metrics and classifying whether the problem is on the app side or the network side. For example, too many sockets in the `CLOSE_WAIT` state point to a problem in the application, whereas retransmissions point more to the network or the OS stack than to the application itself. Understanding the fundamentals can help us narrow down where the bottleneck is.
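Items 3 and 5 above rely on knowing how many sockets sit in each state. `ss` and `netstat` report this directly; as an illustration, the minimal sketch below reads the same information from Linux's `/proc/net/tcp` and `/proc/net/tcp6`, where the fourth column holds the socket state as a hex code following the kernel's TCP state numbering. The function names are hypothetical, and on non-Linux systems the files are simply skipped.

```python
import os
from collections import Counter

# Hex state codes as printed in the "st" column of /proc/net/tcp (kernel TCP state numbering)
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_text: str) -> Counter:
    """Count sockets per state in the text of a /proc/net/tcp-style file."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3:
            state = fields[3].upper()  # 4th column: "st"
            counts[TCP_STATES.get(state, state)] += 1  # unknown codes are kept as-is
    return counts


def socket_state_summary(paths=("/proc/net/tcp", "/proc/net/tcp6")) -> Counter:
    """Sum the per-state counts over IPv4 and IPv6 sockets."""
    total = Counter()
    for path in paths:
        if os.path.exists(path):  # these files do not exist on non-Linux systems
            with open(path) as f:
                total += count_states(f.read())
    return total


if __name__ == "__main__":
    for state, n in socket_state_summary().most_common():
        print(f"{state:12} {n}")
```

A steadily growing `CLOSE_WAIT` count in this output would suggest the application is not closing its sockets, in line with item 5.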
35 | 36 | -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/command-line_tools.md: -------------------------------------------------------------------------------- 1 | ## 2 | 3 | # Command-line tools 4 | Most Linux distributions today come with a set of tools that 5 | monitor the system's performance. These tools help you measure and 6 | understand various subsystem statistics (CPU, memory, network, and so 7 | on). Let's look at some of the tools that are predominantly used. 8 | 9 | - **`ps/top`**: The process status command (`ps`) displays information 10 | about all the currently running processes in a Linux system. The 11 | `top` command is similar to the `ps` command, but it periodically 12 | updates the information displayed until the program is terminated. 13 | An advanced version of `top`, called `htop`, has a more user-friendly 14 | interface and some additional features. These command-line 15 | utilities come with options to modify the operation and output of 16 | the command. Following are some important options supported by the 17 | `ps` command. 18 | 19 | - `-p <PID>`: Displays information about processes 20 | that match the specified process IDs. Similarly, you can use 21 | `-u <user>` and `-g <group>` to display information about 22 | processes belonging to a specific user or group. 23 | 24 | - `-a`: Displays information about other users' processes, as well 25 | as one's own. 26 | 27 | - `-x`: When displaying processes matched by other options, 28 | includes processes that do not have a controlling terminal. 29 | 30 | ![Results of top command](images/image12.png) 31 |

Figure 2: Results of top command

32 | 33 | - **`ss`**: The socket statistics command (`ss`) displays information 34 | about network sockets on the system. This tool is the successor of 35 | [netstat](https://man7.org/linux/man-pages/man8/netstat.8.html), 36 | which is deprecated. Following are some command-line options 37 | supported by the `ss` command: 38 | 39 | - `-t`: Displays TCP sockets. Similarly, `-u` displays UDP 40 | sockets, `-x` displays UNIX domain sockets, and so on. 41 | 42 | - `-l`: Displays only listening sockets. 43 | 44 | - `-n`: Instructs the command not to resolve service names and to 45 | display port numbers instead. 46 | 47 | ![List of listening sockets on a system](images/image8.png)

Figure 48 | 3: List of listening sockets on a system

49 | 50 | - **`free`**: The `free` command displays memory usage statistics on the 51 | host like available memory, used memory, and free memory. Most often, 52 | this command is used with the `-h` command-line option, which 53 | displays the statistics in a human-readable format. 54 | 55 | ![Memory statistics on a host in human-readable form](images/image6.png) 56 |

Figure 4: Memory statistics on a host in human-readable form

57 | 58 | - **`df`**: The `df` command displays disk space usage statistics. The 59 | `-i` command-line option is also often used to display 60 | [inode](https://en.wikipedia.org/wiki/Inode) usage 61 | statistics. The `-h` command-line option is used for displaying 62 | statistics in a human-readable format. 63 | 64 | ![Disk usage statistics on a system in human-readable form](images/image9.png) 65 |

Figure 5: 66 | Disk usage statistics on a system in human-readable form

67 | 68 | - **`sar`**: The `sar` utility monitors various subsystems, such as CPU 69 | and memory, in real time. This data can be stored in a file 70 | specified with the `-o` option. This tool helps to identify 71 | anomalies. 72 | 73 | - **`iftop`**: The interface top command (`iftop`) displays bandwidth 74 | utilization by a host on an interface. This command is often used 75 | to identify bandwidth usage by active connections. The `-i` option 76 | specifies which network interface to watch. 77 | 78 | ![Network bandwidth usage by 79 | active connection on the host](images/image2.png) 80 |

Figure 6: Network bandwidth usage by 81 | active connection on the host

82 | 83 | - **`tcpdump`**: The `tcpdump` command is a network monitoring tool that 84 | captures network packets flowing over the network and displays a 85 | description of the captured packets. The following option and filter 86 | expressions are commonly used: 87 | 88 | - `-i <interface>`: Interface to listen on 89 | 90 | - `host <host>`: Filters traffic going to or from the 91 | specified host 92 | 93 | - `src/dst`: Displays one-way traffic from the source (src) or to 94 | the destination (dst) 95 | 96 | - `port <port>`: Filters traffic to or from a particular 97 | port 98 | 99 | ![tcpdump of packets on an interface](images/image10.png) 100 |

Figure 7: tcpdump of packets on docker0 101 | interface on a host

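To see where numbers like those shown by `df -h` and `df -i` come from, here is a minimal Python sketch built on `os.statvfs` (Unix only). It approximates `df`'s size, used, and available columns with 1024-based units, like `-h`; `human_readable` and `disk_usage` are hypothetical helper names, and the formatting will not match `df` exactly.

```python
import os


def human_readable(num: float) -> str:
    """Format a byte count with 1024-based units, in the spirit of `df -h`."""
    for unit in ("B", "K", "M", "G"):
        if num < 1024:
            return f"{num:.1f}{unit}"
        num /= 1024
    return f"{num:.1f}T"


def disk_usage(path: str = "/") -> dict:
    """Approximate df's size/used/avail and inode columns for the filesystem holding path."""
    st = os.statvfs(path)  # Unix only
    size = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # space available to unprivileged users
    return {
        "size": human_readable(size),
        "used": human_readable(used),
        "avail": human_readable(avail),
        "inodes": st.f_files,        # total inodes, the figure `df -i` reports
        "inodes_free": st.f_ffree,
    }


if __name__ == "__main__":
    print(disk_usage("/"))
```

Because `used` and `avail` come from different counters (`f_bfree` versus `f_bavail`), they do not necessarily add up to `size`; blocks reserved for root account for the difference.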
-------------------------------------------------------------------------------- /courses/level101/systems_design/fault-tolerance.md: -------------------------------------------------------------------------------- 1 | # Fault Tolerance 2 | 3 | Failures are unavoidable in any system and will happen all the time, so we need to build systems that can tolerate failures or recover from them. 4 | 5 | - In systems, failure is the norm rather than the exception. 6 | - “Anything that can go wrong will go wrong” (Murphy’s Law) 7 | - “Complex systems contain changing mixtures of failures latent within them” (How Complex Systems Fail) 8 | 9 | ### Fault Tolerance: Failure Metrics 10 | 11 | Common failure metrics that are measured and tracked for any system: 12 | 13 | **Mean time to repair (MTTR):** The average time to repair and restore a failed system. 14 | 15 | **Mean time between failures (MTBF):** The average operational time between one device failure or system breakdown and the next. 16 | 17 | **Mean time to failure (MTTF):** The average time a device or system is expected to function before it fails. 18 | 19 | **Mean time to detect (MTTD):** The average time between the onset of a problem and when the organization detects it. 20 | 21 | **Mean time to investigate (MTTI):** The average time between the detection of an incident and when the organization begins to investigate its cause and solution. 22 | 23 | **Mean time to restore service (MTRS):** The average elapsed time from the detection of an incident until the affected system or component is again available to users. 24 | 25 | **Mean time between system incidents (MTBSI):** The average elapsed time between the detection of two consecutive incidents. MTBSI can be calculated by adding MTBF and MTRS (MTBSI = MTBF + MTRS). 26 | 27 | **Failure rate:** Another reliability metric, which measures the frequency with which a component or system fails. It is expressed as a number of failures over a unit of time.
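The metrics above can be computed from an incident log. The sketch below is a simplified illustration under stated assumptions: each incident is a hypothetical `(detected_at, restored_at)` pair in hours, detection is treated as the failure onset, MTBF is measured from one restoration to the next detection (the time the system was operational), and MTBSI is derived with the formula given above, MTBSI = MTBF + MTRS. `failure_metrics` is a hypothetical helper name.

```python
from statistics import mean


def failure_metrics(incidents, window_hours: float) -> dict:
    """Compute MTRS, MTBF, MTBSI, and failure rate from (detected_at, restored_at) pairs.

    Times are in hours; incidents must be sorted by time and must not overlap.
    """
    if len(incidents) < 2:
        raise ValueError("need at least two incidents to measure time between failures")
    # Time from detection until the service is available again, per incident
    restore_times = [restored - detected for detected, restored in incidents]
    # Operational time from one restoration to the next detected incident
    uptimes = [nxt[0] - prev[1] for prev, nxt in zip(incidents, incidents[1:])]
    mtrs = mean(restore_times)
    mtbf = mean(uptimes)
    return {
        "MTRS": mtrs,
        "MTBF": mtbf,
        "MTBSI": mtbf + mtrs,  # using the formula MTBSI = MTBF + MTRS
        "failure_rate": len(incidents) / window_hours,  # failures per hour
    }


if __name__ == "__main__":
    # Hypothetical incident log: (detected_at, restored_at) in hours
    log = [(10.0, 12.0), (50.0, 51.0), (100.0, 103.0)]
    print(failure_metrics(log, window_hours=120.0))
```

For this hypothetical log over a 120-hour window, the sketch gives MTRS = 2 h, MTBF = 43.5 h, MTBSI = 45.5 h, and a failure rate of 0.025 failures per hour.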
28 | 29 | #### Refer 30 | - [https://www.splunk.com/en_us/data-insider/what-is-mean-time-to-repair.html](https://www.splunk.com/en_us/data-insider/what-is-mean-time-to-repair.html) 31 | 32 | ### Fault Tolerance: Fault Isolation Terms 33 | Systems should be able to short-circuit a failing component. For example, in our content sharing system, if “Notifications” is not working, the site should gracefully handle that failure by removing the functionality instead of taking the whole site down. 34 | 35 | Swimlane is one of the commonly used fault isolation methodologies. A swimlane places a barrier between a service and other services so that a failure in one of them won’t affect the others. Say we roll out a new feature, ‘Advertisement’, in our content sharing app. 36 | We can have two architectures: 37 | 38 | ![Swimlane](images/swimlane-1.jpg) 39 | 40 | If Ads are generated on the fly synchronously during each Newsfeed request, faults in the Ads feature propagate to the Newsfeed feature. Instead, if we swimlane the “Generation of Ads” service and use shared storage to populate the Newsfeed App, Ads failures won’t cascade to Newsfeed, and in the worst case, if Ads don’t meet their SLA, we can show the Newsfeed without Ads. 41 | 42 | Let's take another example: we have come up with a new model for our content sharing App. Here, we roll out an enterprise content sharing App where enterprises pay for the service and the content should never be shared outside the enterprise. 43 | 44 | ![Swimlane-principles](images/swimlane-2.jpg) 45 | 46 | ### Swimlane Principles 47 | 48 | **Principle 1:** Nothing is shared (also known as “share as little as possible”). The less that is shared within a swimlane, the more fault isolative the swimlane becomes. (as shown in the Enterprise use case) 49 | 50 | **Principle 2:** Nothing crosses a swimlane boundary. Synchronous communication (defined by expecting a request, not by the transfer protocol) never crosses a swimlane boundary; if it does, the boundary is drawn incorrectly.
(as shown in the Ads feature) 51 | 52 | ### Swimlane Approaches 53 | **Approach 1:** Swimlane the money-maker. Never allow your cash register to be compromised by other systems. (Tier 1 vs Tier 2 in the enterprise use case) 54 | 55 | **Approach 2:** Swimlane the biggest sources of incidents. Identify the recurring causes of pain and isolate them. (If the Ads feature is in code yellow, swimlaning it is the best option.) 56 | 57 | **Approach 3:** Swimlane natural barriers. Customer boundaries make good swimlanes. (Public vs Enterprise customers) 58 | 59 | 60 | #### Refer 61 | - [https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch21.html#ch21](https://learning.oreilly.com/library/view/the-art-of/9780134031408/ch21.html#ch21) 62 | 63 | 64 | ### Applications in SRE role 65 | 1. Work with the DC tech or cloud team to distribute infrastructure so that it's resilient to switch or power failures, by creating fault zones within a Data Center ([https://docs.microsoft.com/en-us/azure/virtual-machines/manage-availability#use-availability-zones-to-protect-from-datacenter-level-failures]( 66 | https://docs.microsoft.com/en-us/azure/virtual-machines/manage-availability#use-availability-zones-to-protect-from-datacenter-level-failures)). 67 | 2. Work with partners and design the interactions between services so that one service's breakdown is not amplified in a cascading fashion to all upstreams. 68 | -------------------------------------------------------------------------------- /courses/CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | This code of conduct outlines expectations for participation in LinkedIn-managed open source communities, as well as steps for reporting unacceptable behavior. We are committed to providing a welcoming and inspiring community for all. People violating this code of conduct may be banned from the community.
2 | 3 | Our open source communities strive to: 4 | 5 | * **Be friendly and patient:** Remember you might not be communicating in someone else's primary spoken or programming language, and others may not have your level of understanding. 6 | * **Be welcoming:** Our communities welcome and support people of all backgrounds and identities. This includes, but is not limited to members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability. 7 | * **Be respectful:** We are a world-wide community of professionals, and we conduct ourselves professionally. Disagreement is no excuse for poor behavior and poor manners. Disrespectful and unacceptable behavior includes, but is not limited to: 8 | * Violent threats or language. 9 | * Discriminatory or derogatory jokes and language. 10 | * Posting sexually explicit or violent material. 11 | * Posting, or threatening to post, people's personally identifying information ("doxing"). 12 | * Insults, especially those using discriminatory terms or slurs. 13 | * Behavior that could be perceived as sexual attention. 14 | * Advocating for or encouraging any of the above behaviors. 15 | * **Understand disagreements:** Disagreements, both social and technical, are useful learning opportunities. Seek to understand the other viewpoints and resolve differences constructively. 16 | * This code is not exhaustive or complete. It serves to capture our common understanding of a productive, collaborative environment. We expect the code to be followed in spirit as much as in the letter. 17 | 18 | ### Scope 19 | 20 | This code of conduct applies to all repos and communities for LinkedIn-managed open source projects regardless of whether or not the repo explicitly calls out its use of this code. 
The code also applies in public spaces when an individual is representing a project or its community. Examples include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 21 | 22 | Note: Some LinkedIn-managed communities have codes of conduct that pre-date this document and issue resolution process. While communities are not required to change their code, they are expected to use the resolution process outlined here. The review team will coordinate with the communities involved to address your concerns. 23 | 24 | ### Reporting Code of Conduct Issues 25 | 26 | We encourage all communities to resolve issues on their own whenever possible. This builds a broader and deeper understanding and ultimately a healthier interaction. In the event that an issue cannot be resolved locally, please feel free to report your concerns by contacting [oss@linkedin.com](mailto:oss@linkedin.com). 27 | 28 | In your report, please include: 29 | 30 | * Your contact information. 31 | * Names (real, usernames or pseudonyms) of any individuals involved. If there are additional witnesses, please include them as well. 32 | * Your account of what occurred, and if you believe the incident is ongoing. If there is a publicly available record (e.g. a mailing list archive or a public chat log), please include a link or attachment. 33 | * Any additional information that may be helpful. 34 | 35 | All reports will be reviewed by a multi-person team and will result in a response that is deemed necessary and appropriate to the circumstances. Where additional perspectives are needed, the team may seek insight from others with relevant expertise or experience. The confidentiality of the person reporting the incident will be kept at all times. Involved parties are never part of the review team. 
36 | 37 | Anyone asked to stop unacceptable behavior is expected to comply immediately. If an individual engages in unacceptable behavior, the review team may take any action they deem appropriate, including a permanent ban from the community. 38 | 39 | _This code of conduct is based on the [Microsoft](https://opensource.microsoft.com/codeofconduct/) Open Source Code of Conduct which was based on the [template](http://todogroup.org/opencodeofconduct) established by the [TODO Group](http://todogroup.org/) and used by numerous other large communities (e.g., [Facebook](https://code.facebook.com/pages/876921332402685/open-source-code-of-conduct), [Yahoo](https://yahoo.github.io/codeofconduct), [Twitter](https://engineering.twitter.com/opensource/code-of-conduct), [GitHub](http://todogroup.org/opencodeofconduct/#opensource@github.com)) and the Scope section from the [Contributor Covenant version 1.4](http://contributor-covenant.org/version/1/4/)._ -------------------------------------------------------------------------------- /courses/level101/python_web/sre-conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | ## Scaling The App 4 | 5 | Design and development are just part of the journey. Sooner or later, we will need to set up continuous integration and continuous delivery pipelines, and we have to deploy this app somewhere. 6 | 7 | Initially, we can start by deploying this app on one virtual machine on any cloud provider. But this is a `Single point of failure`, which is something we never allow as SREs (or even as engineers). So an improvement here would be to have multiple instances of the application deployed behind a load balancer. This prevents problems caused by one machine going down. 8 | 9 | Scaling here would mean adding more instances behind the load balancer. But this scales only up to a certain point. After that, other bottlenecks in the system will start appearing.
For example, the DB will become the bottleneck, or perhaps the load balancer itself. How do you know what the bottleneck is? You need observability into every aspect of the application architecture. 10 | 11 | Only once you have metrics will you be able to tell what is going wrong, and where. **What gets measured, gets fixed!** 12 | 13 | Get deeper insights into scaling from School Of SRE's [Scalability module](../systems_design/scalability.md), and after going through it, apply your learnings and takeaways to this app. Think about how we can make this app geographically distributed, highly available, and scalable. 14 | 15 | ## Monitoring Strategy 16 | 17 | Once our application is deployed, it will work okay, but not forever. Reliability is in our job title, and we make systems reliable by designing them in a certain way. But things will still go down. Machines will fail. Disks will behave weirdly. Buggy code will get pushed to production. All of these scenarios make the system less reliable. So what do we do? **We monitor!** 18 | 19 | We keep an eye on the system's health, and if anything is not going as expected, we want to be alerted. 20 | 21 | Now let's think in terms of the given URL-shortening app. We need to monitor it, and we want to be notified if something goes wrong. But first we need to decide what that _something_ is that we want to keep an eye on. 22 | 23 | 1. Since it's a web app serving HTTP requests, we want to keep an eye on HTTP status codes and latencies. 24 | 2. Request volume is another good candidate: if the app is receiving an unusual amount of traffic, something might be off. 25 | 3. We also want to keep an eye on the database. Depending on the database solution chosen, this means query times, volumes, disk usage, etc. 26 | 4. Finally, there also needs to be some external monitoring that runs periodic tests from devices outside of your data centers.
This emulates customers and ensures that, from the customer's point of view, the system is working as expected. 27 | 28 | ## Applications in SRE role 29 | 30 | In the world of SRE, Python is widely used for small scripts and tooling built for various purposes. Since tooling developed by SREs works with critical pieces of infrastructure and has great power (including the power to bring things down), it is important to know what you are doing when using a programming language and its features. It is equally important to know the language and its characteristics when debugging issues. As an SRE, having a deeper understanding of the Python language has helped me a lot in debugging very sneaky bugs and in being more aware and informed when making certain design decisions. 31 | 32 | While developing tools may or may not be part of an SRE's job, supporting tools or services is more likely to be a daily duty. Building an application or tool is just a small part of productionization. While a certain amount of work goes into the design of the application itself to make it more robust, as an SRE you are responsible for its reliability and stability once it is deployed and running. To ensure that, you need to understand the application first, then come up with a strategy to monitor it properly and be prepared for various failure scenarios. 33 | 34 | ## Optional Exercises 35 | 36 | 1. Make a decorator that caches function return values based on the input parameters. 37 | 2. Host the URL-shortening app on any cloud provider. 38 | 3. Set up monitoring using any of the many tools available, like Catchpoint, Datadog, etc. 39 | 4. Create a minimal Flask-like framework on top of TCP sockets. 40 | 41 | ## Conclusion 42 | 43 | The first part of this module aims to make you more aware of what happens when you choose Python as your programming language and when you run a Python program.
With the knowledge of how Python internally handles things as objects, a lot of seemingly magical things in Python will start to make more sense. 44 | 45 | The second part first explains how a framework like Flask works, using existing knowledge of protocols like TCP and HTTP. It then touches on the whole application development lifecycle, including its SRE parts. While the design and the areas of architecture considered are not exhaustive, it gives a good overview of things that are important as an SRE and why they are important. 46 | -------------------------------------------------------------------------------- /courses/level102/containerization_and_orchestration/intro.md: -------------------------------------------------------------------------------- 1 | # Containers and orchestration 2 | 3 | ## Introduction 4 | 5 | Containers, Docker and Kubernetes are "cool" terms that everyone involved with software in some way seems to be talking about. Let's dive into each of these pieces of technology in enough depth to understand what the whole deal is about! 6 | 7 | In this module we talk about the ins and outs of containers: their internals and usage, how they are implemented, how to containerize your application and, finally, how to deploy containerized applications at a large scale without losing sleep. We'll also get our hands dirty by trying out a few lab exercises. 8 | 9 | ### Prerequisites 10 | - Basic knowledge of Linux will be helpful in understanding the internals of containers 11 | - Basic knowledge of shell commands (will come in handy when we're containerizing applications) 12 | - Knowledge of running a basic web application. You can go through our [Python And Web module](https://linkedin.github.io/school-of-sre/level101/python_web/intro/) to gain familiarity with this. 13 | 14 | 15 | ## What to expect from this course 16 | 17 | This module is divided into 3 sub-modules.
In the first sub-module, we will cover the internals of containerization and what containers are used for. 18 | 19 | The second sub-module introduces Docker, a popular container engine, and contains lab exercises on dockerizing a basic web app. 20 | 21 | The last sub-module talks about container orchestration with Kubernetes, with some lab exercises to show how it makes the lives of SREs easier. 22 | 23 | ## What is not covered under this course 24 | 25 | We will not cover advanced Docker and Kubernetes concepts. However, we will point you to links and references from which you can pick them up as per your interest. 26 | 27 | ## Course Contents 28 | 29 | The following topics are covered in this course: 30 | 31 | - [Introduction to containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/) 32 | - [What are containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#what-are-containers) 33 | - [Why containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#why-containers) 34 | - [Difference between virtual machines and containers](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#difference-between-virtual-machines-and-containers) 35 | - [How are containers implemented](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#how-are-containers-implemented) 36 | - [Namespaces](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#namespaces) 37 | - [Cgroups](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#cgroups) 38 | - [Container engines](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/intro_to_containers/#container-engine) 39 | - [Containerization
with Docker](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/) 40 | - [Introduction](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#introduction) 41 | - [Basic docker terminology](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#docker-terminology) 42 | - [Components of Docker engine](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#components-of-docker-engine) 43 | - [Hands-on](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#lab) 44 | - [Introduction to Advanced Docker](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/#advanced-features-of-docker) 45 | - [Container orchestration with Kubernetes](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/) 46 | - [Introduction](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#introduction) 47 | - [Motivation to use Kubernetes](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#motivation-to-use-kubernetes) 48 | - [Kubernetes Architecture](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#architecture-of-kubernetes) 49 | - [Hands-on](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#lab) 50 | - [Introduction to Advanced Kubernetes concepts](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/orchestration_with_kubernetes/#advanced-topics) 51 | - 
[Conclusion](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/conclusion/) -------------------------------------------------------------------------------- /courses/level102/system_troubleshooting_and_performance/introduction.md: -------------------------------------------------------------------------------- 1 | # System troubleshooting and performance improvements 2 | 3 | ## Prerequisites 4 | 5 | * [Linux Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/) 6 | * [System design](https://linkedin.github.io/school-of-sre/level101/systems_design/intro/) 7 | * [Basic Networking](https://linkedin.github.io/school-of-sre/level101/linux_networking/intro/) 8 | * [Metrics and Monitoring](https://linkedin.github.io/school-of-sre/level101/metrics_and_monitoring/introduction/) 9 | 10 | ## What to expect from this course 11 | 12 | This brief course aims to provide a general introduction to troubleshooting system issues, such as analysing API failures, 13 | resource utilization, network issues, and hardware and OS issues. The course also briefly covers profiling and benchmarking to measure overall system performance. 14 | 15 | ## What is not covered under this course 16 | 17 | This course does not cover the following: 18 | 19 | * System Design and Architecture. 20 | * Programming practices. 21 | * Metrics and Monitoring. 22 | * OS basics.
23 | 24 | ## Course Contents 25 | - [Introduction](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/introduction) 26 | - [Troubleshooting](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting) 27 | - [Troubleshooting Flowchart](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting/#troubleshooting-flowchart) 28 | - [General Practices](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting/#general-practices) 29 | - [General Host issues](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting/#general-host-issues) 30 | - [Important tools to know](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/important-tools) 31 | - [Important linux commands](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/important-tools/#important-linux-commands) 32 | - [Log analysis tools](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/important-tools/#log-analysis-tools) 33 | - [Performance improvements](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements) 34 | - [Performance analysis commands](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/#performance-analysis-commands) 35 | - [Profiling tools](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/#profiling-tools) 36 | - [Benchmarking](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/#benchmarking) 37 | - 
[Scaling](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/performance-improvements/#scaling) 38 | - [Troubleshooting Example](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/troubleshooting-example) 39 | - [Conclusion](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/conclusion) 40 | - [Further readings](https://linkedin.github.io/school-of-sre/level102/system_troubleshooting_and_performance/conclusion/#further-readings) 41 | 42 | ## Introduction 43 | Troubleshooting is an important part of operations & development. It can’t be learned just by reading one article or completing a course online. 44 | It's a continuous learning process; one learns it during: 45 | 46 | * Daily operations and development. 47 | * Finding & Fixing application bugs. 48 | * Finding & Fixing system & network issues. 49 | * Performance analysis and improvements. 50 | * And more. 51 | 52 | From an SRE’s perspective, it is expected that they are aware of certain topics upfront to be able to troubleshoot problems in single or distributed systems. 53 | 54 | * Know your resources well and understand host specifications, like CPU, Memory, Network, Disk etc. 55 | * Understand system design and architecture. 56 | * Ensure important metrics are being collected/rendered properly. 57 | 58 | There is a famous quote attributed to the HP founders: **“What gets measured gets fixed”** 59 | 60 | If system components and performance metrics are captured thoroughly, then there is a high chance of troubleshooting an issue successfully and as early as possible. 61 | 62 | ### Scope 63 | There is no common approach to troubleshooting different types of applications or services, since a failure can occur at any layer. We will limit the scope of this work to web API services only.
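To make the "know your resources well" point above a little more concrete, here is a minimal sketch of a host snapshot in Python. `host_snapshot` is a hypothetical helper, not part of this course's tooling; it assumes a Linux host (memory is read from `/proc/meminfo`) and uses only the standard library. In day-to-day work, commands like `lscpu`, `free`, `df` and `ip` report the same information in far more detail.

```python
import os
import shutil
import socket


def host_snapshot(path="/"):
    """Hypothetical helper: quick overview of CPU, memory, disk and network."""
    snapshot = {"cpu_count": os.cpu_count()}  # may be None if undeterminable

    # Disk: total size and usage of the filesystem containing `path`.
    total, used, _free = shutil.disk_usage(path)
    snapshot["disk_total_gb"] = round(total / 1e9, 1)
    snapshot["disk_used_pct"] = round(100 * used / total, 1) if total else 0.0

    # Load averages over 1, 5 and 15 minutes (Unix-like systems only).
    if hasattr(os, "getloadavg"):
        snapshot["load_avg"] = os.getloadavg()

    # Memory: parse /proc/meminfo (Linux only; values are reported in kB).
    try:
        meminfo = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, value = line.split(":", 1)
                meminfo[key] = int(value.split()[0])
        snapshot["mem_total_mb"] = meminfo["MemTotal"] // 1024
        snapshot["mem_available_mb"] = meminfo.get("MemAvailable", 0) // 1024
    except (OSError, KeyError, ValueError, IndexError):
        pass  # not Linux, or an unexpected format

    # Network: names of the network interfaces known to the kernel.
    try:
        snapshot["interfaces"] = [name for _index, name in socket.if_nameindex()]
    except (OSError, AttributeError):
        pass

    return snapshot


if __name__ == "__main__":
    for key, value in host_snapshot().items():
        print(f"{key}: {value}")
```

Recording such a baseline while the system is healthy makes it easier to spot what has changed when something later goes wrong.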
64 | 65 | **Note:** The Linux ecosystem is vast; there are hundreds of tools and utilities that can help with system troubleshooting, each with its own set of benefits and functionality. We will cover some well-known tools, either already available with Linux or available in the open source world. A detailed explanation of the tools mentioned in this doc is out of scope; please explore the man pages or the internet for more examples and documentation. 66 | 67 | --------------------------------------------------------------------------------