Figure 8: An alert notification received on Slack
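The alert in Figure 8 boils down to a simple rule: notify a Slack channel when memory usage on a host crosses a threshold. As a toy illustration of what such a rule checks, here is a shell sketch; the `SLACK_WEBHOOK_URL` variable is a placeholder, and in practice the monitoring tools described below evaluate the rule and send the notification for you.

```bash
#!/usr/bin/env bash
# Toy version of the rule behind Figure 8: alert when memory usage exceeds 90%.
# SLACK_WEBHOOK_URL is a placeholder; a real monitoring tool evaluates such rules for you.
used_pct=$(free | awk '/^Mem:/ {printf "%d", $3 / $2 * 100}')
if [ "$used_pct" -gt 90 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    --data "{\"text\": \"Memory usage at ${used_pct}% on $(hostname)\"}" \
    "$SLACK_WEBHOOK_URL"
fi
```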
18 | 19 | Today most of the monitoring services available provide a mechanism to 20 | set up alerts on one or a combination of metrics to actively monitor the 21 | service health. These alerts have a set of defined rules or conditions, 22 | and when the rule is broken, you are notified. These rules can be as 23 | simple as notifying when the metric value exceeds _n_ to as complex as a 24 | week-over-week (WoW) comparison of standard deviation over a period of 25 | time. Monitoring tools notify you about an active alert, and most of 26 | these tools support instant messaging (IM) platforms, SMS, email, or 27 | phone calls. Figure 8 shows a sample alert notification received on 28 | Slack for memory usage exceeding 90% of total RAM space on the 29 | host. 30 | -------------------------------------------------------------------------------- /courses/level101/security/intro.md: -------------------------------------------------------------------------------- 1 | # Security 2 | 3 | ## Prerequisites 4 | 5 | 1. [Linux Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/) 6 | 7 | 2. [Linux Networking](https://linkedin.github.io/school-of-sre/level101/linux_networking/intro/) 8 | 9 | 10 | ## What to expect from this course 11 | 12 | The course covers fundamentals of information security along with touching on subjects of system security, network & web security. This course aims to get you familiar with the basics of information security in day-to-day operations and then as an SRE develop the mindset of ensuring that security takes a front-seat while developing solutions. The course also serves as an introduction to common risks and best practices along with practical ways to find out vulnerable systems and loopholes which might become compromised if not secured. 13 | 14 | 15 | ## What is not covered under this course 16 | 17 | The courseware is not an ethical hacking workshop or a very deep dive into the fundamentals of the problems. The course does not deal with hacking or breaking into systems but rather an approach on how to ensure you don’t get into those situations and also to make you aware of different ways a system can be compromised. 18 | 19 | 20 | ## Course Contents 21 | 22 | 1. [Fundamentals](https://linkedin.github.io/school-of-sre/level101/security/fundamentals/) 23 | 2. [Network Security](https://linkedin.github.io/school-of-sre/level101/security/network_security/) 24 | 3. [Threats, Attacks & Defence](https://linkedin.github.io/school-of-sre/level101/security/threats_attacks_defences/) 25 | 4. [Writing Secure Code & More](https://linkedin.github.io/school-of-sre/level101/security/writing_secure_code/) 26 | 5. [Conclusion](https://linkedin.github.io/school-of-sre/level101/security/conclusion/) 27 | -------------------------------------------------------------------------------- /courses/level101/databases_sql/innodb.md: -------------------------------------------------------------------------------- 1 | ### Why should you use this? 2 | 3 | General purpose, row level locking, ACID support, transactions, crash recovery and multi-version concurrency control, etc. 4 | 5 | 6 | ### Architecture 7 | 8 |  9 | 10 | 11 | ### Key components: 12 | 13 | * Memory: 14 | * Buffer pool: LRU cache of frequently used data (table and index) to be processed directly from memory, which speeds up processing. Important for tuning performance. 15 | * Change buffer: Caches changes to secondary index pages when those pages are not in the buffer pool and merges it when they are fetched. 
Merging may take a long time and impact live queries. It also takes up part of the buffer pool. Avoids the extra I/O to read secondary indexes in. 16 | * Adaptive hash index: Supplements InnoDB’s B-Tree indexes with fast hash lookup tables like a cache. Slight performance penalty for misses, also adds maintenance overhead of updating it. Hash collisions cause AHI rebuilding for large DBs. 17 | * Log buffer: Holds log data before flush to disk. 18 | 19 | Size of each above memory is configurable, and impacts performance a lot. Requires careful analysis of workload, available resources, benchmarking and tuning for optimal performance. 20 | 21 | * Disk: 22 | * Tables: Stores data within rows and columns. 23 | * Indexes: Helps find rows with specific column values quickly, avoids full table scans. 24 | * Redo Logs: all transactions are written to them, and after a crash, the recovery process corrects data written by incomplete transactions and replays any pending ones. 25 | * Undo Logs: Records associated with a single transaction that contains information about how to undo the latest change by a transaction. 26 | 27 | -------------------------------------------------------------------------------- /courses/level102/networking/rtt.md: -------------------------------------------------------------------------------- 1 | > *Latency plays a key role in determining the overall performance of the 2 | distributed service/application, where calls are made between hosts to 3 | serve the users.* 4 | 5 | RTT is a measure of time, it takes for a packet to reach B from A, and 6 | return to A. It is measured in milliseconds. This measure plays a role 7 | in determining the performance of the services. Its impact is seen in 8 | calls made between different servers/services, to serve the user, as 9 | well as the TCP throughput that can be achieved. 10 | 11 | It is fairly common that service makes multiple calls to servers within 12 | its cluster or to different services like authentication, logging, 13 | database, etc, to respond to each user/client request. These servers can 14 | be spread across different cabinets, at times even between different 15 | data centres in the same region. Such cases are quite possible in cloud 16 | solutions, where the deployment spreads across different sites within a 17 | region. As the RTT increases, the response time for each of the calls 18 | gets longer and thereby has a cascading effect on the end response being 19 | sent to the user. 20 | 21 | ### Relation of RTT and throughput 22 | 23 | RTT is inversely proportional to the TCP throughput. As RTT increases, 24 | it reduces the TCP throughput, just like packet loss. Below is a formula 25 | to estimate the TCP throughput, based on TCP mss, RTT and packet loss. 26 | 27 |  29 | 30 | As within a data centre, these calculations are also, important for 31 | communication over the internet, where a client can connect to the DC 32 | hosted services, over different telco networks and the RTT is not very 33 | stable, due to the unpredictability of the Internet routing policies. 34 | -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/best_practices.md: -------------------------------------------------------------------------------- 1 | ## 2 | 3 | # Best practices for monitoring 4 | 5 | When setting up monitoring for a service, keep the following best 6 | practices in mind. 
7 | 8 | - **Use the right metric type**—Most of the libraries available 9 | today offer various metric types. Choose the appropriate metric 10 | type for monitoring your system. Following are the types of 11 | metrics and their purposes. 12 | 13 | - **Gauge**—*Gauge* is a constant type of metric. After the 14 | metric is initialized, the metric value does not change unless 15 | you intentionally update it. 16 | 17 | - **Timer**—*Timer* measures the time taken to complete a 18 | task. 19 | 20 | - **Counter**—*Counter* counts the number of occurrences of a 21 | particular event. 22 | 23 | For more information about these metric types, see [Data 24 | Types](https://statsd.readthedocs.io/en/v0.5.0/types.html). 25 | 26 | - **Avoid over-monitoring**—Monitoring can be a significant 27 | engineering endeavor. Therefore, be sure not to spend too 28 | much time and resources on monitoring services, yet make sure all 29 | important metrics are captured. 30 | 31 | - **Prevent alert fatigue**—Set alerts for metrics that are 32 | important and actionable. If you receive too many non-critical 33 | alerts, you might start ignoring alert notifications over time. As 34 | a result, critical alerts might get overlooked. 35 | 36 | - **Have a runbook for alerts**—For every alert, make sure you have 37 | a document explaining what actions and checks need to be performed 38 | when the alert fires. This enables any engineer on the team to 39 | handle the alert and take necessary actions, without any help from 40 | others. -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline.md: -------------------------------------------------------------------------------- 1 | CI is a software development practice where members of a team integrate their work frequently. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. 2 | 3 | Continuous integration requires that all the code changes be maintained in a single code repository where all the members can push the changes to their feature branches regularly. The code changes must be quickly integrated with the rest of the code and automated builds should happen and feedback to the member to resolve them early. 4 | 5 | There should be a CI server where it can trigger a build as soon as the code is pushed by a member. The build typically involves compiling the code and transforming it to an executable file such as JARs or DLLs etc. called packaging. It must also perform [unit tests](https://en.wikipedia.org/wiki/Unit_testing) with code coverage. Optionally, the build process can have additional stages such as static code analysis and vulnerability checks etc. 6 | 7 | [Jenkins](https://www.jenkins.io/), [Bamboo](https://confluence.atlassian.com/bamboo/understanding-the-bamboo-ci-server-289277285.html), [Travis CI](https://travis-ci.org/), [GitLab](https://about.gitlab.com/), [Azure DevOps](https://azure.microsoft.com/en-in/services/devops/) etc. are the few popular CI tools. These tools provide various plugins and integration such as [ant](https://ant.apache.org/), [maven](https://maven.apache.org/) etc. for building and packaging, and Junit, selenium etc. are for performing the unit tests. [SonarQube](https://www.sonarqube.org/) can be used for static code analysis and code security. 
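To make these stages concrete, the following is a minimal sketch of the commands a CI server might run on every push for a Maven-based Java project. The repository URL is a placeholder, and the JaCoCo and SonarQube steps assume those plugins are configured in the project; none of this is prescribed by the course.

```bash
#!/usr/bin/env bash
# Sketch of a CI build: fetch the change, compile and package, unit-test with coverage, analyze.
# The repository URL is a placeholder; the JaCoCo and SonarQube steps assume those plugins are configured.
set -euo pipefail

git clone --depth 1 https://example.com/team/app.git
cd app
mvn -B clean verify      # compile, run unit tests, and package the code into a JAR
mvn -B jacoco:report     # generate the unit-test code-coverage report
mvn -B sonar:sonar       # optional stage: static code analysis against a SonarQube server
```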
8 | 9 | 10 |  11 | 12 | *Fig 1: Continuous Integration Pipeline* 13 | 14 |  15 | 16 | *Fig 2: Continuous Integration Process* -------------------------------------------------------------------------------- /courses/level101/databases_sql/intro.md: -------------------------------------------------------------------------------- 1 | # Relational Databases 2 | 3 | ### Prerequisites 4 | * Complete [Linux course](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/) 5 | * Install Docker (for lab section) 6 | 7 | ### What to expect from this course 8 | You will have an understanding of what relational databases are, their advantages, and some MySQL specific concepts. 9 | 10 | ### What is not covered under this course 11 | * In-depth implementation details 12 | 13 | * Advanced topics like normalization, sharding 14 | 15 | * Specific tools for administration 16 | 17 | ### Introduction 18 | The main purpose of database systems is to manage data. This includes storage, adding new data, deleting unused data, updating existing data, retrieving data within a reasonable response time, other maintenance tasks to keep the system running, etc. 19 | 20 | ### Pre-reads 21 | [RDBMS Concepts](https://beginnersbook.com/2015/04/rdbms-concepts/) 22 | 23 | ### Course Contents 24 | - [Key Concepts](https://linkedin.github.io/school-of-sre/level101/databases_sql/concepts/) 25 | - [MySQL Architecture](https://linkedin.github.io/school-of-sre/level101/databases_sql/mysql/#mysql-architecture) 26 | - [InnoDB](https://linkedin.github.io/school-of-sre/level101/databases_sql/innodb/) 27 | - [Backup and Recovery](https://linkedin.github.io/school-of-sre/level101/databases_sql/backup_recovery/) 28 | - [MySQL Replication](https://linkedin.github.io/school-of-sre/level101/databases_sql/replication/) 29 | - Operational Concepts 30 | - [SELECT Query](https://linkedin.github.io/school-of-sre/level101/databases_sql/select_query/) 31 | - [Query Performance](https://linkedin.github.io/school-of-sre/level101/databases_sql/query_performance/) 32 | - [Lab](https://linkedin.github.io/school-of-sre/level101/databases_sql/lab/) 33 | - [Further Reading](https://linkedin.github.io/school-of-sre/level101/databases_sql/conclusion/#further-reading) 34 | -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/introduction.md: -------------------------------------------------------------------------------- 1 | ## Prerequisites 2 | 1. [Software Development and Maintenance](https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Implementation/Documentation) 3 | 2. [Git](https://linkedin.github.io/school-of-sre/level101/git/git-basics/) 4 | 3. [Docker](https://linkedin.github.io/school-of-sre/level102/containerization_and_orchestration/containerization_with_docker/) 5 | 6 | ## What to expect from this course? 7 | In this course, you will learn the basics of CI/CD and how it helps drive the SRE discipline in an organization. It also discusses the various DevOps tools in CI/CD practice and a hands-on lab session on [Jenkins](https://www.jenkins.io/) based pipeline. Finally, it will conclude by explaining the role in the growing SRE philosophy. 8 | 9 | ## What is not covered under this course? 10 | The course does not cover DevOps elements such as Infrastructure as a code, continuous monitoring applications and infrastructure comprehensively. 
11 | 12 | ## Table of Contents 13 | 14 | * [What is CI/CD?](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/introduction_to_cicd) 15 | * [Brief History of CI/CD and DevOps](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/cicd_brief_history) 16 | * [Continuous Integration](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_integration_build_pipeline) 17 | * [Continuous Delivery and Deployment](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/continuous_delivery_release_pipeline) 18 | * [Jenkins-based CI/CD pipeline - Hands-on](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/jenkins_cicd_pipeline_hands_on_lab) 19 | * [Conclusion](https://linkedin.github.io/school-of-sre/level102/continuous_integration_and_continuous_delivery/conclusion) 20 | -------------------------------------------------------------------------------- /courses/level102/linux_intermediate/introvim.md: -------------------------------------------------------------------------------- 1 | 2 | # Introduction to Vim 3 | 4 | ## Introduction 5 | As SREs, we often log in to servers to make changes to config files and to edit and modify scripts, and the editor that comes in handy and is available in almost all Linux distributions is Vim. Vim is a free, open-source command-line editor that is widely accepted and used. We will cover some basics of how to use Vim for creating and editing files. This knowledge will help us in understanding the next section, Scripting. 6 | 7 | ## Opening a file and using insert mode 8 | 9 | We use the command *`vim filename`* to open a file *`filename`*. The terminal will open an editor, but if you start typing, nothing will be inserted; that is because we are not in "INSERT" mode in Vim. 10 | 11 | Press ***`i`*** to get into insert mode and start writing. 12 | 13 | 14 | 15 | You will see “INSERT” at the bottom left after pressing ***`i`***. You can use the *`ESC`* key to get back to normal mode. 16 | 17 | ## Saving a file 18 | 19 | After you insert your text in INSERT mode, press the ESC (escape) key on your keyboard to get out of it. Press `:` (colon, Shift+;), then press ***`w`*** and hit Enter; the text you entered will be written to the file. 20 | 21 | 22 | 23 | ## Exiting the Vim editor 24 | 25 | Exiting Vim can be a real challenge for beginners. There are various ways to exit Vim, such as exiting without saving your work or exiting after saving it. 26 | 27 | Try the commands below after exiting insert mode and pressing ***`:`*** (colon). 28 | 29 | | Vim Commands | Description | 30 | | --- | --- | 31 | | **:q** | Exit the file; won’t exit if the file has unsaved changes | 32 | | **:wq** | Write (save) and exit the file. | 33 | | **:q!** | Exit without saving the changes. | 34 | 35 | These are the basics we will need for bash scripting in the next section. You can always visit a tutorial to learn more. For quick practice of Vim commands, visit: [https://www.openvim.com/](https://www.openvim.com/) 36 | 37 | -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/third-party_monitoring.md: -------------------------------------------------------------------------------- 1 | ## 2 | 3 | # Third-party monitoring 4 | 5 | Today, most cloud providers offer a variety of monitoring solutions.
In 6 | addition, a number of companies such as 7 | [Datadog](https://www.datadoghq.com/) offer 8 | monitoring-as-a-service. In this section, we are not covering 9 | monitoring-as-a-service in depth. 10 | 11 | In recent years, more and more people have access to the Internet. Many 12 | services are offered online to cater to the increasing user base. As a 13 | result, web pages are becoming larger, with increased client-side 14 | scripts. Users want these services to be fast and error-free. From the 15 | service point of view, when the response body is composed, an HTTP 200 16 | OK response is sent, and everything looks okay. But there might be 17 | errors during transmission or on the client-side. As previously 18 | mentioned, monitoring services from within the service infrastructure 19 | give good visibility into service health, but this is not enough. You 20 | need to monitor user experience, specifically the availability of 21 | services for clients. A number of third-party services such as 22 | [Catchpoint](https://www.catchpoint.com/), 23 | [Pingdom](https://www.pingdom.com/), and so on are available for 24 | achieving this goal. 25 | 26 | Third-party monitoring services can generate synthetic traffic 27 | simulating user requests from various parts of the world, to ensure the 28 | service is globally accessible. Other third-party monitoring solutions 29 | for real user monitoring (RUM) provide performance statistics such as 30 | service uptime and response time, from different geographical locations. 31 | This allows you to monitor the user experience from these locations, 32 | which might have different Internet backbones, different operating 33 | systems, and different browsers and browser versions. [Catchpoint 34 | Global Monitoring 35 | Network](https://pages.catchpoint.com/overview-video) is a 36 | comprehensive 3-minute video that explains the importance of monitoring 37 | the client experience. 38 | -------------------------------------------------------------------------------- /courses/level101/git/github-hooks.md: -------------------------------------------------------------------------------- 1 | # Git with GitHub 2 | 3 | Till now all the operations we did were in our local repo while git also helps us in a collaborative environment. GitHub is one place on the Internet where you can centrally host your git repos and collaborate with other developers. 4 | 5 | Most of the workflow will remain the same as we discussed, with addition of couple of things: 6 | 7 | 1. Pull: to pull latest changes from GitHub (the central) repo 8 | 2. Push: to push your changes to GitHub repo so that it's available to all people 9 | 10 | GitHub has written nice guides and tutorials about this and you can refer to them here: 11 | 12 | - [GitHub Hello World](https://guides.github.com/activities/hello-world/) 13 | - [Git Handbook](https://guides.github.com/introduction/git-handbook/) 14 | 15 | ## Hooks 16 | 17 | Git has another nice feature called hooks. Hooks are basically scripts which will be called when a certain event happens. Here is where hooks are located: 18 | 19 | ```bash 20 | $ ls .git/hooks/ 21 | applypatch-msg.sample fsmonitor-watchman.sample pre-applypatch.sample pre-push.sample pre-receive.sample update.sample 22 | commit-msg.sample post-update.sample pre-commit.sample pre-rebase.sample prepare-commit-msg.sample 23 | ``` 24 | 25 | Names are self-explanatory. These hooks are useful when you want to do certain things when a certain event happens. 
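Each hook is simply an executable script named after the event it handles. As a sketch, a `pre-push` hook that refuses a push when the test suite fails might look like the following; `make test` is a placeholder for however your project runs its tests, and the file must be made executable just like the pre-commit hook created below.

```bash
#!/bin/bash
# .git/hooks/pre-push runs before `git push`; a non-zero exit status aborts the push.
# `make test` is a placeholder for whatever command runs this project's test suite.
make test || {
  echo "Tests failed; push aborted."
  exit 1
}
```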
If you want to run tests before pushing code, you would want to setup `pre-push` hooks. Let's try to create a pre commit hook. 26 | 27 | ```bash 28 | $ echo "echo this is from pre commit hook" > .git/hooks/pre-commit 29 | $ chmod +x .git/hooks/pre-commit 30 | ``` 31 | 32 | We basically create a file called `pre-commit` in hooks folder and make it executable. Now if we make a commit, we should see the message getting printed. 33 | 34 | ```bash 35 | $ echo "sample file" > sample.txt 36 | $ git add sample.txt 37 | $ git commit -m "adding sample file" 38 | this is from pre commit hook # <===== THE MESSAGE FROM HOOK EXECUTION 39 | [master 9894e05] adding sample file 40 | 1 file changed, 1 insertion(+) 41 | create mode 100644 sample.txt 42 | ``` 43 | -------------------------------------------------------------------------------- /courses/level101/linux_networking/intro.md: -------------------------------------------------------------------------------- 1 | # Linux Networking Fundamentals 2 | 3 | ## Prerequisites 4 | 5 | - High-level knowledge of commonly used jargon in TCP/IP stack like DNS, TCP, UDP and HTTP 6 | - [Linux Commandline Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/command_line_basics/) 7 | 8 | ## What to expect from this course 9 | 10 | Throughout the course, we cover how an SRE can optimize the system to improve their web stack performance and troubleshoot if there is an issue in any of the layers of the networking stack. This course tries to dig through each layer of traditional TCP/IP stack and expects an SRE to have a picture beyond the bird’s eye view of the functioning of the Internet. 11 | 12 | ## What is not covered under this course 13 | 14 | This course spends time on the fundamentals. We are not covering concepts like [HTTP/2.0](https://en.wikipedia.org/wiki/HTTP/2), [QUIC](https://en.wikipedia.org/wiki/QUIC), [TCP congestion control protocols](https://en.wikipedia.org/wiki/TCP_congestion_control), [Anycast](https://en.wikipedia.org/wiki/Anycast), [BGP](https://en.wikipedia.org/wiki/Border_Gateway_Protocol), [CDN](https://en.wikipedia.org/wiki/Content_delivery_network), [Tunnels](https://en.wikipedia.org/wiki/Virtual_private_network) and [Multicast](https://en.wikipedia.org/wiki/Multicast). We expect that this course will provide the relevant basics to understand such concepts. 15 | 16 | ## Birds eye view of the course 17 | 18 | The course covers the question “What happens when you open [linkedin.com](https://www.linkedin.com) in your browser?” The course follows the flow of TCP/IP stack. More specifically, the course covers topics of Application layer protocols (DNS and HTTP), transport layer protocols (UDP and TCP), networking layer protocol (IP) and data link layer protocol. 19 | 20 | ## Course Contents 21 | 1. [DNS](https://linkedin.github.io/school-of-sre/level101/linux_networking/dns/) 22 | 2. [UDP](https://linkedin.github.io/school-of-sre/level101/linux_networking/udp/) 23 | 3. [HTTP](https://linkedin.github.io/school-of-sre/level101/linux_networking/http/) 24 | 4. [TCP](https://linkedin.github.io/school-of-sre/level101/linux_networking/tcp/) 25 | 5. 
[IP Routing](https://linkedin.github.io/school-of-sre/level101/linux_networking/ipr/) 26 | -------------------------------------------------------------------------------- /courses/level102/continuous_integration_and_continuous_delivery/conclusion.md: -------------------------------------------------------------------------------- 1 | ## Applications in SRE Role 2 | 3 | Monitoring, automation, and the elimination of toil are some of the core pillars of the SRE discipline. As an SRE, you may spend about 50% of your time automating repetitive tasks and eliminating toil. CI/CD pipelines are among the crucial tools for an SRE. They help deliver quality applications through smaller, more regular, and more frequent builds. Additionally, CI/CD metrics such as deployment time, success rate, cycle time, and automated test success rate are key things to watch in order to improve product quality and, in turn, the reliability of the applications. 4 | 5 | * [Infrastructure-as-code](https://en.wikipedia.org/wiki/Infrastructure_as_code) is one of the standard practices followed in SRE for automating repetitive configuration tasks. Every configuration is maintained as code, so it can be deployed using CI/CD pipelines. It is important to deliver configuration changes to production environments through CI/CD pipelines in order to maintain versioning, keep changes consistent across environments, and avoid manual errors. 6 | * Often, as an SRE, you are required to review application CI/CD pipelines and recommend additional stages, such as static code analysis and security and privacy checks, to improve the security and reliability of the product. 7 | 8 | ## Conclusion 9 | 10 | In this chapter, we studied CI/CD pipelines, along with a brief history of the challenges of traditional build practices. We also looked at how CI/CD pipelines augment the SRE discipline. Using CI/CD pipelines in the software development life cycle is a modern approach in the SRE realm that helps achieve greater efficiency. 11 | 12 | We also performed a hands-on lab activity on creating a CI/CD pipeline using Jenkins. 13 | 14 | ## References 15 | 16 | 1. [Continuous Integration (martinfowler.com)](https://martinfowler.com/articles/continuousIntegration.html) 17 | 2. [CI/CD for microservices - Azure Architecture Center | Microsoft Docs](https://docs.microsoft.com/en-us/azure/architecture/microservices/ci-cd) 18 | 3. [SREFoundationBlueprint_2 (devopsinstitute.com)](https://www.devopsinstitute.com/wp-content/uploads/2020/11/SREF-Blueprint.pdf) 19 | 4. [Jenkins User Documentation](https://www.jenkins.io/doc/) -------------------------------------------------------------------------------- /courses/level102/linux_intermediate/package_management.md: -------------------------------------------------------------------------------- 1 | # Package Management 2 | ## Introduction 3 | 4 | One of the main features of any operating system is the ability to run other programs and software; hence, package management comes into the picture. Package management is a method of installing and maintaining software programs on any operating system. 5 | 6 | ## Package 7 | 8 | In the early days of Linux, one had to download the source code of a piece of software and compile it in order to install and run it. As the Linux ecosystem matured, it became clear that the software landscape is very dynamic, and distributions started shipping software in the form of packages.
Package file is a compressed collection of files that contains software, its dependencies, installation instructions and metadata about the package. 9 | 10 | ## Dependencies 11 | 12 | It is rare that a software package is stand-alone, it depends on the different software, libraries and modules. These subroutines are stored and made available in the form of shared libraries which may serve more than one program. These shared resources are called dependencies. Package management does this hard job of resolving dependencies and installing them for the user along with the software. 13 | 14 | ## Repository 15 | 16 | Repository is a storage location where all the packages, updates, dependencies are stored. Each repository can contain thousands of software packages hosted on a remote server intended to be installed and updated on linux systems. We usually update the package information ( *often referred to as metadata*) by running “*sudo dnf update”.* 17 | 18 |  19 | 20 | Try out *`sudo dnf repolist all`* to list all the repositories. 21 | 22 | We usually add repositories for installing packages from third party vendors. 23 | 24 | > dnf config-manager --add-repo http://www.example.com/example.repo 25 | 26 | ## High Level and Low-Level Package management tools 27 | 28 | There are mainly two types of packages management tools: 29 | 30 | > 1\. *Low-level tools*: This is mostly used for installing, removing and upgrading package files. 31 | > 32 | > 2\. *High-Level tools*: In addition to Low-level tools, High-level tools do metadata searching and dependency resolution as well. 33 | 34 | | Linux Distribution | Low-Level Tools | High-Level tools | 35 | | --- | --- | --- | 36 | | Debian | dpkg | apt-get | 37 | | Fedora, RedHat | dnf | dnf | 38 | 39 | -------------------------------------------------------------------------------- /courses/level101/metrics_and_monitoring/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | A robust monitoring and alerting system is necessary for maintaining and 4 | troubleshooting a system. A dashboard with key metrics can give you an 5 | overview of service performance, all in one place. Well-defined alerts 6 | (with realistic thresholds and notifications) further enable you to 7 | quickly identify any anomalies in the service infrastructure and in 8 | resource saturation. By taking necessary actions, you can avoid any 9 | service degradations and decrease MTTD for service breakdowns. 10 | 11 | In addition to in-house monitoring, monitoring real-user experience can 12 | help you to understand service performance as perceived by the users. 13 | Many modules are involved in serving the user, and most of them are out 14 | of your control. Therefore, you need to have real-user monitoring in 15 | place. 16 | 17 | Metrics give very abstract details on service performance. To get a 18 | better understanding of the system and for faster recovery during 19 | incidents, you might want to implement the other two pillars of 20 | observability: logs and tracing. Logs and trace data can help you 21 | understand what led to service failure or degradation. 
22 | 23 | Following are some resources to learn more about monitoring and 24 | observability: 25 | 26 | - [Google SRE book: Monitoring Distributed 27 | Systems](https://sre.google/sre-book/monitoring-distributed-systems/) 28 | 29 | - [Mastering Distributed Tracing by Yuri 30 | Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/) 31 | 32 | 33 | ## References 34 | 35 | - [Google SRE book: Monitoring Distributed 36 | Systems](https://sre.google/sre-book/monitoring-distributed-systems/) 37 | 38 | - [Mastering Distributed Tracing, by Yuri 39 | Shkuro](https://learning.oreilly.com/library/view/mastering-distributed-tracing/9781788628464/) 40 | 41 | - [Monitoring and 42 | Observability](https://copyconstruct.medium.com/monitoring-and-observability-8417d1952e1c) 43 | 44 | - [Three PIllars with Zero 45 | Answers](https://medium.com/lightstephq/three-pillars-with-zero-answers-2a98b36358b8) 46 | 47 | - Engineering blogs on 48 | [LinkedIn](https://engineering.linkedin.com/blog/topic/monitoring), 49 | [Grafana](https://grafana.com/blog/), 50 | [Elastic.co](https://www.elastic.co/blog/), 51 | [OpenTelemetry](https://medium.com/opentelemetry) 52 | -------------------------------------------------------------------------------- /courses/level102/system_calls_and_signals/intro.md: -------------------------------------------------------------------------------- 1 | # System Calls and Signals 2 | 3 | ## Prerequisites 4 | 5 | - [Linux Basics](https://linkedin.github.io/school-of-sre/level101/linux_basics/intro/) 6 | - [Python Basics](https://linkedin.github.io/school-of-sre/level101/python_web/intro/) 7 | 8 | ## What to expect from this course 9 | 10 | The course covers a fundamental understanding of signals and system calls. It sheds light on how the knowledge of signals and system calls can be helpful for an SRE. 11 | 12 | ## What is not covered under this course 13 | 14 | The course does not discuss any other interrupts or interrupt handling apart from signals. The course will not deep dive into signal handler and GNU C library. 
15 | 16 | ## Course Contents 17 | - [Signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals) 18 | - [Introduction to interrupts and signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#introduction-to-interrupts-and-signals) 19 | - [Types of signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#types-of-signals) 20 | - [Sending signals to process](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#sending-signals-to-process) 21 | - [Handling signals](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#handling-signals) 22 | - [Role of signals in system calls with the example of *wait()*](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/signals/#role-of-signals-in-system-calls-with-the-example-of-wait) 23 | - [System calls](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls) 24 | - [Introduction](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#introduction) 25 | - [Types of system calls](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#types-of-system-calls) 26 | - [User mode,kernel mode and their transitions](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#user-mode-kernel-mode-and-their-transitions) 27 | - [Working of *write()* system call](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#working-of-write-system-call) 28 | - [Debugging in Linux with strace](https://linkedin.github.io/school-of-sre/level102/system_calls_and_signals/system_calls/#debugging-in-linux-with-strace) 29 | 30 | -------------------------------------------------------------------------------- /courses/level101/security/conclusion.md: -------------------------------------------------------------------------------- 1 | # Conclusion 2 | 3 | Now that you have completed this course on Security you are now aware of the possible security threats to computer systems & networks. Not only that, but you are now better able to protect your systems as well as recommend security measures to others. 4 | 5 | This course provides fundamental everyday knowledge on security domain which will also help you keep security at the top of your priority. 6 | 7 | ## Other Resources 8 | 9 | Some books that would be a great resource 10 | 11 | - Holistic Info-Sec for Web Developers (Figure 2: Results of top command
32 | 33 | - **`ss`**: The socket statistics command (`ss`) displays information 34 | about network sockets on the system. This tool is the successor of 35 | [netstat](https://man7.org/linux/man-pages/man8/netstat.8.html), 36 | which is deprecated. Following are some command-line options 37 | supported by the `ss` command: 38 | 39 | - `-t`: Displays the TCP socket. Similarly, `-u` displays UDP 40 | sockets, `-x` is for UNIX domain sockets, and so on. 41 | 42 | - `-l`: Displays only listening sockets. 43 | 44 | - `-n`: Instructs the command to not resolve service names. 45 | Instead displays the port numbers. 46 | 47 | Figure 48 | 3: List of listening sockets on a system
49 | 50 | - **`free`**: The `free` command displays memory usage statistics on the 51 | host like available memory, used memory, and free memory. Most often, 52 | this command is used with the `-h` command-line option, which 53 | displays the statistics in a human-readable format. 54 | 55 |  56 |Figure 4: Memory statistics on a host in human-readable form
57 | 58 | - **`df`**: The `df` command displays disk space usage statistics. The 59 | `-i` command-line option is also often used to display 60 | [inode](https://en.wikipedia.org/wiki/Inode) usage 61 | statistics. The `-h` command-line option is used for displaying 62 | statistics in a human-readable format. 63 | 64 |  65 |Figure 5: 66 | Disk usage statistics on a system in human-readable form
67 | 68 | - **`sar`**: The `sar` utility monitors various subsystems, such as CPU 69 | and memory, in real time. This data can be stored in a file 70 | specified with the `-o` option. This tool helps to identify 71 | anomalies. 72 | 73 | - **`iftop`**: The interface top command (`iftop`) displays bandwidth 74 | utilization by a host on an interface. This command is often used 75 | to identify bandwidth usage by active connections. The `-i` option 76 | specifies which network interface to watch. 77 | 78 |  80 |Figure 6: Network bandwidth usage by 81 | active connection on the host
82 | 83 | - **`tcpdump`**: The `tcpdump` command is a network monitoring tool that 84 | captures network packets flowing over the network and displays a 85 | description of the captured packets. The following options are 86 | available: 87 | 88 | - `-i Figure 7: tcpdump of packets on docker0
101 | interface on a host
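Taken together, these tools make up a quick first-pass triage of a host during an incident. Below is a sketch of such a pass; the interface name, port filter, and sample counts are assumptions you would adapt to your environment.

```bash
#!/usr/bin/env bash
# Quick first-pass triage of a host using the tools described above.
# The interface name (eth0), port filter, and sample counts are illustrative assumptions.
top -b -n 1 | head -n 15               # one batch-mode snapshot: load average, CPU, top processes
free -h                                # memory usage in human-readable units
df -h && df -i                         # disk space and inode usage
ss -tln                                # listening TCP sockets with numeric ports
sar -u 1 5                             # CPU utilization: 5 samples, 1 second apart (needs sysstat)
sudo iftop -i eth0 -t -s 10            # ~10 seconds of per-connection bandwidth on eth0, text mode
sudo tcpdump -i eth0 -c 20 'port 443'  # capture 20 packets to or from port 443
```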