├── README.md ├── docs ├── cpu-ctxt.md ├── cpu-load.md ├── cpu-percentage.md ├── images │ └── linux_io.png ├── io-usage.md ├── memory-usage.md ├── net-util.md └── references.md ├── packer ├── README.md ├── bootstrap.sh └── packer.json ├── saltstack ├── README.md ├── Saltfile ├── master ├── passwords │ ├── README.md │ ├── parseme.py │ ├── sample_sorted.txt │ └── uuids.json ├── pillars │ ├── defaults │ │ └── init.sls │ └── top.sls ├── roster ├── states │ ├── bootstrap │ │ ├── init.sls │ │ ├── iptables_redirect │ │ ├── netdata.service │ │ ├── rc-local.service │ │ └── rc.local │ ├── control │ │ └── init.sls │ ├── top.sls │ └── users │ │ ├── init.sls │ │ ├── root_bashrc │ │ └── workshop_sudoers └── tools │ └── install_python.sh └── scripts ├── cpu ├── dummy1.sh ├── dummy2.sh ├── dummy3.sh ├── dummy4.sh └── dummy_app.py ├── disk ├── fio1.sh ├── fio2.sh ├── fio3.sh └── writer.sh └── memory ├── buffer.sh ├── cache.sh ├── dentry.py ├── dentry2.py └── hog.sh /README.md: -------------------------------------------------------------------------------- 1 | # Linux Metrics Workshop 2 | While you can learn a lot by emitting metrics from your application, some insights can only be gained by looking at OS metrics. In this hands-on workshop, we will cover the basics of Linux metric collection for monitoring, performance tuning and capacity planning. 3 | 4 | ## Topics 5 | 1. CPU 6 | 1. [CPU Percentage](docs/cpu-percentage.md) 7 | 1. [CPU Load](docs/cpu-load.md) 8 | 1. [Context Switches](docs/cpu-ctxt.md) 9 | 1. Memory 10 | 1. [Memory Usage](docs/memory-usage.md) 11 | 1. IO 12 | 1. [IO Usage](docs/io-usage.md) 13 | 1. Network 14 | 1. [Network Utilization](docs/net-util.md) 15 | 1. [References](docs/references.md) 16 | 17 | ## Setup 18 | The workshop was designed to run on AWS EC2 t2.small instances with general purpose SSD, Ubuntu 18.04 amd64, and transparent huge pages disabled. 
19 | You can build an AMI with all the dependencies installed using the attached [packer](https://www.packer.io/) template. 20 | 21 | If you run on your own instance, make sure you have only 1 CPU (easier to read the metrics) and that you disable transparent huge pages (`echo never > /sys/kernel/mm/transparent_hugepage/enabled`) 22 | -------------------------------------------------------------------------------- /docs/cpu-ctxt.md: -------------------------------------------------------------------------------- 1 | # CPU Metrics 2 | 3 | You will need 3 ssh terminals 4 | 5 | ## Context Switches 6 | 7 | ### Recall: Linux Process Context Switches 8 | A mechanism to store the current process *state*, ie. registers, memory maps, and kernel structs (eg. the TSS on 32-bit), and load another (or a new one). Context switches are usually computationally expensive (although optimizations exist), yet inevitable. For example, they are used to allow multi-tasking (eg. preemption), and to switch between user and kernel modes. 9 | 10 | Interprocess context switches are classified as *voluntary* or *involuntary*. A voluntary context switch occurs when a thread blocks because it 11 | requires a resource that is unavailable. An involuntary context switch takes place when a thread executes for the duration of its time slice or when 12 | the system identifies a higher-priority thread to run. 13 | 14 | ### Task CS1: Context Switches 15 | 16 | 1. Execute `vmstat 2` in a session (#1) and write down the current context switch rate (`cs` field): 17 | ```bash 18 | (term 1) root:~# vmstat 2 19 | ``` 20 | 2. Raise that number by executing `stress -i 10` in a new session (#2): 21 | ```bash 22 | (term 2) root:~# stress -i 10 23 | ``` 24 | 1. What is the current context switch rate? 25 | 2. What is causing this rate? Multi-tasking? Interrupts? Switches between kernel and user modes? 26 | 3. Kill the `stress` command in session #2, and watch the rate drop. 27 | 3. 
Now let's see how a high context switch rate affects a dummy application. 28 | 1. On session #2 run the dummy application `dummy_app.py` (which calls a dummy function 5000 times, and prints its runtime percentiles): 29 | ```bash 30 | (term 2) root:~# perf stat -e cs python linux-metrics/scripts/cpu/dummy_app.py 31 | ``` 32 | 2. Write down the current CPU usage, the application percentiles and the context switch rate. 33 | 3. **In the same session (#2)**, raise the context switch rate using `stress -i 10 -t 150 &` and re-run the dummy application. Write down the current CPU usage, the application percentiles and the context switch rate. 34 | ```bash 35 | (term 2) root:~# stress -i 10 -t 150 & 36 | (term 2) root:~# perf stat -e cs python linux-metrics/scripts/cpu/dummy_app.py 37 | ``` 38 | 4. Describe the change in the percentiles. Did the high context switch rate affect most of the `foo()` runs (ie. the 50th percentile)? If not, why? 39 | 4. Observe the behaviour when running `stress` in a different scheduling task group: 40 | 1. Open a new session (#3) and move it to a different cgroup: 41 | ```bash 42 | (term 3) root:~# mkdir -p /sys/fs/cgroup/cpu/grp/c; echo $$ | sudo tee /sys/fs/cgroup/cpu/grp/c/tasks 43 | ``` 44 | 2. Run `stress` again in the new session (#3), using `stress -i 10 -t 150` or `stress -c 10 -t 150`: 45 | ```bash 46 | (term 3) root:~# stress -i 10 -t 150 47 | ``` 48 | 3. Compare the CPU usage to **3.iii** (it should be roughly the same) and compare the context switch rate (which should also be the same). 49 | 4. Re-run the dummy application in the previous session (#2) and describe the change in the percentiles (and process context switches) vs **3.iv**. 50 | 5. What happens when processes compete for cpu time under a cgroup hierarchy? 51 | 1. Move the second session to a new cgroup: 52 | ```bash 53 | (term 2) root:~# mkdir -p /sys/fs/cgroup/cpu/grp/b; echo $$ | sudo tee /sys/fs/cgroup/cpu/grp/b/tasks 54 | ``` 55 | 2. Run `stress` in session #2 and the dummy application under `perf` in session #3. 
What do you observe? 56 | ```bash 57 | (term 2) root:~# stress -i 10 -t 150 58 | (term 3) root:~# perf stat -e cs python linux-metrics/scripts/cpu/dummy_app.py 59 | ``` 60 | 2. Lower `cpu.shares` for the stress cgroup (#2) and raise it for cgroup (#3): 61 | ```bash 62 | (term 2) root:~# echo 200 > /sys/fs/cgroup/cpu/grp/b/cpu.shares 63 | (term 3) root:~# echo 1000 > /sys/fs/cgroup/cpu/grp/c/cpu.shares 64 | ``` 65 | 3. In session #2 run `stress` again and in session #3 run the dummy application under `perf`: 66 | ```bash 67 | (term 2) root:~# stress -i 10 -t 150 68 | (term 3) root:~# perf stat -e cs python linux-metrics/scripts/cpu/dummy_app.py 69 | ``` 70 | 4. What do you observe? 71 | 72 | ### Discussion 73 | 74 | - Can performance measurements on a staging environment truly estimate performance on production? 75 | - Why did we run the `stress` command and our dummy application in the same session? 76 | 77 | ### Tools 78 | 79 | - Most tools use `/proc/stat` to fetch global context switching information (`ctxt` field), and `/proc/[PID]/status` for process specific information (`voluntary_context_switches` and `nonvoluntary_context_switches` fields). 80 | - From the command-line you can use: 81 | - `vmstat <interval>` for global information. 82 | - `pidstat -w -p <PID>` for process specific information (voluntary/involuntary context switches). 83 | 84 | ### Further reading 85 | 86 | - `man sched` 87 | - http://www.linfo.org/context_switch.html 88 | - https://wiki.archlinux.org/index.php/cgroups 89 | 90 | #### Next: [Memory Usage](memory-usage.md) 91 | -------------------------------------------------------------------------------- /docs/cpu-load.md: -------------------------------------------------------------------------------- 1 | # CPU Metrics 2 | 3 | ## CPU Load 4 | On Linux, Load Average is the average number of runnable (running + ready to run) tasks, plus tasks in uninterruptible state, over the last 1, 5, and 15 minutes. 
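The kernel maintains each of the three numbers as an exponentially-damped moving average, recomputed every 5 seconds from the count of active tasks. A minimal sketch of that update rule (a simplification we wrote for illustration — the real kernel uses fixed-point arithmetic and folds per-CPU counts, and the function name and constants below are ours):

```python
import math

def update_load(load, n_active, period_s=60.0, tick_s=5.0):
    """One exponentially-damped update: decay the previous average
    toward the current number of active (runnable + uninterruptible)
    tasks. period_s selects the 1-, 5- or 15-minute average."""
    decay = math.exp(-tick_s / period_s)
    return load * decay + n_active * (1.0 - decay)

# Hold 4 active tasks for one minute of 5-second ticks, starting idle:
load = 0.0
for _ in range(12):
    load = update_load(load, 4)
print(round(load, 2))  # -> 2.53: the 1-minute average still lags the true load
```

Note how even after a full minute the average lags the actual task count — which is why Task CL1 below asks you to wait until `ldavg-1` stabilizes.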
5 | 6 | 7 | ### Recall: Linux Process State 8 | 9 | From `man 1 ps`: 10 | ``` 11 | D uninterruptible sleep (usually IO) 12 | R running or runnable (on run queue) 13 | S interruptible sleep (waiting for an event to complete) 14 | T stopped, either by a job control signal or because it is being traced 15 | Z defunct ("zombie") process, terminated but not reaped by its parent 16 | ``` 17 | 18 | ### Task CL1: CPU Load 19 | Open 3 terminals (ssh connections). 20 | 21 | 1. Where are the Load Averages? Use the Linux Process States and `man 5 proc` (search for loadavg): 22 | ```bash 23 | (term 1) root:~# man 5 proc 24 | (term 1) root:~# cat /proc/loadavg 25 | 46.26 12.59 4.39 1/106 7023 26 | ``` 27 | 2. Start the disk stress script (**NOTE: Do not run this on your own laptop !!!**): 28 | 29 | ```bash 30 | (term 1) root:~# /bin/sh linux-metrics/scripts/disk/writer.sh 31 | ``` 32 | 33 | 3. Run the following command and look at the Load values for about a minute until `ldavg-1` stabilizes: 34 | 35 | ```bash 36 | (term 2) root:~# sar -q 1 100 37 | ``` 38 | * What is the writing speed of the script? 39 | * What is the current Load Average? Why? Which processes contribute to this number? 40 | ```bash 41 | (term 2) root:~# top 42 | ``` 43 | * What are CPU `%user`, `%sy`, `%IO-wait` and `%idle`? 44 | 45 | 4. While the previous script is running, start a single CPU stress: 46 | 47 | ```bash 48 | (term 3) root:~# stress -c 1 -t 3600 49 | ``` 50 | Wait another minute, and answer the questions above. 51 | 52 | 5. Stop all scripts. 53 | 54 | ### Discussion 55 | 56 | - Why are processes waiting for IO included in the Load Average? 57 | - Assuming we have 1 CPU core and a Load of 5, is our CPU core at 100% utilization? 58 | - How can we know if load is going up or down? 59 | - Does a load average of 70 indicate a problem? 
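One of the discussion questions — how to tell whether load is going up or down — can be answered from a single `/proc/loadavg` sample, because the shorter window reacts to change first. A small sketch (the helper name and the "rising/falling" labels are ours, not standard terminology):

```python
def load_trend(loadavg_line):
    """Classify a /proc/loadavg sample by comparing the 1-minute
    average against the 15-minute average."""
    fields = loadavg_line.split()
    one_min, fifteen_min = float(fields[0]), float(fields[2])
    if one_min > fifteen_min:
        return "rising"   # recent load exceeds the long-term average
    if one_min < fifteen_min:
        return "falling"
    return "steady"

# The sample from Task CL1 shows load climbing sharply:
print(load_trend("46.26 12.59 4.39 1/106 7023"))  # -> rising
```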
60 | 61 | ### Further reading 62 | - [Understanding Linux CPU Load](http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages) 63 | 64 | ### Tools 65 | 66 | - Most tools use `/proc/loadavg` to fetch Load Average and run queue information. 67 | - To get Load Average information over a specific interval of time, you can use: 68 | - `sar -q <interval>` 69 | - `-q` queue length and load averages 70 | - or simply `uptime` 71 | 72 | #### Next: [Context Switches](cpu-ctxt.md) 73 | -------------------------------------------------------------------------------- /docs/cpu-percentage.md: -------------------------------------------------------------------------------- 1 | # CPU Metrics 2 | 3 | You will need 3 ssh terminals 4 | 5 | ## CPU Percentage 6 | Let's start with the most common CPU metric. 7 | Fire up `top`, and let's start figuring out what the different CPU percentage values are. 8 | ```bash 9 | (term 1) root:~# top 10 | ``` 11 | 12 | The output will look like: 13 | ```bash 14 | %Cpu(s): 2.3 us, 0.6 sy, 0.0 ni, 96.7 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st 15 | ``` 16 | 17 | ### Task CP1: CPU Percentage 18 | For each of the following scripts (`dummy1.sh`, `dummy2.sh`, `dummy3.sh`, `dummy4.sh`) under the `scripts/cpu/` directory: 19 | 20 | 1. Run the script: 21 | ```bash 22 | (term 2) root:~# /bin/sh linux-metrics/scripts/cpu/dummy1.sh 23 | ``` 24 | 2. While the script is running, look at `top` in terminal window 1. 25 | 3. Without looking at the code, try to figure out what the script is doing (find the percentage fields description in `man 1 top`) 26 | 4. Stop the script (use `Ctrl+C` or wait 2 minutes for it to time out) 27 | 5. Verify your answer by reading the script content 28 | 29 | ### Tools 30 | 31 | - Most tools use `/proc/stat` to fetch CPU percentage. Note that it displays the amount of time measured in units of USER_HZ 32 | (1/100ths of a second on most architectures), also called jiffies, and not a percentage. 
33 | - To get a percentage over a specific interval of time, you can use: 34 | - `sar <interval>` 35 | - or `sar -P ALL -u <interval>` (for details on multiple CPUs) 36 | - `-P` per-processor statistics 37 | - `-u` CPU utilization 38 | - or `mpstat` (similar usage and output) 39 | ```bash 40 | (term 3) root:~# sar 1 41 | ``` 42 | or 43 | ```bash 44 | (term 3) root:~# mpstat 1 45 | ``` 46 | 47 | ### Discussion 48 | 49 | - What's the difference between `%IO-wait` and `%idle`? 50 | - Is the entire CPU load created by a process accounted to that process? 51 | 52 | ### Further reading 53 | 54 | - `man proc` 55 | - `man sar` 56 | 57 | #### Time Stolen and Amazon EC2 58 | 59 | You may have noticed the `st` label. From `man 1 top`: 60 | ``` 61 | st : time stolen from this vm by the hypervisor 62 | ``` 63 | Amazon EC2 uses the hypervisor to regulate the machine CPU usage (to match the instance type's EC2 Compute Units). If you see inconsistent stolen percentage over time, then you might be using [Burstable Performance Instances](http://aws.amazon.com/ec2/instance-types/#burst). 64 | 65 | #### Next: [CPU Load](cpu-load.md) 66 | -------------------------------------------------------------------------------- /docs/images/linux_io.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kargig/linux-metrics/bf5a2447bcf67872f2c756d1774f57adcc2f2811/docs/images/linux_io.png -------------------------------------------------------------------------------- /docs/io-usage.md: -------------------------------------------------------------------------------- 1 | # IO Metrics 2 | 3 | You will need 2 ssh terminals 4 | 5 | ## IO Usage 6 | 7 | ### Recall: Linux IO, Merges, IOPS 8 | 9 | Linux IO performance is affected by many factors, including your application workload, choice of file-system, IO scheduler choice (eg. 
[cfq](https://www.kernel.org/doc/Documentation/block/cfq-iosched.txt), [deadline](https://www.kernel.org/doc/Documentation/block/deadline-iosched.txt)), queue configuration, device driver, underlying device(s) caches and more. 10 | 11 | ![Linux IO](images/linux_io.png) 12 | 13 | #### Merged Reads/Writes 14 | 15 | From [the Kernel Documentation](https://www.kernel.org/doc/Documentation/iostats.txt): *"Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O"* 16 | 17 | #### IOPS 18 | 19 | IOPS are input/output operations per second. Some operations take longer than others, eg. HDDs can do sequential reading operations much faster than random writing operations. Here are some rough estimations from [Wikipedia](https://en.wikipedia.org/wiki/IOPS) and [Amazon EBS Product Details](http://aws.amazon.com/ebs/details/): 20 | 21 | | Device/Type | IOPS | 22 | |-----------------------|-----------| 23 | | 7.2k-10k RPM SATA HDD | 75-150 | 24 | | 10k-15k RPM SAS HDD | 140-210 | 25 | | SATA SSD | 1k-120k | 26 | | AWS EC2 gp2 | up to 10k | 27 | | AWS EC2 io1 | up to 20k | 28 | 29 | ### Task I1: IO Usage 30 | 31 | 1. Start by running `iostat`, and examine the output fields. Let's go over the important ones together: 32 | ```bash 33 | (term 1) root:~# iostat -xd 2 34 | ``` 35 | - **rrqm/s** & **wrqm/s**- Number of read/write requests merged per-second. 36 | - **r/s** & **w/s**- Read/Write requests (after merges) per-second. Their sum is the **IOPS**! 37 | - **rkB/s** & **wkB/s**- Number of kB read/written per-second, ie. **IO throughput**. 38 | - **avgqu-sz**- Average request queue size for this device. Check out `/sys/block/<device>/queue/nr_requests` for the maximum queue size. 39 | - **r_await**, **w_await**, **await**- The average time (in ms.) for read/write/both requests to be served, including time spent in the queue, ie. 
**IO latency** 40 | 2. Please write down these fields' values when our system is at rest. 41 | 3. In a new session, let's benchmark our device *write performance* by running: 42 | 43 | ```bash 44 | (term 2) root:~# /bin/sh linux-metrics/scripts/disk/fio1.sh 45 | ``` 46 | 47 | This will clone 16 processes to perform non-buffered (direct) random writes for 3 minutes. 48 | 1. Compare the values you see in `iostat` to the values you wrote down earlier. Do they make sense? 49 | 2. Look at `fio` results and try to see if the number of IOPS makes sense (we are using EBS gp2 volumes). 50 | 4. Repeat the previous task, this time benchmarking **read performance**: 51 | 52 | ```bash 53 | (term 2) root:~# /bin/sh linux-metrics/scripts/disk/fio2.sh 54 | ``` 55 | 56 | 5. Finally, repeat the **read performance** benchmark with 1 process: 57 | 58 | ```bash 59 | (term 2) root:~# /bin/sh linux-metrics/scripts/disk/fio3.sh 60 | ``` 61 | 1. Read about the `svctm` field in `man 1 iostat`. Compare the value we got now to the value we got for 16 processes. Is there a difference? If so, why? 62 | 2. Repeat the previous question for the `%util` field. 63 | 64 | 6. `fio` also supports other IO patterns (by changing the `--rw=` parameter), including: 65 | - `read` Sequential reads 66 | - `write` Sequential writes 67 | - `rw` Mixed sequential reads and writes 68 | - `randrw` Mixed random reads and writes 69 | 70 | If time permits, explore these IO patterns to learn more about EBS gp2's performance under different workloads. 71 | 72 | ### Discussion 73 | 74 | - Why do we need an IO queue? What does it enable the kernel to perform? Read a few more things on IO queue depths [here](https://blog.docbert.org/queue-depth-iops-and-latency/) 75 | - Why are the `svctm` and `%util` iostat fields essentially useless in a modern environment? (read [Marc Brooker's excellent blog post](https://brooker.co.za/blog/2014/07/04/iostat-pct.html)) 76 | - What is the difference in how the kernel handles reads and writes? 
How does that affect metrics and application behaviour? 77 | 78 | ### Tools 79 | 80 | - Most tools use `/proc/diskstats` to fetch global IO statistics. 81 | - Per-process IO statistics are usually fetched from `/proc/[pid]/io`, which is documented in `man 5 proc`. 82 | - From the command-line you can use: 83 | - `iostat -xd <interval>` for per-device information 84 | - `-d` device utilization 85 | - `-x` extended statistics 86 | - `sudo iotop` for a `top`-like interface (easily find the process doing most reads/writes) 87 | - `-o` only show processes or threads actually doing I/O 88 | 89 | #### Next: [Network Utilization](net-util.md) 90 | -------------------------------------------------------------------------------- /docs/memory-usage.md: -------------------------------------------------------------------------------- 1 | # Memory Metrics 2 | 3 | ## Memory Usage 4 | 5 | Is free memory really free? What's the difference between cached memory and buffers? Let's get our hands dirty and find out... 6 | 7 | You will need 3 open terminals for this task. **DO NOT RUN ANY SCRIPTS ON YOUR LAPTOP!** 8 | 9 | ### Task M1: Memory usage, Caches and Buffers 10 | 11 | 1. Fire up `top` on Terminal 1, and write down how much `free` memory you have (**keep it running for the rest of this module**): 12 | ```bash 13 | (term 1) root:~# top 14 | ``` 15 | 2. Start the memory hog `hog.sh` on Terminal 2, and let it run until it gets killed (if it hangs, use `Ctrl+C`): 16 | ```bash 17 | (term 2) root:~# /bin/sh linux-metrics/scripts/memory/hog.sh 18 | ``` 19 | 3. Go to Terminal 1 and compare the current `free` memory to the number you wrote down. Are they (almost) the same? If not, why? 20 | 4. Read about the `Buffers` and `Cached` values in `man 5 proc` (under `meminfo`): 21 | 1. Run the memory hog on Terminal 2 `scripts/memory/hog.sh`: 22 | ```bash 23 | (term 2) root:~# /bin/sh linux-metrics/scripts/memory/hog.sh 24 | ``` 25 | 2. Write down the `buffer` size from Terminal 1. 26 | 3. 
Now run the buffer balloon `buffer.sh` on Terminal 2: 27 | ```bash 28 | (term 2) root:~# /bin/sh linux-metrics/scripts/memory/buffer.sh 29 | ``` 30 | 4. Check the `buffer` size again. 31 | 5. Read the script, and see if you can make sense of the results. 32 | 6. Repeat all 5 steps above with the `cached Mem` value. 33 | 7. Repeat all steps for `cache.sh`: 34 | ```bash 35 | (term 2) root:~# /bin/sh linux-metrics/scripts/memory/cache.sh 36 | ``` 37 | 5. Let's see how `cached Mem` affects application performance: 38 | 1. Drop the cache: 39 | ```bash 40 | (term 2) root:~# echo 3 > /proc/sys/vm/drop_caches 41 | ``` 42 | 2. Time a dummy Python application (you can repeat these 2 steps multiple times): 43 | ```bash 44 | (term 2) root:~# time python -c 'print "Hello World"' 45 | ``` 46 | 3. Now re-run our dummy Python application, but this time without flushing the cached memory. Can you see the difference? 47 | 6. Run the `dentry.py` script and observe the memory usage using `free`. What is using the memory? How does it affect performance? 48 | ```bash 49 | (term 2) root:~# python linux-metrics/scripts/memory/dentry.py 50 | (term 2) root:~# echo 3 > /proc/sys/vm/drop_caches 51 | (term 2) root:~# time ls trash/ >/dev/null 52 | (term 2) root:~# time ls trash/ >/dev/null 53 | ``` 54 | 7. Run the `dentry2.py` script and try dropping the caches. Does it make a difference? 55 | ```bash 56 | (term 2) root:~# python linux-metrics/scripts/memory/dentry2.py 57 | ``` 58 | 59 | ### Discussion 60 | 61 | - What's the difference between `dentry.py` and `dentry2.py`? 62 | - Assuming a server has some amount of free memory, can we assume it has enough memory to support its current workload? If not, why? 63 | - Why wasn't our memory hog able to grab all the `cached` memory? 64 | - Run the following stress test, what do you see? 
65 | ```bash 66 | (term 2) root:~# stress -m 18 --vm-bytes 100M -t 600s 67 | ``` 68 | 69 | 70 | ### Tools 71 | 72 | - Most tools use `/proc/meminfo` to fetch memory usage information. 73 | - A simple example is the `free` utility. 74 | - What does the 2nd line of `free` tell us? 75 | - To get usage information over some period, use `sar -r <interval>`. 76 | - Here you can also see how many dirty pages you have (try running `sync` while `sar` is running). 77 | 78 | #### Next: [IO Usage](io-usage.md) 79 | -------------------------------------------------------------------------------- /docs/net-util.md: -------------------------------------------------------------------------------- 1 | # Network Metrics 2 | 3 | ## Network Utilization 4 | 5 | ### Task NP1: Network Utilization 6 | 7 | 1. Assuming we have two machines connected with Gigabit Ethernet interfaces, what is the maximum expected **throughput in kilobytes**? 8 | 2. For the following task you'll need two machines, or a partner: 9 | 10 | | Machine/s | Command | Notes | 11 | |:---------:|---------|-------| 12 | | A + B | `sar -n DEV 2` | Write down the receive/transmit packets/KB per-second. Keep this running for the entire length of the task | 13 | | A + B | `sar -n EDEV 2` | These are the error statistics, read about them in `man 1 sar`. Keep this running for the entire length of the task | 14 | | A | `ip a` | Write down A's IP address | 15 | | A | `iperf -s` | This will start a server for our little benchmark | 16 | | B | `iperf -c <A's IP> -t 30` | Start the benchmark client for 30 seconds | 17 | | A | `iperf -su` | Replace the previous TCP server with a UDP one | 18 | | B | `iperf -c 172.30.0.251 -u -b 1000M -l 8k` | Repeat the benchmark with UDP traffic | 19 | 20 | 1. When running the client on B, use `sar` data to determine A's link utilization (in %, assuming Gigabit Ethernet). 21 | 2. What are the major differences between TCP and UDP traffic observable with `sar`? 22 | 3. Start to decrease the UDP buffer length (ie. 
from `8k` to `4k`, `2k`, `1k`, `512`, `128`, `64`). 23 | 1. Does the **throughput in KB** increase or decrease? 24 | 2. What about the **throughput in packets**? 25 | 3. Look carefully at the `iperf` client and server report. Can you see any packet loss? Can you also see it in `ifconfig`? 26 | 27 | ### Network Errors 28 | 29 | While Linux provides multiple metrics for network errors, including collisions, errors, and packet drops, the [kernel documentation](https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-class-net-statistics) indicates that the meaning of these metrics is driver-specific, and tightly coupled with your underlying networking layers. 30 | 31 | ### Tools 32 | 33 | - Most tools use `/proc/net/dev` to fetch network device information. 34 | - For example, try running `sar -n DEV <interval>`. 35 | - Connection information for TCP, UDP and raw sockets can be fetched using `/proc/net/tcp`, `/proc/net/udp`, `/proc/net/raw` 36 | - For parsed socket information use `netstat -tuwnp`. 37 | - `-t`, `-u`, `-w`: TCP, UDP and raw sockets 38 | - `-n`: no DNS resolving 39 | - `-p`: the process owning this socket 40 | - The most comprehensive command-line utility is `netstat`, covering metrics from interface statistics to socket information. 41 | - Check `iptraf` for interactive traffic monitoring (no socket information). 42 | - Finally, `nethogs` provides a `top`-like experience, allowing you to find which process is taking up the most bandwidth (TCP only). 43 | 44 | ### Discussion 45 | 46 | - What could be the reasons for packet drops? Which of these reasons can be measured on the receiving side? 47 | - Why can't you see the `%ifutil` value on EC2? 48 | - **Hint**: Network device speed is usually found in `/sys/class/net/<interface>/speed`. 
49 | - **Workaround**: The `nicstat` utility allows you to specify the speed and duplex of your network interface from the command line: 50 | ``` 51 | nicstat -S eth0:1000fd -n 2 52 | ``` 53 | 54 | -------------------------------------------------------------------------------- /docs/references.md: -------------------------------------------------------------------------------- 1 | # References 2 | 3 | This workshop borrows heavily from the following talks/blog-posts: 4 | 5 | - Harald van Breederode's "Understanding Linux Load Average" [Part 1](https://prutser.wordpress.com/2012/04/23/understanding-linux-load-average-part-1/) and [Part 2](https://prutser.wordpress.com/2012/05/05/understanding-linux-load-average-part-2/) 6 | - Brendan Gregg's [Linux Load Averages: Solving the Mystery](http://www.brendangregg.com/blog/2017-08-08/linux-load-averages.html) 7 | - Vidar Holen's ["Help! Linux ate my RAM!"](http://www.linuxatemyram.com/play.html) 8 | - Ben Mildren's [Monitoring IO performance using iostat & pt-diskstats](https://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/Monitoring-Linux-IO.pdf) 9 | - Amazon EC2 User Guide for Linux Instances, [Benchmark Volumes](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_piops.html) 10 | - Brendan Gregg's [USE Method: Linux Performance Checklist](http://www.brendangregg.com/USEmethod/use-linux.html) 11 | - Serhiy Topchiy's [Testing Amazon EC2 network speed](http://epamcloud.blogspot.co.il/2013/03/testing-amazon-ec2-network-speed.html) 12 | -------------------------------------------------------------------------------- /packer/README.md: -------------------------------------------------------------------------------- 1 | # AMI builder for Linux Metrics workshop 2 | 3 | ## Usage 4 | 5 | ``` 6 | packer build -var 'aws_access_key=...' -var 'aws_secret_key=...' -var 'subnet_id=...' -var 'vpc_id=...' 
packer.json 7 | ``` 8 | -------------------------------------------------------------------------------- /packer/bootstrap.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | if [ $(id -u) -ne 0 ]; then 4 | echo "You must run this script as root. Attempting to sudo" 1>&2 5 | exec sudo -n bash "$0" "$@" 6 | fi 7 | 8 | # Wait for cloud-init 9 | sleep 10 10 | 11 | # sysdig repo 12 | curl -s https://s3.amazonaws.com/download.draios.com/DRAIOS-GPG-KEY.public | apt-key add - 13 | curl -s -o /etc/apt/sources.list.d/draios.list http://download.draios.com/stable/deb/draios.list 14 | 15 | # Install packages 16 | apt-get update 17 | export DEBIAN_PRIORITY=critical 18 | export DEBIAN_FRONTEND=noninteractive 19 | 20 | apt-get -y install procps sysstat stress python2.7 gcc vim vim-youcompleteme linux-tools-common linux-tools-generic linux-tools-$(uname -r) fio iotop iperf iptraf nethogs nicstat git build-essential manpages-dev glibc-doc 21 | apt-get -y install linux-headers-$(uname -r) sysdig 22 | 23 | # BCC 24 | apt-get install -y bison build-essential cmake flex git libedit-dev \ 25 | libllvm3.9 llvm-3.9-dev libclang-3.9-dev python zlib1g-dev libelf-dev luajit luajit-5.1-dev arping netperf 26 | 27 | cd ~ 28 | git clone https://github.com/iovisor/bcc.git 29 | mkdir bcc/build; cd bcc/build 30 | cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local 31 | make install 32 | ldconfig 33 | 34 | cd ~ 35 | LATEST_NETDATA="$(wget -q -O - https://raw.githubusercontent.com/firehol/binary-packages/master/netdata-latest.gz.run)" 36 | wget -q -O /tmp/netdata.gz.run "https://raw.githubusercontent.com/firehol/binary-packages/master/${LATEST_NETDATA}" 37 | bash /tmp/netdata.gz.run --quiet --accept 38 | rm /tmp/netdata.gz.run 39 | 40 | # Change netdata systemd definition (e.g. 
to renice) 41 | cat > /etc/systemd/system/netdata.service <<'EOF' 42 | [Unit] 43 | Description=Real time performance monitoring 44 | After=network.target httpd.service squid.service nfs-server.service mysqld.service mysql.service named.service postfix.service 45 | 46 | [Service] 47 | Type=simple 48 | User=netdata 49 | Group=netdata 50 | ExecStart=/opt/netdata/usr/sbin/netdata -D 51 | 52 | # The minimum netdata Out-Of-Memory (OOM) score. 53 | # netdata (via [global].OOM score in netdata.conf) can only increase the value set here. 54 | # To decrease it, set the minimum here and set the same or a higher value in netdata.conf. 55 | # Valid values: -1000 (never kill netdata) to 1000 (always kill netdata). 56 | OOMScoreAdjust=-1000 57 | 58 | Nice=-10 59 | 60 | # saving a big db on slow disks may need some time 61 | TimeoutStopSec=60 62 | 63 | # restart netdata if it crashes 64 | Restart=on-failure 65 | RestartSec=30 66 | 67 | [Install] 68 | WantedBy=multi-user.target 69 | EOF 70 | 71 | systemctl daemon-reload 72 | systemctl restart netdata.service 73 | 74 | # Disable transparent huge pages 75 | cat > /etc/rc.local <<'EOF' 76 | #!/bin/sh -e 77 | # 78 | # rc.local 79 | # 80 | 81 | THP_DIR="/sys/kernel/mm/transparent_hugepage" 82 | 83 | if test -f $THP_DIR/enabled; then 84 | echo never > $THP_DIR/enabled 85 | fi 86 | 87 | if test -f $THP_DIR/defrag; then 88 | echo never > $THP_DIR/defrag 89 | fi 90 | 91 | exit 0 92 | EOF 93 | 94 | # Clone linux-metrics in the user's home 95 | cat > /etc/profile.d/clone-linux-metrics.sh <<'EOF' 96 | #!/bin/bash 97 | if [[ ! -e ~/linux-metrics ]] && [[ $(id -u) -ne 0 ]]; then 98 | echo "unable to find linux-metrics directory- cloning..." 
99 | pushd ~ 100 | git clone https://github.com/natict/linux-metrics.git 101 | popd 102 | fi 103 | EOF 104 | -------------------------------------------------------------------------------- /packer/packer.json: -------------------------------------------------------------------------------- 1 | { 2 | "variables": { 3 | "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}", 4 | "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}", 5 | "subnet_id": "", 6 | "vpc_id": "", 7 | "aws_region": "eu-west-1", 8 | "source_ami": "ami-05801d0a3c8e4c443" 9 | }, 10 | "builders": [ 11 | { 12 | "name": "linux-metrics", 13 | "type": "amazon-ebs", 14 | "access_key": "{{user `aws_access_key`}}", 15 | "secret_key": "{{user `aws_secret_key`}}", 16 | "region": "{{user `aws_region`}}", 17 | "source_ami": "{{user `source_ami`}}", 18 | "instance_type": "m4.large", 19 | "ssh_username": "ubuntu", 20 | "ami_name": "linux-metrics-{{timestamp}}", 21 | "subnet_id": "{{user `subnet_id`}}", 22 | "vpc_id": "{{user `vpc_id`}}" 23 | } 24 | ], 25 | "provisioners": [ 26 | { 27 | "type": "shell", 28 | "script": "bootstrap.sh" 29 | } 30 | ] 31 | } 32 | -------------------------------------------------------------------------------- /saltstack/README.md: -------------------------------------------------------------------------------- 1 | ### Saltstack installation of target hosts for the Linux Metrics Workshop 2 | 3 | Tested on Debian 8 (Jessie). 4 | 5 | Requirements: 6 | 7 | 1. Root/User with sudo rights and the same key on all target hosts 8 | 2. SSHd running on all target hosts and accepting password authentication 9 | 3. Python installed on target hosts (check tools directory) 10 | 4. 
Control machine with salt-ssh installed 11 | 12 | 13 | 14 | -------------------------------------------------------------------------------- /saltstack/Saltfile: -------------------------------------------------------------------------------- 1 | salt-ssh: 2 | config_dir: ./ 3 | log_file: /tmp/salt/master.log 4 | log_level_logfile: debug 5 | ssh_max_procs: 30 6 | ssh_wipe: False 7 | roster_file: ./roster 8 | -------------------------------------------------------------------------------- /saltstack/master: -------------------------------------------------------------------------------- 1 | conf_file: ./master 2 | cachedir: /tmp/saltcache 3 | file_roots: 4 | base: 5 | - ./states 6 | pillar_roots: 7 | base: 8 | - ./pillars 9 | timeout: 45 10 | pki_dir: ./pki/salt/master 11 | state_verbose: True 12 | roster_defaults: 13 | user: root 14 | priv: ~/.ssh/id_rsa 15 | thin_dir: /tmp/salt-thin 16 | -------------------------------------------------------------------------------- /saltstack/passwords/README.md: -------------------------------------------------------------------------------- 1 | ## Print generated password 2 | 3 | The users salt state has created the `workshop` user, with a password made of the first 10 characters of the server's salt "UUID" grain. 
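The derivation is simple enough to reproduce by hand; a minimal Python sketch, using the sample values from `uuids.json`:

```python
# Sketch of the password derivation used by the users state
# ({%- set password=grains['uuid'][:10] %} in states/users/init.sls):
# the password is simply the first 10 characters of the host's UUID grain.
# Sample UUIDs taken from uuids.json.
uuids = {
    "ws1": "136fe95c-6a97-3147-be25-946e3270eef9",
    "ws2": "b3fcaf3a-12b5-6f43-b21a-8062f0b391ad",
}

for host, full_uuid in sorted(uuids.items()):
    print("{0}: {1}".format(host, full_uuid[:10]))  # e.g. ws1: 136fe95c-6
```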
All servers' full UUIDs can be printed in JSON format by running: 4 | ``` 5 | $ salt-ssh ws* grains.get uuid --out=json --static > uuids.json 6 | ``` 7 | Passwords can be printed on screen by running: 8 | ``` 9 | $ python parseme.py 10 | ``` 11 | -------------------------------------------------------------------------------- /saltstack/passwords/parseme.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | # Print "host: password" pairs from the UUIDs collected with salt-ssh; 4 | # each password is the first 10 characters of the host's UUID. 5 | raw_data = json.load(open('uuids.json')) 6 | raw_list = [] 7 | for key, value in raw_data.items(): 8 | raw_list.append((key, value[:10])) 9 | 10 | user_list = sorted(raw_list) 11 | for user in user_list: 12 | print "{0}.example.com: {1}".format(user[0], user[1]) 13 | -------------------------------------------------------------------------------- /saltstack/passwords/sample_sorted.txt: -------------------------------------------------------------------------------- 1 | ws1.example.com: 9137141d-2 2 | ws2.example.com: 48d90b48-7 3 | -------------------------------------------------------------------------------- /saltstack/passwords/uuids.json: -------------------------------------------------------------------------------- 1 | { 2 | "ws1": "136fe95c-6a97-3147-be25-946e3270eef9", 3 | "ws2": "b3fcaf3a-12b5-6f43-b21a-8062f0b391ad" 4 | } 5 | -------------------------------------------------------------------------------- /saltstack/pillars/defaults/init.sls: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kargig/linux-metrics/bf5a2447bcf67872f2c756d1774f57adcc2f2811/saltstack/pillars/defaults/init.sls -------------------------------------------------------------------------------- /saltstack/pillars/top.sls: -------------------------------------------------------------------------------- 1 | base: 2 | '*': 3 | - defaults 4 | 
-------------------------------------------------------------------------------- /saltstack/roster: -------------------------------------------------------------------------------- 1 | ws1: 2 | host: 'ws1.example.com' 3 | ws2: 4 | host: 'ws2.example.com' 5 | -------------------------------------------------------------------------------- /saltstack/states/bootstrap/init.sls: -------------------------------------------------------------------------------- 1 | update: 2 | cmd.run: 3 | - name: 'apt-get -qq update' 4 | - order: 1 5 | upgrade: 6 | module.run: 7 | - name: pkg.upgrade 8 | - dist_upgrade: True 9 | - order: 2 10 | 11 | draios_repo: 12 | pkgrepo.managed: 13 | - humanname: Draios 14 | - name: deb http://download.draios.com/stable/deb stable-$(ARCH)/ 15 | - file: /etc/apt/sources.list.d/draios.list 16 | - keyid: EC51E8C4 17 | - keyserver: keyserver.ubuntu.com 18 | - require: 19 | - pkg: base_packages 20 | 21 | base_packages: 22 | pkg.installed: 23 | - pkgs: 24 | - wget 25 | - curl 26 | - telnet 27 | - strace 28 | - git 29 | - tree 30 | - rsync 31 | - less 32 | - sysstat 33 | - ethtool 34 | - screen 35 | - bzip2 36 | - lsof 37 | - sudo 38 | - haveged 39 | - file 40 | - lftp 41 | - unzip 42 | - htop 43 | - heirloom-mailx 44 | 45 | - python-apt 46 | - pkg-config 47 | - multitail 48 | - vim 49 | - ngrep 50 | - netcat-openbsd 51 | - dnsutils 52 | - vim-scripts 53 | - iptables-persistent 54 | - bc 55 | - aptitude 56 | - python-msgpack 57 | - python-m2crypto 58 | - python-pip 59 | - lsb-release 60 | - debconf-utils 61 | - apt-transport-https 62 | - psmisc 63 | - atop 64 | - iotop 65 | - whois 66 | - procps 67 | 68 | - stress 69 | - linux-tools 70 | - fio 71 | - iptraf 72 | - nethogs 73 | - iperf 74 | - nicstat 75 | - linux-headers-amd64 76 | - fortunes 77 | - bash-completion 78 | 79 | 80 | linux_metrics_workshop: 81 | git.latest: 82 | - name: https://github.com/kargig/linux-metrics/ 83 | - rev: master 84 | - target: 
/root/linux-metrics 85 | - remote: origin 86 | - force_reset: True 87 | 88 | netdata: 89 | cmd.run: 90 | - name: curl https://my-netdata.io/kickstart-static64.sh >/tmp/kickstart-static64.sh && sh /tmp/kickstart-static64.sh --dont-wait 91 | - unless: ls -la /opt/netdata/ 92 | - require: 93 | - pkg: base_packages 94 | service.running: 95 | - require: 96 | - cmd: netdata 97 | - watch: 98 | - file: netdata 99 | file.managed: 100 | - name: /etc/systemd/system/netdata.service 101 | - source: salt://bootstrap/netdata.service 102 | - require: 103 | - cmd: netdata 104 | rc.local: 105 | file.managed: 106 | - name: /etc/rc.local 107 | - source: salt://bootstrap/rc.local 108 | - mode: 700 109 | 110 | rc-local.service: 111 | file.managed: 112 | - name: /etc/systemd/system/rc-local.service 113 | - source: salt://bootstrap/rc-local.service 114 | service.running: 115 | - enable: True 116 | - watch: 117 | - file: rc-local.service 118 | - require: 119 | - file: rc-local.service 120 | 121 | sysstat: 122 | file.replace: 123 | - name: /etc/default/sysstat 124 | - pattern: "^ENABLED=.*" 125 | - repl: "ENABLED=true" 126 | - append_if_not_found: True 127 | service.running: 128 | - enable: True 129 | - watch: 130 | - file: /etc/default/sysstat 131 | 132 | locale: 133 | file.replace: 134 | - name: /etc/locale.gen 135 | - pattern: "^# el_GR.UTF-8 UTF-8" 136 | - repl: "el_GR.UTF-8 UTF-8" 137 | - append_if_not_found: True 138 | cmd.wait: 139 | - name: locale-gen 140 | - watch: 141 | - file: /etc/locale.gen 142 | 143 | iptables: 144 | file.managed: 145 | - name: /etc/iptables/rules.v4 146 | - source: salt://bootstrap/iptables_redirect 147 | cmd.wait: 148 | - name: iptables-restore < /etc/iptables/rules.v4 149 | - watch: 150 | - file: iptables -------------------------------------------------------------------------------- /saltstack/states/bootstrap/rc.local: -------------------------------------------------------------------------------- 1 | #!/bin/sh -e 2 | THP_DIR="/sys/kernel/mm/transparent_hugepage" 3 | if test -f $THP_DIR/enabled; then 4 | echo never > $THP_DIR/enabled 5 | fi 6 | if test -f $THP_DIR/defrag; then 7 | echo never > $THP_DIR/defrag 8 | fi 9 | echo 0 > /sys/devices/system/cpu/cpu1/online 10 | exit 0 11 | -------------------------------------------------------------------------------- /saltstack/states/control/init.sls: 
-------------------------------------------------------------------------------- 1 | salt: 2 | pkg.latest: 3 | - pkgs: 4 | - python2.7 5 | - python-apt 6 | - python-msgpack 7 | - python-m2crypto 8 | - python-pip 9 | - apt-transport-https 10 | - pkg-config 11 | 12 | salt-ssh: 13 | pip.installed: 14 | - require: 15 | - pkg: salt 16 | 17 | -------------------------------------------------------------------------------- /saltstack/states/top.sls: -------------------------------------------------------------------------------- 1 | base: 2 | '*': 3 | - bootstrap 4 | - users 5 | -------------------------------------------------------------------------------- /saltstack/states/users/init.sls: -------------------------------------------------------------------------------- 1 | {%- set password=grains['uuid'][:10] %} 2 | sudo: 3 | pkg.installed 4 | 5 | workshop: 6 | user.present: 7 | - home: /home/workshop 8 | - password: {{password}} 9 | - hash_password: True 10 | - shell: /bin/bash 11 | - fullname: workshop user 12 | - require: 13 | - pkg: sudo 14 | workshop_bashrc: 15 | file.managed: 16 | - name: /home/workshop/.bashrc 17 | - user: workshop 18 | - group: workshop 19 | - source: salt://users/root_bashrc 20 | - require: 21 | - user: workshop 22 | 23 | /etc/sudoers.d/workshop: 24 | file.managed: 25 | - source: salt://users/workshop_sudoers 26 | - user: root 27 | - group: root 28 | - mode: 440 29 | - template: jinja 30 | - check_cmd: /usr/sbin/visudo -c -f 31 | - require: 32 | - pkg: sudo 33 | - user: workshop 34 | 35 | root_bashrc: 36 | file.managed: 37 | - name: /root/.bashrc 38 | - source: salt://users/root_bashrc 39 | 40 | 41 | linux_metrics_workshop_2: 42 | git.latest: 43 | - name: https://github.com/kargig/linux-metrics/ 44 | - rev: master 45 | - target: /home/workshop/linux-metrics 46 | - remote: origin 47 | - user: workshop 48 | - force_reset: True 49 | - require: 50 | - user: workshop 51 | 
-------------------------------------------------------------------------------- /saltstack/states/users/root_bashrc: -------------------------------------------------------------------------------- 1 | ## set colors for GNU ls 2 | export LS_COLORS='*.swp=00;44;37:*,v=5;34;93:*.vim=35:no=0:fi=0:di=32:ln=36:or=1;40:mi=1;40:pi=31:so=33:bd=44;37:cd=44;37:ex=35:*.jpg=1;32:*.jpeg=1;32:*.JPG=1;32:*.gif=1;32:*.png=1;32:*.ppm=1;32:*.pgm=1;32:*.pbm=1;32:*.c=1;32:*.C=1;33:*.h=1;33:*.cc=1;33:*.awk=1;33:*.pl=1;33:*.gz=31:*.tar=31:*.zip=31:*.lha=1;31:*.lzh=1;31:*.arj=1;31:*.bz2=31:*.tgz=31:*.taz=1;31:*.html=36:*.htm=1;34:*.doc=1;34:*.txt=1;34:*.o=1;36:*.a=1;36' 3 | export LS_OPTIONS='--color=auto' 4 | alias ls='ls $LS_OPTIONS' 5 | alias ll='ls $LS_OPTIONS -l' 6 | alias l='ls $LS_OPTIONS -lA' 7 | export LESSCHARSET=utf-8 8 | export EDITOR=vim 9 | # custom exports for coloured less 10 | export LESS_TERMCAP_mb=$'\E[01;31m' 11 | export LESS_TERMCAP_md=$'\E[01;31m' 12 | export LESS_TERMCAP_me=$'\E[0m' 13 | export LESS_TERMCAP_se=$'\E[0m' 14 | export LESS_TERMCAP_so=$'\E[01;47;34m' 15 | export LESS_TERMCAP_ue=$'\E[0m' 16 | export LESS_TERMCAP_us=$'\E[01;32m' 17 | 18 | -------------------------------------------------------------------------------- /saltstack/states/users/workshop_sudoers: -------------------------------------------------------------------------------- 1 | %fosscomm ALL=(ALL:ALL) NOPASSWD:ALL 2 | -------------------------------------------------------------------------------- /saltstack/tools/install_python.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | DOMAIN='example.com' 3 | for i in 1 2 ; do ssh ws${i}.${DOMAIN} 'DEBIAN_FRONTEND=noninteractive apt-get -y -qq install python2.7 >/dev/null' ; done 4 | -------------------------------------------------------------------------------- /scripts/cpu/dummy1.sh: 
-------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | stress -q -c 1 -t 120 4 | -------------------------------------------------------------------------------- /scripts/cpu/dummy2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | stress -q -m 1 -t 120 4 | -------------------------------------------------------------------------------- /scripts/cpu/dummy3.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | nice -n 10 -- stress -q -c 1 -t 120 4 | -------------------------------------------------------------------------------- /scripts/cpu/dummy4.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | stress -q -d 1 -t 120 4 | -------------------------------------------------------------------------------- /scripts/cpu/dummy_app.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | import time 4 | import math 5 | 6 | def percentile(N, percent): 7 | if not N: 8 | return None 9 | N = sorted(N) 10 | k = (len(N)-1) * percent 11 | f = math.floor(k) 12 | c = math.ceil(k) 13 | if f == c: 14 | return N[int(k)] 15 | d0 = N[int(f)] * (c-k) 16 | d1 = N[int(c)] * (k-f) 17 | return d0+d1 18 | 19 | def foo(): 20 | for i in xrange(20000): 21 | x = math.sqrt(i) 22 | 23 | if __name__ == "__main__": 24 | m = [] 25 | 26 | for _ in xrange(5000): 27 | start = time.time() 28 | foo() 29 | m.append(time.time() - start) 30 | 31 | print "50th, 75th, 90th and 99th percentile: %f, %f, %f, %f" % ( 32 | percentile(m, 0.5), percentile(m, 0.75), percentile(m, 0.9), percentile(m, 0.99)) 33 | -------------------------------------------------------------------------------- /scripts/disk/fio1.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | fio --directory=/tmp --name 
fio_test_file --direct=1 --rw=randwrite --bs=16k --size=100M --numjobs=16 --time_based --runtime=180 --group_reporting --norandommap 4 | -------------------------------------------------------------------------------- /scripts/disk/fio2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | fio --directory=/tmp --name fio_test_file --direct=1 --rw=randread --bs=16k --size=100M --numjobs=16 --time_based --runtime=180 --group_reporting --norandommap 4 | -------------------------------------------------------------------------------- /scripts/disk/fio3.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | fio --directory=/tmp --name fio_test_file --direct=1 --rw=randread --bs=16k --size=100M --numjobs=1 --time_based --runtime=180 --group_reporting --norandommap 4 | -------------------------------------------------------------------------------- /scripts/disk/writer.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | while true; do dd if=/dev/zero of=/tmp/test.1G bs=1M count=1024; done 4 | -------------------------------------------------------------------------------- /scripts/memory/buffer.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | root_dev="$(mount | grep 'on / ' | cut -d' ' -f1)" # /dev/xvda 4 | 5 | # Read 4GB of blocks from the root device (bs=1M * count=4096) 6 | dd if=$root_dev of=/dev/null bs=1M count=4096 7 | -------------------------------------------------------------------------------- /scripts/memory/cache.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Read all readable files in /usr (Generates ~1.5GB cache) 4 | # find /usr -readable -type f -exec dd if={} of=/dev/null status=none \; 5 | 6 | # Just create a large file 7 | SIZE_IN_MB=4096 8 | dd if=/dev/zero of=/tmp/bigfile bs=1MB 
count=${SIZE_IN_MB} 9 | 10 | echo "Press Enter to continue..."; read 11 | #rm /tmp/bigfile 12 | -------------------------------------------------------------------------------- /scripts/memory/dentry.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import os 4 | import uuid 5 | import sys 6 | import time 7 | 8 | try: 9 | directory = sys.argv[1] 10 | except IndexError: 11 | directory = "./trash" 12 | 13 | if not os.path.exists(directory): 14 | os.makedirs(directory) 15 | 16 | try: 17 | t_end = time.time() + 30 18 | while time.time() < t_end: 19 | open(os.path.join(directory, str(uuid.uuid4())), 'w').close() 20 | except Exception: 21 | raw_input("press any key to terminate") 22 | -------------------------------------------------------------------------------- /scripts/memory/dentry2.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import os 4 | import time 5 | 6 | try: 7 | t_end = time.time() + 30 8 | while time.time() < t_end: 9 | os.makedirs("t") 10 | os.chdir("t") 11 | except Exception: 12 | raw_input("press any key to terminate\n") 13 | -------------------------------------------------------------------------------- /scripts/memory/hog.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | HOG_C=/tmp/hog.c 4 | HOG=/tmp/hog 5 | rm -rf $HOG $HOG_C 6 | 7 | cat >$HOG_C <<'EOF' 8 | #include <stdio.h> 9 | #include <stdlib.h> 10 | #include <unistd.h> 11 | 12 | 13 | int main(int argc, char* argv[]) { 14 | long page_size = sysconf(_SC_PAGESIZE); 15 | 16 | long count = 0; 17 | while(1) { 18 | char* tmp = (char*) malloc(page_size); 19 | if (tmp) { 20 | tmp[0] = 0; 21 | count += page_size; 22 | if (count % (page_size*1024) == 0) { 23 | printf("Allocated %ld KB\n", count/1024); 24 | usleep(10000); 25 | } 26 | } 27 | } 28 | 29 | return 0; 30 | } 31 | EOF 32 | 33 | gcc -o $HOG $HOG_C 34 | 35 | exec $HOG 36 | 
--------------------------------------------------------------------------------