├── .gitignore ├── CHANGELOG ├── LICENSE ├── Makefile ├── README.md ├── README.pg_cgroups ├── expected ├── test_blkio.out ├── test_cpu.out ├── test_cpuset.out └── test_memory.out ├── libcg1.c ├── pg_cgroups.c ├── pg_cgroups.h └── sql ├── test_blkio.sql ├── test_cpu.sql ├── test_cpuset.sql └── test_memory.sql /.gitignore: -------------------------------------------------------------------------------- 1 | # Object files 2 | *.o 3 | *.bc 4 | *.obj 5 | 6 | # Libraries 7 | *.lib 8 | *.a 9 | 10 | # Shared objects (inc. Windows DLLs) 11 | *.dll 12 | *.so 13 | *.so.* 14 | *.dylib 15 | 16 | # Executables 17 | *.exe 18 | 19 | # Regression test results 20 | results 21 | regression.* 22 | 23 | # dependency tracking 24 | .deps 25 | -------------------------------------------------------------------------------- /CHANGELOG: -------------------------------------------------------------------------------- 1 | Release 0.9.1 2 | 3 | Bugfixes: 4 | 5 | - Fix operation on kernels without `CONFIG_MEMCG_SWAP_ENABLED`. 6 | Newer Debian kernels are configured like that, and on such kernels 7 | `memory.memsw.limit_in_bytes` does not exist, which caused pg_cgroups 8 | to fail during startup with the error 9 | 10 | cannot access '/sys/fs/cgroup/memory/init.scope/memory.memsw.limit_in_bytes': No such file or directory 11 | 12 | Fix by not defining `pg_cgroups.swap_limit` on such systems. 13 | 14 | Bug reported by Jens Wilke in #1. 15 | 16 | - Fix building on PostgreSQL v10. 17 | `OpenTransientFile` had a third argument back then. 
18 | 19 | Release 0.9.0 (2019-04-28) 20 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2018-2019, CYBERTEC PostgreSQL International GmbH 2 | 3 | Permission to use, copy, modify, and distribute this software and its 4 | documentation for any purpose, without fee, and without a written agreement 5 | is hereby granted, provided that the above copyright notice and this paragraph 6 | and the following two paragraphs appear in all copies. 7 | 8 | IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE TO ANY PARTY FOR 9 | DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING 10 | LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, 11 | EVEN IF THE COPYRIGHT HOLDER HAS BEEN ADVISED OF THE POSSIBILITY OF 12 | SUCH DAMAGE. 13 | 14 | THE COPYRIGHT HOLDER SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, 15 | BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 16 | A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, 17 | AND THE COPYRIGHT HOLDER HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, 18 | SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. 
19 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | MODULE_big = pg_cgroups 2 | OBJS = pg_cgroups.o libcg1.o 3 | DOCS = README.pg_cgroups 4 | REGRESS = test_memory test_blkio test_cpu test_cpuset 5 | 6 | PG_CONFIG = pg_config 7 | PGXS := $(shell $(PG_CONFIG) --pgxs) 8 | include $(PGXS) 9 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Linux Control Groups for PostgreSQL 2 | =================================== 3 | 4 | `pg_cgroups` is a module that allows you to run a PostgreSQL database cluster 5 | in a Linux Control Group (cgroup) and set the cgroup parameters as PostgreSQL 6 | configuration parameters. 7 | 8 | This enables you to limit the operating system resources for the cluster. 9 | 10 | Installation 11 | ============ 12 | 13 | Make sure that you have the PostgreSQL headers and the extension 14 | building infrastructure installed. If you do not build PostgreSQL 15 | from source, this is done by installing a `*-devel` or `*-dev` 16 | package. 17 | 18 | Check that the correct `pg_config` is found on the `PATH`. 19 | Then build and install `pg_cgroups` with 20 | 21 | make 22 | sudo make install 23 | 24 | Then you must add `pg_cgroups` to `shared_preload_libraries` and restart 25 | the PostgreSQL server process, but make sure that you have completed the 26 | setup as described below, or PostgreSQL will not start. 
27 | 28 | Setup 29 | ===== 30 | 31 | As user `root`, create the file `/etc/cgconfig.conf` with the following 32 | content: 33 | 34 | group postgres { 35 | perm { 36 | task { 37 | uid = postgres; 38 | gid = postgres; 39 | fperm = 644; 40 | } 41 | admin { 42 | uid = postgres; 43 | gid = postgres; 44 | dperm = 755; 45 | fperm = 644; 46 | } 47 | } 48 | 49 | memory { 50 | } 51 | 52 | blkio { 53 | } 54 | 55 | cpu { 56 | } 57 | 58 | cpuset { 59 | } 60 | } 61 | 62 | Here `postgres` is the PostgreSQL operating system user. 63 | 64 | Then make sure that cgroups are initialized and `/etc/cgconfig.conf` 65 | is loaded. How this is done will depend on the distribution. 66 | On RedHat-based systems, you would do the following: 67 | 68 | yum install -y libcgroup libcgroup-tools 69 | systemctl enable cgconfig 70 | systemctl start cgconfig 71 | 72 | If PostgreSQL is automatically started during system startup, make sure 73 | that cgroups are configured before PostgreSQL is started. 74 | With `systemd`, you can do that by adding an `After` and a `Requires` 75 | option to the `[Unit]` section of the PostgreSQL service file. 76 | 77 | Usage 78 | ===== 79 | 80 | PostgreSQL will automatically create a cgroup called `/postgres/` (where 81 | `` is the postmaster process ID) for the following controllers: 82 | 83 | - memory 84 | - cpu 85 | - blkio 86 | - cpuset 87 | 88 | Then it will add itself to this cgroup so that all PostgreSQL processes 89 | get to run under that cgroup. The cgroup is deleted when PostgreSQL is 90 | shut down. 91 | 92 | You can configure limits for various operating system resources by setting 93 | configuration parameters in `postgresql.conf` or with `ALTER SYSTEM`. 94 | 95 | You should also avoid modifying the cgroup parameters outside of PostgreSQL. 96 | This will work, but then the configuration parameters won't contain the 97 | correct setting. 
98 | 99 | `pg_cgroups` tries to check the parameter values for validity, but be careful 100 | because an incorrect parameter setting will cause PostgreSQL to stop. 101 | 102 | The parameters are: 103 | 104 | Memory parameters 105 | ----------------- 106 | 107 | - `pg_cgroups.memory_limit` (type `integer`, unit MB, default value -1) 108 | 109 | This corresponds to the cgroup memory parameter 110 | `memory.limit_in_bytes` and limits the amount of RAM available. 111 | 112 | The parameter can be positive or -1 for "no limit". 113 | 114 | Once `memory_limit` plus `swap_limit` is exhausted, the `oom_killer` 115 | parameter determines what will happen. 116 | 117 | - `pg_cgroups.swap_limit` (type `integer`, unit MB, default value -1) 118 | 119 | This configures the cgroup memory parameter `memory.memsw.limit_in_bytes` 120 | and limits the available swap space 121 | (note, however, that while `memory.memsw.limit_in_bytes` limits the sum of 122 | memory and swap space, `pg_cgroups.swap_limit` limits *only* the swap space). 123 | 124 | This parameter can be 0, positive or -1 for "no limit". 125 | 126 | Once `memory_limit` plus `swap_limit` is exhausted, the `oom_killer` 127 | parameter determines what will happen. 128 | 129 | **Note:** If the kernel was configured without `CONFIG_MEMCG_SWAP_ENABLED`, 130 | this parameter is not available. 131 | 132 | - `pg_cgroups.oom_killer` (type `boolean`, default value `on`) 133 | 134 | This parameter configures what will happen if the limit on memory and swap 135 | space is exhausted. If set to `on`, the Linux out-of-memory killer will 136 | kill PostgreSQL processes, otherwise execution is suspended until some 137 | memory is freed (which may never happen). 
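The relationship between the two limits can be sketched in C. This is only an illustration of the arithmetic described above, not the module's actual code: the function name `memsw_limit_bytes` is hypothetical, and returning -1 for "no limit" whenever either parameter is -1 is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of pg_cgroups.
 *
 * memory.memsw.limit_in_bytes limits memory plus swap, while
 * pg_cgroups.swap_limit limits only swap, so the kernel value is
 * the sum of both parameters, converted from MB to bytes.
 *
 * Returns -1 ("no limit") if either parameter is -1 (an assumption
 * of this sketch).
 */
int64_t
memsw_limit_bytes(int memory_limit_mb, int swap_limit_mb)
{
	/* both parameters have a minimum value of -1 */
	assert(memory_limit_mb >= -1 && swap_limit_mb >= -1);

	if (memory_limit_mb < 0 || swap_limit_mb < 0)
		return -1;

	/* widen before multiplying to avoid integer overflow */
	return ((int64_t) memory_limit_mb + swap_limit_mb) * 1024 * 1024;
}
```

With `memory_limit = 1024` and `swap_limit = 512`, this yields 1536 MB for the combined limit, while setting `swap_limit = 0` makes the combined limit equal to the memory limit.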
 138 | 139 | Block-I/O parameters 140 | -------------------- 141 | 142 | For all these parameters, the format of the entries is `major:minor limit`, 143 | where `major` and `minor` are the device major and minor numbers, 144 | and `limit` is a number (bytes or number of I/O operations). 145 | 146 | To limit I/O on several devices, use several such entries, separated by 147 | a comma. 148 | 149 | For example, if you want to limit I/O on the device `/dev/mapper/home`, 150 | you first find out what device that actually is: 151 | 152 | $ readlink -e /dev/mapper/home 153 | /dev/dm-2 154 | 155 | Then you find out the major and minor numbers: 156 | 157 | $ ls -l /dev/dm-2 158 | brw-rw---- 1 root disk 253, 2 Jun 21 12:13 /dev/dm-2 159 | 160 | So in this case, you would use an entry like `253:2 1048576` if you want to 161 | limit I/O to 1 MB per second. 162 | 163 | To remove a limit with `ALTER SYSTEM`, you have to set it to 0 explicitly, 164 | as in `253:2 0`. 165 | Using `ALTER SYSTEM RESET` or setting the limit to an empty string won't 166 | change the limit (this is how Linux control groups are implemented). 167 | However, setting the limit to an empty string and restarting the server 168 | will work, since the cgroup is deleted and re-created in this case. 169 | 170 | - `pg_cgroups.read_bps_limit` (type `text`, default empty) 171 | 172 | This corresponds to the cgroup blkio parameter 173 | `blkio.throttle.read_bps_device` and limits the number of bytes that can 174 | be read per second. 175 | 176 | - `pg_cgroups.write_bps_limit` (type `text`, default empty) 177 | 178 | This corresponds to the cgroup blkio parameter 179 | `blkio.throttle.write_bps_device` and limits the number of bytes that can 180 | be written per second. 181 | 182 | - `pg_cgroups.read_iops_limit` (type `text`, default empty) 183 | 184 | This corresponds to the cgroup blkio parameter 185 | `blkio.throttle.read_iops_device` and limits the number of read I/O 186 | operations that can be performed per second. 
 187 | 188 | - `pg_cgroups.write_iops_limit` (type `text`, default empty) 189 | 190 | This corresponds to the cgroup blkio parameter 191 | `blkio.throttle.write_iops_device` and limits the number of write I/O 192 | operations that can be performed per second. 193 | 194 | CPU parameters 195 | -------------- 196 | 197 | - `pg_cgroups.cpu_share` (type `integer`, default -1) 198 | 199 | This corresponds to the cgroup cpu parameter `cpu.cfs_quota_us` and defines 200 | the percentage of CPU bandwidth that can be used by PostgreSQL. 201 | The unit is 1/1000 of a percent, so 100000 stands for 100% of one CPU core. 202 | The minimum value is 1000, which stands for 1%. 203 | 204 | The default value -1 means “no limit”. 205 | 206 | To allow PostgreSQL to use more than one CPU fully, set the parameter to 207 | a value greater than 100000. 208 | 209 | NUMA parameters 210 | --------------- 211 | 212 | These parameters limit the CPUs and memory nodes that can be used by PostgreSQL. 213 | Setting these parameters usually only makes sense on [NUMA][1] architectures. 214 | Use `numactl --hardware` to see your machine's NUMA configuration. 215 | 216 | If you restrict PostgreSQL to run on the CPUs that belong to one memory node, 217 | you should also restrict memory usage to that node and vice versa, so that 218 | PostgreSQL only needs to access node-local memory. 219 | 220 | All these parameters take the form of a comma-separated list of zero-based 221 | numbers or number ranges, like `0`, `0-3` or `4,7-9`. 222 | 223 | - `pg_cgroups.memory_nodes` (`text`, defaults to all online nodes) 224 | 225 | This corresponds to the cgroup parameter `cpuset.mems` and defines the 226 | memory nodes that PostgreSQL can use. 227 | 228 | - `pg_cgroups.cpus` (`text`, defaults to all online CPUs) 229 | 230 | This corresponds to the cgroup parameter `cpuset.cpus` and defines the 231 | CPUs that PostgreSQL can use. 
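To make the list format of the NUMA parameters concrete, here is a minimal C sketch that walks such a list and counts the CPUs or memory nodes it names. It is not the validation code used by `pg_cgroups`: the function name `cpuset_count` and the upper bound `MAX_ID` are hypothetical and chosen only for illustration, and unlike the module this sketch does not compare the numbers against the CPUs or nodes that are actually online.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* illustrative upper bound for a CPU or node number (an assumption) */
#define MAX_ID 65535

/*
 * Hypothetical helper, not part of pg_cgroups.
 *
 * Count the entries named by a list like "0", "0-3" or "4,7-9".
 * Returns -1 if the list is malformed, a number exceeds MAX_ID
 * or a range is reversed.  Overlapping entries are counted twice.
 */
int
cpuset_count(const char *list)
{
	const char *p = list;
	int			count = 0;

	assert(list != NULL);

	for (;;)
	{
		char	   *end;
		long		lo, hi;

		/* every group must start with a number */
		if (!isdigit((unsigned char) *p))
			return -1;
		lo = hi = strtol(p, &end, 10);
		p = end;

		/* an optional "-number" turns the group into a range */
		if (*p == '-')
		{
			p++;
			if (!isdigit((unsigned char) *p))
				return -1;
			hi = strtol(p, &end, 10);
			p = end;
		}

		if (lo > MAX_ID || hi > MAX_ID || hi < lo)
			return -1;

		count += (int) (hi - lo + 1);

		if (*p == '\0')
			return count;
		if (*p != ',')
			return -1;
		p++;
	}
}
```

For the examples above, the sketch counts one entry for `0` and four entries each for `0-3` and `4,7-9`. Inputs such as `-1`, `0-0-0`, `,1` or `0-1,` are rejected as malformed.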
232 | 233 | [1]: https://en.wikipedia.org/wiki/Non-uniform_memory_access 234 | 235 | Diagnostic parameter 236 | -------------------- 237 | 238 | - `pg_cgroups.version` (type `text`) 239 | 240 | This parameter shows the current version of `pg_cgroups` and can only 241 | be read. 242 | 243 | Support 244 | ======= 245 | 246 | You can 247 | [open an issue](https://github.com/cybertec-postgresql/pg_cgroups/issues) 248 | on Github if you have questions or problems. 249 | 250 | For professional support, contact 251 | [CYBERTEC PostgreSQL International GmbH](https://www.cybertec-postgresql.com). 252 | 253 | Make sure you report which version you are using. 254 | The version can be found with this SQL command: 255 | 256 | SHOW pg_cgroups.version; 257 | -------------------------------------------------------------------------------- /README.pg_cgroups: -------------------------------------------------------------------------------- 1 | README.md -------------------------------------------------------------------------------- /expected/test_blkio.out: -------------------------------------------------------------------------------- 1 | /* 2 | * Unfortunately I cannot test everything because I cannot 3 | * rely on the existence of a certain block device on 4 | * every Linux system. 5 | */ 6 | -- try several incorrect settings that should fail 7 | ALTER SYSTEM SET pg_cgroups.read_bps_limit = '1024'; 8 | ERROR: invalid value for parameter "pg_cgroups.read_bps_limit": "1024" 9 | DETAIL: Entry "1024" must have a space between device and limit. 10 | ALTER SYSTEM SET pg_cgroups.write_bps_limit = '8:0'; 11 | ERROR: invalid value for parameter "pg_cgroups.write_bps_limit": "8:0" 12 | DETAIL: Entry "8:0" must have a space between device and limit. 13 | ALTER SYSTEM SET pg_cgroups.read_iops_limit = ':0 9210'; 14 | ERROR: invalid value for parameter "pg_cgroups.read_iops_limit": ":0 9210" 15 | DETAIL: Entry ":0 9210" does not start with "major:minor" device numbers. 
16 | ALTER SYSTEM SET pg_cgroups.write_iops_limit = '100 9210'; 17 | ERROR: invalid value for parameter "pg_cgroups.write_iops_limit": "100 9210" 18 | DETAIL: Entry "100 9210" does not start with "major:minor" device numbers. 19 | ALTER SYSTEM SET pg_cgroups.read_bps_limit = '100: 9210'; 20 | ERROR: invalid value for parameter "pg_cgroups.read_bps_limit": "100: 9210" 21 | DETAIL: Entry "100: 9210" does not start with "major:minor" device numbers. 22 | ALTER SYSTEM SET pg_cgroups.write_iops_limit = '1:0 xyz'; 23 | ERROR: invalid value for parameter "pg_cgroups.write_iops_limit": "1:0 xyz" 24 | DETAIL: Limit "xyz" must be an integer number. 25 | -------------------------------------------------------------------------------- /expected/test_cpu.out: -------------------------------------------------------------------------------- 1 | -- check the default settings 2 | SHOW pg_cgroups.cpu_share; 3 | pg_cgroups.cpu_share 4 | ---------------------- 5 | -1 6 | (1 row) 7 | 8 | -- this should fail 9 | ALTER SYSTEM SET pg_cgroups.cpu_share = 0; 10 | ERROR: invalid value for parameter "pg_cgroups.cpu_share": 0 11 | -- allow 50% of the availabe CPU 12 | ALTER SYSTEM SET pg_cgroups.cpu_share = 50000; 13 | SELECT pg_reload_conf(); 14 | pg_reload_conf 15 | ---------------- 16 | t 17 | (1 row) 18 | 19 | SELECT pg_sleep_for('0.3'); 20 | pg_sleep_for 21 | -------------- 22 | 23 | (1 row) 24 | 25 | SHOW pg_cgroups.cpu_share; 26 | pg_cgroups.cpu_share 27 | ---------------------- 28 | 50000 29 | (1 row) 30 | 31 | -- reset 32 | ALTER SYSTEM RESET pg_cgroups.cpu_share; 33 | SELECT pg_reload_conf(); 34 | pg_reload_conf 35 | ---------------- 36 | t 37 | (1 row) 38 | 39 | SELECT pg_sleep_for('0.3'); 40 | pg_sleep_for 41 | -------------- 42 | 43 | (1 row) 44 | 45 | SHOW pg_cgroups.cpu_share; 46 | pg_cgroups.cpu_share 47 | ---------------------- 48 | -1 49 | (1 row) 50 | 51 | -------------------------------------------------------------------------------- /expected/test_cpuset.out: 
-------------------------------------------------------------------------------- 1 | -- check the default settings 2 | SHOW pg_cgroups.cpus; 3 | pg_cgroups.cpus 4 | ----------------- 5 | 0-3 6 | (1 row) 7 | 8 | SHOW pg_cgroups.memory_nodes; 9 | pg_cgroups.memory_nodes 10 | ------------------------- 11 | 0 12 | (1 row) 13 | 14 | -- test some incorrect settings 15 | ALTER SYSTEM SET pg_cgroups.cpus = '-1'; 16 | ERROR: invalid value for parameter "pg_cgroups.cpus": "-1" 17 | DETAIL: Value "-1" has "-" in an invalid place. 18 | ALTER SYSTEM SET pg_cgroups.cpus = '0-0-0'; 19 | ERROR: invalid value for parameter "pg_cgroups.cpus": "0-0-0" 20 | DETAIL: Value "0-0-0" has "-" in an invalid place. 21 | ALTER SYSTEM SET pg_cgroups.cpus = '0,1-0,1'; 22 | ERROR: invalid value for parameter "pg_cgroups.cpus": "0,1-0,1" 23 | DETAIL: Number 0 is outside of range 1-3. 24 | ALTER SYSTEM SET pg_cgroups.cpus = '10000'; 25 | ERROR: invalid value for parameter "pg_cgroups.cpus": "10000" 26 | DETAIL: Number 10000 is outside of range 0-3. 27 | ALTER SYSTEM SET pg_cgroups.cpus = '1000000'; 28 | ERROR: invalid value for parameter "pg_cgroups.cpus": "1000000" 29 | DETAIL: Value "1000000" contains an invalid number. 30 | ALTER SYSTEM SET pg_cgroups.cpus = ',1'; 31 | ERROR: invalid value for parameter "pg_cgroups.cpus": ",1" 32 | DETAIL: Value ",1" is missing a number at the end of a group. 33 | ALTER SYSTEM SET pg_cgroups.cpus = '0-1,'; 34 | ERROR: invalid value for parameter "pg_cgroups.cpus": "0-1," 35 | DETAIL: Value "0-1," is missing a number at the end of a group. 
36 | -- set the available CPUs 37 | ALTER SYSTEM SET pg_cgroups.cpus = '0'; 38 | SELECT pg_reload_conf(); 39 | pg_reload_conf 40 | ---------------- 41 | t 42 | (1 row) 43 | 44 | SELECT pg_sleep_for('0.3'); 45 | pg_sleep_for 46 | -------------- 47 | 48 | (1 row) 49 | 50 | SHOW pg_cgroups.cpus; 51 | pg_cgroups.cpus 52 | ----------------- 53 | 0 54 | (1 row) 55 | 56 | -- set the available memory nodes 57 | ALTER SYSTEM SET pg_cgroups.memory_nodes = '0'; 58 | SELECT pg_reload_conf(); 59 | pg_reload_conf 60 | ---------------- 61 | t 62 | (1 row) 63 | 64 | SELECT pg_sleep_for('0.3'); 65 | pg_sleep_for 66 | -------------- 67 | 68 | (1 row) 69 | 70 | SHOW pg_cgroups.memory_nodes; 71 | pg_cgroups.memory_nodes 72 | ------------------------- 73 | 0 74 | (1 row) 75 | 76 | -- reset 77 | ALTER SYSTEM RESET pg_cgroups.cpus; 78 | ALTER SYSTEM RESET pg_cgroups.memory_nodes; 79 | SELECT pg_reload_conf(); 80 | pg_reload_conf 81 | ---------------- 82 | t 83 | (1 row) 84 | 85 | SELECT pg_sleep_for('0.3'); 86 | pg_sleep_for 87 | -------------- 88 | 89 | (1 row) 90 | 91 | SHOW pg_cgroups.cpus; 92 | pg_cgroups.cpus 93 | ----------------- 94 | 0-3 95 | (1 row) 96 | 97 | SHOW pg_cgroups.memory_nodes; 98 | pg_cgroups.memory_nodes 99 | ------------------------- 100 | 0 101 | (1 row) 102 | 103 | -------------------------------------------------------------------------------- /expected/test_memory.out: -------------------------------------------------------------------------------- 1 | -- check the default settings 2 | SHOW pg_cgroups.memory_limit; 3 | pg_cgroups.memory_limit 4 | ------------------------- 5 | -1 6 | (1 row) 7 | 8 | SHOW pg_cgroups.swap_limit; 9 | pg_cgroups.swap_limit 10 | ----------------------- 11 | -1 12 | (1 row) 13 | 14 | SHOW pg_cgroups.oom_killer; 15 | pg_cgroups.oom_killer 16 | ----------------------- 17 | on 18 | (1 row) 19 | 20 | -- change swap_limit (will set the parameter, leave kernel value unlimited) 21 | ALTER SYSTEM SET pg_cgroups.swap_limit = 512; 22 | SELECT 
pg_reload_conf(); 23 | pg_reload_conf 24 | ---------------- 25 | t 26 | (1 row) 27 | 28 | SELECT pg_sleep_for('0.3'); 29 | pg_sleep_for 30 | -------------- 31 | 32 | (1 row) 33 | 34 | SHOW pg_cgroups.swap_limit; 35 | pg_cgroups.swap_limit 36 | ----------------------- 37 | 512MB 38 | (1 row) 39 | 40 | -- change memory limit (should work) 41 | ALTER SYSTEM SET pg_cgroups.memory_limit = 1024; 42 | SELECT pg_reload_conf(); 43 | pg_reload_conf 44 | ---------------- 45 | t 46 | (1 row) 47 | 48 | SELECT pg_sleep_for('0.3'); 49 | pg_sleep_for 50 | -------------- 51 | 52 | (1 row) 53 | 54 | SHOW pg_cgroups.memory_limit; 55 | pg_cgroups.memory_limit 56 | ------------------------- 57 | 1GB 58 | (1 row) 59 | 60 | SHOW pg_cgroups.swap_limit; 61 | pg_cgroups.swap_limit 62 | ----------------------- 63 | 512MB 64 | (1 row) 65 | 66 | -- change swap_limit (should work) 67 | ALTER SYSTEM SET pg_cgroups.swap_limit = 0; 68 | SELECT pg_reload_conf(); 69 | pg_reload_conf 70 | ---------------- 71 | t 72 | (1 row) 73 | 74 | SELECT pg_sleep_for('0.3'); 75 | pg_sleep_for 76 | -------------- 77 | 78 | (1 row) 79 | 80 | SHOW pg_cgroups.swap_limit; 81 | pg_cgroups.swap_limit 82 | ----------------------- 83 | 0 84 | (1 row) 85 | 86 | -- lower memory limit (should work) 87 | ALTER SYSTEM SET pg_cgroups.memory_limit = 256; 88 | SELECT pg_reload_conf(); 89 | pg_reload_conf 90 | ---------------- 91 | t 92 | (1 row) 93 | 94 | SELECT pg_sleep_for('0.3'); 95 | pg_sleep_for 96 | -------------- 97 | 98 | (1 row) 99 | 100 | SHOW pg_cgroups.memory_limit; 101 | pg_cgroups.memory_limit 102 | ------------------------- 103 | 256MB 104 | (1 row) 105 | 106 | SHOW pg_cgroups.swap_limit; 107 | pg_cgroups.swap_limit 108 | ----------------------- 109 | 0 110 | (1 row) 111 | 112 | -- raise memory limit (should work) 113 | ALTER SYSTEM SET pg_cgroups.memory_limit = 512; 114 | SELECT pg_reload_conf(); 115 | pg_reload_conf 116 | ---------------- 117 | t 118 | (1 row) 119 | 120 | SELECT pg_sleep_for('0.3'); 121 | 
pg_sleep_for 122 | -------------- 123 | 124 | (1 row) 125 | 126 | SHOW pg_cgroups.memory_limit; 127 | pg_cgroups.memory_limit 128 | ------------------------- 129 | 512MB 130 | (1 row) 131 | 132 | SHOW pg_cgroups.swap_limit; 133 | pg_cgroups.swap_limit 134 | ----------------------- 135 | 0 136 | (1 row) 137 | 138 | -- set swap limit to -1 (should work) 139 | ALTER SYSTEM SET pg_cgroups.swap_limit = -1; 140 | SELECT pg_reload_conf(); 141 | pg_reload_conf 142 | ---------------- 143 | t 144 | (1 row) 145 | 146 | SELECT pg_sleep_for('0.3'); 147 | pg_sleep_for 148 | -------------- 149 | 150 | (1 row) 151 | 152 | SHOW pg_cgroups.swap_limit; 153 | pg_cgroups.swap_limit 154 | ----------------------- 155 | -1 156 | (1 row) 157 | 158 | -- set swap limit to 0 (should work) 159 | ALTER SYSTEM SET pg_cgroups.swap_limit = 0; 160 | SELECT pg_reload_conf(); 161 | pg_reload_conf 162 | ---------------- 163 | t 164 | (1 row) 165 | 166 | SELECT pg_sleep_for('0.3'); 167 | pg_sleep_for 168 | -------------- 169 | 170 | (1 row) 171 | 172 | SHOW pg_cgroups.swap_limit; 173 | pg_cgroups.swap_limit 174 | ----------------------- 175 | 0 176 | (1 row) 177 | 178 | -- set memory limit to 0 (should fail) 179 | ALTER SYSTEM SET pg_cgroups.memory_limit = 0; 180 | ERROR: invalid value for parameter "pg_cgroups.memory_limit": 0 181 | -- disable OOM killer (should work) 182 | ALTER SYSTEM SET pg_cgroups.oom_killer = off; 183 | SELECT pg_reload_conf(); 184 | pg_reload_conf 185 | ---------------- 186 | t 187 | (1 row) 188 | 189 | SELECT pg_sleep_for('0.3'); 190 | pg_sleep_for 191 | -------------- 192 | 193 | (1 row) 194 | 195 | SHOW pg_cgroups.oom_killer; 196 | pg_cgroups.oom_killer 197 | ----------------------- 198 | off 199 | (1 row) 200 | 201 | -- reset all settings 202 | ALTER SYSTEM RESET pg_cgroups.memory_limit; 203 | ALTER SYSTEM RESET pg_cgroups.swap_limit; 204 | ALTER SYSTEM RESET pg_cgroups.oom_killer; 205 | SELECT pg_reload_conf(); 206 | pg_reload_conf 207 | ---------------- 208 | t 209 | (1 
row) 210 | 211 | SELECT pg_sleep_for('0.3'); 212 | pg_sleep_for 213 | -------------- 214 | 215 | (1 row) 216 | 217 | SHOW pg_cgroups.memory_limit; 218 | pg_cgroups.memory_limit 219 | ------------------------- 220 | -1 221 | (1 row) 222 | 223 | SHOW pg_cgroups.swap_limit; 224 | pg_cgroups.swap_limit 225 | ----------------------- 226 | -1 227 | (1 row) 228 | 229 | SHOW pg_cgroups.oom_killer; 230 | pg_cgroups.oom_killer 231 | ----------------------- 232 | on 233 | (1 row) 234 | 235 | -------------------------------------------------------------------------------- /libcg1.c: -------------------------------------------------------------------------------- 1 | #ifndef __linux__ 2 | #error "Linux control groups are only available on Linux" 3 | #endif 4 | 5 | #include "postgres.h" 6 | 7 | #include "storage/fd.h" 8 | #include "storage/ipc.h" 9 | #include "utils/memutils.h" 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | 22 | #include "pg_cgroups.h" 23 | 24 | /* v11 did away with the third parameter of OpenTransientFile */ 25 | #if PG_VERSION_NUM < 110000 26 | #define OpenTransFile(filename, fileflags) \ 27 | OpenTransientFile((filename), (fileflags), S_IRUSR | S_IWUSR) 28 | #else 29 | #define OpenTransFile(filename, fileflags) \ 30 | OpenTransientFile((filename), (fileflags)) 31 | #endif /* PG_VERSION_NUM */ 32 | 33 | /* 34 | * static variables 35 | */ 36 | 37 | /* structure for information about cgroup controllers */ 38 | static struct { 39 | char *name; 40 | bool init; 41 | char *mountpoint; 42 | } cgctl[MAX_CONTROLLERS] = { 43 | {"memory", false, NULL}, 44 | {"cpu", false, NULL}, 45 | {"blkio", false, NULL}, 46 | {"cpuset", false, NULL} 47 | }; 48 | /* postmaster PID */ 49 | static pid_t postmaster_pid; 50 | 51 | /* default values for the parameters */ 52 | static char *def_cpus; 53 | static char *def_memory_nodes; 54 | 55 | /* 56 | * function prototypes 57 | */ 58 | 59 
| static void check_controllers(void); 60 | static void get_mountpoints(void); 61 | static char * const get_online(char * const what); 62 | static void cg_write_string(int controller, char * const cgroup, char * const parameter, char * const value); 63 | static char *cg_read_string(int controller, char * const cgroup, char * const parameter, bool ignore_errors); 64 | static void cg_move_process(char * const cgroup, char * const process, bool silent); 65 | static void on_exit_callback(int code, Datum arg); 66 | 67 | /* 68 | * static functions 69 | */ 70 | 71 | /* check if all required controllers are present */ 72 | void 73 | check_controllers() 74 | { 75 | FILE *cgfile; 76 | char *line = NULL; 77 | size_t size = 0; 78 | int i, len; 79 | 80 | errno = 0; 81 | 82 | /* check if all cgroup controllers exist */ 83 | if ((cgfile = AllocateFile("/proc/cgroups", "r")) == NULL) { 84 | ereport(FATAL, 85 | (errcode(ERRCODE_SYSTEM_ERROR), 86 | errmsg("cannot open \"/proc/cgroups\": %m"), 87 | errhint("Make sure that Linux Control Groups are supported by the kernel and activated."))); 88 | } 89 | 90 | /* 91 | * This uses malloc(3) internally, which we shouldn't do in 92 | * server code, but "getline" is so convenient. 93 | * Make sure to free(2) "line"! 94 | */ 95 | while (getline(&line, &size, cgfile) != -1) { 96 | /* skip empty and comment lines */ 97 | if (size < 1 || line[0] == '#') 98 | continue; 99 | 100 | /* set "init" true for any controller found */ 101 | for (i=0; imnt_type, "cgroup") != 0) 155 | continue; 156 | 157 | /* find the controller name in the mount options */ 158 | p1 = mnt->mnt_opts; 159 | while (p1 != NULL) { 160 | p2 = strchr(p1, ','); 161 | if (p2 == NULL) 162 | len = strlen(p1); 163 | else 164 | len = p2 - p1; 165 | 166 | /* if any of the options match, set the mount point */ 167 | for (i=0; imnt_dir); 174 | break; 175 | } 176 | 177 | p1 = p2 ? 
(p2 + 1) : NULL; 178 | } 179 | } 180 | 181 | FreeFile(mntfile); 182 | 183 | /* check that all cgroups are properly set up */ 184 | for (i=0; i 0) 234 | { 235 | size_t len = strlen(value); 236 | 237 | value = repalloc(value, len + bytes + 1); 238 | strncat(value, buf, bytes); 239 | value[len + bytes] = '\0'; 240 | } 241 | 242 | if (errno) 243 | { 244 | pfree(value); 245 | 246 | ereport(ERROR, 247 | (errcode(ERRCODE_SYSTEM_ERROR), 248 | errmsg("error reading file \"%s\": %m", path))); 249 | } 250 | 251 | CloseTransientFile(fd); 252 | 253 | /* remove the trailing newline */ 254 | value[strlen(value) - 1] = '\0'; 255 | 256 | return value; 257 | } 258 | 259 | /* 260 | * Write a control group parameter. 261 | */ 262 | void 263 | cg_write_string(int controller, char * const cgroup, char * const parameter, char * const value) 264 | { 265 | char *path; 266 | int fd; 267 | 268 | path = palloc(strlen(cgctl[controller].mountpoint) 269 | + strlen(cgroup) 270 | + strlen(parameter) + 3); 271 | sprintf(path, 272 | "%s/%s/%s", 273 | cgctl[controller].mountpoint, cgroup, parameter); 274 | 275 | errno = 0; 276 | 277 | fd = OpenTransFile(path, O_WRONLY | O_TRUNC); 278 | 279 | if (fd == -1) 280 | ereport(ERROR, 281 | (errcode(ERRCODE_SYSTEM_ERROR), 282 | errmsg("error opening file \"%s\" for writing: %m", path))); 283 | 284 | /* 285 | * The attempt to write an empty string causes an error, 286 | * so don't write anything in this case. 287 | * The file is truncated on open anyway. 288 | */ 289 | if (strlen(value) > 0 && write(fd, value, strlen(value)) < 0) 290 | ereport(ERROR, 291 | (errcode(ERRCODE_SYSTEM_ERROR), 292 | errmsg("error writing file \"%s\": %m", path))); 293 | 294 | pfree(path); 295 | 296 | CloseTransientFile(fd); 297 | } 298 | 299 | /* 300 | * Read a control group parameter. 301 | * Returns a palloc'ed value. 302 | * If "ignore_errors" is "true", the function returns NULL if it encounters errors. 
 303 | */ 304 | char * 305 | cg_read_string(int controller, char * const cgroup, char * const parameter, bool ignore_errors) 306 | { 307 | char *result = NULL, *path, buf[1000]; 308 | ssize_t bytes, total = 0; 309 | int fd; 310 | 311 | path = palloc(strlen(cgctl[controller].mountpoint) 312 | + strlen(cgroup) 313 | + strlen(parameter) + 3); 314 | sprintf(path, 315 | "%s/%s/%s", 316 | cgctl[controller].mountpoint, cgroup, parameter); 317 | 318 | errno = 0; 319 | 320 | fd = OpenTransFile(path, O_RDONLY); 321 | 322 | if (fd == -1) 323 | { 324 | if (ignore_errors) 325 | return NULL; 326 | else 327 | ereport(ERROR, 328 | (errcode(ERRCODE_SYSTEM_ERROR), 329 | errmsg("error opening file \"%s\" for reading: %m", path))); 330 | } 331 | 332 | while ((bytes = read(fd, buf, 1000)) > 0) 333 | { 334 | total += bytes; 335 | 336 | if (result) 337 | result = repalloc(result, total + 1); 338 | else 339 | { 340 | result = palloc(total + 1); 341 | result[0] = '\0'; 342 | } 343 | 344 | strncat(result, buf, bytes); 345 | } 346 | if (errno) 347 | ereport(ERROR, 348 | (errcode(ERRCODE_SYSTEM_ERROR), 349 | errmsg("error reading file \"%s\": %m", path))); 350 | 351 | pfree(path); 352 | 353 | CloseTransientFile(fd); 354 | 355 | return result; 356 | } 357 | 358 | /* 359 | * Add the processes to a Linux control group for all controllers. 360 | * "processes" contains the process IDs, separated by comma. 361 | * If "silent", ignore errors. 
362 | */ 363 | void 364 | cg_move_process(char * const cgroup, char * const process, bool silent) 365 | { 366 | int i, fd; 367 | char *path; 368 | 369 | for (i=0; i 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | 20 | #include "pg_cgroups.h" 21 | 22 | PG_MODULE_MAGIC; 23 | 24 | static char *pg_cgroups_version; 25 | 26 | /* GUCs defined by the module */ 27 | static int memory_limit = -1; 28 | static int swap_limit = -1; 29 | static bool oom_killer = true; 30 | static char *read_bps_limit = NULL; 31 | static char *write_bps_limit = NULL; 32 | static char *read_iops_limit = NULL; 33 | static char *write_iops_limit = NULL; 34 | static int cpu_share = -1; 35 | static char* cpus = NULL; /* set during module initialization */ 36 | static char* memory_nodes = NULL; /* set during module initialization */ 37 | 38 | /* other static variables */ 39 | static bool cgroup_has_swap_param = false; /* set during module initialization */ 40 | static int max_cpu_share = -1; /* set during module initialization */ 41 | 42 | /* static functions declarations */ 43 | static bool memory_limit_check(int *newval, void **extra, GucSource source); 44 | static void memory_limit_assign(int newval, void *extra); 45 | static void swap_limit_assign(int newval, void *extra); 46 | static void oom_killer_assign(bool newval, void *extra); 47 | static bool device_limit_check(char **newval, void **extra, GucSource source); 48 | static void device_limit_assign(char * const limit_name, char *newval); 49 | static void read_bps_limit_assign(const char *newval, void *extra); 50 | static void write_bps_limit_assign(const char *newval, void *extra); 51 | static void read_iops_limit_assign(const char *newval, void *extra); 52 | static void write_iops_limit_assign(const char *newval, void *extra); 53 | static bool cpu_share_check(int *newval, void **extra, GucSource source); 54 | static void cpu_share_assign(int newval, void *extra); 55 | static bool parse_online(char * 
const online, int *pmin, int *pmax); 56 | static bool cpuset_check(char * const newval, char * const online); 57 | static bool cpus_check(char **newval, void **extra, GucSource source); 58 | static void cpus_assign(const char *newval, void *extra); 59 | static bool memory_nodes_check(char **newval, void **extra, GucSource source); 60 | static void memory_nodes_assign(const char *newval, void *extra); 61 | 62 | void 63 | _PG_init(void) 64 | { 65 | int dummy, num_cpus; 66 | 67 | if (!process_shared_preload_libraries_in_progress) 68 | ereport(FATAL, 69 | (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), 70 | errmsg("\"pg_cgroups\" must be added to \"shared_preload_libraries\""))); 71 | 72 | /* initialize cgroups library and get GUC defaults */ 73 | cg_init(&cgroup_has_swap_param); 74 | 75 | /* set a default value (and upper limit) for cpu_share */ 76 | if (!parse_online(get_def_cpus(), &dummy, &num_cpus)) 77 | elog(FATAL, "internal error getting CPU count"); 78 | 79 | max_cpu_share = (num_cpus + 1) * 100000; 80 | 81 | /* once the control group is set up, we can define the GUCs */ 82 | DefineCustomIntVariable( 83 | "pg_cgroups.memory_limit", 84 | "Limit the RAM available to this cluster.", 85 | "This corresponds to \"memory.limit_in_bytes\".", 86 | &memory_limit, 87 | -1, 88 | -1, 89 | INT_MAX / 2, 90 | PGC_SIGHUP, 91 | GUC_UNIT_MB, 92 | memory_limit_check, 93 | memory_limit_assign, 94 | NULL 95 | ); 96 | 97 | if (cgroup_has_swap_param) 98 | DefineCustomIntVariable( 99 | "pg_cgroups.swap_limit", 100 | "Limit the swap space available to this cluster.", 101 | "This corresponds to \"memory.memsw.limit_in_bytes\" minus \"memory.limit_in_bytes\".", 102 | &swap_limit, 103 | -1, 104 | -1, 105 | INT_MAX / 2, 106 | PGC_SIGHUP, 107 | GUC_UNIT_MB, 108 | NULL, 109 | swap_limit_assign, 110 | NULL 111 | ); 112 | 113 | DefineCustomBoolVariable( 114 | "pg_cgroups.oom_killer", 115 | "Determines how to treat processes that exceed the memory limit.", 116 | "This corresponds to the
negation of \"memory.oom_control\".", 117 | &oom_killer, 118 | true, 119 | PGC_SIGHUP, 120 | 0, 121 | NULL, 122 | oom_killer_assign, 123 | NULL 124 | ); 125 | 126 | DefineCustomStringVariable( 127 | "pg_cgroups.read_bps_limit", 128 | "Sets the read I/O limit per device in bytes.", 129 | "This corresponds to \"blkio.throttle.read_bps_device\".", 130 | &read_bps_limit, 131 | "", 132 | PGC_SIGHUP, 133 | 0, 134 | device_limit_check, 135 | read_bps_limit_assign, 136 | NULL 137 | ); 138 | 139 | DefineCustomStringVariable( 140 | "pg_cgroups.write_bps_limit", 141 | "Sets the write I/O limit per device in bytes.", 142 | "This corresponds to \"blkio.throttle.write_bps_device\".", 143 | &write_bps_limit, 144 | "", 145 | PGC_SIGHUP, 146 | 0, 147 | device_limit_check, 148 | write_bps_limit_assign, 149 | NULL 150 | ); 151 | 152 | DefineCustomStringVariable( 153 | "pg_cgroups.read_iops_limit", 154 | "Sets the read I/O limit per device in I/O operations per second.", 155 | "This corresponds to \"blkio.throttle.read_iops_device\".", 156 | &read_iops_limit, 157 | "", 158 | PGC_SIGHUP, 159 | 0, 160 | device_limit_check, 161 | read_iops_limit_assign, 162 | NULL 163 | ); 164 | 165 | DefineCustomStringVariable( 166 | "pg_cgroups.write_iops_limit", 167 | "Sets the write I/O limit per device in I/O operations per second.", 168 | "This corresponds to \"blkio.throttle.write_iops_device\".", 169 | &write_iops_limit, 170 | "", 171 | PGC_SIGHUP, 172 | 0, 173 | device_limit_check, 174 | write_iops_limit_assign, 175 | NULL 176 | ); 177 | 178 | DefineCustomIntVariable( 179 | "pg_cgroups.cpu_share", 180 | "Limit share of the available CPU time (100000 = 1 core).", 181 | "This corresponds to \"cpu.cfs_quota_us\".", 182 | &cpu_share, 183 | -1, 184 | -1, 185 | max_cpu_share, 186 | PGC_SIGHUP, 187 | 0, 188 | cpu_share_check, 189 | cpu_share_assign, 190 | NULL 191 | ); 192 | 193 | DefineCustomStringVariable( 194 | "pg_cgroups.cpus", 195 | "Specifies which CPUs are available for this cluster.", 196 | 
"This corresponds to \"cpuset.cpus\".", 197 | &cpus, 198 | strdup(get_def_cpus()), 199 | PGC_SIGHUP, 200 | 0, 201 | cpus_check, 202 | cpus_assign, 203 | NULL 204 | ); 205 | 206 | DefineCustomStringVariable( 207 | "pg_cgroups.memory_nodes", 208 | "Specifies which memory nodes are available for this cluster.", 209 | "This corresponds to \"cpuset.mems\".", 210 | &memory_nodes, 211 | strdup(get_def_memory_nodes()), 212 | PGC_SIGHUP, 213 | 0, 214 | memory_nodes_check, 215 | memory_nodes_assign, 216 | NULL 217 | ); 218 | 219 | DefineCustomStringVariable( 220 | "pg_cgroups.version", 221 | "The version of pg_cgroups.", 222 | NULL, 223 | &pg_cgroups_version, 224 | PG_CGROUPS_VERSION, 225 | PGC_INTERNAL, 226 | 0, 227 | NULL, 228 | NULL, 229 | NULL 230 | ); 231 | 232 | EmitWarningsOnPlaceholders("pg_cgroups"); 233 | } 234 | 235 | bool 236 | memory_limit_check(int *newval, void **extra, GucSource source) 237 | { 238 | return (bool) (*newval != 0); 239 | } 240 | 241 | void 242 | memory_limit_assign(int newval, void *extra) 243 | { 244 | int64_t mem_value, swap_value, newtotal; 245 | 246 | /* only the postmaster changes the kernel */ 247 | if (MyProcPid != PostmasterPid) 248 | return; 249 | 250 | /* convert from MB to bytes */ 251 | mem_value = (newval == -1) ? -1 : newval * (int64_t)1048576; 252 | 253 | /* calculate the new value for swap_limit */ 254 | if (newval == -1 || swap_limit == -1) 255 | newtotal = -1; 256 | else 257 | newtotal = (int64_t) swap_limit + newval; 258 | 259 | /* convert from MB to bytes */ 260 | swap_value = (newtotal == -1) ? 
-1 : newtotal * 1048576; 261 | 262 | if (newval == -1 263 | || (newval > memory_limit && memory_limit != -1)) 264 | { 265 | /* we have to raise the limit on memory + swap first */ 266 | if (cgroup_has_swap_param) 267 | cg_set_int64(CONTROLLER_MEMORY, "memory.memsw.limit_in_bytes", swap_value); 268 | cg_set_int64(CONTROLLER_MEMORY, "memory.limit_in_bytes", mem_value); 269 | } 270 | else 271 | { 272 | /* we have to lower the limit on memory + swap last */ 273 | cg_set_int64(CONTROLLER_MEMORY, "memory.limit_in_bytes", mem_value); 274 | if (cgroup_has_swap_param) 275 | cg_set_int64(CONTROLLER_MEMORY, "memory.memsw.limit_in_bytes", swap_value); 276 | } 277 | } 278 | 279 | void 280 | swap_limit_assign(int newval, void *extra) 281 | { 282 | int64_t swap_value, newtotal; 283 | 284 | Assert(cgroup_has_swap_param); 285 | 286 | /* only the postmaster changes the kernel */ 287 | if (MyProcPid != PostmasterPid) 288 | return; 289 | 290 | /* calculate the new memory + swap */ 291 | if (memory_limit == -1 || newval == -1) 292 | { 293 | newtotal = -1; 294 | newval = -1; 295 | } 296 | else 297 | newtotal = (int64_t) newval + memory_limit; 298 | 299 | /* convert from MB to bytes */ 300 | swap_value = (newtotal == -1) ? 
-1 : newtotal * 1048576; 301 | 302 | cg_set_int64(CONTROLLER_MEMORY, "memory.memsw.limit_in_bytes", swap_value); 303 | } 304 | 305 | void 306 | oom_killer_assign(bool newval, void *extra) 307 | { 308 | int64_t oom_value = !newval; 309 | 310 | /* only the postmaster changes the kernel */ 311 | if (MyProcPid != PostmasterPid) 312 | return; 313 | 314 | cg_set_int64(CONTROLLER_MEMORY, "memory.oom_control", oom_value); 315 | } 316 | 317 | bool 318 | device_limit_check(char **newval, void **extra, GucSource source) 319 | { 320 | char *val = pstrdup(*newval), 321 | *freeme = val; 322 | 323 | if (*val == '\0') 324 | return true; 325 | 326 | /* loop through comma-separated list */ 327 | while (val) 328 | { 329 | char *nextp, *device, *limit, *filename; 330 | bool have_colon = false, 331 | have_digit = false; 332 | struct stat statbuf; 333 | 334 | if ((nextp = strchr(val, ',')) != NULL) 335 | { 336 | *nextp = '\0'; 337 | ++nextp; 338 | } 339 | 340 | /* parse entry of the form "major:minor limit" */ 341 | device = val; 342 | while (*val != ' ') 343 | { 344 | if (*val >= '0' && *val <= '9') 345 | have_digit = true; 346 | else if (*val == ':') 347 | { 348 | if (have_colon || !have_digit) 349 | { 350 | GUC_check_errdetail( 351 | "Entry \"%s\" does not start with \"major:minor\" device numbers.", 352 | device 353 | ); 354 | return false; 355 | } 356 | 357 | have_colon = true; 358 | have_digit = false; 359 | } 360 | else if (*val == '\0') 361 | { 362 | GUC_check_errdetail( 363 | "Entry \"%s\" must have a space between device and limit.", 364 | device 365 | ); 366 | return false; 367 | } 368 | else 369 | { 370 | GUC_check_errdetail( 371 | "Entry \"%s\" does not start with \"major:minor\" device numbers.", 372 | device 373 | ); 374 | return false; 375 | } 376 | 377 | ++val; 378 | } 379 | if (!have_colon || !have_digit) 380 | { 381 | GUC_check_errdetail( 382 | "Entry \"%s\" does not start with \"major:minor\" device numbers.", 383 | device 384 | ); 385 | return false; 386 | } 387 | 388 | *(val++) =
'\0'; 389 | while (*val == ' ') 390 | ++val; 391 | limit = val; 392 | 393 | have_digit = false; 394 | while (*val >= '0' && *val <= '9') 395 | { 396 | have_digit = true; 397 | ++val; 398 | } 399 | if (*val != '\0' || !have_digit) 400 | { 401 | GUC_check_errdetail( 402 | "Limit \"%s\" must be an integer number.", 403 | limit 404 | ); 405 | return false; 406 | } 407 | 408 | /* check if the device exists */ 409 | filename = palloc(strlen(device) + 12); 410 | strcpy(filename, "/dev/block/"); 411 | strcat(filename, device); 412 | 413 | errno = 0; 414 | if (stat(filename, &statbuf)) 415 | { 416 | GUC_check_errdetail( 417 | errno == ENOENT ? "Device file \"%s\" does not exist." 418 | : "Error accessing device file \"%s\": %m", 419 | filename 420 | ); 421 | return false; 422 | } 423 | 424 | if ((statbuf.st_mode & S_IFMT) != S_IFBLK) 425 | { 426 | GUC_check_errdetail( 427 | "Device file \"%s\" is not a block device.", 428 | filename 429 | ); 430 | return false; 431 | } 432 | 433 | pfree(filename); 434 | 435 | val = nextp; 436 | } 437 | 438 | pfree(freeme); 439 | return true; 440 | } 441 | 442 | void 443 | device_limit_assign(char * const limit_name, char *newval) 444 | { 445 | int i; 446 | char *device_limit_val = NULL; 447 | 448 | /* only the postmaster changes the kernel */ 449 | if (MyProcPid != PostmasterPid) 450 | return; 451 | 452 | device_limit_val = pstrdup(newval ? 
newval : ""); 453 | 454 | /* replace commas with line breaks */ 455 | if (device_limit_val) 456 | for (i=strlen(device_limit_val)-1; i>=0; --i) 457 | if (device_limit_val[i] == ',') 458 | device_limit_val[i] = '\n'; 459 | 460 | cg_set_string(CONTROLLER_BLKIO, limit_name, device_limit_val); 461 | 462 | pfree(device_limit_val); 463 | } 464 | 465 | void 466 | read_bps_limit_assign(const char *newval, void *extra) 467 | { 468 | device_limit_assign("blkio.throttle.read_bps_device", (char *) newval); 469 | } 470 | 471 | void 472 | write_bps_limit_assign(const char *newval, void *extra) 473 | { 474 | device_limit_assign("blkio.throttle.write_bps_device", (char *) newval); 475 | } 476 | 477 | void 478 | read_iops_limit_assign(const char *newval, void *extra) 479 | { 480 | device_limit_assign("blkio.throttle.read_iops_device", (char *) newval); 481 | } 482 | 483 | void 484 | write_iops_limit_assign(const char *newval, void *extra) 485 | { 486 | device_limit_assign("blkio.throttle.write_iops_device", (char *) newval); 487 | } 488 | 489 | bool 490 | cpu_share_check(int *newval, void **extra, GucSource source) 491 | { 492 | return (bool) (*newval == -1 || *newval >= 1000); 493 | } 494 | 495 | void 496 | cpu_share_assign(int newval, void *extra) 497 | { 498 | /* only the postmaster changes the kernel */ 499 | if (MyProcPid != PostmasterPid) 500 | return; 501 | 502 | cg_set_int64(CONTROLLER_CPU, "cpu.cfs_quota_us", (int64_t) newval); 503 | } 504 | 505 | /* 506 | * Extracts the first and the last number from a string that starts 507 | * and ends with a number. 
508 | */ 509 | bool 510 | parse_online(char * const online, int *pmin, int *pmax) 511 | { 512 | char *start, *p, buf[100]; 513 | 514 | /* we read from the start to get the first number */ 515 | start = p = online; 516 | while (*p >= '0' && *p <= '9') 517 | ++p; 518 | if (start == p || p - start >= 6) 519 | { 520 | GUC_check_errdetail( 521 | "Online limit \"%s\" does not start with a valid number.", 522 | online 523 | ); 524 | 525 | return false; 526 | } 527 | memcpy(buf, start, p - start); 528 | buf[p - start] = '\0'; 529 | *pmin = atoi(buf); 530 | 531 | /* now we read backwards from the end for the second number */ 532 | p = start += strlen(online); 533 | while (start > online && *(start-1) >= '0' && *(start-1) <= '9') 534 | --start; 535 | if (start == p || p - start >= 6) 536 | { 537 | GUC_check_errdetail( 538 | "Online limit \"%s\" does not end with a valid number.", 539 | online 540 | ); 541 | 542 | return false; 543 | } 544 | *pmax = atoi(start); 545 | 546 | return true; 547 | } 548 | 549 | /* 550 | * "newval" is parsed and checked if it matches the regexp 551 | * ^[0-9]+\(-[0-9]+\)?\(,[0-9]+\(-[0-9]+\)?\)* 552 | * "online" is of the form "m-n" and specifies the limits 553 | * for the numbers that appear in "newval".
554 | */ 555 | bool 556 | cpuset_check(char * const newval, char * const online) 557 | { 558 | int online_min, online_max, min = 0, max = 0; 559 | char *start, *p, buf[100]; 560 | /* 561 | * values for "state": 562 | * 0: before comma group 563 | * 1: in first number 564 | * 2: after hyphen 565 | * 3: in second number 566 | */ 567 | int state = 0; 568 | 569 | /* we take the first and the last number in "online" as limits */ 570 | if (!parse_online(online, &online_min, &online_max)) 571 | return false; 572 | 573 | /* parse and check newval */ 574 | for (p = newval; true; ++p) 575 | { 576 | if (*p >= '0' && *p <= '9') 577 | { 578 | if (state == 0 || state == 2) 579 | { 580 | start = p; 581 | ++state; 582 | } 583 | } 584 | else if (*p == '-') 585 | { 586 | if (state != 1) 587 | { 588 | GUC_check_errdetail( 589 | "Value \"%s\" has \"-\" in an invalid place.", 590 | newval 591 | ); 592 | 593 | return false; 594 | } 595 | 596 | if (p - start >= 6) 597 | { 598 | GUC_check_errdetail( 599 | "Value \"%s\" contains an invalid number.", 600 | newval 601 | ); 602 | 603 | return false; 604 | } 605 | 606 | memcpy(buf, start, p - start); 607 | buf[p - start] = '\0'; 608 | min = atoi(buf); 609 | 610 | if (min < online_min || min > online_max) 611 | { 612 | GUC_check_errdetail( 613 | "Number %d is outside of range %d-%d.", 614 | min, online_min, online_max 615 | ); 616 | 617 | return false; 618 | } 619 | 620 | state = 2; 621 | } 622 | else if (*p == ',' || *p == '\0') 623 | { 624 | if (state != 1 && state != 3) 625 | { 626 | GUC_check_errdetail( 627 | "Value \"%s\" is missing a number at the end of a group.", 628 | newval 629 | ); 630 | 631 | return false; 632 | } 633 | 634 | if (p - start >= 6) 635 | { 636 | GUC_check_errdetail( 637 | "Value \"%s\" contains an invalid number.", 638 | newval 639 | ); 640 | 641 | return false; 642 | } 643 | 644 | memcpy(buf, start, p - start); 645 | buf[p - start] = '\0'; 646 | max = atoi(buf); 647 | 648 | if (state == 1 && (max < online_min || max > 
online_max)) 649 | { 650 | GUC_check_errdetail( 651 | "Number %d is outside of range %d-%d.", 652 | max, online_min, online_max 653 | ); 654 | 655 | return false; 656 | } 657 | 658 | if (state == 3 && (max < min || max > online_max)) 659 | { 660 | GUC_check_errdetail( 661 | "Number %d is outside of range %d-%d.", 662 | max, min, online_max 663 | ); 664 | 665 | return false; 666 | } 667 | 668 | if (*p == '\0') 669 | break; 670 | else 671 | state = 0; 672 | } 673 | else 674 | { 675 | GUC_check_errdetail( 676 | "Value \"%s\" contains an invalid character.", 677 | newval 678 | ); 679 | 680 | return false; 681 | } 682 | } 683 | 684 | return true; 685 | } 686 | 687 | bool 688 | cpus_check(char **newval, void **extra, GucSource source) 689 | { 690 | return cpuset_check(*newval, get_def_cpus()); 691 | } 692 | 693 | void 694 | cpus_assign(const char *newval, void *extra) 695 | { 696 | /* only the postmaster changes the kernel */ 697 | if (MyProcPid != PostmasterPid) 698 | return; 699 | 700 | cg_set_string(CONTROLLER_CPUSET, "cpuset.cpus", (char *) newval); 701 | } 702 | 703 | bool 704 | memory_nodes_check(char **newval, void **extra, GucSource source) 705 | { 706 | return cpuset_check(*newval, get_def_memory_nodes()); 707 | } 708 | 709 | void 710 | memory_nodes_assign(const char *newval, void *extra) 711 | { 712 | /* only the postmaster changes the kernel */ 713 | if (MyProcPid != PostmasterPid) 714 | return; 715 | 716 | cg_set_string(CONTROLLER_CPUSET, "cpuset.mems", (char *) newval); 717 | } 718 | -------------------------------------------------------------------------------- /pg_cgroups.h: -------------------------------------------------------------------------------- 1 | #define PG_CGROUPS_VERSION "pg_cgroups version 0.9.1devel" 2 | 3 | /* cgroup controllers we use */ 4 | #define MAX_CONTROLLERS 4 5 | 6 | #define CONTROLLER_MEMORY 0 7 | #define CONTROLLER_CPU 1 8 | #define CONTROLLER_BLKIO 2 9 | #define CONTROLLER_CPUSET 3 10 | 11 | /* defined in pg_cgroups.c */ 12 |
extern void _PG_init(void); 13 | 14 | /* defined in libcg1.c */ 15 | extern void cg_init(bool *cgroup_has_swap_param); 16 | extern char * const get_def_cpus(void); 17 | extern char * const get_def_memory_nodes(void); 18 | extern void cg_set_string(int controller, char * const parameter, char * const value); 19 | extern void cg_set_int64(int controller, char * const parameter, int64_t value); 20 | -------------------------------------------------------------------------------- /sql/test_blkio.sql: -------------------------------------------------------------------------------- 1 | /* 2 | * Unfortunately I cannot test everything because I cannot 3 | * rely on the existence of a certain block device on 4 | * every Linux system. 5 | */ 6 | 7 | -- try several incorrect settings that should fail 8 | ALTER SYSTEM SET pg_cgroups.read_bps_limit = '1024'; 9 | ALTER SYSTEM SET pg_cgroups.write_bps_limit = '8:0'; 10 | ALTER SYSTEM SET pg_cgroups.read_iops_limit = ':0 9210'; 11 | ALTER SYSTEM SET pg_cgroups.write_iops_limit = '100 9210'; 12 | ALTER SYSTEM SET pg_cgroups.read_bps_limit = '100: 9210'; 13 | ALTER SYSTEM SET pg_cgroups.write_iops_limit = '1:0 xyz'; 14 | -------------------------------------------------------------------------------- /sql/test_cpu.sql: -------------------------------------------------------------------------------- 1 | -- check the default settings 2 | SHOW pg_cgroups.cpu_share; 3 | 4 | -- this should fail 5 | ALTER SYSTEM SET pg_cgroups.cpu_share = 0; 6 | 7 | -- allow 50% of the available CPU 8 | ALTER SYSTEM SET pg_cgroups.cpu_share = 50000; 9 | SELECT pg_reload_conf(); 10 | SELECT pg_sleep_for('0.3'); 11 | SHOW pg_cgroups.cpu_share; 12 | 13 | -- reset 14 | ALTER SYSTEM RESET pg_cgroups.cpu_share; 15 | SELECT pg_reload_conf(); 16 | SELECT pg_sleep_for('0.3'); 17 | SHOW pg_cgroups.cpu_share; 18 | -------------------------------------------------------------------------------- /sql/test_cpuset.sql:
-------------------------------------------------------------------------------- 1 | -- check the default settings 2 | SHOW pg_cgroups.cpus; 3 | SHOW pg_cgroups.memory_nodes; 4 | 5 | -- test some incorrect settings 6 | ALTER SYSTEM SET pg_cgroups.cpus = '-1'; 7 | ALTER SYSTEM SET pg_cgroups.cpus = '0-0-0'; 8 | ALTER SYSTEM SET pg_cgroups.cpus = '0,1-0,1'; 9 | ALTER SYSTEM SET pg_cgroups.cpus = '10000'; 10 | ALTER SYSTEM SET pg_cgroups.cpus = '1000000'; 11 | ALTER SYSTEM SET pg_cgroups.cpus = ',1'; 12 | ALTER SYSTEM SET pg_cgroups.cpus = '0-1,'; 13 | 14 | -- set the available CPUs 15 | ALTER SYSTEM SET pg_cgroups.cpus = '0'; 16 | SELECT pg_reload_conf(); 17 | SELECT pg_sleep_for('0.3'); 18 | SHOW pg_cgroups.cpus; 19 | 20 | -- set the available memory nodes 21 | ALTER SYSTEM SET pg_cgroups.memory_nodes = '0'; 22 | SELECT pg_reload_conf(); 23 | SELECT pg_sleep_for('0.3'); 24 | SHOW pg_cgroups.memory_nodes; 25 | 26 | -- reset 27 | ALTER SYSTEM RESET pg_cgroups.cpus; 28 | ALTER SYSTEM RESET pg_cgroups.memory_nodes; 29 | SELECT pg_reload_conf(); 30 | SELECT pg_sleep_for('0.3'); 31 | SHOW pg_cgroups.cpus; 32 | SHOW pg_cgroups.memory_nodes; 33 | -------------------------------------------------------------------------------- /sql/test_memory.sql: -------------------------------------------------------------------------------- 1 | -- check the default settings 2 | SHOW pg_cgroups.memory_limit; 3 | SHOW pg_cgroups.swap_limit; 4 | SHOW pg_cgroups.oom_killer; 5 | 6 | -- change swap_limit (will set the parameter, leave kernel value unlimited) 7 | ALTER SYSTEM SET pg_cgroups.swap_limit = 512; 8 | SELECT pg_reload_conf(); 9 | SELECT pg_sleep_for('0.3'); 10 | SHOW pg_cgroups.swap_limit; 11 | 12 | -- change memory limit (should work) 13 | ALTER SYSTEM SET pg_cgroups.memory_limit = 1024; 14 | SELECT pg_reload_conf(); 15 | SELECT pg_sleep_for('0.3'); 16 | SHOW pg_cgroups.memory_limit; 17 | SHOW pg_cgroups.swap_limit; 18 | 19 | -- change swap_limit (should work) 20 | ALTER SYSTEM SET 
pg_cgroups.swap_limit = 0; 21 | SELECT pg_reload_conf(); 22 | SELECT pg_sleep_for('0.3'); 23 | SHOW pg_cgroups.swap_limit; 24 | 25 | -- lower memory limit (should work) 26 | ALTER SYSTEM SET pg_cgroups.memory_limit = 256; 27 | SELECT pg_reload_conf(); 28 | SELECT pg_sleep_for('0.3'); 29 | SHOW pg_cgroups.memory_limit; 30 | SHOW pg_cgroups.swap_limit; 31 | 32 | -- raise memory limit (should work) 33 | ALTER SYSTEM SET pg_cgroups.memory_limit = 512; 34 | SELECT pg_reload_conf(); 35 | SELECT pg_sleep_for('0.3'); 36 | SHOW pg_cgroups.memory_limit; 37 | SHOW pg_cgroups.swap_limit; 38 | 39 | -- set swap limit to -1 (should work) 40 | ALTER SYSTEM SET pg_cgroups.swap_limit = -1; 41 | SELECT pg_reload_conf(); 42 | SELECT pg_sleep_for('0.3'); 43 | SHOW pg_cgroups.swap_limit; 44 | 45 | -- set swap limit to 0 (should work) 46 | ALTER SYSTEM SET pg_cgroups.swap_limit = 0; 47 | SELECT pg_reload_conf(); 48 | SELECT pg_sleep_for('0.3'); 49 | SHOW pg_cgroups.swap_limit; 50 | 51 | -- set memory limit to 0 (should fail) 52 | ALTER SYSTEM SET pg_cgroups.memory_limit = 0; 53 | 54 | -- disable OOM killer (should work) 55 | ALTER SYSTEM SET pg_cgroups.oom_killer = off; 56 | SELECT pg_reload_conf(); 57 | SELECT pg_sleep_for('0.3'); 58 | SHOW pg_cgroups.oom_killer; 59 | 60 | -- reset all settings 61 | ALTER SYSTEM RESET pg_cgroups.memory_limit; 62 | ALTER SYSTEM RESET pg_cgroups.swap_limit; 63 | ALTER SYSTEM RESET pg_cgroups.oom_killer; 64 | SELECT pg_reload_conf(); 65 | SELECT pg_sleep_for('0.3'); 66 | SHOW pg_cgroups.memory_limit; 67 | SHOW pg_cgroups.swap_limit; 68 | SHOW pg_cgroups.oom_killer; 69 | --------------------------------------------------------------------------------