├── .github └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE.txt ├── MANIFEST.in ├── Makefile ├── README.md ├── agent ├── enable-ec2-spot-hibernation └── hibagent ├── etc ├── hibagent-config.cfg └── init.d │ └── hibagent ├── hibagent.spec ├── setup.py └── test ├── __init__.py └── hibagent_test.py /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | *.iml 3 | output 4 | dist 5 | *.egg-info 6 | build 7 | __pycache__ 8 | *.pyc 9 | .eggs 10 | hibagentc 11 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 
5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check [existing open](https://github.com/aws/ec2-hibernate-linux-agent/issues), or [recently closed](https://github.com/aws/ec2-hibernate-linux-agent/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 
38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/aws/ec2-hibernate-linux-agent/labels/help%20wanted) issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](https://github.com/aws/ec2-hibernate-linux-agent/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 
8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | 204 | 205 | ## Runtime Library Exception to the Apache 2.0 License: ## 206 | 207 | 208 | As an exception, if you use this Software to compile your source code and 209 | portions of this Software are embedded into the binary product as a result, 210 | you may redistribute such product without providing attribution as would 211 | otherwise be required by Sections 4(a), 4(b) and 4(d) of the License. 212 | 213 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.md LICENSE.txt 2 | include etc/hibagent-config.cfg 3 | include etc/init.d/hibagent 4 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | sources: 2 | python2.7 setup.py sdist --formats=gztar 3 | mv dist/*.gz . 
4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # The EC2 Spot hibernation agent (Legacy) 2 | 3 | > With the release of a new generation of Spot Hibernation, this repo has entered legacy status 4 | > 5 | > Please refer to the [hibinit-agent](https://github.com/aws/amazon-ec2-hibinit-agent) repo now used for Spot Hibernation 6 | > 7 | > Related Documentation: 8 | > * Instructions to enable Spot Hibernation | [Link](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/hibernate-spot-instances.html) 9 | 10 | ## License 11 | The code is released under Apache License Version 2.0. See LICENSE.txt for details. 12 | 13 | ## Description 14 | 15 | This agent does several things: 16 | 17 | 1. Upon startup it checks for sufficient swap space to allow hibernate and fails 18 | if it's present but there's not enough of it. 19 | 2. If there's no swap space, it creates it and launches a background thread to 20 | touch all of its blocks to make sure that EBS volumes are pre-warmed. 21 | 3. It updates the offset of the swap file in the kernel using SNAPSHOT_SET_SWAP_AREA ioctl. 22 | 4. It daemonizes and starts a polling thread to listen for instance termination notifications. 23 | 24 | ## Building 25 | The code can be built using the usual Python setuptools: 26 | 27 | ``` 28 | python setup.py install 29 | ``` 30 | 31 | Additionally, you can build an sRPM package for CentOS/RedHat by running "make sources". 32 | -------------------------------------------------------------------------------- /agent/enable-ec2-spot-hibernation: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Add this script in your user-data to turn on the hibernation support 3 | # for the EC2 Spot Instances. 
4 | 5 | # Make sure the agent is added to the auto-start 6 | /sbin/chkconfig hibagent on 7 | 8 | # And start it 9 | exec service hibagent start 10 | -------------------------------------------------------------------------------- /agent/hibagent: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # The EC2 Spot hibernation agent. This agent does several things: 3 | # 1. Upon startup it checks for sufficient swap space to allow hibernate and fails 4 | # if it's present but there's not enough of it. 5 | # 2. If there's no swap space, it creates it and launches a background thread to 6 | # touch all of its blocks to make sure that EBS volumes are pre-warmed. 7 | # 3. It updates the offset of the swap file in the kernel using SNAPSHOT_SET_SWAP_AREA ioctl. 8 | # 4. It daemonizes and starts a polling thread to listen for instance termination notifications. 9 | # 10 | # This file is compatible both with Python 2 and Python 3 11 | import argparse 12 | import array 13 | import atexit 14 | import ctypes as ctypes 15 | import fcntl 16 | import mmap 17 | import os 18 | import struct 19 | import sys 20 | import syslog 21 | import requests 22 | from subprocess import check_call, check_output 23 | from threading import Thread 24 | from math import ceil 25 | from time import sleep 26 | 27 | try: 28 | from urllib.request import urlopen, Request 29 | except ImportError: 30 | from urllib2 import urlopen, Request, HTTPError 31 | 32 | try: 33 | from ConfigParser import ConfigParser, NoSectionError, NoOptionError 34 | except: 35 | from configparser import ConfigParser, NoSectionError, NoOptionError 36 | 37 | GRUB_FILE = '/boot/grub/menu.lst' 38 | GRUB2_DIR = '/etc/default/grub.d' 39 | SWAP_RESERVED_SIZE = 16384 40 | log_to_syslog = True 41 | log_to_stderr = True 42 | 43 | IMDS_BASEURL = 'http://169.254.169.254' 44 | IMDS_API_TOKEN_PATH = 'latest/api/token' 45 | IMDS_SPOT_ACTION_PATH = 'latest/meta-data/hibernation/configured' 46 | 
47 | def log(message): 48 | if log_to_syslog: 49 | syslog.syslog(message) 50 | if log_to_stderr: 51 | sys.stderr.write("%s\n" % message) 52 | 53 | 54 | def fallocate(fl, size): 55 | try: 56 | _libc = ctypes.CDLL('libc.so.6') 57 | _fallocate = _libc.fallocate 58 | _fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_ulong, ctypes.c_ulong] 59 | 60 | # (FD, mode, offset, len) 61 | res = _fallocate(fl.fileno(), 0, 0, size) 62 | if res != 0: 63 | raise Exception("Failed to perform fallocate(). Result: %d" % res) 64 | except Exception as e: 65 | log("Failed to call fallocate(), will use resize. Err: %s" % str(e)) 66 | fl.seek(size-1) 67 | fl.write(chr(0)) 68 | 69 | 70 | def mlockall(): 71 | log("Locking all the code in memory") 72 | try: 73 | _libc = ctypes.CDLL('libc.so.6') 74 | _mlockall = _libc.mlockall 75 | _mlockall.argtypes = [ctypes.c_int] 76 | _MCL_CURRENT = 1 77 | _MCL_FUTURE = 2 78 | _mlockall(_MCL_CURRENT | _MCL_FUTURE) 79 | except Exception as e: 80 | log("Failed to lock hibernation agent into RAM. Error: %s" % str(e)) 81 | 82 | 83 | def get_file_block_number(filename): 84 | with open(filename, 'r') as handle: 85 | buf = array.array('L', [0]) 86 | # from linux/fs.h 87 | FIBMAP = 0x01 88 | result = fcntl.ioctl(handle.fileno(), FIBMAP, buf) 89 | if result < 0: 90 | raise Exception("Failed to get the file offset. 
Error=%d" % result) 91 | return buf[0] 92 | 93 | 94 | def get_swap_space(): 95 | # Format is (tab-separated): 96 | # Filename Type Size Used Priority 97 | # / swapfile file 536870908 0 - 1 98 | with open('/proc/swaps') as swp: 99 | lines = swp.readlines()[1:] 100 | if not lines: 101 | return 0 102 | return int(lines[0].split()[2]) * 1024 103 | 104 | 105 | def patch_grub_config(swap_device, offset, grub_file, grub2_dir): 106 | log("Updating GRUB to use the device %s with offset %d for resume" 107 | % (swap_device, offset)) 108 | if grub_file and os.path.exists(grub_file): 109 | lines = [] 110 | with open(grub_file) as fl: 111 | for ln in fl.readlines(): 112 | params = ln.split() 113 | if not params or params[0] != 'kernel': 114 | lines.append(ln) 115 | continue 116 | 117 | new_params = [] 118 | for param in params: 119 | if "resume_offset=" in param or "resume=" in param: 120 | continue 121 | new_params.append(param) 122 | 123 | if "no_console_suspend=" not in ln: 124 | new_params.append("no_console_suspend=1") 125 | new_params.append("resume_offset=%d" % offset) 126 | new_params.append("resume=%s" % swap_device) 127 | 128 | lines.append(" ".join(new_params)+"\n") 129 | 130 | with open(grub_file, "w") as fl: 131 | fl.write("".join(lines)) 132 | 133 | # Do GRUB2 update as well 134 | if grub2_dir and os.path.exists(grub2_dir): 135 | offset_file = os.path.join(grub2_dir, '99-set-swap.cfg') 136 | with open(offset_file, 'w') as fl: 137 | fl.write('GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT no_console_suspend=1 ' 138 | 'resume_offset=%d resume=%s"\n' % (offset, swap_device)) 139 | check_call('/usr/sbin/update-grub2') 140 | 141 | log("GRUB configuration is updated") 142 | 143 | 144 | def update_kernel_swap_offset(grub_update): 145 | with open('/proc/swaps') as swp: 146 | lines = swp.readlines()[1:] 147 | if not lines: 148 | raise Exception("Swap file is not found") 149 | filename = lines[0].split()[0] 150 | log("Updating the kernel offset for the swapfile: %s" 
% filename) 151 | 152 | statbuf = os.stat(filename) 153 | dev = statbuf.st_dev 154 | offset = get_file_block_number(filename) 155 | 156 | if grub_update: 157 | # Find the mount point for the swap file ('df -P /swap') 158 | df_out = check_output(['df', '-P', filename]).decode('ascii') 159 | dev_str = df_out.split("\n")[1].split()[0] 160 | patch_grub_config(dev_str, offset, GRUB_FILE, GRUB2_DIR) 161 | else: 162 | log("Skipping GRUB configuration update") 163 | 164 | log("Setting swap device to %d with offset %d" % (dev, offset)) 165 | 166 | # Set the kernel swap offset, see https://www.kernel.org/doc/Documentation/power/userland-swsusp.txt 167 | # From linux/suspend_ioctls.h 168 | SNAPSHOT_SET_SWAP_AREA = 0x400C330D 169 | buf = struct.pack('LI', offset, dev) 170 | with open('/dev/snapshot', 'r') as snap: 171 | fcntl.ioctl(snap, SNAPSHOT_SET_SWAP_AREA, buf) 172 | log("Done updating the swap offset") 173 | 174 | 175 | class SwapInitializer(object): 176 | def __init__(self, filename, swap_size, touch_swap, mkswap, swapon): 177 | self.filename = filename 178 | self.swap_size = swap_size 179 | self.need_to_hurry = False 180 | self.mkswap = mkswap 181 | self.swapon = swapon 182 | self.touch_swap = touch_swap 183 | 184 | def do_allocate(self): 185 | log("Allocating %d bytes in %s" % (self.swap_size, self.filename)) 186 | with open(self.filename, 'w+') as fl: 187 | fallocate(fl, self.swap_size) 188 | os.chmod(self.filename, 0o600) 189 | 190 | def init_swap(self): 191 | """ 192 | Initialize the swap using direct IO to avoid polluting the page cache 193 | """ 194 | try: 195 | cur_swap_size = os.stat(self.filename).st_size 196 | if cur_swap_size >= self.swap_size: 197 | log("Swap file size (%d bytes) is already large enough" % cur_swap_size) 198 | return 199 | except OSError: 200 | pass 201 | self.do_allocate() 202 | 203 | if not self.touch_swap: 204 | log("Swap pre-heating is skipped, the swap blocks won't be touched during " 205 | "initialization to ensure they are ready") 
206 | return 207 | 208 | written = 0 209 | log("Opening %s for direct IO" % self.filename) 210 | fd = os.open(self.filename, os.O_RDWR | os.O_DIRECT | os.O_SYNC | os.O_DSYNC) 211 | if fd < 0: 212 | raise Exception("Failed to initialize the swap. os.open() returned %d" % fd) 213 | 214 | filler_block = None 215 | try: 216 | # Create a filler block that is correctly aligned for direct IO 217 | filler_block = mmap.mmap(-1, 1024 * 1024) 218 | # We're using 'b' to avoid optimizations that might happen for zero-filled pages 219 | filler_block.write(b'b' * 1024 * 1024) 220 | 221 | log("Touching all blocks in %s" % self.filename) 222 | 223 | while written < self.swap_size and not self.need_to_hurry: 224 | res = os.write(fd, filler_block) 225 | if res <= 0: 226 | raise Exception("Failed to touch a block. os.write() returned %d" % res) 227 | written += res 228 | finally: 229 | os.close(fd) 230 | if filler_block: 231 | filler_block.close() 232 | log("Swap file %s is ready" % self.filename) 233 | 234 | def turn_on_swap(self): 235 | # Do mkswap 236 | try: 237 | mkswap = self.mkswap.format(swapfile=self.filename) 238 | log("Running: %s" % mkswap) 239 | check_call(mkswap, shell=True) 240 | swapon = self.swapon.format(swapfile=self.filename) 241 | log("Running: %s" % swapon) 242 | check_call(swapon, shell=True) 243 | except Exception as e: 244 | log("Failed to initialize swap, reason: %s" % str(e)) 245 | 246 | 247 | class BackgroundInitializerRunner(object): 248 | def __init__(self, swapper, update_grub): 249 | self.swapper = swapper 250 | self.thread = None 251 | self.error = None 252 | self.update_grub = update_grub 253 | 254 | def start_init(self): 255 | self.thread = Thread(target=self.do_async_init, name="SwapInitializer") 256 | self.thread.daemon = True 257 | self.thread.start() 258 | 259 | def check_finished(self): 260 | if self.thread is not None: 261 | self.thread.join(timeout=0) 262 | if self.thread.is_alive(): 263 | return False 264 | self.thread = None 265 | 
266 | log("Background swap initialization thread is complete.") 267 | if self.error is not None: 268 | raise self.error 269 | return True 270 | 271 | def force_completion(self): 272 | log("We're out of time, stopping the background swap initialization.") 273 | self.swapper.need_to_hurry = True 274 | self.thread.join() 275 | log("Background swap initialization thread has stopped.") 276 | self.thread = None 277 | if self.error is not None: 278 | raise self.error 279 | 280 | def do_async_init(self): 281 | try: 282 | self.swapper.init_swap() 283 | self.swapper.turn_on_swap() 284 | update_kernel_swap_offset(self.update_grub) 285 | except Exception as ex: 286 | log("Failed to initialize swap, reason: %s" % str(ex)) 287 | self.error = ex 288 | 289 | 290 | class ItnPoller(object): 291 | def __init__(self, url, hibernate_cmd, initializer): 292 | self.url = url 293 | self.hibernate_cmd = hibernate_cmd 294 | self.initializer = initializer 295 | 296 | def poll_loop(self): 297 | log("Starting the hibernation polling loop") 298 | while True: 299 | self.run_loop_iteration() 300 | sleep(1) 301 | 302 | def run_loop_iteration(self): 303 | if self.initializer and self.initializer.check_finished(): 304 | self.initializer = None 305 | if self.poll_for_termination(): 306 | if self.initializer: 307 | self.initializer.force_completion() 308 | self.initializer = None 309 | self.do_hibernate() 310 | 311 | def poll_for_termination(self): 312 | # noinspection PyBroadException 313 | response1 = None 314 | response2 = None 315 | try: 316 | request1 = Request("http://169.254.169.254/latest/api/token") 317 | request1.add_header('X-aws-ec2-metadata-token-ttl-seconds', '21600') 318 | request1.get_method = lambda:"PUT" 319 | response1 = urlopen(request1) 320 | 321 | token = response1.read() 322 | 323 | request2 = Request(self.url) 324 | request2.add_header('X-aws-ec2-metadata-token', token) 325 | response2 = urlopen(request2) 326 | res = response2.read() 327 | return b"hibernate" in res 328 | 
except: 329 | return False 330 | finally: 331 | if response1: 332 | response1.close() 333 | if response2: 334 | response2.close() 335 | 336 | def do_hibernate(self): 337 | log("Attempting to hibernate") 338 | try: 339 | check_call(self.hibernate_cmd, shell=True) 340 | except Exception as e: 341 | log("Failed to hibernate, reason: %s" % str(e)) 342 | # We're not guaranteed to be stopped immediately after the hibernate 343 | # command fires. So wait a little bit to avoid triggering ourselves twice 344 | sleep(2) 345 | 346 | 347 | def daemonize(pidfile): 348 | """ 349 | Convert the process into a daemon, doing the usual Unix magic 350 | """ 351 | try: 352 | pid = os.fork() 353 | if pid > 0: 354 | # Exit from first parent 355 | sys.exit(0) 356 | except OSError as e: 357 | log("Fork #1 failed: %d (%s)\n" % (e.errno, e.strerror)) 358 | sys.exit(1) 359 | 360 | # Decouple from parent environment 361 | os.chdir("/") 362 | os.setsid() 363 | os.umask(0) 364 | 365 | # Second fork 366 | try: 367 | pid = os.fork() 368 | if pid > 0: 369 | # Exit from second parent 370 | sys.exit(0) 371 | except OSError as e: 372 | log("Fork #2 failed: %d (%s)\n" % (e.errno, e.strerror)) 373 | sys.exit(1) 374 | 375 | # Write the PID file 376 | pid = str(os.getpid()) 377 | with open(pidfile, "w+") as fl: 378 | fl.write("%s\n" % pid) 379 | atexit.register(lambda: os.unlink(pidfile)) 380 | 381 | # Redirect standard file descriptors to null to avoid blocking 382 | nul = open('/dev/null', 'a+') 383 | os.dup2(nul.fileno(), sys.stdin.fileno()) 384 | os.dup2(nul.fileno(), sys.stdout.fileno()) 385 | os.dup2(nul.fileno(), sys.stderr.fileno()) 386 | 387 | 388 | class Config(object): 389 | def __init__(self, config, args): 390 | def get(section, name): 391 | try: 392 | return config.get(section, name) 393 | except NoSectionError: 394 | return None 395 | except NoOptionError: 396 | return None 397 | 398 | def get_int(section, name): 399 | v = get(section, name) 400 | if v is None: 401 | return None 402 | 
return int(v) 403 | 404 | self.lock_in_ram = self.merge( 405 | self.to_bool(get('core', 'lock-in-ram')), self.to_bool(args.lock_in_ram), True) 406 | self.log_to_syslog = self.merge( 407 | self.to_bool(get('core', 'log-to-syslog')), self.to_bool(args.log_to_syslog), True) 408 | self.log_to_stderr = self.merge( 409 | self.to_bool(get('core', 'log-to-stderr')), self.to_bool(args.log_to_stderr), True) 410 | 411 | self.touch_swap = self.merge( 412 | self.to_bool(get('core', 'touch-swap')), self.to_bool(args.touch_swap), True) 413 | self.grub_update = self.merge( 414 | self.to_bool(get('core', 'grub-update')), self.to_bool(args.grub_update), True) 415 | 416 | self.ephemeral_check = self.merge( 417 | self.to_bool(get('core', 'check-ephemeral-volumes')), 418 | self.to_bool(args.check_ephemeral_volumes), True) 419 | self.freeze_timeout_curve = self.merge(get('core', 'freeze-timeout-curve'), args.freeze_timeout_curve, 420 | '0-8:20,8-16:40,16-64:60,64-128:150,128-256:200,256-:400') 421 | 422 | self.swap_percentage = self.merge( 423 | get_int('swap', 'percentage-of-ram'), args.swap_ram_percentage, 100) 424 | self.swap_mb = self.merge( 425 | get_int('swap', 'target-size-mb'), args.swap_target_size_mb, 4000) 426 | 427 | self.mkswap = self.merge(get('swap', 'mkswap'), args.mkswap, '/sbin/mkswap {swapfile}') 428 | self.swapon = self.merge(get('swap', 'swapon'), args.swapon, '/sbin/swapon {swapfile}') 429 | self.swapfile = self.merge(get('swap', 'swapfile'), args.swapfile, '/swap') 430 | 431 | self.hibernate = self.merge( 432 | get('pm-utils', 'hibernate-command'), args.hibernate, '/usr/sbin/pm-hibernate') 433 | self.url = self.merge( 434 | get('notification', 'monitored-url'), args.monitored_url, 435 | 'http://169.254.169.254/latest/meta-data/spot/instance-action') 436 | 437 | def merge(self, cf_value, arg_value, def_val): 438 | if arg_value is not None: 439 | return arg_value 440 | if cf_value is not None: 441 | return cf_value 442 | return def_val 443 | 444 | def to_bool(self, 
bool_str): 445 | """Parse the string and return the boolean value encoded or raise an exception""" 446 | if bool_str is None: 447 | return None 448 | if bool_str.lower() in ['true', 't', '1']: 449 | return True 450 | elif bool_str.lower() in ['false', 'f', '0']: 451 | return False 452 | # If we get here, the value could not be parsed 453 | raise ValueError("%s is not recognized as a boolean value" % bool_str) 454 | 455 | def __str__(self): 456 | return str(self.__dict__) 457 | 458 | 459 | def get_pm_freeze_timeout(freeze_timeout_curve, ram_bytes): 460 | if not freeze_timeout_curve: 461 | return None 462 | 463 | ram_gb = ceil(ram_bytes / (1024.0*1024.0*1024.0)) 464 | try: 465 | for curve_part in freeze_timeout_curve.split(","): 466 | ram_sizes, timeout = curve_part.split(":") 467 | sizes_parts = ram_sizes.split("-") 468 | if len(sizes_parts) == 2 and sizes_parts[1] and sizes_parts[0]: 469 | ram_min = int(sizes_parts[0]) 470 | ram_max = int(sizes_parts[1]) 471 | elif len(sizes_parts) == 1 and sizes_parts[0] or \ 472 | len(sizes_parts) == 2 and not sizes_parts[1]: 473 | ram_min = int(sizes_parts[0]) 474 | ram_max = None 475 | else: 476 | raise Exception("can't parse %s, expected <min>-[<max>]" % ram_sizes) 477 | if (ram_min <= ram_gb and ram_max is None) or (ram_min <= ram_gb < ram_max): 478 | return int(timeout) 479 | except Exception as ex: 480 | log("Failed to parse the freeze timeout curve, error: %s. " 481 | "The pm_freeze_timeout will not be adjusted." % str(ex)) 482 | return None 483 | 484 | log("Failed to find a fitting PM freeze timeout curve segment " 485 | "for %d GB of RAM. The pm_freeze_timeout will not be adjusted."
% ram_gb) 486 | return None 487 | 488 | 489 | def adjust_pm_timeout(timeout): 490 | try: 491 | with open('/sys/power/pm_freeze_timeout') as fl: 492 | cur_timeout = int(fl.read()) / 1000 493 | if cur_timeout >= timeout: 494 | log("Info current pm_freeze_timeout (%d seconds) is greater than or equal " 495 | "to the requested (%d seconds) timeout, doing nothing" % (cur_timeout, timeout)) 496 | else: 497 | with open('/sys/power/pm_freeze_timeout', 'w') as fl: 498 | fl.write("%d" % (timeout*1000)) 499 | log("Adjusted pm_freeze_timeout to %d from %d" % (timeout, cur_timeout)) 500 | except Exception as e: 501 | log("Failed to adjust pm_freeze_timeout to %d. Error: %s" % (timeout, str(e))) 502 | exit(1) 503 | 504 | def get_imds_token(seconds=21600): 505 | """ Get a token to access instance metadata. """ 506 | log("Requesting new IMDSv2 token.") 507 | request_header = {'X-aws-ec2-metadata-token-ttl-seconds': '{}'.format(seconds)} 508 | token_url = '{}/{}'.format(IMDS_BASEURL, IMDS_API_TOKEN_PATH) 509 | response = requests.put(token_url, headers=request_header) 510 | response.close() 511 | if response.status_code != requests.codes.ok: 512 | return None 513 | 514 | return response.text 515 | 516 | def hibernation_enabled(): 517 | """Returns a boolean indicating whether hibernation-option.configured is enabled or not.""" 518 | 519 | imds_token = get_imds_token() 520 | if imds_token is None: 521 | log("IMDS_V2 http endpoint is disabled") 522 | # IMDS http endpoint is disabled 523 | return False 524 | 525 | request_header = {'X-aws-ec2-metadata-token': imds_token} 526 | response = requests.get("{}/{}".format(IMDS_BASEURL, IMDS_SPOT_ACTION_PATH), 527 | headers=request_header) 528 | response.close() 529 | if response.status_code != requests.codes.ok or response.text.lower() == "false": 530 | return False 531 | 532 | log("Hibernation Configured Flag found") 533 | 534 | return True 535 | 536 | def main(): 537 | # Parse arguments 538 | parser = argparse.ArgumentParser(description="An 
EC2 agent that watches for instance stop " 539 | "notifications and initiates hibernation") 540 | parser.add_argument('-c', '--config', help='Configuration file to use', type=str) 541 | parser.add_argument('-i', '--pidfile', help='The file to write PID to', type=str, 542 | default='/run/hibagent') 543 | parser.add_argument('-f', '--foreground', help="Run in foreground, don't daemonize", action="store_true") 544 | 545 | parser.add_argument("-l", "--lock-in-ram", help='Lock the code in RAM', type=str) 546 | parser.add_argument("-syslog", "--log-to-syslog", help='Log to syslog', type=str) 547 | parser.add_argument("-stderr", "--log-to-stderr", help='Log to stderr', type=str) 548 | parser.add_argument("-touch", "--touch-swap", help='Do swap initialization', type=str) 549 | parser.add_argument("-grub", "--grub-update", help='Update GRUB config with resume offset', type=str) 550 | parser.add_argument("-e", "--check-ephemeral-volumes", help='Check if ephemeral volumes are mounted', type=str) 551 | parser.add_argument("-u", "--freeze-timeout-curve", help='The pm_freeze_timeout curve (by RAM size)', type=str) 552 | 553 | parser.add_argument("-p", "--swap-ram-percentage", help='The target swap size as a percentage of RAM', type=int) 554 | parser.add_argument("-s", "--swap-target-size-mb", help='The target swap size in megabytes', type=int) 555 | parser.add_argument("-w", "--swapfile", help="Swap file name", type=str) 556 | parser.add_argument('--mkswap', help='The command line utility to set up swap', type=str) 557 | parser.add_argument('--swapon', help='The command line utility to turn on swap', type=str) 558 | parser.add_argument('--hibernate', help='The command line utility to initiate hibernation', type=str) 559 | parser.add_argument('--monitored-url', help='The URL to monitor for notifications', type=str) 560 | 561 | args = parser.parse_args() 562 | 563 | config_file = ConfigParser() 564 | if args.config: 565 | config_file.read(args.config) 566 | 567 | config = 
Config(config_file, args) 568 | global log_to_syslog, log_to_stderr 569 | log_to_stderr = config.log_to_stderr 570 | log_to_syslog = config.log_to_syslog 571 | 572 | log("Effective config: %s" % config) 573 | 574 | # Let's first check if we need to kill the Spot Hibernate Agent 575 | if hibernation_enabled(): 576 | log("Spot Instance Launch has enabled Hibernation Configured Flag. hibagent exiting!!") 577 | exit(0) 578 | 579 | target_swap_size = config.swap_mb * 1024 * 1024 580 | ram_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') 581 | swap_percentage_size = ram_bytes * config.swap_percentage // 100 582 | if swap_percentage_size > target_swap_size: 583 | target_swap_size = int(swap_percentage_size) 584 | log("Will check if swap is at least: %d megabytes" % (target_swap_size // (1024*1024))) 585 | 586 | timeout = get_pm_freeze_timeout(config.freeze_timeout_curve, ram_bytes) 587 | if timeout: 588 | adjust_pm_timeout(timeout) 589 | 590 | # Validate the swap configuration 591 | cur_swap = get_swap_space() 592 | bi = None 593 | if cur_swap >= target_swap_size - SWAP_RESERVED_SIZE: 594 | log("There's sufficient swap available (have %d, need %d)" % 595 | (cur_swap, target_swap_size)) 596 | update_kernel_swap_offset(config.grub_update) 597 | elif cur_swap > 0: 598 | log("There's not enough swap space available (have %d, need %d), exiting" % 599 | (cur_swap, target_swap_size)) 600 | exit(1) 601 | else: 602 | log("No swap is present, will create and initialize it") 603 | # We need to create swap, but first validate that we have enough free space 604 | swap_dev = os.path.dirname(config.swapfile) 605 | st = os.statvfs(swap_dev) 606 | free_bytes = st.f_bavail * st.f_frsize 607 | free_space_needed = target_swap_size + 10 * 1024 * 1024 608 | if free_space_needed >= free_bytes: 609 | log("There's not enough space (%d present, %d needed) on the swap device: %s" % ( 610 | free_bytes, free_space_needed, swap_dev)) 611 | exit(1) 612 | log("There's enough space (%d 
present, %d needed) on the swap device: %s" % ( 613 | free_bytes, free_space_needed, swap_dev)) 614 | 615 | sw = SwapInitializer(config.swapfile, target_swap_size, config.touch_swap, 616 | config.mkswap, config.swapon) 617 | bi = BackgroundInitializerRunner(sw, config.grub_update) 618 | 619 | # Daemonize now! The parent process will not return from this method 620 | if not args.foreground: 621 | log("Initial checks are finished, daemonizing and writing PID into %s" % args.pidfile) 622 | daemonize(args.pidfile) 623 | else: 624 | log("Initial checks are finished, will run in foreground now") 625 | 626 | poller = ItnPoller(config.url, config.hibernate, bi) 627 | if config.lock_in_ram: 628 | mlockall() 629 | 630 | # This loop will now be running inside the child 631 | if bi: 632 | bi.start_init() 633 | poller.poll_loop() 634 | 635 | 636 | if __name__ == '__main__': 637 | main() 638 | -------------------------------------------------------------------------------- /etc/hibagent-config.cfg: -------------------------------------------------------------------------------- 1 | [core] 2 | # Whether to lock the code in RAM to avoid being swapped out 3 | lock-in-ram = True 4 | log-to-syslog = True 5 | log-to-stderr = True 6 | 7 | # Automatically update GRUB config to include resume_offset for the swap file 8 | grub-update = True 9 | # Touch swap file to make sure EBS volume is pre-heated 10 | touch-swap = False 11 | 12 | # If set to true, then during the startup we check for presence 13 | # of ephemeral volumes in the instance metadata and fail 14 | # if any of them is mounted 15 | check-ephemeral-volumes = True 16 | 17 | # The pm_freeze timeout curve. Format is: <min>-[<max>]:seconds 18 | # For example, 0-8:20 means that for RAM sizes from 0 to 8 GB we'll set the 19 | # /sys/power/pm_freeze_timeout value to 20 seconds.
20 | # Leave empty if you want to disable pm_freeze adjustment 21 | freeze-timeout-curve=0-8:20,8-16:40,16-64:60,64-128:150,128-256:200,256-:400 22 | 23 | [swap] 24 | # If there's no swap then we create it to be equal to the specified 25 | # percentage of RAM or to the target size, whichever is greater 26 | percentage-of-ram = 100 27 | target-size-mb = 4000 28 | 29 | # The command used to initialize the swap file 30 | mkswap = /sbin/mkswap {swapfile} 31 | # The command used to turn on the swap 32 | swapon = /sbin/swapon {swapfile} 33 | swapfile = /swap 34 | 35 | [pm-utils] 36 | # The command used to initiate the hibernation 37 | hibernate-command = /usr/sbin/pm-hibernate 38 | 39 | [notification] 40 | # The URL to monitor for the notification 41 | monitored-url = http://169.254.169.254/latest/meta-data/spot/instance-action 42 | -------------------------------------------------------------------------------- /etc/init.d/hibagent: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # hibagent daemon 3 | # chkconfig: - 20 80 4 | # description: EC2 instance hibernation agent 5 | # processname: hibagent 6 | 7 | DAEMON_PATH="/usr/bin" 8 | 9 | NAME=hibagent 10 | DESC="EC2 instance hibernation agent" 11 | PIDFILE=/var/run/$NAME.pid 12 | SCRIPTNAME=/etc/init.d/hibagent 13 | 14 | DAEMON=hibagent 15 | DAEMONOPTS="--config /etc/hibagent-config.cfg --pidfile $PIDFILE" 16 | 17 | if [[ $# -eq 0 ]]; then 18 | echo "Usage: $0 {status|start|stop|restart}" 19 | exit 1 20 | else 21 | COMMAND="$1" 22 | fi 23 | 24 | case $COMMAND in 25 | start) 26 | printf "%-50s" "Starting $NAME..." 27 | if [[ -f $PIDFILE ]]; then 28 | PID=`cat $PIDFILE` 29 | if [[ "`ps axf | grep ${PID} | grep -v grep`" ]]; then 30 | echo "hibagent is already running" 31 | exit 0 32 | fi 33 | fi 34 | 35 | cd $DAEMON_PATH 36 | $DAEMON $DAEMONOPTS > /dev/null 2>&1 37 | if [ $?
-eq 0 ]; then 38 | printf "%s\n" "Ok" 39 | else 40 | printf "%s\n" "Fail" 41 | exit 1 42 | fi 43 | ;; 44 | status) 45 | printf "%-50s" "Checking $NAME..." 46 | if [[ -f $PIDFILE ]]; then 47 | PID=`cat $PIDFILE` 48 | if [[ -z "`ps axf | grep ${PID} | grep -v grep`" ]]; then 49 | printf "%s\n" "Process dead but pidfile exists" 50 | else 51 | echo "Running" 52 | fi 53 | else 54 | printf "%s\n" "Service not running" 55 | fi 56 | ;; 57 | stop) 58 | printf "%-50s" "Stopping $NAME" 59 | if [[ -f $PIDFILE ]]; then 60 | PID=`cat $PIDFILE` 61 | kill -HUP $PID 62 | printf "%s\n" "Ok" 63 | rm -f $PIDFILE 64 | exit 0 65 | else 66 | printf "%s\n" "already stopped" 67 | exit 0 68 | fi 69 | ;; 70 | 71 | restart) 72 | $0 stop 73 | $0 start 74 | ;; 75 | 76 | *) 77 | echo "Usage: $0 {status|start|stop|restart}" 78 | exit 1 79 | esac 80 | -------------------------------------------------------------------------------- /hibagent.spec: -------------------------------------------------------------------------------- 1 | Name: hibagent 2 | Version: 1.1.0 3 | Release: 6%{?dist} 4 | Summary: Hibernation trigger utility for AWS EC2 5 | 6 | Group: Development/Languages 7 | License: ASL 2.0 8 | URL: https://github.com/awslabs/hibagent 9 | Source0: hibagent-%{version}.tar.gz 10 | BuildArch: noarch 11 | BuildRequires: system-python system-python-devel 12 | Requires(preun): initscripts 13 | Requires(postun): initscripts 14 | Requires(post): initscripts 15 | Requires: /sbin/service 16 | Requires: /sbin/chkconfig 17 | 18 | %description 19 | An EC2 agent that watches for instance stop notifications and initiates hibernation 20 | 21 | %prep 22 | %setup -q -n hibagent-%{version} 23 | 24 | %build 25 | %{__sys_python} setup.py build 26 | 27 | %install 28 | %{__sys_python} setup.py install --prefix=usr -O1 --skip-build --root $RPM_BUILD_ROOT 29 | 30 | %files 31 | %defattr(-,root,root) 32 | %doc LICENSE.txt README.md 33 | %{_sysconfdir}/hibagent-config.cfg 34 | %{_sysconfdir}/init.d/hibagent 35 | 
%{_bindir}/hibagent 36 | %{_bindir}/enable-ec2-spot-hibernation 37 | %{sys_python_sitelib}/* 38 | 39 | %clean 40 | rm -rf $RPM_BUILD_ROOT 41 | 42 | %post 43 | if [ $1 = 1 ]; then 44 | #initial installation 45 | /sbin/chkconfig --add hibagent 46 | fi 47 | 48 | %preun 49 | if [ $1 = 0 ]; then 50 | # Package removal, not upgrade 51 | /sbin/service hibagent stop >/dev/null 2>&1 52 | /sbin/chkconfig --del hibagent 53 | fi 54 | 55 | %postun 56 | if [ $1 -ge 1 ]; then 57 | /sbin/service hibagent condrestart >/dev/null 2>&1 || : 58 | fi 59 | 60 | %changelog 61 | * Mon Sep 4 2017 Aleksei Besogonov - 1.0.0-1 62 | - Initial build 63 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | try: 4 | from setuptools import setup 5 | except ImportError: 6 | from distutils.core import setup 7 | 8 | hib_classifiers = [ 9 | "License :: OSI Approved :: Apache Software License", 10 | "Topic :: Utilities", 11 | ] 12 | 13 | with open("README.md", "r") as fp: 14 | hib_long_description = fp.read() 15 | 16 | setup(name="hibagent", 17 | version='1.1.0', 18 | author="Aleksei Besogonov", 19 | author_email="cyberax@amazon.com", 20 | url="https://github.com/awslabs/hibagent", 21 | tests_require=["pytest"], 22 | scripts=['agent/hibagent', 'agent/enable-ec2-spot-hibernation'], 23 | data_files=[('/etc', ['etc/hibagent-config.cfg']), 24 | ('/etc/init.d', ['etc/init.d/hibagent'])], 25 | test_suite='test', 26 | description="Hibernation Trigger for EC2 Spot Instances", 27 | long_description=hib_long_description, 28 | license="Apache 2.0", 29 | classifiers=hib_classifiers 30 | ) 31 | -------------------------------------------------------------------------------- /test/__init__.py: -------------------------------------------------------------------------------- 1 | # Placeholder 2 | -------------------------------------------------------------------------------- 
/test/hibagent_test.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2.7 2 | import platform 3 | import unittest 4 | import os 5 | from os import unlink, stat 6 | from random import random 7 | from tempfile import mktemp 8 | from threading import Thread 9 | 10 | import imp 11 | agent_file = os.path.dirname(os.path.abspath(__file__))+"/../agent/hibagent" 12 | with open(agent_file) as fl: 13 | hibagent = imp.load_module('hibagent', fl, agent_file, ('py', 'rb', 1)) 14 | 15 | from time import sleep 16 | 17 | try: 18 | from http.server import BaseHTTPRequestHandler, HTTPServer 19 | except ImportError: 20 | import BaseHTTPServer 21 | BaseHTTPRequestHandler = BaseHTTPServer.BaseHTTPRequestHandler 22 | HTTPServer = BaseHTTPServer.HTTPServer 23 | 24 | # Patch out a function that requires actual root privileges 25 | hibagent.update_kernel_swap_offset = lambda _: 0 26 | 27 | if platform.system() == "Darwin": 28 | os.O_DIRECT = 0 # Just pretend that it's there 29 | 30 | 31 | class TestGrubPatching(unittest.TestCase): 32 | doc = """ 33 | # created by imagebuilder 34 | default=0 35 | timeout=0 36 | hiddenmenu 37 | 38 | title Amazon Linux 2017.09 (4.9.51-10.53.amzn1.x86_64) 39 | root (hd0,0) 40 | kernel /boot/vmlinuz-4.9.51-10.53.amzn1.x86_64 root=LABEL=/ console=tty1 console=ttyS0 selinux=0 LANG=en_US.UTF-8 KEYTABLE=us 41 | initrd /boot/initramfs-4.9.51-10.53.amzn1.x86_64.img 42 | 43 | title Amazon Linux 2017.09 (4.9.51-10.52.amzn1.x86_64) 44 | root (hd0,0) 45 | kernel /boot/vmlinuz-4.9.51-10.52.amzn1.x86_64 root=LABEL=/ resume_offset=456 console=tty1 console=ttyS0 selinux=0 resume=/dev/nope 46 | initrd /boot/initramfs-4.9.51-10.52.amzn1.x86_64.img 47 | """ 48 | good_doc = """ 49 | # created by imagebuilder 50 | default=0 51 | timeout=0 52 | hiddenmenu 53 | 54 | title Amazon Linux 2017.09 (4.9.51-10.53.amzn1.x86_64) 55 | root (hd0,0) 56 | kernel /boot/vmlinuz-4.9.51-10.53.amzn1.x86_64 root=LABEL=/ console=tty1 console=ttyS0 selinux=0
LANG=en_US.UTF-8 KEYTABLE=us no_console_suspend=1 resume_offset=123 resume=/dev/help 57 | initrd /boot/initramfs-4.9.51-10.53.amzn1.x86_64.img 58 | 59 | title Amazon Linux 2017.09 (4.9.51-10.52.amzn1.x86_64) 60 | root (hd0,0) 61 | kernel /boot/vmlinuz-4.9.51-10.52.amzn1.x86_64 root=LABEL=/ console=tty1 console=ttyS0 selinux=0 no_console_suspend=1 resume_offset=123 resume=/dev/help 62 | initrd /boot/initramfs-4.9.51-10.52.amzn1.x86_64.img 63 | """ 64 | 65 | def test_patch(self): 66 | grub = mktemp() 67 | try: 68 | with open(grub, "w") as fl: 69 | fl.write(TestGrubPatching.doc) 70 | 71 | hibagent.patch_grub_config("/dev/help", 123, grub, None) 72 | 73 | with open(grub, "r") as fl: 74 | content = fl.read() 75 | finally: 76 | unlink(grub) 77 | 78 | self.assertEqual(TestGrubPatching.good_doc, content) 79 | 80 | 81 | class TestPmFreezeCurve(unittest.TestCase): 82 | def test_curve_parsing(self): 83 | curve = '0-8:20,8-16:40,16-64:60,64-128:150,128-256:200,256-:400' 84 | GB = 1024 ** 3 85 | self.assertEqual(20, hibagent.get_pm_freeze_timeout(curve, 7*GB)) 86 | self.assertEqual(40, hibagent.get_pm_freeze_timeout(curve, 8*GB)) 87 | self.assertEqual(200, hibagent.get_pm_freeze_timeout(curve, 128*GB)) 88 | self.assertEqual(400, hibagent.get_pm_freeze_timeout(curve, 500*GB)) 89 | 90 | def test_bad_curves(self): 91 | holey_curve = '0-8:20,16-64:60' 92 | GB = 1024**3 93 | self.assertIsNone(hibagent.get_pm_freeze_timeout(holey_curve, 9*GB)) 94 | self.assertIsNone(hibagent.get_pm_freeze_timeout(holey_curve, 70*GB)) 95 | self.assertEqual(20, hibagent.get_pm_freeze_timeout(holey_curve, 7*GB)) 96 | self.assertEqual(60, hibagent.get_pm_freeze_timeout(holey_curve, 22*GB)) 97 | 98 | 99 | class TestHibernation(unittest.TestCase): 100 | def setUp(self): 101 | self.swapfile = mktemp() 102 | self.mkswap_flag = mktemp() 103 | self.swapon_flag = mktemp() 104 | 105 | def tearDown(self): 106 | def _unlink(fl): 107 | # noinspection PyBroadException 108 | try: 109 | unlink(fl) 110 | except: 111 | 
pass 112 | _unlink(self.swapfile) 113 | _unlink(self.mkswap_flag) 114 | _unlink(self.swapon_flag) 115 | 116 | def test_swap_initializer(self): 117 | si = hibagent.SwapInitializer(self.swapfile, 100663296, True, 118 | '/usr/bin/touch %s' % self.mkswap_flag, 119 | '/usr/bin/touch %s' % self.swapon_flag) 120 | # Default filler 121 | expected = b'b' * 1024 122 | self.do_fill_file(si, expected) 123 | 124 | def test_need_to_hurry(self): 125 | si = hibagent.SwapInitializer(self.swapfile, 100663296, True, 126 | '/usr/bin/touch %s' % self.mkswap_flag, 127 | '/usr/bin/touch %s' % self.swapon_flag) 128 | si.need_to_hurry = True 129 | # The file must be zero-padded 130 | expected = b'\0' * 1024 131 | self.do_fill_file(si, expected) 132 | 133 | def do_fill_file(self, si, expected_filler): 134 | 135 | si.init_swap() 136 | # Assert that the swapfile exists and is appropriately sized 137 | self.assertEqual(100663296, stat(self.swapfile).st_size) 138 | si.turn_on_swap() 139 | 140 | # Assert that we have 'turned on' the swap 141 | stat(self.swapon_flag) 142 | stat(self.mkswap_flag) 143 | 144 | with open(self.swapfile) as fl: 145 | while True: 146 | buf = os.read(fl.fileno(), 1024) 147 | if not buf: 148 | break 149 | self.assertEqual(expected_filler, buf) 150 | 151 | 152 | class FakeSwapper(object): 153 | def __init__(self): 154 | self.need_to_hurry = False 155 | self.finished = False 156 | self.turned_on = False 157 | 158 | def init_swap(self): 159 | while not self.finished and not self.need_to_hurry: 160 | sleep(0.1) 161 | 162 | def turn_on_swap(self): 163 | self.turned_on = True 164 | 165 | 166 | class TestSwapInitializer(unittest.TestCase): 167 | def test_background_run(self): 168 | fs = FakeSwapper() 169 | bi = hibagent.BackgroundInitializerRunner(fs, False) 170 | bi.start_init() 171 | self.assertFalse(bi.check_finished()) 172 | 173 | # Signal for the init end and check it 174 | fs.finished = True 175 | while not bi.check_finished(): 176 | sleep(0.1) 177 | 
self.assertFalse(fs.need_to_hurry) 178 | self.assertTrue(fs.turned_on) 179 | 180 | def test_early_interrupt(self): 181 | fs = FakeSwapper() 182 | bi = hibagent.BackgroundInitializerRunner(fs, False) 183 | bi.start_init() 184 | self.assertFalse(bi.check_finished()) 185 | bi.force_completion() 186 | self.assertTrue(fs.need_to_hurry) 187 | self.assertTrue(fs.turned_on) 188 | 189 | def test_error(self): 190 | def raiser(): 191 | raise Exception("test") 192 | 193 | fs = FakeSwapper() 194 | bi = hibagent.BackgroundInitializerRunner(fs, False) 195 | fs.init_swap = raiser 196 | bi.start_init() 197 | try: 198 | while not bi.check_finished(): 199 | sleep(0.1) 200 | self.fail("Should have thrown") 201 | except Exception as ex: 202 | self.assertEqual("test", str(ex)) 203 | 204 | 205 | global_content = '' 206 | 207 | 208 | class SimpleHandler(BaseHTTPRequestHandler): 209 | def _set_headers(self, code): 210 | self.send_response(code) 211 | self.send_header('Content-type', 'text/html') 212 | self.end_headers() 213 | 214 | def do_GET(self): 215 | global global_content 216 | if global_content: 217 | self.send_error(200, global_content) 218 | else: 219 | self.send_error(404) 220 | 221 | 222 | class ServerRunner(object): 223 | def __init__(self): 224 | self.port = int(random()*30000+10000) 225 | server_address = ('localhost', self.port) 226 | self.httpd = HTTPServer(server_address, SimpleHandler) 227 | 228 | def run(self): 229 | thread = Thread(target=self.httpd.serve_forever, name="SwapInitializer") 230 | thread.setDaemon(True) 231 | thread.start() 232 | 233 | def stop(self): 234 | self.httpd.shutdown() 235 | self.httpd.server_close() 236 | 237 | 238 | class FakeInitializer(object): 239 | def __init__(self): 240 | self.forced = False 241 | self.finished = False 242 | 243 | def check_finished(self): 244 | return self.finished 245 | 246 | def force_completion(self): 247 | self.forced = True 248 | 249 | 250 | class ItnPollerTest(unittest.TestCase): 251 | def setUp(self): 252 | 
self.server = ServerRunner() 253 | self.server.run() 254 | self.flagfile = mktemp() 255 | 256 | def tearDown(self): 257 | self.server.stop() 258 | # noinspection PyBroadException 259 | try: 260 | unlink(self.flagfile) 261 | except: 262 | pass 263 | 264 | def test_itn_polls(self): 265 | fi = FakeInitializer() 266 | poller = hibagent.ItnPoller("http://localhost:%d/blah" % self.server.port, 267 | '/usr/bin/touch %s' % self.flagfile, fi) 268 | poller.run_loop_iteration() 269 | # Nothing happens 270 | self.check_not_exists() 271 | 272 | # Signal the hibernation 273 | global global_content 274 | global_content = 'hibernate' 275 | 276 | poller.run_loop_iteration() 277 | # Now we should have hibernated 278 | self.assertTrue(fi.forced) 279 | stat(self.flagfile) 280 | 281 | def check_not_exists(self): 282 | # noinspection PyBroadException 283 | try: 284 | stat(self.flagfile) 285 | self.fail("Should not exist") 286 | except: 287 | pass 288 | 289 | 290 | if __name__ == '__main__': 291 | unittest.main() 292 | --------------------------------------------------------------------------------