├── LICENSE
├── README.md
├── ansible.cfg
├── clean-os-error.sh
├── cluster_create.sh
├── cluster_create_local.sh
├── cluster_destroy.sh
├── cluster_destroy_local.sh
├── compute_build_base_img.yml
├── compute_take_snapshot.sh
├── cron-node-check.sh
├── figures
│   └── virtual-clusters.jpeg
├── install.sh
├── install_jupyterhub.yml
├── install_local.sh
├── jhub_files
│   ├── https_redirect.conf.j2
│   ├── jhub_conf.py
│   ├── jhub_service.j2
│   ├── jhub_sudoers
│   ├── jupyterhub.conf.j2
│   └── python_mod_3.8
├── prevent-updates.ci
├── slurm-logrotate.conf
├── slurm.conf
├── slurm_prolog.sh
├── slurm_resume.sh
├── slurm_suspend.sh
├── slurm_test.job
└── ssh.cfg

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2018 XSEDE
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Elastic Slurm Cluster on the Jetstream 2 Cloud
2 |
3 | ## Intro
4 |
5 | This repo contains scripts and Ansible playbooks for creating a virtual
6 | cluster in an Openstack environment, specifically aimed at the Jetstream2 resource.
7 |
8 | The basic structure is to have a single instance act as the headnode, with
9 | compute nodes managed by Slurm via the Openstack API. A customized
10 | image is created for worker nodes, which contains configuration
11 | and software specific to that cluster. The Slurm daemon on the
12 | headnode dynamically creates and destroys worker nodes in response to
13 | jobs in the queue (refer to the figure below). The current version is based on Rocky Linux 8, using
14 | RPMs from the [OpenHPC project](https://openhpc.community).
15 |
16 | Since the current installation scripts target the Rocky Linux 8 distribution, you are expected to have
17 | a virtual machine created from the latest Rocky Linux 8 base image on Jetstream2 before proceeding with the installation.
18 |
19 | ![Integration Diagram](figures/virtual-clusters.jpeg)
20 |
21 | ### Installation
22 | 1. Log in to the Rocky Linux virtual machine. This is the installation host and the headnode for the virtual cluster.
23 | 2. Switch to the rocky user if you are logged in as a different user: ```sudo su - rocky```
24 | 3. 
If you have not already done so, create an openrc file for your Jetstream2 account by following the [Jetstream2 Documentation](https://docs.jetstream-cloud.org/ui/cli/openrc/)
25 | 4. Copy the generated openrc file to the home directory of the rocky user
26 | 5. Clone the [Virtual Cluster repository](https://github.com/access-ci-org/Jetstream_Cluster)
27 | 6. If you'd like to modify your cluster, now is a good time!
28 |
29 |    * The number of nodes can be set in the slurm.conf file, by editing
30 |      the NodeName and PartitionName lines.
31 |    * If you'd like to change the default node size, the ```node_size=``` line
32 |      in ```slurm_resume.sh``` must be changed.
33 |    * If you'd like to enable any specific software, you should edit
34 |      ```compute_build_base_img.yml```. The task named "install basic packages"
35 |      can be easily extended to install anything available from a yum
36 |      repository. If you need to *add* a repo, you can copy the task
37 |      titled "Add OpenHPC 2.0 repo". For more detailed configuration,
38 |      it may be easiest to build your software in /export on the headnode,
39 |      and only install the necessary libraries via ```compute_build_base_img.yml```
40 |      (or ensure that they're available in the shared filesystem).
41 |    * For other modifications, feel free to get in touch!
42 | 7. Now you are all set to install the cluster. Run ```cluster_create_local.sh``` as the rocky user. This will take around
43 |    30 minutes to fully install the cluster.
44 | 8. If you need to destroy the cluster, run ```cluster_destroy_local.sh```. This will decommission the SLURM cluster and any
45 |    running compute nodes. If you need to delete the headnode as well, pass the -d parameter to the cluster destroy script.
46 |
47 |
48 | ### Usage note:
49 | Slurm will run the suspend/resume scripts in response to
50 | ```
51 | scontrol update nodename=compute-[0-1] state=power_down
52 | ```
53 | or
54 | ```
55 | scontrol update nodename=compute-[0-1] state=power_up
56 | ```
57 |
58 | If compute instances get stuck in a bad state, it's often helpful to
59 | cycle through the following:
60 |
61 | ```
62 | scontrol update nodename=compute-[?] state=down reason=resetting
63 | ```
64 | ```
65 | scontrol update nodename=compute-[?] state=idle
66 | ```
67 |
68 | or to re-run the suspend/resume scripts as above (if the instance
69 | power state doesn't match the current state as seen by slurm). Instances
70 | in a failed state within Openstack may simply be deleted, as they will
71 | be built anew by slurm the next time they are needed.
72 |
73 |
74 | This work is supported by [![NSF-1548562](https://img.shields.io/badge/NSF-1548562-blue.svg)](https://nsf.gov/awardsearch/showAward?AWD_ID=1548562)
75 |
--------------------------------------------------------------------------------
/ansible.cfg:
--------------------------------------------------------------------------------
1 | # config file for ansible -- https://ansible.com/
2 | # ===============================================
3 |
4 | # nearly all parameters can be overridden in ansible-playbook
5 | # or with command line flags. ansible will read ANSIBLE_CONFIG,
6 | # ansible.cfg in the current working directory, .ansible.cfg in
7 | # the home directory or /etc/ansible/ansible.cfg, whichever it
8 | # finds first
9 |
10 | [defaults]
11 |
12 | # some basic default values...
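# (Relative to the stock template, the settings actually changed in this file
#  appear to be remote_tmp/local_tmp under /tmp, retry_files_enabled = False,
#  the [privilege_escalation] section, and the ssh_args / control_path_dir
#  values under [ssh_connection], which point at the repo's ssh.cfg.)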
13 | 14 | #inventory = /etc/ansible/hosts 15 | #library = /usr/share/my_modules/ 16 | #module_utils = /usr/share/my_module_utils/ 17 | remote_tmp = /tmp/.ansible/tmp 18 | local_tmp = /tmp/.ansible/tmp 19 | #forks = 5 20 | #poll_interval = 15 21 | #sudo_user = root 22 | #ask_sudo_pass = True 23 | #ask_pass = True 24 | #transport = smart 25 | #remote_port = 22 26 | #module_lang = C 27 | #module_set_locale = False 28 | 29 | # plays will gather facts by default, which contain information about 30 | # the remote system. 31 | # 32 | # smart - gather by default, but don't regather if already gathered 33 | # implicit - gather by default, turn off with gather_facts: False 34 | # explicit - do not gather by default, must say gather_facts: True 35 | #gathering = implicit 36 | 37 | # This only affects the gathering done by a play's gather_facts directive, 38 | # by default gathering retrieves all facts subsets 39 | # all - gather all subsets 40 | # network - gather min and network facts 41 | # hardware - gather hardware facts (longest facts to retrieve) 42 | # virtual - gather min and virtual facts 43 | # facter - import facts from facter 44 | # ohai - import facts from ohai 45 | # You can combine them using comma (ex: network,virtual) 46 | # You can negate them using ! (ex: !hardware,!facter,!ohai) 47 | # A minimal set of facts is always gathered. 48 | #gather_subset = all 49 | 50 | # some hardware related facts are collected 51 | # with a maximum timeout of 10 seconds. This 52 | # option lets you increase or decrease that 53 | # timeout to something more suitable for the 54 | # environment. 55 | # gather_timeout = 10 56 | 57 | # additional paths to search for roles in, colon separated 58 | #roles_path = /etc/ansible/roles 59 | 60 | # uncomment this to disable SSH key host checking 61 | #host_key_checking = False 62 | 63 | # change the default callback 64 | #stdout_callback = skippy 65 | # enable additional callbacks 66 | #callback_whitelist = timer, mail 67 | 68 | # Determine whether includes in tasks and handlers are "static" by 69 | # default. As of 2.0, includes are dynamic by default. Setting these 70 | # values to True will make includes behave more like they did in the 71 | # 1.x versions. 72 | #task_includes_static = True 73 | #handler_includes_static = True 74 | 75 | # Controls if a missing handler for a notification event is an error or a warning 76 | #error_on_missing_handler = True 77 | 78 | # change this for alternative sudo implementations 79 | #sudo_exe = sudo 80 | 81 | # What flags to pass to sudo 82 | # WARNING: leaving out the defaults might create unexpected behaviours 83 | #sudo_flags = -H -S -n 84 | 85 | # SSH timeout 86 | #timeout = 10 87 | 88 | # default user to use for playbooks if user is not specified 89 | # (/usr/bin/ansible will use current user as default) 90 | #remote_user = root 91 | 92 | # logging is off by default unless this path is defined 93 | # if so defined, consider logrotate 94 | #log_path = /var/log/ansible.log 95 | 96 | # default module name for /usr/bin/ansible 97 | #module_name = command 98 | 99 | # use this shell for commands executed under sudo 100 | # you may need to change this to bin/bash in rare instances 101 | # if sudo is constrained 102 | #executable = /bin/sh 103 | 104 | # if inventory variables overlap, does the higher precedence one win 105 | # or are hash values merged together? The default is 'replace' but 106 | # this can also be set to 'merge'. 
107 | #hash_behaviour = replace 108 | 109 | # by default, variables from roles will be visible in the global variable 110 | # scope. To prevent this, the following option can be enabled, and only 111 | # tasks and handlers within the role will see the variables there 112 | #private_role_vars = yes 113 | 114 | # list any Jinja2 extensions to enable here: 115 | #jinja2_extensions = jinja2.ext.do,jinja2.ext.i18n 116 | 117 | # if set, always use this private key file for authentication, same as 118 | # if passing --private-key to ansible or ansible-playbook 119 | #private_key_file = /path/to/file 120 | 121 | # If set, configures the path to the Vault password file as an alternative to 122 | # specifying --vault-password-file on the command line. 123 | #vault_password_file = /path/to/vault_password_file 124 | 125 | # format of string {{ ansible_managed }} available within Jinja2 126 | # templates indicates to users editing templates files will be replaced. 127 | # replacing {file}, {host} and {uid} and strftime codes with proper values. 128 | #ansible_managed = Ansible managed: {file} modified on %Y-%m-%d %H:%M:%S by {uid} on {host} 129 | # {file}, {host}, {uid}, and the timestamp can all interfere with idempotence 130 | # in some situations so the default is a static string: 131 | #ansible_managed = Ansible managed 132 | 133 | # by default, ansible-playbook will display "Skipping [host]" if it determines a task 134 | # should not be run on a host. Set this to "False" if you don't want to see these "Skipping" 135 | # messages. NOTE: the task header will still be shown regardless of whether or not the 136 | # task is skipped. 137 | #display_skipped_hosts = True 138 | 139 | # by default, if a task in a playbook does not include a name: field then 140 | # ansible-playbook will construct a header that includes the task's action but 141 | # not the task's args. This is a security feature because ansible cannot know 142 | # if the *module* considers an argument to be no_log at the time that the 143 | # header is printed. If your environment doesn't have a problem securing 144 | # stdout from ansible-playbook (or you have manually specified no_log in your 145 | # playbook on all of the tasks where you have secret information) then you can 146 | # safely set this to True to get more informative messages. 147 | #display_args_to_stdout = False 148 | 149 | # by default (as of 1.3), Ansible will raise errors when attempting to dereference 150 | # Jinja2 variables that are not set in templates or action lines. Uncomment this line 151 | # to revert the behavior to pre-1.3. 152 | #error_on_undefined_vars = False 153 | 154 | # by default (as of 1.6), Ansible may display warnings based on the configuration of the 155 | # system running ansible itself. This may include warnings about 3rd party packages or 156 | # other conditions that should be resolved if possible. 157 | # to disable these warnings, set the following value to False: 158 | #system_warnings = True 159 | 160 | # by default (as of 1.4), Ansible may display deprecation warnings for language 161 | # features that should no longer be used and will be removed in future versions. 162 | # to disable these warnings, set the following value to False: 163 | #deprecation_warnings = True 164 | 165 | # (as of 1.8), Ansible can optionally warn when usage of the shell and 166 | # command module appear to be simplified by using a default Ansible module 167 | # instead. 
These warnings can be silenced by adjusting the following 168 | # setting or adding warn=yes or warn=no to the end of the command line 169 | # parameter string. This will for example suggest using the git module 170 | # instead of shelling out to the git command. 171 | # command_warnings = False 172 | 173 | 174 | # set plugin path directories here, separate with colons 175 | #action_plugins = /usr/share/ansible/plugins/action 176 | #cache_plugins = /usr/share/ansible/plugins/cache 177 | #callback_plugins = /usr/share/ansible/plugins/callback 178 | #connection_plugins = /usr/share/ansible/plugins/connection 179 | #lookup_plugins = /usr/share/ansible/plugins/lookup 180 | #inventory_plugins = /usr/share/ansible/plugins/inventory 181 | #vars_plugins = /usr/share/ansible/plugins/vars 182 | #filter_plugins = /usr/share/ansible/plugins/filter 183 | #test_plugins = /usr/share/ansible/plugins/test 184 | #terminal_plugins = /usr/share/ansible/plugins/terminal 185 | #strategy_plugins = /usr/share/ansible/plugins/strategy 186 | 187 | 188 | # by default, ansible will use the 'linear' strategy but you may want to try 189 | # another one 190 | #strategy = free 191 | 192 | # by default callbacks are not loaded for /bin/ansible, enable this if you 193 | # want, for example, a notification or logging callback to also apply to 194 | # /bin/ansible runs 195 | #bin_ansible_callbacks = False 196 | 197 | 198 | # don't like cows? that's unfortunate. 199 | # set to 1 if you don't want cowsay support or export ANSIBLE_NOCOWS=1 200 | #nocows = 1 201 | 202 | # set which cowsay stencil you'd like to use by default. When set to 'random', 203 | # a random stencil will be selected for each task. The selection will be filtered 204 | # against the `cow_whitelist` option below. 205 | #cow_selection = default 206 | #cow_selection = random 207 | 208 | # when using the 'random' option for cowsay, stencils will be restricted to this list. 209 | # it should be formatted as a comma-separated list with no spaces between names. 210 | # NOTE: line continuations here are for formatting purposes only, as the INI parser 211 | # in python does not support them. 212 | #cow_whitelist=bud-frogs,bunny,cheese,daemon,default,dragon,elephant-in-snake,elephant,eyes,\ 213 | # hellokitty,kitty,luke-koala,meow,milk,moofasa,moose,ren,sheep,small,stegosaurus,\ 214 | # stimpy,supermilker,three-eyes,turkey,turtle,tux,udder,vader-koala,vader,www 215 | 216 | # don't like colors either? 217 | # set to 1 if you don't want colors, or export ANSIBLE_NOCOLOR=1 218 | #nocolor = 1 219 | 220 | # if set to a persistent type (not 'memory', for example 'redis') fact values 221 | # from previous runs in Ansible will be stored. This may be useful when 222 | # wanting to use, for example, IP information from one group of servers 223 | # without having to talk to them in the same playbook run to get their 224 | # current IP information. 225 | #fact_caching = memory 226 | 227 | 228 | # retry files 229 | # When a playbook fails by default a .retry file will be created in ~/ 230 | # You can disable this feature by setting retry_files_enabled to False 231 | # and you can change the location of the files by setting retry_files_save_path 232 | 233 | retry_files_enabled = False 234 | #retry_files_save_path = ~/.ansible-retry 235 | 236 | # squash actions 237 | # Ansible can optimise actions that call modules with list parameters 238 | # when looping. Instead of calling the module once per with_ item, the 239 | # module is called once with all items at once. 
Currently this only works 240 | # under limited circumstances, and only with parameters named 'name'. 241 | #squash_actions = apk,apt,dnf,homebrew,pacman,pkgng,yum,zypper 242 | 243 | # prevents logging of task data, off by default 244 | #no_log = False 245 | 246 | # prevents logging of tasks, but only on the targets, data is still logged on the master/controller 247 | #no_target_syslog = False 248 | 249 | # controls whether Ansible will raise an error or warning if a task has no 250 | # choice but to create world readable temporary files to execute a module on 251 | # the remote machine. This option is False by default for security. Users may 252 | # turn this on to have behaviour more like Ansible prior to 2.1.x. See 253 | # https://docs.ansible.com/ansible/become.html#becoming-an-unprivileged-user 254 | # for more secure ways to fix this than enabling this option. 255 | #allow_world_readable_tmpfiles = False 256 | 257 | # controls the compression level of variables sent to 258 | # worker processes. At the default of 0, no compression 259 | # is used. This value must be an integer from 0 to 9. 260 | #var_compression_level = 9 261 | 262 | # controls what compression method is used for new-style ansible modules when 263 | # they are sent to the remote system. The compression types depend on having 264 | # support compiled into both the controller's python and the client's python. 265 | # The names should match with the python Zipfile compression types: 266 | # * ZIP_STORED (no compression. available everywhere) 267 | # * ZIP_DEFLATED (uses zlib, the default) 268 | # These values may be set per host via the ansible_module_compression inventory 269 | # variable 270 | #module_compression = 'ZIP_DEFLATED' 271 | 272 | # This controls the cutoff point (in bytes) on --diff for files 273 | # set to 0 for unlimited (RAM may suffer!). 274 | #max_diff_size = 1048576 275 | 276 | # This controls how ansible handles multiple --tags and --skip-tags arguments 277 | # on the CLI. If this is True then multiple arguments are merged together. If 278 | # it is False, then the last specified argument is used and the others are ignored. 279 | #merge_multiple_cli_flags = False 280 | 281 | # Controls showing custom stats at the end, off by default 282 | #show_custom_stats = True 283 | 284 | # Controls which files to ignore when using a directory as inventory with 285 | # possibly multiple sources (both static and dynamic) 286 | #inventory_ignore_extensions = ~, .orig, .bak, .ini, .cfg, .retry, .pyc, .pyo 287 | 288 | # This family of modules use an alternative execution path optimized for network appliances 289 | # only update this setting if you know how this works, otherwise it can break module execution 290 | #network_group_modules=['eos', 'nxos', 'ios', 'iosxr', 'junos', 'vyos'] 291 | 292 | # When enabled, this option allows lookups (via variables like {{lookup('foo')}} or when used as 293 | # a loop with `with_foo`) to return data that is not marked "unsafe". This means the data may contain 294 | # jinja2 templating language which will be run through the templating engine. 295 | # ENABLING THIS COULD BE A SECURITY RISK 296 | #allow_unsafe_lookups = False 297 | 298 | [privilege_escalation] 299 | become=True 300 | become_method=sudo 301 | become_user=root 302 | become_ask_pass=False 303 | 304 | [paramiko_connection] 305 | 306 | # uncomment this line to cause the paramiko connection plugin to not record new host 307 | # keys encountered. Increases performance on new host additions. 
Setting works independently of the 308 | # host key checking setting above. 309 | #record_host_keys=False 310 | 311 | # by default, Ansible requests a pseudo-terminal for commands executed under sudo. Uncomment this 312 | # line to disable this behaviour. 313 | #pty=False 314 | 315 | # paramiko will default to looking for SSH keys initially when trying to 316 | # authenticate to remote devices. This is a problem for some network devices 317 | # that close the connection after a key failure. Uncomment this line to 318 | # disable the Paramiko look for keys function 319 | #look_for_keys = False 320 | 321 | # When using persistent connections with Paramiko, the connection runs in a 322 | # background process. If the host doesn't already have a valid SSH key, by 323 | # default Ansible will prompt to add the host key. This will cause connections 324 | # running in background processes to fail. Uncomment this line to have 325 | # Paramiko automatically add host keys. 326 | #host_key_auto_add = True 327 | 328 | [ssh_connection] 329 | 330 | # ssh arguments to use 331 | # Leaving off ControlPersist will result in poor performance, so use 332 | # paramiko on older platforms rather than removing it, -C controls compression use 333 | ssh_args = -F /etc/ansible/ssh.cfg -C -o ControlMaster=auto -o ControlPersist=60s 334 | 335 | # The base directory for the ControlPath sockets. 336 | # This is the "%(directory)s" in the control_path option 337 | # 338 | # Example: 339 | # control_path_dir = /tmp/.ansible/cp 340 | control_path_dir = /tmp/.ansible/cp 341 | 342 | # The path to use for the ControlPath sockets. This defaults to a hashed string of the hostname, 343 | # port and username (empty string in the config). The hash mitigates a common problem users 344 | # found with long hostames and the conventional %(directory)s/ansible-ssh-%%h-%%p-%%r format. 345 | # In those cases, a "too long for Unix domain socket" ssh error would occur. 346 | # 347 | # Example: 348 | # control_path = %(directory)s/%%h-%%r 349 | #control_path = 350 | 351 | # Enabling pipelining reduces the number of SSH operations required to 352 | # execute a module on the remote server. This can result in a significant 353 | # performance improvement when enabled, however when using "sudo:" you must 354 | # first disable 'requiretty' in /etc/sudoers 355 | # 356 | # By default, this option is disabled to preserve compatibility with 357 | # sudoers configurations that have requiretty (the default on many distros). 358 | # 359 | #pipelining = False 360 | 361 | # Control the mechanism for transferring files (old) 362 | # * smart = try sftp and then try scp [default] 363 | # * True = use scp only 364 | # * False = use sftp only 365 | #scp_if_ssh = smart 366 | 367 | # Control the mechanism for transferring files (new) 368 | # If set, this will override the scp_if_ssh option 369 | # * sftp = use sftp to transfer files 370 | # * scp = use scp to transfer files 371 | # * piped = use 'dd' over SSH to transfer files 372 | # * smart = try sftp, scp, and piped, in that order [default] 373 | #transfer_method = smart 374 | 375 | # if False, sftp will not use batch mode to transfer files. This may cause some 376 | # types of file transfer failures impossible to catch however, and should 377 | # only be disabled if your sftp version has problems with batch mode 378 | #sftp_batch_mode = False 379 | 380 | [persistent_connection] 381 | 382 | # Configures the persistent connection timeout value in seconds. 
This value is 383 | # how long the persistent connection will remain idle before it is destroyed. 384 | # If the connection doesn't receive a request before the timeout value 385 | # expires, the connection is shutdown. The default value is 30 seconds. 386 | connect_timeout = 30 387 | 388 | # Configures the persistent connection retries. This value configures the 389 | # number of attempts the ansible-connection will make when trying to connect 390 | # to the local domain socket. The default value is 30. 391 | connect_retries = 30 392 | 393 | # Configures the amount of time in seconds to wait between connection attempts 394 | # to the local unix domain socket. This value works in conjunction with the 395 | # connect_retries value to define how long to try to connect to the local 396 | # domain socket when setting up a persistent connection. The default value is 397 | # 1 second. 398 | connect_interval = 1 399 | 400 | [accelerate] 401 | #accelerate_port = 5099 402 | #accelerate_timeout = 30 403 | #accelerate_connect_timeout = 5.0 404 | 405 | # The daemon timeout is measured in minutes. This time is measured 406 | # from the last activity to the accelerate daemon. 407 | #accelerate_daemon_timeout = 30 408 | 409 | # If set to yes, accelerate_multi_key will allow multiple 410 | # private keys to be uploaded to it, though each user must 411 | # have access to the system via SSH to add a new key. The default 412 | # is "no". 413 | #accelerate_multi_key = yes 414 | 415 | [selinux] 416 | # file systems that require special treatment when dealing with security context 417 | # the default behaviour that copies the existing context or uses the user default 418 | # needs to be changed to use the file system dependent context. 419 | #special_context_filesystems=nfs,vboxsf,fuse,ramfs,9p 420 | 421 | # Set this to yes to allow libvirt_lxc connections to work without SELinux. 422 | #libvirt_lxc_noseclabel = yes 423 | 424 | [colors] 425 | #highlight = white 426 | #verbose = blue 427 | #warn = bright purple 428 | #error = red 429 | #debug = dark gray 430 | #deprecate = purple 431 | #skip = cyan 432 | #unreachable = red 433 | #ok = green 434 | #changed = yellow 435 | #diff_add = green 436 | #diff_remove = red 437 | #diff_lines = cyan 438 | 439 | 440 | [diff] 441 | # Always print diff when running ( same as always running with -D/--diff ) 442 | # always = no 443 | 444 | # Set how many context lines to show in diff 445 | # context = 3 446 | -------------------------------------------------------------------------------- /clean-os-error.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | source /etc/slurm/openrc.sh 4 | 5 | os_error_list=$(openstack server list --status ERROR -f value -c ID) 6 | logfile=/var/log/slurm/os_clean.log 7 | 8 | for host_id in $os_error_list 9 | do 10 | echo "Removing OS_HOST $host_id" >> $logfile 2>&1 11 | openstack server show $host_id >> $logfile 2>&1 12 | openstack server delete $host_id >> $logfile 2>&1 13 | done 14 | -------------------------------------------------------------------------------- /cluster_create.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #This script makes several assumptions: 4 | # 1. Running on a host with openstack client tools installed 5 | # 2. Using a default ssh key in ~/.ssh/ 6 | # 3. The user knows what they're doing. 7 | # 4. 
Take some options: 8 | # openrc file 9 | # headnode size 10 | # cluster name 11 | # volume size 12 | 13 | show_help() { 14 | echo "Options: 15 | -n: HEADNODE_NAME: required, name of the cluster 16 | -o: OPENRC_PATH: optional, path to a valid openrc file, default is ./openrc.sh 17 | -s: HEADNODE_SIZE: optional, size of the headnode in Openstack flavor (default: m1.small) 18 | -v: VOLUME_SIZE: optional, size of storage volume in GB, volume not created if 0 19 | -d: DOCKER_ALLOW: optional flag, leave docker installed on headnode if set. 20 | -j: JUPYTERHUB_BUILD: optional flag, install jupyterhub with SSL certs. 21 | 22 | Usage: $0 -n [HEADNODE_NAME] -o [OPENRC_PATH] -v [VOLUME_SIZE] -s [HEADNODE_SIZE] [-d]" 23 | } 24 | 25 | OPTIND=1 26 | 27 | openrc_path="./openrc.sh" 28 | headnode_size="m1.small" 29 | headnode_name="noname" 30 | volume_size="0" 31 | install_opts="" 32 | 33 | while getopts ":jdhhelp:n:o:s:v:" opt; do 34 | case ${opt} in 35 | h|help|\?) show_help 36 | exit 0 37 | ;; 38 | d) install_opts+="-d " 39 | ;; 40 | j) install_opts+="-j " 41 | ;; 42 | o) openrc_path=${OPTARG} 43 | ;; 44 | s) headnode_size=${OPTARG} 45 | ;; 46 | v) volume_size=${OPTARG} 47 | ;; 48 | n) headnode_name=${OPTARG} 49 | ;; 50 | :) echo "Option -$OPTARG requires an argument." 51 | exit 1 52 | ;; 53 | 54 | esac 55 | done 56 | 57 | 58 | if [[ ! -f ${openrc_path} ]]; then 59 | echo "openrc path: ${openrc_path} \n does not point to a file!" 60 | exit 1 61 | fi 62 | 63 | #Move this to allow for error checking of OS conflicts 64 | source ${openrc_path} 65 | 66 | if [[ -z $( echo ${headnode_size} | grep -E '^m1|^m2|^g1|^g2' ) ]]; then 67 | echo "Headnode size ${headnode_size} is not a valid JS instance size!" 68 | exit 1 69 | elif [[ -n "$(echo ${volume_size} | tr -d [0-9])" ]]; then 70 | echo "Volume size must be numeric only, in units of GB." 71 | exit 1 72 | elif [[ ${headnode_name} == "noname" ]]; then 73 | echo "No headnode name provided with -n, exiting!" 74 | exit 1 75 | elif [[ -n $(openstack server list | grep -i ${headnode_name}) ]]; then 76 | echo "Cluster name [${headnode_name}] conficts with existing Openstack entity!" 77 | exit 1 78 | elif [[ -n $(openstack volume list | grep -i ${headnode_name}-storage) ]]; then 79 | echo "Volume name [${headnode_name}-storage] conficts with existing Openstack entity!" 80 | exit 1 81 | fi 82 | 83 | if [[ ! -e ${HOME}/.ssh/id_rsa.pub ]]; then 84 | #This may be temporary... but seems fairly reasonable. 85 | echo "NO KEY FOUND IN ${HOME}/.ssh/id_rsa.pub! - please create one and re-run!" 86 | exit 87 | fi 88 | 89 | volume_name="${headnode_name}-storage" 90 | 91 | # Defining a function here to check for quotas, and exit if this script will cause problems! 92 | # also, storing 'quotas' in a global var, so we're not calling it every single time 93 | quotas=$(openstack quota show) 94 | quota_check () 95 | { 96 | quota_name=$1 97 | type_name=$2 #the name for a quota and the name for the thing itself are not the same 98 | number_created=$3 #number of the thing that we'll create here. 
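# Intended to return 0 when the project quota (from `openstack quota show`)
# still has room to create ${number_created} more of the given resource, and 1
# otherwise; note that the callers below do not currently act on this return value.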
99 | 100 | current_num=$(openstack ${type_name} list -f value | wc -l) 101 | 102 | max_types=$(echo "${quotas}" | awk -v quota=${quota_name} '$0 ~ quota {print $4}') 103 | 104 | #echo "checking quota for ${quota_name} of ${type_name} to create ${number_created} - want ${current_num} to be less than ${max_types}" 105 | 106 | if [[ "${current_num}" -lt "$((max_types + number_created))" ]]; then 107 | return 0 108 | fi 109 | return 1 110 | } 111 | 112 | 113 | quota_check "secgroups" "security group" 1 114 | quota_check "networks" "network" 1 115 | quota_check "subnets" "subnet" 1 116 | quota_check "routers" "router" 1 117 | quota_check "key-pairs" "keypair" 1 118 | quota_check "instances" "server" 1 119 | 120 | #These must match those defined in install.sh, slurm_resume.sh, compute_build_base_img.yml 121 | # and compute_take_snapshot.sh, which ASSUME the headnode_name convention has not been deviated from. 122 | 123 | OS_PREFIX=${headnode_name} 124 | OS_NETWORK_NAME=${OS_PREFIX}-elastic-net 125 | OS_SUBNET_NAME=${OS_PREFIX}-elastic-subnet 126 | OS_ROUTER_NAME=${OS_PREFIX}-elastic-router 127 | OS_SSH_SECGROUP_NAME=${OS_PREFIX}-ssh-global 128 | OS_INTERNAL_SECGROUP_NAME=${OS_PREFIX}-internal 129 | OS_HTTP_S_SECGROUP_NAME=${OS_PREFIX}-http-s 130 | OS_KEYPAIR_NAME=${OS_USERNAME}-elastic-key 131 | OS_APP_CRED=${OS_PREFIX}-slurm-app-cred 132 | 133 | # This will allow for customization of the 1st 24 bits of the subnet range 134 | # The last 8 will be assumed open (netmask 255.255.255.0 or /24) 135 | # because going beyond that requires a general mechanism for translation from CIDR 136 | # to wildcard notation for ssh.cfg and compute_build_base_img.yml 137 | # which is assumed to be beyond the scope of this project. 138 | # If there is a maintainable mechanism for this, of course, please let us know! 139 | SUBNET_PREFIX=10.0.0 140 | 141 | 142 | # Ensure that the correct private network/router/subnet exists 143 | if [[ -z "$(openstack network list | grep ${OS_NETWORK_NAME})" ]]; then 144 | openstack network create ${OS_NETWORK_NAME} 145 | openstack subnet create --network ${OS_NETWORK_NAME} --subnet-range ${SUBNET_PREFIX}.0/24 ${OS_SUBNET_NAME} 146 | fi 147 | ##openstack subnet list 148 | if [[ -z "$(openstack router list | grep ${OS_ROUTER_NAME})" ]]; then 149 | openstack router create ${OS_ROUTER_NAME} 150 | openstack router add subnet ${OS_ROUTER_NAME} ${OS_SUBNET_NAME} 151 | openstack router set --external-gateway public ${OS_ROUTER_NAME} 152 | fi 153 | 154 | security_groups=$(openstack security group list -f value) 155 | if [[ ! ("${security_groups}" =~ "${OS_SSH_SECGROUP_NAME}") ]]; then 156 | openstack security group create --description "ssh \& icmp enabled" ${OS_SSH_SECGROUP_NAME} 157 | openstack security group rule create --protocol tcp --dst-port 22:22 --remote-ip 0.0.0.0/0 ${OS_SSH_SECGROUP_NAME} 158 | openstack security group rule create --protocol icmp ${OS_SSH_SECGROUP_NAME} 159 | fi 160 | if [[ ! ("${security_groups}" =~ "${OS_INTERNAL_SECGROUP_NAME}") ]]; then 161 | openstack security group create --description "internal group for cluster" ${OS_INTERNAL_SECGROUP_NAME} 162 | openstack security group rule create --protocol tcp --dst-port 1:65535 --remote-ip ${SUBNET_PREFIX}.0/24 ${OS_INTERNAL_SECGROUP_NAME} 163 | openstack security group rule create --protocol icmp ${OS_INTERNAL_SECGROUP_NAME} 164 | fi 165 | if [[ (! 
("${security_groups}" =~ "${OS_HTTP_S_SECGROUP_NAME}")) && "${install_opts}" =~ "j" ]]; then 166 | openstack security group create --description "http/s for jupyterhub" ${OS_HTTP_S_SECGROUP_NAME} 167 | openstack security group rule create --protocol tcp --dst-port 80 --remote-ip 0.0.0.0/0 ${OS_HTTP_S_SECGROUP_NAME} 168 | openstack security group rule create --protocol tcp --dst-port 443 --remote-ip 0.0.0.0/0 ${OS_HTTP_S_SECGROUP_NAME} 169 | fi 170 | 171 | #Check if ${HOME}/.ssh/id_rsa.pub exists in JS 172 | if [[ -e ${HOME}/.ssh/id_rsa.pub ]]; then 173 | home_key_fingerprint=$(ssh-keygen -l -E md5 -f ${HOME}/.ssh/id_rsa.pub | sed 's/.*MD5:\(\S*\) .*/\1/') 174 | fi 175 | openstack_keys=$(openstack keypair list -f value) 176 | 177 | home_key_in_OS=$(echo "${openstack_keys}" | awk -v mykey="${home_key_fingerprint}" '$2 ~ mykey {print $1}') 178 | 179 | if [[ -n "${home_key_in_OS}" ]]; then 180 | #RESET this to key that's already in OS 181 | OS_KEYPAIR_NAME=${home_key_in_OS} 182 | elif [[ -n $(echo "${openstack_keys}" | grep ${OS_KEYPAIR_NAME}) ]]; then 183 | openstack keypair delete ${OS_KEYPAIR_NAME} 184 | # This doesn't need to depend on the OS_PROJECT_NAME, as the slurm-key does, in install.sh and slurm_resume 185 | openstack keypair create --public-key ${HOME}/.ssh/id_rsa.pub ${OS_KEYPAIR_NAME} 186 | else 187 | # This doesn't need to depend on the OS_PROJECT_NAME, as the slurm-key does, in install.sh and slurm_resume 188 | openstack keypair create --public-key ${HOME}/.ssh/id_rsa.pub ${OS_KEYPAIR_NAME} 189 | fi 190 | 191 | #centos_base_image=$(openstack image list --status active | grep -iE "API-Featured-centos7-[[:alpha:]]{3,4}-[0-9]{2}-[0-9]{4}" | awk '{print $4}' | tail -n 1) 192 | centos_base_image="JS-API-Featured-CentOS8-Latest" 193 | 194 | #Now, generate an Openstack Application Credential for use on the cluster 195 | export $(openstack application credential create -f shell ${OS_APP_CRED} | sed 's/^\(.*\)/OS_ac_\1/') 196 | 197 | #Write it to a temporary file 198 | echo -e "export OS_AUTH_TYPE=v3applicationcredential 199 | export OS_AUTH_URL=${OS_AUTH_URL} 200 | export OS_IDENTITY_API_VERSION=3 201 | export OS_REGION_NAME="RegionOne" 202 | export OS_INTERFACE=public 203 | export OS_APPLICATION_CREDENTIAL_ID=${OS_ac_id} 204 | export OS_APPLICATION_CREDENTIAL_SECRET=${OS_ac_secret}" > ./openrc-app.sh 205 | 206 | #Function to generate file: sections for cloud-init config files 207 | # arguments are owner path permissions file_to_be_copied 208 | # All calls to this must come after an "echo "write_files:\n" 209 | generate_write_files () { 210 | #This is generating YAML, so... spaces are important. 211 | echo -e " - encoding: b64\n owner: $1\n path: $2\n permissions: $3\n content: |\n$(cat $4 | base64 | sed 's/^/ /')" 212 | } 213 | 214 | user_data="$(cat ./prevent-updates.ci)\n" 215 | user_data+="$(echo -e "write_files:")\n" 216 | user_data+="$(generate_write_files "slurm" "/etc/slurm/openrc.sh" "0400" "./openrc-app.sh")\n" 217 | 218 | #Clean up! 
219 | rm ./openrc-app.sh 220 | 221 | echo -e "openstack server create\ 222 | --user-data <(echo -e "${user_data}") \ 223 | --flavor ${headnode_size} \ 224 | --image ${centos_base_image} \ 225 | --key-name ${OS_KEYPAIR_NAME} \ 226 | --security-group ${OS_SSH_SECGROUP_NAME} \ 227 | --security-group ${OS_INTERNAL_SECGROUP_NAME} \ 228 | --nic net-id=${OS_NETWORK_NAME} \ 229 | ${headnode_name}" 230 | 231 | openstack server create \ 232 | --user-data <(echo -e "${user_data}") \ 233 | --flavor ${headnode_size} \ 234 | --image ${centos_base_image} \ 235 | --key-name ${OS_KEYPAIR_NAME} \ 236 | --security-group ${OS_SSH_SECGROUP_NAME} \ 237 | --security-group ${OS_INTERNAL_SECGROUP_NAME} \ 238 | --nic net-id=${OS_NETWORK_NAME} \ 239 | ${headnode_name} 240 | 241 | public_ip=$(openstack floating ip create public | awk '/floating_ip_address/ {print $4}') 242 | #For some reason there's a time issue here - adding a sleep command to allow network to become ready 243 | sleep 10 244 | openstack server add floating ip ${headnode_name} ${public_ip} 245 | 246 | hostname_test=$(ssh -q -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no centos@${public_ip} 'hostname') 247 | echo "test1: ${hostname_test}" 248 | until [[ ${hostname_test} =~ "${headnode_name}" ]]; do 249 | sleep 2 250 | hostname_test=$(ssh -q -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no centos@${public_ip} 'hostname') 251 | echo "ssh -q -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no centos@${public_ip} 'hostname'" 252 | echo "test2: ${hostname_test}" 253 | done 254 | 255 | rsync -qa --exclude="openrc.sh" -e 'ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no' ${PWD} centos@${public_ip}: 256 | 257 | if [[ "${volume_size}" != "0" ]]; then 258 | echo "Creating volume ${volume_name} of ${volume_size} GB" 259 | openstack volume create --size ${volume_size} ${volume_name} 260 | openstack server add volume --device /dev/sdb ${headnode_name} ${volume_name} 261 | sleep 5 # To fix a wait issue in volume creation 262 | ssh -o StrictHostKeyChecking=no centos@${public_ip} 'sudo mkfs.xfs /dev/sdb && sudo mkdir -m 777 /export' 263 | vol_uuid=$(ssh centos@${public_ip} 'sudo blkid /dev/sdb | sed "s|.*UUID=\"\(.\{36\}\)\" .*|\1|"') 264 | echo "volume uuid is: ${vol_uuid}" 265 | ssh centos@${public_ip} "echo -e \"UUID=${vol_uuid} /export xfs defaults 0 0\" | sudo tee -a /etc/fstab && sudo mount -a" 266 | echo "Volume sdb has UUID ${vol_uuid} on ${public_ip}" 267 | if [[ ${docker_allow} == 1 ]]; then 268 | ssh centos@${public_ip} "echo -E '{ \"data-root\": \"/export/docker\" }' | sudo tee -a /etc/docker/daemon.json && sudo systemctl restart docker" 269 | fi 270 | 271 | fi 272 | 273 | if [[ "${install_opts}" =~ "-j" ]]; then 274 | openstack server add security group ${headnode_name} ${OS_HTTP_S_SECGROUP_NAME} 275 | fi 276 | 277 | echo "Copied over VC files, beginning Slurm installation and Compute Image configuration - should take 8-10 minutes." 278 | 279 | #Since PWD on localhost has the full path, we only want the current directory name 280 | ssh -o StrictHostKeyChecking=no centos@${public_ip} "cd ./${PWD##*/} && sudo ./install.sh ${install_opts}" 281 | 282 | echo "You should be able to login to your headnode with your Jetstream key: ${OS_KEYPAIR_NAME}, at ${public_ip}" 283 | 284 | if [[ ${install_opts} =~ "-j" ]]; then 285 | echo "You will need to edit the file ${PWD}/install_jupyterhub.yml to reflect the public hostname of your new cluster, and use your email for SSL certs." 
286 | echo "Then, run the following command from the directory ${PWD} ON THE NEW HEADNODE to complete your jupyterhub setup:" 287 | echo "sudo ansible-playbook -v --ssh-common-args='-o StrictHostKeyChecking=no' install_jupyterhub.yml" 288 | fi 289 | -------------------------------------------------------------------------------- /cluster_create_local.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Uncomment below to help with debugging 4 | set -x 5 | 6 | #This script makes several assumptions: 7 | # 1. Running on a host with openstack client tools installed 8 | # 2. Using a default ssh key in ~/.ssh/ 9 | # 3. The user knows what they're doing. 10 | # 4. Take some options: 11 | # openrc file 12 | # volume size 13 | 14 | show_help() { 15 | echo "Options: 16 | -o: OPENRC_PATH: optional, path to a valid openrc file, default is ~/openrc.sh 17 | -v: VOLUME_SIZE: optional, size of storage volume in GB, volume not created if 0 18 | -d: DOCKER_ALLOW: optional flag, leave docker installed on headnode if set. 19 | -j: JUPYTERHUB_BUILD: optional flag, install jupyterhub with SSL certs. 20 | 21 | Usage: $0 -o [OPENRC_PATH] -v [VOLUME_SIZE] [-d]" 22 | } 23 | 24 | OPTIND=1 25 | 26 | openrc_path="${HOME}/openrc.sh" 27 | volume_size="0" 28 | install_opts="" 29 | 30 | while getopts ":jdhhelp:n:o:s:v:" opt; do 31 | case ${opt} in 32 | h|help|\?) show_help 33 | exit 0 34 | ;; 35 | d) install_opts+="-d " 36 | ;; 37 | j) install_opts+="-j " 38 | ;; 39 | o) openrc_path=${OPTARG} 40 | ;; 41 | v) volume_size=${OPTARG} 42 | ;; 43 | :) echo "Option -$OPTARG requires an argument." 44 | exit 1 45 | ;; 46 | 47 | esac 48 | done 49 | 50 | sudo pip3 install openstacksdk==0.61.0 51 | sudo pip3 install python-openstackclient 52 | sudo ln -s /usr/local/bin/openstack /usr/bin/openstack 53 | 54 | if [[ ! -f ${openrc_path} ]]; then 55 | echo "openrc path: ${openrc_path} \n does not point to a file!" 56 | exit 1 57 | fi 58 | 59 | headnode_name="$(hostname --short)" 60 | 61 | #Move this to allow for error checking of OS conflicts 62 | source ${openrc_path} 63 | 64 | if [[ -n "$(echo ${volume_size} | tr -d [0-9])" ]]; then 65 | echo "Volume size must be numeric only, in units of GB." 66 | exit 1 67 | elif [[ -n $(openstack volume list | grep -i ${headnode_name}-storage) ]]; then 68 | echo "Volume name [${headnode_name}-storage] conficts with existing Openstack entity!" 69 | exit 1 70 | fi 71 | 72 | if [[ ! -e ${HOME}/.ssh/id_rsa.pub ]]; then 73 | ssh-keygen -q -N "" -f ${HOME}/.ssh/id_rsa 74 | fi 75 | 76 | volume_name="${headnode_name}-storage" 77 | 78 | # Defining a function here to check for quotas, and exit if this script will cause problems! 79 | # also, storing 'quotas' in a global var, so we're not calling it every single time 80 | quotas=$(openstack quota show) 81 | quota_check () 82 | { 83 | quota_name=$1 84 | type_name=$2 #the name for a quota and the name for the thing itself are not the same 85 | number_created=$3 #number of the thing that we'll create here. 
86 | 87 | current_num=$(openstack ${type_name} list -f value | wc -l) 88 | 89 | max_types=$(echo "${quotas}" | awk -v quota=${quota_name} '$0 ~ quota {print $4}') 90 | 91 | #echo "checking quota for ${quota_name} of ${type_name} to create ${number_created} - want ${current_num} to be less than ${max_types}" 92 | 93 | if [[ "${current_num}" -lt "$((max_types + number_created))" ]]; then 94 | return 0 95 | fi 96 | return 1 97 | } 98 | 99 | 100 | quota_check "secgroups" "security group" 1 101 | quota_check "networks" "network" 1 102 | quota_check "subnets" "subnet" 1 103 | quota_check "routers" "router" 1 104 | quota_check "key-pairs" "keypair" 1 105 | quota_check "instances" "server" 1 106 | 107 | #These must match those defined in install.sh, slurm_resume.sh, compute_build_base_img.yml 108 | # and compute_take_snapshot.sh, which ASSUME the headnode_name convention has not been deviated from. 109 | 110 | OS_PREFIX=${headnode_name} 111 | OS_SSH_SECGROUP_NAME=${OS_PREFIX}-ssh-global 112 | OS_INTERNAL_SECGROUP_NAME=${OS_PREFIX}-internal 113 | OS_HTTP_S_SECGROUP_NAME=${OS_PREFIX}-http-s 114 | OS_KEYPAIR_NAME=${OS_PREFIX}-elastic-key 115 | 116 | HEADNODE_NETWORK=$(openstack server show $(hostname -s) | grep addresses | awk -F'|' '{print $3}' | awk -F'=' '{print $1}' | awk '{$1=$1};1') 117 | HEADNODE_IP=$(openstack server show $(hostname -s) | grep addresses | awk -F'|' '{print $3}' | awk -F'=' '{print $2}' | awk -F',' '{print $1}') 118 | SUBNET=$(ip addr | grep $HEADNODE_IP | awk '{print $2}') 119 | 120 | echo "Headnode network name ${HEADNODE_NETWORK}" 121 | echo "Headnode ip ${HEADNODE_IP}" 122 | echo "Subnet ${SUBNET}" 123 | 124 | # This will allow for customization of the 1st 24 bits of the subnet range 125 | # The last 8 will be assumed open (netmask 255.255.255.0 or /24) 126 | # because going beyond that requires a general mechanism for translation from CIDR 127 | # to wildcard notation for ssh.cfg and compute_build_base_img.yml 128 | # which is assumed to be beyond the scope of this project. 129 | # If there is a maintainable mechanism for this, of course, please let us know! 130 | 131 | security_groups=$(openstack security group list -f value) 132 | if [[ ! ("${security_groups}" =~ "${OS_SSH_SECGROUP_NAME}") ]]; then 133 | openstack security group create --description "ssh \& icmp enabled" ${OS_SSH_SECGROUP_NAME} 134 | openstack security group rule create --protocol tcp --dst-port 22:22 --remote-ip 0.0.0.0/0 ${OS_SSH_SECGROUP_NAME} 135 | openstack security group rule create --protocol icmp ${OS_SSH_SECGROUP_NAME} 136 | fi 137 | if [[ ! ("${security_groups}" =~ "${OS_INTERNAL_SECGROUP_NAME}") ]]; then 138 | openstack security group create --description "internal group for cluster" ${OS_INTERNAL_SECGROUP_NAME} 139 | openstack security group rule create --protocol tcp --dst-port 1:65535 --remote-ip ${SUBNET} ${OS_INTERNAL_SECGROUP_NAME} 140 | openstack security group rule create --protocol icmp ${OS_INTERNAL_SECGROUP_NAME} 141 | fi 142 | if [[ (! 
("${security_groups}" =~ "${OS_HTTP_S_SECGROUP_NAME}")) && "${install_opts}" =~ "j" ]]; then 143 | openstack security group create --description "http/s for jupyterhub" ${OS_HTTP_S_SECGROUP_NAME} 144 | openstack security group rule create --protocol tcp --dst-port 80 --remote-ip 0.0.0.0/0 ${OS_HTTP_S_SECGROUP_NAME} 145 | openstack security group rule create --protocol tcp --dst-port 443 --remote-ip 0.0.0.0/0 ${OS_HTTP_S_SECGROUP_NAME} 146 | fi 147 | 148 | #Check if ${HOME}/.ssh/id_rsa.pub exists in JS 149 | if [[ -e ${HOME}/.ssh/id_rsa.pub ]]; then 150 | home_key_fingerprint=$(ssh-keygen -l -E md5 -f ${HOME}/.ssh/id_rsa.pub | sed 's/.*MD5:\(\S*\) .*/\1/') 151 | fi 152 | openstack_keys=$(openstack keypair list -f value) 153 | 154 | home_key_in_OS=$(echo "${openstack_keys}" | awk -v mykey="${home_key_fingerprint}" '$2 ~ mykey {print $1}') 155 | 156 | if [[ -n "${home_key_in_OS}" ]]; then 157 | #RESET this to key that's already in OS 158 | OS_KEYPAIR_NAME=${home_key_in_OS} 159 | elif [[ -n $(echo "${openstack_keys}" | grep ${OS_KEYPAIR_NAME}) ]]; then 160 | openstack keypair delete ${OS_KEYPAIR_NAME} 161 | # This doesn't need to depend on the OS_PROJECT_NAME, as the slurm-key does, in install.sh and slurm_resume 162 | openstack keypair create --public-key ${HOME}/.ssh/id_rsa.pub ${OS_KEYPAIR_NAME} 163 | else 164 | # This doesn't need to depend on the OS_PROJECT_NAME, as the slurm-key does, in install.sh and slurm_resume 165 | openstack keypair create --public-key ${HOME}/.ssh/id_rsa.pub ${OS_KEYPAIR_NAME} 166 | fi 167 | 168 | SERVER_UUID=$(curl http://169.254.169.254/openstack/latest/meta_data.json | jq '.uuid' | sed -e 's#"##g') 169 | 170 | server_security_groups=$(openstack server show -f value -c security_groups ${SERVER_UUID} | sed -e "s#name=##" -e "s#'##g" | paste -s -) 171 | 172 | if [[ ! ("${server_security_groups}" =~ "${OS_SSH_SECGROUP_NAMEOS_SSH_SECGROUP_NAME}") ]]; then 173 | echo -e "openstack server add security group ${SERVER_UUID} ${OS_SSH_SECGROUP_NAME}" 174 | openstack server add security group ${SERVER_UUID} ${OS_SSH_SECGROUP_NAME} 175 | fi 176 | 177 | if [[ ! ("${server_security_groups}" =~ "${OS_INTERNAL_SECGROUP_NAME}") ]]; then 178 | echo -e "openstack server add security group ${SERVER_UUID} ${OS_INTERNAL_SECGROUP_NAME}" 179 | openstack server add security group ${SERVER_UUID} ${OS_INTERNAL_SECGROUP_NAME} 180 | fi 181 | 182 | if [[ "${volume_size}" != "0" ]]; then 183 | echo "Creating volume ${volume_name} of ${volume_size} GB" 184 | openstack volume create --size ${volume_size} ${volume_name} 185 | openstack server add volume --device /dev/sdb ${SERVER_UUID} ${volume_name} 186 | sleep 5 # To fix a wait issue in volume creation 187 | sudo mkfs.xfs /dev/sdb && sudo mkdir -m 777 /export 188 | vol_uuid=$(sudo blkid /dev/sdb | sed "s|.*UUID=\"\(.\{36\}\)\" .*|\1|") 189 | echo "volume uuid is: ${vol_uuid}" 190 | echo -e \"UUID=${vol_uuid} /export xfs defaults 0 0\" | sudo tee -a /etc/fstab && sudo mount -a 191 | echo "Volume sdb has UUID ${vol_uuid}" 192 | if [[ ${docker_allow} == 1 ]]; then 193 | echo -E '{ \"data-root\": \"/export/docker\" }' | sudo tee -a /etc/docker/daemon.json && sudo systemctl restart docker 194 | fi 195 | 196 | fi 197 | 198 | if [[ (! ("${server_security_groups}" =~ "${OS_HTTP_S_SECGROUP_NAME}")) && "${install_opts}" =~ "-j" ]]; then 199 | openstack server add security group ${SERVER_UUID} ${OS_HTTP_S_SECGROUP_NAME} 200 | fi 201 | 202 | echo "Beginning Slurm installation and Compute Image configuration - should take 8-10 minutes." 
203 | 204 | sudo mkdir -p /etc/slurm 205 | sudo cp "${openrc_path}" /etc/slurm/openrc.sh 206 | sudo chmod 400 /etc/slurm/openrc.sh 207 | 208 | sudo ./install_local.sh ${install_opts} 209 | 210 | if [[ ${install_opts} =~ "-j" ]]; then 211 | echo "You will need to edit the file ${PWD}/install_jupyterhub.yml to reflect the public hostname of your new cluster, and use your email for SSL certs." 212 | echo "Then, run the following command from the directory ${PWD} on this instance to complete your jupyterhub setup:" 213 | echo "sudo ansible-playbook -v --ssh-common-args='-o StrictHostKeyChecking=no' install_jupyterhub.yml" 214 | fi 215 | -------------------------------------------------------------------------------- /cluster_destroy.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #This script makes several assumptions: 4 | # 1. Running on a host with openstack client tools installed 5 | # 2. Using a default ssh key in ~/.ssh/ 6 | # 3. The user knows what they're doing. 7 | # 4. Take some options: 8 | # openrc file 9 | # headnode size 10 | # cluster name 11 | # volume size 12 | 13 | show_help() { 14 | echo "Options: 15 | HEADNODE_NAME: required, name of the cluster to delete 16 | OPENRC_PATH: optional, path to a valid openrc file, defaults to ./openrc.sh 17 | VOLUME_DELETE: optional flag, set to delete storage volumes, default false 18 | 19 | Usage: $0 -n -o [OPENRC_PATH] [-v] " 20 | } 21 | 22 | OPTIND=1 23 | 24 | openrc_path="./openrc.sh" 25 | headnode_name="noname" 26 | volume_delete="0" 27 | 28 | while getopts ":hhelp:o:n:v" opt; do 29 | case ${opt} in 30 | h|help|\?) show_help 31 | exit 0 32 | ;; 33 | o) openrc_path=${OPTARG} 34 | ;; 35 | n) headnode_name=${OPTARG} 36 | ;; 37 | v) volume_delete=1 38 | ;; 39 | :) echo "Option -$OPTARG requires an argument." 40 | exit 1 41 | ;; 42 | 43 | esac 44 | done 45 | 46 | 47 | if [[ ! -f ${openrc_path} ]]; then 48 | echo "openrc path: ${openrc_path} \n does not point to a file!" 49 | exit 1 50 | elif [[ ${headnode_name} == "noname" ]]; then 51 | echo "No headnode name provided with -n, exiting!" 52 | exit 1 53 | elif [[ "${volume_delete}" != "0" && "${volume_delete}" != "1" ]]; then 54 | echo "Volume_delete parameter must be 0 or 1 instead of ${volume_delete}" 55 | exit 1 56 | fi 57 | 58 | source ${openrc_path} 59 | 60 | headnode_ip=$(openstack server list -f value -c Networks --name ${headnode_name} | sed 's/.*\(149.[0-9]\{1,3\}.[0-9]\{1,3\}.[0-9]\{1,3\}\).*/\1/') 61 | echo "Removing cluster based on ${headnode_name} at ${headnode_ip}" 62 | 63 | #remove 1st instance of "id", then the only alphanumeric chars left are the openstack id of the instance 64 | volume_id=$(openstack server show -f value -c volumes_attached ${headnode_name} | sed 's/id//' | tr -dc [:alnum:]-) 65 | 66 | openstack server delete ${headnode_name} 67 | 68 | openstack floating ip delete ${headnode_ip} 69 | 70 | #There's only one of each thing floating around, potentially some compute instances that weren't cleaned up and some images... SO! 
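# These names are derived from the headnode name using the same convention as
# cluster_create.sh, so that the resources it created can be found and removed.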
71 | OS_PREFIX=${headnode_name} 72 | OS_SSH_SECGROUP_NAME=${OS_PREFIX}-ssh-global 73 | OS_INTERNAL_SECGROUP_NAME=${OS_PREFIX}-internal 74 | OS_SLURM_KEYPAIR=${OS_PREFIX}-slurm-key 75 | OS_ROUTER_NAME=${OS_PREFIX}-elastic-router 76 | OS_SUBNET_NAME=${OS_PREFIX}-elastic-subnet 77 | OS_NETWORK_NAME=${OS_PREFIX}-elastic-net 78 | OS_APP_CRED=${OS_PREFIX}-slurm-app-cred 79 | 80 | compute_nodes=$(openstack server list -f value -c Name | grep -E "compute-${headnode_name}-base-instance|${headnode_name}-compute" ) 81 | if [[ -n "${compute_nodes}" ]]; then 82 | for node in "${compute_nodes}" 83 | do 84 | echo "Deleting compute node: ${node}" 85 | openstack server delete ${node} 86 | done 87 | fi 88 | 89 | sleep 5 # seems like there are issues with the network deleting correctly 90 | 91 | if [[ "${volume_delete}" == "1" ]]; then 92 | echo "DELETING VOLUME: ${volume_id}" 93 | openstack volume delete ${volume_id} 94 | fi 95 | 96 | openstack security group delete ${OS_SSH_SECGROUP_NAME} 97 | openstack security group delete ${OS_INTERNAL_SECGROUP_NAME} 98 | openstack keypair delete ${OS_SLURM_KEYPAIR} # We don't delete the elastic-key, since it could be a user's key used for other stuff 99 | openstack router unset --external-gateway ${OS_ROUTER_NAME} 100 | openstack router remove subnet ${OS_ROUTER_NAME} ${OS_SUBNET_NAME} 101 | openstack router delete ${OS_ROUTER_NAME} 102 | openstack subnet delete ${OS_SUBNET_NAME} 103 | openstack network delete ${OS_NETWORK_NAME} 104 | 105 | 106 | headnode_images=$(openstack image list --private -f value -c Name | grep ${headnode_name}-compute-image- ) 107 | for image in "${headnode_images}" 108 | do 109 | openstack image delete ${image} 110 | done 111 | 112 | openstack application credential delete ${OS_APP_CRED} 113 | -------------------------------------------------------------------------------- /cluster_destroy_local.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Uncomment below to help with debugging 4 | # set -x 5 | 6 | #This script makes several assumptions: 7 | # 1. Running on a host with openstack client tools installed 8 | # 2. Using a default ssh key in ~/.ssh/ 9 | # 3. The user knows what they're doing. 10 | # 4. Take some options: 11 | # openrc file 12 | # cluster name 13 | # volume size 14 | 15 | show_help() { 16 | echo "Options: 17 | -n: HEADNODE_NAME: required, name of the cluster to delete 18 | -o: OPENRC_PATH: optional, path to a valid openrc file, defaults to ~/openrc.sh 19 | -v: VOLUME_DELETE: optional flag, set to delete storage volumes, default false 20 | -d: HEADNODE_DELETE: delete the headnode once the cluster is deleted 21 | 22 | Usage: $0 -n -o [OPENRC_PATH] [-v] [-d]" 23 | } 24 | 25 | OPTIND=1 26 | 27 | openrc_path="${HOME}/openrc.sh" 28 | headnode_name="$(hostname --short)" 29 | volume_delete="0" 30 | headnode_delete="0" 31 | 32 | while getopts ":hhelp:o:n:v:d" opt; do 33 | case ${opt} in 34 | h|help|\?) show_help 35 | exit 0 36 | ;; 37 | o) openrc_path=${OPTARG} 38 | ;; 39 | n) headnode_name=${OPTARG} 40 | ;; 41 | v) volume_delete=1 42 | ;; 43 | d) headnode_delete=1 44 | ;; 45 | :) echo "Option -$OPTARG requires an argument." 46 | exit 1 47 | ;; 48 | 49 | esac 50 | done 51 | 52 | 53 | if [[ ! -f ${openrc_path} ]]; then 54 | echo "openrc path: ${openrc_path} \n does not point to a file!" 
55 | exit 1 56 | elif [[ "${volume_delete}" != "0" && "${volume_delete}" != "1" ]]; then 57 | echo "Volume_delete parameter must be 0 or 1 instead of ${volume_delete}" 58 | exit 1 59 | elif [[ "${headnode_delete}" != "0" && "${headnode_delete}" != "1" ]]; then 60 | echo "Headnode_delete parameter must be 0 or 1 instead of ${headnode_delete}" 61 | exit 1 62 | fi 63 | 64 | source ${openrc_path} 65 | 66 | #There's only one of each thing floating around, potentially some compute instances that weren't cleaned up and some images... SO! 67 | OS_PREFIX=${headnode_name} 68 | OS_SSH_SECGROUP_NAME=${OS_PREFIX}-ssh-global 69 | OS_INTERNAL_SECGROUP_NAME=${OS_PREFIX}-internal 70 | OS_SLURM_KEYPAIR=${OS_PREFIX}-slurm-key 71 | OS_KEYPAIR_NAME=${OS_PREFIX}-elastic-key 72 | 73 | compute_nodes=$(openstack server list -f value -c Name | grep -E "compute-${headnode_name}-base-instance|${headnode_name}-compute" ) 74 | if [[ -n "${compute_nodes}" ]]; then 75 | for node in "${compute_nodes}" 76 | do 77 | echo "Deleting compute node: ${node}" 78 | openstack server delete ${node} 79 | done 80 | fi 81 | 82 | sleep 5 # seems like there are issues with the network deleting correctly 83 | 84 | SERVER_UUID=$(curl http://169.254.169.254/openstack/latest/meta_data.json | jq '.uuid' | sed -e 's#"##g') 85 | 86 | openstack server remove security group ${SERVER_UUID} ${OS_SSH_SECGROUP_NAME} || true 87 | openstack server remove security group ${SERVER_UUID} ${OS_INTERNAL_SECGROUP_NAME} || true 88 | 89 | openstack security group delete ${OS_SSH_SECGROUP_NAME} 90 | openstack security group delete ${OS_INTERNAL_SECGROUP_NAME} 91 | openstack keypair delete ${OS_SLURM_KEYPAIR} 92 | # We DO delete the elastic-key, since we created it from scratch before 93 | openstack keypair delete ${OS_KEYPAIR_NAME} 94 | 95 | headnode_images=$(openstack image list --private -f value -c Name | grep ${headnode_name}-compute-image- ) 96 | for image in "${headnode_images}" 97 | do 98 | openstack image delete ${image} 99 | done 100 | 101 | if [[ "${headnode_delete}" == "1" ]]; then 102 | echo "DELETING HEADNODE: ${headnode_name}" 103 | openstack server delete ${headnode_name} 104 | fi 105 | -------------------------------------------------------------------------------- /compute_build_base_img.yml: -------------------------------------------------------------------------------- 1 | --- 2 | 3 | - hosts: localhost 4 | 5 | vars: 6 | compute_base_image: "Featured-RockyLinux8" 7 | sec_group_global: "{{ ansible_facts.hostname }}-ssh-global" 8 | sec_group_internal: "{{ ansible_facts.hostname }}-internal" 9 | compute_base_size: "m3.tiny" 10 | network_name: "{{ ansible_facts.hostname }}-elastic-net" 11 | JS_ssh_keyname: "{{ ansible_facts.hostname }}-slurm-key" 12 | openstack_cloud: "openstack" 13 | 14 | vars_files: 15 | - clouds.yaml 16 | 17 | tasks: 18 | 19 | - name: build compute base instance 20 | os_server: 21 | timeout: 300 22 | state: present 23 | name: "compute-{{ ansible_facts.hostname }}-base-instance" 24 | cloud: "{{ openstack_cloud }}" 25 | image: "{{ compute_base_image }}" 26 | key_name: "{{ JS_ssh_keyname }}" 27 | security_groups: "{{ sec_group_global }},{{ sec_group_internal }}" 28 | flavor: "{{ compute_base_size }}" 29 | meta: { compute: "base" } 30 | auto_ip: "no" 31 | user_data: | 32 | #cloud-config 33 | packages: [] 34 | package_update: false 35 | package_upgrade: false 36 | package_reboot_if_required: false 37 | final_message: "Boot completed in $UPTIME seconds" 38 | network: "{{ network_name }}" 39 | wait: yes 40 | register: "os_host" 41 | 
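    # os_host (registered above) holds the Openstack metadata for the new
    # instance; add_host below uses its private_v4 address so the second play
    # can reach the node over the cluster network.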
42 | - debug: 43 | var: os_host 44 | 45 | - name: add compute instance to inventory 46 | add_host: 47 | name: "{{ os_host['openstack']['name'] }}" 48 | groups: "compute-base" 49 | ansible_host: "{{ os_host.openstack.private_v4 }}" 50 | 51 | - name: pause for ssh to come up 52 | pause: 53 | seconds: 90 54 | 55 | 56 | - hosts: compute-base 57 | 58 | vars: 59 | compute_base_package_list: 60 | - "python3-libselinux" 61 | - "telnet" 62 | - "bind-utils" 63 | - "vim" 64 | - "openmpi4-gnu9-ohpc" 65 | - "ohpc-slurm-client" 66 | - "lmod-ohpc" 67 | - "ceph-common" 68 | packages_to_remove: 69 | - "environment-modules" 70 | - "containerd.io.x86_64" 71 | - "docker-ce.x86_64" 72 | - "docker-ce-cli.x86_64" 73 | - "docker-ce-rootless-extras.x86_64" 74 | - "Lmod" 75 | 76 | tasks: 77 | 78 | - name: Get the headnode private IP 79 | local_action: 80 | module: shell source /etc/slurm/openrc.sh && openstack server show $(hostname -s) | grep addresses | awk -F'|' '{print $3}' | awk -F'=' '{print $2}' | awk -F',' '{print $1}' 81 | register: headnode_private_ip 82 | become: False # for running as slurm, since no sudo on localhost 83 | 84 | - name: Get the slurmctld uid 85 | local_action: 86 | module: shell getent passwd slurm | awk -F':' '{print $3}' 87 | register: headnode_slurm_uid 88 | become: False # for running as slurm, since no sudo on localhost 89 | 90 | - name: turn off the firewall 91 | service: 92 | name: firewalld 93 | state: stopped 94 | enabled: no 95 | 96 | - name: Add OpenHPC 2.0 repo 97 | dnf: 98 | name: "http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm" 99 | state: present 100 | lock_timeout: 900 101 | disable_gpg_check: yes 102 | 103 | 104 | - name: Enable CentOS PowerTools repo 105 | command: dnf config-manager --set-enabled powertools 106 | 107 | - name: Disable docker-ce repo 108 | command: dnf config-manager --set-disabled docker-ce-stable 109 | 110 | - name: remove env-modules and docker packages 111 | dnf: 112 | name: "{{ packages_to_remove }}" 113 | state: absent 114 | lock_timeout: 300 115 | 116 | # There is an issue in removing Lmod in early call. 
Seems like we need to run it twice 117 | - name: remove Lmod packages 118 | dnf: 119 | name: Lmod 120 | state: absent 121 | lock_timeout: 300 122 | 123 | - name: install basic packages 124 | dnf: 125 | name: "{{ compute_base_package_list }}" 126 | state: present 127 | lock_timeout: 300 128 | 129 | - name: fix slurm user uid 130 | user: 131 | name: slurm 132 | uid: "{{ headnode_slurm_uid.stdout}}" 133 | shell: "/sbin/nologin" 134 | home: "/etc/slurm" 135 | 136 | - name: create slurm spool directories 137 | file: 138 | path: /var/spool/slurm/ctld 139 | state: directory 140 | owner: slurm 141 | group: slurm 142 | mode: 0755 143 | recurse: yes 144 | 145 | - name: change ownership of slurm files 146 | file: 147 | path: "{{ item }}" 148 | owner: slurm 149 | group: slurm 150 | with_items: 151 | - "/var/spool/slurm" 152 | - "/var/spool/slurm/ctld" 153 | # - "/var/log/slurm_jobacct.log" 154 | 155 | - name: disable selinux 156 | selinux: state=permissive policy=targeted 157 | 158 | # - name: allow use_nfs_home_dirs 159 | # seboolean: name=use_nfs_home_dirs state=yes persistent=yes 160 | 161 | - name: import /home on compute nodes 162 | lineinfile: 163 | dest: /etc/fstab 164 | line: "{{ headnode_private_ip.stdout }}:/home /home nfs defaults,nfsvers=4.0 0 0" 165 | state: present 166 | 167 | - name: ensure /opt/ohpc/pub exists 168 | file: path=/opt/ohpc/pub state=directory mode=777 recurse=yes 169 | 170 | - name: import /opt/ohpc/pub on compute nodes 171 | lineinfile: 172 | dest: /etc/fstab 173 | line: "{{ headnode_private_ip.stdout }}:/opt/ohpc/pub /opt/ohpc/pub nfs defaults,nfsvers=4.0 0 0" 174 | state: present 175 | 176 | - name: ensure /export exists 177 | file: path=/export state=directory mode=777 178 | 179 | - name: import /export on compute nodes 180 | lineinfile: 181 | dest: /etc/fstab 182 | line: "{{ headnode_private_ip.stdout }}:/export /export nfs defaults,nfsvers=4.0 0 0" 183 | state: present 184 | 185 | - name: fix sda1 mount in fstab 186 | lineinfile: 187 | dest: /etc/fstab 188 | regex: "/ xfs defaults" 189 | line: "/dev/sda1 / xfs defaults 0 0" 190 | state: present 191 | 192 | - name: add local users to compute node 193 | script: /tmp/add_users.sh 194 | ignore_errors: True 195 | 196 | - name: copy munge key from headnode 197 | synchronize: 198 | mode: push 199 | src: /etc/munge/munge.key 200 | dest: /etc/munge/munge.key 201 | set_remote_user: no 202 | use_ssh_args: yes 203 | 204 | - name: fix perms on munge key 205 | file: 206 | path: /etc/munge/munge.key 207 | owner: munge 208 | group: munge 209 | mode: 0600 210 | 211 | - name: copy slurm.conf from headnode 212 | synchronize: 213 | mode: push 214 | src: /etc/slurm/slurm.conf 215 | dest: /etc/slurm/slurm.conf 216 | set_remote_user: no 217 | use_ssh_args: yes 218 | 219 | - name: copy slurm_prolog.sh from headnode 220 | synchronize: 221 | mode: push 222 | src: /usr/local/sbin/slurm_prolog.sh 223 | dest: /usr/local/sbin/slurm_prolog.sh 224 | set_remote_user: no 225 | use_ssh_args: yes 226 | 227 | - name: enable munge 228 | service: name=munge.service enabled=yes 229 | 230 | - name: enable slurmd 231 | service: name=slurmd enabled=yes 232 | 233 | #cat /etc/systemd/system/multi-user.target.wants/slurmd.service 234 | #[Unit] 235 | #Description=Slurm node daemon 236 | #After=network.target munge.service #CHANGING TO: network-online.target 237 | #ConditionPathExists=/etc/slurm/slurm.conf 238 | # 239 | #[Service] 240 | #Type=forking 241 | #EnvironmentFile=-/etc/sysconfig/slurmd 242 | #ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS 243 | 
#ExecReload=/bin/kill -HUP $MAINPID 244 | #PIDFile=/var/run/slurmd.pid 245 | #KillMode=process 246 | #LimitNOFILE=51200 247 | #LimitMEMLOCK=infinity 248 | #LimitSTACK=infinity 249 | #Delegate=yes 250 | # 251 | # 252 | #[Install] 253 | #WantedBy=multi-user.target 254 | 255 | - name: change slurmd service "After" to sshd and remote filesystems 256 | command: sed -i 's/network.target/sshd.service remote-fs.target/' /usr/lib/systemd/system/slurmd.service 257 | 258 | - name: add slurmd service "Requires" of sshd and remote filesystems 259 | command: sed -i '/After=network/aRequires=sshd.service remote-fs.target' /usr/lib/systemd/system/slurmd.service 260 | 261 | # - name: mount -a on compute nodes 262 | # command: "mount -a" 263 | 264 | - hosts: localhost 265 | 266 | vars_files: 267 | - clouds.yaml 268 | 269 | tasks: 270 | 271 | - name: create compute instance snapshot 272 | command: ./compute_take_snapshot.sh 273 | 274 | # os_server no longer handles instance state correctly 275 | # - name: remove compute instance 276 | # os_server: 277 | # timeout: 200 278 | # state: absent 279 | # name: "compute-{{ inventory_hostname_short }}-base-instance" 280 | # cloud: "tacc" 281 | -------------------------------------------------------------------------------- /compute_take_snapshot.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | source /etc/slurm/openrc.sh 4 | 5 | compute_image="$(hostname -s)-compute-image-latest" 6 | compute_instance="compute-$(hostname -s)-base-instance" 7 | 8 | openstack server stop ${compute_instance} 9 | 10 | count=0 11 | declare -i count 12 | until [[ ${count} -ge 12 || "${shutoff_check}" =~ "SHUTOFF" ]]; 13 | do 14 | shutoff_check=$(openstack server show -f value -c status ${compute_instance}) 15 | count+=1 16 | sleep 5 17 | done 18 | 19 | image_check=$(openstack image show -f value -c name ${compute_image}) 20 | # If there is already a -latest image, re-name it with the date of its creation 21 | if [[ -n ${image_check} ]]; 22 | then 23 | old_image_date=$(openstack image show ${compute_image} -f value -c created_at | cut -d'T' -f 1) 24 | backup_image_name=${compute_image::-7}-${old_image_date} 25 | 26 | if [[ ${old_image_date} == "$(date +%Y-%m-%d)" && -n "$(openstack image show -f value -c name ${backup_image_name})" ]]; 27 | then 28 | openstack image delete ${backup_image_name} 29 | fi 30 | 31 | openstack image set --name ${backup_image_name} ${compute_image} 32 | fi 33 | 34 | openstack server image create --name ${compute_image} ${compute_instance} 35 | 36 | count=0 37 | declare -i count 38 | until [[ ${count} -ge 20 || "${instance_check}" =~ "active" ]]; 39 | do 40 | instance_check=$(openstack image show -f value -c status ${compute_image}) 41 | count+=1 42 | sleep 15 43 | done 44 | 45 | if [[ ${count} -ge 20 ]]; 46 | then 47 | echo "Image still in queued status after 300 seconds" 48 | exit 2 49 | fi 50 | 51 | echo "Done after ${count} sleeps." 
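# Illustrative note (not part of the original script): at this point the new
# "-latest" snapshot should be ACTIVE, and any pre-existing "-latest" image was
# renamed earlier in the script with its creation date, e.g. for a hypothetical
# hostname "headnode":
#   headnode-compute-image-latest  ->  headnode-compute-image-2024-05-01
# A quick manual sanity check before the base instance is deleted below:
#   openstack image show -f value -c status headnode-compute-image-latest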
52 | 53 | openstack image list | grep $(hostname -s) 54 | openstack server delete ${compute_instance} 55 | -------------------------------------------------------------------------------- /cron-node-check.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | sinfo_check=$(sinfo | grep -iE "drain|down") 4 | 5 | #mail_domain=$(curl -s https://ipinfo.io/hostname) 6 | mail_domain=$(host $(curl -s http://169.254.169.254/latest/meta-data/public-ipv4) | sed 's/.*pointer \(.*\)./\1/') 7 | 8 | email_addr="" 9 | 10 | try_count=0 11 | declare -i try_count 12 | until [[ -n $mail_domain || $try_count -ge 10 ]]; 13 | do 14 | sleep 3 15 | mail_domain=$(curl -s https://ipinfo.io/hostname) 16 | try_count=$try_count+1 17 | echo $mail_domain, $try_count 18 | done 19 | 20 | if [[ $try_count -ge 10 ]]; then 21 | echo "failed to get domain name!" 22 | exit 1 23 | fi 24 | 25 | if [[ -n $sinfo_check ]]; then 26 | echo $sinfo_check | mailx -r "node-check@$mail_domain" -s "NODE IN BAD STATE - $mail_domain" $email_addr 27 | # echo "$sinfo_check mailx -r "node-check@$mail_domain" -s "NODE IN BAD STATE - $mail_domain" $email_addr" # TESTING LINE 28 | fi 29 | 30 | #Check for ACTIVE nodes without running/cf/cg jobs 31 | squeue_check=$(squeue -h -t CF,CG,R) 32 | 33 | #source the openrc.sh for instance check 34 | $(sudo cat /etc/slurm/openrc.sh) 35 | compute_node_check=$(openstack server list | awk '/compute/ && /ACTIVE/') 36 | 37 | if [[ -n $compute_node_check && -z $squeue_check ]]; then 38 | echo $compute_node_check $squeue_check | mailx -r "node-check@$mail_domain" -s "NODE IN ACTIVE STATE WITHOUT JOBS- $mail_domain" $email_addr 39 | fi 40 | -------------------------------------------------------------------------------- /figures/virtual-clusters.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/access-ci-org/Jetstream_Cluster/29d3c0dfe54ed09c0501f9547c5b860e87efbbe3/figures/virtual-clusters.jpeg -------------------------------------------------------------------------------- /install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | OPTIND=1 4 | 5 | docker_allow=0 #default to NOT installing docker; must be 0 or 1 6 | jhub_build=0 #default to NOT installing jupyterhub; must be 0 or 1 7 | 8 | while getopts ":jd" opt; do 9 | case ${opt} in 10 | d) docker_allow=1 11 | ;; 12 | j) jhub_build=1 13 | ;; 14 | \?) echo "BAD OPTION! $opt TRY AGAIN" 15 | exit 1 16 | ;; 17 | esac 18 | done 19 | 20 | if [[ ! -e /etc/slurm/openrc.sh ]]; then 21 | echo "NO OPENRC FOUND! CREATE ONE, AND TRY AGAIN!" 22 | exit 1 23 | fi 24 | 25 | if [[ $EUID -ne 0 ]]; then 26 | echo "This script must be run as root" 27 | exit 1 28 | fi 29 | 30 | #do this early, allow the user to leave while the rest runs! 
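# Illustrative sketch (not part of the original script): /etc/slurm/openrc.sh is
# expected to export the OpenStack auth variables consumed below when clouds.yaml
# is generated; roughly, with placeholder values:
#   export OS_AUTH_TYPE=v3applicationcredential
#   export OS_AUTH_URL=https://your-cloud.example.org:5000/v3
#   export OS_IDENTITY_API_VERSION=3
#   export OS_APPLICATION_CREDENTIAL_ID=<application-credential-id>
#   export OS_APPLICATION_CREDENTIAL_SECRET=<application-credential-secret>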
31 | source /etc/slurm/openrc.sh 32 | 33 | OS_PREFIX=$(hostname -s) 34 | OS_SLURM_KEYPAIR=${OS_PREFIX}-slurm-key 35 | 36 | SUBNET_PREFIX=10.0.0 37 | 38 | #Open the firewall on the internal network for Cent8 39 | firewall-cmd --permanent --add-rich-rule="rule source address="${SUBNET_PREFIX}.0/24" family='ipv4' accept" 40 | firewall-cmd --add-rich-rule="rule source address="${SUBNET_PREFIX}.0/24" family='ipv4' accept" 41 | 42 | dnf -y install http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm \ 43 | centos-release-openstack-train 44 | 45 | dnf config-manager --set-enabled powertools 46 | 47 | if [[ ${docker_allow} == 0 ]]; then 48 | dnf config-manager --set-disabled docker-ce-stable 49 | 50 | dnf -y remove containerd.io.x86_64 docker-ce.x86_64 docker-ce-cli.x86_64 docker-ce-rootless-extras.x86_64 51 | fi 52 | 53 | dnf -y --allowerasing install \ 54 | ohpc-slurm-server \ 55 | vim \ 56 | ansible \ 57 | mailx \ 58 | lmod-ohpc \ 59 | bash-completion \ 60 | gnu9-compilers-ohpc \ 61 | openmpi4-gnu9-ohpc \ 62 | singularity-ohpc \ 63 | lmod-defaults-gnu9-openmpi4-ohpc \ 64 | moreutils \ 65 | bind-utils \ 66 | python3-openstackclient \ 67 | python3-pexpect 68 | 69 | dnf -y update # until the base python2-openstackclient install works out of the box! 70 | 71 | #create user that can be used to submit jobs 72 | [ ! -d /home/gateway-user ] && useradd -m gateway-user 73 | 74 | [ ! -f slurm-key ] && ssh-keygen -b 2048 -t rsa -P "" -f slurm-key 75 | 76 | # generate a local key for centos for after homedirs are mounted! 77 | [ ! -f /home/centos/.ssh/id_rsa ] && su centos - -c 'ssh-keygen -t rsa -b 2048 -P "" -f /home/centos/.ssh/id_rsa && cat /home/centos/.ssh/id_rsa.pub >> /home/centos/.ssh/authorized_keys' 78 | 79 | 80 | #create clouds.yaml file from contents of openrc 81 | echo -e "clouds: 82 | tacc: 83 | auth: 84 | auth_url: '${OS_AUTH_URL}' 85 | application_credential_id: '${OS_APPLICATION_CREDENTIAL_ID}' 86 | application_credential_secret: '${OS_APPLICATION_CREDENTIAL_SECRET}' 87 | user_domain_name: tacc 88 | identity_api_version: 3 89 | project_domain_name: tacc 90 | auth_type: 'v3applicationcredential'" > clouds.yaml 91 | 92 | #Make sure only root can read this 93 | chmod 400 clouds.yaml 94 | 95 | if [[ -n $(openstack keypair list | grep ${OS_SLURM_KEYPAIR}) ]]; then 96 | openstack keypair delete ${OS_SLURM_KEYPAIR} 97 | openstack keypair create --public-key slurm-key.pub ${OS_SLURM_KEYPAIR} 98 | else 99 | openstack keypair create --public-key slurm-key.pub ${OS_SLURM_KEYPAIR} 100 | fi 101 | 102 | #TACC-specific changes: 103 | 104 | if [[ $OS_AUTH_URL =~ "tacc" ]]; then 105 | #Insert headnode into /etc/hosts 106 | echo "$(ip add show dev eth0 | awk '/inet / {sub("/24","",$2); print $2}') $(hostname) $(hostname -s)" >> /etc/hosts 107 | fi 108 | 109 | #Get OS Network name of *this* server, and set as the network for compute-nodes 110 | # Only need this if you've changed the subnet name for some reason 111 | #headnode_os_subnet=$(openstack server show $(hostname | cut -f 1 -d'.') | awk '/addresses/ {print $4}' | cut -f 1 -d'=') 112 | #sed -i "s/network_name=.*/network_name=$headnode_os_subnet/" ./slurm_resume.sh 113 | 114 | #Set compute node names to $OS_PREFIX-compute- 115 | sed -i "s/=compute-*/=${OS_PREFIX}-compute-/" ./slurm.conf 116 | sed -i "s/Host compute-*/Host ${OS_PREFIX}-compute-/" ./ssh.cfg 117 | 118 | #set the subnet in ssh.cfg and compute_build_base_img.yml 119 | sed -i "s/Host 10.0.0.\*/Host ${SUBNET_PREFIX}.\*/" ./ssh.cfg 120 | sed -i 
"s/^\(.*\)10.0.0\(.*\)$/\1${SUBNET_PREFIX}\2/" ./compute_build_base_img.yml 121 | 122 | # Deal with files required by slurm - better way to encapsulate this section? 123 | 124 | mkdir -p -m 700 /etc/slurm/.ssh 125 | 126 | cp slurm-key slurm-key.pub /etc/slurm/.ssh/ 127 | 128 | #Make sure slurm-user will still be valid after the nfs mount happens! 129 | cat slurm-key.pub >> /home/centos/.ssh/authorized_keys 130 | 131 | chown -R slurm:slurm /etc/slurm/.ssh 132 | 133 | setfacl -m u:slurm:rw /etc/hosts 134 | setfacl -m u:slurm:rwx /etc/ 135 | 136 | chmod +t /etc 137 | 138 | #The following may be removed when appcred gen during cluster_create is working 139 | ##Possible to handle this at the cloud-init level? From a machine w/ 140 | ## pre-loaded openrc, possible via user-data and write_files, yes. 141 | ## This needs a check for success, and if not, fail? 142 | ##export $(openstack application credential create -f shell ${OS_APP_CRED} | sed 's/^\(.*\)/OS_ac_\1/') 143 | ##echo -e "export OS_AUTH_TYPE=v3applicationcredential 144 | ##export OS_AUTH_URL=${OS_AUTH_URL} 145 | ##export OS_IDENTITY_API_VERSION=3 146 | ##export OS_REGION_NAME="RegionOne" 147 | ##export OS_INTERFACE=public 148 | ##export OS_APPLICATION_CREDENTIAL_ID=${OS_ac_id} 149 | ##export OS_APPLICATION_CREDENTIAL_SECRET=${OS_ac_secret} > /etc/slurm/openrc.sh 150 | # 151 | #echo -e "export OS_PROJECT_DOMAIN_NAME=tacc 152 | #export OS_USER_DOMAIN_NAME=tacc 153 | #export OS_PROJECT_NAME=${OS_PROJECT_NAME} 154 | #export OS_USERNAME=${OS_USERNAME} 155 | #export OS_PASSWORD=${OS_PASSWORD} 156 | #export OS_AUTH_URL=${OS_AUTH_URL} 157 | #export OS_IDENTITY_API_VERSION=3" > /etc/slurm/openrc.sh 158 | 159 | #chown slurm:slurm /etc/slurm/openrc.sh 160 | 161 | #chmod 400 /etc/slurm/openrc.sh 162 | 163 | cp prevent-updates.ci /etc/slurm/ 164 | 165 | chown slurm:slurm /etc/slurm/openrc.sh 166 | chown slurm:slurm /etc/slurm/prevent-updates.ci 167 | 168 | mkdir -p /var/log/slurm 169 | 170 | touch /var/log/slurm/slurm_elastic.log 171 | touch /var/log/slurm/os_clean.log 172 | 173 | chown -R slurm:slurm /var/log/slurm 174 | 175 | cp slurm-logrotate.conf /etc/logrotate.d/slurm 176 | 177 | setfacl -m u:slurm:rw /etc/ansible/hosts 178 | setfacl -m u:slurm:rwx /etc/ansible/ 179 | 180 | cp slurm_*.sh /usr/local/sbin/ 181 | 182 | cp cron-node-check.sh /usr/local/sbin/ 183 | cp clean-os-error.sh /usr/local/sbin/ 184 | 185 | chown slurm:slurm /usr/local/sbin/slurm_*.sh 186 | chown slurm:slurm /usr/local/sbin/clean-os-error.sh 187 | 188 | chown centos:centos /usr/local/sbin/cron-node-check.sh 189 | 190 | echo "#13 */6 * * * centos /usr/local/sbin/cron-node-check.sh" >> /etc/crontab 191 | echo "#*/4 * * * * slurm /usr/local/sbin/clean-os-error.sh" >> /etc/crontab 192 | 193 | #"dynamic" hostname adjustment 194 | sed -i "s/ControlMachine=slurm-example/ControlMachine=$(hostname -s)/" ./slurm.conf 195 | cp slurm.conf /etc/slurm/slurm.conf 196 | 197 | cp ansible.cfg /etc/ansible/ 198 | 199 | cp ssh.cfg /etc/ansible/ 200 | 201 | cp slurm_test.job ${HOME} 202 | 203 | #create share directory 204 | mkdir -m 777 -p /export 205 | 206 | #create export of homedirs and /export and /opt/ohpc/pub 207 | echo -e "/home ${SUBNET_PREFIX}.0/24(rw,no_root_squash) \n/export ${SUBNET_PREFIX}.0/24(rw,no_root_squash)" > /etc/exports 208 | echo -e "/opt/ohpc/pub ${SUBNET_PREFIX}.0/24(rw,no_root_squash)" >> /etc/exports 209 | 210 | #Get latest CentOS7 minimal image for base - if os_image_facts or the os API allowed for wildcards, 211 | # this would be different. 
But this is the world we live in. 212 | # After the naming convention change of May 5, 2020, this is no longer necessary - JS-API-Featured-CentOS7-Latest is the default. 213 | # These lines remain as a testament to past struggles. 214 | #centos_base_image=$(openstack image list --status active | grep -iE "API-Featured-centos7-[[:alpha:]]{3,4}-[0-9]{2}-[0-9]{4}" | awk '{print $4}' | tail -n 1) 215 | #centos_base_image="JS-API-Featured-CentOS7-Latest" 216 | #sed -i "s/\(\s*compute_base_image: \).*/\1\"${centos_base_image}\"/" compute_build_base_img.yml | head -n 10 217 | 218 | #create temporary script to add local users 219 | echo "#!/bin/bash" > /tmp/add_users.sh 220 | cat /etc/passwd | awk -F':' '$4 >= 1001 && $4 < 65000 {print "useradd -M -u", $3, $1}' >> /tmp/add_users.sh 221 | 222 | # build instance for compute base image generation, take snapshot, and destroy it 223 | echo "Creating compute image! based on $centos_base_image" 224 | 225 | ansible-playbook -v --ssh-common-args='-o StrictHostKeyChecking=no' compute_build_base_img.yml 226 | 227 | #to allow other users to run ansible! 228 | rm -r /tmp/.ansible 229 | 230 | if [[ ${jhub_build} == 1 ]]; then 231 | ansible-galaxy collection install community.general 232 | ansible-galaxy collection install ansible.posix 233 | ansible-galaxy install geerlingguy.certbot 234 | # ansible-playbook -v --ssh-common-args='-o StrictHostKeyChecking=no' install_jupyterhub.yml 235 | fi 236 | 237 | #Start required services 238 | systemctl enable slurmctld munge nfs-server rpcbind 239 | systemctl restart munge slurmctld nfs-server rpcbind 240 | 241 | echo -e "If you wish to enable an email when node state is drain or down, please uncomment \nthe cron-node-check.sh job in /etc/crontab, and place your email of choice in the 'email_addr' variable \nat the beginning of /usr/local/sbin/cron-node-check.sh" 242 | -------------------------------------------------------------------------------- /install_jupyterhub.yml: -------------------------------------------------------------------------------- 1 | --- 2 | 3 | - hosts: localhost 4 | 5 | vars: 6 | headnode_public_hostname: FILL-ME-IN 7 | headnode_alternate_hostname: "" #Optional addition DNS entry pointing to your host 8 | certbot_create_if_missing: yes 9 | certbot_admin_email: FILL-ME-IN 10 | certbot_install_method: snap 11 | certbot_create_method: standalone 12 | certbot_certs: 13 | - domains: 14 | - "{{ headnode_public_hostname }}" 15 | certbot_create_standalone_stop_services: 16 | - httpd 17 | 18 | roles: 19 | - geerlingguy.certbot 20 | 21 | pre_tasks: 22 | 23 | - name: disable selinux 24 | ansible.posix.selinux: 25 | policy: targeted 26 | state: permissive 27 | 28 | - name: install httpd bits 29 | dnf: 30 | state: latest 31 | name: 32 | - nodejs 33 | - npm 34 | - httpd 35 | - httpd-filesystem 36 | - httpd-tools 37 | - python3-certbot-apache 38 | - snapd 39 | - snap-confine 40 | - snapd-selinux 41 | 42 | - name: start and enable snapd 43 | service: 44 | name: snapd 45 | state: started 46 | enabled: yes 47 | 48 | - name: add http/s to firewalld 49 | shell: firewall-cmd --add-service http --zone=public --permanent && \ 50 | firewall-cmd --add-service https --zone=public --permanent && \ 51 | firewall-cmd --reload 52 | 53 | tasks: 54 | 55 | - name: Get the headnode private IP 56 | local_action: 57 | module: shell ip addr | grep -Eo '10.0.0.[0-9]*' | head -1 58 | register: headnode_private_ip 59 | 60 | - name: Get the headnode hostname 61 | local_action: 62 | module: shell hostname -s 63 | register: 
headnode_hostname 64 | 65 | - name: https redirect config 66 | template: 67 | src: jhub_files/https_redirect.conf.j2 68 | dest: /etc/httpd/conf.d/https_redirect.conf 69 | owner: root 70 | mode: 0644 71 | 72 | - name: jupyterhub proxy config 73 | template: 74 | src: jhub_files/jupyterhub.conf.j2 75 | dest: /etc/httpd/conf.d/jupyterhub.conf 76 | owner: root 77 | mode: 0644 78 | 79 | - name: restart httpd 80 | service: 81 | name: httpd 82 | state: restarted 83 | enabled: yes 84 | 85 | - name: create a shadow group 86 | group: 87 | name: shadow 88 | state: present 89 | 90 | - name: let shadow group read /etc/shadow 91 | file: 92 | path: /etc/shadow 93 | mode: 0040 94 | group: shadow 95 | owner: root 96 | 97 | - name: create jupyterhub user and group 98 | user: 99 | name: jupyterhub 100 | state: present 101 | groups: shadow 102 | 103 | - name: create jupyterhub-users group 104 | group: 105 | name: jupyterhub-users 106 | state: present 107 | 108 | - name: create sudoers directory 109 | file: 110 | path: /etc/sudoers.d 111 | owner: root 112 | group: root 113 | mode: 0750 114 | state: directory 115 | 116 | - name: set sudoers permissions for jupyterhub non-root 117 | copy: 118 | src: jhub_files/jhub_sudoers 119 | dest: /etc/sudoers.d/ 120 | owner: root 121 | group: root 122 | mode: 0440 123 | 124 | - name: create jupyterhub config dir 125 | file: 126 | path: /etc/jupyterhub 127 | owner: jupyterhub 128 | group: jupyterhub 129 | mode: 0755 130 | state: directory 131 | 132 | - name: install devel deps for building Python 133 | dnf: 134 | state: latest 135 | name: 136 | - bzip2-devel 137 | - ncurses-devel 138 | - gdbm-devel 139 | - libsqlite3x-devel 140 | - sqlite-devel 141 | - libuuid-devel 142 | - uuid-devel 143 | - openssl-devel 144 | - readline-devel 145 | - zlib-devel 146 | - libffi-devel 147 | - xz-devel 148 | - tk-devel 149 | 150 | - name: install configurable-http-proxy 151 | npm: 152 | name: configurable-http-proxy 153 | global: yes 154 | 155 | - name: create tmp builddir 156 | file: 157 | path: /tmp/build/ 158 | state: directory 159 | 160 | - name: fetch python source 161 | unarchive: 162 | src: https://www.python.org/ftp/python/3.8.10/Python-3.8.10.tgz 163 | dest: /tmp/build/ 164 | remote_src: yes 165 | 166 | - name: run python configure 167 | command: 168 | cmd: ./configure --prefix=/opt/python3 169 | chdir: /tmp/build/Python-3.8.10 170 | 171 | - name: build python source 172 | community.general.make: 173 | target: all 174 | chdir: /tmp/build/Python-3.8.10 175 | 176 | - name: install python 177 | community.general.make: 178 | target: install 179 | chdir: /tmp/build/Python-3.8.10 180 | become: yes 181 | 182 | - name: run python configure for public build 183 | command: 184 | cmd: ./configure --prefix=/opt/ohpc/pub/compiler/python3 185 | chdir: /tmp/build/Python-3.8.10 186 | 187 | - name: install python publicly 188 | community.general.make: 189 | target: install 190 | chdir: /tmp/build/Python-3.8.10 191 | become: yes 192 | 193 | - name: install jupyterhub 194 | pip: 195 | executable: /opt/python3/bin/pip3 196 | name: jupyterhub 197 | 198 | - name: install wrapspawner 199 | pip: 200 | executable: /opt/python3/bin/pip3 201 | name: 202 | - wrapspawner 203 | - traitlets<5 204 | 205 | - name: install jupyterlab 206 | pip: 207 | executable: /opt/ohpc/pub/compiler/python3/bin/pip3 208 | name: jupyterlab 209 | 210 | - name: create jupyterhub service 211 | template: 212 | src: jhub_files/jhub_service.j2 213 | dest: /etc/systemd/system/jupyterhub.service 214 | mode: 0644 215 | owner: root 216 | 
group: root 217 | 218 | #This is hard b/c of Batchspawner config 219 | - name: install base jupyterhub config 220 | copy: 221 | src: jhub_files/jhub_conf.py 222 | dest: /etc/jupyterhub/jupyterhub_config.py 223 | owner: jupyterhub 224 | group: jupyterhub 225 | mode: 0644 226 | 227 | - name: set headnode ip in jhub_config 228 | lineinfile: 229 | regexp: JEC_HEADNODE_IP 230 | line: "c.JupyterHub.hub_ip = \'{{ headnode_private_ip.stdout }}\' #JEC_HEADNODE_IP" 231 | path: /etc/jupyterhub/jupyterhub_config.py 232 | 233 | - name: set hostname in jhub_config for batchspawner 234 | lineinfile: 235 | regexp: JEC_SPAWNER_HOSTNAME 236 | line: "c.BatchSpawnerBase.req_host = \'{{ headnode_hostname.stdout }}\' #JEC_SPAWNER_HOSTNAME " 237 | path: /etc/jupyterhub/jupyterhub_config.py 238 | 239 | - name: set hostname in jhub_config for batchspawner 240 | lineinfile: 241 | regexp: JEC_PUBLIC_HOSTNAME 242 | line: "public_hostname = \'{{ headnode_public_hostname }}\' #JEC_PUBLIC_HOSTNAME" 243 | path: /etc/jupyterhub/jupyterhub_config.py 244 | 245 | - name: install batchspawner to jhub python 246 | pip: 247 | name: batchspawner 248 | executable: /opt/python3/bin/pip3 249 | 250 | - name: install batchspawner to public python 251 | pip: 252 | name: batchspawner 253 | executable: /opt/ohpc/pub/compiler/python3/bin/pip3 254 | 255 | - name: create python module dir 256 | file: 257 | state: directory 258 | path: /opt/ohpc/pub/modulefiles/python3.8 259 | 260 | - name: create python module 261 | copy: 262 | src: jhub_files/python_mod_3.8 263 | dest: /opt/ohpc/pub/modulefiles/python3.8/3.8.10 264 | mode: 0777 265 | owner: root 266 | group: root 267 | 268 | - name: start the jupyterhub service 269 | service: 270 | name: jupyterhub 271 | enabled: yes 272 | state: started 273 | -------------------------------------------------------------------------------- /install_local.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Uncomment below to help with debugging 4 | # set -x 5 | 6 | OPTIND=1 7 | 8 | docker_allow=0 #default to NOT installing docker; must be 0 or 1 9 | jhub_build=0 #default to NOT installing jupyterhub; must be 0 or 1 10 | 11 | while getopts ":jd" opt; do 12 | case ${opt} in 13 | d) docker_allow=1 14 | ;; 15 | j) jhub_build=1 16 | ;; 17 | \?) echo "BAD OPTION! $opt TRY AGAIN" 18 | exit 1 19 | ;; 20 | esac 21 | done 22 | 23 | if [[ ! -e /etc/slurm/openrc.sh ]]; then 24 | echo "NO OPENRC FOUND! CREATE ONE, AND TRY AGAIN!" 25 | exit 1 26 | fi 27 | 28 | if [[ $EUID -ne 0 ]]; then 29 | echo "This script must be run as root" 30 | exit 1 31 | fi 32 | 33 | #do this early, allow the user to leave while the rest runs! 34 | source /etc/slurm/openrc.sh 35 | 36 | OS_PREFIX=$(hostname -s) 37 | OS_SLURM_KEYPAIR=${OS_PREFIX}-slurm-key 38 | 39 | HEADNODE_NETWORK=$(openstack server show $(hostname -s) | grep addresses | awk -F'|' '{print $3}' | awk -F'=' '{print $1}' | awk '{$1=$1};1') 40 | HEADNODE_IP=$(openstack server show $(hostname -s) | grep addresses | awk -F'|' '{print $3}' | awk -F'=' '{print $2}' | awk -F',' '{print $1}') 41 | SUBNET=$(ip addr | grep $HEADNODE_IP | awk '{print $2}') 42 | SUBNET_PREFIX=$(ip addr | grep $HEADNODE_IP | awk '{print $2}' | awk -F. '{print $1 "." $2 "." $3 ".*"}') 43 | 44 | #Open the firewall on the internal network for Cent8. Use offline tool as this runs as a cloud init script. 
45 | # See the discussion : https://titanwolf.org/Network/Articles/Article?AID=ca474d74-d632-4b1e-9b03-cd10add19633 46 | firewall-offline-cmd --add-rich-rule="rule source address="${SUBNET}" family='ipv4' accept" 47 | systemctl enable firewalld 48 | systemctl restart firewalld 49 | 50 | dnf -y install http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm 51 | 52 | dnf config-manager --set-enabled powertools 53 | 54 | if [[ ${docker_allow} == 0 ]]; then 55 | dnf config-manager --set-disabled docker-ce-stable 56 | 57 | dnf -y remove containerd.io.x86_64 docker-ce.x86_64 docker-ce-cli.x86_64 docker-ce-rootless-extras.x86_64 58 | fi 59 | 60 | dnf -y --allowerasing install \ 61 | ohpc-slurm-server \ 62 | vim \ 63 | mailx \ 64 | lmod-ohpc \ 65 | bash-completion \ 66 | gnu9-compilers-ohpc \ 67 | openmpi4-gnu9-ohpc \ 68 | singularity-ohpc \ 69 | lmod-defaults-gnu9-openmpi4-ohpc \ 70 | moreutils \ 71 | bind-utils \ 72 | python3-pexpect 73 | 74 | pip3 install ansible 75 | mkdir -p /etc/ansible 76 | ln -s /usr/local/bin/ansible-playbook /usr/bin/ansible-playbook 77 | 78 | pip3 install openstacksdk==0.61.0 79 | pip3 install python-openstackclient 80 | 81 | dnf -y update # until the base python2-openstackclient install works out of the box! 82 | 83 | #create user that can be used to submit jobs 84 | [ ! -d /home/gateway-user ] && useradd -m gateway-user 85 | 86 | [ ! -f slurm-key ] && ssh-keygen -b 2048 -t rsa -P "" -f slurm-key 87 | 88 | # generate a local key for centos for after homedirs are mounted! 89 | [ ! -f /home/rocky/.ssh/id_rsa ] && su rocky - -c 'ssh-keygen -t rsa -b 2048 -P "" -f /home/rocky/.ssh/id_rsa && cat /home/rocky/.ssh/id_rsa.pub >> /home/rocky/.ssh/authorized_keys' 90 | 91 | 92 | #create clouds.yaml file from contents of openrc 93 | echo -e "clouds: 94 | openstack: 95 | auth: 96 | auth_url: '${OS_AUTH_URL}' 97 | application_credential_id: '${OS_APPLICATION_CREDENTIAL_ID}' 98 | application_credential_secret: '${OS_APPLICATION_CREDENTIAL_SECRET}' 99 | region_name: '${OS_REGION_NAME}' 100 | interface: '${OS_INTERFACE}' 101 | identity_api_version: '${OS_IDENTITY_API_VERSION}' 102 | auth_type: 'v3applicationcredential'" > clouds.yaml 103 | 104 | #Make sure only root can read this 105 | chmod 400 clouds.yaml 106 | 107 | if [[ -n $(openstack keypair list | grep ${OS_SLURM_KEYPAIR}) ]]; then 108 | openstack keypair delete ${OS_SLURM_KEYPAIR} 109 | openstack keypair create --public-key slurm-key.pub ${OS_SLURM_KEYPAIR} 110 | else 111 | openstack keypair create --public-key slurm-key.pub ${OS_SLURM_KEYPAIR} 112 | fi 113 | 114 | #TACC-specific changes: 115 | 116 | if [[ $OS_AUTH_URL =~ "tacc" ]]; then 117 | #Insert headnode into /etc/hosts 118 | echo "$(ip addr | grep -Eo '10.0.0.[0-9]*' | head -1) $(hostname) $(hostname -s)" >> /etc/hosts 119 | fi 120 | 121 | #Get OS Network name of *this* server, and set as the network for compute-nodes 122 | # Only need this if you've changed the subnet name for some reason 123 | #headnode_os_subnet=$(openstack server show $(hostname | cut -f 1 -d'.') | awk '/addresses/ {print $4}' | cut -f 1 -d'=') 124 | #sed -i "s/network_name=.*/network_name=$headnode_os_subnet/" ./slurm_resume.sh 125 | 126 | #Set compute node names to $OS_PREFIX-compute- 127 | sed -i "s/=compute-*/=${OS_PREFIX}-compute-/" ./slurm.conf 128 | sed -i "s/Host compute-*/Host ${OS_PREFIX}-compute-/" ./ssh.cfg 129 | 130 | #set the subnet in ssh.cfg and compute_build_base_img.yml 131 | sed -i "s/Host 10.0.0.\*/Host ${SUBNET_PREFIX}/" ./ssh.cfg 132 | sed -i 
"s/{{ ansible_facts.hostname }}-elastic-net/${HEADNODE_NETWORK}/" ./compute_build_base_img.yml 133 | 134 | # Deal with files required by slurm - better way to encapsulate this section? 135 | 136 | mkdir -p -m 700 /etc/slurm/.ssh 137 | 138 | cp slurm-key slurm-key.pub /etc/slurm/.ssh/ 139 | 140 | #Make sure slurm-user will still be valid after the nfs mount happens! 141 | cat slurm-key.pub >> /home/rocky/.ssh/authorized_keys 142 | 143 | chown -R slurm:slurm /etc/slurm/.ssh 144 | 145 | setfacl -m u:slurm:rw /etc/hosts 146 | setfacl -m u:slurm:rwx /etc/ 147 | 148 | chmod +t /etc 149 | 150 | #The following may be removed when appcred gen during cluster_create is working 151 | ##Possible to handle this at the cloud-init level? From a machine w/ 152 | ## pre-loaded openrc, possible via user-data and write_files, yes. 153 | ## This needs a check for success, and if not, fail? 154 | ##export $(openstack application credential create -f shell ${OS_APP_CRED} | sed 's/^\(.*\)/OS_ac_\1/') 155 | ##echo -e "export OS_AUTH_TYPE=v3applicationcredential 156 | ##export OS_AUTH_URL=${OS_AUTH_URL} 157 | ##export OS_IDENTITY_API_VERSION=3 158 | ##export OS_REGION_NAME="RegionOne" 159 | ##export OS_INTERFACE=public 160 | ##export OS_APPLICATION_CREDENTIAL_ID=${OS_ac_id} 161 | ##export OS_APPLICATION_CREDENTIAL_SECRET=${OS_ac_secret} > /etc/slurm/openrc.sh 162 | # 163 | #echo -e "export OS_PROJECT_DOMAIN_NAME=tacc 164 | #export OS_USER_DOMAIN_NAME=tacc 165 | #export OS_PROJECT_NAME=${OS_PROJECT_NAME} 166 | #export OS_USERNAME=${OS_USERNAME} 167 | #export OS_PASSWORD=${OS_PASSWORD} 168 | #export OS_AUTH_URL=${OS_AUTH_URL} 169 | #export OS_IDENTITY_API_VERSION=3" > /etc/slurm/openrc.sh 170 | 171 | #chown slurm:slurm /etc/slurm/openrc.sh 172 | 173 | #chmod 400 /etc/slurm/openrc.sh 174 | 175 | cp prevent-updates.ci /etc/slurm/ 176 | 177 | chown slurm:slurm /etc/slurm/openrc.sh 178 | chown slurm:slurm /etc/slurm/prevent-updates.ci 179 | 180 | mkdir -p /var/log/slurm 181 | 182 | touch /var/log/slurm/slurm_elastic.log 183 | touch /var/log/slurm/os_clean.log 184 | 185 | chown -R slurm:slurm /var/log/slurm 186 | 187 | cp slurm-logrotate.conf /etc/logrotate.d/slurm 188 | 189 | setfacl -m u:slurm:rw /etc/ansible/hosts 190 | setfacl -m u:slurm:rwx /etc/ansible/ 191 | 192 | cp slurm_*.sh /usr/local/sbin/ 193 | 194 | cp cron-node-check.sh /usr/local/sbin/ 195 | cp clean-os-error.sh /usr/local/sbin/ 196 | 197 | chown slurm:slurm /usr/local/sbin/slurm_*.sh 198 | chown slurm:slurm /usr/local/sbin/clean-os-error.sh 199 | 200 | chown rocky:rocky /usr/local/sbin/cron-node-check.sh 201 | 202 | echo "#13 */6 * * * rocky /usr/local/sbin/cron-node-check.sh" >> /etc/crontab 203 | echo "#*/4 * * * * slurm /usr/local/sbin/clean-os-error.sh" >> /etc/crontab 204 | 205 | #"dynamic" hostname adjustment 206 | sed -i "s/ControlMachine=slurm-example/ControlMachine=$(hostname -s)/" ./slurm.conf 207 | cp slurm.conf /etc/slurm/slurm.conf 208 | 209 | cp ansible.cfg /etc/ansible/ 210 | 211 | cp ssh.cfg /etc/ansible/ 212 | 213 | cp slurm_test.job ${HOME} 214 | 215 | #create share directory 216 | mkdir -m 777 -p /export 217 | 218 | #create export of homedirs and /export and /opt/ohpc/pub 219 | echo -e "/home ${SUBNET}(rw,no_root_squash) \n/export ${SUBNET}(rw,no_root_squash)" > /etc/exports 220 | echo -e "/opt/ohpc/pub ${SUBNET}(rw,no_root_squash)" >> /etc/exports 221 | 222 | #Get latest CentOS7 minimal image for base - if os_image_facts or the os API allowed for wildcards, 223 | # this would be different. But this is the world we live in. 
224 | # After the naming convention change of May 5, 2020, this is no longer necessary - JS-API-Featured-CentOS7-Latest is the default. 225 | # These lines remain as a testament to past struggles. 226 | #centos_base_image=$(openstack image list --status active | grep -iE "API-Featured-centos7-[[:alpha:]]{3,4}-[0-9]{2}-[0-9]{4}" | awk '{print $4}' | tail -n 1) 227 | #centos_base_image="JS-API-Featured-CentOS7-Latest" 228 | #sed -i "s/\(\s*compute_base_image: \).*/\1\"${centos_base_image}\"/" compute_build_base_img.yml | head -n 10 229 | 230 | #create temporary script to add local users 231 | echo "#!/bin/bash" > /tmp/add_users.sh 232 | cat /etc/passwd | awk -F':' '$4 >= 1001 && $4 < 65000 {print "useradd -M -u", $3, $1}' >> /tmp/add_users.sh 233 | 234 | # build instance for compute base image generation, take snapshot, and destroy it 235 | echo "Creating compute image! based on $centos_base_image" 236 | 237 | ansible-playbook -v --ssh-common-args='-o StrictHostKeyChecking=no' compute_build_base_img.yml 238 | 239 | #to allow other users to run ansible! 240 | rm -r /tmp/.ansible 241 | 242 | if [[ ${jhub_build} == 1 ]]; then 243 | ansible-galaxy collection install community.general 244 | ansible-galaxy collection install ansible.posix 245 | ansible-galaxy install geerlingguy.certbot 246 | # ansible-playbook -v --ssh-common-args='-o StrictHostKeyChecking=no' install_jupyterhub.yml 247 | fi 248 | 249 | #Start required services 250 | systemctl enable slurmctld munge nfs-server rpcbind 251 | systemctl restart munge slurmctld nfs-server rpcbind 252 | 253 | echo -e "If you wish to enable an email when node state is drain or down, please uncomment \nthe cron-node-check.sh job in /etc/crontab, and place your email of choice in the 'email_addr' variable \nat the beginning of /usr/local/sbin/cron-node-check.sh" 254 | -------------------------------------------------------------------------------- /jhub_files/https_redirect.conf.j2: -------------------------------------------------------------------------------- 1 | 2 | ServerName {{ headnode_public_hostname }} 3 | ServerAlias {{ headnode_alternate_hostname }} 4 | {% raw %} 5 | # redirect all port 80 traffic to 443 6 | RewriteEngine on 7 | ReWriteCond %{SERVER_PORT} !^443$ 8 | RewriteRule ^/(.*) https://%{HTTP_HOST}/$1 [NC,R,L] 9 | RewriteCond %{SERVER_NAME} {% endraw %} ={{ headnode_alternate_hostname }} 10 | {% raw %} 11 | RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent] 12 | RewriteCond %{SERVER_NAME} {% endraw %} ={{ headnode_public_hostname }} 13 | {% raw %} 14 | RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [END,NE,R=permanent] 15 | 16 | {% endraw %} 17 | -------------------------------------------------------------------------------- /jhub_files/jhub_conf.py: -------------------------------------------------------------------------------- 1 | # Configuration file for jupyterhub. 2 | 3 | #------------------------------------------------------------------------------ 4 | # Application(SingletonConfigurable) configuration 5 | #------------------------------------------------------------------------------ 6 | ## This is an application. 7 | 8 | ## The date format used by logging formatters for %(asctime)s 9 | # Default: '%Y-%m-%d %H:%M:%S' 10 | # c.Application.log_datefmt = '%Y-%m-%d %H:%M:%S' 11 | 12 | ## The Logging format template 13 | # Default: '[%(name)s]%(highlevel)s %(message)s' 14 | # c.Application.log_format = '[%(name)s]%(highlevel)s %(message)s' 15 | 16 | ## Set the log level by value or name. 
17 | # Choices: any of [0, 10, 20, 30, 40, 50, 'DEBUG', 'INFO', 'WARN', 'ERROR', 'CRITICAL'] 18 | # Default: 30 19 | # c.Application.log_level = 30 20 | 21 | ## Instead of starting the Application, dump configuration to stdout 22 | # Default: False 23 | # c.Application.show_config = False 24 | 25 | ## Instead of starting the Application, dump configuration to stdout (as JSON) 26 | # Default: False 27 | # c.Application.show_config_json = False 28 | 29 | #------------------------------------------------------------------------------ 30 | # JupyterHub(Application) configuration 31 | #------------------------------------------------------------------------------ 32 | ## An Application for starting a Multi-User Jupyter Notebook server. 33 | 34 | ## Maximum number of concurrent servers that can be active at a time. 35 | # 36 | # Setting this can limit the total resources your users can consume. 37 | # 38 | # An active server is any server that's not fully stopped. It is considered 39 | # active from the time it has been requested until the time that it has 40 | # completely stopped. 41 | # 42 | # If this many user servers are active, users will not be able to launch new 43 | # servers until a server is shutdown. Spawn requests will be rejected with a 429 44 | # error asking them to try again. 45 | # 46 | # If set to 0, no limit is enforced. 47 | # Default: 0 48 | # c.JupyterHub.active_server_limit = 0 49 | 50 | ## Duration (in seconds) to determine the number of active users. 51 | # Default: 1800 52 | # c.JupyterHub.active_user_window = 1800 53 | 54 | ## Resolution (in seconds) for updating activity 55 | # 56 | # If activity is registered that is less than activity_resolution seconds more 57 | # recent than the current value, the new value will be ignored. 58 | # 59 | # This avoids too many writes to the Hub database. 60 | # Default: 30 61 | # c.JupyterHub.activity_resolution = 30 62 | 63 | ## Grant admin users permission to access single-user servers. 64 | # 65 | # Users should be properly informed if this is enabled. 66 | # Default: False 67 | # c.JupyterHub.admin_access = False 68 | 69 | ## DEPRECATED since version 0.7.2, use Authenticator.admin_users instead. 70 | # Default: set() 71 | # c.JupyterHub.admin_users = set() 72 | 73 | ## Allow named single-user servers per user 74 | # Default: False 75 | # c.JupyterHub.allow_named_servers = False 76 | 77 | ## Answer yes to any questions (e.g. confirm overwrite) 78 | # Default: False 79 | # c.JupyterHub.answer_yes = False 80 | 81 | ## PENDING DEPRECATION: consider using services 82 | # 83 | # Dict of token:username to be loaded into the database. 84 | # 85 | # Allows ahead-of-time generation of API tokens for use by externally managed 86 | # services, which authenticate as JupyterHub users. 87 | # 88 | # Consider using services for general services that talk to the JupyterHub API. 89 | # Default: {} 90 | # c.JupyterHub.api_tokens = {} 91 | 92 | ## Authentication for prometheus metrics 93 | # Default: True 94 | # c.JupyterHub.authenticate_prometheus = True 95 | 96 | ## Class for authenticating users. 97 | # 98 | # This should be a subclass of :class:`jupyterhub.auth.Authenticator` 99 | # 100 | # with an :meth:`authenticate` method that: 101 | # 102 | # - is a coroutine (asyncio or tornado) 103 | # - returns username on success, None on failure 104 | # - takes two arguments: (handler, data), 105 | # where `handler` is the calling web.RequestHandler, 106 | # and `data` is the POST form data from the login page. 107 | # 108 | # .. 
versionchanged:: 1.0 109 | # authenticators may be registered via entry points, 110 | # e.g. `c.JupyterHub.authenticator_class = 'pam'` 111 | # 112 | # Currently installed: 113 | # - default: jupyterhub.auth.PAMAuthenticator 114 | # - dummy: jupyterhub.auth.DummyAuthenticator 115 | # - pam: jupyterhub.auth.PAMAuthenticator 116 | # Default: 'jupyterhub.auth.PAMAuthenticator' 117 | # c.JupyterHub.authenticator_class = 'jupyterhub.auth.PAMAuthenticator' 118 | 119 | ## The base URL of the entire application. 120 | # 121 | # Add this to the beginning of all JupyterHub URLs. Use base_url to run 122 | # JupyterHub within an existing website. 123 | # 124 | # .. deprecated: 0.9 125 | # Use JupyterHub.bind_url 126 | # Default: '/' 127 | # c.JupyterHub.base_url = '/' 128 | 129 | ## The public facing URL of the whole JupyterHub application. 130 | # 131 | # This is the address on which the proxy will bind. Sets protocol, ip, base_url 132 | # Default: 'http://:8000' 133 | c.JupyterHub.bind_url = 'http://127.0.0.1:8000' 134 | 135 | ## Whether to shutdown the proxy when the Hub shuts down. 136 | # 137 | # Disable if you want to be able to teardown the Hub while leaving the proxy 138 | # running. 139 | # 140 | # Only valid if the proxy was starting by the Hub process. 141 | # 142 | # If both this and cleanup_servers are False, sending SIGINT to the Hub will 143 | # only shutdown the Hub, leaving everything else running. 144 | # 145 | # The Hub should be able to resume from database state. 146 | # Default: True 147 | # c.JupyterHub.cleanup_proxy = True 148 | 149 | ## Whether to shutdown single-user servers when the Hub shuts down. 150 | # 151 | # Disable if you want to be able to teardown the Hub while leaving the single- 152 | # user servers running. 153 | # 154 | # If both this and cleanup_proxy are False, sending SIGINT to the Hub will only 155 | # shutdown the Hub, leaving everything else running. 156 | # 157 | # The Hub should be able to resume from database state. 158 | # Default: True 159 | # c.JupyterHub.cleanup_servers = True 160 | 161 | ## Maximum number of concurrent users that can be spawning at a time. 162 | # 163 | # Spawning lots of servers at the same time can cause performance problems for 164 | # the Hub or the underlying spawning system. Set this limit to prevent bursts of 165 | # logins from attempting to spawn too many servers at the same time. 166 | # 167 | # This does not limit the number of total running servers. See 168 | # active_server_limit for that. 169 | # 170 | # If more than this many users attempt to spawn at a time, their requests will 171 | # be rejected with a 429 error asking them to try again. Users will have to wait 172 | # for some of the spawning services to finish starting before they can start 173 | # their own. 174 | # 175 | # If set to 0, no limit is enforced. 176 | # Default: 100 177 | # c.JupyterHub.concurrent_spawn_limit = 100 178 | 179 | ## The config file to load 180 | # Default: 'jupyterhub_config.py' 181 | # c.JupyterHub.config_file = 'jupyterhub_config.py' 182 | 183 | ## DEPRECATED: does nothing 184 | # Default: False 185 | # c.JupyterHub.confirm_no_ssl = False 186 | 187 | ## Number of days for a login cookie to be valid. Default is two weeks. 188 | # Default: 14 189 | # c.JupyterHub.cookie_max_age_days = 14 190 | 191 | ## The cookie secret to use to encrypt cookies. 192 | # 193 | # Loaded from the JPY_COOKIE_SECRET env variable by default. 194 | # 195 | # Should be exactly 256 bits (32 bytes). 
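# (Illustrative addition, not part of the generated defaults): a suitable
# 32-byte value can be generated with e.g. `openssl rand -hex 32` and supplied
# via the JPY_COOKIE_SECRET environment variable or the cookie_secret_file
# option below.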
196 | # Default: b'' 197 | # c.JupyterHub.cookie_secret = b'' 198 | 199 | ## File in which to store the cookie secret. 200 | # Default: 'jupyterhub_cookie_secret' 201 | # c.JupyterHub.cookie_secret_file = 'jupyterhub_cookie_secret' 202 | 203 | ## The location of jupyterhub data files (e.g. /usr/local/share/jupyterhub) 204 | # Default: '/opt/python3/share/jupyterhub' 205 | # c.JupyterHub.data_files_path = '/opt/python3/share/jupyterhub' 206 | 207 | ## Include any kwargs to pass to the database connection. See 208 | # sqlalchemy.create_engine for details. 209 | # Default: {} 210 | # c.JupyterHub.db_kwargs = {} 211 | 212 | ## url for the database. e.g. `sqlite:///jupyterhub.sqlite` 213 | # Default: 'sqlite:///jupyterhub.sqlite' 214 | # c.JupyterHub.db_url = 'sqlite:///jupyterhub.sqlite' 215 | 216 | ## log all database transactions. This has A LOT of output 217 | # Default: False 218 | # c.JupyterHub.debug_db = False 219 | 220 | ## DEPRECATED since version 0.8: Use ConfigurableHTTPProxy.debug 221 | # Default: False 222 | # c.JupyterHub.debug_proxy = False 223 | 224 | ## If named servers are enabled, default name of server to spawn or open, e.g. by 225 | # user-redirect. 226 | # Default: '' 227 | # c.JupyterHub.default_server_name = '' 228 | 229 | ## The default URL for users when they arrive (e.g. when user directs to "/") 230 | # 231 | # By default, redirects users to their own server. 232 | # 233 | # Can be a Unicode string (e.g. '/hub/home') or a callable based on the handler 234 | # object: 235 | # 236 | # :: 237 | # 238 | # def default_url_fn(handler): 239 | # user = handler.current_user 240 | # if user and user.admin: 241 | # return '/hub/admin' 242 | # return '/hub/home' 243 | # 244 | # c.JupyterHub.default_url = default_url_fn 245 | # Default: traitlets.Undefined 246 | # c.JupyterHub.default_url = traitlets.Undefined 247 | 248 | ## Dict authority:dict(files). Specify the key, cert, and/or ca file for an 249 | # authority. This is useful for externally managed proxies that wish to use 250 | # internal_ssl. 251 | # 252 | # The files dict has this format (you must specify at least a cert):: 253 | # 254 | # { 255 | # 'key': '/path/to/key.key', 256 | # 'cert': '/path/to/cert.crt', 257 | # 'ca': '/path/to/ca.crt' 258 | # } 259 | # 260 | # The authorities you can override: 'hub-ca', 'notebooks-ca', 'proxy-api-ca', 261 | # 'proxy-client-ca', and 'services-ca'. 262 | # 263 | # Use with internal_ssl 264 | # Default: {} 265 | # c.JupyterHub.external_ssl_authorities = {} 266 | 267 | ## Register extra tornado Handlers for jupyterhub. 268 | # 269 | # Should be of the form ``("", Handler)`` 270 | # 271 | # The Hub prefix will be added, so `/my-page` will be served at `/hub/my-page`. 272 | # Default: [] 273 | # c.JupyterHub.extra_handlers = [] 274 | 275 | ## DEPRECATED: use output redirection instead, e.g. 276 | # 277 | # jupyterhub &>> /var/log/jupyterhub.log 278 | # Default: '' 279 | # c.JupyterHub.extra_log_file = '' 280 | 281 | ## Extra log handlers to set on JupyterHub logger 282 | # Default: [] 283 | # c.JupyterHub.extra_log_handlers = [] 284 | 285 | ## Generate certs used for internal ssl 286 | # Default: False 287 | # c.JupyterHub.generate_certs = False 288 | 289 | ## Generate default config file 290 | # Default: False 291 | # c.JupyterHub.generate_config = False 292 | 293 | ## The URL on which the Hub will listen. This is a private URL for internal 294 | # communication. Typically set in combination with hub_connect_url. If a unix 295 | # socket, hub_connect_url **must** also be set. 
296 | # 297 | # For example: 298 | # 299 | # "http://127.0.0.1:8081" 300 | # "unix+http://%2Fsrv%2Fjupyterhub%2Fjupyterhub.sock" 301 | # 302 | # .. versionadded:: 0.9 303 | # Default: '' 304 | # c.JupyterHub.hub_bind_url = '' 305 | 306 | ## The ip or hostname for proxies and spawners to use for connecting to the Hub. 307 | # 308 | # Use when the bind address (`hub_ip`) is 0.0.0.0, :: or otherwise different 309 | # from the connect address. 310 | # 311 | # Default: when `hub_ip` is 0.0.0.0 or ::, use `socket.gethostname()`, otherwise 312 | # use `hub_ip`. 313 | # 314 | # Note: Some spawners or proxy implementations might not support hostnames. 315 | # Check your spawner or proxy documentation to see if they have extra 316 | # requirements. 317 | # 318 | # .. versionadded:: 0.8 319 | # Default: '' 320 | # c.JupyterHub.hub_connect_ip = '' 321 | 322 | ## DEPRECATED 323 | # 324 | # Use hub_connect_url 325 | # 326 | # .. versionadded:: 0.8 327 | # 328 | # .. deprecated:: 0.9 329 | # Use hub_connect_url 330 | # Default: 0 331 | # c.JupyterHub.hub_connect_port = 0 332 | 333 | ## The URL for connecting to the Hub. Spawners, services, and the proxy will use 334 | # this URL to talk to the Hub. 335 | # 336 | # Only needs to be specified if the default hub URL is not connectable (e.g. 337 | # using a unix+http:// bind url). 338 | # 339 | # .. seealso:: 340 | # JupyterHub.hub_connect_ip 341 | # JupyterHub.hub_bind_url 342 | # 343 | # .. versionadded:: 0.9 344 | # Default: '' 345 | # c.JupyterHub.hub_connect_url = '' 346 | 347 | ## The ip address for the Hub process to *bind* to. 348 | # 349 | # By default, the hub listens on localhost only. This address must be accessible 350 | # from the proxy and user servers. You may need to set this to a public ip or '' 351 | # for all interfaces if the proxy or user servers are in containers or on a 352 | # different host. 353 | # 354 | # See `hub_connect_ip` for cases where the bind and connect address should 355 | # differ, or `hub_bind_url` for setting the full bind URL. 356 | # Default: '127.0.0.1' 357 | #jecoulte - this is the ip used by the singlespawner instance of a job to connect to the hub 358 | c.JupyterHub.hub_ip = '{{ headnode_ip }}' #JEC_HEADNODE_IP 359 | #c.JupyterHub.hub_ip = '127.0.0.1' 360 | 361 | ## The internal port for the Hub process. 362 | # 363 | # This is the internal port of the hub itself. It should never be accessed 364 | # directly. See JupyterHub.port for the public port to use when accessing 365 | # jupyterhub. It is rare that this port should be set except in cases of port 366 | # conflict. 367 | # 368 | # See also `hub_ip` for the ip and `hub_bind_url` for setting the full bind URL. 369 | # Default: 8081 370 | # c.JupyterHub.hub_port = 8081 371 | 372 | ## Trigger implicit spawns after this many seconds. 373 | # 374 | # When a user visits a URL for a server that's not running, they are shown a 375 | # page indicating that the requested server is not running with a button to 376 | # spawn the server. 377 | # 378 | # Setting this to a positive value will redirect the user after this many 379 | # seconds, effectively clicking this button automatically for the users, 380 | # automatically beginning the spawn process. 381 | # 382 | # Warning: this can result in errors and surprising behavior when sharing access 383 | # URLs to actual servers, since the wrong server is likely to be started. 
384 | # Default: 0 385 | # c.JupyterHub.implicit_spawn_seconds = 0 386 | 387 | ## Timeout (in seconds) to wait for spawners to initialize 388 | # 389 | # Checking if spawners are healthy can take a long time if many spawners are 390 | # active at hub start time. 391 | # 392 | # If it takes longer than this timeout to check, init_spawner will be left to 393 | # complete in the background and the http server is allowed to start. 394 | # 395 | # A timeout of -1 means wait forever, which can mean a slow startup of the Hub 396 | # but ensures that the Hub is fully consistent by the time it starts responding 397 | # to requests. This matches the behavior of jupyterhub 1.0. 398 | # 399 | # .. versionadded: 1.1.0 400 | # Default: 10 401 | # c.JupyterHub.init_spawners_timeout = 10 402 | 403 | ## The location to store certificates automatically created by JupyterHub. 404 | # 405 | # Use with internal_ssl 406 | # Default: 'internal-ssl' 407 | # c.JupyterHub.internal_certs_location = 'internal-ssl' 408 | 409 | ## Enable SSL for all internal communication 410 | # 411 | # This enables end-to-end encryption between all JupyterHub components. 412 | # JupyterHub will automatically create the necessary certificate authority and 413 | # sign notebook certificates as they're created. 414 | # Default: False 415 | # c.JupyterHub.internal_ssl = False 416 | 417 | ## The public facing ip of the whole JupyterHub application (specifically 418 | # referred to as the proxy). 419 | # 420 | # This is the address on which the proxy will listen. The default is to listen 421 | # on all interfaces. This is the only address through which JupyterHub should be 422 | # accessed by users. 423 | # 424 | # .. deprecated: 0.9 425 | # Use JupyterHub.bind_url 426 | # Default: '' 427 | # c.JupyterHub.ip = '' 428 | 429 | ## Supply extra arguments that will be passed to Jinja environment. 430 | # Default: {} 431 | # c.JupyterHub.jinja_environment_options = {} 432 | 433 | ## Interval (in seconds) at which to update last-activity timestamps. 434 | # Default: 300 435 | # c.JupyterHub.last_activity_interval = 300 436 | 437 | ## Dict of 'group': ['usernames'] to load at startup. 438 | # 439 | # This strictly *adds* groups and users to groups. 440 | # 441 | # Loading one set of groups, then starting JupyterHub again with a different set 442 | # will not remove users or groups from previous launches. That must be done 443 | # through the API. 444 | # Default: {} 445 | # c.JupyterHub.load_groups = {} 446 | 447 | ## The date format used by logging formatters for %(asctime)s 448 | # See also: Application.log_datefmt 449 | # c.JupyterHub.log_datefmt = '%Y-%m-%d %H:%M:%S' 450 | 451 | ## The Logging format template 452 | # See also: Application.log_format 453 | # c.JupyterHub.log_format = '[%(name)s]%(highlevel)s %(message)s' 454 | 455 | ## Set the log level by value or name. 456 | # See also: Application.log_level 457 | # c.JupyterHub.log_level = 30 458 | 459 | ## Specify path to a logo image to override the Jupyter logo in the banner. 460 | # Default: '' 461 | # c.JupyterHub.logo_file = '' 462 | 463 | ## Maximum number of concurrent named servers that can be created by a user at a 464 | # time. 465 | # 466 | # Setting this can limit the total resources a user can consume. 467 | # 468 | # If set to 0, no limit is enforced. 469 | # Default: 0 470 | # c.JupyterHub.named_server_limit_per_user = 0 471 | 472 | ## File to write PID Useful for daemonizing JupyterHub. 
473 | # Default: '' 474 | # c.JupyterHub.pid_file = '' 475 | 476 | ## The public facing port of the proxy. 477 | # 478 | # This is the port on which the proxy will listen. This is the only port through 479 | # which JupyterHub should be accessed by users. 480 | # 481 | # .. deprecated: 0.9 482 | # Use JupyterHub.bind_url 483 | # Default: 8000 484 | # c.JupyterHub.port = 8000 485 | 486 | ## DEPRECATED since version 0.8 : Use ConfigurableHTTPProxy.api_url 487 | # Default: '' 488 | # c.JupyterHub.proxy_api_ip = '' 489 | 490 | ## DEPRECATED since version 0.8 : Use ConfigurableHTTPProxy.api_url 491 | # Default: 0 492 | # c.JupyterHub.proxy_api_port = 0 493 | 494 | ## DEPRECATED since version 0.8: Use ConfigurableHTTPProxy.auth_token 495 | # Default: '' 496 | # c.JupyterHub.proxy_auth_token = '' 497 | 498 | ## Interval (in seconds) at which to check if the proxy is running. 499 | # Default: 30 500 | # c.JupyterHub.proxy_check_interval = 30 501 | 502 | ## The class to use for configuring the JupyterHub proxy. 503 | # 504 | # Should be a subclass of :class:`jupyterhub.proxy.Proxy`. 505 | # 506 | # .. versionchanged:: 1.0 507 | # proxies may be registered via entry points, 508 | # e.g. `c.JupyterHub.proxy_class = 'traefik'` 509 | # 510 | # Currently installed: 511 | # - configurable-http-proxy: jupyterhub.proxy.ConfigurableHTTPProxy 512 | # - default: jupyterhub.proxy.ConfigurableHTTPProxy 513 | # Default: 'jupyterhub.proxy.ConfigurableHTTPProxy' 514 | # c.JupyterHub.proxy_class = 'jupyterhub.proxy.ConfigurableHTTPProxy' 515 | 516 | ## DEPRECATED since version 0.8. Use ConfigurableHTTPProxy.command 517 | # Default: [] 518 | # c.JupyterHub.proxy_cmd = [] 519 | 520 | ## Recreate all certificates used within JupyterHub on restart. 521 | # 522 | # Note: enabling this feature requires restarting all notebook servers. 523 | # 524 | # Use with internal_ssl 525 | # Default: False 526 | # c.JupyterHub.recreate_internal_certs = False 527 | 528 | ## Redirect user to server (if running), instead of control panel. 529 | # Default: True 530 | # c.JupyterHub.redirect_to_server = True 531 | 532 | ## Purge and reset the database. 533 | # Default: False 534 | # c.JupyterHub.reset_db = False 535 | 536 | ## Interval (in seconds) at which to check connectivity of services with web 537 | # endpoints. 538 | # Default: 60 539 | # c.JupyterHub.service_check_interval = 60 540 | 541 | ## Dict of token:servicename to be loaded into the database. 542 | # 543 | # Allows ahead-of-time generation of API tokens for use by externally managed 544 | # services. 545 | # Default: {} 546 | # c.JupyterHub.service_tokens = {} 547 | 548 | ## List of service specification dictionaries. 
549 | # 550 | # A service 551 | # 552 | # For instance:: 553 | # 554 | # services = [ 555 | # { 556 | # 'name': 'cull_idle', 557 | # 'command': ['/path/to/cull_idle_servers.py'], 558 | # }, 559 | # { 560 | # 'name': 'formgrader', 561 | # 'url': 'http://127.0.0.1:1234', 562 | # 'api_token': 'super-secret', 563 | # 'environment': 564 | # } 565 | # ] 566 | # Default: [] 567 | # c.JupyterHub.services = [] 568 | 569 | ## Instead of starting the Application, dump configuration to stdout 570 | # See also: Application.show_config 571 | # c.JupyterHub.show_config = False 572 | 573 | ## Instead of starting the Application, dump configuration to stdout (as JSON) 574 | # See also: Application.show_config_json 575 | # c.JupyterHub.show_config_json = False 576 | 577 | ## Shuts down all user servers on logout 578 | # Default: False 579 | # c.JupyterHub.shutdown_on_logout = False 580 | 581 | ## The class to use for spawning single-user servers. 582 | # 583 | # Should be a subclass of :class:`jupyterhub.spawner.Spawner`. 584 | # 585 | # .. versionchanged:: 1.0 586 | # spawners may be registered via entry points, 587 | # e.g. `c.JupyterHub.spawner_class = 'localprocess'` 588 | # 589 | # Currently installed: 590 | # - default: jupyterhub.spawner.LocalProcessSpawner 591 | # - localprocess: jupyterhub.spawner.LocalProcessSpawner 592 | # - simple: jupyterhub.spawner.SimpleLocalProcessSpawner 593 | # Default: 'jupyterhub.spawner.LocalProcessSpawner' 594 | # c.JupyterHub.spawner_class = 'jupyterhub.spawner.LocalProcessSpawner' 595 | 596 | ## Path to SSL certificate file for the public facing interface of the proxy 597 | # 598 | # When setting this, you should also set ssl_key 599 | # Default: '' 600 | # c.JupyterHub.ssl_cert = '' 601 | 602 | ## Path to SSL key file for the public facing interface of the proxy 603 | # 604 | # When setting this, you should also set ssl_cert 605 | # Default: '' 606 | # c.JupyterHub.ssl_key = '' 607 | 608 | ## Host to send statsd metrics to. An empty string (the default) disables sending 609 | # metrics. 610 | # Default: '' 611 | # c.JupyterHub.statsd_host = '' 612 | 613 | ## Port on which to send statsd metrics about the hub 614 | # Default: 8125 615 | # c.JupyterHub.statsd_port = 8125 616 | 617 | ## Prefix to use for all metrics sent by jupyterhub to statsd 618 | # Default: 'jupyterhub' 619 | # c.JupyterHub.statsd_prefix = 'jupyterhub' 620 | 621 | ## Run single-user servers on subdomains of this host. 622 | # 623 | # This should be the full `https://hub.domain.tld[:port]`. 624 | # 625 | # Provides additional cross-site protections for javascript served by single- 626 | # user servers. 627 | # 628 | # Requires `.hub.domain.tld` to resolve to the same host as 629 | # `hub.domain.tld`. 630 | # 631 | # In general, this is most easily achieved with wildcard DNS. 632 | # 633 | # When using SSL (i.e. always) this also requires a wildcard SSL certificate. 634 | # Default: '' 635 | # c.JupyterHub.subdomain_host = '' 636 | 637 | ## Paths to search for jinja templates, before using the default templates. 638 | # Default: [] 639 | # c.JupyterHub.template_paths = [] 640 | 641 | ## Extra variables to be passed into jinja templates 642 | # Default: {} 643 | # c.JupyterHub.template_vars = {} 644 | 645 | ## Extra settings overrides to pass to the tornado application. 646 | # Default: {} 647 | # c.JupyterHub.tornado_settings = {} 648 | 649 | ## Trust user-provided tokens (via JupyterHub.service_tokens) to have good 650 | # entropy. 
651 | # 652 | # If you are not inserting additional tokens via configuration file, this flag 653 | # has no effect. 654 | # 655 | # In JupyterHub 0.8, internally generated tokens do not pass through additional 656 | # hashing because the hashing is costly and does not increase the entropy of 657 | # already-good UUIDs. 658 | # 659 | # User-provided tokens, on the other hand, are not trusted to have good entropy 660 | # by default, and are passed through many rounds of hashing to stretch the 661 | # entropy of the key (i.e. user-provided tokens are treated as passwords instead 662 | # of random keys). These keys are more costly to check. 663 | # 664 | # If your inserted tokens are generated by a good-quality mechanism, e.g. 665 | # `openssl rand -hex 32`, then you can set this flag to True to reduce the cost 666 | # of checking authentication tokens. 667 | # Default: False 668 | # c.JupyterHub.trust_user_provided_tokens = False 669 | 670 | ## Names to include in the subject alternative name. 671 | # 672 | # These names will be used for server name verification. This is useful if 673 | # JupyterHub is being run behind a reverse proxy or services using ssl are on 674 | # different hosts. 675 | # 676 | # Use with internal_ssl 677 | # Default: [] 678 | # c.JupyterHub.trusted_alt_names = [] 679 | 680 | ## Downstream proxy IP addresses to trust. 681 | # 682 | # This sets the list of IP addresses that are trusted and skipped when 683 | # processing the `X-Forwarded-For` header. For example, if an external proxy is 684 | # used for TLS termination, its IP address should be added to this list to 685 | # ensure the correct client IP addresses are recorded in the logs instead of the 686 | # proxy server's IP address. 687 | # Default: [] 688 | # c.JupyterHub.trusted_downstream_ips = [] 689 | 690 | ## Upgrade the database automatically on start. 691 | # 692 | # Only safe if database is regularly backed up. Only SQLite databases will be 693 | # backed up to a local file automatically. 694 | # Default: False 695 | # c.JupyterHub.upgrade_db = False 696 | 697 | ## Callable to affect behavior of /user-redirect/ 698 | # 699 | # Receives 4 parameters: 1. path - URL path that was provided after /user- 700 | # redirect/ 2. request - A Tornado HTTPServerRequest representing the current 701 | # request. 3. user - The currently authenticated user. 4. base_url - The 702 | # base_url of the current hub, for relative redirects 703 | # 704 | # It should return the new URL to redirect to, or None to preserve current 705 | # behavior. 706 | # Default: None 707 | # c.JupyterHub.user_redirect_hook = None 708 | 709 | #------------------------------------------------------------------------------ 710 | # Spawner(LoggingConfigurable) configuration 711 | #------------------------------------------------------------------------------ 712 | ## Base class for spawning single-user notebook servers. 713 | # 714 | # Subclass this, and override the following methods: 715 | # 716 | # - load_state - get_state - start - stop - poll 717 | # 718 | # As JupyterHub supports multiple users, an instance of the Spawner subclass is 719 | # created for each user. If there are 20 JupyterHub users, there will be 20 720 | # instances of the subclass. 721 | 722 | ## Extra arguments to be passed to the single-user server. 723 | # 724 | # Some spawners allow shell-style expansion here, allowing you to use 725 | # environment variables here. Most, including the default, do not. Consult the 726 | # documentation for your spawner to verify! 
727 | # Default: [] 728 | # c.Spawner.args = [] 729 | 730 | ## An optional hook function that you can implement to pass `auth_state` to the 731 | # spawner after it has been initialized but before it starts. The `auth_state` 732 | # dictionary may be set by the `.authenticate()` method of the authenticator. 733 | # This hook enables you to pass some or all of that information to your spawner. 734 | # 735 | # Example:: 736 | # 737 | # def userdata_hook(spawner, auth_state): 738 | # spawner.userdata = auth_state["userdata"] 739 | # 740 | # c.Spawner.auth_state_hook = userdata_hook 741 | # Default: None 742 | # c.Spawner.auth_state_hook = None 743 | 744 | ## The command used for starting the single-user server. 745 | # 746 | # Provide either a string or a list containing the path to the startup script 747 | # command. Extra arguments, other than this path, should be provided via `args`. 748 | # 749 | # This is usually set if you want to start the single-user server in a different 750 | # python environment (with virtualenv/conda) than JupyterHub itself. 751 | # 752 | # Some spawners allow shell-style expansion here, allowing you to use 753 | # environment variables. Most, including the default, do not. Consult the 754 | # documentation for your spawner to verify! 755 | # Default: ['jupyterhub-singleuser'] 756 | # c.Spawner.cmd = ['jupyterhub-singleuser'] 757 | 758 | ## Maximum number of consecutive failures to allow before shutting down 759 | # JupyterHub. 760 | # 761 | # This helps JupyterHub recover from a certain class of problem preventing 762 | # launch in contexts where the Hub is automatically restarted (e.g. systemd, 763 | # docker, kubernetes). 764 | # 765 | # A limit of 0 means no limit and consecutive failures will not be tracked. 766 | # Default: 0 767 | # c.Spawner.consecutive_failure_limit = 0 768 | 769 | ## Minimum number of cpu-cores a single-user notebook server is guaranteed to 770 | # have available. 771 | # 772 | # If this value is set to 0.5, allows use of 50% of one CPU. If this value is 773 | # set to 2, allows use of up to 2 CPUs. 774 | # 775 | # **This is a configuration setting. Your spawner must implement support for the 776 | # limit to work.** The default spawner, `LocalProcessSpawner`, does **not** 777 | # implement this support. A custom spawner **must** add support for this setting 778 | # for it to be enforced. 779 | # Default: None 780 | # c.Spawner.cpu_guarantee = None 781 | 782 | ## Maximum number of cpu-cores a single-user notebook server is allowed to use. 783 | # 784 | # If this value is set to 0.5, allows use of 50% of one CPU. If this value is 785 | # set to 2, allows use of up to 2 CPUs. 786 | # 787 | # The single-user notebook server will never be scheduled by the kernel to use 788 | # more cpu-cores than this. There is no guarantee that it can access this many 789 | # cpu-cores. 790 | # 791 | # **This is a configuration setting. Your spawner must implement support for the 792 | # limit to work.** The default spawner, `LocalProcessSpawner`, does **not** 793 | # implement this support. A custom spawner **must** add support for this setting 794 | # for it to be enforced. 795 | # Default: None 796 | # c.Spawner.cpu_limit = None 797 | 798 | ## Enable debug-logging of the single-user server 799 | # Default: False 800 | # c.Spawner.debug = False 801 | 802 | ## The URL the single-user server should start in. 
803 | # 804 | # `{username}` will be expanded to the user's username 805 | # 806 | # Example uses: 807 | # 808 | # - You can set `notebook_dir` to `/` and `default_url` to `/tree/home/{username}` to allow people to 809 | # navigate the whole filesystem from their notebook server, but still start in their home directory. 810 | # - Start with `/notebooks` instead of `/tree` if `default_url` points to a notebook instead of a directory. 811 | # - You can set this to `/lab` to have JupyterLab start by default, rather than Jupyter Notebook. 812 | # Default: '' 813 | # c.Spawner.default_url = '' 814 | 815 | ## Disable per-user configuration of single-user servers. 816 | # 817 | # When starting the user's single-user server, any config file found in the 818 | # user's $HOME directory will be ignored. 819 | # 820 | # Note: a user could circumvent this if the user modifies their Python 821 | # environment, such as when they have their own conda environments / virtualenvs 822 | # / containers. 823 | # Default: False 824 | # c.Spawner.disable_user_config = False 825 | 826 | ## List of environment variables for the single-user server to inherit from the 827 | # JupyterHub process. 828 | # 829 | # This list is used to ensure that sensitive information in the JupyterHub 830 | # process's environment (such as `CONFIGPROXY_AUTH_TOKEN`) is not passed to the 831 | # single-user server's process. 832 | # Default: ['PATH', 'PYTHONPATH', 'CONDA_ROOT', 'CONDA_DEFAULT_ENV', 'VIRTUAL_ENV', 'LANG', 'LC_ALL', 'JUPYTERHUB_SINGLEUSER_APP'] 833 | # c.Spawner.env_keep = ['PATH', 'PYTHONPATH', 'CONDA_ROOT', 'CONDA_DEFAULT_ENV', 'VIRTUAL_ENV', 'LANG', 'LC_ALL', 'JUPYTERHUB_SINGLEUSER_APP'] 834 | 835 | ## Extra environment variables to set for the single-user server's process. 836 | # 837 | # Environment variables that end up in the single-user server's process come from 3 sources: 838 | # - This `environment` configurable 839 | # - The JupyterHub process' environment variables that are listed in `env_keep` 840 | # - Variables to establish contact between the single-user notebook and the hub (such as JUPYTERHUB_API_TOKEN) 841 | # 842 | # The `environment` configurable should be set by JupyterHub administrators to 843 | # add installation specific environment variables. It is a dict where the key is 844 | # the name of the environment variable, and the value can be a string or a 845 | # callable. If it is a callable, it will be called with one parameter (the 846 | # spawner instance), and should return a string fairly quickly (no blocking 847 | # operations please!). 848 | # 849 | # Note that the spawner class' interface is not guaranteed to be exactly same 850 | # across upgrades, so if you are using the callable take care to verify it 851 | # continues to work after upgrades! 852 | # 853 | # .. versionchanged:: 1.2 854 | # environment from this configuration has highest priority, 855 | # allowing override of 'default' env variables, 856 | # such as JUPYTERHUB_API_URL. 857 | # Default: {} 858 | # c.Spawner.environment = {} 859 | 860 | ## Timeout (in seconds) before giving up on a spawned HTTP server 861 | # 862 | # Once a server has successfully been spawned, this is the amount of time we 863 | # wait before assuming that the server is unable to accept connections. 864 | # Default: 30 865 | # c.Spawner.http_timeout = 30 866 | 867 | ## The IP address (or hostname) the single-user server should listen on. 868 | # 869 | # The JupyterHub proxy implementation should be able to send packets to this 870 | # interface. 
871 | # Default: '' 872 | # c.Spawner.ip = '' 873 | 874 | ## Minimum number of bytes a single-user notebook server is guaranteed to have 875 | # available. 876 | # 877 | # Allows the following suffixes: 878 | # - K -> Kilobytes 879 | # - M -> Megabytes 880 | # - G -> Gigabytes 881 | # - T -> Terabytes 882 | # 883 | # **This is a configuration setting. Your spawner must implement support for the 884 | # limit to work.** The default spawner, `LocalProcessSpawner`, does **not** 885 | # implement this support. A custom spawner **must** add support for this setting 886 | # for it to be enforced. 887 | # Default: None 888 | # c.Spawner.mem_guarantee = None 889 | 890 | ## Maximum number of bytes a single-user notebook server is allowed to use. 891 | # 892 | # Allows the following suffixes: 893 | # - K -> Kilobytes 894 | # - M -> Megabytes 895 | # - G -> Gigabytes 896 | # - T -> Terabytes 897 | # 898 | # If the single user server tries to allocate more memory than this, it will 899 | # fail. There is no guarantee that the single-user notebook server will be able 900 | # to allocate this much memory - only that it can not allocate more than this. 901 | # 902 | # **This is a configuration setting. Your spawner must implement support for the 903 | # limit to work.** The default spawner, `LocalProcessSpawner`, does **not** 904 | # implement this support. A custom spawner **must** add support for this setting 905 | # for it to be enforced. 906 | # Default: None 907 | # c.Spawner.mem_limit = None 908 | 909 | ## Path to the notebook directory for the single-user server. 910 | # 911 | # The user sees a file listing of this directory when the notebook interface is 912 | # started. The current interface does not easily allow browsing beyond the 913 | # subdirectories in this directory's tree. 914 | # 915 | # `~` will be expanded to the home directory of the user, and {username} will be 916 | # replaced with the name of the user. 917 | # 918 | # Note that this does *not* prevent users from accessing files outside of this 919 | # path! They can do so with many other means. 920 | # Default: '' 921 | # c.Spawner.notebook_dir = '' 922 | 923 | ## An HTML form for options a user can specify on launching their server. 924 | # 925 | # The surrounding `
<form>` element and the submit button are already provided. 926 | # 927 | # For example: 928 | # 929 | # .. code:: html 930 | # 931 | # Set your key: 932 | # <input name="keyword" val="default_key"></input> 933 | # <br>
934 | # Choose a letter: 935 | # 939 | # 940 | # The data from this form submission will be passed on to your spawner in 941 | # `self.user_options` 942 | # 943 | # Instead of a form snippet string, this could also be a callable that takes as 944 | # one parameter the current spawner instance and returns a string. The callable 945 | # will be called asynchronously if it returns a future, rather than a str. Note 946 | # that the interface of the spawner class is not deemed stable across versions, 947 | # so using this functionality might cause your JupyterHub upgrades to break. 948 | # Default: traitlets.Undefined 949 | # c.Spawner.options_form = traitlets.Undefined 950 | 951 | ## Interval (in seconds) on which to poll the spawner for single-user server's 952 | # status. 953 | # 954 | # At every poll interval, each spawner's `.poll` method is called, which checks 955 | # if the single-user server is still running. If it isn't running, then 956 | # JupyterHub modifies its own state accordingly and removes appropriate routes 957 | # from the configurable proxy. 958 | # Default: 30 959 | # c.Spawner.poll_interval = 30 960 | 961 | ## The port for single-user servers to listen on. 962 | # 963 | # Defaults to `0`, which uses a randomly allocated port number each time. 964 | # 965 | # If set to a non-zero value, all Spawners will use the same port, which only 966 | # makes sense if each server is on a different address, e.g. in containers. 967 | # 968 | # New in version 0.7. 969 | # Default: 0 970 | # c.Spawner.port = 0 971 | 972 | ## An optional hook function that you can implement to do work after the spawner 973 | # stops. 974 | # 975 | # This can be set independent of any concrete spawner implementation. 976 | # Default: None 977 | # c.Spawner.post_stop_hook = None 978 | 979 | ## An optional hook function that you can implement to do some bootstrapping work 980 | # before the spawner starts. For example, create a directory for your user or 981 | # load initial content. 982 | # 983 | # This can be set independent of any concrete spawner implementation. 984 | # 985 | # This maybe a coroutine. 986 | # 987 | # Example:: 988 | # 989 | # from subprocess import check_call 990 | # def my_hook(spawner): 991 | # username = spawner.user.name 992 | # check_call(['./examples/bootstrap-script/bootstrap.sh', username]) 993 | # 994 | # c.Spawner.pre_spawn_hook = my_hook 995 | # Default: None 996 | # c.Spawner.pre_spawn_hook = None 997 | 998 | ## List of SSL alt names 999 | # 1000 | # May be set in config if all spawners should have the same value(s), or set at 1001 | # runtime by Spawner that know their names. 1002 | # Default: [] 1003 | # c.Spawner.ssl_alt_names = [] 1004 | 1005 | ## Whether to include DNS:localhost, IP:127.0.0.1 in alt names 1006 | # Default: True 1007 | # c.Spawner.ssl_alt_names_include_local = True 1008 | 1009 | ## Timeout (in seconds) before giving up on starting of single-user server. 1010 | # 1011 | # This is the timeout for start to return, not the timeout for the server to 1012 | # respond. Callers of spawner.start will assume that startup has failed if it 1013 | # takes longer than this. start should return when the server process is started 1014 | # and its location is known. 
1015 | # Default: 60 1016 | # c.Spawner.start_timeout = 60 1017 | 1018 | #------------------------------------------------------------------------------ 1019 | # Authenticator(LoggingConfigurable) configuration 1020 | #------------------------------------------------------------------------------ 1021 | ## Base class for implementing an authentication provider for JupyterHub 1022 | 1023 | ## Set of users that will have admin rights on this JupyterHub. 1024 | # 1025 | # Admin users have extra privileges: 1026 | # - Use the admin panel to see list of users logged in 1027 | # - Add / remove users in some authenticators 1028 | # - Restart / halt the hub 1029 | # - Start / stop users' single-user servers 1030 | # - Can access each individual users' single-user server (if configured) 1031 | # 1032 | # Admin access should be treated the same way root access is. 1033 | # 1034 | # Defaults to an empty set, in which case no user has admin access. 1035 | # Default: set() 1036 | # c.Authenticator.admin_users = set() 1037 | c.Authenticator.admin_users = { 'centos' } 1038 | 1039 | ## Set of usernames that are allowed to log in. 1040 | # 1041 | # Use this with supported authenticators to restrict which users can log in. 1042 | # This is an additional list that further restricts users, beyond whatever 1043 | # restrictions the authenticator has in place. 1044 | # 1045 | # If empty, does not perform any additional restriction. 1046 | # 1047 | # .. versionchanged:: 1.2 1048 | # `Authenticator.whitelist` renamed to `allowed_users` 1049 | # Default: set() 1050 | # c.Authenticator.allowed_users = set() 1051 | 1052 | ## The max age (in seconds) of authentication info before forcing a refresh of 1053 | # user auth info. 1054 | # 1055 | # Refreshing auth info allows, e.g. requesting/re-validating auth tokens. 1056 | # 1057 | # See :meth:`.refresh_user` for what happens when user auth info is refreshed 1058 | # (nothing by default). 1059 | # Default: 300 1060 | # c.Authenticator.auth_refresh_age = 300 1061 | 1062 | ## Automatically begin the login process 1063 | # 1064 | # rather than starting with a "Login with..." link at `/hub/login` 1065 | # 1066 | # To work, `.login_url()` must give a URL other than the default `/hub/login`, 1067 | # such as an oauth handler or another automatic login handler, registered with 1068 | # `.get_handlers()`. 1069 | # 1070 | # .. versionadded:: 0.8 1071 | # Default: False 1072 | # c.Authenticator.auto_login = False 1073 | 1074 | ## Set of usernames that are not allowed to log in. 1075 | # 1076 | # Use this with supported authenticators to restrict which users can not log in. 1077 | # This is an additional block list that further restricts users, beyond whatever 1078 | # restrictions the authenticator has in place. 1079 | # 1080 | # If empty, does not perform any additional restriction. 1081 | # 1082 | # .. versionadded: 0.9 1083 | # 1084 | # .. versionchanged:: 1.2 1085 | # `Authenticator.blacklist` renamed to `blocked_users` 1086 | # Default: set() 1087 | # c.Authenticator.blocked_users = set() 1088 | 1089 | ## Delete any users from the database that do not pass validation 1090 | # 1091 | # When JupyterHub starts, `.add_user` will be called on each user in the 1092 | # database to verify that all users are still valid. 1093 | # 1094 | # If `delete_invalid_users` is True, any users that do not pass validation will 1095 | # be deleted from the database. Use this if users might be deleted from an 1096 | # external system, such as local user accounts. 
1097 | # 1098 | # If False (default), invalid users remain in the Hub's database and a warning 1099 | # will be issued. This is the default to avoid data loss due to config changes. 1100 | # Default: False 1101 | # c.Authenticator.delete_invalid_users = False 1102 | 1103 | ## Enable persisting auth_state (if available). 1104 | # 1105 | # auth_state will be encrypted and stored in the Hub's database. This can 1106 | # include things like authentication tokens, etc. to be passed to Spawners as 1107 | # environment variables. 1108 | # 1109 | # Encrypting auth_state requires the cryptography package. 1110 | # 1111 | # Additionally, the JUPYTERHUB_CRYPT_KEY environment variable must contain one 1112 | # (or more, separated by ;) 32B encryption keys. These can be either base64 or 1113 | # hex-encoded. 1114 | # 1115 | # If encryption is unavailable, auth_state cannot be persisted. 1116 | # 1117 | # New in JupyterHub 0.8 1118 | # Default: False 1119 | # c.Authenticator.enable_auth_state = False 1120 | 1121 | ## An optional hook function that you can implement to do some bootstrapping work 1122 | # during authentication. For example, loading user account details from an 1123 | # external system. 1124 | # 1125 | # This function is called after the user has passed all authentication checks 1126 | # and is ready to successfully authenticate. This function must return the 1127 | # authentication dict reguardless of changes to it. 1128 | # 1129 | # This maybe a coroutine. 1130 | # 1131 | # .. versionadded: 1.0 1132 | # 1133 | # Example:: 1134 | # 1135 | # import os, pwd 1136 | # def my_hook(authenticator, handler, authentication): 1137 | # user_data = pwd.getpwnam(authentication['name']) 1138 | # spawn_data = { 1139 | # 'pw_data': user_data 1140 | # 'gid_list': os.getgrouplist(authentication['name'], user_data.pw_gid) 1141 | # } 1142 | # 1143 | # if authentication['auth_state'] is None: 1144 | # authentication['auth_state'] = {} 1145 | # authentication['auth_state']['spawn_data'] = spawn_data 1146 | # 1147 | # return authentication 1148 | # 1149 | # c.Authenticator.post_auth_hook = my_hook 1150 | # Default: None 1151 | # c.Authenticator.post_auth_hook = None 1152 | 1153 | ## Force refresh of auth prior to spawn. 1154 | # 1155 | # This forces :meth:`.refresh_user` to be called prior to launching a server, to 1156 | # ensure that auth state is up-to-date. 1157 | # 1158 | # This can be important when e.g. auth tokens that may have expired are passed 1159 | # to the spawner via environment variables from auth_state. 1160 | # 1161 | # If refresh_user cannot refresh the user auth data, launch will fail until the 1162 | # user logs in again. 1163 | # Default: False 1164 | # c.Authenticator.refresh_pre_spawn = False 1165 | 1166 | ## Dictionary mapping authenticator usernames to JupyterHub users. 1167 | # 1168 | # Primarily used to normalize OAuth user names to local users. 1169 | # Default: {} 1170 | # c.Authenticator.username_map = {} 1171 | 1172 | ## Regular expression pattern that all valid usernames must match. 1173 | # 1174 | # If a username does not match the pattern specified here, authentication will 1175 | # not be attempted. 1176 | # 1177 | # If not set, allow any username. 
1178 | # Default: '' 1179 | # c.Authenticator.username_pattern = '' 1180 | 1181 | ## Deprecated, use `Authenticator.allowed_users` 1182 | # Default: set() 1183 | # c.Authenticator.whitelist = set() 1184 | 1185 | #------------------------------------------------------------------------------ 1186 | # CryptKeeper(SingletonConfigurable) configuration 1187 | #------------------------------------------------------------------------------ 1188 | ## Encapsulate encryption configuration 1189 | # 1190 | # Use via the encryption_config singleton below. 1191 | 1192 | # Default: [] 1193 | # c.CryptKeeper.keys = [] 1194 | 1195 | ## The number of threads to allocate for encryption 1196 | # Default: 2 1197 | # c.CryptKeeper.n_threads = 2 1198 | 1199 | #------------------------------------------------------------------------------ 1200 | # Pagination(Configurable) configuration 1201 | #------------------------------------------------------------------------------ 1202 | ## Default number of entries per page for paginated results. 1203 | # Default: 100 1204 | # c.Pagination.default_per_page = 100 1205 | 1206 | ## Maximum number of entries per page for paginated results. 1207 | # Default: 250 1208 | # c.Pagination.max_per_page = 250 1209 | 1210 | 1211 | #------------------------------------------------------------------------------ 1212 | # BatchSpawner Configuration 1213 | #------------------------------------------------------------------------------ 1214 | #BatchSpawner config: 1215 | import batchspawner 1216 | #c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner' 1217 | c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner' 1218 | c.Spawner.http_timeout = 300 1219 | c.Spawner.start_timeout = 300 1220 | c.Spawner.poll_interval = 10 1221 | c.BatchSpawnerBase.req_nprocs = '2' 1222 | c.BatchSpawnerBase.req_partition = 'cloud' 1223 | c.BatchSpawnerBase.req_host = '{{ headnode_hostname }}' #JEC_SPAWNER_HOSTNAME 1224 | c.BatchSpawnerBase.req_runtime = '12:00:00' 1225 | 1226 | c.SlurmSpawner.cmd = "jupyter-labhub" 1227 | 1228 | c.SlurmSpawner.batch_script="""#!/bin/bash 1229 | #SBATCH --output={{homedir}}/jupyterhub_slurmspawner_%j.log 1230 | #SBATCH --job-name=spawner-jupyterhub 1231 | #SBATCH --chdir={{homedir}} 1232 | #SBATCH --export={{keepvars}} 1233 | #SBATCH --get-user-env=L 1234 | {% if partition %}#SBATCH --partition={{partition}} 1235 | {% endif %}{% if runtime %}#SBATCH --time={{runtime}} 1236 | {% endif %}{% if memory %}#SBATCH --mem={{memory}} 1237 | {% endif %}{% if gres %}#SBATCH --gres={{gres}} 1238 | {% endif %}{% if nprocs %}#SBATCH --cpus-per-task={{nprocs}} 1239 | {% endif %}{% if reservation%}#SBATCH --reservation={{reservation}} 1240 | {% endif %}{% if options %}#SBATCH {{options}}{% endif %} 1241 | set -euo pipefail 1242 | trap 'echo SIGTERM received' TERM 1243 | module load python3.8 1244 | 1245 | # to avoid https://github.com/jupyter/notebook/issues/1318 1246 | export XDG_RUNTIME_DIR=$HOME/.jupyter-run 1247 | 1248 | {{cmd}} 1249 | 1250 | echo "jupyterlab-singleuser ended gracefully" 1251 | """ 1252 | ##------------------------------------------------------------------------------ 1253 | ## ProfilesSpawner configuration 1254 | ##------------------------------------------------------------------------------ 1255 | ## List of profiles to offer for selection. Signature is: 1256 | ## List(Tuple( Unicode, Unicode, Type(Spawner), Dict )) 1257 | ## corresponding to profile display name, unique key, Spawner class, 1258 | ## dictionary of spawner config options. 
1259 | ## 1260 | ## The first three values will be exposed in the input_template as {display}, 1261 | ## {key}, and {type} 1262 | ## 1263 | #c.ProfilesSpawner.profiles = [ 1264 | # ( "Local server", 'local', 'jupyterhub.spawner.LocalProcessSpawner', {'ip':'0.0.0.0'} ), 1265 | # ('Mesabi - 2 cores, 4 GB, 8 hours', 'mesabi2c4g12h', 'batchspawner.TorqueSpawner', 1266 | # dict(req_nprocs='2', req_queue='mesabi', req_runtime='8:00:00', req_memory='4gb')), 1267 | # ('Mesabi - 12 cores, 128 GB, 4 hours', 'mesabi128gb', 'batchspawner.TorqueSpawner', 1268 | # dict(req_nprocs='12', req_queue='ram256g', req_runtime='4:00:00', req_memory='125gb')), 1269 | # ('Mesabi - 2 cores, 4 GB, 24 hours', 'mesabi2c4gb24h', 'batchspawner.TorqueSpawner', 1270 | # dict(req_nprocs='2', req_queue='mesabi', req_runtime='24:00:00', req_memory='4gb')), 1271 | # ('Interactive Cluster - 2 cores, 4 GB, 8 hours', 'lab', 'batchspawner.TorqueSpawner', 1272 | # dict(req_nprocs='2', req_host='labhost.xyz.edu', req_queue='lab', 1273 | # req_runtime='8:00:00', req_memory='4gb', state_exechost_exp='')), 1274 | # ] 1275 | c.ProfilesSpawner.profiles = [ 1276 | ( "Cloud queue", 'cloud-norm', 'batchspawner.SlurmSpawner', 1277 | dict(req_nprocs='2', req_partition='cloud', req_runtime='8:00:00')), 1278 | ] 1279 | # ( "Cloud queue 2", 'cloud-2', 'batchspawner.SlurmSpawner', 1280 | # dict(req_nprocs='2', req_partition='cloud2', req_runtime='1:00:00')), 1281 | # ( "Cloud queue 3", 'cloud-3', 'batchspawner.SlurmSpawner', 1282 | # dict(req_nprocs='2', req_partition='cloud3', req_runtime='2:00:00')), 1283 | c.ProfilesSpawner.ip = '0.0.0.0' 1284 | 1285 | #------------------------------------------------------------------------------ 1286 | # Keycloak Configuration 1287 | #------------------------------------------------------------------------------ 1288 | # Values needed by the django portal 1289 | #KEYCLOAK_AUTHORIZE_URL = 'https://iam.scigap.org/auth/realms/delta/protocol/openid-connect/auth' 1290 | #KEYCLOAK_TOKEN_URL = 'https://iam.scigap.org/auth/realms/delta/protocol/openid-connect/token' 1291 | #KEYCLOAK_USERINFO_URL = 'https://iam.scigap.org/auth/realms/delta/protocol/openid-connect/userinfo' 1292 | #KEYCLOAK_LOGOUT_URL = 'https://iam.scigap.org/auth/realms/delta/protocol/openid-connect/logout' 1293 | 1294 | #public_hostname = extract_hostname(routes, application_name) 1295 | # 1296 | #keycloak_name = os.environ.get('KEYCLOAK_SERVICE_NAME') 1297 | #keycloak_hostname = extract_hostname(routes, keycloak_name) 1298 | #print('keycloak_hostname', keycloak_hostname) 1299 | # 1300 | #keycloak_realm = os.environ.get('KEYCLOAK_REALM') 1301 | # 1302 | #keycloak_account_url = 'https://%s/auth/realms/%s/account' % ( 1303 | # keycloak_hostname, keycloak_realm) 1304 | # 1305 | #with open('templates/vars.html', 'w') as fp: 1306 | # fp.write('{%% set keycloak_account_url = "%s" %%}' % keycloak_account_url) 1307 | import os 1308 | public_hostname="{{ headnode_public_hostname }}" #JEC_PUBLIC_HOSTNAME 1309 | -------------------------------------------------------------------------------- /jhub_files/jhub_service.j2: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=Jupyterhub 3 | Wants=network-online.target 4 | After=network-online.target 5 | 6 | [Service] 7 | User=jupyterhub 8 | ExecStart=/opt/python3/bin/jupyterhub -f /etc/jupyterhub/jupyterhub_config.py 9 | WorkingDirectory=/etc/jupyterhub 10 | 11 | [Install] 12 | WantedBy=multi-user.target 13 | --------------------------------------------------------------------------------
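A note on the unit template above: install_jupyterhub.yml is expected to render and install it (the installed unit name is assumed here to be jupyterhub.service); once it is in place, the hub can be managed with the usual systemd commands, for example:

    sudo systemctl daemon-reload
    sudo systemctl enable --now jupyterhub
    sudo journalctl -u jupyterhub -f    # follow hub logs while testing a spawn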
/jhub_files/jhub_sudoers: -------------------------------------------------------------------------------- 1 | ##Jupyterhub batchspawner 2 | Cmnd_Alias JUPYTER_CMD = /opt/python3/bin/batchspawner-singleuser, /opt/python3/bin/sudospawner, /bin/sbatch, /bin/squeue, /bin/scancel 3 | 4 | %jupyterhub-users ALL=(jupyterhub) /usr/bin/sudo 5 | jupyterhub ALL=(%jupyterhub-users) NOPASSWD:SETENV:JUPYTER_CMD 6 | -------------------------------------------------------------------------------- /jhub_files/jupyterhub.conf.j2: -------------------------------------------------------------------------------- 1 | <VirtualHost *:443> 2 | ServerName {{ headnode_public_hostname }} 3 | ServerAlias {{ headnode_alternate_hostname }} 4 | {% raw %} 5 | SSLEngine on 6 | SSLProtocol -ALL +TLSv1.2 7 | SSLCipherSuite HIGH:!MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4 8 | SSLHonorCipherOrder on 9 | TraceEnable off 10 | #JupyterHub - provided changes: 11 | RewriteEngine On 12 | RewriteCond %{HTTP:Connection} Upgrade [NC] 13 | RewriteCond %{HTTP:Upgrade} websocket [NC] 14 | RewriteRule /(.*) ws://127.0.0.1:8000/$1 [P,L] 15 | RewriteRule /(.*) http://127.0.0.1:8000/$1 [P,L] 16 | 17 | <Location "/"> 18 | # preserve Host header to avoid cross-origin problems 19 | ProxyPreserveHost on 20 | # proxy to JupyterHub 21 | ProxyPass http://127.0.0.1:8000/ 22 | ProxyPassReverse http://127.0.0.1:8000/ 23 | </Location> 24 | 25 | # Include /etc/letsencrypt/options-ssl-apache.conf 26 | {% endraw %} 27 | SSLCertificateFile /etc/letsencrypt/live/{{ headnode_public_hostname }}/fullchain.pem 28 | SSLCertificateKeyFile /etc/letsencrypt/live/{{ headnode_public_hostname }}/privkey.pem 29 | </VirtualHost> 30 | -------------------------------------------------------------------------------- /jhub_files/python_mod_3.8: -------------------------------------------------------------------------------- 1 | #%Module1.0##################################################################### 2 | 3 | proc ModulesHelp { } { 4 | 5 | puts stderr " " 6 | puts stderr "This module loads Python 3.8.10" 7 | puts stderr " " 8 | puts stderr "See the man pages for python3 for detailed information" 9 | puts stderr "on available compiler options and command-line syntax."
10 | puts stderr " " 11 | 12 | puts stderr "\nVersion 3.8.10\n" 13 | 14 | } 15 | module-whatis "Name: Python 3.8.10 environment" 16 | module-whatis "Version: 3.8.10" 17 | module-whatis "Category: compiler, runtime support" 18 | module-whatis "Description: Python 3.8.10 compiler, REPL, and runtime environment" 19 | module-whatis "URL: http://python.org/" 20 | 21 | set version 3.8.10 22 | 23 | prepend-path PATH /opt/ohpc/pub/compiler/python3/bin 24 | prepend-path MANPATH /opt/ohpc/pub/compiler/python3/share/man 25 | prepend-path INCLUDE /opt/ohpc/pub/compiler/python3/include 26 | prepend-path LD_LIBRARY_PATH /opt/ohpc/pub/compiler/python3/lib 27 | prepend-path MODULEPATH /opt/ohpc/pub/moduledeps/python3.8 28 | 29 | family "python_base" 30 | -------------------------------------------------------------------------------- /prevent-updates.ci: -------------------------------------------------------------------------------- 1 | #cloud-config 2 | packages: [] 3 | 4 | package_update: false 5 | package_upgrade: false 6 | package_reboot_if_required: false 7 | 8 | final_message: "Boot completed in $UPTIME seconds" 9 | -------------------------------------------------------------------------------- /slurm-logrotate.conf: -------------------------------------------------------------------------------- 1 | compress 2 | 3 | /var/log/slurmctld.log { 4 | rotate 999 5 | missingok 6 | notifempty 7 | size 10M 8 | Monthly 9 | } 10 | 11 | /var/log/slurm_elastic.log { 12 | rotate 999 13 | missingok 14 | notifempty 15 | size 10M 16 | Monthly 17 | } 18 | -------------------------------------------------------------------------------- /slurm.conf: -------------------------------------------------------------------------------- 1 | # 2 | # Example slurm.conf file. Please run configurator.html 3 | # (in doc/html) to build a configuration file customized 4 | # for your environment. 5 | # 6 | # 7 | # slurm.conf file generated by configurator.html. 8 | # 9 | # See the slurm.conf man page for more information. 
10 | # 11 | ClusterName=js-slurm-elastic 12 | ControlMachine=slurm-example 13 | #ControlAddr= 14 | #BackupController= 15 | #BackupAddr= 16 | # 17 | SlurmUser=slurm 18 | SlurmdUser=root 19 | SlurmctldPort=6817 20 | SlurmdPort=6818 21 | AuthType=auth/munge 22 | #JobCredentialPrivateKey= 23 | #JobCredentialPublicCertificate= 24 | StateSaveLocation=/tmp 25 | SlurmdSpoolDir=/tmp/slurmd 26 | SwitchType=switch/none 27 | MpiDefault=none 28 | SlurmctldPidFile=/var/run/slurmctld.pid 29 | SlurmdPidFile=/var/run/slurmd.pid 30 | ProctrackType=proctrack/pgid 31 | #PluginDir= 32 | #FirstJobId= 33 | ReturnToService=1 34 | #MaxJobCount= 35 | #PlugStackConfig= 36 | #PropagatePrioProcess= 37 | #PropagateResourceLimits= 38 | #PropagateResourceLimitsExcept= 39 | Prolog=/usr/local/sbin/slurm_prolog.sh 40 | #Epilog= 41 | #SrunProlog= 42 | #SrunEpilog= 43 | #TaskProlog= 44 | #TaskEpilog= 45 | #TaskPlugin= 46 | #TrackWCKey=no 47 | #TreeWidth=50 48 | #TmpFS= 49 | #UsePAM= 50 | # 51 | # TIMERS 52 | SlurmctldTimeout=300 53 | SlurmdTimeout=300 54 | #make slurm a little more tolerant here 55 | MessageTimeout=30 56 | TCPTimeout=15 57 | BatchStartTimeout=20 58 | GetEnvTimeout=20 59 | InactiveLimit=0 60 | MinJobAge=604800 61 | KillWait=30 62 | Waittime=0 63 | # 64 | # SCHEDULING 65 | SchedulerType=sched/backfill 66 | #SchedulerAuth= 67 | #SchedulerPort= 68 | #SchedulerRootFilter= 69 | #SelectType=select/linear 70 | SelectType=select/cons_res 71 | SelectTypeParameters=CR_CPU 72 | #FastSchedule=0 73 | #PriorityType=priority/multifactor 74 | #PriorityDecayHalfLife=14-0 75 | #PriorityUsageResetPeriod=14-0 76 | #PriorityWeightFairshare=100000 77 | #PriorityWeightAge=1000 78 | #PriorityWeightPartition=10000 79 | #PriorityWeightJobSize=1000 80 | #PriorityMaxAge=1-0 81 | # 82 | # LOGGING 83 | SlurmctldDebug=3 84 | SlurmctldLogFile=/var/log/slurm/slurmctld.log 85 | SlurmdDebug=3 86 | SlurmdLogFile=/var/log/slurmd.log 87 | JobCompType=jobcomp/none 88 | #JobCompLoc= 89 | # 90 | # ACCOUNTING 91 | JobAcctGatherType=jobacct_gather/linux 92 | JobAcctGatherFrequency=30 93 | # 94 | #AccountingStorageType=accounting_storage/filetxt 95 | #AccountingStorageLoc=/var/log/slurm/slurm_jobacct.log 96 | #AccountingStorageEnforce=associations,limits 97 | #AccountingStoragePass= 98 | #AccountingStorageUser= 99 | # 100 | #GENERAL RESOURCE 101 | GresTypes="" 102 | # 103 | #CLOUD CONFIGURATION 104 | PrivateData=cloud 105 | ResumeProgram=/usr/local/sbin/slurm_resume.sh 106 | SuspendProgram=/usr/local/sbin/slurm_suspend.sh 107 | ResumeRate=0 #number of nodes per minute that can be created; 0 means no limit 108 | ResumeTimeout=900 #max time in seconds between ResumeProgram running and when the node is ready for use 109 | SuspendRate=0 #number of nodes per minute that can be suspended/destroyed 110 | SuspendTime=60 #time in seconds before an idle node is suspended 111 | SuspendTimeout=30 #time between running SuspendProgram and the node being completely down 112 | #COMPUTE NODES 113 | NodeName=compute-[0-1] State=CLOUD CPUs=2 114 | #PARTITIONS 115 | PartitionName=cloud Nodes=compute-[0-1] Default=YES MaxTime=INFINITE State=UP 116 | -------------------------------------------------------------------------------- /slurm_prolog.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | mount_test=$(grep home /etc/mtab) 4 | count=0 5 | declare -i count 6 | 7 | until [ -n "${mount_test}" -o $count -ge 10 ]; 8 | do 9 | sleep 1 10 | count+=1 11 | mount_test=$(grep home /etc/mtab) 12 | echo "$count test: 
$mount_test" 13 | done 14 | 15 | if [[ $count -ge 10 ]]; then 16 | echo "FAILED TO MOUNT home - $hostname" 17 | exit 1 18 | fi 19 | 20 | echo "HOME IS MOUNTED! $hostname" 21 | 22 | exit 0 23 | -------------------------------------------------------------------------------- /slurm_resume.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | source /etc/slurm/openrc.sh 4 | 5 | node_size="m3.small" 6 | # See compute_take_snapshot.sh for naming convention; backup snapshots exist with date appended 7 | #node_image=$(openstack image list -f value | grep -i $(hostname -s)-compute-image-latest | cut -f 2 -d' '| tail -n 1) 8 | node_image=$(hostname -s)-compute-image-latest 9 | log_loc=/var/log/slurm/slurm_elastic.log 10 | 11 | OS_PREFIX=$(hostname -s) 12 | OS_SLURM_KEYPAIR=${OS_PREFIX}-slurm-key 13 | HEADNODE_NETWORK=$(openstack server show $(hostname -s) | grep addresses | awk -F'|' '{print $3}' | awk -F'=' '{print $1}' | awk '{$1=$1};1') 14 | OS_SSH_SECGROUP_NAME=${OS_PREFIX}-ssh-global 15 | OS_INTERNAL_SECGROUP_NAME=${OS_PREFIX}-internal 16 | 17 | #def f'n to generate a write_files entry for cloud-config for copying over a file 18 | # arguments are owner path permissions file_to_be_copied 19 | # All calls to this must come after an "echo "write_files:\n" 20 | generate_write_files () { 21 | #This is generating YAML, so... spaces are important. 22 | echo -e " - encoding: b64\n owner: $1\n path: $2\n permissions: $3\n content: |\n$(cat $4 | base64 | sed 's/^/ /')" 23 | } 24 | 25 | user_data_long="$(cat /etc/slurm/prevent-updates.ci)\n" 26 | user_data_long+="$(echo -e "hostname: $host \npreserve_hostname: true\ndebug: \nmanage_etc_hosts: false\n")\n" 27 | user_data_long+="$(echo -e "write_files:")\n" 28 | user_data_long+="$(generate_write_files slurm "/etc/slurm/slurm.conf" "0644" "/etc/slurm/slurm.conf")\n" 29 | user_data_long+="$(generate_write_files root "/etc/hosts" "0664" "/etc/hosts")\n" 30 | user_data_long+="$(generate_write_files root "/etc/passwd" "0644" "/etc/passwd")\n" 31 | #Done generating the cloud-config for compute nodes 32 | 33 | echo "Node resume invoked: $0 $*" >> $log_loc 34 | 35 | #First, loop over hosts and run the openstack create commands for *all* resume hosts at once. 
36 | for host in $(scontrol show hostname $1) 37 | do 38 | 39 | #Launch compute nodes and check for new ip address in same subprocess - with 2s delay between Openstack requests 40 | #--user-data <(cat /etc/slurm/prevent-updates.ci && echo -e "hostname: $host \npreserve_hostname: true\ndebug:") \ 41 | # the current --user-data pulls in the slurm.conf and /etc/passwd as well, to avoid rebuilding node images 42 | # when adding / changing partitions 43 | 44 | (echo "creating $host" >> $log_loc; 45 | openstack server create $host \ 46 | --flavor $node_size \ 47 | --image $node_image \ 48 | --key-name ${OS_SLURM_KEYPAIR} \ 49 | --user-data <(echo -e "${user_data_long}") \ 50 | --security-group ${OS_SSH_SECGROUP_NAME} --security-group ${OS_INTERNAL_SECGROUP_NAME} \ 51 | --nic net-id=${HEADNODE_NETWORK} 2>&1 \ 52 | | tee -a $log_loc | awk '/status/ {print $4}' >> $log_loc 2>&1; 53 | 54 | node_status="UNKOWN"; 55 | stat_count=0 56 | declare -i stat_count; 57 | until [[ $node_status == "ACTIVE" || $stat_count -ge 20 ]]; do 58 | node_state=$(openstack server show $host 2>&1); 59 | node_status=$(echo -e "${node_state}" | awk '/status/ {print $4}'); 60 | # echo "$host status is: $node_status" >> $log_loc; 61 | # echo "$host ip is: $node_ip" >> $log_loc; 62 | stat_count+=1 63 | sleep 3; 64 | done; 65 | if [[ $node_status != "ACTIVE" ]]; then 66 | echo "$host creation failed" >> $log_loc; 67 | exit 1; 68 | fi; 69 | node_ip=$(echo -e "${node_state}" | awk '/addresses/ {print gensub(/^.*=/,"","g",$4)}'); 70 | 71 | echo "$host ip is $node_ip" >> $log_loc; 72 | scontrol update nodename=$host nodeaddr=$node_ip >> $log_loc;)& 73 | sleep 2 # don't send all the JS requests at "once" 74 | done 75 | -------------------------------------------------------------------------------- /slurm_suspend.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | source /etc/slurm/openrc.sh 4 | 5 | log_loc=/var/log/slurm/slurm_elastic.log 6 | #log_loc=/dev/stdout 7 | 8 | echo "$(date) Node suspend invoked: $0 $*" >> $log_loc 9 | 10 | ############################## 11 | # active_hosts takes in a hostlist, and echos an updated list of instances 12 | # that are still active - this simplifies the count-loop below, which should 13 | # loop only over active instances 14 | ############################## 15 | active_hosts() { 16 | 17 | hostlist="$1" 18 | os_status_list=$(openstack server list -f value -c ID -c Name -c Status) 19 | 20 | updated_hosts="" 21 | 22 | for host in $hostlist 23 | do 24 | # the quotes around os_status_list preserve newlines! 25 | if [[ "$(echo "$os_status_list" | awk -v host="$host " '$0 ~ host {print $3}')" == "ACTIVE" ]]; then 26 | echo -n "$host " 27 | elif [[ $(echo "${os_status_list}" | grep "$host " | wc -l) -ge 2 ]]; then 28 | #switch to using OS id, because we have multiples of the same host 29 | echo -n $(echo "${os_status_list}" | awk -v host="$host " '$0 ~ host {print $1}') 30 | fi 31 | done 32 | 33 | 34 | return 0 35 | 36 | } 37 | ############################## 38 | 39 | count=0 40 | declare -i count 41 | 42 | hostlist=$(scontrol show hostname $1 | tr '\n' ' ' | sed 's/[ ]*$//') 43 | 44 | #Now, try 3 times to ensure all hosts are suspended... 
45 | until [ -z "${hostlist}" -o $count -ge 3 ]; 46 | do 47 | for host in $hostlist 48 | do 49 | if [[ $count == 0 ]]; then 50 | scontrol update nodename=${host} nodeaddr="(null)" >> $log_loc 51 | fi 52 | destroy_result=$(openstack server delete $host 2>&1) 53 | echo "$(date) Deleted $host: $destroy_result" >> $log_loc 54 | done 55 | 56 | sleep 5 #wait a bit for hosts to enter STOP state 57 | count+=1 58 | hostlist="$(active_hosts "${hostlist}")" 59 | echo "$(date) delete Attempt $count: remaining hosts: $hostlist" >> $log_loc 60 | done 61 | -------------------------------------------------------------------------------- /slurm_test.job: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | #SBATCH -n 2 3 | #SBATCH -o nodes_%A.out 4 | 5 | module load gnu9 6 | module load openmpi4 7 | 8 | mpirun -n 2 hostname 9 | -------------------------------------------------------------------------------- /ssh.cfg: -------------------------------------------------------------------------------- 1 | Host dimuthu-vc-compute-* 2 | User rocky 3 | StrictHostKeyChecking no 4 | BatchMode yes 5 | UserKnownHostsFile=/dev/null 6 | IdentityFile /etc/slurm/.ssh/slurm-key 7 | 8 | Host 10.0.0.* 9 | User rocky 10 | StrictHostKeyChecking no 11 | BatchMode yes 12 | UserKnownHostsFile=/dev/null 13 | IdentityFile /etc/slurm/.ssh/slurm-key 14 | --------------------------------------------------------------------------------