├── LICENSE ├── README.md ├── additional_scripts └── install_singularity.sh ├── configuration_settings.txt ├── dependencies ├── etc │ ├── cron.monthly │ │ └── logstash_geoip_update │ ├── filebeat │ │ └── filebeat.yml │ ├── genders │ ├── init.d │ │ └── nvidia │ ├── logrotate.d │ │ └── slurm │ ├── logstash │ │ └── conf.d │ │ │ ├── 10-beats-input.conf │ │ │ ├── 20-syslog-filters.conf │ │ │ ├── 90-elasticsearch-output.conf │ │ │ └── 91-additional-output.conf │ ├── microway │ │ └── mcms_database.conf │ ├── nhc │ │ ├── compute-node-checks.conf │ │ ├── compute-node-checks_blocking-io.conf │ │ └── compute-node-checks_intense.conf │ ├── nvidia-healthmon.conf │ ├── slurm │ │ ├── cgroup.conf │ │ ├── cgroup_allowed_devices_file.conf │ │ ├── gres.conf │ │ ├── plugstack.conf │ │ ├── plugstack.conf.d │ │ │ └── x11.conf │ │ ├── scripts │ │ │ ├── slurm.epilog │ │ │ ├── slurm.healthcheck │ │ │ ├── slurm.healthcheck_long │ │ │ ├── slurm.jobstart_messages.sh │ │ │ ├── slurmctld.power_nodes_off │ │ │ ├── slurmctld.power_nodes_on │ │ │ ├── slurmctld.power_nodes_on_as_root │ │ │ ├── slurmctld.prolog │ │ │ └── slurmd.gres_init │ │ ├── slurm.conf │ │ └── slurmdbd.conf │ └── sysconfig │ │ ├── nhc │ │ └── nvidia ├── opt │ └── ohpc │ │ └── pub │ │ └── modulefiles │ │ └── cuda.lua ├── usr │ └── lib │ │ └── systemd │ │ └── system │ │ └── nvidia-gpu.service └── var │ └── spool │ └── slurmd │ └── validate-ssh-command ├── install_head_node.sh ├── install_login_server.sh └── install_monitoring_server.sh /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | Preamble 9 | 10 | The GNU General Public License is a free, copyleft license for 11 | software and other kinds of works. 12 | 13 | The licenses for most software and other practical works are designed 14 | to take away your freedom to share and change the works. By contrast, 15 | the GNU General Public License is intended to guarantee your freedom to 16 | share and change all versions of a program--to make sure it remains free 17 | software for all its users. We, the Free Software Foundation, use the 18 | GNU General Public License for most of our software; it applies also to 19 | any other work released this way by its authors. You can apply it to 20 | your programs, too. 21 | 22 | When we speak of free software, we are referring to freedom, not 23 | price. Our General Public Licenses are designed to make sure that you 24 | have the freedom to distribute copies of free software (and charge for 25 | them if you wish), that you receive source code or can get it if you 26 | want it, that you can change the software or use pieces of it in new 27 | free programs, and that you know you can do these things. 28 | 29 | To protect your rights, we need to prevent others from denying you 30 | these rights or asking you to surrender the rights. Therefore, you have 31 | certain responsibilities if you distribute copies of the software, or if 32 | you modify it: responsibilities to respect the freedom of others. 33 | 34 | For example, if you distribute copies of such a program, whether 35 | gratis or for a fee, you must pass on to the recipients the same 36 | freedoms that you received. You must make sure that they, too, receive 37 | or can get the source code. 
And you must show them these terms so they 38 | know their rights. 39 | 40 | Developers that use the GNU GPL protect your rights with two steps: 41 | (1) assert copyright on the software, and (2) offer you this License 42 | giving you legal permission to copy, distribute and/or modify it. 43 | 44 | For the developers' and authors' protection, the GPL clearly explains 45 | that there is no warranty for this free software. For both users' and 46 | authors' sake, the GPL requires that modified versions be marked as 47 | changed, so that their problems will not be attributed erroneously to 48 | authors of previous versions. 49 | 50 | Some devices are designed to deny users access to install or run 51 | modified versions of the software inside them, although the manufacturer 52 | can do so. This is fundamentally incompatible with the aim of 53 | protecting users' freedom to change the software. The systematic 54 | pattern of such abuse occurs in the area of products for individuals to 55 | use, which is precisely where it is most unacceptable. Therefore, we 56 | have designed this version of the GPL to prohibit the practice for those 57 | products. If such problems arise substantially in other domains, we 58 | stand ready to extend this provision to those domains in future versions 59 | of the GPL, as needed to protect the freedom of users. 60 | 61 | Finally, every program is threatened constantly by software patents. 62 | States should not allow patents to restrict development and use of 63 | software on general-purpose computers, but in those that do, we wish to 64 | avoid the special danger that patents applied to a free program could 65 | make it effectively proprietary. To prevent this, the GPL assures that 66 | patents cannot be used to render the program non-free. 67 | 68 | The precise terms and conditions for copying, distribution and 69 | modification follow. 70 | 71 | TERMS AND CONDITIONS 72 | 73 | 0. Definitions. 74 | 75 | "This License" refers to version 3 of the GNU General Public License. 76 | 77 | "Copyright" also means copyright-like laws that apply to other kinds of 78 | works, such as semiconductor masks. 79 | 80 | "The Program" refers to any copyrightable work licensed under this 81 | License. Each licensee is addressed as "you". "Licensees" and 82 | "recipients" may be individuals or organizations. 83 | 84 | To "modify" a work means to copy from or adapt all or part of the work 85 | in a fashion requiring copyright permission, other than the making of an 86 | exact copy. The resulting work is called a "modified version" of the 87 | earlier work or a work "based on" the earlier work. 88 | 89 | A "covered work" means either the unmodified Program or a work based 90 | on the Program. 91 | 92 | To "propagate" a work means to do anything with it that, without 93 | permission, would make you directly or secondarily liable for 94 | infringement under applicable copyright law, except executing it on a 95 | computer or modifying a private copy. Propagation includes copying, 96 | distribution (with or without modification), making available to the 97 | public, and in some countries other activities as well. 98 | 99 | To "convey" a work means any kind of propagation that enables other 100 | parties to make or receive copies. Mere interaction with a user through 101 | a computer network, with no transfer of a copy, is not conveying. 
102 | 103 | An interactive user interface displays "Appropriate Legal Notices" 104 | to the extent that it includes a convenient and prominently visible 105 | feature that (1) displays an appropriate copyright notice, and (2) 106 | tells the user that there is no warranty for the work (except to the 107 | extent that warranties are provided), that licensees may convey the 108 | work under this License, and how to view a copy of this License. If 109 | the interface presents a list of user commands or options, such as a 110 | menu, a prominent item in the list meets this criterion. 111 | 112 | 1. Source Code. 113 | 114 | The "source code" for a work means the preferred form of the work 115 | for making modifications to it. "Object code" means any non-source 116 | form of a work. 117 | 118 | A "Standard Interface" means an interface that either is an official 119 | standard defined by a recognized standards body, or, in the case of 120 | interfaces specified for a particular programming language, one that 121 | is widely used among developers working in that language. 122 | 123 | The "System Libraries" of an executable work include anything, other 124 | than the work as a whole, that (a) is included in the normal form of 125 | packaging a Major Component, but which is not part of that Major 126 | Component, and (b) serves only to enable use of the work with that 127 | Major Component, or to implement a Standard Interface for which an 128 | implementation is available to the public in source code form. A 129 | "Major Component", in this context, means a major essential component 130 | (kernel, window system, and so on) of the specific operating system 131 | (if any) on which the executable work runs, or a compiler used to 132 | produce the work, or an object code interpreter used to run it. 133 | 134 | The "Corresponding Source" for a work in object code form means all 135 | the source code needed to generate, install, and (for an executable 136 | work) run the object code and to modify the work, including scripts to 137 | control those activities. However, it does not include the work's 138 | System Libraries, or general-purpose tools or generally available free 139 | programs which are used unmodified in performing those activities but 140 | which are not part of the work. For example, Corresponding Source 141 | includes interface definition files associated with source files for 142 | the work, and the source code for shared libraries and dynamically 143 | linked subprograms that the work is specifically designed to require, 144 | such as by intimate data communication or control flow between those 145 | subprograms and other parts of the work. 146 | 147 | The Corresponding Source need not include anything that users 148 | can regenerate automatically from other parts of the Corresponding 149 | Source. 150 | 151 | The Corresponding Source for a work in source code form is that 152 | same work. 153 | 154 | 2. Basic Permissions. 155 | 156 | All rights granted under this License are granted for the term of 157 | copyright on the Program, and are irrevocable provided the stated 158 | conditions are met. This License explicitly affirms your unlimited 159 | permission to run the unmodified Program. The output from running a 160 | covered work is covered by this License only if the output, given its 161 | content, constitutes a covered work. This License acknowledges your 162 | rights of fair use or other equivalent, as provided by copyright law. 
163 | 164 | You may make, run and propagate covered works that you do not 165 | convey, without conditions so long as your license otherwise remains 166 | in force. You may convey covered works to others for the sole purpose 167 | of having them make modifications exclusively for you, or provide you 168 | with facilities for running those works, provided that you comply with 169 | the terms of this License in conveying all material for which you do 170 | not control copyright. Those thus making or running the covered works 171 | for you must do so exclusively on your behalf, under your direction 172 | and control, on terms that prohibit them from making any copies of 173 | your copyrighted material outside their relationship with you. 174 | 175 | Conveying under any other circumstances is permitted solely under 176 | the conditions stated below. Sublicensing is not allowed; section 10 177 | makes it unnecessary. 178 | 179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law. 180 | 181 | No covered work shall be deemed part of an effective technological 182 | measure under any applicable law fulfilling obligations under article 183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or 184 | similar laws prohibiting or restricting circumvention of such 185 | measures. 186 | 187 | When you convey a covered work, you waive any legal power to forbid 188 | circumvention of technological measures to the extent such circumvention 189 | is effected by exercising rights under this License with respect to 190 | the covered work, and you disclaim any intention to limit operation or 191 | modification of the work as a means of enforcing, against the work's 192 | users, your or third parties' legal rights to forbid circumvention of 193 | technological measures. 194 | 195 | 4. Conveying Verbatim Copies. 196 | 197 | You may convey verbatim copies of the Program's source code as you 198 | receive it, in any medium, provided that you conspicuously and 199 | appropriately publish on each copy an appropriate copyright notice; 200 | keep intact all notices stating that this License and any 201 | non-permissive terms added in accord with section 7 apply to the code; 202 | keep intact all notices of the absence of any warranty; and give all 203 | recipients a copy of this License along with the Program. 204 | 205 | You may charge any price or no price for each copy that you convey, 206 | and you may offer support or warranty protection for a fee. 207 | 208 | 5. Conveying Modified Source Versions. 209 | 210 | You may convey a work based on the Program, or the modifications to 211 | produce it from the Program, in the form of source code under the 212 | terms of section 4, provided that you also meet all of these conditions: 213 | 214 | a) The work must carry prominent notices stating that you modified 215 | it, and giving a relevant date. 216 | 217 | b) The work must carry prominent notices stating that it is 218 | released under this License and any conditions added under section 219 | 7. This requirement modifies the requirement in section 4 to 220 | "keep intact all notices". 221 | 222 | c) You must license the entire work, as a whole, under this 223 | License to anyone who comes into possession of a copy. This 224 | License will therefore apply, along with any applicable section 7 225 | additional terms, to the whole of the work, and all its parts, 226 | regardless of how they are packaged. 
This License gives no 227 | permission to license the work in any other way, but it does not 228 | invalidate such permission if you have separately received it. 229 | 230 | d) If the work has interactive user interfaces, each must display 231 | Appropriate Legal Notices; however, if the Program has interactive 232 | interfaces that do not display Appropriate Legal Notices, your 233 | work need not make them do so. 234 | 235 | A compilation of a covered work with other separate and independent 236 | works, which are not by their nature extensions of the covered work, 237 | and which are not combined with it such as to form a larger program, 238 | in or on a volume of a storage or distribution medium, is called an 239 | "aggregate" if the compilation and its resulting copyright are not 240 | used to limit the access or legal rights of the compilation's users 241 | beyond what the individual works permit. Inclusion of a covered work 242 | in an aggregate does not cause this License to apply to the other 243 | parts of the aggregate. 244 | 245 | 6. Conveying Non-Source Forms. 246 | 247 | You may convey a covered work in object code form under the terms 248 | of sections 4 and 5, provided that you also convey the 249 | machine-readable Corresponding Source under the terms of this License, 250 | in one of these ways: 251 | 252 | a) Convey the object code in, or embodied in, a physical product 253 | (including a physical distribution medium), accompanied by the 254 | Corresponding Source fixed on a durable physical medium 255 | customarily used for software interchange. 256 | 257 | b) Convey the object code in, or embodied in, a physical product 258 | (including a physical distribution medium), accompanied by a 259 | written offer, valid for at least three years and valid for as 260 | long as you offer spare parts or customer support for that product 261 | model, to give anyone who possesses the object code either (1) a 262 | copy of the Corresponding Source for all the software in the 263 | product that is covered by this License, on a durable physical 264 | medium customarily used for software interchange, for a price no 265 | more than your reasonable cost of physically performing this 266 | conveying of source, or (2) access to copy the 267 | Corresponding Source from a network server at no charge. 268 | 269 | c) Convey individual copies of the object code with a copy of the 270 | written offer to provide the Corresponding Source. This 271 | alternative is allowed only occasionally and noncommercially, and 272 | only if you received the object code with such an offer, in accord 273 | with subsection 6b. 274 | 275 | d) Convey the object code by offering access from a designated 276 | place (gratis or for a charge), and offer equivalent access to the 277 | Corresponding Source in the same way through the same place at no 278 | further charge. You need not require recipients to copy the 279 | Corresponding Source along with the object code. If the place to 280 | copy the object code is a network server, the Corresponding Source 281 | may be on a different server (operated by you or a third party) 282 | that supports equivalent copying facilities, provided you maintain 283 | clear directions next to the object code saying where to find the 284 | Corresponding Source. Regardless of what server hosts the 285 | Corresponding Source, you remain obligated to ensure that it is 286 | available for as long as needed to satisfy these requirements. 
287 | 288 | e) Convey the object code using peer-to-peer transmission, provided 289 | you inform other peers where the object code and Corresponding 290 | Source of the work are being offered to the general public at no 291 | charge under subsection 6d. 292 | 293 | A separable portion of the object code, whose source code is excluded 294 | from the Corresponding Source as a System Library, need not be 295 | included in conveying the object code work. 296 | 297 | A "User Product" is either (1) a "consumer product", which means any 298 | tangible personal property which is normally used for personal, family, 299 | or household purposes, or (2) anything designed or sold for incorporation 300 | into a dwelling. In determining whether a product is a consumer product, 301 | doubtful cases shall be resolved in favor of coverage. For a particular 302 | product received by a particular user, "normally used" refers to a 303 | typical or common use of that class of product, regardless of the status 304 | of the particular user or of the way in which the particular user 305 | actually uses, or expects or is expected to use, the product. A product 306 | is a consumer product regardless of whether the product has substantial 307 | commercial, industrial or non-consumer uses, unless such uses represent 308 | the only significant mode of use of the product. 309 | 310 | "Installation Information" for a User Product means any methods, 311 | procedures, authorization keys, or other information required to install 312 | and execute modified versions of a covered work in that User Product from 313 | a modified version of its Corresponding Source. The information must 314 | suffice to ensure that the continued functioning of the modified object 315 | code is in no case prevented or interfered with solely because 316 | modification has been made. 317 | 318 | If you convey an object code work under this section in, or with, or 319 | specifically for use in, a User Product, and the conveying occurs as 320 | part of a transaction in which the right of possession and use of the 321 | User Product is transferred to the recipient in perpetuity or for a 322 | fixed term (regardless of how the transaction is characterized), the 323 | Corresponding Source conveyed under this section must be accompanied 324 | by the Installation Information. But this requirement does not apply 325 | if neither you nor any third party retains the ability to install 326 | modified object code on the User Product (for example, the work has 327 | been installed in ROM). 328 | 329 | The requirement to provide Installation Information does not include a 330 | requirement to continue to provide support service, warranty, or updates 331 | for a work that has been modified or installed by the recipient, or for 332 | the User Product in which it has been modified or installed. Access to a 333 | network may be denied when the modification itself materially and 334 | adversely affects the operation of the network or violates the rules and 335 | protocols for communication across the network. 336 | 337 | Corresponding Source conveyed, and Installation Information provided, 338 | in accord with this section must be in a format that is publicly 339 | documented (and with an implementation available to the public in 340 | source code form), and must require no special password or key for 341 | unpacking, reading or copying. 342 | 343 | 7. Additional Terms. 
344 | 345 | "Additional permissions" are terms that supplement the terms of this 346 | License by making exceptions from one or more of its conditions. 347 | Additional permissions that are applicable to the entire Program shall 348 | be treated as though they were included in this License, to the extent 349 | that they are valid under applicable law. If additional permissions 350 | apply only to part of the Program, that part may be used separately 351 | under those permissions, but the entire Program remains governed by 352 | this License without regard to the additional permissions. 353 | 354 | When you convey a copy of a covered work, you may at your option 355 | remove any additional permissions from that copy, or from any part of 356 | it. (Additional permissions may be written to require their own 357 | removal in certain cases when you modify the work.) You may place 358 | additional permissions on material, added by you to a covered work, 359 | for which you have or can give appropriate copyright permission. 360 | 361 | Notwithstanding any other provision of this License, for material you 362 | add to a covered work, you may (if authorized by the copyright holders of 363 | that material) supplement the terms of this License with terms: 364 | 365 | a) Disclaiming warranty or limiting liability differently from the 366 | terms of sections 15 and 16 of this License; or 367 | 368 | b) Requiring preservation of specified reasonable legal notices or 369 | author attributions in that material or in the Appropriate Legal 370 | Notices displayed by works containing it; or 371 | 372 | c) Prohibiting misrepresentation of the origin of that material, or 373 | requiring that modified versions of such material be marked in 374 | reasonable ways as different from the original version; or 375 | 376 | d) Limiting the use for publicity purposes of names of licensors or 377 | authors of the material; or 378 | 379 | e) Declining to grant rights under trademark law for use of some 380 | trade names, trademarks, or service marks; or 381 | 382 | f) Requiring indemnification of licensors and authors of that 383 | material by anyone who conveys the material (or modified versions of 384 | it) with contractual assumptions of liability to the recipient, for 385 | any liability that these contractual assumptions directly impose on 386 | those licensors and authors. 387 | 388 | All other non-permissive additional terms are considered "further 389 | restrictions" within the meaning of section 10. If the Program as you 390 | received it, or any part of it, contains a notice stating that it is 391 | governed by this License along with a term that is a further 392 | restriction, you may remove that term. If a license document contains 393 | a further restriction but permits relicensing or conveying under this 394 | License, you may add to a covered work material governed by the terms 395 | of that license document, provided that the further restriction does 396 | not survive such relicensing or conveying. 397 | 398 | If you add terms to a covered work in accord with this section, you 399 | must place, in the relevant source files, a statement of the 400 | additional terms that apply to those files, or a notice indicating 401 | where to find the applicable terms. 402 | 403 | Additional terms, permissive or non-permissive, may be stated in the 404 | form of a separately written license, or stated as exceptions; 405 | the above requirements apply either way. 406 | 407 | 8. Termination. 
408 | 409 | You may not propagate or modify a covered work except as expressly 410 | provided under this License. Any attempt otherwise to propagate or 411 | modify it is void, and will automatically terminate your rights under 412 | this License (including any patent licenses granted under the third 413 | paragraph of section 11). 414 | 415 | However, if you cease all violation of this License, then your 416 | license from a particular copyright holder is reinstated (a) 417 | provisionally, unless and until the copyright holder explicitly and 418 | finally terminates your license, and (b) permanently, if the copyright 419 | holder fails to notify you of the violation by some reasonable means 420 | prior to 60 days after the cessation. 421 | 422 | Moreover, your license from a particular copyright holder is 423 | reinstated permanently if the copyright holder notifies you of the 424 | violation by some reasonable means, this is the first time you have 425 | received notice of violation of this License (for any work) from that 426 | copyright holder, and you cure the violation prior to 30 days after 427 | your receipt of the notice. 428 | 429 | Termination of your rights under this section does not terminate the 430 | licenses of parties who have received copies or rights from you under 431 | this License. If your rights have been terminated and not permanently 432 | reinstated, you do not qualify to receive new licenses for the same 433 | material under section 10. 434 | 435 | 9. Acceptance Not Required for Having Copies. 436 | 437 | You are not required to accept this License in order to receive or 438 | run a copy of the Program. Ancillary propagation of a covered work 439 | occurring solely as a consequence of using peer-to-peer transmission 440 | to receive a copy likewise does not require acceptance. However, 441 | nothing other than this License grants you permission to propagate or 442 | modify any covered work. These actions infringe copyright if you do 443 | not accept this License. Therefore, by modifying or propagating a 444 | covered work, you indicate your acceptance of this License to do so. 445 | 446 | 10. Automatic Licensing of Downstream Recipients. 447 | 448 | Each time you convey a covered work, the recipient automatically 449 | receives a license from the original licensors, to run, modify and 450 | propagate that work, subject to this License. You are not responsible 451 | for enforcing compliance by third parties with this License. 452 | 453 | An "entity transaction" is a transaction transferring control of an 454 | organization, or substantially all assets of one, or subdividing an 455 | organization, or merging organizations. If propagation of a covered 456 | work results from an entity transaction, each party to that 457 | transaction who receives a copy of the work also receives whatever 458 | licenses to the work the party's predecessor in interest had or could 459 | give under the previous paragraph, plus a right to possession of the 460 | Corresponding Source of the work from the predecessor in interest, if 461 | the predecessor has it or can get it with reasonable efforts. 462 | 463 | You may not impose any further restrictions on the exercise of the 464 | rights granted or affirmed under this License. 
For example, you may 465 | not impose a license fee, royalty, or other charge for exercise of 466 | rights granted under this License, and you may not initiate litigation 467 | (including a cross-claim or counterclaim in a lawsuit) alleging that 468 | any patent claim is infringed by making, using, selling, offering for 469 | sale, or importing the Program or any portion of it. 470 | 471 | 11. Patents. 472 | 473 | A "contributor" is a copyright holder who authorizes use under this 474 | License of the Program or a work on which the Program is based. The 475 | work thus licensed is called the contributor's "contributor version". 476 | 477 | A contributor's "essential patent claims" are all patent claims 478 | owned or controlled by the contributor, whether already acquired or 479 | hereafter acquired, that would be infringed by some manner, permitted 480 | by this License, of making, using, or selling its contributor version, 481 | but do not include claims that would be infringed only as a 482 | consequence of further modification of the contributor version. For 483 | purposes of this definition, "control" includes the right to grant 484 | patent sublicenses in a manner consistent with the requirements of 485 | this License. 486 | 487 | Each contributor grants you a non-exclusive, worldwide, royalty-free 488 | patent license under the contributor's essential patent claims, to 489 | make, use, sell, offer for sale, import and otherwise run, modify and 490 | propagate the contents of its contributor version. 491 | 492 | In the following three paragraphs, a "patent license" is any express 493 | agreement or commitment, however denominated, not to enforce a patent 494 | (such as an express permission to practice a patent or covenant not to 495 | sue for patent infringement). To "grant" such a patent license to a 496 | party means to make such an agreement or commitment not to enforce a 497 | patent against the party. 498 | 499 | If you convey a covered work, knowingly relying on a patent license, 500 | and the Corresponding Source of the work is not available for anyone 501 | to copy, free of charge and under the terms of this License, through a 502 | publicly available network server or other readily accessible means, 503 | then you must either (1) cause the Corresponding Source to be so 504 | available, or (2) arrange to deprive yourself of the benefit of the 505 | patent license for this particular work, or (3) arrange, in a manner 506 | consistent with the requirements of this License, to extend the patent 507 | license to downstream recipients. "Knowingly relying" means you have 508 | actual knowledge that, but for the patent license, your conveying the 509 | covered work in a country, or your recipient's use of the covered work 510 | in a country, would infringe one or more identifiable patents in that 511 | country that you have reason to believe are valid. 512 | 513 | If, pursuant to or in connection with a single transaction or 514 | arrangement, you convey, or propagate by procuring conveyance of, a 515 | covered work, and grant a patent license to some of the parties 516 | receiving the covered work authorizing them to use, propagate, modify 517 | or convey a specific copy of the covered work, then the patent license 518 | you grant is automatically extended to all recipients of the covered 519 | work and works based on it. 
520 | 521 | A patent license is "discriminatory" if it does not include within 522 | the scope of its coverage, prohibits the exercise of, or is 523 | conditioned on the non-exercise of one or more of the rights that are 524 | specifically granted under this License. You may not convey a covered 525 | work if you are a party to an arrangement with a third party that is 526 | in the business of distributing software, under which you make payment 527 | to the third party based on the extent of your activity of conveying 528 | the work, and under which the third party grants, to any of the 529 | parties who would receive the covered work from you, a discriminatory 530 | patent license (a) in connection with copies of the covered work 531 | conveyed by you (or copies made from those copies), or (b) primarily 532 | for and in connection with specific products or compilations that 533 | contain the covered work, unless you entered into that arrangement, 534 | or that patent license was granted, prior to 28 March 2007. 535 | 536 | Nothing in this License shall be construed as excluding or limiting 537 | any implied license or other defenses to infringement that may 538 | otherwise be available to you under applicable patent law. 539 | 540 | 12. No Surrender of Others' Freedom. 541 | 542 | If conditions are imposed on you (whether by court order, agreement or 543 | otherwise) that contradict the conditions of this License, they do not 544 | excuse you from the conditions of this License. If you cannot convey a 545 | covered work so as to satisfy simultaneously your obligations under this 546 | License and any other pertinent obligations, then as a consequence you may 547 | not convey it at all. For example, if you agree to terms that obligate you 548 | to collect a royalty for further conveying from those to whom you convey 549 | the Program, the only way you could satisfy both those terms and this 550 | License would be to refrain entirely from conveying the Program. 551 | 552 | 13. Use with the GNU Affero General Public License. 553 | 554 | Notwithstanding any other provision of this License, you have 555 | permission to link or combine any covered work with a work licensed 556 | under version 3 of the GNU Affero General Public License into a single 557 | combined work, and to convey the resulting work. The terms of this 558 | License will continue to apply to the part which is the covered work, 559 | but the special requirements of the GNU Affero General Public License, 560 | section 13, concerning interaction through a network will apply to the 561 | combination as such. 562 | 563 | 14. Revised Versions of this License. 564 | 565 | The Free Software Foundation may publish revised and/or new versions of 566 | the GNU General Public License from time to time. Such new versions will 567 | be similar in spirit to the present version, but may differ in detail to 568 | address new problems or concerns. 569 | 570 | Each version is given a distinguishing version number. If the 571 | Program specifies that a certain numbered version of the GNU General 572 | Public License "or any later version" applies to it, you have the 573 | option of following the terms and conditions either of that numbered 574 | version or of any later version published by the Free Software 575 | Foundation. If the Program does not specify a version number of the 576 | GNU General Public License, you may choose any version ever published 577 | by the Free Software Foundation. 
578 | 579 | If the Program specifies that a proxy can decide which future 580 | versions of the GNU General Public License can be used, that proxy's 581 | public statement of acceptance of a version permanently authorizes you 582 | to choose that version for the Program. 583 | 584 | Later license versions may give you additional or different 585 | permissions. However, no additional obligations are imposed on any 586 | author or copyright holder as a result of your choosing to follow a 587 | later version. 588 | 589 | 15. Disclaimer of Warranty. 590 | 591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY 592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT 593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY 594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, 595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM 597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF 598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 599 | 600 | 16. Limitation of Liability. 601 | 602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS 604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE 606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF 607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD 608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), 609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF 610 | SUCH DAMAGES. 611 | 612 | 17. Interpretation of Sections 15 and 16. 613 | 614 | If the disclaimer of warranty and limitation of liability provided 615 | above cannot be given local legal effect according to their terms, 616 | reviewing courts shall apply local law that most closely approximates 617 | an absolute waiver of all civil liability in connection with the 618 | Program, unless a warranty or assumption of liability accompanies a 619 | copy of the Program in return for a fee. 620 | 621 | END OF TERMS AND CONDITIONS 622 | 623 | How to Apply These Terms to Your New Programs 624 | 625 | If you develop a new program, and you want it to be of the greatest 626 | possible use to the public, the best way to achieve this is to make it 627 | free software which everyone can redistribute and change under these terms. 628 | 629 | To do so, attach the following notices to the program. It is safest 630 | to attach them to the start of each source file to most effectively 631 | state the exclusion of warranty; and each file should have at least 632 | the "copyright" line and a pointer to where the full notice is found. 633 | 634 | {one line to give the program's name and a brief idea of what it does.} 635 | Copyright (C) {year} {name of author} 636 | 637 | This program is free software: you can redistribute it and/or modify 638 | it under the terms of the GNU General Public License as published by 639 | the Free Software Foundation, either version 3 of the License, or 640 | (at your option) any later version. 
641 | 642 | This program is distributed in the hope that it will be useful, 643 | but WITHOUT ANY WARRANTY; without even the implied warranty of 644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 645 | GNU General Public License for more details. 646 | 647 | You should have received a copy of the GNU General Public License 648 | along with this program. If not, see . 649 | 650 | Also add information on how to contact you by electronic and paper mail. 651 | 652 | If the program does terminal interaction, make it output a short 653 | notice like this when it starts in an interactive mode: 654 | 655 | {project} Copyright (C) {year} {fullname} 656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 657 | This is free software, and you are welcome to redistribute it 658 | under certain conditions; type `show c' for details. 659 | 660 | The hypothetical commands `show w' and `show c' should show the appropriate 661 | parts of the General Public License. Of course, your program's commands 662 | might be different; for a GUI interface, you would use an "about box". 663 | 664 | You should also get your employer (if you work as a programmer) or school, 665 | if any, to sign a "copyright disclaimer" for the program, if necessary. 666 | For more information on this, and how to apply and follow the GNU GPL, see 667 | . 668 | 669 | The GNU General Public License does not permit incorporating your program 670 | into proprietary programs. If your program is a subroutine library, you 671 | may consider it more useful to permit linking proprietary applications with 672 | the library. If this is what you want to do, use the GNU Lesser General 673 | Public License instead of this License. But first, please read 674 | . 675 | 676 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MCMS for OpenHPC Recipe 2 | 3 | [![Join the chat at https://gitter.im/Microway/MCMS-OpenHPC-Recipe](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/Microway/MCMS-OpenHPC-Recipe?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) 4 | 5 | ## This is an experimental work in progress - it is not ready for production 6 | 7 | MCMS is Microway's Cluster Management Software. This is not the production-ready 8 | version of MCMS. This is an ongoing project to bring Microway's expertise and 9 | software tools to the recently-announced OpenHPC collaborative framework. 10 | 11 | ### Purpose 12 | This recipe contains many of the same elements as the official OpenHPC recipe, 13 | but offers a variety of customizations and enhancements, including: 14 | 15 | * Automated power-down of idle compute nodes 16 | * Support for Mellanox InfiniBand 17 | * Support for NVIDIA GPU accelerators 18 | * Monitoring of many additional metrics **(WIP)** 19 | * More sophisticated log collection and analysis **(WIP)** 20 | * Nagios-compatible monitoring with a more modern interface **(WIP)** 21 | 22 | ### Installation 23 | *Given a vanilla CentOS 7.x installation, this collection of scripts will stand 24 | up an OpenHPC cluster. 
These scripts are only tested against fresh installations - 25 | attempting to run them on an installation that has had a lot of changes may break things.* 26 | 27 | ``` 28 | # Use your favorite text editor to customize the install 29 | vim configuration_settings.txt 30 | 31 | # Run the installation on the new Head Node 32 | ./install_head_node.sh 33 | ``` 34 | 35 | ### More Information 36 | If you would like to purchase professional support/services for an OpenHPC 37 | cluster, or to fund development of a new feature, please visit: 38 | https://www.microway.com/contact/ 39 | 40 | To learn more about OpenHPC or to view the official installation recipe, visit: 41 | http://www.openhpc.community/ 42 | 43 | -------------------------------------------------------------------------------- /additional_scripts/install_singularity.sh: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | ################################################################################ 26 | ## 27 | ## Instructions for setting up Singularity on a CentOS system 28 | ## 29 | ################################################################################ 30 | 31 | # Grab the machine architecture (most commonly x86_64) 32 | machine_arch=$(uname -m) 33 | 34 | # Set the default node VNFS chroot if one is not already set 35 | node_chroot="${node_chroot:-/opt/ohpc/admin/images/centos-7/}" 36 | 37 | git clone -b master --depth 1 https://github.com/gmkurtzer/singularity.git 38 | 39 | cd singularity/ 40 | sh ./autogen.sh 41 | make dist 42 | rpmbuild -ta singularity-[0-9]*.tar.gz 43 | cd ../ 44 | rm -Rf singularity 45 | 46 | yum -y install ~/rpmbuild/RPMS/${machine_arch}/singularity-[0-9]*.rpm 47 | yum -y --installroot=${node_chroot} install ~/rpmbuild/RPMS/${machine_arch}/singularity-[0-9]*.rpm 48 | 49 | -------------------------------------------------------------------------------- /configuration_settings.txt: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015-2016 by Microway, Inc.
6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | 26 | ################################################################################ 27 | ## 28 | ## This file defines the configuration settings for an OpenHPC cluster install. 29 | ## Configure the settings below before starting the cluster installation script. 30 | ## 31 | ## The System Management Server (SMS) is often called the Head or Master Node. 32 | ## 33 | ################################################################################ 34 | 35 | 36 | 37 | ################################################################################ 38 | # Mandatory settings - the default passwords are not acceptable! 39 | ################################################################################ 40 | 41 | # Number of compute nodes to initialize 42 | compute_node_count=4 43 | 44 | # Root password for the databases (MariaDB/MongoDB) 45 | db_root_password="ChangeMe" 46 | 47 | # Management password for the databases (will be used by Warewulf/SLURM) 48 | db_mgmt_password="ChangeMe" 49 | 50 | # BMC username and password for use by IPMI 51 | # Warewulf will add this user to the BMC on each compute node 52 | bmc_username="wwipmi" 53 | bmc_password="ChangeMe" 54 | 55 | # A mail server to which the cluster may forward notices, alerts, etc. 56 | # Most commonly, this will be the mail server used on your internal network. 57 | mail_server="mailserver.example.com" 58 | 59 | 60 | ################################################################################ 61 | # Optional settings 62 | ################################################################################ 63 | 64 | # Install InfiniBand drivers and tools 65 | enable_infiniband="true" 66 | 67 | # Install NVIDIA GPU drivers and tools 68 | enable_nvidia_gpu="true" 69 | 70 | # Install Intel Xeon Phi coprocessor drivers and tools 71 | # (disabled by default due to the additional steps necessary for Phi) 72 | enable_phi_coprocessor="false" 73 | 74 | # Restrict users from logging into any node via SSH (unless a job is running) 75 | restrict_user_ssh_logins="true" 76 | 77 | 78 | # This information is used to set up the hierarchy for SLURM accounting. It is 79 | # easy to add more accounts after installation using the sacctmgr utility. 80 | # 81 | # TAKE NOTE: SLURM wants lower-case all-one-word organization and account names! 
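#
# As an illustration only (nothing in these scripts runs this), additional
# accounts could later be added with sacctmgr roughly as follows -- the
# 'physics' account and 'alice' user below are hypothetical examples:
#
#   sacctmgr -i add account physics Description="Physics Department" \
#       Organization=unnamed_organization
#   sacctmgr -i add user alice Account=physics
#
# (The -i/--immediate flag applies the change without a confirmation prompt.)
#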
82 | # 83 | declare -A cluster_acct_hierarchy 84 | cluster_acct_hierarchy['cluster_name']="microway_hpc" 85 | cluster_acct_hierarchy['default_organization']="unnamed_organization" 86 | cluster_acct_hierarchy['default_organization_description']="Unnamed Organization" 87 | cluster_acct_hierarchy['default_account']="default_account" 88 | cluster_acct_hierarchy['default_account_description']="Default User Account" 89 | 90 | 91 | # MAC addresses for compute nodes 92 | # 93 | # If you don't know the MAC addresses, leave the defaults. However, you will 94 | # need to update these later using the Warewulf wwsh tool. Compute Nodes will 95 | # not boot correctly until each MAC address is registered within Warewulf. 96 | declare -A c_mac 97 | # 98 | # For now, we'll generate bogus MAC addresses 99 | for ((i=0; i<${compute_node_count}; i++)); do 100 | # This algorithm supports up to 10^6 compute nodes (one decimal digit per octet) 101 | first_octet=$(( ($i+1) / 100000 % 10 )) 102 | second_octet=$(( ($i+1) / 10000 % 10 )) 103 | third_octet=$(( ($i+1) / 1000 % 10 )) 104 | fourth_octet=$(( ($i+1) / 100 % 10 )) 105 | fifth_octet=$(( ($i+1) / 10 % 10 )) 106 | sixth_octet=$(( ($i+1) % 10 )) 107 | c_mac[$i]=0${first_octet}:0${second_octet}:0${third_octet}:0${fourth_octet}:0${fifth_octet}:0${sixth_octet} 108 | done 109 | # 110 | # Set these values to have the nodes come up on boot: 111 | # 112 | # c_mac[0]=01:02:03:04:05:06 113 | # c_mac[1]=02:02:03:04:05:06 114 | # c_mac[2]=03:02:03:04:05:06 115 | # c_mac[3]=04:02:03:04:05:06 116 | # 117 | 118 | 119 | 120 | # MCMS cluster hosts 121 | # ================== 122 | # 123 | # A cluster needs a head node: 124 | # 125 | # head 126 | # 127 | # For redundancy, it needs two: 128 | # 129 | # head-a 130 | # head-b 131 | # 132 | # 133 | # There may also be storage and login/session nodes. For example: 134 | # 135 | # metadata1 136 | # metadata2 137 | # 138 | # storage1 139 | # storage2 140 | # ... 141 | # storage63 142 | # 143 | # login-a 144 | # login-b 145 | # 146 | # 147 | # Compute node names can vary as needed, but should not include any of the names 148 | # listed above. Keep it simple and end with a number (which allows admins to 149 | # specify node ranges such as node[1-20] node[2,4,6,8] node[1,10-20] ). 150 | # 151 | # node1 152 | # node2 153 | # ... 154 | # node32768 155 | # 156 | # 157 | # 158 | # Network address ranges 159 | # ====================== 160 | # 161 | # In a modern cluster, each server/node contains several IP-enabled devices, 162 | # such as Ethernet, InfiniBand, IPMI, etc. Microway recommends that a class B 163 | # network be devoted to each type of traffic (for simplicity and scaling). 164 | # 165 | # By default, Microway recommends that one of the following subnets be 166 | # divided up and used for the cluster traffic. Choose whichever does not 167 | # conflict with your existing private networks: 168 | # 169 | # 10.0.0.0/8 default (supports IPs 10.0.0.1 through 10.255.255.254) 170 | # 172.16.0.0/12 (supports IPs 172.16.0.1 through 172.31.255.254) 171 | # 172 | # 173 | # The following subnets are recommended: 174 | # ====================================== 175 | # 176 | # 10.0.0.1 - 10.0.255.254 (Ethernet) 177 | # 10.10.0.1 - 10.10.255.254 (InfiniBand) 178 | # 10.13.0.1 - 10.13.255.254 (IPMI) 179 | # 180 | # For clusters with IP-enabled accelerators (such as Xeon Phi), use numbering: 181 | # 10.100.0.1 + (Accelerator #0) 182 | # 10.101.0.1 + (Accelerator #1) 183 | # ...
184 | # 10.10N.0.1 + (Accelerator #N) 185 | # 186 | 187 | # The network prefix and subnet netmask for the internal network 188 | internal_subnet_prefix=10.0 189 | internal_netmask=255.255.0.0 190 | 191 | # Network Prefix and Subnet Netmask for internal IPoIB (if IB is enabled) 192 | ipoib_network_prefix=10.10 193 | ipoib_netmask=255.255.0.0 194 | 195 | # The network prefix and subnet netmask for the BMC/IPMI network 196 | bmc_subnet_prefix=10.13 197 | bmc_netmask=255.255.0.0 198 | 199 | # The network interface on the compute nodes which will download the node image 200 | eth_provision=eth0 201 | 202 | # The first part of each node's name. Node numbers will be appended, so if the 203 | # prefix is set to 'star' then the nodes will be: star1, star2, star3, etc. 204 | compute_node_name_prefix="node" 205 | 206 | # A regular expression which will capture all compute node names. This value is 207 | # almost always safe (unless exotic and irregular names are selected). 208 | compute_regex="${compute_node_name_prefix}*" 209 | 210 | # OpenHPC repo location 211 | ohpc_repo=https://github.com/openhpc/ohpc/releases/download/v1.1.GA/ohpc-release-centos7.2-1.1-1.x86_64.rpm 212 | 213 | # Local NTP server for time synchronization - this is typically only necessary 214 | # if the cluster doesn't have access to NTP servers from the Internet. 215 | ntp_server="" 216 | 217 | # Additional arguments to send to the Linux kernel on compute nodes 218 | kargs="" 219 | 220 | # Set up a Lustre filesystem mount 221 | enable_lustre_client="false" 222 | 223 | # Lustre MGS mount name (if Lustre is enabled) 224 | mgs_fs_name="${mgs_fs_name:-10.0.255.254@o2ib:/lustre}" 225 | 226 | 227 | 228 | 229 | ################################################################################ 230 | # Settings which should be set up before the installation script is executed. In 231 | # other words, you must set the SMS/Head Node's hostname and IP addresses before 232 | # beginning. The values you have set are loaded in at the end of this file. 233 | # 234 | # sms_name Hostname for SMS server 235 | # sms_ip Internal IP address on SMS server 236 | # sms_eth_internal Internal Ethernet interface on SMS 237 | # sms_ipoib IPoIB address for SMS server 238 | # 239 | ################################################################################ 240 | 241 | sms_name=$(hostname --short) 242 | sms_ip=$(ip route get ${internal_subnet_prefix}.0.1 | head -n1 | sed 's/.*src //' | tr -d '[[:space:]]') 243 | sms_eth_internal=$(ip route get ${internal_subnet_prefix}.0.1 | head -n1 | sed 's/.*dev \([^ ]*\) .*/\1/') 244 | sms_ipoib=${ipoib_network_prefix}.$(echo ${sms_ip} | sed -r 's/[0-9]+\.[0-9]+\.//') 245 | 246 | # How would we get to Google? That defines the external interface. 247 | sms_eth_external=$(ip route get 8.8.8.8 | head -n1 | sed 's/.*dev \([^ ]*\) .*/\1/') 248 | 249 | 250 | 251 | 252 | ################################################################################ 253 | # Settings which will be auto-calculated during execution: 254 | # 255 | # c_ip[0], c_ip[1], ... Desired compute node addresses 256 | # c_bmc[0], c_bmc[1], ... BMC addresses for compute nodes 257 | # c_ipoib[0], c_ipoib[1], ... IPoIB addresses for computes 258 | # 259 | ################################################################################ 260 | 261 | 262 | # Compute node IP addresses will start at x.x.0.1 and increase from there (with 263 | # a maximum limit of x.x.255.254, at which point the addresses will wrap).
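#
# As a worked example using the defaults above (compute_node_count=4,
# internal_subnet_prefix=10.0, bmc_subnet_prefix=10.13, ipoib_network_prefix=10.10),
# the three loops below produce:
#
#   c_ip[0]=10.0.0.1    c_bmc[0]=10.13.0.1    c_ipoib[0]=10.10.0.1
#   c_ip[1]=10.0.0.2    c_bmc[1]=10.13.0.2    c_ipoib[1]=10.10.0.2
#   c_ip[2]=10.0.0.3    c_bmc[2]=10.13.0.3    c_ipoib[2]=10.10.0.3
#   c_ip[3]=10.0.0.4    c_bmc[3]=10.13.0.4    c_ipoib[3]=10.10.0.4
#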
264 | declare -A c_ip 265 | for ((i=0; i<${compute_node_count}; i++)); do 266 | third_octet=$(( ($i+1) / 255 )) 267 | fourth_octet=$(( ($i+1) % 255 )) 268 | c_ip[$i]=${internal_subnet_prefix}.${third_octet}.${fourth_octet} 269 | done 270 | 271 | 272 | # Node BMC IP addresses will start at x.x.0.1 and increase from there (with 273 | # a maximum limit of x.x.255.254, at which point the addresses will wrap). 274 | declare -A c_bmc 275 | for ((i=0; i<${compute_node_count}; i++)); do 276 | third_octet=$(( ($i+1) / 255 )) 277 | fourth_octet=$(( ($i+1) % 255 )) 278 | c_bmc[$i]=${bmc_subnet_prefix}.${third_octet}.${fourth_octet} 279 | done 280 | 281 | 282 | # Node IPoIB addresses will start at x.x.0.1 and increase from there (with 283 | # a maximum limit of x.x.255.254, at which point the addresses will wrap). 284 | declare -A c_ipoib 285 | for ((i=0; i<${compute_node_count}; i++)); do 286 | third_octet=$(( ($i+1) / 255 )) 287 | fourth_octet=$(( ($i+1) % 255 )) 288 | c_ipoib[$i]=${ipoib_network_prefix}.${third_octet}.${fourth_octet} 289 | done 290 | 291 | -------------------------------------------------------------------------------- /dependencies/etc/cron.monthly/logstash_geoip_update: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015-2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | 27 | ################################################################################ 28 | # 29 | # Update the GeoLiteCity database which is used for GeoIP lookups in Logstash 30 | # 31 | ################################################################################ 32 | 33 | db_url="http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz" 34 | 35 | tmp_file="/etc/logstash/GeoLiteCity.dat.tmp" 36 | db_file="/etc/logstash/GeoLiteCity.dat" 37 | 38 | 39 | # Download the new database file with some fairly lenient settings (to survive 40 | # interruptions and to minimize network load). 41 | wget_options="--quiet --tries=20 --waitretry=100 --retry-connrefused --limit-rate=50k" 42 | 43 | 44 | wget_path=$(which wget) 45 | if [[ "$?" -gt "0" ]]; then 46 | echo "Unable to locate wget utility" 47 | exit 1 48 | fi 49 | 50 | 51 | # Sleep a random amount (up to 8 hours) to prevent multiple 52 | # clusters from DDOSing the GeoIP database site. 
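# (For reference: bash's $RANDOM ranges from 0 to 32767, so the expression below
# sleeps between 0 and 28799 seconds -- i.e. just under the full 8 hours.)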
53 | sleep $(( $RANDOM % (60 * 60 * 8) )) 54 | 55 | 56 | $wget_path $wget_options --output-document=- $db_url | gunzip > $tmp_file 57 | 58 | 59 | # Assuming the update completed successfully, move the new file into place 60 | RETVAL=$? 61 | if [[ "$RETVAL" -eq "0" ]]; then 62 | mv --force $tmp_file $db_file 63 | else 64 | echo "Unable to update GeoLiteCity IP geolocation database for Logstash" 65 | exit $RETVAL 66 | fi 67 | -------------------------------------------------------------------------------- /dependencies/etc/filebeat/filebeat.yml: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | 26 | ################################################################################ 27 | ## 28 | ## The following log files will be monitored and their contents forwarded to 29 | ## the specified logstash server(s). Log parsing takes place on those host(s). 30 | ## 31 | ################################################################################ 32 | 33 | 34 | filebeat: 35 | prospectors: 36 | - 37 | paths: 38 | - /var/log/cron 39 | - /var/log/maillog 40 | - /var/log/messages 41 | - /var/log/secure 42 | input_type: log 43 | document_type: syslog 44 | - 45 | paths: 46 | - /var/log/slurm/*.log 47 | input_type: log 48 | document_type: slurm 49 | 50 | registry_file: /var/lib/filebeat/registry 51 | 52 | output: 53 | logstash: 54 | hosts: ["{sms_ip}:5044"] 55 | 56 | tls: 57 | certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"] 58 | -------------------------------------------------------------------------------- /dependencies/etc/genders: -------------------------------------------------------------------------------- 1 | # /etc/genders 2 | # 3 | # Defines cluster components (Head Node, Compute Nodes, etc.) 
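# Example entries (hypothetical hostnames - adjust to match this cluster's
# actual node names and the node types recommended below):
#
#   hn01      head,sms
#   login1    login
#   node1     compute
#   node2     compute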
4 | # 5 | # Used by genders library for PDSH, PDCP and other utilities 6 | # 7 | # 8 | # Recommended types to define in every cluster: 9 | # head (sms) (the Head/Master/System Management Server) 10 | # login (the Login Nodes of the cluster) 11 | # compute (the Compute Nodes of the cluster) 12 | # 13 | ############################################################################## 14 | -------------------------------------------------------------------------------- /dependencies/etc/init.d/nvidia: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # nvidia Set up NVIDIA GPU Compute Accelerators 4 | # 5 | # chkconfig: 2345 55 25 6 | # description: NVIDIA GPUs provide additional compute capability. \ 7 | # This service sets the GPUs into the desired state. 8 | # 9 | # config: /etc/sysconfig/nvidia 10 | 11 | ### BEGIN INIT INFO 12 | # Provides: nvidia 13 | # Required-Start: $local_fs $network $syslog 14 | # Required-Stop: $local_fs $syslog 15 | # Should-Start: $syslog 16 | # Should-Stop: $network $syslog 17 | # Default-Start: 2 3 4 5 18 | # Default-Stop: 0 1 6 19 | # Short-Description: Set GPUs into the desired state 20 | # Description: NVIDIA GPUs provide additional compute capability. 21 | # This service sets the GPUs into the desired state. 22 | ### END INIT INFO 23 | 24 | 25 | ################################################################################ 26 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 27 | ################################################################################ 28 | # 29 | # Copyright (c) 2015-2016 by Microway, Inc. 30 | # 31 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 32 | # 33 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 34 | # it under the terms of the GNU General Public License as published by 35 | # the Free Software Foundation, either version 3 of the License, or 36 | # (at your option) any later version. 37 | # 38 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 39 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 40 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 41 | # GNU General Public License for more details. 42 | # 43 | # You should have received a copy of the GNU General Public License 44 | # along with MCMS. If not, see 45 | # 46 | ################################################################################ 47 | 48 | 49 | # source function library 50 | . /etc/rc.d/init.d/functions 51 | 52 | # Some definitions to make the below more readable 53 | NVSMI=/usr/bin/nvidia-smi 54 | NVCONFIG=/etc/sysconfig/nvidia 55 | prog="nvidia" 56 | 57 | # default settings 58 | NVIDIA_ACCOUNTING=1 59 | NVIDIA_PERSISTENCE_MODE=1 60 | NVIDIA_COMPUTE_MODE=0 61 | NVIDIA_CLOCK_SPEEDS=max 62 | # pull in sysconfig settings 63 | [ -f $NVCONFIG ] && . $NVCONFIG 64 | 65 | RETVAL=0 66 | 67 | 68 | # Determine the maximum graphics and memory clock speeds for each GPU. 69 | # Create an array of clock speed pairs (memory,graphics) to be passed to nvidia-smi 70 | declare -a MAX_CLOCK_SPEEDS 71 | get_max_clocks() 72 | { 73 | GPU_QUERY="$NVSMI --query-gpu=clocks.max.memory,clocks.max.graphics --format=csv,noheader,nounits" 74 | 75 | MAX_CLOCK_SPEEDS=( $($GPU_QUERY | awk '{print $1 $2}') ) 76 | } 77 | 78 | 79 | start() 80 | { 81 | /sbin/lspci | grep -qi nvidia 82 | if [ $? -ne 0 ] ; then 83 | echo -n $"No NVIDIA GPUs present. Skipping NVIDIA GPU tuning." 
84 | warning 85 | echo 86 | exit 0 87 | fi 88 | 89 | echo -n $"Starting $prog: " 90 | 91 | # If the nvidia-smi utility is missing, this script can't do its job 92 | [ -x $NVSMI ] || exit 5 93 | 94 | # A configuration file is not required 95 | if [ ! -f $NVCONFIG ] ; then 96 | echo -n $"No GPU config file present ($NVCONFIG) - using defaults" 97 | echo 98 | fi 99 | 100 | # Set persistence mode first to speed things up 101 | echo -n "persistence" 102 | $NVSMI --persistence-mode=$NVIDIA_PERSISTENCE_MODE 1> /dev/null 103 | RETVAL=$? 104 | 105 | if [ ! $RETVAL -gt 0 ]; then 106 | echo -n " accounting" 107 | $NVSMI --accounting-mode=$NVIDIA_ACCOUNTING 1> /dev/null 108 | RETVAL=$? 109 | fi 110 | 111 | if [ ! $RETVAL -gt 0 ]; then 112 | echo -n " compute" 113 | $NVSMI --compute-mode=$NVIDIA_COMPUTE_MODE 1> /dev/null 114 | RETVAL=$? 115 | fi 116 | 117 | 118 | if [ ! $RETVAL -gt 0 ]; then 119 | echo -n " clocks" 120 | if [ -n "$NVIDIA_CLOCK_SPEEDS" ]; then 121 | # If the requested clock speed value is "max", 122 | # work through each GPU and set to max speed. 123 | if [ "$NVIDIA_CLOCK_SPEEDS" == "max" ]; then 124 | get_max_clocks 125 | 126 | GPU_COUNTER=0 127 | GPUS_SKIPPED=0 128 | while [ "$GPU_COUNTER" -lt ${#MAX_CLOCK_SPEEDS[*]} ] && [ ! $RETVAL -gt 0 ]; do 129 | if [[ ${MAX_CLOCK_SPEEDS[$GPU_COUNTER]} =~ Supported ]] ; then 130 | if [ $GPUS_SKIPPED -eq 0 ] ; then 131 | echo 132 | GPUS_SKIPPED=1 133 | fi 134 | echo "Skipping non-boostable GPU" 135 | else 136 | $NVSMI -i $GPU_COUNTER --applications-clocks=${MAX_CLOCK_SPEEDS[$GPU_COUNTER]} 1> /dev/null 137 | fi 138 | RETVAL=$? 139 | 140 | GPU_COUNTER=$(( $GPU_COUNTER + 1 )) 141 | done 142 | else 143 | # This sets all GPUs to the same clock speeds (which only works 144 | # if the GPUs in this system are all the same). 145 | $NVSMI --applications-clocks=$NVIDIA_CLOCK_SPEEDS 1> /dev/null 146 | fi 147 | else 148 | $NVSMI --reset-applications-clocks 1> /dev/null 149 | fi 150 | RETVAL=$? 151 | fi 152 | 153 | if [ ! $RETVAL -gt 0 ]; then 154 | if [ -n "$NVIDIA_POWER_LIMIT" ]; then 155 | echo -n " power-limit" 156 | $NVSMI --power-limit=$NVIDIA_POWER_LIMIT 1> /dev/null 157 | RETVAL=$? 158 | fi 159 | fi 160 | 161 | if [ ! $RETVAL -gt 0 ]; then 162 | success 163 | else 164 | failure 165 | fi 166 | echo 167 | return $RETVAL 168 | } 169 | 170 | stop() 171 | { 172 | /sbin/lspci | grep -qi nvidia 173 | if [ $? -ne 0 ] ; then 174 | echo -n $"No NVIDIA GPUs present. Skipping NVIDIA GPU tuning." 175 | warning 176 | echo 177 | exit 0 178 | fi 179 | 180 | echo -n $"Stopping $prog: " 181 | [ -x $NVSMI ] || exit 5 182 | 183 | $NVSMI --persistence-mode=0 1> /dev/null && success || failure 184 | RETVAL=$? 185 | echo 186 | return $RETVAL 187 | } 188 | 189 | restart() { 190 | stop 191 | start 192 | } 193 | 194 | force_reload() { 195 | restart 196 | } 197 | 198 | status() { 199 | $NVSMI 200 | } 201 | 202 | case "$1" in 203 | start) 204 | start 205 | ;; 206 | stop) 207 | stop 208 | ;; 209 | restart) 210 | restart 211 | ;; 212 | force-reload) 213 | force_reload 214 | ;; 215 | status) 216 | status 217 | RETVAL=$? 
218 | ;; 219 | *) 220 | echo $"Usage: $0 {start|stop|restart|force-reload|status}" 221 | RETVAL=2 222 | esac 223 | exit $RETVAL 224 | -------------------------------------------------------------------------------- /dependencies/etc/logrotate.d/slurm: -------------------------------------------------------------------------------- 1 | /var/log/slurm/*.log { 2 | weekly 3 | compress 4 | missingok 5 | nocopytruncate 6 | nodelaycompress 7 | nomail 8 | notifempty 9 | noolddir 10 | rotate 7 11 | sharedscripts 12 | size 10M 13 | postrotate 14 | /etc/init.d/slurm reconfig 15 | endscript 16 | } 17 | -------------------------------------------------------------------------------- /dependencies/etc/logstash/conf.d/10-beats-input.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | 26 | ################################################################################ 27 | ## 28 | ## This host will listen for FileBeats traffic 29 | ## 30 | ################################################################################ 31 | 32 | 33 | input { 34 | beats { 35 | port => 5044 36 | ssl => true 37 | ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt" 38 | ssl_key => "/etc/pki/tls/private/logstash-forwarder.key" 39 | } 40 | } 41 | 42 | -------------------------------------------------------------------------------- /dependencies/etc/logstash/conf.d/20-syslog-filters.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015-2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | # MCMS tag documentation 26 | # ======================== 27 | # 28 | # The following event tags are interpreted and monitored by MCMS: 29 | # 30 | # * user_account_event creation or deletion of a user 31 | # * group_account_event creation or deletion of a group 32 | # 33 | # * remote_login_event a user logged in from a remote location 34 | # * local_login_event a user logged in from a local console 35 | # * remote_logout_event a user logged out from a remote location 36 | # * local_logout_event a user logged out from a local console 37 | # 38 | # * auth_failure_event security issue: a user failed to authenticate 39 | # 40 | # * config_error_event a configuration issue is causing errors 41 | # 42 | # * hardware_event an event occurred in the hardware 43 | # 44 | # 45 | # 46 | # When something must be reported, assign a value to the field "report". 47 | # The following values are interpreted and escalated by MCMS: 48 | # 49 | # * emergency The system is no longer functioning or is unusable. 50 | # Most likely, all staff on call should be alerted. 51 | # Example: Users are unable to read /home 52 | # 53 | # * alert A major issue has occurred which requires immediate 54 | # attention. A staff member should be alerted. 55 | # Example: the primary network uplink is down 56 | # 57 | # * critical An issue requires immediate attention, but is 58 | # related to a secondary system. 59 | # Example: the secondary network uplink is down 60 | # 61 | # * error A non-urgent error has occurred. An administrator 62 | # will need to take action, but can do so during the 63 | # next business day. 64 | # Example: one compute node has gone down 65 | # 66 | # * warning Warn of a condition which may result in an error if 67 | # action is not taken soon - not urgent. 68 | # Example: a filesystem is nearing 90% full 69 | # 70 | # * notice An unusual event has occurred, but it was not an 71 | # error. A message should be sent to administrators 72 | # for follow-up. 73 | # 74 | # * informational System is operating normally - no action required. 75 | # Often harvested for reporting or measuring purposes. 76 | # 77 | # * debug Not used during normal operations. This message 78 | # could be useful to developers when debugging. 79 | # 80 | # 81 | # These severity levels are the same as those used by syslog: 82 | # http://en.wikipedia.org/wiki/Syslog#Severity_levels 83 | 84 | 85 | filter { 86 | if [type] == "syslog" or [type] == "cron" or [type] == "mail" or [type] == "secure" { 87 | mutate { add_field => ["format", "syslog"] } 88 | 89 | # Feb 23 04:48:16 head slurmctld[3166]: completing job 36 90 | grok { 91 | overwrite => "message" 92 | # We're not saving the hostname - it's grabbed by logstash-forwarder 93 | match => [ 94 | "message", "^(?:<%{POSINT:syslog_pri}>)?%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST} (?:%{SYSLOGPROG}: )?%{GREEDYDATA:message}" 95 | ] 96 | } 97 | syslog_pri { } 98 | date { 99 | match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss", "ISO8601" ] 100 | } 101 | } 102 | 103 | 104 | if [type] == "audit" { 105 | # type=USER_END msg=audit(1392782461.768:2961): user pid=75557 uid=0 auid=0 ses=447 msg='op=PAM:session_close acct="root" exe="/usr/sbin/crond" hostname=? addr=? 
terminal=cron res=success' 106 | grok { 107 | overwrite => "message" 108 | match => [ "message", "type=%{WORD:audit_type} msg=audit\(%{BASE10NUM:timestamp}:%{POSINT:audit_id}\): %{GREEDYDATA:message}" ] 109 | add_tag => [ "grokked" ] 110 | } 111 | date { 112 | match => [ "timestamp", "UNIX_MS" ] 113 | } 114 | } 115 | 116 | 117 | else if [type] == "apache-access" { 118 | # 10.0.0.3 - - [16/Feb/2014:05:22:51 -0500] "GET /WW/vnfs?hwaddr=00:25:90:6b:bb:5c HTTP/1.1" 500 620 "-" "Wget" 119 | grok { 120 | overwrite => "message" 121 | match => [ "message", "%{COMBINEDAPACHELOG}" ] 122 | add_tag => [ "grokked" ] 123 | } 124 | } 125 | 126 | 127 | else if [type] == "apache-error" { 128 | # [Sun Feb 16 09:32:44 2014] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations 129 | # 130 | # [Sun Feb 16 03:22:27 2014] [error] [client 127.0.0.1] PHP Warning: date(): It is not safe to rely on the system's timezone settings. 131 | grok { 132 | overwrite => "message" 133 | match => [ 134 | "message", "\[%{GREEDYDATA:timestamp}\] \[%{WORD:loglevel}\](?: \[%{WORD:originator} %{IP:remote_ip}\])? %{GREEDYDATA:message}" 135 | ] 136 | add_tag => [ "grokked" ] 137 | } 138 | date { 139 | match => [ "timestamp", "EEE MMM dd HH:mm:ss YYYY" ] 140 | } 141 | } 142 | 143 | 144 | else if [type] == "cron" { 145 | # Uses syslog format, so timestamp/host/program/pid were parsed above 146 | 147 | # (root) CMD (run-parts /etc/cron.hourly) 148 | # run-parts(/etc/cron.hourly)[12910]: starting 0anacron 149 | grok { 150 | overwrite => "message" 151 | match => [ 152 | "message", "\(%{USERNAME:user}\) %{CRON_ACTION:action} \(%{DATA:message}\)", 153 | "message", "%{PROG:program}\(%{UNIXPATH:cron_stage}\)(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}" 154 | ] 155 | add_tag => [ "grokked" ] 156 | } 157 | } 158 | 159 | 160 | # else if [type] == "mail" { 161 | # Uses syslog format, so timestamp/host/program/pid were parsed above 162 | # } 163 | 164 | 165 | # else if [type] == "mcelog" { 166 | # 167 | # } 168 | 169 | 170 | # else if [type] == "mysqld" { 171 | # 172 | # } 173 | 174 | 175 | # else if [type] == "opensm" { 176 | # 177 | # } 178 | 179 | 180 | else if [type] == "secure" { 181 | # Uses syslog format, so timestamp/host/program/pid were parsed above 182 | 183 | if [program] == "groupadd" { 184 | # new group: name=elasticsearch, GID=494 185 | grok { 186 | overwrite => "message" 187 | match => [ "message", "new group: name=%{WORD:group_name}, GID=%{NONNEGINT:gid}" ] 188 | add_tag => [ "group_account_event", "grokked" ] 189 | } 190 | # group added to /etc/group: name=elasticsearch, GID=494 191 | grok { 192 | overwrite => "message" 193 | match => [ "message", "group added to %{UNIXPATH:group_file} name=%{WORD:group_name}(?:, GID=%{NONNEGINT:gid})?" 
] 194 | add_tag => [ "grokked" ] 195 | } 196 | } 197 | else if [program] == "groupdel" { 198 | # group 'testgroup' removed 199 | grok { 200 | overwrite => "message" 201 | match => [ "message", "group '%{WORD:group_name}' removed" ] 202 | add_tag => [ "group_account_event", "grokked" ] 203 | } 204 | # group 'testgroup' removed from /etc/group 205 | grok { 206 | overwrite => "message" 207 | match => [ "message", "group '%{WORD:group_name}' removed from %{UNIXPATH:group_file}" ] 208 | add_tag => [ "grokked" ] 209 | } 210 | } 211 | else if [program] == "useradd" { 212 | # new user: name=logstash, UID=494, GID=493, home=/opt/logstash, shell=/sbin/nologin 213 | grok { 214 | overwrite => "message" 215 | # Careful below: *nix PATHs can include commas 216 | match => [ "message", "new user: name=%{WORD:username}, UID=%{NONNEGINT:uid}, GID=%{NONNEGINT:gid}, home=%{GREEDYDATA:home_dir}, shell=%{UNIXPATH:shell}" ] 217 | add_tag => [ "user_account_event", "grokked" ] 218 | } 219 | # add 'eliot' to group 'benchmark' 220 | # add 'eliot' to shadow group 'benchmark' 221 | grok { 222 | overwrite => "message" 223 | match => [ "message", "add '%{WORD:username}' to(?: %{WORD:group_file})? group '%{WORD:group_name}'" ] 224 | add_tag => [ "grokked" ] 225 | } 226 | } 227 | else if [program] == "userdel" { 228 | # delete user 'test' 229 | grok { 230 | overwrite => "message" 231 | match => [ "message", "delete user '%{WORD:username}'" ] 232 | add_tag => [ "user_account_event", "grokked" ] 233 | } 234 | # TODO: removed group 'slurm' owned by 'slurm' 235 | } 236 | 237 | # else if [program] == "slurm" { 238 | # 239 | # } 240 | 241 | # else if [program] == "slurmctld" { 242 | # 243 | # } 244 | 245 | else if [program] == "sshd" { 246 | if "grokked" not in [tags] { 247 | # Accepted password for tiwa from 72.83.55.11 port 58019 ssh2 248 | grok { 249 | overwrite => "message" 250 | match => [ "message", "Accepted %{WORD:auth_method} for %{USERNAME:user} from %{IPORHOST:remote_ip} port %{POSINT:remote_port} %{WORD:remote_utility}" ] 251 | add_tag => [ "remote_login_event", "grokked" ] 252 | } 253 | } 254 | 255 | if "grokked" not in [tags] { 256 | # pam_unix(sshd:session): session opened for user coma by (uid=0) 257 | grok { 258 | overwrite => "message" 259 | match => [ "message", "%{WORD:pam_service}\(%{NOTSPACE:session_type}\): session opened for user %{USERNAME:username} by \(uid=%{NONNEGINT:uid}\)" ] 260 | add_tag => [ "remote_login_event", "grokked" ] 261 | } 262 | } 263 | 264 | if "grokked" not in [tags] { 265 | # pam_lastlog(sshd:session): unable to open /var/log/lastlog: No such file or directory 266 | # lastlog_openseek: Couldn't stat /var/log/lastlog: No such file or directory 267 | grok { 268 | overwrite => "message" 269 | match => [ "message", "%{NOTSPACE:calling_function}: (unable to open|Couldn't stat) %{GREEDYDATA:missing_file}: No such file or directory" ] 270 | add_tag => [ "config_error_event", "grokked" ] 271 | } 272 | } 273 | 274 | if "grokked" not in [tags] { 275 | # subsystem request for sftp 276 | grok { 277 | match => [ "message", "subsystem request for %{WORD:subsystem_type}" ] 278 | add_tag => [ "remote_login_event", "grokked" ] 279 | } 280 | } 281 | 282 | if "grokked" not in [tags] { 283 | # Failed password for eliot from 76.19.216.82 port 40415 ssh2 284 | # Failed password for invalid user cordm from 50.159.20.141 port 51799 ssh2 285 | grok { 286 | overwrite => "message" 287 | match => [ "message", "Failed %{WORD:auth_method} for (?:%{WORD:user_status} user )?%{USERNAME:username} from 
%{IPORHOST:remote_ip} port %{POSINT:remote_port} %{WORD:remote_utility}" ] 288 | add_tag => [ "auth_failure_event", "invalid_password", "grokked" ] 289 | } 290 | } 291 | 292 | if "grokked" not in [tags] { 293 | # pam_unix(sshd:auth): check pass; user unknown 294 | grok { 295 | match => [ "message", "%{WORD:pam_service}\(%{NOTSPACE:session_type}\): check pass; user unknown" ] 296 | add_tag => [ "auth_failure_event", "invalid_username", "grokked" ] 297 | } 298 | } 299 | 300 | if "grokked" not in [tags] { 301 | # input_userauth_request: invalid user cordm 302 | grok { 303 | match => [ "message", "%{WORD:auth_request}: invalid user %{USERNAME:username}" ] 304 | add_tag => [ "auth_failure_event", "invalid_username", "grokked" ] 305 | } 306 | } 307 | 308 | if "grokked" not in [tags] { 309 | # Invalid user cordm from 50.159.20.141 310 | grok { 311 | match => [ "message", "Invalid user %{USERNAME:username} from %{IPORHOST:remote_ip}" ] 312 | add_tag => [ "auth_failure_event", "invalid_username", "grokked" ] 313 | } 314 | } 315 | 316 | if "grokked" not in [tags] { 317 | # pam_succeed_if(sshd:auth): error retrieving information about user cordm 318 | grok { 319 | match => [ "message", "%{WORD:pam_service}\(%{NOTSPACE:session_type}\): error retrieving information about user %{USERNAME:username}" ] 320 | add_tag => [ "auth_failure_event", "invalid_username", "grokked" ] 321 | } 322 | } 323 | 324 | if "grokked" not in [tags] { 325 | # pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.50.1.1 user=root 326 | # pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=c-50-159-20-141.hsd1.wa.comcast.net 327 | grok { 328 | overwrite => "message" 329 | match => [ "message", "%{WORD:pam_service}\(%{NOTSPACE:session_type}\): authentication failure; logname=(?:%{WORD:logname})? uid=%{NONNEGINT:uid} euid=%{NONNEGINT:euid} tty=%{WORD:tty} ruser=(?:%{USERNAME:remote_user})? rhost=%{IPORHOST:remote_ip}(?:%{SPACE}user=%{USERNAME:username})?" ] 330 | add_tag => [ "auth_failure_event", "grokked" ] 331 | } 332 | } 333 | 334 | if "grokked" not in [tags] { 335 | # PAM 1 more authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=c-50-159-20-141.hsd1.wa.comcast.net 336 | # PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=c-76-19-216-82.hsd1.ma.comcast.net user=eliot 337 | grok { 338 | overwrite => "message" 339 | match => [ "message", "PAM %{POSINT:failure_count} more authentication failure(?:s)?; logname=(?:%{WORD:logname})? uid=%{NONNEGINT:uid} euid=%{NONNEGINT:euid} tty=%{WORD:tty} ruser=(?:%{USERNAME:remote_user})? rhost=%{IPORHOST:remote_ip}(?:%{SPACE}user=%{USERNAME:username})?" 
] 340 | add_tag => [ "auth_failure_event", "grokked" ] 341 | } 342 | } 343 | 344 | if "grokked" not in [tags] { 345 | # Connection closed by 10.0.0.4 346 | grok { 347 | match => [ "message", "Connection closed by %{IPORHOST:remote_ip}" ] 348 | add_tag => [ "remote_logout_event", "grokked" ] 349 | } 350 | } 351 | 352 | if "grokked" not in [tags] { 353 | # pam_unix(sshd:session): session closed for user coma 354 | grok { 355 | overwrite => "message" 356 | match => [ "message", "%{WORD:pam_service}\(%{NOTSPACE:session_type}\): session closed for user %{USERNAME:username}" ] 357 | add_tag => [ "remote_logout_event", "grokked" ] 358 | } 359 | } 360 | 361 | if "grokked" not in [tags] { 362 | # Received disconnect from 10.0.0.254: 11: disconnected by user 363 | grok { 364 | match => [ "message", "Received disconnect from %{IPORHOST:remote_ip}" ] 365 | add_tag => [ "remote_logout_event", "grokked" ] 366 | } 367 | } 368 | } 369 | 370 | else if [program] == "sudo" { 371 | # pam_unix(sudo:auth): authentication failure; logname=root uid=500 euid=0 tty=/dev/pts/5 ruser=eliot rhost= user=eliot 372 | # eliot : user NOT in sudoers ; TTY=pts/5 ; PWD=/home/eliot ; USER=root ; COMMAND=/bin/ls / 373 | grok { 374 | overwrite => "message" 375 | match => [ 376 | "message", "%{WORD:pam_service}\(%{NOTSPACE:session_type}\): authentication failure; logname=%{USERNAME:logname} uid=%{NONNEGINT:uid} euid=%{NONNEGINT:euid} tty=%{TTY:tty} ruser=%{USERNAME:remote_username} rhost=(?:%{IPORHOST:remote_ip})? user=%{USERNAME:username}", 377 | "message", "%{USERNAME:username} : user NOT in sudoers ; TTY=%{GREEDYDATA:tty} ; PWD=%{UNIXPATH:pwd} ; USER=%{USERNAME:sudo_username} ; COMMAND=%{GREEDYDATA:command}" 378 | ] 379 | add_tag => [ "auth_failure_event", "grokked" ] 380 | } 381 | } 382 | } 383 | 384 | # else if [type] == "spooler" { 385 | # 386 | # } 387 | 388 | # else if [type] == "yum" { 389 | # 390 | # } 391 | 392 | 393 | ################################################################ 394 | # Mark type of host (compute node events are less critical) 395 | ################################################################ 396 | if [host] =~ /^node.+/ { 397 | mutate { add_field => ["node_type", "compute"] } 398 | } 399 | else if [host] =~ /^storage.+/ { 400 | mutate { add_field => ["node_type", "storage"] } 401 | } 402 | else { 403 | mutate { add_field => ["node_type", "management"] } 404 | } 405 | 406 | 407 | ################################################################ 408 | # Lookup Geolocation data for remote hosts 409 | ################################################################ 410 | if [remote_ip] =~ /(.+)/ { 411 | # Make sure we didn't pick up a hostname instead of an IP. 412 | # If so, overwrite the hostname with the IP address. 413 | if ! ([remote_ip] =~ /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) { 414 | dns { 415 | action => "replace" 416 | resolve => "remote_ip" 417 | } 418 | } 419 | 420 | geoip { 421 | source => "remote_ip" 422 | target => "geoip" 423 | database => "/etc/logstash/GeoLiteCity.dat" 424 | add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ] 425 | add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ] 426 | } 427 | mutate { 428 | convert => [ "[geoip][coordinates]", "float"] 429 | } 430 | } 431 | 432 | 433 | ################################################################ 434 | # Mark hardware events (likely to be more critical) if they 435 | # have not already been assigned a severity level. 436 | ################################################################ 437 | if ! 
([report] =~ /(.+)/) { 438 | if [type] == "mcelog" { 439 | mutate { add_field => ["report", "error"] } 440 | } 441 | else if [type] == "syslog" { 442 | if [program] =~ /.*ipmiseld/ { 443 | mutate { add_field => ["report", "error"] } 444 | } 445 | } 446 | } 447 | 448 | 449 | ################################################################ 450 | # Set up events which will be forwarded to Shinken/Nagios 451 | ################################################################ 452 | if [report] =~ /(.+)/ { 453 | mutate { 454 | add_field => [ "nagios_host", "%{host}", 455 | "nagios_service", "%{program}" ] 456 | } 457 | } 458 | 459 | 460 | ################################################################ 461 | # Clean up after ourselves - remove any internal tags/fields 462 | ################################################################ 463 | if [format] =~ /(.+)/ { 464 | mutate { remove_field => ["format"] } 465 | } 466 | if "_grokparsefailure" in [tags] and "grokked" in [tags] { 467 | mutate { remove_tag => ["_grokparsefailure"] } 468 | } 469 | } 470 | 471 | -------------------------------------------------------------------------------- /dependencies/etc/logstash/conf.d/90-elasticsearch-output.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | ################################################################################# 25 | ## 26 | ## All data will be written to the local ElasticSearch database 27 | ## 28 | ################################################################################# 29 | 30 | 31 | output { 32 | elasticsearch { 33 | hosts => ["localhost:9200"] 34 | sniffing => true 35 | manage_template => false 36 | index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}" 37 | document_type => "%{[@metadata][type]}" 38 | } 39 | } 40 | 41 | -------------------------------------------------------------------------------- /dependencies/etc/logstash/conf.d/91-additional-output.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2016 by Microway, Inc. 
6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | ################################################################################# 25 | ## 26 | ## Additional optional outputs 27 | ## 28 | ################################################################################# 29 | 30 | 31 | output { 32 | ################################################################ 33 | # Optionally, elect to forward items to Graphite/Statsd 34 | ################################################################ 35 | # if "hardware_event" in [tags] or "user_account_event" in [tags] or "group_account_event" in [tags] { 36 | # graphite { 37 | # host => "10.0.0.254" 38 | # } 39 | # statsd { } 40 | # } 41 | 42 | 43 | ################################################################ 44 | # Optionally, elect to forward items to Shinken/Nagios 45 | ################################################################ 46 | # if [report] =~ /(.+)/ { 47 | # nagios { } 48 | # } 49 | } 50 | 51 | -------------------------------------------------------------------------------- /dependencies/etc/microway/mcms_database.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | ################################################################################ 26 | # 27 | # This file contains the credentials for accessing the MCMS database. 
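# (A hypothetical sketch of how a bash script might consume these settings --
#  the actual MCMS tools are the intended consumers:
#
#      . /etc/microway/mcms_database.conf
#      mysql --user="${mcms_database_user}" --password="${mcms_database_password}" mcms
#
#  The database name "mcms" and the MySQL client invocation above are
#  illustrative only.)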
28 | # 29 | # NORMAL USERS SHOULD NOT BE ABLE TO READ THIS FILE: 30 | # 31 | # chmod 600 /etc/microway/mcms_database.conf 32 | # 33 | ################################################################################ 34 | 35 | 36 | ################################################################################ 37 | # This file uses BASH syntax (and is sourced by a bash shell). If you need to, 38 | # you can use shell programming and advanced features inside this file. 39 | ################################################################################ 40 | 41 | mcms_database_user='mcmsDBAdmin' 42 | 43 | mcms_database_password='ChangeMe' 44 | 45 | -------------------------------------------------------------------------------- /dependencies/etc/nhc/compute-node-checks.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015-2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | ################################################################################ 26 | # 27 | # Node Health Check (NHC) configuration file 28 | # 29 | # Checks to be run on compute nodes only 30 | # 31 | # These tests are typically run by the batch scheduler (e.g., Torque, SLURM) to 32 | # ensure that the compute nodes are in a healthy state. A variety of checks are 33 | # executed, including hardware health, software health and filesystem health. 34 | # 35 | # All tests in this file should run very fast and should not use system 36 | # resources as they will be running at the same time as compute jobs. Tests must 37 | # not access any network filesystems, as they can hang. 
Longer-running health 38 | # checks and filesystem checks should be put in one of the following files: 39 | # 40 | # compute-node-checks_intense.conf (resource-intensive checks) 41 | # compute-node-checks_blocking-io.conf (checks which could lock up) 42 | # 43 | # 44 | # Lines are in the form "||" 45 | # Hostmask is a glob, /regexp/, or {noderange} 46 | # Comments begin with '#' 47 | # 48 | ################################################################################ 49 | 50 | 51 | ####################################################################### 52 | ### 53 | ### NHC Configuration Variables 54 | ### 55 | 56 | # 57 | # NHC-wide configuration settings (such as PATH and resource manager) 58 | # are set system-wide in the file: 59 | # 60 | # /etc/sysconfig/nhc 61 | # 62 | 63 | 64 | ####################################################################### 65 | ### 66 | ### CPU & Memory Hardware checks 67 | ### 68 | 69 | # Set these to your correct CPU socket, core, and thread counts 70 | * || check_hw_cpuinfo 2 28 28 71 | 72 | # Compares the accumulated CPU time (in seconds) between kswapd kernel threads 73 | # to make sure there's no imbalance among different NUMA nodes (which could be 74 | # an early symptom of failure). 75 | # 76 | # Max 500 CPU hours; 100x discrepancy limit 77 | * || check_ps_kswapd 1800000 100 log syslog 78 | 79 | # Check that the correct amount of memory is present (with a fudge factor) 80 | * || check_hw_physmem 256GB 256GB 2% 81 | 82 | # Check that at least 1MB of physical memory is free 83 | * || check_hw_physmem_free 1MB 84 | 85 | # If less than 100MB of Memory+SWAP is free, things will die soon 86 | * || check_hw_mem_free 100MB 87 | 88 | # Make sure swap is present (without being too picky on the capacity) 89 | * || check_hw_swap 2G 1TB 90 | 91 | # If less than 1GB of SWAP is free, things will be moving slowly! 92 | * || check_hw_swap_free 1GB 93 | 94 | # Make sure the memory is running at the correct frequency / bus rate 95 | * || check_dmi_data_match -t "Memory Device" "*Speed: 2400 MHz" 96 | 97 | # Check for MCEs (memory warnings and errors) 98 | * || check_hw_mcelog 99 | 100 | # Ensure nodes are not overloaded. The rule of thumb is that load should remain 101 | # below 2-times the number of CPU cores, but we'll allow for short bursts. The 102 | # 1-minute load can be up 4xCoreCount; 5-minute load must be below 2xCoreCount: 103 | * || check_ps_loadavg $((4*$HW_CORES)) $((2*$HW_CORES)) 104 | 105 | 106 | ####################################################################### 107 | ### 108 | ### Network checks 109 | ### 110 | 111 | # Check that there's an active ethernet interface named "eth0" 112 | * || check_hw_eth eth0 113 | 114 | # Check for an IB interface that shows LinkUp (with the specified datarate) 115 | * || check_hw_ib 56 116 | 117 | 118 | ####################################################################### 119 | ### 120 | ### Filesystem checks 121 | ### 122 | 123 | # Filesystems which should be mounted (simply check for their presence) 124 | * || check_fs_mount_rw -f / 125 | * || check_fs_mount_rw -f /tmp 126 | * || check_fs_mount_rw -f /home 127 | * || check_fs_mount_rw /dev/pts '/(none|devpts)/' devpts 128 | 129 | # 130 | # Check for modest amounts of free space in the important places. 131 | # Free inodes are also important. 132 | # 133 | # Only check local filesystems in this file! 
Checking network filesystems 134 | # can hang badly, so such things must be checked via this file: 135 | # 136 | # /etc/nhc/compute-node-checks_blocking-io.conf 137 | # 138 | 139 | * || export DF_FLAGS="-Tkal" 140 | * || export DFI_FLAGS="-Tkal" 141 | 142 | * || check_fs_free / 3% 143 | * || check_fs_ifree / 1k 144 | 145 | * || check_fs_free /tmp 3% 146 | * || check_fs_ifree /tmp 1k 147 | 148 | * || check_fs_free /var 3% 149 | * || check_fs_ifree /var 1k 150 | 151 | * || check_fs_free /var/tmp 3% 152 | * || check_fs_ifree /var/tmp 1k 153 | 154 | * || check_fs_free /var/log 3% 155 | * || check_fs_ifree /var/log 1k 156 | 157 | 158 | ####################################################################### 159 | ### 160 | ### File/metadata checks 161 | ### 162 | 163 | # These should always be directories and always be read/write/execute and sticky. 164 | * || check_file_test -r -w -x -d -k /tmp /var/tmp 165 | 166 | # Assert common properties for devices which occasionally get clobbered 167 | * || check_file_test -c -r -w /dev/null /dev/zero 168 | * || check_file_stat -m 0666 -u 0 -g 0 -t 1 -T 3 /dev/null 169 | 170 | # These should always be readable and should never be empty. 171 | * || check_file_test -r -s /etc/passwd /etc/group 172 | 173 | # Validate a couple important accounts in the passwd and group files 174 | * || check_file_contents /etc/passwd "/^root:x:0:0:/" "sshd:*" 175 | * || check_file_contents /etc/group "/^root:x:0:/" 176 | 177 | # Make sure there's relatively recent (~2 hours) activity from the syslog 178 | * || check_file_stat -n 7200 /var/log/messages 179 | 180 | 181 | ####################################################################### 182 | ### 183 | ### Process checks 184 | ### 185 | 186 | # Ensure the SSH daemon is running (and start it if not) 187 | * || check_ps_service -u root -S sshd 188 | 189 | # Processes which should be running (restart them, if necessary) 190 | * || check_ps_service -u root -r crond 191 | * || check_ps_service -u ganglia -r gmond 192 | * || check_ps_service -u root -r ipmiseld 193 | * || check_ps_service -u root -r filebeat 194 | * || check_ps_service -u root -r mcelog 195 | * || check_ps_service -u ntp -r ntpd 196 | 197 | # SLURM Resource Manager / Batch Scheduler Processes 198 | * || check_ps_service -u munge -r munged 199 | * || check_ps_service -u root -r slurmd 200 | 201 | # TORQUE Resource Manager / Batch Scheduler Processes 202 | # * || check_ps_service -u root -r trqauthd 203 | # * || check_ps_service -u root -r pbs_mom 204 | 205 | # Most systems also need NFS locking services. 206 | # * || check_ps_service -d rpc.statd -r nfslock 207 | 208 | # The audit daemon can sometimes disappear if things get hairy. 209 | # * || check_ps_service -r auditd 210 | 211 | # This is only valid for RHEL6 and similar/newer systems. 212 | # * || check_ps_service -d rsyslogd -r rsyslog 213 | 214 | # In the case of MySQL, it's typically better to cycle. 215 | # * || check_ps_service -c mysqld 216 | 217 | # If desired, watch for users manually running commands and log them. 218 | # * || check_ps_unauth_users log syslog 219 | 220 | # If desired, make sure no users are SSH'd in, but don't kill them. 
221 | # * || check_ps_blacklist sshd '!root' 222 | 223 | 224 | ####################################################################### 225 | ### 226 | ### GPU checks 227 | ### 228 | 229 | # This is a fast-running, less-intense run of the GPU health test 230 | * || NVIDIA_HEALTHMON_ARGS="-v" 231 | * || check_nv_healthmon 232 | 233 | -------------------------------------------------------------------------------- /dependencies/etc/nhc/compute-node-checks_blocking-io.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015-2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | ################################################################################ 26 | # 27 | # Node Health Check (NHC) configuration file 28 | # 29 | # Checks to be run on compute nodes only 30 | # 31 | # These tests are started when Compute Nodes are idle (before a job starts). All 32 | # filesystem-intensive checks (including checks of network filesystems) should 33 | # be performed here. These tests should be written with the understanding that 34 | # they may lock up if a filesystem hangs (e.g., if the NFS server goes down). 35 | # 36 | # 37 | # Lines are in the form "||" 38 | # Hostmask is a glob, /regexp/, or {noderange} 39 | # Comments begin with '#' 40 | # 41 | ################################################################################ 42 | 43 | 44 | ####################################################################### 45 | ### 46 | ### NHC Configuration Variables 47 | ### 48 | 49 | # 50 | # NHC-wide configuration settings (such as PATH and resource manager) 51 | # are set system-wide in the file: 52 | # 53 | # /etc/sysconfig/nhc 54 | # 55 | 56 | 57 | ####################################################################### 58 | ### 59 | ### Filesystem checks 60 | ### 61 | 62 | # 63 | # Check for modest amounts of free space. Free inodes are also important. 
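# For example, "check_fs_free /home 3%" marks the node unhealthy when less than
# 3% of the filesystem's capacity remains free, and "check_fs_ifree /home 1k"
# does the same when roughly fewer than a thousand inodes remain free.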
64 | # 65 | 66 | * || export DF_FLAGS="-Tka" 67 | * || export DFI_FLAGS="-Tka" 68 | 69 | * || check_fs_free / 3% 70 | * || check_fs_ifree / 1k 71 | 72 | * || check_fs_free /home 3% 73 | * || check_fs_ifree /home 1k 74 | 75 | * || check_fs_free /opt 3% 76 | * || check_fs_ifree /opt 1k 77 | 78 | * || check_fs_free /tmp 3% 79 | * || check_fs_ifree /tmp 1k 80 | 81 | * || check_fs_free /var 3% 82 | * || check_fs_ifree /var 1k 83 | 84 | * || check_fs_free /var/tmp 3% 85 | * || check_fs_ifree /var/tmp 1k 86 | 87 | * || check_fs_free /var/log 3% 88 | * || check_fs_ifree /var/log 1k 89 | 90 | -------------------------------------------------------------------------------- /dependencies/etc/nhc/compute-node-checks_intense.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015-2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | ################################################################################ 26 | # 27 | # Node Health Check (NHC) configuration file 28 | # 29 | # Checks to be run on compute nodes only 30 | # 31 | # These tests are started when Compute Nodes are idle (before a job starts). All 32 | # resource-intensive checks (excluding filesystems) should be performed here. 33 | # Example subsystems to test would be: CPU, memory, GPUs, accelerators, etc. 
34 | # 35 | # 36 | # Lines are in the form "||" 37 | # Hostmask is a glob, /regexp/, or {noderange} 38 | # Comments begin with '#' 39 | # 40 | ################################################################################ 41 | 42 | 43 | ####################################################################### 44 | ### 45 | ### NHC Configuration Variables 46 | ### 47 | 48 | # 49 | # NHC-wide configuration settings (such as PATH and resource manager) 50 | # are set system-wide in the file: 51 | # 52 | # /etc/sysconfig/nhc 53 | # 54 | 55 | 56 | ####################################################################### 57 | ### 58 | ### GPU checks 59 | ### 60 | 61 | # This test performs an examination of GPU health and bus transfer speeds 62 | * || NVIDIA_HEALTHMON_ARGS="-e -v" 63 | * || check_nv_healthmon 64 | 65 | -------------------------------------------------------------------------------- /dependencies/etc/nvidia-healthmon.conf: -------------------------------------------------------------------------------- 1 | ;;; 2 | ;;; The global section contains configurations that apply to all devices 3 | ;;; 4 | [global] 5 | 6 | ;; 7 | ;; Enable this setting to ensure that the expected number of Tesla brand GPUs 8 | ;; are detected by the NVML library. 9 | ;; 10 | ;; This count only includes Tesla brand GPUs that the nvidia-healthmon 11 | ;; process has sufficient permission to access. 12 | ;; 13 | ;; If this setting is not configured, then checks that require it will skip 14 | ;; 15 | ; devices.tesla.count = 1 16 | 17 | ;; 18 | ;; nvidia-healthmon checks the system for drivers that have been known to 19 | ;; cause issues with NVIDIA hardware, drivers, and software. The following 20 | ;; list contains the names of drivers which are known to cause problems. 21 | ;; If nvidia-healthmon detects any blacklisted drivers it will not 22 | ;; execute further tests. 23 | ;; 24 | ;; You may add/remove drivers on this list at your own risk. 25 | ;; 26 | ;; If this setting is not configured, then checks that require it will skip 27 | ;; 28 | ; 29 | drivers.blacklist = nouveau 30 | 31 | ;;; 32 | ;;; The configuration in each device section only applies to devices of that SKU 33 | ;;; Below is an explanation of all fields that can be set in the device section 34 | ;;; 35 | ; 36 | ;[Tesla K20m] 37 | ; 38 | ;; Each device section starts with the name of the device 39 | ;; Run nvidia-smi to determine the name of your GPU 40 | ; 41 | ;; Bandwidth configuration 42 | ; 43 | ;; nvidia-healthmon can check the PCIe bandwidth between the pinned host 44 | ;; memory and GPU memory 45 | ;; If the bandwidth from the host to GPU or from the GPU to the host is 46 | ;; below this value (in MB/s), nvidia-healthmon will generate a warning 47 | ;; 48 | ;; If this setting is not configured, then checks that require it will skip 49 | ; 50 | ;bandwidth.warn = 1500 51 | ; 52 | ; 53 | ;; nvidia-healthmon can check the PCIe bandwidth between the pinned host 54 | ;; memory and GPU memory 55 | ;; If the bandwidth from the host to GPU or from the GPU to the host is 56 | ;; below this value (in MB/s), nvidia-healthmon will generate an error 57 | ;; 58 | ;; If this setting is not configured, then checks that require it will skip 59 | ; 60 | ;bandwidth.min = 100 61 | ; 62 | ; 63 | ;; Peer to Peer configuration 64 | ; 65 | ;; nvidia-healthmon can check whether peer to peer access is supported between 66 | ;; GPUs on the same host. It can then run a bandwidth test between two GPUs. 
67 | ;; In the case that peer access is supported, if the bandwidth from one GPU to 68 | ;; the other GPU is supported is below this value (in MB/s), nvidia-healthmon 69 | ;; will generate a warning. If peer to peer access is not supported, the 70 | ;; bandwidth test is still run, but no comparison to the minimum bandwidth is 71 | ;; done. 72 | ; 73 | ;peer.bandwidth.warn = 8000 74 | ; 75 | ; 76 | ;; nvidia-healthmon can check whether peer to peer access is supported between 77 | ;; GPUs on the same host. It can then run a bandwidth test between two GPUs. 78 | ;; In the case that peer access is supported, if the bandwidth from one GPU to 79 | ;; the other GPU is supported is below this value (in MB/s), nvidia-healthmon 80 | ;; will generate an error. If peer to peer access is not supported, the 81 | ;; bandwidth test is still run, but no comparison to the minimum bandwidth is 82 | ;; done. 83 | ; 84 | ;peer.bandwidth.min = 5000 85 | ; 86 | ;; PCIe link configuration 87 | ; 88 | ;; nvidia-healthmon can compare the maximum PCIe link generation for the PCIe 89 | ;; link closest to the GPU chip against the value specified here. 90 | ;; 91 | ;; If this setting is not configured, then checks that require it will skip 92 | ;; An error will be generated if there is a mismatch 93 | ;; 94 | ;; For GPU board that contain multiple GPU chips, this value will reflect 95 | ;; the PCIe link generation between the GPU chip and an on board PCIe switch. 96 | ;; For single GPU boards this value reflects the link width between the GPU 97 | ;; chip and the PCIe slot the GPU is connected to. 98 | ;; Note that additional PCIe links upstream from the GPU may have a 99 | ;; different link generation. Those links are not considered here. 100 | ;; 101 | ; 102 | ;pci.gen = 1 103 | ; 104 | ; 105 | ;; nvidia-healthmon can compare the maximum PCIe link width for the PCIe 106 | ;; link closest to the GPU chip against the value specified here. 107 | ;; 108 | ;; If this setting is not configured, then checks that require it will skip 109 | ;; An error will be generated if there is a mismatch 110 | ;; 111 | ;; For GPU board that contain multiple GPU chips, this value will reflect 112 | ;; the PCIe link width between the GPU chip and an on board PCIe switch. 113 | ;; For single GPU boards this value reflects the link width between the GPU 114 | ;; chip and the PCIe slot the GPU is connected to. 115 | ;; Note that additional PCIe links upstream from the GPU may have a 116 | ;; different link width. Those links are not considered here 117 | ; 118 | ;pci.width = 16 119 | ; 120 | ;; nvidia-healthmon can compare the current temperature to a warning level in 121 | ;; degrees Celsius. 
A warning will be generated if the current temperature is 122 | ;; at or above the warning level 123 | ;; 124 | ;; Note that the desired temperature may vary based on the cooling system used 125 | ;; 126 | ;; If this setting is not configured, then checks that require it will skip 127 | ; 128 | ;temperature.warn = 95 129 | ; 130 | 131 | ;;; 132 | ;;; NVIDIA provides default configuration for various settings of various GPUs 133 | ;;; 134 | ;;; Some fields provide conservative maximum expected values 135 | ;;; Some fields are highly system specific, so no default is provided 136 | ;;; Please adjust these values as needed based on local system configuration 137 | ;;; 138 | 139 | [Tesla K10.G1.8GB] 140 | ; This value is affected by a number of factors 141 | ; Let's assume a PCIe Gen 2 system with 8x lane width 142 | ; If your system supports only Gen 1 or <8x lane width, this estimate may be 143 | ; too high. If your system supports Gen 3 or >8x lane width, this estimate 144 | ; be too low 145 | ; The theoretical bandwidth for such a link will be: 146 | ; * 147 | ; PCIe Gen 2 has 500 MB/s per lane 148 | ; So the max theoretical bandwidth is 500 * 8 = 149 | ; 4000 MB/s 150 | ; In reality we can't hit the max, so we need some value lower 151 | ; Other processes running on other GPUs, processes running on the CPU, and 152 | ; processes communicating over the PCIe bus can affect the measured bandwidth 153 | ; 154 | bandwidth.warn = 9500 155 | ; Set this based on your local system configuration 156 | ;bandwidth.min = 8000 157 | 158 | ; The bandwidth between peers is also subject to the above 159 | ; estimation in the worst case. The best case will be much faster. 160 | peer.bandwidth.warn = 9500 161 | ; Set this based on your local system configuration 162 | peer.bandwidth.min = 8000 163 | 164 | ; The on link in question here is the one between the GPU and the PCI switch 165 | ; on the GPU board. This is a PCIe Gen 3 link even if the link to the system 166 | ; is lower. Additionally, this is a 16x wide link. 167 | pci.gen = 3 168 | pci.width = 16 169 | 170 | ; This is an intentionally high default. Set it lower based on your system 171 | ; thermal configuration. 
172 | temperature.warn = 90 173 | 174 | 175 | [Tesla K10.G2.8GB] 176 | ; See [Tesla K10.G1.8GB] section for an explanation of defaults 177 | bandwidth.warn = 9500 178 | bandwidth.min = 8000 179 | peer.bandwidth.warn = 9500 180 | peer.bandwidth.min = 8000 181 | pci.gen = 3 182 | pci.width = 16 183 | temperature.warn = 90 184 | 185 | 186 | [Tesla K20] 187 | bandwidth.warn = 5000 188 | bandwidth.min = 4500 189 | peer.bandwidth.warn = 5000 190 | peer.bandwidth.min = 4500 191 | pci.gen = 2 192 | pci.width = 16 193 | temperature.warn = 90 194 | 195 | 196 | [Tesla K20X] 197 | bandwidth.warn = 5000 198 | bandwidth.min = 4500 199 | peer.bandwidth.warn = 5000 200 | peer.bandwidth.min = 4500 201 | pci.gen = 2 202 | pci.width = 16 203 | temperature.warn = 90 204 | 205 | 206 | [Tesla K20Xm] 207 | bandwidth.warn = 5000 208 | bandwidth.min = 4500 209 | peer.bandwidth.warn = 5000 210 | peer.bandwidth.min = 4500 211 | pci.gen = 2 212 | pci.width = 16 213 | temperature.warn = 90 214 | 215 | 216 | [Tesla K20c] 217 | bandwidth.warn = 5000 218 | bandwidth.min = 4500 219 | peer.bandwidth.warn = 5000 220 | peer.bandwidth.min = 4500 221 | pci.gen = 2 222 | pci.width = 16 223 | temperature.warn = 90 224 | 225 | 226 | [Tesla K20s] 227 | bandwidth.warn = 5000 228 | bandwidth.min = 4500 229 | peer.bandwidth.warn = 5000 230 | peer.bandwidth.min = 4500 231 | pci.gen = 2 232 | pci.width = 16 233 | temperature.warn = 90 234 | 235 | 236 | [Tesla K20m] 237 | bandwidth.warn = 5000 238 | bandwidth.min = 4500 239 | peer.bandwidth.warn = 5000 240 | peer.bandwidth.min = 4500 241 | pci.gen = 2 242 | pci.width = 16 243 | temperature.warn = 90 244 | 245 | 246 | [Tesla K40] 247 | bandwidth.warn = 9500 248 | bandwidth.min = 8000 249 | peer.bandwidth.warn = 9500 250 | peer.bandwidth.min = 8000 251 | pci.gen = 3 252 | pci.width = 16 253 | temperature.warn = 90 254 | 255 | 256 | [Tesla K40c] 257 | bandwidth.warn = 9500 258 | bandwidth.min = 8000 259 | peer.bandwidth.warn = 9500 260 | peer.bandwidth.min = 8000 261 | pci.gen = 3 262 | pci.width = 16 263 | temperature.warn = 90 264 | 265 | 266 | [Tesla K40m] 267 | bandwidth.warn = 9500 268 | bandwidth.min = 8000 269 | peer.bandwidth.warn = 9500 270 | peer.bandwidth.min = 8000 271 | pci.gen = 3 272 | pci.width = 16 273 | temperature.warn = 90 274 | 275 | 276 | [Tesla K80] 277 | bandwidth.warn = 9500 278 | bandwidth.min = 8000 279 | peer.bandwidth.warn = 9500 280 | peer.bandwidth.min = 8000 281 | pci.gen = 3 282 | pci.width = 16 283 | temperature.warn = 90 284 | 285 | 286 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/cgroup.conf: -------------------------------------------------------------------------------- 1 | ################################################################################# 2 | ######################### Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################# 4 | # 5 | # Slurm cgroup support configuration file 6 | # 7 | # See man slurm.conf and man cgroup.conf for further 8 | # information on cgroup configuration parameters 9 | # 10 | # 11 | # This file must be present on all nodes of your cluster. 12 | # See the slurm.conf man page for more information. 
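#
# Rough sketch of how the settings below fit together (not authoritative):
# with ConstrainDevices=yes, a job is limited to the device nodes listed in
# AllowedDevicesFile plus whatever generic resources (e.g. GPUs declared in
# gres.conf) were allocated to it, so a job granted one GPU would typically
# see its /dev/nvidiaN device but be denied access to the node's other GPUs.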
13 | # 14 | ################################################################################# 15 | 16 | CgroupAutomount=yes 17 | CgroupReleaseAgentDir="/etc/slurm/cgroup" 18 | CgroupMountpoint="/sys/fs/cgroup" 19 | 20 | ConstrainCores=yes 21 | ConstrainRAMSpace=yes 22 | ConstrainDevices=yes 23 | AllowedDevicesFile="/etc/slurm/cgroup_allowed_devices_file.conf" 24 | 25 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/cgroup_allowed_devices_file.conf: -------------------------------------------------------------------------------- 1 | /dev/null 2 | /dev/urandom 3 | /dev/zero 4 | /dev/sd* 5 | /dev/vd* 6 | /dev/cpu/*/* 7 | /dev/pts/* 8 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/gres.conf: -------------------------------------------------------------------------------- 1 | # Defines "generic resources" to be used by SLURM 2 | # 3 | # Each compute node must have its own file in /etc/slurm/gres.conf to define the 4 | # resources that it provides. 5 | # 6 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/plugstack.conf: -------------------------------------------------------------------------------- 1 | ################################################################################# 2 | ######################### Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################# 4 | # 5 | # Configuration for SLURM plug-ins 6 | # 7 | # 8 | # This file must be present on all nodes of your cluster. 9 | # 10 | ################################################################################# 11 | 12 | include /etc/slurm/plugstack.conf.d/* 13 | 14 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/plugstack.conf.d/x11.conf: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | # X11 SLURM spank plugin configuration file 3 | # 4 | # this plugin can be used to add X11 support in slurm jobs using ssh X11 5 | # tunneling capabilities 6 | # 7 | # The following configuration parameters are available (the character | 8 | # replaces the space in compound options) : 9 | # 10 | # ssh_cmd : can be used to modify the ssh binary to use. 11 | # default corresponds to ssh_cmd=ssh 12 | # ssh_args : can be used to modify the ssh arguments to use. 13 | # default corresponds to ssh_cmd= 14 | # helpertask_cmd: can be used to add a trailing argument to the helper task 15 | # responsible for setting up the ssh tunnel 16 | # default corresponds to helpertask_cmd= 17 | # an interesting value can be helpertask_cmd=2>/tmp/log to 18 | # capture the stderr of the helper task 19 | # 20 | # Users can ask for X11 support for both interactive (srun) and batch (sbatch) 21 | # jobs using parameter --x11=[batch|first|last|all] or the SLURM_SPANK_X11 22 | # environment variable set to the required value. 23 | # 24 | # In interactive mode (srun), values can be first to establish a tunnel with 25 | # the first allocated node, last for the last one and all for all nodes. 26 | # 27 | # In batch mode (sbatch), only "batch" mode can be used but batch script can 28 | # be used first|last|all values with srun. 
In batch mode, the first allocated 29 | # node will contact the submission node using ssh to establish the tunnel 30 | # from the submission node to itself. As a result, the user must kept its 31 | # initial connection to the submission host as long as it wants to be able to 32 | # forward its X11 display to batch execution node. 33 | # 34 | #------------------------------------------------------------------------------- 35 | optional /usr/lib64/slurm/x11.so 36 | #------------------------------------------------------------------------------- 37 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurm.epilog: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015-2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # This epilog script is run on each compute node after a user's job has 29 | # completed. Other jobs (from this user or another user) could still be running. 30 | # 31 | # If no other jobs from this user are running on the node, we will ensure all 32 | # their processes are terminated and all temporary/scratch files are removed. 33 | # 34 | # Without this script, a user can login to a node while their job is running and 35 | # that session will persist even after their job has finished. 36 | # 37 | ################################################################################ 38 | 39 | 40 | # The default SLURM path can be replaced, if necessary 41 | SLURM_BIN_DIR=/usr/slurm/16.05/bin/ 42 | 43 | 44 | # 45 | # List of temporary directories which should be cleaned after all of 46 | # a user's jobs have completed. You can find all such locations on your 47 | # systems by running this command (it is I/O intensive!): 48 | # 49 | # find / -type d -perm 1777 50 | # 51 | TMP_DIRS="/dev/shm /tmp /usr/tmp /var/tmp" 52 | 53 | 54 | # Exit if this script isn't actually running within a SLURM context 55 | if [[ -z "$SLURM_JOB_UID" ]] || [[ -z "$SLURM_JOB_ID" ]]; then 56 | echo "Do not run this script manually - it is used by SLURM" 57 | exit 1 58 | fi 59 | 60 | 61 | # 62 | # Don't try to kill user root or system daemon jobs. 63 | # 64 | # Note that the maximum system UID varies by distro (499 for older RHEL; 65 | # 999 for Debian and newer versions of RHEL). 
66 | # 67 | # See UID_MIN in /etc/login.defs: 68 | # 69 | # awk '/^UID_MIN/ {print $2}' /etc/login.defs 70 | # 71 | if [[ $SLURM_JOB_UID -lt 1000 ]]; then 72 | exit 0 73 | fi 74 | 75 | 76 | # Pull the list of jobs this user is currently running on this node. 77 | job_list=$(${SLURM_BIN_DIR}squeue --noheader --format=%A --user=$SLURM_JOB_UID --node=localhost) 78 | squeue_retval=$? 79 | 80 | # If squeue failed, we probably have the wrong PATH or SLURM is down... 81 | if [[ $squeue_retval -gt 0 ]]; then 82 | exit $squeue_retval 83 | fi 84 | 85 | # Look through each job running on this node 86 | for job_id in $job_list; do 87 | # If the user still has a job on this node, stop here. 88 | if [[ $job_id -ne $SLURM_JOB_ID ]]; then 89 | exit 0 90 | fi 91 | done 92 | 93 | 94 | # Drop clean caches (recommended by OpenHPC) 95 | echo 3 > /proc/sys/vm/drop_caches 96 | 97 | 98 | # 99 | # No other SLURM jobs found - purge all remaining processes of this user. 100 | # 101 | # Note: the user can have other processes exiting, especially if they have 102 | # an interactive session (e.g., ssh with SPANK plugins). We may need to be more 103 | # descriminating in which processes are killed... 104 | # 105 | pkill -KILL -U $SLURM_JOB_UID 106 | 107 | 108 | # Remove any remaining temporary files the user created. 109 | for tmpdir in $TMP_DIRS; do 110 | find "$tmpdir" -uid $SLURM_JOB_UID -exec rm -Rf {} + 111 | find_retval=$? 112 | 113 | if [[ $find_retval -gt 0 ]]; then 114 | echo "Epilog error - unable to clean up temp files in $tmpdir" 115 | exit $find_retval 116 | fi 117 | done 118 | 119 | 120 | # If we've gotten to the end cleanly, everything should have worked 121 | exit 0 122 | 123 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurm.healthcheck: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # Start a node health check. 29 | # 30 | # This should run very fast and not use system resources, as it will be running 31 | # at the same time as compute jobs. Longer-term health checks may be run inside 32 | # the slurm.healthcheck_long script. 
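#
# For reference, slurm.conf (dependencies/etc/slurm/slurm.conf) wires this
# script in as the periodic health check, roughly as follows:
#
#   HealthCheckProgram=/etc/slurm/scripts/slurm.healthcheck
#   HealthCheckInterval=900
#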
33 | # 34 | ################################################################################ 35 | 36 | 37 | source '/mcms/scripts/util/lib.lockfile.sh' 38 | 39 | 40 | NHC=/usr/sbin/nhc 41 | 42 | NHC_FILE="/etc/nhc/compute-node-checks.conf" 43 | 44 | 45 | # Other scripts can also spawn health checks, so we need a lock file 46 | MCMS_LOCKFILE="node-health-check" 47 | 48 | # Attempt to get the NHC lock 49 | mcms_get_lock 50 | 51 | # If unable to get the lock, we'll pass. Assume a longer health test is running. 52 | if [[ -z "$MCMS_RECEIVED_LOCK" ]]; then 53 | exit 0 54 | fi 55 | 56 | 57 | # Execute Node Health Check 58 | eval $NHC -c $NHC_FILE 59 | nhc_retval=$? 60 | 61 | 62 | exit $nhc_retval 63 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurm.healthcheck_long: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # Start longer-running node health checks. 29 | # 30 | # This script expects to be spawned from the front-end SLURM server before a 31 | # user's job is started. This script is run as root. 32 | # 33 | # During execution of this script, the nodes have state POWER_UP/CONFIGURING. 34 | # This gives us time to run longer health tests than are normally allowed in 35 | # prolog and epilog scripts. 36 | # 37 | # Each NHC script is expected to finish within 2 minutes. If an error (such as 38 | # a broken NFS mount) causes the script to run beyond 2 minutes, it will be 39 | # terminated (which results in an error condition and drains the compute node). 
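#
# For context: this script is launched by /etc/slurm/scripts/slurmctld.prolog,
# which runs it in parallel across a job's compute nodes over SSH (pdsh with
# the .healthcheck-ssh-key) and keeps a per-node timestamp cache so that the
# intensive checks are repeated roughly once per LONG_HEALTH_CHECK_INTERVAL
# (one day by default).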
40 | # 41 | ################################################################################ 42 | 43 | 44 | source '/mcms/scripts/util/lib.lockfile.sh' 45 | 46 | 47 | NHC=/usr/sbin/nhc 48 | 49 | # 50 | # NHC files to run: 51 | # * normal (quick-running) tests 52 | # * more intensive tests (e.g., accelerator memory transfer performance) 53 | # * filesystem checks that could lock up (e.g., checking free space) 54 | # 55 | NHC_FILES="/etc/nhc/compute-node-checks.conf \ 56 | /etc/nhc/compute-node-checks_intense.conf \ 57 | /etc/nhc/compute-node-checks_blocking-io.conf" 58 | 59 | 60 | # Other scripts can also spawn health checks, so we need a lock file 61 | MCMS_LOCKFILE="node-health-check" 62 | 63 | # Attempt to get the NHC lock. 64 | # Because the other health tests are quick-running, we should wait a bit. 65 | attempts=3 66 | while [[ $attempts -gt 0 ]] 67 | do 68 | mcms_get_lock 69 | 70 | if [[ -n "$MCMS_RECEIVED_LOCK" ]]; then 71 | break 72 | fi 73 | 74 | sleep 0.5s 75 | 76 | attempts=$(( $attempts - 1 )) 77 | done 78 | 79 | # If unable to get the lock, we'll pass. Assume a longer health test is running. 80 | if [[ -z "$MCMS_RECEIVED_LOCK" ]]; then 81 | exit 0 82 | fi 83 | 84 | 85 | # Execute Node Health Checks 86 | for nhc_file in $NHC_FILES 87 | do 88 | eval $NHC -c $nhc_file 89 | nhc_retval=$? 90 | 91 | if [[ $nhc_retval -gt 0 ]]; then 92 | exit $nhc_retval 93 | fi 94 | done 95 | 96 | 97 | # If we've gotten to the end cleanly, everything should have worked 98 | exit 0 99 | 100 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurm.jobstart_messages.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015-2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # Provide helpful messages for the start and end of a batch job. 
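#
# How this file is hooked in is site-specific (slurm.conf in this repository
# leaves TaskProlog commented out). One plausible, hypothetical way to use it
# is to source it near the top of a batch script so the messages and the
# exit-status trap end up in the job's output file, e.g.:
#
#   #!/bin/bash
#   #SBATCH --nodes=2 --ntasks=16 --cpus-per-task=1
#   source /etc/slurm/scripts/slurm.jobstart_messages.sh
#   srun ./my_application
#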
29 | # 30 | ################################################################################ 31 | 32 | # Exit if this script isn't actually running within a SLURM context 33 | if [[ -z "$SLURM_JOB_UID" ]] || [[ -z "$SLURM_JOB_ID" ]]; then 34 | echo "Do not run this script manually - it is used by SLURM" 35 | exit 1 36 | fi 37 | 38 | 39 | echo " 40 | ################################################################################ 41 | # JOB DETAILS 42 | # 43 | # Job started at $(date +"%F %T") 44 | # Job ID number: $SLURM_JOBID 45 | # 46 | # Starting from host: $(hostname) 47 | # The following compute nodes will be used: $SLURM_NODELIST 48 | #" 49 | 50 | NPROCS=$(( $SLURM_NTASKS * $SLURM_CPUS_PER_TASK )) 51 | NODES=$SLURM_JOB_NUM_NODES 52 | NUM_SOCKETS=$((`grep 'physical id' /proc/cpuinfo | sort -u | tail -n1 | cut -d" " -f3` + 1)) 53 | NUM_CORES=$(grep siblings /proc/cpuinfo | head -n1 | cut -d" " -f2) 54 | 55 | echo "# 56 | # Using $NPROCS processes across $NODES nodes. 57 | # Reserving $SLURM_MEM_PER_NODE MB of memory per node. 58 | # 59 | # The node starting this job has: 60 | # 61 | # $NUM_SOCKETS CPU sockets with $NUM_CORES cores each -- $(grep -m1 'model name' /proc/cpuinfo) 62 | # System memory: $(awk '/MemTotal/ {print $2 $3}' /proc/meminfo) 63 | #" 64 | 65 | # Check for GPUs and print their status 66 | if [[ -n "$CUDA_VISIBLE_DEVICES" && "$CUDA_VISIBLE_DEVICES" != "NoDevFiles" ]]; then 67 | GPUS_PER_NODE=$(echo $CUDA_VISIBLE_DEVICES | sed 's/,/ /g' | wc --words) 68 | 69 | first_index=$(echo $CUDA_VISIBLE_DEVICES | sed 's/,.*//') 70 | GPU_TYPE=$(nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=$first_index | sed 's/ /_/g') 71 | 72 | echo "# 73 | # NVIDIA CUDA device IDs in use: $CUDA_VISIBLE_DEVICES 74 | # 75 | # Full list of GPU devices 76 | # $(nvidia-smi) 77 | #" 78 | fi 79 | 80 | # Check for Xeon Phi Coprocessors and print their status 81 | if [[ -n "$OFFLOAD_DEVICES" ]]; then 82 | echo "# 83 | # Xeon Phi device IDs in use: $OFFLOAD_DEVICES 84 | # 85 | # $(micinfo) 86 | #" 87 | fi 88 | 89 | 90 | # Check for storage devices 91 | STORAGE_DEVICES=$(awk '!/Attached devices/' /proc/scsi/scsi) 92 | if [[ -n "$STORAGE_DEVICES" ]]; then 93 | echo "# 94 | # Storage devices attached to this node: 95 | # $STORAGE_DEVICES 96 | #" 97 | else 98 | echo "# 99 | # No storage devices are attached to this node. 100 | #" 101 | fi 102 | 103 | 104 | echo "# 105 | # Changing to working directory $SLURM_SUBMIT_DIR 106 | # 107 | ################################################################################ 108 | 109 | " 110 | 111 | 112 | ################################################################################ 113 | # 114 | # The section below will be run when the job has finished 115 | # 116 | ################################################################################ 117 | 118 | # Trap all exits (both with and without errors) 119 | trap exit_handler EXIT 120 | 121 | # Remap errors and interrupts to exit (to prevent two calls to the handler) 122 | trap exit ERR INT TERM 123 | 124 | exit_handler() { 125 | local error_code="$?" 126 | local exit_time=$(date +'%F %T') 127 | 128 | # If there was an error, report it. 129 | if [ "$error_code" -gt 0 ]; then 130 | echo " 131 | 132 | ################################################################################ 133 | # 134 | # WARNING! 
Job exited abnormally at $exit_time with error code: $error_code 135 | # 136 | ################################################################################" 137 | 138 | # If the job completed successfully, report success. 139 | else 140 | echo " 141 | 142 | ################################################################################ 143 | # 144 | # Job finished successfully at $exit_time 145 | # 146 | ################################################################################" 147 | fi 148 | 149 | exit $error_code 150 | } 151 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurmctld.power_nodes_off: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # This script is run from the front-end SLURM server when powered-up nodes are 29 | # idle and just wasting power. This script is run as the SlurmUser, not as root. 30 | # 31 | ################################################################################ 32 | 33 | 34 | NODELIST="$1" 35 | 36 | 37 | # Exit if this script isn't actually running within a SLURM context 38 | if [[ -z "$SLURM_CONF" ]]; then 39 | echo "Do not run this script manually - it is used by SLURM" 40 | exit 1 41 | fi 42 | 43 | 44 | logger "SLURM is suspending node(s): $NODELIST" 45 | 46 | 47 | ################################################################################ 48 | # Parse the short-form node list information from SLURM. scontrol can do this, 49 | # but we should try not to shell out. 
Example: node[1,4-7,18] 50 | # 51 | full_node_list=( ) 52 | 53 | nodename_prefix=${NODELIST%%\[*} 54 | nodename_postfix=${NODELIST##*\]} 55 | short_list=${NODELIST##*\[} 56 | short_list=${short_list%%\]*} 57 | 58 | # If the 'node list' is a single node, we're done 59 | if [[ "$nodename_prefix" == "$nodename_postfix" ]]; then 60 | full_node_list[0]=$NODELIST 61 | else 62 | # Break down the comma-separated list 63 | OLD_IFS=$IFS 64 | IFS=, 65 | for item in $short_list; do 66 | range_begin=${item%%-*} 67 | range_end=${item##*-} 68 | 69 | # Add in each node in the specified node range (even if it's just one node) 70 | for (( i=$range_begin; i<$(($range_end+1)); i++ )); do 71 | full_node_list[${#full_node_list[@]}]=${nodename_prefix}${i}${nodename_postfix} 72 | done 73 | done 74 | IFS=$OLD_IFS 75 | fi 76 | ################################################################################ 77 | 78 | 79 | ################################################################################ 80 | # Power off the nodes 81 | # 82 | 83 | # Specify arguments to pass to SSH 84 | # Slurm will use a private SSH key to login as root on each compute node. 85 | SSH_EXECUTABLE=${SSH_EXECUTABLE:-/usr/bin/ssh} 86 | ssh_arguments="-i /var/spool/slurmd/.ssh/.poweroff-ssh-key -2 -a -x -lroot" 87 | 88 | # Power off all idle nodes 89 | for (( i=0; i<${#full_node_list[@]}; i++ )); do 90 | $SSH_EXECUTABLE $ssh_arguments ${full_node_list[$i]} /sbin/poweroff 91 | ssh_retval=$? 92 | 93 | if [[ $ssh_retval -gt 0 ]]; then 94 | exit $ssh_retval 95 | fi 96 | done 97 | ################################################################################ 98 | 99 | 100 | exit 0 101 | 102 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurmctld.power_nodes_on: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # This script is run from the front-end SLURM server when powered-down nodes 29 | # need to be re-activated and powered back up. This script is run as the 30 | # SlurmUser, not as root. 31 | # 32 | # During execution of this script, the nodes have state POWER_UP/CONFIGURING. 
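#
# For reference, slurm.conf (dependencies/etc/slurm/slurm.conf) wires the
# power-saving scripts up as follows:
#
#   SuspendProgram=/etc/slurm/scripts/slurmctld.power_nodes_off
#   ResumeProgram=/etc/slurm/scripts/slurmctld.power_nodes_on
#   SuspendTime=14400    # idle seconds before a node is powered off (4 hours)
#   ResumeTimeout=300    # seconds we expect the node boot process to take
#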
33 | # 34 | ################################################################################ 35 | 36 | 37 | NODELIST="$1" 38 | 39 | 40 | # Exit if this script isn't actually running within a SLURM context 41 | if [[ -z "$SLURM_CONF" ]]; then 42 | echo "Do not run this script manually - it is used by SLURM" 43 | exit 1 44 | fi 45 | 46 | 47 | logger "SLURM is resuming node(s) $NODELIST" 48 | 49 | 50 | ################################################################################ 51 | # Power on the nodes 52 | # 53 | POWERUP_EXECUTABLE=${POWERUP_EXECUTABLE:-/etc/slurm/scripts/slurmctld.power_nodes_on_as_root} 54 | SSH_EXECUTABLE=${SSH_EXECUTABLE:-/usr/bin/ssh} 55 | 56 | # Specify arguments to pass to SSH 57 | # Slurm will use a private SSH key to login to the master as root 58 | ssh_arguments="-i /var/spool/slurmd/.ssh/.poweroff-ssh-key -2 -a -x -lroot" 59 | 60 | 61 | # We pass the node list in via stdin so that our SSH key checking can securely 62 | # verify the exact script which is running with root privileges. 63 | echo $NODELIST | $SSH_EXECUTABLE $ssh_arguments localhost $POWERUP_EXECUTABLE 64 | powerup_retval=$? 65 | 66 | # The 'wwsh' utility (which starts the nodes) returns 1 even upon success 67 | if [[ $powerup_retval -gt 1 ]]; then 68 | exit $powerup_retval 69 | fi 70 | ################################################################################ 71 | 72 | 73 | ################################################################################ 74 | # Parse the short-form node list information from SLURM. scontrol can do this, 75 | # but we should try not to shell out. Example: node[1,4-7,18] 76 | # 77 | full_node_list=( ) 78 | 79 | nodename_prefix=${NODELIST%%\[*} 80 | nodename_postfix=${NODELIST##*\]} 81 | short_list=${NODELIST##*\[} 82 | short_list=${short_list%%\]*} 83 | 84 | # If the 'node list' is a single node, we're done 85 | if [[ "$nodename_prefix" == "$nodename_postfix" ]]; then 86 | full_node_list[0]=$NODELIST 87 | else 88 | # Break down the comma-separated list 89 | OLD_IFS=$IFS 90 | IFS=, 91 | for item in $short_list; do 92 | range_begin=${item%%-*} 93 | range_end=${item##*-} 94 | 95 | # Add in each node in the specified node range (even if it's just one node) 96 | for (( i=$range_begin; i<$(($range_end+1)); i++ )); do 97 | full_node_list[${#full_node_list[@]}]=${nodename_prefix}${i}${nodename_postfix} 98 | done 99 | done 100 | IFS=$OLD_IFS 101 | fi 102 | ################################################################################ 103 | 104 | 105 | ################################################################################ 106 | # Wait for the nodes to complete the boot process. 107 | # To start, we'll try one node at random. As soon as more than one node is 108 | # responding, we'll exit and SLURM can verify they are actually up. 109 | # 110 | # SSH will wait up to 5 seconds per attempt; we'll wait up to another 5 seconds 111 | retry_interval="5s" 112 | 113 | # Specify arguments to pass to SSH 114 | # Slurm will use a private SSH key to login as root on each compute node. 
115 | SSH_EXECUTABLE=${SSH_EXECUTABLE:-/usr/bin/ssh} 116 | ssh_arguments="-i /var/spool/slurmd/.poweroff-ssh-key -2 -a -x -lroot -oConnectTimeout=${retry_interval}" 117 | 118 | # Each retry will last between 5 and 10 seconds (we'll wait 5 to 10 minutes) 119 | retry_attempts=60 120 | ssh_retval=999 121 | nodes_responding=0 122 | while [[ $ssh_retval -gt 0 ]] && 123 | [[ $retry_attempts -gt 0 ]] && 124 | [[ $nodes_responding -lt 2 ]]; 125 | do 126 | sleep $retry_interval 127 | 128 | random_node_index=$(( $RANDOM % ${#full_node_list[@]} )) 129 | random_node=${full_node_list[$random_node_index]} 130 | 131 | $SSH_EXECUTABLE $ssh_arguments $random_node echo 132 | ssh_retval=$? 133 | 134 | # Once nodes start responding, count them 135 | if [[ $ssh_retval -eq 0 ]]; then 136 | nodes_responding=$(( $nodes_responding + 1 )) 137 | fi 138 | 139 | retry_attempts=$(( $retry_attempts - 1 )) 140 | done 141 | 142 | # If we waited the whole time and no nodes are responding, error out 143 | if [[ $ssh_retval -gt 0 ]] && [[ $nodes_responding -lt 2 ]]; then 144 | logger "SLURM was not able to successfully power up all requested nodes" 145 | exit $ssh_retval 146 | fi 147 | ################################################################################ 148 | 149 | 150 | exit 0 151 | 152 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurmctld.power_nodes_on_as_root: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # This script is called via SSH from the front-end SLURM server when powered- 29 | # down nodes need to be re-activated and powered back up. This script will be 30 | # executed remotely via an SSH connection (using SLURM's private SSH key). 31 | # 32 | # We expect the calling script to pass in the nodelist via stdin. 33 | # 34 | ################################################################################ 35 | 36 | 37 | # Exit if this script isn't actually running within a SLURM context. Because 38 | # we expect to be called via SSH (which strips much of the SLURM context), check 39 | # to be sure that this script was called directly and not from a user session. 40 | if [[ ! 
"$SSH_ORIGINAL_COMMAND" =~ "slurmctld.power_nodes_on_as_root" ]]; then 41 | echo "Do not run this script manually - it is used by SLURM" 42 | exit 1 43 | fi 44 | 45 | 46 | read NODELIST 47 | 48 | 49 | ################################################################################ 50 | # Power on the nodes 51 | # 52 | WWSH_EXECUTABLE=${WWSH_EXECUTABLE:-wwsh} 53 | 54 | $WWSH_EXECUTABLE ipmi poweron $NODELIST 55 | wwsh_retval=$? 56 | 57 | # The 'wwsh' utility returns 1 even upon success 58 | if [[ $wwsh_retval -gt 1 ]]; then 59 | exit $wwsh_retval 60 | fi 61 | ################################################################################ 62 | 63 | 64 | exit 0 65 | 66 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurmctld.prolog: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # This prolog script is run from the front-end SLURM server before a user's job 29 | # is allocated compute nodes. This script is run as the SlurmUser, not as root. 30 | # 31 | # During execution of this script, the nodes have state POWER_UP/CONFIGURING. 32 | # This gives us time to run longer health tests than are normally allowed in 33 | # prolog and epilog scripts. 34 | # 35 | # Each NHC script is expected to finish within 2 minutes. If an error (such as 36 | # a broken NFS mount) causes the script to run beyond 2 minutes, it will be 37 | # terminated (which results in an error condition and drains the compute node). 38 | # 39 | ################################################################################ 40 | 41 | 42 | # Exit if this script isn't actually running within a SLURM context 43 | if [[ -z "$SLURM_JOB_UID" ]] || [[ -z "$SLURM_JOB_ID" ]]; then 44 | echo "Do not run this script manually - it is used by SLURM" 45 | exit 1 46 | fi 47 | 48 | 49 | ################################################################################ 50 | # Parse the short-form node list information from SLURM. scontrol can do this, 51 | # but we should try not to shell out. 
Example: node[1,4-7,18] 52 | # 53 | full_node_list=( ) 54 | 55 | nodename_prefix=${SLURM_JOB_NODELIST%%\[*} 56 | nodename_postfix=${SLURM_JOB_NODELIST##*\]} 57 | short_list=${SLURM_JOB_NODELIST##*\[} 58 | short_list=${short_list%%\]*} 59 | 60 | # If the 'node list' is a single node, we're done 61 | if [[ "$nodename_prefix" == "$nodename_postfix" ]]; then 62 | full_node_list[0]=$SLURM_JOB_NODELIST 63 | else 64 | # Break down the comma-separated list 65 | OLD_IFS=$IFS 66 | IFS=, 67 | for item in $short_list; do 68 | range_begin=${item%%-*} 69 | range_end=${item##*-} 70 | 71 | # Add in each node in the specified node range (even if it's just one node) 72 | for (( i=$range_begin; i<$(($range_end+1)); i++ )); do 73 | full_node_list[${#full_node_list[@]}]=${nodename_prefix}${i}${nodename_postfix} 74 | done 75 | done 76 | IFS=$OLD_IFS 77 | fi 78 | ################################################################################ 79 | 80 | 81 | # We may have a pause here if SLURM is getting nodes ready (either by powering 82 | # up nodes that are powered off and/or running long health checks). 83 | 84 | 85 | ################################################################################ 86 | # Wait for the nodes to complete the boot process. 87 | # To start, we'll try one node at random. As soon as more than one node is 88 | # responding, we'll exit and SLURM can verify they are actually up. 89 | # 90 | # SSH will wait up to 5 seconds per attempt; we'll wait up to another 5 seconds 91 | retry_interval="5s" 92 | 93 | # Specify arguments to pass to SSH 94 | # Slurm will use a private SSH key to login as root on each compute node. 95 | SSH_EXECUTABLE=${SSH_EXECUTABLE:-/usr/bin/ssh} 96 | ssh_arguments="-i /var/spool/slurmd/.ssh/.poweroff-ssh-key -2 -a -x -lroot -oConnectTimeout=${retry_interval}" 97 | 98 | # Each retry will last between 5 and 10 seconds (we'll wait 5 to 10 minutes) 99 | retry_attempts=60 100 | ssh_retval=999 101 | nodes_responding=0 102 | while [[ $ssh_retval -gt 0 ]] && 103 | [[ $retry_attempts -gt 0 ]] && 104 | [[ $nodes_responding -lt 2 ]]; 105 | do 106 | sleep $retry_interval 107 | 108 | random_node_index=$(( $RANDOM % ${#full_node_list[@]} )) 109 | random_node=${full_node_list[$random_node_index]} 110 | 111 | $SSH_EXECUTABLE $ssh_arguments $random_node echo 112 | ssh_retval=$? 113 | 114 | # Once nodes start responding, count them 115 | if [[ $ssh_retval -eq 0 ]]; then 116 | nodes_responding=$(( $nodes_responding + 1 )) 117 | fi 118 | 119 | retry_attempts=$(( $retry_attempts - 1 )) 120 | done 121 | 122 | # If we waited the whole time and no nodes are responding, error out 123 | if [[ $ssh_retval -gt 0 ]] && [[ $nodes_responding -lt 2 ]]; then 124 | exit $ssh_retval 125 | fi 126 | ################################################################################ 127 | 128 | 129 | 130 | ################################################################################ 131 | # Prevent long tests from running over and over on compute nodes. While many 132 | # cluster jobs are long-running, SLURM also supports large numbers of short- 133 | # running jobs. We don't want multi-minute tests between each job. 
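# As an illustration of the caching scheme implemented below: each node gets
# a small timestamp file under /dev/shm (for a hypothetical host "node17"
# that would be .../.slurmctld_health_check_cache/1/node17, holding the
# seconds-since-epoch of its last intensive check), and that timestamp is
# compared against LONG_HEALTH_CHECK_INTERVAL to decide which nodes are due
# for another intensive check.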
134 | 135 | # Assume that one intensive health test per day will be sufficient 136 | LONG_HEALTH_CHECK_INTERVAL=${LONG_HEALTH_CHECK_INTERVAL:-$((60*60*24))} 137 | 138 | # Number of seconds since epoch 139 | current_time=$(printf "%(%s)T" "-1") 140 | 141 | # List of nodes which need to be checked 142 | check_node_list=( ) 143 | 144 | # Store the health check cache in memory - requires 4KB per node 145 | cache_dir=/dev/shm/.slurmctld_health_check_cache 146 | 147 | for compute_node in ${full_node_list[@]}; do 148 | # Split node cache files into several directories 149 | node_number=${compute_node##*[a-zA-Z-_]} 150 | node_dir="${cache_dir}/${node_number:0:1}" 151 | node_cache_file="${node_dir}/${compute_node}" 152 | 153 | # See if the node has ever been checked 154 | last_tested=0 155 | if [[ -f "$node_cache_file" ]]; then 156 | last_tested=$(< $node_cache_file) 157 | fi 158 | 159 | if (( $current_time > ($last_tested + $LONG_HEALTH_CHECK_INTERVAL) )); then 160 | # Node was not checked recently. Check it now. 161 | check_node_list[${#check_node_list[@]}]=$compute_node 162 | fi 163 | done 164 | ################################################################################ 165 | 166 | 167 | 168 | ################################################################################ 169 | # Start the long healthcheck script on the compute nodes in parallel. 170 | # 171 | LONG_HEALTH_CHECK_SCRIPT=${LONG_HEALTH_CHECK_SCRIPT:-/etc/slurm/scripts/slurm.healthcheck_long} 172 | 173 | # Specify arguments to pass to SSH - slurm will use a private SSH key to login 174 | # as root on each compute node. Note that the username parameter must be set 175 | # twice (once with '%u' and once with 'root') to prevent PDSH from overwriting 176 | # this setting with the SLURM username. 177 | export PDSH_SSH_ARGS="-i /var/spool/slurmd/.ssh/.healthcheck-ssh-key -2 -a -x -l%u -lroot %h" 178 | export PDSH_EXECUTABLE=${PDSH_EXECUTABLE:-/usr/bin/pdsh} 179 | 180 | # Execute Node Health Checks on all nodes assigned to this job 181 | $PDSH_EXECUTABLE -Sw $SLURM_JOB_NODELIST $LONG_HEALTH_CHECK_SCRIPT 182 | pdsh_retval=$? 183 | 184 | if [[ $pdsh_retval -gt 0 ]]; then 185 | exit $pdsh_retval 186 | fi 187 | ################################################################################ 188 | 189 | 190 | 191 | ################################################################################ 192 | # If we've gotten to the end cleanly, everything should have worked. 193 | # 194 | # Mark the compute nodes as checked. 195 | # 196 | for compute_node in ${check_node_list[@]}; do 197 | # Split node cache files into several directories 198 | node_number=${compute_node##*[a-zA-Z-_]} 199 | node_dir="${cache_dir}/${node_number:0:1}" 200 | node_cache_file="${node_dir}/${compute_node}" 201 | 202 | if [[ ! 
-d "$node_dir" ]]; then 203 | mkdir -p "$node_dir" 204 | fi 205 | 206 | echo $current_time > "$node_cache_file" 207 | done 208 | ################################################################################ 209 | 210 | 211 | exit 0 212 | 213 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/scripts/slurmd.gres_init: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015-2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # Additional SLURM configuration 29 | # 30 | # This script is run by SLURM during startup to iterate through GPUs and Phis 31 | # 32 | ################################################################################ 33 | 34 | 35 | ############## Discover "Generic Resources" and Populate gres.conf ############# 36 | gresfile=/etc/slurm/gres.conf 37 | 38 | 39 | ########### Add GPUs (if present) ########## 40 | if [[ -n "$(lspci | grep NVIDIA)" ]]; then 41 | # If GPU settings are already present, assume they've been set elsewhere, 42 | # or that SLURM is restarting and set them up the last time it started. 43 | if [[ -z "$(grep '/dev/nvidia' $gresfile 2> /dev/null)" ]]; then 44 | 45 | # Test for the presence of NVIDIA GPU devices 46 | if [[ -c /dev/nvidia0 ]]; then 47 | 48 | # SLURM desires that we also inform it which CPUs are 49 | # allowed to use each GPU. By default, we allow all CPUs: 50 | cpu_list=( $(awk '/processor/ {print $3}' /proc/cpuinfo) ) 51 | first_cpu=${cpu_list[0]} 52 | final_cpu=${cpu_list[${#cpu_list[@]}-1]} 53 | 54 | # Determine the ordering of the GPUs as seen by NVIDIA CUDA 55 | check_for_smi=$(which nvidia-smi) 56 | if [[ -z "$check_for_smi" ]]; then 57 | echo "SLURM startup - the nvidia-smi tool is unavailable, so unable to set GPU types" 58 | 59 | # If we were not able to grab the GPU ordering, we pass a group 60 | # of generic GPU devices to SLURM. SLURM won't know their types. 
61 | # 62 | # Loop through each NVIDIA device in PCI order 63 | for gpu_device in $(find /dev/ -type c -name "nvidia[0-9]*" | sort); do 64 | echo "Name=gpu File=${gpu_device} CPUs=${first_cpu}-${final_cpu}" >> $gresfile 65 | done 66 | else 67 | # Determine the ordering and types of GPUs 68 | for gpu_device in $(find /dev/ -type c -name "nvidia[0-9]*" | sort); do 69 | gpu_id=${gpu_device#*nvidia} 70 | gpu_name=$(nvidia-smi --format=csv,noheader --query-gpu=gpu_name --id=${gpu_id} | sed 's/ /-/g') 71 | 72 | # If we were able to grab the name, we provide the additional 73 | # information to SLURM. If not, SLURM will just see a group of 74 | # generic GPU devices (without knowing their type) 75 | if [[ -n "${gpu_name}" ]]; then 76 | echo "Name=gpu Type=${gpu_name} File=${gpu_device} CPUs=${first_cpu}-${final_cpu}" >> $gresfile 77 | else 78 | echo "SLURM startup - unable to read the name of GPU ${gpu_device}" 79 | echo "Name=gpu File=${gpu_device} CPUs=${first_cpu}-${final_cpu}" >> $gresfile 80 | fi 81 | done 82 | fi 83 | else 84 | echo "SLURM startup - unable to add GPUs to SLURM $gresfile file - lspci reports GPUs, but they are not in /dev !" 85 | fi 86 | fi 87 | else 88 | echo "SLURM startup - no NVIDIA GPUs detected..." 89 | fi 90 | 91 | 92 | ########### Add Intel Xeon Phi MIC coprocessors (if present) ########## 93 | if [[ -n "$(lspci | grep 'Xeon Phi coprocessor')" ]]; then 94 | # If MIC settings are already present, assume they've been set elsewhere, 95 | # or that SLURM is restarting and set them up the last time it started. 96 | if [[ -z "$(grep '/dev/mic' $gresfile 2> /dev/null)" ]]; then 97 | if [[ -c /dev/mic0 ]]; then 98 | for mic in $(find /dev/ -type c -name "mic[0-9]*" | sort); do 99 | echo "Name=mic File=$mic" >> $gresfile 100 | done 101 | else 102 | echo "SLURM startup - unable to add Xeon Phi coprocessors to SLURM $gresfile file - lspci reports PHIs, but they are not in /dev !" 103 | fi 104 | fi 105 | else 106 | echo "SLURM startup - no Intel Xeon PHIs detected..." 107 | fi 108 | 109 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/slurm.conf: -------------------------------------------------------------------------------- 1 | ################################################################################# 2 | ######################### Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################# 4 | # 5 | # Configuration for SLURM Resource Manager 6 | # 7 | # 8 | # This file must be present on all nodes of your cluster. 9 | # See the slurm.conf man page for more information. 
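#
# Note: values in curly braces below (e.g. {clusterName} and {headName}) are
# placeholders, presumably filled in with site-specific values during
# installation; they are not valid literal settings as-is.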
10 | # 11 | ################################################################################# 12 | 13 | ClusterName={clusterName} 14 | 15 | ControlMachine={headName} 16 | #ControlAddr= 17 | 18 | # Specify a backup SLURM control server 19 | # (note that both control servers must share a SLURM state filesystem) 20 | #BackupController= 21 | #BackupAddr= 22 | 23 | SlurmdPort=6818 24 | SlurmctldPort=6817 25 | SlurmdPidFile=/var/run/slurmd.pid 26 | SlurmctldPidFile=/var/run/slurmctld.pid 27 | SlurmdSpoolDir=/var/spool/slurmd 28 | StateSaveLocation=/var/lib/slurmd 29 | 30 | SlurmUser=slurm 31 | #SlurmdUser=root 32 | 33 | 34 | 35 | ################################################################################# 36 | # PROLOG, EPILOG AND HEALTH SCRIPTS 37 | 38 | # Prepare for user jobs (must run very quickly) 39 | #Prolog= 40 | 41 | # Clean up after user jobs 42 | Epilog=/etc/slurm/scripts/slurm.epilog 43 | 44 | #SrunProlog= 45 | #SrunEpilog= 46 | 47 | #TaskProlog= 48 | #TaskEpilog= 49 | 50 | # Prepare nodes for use (periodically run a longer health check) 51 | PrologSlurmctld=/etc/slurm/scripts/slurmctld.prolog 52 | 53 | #EpilogSlurmctld= 54 | 55 | # Check health of all nodes in the cluster. This program must run very 56 | # quickly, because it is automatically terminated after 60 seconds. 57 | HealthCheckProgram=/etc/slurm/scripts/slurm.healthcheck 58 | 59 | # Run health check every 15 minutes (900 seconds) 60 | HealthCheckInterval=900 61 | 62 | 63 | 64 | ################################################################################# 65 | AuthType=auth/munge 66 | CryptoType=crypto/munge 67 | #JobCredentialPrivateKey= 68 | #JobCredentialPublicCertificate= 69 | 70 | CacheGroups=0 71 | #GroupUpdateForce=0 72 | #GroupUpdateTime=600 73 | 74 | #DisableRootJobs=NO 75 | 76 | # Rather than confuse users by accepting impossibly-sized jobs, such requests 77 | # will be rejected when the user submits the job. 
78 | EnforcePartLimits=YES 79 | 80 | #FirstJobId=1 81 | 82 | # This is the most active jobs SLURM will support at a time 83 | MaxJobCount=25000 84 | #MaxJobId=999999 85 | 86 | GresTypes=gpu,mic 87 | 88 | #CheckpointType=checkpoint/none 89 | #JobCheckpointDir=/var/slurm/checkpoint 90 | 91 | #JobFileAppend=0 92 | #JobRequeue=1 93 | #JobSubmitPlugins=1 94 | #KillOnBadExit=0 95 | #LaunchType=launch/slurm 96 | 97 | # If licenses need to be tracked on this cluster, list them here: 98 | #Licenses=foo*4,bar 99 | 100 | MailProg=/usr/bin/mailq 101 | 102 | #MaxStepCount=40000 103 | #MaxTasksPerNode=128 104 | 105 | MpiDefault=none 106 | #MpiParams=ports=12000-12999 107 | 108 | #PluginDir=/usr/local/lib/slurm:/etc/slurm/plugins 109 | PlugStackConfig=/etc/slurm/plugstack.conf 110 | 111 | # Prevent users from seeing: 112 | # * reservations they cannot use 113 | # * other user's usage information 114 | PrivateData=reservations,usage 115 | 116 | ProctrackType=proctrack/cgroup 117 | #PropagatePrioProcess=0 118 | #PropagateResourceLimits= 119 | PropagateResourceLimitsExcept=MEMLOCK 120 | 121 | RebootProgram="/sbin/shutdown --reboot +1 SLURM is rebooting this node" 122 | 123 | # Set this value to 2 if you want downed nodes to be returned to service, 124 | # regardless of why they were set DOWN (e.g., unexpected reboots) 125 | ReturnToService=1 126 | 127 | #SallocDefaultCommand= 128 | 129 | SwitchType=switch/none 130 | 131 | TaskPlugin=task/cgroup 132 | #TaskPluginParam= 133 | 134 | #TopologyPlugin=topology/tree 135 | 136 | # Specify which directory on compute nodes is considered temporary storage 137 | #TmpFS=/tmp 138 | 139 | #TrackWCKey=no 140 | #TreeWidth= 141 | #UnkillableStepProgram= 142 | #UsePAM=0 143 | 144 | 145 | 146 | ################################################################################# 147 | # TIMERS 148 | 149 | #BatchStartTimeout=10 150 | #CompleteWait=0 151 | #EpilogMsgTime=2000 152 | #GetEnvTimeout=2 153 | InactiveLimit=0 154 | KillWait=30 155 | MessageTimeout=30 156 | MinJobAge=300 157 | #ResvOverRun=0 158 | SlurmctldTimeout=120 159 | SlurmdTimeout=300 160 | #UnkillableStepTimeout=60 161 | #VSizeFactor=0 162 | Waittime=0 163 | 164 | # Allow jobs to run 5 minutes longer than the time that was allocated to them 165 | OverTimeLimit=5 166 | 167 | 168 | 169 | ################################################################################# 170 | # SCHEDULING 171 | 172 | # It's important to set a default in case users don't list their memory needs. 
173 | # On modern HPC clusters it's unusual to have less than 1GB per core 174 | DefMemPerCPU=1024 175 | 176 | FastSchedule=0 177 | #MaxMemPerCPU=0 178 | #SchedulerRootFilter=1 179 | #SchedulerTimeSlice=30 180 | SchedulerType=sched/backfill 181 | SchedulerPort=7321 182 | 183 | # Use 'select/linear' if you want to allocate whole nodes to each job 184 | SelectType=select/cons_res 185 | 186 | SelectTypeParameters=CR_Core_Memory 187 | 188 | # By default, we set the following scheduling options: 189 | # 190 | # * bf_interval sets how often SLURM works on backfilling jobs 191 | # * bf_continue is enabled to improve the ability of SLURM to backfill jobs 192 | # * bf_resolution is increased from 1 to 10 minutes, which will increase system 193 | # utilization (but may cause some jobs to start a few minutes late) 194 | # * bf_window is increased from 1 to 2 days, which causes SLURM to look further 195 | # into the future to determine when and where jobs can start 196 | # * bf_max_job_test sets the maximum number of jobs to try backfilling 197 | # * bf_max_job_part limits the number of jobs to backfill from one partition 198 | # * bf_max_job_user limits the number of backfilled jobs for any given user 199 | # * bf_max_job_start limits the number of backfill jobs to start at a time 200 | # 201 | SchedulerParameters=bf_interval=60,bf_continue,bf_resolution=600,bf_window=2880,bf_max_job_test=5000,bf_max_job_part=1000,bf_max_job_user=10,bf_max_job_start=100 202 | 203 | # Allow higher-priority jobs to take resources from lower-priority jobs 204 | PreemptType=preempt/partition_prio 205 | 206 | # By default, a job which is preempted will simply be paused 207 | PreemptMode=SUSPEND,GANG 208 | 209 | 210 | 211 | ################################################################################# 212 | # JOB PRIORITY 213 | 214 | PriorityFlags=FAIR_TREE,SMALL_RELATIVE_TO_TIME 215 | PriorityType=priority/multifactor 216 | PriorityDecayHalfLife=14-0 217 | #PriorityCalcPeriod= 218 | PriorityFavorSmall=NO 219 | 220 | # All jobs more than a week old will be given the same age priority 221 | PriorityMaxAge=7-0 222 | #PriorityUsageResetPeriod= 223 | 224 | # These values set the relative importance of each priority factor 225 | PriorityWeightAge=1000 226 | PriorityWeightFairshare=20000000 227 | PriorityWeightJobSize=1000 228 | PriorityWeightPartition=100000000 229 | PriorityWeightQOS=1000000000 230 | 231 | 232 | 233 | ################################################################################# 234 | # LOGGING AND ACCOUNTING 235 | 236 | AccountingStorageHost={headName} 237 | AccountingStorageEnforce=limits 238 | AccountingStorageType=accounting_storage/slurmdbd 239 | AccountingStoragePass=/var/run/munge/munge.socket.2 240 | 241 | # If tracking GRES usage is desired, the names of the devices must be included: 242 | AccountingStorageTRES=gres/Tesla-K20m,gres/Tesla-K40m,gres/Tesla-K80,gres/Tesla-M40,gres/Tesla-P100-PCIE-16GB 243 | 244 | AccountingStoreJobComment=YES 245 | 246 | #DebugFlags= 247 | 248 | # We do not worry about this plugin because SLURMDBD offers the same capability 249 | JobCompType=jobcomp/none 250 | #JobCompHost= 251 | #JobCompLoc= 252 | #JobCompPass= 253 | #JobCompPort= 254 | #JobCompUser= 255 | 256 | JobAcctGatherType=jobacct_gather/linux 257 | JobAcctGatherFrequency=30 258 | 259 | SlurmctldLogFile=/var/log/slurm/slurmctld.log 260 | SlurmctldDebug=3 261 | 262 | SlurmdLogFile=/var/log/slurm/slurmd.log 263 | SlurmdDebug=3 264 | 265 | # By default, the scheduler does not write logs (can be enabled with 
scontrol) 266 | SlurmSchedLogFile=/var/log/slurm/slurmsched.log 267 | SlurmSchedLogLevel=0 268 | 269 | 270 | 271 | ################################################################################# 272 | # POWER SAVE SUPPORT FOR IDLE NODES 273 | SuspendProgram=/etc/slurm/scripts/slurmctld.power_nodes_off 274 | ResumeProgram=/etc/slurm/scripts/slurmctld.power_nodes_on 275 | 276 | # How long a node must be idle before it will be powered off (in seconds) 277 | SuspendTime=14400 # Four hours 278 | 279 | SuspendTimeout=30 # Number of seconds we expect the node shutdown to take 280 | ResumeTimeout=300 # Number of seconds we expect the node boot process to take 281 | ResumeRate=100 # Number of nodes we're willing to turn on at a time 282 | SuspendRate=100 # Number of nodes we're willing to power off at a time 283 | 284 | # Nodes and Partitions which should not be powered off 285 | #SuspendExcNodes= 286 | #SuspendExcParts= 287 | 288 | 289 | 290 | ################################################################################# 291 | # COMPUTE NODES 292 | 293 | NodeName=DEFAULT Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN 294 | 295 | # CPU-only compute nodes 296 | NodeName=node[1-3] 297 | 298 | # Compute Nodes with Accelerators/Coprocessors 299 | # 300 | # There's no need to set specific information about GPUs or Phis here - it will 301 | # be automatically detected by each node during startup 302 | # 303 | #NodeName=node[4-32] Gres=gpu 304 | #NodeName=node[33-64] Gres=mic 305 | 306 | # Identify any nodes which are temporarily down 307 | # 308 | #DownNodes=node56 State=DOWN Reason=fan 309 | 310 | 311 | 312 | ################################################################################# 313 | # PARTITION / QUEUE DEFINITIONS 314 | 315 | # Set the number of nodes here - it will apply to the partitions below 316 | PartitionName=DEFAULT Nodes=node[1-3] 317 | 318 | # SLURM will use this value when users do not specify their job's length 319 | PartitionName=DEFAULT DefaultTime=4:00:00 320 | 321 | # By default we will not want SLURM to suspend jobs 322 | PartitionName=DEFAULT PreemptMode=off 323 | 324 | # If there are multiple login nodes, each must be listed here 325 | #PartitionName=DEFAULT AllocNodes=login[1-2] 326 | 327 | 328 | # Send most jobs here 329 | # 330 | # Take note that because SelectType is set to 'select/cons_res', this partition 331 | # will schedule multiple jobs on each compute node. However, it will not force 332 | # jobs to share CPU cores - they'll each receive their own dedicated CPU cores. 333 | # 334 | PartitionName=normal Nodes=ALL Priority=10 Default=YES MaxTime=7-0:00:00 Shared=FORCE:1 335 | 336 | 337 | # Users debugging their runs should submit here 338 | PartitionName=debug Nodes=ALL Priority=12 DefaultTime=30:00 MaxTime=30:00 MaxNodes=4 339 | 340 | 341 | # Interactive/Realtime sessions are higher priority than queued batch jobs. 342 | PartitionName=interactive Priority=14 Nodes=node[1-3] DefaultTime=4:00:00 MaxTime=8:00:00 MaxNodes=4 MaxCPUsPerNode=2 MaxMemPerNode=4096 343 | 344 | 345 | # Administrators/Operators can test and debug the cluster here - regular users 346 | # will not be able to submit jobs to this partition 347 | PartitionName=admin Nodes=ALL Priority=14 DefaultTime=30:00 MaxTime=30:00 AllowGroups=hpc-admin 348 | 349 | 350 | # Pre-emptable Partitions 351 | # 352 | # SLURM allows users to submit jobs that are later paused, interrupted or 353 | # rescheduled. 
This helps give users immediate access to resources they might 354 | # not otherwise have access to (with the understanding that those resources may 355 | # only be available to them for a short period of time). Although other options 356 | # are available with SLURM, the following preemption QOS options are used below: 357 | # 358 | # * Requeue: Requeue the job (it will be killed and then started again later) 359 | # * Suspend: Suspend the lower priority job and automatically resume it when 360 | # the higher priority job terminates (re-uses some gang scheduling logic). 361 | # For this to work, memory use must be managed/monitored by SLURM. 362 | # 363 | # These partitions have very lax restrictions, but do not guarantee that the job 364 | # will have uninterrupted access to the resources. The GraceTime parameter 365 | # allows preempted jobs to clean themselves up before being cancelled (note that 366 | # the application must cleanly handle SIGCONT and SIGTERM for this to work). 367 | # 368 | PartitionName=DEFAULT Priority=2 MaxTime=14-0:00:00 Shared=FORCE:1 GraceTime=30 369 | 370 | # Jobs which are designed cleanly (which means handling SIGCONT and SIGTERM) 371 | # should be submitted to 'reschedulable'. When preempted, they will be 372 | # cancelled (freeing all resources) and rescheduled for later execution. We want 373 | # to incentivize users to build this type of job, as it is the cleanest method. 374 | PartitionName=reschedulable Nodes=ALL PreemptMode=REQUEUE 375 | 376 | # Jobs which cannot properly be cancelled and rescheduled should be submitted 377 | # to 'pausable'. When preempted, they will be paused/suspended (but will remain 378 | # in memory). Once the high-priority jobs are finished, these jobs will be 379 | # resumed. This is fairly clean, but may cause contention for memory. 380 | PartitionName=pausable Nodes=ALL PreemptMode=SUSPEND 381 | 382 | -------------------------------------------------------------------------------- /dependencies/etc/slurm/slurmdbd.conf: -------------------------------------------------------------------------------- 1 | ################################################################################# 2 | ######################### Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################# 4 | # 5 | # Configuration for SLURM Resource Manager's database daemon 'slurmdbd' 6 | # 7 | # 8 | # This file need only be present on the SLURM management server. 9 | # See the slurmdbd.conf man page for more information. 
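#
# As a hedged illustration (the account and user names below are hypothetical,
# and the exact workflow varies by site): once slurmdbd and its MySQL/MariaDB
# backend are running, the cluster and its accounts are normally registered
# with sacctmgr, for example:
#
#   sacctmgr add cluster {clusterName}
#   sacctmgr add account research Description="Research group"
#   sacctmgr add user jdoe Account=research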
10 | # 11 | ################################################################################# 12 | 13 | # Archive info 14 | ArchiveEvents=yes 15 | ArchiveJobs=yes 16 | ArchiveResvs=yes 17 | ArchiveSteps=no 18 | ArchiveSuspend=no 19 | ArchiveDir="/tmp" 20 | #ArchiveScript= 21 | 22 | 23 | # Purge individual records after the retention periods below 24 | # Aggregate data is always kept permanently 25 | PurgeEventAfter=1month 26 | PurgeJobAfter=12month 27 | PurgeResvAfter=1month 28 | PurgeStepAfter=1month 29 | PurgeSuspendAfter=1month 30 | 31 | 32 | # Authentication info 33 | AuthType=auth/munge 34 | #AuthInfo=/var/run/munge/munge.socket.2 35 | 36 | # slurmDBD info 37 | DbdAddr=localhost 38 | DbdHost=localhost 39 | #DbdPort=7031 40 | SlurmUser=slurm 41 | #MessageTimeout=300 42 | DebugLevel=4 43 | #DefaultQOS=normal,standby 44 | LogFile=/var/log/slurm/slurmdbd.log 45 | PidFile=/var/run/slurmdbd.pid 46 | #PluginDir=/usr/lib/slurm 47 | 48 | # Prevent users from seeing: 49 | # * reservations they cannot use 50 | # * other users' usage information 51 | PrivateData=reservations,usage 52 | 53 | #TrackWCKey=yes 54 | 55 | # Database info 56 | StorageType=accounting_storage/mysql 57 | StorageHost=localhost 58 | #StoragePort=1234 59 | StoragePass={ChangeMe} 60 | StorageUser=slurm 61 | StorageLoc=slurm_acct_db 62 | 63 | -------------------------------------------------------------------------------- /dependencies/etc/sysconfig/nhc: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015-2016 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS.
If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | ################################################################################ 26 | # 27 | # Node Health Check (NHC) configuration file 28 | # 29 | ####################################################################### 30 | 31 | 32 | ####################################################################### 33 | ### 34 | ### NHC Configuration Variables 35 | ### 36 | 37 | # If you are having trouble with NHC, uncomment the following lines 38 | # to get a full verbose log of the situation: 39 | # 40 | # VERBOSE=1 41 | # DEBUG=1 42 | 43 | # Uncomment to let nodes continue running jobs (even when problems are found) 44 | # MARK_OFFLINE=0 45 | 46 | # Uncomment to run ALL checks (instead of exiting upon the first failure) 47 | # NHC_CHECK_ALL=1 48 | 49 | # If necessary, additional directories may be added to PATH 50 | # PATH="/opt/example/bin:$PATH" 51 | 52 | # Set the resource manager/workload manager to SLURM 53 | PATH="/usr/slurm/16.05/bin:$PATH" 54 | NHC_RM=slurm 55 | 56 | -------------------------------------------------------------------------------- /dependencies/etc/sysconfig/nvidia: -------------------------------------------------------------------------------- 1 | ################################################################################ 2 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 3 | ################################################################################ 4 | # 5 | # Copyright (c) 2015 by Microway, Inc. 6 | # 7 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 8 | # 9 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 10 | # it under the terms of the GNU General Public License as published by 11 | # the Free Software Foundation, either version 3 of the License, or 12 | # (at your option) any later version. 13 | # 14 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 15 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 16 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 17 | # GNU General Public License for more details. 18 | # 19 | # You should have received a copy of the GNU General Public License 20 | # along with MCMS. If not, see 21 | # 22 | ################################################################################ 23 | 24 | 25 | # Define start-up configuration settings for NVIDIA GPUs. 26 | # These settings are enabled by the 'nvidia' init.d service. 27 | 28 | 29 | # Supported products: 30 | # - Full Support 31 | # - NVIDIA Tesla Line: Fermi, Kepler and later generations 32 | # - NVIDIA Quadro Line: Fermi, Kepler and later generations 33 | # - NVIDIA GRID Line: All generations 34 | # - NVIDIA GeForce Line: None 35 | # 36 | # - Limited Support 37 | # - NVIDIA Tesla and Quadro Line: Generations before Fermi-architecture 38 | # - NVIDIA GeForce Line 39 | 40 | 41 | # Enables or disables GPU Accounting. With GPU Accounting one can keep track of 42 | # usage of resources throughout the lifespan of each process running on the GPU. 43 | # 44 | # Accounting data can be queried by any user. Execute: 45 | # nvidia-smi -q -d ACCOUNTING 46 | # 47 | # Set accounting: 48 | # 0/DISABLED, 1/ENABLED 49 | # 50 | NVIDIA_ACCOUNTING=1 51 | 52 | # A flag that indicates whether persistence mode is enabled for the GPU. Value 53 | # is either "Enabled" or "Disabled". 
When persistence mode is enabled the NVIDIA 54 | # driver remains loaded even when no active clients, such as X11 or nvidia-smi, 55 | # exist. This minimizes the driver load latency associated with running 56 | # dependent apps, such as CUDA programs. There is a modest power usage penalty. 57 | # 58 | # Set persistence mode: 59 | # 0/DISABLED, 1/ENABLED 60 | # 61 | NVIDIA_PERSISTENCE_MODE=1 62 | 63 | 64 | # The compute mode flag indicates whether individual or multiple compute 65 | # applications may run on the GPU. 66 | # 67 | # "Default" means multiple contexts are allowed per device. 68 | # 69 | # "Exclusive Thread" means only one context is allowed per device, usable from 70 | # one thread at a time. 71 | # 72 | # "Exclusive Process" means only one context is allowed per device, usable from 73 | # multiple threads at a time. 74 | # 75 | # "Prohibited" means no contexts are allowed per device (no compute apps). 76 | # 77 | # "EXCLUSIVE_PROCESS" was added in CUDA 4.0. Prior CUDA releases 78 | # supported only one exclusive mode, which is equivalent to "EXCLUSIVE_THREAD" 79 | # in CUDA 4.0 and beyond. 80 | # 81 | # Set MODE for compute applications: 82 | # 0/DEFAULT, 1/EXCLUSIVE_THREAD, 83 | # 2/PROHIBITED, 3/EXCLUSIVE_PROCESS 84 | # 85 | NVIDIA_COMPUTE_MODE=0 86 | 87 | 88 | # Specifies maximum clocks as a pair (e.g. 2000,800) 89 | # that defines GPU’s speed while running applications on a GPU. Values need to 90 | # be one of the available options as reported by: 91 | # 92 | # nvidia-smi -q -d SUPPORTED_CLOCKS 93 | # 94 | # If set to value "max" the maximum speed of each GPU will be queried and set. 95 | # If not set, default clock speeds are used. 96 | # 97 | # For example, to set memory to 3004 MHz and graphics to 875 MHz: 98 | # NVIDIA_CLOCK_SPEEDS=3004,875 99 | # 100 | NVIDIA_CLOCK_SPEEDS=max 101 | 102 | 103 | # Specifies maximum power limit (in watts). Accepts integer and floating point 104 | # numbers. Value needs to be between Min and Max Power Limit as reported by: 105 | # 106 | # nvidia-smi --query-gpu=power.min_limit,power.max_limit --format=csv 107 | # 108 | # If not set, GPUs will run at their normal TDP (the default) 109 | # 110 | # For example, to limit the consumption of each GPU to 200 Watts (or less): 111 | # NVIDIA_POWER_LIMIT=200 112 | # 113 | -------------------------------------------------------------------------------- /dependencies/opt/ohpc/pub/modulefiles/cuda.lua: -------------------------------------------------------------------------------- 1 | help( 2 | [[ 3 | 4 | This module provides the environment for NVIDIA CUDA. 5 | CUDA tools and libraries must be in your path in order 6 | to take advantage of NVIDIA GPU compute capabilities. 7 | 8 | {version} 9 | ]]) 10 | 11 | 12 | whatis("Name: CUDA") 13 | whatis("Version: {version}") 14 | whatis("Category: library, runtime support") 15 | whatis("Description: NVIDIA CUDA libraries and tools for GPU acceleration") 16 | whatis("URL: https://developer.nvidia.com/cuda-downloads") 17 | 18 | 19 | family("cuda") 20 | 21 | 22 | local version = "{version}" 23 | local base = "/usr/local/cuda-{version}" 24 | 25 | 26 | setenv("CUDA_HOME", base) 27 | setenv("CUDA_VERSION", "{version}") 28 | 29 | prepend_path("PATH", pathJoin(base, "bin")) 30 | prepend_path("INCLUDE", pathJoin(base, "include")) 31 | prepend_path("LD_LIBRARY_PATH", pathJoin(base, "lib64")) 32 | 33 | -- Having the CUDA SDK samples available can be useful. 
34 | prepend_path("PATH", pathJoin(base, 'samples-bin')) 35 | 36 | -- Push the 64-bit NVIDIA libraries into the front of the LD path. 37 | -- Necessary to fix applications which stupidly look in /usr/lib/ first. 38 | prepend_path("LD_LIBRARY_PATH", "/usr/lib64/nvidia") 39 | 40 | 41 | -- 42 | -- No man files included with CUDA 43 | -- 44 | 45 | -------------------------------------------------------------------------------- /dependencies/usr/lib/systemd/system/nvidia-gpu.service: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=NVIDIA GPU Initialization 3 | After=remote-fs.target 4 | 5 | [Service] 6 | Type=oneshot 7 | RemainAfterExit=yes 8 | ExecStart=/etc/init.d/nvidia start 9 | ExecStop=/etc/init.d/nvidia stop 10 | 11 | [Install] 12 | WantedBy=multi-user.target 13 | 14 | -------------------------------------------------------------------------------- /dependencies/var/spool/slurmd/validate-ssh-command: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015-2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | ################################################################################ 27 | # 28 | # Ensures public-key logins for the slurm user can only run allowed commands: 29 | # * long-running health check script 30 | # * power off idle nodes 31 | # * check availability of node (by echoing a newline) 32 | # 33 | ################################################################################ 34 | 35 | 36 | rejection_message="The provided SSH key does not have permission to execute these commands." 
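# A hedged illustration (key type, key material and trailing comment are
# placeholders): this validator is typically enforced through a forced-command
# entry in the slurm user's ~/.ssh/authorized_keys on each node, e.g.:
#
#   command="/var/spool/slurmd/validate-ssh-command",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA...placeholder... slurm@{headName}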
37 | 38 | case "$SSH_ORIGINAL_COMMAND" in 39 | *\&*) 40 | echo $rejection_message 41 | ;; 42 | *\(*) 43 | echo $rejection_message 44 | ;; 45 | *\{*) 46 | echo $rejection_message 47 | ;; 48 | *\;*) 49 | echo $rejection_message 50 | ;; 51 | *\<*) 52 | echo $rejection_message 53 | ;; 54 | *\`*) 55 | echo $rejection_message 56 | ;; 57 | *\|*) 58 | echo $rejection_message 59 | ;; 60 | /etc/slurm/scripts/slurm.healthcheck_long) 61 | /etc/slurm/scripts/slurm.healthcheck_long 62 | ;; 63 | /sbin/poweroff) 64 | /sbin/poweroff 65 | ;; 66 | echo) 67 | echo 68 | ;; 69 | *) 70 | echo $rejection_message 71 | ;; 72 | esac 73 | -------------------------------------------------------------------------------- /install_login_server.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | 27 | ################################################################################ 28 | ## 29 | ## This script sets up a Login server for an OpenHPC cluster. A Login server 30 | ## sits on your network and accepts user logins (logging them into the cluster). 31 | ## 32 | ## 33 | ## This script should be run on the cluster's Head Node/SMS Server. It will 34 | ## create an OpenHPC/Warewulf image that can be deployed to the login server(s). 35 | ## 36 | ## The Login servers should be connected to two networks: 37 | ## * the internal cluster network (to communicate with the Head/Compute Nodes) 38 | ## * the campus/institute's network (for user access) 39 | ## 40 | ## 41 | ## Please note that certain design/configuration choices are made by this script 42 | ## which may not be compatible with all sites. Efforts are made to maintain 43 | ## portability, but compatibility cannot be guaranteed. 
44 | ## 45 | ################################################################################ 46 | 47 | 48 | # Set the default names of the VNFS images 49 | export node_chroot_name=centos-7 50 | export login_chroot_name=login 51 | 52 | 53 | 54 | ################################################################################ 55 | # Create the new VNFS 56 | ################################################################################ 57 | ohpc_vnfs_clone ${node_chroot_name} ${login_chroot_name} 58 | export login_chroot=/opt/ohpc/admin/images/${login_chroot_name} 59 | 60 | # Hybridize some paths which commonly bloat the images 61 | echo " 62 | 63 | # We will be mounting this from the Head Node via NFS 64 | exclude += /opt/ohpc 65 | 66 | # These paths will be made available to nodes via NFS 67 | hybridize += /usr/local 68 | hybridize += /usr/lib/golang 69 | hybridize += /usr/lib/jvm 70 | hybridize += /usr/lib64/nvidia 71 | hybridize += /usr/lib64/firefox 72 | 73 | " >> /etc/warewulf/vnfs/${login_chroot_name}.conf 74 | 75 | 76 | 77 | ################################################################################ 78 | # Disable the services that are only needed on Compute Nodes 79 | ################################################################################ 80 | chroot ${login_chroot} systemctl disable slurmd.service 81 | 82 | 83 | 84 | ################################################################################ 85 | # Ensure all users can login to this system 86 | ################################################################################ 87 | sed -i 's/- : ALL EXCEPT root hpc-admin : ALL//' ${login_chroot}/etc/security/access.conf 88 | sed -i 's/# Reject users who do not have jobs running on this node//' ${login_chroot}/etc/pam.d/sshd 89 | sed -i 's/account required pam_slurm.so//' ${login_chroot}/etc/pam.d/sshd 90 | 91 | 92 | 93 | ################################################################################ 94 | # Configure the second network interface 95 | ################################################################################ 96 | mkdir -p /etc/warewulf/files/login_servers/ 97 | echo " 98 | DEVICE=eth0 99 | BOOTPROTO=static 100 | ONBOOT=yes 101 | ZONE=trusted 102 | IPADDR=%{NETDEVS::ETH0::IPADDR} 103 | NETMASK=%{NETDEVS::ETH0::NETMASK} 104 | GATEWAY=%{NETDEVS::ETH0::GATEWAY} 105 | HWADDR=%{NETDEVS::ETH0::HWADDR} 106 | MTU=%{NETDEVS::ETH0::MTU} 107 | " > /etc/warewulf/files/login_servers/ifcfg-eth0.ww 108 | wwsh file import /etc/warewulf/files/login_servers/ifcfg-eth0.ww \ 109 | --name=loginServers_ifcfg-eth0 \ 110 | --path=/etc/sysconfig/network-scripts/ifcfg-eth0 111 | echo " 112 | DEVICE=eth1 113 | BOOTPROTO=static 114 | ONBOOT=yes 115 | ZONE=public 116 | IPADDR=%{NETDEVS::ETH1::IPADDR} 117 | NETMASK=%{NETDEVS::ETH1::NETMASK} 118 | GATEWAY=%{NETDEVS::ETH1::GATEWAY} 119 | HWADDR=%{NETDEVS::ETH1::HWADDR} 120 | MTU=%{NETDEVS::ETH1::MTU} 121 | " > /etc/warewulf/files/login_servers/ifcfg-eth1.ww 122 | wwsh file import /etc/warewulf/files/login_servers/ifcfg-eth1.ww \ 123 | --name=loginServers_ifcfg-eth1 \ 124 | --path=/etc/sysconfig/network-scripts/ifcfg-eth1 125 | 126 | 127 | 128 | ################################################################################ 129 | # Configure the firewall 130 | ################################################################################ 131 | # Ensure the firewall is active (it's not usually enabled on compute nodes) 132 | yum -y --installroot=${login_chroot} install firewalld 133 | chroot ${login_chroot} systemctl enable 
firewalld.service 134 | 135 | # By default, only SSH is allowed in on the public-facing network interface. 136 | # We can allow more services here, if desired: 137 | # 138 | # chroot ${login_chroot} firewall-offline-cmd --zone=public --add-service=http 139 | # chroot ${login_chroot} firewall-offline-cmd --zone=public --add-port=4000/tcp 140 | # 141 | 142 | 143 | 144 | ################################################################################ 145 | # If remote graphical access is desired, NoMachine works well. It is a licensed 146 | # product, but it is stable and performs better than typical open-source remote 147 | # desktop tools. Microway can help you select a version with the capabilities you require: 148 | # https://www.microway.com/technologies/software/responsive-enterprise-class-remote-desktops-nomachine/ 149 | ################################################################################ 150 | 151 | # Install common desktop environments 152 | yum -y --installroot=${login_chroot} groups install "GNOME Desktop" 153 | yum -y --installroot=${login_chroot} groups install "KDE Plasma Workspaces" 154 | 155 | # Disable SELinux 156 | setenforce 0 157 | sed -i 's/SELINUX=enforcing/SELINUX=disabled/' ${login_chroot}/etc/selinux/config 158 | 159 | # ############################################################################## 160 | # # Synchronize the built-in users/groups between the Head and the Login Nodes 161 | # # 162 | # # If not done now, the users created by the following packages will have 163 | # # different UIDs and GIDs on the Login Nodes than on the Head Node. 164 | # ############################################################################## 165 | # 166 | # groupadd nx 167 | # groupadd nxhtd 168 | # 169 | # useradd --home-dir '/var/NX/nx' \ 170 | # --password '*' \ 171 | # --gid nx \ 172 | # --shell /bin/false \ 173 | # --system \ 174 | # nx 175 | # 176 | # useradd --home-dir '/var/NX/nxhtd' \ 177 | # --password '*' \ 178 | # --gid nxhtd \ 179 | # --shell /bin/false \ 180 | # --system \ 181 | # nxhtd 182 | # 183 | # cp -af /etc/passwd ${login_chroot}/etc/ 184 | # cp -af /etc/group ${login_chroot}/etc/ 185 | # wwsh file sync 186 | # 187 | # 188 | # An admin will need to manually install the selected NoMachine services 189 | # 190 | # 191 | # chroot ${login_chroot} systemctl enable nxserver.service 192 | 193 | 194 | 195 | ################################################################################ 196 | # Re-assemble Login server VNFS with all the changes 197 | ################################################################################ 198 | # Clear out the stray shell history created while chrooting into the VNFS image 199 | > ${login_chroot}/root/.bash_history 200 | 201 | # Rebuild the VNFS 202 | wwvnfs -y --chroot ${login_chroot} 203 | 204 | 205 | 206 | echo " 207 | 208 | The Login server software image is now ready for use.
To deploy to a server, you 209 | should run something similar to the commands below: 210 | 211 | wwsh node clone node1 login1 212 | 213 | wwsh provision set login1 --fileadd=loginServers_ifcfg-eth0 --vnfs=login 214 | wwsh provision set login1 --fileadd=loginServers_ifcfg-eth1 --vnfs=login 215 | 216 | wwsh node set login1 --netdev=eth0 --ipaddr=10.0.254.253 --netmask=255.255.0.0 --hwaddr=00:aa:bb:cc:dd:ee --mtu=9000 --fqdn=login1.hpc.example.com 217 | wwsh node set login1 --netdev=eth1 --ipaddr= --netmask= --gateway= --mtu=9000 --hwaddr=00:aa:bb:cc:dd:ef 218 | wwsh node set login1 --netdev=ib0 --ipaddr=10.10.254.253 --netmask=255.255.0.0 219 | wwsh node set login1 --domain=hpc.example.com 220 | wwsh ipmi set login1 --ipaddr=10.13.254.253 221 | 222 | " 223 | -------------------------------------------------------------------------------- /install_monitoring_server.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | ################################################################################ 3 | ######################## Microway Cluster Management Software (MCMS) for OpenHPC 4 | ################################################################################ 5 | # 6 | # Copyright (c) 2015-2016 by Microway, Inc. 7 | # 8 | # This file is part of Microway Cluster Management Software (MCMS) for OpenHPC. 9 | # 10 | # MCMS for OpenHPC is free software: you can redistribute it and/or modify 11 | # it under the terms of the GNU General Public License as published by 12 | # the Free Software Foundation, either version 3 of the License, or 13 | # (at your option) any later version. 14 | # 15 | # MCMS for OpenHPC is distributed in the hope that it will be useful, 16 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 17 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 18 | # GNU General Public License for more details. 19 | # 20 | # You should have received a copy of the GNU General Public License 21 | # along with MCMS. If not, see 22 | # 23 | ################################################################################ 24 | 25 | 26 | 27 | ################################################################################ 28 | ## 29 | ## This script sets up a monitoring server for an OpenHPC cluster 30 | ## 31 | ## 32 | ## This script should be run on a monitoring server on the same network as the 33 | ## cluster's Head/Master Node - also referred to as the System Management Server 34 | ## (SMS). This script presumes that a Red Hat derivative (CentOS, SL, etc) has 35 | ## just been installed (with vanilla configuration). 36 | ## 37 | ## 38 | ## Please note that certain design/configuration choices are made by this script 39 | ## which may not be compatible with all sites. Efforts are made to maintain 40 | ## portability, but compatibility cannot be guaranteed. 41 | ## 42 | ################################################################################ 43 | 44 | 45 | 46 | ################################################################################ 47 | # Determine where this script is running from (so we can locate patches, etc.) 48 | ################################################################################ 49 | install_script_dir="$( dirname "$( readlink -f "$0" )" )" 50 | 51 | dependencies_dir=${install_script_dir}/dependencies 52 | config_file=${install_script_dir}/configuration_settings.txt 53 | 54 | 55 | # Ensure the settings have been completed 56 | if [[ ! 
-r ${config_file} ]]; then 57 | echo " 58 | 59 | This script requires you to provide configuration settings. Please ensure 60 | that the file ${config_file} exists and has been fully completed. 61 | " 62 | exit 1 63 | else 64 | source ${config_file} 65 | fi 66 | 67 | if [[ ! -z "$(egrep "^[^#].*ChangeMe" ${config_file})" ]]; then 68 | echo " 69 | 70 | For security, you *must* change the passwords in the configuration file. 71 | Please double-check your settings in ${config_file} 72 | " 73 | exit 1 74 | fi 75 | 76 | 77 | 78 | ################################################################################ 79 | # Currently, only RHEL/SL/CentOS 7 is supported for the bootstrap 80 | ################################################################################ 81 | distribution=$(egrep "CentOS Linux 7|Scientific Linux 7|Red Hat Enterprise Linux Server release 7" /etc/*-release) 82 | centos_check=$? 83 | 84 | if [[ ${centos_check} -ne 0 ]]; then 85 | echo " 86 | 87 | Currently, only RHEL, Scientific and CentOS Linux 7 are supported 88 | " 89 | exit 1 90 | else 91 | echo "RHEL/SL/CentOS 7 was detected. Continuing..." 92 | fi 93 | 94 | 95 | 96 | ################################################################################ 97 | # Update system packages and EPEL package repo 98 | ################################################################################ 99 | yum -y update 100 | yum -y install epel-release 101 | 102 | 103 | 104 | ################################################################################ 105 | # If enabled, disable auto-update on this server. 106 | ################################################################################ 107 | if [[ -r /etc/sysconfig/yum-autoupdate ]]; then 108 | sed -i 's/ENABLED="true"/ENABLED="false"/' /etc/sysconfig/yum-autoupdate 109 | fi 110 | 111 | 112 | 113 | ################################################################################ 114 | # Disable SELinux 115 | ################################################################################ 116 | setenforce 0 117 | sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config 118 | 119 | 120 | 121 | ################################################################################ 122 | # Enable NTP (particularly important for things like SLURM and Ceph) 123 | ################################################################################ 124 | yum -y install ntp ntpdate ntp-doc 125 | 126 | if [[ ! 
-z "${ntp_server}" ]]; then 127 | sed -i 's/^server /#server /' /etc/ntp.conf 128 | echo -e " 129 | 130 | server ${ntp_server} 131 | 132 | " >> /etc/ntp.conf 133 | 134 | ntpdate ${ntp_server} 135 | else 136 | ntpdate 0.rhel.pool.ntp.org \ 137 | 1.rhel.pool.ntp.org \ 138 | 2.rhel.pool.ntp.org \ 139 | 3.rhel.pool.ntp.org 140 | fi 141 | 142 | # Because some clusters are not connected to the Internet, we need to enable 143 | # orphan mode as described here: 144 | # 145 | # https://www.eecis.udel.edu/~mills/ntp/html/miscopt.html#tos 146 | # 147 | echo " 148 | 149 | tos orphan 5 150 | 151 | " >> /etc/ntp.conf 152 | hwclock --systohc --utc 153 | systemctl enable ntpd.service 154 | systemctl start ntpd.service 155 | 156 | 157 | 158 | ################################################################################ 159 | # Disable X-Windows since this is typically a headless server 160 | ################################################################################ 161 | systemctl set-default multi-user.target 162 | 163 | 164 | 165 | ################################################################################ 166 | # Install SaltStack, which provides distribution-agnostic configuration mgmt 167 | ################################################################################ 168 | yum -y install salt-minion 169 | systemctl enable salt-minion 170 | systemctl start salt-minion 171 | 172 | 173 | 174 | ################################################################################ 175 | # Create a group for HPC administrators 176 | ################################################################################ 177 | groupadd hpc-admin 178 | 179 | 180 | 181 | ################################################################################ 182 | # Install the monitoring tools (Shinken/Thruk/Check_MK) 183 | ################################################################################ 184 | useradd --system shinken --home-dir /tmp --no-create-home 185 | 186 | yum -y install python-pip 187 | pip install shinken 188 | 189 | systemctl enable shinken.service 190 | systemctl start shinken.service 191 | 192 | --------------------------------------------------------------------------------