├── COPYING ├── ChangeLog ├── DISCLAIMER ├── META ├── NEWS ├── README ├── skewstats ├── skewstats.1 ├── slurm-joblog.conf.example ├── slurm-joblog.pl ├── sqlog ├── sqlog-db-util ├── sqlog-db-util.8 ├── sqlog.1 ├── sqlog.conf.example └── sqlog.spec /COPYING: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc. 5 | 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Library General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 
30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. 
The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 
97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 
128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. (This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. 
However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. 
Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. 
Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. 
For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 
279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | 294 | Copyright (C) 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License 307 | along with this program; if not, write to the Free Software 308 | Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 309 | 310 | 311 | Also add information on how to contact you by electronic and paper mail. 312 | 313 | If the program is interactive, make it output a short notice like this 314 | when it starts in an interactive mode: 315 | 316 | Gnomovision version 69, Copyright (C) year name of author 317 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 318 | This is free software, and you are welcome to redistribute it 319 | under certain conditions; type `show c' for details. 
320 | 321 | The hypothetical commands `show w' and `show c' should show the appropriate 322 | parts of the General Public License. Of course, the commands you use may 323 | be called something other than `show w' and `show c'; they could even be 324 | mouse-clicks or menu items--whatever suits your program. 325 | 326 | You should also get your employer (if you work as a programmer) or your 327 | school, if any, to sign a "copyright disclaimer" for the program, if 328 | necessary. Here is a sample; alter the names: 329 | 330 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 331 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 332 | 333 | , 1 April 1989 334 | Ty Coon, President of Vice 335 | 336 | This General Public License does not permit incorporating your program into 337 | proprietary programs. If your program is a subroutine library, you may 338 | consider it more useful to permit linking proprietary applications with the 339 | library. If this is what you want to do, use the GNU Library General 340 | Public License instead of this License. 341 | -------------------------------------------------------------------------------- /ChangeLog: -------------------------------------------------------------------------------- 1 | 2011-07-11 Mark Grondona 2 | 3 | * : tag v0.19. 4 | 5 | * slurm-joblog.pl : 6 | Use NODECNT environment variable if set instead of counting 7 | node list. Resolves Issue 2. 8 | 9 | * slurm-joblog.pl, sqlog-db-util : 10 | Improve debug and verbose output for better debugging. 11 | 12 | * sqlog-db-util : 13 | Fix CHAOS bz#1204: error setting up DB. 14 | 15 | 2009-10-23 Mark Grondona 16 | 17 | * : tag v0.18. 18 | 19 | * sqlog : 20 | Add missing table name for nodes table in &lookup_ids(). 21 | 22 | 2009-10-23 Mark Grondona 23 | 24 | * : tag v0.17. 25 | 26 | * skewstats, skewstats.1, sqlog.spec : 27 | Add skewstats utility to sqlog package. 
28 | 29 | 2009-10-16 Mark Grondona 30 | 31 | * sqlog.1 : Update documentation of --regex. 32 | 33 | * sqlog : Replace global conf{regex} boolean with conf{regex}{key} 34 | to allow per-key setting of regex matching vs exact matching. 35 | The --regex flag now operates similarly to --exclude. 36 | 37 | 2009-10-12 Mark Grondona 38 | 39 | * : tag v0.16. 40 | 41 | * sqlog : Fix --regex queries against v2 database format. 42 | 43 | * sqlog : Don't treat filter arguments for running jobs 44 | as regular expressions unless --regex was used. 45 | 46 | 2009-10-10 Mark Grondona 47 | 48 | * sqlog : Refactor code for running job selection, and fix 49 | -J, --job-names processing on running jobs. 50 | 51 | * sqlog-db-util : Fix -L, --localhost processing. 52 | 53 | 2009-05-07 Mark Grondona 54 | 55 | * : tag v0.15. 56 | 57 | * sqlog : Fix -j, --jobids option on systems with v2 58 | database. 59 | 60 | 2009-04-09 Mark Grondona 61 | 62 | * : tag v0.14. 63 | 64 | * sqlog : 65 | Remove ncores from default short output format. 66 | 67 | * sqlog (query_running_jobs, reformat_squeue_datetime) : 68 | Drop the T character from new-style squeue output. SLURM 69 | seems to have introduced this in slurm-1.4, and Date::Manip 70 | doesn't handle it. 71 | 72 | * sqlog (get_seconds_date_manip_is_buggy) : 73 | If date is literally NONE, then just return that value. 74 | Otherwise, try to handle dates that cannot be parsed by 75 | Date::Manip by logging an error and setting them to NONE. 76 | 77 | 2009-04-02 Mark Grondona 78 | 79 | * slurm-joblog.pl : 80 | Try to log to text file even if there is a failure reading 81 | one of the config files. 82 | 83 | 2009-04-02 Mark Grondona 84 | 85 | * : tag v0.13. 86 | 87 | 2009-04-01 Mark Grondona 88 | 89 | * sqlog : 90 | Change timelocal debug message from log_verbose to log_debug. 91 | 92 | * sqlog-db-util, sqlog-db-util.8, README : 93 | Rename --cores option to --cores-per-node. 
94 | 95 | * sqlog-db-util : 96 | Reformat --help output and comments to fit in 80 characters. 97 | 98 | 2009-03-13 Adam Moody 99 | 100 | * sqlog : Added logic to track nodes and query records by node name 101 | via SQL commands. 102 | 103 | * sqlog-db-util : Added --backup, --prune, --cores, --obfuscate, 104 | --notrack, and --delay-index options. Changed --drop to --drop=V. 105 | Created usernames, jobnames, jobstates, and partitions tables 106 | to reduce record length in new jobs table. Added indices on 107 | jobnames, starttime, endtime, runtime, nodecount, corecount to 108 | speed up such queries. Added nodes table to assign each node name 109 | to an id, and created a jobs_nodes table that maps node ids to 110 | job ids. 111 | 112 | 2008-03-03 Adam Moody 113 | 114 | * sqlog : Added --ncores, --maxcores, and --mincores options to display 115 | and sort jobs based on number of processors used (PROCS field from 116 | SLURM job log). 117 | 118 | * sqlog-db-util : Added -x, --convert options to convert existing 119 | slurm_job_log tables to version 2 (adds procs column and extends 120 | nodelist column to a blob). Avoids reconverting an up-to-date table. 121 | 122 | * slurm-joblog.pl : Added code to insert procs field. 123 | 124 | 2008-12-03 Mark Grondona 125 | 126 | * : tag v0.12. 127 | 128 | * sqlog : Add --regex option to use REGEXP instead of IN 129 | for queries of jobid, username, jobstate, partition, or 130 | job name. 131 | 132 | 2008-12-02 Mark Grondona 133 | 134 | * sqlog : It appears that Date::Manip is buggy and doesn't 135 | support DST properly. sqlog used Date::Manip to convert 136 | between stored database dates and Unix time (seconds since 137 | epoch). Instead convert time using other methods in this 138 | instance to avoid misconverting during DST conversions. 139 | 140 | 2008-10-03 Py Watson 141 | 142 | * sqlog.spec : 143 | Change the various perl Requires to be based on the module 144 | name rather than the RPM name. 
Fix for RPM Requires on SLES 9. 145 | 146 | 2008-06-24 Mark Grondona 147 | 148 | * : tag v0.11. 149 | 150 | * sqlog-db-util : 151 | Drop all slurm* users from mysql DB before adding new 152 | user privileges to avoid leaving old settings around. 153 | 154 | * sqlog-db-util : 155 | New slurm-joblog.conf parameter SQLNETWORK to specify the network 156 | on which read-only access to the DB is allowed. Default: 192.168.%.%. 157 | 158 | * slurm-joblog.conf.example : 159 | Describe SQLNETWORK parameter. 160 | 161 | 2008-04-18 Mark Grondona 162 | 163 | * : tag v0.10. 164 | 165 | * sqlog.1 : Add OUTPUT FORMAT section. 166 | 167 | * sqlog : Separate regexes in duration_to_seconds for 168 | better parsing of the two supported forms. 169 | 170 | 2008-02-29 Mark Grondona 171 | 172 | * : tag v0.9. 173 | 174 | * sqlog.1 : Add JOB STATE CODES section to manpage explaining 175 | the various job states and their abbreviations. 176 | 177 | * sqlog : Fix -o, --format when just a new format specification 178 | is provided (e.g. "long" or "freeform"). 179 | 180 | 2007-09-27 Mark Grondona 181 | 182 | * : tag v0.8. 183 | 184 | 2007-09-14 Pythagoras Watson 185 | 186 | * sqlog.spec : Add more packages to Requires. 187 | 188 | 2007-09-12 Mark Grondona 189 | 190 | * slurm-joblog.pl : Fix test for whether a job logfile is 191 | configured. 192 | 193 | 2007-08-13 Mark Grondona 194 | 195 | * : tag v0.7. 196 | 197 | 2007-08-13 Pythagoras Watson 198 | 199 | * sqlog, sqlog-db-util, slurm-joblog.pl, sqlog.spec : 200 | Fixes required for AIX and other installations without 201 | prefix = /usr: Allow modification of perl include paths 202 | and PATH at RPM build time, use __perl RPM macro instead 203 | of hardcoding /usr/bin/perl in specfile, correct perms 204 | of sqlog-db-util.8 man page, and improve subst() function in 205 | specfile. 206 | 207 | 2007-08-10 Mark Grondona 208 | 209 | * sqlog : Add runtime_s output field (runtime in seconds). 
210 | 211 | * sqlog : Add unixstart and unixend format keys for start and 212 | end times in seconds since the epoch. Also, add an alias 213 | time_s for runtime_s. 214 | 215 | * : tag v0.6. 216 | 217 | 2007-08-10 Adam Moody 218 | 219 | * sqlog : Fixed bug in parse_end_time preventing --end-before 220 | and --end-after from working. Fixed comment in parse_start_time 221 | to print --start-before and --start-after in error message. 222 | 223 | 2007-08-10 Mark Grondona 224 | 225 | * sqlog, sqlog-db-util, slurm-joblog.pl, sqlog.spec : 226 | Allow configuration directory (/etc/slurm by default) and perl 227 | path (/usr/bin/perl by default) to be overridden at RPM build time. 228 | 229 | * : tag 0.5. 230 | 231 | 2007-08-07 Mark Grondona 232 | 233 | * sqlog-db-util : Failsafe check for existence of slurm_job_log 234 | table in create_db(). Create SLURM DB with "IF NOT EXISTS" to 235 | avoid error. 236 | 237 | * sqlog : Don't read ~/.sqlog. This file is reserved for future 238 | user configuration. 239 | 240 | * sqlog : Parse ~/.sqlog for specification of alternate format 241 | lists via "format{name} = LIST..." and default limit with 242 | "limit = N". Similarly, new format lists may be specified in 243 | sqlog.conf by creating a %FORMATS hash. User and system 244 | configs may override the default format key lists for 245 | "short", "long", and "freeform". 246 | 247 | * sqlog.1 : Document ~/.sqlog. 248 | 249 | * : tag v0.4. 250 | 251 | 2007-08-06 Mark Grondona 252 | 253 | * sqlog.spec : Add perl-DateManip and gendersllnl to Requires. 254 | 255 | * sqlog.1 : Add a note about sqlog's "More results available..." 256 | message. Add some more examples. 257 | 258 | * sqlog : When sorting start and end times, assume "NONE" to 259 | mean "possibly infinite in the future" by faking a date 10 years 260 | from now. This allows sorting end time to work as expected since 261 | all jobs that end in the future should have the greatest end 262 | time. 
263 | 264 | * sqlog-db-util.8 : Added. 265 | 266 | * sqlog-db-util : Reformat usage output. 267 | 268 | * sqlog : Give each format type (short, long, freeform) its own 269 | format list. Put "longstart" and "longend" into default format 270 | lists for long and freeform output types. 271 | 272 | 2007-08-06 Adam Moody 273 | 274 | * sqlog : Changed option order in help output to more closely match 275 | the order in the manpage. 276 | 277 | * sqlog.1 : Added -L and -a, which were missing. Changed option 278 | listing order slightly to group options by function. 279 | 280 | 2007-08-06 Mark Grondona 281 | 282 | * sqlog : Apply sort keys to "ORDER BY" in DB query. Also, reverse 283 | the sense of '-' on sort keys to be more intuitive. 284 | 285 | * sqlog : Print "More results available..." if not all results from 286 | DB and/or queue were displayed due to --limit. 287 | 288 | * sqlog : Add "longstart" and "longend" format keys which print start 289 | and end datetime in format "%Y-%m-%dT%H:%M:%S". 290 | 291 | * sqlog : Fix typo in initialization of config arrays that caused 292 | selection of job states to break. 293 | 294 | 2007-08-04 Mark Grondona 295 | 296 | * slurm-joblog.pl, slurm-joblog.conf.example, README : 297 | Optionally create DB if it doesn't exist in slurm-joblog. 298 | 299 | * sqlog-db-util : Initialize $conf{verbose}. 300 | 301 | * sqlog-db-util : Add --info option. 302 | 303 | * : tag v0.3. 304 | 305 | 2007-08-03 Mark Grondona 306 | 307 | * sqlog-db-util : Add Adam Moody's utility for creation of SLURM 308 | job log database. 309 | 310 | * sqlog.spec : Add sqlog-db-util to specfile. 311 | 312 | * sqlog.conf.example, slurm-joblog.conf.example : 313 | Add example config files. 314 | 315 | * : tag v0.1. 316 | 317 | * README, NEWS : Added. 318 | 319 | * : tag v0.2. 320 | 321 | 2007-07-27 Mark Grondona 322 | 323 | * sqlog.1 : Update man page with new RANGE OPERATORS section. 324 | 325 | * sqlog : Remove Examples from --help. They are now in the manpage. 
326 | Read alternate config from /etc/slurm/sqlog.conf or ~/.sqlog. 327 | 328 | * slurm-joblog.pl : Add slurm job completion script to repo. 329 | 330 | * META, sqlog.spec : Add specfile and META for building RPMs. 331 | 332 | 2007-07-25 Mark Grondona 333 | 334 | * sqlog : Changed range operator to ".." 335 | --time may specify a min, max, or window of time. 336 | @ may be used to escape leading + or - in datetime values. 337 | 338 | 2007-07-20 Adam Moody 339 | 340 | * sqlog : Added RANGE operator description to usage, 341 | --time still needs support. 342 | 343 | 2007-07-19 Adam Moody 344 | 345 | * sqlog : Added support for 'S' in time duration, only 's' was working. 346 | 347 | * sqlog : Changed comparison for runtime, minruntime, and maxruntime 348 | to string equality tests of "eq" and "ne" since numeric operators of ">" 349 | would get confused for input such as -T +1h. 350 | 351 | * sqlog : Changed 'N-M' to 'N--M' to be consistent with time window format. 352 | This way there is one consistent set of operators +/-/--. 353 | 354 | 2007-07-19 Mark Grondona 355 | 356 | * sqlog : New options --start, --start-before, --start-after, 357 | --end, --end-before, --end-after. Removed --before and --after 358 | which are replaced by --start-before, --start-after. 359 | 360 | * sqlog : Query running jobs by default and add -X, --no-running. 361 | 362 | * sqlog : Add functions for parsing time ranges and min/max specifications. 363 | 364 | * sqlog : Fix time window parsing. 365 | 366 | * sqlog.1 : Added man page. 367 | 368 | 2007-07-18 Mark Grondona 369 | 370 | * sqlog : Initial commit. 371 | 372 | -------------------------------------------------------------------------------- /DISCLAIMER: -------------------------------------------------------------------------------- 1 | This work was produced at the Lawrence Livermore National Laboratory 2 | (LLNL) under Contract No. DE-AC52-07NA27344 (Contract 44) between 3 | the U.S. 
Department of Energy (DOE) and Lawrence Livermore National 4 | Security, LLC (LLNS) for the operation of LLNL. 5 | 6 | This work was prepared as an account of work sponsored by an agency of 7 | the United States Government. Neither the United States Government nor 8 | Lawrence Livermore National Security, LLC nor any of their employees, 9 | makes any warranty, express or implied, or assumes any liability or 10 | responsibility for the accuracy, completeness, or usefulness of any 11 | information, apparatus, product, or process disclosed, or represents 12 | that its use would not infringe privately-owned rights. 13 | 14 | Reference herein to any specific commercial products, process, or 15 | services by trade name, trademark, manufacturer or otherwise does 16 | not necessarily constitute or imply its endorsement, recommendation, 17 | or favoring by the United States Government or Lawrence Livermore 18 | National Security, LLC. The views and opinions of authors expressed 19 | herein do not necessarily state or reflect those of the United States 20 | Government or Lawrence Livermore National Security, LLC, and shall 21 | not be used for advertising or product endorsement purposes. 22 | 23 | The precise terms and conditions for copying, distribution, and 24 | modification are specified in the file "COPYING". 25 | -------------------------------------------------------------------------------- /META: -------------------------------------------------------------------------------- 1 | ### 2 | ## $Id$ 3 | ### 4 | 5 | Name: sqlog 6 | Version: 0.25 7 | Release: 1 8 | -------------------------------------------------------------------------------- /NEWS: -------------------------------------------------------------------------------- 1 | Version 0.25 (2016-04-18): 2 | - Fix defined() on non-scalar warnings (Py Watson) 3 | 4 | Version 0.24 (2016-03-24): 5 | - Support for MariaDB 5.5 (Jeff B.
Ogden) 6 | 7 | Version 0.23 (2015-09-02): 8 | - Handle SLURM's PREEMPTED state in sqlog(1). 9 | 10 | Version 0.22 (2011-12-23): 11 | - Handle SLURM's RESIZING state in sqlog(1). 12 | 13 | Version 0.21 (2011-12-08): 14 | - slurm-joblog.pl: Do not use the NODECNT variable. It is likely to 15 | be incorrect. Instead always compute nodecount from the nodelist. 16 | - sqlog-db-util: Add new --recalc-nodecnt option that recalculates 17 | nodecount from nodelist on backfill. 18 | - sqlog: Fix "Use of uninitialized value" in sqlog on RUNNING jobs. 19 | 20 | Version 0.20 (2011-08-23): 21 | - sqlog-db-util: Don't make failure to connect to DB a fatal error for 22 | all cases. (Fixes bug in initial DB creation). 23 | - sqlog-db-util: Always connect to DB via 'localhost' if -L is used. 24 | 25 | Version 0.19 (2011-07-11): 26 | - Fix Issue 2: Use NODECNT environment variable if set by SLURM. 27 | - sqlog-db-util: Fix bug in initial DB creation. 28 | - Slightly better debug and error log messages. 29 | 30 | Version 0.18 (2009-11-09): 31 | - Fix bug in sqlog preventing queries with -n, --nodes. 32 | 33 | Version 0.17 (2009-10-23): 34 | - Add the skewstats(1) utility to the sqlog package. 35 | - The sqlog --regex option now only applies to the following job query 36 | option, instead of globally to all filter options. This mirrors the 37 | functionality of the --exclude option. 38 | 39 | Version 0.16 (2009-10-12): 40 | - Fix broken sqlog-db-util -L, --localhost option. 41 | - Fix job name (-J, --job-name) filtering for running jobs. 42 | - Fix use of --regex queries on running jobs and against v2 database. 43 | 44 | Version 0.15 (2009-05-07): 45 | - Fix sqlog -j, --jobids on systems with v2 database. 46 | 47 | Version 0.14 (2009-04-09): 48 | - Try harder to log to joblog text file when database isn't accessible. 49 | - Handle datetime of NONE in database. 50 | - Properly handle slurm-1.4 squeue datetime format.
51 | 52 | Version 0.13 (2009-04-02): 53 | - Update database schema from v1 to v2. 54 | - Added --convert, --backup, --prune, --obfuscate options to 55 | sqlog-db-util, as well as --cores-per-node, --notrack, 56 | and --delay-index. Added "CONVERTING" and "BACKING UP" 57 | sections to README to discuss new options. 58 | - Added new indices to schema: increased from just username 59 | to username, jobname, starttime, endtime, runtime, nodecount, 60 | corecount, nodename. Speeds up common queries. 61 | - Add corecount column to track number of cores allocated to each 62 | job, which is useful for machines using the consumable resources 63 | SLURM plugin. 64 | - Added --ncores, --mincores, --maxcores options to sqlog to 65 | specify conditions on new corecount column. 66 | - Extend nodelist column to fix truncation when very fragmented 67 | nodelists exceeded the 1024 char limit initially set for the field. 68 | 69 | Version 0.12 (2008-12-03): 70 | - Do not use Date::Manip routines to convert dates to "Unix time" 71 | (seconds since epoch). Date::Manip doesn't handle daylight savings 72 | transitions properly and instead uses the current DST offset. 73 | - New --regex option allows sqlog to query with regexes for jobids, 74 | user names, states, partitions, and job names, instead of a simple 75 | exact match. 76 | 77 | Version 0.11 (2008-06-24): 78 | - New slurm-joblog.conf parameter $SQLNETWORK sets the network 79 | on which read access to database is allowed. Default = 192.168.%.%. 80 | - sqlog-db-util now deletes slurm* users from mysql DB before 81 | creating new user entries to avoid stale privileges. 82 | 83 | Version 0.10 (2008-04-18): 84 | - Add OUTPUT FORMAT section to sqlog(1) manpage. 85 | - Improve RUNTIME parsing in sqlog script. 86 | 87 | Version 0.9 (2008-02-29): 88 | - Fix --format=long which wasn't properly setting long format. 89 | - Add "JOB STATE CODES" section to sqlog(1) man page describing the 90 | various job state abbreviations.
91 | 92 | Version 0.8 (2007-09-27): 93 | - Add more packages to RPM Requires 94 | - Fix test for whether job logfile is configured in slurm-joblog.pl. 95 | 96 | Version 0.7 (2007-08-13): 97 | - Applied Py Watson's fixes for non-standard installs: 98 | -- Allow perl library path and PATH to be specified at RPM build time. 99 | -- Use __perl RPM macro instead of hardcoding /usr/bin/perl. 100 | -- Other specfile improvements. 101 | - Fix sqlog-db-util.8 manpage permissions. 102 | 103 | Version 0.6 (2007-08-10): 104 | - Fix for bug in --end-before and --end-after argument processing. 105 | - Add new format keys: runtime_s (runtime in seconds) and unixstart/ 106 | unixend (start and end times in seconds since the epoch). 107 | 108 | Version 0.5 (2007-08-10): 109 | - Allow perl path (default = /usr/bin/perl) and 110 | confdir (default = /etc/slurm) to be overridden at RPM 111 | build time via _slurm_confidir and _perl_path. 112 | 113 | Version 0.4 (2007-08-07): 114 | - Fix for broken processing of -s, --states. 115 | - Sort keys are now applied in "ORDER BY" statement of database query. 116 | - New format keys "longstart" and "longend" for including year in output. 117 | - longstart/end are displayed by default in "long" and "freeform" output types. 118 | - Add support for user configuration in ~/.sqlog. 119 | - When sorting start and end time, assume "NONE" is the max date & time. 120 | - Add string [More results available...] if more results may be in database. 121 | - Added manpage for sqlog-db-util(8). 122 | - Manpage and --usage output cleanup. 123 | 124 | Version 0.3 (2007-08-04): 125 | - Enable auto-creation of database from slurm-joblog script. 126 | - Add --info option to sqlog-db-util. 127 | 128 | Version 0.2 (2007-08-03): 129 | - Add README and NEWS files. 130 | 131 | Version 0.1 (2007-08-03): 132 | - Initial release. 
133 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | The sqlog package contains a set of scripts useful for creating, 2 | populating, and issuing queries to a SLURM job log database. 3 | 4 | COMPONENTS 5 | 6 | sqlog The "SLURM Query Log" utility. Provides a single interface to 7 | query jobs from the SLURM job log database and/or current 8 | queue of running jobs. 9 | 10 | slurm-joblog Logs completed jobs using SLURM's jobcomp/script interface 11 | to the SLURM job log database and an optional text file. 12 | 13 | 14 | sqlog-db-util Administrative utility used to create SLURM job log database 15 | and its corresponding users. Also provides an interface to 16 | "backfill" the database using existing SLURM joblog files 17 | created by the jobcomp/filetxt plugin. 18 | 19 | sqlog.conf World-readable config file. Contains local configuration for 20 | SQL host, read-only user, and read-only password. 21 | 22 | slurm-joblog.conf 23 | Private configuration for slurm-joblog script (also used 24 | by sqlog-db-util). Contains SQL read-write user and password, 25 | root user password (for sqlog-db-util) and a list of hosts 26 | that should have RW access to DB. 27 | 28 | 29 | CONFIGURATION 30 | 31 | For fully-automated operation, both the /etc/slurm/sqlog.conf and 32 | /etc/slurm/slurm-joblog.conf must exist. These files are read 33 | using perl's do() function, so the files can and must be valid perl. 34 | This allows a bit of scripting to get the values if necessary. 35 | (See the sqlog doc directory for examples).
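Since these files are loaded with perl's do(), a minimal sqlog.conf is just a short perl fragment. The sketch below uses the variables documented in this README with illustrative values only; adjust for your site, and consult the shipped sqlog.conf.example as the authoritative reference:

```perl
# /etc/slurm/sqlog.conf -- read by sqlog via perl's do(), so it must be valid perl
$SQLHOST = "sqlhost";       # SQL server hostname
$SQLUSER = "slurm_read";    # read-only DB user
$SQLPASS = "";              # read-only password (none by default)
$SQLDB   = "slurm";         # database name
$TRACKNODES = 1;            # set to 0 to disable per-job node tracking
%FORMATS = (                # format aliases, e.g. for sqlog output selection
    "f1" => "jid,name,user,state",
);
1;   # conventional for files loaded with do(): return a true value
```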
36 | 37 | The available variables in each config file include: 38 | 39 | sqlog.conf: 40 | 41 | SQLHOST SQL server hostname (default = sqlhost) 42 | SQLUSER Read-only user (default = slurm_read) 43 | SQLPASS Read-only password (default = none) 44 | SQLDB DB name (default = slurm) 45 | TRACKNODES Set to 0 to disable per-job node tracking (default = 1) 46 | %FORMATS Hash of format aliases (e.g. "f1" => "jid,name,user,state") 47 | 48 | slurm-joblog.conf: 49 | 50 | SQLUSER Read-write user (default = slurm) 51 | SQLPASS Read-write password (not set) 52 | SQLROOTPASS DB root password (not set) 53 | @SQLRWHOSTS Read-write hosts (array of hosts to give rw access) 54 | JOBLOGFILE txt joblog location (set if you want to log to a file too) 55 | AUTOCREATE Attempt to create DB if it doesn't yet exist the 56 | first time slurm-joblog is run (default = no). 57 | 58 | 59 | CREATING JOB LOG DATABASE 60 | 61 | Once the config files exist, the following command will create the 62 | SLURM job log database: 63 | 64 | sqlog-db-util --create 65 | 66 | If you have existing text joblog files you'd like to seed the new 67 | DB with, use 68 | 69 | sqlog-db-util --backfill [FILE]... 70 | 71 | e.g. 72 | 73 | sqlog-db-util --backfill /var/log/slurm/joblog* 74 | 75 | If AUTOCREATE is set in slurm-joblog.conf, then sqlog-db-util --create 76 | will be automatically run the first time the database is accessed. 77 | 78 | CONVERTING JOB LOG DATABASE 79 | 80 | The database schema changed from v0.12 to v0.13 of the sqlog package. 81 | The highest schema version currently running on a system can be 82 | determined from the --info output. 83 | 84 | To create tables for the new schema, run: 85 | 86 | sqlog-db-util --create 87 | 88 | Once created, the slurm-joblog.pl script will detect the new schema 89 | and automatically switch to inserting records into the new tables. The sqlog 90 | command will query both schemas for records.
91 | 92 | To copy existing data from the old schema to the new schema, 93 | use the --convert option. 94 | 95 | Speeding up the conversion: 96 | The new schema tracks the nodes that each job uses so that sqlog queries 97 | involving node names return much faster. The data and indices associated 98 | with this node tracking can significantly slow down the conversion operation 99 | when converting a large number of records. There are two options to speed 100 | this up: 101 | 102 | 1) Disable node-tracking for all converted jobs via the --notrack option. 103 | 2) Delay indexing of converted data via the --delay-index option. 104 | 105 | With the --notrack option, no node-tracking data will be stored for jobs 106 | inserted via conversion. As such, if node-tracking is enabled on the 107 | system, such jobs will not return in queries involving node names. Newly 108 | inserted jobs will still have node-tracking data. 109 | 110 | With the --delay-index option, node tracking indices are removed before 111 | data is converted, and they are restored when the conversion completes. 112 | Queries involving node names while there are no indices will take a very 113 | long time to return on a large database. 114 | 115 | For a database on Atlas, which had 580,000 jobs spanning two years, the 116 | conversion took: 117 | 118 | 13 minutes for: sqlog-db-util --convert --notrack 119 | 33 minutes for: sqlog-db-util --convert --delay-index 120 | 85 minutes for: sqlog-db-util --convert 121 | 122 | The recommended method is to use --delay-index. 123 | 124 | It's also possible to disable node-tracking in the new schema completely. 125 | To do this, add the following line to the sqlog.conf file. 126 | 127 | $TRACKNODES=0; 128 | 129 | Number of allocated cores: 130 | The new schema adds a new field to record the number of cores allocated 131 | to a job. This data was not captured in the version 1 schema. However, 132 | on many systems, this core count can be computed.
On systems that have the 133 | same number of cores per node and allocate whole nodes to a job, one may 134 | use the --cores-per-node option to specify the number of cores per node. 135 | This --cores-per-node value is multiplied with the node count recorded 136 | in the version 1 schema to determine the number of cores allocated to 137 | the job. For example, to convert from schema version 1 to version 2 on 138 | a machine that has 8 cores per node and allocates whole nodes to jobs, 139 | run the following command: 140 | 141 | sqlog-db-util --convert --cores-per-node=8 142 | 143 | For all other systems, do not specify --cores-per-node. In this case, 144 | the number of cores allocated will be set to 0. The conversion command 145 | on these systems is simply: 146 | 147 | sqlog-db-util --convert 148 | 149 | If a mistake is made during conversion, you can drop the version 2 tables 150 | and start from scratch (be very careful to specify '2' and not '1' here): 151 | 152 | sqlog-db-util --drop=2 153 | 154 | You may issue the --convert command on a live system, however, be 155 | careful to specify the command correctly in this case. The slurm-joblog.pl 156 | script will insert records to the new schema as soon as it is created. 157 | If a mistake is made during conversion, and the version 2 tables must 158 | be dropped and recreated, any records inserted by slurm-joblog.pl will be lost. 159 | 160 | After conversion, sqlog may report duplicate records as it finds 161 | matches from both the version 1 and version 2 tables. 
Once converted, 162 | it's recommended that the version 1 tables be dropped by running the 163 | following command (be very careful to specify '1' and not '2' here): 164 | 165 | sqlog-db-util --drop=1 166 | 167 | Finally, here is a full example set of commands to create the new schema 168 | and convert records to it: 169 | 170 | sqlog-db-util -v --create 171 | sqlog-db-util -v --backup=all schema1_jobs.log 172 | sqlog-db-util -v --convert --delay-index --cores-per-node=8 173 | sqlog-db-util -v --drop=1 174 | 175 | BACKING UP AND PRUNING THE DATABASE 176 | 177 | It is possible to dump records from the job log database into a text 178 | file, which can then be read in via --backfill. This is useful to 179 | capture a text file backup of the logs. One must specify the time 180 | period as either "all", "DATE", or "DATE..DATE", to dump all jobs, 181 | jobs before a given date, and jobs that started between two dates, 182 | respectively. DATE should be specified with the 'YYYY-MM-DD HH:MM:SS' 183 | format, e.g., 184 | 185 | sqlog-db-util -v --backup='2009-01-01 00:00:00'..'2009-02-01 00:00:00'\ 186 | logs.txt 187 | 188 | One utility of this backup option is to share job log records with 189 | others potentially outside of the organization. Typically, one would 190 | like to protect user and job names when sharing such information. 191 | For this, an --obfuscate option is available which dumps records and 192 | modifies user names to be of the form "user_X", userids to match "X", 193 | and job names to be of the form "job_Y", where X and Y are numbers. 194 | 195 | Finally, over a long period of time, the database may gather so many 196 | records that it slows down significantly. A --prune option is available 197 | to remove old records.
One specifies a date, and all jobs which started 198 | before that date will be removed from the database and written to a file 199 | name specified by the user, e.g., 200 | 201 | sqlog-db-util -v --prune='2007-01-01 00:00:00' pre2007.log 202 | 203 | ENABLE JOB LOGGING 204 | 205 | To enable the SLURM job log database, the following configuration 206 | options must be set in slurm.conf: 207 | 208 | JobCompType = jobcomp/script 209 | JobCompLoc = /usr/libexec/sqlog/slurm-joblog 210 | 211 | Adjust the path if the sqlog RPM was installed with a different PREFIX. 212 | This has only been tested on SLURM 1.2.10 or greater. 213 | 214 | Restart slurmctld and slurm-joblog will begin logging jobs as they 215 | complete. 216 | 217 | $Id$ 218 | -------------------------------------------------------------------------------- /skewstats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | ############################################################################### 3 | # $Id$ 4 | #****************************************************************************** 5 | # Copyright (C) 2007-2009 Lawrence Livermore National Security, LLC. 6 | # Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). 7 | # Written by Mark Grondona 8 | # 9 | # UCRL-CODE-235340. 10 | # 11 | # This file is part of sqlog. 12 | # 13 | # This is free software; you can redistribute it and/or modify it 14 | # under the terms of the GNU General Public License as published by 15 | # the Free Software Foundation; either version 2 of the License, or 16 | # (at your option) any later version. 17 | # 18 | # This is distributed in the hope that it will be useful, but WITHOUT 19 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 20 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 21 | # for more details. 
22 | # 23 | # You should have received a copy of the GNU General Public License 24 | # along with this program; if not, see <http://www.gnu.org/licenses/>. 25 | # 26 | ############################################################################## 27 | # 28 | # skewstats - SLURM Queue Stats 29 | # 30 | # Report simple SLURM utilization and other statistics for a configurable time 31 | # window optionally split into intervals. Uses the sqlog(1) utility to pull 32 | # historical job information from the SLURM job log database. 33 | # 34 | ############################################################################## 35 | use strict; 36 | use Date::Manip; 37 | use Hostlist qw/ expand compress union /; 38 | use Getopt::Long qw/ :config gnu_getopt /; 39 | use Time::HiRes qw/ tv_interval gettimeofday /; 40 | use File::Basename; 41 | 42 | my @utilization = (); 43 | my %opt = (); 44 | my $progname = basename ($0); 45 | 46 | ############################################################################## 47 | # 48 | # Usage: 49 | # 50 | 51 | my $usage = <<EOF; ... EOF my %fmt = ( "starttime" => { "name" => "STARTTIME", "fmt" => "@>>>>>>>>>>>>>>" }, 90 | "endtime" => { "name" => "ENDTIME", "fmt" => "@>>>>>>>>>>>>>>" }, 91 | "njobs" => { "name" => "NJOBS", "fmt" => "@######" }, 92 | "avgjobsize" => { "name" => "AVGSZ", "fmt" => "@###.##" }, 93 | "maxsize" => { "name" => "MAXSZ", "fmt" => "@####" }, 94 | "minsize" => { "name" => "MINSZ", "fmt" => "@####" }, 95 | "medsize" => { "name" => "MEDSZ", "fmt" => "@###.##" }, 96 | "njobstarts" => { "name" => "STARTED", "fmt" => "@######" }, 97 | "njobends" => { "name" => "ENDED", "fmt" => "@#####" }, 98 | "NF" => { "name" => "NF", "fmt" => "@####" }, 99 | "F" => { "name" => "F", "fmt" => "@####" }, 100 | "TO" => { "name" => "TO", "fmt" => "@####" }, 101 | "CA" => { "name" => "CA", "fmt" => "@####" }, 102 | "CD" => { "name" => "CD", "fmt" => "@####" }, 103 | "utilization" => { "name" => "UTILIZATION","fmt" => "@######.###" } 104 | ); 105 | 106 |
############################################################################## 107 | # 108 | # Main script: 109 | 110 | &parse_options (); 111 | 112 | # 113 | # Create user-defined format: 114 | # 115 | &format_create (); 116 | 117 | # 118 | # Get list of jobs between start and end time: 119 | # 120 | my @joblist = &get_jobs (); 121 | my @results = &utilization_loop ($opt{start}, $opt{end}, $opt{interval_s}); 122 | 123 | if (scalar @results > 1 && $opt{print_total}) { 124 | push (@results, &utilization ($opt{start}, $opt{end})); 125 | } 126 | 127 | # 128 | # Write output using format defined in &format_create() 129 | # 130 | write foreach @results; 131 | 132 | exit 0; 133 | ############################################################################## 134 | # 135 | # Functions: 136 | 137 | sub format_create 138 | { 139 | my $format_top = ""; 140 | my $format = ""; 141 | my $common = ""; 142 | my %count; 143 | 144 | if ($opt{date_format} =~ /^%H:/) { 145 | $fmt{starttime}{fmt} =~ s/>>>>>//; 146 | $fmt{endtime}{fmt} =~ s/>>>>>//; 147 | } 148 | 149 | my @form = grep { !$count{$_}++ } split /\s*,\s*/, $opt{format}; 150 | 151 | for my $f (@form) { 152 | if (!exists $fmt{$f}) { 153 | log_fatal ("Invalid format key \"$f\"\n"); 154 | next; 155 | } 156 | $common .= "$fmt{$f}{fmt} "; 157 | } 158 | $common .= "\n"; 159 | 160 | $format .= "format STDOUT = \n"; 161 | $format .= $common; 162 | $format .= join ',', map { '$$_{' . $_ . '}' } @form; 163 | $format .= "\n.\n"; 164 | 165 | # 166 | # Change '#' and '.' to '>' for format header 167 | # 168 | $common =~ s/(#|\.)/>/g; 169 | 170 | $format_top .= "format STDOUT_TOP = \n"; 171 | $format_top .= $common; 172 | $format_top .= join ',', map { '"'. $fmt{$_}{name} . 
'"' } @form; 173 | $format_top .= "\n.\n"; 174 | 175 | eval $format; 176 | eval $format_top unless $opt{noheader}; 177 | } 178 | 179 | # 180 | # Calculate utilization for all time intervals 181 | # 182 | sub utilization_loop 183 | { 184 | my ($t1, $end, $interval) = @_; 185 | my $t2; 186 | my @results = (); 187 | 188 | # Loop over all configured intervals printing utilization for each: 189 | # 190 | do { 191 | my $err; 192 | 193 | $t2 = DateCalc ($t1, "+${interval}s", \$err); 194 | 195 | # Shrink this interval if it extends past end time. 196 | # 197 | $t2 = $end if (Date_Cmp ($t2, $end) > 0); 198 | 199 | push (@results, &utilization ($t1, $t2)); 200 | 201 | } while (Date_Cmp (($t1 = $t2), $end) < 0 && $interval != 0); 202 | 203 | return @results; 204 | } 205 | 206 | # 207 | # Calculate some stats (including utilization) for a 208 | # specific time window. Returns a reference to a hash containing 209 | # the results. 210 | # 211 | sub utilization 212 | { 213 | my ($t1, $t2) = @_; 214 | my $nodesecs = 0; 215 | my $njobs = 0; 216 | my @nnodes = (); 217 | my $totalnodesecs = 0; 218 | my $njobstarts = 0; 219 | my $njobends = 0; 220 | my %states; 221 | 222 | # Convert start and end time to seconds since epoch for 223 | # quicker calculations and comparisons below. 224 | # 225 | my $starttime = UnixDate ($t1, "%s"); 226 | my $endtime = UnixDate ($t2, "%s"); 227 | 228 | # Total node-seconds available is nnodes * window, unless 229 | # start and end time are the same, in which case we just use 230 | # nnodes (instantaneous snapshot). 231 | # 232 | $totalnodesecs = ($starttime < $endtime) ? 233 | ($endtime - $starttime) * $opt{nnodes} : $opt{nnodes}; 234 | 235 | for my $job (@joblist) { 236 | my $jobstart = $$job{start}; 237 | my $jobend = $$job{end}; 238 | 239 | # Continue if this job ran outside the current interval. 
240 | # 241 | next if ($jobend < $starttime || $jobstart > $endtime); 242 | 243 | # Count number of jobs and keep a list of node counts: 244 | # 245 | $njobs++; 246 | push (@nnodes, $$job{nnodes}); 247 | 248 | # Adjust job start and end times if either fell outside 249 | # the current time window. Otherwise count the number of jobs 250 | # starting and/or ending during this time. 251 | # 252 | ($jobstart < $starttime) ? $jobstart = $starttime : $njobstarts++; 253 | if ($jobend > $endtime) { 254 | $jobend = $endtime; 255 | } 256 | else { 257 | $njobends++; 258 | $states{$$job{state}}++; 259 | } 260 | 261 | # 262 | # If we're using an instantaneous snapshot, then just count 263 | # number of nodes used (i.e. runtime == 1). 264 | # 265 | my $runtime = ($starttime == $endtime) ? 1 : ($jobend - $jobstart); 266 | 267 | # Add this job's node-seconds used to the total node-seconds 268 | # utilized during this interval: 269 | # 270 | $nodesecs += ($runtime * $$job{nnodes}); 271 | } 272 | 273 | my %r = (); 274 | 275 | @nnodes = sort { $a <=> $b } @nnodes; 276 | 277 | $r{t1} = $t1; 278 | $r{t2} = $t2; 279 | $r{starttime} = UnixDate ($t1, $opt{date_format}); 280 | $r{endtime} = UnixDate ($t2, $opt{date_format}); 281 | $r{totalnodesec} = $totalnodesecs; 282 | $r{usednodesec} = $nodesecs; 283 | $r{njobs} = $njobs; 284 | $r{avgjobsize} = mean (@nnodes); 285 | $r{medsize} = median (@nnodes); 286 | $r{maxsize} = $nnodes[$#nnodes]; 287 | $r{minsize} = $nnodes[0]; 288 | $r{njobstarts} = $njobstarts; 289 | $r{njobends} = $njobends; 290 | $r{utilization} = ($nodesecs / $totalnodesecs); 291 | for my $state (qw/ NF F CA TO CD /) { 292 | $r{$state} = $states{$state}; 293 | } 294 | 295 | return (\%r); 296 | } 297 | 298 | # 299 | # Calculate mean of an array of values 300 | # 301 | sub mean 302 | { 303 | return 0 unless @_; 304 | 305 | my $total = 0; 306 | $total += $_ for @_; 307 | return $total / scalar @_; 308 | } 309 | 310 | # 311 | # Calculate median of an array of values 312 | # 313 |
sub median 314 | { 315 | my @s = sort { $a <=> $b } @_; 316 | my $n = scalar @s or return 0; 317 | return $s[$n/2] if ($n % 2); 318 | return (($s[$n/2 - 1] + $s[$n/2]) / 2); 319 | } 320 | 321 | # 322 | # Return a list of job info hashes for all jobs that ran in the 323 | # time interval (start, end). 324 | # 325 | sub get_jobs 326 | { 327 | my @jobs = (); 328 | my $cmd = "sqlog -Ho jid,unixstart,unixend,runtime_s,nnodes,st -L0"; 329 | 330 | # Ignore completing jobs, which don't contribute to utilization 331 | # 332 | $cmd .= " -xs CG"; 333 | 334 | # 335 | # If start time is "now" then there is no need for sqlog to 336 | # query the db. 337 | # 338 | $cmd .= $opt{snapshot} ? " --no-db" : ""; 339 | 340 | # Grab jobs that were running during our configured window: 341 | # 342 | $cmd .= " -t $opt{start}..$opt{end}"; 343 | 344 | log_verbose ("Querying jobs from $opt{start} to $opt{end}.\n"); 345 | log_debug ("Running $cmd\n"); 346 | 347 | my $t0 = [gettimeofday]; 348 | 349 | open (SQLOG, "$cmd |") or log_fatal ("Failed to run $cmd: $!\n"); 350 | while (<SQLOG>) { 351 | chomp; 352 | push (@jobs, job_entry_create (split)); 353 | } 354 | close (SQLOG); 355 | 356 | log_verbose ("sqlog took ", sprintf ("%.3f", tv_interval ($t0)) , "s.\n"); 357 | log_verbose ("Found ", scalar @jobs, " jobs.\n"); 358 | 359 | return @jobs; 360 | } 361 | 362 | # 363 | # Return a reference to a hash with job info 364 | # 365 | sub job_entry_create 366 | { 367 | return { 368 | "jobid" => shift, 369 | "start" => shift, 370 | "end" => shift, 371 | "runtime" => shift, 372 | "nnodes" => shift, 373 | "state" => shift, 374 | }; 375 | } 376 | 377 | # 378 | # Parse user command line options: 379 | # 380 | sub parse_options 381 | { 382 | $opt{verbose} = 0; 383 | $opt{date_format} = "%H:%M:%S"; 384 | $opt{format} = "starttime,endtime,njobs,utilization"; 385 | 386 | my $start; 387 | my $end; 388 | my $err; 389 | 390 | my $rc = GetOptions (\%opt, 391 | 'help|h', 392 | 'start|s=s', 393 | 'end|e=s', 394 | 'nnodes|n=i', 395 |
'interval|i=s', 396 | 'print_total|total|t', 397 | 'output|o=s', 398 | 'noheader|H', 399 | 'verbose|v+', 400 | 401 | # Undocumented: 402 | 'format=s', 403 | 'date_format|date-format=s', 404 | ); 405 | 406 | &usage() if defined $opt{help} || ! $rc; 407 | 408 | $opt{format} = "starttime,endtime,njobs,njobstarts,njobends" . 409 | ",avgjobsize,utilization" 410 | if ($opt{jobstats}); 411 | 412 | # 413 | # Default start time is 12AM today, default end time is "now" 414 | # 415 | $opt{start} = "today,12am" if (!$opt{start}); 416 | $opt{end} = "now" if (!$opt{end}); 417 | 418 | # 419 | # Note that we're doing a snapshot of start time == "now" 420 | # 421 | $opt{snapshot}++ if ($opt{start} eq "now"); 422 | 423 | # 424 | # Set nnodes to the number of nodes configured now if not set 425 | # 426 | chomp ($opt{nnodes} = `sinfo -ho %D`) if (!$opt{nnodes}); 427 | 428 | # 429 | # Special case: If --end option begins with a '+' consider this 430 | # an offset from --start time. 431 | # 432 | $opt{end} = DateCalc ($opt{start}, "+$opt{end}", \$err) 433 | if ($opt{end} =~ s/^\+//); 434 | 435 | # 436 | # Make sure start and end times can be parsed as datetimes: 437 | # 438 | log_fatal ("Failed to parse datetime \"$opt{start}\"\n") 439 | if (!($start = ParseDate ($opt{start}))); 440 | 441 | log_fatal ("Failed to parse datetime \"$opt{end}\"\n") 442 | if (!($end = ParseDate ($opt{end}))); 443 | 444 | log_fatal ("Start time \"$opt{start}\" greater than end time \"$opt{end}\"\n") 445 | if (Date_Cmp ($start, $end) > 0); 446 | 447 | # 448 | # Convert start and end time to formats useful for sqlog: 449 | # 450 | $opt{start} = UnixDate ($start, "%Y-%m-%dT%H:%M:%S"); 451 | $opt{end} = UnixDate ($end, "%Y-%m-%dT%H:%M:%S"); 452 | 453 | # 454 | # Generate interval in seconds (interval_s): 455 | # 456 | if (!$opt{interval}) { 457 | $opt{interval_s} = UnixDate($end, "%s") - UnixDate($start, "%s"); 458 | } 459 | else { 460 | $opt{interval_s} = duration_to_seconds ($opt{interval}) 461 | or log_fatal 
("Invalid interval \"$opt{interval}\"\n"); 462 | } 463 | 464 | # 465 | # Include MM-DD in default date format if start and end were on 466 | # different days. 467 | # 468 | $opt{date_format} = "%m-%d-%H:%M:%S" 469 | if (UnixDate ($opt{start}, "%m%d") != UnixDate ($opt{end}, "%m%d")); 470 | 471 | if ($opt{output}) { 472 | my $format = "starttime,endtime,njobs"; 473 | for (split /,/, $opt{output}) { 474 | /^util(ization)?/ && 475 | do { $format .= ",utilization"; next; }; 476 | /^(job)?stats/ && 477 | do { $format .= ",njobends,F,NF,TO,CA,CD"; next; }; 478 | /^(job)?starts/ && 479 | do { $format .= ",njobstarts"; next; }; 480 | /^(job)?size/ && 481 | do { $format .= ",maxsize,minsize,avgjobsize,medsize"; next; }; 482 | /^all/ && 483 | do { $format .= ",njobstarts,njobends,maxsize,minsize" . 484 | ",avgjobsize,medsize,F,NF,TO,CA,CD,utilization"; next }; 485 | log_fatal ("Invalid argument: --output \"$_\"\n"); 486 | } 487 | 488 | log_debug ("format = $format\n"); 489 | $opt{format} = $format; 490 | } 491 | 492 | } 493 | 494 | 495 | # Convert a duration to seconds. 496 | # 497 | # Valid duration strings include the common SLURM form D-HH:MM:SS 498 | # or the form 3H 499 | # 500 | sub duration_to_seconds 501 | { 502 | my ($t) = @_; 503 | my ($d, $h, $m, $s); 504 | 505 | # 506 | # list of valid regexes to check in order. 507 | # 1. DD-HH:MM:SS type 508 | # 2. 
1hr20min or 1h20m type 509 | # 510 | my @regexes = qw/ 511 | ^(?:(\d+)(?:-))?(\d*?):?(\d*?):(\d+)$ 512 | ^(\d*?)(?i:d|days?)?(\d*?)(?i:hr?)?(\d*?)(?i:m|min)?(\d*)(?i:s|sec)?$ 513 | NOTFOUND 514 | /; 515 | 516 | for my $re (@regexes) { 517 | return undef if ($re eq "NOTFOUND"); 518 | last if (($d, $h, $m, $s) = ($t =~ /$re/)) 519 | } 520 | 521 | log_debug ("duration_to_seconds ($t): d=$d h=$h m=$m s=$s\n"); 522 | 523 | return (($s||0) + ($m||0) * 60 + ($h||0) * 3600 + ($d||0) * 3600 * 24); 524 | } 525 | 526 | 527 | sub usage 528 | { 529 | print STDERR $usage; 530 | exit 1; 531 | } 532 | 533 | sub log_msg { print STDERR "$progname: ", @_; } 534 | sub log_verbose { log_msg (@_) if ($opt{verbose} > 0); } 535 | sub log_debug { log_msg (@_) if ($opt{verbose} > 1); } 536 | sub log_error { log_msg ("Error: ", @_); } 537 | sub log_fatal { log_msg ("Fatal: ", @_); exit 1; } 538 | 539 | # vi: ts=4 sw=4 expandtab 540 | -------------------------------------------------------------------------------- /skewstats.1: -------------------------------------------------------------------------------- 1 | .\" $Id$ 2 | .\" 3 | 4 | .TH SKEWSTATS 1 "SLURM Queue Stats" 5 | 6 | .SH NAME 7 | skewstats \- report simple SLURM queue statistics 8 | 9 | .SH SYNOPSIS 10 | .B skewstats 11 | [\fIOPTIONS\fR]... 12 | 13 | .SH DESCRIPTION 14 | The \fBskewstats\fR utility reports simple SLURM queue statistics 15 | such as number of jobs executed and average utilization over 16 | defined time periods. It uses the \fBsqlog\fR(1) utility to query 17 | the SLURM job log database, so \fBskewstats\fR can be used to 18 | report on historical data. 19 | 20 | By default \fBskewstats\fR reports number of jobs run during the 21 | specified time period, as well as the cluster utilization. However, 22 | other data is available such as average job size, number of jobs 23 | that started running during the specified time interval, and number 24 | of jobs that ended.
Other data may be included in future versions 25 | of the script. 26 | 27 | The default time window for which \fBskewstats\fR reports when 28 | run with no arguments is from 12AM today to the current time. Thus 29 | it reports statistics for the current day so far. Both the 30 | start and end time of the window may be specified with the 31 | \fI--start\fR and \fI--end\fR options. For instance, to get 32 | statistics for the current instant only, \fI--start\fR=\fBnow\fR may be 33 | used. See the \fIOPTIONS\fR section below for further information. 34 | 35 | Unless the \fI-i, --interval\fR option is used, \fBskewstats\fR 36 | will display one line of output for the entire time period from 37 | \fI--start\fR to \fI--end\fR. The \fI--interval\fR option can be 38 | used to break the time period into equal-sized intervals, and 39 | stats are summarized for each interval in turn. If the \fI-t, --total\fR 40 | option is used, a final summary line is displayed for the 41 | whole time period. 42 | 43 | .SH OPTIONS 44 | .TP 45 | .BI "-h, --help" 46 | Display a summary of the command-line options. 47 | .TP 48 | .BI "-v, --verbose" 49 | Increase debugging verbosity of the program. 50 | .TP 51 | .BI "-H, --noheader" 52 | Do not display a header row in output. 53 | .TP 54 | .BI "-s, --start " DATETIME 55 | Provide a start date and time for the time window of interest. 56 | The default start time is 12AM today. 57 | .TP 58 | .BI "-e, --end " DATETIME 59 | Provide an end date and time for the window of interest. 60 | The default end time is the current time (or \fInow\fR). As a special 61 | case, if the end DATETIME begins with a plus \fI+\fR, then the end 62 | date and time is considered to be an offset from the start time. 63 | .TP 64 | .BI "-i, --interval " DURATION 65 | Split the time window into intervals of size \fIDURATION\fR. DURATION may 66 | have the form DD-HH:MM:SS or DDdHHhMMmSSs or DDdaysHHhrMMminSSsec, 67 | where DD is days, HH is hours, MM is minutes, and SS is seconds.
In the 68 | latter two forms, values that are zero may be left out, e.g. 4hr. 69 | In the first form, days, hours, and minutes are optional, e.g. :04 70 | is 4 seconds. 71 | .TP 72 | .BI "-t, --total " 73 | Include a final line summarizing statistics for the total time 74 | period when using the \fI--interval\fR option. 75 | .TP 76 | .BI "-o, --output " TYPE 77 | Select alternate output statistics. By default, TYPE is 78 | "utilization", which includes the number of jobs run and the 79 | cluster utilization during the specified time window. Alternate 80 | output types include: 81 | .TP 20 82 | .B "utilization | util" 83 | This is the default output type. It reports the cluster utilization for the 84 | configured time window. 85 | .TP 86 | .B "jobstats | stats" 87 | Report job completion statistics including the number of jobs that ended 88 | during the time window, and the number of jobs that ended with each of 89 | the job state codes F = failed, NF = node failure, TO = timed out, 90 | CA = cancelled, and CD = completed. 91 | .TP 92 | .B "jobstarts | starts" 93 | Report the number of jobs that started during the time window. 94 | .TP 95 | .B "jobsize | size" 96 | Report simple job size statistics, including the maximum, minimum, 97 | average, and median job size. 98 | .TP 99 | .B "all" 100 | Report all job statistics at once. 101 | 102 | .SH EXAMPLES 103 | Display the number of jobs run and utilization from 3 hrs ago until now: 104 | .nf 105 | 106 | skewstats --start=-3hr 107 | 108 | .fi 109 | Display statistics since yesterday at 8AM divided into 1hr intervals, 110 | including extra job stats such as the number of jobs starting and ending 111 | during each time interval: 112 | .nf 113 | 114 | skewstats -i 1h -s yesterday,8am -o jobstats 115 | 116 | .fi 117 | Display stats for the 3 hour window starting on April 3rd, 2008, 09:00: 118 | .nf 119 | 120 | skewstats --start=2008-04-03T09:00 --end=+3hr 121 | 122 | .fi 123 | .SH AUTHOR 124 | Written by Mark Grondona.
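The two DURATION forms documented for \fI--interval\fR above (and parsed by duration_to_seconds in the skewstats script) can be illustrated with a short standalone sketch. This is a Python re-implementation written for illustration only; it is not part of sqlog. The regexes mirror the Perl ones, using lazy quantifiers so digits bind to the correct unit suffix.

```python
import re

def duration_to_seconds(t):
    """Convert a DURATION like '1-02:03:04', '4hr', or '1h20m' to seconds.

    Mirrors the two forms accepted by skewstats --interval:
      1. DD-HH:MM:SS, where days, hours, and minutes are optional
         (':04' is 4 seconds)
      2. unit-suffixed, e.g. '1hr20min' or '1h20m'
    """
    patterns = [
        # Form 1: optional 'DD-' prefix, then [HH:][MM:]SS
        r'^(?:(\d+)-)?(\d*?):?(\d*?):(\d+)$',
        # Form 2: lazy digit groups so '1h20m' binds '1' to hours,
        # not to the leading days group
        r'^(\d*?)(?:d|days?)?(\d*?)(?:hr?)?(\d*?)(?:m|min)?(\d*)(?:s|sec)?$',
    ]
    for p in patterns:
        m = re.match(p, t, re.IGNORECASE)
        if m and any(m.groups()):
            d, h, mi, s = (int(g) if g else 0 for g in m.groups())
            return s + mi * 60 + h * 3600 + d * 86400
    return None  # not a recognized duration
```

As in the Perl version, the lazy `\d*?` groups are what let `1h20m` parse as one hour and twenty minutes rather than one day.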
125 | 126 | .SH "SEE ALSO" 127 | .BR sqlog (1), 128 | .BR squeue (1), 129 | .BR sinfo (1) 130 | -------------------------------------------------------------------------------- /slurm-joblog.conf.example: -------------------------------------------------------------------------------- 1 | ############################################################################### 2 | # $Id$ 3 | ############################################################################### 4 | # 5 | # SLURM Job log utility config file. 6 | # 7 | # Allows configuration of the following: 8 | # 9 | # SQLUSER : The Read-write username for the DB 10 | # SQLPASS : Read-write password 11 | # SQLROOTPASS : Root password (Needed for DB creation) 12 | # JOBLOGFILE : Location of joblog file (empty if you don't want a logfile) 13 | # SQLRWHOSTS : Array of all hosts from which to allow RW user access. 14 | # SQLNETWORK : Restricted network for db (default = 192.168.%.%) 15 | # 16 | package conf; 17 | use Genders; 18 | 19 | $SQLUSER = "slurm"; 20 | $SQLPASS = "MyReadWritePassword"; 21 | $SQLNETWORK = "192.168.%.%"; 22 | 23 | # Root password needed for creation of SLURM tables 24 | $SQLROOTPASS = "MyRootPassword"; 25 | 26 | # Attempt to autocreate DB if it doesn't exist 27 | $AUTOCREATE = 1; 28 | 29 | # Job log file. If no logfile, set to empty. 
30 | $JOBLOGFILE = "/var/log/slurm/joblog"; 31 | 32 | # Give rw access to slurm db from these hosts 33 | @SQLRWHOSTS = get_rw_nodes (); 34 | 35 | 1; 36 | 37 | sub get_rw_nodes 38 | { 39 | my $g = Genders->new (); 40 | my @nodes = $g->getnodes ("mysqld"); 41 | push (@nodes, $g->getnodes ("primgmt")); 42 | push (@nodes, $g->getnodes ("altmgmt")); 43 | 44 | # Include altnames 45 | push (@nodes, map { $g->getattrval ("altname", $_) } @nodes); 46 | 47 | return (@nodes); 48 | } 49 | # vi: ts=4 sw=4 50 | -------------------------------------------------------------------------------- /slurm-joblog.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | ############################################################################### 3 | # $Id$ 4 | #****************************************************************************** 5 | # Copyright (C) 2007-2009 Lawrence Livermore National Security, LLC. 6 | # Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). 7 | # Written by Adam Moody and 8 | # Mark Grondona 9 | # 10 | # UCRL-CODE-235340. 11 | # 12 | # This file is part of sqlog. 13 | # 14 | # This is free software; you can redistribute it and/or modify it 15 | # under the terms of the GNU General Public License as published by 16 | # the Free Software Foundation; either version 2 of the License, or 17 | # (at your option) any later version. 18 | # 19 | # This is distributed in the hope that it will be useful, but WITHOUT 20 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 21 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 22 | # for more details. 23 | # 24 | # You should have received a copy of the GNU General Public License 25 | # along with this program; if not, see <http://www.gnu.org/licenses/>.
26 | ############################################################################### 27 | # 28 | # This script is run by the SLURM controller at every job completion 29 | # to insert records in the job completion database. 30 | # 31 | # 32 | require 5.005; 33 | use strict; 34 | use lib qw(); # Required for _perl_libpaths RPM option 35 | use DBI; 36 | use File::Basename; 37 | use POSIX qw(strftime); 38 | use Hostlist qw(expand); 39 | 40 | # Required for _path_env_var RPM option 41 | $ENV{PATH} = '/bin:/usr/bin:/usr/sbin'; 42 | 43 | my $prog = basename $0; 44 | 45 | # List of job variables provided in ENV by SLURM. 46 | # 47 | my @SLURMvars = qw(JOBID UID JOBNAME JOBSTATE PARTITION LIMIT START END 48 | NODES PROCS NODECNT); 49 | 50 | # List of parameters (in order) to pass to SQL execute command below. 51 | # 52 | my @params = qw(jobid username uid jobname jobstate partition limit 53 | start end nodes nodecount procs); 54 | 55 | # 56 | # Set up SQL parameters 57 | # 58 | my %conf = (); 59 | $conf{db} = "slurm"; 60 | $conf{sqluser} = "slurm"; 61 | $conf{sqlpass} = ""; 62 | $conf{sqlhost} = "sqlhost"; 63 | $conf{stmt_v1} = qq(INSERT INTO slurm_job_log VALUES (?,?,?,?,?,?,?,?,?,?,?,?)); 64 | $conf{confdir} = "/etc/slurm"; 65 | 66 | # enables / disables node tracking per job in version 2 schema 67 | $conf{track} = 1; 68 | 69 | # assume neither version 1 nor version 2 are available 70 | $conf{version}{1} = 0; 71 | $conf{version}{2} = 0; 72 | 73 | # 74 | # Autocreate slurm_job_log DB if it doesn't exist? 75 | # 76 | $conf{autocreate} = 0; 77 | 78 | # 79 | # Default job logfile. If empty, no logfile is used. 80 | # 81 | $conf{joblogfile} = "/var/log/slurm/joblog"; 82 | 83 | # 84 | # Read db, user, password, host from config files: 85 | # 86 | read_config (); 87 | # 88 | # Get SLURM-provided env vars and add to config 89 | # 90 | get_slurm_vars (); 91 | 92 | # Append job log to database. 
93 | my $success = append_job_db (); 94 | 95 | # Append to text file if requested or DB failed. 96 | if ($conf{joblogfile} || !$success) { 97 | append_joblog (); 98 | } 99 | 100 | exit 0; 101 | 102 | 103 | # 104 | # Error logging functions: 105 | # 106 | sub log_msg 107 | { 108 | my @msg = @_; 109 | my $logfile = "/var/log/slurm/jobcomp.log"; 110 | 111 | if (!open (LOG, ">>$logfile")) { 112 | print STDERR @msg; 113 | return; 114 | } 115 | print LOG scalar localtime, ": ", @msg; 116 | close (LOG); 117 | } 118 | 119 | sub log_error 120 | { 121 | log_msg "$prog: Error: ", @_; 122 | return undef; 123 | } 124 | 125 | sub log_fatal 126 | { 127 | log_msg "$prog: Fatal: ", @_; 128 | exit 1; 129 | } 130 | 131 | 132 | sub read_config 133 | { 134 | my $ro = "$conf{confdir}/sqlog.conf"; 135 | my $rw = "$conf{confdir}/slurm-joblog.conf"; 136 | 137 | # First read sqlog config to get SQLHOST and SQLDB 138 | # (ignore SQLUSER) 139 | unless (my $rc = do $ro) { 140 | return log_error ("Couldn't parse $ro: $@\n") if $@; 141 | return log_error ("Couldn't run $ro\n") if (defined $rc && !$rc); 142 | } 143 | $conf{sqlhost} = $conf::SQLHOST if (defined $conf::SQLHOST); 144 | $conf{db} = $conf::SQLDB if (defined $conf::SQLDB); 145 | 146 | # enable / disable per job node tracking 147 | $conf{track} = $conf::TRACKNODES if (defined $conf::TRACKNODES); 148 | 149 | undef $conf::SQLUSER; 150 | undef $conf::SQLPASS; 151 | 152 | # Now read slurm-joblog.conf 153 | -r $rw || return log_error ("Unable to read required config file: $rw.\n"); 154 | unless (my $rc = do $rw) { 155 | return log_error ("Couldn't parse $rw: $@\n") if $@; 156 | return log_error ("Couldn't run $rw\n") if (defined $rc && !$rc); 157 | } 158 | 159 | $conf{sqluser} = $conf::SQLUSER if (defined $conf::SQLUSER); 160 | $conf{sqlpass} = $conf::SQLPASS if (defined $conf::SQLPASS); 161 | $conf{joblogfile} = $conf::JOBLOGFILE if (defined $conf::JOBLOGFILE); 162 | $conf{autocreate} = $conf::AUTOCREATE if (defined $conf::AUTOCREATE); 163 |
} 164 | 165 | 166 | sub get_slurm_vars 167 | { 168 | # if a job is cancelled before it starts, 169 | # set reasonable defaults for missing variables 170 | # PROCS may be set to the number of requested 171 | # processors (don't know), force it to 0 172 | # NODECNT may be set to the number of requested 173 | # nodes (don't know), we don't use this anyway 174 | if (not $ENV{NODES}) { 175 | $ENV{NODES} = ""; 176 | $ENV{PROCS} = 0; 177 | } 178 | 179 | # set fields in conf corresponding to each SLURM variable 180 | for my $var (@SLURMvars) { 181 | exists $ENV{$var} or 182 | log_fatal "$var not set in script environment! Aborting...\n"; 183 | $conf{lc $var} = $ENV{$var}; 184 | } 185 | 186 | # get username 187 | $conf{username} = getpwuid($conf{uid}); 188 | 189 | # If NODECNT wasn't set, try counting the list of nodes: 190 | # 191 | # XXX: SLURM's NODECNT variable is incorrect in many cases, 192 | # e.g. when a job's state is NODE_FAIL, NODECNT will have been 193 | # decremented at the time the job ends (from the failed node) 194 | # So, we unfortunately cannot trust the NODECNT variable here. 195 | # 196 | #if (defined $conf{nodecnt}) { 197 | # $conf{nodecount} = $conf{nodecnt}; 198 | #} 199 | #else { 200 | # $conf{nodecount} = ($conf{nodes} =~ /^\s*$/) ? 0 : expand($conf{nodes}); 201 | #} 202 | $conf{nodecount} = ($conf{nodes} =~ /^\s*$/) ? 0 : expand($conf{nodes}); 203 | } 204 | 205 | sub create_db 206 | { 207 | my $cmd = "sqlog-db-util --create"; 208 | system ($cmd); 209 | if ($?>>8) { 210 | log_error ("'$cmd' exited with exit code ", $?>>8, "\n"); 211 | return (0); 212 | } 213 | 214 | log_msg ("Created DB $conf{db} at host $conf{sqlhost}\n"); 215 | 216 | return (1); 217 | } 218 | 219 | ######################################## 220 | # The following functions are similar to those in sqlog-db-util 221 | # TODO: Move these to a perl module?
222 | ######################################## 223 | 224 | # cache for name ids, saves us from hitting the database 225 | # over and over at the cost of more memory 226 | # not really needed in this case (insert of a single job), 227 | # but this way, the functions are the same as sqlog-db-util 228 | my %IDcache = (); 229 | %{$IDcache{nodes}} = (); 230 | 231 | # execute (do) sql statement on dbh 232 | sub do_sql { 233 | my ($dbh, $stmt) = @_; 234 | if (!$dbh->do ($stmt)) { 235 | log_error ("SQL failed: $stmt\n"); 236 | return 0; 237 | } 238 | return 1; 239 | } 240 | 241 | # returns 1 if table exists, 0 otherwise 242 | sub table_exists 243 | { 244 | my $dbh = shift @_; 245 | my $table = shift @_; 246 | 247 | # check whether our database has a table by the proper name 248 | my $sth = $dbh->prepare("SHOW TABLES;"); 249 | if ($sth->execute()) { 250 | while (my ($name) = $sth->fetchrow_array()) { 251 | if ($name eq $table) { return 1; } 252 | } 253 | } 254 | 255 | # didn't find it 256 | return 0; 257 | } 258 | 259 | # return the auto increment value for the last inserted record 260 | sub get_last_insert_id 261 | { 262 | my $dbh = shift @_; 263 | my $id = undef; 264 | 265 | my $sql = "SELECT LAST_INSERT_ID();"; 266 | my $sth = $dbh->prepare($sql); 267 | if ($sth->execute()) { 268 | ($id) = $sth->fetchrow_array(); 269 | } else { 270 | log_error ("Fetching last id: $sql\n"); 271 | } 272 | 273 | return $id; 274 | } 275 | 276 | # given a table and name, 277 | # read id for name from table and add to id cache if found 278 | sub read_id 279 | { 280 | my $dbh = shift @_; 281 | my $table = shift @_; 282 | my $name = shift @_; 283 | 284 | my $id = undef; 285 | 286 | # if name is not set, don't try to look it up in hash, just return undef 287 | if (not defined $name) { return $id; } 288 | 289 | if (not defined $IDcache{$table}) { %{$IDcache{$table}} = (); } 290 | if (not defined $IDcache{$table}{$name}) { 291 | my $q_name = $dbh->quote($name); 292 | my $sql = "SELECT * FROM 
`$table` WHERE `name` = $q_name;"; 293 | my $sth = $dbh->prepare($sql); 294 | if ($sth->execute ()) { 295 | my ($table_id, $table_name) = $sth->fetchrow_array (); 296 | if (defined $table_id and defined $table_name) { 297 | $IDcache{$table}{$name} = $table_id; 298 | $id = $table_id; 299 | } 300 | } else { 301 | log_error ("Reading record: $sql --> " . $dbh->errstr . "\n"); 302 | } 303 | } else { 304 | $id = $IDcache{$table}{$name}; 305 | } 306 | 307 | return $id; 308 | } 309 | 310 | # insert name into table if it does not exist, and return its id 311 | sub read_write_id 312 | { 313 | my $dbh = shift @_; 314 | my $table = shift @_; 315 | my $name = shift @_; 316 | 317 | # attempt to read the id first, 318 | # if not found, insert it and return the last insert id 319 | my $id = read_id ($dbh, $table, $name); 320 | if (not defined $id) { 321 | my $q_name = $dbh->quote($name); 322 | my $sql = "INSERT IGNORE INTO `$table` (`id`,`name`)" . 323 | " VALUES (NULL,$q_name);"; 324 | my $sth = $dbh->prepare($sql); 325 | if ($sth->execute ()) { 326 | # use read_id here instead of get_last_insert_id 327 | # to avoid race conditions 328 | $id = read_id ($dbh, $table, $name); 329 | if (not defined $id) { 330 | log_error ("Error inserting new record (id undefined): $sql\n"); 331 | $id = 0; 332 | } elsif ($id == 0) { 333 | log_error ("Error inserting new record (id=0): $sql\n"); 334 | $id = 0; 335 | } 336 | } else { 337 | log_error ("Error inserting new record: $sql --> " . 338 | $dbh->errstr .
"\n"); 339 | $id = 0; 340 | } 341 | } 342 | 343 | return $id; 344 | } 345 | 346 | # given a reference to a list of nodes, 347 | # read their ids from the nodes table and add them to the id cache 348 | sub read_node_ids 349 | { 350 | my $dbh = shift @_; 351 | my $nodes_ref = shift @_; 352 | my $success = 1; 353 | 354 | # build list of nodes not in our cache 355 | my @missing_nodes = (); 356 | foreach my $node (@$nodes_ref) { 357 | if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; } 358 | } 359 | 360 | # if any missing nodes, try to look up their values 361 | if (@missing_nodes > 0) { 362 | my @q_nodes = map $dbh->quote($_), @missing_nodes; 363 | my $in_nodes = join(",", @q_nodes); 364 | my $sql = "SELECT * FROM `nodes` WHERE `name` IN ($in_nodes);"; 365 | my $sth = $dbh->prepare($sql); 366 | if ($sth->execute ()) { 367 | while (my ($table_id, $table_name) = $sth->fetchrow_array ()) { 368 | $IDcache{nodes}{$table_name} = $table_id; 369 | } 370 | } else { 371 | log_error ("Reading nodes: $sql --> " . $dbh->errstr . 
"\n"); 372 | $success = 0; 373 | } 374 | } 375 | 376 | return $success; 377 | } 378 | 379 | # given a reference to a list of nodes, 380 | # insert them into the nodes table and add their ids to the id cache 381 | sub read_write_node_ids 382 | { 383 | my $dbh = shift @_; 384 | my $nodes_ref = shift @_; 385 | my $success = 1; 386 | 387 | # read node_ids for these nodes into our cache 388 | read_node_ids($dbh, $nodes_ref); 389 | 390 | # if still missing nodes, we need to insert them 391 | my @missing_nodes = (); 392 | foreach my $node (@$nodes_ref) { 393 | if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; } 394 | } 395 | if (@missing_nodes > 0) { 396 | my @q_nodes = map $dbh->quote($_), @missing_nodes; 397 | my $values = join("),(", @q_nodes); 398 | my $sql = "INSERT IGNORE INTO `nodes` (`name`) VALUES ($values);"; 399 | my $sth = $dbh->prepare($sql); 400 | if (not $sth->execute ()) { 401 | log_error ("Inserting nodes: $sql --> " . $dbh->errstr . "\n"); 402 | $success = 0; 403 | } 404 | 405 | # fetch ids for just inserted nodes 406 | read_node_ids($dbh, $nodes_ref); 407 | } 408 | 409 | return $success; 410 | } 411 | 412 | # given a job_id and a nodelist, 413 | # insert jobs_nodes records for each node used in job_id 414 | sub insert_job_nodes 415 | { 416 | my $dbh = shift @_; 417 | my $job_id = shift @_; 418 | my $nodelist = shift @_; 419 | my $success = 1; 420 | 421 | if (defined $job_id and defined $nodelist and $nodelist ne "") { 422 | my $q_job_id = $dbh->quote($job_id); 423 | 424 | # clean up potentially bad nodelist 425 | if ($nodelist =~ /\[/ and $nodelist !~ /\]/) { 426 | # found an opening bracket, but no closing bracket, 427 | # nodelist is probably incomplete 428 | # chop back to last ',' or '-' and replace with a ']' 429 | $nodelist =~ s/[,-]\d+$/\]/; 430 | } 431 | 432 | # get our nodeset 433 | my @nodes = Hostlist::expand($nodelist); 434 | 435 | # this will fill our node_id cache 436 | read_write_node_ids($dbh, \@nodes); 437 | 438 | # 
get the node_id for each node 439 | my @values = (); 440 | foreach my $node (@nodes) { 441 | if (defined $IDcache{nodes}{$node}) { 442 | my $q_node_id = $dbh->quote($IDcache{nodes}{$node}); 443 | push @values, "($q_job_id,$q_node_id)"; 444 | } 445 | } 446 | 447 | # if we have any nodes for this job, insert them 448 | if (@values > 0) { 449 | my $sql = "INSERT DELAYED IGNORE INTO `jobs_nodes`" . 450 | " (`job_id`,`node_id`)" . 451 | " VALUES " . join(",", @values) . ";"; 452 | my $sth = $dbh->prepare($sql); 453 | if (not $sth->execute ()) { 454 | log_error ("Inserting jobs_nodes records for job id" . 455 | " $job_id: $sql --> " . $dbh->errstr . "\n"); 456 | $success = 0; 457 | } 458 | } 459 | } 460 | 461 | return $success; 462 | } 463 | 464 | # compute time since epoch, attempt to account for DST changes via timelocal 465 | sub get_seconds 466 | { 467 | my ($date) = @_; 468 | use Time::Local; 469 | 470 | my ($y, $m, $d, $H, $M, $S) = ($date =~ /(\d\d\d\d)\-(\d\d)\-(\d\d) (\d\d):(\d\d):(\d\d)/); 471 | $y -= 1900; 472 | $m -= 1; 473 | 474 | return timelocal ($S, $M, $H, $d, $m, $y); 475 | } 476 | 477 | # given hash of values, create mysql values string for insert statement 478 | sub value_string_v2 479 | { 480 | my $dbh = shift @_; 481 | my $h = shift @_; 482 | 483 | # given start and end times, compute the number of 484 | # seconds the job ran for 485 | # TODO: unsure whether this correctly handles jobs 486 | # that straddle DST changes 487 | my $seconds = 0; 488 | if (defined $h->{StartTime} and $h->{StartTime} !~ /^\s*$/ and 489 | defined $h->{EndTime} and $h->{EndTime} !~ /^\s*$/) 490 | { 491 | my $start = get_seconds($h->{StartTime}); 492 | my $end = get_seconds($h->{EndTime}); 493 | $seconds = $end - $start; 494 | if ($seconds < 0) { $seconds = 0; } 495 | } 496 | 497 | # if Procs is not set, but ppn is specified and NodeCnt is set, 498 | # compute Procs (assumes all processors on the node were 499 | # allocated to the job, only use for clusters which use 500 | # 
whole-node allocation) 501 | # if (not defined $h->{Procs} and defined $conf{ppn} and 502 | # defined $h->{NodeCnt} 503 | # ) 504 | # { 505 | # $h->{Procs} = $h->{NodeCnt} * $conf{ppn}; 506 | # } 507 | 508 | # insert the field values, order matters 509 | my @parts = (); 510 | push @parts, (defined $h->{Id}) ? $dbh->quote($h->{Id}) : "NULL"; 511 | push @parts, $dbh->quote($h->{JobId}); 512 | push @parts, $dbh->quote(read_write_id($dbh, "usernames", $h->{UserName})); 513 | push @parts, $dbh->quote($h->{UserNumb}); 514 | push @parts, $dbh->quote(read_write_id($dbh, "jobnames", $h->{Name})); 515 | push @parts, $dbh->quote(read_write_id($dbh, "jobstates", $h->{JobState})); 516 | push @parts, $dbh->quote(read_write_id($dbh, "partitions", $h->{Partition})); 517 | push @parts, $dbh->quote($h->{TimeLimit}); 518 | push @parts, $dbh->quote($h->{StartTime}); 519 | push @parts, $dbh->quote($h->{EndTime}); 520 | push @parts, $dbh->quote($seconds); 521 | push @parts, $dbh->quote($h->{NodeList}); 522 | push @parts, $dbh->quote($h->{NodeCnt}); 523 | push @parts, (defined $h->{Procs}) ? $dbh->quote($h->{Procs}) : 0; 524 | 525 | # finally, return the ('field1','field2',...) string 526 | return "(" . join(',', @parts) . ")"; 527 | } 528 | 529 | ######################################## 530 | # The above functions are similar to those in sqlog-db-util 531 | # TODO: Move these to a perl module? 
532 | ######################################## 533 | 534 | # 535 | # Append data to SLURM job log (database) 536 | # 537 | sub append_job_db 538 | { 539 | # Ignore if no sqlhost, just append to txt joblog 540 | # 541 | if (!$conf{"sqlhost"}) { 542 | log_error "No SQLHOST found $conf{sqlhost}\n"; 543 | return 0; 544 | } 545 | 546 | my $str = "DBI:mysql:database=$conf{db};host=$conf{sqlhost}"; 547 | my $dbh = DBI->connect($str, $conf{sqluser}, $conf{sqlpass}); 548 | 549 | if (!$dbh) { 550 | if (!$conf{autocreate}) { 551 | log_error ("Failed to connect to DB at $conf{sqlhost}: ", 552 | "$DBI::errstr\n"); 553 | return (0); 554 | } 555 | create_db() 556 | or return (0); 557 | $dbh = DBI->connect($str, $conf{sqluser}, $conf{sqlpass}) 558 | or return (0); 559 | } 560 | 561 | # check whether we have version 1 and version 2 schemas 562 | $conf{version}{1} = table_exists ($dbh, 'slurm_job_log'); 563 | $conf{version}{2} = table_exists ($dbh, 'jobs'); 564 | 565 | # Check for tables, if not found, try to create them 566 | if (not $conf{version}{1} and not $conf{version}{2}) { 567 | log_msg ("SLURM job log table doesn't exist in DB. 
Creating.\n"); 568 | create_db () or return (0); 569 | } 570 | 571 | # if we have schema 2 use it, otherwise, try schema 1 572 | # if neither is found, print an error 573 | if ($conf{version}{2}) { 574 | # value_string_v2 expects certain field names, so convert conf 575 | my %h = (); 576 | $h{JobId} = $conf{jobid}; 577 | $h{UserName} = $conf{username}; 578 | $h{UserNumb} = $conf{uid}; 579 | $h{Name} = $conf{jobname}; 580 | $h{JobState} = $conf{jobstate}; 581 | $h{Partition} = $conf{partition}; 582 | $h{TimeLimit} = $conf{limit}; 583 | $h{StartTime} = convtime_db("start"); 584 | $h{EndTime} = convtime_db("end"); 585 | $h{NodeList} = $conf{nodes}; 586 | $h{NodeCnt} = $conf{nodecount}; 587 | $h{Procs} = $conf{procs}; 588 | 589 | # convert hash to VALUES clause 590 | my $value_string = value_string_v2 ($dbh, \%h); 591 | 592 | # insert into v2 schema 593 | my $sql = "INSERT INTO `jobs` VALUES $value_string;"; 594 | if (not do_sql ($dbh, $sql)) { 595 | log_error "Problem inserting into slurm table:" . 
596 | " $sql: error: ", $dbh->errstr, "\n"; 597 | return 0; 598 | } 599 | 600 | # insert nodes used by this job if node tracking is enabled 601 | if ($conf{track}) { 602 | my $job_id = get_last_insert_id ($dbh); 603 | if (defined $job_id and $job_id != 0) { 604 | insert_job_nodes ($dbh, $job_id, $h{NodeList}); 605 | } 606 | } 607 | } elsif ($conf{version}{1}) { 608 | # insert into v1 schema 609 | my @params_v1 = @params; 610 | pop @params_v1; 611 | 612 | my $sth_v1 = $dbh->prepare($conf{stmt_v1}) 613 | or log_error "prepare: ", $dbh->errstr, "\n"; 614 | 615 | if (not $sth_v1->execute("NULL", map {convtime_db($_)} @params_v1)) { 616 | log_error "Problem inserting into slurm table: ", 617 | $dbh->errstr, "\n"; 618 | return 0; 619 | } 620 | } else { 621 | log_error "No tables found to insert record into\n"; 622 | return 0; 623 | } 624 | 625 | $dbh->disconnect; 626 | return 1; 627 | } 628 | 629 | sub convtime_db 630 | { 631 | my ($var) = @_; 632 | my $fmt = "%Y-%m-%d %H:%M:%S"; 633 | 634 | $var =~ /^(start|end)$/ && return strftime $fmt, localtime ($conf{$var}); 635 | return $conf{$var}; 636 | } 637 | 638 | 639 | sub convtime 640 | { 641 | my ($var) = @_; 642 | my $fmt = "%Y-%m-%dT%H:%M:%S"; 643 | 644 | $var =~ /^(start|end)$/ && return strftime $fmt, localtime ($conf{$var}); 645 | return $conf{$var}; 646 | } 647 | 648 | # 649 | # Append data to SLURM job log (text file) 650 | # 651 | sub append_joblog 652 | { 653 | my $joblog = $conf{joblogfile}; 654 | 655 | if (!open (JOBLOG, ">>$joblog")) { 656 | log_error "Unable to open $joblog: $!\n"; 657 | return 0; 658 | } 659 | 660 | printf JOBLOG "JobId=%s UserId=%s(%s) Name=%s JobState=%s Partition=%s " . 661 | "TimeLimit=%s StartTime=%s EndTime=%s NodeList=%s " . 
662 | "NodeCnt=%s Procs=%s\n", 663 | map {convtime($_)} @params; 664 | 665 | close (JOBLOG); 666 | } 667 | 668 | # vi: ts=4 sw=4 expandtab 669 | -------------------------------------------------------------------------------- /sqlog-db-util: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl -w 2 | ############################################################################### 3 | # $Id$ 4 | #****************************************************************************** 5 | # Copyright (C) 2007-2009 Lawrence Livermore National Security, LLC. 6 | # Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER). 7 | # Written by Adam Moody and 8 | # Mark Grondona 9 | # 10 | # UCRL-CODE-235340. 11 | # 12 | # This file is part of sqlog. 13 | # 14 | # This is free software; you can redistribute it and/or modify it 15 | # under the terms of the GNU General Public License as published by 16 | # the Free Software Foundation; either version 2 of the License, or 17 | # (at your option) any later version. 18 | # 19 | # This is distributed in the hope that it will be useful, but WITHOUT 20 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 21 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 22 | # for more details. 23 | # 24 | # You should have received a copy of the GNU General Public License 25 | # along with this program; if not, see <http://www.gnu.org/licenses/>. 26 | ############################################################################### 27 | # 28 | # sqlog-db-util - SQLOG job database maintenance.
29 | # 30 | ############################################################################### 31 | use strict; 32 | use lib qw(); # Required for _perl_libpaths RPM option 33 | use DBI; 34 | use Digest::SHA1 qw/ sha1_hex /; 35 | use Getopt::Long qw/ :config gnu_getopt ignore_case /; 36 | use File::Basename; 37 | use Hostlist; 38 | use Time::HiRes qw( gettimeofday ); 39 | 40 | # This file contains the SQL statements needed 41 | # to set up a 'slurm_job_log' table in a 'slurm' DB 42 | # on a MySQL server. 43 | # 44 | # It can also be used to backfill the database by 45 | # inserting records from a list of slurm job completion 46 | # logfiles. 47 | # 48 | # Adam Moody 49 | 50 | # Required for _path_env_var RPM option 51 | $ENV{PATH} = '/bin:/usr/bin:/usr/sbin'; 52 | 53 | my %conf = (); 54 | 55 | ############################## 56 | # Usage: 57 | ############################# 58 | my $progname = basename $0; 59 | 60 | $conf{usage} = < \$conf{help}, 158 | "verbose|v+" => \$conf{verbose}, 159 | "info|i" => \$conf{info}, 160 | "drop|d=i" => \$conf{drop}, 161 | "create|c" => \$conf{create}, 162 | "backfill|b" => \$conf{backfill}, 163 | "convert|x" => \$conf{convert}, 164 | "backup|B=s" => \$conf{backup}, 165 | "obfuscate|o" => \$conf{obfuscate}, 166 | "prune|p=s" => \$conf{prune}, 167 | "cores-per-node|C=i" => \$conf{cores}, 168 | "notrack" => sub { $conf{track} = 0; }, 169 | "delay-index" => sub { $conf{indicies} = 0; }, 170 | "localhost|L" => \$conf{localhost}, 171 | "recalc-nodecnt" => \$conf{recalculate_nodecount}, 172 | ) or usage (); 173 | 174 | if (!$conf{create} && !$conf{convert} && !$conf{drop} && 175 | !$conf{backfill} && !$conf{backup} && !$conf{prune} && 176 | !$conf{info} && !$conf{help}) { 177 | log_error ("Specify at least one of " . 
178 | "--{create,convert,drop,backfill,backup,prune,info}.\n"); 179 | usage (); 180 | } 181 | 182 | if ($conf{help}) { 183 | usage (); 184 | } 185 | 186 | ############################# 187 | # Attempt to connect to slurm database 188 | ############################# 189 | 190 | # test whether slurm db already exists by trying to connect 191 | my $dbh = connect_db_rw (); 192 | 193 | # backup table data -- writes records to a file readable by backfill 194 | # global variables to help obfuscate user and jobnames 195 | my %obfuscate = (); 196 | my $num_users = 0; 197 | my $num_jobs = 0; 198 | 199 | if ($conf{backup} or defined $conf{prune}) { 200 | # check that we have a db connection 201 | if (!$dbh) { 202 | log_fatal ("Data dump requested, but connection to database failed!\n") 203 | } 204 | 205 | # check that user gave us exactly one file name 206 | if (@ARGV != 1) { 207 | log_fatal ("You must specify a date range and" . 208 | " a filename to append data to.\n"); 209 | } 210 | my $joblog = shift @ARGV; 211 | 212 | # dump data to joblog file 213 | if (table_exists ($dbh, "slurm_job_log")) { 214 | dump_slurm_joblog_table (1, $dbh, $joblog); 215 | } 216 | if (table_exists ($dbh, "jobs")) { 217 | dump_slurm_joblog_table (2, $dbh, $joblog); 218 | } 219 | } 220 | 221 | # 222 | # Drop existing tables 223 | # 224 | if ($conf{drop}) { 225 | if ($dbh) { 226 | if ($conf{drop} == 1) { 227 | log_verbose ("drop: Dropping version 1 tables\n"); 228 | drop_slurm_joblog_table_v1 ($dbh); 229 | } elsif ($conf{drop} == 2) { 230 | log_verbose ("drop: Dropping version 2 tables\n"); 231 | drop_slurm_joblog_table_v2 ($dbh); 232 | } else { 233 | log_verbose ("drop: Unknown schema version: $conf{drop}\n"); 234 | } 235 | $dbh = disconnect_db_rw (); 236 | } else { 237 | log_verbose ("drop: No existing slurm DB to drop\n"); 238 | } 239 | # TODO: should we also delete the slurm db and users 240 | # (i.e., undo everything create does?) 
241 | } 242 | 243 | # 244 | # Create database 245 | # 246 | if ($conf{create} && $dbh) { 247 | # if version 2 tables do not exist, create them 248 | if (not table_exists ($dbh, "jobs")) { 249 | log_verbose ("create: Creating version 2 tables.\n"); 250 | create_slurm_joblog_table_v2 ($dbh); 251 | } else { 252 | log_verbose ("create: SLURM database already exists.\n"); 253 | } 254 | } elsif ($conf{create} && !$dbh) { 255 | # the db may not exist (couldn't connect), try to create it 256 | create_db_and_slurm_users (); 257 | 258 | # try to connect again 259 | $dbh = connect_db_rw() 260 | or log_fatal ("create: Failed to connect to SLURM DB after create!\n"); 261 | 262 | # create version 2 from the beginning on a brand new install 263 | log_verbose ("create: Creating version 2 tables.\n"); 264 | 265 | create_slurm_joblog_table_v2 ($dbh); 266 | } 267 | 268 | # 269 | # Convert slurm_job_log table to version 2 270 | # (add corecount and extend nodelist columns) 271 | # 272 | if ($conf{convert}) { 273 | # 274 | # Attempt to convert table to version 2, if conversion fails 275 | # print an error. 276 | # 277 | # If the table has already been converted, a message is printed 278 | # and no action is taken 279 | # 280 | if (!$dbh) { 281 | log_fatal ("convert: Conversion requested," . 282 | " but connection to database failed.\n") 283 | } 284 | log_verbose ("convert: Initiating conversion from" . 285 | " version 1 to version 2 tables.\n"); 286 | if (!convert_slurm_joblog_table_from_v1_to_v2 ($dbh)) { 287 | log_fatal ("convert: SLURM job log table conversion failed.\n"); 288 | } 289 | } 290 | 291 | # 292 | # Backfill from logfiles 293 | # 294 | if ($conf{backfill}) { 295 | if (!$dbh) { 296 | log_fatal ("backfill: Backfill requested," . 
297 | " but connection to database failed!\n") 298 | } 299 | # if we find the version 2 schema, backfill to it 300 | # otherwise, if we find the version 1 schema, backfill to it 301 | # if we find neither, throw an error 302 | if (table_exists ($dbh, "jobs")) { 303 | backfill_slurm_joblog_table_to_v2 ($dbh, @ARGV); 304 | } elsif (table_exists ($dbh, "slurm_job_log")) { 305 | backfill_slurm_joblog_table_to_v1 ($dbh, @ARGV); 306 | } else { 307 | log_fatal ("backfill: Unknown schema version.\n"); 308 | } 309 | } 310 | 311 | if ($conf{info}) { 312 | show_info (); 313 | } 314 | 315 | disconnect_db_rw (); 316 | 317 | exit 0; 318 | 319 | ############################# 320 | # Support functions 321 | ############################# 322 | 323 | sub db_host_string 324 | { 325 | return $conf{localhost} ? "localhost" : $conf{sqlhost}; 326 | } 327 | 328 | sub connect_db_rw 329 | { 330 | my $host = db_host_string (); 331 | my $cstr = "DBI:mysql(PrintError=>0):" . 332 | "database=$conf{db};host=$host:"; 333 | 334 | my $dbh = DBI->connect($cstr, $conf{rw}{sqluser}, $conf{rw}{sqlpass}) 335 | or log_verbose ("Unable to connect to MySQL DB as ", 336 | "$conf{rw}{sqluser}\@$conf{sqlhost}: ", $DBI::errstr, "\n"); 337 | 338 | $conf{dbh}{rw} = $dbh; 339 | 340 | return ($dbh); 341 | } 342 | 343 | sub disconnect_db_rw 344 | { 345 | return if !$conf{dbh}{rw}; 346 | $conf{dbh}{rw}->disconnect; 347 | return $conf{dbh}{rw} = undef; 348 | } 349 | 350 | sub connect_db_root 351 | { 352 | my $host = db_host_string (); 353 | my $str = "DBI:mysql(PrintError=>0):host=$host;"; 354 | 355 | $conf{dbh}{root} = DBI->connect ($str, "root", $conf{rw}{rootpass}) 356 | or log_fatal ("Unable to connect to MySQL DB as root\@$host: ", 357 | $DBI::errstr, "\n"); 358 | 359 | return ($conf{dbh}{root}); 360 | } 361 | 362 | # returns 1 if table exists, 0 otherwise 363 | sub table_exists 364 | { 365 | my $dbh = shift @_; 366 | my $table = shift @_; 367 | 368 | # check whether our database has a table by the proper name 
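The `table_exists` helper above probes for a table by iterating the result of MySQL's `SHOW TABLES`. As an illustrative sketch only (not part of this utility), the same existence check can be expressed in Python against an in-memory SQLite database; SQLite has no `SHOW TABLES`, so the sketch queries `sqlite_master` instead, and all names here are hypothetical:

```python
# Illustrative analogue of table_exists(): ask the database for its table
# names and scan for a match.  Uses SQLite's sqlite_master catalog in
# place of MySQL's SHOW TABLES.
import sqlite3

def table_exists(conn, table):
    """Return True if `table` exists in the connected database."""
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return any(name == table for (name,) in cur.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY)")
```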
369 | my $sth = $dbh->prepare("SHOW TABLES;"); 370 | if ($sth->execute()) { 371 | while (my ($name) = $sth->fetchrow_array()) { 372 | if ($name eq $table) { return 1; } 373 | } 374 | } 375 | 376 | # didn't find it 377 | return 0; 378 | } 379 | 380 | sub read_config 381 | { 382 | my $ro = "$conf{confdir}/sqlog.conf"; 383 | my $rw = "$conf{confdir}/slurm-joblog.conf"; 384 | 385 | # First read sqlog config to get SQLHOST and SQLDB 386 | # (ignore SQLUSER/SQLPASS) 387 | unless (my $rc = do $ro) { 388 | log_fatal ("Couldn't parse $ro: $@\n") if $@; 389 | log_fatal ("couldn't run $ro\n") if (defined $rc && !$rc); 390 | } 391 | 392 | $conf{db} = $conf::SQLDB if (defined $conf::SQLDB); 393 | $conf{sqlhost} = $conf::SQLHOST if (defined $conf::SQLHOST); 394 | $conf{ro}{sqluser} = $conf::SQLUSER if (defined $conf::SQLUSER); 395 | $conf{ro}{sqlpass} = $conf::SQLPASS if (defined $conf::SQLPASS); 396 | 397 | # enable / disable per job node tracking 398 | $conf{track} = $conf::TRACKNODES if (defined $conf::TRACKNODES); 399 | 400 | undef $conf::SQLUSER; 401 | undef $conf::SQLPASS; 402 | 403 | # Now read slurm-joblog.conf 404 | -r $rw || log_fatal ("Unable to read required config file: $rw.\n"); 405 | unless (my $rc = do $rw) { 406 | log_fatal ("Couldn't parse $rw: $@\n") if $@; 407 | log_fatal ("couldn't run $rw\n") if (defined $rc && !$rc); 408 | } 409 | 410 | $conf{rw}{sqluser} = $conf::SQLUSER if (defined $conf::SQLUSER); 411 | $conf{rw}{sqlpass} = $conf::SQLPASS if (defined $conf::SQLPASS); 412 | $conf{rw}{rootpass} = $conf::SQLROOTPASS if (defined $conf::SQLROOTPASS); 413 | $conf{rw}{sqlnetwork} = $conf::SQLNETWORK if (defined $conf::SQLNETWORK); 414 | 415 | @{$conf{rw}{hosts}} = @conf::SQLRWHOSTS if (@conf::SQLRWHOSTS); 416 | 417 | my %seen; 418 | @{$conf{rw}{hosts}} = grep {$_ && !$seen{$_}++} @{$conf{rw}{hosts}}; 419 | 420 | } 421 | 422 | # Connect to MySQL as root user to build slurm db 423 | # and insert slurm and slurm_read users 424 | sub create_db_and_slurm_users 425 
| { 426 | my $dbh = connect_db_root () 427 | or log_fatal ("Couldn't connect to database as root\n"); 428 | 429 | # 430 | # Abort if the slurm_job_log or jobs table already exists. 431 | if (table_exists ($dbh, "slurm_job_log") or table_exists ($dbh, "jobs")) { 432 | log_msg ("create: SLURM job log table exists. No create necessary.\n"); 433 | return; 434 | } 435 | 436 | ############################# 437 | # Create slurm db / table 438 | ############################# 439 | 440 | log_verbose ("Creating slurm DB\n"); 441 | do_sql ($dbh, "CREATE DATABASE IF NOT EXISTS $conf{db};"); 442 | 443 | ############################# 444 | # Set up slurm (r/w) and slurm_read (r/o) access 445 | ############################# 446 | 447 | # Switch to the management database 448 | do_sql($dbh, "USE mysql;"); 449 | 450 | log_verbose ("Dropping previous slurm joblog db users and privileges.\n"); 451 | drop_slurm_users ($dbh); 452 | 453 | # set up permissions for different users of slurm database 454 | for my $host (@{$conf{rw}{hosts}}, "localhost") { 455 | my $user = $conf{rw}{sqluser}; 456 | log_verbose ("Granting rw privileges to $user on $host\n"); 457 | do_sql ($dbh, 458 | "GRANT ALL ON $conf{db}.* TO" . 459 | " '$user'\@'$host'" . 460 | " IDENTIFIED BY '$conf{rw}{sqlpass}'"); 461 | } 462 | 463 | log_verbose ("Granting readonly privs to $conf{ro}{sqluser} " . 464 | "on $conf{rw}{sqlnetwork}.\n"); 465 | do_sql ($dbh, 466 | "GRANT SELECT ON $conf{db}.* TO" . 467 | " $conf{ro}{sqluser}\@'$conf{rw}{sqlnetwork}'" .
468 | " IDENTIFIED BY ''"); 469 | 470 | # flush privileges to make our changes current 471 | log_verbose ("FLUSH PRIVILEGES\n"); 472 | do_sql($dbh, "FLUSH PRIVILEGES;"); 473 | 474 | # we're done 475 | log_verbose ("Done creating slurm joblog DB.\n"); 476 | } 477 | 478 | sub show_info 479 | { 480 | my $dbh = connect_db_rw () or return; 481 | 482 | # determine what schema version we're at 483 | my $version = "UNKNOWN"; 484 | if (table_exists ($dbh, "jobs")) { 485 | $version = 2; 486 | } elsif (table_exists ($dbh, "slurm_job_log")) { 487 | $version = 1; 488 | } 489 | 490 | log_verbose ("Connected to joblog database version $version\n"); 491 | 492 | # count the number of jobs in version 1 493 | my $count_v1 = 0; 494 | my $stmt = "SELECT COUNT(*) FROM `$conf{db}`.`slurm_job_log`;"; 495 | my $sth = $dbh->prepare ($stmt) or return; 496 | if ($sth->execute ()) { ($count_v1) = $sth->fetchrow_array; } 497 | 498 | # count the number of jobs in version 2 499 | my $count_v2 = 0; 500 | $stmt = "SELECT COUNT(*) FROM `$conf{db}`.`jobs`;"; 501 | $sth = $dbh->prepare ($stmt) or return; 502 | if ($sth->execute ()) { ($count_v2) = $sth->fetchrow_array; } 503 | 504 | # add the job counts to get the total 505 | my $count = $count_v1 + $count_v2; 506 | 507 | # now we're ready to print 508 | log_msg ("Information for SLURM job log DB:\n"); 509 | print "DB Host: $conf{sqlhost}\n"; 510 | print "DB User: $conf{ro}{sqluser}\n"; 511 | print "RW User: $conf{rw}{sqluser}\n"; 512 | print "SLURM DB: $conf{db}\n"; 513 | print "Version: $version\n"; 514 | print "Job count: $count\n"; 515 | 516 | return; 517 | } 518 | 519 | sub drop_slurm_users 520 | { 521 | my $dbh = shift @_; 522 | my $stmt = "SELECT user,host from mysql.user;"; 523 | my @oldusers = (); 524 | 525 | my $sth = $dbh->prepare ($stmt) or return; 526 | $sth->execute () or return; 527 | 528 | while ((my $a = $sth->fetchrow_arrayref)) { 529 | if ($a->[0] ne "$conf{ro}{sqluser}" && 530 | $a->[0] ne "$conf{rw}{sqluser}" ) { 531 | next; 532 |
} 533 | push (@oldusers, "$a->[0]\@'$a->[1]'"); 534 | } 535 | do_sql ($dbh, "DROP USER " . join (", ", @oldusers)) if @oldusers; 536 | } 537 | 538 | # execute (do) sql statement on dbh 539 | sub do_sql { 540 | my ($dbh, $stmt) = @_; 541 | log_debug ("SQL: [$stmt]\n"); 542 | my $rv = $dbh->do ($stmt); 543 | if (not $rv) { 544 | log_error ("FAILED SQL: $stmt ERROR: " . $dbh->errstr . "\n"); 545 | return 0; 546 | } 547 | return 1; 548 | } 549 | 550 | #################### 551 | # Schema version 1 functions 552 | #################### 553 | 554 | # drop the table 555 | sub drop_slurm_joblog_table_v1 556 | { 557 | my $dbh = shift @_; 558 | my $success = 1; 559 | 560 | # switch to the slurm db 561 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 562 | 563 | # now drop the tables 564 | log_verbose ("drop: Dropping existing 'slurm_job_log' table\n"); 565 | my $sql = "DROP TABLE `slurm_job_log`;"; 566 | if (not do_sql ($dbh, $sql)) { $success = 0; } 567 | 568 | return $success; 569 | } 570 | 571 | # build the table 572 | sub create_slurm_joblog_table_v1 573 | { 574 | my $dbh = shift @_; 575 | my $success = 1; 576 | 577 | # switch to the slurm db 578 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 579 | 580 | # keep this schema around for historical record 581 | # (could enable one to build a v1 table if so desired) 582 | my $sql = "CREATE TABLE IF NOT EXISTS slurm_job_log ( 583 | id int(10) NOT NULL AUTO_INCREMENT, 584 | jobid int(10) NOT NULL, 585 | username char(100) NOT NULL, 586 | userid int(10) NOT NULL, 587 | jobname char(100) NOT NULL, 588 | jobstate char(25) NOT NULL, 589 | partition char(25) NOT NULL, 590 | timelimit int(10) NOT NULL, 591 | starttime datetime NOT NULL, 592 | endtime datetime NOT NULL, 593 | nodelist varchar(1024) NOT NULL, 594 | nodecount int(10) NOT NULL, 595 | PRIMARY KEY (id), 596 | UNIQUE INDEX jobid (jobid,starttime), 597 | INDEX username (username) 598 | ) ENGINE=MyISAM;"; 599 | if (not do_sql ($dbh, $sql)) {
$success = 0; } 600 | 601 | return $success; 602 | } 603 | 604 | # given hash of values, create mysql values string for insert statement 605 | sub value_string_v1 606 | { 607 | my $dbh = shift @_; 608 | my $h = shift @_; 609 | 610 | my @parts = (); 611 | push @parts, "NULL"; 612 | push @parts, $dbh->quote($h->{JobId}); 613 | push @parts, $dbh->quote($h->{UserName}); 614 | push @parts, $dbh->quote($h->{UserNumb}); 615 | push @parts, $dbh->quote($h->{Name}); 616 | push @parts, $dbh->quote($h->{JobState}); 617 | push @parts, $dbh->quote($h->{Partition}); 618 | push @parts, $dbh->quote($h->{TimeLimit}); 619 | push @parts, $dbh->quote($h->{StartTime}); 620 | push @parts, $dbh->quote($h->{EndTime}); 621 | push @parts, $dbh->quote($h->{NodeList}); 622 | push @parts, $dbh->quote($h->{NodeCnt}); 623 | 624 | return "(" . join(',', @parts) . ")"; 625 | } 626 | 627 | # do a batch insert to be more efficient 628 | sub insert_values_v1 629 | { 630 | my $dbh = shift @_; 631 | my @values = @_; 632 | 633 | while (@values) { 634 | my @subvalues = (); 635 | for (my $i = 0; $i < 50 and @values; $i++) { 636 | push @subvalues, shift @values; 637 | } 638 | my $sql = "INSERT IGNORE INTO `$conf{db}`.`slurm_job_log` VALUES " . 639 | join(",", @subvalues) . 
";"; 640 | 641 | #log_debug ("SQL: $sql\n"); 642 | $dbh->do($sql); 643 | } 644 | } 645 | 646 | # given a dbh and list of slurm job completion logfiles, 647 | # insert them into the dbh 648 | sub backfill_slurm_joblog_table_to_v1 649 | { 650 | my $dbh = shift @_; 651 | my @files = @_; 652 | my $success = 1; 653 | 654 | # switch to the slurm db 655 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 656 | 657 | # if our new table does not exist, create it 658 | if (not table_exists ($dbh, "slurm_job_log")) { 659 | if (not create_slurm_joblog_table_v1($dbh)) { 660 | return 0; 661 | } 662 | } 663 | 664 | log_error ("No files to backfill!\n") if (!@files); 665 | 666 | foreach my $file (@files) { 667 | my @values = (); 668 | my $count = 0; 669 | my $skipped = 0; 670 | 671 | my $f = $file; 672 | $f = "gzip -dc $f | " if ($f =~ /\.gz$/); 673 | 674 | open (IN, $f) or log_error ("Failed to open \"$file\":$!\n"), next; 675 | 676 | while (my $line = <IN>) { 677 | chomp $line; 678 | my @parts = split(" ", $line); 679 | 680 | my %h = (); 681 | foreach my $part (@parts) { 682 | my ($key, $value) = split("=", $part); 683 | $h{$key} = $value; 684 | } 685 | 686 | # Some very old joblog files may have the incorrect 687 | # datetime format.
Unfortunately, the year wasn't 688 | # included in these, so we have to drop these entries :-( 689 | if (defined $h{StartTime} and $h{StartTime} =~ m{^\d\d/\d\d-}) { 690 | $skipped++; 691 | next; 692 | } 693 | 694 | # convert from slurm log to format for MySQL 695 | if (defined $h{"UserId"}) { 696 | my $userid = $h{"UserId"}; 697 | my ($username, $usernumb) = ($userid =~ /(.+)\((\d+)\)/); 698 | if (defined $username and defined $usernumb) { 699 | $h{"UserName"} = $username; 700 | $h{"UserNumb"} = $usernumb; 701 | } 702 | } 703 | if (defined $h{"StartTime"}) { 704 | $h{"StartTime"} =~ s/T/ /; 705 | } 706 | if (defined $h{"EndTime"}) { 707 | $h{"EndTime"} =~ s/T/ /; 708 | } 709 | 710 | push @values, value_string_v1($dbh, \%h); 711 | 712 | if (@values > 100) { 713 | insert_values_v1($dbh, @values); 714 | @values = (); 715 | } 716 | $count++; 717 | } 718 | insert_values_v1($dbh, @values); 719 | 720 | log_verbose ("Backfilled $count jobs from file $file\n"); 721 | log_error ("Skipped $skipped job(s) from file $file because of ", 722 | "old date format\n") if $skipped; 723 | 724 | close(IN); 725 | } 726 | 727 | return $success; 728 | } 729 | 730 | #################### 731 | # Schema version 2 functions 732 | #################### 733 | 734 | # cache for name ids, saves us from hitting the database 735 | # over and over at the cost of more memory 736 | my %IDcache = (); 737 | %{$IDcache{nodes}} = (); 738 | 739 | # return the auto increment value for the last inserted record 740 | sub get_last_insert_id 741 | { 742 | my $dbh = shift @_; 743 | my $id = undef; 744 | 745 | my $sql = "SELECT LAST_INSERT_ID();"; 746 | my $sth = $dbh->prepare($sql); 747 | if ($sth->execute()) { 748 | ($id) = $sth->fetchrow_array(); 749 | } else { 750 | log_error ("Fetching last id: $sql\n"); 751 | } 752 | 753 | return $id; 754 | } 755 | 756 | # given a table and name, read id for name from table 757 | # and add to id cache if found 758 | sub read_id 759 | { 760 | my $dbh = shift @_; 761 | my 
$table = shift @_; 762 | my $name = shift @_; 763 | 764 | my $id = undef; 765 | 766 | # if name is not set, don't try to look it up in hash, just return undef 767 | if (not defined $name) { return $id; } 768 | 769 | if (not defined $IDcache{$table}) { %{$IDcache{$table}} = (); } 770 | if (not defined $IDcache{$table}{$name}) { 771 | my $q_name = $dbh->quote($name); 772 | my $sql = "SELECT * FROM `$table` WHERE `name` = $q_name;"; 773 | my $sth = $dbh->prepare($sql); 774 | if ($sth->execute ()) { 775 | my ($table_id, $table_name) = $sth->fetchrow_array (); 776 | if (defined $table_id and defined $table_name) { 777 | $IDcache{$table}{$name} = $table_id; 778 | $id = $table_id; 779 | } 780 | } else { 781 | log_error ("Reading record: $sql --> " . $dbh->errstr . "\n"); 782 | } 783 | } else { 784 | $id = $IDcache{$table}{$name}; 785 | } 786 | 787 | return $id; 788 | } 789 | 790 | # insert name into table if it does not exist, and return its id 791 | sub read_write_id 792 | { 793 | my $dbh = shift @_; 794 | my $table = shift @_; 795 | my $name = shift @_; 796 | 797 | # if name isn't set, set it to the empty string 798 | # DON'T do this in slurm-joblog, it will fail and 799 | # write to the joblog instead 800 | if (not defined $name) { $name = ""; } 801 | 802 | # attempt to read the id first, if not found, 803 | # insert it and return the last insert id 804 | my $id = read_id($dbh, $table, $name); 805 | if (not defined $id) { 806 | my $q_name = $dbh->quote($name); 807 | my $sql = "INSERT IGNORE INTO `$table` (`id`,`name`)" . 
808 | " VALUES (NULL,$q_name);"; 809 | my $sth = $dbh->prepare($sql); 810 | if ($sth->execute ()) { 811 | # use read_id here instead of get_last_insert_id 812 | # to avoid race conditions 813 | $id = read_id ($dbh, $table, $name); 814 | if (not defined $id) { 815 | log_error ("Error inserting new record (id undefined): $sql\n"); 816 | $id = 0; 817 | } elsif ($id == 0) { 818 | log_error ("Error inserting new record (id=0): $sql\n"); 819 | $id = 0; 820 | } 821 | } else { 822 | log_error ("Error inserting new record: $sql --> " . 823 | $dbh->errstr . "\n"); 824 | $id = 0; 825 | } 826 | } 827 | 828 | return $id; 829 | } 830 | 831 | # given a reference to a list of nodes, 832 | # read their ids from the nodes table and add them to the id cache 833 | sub read_node_ids 834 | { 835 | my $dbh = shift @_; 836 | my $nodes_ref = shift @_; 837 | my $success = 1; 838 | 839 | # build list of nodes not in our cache 840 | my @missing_nodes = (); 841 | foreach my $node (@$nodes_ref) { 842 | if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; } 843 | } 844 | 845 | # if any missing nodes, try to look up their values 846 | if (@missing_nodes > 0) { 847 | my @q_nodes = map $dbh->quote($_), @missing_nodes; 848 | my $in_nodes = join(",", @q_nodes); 849 | my $sql = "SELECT * FROM `nodes` WHERE `name` IN ($in_nodes);"; 850 | my $sth = $dbh->prepare($sql); 851 | if ($sth->execute ()) { 852 | while (my ($table_id, $table_name) = $sth->fetchrow_array ()) { 853 | $IDcache{nodes}{$table_name} = $table_id; 854 | } 855 | } else { 856 | log_error ("Reading nodes: $sql --> " . $dbh->errstr .
"\n"); 857 | $success = 0; 858 | } 859 | } 860 | 861 | return $success; 862 | } 863 | 864 | # given a reference to a list of nodes, 865 | # insert them into the nodes table and add their ids to the id cache 866 | sub read_write_node_ids 867 | { 868 | my $dbh = shift @_; 869 | my $nodes_ref = shift @_; 870 | my $success = 1; 871 | 872 | # read node_ids for these nodes into our cache 873 | read_node_ids($dbh, $nodes_ref); 874 | 875 | # if still missing nodes, we need to insert them 876 | my @missing_nodes = (); 877 | foreach my $node (@$nodes_ref) { 878 | if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; } 879 | } 880 | if (@missing_nodes > 0) { 881 | my @q_nodes = map $dbh->quote($_), @missing_nodes; 882 | my $values = join("),(", @q_nodes); 883 | my $sql = "INSERT IGNORE INTO `nodes` (`name`) VALUES ($values);"; 884 | my $sth = $dbh->prepare($sql); 885 | if (not $sth->execute ()) { 886 | log_error ("Inserting nodes: $sql --> " . $dbh->errstr . "\n"); 887 | $success = 0; 888 | } 889 | 890 | # fetch ids for just inserted nodes 891 | read_node_ids($dbh, $nodes_ref); 892 | } 893 | 894 | return $success; 895 | } 896 | 897 | # given a job_id and a nodelist, 898 | # insert jobs_nodes records for each node used in job_id 899 | sub insert_job_nodes 900 | { 901 | my $dbh = shift @_; 902 | my $job_id = shift @_; 903 | my $nodelist = shift @_; 904 | my $success = 1; 905 | 906 | if (defined $job_id and defined $nodelist and $nodelist ne "") { 907 | my $q_job_id = $dbh->quote($job_id); 908 | 909 | # clean up potentially bad nodelist 910 | if ($nodelist =~ /\[/ and $nodelist !~ /\]/) { 911 | # found an opening bracket, but no closing bracket, 912 | # nodelist is probably incomplete 913 | # chop back to last ',' or '-' and replace with a ']' 914 | $nodelist =~ s/[,-]\d+$/\]/; 915 | } 916 | 917 | # get our nodeset 918 | my @nodes = Hostlist::expand($nodelist); 919 | 920 | # this will fill our node_id cache 921 | read_write_node_ids($dbh, \@nodes); 922 | 923 | # 
get the node_id for each node 924 | my @values = (); 925 | foreach my $node (@nodes) { 926 | if (defined $IDcache{nodes}{$node}) { 927 | my $q_node_id = $dbh->quote($IDcache{nodes}{$node}); 928 | push @values, "($q_job_id,$q_node_id)"; 929 | } 930 | } 931 | 932 | # if we have any nodes for this job, insert them 933 | if (@values > 0) { 934 | my $sql = "INSERT DELAYED IGNORE INTO `jobs_nodes`" . 935 | " (`job_id`,`node_id`)" . 936 | " VALUES " . join(",", @values) . ";"; 937 | my $sth = $dbh->prepare($sql); 938 | if (not $sth->execute ()) { 939 | log_error ("Inserting jobs_nodes records for job id" . 940 | " $job_id: $sql --> " . $dbh->errstr . "\n"); 941 | $success = 0; 942 | } 943 | } 944 | } 945 | 946 | return $success; 947 | } 948 | 949 | # compute time since epoch, attempt to account for DST changes via timelocal 950 | sub get_seconds 951 | { 952 | my ($date) = @_; 953 | use Time::Local; 954 | 955 | my ($y, $m, $d, $H, $M, $S) = ($date =~ /(\d\d\d\d)\-(\d\d)\-(\d\d) (\d\d):(\d\d):(\d\d)/); 956 | $y -= 1900; 957 | $m -= 1; 958 | 959 | return timelocal ($S, $M, $H, $d, $m, $y); 960 | } 961 | 962 | # given hash of values, create mysql values string for insert statement 963 | sub value_string_v2 964 | { 965 | my $dbh = shift @_; 966 | my $h = shift @_; 967 | 968 | # given start and end times, compute the number of seconds 969 | # the job ran for 970 | # TODO: unsure whether this correctly handles jobs that 971 | # straddle DST changes 972 | my $seconds = 0; 973 | if (defined $h->{StartTime} and $h->{StartTime} !~ /^\s*$/ and 974 | defined $h->{EndTime} and $h->{EndTime} !~ /^\s*$/) 975 | { 976 | my $start = get_seconds($h->{StartTime}); 977 | my $end = get_seconds($h->{EndTime}); 978 | $seconds = $end - $start; 979 | if ($seconds < 0) { $seconds = 0; } 980 | } 981 | 982 | # if Procs is not set, but cores is specified and NodeCnt is set, 983 | # compute Procs 984 | # (assumes all processors on the node were allocated to the job, 985 | # only use for clusters which 
use whole-node allocation) 986 | if (not defined $h->{Procs} and defined $conf{cores} and 987 | defined $h->{NodeCnt} 988 | ) 989 | { 990 | $h->{Procs} = $h->{NodeCnt} * $conf{cores}; 991 | } 992 | 993 | # get id values 994 | my $username_id = read_write_id($dbh, "usernames", $h->{UserName}); 995 | my $jobname_id = read_write_id($dbh, "jobnames", $h->{Name}); 996 | my $jobstate_id = read_write_id($dbh, "jobstates", $h->{JobState}); 997 | my $partition_id = read_write_id($dbh, "partitions", $h->{Partition}); 998 | if (not defined $username_id or 999 | not defined $jobname_id or 1000 | not defined $jobstate_id or 1001 | not defined $partition_id) 1002 | { 1003 | log_error ("Missing an id for one of: jobid=$h->{JobId}," . 1004 | " username=$h->{UserName}, jobname=$h->{Name}," . 1005 | " jobstate=$h->{JobState}, partition=$h->{Partition}\n"); 1006 | log_error ("Missing an id for one of: $username_id -- $jobname_id" . 1007 | " -- $jobstate_id -- $partition_id\n"); 1008 | } 1009 | 1010 | # insert the field values, order matters 1011 | my @parts = (); 1012 | push @parts, (defined $h->{Id}) ? $dbh->quote($h->{Id}) : "NULL"; 1013 | push @parts, $dbh->quote($h->{JobId}); 1014 | push @parts, $dbh->quote($username_id); 1015 | push @parts, $dbh->quote($h->{UserNumb}); 1016 | push @parts, $dbh->quote($jobname_id); 1017 | push @parts, $dbh->quote($jobstate_id); 1018 | push @parts, $dbh->quote($partition_id); 1019 | push @parts, $dbh->quote($h->{TimeLimit}); 1020 | push @parts, $dbh->quote($h->{StartTime}); 1021 | push @parts, $dbh->quote($h->{EndTime}); 1022 | push @parts, $dbh->quote($seconds); 1023 | push @parts, $dbh->quote($h->{NodeList}); 1024 | push @parts, $dbh->quote($h->{NodeCnt}); 1025 | push @parts, (defined $h->{Procs}) ? $dbh->quote($h->{Procs}) : 0; 1026 | 1027 | # finally, return the ('field1','field2',...) string 1028 | return "(" . join(',', @parts) . 
")"; 1029 | } 1030 | 1031 | # drop all v2 tables 1032 | sub drop_slurm_joblog_table_v2 1033 | { 1034 | my $dbh = shift @_; 1035 | my $success = 1; 1036 | 1037 | # switch to the slurm db 1038 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 1039 | 1040 | # now drop the tables 1041 | if (not do_sql($dbh, "DROP TABLE `jobs`;")) { $success = 0; } 1042 | if (not do_sql($dbh, "DROP TABLE `usernames`;")) { $success = 0; } 1043 | if (not do_sql($dbh, "DROP TABLE `jobnames`;")) { $success = 0; } 1044 | if (not do_sql($dbh, "DROP TABLE `jobstates`;")) { $success = 0; } 1045 | if (not do_sql($dbh, "DROP TABLE `partitions`;")) { $success = 0; } 1046 | if (not do_sql($dbh, "DROP TABLE `nodes`;")) { $success = 0; } 1047 | if (not do_sql($dbh, "DROP TABLE `jobs_nodes`;")) { $success = 0; } 1048 | 1049 | return $success; 1050 | } 1051 | 1052 | # build all v2 tables 1053 | sub create_slurm_joblog_table_v2 1054 | { 1055 | my $dbh = shift @_; 1056 | my $success = 1; 1057 | 1058 | # switch to the slurm db 1059 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 1060 | 1061 | # nodelist can be null since some jobs are canceled before 1062 | # ever being assigned resources 1063 | my $sql = "CREATE TABLE IF NOT EXISTS `jobs` ( 1064 | `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 1065 | `jobid` INT NOT NULL, 1066 | `username_id` INT UNSIGNED NOT NULL, 1067 | `userid` INT NOT NULL, 1068 | `jobname_id` INT UNSIGNED NOT NULL, 1069 | `jobstate_id` INT UNSIGNED NOT NULL, 1070 | `partition_id` INT UNSIGNED NOT NULL, 1071 | `timelimit` INT NOT NULL, 1072 | `starttime` DATETIME NOT NULL, 1073 | `endtime` DATETIME NOT NULL, 1074 | `runtime` INT UNSIGNED NOT NULL, 1075 | `nodelist` BLOB NOT NULL, 1076 | `nodecount` INT UNSIGNED NOT NULL, 1077 | `corecount` INT UNSIGNED NOT NULL, 1078 | UNIQUE INDEX `jobid` (`jobid`,`starttime`), 1079 | INDEX `username_id` (`username_id`), 1080 | INDEX `jobname_id` (`jobname_id`), 1081 | INDEX `starttime` (`starttime`), 1082 | INDEX 
`endtime` (`endtime`), 1083 | INDEX `runtime` (`runtime`), 1084 | INDEX `nodecount` (`nodecount`), 1085 | INDEX `corecount` (`corecount`) 1086 | ) ENGINE=MyISAM;"; 1087 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1088 | 1089 | # NOTE: The UNIQUE INDEX below ensures that two jobs with 1090 | # the same name (etc.) that complete around the same time 1091 | # do not insert two records. The downside is that the 1092 | # index prefix length cannot be more than 1000 bytes, so 1093 | # names must be limited by the prefix size. 1094 | 1095 | # maps username strings to unique ids 1096 | $sql = "CREATE TABLE IF NOT EXISTS `usernames` ( 1097 | `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 1098 | `name` VARCHAR(512) NOT NULL, 1099 | UNIQUE INDEX `name` (`name`(512)) 1100 | ) ENGINE=MyISAM;"; 1101 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1102 | 1103 | # maps partition name strings to unique ids 1104 | $sql = "CREATE TABLE IF NOT EXISTS `partitions` ( 1105 | `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 1106 | `name` VARCHAR(512) NOT NULL, 1107 | UNIQUE INDEX `name` (`name`(512)) 1108 | ) ENGINE=MyISAM;"; 1109 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1110 | 1111 | # maps job state strings to unique ids 1112 | $sql = "CREATE TABLE IF NOT EXISTS `jobstates` ( 1113 | `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 1114 | `name` VARCHAR(512) NOT NULL, 1115 | UNIQUE INDEX `name` (`name`(512)) 1116 | ) ENGINE=MyISAM;"; 1117 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1118 | 1119 | # maps job name strings to unique ids 1120 | $sql = "CREATE TABLE IF NOT EXISTS `jobnames` ( 1121 | `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, 1122 | `name` VARCHAR(512) NOT NULL, 1123 | UNIQUE INDEX `name` (`name`(512)) 1124 | ) ENGINE=MyISAM;"; 1125 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1126 | 1127 | # maps node name strings to unique ids 1128 | $sql = "CREATE TABLE IF NOT EXISTS `nodes` ( 1129 | `id` INT UNSIGNED NOT NULL 
AUTO_INCREMENT PRIMARY KEY, 1130 | `name` VARCHAR(512) NOT NULL, 1131 | UNIQUE INDEX `name` (`name`(512)) 1132 | ) ENGINE=MyISAM;"; 1133 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1134 | 1135 | # insert a record for each node a job uses 1136 | $sql = "CREATE TABLE IF NOT EXISTS `jobs_nodes` ( 1137 | `job_id` INT UNSIGNED NOT NULL, 1138 | `node_id` INT UNSIGNED NOT NULL, 1139 | UNIQUE INDEX `job_node` (`job_id`,`node_id`), 1140 | INDEX `node_id` (`node_id`) 1141 | ) ENGINE=MyISAM;"; 1142 | if (not do_sql ($dbh, $sql)) { $success = 0; } 1143 | 1144 | return $success; 1145 | } 1146 | 1147 | # convert all data in version 1 table to version 2 schema 1148 | sub convert_slurm_joblog_table_from_v1_to_v2 1149 | { 1150 | my $dbh = shift @_; 1151 | my $success = 1; 1152 | 1153 | # switch to the slurm db 1154 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 1155 | 1156 | # check that there is an older table to convert from 1157 | if (not table_exists ($dbh, "slurm_job_log")) { 1158 | log_msg ("convert: 'slurm_job_log' table does not exist.\n"); 1159 | return 0; 1160 | } 1161 | 1162 | # if our new table does not exist, create it 1163 | if (not table_exists ($dbh, "jobs")) { 1164 | log_msg ("convert: 'jobs' table does not exist," . 1165 | " attempting to create it.\n"); 1166 | if (not create_slurm_joblog_table_v2($dbh)) { 1167 | return 0; 1168 | } 1169 | } 1170 | 1171 | # if tracking and --delay-index was specified, 1172 | # remove indicies before insert, we'll add them back later 1173 | if ($conf{track} and not $conf{indicies}) { 1174 | my $drop = "ALTER TABLE `jobs_nodes`" .
1175 | " DROP INDEX `job_node`, DROP INDEX `node_id`;"; 1176 | if (not do_sql ($dbh, $drop)) { 1177 | log_error ("Problem dropping node tracking indicies.\n"); 1178 | } 1179 | } 1180 | 1181 | # get the total count of jobs in the database 1182 | # (used to print percentage of progress) 1183 | my $total_count = 0; 1184 | my $sth_count = $dbh->prepare("SELECT COUNT(*) FROM `slurm_job_log`;"); 1185 | if ($sth_count->execute()) { 1186 | ($total_count) = $sth_count->fetchrow_array(); 1187 | } 1188 | my $milemarker = int($total_count / 100); 1189 | if ($milemarker == 0) { $milemarker = 1; } 1190 | 1191 | # now grab all of the jobs and insert them one-by-one 1192 | my $sth_all_jobs = $dbh->prepare("SELECT * FROM `slurm_job_log`;"); 1193 | my $job_id = undef; 1194 | if ($sth_all_jobs->execute()) { 1195 | my $count = 0; 1196 | my $time_sum = 0; 1197 | while (my @parts = $sth_all_jobs->fetchrow_array()) { 1198 | # start timer 1199 | my ($start_secs, $start_micros) = gettimeofday(); 1200 | 1201 | my %h = (); 1202 | # throw id away, we'll get a new one any way, 1203 | # and this way we can run the conversion on a live machine, 1204 | # since slurm-joblog.pl will be inserting records as this 1205 | # conversion is running 1206 | #$h{Id} = $parts[0]; 1207 | $h{JobId} = $parts[1]; 1208 | $h{UserName} = $parts[2]; 1209 | $h{UserNumb} = $parts[3]; 1210 | $h{Name} = $parts[4]; 1211 | $h{JobState} = $parts[5]; 1212 | $h{Partition} = $parts[6]; 1213 | $h{TimeLimit} = $parts[7]; 1214 | $h{StartTime} = $parts[8]; 1215 | $h{EndTime} = $parts[9]; 1216 | $h{NodeList} = $parts[10]; 1217 | $h{NodeCnt} = $parts[11]; 1218 | # Procs field wasn't defined in version 1 schema 1219 | #$h{Procs} = $parts[12]; 1220 | 1221 | # bug in version 1, which set nodecount to 1 for blank hostlists 1222 | my $hostlist = $h{NodeList}; 1223 | if ($hostlist =~ /^\s*$/) { 1224 | $h{NodeList} = ""; 1225 | $h{NodeCnt} = 0; 1226 | } 1227 | 1228 | # build the values string 1229 | my $values = value_string_v2($dbh, \%h); 
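The conversion loop above copies the positional v1 columns into a named-field hash, discards the old auto-increment id so the conversion can run alongside a live `slurm-joblog.pl`, and zeroes the node count for blank hostlists (the v1 bug noted in the comments). A minimal Python sketch of that row-to-hash mapping, using the field order from the code above and a hypothetical sample row:

```python
# Sketch of the v1-row -> named-field mapping performed in the loop above,
# including the blank-hostlist fixup (v1 recorded nodecount=1 for jobs
# that were never assigned nodes).  The sample row is made up.
V1_FIELDS = ("Id", "JobId", "UserName", "UserNumb", "Name", "JobState",
             "Partition", "TimeLimit", "StartTime", "EndTime",
             "NodeList", "NodeCnt")

def v1_row_to_hash(row):
    h = dict(zip(V1_FIELDS, row))
    h.pop("Id")  # discard the v1 id; v2 assigns a fresh auto-increment id
    if not h["NodeList"].strip():
        h["NodeList"] = ""
        h["NodeCnt"] = 0
    return h

row = (7, 1234, "alice", 1001, "bench", "COMPLETED", "batch",
       60, "2008-01-02 03:04:05", "2008-01-02 03:09:05", " ", 1)
job = v1_row_to_hash(row)
```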
1230 | 1231 | # insert the job 1232 | if ($conf{track}) { 1233 | # insert the job, need to wait on the insert 1234 | # since we need the job_id 1235 | my $sql = "INSERT IGNORE INTO `$conf{db}`.`jobs`" . 1236 | " VALUES $values;"; 1237 | if (not do_sql($dbh, $sql)) { 1238 | $success = 0; 1239 | } else { 1240 | # now insert nodes used by this job 1241 | my $job_id = get_last_insert_id ($dbh); 1242 | if (defined $job_id and $job_id != 0) { 1243 | insert_job_nodes ($dbh, $job_id, $h{NodeList}); 1244 | } 1245 | } 1246 | } else { 1247 | # insert the job, no need to wait on it 1248 | my $sql = "INSERT DELAYED IGNORE INTO `$conf{db}`.`jobs`" . 1249 | " VALUES $values;"; 1250 | if (not do_sql($dbh, $sql)) { $success = 0; } 1251 | } 1252 | 1253 | # stop timer and print timing and progress as we go 1254 | my ($end_secs, $end_micros) = gettimeofday(); 1255 | my $micros = ($end_secs * 1000000 + $end_micros) - 1256 | ($start_secs * 1000000 + $start_micros); 1257 | $time_sum += $micros; 1258 | $count++; 1259 | if ($count % $milemarker == 0) { 1260 | my $avg_time = int($time_sum / $count); 1261 | my $perc = ""; 1262 | if ($total_count > 0) { 1263 | $perc = sprintf("%.0f", $count / $total_count * 100); 1264 | } 1265 | log_msg ("Records converted $count ($perc%):" . 1266 | " $avg_time usec / record\n"); 1267 | $time_sum = 0; 1268 | } 1269 | } 1270 | } else { 1271 | # select against version 1 table failed 1272 | $success = 0; 1273 | } 1274 | 1275 | # rebuild indicies 1276 | if ($conf{track} and not $conf{indicies}) { 1277 | my $rebuild = "ALTER TABLE `jobs_nodes`" . 1278 | " ADD UNIQUE INDEX `job_node` (`job_id`,`node_id`)," . 
1279 | " ADD INDEX `node_id` (`node_id`);"; 1280 | if (not do_sql ($dbh, $rebuild)) { 1281 | log_error ("Problem rebuilding node tracking indicies.\n"); 1282 | } 1283 | } 1284 | 1285 | return $success; 1286 | } 1287 | 1288 | # backfill data from files into version 2 tables 1289 | sub backfill_slurm_joblog_table_to_v2 1290 | { 1291 | my $dbh = shift @_; 1292 | my @files = @_; 1293 | my $success = 1; 1294 | 1295 | # switch to the slurm db 1296 | do_sql ($dbh, "USE $conf{db};"); 1297 | 1298 | # if our new table does not exist, create it 1299 | if (not table_exists ($dbh, "jobs")) { 1300 | create_slurm_joblog_table_v2($dbh); 1301 | } 1302 | 1303 | log_error ("No files to backfill!\n") if (!@files); 1304 | 1305 | # if tracking and --noindicies was specified, 1306 | # remove indicies before insert, we'll add them back later 1307 | if ($conf{track} and not $conf{indicies}) { 1308 | my $drop = "ALTER TABLE `jobs_nodes`" . 1309 | " DROP INDEX `job_node`," . 1310 | " DROP INDEX `node_id`;"; 1311 | if (not do_sql ($dbh, $drop)) { 1312 | log_error ("Problem dropping node tracking indicies.\n"); 1313 | } 1314 | } 1315 | 1316 | my $count = 0; 1317 | my $time_sum = 0; 1318 | foreach my $file (@files) { 1319 | my $skipped = 0; 1320 | 1321 | my $f = $file; 1322 | $f = "gzip -dc $f | " if ($f =~ /\.gz$/); 1323 | 1324 | open (IN, $f) or log_error ("Failed to open \"$file\":$!\n"), next; 1325 | 1326 | while (my $line = <IN>) { 1327 | # start timer 1328 | my ($start_secs, $start_micros) = gettimeofday(); 1329 | 1330 | chomp $line; 1331 | my @parts = split(" ", $line); 1332 | 1333 | my %h = (); 1334 | foreach my $part (@parts) { 1335 | my ($key, $value) = split("=", $part); 1336 | $h{$key} = $value; 1337 | } 1338 | 1339 | # Some very old joblog files may have the incorrect 1340 | # datetime format.
Unfortunately, the year wasn't 1341 | # included in these, so we have to drop these entries :-( 1342 | if (defined $h{StartTime} and $h{StartTime} =~ m{^\d\d/\d\d-}) { 1343 | $skipped++; 1344 | next; 1345 | } 1346 | 1347 | if ($conf{recalculate_nodecount} && defined $h{NodeList}) { 1348 | my $hostlist = $h{NodeList}; 1349 | if ($hostlist =~ /^\s*$/) { 1350 | $h{NodeCnt} = 0; 1351 | } 1352 | else { 1353 | $h{NodeCnt} = Hostlist::expand($hostlist) 1354 | } 1355 | } 1356 | 1357 | # convert from slurm log to format for MySQL 1358 | if (defined $h{"UserId"}) { 1359 | my $userid = $h{"UserId"}; 1360 | my ($username, $usernumb) = ($userid =~ /(.+)\((\d+)\)/); 1361 | if (defined $username and defined $usernumb) { 1362 | $h{"UserName"} = $username; 1363 | $h{"UserNumb"} = $usernumb; 1364 | } 1365 | } 1366 | if (defined $h{"StartTime"}) { 1367 | $h{"StartTime"} =~ s/T/ /; 1368 | } 1369 | if (defined $h{"EndTime"}) { 1370 | $h{"EndTime"} =~ s/T/ /; 1371 | } 1372 | 1373 | # set the values 1374 | my $values = value_string_v2($dbh, \%h); 1375 | 1376 | # insert the job 1377 | if ($conf{track}) { 1378 | # insert the job, need to wait on the insert 1379 | # since we need the job_id 1380 | my $sql = "INSERT IGNORE INTO `$conf{db}`.`jobs`" . 1381 | " VALUES $values;"; 1382 | if (not do_sql($dbh, $sql)) { 1383 | $success = 0; 1384 | } else { 1385 | # now insert nodes used by this job 1386 | my $job_id = get_last_insert_id ($dbh); 1387 | if (defined $job_id and $job_id != 0) { 1388 | insert_job_nodes ($dbh, $job_id, $h{NodeList}); 1389 | } 1390 | } 1391 | } else { 1392 | # insert the job, no need to wait on it 1393 | my $sql = "INSERT DELAYED IGNORE INTO `$conf{db}`.`jobs`" . 
1394 | " VALUES $values;"; 1395 | if (not do_sql($dbh, $sql)) { $success = 0; } 1396 | } 1397 | 1398 | # stop timer and print timing and progress as we go 1399 | my ($end_secs, $end_micros) = gettimeofday(); 1400 | my $micros = ($end_secs * 1000000 + $end_micros) - 1401 | ($start_secs * 1000000 + $start_micros); 1402 | $time_sum += $micros; 1403 | $count++; 1404 | if ($count % 1000 == 0) { 1405 | my $avg_time = int($time_sum / $count); 1406 | log_msg ("Records converted $count:" . 1407 | " $avg_time usec / record\n"); 1408 | $time_sum = 0; 1409 | } 1410 | } 1411 | 1412 | log_verbose ("Backfilled $count jobs from file $file\n"); 1413 | log_error ("Skipped $skipped job(s) from file $file because of ", 1414 | "old date format\n") if $skipped; 1415 | 1416 | close(IN); 1417 | } 1418 | 1419 | # rebuild indicies 1420 | if ($conf{track} and not $conf{indicies}) { 1421 | my $rebuild = "ALTER TABLE `jobs_nodes`" . 1422 | " ADD UNIQUE INDEX `job_node` (`job_id`,`node_id`)," . 1423 | " ADD INDEX `node_id` (`node_id`);"; 1424 | if (not do_sql ($dbh, $rebuild)) { 1425 | log_error ("Problem rebuilding node tracking indicies.\n"); 1426 | } 1427 | } 1428 | 1429 | return $success; 1430 | } 1431 | 1432 | #################### 1433 | # Utility functions 1434 | #################### 1435 | 1436 | # append records to file 1437 | sub dump_slurm_joblog_table 1438 | { 1439 | my $version = shift @_; 1440 | my $dbh = shift @_; 1441 | my $joblog = shift @_; 1442 | my $success = 1; 1443 | 1444 | # switch to the slurm db 1445 | if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 1446 | 1447 | # check that we have a table to get data from 1448 | if ($version > 1) { 1449 | if (not table_exists ($dbh, "jobs")) { 1450 | log_msg ("'jobs' table does not exist.\n"); 1451 | return 0; 1452 | } 1453 | } else { 1454 | if (not table_exists ($dbh, "slurm_job_log")) { 1455 | log_msg ("'slurm_job_log' table does not exist.\n"); 1456 | return 0; 1457 | } 1458 | } 1459 | 1460 | # if prune is set, check 
that the date format is valid, 1461 | # and check that we're not also obfuscating 1462 | my $date = undef; 1463 | if (defined $conf{prune}) { 1464 | # can't prune and obfuscate at the same time 1465 | if ($conf{obfuscate}) { 1466 | log_fatal ("You cannot prune and obfuscate at the same time.\n"); 1467 | } 1468 | 1469 | # make sure the date is valid format 1470 | if ($conf{prune} !~ /^\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d$/) { 1471 | log_fatal ("Invalid prune date: $conf{prune}." . 1472 | " Must be in --prune='yyyy-mm-dd hh:mm:ss' format.\n"); 1473 | } 1474 | 1475 | # ok, build out date qualifier 1476 | $date = "`starttime` < " . $dbh->quote($conf{prune}); 1477 | 1478 | # TODO: if tracking, remove indicies from jobs_nodes and 1479 | # add back after we're done? 1480 | } elsif (defined $conf{backup}) { 1481 | # make sure the date is valid format 1482 | if ($conf{backup} =~ /^all$/i) { 1483 | # nothing to do here 1484 | } elsif ($conf{backup} =~ /^(\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d)$/) { 1485 | $date = "`starttime` < " . $dbh->quote($1); 1486 | } elsif ($conf{backup} =~ /^(\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d)\.\.(\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d)$/) { 1487 | $date = "`starttime` >= " . $dbh->quote($1) . 1488 | " AND `starttime` < " . $dbh->quote($2); 1489 | } else { 1490 | log_fatal ("Invalid backup range: $conf{backup}." . 1491 | " Must be one of: \"all\", DATE, or DATE..DATE;" . 
1492 | " where DATE is 'yyyy-mm-dd hh:mm:ss'.\n"); 1493 | } 1494 | } 1495 | 1496 | # open our output file 1497 | if (!open (JOBLOG, ">>$joblog")) { 1498 | log_fatal ("Unable to open $joblog: $!\n"); 1499 | } 1500 | 1501 | my $stmt = ""; 1502 | 1503 | # build a statement to get the total count of jobs in 1504 | # the database (used to print percentage of progress) 1505 | my $total_count = 0; 1506 | if ($version > 1) { 1507 | $stmt = "SELECT COUNT(*) FROM `jobs`"; 1508 | } else { 1509 | $stmt = "SELECT COUNT(*) FROM `slurm_job_log`"; 1510 | } 1511 | if (defined $date) { $stmt .= " WHERE $date"; } 1512 | $stmt .= " ORDER BY `starttime`,`id` ASC;"; 1513 | 1514 | # get the count 1515 | log_debug ("$stmt\n"); 1516 | my $sth_count = $dbh->prepare($stmt); 1517 | if ($sth_count->execute()) { 1518 | ($total_count) = $sth_count->fetchrow_array(); 1519 | } 1520 | my $milemarker = int($total_count / 100); 1521 | if ($milemarker == 0) { $milemarker = 1; } 1522 | 1523 | # build a statement to select our records 1524 | if ($version > 1) { 1525 | $stmt = "SELECT" . 1526 | " `jobs`.*," . 1527 | "`usernames`.`name` as `username`," . 1528 | "`jobnames`.`name` as `jobname`," . 1529 | "`jobstates`.`name` as `jobstate`," . 1530 | "`partitions`.`name` as `partition`" . 1531 | " FROM `jobs`" . 1532 | " LEFT JOIN `usernames` ON `jobs`.`username_id` = `usernames`.`id`" . 1533 | " LEFT JOIN `jobnames` ON `jobs`.`jobname_id` = `jobnames`.`id`" . 1534 | " LEFT JOIN `jobstates` ON `jobs`.`jobstate_id` = `jobstates`.`id`" . 
1535 | " LEFT JOIN `partitions` ON `jobs`.`partition_id` = `partitions`.`id`"; 1536 | } else { 1537 | $stmt = "SELECT * FROM `slurm_job_log`"; 1538 | } 1539 | if (defined $date) { $stmt .= " WHERE $date"; } 1540 | $stmt .= " ORDER BY `starttime`,`id` ASC;"; 1541 | 1542 | # now grab all of the jobs and append them one-by-one 1543 | log_debug ("$stmt\n"); 1544 | my $sth_all_jobs = $dbh->prepare($stmt); 1545 | if ($sth_all_jobs->execute()) { 1546 | my $count = 0; 1547 | my $time_sum = 0; 1548 | 1549 | while (my $h = $sth_all_jobs->fetchrow_hashref()) { 1550 | # start timer 1551 | my ($start_secs, $start_micros) = gettimeofday(); 1552 | 1553 | # bug in version 1, which set nodecount to 1 for blank hostlists 1554 | if ($$h{nodelist} =~ /^\s*$/) { 1555 | $$h{nodelist} = ""; 1556 | $$h{nodecount} = 0; 1557 | } 1558 | 1559 | # set time to proper format 1560 | $$h{starttime} =~ s/(\-\d\d) (\d\d:)/$1T$2/; 1561 | $$h{endtime} =~ s/(\-\d\d) (\d\d:)/$1T$2/; 1562 | 1563 | # set procs field 1564 | my $procs = undef; 1565 | if ($version > 1) { 1566 | $procs = $$h{'corecount'}; 1567 | } elsif (defined $conf{cores}) { 1568 | $procs = $$h{'nodecount'} * $conf{cores}; 1569 | } 1570 | 1571 | # optionally obfuscate username, userid, and jobname 1572 | my $username = $$h{'username'}; 1573 | my $userid = $$h{'userid'}; 1574 | my $jobname = $$h{'jobname'}; 1575 | if ($conf{obfuscate} and not defined $conf{prune}) { 1576 | # obfuscate username 1577 | if (not defined $obfuscate{usernames}{$username}) { 1578 | $num_users++; 1579 | $obfuscate{usernames}{$username} = $num_users; 1580 | } 1581 | $username = "user_" . $obfuscate{usernames}{$username}; 1582 | 1583 | # obfuscate userid 1584 | $userid = $num_users; 1585 | 1586 | # obfuscate jobname 1587 | if (not defined $obfuscate{jobnames}{$jobname}) { 1588 | $num_jobs++; 1589 | $obfuscate{jobnames}{$jobname} = $num_jobs; 1590 | } 1591 | $jobname = "job_" . 
$obfuscate{jobnames}{$jobname}; 1592 | } 1593 | 1594 | # append record to file 1595 | my @params = (); 1596 | push @params, $$h{'jobid'}; 1597 | push @params, $username; 1598 | push @params, $userid; 1599 | push @params, $jobname; 1600 | push @params, $$h{'jobstate'}; 1601 | push @params, $$h{'partition'}; 1602 | push @params, $$h{'timelimit'}; 1603 | push @params, $$h{'starttime'}; 1604 | push @params, $$h{'endtime'}; 1605 | push @params, $$h{'nodelist'}; 1606 | push @params, $$h{'nodecount'}; 1607 | if (defined $procs) { 1608 | push @params, "$procs"; 1609 | printf JOBLOG 1610 | "JobId=%s UserId=%s(%s) Name=%s JobState=%s Partition=%s " . 1611 | "TimeLimit=%s StartTime=%s EndTime=%s NodeList=%s " . 1612 | "NodeCnt=%s Procs=%s\n", @params; 1613 | } else { 1614 | printf JOBLOG 1615 | "JobId=%s UserId=%s(%s) Name=%s JobState=%s Partition=%s " . 1616 | "TimeLimit=%s StartTime=%s EndTime=%s NodeList=%s " . 1617 | "NodeCnt=%s\n", @params; 1618 | } 1619 | 1620 | # if we are pruning, delete the job and any associated records 1621 | if (defined $conf{prune}) { 1622 | my $id = $$h{'id'}; 1623 | if (defined $id) { 1624 | my $q_job_id = $dbh->quote($id); 1625 | 1626 | # if tracking, first delete all node records 1627 | if ($version > 1 and $conf{track}) { 1628 | do_sql ($dbh, "DELETE FROM `jobs_nodes`" . 1629 | " WHERE `job_id` = $q_job_id;"); 1630 | } 1631 | 1632 | # now delete the job record 1633 | if ($version > 1) { 1634 | do_sql ($dbh, "DELETE FROM `jobs`" . 1635 | " WHERE `id` = $q_job_id;"); 1636 | } else { 1637 | do_sql ($dbh, "DELETE FROM `slurm_job_log`" . 
1638 | " WHERE `id` = $q_job_id;"); 1639 | } 1640 | } 1641 | } 1642 | 1643 | # stop timer and print timing and progress as we go 1644 | my ($end_secs, $end_micros) = gettimeofday(); 1645 | my $micros = ($end_secs * 1000000 + $end_micros) - 1646 | ($start_secs * 1000000 + $start_micros); 1647 | $time_sum += $micros; 1648 | $count++; 1649 | if ($count % $milemarker == 0) { 1650 | my $avg_time = int($time_sum / $count); 1651 | my $perc = ""; 1652 | if ($total_count > 0) { 1653 | $perc = sprintf("%.0f", $count / $total_count * 100); 1654 | } 1655 | log_msg ("Records written $count ($perc%):" . 1656 | " $avg_time usec / record\n"); 1657 | $time_sum = 0; 1658 | } 1659 | } 1660 | 1661 | log_msg ("Wrote $count jobs to $joblog.\n"); 1662 | } else { 1663 | # select against version 1 table failed 1664 | $success = 0; 1665 | } 1666 | 1667 | close (JOBLOG); 1668 | return $success; 1669 | } 1670 | 1671 | # 1672 | # Generate a digest of the password, sha1 or md5 depending on the 1673 | # size of the password column in the user table 1674 | # 1675 | sub passwd_digest 1676 | { 1677 | my $dbh = connect_db_root (); 1678 | my $passwd = shift @_; 1679 | 1680 | log_fatal ("passwd_digest: Failed to get DB handle!\n") if !$dbh; 1681 | 1682 | my $sth = $dbh->prepare ("SELECT PASSWORD('example');") 1683 | or log_fatal ($dbh->errstr); 1684 | 1685 | $sth->execute (); 1686 | 1687 | my ($r) = $sth->fetchrow_array (); 1688 | 1689 | if (length $r >= 41) { 1690 | return "*" . sha1_hex ($passwd); 1691 | } 1692 | 1693 | # I don't know what the short password hash is, so 1694 | # use of this function is disabled for now. 
1695 | # 1696 | #return (unpack ("H16", pack ("A13", $c))); 1697 | } 1698 | 1699 | sub log_msg { print STDERR "$progname: ", @_; } 1700 | sub log_error { log_msg ("Error: ", @_); } 1701 | sub log_fatal { log_msg ("Fatal: ", @_); exit 1; } 1702 | sub log_verbose { log_msg (@_) if ($conf{verbose}); } 1703 | sub log_debug { log_msg (@_) if ($conf{verbose} > 1); } 1704 | 1705 | # vi: ts=4 sw=4 expandtab 1706 | -------------------------------------------------------------------------------- /sqlog-db-util.8: -------------------------------------------------------------------------------- 1 | .\" $Id$ 2 | .\" 3 | 4 | .TH SQLOG-DB-UTIL 8 "SQLOG Database Utility" 5 | 6 | .SH NAME 7 | sqlog-db-util \- Utility for SLURM job log database maintenance 8 | 9 | .SH SYNOPSIS 10 | .B sqlog-db-util 11 | [\fIOPTIONS\fR]... 12 | 13 | .SH DESCRIPTION 14 | The \fBsqlog-db-util\fR utility is an interface for creating and 15 | backfilling the SLURM job log database used by the \fBsqlog\fR(1) 16 | command. It reads the sqlog.conf and slurm-joblog.conf files to 17 | determine the DB users, passwords, and SQL host it should use 18 | for DB creation. 19 | 20 | .SH OPTIONS 21 | .TP 22 | .BI "-h, --help" 23 | Display a usage message. 24 | .TP 25 | .BI "-i, --info" 26 | Provide information about the currently configured DB, including the 27 | server hostname, read-only username, read-write username, SLURM job 28 | log database name, and the total number of jobs currently stored in 29 | the DB. 30 | .TP 31 | .BI "-v, --verbose" 32 | Increase verbosity. 33 | .TP 34 | .BI "-d, --drop=V" 35 | Drop tables for version V={1,2} of database schema. 36 | Currently, this option doesn't remove SLURM job log users or DB. 37 | .TP 38 | .BI "-c, --create" 39 | Create the SLURM job log DB, the slurm_job_log table, and the associated 40 | read-only and read-write users. 
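Creating the read-only and read-write users means storing a MySQL password hash. The script's passwd_digest probes the server with SELECT PASSWORD('example') and treats a 41-character result as the new-style format. For reference, MySQL's documented 4.1+ scheme for that 41-character format is an asterisk followed by the uppercase hex of a double SHA-1; a sketch (the script itself returns a single sha1_hex):

```python
import hashlib

def mysql41_password_hash(password: str) -> str:
    """MySQL 4.1+ PASSWORD(): '*' + uppercase hex of SHA1(SHA1(password))."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()  # raw 20-byte digest
    return "*" + hashlib.sha1(inner).hexdigest().upper()
```

For example, mysql41_password_hash("password") yields the well-known value *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19.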
41 | .TP 42 | .BI "-b, --backfill" 43 | Backfill the database with SLURM job information using the list of files 44 | provided on the command line. Files should be in the format created by 45 | SLURM's jobcomp/filetxt plugin. If the files end in .gz, they will be 46 | automatically unzipped at runtime. Also see --cores-per-node. 47 | .TP 48 | .BI "-x, --convert" 49 | Convert data from database schema version 1 to version 2. Also see 50 | --cores-per-node. 51 | .TP 52 | .BI "-B, --backup=RANGE" 53 | Copy data from tables to a file (readable by --backfill). 54 | Specify records via a range of job start times. RANGE can be any 55 | of: "all", DATE, or DATE..DATE; where DATE is 'yyyy-mm-dd hh:mm:ss', 56 | to specify all jobs, all jobs with a start time older than DATE, 57 | or all jobs with a start time between DATE..DATE, respectively. 58 | Also see --cores-per-node and --obfuscate. 59 | .TP 60 | .BI "-o, --obfuscate" 61 | Obfuscate usernames, userids, and jobnames during a backup operation. 62 | This is useful when sharing joblogs with outside collaborators. 63 | .TP 64 | .BI "-p, --prune=DATE" 65 | Prune the database of all jobs with start times older than DATE; write such records to a file. 66 | DATE must be in the format 'yyyy-mm-dd hh:mm:ss'. Also see --cores-per-node. 67 | .TP 68 | .BI "-C, --cores-per-node=N" 69 | The version 1 schema did not record the number of cores allocated to a job. 70 | For systems that allocate whole nodes to jobs and have the same number of 71 | cores per node, the number of cores allocated to a job can be computed 72 | by multiplying the number of nodes allocated to a job by the number of 73 | cores per node. On such systems, use the --cores-per-node option to specify the 74 | number of cores per node. This option can be used during --convert, 75 | --backfill, and --backup operations. 76 | .TP 77 | .BI "--notrack" 78 | Disable per-job node tracking for jobs inserted during convert 79 | or backfill operations.
If node-tracking is enabled on a system, 80 | i.e., TRACKNODES is not set or is set to 1 in sqlog.conf, 81 | then such jobs will not show up in queries involving specific node names. 82 | .TP 83 | .BI "--delay-index" 84 | Temporarily disable node tracking indices for jobs inserted during 85 | convert or backfill operations. This drops the indices, inserts the jobs, 86 | and rebuilds the indices. When converting or inserting many records, 87 | this speeds up the operation; however, it makes things slower when loading 88 | only a few records. 89 | .TP 90 | .BI "--recalc-nodecnt" 91 | Some versions of SLURM incorrectly set NODECNT in the jobcomp/script plugin, 92 | and thus the nodecount may be incorrect in the sqlog database. Using this 93 | option along with \fB--backfill\fR will fix the incorrect nodecount 94 | values by recalculating them directly from the nodelist in the joblog.
.TP
95 | .BI "-L, --localhost" 96 | Override the SQL host configuration and connect to the DB on localhost. 97 | (May be required if the root user is only allowed access to the DB via localhost.) 98 | 99 | .SH EXAMPLES 100 | Create the database: 101 | .nf 102 | 103 | sqlog-db-util --create 104 | 105 | .fi 106 | Insert job records into the database for all jobs in the current SLURM txt joblog files: 107 | .nf 108 | 109 | sqlog-db-util --backfill /var/log/slurm/joblog* 110 | 111 | .fi 112 | Drop an existing version 2 database, recreate it using the current configuration, 113 | and seed the new database using SLURM joblog files: 114 | .nf 115 | 116 | sqlog-db-util -d 2 -cb /var/log/slurm/joblog* 117 | 118 | .fi 119 | 120 | .SH AUTHOR 121 | Written by Adam Moody and Mark Grondona.
122 | 123 | .SH SEE ALSO 124 | \fBsqlog\fR(1), /etc/slurm/sqlog.conf, /etc/slurm/slurm-joblog.conf 125 | -------------------------------------------------------------------------------- /sqlog.1: -------------------------------------------------------------------------------- 1 | .\" $Id$ 2 | .\" 3 | 4 | .TH SQLOG 1 "SLURM Query Log" 5 | 6 | .SH NAME 7 | sqlog \- SLURM query log utility 8 | 9 | .SH SYNOPSIS 10 | .B sqlog 11 | [\fIOPTIONS\fR]... 12 | 13 | .SH DESCRIPTION 14 | The \fBsqlog\fR utility provides a single interface to query information 15 | about jobs from the SLURM job log database and/or the current queue 16 | of running jobs. 17 | 18 | By default both the current queue of running jobs and the database 19 | of completed jobs are queried, and a limit of 25 results is displayed. 20 | If more results are available in the database or queue, sqlog will 21 | note the fact with an informational message to stderr: 22 | .nf 23 | 24 | sqlog: [More results available....] 25 | 26 | .fi 27 | This message is suppressed if the \fB--no-header\fR option is provided. 28 | 29 | .SH CONFIGURATION 30 | 31 | \fBsqlog\fR reads configuration from the \fBsqlog.conf\fR config file 32 | (typically in /etc/slurm). This config file provides information about 33 | the SLURM job log database location and username. In addition, 34 | \fBsqlog\fR reads user defaults and additional output format types 35 | from a ~/.sqlog file if it exists. See the USER CONFIG section 36 | below for more information. 37 | 38 | Various config parameters are set in the following order: 39 | Internal defaults, system configuration, user configuration, 40 | and command-line. 41 | 42 | .SH OPTIONS 43 | .TP 44 | .BI "-h, --help" 45 | Display a summary of the command-line options. 46 | .TP 47 | .BI "-v, --verbose" 48 | Increase debugging verbosity of the program. 49 | .TP 50 | .BI "--dry-run" 51 | Don't actually do anything. 
52 | .TP 53 | .BI "-j, --jobids " LIST 54 | Provide a comma-separated list of jobids to include (or exclude if 55 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may 56 | be specified multiple times. 57 | .TP 58 | .BI "-J, --job-names " LIST 59 | Provide a comma-separated list of job names to include (or exclude if 60 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may 61 | be specified multiple times. 62 | .TP 63 | .BI "-n, --nodes " LIST 64 | Provide a comma-separated list of nodes or node lists to include 65 | (or exclude if preceded by the \fI-x\fR, \fI--exclude\fR option). 66 | This option may be specified multiple times. Node lists can be 67 | in hostlist format, e.g. host[34-36,67]. 68 | .TP 69 | .BI "-p, --partitions " LIST 70 | Provide a comma-separated list of partitions to include (or exclude if 71 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may 72 | be specified multiple times. 73 | .TP 74 | .BI "-s, --states " LIST 75 | Provide a comma-separated list of job states to include (or exclude if 76 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may 77 | be specified multiple times. Use --states=list to generate list of valid 78 | job state keys. 79 | .TP 80 | .BI "-u, --users " LIST 81 | Provide a comma-separated list of users to include (or exclude if 82 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may 83 | be specified multiple times. 84 | .TP 85 | .BI "--regex" 86 | Enable regular expressions instead of exact matching for the following list 87 | of jobids, job names, partitions, states, and user names. For example 88 | .nf 89 | 90 | sqlog --regex --job-names='^foo-.*' 91 | 92 | .fi 93 | .TP 94 | .BI "-x, --exclude" 95 | Exclude the following list of jobids, users, partitions, nodes, job names, 96 | or job states. For example: \fB--exclude --jobids\fR=\fILIST\fR or 97 | \fB-xn\fR \fINODES\fR. 
98 | .TP 99 | .BI "-N, --nnodes " N 100 | List jobs that ran on exactly \fIN\fR nodes. \fIN\fR may also have the 101 | form +\fIN\fR, -\fIN\fR, or \fIN\fR..\fIM\fR, to specify a minimum, 102 | maximum, or range of node counts. For examples, see the RANGE OPERATORS 103 | section below. 104 | .TP 105 | .BI "--minnodes " N 106 | Explicitly specify a minimum node count. This is equivalent to using 107 | the option \fB--nnodes\fR=+\fIN\fR. 108 | .TP 109 | .BI "--maxnodes " N 110 | Explicitly specify a maximum node count. This is equivalent to using 111 | the option \fB--nnodes\fR=-\fIN\fR. 112 | .TP 113 | .BI "-C, --ncores " N 114 | List jobs that ran on exactly \fIN\fR cores. \fIN\fR may also have the 115 | form +\fIN\fR, -\fIN\fR, or \fIN\fR..\fIM\fR, to specify a minimum, 116 | maximum, or range of core counts. For examples, see the RANGE OPERATORS 117 | section below. 118 | .TP 119 | .BI "--mincores " N 120 | Explicitly specify a minimum core count. This is equivalent to using 121 | the option \fB--ncores\fR=+\fIN\fR. 122 | .TP 123 | .BI "--maxcores " N 124 | Explicitly specify a maximum core count. This is equivalent to using 125 | the option \fB--ncores\fR=-\fIN\fR. 126 | .TP 127 | .BI "-T, --runtime " DURATION 128 | List jobs that ran or have run for \fIDURATION\fR. \fIDURATION\fR may 129 | have the form DD-HH:MM:SS or DDdHHhMMmSSs, where DD is days, HH 130 | hours, MM minutes, and SS seconds. In the second form, values that 131 | are zero may be left out, e.g. 4h. In the first form, days, hours, 132 | and minutes are optional, e.g. :04 is 4 seconds. \fIDURATION\fR may 133 | be specified as +\fIT\fR, -\fIT\fR or \fIT1\fR..\fIT2\fR to specify 134 | a min, max, or runtime range. For more information, see the RANGE 135 | OPERATORS section below. 136 | .TP 137 | .BI "--mintime " DURATION 138 | List all jobs that ran for at least \fIDURATION\fR. 139 | This is equivalent to specifying \fB--runtime\fR=+\fIDURATION\fR.
140 | .TP 141 | .BI "--maxtime " DURATION 142 | List all jobs that ran for at most \fIDURATION\fR. 143 | This is equivalent to specifying \fB--runtime\fR=-\fIDURATION\fR. 144 | .TP 145 | .BI "-t, --time, --at " TIME 146 | List jobs which were running at a particular date and time. 147 | \fITIME\fR arguments are parsed by the perl \fBDate::Manip\fR(3pm) 148 | package, so many date/time formats are allowed, e.g. 2pm or 149 | "4/14 15:30:00". A window of time may be specified by separating the 150 | start and end of the window with "..", e.g. \fB--time\fR=2pm..3pm. 151 | For more information see the RANGE OPERATORS section below. 152 | .TP 153 | .BI "-S, --start " TIME 154 | List all jobs that started at date and time \fITIME\fR. \fITIME\fR may 155 | have the form +\fIT\fR, -\fIT\fR, or \fIT1\fR..\fIT2\fR to specify a 156 | minimum, maximum, or start time range. See RANGE OPERATORS below 157 | for more information. 158 | .TP 159 | .BI "--start-after " TIME 160 | List all jobs that started after time \fITIME\fR. This is equivalent 161 | to using \fB--start\fR=+\fITIME\fR. 162 | .TP 163 | .BI "--start-before " TIME 164 | List all jobs that started before time \fITIME\fR. This is equivalent 165 | to using \fB--start\fR=-\fITIME\fR. 166 | .TP 167 | .BI "-E, --end " TIME 168 | List all jobs that ended at date and time \fITIME\fR. \fITIME\fR may 169 | have the form +\fIT\fR, -\fIT\fR, or \fIT1\fR..\fIT2\fR to specify a 170 | minimum, maximum, or end time range (see RANGE OPERATORS below for 171 | more information). For running jobs, SLURM uses 172 | an estimated end time, so end times in the future are valid and will 173 | be used. (Using \fB--end\fR=+now would list all currently 174 | running jobs, since they all end in the future.) 175 | .TP 176 | .BI "--end-after " TIME 177 | List all jobs that ended (or will end) after time \fITIME\fR. This is 178 | equivalent to using \fB--end\fR=+\fITIME\fR.
179 | .TP 180 | .BI "--end-before " TIME 181 | List all jobs that ended (or will end) before time \fITIME\fR. This is 182 | equivalent to using \fB--end\fR=-\fITIME\fR. 183 | .TP 184 | .BI "-X, --no-running" 185 | Do not query running jobs, i.e. ignore the current queue and only 186 | query the SLURM job log database. 187 | .TP 188 | .BI "--no-db" 189 | Do not query the SLURM job log database, i.e. only query the current 190 | queue. 191 | .TP 192 | .BI "-H, --no-header" 193 | Do not display header rows in output. 194 | .TP 195 | .BI "-o, --format " LIST 196 | Specify a list of format keys to display or a format type, or both 197 | using the form \fITYPE\fR:\fIKEYS\fR... Use \fB--format\fR=list to 198 | list valid keys and types. See OUTPUT FORMAT below for further 199 | information. 200 | .TP 201 | .BI "-P, --sort " LIST 202 | Specify a list of keys on which to sort output. Prepend a '-' to sort 203 | in descending as opposed to ascending order. List valid keys 204 | using \fB--sort\fR=list. The default sort method is '-start'. 205 | .TP 206 | .BI "-L, --limit " N 207 | Limit the number of records to report (Default = 25). 208 | .TP 209 | .BI "-a, --all" 210 | Do not limit the number of returned results. (Return all matching rows.) 211 | This is equivalent to \fB--limit\fR=0. 212 | 213 | .SH RANGE OPERATORS 214 | \fITIME\fR, \fIDURATION\fR, and numeric arguments may use the 215 | RANGE OPERATORS '+', '-', and '..' to specify minimum, maximum, 216 | or a range of values, respectively. TIME arguments may also use 217 | the '@' symbol to escape a leading + or - in the TIME itself 218 | (e.g. '-1hr' means '1 hr ago'). The \fB--time\fR, \fB--start\fR, 219 | \fB--end\fR, \fB--runtime\fR, and \fB--nnodes\fR options to 220 | \fBsqlog\fR all take RANGE OPERATORS. 221 | .TP 222 | Examples 223 | .TP 20 224 | .BI "--nnodes " +8 225 | Jobs that ran with 8 or more nodes. 226 | .TP 227 | .BI "--nnodes " 16..32 228 | Jobs that ran with between 16 and 32 nodes, inclusive.
229 | .TP 230 | .BI "--runtime " -2h 231 | Jobs that ran for 2 hours or less. 232 | .TP 233 | .BI "--runtime " 5m..1hr 234 | Jobs that ran for between 5 minutes and 1 hour, inclusive. 235 | .TP 236 | .BI "--end " 2pm..3pm 237 | Jobs that ended today between 2PM and 3PM, inclusive. 238 | .TP 239 | .BI "--time " 7/17..7/18 240 | Jobs that ran anytime from 12AM, 7/17 to 12AM, 7/18. 241 | .TP 242 | .BI "--time " "+'1 hour ago'" 243 | Jobs that ran in the past hour (1 hour ago or later). 244 | .TP 245 | .BI "--time " "+-1hr (or +@-1hr)" 246 | Same as above. 247 | .TP 248 | .BI "--time " @-1hr 249 | Jobs that were running exactly one hour ago. 250 | .TP 251 | .BI "--time " @-2hr..-1hr 252 | Jobs that were running between 2 hours ago and 1 hour ago. 253 | 254 | 255 | .SH USER CONFIGURATION 256 | When \fBsqlog\fR runs, it will first check for a ~/.sqlog file and 257 | parse it if it exists. At this time, the ~/.sqlog file may be used 258 | to set a new default limit (see \fB--limit\fR) and additional output format 259 | types (see \fB--format\fR). These two configuration parameters take the form: 260 | .TP 20 261 | \fBlimit\fR = \fIN\fR 262 | Set the new default output limit to \fIN\fR. 263 | .TP 264 | \fBformat{\fINAME\fB}\fR = \fILIST...\fR 265 | Create an alias \fINAME\fR for the format list \fILIST\fR. 266 | .PP 267 | For example, the following ~/.sqlog file 268 | .nf 269 | # Sample ~/.sqlog file 270 | limit = 30 271 | format{mine} = long:start,end,jobid,user,state 272 | 273 | .fi 274 | would set the default output limit to 30 records and 275 | add a new format type \fImine\fR.
The new format type would 276 | be used by specifying 277 | .nf 278 | 279 | \fB--format\fR \fImine\fR 280 | 281 | .fi 282 | on the command line, which would be equivalent to 283 | .nf 284 | 285 | \fB--format\fR long:start,end,jobid,user,state 286 | 287 | .fi 288 | Any number of format types may be specified in this way, though 289 | if there are duplicate names, the last one specified will override 290 | all previous types. This also implies that a user can redefine 291 | the default \fBsqlog\fR format types \fIshort\fR, \fIlong\fR, 292 | and \fIfreeform\fR, though this is not recommended. 293 | 294 | .SH OUTPUT FORMAT 295 | \fBsqlog\fR provides precise control over the output format, which aids with 296 | readability and simplifies parsing via scripts. When parsing output, be sure 297 | to specify each field and the expected order using the -o,--format option. 298 | The built-in formats (short, long, and freeform) may add or reorder fields 299 | over time. 300 | 301 | By default, \fBsqlog\fR uses the output format 302 | .nf 303 | 304 | short:jobid,partition,name,user,state,start,runtime,ncores,nnodes,nodes 305 | 306 | .fi 307 | 308 | The \fIshort:\fR preceding the format specification tells \fBsqlog\fR 309 | to use the \fIshort\fR form of each of the format keys. The result 310 | is what you see when running \fBsqlog\fR without using the -o,--format 311 | option. All format keys currently available are detailed here. Some 312 | keys have shorter aliases that are provided for convenience. These 313 | are listed alongside the full key name below. Note that all these 314 | keys can also be listed by using \fI--format=list\fR. 315 | .TP 20 316 | .B "jobid | jid" 317 | The SLURM jobid for this job. 318 | .TP 319 | .B "partition | part" 320 | The SLURM partition in which the job ran or is running. 321 | .TP 322 | .B "name" 323 | The name of the job as recorded by SLURM. 324 | .TP 325 | .B "user" 326 | The username of the user running the job.
327 | .TP 328 | .B "state | st" 329 | The current or final state of the job. See JOB STATE CODES 330 | for a description of the two-letter codes that this field 331 | displays by default. 332 | .TP 333 | .B "start" 334 | The start time of the job in the form MM/DD-HH:MM:SS. 335 | .TP 336 | .B "runtime | time" 337 | The total runtime of the job in the form 338 | DD-HH:MM:SS. Leading zero values may be dropped, 339 | for instance 4:30 is 4 minutes 30 seconds. 340 | .TP 341 | .B "ncores | C" 342 | The number of cores allocated to the job. 343 | .TP 344 | .B "nnodes | N" 345 | The number of nodes allocated to the job. 346 | .TP 347 | .B "nodes" 348 | The nodelist that was allocated to the job. Note that for 349 | completing jobs (CG) this nodelist will be restricted to 350 | the currently completing nodes for the job. To see the 351 | full nodelist, restrict \fBsqlog\fR to the database only, 352 | i.e. run with the -X, --no-running option. 353 | .TP 354 | .B "runtime_s | time_s" 355 | The total job runtime in seconds. 356 | .TP 357 | .B "end" 358 | The time at which the job completed in the form 359 | MM/DD-HH:MM:SS. 360 | .TP 361 | .B "longstart" 362 | Date and time the job started in the form 363 | YYYY-MM-DDTHH:MM:SS. This is displayed by 364 | default in the \fIlong\fR format type. 365 | .TP 366 | .B "longend" 367 | Date and time the job ended in the form 368 | YYYY-MM-DDTHH:MM:SS. This is displayed by 369 | default in the \fIlong\fR format type. 370 | .TP 371 | .B "unixstart" 372 | Job start time in seconds since epoch. 373 | .TP 374 | .B "unixend" 375 | Job end time in seconds since epoch. 376 | .TP 0 377 | 378 | A format type may be specified in addition to the format fields. These change the output width and in some cases the output format of the fields above. The format type may also be specified alone to the \fI--format\fR option. For instance \fI--format=long\fR would choose the default fields configured for the \fIlong\fR format type. 
379 | 380 | .TP 20 381 | .B "short" 382 | This is the default output type. It uses the format fields: 383 | jobid,part,name,user,state,start,runtime,ncores,nnodes,nodes 384 | .TP 385 | .B "long" 386 | This format type uses longer widths for most fields, and 387 | displays the full job state code by default (e.g. 388 | completing instead of CG). Its default format fields are: 389 | jobid,part,name,user,state,longstart,longend,runtime,ncores,nnodes,nodes 390 | .TP 391 | .B "freeform" 392 | This is a freeform output in which full width fields are displayed 393 | separated by whitespace. This is useful, for instance, when parsing sqlog 394 | output, to guarantee no field is truncated. 395 | It uses the same format fields as the \fBlong\fR format type. 396 | 397 | .SH JOB STATE CODES 398 | Job states are displayed with two-letter abbreviations 399 | in normal \fBsqlog\fR output. Job state codes are fully explained in the 400 | \fBsqueue\fR(1) man page, but the abbreviations are restated here 401 | for completeness. 402 | .TP 20 403 | .B "CA CANCELLED" 404 | Job was cancelled. 405 | .TP 406 | .B "CD COMPLETED" 407 | Job completed normally. 408 | .TP 409 | .B "CG COMPLETING" 410 | Job is in the process of completing. 411 | .TP 412 | .B "F FAILED" 413 | Job terminated abnormally. 414 | .TP 415 | .B "NF NODE_FAIL" 416 | Job terminated due to node failure. 417 | .TP 418 | .B "PD PENDING" 419 | Job is pending allocation. 420 | .TP 421 | .B "R RUNNING" 422 | Job currently has an allocation. 423 | .TP 424 | .B "S SUSPENDED" 425 | Job is suspended. 426 | .TP 427 | .B "TO TIMEOUT" 428 | Job terminated upon reaching its time limit.
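For scripts that post-process \fBsqlog\fR output, the abbreviations above can be expanded to full state names with a simple lookup table. This sketch merely transcribes the list in this section; \fBsqueue\fR(1) remains the authoritative reference.

```python
# Two-letter job state codes, transcribed from the JOB STATE CODES
# section above (see squeue(1) for the authoritative list).
STATE_CODES = {
    "CA": "CANCELLED",
    "CD": "COMPLETED",
    "CG": "COMPLETING",
    "F":  "FAILED",
    "NF": "NODE_FAIL",
    "PD": "PENDING",
    "R":  "RUNNING",
    "S":  "SUSPENDED",
    "TO": "TIMEOUT",
}

print(STATE_CODES["CD"])  # COMPLETED
```

A script parsing the \fIstate\fR field of short-format output could use this table to recover the long state names that the \fIlong\fR format type prints.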
429 | 430 | 431 | .SH EXAMPLES 432 | Display the job or jobs that were running on host55 on July 19 at 4:00PM: 433 | .nf 434 | 435 | sqlog --time="July 19, 4pm" --nodes=host55 436 | 437 | .fi 438 | Display at most 25 jobs that were running at midnight yesterday: 439 | .nf 440 | 441 | sqlog --time=yesterday,midnight 442 | 443 | .fi 444 | Display all jobs that failed between 8:00AM and 9:00AM this morning, 445 | sorted by descending endtime: 446 | .nf 447 | 448 | sqlog --all --end=8am..9am --states=F --sort=-end 449 | 450 | .fi 451 | Display all jobs that started today: 452 | .nf 453 | 454 | sqlog --start=+midnight --all 455 | 456 | .fi 457 | Display all jobs that ran between 3 and 4 hours on the nodes 458 | host30 through host65, and that didn't complete normally: 459 | .nf 460 | 461 | sqlog -L 0 -T=3h..4h -n 'host[30-65]' -xs completed 462 | 463 | .fi 464 | Display all jobs that were running yesterday with 1000 nodes or 465 | more and that completed normally: 466 | .nf 467 | 468 | sqlog -t yesterday,12am..12am -s CD -N +1000 469 | 470 | .fi 471 | List current queue, sorted by number of nodes (ascending): 472 | .nf 473 | 474 | sqlog --all --no-db --sort=nnodes 475 | 476 | .fi 477 | List the top 10 longest-running jobs, and then the 5 oldest jobs: 478 | .nf 479 | 480 | sqlog --sort=runtime --limit=10 481 | sqlog --sort=-start --limit=5 482 | 483 | .fi 484 | .SH AUTHOR 485 | Written by Adam Moody and Mark Grondona.
486 | -------------------------------------------------------------------------------- /sqlog.conf.example: -------------------------------------------------------------------------------- 1 | ############################################################################### 2 | # $Id$ 3 | ############################################################################### 4 | # 5 | # sqlog(1) database configuration 6 | # 7 | # Override defaults for 8 | # SQLHOST Hostname for SQL db (default = sqlhost) 9 | # SQLUSER Read-only username for db (default = slurm_read) 10 | # SQLPASS Read-only password (default = no password) 11 | # SQLDB Database name for slurm data (default = slurm) 12 | # TRACKNODES Set to 0 to disable per-job node tracking (default = 1) 13 | # 14 | # This "config file" is read by perl's do() routine, so arbitrary 15 | # perl can be used. (For example, to automatically determine the 16 | # SQLHOST, etc.) 17 | # 18 | 19 | # Must begin with package "conf". 20 | # 21 | package conf; 22 | 23 | use Genders; 24 | 25 | # Only need to override SQLHOST for now. Other defaults are ok.
26 | $SQLHOST = get_sqlhost (); 27 | 28 | # Example of adding new format list 29 | %FORMATS = ( "sys1" => "jid,user,name,start,end" ); 30 | 31 | 1; 32 | # 33 | # Get cluster's "sqlhost" (altmgmt node for now) 34 | # (Returns the ethernet hostname for the node, if available) 35 | # 36 | sub get_sqlhost 37 | { 38 | my $genders = Genders->new(); 39 | my $host = ""; 40 | 41 | ($host) = $genders->getnodes("mysqld") or 42 | &main::log_fatal ("Failed to get SQL host from mysqld genders attr.\n"); 43 | 44 | my $server = $genders->getattrval("altname", $host) or 45 | &main::log_error ("Failed to get altname for $host.\n"); 46 | 47 | return $server || $host; 48 | } 49 | 50 | -------------------------------------------------------------------------------- /sqlog.spec: -------------------------------------------------------------------------------- 1 | Name: sqlog 2 | Version: See META 3 | Release: See META 4 | 5 | Summary: SLURM job completion database utilities 6 | Group: Applications/System 7 | License: GPL 8 | Source: %{name}-%{version}.tgz 9 | BuildRoot: %{_tmppath}/%{name}-%{version} 10 | BuildArch: noarch 11 | Requires: slurm perl(Date::Manip) perl(DBI) perl(DBD::mysql) perl(Digest::SHA1) gendersllnl 12 | 13 | %define debug_package %{nil} 14 | 15 | %description 16 | sqlog provides a system for creation, query, and population of a 17 | database of SLURM job history. 
18 | 19 | %{!?_slurm_sysconfdir: %define _slurm_sysconfdir %{_sysconfdir}/slurm} 20 | %{!?_perl_path: %define _perl_path %{__perl}} 21 | %{!?_perl_libpaths: %define _perl_libpaths %{nil}} 22 | %{!?_path_env_var: %define _path_env_var /bin:/usr/bin:/usr/sbin} 23 | 24 | %prep 25 | %setup 26 | 27 | %build 28 | #NOOP 29 | 30 | %install 31 | rm -rf "$RPM_BUILD_ROOT" 32 | mkdir -p "$RPM_BUILD_ROOT" 33 | mkdir -p -m0755 $RPM_BUILD_ROOT/%{_libexecdir}/sqlog 34 | 35 | perl -pli -e "s|/etc/slurm|%{_slurm_sysconfdir}|g; 36 | s|/usr/bin/perl|%{_perl_path}|; 37 | s|^use lib qw\(\);|use lib qw(%{_perl_libpaths});|; 38 | s|^(\\\$ENV\{PATH\}) = '[^']*';|\$1 = '%{_path_env_var}';|;" \ 39 | sqlog sqlog.1 sqlog-db-util sqlog-db-util.8 slurm-joblog.pl \ 40 | skewstats skewstats.1 41 | 42 | install -D -m 755 sqlog ${RPM_BUILD_ROOT}/%{_bindir}/sqlog 43 | install -D -m 644 sqlog.1 ${RPM_BUILD_ROOT}/%{_mandir}/man1/sqlog.1 44 | install -D -m 755 skewstats ${RPM_BUILD_ROOT}/%{_bindir}/skewstats 45 | install -D -m 644 skewstats.1 ${RPM_BUILD_ROOT}/%{_mandir}/man1/skewstats.1 46 | install -D -m 755 sqlog-db-util ${RPM_BUILD_ROOT}/%{_sbindir}/sqlog-db-util 47 | install -D -m 644 sqlog-db-util.8 ${RPM_BUILD_ROOT}/%{_mandir}/man8/sqlog-db-util.8 48 | install -D -m 755 slurm-joblog.pl \ 49 | ${RPM_BUILD_ROOT}/%{_libexecdir}/sqlog/slurm-joblog 50 | 51 | 52 | %clean 53 | rm -rf "$RPM_BUILD_ROOT" 54 | 55 | %files 56 | %defattr(-,root,root) 57 | %doc README NEWS ChangeLog sqlog.conf.example slurm-joblog.conf.example 58 | %{_bindir}/sqlog 59 | %{_bindir}/skewstats 60 | %{_sbindir}/sqlog-db-util 61 | %{_mandir}/*/* 62 | %{_libexecdir}/sqlog 63 | 64 | --------------------------------------------------------------------------------