├── COPYING
├── ChangeLog
├── DISCLAIMER
├── META
├── NEWS
├── README
├── skewstats
├── skewstats.1
├── slurm-joblog.conf.example
├── slurm-joblog.pl
├── sqlog
├── sqlog-db-util
├── sqlog-db-util.8
├── sqlog.1
├── sqlog.conf.example
└── sqlog.spec


/COPYING:
--------------------------------------------------------------------------------
  1 | 		    GNU GENERAL PUBLIC LICENSE
  2 | 		       Version 2, June 1991
  3 | 
  4 |  Copyright (C) 1989, 1991 Free Software Foundation, Inc.
  5 |                        51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  6 |  Everyone is permitted to copy and distribute verbatim copies
  7 |  of this license document, but changing it is not allowed.
  8 | 
  9 | 			    Preamble
 10 | 
 11 |   The licenses for most software are designed to take away your
 12 | freedom to share and change it.  By contrast, the GNU General Public
 13 | License is intended to guarantee your freedom to share and change free
 14 | software--to make sure the software is free for all its users.  This
 15 | General Public License applies to most of the Free Software
 16 | Foundation's software and to any other program whose authors commit to
 17 | using it.  (Some other Free Software Foundation software is covered by
 18 | the GNU Library General Public License instead.)  You can apply it to
 19 | your programs, too.
 20 | 
 21 |   When we speak of free software, we are referring to freedom, not
 22 | price.  Our General Public Licenses are designed to make sure that you
 23 | have the freedom to distribute copies of free software (and charge for
 24 | this service if you wish), that you receive source code or can get it
 25 | if you want it, that you can change the software or use pieces of it
 26 | in new free programs; and that you know you can do these things.
 27 | 
 28 |   To protect your rights, we need to make restrictions that forbid
 29 | anyone to deny you these rights or to ask you to surrender the rights.
 30 | These restrictions translate to certain responsibilities for you if you
 31 | distribute copies of the software, or if you modify it.
 32 | 
 33 |   For example, if you distribute copies of such a program, whether
 34 | gratis or for a fee, you must give the recipients all the rights that
 35 | you have.  You must make sure that they, too, receive or can get the
 36 | source code.  And you must show them these terms so they know their
 37 | rights.
 38 | 
 39 |   We protect your rights with two steps: (1) copyright the software, and
 40 | (2) offer you this license which gives you legal permission to copy,
 41 | distribute and/or modify the software.
 42 | 
 43 |   Also, for each author's protection and ours, we want to make certain
 44 | that everyone understands that there is no warranty for this free
 45 | software.  If the software is modified by someone else and passed on, we
 46 | want its recipients to know that what they have is not the original, so
 47 | that any problems introduced by others will not reflect on the original
 48 | authors' reputations.
 49 | 
 50 |   Finally, any free program is threatened constantly by software
 51 | patents.  We wish to avoid the danger that redistributors of a free
 52 | program will individually obtain patent licenses, in effect making the
 53 | program proprietary.  To prevent this, we have made it clear that any
 54 | patent must be licensed for everyone's free use or not licensed at all.
 55 | 
 56 |   The precise terms and conditions for copying, distribution and
 57 | modification follow.
 58 | 
 59 | 		    GNU GENERAL PUBLIC LICENSE
 60 |    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 61 | 
 62 |   0. This License applies to any program or other work which contains
 63 | a notice placed by the copyright holder saying it may be distributed
 64 | under the terms of this General Public License.  The "Program", below,
 65 | refers to any such program or work, and a "work based on the Program"
 66 | means either the Program or any derivative work under copyright law:
 67 | that is to say, a work containing the Program or a portion of it,
 68 | either verbatim or with modifications and/or translated into another
 69 | language.  (Hereinafter, translation is included without limitation in
 70 | the term "modification".)  Each licensee is addressed as "you".
 71 | 
 72 | Activities other than copying, distribution and modification are not
 73 | covered by this License; they are outside its scope.  The act of
 74 | running the Program is not restricted, and the output from the Program
 75 | is covered only if its contents constitute a work based on the
 76 | Program (independent of having been made by running the Program).
 77 | Whether that is true depends on what the Program does.
 78 | 
 79 |   1. You may copy and distribute verbatim copies of the Program's
 80 | source code as you receive it, in any medium, provided that you
 81 | conspicuously and appropriately publish on each copy an appropriate
 82 | copyright notice and disclaimer of warranty; keep intact all the
 83 | notices that refer to this License and to the absence of any warranty;
 84 | and give any other recipients of the Program a copy of this License
 85 | along with the Program.
 86 | 
 87 | You may charge a fee for the physical act of transferring a copy, and
 88 | you may at your option offer warranty protection in exchange for a fee.
 89 | 
 90 |   2. You may modify your copy or copies of the Program or any portion
 91 | of it, thus forming a work based on the Program, and copy and
 92 | distribute such modifications or work under the terms of Section 1
 93 | above, provided that you also meet all of these conditions:
 94 | 
 95 |     a) You must cause the modified files to carry prominent notices
 96 |     stating that you changed the files and the date of any change.
 97 | 
 98 |     b) You must cause any work that you distribute or publish, that in
 99 |     whole or in part contains or is derived from the Program or any
100 |     part thereof, to be licensed as a whole at no charge to all third
101 |     parties under the terms of this License.
102 | 
103 |     c) If the modified program normally reads commands interactively
104 |     when run, you must cause it, when started running for such
105 |     interactive use in the most ordinary way, to print or display an
106 |     announcement including an appropriate copyright notice and a
107 |     notice that there is no warranty (or else, saying that you provide
108 |     a warranty) and that users may redistribute the program under
109 |     these conditions, and telling the user how to view a copy of this
110 |     License.  (Exception: if the Program itself is interactive but
111 |     does not normally print such an announcement, your work based on
112 |     the Program is not required to print an announcement.)
113 | 
114 | These requirements apply to the modified work as a whole.  If
115 | identifiable sections of that work are not derived from the Program,
116 | and can be reasonably considered independent and separate works in
117 | themselves, then this License, and its terms, do not apply to those
118 | sections when you distribute them as separate works.  But when you
119 | distribute the same sections as part of a whole which is a work based
120 | on the Program, the distribution of the whole must be on the terms of
121 | this License, whose permissions for other licensees extend to the
122 | entire whole, and thus to each and every part regardless of who wrote it.
123 | 
124 | Thus, it is not the intent of this section to claim rights or contest
125 | your rights to work written entirely by you; rather, the intent is to
126 | exercise the right to control the distribution of derivative or
127 | collective works based on the Program.
128 | 
129 | In addition, mere aggregation of another work not based on the Program
130 | with the Program (or with a work based on the Program) on a volume of
131 | a storage or distribution medium does not bring the other work under
132 | the scope of this License.
133 | 
134 |   3. You may copy and distribute the Program (or a work based on it,
135 | under Section 2) in object code or executable form under the terms of
136 | Sections 1 and 2 above provided that you also do one of the following:
137 | 
138 |     a) Accompany it with the complete corresponding machine-readable
139 |     source code, which must be distributed under the terms of Sections
140 |     1 and 2 above on a medium customarily used for software interchange; or,
141 | 
142 |     b) Accompany it with a written offer, valid for at least three
143 |     years, to give any third party, for a charge no more than your
144 |     cost of physically performing source distribution, a complete
145 |     machine-readable copy of the corresponding source code, to be
146 |     distributed under the terms of Sections 1 and 2 above on a medium
147 |     customarily used for software interchange; or,
148 | 
149 |     c) Accompany it with the information you received as to the offer
150 |     to distribute corresponding source code.  (This alternative is
151 |     allowed only for noncommercial distribution and only if you
152 |     received the program in object code or executable form with such
153 |     an offer, in accord with Subsection b above.)
154 | 
155 | The source code for a work means the preferred form of the work for
156 | making modifications to it.  For an executable work, complete source
157 | code means all the source code for all modules it contains, plus any
158 | associated interface definition files, plus the scripts used to
159 | control compilation and installation of the executable.  However, as a
160 | special exception, the source code distributed need not include
161 | anything that is normally distributed (in either source or binary
162 | form) with the major components (compiler, kernel, and so on) of the
163 | operating system on which the executable runs, unless that component
164 | itself accompanies the executable.
165 | 
166 | If distribution of executable or object code is made by offering
167 | access to copy from a designated place, then offering equivalent
168 | access to copy the source code from the same place counts as
169 | distribution of the source code, even though third parties are not
170 | compelled to copy the source along with the object code.
171 | 
172 |   4. You may not copy, modify, sublicense, or distribute the Program
173 | except as expressly provided under this License.  Any attempt
174 | otherwise to copy, modify, sublicense or distribute the Program is
175 | void, and will automatically terminate your rights under this License.
176 | However, parties who have received copies, or rights, from you under
177 | this License will not have their licenses terminated so long as such
178 | parties remain in full compliance.
179 | 
180 |   5. You are not required to accept this License, since you have not
181 | signed it.  However, nothing else grants you permission to modify or
182 | distribute the Program or its derivative works.  These actions are
183 | prohibited by law if you do not accept this License.  Therefore, by
184 | modifying or distributing the Program (or any work based on the
185 | Program), you indicate your acceptance of this License to do so, and
186 | all its terms and conditions for copying, distributing or modifying
187 | the Program or works based on it.
188 | 
189 |   6. Each time you redistribute the Program (or any work based on the
190 | Program), the recipient automatically receives a license from the
191 | original licensor to copy, distribute or modify the Program subject to
192 | these terms and conditions.  You may not impose any further
193 | restrictions on the recipients' exercise of the rights granted herein.
194 | You are not responsible for enforcing compliance by third parties to
195 | this License.
196 | 
197 |   7. If, as a consequence of a court judgment or allegation of patent
198 | infringement or for any other reason (not limited to patent issues),
199 | conditions are imposed on you (whether by court order, agreement or
200 | otherwise) that contradict the conditions of this License, they do not
201 | excuse you from the conditions of this License.  If you cannot
202 | distribute so as to satisfy simultaneously your obligations under this
203 | License and any other pertinent obligations, then as a consequence you
204 | may not distribute the Program at all.  For example, if a patent
205 | license would not permit royalty-free redistribution of the Program by
206 | all those who receive copies directly or indirectly through you, then
207 | the only way you could satisfy both it and this License would be to
208 | refrain entirely from distribution of the Program.
209 | 
210 | If any portion of this section is held invalid or unenforceable under
211 | any particular circumstance, the balance of the section is intended to
212 | apply and the section as a whole is intended to apply in other
213 | circumstances.
214 | 
215 | It is not the purpose of this section to induce you to infringe any
216 | patents or other property right claims or to contest validity of any
217 | such claims; this section has the sole purpose of protecting the
218 | integrity of the free software distribution system, which is
219 | implemented by public license practices.  Many people have made
220 | generous contributions to the wide range of software distributed
221 | through that system in reliance on consistent application of that
222 | system; it is up to the author/donor to decide if he or she is willing
223 | to distribute software through any other system and a licensee cannot
224 | impose that choice.
225 | 
226 | This section is intended to make thoroughly clear what is believed to
227 | be a consequence of the rest of this License.
228 | 
229 |   8. If the distribution and/or use of the Program is restricted in
230 | certain countries either by patents or by copyrighted interfaces, the
231 | original copyright holder who places the Program under this License
232 | may add an explicit geographical distribution limitation excluding
233 | those countries, so that distribution is permitted only in or among
234 | countries not thus excluded.  In such case, this License incorporates
235 | the limitation as if written in the body of this License.
236 | 
237 |   9. The Free Software Foundation may publish revised and/or new versions
238 | of the General Public License from time to time.  Such new versions will
239 | be similar in spirit to the present version, but may differ in detail to
240 | address new problems or concerns.
241 | 
242 | Each version is given a distinguishing version number.  If the Program
243 | specifies a version number of this License which applies to it and "any
244 | later version", you have the option of following the terms and conditions
245 | either of that version or of any later version published by the Free
246 | Software Foundation.  If the Program does not specify a version number of
247 | this License, you may choose any version ever published by the Free Software
248 | Foundation.
249 | 
250 |   10. If you wish to incorporate parts of the Program into other free
251 | programs whose distribution conditions are different, write to the author
252 | to ask for permission.  For software which is copyrighted by the Free
253 | Software Foundation, write to the Free Software Foundation; we sometimes
254 | make exceptions for this.  Our decision will be guided by the two goals
255 | of preserving the free status of all derivatives of our free software and
256 | of promoting the sharing and reuse of software generally.
257 | 
258 | 			    NO WARRANTY
259 | 
260 |   11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
268 | REPAIR OR CORRECTION.
269 | 
270 |   12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
278 | POSSIBILITY OF SUCH DAMAGES.
279 | 
280 | 		     END OF TERMS AND CONDITIONS
281 | 
282 | 	    How to Apply These Terms to Your New Programs
283 | 
284 |   If you develop a new program, and you want it to be of the greatest
285 | possible use to the public, the best way to achieve this is to make it
286 | free software which everyone can redistribute and change under these terms.
287 | 
288 |   To do so, attach the following notices to the program.  It is safest
289 | to attach them to the start of each source file to most effectively
290 | convey the exclusion of warranty; and each file should have at least
291 | the "copyright" line and a pointer to where the full notice is found.
292 | 
293 |     <one line to give the program's name and a brief idea of what it does.>
294 |     Copyright (C) <year>  <name of author>
295 | 
296 |     This program is free software; you can redistribute it and/or modify
297 |     it under the terms of the GNU General Public License as published by
298 |     the Free Software Foundation; either version 2 of the License, or
299 |     (at your option) any later version.
300 | 
301 |     This program is distributed in the hope that it will be useful,
302 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
303 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
304 |     GNU General Public License for more details.
305 | 
306 |     You should have received a copy of the GNU General Public License
307 |     along with this program; if not, write to the Free Software
308 |     Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
309 | 
310 | 
311 | Also add information on how to contact you by electronic and paper mail.
312 | 
313 | If the program is interactive, make it output a short notice like this
314 | when it starts in an interactive mode:
315 | 
316 |     Gnomovision version 69, Copyright (C) year name of author
317 |     Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
318 |     This is free software, and you are welcome to redistribute it
319 |     under certain conditions; type `show c' for details.
320 | 
321 | The hypothetical commands `show w' and `show c' should show the appropriate
322 | parts of the General Public License.  Of course, the commands you use may
323 | be called something other than `show w' and `show c'; they could even be
324 | mouse-clicks or menu items--whatever suits your program.
325 | 
326 | You should also get your employer (if you work as a programmer) or your
327 | school, if any, to sign a "copyright disclaimer" for the program, if
328 | necessary.  Here is a sample; alter the names:
329 | 
330 |   Yoyodyne, Inc., hereby disclaims all copyright interest in the program
331 |   `Gnomovision' (which makes passes at compilers) written by James Hacker.
332 | 
333 |   <signature of Ty Coon>, 1 April 1989
334 |   Ty Coon, President of Vice
335 | 
336 | This General Public License does not permit incorporating your program into
337 | proprietary programs.  If your program is a subroutine library, you may
338 | consider it more useful to permit linking proprietary applications with the
339 | library.  If this is what you want to do, use the GNU Library General
340 | Public License instead of this License.
341 | 


--------------------------------------------------------------------------------
/ChangeLog:
--------------------------------------------------------------------------------
  1 | 2011-07-11  Mark Grondona <mgrondona@llnl.gov>
  2 | 
  3 | 	* : tag v0.19.
  4 | 
  5 | 	* slurm-joblog.pl :
  6 | 	  Use NODECNT environment variable if set instead of counting
  7 | 	  node list. Resolves Issue 2.
  8 | 
  9 | 	* slurm-joblog.pl, sqlog-db-util :
 10 | 	  Improve debug and verbose output for better debugging.
 11 | 
 12 | 	* sqlog-db-util :
 13 | 	  Fix CHAOS bz#1204: error setting up DB.
 14 | 
 15 | 2009-10-23  Mark Grondona <mgrondona@llnl.gov>
 16 | 
 17 | 	* : tag v0.18.
 18 | 
 19 | 	* sqlog :
 20 | 	  Add missing table name for nodes table in &lookup_ids().
 21 | 
 22 | 2009-10-23  Mark Grondona <mgrondona@llnl.gov>
 23 | 
 24 | 	* : tag v0.17.
 25 | 
 26 | 	* skewstats, skewstats.1, sqlog.spec :
 27 | 	  Add skewstats utility to sqlog package.
 28 | 
 29 | 2009-10-16  Mark Grondona <mgrondona@llnl.gov>
 30 | 
 31 | 	* sqlog.1 : Update documentation of --regex.
 32 | 
 33 | 	* sqlog : Replace global conf{regex} boolean with conf{regex}{key}
 34 | 	  to allow per-key setting of regex matching vs exact matching. 
 35 | 	  The --regex flag now operates similar to --exclude.
 36 | 
 37 | 2009-10-12  Mark Grondona <mgrondona@llnl.gov>
 38 | 
 39 | 	* : tag v0.16.
 40 | 
 41 | 	* sqlog : Fix --regex queries against v2 database format.
 42 | 
 43 | 	* sqlog : Don't treat filter arguments for running jobs
 44 | 	  as regular expressions unles --regex was used.
 45 | 
 46 | 2009-10-10  Mark Grondona <mgrondona@llnl.gov>
 47 | 
 48 | 	* sqlog : Refactor code for running job selection, and fix
 49 | 	  -J, --job-names processing on running jobs.
 50 | 
 51 | 	* sqlog-db-util : Fix  -L --localhost processing. 
 52 | 
 53 | 2009-05-07 Mark Grondona <mgrondona@llnl.gov>
 54 | 
 55 | 	* : tag v0.15.
 56 | 
 57 | 	* sqlog : Fix -j, --jobids option on systems with v2
 58 | 	  database.
 59 | 
 60 | 2009-04-09 Mark Grondona <mgrondona@llnl.gov>
 61 | 
 62 | 	* : tag v0.14.
 63 | 
 64 | 	* sqlog :
 65 | 	  Remove ncores from default short output format.
 66 | 
 67 | 	* sqlog (query_running_jobs, reformat_squeue_datetime) :
 68 | 	  Drop the T character from new-style squeue output. SLURM
 69 | 	  seems to have introduced this in slurm-1.4, and Date::Manip
 70 | 	  doesn't handle it.
 71 | 
 72 | 	* sqlog (get_seconds_date_manip_is_buggy) :
 73 | 	  If date is literally NONE, then just return that value.
 74 | 	  Otherwise, try to handle dates that cannot be parsed by
 75 | 	  Date::Manip by logging an error and setting them to NONE.
 76 | 
 77 | 2009-04-02 Mark Grondona <mgrondona@llnl.gov>
 78 | 
 79 | 	* slurm-joblog.pl :
 80 | 	  Try to log to text file even if there is a failure reading
 81 | 	  one of the config files.
 82 | 
 83 | 2009-04-02 Mark Grondona <mgrondona@llnl.gov>
 84 | 
 85 | 	* : tag v0.13.
 86 | 
 87 | 2009-04-01 Mark Grondona <mgrondona@llnl.gov>
 88 | 
 89 | 	* sqlog :
 90 | 	  Change timelocal debug message from log_verbose to log_debug.
 91 | 
 92 | 	* sqlog-db-util, sqlog-db-util.8, README :
 93 | 	  Rename --cores option to --cores-per-node.
 94 | 
 95 | 	* sqlog-db-util : 
 96 | 	  Reformat --help output and comments to fit in 80 characters.
 97 | 
 98 | 2009-03-13 Adam Moody <moody20@llnl.gov>
 99 | 
100 | 	* sqlog : Added logic to track nodes and query records by node name
101 | 	  via SQL commands.
102 | 
103 | 	* sqlog-db-util : Added --backup, --prune, --cores, --obfuscate,
104 | 	  --notrack, and --delay-index options.  Changed --drop to --drop=V.
105 | 	  Created usernames, jobnames, jobstates, and partitions tables
106 | 	  to reduce record length in new jobs table.  Added indicies on
107 | 	  jobnames, starttime, endtime, runtime, nodecount, corecount to
108 | 	  speed up such queries.  Added nodes table to assign each node name
109 | 	  to an id, and created a jobs_nodes table that maps node ids to
110 | 	  job ids.
111 | 
112 | 2008-03-03 Adam Moody <moody20@llnl.gov>
113 | 
114 | 	* sqlog : Added --ncores, --maxcores, and --mincores options to display
115 | 	  and sort jobs based on number of processors used (PROCS field from 
116 | 	  SLURM job log).
117 | 
118 | 	* sqlog-db-util : Added -x, --convert options to convert existing
119 | 	  slurm_job_log tables to version 2 (adds procs column and extends
120 | 	  nodelist column to a blob).  Avoids reconverting an up-to-date table.
121 | 
122 | 	* slurm-joblog.pl : Added code to insert procs field.
123 | 
124 | 2008-12-03 Mark Grondona <mgrondona@llnl.gov>
125 | 
126 | 	* : tag v0.12.
127 | 
128 | 	* sqlog : Add --regex option to use REGEXP instead of IN
129 | 	  for queries of jobid, username, jobstate, partition, or
130 | 	  job name.
131 | 
132 | 2008-12-02 Mark Grondona <mgrondona@llnl.gov>
133 | 
134 | 	* sqlog : It appears that Date::Manip is buggy and doesn't
135 | 	  support DST properly. sqlog used Date::Manip to convert
136 | 	  between stored database dates and Unix time (seconds since
137 | 	  epoch). Instead convert time using other methods in this
138 | 	  instance to avoid misconverting during DST conversions.
139 | 	  
140 | 2008-10-03 Py Watson <py@llnl.gov>
141 | 
142 | 	* sqlog.spec :
143 | 	  Change the various perl Requires to be based on the module 
144 | 	  name rather than the RPM name. Fix for RPM Requires on SLES 9.
145 | 
146 | 2008-06-24 Mark Grondona <mgrondona@llnl.gov>
147 | 
148 | 	* : tag v0.11.
149 | 
150 | 	* sqlog-db-util :
151 | 	  Drop all slurm* users from mysql DB before adding new
152 | 	  user privileges to avoid leaving old settings around.
153 | 
154 | 	* sqlog-db-util :
155 | 	  New slurm-joblog.conf parameter SQLNETWORK to specify the network 
156 | 	  on which read-only access to the DB is allowed. Default: 192.168.%.%.
157 | 
158 | 	* slurm-joblog.conf.example :
159 | 	  Describe SQLNETWORK parameter.
160 | 
161 | 2008-04-18 Mark Grondona <mgrondona@llnl.gov>
162 | 
163 | 	* : tag v0.10.
164 | 
165 | 	* sqlog.1 : Add OUTPUT FORMAT section.
166 | 
167 | 	* sqlog : Separate regexes in duration_to_seconds for 
168 | 	  better parsing of the two support forms.
169 | 
170 | 2008-02-29 Mark Grondona <mgrondona@llnl.gov>
171 | 
172 | 	* : tag v0.9.
173 | 
174 | 	* sqlog.1 : Add JOB STATE CODES section to manpage explaining
175 | 	  the various job states and their abbreviations.
176 | 
177 | 	* sqlog : Fix -o, --format when just a new format specification
178 | 	  is provided (e.g. "long" or "freeform").
179 | 
180 | 2007-09-27 Mark Grondona <mgrondona@llnl.gov>
181 | 
182 | 	* : tag v0.8.
183 | 
184 | 2007-09-14 Pythagoras Watson <py@llnl.gov>
185 | 
186 | 	* sqlog.spec : Add more packages to Requires.
187 | 
188 | 2007-09-12 Mark Grondona <mgorndona@llnl.gov>
189 | 
190 | 	* slurm-joblog.pl : Fix test for whether a job logfile is
191 | 	  configured.
192 | 
193 | 2007-08-13 Mark Grondona <mgorndona@llnl.gov>
194 | 
195 | 	* : tag v0.7.
196 | 
197 | 2007-08-13 Pythagoras Watson <py@llnl.gov>
198 | 
199 | 	* sqlog, sqlog-db-util, slurm-joblog.pl, sqlog.spec :
200 | 	  Fixes required for AIX and other installations without 
201 | 	  prefix = /usr: Allow modification of perl include paths
202 | 	  and PATH at RPM build time, use __perl RPM macro instead
203 | 	  of hardcoding /usr/bin/perl in specfile, correct perms
204 | 	  of sql-db-util.8 man page, and improve subst() functon in
205 | 	  specfile.
206 | 
207 | 2007-08-10 Mark Grondona <mgrondona@llnl.gov>
208 | 
209 | 	* sqlog : Add runtime_s output field (runtime in seconds).
210 | 
211 | 	* sqlog : Add unixstart and unixend format keys for start and
212 | 	  end times in seconds since the epoch. Also, add an alias
213 | 	  time_s for runtime_s.
214 | 
215 | 	* : tag v0.6.
216 | 
217 | 2007-08-10 Adam Moody <moody20@llnl.gov>
218 | 
219 | 	* sqlog : Fixed bug in parse_end_time preventing --end-before
220 | 	  and --end-after from working.  Fixed comment in parse_start_time
221 | 	  to print --start-before and --start-after in error message.
222 | 
223 | 2007-08-10 Mark Grondona <mgrondona@llnl.gov>
224 | 
225 | 	* sqlog, sqlog-db-util, slurm-joblog.pl, sqlog.spec :
226 | 	  Allow configuration directory (/etc/slurm by default) and perl
227 | 	  path (/usr/bin/perl by default) to be overridden at RPM build time.
228 | 
229 | 	* : tag 0.5.
230 | 
231 | 2007-08-07 Mark Grondona <mgrondona@llnl.gov>
232 | 
233 | 	* sqlog-db-util : Failsafe check for existence of slurm_job_log
234 | 	  table in create_db(). Create SLURM DB with "IF NOT EXISTS" to 
235 | 	  avoid error.
236 | 
237 | 	* sqlog : Don't read ~/.sqlog. This file is reserved for future 
238 | 	  user configuration.
239 | 
240 | 	* sqlog : Parse ~/.sqlog for specification of alternate format
241 | 	  lists via "format{name} = LIST..." and default limit with
242 | 	  "limit = N". Similarly, new format lists may be specific in
243 | 	  sqlog.conf by creating a %FORMATS hash. User and system
244 | 	  configs may override the default format key lists for
245 | 	  "short", "long", and "freeform".
246 | 
247 | 	* sqlog.1 : Document ~/.sqlog.
248 | 
249 | 	* : tag v0.4.
250 | 
251 | 2007-08-06 Mark Grondona <mgrondona@llnl.gov>
252 | 
253 | 	* sqlog.spec : Add perl-DateManip and gendersllnl to Requires.
254 | 
255 | 	* sqlog.1 : Add a note about sqlog's "More results available..."
256 | 	  message. Add some more examples.
257 | 
258 | 	* sqlog : When sorting start and end times, assume "NONE" to 
259 | 	  mean "possibly infinite in the future" by faking a date 10 years
260 | 	  from now. This allows sorting end time to work as expected since 
261 | 	  all jobs that end in the future should have the greateast end
262 | 	  time.
263 | 
264 | 	* sqlog-db-util.8 : Added.
265 | 
266 | 	* sqlog-db-util : Reformat usage output.
267 | 
268 | 	* sqlog : Give each format type (short, long, freeform) its own 
269 | 	  format list. Put "longstart" and "longend" into default format 
270 | 	  lists for long and freeform output types.
271 | 
272 | 2007-08-06 Adam Moody <moody20@llnl.gov>
273 | 
274 | 	* sqlog : Changed option order in help output to more closely match
275 | 	  the order in the manpage.
276 | 
277 | 	* sqlog.1 : Added -L and -a, which were missing.  Changed option
278 | 	  listing order slightly to group options by function.
279 | 
280 | 2007-08-06 Mark Grondona <mgrondona@llnl.gov>
281 | 
282 | 	* sqlog : Apply sort keys to "ORDER BY" in DB query. Also, reverse
283 | 	  the sense of '-' on sort keys to be more intuitive.
284 | 
285 | 	* sqlog : Print "More results available..." if not all results from
286 | 	  DB and/or queue were displayed due to --limit.
287 | 
288 | 	* sqlog : Add "longstart" and "longend" format keys which print start
289 | 	  and end datetime in format "%Y-%m-%dT%H:%M:%S".
290 | 
291 | 	* sqlog : Fix typo in initialization of config arrays that caused
292 | 	  selection of job states to break.
293 | 
294 | 2007-08-04 Mark Grondona <mgrondona@llnl.gov>
295 | 
296 | 	* slurm-joblog.pl, slurm-joblog.conf.example, README :
297 | 	  Optionally create DB if it doesn't exist in slurm-joblog.
298 | 
299 | 	* sqlog-db-util : Intialize $conf{verbose}.
300 | 
301 | 	* sqlog-db-util : Add --info option.
302 | 
303 | 	* : tag v0.3.
304 | 
305 | 2007-08-03 Mark Grondona <mgrondona@llnl.gov>
306 | 
307 | 	* sqlog-db-util : Add Adam Moody's utility for creation of SLURM
308 | 	  job log database.
309 | 
310 | 	* sqlog.spec : Add sqlog-db-util to specfile.
311 | 
312 | 	* sqlog.conf.example, slurm-joblog.conf.example : 
313 | 	  Add example config files.
314 | 
315 | 	* : tag v0.1.
316 | 
317 | 	* README, NEWS : Added.
318 | 
319 | 	* : tag v0.2.
320 | 
321 | 2007-07-27 Mark Grondona <mgrondona@llnl.gov>
322 | 
323 | 	* sqlog.1 : Update man page with new RANGE OPERATORS section.
324 | 
325 | 	* sqlog : Remove Examples from --help. They are now in the manpage.
326 | 	  Read alternate config from /etc/slurm/sqlog.conf or ~/.sqlog.
327 | 
328 | 	* slurm-joblog.pl : Add slurm job completion script to repo.
329 | 
330 | 	* META, sqlog.spec : Add specfile and META for building RPMs.
331 | 
332 | 2007-07-25 Mark Grondona <mgrondona@llnl.gov>
333 | 
334 | 	* sqlog : Changed range operator to ".."
335 | 	  --time may specify a min, max, or window of time.
336 | 	  @ may be used to escape leading + or - in datetime values.
337 | 
338 | 2007-07-20 Adam Moody <moody20@llnl.gov>
339 | 
340 | 	* sqlog : Added RANGE operator description to usage, 
341 | 	  --time still needs support.
342 | 
343 | 2007-07-19 Adam Moody <moody20@llnl.gov>
344 | 
345 | 	* sqlog : Added support for 'S' in time duration, only 's' was working.
346 | 
347 | 	* sqlog : Changed comparison for runtime, minruntime, and maxruntime
348 | 	 to string equality tests of "eq" and "ne" since numeric operators of ">"
349 | 	 would get confused for input such as -T +1h.
350 | 
351 | 	* sqlog : Changed 'N-M' to 'N--M' to be consistent with time window format.
352 | 	 This way there is one consistent set of operators +/-/--.
353 | 
354 | 2007-07-19 Mark Grondona <mgrondona@llnl.gov>
355 | 
356 | 	* sqlog : New options --start, --start-before, --start-after,
357 | 	 --end, --end-before, --end-after. Removed --before and --after
358 | 	 which are replaced by --start-before, --start-after.
359 | 
360 | 	* sqlog : Query running jobs by default and add -X, --no-running.
361 | 
362 | 	* sqlog : Add functions for parsing time ranges and min/max specifications.
363 | 
364 | 	* sqlog : Fix time window parsing.
365 | 
366 | 	* sqlog.1 : Added man page.
367 | 
368 | 2007-07-18 Mark Grondona <mgrondona@llnl.gov>
369 | 
370 | 	* sqlog : Initial commit.
371 | 
372 | 


--------------------------------------------------------------------------------
/DISCLAIMER:
--------------------------------------------------------------------------------
 1 | This work was produced at the Lawrence Livermore National Laboratory
 2 | (LLNL) under Contract No. DE-AC52-07NA27344 (Contract 44) between
 3 | the U.S. Department of Energy (DOE) and Lawrence Livermore National
 4 | Security, LLC (LLNS) for the operation of LLNL.
 5 | 
 6 | This work was prepared as an account of work sponsored by an agency of
 7 | the United States Government.  Neither the United States Government nor
 8 | Lawrence Livermore National Security, LLC nor any of their employees,
 9 | makes any warranty, express or implied, or assumes any liability or
10 | responsibility for the accuracy, completeness, or usefulness of any
11 | information, apparatus, product, or process disclosed, or represents
12 | that its use would not infringe privately-owned rights.
13 | 
14 | Reference herein to any specific commercial products, process, or
15 | services by trade name, trademark, manufacturer or otherwise does
16 | not necessarily constitute or imply its endorsement, recommendation,
17 | or favoring by the United States Government or Lawrence Livermore
18 | National Security, LLC.  The views and opinions of authors expressed
19 | herein do not necessarily state or reflect those of the Untied States
20 | Government or Lawrence Livermore National Security, LLC, and shall
21 | not be used for advertising or product endorsement purposes.
22 | 
23 | The precise terms and conditions for copying, distribution, and
24 | modification are specified in the file "COPYING".
25 | 


--------------------------------------------------------------------------------
/META:
--------------------------------------------------------------------------------
1 | ###
2 | ## $Id$
3 | ###
4 | 
5 | Name: sqlog
6 | Version: 0.25
7 | Release: 1
8 | 


--------------------------------------------------------------------------------
/NEWS:
--------------------------------------------------------------------------------
  1 | Version 0.25 (2016-04-18):
  2 |  - Fix defined() on non-scalar warnings (Py Watson)
  3 | 
  4 | Version 0.24 (2016-03-24):
  5 |  - Support for MariaDB 5.5 (Jeff B. Ogden)
  6 | 
  7 | Version 0.23 (2015-09-02):
  8 |  - Handle SLURM's PREEMPTED state in sqlog(1).
  9 | 
 10 | Version 0.22 (2011-12-23):
 11 |  - Handle SLURM's RESIZING state in sqlog(1).
 12 | 
 13 | Version 0.21 (2011-12-08):
 14 |  - slurm-joblogger.pl: Do not use the NODECNT variable. It is likely to
 15 |     be incorrect. Instead always compute nodecount from the nodelist.
 16 |  - sqlog-db-util: Add new --recalc-nodecnt option that recalculates
 17 |     nodecount from nodelist on backfill.
 18 |  - sqlog: Fix "Use of uninitialized value" in sqlog on RUNNING jobs.
 19 | 
 20 | Version 0.20 (2011-08-23):
 21 |  - sqlog-db-util: Don't make failure to connect to DB a fatal error for
 22 |     all cases. (Fixes bug in initial DB creation).
 23 |  - sqlog-db-util: Always connect to DB via 'localhost' if -L is used.
 24 | 
 25 | Version 0.19 (2011-07-11):
 26 |  - Fix Issue 2: Use NODECNT environment variable if set by SLURM.
 27 |  - sqlog-db-util: Fix bug in initial DB creation.
 28 |  - Slightly better debug and error log messages.
 29 | 
 30 | Version 0.18 (2009-11-09):
 31 |  - Fix bug in sqlog preventing queries with -n, --nodes.
 32 | 
 33 | Version 0.17 (2009-10-23):
 34 |  - Add the skewstats(1) utility to the sqlog package.
 35 |  - The sqlog --regex option now only applies to the following job query
 36 |    option, instead of globally to all filter options. This mirrors the
 37 |    functionality of the --exclude option.
 38 | 
 39 | Version 0.16 (2009-10-12):
 40 |  - Fix broken sqlog-db-util -L, --localhost option.
 41 |  - Fix job name (-J, --job-name) filtering for running jobs.
 42 |  - Fix use of --regex queries on running jobs and against v2 database.
 43 | 
 44 | Version 0.15 (2009-05-07):
 45 |  - Fix sqlog -j, --jobids on systems with v2 database.
 46 | 
 47 | Version 0.14 (2009-04-09):
 48 |  - Try harder to log to joblog text file when database isn't accessible.
 49 |  - Handle datetime of NONE in database.
 50 |  - Properly handle slurm-1.4 squeue datetime format.
 51 | 
 52 | Version 0.13 (2009-04-02):
 53 |  - Update database schema from v1 to v2.
 54 |  - Added --covert, --backup, --prune, --obfuscate options to
 55 |    sqlog-db-util, as well as, --cores-per-node, --notrack,
 56 |    and --delay-index.  Added "CONVERTING" and "BACKING UP"
 57 |    sections to README to discuss new options.
 58 |  - Added new indicies to schema: increased from just: username
 59 |    to: username, jobname, starttime, endtime, runtime, nodecount,
 60 |    corecount, nodename.  Speeds up common queries.
 61 |  - Add corecount column to track number of cores allocated to each
 62 |    job, which is useful for machines using the consumable resources
 63 |    SLURM plugin.
 64 |  - Added --ncores, --mincores, --maxcores options to sqlog to
 65 |    specify conditions on new corecount column.
 66 |  - Extend nodelist column to fix truncation when very fragmented
 67 |    nodelists exceeded the 1024 char limit initially set for the field.
 68 | 
 69 | Version 0.12 (2008-12-03):
 70 |  - Do not use Date::Manip routines to convert dates to "Unix time"
 71 |    (seconds since epoch) Date::Manip doesn't handle daylight savings 
 72 |    transitions properly and instead uses the current DST offset.
 73 |  - New --regex option allows sqlog to query with regexes for jobids,
 74 |    user names, states, paritions, and job names, instead of a simple
 75 |    exact match.
 76 | 
 77 | Version 0.11 (2008-06-24):
 78 |  - New slurm-joblog.conf parameter $SQLNETWORK sets the network
 79 |    on which read access to database is allowed. Default = 192.168.%.%.
 80 |  - sqlog-db-util now deletes slurm* users from mysql DB before 
 81 |    creating new user entries to avoid stale privileges.
 82 | 
 83 | Version 0.10 (2008-04-18):
 84 |  - Add OUTPUT FORMAT section to sqlog(1) manpage.
 85 |  - Improve RUNTIME parsing in sqlog script.
 86 | 
 87 | Version 0.9 (2008-02-29):
 88 |  - Fix --format=long which wasn't properly setting long format.
 89 |  - Add "JOB STATE CODES" section to sqlog(1) man page describing the
 90 |    various job state abbreviations.
 91 | 
 92 | Version 0.8 (2007-09-27):
 93 |  - Add more packages to RPM Requires
 94 |  - Fix test for whether job logfile is configured in slurm-joblog.pl.
 95 | 
 96 | Version 0.7 (2007-08-13):
 97 |  - Applied Py Watson's fixes for non-standard installs:
 98 |   -- Allow perl library path and PATH to be specified at RPM build time.
 99 |   -- Use __perl RPM macro instead of hardcoding /usr/bin/perl.
100 |   -- Other specfile improvements.
101 |  - Fix sqlog-db-util.8 manpage permissions.
102 | 
103 | Version 0.6 (2007-08-10):
104 |  - Fix for bug in --end-before and --end-after argument processing.
105 |  - Add new format keys: runtime_s (runtime in seconds) and unixstart/
106 |    unixend (start and end times in seconds since the epoch).
107 | 
108 | Version 0.5 (2007-08-10):
109 |  - Allow perl path (default = /usr/bin/perl) and
110 |    confdir (default = /etc/slurm) to be overridden at RPM 
111 |    build time via _slurm_confidir and _perl_path.
112 | 
113 | Version 0.4 (2007-08-07):
114 |  - Fix for broken processing of -s, --states.
115 |  - Sort keys are now applied in "ORDER BY" statement of database query.
116 |  - New format keys "longstart" and "longend" for including year in output.
117 |  - longstart/end are displayed by default in "long" and "freeform" output types.
118 |  - Add support for user configuration in ~/.sqlog. 
119 |  - When sorting start and end time, assume "NONE" is the max date & time.
120 |  - Add string [More results available...] if more results may be in database.
121 |  - Added manpage for sqlog-db-util(8).
122 |  - Manpage and --usage output cleanup.
123 | 
124 | Version 0.3 (2007-08-04):
125 |  - Enable auto-creation of database from slurm-joblog script. 
126 |  - Add --info option to sqlog-db-util.
127 | 
128 | Version 0.2 (2007-08-03):
129 |  - Add README and NEWS files.
130 | 
131 | Version 0.1 (2007-08-03):
132 |  - Initial release.
133 | 


--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
  1 | The sqlog package contains a set of scripts useful for creating, 
  2 | populating, and issuing queries to a SLURM job log database. 
  3 | 
  4 | COMPONENTS
  5 | 
  6 |   sqlog          The "SLURM Query Log" utility. Provides a single interface to
  7 |                  query jobs from the SLURM job log database and/or current 
  8 |                  queue of running jobs.
  9 | 
 10 |   slurm-joblog   Logs completed jobs using SLURM's jobcomp/script interface
 11 |                  to the SLURM job log database and an optional text file.
 12 | 
 13 |  
 14 |   sqlog-db-util  Administrative utility used to create SLURM job log database
 15 |                  and its corresponding users. Also provides an interface to
 16 |                  "backfill" the database using existing SLURM joblog files
 17 |                  created by the jobcomp/filetxt plugin.
 18 | 
 19 |   sqlog.conf     World-readable config file. Contains local configuration for
 20 |                  SQL host, read-only user, and read-only password.
 21 |  
 22 |   slurm-joblog.conf
 23 |                  Private configuration for slurm-joblog script (also used by 
 24 |                  by sqlog-db-util). Contains SQL read-write user and password,
 25 |                  root user passwd (for sqlog-db-util) and a list of hosts
 26 |                  that should have RW access to DB.
 27 |          
 28 |  
 29 | CONFIGURATION
 30 | 
 31 | For fully-automated operation, both the /etc/slurm/sqlog.conf and
 32 | /etc/slurm/slurm-joblog.conf must exist. These files are read 
 33 | using perl's do() function, so the files can and must be valid perl.
 34 | This allows a bit of scripting to get the values if necessary.
 35 | (See the sqlog doc directory for examples).
 36 | 
 37 | The available variables in each config file include:
 38 | 
 39 |   sqlog.conf:
 40 | 
 41 |   SQLHOST          SQL server hostname (default = sqlhost)
 42 |   SQLUSER          Read-only user      (default = slurm_read)
 43 |   SQLPASS          Read-only password  (default = none)
 44 |   SQLDB            DB name             (default = slurm)
 45 |   TRACKNODES       Set to 0 to disable per-job node tracking (default = 1)
 46 |   %FORMATS         Hash of format aliases (e.g. "f1" => "jid,name,user,state")
 47 | 
 48 |   slurm-joblog.conf:
 49 | 
 50 |   SQLUSER          Read-write user     (default = slurm)
 51 |   SQLPASS          Read-write password (not set)
 52 |   SQLROOTPASS      DB root password    (not set)
 53 |   @SQLRWHOSTS      Read-write hosts    (array of hosts to give rw access)
 54 |   JOBLOGFILE       txt joblog location (set if you want to log to a file too)
 55 |   AUTOCREATE       Attempt to create DB if it doesn't yet exist the
 56 |                    first time slurm-joblog is run (default = no).
 57 | 
 58 | 
 59 | CREATING JOB LOG DATABASE
 60 | 
 61 | Once the config files exist, the following command will create the 
 62 | SLURM job log database:
 63 |  
 64 |  sqlog-db-util --create
 65 | 
 66 | If you have existing text joblog files you'd like to seed the new
 67 | DB with, use 
 68 | 
 69 |   sqlog-db-util --backfill [FILE]...
 70 | 
 71 | e.g.
 72 | 
 73 |   sqlog-db-util --backfill /var/log/slurm/joblog*
 74 | 
 75 | If AUTOCREATE is set in slurm-joblog.conf, then sqlog-db-util --create
 76 | will be automatically run the first time the database is accessed.
 77 | 
 78 | CONVERTING JOB LOG DATABASE
 79 | 
 80 | The database schema changed from v0.12 to v0.13 of the sqlog package.
 81 | The highest schema version currently running on a system can be
 82 | determined from the --info output.
 83 | 
 84 | To create tables for the new schema, run:
 85 | 
 86 |   sqlog-db-util --create
 87 | 
 88 | Once created, the slurm-joblog.pl script will detect the new schema
 89 | and automatically switch to insert records to the new tables.  The sqlog command
 90 | will query both schemas for records.
 91 | 
 92 | To copy existing data from the old schema to the new schema,
 93 | use the --convert option.
 94 | 
 95 | Speeding up the conversion:
 96 |   The new schema tracks the nodes that each job uses so that sqlog queries
 97 |   involving nodes names return much faster.  The data and indicies associated
 98 |   with this node tracking can significantly slow down the conversion operation
 99 |   when converting a large number of records.  There are two options to speed
100 |   this up:
101 | 
102 |     1) Disable node-tracking for all converted jobs via the --notrack option.
103 |     2) Delay indexing of converted data via the --delay-index option.
104 | 
105 |   With the --notrack option, no node-tracking data will be stored for jobs
106 |   inserted via conversion.  As such, if node-tracking is enabled on the
107 |   system, such jobs will not return in queries involving node names.  Newly
108 |   inserted jobs will still have node-tracking data.
109 | 
110 |   With the --delay-index option, node tracking indicies are removed before
111 |   data is converted, and they are restored when the conversion completes.
112 |   Queries involving node names while there are no indicies will take a very
113 |   long time to return on a large database.
114 | 
115 |   For a database on Atlas, which had 580,000 jobs spanning two years the
116 |   conversion took:
117 | 
118 |     13 minutes for:  sqlog-db-util --convert --notrack
119 |     33 minutes for:  sqlog-db-util --convert --delay-index
120 |     85 minutes for:  sqlog-db-util --convert
121 | 
122 |   The recommended method is to use --delay-index.
123 | 
124 |   It's also possible to disable node-tracking in the new schema completely.
125 |   To do this, add the following line to the sqlog.conf file.
126 | 
127 |     $TRACKNODES=0;
128 |  
129 | Number of allocated cores:
130 |   The new schema adds a new field to record the number of cores allocated
131 |   to a job.  This data was not captured in the version 1 schema.  However,
132 |   on many systems, this core count can be computed.  On systems that have the
133 |   same number of cores per node and allocate whole nodes to a job, one may
134 |   use the --cores-per-node option to specify the number of cores per node.
135 |   This --cores-per-node value is multiplied with the node count recorded
136 |   in the version 1 schema to determine the number of cores allocated to
137 |   the job.  For example, to convert from schema version 1 to version 2 on
138 |   a machine that has 8 cores per node and allocates whole nodes to jobs,
139 |   run the following command:
140 | 
141 |     sqlog-db-util --convert --cores-per-node=8
142 | 
143 |   For all other systems, do not specify --cores-per-node.  In this case,
144 |   the number of cores allocated will be set to 0.  The conversion command
145 |   on these systems is simply:
146 | 
147 |     sqlog-db-util --convert
148 | 
149 | If a mistake is made during conversion, you can drop the version 2 tables
150 | and start from scratch (be very careful to specify '2' and not '1' here):
151 | 
152 |   sqlog-db-util --drop=2
153 | 
154 | You may issue the --convert command on a live system, however, be
155 | careful to specify the command correctly in this case.  The slurm-joblog.pl
156 | script will insert records to the new schema as soon as it is created.
157 | If a mistake is made during conversion, and the version 2 tables must
158 | be dropped and recreated, any records inserted by slurm-joblog.pl will be lost.
159 | 
160 | After conversion, sqlog may report duplicate records as it finds
161 | matches from both the version 1 and version 2 tables.  Once converted,
162 | it's recommended that the version 1 tables be dropped by running the
163 | following command (be very careful to specify '1' and not '2' here):
164 | 
165 |   sqlog-db-util --drop=1
166 | 
167 | Finally, here is a full example set of commands to create the new schema
168 | and convert records to it:
169 | 
170 |   sqlog-db-util -v --create
171 |   sqlog-db-util -v --backup=all schema1_jobs.log
172 |   sqlog-db-util -v --convert --delay-index --cores-per-node=8
173 |   sqlog-db-util -v --drop=1
174 | 
175 | BACKING UP AND PRUNING THE DATABASE
176 | 
177 | It is possible to dump records from the job log database into a text
178 | file, which can then be read in via --backfill.  This is useful to
179 | capture a text file backup of the logs.  One must specify the time
180 | period as either "all", "DATE", or "DATE..DATE", to dump all jobs,
181 | jobs before a given date, and jobs that started between two dates,
182 | respectively.  DATE should be specified with the 'YYYY-MM-DD HH:MM:SS'
183 | format, e.g.,
184 | 
185 |   sqlog-db-util -v --backup='2009-01-01 00:00:00'..'2009-02-01 00:00:00'\ 
186 |     logs.txt
187 | 
188 | One utility of this backup option is to share job log records with
189 | others potentially outside of the organization.  Typically, one would
190 | like to protect user and job names when sharing such information.
191 | For this, an --obfuscate option is available which dumps records and
192 | modifies user names to be of the form "user_X", userids to match "X",
193 | and job names to be of the form "job_Y", where X and Y are numbers.
194 | 
195 | Finally, over a long period of time, the database may gather so many
196 | records that is slows down significantly.  A --prune option is available
197 | to remove old records.  One specifies a date, and all jobs which started
198 | before that date will be removed from the database and written to a file
199 | name specified by the user, e.g.,
200 | 
201 |   sqlog-db-util -v --prune='2007-01-01 00:00:00' pre2007.log
202 | 
203 | ENABLE JOB LOGGING
204 | 
205 | To enable the SLURM job log database, the following configuration
206 | options must be set in slurm.conf:
207 | 
208 |   JobCompType = jobcomp/script
209 |   JobCompLoc = /usr/libexec/sqlog/slurm-joblog
210 | 
211 | Adjust the path if the sqlog RPM was installed with a different PREFIX.
212 | This has only been tested on SLURM 1.2.10 or greater.
213 | 
214 | Restart slurmctld and slurm-joblog will begin logging jobs as they
215 | complete.
216 | 
217 | $Id$
218 | 


--------------------------------------------------------------------------------
/skewstats:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl
  2 | ###############################################################################
  3 | #  $Id$
  4 | #******************************************************************************
  5 | #  Copyright (C) 2007-2009  Lawrence Livermore National Security, LLC.
  6 | #  Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
  7 | #  Written by Mark Grondona <mgrondona@llnl.gov>
  8 | #
  9 | #  UCRL-CODE-235340.
 10 | #
 11 | #  This file is part of sqlog.
 12 | #
 13 | #  This is free software; you can redistribute it and/or modify it
 14 | #  under the terms of the GNU General Public License as published by
 15 | #  the Free Software Foundation; either version 2 of the License, or
 16 | #  (at your option) any later version.
 17 | #
 18 | #  This is distributed in the hope that it will be useful, but WITHOUT
 19 | #  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 20 | #  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 21 | #  for more details.
 22 | #
 23 | #  You should have received a copy of the GNU General Public License
 24 | #  along with this program; if not, see <http://www.gnu.org/licenses/>.
 25 | #
 26 | ##############################################################################
 27 | #
 28 | #  skewstats - SLURM Queue Stats 
 29 | #
 30 | #  Report simple SLURM utilization and other statistics for a configurable time 
 31 | #   window optionally split into intervals. Uses the sqlog(1) utility to pull
 32 | #   historical job information from the SLURM job log database.
 33 | #
 34 | ##############################################################################
 35 | use strict;
 36 | use Date::Manip;
 37 | use Hostlist qw/ expand compress union /;
 38 | use Getopt::Long qw/ :config gnu_getopt /;
 39 | use Time::HiRes qw/ tv_interval gettimeofday /;
 40 | use File::Basename;
 41 | 
 42 | my @utilization = ();
 43 | my %opt = ();
 44 | my $progname = basename ($0);
 45 | 
 46 | ##############################################################################
 47 | #
 48 | #  Usage:
 49 | #
 50 | 
 51 | my $usage = <<EOF
 52 | Usage: $progname [OPTIONS]...
 53 | 
 54 |  Report simple SLURM queue statistics such as cluster utilization and
 55 |  total number of jobs run during a given time period. Uses the sqlog(1) 
 56 |  utility to query historical job information from the SLURM job log database.
 57 | 
 58 |  -h, --help         Display this help message.
 59 |  -v, --verbose      Increase verbosity.
 60 |  -H, --noheader     Don't include a header row on output.
 61 | 
 62 |  -s, --start=DATE   Initial date and time of statistics window.
 63 |                      Default is 12AM of the current day.
 64 | 
 65 |  -e, --end=DATE     End date and time of statistics window. The default value 
 66 |                      is the current time.  The DATE value may optionally begin 
 67 |                      with a '+' character, in which case the value is considered
 68 |                      an offset from the start time.  E.g.: --end=+3d would 
 69 |                      specify an end date 3 days past the start date and time.
 70 | 
 71 |  -i, --interval=T   Split time window into intervals of size T, e.g.
 72 |                       "1hr" (1 hour), or "30m" (30 minutes), etc.
 73 | 
 74 |  -t, --total        When used with -i, --interval, also include total
 75 |                     utilization from --start to --end on last line of output.
 76 | 
 77 |  -n, --nnodes=N     Specify a total of N nodes available.
 78 |                      (Default = currently configured node count)
 79 | 
 80 |  -o, --output=TYPE  Choose the type of stats to output. Current choices include:
 81 |                     utilization*, jobstats, jobsize, all (* = default).
 82 | 
 83 | EOF
 84 | ;
 85 | 
 86 | ##############################################################################
 87 | 
 88 | my %fmt = (
 89 |     "starttime"   => { "name" => "STARTTIME",  "fmt" => "@>>>>>>>>>>>>>>" },
 90 |     "endtime"     => { "name" => "ENDTIME",    "fmt" => "@>>>>>>>>>>>>>>" },
 91 |     "njobs"       => { "name" => "NJOBS",      "fmt" => "@######"         },
 92 |     "avgjobsize"  => { "name" => "AVGSZ",      "fmt" => "@###.##"         },
 93 |     "maxsize"     => { "name" => "MAXSZ",      "fmt" => "@####"           },
 94 |     "minsize"     => { "name" => "MINSZ",      "fmt" => "@####"           },
 95 |     "medsize"     => { "name" => "MEDSZ",      "fmt" => "@###.##"         },
 96 |     "njobstarts"  => { "name" => "STARTED",    "fmt" => "@######"         },
 97 |     "njobends"    => { "name" => "ENDED",      "fmt" => "@#####"          },
 98 |     "NF"          => { "name" => "NF",         "fmt" => "@####"           },
 99 |     "F"           => { "name" => "F",          "fmt" => "@####"           },
100 |     "TO"          => { "name" => "TO",         "fmt" => "@####"           },
101 |     "CA"          => { "name" => "CA",         "fmt" => "@####"           },
102 |     "CD"          => { "name" => "CD",         "fmt" => "@####"           },  
103 |     "utilization" => { "name" => "UTILIZATION","fmt" => "@######.###"     }
104 | );
105 | 
106 | ##############################################################################
107 | #
108 | #  Main script:
109 | 
110 | &parse_options ();
111 | 
112 | #
113 | #  Create user-defined format:
114 | #
115 | &format_create ();
116 | 
117 | #
118 | #  Get list of jobs between start and end time:
119 | #
120 | my @joblist = &get_jobs ();
121 | my @results = &utilization_loop ($opt{start}, $opt{end}, $opt{interval_s}); 
122 | 
123 | if (scalar @results > 1 && $opt{print_total}) {
124 |     push (@results, &utilization ($opt{start}, $opt{end}));
125 | }
126 | 
127 | #
128 | #  Write output using format defined in &format_create()
129 | #
130 | write foreach @results;
131 | 
132 | exit 0;
133 | ##############################################################################
134 | #
135 | #  Functions:
136 | 
137 | sub format_create
138 | {
139 |     my $format_top = "";
140 |     my $format = "";
141 |     my $common = "";
142 |     my %count;
143 | 
144 |     if ($opt{date_format} =~ /^%H:/) {
145 |         $fmt{starttime}{fmt} =~ s/>>>>>//;
146 |         $fmt{endtime}{fmt} =~   s/>>>>>//;
147 |     }
148 | 
149 |     my @form = grep { !$count{$_}++ } split /\s*,\s*/, $opt{format};
150 | 
151 |     for my $f (@form) {
152 |         if (!exists $fmt{$f}) {
153 |             log_fatal ("Invalid format key \"$f\"\n");
154 |             next;
155 |         }
156 |         $common .= "$fmt{$f}{fmt} ";
157 |     }
158 |     $common .= "\n";
159 | 
160 |     $format .= "format STDOUT = \n";
161 |     $format .= $common;
162 |     $format .= join ',', map { '$$_{' . $_ . '}' } @form;
163 |     $format .= "\n.\n";
164 | 
165 |     #
166 |     #  Change '#' and '.' to '>' for format header
167 |     #
168 |     $common =~ s/(#|\.)/>/g;
169 | 
170 |     $format_top .= "format STDOUT_TOP = \n";
171 |     $format_top .= $common;
172 |     $format_top .= join ',', map { '"'. $fmt{$_}{name} . '"' } @form;
173 |     $format_top .= "\n.\n";
174 | 
175 |     eval $format;
176 |     eval $format_top unless $opt{noheader};
177 | }
178 | 
179 | #
180 | #  Calculate utilization for all time intervals
181 | #
182 | sub utilization_loop
183 | {
184 |     my ($t1, $end, $interval) = @_;
185 |     my $t2;
186 |     my @results = ();
187 | 
188 |     #  Loop over all configured intervals printing utilization for each:
189 |     #
190 |     do {
191 |         my $err;
192 | 
193 |         $t2 = DateCalc ($t1, "+${interval}s", \$err);
194 | 
195 |         #  Shrink this interval if it extends past end time.
196 |         #
197 |         $t2 = $end if (Date_Cmp ($t2, $end) > 0);
198 | 
199 |         push (@results, &utilization ($t1, $t2));
200 | 
201 |     } while (Date_Cmp (($t1 = $t2), $end) < 0 && $interval != 0);
202 | 
203 |     return @results;
204 | }
205 | 
206 | #
207 | #  Calculate some stats (including utilization) for a 
208 | #   specific time window. Returns a reference to a hash containing
209 | #   the results.
210 | #
211 | sub utilization 
212 | {
213 |     my ($t1, $t2) = @_;
214 |     my $nodesecs = 0;
215 |     my $njobs = 0;
216 |     my @nnodes = ();
217 |     my $totalnodesecs = 0;
218 |     my $njobstarts = 0;
219 |     my $njobends = 0;
220 |     my %states;
221 | 
222 |     #  Convert start and end time to seconds since epoch for
223 |     #   quicker calculations and comparisons below.
224 |     #
225 |     my $starttime = UnixDate ($t1, "%s");
226 |     my $endtime   = UnixDate ($t2, "%s");
227 | 
228 |     #  Total node-seconds available is nnodes * window, unless
229 |     #   start and end time are the same, in which case we just use
230 |     #   nnodes (instantaneous snapshot).
231 |     #
232 |     $totalnodesecs = ($starttime < $endtime) ? 
233 |                      ($endtime - $starttime) * $opt{nnodes} : $opt{nnodes};
234 | 
235 |     for my $job (@joblist) {
236 |         my $jobstart = $$job{start};
237 |         my $jobend   = $$job{end};
238 | 
239 |         #  Continue if this job ran outside the current interval.
240 |         #
241 |         next if ($jobend < $starttime || $jobstart > $endtime);
242 | 
243 |         #  Count number of jobs and keep a list of node count:
244 |         #
245 |         $njobs++;
246 |         push (@nnodes, $$job{nnodes});
247 |             
248 |         #  Adjust job start and end times if either fell outside
249 |         #   the current time window. Otherwise count the number of jobs
250 |         #   starting and/or ending during this time.
251 |         #
252 |         ($jobstart < $starttime) ? $jobstart = $starttime : $njobstarts++;
253 |         if ($jobend > $endtime) {
254 |             $jobend = $endtime;    
255 |         }
256 |         else {
257 |             $njobends++;
258 |             $states{$$job{state}}++;
259 |         }
260 | 
261 |         #
262 |         #  If we're using an instantaneous snapshot, then just count 
263 |         #   number of nodes used (i.e. runtime == 1).
264 |         #
265 |         my $runtime = ($starttime == $endtime) ? 1 : ($jobend - $jobstart);
266 | 
267 |         #  Add this job's node-seconds used to the total node-seconds
268 |         #   utilized during this interval:
269 |         #
270 |         $nodesecs += ($runtime * $$job{nnodes});
271 |     }
272 | 
273 |     my %r = ();
274 | 
275 |     @nnodes = sort { $a <=> $b } @nnodes;
276 | 
277 |     $r{t1}           = $t1;
278 |     $r{t2}           = $t2;
279 |     $r{starttime}    = UnixDate ($t1, $opt{date_format});
280 |     $r{endtime}      = UnixDate ($t2, $opt{date_format});
281 |     $r{totalnodesec} = $totalnodesecs;
282 |     $r{usednodesec}  = $nodesecs;
283 |     $r{njobs}        = $njobs;
284 |     $r{avgjobsize}   = mean (@nnodes);
285 |     $r{medsize}      = median (@nnodes);
286 |     $r{maxsize}      = $nnodes[$#nnodes];
287 |     $r{minsize}      = $nnodes[0];
288 |     $r{njobstarts}   = $njobstarts;
289 |     $r{njobends}     = $njobends;
290 |     $r{utilization}  = ($nodesecs / $totalnodesecs);
291 |     for my $state (qw/ NF F CA TO CD /) {
292 |         $r{$state} = $states{$state};
293 |     }
294 | 
295 |     return (\%r);
296 | }
297 | 
298 | #
299 | #  Calculate meant of an array of values
300 | #
301 | sub mean
302 | {
303 |     return 0 unless @_;
304 | 
305 |     my $total = 0;
306 |     $total += $_ for @_;
307 |     return $total / scalar @_;
308 | }
309 | 
310 | #
311 | #  Calculate median of an array of values
312 | #
313 | sub median
314 | {
315 |     my @s = sort { $a <=> $b } @_;
316 |     my $n = scalar @s + 1;
317 |     return (($s[($n/2) + 1] + $s[$n/2])/2) if ($n % 2); 
318 |     return $s[$n/2];
319 | }
320 | 
321 | #
322 | #  Return a list of job info hashes for all jobs that ran in the
323 | #   time interval (start, end).
324 | #
325 | sub get_jobs
326 | {
327 |     my @jobs = ();
328 |     my $cmd = "sqlog -Ho jid,unixstart,unixend,runtime_s,nnodes,st -L0";
329 | 
330 |     #  Ignore completing jobs, which don't contribute to utilization
331 |     #
332 |     $cmd .= " -xs CG";
333 | 
334 |     #
335 |     #  If start time is "now" then there is no need for sqlog to 
336 |     #   query the db.
337 |     #
338 |     $cmd .= $opt{snapshot} ? " --no-db" : "";
339 | 
340 |     #  Grab jobs that were running during our configured window:
341 |     #
342 |     $cmd .= " -t $opt{start}..$opt{end}";
343 | 
344 |     log_verbose ("Querying jobs from $opt{start} to $opt{end}.\n");
345 |     log_debug ("Running $cmd\n");
346 | 
347 |     my $t0 = [gettimeofday];
348 | 
349 |     open (SQLOG, "$cmd |") or log_fatal ("Failed to run $cmd: $!\n");
350 |     while (<SQLOG>) {
351 |         chomp;
352 |         push (@jobs, job_entry_create (split));
353 |     }
354 |     close (SQLOG);
355 | 
356 |     log_verbose ("sqlog took ", sprintf ("%.3f", tv_interval ($t0)) , "s.\n");
357 |     log_verbose ("Found ", scalar @jobs, " jobs.\n");
358 | 
359 |     return @jobs;
360 | }
361 | 
362 | #
363 | #  Return a reference to a hash with job info
364 | #
365 | sub job_entry_create
366 | {
367 |     return {  
368 |        "jobid"   => shift,
369 |        "start"   => shift,
370 |        "end"     => shift,
371 |        "runtime" => shift,
372 |        "nnodes"  => shift,
373 |        "state"   => shift,
374 |     };
375 | }
376 | 
377 | #
378 | #  Parse user command line options:
379 | #
380 | sub parse_options
381 | {
382 |     $opt{verbose} = 0;
383 |     $opt{date_format} = "%H:%M:%S";
384 |     $opt{format} = "starttime,endtime,njobs,utilization";
385 | 
386 |     my $start;
387 |     my $end;
388 |     my $err;
389 | 
390 |     my $rc = GetOptions (\%opt,
391 |         'help|h',
392 |         'start|s=s',
393 |         'end|e=s',
394 |         'nnodes|n=i',
395 |         'interval|i=s',
396 |         'print_total|total|t',
397 |         'output|o=s',
398 |         'noheader|H',
399 | 		'verbose|v+',
400 | 
401 |         #  Undocumented: 
402 |         'format=s',
403 |         'date_format|date-format=s',
404 |     );
405 | 
406 |     &usage() if defined $opt{help} || ! $rc;
407 | 
408 |     $opt{format} = "starttime,endtime,njobs,njobstarts,njobends" .
409 |                    ",avgjobsize,utilization"
410 |         if ($opt{jobstats});
411 | 
412 |     #
413 |     #  Default start time is 12AM today, default end time is "now"
414 |     #
415 |     $opt{start} = "today,12am" if (!$opt{start});
416 |     $opt{end}   = "now"        if (!$opt{end});
417 | 
418 |     #
419 |     #  Note that we're doing a snapshot of start time == "now"
420 |     #
421 |     $opt{snapshot}++           if ($opt{start} eq "now");
422 | 
423 |     #
424 |     #  Set nnodes to the number of nodes configured now if not set
425 |     #
426 |     chomp ($opt{nnodes} = `sinfo -ho %D`) if (!$opt{nnodes});
427 | 
428 |     #
429 |     #  Special case: If --end option begins with a '+' consider this
430 |     #   an offset from --start time.
431 |     #
432 |     $opt{end} = DateCalc ($opt{start}, "+$opt{end}", \$err) 
433 |                                if ($opt{end} =~ s/^\+//); 
434 | 
435 |     #
436 |     #  Make sure start and end times can be parsed as datetimes:
437 |     #
438 |     log_fatal ("Failed to parse datetime \"$opt{start}\"\n") 
439 |        if (!($start = ParseDate ($opt{start})));
440 | 
441 |     log_fatal ("Failed to parse datetime \"$opt{end}\"\n") 
442 |        if (!($end = ParseDate ($opt{end})));
443 | 
444 |     log_fatal ("Start time \"$opt{start}\" greater than end time \"$opt{end}\"\n")
445 |        if (Date_Cmp ($start, $end) > 0);
446 | 
447 |     #
448 |     #  Convert start and end time to formats useful for sqlog:
449 |     #
450 |     $opt{start} = UnixDate ($start, "%Y-%m-%dT%H:%M:%S");
451 |     $opt{end} =   UnixDate ($end,   "%Y-%m-%dT%H:%M:%S");
452 | 
453 |     #
454 |     #  Generate interval in seconds (interval_s):
455 |     #
456 |     if (!$opt{interval}) {
457 |         $opt{interval_s} = UnixDate($end, "%s") - UnixDate($start, "%s");
458 |     }
459 |     else {
460 |         $opt{interval_s} = duration_to_seconds ($opt{interval}) 
461 |            or log_fatal ("Invalid interval \"$opt{interval}\"\n");
462 |     }
463 | 
464 |     #
465 |     #  Include MM-DD in default date format if start and end were on
466 |     #   different days.
467 |     #
468 |     $opt{date_format} = "%m-%d-%H:%M:%S"
469 |         if (UnixDate ($opt{start}, "%m%d") != UnixDate ($opt{end}, "%m%d"));
470 | 
471 |     if ($opt{output}) {
472 |         my $format = "starttime,endtime,njobs";
473 |         for (split /,/, $opt{output}) {
474 |             /^util(ization)?/ && 
475 |                 do { $format .= ",utilization"; next; };
476 |             /^(job)?stats/         && 
477 |                 do { $format .= ",njobends,F,NF,TO,CA,CD"; next; };
478 |             /^(job)?starts/        && 
479 |                 do { $format .= ",njobstarts"; next; };
480 |             /^(job)?size/          &&
481 |                 do { $format .= ",maxsize,minsize,avgjobsize,medsize"; next; };
482 |             /^all/                 && 
483 |                 do { $format .= ",njobstarts,njobends,maxsize,minsize" .
484 |                                 ",avgjobsize,medsize,F,NF,TO,CA,CD,utilization"; next };
485 |             log_fatal ("Invalid argument: --output \"$_\"\n");
486 |         }
487 | 
488 |         log_debug ("format = $format\n");
489 |         $opt{format} = $format;
490 |     }
491 | 
492 | }
493 | 
494 | 
495 | #  Convert a duration to seconds. 
496 | #
497 | #  Valid duration strings include the common SLURM form D-HH:MM:SS
498 | #    or the form 3H
499 | #
500 | sub duration_to_seconds
501 | {
502 |     my ($t) = @_;
503 |     my ($d, $h, $m, $s);
504 | 
505 |     #
506 |     #  list of valid regexes to check in order.
507 |     #  1. DD-HH:MM:SS type 
508 |     #  2. 1hr20min or 1h20m type
509 |     #
510 |     my @regexes = qw/
511 |       ^(?:(\d+)(?:-))?(\d*?):?(\d*?):(\d+)$
512 |       ^(\d*?)(?i:d|days?)?(\d*?)(?i:hr?)?(\d*?)(?i:m|min)?(\d*)(?i:s||sec)?$
513 |       NOTFOUND
514 |       /;
515 | 
516 |     for my $re (@regexes) {
517 |         return undef if ($re eq "NOTFOUND");
518 |         last if (($d, $h, $m, $s) = ($t =~ /$re/)) 
519 |     }
520 | 
521 |     log_debug ("duration_to_seconds ($t): d=$d h=$h m=$m s=$s\n");
522 | 
523 |     return (($s||0) + ($m||0) * 60 + ($h||0) * 3600 + ($d||0) * 3600 * 24);
524 | }
525 | 
526 | 
527 | sub usage 
528 | {
529 |     print STDERR $usage;
530 |     exit 1;
531 | }
532 | 
533 | sub log_msg     { print STDERR "$progname: ", @_; }
534 | sub log_verbose { log_msg (@_) if ($opt{verbose} > 0); }
535 | sub log_debug   { log_msg (@_) if ($opt{verbose} > 1); }
536 | sub log_error   { log_msg ("Error: ", @_);             }
537 | sub log_fatal   { log_msg ("Fatal: ", @_); exit 1;     }
538 | 
539 | # vi: ts=4 sw=4 expandtab
540 | 


--------------------------------------------------------------------------------
/skewstats.1:
--------------------------------------------------------------------------------
  1 | .\" $Id$
  2 | .\"
  3 | 
  4 | .TH SKEWSTATS 1 "SLURM Queue Stats"
  5 | 
  6 | .SH NAME
  7 | skewstats \- report simple SLURM queue statistics 
  8 | 
  9 | .SH SYNOPSIS
 10 | .B skewstats
 11 | [\fIOPTIONS\fR]...
 12 | 
 13 | .SH DESCRIPTION
 14 | The \fBskewstats\fR utility reports simple SLURM queue statistics 
 15 | such as number of jobs executed and average utilization over
 16 | defined time periods. It uses the \fBsqlog\fR(1) utility to query
 17 | the SLURM job log database, so \fBskewstats\fR can be used to
 18 | report on historical data.
 19 | 
 20 | By default \fBskewstats\fR reports number of jobs run during the
 21 | specified time period, as well as the cluster utilization. However,
 22 | other data is available such as average job size, number of jobs
 23 | that started running during the specified time interval, and number
 24 | of jobs that ended. Other data may be included in future versions
 25 | of the script.
 26 | 
 27 | The default time window for which \fBskewstats\fR reports when
 28 | run with no arguments is 12AM, today to the current time. Thus
 29 | it is reporting statistics for the current day so far. Both the
 30 | start and end time of the window may be specified with the
 31 | \fI--start\fR and \fI--end\fR options. For instance, to get 
 32 | statistics for the current instant only, \fI--start\fR=\fBnow\fR may be 
 33 | used. See the \fIOPTIONS\fR section below for further information.
 34 | 
 35 | Unless the \fI-i, --interval\fR option is used, \fBskewstats\fR
 36 | will display one line of output for the entire time period from
 37 | \fI--start\fR to \fIend\fR. The \fI--interval\fR option can be
 38 | used to break the time period into equal-sized intervals, and 
 39 | stats are summarized for each interval in turn. If the \fI-t, --total\fR
 40 | option is used, a final summary line is displayed for the
 41 | whole time period. 
 42 | 
 43 | .SH OPTIONS
 44 | .TP
 45 | .BI "-h, --help"
 46 | Display a summary of the command-line options.
 47 | .TP
 48 | .BI "-v, --verbose"
 49 | Increase debugging verbosity of the program.
 50 | .TP
 51 | .BI "-H, --noheader"
 52 | Do not display a header row in output.
 53 | .TP
 54 | .BI "-s, --start " DATETIME
 55 | Provide a start date and time for the time window of interest.
 56 | The default start time is today, 12AM.
 57 | .TP
 58 | .BI "-e, --end " DATETIME
 59 | Provide an end date and time for the window of interest.
 60 | The default end time is the current time (or \fInow\fR). As a special
 61 | case, if the end DATETIME begins with a plus \fI+\fR, then the end
 62 | date and time is considered to be an offset from the start time.
 63 | .TP
 64 | .BI "-i, --interval " DURATION
 65 | Split time window into intervals of size \fIDURATION\fR. DURATION may
 66 | have the form DD-HH:MM:SS or DDdHHhMMmSSs or DDdaysHHhrMMminSSsec,
 67 | where DD is days, HH is hours, MM is minutes, and SS is seconds. In the
 68 | second two forms, values that are zero may be lef out, e.g. 4hr.
 69 | In the first form, days, hours, and minutes are optional, e.g. :04 
 70 | is 4 seconds.
 71 | .TP
 72 | .BI "-t, --total " 
 73 | Include a final line summarizing statistics for the total time
 74 | period when using the \fI--interval\fR option.
 75 | .TP
 76 | .BI "-o, --output " TYPE
 77 | Select alternate output statistics. By default, TYPE is 
 78 | "utilization" which includes the number of jobs run and the
 79 | cluster utilization during the specified time window. Alternate
 80 | output types include:
 81 | .TP 20
 82 | .B "utilization | util"
 83 | This is default output type. It reports the cluster utilization for the
 84 | configured time window.
 85 | .TP
 86 | .B "jobstats | stats" 
 87 | Report job completion statistics including the number of jobs that ended
 88 | during the time window, and the number of jobs that ended with each of
 89 | the job state codes F = failed, NF = node failure, TO = timed out,
 90 | CA = cancelled, and CD = completed.
 91 | .TP
 92 | .B "jobstarts | starts"
 93 | Report the number of jobs that started during the time window.
 94 | .TP 
 95 | .B "jobsize | size"
 96 | Report simple job size statistics, including the maximum, minimum,
 97 | average, and median job size.
 98 | .TP
 99 | .B "all"
100 | Report all job statistics at once.
101 | 
102 | .SH EXAMPLES
103 | Display number of jobs run and utilization from 3 hrs ago until now:
104 | .nf
105 | 
106 |     skewstats --start=-3hr
107 | 
108 | .fi
109 | Display statistics since yesterday at 8AM divided into 1hr intervals, 
110 | including extra job stats such as number of jobs starting and ending
111 | during each time interval:
112 | .nf
113 | 
114 |     skewstats -i 1h -s yesterday,8am --jobstats
115 | 
116 | .fi
117 | Display stats for the 4 hour window starting on April 3rd, 2008, 09:00
118 | .nf
119 | 
120 |     skewstats --start=2008-04-03T09:00 --end=+3hr
121 | 
122 | .fi  
123 | .SH AUTHOR
124 | Written by Mark Grondona.
125 | 
126 | .SH "SEE ALSO"
127 | .BR sqlog (1),
128 | .BR squeue (1),
129 | .BR sinfo (1)
130 | 


--------------------------------------------------------------------------------
/slurm-joblog.conf.example:
--------------------------------------------------------------------------------
 1 | ###############################################################################
 2 | # $Id$
 3 | ###############################################################################
 4 | #
 5 | #  SLURM Job log utility config file. 
 6 | #
 7 | #  Allows configuration of the following:
 8 | #
 9 | #   SQLUSER     : The Read-write username for the DB
10 | #   SQLPASS     : Read-write password
11 | #   SQLROOTPASS : Root password (Needed for DB creation)
12 | #   JOBLOGFILE  : Location of joblog file (empty if you don't want a logfile) 
13 | #   SQLRWHOSTS  : Array of all hosts from which to allow RW user access.
14 | #   SQLNETWORK  : Restricted network for db  (default = 192.168.%.%)
15 | #
16 | package conf;
17 | use Genders;
18 | 
19 | $SQLUSER = "slurm";
20 | $SQLPASS = "MyReadWritePassword";
21 | $SQLNETWORK = "192.168.%.%";
22 | 
23 | # Root password needed for creation of SLURM tables
24 | $SQLROOTPASS = "MyRootPassword";
25 | 
26 | # Attempt to autocreate DB if it doesn't exist
27 | $AUTOCREATE = 1;
28 | 
29 | # Job log file. If no logfile, set to empty.
30 | $JOBLOGFILE = "/var/log/slurm/joblog";
31 | 
32 | # Give rw access to slurm db from these hosts
33 | @SQLRWHOSTS = get_rw_nodes ();
34 | 
35 | 1;
36 | 
37 | sub get_rw_nodes
38 | {
39 | 	my $g = Genders->new ();
40 | 	my @nodes = $g->getnodes ("mysqld");
41 | 	push (@nodes, $g->getnodes ("primgmt"));
42 | 	push (@nodes, $g->getnodes ("altmgmt"));
43 | 
44 | 	# Include altnames
45 | 	push (@nodes, map { $g->getattrval ("altname", $_) } @nodes);
46 | 
47 | 	return (@nodes);
48 | }
49 | # vi: ts=4 sw=4
50 | 


--------------------------------------------------------------------------------
/slurm-joblog.pl:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/perl -w
  2 | ###############################################################################
  3 | #  $Id$
  4 | #******************************************************************************
  5 | #  Copyright (C) 2007-2009  Lawrence Livermore National Security, LLC.
  6 | #  Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
  7 | #  Written by Adam Moody <moody20@llnl.gov> and
  8 | #             Mark Grondona <mgrondona@llnl.gov>
  9 | #
 10 | #  UCRL-CODE-235340.
 11 | #
 12 | #  This file is part of sqlog.
 13 | #
 14 | #  This is free software; you can redistribute it and/or modify it
 15 | #  under the terms of the GNU General Public License as published by
 16 | #  the Free Software Foundation; either version 2 of the License, or
 17 | #  (at your option) any later version.
 18 | #
 19 | #  This is distributed in the hope that it will be useful, but WITHOUT
 20 | #  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 21 | #  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 22 | #  for more details.
 23 | #
 24 | #  You should have received a copy of the GNU General Public License
 25 | #  along with this program; if not, see <http://www.gnu.org/licenses/>.
 26 | ###############################################################################
 27 | #
 28 | #  This script is run by the SLURM controller at every job completion
 29 | #   to insert records in the job completion database.
 30 | #
 31 | #
 32 | require 5.005;
 33 | use strict;
 34 | use lib qw();	# Required for _perl_libpaths RPM option
 35 | use DBI;
 36 | use File::Basename;
 37 | use POSIX qw(strftime);
 38 | use Hostlist qw(expand);
 39 | 
 40 | # Required for _path_env_var RPM option
 41 | $ENV{PATH} = '/bin:/usr/bin:/usr/sbin';
 42 | 
 43 | my $prog = basename $0;
 44 | 
 45 | #  List of job variables provided in ENV by SLURM.
 46 | #
 47 | my @SLURMvars = qw(JOBID UID JOBNAME JOBSTATE PARTITION LIMIT START END
 48 |                    NODES PROCS NODECNT);
 49 | 
 50 | #  List of parameters (in order) to pass to SQL execute command below.
 51 | #  
 52 | my @params    = qw(jobid username uid jobname jobstate partition limit
 53 | 		           start end nodes nodecount procs);
 54 | 
 55 | # 
 56 | #  Set up SQL parameters
 57 | #
 58 | my %conf = ();
 59 | $conf{db}      = "slurm";
 60 | $conf{sqluser} = "slurm";
 61 | $conf{sqlpass} = "";
 62 | $conf{sqlhost} = "sqlhost";
 63 | $conf{stmt_v1} = qq(INSERT INTO slurm_job_log VALUES (?,?,?,?,?,?,?,?,?,?,?,?));
 64 | $conf{confdir} = "/etc/slurm";
 65 | 
 66 | # enables / disables node tracking per job in version 2 schema
 67 | $conf{track} = 1;
 68 | 
 69 | # assume neither version 1 nor version 2 are available
 70 | $conf{version}{1} = 0;
 71 | $conf{version}{2} = 0;
 72 | 
 73 | #
 74 | #  Autocreate slurm_job_log DB if it doesn't exist?
 75 | #
 76 | $conf{autocreate} = 0;
 77 | 
 78 | #
 79 | #  Default job logfile. If empty, no logfile is used.
 80 | #
 81 | $conf{joblogfile} = "/var/log/slurm/joblog";
 82 | 
 83 | #
 84 | #  Read db, user, password, host from config files:
 85 | #
 86 | read_config ();
 87 | #
 88 | #  Get SLURM-provided env vars and add to config
 89 | #
 90 | get_slurm_vars ();
 91 | 
 92 | #  Append job log to database.
 93 | my $success = append_job_db ();
 94 | 
 95 | #  Append to text file.if requested or DB failed.
 96 | if ($conf{joblogfile} || !$success) {
 97 |     append_joblog ();
 98 | }
 99 | 
100 | exit 0;
101 | 
102 | 
103 | #
104 | #  Error logging functions:
105 | #
106 | sub log_msg 
107 | {
108 |     my @msg = @_;
109 |     my $logfile = "/var/log/slurm/jobcomp.log";
110 | 
111 |     if (!open (LOG, ">>$logfile")) {
112 |         print STDERR @msg;
113 |         return;
114 |     }
115 |     print LOG scalar localtime, ": ", @msg;
116 |     close (LOG);
117 | }
118 | 
119 | sub log_error
120 | {
121 |     log_msg "$prog: Error: ", @_;
122 |     return undef
123 | }
124 | 
125 | sub log_fatal
126 | {
127 |     log_msg "$prog: Fatal: ", @_;
128 |     exit 1;
129 | }
130 | 
131 | 
132 | sub read_config
133 | {
134 |     my $ro = "$conf{confdir}/sqlog.conf";
135 |     my $rw = "$conf{confdir}/slurm-joblog.conf";
136 | 
137 |     # First read sqlog config to get SQLHOST and SQLDB
138 |     #  (ignore SQLUSER)
139 |     unless (my $rc = do $ro) {
140 |         return log_error ("Couldn't parse $ro: $@\n") if $@;
141 |         return log_error ("couldn't run $ro\n") if (defined $rc && !$rc);
142 |     }
143 |     $conf{sqlhost} = $conf::SQLHOST if (defined $conf::SQLHOST);
144 |     $conf{db}      = $conf::SQLDB   if (defined $conf::SQLDB);
145 | 
146 |     # enable / disable per job node tracking
147 |     $conf{track} = $conf::TRACKNODES if (defined $conf::TRACKNODES);
148 | 
149 |     undef $conf::SQLUSER;
150 |     undef $conf::SQLPASS;
151 | 
152 |     # Now read slurm-joblog.conf
153 |     -r $rw  || return log_error ("Unable to read required config file: $rw.\n");
154 |     unless (my $rc = do $rw) {
155 |         return log_error ("Couldn't parse $rw: $@\n") if $@;
156 |         return log_error ("couldn't run $rw\n") if (defined $rc && !$rc);
157 |     }
158 | 
159 |     $conf{sqluser}    = $conf::SQLUSER    if (defined $conf::SQLUSER);
160 |     $conf{sqlpass}    = $conf::SQLPASS    if (defined $conf::SQLPASS);
161 |     $conf{joblogfile} = $conf::JOBLOGFILE if (defined $conf::JOBLOGFILE);
162 |     $conf{autocreate} = $conf::AUTOCREATE if (defined $conf::AUTOCREATE);
163 | }
164 | 
165 | 
166 | sub get_slurm_vars
167 | {
168 |     # if a job is cancelled before it starts,
169 |     # set reasonable defaults for missing variables
170 |     #   PROCS may be set to the number of requested
171 |     #         processors (don't know), force it to 0
172 |     #   NODECNT may be set to the number of requested
173 |     #           nodes (don't know), we don't use this anyway
174 |     if (not $ENV{NODES}) {
175 |         $ENV{NODES} = "";
176 |         $ENV{PROCS} = 0;
177 |     }
178 | 
179 |     # set fields in conf corresponding to each SLURM variable
180 |     for my $var (@SLURMvars) {
181 |         exists $ENV{$var} or
182 |             log_fatal "$var not set in script environment! Aborting...\n";
183 |         $conf{lc $var} = $ENV{$var};
184 |     }
185 | 
186 |     # get username
187 |     $conf{username}  = getpwuid($conf{uid});
188 | 
189 |     # If NODECNT wasn't set, try counting the list of nodes:
190 |     #
191 |     #  XXX: SLURM's NODECNT variable is incorrect in many cases,
192 |     #  e.g. when a job's state is NODE_FAIL, NODECNT will have been
193 |     #  decremented at the time the job ends (from the failed node)
194 |     #  So, we unfortunately cannot trust the NODECNT variable here.
195 |     #
196 |     #if (defined $conf{nodecnt}) {
197 |     #    $conf{nodecount} = $conf{nodecnt};
198 |     #}
199 |     #else {
200 |     #    $conf{nodecount} = ($conf{nodes} =~ /^\s*$/) ? 0 : expand($conf{nodes});
201 |     #}
202 |     $conf{nodecount} = ($conf{nodes} =~ /^\s*$/) ? 0 : expand($conf{nodes});
203 | }
204 | 
205 | sub create_db
206 | {
207 |     my $cmd = "sqlog-db-util --create";
208 |     system ($cmd);
209 |     if ($?>>8) {
210 |         log_error ("'$cmd' exited with exit code ", $?>>8, "\n");
211 |         return (0);
212 |     }
213 | 
214 |     log_msg ("Created DB $conf{db} at host $conf{sqlhost}\n");
215 | 
216 |     return (1);
217 | }
218 | 
219 | ########################################
220 | # These following functions are similar to those in sqlog-db-util
221 | # TODO: Move these to a perl module?
222 | ########################################
223 | 
224 | # cache for name ids, saves us from hitting the database
225 | # over and over at the cost of more memory
226 | # not really needed in this case (insert of a single job),
227 | # but this way, the functions are the same as sqlog-db-util
228 | my %IDcache = ();
229 | %{$IDcache{nodes}} = ();
230 | 
231 | # execute (do) sql statement on dbh
232 | sub do_sql {
233 |     my ($dbh, $stmt) = @_;
234 |     if (!$dbh->do ($stmt)) {
235 |       log_error ("SQL failed: $stmt\n");
236 |       return 0;
237 |     }
238 |     return 1;
239 | }
240 | 
241 | # returns 1 if table exists, 0 otherwise
242 | sub table_exists
243 | {
244 |     my $dbh   = shift @_;
245 |     my $table = shift @_;
246 | 
247 |     # check whether our database has a table by the proper name
248 |     my $sth = $dbh->prepare("SHOW TABLES;");
249 |     if ($sth->execute()) {
250 |         while (my ($name) = $sth->fetchrow_array()) {
251 |             if ($name eq $table) { return 1; }
252 |         }
253 |     }
254 | 
255 |     # didn't find it
256 |     return 0;
257 | }
258 | 
259 | # return the auto increment value for the last inserted record
260 | sub get_last_insert_id
261 | {
262 |     my $dbh = shift @_;
263 |     my $id = undef;
264 | 
265 |     my $sql = "SELECT LAST_INSERT_ID();";
266 |     my $sth = $dbh->prepare($sql);
267 |     if ($sth->execute()) {
268 |         ($id) = $sth->fetchrow_array();
269 |     } else {
270 |         log_error ("Fetching last id: $sql\n");
271 |     }
272 | 
273 |     return $id;
274 | }
275 | 
276 | # given a table and name,
277 | # read id for name from table and add to id cache if found
278 | sub read_id
279 | {
280 |     my $dbh   = shift @_;
281 |     my $table = shift @_;
282 |     my $name  = shift @_;
283 | 
284 |     my $id = undef;
285 | 
286 |     # if name is not set, don't try to look it up in hash, just return undef
287 |     if (not defined $name) { return $id; }
288 | 
289 |     if (not defined $IDcache{$table}) { %{$IDcache{$table}} = (); }
290 |     if (not defined $IDcache{$table}{$name}) {
291 |         my $q_name = $dbh->quote($name);
292 |         my $sql = "SELECT * FROM `$table` WHERE `name` = $q_name;";
293 |         my $sth = $dbh->prepare($sql);
294 |         if ($sth->execute ()) {
295 |             my ($table_id, $table_name) = $sth->fetchrow_array ();
296 |             if (defined $table_id and defined $table_name) {
297 |                 $IDcache{$table}{$name} = $table_id;
298 |                 $id = $table_id;
299 |             }
300 |         } else {
301 |             log_error ("Reading record: $sql --> " . $dbh->errstr . "\n");
302 |         }
303 |     } else {
304 |         $id = $IDcache{$table}{$name};
305 |     }
306 | 
307 |     return $id;
308 | }
309 | 
310 | # insert name into table if it does not exist, and return its id
311 | sub read_write_id
312 | {
313 |     my $dbh   = shift @_;
314 |     my $table = shift @_;
315 |     my $name  = shift @_;
316 | 
317 |     # attempt to read the id first,
318 |     # if not found, insert it and return the last insert id
319 |     my $id = read_id ($dbh, $table, $name);
320 |     if (not defined $id) {
321 |         my $q_name = $dbh->quote($name);
322 |         my $sql = "INSERT IGNORE INTO `$table` (`id`,`name`)" .
323 |                   " VALUES (NULL,$q_name);";
324 |         my $sth = $dbh->prepare($sql);
325 |         if ($sth->execute ()) {
326 |             # user read_id here instead of get_last_insert_id
327 |             # to avoid race conditions
328 |             $id = read_id ($dbh, $table, $name);
329 |             if (not defined $id) {
330 |                 log_error ("Error inserting new record (id undefined): $sql\n");
331 |                 $id = 0;
332 |             } elsif ($id == 0) {
333 |                 log_error ("Error inserting new record (id=0): $sql\n");
334 |                 $id = 0;
335 |             }
336 |         } else {
337 |             log_error ("Error inserting new record: $sql --> " .
338 |                        $dbh->errstr . "\n");
339 |             $id = 0;
340 |         }
341 |     }
342 | 
343 |     return $id;
344 | }
345 | 
346 | # given a reference to a list of nodes,
347 | # read their ids from the nodes table and add them to the id cache
348 | sub read_node_ids
349 | {
350 |     my $dbh       = shift @_;
351 |     my $nodes_ref = shift @_;
352 |     my $success = 1;
353 | 
354 |     # build list of nodes not in our cache
355 |     my @missing_nodes = ();
356 |     foreach my $node (@$nodes_ref) {
357 |         if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; }
358 |     }
359 | 
360 |     # if any missing nodes, try to look up their values
361 |     if (@missing_nodes > 0) {
362 |         my @q_nodes = map $dbh->quote($_), @missing_nodes;
363 |         my $in_nodes = join(",", @q_nodes);
364 |         my $sql = "SELECT * FROM `nodes` WHERE `name` IN ($in_nodes);";
365 |         my $sth = $dbh->prepare($sql);
366 |         if ($sth->execute ()) {
367 |             while (my ($table_id, $table_name) = $sth->fetchrow_array ()) {
368 |                 $IDcache{nodes}{$table_name} = $table_id;
369 |             }
370 |         } else {
371 |             log_error ("Reading nodes: $sql --> " . $dbh->errstr . "\n");
372 |             $success = 0;
373 |         }
374 |     }
375 | 
376 |     return $success;
377 | }
378 | 
379 | # given a reference to a list of nodes,
380 | # insert them into the nodes table and add their ids to the id cache
381 | sub read_write_node_ids
382 | {
383 |     my $dbh       = shift @_;
384 |     my $nodes_ref = shift @_;
385 |     my $success = 1;
386 | 
387 |     # read node_ids for these nodes into our cache
388 |     read_node_ids($dbh, $nodes_ref);
389 | 
390 |     # if still missing nodes, we need to insert them
391 |     my @missing_nodes = ();
392 |     foreach my $node (@$nodes_ref) {
393 |         if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; }
394 |     }
395 |     if (@missing_nodes > 0) {
396 |         my @q_nodes = map $dbh->quote($_), @missing_nodes;
397 |         my $values = join("),(", @q_nodes);
398 |         my $sql = "INSERT IGNORE INTO `nodes` (`name`) VALUES ($values);";
399 |         my $sth = $dbh->prepare($sql);
400 |         if (not $sth->execute ()) {
401 |             log_error ("Inserting nodes: $sql --> " . $dbh->errstr . "\n");
402 |             $success = 0;
403 |         }
404 | 
405 |         # fetch ids for just inserted nodes
406 |         read_node_ids($dbh, $nodes_ref);
407 |     }
408 | 
409 |     return $success;
410 | }
411 | 
412 | # given a job_id and a nodelist,
413 | # insert jobs_nodes records for each node used in job_id
414 | sub insert_job_nodes
415 | {
416 |     my $dbh      = shift @_;
417 |     my $job_id   = shift @_;
418 |     my $nodelist = shift @_;
419 |     my $success = 1;
420 | 
421 |     if (defined $job_id and defined $nodelist and $nodelist ne "") {
422 |         my $q_job_id = $dbh->quote($job_id);
423 | 
424 |         # clean up potentially bad nodelist
425 |         if ($nodelist =~ /\[/ and $nodelist !~ /\]/) {
426 |             # found an opening bracket, but no closing bracket,
427 |             # nodelist is probably incomplete
428 |             # chop back to last ',' or '-' and replace with a ']'
429 |             $nodelist =~ s/[,-]\d+$/\]/;
430 |         }
431 | 
432 |         # get our nodeset
433 |         my @nodes = Hostlist::expand($nodelist);
434 | 
435 |         # this will fill our node_id cache
436 |         read_write_node_ids($dbh, \@nodes);
437 | 
438 |         # get the node_id for each node
439 |         my @values = ();
440 |         foreach my $node (@nodes) {
441 |             if (defined $IDcache{nodes}{$node}) {
442 |                 my $q_node_id = $dbh->quote($IDcache{nodes}{$node});
443 |                 push @values, "($q_job_id,$q_node_id)";
444 |             }
445 |         }
446 | 
447 |         # if we have any nodes for this job, insert them
448 |         if (@values > 0) {
449 |             my $sql = "INSERT DELAYED IGNORE INTO `jobs_nodes`" .
450 |                       " (`job_id`,`node_id`)" .
451 |                       " VALUES " . join(",", @values) . ";";
452 |             my $sth = $dbh->prepare($sql);
453 |             if (not $sth->execute ()) {
454 |                 log_error ("Inserting jobs_nodes records for job id" .
455 |                            " $job_id: $sql --> " . $dbh->errstr . "\n");
456 |                 $success = 0;
457 |             }
458 |         }
459 |     }
460 | 
461 |     return $success;
462 | }
463 | 
464 | # compute time since epoch, attempt to account for DST changes via timelocal
465 | sub get_seconds
466 | {
467 |     my ($date) = @_;
468 |     use Time::Local;
469 | 
470 |     my ($y, $m, $d, $H, $M, $S) = ($date =~ /(\d\d\d\d)\-(\d\d)\-(\d\d) (\d\d):(\d\d):(\d\d)/);
471 |     $y -= 1900;
472 |     $m -= 1;
473 | 
474 |     return timelocal ($S, $M, $H, $d, $m, $y);
475 | }
476 | 
477 | # given hash of values, create mysql values string for insert statement
478 | sub value_string_v2
479 | {
480 |     my $dbh = shift @_;
481 |     my $h   = shift @_;
482 | 
483 |     # given start and end times, compute the number of
484 |     # seconds the job ran for
485 |     # TODO: unsure whether this correctly handles jobs
486 |     # that straddle DST changes
487 |     my $seconds = 0;
488 |     if (defined $h->{StartTime} and $h->{StartTime} !~ /^\s*$/ and
489 |         defined $h->{EndTime}   and $h->{EndTime}   !~ /^\s*$/)
490 |     {
491 |          my $start = get_seconds($h->{StartTime});
492 |          my $end   = get_seconds($h->{EndTime});
493 |          $seconds = $end - $start;
494 |          if ($seconds < 0) { $seconds = 0; }
495 |     }
496 | 
497 |     # if Procs is not set, but ppn is specified and NodeCnt is set,
498 |     # compute Procs (assumes all processors on the node were
499 |     # allocated to the job, only use for clusters which use
500 |     # whole-node allocation)
501 | #    if (not defined $h->{Procs} and defined $conf{ppn} and
502 | #        defined $h->{NodeCnt}
503 | #       )
504 | #    {
505 | #      $h->{Procs} = $h->{NodeCnt} * $conf{ppn};
506 | #    }
507 | 
508 |     # insert the field values, order matters
509 |     my @parts = ();
510 |     push @parts, (defined $h->{Id}) ? $dbh->quote($h->{Id}) : "NULL";
511 |     push @parts, $dbh->quote($h->{JobId});
512 |     push @parts, $dbh->quote(read_write_id($dbh, "usernames",  $h->{UserName}));
513 |     push @parts, $dbh->quote($h->{UserNumb});
514 |     push @parts, $dbh->quote(read_write_id($dbh, "jobnames",   $h->{Name}));
515 |     push @parts, $dbh->quote(read_write_id($dbh, "jobstates",  $h->{JobState}));
516 |     push @parts, $dbh->quote(read_write_id($dbh, "partitions", $h->{Partition}));
517 |     push @parts, $dbh->quote($h->{TimeLimit});
518 |     push @parts, $dbh->quote($h->{StartTime});
519 |     push @parts, $dbh->quote($h->{EndTime});
520 |     push @parts, $dbh->quote($seconds);
521 |     push @parts, $dbh->quote($h->{NodeList});
522 |     push @parts, $dbh->quote($h->{NodeCnt});
523 |     push @parts, (defined $h->{Procs}) ? $dbh->quote($h->{Procs}) : 0;
524 | 
525 |     # finally, return the ('field1','field2',...) string
526 |     return "(" . join(',', @parts) . ")";
527 | }
528 | 
529 | ########################################
530 | # The above functions are similar to those in sqlog-db-util
531 | # TODO: Move these to a perl module?
532 | ########################################
533 | 
534 | #
535 | #  Append data to SLURM job log (database)
536 | #
537 | sub append_job_db
538 | {
539 |     #  Ignore if no sqlhost, just append to txt joblog
540 |     #
541 |     if (!$conf{"sqlhost"}) {
542 |         log_error "No SQLHOST found $conf{sqlhost}\n";
543 |         return 0;
544 |     }
545 | 
546 |     my $str = "DBI:mysql:database=$conf{db};host=$conf{sqlhost}";
547 |     my $dbh = DBI->connect($str, $conf{sqluser}, $conf{sqlpass});
548 | 
549 |     if (!$dbh) {
550 |         if (!$conf{autocreate}) {
551 |             log_error ("Failed to connect to DB at $conf{sqlhost}: ",
552 |                        "$DBI::errstr\n");
553 |             return (0);
554 |         }
555 |         create_db() 
556 |             or return (0);
557 |         $dbh = DBI->connect($str, $conf{sqluser}, $conf{sqlpass})
558 |             or return (0);
559 |     }
560 | 
561 |     # check whether we have version 1 and version 2 schemas
562 |     $conf{version}{1} = table_exists ($dbh, 'slurm_job_log');
563 |     $conf{version}{2} = table_exists ($dbh, 'jobs');
564 | 
565 |     # Check for tables, if not found, try to create them
566 |     if (not $conf{version}{1} and not $conf{version}{2}) {
567 |         log_msg ("SLURM job log table doesn't exist in DB. Creating.\n");
568 |         create_db () or return (0);
569 |     }
570 | 
571 |     # if we have schema 2 use it, otherwise, try schema 1
572 |     # if neither is found, print an error
573 |     if ($conf{version}{2}) {
574 |         # value_string_v2 expects certain field names, so convert conf
575 |         my %h = ();
576 |         $h{JobId}     = $conf{jobid};
577 |         $h{UserName}  = $conf{username};
578 |         $h{UserNumb}  = $conf{uid};
579 |         $h{Name}      = $conf{jobname};
580 |         $h{JobState}  = $conf{jobstate};
581 |         $h{Partition} = $conf{partition};
582 |         $h{TimeLimit} = $conf{limit};
583 |         $h{StartTime} = convtime_db("start");
584 |         $h{EndTime}   = convtime_db("end");
585 |         $h{NodeList}  = $conf{nodes};
586 |         $h{NodeCnt}   = $conf{nodecount};
587 |         $h{Procs}     = $conf{procs};
588 | 
589 |         # convert hash to VALUES clause
590 |         my $value_string = value_string_v2 ($dbh, \%h);
591 | 
592 |         # insert into v2 schema
593 |         my $sql = "INSERT INTO `jobs` VALUES $value_string;";
594 |         if (not do_sql ($dbh, $sql)) {
595 |             log_error "Problem inserting into slurm table:" .
596 |                       " $sql: error: ", $dbh->errstr, "\n"; 
597 |             return 0;
598 |         }
599 | 
600 |         # insert nodes used by this job if node tracking is enabled
601 |         if ($conf{track}) {
602 |             my $job_id = get_last_insert_id ($dbh);
603 |             if (defined $job_id and $job_id != 0) {
604 |                 insert_job_nodes ($dbh, $job_id, $h{NodeList});
605 |             }
606 |         }
607 |     } elsif ($conf{version}{1}) {
608 |         # insert into v1 schema
609 |         my @params_v1 = @params;
610 |         pop @params_v1;
611 | 
612 |         my $sth_v1 = $dbh->prepare($conf{stmt_v1}) 
613 |             or log_error "prepare: ", $dbh->errstr, "\n";
614 | 
615 |         if (not $sth_v1->execute("NULL", map {convtime_db($_)} @params_v1)) {
616 |             log_error "Problem inserting into slurm table: ",
617 |                       $dbh->errstr, "\n"; 
618 |             return 0;
619 |         }
620 |     } else {
621 |         log_error "No tables found to insert record into\n"; 
622 |         return 0;
623 |     }
624 | 
625 |     $dbh->disconnect;
626 |     return 1;
627 | }
628 | 
629 | sub convtime_db
630 | {
631 |     my ($var) = @_;
632 |     my $fmt = "%Y-%m-%d %H:%M:%S";
633 | 
634 |     $var =~ /^(start|end)$/ && return strftime $fmt, localtime ($conf{$var});
635 |     return $conf{$var};
636 | }
637 | 
638 | 
639 | sub convtime
640 | {
641 |     my ($var) = @_;
642 |     my $fmt = "%Y-%m-%dT%H:%M:%S";
643 | 
644 |     $var =~ /^(start|end)$/ && return strftime $fmt, localtime ($conf{$var});
645 |     return $conf{$var};
646 | }
647 | 
648 | #
649 | #  Append data to SLURM job log (text file)
650 | #
651 | sub append_joblog
652 | {
653 | 	my $joblog = $conf{joblogfile};
654 | 
655 | 	if (!open (JOBLOG, ">>$joblog")) {
656 | 		log_error  "Unable to open $joblog: $!\n";
657 | 		return 0;
658 | 	}
659 | 
660 | 	printf JOBLOG "JobId=%s UserId=%s(%s) Name=%s JobState=%s Partition=%s " .
661 | 		          "TimeLimit=%s StartTime=%s EndTime=%s NodeList=%s " .
662 | 				  "NodeCnt=%s Procs=%s\n", 
663 |         map {convtime($_)} @params;
664 | 
665 | 	close (JOBLOG);
666 | }
667 | 
668 | # vi: ts=4 sw=4 expandtab
669 | 


--------------------------------------------------------------------------------
/sqlog-db-util:
--------------------------------------------------------------------------------
   1 | #!/usr/bin/perl -w
   2 | ###############################################################################
   3 | #  $Id$
   4 | #******************************************************************************
   5 | #  Copyright (C) 2007-2009  Lawrence Livermore National Security, LLC.
   6 | #  Produced at Lawrence Livermore National Laboratory (cf, DISCLAIMER).
   7 | #  Written by Adam Moody <moody20@llnl.gov> and
   8 | #             Mark Grondona <mgrondona@llnl.gov>
   9 | #
  10 | #  UCRL-CODE-235340.
  11 | #
  12 | #  This file is part of sqlog.
  13 | #
  14 | #  This is free software; you can redistribute it and/or modify it
  15 | #  under the terms of the GNU General Public License as published by
  16 | #  the Free Software Foundation; either version 2 of the License, or
  17 | #  (at your option) any later version.
  18 | #
  19 | #  This is distributed in the hope that it will be useful, but WITHOUT
  20 | #  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  21 | #  FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
  22 | #  for more details.
  23 | #
  24 | #  You should have received a copy of the GNU General Public License
  25 | #  along with this program; if not, see <http://www.gnu.org/licenses/>.
  26 | ###############################################################################
  27 | #
  28 | #  sqlog-db-util - SQLOG job database maintenance.
  29 | #
  30 | ###############################################################################
  31 | use strict;
  32 | use lib qw();	# Required for _perl_libpaths RPM option
  33 | use DBI;
  34 | use Digest::SHA1 qw/ sha1_hex /;
  35 | use Getopt::Long qw/ :config gnu_getopt ignore_case /;
  36 | use File::Basename;
  37 | use Hostlist;
  38 | use Time::HiRes qw( gettimeofday );
  39 | 
  40 | # This file contains the SQL statements needed
  41 | # to set up a 'slurm_job_log' table in a 'slurm' DB
  42 | # on a MySQL server.
  43 | #
  44 | # It can also be used to backfill the database by
  45 | # inserting records from a list of slurm job completion
  46 | # logfiles.
  47 | #
  48 | # Adam Moody <moody20@llnl.gov>
  49 | 
  50 | # Required for _path_env_var RPM option
  51 | $ENV{PATH} = '/bin:/usr/bin:/usr/sbin';
  52 | 
  53 | my %conf = ();
  54 | 
  55 | ##############################
  56 | #  Usage:
  57 | #############################
  58 | my $progname = basename $0;
  59 | 
  60 | $conf{usage} = <<EOF
  61 | Usage: $progname [OPTIONS]... [FILES]...
  62 | 
  63 | Create SLURM job completion log database along with user accounts to
  64 | access it, and/or backfill the database from SLURM job completion
  65 | logfiles.
  66 | 
  67 |     -h, --help         Display this message.
  68 |     -i, --info         Print information about current DB.
  69 |     -v, --verbose      Be verbose.
  70 |     -d, --drop=V       Drop tables for version V={1,2} of database schema.
  71 |     -c, --create       Create slurm database, users, and latest version
  72 |                         of database tables.
  73 |     -b, --backfill     Backfill database from all SLURM joblog files in ARGV.
  74 |     -x, --convert      Convert data from database schema version 1 to version 2.
  75 | 
  76 |     -B, --backup=RANGE Copy data from tables over RANGE to a file in a format 
  77 |                        readable by the --backfill option. RANGE can be 
  78 |                        specified as "all" or a date range in the form
  79 |                        DATE..DATE. DATE must be in a format of
  80 |                        'yyyy-mm-dd hh:mm:ss'.
  81 |                        
  82 |     -o, --obfuscate    Obfuscates usernames, userids, and jobnames during
  83 |                        backup operations, which is useful for sharing system
  84 |                        joblogs with offsite collaborators.
  85 | 
  86 |     -p, --prune=DATE   Prune database of all jobs with start times older 
  87 |                        than DATE; write such records to a file.  DATE must 
  88 |                        be in format of 'yyyy-mm-dd hh:mm:ss'
  89 | 
  90 |     -C, --cores-per-node=N       
  91 |                        During --backfill, --convert, --backup, or --prune, 
  92 |                        specify the number of cores per node used to compute 
  93 |                        corecount field on clusters that allocate whole
  94 |                        nodes to jobs.
  95 | 
  96 |     --notrack          Disable per-job node tracking for jobs inserted during 
  97 |                        --convert or --backfill operations.
  98 | 
  99 |     --delay-index      Temporarily disable node tracking indicies for jobs 
 100 |                        inserted during convert of backfill operations.
 101 | 
 102 |     --recalc-nodecnt   When backfilling, do not use NodeCnt as stored in the
 103 |                        joblog file. Instead recalculate based on  nodelist.
 104 | 
 105 |     -L, --localhost    Connect to DB over localhost instead of configured 
 106 |                        SQL host.
 107 | 
 108 | EOF
 109 | ;
 110 | 
 111 | sub usage { print STDERR $conf{usage}; exit 0; }
 112 | 
 113 | #############################
 114 | #  Read Config File.
 115 | #############################
 116 | 
 117 | # Config Defaults
 118 | $conf{confdir}     = "/etc/slurm";
 119 | $conf{db}          = "slurm";
 120 | $conf{sqlhost}     = "sqlhost";
 121 | $conf{ro}{sqluser} = "slurm_read";
 122 | $conf{ro}{sqlpass} = "";
 123 | 
 124 | $conf{rw}{sqluser} = "slurm";
 125 | $conf{rw}{sqlpass} = "";
 126 | $conf{rw}{sqlnetwork} = "192.168.%.%";
 127 | 
 128 | # enables / disables node tracking per job in version 2 schema
 129 | $conf{track} = 1;
 130 | 
 131 | read_config ();
 132 | 
 133 | 
 134 | ##############################
 135 | #  Parse Command-line
 136 | #############################
 137 | 
 138 | # set defaults and read in command-line options
 139 | $conf{verbose}   = 0;
 140 | $conf{info}      = 0;
 141 | $conf{drop}      = 0;
 142 | $conf{create}    = 0;
 143 | $conf{backfill}  = "";
 144 | $conf{convert}   = 0;      # convert data in version 1 table to version 2
 145 | $conf{backup}    = 0;
 146 | $conf{obfuscate} = 0;
 147 | $conf{prune}     = undef;
 148 | # used to set corecount field during convert or backfill
 149 | # for machines which allocate whole nodes
 150 | $conf{cores}     = undef;
 151 | # if set to 0, remove node-tracking indicies, insert nodes,
 152 | # and renable indicies
 153 | $conf{indicies}  = 1;
 154 | $conf{localhost} = 0;
 155 | 
 156 | GetOptions (
 157 |    "help|h"         => \$conf{help},
 158 |    "verbose|v+"     => \$conf{verbose},
 159 |    "info|i"         => \$conf{info},
 160 |    "drop|d=i"       => \$conf{drop},
 161 |    "create|c"       => \$conf{create},
 162 |    "backfill|b"     => \$conf{backfill},
 163 |    "convert|x"      => \$conf{convert},
 164 |    "backup|B=s"     => \$conf{backup},
 165 |    "obfuscate|o"    => \$conf{obfuscate},
 166 |    "prune|p=s"      => \$conf{prune},
 167 |    "cores-per-node|C=i"  => \$conf{cores},
 168 |    "notrack"        => sub { $conf{track} = 0; },
 169 |    "delay-index"    => sub { $conf{indicies} = 0; },
 170 |    "localhost|L"    => \$conf{localhost},
 171 |    "recalc-nodecnt" => \$conf{recalculate_nodecount},
 172 | ) or usage ();
 173 | 
 174 | if (!$conf{create} && !$conf{convert} && !$conf{drop} && 
 175 |     !$conf{backfill} && !$conf{backup} && !$conf{prune} && 
 176 |     !$conf{info} && !$conf{help}) {
 177 |     log_error ("Specify at least one of " . 
 178 |                "--{create,convert,drop,backfill,backup,prune,info}.\n");
 179 |     usage ();
 180 | }
 181 | 
 182 | if ($conf{help}) {
 183 |     usage ();
 184 | }
 185 | 
 186 | #############################
 187 | # Attempt to connect to slurm database
 188 | #############################
 189 | 
 190 | # test whether slurm db already exists by trying to connect
 191 | my $dbh = connect_db_rw ();
 192 | 
 193 | # backup table data -- writes records to a file readable by backfill
 194 | # global variables to help obfuscate user and jobnames
 195 | my %obfuscate = ();
 196 | my $num_users = 0;
 197 | my $num_jobs  = 0;
 198 | 
 199 | if ($conf{backup} or defined $conf{prune}) { 
 200 |     # check that we have a db connection
 201 |     if (!$dbh) {
 202 |         log_fatal ("Data dump requested, but connection to database failed!\n")
 203 |     }
 204 | 
 205 |     # check that user gave us exactly one file name
 206 |     if (@ARGV != 1) {
 207 |         log_fatal ("You must specify a date range and" .
 208 |                    " a filename to append data to.\n");
 209 |     }
 210 |     my $joblog = shift @ARGV;
 211 | 
 212 |     # dump data to joblog file
 213 |     if (table_exists ($dbh, "slurm_job_log")) {
 214 |         dump_slurm_joblog_table (1, $dbh, $joblog); 
 215 |     }
 216 |     if (table_exists ($dbh, "jobs")) {
 217 |         dump_slurm_joblog_table (2, $dbh, $joblog); 
 218 |     }
 219 | }
 220 | 
 221 | #
 222 | #  Drop existing tables
 223 | #
 224 | if ($conf{drop}) {
 225 |     if ($dbh) {
 226 |         if ($conf{drop} == 1) {
 227 |             log_verbose ("drop: Dropping version 1 tables\n");
 228 |             drop_slurm_joblog_table_v1 ($dbh);
 229 |         } elsif ($conf{drop} == 2) {
 230 |             log_verbose ("drop: Dropping version 2 tables\n");
 231 |             drop_slurm_joblog_table_v2 ($dbh);
 232 |         } else {
 233 |             log_verbose ("drop: Unknown schema version: $conf{drop}\n");
 234 |         }
 235 |         $dbh = disconnect_db_rw ();
 236 |     } else {
 237 |         log_verbose ("drop: No existing slurm DB to drop\n");
 238 |     }
 239 |     # TODO: should we also delete the slurm db and users 
 240 |     #  (i.e., undo everything create does?)
 241 | }
 242 | 
 243 | #
 244 | #  Create database
 245 | #
 246 | if ($conf{create} && $dbh) {
 247 |     # if version 2 tables do not exist, create them
 248 |     if (not table_exists ($dbh, "jobs")) {
 249 |         log_verbose ("create: Creating version 2 tables.\n");
 250 |         create_slurm_joblog_table_v2 ($dbh);
 251 |     } else {
 252 |         log_verbose ("create: SLURM database already exists.\n");
 253 |     }
 254 | } elsif ($conf{create} && !$dbh) {
 255 |     # the db may not exist (couldn't connect), try to create it
 256 |     create_db_and_slurm_users ();
 257 | 
 258 |     # try to connect again
 259 |     $dbh = connect_db_rw() 
 260 |         or log_fatal ("create: Failed to connect to SLURM DB after create!\n");
 261 | 
 262 |     # create version 2 from the beginning on a brand new install
 263 |     log_verbose ("create: Creating version 2 tables.\n");
 264 | 
 265 |     create_slurm_joblog_table_v2 ($dbh);
 266 | }
 267 | 
 268 | # 
 269 | # Convert slurm_job_log table to version 2 
 270 | #  (add corecount and extend nodelist columns)
 271 | #
 272 | if ($conf{convert}) {
 273 |     #
 274 |     #  Attempt to convert table to version 2, if conversion fails 
 275 |     #   print an error. 
 276 |     #
 277 |     #  If the table has already been converted, a message is printed 
 278 |     #   and no action is taken
 279 |     #
 280 |     if (!$dbh) {
 281 |         log_fatal ("convert: Conversion requested," .
 282 |                    " but connection to database failed.\n")
 283 |     }
 284 |     log_verbose ("convert: Initiating conversion from" .
 285 |                  " version 1 to version 2 tables.\n");
 286 |     if (!convert_slurm_joblog_table_from_v1_to_v2 ($dbh)) {
 287 |         log_fatal ("convert: SLURM job log table conversion failed.\n");
 288 |     }
 289 | }
 290 | 
 291 | #
 292 | #  Backfill from logfiles
 293 | #
 294 | if ($conf{backfill}) { 
 295 |     if (!$dbh) {
 296 |         log_fatal ("backfill: Backfill requested," .
 297 |                    " but connection to database failed!\n")
 298 |     }
 299 |     # if we find the version 2 schema, backfill to it
 300 |     # otherwise, if we find the version 1 schema, backfill to it
 301 |     # if we find neither, throw an error
 302 |     if (table_exists ($dbh, "jobs")) {
 303 |         backfill_slurm_joblog_table_to_v2 ($dbh, @ARGV); 
 304 |     } elsif (table_exists ($dbh, "slurm_job_log")) {
 305 |         backfill_slurm_joblog_table_to_v1 ($dbh, @ARGV); 
 306 |     } else {
 307 |         log_fatal ("backfill: Unknown schema version.\n");
 308 |     }
 309 | }
 310 | 
 311 | if ($conf{info}) {
 312 |     show_info ();
 313 | }
 314 | 
 315 | disconnect_db_rw ();
 316 | 
 317 | exit 0;
 318 | 
 319 | #############################
 320 | # Support functions
 321 | #############################
 322 | 
 323 | sub db_host_string
 324 | {
 325 |     return $conf{localhost} ? "localhost" : $conf{sqlhost};
 326 | }
 327 | 
 328 | sub connect_db_rw
 329 | {
 330 |     my $host = db_host_string ();
 331 |     my $cstr = "DBI:mysql(PrintError=>0):" .
 332 |                "database=$conf{db};host=$host:";
 333 | 
 334 |     my $dbh = DBI->connect($cstr, $conf{rw}{sqluser}, $conf{rw}{sqlpass})
 335 |        or log_verbose ("Unable to connect to MySQL DB as ",
 336 |                      "$conf{rw}{sqluser}\@$conf{sqlhost}: ", $DBI::errstr, "\n");
 337 | 
 338 |     $conf{dbh}{rw} = $dbh;
 339 | 
 340 |     return ($dbh);
 341 | }
 342 | 
 343 | sub disconnect_db_rw
 344 | {
 345 |     return if !$conf{dbh}{rw};
 346 |     $conf{dbh}{rw}->disconnect;
 347 |     return $conf{dbh}{rw} = undef;
 348 | }
 349 | 
 350 | sub connect_db_root
 351 | {
 352 |     my $host = db_host_string ();
 353 |     my $str  = "DBI:mysql(PrintError=>0):host=$host;";
 354 | 
 355 |     $conf{dbh}{root} = DBI->connect ($str, "root", $conf{rw}{rootpass}) 
 356 |         or log_fatal ("Unable to connect to MySQL DB as root\@$host: ",
 357 |                       $DBI::errstr, "\n");
 358 | 
 359 |     return ($conf{dbh}{root});
 360 | }
 361 | 
 362 | # returns 1 if table exists, 0 otherwise
 363 | sub table_exists
 364 | {
 365 |     my $dbh   = shift @_;
 366 |     my $table = shift @_;
 367 | 
 368 |     # check whether our database has a table by the proper name
 369 |     my $sth = $dbh->prepare("SHOW TABLES;");
 370 |     if ($sth->execute()) {
 371 |         while (my ($name) = $sth->fetchrow_array()) {
 372 |             if ($name eq $table) { return 1; }
 373 |         }
 374 |     }
 375 | 
 376 |     # didn't find it
 377 |     return 0;
 378 | }
 379 | 
 380 | sub read_config 
 381 | {
 382 |     my $ro = "$conf{confdir}/sqlog.conf";
 383 |     my $rw = "$conf{confdir}/slurm-joblog.conf";
 384 | 
 385 |     # First read sqlog config to get SQLHOST and SQLDB
 386 |     #  (ignore SQLUSER/SQLPASS)
 387 |     unless (my $rc = do $ro) {
 388 |         log_fatal ("Couldn't parse $ro: $@\n") if $@;
 389 |         log_fatal ("couldn't run $ro\n") if (defined $rc && !$rc);
 390 |     }
 391 | 
 392 |     $conf{db}          = $conf::SQLDB   if (defined $conf::SQLDB);
 393 |     $conf{sqlhost}     = $conf::SQLHOST if (defined $conf::SQLHOST);
 394 |     $conf{ro}{sqluser} = $conf::SQLUSER if (defined $conf::SQLUSER);
 395 |     $conf{ro}{sqlpass} = $conf::SQLPASS if (defined $conf::SQLPASS);
 396 | 
 397 |     # enable / disable per job node tracking
 398 |     $conf{track} = $conf::TRACKNODES if (defined $conf::TRACKNODES);
 399 | 
 400 |     undef $conf::SQLUSER;
 401 |     undef $conf::SQLPASS;
 402 | 
 403 |     # Now read slurm-joblog.conf
 404 |     -r $rw  || log_fatal ("Unable to read required config file: $rw.\n");
 405 |     unless (my $rc = do $rw) {
 406 |         log_fatal ("Couldn't parse $rw: $@\n") if $@;
 407 |         log_fatal ("couldn't run $rw\n") if (defined $rc && !$rc);
 408 |     }
 409 | 
 410 |     $conf{rw}{sqluser}    = $conf::SQLUSER     if (defined $conf::SQLUSER);
 411 |     $conf{rw}{sqlpass}    = $conf::SQLPASS     if (defined $conf::SQLPASS);
 412 |     $conf{rw}{rootpass}   = $conf::SQLROOTPASS if (defined $conf::SQLROOTPASS);
 413 |     $conf{rw}{sqlnetwork} = $conf::SQLNETWORK  if (defined $conf::SQLNETWORK);
 414 | 
 415 |     @{$conf{rw}{hosts}} = @conf::SQLRWHOSTS if (@conf::SQLRWHOSTS);
 416 | 
 417 |     my %seen;
 418 |     @{$conf{rw}{hosts}} = grep {$_ && !$seen{$_}++} @{$conf{rw}{hosts}};
 419 | 
 420 | }
 421 | 
 422 | # Connect to MySQL as root user to build slurm db
 423 | # and insert slurm and slurm_read users
 424 | sub create_db_and_slurm_users
 425 | {
 426 |     my $dbh = connect_db_root () 
 427 |         or log_fatal ("Couldn't connect to database as root\n");
 428 | 
 429 |     #  
 430 |     #  Abort if slurm_job_log table already exists.
 431 |     if (table_exists ($dbh, "slurm_job_log") or table_exists ($dbh, "jobs")) {
 432 |         log_msg ("create: SLURM job log table exists. No create necessary.\n");
 433 |         return;
 434 |     }
 435 | 
 436 |     #############################
 437 |     # Create slurm db / table
 438 |     #############################
 439 | 
 440 |     log_verbose ("Creating slurm DB\n");
 441 |     do_sql ($dbh, "CREATE DATABASE IF NOT EXISTS $conf{db};"); 
 442 | 
 443 |     #############################
 444 |     # Set up slurm (r/w) and slurm_read (r/o) access 
 445 |     #############################
 446 | 
 447 |     # Switch to management databases
 448 |     do_sql($dbh, "USE mysql;");
 449 | 
 450 |     log_verbose ("Dropping previous slurm joblog db users and privileges.\n");
 451 |     drop_slurm_users ($dbh);
 452 | 
 453 |     # set up permissions for different users of slurm database
 454 |     for my $host (@{$conf{rw}{hosts}}, "localhost") {
 455 |         my $user = $conf{rw}{sqluser};
 456 |         log_verbose ("Granting rw privileges to $user on $host\n");
 457 |         do_sql ($dbh, 
 458 |                 "GRANT ALL ON $conf{db}.* TO" .
 459 |                 " '$user'\@'$host'" .
 460 |                 " IDENTIFIED BY '$conf{rw}{sqlpass}'"); 
 461 |     }
 462 | 
 463 |     log_verbose ("Granting readonly privs to $conf{ro}{sqluser} " .
 464 |                  "on $conf{rw}{sqlnetwork}.\n");
 465 |     do_sql ($dbh, 
 466 |             "GRANT SELECT ON $conf{db}.* TO" .
 467 |             " $conf{ro}{sqluser}\@'$conf{rw}{sqlnetwork}'" .
 468 |             " IDENTIFIED BY ''");
 469 | 
 470 |     # flush privileges to make our changes current
 471 |     log_verbose ("FLUSH PRIVILEGES\n");
 472 |     do_sql($dbh, "FLUSH PRIVILEGES;");
 473 | 
 474 |     # we're done
 475 |     log_verbose ("Done creating slurm joblog DB.\n");
 476 | }
 477 | 
 478 | sub show_info 
 479 | {
 480 |     my $dbh = connect_db_rw () or return;
 481 | 
 482 |     # determine what schema version we're at
 483 |     my $version = "UKNOWN";
 484 |     if (table_exists ($dbh, "jobs")) {
 485 |       $version = 2;
 486 |     } elsif (table_exists ($dbh, "slurm_job_log")) {
 487 |       $version = 1;
 488 |     }
 489 | 
 490 |     &log_verbose ("Connected to joblog database version $version\n");
 491 | 
 492 |     # count the number of jobs in version 1
 493 |     my $count_v1 = 0;
 494 |     my $stmt = "SELECT COUNT(*) FROM `$conf{db}`.`slurm_job_log`;";
 495 |     my $sth = $dbh->prepare ($stmt) or return;
 496 |     if ($sth->execute ()) { ($count_v1) = $sth->fetchrow_array; }
 497 | 
 498 |     # count the number of jobs in version 2
 499 |     my $count_v2 = 0;
 500 |     $stmt = "SELECT COUNT(*) FROM `$conf{db}`.`jobs`;";
 501 |     $sth = $dbh->prepare ($stmt) or return;
 502 |     if ($sth->execute ()) { ($count_v2) = $sth->fetchrow_array; }
 503 | 
 504 |     # add the job counts to get the total
 505 |     my $count = $count_v1 + $count_v2;
 506 | 
 507 |     # now we're ready to print
 508 |     log_msg ("Information for SLURM job log DB:\n");
 509 |     print "DB Host:   $conf{sqlhost}\n";
 510 |     print "DB User:   $conf{ro}{sqluser}\n";
 511 |     print "RW User:   $conf{rw}{sqluser}\n";
 512 |     print "SLURM DB:  $conf{db}\n";
 513 |     print "Version:   $version\n";
 514 |     print "Job count: $count\n";
 515 |     
 516 |     return;
 517 | }
 518 | 
 519 | sub drop_slurm_users
 520 | {
 521 |     my $dbh = shift @_;
 522 |     my $stmt = "SELECT user,host from mysql.user;";
 523 |     my @oldusers = ();
 524 | 
 525 |     my $sth = $dbh->prepare ($stmt) or return;
 526 |     $sth->execute () or return;
 527 | 
 528 |     while ((my $a = $sth->fetchrow_arrayref)) {
 529 |         if ($a->[0] ne "$conf{ro}{sqluser}"  &&
 530 |             $a->[0] ne "$conf{rw}{sqluser}" ) {
 531 |                 next;
 532 |         }
 533 |         push (@oldusers, "$a->[0]\@'$a->[1]'");
 534 |     }
 535 |     do_sql ($dbh, "DROP USER " . join (", ", @oldusers)) if @oldusers;
 536 | }
 537 | 
 538 | # execute (do) sql statement on dbh
 539 | sub do_sql {
 540 |     my ($dbh, $stmt) = @_;
 541 |     log_debug ("SQL: [$stmt]\n");
 542 |     $dbh->do ($stmt);
 543 |     if (not $dbh->do ($stmt)) {
 544 |         log_error ("FAILED SQL: $stmt ERROR: " . $dbh->errstr . "\n");
 545 |         return 0;
 546 |     }
 547 |     return 1;
 548 | }
 549 | 
 550 | ####################
 551 | # Schema version 1 functions
 552 | ####################
 553 | 
 554 | # drop the table
 555 | sub drop_slurm_joblog_table_v1
 556 | {
 557 |     my $dbh = shift @_;
 558 |     my $success = 1;
 559 | 
 560 |     # switch to the slurm db
 561 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; }
 562 | 
 563 |     # now drop the tables
 564 |     log_verbose ("drop: Dropping existing 'slurm_job_log' table\n");
 565 |     my $sql = "DROP TABLE `slurm_job_log`;";
 566 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
 567 | 
 568 |     return $success;
 569 | }
 570 | 
 571 | # build the table
 572 | sub create_slurm_joblog_table_v1
 573 | {
 574 |     my $dbh = shift @_;
 575 |     my $success = 1;
 576 | 
 577 |     # switch to the slurm db
 578 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; } 
 579 | 
 580 |     # keep this schema around for historical record
 581 |     # (could enable one to build a v1 table if so desired)
 582 |     my $sql = "CREATE TABLE IF NOT EXISTS slurm_job_log (
 583 |         id        int(10)   NOT NULL AUTO_INCREMENT,
 584 |         jobid     int(10)   NOT NULL,
 585 |         username  char(100) NOT NULL,
 586 |         userid    int(10)   NOT NULL,
 587 |         jobname   char(100) NOT NULL,
 588 |         jobstate  char(25)  NOT NULL,
 589 |         partition char(25)  NOT NULL,
 590 |         timelimit int(10)   NOT NULL,
 591 |         starttime datetime  NOT NULL,
 592 |         endtime   datetime  NOT NULL,
 593 |         nodelist  varchar(1024) NOT NULL,
 594 |         nodecount int(10)   NOT NULL,
 595 |         PRIMARY KEY (id),
 596 |         UNIQUE INDEX jobid (jobid,starttime),
 597 |         INDEX username (username)
 598 |     ) ENGINE=MyISAM;";
 599 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
 600 | 
 601 |     return $success;
 602 | }
 603 | 
 604 | # given hash of values, create mysql values string for insert statement
 605 | sub value_string_v1
 606 | {
 607 |     my $dbh = shift @_;
 608 |     my $h   = shift @_;
 609 | 
 610 |     my @parts = ();
 611 |     push @parts, "NULL";
 612 |     push @parts, $dbh->quote($h->{JobId});
 613 |     push @parts, $dbh->quote($h->{UserName});
 614 |     push @parts, $dbh->quote($h->{UserNumb});
 615 |     push @parts, $dbh->quote($h->{Name});
 616 |     push @parts, $dbh->quote($h->{JobState});
 617 |     push @parts, $dbh->quote($h->{Partition});
 618 |     push @parts, $dbh->quote($h->{TimeLimit});
 619 |     push @parts, $dbh->quote($h->{StartTime});
 620 |     push @parts, $dbh->quote($h->{EndTime});
 621 |     push @parts, $dbh->quote($h->{NodeList});
 622 |     push @parts, $dbh->quote($h->{NodeCnt});
 623 | 
 624 |     return "(" . join(',', @parts) . ")";
 625 | }
 626 | 
 627 | # do a batch insert to be more efficient
 628 | sub insert_values_v1
 629 | {
 630 |     my $dbh = shift @_;
 631 |     my @values = @_;
 632 | 
 633 |     while (@values) {
 634 |         my @subvalues = ();
 635 |         for (my $i = 0; $i < 50 and @values; $i++) { 
 636 |             push @subvalues, shift @values; 
 637 |         }
 638 |         my $sql = "INSERT IGNORE INTO `$conf{db}`.`slurm_job_log` VALUES " . 
 639 |             join(",", @subvalues) . ";";
 640 | 
 641 |         #log_debug ("SQL: $sql\n");
 642 |         $dbh->do($sql);
 643 |     }
 644 | }
 645 | 
 646 | # given a dbh and list of slurm job completion logfiles,
 647 | # insert them into the dbh
 648 | sub backfill_slurm_joblog_table_to_v1
 649 | {
 650 |     my $dbh = shift @_;
 651 |     my @files = @_;
 652 |     my $success = 1;
 653 | 
 654 |     # switch to the slurm db
 655 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; }
 656 | 
 657 |     # if our new table does not exist, create it
 658 |     if (not table_exists ($dbh, "slurm_job_log")) {
 659 |         if (not create_slurm_joblog_table_v1($dbh)) {
 660 |             return 0;
 661 |         }
 662 |     }
 663 | 
 664 |     log_error ("No files to backfill!\n") if (!@files);
 665 | 
 666 |     foreach my $file (@files) {
 667 |         my @values = ();
 668 |         my $count = 0;
 669 |         my $skipped = 0;
 670 | 
 671 |         my $f = $file;
 672 |         $f = "gzip -dc $f | " if ($f =~ /\.gz$/);
 673 | 
 674 |         open (IN, $f) or log_error ("Failed to open \"$file\":$!\n"), next;
 675 | 
 676 |         while (my $line = <IN>) {
 677 |             chomp $line;
 678 |             my @parts = split(" ", $line);
 679 | 
 680 |             my %h = ();
 681 |             foreach my $part (@parts) {
 682 |                 my ($key, $value) = split("=", $part);
 683 |                 $h{$key} = $value;
 684 |             }
 685 | 
 686 |             # Some very old joblog files may have the incorrect
 687 |             #  datetime format. Unfortunately, the year wasn't
 688 |             #  included in these, so we have to drop these entries :-(
 689 |             if (defined $h{StartTime} and $h{StartTime} =~ m{^\d\d/\d\d-}) {
 690 |                 $skipped++;
 691 |                 next;
 692 |             }
 693 | 
 694 |             # convert from slurm log to format for MySQL
 695 |             if (defined $h{"UserId"}) {
 696 |                 my $userid = $h{"UserId"};
 697 |                 my ($username, $usernumb) = ($userid =~ /(.+)\((\d+)\)/);
 698 |                 if (defined $username and defined $usernumb) {
 699 |                     $h{"UserName"} = $username;
 700 |                     $h{"UserNumb"} = $usernumb;
 701 |                 }
 702 |             }
 703 |             if (defined $h{"StartTime"}) {
 704 |                 $h{"StartTime"} =~ s/T/ /;
 705 |             }
 706 |             if (defined $h{"EndTime"}) {
 707 |                 $h{"EndTime"}   =~ s/T/ /;
 708 |             }
 709 | 
 710 |             push @values, value_string_v1($dbh, \%h);
 711 |             
 712 |             if (@values > 100) {
 713 |                 insert_values_v1($dbh, @values);
 714 |                 @values = ();
 715 |             }
 716 |             $count++;
 717 |         }
 718 |         insert_values_v1($dbh, @values);
 719 | 
 720 |         log_verbose ("Backfilled $count jobs from file $file\n");
 721 |         log_error ("Skipped $skipped job(s) from file $file because of ",
 722 |                   "old date format\n") if $skipped;
 723 | 
 724 |         close(IN);
 725 |     }
 726 | 
 727 |     return $success;
 728 | }
 729 | 
 730 | ####################
 731 | # Schema version 2 functions
 732 | ####################
 733 | 
 734 | # cache for name ids, saves us from hitting the database
 735 | # over and over at the cost of more memory
 736 | my %IDcache = ();
 737 | %{$IDcache{nodes}} = ();
 738 | 
 739 | # return the auto increment value for the last inserted record
 740 | sub get_last_insert_id
 741 | {
 742 |     my $dbh = shift @_;
 743 |     my $id = undef;
 744 | 
 745 |     my $sql = "SELECT LAST_INSERT_ID();";
 746 |     my $sth = $dbh->prepare($sql);
 747 |     if ($sth->execute()) {
 748 |         ($id) = $sth->fetchrow_array();
 749 |     } else {
 750 |         log_error ("Fetching last id: $sql\n");
 751 |     }
 752 | 
 753 |     return $id;
 754 | }
 755 | 
 756 | # given a table and name, read id for name from table
 757 | # and add to id cache if found
 758 | sub read_id
 759 | {
 760 |     my $dbh   = shift @_;
 761 |     my $table = shift @_;
 762 |     my $name  = shift @_;
 763 | 
 764 |     my $id = undef;
 765 | 
 766 |     # if name is not set, don't try to look it up in hash, just return undef
 767 |     if (not defined $name) { return $id; }
 768 | 
 769 |     if (not defined $IDcache{$table}) { %{$IDcache{$table}} = (); }
 770 |     if (not defined $IDcache{$table}{$name}) {
 771 |         my $q_name = $dbh->quote($name);
 772 |         my $sql = "SELECT * FROM `$table` WHERE `name` = $q_name;";
 773 |         my $sth = $dbh->prepare($sql);
 774 |         if ($sth->execute ()) {
 775 |             my ($table_id, $table_name) = $sth->fetchrow_array ();
 776 |             if (defined $table_id and defined $table_name) {
 777 |                 $IDcache{$table}{$name} = $table_id;
 778 |                 $id = $table_id;
 779 |             }
 780 |         } else {
 781 |             log_error ("Reading record: $sql --> " . $dbh->errstr . "\n");
 782 |         }
 783 |     } else {
 784 |         $id = $IDcache{$table}{$name};
 785 |     }
 786 | 
 787 |     return $id;
 788 | }
 789 | 
 790 | # insert name into table if it does not exist, and return its id
 791 | sub read_write_id
 792 | {
 793 |     my $dbh   = shift @_;
 794 |     my $table = shift @_;
 795 |     my $name  = shift @_;
 796 | 
 797 |     # if name isn't set, set it to the empty string
 798 |     # DON'T do this in slurm-joblog, it will fail and
 799 |     # write to the joblog instead
 800 |     if (not defined $name) { $name = ""; }
 801 | 
 802 |     # attempt to read the id first, if not found,
 803 |     # insert it and return the last insert id
 804 |     my $id = read_id($dbh, $table, $name);
 805 |     if (not defined $id) {
 806 |         my $q_name = $dbh->quote($name);
 807 |         my $sql = "INSERT IGNORE INTO `$table` (`id`,`name`)" .
 808 |                   " VALUES (NULL,$q_name);";
 809 |         my $sth = $dbh->prepare($sql);
 810 |         if ($sth->execute ()) {
 811 |             # user read_id here instead of get_last_insert_id
 812 |             # to avoid race conditions
 813 |             $id = read_id ($dbh, $table, $name);
 814 |             if (not defined $id) {
 815 |                 log_error ("Error inserting new record (id undefined): $sql\n");
 816 |                 $id = 0;
 817 |             } elsif ($id == 0) {
 818 |                 log_error ("Error inserting new record (id=0): $sql\n");
 819 |                 $id = 0;
 820 |             }
 821 |         } else {
 822 |             log_error ("Error inserting new record: $sql --> " .
 823 |                        $dbh->errstr . "\n");
 824 |             $id = 0;
 825 |         }
 826 |     }
 827 | 
 828 |     return $id;
 829 | }
 830 | 
 831 | # given a reference to a list of nodes,
 832 | # read their ids from the nodes table and add them to the id cache
 833 | sub read_node_ids
 834 | {
 835 |     my $dbh       = shift @_;
 836 |     my $nodes_ref = shift @_;
 837 |     my $success = 1;
 838 | 
 839 |     # build list of nodes not in our cache
 840 |     my @missing_nodes = ();
 841 |     foreach my $node (@$nodes_ref) {
 842 |         if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; }
 843 |     }
 844 | 
 845 |     # if any missing nodes, try to look up their values
 846 |     if (@missing_nodes > 0) {
 847 |         my @q_nodes = map $dbh->quote($_), @missing_nodes;
 848 |         my $in_nodes = join(",", @q_nodes);
 849 |         my $sql = "SELECT * FROM `nodes` WHERE `name` IN ($in_nodes);";
 850 |         my $sth = $dbh->prepare($sql);
 851 |         if ($sth->execute ()) {
 852 |             while (my ($table_id, $table_name) = $sth->fetchrow_array ()) {
 853 |                 $IDcache{nodes}{$table_name} = $table_id;
 854 |             }
 855 |         } else {
 856 |             log_error ("Reading nodes: $sql --> " . $dbh->errstr . "\n");
 857 |             $success = 0;
 858 |         }
 859 |     }
 860 | 
 861 |     return $success;
 862 | }
 863 | 
 864 | # given a reference to a list of nodes,
 865 | # insert them into the nodes table and add their ids to the id cache
 866 | sub read_write_node_ids
 867 | {
 868 |     my $dbh       = shift @_;
 869 |     my $nodes_ref = shift @_;
 870 |     my $success = 1;
 871 | 
 872 |     # read node_ids for these nodes into our cache
 873 |     read_node_ids($dbh, $nodes_ref);
 874 | 
 875 |     # if still missing nodes, we need to insert them
 876 |     my @missing_nodes = ();
 877 |     foreach my $node (@$nodes_ref) {
 878 |         if (not defined $IDcache{nodes}{$node}) { push @missing_nodes, $node; }
 879 |     }
 880 |     if (@missing_nodes > 0) {
 881 |         my @q_nodes = map $dbh->quote($_), @missing_nodes;
 882 |         my $values = join("),(", @q_nodes);
 883 |         my $sql = "INSERT IGNORE INTO `nodes` (`name`) VALUES ($values);";
 884 |         my $sth = $dbh->prepare($sql);
 885 |         if (not $sth->execute ()) {
 886 |             log_error ("Inserting nodes: $sql --> " . $dbh->errstr . "\n");
 887 |             $success = 0;
 888 |         }
 889 | 
 890 |         # fetch ids for just inserted nodes
 891 |         read_node_ids($dbh, $nodes_ref);
 892 |     }
 893 | 
 894 |     return $success;
 895 | }
 896 | 
 897 | # given a job_id and a nodelist,
 898 | # insert jobs_nodes records for each node used in job_id
 899 | sub insert_job_nodes
 900 | {
 901 |     my $dbh      = shift @_;
 902 |     my $job_id   = shift @_;
 903 |     my $nodelist = shift @_;
 904 |     my $success = 1;
 905 | 
 906 |     if (defined $job_id and defined $nodelist and $nodelist ne "") {
 907 |         my $q_job_id = $dbh->quote($job_id);
 908 | 
 909 |         # clean up potentially bad nodelist
 910 |         if ($nodelist =~ /\[/ and $nodelist !~ /\]/) {
 911 |             # found an opening bracket, but no closing bracket,
 912 |             # nodelist is probably incomplete
 913 |             # chop back to last ',' or '-' and replace with a ']'
 914 |             $nodelist =~ s/[,-]\d+$/\]/;
 915 |         }
 916 | 
 917 |         # get our nodeset
 918 |         my @nodes = Hostlist::expand($nodelist);
 919 | 
 920 |         # this will fill our node_id cache
 921 |         read_write_node_ids($dbh, \@nodes);
 922 | 
 923 |         # get the node_id for each node
 924 |         my @values = ();
 925 |         foreach my $node (@nodes) {
 926 |             if (defined $IDcache{nodes}{$node}) {
 927 |                 my $q_node_id = $dbh->quote($IDcache{nodes}{$node});
 928 |                 push @values, "($q_job_id,$q_node_id)";
 929 |             }
 930 |         }
 931 | 
 932 |         # if we have any nodes for this job, insert them
 933 |         if (@values > 0) {
 934 |             my $sql = "INSERT DELAYED IGNORE INTO `jobs_nodes`" .
 935 |                       " (`job_id`,`node_id`)" .
 936 |                       " VALUES " . join(",", @values) . ";";
 937 |             my $sth = $dbh->prepare($sql);
 938 |             if (not $sth->execute ()) {
 939 |                 log_error ("Inserting jobs_nodes records for job id" .
 940 |                            " $job_id: $sql --> " . $dbh->errstr . "\n");
 941 |                 $success = 0;
 942 |             }
 943 |         }
 944 |     }
 945 | 
 946 |     return $success;
 947 | }
 948 | 
 949 | # compute time since epoch, attempt to account for DST changes via timelocal
 950 | sub get_seconds
 951 | {
 952 |     my ($date) = @_;
 953 |     use Time::Local;
 954 | 
 955 |     my ($y, $m, $d, $H, $M, $S) = ($date =~ /(\d\d\d\d)\-(\d\d)\-(\d\d) (\d\d):(\d\d):(\d\d)/);
 956 |     $y -= 1900;
 957 |     $m -= 1;
 958 | 
 959 |     return timelocal ($S, $M, $H, $d, $m, $y);
 960 | }
 961 | 
 962 | # given hash of values, create mysql values string for insert statement
 963 | sub value_string_v2
 964 | {
 965 |     my $dbh = shift @_;
 966 |     my $h   = shift @_;
 967 | 
 968 |     # given start and end times, compute the number of seconds
 969 |     # the job ran for
 970 |     # TODO: unsure whether this correctly handles jobs that
 971 |     # straddle DST changes
 972 |     my $seconds = 0;
 973 |     if (defined $h->{StartTime} and $h->{StartTime} !~ /^\s*$/ and
 974 |         defined $h->{EndTime}   and $h->{EndTime}   !~ /^\s*$/)
 975 |     {
 976 |          my $start = get_seconds($h->{StartTime});
 977 |          my $end   = get_seconds($h->{EndTime});
 978 |          $seconds = $end - $start;
 979 |          if ($seconds < 0) { $seconds = 0; }
 980 |     }
 981 | 
 982 |     # if Procs is not set, but cores is specified and NodeCnt is set,
 983 |     # compute Procs
 984 |     # (assumes all processors on the node were allocated to the job,
 985 |     # only use for clusters which use whole-node allocation)
 986 |     if (not defined $h->{Procs} and defined $conf{cores} and
 987 |         defined $h->{NodeCnt}
 988 |        )
 989 |     {
 990 |       $h->{Procs} = $h->{NodeCnt} * $conf{cores};
 991 |     }
 992 | 
 993 |     # get id values
 994 |     my $username_id  = read_write_id($dbh, "usernames",  $h->{UserName});
 995 |     my $jobname_id   = read_write_id($dbh, "jobnames",   $h->{Name});
 996 |     my $jobstate_id  = read_write_id($dbh, "jobstates",  $h->{JobState});
 997 |     my $partition_id = read_write_id($dbh, "partitions", $h->{Partition});
 998 |     if (not defined $username_id or
 999 |         not defined $jobname_id or
1000 |         not defined $jobstate_id or
1001 |         not defined $partition_id)
1002 |     {
1003 |         log_error ("Missing an id for one of: jobid=$h->{JobId}," .
1004 |                    " username=$h->{UserName}, jobname=$h->{Name}," .
1005 |                    " jobstate=$h->{JobState}, partition=$h->{Partition}\n");
1006 |         log_error ("Missing an id for one of: $username_id -- $jobname_id" .
1007 |                    " -- $jobstate_id -- $partition_id\n");
1008 |     }
1009 | 
1010 |     # insert the field values, order matters
1011 |     my @parts = ();
1012 |     push @parts, (defined $h->{Id}) ? $dbh->quote($h->{Id}) : "NULL";
1013 |     push @parts, $dbh->quote($h->{JobId});
1014 |     push @parts, $dbh->quote($username_id);
1015 |     push @parts, $dbh->quote($h->{UserNumb});
1016 |     push @parts, $dbh->quote($jobname_id);
1017 |     push @parts, $dbh->quote($jobstate_id);
1018 |     push @parts, $dbh->quote($partition_id);
1019 |     push @parts, $dbh->quote($h->{TimeLimit});
1020 |     push @parts, $dbh->quote($h->{StartTime});
1021 |     push @parts, $dbh->quote($h->{EndTime});
1022 |     push @parts, $dbh->quote($seconds);
1023 |     push @parts, $dbh->quote($h->{NodeList});
1024 |     push @parts, $dbh->quote($h->{NodeCnt});
1025 |     push @parts, (defined $h->{Procs}) ? $dbh->quote($h->{Procs}) : 0;
1026 | 
1027 |     # finally, return the ('field1','field2',...) string
1028 |     return "(" . join(',', @parts) . ")";
1029 | }
1030 | 
1031 | # drop all v2 tables
1032 | sub drop_slurm_joblog_table_v2
1033 | {
1034 |     my $dbh = shift @_;
1035 |     my $success = 1;
1036 | 
1037 |     # switch to the slurm db
1038 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; }
1039 | 
1040 |     # now drop the tables
1041 |     if (not do_sql($dbh, "DROP TABLE `jobs`;"))       { $success = 0; }
1042 |     if (not do_sql($dbh, "DROP TABLE `usernames`;"))  { $success = 0; }
1043 |     if (not do_sql($dbh, "DROP TABLE `jobnames`;"))   { $success = 0; }
1044 |     if (not do_sql($dbh, "DROP TABLE `jobstates`;"))  { $success = 0; }
1045 |     if (not do_sql($dbh, "DROP TABLE `partitions`;")) { $success = 0; }
1046 |     if (not do_sql($dbh, "DROP TABLE `nodes`;"))      { $success = 0; }
1047 |     if (not do_sql($dbh, "DROP TABLE `jobs_nodes`;")) { $success = 0; }
1048 | 
1049 |     return $success;
1050 | }
1051 | 
1052 | # build all v2 tables
1053 | sub create_slurm_joblog_table_v2
1054 | {
1055 |     my $dbh = shift @_;
1056 |     my $success = 1;
1057 | 
1058 |     # switch to the slurm db
1059 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; }
1060 | 
1061 |     # nodelist can be null since some jobs are canceled before
1062 |     # ever being assigned resources
1063 |     my $sql = "CREATE TABLE IF NOT EXISTS `jobs` (
1064 |         `id`           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
1065 |         `jobid`        INT          NOT NULL,
1066 |         `username_id`  INT UNSIGNED NOT NULL,
1067 |         `userid`       INT          NOT NULL,
1068 |         `jobname_id`   INT UNSIGNED NOT NULL,
1069 |         `jobstate_id`  INT UNSIGNED NOT NULL,
1070 |         `partition_id` INT UNSIGNED NOT NULL,
1071 |         `timelimit`    INT          NOT NULL,
1072 |         `starttime`    DATETIME     NOT NULL,
1073 |         `endtime`      DATETIME     NOT NULL,
1074 |         `runtime`      INT UNSIGNED NOT NULL,
1075 |         `nodelist`     BLOB         NOT NULL,
1076 |         `nodecount`    INT UNSIGNED NOT NULL,
1077 |         `corecount`    INT UNSIGNED NOT NULL,
1078 |         UNIQUE INDEX `jobid` (`jobid`,`starttime`),
1079 |         INDEX `username_id`  (`username_id`),
1080 |         INDEX `jobname_id`   (`jobname_id`),
1081 |         INDEX `starttime`    (`starttime`),
1082 |         INDEX `endtime`      (`endtime`),
1083 |         INDEX `runtime`      (`runtime`),
1084 |         INDEX `nodecount`    (`nodecount`),
1085 |         INDEX `corecount`    (`corecount`)
1086 |     ) ENGINE=MyISAM;";
1087 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1088 | 
1089 |     # NOTE: The UNIQUE INDEX below ensures that two jobs with
1090 |     # the same name (etc.) that complete around the same time
1091 |     # do not insert two records.  The downside is that the
1092 |     # index prefix length cannot be more than 1000 bytes, so
1093 |     # names must be limited by the prefix size.
1094 | 
1095 |     # maps username strings to unique ids
1096 |     $sql = "CREATE TABLE IF NOT EXISTS `usernames` (
1097 |         `id`   INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
1098 |         `name` VARCHAR(512) NOT NULL,
1099 |         UNIQUE INDEX `name` (`name`(512))
1100 |     ) ENGINE=MyISAM;";
1101 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1102 | 
1103 |     # maps partition name strings to unique ids
1104 |     $sql = "CREATE TABLE IF NOT EXISTS `partitions` (
1105 |         `id`   INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
1106 |         `name` VARCHAR(512) NOT NULL,
1107 |         UNIQUE INDEX `name` (`name`(512))
1108 |     ) ENGINE=MyISAM;";
1109 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1110 | 
1111 |     # maps job state strings to unique ids
1112 |     $sql = "CREATE TABLE IF NOT EXISTS `jobstates` (
1113 |         `id`   INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
1114 |         `name` VARCHAR(512) NOT NULL,
1115 |         UNIQUE INDEX `name` (`name`(512))
1116 |     ) ENGINE=MyISAM;";
1117 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1118 | 
1119 |     # maps job name strings to unique ids
1120 |     $sql = "CREATE TABLE IF NOT EXISTS `jobnames` (
1121 |         `id`   INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
1122 |         `name` VARCHAR(512) NOT NULL,
1123 |         UNIQUE INDEX `name` (`name`(512))
1124 |     ) ENGINE=MyISAM;";
1125 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1126 | 
1127 |     # maps node name strings to unique ids
1128 |     $sql = "CREATE TABLE IF NOT EXISTS `nodes` (
1129 |         `id`   INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
1130 |         `name` VARCHAR(512) NOT NULL,
1131 |         UNIQUE INDEX `name` (`name`(512))
1132 |     ) ENGINE=MyISAM;";
1133 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1134 | 
1135 |     # insert a <jobid,nodeid> record for each node a job uses
1136 |     $sql = "CREATE TABLE IF NOT EXISTS `jobs_nodes` (
1137 |         `job_id`  INT UNSIGNED NOT NULL,
1138 |         `node_id` INT UNSIGNED NOT NULL,
1139 |         UNIQUE INDEX `job_node` (`job_id`,`node_id`),
1140 |         INDEX `node_id` (`node_id`)
1141 |     ) ENGINE=MyISAM;";
1142 |     if (not do_sql ($dbh, $sql)) { $success = 0; }
1143 | 
1144 |     return $success;
1145 | }
1146 | 
1147 | # convert all data in version 1 table to version 2 schema
1148 | sub convert_slurm_joblog_table_from_v1_to_v2
1149 | {
1150 |     my $dbh = shift @_;
1151 |     my $success = 1;
1152 | 
1153 |     # switch to the slurm db
1154 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; }
1155 | 
1156 |     # check that there is an older table to convert from
1157 |     if (not table_exists ($dbh, "slurm_job_log")) {
1158 |         log_msg ("convert: 'slurm_job_log' table does not exist.\n");
1159 |         return 0;
1160 |     }
1161 | 
1162 |     # if our new table does not exist, create it
1163 |     if (not table_exists ($dbh, "jobs")) {
1164 |         log_msg ("convert: 'jobs' table does not exist," .
1165 |                  " attempting to create it.\n");
1166 |         if (not create_slurm_joblog_table_v2($dbh)) {
1167 |             return 0;
1168 |         }
1169 |     }
1170 | 
1171 |     # if tracking and --noindicies was specified,
1172 |     # remove indicies before insert, we'll add them back later
1173 |     if ($conf{track} and not $conf{indicies}) {
1174 |         my $drop = "ALTER TABLE `jobs_nodes`" .
1175 |                    " DROP INDEX `job_node`, DROP INDEX `node_id`;";
1176 |         if (not do_sql ($dbh, $drop)) {
1177 |             log_error ("Problem dropping node tracking indicies.\n");
1178 |         }
1179 |     }
1180 | 
1181 |     # get the total count of jobs in the database
1182 |     # (used to print percentage of progress)
1183 |     my $total_count = 0;
1184 |     my $sth_count = $dbh->prepare("SELECT COUNT(*) FROM `slurm_job_log`;");
1185 |     if ($sth_count->execute()) {
1186 |         ($total_count) = $sth_count->fetchrow_array();
1187 |     }
1188 |     my $milemarker = int($total_count / 100);
1189 |     if ($milemarker == 0) { $milemarker = 1; }
1190 | 
1191 |     # now grab all of the jobs and insert them one-by-one
1192 |     my $sth_all_jobs = $dbh->prepare("SELECT * FROM `slurm_job_log`;");
1193 |     my $job_id = undef;
1194 |     if ($sth_all_jobs->execute()) {
1195 |         my $count = 0;
1196 |         my $time_sum = 0;
1197 |         while (my @parts = $sth_all_jobs->fetchrow_array()) {
1198 |             # start timer
1199 |             my ($start_secs, $start_micros) = gettimeofday();
1200 | 
1201 |             my %h = ();
1202 |             # throw id away, we'll get a new one any way,
1203 |             # and this way we can run the conversion on a live machine,
1204 |             # since slurm-joblog.pl will be inserting records as this
1205 |             # conversion is running
1206 |             #$h{Id}        = $parts[0];
1207 |             $h{JobId}     = $parts[1];
1208 |             $h{UserName}  = $parts[2];
1209 |             $h{UserNumb}  = $parts[3];
1210 |             $h{Name}      = $parts[4];
1211 |             $h{JobState}  = $parts[5];
1212 |             $h{Partition} = $parts[6];
1213 |             $h{TimeLimit} = $parts[7];
1214 |             $h{StartTime} = $parts[8];
1215 |             $h{EndTime}   = $parts[9];
1216 |             $h{NodeList}  = $parts[10];
1217 |             $h{NodeCnt}   = $parts[11];
1218 |             # Procs field wasn't defined in version 1 schema
1219 |             #$h{Procs}     = $parts[12];
1220 | 
1221 |             # bug in version 1, which set nodecount to 1 for blank hostlists
1222 |             my $hostlist = $h{NodeList};
1223 |             if ($hostlist =~ /^\s*$/) {
1224 |                 $h{NodeList} = "";
1225 |                 $h{NodeCnt}  = 0;
1226 |             }
1227 | 
1228 |             # build the values string
1229 |             my $values = value_string_v2($dbh, \%h);
1230 | 
1231 |             # insert the job
1232 |             if ($conf{track}) {
1233 |                 # insert the job, need to wait on the insert
1234 |                 # since we need the job_id
1235 |                 my $sql = "INSERT IGNORE INTO `$conf{db}`.`jobs`" .
1236 |                           " VALUES $values;";
1237 |                 if (not do_sql($dbh, $sql)) {
1238 |                     $success = 0;
1239 |                 } else {
1240 |                     # now insert nodes used by this job
1241 |                     my $job_id = get_last_insert_id ($dbh);
1242 |                     if (defined $job_id and $job_id != 0) {
1243 |                         insert_job_nodes ($dbh, $job_id, $h{NodeList});
1244 |                     }
1245 |                 }
1246 |             } else {
1247 |                 # insert the job, no need to wait on it
1248 |                 my $sql = "INSERT DELAYED IGNORE INTO `$conf{db}`.`jobs`" .
1249 |                           " VALUES $values;";
1250 |                 if (not do_sql($dbh, $sql)) { $success = 0; }
1251 |             }
1252 | 
1253 |             # stop timer and print timing and progress as we go
1254 |             my ($end_secs, $end_micros) = gettimeofday();
1255 |             my $micros = ($end_secs * 1000000 + $end_micros) -
1256 |                 ($start_secs * 1000000 + $start_micros);
1257 |             $time_sum += $micros;
1258 |             $count++;
1259 |             if ($count % $milemarker == 0) {
1260 |                 my $avg_time = int($time_sum / $count);
1261 |                 my $perc = "";
1262 |                 if ($total_count > 0) {
1263 |                     $perc = sprintf("%.0f", $count / $total_count * 100);
1264 |                 }
1265 |                 log_msg ("Records converted $count ($perc%):" .
1266 |                          " $avg_time usec / record\n");
1267 |                 $time_sum = 0;
1268 |             }
1269 |         }
1270 |     } else {
1271 |         # select against version 1 table failed
1272 |         $success = 0;
1273 |     }
1274 | 
1275 |     # rebuild indicies
1276 |     if ($conf{track} and not $conf{indicies}) {
1277 |         my $rebuild = "ALTER TABLE `jobs_nodes`" .
1278 |                       " ADD UNIQUE INDEX `job_node` (`job_id`,`node_id`)," .
1279 |                       " ADD INDEX `node_id` (`node_id`);";
1280 |         if (not do_sql ($dbh, $rebuild)) {
1281 |             log_error ("Problem rebuilding node tracking indicies.\n");
1282 |         }
1283 |     }
1284 | 
1285 |     return $success;
1286 | }
1287 | 
1288 | # backfill data from files into version 2 tables
1289 | sub backfill_slurm_joblog_table_to_v2
1290 | {
1291 |     my $dbh = shift @_;
1292 |     my @files = @_;
1293 |     my $success = 1;
1294 | 
1295 |     # switch to the slurm db
1296 |     do_sql ($dbh, "USE $conf{db};");
1297 | 
1298 |     # if our new table does not exist, create it
1299 |     if (not table_exists ($dbh, "jobs")) {
1300 |         create_slurm_joblog_table_v2($dbh);
1301 |     }
1302 | 
1303 |     log_error ("No files to backfill!\n") if (!@files);
1304 | 
1305 |     # if tracking and --noindicies was specified,
1306 |     # remove indicies before insert, we'll add them back later
1307 |     if ($conf{track} and not $conf{indicies}) {
1308 |         my $drop = "ALTER TABLE `jobs_nodes`" .
1309 |                    " DROP INDEX `job_node`," .
1310 |                    " DROP INDEX `node_id`;";
1311 |         if (not do_sql ($dbh, $drop)) {
1312 |             log_error ("Problem dropping node tracking indicies.\n");
1313 |         }
1314 |     }
1315 | 
1316 |     my $count = 0;
1317 |     my $time_sum = 0;
1318 |     foreach my $file (@files) {
1319 |         my $skipped = 0;
1320 | 
1321 |         my $f = $file;
1322 |         $f = "gzip -dc $f | " if ($f =~ /\.gz$/);
1323 | 
1324 |         open (IN, $f) or log_error ("Failed to open \"$file\":$!\n"), next;
1325 | 
1326 |         while (my $line = <IN>) {
1327 |             # start timer
1328 |             my ($start_secs, $start_micros) = gettimeofday();
1329 | 
1330 |             chomp $line;
1331 |             my @parts = split(" ", $line);
1332 | 
1333 |             my %h = ();
1334 |             foreach my $part (@parts) {
1335 |                 my ($key, $value) = split("=", $part);
1336 |                 $h{$key} = $value;
1337 |             }
1338 | 
1339 |             # Some very old joblog files may have the incorrect
1340 |             #  datetime format. Unfortunately, the year wasn't
1341 |             #  included in these, so we have to drop these entries :-(
1342 |             if (defined $h{StartTime} and $h{StartTime} =~ m{^\d\d/\d\d-}) {
1343 |                 $skipped++;
1344 |                 next;
1345 |             }
1346 | 
1347 |             if ($conf{recalculate_nodecount} && defined $h{NodeList}) {
1348 |                 my $hostlist = $h{NodeList};
1349 |                 if ($hostlist =~ /^\s*$/) {
1350 |                     $h{NodeCnt}  = 0;
1351 |                 }
1352 |                 else {
1353 |                     $h{NodeCnt} = Hostlist::expand($hostlist)
1354 |                 }
1355 |             }
1356 | 
1357 |             # convert from slurm log to format for MySQL
1358 |             if (defined $h{"UserId"}) {
1359 |                 my $userid = $h{"UserId"};
1360 |                 my ($username, $usernumb) = ($userid =~ /(.+)\((\d+)\)/);
1361 |                 if (defined $username and defined $usernumb) {
1362 |                     $h{"UserName"} = $username;
1363 |                     $h{"UserNumb"} = $usernumb;
1364 |                 }
1365 |             }
1366 |             if (defined $h{"StartTime"}) {
1367 |                 $h{"StartTime"} =~ s/T/ /;
1368 |             }
1369 |             if (defined $h{"EndTime"}) {
1370 |                 $h{"EndTime"}   =~ s/T/ /;
1371 |             }
1372 | 
1373 |             # set the values
1374 |             my $values = value_string_v2($dbh, \%h);
1375 | 
1376 |             # insert the job
1377 |             if ($conf{track}) {
1378 |                 # insert the job, need to wait on the insert
1379 |                 # since we need the job_id
1380 |                 my $sql = "INSERT IGNORE INTO `$conf{db}`.`jobs`" .
1381 |                           " VALUES $values;";
1382 |                 if (not do_sql($dbh, $sql)) {
1383 |                     $success = 0;
1384 |                 } else {
1385 |                     # now insert nodes used by this job
1386 |                     my $job_id = get_last_insert_id ($dbh);
1387 |                     if (defined $job_id and $job_id != 0) {
1388 |                         insert_job_nodes ($dbh, $job_id, $h{NodeList});
1389 |                     }
1390 |                 }
1391 |             } else {
1392 |                 # insert the job, no need to wait on it
1393 |                 my $sql = "INSERT DELAYED IGNORE INTO `$conf{db}`.`jobs`" .
1394 |                           " VALUES $values;";
1395 |                 if (not do_sql($dbh, $sql)) { $success = 0; }
1396 |             }
1397 | 
1398 |             # stop timer and print timing and progress as we go
1399 |             my ($end_secs, $end_micros) = gettimeofday();
1400 |             my $micros = ($end_secs * 1000000 + $end_micros) -
1401 |                 ($start_secs * 1000000 + $start_micros);
1402 |             $time_sum += $micros;
1403 |             $count++;
1404 |             if ($count % 1000 == 0) {
1405 |                 my $avg_time = int($time_sum / $count);
1406 |                 log_msg ("Records converted $count:" .
1407 |                          " $avg_time usec / record\n");
1408 |                 $time_sum = 0;
1409 |             }
1410 |         }
1411 | 
1412 |         log_verbose ("Backfilled $count jobs from file $file\n");
1413 |         log_error ("Skipped $skipped job(s) from file $file because of ",
1414 |                   "old date format\n") if $skipped;
1415 | 
1416 |         close(IN);
1417 |     }
1418 | 
1419 |     # rebuild indicies
1420 |     if ($conf{track} and not $conf{indicies}) {
1421 |         my $rebuild = "ALTER TABLE `jobs_nodes`" .
1422 |                       " ADD UNIQUE INDEX `job_node` (`job_id`,`node_id`)," .
1423 |                       " ADD INDEX `node_id` (`node_id`);";
1424 |         if (not do_sql ($dbh, $rebuild)) {
1425 |             log_error ("Problem rebuilding node tracking indicies.\n");
1426 |         }
1427 |     }
1428 | 
1429 |     return $success;
1430 | }
1431 | 
1432 | ####################
1433 | # Utility functions
1434 | ####################
1435 | 
1436 | # append records to file
1437 | sub dump_slurm_joblog_table
1438 | {
1439 |     my $version = shift @_;
1440 |     my $dbh     = shift @_;
1441 |     my $joblog  = shift @_;
1442 |     my $success = 1;
1443 | 
1444 |     # switch to the slurm db
1445 |     if (not do_sql ($dbh, "USE $conf{db};")) { $success = 0; }
1446 | 
1447 |     # check that we have a table to get data from
1448 |     if ($version > 1) {
1449 |         if (not table_exists ($dbh, "jobs")) {
1450 |             log_msg ("'jobs' table does not exist.\n");
1451 |             return 0;
1452 |         }
1453 |     } else {
1454 |         if (not table_exists ($dbh, "slurm_job_log")) {
1455 |             log_msg ("'slurm_job_log' table does not exist.\n");
1456 |             return 0;
1457 |         }
1458 |     }
1459 | 
1460 |     # if prune is set, check that the date format is valid,
1461 |     # and check that we're not also obfuscating
1462 |     my $date = undef;
1463 |     if (defined $conf{prune}) {
1464 |         # can't prune and obfuscate at the same time
1465 |         if ($conf{obfuscate}) {
1466 |             log_fatal ("You cannot prune and obfuscate at the same time.\n");
1467 |         }
1468 | 
1469 |         # make sure the date is valid format
1470 |         if ($conf{prune} !~ /^\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d$/) {
1471 |             log_fatal ("Invalid prune date: $conf{prune}." .
1472 |                        "  Must be in --prune='yyyy-mm-dd hh:mm:ss' format.\n");
1473 |         }
1474 | 
1475 |         # ok, build out date qualifier
1476 |         $date = "`starttime` < " . $dbh->quote($conf{prune});
1477 | 
1478 |         # TODO: if tracking, remove indicies from jobs_nodes and
1479 |         # add back after we're done?
1480 |     } elsif (defined $conf{backup}) {
1481 |         # make sure the date is valid format
1482 |         if ($conf{backup} =~ /^all$/i) {
1483 |             # nothing to do here
1484 |         } elsif ($conf{backup} =~ /^(\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d)$/) {
1485 |             $date = "`starttime` < " . $dbh->quote($1);
1486 |         } elsif ($conf{backup} =~ /^(\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d)\.\.(\d\d\d\d\-\d\d\-\d\d \d\d:\d\d:\d\d)$/) {
1487 |             $date = "`starttime` >= " . $dbh->quote($1) .
1488 |                     " AND `starttime` < " . $dbh->quote($2);
1489 |         } else {
1490 |             log_fatal ("Invalid backup range: $conf{backup}." .
1491 |                        "  Must be one of: \"all\", DATE, or DATE..DATE;" .
1492 |                        " where DATE is 'yyyy-mm-dd hh:mm:ss'.\n");
1493 |         }
1494 |     }
1495 | 
1496 |     # open our output file
1497 |     if (!open (JOBLOG, ">>$joblog")) {
1498 |         log_fatal ("Unable to open $joblog: $!\n");
1499 |     }
1500 | 
1501 |     my $stmt = "";
1502 | 
1503 |     # build a statement to get the total count of jobs in
1504 |     # the database (used to print percentage of progress)
1505 |     my $total_count = 0;
1506 |     if ($version > 1) {
1507 |         $stmt = "SELECT COUNT(*) FROM `jobs`";
1508 |     } else {
1509 |         $stmt = "SELECT COUNT(*) FROM `slurm_job_log`";
1510 |     }
1511 |     if (defined $date) { $stmt .= " WHERE $date"; }
1512 |     $stmt .= " ORDER BY `starttime`,`id` ASC;";
1513 | 
1514 |     # get the count
1515 |     log_debug ("$stmt\n");
1516 |     my $sth_count = $dbh->prepare($stmt);
1517 |     if ($sth_count->execute()) {
1518 |         ($total_count) = $sth_count->fetchrow_array();
1519 |     }
1520 |     my $milemarker = int($total_count / 100);
1521 |     if ($milemarker == 0) { $milemarker = 1; }
1522 | 
1523 |     # build a statement to select our records
1524 |     if ($version > 1) {
1525 |         $stmt = "SELECT" .
1526 |         " `jobs`.*," .
1527 |         "`usernames`.`name` as `username`," .
1528 |         "`jobnames`.`name` as `jobname`," .
1529 |         "`jobstates`.`name` as `jobstate`," .
1530 |         "`partitions`.`name` as `partition`" .
1531 |         " FROM `jobs`" .
1532 |         " LEFT JOIN `usernames`  ON `jobs`.`username_id`  = `usernames`.`id`" .
1533 |         " LEFT JOIN `jobnames`   ON `jobs`.`jobname_id`   = `jobnames`.`id`" .
1534 |         " LEFT JOIN `jobstates`  ON `jobs`.`jobstate_id`  = `jobstates`.`id`" .
1535 |         " LEFT JOIN `partitions` ON `jobs`.`partition_id` = `partitions`.`id`";
1536 |     } else {
1537 |         $stmt = "SELECT * FROM `slurm_job_log`";
1538 |     }
1539 |     if (defined $date) { $stmt .= " WHERE $date"; }
1540 |     $stmt .= " ORDER BY `starttime`,`id` ASC;";
1541 | 
1542 |     # now grab all of the jobs and append them one-by-one
1543 |     log_debug ("$stmt\n");
1544 |     my $sth_all_jobs = $dbh->prepare($stmt);
1545 |     if ($sth_all_jobs->execute()) {
1546 |         my $count = 0;
1547 |         my $time_sum = 0;
1548 | 
1549 |         while (my $h = $sth_all_jobs->fetchrow_hashref()) {
1550 |             # start timer
1551 |             my ($start_secs, $start_micros) = gettimeofday();
1552 | 
1553 |             # bug in version 1, which set nodecount to 1 for blank hostlists
1554 |             if ($$h{nodelist} =~ /^\s*$/) {
1555 |                 $$h{nodelist} = "";
1556 |                 $$h{nodecount}  = 0;
1557 |             }
1558 | 
1559 |             # set time to proper format
1560 |             $$h{starttime} =~ s/(\-\d\d) (\d\d:)/$1T$2/;
1561 |             $$h{endtime}   =~ s/(\-\d\d) (\d\d:)/$1T$2/;
1562 | 
1563 |             # set procs field
1564 |             my $procs = undef;
1565 |             if ($version > 1) {
1566 |                 $procs = $$h{'corecount'};
1567 |             } elsif (defined $conf{cores}) {
1568 |                 $procs = $$h{'nodecount'} * $conf{cores};
1569 |             }
1570 | 
1571 |             # optionally obfuscate username, userid, and jobname
1572 |             my $username = $$h{'username'};
1573 |             my $userid   = $$h{'userid'};
1574 |             my $jobname  = $$h{'jobname'};
1575 |             if ($conf{obfuscate} and not defined $conf{prune}) {
1576 |                # obfuscate username
1577 |                if (not defined $obfuscate{usernames}{$username}) {
1578 |                    $num_users++;
1579 |                    $obfuscate{usernames}{$username} = $num_users;
1580 |                }
1581 |                $username = "user_" . $obfuscate{usernames}{$username};
1582 | 
1583 |                # obfuscate userid
1584 |                $userid = $num_users;
1585 | 
1586 |                # obfuscate jobname
1587 |                if (not defined $obfuscate{jobnames}{$jobname}) {
1588 |                    $num_jobs++;
1589 |                    $obfuscate{jobnames}{$jobname} = $num_jobs;
1590 |                }
1591 |                $jobname = "job_" . $obfuscate{jobnames}{$jobname};
1592 |             }
1593 | 
1594 |             # append record to file
1595 |             my @params = ();
1596 |             push @params, $$h{'jobid'};
1597 |             push @params, $username;
1598 |             push @params, $userid;
1599 |             push @params, $jobname;
1600 |             push @params, $$h{'jobstate'};
1601 |             push @params, $$h{'partition'};
1602 |             push @params, $$h{'timelimit'};
1603 |             push @params, $$h{'starttime'};
1604 |             push @params, $$h{'endtime'};
1605 |             push @params, $$h{'nodelist'};
1606 |             push @params, $$h{'nodecount'};
1607 |             if (defined $procs) {
1608 |                 push @params, "$procs";
1609 |                 printf JOBLOG
1610 |                   "JobId=%s UserId=%s(%s) Name=%s JobState=%s Partition=%s " .
1611 |                   "TimeLimit=%s StartTime=%s EndTime=%s NodeList=%s " .
1612 |                   "NodeCnt=%s Procs=%s\n", @params;
1613 |             } else {
1614 |                 printf JOBLOG
1615 |                   "JobId=%s UserId=%s(%s) Name=%s JobState=%s Partition=%s " .
1616 |                   "TimeLimit=%s StartTime=%s EndTime=%s NodeList=%s " .
1617 |                   "NodeCnt=%s\n", @params;
1618 |             }
1619 | 
1620 |             # if we are pruning, delete the job and any associated records
1621 |             if (defined $conf{prune}) {
1622 |                 my $id = $$h{'id'};
1623 |                 if (defined $id) {
1624 |                     my $q_job_id = $dbh->quote($id);
1625 | 
1626 |                     # if tracking, first delete all node records
1627 |                     if ($version > 1 and $conf{track}) {
1628 |                         do_sql ($dbh, "DELETE FROM `jobs_nodes`" .
1629 |                                       " WHERE `job_id` = $q_job_id;");
1630 |                     }
1631 | 
1632 |                     # now delete the job record
1633 |                     if ($version > 1) {
1634 |                         do_sql ($dbh, "DELETE FROM `jobs`" .
1635 |                                       " WHERE `id` = $q_job_id;");
1636 |                     } else {
1637 |                         do_sql ($dbh, "DELETE FROM `slurm_job_log`" .
1638 |                                       " WHERE `id` = $q_job_id;");
1639 |                     }
1640 |                 }
1641 |             }
1642 | 
1643 |             # stop timer and print timing and progress as we go
1644 |             my ($end_secs, $end_micros) = gettimeofday();
1645 |             my $micros = ($end_secs * 1000000 + $end_micros) -
1646 |                 ($start_secs * 1000000 + $start_micros);
1647 |             $time_sum += $micros;
1648 |             $count++;
1649 |             if ($count % $milemarker == 0) {
1650 |                 my $avg_time = int($time_sum / $count);
1651 |                 my $perc = "";
1652 |                 if ($total_count > 0) {
1653 |                     $perc = sprintf("%.0f", $count / $total_count * 100);
1654 |                 }
1655 |                 log_msg ("Records written $count ($perc%):" .
1656 |                          " $avg_time usec / record\n");
1657 |                 $time_sum = 0;
1658 |             }
1659 |         }
1660 | 
1661 |         log_msg ("Wrote $count jobs to $joblog.\n");
1662 |     } else {
1663 |         # select against version 1 table failed
1664 |         $success = 0;
1665 |     }
1666 | 
1667 |     close (JOBLOG);
1668 |     return $success;
1669 | }
1670 | 
1671 | #
1672 | #  Generate a digest of the password, sha1 or md5 depending on the
1673 | #   size of the password column in the user table
1674 | #
1675 | sub passwd_digest
1676 | {
1677 |     my $dbh = connect_db_root ();
1678 |     my $passwd = shift @_;
1679 | 
1680 |     log_fatal ("passwd_digest: Failed to get DB handle!\n") if !$dbh;
1681 | 
1682 |     my $sth = $dbh->prepare ("SELECT PASSWORD('example');") 
1683 |         or log_fatal ($dbh->errstr);
1684 | 
1685 |     $sth->execute ();
1686 | 
1687 |     my ($r) = $sth->fetchrow_array ();
1688 | 
1689 |     if (length $r >= 41) {
1690 |         return "*" . sha1_hex ($passwd);
1691 |     } 
1692 | 
1693 |     #  I don't know what the short password hash is, so 
1694 |     #   use of this function is disabled for now.
1695 |     #
1696 |     #return (unpack ("H16", pack ("A13", $c)));
1697 | }
1698 | 
1699 | sub log_msg     { print STDERR "$progname: ", @_;       }
1700 | sub log_error   { log_msg ("Error: ", @_);              }
1701 | sub log_fatal   { log_msg ("Fatal: ", @_); exit 1;      }
1702 | sub log_verbose { log_msg (@_) if ($conf{verbose});     }
1703 | sub log_debug   { log_msg (@_) if ($conf{verbose} > 1); }
1704 | 
1705 | # vi: ts=4 sw=4 expandtab
1706 | 


--------------------------------------------------------------------------------
/sqlog-db-util.8:
--------------------------------------------------------------------------------
  1 | .\" $Id$
  2 | .\"
  3 | 
  4 | .TH SQLOG-DB-UTIL 8 "SQLOG Database Utility"
  5 | 
  6 | .SH NAME
  7 | sqlog-db-util \- Utility for SLURM job log database maintenance
  8 | 
  9 | .SH SYNOPSIS
 10 | .B sqlog-db-util
 11 | [\fIOPTIONS\fR]...
 12 | 
 13 | .SH DESCRIPTION
 14 | The \fBsqlog-db-util\fR utility is an interface for creating and
 15 | backfilling the SLURM job log database used by the \fBsqlog\fR(1)
 16 | command. It reads the sqlog.conf and slurm-joblog.conf files to
 17 | determine the DB users, passwords, and SQL host it should use
 18 | for DB creation. 
 19 | 
 20 | .SH OPTIONS
 21 | .TP 
 22 | .BI "-h, --help"
 23 | Display a usage message.
 24 | .TP
 25 | .BI "-i, --info"
 26 | Provide information about the currently configured DB, including the
 27 | server hostname, read-only username, read-write username, SLURM job
 28 | log database name, and the total number of jobs currently stored in
 29 | the DB.
 30 | .TP
 31 | .BI "-v, --verbose"
 32 | Increase verbosity.
 33 | .TP
 34 | .BI "-d, --drop=V"
 35 | Drop tables for version V={1,2} of database schema.
 36 | Currently, this option doesn't remove SLURM job log users or DB.
 37 | .TP
 38 | .BI "-c, --create"
 39 | Create the SLURM job log DB, the slurm_job_log table, and the associated
 40 | read-only and read-write users.
 41 | .TP
 42 | .BI "-b, --backfill"
 43 | Backfill the database with SLURM job information using the list of files
 44 | provided on the command line. Files should be in the format created by
 45 | SLURM's jobcomp/filetxt plugin. If the files end in .gz, they will be
 46 | automatically unzipped at runtime.  Also see --cores-per-node.
 47 | .TP
 48 | .BI "-x, --convert"
 49 | Convert data from database schema version 1 to version 2.  Also see 
 50 | --cores-per-node.
 51 | .TP
 52 | .BI "-B, --backup=RANGE"
 53 | Copy data from tables to a file (readable by --backfill).
 54 | Specify records via a range of job start times.  RANGE can be any
 55 | of: "all", DATE, or DATE..DATE; where DATE is 'yyyy-mm-dd hh:mm:ss',
 56 | to specify all jobs, all jobs with a start time older than DATE,
 57 | or all jobs with a start time between DATE..DATE, respectively.
 58 | Also see --cores-per-node and --obfuscate.
 59 | .TP
 60 | .BI "-o, --obfuscate"
 61 | Obfuscate usernames, userids, and jobnames during a backup operation.
 62 | This is useful when sharing joblogs with outside collaborators.
 63 | .TP
 64 | .BI "-p, --prune=DATE"
 65 | Prune database of all jobs with start times older than DATE; write such records to a file.
 66 | DATE must be in format of 'yyyy-mm-dd hh:mm:ss'.  Also see --cores-per-node.
 67 | .TP
 68 | .BI "-C, --cores-per-node=N"
 69 | The version 1 schema did not record the number of cores allocated to a job.
 70 | For systems that allocate whole nodes to jobs and have the same number of
 71 | cores per node, the number of cores allocated to a job can be computed
 72 | by multiplying the number of node allocated to a job by the number of
 73 | cores per node.  On such systems, use the --cores-per-node option to specify the
 74 | number of cores per node.  This option can be used during --convert,
 75 | --backfill, and --backup operations.
 76 | .TP
 77 | .BI "--notrack"
 78 | Disable per-job node tracking for jobs inserted during convert
 79 | or backfill operations.  If node-tracking is enabled on a system,
 80 | i.e., TRACKNODES is not set or is set to 1 in sqlog.conf,
 81 | then such jobs will not show up in queries involving specific node names.
 82 | .TP
 83 | .BI "--delay-index"
 84 | Temporarily disable node tracking indicies for jobs inserted during
 85 | convert of backfill operations.  This drops the indicies, inserts the jobs,
 86 | and rebuilds the indicies.  When converting or inserting lots of records,
 87 | this speeds up the operation, however, it makes things slower when loading
 88 | only a few records.
 89 | .TP
 90 | .BI "--recalc-nodecnt"
 91 | Some versions of SLURM incorrectly set NODECNT in the jobcomp/script plugin
 92 | and thus the nodecount may be incorrect in the sqlog database. Use of this
 93 | option along with a \fB--backfill\fR will fix the incorrect nodecount
 94 | values by recalculating directly from the nodelist in the joblog.
 95 | .BI "-L, --localhost"
 96 | Override the SQL host configuration and connect to DB on localhost.
 97 | (May be required if root user is only allowed access to DB via localhost)
 98 | 
 99 | .SH EXAMPLES
100 | Create database:
101 | .nf
102 | 
103 |    sqlog-db-util --create
104 | 
105 | .fi
106 | Insert job records in database for all jobs in current SLURM txt joblog files:
107 | .nf
108 | 
109 |    sqlog-db-util --backfill /var/log/slurm/joblog*
110 | 
111 | .fi
112 | Drop an existing version 2 database, recreate using current configuration,
113 |  and seed the new database using SLURM joblog files:
114 | .nf 
115 | 
116 |    sqlog-db-util -d 2 -cb /var/log/slurm/joblog*
117 | 
118 | .fi
119 | 
120 | .SH AUTHOR
121 | Written by Adam Moody and Mark Grondona.
122 | 
123 | .SH SEE ALSO
124 | \fBsqlog\fR(1), /etc/slurm/sqlog.conf, /etc/slurm/slurm-joblog.conf
125 | 


--------------------------------------------------------------------------------
/sqlog.1:
--------------------------------------------------------------------------------
  1 | .\" $Id$
  2 | .\"
  3 | 
  4 | .TH SQLOG 1 "SLURM Query Log"
  5 | 
  6 | .SH NAME
  7 | sqlog \- SLURM query log utility
  8 | 
  9 | .SH SYNOPSIS
 10 | .B sqlog
 11 | [\fIOPTIONS\fR]...
 12 | 
 13 | .SH DESCRIPTION
 14 | The \fBsqlog\fR utility provides a single interface to query information
 15 | about jobs from the SLURM job log database and/or the current queue
 16 | of running jobs. 
 17 | 
 18 | By default both the current queue of running jobs and the database
 19 | of completed jobs are queried, and a limit of 25 results is displayed.
 20 | If more results are available in the database or queue, sqlog will
 21 | note the fact with an informational message to stderr:
 22 | .nf 
 23 | 
 24 |     sqlog: [More results available....]
 25 | 
 26 | .fi 
 27 | This message is suppressed if the \fB--no-header\fR option is provided.
 28 | 
 29 | .SH CONFIGURATION
 30 | 
 31 | \fBsqlog\fR reads configuration from the \fBsqlog.conf\fR config file
 32 | (typically in /etc/slurm). This config file provides information about
 33 | the SLURM job log database location and username.  In addition,
 34 | \fBsqlog\fR  reads user defaults and additional output format types
 35 | from a ~/.sqlog file if it exists. See the USER CONFIG section
 36 | below for more information.
 37 | 
 38 | Various config parameters are set in the following order:
 39 | Internal defaults, system configuration, user configuration, 
 40 | and command-line.
 41 | 
 42 | .SH OPTIONS
 43 | .TP
 44 | .BI "-h, --help"
 45 | Display a summary of the command-line options.
 46 | .TP
 47 | .BI "-v, --verbose"
 48 | Increase debugging verbosity of the program.
 49 | .TP
 50 | .BI "--dry-run"
 51 | Don't actually do anything.
 52 | .TP
 53 | .BI "-j, --jobids " LIST
 54 | Provide a comma-separated list of jobids to include (or exclude if
 55 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
 56 | be specified multiple times.
 57 | .TP
 58 | .BI "-J, --job-names " LIST
 59 | Provide a comma-separated list of job names to include (or exclude if
 60 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
 61 | be specified multiple times.
 62 | .TP
 63 | .BI "-n, --nodes " LIST
 64 | Provide a comma-separated list of nodes or node lists to include 
 65 | (or exclude if preceded by the \fI-x\fR, \fI--exclude\fR option).
 66 | This option may be specified multiple times. Node lists can be
 67 | in hostlist format, e.g. host[34-36,67].
 68 | .TP
 69 | .BI "-p, --partitions " LIST
 70 | Provide a comma-separated list of partitions to include (or exclude if
 71 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
 72 | be specified multiple times.
 73 | .TP
 74 | .BI "-s, --states " LIST
 75 | Provide a comma-separated list of job states to include (or exclude if
 76 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
 77 | be specified multiple times. Use --states=list to generate list of valid
 78 | job state keys.
 79 | .TP
 80 | .BI "-u, --users " LIST
 81 | Provide a comma-separated list of users to include (or exclude if
 82 | preceded by the \fI-x\fR, \fI--exclude\fR option). This option may
 83 | be specified multiple times.
 84 | .TP
 85 | .BI "--regex"
 86 | Enable regular expressions instead of exact matching for the following list
 87 | of jobids, job names, partitions, states, and user names. For example
 88 | .nf
 89 | 
 90 |      sqlog --regex --job-names='^foo-.*'
 91 | 
 92 | .fi
 93 | .TP
 94 | .BI "-x, --exclude"
 95 | Exclude the following list of jobids, users, partitions, nodes, job names,
 96 | or job states. For example: \fB--exclude --jobids\fR=\fILIST\fR or 
 97 | \fB-xn\fR \fINODES\fR.
 98 | .TP
 99 | .BI "-N, --nnodes " N
100 | List jobs that ran on exactly \fIN\fR nodes. \fIN\fR may also have the 
101 | form +\fIN\fR, -\fIN\fR, or \fIN\fR..\fIM\fR, to specify a minimum, 
102 | maximum, or range of node counts. For examples see RANGE OPERATORS
103 | section below.
104 | .TP
105 | .BI "--minnodes " N
106 | Explicity specify a minimum node count. This is equivalent to using
107 | the option \fB--nnodes\fR=+\fIN\fR.
108 | .TP
109 | .BI "--maxnodes " N
110 | Explicity specify a maximum node count. This is equivalent to using
111 | the option \fB--nnodes\fR=-\fIN\fR.
112 | .TP
113 | .BI "-C, --ncores " N
114 | List jobs that ran on exactly \fIN\fR cores. \fIN\fR may also have the 
115 | form +\fIN\fR, -\fIN\fR, or \fIN\fR..\fIM\fR, to specify a minimum, 
116 | maximum, or range of node counts. For examples see RANGE OPERATORS
117 | section below.
118 | .TP
119 | .BI "--mincores " N
120 | Explicity specify a minimum core count. This is equivalent to using
121 | the option \fB--ncores\fR=+\fIN\fR.
122 | .TP
123 | .BI "--maxcores " N
124 | Explicity specify a maximum core count. This is equivalent to using
125 | the option \fB--ncores\fR=-\fIN\fR.
126 | .TP
127 | .BI "-T, --runtime " DURATION
128 | List jobs that ran or have run for \fIDURATION\fR. \fIDURATION\fR may
129 | have the form DD-HH:MM:SS or DDdHHhMMmSSs, where DD is days, HH
130 | hours, MM minutes, and SS seconds. In the second form, values that
131 | are zero may be left out, e.g. 4h. In the first form, days, hours,
132 | and minutes are optional, e.g. :04 is 4 seconds. \fIDURATION\fR may
133 | be specified as +\fIT\fR, -\fIT\fR or \fIT1\fR..\fIT2\fR to specify
134 | a min, max, or runtime range. For more information, see the RANGE
135 | OPERATORS section below.
136 | .TP
137 | .BI "--mintime " DURATION
138 | List all jobs that ran for at least \fIDURATION\fR.
139 | This is equivalent to specifying \fB--runtime\fR=+\fIDURATION\fR.
140 | .TP
141 | .BI "--maxtime " DURATION
142 | List all jobs that ran for at most \fIDURATION\fR.
143 | This is equivalent to specifying \fB--runtime\fR=-\fIDURATION\fR.
144 | .TP
145 | .BI "-t, --time, --at " TIME
146 | List jobs which were running at a particular date and time.
147 | \fITIME\fR arguments are parsed by the perl \fBDate::Manip\fR(3pm)
148 | package, so many date/time formats are allowed, i.e. 2pm or
149 | "4/14 15:30:00". A window of time may be specified by separating the
150 | start and end of the window with "..", e.g. \fB--time\fR=2pm..3pm.
151 | For more information see the RANGE OPERATORS section below.
152 | .TP
153 | .BI "-S, --start " TIME
154 | List all jobs that started at date and time \fITIME\fR. \fITIME\fR may
155 | have the form +\fIT\fR, -\fIT\fR, or \fIT1\fR..\fIT2\fR to specify a
156 | minimum, maximum, or start time range. See RANGE OPERATORS below
157 | for more information.
158 | .TP
159 | .BI "--start-after " TIME
160 | List all jobs that started after time \fITIME\fR. This is equivalent
161 | to using \fB--start\fR=+\fITIME\fR.
162 | .TP
163 | .BI "--start-before " TIME
164 | List all jobs that started before time \fITIME\fR. This is equivalent
165 | to using \fB--start\fR=-\fITIME\fR.
166 | .TP
167 | .BI "-E, --end " TIME
168 | List all jobs that ended at date and time \fITIME\fR. \fITIME\fR may
169 | have the form +\fIT\fR, -\fIT\fR, or \fIT1\fR..\fIT2\fR to specify a
170 | minimum, maximum, or end time range (see RANGE OPERATORS below for
171 | more information). For running jobs, SLURM uses
172 | an estimated end time, so end times in the future are valid and will
173 | be used. (and using \fB--end\fR=+now would list all currently 
174 | running jobs, since they all end in the future.)
175 | .TP
176 | .BI "--end-after " TIME
177 | List all jobs that ended (or will end) after time \fITIME\fR. This is 
178 | equivalent to using \fB--end\fR=+\fITIME\fR.
179 | .TP
180 | .BI "--end-before " TIME
181 | List all jobs that ended (or will end) before time \fITIME\fR. This is 
182 | equivalent to using \fB--end\fR=-\fITIME\fR.
183 | .TP
184 | .BI "-X, --no-running" 
185 | Do not query running jobs, i.e. ignore the current queue and only
186 | query the SLURM job log database.
187 | .TP
188 | .BI "--no-db"
189 | Do not query the SLURM job log database, i.e. only query the current
190 | queue.
191 | .TP
192 | .BI "-H, --no-header"
193 | Do not display header rows in output.
194 | .TP
195 | .BI "-o, --format " LIST
196 | Specify a list of format keys to display or a format type, or both
197 | using the form \fITYPE\fR:\fIKEYS\fR... Use \fB--format\fR=list to
198 | list valid keys and types. See OUTPUT FORMAT below for further 
199 | information.
200 | .TP
201 | .BI "-P, --sort " LIST
202 | Specify a list of keys on which to sort output. Prepend a '-' to sort
203 | in descending as opposed to ascending order. List valid keys
204 | using \fB--sort\fR=list. The default sort method is '-start'. 
205 | .TP
206 | .BI "-L, --limit " N
207 | Limit the number of records to report (Default = 25).
208 | .TP
209 | .BI "-a, --all"
210 | Do not limit the number of returned results. (Return all matching rows).
211 | This is equivalent to \fB--limit\fR=0.
212 | 
213 | .SH RANGE OPERATORS
214 | \fITIME\fR, \fIDURATION\fR, and numeric arguments may use the
215 | RANGE OPERATORS '+', '-', and '..' to specify minimum, maximum,
216 | or a range of values respectively.  TIME arguments may also use
217 | the '@' symbol to escape a leading + or - in the TIME itself
218 | (e.g. '-1hr' means '1 hr ago').  The \fB--time\fB, \fB--start\fR,
219 | \fB--end\fR, \fB--runtime\fB, and \fB--nnodes\fR options to
220 | \fBsqlog\fR all take RANGE OPERATORS.
221 | .TP
222 | Examples 
223 | .TP 20
224 | .BI "--nnodes " +8
225 | Jobs that ran with 8 or more nodes.
226 | .TP
227 | .BI "--nnodes " 16..32
228 | Jobs that ran with between 16 and 32 nodes, inclusive.
229 | .TP
230 | .BI "--runtime " -2h
231 | Jobs that ran for 2 hours or less.
232 | .TP
233 | .BI "--runtime " 5m..1hr
234 | Jobs that ran for between 5 minutes and 1 hour, inclusive.
235 | .TP
236 | .BI "--end " 2pm..3pm
237 | Jobs that ended today between 2PM and 3PM, inclusive.
238 | .TP
239 | .BI "--time " 7/17..7/18
240 | Jobs that ran anytime from 12AM, 7/17 to 12AM, 7/18.
241 | .TP
242 | .BI "--time " "+'1 hour ago'"
243 | Jobs that ran in the past hour (1 hour ago or later).
244 | .TP
245 | .BI "--time " "+-1hr (or +@-1hr)"
246 | Same as above.
247 | .TP
248 | .BI "--time " @-1hr
249 | Jobs that were running exactly at one hour ago.
250 | .TP
251 | .BI "--time " @-2hr..-1hr
252 | Jobs that were running between 2 hours ago and 1 hour ago.
253 | 
254 | 
255 | .SH USER CONFIGURATION
256 | When \fBsqlog\fR runs, it will first check for a ~/.sqlog file and 
257 | parse it if it exists. At this time, the ~/.sqlog file may be used 
258 | to set a new default limit (see \fB--limit\fR) and addtional output format 
259 | types (see \fB--format\fR). These two configuration parameters take the form:
260 | .TP 20
261 | \fBlimit\fR = \fIN\fR
262 | Set the new default output limit to \fIN\fR.
263 | .TP
264 | \fBformat{\fINAME\fB}\fR = \fILIST...\fR
265 | Create an alias \fINAME\fR for the format list \fILIST\fR.
266 | .PP
267 | For example, the following sqlog file
268 | .nf
269 |     #  Sample ~/.sqlog file
270 |     limit = 30
271 |     format{mine} = long:start,end,jobid,user,state
272 | 
273 | .fi
274 | would set the default output limit to 30 records and 
275 | add a new format type \fImine\fR. The new format type would 
276 | be used by specifying 
277 | .nf
278 | 
279 |     \fB--format\fR \fImine\fR
280 | 
281 | .fi 
282 | on the command line, which would be equivalent to 
283 | .nf
284 | 
285 |     \fB--format\fR long:start,end,jobid,user,state
286 | 
287 | .fi
288 | Any number of format types may be specified in this way, though
289 | if there are duplicate names, the last one specified will override
290 | all previous types. This also implies that a user can redefine
291 | the default \fBsqlog\fR format types \fIshort\fR, \fIlong\fR,
292 | and \fIfreeform\fR, though this is not recommended.
293 | 
294 | .SH OUTPUT FORMAT
295 | \fBsqlog\fR provides precise control over the output format, which aids with
296 | readability and simplifies parsing via scripts.  When parsing output, be sure
297 | to specify each field and the expected order using the -o,--format option.
298 | The built-in formats (short, long, and freeform) may add or reorder fields
299 | over time.
300 | 
301 | By default, \fBsqlog\fR uses the output format 
302 | .nf
303 | 
304 |    short:jobid,partition,name,user,state,start,runtime,ncores,nnodes,nodes
305 | 
306 | .fi
307 | 
308 | The \fIshort:\fR preceeding the format specification tells \fBsqlog\fR
309 | to use the \fIshort\fR form of each of the format keys. The result
310 | is what you see when running \fBsqlog\fR without using the -o,--format
311 | option. All format keys currently available are detailed here. Some
312 | keys have shorter aliases that are provided for convenience. These
313 | are listed alongside the full key name below. Note that all these
314 | keys can also be listed by using --\fIformat=list\fR.
315 | .TP 20
316 | .B "jobid | jid"
317 | The SLURM jobid for this job.
318 | .TP
319 | .B "partition | part"
320 | The SLURM partition in which the job ran or is running.
321 | .TP 
322 | .B "name"
323 | The name of the job as recorded by SLURM.
324 | .TP
325 | .B "user"
326 | The username of the user running the job.
327 | .TP 
328 | .B "state | st"
329 | The current or final state of the job. See JOB STATE CODES
330 | for a description of the two-letter codes that this field
331 | displays by default.
332 | .TP 
333 | .B "start"
334 | The start time of the job in the form MM/DD-HH:MM:SS.
335 | .TP 
336 | .B "runtime | time"
337 | The total runtime of the job in the form
338 | DD-HH:MM:SS. Leading zero values may be dropped,
339 | for instance 4:30 is 4 minutes 30 seconds.
340 | .TP 
341 | .B "ncores | C"
342 | The number of cores allocated to the job.
343 | .TP 
344 | .B "nnodes | N"
345 | The number of nodes allocated to the job.
346 | .TP 
347 | .B "nodes"
348 | The nodelist that was allocated to the job. Note that for
349 | completing jobs (CG) this nodelist will be restricted to
350 | the currently completing nodes for the job. To see the
351 | full nodelist, restrict \fBsqlog\fR to the database only,
352 | i.e. run with the -X, --no-running option.
353 | .TP 
354 | .B "runtime_s | time_s"
355 | The total job runtime in seconds.
356 | .TP 
357 | .B "end"
358 | The time at which the job completed in the form
359 | MM/DD-HH:MM:SS.
360 | .TP
361 | .B "longstart"
362 | Date and time the job started in the form
363 | YYYY-MM-DDTHH:MM:SS. This is displayed by
364 | default in the \fIlong\fR format type.
365 | .TP 
366 | .B "longend"
367 | Date and time the job ended in the form
368 | YYYY-MM-DDTHH:MM:SS. This is displayed by
369 | default in the \fIlong\fR format type.
370 | .TP 
371 | .B "unixstart"
372 | Job start time in seconds since epoch.
373 | .TP 
374 | .B "unixend"
375 | Job end time in seconds since epoch.
376 | .TP 0
377 | 
378 | A format type may be specified in addition to the format fields.  These change the output width and in some cases the output format of the fields above. The format type may also be specified alone to the \fI--format\fR option. For instance \fI--format=long\fR would choose the default fields configured for the \fIlong\fR format type.
379 | 
380 | .TP 20
381 | .B "short"
382 | This is the default output type. It uses the format fields:
383 | jobid,part,name,user,state,start,runtime,ncores,nnodes,nodes
384 | .TP
385 | .B "long"
386 | This format type uses longer widths for most fields, and
387 | displays the the full job state code by default (e.g.
388 | completing instead of CG). Its default format fields are:
389 | jobid,part,name,user,state,longstart,longend,runtime,ncores,nnodes,nodes
390 | .TP
391 | .B "freeform"
392 | This is a freeform output in which full width fields are displayed
393 | separated by whitespace. This would be used for parsing sqlog
394 | output for instance, to guarantee no field is trunctated.
395 | It uses the same format fields as the \fBlong\fR format type.
396 | 
397 | .SH JOB STATE CODES
398 | In normal output, job states are displayed with two letter abbreviations
399 | in \fBsqlog\fR output. Job state codes are fully explained in the
400 | \fBsqueue\fR(1) man page, but the abbreviations are restated here
401 | for completeness.
402 | .TP 20 
403 | .B "CA   CANCELLED"
404 | Job was cancelled.
405 | .TP
406 | .B "CD   COMPLETED"
407 | Job completed normally.
408 | .TP
409 | .B "CG   COMPLETING"
410 | Job is in the process of completing.
411 | .TP
412 | .B "F    FAILED"
413 | Job termined abnormally.
414 | .TP
415 | .B "NF   NODE_FAIL"
416 | Job terminated due to node failure.
417 | .TP
418 | .B "PD   PENDING"
419 | Job is pending allocation.
420 | .TP 
421 | .B "R    RUNNING"
422 | Job currently has an allocation.
423 | .TP
424 | .B "S    SUSPENDED"
425 | Job is suspended.
426 | .TP 
427 | .B "TO   TIMEOUT"
428 | Job terminated upon reaching its time limit.
429 | 
430 | 
431 | .SH EXAMPLES
432 | Display the job or jobs that were running on host55 at July 19, 4:00PM:
433 | .nf
434 | 
435 |     sqlog --time="July 19, 4pm" --nodes=host55
436 | 
437 | .fi
438 | Display at most 25 jobs that were running at midnight yesterday:
439 | .nf
440 | 
441 |     sqlog --time=yesterday,midnight
442 | 
443 | .fi
444 | Display all jobs that failed between 8:00AM and 9:00AM this morning,
445 | sorted by descending endtime:
446 | .nf
447 | 
448 |     sqlog --all --end=8am..9am --states=F --sort=-end
449 | 
450 | .fi 
451 | Display all jobs that started today:
452 | .nf
453 | 
454 |     sqlog --start=+midnight --all
455 | 
456 | .fi
457 | Display all jobs that have run between 3 and 4 hours on the nodes
458 | host30 through host65, and that didn't complete normally
459 | .nf
460 | 
461 |    sqlog -L 0 -T=3h..4h -n 'host[30-65]' -xs completed
462 | 
463 | .fi  
464 | Display all jobs that were running yesterday with 1000 nodes or 
465 | greater and completed normally:
466 | .nf
467 | 
468 |     sqlog -t yesterday,12am..12am -s CD -N +1000
469 | 
470 | .fi
471 | List current queue, sorted by number of nodes (ascending):
472 | .nf
473 | 
474 |     sqlog --all --no-db --sort=nnodes
475 | 
476 | .fi
477 | List the top 10 longest running jobs, and then the 5 oldest jobs:
478 | .nf
479 | 
480 |     sqlog --sort=runtime --limit=10
481 |     sqlog --sort=-start --limit=5
482 | 	
483 | .fi
484 | .SH AUTHOR
485 | Written by Adam Moody and Mark Grondona.
486 | 


--------------------------------------------------------------------------------
/sqlog.conf.example:
--------------------------------------------------------------------------------
 1 | ###############################################################################
 2 | # $Id$
 3 | ###############################################################################
 4 | #
 5 | # sqlog(1) database configuration
 6 | #
 7 | #  Override defaults for 
 8 | #    SQLHOST    Hostname for SQL db          (default = sqlhbost)
 9 | #    SQLUSER    Read-only username for db    (default = slurm_read)
10 | #    SQLPASS    Read-only password           (default = no password)
11 | #    SQLDB      Database name for slurm data (default = slurm)
12 | #    TRACKNODES Set to 0 to disable per-job node tracking (default = 1)
13 | #  
14 | #  This "config file" is read by perl's do() routine, so arbitrary
15 | #   perl can be used. (For example, to automatically determine the
16 | #   SQLHOST, etc.)
17 | #
18 | 
19 | #  Must begin with package "conf".
20 | #
21 | package conf;
22 | 
23 | use Genders;
24 | 
25 | #  Only need to override SQLHOST for now. Other defaults are ok.
26 | $SQLHOST = get_sqlhost ();
27 | 
28 | #  Example of adding new format list
29 | %FORMATS = ( "sys1" => "jid,user,name,start,end" );
30 | 
31 | 1;
32 | #
33 | #  Get cluster's "sqlhost" (altmgmt node for now)
34 | #   (Returns the ethernet hostname for the node, if available)
35 | #
36 | sub get_sqlhost
37 | {
38 |     my $genders = Genders->new();
39 |     my $host = "";
40 | 
41 |     ($host) = $genders->getnodes("mysqld") or
42 |         &main::log_fatal ("Failed to get SQL host from mysqld genders attr.\n");
43 | 
44 |     my $server = $genders->getattrval("altname", $host) or
45 |         &main::log_error ("Failed to get altname for $host.\n");
46 | 
47 |     return $server || $host;
48 | }
49 | 
50 | 


--------------------------------------------------------------------------------
/sqlog.spec:
--------------------------------------------------------------------------------
 1 | Name:      sqlog
 2 | Version:   See META
 3 | Release:   See META
 4 | 
 5 | Summary:   SLURM job completion database utilities
 6 | Group:     Applications/System
 7 | License:   GPL
 8 | Source:    %{name}-%{version}.tgz
 9 | BuildRoot: %{_tmppath}/%{name}-%{version}
10 | BuildArch: noarch
11 | Requires: slurm perl(Date::Manip) perl(DBI) perl(DBD::mysql) perl(Digest::SHA1) gendersllnl
12 | 
13 | %define debug_package %{nil}
14 | 
15 | %description
16 | sqlog provides a system for creation, query, and population of a 
17 | database of SLURM job history. 
18 | 
19 | %{!?_slurm_sysconfdir: %define _slurm_sysconfdir %{_sysconfdir}/slurm}
20 | %{!?_perl_path: %define _perl_path %{__perl}}
21 | %{!?_perl_libpaths: %define _perl_libpaths %{nil}}
22 | %{!?_path_env_var: %define _path_env_var /bin:/usr/bin:/usr/sbin}
23 | 
24 | %prep 
25 | %setup
26 | 
27 | %build
28 | #NOOP
29 | 
30 | %install
31 | rm -rf "$RPM_BUILD_ROOT"
32 | mkdir -p "$RPM_BUILD_ROOT"
33 | mkdir -p -m0755 $RPM_BUILD_ROOT/%{_libexecdir}/sqlog
34 | 
35 | perl -pli -e "s|/etc/slurm|%{_slurm_sysconfdir}|g;
36 | 	      s|/usr/bin/perl|%{_perl_path}|;
37 | 	      s|^use lib qw\(\);|use lib qw(%{_perl_libpaths});|;
38 | 	      s|^(\\\$ENV\{PATH\}) = '[^']*';|\$1 = '%{_path_env_var}';|;" \
39 | 	sqlog sqlog.1 sqlog-db-util sqlog-db-util.8 slurm-joblog.pl \
40 | 	skewstats skewstats.1
41 | 
42 | install -D -m 755 sqlog  ${RPM_BUILD_ROOT}/%{_bindir}/sqlog
43 | install -D -m 644 sqlog.1 ${RPM_BUILD_ROOT}/%{_mandir}/man1/sqlog.1
44 | install -D -m 755 skewstats  ${RPM_BUILD_ROOT}/%{_bindir}/skewstats
45 | install -D -m 644 skewstats.1 ${RPM_BUILD_ROOT}/%{_mandir}/man1/skewstats.1
46 | install -D -m 755 sqlog-db-util ${RPM_BUILD_ROOT}/%{_sbindir}/sqlog-db-util
47 | install -D -m 644 sqlog-db-util.8 ${RPM_BUILD_ROOT}/%{_mandir}/man8/sqlog-db-util.8
48 | install -D -m 755 slurm-joblog.pl \
49 | 			${RPM_BUILD_ROOT}/%{_libexecdir}/sqlog/slurm-joblog
50 | 
51 | 
52 | %clean
53 | rm -rf "$RPM_BUILD_ROOT"
54 | 
55 | %files
56 | %defattr(-,root,root)
57 | %doc README NEWS ChangeLog sqlog.conf.example slurm-joblog.conf.example
58 | %{_bindir}/sqlog
59 | %{_bindir}/skewstats
60 | %{_sbindir}/sqlog-db-util
61 | %{_mandir}/*/*
62 | %{_libexecdir}/sqlog
63 | 
64 | 


--------------------------------------------------------------------------------