├── .gitignore ├── LICENSE ├── tests.py ├── README.md ├── CHANGELOG ├── stuff └── pg_bloat_check.py /.gitignore: -------------------------------------------------------------------------------- 1 | ignore/* 2 | *.swp 3 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | LICENSE AND COPYRIGHT 2 | --------------------- 3 | 4 | pg_bloat_check.py is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses. 5 | 6 | Copyright (c) 2019-2025 Crunchy Data Solutions, Inc 7 | Copyright (c) 2017 OmniTI 8 | 9 | Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies. 10 | 11 | IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE AUTHOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 12 | 13 | THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. 14 | -------------------------------------------------------------------------------- /tests.py: -------------------------------------------------------------------------------- 1 | from pg_bloat_check import convert_to_bytes 2 | 3 | ### This section tests the convert_to_bytes() function ### 4 | 5 | def test_convert_to_bytes(input_val,expected_val): 6 | return_val = convert_to_bytes(input_val) 7 | if return_val != expected_val: 8 | print("Test failed for {} -- Expected {} but got {}".format(input_val,expected_val,return_val)) 9 | 10 | # Test numeric input 11 | # print("Testing int") 12 | test_convert_to_bytes(1,1) 13 | test_convert_to_bytes(1000,1000) 14 | test_convert_to_bytes(1000000,1000000) 15 | test_convert_to_bytes(1000000000,1000000000) 16 | test_convert_to_bytes(1000000000000,1000000000000) 17 | 18 | # Test basic string input 19 | # print("Testing str") 20 | test_convert_to_bytes("1",1) 21 | test_convert_to_bytes("1000",1000) 22 | test_convert_to_bytes("1000000",1000000) 23 | test_convert_to_bytes("1000000000",1000000000) 24 | test_convert_to_bytes("1000000000000",1000000000000) 25 | 26 | # Test permutations of alphas 27 | test_convert_to_bytes("1kb",1024) 28 | test_convert_to_bytes("1Kb",1024) 29 | test_convert_to_bytes("1KB",1024) 30 | 31 | # Test really big numbers 32 | test_convert_to_bytes("1MB",1048576) 33 | test_convert_to_bytes("1GB",1073741824) 34 | test_convert_to_bytes("1TB",1099511627776) 35 | test_convert_to_bytes("1PB",1125899906842624) 36 | test_convert_to_bytes("1EB",1152921504606846976) 37 | test_convert_to_bytes("1ZB",1180591620717411303424) 38 | 39 | # Test invalid input 40 | test_convert_to_bytes("q","q") 41 | test_convert_to_bytes("-1","-1") 42 | test_convert_to_bytes("1 KB",1) 43 | 44 | ### End of convert_to_bytes() test ### 45 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 
pg\_bloat\_check
2 | 
3 | pg\_bloat\_check is a script to provide a bloat report for PostgreSQL tables and/or indexes. It requires at least Python 3 and the `pgstattuple` contrib module - https://www.postgresql.org/docs/current/static/pgstattuple.html
4 | 
5 | Note that using `pgstattuple` to check for bloat can be extremely expensive on very large databases or those with many tables. The script first requires running `--create_stats_table` to create a table for storing the bloat statistics. This makes it easier to review the bloat statistics, or to run a regular monitoring interval, without having to rescan the database each time. The expense of this method of bloat checking means this script is not meant to be run often to provide any sort of real-time bloat monitoring. At most, it's recommended to run this once a week or once a month during off-peak hours to search for objects that need major maintenance. Running this script too frequently can cause performance issues, since it can flush higher priority data out of shared buffers.
6 | 
7 | Output
8 | ------
9 | - a simple text listing, ordered by wasted space. Good for email reports.
10 | - a JSON blob that provides more detail and can be used by other tools that require a structured format
11 | - a python dictionary with the same details as JSON, but can be more easily used with other python scripts
12 | 
13 | Filters
14 | -------
15 | Filters are available for bloat percentage, wasted size and object size. Object size allows reporting on only those objects of a designated size or larger. The bloat percentage & wasted size reported are a combination of dead tuples and free space per object, while also accounting for the object's fillfactor setting. The simple output format automatically takes an object's fillfactor setting into account when calculating the wasted space & percentage values it gives. Use the JSON or python dictionary output to see the distinction between dead tuples and free space for a more accurate picture of the bloat situation. If the dead tuple count is high, autovacuum is likely not able to run frequently enough on the given table or index. If the dead tuple count is low but free space is high, a vacuum full or reindex is likely required to clear the bloat and return the disk space to the system. Note that free space may never drop to zero due to fillfactor settings, so both that setting and the estimated number of pages for the object are also included. By default, tables have very little reserved space (fillfactor=100), so it shouldn't affect their free space values much. Indexes by default have 10% reserved space (fillfactor=90), so that should be taken into account when looking at the raw free percent and free space values.
16 | 
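For reference, the wasted space and percentage shown in the simple output are derived from the raw `pgstattuple` numbers roughly as sketched below. This is only an illustrative sketch of the calculation the script performs internally; the function and variable names here are not part of the script, and the default block size of 8192 is just the common PostgreSQL default.

```
# Sketch: wasted space/percent from pgstattuple output, discounting fillfactor
def wasted(stats, relpages, fillfactor, block_size=8192):
    # bytes reserved by the fillfactor setting across the object's pages
    ff_reserved_bytes = (relpages - (fillfactor / 100.0 * relpages)) * block_size
    wasted_bytes = stats['dead_tuple_len'] + (stats['free_space'] - ff_reserved_bytes)
    wasted_percent = stats['dead_tuple_percent'] + (stats['free_percent'] - (100 - fillfactor))
    return wasted_bytes, wasted_percent
```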
17 | The `--exclude_object_file` (-e) option can make the monitoring of bloat much more fine-grained for certain objects. The -s, -z and -p options are filters that are applied against all objects scanned. However, sometimes very large objects have a very high amount of wasted space that is only a very low percentage of the object, while other, relatively small objects always show a very high percentage of bloat. Seeing them in the report every time isn't ideal, but you likely don't want to ignore these objects completely, because if their bloat got out of hand you would never know. To help create clearer bloat reports of things that need immediate attention, each line in the file for the -e option is a comma separated value entry of the following format:
18 | 
19 | `objectname,bytes_wasted,percent_wasted`
20 | 
21 | The bytes\_wasted and percent\_wasted are additional filters on top of -s, -z and -p that tell the exclude option to ignore the given object unless these additional filter values are exceeded as well.
22 | 
23 | Examples
24 | --------
25 | First, the setup.
26 | 
27 | ```
28 | admin@mydb=# CREATE EXTENSION pgstattuple;
29 | 
30 | pg_bloat_check.py -c dbname=mydb --create_stats_table
31 | 
32 | pg_bloat_check.py -c "host=192.168.1.201 dbname=mydb user=admin" --create_stats_table --bloat_schema=monitoring
33 | 
34 | pg_bloat_check.py -c "host=db.example.com dbname=prod_db user=prod_role options='-c statement_timeout=0'" --create_stats_table --bloat_schema=monitoring
35 | ```
36 | 
37 | The first example above installs the stats tables into the default schema in your search path. You only have to run this once per database; if you run it again, it simply drops the tables if they exist and recreates them. If you want them in a different schema, `--bloat_schema` lets you set that, but you must then use that option every time you run the script or add that schema to your search path. The second example shows that as well as connecting to a remote system. The third example shows how you can disable a server side statement_timeout using the psycopg2 connection options for the session.
38 | 
39 | ```
40 | pg_bloat_check.py -c dbname=mydb -z 10485760 -p 45 -s 5242880
41 | 
42 | 1. pg_catalog.pg_attribute....................................................(83.76%) 4697 kB wasted
43 | ```
44 | 
45 | The above finds any tables or indexes that are at least 5MB in size and either have at least 10MB of wasted space or are at least 45% bloated. Since pg_attribute is over 45% bloated, it is returned in the report.
46 | 
47 | Next, an example using the -e option with a file containing the following:
48 | 
49 | ```
50 | pg_catalog.pg_attribute,5000000,85
51 | ```
52 | ```
53 | pg_bloat_check.py -c host=localhost -z 10485760 -p 45 -s 5242880 -e filterfile
54 | 
55 | No bloat found for given parameters
56 | ```
57 | 
58 | First, the -s, -z and -p filters are applied. Then the exclude file is processed and the pg\_attribute table is excluded unless its bloat is either over 5 million bytes or exceeds 85%. Looking above, you can see neither of these is true, so the report comes back empty. This keeps the bloat report clear until the specific conditions set for this table are met. A more realistic example condition would be:
59 | 
60 | ```
61 | pg_bloat_check.py -c "dbname=prod" --min_wasted_size=5368709120 -p 45 -s 1073741824
62 | 
63 | 1. public.exclusions.....................................................(24.31%) 8866 MB wasted
64 | ```
65 | 
66 | Here, we're looking for objects that are at least 1GB in size and either have at least 5GB of wasted space or are 45% bloated. The table returned has over 8GB of bloat, but that is quite a low percentage of the whole table. This is likely a good candidate to be excluded from the regular report, but since it's such a large table, we wouldn't want to ignore it if it got out of hand.
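For reference, the per-object thresholds given in the -e file are applied roughly as sketched below. This is illustrative only; the function and field names are not part of the script's interface.

```
# Sketch: an object listed in the exclude file is reported only if it exceeds
# either of its own thresholds; zero (or missing) thresholds exclude it entirely
def is_reported(entry, wasted_bytes, wasted_percent):
    if entry['max_wasted'] == 0 and entry['max_perc'] == 0:
        return False
    return wasted_bytes > entry['max_wasted'] or wasted_percent > entry['max_perc']
```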
So, an exclude filter line that watches for it to be at least 55% bloated or have over 20GB of bloat could work here: 67 | 68 | ``` 69 | public.exclusions,21474836480,55 70 | ``` 71 | 72 | If an object needs to be excluded entirely, no matter its bloat, either just list the object name by itself or set the additional parameters to zero. 73 | 74 | ``` 75 | pg_catalog.pg_attribute 76 | 77 | OR 78 | 79 | pg_catalog.pg_attribute,0,0 80 | ``` 81 | 82 | See `--help` for more information. 83 | 84 | NOTE: The 1.x version of this script used the bloat query found in check\_postgres.pl. While that query runs much faster than using `pgstattuple`, it can be inaccurate, at times missing large amounts of table and index bloat. Using `pgstattuple` provides the best method known to obtain the most accurate bloat statistics. Version 2.x of this script is not a drop-in replacement for 1.x. Please review the options and update any existing jobs accordingly. 85 | -------------------------------------------------------------------------------- /CHANGELOG: -------------------------------------------------------------------------------- 1 | 2.8.0 2 | - Drop python 2 support 3 | - Change #! line to use python3 since several modern OS's no longer have a generic python binary/shortcut 4 | 5 | 6 | 2.7.2 7 | - Ignore spgist indexes. Thanks to @pdeaudney for the fix (Github PR #28) 8 | - Add example to show how to set connection options when running bloat check (Github PR #28) 9 | 10 | 11 | 2.7.1 12 | - Fixed debug flag to work with additional flags (Github Issue #27) 13 | 14 | 15 | 2.7.0 16 | - Add support for simplified byte values to the filter options (b, kb, mb, gb, tb, pb, eb, zb). Thanks to @richyen on Github for the contribution (Github Issue #13 ,PR #21). 17 | 18 | 19 | 2.6.5 20 | - Allow --quick mode to work now that TOAST tables are also being scanned as of version 2.6.1. pgstattuple_approx() does not work on TOAST tables. The --quick mode now only applies the pgstattuple_approx() function to the base table and performs a normal check on all remaining objects. This means that --quick mode now no longer requires table mode and always scans all objects. Thanks to @victor-sudakov on Github for reporting the issue and suggestions on improvement. (Github Issue #22) 21 | - Fix the output format for percentages to be at most 2 digit precision. 22 | - Fix "index out of range" error when running in --debug mode 23 | 24 | 25 | 2.6.4 26 | - Grammar fixes 27 | 28 | 29 | 2.6.3 30 | - Fixed bug that caused script to fail when it encounters materialized views. Introduced in v2.5.1 when labels were added to the simple output. Added new label (mv) for materialized views. Thanks to @DHorneDBA on Github for reporting the issue. (Github Issue #19) 31 | 32 | 33 | 2.6.2 34 | - Ensure that temporary indexes are skipped. Thanks to @EricVeldhuyzen on Github for reporting issue with a fix (Github Issue #2). 35 | 36 | 37 | 2.6.1 38 | - Ensure TOAST tables are included in results. By default, pgstattuple does not include the underlying TOAST table for a relation. TOAST tables are now scanned as separate objects because their bloat status can vary widely from the main table. However, fixing TOAST bloat must be done through the main table. Many thanks to @jpuris on Github for reporting this issue and helping to test (Github Issue #15). 39 | - "simple" output will also output the toast table's associated real table. 
40 | - Did not include associated real table in statistics table since a toast table could change any time a table is rewritten
41 | and make the stored stats data invalid. It's better to dynamically look up a toast table's real table at the time the
42 | data is queried (pg_class.reltoastrelid). This is what simple output does.
43 | - Always include pg_catalog schema when the --schema filter option is used.
44 | - Fix python 3 compatibility around using --quiet option.
45 | 
46 | 
47 | 2.6.0
48 | - Modified --quiet option to allow setting multiple times to suppress different levels of non-error output.
49 | - Setting once will cause console output to only be given if bloat is actually found (suppresses "no bloat found" message).
50 | - Setting two or more times will suppress all console output, even if bloat is found. Resulting data is only stored in stats tables. Setting the value more than twice has no additional effect.
51 | - Added shorthand option to --quiet as -u to make setting it multiple times easier. Can do "--quiet --quiet", "-u -u", or "-uu". -q was already taken by --quick and did not want to break backward compatibility.
52 | 
53 | 
54 | 2.5.1
55 | - Allow the --rebuild_index option to honor all other filters so you only get rebuilding commands that are relevant to your desired bloat thresholds. Output of commands is now also given in wasted space order (similar to simple output list) starting with the largest.
56 | - Added labels to the simple output to more clearly identify the object type (i=index, t=table, p=primary key).
57 | 
58 | 
59 | 2.5.0
60 | - Added --recovery_mode_norun option. This allows the same crontab entry to be entered on all systems in a failover cluster and ensures the bloat check is always only run on whichever system is the primary.
61 | 
62 | 
63 | 2.4.3
64 | - Make exit codes and database connection closes consistent throughout script.
65 | 
66 | 
67 | 2.4.2
68 | - Fix regression in 2.4.1 that stopped --rebuild_index option from adding CONCURRENTLY flag to unique index statements.
69 | 
70 | 
71 | 2.4.1
72 | - Account for custom tablespace locations for indexes when using the --rebuild_index option to generate statements. If you use custom tablespaces and used this option's generated SQL, please double-check that your indexes are in the correct tablespaces. CREATE INDEX does not automatically put the created index in the given table's tablespace unless it is the default or explicitly stated in the CREATE INDEX command. The generated statements now do this automatically for you if necessary.
73 | 
74 | 
75 | 2.4.0
76 | - Added oid column to the bloat_stats, bloat_tables and bloat_indexes tables. You will have to re-run --create_stats_table to recreate these tables in any database(s) this script is run against to include the new column. This makes it easier to use the values in the statistics tables in joins and functions and also simplifies script code.
77 | 
78 | 
79 | 2.3.5
80 | - Account for CLUSTER'ed indexes in the --rebuild_index output. If you'd previously used those generated commands, you may have lost the cluster attribute for that index. The generated commands now properly account for this and will set the CLUSTER state on the given index without causing a long lock. However, the index does not have its data clustered yet. The command to actually cluster the data is given later, but commented out so it will not run automatically. This is to prevent an accidental, potentially very long exclusive lock on the table.
You will have to manually run the CLUSTER command on the table to reorder your data. If you have a clustered index, it is highly recommended you look into altering the fillfactor storage attribute to reduce the chance that the clustering order is lost.
81 | - Simplified internal check for PostgreSQL version.
82 | 
83 | 
84 | 2.3.4
85 | - Fixed script output to apply the fillfactor settings when any filtering options are set. Fillfactor was being accounted for in the text output values, but not in the actual calculation of the output, which caused objects to be output in the bloat list when they should not have been.
86 | - Additional syntax fixes so debug output should work in python 3.
87 | 
88 | 
89 | 2.3.3
90 | - Fix all print syntax so script works in python 3.
91 | 
92 | 
93 | 2.3.2
94 | - Do not rely on OID from pg_class to perform scanning operations anymore. It could possibly change mid-run without the object name changing and cause script failure.
95 | 
96 | 
97 | 2.3.1
98 | - Fixed undefined variable error when using wasted space & wasted percentage options in the --exclude_object_file file
99 | 
100 | 
101 | 2.3.0
102 | - Added --rebuild_index argument. This generates SQL statement blocks that can be used to rebuild indexes with minimal impact on database locks.
103 | - It does not run the statements it generates, it only outputs them to STDOUT.
104 | - Handles primary keys & unique constraints. Unique constraints are rebuilt as unique indexes. They are technically the same under the hood as far as PostgreSQL is concerned, and having them as an index makes them easier to rebuild in the future since they require fewer locks.
105 | - CONCURRENTLY and ANALYZE statements require no locks on the tables involved. All ALTER TABLE statements require an exclusive lock on the table, so monitor locking carefully on those steps.
106 | - Allow script to work with non-release versions of PostgreSQL (ex 9.6beta2).
107 | 
108 | 
109 | 2.2.0
110 | - Account for fillfactor settings in both tables and indexes when determining wasted space. This will likely cause a roughly 10% drop in most index bloat reported (if fillfactor hasn't been adjusted manually).
111 | - The "simple" output mode will automatically account for the fillfactor when outputting the wasted space values. For dict & json output, the original raw values of all statistics are given along with that object's fillfactor and relpage count so it can be calculated manually.
112 | - Added new "fillfactor" & "relpages" columns to bloat statistics tables. Run --create_stats_table again to recreate the table with the new, required columns. Note this will wipe out any data contained in the stats table, so if you need to preserve it, do so before updating the script to this version.
113 | - Added new --noanalyze option. Accounting for fillfactor requires an accurate page count estimate for each object, so each object is now analyzed before being scanned for bloat. This will cause this script to take slightly longer, but provide more accurate bloat statistics. You can set this option to skip the analyze. However, unless a recent analyze was done, this may cause the bloat report to be less accurate.
114 | 
115 | - pgstattuple does not currently support BRIN indexes. Excluded for now (Pull Request #6).
116 | - Fixed mismatch in json/dict output. It was giving the dead_tuple_percent value in the free_percent key (Pull Request #6).
117 | 118 | 119 | 2.1.1 120 | - Improve backward compatibility for PostgreSQL versions older than 9.3 (pg_index.indislive did not exist before then) (Github Issue #5) 121 | 122 | 123 | 2.1.0 124 | - Fixed --bloat_schema & --pgstattuple_schema options so they actually work (Github Issue #4) 125 | - More fine grained filtering using --exclude_object_file. Each line in the file is now a CSV entry that can provide additional filtering per object in addition to -s, -p & -z. See README for examples (Github Issue #3). 126 | - Added json & jsonpretty --format modes. 127 | - Changed dict --format mode to just output an unparsed dictionary blob. Previously each object was separated with a return, but this made it more difficult to actually use as a dictionary object. If you need a human readable format of the raw details, use --format=jsonpretty 128 | - Check for object existence again before scanning. Avoids issues during long run times with objects that have been dropped since the initial object list was created. 129 | - If an index is a primary key, the objecttype column in the bloat statistics table will now be "index_pk" to distinguish them from just a normal index. 130 | - The --create_stats_table option will now attempt to drop the tables it creates first before creating them. Will allow any future changes to the stats table to be implemented easier. 131 | - Changed --norescan option to --noscan for simplicity. 132 | 133 | 134 | 2.0.2 135 | - Skip over temp tables and indexes that are marked as dropped (Github Issue #2). 136 | 137 | 138 | 2.0.1 139 | - Skip over GIN indexes since there's no current way to find bloat with them in pgstattuple. Looking for contributions to fix support this https://github.com/keithf4/pg_bloat_check/issues/1 140 | 141 | 142 | 2.0.0 143 | - Complete rewrite using pgstattuple contrib module instead of check_postgres.pl query 144 | - See --help for more information and blog post at https://www.keithf4.com/checking-for-postgresql-bloat/ 145 | -------------------------------------------------------------------------------- /stuff: -------------------------------------------------------------------------------- 1 | 1. pg_catalog.pg_class (t) ...................................................(97.27%) 3712 kB wasted 2 | 2. pg_catalog.pg_statistic (t) ...............................................(93.23%) 2819 kB wasted 3 | 3. pg_catalog.pg_type (t) ....................................................(93.94%) 1962 kB wasted 4 | 4. pg_catalog.pg_attribute (t) ...............................................(77.96%) 1771 kB wasted 5 | 5. pg_catalog.pg_attribute_relid_attnam_index (i) ............................(84.16%) 1751 kB wasted 6 | 6. pg_catalog.pg_depend_depender_index (i) ...................................(83.68%) 1520 kB wasted 7 | 7. pg_catalog.pg_attribute_relid_attnum_index (p) ............................(84.32%) 1255 kB wasted 8 | 8. pg_catalog.pg_depend (t) ..................................................(81.24%) 1176 kB wasted 9 | 9. pg_catalog.pg_depend_reference_index (i) ...................................(83.63%) 850 kB wasted 10 | 10. pg_catalog.pg_type_typname_nsp_index (i) ..................................(85.89%) 818 kB wasted 11 | 11. pg_catalog.pg_class_relname_nsp_index (i) .................................(83.70%) 455 kB wasted 12 | 12. pg_catalog.pg_statistic_relid_att_inh_index (p) ...........................(81.64%) 255 kB wasted 13 | 13. 
pg_catalog.pg_type_oid_index (p) ..........................................(79.95%) 230 kB wasted 14 | 14. pg_catalog.pg_class_tblspc_relfilenode_index (i) ..........................(81.97%) 230 kB wasted 15 | 15. pg_catalog.pg_class_oid_index (p) .........................................(80.78%) 220 kB wasted 16 | 16. pg_toast.pg_toast_2618 (t) ................................................(29.25%) 213 kB wasted 17 | Real table: pg_catalog.pg_rewrite 18 | 17. pg_catalog.pg_inherits (t) ................................................(99.46%) 207 kB wasted 19 | 18. pg_catalog.pg_proc_proname_args_nsp_index (i) .............................(26.09%) 123 kB wasted 20 | 19. pg_toast.pg_toast_1255 (t) ................................................(50.83%) 102 kB wasted 21 | Real table: pg_catalog.pg_proc 22 | 20. pg_catalog.pg_inherits_relid_seqno_index (p) ...............................(77.39%) 99 kB wasted 23 | 21. pg_toast.pg_toast_2619 (t) .................................................(79.22%) 89 kB wasted 24 | Real table: pg_catalog.pg_statistic 25 | 22. pg_catalog.pg_proc (t) ......................................................(6.12%) 78 kB wasted 26 | 23. pg_catalog.pg_constraint_conrelid_contypid_conname_index (i) ...............(61.24%) 54 kB wasted 27 | 24. pg_catalog.pg_trigger_tgrelid_tgname_index (i) .............................(67.07%) 48 kB wasted 28 | 25. pg_catalog.pg_description_o_c_o_index (p) ..................................(21.26%) 43 kB wasted 29 | 26. pg_catalog.pg_inherits_parent_index (i) ....................................(61.18%) 34 kB wasted 30 | 27. pg_catalog.pg_constraint_conname_nsp_index (i) .............................(51.64%) 33 kB wasted 31 | 28. pg_catalog.pg_index_indexrelid_index (p) ...................................(55.17%) 31 kB wasted 32 | 29. pg_catalog.pg_trigger_oid_index (p) ........................................(56.12%) 27 kB wasted 33 | 30. pg_catalog.pg_index_indrelid_index (i) .....................................(50.03%) 24 kB wasted 34 | 31. pg_catalog.pg_proc_oid_index (p) ...........................................(14.54%) 20 kB wasted 35 | 32. pg_catalog.pg_attrdef_adrelid_adnum_index (i) ..............................(47.44%) 19 kB wasted 36 | 33. pg_catalog.pg_attrdef_oid_index (p) ........................................(47.44%) 19 kB wasted 37 | 34. pg_catalog.pg_index (t) ....................................................(35.84%) 17 kB wasted 38 | 35. pg_catalog.pg_constraint_oid_index (p) .....................................(41.29%) 17 kB wasted 39 | 36. pg_catalog.pg_shdepend (t) ..................................................(5.17%) 14 kB wasted 40 | 37. pg_catalog.pg_trigger_tgconstraint_index (i) ...............................(39.29%) 13 kB wasted 41 | 38. pg_catalog.pg_constraint (t) ...............................................(30.83%) 12 kB wasted 42 | 39. pg_catalog.pg_shdepend_depender_index (i) ...................................(7.13%) 12 kB wasted 43 | 40. pg_catalog.pg_rewrite (t) ...................................................(9.57%) 11 kB wasted 44 | 41. pg_catalog.pg_publication (t) .........................................(99.30%) 8134 bytes wasted 45 | 42. public.v_start_time_string (t) ........................................(98.93%) 8104 bytes wasted 46 | 43. pg_catalog.pg_ts_parser (t) ...........................................(98.14%) 8040 bytes wasted 47 | 44. pg_catalog.pg_tablespace (t) ..........................................(97.22%) 7964 bytes wasted 48 | 45. 
pg_catalog.pg_partitioned_table (t) ...................................(97.17%) 7960 bytes wasted 49 | 46. pg_catalog.pg_shdescription (t) .......................................(96.88%) 7936 bytes wasted 50 | 47. pg_catalog.pg_publication_rel (t) .....................................(96.43%) 7900 bytes wasted 51 | 48. pg_catalog.pg_auth_members (t) ........................................(96.01%) 7865 bytes wasted 52 | 49. pg_catalog.pg_range (t) ...............................................(95.26%) 7804 bytes wasted 53 | 50. pg_catalog.pg_db_role_setting (t) .....................................(95.12%) 7792 bytes wasted 54 | 51. pg_catalog.pg_language (t) ............................................(93.60%) 7668 bytes wasted 55 | 52. pg_catalog.pg_ts_template (t) .........................................(93.07%) 7624 bytes wasted 56 | 53. pg_catalog.pg_sequence (t) ............................................(92.95%) 7614 bytes wasted 57 | 54. information_schema.sql_features (t) ...................................(11.44%) 7496 bytes wasted 58 | 55. pg_catalog.pg_am (t) ..................................................(90.43%) 7408 bytes wasted 59 | 56. information_schema.sql_parts (t) ......................................(90.38%) 7404 bytes wasted 60 | 57. pg_catalog.pg_trigger (t) .............................................(89.95%) 7368 bytes wasted 61 | 58. information_schema.sql_implementation_info (t) ........................(89.06%) 7296 bytes wasted 62 | 59. pg_toast.pg_toast_2618_index (p) ......................................(14.71%) 7277 bytes wasted 63 | 60. pg_catalog.pg_extension (t) ...........................................(86.85%) 7115 bytes wasted 64 | 61. pg_catalog.pg_database (t) ............................................(86.14%) 7056 bytes wasted 65 | 62. pg_catalog.pg_namespace (t) ...........................................(85.26%) 6984 bytes wasted 66 | 63. pg_catalog.pg_ts_parser_oid_index (p) .................................(39.61%) 6490 bytes wasted 67 | 64. pg_catalog.pg_ts_parser_prsname_index (i) .............................(39.56%) 6482 bytes wasted 68 | 65. pg_catalog.pg_tablespace_oid_index (p) ................................(39.49%) 6470 bytes wasted 69 | 66. pg_catalog.pg_tablespace_spcname_index (i) ............................(39.39%) 6454 bytes wasted 70 | 67. pg_catalog.pg_publication_pubname_index (i) ...........................(39.37%) 6450 bytes wasted 71 | 68. pg_catalog.pg_shdescription_o_c_index (p) .............................(39.37%) 6450 bytes wasted 72 | 69. pg_catalog.pg_db_role_setting_databaseid_rol_index (p) ................(39.24%) 6430 bytes wasted 73 | 70. pg_catalog.pg_language_oid_index (p) ..................................(39.24%) 6430 bytes wasted 74 | 71. pg_catalog.pg_language_name_index (i) .................................(39.19%) 6422 bytes wasted 75 | 72. pg_catalog.pg_publication_oid_index (p) ...............................(39.12%) 6410 bytes wasted 76 | 73. pg_catalog.pg_partitioned_table_partrelid_index (p) ...................(39.12%) 6410 bytes wasted 77 | 74. pg_catalog.pg_ts_template_oid_index (p) ...............................(39.12%) 6410 bytes wasted 78 | 75. pg_catalog.pg_attrdef (t) .............................................(39.09%) 6404 bytes wasted 79 | 76. pg_catalog.pg_range_rngmultitypid_index (i) ...........................(39.00%) 6390 bytes wasted 80 | 77. pg_catalog.pg_range_rngtypid_index (p) ................................(39.00%) 6390 bytes wasted 81 | 78. 
pg_catalog.pg_am_oid_index (p) ........................................(38.88%) 6370 bytes wasted 82 | 79. pg_catalog.pg_ts_template_tmplname_index (i) ..........................(38.88%) 6370 bytes wasted 83 | 80. pg_catalog.pg_am_name_index (i) .......................................(38.88%) 6370 bytes wasted 84 | 81. pg_catalog.pg_database_datname_index (i) ..............................(38.65%) 6334 bytes wasted 85 | 82. pg_catalog.pg_namespace_oid_index (p) .................................(38.53%) 6314 bytes wasted 86 | 83. pg_catalog.pg_database_oid_index (p) ..................................(38.51%) 6310 bytes wasted 87 | 84. pg_catalog.pg_extension_name_index (i) ................................(38.43%) 6298 bytes wasted 88 | 85. pg_catalog.pg_auth_members_member_role_index (i) ......................(38.39%) 6290 bytes wasted 89 | 86. pg_catalog.pg_auth_members_role_member_index (p) ......................(38.39%) 6290 bytes wasted 90 | 87. information_schema.sql_sizing (t) .....................................(75.63%) 6196 bytes wasted 91 | 88. pg_catalog.pg_namespace_nspname_index (i) .............................(37.80%) 6194 bytes wasted 92 | 89. pg_catalog.pg_extension_oid_index (p) .................................(37.41%) 6130 bytes wasted 93 | 90. pg_catalog.pg_authid_oid_index (p) ....................................(36.92%) 6050 bytes wasted 94 | 91. pg_catalog.pg_ts_dict_oid_index (p) ...................................(36.19%) 5930 bytes wasted 95 | 92. pg_catalog.pg_ts_config_oid_index (p) .................................(36.19%) 5930 bytes wasted 96 | 93. pg_catalog.pg_publication_rel_prrelid_prpubid_index (i) ...............(35.70%) 5850 bytes wasted 97 | 94. pg_catalog.pg_publication_rel_prpubid_index (i) .......................(35.70%) 5850 bytes wasted 98 | 95. pg_catalog.pg_publication_rel_oid_index (p) ...........................(35.70%) 5850 bytes wasted 99 | 96. pg_catalog.pg_authid_rolname_index (i) ................................(35.61%) 5834 bytes wasted 100 | 97. pg_catalog.pg_authid (t) ..............................................(69.77%) 5716 bytes wasted 101 | 98. pg_catalog.pg_ts_config_cfgname_index (i) .............................(34.78%) 5698 bytes wasted 102 | 99. pg_catalog.pg_collation_oid_index (p) .................................(34.73%) 5690 bytes wasted 103 | 100. pg_catalog.pg_sequence_seqrelid_index (p) ............................(34.48%) 5650 bytes wasted 104 | 101. pg_catalog.pg_ts_dict_dictname_index (i) .............................(33.95%) 5562 bytes wasted 105 | 102. pg_toast.pg_toast_2619_index (p) .....................................(32.65%) 5350 bytes wasted 106 | 103. pg_catalog.pg_collation_name_enc_nsp_index (i) .......................(31.80%) 5210 bytes wasted 107 | 104. pg_catalog.pg_operator_oprname_l_r_n_index (i) .......................(10.56%) 5189 bytes wasted 108 | 105. pg_catalog.pg_ts_config (t) ..........................................(61.43%) 5032 bytes wasted 109 | 106. pg_catalog.pg_constraint_conparentid_index (i) .......................(29.43%) 4822 bytes wasted 110 | 107. pg_catalog.pg_constraint_contypid_index (i) ..........................(29.23%) 4790 bytes wasted 111 | 108. pg_catalog.pg_amproc (t) .............................................(11.17%) 4576 bytes wasted 112 | 109. pg_catalog.pg_cast (t) ...............................................(26.98%) 4420 bytes wasted 113 | 110. public.bloat_tables (t) ..............................................(53.81%) 4408 bytes wasted 114 | 111. 
pg_catalog.pg_operator_oid_index (p) .................................(10.57%) 4328 bytes wasted 115 | 112. pg_toast.pg_toast_1255_index (p) .....................................(26.30%) 4310 bytes wasted 116 | 113. pg_catalog.pg_init_privs (t) .........................................(16.53%) 4064 bytes wasted 117 | 114. pg_catalog.pg_ts_dict (t) ............................................(48.83%) 4000 bytes wasted 118 | 115. pg_catalog.pg_conversion_oid_index (p) ...............................(24.11%) 3950 bytes wasted 119 | 116. pg_catalog.pg_opfamily_oid_index (p) .................................(21.91%) 3590 bytes wasted 120 | 117. pg_catalog.pg_aggregate_fnoid_index (p) ..............................(21.67%) 3550 bytes wasted 121 | 118. pg_catalog.pg_opclass_oid_index (p) ..................................(18.12%) 2970 bytes wasted 122 | 119. pg_catalog.pg_conversion_default_index (i) ...........................(17.86%) 2926 bytes wasted 123 | 120. pg_catalog.pg_description (t) .........................................(0.93%) 2903 bytes wasted 124 | 121. pg_catalog.pg_rewrite_oid_index (p) ..................................(17.39%) 2850 bytes wasted 125 | 122. pg_catalog.pg_opclass (t) ............................................(10.35%) 2544 bytes wasted 126 | 123. pg_catalog.pg_operator (t) ............................................(2.12%) 2436 bytes wasted 127 | 124. pg_catalog.pg_collation (t) ..........................................(27.25%) 2232 bytes wasted 128 | 125. pg_catalog.pg_cast_oid_index (p) .....................................(11.78%) 1930 bytes wasted 129 | 126. pg_catalog.pg_cast_source_target_index (i) ...........................(11.78%) 1930 bytes wasted 130 | 127. pg_catalog.pg_conversion_name_nsp_index (i) ..........................(10.04%) 1646 bytes wasted 131 | 128. pg_catalog.pg_amop_oid_index (p) ......................................(3.44%) 1408 bytes wasted 132 | 129. pg_catalog.pg_rewrite_rel_rulename_index (i) ..........................(8.36%) 1370 bytes wasted 133 | 130. public.bloat_indexes (t) .............................................(16.21%) 1328 bytes wasted 134 | 131. pg_catalog.pg_aggregate (t) ...........................................(7.13%) 1168 bytes wasted 135 | 132. pg_catalog.pg_amop_opr_fam_index (i) ..................................(2.32%) 1141 bytes wasted 136 | 133. pg_catalog.pg_amop_fam_strat_index (i) ................................(2.30%) 1133 bytes wasted 137 | 134. pg_catalog.pg_opfamily_am_name_nsp_index (i) ...........................(5.55%) 910 bytes wasted 138 | 135. pg_catalog.pg_amproc_fam_proc_index (i) ................................(1.91%) 784 bytes wasted 139 | 136. pg_catalog.pg_opfamily (t) .............................................(3.42%) 560 bytes wasted 140 | 137. pg_catalog.pg_conversion (t) ...........................................(2.78%) 456 bytes wasted 141 | 138. pg_catalog.pg_amop (t) .................................................(0.78%) 448 bytes wasted 142 | 139. 
pg_catalog.pg_ts_config_map (t) ........................................(1.01%) 248 bytes wasted
143 | 
--------------------------------------------------------------------------------
/pg_bloat_check.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | 
3 | # Script is maintained at https://github.com/keithf4/pg_bloat_check
4 | 
5 | import argparse, csv, json, psycopg2, re, sys
6 | from psycopg2 import extras
7 | from random import randint
8 | 
9 | version = "2.8.0"
10 | 
11 | parser = argparse.ArgumentParser(description="Provide a bloat report for PostgreSQL tables and/or indexes. This script uses the pgstattuple contrib module which must be installed first. Note that the query to check for bloat can be extremely expensive on very large databases or those with many tables. The script stores the bloat stats in a table so they can be queried again as needed without having to re-run the entire scan. The table contains a timestamp column to show when the data was obtained.")
12 | args_general = parser.add_argument_group(title="General options")
13 | args_general.add_argument('-c','--connection', default="host=", help="""Connection string for use by psycopg. Defaults to "host=" (local socket).""")
14 | args_general.add_argument('-e', '--exclude_object_file', help="""Full path to a file containing a list of objects to exclude from the report (tables and/or indexes). Each line is a CSV entry in the format: objectname,bytes_wasted,percent_wasted. All objects must be schema qualified. bytes_wasted & percent_wasted are additional filter values on top of -s, -p, and -z to exclude the given object unless these values are also exceeded. Set either of these values to zero (or leave them off entirely) to exclude the object no matter what its bloat level. Comments are allowed if the line is prepended with "#". See the README.md for clearer examples of how to use this for more fine grained filtering.""")
15 | args_general.add_argument('-f', '--format', default="simple", choices=["simple", "json", "jsonpretty", "dict"], help="Output formats. Simple is a plaintext version suitable for any output (ex: console, pipe to email). Object type is in parentheses (t=table, i=index, p=primary key). Json provides standardized json output which may be useful when feeding the output into something that needs a more structured format. Json also provides more details about dead tuples, empty space & free space. jsonpretty outputs in a more human readable format. Dict is the same as json but in the form of a python dictionary. Default is simple.")
16 | args_general.add_argument('-m', '--mode', choices=["tables", "indexes", "both"], default="both", help="""Provide bloat reports for tables, indexes or both. Index bloat is always distinct from table bloat and reported as separate entries in the report. Default is "both". NOTE: GIN indexes are not supported at this time and will be skipped.""")
17 | args_general.add_argument('-n', '--schema', help="Comma separated list of schemas to include in the report. The pg_catalog schema is always included. All other schemas will be ignored.")
18 | args_general.add_argument('-N', '--exclude_schema', help="Comma separated list of schemas to exclude.")
19 | args_general.add_argument('--noanalyze', action="store_true", help="To ensure accurate fillfactor statistics, an analyze of each object being scanned is done before the check for bloat. 
Set this to skip the analyze step and reduce overall runtime, however your bloat statistics may not be as accurate.") 20 | args_general.add_argument('--noscan', action="store_true", help="Set this option to have the script just read from the bloat statistics table without doing a scan of any tables again.") 21 | args_general.add_argument('-p', '--min_wasted_percentage', type=float, default=0.1, help="Minimum percentage of wasted space an object must have to be included in the report. Default and minimum value is 0.1 (DO NOT include percent sign in given value).") 22 | args_general.add_argument('-q', '--quick', action="store_true", help="Use the pgstattuple_approx() function instead of pgstattuple() for a quicker, but possibly less accurate bloat report on tables. Note that this does not work on indexes or TOAST tables and those objects will continue to be scanned with pgstattuple() and still be included in the results. Sets the 'approximate' column in the bloat statistics table to True. Note this only works in PostgreSQL 9.5+.") 23 | args_general.add_argument('-u', '--quiet', default=0, action="count", help="Suppress console output but still insert data into the bloat statistics table. This option can be set several times. Setting once will suppress all non-error console output if no bloat is found, but still output when it is found for given parameter settings. Setting it twice will suppress all console output, even if bloat is found.") 24 | args_general.add_argument('-r', '--commit_rate', type=int, default=5, help="Sets how many tables are scanned before committing inserts into the bloat statistics table. Helps avoid long running transactions when scanning large tables. Default is 5. Set to 0 to avoid committing until all tables are scanned. NOTE: The bloat table is truncated on every run unless --noscan is set.") 25 | args_general.add_argument('--rebuild_index', action="store_true", help="Output a series of SQL commands for each index that will rebuild it with minimal impact on database locks. This does NOT run the given sql, it only provides the commands to do so manually. This does not run a new scan and will use the indexes contained in the statistics table from the last run. If a unique index was previously defined as a constraint, it will be recreated as a unique index. All other filters used during a standard bloat check scan can be used with this option so you only get commands to run for objects relevant to your desired bloat thresholds.") 26 | args_general.add_argument('--recovery_mode_norun', action="store_true", help="Setting this option will cause the script to check if the database it is running against is a replica (in recovery mode) and cause it to skip running. Otherwise if it is not in recovery, it will run as normal. This is useful for when you want to ensure the bloat check always runs only on the primary after failover without having to edit crontabs or similar process managers.") 27 | args_general.add_argument('-s', '--min_size', default=1, help="Minimum size in bytes of object to scan (table or index). Default and minimum value is 1. Size units (mb, kb, tb, etc.) can be provided as well") 28 | args_general.add_argument('-t', '--tablename', help="Scan for bloat only on the given table. Must be schema qualified. 
This always gets both table and index bloat and overrides all other filter options so you always get the bloat statistics for the table no matter what they are.") 29 | args_general.add_argument('--version', action="store_true", help="Print version of this script.") 30 | args_general.add_argument('-z', '--min_wasted_size', default=1, help="Minimum size of wasted space in bytes. Default and minimum is 1. Size units (mb, kb, tb, etc.) can be provided as well") 31 | args_general.add_argument('--debug', action="store_true", help="Output additional debugging information. Overrides quiet option.") 32 | 33 | args_setup = parser.add_argument_group(title="Setup") 34 | args_setup.add_argument('--pgstattuple_schema', help="If pgstattuple is not installed in the default search path, use this option to designate the schema where it is installed.") 35 | args_setup.add_argument('--bloat_schema', help="Set the schema that the bloat report table is in if it's not in the default search path. Note this option can also be set when running --create_stats_table to set which schema you want the table created.") 36 | args_setup.add_argument('--create_stats_table', action="store_true", help="Create the required tables that the bloat report uses (bloat_stats + two child tables). Places table in default search path unless --bloat_schema is set.") 37 | args = parser.parse_args() 38 | 39 | 40 | def check_pgstattuple(conn): 41 | sql = "SELECT e.extversion, n.nspname FROM pg_catalog.pg_extension e JOIN pg_catalog.pg_namespace n ON e.extnamespace = n.oid WHERE extname = 'pgstattuple'" 42 | cur = conn.cursor() 43 | cur.execute(sql) 44 | pgstattuple_info = cur.fetchone() 45 | if pgstattuple_info == None: 46 | print("pgstattuple extension not found. Please ensure it is installed in the database this script is connecting to.") 47 | close_conn(conn) 48 | sys.exit(2) 49 | if args.pgstattuple_schema != None: 50 | if args.pgstattuple_schema != pgstattuple_info[1]: 51 | print("pgstattuple not found in the schema given by --pgstattuple_schema option: " + args.pgstattuple_schema + ". Found instead in: " + pgstattuple_info[1]+".") 52 | close_conn(conn) 53 | sys.exit(2) 54 | return pgstattuple_info[0] 55 | 56 | 57 | def check_recovery_status(conn): 58 | sql = "SELECT pg_is_in_recovery FROM pg_catalog.pg_is_in_recovery()" 59 | cur = conn.cursor() 60 | cur.execute(sql) 61 | is_in_recovery = cur.fetchone()[0] 62 | cur.close() 63 | return is_in_recovery 64 | 65 | 66 | def create_conn(): 67 | conn = psycopg2.connect(args.connection) 68 | return conn 69 | 70 | 71 | def close_conn(conn): 72 | conn.close() 73 | 74 | 75 | def create_list(list_type, list_items): 76 | split_list = [] 77 | if list_type == "csv": 78 | split_list = list_items.split(',') 79 | elif list_type == "file": 80 | with open(list_items, 'r') as csvfile: 81 | objectreader = csv.DictReader(csvfile, fieldnames=['objectname', 'max_wasted', 'max_perc']) 82 | for o in objectreader: 83 | if not o['objectname'].startswith('#'): 84 | o['objectname'] = o['objectname'].strip() 85 | 86 | if o['max_wasted'] != None: 87 | o['max_wasted'] = int(o['max_wasted']) 88 | else: 89 | o['max_wasted'] = 0 90 | 91 | if o['max_perc'] != None: 92 | o['max_perc'] = float(o['max_perc']) 93 | else: 94 | o['max_perc'] = 0 95 | 96 | split_list.append(o) 97 | 98 | return split_list 99 | 100 | 101 | def create_stats_table(conn): 102 | if args.bloat_schema != None: 103 | parent_sql = args.bloat_schema + "." + "bloat_stats" 104 | tables_sql = args.bloat_schema + "." 
+ "bloat_tables" 105 | indexes_sql = args.bloat_schema + "." + "bloat_indexes" 106 | else: 107 | parent_sql = "bloat_stats" 108 | tables_sql = "bloat_tables" 109 | indexes_sql = "bloat_indexes" 110 | 111 | drop_sql = "DROP TABLE IF EXISTS " + parent_sql + " CASCADE" 112 | 113 | sql = "CREATE TABLE " + parent_sql + """ ( 114 | oid oid NOT NULL 115 | , schemaname text NOT NULL 116 | , objectname text NOT NULL 117 | , objecttype text NOT NULL 118 | , size_bytes bigint 119 | , live_tuple_count bigint 120 | , live_tuple_percent float8 121 | , dead_tuple_count bigint 122 | , dead_tuple_size_bytes bigint 123 | , dead_tuple_percent float8 124 | , free_space_bytes bigint 125 | , free_percent float8 126 | , stats_timestamp timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP 127 | , approximate boolean NOT NULL DEFAULT false 128 | , relpages bigint NOT NULL DEFAULT 1 129 | , fillfactor float8 NOT NULL DEFAULT 100)""" 130 | cur = conn.cursor() 131 | if args.debug: 132 | print(cur.mogrify("drop_sql: " + drop_sql)) 133 | cur.execute(drop_sql) 134 | if args.debug: 135 | print(cur.mogrify("sql: " + sql)) 136 | cur.execute(sql) 137 | sql = "CREATE TABLE " + tables_sql + " (LIKE " + parent_sql + " INCLUDING ALL) INHERITS (" + parent_sql + ")" 138 | if args.debug: 139 | print(cur.mogrify("sql: " + sql)) 140 | cur.execute(sql) 141 | sql = "CREATE TABLE " + indexes_sql + " (LIKE " + parent_sql + " INCLUDING ALL) INHERITS (" + parent_sql + ")" 142 | if args.debug: 143 | print(cur.mogrify("sql: " + sql)) 144 | cur.execute(sql) 145 | sql = "COMMENT ON TABLE " + parent_sql + " IS 'Table providing raw data for table & index bloat'" 146 | if args.debug: 147 | print(cur.mogrify("sql: " + sql)) 148 | cur.execute(sql) 149 | sql = "COMMENT ON TABLE " + tables_sql + " IS 'Table providing raw data for table bloat'" 150 | if args.debug: 151 | print(cur.mogrify("sql: " + sql)) 152 | cur.execute(sql) 153 | sql = "COMMENT ON TABLE " + indexes_sql + " IS 'Table providing raw data for index bloat'" 154 | if args.debug: 155 | print(cur.mogrify("sql: " + sql)) 156 | cur.execute(sql) 157 | 158 | conn.commit() 159 | cur.close() 160 | 161 | 162 | def get_bloat(conn, exclude_schema_list, include_schema_list, exclude_object_list): 163 | sql = "" 164 | commit_counter = 0 165 | analyzed_tables = [] 166 | cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) 167 | 168 | sql = "SELECT current_setting('block_size')" 169 | cur.execute(sql) 170 | block_size = int(cur.fetchone()[0]) 171 | 172 | sql_tables = """ SELECT c.oid, c.relkind, c.relname, n.nspname, 'false' as indisprimary, c.reloptions 173 | FROM pg_catalog.pg_class c 174 | JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid 175 | WHERE relkind IN ('r', 'm') 176 | AND c.relpersistence <> 't' """ 177 | 178 | sql_indexes = """ SELECT c.oid, c.relkind, c.relname, n.nspname, i.indisprimary, c.reloptions 179 | FROM pg_catalog.pg_class c 180 | JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid 181 | JOIN pg_catalog.pg_index i ON c.oid = i.indexrelid 182 | JOIN pg_catalog.pg_am a ON c.relam = a.oid 183 | WHERE c.relkind = 'i' 184 | AND c.relpersistence <> 't' 185 | AND a.amname <> 'gin' 186 | AND a.amname <> 'brin' 187 | AND a.amname <> 'spgist' """ 188 | 189 | 190 | cur.execute("SELECT current_setting('server_version_num')::int >= 90300") 191 | if cur.fetchone()[0] == True: 192 | sql_indexes += " AND indislive = 'true' " 193 | 194 | if args.tablename != None: 195 | sql_tables += " AND n.nspname||'.'||c.relname = %s " 196 | sql_indexes += " AND i.indrelid::regclass = 
%s::regclass " 197 | 198 | sql_class = sql_tables + """ 199 | UNION 200 | """ + sql_indexes 201 | 202 | if args.debug: 203 | print("sql_class: " + str(cur.mogrify(sql_class, [args.tablename, args.tablename])) ) 204 | cur.execute(sql_class, [args.tablename, args.tablename] ) 205 | else: 206 | # IN clauses work with python tuples. lists were converted by get_bloat() call 207 | if include_schema_list: 208 | sql_tables += " AND n.nspname IN %s" 209 | sql_indexes += " AND n.nspname IN %s" 210 | filter_list = include_schema_list 211 | elif exclude_schema_list: 212 | sql_tables += " AND n.nspname NOT IN %s" 213 | sql_indexes += " AND n.nspname NOT IN %s" 214 | filter_list = exclude_schema_list 215 | else: 216 | filter_list = "" 217 | 218 | if args.mode == 'tables': 219 | sql_class = sql_tables 220 | elif args.mode == 'indexes': 221 | sql_class = sql_indexes 222 | elif args.mode == "both": 223 | sql_class = sql_tables + """ 224 | UNION 225 | """ + sql_indexes 226 | 227 | if args.mode == "both": 228 | if args.debug: 229 | print("sql_class: " + str(cur.mogrify(sql_class, (filter_list,filter_list) )) ) 230 | cur.execute(sql_class, (filter_list,filter_list)) 231 | elif args.mode == "tables" or args.mode == "indexes": 232 | if args.debug: 233 | print("sql_class: " + str(cur.mogrify(sql_class, (filter_list,) )) ) 234 | cur.execute(sql_class, (filter_list,) ) 235 | else: 236 | cur.execute(sql) 237 | 238 | object_list_no_toast = cur.fetchall() 239 | 240 | # Gather associated toast tables after generating above list so that only toast tables relevant 241 | # to either schema or table filtering are gathered 242 | object_list_with_toast = [] 243 | for o in object_list_no_toast: 244 | # Add existing objects to new list 245 | object_list_with_toast.append(o) 246 | 247 | if o['relkind'] == 'r' or o['relkind'] == 'm': 248 | # only run for tables or mat views to speed things up 249 | # Note tables without a toast table have the value 0 for reltoastrelid, not NULL 250 | sql_toast = """ WITH toast_data AS ( 251 | SELECT reltoastrelid FROM pg_class WHERE oid = %s AND reltoastrelid != 0 252 | ) 253 | SELECT c.oid, c.relkind, c.relname, n.nspname, 'false' as indisprimary, c.reloptions 254 | FROM pg_catalog.pg_class c 255 | JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid 256 | JOIN toast_data ON c.oid = toast_data.reltoastrelid 257 | AND c.relpersistence <> 't' """ 258 | cur.execute(sql_toast, [ o['oid'] ]) 259 | # Add new toast table to list if one exists for given table 260 | result = cur.fetchone() 261 | if result != None: 262 | object_list_with_toast.append(result) 263 | 264 | if args.debug: 265 | for o in object_list_with_toast: 266 | print("object_list_with_toast: " + str(o)) 267 | 268 | sql = "TRUNCATE " 269 | if args.bloat_schema: 270 | sql += args.bloat_schema + "." 271 | if args.mode == "tables" or args.mode == "both": 272 | sql_table = sql + "bloat_tables" 273 | cur.execute(sql_table) 274 | if args.mode == "indexes" or args.mode == "both": 275 | sql_index = sql + "bloat_indexes" 276 | cur.execute(sql_index) 277 | conn.commit() 278 | 279 | for o in object_list_with_toast: 280 | if args.debug: 281 | print("begining of object list loop: " + str(o)) 282 | if exclude_object_list and args.tablename == None: 283 | # completely skip object being scanned if it's in the excluded file list with max values equal to zero 284 | match_found = False 285 | for e in exclude_object_list: 286 | if (e['objectname'] == o['nspname'] + "." 
+ o['relname']) and (e['max_wasted'] == 0) and (e['max_perc'] == 0): 287 | match_found = True 288 | if match_found: 289 | continue 290 | 291 | if o['relkind'] == "i": 292 | fillfactor = 90.0 293 | else: 294 | fillfactor = 100.0 295 | 296 | if o['reloptions'] != None: 297 | reloptions_dict = dict(o.split('=') for o in o['reloptions']) 298 | if 'fillfactor' in reloptions_dict: 299 | fillfactor = float(reloptions_dict['fillfactor']) 300 | 301 | sql = """ SELECT count(*) FROM pg_catalog.pg_class WHERE oid = %s """ 302 | cur.execute(sql, [ o['oid'] ]) 303 | exists = cur.fetchone()[0] 304 | if args.debug: 305 | print("Checking for table existence before scanning: " + str(exists)) 306 | if exists == 0: 307 | continue # just skip over it. object was dropped since initial list was made 308 | 309 | if args.noanalyze != True: 310 | if o['relkind'] == "r" or o['relkind'] == "m" or o['relkind'] == "t": 311 | quoted_table = "\"" + o['nspname'] + "\".\"" + o['relname'] + "\"" 312 | else: 313 | # get table that index is a part of 314 | sql = """SELECT n.nspname, c.relname 315 | FROM pg_catalog.pg_class c 316 | JOIN pg_catalog.pg_index i ON c.oid = i.indrelid 317 | JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace 318 | WHERE indexrelid = %s""" 319 | cur.execute(sql, [ o['oid'] ] ) 320 | result = cur.fetchone() 321 | quoted_table = "\"" + result[0] + "\".\"" + result[1] + "\"" 322 | 323 | # maintain a list of analyzed tables so that if a table was already analyzed, it's not again (ex. multiple indexes on same table) 324 | if quoted_table in analyzed_tables: 325 | if args.debug: 326 | print("Table already analyzed. Skipping...") 327 | pass 328 | else: 329 | sql = "ANALYZE " + quoted_table 330 | if args.debug: 331 | print(cur.mogrify(sql, [quoted_table])) 332 | cur.execute(sql) 333 | analyzed_tables.append(quoted_table) 334 | # end noanalyze check 335 | 336 | sql = """ SELECT c.relpages FROM pg_catalog.pg_class c 337 | JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid 338 | WHERE n.nspname = %s 339 | AND c.relname = %s """ 340 | cur.execute(sql, [o['nspname'], o['relname']]) 341 | relpages = int(cur.fetchone()[0]) 342 | 343 | if args.quick and (o['relkind'] == "r" or o['relkind'] == "m"): 344 | # pgstattuple_approx() does not work against toast tables 345 | approximate = True 346 | sql = "SELECT table_len, approx_tuple_count AS tuple_count, approx_tuple_len AS tuple_len, approx_tuple_percent AS tuple_percent, dead_tuple_count, " 347 | sql += "dead_tuple_len, dead_tuple_percent, approx_free_space AS free_space, approx_free_percent AS free_percent FROM " 348 | else: 349 | approximate = False 350 | sql = "SELECT table_len, tuple_count, tuple_len, tuple_percent, dead_tuple_count, dead_tuple_len, dead_tuple_percent, free_space, free_percent FROM " 351 | if args.pgstattuple_schema != None: 352 | sql += " \"" + args.pgstattuple_schema + "\"." 
353 | if args.quick and (o['relkind'] == "r" or o['relkind'] == "m"): 354 | sql += "pgstattuple_approx(%s::regclass) " 355 | if args.tablename == None: 356 | sql += " WHERE table_len > %s" 357 | sql += " AND ( (dead_tuple_len + approx_free_space) > %s OR (dead_tuple_percent + approx_free_percent) > %s )" 358 | else: 359 | sql += "pgstattuple(%s::regclass) " 360 | if args.tablename == None: 361 | sql += " WHERE table_len > %s" 362 | sql += " AND ( (dead_tuple_len + free_space) > %s OR (dead_tuple_percent + free_percent) > %s )" 363 | 364 | if args.tablename == None: 365 | if args.debug: 366 | print("sql: " + str(cur.mogrify(sql, [ o['oid'] 367 | , convert_to_bytes(args.min_size) 368 | , convert_to_bytes(args.min_wasted_size) 369 | , args.min_wasted_percentage])) ) 370 | cur.execute(sql, [ o['oid'] 371 | , convert_to_bytes(args.min_size) 372 | , convert_to_bytes(args.min_wasted_size) 373 | , args.min_wasted_percentage ]) 374 | else: 375 | if args.debug: 376 | print("sql: " + str(cur.mogrify(sql, [ o['oid'] ])) ) 377 | cur.execute(sql, [ o['oid'] ]) 378 | 379 | stats = cur.fetchall() 380 | 381 | if args.debug: 382 | print(stats) 383 | 384 | if stats: # completely empty objects will be zero for all stats, so this would be an empty set 385 | 386 | # determine byte size of fillfactor pages 387 | ff_relpages_size = (relpages - ( fillfactor/100 * relpages ) ) * block_size 388 | 389 | if exclude_object_list and args.tablename == None: 390 | # If object in the exclude list has max values, compare them to see if it should be left out of report 391 | wasted_space = stats[0]['dead_tuple_len'] + (stats[0]['free_space'] - ff_relpages_size) 392 | wasted_perc = stats[0]['dead_tuple_percent'] + (stats[0]['free_percent'] - (100-fillfactor)) 393 | for e in exclude_object_list: 394 | if (e['objectname'] == o['nspname'] + "." + o['relname']): 395 | if ( (e['max_wasted'] < wasted_space ) or (e['max_perc'] < wasted_perc ) ): 396 | match_found = False 397 | else: 398 | match_found = True 399 | if match_found: 400 | continue 401 | 402 | sql = "INSERT INTO " 403 | if args.bloat_schema != None: 404 | sql += args.bloat_schema + "." 
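        # Descriptive note (added comment, not from the original source): the branch below
        # routes tables, toast tables, and materialized views into bloat_tables and indexes
        # into bloat_indexes. The "approximate" flag stored with each row records whether the
        # figures came from pgstattuple_approx() (--quick) or an exact pgstattuple() scan, so
        # report consumers can tell estimates from exact numbers.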
405 | 406 | if o['relkind'] == "r" or o['relkind'] == "m" or o['relkind'] == "t": 407 | sql+= "bloat_tables" 408 | if o['relkind'] == "r": 409 | objecttype = "table" 410 | elif o['relkind'] == "t": 411 | objecttype = "toast_table" 412 | else: 413 | objecttype = "materialized_view" 414 | elif o['relkind'] == "i": 415 | sql+= "bloat_indexes" 416 | if o['indisprimary'] == True: 417 | objecttype = "index_pk" 418 | else: 419 | objecttype = "index" 420 | 421 | sql += """ (oid 422 | , schemaname 423 | , objectname 424 | , objecttype 425 | , size_bytes 426 | , live_tuple_count 427 | , live_tuple_percent 428 | , dead_tuple_count 429 | , dead_tuple_size_bytes 430 | , dead_tuple_percent 431 | , free_space_bytes 432 | , free_percent 433 | , approximate 434 | , relpages 435 | , fillfactor) 436 | VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) """ 437 | if args.debug: 438 | print("insert sql: " + str(cur.mogrify(sql, [ o['oid'] 439 | , o['nspname'] 440 | , o['relname'] 441 | , objecttype 442 | , stats[0]['table_len'] 443 | , stats[0]['tuple_count'] 444 | , stats[0]['tuple_percent'] 445 | , stats[0]['dead_tuple_count'] 446 | , stats[0]['dead_tuple_len'] 447 | , stats[0]['dead_tuple_percent'] 448 | , stats[0]['free_space'] 449 | , stats[0]['free_percent'] 450 | , approximate 451 | , relpages 452 | , fillfactor 453 | ])) ) 454 | cur.execute(sql, [ o['oid'] 455 | , o['nspname'] 456 | , o['relname'] 457 | , objecttype 458 | , stats[0]['table_len'] 459 | , stats[0]['tuple_count'] 460 | , stats[0]['tuple_percent'] 461 | , stats[0]['dead_tuple_count'] 462 | , stats[0]['dead_tuple_len'] 463 | , stats[0]['dead_tuple_percent'] 464 | , stats[0]['free_space'] 465 | , stats[0]['free_percent'] 466 | , approximate 467 | , relpages 468 | , fillfactor 469 | ]) 470 | 471 | commit_counter += 1 472 | if args.commit_rate > 0 and (commit_counter % args.commit_rate == 0): 473 | if args.debug: 474 | print("Batch committed. 
Object scanned count: " + str(commit_counter))
475 |             conn.commit()
476 |     conn.commit()
477 |     cur.close()
478 | ## end get_bloat()
479 | 
480 | 
481 | def print_report(result_list):
482 |     if args.format == "simple":
483 |         for r in result_list:
484 |             print(r)
485 |     else:
486 |         print(result_list)
487 | 
488 | 
489 | def print_version():
490 |     print("Version: " + version)
491 | 
492 | 
493 | def rebuild_index(conn, index_list):
494 | 
495 |     if index_list == []:
496 |         print("Bloat statistics table contains no indexes for conditions given.")
497 |         close_conn(conn)
498 |         sys.exit(0)
499 | 
500 |     for i in index_list:
501 |         temp_index_name = "pgbloatcheck_rebuild_" + str(randint(1000,9999))
502 |         quoted_index = "\"" + i['schemaname'] + "\".\"" + i['objectname'] + "\""
503 |         # get table index is in
504 |         sql = """SELECT n.nspname, c.relname, t.spcname
505 |                 FROM pg_catalog.pg_class c
506 |                 JOIN pg_catalog.pg_index i ON c.oid = i.indrelid
507 |                 JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
508 |                 LEFT OUTER JOIN pg_catalog.pg_tablespace t ON c.reltablespace = t.oid
509 |                 WHERE indexrelid = %s"""
510 |         cur.execute(sql, [ i['oid'] ] )
511 |         result = cur.fetchone()
512 |         quoted_table = "\"" + result[0] + "\".\"" + result[1] + "\""
513 |         if result[2] != None:
514 |             quoted_tablespace = "\"" + result[2] + "\""
515 |         else:
516 |             quoted_tablespace = None
517 |         # create temp index definition
518 |         sql = "SELECT pg_get_indexdef(%s::regclass)"
519 |         cur.execute(sql, [ "\"" + i['schemaname'] +"\".\""+ i['objectname'] + "\"" ])
520 |         index_def = cur.fetchone()[0]
521 |         index_def = re.sub(r' INDEX', ' INDEX CONCURRENTLY', index_def, 1)
522 |         index_def = index_def.replace(i['objectname'], temp_index_name, 1)
523 |         if quoted_tablespace != None:
524 |             index_def += " TABLESPACE " + quoted_tablespace
525 |         index_def += ";"
526 |         # check if index is clustered
527 |         sql = "SELECT indisclustered FROM pg_catalog.pg_index WHERE indexrelid = %s"
528 |         cur.execute(sql, [ i['oid'] ])
529 |         indisclustered = cur.fetchone()[0]
530 |         # start output
531 |         print("")
532 |         print(index_def)
533 |         if indisclustered == True:
534 |             print("ALTER TABLE " + quoted_table + " CLUSTER ON " + temp_index_name + ";")
535 |         # analyze table
536 |         print("ANALYZE " + quoted_table + ";")
537 |         if i['objecttype'] == "index":
538 |             # drop old index or unique constraint
539 |             sql = "SELECT count(*) FROM pg_catalog.pg_constraint WHERE conindid = %s"
540 |             cur.execute(sql, [ i['oid'] ])
541 |             isconstraint = int(cur.fetchone()[0])
542 |             if isconstraint == 1:
543 |                 print("ALTER TABLE " + quoted_table + " DROP CONSTRAINT " + "\"" + i['objectname'] + "\";")
544 |             else:
545 |                 print("DROP INDEX CONCURRENTLY " + quoted_index + ";")
546 |             # analyze again
547 |             print("ANALYZE " + quoted_table + ";")
548 |             # rename temp index to original name
549 |             print("ALTER INDEX \"" + i['schemaname'] + "\"."
+ temp_index_name + " RENAME TO \"" + i['objectname'] + "\";")
550 |         elif i['objecttype'] == "index_pk":
551 |             print("ALTER TABLE " + quoted_table + " DROP CONSTRAINT " + "\"" + i['objectname'] + "\";")
552 |             # analyze again
553 |             print("ANALYZE " + quoted_table + ";")
554 |             print("ALTER TABLE " + quoted_table + " ADD CONSTRAINT \"" + i['objectname'] + "\" PRIMARY KEY USING INDEX " + temp_index_name + ";")
555 |             # analyze again
556 |             print("ANALYZE " + quoted_table + ";")
557 |         if indisclustered == True:
558 |             print("")
559 |             print("-- WARNING: The following statement will exclusively lock the table for the duration of its runtime.")
560 |             print("-- Uncomment it or manually run it to recluster the table on the newly created index.")
561 |             print("-- CLUSTER " + quoted_table + ";")
562 | 
563 |     print("")
564 | # end rebuild_index
565 | 
566 | def convert_to_bytes(val):
567 | 
568 |     # immediately return if val is a number
569 |     # (i.e., no units were provided)
570 |     if isinstance(val, int):
571 |         return val
572 |     if isinstance(val, str) and val.isdigit():
573 |         return int(val)
574 | 
575 |     # split numbers from unit descriptor
576 |     # we assume format is "####kb" or "####MB" etc.
577 |     # we assume there are no spaces between number
578 |     # and units
579 |     match = re.search(r'^([0-9]+)([a-zA-Z]+)?',val)
580 | 
581 |     if match:
582 |         num = int(match.group(1))
583 |         unit = match.group(2)
584 | 
585 |         if args.debug:
586 |             print("arg was broken into " + str(num) + " and " + str(unit))
587 | 
588 |         # unit is None when no unit descriptor immediately
589 |         # follows the number (for example, a space separates
590 |         # them). The all-digits case was already handled at
591 |         # the top of this function, so treat the numeric
592 |         # portion here as a plain byte count
593 |         if unit is None:
594 |             return num
595 | 
596 |         # just get the first letter of unit descriptor
597 |         # lowercase the unit for multiplier lookup
598 |         unit_prefix = unit[0].lower()
599 | 
600 |         # map
601 |         multiplier={
602 |             'b':0,
603 |             'k':1,
604 |             'm':2,
605 |             'g':3,
606 |             't':4,
607 |             'p':5,
608 |             'e':6,
609 |             'z':7
610 |         }
611 | 
612 |         # if the size descriptor cannot be interpreted, then ignore it
613 |         exponent = multiplier.get(unit_prefix,0)
614 | 
615 |         if args.debug:
616 |             print("multiplier calculated: 1024 ** " + str(exponent))
617 | 
618 |         return_bytes = num * (1024 ** exponent)
619 | 
620 |         if args.debug:
621 |             print("calculated bytes: " + str(return_bytes))
622 | 
623 |         return return_bytes
624 | 
625 |     else:
626 |         # return val if we don't know what to do with it
627 |         return val
628 | # end convert_to_bytes
629 | 
630 | 
631 | if __name__ == "__main__":
632 |     if args.version:
633 |         print_version()
634 |         sys.exit(0)
635 | 
636 |     if args.schema != None and args.exclude_schema != None:
637 |         print("--schema and --exclude_schema are exclusive options and cannot be set together")
638 |         sys.exit(2)
639 | 
640 |     if args.debug:
641 |         print("quiet level: " + str(args.quiet))
642 | 
643 |     conn = create_conn()
644 | 
645 |     if args.recovery_mode_norun == True:
646 |         is_in_recovery = check_recovery_status(conn)
647 |         if is_in_recovery == True:
648 |             if args.debug:
649 |                 print("Recovery mode check found instance in recovery. Skipping run.")
650 |             close_conn(conn)
651 |             sys.exit(0)
652 |         else:
653 |             if args.debug:
654 |                 print("Recovery mode check found primary instance. 
Running as normal.") 655 | 656 | pgstattuple_version = float(check_pgstattuple(conn)) 657 | if args.quick: 658 | if pgstattuple_version < 1.3: 659 | print("--quick option requires pgstattuple version 1.3 or greater (PostgreSQL 9.5)") 660 | close_conn(conn) 661 | sys.exit(2) 662 | 663 | cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor) 664 | 665 | if args.create_stats_table: 666 | create_stats_table(conn) 667 | close_conn(conn) 668 | sys.exit(0) 669 | 670 | sql = "SELECT tablename FROM pg_catalog.pg_tables WHERE tablename = %s" 671 | if args.bloat_schema != None: 672 | sql += " AND schemaname = %s" 673 | cur.execute(sql, ['bloat_stats', args.bloat_schema]) 674 | else: 675 | cur.execute(sql, ['bloat_stats']) 676 | table_exists = cur.fetchone() 677 | if table_exists == None: 678 | print("Required statistics table does not exist. Please run --create_stats_table first before running a bloat scan.") 679 | close_conn(conn) 680 | sys.exit(2) 681 | 682 | if args.exclude_schema != None: 683 | exclude_schema_list = create_list('csv', args.exclude_schema) 684 | else: 685 | exclude_schema_list = [] 686 | 687 | if args.schema != None: 688 | include_schema_list = create_list('csv', args.schema) 689 | # Always include pg_catalog even when user specifies their own list 690 | include_schema_list.append('pg_catalog') 691 | else: 692 | include_schema_list = [] 693 | 694 | if args.exclude_object_file != None: 695 | exclude_object_list = create_list('file', args.exclude_object_file) 696 | else: 697 | exclude_object_list = [] 698 | 699 | if args.noscan == False and args.rebuild_index == False: 700 | get_bloat(conn, tuple(exclude_schema_list), tuple(include_schema_list), exclude_object_list) 701 | 702 | # Final commit to ensure transaction that inserted stats data closes 703 | conn.commit() 704 | 705 | counter = 1 706 | result_list = [] 707 | if args.quiet <= 1 or args.debug == True: 708 | simple_cols = """oid 709 | , schemaname 710 | , objectname 711 | , objecttype 712 | , CASE 713 | WHEN (dead_tuple_percent + (free_percent - (100-fillfactor))) < 0 THEN 0 714 | ELSE (dead_tuple_percent + (free_percent - (100-fillfactor))) 715 | END AS total_waste_percent 716 | , CASE 717 | WHEN (dead_tuple_size_bytes + (free_space_bytes - (relpages - (fillfactor/100) * relpages ) * current_setting('block_size')::int ) ) < 0 THEN '0 bytes' 718 | ELSE pg_size_pretty((dead_tuple_size_bytes + (free_space_bytes - ((relpages - (fillfactor/100) * relpages ) * current_setting('block_size')::int ) ) )::bigint) 719 | END AS total_wasted_size""" 720 | dict_cols = "oid, schemaname, objectname, objecttype, size_bytes, live_tuple_count, live_tuple_percent, dead_tuple_count, dead_tuple_size_bytes, dead_tuple_percent, free_space_bytes, free_percent, approximate, relpages, fillfactor" 721 | if args.format == "dict" or args.format=="json" or args.format=="jsonpretty" or args.rebuild_index: 722 | # Since "simple" is the default, this check needs to be first so that if args.rebuild_index is set, the proper columns are chosen 723 | sql = "SELECT " + dict_cols + " FROM " 724 | elif args.format == "simple": 725 | sql = "SELECT " + simple_cols + " FROM " 726 | else: 727 | print("Unsupported --format given. Use 'simple', 'dict' 'json', or 'jsonpretty'.") 728 | close_conn(conn) 729 | sys.exit(2) 730 | if args.bloat_schema != None: 731 | sql += args.bloat_schema + "." 
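    # Worked example (hypothetical numbers, added for illustration): the wasted-space
    # expressions below subtract the space reserved by fillfactor before counting anything
    # as bloat. With fillfactor=90, relpages=1000 and an 8192-byte block size, the reserved
    # space is (1000 - (90/100)*1000) * 8192 = 819200 bytes (800 KiB), so only free space
    # beyond that amount, plus dead tuple space, is reported as wasted.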
732 | if args.mode == "tables": 733 | sql += "bloat_tables" 734 | elif args.mode == "indexes" or args.rebuild_index: 735 | sql += "bloat_indexes" 736 | else: 737 | sql += "bloat_stats" 738 | sql += " WHERE (dead_tuple_size_bytes + (free_space_bytes - (relpages - (fillfactor/100) * relpages ) * current_setting('block_size')::int ) ) > %s " 739 | sql += " AND (dead_tuple_percent + (free_percent - (100-fillfactor))) > %s " 740 | sql += " ORDER BY (dead_tuple_size_bytes + (free_space_bytes - ((relpages - (fillfactor/100) * relpages ) * current_setting('block_size')::int ) )) DESC" 741 | cur.execute(sql, [convert_to_bytes(args.min_wasted_size), args.min_wasted_percentage]) 742 | result = cur.fetchall() 743 | 744 | # Output rebuild commands instead of status report 745 | if args.rebuild_index: 746 | rebuild_index(conn, result) 747 | close_conn(conn) 748 | sys.exit(0) 749 | 750 | for r in result: 751 | if args.format == "simple": 752 | if r['objecttype'] == 'table' or r['objecttype'] == 'toast_table': 753 | type_label = 't' 754 | elif r['objecttype'] == 'index': 755 | type_label = 'i' 756 | elif r['objecttype'] == 'index_pk': 757 | type_label = 'p' 758 | elif r['objecttype'] == 'materialized_view': 759 | type_label = 'mv' 760 | else: 761 | print("Unexpected object type encountered in stats table. Please report this bug to author with value found: " + str(r['objecttype'])) 762 | sys.exit(2) 763 | 764 | justify_space = 100 - len(str(counter) + ". " + r['schemaname'] + "." + r['objectname'] + " (" + type_label + ") " + "(" + "{:.2f}".format(r['total_waste_percent']) + "%)" + r['total_wasted_size'] + " wasted") 765 | 766 | output_line = str(counter) + ". " + r['schemaname'] + "." + r['objectname'] + " (" + type_label + ") " + "."*justify_space + "(" + "{:.2f}".format(r['total_waste_percent']) + "%) " + r['total_wasted_size'] + " wasted" 767 | 768 | if r['objecttype'] == 'toast_table': 769 | toast_real_sql = """ SELECT n.nspname||'.'||c.relname 770 | FROM pg_catalog.pg_class c 771 | JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace 772 | WHERE reltoastrelid = %s """ 773 | if args.debug: 774 | print( "toast_real_sql: " + str(cur.mogrify(toast_real_sql, [r['oid']]) ) ) 775 | cur.execute(toast_real_sql, [r['oid']]) 776 | real_table = cur.fetchone()[0] 777 | output_line = output_line + "\n Real table: " + str(real_table) 778 | 779 | result_list.append(output_line) 780 | counter += 1 781 | 782 | elif args.format == "dict" or args.format == "json" or args.format == "jsonpretty": 783 | result_dict = dict([ ('oid', r['oid']) 784 | , ('schemaname', r['schemaname']) 785 | , ('objectname', r['objectname']) 786 | , ('objecttype', r['objecttype']) 787 | , ('size_bytes', int(r['size_bytes'])) 788 | , ('live_tuple_count', int(r['live_tuple_count'])) 789 | , ('live_tuple_percent', "{:.2f}".format(r['live_tuple_percent'])+"%" ) 790 | , ('dead_tuple_count', int(r['dead_tuple_count'])) 791 | , ('dead_tuple_size_bytes', int(r['dead_tuple_size_bytes'])) 792 | , ('dead_tuple_percent', "{:.2f}".format(r['dead_tuple_percent'])+"%" ) 793 | , ('free_space_bytes', int(r['free_space_bytes'])) 794 | , ('free_percent', "{:.2f}".format(r['free_percent'])+"%" ) 795 | , ('approximate', r['approximate']) 796 | ]) 797 | result_list.append(result_dict) 798 | 799 | if args.format == "json": 800 | result_list = json.dumps(result_list) 801 | elif args.format == "jsonpretty": 802 | result_list = json.dumps(result_list, indent=4, separators=(',',': ')) 803 | 804 | if len(result_list) >= 1: 805 | print_report(result_list) 806 | else: 
807 | if args.quiet == 0: 808 | print("No bloat found for given parameters") 809 | 810 | close_conn(conn) 811 | --------------------------------------------------------------------------------