├── docs ├── params │ ├── shared_buffers.md │ ├── log_truncate_on_rotation.md │ ├── bgwriter_lru_maxpages.md │ ├── log_lock_waits.md │ ├── lock_timeout.md │ ├── wal_recycle.md │ ├── min_wal_size.md │ ├── idle_in_transaction_session_timeout.md │ ├── max_wal_size.md │ ├── wal_sender_timeout.md │ ├── work_mem.md │ ├── parallel_leader_participation.md │ ├── hot_standby_feedback.md │ ├── wal_init_zero.md │ ├── checkpoint_timeout.md │ ├── random_page_cost.md │ ├── jit.md │ ├── max_standby_archive_delay.md │ ├── log_temp_files.md │ ├── track_io_timing.md │ ├── max_standby_streaming_delay.md │ ├── huge_pages.md │ ├── default_toast_compression.md │ ├── autovacuum.md │ ├── autovacuum_max_workers.md │ ├── max_connections.md │ └── wal_compression.md ├── images │ ├── latency.png │ └── throughput.png ├── meta.json ├── events │ ├── CLIENTWRITE.html │ ├── CPU.html │ ├── WALSENDERMAIN.html │ ├── WALRECEIVERMAIN.html │ ├── BUFFERIO.html │ ├── BUFFERPIN.html │ ├── BUFFERMAPPING.html │ ├── BUFFILEREAD.html │ ├── BUFFERCONTENT.html │ ├── DATAFILESYNC.html │ ├── BUFFILEWRITE.html │ ├── AUTOVACUUMMAIN.html │ ├── DATAFILEPREFETCH.html │ ├── EXTENSION.html │ ├── OIDGEN.html │ ├── OIDGENLOCK.html │ ├── SUBTRANSBUFFER.html │ ├── DATAFILEREAD.html │ ├── SYNCREP.html │ ├── TRANSACTIONID.html │ ├── CLIENTREAD.html │ ├── LOGICALLAUNCHERMAIN.html │ ├── TUPLE.html │ ├── LOCKMANAGER.html │ ├── WALWRITE.html │ ├── SUBTRANSSLRU.html │ └── BGWORKERSHUTDOWN.html ├── _config.yml ├── demo.md ├── pgGather.svg ├── replication.md ├── versionpolicy.md ├── InvalidIndexes.md ├── CONTRIBUTING ├── dygraphs │ ├── graph.html │ └── dygraph.css ├── unloggedtables.md ├── missingstats.md ├── NetDelay.md ├── table_object.md ├── xidhorizon.md ├── crosstab.sql ├── schema.md ├── Requirements.md ├── walarchive.md ├── unusedIndexes.md ├── tablespace.md ├── pkuk.md ├── mxid.md ├── catalogbloat.md ├── max_connections.md ├── ha.md ├── barman.md ├── cp.md ├── pgbinary.md ├── bloat.md ├── oldversions.md ├── tableinfo.md ├── History_Objectives_FAQ.md ├── extensions.md ├── continuous_collection.md ├── security.md └── waitevents.json ├── dev ├── build.md ├── README.md ├── Calculations.md └── apply_template.awk ├── LICENSE.md ├── imphistory_parallel.sh ├── imphistory.sh ├── generate_report.sh ├── history_schema.sql ├── gather_schema.sql └── README.md /docs/params/shared_buffers.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /docs/images/latency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jobinau/pg_gather/HEAD/docs/images/latency.png -------------------------------------------------------------------------------- /docs/images/throughput.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jobinau/pg_gather/HEAD/docs/images/throughput.png -------------------------------------------------------------------------------- /docs/meta.json: -------------------------------------------------------------------------------- 1 | {"ver":32,"pgvers":["14.20","15.15","16.11","17.7","18.1"],"commonExtn":["plpgsql","pg_stat_statements","pg_repack"],"riskyExtn":["citus","tds_fdw","pglogical"]} 2 | -------------------------------------------------------------------------------- /docs/params/log_truncate_on_rotation.md: -------------------------------------------------------------------------------- 1 | # 
log_truncate_on_rotation 2 | Truncate the log file upon rotation, rather than appending to it. 3 | This is a security recommendation by CIS audits. -------------------------------------------------------------------------------- /docs/events/CLIENTWRITE.html: -------------------------------------------------------------------------------- 1 |

Client:ClientWrite

2 | Waiting to write data to client/application.
3 | Generally caused by an application retrieving a large amount of data at once. -------------------------------------------------------------------------------- /docs/events/CPU.html: -------------------------------------------------------------------------------- 1 |

CPU Usage

2 |

This is the CPU / Computation time

3 |

This may contain the time spent on something which cannot be accommodated in any other wait event.
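As a rough check, repeatedly sampling `pg_stat_activity` can show how much of the active workload is on CPU; pg_gather itself counts a NULL `wait_event` as CPU time. A minimal sketch, assuming PostgreSQL 10 or newer:
```
SELECT COALESCE(wait_event,'CPU') AS event, count(*)
FROM pg_stat_activity
WHERE state = 'active'
GROUP BY 1 ORDER BY 2 DESC;
```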

-------------------------------------------------------------------------------- /docs/events/WALSENDERMAIN.html: -------------------------------------------------------------------------------- 1 |

WalSenderMain

2 | The WAL sender process is waiting in its main loop.
3 | This can be considered an idle state. It is waiting for new WAL data to send to the receiver.
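To confirm that the senders are idling rather than stuck, a quick look at `pg_stat_replication` on the primary can help; a minimal sketch (these columns exist from PostgreSQL 10 onwards):
```
SELECT pid, application_name, state, sent_lsn
FROM pg_stat_replication;
```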
-------------------------------------------------------------------------------- /docs/events/WALRECEIVERMAIN.html: -------------------------------------------------------------------------------- 1 |

WalReceiverMain

2 | WAL receiver process is waiting for WAL data from the primary server.
3 | This can be considered the idle state of the WAL receiver process.
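On the standby side, `pg_stat_wal_receiver` shows whether the receiver is streaming and how current it is; a minimal sketch:
```
SELECT status, latest_end_lsn, latest_end_time
FROM pg_stat_wal_receiver;
```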
-------------------------------------------------------------------------------- /docs/params/bgwriter_lru_maxpages.md: -------------------------------------------------------------------------------- 1 | # bgwriter_lru_maxpages 2 | 3 | The bgwriter needs to be sufficiently active and aggressive. 4 | Otherwise, the major load of eviction (flushing the dirty pages) will fall on the checkpointer and connection backends. -------------------------------------------------------------------------------- /docs/_config.yml: -------------------------------------------------------------------------------- 1 | title: pg_gather 2 | #remote_theme: zendesk/jekyll-theme-zendesk-garden@main 3 | remote_theme: jobinau/pg_gather_zendesk-garden@main 4 | plugins: 5 | - jekyll-remote-theme # add this line to the plugins list if you already have one 6 | -------------------------------------------------------------------------------- /docs/events/BUFFERIO.html: -------------------------------------------------------------------------------- 1 |

IO:BufferIO

2 | Backends will be trying to clear the buffers. A high value indicates that shared_buffers is not sufficient.
3 | Generally, it is expected to have associated DataFileRead events as well. -------------------------------------------------------------------------------- /dev/build.md: -------------------------------------------------------------------------------- 1 | # Files for developers 2 | Files in this directory are for the developers of pg_gather, using an HTML template and an AWK script. 3 | 4 | ``` 5 | cat gather_report.tpl.html | awk -f apply_template.awk > report.sql; psql -X -f report.sql > out.html 6 | 7 | ``` 8 | -------------------------------------------------------------------------------- /docs/events/BUFFERPIN.html: -------------------------------------------------------------------------------- 1 |

BufferPin:BufferPin

2 | BufferPin indicates that the process is waiting to acquire an exclusive pin on a buffer.
3 | An open cursor or frequent HOT updates could be holding BufferPins on Buffer pages.
4 | Buffer pinning can prevent the VACUUM FREEZE operation on those pages. -------------------------------------------------------------------------------- /docs/params/log_lock_waits.md: -------------------------------------------------------------------------------- 1 | # log_lock_waits 2 | Incidents/cases where a session needs to wait more than `deadlock_timeout` must be logged. 3 | Long waits often cause poor performance and concurrency issues. 4 | In the long term, the PostgreSQL log will have information about all the victims of this concurrency problem. 5 | -------------------------------------------------------------------------------- /docs/demo.md: -------------------------------------------------------------------------------- 1 | # Topics 2 | ## 1 [How to generate report on the PG host itself](https://youtu.be/XiadIIA5QnU) 3 | Please watch the demo [video](https://youtu.be/XiadIIA5QnU) to understand how to generate a report on the database host machine itself and the possible demerits. 4 | The idea is to spin up a small temporary instance which runs on another port and use it for generating the report. 5 | ## 2 -------------------------------------------------------------------------------- /docs/params/lock_timeout.md: -------------------------------------------------------------------------------- 1 | # lock_timeout 2 | 3 | A session waiting indefinitely for necessary locks needs to be avoided. Such sessions could appear to be hanging. 4 | It is far better for the session to cancel itself and come out reporting the problem. 5 | A general suggestion is to wait a maximum of 1 minute to get the necessary locks. 6 | ``` 7 | ALTER SYSTEM SET lock_timeout = '1min'; 8 | ``` 9 | -------------------------------------------------------------------------------- /docs/events/BUFFERMAPPING.html: -------------------------------------------------------------------------------- 1 | 2 |

Lwlock:BufferMapping

3 | This indicates heavy activity in shared_buffers.
4 | Loading or removing pages from shared_buffers requires an exclusive lock on the page. Each session can also put a shared lock on the page.
5 | High BufferMapping waits can indicate a big working set of data per session, which the system is struggling to accommodate.
6 | Excessive indexes, bloated indexes, and huge unpartitioned tables are the common reasons. -------------------------------------------------------------------------------- /docs/events/BUFFILEREAD.html: -------------------------------------------------------------------------------- 1 |

IO:BufFileRead

2 | BufFileRead happens when temporary files generated for SQL execution are read back to memory.
3 | Generally, this happens after BufFileWrite.
4 | BufFileRead means that PostgreSQL is reading from buffered temporary files.
5 | All sorts of temporary files, including the ones used for sorts and hash joins, parallel execution, and files used by single sessions (refer: buffile.c), can be responsible for this.
6 | A query tuning effort is suggested. -------------------------------------------------------------------------------- /dev/README.md: -------------------------------------------------------------------------------- 1 | # Documentation for Developers 2 | 3 | ## How to build 4 | The project uses a template for writing the report generation code. Please refer to the file `gather_report.tpl.html` 5 | The AWK script `apply_template.awk` is used for generating the report.sql (or ../gather_report.sql) 6 | ``` 7 | cat gather_report.tpl.html | awk -f apply_template.awk > report.sql; psql -X -f report.sql > out.html 8 | ``` 9 | ## SQL Statement Documentation 10 | Please refer to the [SQL documentation](SQLstatement.md) on the SQL statements used in this project. -------------------------------------------------------------------------------- /docs/pgGather.svg: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | gGather 5 | -------------------------------------------------------------------------------- /docs/params/wal_recycle.md: -------------------------------------------------------------------------------- 1 | # wal_recycle 2 | If set to `on` (the default), this option causes WAL files to be recycled by renaming them, avoiding the need to create new ones. On CoW file systems like ZFS / BTRFS, it may be faster to create new ones, so the option is given to disable this behavior. 3 | 4 | ## Reference 5 | * [feature commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=475861b26) 6 | * [Discussions 1](https://www.postgresql.org/message-id/flat/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97%2BK760YABs0vQHH5dLdkkMA%40mail.gmail.com) 7 | -------------------------------------------------------------------------------- /docs/replication.md: -------------------------------------------------------------------------------- 1 | # Replication 2 | This analysis collects information from `pg_stat_replication` and `pg_replication_slots` 3 | 4 | ## Report details explained 5 | Units used are: 1. Bytes for "Size" 2. XMIN differences with the latest known XMINs for all "Age" 6 | 7 | 8 | ## Base pg_gather tables 9 | 1. pg_replication_stat 10 | 2. pg_get_slots 11 | 12 | Raw information imported into the above-mentioned tables can be used for direct SQL queries. 13 | In case of partial and continuous gather, the information will be imported to tables with the same name in the `history` schema -------------------------------------------------------------------------------- /docs/params/min_wal_size.md: -------------------------------------------------------------------------------- 1 | # min_wal_size 2 | This parameter determines how many WAL files need to be retained for recycling rather than removed. 3 | PostgreSQL will try to prevent the usage of the `pg_wal` directory from falling below this limit by preserving old WAL segment files. 4 | WAL file recycling reduces the overhead and fragmentation at the filesystem level. 5 | The biggest advantage of a sufficiently big `min_wal_size` is that it can ensure that there is sufficient space reserved for `pg_wal`. 6 | 7 | ## Recommendation 8 | Generally, we recommend half the size of the `max_wal_size`. -------------------------------------------------------------------------------- /docs/events/BUFFERCONTENT.html: -------------------------------------------------------------------------------- 1 |

Lwlock:BufferContent

2 | BufferContent happens when a session that wants to modify a buffer is acquiring a lock on the buffer.
3 | This might indicate that there is high concurrency.
4 |

Solution:

5 | 1. Reduce the number of connections. Multiplex a large number of application connections over a few database connections using transaction-level pooling.
6 | 2. Reduce the size of the table (Archive / Purge) to fit into memory.
7 | 3. Partition the table.
8 | 4. Reduce the data integrity checks on the database side, including foreign keys, check constraints, and triggers.
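To see which sessions are currently affected, a simple filter on `pg_stat_activity` can be used; a minimal sketch, noting that the wait event is named `BufferContent` from PostgreSQL 13 onwards (`buffer_content` in older versions):
```
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IN ('BufferContent','buffer_content');
```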
-------------------------------------------------------------------------------- /docs/events/DATAFILESYNC.html: -------------------------------------------------------------------------------- 1 |

IO:DataFileSync

2 | This is the wait event that occurs when a backend process is waiting for a data file to be synchronized to disk.
3 | This is expected during durable writes, which flush the dirty pages in the buffers to disk. This is typically expected during a checkpoint.
4 | 5 |

Monitoring

6 | During the checkpoint operation, the checkpointer process is expected to show up to 3% of wait time on this event.
7 | If processes are showing high wait time on this event, it indicates that the storage is slow in synchronizing the data files to disk.
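As a rough check of sync overhead, the cumulative checkpoint timings can be compared; a sketch for PostgreSQL 16 and older, where these counters live in `pg_stat_bgwriter` (PostgreSQL 17 moved them to `pg_stat_checkpointer`):
```
SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;
```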
8 | 9 | -------------------------------------------------------------------------------- /docs/versionpolicy.md: -------------------------------------------------------------------------------- 1 | # pg_gather version policy 2 | 3 | Each pg_gather release invalidates all the previous versions. Data collections using older versions may or may not work with new analytical logic and can result in misleading inferences. There will be many corrections and improvements with every release. You may refer to the [release notes](https://github.com/jobinau/pg_gather/releases) for details. 4 | So, it is always important to use the latest version, and using older versions is highly discouraged. 5 | All PostgreSQL server versions above 10 are supported. However, the client utility `psql` should be at least version 11. -------------------------------------------------------------------------------- /dev/Calculations.md: -------------------------------------------------------------------------------- 1 | # FILLFACTOR RECOMMENDATION 2 | 3 | A maximum of 20% of space is considered for HOT updates (reduction in fillfactor). 4 | So fillfactor : 100 - 20% max 5 | Consider the proportion of new tuples coming from UPDATEs; reduce the above-mentioned 20% if UPDATEs are few. 6 | So fillfactor : 100 - 20%*UPDATES/(UPDATES+INSERTS) 7 | Even if updates are high, when a lot of HOT updates are already happening, the additional fraction of free space can be reduced according to the ratio HOTUPDATE/UPDATES: 8 | 20%*UPDATES/(UPDATES+INSERTS) * HOTUPDATE/UPDATE 9 | So fillfactor : 100 - 20%*UPDATES/(UPDATES+INSERTS) + 20%*UPDATES/(UPDATES+INSERTS) * HOTUPDATE/UPDATE -------------------------------------------------------------------------------- /docs/events/BUFFILEWRITE.html: -------------------------------------------------------------------------------- 1 |

IO:BufFileWrite

2 | This wait event occurs when PostgreSQL needs to write data to temporary files on disk as part of SQL execution.
3 | This typically happens when operations require more memory than the work_mem parameter allows, causing the system to spill data to disk.
4 | From an SQL tuning perspective, we need to check whether a large amount of data is pulled into memory for sort and join operations. Good filtering conditions are important.
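A quick way to see how much spilling is happening per database is the cumulative counters in `pg_stat_database`; a minimal sketch (the `log_temp_files` parameter then helps identify the individual statements):
```
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE temp_files > 0
ORDER BY temp_bytes DESC;
```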
5 |
6 | For further reading: [IO:BufFileRead and IO:BufFileWrite] -------------------------------------------------------------------------------- /docs/events/AUTOVACUUMMAIN.html: -------------------------------------------------------------------------------- 1 |

AutoVacuumMain

2 | It occurs when the Autovacuum Launcher process is idling between its scheduled searches for tables that need vacuuming or analyzing.
3 | There is a persistent background process in PostgreSQL called the Autovacuum Launcher, which is responsible for periodically checking the database for tables that require maintenance tasks such as vacuuming or analyzing.
4 | When the Launcher has finished checking the database and has spawned all necessary worker processes (up to the limit defined by autovacuum_max_workers), it enters a sleep state. During this sleep period, its wait event is recorded as AutoVacuumMain. 5 | 6 | -------------------------------------------------------------------------------- /docs/params/idle_in_transaction_session_timeout.md: -------------------------------------------------------------------------------- 1 | # idle_in_transaction_session_timeout 2 | 3 | It is important to protect the system from "idle in transaction" sessions. The sessions which are not completing their transactions quickly are dangerous for the health of the database. The default value is zero (0), which disables this timeout, and that is not a good configuration for most environments. 4 | Such "idle in transaction" sessions are often found to cause blockages in databases, causing poor performance and even outages. 5 | It is suggested to time out such sessions in 5 minutes **at the maximum**, hence the suggestion: 6 | ``` 7 | ALTER SYSTEM SET idle_in_transaction_session_timeout='5min'; 8 | ``` 9 | Consider smaller values wherever applicable. -------------------------------------------------------------------------------- /docs/params/max_wal_size.md: -------------------------------------------------------------------------------- 1 | # max_wal_size 2 | This is the maximum size the `pg_wal` directory is allowed to grow. This is a soft limit given to PostgreSQL so that PostgreSQL can plan for checkpointing sufficiently early to avoid the space consumption going above this limit. 3 | The default is 1GB, which is too small for any production system. 4 | 5 | ## Recommendation 6 | Ideally, there should be sufficient space for holding at least one hour's worth of WAL files. So WAL generation needs to be monitored before deciding on the value of `max_wal_size`. 7 | Smaller sizes may trigger forced checkpoints much earlier. 8 | Poorly tuned systems may experience back-to-back checkpointing and associated instability. So consider giving a bigger size for `max_wal_size` to handle occasional spikes in WAL generation.
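One rough way to measure the WAL generation rate from psql is to diff `pg_current_wal_lsn()` over a known interval; a sketch using psql's `\gset`, assuming PostgreSQL 10+:
```
SELECT pg_current_wal_lsn() AS start_lsn \gset
SELECT pg_sleep(60);
SELECT pg_size_pretty(pg_current_wal_lsn() - :'start_lsn'::pg_lsn) AS wal_per_minute;
```
Scale the result to an hourly figure and size `max_wal_size` to cover at least that, plus headroom for spikes.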
9 | 10 | 11 | -------------------------------------------------------------------------------- /docs/InvalidIndexes.md: -------------------------------------------------------------------------------- 1 | # Invalid Indexes 2 | Invalid indexes are corrupt, unusable indexes, which need to be dropped or recreated. 3 | 4 | ## Query to find Invalid indexes details from pg_gather data 5 | ``` 6 | SELECT ind.relname "index", indexrelid indexoid,tab.relname "table",indrelid tableoid 7 | FROM pg_get_index i 8 | LEFT JOIN pg_get_class ind ON i.indexrelid = ind.reloid 9 | LEFT JOIN pg_get_class tab ON i.indrelid = tab.reloid 10 | WHERE i.indisvalid=false; 11 | ``` 12 | 13 | ## Query to find Invalid indexes from the current database 14 | ``` 15 | SELECT ind.relname "index", indexrelid indexoid,tab.relname "table",indrelid tableoid, pg_get_indexdef(ind.oid) 16 | FROM pg_index i 17 | LEFT JOIN pg_class ind ON i.indexrelid = ind.oid 18 | LEFT JOIN pg_class tab ON i.indrelid = tab.oid 19 | WHERE i.indisvalid=false; 20 | ``` -------------------------------------------------------------------------------- /docs/CONTRIBUTING: -------------------------------------------------------------------------------- 1 | # Contributors Guide 2 | 3 | Two core philosophies: 4 | 1. The data collection should remain lightweight on the environment from where data is collected. 5 | 2.
Collect only very specific information, which is essential for analysis. 6 | 7 | ## Key guidelines for Pull Requests: 8 | 1. Data collection (gather.sql) needs to remain minimalistic. We should avoid collecting additional info from the user environments unless it is unavoidable. 9 | I would appreciate a discussion before adding more data collection points. 10 | 2. SQL statements with joins and sort operations must be avoided during the data collection. 11 | 3. "SELECT * " is not allowed. Columns/attributes need to be listed explicitly. 12 | 4. All joins and sort operations can be done during the analysis phase (gather_report.sql). There is no restriction there. 13 | 5. Data collection should run smoothly from PG 10 onwards, and Report generation using PG 13+ 14 | -------------------------------------------------------------------------------- /docs/events/DATAFILEPREFETCH.html: -------------------------------------------------------------------------------- 1 |

IO:DataFilePrefetch

2 |

DataFilePrefetch indicates: 3 | 1. PostgreSQL is performing read-ahead operations to prefetch data blocks from disk into shared buffers before they're actually needed
4 | 2. The system is waiting for these asynchronous I/O operations to finish
5 | 3. It's part of PostgreSQL's optimization to reduce I/O wait times for subsequent queries

6 |

When It Occurs

7 | DataFilePrefetch typically happens during:
8 | 1. Large sequential scans
9 | 2. Index scans that will need many blocks
10 | 3. Operations where PostgreSQL predicts future block needs
11 |

Performance Implications

12 | Some DataFilePrefetch waits are normal and indicate the prefetch system is working. However, excessive waits might suggest:
13 | 1. Slow storage subsystem
14 | 2. Need to tune shared_buffers or maintenance_work_mem
15 | 3. High concurrent I/O load
16 | -------------------------------------------------------------------------------- /docs/params/parallel_leader_participation.md: -------------------------------------------------------------------------------- 1 | # parallel_leader_participation 2 | 3 | By default, the leader of the parallel execution participates in the execution of plan nodes under "Gather" by collecting data from underlying tables/partitions, just like any parallel worker. Meanwhile, the leader process needs to perform additional work, such as collecting data from each parallel worker and " gathering" it in a single place. 4 | However, for an OLAP / DCS system, it would be better to have the leader process dedicated only to gathering the data from workers. This would be helpful if the following conditions are met 5 | 6 | * The host machine has a sufficiently high number of CPUs 7 | * There is not much concurrency, but few bulk SQLs are executed 8 | * Tables participating in SQL are partitioned. 9 | * The data is too big to fit into memory. 10 | 11 | ## Reference : 12 | https://kmoppel.github.io/2025-01-22-dont-forget-about-postgres-parallel-leader-participation/ 13 | -------------------------------------------------------------------------------- /docs/params/hot_standby_feedback.md: -------------------------------------------------------------------------------- 1 | # hot_standby_feedback 2 | This parameter MUST be `on` if the standby is used for executing SQL statements. Else, query cancellation due to conflict should be expected. 3 | 4 | ## Suggestion 5 | It is highly recommended to keep this parameter `on` if the standby is used for SQL statements. 6 | Again, it is recommended to keep the same value on both Primary and Standby. 7 | Along with the parameters `max_standby_archive_delay` and `max_standby_streaming_delay`, this parameter can allow a long-running SQL statement on the standby to wait before applying changes and acknowledging the replication position to the primary. 8 | This can prevent the primary from cleaning up tuple versions which are required on the standby side. 9 | 10 | ## Caution 11 | If the values of `max_standby_archive_delay` and `max_standby_streaming_delay` are high, the primary could end up holding old tuple versions, preventing autovacuum / vacuum from cleaning them up. This may potentially result in bloat on the primary. -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Postgres License 2 | pg_gather is released under the PostgreSQL Licence. a liberal Open Source license, similar to the MIT license 3 | 4 | Copyright (c) 2020-2025 Jobin Augustine 5 | 6 | Permission to use, copy, modify, and distribute this software and its documentation for any purpose, 7 | without fee, and without a written agreement is hereby granted, provided that the above copyright notice 8 | and this paragraph and the following two paragraphs appear in all copies. 9 | 10 | IN NO EVENT SHALL THE COPYRIGHT OWNERS, AUTHORS, CONTRIBUTORS OR THE COMPANY/ENTITY THEY ARE WORKING FOR, BE LIABLE 11 | TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING 12 | OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE PERSON OR ENTITY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
13 | 14 | THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND NO ENTITY HAS ANY OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, 15 | ENHANCEMENTS, OR MODIFICATIONS. 16 | -------------------------------------------------------------------------------- /docs/dygraphs/graph.html: -------------------------------------------------------------------------------- 1 | 2 | 4 | 6 | 7 | 8 | 9 |
10 | 35 | -------------------------------------------------------------------------------- /docs/unloggedtables.md: -------------------------------------------------------------------------------- 1 | # Unlogged tables 2 | Unlogged tables are tables for which no WAL will be generated. They are ephemeral, 3 | meaning data in the tables might be lost if there is a crash, an unclean shutdown, or a switchover to a standby. 4 | Since no WAL records get generated, these tables won't be able to participate in replication; no data will be replicated. 5 | 6 | 7 | ## List of unlogged tables From pg_gather data 8 | 9 | ``` 10 | SELECT c.relname "Tab Name",c.relkind,r.tab_ind_size "Tab + Ind",ct.relname "Toast name",rt.tab_ind_size "Toast + T.Ind" 11 | FROM pg_get_class c 12 | JOIN pg_get_rel r ON r.relid = c.reloid 13 | LEFT JOIN pg_get_toast t ON r.relid = t.relid 14 | LEFT JOIN pg_get_class ct ON t.toastid = ct.reloid 15 | LEFT JOIN pg_get_rel rt ON rt.relid = t.toastid 16 | WHERE c.relkind='r' AND c.relpersistence='u'; 17 | ``` 18 | 19 | ## List of unlogged tables from database 20 | ``` 21 | SELECT c.relname,c.relkind,pg_total_relation_size(c.oid), tc.relname "TOAST",pg_total_relation_size(tc.oid) "TOAST + TInd" 22 | FROM pg_class c 23 | JOIN pg_class tc ON c.reltoastrelid = tc.oid 24 | WHERE c.relkind='r' AND c.relpersistence='u'; 25 | ``` -------------------------------------------------------------------------------- /docs/params/wal_init_zero.md: -------------------------------------------------------------------------------- 1 | # wal_init_zero 2 | If set to `on` (the default), this option causes new WAL files to be filled with zeroes. On some file systems, this ensures that space is allocated before we need to write WAL records. However, Copy-On-Write (COW) file systems may not benefit from this technique, so the option is given to skip the unnecessary work by setting it to `off`. 3 | 4 | On the other hand, turning it `off` on regular filesystems could cause performance regressions, because when wal_init_zero is off, PostgreSQL creates new WAL segments by simply `lseek`ing to the end of the file or using `fallocate()` without actually writing data to zero out the underlying blocks. On many common filesystems (like ext4/xfs), this creates "holes" in the file. When data is subsequently written to these "holey" blocks, the filesystem has to perform additional work, resulting in multiple disk operations. 5 | 6 | ## Reference 7 | * [feature commit](https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=475861b26) 8 | * [Discussions 1](https://www.postgresql.org/message-id/flat/CACPQ5Fo00QR7LNAcd1ZjgoBi4y97%2BK760YABs0vQHH5dLdkkMA%40mail.gmail.com) 9 | * [Discussions 2](https://www.postgresql.org/message-id/flat/87a5bs5tla.fsf%40163.com) -------------------------------------------------------------------------------- /docs/params/checkpoint_timeout.md: -------------------------------------------------------------------------------- 1 | # checkpoint_timeout 2 | PostgreSQL performs a checkpoint every 5 minutes, as per the default settings. Checkpointing every 5 minutes is a very high frequency for a production system. 3 | In the practical world, if there is a database outage, the HA setup will fail over to the standby. So, the time it takes for a crash recovery becomes meaningless. 4 | 5 | PostgreSQL must flush out all dirty buffers to the disk to complete the checkpoint.
If there is a good chance of the same pages getting modified again, this effort will be meaningless. 6 | The biggest disadvantage of frequent checkpoints is that a full-page write of every page getting modified is required after the checkpoint. In a system with a large amount of memory and many pages being modified, the impact will be huge. Often, that causes a huge spike in IO and a drop in database throughput. 7 | 8 | 9 | ## Recommendation 10 | Overall, checkpointing is a heavy and resource-intensive activity in the system. Reduce its frequency to get better performance and stability. 11 | Considering all the factors and real-world feedback, we recommend checkpointing every half an hour to one hour. 12 | ``` 13 | checkpoint_timeout = 1800 14 | ``` 15 | The value is specified in seconds. 16 | -------------------------------------------------------------------------------- /docs/params/random_page_cost.md: -------------------------------------------------------------------------------- 1 | # random_page_cost - Cost of randomly accessing a disk block 2 | It is costly to access random pages on a magnetic disk because of the seek time (track-to-track seek) and the additional rotational latency to reach the track. 3 | Random access can be tens of times more costly than sequentially reading data from the same track. PostgreSQL by default considers random access to be 4x as costly as sequential access (the value of random_page_cost), which is a generally accepted good balance. 4 | However, nowadays most environments have SSDs or NVMes, or storage which behaves like SSDs, where random access is as cheap as sequential access. 5 | 6 | # Implications on Database performance 7 | Index scans are generally random access (B-Tree). If random access is costly, the PostgreSQL planner (a cost-based planner) will take plans which reduce the use of indexes. Effectively, we might see less use of indexes. 8 | 9 | # Suggestions 10 | 1. If the storage is local SSD/NVMe, `random_page_cost` can be almost the same as `seq_page_cost`. A value between 1 and 1.2 is generally suggested. 11 | 2. If the storage is a SAN drive with memory caches, a value around 1.5 would be good. 12 | 3. If the storage is a magnetic disk, the default value of 4 would be sufficient. 13 | -------------------------------------------------------------------------------- /docs/params/jit.md: -------------------------------------------------------------------------------- 1 | # jit (Just In Time compilation) 2 | PostgreSQL is capable of doing just-in-time compilation of SQL statements from PostgreSQL version 12. 3 | This uses the LLVM infrastructure available on the host machine. 4 | However, due to the initial compilation overhead, it seldom gives any advantage. There could be very specific cases where this gives some advantage. 5 | 6 | # Disadvantages 7 | 1. Very rarely it gives any performance advantage. 8 | 2. LLVM infra can cause memory and CPU overhead. 9 | 3. Memory leaks are reported. 10 | 4. JIT is reported to cause crashes in a few environments. 11 | 12 | # Suggestion 13 | 1. Disable JIT at the global level (at the instance level). 14 | 2. If there are specific SQL statements which have some advantage in terms of performance, please consider enabling the parameter at a lower scope (at the transaction level or session level). [PostgreSQL Parameters: Scope and Priority Users Should Know](https://www.percona.com/blog/postgresql-parameters-scope-and-priority-users-should-know/) 15 | 16 | ## Additional references 17 | 1.
[backend crash caused by query in llvm on arm64](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476) 2. [BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64](https://www.postgresql.org/message-id/flat/18503-6e0f5ab2f9c319c1%40postgresql.org) -------------------------------------------------------------------------------- /docs/missingstats.md: -------------------------------------------------------------------------------- 1 | # Tables missing statistics 2 | 3 | Missing statistics on tables can result in poor execution planning. 4 | Generally, statistics will be collected by autovacuum runs or an explicit ANALYZE command. 5 | But in rare conditions, there could be tables without any statistics, leading to poor query planning. 6 | 7 | ## Tables without statistics (From pg_gather data) 8 | The following query can be executed on the database where the pg_gather data is imported. 9 | 10 | ``` 11 | SELECT nsname "Schema" , relname "Table",n_live_tup "Tuples" 12 | FROM pg_get_class c LEFT JOIN pg_get_ns n ON c.relnamespace = n.nsoid 13 | JOIN pg_get_rel r ON c.reloid = r.relid AND relkind='r' AND r.n_live_tup != 0 14 | WHERE NOT EXISTS (select table_oid from pg_tab_bloat WHERE table_oid=c.reloid) 15 | AND nsname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name]); 16 | ``` 17 | 18 | ## Tables without statistics (Directly from catalog) 19 | The following query can be executed on the *target database* to get the list of tables for which stats need to be collected using ANALYZE 20 | ``` 21 | SELECT c.oid,nspname "Schema",relname "Table",pg_stat_get_live_tuples(c.oid) "Tuples" 22 | FROM pg_class c 23 | JOIN pg_namespace as n ON relkind = 'r' AND n.oid = c.relnamespace 24 | AND n.nspname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name]) 25 | WHERE NOT EXISTS (SELECT starelid FROM pg_statistic WHERE starelid=c.oid) 26 | AND pg_stat_get_live_tuples(c.oid) != 0; 27 | ``` 28 | -------------------------------------------------------------------------------- /docs/NetDelay.md: -------------------------------------------------------------------------------- 1 | # Net / Delay 2 | [![Network / Delay Explained](https://img.youtube.com/vi/v5Y9YT44rOY/0.jpg)](https://youtu.be/v5Y9YT44rOY) 3 | 4 | Network or Delay. This is the time spent without doing anything, waiting for something to happen. 5 | Common causes are: 6 | 1. **High network latency** 7 | A "Network/Delay" won't always result in "ClientRead", because network delay can affect SELECT statements also, which are independent of the transaction block. 8 | 2. **Connection poolers/proxies** 9 | Proxies standing in between the application server and the database can cause delays. 10 | 3. **Application design** 11 | If the application becomes too chatty (many back-and-forth communications), database sessions start spending more time waiting for communication. 12 | 4. **Application side processing time.** 13 | For example, an application executes a SELECT statement and receives a result set. Subsequently, a period of time may be required to process these results before the next statement is sent. This intermittent idling between transactions is also a factor to consider 14 | for this category. 15 | 5. **Overloaded servers** - Waiting for scheduling 16 | On an overloaded application or database server, processes spend more time waiting in the run queue to be executed, because the run queue gets longer. This increased wait time is known as run "*queue latency*" or "*CPU contention*" ("*CPU wait*" as per Oracle terminology).
Such waiting time is also accounted in Net/Delay. -------------------------------------------------------------------------------- /docs/params/max_standby_archive_delay.md: -------------------------------------------------------------------------------- 1 | # max_standby_archive_delay 2 | WAL applied on the standby can be delayed by the amount of time specified in this parameter. The default is 30 seconds (30000 ms). PostgreSQL will hold the WAL apply if there is a conflicting statement already running on the standby side. This parameter comes into effect when WAL is **retrieved from the archive location**. So if the replication is a streaming replication, the value of this parameter won't be considered; instead, the value of [max_standby_streaming_delay](./max_standby_streaming_delay.md) will be in use. 3 | 4 | # Suggestions: 5 | One should increase this parameter if there are long-running statements on the standby side and frequent problems of statement cancellation due to conflicts. However, that comes with a cost of replication delay. These two requirements put opposite considerations into this parameter. 6 | One common strategy is to divide the sessions connecting to the standby side into multiple standby nodes, so that statements with a longer duration are redirected to one standby, and statements that need to see near real-time data are redirected to another standby. The standby where long-running statements run can have a bigger value for this parameter. 7 | Unless such strategies are used, the same value for this parameter and [max_standby_streaming_delay](./max_standby_streaming_delay.md) is a common practice. 8 | It is not recommended that this parameter have too big a value. Instead, statements that are taking too long to complete should be investigated for tuning. -------------------------------------------------------------------------------- /docs/params/log_temp_files.md: -------------------------------------------------------------------------------- 1 | # log_temp_files 2 | Heavy SQL statements which fetch large volumes of data and perform join or sort operations on that data may not be able to fit the data into `work_mem`, and consequently, PostgreSQL may spill it to disk. These files are called temp files. This could result in unwanted I/O and performance degradation in PostgreSQL. The parameter `log_temp_files` will help generate entries in the PostgreSQL log with details of the SQL statement that caused the temp file generation. Setting this value to "0" might cause a lot of entries in the PostgreSQL log, resulting in big log files. 3 | ## Recommendation 4 | All SQL statements that generate excessive temp files need to be identified and addressed. Some of the SQL statements might need to be rewritten. Those statements that cannot be further simplified but need more `work_mem` need special attention. Please refer to the [work_mem](work_mem.md) section for further details on how to handle this. In order to identify the problematic SQL statements, start with those SQL statements which generate files of more than 100MB 5 | ``` 6 | log_temp_files = '100MB'; 7 | ``` 8 | Once all those queries are addressed, this size can be further reduced to `50MB`. Keep reducing until the objective is achieved. 9 | 10 | ## References 11 | 1. [PostgreSQL documentation on log_temp_files](https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-LOG-TEMP-FILES) 12 | 2.
[AWS documentation](https://docs.aws.amazon.com/prescriptive-guidance/latest/tuning-postgresql-parameters/log-temp-files.html) -------------------------------------------------------------------------------- /docs/events/EXTENSION.html: -------------------------------------------------------------------------------- 1 |

Extension

2 |

The server process is waiting for some condition defined by an extension module
3 | In other words, the PostgreSQL process is waiting for an operation defined by an installed extension (add-on module) to complete.
4 | The core database engine is "handing over the wheel" to an external module.
5 | If you see high waits here, the bottleneck is not in standard PostgreSQL features (like tables or indexes), but in the specific logic of the extension you are using. 6 |
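A quick way to see whether any sessions are currently in this state is a filter on `pg_stat_activity`; a minimal sketch:
```
SELECT pid, state, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Extension';
```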

7 |

Why it is happening (Root Causes)

8 | The following extensions are some of the common troublemakers that can cause waits in this category: 9 | 27 | 28 | -------------------------------------------------------------------------------- /docs/params/track_io_timing.md: -------------------------------------------------------------------------------- 1 | # track_io_timing - Capture timing information of I/O 2 | PostgreSQL will capture and populate I/O related counters if this parameter is enabled. 3 | This parameter is `off` by default, as it will repeatedly query the operating system for the current time, which may cause significant overhead on some platforms. 4 | However, on most modern CPUs (Intel/AMD and ARM) used for database servers, the overhead is very low. 5 | 6 | # Suggestion 7 | Run the `pg_test_timing` utility which comes along with the PostgreSQL installation. 8 | Please proceed to enable this parameter ("on") if the results indicate that 95% of the calls have less than 1 microsecond delay. 9 | Here is a sample result: 10 | ``` 11 | $pg_test_timing 12 | Testing timing overhead for 3 seconds. 13 | Per loop time including overhead: 22.23 ns 14 | Histogram of timing durations: 15 | < us % of total count 16 | 1 97.78015 131942883 17 | 2 2.21770 2992526 18 | 4 0.00196 2643 19 | 8 0.00018 245 20 | 16 0.00001 13 21 | 32 0.00000 3 22 | ``` 23 | 24 | # Benefit 25 | The additional information it captures enables users to understand I/O related latency better. 26 | 1. I/O timing information is displayed in `pg_stat_database` and `pg_stat_io` 27 | 2. I/O timing information in the output of EXPLAIN when the BUFFERS option is used, 28 | 3. I/O timing information in the output of VACUUM when the VERBOSE option is used, by autovacuum for auto-vacuums and auto-analyzes, when log_autovacuum_min_duration is set and by pg_stat_statements 29 | -------------------------------------------------------------------------------- /docs/table_object.md: -------------------------------------------------------------------------------- 1 | # Tables and Objects in a Database 2 | A PostgreSQL database can typically contain different types of objects: 3 | 4 | 1. Tables, where data is stored 5 | 2. Toasts, which act as extensions to tables 6 | 3. Indexes, which are associated with tables and their columns 7 | 4. Native partitioned tables, which contain only the definitions 8 | 5. Materialized views 9 | 6. Sequences 10 | 7. Composite types 11 | 8. Foreign tables 12 | etc.. 13 | 14 | Having too many objects in a single database increases the metadata, which adversely impacts the overall database performance and response. 15 | Fewer than a thousand database objects is most ideal.
16 | 17 | # Get the list of objects and their details from pg_gather data 18 | Please run the following query on the database where the pg_gather data is imported 19 | ``` 20 | SELECT c.relname "Name",c.relkind "Kind",r.relnamespace "Schema",r.blks,r.n_live_tup "Live tup",r.n_dead_tup "Dead tup", CASE WHEN r.n_live_tup <> 0 THEN ROUND((r.n_dead_tup::real/r.n_live_tup::real)::numeric,4) END "Dead/Live", 21 | r.rel_size "Rel size",r.tot_tab_size "Tot.Tab size",r.tab_ind_size "Tab+Ind size",r.rel_age,r.last_vac "Last vacuum",r.last_anlyze "Last analyze",r.vac_nos, 22 | ct.relname "Toast name",rt.tab_ind_size "Toast+Ind" ,rt.rel_age "Toast Age",GREATEST(r.rel_age,rt.rel_age) "Max age" 23 | FROM pg_get_rel r 24 | JOIN pg_get_class c ON r.relid = c.reloid AND c.relkind NOT IN ('t','p') 25 | LEFT JOIN pg_get_toast t ON r.relid = t.relid 26 | LEFT JOIN pg_get_class ct ON t.toastid = ct.reloid 27 | LEFT JOIN pg_get_rel rt ON rt.relid = t.toastid 28 | ORDER BY r.tab_ind_size DESC; 29 | ``` 30 | -------------------------------------------------------------------------------- /docs/events/OIDGEN.html: -------------------------------------------------------------------------------- 1 |

LWLock:OidGen

2 |

The process is waiting for exclusive access to the OID Counter to assign a unique Object Identifier (OID)
3 | to a new database object (like a table, a type, or a TOAST value).
4 | If it takes time, it may indicate contention on the (32-bit) OID address space.
5 | It has been reported that this wait event appears when TOAST chunks, which use OIDs, run out of available OIDs.

6 |

Why it is happening (Root Causes)

7 | -------------------------------------------------------------------------------- /docs/events/OIDGENLOCK.html: -------------------------------------------------------------------------------- 1 |

LWLock:OidGenLock

2 |

The process is waiting for exclusive access to the OID Counter to assign a unique Object Identifier (OID)
3 | to a new database object (like a table, a type, or a TOAST value).
4 | If it takes time, it may indicate contention on the (32-bit) OID address space.
5 | It has been reported that this wait event appears when TOAST chunks, which use OIDs, run out of available OIDs.

6 |

Why it is happening (Root Causes)

7 | -------------------------------------------------------------------------------- /docs/events/SUBTRANSBUFFER.html: -------------------------------------------------------------------------------- 1 |

Lwlock:SubtransBuffer

2 | The SubtransBuffer event occurs when a backend process is waiting to read or write a data page in the pg_subtrans SLRU (Simple Least Recently Used) cache.
3 | Waiting for I/O on a sub-transaction SLRU buffer.
4 | The SubtransBuffer wait event is often considered as the early indicators of architectural issues within your application.
5 | While most developers are familiar with standard transaction locks, SubtransBuffer indicates a bottleneck in how PostgreSQL tracks the relationship between transactions and their subtransactions. 6 |

Common Causes of High SubtransBuffer Waits

7 | 14 | PostgreSQL uses the pg_subtrans directory to maintain a mapping of every subtransaction ID (subXID) to its parent transaction ID.
15 | This mapping is vital for MVCC (Multi-Version Concurrency Control)—the database needs it to determine if a row version (tuple) created by a subtransaction should be visible to other sessions. 16 | This wait event may appear along with SubtransSLRU wait events. If so, please refer to SubtransSLRU for more details.
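On PostgreSQL 13 and newer, the SLRU counters give a rough view of subtransaction cache pressure; a sketch (the SLRU is named 'Subtrans' up to PG 16 and 'subtransaction' in PG 17, hence the pattern match):
```
SELECT name, blks_hit, blks_read, blks_written
FROM pg_stat_slru
WHERE lower(name) LIKE 'subtrans%';
```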
17 | -------------------------------------------------------------------------------- /docs/params/max_standby_streaming_delay.md: -------------------------------------------------------------------------------- 1 | # max_standby_streaming_delay 2 | WAL applied on the standby can be delayed by the amount of time specified in this parameter. The default is 30 seconds (30000 ms). PostgreSQL will hold the WAL apply if there is a conflicting statement already running on the standby side. This parameter comes into effect when WAL is **fetched through streaming replication from the primary**. 3 | 4 | 5 | One should increase this parameter if there are long-running statements on the standby side and frequent problems of statement cancellation due to conflicts. However, that comes with a cost of replication delay, if there is a conflict. These two requirements put opposite considerations into this parameter. 6 | In summary, **a big value can cause replication delays, and small values can cause statement cancellations.** 7 | 8 | # Suggestions: 9 | There won't be a single value which could work great in all environments. Careful study and adjustment are required. If the standby is used for long-running SQL statements, strategic decisions may be required. 10 | One common strategy is to divide the sessions connecting to the standby side into multiple standby nodes, so that statements with a longer duration are redirected to one standby, and statements that need to see near real-time data are redirected to another standby. The standby where long-running statements run can have a bigger value for this parameter. 11 | Unless such strategies are used, the same value for this parameter and [max_standby_archive_delay](./max_standby_archive_delay.md) is a common practice. 12 | It is not recommended that this parameter have too big a value. Instead, statements that are taking too long to complete should be investigated for tuning. -------------------------------------------------------------------------------- /docs/xidhorizon.md: -------------------------------------------------------------------------------- 1 | # TransactionID Snapshot Horizon. 2 | Long-running transactions are problematic for any ACID-compliant database system because the system needs to track the snapshot of data to which every currently running transaction refers. PostgreSQL is no different and is expected to have many types of troubles if there are long-running transactions or statements. 3 | 4 | ## How to check 5 | 6 | Under the main Head information presented at the top of the pg_gather report, the "Oldest xid ref" is the oldest/earliest transaction ID (txid) that is still active. This is the oldest xid horizon which PostgreSQL is currently tracking. All transactions earlier than this will either be committed and visible or rolled back and dead. 7 | 8 | Again, under the "Sessions" details, information like each session, the statement it is running, the xid age of the snapshot each of those sessions is referring to, the duration of the statement, etc., will be displayed. 9 | 10 | ## Dangers of long-running transactions. 11 | 1. Uncommitted transactions can cause contention in the system, resulting in overall slow performance. 12 | 2. They commonly cause concurrency issues, system blocking, sometimes hanging sessions, and even database outages. 13 | 3. Vacuum/autovacuum won't be able to clean up dead tuples generated after the oldest xmin reference, which results in poor query performance and reduces the chance of Index-only scans. 14 | 4.
The system may reach a high xid age and possibly wraparound stages if the vacuum/autovacuum is not able to clean up old tuples, especially in systems with long-running sessions. 15 | Wraparound-prevention autovacuum (aggressive mode autovacuum) is frequently reported in systems which have long-running transactions. 16 | 5. Tables and indexes are more prone to bloating as the vacuum becomes inefficient. -------------------------------------------------------------------------------- /docs/crosstab.sql: -------------------------------------------------------------------------------- 1 | --------- Crosstab report for continuous data collection ----------- 2 | -- This requires tablefunc contrib extension to be created -- 3 | -- tablefunc is part of PostgreSQL contrib modules -- 4 | -------------------------------------------------------------------- 5 | --Find out the wait events and prepare the columns for the crosstab report 6 | SELECT STRING_AGG(col,',') AS cols FROM 7 | (SELECT COALESCE(wait_event,'CPU') || ' int' "col" 8 | FROM history.pg_pid_wait WHERE wait_event IS NULL OR 9 | wait_event NOT IN ('ArchiverMain','AutoVacuumMain','BgWriterHibernate','BgWriterMain','CheckpointerMain','LogicalApplyMain','LogicalLauncherMain','RecoveryWalStream','SysLoggerMain','WalReceiverMain','WalSenderMain','WalWriterMain','CheckpointWriteDelay','PgSleep','VacuumDelay') 10 | GROUP BY wait_event ORDER BY 1) as A \gset 11 | --Run a crosstab query 12 | SELECT * 13 | FROM crosstab( 14 | $$ SELECT collect_ts,COALESCE(wait_event,'CPU') "Event", count(*) FROM history.pg_pid_wait 15 | WHERE wait_event IS NULL OR wait_event NOT IN ('ArchiverMain','AutoVacuumMain','BgWriterHibernate','BgWriterMain','CheckpointerMain','LogicalApplyMain','LogicalLauncherMain','RecoveryWalStream','SysLoggerMain','WalReceiverMain','WalSenderMain','WalWriterMain','CheckpointWriteDelay','PgSleep','VacuumDelay') 16 | GROUP BY 1,2 ORDER BY 1, 2 $$, 17 | $$ SELECT COALESCE(wait_event,'CPU') 18 | FROM history.pg_pid_wait WHERE wait_event IS NULL OR 19 | wait_event NOT IN ('ArchiverMain','AutoVacuumMain','BgWriterHibernate','BgWriterMain','CheckpointerMain','LogicalApplyMain','LogicalLauncherMain','RecoveryWalStream','SysLoggerMain','WalReceiverMain','WalSenderMain','WalWriterMain','CheckpointWriteDelay','PgSleep','VacuumDelay') 20 | GROUP BY wait_event ORDER BY 1 $$) 21 | AS ct (Collect_Time timestamp with time zone,:cols); 22 | 23 | -------------------------------------------------------------------------------- /docs/events/DATAFILEREAD.html: -------------------------------------------------------------------------------- 1 |

IO:DataFileRead

2 | DataFileRead occurs when a connection needs to access a specific data page that is not currently present in the PostgreSQL Shared Buffers.
3 | The process must wait while the operating system reads the page from the disk (or filesystem cache) into memory. 4 | 5 |

Implication:

6 | While some disk reads are normal, a high percentage of these wait events typically indicates that your active dataset (working set) is larger than your available memory,
7 | or that inefficient queries are forcing unnecessary disk reads. 8 | 9 |

Why it is happening (Root Causes)

10 | If this event is dominating, look for these culprits: 11 | 28 | 29 |
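A rough cache hit ratio per database helps judge whether the working set fits in memory; a minimal sketch using the cumulative counters:
```
SELECT datname,
       round(100.0*blks_hit/NULLIF(blks_hit+blks_read,0),2) AS cache_hit_pct
FROM pg_stat_database
WHERE blks_hit + blks_read > 0;
```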

Comparison: IO:DataFileRead vs. IO:DataFilePrefetch

30 | It is important not to confuse this with Prefetching. 31 | -------------------------------------------------------------------------------- /docs/events/SYNCREP.html: -------------------------------------------------------------------------------- 1 |

IPC:SyncRep

2 |

This event occurs when a transaction has completed locally and is waiting for confirmation from a remote standby server before it can return "Success" to the client.

3 | PostgreSQL "Synchronous Replication" is technically Asynchronous Replication + A Wait. The database does not "write successfully to both places at the exact same instant." Instead,
4 | it writes locally, streams the data asynchronously to the standby, and then pauses the user session (generating this wait event) until the standby sends a "Thumbs Up" acknowledgment. 5 | 6 |
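On the primary, the standby being waited on can be observed; a minimal sketch using pg_stat_replication (the lag expression is illustrative, and the column set varies slightly across versions):
```sql
-- Which standbys are synchronous, and how far behind their flush position is.
-- sync_state is 'sync', 'potential' (quorum candidate), or 'async'.
SELECT application_name,
       state,
       sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lag_bytes
FROM pg_stat_replication;
```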

Why it is happening (Root Causes)

7 | 21 |

Additional References

22 | 26 | 27 | -------------------------------------------------------------------------------- /docs/schema.md: -------------------------------------------------------------------------------- 1 | # Schema / Namespace 2 | 3 | ## From pg_gather data 4 | ### 1. List of schema/namespace present in the database 5 | ``` 6 | SELECT nsoid,nsname, 7 | CASE WHEN nsname IN ('pg_toast','pg_catalog','information_schema') THEN 'System' 8 | WHEN nsname LIKE 'pg_toast_temp%' THEN 'TempToast' 9 | WHEN nsname LIKE 'pg_temp%' THEN 'Temp' 10 | ELSE 'User' END 11 | FROM pg_get_ns; 12 | ``` 13 | ### 2. Groups of Namespaces 14 | ``` 15 | WITH ns AS (SELECT nsoid,nsname, 16 | CASE WHEN nsname IN ('pg_toast','pg_catalog','information_schema') THEN 'System' 17 | WHEN nsname LIKE 'pg_toast_temp%' THEN 'TempToast' 18 | WHEN nsname LIKE 'pg_temp%' THEN 'Temp' 19 | ELSE 'User' END AS nstype 20 | FROM pg_get_ns) 21 | SELECT nstype,count(*) FROM ns GROUP BY nstype; 22 | ``` 23 | ### 3. List of "User" schemas 24 | The list of schemas excluding temp, temp toast, and system schemas. 25 | Meaning, the list of schemas explicitly created by users. 26 | ``` 27 | WITH ns AS (SELECT nsoid,nsname, 28 | CASE WHEN nsname IN ('pg_toast','pg_catalog','information_schema') THEN 'System' 29 | WHEN nsname LIKE 'pg_toast_temp%' THEN 'TempToast' 30 | WHEN nsname LIKE 'pg_temp%' THEN 'Temp' 31 | ELSE 'User' END AS nstype 32 | FROM pg_get_ns) 33 | SELECT * FROM ns WHERE nstype='User'; 34 | ``` 35 | 36 | ### 4. Schema-wise count and size 37 | ``` 38 | WITH ns AS (SELECT nsoid,nsname 39 | FROM pg_get_ns WHERE nsname NOT LIKE 'pg_temp%' AND nsname NOT LIKE 'pg_toast_temp%' 40 | AND nsname NOT IN ('pg_toast','pg_catalog','information_schema')), 41 | sumry AS (SELECT r.relnamespace, count(*) AS "Tables", sum(r.rel_size) "Tot.Rel.size",sum(r.tot_tab_size) "Tot.Tab.size",sum(r.tab_ind_size) "Tab+Ind.size" 42 | FROM pg_get_rel r 43 | JOIN pg_get_class c ON r.relid = c.reloid AND c.relkind NOT IN ('t','p') 44 | GROUP BY r.relnamespace) 45 | SELECT nsoid,nsname,"Tables","Tot.Rel.size","Tot.Tab.size","Tab+Ind.size",pg_size_pretty("Tab+Ind.size") as "Size" 46 | FROM ns LEFT JOIN sumry ON ns.nsoid = sumry.relnamespace 47 | ORDER BY 6 DESC NULLS LAST; 48 | ``` 49 | ** use JOIN instead of LEFT JOIN to eliminate empty schemas -------------------------------------------------------------------------------- /docs/Requirements.md: -------------------------------------------------------------------------------- 1 | Requirements 2 | -------------------- 3 | 1. No download cases. Database servers may not have an internet connection, and admins may not be allowed to download scripts from the public internet for use in production. 4 | Expected: need to share the script over official mail / ticket. 5 | 2. No executables allowed. Secured environments with security scanners and auditing in place. 6 | DBAs are allowed to execute simple SQL statements. 7 | 3. No password authentication - peer / SSL certificate authentication. 8 | The data collection tool should work with any PostgreSQL authentication. 9 | 4. Windows laptop and RDS instance. 10 | Windows client connecting to RDS. 11 | 5. PostgreSQL on Windows 12 | Unix tools are completely helpless. 13 | 6. Aurora and other PostgreSQL-like software 14 | Much PostgreSQL-like software has started appearing without full compatibility; many catalog and stats views are missing. The tool should just skip over what is missing rather than hard-stop with an error. 15 | 7. 
ARM processor - No information about the processor architecture when a customer reports a problem. 16 | For example, the customer just reports a problem like "database is slow". The data collection step should be independent of the processor architecture. 17 | 8. PG in Container and different shell. 18 | In addition to shell differences, Unix tools / Perl scripts won't help, as many of them are missing in many containers. 19 | 9. Customer who collects data may not have the privilege to execute queries on many PostgreSQL views. 20 | Many SQL statements are expected to fail in unprivileged user environments, but the tool should proceed with what it can. 21 | 10. Should be very lightweight. Completely avoid any complex analysis queries on the system to be scanned. 22 | Practically, users have 2 vCPU machines with 4GB RAM for their microservices. 23 | 11. Separation of data collection and analysis 24 | Collected data should be available in row format for in-depth analysis and complex SQL statements. 25 | 12. The collected data should be captured in the smallest file possible. Eliminate every data redundancy in each version. 26 | -------------------------------------------------------------------------------- /docs/params/huge_pages.md: -------------------------------------------------------------------------------- 1 | # huge_pages - Use Linux hugepages 2 | 3 | ### Warning: Critical Impact of Not Using Hugepages 4 | Failure to implement Hugepages is a primary cause of stability and reliability issues in PostgreSQL database systems. Memory-related problems and out-of-memory (OOM) terminations are frequently reported in systems that do not utilize Hugepages. Additionally, connection issues and inconsistent execution times are also prevalent. 5 | Without Hugepages, memory management and accounting become significantly more complex, leading to increased risk of system instability and performance degradation. The use of Hugepages is essential for optimal memory management and is a critical OS-level tuning requirement for handling database workloads. 6 | Failure to implement this feature may result in severe performance issues - occasional drops in performance, stalls, connection failures, and system instability. 7 | 8 | Detailed discussion of the importance of hugepages is beyond the scope of this summary info. The following blog post is highly recommended for further reading: 9 | **[Why Linux HugePages are Super Important for Database Servers: A Case with PostgreSQL](https://www.percona.com/blog/why-linux-hugepages-are-super-important-for-database-servers-a-case-with-postgresql/)** 10 | 11 | ## Warning about Misleading Benchmarks 12 | Synthetic benchmarks often consider only speed, without considering other stability/reliability aspects of the database system in the long run. Many of the synthetic benchmarks may not be able to demonstrate any considerable speed difference after enabling Hugepages. 13 | 14 | # Suggestions 15 | 1. Disable THP (Transparent huge pages), preferably at the bootloader level of Linux. 16 | 2. Enable regular HugePages (2MB size) with a sufficient number of huge pages. Please refer to the above blog post for details of the calculations. 17 | 3. Change the parameter `huge_pages` to `on` at the PostgreSQL instance to make sure that PostgreSQL will allocate sufficient huge pages on startup. It is better to prevent PostgreSQL from starting up with wrong settings than to start with wrong settings and face trouble later.
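A quick verification is possible from psql; a minimal check, noting that `shared_memory_size_in_huge_pages` is only available from PostgreSQL 15 onwards:
```sql
-- Verify the huge_pages setting and the number of huge pages the instance needs.
SHOW huge_pages;                          -- expected: on (values: try / on / off)
SHOW shared_memory_size_in_huge_pages;    -- number of huge pages required (PostgreSQL 15+)
```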
-------------------------------------------------------------------------------- /docs/events/TRANSACTIONID.html: -------------------------------------------------------------------------------- 1 |

Lock:transactionid

2 |

3 | A session is waiting for another session to complete its transaction (the session is blocked).
4 | The transactionid wait event in PostgreSQL occurs when a backend process is blocked while waiting for a specific transaction to complete.
5 | For example, updating the same rows of a table from multiple sessions can lead to this situation.
6 | This is one of the more serious wait events that can significantly impact database performance.

7 | This wait event indicates that:
  1. One transaction is waiting for another transaction to finish (commit or abort)
  2. There is a direct transaction ID dependency between sessions
  3. This typically involves row-level locking scenarios where MVCC (Multi-Version Concurrency Control) can't resolve the conflict

Why it is happening (Root Causes)

15 | The following are common scenarios that lead to transactionid waits:
  1. Lock Contention: When Transaction A holds locks that Transaction B needs. Example: a long-running UPDATE blocking another UPDATE/DELETE on the same rows
  2. Foreign Key Operations: When checking referential integrity during updates/deletes
  3. Prepared Transactions: Waiting for a prepared transaction (2PC) to commit/rollback
  4. Serializable Isolation Level: In SERIALIZABLE isolation, waiting for a potentially conflicting transaction to complete
  5. VACUUM Operations: When VACUUM is blocked by long-running transactions
The waiters and their blockers can be listed as shown below.
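A sketch of such a query, using pg_stat_activity and the pg_blocking_pids() function:
```sql
-- Sessions currently waiting on Lock:transactionid and the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND wait_event = 'transactionid';
```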

Performance Implications

  1. More severe than tuple waits, as it involves entire transactions rather than individual rows
  2. Can lead to transaction chains where multiple sessions wait in sequence

Often indicates:

  1. Long-running transactions holding locks
  2. Application logic issues (transactions staying open too long)
  3. Insufficient vacuuming, leading to transaction ID wraparound-prevention autovacuum
-------------------------------------------------------------------------------- /docs/events/CLIENTREAD.html: -------------------------------------------------------------------------------- 1 |

Client:ClientRead

2 |

3 | This event occurs when a PostgreSQL backend process is waiting to receive data or a new command from the connected client application.
4 | Essentially, the database has finished its previous task and is asking, "What should I do next?"
5 |

6 | A big percentage of ClientRead indicates that the response from the application is not fast enough. It wastes wall-clock time and potentially holds locks.
7 | "ClientRead" wait-event combined with "idle-in-transaction" can cause contention in the server.
8 | A few reasons generally cause high values for this wait event:
9 |

Why it is happening (Root Causes)

10 |

1. Network Latency (The "Slow Road"):

11 | The communication channel between the database and the application/client may have low bandwidth or high latency.
12 | For example, there could be too many network hops. Cloud, virtualization, containerization, firewall, and routing layers (sometimes multi-layer routing) are often found to cause high network latency.
13 | Latency has nothing to do with network bandwidth. Even a very high-bandwidth connection can have high latency and affect database performance.
14 | Please remember that network-related waits within transactions are generally accounted as "ClientRead". 15 |
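A crude client-side check of the round trip is possible from psql on the application host; for a trivial statement, the elapsed time reported by \timing is almost entirely network cost:
```sql
-- Run from psql on the application server; compare against psql on the DB host.
\timing on
SELECT 1;
```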

2. Application Logic (The "Long Pause"):

16 | This is the most dangerous cause. It happens when an application starts a transaction (BEGIN), performs a SQL command, and then pauses to do non-database work
17 | (like sending an email, processing a file, or calling an external API) before sending the next SQL command or COMMIT.
18 | During this pause, the database is left waiting for the next command from the application, leading to "ClientRead" waits.
19 | This situation is particularly problematic because it can lead to long-held locks and increased contention within the database.
20 |

3. Human Error:

21 | A user with interactive login access may start a transaction and then get distracted or take a long time to issue the next command or COMMIT.
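Sessions pausing inside an open transaction are easy to spot; a minimal sketch using pg_stat_activity (the one-minute threshold is an arbitrary illustration). Setting idle_in_transaction_session_timeout is the usual safety net against such sessions:
```sql
-- Connections idle inside an open transaction, oldest first.
SELECT pid,
       usename,
       now() - state_change AS idle_for,
       query              -- the last statement executed before the pause
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '1 minute'
ORDER BY idle_for DESC;
```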
22 | -------------------------------------------------------------------------------- /docs/walarchive.md: -------------------------------------------------------------------------------- 1 | # WAL archive failure and lag 2 | WAL files are pushed to external backup repositories where the backup is maintained. 3 | Due to various reasons, WAL archiving could be failing or lagging behind the current WAL generation. 4 | The following SQLs could help analyze the archiving failures and delays. 5 | If the WAL archive is not healthy, backups might also fail and point-in-time recovery won't be possible. 6 | 7 | ## From pg_gather data 8 | ```SQL 9 | SELECT collect_ts "collect_time", current_wal "current_lsn", last_archived_wal 10 | , coalesce(nullif(CASE WHEN length(last_archived_wal) < 24 THEN '' ELSE ltrim(substring(last_archived_wal, 9, 8), '0') END, ''), '0') || '/' || substring(last_archived_wal, 23, 2) || '000001' "last_archived_lsn" 11 | , last_archived_time::text || ' (' || CASE WHEN EXTRACT(EPOCH FROM(collect_ts - last_archived_time)) < 0 THEN 'Right Now'::text ELSE (collect_ts - last_archived_time)::text END || ')' "last_archived_time" 12 | , pg_wal_lsn_diff( current_wal, (coalesce(nullif(CASE WHEN length(last_archived_wal) < 24 THEN '' ELSE ltrim(substring(last_archived_wal, 9, 8), '0') END, ''), '0') || '/' || substring(last_archived_wal, 23, 2) || '000001') :: pg_lsn ) 13 | ,last_failed_wal,last_failed_time 14 | FROM pg_gather, pg_archiver_stat; 15 | ``` 16 | 17 | ## From PostgreSQL Directly 18 | ```SQL 19 | SELECT CURRENT_TIMESTAMP,pg_current_wal_lsn() 20 | ,coalesce(nullif(CASE WHEN length(last_archived_wal) < 24 THEN '' ELSE ltrim(substring(last_archived_wal, 9, 8), '0') END, ''), '0') || '/' || substring(last_archived_wal, 23, 2) || '000001' "last_archived_lsn" 21 | , last_archived_time::text || ' (' || CASE WHEN EXTRACT(EPOCH FROM(CURRENT_TIMESTAMP - last_archived_time)) < 0 THEN 'Right Now'::text ELSE (CURRENT_TIMESTAMP - last_archived_time)::text END || ')' "last_archived_time" 22 | , pg_size_pretty(pg_wal_lsn_diff( pg_current_wal_lsn(), (coalesce(nullif(CASE WHEN length(last_archived_wal) < 24 THEN '' ELSE ltrim(substring(last_archived_wal, 9, 8), '0') END, ''), '0') || '/' || substring(last_archived_wal, 23, 2) || '000001') :: pg_lsn )) archive_lag 23 | ,last_failed_wal,last_failed_time 24 | FROM pg_stat_archiver; 25 | ``` -------------------------------------------------------------------------------- /dev/apply_template.awk: -------------------------------------------------------------------------------- 1 | ################################################################# 2 | # AWK script by Nickolay Ihalainen 3 | # Generate the SQL script (report.sql) for final analysis report 4 | # Using HTML Template by replacing markers 5 | ################################################################# 6 | 7 | function psql_echo_escape() { 8 | in_double_quotes = 0 9 | split($0, chars, "") 10 | for (i=1; i <= length($0); i++) { 11 | ch = chars[i] 12 | if (ch == "\"" && in_double_quotes == 0) { 13 | in_double_quotes = 1 14 | printf("%s", "\"") 15 | } else if (ch == "\"" && in_double_quotes == 1) { 16 | in_double_quotes = 0 17 | printf("%s", "\"") 18 | } 19 | #else if (ch == "'" && in_double_quotes == 0) { 20 | # printf("%s", "''") 21 | #} 22 | else { 23 | printf("%s", chars[i]) 24 | } 25 | } 26 | } 27 | 28 | BEGIN { 29 | tpl = 0 30 | } 31 | { 32 | if (tpl == 0) { 33 | if ( /^<%.*%>/ ) { ## Single line SQL statement/psql command 34 | sub(/<%\s*/, ""); 35 | sub(/\s*%>/, ""); 36 | print 37 
| } else if ( /^<%/ ) { ## Multi line SQL statement starting 38 | tpl = 1; 39 | sub(/<%\s*/, ""); 40 | print 41 | } else if ( /^\s*$/ ) { ## Empty lines for readability can be removed 42 | #print 43 | } else if ( /^\s*\/\// ) { ## Comments with double slash can be removed 44 | 45 | } else { ## Remaining lines (HTML tags) echo as it is 46 | sub(/^/, "\\echo "); 47 | split($0,a,/[^:]\/\//); ##split the line based on in-line comments with //, except :// 48 | $0=a[1]; ##Remove the inline comment part 49 | psql_echo_escape() ## Replace single quotes outside double quotes with escaped value 50 | printf("\n") 51 | } 52 | } else { ## Following lines of Multi line SQL statement 53 | if ( /%>/ ) { ## Last line of the Multi line SQL statement 54 | tpl = 0; 55 | sub(/%>/, ""); 56 | print 57 | } else { ## All lines in between starting and last line of multi line statement 58 | print 59 | } 60 | } 61 | } 62 | -------------------------------------------------------------------------------- /docs/unusedIndexes.md: -------------------------------------------------------------------------------- 1 | # Unused Indexes 2 | 3 | Unused indexes cause severe penalties in the system: they slow down DML operations for no benefit, consume more memory, cause more I/O, generate more WAL, and give autovacuum more work to do. 4 | 5 | ## From pg_gather 6 | 7 | The following SQL statement can be used against the database where the pg_gather data is imported. 8 | 9 | ``` 10 | SELECT ns.nsname AS "Schema",ci.relname as "Index", ct.relname AS "Table", ptab.relname "TOAST of Table", 11 | indisunique as "UK?",indisprimary as "PK?",numscans as "Scans",size,ci.blocks_fetched "Fetch",ci.blocks_hit*100/nullif(ci.blocks_fetched,0) "C.Hit%", to_char(i.lastuse,'YYYY-MM-DD HH24:MI:SS') "Last Use" 12 | FROM pg_get_index i 13 | JOIN pg_get_class ct ON i.indrelid = ct.reloid 14 | JOIN pg_get_ns ns ON ct.relnamespace = ns.nsoid 15 | JOIN pg_get_class ci ON i.indexrelid = ci.reloid 16 | LEFT JOIN pg_get_toast tst ON ct.reloid = tst.toastid 17 | LEFT JOIN pg_get_class ptab ON tst.relid = ptab.reloid 18 | WHERE tst.relid IS NULL OR ptab.reloid IS NOT NULL 19 | ORDER BY size DESC; 20 | ``` 21 | 22 | ## From database 23 | The following SQL statement can be used against the target database 24 | ``` 25 | SELECT n.nspname AS schema,relid::regclass as table, indexrelid::regclass as index, indisunique, indisprimary 26 | FROM pg_stat_user_indexes 27 | JOIN pg_index i USING (indexrelid) 28 | JOIN pg_class c ON i.indexrelid = c.oid 29 | JOIN pg_namespace n ON c.relnamespace = n.oid 30 | WHERE idx_scan = 0; 31 | ``` 32 | OR more detailed (TOAST and TOAST index) 33 | ``` 34 | SELECT n.nspname AS schema,t.relname "table", c.relname as index, tst.relname "TOAST", 35 | tst.oid "TOAST ID 1", 36 | tstind.relid "TOAST ID 2", 37 | tstind.indexrelname "TOAST Index", 38 | tstind.indexrelid "TOAST INDEX relid", 39 | i.indisunique, i.indisprimary,pg_stat_user_indexes.idx_scan "Index usage", tstind.idx_scan "Toast index usage" 40 | FROM pg_stat_user_indexes 41 | JOIN pg_index i USING (indexrelid) 42 | JOIN pg_class c ON i.indexrelid = c.oid 43 | JOIN pg_class t ON i.indrelid = t.oid 44 | JOIN pg_namespace n ON c.relnamespace = n.oid 45 | LEFT JOIN pg_class tst ON t.reltoastrelid = tst.oid 46 | LEFT JOIN pg_stat_all_indexes tstind ON tst.oid = tstind.relid; 47 | ``` -------------------------------------------------------------------------------- /docs/events/LOGICALLAUNCHERMAIN.html: -------------------------------------------------------------------------------- 1 
|

LogicalLauncherMain

2 | This wait event corresponds to a situation when the Logical Replication Launcher process is waiting in its main sleep loop for something to happen.
3 | Logical Replication Launcher is a background process that is responsible for launching and managing logical replication workers in PostgreSQL.
4 | It periodically wakes up to check for new replication tasks and starts the necessary worker processes to handle them.
5 | When the Logical Replication Launcher is in its main sleep loop, it is essentially idle, waiting for a signal or event that indicates it needs to take action, such as starting a new logical replication worker.
6 | 7 |

How it works

8 | 13 |

When to be concerned

14 | 20 |

Tuning considerations

21 | The max_worker_processes parameter can be adjusted to ensure that there are enough resources available for logical replication workers and other background processes.
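The related limits can be reviewed together; a small sketch against pg_settings (availability of individual parameters varies slightly by version):
```sql
-- Worker-related limits that constrain logical replication.
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_logical_replication_workers',
               'max_sync_workers_per_subscription');
```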
22 | Logical Replication Configuration Parameters -------------------------------------------------------------------------------- /docs/params/default_toast_compression.md: -------------------------------------------------------------------------------- 1 | # default_toast_compression 2 | PostgreSQL allows users to select the compression algorithm used for TOAST compression from PostgreSQL version 14 onwards. 3 | PostgreSQL historically used the built-in algorithm `pglz` as default. However, algorithms like `lz4` showed significant performance gains [1]. 4 | PostgreSQL allows users to select the algorithm on a per-column basis; for example: 5 | ```sql 6 | CREATE TABLE tbl (id int, 7 | col1 text COMPRESSION pglz, 8 | col2 text COMPRESSION lz4, 9 | col3 text); 10 | ``` 11 | `lz4` is highly recommended for JSON datatypes. 12 | ### Requirement: 13 | In order to use `lz4` as the compression algorithm, PostgreSQL should be built with the configuration option `--with-lz4`. You may confirm the configuration options used for building: 14 | ``` 15 | pg_config | grep -i 'with-lz4' 16 | ``` 17 | 18 | ## How to check the current toasting algorithm 19 | Per-tuple, per-column TOAST compression can be checked using `pg_column_compression()`. 20 | For example: 21 | ``` 22 | select id,pg_column_compression(col3) FROM tbl ; 23 | ``` 24 | 25 | ## How to change the toast compression 26 | 1. The compression method used for existing tuples won't change. Only newly inserted tuples will have the new compression method. 27 | 2. `VACUUM FULL` command or `pg_repack` WILL NOT change the compression algorithm. They cannot be used to alter the TOAST compression algorithm. 28 | 3. CREATE TABLE tab AS SELECT ... (CTAS) WILL NOT change the compression algorithm 29 | 4. INSERT INTO tab SELECT ... also WILL NOT change the compression algorithm 30 | 5. Logical dump (`pg_dump`) and `pg_restore` can be used for changing the toast compression 31 | 6. Existing column values of tuples can be changed if there is an operation which requires detoasting the column 32 | ``` 33 | update tbl1 SET col3=col3||'' WHERE pg_column_compression(col3) != 'lz4'; 34 | # or 35 | update tbl SET col3=trim(col3) WHERE pg_column_compression(col3) != 'lz4'; 36 | # or for json 37 | update jsondat set dat = dat || '{}' where pg_column_compression(dat) != 'lz4'; 38 | ``` 39 | 40 | 41 | ## References 42 | 1. https://www.postgresql.fastware.com/blog/what-is-the-new-lz4-toast-compression-in-postgresql-14 43 | 2. https://stackoverflow.com/questions/71086258/query-on-json-jsonb-column-super-slow-can-i-use-an-index -------------------------------------------------------------------------------- /docs/tablespace.md: -------------------------------------------------------------------------------- 1 | # Tablespaces 2 | In PostgreSQL, each tablespace is a storage/mount point location. 3 | 4 | Historically, tablespaces were the only option for spreading the I/O load across multiple disk systems, which was the major use of tablespaces. 5 | However, advancements in LVM have made them less useful these days. LVMs are capable of striping data across different disk systems, which can give the total I/O bandwidth of all the storages put together. 
6 | 7 | ## Checking the tablespaces 8 | ### From pg_gather data 9 | ``` 10 | select * from pg_get_tablespace ; 11 | ``` 12 | 13 | ## Directly from the database 14 | ``` 15 | SELECT spcname AS "Name", 16 | pg_catalog.pg_get_userbyid(spcowner) AS "Owner", 17 | pg_catalog.pg_tablespace_location(oid) AS "Location" 18 | FROM pg_catalog.pg_tablespace 19 | ORDER BY 1; 20 | ``` 21 | 22 | ## Disadvantages of Tablespaces 23 | 1. The DBA will have higher responsibility for monitoring and managing each tablespace and its space availability. 24 | Segregation of storage into multiple mount points can lead to management and monitoring complexities. 25 | Capacity planning needs to be done for each location. 26 | 2. PostgreSQL needs to manage more metadata and dependent metadata in the primary data directory. 27 | 3. Unavailability of a single tablespace can affect the availability of the entire cluster. We might be introducing more failure points by increasing the number of tablespaces. 28 | 4. The standby cluster also needs to have similar tablespaces and file locations. 29 | 5. Backup and recovery become more complex operations. In the event of a disaster, getting a replacement machine with a similar structure might be more involved. 30 | 31 | ## Uses of Tablespaces 32 | Even though there are many disadvantages and maintenance overheads to using tablespaces, they can be useful in some scenarios: 33 | 1. Isolation of I/O load   34 | There could be cases where we want to keep the I/O load on specific tables from affecting I/O operations on other tables. 35 | 2. Separate tablespace for temp   36 | PostgreSQL allows `temp_tablespaces` to use a different tablespace pointing to a different mount point. 37 | 3. Storage with different I/O characteristics   38 | For example, we might want to move old table partitions to cheap, slow storage for archival purposes. If the number of queries hitting those old partitions is very rare, that could be a saving. 39 | -------------------------------------------------------------------------------- /docs/events/TUPLE.html: -------------------------------------------------------------------------------- 1 |

Lock:tuple

2 | The tuple wait event is a specific type of lock contention that occurs at the row level.
3 | The tuple wait event occurs when a backend process is waiting to acquire a lock on a specific tuple (a physical row version).
4 | When a transaction wants to update or delete a row, it must first lock that row. If another transaction already holds a lock on that row,
5 | the second transaction enters a "waiting" state. If the contention is specifically for the right to access the row structure itself or to wait for a prior locker to finish, it is categorized as a Lock: tuple event. 6 |
7 | 8 |

Understanding both tuple & transactionid wait events

9 | transactionid: You are waiting for another transaction to COMMIT or ROLLBACK so you can see if the row is actually available.
10 | tuple: You are waiting in a "queue" to acquire the lock on the row itself. This usually happens when three or more transactions are trying to modify the same row simultaneously. 11 |

12 | When multiple sessions target the same row, the sequence usually looks like this: 13 |

18 | Essentially, tuple is the "waiting room" for row-level locks when there is high concurrency on a single record. 19 |
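A hand-run illustration of that queue, assuming a hypothetical table t with a row id = 1 (names and values are made up):
```sql
-- Session 1: BEGIN; UPDATE t SET val = 1 WHERE id = 1;  -- holds the row lock
-- Session 2: UPDATE t SET val = 2 WHERE id = 1;         -- waits on Lock:transactionid
-- Session 3: UPDATE t SET val = 3 WHERE id = 1;         -- queues on Lock:tuple behind session 2
-- From a fourth session, observe the queue:
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IN ('transactionid', 'tuple');
```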

20 | 21 |

Common Causes of Lock: tuple Wait Events

22 |