├── .gitignore ├── DEVELOP.md ├── LICENSE ├── README.md ├── images └── pg_analytica.jpg ├── ingestor ├── Makefile ├── constants.h ├── export_entry.h ├── file_utils.h ├── generate_test_data.sql ├── ingestor--0.0.1.sql ├── ingestor.c ├── ingestor.control ├── parquet.sql ├── registry.c └── test.sql └── notes.md /.gitignore: -------------------------------------------------------------------------------- 1 | */*.o 2 | */*.so 3 | .vscode/* 4 | .DS_Store -------------------------------------------------------------------------------- /DEVELOP.md: -------------------------------------------------------------------------------- 1 | 2 | ## Postgres Installation using Homebrew: 3 | 4 | ``` 5 | brew install postgresql 6 | ``` 7 | After you install PostgreSQL on your Mac, you will be able to start it via Terminal using the following command: 8 | ``` 9 | brew services start postgresql 10 | ``` 11 | ``` 12 | Note: 13 | You can stop PostgreSQL using the `brew services stop postgresql` command. 14 | ``` 15 | Enter into Postgres using below command: 16 | 17 | ``` 18 | psql postgres 19 | ``` 20 | ## Check PostgreSQL version and PATH 21 | 22 | Check the version and PATH with below command: 23 | 24 | ``` 25 | pg_config --pkglibdir 26 | ``` 27 | The output should be: 28 | /opt/homebrew/lib/postgresql@14 29 | 30 | 31 | ## Running a sample extension 32 | 33 | Create a new directory and place four files there named pg_analytica.c, pg_analytica.control, Makefile and pg_analytica--0.0.1.sql. 34 | 35 | Run make command in the above directory: 36 | 37 | ```python 38 | make CC='/usr/bin/gcc' 39 | make install 40 | ``` 41 | 42 | ## Generate sample data for testing 43 | 44 | Run the below command. 45 | 46 | ``` 47 | psql postgres -f test.sql 48 | ``` 49 | 50 | This creates a table and populates sample rows that can used in testing. 51 | 52 | ## Testing the extension 53 | 54 | Add tests to the test.sql file. 55 | 56 | Tests can be executed using the command below. 57 | ``` 58 | psql postgres -f test.sql 59 | ``` 60 | 61 | ## Viewing Logs 62 | 63 | Logs can be viewed at. 64 | 65 | ``` 66 | tail -100 /opt/homebrew/var/log/postgresql@14.log 67 | ``` 68 | 69 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc., 5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Lesser General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. 
Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. 
You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. 
You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. (This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. 
You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. 
If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | 294 | Copyright (C) 2024 Sushrut Shivaswamy 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 
311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 331 | 332 | , 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 340 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![pg_analytica Banner](images/pg_analytica.jpg) 2 | 3 | ## What is pg_analytica 4 | 5 | *pg_analytica* is a Postgres extension that speeds up analytics queries by upto 90%. 6 | 7 | ### How it works 8 | 9 | *pg_analytica* automatically exports specified columns from your table into [columnar format](https://parquet.apache.org) at regular intervals. The columnar storage allows for blazing-fast querying compared to traditional row-based storage. 10 | 11 | ### Getting Started 12 | 13 | ### Download extension dependencies 14 | 15 | The extension relies on the following dependencies. 16 | - parquet-glib 17 | - arrow-glib 18 | - gio-2.0 19 | 20 | Installation instructions for various platforms can be found on the [Apache Arrow docs](https://arrow.apache.org/install/). 21 | 22 | The extension relies on the [parquet_fdw](https://github.com/adjust/parquet_fdw) extension to query the exported columnar data. Follow the [installation](https://github.com/adjust/parquet_fdw?tab=readme-ov-file#installation) instructions on the repo. 23 | 24 | Ensure that the extension is loaded before proceeding with next steps. 25 | Running the query below within a psql session should show that the extension is installed. 26 | 27 | ``` 28 | postgres=# SELECT * FROM pg_extension WHERE extname = 'parquet_fdw'; 29 | ``` 30 | 31 | 32 | ### Installing pg_analytica 33 | 34 | Clone the extension to your workstation and navigate to the ingestor directory. 35 | 36 | ``` 37 | $ cd ingestor 38 | ``` 39 | 40 | Build and install the extension to your Postgres install path. 41 | 42 | ``` 43 | $ make 44 | $ make install 45 | ``` 46 | 47 | ### Load the extension 48 | 49 | Start a psql session and load the extension. 50 | ``` 51 | $ psql postgres 52 | ``` 53 | 54 | Within the session, load the extension. 
55 | ``` 56 | postgres=# CREATE EXTENSION ingestor; 57 | ``` 58 | 59 | ### Register a table for export 60 | 61 | The extension periodically exports data from registered tables in columnar format to 62 | provide speed ups in querying. To register a table for export use the command below 63 | in the psql session where the extension was loaded. 64 | 65 | ``` 66 | postgres=# SELECT register_table_export( 67 | -- table name 68 | 'your_table_name', 69 | -- list of columns to export 70 | '{column_1,column_2,column_3}', 71 | -- Hours after which table data will be re-exported 72 | 10 73 | ); 74 | ``` 75 | 76 | *pg_analytica* supports the following column types currently. 77 | 78 | | PG Type | Support | 79 | | -------------- | ---------------------- | 80 | | smallint | :white_check_mark: | 81 | | integer | :white_check_mark: | 82 | | bigint | :white_check_mark: | 83 | | float | :white_check_mark: | 84 | | double | :white_check_mark: | 85 | | varchar | :white_check_mark: | 86 | | text | :white_check_mark: | 87 | | boolean | :white_check_mark: | 88 | | timestamp | :white_check_mark: | 89 | 90 | ### Start the export background worker 91 | 92 | The ingestion background worker periodically finds registered tables eligible 93 | for export and exports the data. The worker only needs to be launched once. 94 | 95 | ``` 96 | postgres=# SELECT ingestor_launch(); 97 | ``` 98 | 99 | Once the export process is complete, you will be able to query the table. 100 | To check if your table is ready for export see if your table is listed in the result 101 | for the following query. 102 | 103 | ``` 104 | postgres=# SELECT * FROM analytica_exports WHERE last_run_completed IS NOT NULL; 105 | ``` 106 | 107 | During testing exporting a table with 60M rows and 5 columns took 40min but numbers may vary depending on your machine and size of the table. 108 | 109 | ### Querying the exported table 110 | 111 | Exported tables are queryable using the relation name `analytica_{table_name}`. 112 | If the table name you registered was `test_data` then the exported table will be queried as follows. 113 | ``` 114 | postgres=# SELECT * FROM analytica_test_data; 115 | ``` 116 | 117 | ### Benchmarks 118 | 119 | The extension was tested on a Postgres instance running on an M1 Air Macbook. To see the table used for testing see [here](./ingestor/generate_test_data.sql). 120 | Speed ups in querying columnar store vary based on the query but tests indicate upto 90% decease in query latency! 121 | 122 | 123 | ### Test 1 124 | 125 | Measured using the timing feature in postgres. 126 | - Table Size little over 2GiB 127 | - 30M rows with 2 int64 columns and a varchar column exported 128 | 129 | ``` 130 | \timing on 131 | 132 | 133 | -- Query on row based data 134 | postgres=# SELECT 135 | age, 136 | COUNT(*) 137 | FROM test_data 138 | WHERE age > 50 AND age < 90 AND id > 1000 139 | GROUP BY age; 140 | 141 | Time: 5153.393 ms (00:05.153) 142 | 143 | 144 | -- Query on columnar data 145 | postgres=# SELECT 146 | age, 147 | COUNT(*) 148 | FROM analytica_test_data 149 | WHERE age > 50 AND age < 90 AND id > 1000 150 | GROUP BY age; 151 | 152 | Time: 1633.900 ms (00:01.634) 153 | ``` 154 | 155 | ### Test 2 156 | 157 | Measured using the timing feature in postgres. 
158 | - Table Size little over 5GiB 159 | - 60M rows with 6 columns of different types 160 | 161 | ``` 162 | \timing on 163 | 164 | 165 | -- Query on row based data 166 | SELECT 167 | age, 168 | COUNT(*) 169 | FROM test_data 170 | WHERE age > 50 AND age < 90 AND is_random 171 | GROUP BY age; 172 | 173 | Time: 9446.380 ms (00:09.446) 174 | 175 | 176 | -- Query on columnar data 177 | SELECT 178 | age, 179 | COUNT(*) 180 | FROM analytica_test_data 181 | WHERE age > 50 AND age < 90 AND is_random 182 | GROUP BY age; 183 | 184 | Time: 2312.845 ms (00:02.313) 185 | 186 | postgres=# \timing on 187 | Timing is on. 188 | 189 | -- Query on row based data 190 | postgres=# SELECT COUNT(*) from test_data WHERE age > 50 AND age < 55; 191 | count 192 | --------- 193 | 2399709 194 | (1 row) 195 | 196 | Time: 9904.160 ms (00:09.904) 197 | 198 | 199 | -- Query on columnar data 200 | postgres=# SELECT COUNT(*) from analytica_test_data WHERE age > 50 AND age < 55; 201 | count 202 | --------- 203 | 2398998 204 | (1 row) 205 | 206 | Time: 1399.673 ms (00:01.400) 207 | ``` -------------------------------------------------------------------------------- /images/pg_analytica.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sushrut141/pg_analytica/52d62405a2bbcf175977291ee6bf4734c5889104/images/pg_analytica.jpg -------------------------------------------------------------------------------- /ingestor/Makefile: -------------------------------------------------------------------------------- 1 | # Refer src/makefiles/pgxs.mk in postgres source for details about flags 2 | MODULE_big = ingestor 3 | OBJS = ingestor.o registry.o 4 | EXTENSION = ingestor # the extersion's name 5 | DATA = ingestor--0.0.1.sql # script file to install 6 | #REGRESS = get_sum_test # the test script file 7 | 8 | # `pkg-config --list-all | grep -e parquet` will find if installed 9 | PG_CFLAGS += `pkg-config --cflags parquet-glib` 10 | PG_CFLAGS += `pkg-config --cflags arrow-glib` 11 | PG_CFLAGS += `pkg-config --cflags gio-2.0` 12 | # Run pkg-config commands below to understand how outputs differ 13 | # TODO - PG_LIBS libraries are not passed with `-l`. Using PD_LFAGS 14 | # for both library include path(-L) and library name(-l) 15 | PG_LDFLAGS += `pkg-config --libs-only-L parquet-glib` 16 | PG_LDFLAGS += `pkg-config --libs-only-l parquet-glib` 17 | PG_LDFLAGS += `pkg-config --libs-only-L arrow-glib` 18 | PG_LDFLAGS += `pkg-config --libs-only-l arrow-glib` 19 | PG_LDFLAGS += `pkg-config --libs-only-L gio-2.0` 20 | PG_LDFLAGS += `pkg-config --libs-only-l gio-2.0` 21 | 22 | # for posgres build 23 | PG_CONFIG = pg_config 24 | PGXS := $(shell $(PG_CONFIG) --pgxs) 25 | include $(PGXS) 26 | -------------------------------------------------------------------------------- /ingestor/constants.h: -------------------------------------------------------------------------------- 1 | #ifndef _ANALYTICA_CONSTANTS_H 2 | #define _ANALYTICA_CONSTANTS_H 3 | 4 | enum ExportStatus { PENDING = 0, ACTIVE = 1, INACTIVE = -1 }; 5 | 6 | #endif -------------------------------------------------------------------------------- /ingestor/export_entry.h: -------------------------------------------------------------------------------- 1 | #ifndef _EXPORT_ENTRY_H 2 | #define _EXPORT_ENTRY_H 3 | 4 | #include 5 | 6 | #include "utils/elog.h" 7 | #include "utils/palloc.h" 8 | 9 | /** 10 | * Detail about table to be processed for export. 11 | * Callers are responsible for freeing memory of individual fields. 
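 * (free_export_entry below releases table_name, every string in columns_to_export, and the columns_to_export array itself.)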
12 | */ 13 | typedef struct _ExportEntry { 14 | char *table_name; 15 | char **columns_to_export; 16 | int num_of_columns; 17 | int export_status; 18 | int64 chunk_size; 19 | } ExportEntry; 20 | 21 | void initialize_export_entry(const char *table_name, int num_of_columns, 22 | ExportEntry *entry) { 23 | // Initialize memory for column names. 24 | entry->num_of_columns = num_of_columns; 25 | entry->columns_to_export = (char **)palloc(num_of_columns * sizeof(char *)); 26 | // Initialize memory and set table name. 27 | entry->table_name = (char *)palloc(strlen(table_name) * sizeof(char)); 28 | strcpy(entry->table_name, table_name); 29 | } 30 | 31 | void export_entry_add_column(ExportEntry *entry, char *column_name, 32 | int column_num) { 33 | size_t column_name_size = strlen(column_name); 34 | entry->columns_to_export[column_num] = 35 | (char *)palloc(column_name_size * sizeof(char)); 36 | strcpy(entry->columns_to_export[column_num], column_name); 37 | } 38 | 39 | void free_export_entry(ExportEntry *entry) { 40 | pfree(entry->table_name); 41 | for (int i = 0; i < entry->num_of_columns; i += 1) { 42 | pfree(entry->columns_to_export[i]); 43 | } 44 | pfree(entry->columns_to_export); 45 | } 46 | 47 | #endif -------------------------------------------------------------------------------- /ingestor/file_utils.h: -------------------------------------------------------------------------------- 1 | #ifndef _FILE_UTILS_H 2 | #define _FILE_UTILS_H 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #include "postgres.h" 13 | 14 | /* 15 | * Populates the root path for the extension data. 16 | * Assumes that buffer has PATH_MAX space available. 17 | */ 18 | void populate_root_path(char *out, bool relative) { 19 | if (!relative) { 20 | if (getcwd(out, PATH_MAX) == NULL) { 21 | perror("getcwd"); 22 | } 23 | } else { 24 | strcat(out, "."); 25 | } 26 | strcat(out, "/pg_analytica/"); 27 | } 28 | 29 | /* 30 | * Populates the data path for columnar files for the supplied table in out. 31 | * Assumes that buffer has PATH_MAX space available. 32 | */ 33 | void populate_data_path_for_table(const char *table, char *out, bool relative) { 34 | populate_root_path(out, relative); 35 | strcat(out, table); 36 | } 37 | 38 | /* 39 | * Populates the temp path for columnar files for the supplied table in out. 40 | * Assumes that buffer has PATH_MAX space available. 41 | */ 42 | void populate_temp_path_for_table(const char *table, char *out, bool relative) { 43 | // TODO - getcwd does not work here. 
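// Resulting layout, per the helpers above: <cwd>/pg_analytica/<table>/temp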
44 | populate_data_path_for_table(table, out, relative); 45 | strcat(out, "/temp"); 46 | } 47 | 48 | static void delete_files_in_directory(const char *path) { 49 | DIR *dir = opendir(path); 50 | struct dirent *entry; 51 | while ((entry = readdir(dir)) != NULL) { 52 | char filepath[PATH_MAX]; 53 | snprintf(filepath, sizeof(filepath), "%s/%s", path, entry->d_name); 54 | 55 | // Skip special entries (".", "..") 56 | if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) { 57 | continue; 58 | } 59 | if (unlink(filepath) == -1) { 60 | elog(LOG, "Failed to delete file %s", filepath); 61 | perror("unlink"); 62 | } 63 | } 64 | closedir(dir); 65 | } 66 | 67 | int cleanup_table_data(const char *table_name) { 68 | char temp_path[PATH_MAX]; 69 | char data_path[PATH_MAX]; 70 | populate_temp_path_for_table(table_name, temp_path, /*relative=*/false); 71 | populate_data_path_for_table(table_name, data_path, /*relative=*/false); 72 | 73 | // delete temp files if they exist 74 | delete_files_in_directory(temp_path); 75 | // delete temp directory 76 | if (rmdir(temp_path) != 0) { 77 | elog(LOG, "Failed to delete temp data directory %s", temp_path); 78 | perror("rmdir"); 79 | return 1; 80 | } 81 | // delete main data files 82 | delete_files_in_directory(data_path); 83 | // delete data directory 84 | if (rmdir(data_path) != 0) { 85 | elog(LOG, "Failed to delete temp data file %s", data_path); 86 | perror("rmdir"); 87 | return 1; 88 | } 89 | return 0; 90 | } 91 | 92 | #endif -------------------------------------------------------------------------------- /ingestor/generate_test_data.sql: -------------------------------------------------------------------------------- 1 | DROP TABLE test_data; 2 | 3 | CREATE TABLE test_data ( 4 | id SERIAL PRIMARY KEY, 5 | name VARCHAR(50) NOT NULL, 6 | age INTEGER NOT NULL, 7 | is_random boolean, 8 | rating float4, 9 | created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP 10 | ); 11 | 12 | -- ~5 GiB table 13 | INSERT INTO test_data (name, age, is_random, rating) 14 | SELECT 15 | floor(random() * 1000) || ' Name', -- Generate random names 16 | floor(random() * 100) + 18 AS age, -- Generate random ages between 18-117 17 | random() < 0.5 as is_random, 18 | random() as rating 19 | FROM generate_series(1, 10000000); -------------------------------------------------------------------------------- /ingestor/ingestor--0.0.1.sql: -------------------------------------------------------------------------------- 1 | --complain if script is sourced in psql, rather than via CREATE EXTENSION 2 | \echo Use "CREATE EXTENSION ingestor;" to load this file. \quit 3 | 4 | -- Table to store export metadata 5 | CREATE TABLE analytica_exports ( 6 | table_name text PRIMARY KEY, 7 | last_run_completed TIMESTAMP WITH TIME ZONE, 8 | columns_to_export text[], 9 | export_frequency_hours int, 10 | export_status int, 11 | chunk_size int 12 | ); 13 | 14 | -- Register a postgres table for export 15 | CREATE OR REPLACE FUNCTION register_table_export( 16 | table_name text, 17 | columns_to_export text[], 18 | export_frequency_hours int, 19 | chunk_size int DEFAULT 100000) 20 | RETURNS bigint 21 | AS 'MODULE_PATHNAME' 22 | LANGUAGE C STRICT VOLATILE; 23 | 24 | -- Un-register a table for export and delete columnar data directory. 25 | CREATE OR REPLACE FUNCTION unregister_table_export(table_name text) 26 | RETURNS bigint 27 | AS 'MODULE_PATHNAME' 28 | LANGUAGE C VOLATILE; 29 | 30 | 31 | -- Launch an ingestion worker to export columnar data for tables registered for export. 
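-- A typical flow (illustrative, using the test_data table from generate_test_data.sql):
--   SELECT register_table_export('test_data', '{id,name,age}', 10);
--   SELECT ingestor_launch();
-- The worker only needs to be launched once; see the README for details.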
32 | CREATE OR REPLACE FUNCTION ingestor_launch() 33 | RETURNS pg_catalog.int4 STRICT 34 | AS 'MODULE_PATHNAME' 35 | LANGUAGE C; 36 | 37 | -- function to list all parquet iles in data directory for table 38 | CREATE OR REPLACE FUNCTION list_parquet_files(args jsonb) 39 | returns text[] as 40 | $$ 41 | begin 42 | return array_agg(args->>'dir' || '/' || filename) 43 | from pg_ls_dir(args->>'dir') as files(filename) 44 | where filename ~~ '%.parquet'; 45 | end 46 | $$ 47 | language plpgsql; -------------------------------------------------------------------------------- /ingestor/ingestor.c: -------------------------------------------------------------------------------- 1 | #include "postgres.h" 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | /* Header for arrow parquet */ 10 | #include 11 | #include 12 | #include 13 | 14 | /* These are always necessary for a bgworker */ 15 | #include "miscadmin.h" 16 | #include "postmaster/bgworker.h" 17 | #include "postmaster/interrupt.h" 18 | #include "storage/ipc.h" 19 | #include "storage/latch.h" 20 | #include "storage/lwlock.h" 21 | #include "storage/proc.h" 22 | #include "storage/shmem.h" 23 | 24 | /* these headers are used by this particular worker's code */ 25 | #include "access/xact.h" 26 | #include "commands/dbcommands.h" 27 | #include "constants.h" 28 | #include "executor/spi.h" 29 | #include "export_entry.h" 30 | #include "file_utils.h" 31 | #include "fmgr.h" 32 | #include "lib/stringinfo.h" 33 | #include "pgstat.h" 34 | #include "tcop/utility.h" 35 | #include "utils/acl.h" 36 | #include "utils/builtins.h" 37 | #include "utils/snapmgr.h" 38 | #include "utils/wait_event.h" 39 | 40 | PG_MODULE_MAGIC; 41 | 42 | PG_FUNCTION_INFO_V1(ingestor_launch); 43 | 44 | PGDLLEXPORT void ingestor_main(void) pg_attribute_noreturn(); 45 | 46 | #define MAX_COLUMN_NAME_CHARS 100 47 | #define MAX_SUPPORTED_COLUMNS 100 48 | #define MAX_EXPORT_ENTRIES 10 49 | #define PARQUET_ROW_GROUP_CHUNK_SIZE 10000 50 | 51 | /** Arrow functionality */ 52 | #define ASSIGN_IF_NOT_NULL(check_ptr, dest_ptr) \ 53 | { \ 54 | if (check_ptr != NULL) { \ 55 | *dest_ptr = *check_ptr; \ 56 | } \ 57 | } 58 | 59 | #define LOG_ARROW_ERROR(error) \ 60 | { \ 61 | if (error != NULL) { \ 62 | elog(LOG, error->message); \ 63 | } \ 64 | } 65 | 66 | static char source_database[] = "postgres"; 67 | static char *source_database_role = NULL; 68 | // Wait 5min after a successfull export run before starting again. 69 | // This param can be tuned using GUC variable. 70 | static int ingestor_naptime_sec = 300; 71 | 72 | static void list_current_directories() { 73 | DIR *dir; 74 | struct dirent *entry; 75 | 76 | dir = opendir("."); 77 | if (dir == NULL) { 78 | perror("opendir"); 79 | return; 80 | } 81 | 82 | while ((entry = readdir(dir)) != NULL) { 83 | // Check for "." and ".." 
entries 84 | if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) { 85 | continue; 86 | } 87 | elog(LOG, "Found directory with path %s", entry->d_name); 88 | } 89 | closedir(dir); 90 | } 91 | 92 | GArrowArray *create_int64_array(const int64 *data, int num_values, 93 | GError *error) { 94 | GError *inner_error = NULL; 95 | GArrowInt64ArrayBuilder *int_array_builder; 96 | int_array_builder = garrow_int64_array_builder_new(); 97 | 98 | for (int i = 0; i < num_values; i++) { 99 | int64 value = data[i]; 100 | elog(LOG, "Adding value %d to arrow array", value); 101 | garrow_int64_array_builder_append_value(int_array_builder, value, 102 | &inner_error); 103 | LOG_ARROW_ERROR(inner_error); 104 | ASSIGN_IF_NOT_NULL(inner_error, error); 105 | } 106 | GArrowArray *int_array = garrow_array_builder_finish( 107 | GARROW_ARRAY_BUILDER(int_array_builder), &inner_error); 108 | LOG_ARROW_ERROR(inner_error); 109 | ASSIGN_IF_NOT_NULL(inner_error, error); 110 | g_object_unref(int_array_builder); 111 | 112 | return int_array; 113 | } 114 | 115 | GArrowArray *create_timestamp_array(const int64 *data, int num_values, 116 | GError *error) { 117 | GError *inner_error = NULL; 118 | GArrowTimestampArrayBuilder *timestamp_array_builder; 119 | GTimeZone *time_zone = g_time_zone_new_utc(); 120 | GArrowTimestampDataType *timestamp_data_type = 121 | garrow_timestamp_data_type_new(GARROW_TIME_UNIT_SECOND, time_zone); 122 | timestamp_array_builder = 123 | garrow_timestamp_array_builder_new(timestamp_data_type); 124 | 125 | for (int i = 0; i < num_values; i++) { 126 | int64 value = data[i]; 127 | elog(LOG, "Adding value %ld to arrow array", value); 128 | garrow_timestamp_array_builder_append(timestamp_array_builder, value, 129 | &inner_error); 130 | LOG_ARROW_ERROR(inner_error); 131 | ASSIGN_IF_NOT_NULL(inner_error, error); 132 | } 133 | GArrowArray *int_array = garrow_array_builder_finish( 134 | GARROW_ARRAY_BUILDER(timestamp_array_builder), &inner_error); 135 | LOG_ARROW_ERROR(inner_error); 136 | ASSIGN_IF_NOT_NULL(inner_error, error); 137 | 138 | g_object_unref(timestamp_array_builder); 139 | g_object_unref(timestamp_data_type); 140 | 141 | return int_array; 142 | } 143 | 144 | GArrowArray *create_double_array(const double *data, int num_values, 145 | GError *error) { 146 | GError *inner_error = NULL; 147 | GArrowDoubleArrayBuilder *double_array_builder; 148 | double_array_builder = garrow_double_array_builder_new(); 149 | 150 | for (int i = 0; i < num_values; i++) { 151 | double value = data[i]; 152 | elog(LOG, "Adding value %f to arrow array", value); 153 | garrow_double_array_builder_append_value(double_array_builder, value, 154 | &inner_error); 155 | LOG_ARROW_ERROR(inner_error); 156 | ASSIGN_IF_NOT_NULL(inner_error, error); 157 | } 158 | GArrowArray *double_array = garrow_array_builder_finish( 159 | GARROW_ARRAY_BUILDER(double_array_builder), &inner_error); 160 | LOG_ARROW_ERROR(inner_error); 161 | ASSIGN_IF_NOT_NULL(inner_error, error); 162 | g_object_unref(double_array_builder); 163 | 164 | return double_array; 165 | } 166 | 167 | GArrowArray *create_bool_array(const int16 *data, int num_values, 168 | GError *error) { 169 | GError *inner_error = NULL; 170 | GArrowBooleanArrayBuilder *bool_array_builder; 171 | bool_array_builder = garrow_boolean_array_builder_new(); 172 | 173 | for (int i = 0; i < num_values; i++) { 174 | bool value = data[i] == 1 ? 
true : false; 175 | elog(LOG, "Adding value %d to arrow array", value); 176 | garrow_boolean_array_builder_append_value(bool_array_builder, value, 177 | &inner_error); 178 | LOG_ARROW_ERROR(inner_error); 179 | ASSIGN_IF_NOT_NULL(inner_error, error); 180 | } 181 | GArrowArray *bool_array = garrow_array_builder_finish( 182 | GARROW_ARRAY_BUILDER(bool_array_builder), &inner_error); 183 | LOG_ARROW_ERROR(inner_error); 184 | ASSIGN_IF_NOT_NULL(inner_error, error); 185 | g_object_unref(bool_array_builder); 186 | 187 | return bool_array; 188 | } 189 | 190 | GArrowArray *create_string_array(const char **data, int num_values, 191 | GError *error) { 192 | GError *inner_error = NULL; 193 | GArrowStringArrayBuilder *string_array_builder; 194 | string_array_builder = garrow_string_array_builder_new(); 195 | 196 | for (int i = 0; i < num_values; i++) { 197 | char *value = data[i]; 198 | if (value == 0) { 199 | value = ""; 200 | } 201 | elog(LOG, "Adding value %s to arrow array", value); 202 | garrow_string_array_builder_append_value(string_array_builder, value, 203 | &inner_error); 204 | ASSIGN_IF_NOT_NULL(inner_error, error); 205 | } 206 | 207 | GArrowArray *string_array = garrow_array_builder_finish( 208 | GARROW_ARRAY_BUILDER(string_array_builder), &inner_error); 209 | LOG_ARROW_ERROR(inner_error); 210 | ASSIGN_IF_NOT_NULL(inner_error, error); 211 | g_object_unref(string_array_builder); 212 | return string_array; 213 | } 214 | 215 | /* Stores information about column names and column types. */ 216 | typedef struct _ColumnInfo { 217 | char column_name[MAX_COLUMN_NAME_CHARS]; 218 | Oid column_type; 219 | } ColumnInfo; 220 | 221 | /** 222 | * Returns list of columns and type of each column. 223 | * num_of_columns is populated with columns in the table. 224 | * Caller should free the returned pointer. 
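 * The returned array has *num_of_columns entries, one per pg_attribute row with attnum > 0.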
225 | */ 226 | static ColumnInfo *get_column_types(const char *table_name, 227 | int *num_of_columns) { 228 | StringInfoData buf; 229 | initStringInfo(&buf); 230 | // TODO - ensure column schema is deterministic even after new columns are 231 | // added 232 | appendStringInfo(&buf, 233 | "SELECT attname, atttypid FROM pg_attribute WHERE attrelid " 234 | "= '%s'::regclass AND attnum > 0;", 235 | table_name); 236 | int status = SPI_execute(buf.data, true, 0); 237 | elog(LOG, "Executed SPI_execute query %s with status %d", buf.data, status); 238 | if (status != SPI_OK_SELECT) { 239 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 240 | errmsg("Failed to deduce column types"))); 241 | } 242 | elog(LOG, "Found %d processed rows", SPI_processed); 243 | bool isnull; 244 | ColumnInfo *columns = palloc_array(ColumnInfo, SPI_processed); 245 | for (int i = 0; i < SPI_processed; i += 1) { 246 | ColumnInfo column_info; 247 | Datum column_name_data = 248 | SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 1, &isnull); 249 | Name column_name = DatumGetName(column_name_data); 250 | char *column_name_value = column_name->data; 251 | elog(LOG, "Acquired column name as %s", column_name_value); 252 | 253 | Datum column_type_data = 254 | SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 2, &isnull); 255 | Oid column_type_value = DatumGetObjectId(column_type_data); 256 | elog(LOG, "Acquired column type as %d", column_type_value); 257 | 258 | strcpy(column_info.column_name, column_name_value); 259 | column_info.column_type = column_type_value; 260 | 261 | columns[i] = column_info; 262 | } 263 | *num_of_columns = SPI_processed; 264 | elog(LOG, "Finished extracting column types"); 265 | return columns; 266 | } 267 | 268 | /* 269 | * Creates Arrow schema for the table for basic types. 270 | * Caller is reponsible for freeing schema memory. 
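 * Type mapping used below: INT2/INT4/INT8 -> int64, FLOAT4/FLOAT8 -> double, TEXT/VARCHAR -> utf8, BOOL -> boolean, TIMESTAMP -> timestamp[s, UTC].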
271 | */ 272 | static GArrowSchema *create_table_schema(const ColumnInfo *column_info, 273 | const ExportEntry *entry, 274 | int overall_num_columns) { 275 | GError *error = NULL; 276 | GList *fields = NULL; 277 | GArrowSchema *temp = NULL; 278 | 279 | // Define data schema 280 | elog(LOG, "Creating arrow schema"); 281 | temp = garrow_schema_new(fields); 282 | for (int i = 0; i < entry->num_of_columns; i += 1) { 283 | const char *column_name = entry->columns_to_export[i]; 284 | // TODO - move column type deduction to util 285 | Oid column_type; 286 | bool column_type_found = false; 287 | for (int j = 0; j < overall_num_columns; j += 1) { 288 | if (strcmp(column_info[j].column_name, column_name) == 0) { 289 | column_type = column_info[j].column_type; 290 | column_type_found = true; 291 | break; 292 | } 293 | } 294 | Assert(column_type_found); 295 | elog(LOG, "Adding column %s to schema with type %d", column_name, 296 | column_type); 297 | switch (column_type) { 298 | case INT2OID: 299 | case INT4OID: 300 | case INT8OID: { 301 | GArrowDataType *int_type = garrow_int64_data_type_new(); 302 | GArrowField *int_field = garrow_field_new(column_name, int_type); 303 | 304 | temp = garrow_schema_add_field(temp, i, int_field, &error); 305 | 306 | g_object_unref(int_field); 307 | g_object_unref(int_type); 308 | break; 309 | } 310 | case FLOAT4OID: 311 | case FLOAT8OID: { 312 | GArrowDataType *precision_type = garrow_double_data_type_new(); 313 | GArrowField *precision_field = 314 | garrow_field_new(column_name, precision_type); 315 | 316 | temp = garrow_schema_add_field(temp, i, precision_field, &error); 317 | 318 | g_object_unref(precision_field); 319 | g_object_unref(precision_type); 320 | break; 321 | } 322 | case TEXTOID: 323 | case VARCHAROID: { 324 | GArrowDataType *string_type = garrow_string_data_type_new(); 325 | GArrowField *string_field = garrow_field_new(column_name, string_type); 326 | 327 | temp = garrow_schema_add_field(temp, i, string_field, &error); 328 | 329 | g_object_unref(string_field); 330 | g_object_unref(string_type); 331 | break; 332 | } 333 | case BOOLOID: { 334 | GArrowDataType *bool_type = garrow_boolean_data_type_new(); 335 | GArrowField *bool_field = garrow_field_new(column_name, bool_type); 336 | 337 | temp = garrow_schema_add_field(temp, i, bool_field, &error); 338 | 339 | g_object_unref(bool_field); 340 | g_object_unref(bool_type); 341 | break; 342 | } 343 | case TIMESTAMPOID: { 344 | GTimeZone *time_zone = g_time_zone_new_utc(); 345 | GArrowDataType *timestamp_type = 346 | garrow_timestamp_data_type_new(GARROW_TIME_UNIT_SECOND, time_zone); 347 | GArrowField *timestamp_field = 348 | garrow_field_new(column_name, timestamp_type); 349 | 350 | temp = garrow_schema_add_field(temp, i, timestamp_field, &error); 351 | 352 | g_object_unref(timestamp_field); 353 | g_object_unref(timestamp_type); 354 | break; 355 | } 356 | default: 357 | elog(LOG, "Column type %s not yet supported in columnar schema", 358 | column_type); 359 | break; 360 | } 361 | } 362 | const char *schema_str = garrow_schema_to_string(temp); 363 | elog(LOG, "Created schema with string %s", schema_str); 364 | return temp; 365 | } 366 | 367 | static int64 *create_int64_data_array(Datum *data, int num_values) { 368 | int64 *int_data = (int64 *)palloc(num_values * sizeof(int64)); 369 | for (int i = 0; i < num_values; i += 1) { 370 | if (data[i] == ULONG_MAX) { 371 | int_data[i] = 0; 372 | } else { 373 | int_data[i] = DatumGetInt64(data[i]); 374 | } 375 | } 376 | return int_data; 377 | } 378 | 379 | static double 
*create_double_data_array(Datum *data, int num_values) { 380 | double *double_data = (double *)palloc(num_values * sizeof(double)); 381 | for (int i = 0; i < num_values; i += 1) { 382 | if (data[i] == ULONG_MAX) { 383 | double_data[i] = 0.0f; 384 | } else { 385 | double_data[i] = DatumGetFloat8(data[i]); 386 | } 387 | } 388 | return double_data; 389 | } 390 | 391 | static int16 *create_bool_data_array(Datum *data, int num_values) { 392 | int16 *bool_data = (int16 *)palloc(num_values * sizeof(int16)); 393 | for (int i = 0; i < num_values; i += 1) { 394 | if (data[i] == ULONG_MAX) { 395 | bool_data[i] = 0; 396 | } else { 397 | bool_data[i] = DatumGetBool(data[i]) ? 1 : 0; 398 | } 399 | } 400 | return bool_data; 401 | } 402 | 403 | static int64 *create_timestamp_data_array(Datum *data, int num_values) { 404 | int64 *timestamp_data = (int64 *)palloc(num_values * sizeof(int64)); 405 | for (int i = 0; i < num_values; i += 1) { 406 | if (data[i] == ULONG_MAX) { 407 | timestamp_data[i] = 0; 408 | } else { 409 | Timestamp pg_timestanp = DatumGetTimestamp(data[i]); 410 | timestamp_data[i] = timestamptz_to_time_t(pg_timestanp); 411 | } 412 | } 413 | return timestamp_data; 414 | } 415 | 416 | static char **create_string_data_array(Datum *data, int num_values) { 417 | char **string_data = (char **)palloc(num_values * sizeof(char *)); 418 | for (int i = 0; i < num_values; i += 1) { 419 | if (data[i] == ULONG_MAX) { 420 | char *cstring = ""; 421 | string_data[i] = (char *)palloc(sizeof(char)); 422 | strcpy(string_data[i], cstring); 423 | } else { 424 | char *cstring = TextDatumGetCString(data[i]); 425 | string_data[i] = (char *)palloc(sizeof(char) * strlen(cstring) + 1); 426 | strcpy(string_data[i], cstring); 427 | } 428 | } 429 | return string_data; 430 | } 431 | 432 | /** 433 | * Returns Arrow table pointer with data populated from SPI query. 434 | * Expects SPI query to read desired table rows has been executed. 435 | * Caller is reponsible for freeing table memory. 436 | */ 437 | static GArrowTable *create_arrow_table(GArrowSchema *schema, 438 | const ColumnInfo *column_info, 439 | const ExportEntry *entry, 440 | int overall_num_columns) { 441 | Assert(SPI_processed > 0); 442 | elog(LOG, "Creating arrow table"); 443 | int num_export_columns = entry->num_of_columns; 444 | GArrowArray **arrow_arrays = 445 | (GArrowArray **)palloc(num_export_columns * sizeof(GArrowArray)); 446 | Datum **table_data = (Datum **)palloc(num_export_columns * sizeof(Datum *)); 447 | // Iterate over all rows for each column and create Datum Arays for each 448 | // column Datums are just pointers to the actual data so the additional memory 449 | // of creating temp datums array is very low. 
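// NULL values are recorded with the ULONG_MAX sentinel and later mapped to a type-specific default (0, 0.0, empty string) when the arrow arrays are built.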
450 | for (int i = 0; i < num_export_columns; i += 1) { 451 | // Copy datums to temp variable to access all datums for a column 452 | table_data[i] = (Datum *)palloc(SPI_processed * sizeof(Datum)); 453 | for (int j = 0; j < SPI_processed; j += 1) { 454 | bool isnull; 455 | Datum datum = SPI_getbinval(SPI_tuptable->vals[j], SPI_tuptable->tupdesc, 456 | i + 1, &isnull); 457 | if (!isnull) { 458 | table_data[i][j] = datum; 459 | } else { 460 | table_data[i][j] = ULONG_MAX; 461 | } 462 | } 463 | } 464 | GError *error = NULL; 465 | // Populate arrow arrays for each column 466 | for (int i = 0; i < num_export_columns; i += 1) { 467 | // TODO - move column type deduction to util 468 | const char *column_name = entry->columns_to_export[i]; 469 | Oid column_type; 470 | bool column_type_found = false; 471 | for (int j = 0; j < overall_num_columns; j += 1) { 472 | if (strcmp(column_info[j].column_name, column_name) == 0) { 473 | column_type = column_info[j].column_type; 474 | column_type_found = true; 475 | break; 476 | } 477 | } 478 | Assert(column_type_found); 479 | elog(LOG, "Adding values to arrow array for column %s", column_name); 480 | switch (column_type) { 481 | case INT2OID: 482 | case INT4OID: 483 | case INT8OID: { 484 | const int64 *int_data = 485 | create_int64_data_array(table_data[i], SPI_processed); 486 | arrow_arrays[i] = create_int64_array(int_data, SPI_processed, error); 487 | pfree(int_data); 488 | elog(LOG, "Created int data array"); 489 | LOG_ARROW_ERROR(error); 490 | break; 491 | } 492 | case FLOAT4OID: 493 | case FLOAT8OID: { 494 | const double *double_data = 495 | create_double_data_array(table_data[i], SPI_processed); 496 | arrow_arrays[i] = create_double_array(double_data, SPI_processed, error); 497 | pfree(double_data); 498 | elog(LOG, "Created int data array"); 499 | LOG_ARROW_ERROR(error); 500 | break; 501 | } 502 | case TEXTOID: 503 | case VARCHAROID: { 504 | const char **string_data = 505 | create_string_data_array(table_data[i], SPI_processed); 506 | arrow_arrays[i] = create_string_array(string_data, SPI_processed, error); 507 | for (int k = 0; k < SPI_processed; k += 1) { 508 | pfree(string_data[k]); 509 | } 510 | pfree(string_data); 511 | elog(LOG, "Created string data array"); 512 | LOG_ARROW_ERROR(error); 513 | break; 514 | } 515 | case BOOLOID: { 516 | const int16 *bool_data = 517 | create_bool_data_array(table_data[i], SPI_processed); 518 | arrow_arrays[i] = create_bool_array(bool_data, SPI_processed, error); 519 | pfree(bool_data); 520 | elog(LOG, "Created bool data array"); 521 | LOG_ARROW_ERROR(error); 522 | break; 523 | } 524 | case TIMESTAMPOID: { 525 | const int64 *timestamp_data = 526 | create_timestamp_data_array(table_data[i], SPI_processed); 527 | arrow_arrays[i] = 528 | create_timestamp_array(timestamp_data, SPI_processed, error); 529 | pfree(timestamp_data); 530 | elog(LOG, "Created timestamp data array"); 531 | LOG_ARROW_ERROR(error); 532 | break; 533 | } 534 | default: 535 | elog(LOG, "Column type %s not yet supported in columnar schema", 536 | column_type); 537 | break; 538 | } 539 | } 540 | GArrowTable *table = 541 | garrow_table_new_arrays(schema, arrow_arrays, num_export_columns, &error); 542 | LOG_ARROW_ERROR(error); 543 | 544 | for (int i = 0; i < num_export_columns; i += 1) { 545 | g_object_unref(arrow_arrays[i]); 546 | pfree(table_data[i]); 547 | } 548 | pfree(arrow_arrays); 549 | pfree(table_data); 550 | 551 | const char *table_definition = garrow_table_to_string(table, &error); 552 | elog(LOG, "Created arrow table with definition %s", 
table_definition); 553 | return table; 554 | } 555 | 556 | static void write_arrow_table(const char *table_name, int chunk_num, 557 | GArrowSchema *schema, GArrowTable *table) { 558 | char path[PATH_MAX]; 559 | char file_name[PATH_MAX]; 560 | populate_temp_path_for_table(table_name, path, /*relative=*/true); 561 | sprintf(file_name, "/%d.parquet", chunk_num); 562 | strcat(path, file_name); 563 | elog(LOG, "Attempting to write file %s", path); 564 | 565 | GError *error = NULL; 566 | GParquetWriterProperties *writer_properties = 567 | gparquet_writer_properties_new(); 568 | GParquetArrowFileWriter *writer = gparquet_arrow_file_writer_new_path( 569 | schema, path, writer_properties, &error); 570 | LOG_ARROW_ERROR(error); 571 | 572 | gboolean is_write_successfull = gparquet_arrow_file_writer_write_table( 573 | writer, table, PARQUET_ROW_GROUP_CHUNK_SIZE, &error); 574 | LOG_ARROW_ERROR(error); 575 | 576 | gboolean file_closed = gparquet_arrow_file_writer_close(writer, &error); 577 | LOG_ARROW_ERROR(error); 578 | 579 | Assert(file_closed); 580 | 581 | g_object_unref(table); 582 | g_object_unref(writer); 583 | g_object_unref(writer_properties); 584 | 585 | elog(LOG, "Sucessfully wrote columnar file to disk."); 586 | } 587 | 588 | void delete_export_entry(const char *table_name) { 589 | StringInfoData buf; 590 | initStringInfo(&buf); 591 | appendStringInfo(&buf, 592 | "DELETE FROM analytica_exports WHERE table_name = '%s';", 593 | table_name); 594 | 595 | SetCurrentStatementStartTimestamp(); 596 | StartTransactionCommand(); 597 | PushActiveSnapshot(GetTransactionSnapshot()); 598 | int connection = SPI_connect(); 599 | if (connection < 0) { 600 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 601 | errmsg("Failed to connect to database"))); 602 | } 603 | elog(LOG, "Executing SPI_execute query %s", buf.data); 604 | int status = SPI_execute(buf.data, /*read_only=*/false, /*count=*/0); 605 | elog(LOG, "Executed SPI_execute command with status %d", status); 606 | if (status != SPI_OK_DELETE) { 607 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 608 | errmsg("Failed to delete export entry."))); 609 | } 610 | SPI_finish(); 611 | PopActiveSnapshot(); 612 | CommitTransactionCommand(); 613 | } 614 | 615 | static void get_tables_to_process_in_order(ExportEntry *entries, 616 | int *num_of_tables) { 617 | StringInfoData buf; 618 | initStringInfo(&buf); 619 | appendStringInfo(&buf, "SELECT \ 620 | table_name, \ 621 | columns_to_export, \ 622 | last_run_completed, \ 623 | export_frequency_hours, \ 624 | export_status, \ 625 | chunk_size, \ 626 | now() \ 627 | FROM analytica_exports \ 628 | ORDER BY last_run_completed \ 629 | LIMIT %d;", 630 | MAX_EXPORT_ENTRIES); 631 | 632 | SetCurrentStatementStartTimestamp(); 633 | StartTransactionCommand(); 634 | PushActiveSnapshot(GetTransactionSnapshot()); 635 | int connection = SPI_connect(); 636 | if (connection < 0) { 637 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 638 | errmsg("Failed to connect to database"))); 639 | } 640 | elog(LOG, "Created connection for query"); 641 | elog(LOG, "Executing SPI_execute query %s", buf.data); 642 | int status = SPI_execute(buf.data, /*read_only=*/true, /*count=*/0); 643 | elog(LOG, "Executed SPI_execute command with status %d", status); 644 | if (status < 0) { 645 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 646 | errmsg("Failed to fetch tables to export."))); 647 | } 648 | 649 | elog(LOG, "Beginning processing of %d rows", SPI_processed); 650 | int valid_entries = 0; 651 | 652 | // Calaculate number 
of valid entries to export. 653 |   for (int i = 0; i < SPI_processed; i += 1) { 654 |     bool isnull; 655 |     bool is_valid_entry = false; 656 |     Datum last_completed_datum = 657 |         SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 3, &isnull); 658 |     elog(LOG, "Read last completed datum"); 659 |     if (isnull) { 660 |       // Table is newly scheduled for export so last run completed is null 661 |       is_valid_entry = true; 662 |       elog(LOG, "last completed datum is null"); 663 |     } else { 664 |       elog(LOG, "Evaluating if older export entry is past threshold"); 665 |       Datum export_frequency_datum = SPI_getbinval( 666 |           SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 4, &isnull); 667 |       int export_frequency = DatumGetInt16(export_frequency_datum); 668 | 669 |       int64 last_completed = DatumGetTimestampTz(last_completed_datum); 670 |       int64 last_run_pg_time = timestamptz_to_time_t(last_completed); 671 | 672 |       Datum current_time_datum = SPI_getbinval( 673 |           SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 7, &isnull); 674 |       TimestampTz current_time = DatumGetTimestampTz(current_time_datum); 675 |       int64 current_time_pg_time = timestamptz_to_time_t(current_time); 676 | 677 |       elog(LOG, "last run time %lld", last_run_pg_time); 678 |       elog(LOG, "current time %lld", current_time_pg_time); 679 | 680 |       int64 seconds_elapsed = (current_time_pg_time - last_run_pg_time); 681 |       elog(LOG, "seconds elapsed are %lld", seconds_elapsed); 682 |       uint64 hours_elapsed = (seconds_elapsed) / 3600; 683 |       elog(LOG, "%llu hours elapsed since last export", hours_elapsed); 684 |       if (hours_elapsed > export_frequency) { 685 |         is_valid_entry = true; 686 |         elog(LOG, "Marking older entry for export again"); 687 |       } else { 688 |         elog(LOG, "Older export entry threshold hasn't passed"); 689 |       } 690 |     } 691 | 692 |     /** 693 |      * Surface inactive entries for cleanup later.
694 |      */ 695 |     Datum export_status_datum = 696 |         SPI_getbinval(SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 5, &isnull); 697 |     int export_status = DatumGetInt32(export_status_datum); 698 |     if (export_status == INACTIVE) { 699 |       is_valid_entry = true; 700 |     } 701 | 702 |     if (is_valid_entry) { 703 |       valid_entries += 1; 704 | 705 |       Datum name_datum = SPI_getbinval(SPI_tuptable->vals[i], 706 |                                        SPI_tuptable->tupdesc, 1, &isnull); 707 |       char *table_name = TextDatumGetCString(name_datum); 708 | 709 |       Datum chunk_size_datum = SPI_getbinval(SPI_tuptable->vals[i], 710 |                                              SPI_tuptable->tupdesc, 6, &isnull); 711 |       int64 chunk_size = DatumGetInt64(chunk_size_datum); 712 | 713 |       ArrayType *arr = DatumGetArrayTypeP(SPI_getbinval( 714 |           SPI_tuptable->vals[i], SPI_tuptable->tupdesc, 2, &isnull)); 715 |       Datum *column_datums; 716 |       int num_of_columns; 717 |       deconstruct_array(arr, TEXTOID, -1, false, TYPALIGN_INT, &column_datums, 718 |                         NULL, &num_of_columns); 719 | 720 |       ExportEntry entry; 721 |       initialize_export_entry(table_name, num_of_columns, &entry); 722 |       entry.export_status = export_status; 723 |       entry.chunk_size = chunk_size; 724 | 725 |       for (int j = 0; j < num_of_columns; j += 1) { 726 |         char *column_name = TextDatumGetCString(column_datums[j]); 727 |         export_entry_add_column(&entry, column_name, j); 728 |       } 729 |       entries[valid_entries - 1] = entry; // pack valid entries densely so callers can iterate [0, num_of_tables) 730 |     } 731 |   } 732 |   *num_of_tables = valid_entries; 733 |   SPI_finish(); 734 |   PopActiveSnapshot(); 735 |   CommitTransactionCommand(); 736 | } 737 | 738 | static void setup_data_directories(const char *table_name) { 739 | 740 |   char root_path[PATH_MAX]; 741 |   char base_path[PATH_MAX]; 742 |   char temp_path[PATH_MAX]; 743 | 744 |   populate_root_path(root_path, /*relative=*/false); 745 |   populate_data_path_for_table(table_name, base_path, /*relative=*/false); 746 |   populate_temp_path_for_table(table_name, temp_path, /*relative=*/false); 747 | 748 |   elog(LOG, "Setting up root directory %s", root_path); 749 |   elog(LOG, "Setting up base directory %s", base_path); 750 |   elog(LOG, "Setting up temp directory %s", temp_path); 751 | 752 |   // Create root data directory if not exists. 753 |   if (mkdir(root_path, 0755) == -1) { 754 |     perror("mkdir"); 755 |     elog(LOG, "Failed to create root directory %s", root_path); 756 |   } 757 | 758 |   // Create base data directory if not exists. 759 |   if (mkdir(base_path, 0755) == -1) { 760 |     perror("mkdir"); 761 |     elog(LOG, "Failed to create data directory %s", base_path); 762 |   } 763 | 764 |   // Create temp directory if not exists. 765 |   if (mkdir(temp_path, 0755) == -1) { 766 |     perror("mkdir"); 767 |     elog(LOG, "Failed to create temp directory %s", temp_path); 768 |   } 769 | 770 |   // Delete content in temp directory. 771 |   DIR *dir = opendir(temp_path); 772 |   struct dirent *entry; 773 |   while (dir != NULL && (entry = readdir(dir)) != NULL) { 774 |     char filepath[PATH_MAX]; 775 |     snprintf(filepath, sizeof(filepath), "%s/%s", temp_path, entry->d_name); 776 | 777 |     // Skip special entries (".", "..") 778 |     if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) { 779 |       continue; 780 |     } 781 |     if (unlink(filepath) == -1) { 782 |       elog(LOG, "Failed to delete temp file %s", filepath); 783 |     } 784 |   } 785 |   if (dir != NULL) { closedir(dir); } 786 |   elog(LOG, "Data directory setup complete for %s", table_name); 787 | } 788 | 789 | /** 790 |  * Moves columnar files from temp directory to data directory for table. 791 |  * Existing files in data directory are deleted. 792 |  * Currently this moves files one by one, so it isn't atomic.
793 | */ 794 | void move_temp_files(const char *table_name) { 795 | DIR *dir; 796 | struct dirent *entry; 797 | 798 | char temp_path[PATH_MAX]; 799 | char data_path[PATH_MAX]; 800 | 801 | populate_data_path_for_table(table_name, data_path, false); 802 | populate_temp_path_for_table(table_name, temp_path, false); 803 | 804 | // delete older parquet files in data directory 805 | // This might cause issues if an existing query is reading them. 806 | // There might be a way to acquire lock for file in unix. 807 | dir = opendir(data_path); 808 | while ((entry = readdir(dir)) != NULL) { 809 | char filepath[PATH_MAX]; 810 | snprintf(filepath, sizeof(filepath), "%s/%s", data_path, entry->d_name); 811 | 812 | // Skip special entries (".", "..") 813 | if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) { 814 | continue; 815 | } 816 | if (unlink(filepath) == -1) { 817 | elog(LOG, "Failed to delete data file %s", filepath); 818 | } 819 | } 820 | closedir(dir); 821 | 822 | dir = opendir(temp_path); 823 | if (dir == NULL) { 824 | perror("opendir"); 825 | return; 826 | } 827 | 828 | while ((entry = readdir(dir)) != NULL) { 829 | if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) { 830 | continue; 831 | } 832 | elog(LOG, "Found entry with path %s", entry->d_name); 833 | char src_path[PATH_MAX]; 834 | char dest_path[PATH_MAX]; 835 | 836 | // temp columnar file 837 | strcpy(src_path, temp_path); 838 | strcat(src_path, "/"); 839 | strcat(src_path, entry->d_name); 840 | 841 | // Final path 842 | strcpy(dest_path, data_path); 843 | strcat(dest_path, "/"); 844 | strcat(dest_path, entry->d_name); 845 | 846 | elog(LOG, "Temp path %s", src_path); 847 | elog(LOG, "Data path %s", dest_path); 848 | 849 | if (rename(src_path, dest_path) == 0) { 850 | elog(LOG, "File '%s' renamed to '%s' successfully.\n", src_path, 851 | dest_path); 852 | } else { 853 | perror("rename"); // Print error message if rename fails 854 | } 855 | } 856 | closedir(dir); 857 | } 858 | 859 | char *get_columns_string(char **columns, int num_of_columns) { 860 | int total_size = 0; 861 | for (int i = 0; i < num_of_columns; i++) { 862 | char *column_name = columns[i]; 863 | total_size += strlen(column_name) + 1; 864 | elog(LOG, "Size of column %s is %d", column_name, strlen(column_name)); 865 | } 866 | char *combined_string = (char *)palloc(total_size * sizeof(char)); 867 | combined_string[0] = '\0'; 868 | for (int i = 0; i < num_of_columns; i++) { 869 | strcat(combined_string, columns[i]); 870 | if (i < num_of_columns - 1) { 871 | strcat(combined_string, ","); 872 | } 873 | } 874 | return combined_string; 875 | } 876 | 877 | void export_table_data(ExportEntry entry) { 878 | int offset = 0; 879 | int64 processed_count; 880 | int select; 881 | 882 | SetCurrentStatementStartTimestamp(); 883 | StartTransactionCommand(); 884 | PushActiveSnapshot(GetTransactionSnapshot()); 885 | int connection = SPI_connect(); 886 | if (connection == SPI_ERROR_CONNECT) { 887 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 888 | errmsg("Failed to connect to database"))); 889 | } 890 | elog(LOG, "Created connection for query"); 891 | 892 | int total_columns; 893 | elog(LOG, "Trying to extract column types"); 894 | ColumnInfo *column_info = get_column_types(entry.table_name, &total_columns); 895 | elog(LOG, "Extracted %d column types", total_columns); 896 | 897 | GArrowSchema *arrow_schema = 898 | create_table_schema(column_info, &entry, total_columns); 899 | 900 | int chunk = 0; 901 | char *column_str = 902 | 
get_columns_string(entry.columns_to_export, entry.num_of_columns); 903 | 904 |   do { 905 |     StringInfoData buf; 906 |     initStringInfo(&buf); 907 |     // TODO - order of columns in result should match schema columns 908 |     appendStringInfo(&buf, "SELECT %s FROM %s LIMIT " INT64_FORMAT " OFFSET %d;", column_str, 909 |                      entry.table_name, entry.chunk_size, offset); 910 | 911 |     // Execute the chunked query using SPI_exec 912 |     elog(LOG, "Executing SPI_execute query %s", buf.data); 913 |     SetCurrentStatementStartTimestamp(); 914 |     select = SPI_execute(buf.data, true, 0); 915 |     elog(LOG, "Executed SPI_execute command with status %d", select); 916 | 917 |     if (select != SPI_OK_SELECT) { 918 |       ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 919 |                       errmsg("SELECT Query execution failed"))); 920 |     } 921 |     processed_count = SPI_processed; 922 |     elog(LOG, "Processed %lld rows", processed_count); 923 |     elog(LOG, "Number of rows is %lu", SPI_tuptable->numvals); 924 |     if (processed_count > 0) { 925 |       GArrowTable *arrow_table = 926 |           create_arrow_table(arrow_schema, column_info, &entry, total_columns); 927 |       // write to disk 928 |       write_arrow_table(entry.table_name, chunk, arrow_schema, arrow_table); 929 |       chunk += 1; 930 |     } 931 |     // Increment offset by number of processed rows 932 |     offset += processed_count; 933 |   } while (processed_count > 0); 934 |   elog(LOG, "Finished processing %d rows", offset); 935 |   pfree(column_str); 936 |   // Move files from temp directory to data directory. 937 |   move_temp_files(entry.table_name); 938 | 939 |   g_object_unref(arrow_schema); 940 |   pfree(column_info); 941 | 942 |   SPI_finish(); 943 |   PopActiveSnapshot(); 944 |   CommitTransactionCommand(); 945 | } 946 | 947 | /** 948 |  * Update table export status after successful export. 949 |  */ 950 | void update_table_export_metadata(const char *table_name) { 951 |   StringInfoData buf; 952 |   initStringInfo(&buf); 953 |   appendStringInfo(&buf, "UPDATE analytica_exports \ 954 |     SET last_run_completed = CURRENT_TIMESTAMP, export_status = %d \ 955 |     WHERE table_name = '%s';", 956 |                    ACTIVE, table_name); 957 | 958 |   SetCurrentStatementStartTimestamp(); 959 |   StartTransactionCommand(); 960 |   PushActiveSnapshot(GetTransactionSnapshot()); 961 |   int connection = SPI_connect(); 962 |   if (connection < 0) { 963 |     ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 964 |                     errmsg("Failed to connect to database"))); 965 |   } 966 |   elog(LOG, "Created connection for query"); 967 |   elog(LOG, "Executing SPI_execute query %s", buf.data); 968 |   int status = SPI_execute(buf.data, /*read_only=*/false, /*count=*/0); 969 |   elog(LOG, "Executed SPI_execute command with status %d", status); 970 | 971 |   SPI_finish(); 972 |   PopActiveSnapshot(); 973 |   CommitTransactionCommand(); 974 | } 975 | 976 | void cleanup_inactive_export(const char *table_name) { 977 |   elog(LOG, "Cleaning up data for table %s", table_name); 978 |   // delete data directories 979 |   int status = cleanup_table_data(table_name); 980 |   if (status != 0) { 981 |     elog(LOG, "Failed to cleanup data for table %s", table_name); 982 |     return; 983 |   } 984 |   // delete metadata entry 985 |   delete_export_entry(table_name); 986 |   elog(LOG, "Cleaned up data for table %s", table_name); 987 | } 988 | 989 | void register_table_with_parquet_server(const ExportEntry *entry) { 990 |   StringInfoData buf; 991 |   initStringInfo(&buf); 992 |   appendStringInfo(&buf, "DROP FOREIGN TABLE IF EXISTS analytica_%s;\n", 993 |                    entry->table_name); 994 |   appendStringInfo(&buf, "select import_parquet( \ 995 |     'analytica_%s', \ 996 |     'public', \ 997 |
'parquet_srv', \ 998 | 'list_parquet_files', \ 999 | '{\"dir\": \"./pg_analytica/%s\"}', \ 1000 | '{\"use_mmap\": \"true\", \"use_threads\": \"true\"}' \ 1001 | );", 1002 | entry->table_name, entry->table_name); 1003 | 1004 | SetCurrentStatementStartTimestamp(); 1005 | StartTransactionCommand(); 1006 | PushActiveSnapshot(GetTransactionSnapshot()); 1007 | int connection = SPI_connect(); 1008 | if (connection < 0) { 1009 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 1010 | errmsg("Failed to connect to database"))); 1011 | } 1012 | elog(LOG, "Created connection for query"); 1013 | elog(LOG, "Executing SPI_execute query %s", buf.data); 1014 | int status = SPI_execute(buf.data, /*read_only=*/false, /*count=*/0); 1015 | elog(LOG, "Executed SPI_execute command with status %d", status); 1016 | if (status != SPI_OK_SELECT) { 1017 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 1018 | errmsg("Failed to register new table entry."))); 1019 | } 1020 | SPI_finish(); 1021 | PopActiveSnapshot(); 1022 | CommitTransactionCommand(); 1023 | } 1024 | 1025 | /** 1026 | * Use SPI to read rows from table. 1027 | * https://www.postgresql.org/docs/current/spi.html 1028 | */ 1029 | void ingestor_main() { 1030 | elog(LOG, "Started export in background worker"); 1031 | bits32 flags = 0; 1032 | BackgroundWorkerUnblockSignals(); 1033 | elog(LOG, "Establishing connection to database %s with role %s", 1034 | source_database, source_database_role); 1035 | BackgroundWorkerInitializeConnection(source_database, source_database_role, 1036 | flags); 1037 | 1038 | for (;;) { 1039 | /* 1040 | * Background workers mustn't call usleep() or any direct equivalent: 1041 | * instead, they may wait on their process latch, which sleeps as 1042 | * necessary, but is awakened if postmaster dies. That way the 1043 | * background process goes away immediately in an emergency. 
1044 |      */ 1045 |     (void)WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1046 |                     ingestor_naptime_sec * 1000L, PG_WAIT_EXTENSION); 1047 |     ResetLatch(MyLatch); 1048 | 1049 |     int num_of_tables; 1050 |     ExportEntry entries[MAX_EXPORT_ENTRIES]; 1051 |     elog(LOG, "Fetching tables to export"); 1052 |     get_tables_to_process_in_order(entries, &num_of_tables); 1053 | 1054 |     elog(LOG, "Beginning export for %d tables", num_of_tables); 1055 | 1056 |     for (int i = 0; i < num_of_tables; i += 1) { 1057 |       char *table_name = entries[i].table_name; 1058 | 1059 |       if (entries[i].export_status == INACTIVE) { 1060 |         elog(LOG, "Cleaning up inactive entry %s", entries[i].table_name); 1061 |         cleanup_inactive_export(table_name); 1062 |         free_export_entry(&entries[i]); 1063 |         continue; 1064 |       } 1065 | 1066 |       elog(LOG, "Initializing data directory for %s with %d columns", 1067 |            table_name, entries[i].num_of_columns); 1068 |       setup_data_directories(table_name); 1069 | 1070 |       elog(LOG, "Starting export for %s", table_name); 1071 |       export_table_data(entries[i]); 1072 | 1073 |       elog(LOG, "Updating export status for %s", table_name); 1074 |       update_table_export_metadata(entries[i].table_name); 1075 | 1076 |       elog(LOG, "Registering table %s with parquet fdw", table_name); 1077 |       register_table_with_parquet_server(&entries[i]); 1078 | 1079 |       elog(LOG, "Freeing export entry"); 1080 |       free_export_entry(&entries[i]); 1081 | 1082 |       elog(LOG, "Export completed for %s", table_name); 1083 | 1084 |       CHECK_FOR_INTERRUPTS(); 1085 |     } 1086 |   } 1087 |   exit(0); 1088 | } 1089 | 1090 | Datum ingestor_launch(PG_FUNCTION_ARGS) { 1091 |   pid_t pid; 1092 |   BackgroundWorker worker; 1093 |   BackgroundWorkerHandle *handle; 1094 |   BgwHandleStatus status; 1095 | 1096 |   DefineCustomIntVariable( 1097 |       "pg_analytica.naptime", "Duration between each check (in seconds).", NULL, 1098 |       &ingestor_naptime_sec, 10, 1, INT_MAX, PGC_SIGHUP, 0, NULL, NULL, NULL); 1099 |   DefineCustomStringVariable("pg_analytica.database", "Database to connect to.", 1100 |                              NULL, &source_database, "postgres", PGC_SIGHUP, 0, 1101 |                              NULL, NULL, NULL); 1102 |   DefineCustomStringVariable("pg_analytica.role", "Role to connect with.", NULL, 1103 |                              &source_database_role, NULL, PGC_SIGHUP, 0, NULL, 1104 |                              NULL, NULL); 1105 | 1106 |   memset(&worker, 0, sizeof(worker)); 1107 |   worker.bgw_flags = 1108 |       BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION; 1109 |   worker.bgw_start_time = BgWorkerStart_RecoveryFinished; 1110 |   worker.bgw_restart_time = BGW_DEFAULT_RESTART_INTERVAL; 1111 |   sprintf(worker.bgw_library_name, "ingestor"); 1112 |   sprintf(worker.bgw_function_name, "ingestor_main"); 1113 |   snprintf(worker.bgw_name, BGW_MAXLEN, "ingestor dynamic worker"); 1114 |   snprintf(worker.bgw_type, BGW_MAXLEN, "ingestor dynamic"); 1115 |   /* set bgw_notify_pid so that we can use WaitForBackgroundWorkerStartup */ 1116 |   worker.bgw_notify_pid = MyProcPid; 1117 | 1118 |   if (!RegisterDynamicBackgroundWorker(&worker, &handle)) 1119 |     PG_RETURN_NULL(); 1120 | 1121 |   status = WaitForBackgroundWorkerStartup(handle, &pid); 1122 |   if (status == BGWH_STOPPED) 1123 |     ereport(ERROR, 1124 |             (errcode(ERRCODE_INSUFFICIENT_RESOURCES), 1125 |              errmsg("could not start background process"), 1126 |              errhint("More details may be available in the server log."))); 1127 |   if (status == BGWH_POSTMASTER_DIED) 1128 |     ereport(ERROR, 1129 |             (errcode(ERRCODE_INSUFFICIENT_RESOURCES), 1130 |              errmsg("cannot start background processes without postmaster"), 1131 |              errhint("Kill all remaining database processes and
restart the " 1132 | "database."))); 1133 | 1134 | Assert(status == BGWH_STARTED); 1135 | 1136 | elog(LOG, "Background worker started"); 1137 | PG_RETURN_INT32(pid); 1138 | } 1139 | -------------------------------------------------------------------------------- /ingestor/ingestor.control: -------------------------------------------------------------------------------- 1 | # ingestor postgresql extension 2 | comment = 'Extension to create columnar store from table in background worker.' 3 | default_version = '0.0.1' 4 | module_pathname = '$libdir/ingestor' 5 | relocatable = true 6 | -------------------------------------------------------------------------------- /ingestor/parquet.sql: -------------------------------------------------------------------------------- 1 | -- This file contains methods to test parquet_fdw functionality. 2 | 3 | -- function to list all parquet iles in data directory for table 4 | create function list_parquet_files(args jsonb) 5 | returns text[] as 6 | $$ 7 | begin 8 | return array_agg(args->>'dir' || '/' || filename) 9 | from pg_ls_dir(args->>'dir') as files(filename) 10 | where filename ~~ '%.parquet'; 11 | end 12 | $$ 13 | language plpgsql; 14 | 15 | -- Register parquet files as foreign table 16 | create or replace foreign table test_data_columnar ( 17 | id int, 18 | name varchar(40), 19 | age int 20 | ) 21 | server parquet_srv 22 | options ( 23 | files_func 'list_parquet_files', 24 | files_func_arg '{"dir": "./pg_analytica/test_data"}', 25 | use_mmap 'true', 26 | use_threads 'true' 27 | ); 28 | 29 | select import_parquet( 30 | 'analytica.distributors', 31 | 'public', 32 | 'parquet_srv', 33 | 'list_parquet_files', 34 | '{"dir": "./pg_analytica/distributors"}', 35 | '{"use_mmap": "true", "use_threads": "true"}' 36 | ); 37 | -------------------------------------------------------------------------------- /ingestor/registry.c: -------------------------------------------------------------------------------- 1 | #include "postgres.h" 2 | #include "access/xact.h" 3 | #include "catalog/pg_type_d.h" 4 | #include "constants.h" 5 | #include "executor/spi.h" 6 | #include "postgres.h" 7 | #include "utils/array.h" 8 | #include "utils/builtins.h" 9 | #include "utils/snapmgr.h" 10 | 11 | PG_FUNCTION_INFO_V1(register_table_export); 12 | PG_FUNCTION_INFO_V1(unregister_table_export); 13 | 14 | static int execute_query(StringInfoData buf) { 15 | int connection = SPI_connect(); 16 | if (connection < 0) { 17 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 18 | errmsg("Failed to connect to database"))); 19 | } 20 | elog(LOG, "Created connection for query"); 21 | // Execute the chunked query using SPI_exec 22 | elog(LOG, "Executing SPI_execute query %s", buf.data); 23 | int status = SPI_execute(buf.data, false, 0); 24 | elog(LOG, "Executed SPI_execute command with status %d", status); 25 | 26 | SPI_finish(); 27 | return status; 28 | } 29 | 30 | static char *get_columns_string(Datum *columns, int num_of_columns) { 31 | int total_size = 0; 32 | elog(LOG, "Creating stringified columns"); 33 | for (int i = 0; i < num_of_columns; i++) { 34 | char *column_name = TextDatumGetCString(columns[i]); 35 | total_size += strlen(column_name) + 1; 36 | elog(LOG, "Size of column %s is %d", column_name, strlen(column_name)); 37 | } 38 | char *combined_string = (char *)palloc(total_size * sizeof(char)); 39 | elog(LOG, "Allocated memory"); 40 | combined_string[0] = '\0'; 41 | for (int i = 0; i < num_of_columns; i++) { 42 | strcat(combined_string, TextDatumGetCString(columns[i])); 43 | elog(LOG, 
"Creating combined string %s", combined_string); 44 | if (i < num_of_columns - 1) { 45 | strcat(combined_string, ","); 46 | } 47 | } 48 | return combined_string; 49 | } 50 | 51 | Datum register_table_export(PG_FUNCTION_ARGS) { 52 | // TODO - add validation to ensure table exists 53 | // and column types are supported for export 54 | int num_of_args = PG_NARGS(); 55 | if (num_of_args != 4) { 56 | ereport(ERROR, (errcode(ERRCODE_RAISE_EXCEPTION), 57 | errmsg("Invalid number of arguments. Expected format is " 58 | "register_export(table_name text, columns_to_export " 59 | "text[], export_frequency_hours int)"))); 60 | } 61 | // Extract table name 62 | char *table_name = text_to_cstring(PG_GETARG_TEXT_PP(0)); 63 | // Extract columns to export 64 | ArrayType *arr = PG_GETARG_ARRAYTYPE_P(1); 65 | Assert(ARR_NDIM(arr) == 1); 66 | Assert(ARR_ELEMTYPE(arr) == TEXTOID); 67 | Datum *column_datums; 68 | int num_of_columns; 69 | deconstruct_array(arr, TEXTOID, -1, false, TYPALIGN_INT, &column_datums, NULL, 70 | &num_of_columns); 71 | char *column_str; 72 | column_str = get_columns_string(column_datums, num_of_columns); 73 | elog(LOG, "Created column string as %s", column_str); 74 | // Extract export frequency 75 | int32 export_frequency_hours = PG_GETARG_INT32(2); 76 | int64 chunk_size = PG_GETARG_INT32(3); 77 | 78 | StringInfoData buf; 79 | initStringInfo(&buf); 80 | appendStringInfo(&buf, 81 | "INSERT INTO analytica_exports (table_name, " 82 | "columns_to_export, export_frequency_hours, export_status, " 83 | "chunk_size) VALUES ('%s', '{%s}', %d, %d, %ld);", 84 | table_name, column_str, export_frequency_hours, PENDING, 85 | chunk_size); 86 | 87 | int status = execute_query(buf); 88 | if (status < 0) { 89 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 90 | errmsg("Query execution failed"))); 91 | } 92 | pfree(column_str); 93 | elog(LOG, "Scheduled export for table %s with frequency of %d hours", 94 | table_name, export_frequency_hours); 95 | PG_RETURN_INT32(1); 96 | } 97 | 98 | Datum unregister_table_export(PG_FUNCTION_ARGS) { 99 | int num_of_args = PG_NARGS(); 100 | if (num_of_args != 1) { 101 | ereport(ERROR, (errcode(ERRCODE_RAISE_EXCEPTION), 102 | errmsg("Invalid number of arguments. Expected format is " 103 | "unregister_export(table_name text)"))); 104 | } 105 | // Extract table name 106 | char *table_name = text_to_cstring(PG_GETARG_TEXT_PP(0)); 107 | 108 | StringInfoData buf; 109 | initStringInfo(&buf); 110 | appendStringInfo(&buf, "UPDATE analytica_exports \ 111 | SET export_status = %d \ 112 | WHERE table_name = '%s';", 113 | INACTIVE, table_name); 114 | 115 | int status = execute_query(buf); 116 | if (status < 0) { 117 | ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), 118 | errmsg("Failed to unregister table."))); 119 | } 120 | 121 | elog(LOG, 122 | "Marked %s table export as inactive. 
Table data will be deleted at next " 123 |        "available window.", 124 |        table_name); 125 |   PG_RETURN_INT32(1); 126 | } -------------------------------------------------------------------------------- /ingestor/test.sql: -------------------------------------------------------------------------------- 1 | DROP EXTENSION ingestor; 2 | CREATE EXTENSION ingestor; 3 | 4 | SELECT register_table_export( 5 |   -- table name 6 |   'test_data', 7 |   -- columns to export 8 |   '{id,name,age,is_random,rating,created_at}', 9 |   -- export_frequency_hours 10 |   7, 11 |   -- chunk_size (rows read per exported chunk) 12 |   100000 13 | ); 14 | 15 | -- To Launch ingestor background worker 16 | SELECT ingestor_launch(); -------------------------------------------------------------------------------- /notes.md: -------------------------------------------------------------------------------- 1 | ## Rough Notes on pg_analytica extension 2 | 3 | 4 | ## Approach 1 5 | 6 | Try creating a columnar table by running a `CREATE TABLE AS` query that reads data from 7 | an existing table and uses a custom table access method to write the data. 8 | 9 | To keep the table updated as new rows are added, we would need to explore other techniques. 10 | We may create a hook that periodically writes new tuples from the source table. 11 | Maybe we can keep track of the oldest txn id ingested and query rows with txn id greater than that and write to columnar store. 12 | 13 | - pg_analytica extension defines a custom table access method 14 | - When used, the following methods are called 15 | - `memam_relation_set_new_filelocator` 16 | - `memam_relation_needs_toast_table` 17 | - `memam_tuple_insert` called for each row in source table 18 | - Data in tuples is empty, number of valid attributes `tts_nvalid` is 0. 19 | - Calling `ExecFetchSlotHeapTuple` fetches tuple with actual data in it. 20 | 21 | - `memam_relation_set_new_filelocator` called to create storage for new table 22 | - `RelationCreateStorage` can be called to create the storage manager relation 23 | `SMgrRelation` used by postgres to access the table storage 24 | - Citus maps columnar storage concepts to postgres concepts like pages 25 | - `CatalogTupleInsert` is called at the end to insert table information into postgres catalog 26 | 27 | ## Approach 2 28 | - Table Access Method definition only supports row-by-row insert for `CREATE TABLE AS` queries 29 | - Multi-insert is only called if data is dumped to disk and read as a file 30 | - `CTAS` approach to create columnar store may not work 31 | 32 | - Creating columnar tables in background process 33 | - Extension creates a metadata table to store metadata about columnar tables 34 | - Calling a method exposed by extension adds table entry to metadata table 35 | - A new table is also created in catalog that can use custom table AM handler ?? 36 | - Can we override query behavior here? 37 | - Background worker periodically scans registered tables and reads a subset of rows and writes it as a columnar file 38 | - After writing, the latest transaction id is written to metadata table 39 | - instead of transaction id we can read using OFFSET and chunk as well 40 | ``` 41 | SELECT * FROM your_table 42 | OFFSET offset 43 | LIMIT chunk_size; 44 | ``` 45 | - For deterministic order, we can ask users to supply a list of columns by which 46 | rows should be ordered. Columnar files will be sorted by these columns. 47 | - In subsequent runs, only rows after latest transaction id / offset are scanned 48 | - Alternatively, we can scan the entire table each time.
49 | - Another background worker periodically combines multiple columnar files for the same table into a single file 50 | - Multiple ways of initializing a file defined here 51 | `postgres/src/include/storage/fd.h` 52 | - Initializing a subdirectory in postgres data directory here 53 | `MakePGDirectory(const char *directoryName)` 54 | - Details about creating background worker here 55 | `postgres/src/include/postmaster/bgworker.h` 56 | - Example usage of background worker 57 | `postgres/src/test/modules/worker_spi/worker_spi.c` 58 | 59 | ## Reading columnar file created by ingestor 60 | 61 | The columnar file is created in the postgres data directory (PGDATA). 62 | PGDATA directory can be found using this sql query. 63 | ``` 64 | SELECT current_setting('data_directory') AS pgdata_path; 65 | ``` 66 | 67 | To read the parquet file, start a duck db instance in memory and 68 | execute the select query to read the parquet file. 69 | 70 | ``` 71 | $ duckdb 72 | 73 | SELECT * from read_parquet('/opt/homebrew/var/postgresql@14/sample.parquet'); 74 | ``` 75 | 76 | # Query using Parquet Foreign Data Wrapper 77 | - https://github.com/adjust/parquet_fdw/blob/master/README.md 78 | 79 | # Query timings 80 | 81 | 82 | ### Test 1 83 | 84 | Measured using the timing feature in postgres. 85 | - Table Size little over 2GiB 86 | - 30M rows with 2 int64 columns and a varchar column exported 87 | 88 | ``` 89 | \timing on 90 | ``` 91 | 92 | - select with group by and filter 93 | ``` 94 | SELECT 95 | age, 96 | COUNT(*) 97 | FROM test_data_columnar 98 | WHERE age > 50 AND age < 90 AND id > 1000 99 | GROUP BY age; 100 | 101 | Time: 1633.900 ms (00:01.634) 102 | 103 | SELECT 104 | age, 105 | COUNT(*) 106 | FROM test_data 107 | WHERE age > 50 AND age < 90 AND id > 1000 108 | GROUP BY age; 109 | 110 | Time: 5153.393 ms (00:05.153) 111 | ``` 112 | 113 | ### Test 2 114 | 115 | Measured using the timing feature in postgres. 116 | - Table Size little over 5GiB 117 | - 60M rows with 6 columns of different types 118 | 119 | 120 | ``` 121 | \timing on 122 | 123 | 124 | -- Query on row based data 125 | SELECT 126 | age, 127 | COUNT(*) 128 | FROM test_data 129 | WHERE age > 50 AND age < 90 AND is_random 130 | GROUP BY age; 131 | 132 | Time: 9446.380 ms (00:09.446) 133 | 134 | -- Query on columnar data 135 | SELECT 136 | age, 137 | COUNT(*) 138 | FROM analytica_test_data 139 | WHERE age > 50 AND age < 90 AND is_random 140 | GROUP BY age; 141 | 142 | Time: 2312.845 ms (00:02.313) 143 | 144 | postgres=# \timing on 145 | Timing is on. 146 | postgres=# SELECT COUNT(*) from analytica_test_data WHERE age > 50 AND age < 55; 147 | count 148 | --------- 149 | 2398998 150 | (1 row) 151 | 152 | Time: 1399.673 ms (00:01.400) 153 | 154 | 155 | postgres=# SELECT COUNT(*) from test_data WHERE age > 50 AND age < 55; 156 | count 157 | --------- 158 | 2399709 159 | (1 row) 160 | 161 | Time: 9904.160 ms (00:09.904) 162 | 163 | ``` 164 | 165 | ## Export timings 166 | 167 | ### Test 1 168 | 169 | Time to export can be reduced with variable exort chunk sizes. 170 | Users can choose larger chunk size depending on size of row and system memory. 
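For example, a wide table on a machine with limited memory can be registered with a smaller chunk size, while a narrow table can use a much larger one. A minimal sketch of the registration call (the column list and the 500k value are only illustrative; `register_table_export` takes the table name, the columns to export, the export frequency in hours, and the chunk size):

```
SELECT register_table_export(
  'test_data',
  '{id,name,age}',
  -- export_frequency_hours
  24,
  -- chunk_size: rows read per SELECT ... LIMIT/OFFSET chunk
  500000
);
```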
171 | 172 | - Table size a little over 2GiB 173 | - 30M rows with 2 int64 columns and a varchar column exported 174 | - export chunk size 10k 175 | - start time 12:21 PM 176 | - 10M / 30M rows processed at 12:33 PM 177 | - Finished processing at 1:34 PM 178 | - Overall export time - 1hr 13m 179 | - Max memory usage - 3.2GiB 180 | - 3k columnar files exported 181 | 182 | ### Test 2 183 | 184 | - Table size a little over 5GiB 185 | - export chunk size 100k 186 | - Started at 8:47 PM 187 | - Peak memory usage steadily growing, currently at 9GiB 188 | - possible memory leak 189 | - memory usage at 9.82GiB 190 | - Completed at 9:34 PM 191 | - Peak memory usage 10.35GiB 192 | - 47min to export 5GiB of data 193 | - 600 columnar files exported 194 | 195 | ## Tasks 196 | 197 | - Look into memory usage increase 198 | - see if using cursors to read chunks reduces memory usage (see the cursor sketch below) 199 | - check for memory leaks 200 | - Create section in readme about perf improvements 201 | 202 | - Add export timing metrics to metadata 203 | - Support exporting columnar files sorted by keys --------------------------------------------------------------------------------
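One option for the cursor task above, sketched only as an idea rather than the current implementation: replace the repeated `LIMIT ... OFFSET` queries in `export_table_data` with a single SPI cursor and fetch `chunk_size` rows per iteration. The helper name `export_with_cursor` and its wiring are hypothetical; it assumes it runs inside the same SPI connection and transaction block the existing export loop already uses.

```
// Hypothetical sketch, not the current implementation.
static void export_with_cursor(const ExportEntry *entry, const char *column_str) {
  StringInfoData buf;
  initStringInfo(&buf);
  appendStringInfo(&buf, "SELECT %s FROM %s;", column_str, entry->table_name);

  // One open portal instead of a new LIMIT/OFFSET query per chunk.
  Portal portal = SPI_cursor_open_with_args("export_cursor", buf.data,
                                            0, NULL, NULL, NULL,
                                            /*read_only=*/true, /*cursorOptions=*/0);
  int chunk = 0;
  for (;;) {
    // Fills SPI_tuptable / SPI_processed with the next chunk_size rows.
    SPI_cursor_fetch(portal, /*forward=*/true, entry->chunk_size);
    if (SPI_processed == 0)
      break;
    // Build and write one Arrow table per fetched chunk, as the existing loop does:
    // create_arrow_table(...) followed by write_arrow_table(entry->table_name, chunk, ...).
    chunk += 1;
    // Free the per-chunk tuple table so memory stays bounded across chunks.
    SPI_freetuptable(SPI_tuptable);
  }
  SPI_cursor_close(portal);
}
```

Each fetch avoids re-scanning the rows already exported (which `OFFSET` forces), and freeing the tuple table after every chunk is the part that should help with the steadily growing memory usage observed in Test 2.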