├── .envrc
├── docs
│   ├── .gitignore
│   ├── book.toml
│   └── src
│       ├── SUMMARY.md
│       ├── database-tables.md
│       ├── generate-query.md
│       ├── error-codes.md
│       ├── table-details.md
│       ├── introduction.md
│       ├── installation.md
│       ├── quick-start.md
│       └── explain-query.md
├── .github
│   ├── CODEOWNERS
│   ├── dependabot.yml
│   ├── ISSUE_TEMPLATE
│   │   ├── config.yml
│   │   ├── feature_request.yml
│   │   ├── bug_report.yml
│   │   └── installation_issue.yml
│   ├── PULL_REQUEST_TEMPLATE
│   │   ├── documentation.md
│   │   ├── feature.md
│   │   ├── bugfix.md
│   │   └── refactor.md
│   └── workflows
│       ├── claude-review.yml
│       └── formatting.yml
├── tests
│   ├── fixtures
│   │   ├── responses
│   │   │   ├── raw_sql_only.txt
│   │   │   ├── system_table_access.json
│   │   │   ├── error_table_not_found.json
│   │   │   ├── multiple_warnings.json
│   │   │   ├── valid_query_response.json
│   │   │   └── valid_query_markdown.txt
│   │   └── configs
│   │       ├── minimal_config.ini
│   │       ├── anthropic_only.ini
│   │       ├── empty_config.ini
│   │       ├── gemini_only.ini
│   │       ├── custom_endpoints.ini
│   │       └── valid_config.ini
│   ├── sql
│   │   ├── teardown.sql
│   │   ├── setup.sql
│   │   └── test_extension_functions.sql
│   ├── test_helpers.hpp
│   ├── CMakeLists.txt
│   └── unit
│       ├── test_utils.cpp
│       └── test_provider_selector.cpp
├── .gitmodules
├── .markdownlint.json
├── src
│   ├── include
│   │   ├── prompts.hpp
│   │   ├── utils.hpp
│   │   ├── logger.hpp
│   │   ├── gemini_client.h
│   │   ├── ai_client_factory.hpp
│   │   ├── response_formatter.hpp
│   │   ├── provider_selector.hpp
│   │   ├── query_parser.hpp
│   │   ├── query_generator.hpp
│   │   ├── spi_connection.hpp
│   │   └── config.hpp
│   ├── core
│   │   ├── spi_connection.cpp
│   │   ├── logger.cpp
│   │   ├── response_formatter.cpp
│   │   ├── ai_client_factory.cpp
│   │   ├── provider_selector.cpp
│   │   └── query_parser.cpp
│   ├── utils.cpp
│   ├── prompts.cpp
│   ├── providers
│   │   └── gemini
│   │       └── client.cpp
│   └── pg_ai_query.cpp
├── pg_ai_query.control
├── .yamllint.yml
├── .clang-format
├── flake.lock
├── .gitignore
├── .agents
│   └── commands
│       └── gh-issue
├── sql
│   └── pg_ai_query--1.0.sql
├── CLAUDE.md
├── example_pg_ai.config
├── DIRENV_SETUP.md
├── Makefile
├── CHANGELOG.md
├── NIX.md
└── flake.nix
/.envrc: -------------------------------------------------------------------------------- 1 | use flake 2 | -------------------------------------------------------------------------------- /docs/.gitignore: -------------------------------------------------------------------------------- 1 | book 2 | -------------------------------------------------------------------------------- /.github/CODEOWNERS: -------------------------------------------------------------------------------- 1 | * @benodiwal 2 | -------------------------------------------------------------------------------- /tests/fixtures/responses/raw_sql_only.txt: -------------------------------------------------------------------------------- 1 | SELECT id, name, email FROM customers WHERE country = 'USA' -------------------------------------------------------------------------------- /tests/fixtures/configs/minimal_config.ini: -------------------------------------------------------------------------------- 1 | # Minimal configuration - only OpenAI key 2 | [openai] 3 | api_key = sk-minimal-test-key 4 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "third_party/ai-sdk-cpp"] 2 | path = third_party/ai-sdk-cpp 3 | url = https://github.com/ClickHouse/ai-sdk-cpp.git 4 | -------------------------------------------------------------------------------- /.markdownlint.json: -------------------------------------------------------------------------------- 1 | { 2 | "MD013": { "line_length": 120 }, 3 | "MD033": false, 4 | "MD041": false, 5 | "MD034": false, 6 | "MD036": false 7 | } -------------------------------------------------------------------------------- /tests/fixtures/responses/system_table_access.json: -------------------------------------------------------------------------------- 1 | { 2 | "sql": "SELECT * FROM information_schema.tables", 3 | "explanation": "Query 
to list all tables"
4 | }
--------------------------------------------------------------------------------
/tests/fixtures/configs/anthropic_only.ini:
--------------------------------------------------------------------------------
1 | # Configuration with only Anthropic
2 | [anthropic]
3 | api_key = sk-ant-only-key
4 | default_model = claude-sonnet-4-5-20250929
5 |
--------------------------------------------------------------------------------
/src/include/prompts.hpp:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include <string>
4 |
5 | namespace pg_ai::prompts {
6 | extern const std::string SYSTEM_PROMPT;
7 | extern const std::string EXPLAIN_SYSTEM_PROMPT;
8 | }  // namespace pg_ai::prompts
--------------------------------------------------------------------------------
/pg_ai_query.control:
--------------------------------------------------------------------------------
1 | # pg_ai_query extension
2 | comment = 'AI-powered SQL query generation for PostgreSQL'
3 | default_version = '1.0'
4 | module_pathname = '$libdir/pg_ai_query'
5 | relocatable = true
6 | requires = ''
7 | superuser = true
--------------------------------------------------------------------------------
/tests/fixtures/configs/empty_config.ini:
--------------------------------------------------------------------------------
1 | # Empty configuration - no API keys
2 | [general]
3 | log_level = INFO
4 | enable_logging = false
5 |
6 | [query]
7 | enforce_limit = false
8 |
9 | [response]
10 | show_explanation = false
11 |
--------------------------------------------------------------------------------
/.yamllint.yml:
--------------------------------------------------------------------------------
1 | extends: default
2 | rules:
3 |   line-length:
4 |     max: 120
5 |     level: warning
6 |   comments:
7 |     min-spaces-from-content: 1
8 |   comments-indentation: disable
9 |   truthy: disable
10 |   indentation:
11 |     spaces: 2
-------------------------------------------------------------------------------- /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | updates: 3 | - package-ecosystem: github-actions 4 | directory: / 5 | schedule: 6 | interval: monthly 7 | - package-ecosystem: gitsubmodule 8 | directory: / 9 | schedule: 10 | interval: monthly 11 | -------------------------------------------------------------------------------- /tests/fixtures/configs/gemini_only.ini: -------------------------------------------------------------------------------- 1 | # Test configuration file - Gemini only 2 | [general] 3 | log_level = INFO 4 | enable_logging = false 5 | 6 | [gemini] 7 | api_key = AIzaSyTest-gemini-key-12345 8 | default_model = gemini-2.5-flash 9 | max_tokens = 8192 10 | temperature = 0.7 11 | -------------------------------------------------------------------------------- /tests/fixtures/configs/custom_endpoints.ini: -------------------------------------------------------------------------------- 1 | # Configuration with custom API endpoints 2 | [openai] 3 | api_key = sk-custom-openai 4 | api_endpoint = https://custom-openai.example.com/v1 5 | 6 | [anthropic] 7 | api_key = sk-custom-anthropic 8 | api_endpoint = https://custom-anthropic.example.com 9 | -------------------------------------------------------------------------------- /tests/fixtures/responses/error_table_not_found.json: -------------------------------------------------------------------------------- 1 | { 2 | "sql": "", 3 | "explanation": "Cannot generate query: The table 'nonexistent_table' does not exist in the database. 
Available tables are: users, orders, products.",
4 |   "warnings": ["Error: Table 'nonexistent_table' does not exist"]
5 | }
--------------------------------------------------------------------------------
/src/include/utils.hpp:
--------------------------------------------------------------------------------
1 | #pragma once
2 | #include <string>
3 | #include <utility>
4 |
5 | namespace pg_ai::utils {
6 |
7 | std::pair read_file(const std::string& filepath);
8 |
9 | std::string read_file_or_throw(const std::string& filepath);
10 |
11 | std::string formatAPIError(const std::string& raw_error);
12 |
13 | }  // namespace pg_ai::utils
--------------------------------------------------------------------------------
/tests/fixtures/responses/multiple_warnings.json:
--------------------------------------------------------------------------------
1 | {
2 |   "sql": "SELECT * FROM large_table",
3 |   "explanation": "Retrieves all rows from the large_table",
4 |   "warnings": [
5 |     "This query may return a large number of rows",
6 |     "Consider adding a LIMIT clause",
7 |     "No WHERE clause specified - full table scan"
8 |   ],
9 |   "row_limit_applied": false,
10 |   "suggested_visualization": "table"
11 | }
--------------------------------------------------------------------------------
/tests/fixtures/responses/valid_query_response.json:
--------------------------------------------------------------------------------
1 | {
2 |   "sql": "SELECT * FROM users WHERE active = true ORDER BY created_at DESC LIMIT 100",
3 |   "explanation": "This query retrieves all active users, sorted by creation date in descending order, limited to 100 results.",
4 |   "warnings": ["Consider adding an index on the 'active' column for better performance"],
5 |   "row_limit_applied": true,
6 |   "suggested_visualization": "table"
7 | }
--------------------------------------------------------------------------------
/tests/fixtures/responses/valid_query_markdown.txt:
-------------------------------------------------------------------------------- 1 | Here is the SQL query you requested: 2 | 3 | ```json 4 | { 5 | "sql": "SELECT COUNT(*) as total_orders, customer_id FROM orders GROUP BY customer_id HAVING COUNT(*) > 5", 6 | "explanation": "This query counts orders per customer and filters to show only customers with more than 5 orders.", 7 | "warnings": [], 8 | "suggested_visualization": "bar_chart" 9 | } 10 | ``` 11 | 12 | Let me know if you need any modifications! -------------------------------------------------------------------------------- /docs/book.toml: -------------------------------------------------------------------------------- 1 | [book] 2 | authors = ["benodiwal"] 3 | language = "en" 4 | src = "src" 5 | title = "PostgreSQL AI Query Extension" 6 | description = "Generate SQL queries from natural language using AI - Documentation for pg_ai_query extension" 7 | 8 | [build] 9 | build-dir = "book" 10 | 11 | [output.html] 12 | default-theme = "light" 13 | preferred-dark-theme = "navy" 14 | smart-punctuation = true 15 | mathjax-support = false 16 | copy-fonts = true 17 | additional-css = [] 18 | additional-js = [] 19 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/config.yml: -------------------------------------------------------------------------------- 1 | blank_issues_enabled: true 2 | contact_links: 3 | - name: Documentation 4 | url: https://benodiwal.github.io/pg_ai_query/ 5 | about: Check the official documentation for guides and API reference 6 | - name: GitHub Discussions 7 | url: https://github.com/benodiwal/pg_ai_query/discussions 8 | about: Ask questions and discuss ideas with the community 9 | - name: Troubleshooting Guide 10 | url: https://benodiwal.github.io/pg_ai_query/troubleshooting.html 11 | about: Common issues and their solutions 12 | -------------------------------------------------------------------------------- /src/include/logger.hpp: 
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include <string>
4 |
5 | #ifdef USE_POSTGRESQL_ELOG
6 | extern "C" {
7 | #include <postgres.h>
8 |
9 | #include <utils/elog.h>
10 | }
11 |
12 | #endif
13 |
14 | namespace pg_ai::logger {
15 |
16 | class Logger {
17 |  public:
18 |   static void debug(const std::string& message);
19 |   static void info(const std::string& message);
20 |   static void warning(const std::string& message);
21 |   static void error(const std::string& message);
22 |
23 |   static void setLoggingEnabled(bool enabled);
24 | };
25 |
26 | }  // namespace pg_ai::logger
--------------------------------------------------------------------------------
/tests/sql/teardown.sql:
--------------------------------------------------------------------------------
1 | -- Teardown script for pg_ai_query extension tests
2 | -- This file cleans up the test environment
3 |
4 | -- Drop test tables
5 | DROP TABLE IF EXISTS pg_ai_test.order_items CASCADE;
6 | DROP TABLE IF EXISTS pg_ai_test.orders CASCADE;
7 | DROP TABLE IF EXISTS pg_ai_test.products CASCADE;
8 | DROP TABLE IF EXISTS pg_ai_test.users CASCADE;
9 |
10 | -- Drop test schema
11 | DROP SCHEMA IF EXISTS pg_ai_test CASCADE;
12 |
13 | -- Note: We don't drop the extension here as it might be used by other tests
14 | -- Use: DROP EXTENSION IF EXISTS pg_ai_query CASCADE; to remove the extension
15 |
--------------------------------------------------------------------------------
/tests/fixtures/configs/valid_config.ini:
--------------------------------------------------------------------------------
1 | # Test configuration file - valid complete config
2 | [general]
3 | log_level = DEBUG
4 | enable_logging = true
5 | request_timeout_ms = 60000
6 | max_retries = 5
7 |
8 | [query]
9 | enforce_limit = true
10 | default_limit = 500
11 |
12 | [response]
13 | show_explanation = true
14 | show_warnings = true
15 | show_suggested_visualization = true
16 | use_formatted_response = false
17 |
18 | [openai]
19 | api_key = sk-test-openai-key-12345 20 | default_model = gpt-4o-mini 21 | max_tokens = 8192 22 | temperature = 0.5 23 | api_endpoint = https://api.openai.com 24 | 25 | [anthropic] 26 | api_key = sk-ant-test-key-67890 27 | default_model = claude-3-haiku-20240307 28 | max_tokens = 4096 29 | temperature = 0.3 30 | 31 | [gemini] 32 | api_key = AIzaSyTest-gemini-key-valid 33 | default_model = gemini-2.5-flash 34 | max_tokens = 8192 35 | temperature = 0.5 36 | -------------------------------------------------------------------------------- /src/core/spi_connection.cpp: -------------------------------------------------------------------------------- 1 | #include "../include/spi_connection.hpp" 2 | 3 | namespace pg_ai { 4 | 5 | SPIConnection::SPIConnection() { 6 | if (SPI_connect() == SPI_OK_CONNECT) { 7 | connected_ = true; 8 | } else { 9 | connected_ = false; 10 | error_message_ = "Failed to connect to SPI"; 11 | } 12 | } 13 | 14 | SPIConnection::~SPIConnection() { 15 | if (connected_) { 16 | SPI_finish(); 17 | } 18 | } 19 | 20 | SPIConnection::SPIConnection(SPIConnection&& other) noexcept 21 | : connected_(other.connected_), 22 | error_message_(std::move(other.error_message_)) { 23 | other.connected_ = false; 24 | } 25 | 26 | SPIConnection& SPIConnection::operator=(SPIConnection&& other) noexcept { 27 | if (this != &other) { 28 | if (connected_) { 29 | SPI_finish(); 30 | } 31 | connected_ = other.connected_; 32 | error_message_ = std::move(other.error_message_); 33 | other.connected_ = false; 34 | } 35 | return *this; 36 | } 37 | 38 | } // namespace pg_ai 39 | -------------------------------------------------------------------------------- /docs/src/SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | [Introduction](./introduction.md) 4 | 5 | # Getting Started 6 | - [Installation](./installation.md) 7 | - [Configuration](./configuration.md) 8 | - [Response Formatting](./response-formatting.md) 9 | - 
[Quick Start](./quick-start.md)
10 |
11 | # Functions
12 | - [generate_query](./generate-query.md)
13 | - [explain_query](./explain-query.md)
14 | - [get_database_tables](./database-tables.md)
15 | - [get_table_details](./table-details.md)
16 |
17 | # Usage
18 | - [Basic Usage](./usage.md)
19 | - [Query Performance Analysis](./performance-analysis.md)
20 | - [Examples](./examples.md)
21 |
22 | # Advanced Topics
23 | - [Schema Discovery](./schema-discovery.md)
24 | - [AI Providers](./providers.md)
25 | - [Performance & Best Practices](./best-practices.md)
26 | - [Integration Patterns](./integration.md)
27 |
28 | # Development
29 | - [Nix Development Environment](./nix-development.md)
30 |
31 | # Reference
32 | - [Function Reference](./function-reference.md)
33 | - [API Reference](./api-reference.md)
34 | - [Configuration File Reference](./config-reference.md)
35 | - [Error Codes](./error-codes.md)
36 | - [Troubleshooting](./troubleshooting.md)
37 | - [FAQ](./faq.md)
--------------------------------------------------------------------------------
/src/include/gemini_client.h:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include <optional>
4 | #include <string>
5 |
6 | namespace gemini {
7 |
8 | struct GeminiRequest {
9 |   std::string model;
10 |   std::string system_prompt;
11 |   std::string user_prompt;
12 |   std::optional<double> temperature;
13 |   std::optional<int> max_tokens;
14 | };
15 |
16 | struct GeminiResponse {
17 |   std::string text;
18 |   bool success;
19 |   std::string error_message;
20 |   int status_code;
21 | };
22 |
23 | class GeminiClient {
24 |  public:
25 |   explicit GeminiClient(const std::string& api_key);
26 |   ~GeminiClient() = default;
27 |
28 |   GeminiResponse generate_text(const GeminiRequest& request);
29 |
30 |  private:
31 |   std::string api_key_;
32 |   static constexpr const char* BASE_URL =
33 |       "https://generativelanguage.googleapis.com";
34 |   static constexpr const char* API_VERSION = "v1beta";
35 |
36 |   std::string build_request_body(const GeminiRequest& request);
37 |   GeminiResponse parse_response(const std::string& body, int status_code);
38 |   GeminiResponse make_http_request(const std::string& url,
39 |                                    const std::string& body);
40 | };
41 |
42 | }  // namespace gemini
43 |
--------------------------------------------------------------------------------
/src/include/ai_client_factory.hpp:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include
4 |
5 | #include <string>
6 |
7 | #include "config.hpp"
8 |
9 | namespace pg_ai {
10 |
11 | namespace constants {
12 | constexpr const char* DEFAULT_OPENAI_MODEL = "gpt-4o";
13 | constexpr const char* DEFAULT_ANTHROPIC_MODEL = "claude-sonnet-4-5-20250929";
14 | }  // namespace constants
15 |
16 | struct AIClientResult {
17 |   ai::Client client;
18 |   std::string model_name;
19 |   bool success;
20 |   std::string error_message;
21 | };
22 |
23 | class AIClientFactory {
24 |  public:
25 |   /**
26 |    * @brief Create an AI client for the specified provider
27 |    *
28 |    * @param provider The provider to create client for
29 |    * @param api_key API key to use
30 |    * @param provider_config Optional provider config for model/settings
31 |    * @return AIClientResult with client and model name
32 |    */
33 |   static AIClientResult createClient(
34 |       config::Provider provider,
35 |       const std::string& api_key,
36 |       const config::ProviderConfig* provider_config = nullptr);
37 |
38 |   /**
39 |    * @brief Get the default model name for a provider
40 |    */
41 |   static std::string getDefaultModel(config::Provider provider);
42 | };
43 |
44 | }  // namespace pg_ai
45 |
--------------------------------------------------------------------------------
/src/include/response_formatter.hpp:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include <string>
4 | #include "config.hpp"
5 | #include "query_generator.hpp"
6 |
7 | namespace pg_ai {
8 |
9 | class ResponseFormatter {
10 |  public:
11 |   /**
12 |    * @brief Format query result based on configuration settings
13 |    * @param result The query result to format
14 |    * @param config Configuration settings for formatting
15 |    * @return Formatted response string
16 |    */
17 |   static std::string formatResponse(const QueryResult& result,
18 |                                     const config::Configuration& config);
19 |
20 |  private:
21 |   /**
22 |    * @brief Create JSON formatted response
23 |    */
24 |   static std::string createJSONResponse(const QueryResult& result,
25 |                                         const config::Configuration& config);
26 |
27 |   /**
28 |    * @brief Create plain text formatted response
29 |    */
30 |   static std::string createPlainTextResponse(
31 |       const QueryResult& result,
32 |       const config::Configuration& config);
33 |
34 |   /**
35 |    * @brief Format warnings for display
36 |    */
37 |   static std::string formatWarnings(const std::vector<std::string>& warnings);
38 |
39 |   /**
40 |    * @brief Format suggested visualization for display
41 |    */
42 |   static std::string formatVisualization(const std::string& visualization);
43 | };
44 |
45 | }  // namespace pg_ai
--------------------------------------------------------------------------------
/tests/test_helpers.hpp:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include <filesystem>
4 | #include <fstream>
5 | #include <string>
6 |
7 | namespace pg_ai::test_utils {
8 |
9 | inline std::string getFixturesPath() {
10 | #ifdef TEST_FIXTURES_PATH
11 |   return TEST_FIXTURES_PATH;
12 | #else
13 |   return "./fixtures";
14 | #endif
15 | }
16 |
17 | inline std::string getConfigFixture(const std::string& filename) {
18 |   return getFixturesPath() + "/configs/" + filename;
19 | }
20 |
21 | inline std::string getResponseFixture(const std::string& filename) {
22 |   return getFixturesPath() + "/responses/" + filename;
23 | }
24 |
25 | inline std::string readTestFile(const std::string& path) {
26 |   std::ifstream file(path);
27 |   if (!file) {
28 |     return "";
29 |   }
30 |   return std::string(std::istreambuf_iterator<char>(file),
31 |                      std::istreambuf_iterator<char>());
32 | }
33 |
34 | class TempConfigFile {
35 |  public:
36 |   TempConfigFile(const std::string& content) {
37 |     path_ = std::filesystem::temp_directory_path() /
38 |             ("pg_ai_test_config_" + std::to_string(rand()) + ".ini");
39 |     std::ofstream file(path_);
40 |     file << content;
41 |   }
42 |
43 |   ~TempConfigFile() { std::filesystem::remove(path_); }
44 |
45 |   std::string path() const { return path_.string(); }
46 |
47 |  private:
48 |   std::filesystem::path path_;
49 | };
50 |
51 | }  // namespace pg_ai::test_utils
52 |
--------------------------------------------------------------------------------
/.clang-format:
--------------------------------------------------------------------------------
1 | BasedOnStyle: Chromium
2 | # This defaults to 'Auto'. Explicitly set it for a while, so that
3 | # 'vector<vector<int> >' in existing files gets formatted to
4 | # 'vector<vector<int>>'. ('Auto' means that clang-format will only use
5 | # 'int>>' if the file already contains at least one such instance.)
6 | Standard: Cpp11
7 | SortIncludes: true
8 |
9 | # Include organization settings
10 | IncludeBlocks: Preserve
11 | IncludeCategories:
12 |   # PostgreSQL headers - all same priority to stay together
13 |   - Regex: '^<postgres\.h>$'
14 |     Priority: 0
15 |     SortPriority: 0
16 |     CaseSensitive: false
17 |   - Regex: '^<(fmgr|funcapi|miscadmin|access|catalog|utils)/'
18 |     Priority: 0
19 |     SortPriority: 1
20 |     CaseSensitive: false
21 |   - Regex: '^<(fmgr|funcapi|miscadmin)\.h>$'
22 |     Priority: 0
23 |     SortPriority: 1
24 |     CaseSensitive: false
25 |   # Primary header (for .cc files)
26 |   - Regex: '^".*\.h"$'
27 |     Priority: 2
28 |     SortPriority: 2
29 |     CaseSensitive: false
30 |   # System/standard library headers
31 |   - Regex: '^<[^/>]*>$'
32 |     Priority: 3
33 |     SortPriority: 3
34 |     CaseSensitive: false
35 |   # Third-party library headers (contain / but start with <)
36 |   - Regex: '^<.*/'
37 |     Priority: 4
38 |     SortPriority: 4
39 |     CaseSensitive: false
40 |   # Project local headers
41 |   - Regex: '^".*"$'
42 |     Priority: 5
43 |     SortPriority: 5
44 |     CaseSensitive: false
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE/documentation.md:
--------------------------------------------------------------------------------
1 | ## Documentation Changes
2 |
3 |
4 |
5 | ## Motivation
6 |
7 |
8 |
9 | ## Changes Made
10 |
11 |
12 |
13 | -
14 | -
15 | -
16 |
17 | ## Type of Documentation
18 |
19 | - [ ] User documentation (README, guides, tutorials)
20 | - [ ] Developer documentation (CONTRIBUTING, architecture)
21 | - [ ] API documentation (function references, configuration)
22 | - [ ] Code comments and inline documentation
23 | - [ ] Examples and usage guides
24 |
25 | ## Related Issue
26 |
27 |
28 |
29 | Closes #
30 |
31 | ## Review Checklist
32 |
33 | - [ ] All information is accurate and up-to-date
34 | - [ ] Examples are tested and working
35 | - [ ] Links are valid and not broken
36 | - [ ] Formatting is consistent with existing documentation
37 | - [ ] Grammar and spelling are correct
38 | - [ ] Code snippets are properly formatted
39 | - [ ] Screenshots/diagrams are clear and helpful (if applicable)
40 |
41 | ## Preview
42 |
43 |
44 |
45 | ```
46 |
47 | ```
48 |
49 | ## Additional Context
50 |
51 |
52 |
--------------------------------------------------------------------------------
/src/core/logger.cpp:
--------------------------------------------------------------------------------
1 | #include "../include/logger.hpp"
2 | #include <iostream>
3 |
4 | namespace pg_ai::logger {
5 |
6 | // Static flag to control logging, set by config manager
7 | static bool logging_enabled = false;
8 |
9 | void Logger::setLoggingEnabled(bool enabled) {
10 |   logging_enabled = enabled;
11 | }
12 |
13 | void Logger::debug(const std::string& message) {
14 |   if (!logging_enabled)
15 |     return;
16 | #ifdef USE_POSTGRESQL_ELOG
17 |   ereport(DEBUG1, (errmsg("[pg_ai_query] %s", message.c_str())));
18 | #else
19 |   std::cerr << "[DEBUG] [pg_ai_query] " << message << std::endl;
20 | #endif
21 | }
22 |
23 | void Logger::info(const std::string& message) {
24 |   if (!logging_enabled)
25 |     return;
26 | #ifdef USE_POSTGRESQL_ELOG
27 |   ereport(INFO, (errmsg("[pg_ai_query] %s", message.c_str())));
28 | #else
29 |   std::cerr << "[INFO] [pg_ai_query] " << message << std::endl;
30 | #endif
31 | }
32 |
33 | void Logger::warning(const std::string& message) {
34 |   if (!logging_enabled)
35 |     return;
36 | #ifdef USE_POSTGRESQL_ELOG
37 |   ereport(WARNING, (errmsg("[pg_ai_query] %s", message.c_str())));
38 | #else
39 |   std::cerr << "[WARNING] [pg_ai_query] " << message << std::endl;
40 | #endif
41 | }
42 |
43 | void Logger::error(const std::string& message) {
44 |   if (!logging_enabled)
45 |     return;
46 | #ifdef USE_POSTGRESQL_ELOG
47 |   // Deliberately logs at LOG level: ereport(ERROR) would raise a
48 |   // PostgreSQL error and abort the current transaction.
49 |   ereport(LOG, (errmsg("[pg_ai_query] ERROR: %s", message.c_str())));
50 | #else
51 |   std::cerr << "[ERROR] [pg_ai_query] " << message << std::endl;
52 | #endif
53 | }
54 |
55 | }  // namespace pg_ai::logger
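The selection precedence documented in `provider_selector.hpp` below can be condensed into a standalone sketch. The types and the auto-detect ordering here are simplifying assumptions for illustration; the real implementation also resolves the API key and its source from the configuration file.

```cpp
#include <string>

// Simplified stand-in for config::Provider (assumed shape, not the real enum).
enum class Provider { OpenAI, Anthropic, Gemini, None };

// Mirrors the documented precedence: an explicit preference wins, a bare
// API key defaults to OpenAI, otherwise auto-detect from configured keys.
// The auto-detect order used here is an assumption.
Provider select_provider(const std::string& preference,
                         const std::string& api_key,
                         bool openai_configured,
                         bool anthropic_configured,
                         bool gemini_configured) {
  // 1. Explicit provider preference always wins.
  if (preference == "openai") return Provider::OpenAI;
  if (preference == "anthropic") return Provider::Anthropic;
  if (preference == "gemini") return Provider::Gemini;
  // 2. An API key passed without a preference defaults to OpenAI.
  if (!api_key.empty()) return Provider::OpenAI;
  // 3. No key: fall back to whichever provider has a configured key.
  if (openai_configured) return Provider::OpenAI;
  if (anthropic_configured) return Provider::Anthropic;
  if (gemini_configured) return Provider::Gemini;
  return Provider::None;  // caller reports this via error_message
}
```

This keeps the three documented rules in one place, so a caller can test each branch in isolation.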
--------------------------------------------------------------------------------
/src/include/provider_selector.hpp:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include
4 | #include <string>
5 |
6 | #include "config.hpp"
7 |
8 | namespace pg_ai {
9 |
10 | struct ProviderSelectionResult {
11 |   config::Provider provider = config::Provider::OPENAI;
12 |   const config::ProviderConfig* config = nullptr;
13 |   std::string api_key;
14 |   std::string api_key_source;
15 |   bool success = false;
16 |   std::string error_message;
17 | };
18 |
19 | class ProviderSelector {
20 |  public:
21 |   /**
22 |    * @brief Select the appropriate provider and resolve API key
23 |    *
24 |    * Selection logic:
25 |    * 1. If provider_preference is explicitly "openai", "anthropic", or
26 |    *    "gemini", use that
27 |    * 2. If api_key is provided without provider preference, default to OpenAI
28 |    * 3. If no api_key is provided, auto-detect based on available config keys
29 |    *
30 |    * @param api_key API key passed as parameter (may be empty)
31 |    * @param provider_preference Provider preference ("openai", "anthropic",
32 |    *    "gemini", or empty for auto)
33 |    * @return ProviderSelectionResult with selected provider, config, and API key
34 |    */
35 |   static ProviderSelectionResult selectProvider(
36 |       const std::string& api_key,
37 |       const std::string& provider_preference);
38 |
39 |  private:
40 |   static ProviderSelectionResult selectExplicitProvider(
41 |       const std::string& api_key,
42 |       config::Provider provider);
43 |
44 |   static ProviderSelectionResult autoSelectProvider(const std::string& api_key);
45 | };
46 |
47 | }  // namespace pg_ai
48 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE/feature.md:
--------------------------------------------------------------------------------
1 | ## Description
2 |
3 |
4 |
5 | ## Motivation
6 |
7 |
8 |
9 | ## Changes
10 |
11 |
12 |
13 | -
14 | -
15 | -
16
| 17 | ## Type of Change 18 | 19 | - [ ] New feature (non-breaking change which adds functionality) 20 | - [ ] Enhancement (improvement to existing functionality) 21 | - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) 22 | 23 | ## Testing 24 | 25 | 26 | 27 | - [ ] Tested with PostgreSQL version(s): 28 | - [ ] Verified extension loads correctly 29 | - [ ] Tested with sample queries 30 | - [ ] Verified schema discovery functions work 31 | 32 | ## Configuration Changes 33 | 34 | 35 | 36 | - [ ] No configuration changes 37 | - [ ] New configuration options added (describe below) 38 | 39 | ## Documentation 40 | 41 | - [ ] Updated README.md (if applicable) 42 | - [ ] Updated CONTRIBUTING.md (if applicable) 43 | - [ ] Updated docs/ (if applicable) 44 | - [ ] Added inline code documentation 45 | 46 | ## Checklist 47 | 48 | - [ ] My code follows the code style of this project 49 | - [ ] I have performed a self-review of my own code 50 | - [ ] I have commented my code, particularly in hard-to-understand areas 51 | - [ ] My changes generate no new warnings 52 | - [ ] Any dependent changes have been merged and published 53 | 54 | ## Additional Notes 55 | 56 | 57 | -------------------------------------------------------------------------------- /flake.lock: -------------------------------------------------------------------------------- 1 | { 2 | "nodes": { 3 | "flake-utils": { 4 | "inputs": { 5 | "systems": "systems" 6 | }, 7 | "locked": { 8 | "lastModified": 1731533236, 9 | "narHash": "sha256-l0KFg5HjrsfsO/JpG+r7fRrqm12kzFHyUHqHCVpMMbI=", 10 | "owner": "numtide", 11 | "repo": "flake-utils", 12 | "rev": "11707dc2f618dd54ca8739b309ec4fc024de578b", 13 | "type": "github" 14 | }, 15 | "original": { 16 | "owner": "numtide", 17 | "repo": "flake-utils", 18 | "type": "github" 19 | } 20 | }, 21 | "nixpkgs": { 22 | "locked": { 23 | "lastModified": 1763966396, 24 | "narHash": "sha256-6eeL1YPcY1MV3DDStIDIdy/zZCDKgHdkCmsrLJFiZf0=", 25 | 
"owner": "NixOS", 26 | "repo": "nixpkgs", 27 | "rev": "5ae3b07d8d6527c42f17c876e404993199144b6a", 28 | "type": "github" 29 | }, 30 | "original": { 31 | "owner": "NixOS", 32 | "ref": "nixos-unstable", 33 | "repo": "nixpkgs", 34 | "type": "github" 35 | } 36 | }, 37 | "root": { 38 | "inputs": { 39 | "flake-utils": "flake-utils", 40 | "nixpkgs": "nixpkgs" 41 | } 42 | }, 43 | "systems": { 44 | "locked": { 45 | "lastModified": 1681028828, 46 | "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=", 47 | "owner": "nix-systems", 48 | "repo": "default", 49 | "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e", 50 | "type": "github" 51 | }, 52 | "original": { 53 | "owner": "nix-systems", 54 | "repo": "default", 55 | "type": "github" 56 | } 57 | } 58 | }, 59 | "root": "root", 60 | "version": 7 61 | } 62 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Build directories 2 | build/ 3 | install/ 4 | dist/ 5 | out/ 6 | 7 | # Compiled Object files 8 | *.slo 9 | *.lo 10 | *.o 11 | *.obj 12 | *.so 13 | *.dylib 14 | *.dll 15 | 16 | # Compiled Dynamic libraries 17 | *.so.* 18 | *.dylib 19 | 20 | # Compiled Static libraries 21 | *.lai 22 | *.la 23 | *.a 24 | *.lib 25 | 26 | # Executables 27 | *.exe 28 | *.out 29 | *.app 30 | 31 | # CMake 32 | CMakeCache.txt 33 | CMakeFiles/ 34 | CMakeScripts/ 35 | compile_commands.json 36 | CTestTestfile.cmake 37 | _deps/ 38 | 39 | # IDE files 40 | .vscode/ 41 | .idea/ 42 | *.swp 43 | *.swo 44 | *~ 45 | .DS_Store 46 | 47 | # Configuration files with secrets 48 | *.config 49 | .env 50 | .env.local 51 | .env.production 52 | .env.test 53 | 54 | # Logs 55 | *.log 56 | logs/ 57 | *.pid 58 | 59 | # Package files 60 | *.tar.gz 61 | *.zip 62 | *.deb 63 | *.rpm 64 | 65 | # PostgreSQL 66 | postgresql-*.log 67 | pg_log/ 68 | logfile 69 | 70 | # Documentation build 71 | docs/book/ 72 | docs/_book/ 73 | docs/node_modules/ 74 
| 75 | # Testing 76 | build_tests 77 | test_results/ 78 | coverage/ 79 | *.gcov 80 | *.gcda 81 | *.gcno 82 | 83 | # Backup files 84 | *.bak 85 | *.backup 86 | *~ 87 | 88 | # Temporary files 89 | tmp/ 90 | temp/ 91 | .tmp/ 92 | .cache/ 93 | 94 | # OS generated files 95 | Thumbs.db 96 | ehthumbs.db 97 | Desktop.ini 98 | 99 | # API Keys and secrets (examples) 100 | *api_key* 101 | *secret* 102 | *.pem 103 | *.key 104 | *.crt 105 | 106 | # Local development 107 | local_config/ 108 | dev_setup/ 109 | 110 | # Nix 111 | result 112 | result-* 113 | .direnv/ 114 | .pgdata/ 115 | 116 | # Rust/Cargo (local installation in Nix env) 117 | .cargo/ 118 | .rustup/ 119 | 120 | # Note: .envrc is committed to the repository 121 | # flake.lock is also committed for reproducibility -------------------------------------------------------------------------------- /docs/src/database-tables.md: -------------------------------------------------------------------------------- 1 | # get_database_tables Function 2 | 3 | The `get_database_tables` function returns metadata about all user tables in the database, providing a comprehensive overview of your database schema. 4 | 5 | ## Function Signature 6 | 7 | ```sql 8 | get_database_tables() RETURNS text 9 | ``` 10 | 11 | ## Parameters 12 | 13 | This function takes no parameters. 
14 | 15 | ## Basic Usage 16 | 17 | ```sql 18 | -- Get all database tables 19 | SELECT get_database_tables(); 20 | 21 | -- Pretty print with formatting 22 | SELECT jsonb_pretty(get_database_tables()::jsonb); 23 | ``` 24 | 25 | ## Output Format 26 | 27 | The function returns a JSON array containing metadata for each table: 28 | 29 | ```json 30 | [ 31 | { 32 | "table_name": "users", 33 | "schema_name": "public", 34 | "table_type": "BASE TABLE", 35 | "estimated_rows": 1500 36 | }, 37 | { 38 | "table_name": "orders", 39 | "schema_name": "public", 40 | "table_type": "BASE TABLE", 41 | "estimated_rows": 5000 42 | } 43 | ] 44 | ``` 45 | 46 | ## Extracting Information 47 | 48 | ### Get Table Names Only 49 | 50 | ```sql 51 | SELECT 52 | t->>'table_name' AS table_name 53 | FROM jsonb_array_elements(get_database_tables()::jsonb) AS t; 54 | ``` 55 | 56 | ### Get Tables with Row Counts 57 | 58 | ```sql 59 | SELECT 60 | table_name, 61 | estimated_rows 62 | FROM jsonb_to_recordset(get_database_tables()::jsonb) 63 | AS x(table_name text, estimated_rows bigint) 64 | ORDER BY estimated_rows DESC; 65 | ``` 66 | 67 | ## Use Cases 68 | 69 | - Database exploration and documentation 70 | - Schema discovery for applications 71 | - Inventory of available tables 72 | - Integration with generate_query function 73 | 74 | ## See Also 75 | 76 | - [get_table_details Function](./table-details.md) - Get detailed table information 77 | - [generate_query Function](./generate-query.md) - Generate queries from natural language -------------------------------------------------------------------------------- /docs/src/generate-query.md: -------------------------------------------------------------------------------- 1 | # generate_query Function 2 | 3 | The `generate_query` function converts natural language descriptions into valid PostgreSQL queries using AI models from OpenAI and Anthropic.
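
A quick illustration (hypothetical schema; the generated SQL and output formatting depend entirely on your database and the configured model):

```sql
-- Ask in plain English...
SELECT generate_query('show the five most recent orders');

-- ...and the response includes generated SQL along the lines of:
-- SELECT * FROM orders ORDER BY order_date DESC LIMIT 5;
```
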
4 | 5 | ## Function Signature 6 | 7 | ```sql 8 | generate_query( 9 | natural_language_query text, 10 | api_key text DEFAULT NULL, 11 | provider text DEFAULT 'auto' 12 | ) RETURNS text 13 | ``` 14 | 15 | ## Parameters 16 | 17 | | Parameter | Type | Default | Description | 18 | |-----------|------|---------|-------------| 19 | | `natural_language_query` | `text` | *required* | The natural language description of the desired query | 20 | | `api_key` | `text` | `NULL` | OpenAI or Anthropic API key (uses config if not provided) | 21 | | `provider` | `text` | `'auto'` | AI provider: `'openai'`, `'anthropic'`, or `'auto'` | 22 | 23 | ## Basic Usage 24 | 25 | ### Simple Queries 26 | 27 | ```sql 28 | -- Basic data retrieval 29 | SELECT generate_query('show all users'); 30 | 31 | -- With filtering 32 | SELECT generate_query('find users created in the last week'); 33 | 34 | -- Counting records 35 | SELECT generate_query('count total orders'); 36 | ``` 37 | 38 | ### Complex Queries 39 | 40 | ```sql 41 | -- Joins and aggregation 42 | SELECT generate_query('show top 10 customers by total order value with their email addresses'); 43 | 44 | -- Date-based analysis 45 | SELECT generate_query('monthly revenue trend for the last year'); 46 | ``` 47 | 48 | ## Configuration 49 | 50 | ### API Key Setup 51 | 52 | Create `~/.pg_ai.config`: 53 | 54 | ```ini 55 | [openai] 56 | api_key = sk-your-openai-api-key 57 | default_model = gpt-4o 58 | 59 | [anthropic] 60 | api_key = sk-ant-your-anthropic-key 61 | default_model = claude-sonnet-4-5-20250929 62 | ``` 63 | 64 | ## See Also 65 | 66 | - [explain_query Function](./explain-query.md) - Analyze query performance 67 | - [Quick Start](./quick-start.md) - More usage examples 68 | - [Error Codes](./error-codes.md) - Troubleshooting guide -------------------------------------------------------------------------------- /src/include/query_parser.hpp: -------------------------------------------------------------------------------- 1 | #pragma once 2
| 3 | #include <string> 4 | #include <vector> 5 | 6 | #include <nlohmann/json.hpp> 7 | 8 | namespace pg_ai { 9 | 10 | // Forward declaration to avoid circular dependency 11 | struct QueryResult; 12 | 13 | /** 14 | * @brief Pure parsing functions for query generation responses 15 | * 16 | * These functions are extracted from QueryGenerator to allow unit testing 17 | * without PostgreSQL dependencies. 18 | */ 19 | class QueryParser { 20 | public: 21 | /** 22 | * @brief Extract SQL JSON from an LLM response 23 | * 24 | * Handles various response formats: 25 | * - Direct JSON object 26 | * - JSON embedded in markdown code blocks 27 | * - Raw SQL text (fallback) 28 | * 29 | * @param response The raw response text from the LLM 30 | * @return Parsed JSON object with sql, explanation, warnings, etc. 31 | */ 32 | static nlohmann::json extractSQLFromResponse(const std::string& response); 33 | 34 | /** 35 | * @brief Parse a JSON response into a QueryResult struct 36 | * 37 | * @param response_text The raw response text from the LLM 38 | * @return QueryResult with parsed fields and success/error status 39 | */ 40 | static QueryResult parseQueryResponse(const std::string& response_text); 41 | 42 | /** 43 | * @brief Check if a SQL query accesses system tables 44 | * 45 | * @param sql The SQL query to check 46 | * @return true if the query accesses information_schema or pg_catalog 47 | */ 48 | static bool accessesSystemTables(const std::string& sql); 49 | 50 | /** 51 | * @brief Check if an explanation indicates an error condition 52 | * 53 | * @param explanation The explanation text to check 54 | * @param warnings Vector of warning messages 55 | * @return true if error indicators are found 56 | */ 57 | static bool hasErrorIndicators(const std::string& explanation, 58 | const std::vector<std::string>& warnings); 59 | }; 60 | 61 | } // namespace pg_ai 62 | -------------------------------------------------------------------------------- /tests/CMakeLists.txt:
-------------------------------------------------------------------------------- 1 | cmake_minimum_required(VERSION 3.16) 2 | 3 | enable_testing() 4 | 5 | # Source files for core library (without PostgreSQL dependencies) 6 | set(CORE_SOURCES 7 | ${CMAKE_SOURCE_DIR}/src/config.cpp 8 | ${CMAKE_SOURCE_DIR}/src/core/provider_selector.cpp 9 | ${CMAKE_SOURCE_DIR}/src/core/response_formatter.cpp 10 | ${CMAKE_SOURCE_DIR}/src/core/ai_client_factory.cpp 11 | ${CMAKE_SOURCE_DIR}/src/core/logger.cpp 12 | ${CMAKE_SOURCE_DIR}/src/core/query_parser.cpp 13 | ${CMAKE_SOURCE_DIR}/src/utils.cpp 14 | ${CMAKE_SOURCE_DIR}/src/prompts.cpp 15 | ) 16 | 17 | # Create a testable core library (without PostgreSQL SPI dependencies) 18 | add_library(pg_ai_query_core STATIC ${CORE_SOURCES}) 19 | 20 | target_include_directories(pg_ai_query_core PUBLIC 21 | ${CMAKE_SOURCE_DIR}/src 22 | ${CMAKE_SOURCE_DIR}/third_party/ai-sdk-cpp/include 23 | ) 24 | 25 | target_link_libraries(pg_ai_query_core PUBLIC 26 | ai-sdk-cpp-core 27 | ai-sdk-cpp-openai 28 | ai-sdk-cpp-anthropic 29 | OpenSSL::SSL 30 | OpenSSL::Crypto 31 | ) 32 | 33 | # Don't define USE_POSTGRESQL_ELOG for tests - use stdout logging 34 | # This ensures Logger uses stdout instead of PostgreSQL's elog() 35 | 36 | # Unit tests executable 37 | add_executable(pg_ai_query_tests 38 | unit/test_config.cpp 39 | unit/test_provider_selector.cpp 40 | unit/test_response_formatter.cpp 41 | unit/test_utils.cpp 42 | unit/test_query_parser.cpp 43 | ) 44 | 45 | target_include_directories(pg_ai_query_tests PRIVATE 46 | ${CMAKE_SOURCE_DIR}/src 47 | ${CMAKE_SOURCE_DIR}/third_party/ai-sdk-cpp/include 48 | ${CMAKE_CURRENT_SOURCE_DIR} 49 | ) 50 | 51 | target_link_libraries(pg_ai_query_tests PRIVATE 52 | pg_ai_query_core 53 | gtest_main 54 | gmock 55 | ) 56 | 57 | # Define test fixtures path 58 | target_compile_definitions(pg_ai_query_tests PRIVATE 59 | TEST_FIXTURES_PATH="${CMAKE_CURRENT_SOURCE_DIR}/fixtures" 60 | ) 61 | 62 | include(GoogleTest) 63 | 
gtest_discover_tests(pg_ai_query_tests) -------------------------------------------------------------------------------- /tests/sql/setup.sql: -------------------------------------------------------------------------------- 1 | -- Setup script for pg_ai_query extension tests 2 | -- This file sets up the test environment 3 | 4 | -- Create test schema 5 | CREATE SCHEMA IF NOT EXISTS pg_ai_test; 6 | 7 | -- Create test tables for schema introspection tests 8 | CREATE TABLE IF NOT EXISTS pg_ai_test.users ( 9 | id SERIAL PRIMARY KEY, 10 | name VARCHAR(100) NOT NULL, 11 | email VARCHAR(255) UNIQUE NOT NULL, 12 | created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 13 | active BOOLEAN DEFAULT true 14 | ); 15 | 16 | CREATE TABLE IF NOT EXISTS pg_ai_test.orders ( 17 | id SERIAL PRIMARY KEY, 18 | user_id INTEGER REFERENCES pg_ai_test.users(id), 19 | total_amount DECIMAL(10, 2) NOT NULL, 20 | status VARCHAR(50) DEFAULT 'pending', 21 | order_date DATE DEFAULT CURRENT_DATE 22 | ); 23 | 24 | CREATE TABLE IF NOT EXISTS pg_ai_test.products ( 25 | id SERIAL PRIMARY KEY, 26 | name VARCHAR(200) NOT NULL, 27 | price DECIMAL(10, 2) NOT NULL, 28 | category VARCHAR(100), 29 | in_stock BOOLEAN DEFAULT true 30 | ); 31 | 32 | CREATE TABLE IF NOT EXISTS pg_ai_test.order_items ( 33 | id SERIAL PRIMARY KEY, 34 | order_id INTEGER REFERENCES pg_ai_test.orders(id), 35 | product_id INTEGER REFERENCES pg_ai_test.products(id), 36 | quantity INTEGER NOT NULL, 37 | unit_price DECIMAL(10, 2) NOT NULL 38 | ); 39 | 40 | -- Create index for testing 41 | CREATE INDEX IF NOT EXISTS idx_users_email ON pg_ai_test.users(email); 42 | CREATE INDEX IF NOT EXISTS idx_orders_user_id ON pg_ai_test.orders(user_id); 43 | 44 | -- Insert some test data 45 | INSERT INTO pg_ai_test.users (name, email, active) VALUES 46 | ('Alice', 'alice@test.com', true), 47 | ('Bob', 'bob@test.com', true), 48 | ('Charlie', 'charlie@test.com', false) 49 | ON CONFLICT (email) DO NOTHING; 50 | 51 | INSERT INTO pg_ai_test.products (name, price, 
category, in_stock) VALUES 52 | ('Widget', 9.99, 'gadgets', true), 53 | ('Gadget', 19.99, 'gadgets', true), 54 | ('Doohickey', 29.99, 'tools', false) 55 | ON CONFLICT DO NOTHING; 56 | 57 | -- Ensure the extension is loaded 58 | CREATE EXTENSION IF NOT EXISTS pg_ai_query; 59 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE/bugfix.md: -------------------------------------------------------------------------------- 1 | ## Bug Description 2 | 3 | 4 | 5 | ## Root Cause 6 | 7 | 8 | 9 | ## Solution 10 | 11 | 12 | 13 | ## Related Issue 14 | 15 | 16 | 17 | Fixes # 18 | 19 | ## Changes Made 20 | 21 | 22 | 23 | - 24 | - 25 | - 26 | 27 | ## Type of Change 28 | 29 | - [ ] Bug fix (non-breaking change which fixes an issue) 30 | - [ ] Critical bug fix (fixes a security or data integrity issue) 31 | - [ ] Breaking change (fix that would cause existing functionality to not work as expected) 32 | 33 | ## Testing 34 | 35 | 36 | 37 | ### Reproduction Steps (Before Fix) 38 | 1. 39 | 2. 40 | 3. 41 | 42 | ### Verification Steps (After Fix) 43 | 1. 44 | 2. 45 | 3. 
46 | 47 | ### Test Environment 48 | - [ ] Tested with PostgreSQL version(s): 49 | - [ ] Verified extension loads correctly 50 | - [ ] Reproduced the original bug 51 | - [ ] Verified the fix resolves the issue 52 | - [ ] Tested edge cases 53 | 54 | ## Impact Assessment 55 | 56 | 57 | 58 | - **Affected Components**: 59 | - **Affected Users**: 60 | - **Severity**: Low / Medium / High / Critical 61 | 62 | ## Backwards Compatibility 63 | 64 | - [ ] This fix maintains backwards compatibility 65 | - [ ] This fix requires migration steps (describe below) 66 | 67 | ## Documentation 68 | 69 | - [ ] Updated README.md (if applicable) 70 | - [ ] Updated troubleshooting guide (if applicable) 71 | - [ ] Added comments explaining the fix 72 | - [ ] Updated configuration docs (if applicable) 73 | 74 | ## Checklist 75 | 76 | - [ ] My code follows the code style of this project 77 | - [ ] I have performed a self-review of my own code 78 | - [ ] I have tested the fix thoroughly 79 | - [ ] My changes generate no new warnings 80 | - [ ] I have added comments explaining the fix 81 | 82 | ## Additional Notes 83 | 84 | 85 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE/refactor.md: -------------------------------------------------------------------------------- 1 | ## Refactoring Description 2 | 3 | 4 | 5 | ## Motivation 6 | 7 | 8 | 9 | - [ ] Improve code readability 10 | - [ ] Improve performance 11 | - [ ] Reduce technical debt 12 | - [ ] Better code organization 13 | - [ ] Prepare for future features 14 | - [ ] Other (describe below) 15 | 16 | ## Changes Made 17 | 18 | 19 | 20 | - 21 | - 22 | - 23 | 24 | ## Type of Refactoring 25 | 26 | - [ ] Code restructuring (moving files, renaming) 27 | - [ ] Algorithm optimization 28 | - [ ] Design pattern implementation 29 | - [ ] Dependency updates 30 | - [ ] Code cleanup and simplification 31 | 32 | ## Impact 33 | 34 | 35 | 36 | ### Before 37 | 38 | 39 | ### After 40 | 41 | 42 | 
## Testing 43 | 44 | 45 | 46 | - [ ] All existing functionality works as before 47 | - [ ] No behavioral changes introduced 48 | - [ ] Tested with PostgreSQL version(s): 49 | - [ ] Performance benchmarks (if applicable) 50 | 51 | ## Performance Impact 52 | 53 | 54 | 55 | - [ ] No performance impact 56 | - [ ] Performance improved (provide metrics if possible) 57 | - [ ] Performance slightly degraded (justified by other benefits) 58 | 59 | ## Breaking Changes 60 | 61 | - [ ] No breaking changes 62 | - [ ] Breaking changes present (describe below and update version accordingly) 63 | 64 | ## Documentation 65 | 66 | - [ ] Updated code comments 67 | - [ ] Updated architecture documentation (if applicable) 68 | - [ ] Updated CONTRIBUTING.md (if applicable) 69 | - [ ] No documentation changes needed 70 | 71 | ## Checklist 72 | 73 | - [ ] My code follows the code style of this project 74 | - [ ] I have performed a self-review of my own code 75 | - [ ] I have tested that existing functionality still works 76 | - [ ] My changes generate no new warnings 77 | - [ ] Any dependent changes have been merged 78 | 79 | ## Additional Notes 80 | 81 | 82 | -------------------------------------------------------------------------------- /src/include/query_generator.hpp: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include <string> 4 | #include <vector> 5 | 6 | #include <optional> 7 | 8 | namespace pg_ai { 9 | 10 | struct QueryRequest { 11 | std::string natural_language; 12 | std::string api_key; 13 | std::string provider; 14 | }; 15 | 16 | struct QueryResult { 17 | std::string generated_query; 18 | std::string explanation; 19 | std::vector<std::string> warnings; 20 | bool row_limit_applied; 21 | std::string suggested_visualization; 22 | bool success; 23 | std::string error_message; 24 | }; 25 | 26 | struct TableInfo { 27 | std::string table_name; 28 | std::string schema_name; 29 | std::string table_type; 30 | int64_t estimated_rows; 31 | }; 32 | 33 | struct
ColumnInfo { 34 | std::string column_name; 35 | std::string data_type; 36 | bool is_nullable; 37 | std::string column_default; 38 | bool is_primary_key; 39 | bool is_foreign_key; 40 | std::string foreign_table; 41 | std::string foreign_column; 42 | }; 43 | 44 | struct TableDetails { 45 | std::string table_name; 46 | std::string schema_name; 47 | std::vector<ColumnInfo> columns; 48 | std::vector<std::string> indexes; 49 | bool success; 50 | std::string error_message; 51 | }; 52 | 53 | struct DatabaseSchema { 54 | std::vector<TableInfo> tables; 55 | bool success; 56 | std::string error_message; 57 | }; 58 | 59 | struct ExplainRequest { 60 | std::string query_text; 61 | std::string api_key; 62 | std::string provider; 63 | }; 64 | 65 | struct ExplainResult { 66 | std::string query; 67 | std::string explain_output; 68 | std::string ai_explanation; 69 | bool success; 70 | std::string error_message; 71 | }; 72 | 73 | class QueryGenerator { 74 | public: 75 | static QueryResult generateQuery(const QueryRequest& request); 76 | static DatabaseSchema getDatabaseTables(); 77 | static TableDetails getTableDetails( 78 | const std::string& table_name, 79 | const std::string& schema_name = "public"); 80 | static ExplainResult explainQuery(const ExplainRequest& request); 81 | 82 | static std::string formatSchemaForAI(const DatabaseSchema& schema); 83 | static std::string formatTableDetailsForAI(const TableDetails& details); 84 | 85 | private: 86 | static std::string buildPrompt(const QueryRequest& request); 87 | static void logModelSettings(const std::string& model_name, 88 | std::optional<int> max_tokens, 89 | std::optional<double> temperature); 90 | }; 91 | 92 | } // namespace pg_ai -------------------------------------------------------------------------------- /src/include/spi_connection.hpp: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | extern "C" { 4 | #include <postgres.h> 5 | 6 | #include <executor/spi.h> 7 | } 8 | 9 | #include <string> 10 | 11 | namespace pg_ai { 12 | 13 | /** 14 | * @brief RAII
wrapper for PostgreSQL SPI connections 15 | * 16 | * Automatically connects to SPI on construction and disconnects on destruction. 17 | * This ensures SPI connections are properly cleaned up even when exceptions 18 | * occur. 19 | */ 20 | class SPIConnection { 21 | public: 22 | SPIConnection(); 23 | ~SPIConnection(); 24 | 25 | SPIConnection(const SPIConnection&) = delete; 26 | SPIConnection& operator=(const SPIConnection&) = delete; 27 | 28 | SPIConnection(SPIConnection&& other) noexcept; 29 | SPIConnection& operator=(SPIConnection&& other) noexcept; 30 | 31 | /** 32 | * @brief Check if the SPI connection is valid 33 | */ 34 | bool isConnected() const { return connected_; } 35 | 36 | /** 37 | * @brief Get the error message if connection failed 38 | */ 39 | const std::string& getErrorMessage() const { return error_message_; } 40 | 41 | /** 42 | * @brief Explicit conversion to bool for easy checking 43 | */ 44 | explicit operator bool() const { return connected_; } 45 | 46 | private: 47 | bool connected_ = false; 48 | std::string error_message_; 49 | }; 50 | 51 | /** 52 | * @brief Helper struct for managing SPI values that need to be freed 53 | * 54 | * Automatically frees the value with pfree on destruction. 
55 | */ 56 | class SPIValue { 57 | public: 58 | explicit SPIValue(char* value) : value_(value) {} 59 | ~SPIValue() { 60 | if (value_) { 61 | pfree(value_); 62 | } 63 | } 64 | 65 | SPIValue(const SPIValue&) = delete; 66 | SPIValue& operator=(const SPIValue&) = delete; 67 | 68 | SPIValue(SPIValue&& other) noexcept : value_(other.value_) { 69 | other.value_ = nullptr; 70 | } 71 | 72 | SPIValue& operator=(SPIValue&& other) noexcept { 73 | if (this != &other) { 74 | if (value_) { 75 | pfree(value_); 76 | } 77 | value_ = other.value_; 78 | other.value_ = nullptr; 79 | } 80 | return *this; 81 | } 82 | 83 | /** 84 | * @brief Get the raw char* value 85 | */ 86 | char* get() const { return value_; } 87 | 88 | /** 89 | * @brief Convert to string, returns empty string if null 90 | */ 91 | std::string toString() const { return value_ ? std::string(value_) : ""; } 92 | 93 | /** 94 | * @brief Check if the value is not null 95 | */ 96 | explicit operator bool() const { return value_ != nullptr; } 97 | 98 | private: 99 | char* value_; 100 | }; 101 | 102 | } // namespace pg_ai 103 | -------------------------------------------------------------------------------- /docs/src/error-codes.md: -------------------------------------------------------------------------------- 1 | # Error Codes and Messages 2 | 3 | This page provides a comprehensive reference for all error codes and messages that you might encounter while using the pg_ai_query extension. 4 | 5 | ## Function-Specific Errors 6 | 7 | ### generate_query Errors 8 | 9 | | Error Message | Cause | Solution | 10 | |---------------|-------|----------| 11 | | `"API key required. 
Pass as parameter or configure ~/.pg_ai.config"` | No API key provided and none configured | Add API key to config file or pass as parameter | 12 | | `"No API key available for [provider] provider"` | API key missing for specific provider | Configure API key for the requested provider | 13 | | `"AI API error: [details]"` | AI service returned an error | Check API key validity and service status | 14 | | `"Natural language query cannot be empty"` | Empty input provided | Provide a non-empty query description | 15 | | `"Query generation failed: [details]"` | AI failed to generate query | Check your description clarity and try again | 16 | 17 | ### explain_query Errors 18 | 19 | | Error Message | Cause | Solution | 20 | |---------------|-------|----------| 21 | | `"Query text cannot be empty"` | Empty query provided | Provide a valid SQL query | 22 | | `"Failed to prepare EXPLAIN query: syntax error"` | Invalid SQL syntax | Fix SQL syntax errors | 23 | | `"Failed to execute EXPLAIN query"` | Query execution failed | Check query validity and permissions | 24 | | `"Failed to connect to SPI"` | Database connection issue | Check database connectivity | 25 | | `"No output from EXPLAIN query"` | EXPLAIN returned no results | Verify query syntax and structure | 26 | 27 | ### get_database_tables Errors 28 | 29 | | Error Message | Cause | Solution | 30 | |---------------|-------|----------| 31 | | `"Failed to connect to SPI"` | Database connection issue | Check database connectivity | 32 | | `"Failed to execute query"` | Database query failed | Check permissions and database status | 33 | | `"No tables found"` | No user tables in database | Create tables or check schema permissions | 34 | 35 | ### get_table_details Errors 36 | 37 | | Error Message | Cause | Solution | 38 | |---------------|-------|----------| 39 | | `"Table '[table_name]' does not exist"` | Specified table not found | Check table name and schema | 40 | | `"Failed to get table details: [details]"` | Error 
retrieving table info | Check permissions and table existence | 41 | 42 | ## Debugging Commands 43 | 44 | ```sql 45 | -- Check extension status 46 | SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_ai_query'; 47 | 48 | -- Test basic functionality with a simple natural-language request 49 | SELECT generate_query('select the current date'); 50 | 51 | -- Check function permissions 52 | \df generate_query 53 | ``` 54 | 55 | ## See Also 56 | 57 | - [Quick Start](./quick-start.md) 58 | - [generate_query Function](./generate-query.md) -------------------------------------------------------------------------------- /.agents/commands/gh-issue: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env nu 2 | 3 | # A command to generate an agent prompt to diagnose and formulate 4 | # a plan for resolving a GitHub issue for pg_ai_query. 5 | # 6 | # IMPORTANT: This command is prompted to NOT write any code and to ONLY 7 | # produce a plan. You should still be vigilant when running this but that 8 | # is the expected behavior. 9 | # 10 | # The `<issue>` parameter can be either an issue number or a full GitHub 11 | # issue URL. 12 | def main [ 13 | issue: any, # pg_ai_query issue number or URL 14 | --repo: string = "benodiwal/pg_ai_query" # GitHub repository in the format "owner/repo" 15 | ] { 16 | # TODO: This whole script doesn't handle errors very well. I actually 17 | # don't know Nu well enough to know the proper way to handle it all. 18 | 19 | let issueData = gh issue view $issue --repo $repo --json author,title,number,body,comments | from json 20 | let comments = $issueData.comments | each { |comment| 21 | $" 22 | ### Comment by ($comment.author.login) 23 | ($comment.body) 24 | " | str trim 25 | } | str join "\n\n" 26 | 27 | $" 28 | Deep-dive on this GitHub issue for pg_ai_query. Find the problem and generate a plan. 29 | Do not write code. Explain the problem clearly and propose a comprehensive plan 30 | to solve it.
31 | 32 | # ($issueData.title) \(($issueData.number)\) 33 | 34 | ## Description 35 | ($issueData.body) 36 | 37 | ## Comments 38 | ($comments) 39 | 40 | ## Your Tasks 41 | 42 | You are an experienced PostgreSQL and AI integration developer tasked with diagnosing issues. 43 | 44 | 1. Review the issue context and details. 45 | 2. Examine the relevant parts of the codebase. Analyze the code thoroughly 46 | until you have a solid understanding of how it works. 47 | 3. Explain the issue in detail, including the problem and its root cause. 48 | 4. Create a comprehensive plan to solve the issue. The plan should include: 49 | - Required code changes (SQL, C, or other relevant languages) 50 | - Potential impacts on PostgreSQL extension functionality 51 | - Database compatibility considerations 52 | - Necessary tests to be written or updated 53 | - Documentation updates 54 | - Performance considerations (especially for database operations) 55 | - Security implications (SQL injection, permissions, etc.) 56 | - Backwards compatibility with existing PostgreSQL versions 57 | - AI model integration considerations (if applicable) 58 | - Include the reference link to the source issue and any related discussions 59 | 5. Think deeply about all aspects of the task. Consider edge cases, potential 60 | challenges, and best practices for PostgreSQL extensions and AI integrations. 61 | Review the plan with the oracle and adjust it based on its feedback. 62 | 63 | **ONLY CREATE A PLAN. DO NOT WRITE ANY CODE.** Your task is to create 64 | a thorough, comprehensive strategy for understanding and resolving the issue. 
65 | " | str trim 66 | } 67 | -------------------------------------------------------------------------------- /sql/pg_ai_query--1.0.sql: -------------------------------------------------------------------------------- 1 | -- AI Query Generator Extension for PostgreSQL 2 | -- Generates SQL queries from natural language using OpenAI 3 | 4 | -- Version 1.0 5 | 6 | -- Main function: Generate SQL from natural language with automatic schema discovery 7 | CREATE OR REPLACE FUNCTION generate_query( 8 | natural_language_query text, 9 | api_key text DEFAULT NULL, 10 | provider text DEFAULT 'auto' 11 | ) 12 | RETURNS text 13 | AS 'MODULE_PATHNAME', 'generate_query' 14 | LANGUAGE C; 15 | 16 | -- Example usage: 17 | -- SELECT generate_query('Show me all users created in the last 7 days'); 18 | -- SELECT generate_query('Count orders by status'); 19 | -- SELECT generate_query('Show me all users', 'your-api-key-here'); 20 | -- SELECT generate_query('Show me all users', 'your-api-key-here', 'openai'); 21 | -- SELECT generate_query('Show me all users', 'your-api-key-here', 'anthropic'); 22 | 23 | COMMENT ON FUNCTION generate_query(text, text, text) IS 24 | 'Generate a PostgreSQL SELECT query from natural language description with automatic database schema discovery. Provider options: openai, anthropic, auto (default). 
Pass API key as parameter or configure ~/.pg_ai.config.'; 25 | 26 | -- Get all tables in the database with metadata 27 | CREATE OR REPLACE FUNCTION get_database_tables() 28 | RETURNS text 29 | AS 'MODULE_PATHNAME', 'get_database_tables' 30 | LANGUAGE C; 31 | 32 | -- Get detailed information about a specific table 33 | CREATE OR REPLACE FUNCTION get_table_details( 34 | table_name text, 35 | schema_name text DEFAULT 'public' 36 | ) 37 | RETURNS text 38 | AS 'MODULE_PATHNAME', 'get_table_details' 39 | LANGUAGE C; 40 | 41 | -- Example usage: 42 | -- SELECT get_database_tables(); 43 | -- SELECT get_table_details('users'); 44 | -- SELECT get_table_details('orders', 'public'); 45 | 46 | COMMENT ON FUNCTION get_database_tables() IS 47 | 'Returns JSON array of all user tables in the database with metadata including table name, schema, type, and estimated row count.'; 48 | 49 | COMMENT ON FUNCTION get_table_details(text, text) IS 50 | 'Returns detailed JSON information about a specific table including columns with their data types, constraints, foreign keys, and indexes.'; 51 | 52 | -- Explain query function: Runs EXPLAIN ANALYZE and provides AI-generated explanation 53 | CREATE OR REPLACE FUNCTION explain_query( 54 | query_text text, 55 | api_key text DEFAULT NULL, 56 | provider text DEFAULT 'auto' 57 | ) 58 | RETURNS text 59 | AS 'MODULE_PATHNAME', 'explain_query' 60 | LANGUAGE C 61 | VOLATILE 62 | SECURITY DEFINER; 63 | 64 | -- Example usage: 65 | -- SELECT explain_query('SELECT * FROM users WHERE created_at > NOW() - INTERVAL ''7 days'''); 66 | -- SELECT explain_query('SELECT u.name, COUNT(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id', 'your-api-key-here'); 67 | -- SELECT explain_query('SELECT * FROM products ORDER BY price DESC LIMIT 10', 'your-api-key-here', 'openai'); 68 | 69 | COMMENT ON FUNCTION explain_query(text, text, text) IS 70 | 'Runs EXPLAIN ANALYZE on a query and returns an AI-generated explanation of the execution plan,
performance insights, and optimization suggestions. Provider options: openai, anthropic, auto (default). Pass API key as parameter or configure ~/.pg_ai.config.'; 71 | 72 | -------------------------------------------------------------------------------- /src/utils.cpp: -------------------------------------------------------------------------------- 1 | #include "./include/utils.hpp" 2 | 3 | #include <fstream> 4 | #include <stdexcept> 5 | #include <string> 6 | #include <utility> 7 | 8 | #include <nlohmann/json.hpp> 9 | 10 | #include "./include/logger.hpp" 11 | 12 | namespace pg_ai::utils { 13 | 14 | std::pair<bool, std::string> read_file(const std::string& filepath) { 15 | std::ifstream file(filepath, std::ios::binary | std::ios::ate); 16 | if (!file) { 17 | logger::Logger::error("Failed to open file: " + filepath); 18 | return {false, {}}; 19 | } 20 | 21 | const auto size = file.tellg(); 22 | if (size == -1) { 23 | logger::Logger::error("Invalid file size: " + filepath); 24 | return {false, {}}; 25 | } 26 | 27 | file.seekg(0, std::ios::beg); 28 | 29 | std::string content(static_cast<size_t>(size), '\0'); 30 | if (size > 0) { 31 | if (!file.read(&content[0], static_cast<std::streamsize>(size))) { 32 | logger::Logger::error("Failed to read file: " + filepath); 33 | return {false, {}}; 34 | } 35 | } 36 | 37 | return {true, std::move(content)}; 38 | } 39 | 40 | std::string read_file_or_throw(const std::string& filepath) { 41 | auto [success, content] = read_file(filepath); 42 | if (!success) { 43 | throw std::runtime_error("Failed to read file: " + filepath); 44 | } 45 | return std::move(content); 46 | } 47 | 48 | // CR-someday @benodiwal: This is the basic version of API error formatting; 49 | // there is a lot of room for improvement. Currently it focuses on wrong model 50 | // names in config-related errors.
51 | std::string formatAPIError(const std::string& raw_error) { 52 | std::string error_to_parse = raw_error; 53 | 54 | size_t json_start = raw_error.find('{'); 55 | if (json_start != std::string::npos) { 56 | error_to_parse = raw_error.substr(json_start); 57 | } 58 | 59 | try { 60 | auto error_json = nlohmann::json::parse(error_to_parse); 61 | 62 | if (error_json.contains("error")) { 63 | auto error_obj = error_json["error"]; 64 | 65 | if (error_obj.contains("type") && 66 | error_obj["type"] == "not_found_error") { 67 | if (error_obj.contains("message")) { 68 | std::string msg = error_obj["message"]; 69 | 70 | size_t model_pos = msg.find("model:"); 71 | if (model_pos != std::string::npos) { 72 | std::string model_name = msg.substr(model_pos + 7); 73 | model_name.erase(0, model_name.find_first_not_of(" \t")); 74 | model_name.erase(model_name.find_last_not_of(" \t") + 1); 75 | 76 | return "Invalid model '" + model_name + 77 | "'. Please check your configuration and use a valid model " 78 | "name. " 79 | "Common models: 'claude-sonnet-4-5-20250929' (Anthropic), " 80 | "'gpt-4o' (OpenAI)."; 81 | } 82 | } 83 | return "Model not found. Please check your model configuration and " 84 | "ensure you're using a valid model name."; 85 | } 86 | 87 | if (error_obj.contains("message")) { 88 | return error_obj["message"]; 89 | } 90 | } 91 | } catch (const nlohmann::json::exception&) { 92 | } 93 | 94 | return raw_error; 95 | } 96 | 97 | } // namespace pg_ai::utils -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.yml: -------------------------------------------------------------------------------- 1 | name: Feature Request 2 | description: Suggest a new feature or enhancement for pg_ai_query 3 | title: "[Feature]: " 4 | labels: ["enhancement"] 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | Thanks for suggesting a feature! We appreciate your ideas for improving pg_ai_query. 
10 | 11 | - type: textarea 12 | id: problem 13 | attributes: 14 | label: Problem Statement 15 | description: Is your feature request related to a problem? Please describe. 16 | placeholder: I'm frustrated when... / It would be helpful if... 17 | validations: 18 | required: true 19 | 20 | - type: textarea 21 | id: solution 22 | attributes: 23 | label: Proposed Solution 24 | description: Describe the solution you'd like to see implemented 25 | placeholder: Describe your proposed solution... 26 | validations: 27 | required: true 28 | 29 | - type: textarea 30 | id: alternatives 31 | attributes: 32 | label: Alternatives Considered 33 | description: Describe any alternative solutions or features you've considered 34 | placeholder: I've also considered... 35 | 36 | - type: dropdown 37 | id: category 38 | attributes: 39 | label: Feature Category 40 | description: What category does this feature fall into? 41 | options: 42 | - AI/ML improvements 43 | - Query generation 44 | - Performance analysis 45 | - Configuration/Settings 46 | - Schema discovery 47 | - Response formatting 48 | - Security/Safety 49 | - Documentation 50 | - Other 51 | validations: 52 | required: true 53 | 54 | - type: textarea 55 | id: use-case 56 | attributes: 57 | label: Use Case 58 | description: Describe a specific use case where this feature would be valuable 59 | placeholder: This would help when... 60 | validations: 61 | required: true 62 | 63 | - type: textarea 64 | id: example 65 | attributes: 66 | label: Example Usage 67 | description: Provide an example of how this feature would be used 68 | placeholder: | 69 | SELECT new_function('example usage'); 70 | render: sql 71 | 72 | - type: dropdown 73 | id: priority 74 | attributes: 75 | label: Priority 76 | description: How important is this feature to you? 
77 | options: 78 | - Critical - Blocking my work 79 | - High - Would significantly improve my workflow 80 | - Medium - Nice to have 81 | - Low - Minor improvement 82 | validations: 83 | required: true 84 | 85 | - type: checkboxes 86 | id: contribution 87 | attributes: 88 | label: Contribution 89 | description: Would you be willing to contribute to this feature? 90 | options: 91 | - label: I would be willing to submit a PR for this feature 92 | - label: I would be willing to help test this feature 93 | 94 | - type: textarea 95 | id: additional 96 | attributes: 97 | label: Additional Context 98 | description: Any other context, mockups, or examples 99 | placeholder: Add any other context, screenshots, or examples here... 100 | -------------------------------------------------------------------------------- /CLAUDE.md: -------------------------------------------------------------------------------- 1 | # pg_ai_query - Claude Code Instructions 2 | 3 | ## Project Overview 4 | 5 | This is a PostgreSQL extension written in C++20 that generates SQL queries from natural language using AI models (OpenAI and Anthropic). 
6 | 7 | ## Tech Stack 8 | 9 | - **Language**: C++20 10 | - **Build System**: CMake 3.16+ 11 | - **Database**: PostgreSQL 14+ extension 12 | - **Dependencies**: ai-sdk-cpp (git submodule in `third_party/`) 13 | 14 | ## Project Structure 15 | 16 | ``` 17 | src/ 18 | ├── include/ # Header files (.hpp) 19 | ├── core/ # Core implementation 20 | │ ├── ai_client_factory.cpp 21 | │ ├── provider_selector.cpp 22 | │ ├── query_generator.cpp 23 | │ ├── response_formatter.cpp 24 | │ ├── spi_connection.cpp 25 | │ └── logger.cpp 26 | ├── config.cpp # Configuration handling 27 | ├── prompts.cpp # AI prompt templates 28 | ├── utils.cpp # Utility functions 29 | └── pg_ai_query.cpp # Main extension entry point 30 | 31 | sql/ 32 | └── pg_ai_query--1.0.sql # PostgreSQL extension SQL definitions 33 | 34 | docs/ # mdbook documentation 35 | ``` 36 | 37 | ## Build Commands 38 | 39 | ```bash 40 | # Full build 41 | mkdir -p build && cd build && cmake .. && make 42 | 43 | # Install to PostgreSQL 44 | sudo make install 45 | 46 | # Clean build 47 | make clean 48 | ``` 49 | 50 | ## Code Style 51 | 52 | - Follow `.clang-format` configuration 53 | - Use C++20 features where appropriate 54 | - Header files use `.hpp` extension 55 | - Implementation files use `.cpp` extension 56 | - Use `#pragma once` for header guards 57 | 58 | ## Key Conventions 59 | 60 | 1. **Error Handling**: Use PostgreSQL's `ereport()` for errors 61 | 2. **Memory Management**: Use PostgreSQL's memory contexts (`palloc`, `pfree`) 62 | 3. **String Handling**: Use PostgreSQL's text handling functions 63 | 4. 
**Logging**: Use the custom Logger class in `src/core/logger.cpp` 64 | 65 | ## PostgreSQL Extension Guidelines 66 | 67 | - All exported functions must be declared with `PG_FUNCTION_INFO_V1()` 68 | - Use SPI (Server Programming Interface) for database queries 69 | - Configuration is read from `~/.pg_ai.config` 70 | 71 | ## Testing 72 | 73 | ```bash 74 | # Connect to PostgreSQL and test 75 | psql -d your_database -c "CREATE EXTENSION IF NOT EXISTS pg_ai_query;" 76 | psql -d your_database -c "SELECT generate_query('show all users');" 77 | ``` 78 | 79 | ## Important Files 80 | 81 | - `CMakeLists.txt` - Build configuration 82 | - `sql/pg_ai_query--1.0.sql` - Extension SQL definitions 83 | - `src/pg_ai_query.cpp` - Main entry point with PG functions 84 | - `src/core/query_generator.cpp` - Query generation logic 85 | - `src/prompts.cpp` - AI prompt templates 86 | 87 | ## Security Considerations 88 | 89 | - Never expose API keys in code or logs 90 | - Block access to system tables (`information_schema`, `pg_catalog`) 91 | - Validate and sanitize all user inputs 92 | - Enforce query limits to prevent resource exhaustion 93 | 94 | ## When Making Changes 95 | 96 | 1. Always read the relevant header file before modifying implementation 97 | 2. Check `src/include/` for type definitions and interfaces 98 | 3. Run `cmake ..` after modifying `CMakeLists.txt` 99 | 4. Test changes with actual PostgreSQL queries 100 | 5. 
Ensure backwards compatibility with PostgreSQL 14+ 101 | -------------------------------------------------------------------------------- /.github/workflows/claude-review.yml: -------------------------------------------------------------------------------- 1 | name: Claude PR Review 2 | 3 | on: 4 | issue_comment: 5 | types: [created] 6 | 7 | permissions: 8 | contents: write 9 | pull-requests: write 10 | issues: write 11 | id-token: write 12 | 13 | jobs: 14 | claude: 15 | if: | 16 | github.event.issue.pull_request && 17 | contains(github.event.comment.body, '/claude') && 18 | github.event.comment.user.login == github.repository_owner 19 | runs-on: macos-latest 20 | steps: 21 | - name: Add reaction to comment 22 | uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 23 | with: 24 | script: | 25 | await github.rest.reactions.createForIssueComment({ 26 | owner: context.repo.owner, 27 | repo: context.repo.repo, 28 | comment_id: context.payload.comment.id, 29 | content: 'eyes' 30 | }); 31 | 32 | - name: Get PR details 33 | id: pr 34 | uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 35 | with: 36 | script: | 37 | const pr = await github.rest.pulls.get({ 38 | owner: context.repo.owner, 39 | repo: context.repo.repo, 40 | pull_number: context.issue.number 41 | }); 42 | core.setOutput('head_sha', pr.data.head.sha); 43 | core.setOutput('head_repo', pr.data.head.repo.full_name); 44 | core.setOutput('head_ref', pr.data.head.ref); 45 | 46 | - name: Checkout repository 47 | uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 48 | with: 49 | fetch-depth: 0 50 | repository: ${{ steps.pr.outputs.head_repo }} 51 | ref: ${{ steps.pr.outputs.head_ref }} 52 | 53 | - name: Run Claude Code 54 | uses: anthropics/claude-code-action@f0c8eb29807907de7f5412d04afceb5e24817127 # v1.0.23 55 | with: 56 | anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} 57 | model: ${{ vars.CLAUDE_MODEL || 'claude-sonnet-4-20250514' }} 58 
| trigger_phrase: "/claude" 59 | allowed_tools: "Bash,Read,Glob,Grep,Edit,Write" 60 | custom_instructions: | 61 | You are reviewing a PostgreSQL C++ extension project (pg_ai_query). 62 | 63 | When asked to review: 64 | 1. Analyze the PR changes 65 | 2. Build the project: mkdir -p build && cd build && cmake .. && make 66 | 3. Run any available tests 67 | 4. Provide feedback on: 68 | - C++ code quality and best practices 69 | - PostgreSQL extension patterns (SPI, memory contexts, ereport) 70 | - Security issues (SQL injection, API key exposure) 71 | - Build errors or warnings 72 | 73 | Be concise and actionable. Fix simple issues directly if asked. 74 | 75 | - name: Add success reaction 76 | if: success() 77 | uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0 78 | with: 79 | script: | 80 | await github.rest.reactions.createForIssueComment({ 81 | owner: context.repo.owner, 82 | repo: context.repo.repo, 83 | comment_id: context.payload.comment.id, 84 | content: 'rocket' 85 | }); 86 | -------------------------------------------------------------------------------- /src/core/response_formatter.cpp: -------------------------------------------------------------------------------- 1 | #include "../include/response_formatter.hpp" 2 | 3 | #include <nlohmann/json.hpp> 4 | #include <sstream> 5 | 6 | namespace pg_ai { 7 | 8 | std::string ResponseFormatter::formatResponse( 9 | const QueryResult& result, 10 | const config::Configuration& config) { 11 | if (config.use_formatted_response) { 12 | return createJSONResponse(result, config); 13 | } else { 14 | return createPlainTextResponse(result, config); 15 | } 16 | } 17 | 18 | std::string ResponseFormatter::createJSONResponse( 19 | const QueryResult& result, 20 | const config::Configuration& config) { 21 | nlohmann::json response; 22 | 23 | // Always include the query 24 | response["query"] = result.generated_query; 25 | response["success"] = result.success; 26 | 27 | // Add optional fields based on configuration 28 | if
(config.show_explanation && !result.explanation.empty()) { 29 | response["explanation"] = result.explanation; 30 | } 31 | 32 | if (config.show_warnings && !result.warnings.empty()) { 33 | response["warnings"] = result.warnings; 34 | } 35 | 36 | if (config.show_suggested_visualization && 37 | !result.suggested_visualization.empty()) { 38 | response["suggested_visualization"] = result.suggested_visualization; 39 | } 40 | 41 | // Add metadata 42 | if (result.row_limit_applied) { 43 | response["row_limit_applied"] = true; 44 | } 45 | 46 | return response.dump(2); // Pretty print with 2-space indentation 47 | } 48 | 49 | std::string ResponseFormatter::createPlainTextResponse( 50 | const QueryResult& result, 51 | const config::Configuration& config) { 52 | std::ostringstream output; 53 | 54 | // Main query result 55 | output << result.generated_query; 56 | 57 | // Add explanation if enabled 58 | if (config.show_explanation && !result.explanation.empty()) { 59 | output << "\n\n-- Explanation:\n-- " << result.explanation; 60 | } 61 | 62 | // Add warnings if enabled 63 | if (config.show_warnings && !result.warnings.empty()) { 64 | output << "\n\n" << formatWarnings(result.warnings); 65 | } 66 | 67 | // Add suggested visualization if enabled 68 | if (config.show_suggested_visualization && 69 | !result.suggested_visualization.empty()) { 70 | output << "\n\n" << formatVisualization(result.suggested_visualization); 71 | } 72 | 73 | // Add metadata 74 | if (result.row_limit_applied) { 75 | output << "\n\n-- Note: Row limit was automatically applied to this query " 76 | "for safety"; 77 | } 78 | 79 | return output.str(); 80 | } 81 | 82 | std::string ResponseFormatter::formatWarnings( 83 | const std::vector<std::string>& warnings) { 84 | std::ostringstream output; 85 | 86 | if (warnings.size() == 1) { 87 | output << "-- Warning: " << warnings[0]; 88 | } else { 89 | output << "-- Warnings:"; 90 | for (size_t i = 0; i < warnings.size(); ++i) { 91 | output << "\n-- " << (i + 1) << ". 
" << warnings[i]; 92 | } 93 | } 94 | 95 | return output.str(); 96 | } 97 | 98 | std::string ResponseFormatter::formatVisualization( 99 | const std::string& visualization) { 100 | std::ostringstream output; 101 | 102 | output << "-- Suggested Visualization:\n-- " << visualization; 103 | 104 | return output.str(); 105 | } 106 | 107 | } // namespace pg_ai -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.yml: -------------------------------------------------------------------------------- 1 | name: Bug Report 2 | description: Report a bug or unexpected behavior in pg_ai_query 3 | title: "[Bug]: " 4 | labels: ["bug", "needs-triage"] 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | Thanks for taking the time to report a bug! Please fill out the following information to help us investigate and fix the issue. 10 | 11 | - type: textarea 12 | id: description 13 | attributes: 14 | label: Bug Description 15 | description: A clear and concise description of what the bug is. 16 | placeholder: Describe the bug you encountered... 17 | validations: 18 | required: true 19 | 20 | - type: textarea 21 | id: reproduction 22 | attributes: 23 | label: Steps to Reproduce 24 | description: Steps to reproduce the behavior 25 | placeholder: | 26 | 1. Create extension with '...' 27 | 2. Run query '...' 28 | 3. See error 29 | validations: 30 | required: true 31 | 32 | - type: textarea 33 | id: expected 34 | attributes: 35 | label: Expected Behavior 36 | description: What did you expect to happen? 37 | placeholder: Describe what you expected to happen... 38 | validations: 39 | required: true 40 | 41 | - type: textarea 42 | id: actual 43 | attributes: 44 | label: Actual Behavior 45 | description: What actually happened? 46 | placeholder: Describe what actually happened... 
47 | validations: 48 | required: true 49 | 50 | - type: textarea 51 | id: query 52 | attributes: 53 | label: SQL Query/Function Call 54 | description: The exact SQL query or function call that triggered the issue 55 | placeholder: | 56 | SELECT generate_query('your prompt here'); 57 | render: sql 58 | 59 | - type: textarea 60 | id: logs 61 | attributes: 62 | label: Error Messages/Logs 63 | description: Any error messages or relevant log output 64 | placeholder: Paste error messages or logs here... 65 | render: shell 66 | 67 | - type: input 68 | id: pg-version 69 | attributes: 70 | label: PostgreSQL Version 71 | description: Output of `SELECT version();` 72 | placeholder: PostgreSQL 16.0 73 | validations: 74 | required: true 75 | 76 | - type: dropdown 77 | id: os 78 | attributes: 79 | label: Operating System 80 | options: 81 | - Linux 82 | - macOS 83 | - Windows 84 | - Other 85 | validations: 86 | required: true 87 | 88 | - type: input 89 | id: os-version 90 | attributes: 91 | label: OS Version 92 | description: e.g., Ubuntu 22.04, macOS 14.0, Windows 11 93 | placeholder: Ubuntu 22.04 94 | 95 | - type: textarea 96 | id: config 97 | attributes: 98 | label: Configuration 99 | description: Your pg_ai.config file (remove sensitive API keys!) 100 | placeholder: | 101 | [general] 102 | log_level = "INFO" 103 | enable_logging = false 104 | ... 105 | render: ini 106 | 107 | - type: textarea 108 | id: additional 109 | attributes: 110 | label: Additional Context 111 | description: Any other context about the problem 112 | placeholder: Add any other context, screenshots, or information about the problem here... 
113 | -------------------------------------------------------------------------------- /src/core/ai_client_factory.cpp: -------------------------------------------------------------------------------- 1 | #include "../include/ai_client_factory.hpp" 2 | 3 | #include 4 | #include 5 | 6 | #include "../include/logger.hpp" 7 | 8 | namespace pg_ai { 9 | 10 | AIClientResult AIClientFactory::createClient( 11 | config::Provider provider, 12 | const std::string& api_key, 13 | const config::ProviderConfig* provider_config) { 14 | AIClientResult result; 15 | result.success = true; 16 | 17 | try { 18 | switch (provider) { 19 | case config::Provider::OPENAI: { 20 | logger::Logger::info("Creating OpenAI client"); 21 | 22 | // Determine the API endpoint to use 23 | std::string base_url = 24 | (provider_config && !provider_config->api_endpoint.empty()) 25 | ? provider_config->api_endpoint 26 | : config::constants::DEFAULT_OPENAI_ENDPOINT; 27 | 28 | if (provider_config && !provider_config->api_endpoint.empty()) { 29 | logger::Logger::info("Using custom OpenAI endpoint: " + base_url); 30 | } 31 | 32 | result.client = ai::openai::create_client(api_key, base_url); 33 | result.model_name = 34 | (provider_config && !provider_config->default_model.empty()) 35 | ? provider_config->default_model 36 | : constants::DEFAULT_OPENAI_MODEL; 37 | break; 38 | } 39 | 40 | case config::Provider::ANTHROPIC: { 41 | logger::Logger::info("Creating Anthropic client"); 42 | 43 | // Determine the API endpoint to use 44 | std::string base_url = 45 | (provider_config && !provider_config->api_endpoint.empty()) 46 | ? 
provider_config->api_endpoint 47 | : config::constants::DEFAULT_ANTHROPIC_ENDPOINT; 48 | 49 | if (provider_config && !provider_config->api_endpoint.empty()) { 50 | logger::Logger::info("Using custom Anthropic endpoint: " + base_url); 51 | } 52 | 53 | result.client = ai::anthropic::create_client(api_key, base_url); 54 | result.model_name = 55 | (provider_config && !provider_config->default_model.empty()) 56 | ? provider_config->default_model 57 | : constants::DEFAULT_ANTHROPIC_MODEL; 58 | break; 59 | } 60 | 61 | default: { 62 | logger::Logger::warning("Unknown provider, defaulting to OpenAI"); 63 | result.client = ai::openai::create_client( 64 | api_key, config::constants::DEFAULT_OPENAI_ENDPOINT); 65 | result.model_name = constants::DEFAULT_OPENAI_MODEL; 66 | break; 67 | } 68 | } 69 | 70 | logger::Logger::info("Using Provider: " + 71 | config::ConfigManager::providerToString(provider)); 72 | 73 | } catch (const std::exception& e) { 74 | logger::Logger::error("Failed to create " + 75 | config::ConfigManager::providerToString(provider) + 76 | " client: " + std::string(e.what())); 77 | result.success = false; 78 | result.error_message = 79 | "Failed to create AI client: " + std::string(e.what()); 80 | } 81 | 82 | return result; 83 | } 84 | 85 | std::string AIClientFactory::getDefaultModel(config::Provider provider) { 86 | switch (provider) { 87 | case config::Provider::OPENAI: 88 | return constants::DEFAULT_OPENAI_MODEL; 89 | case config::Provider::ANTHROPIC: 90 | return constants::DEFAULT_ANTHROPIC_MODEL; 91 | default: 92 | return constants::DEFAULT_OPENAI_MODEL; 93 | } 94 | } 95 | 96 | } // namespace pg_ai 97 | -------------------------------------------------------------------------------- /example_pg_ai.config: -------------------------------------------------------------------------------- 1 | # PostgreSQL AI Query Extension Configuration 2 | # Copy this file to ~/.pg_ai.config and modify as needed 3 | 4 | [general] 5 | # Logging configuration 6 | log_level = 
"INFO" 7 | enable_logging = true 8 | 9 | # Request timeout in milliseconds (default: 30000 = 30 seconds) 10 | request_timeout_ms = 30000 11 | 12 | # Maximum number of retries for failed requests 13 | max_retries = 3 14 | 15 | [query] 16 | # Automatically enforce LIMIT on SELECT queries for safety 17 | enforce_limit = true 18 | 19 | # Default row limit for queries (when no LIMIT is specified) 20 | default_limit = 1000 21 | 22 | [response] 23 | # Show detailed explanation of what the query does 24 | show_explanation = true 25 | 26 | # Show warnings about performance, security, or data implications 27 | show_warnings = true 28 | 29 | # Show suggested visualization type for query results 30 | show_suggested_visualization = true 31 | 32 | # Use formatted response (JSON format) instead of plain SQL 33 | # When enabled, returns structured JSON with query, explanation, warnings, etc. 34 | # When disabled, returns plain SQL with optional comments 35 | use_formatted_response = false 36 | 37 | [openai] 38 | # OpenAI API key - get from https://platform.openai.com 39 | api_key = "your-openai-api-key-here" 40 | 41 | # Default model to use (gpt-4o, gpt-4, gpt-3.5-turbo) 42 | default_model = "gpt-4o" 43 | 44 | # Custom API endpoint (optional) - for OpenAI-compatible APIs 45 | # Note: Do NOT include /v1 - the SDK appends /v1/chat/completions automatically 46 | # Examples: 47 | # - Ollama: http://localhost:11434 48 | # - LiteLLM: http://localhost:8000 49 | # - OpenRouter: https://openrouter.ai/api 50 | # - Together AI: https://api.together.xyz 51 | # - vLLM: http://localhost:8000 52 | # api_endpoint = "https://api.openai.com" 53 | 54 | [anthropic] 55 | # Anthropic API key - get from https://console.anthropic.com 56 | api_key = "your-anthropic-api-key-here" 57 | 58 | # Default model to use 59 | default_model = "claude-3-5-sonnet-20241022" 60 | 61 | # Custom API endpoint (optional) - for Anthropic-compatible APIs 62 | # api_endpoint = "https://api.anthropic.com" 63 | 64 | [gemini] 65 
| # Google Gemini API key - get from https://ai.google.dev 66 | api_key = "your-gemini-api-key-here" 67 | 68 | # Default model to use (gemini-2.5-flash recommended for best price-performance) 69 | default_model = "gemini-2.5-flash" 70 | # Other available models: 71 | # default_model = "gemini-2.5-pro" # For complex reasoning tasks 72 | # default_model = "gemini-2.5-flash-lite" # For maximum speed/cost efficiency 73 | # default_model = "gemini-3-pro-preview" # For cutting-edge features (preview) 74 | 75 | # Example Usage Scenarios: 76 | # 77 | # 1. INTERACTIVE DEVELOPMENT (recommended for learning and development) 78 | # show_explanation = true 79 | # show_warnings = true 80 | # show_suggested_visualization = false 81 | # use_formatted_response = false 82 | # Result: SQL with helpful comments and warnings 83 | # 84 | # 2. APPLICATION INTEGRATION (for programmatic use) 85 | # show_explanation = true 86 | # show_warnings = true 87 | # show_suggested_visualization = true 88 | # use_formatted_response = true 89 | # Result: Structured JSON with all metadata 90 | # 91 | # 3. PRODUCTION/MINIMAL (for performance-critical applications) 92 | # show_explanation = false 93 | # show_warnings = false 94 | # show_suggested_visualization = false 95 | # use_formatted_response = false 96 | # Result: Just the SQL query (fastest response) 97 | # 98 | # 4. BUSINESS INTELLIGENCE (for dashboards and reporting) 99 | # show_explanation = true 100 | # show_warnings = false 101 | # show_suggested_visualization = true 102 | # use_formatted_response = true 103 | # Result: JSON with explanations and visualization guidance -------------------------------------------------------------------------------- /DIRENV_SETUP.md: -------------------------------------------------------------------------------- 1 | # Setting Up direnv for Automatic Environment Loading 2 | 3 | ## What is direnv? 
4 | 5 | direnv is a shell extension that automatically loads and unloads environment variables based on the current directory. For this project, it means the Nix development environment activates automatically when you enter the project directory. 6 | 7 | ## Installation 8 | 9 | ### macOS 10 | ```bash 11 | brew install direnv 12 | ``` 13 | 14 | ### Linux 15 | ```bash 16 | # Ubuntu/Debian 17 | sudo apt install direnv 18 | 19 | # Fedora 20 | sudo dnf install direnv 21 | 22 | # Arch 23 | sudo pacman -S direnv 24 | ``` 25 | 26 | ## Shell Integration 27 | 28 | Add the direnv hook to your shell configuration file: 29 | 30 | ### For Zsh (~/.zshrc) 31 | ```bash 32 | eval "$(direnv hook zsh)" 33 | ``` 34 | 35 | ### For Bash (~/.bashrc or ~/.bash_profile) 36 | ```bash 37 | eval "$(direnv hook bash)" 38 | ``` 39 | 40 | ### For Fish (~/.config/fish/config.fish) 41 | ```fish 42 | direnv hook fish | source 43 | ``` 44 | 45 | After adding the hook, reload your shell: 46 | ```bash 47 | source ~/.zshrc # or ~/.bashrc 48 | ``` 49 | 50 | ## First Time Setup 51 | 52 | When you first enter the project directory, direnv will ask for permission: 53 | 54 | ```bash 55 | cd pg_ai_query 56 | # Output: direnv: error /Users/sachin/personal/pg_ai_query/.envrc is blocked. Run `direnv allow` to approve its content 57 | ``` 58 | 59 | Allow it to run: 60 | ```bash 61 | direnv allow 62 | ``` 63 | 64 | ## How It Works 65 | 66 | Once set up, direnv will: 67 | 68 | 1. **Automatically activate** when you enter the directory: 69 | ```bash 70 | cd pg_ai_query 71 | # direnv: loading ~/pg_ai_query/.envrc 72 | # direnv: using flake 73 | # PostgreSQL AI Query Extension Development Environment 74 | # =================================================== 75 | # [environment info displayed] 76 | ``` 77 | 78 | 2. **Automatically deactivate** when you leave: 79 | ```bash 80 | cd .. 81 | # direnv: unloading 82 | ``` 83 | 84 | 3. 
**Reload on changes**: 85 | - When flake.nix is modified, direnv automatically reloads 86 | - You can manually reload with `direnv reload` 87 | 88 | ## Without direnv 89 | 90 | If you don't want to use direnv, you can still use Nix manually: 91 | 92 | ```bash 93 | cd pg_ai_query 94 | nix develop 95 | # or if flakes aren't enabled: 96 | nix --extra-experimental-features "nix-command flakes" develop 97 | ``` 98 | 99 | ## Verification 100 | 101 | To verify direnv is working: 102 | 103 | ```bash 104 | # Outside the project 105 | which cmake 106 | # Output: /usr/bin/cmake (system cmake) 107 | 108 | cd pg_ai_query 109 | # direnv loads... 110 | 111 | which cmake 112 | # Output: /nix/store/.../bin/cmake (Nix cmake) 113 | ``` 114 | 115 | ## Troubleshooting 116 | 117 | ### direnv not activating 118 | 119 | 1. Check if hook is installed: 120 | ```bash 121 | type direnv 122 | # Should show: direnv is a shell function 123 | ``` 124 | 125 | 2. Check if .envrc is allowed: 126 | ```bash 127 | direnv status 128 | ``` 129 | 130 | 3. Manually reload: 131 | ```bash 132 | direnv allow 133 | direnv reload 134 | ``` 135 | 136 | ### Slow activation 137 | 138 | First load is slow as Nix builds the environment. Subsequent loads are fast: 139 | - First time: ~10-30 seconds (building) 140 | - After: <1 second (cached) 141 | 142 | ### Disabling temporarily 143 | 144 | ```bash 145 | # Disable for current directory 146 | direnv deny 147 | 148 | # Re-enable 149 | direnv allow 150 | ``` 151 | 152 | ## .envrc File Content 153 | 154 | The `.envrc` file in this project contains: 155 | ```bash 156 | use flake 157 | ``` 158 | 159 | This tells direnv to load the Nix flake's development shell. 
You can customize it with additional environment variables if needed: 160 | 161 | ```bash 162 | use flake 163 | 164 | # Custom environment variables 165 | export DEBUG=1 166 | export DATABASE_URL=postgresql://localhost/mydb 167 | ``` 168 | 169 | ## Security Note 170 | 171 | direnv requires explicit approval (.envrc files could contain malicious code). Always review `.envrc` files before running `direnv allow`, especially in projects from untrusted sources. 172 | 173 | For this project, the `.envrc` is safe - it only loads the Nix flake. 174 | -------------------------------------------------------------------------------- /src/core/provider_selector.cpp: -------------------------------------------------------------------------------- 1 | #include "../include/provider_selector.hpp" 2 | 3 | #include "../include/logger.hpp" 4 | 5 | namespace pg_ai { 6 | 7 | ProviderSelectionResult ProviderSelector::selectProvider( 8 | const std::string& api_key, 9 | const std::string& provider_preference) { 10 | if (provider_preference == "openai") { 11 | return selectExplicitProvider(api_key, config::Provider::OPENAI); 12 | } 13 | 14 | if (provider_preference == "anthropic") { 15 | return selectExplicitProvider(api_key, config::Provider::ANTHROPIC); 16 | } 17 | 18 | if (provider_preference == "gemini") { 19 | return selectExplicitProvider(api_key, config::Provider::GEMINI); 20 | } 21 | 22 | return autoSelectProvider(api_key); 23 | } 24 | 25 | ProviderSelectionResult ProviderSelector::selectExplicitProvider( 26 | const std::string& api_key, 27 | config::Provider provider) { 28 | ProviderSelectionResult result; 29 | result.provider = provider; 30 | result.config = config::ConfigManager::getProviderConfig(provider); 31 | result.success = true; 32 | 33 | std::string provider_name = config::ConfigManager::providerToString(provider); 34 | logger::Logger::info("Explicit " + provider_name + 35 | " provider selection from parameter"); 36 | 37 | if (!api_key.empty()) { 38 | result.api_key = 
api_key; 39 | result.api_key_source = "parameter"; 40 | } else if (result.config && !result.config->api_key.empty()) { 41 | result.api_key = result.config->api_key; 42 | result.api_key_source = provider_name + "_config"; 43 | logger::Logger::info("Using " + provider_name + 44 | " API key from configuration"); 45 | } 46 | 47 | if (result.api_key.empty()) { 48 | result.success = false; 49 | result.error_message = "No API key available for " + provider_name + 50 | " provider. Please provide API key as parameter " 51 | "or configure it in ~/.pg_ai.config."; 52 | } 53 | 54 | return result; 55 | } 56 | 57 | ProviderSelectionResult ProviderSelector::autoSelectProvider( 58 | const std::string& api_key) { 59 | ProviderSelectionResult result; 60 | 61 | if (!api_key.empty()) { 62 | result.provider = config::Provider::OPENAI; 63 | result.config = 64 | config::ConfigManager::getProviderConfig(config::Provider::OPENAI); 65 | result.api_key = api_key; 66 | result.api_key_source = "parameter"; 67 | result.success = true; 68 | logger::Logger::info( 69 | "Auto-selecting OpenAI provider (API key provided, no provider " 70 | "specified)"); 71 | return result; 72 | } 73 | 74 | const auto* openai_config = 75 | config::ConfigManager::getProviderConfig(config::Provider::OPENAI); 76 | if (openai_config && !openai_config->api_key.empty()) { 77 | logger::Logger::info( 78 | "Auto-selecting OpenAI provider based on configuration"); 79 | result.provider = config::Provider::OPENAI; 80 | result.config = openai_config; 81 | result.api_key = openai_config->api_key; 82 | result.api_key_source = "openai_config"; 83 | result.success = true; 84 | return result; 85 | } 86 | 87 | const auto* anthropic_config = 88 | config::ConfigManager::getProviderConfig(config::Provider::ANTHROPIC); 89 | if (anthropic_config && !anthropic_config->api_key.empty()) { 90 | logger::Logger::info( 91 | "Auto-selecting Anthropic provider based on configuration"); 92 | result.provider = config::Provider::ANTHROPIC; 93 | 
result.config = anthropic_config; 94 | result.api_key = anthropic_config->api_key; 95 | result.api_key_source = "anthropic_config"; 96 | result.success = true; 97 | return result; 98 | } 99 | 100 | const auto* gemini_config = 101 | config::ConfigManager::getProviderConfig(config::Provider::GEMINI); 102 | if (gemini_config && !gemini_config->api_key.empty()) { 103 | logger::Logger::info( 104 | "Auto-selecting Gemini provider based on configuration"); 105 | result.provider = config::Provider::GEMINI; 106 | result.config = gemini_config; 107 | result.api_key = gemini_config->api_key; 108 | result.api_key_source = "gemini_config"; 109 | result.success = true; 110 | return result; 111 | } 112 | 113 | logger::Logger::warning("No API key found in config"); 114 | result.success = false; 115 | result.error_message = 116 | "API key required. Pass as parameter or set OpenAI, " 117 | "Anthropic, or Gemini API key in ~/.pg_ai.config."; 118 | return result; 119 | } 120 | 121 | } // namespace pg_ai 122 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/installation_issue.yml: -------------------------------------------------------------------------------- 1 | name: Installation/Setup Issue 2 | description: Report problems with building, installing, or configuring pg_ai_query 3 | title: "[Installation]: " 4 | labels: ["installation"] 5 | body: 6 | - type: markdown 7 | attributes: 8 | value: | 9 | Thanks for reporting an installation issue! Please provide as much detail as possible to help us assist you. 10 | 11 | - type: textarea 12 | id: description 13 | attributes: 14 | label: Issue Description 15 | description: Describe the installation or setup problem you're encountering 16 | placeholder: Describe the issue... 17 | validations: 18 | required: true 19 | 20 | - type: dropdown 21 | id: install-step 22 | attributes: 23 | label: Installation Step 24 | description: At which step are you encountering the issue? 
25 | options: 26 | - Prerequisites/Dependencies 27 | - Git clone/submodules 28 | - CMake configuration 29 | - Build (make) 30 | - Installation (make install) 31 | - CREATE EXTENSION 32 | - Configuration file setup 33 | - API key configuration 34 | - Testing the extension 35 | - Other 36 | validations: 37 | required: true 38 | 39 | - type: textarea 40 | id: commands 41 | attributes: 42 | label: Commands Executed 43 | description: What commands did you run? 44 | placeholder: | 45 | git clone --recurse-submodules https://github.com/benodiwal/pg_ai_query.git 46 | cd pg_ai_query 47 | mkdir build && cd build 48 | cmake .. 49 | make 50 | render: shell 51 | validations: 52 | required: true 53 | 54 | - type: textarea 55 | id: error-output 56 | attributes: 57 | label: Error Output 58 | description: Full error message or output 59 | placeholder: Paste the complete error output here... 60 | render: shell 61 | validations: 62 | required: true 63 | 64 | - type: input 65 | id: pg-version 66 | attributes: 67 | label: PostgreSQL Version 68 | description: Output of `pg_config --version` 69 | placeholder: PostgreSQL 16.0 70 | validations: 71 | required: true 72 | 73 | - type: input 74 | id: cmake-version 75 | attributes: 76 | label: CMake Version 77 | description: Output of `cmake --version` 78 | placeholder: cmake version 3.27.0 79 | 80 | - type: input 81 | id: compiler 82 | attributes: 83 | label: Compiler 84 | description: Which C++ compiler are you using? 
85 | placeholder: g++ 11.4.0 / clang++ 15.0.0 86 | validations: 87 | required: true 88 | 89 | - type: dropdown 90 | id: os 91 | attributes: 92 | label: Operating System 93 | options: 94 | - Linux 95 | - macOS 96 | - Windows 97 | - WSL (Windows Subsystem for Linux) 98 | - Docker 99 | - Other 100 | validations: 101 | required: true 102 | 103 | - type: input 104 | id: os-version 105 | attributes: 106 | label: OS Version/Distribution 107 | description: e.g., Ubuntu 22.04, macOS 14.0, Windows 11 108 | placeholder: Ubuntu 22.04 109 | validations: 110 | required: true 111 | 112 | - type: textarea 113 | id: pg-config 114 | attributes: 115 | label: PostgreSQL Configuration 116 | description: Output of `pg_config` (if relevant) 117 | placeholder: Paste pg_config output here... 118 | render: shell 119 | 120 | - type: textarea 121 | id: cmake-output 122 | attributes: 123 | label: CMake Configuration Output 124 | description: Full output from cmake command 125 | placeholder: Paste cmake output here... 126 | render: shell 127 | 128 | - type: checkboxes 129 | id: checklist 130 | attributes: 131 | label: Pre-installation Checklist 132 | description: Have you verified these prerequisites? 133 | options: 134 | - label: PostgreSQL 14+ is installed with development headers (postgresql-server-dev) 135 | - label: CMake 3.16+ is installed 136 | - label: C++20 compatible compiler is installed 137 | - label: Git submodules were properly initialized (--recurse-submodules) 138 | - label: pg_config is in PATH 139 | - label: I have API key(s) from OpenAI, Anthropic, or Gemini 140 | 141 | - type: textarea 142 | id: additional 143 | attributes: 144 | label: Additional Context 145 | description: Any other relevant information 146 | placeholder: Add any other context about the installation problem here... 
147 | -------------------------------------------------------------------------------- /src/include/config.hpp: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | #include 5 | #include 6 | 7 | namespace pg_ai::config { 8 | 9 | namespace constants { 10 | // Provider name strings 11 | constexpr const char* PROVIDER_OPENAI = "openai"; 12 | constexpr const char* PROVIDER_ANTHROPIC = "anthropic"; 13 | constexpr const char* PROVIDER_GEMINI = "gemini"; 14 | constexpr const char* PROVIDER_AUTO = "auto"; 15 | constexpr const char* PROVIDER_UNKNOWN = "unknown"; 16 | 17 | // Default API endpoints 18 | constexpr const char* DEFAULT_OPENAI_ENDPOINT = "https://api.openai.com"; 19 | constexpr const char* DEFAULT_ANTHROPIC_ENDPOINT = "https://api.anthropic.com"; 20 | 21 | // Config file path 22 | constexpr const char* CONFIG_FILE_NAME = ".pg_ai.config"; 23 | 24 | // Config section names 25 | constexpr const char* SECTION_GENERAL = "general"; 26 | constexpr const char* SECTION_QUERY = "query"; 27 | constexpr const char* SECTION_RESPONSE = "response"; 28 | constexpr const char* SECTION_OPENAI = "openai"; 29 | constexpr const char* SECTION_ANTHROPIC = "anthropic"; 30 | constexpr const char* SECTION_GEMINI = "gemini"; 31 | 32 | // Default model names 33 | constexpr const char* DEFAULT_OPENAI_MODEL = "gpt-4o"; 34 | constexpr const char* DEFAULT_ANTHROPIC_MODEL = "claude-sonnet-4-5-20250929"; 35 | 36 | // Default token limits 37 | constexpr int DEFAULT_OPENAI_MAX_TOKENS = 16384; 38 | constexpr int DEFAULT_ANTHROPIC_MAX_TOKENS = 8192; 39 | constexpr int DEFAULT_MAX_TOKENS = 4096; 40 | constexpr double DEFAULT_TEMPERATURE = 0.7; 41 | } // namespace constants 42 | 43 | enum class Provider { OPENAI, ANTHROPIC, GEMINI, UNKNOWN }; 44 | 45 | struct ProviderConfig { 46 | Provider provider; 47 | std::string api_key; 48 | std::string default_model; 49 | int default_max_tokens; 50 | double default_temperature; 51 | std::string 
api_endpoint; // Custom API endpoint URL (optional) 52 | 53 | // Default constructor 54 | ProviderConfig() 55 | : provider(Provider::UNKNOWN), 56 | default_max_tokens(4096), 57 | default_temperature(0.7), 58 | api_endpoint() {} 59 | }; 60 | 61 | struct Configuration { 62 | ProviderConfig default_provider; 63 | std::vector<ProviderConfig> providers; 64 | 65 | // General settings 66 | std::string log_level; 67 | bool enable_logging; 68 | int request_timeout_ms; 69 | int max_retries; 70 | 71 | // Query generation settings 72 | bool enforce_limit; 73 | int default_limit; 74 | 75 | // Response format settings 76 | bool show_explanation; 77 | bool show_warnings; 78 | bool show_suggested_visualization; 79 | bool use_formatted_response; 80 | 81 | // Default constructor with sensible defaults 82 | Configuration(); 83 | }; 84 | 85 | class ConfigManager { 86 | public: 87 | /** 88 | * @brief Load configuration from ~/.pg_ai.config 89 | * @return true if config loaded successfully, false otherwise 90 | */ 91 | static bool loadConfig(); 92 | 93 | /** 94 | * @brief Load configuration from specific file path 95 | * @param config_path Path to configuration file 96 | * @return true if config loaded successfully, false otherwise 97 | */ 98 | static bool loadConfig(const std::string& config_path); 99 | 100 | /** 101 | * @brief Get current configuration 102 | * @return Reference to current configuration 103 | */ 104 | static const Configuration& getConfig(); 105 | 106 | /** 107 | * @brief Get provider config by provider type 108 | * @param provider Provider type to find 109 | * @return Pointer to provider config, or nullptr if not found 110 | */ 111 | static const ProviderConfig* getProviderConfig(Provider provider); 112 | 113 | /** 114 | * @brief Convert provider enum to string 115 | */ 116 | static std::string providerToString(Provider provider); 117 | 118 | /** 119 | * @brief Convert string to provider enum 120 | */ 121 | static Provider stringToProvider(const std::string& provider_str); 122 | 123 
| /** 124 | * @brief Reset configuration to defaults (for testing only) 125 | */ 126 | static void reset(); 127 | 128 | private: 129 | static Configuration config_; 130 | static bool config_loaded_; 131 | 132 | /** 133 | * @brief Parse configuration file content 134 | */ 135 | static bool parseConfig(const std::string& content); 136 | 137 | /** 138 | * @brief Get home directory path 139 | */ 140 | static std::string getHomeDirectory(); 141 | 142 | /** 143 | * @brief Get mutable provider config (for internal use) 144 | */ 145 | static ProviderConfig* getProviderConfigMutable(Provider provider); 146 | 147 | /** 148 | * @brief Load configuration from environment variables 149 | */ 150 | static void loadEnvConfig(); 151 | }; 152 | 153 | // Convenience macros for accessing config 154 | #define PG_AI_CONFIG() pg_ai::config::ConfigManager::getConfig() 155 | #define PG_AI_PROVIDER_CONFIG(provider) \ 156 | pg_ai::config::ConfigManager::getProviderConfig(provider) 157 | 158 | } // namespace pg_ai::config -------------------------------------------------------------------------------- /docs/src/table-details.md: -------------------------------------------------------------------------------- 1 | # get_table_details Function 2 | 3 | The `get_table_details` function returns detailed information about a specific table including columns, constraints, foreign keys, and indexes. 
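For a quick first look, the call can be guarded so it only runs when the target table actually exists (a sketch — the `users` table here is hypothetical):

```sql
-- to_regclass() returns NULL for relations that do not exist, so this
-- skips the call instead of raising a "Table does not exist" error.
SELECT CASE
    WHEN to_regclass('public.users') IS NOT NULL
        THEN get_table_details('users')
END AS details;
```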
4 | 5 | ## Function Signature 6 | 7 | ```sql 8 | get_table_details( 9 | table_name text, 10 | schema_name text DEFAULT 'public' 11 | ) RETURNS text 12 | ``` 13 | 14 | ## Parameters 15 | 16 | | Parameter | Type | Default | Description | 17 | |-----------|------|---------|-------------| 18 | | `table_name` | `text` | *required* | Name of the table to analyze | 19 | | `schema_name` | `text` | `'public'` | Schema containing the table | 20 | 21 | ## Basic Usage 22 | 23 | ```sql 24 | -- Basic usage 25 | SELECT get_table_details('users'); 26 | 27 | -- Specific schema 28 | SELECT get_table_details('orders', 'sales'); 29 | 30 | -- Pretty print 31 | SELECT jsonb_pretty(get_table_details('users')::jsonb); 32 | ``` 33 | 34 | ## Output Format 35 | 36 | The function returns detailed JSON information about the table: 37 | 38 | ```json 39 | { 40 | "table_name": "users", 41 | "schema_name": "public", 42 | "columns": [ 43 | { 44 | "column_name": "id", 45 | "data_type": "integer", 46 | "is_nullable": false, 47 | "column_default": "nextval('users_id_seq'::regclass)", 48 | "is_primary_key": true, 49 | "is_foreign_key": false 50 | }, 51 | { 52 | "column_name": "email", 53 | "data_type": "character varying", 54 | "is_nullable": false, 55 | "column_default": null, 56 | "is_primary_key": false, 57 | "is_foreign_key": false 58 | } 59 | ], 60 | "indexes": [ 61 | "users_pkey", 62 | "users_email_key" 63 | ] 64 | } 65 | ``` 66 | 67 | ## Extracting Information 68 | 69 | ### Get Column Information 70 | 71 | ```sql 72 | SELECT 73 | column_name, 74 | data_type, 75 | is_nullable, 76 | is_primary_key 77 | FROM jsonb_to_recordset( 78 | (get_table_details('users')::jsonb)->'columns' 79 | ) AS x( 80 | column_name text, 81 | data_type text, 82 | is_nullable boolean, 83 | is_primary_key boolean 84 | ); 85 | ``` 86 | 87 | ### Find Primary Key Columns 88 | 89 | ```sql 90 | SELECT 91 | column_name 92 | FROM jsonb_to_recordset( 93 | (get_table_details('users')::jsonb)->'columns' 94 | ) AS x(column_name text, 
is_primary_key boolean) 95 | WHERE is_primary_key = true; 96 | ``` 97 | 98 | ### List All Indexes 99 | 100 | ```sql 101 | SELECT 102 | jsonb_array_elements_text( 103 | (get_table_details('orders')::jsonb)->'indexes' 104 | ) as index_name; 105 | ``` 106 | 107 | ## Use Cases 108 | 109 | - **Schema Documentation**: Generate comprehensive table documentation 110 | - **Database Exploration**: Understand table structure before writing queries 111 | - **Data Migration Planning**: Analyze table structure for migration scripts 112 | - **Query Optimization**: Understand available indexes and constraints 113 | - **Integration Planning**: Map foreign key relationships 114 | 115 | ## Common Patterns 116 | 117 | ### Analyzing All Tables 118 | 119 | ```sql 120 | -- Get details for all tables in database 121 | WITH all_tables AS ( 122 | SELECT 123 | jsonb_array_elements(get_database_tables()::jsonb)->>'table_name' as table_name 124 | ) 125 | SELECT 126 | table_name, 127 | get_table_details(table_name) as details 128 | FROM all_tables; 129 | ``` 130 | 131 | ### Finding Foreign Key Relationships 132 | 133 | ```sql 134 | SELECT 135 | column_name, 136 | foreign_table, 137 | foreign_column 138 | FROM jsonb_to_recordset( 139 | (get_table_details('orders')::jsonb)->'columns' 140 | ) AS x( 141 | column_name text, 142 | is_foreign_key boolean, 143 | foreign_table text, 144 | foreign_column text 145 | ) 146 | WHERE is_foreign_key = true; 147 | ``` 148 | 149 | ## Integration with Other Functions 150 | 151 | ### With generate_query 152 | 153 | ```sql 154 | -- First understand the table structure 155 | SELECT get_table_details('users'); 156 | 157 | -- Then generate queries based on that knowledge 158 | SELECT generate_query('show users with their profile information'); 159 | ``` 160 | 161 | ### With explain_query 162 | 163 | ```sql 164 | -- Analyze table structure 165 | SELECT get_table_details('orders'); 166 | 167 | -- Generate and analyze a query 168 | SELECT explain_query('SELECT * FROM 
orders WHERE user_id = 123'); 169 | ``` 170 | 171 | ## Error Handling 172 | 173 | Common errors and solutions: 174 | 175 | | Error | Cause | Solution | 176 | |-------|-------|----------| 177 | | `"Table 'tablename' does not exist"` | Table not found | Check table name and schema | 178 | | `"Access denied to table"` | Insufficient permissions | Grant SELECT permission on table | 179 | | `"Schema 'schemaname' does not exist"` | Schema not found | Verify schema name | 180 | 181 | ## See Also 182 | 183 | - [get_database_tables Function](./database-tables.md) - List all database tables 184 | - [generate_query Function](./generate-query.md) - Generate queries from natural language 185 | - [Error Codes](./error-codes.md) - Troubleshooting guide -------------------------------------------------------------------------------- /src/core/query_parser.cpp: -------------------------------------------------------------------------------- 1 | #include "../include/query_parser.hpp" 2 | 3 | #include <algorithm> 4 | #include <regex> 5 | 6 | #include "../include/query_generator.hpp" 7 | 8 | namespace pg_ai { 9 | 10 | nlohmann::json QueryParser::extractSQLFromResponse(const std::string& text) { 11 | // Try to find JSON in markdown code block 12 | std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)", 13 | std::regex::icase); 14 | std::smatch match; 15 | 16 | if (std::regex_search(text, match, json_block)) { 17 | try { 18 | return nlohmann::json::parse(match[1].str()); 19 | } catch (...) { 20 | // Continue to other parsing methods 21 | } 22 | } 23 | 24 | // Try to parse as direct JSON 25 | try { 26 | return nlohmann::json::parse(text); 27 | } catch (...) 
{ 28 | // Continue to fallback 29 | } 30 | 31 | // Fallback: treat as raw SQL 32 | return {{"sql", text}, {"explanation", "Raw LLM output (no JSON detected)"}}; 33 | } 34 | 35 | bool QueryParser::accessesSystemTables(const std::string& sql) { 36 | std::string upper_sql = sql; 37 | std::transform(upper_sql.begin(), upper_sql.end(), upper_sql.begin(), 38 | ::toupper); 39 | return upper_sql.find("INFORMATION_SCHEMA") != std::string::npos || 40 | upper_sql.find("PG_CATALOG") != std::string::npos; 41 | } 42 | 43 | bool QueryParser::hasErrorIndicators(const std::string& explanation, 44 | const std::vector<std::string>& warnings) { 45 | std::string lower_explanation = explanation; 46 | std::transform(lower_explanation.begin(), lower_explanation.end(), 47 | lower_explanation.begin(), ::tolower); 48 | 49 | bool has_error = 50 | lower_explanation.find("cannot generate query") != std::string::npos || 51 | lower_explanation.find("cannot create query") != std::string::npos || 52 | lower_explanation.find("unable to generate") != std::string::npos || 53 | lower_explanation.find("does not exist") != std::string::npos || 54 | lower_explanation.find("do not exist") != std::string::npos || 55 | lower_explanation.find("table not found") != std::string::npos || 56 | lower_explanation.find("column not found") != std::string::npos || 57 | lower_explanation.find("no such table") != std::string::npos || 58 | lower_explanation.find("no such column") != std::string::npos; 59 | 60 | if (has_error) { 61 | return true; 62 | } 63 | 64 | for (const auto& warning : warnings) { 65 | std::string lower_warning = warning; 66 | std::transform(lower_warning.begin(), lower_warning.end(), 67 | lower_warning.begin(), ::tolower); 68 | if (lower_warning.find("error:") != std::string::npos || 69 | lower_warning.find("does not exist") != std::string::npos || 70 | lower_warning.find("do not exist") != std::string::npos) { 71 | return true; 72 | } 73 | } 74 | 75 | return false; 76 | } 77 | 78 | QueryResult 
QueryParser::parseQueryResponse(const std::string& response_text) { 79 | nlohmann::json j = extractSQLFromResponse(response_text); 80 | std::string sql = j.value("sql", ""); 81 | std::string explanation = j.value("explanation", ""); 82 | 83 | std::vector<std::string> warnings_vec; 84 | try { 85 | if (j.contains("warnings")) { 86 | if (j["warnings"].is_array()) { 87 | warnings_vec = j["warnings"].get<std::vector<std::string>>(); 88 | } else if (j["warnings"].is_string()) { 89 | warnings_vec.push_back(j["warnings"].get<std::string>()); 90 | } 91 | } 92 | } catch (...) { 93 | // Ignore warnings parsing errors 94 | } 95 | 96 | // Check for error indicators in explanation/warnings 97 | if (hasErrorIndicators(explanation, warnings_vec)) { 98 | return QueryResult{.generated_query = "", 99 | .explanation = explanation, 100 | .warnings = warnings_vec, 101 | .row_limit_applied = false, 102 | .suggested_visualization = "", 103 | .success = false, 104 | .error_message = explanation}; 105 | } 106 | 107 | // Handle empty SQL (but not an error) 108 | if (sql.empty()) { 109 | return QueryResult{.generated_query = "", 110 | .explanation = explanation, 111 | .warnings = warnings_vec, 112 | .row_limit_applied = false, 113 | .suggested_visualization = "", 114 | .success = true, 115 | .error_message = ""}; 116 | } 117 | 118 | // Check for system table access 119 | if (accessesSystemTables(sql)) { 120 | return QueryResult{ 121 | .generated_query = "", 122 | .explanation = "", 123 | .warnings = {}, 124 | .row_limit_applied = false, 125 | .suggested_visualization = "", 126 | .success = false, 127 | .error_message = 128 | "Generated query accesses system tables. 
Please query user " 129 | "tables only."}; 130 | } 131 | 132 | // Success case 133 | return QueryResult{ 134 | .generated_query = sql, 135 | .explanation = explanation, 136 | .warnings = warnings_vec, 137 | .row_limit_applied = j.value("row_limit_applied", false), 138 | .suggested_visualization = j.value("suggested_visualization", "table"), 139 | .success = true, 140 | .error_message = ""}; 141 | } 142 | 143 | } // namespace pg_ai 144 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | EXTENSION = pg_ai_query 2 | DATA = sql/pg_ai_query--1.0.sql 3 | # Note: We don't set MODULES here because we use CMake for compilation, not PGXS 4 | 5 | # Build directories (defined before PGXS for EXTRA_CLEAN) 6 | BUILD_DIR = build 7 | TEST_BUILD_DIR = build_tests 8 | TESTS_DIR = tests 9 | 10 | # Tell PGXS to clean our build directories and the copied extension file 11 | EXTRA_CLEAN = $(BUILD_DIR) $(TEST_BUILD_DIR) install pg_ai_query.so pg_ai_query.dylib 12 | 13 | PG_CONFIG = pg_config 14 | PGXS := $(shell $(PG_CONFIG) --pgxs) 15 | include $(PGXS) 16 | 17 | # Determine extension suffix based on PostgreSQL version on macOS 18 | ifeq ($(shell uname),Darwin) 19 | PG_VERSION := $(shell $(PG_CONFIG) --version | sed 's/PostgreSQL \([0-9]*\).*/\1/') 20 | ifeq ($(shell test $(PG_VERSION) -ge 16; echo $$?),0) 21 | EXT_SUFFIX := .dylib 22 | else 23 | EXT_SUFFIX := .so 24 | endif 25 | else 26 | EXT_SUFFIX := .so 27 | endif 28 | 29 | TARGET_LIB := pg_ai_query$(EXT_SUFFIX) 30 | 31 | all: $(TARGET_LIB) 32 | 33 | $(TARGET_LIB): $(BUILD_DIR)/CMakeCache.txt 34 | $(MAKE) -C $(BUILD_DIR) 35 | cp $(BUILD_DIR)/$(TARGET_LIB) . 36 | 37 | $(BUILD_DIR)/CMakeCache.txt: 38 | mkdir -p $(BUILD_DIR) 39 | cd $(BUILD_DIR) && cmake .. 
\ 40 | -DCMAKE_BUILD_TYPE=Release \ 41 | -DCMAKE_INSTALL_PREFIX=$(shell $(PG_CONFIG) --pkglibdir) 42 | 43 | # Override PGXS install to also install the shared library via CMake 44 | install: $(TARGET_LIB) 45 | cmake --install $(BUILD_DIR) 46 | 47 | .PHONY: all install 48 | 49 | # Formatting 50 | SRC_FILES = $(shell find . -type f \ 51 | \( -name "*.c" -o -name "*.cc" -o -name "*.cpp" \ 52 | -o -name "*.h" -o -name "*.hpp" \) \ 53 | -not -path "./$(BUILD_DIR)/*" \ 54 | -not -path "./third_party/*") 55 | 56 | # Run clang-format in-place 57 | format: 58 | @echo "Formatting $(words $(SRC_FILES)) file(s)..." 59 | @clang-format -i -style=file $(SRC_FILES) 60 | 61 | # Dry-run: show a diff of what *would* be changed 62 | format-check: 63 | @echo "Checking formatting..." 64 | @clang-format -style=file --dry-run --Werror $(SRC_FILES) \ 65 | || (echo "ERROR: Code is not formatted. Run 'make format'." && exit 1) 66 | 67 | .PHONY: format format-check 68 | 69 | # ============================================================================= 70 | # Testing 71 | # ============================================================================= 72 | 73 | NPROCS := $(shell nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4) 74 | 75 | ## Setup test environment and build test executable 76 | test-setup: 77 | @echo "Setting up test environment..." 78 | @mkdir -p $(TEST_BUILD_DIR) 79 | @cd $(TEST_BUILD_DIR) && cmake .. -DBUILD_TESTS=ON 80 | @cd $(TEST_BUILD_DIR) && make pg_ai_query_tests -j$(NPROCS) 81 | @echo "\n✓ Test setup complete. Run 'make test-unit' to run tests." 82 | 83 | ## Run all tests (unit + PostgreSQL) 84 | test: test-unit test-pg 85 | 86 | ## Run C++ unit tests 87 | test-unit: 88 | @if [ ! -f "$(TEST_BUILD_DIR)/tests/pg_ai_query_tests" ]; then \ 89 | echo "Test executable not found. 
Running test-setup..."; \ 90 | $(MAKE) test-setup; \ 91 | fi 92 | @echo "\n=== Running Unit Tests ===" 93 | @cd $(TEST_BUILD_DIR) && ./tests/pg_ai_query_tests 94 | @echo "\n✓ All unit tests passed." 95 | 96 | ## Run specific test suite (usage: make test-suite SUITE=ConfigManagerTest) 97 | test-suite: 98 | @if [ ! -f "$(TEST_BUILD_DIR)/tests/pg_ai_query_tests" ]; then \ 99 | $(MAKE) test-setup; \ 100 | fi 101 | @cd $(TEST_BUILD_DIR) && ./tests/pg_ai_query_tests --gtest_filter="$(SUITE).*" 102 | 103 | ## Run PostgreSQL extension tests 104 | test-pg: 105 | @echo "\n=== Running PostgreSQL Extension Tests ===" 106 | @if ! pg_isready -q 2>/dev/null; then \ 107 | echo "Error: PostgreSQL is not running."; \ 108 | exit 1; \ 109 | fi 110 | @DB=$${PGDATABASE:-postgres}; \ 111 | echo "Using database: $$DB"; \ 112 | psql -d $$DB -f $(TESTS_DIR)/sql/setup.sql -q 2>/dev/null || true; \ 113 | psql -d $$DB -f $(TESTS_DIR)/sql/test_extension_functions.sql; \ 114 | psql -d $$DB -f $(TESTS_DIR)/sql/teardown.sql -q 2>/dev/null || true 115 | @echo "\n✓ PostgreSQL tests complete." 116 | 117 | ## Clean test build artifacts 118 | test-clean: 119 | @rm -rf $(TEST_BUILD_DIR) 120 | @echo "✓ Test build cleaned." 
121 | 122 | ## Show test help 123 | test-help: 124 | @echo "pg_ai_query - Test Commands" 125 | @echo "============================" 126 | @echo "" 127 | @echo "Setup:" 128 | @echo " make test-setup - Build test executable (runs automatically if needed)" 129 | @echo "" 130 | @echo "Running Tests:" 131 | @echo " make test-unit - Run C++ unit tests (91 tests)" 132 | @echo " make test-pg - Run PostgreSQL extension tests" 133 | @echo " make test - Run all tests (unit + pg)" 134 | @echo "" 135 | @echo "Advanced:" 136 | @echo " make test-suite SUITE=ConfigManagerTest - Run specific test suite" 137 | @echo " make test-suite SUITE=QueryParserTest" 138 | @echo " make test-suite SUITE=ProviderSelectorTest" 139 | @echo " make test-suite SUITE=ResponseFormatterTest" 140 | @echo " make test-suite SUITE=UtilsTest" 141 | @echo "" 142 | @echo "Cleanup:" 143 | @echo " make test-clean - Remove test build artifacts" 144 | @echo "" 145 | @echo "Environment Variables:" 146 | @echo " PGDATABASE - Database for PostgreSQL tests (default: postgres)" 147 | @echo "" 148 | @echo "Examples:" 149 | @echo " make test-unit # Run all unit tests" 150 | @echo " make test-suite SUITE=ConfigManagerTest # Run one suite" 151 | @echo " PGDATABASE=mydb make test-pg # Run PG tests on 'mydb'" 152 | 153 | .PHONY: test test-setup test-unit test-suite test-pg test-clean test-help 154 | -------------------------------------------------------------------------------- /src/prompts.cpp: -------------------------------------------------------------------------------- 1 | #include "include/prompts.hpp" 2 | 3 | namespace pg_ai::prompts { 4 | 5 | const std::string SYSTEM_PROMPT = 6 | R"(You are a senior PostgreSQL database analyst that writes **correct, efficient SQL** for the exact database schema provided. 7 | 8 | CRITICAL: You MUST generate the exact SQL operation the user requests - if they ask for DELETE, write DELETE; if they ask for UPDATE, write UPDATE; if they ask for INSERT, write INSERT. 
Do NOT convert destructive operations to SELECT queries unless explicitly asked to do so. 9 | 10 | ### INPUTS YOU WILL RECEIVE 11 | 1. **User question** – natural language request. 12 | 2. **Database schema** – available tables, columns, data types, constraints, relationships. 13 | 3. **Database dialect** – PostgreSQL. 14 | 15 | ### YOUR OUTPUT (JSON only, no extra text) 16 | { 17 | "sql": "exact SQL query to run", 18 | "explanation": "plain English summary of what the query does", 19 | "warnings": ["list of risks, performance notes, or clarifications"] or [], 20 | "row_limit_applied": true/false, 21 | "suggested_visualization": "bar|line|table|pie|none" 22 | } 23 | 24 | ### VALIDATION RULES (CRITICAL) 25 | 1. **SCHEMA VALIDATION FOR QUERIES**: If user mentions tables/columns NOT in the provided schema for SELECT/UPDATE/DELETE/INSERT operations, return: 26 | ```json 27 | { 28 | "sql": "", 29 | "explanation": "Cannot generate query. Referenced table(s) 'table_name' do not exist in the database. Available tables: [list actual tables]", 30 | "warnings": ["ERROR: Table 'table_name' not found in database schema"], 31 | "row_limit_applied": false, 32 | "suggested_visualization": "none" 33 | } 34 | ``` 35 | 36 | 1a. **DDL OPERATIONS EXCEPTION**: For CREATE TABLE, DROP TABLE, ALTER TABLE, CREATE INDEX operations, do NOT validate against existing schema. These operations are meant to create or modify database structure. 37 | 38 | 2. **AMBIGUOUS REQUESTS**: If the request is too vague (e.g., "show me data"), ask for clarification: 39 | ```json 40 | { 41 | "sql": "", 42 | "explanation": "Please clarify: What specific data would you like to see? Available tables: [list tables]", 43 | "warnings": ["WARN: Request too ambiguous - need more specific requirements"], 44 | "row_limit_applied": false, 45 | "suggested_visualization": "none" 46 | } 47 | ``` 48 | 49 | 3. 
**IMPOSSIBLE QUERIES**: If request cannot be fulfilled with available schema: 50 | ```json 51 | { 52 | "sql": "", 53 | "explanation": "Cannot fulfill request: [specific reason]. Available data: [describe what's possible]", 54 | "warnings": ["ERROR: Requested data not available in current schema"], 55 | "row_limit_applied": false, 56 | "suggested_visualization": "none" 57 | } 58 | ``` 59 | 60 | ### SQL GENERATION RULES (CRITICAL) 61 | 1. **FOR DATA QUERIES (SELECT/UPDATE/DELETE/INSERT)**: ONLY use tables/columns from the provided schema - validate every reference. 62 | 2. **FOR DDL OPERATIONS (CREATE/DROP/ALTER)**: You may create new table/column names as requested by the user. These do not need to exist in current schema. 63 | 3. **SINGLE LINE FORMAT**: Generate ALL SQL as a single line without newlines (\n) or line breaks. Use spaces to separate clauses. 64 | 4. **Use proper table aliases** for readability (e.g., u for users, o for orders). 65 | 5. **For time-based queries**: Use appropriate PostgreSQL date functions (NOW(), INTERVAL, DATE_TRUNC). 66 | 6. **For rankings**: Use window functions (ROW_NUMBER(), RANK(), DENSE_RANK()). 67 | 7. **Handle NULLs properly** in aggregations and comparisons. 68 | 8. **Use appropriate JOINs**: INNER for required relationships, LEFT for optional. 69 | 9. **Quote identifiers** if they contain spaces or special characters. 70 | 10. **For SELECT queries**: Prefer explicit column lists over SELECT *. 71 | 11. **For SELECT queries**: Apply LIMIT 1000 unless user says "all", "full", or "complete". 72 | 12. **For destructive operations**: Include appropriate WHERE clauses to prevent unintended data loss. 73 | 13. **For CREATE TABLE**: Use appropriate PostgreSQL data types, constraints, and follow best practices. 
74 | 75 | ### WARNING CATEGORIES 76 | - **INFO**: Helpful context about the query 77 | - **WARN**: Performance concerns, data quality notes, or assumptions made 78 | - **Never use ERROR in warnings** (errors go in explanation with empty sql) 79 | 80 | ### RESPONSE EXAMPLES 81 | 82 | All responses must be valid JSON with these fields: 83 | - sql: the PostgreSQL query to execute 84 | - explanation: plain English description 85 | - warnings: array of warning strings 86 | - row_limit_applied: boolean 87 | - suggested_visualization: string 88 | )"; 89 | 90 | const std::string EXPLAIN_SYSTEM_PROMPT = 91 | R"(You are a PostgreSQL query performance expert. 92 | Analyze the provided EXPLAIN ANALYZE output and provide a clear, easy-to-understand explanation. 93 | 94 | Your response should include: 95 | 1. Query Overview: Brief description of what the query does 96 | 2. Performance Summary: Overall execution time, total cost, and rows processed 97 | 3. Execution Plan Analysis: Key steps in the execution plan with focus on expensive operations 98 | 4. Performance Issues: Any bottlenecks, inefficient operations, or concerning metrics 99 | 5. Optimization Suggestions: Specific recommendations for improving query performance 100 | 6. Index Recommendations: Suggest missing indexes if applicable 101 | 102 | Keep the explanation concise but comprehensive. Use plain language that both developers and DBAs can understand. 103 | Format the response as plain text with clear section headers and bullet points. Do not use markdown syntax like **, ##, or ###.)"; 104 | 105 | } // namespace pg_ai::prompts -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | All notable changes to this project will be documented in this file. 
4 | 5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), 6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 7 | 8 | ## [v0.1.1] - 2025-12-15 9 | 10 | ### Fixed 11 | 12 | #### Build System 13 | - Fix LLVM bitcode build error on PostgreSQL 18+ ([#30](https://github.com/benodiwal/pg_ai_query/pull/30)) 14 | - Fix `make install` not installing shared library ([#31](https://github.com/benodiwal/pg_ai_query/pull/31), [#37](https://github.com/benodiwal/pg_ai_query/pull/37)) 15 | - Fix `make clean` not removing extension files from root directory ([#35](https://github.com/benodiwal/pg_ai_query/pull/35)) 16 | 17 | #### Configuration 18 | - Fix config parser including inline comments with field values ([#28](https://github.com/benodiwal/pg_ai_query/pull/28)) 19 | 20 | ### Changed 21 | 22 | #### CI/CD 23 | - Pin GitHub Actions to commit SHAs for supply chain security ([#32](https://github.com/benodiwal/pg_ai_query/pull/32), [#33](https://github.com/benodiwal/pg_ai_query/pull/33)) 24 | - Add Makefile/PGXS build tests for PostgreSQL 16-18 25 | 26 | ### Contributors 27 | 28 | Thanks to the following people for their contributions to this release: 29 | 30 | - [@reneleonhardt](https://github.com/reneleonhardt) - GitHub Actions security hardening 31 | - [@fvlnl](https://github.com/fvlnl) - Reported PostgreSQL 18 build issues ([#29](https://github.com/benodiwal/pg_ai_query/issues/29)) 32 | 33 | --- 34 | 35 | ## [v0.1.0] - 2025-12-09 36 | 37 | ### Added 38 | 39 | #### Core Functions 40 | - `generate_query(text, text, text)` - Generate SQL queries from natural language descriptions 41 | - Automatic database schema discovery 42 | - Support for provider selection (`openai`, `anthropic`, `gemini`, `auto`) 43 | - Optional inline API key parameter 44 | - `explain_query(text, text, text)` - AI-powered query plan analysis 45 | - Runs `EXPLAIN ANALYZE` and provides human-readable explanations 46 | - Performance insights 
and optimization suggestions 47 | - `get_database_tables()` - Returns JSON array of all user tables with metadata 48 | - `get_table_details(text, text)` - Returns detailed table information including columns, constraints, and indexes 49 | 50 | #### AI Provider Support 51 | - **OpenAI** - All current supported models 52 | - **Anthropic** - All current supported models 53 | - **Google Gemini** - All current supported models 54 | - **OpenAI-compatible APIs** - Support for OpenRouter and other compatible endpoints 55 | - Automatic provider selection with fallback logic 56 | - Configurable retry policies with exponential backoff 57 | 58 | #### Configuration 59 | - File-based configuration via `~/.pg_ai.config` 60 | - Support for multiple AI providers in single config 61 | - Configurable options: 62 | - Log level and logging toggle 63 | - Request timeout and retry settings 64 | - Query limit enforcement 65 | - Response formatting (explanations, warnings, visualizations) 66 | - Custom API endpoints for OpenAI-compatible services 67 | 68 | #### Documentation 69 | - Comprehensive mdBook documentation 70 | - Installation and configuration guides 71 | - Provider comparison and model selection guide 72 | - Usage examples and troubleshooting 73 | 74 | ### Technical Details 75 | 76 | - Built with C++20 77 | - Uses [ai-sdk-cpp](https://github.com/ClickHouse/ai-sdk-cpp) for AI provider integration 78 | - PostgreSQL extension API compliance 79 | 80 | --- 81 | 82 | ## [v0.1.0-beta] - 2025-12-08 83 | 84 | ### Added 85 | 86 | #### Core Functions 87 | - `generate_query(text, text, text)` - Generate SQL queries from natural language descriptions 88 | - Automatic database schema discovery 89 | - Support for provider selection (`openai`, `anthropic`, `auto`) 90 | - Optional inline API key parameter 91 | - `explain_query(text, text, text)` - AI-powered query plan analysis 92 | - Runs `EXPLAIN ANALYZE` and provides human-readable explanations 93 | - Performance insights and optimization
suggestions 94 | - `get_database_tables()` - Returns JSON array of all user tables with metadata 95 | - `get_table_details(text, text)` - Returns detailed table information including columns, constraints, and indexes 96 | 97 | #### AI Provider Support 98 | - **OpenAI** - All current supported models 99 | - **Anthropic** - All current supported models 100 | - **OpenAI-compatible APIs** - Support for OpenRouter and other compatible endpoints 101 | - Automatic provider selection with fallback logic 102 | - Configurable retry policies with exponential backoff 103 | 104 | #### Configuration 105 | - File-based configuration via `~/.pg_ai.config` 106 | - Support for multiple AI providers in single config 107 | - Configurable options: 108 | - Log level and logging toggle 109 | - Request timeout and retry settings 110 | - Query limit enforcement 111 | - Response formatting (explanations, warnings, visualizations) 112 | - Custom API endpoints for OpenAI-compatible services 113 | 114 | #### Documentation 115 | - Comprehensive mdBook documentation 116 | - Installation and configuration guides 117 | - Provider comparison and model selection guide 118 | - Usage examples and troubleshooting 119 | 120 | ### Technical Details 121 | 122 | - Built with C++20 123 | - Uses [ai-sdk-cpp](https://github.com/clickhouse/ai-sdk-cpp) for AI provider integration 124 | - SSL/TLS support via OpenSSL 125 | - PostgreSQL extension API compliance 126 | 127 | --- 128 | 129 | [v0.1.1]: https://github.com/benodiwal/pg_ai_query/compare/v0.1.0...v0.1.1 130 | [v0.1.0]: https://github.com/benodiwal/pg_ai_query/releases/tag/v0.1.0 131 | [v0.1.0-beta]: https://github.com/benodiwal/pg_ai_query/releases/tag/v0.1.0-beta 132 | -------------------------------------------------------------------------------- /docs/src/introduction.md: -------------------------------------------------------------------------------- 1 | # PostgreSQL AI Query Extension 2 | 3 | The **PostgreSQL AI Query Extension** (`pg_ai_query`) is 
a powerful PostgreSQL extension that allows you to generate SQL queries from natural language descriptions using state-of-the-art AI models from OpenAI, Anthropic, and Google Gemini. 4 | 5 | ## What is pg_ai_query? 6 | 7 | `pg_ai_query` bridges the gap between natural language and SQL by leveraging large language models to understand your intent and automatically generate optimized PostgreSQL queries. The extension integrates directly into your PostgreSQL database, providing a seamless way to query your data using plain English. 8 | 9 | ## Key Features 10 | 11 | - **Natural Language to SQL**: Convert plain English descriptions into valid PostgreSQL queries 12 | - **AI-Powered Query Analysis**: Analyze query performance with EXPLAIN ANALYZE and get optimization insights 13 | - **Automatic Schema Discovery**: The extension automatically analyzes your database schema to understand table structures, relationships, and constraints 14 | - **Multiple AI Providers**: Support for OpenAI (GPT-4, GPT-3.5), Anthropic (Claude), and Google (Gemini) models 15 | - **Intelligent Query Generation**: Generates optimized queries with appropriate JOINs, WHERE clauses, and LIMIT constraints 16 | - **Performance Optimization**: Get AI-powered recommendations for query improvements and index suggestions 17 | - **Safety First**: Built-in protections against dangerous operations and unauthorized access to system tables 18 | - **Configurable**: Flexible configuration system with support for API keys, model selection, and logging 19 | - **PostgreSQL Native**: Runs directly within PostgreSQL as a native extension 20 | 21 | ## How It Works 22 | 23 | 1. **Schema Analysis**: The extension automatically discovers and analyzes your database schema 24 | 2. **Natural Language Processing**: Your natural language query is processed and understood by AI models 25 | 3. **SQL Generation**: The AI generates an appropriate SQL query based on your schema and request 26 | 4. 
**Query Validation**: The generated query is validated for safety and correctness 27 | 5. **Execution Ready**: You receive a ready-to-execute SQL query 28 | 29 | ## Example Usage 30 | 31 | ### Query Generation 32 | ```sql 33 | -- Simple query 34 | SELECT generate_query('show me all users created in the last 7 days'); 35 | 36 | -- With custom API key 37 | SELECT generate_query('count orders by status', 'your-api-key-here'); 38 | 39 | -- With specific provider 40 | SELECT generate_query('find top 10 customers by revenue', 'your-api-key', 'openai'); 41 | ``` 42 | 43 | ### Query Performance Analysis 44 | ```sql 45 | -- Analyze query performance 46 | SELECT explain_query('SELECT * FROM users WHERE active = true'); 47 | 48 | -- Get optimization suggestions for complex queries 49 | SELECT explain_query(' 50 | SELECT u.username, COUNT(o.id) as orders 51 | FROM users u 52 | LEFT JOIN orders o ON u.id = o.user_id 53 | GROUP BY u.username 54 | ORDER BY orders DESC 55 | '); 56 | 57 | -- Combine generation and analysis 58 | WITH generated AS ( 59 | SELECT generate_query('find high-value recent orders') as query 60 | ) 61 | SELECT explain_query((SELECT query FROM generated)); 62 | ``` 63 | 64 | ## Supported AI Models 65 | 66 | ### OpenAI Models 67 | - **GPT-4o**: Latest and most capable model 68 | - **GPT-4**: High-quality reasoning and code generation 69 | - **GPT-3.5 Turbo**: Fast and efficient for simpler queries 70 | 71 | ### Anthropic Models 72 | - **Claude 3.5 Sonnet**: Advanced reasoning and natural language understanding 73 | 74 | ### Google Gemini Models 75 | - **Gemini 2.5 Pro**: Flagship model with excellent reasoning 76 | - **Gemini 2.5 Flash**: Fast and cost-effective for most queries 77 | - **Gemini 2.0 Flash**: Previous generation, still highly capable 78 | 79 | ## Use Cases 80 | 81 | - **Data Exploration**: Quickly explore your data without writing complex SQL 82 | - **Query Optimization**: Analyze and optimize slow-performing queries with AI insights 83 | - 
**Performance Monitoring**: Regular analysis of critical queries for performance regression 84 | - **Business Intelligence**: Generate reports and analytics queries from natural descriptions 85 | - **Learning SQL**: Understand how natural language translates to SQL syntax and learn optimization techniques 86 | - **Index Planning**: Get AI-powered recommendations for database index strategies 87 | - **Rapid Prototyping**: Quickly generate queries for testing and development 88 | - **Documentation**: Generate example queries for database documentation 89 | 90 | ## Architecture 91 | 92 | The extension consists of several key components: 93 | 94 | - **Query Generator**: Core engine that processes natural language and generates SQL 95 | - **Query Analyzer**: Performance analysis engine using EXPLAIN ANALYZE and AI insights 96 | - **Schema Discovery**: Automatically analyzes database structure and relationships 97 | - **AI Provider Integration**: Handles communication with OpenAI, Anthropic, and Google Gemini APIs 98 | - **Configuration Manager**: Manages settings, API keys, and model configurations 99 | - **Safety Validator**: Ensures generated queries are safe and authorized 100 | 101 | ## Security and Safety 102 | 103 | `pg_ai_query` is designed with security in mind: 104 | 105 | - **No System Access**: Cannot access system catalogs or sensitive PostgreSQL internals 106 | - **User Table Focus**: Only operates on user-created tables and data 107 | - **Query Validation**: All generated queries are validated before return 108 | - **Configurable Limits**: Built-in LIMIT enforcement to prevent large result sets 109 | - **API Key Protection**: Secure handling of API credentials 110 | 111 | ## Getting Started 112 | 113 | Ready to start generating SQL queries from natural language? Head over to the [Installation](./installation.md) guide to get started, or check out our [Quick Start](./quick-start.md) tutorial for a hands-on introduction.
114 | 115 | ## Repository 116 | 117 | The source code is available on [GitHub](https://github.com/benodiwal/pg_ai_query). Feel free to contribute, report issues, or explore the implementation. -------------------------------------------------------------------------------- /docs/src/installation.md: -------------------------------------------------------------------------------- 1 | # Installation 2 | 3 | This guide will walk you through installing the PostgreSQL AI Query extension on your system. 4 | 5 | ## Prerequisites 6 | 7 | Before installing `pg_ai_query`, ensure you have the following: 8 | 9 | ### System Requirements 10 | - **PostgreSQL 14+**: The extension requires PostgreSQL version 14 or later 11 | - **Operating System**: Linux, macOS, or Windows (with proper development tools) 12 | - **C++ Compiler**: GCC 8+ or Clang 10+ with C++20 support 13 | - **CMake**: Version 3.15 or later for building 14 | 15 | ### Development Dependencies 16 | - PostgreSQL development headers (`postgresql-devel` on RHEL/CentOS, `postgresql-server-dev-all` on Ubuntu/Debian) 17 | - OpenSSL development libraries 18 | - Git (for cloning the repository) 19 | 20 | ### AI Provider Requirements 21 | You'll need API access to at least one of the following: 22 | - **OpenAI API**: Create an account at [platform.openai.com](https://platform.openai.com) 23 | - **Anthropic API**: Create an account at [console.anthropic.com](https://console.anthropic.com) 24 | 25 | ## Installation Methods 26 | 27 | ### Method 1: Build from Source (Recommended) 28 | 29 | #### 1. Clone the Repository 30 | 31 | ```bash 32 | git clone https://github.com/benodiwal/pg_ai_query.git 33 | cd pg_ai_query 34 | ``` 35 | 36 | #### 2. Initialize Submodules 37 | 38 | ```bash 39 | git submodule update --init --recursive 40 | ``` 41 | 42 | #### 3. Build the Extension 43 | 44 | ```bash 45 | make clean 46 | make 47 | ``` 48 | 49 | #### 4. 
Install the Extension 50 | 51 | ```bash 52 | # Install to default PostgreSQL location 53 | sudo make install 54 | 55 | # Or specify a custom PostgreSQL installation 56 | PG_CONFIG=/path/to/pg_config make install 57 | ``` 58 | 59 | ### Method 2: Package Installation (Future) 60 | 61 | *Package installations will be available in future releases for major Linux distributions and package managers.* 62 | 63 | ## Platform-Specific Instructions 64 | 65 | ### Ubuntu/Debian 66 | 67 | ```bash 68 | # Install dependencies 69 | sudo apt-get update 70 | sudo apt-get install postgresql-server-dev-all cmake gcc g++ libssl-dev git 71 | 72 | # Clone and build 73 | git clone https://github.com/benodiwal/pg_ai_query.git 74 | cd pg_ai_query 75 | git submodule update --init --recursive 76 | make && sudo make install 77 | ``` 78 | 79 | ### CentOS/RHEL/Fedora 80 | 81 | ```bash 82 | # Install dependencies 83 | sudo yum install postgresql-devel cmake gcc-c++ openssl-devel git 84 | # OR on newer systems: 85 | sudo dnf install postgresql-devel cmake gcc-c++ openssl-devel git 86 | 87 | # Clone and build 88 | git clone https://github.com/benodiwal/pg_ai_query.git 89 | cd pg_ai_query 90 | git submodule update --init --recursive 91 | make && sudo make install 92 | ``` 93 | 94 | ### macOS 95 | 96 | ```bash 97 | # Install dependencies (with Homebrew) 98 | brew install postgresql cmake openssl git 99 | 100 | # Clone and build 101 | git clone https://github.com/benodiwal/pg_ai_query.git 102 | cd pg_ai_query 103 | git submodule update --init --recursive 104 | make && sudo make install 105 | ``` 106 | 107 | ### Windows 108 | 109 | For Windows installation, you'll need: 110 | - Visual Studio 2019 or later with C++ tools 111 | - PostgreSQL installed from EDB or compiled from source 112 | - CMake for Windows 113 | 114 | *Detailed Windows installation instructions will be added in a future update.* 115 | 116 | ## Verification 117 | 118 | After installation, verify that the extension is properly installed: 
119 | 120 | ### 1. Connect to PostgreSQL 121 | 122 | ```bash 123 | psql -d your_database 124 | ``` 125 | 126 | ### 2. Create the Extension 127 | 128 | ```sql 129 | CREATE EXTENSION IF NOT EXISTS pg_ai_query; 130 | ``` 131 | 132 | ### 3. Test the Installation 133 | 134 | ```sql 135 | -- Check if functions are available 136 | \df generate_query 137 | 138 | -- Test with a simple query (will fail without API key, which is expected) 139 | SELECT generate_query('show me all tables'); 140 | ``` 141 | 142 | You should see the function listed and get an error about missing API configuration (which is normal - we'll configure that next). 143 | 144 | ## Troubleshooting Installation 145 | 146 | ### Common Issues 147 | 148 | #### PostgreSQL Development Headers Missing 149 | 150 | **Error**: `postgres.h: No such file or directory` 151 | 152 | **Solution**: Install PostgreSQL development packages: 153 | ```bash 154 | # Ubuntu/Debian 155 | sudo apt-get install postgresql-server-dev-all 156 | 157 | # CentOS/RHEL 158 | sudo yum install postgresql-devel 159 | ``` 160 | 161 | #### CMake Not Found 162 | 163 | **Error**: `cmake: command not found` 164 | 165 | **Solution**: Install CMake: 166 | ```bash 167 | # Ubuntu/Debian 168 | sudo apt-get install cmake 169 | 170 | # macOS 171 | brew install cmake 172 | 173 | # CentOS/RHEL 174 | sudo yum install cmake 175 | ``` 176 | 177 | #### Compilation Errors 178 | 179 | **Error**: Various C++ compilation errors 180 | 181 | **Solutions**: 182 | - Ensure you have a C++20-compatible compiler 183 | - Check that all submodules are initialized: `git submodule update --init --recursive` 184 | - Clean and rebuild: `make clean && make` 185 | 186 | #### Permission Denied During Installation 187 | 188 | **Error**: Permission denied when running `make install` 189 | 190 | **Solution**: Use `sudo` for installation: 191 | ```bash 192 | sudo make install 193 | ``` 194 | 195 | #### Wrong PostgreSQL Installation 196 | 197 | **Error**: Extension installs to wrong
PostgreSQL version 198 | 199 | **Solution**: Specify the correct `pg_config`: 200 | ```bash 201 | PG_CONFIG=/usr/pgsql-14/bin/pg_config make install 202 | ``` 203 | 204 | ### Getting Help 205 | 206 | If you encounter issues not covered here: 207 | 208 | 1. Check the [Troubleshooting](./troubleshooting.md) guide 209 | 2. Review the [FAQ](./faq.md) for common questions 210 | 3. Open an issue on the [GitHub repository](https://github.com/benodiwal/pg_ai_query/issues) 211 | 212 | ## Next Steps 213 | 214 | Once installation is complete: 215 | 216 | 1. [Configure the extension](./configuration.md) with your API keys 217 | 2. Follow the [Quick Start Guide](./quick-start.md) to generate your first queries 218 | 3. Explore [Usage Examples](./examples.md) for inspiration -------------------------------------------------------------------------------- /src/providers/gemini/client.cpp: -------------------------------------------------------------------------------- 1 | #include "../../include/gemini_client.h" 2 | 3 | #include <curl/curl.h> 4 | #include <nlohmann/json.hpp> 5 | #include <string> 6 | 7 | namespace gemini { 8 | 9 | namespace { 10 | 11 | // Callback for libcurl to write response data 12 | size_t WriteCallback(void* contents, size_t size, size_t nmemb, void* userp) { 13 | ((std::string*)userp)->append((char*)contents, size * nmemb); 14 | return size * nmemb; 15 | } 16 | 17 | } // namespace 18 | 19 | GeminiClient::GeminiClient(const std::string& api_key) : api_key_(api_key) { 20 | curl_global_init(CURL_GLOBAL_DEFAULT); 21 | } 22 | 23 | std::string GeminiClient::build_request_body(const GeminiRequest& request) { 24 | nlohmann::json body; 25 | 26 | // Build contents array 27 | nlohmann::json contents = nlohmann::json::array(); 28 | nlohmann::json user_message; 29 | user_message["parts"] = nlohmann::json::array(); 30 | user_message["parts"].push_back({{"text", request.user_prompt}}); 31 | contents.push_back(user_message); 32 | body["contents"] = contents; 33 | 34 | // Add system instruction if provided 35 | if
(!request.system_prompt.empty()) { 36 | nlohmann::json system_instruction; 37 | system_instruction["parts"] = nlohmann::json::array(); 38 | system_instruction["parts"].push_back({{"text", request.system_prompt}}); 39 | body["systemInstruction"] = system_instruction; 40 | } 41 | 42 | // Add generation config 43 | nlohmann::json generation_config; 44 | if (request.temperature.has_value()) { 45 | generation_config["temperature"] = request.temperature.value(); 46 | } 47 | if (request.max_tokens.has_value()) { 48 | generation_config["maxOutputTokens"] = request.max_tokens.value(); 49 | } 50 | 51 | if (!generation_config.empty()) { 52 | body["generationConfig"] = generation_config; 53 | } 54 | 55 | return body.dump(); 56 | } 57 | 58 | GeminiResponse GeminiClient::parse_response(const std::string& body, 59 | int status_code) { 60 | GeminiResponse response; 61 | response.status_code = status_code; 62 | 63 | if (status_code != 200) { 64 | response.success = false; 65 | try { 66 | auto json = nlohmann::json::parse(body); 67 | if (json.contains("error")) { 68 | auto error = json["error"]; 69 | response.error_message = error.value("message", "Unknown error"); 70 | if (error.contains("code")) { 71 | response.error_message = "Error " + 72 | std::to_string(error["code"].get<int>()) + 73 | ": " + response.error_message; 74 | } 75 | } else { 76 | response.error_message = "HTTP " + std::to_string(status_code); 77 | } 78 | } catch (const std::exception& e) { 79 | response.error_message = 80 | "HTTP " + std::to_string(status_code) + ": " + body; 81 | } 82 | return response; 83 | } 84 | 85 | try { 86 | auto json = nlohmann::json::parse(body); 87 | 88 | // Extract text from: candidates[0].content.parts[0].text 89 | if (json.contains("candidates") && json["candidates"].is_array() && 90 | !json["candidates"].empty()) { 91 | auto& candidate = json["candidates"][0]; 92 | 93 | if (candidate.contains("content")) { 94 | auto& content = candidate["content"]; 95 | 96 | if (content.contains("parts")
&& content["parts"].is_array() && 97 | !content["parts"].empty()) { 98 | auto& part = content["parts"][0]; 99 | 100 | if (part.contains("text")) { 101 | response.text = part["text"].get<std::string>(); 102 | response.success = true; 103 | return response; 104 | } 105 | } 106 | } 107 | } 108 | 109 | response.success = false; 110 | response.error_message = "Invalid response format: missing text content"; 111 | } catch (const std::exception& e) { 112 | response.success = false; 113 | response.error_message = std::string("JSON parse error: ") + e.what(); 114 | } 115 | 116 | return response; 117 | } 118 | 119 | GeminiResponse GeminiClient::make_http_request(const std::string& url, 120 | const std::string& body) { 121 | CURL* curl = curl_easy_init(); 122 | GeminiResponse response; 123 | 124 | if (!curl) { 125 | response.success = false; 126 | response.error_message = "Failed to initialize CURL"; 127 | return response; 128 | } 129 | 130 | std::string response_body; 131 | long http_code = 0; 132 | 133 | // Set URL 134 | curl_easy_setopt(curl, CURLOPT_URL, url.c_str()); 135 | 136 | // Set headers 137 | struct curl_slist* headers = nullptr; 138 | headers = curl_slist_append(headers, "Content-Type: application/json"); 139 | 140 | std::string api_key_header = "x-goog-api-key: " + api_key_; 141 | headers = curl_slist_append(headers, api_key_header.c_str()); 142 | 143 | curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers); 144 | 145 | // Set POST data 146 | curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str()); 147 | 148 | // Set write callback 149 | curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback); 150 | curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response_body); 151 | 152 | // Perform request 153 | CURLcode res = curl_easy_perform(curl); 154 | 155 | if (res != CURLE_OK) { 156 | response.success = false; 157 | response.error_message = 158 | std::string("CURL error: ") + curl_easy_strerror(res); 159 | } else { 160 | curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
161 | response = parse_response(response_body, static_cast<int>(http_code)); 162 | } 163 | 164 | curl_slist_free_all(headers); 165 | curl_easy_cleanup(curl); 166 | 167 | return response; 168 | } 169 | 170 | GeminiResponse GeminiClient::generate_text(const GeminiRequest& request) { 171 | // Build URL 172 | std::string url = std::string(BASE_URL) + "/" + API_VERSION + "/models/" + 173 | request.model + ":generateContent"; 174 | 175 | // Build request body 176 | std::string body = build_request_body(request); 177 | 178 | // Make HTTP request 179 | return make_http_request(url, body); 180 | } 181 | 182 | } // namespace gemini 183 | -------------------------------------------------------------------------------- /tests/unit/test_utils.cpp: -------------------------------------------------------------------------------- 1 | #include <gtest/gtest.h> 2 | #include <gmock/gmock.h> 3 | #include <filesystem> 4 | #include <fstream> 5 | 6 | #include "../test_helpers.hpp" 7 | #include "include/utils.hpp" 8 | 9 | using namespace pg_ai::utils; 10 | using namespace pg_ai::test_utils; 11 | 12 | class UtilsTest : public ::testing::Test { 13 | protected: 14 | std::filesystem::path temp_dir_; 15 | 16 | void SetUp() override { 17 | temp_dir_ = std::filesystem::temp_directory_path() / "pg_ai_utils_test"; 18 | std::filesystem::create_directories(temp_dir_); 19 | } 20 | 21 | void TearDown() override { std::filesystem::remove_all(temp_dir_); } 22 | 23 | std::string createTempFile(const std::string& filename, 24 | const std::string& content) { 25 | auto path = temp_dir_ / filename; 26 | std::ofstream file(path); 27 | file << content; 28 | file.close(); 29 | return path.string(); 30 | } 31 | }; 32 | 33 | // Test read_file with valid file 34 | TEST_F(UtilsTest, ReadFileValid) { 35 | std::string content = "Hello, World!\nLine 2\n"; 36 | std::string path = createTempFile("test.txt", content); 37 | 38 | auto [success, read_content] = read_file(path); 39 | 40 | EXPECT_TRUE(success); 41 | EXPECT_EQ(read_content, content); 42 | } 43 | 44 | // Test read_file
with empty file 45 | TEST_F(UtilsTest, ReadFileEmpty) { 46 | std::string path = createTempFile("empty.txt", ""); 47 | 48 | auto [success, content] = read_file(path); 49 | 50 | EXPECT_TRUE(success); 51 | EXPECT_TRUE(content.empty()); 52 | } 53 | 54 | // Test read_file with non-existent file 55 | TEST_F(UtilsTest, ReadFileNonExistent) { 56 | auto [success, content] = read_file("/nonexistent/path/file.txt"); 57 | 58 | EXPECT_FALSE(success); 59 | EXPECT_TRUE(content.empty()); 60 | } 61 | 62 | // Test read_file with binary content 63 | TEST_F(UtilsTest, ReadFileBinary) { 64 | std::string binary_content("test\0with\0nulls", 15); 65 | auto path = temp_dir_ / "binary.bin"; 66 | std::ofstream file(path, std::ios::binary); 67 | file.write(binary_content.c_str(), binary_content.size()); 68 | file.close(); 69 | 70 | auto [success, content] = read_file(path.string()); 71 | 72 | EXPECT_TRUE(success); 73 | // Constructed with an explicit length so the embedded null bytes are kept 74 | EXPECT_EQ(content.size(), binary_content.size()); 75 | } 76 | 77 | // Test read_file_or_throw with valid file 78 | TEST_F(UtilsTest, ReadFileOrThrowValid) { 79 | std::string expected = "Test content"; 80 | std::string path = createTempFile("valid.txt", expected); 81 | 82 | std::string content = read_file_or_throw(path); 83 | 84 | EXPECT_EQ(content, expected); 85 | } 86 | 87 | // Test read_file_or_throw with non-existent file throws 88 | TEST_F(UtilsTest, ReadFileOrThrowThrows) { 89 | EXPECT_THROW(read_file_or_throw("/nonexistent/file.txt"), std::runtime_error); 90 | } 91 | 92 | // Test formatAPIError with valid JSON error 93 | TEST_F(UtilsTest, FormatAPIErrorValidJSON) { 94 | std::string raw_error = R"({ 95 | "error": { 96 | "type": "not_found_error", 97 | "message": "Model not found: model: invalid-model-name" 98 | } 99 | })"; 100 | 101 | std::string formatted = formatAPIError(raw_error); 102 | 103 | EXPECT_THAT(formatted, testing::HasSubstr("Invalid model")); 104 | EXPECT_THAT(formatted,
testing::HasSubstr("invalid-model-name")); 105 | } 106 | 107 | // Test formatAPIError with generic error message 108 | TEST_F(UtilsTest, FormatAPIErrorGenericMessage) { 109 | std::string raw_error = R"({ 110 | "error": { 111 | "type": "rate_limit_error", 112 | "message": "Rate limit exceeded. Please try again later." 113 | } 114 | })"; 115 | 116 | std::string formatted = formatAPIError(raw_error); 117 | 118 | EXPECT_EQ(formatted, "Rate limit exceeded. Please try again later."); 119 | } 120 | 121 | // Test formatAPIError with not_found_error but no model info 122 | TEST_F(UtilsTest, FormatAPIErrorNotFoundNoModel) { 123 | std::string raw_error = R"({ 124 | "error": { 125 | "type": "not_found_error", 126 | "message": "Resource not found" 127 | } 128 | })"; 129 | 130 | std::string formatted = formatAPIError(raw_error); 131 | 132 | EXPECT_THAT(formatted, testing::HasSubstr("Model not found")); 133 | } 134 | 135 | // Test formatAPIError with invalid JSON returns raw 136 | TEST_F(UtilsTest, FormatAPIErrorInvalidJSON) { 137 | std::string raw_error = "This is not JSON"; 138 | 139 | std::string formatted = formatAPIError(raw_error); 140 | 141 | EXPECT_EQ(formatted, raw_error); 142 | } 143 | 144 | // Test formatAPIError with JSON embedded in text 145 | TEST_F(UtilsTest, FormatAPIErrorJSONInText) { 146 | std::string raw_error = 147 | R"(API Error: {"error": {"message": "Authentication failed"}})"; 148 | 149 | std::string formatted = formatAPIError(raw_error); 150 | 151 | EXPECT_EQ(formatted, "Authentication failed"); 152 | } 153 | 154 | // Test formatAPIError with empty error object 155 | TEST_F(UtilsTest, FormatAPIErrorEmptyError) { 156 | std::string raw_error = R"({"error": {}})"; 157 | 158 | std::string formatted = formatAPIError(raw_error); 159 | 160 | // Falls through to return raw error 161 | EXPECT_EQ(formatted, raw_error); 162 | } 163 | 164 | // Test formatAPIError with missing error key 165 | TEST_F(UtilsTest, FormatAPIErrorMissingErrorKey) { 166 | std::string 
raw_error = R"({"status": "error", "code": 500})"; 167 | 168 | std::string formatted = formatAPIError(raw_error); 169 | 170 | EXPECT_EQ(formatted, raw_error); 171 | } 172 | 173 | // Test reading actual fixture files 174 | TEST_F(UtilsTest, ReadFixtureFiles) { 175 | std::string config_path = getConfigFixture("valid_config.ini"); 176 | auto [success, content] = read_file(config_path); 177 | 178 | EXPECT_TRUE(success); 179 | EXPECT_THAT(content, testing::HasSubstr("[general]")); 180 | EXPECT_THAT(content, testing::HasSubstr("[openai]")); 181 | } 182 | 183 | // Test TempConfigFile helper 184 | TEST(TempConfigFileTest, CreatesAndCleansUp) { 185 | std::string path; 186 | { 187 | TempConfigFile temp("test content"); 188 | path = temp.path(); 189 | 190 | // File should exist 191 | EXPECT_TRUE(std::filesystem::exists(path)); 192 | 193 | // Content should be readable 194 | std::ifstream file(path); 195 | std::string content((std::istreambuf_iterator<char>(file)), 196 | std::istreambuf_iterator<char>()); 197 | EXPECT_EQ(content, "test content"); 198 | } 199 | 200 | // File should be deleted after scope 201 | EXPECT_FALSE(std::filesystem::exists(path)); 202 | } 203 | -------------------------------------------------------------------------------- /NIX.md: -------------------------------------------------------------------------------- 1 | # Nix Development Environment 2 | 3 | This project supports Nix flakes for reproducible development environments. 4 | 5 | ## Prerequisites 6 | 7 | 1. Install Nix with flakes enabled: 8 | 9 | ```bash 10 | # Install Nix (if not already installed) 11 | curl -L https://nixos.org/nix/install | sh 12 | 13 | # Enable flakes (add to ~/.config/nix/nix.conf or /etc/nix/nix.conf) 14 | echo "experimental-features = nix-command flakes" >> ~/.config/nix/nix.conf 15 | ``` 16 | 17 | 2.
(Optional) Install direnv for automatic environment loading: 18 | 19 | ```bash 20 | # On macOS 21 | brew install direnv 22 | 23 | # On Linux 24 | # Use your package manager, e.g., apt, dnf, pacman 25 | 26 | # Add to your shell rc file (~/.bashrc, ~/.zshrc, etc.) 27 | eval "$(direnv hook bash)" # or zsh, fish, etc. 28 | ``` 29 | 30 | ## Getting Started 31 | 32 | ### Option 1: Using direnv (Recommended) 33 | 34 | If you have direnv installed, the environment will automatically load when you enter the project directory: 35 | 36 | ```bash 37 | cd pg_ai_query 38 | direnv allow # First time only 39 | ``` 40 | 41 | The environment will automatically activate and deactivate as you enter/leave the directory. 42 | 43 | ### Option 2: Manual Nix Shell 44 | 45 | Enter the development environment manually: 46 | 47 | ```bash 48 | # Full development environment 49 | nix develop 50 | 51 | # Minimal environment (fewer tools) 52 | nix develop .#minimal 53 | ``` 54 | 55 | ## What's Included 56 | 57 | The Nix development environment provides: 58 | 59 | ### Build Tools 60 | - CMake 3.x 61 | - GNU Make 62 | - Clang/LLVM toolchain with C++20 support 63 | - pkg-config 64 | - Git (for submodules) 65 | 66 | ### PostgreSQL 67 | - PostgreSQL 16 with development headers 68 | - `pg_config` utility 69 | - `psql` client 70 | - Database management tools (initdb, pg_ctl) 71 | 72 | ### Dependencies 73 | - OpenSSL 74 | - zlib 75 | - gettext (libintl) 76 | 77 | ### Rust Toolchain 78 | - Cargo (package manager) 79 | - rustc (Rust compiler) 80 | - rust-analyzer (LSP server) 81 | - rustfmt (code formatter) 82 | - clippy (linter) 83 | 84 | ### Documentation Tools 85 | - mdbook (main documentation generator) 86 | - mdbook-mermaid (diagrams) 87 | - mdbook-linkcheck (link validation) 88 | - mdbook-toc (table of contents) 89 | - mdbook-admonish (callout blocks) 90 | - graphviz, plantuml (diagram generation) 91 | 92 | ### Development Tools 93 | - GDB debugger 94 | - Valgrind (Linux only) 95 | - clang-format for 
code formatting 96 | - yamllint for YAML linting 97 | - markdownlint-cli for Markdown linting 98 | 99 | ### Utilities 100 | - ripgrep (rg) - Fast code search 101 | - fd - Fast file finder 102 | - jq - JSON processing 103 | 104 | ## Building the Project 105 | 106 | Once in the Nix environment: 107 | 108 | ### Build Extension 109 | 110 | ```bash 111 | # Initialize git submodules (if not already done) 112 | git submodule update --init --recursive 113 | 114 | # Create build directory 115 | mkdir -p build && cd build 116 | 117 | # Configure with CMake 118 | cmake .. 119 | 120 | # Build 121 | make 122 | 123 | # Install (optional - installs to PostgreSQL directory) 124 | sudo make install 125 | ``` 126 | 127 | ### Build Documentation 128 | 129 | ```bash 130 | # Serve documentation with live reload 131 | cd docs 132 | mdbook serve 133 | # Visit http://localhost:3000 134 | 135 | # Build static documentation 136 | mdbook build 137 | 138 | # Check for broken links 139 | mdbook test 140 | ``` 141 | 142 | ## Building the Nix Package 143 | 144 | You can also build the extension as a Nix package: 145 | 146 | ```bash 147 | # Build the package 148 | nix build 149 | 150 | # The result will be in ./result/ 151 | ls -la result/ 152 | 153 | # Install the package to your system (NixOS/nix-darwin) 154 | nix profile install . 155 | ``` 156 | 157 | ## Local PostgreSQL Instance 158 | 159 | The development environment sets up variables for a local PostgreSQL instance: 160 | 161 | ```bash 162 | # Initialize a local PostgreSQL database (inside nix develop) 163 | initdb -D .pgdata 164 | 165 | # Start PostgreSQL 166 | pg_ctl -D .pgdata -l logfile start 167 | 168 | # Connect to the database 169 | psql -d postgres 170 | 171 | # Stop PostgreSQL when done 172 | pg_ctl -D .pgdata stop 173 | ``` 174 | 175 | The local data directory (`.pgdata`) is gitignored. 
176 | 177 | ## Environment Variables 178 | 179 | The Nix shell automatically sets: 180 | 181 | ### PostgreSQL 182 | - `PGDATA`: Points to `.pgdata` in project root 183 | - `PGHOST`: Points to `.pgdata` (Unix socket) 184 | - `PGPORT`: Set to 5432 185 | - `PG_CONFIG`: Path to pg_config utility 186 | 187 | ### Build Environment 188 | - `CMAKE_EXPORT_COMPILE_COMMANDS`: Generates compile_commands.json for LSP 189 | 190 | ### Rust/Cargo 191 | - `CARGO_HOME`: Cargo installation directory (`.cargo`) 192 | - `RUSTUP_HOME`: Rustup installation directory (`.rustup`) 193 | - `RUST_BACKTRACE`: Enabled for better error messages 194 | 195 | ## IDE Integration 196 | 197 | ### VSCode 198 | 199 | The environment is compatible with VSCode's C++ extension. The `compile_commands.json` file is automatically generated for IntelliSense. 200 | 201 | ### Neovim/Vim 202 | 203 | For LSP support with clangd, the `compile_commands.json` is symlinked to the project root automatically. 204 | 205 | ## Troubleshooting 206 | 207 | ### Direnv not working 208 | 209 | Make sure direnv is properly set up: 210 | 211 | ```bash 212 | # Check if direnv is installed 213 | which direnv 214 | 215 | # Check if hook is in your shell rc 216 | grep direnv ~/.zshrc # or ~/.bashrc 217 | ``` 218 | 219 | ### Submodule issues 220 | 221 | If you get errors about missing submodules: 222 | 223 | ```bash 224 | git submodule update --init --recursive 225 | ``` 226 | 227 | ### PostgreSQL version mismatch 228 | 229 | The flake uses PostgreSQL 16 by default. 
If you need a different version, edit `flake.nix`: 230 | 231 | ```nix 232 | # Change this line 233 | postgresql = pkgs.postgresql_16; 234 | 235 | # To this (for PostgreSQL 15) 236 | postgresql = pkgs.postgresql_15; 237 | ``` 238 | 239 | Then reload the environment: 240 | 241 | ```bash 242 | direnv reload # if using direnv 243 | # or 244 | exit && nix develop # if using nix develop manually 245 | ``` 246 | 247 | ### Building on macOS 248 | 249 | On macOS, the extension builds as `.dylib` (for PostgreSQL 16+) or `.so` (older versions). The CMake configuration handles this automatically. 250 | 251 | ## Updating Dependencies 252 | 253 | To update the Nix dependencies: 254 | 255 | ```bash 256 | # Update flake inputs 257 | nix flake update 258 | 259 | # Or update specific input 260 | nix flake lock --update-input nixpkgs 261 | ``` 262 | 263 | ## CI/CD Integration 264 | 265 | The flake can be used in CI/CD pipelines: 266 | 267 | ```yaml 268 | # GitHub Actions example 269 | - name: Install Nix 270 | uses: cachix/install-nix-action@v22 271 | with: 272 | extra_nix_config: | 273 | experimental-features = nix-command flakes 274 | 275 | - name: Build 276 | run: nix build 277 | ``` 278 | 279 | ## Additional Resources 280 | 281 | - [Nix Manual](https://nixos.org/manual/nix/stable/) 282 | - [Nix Flakes](https://nixos.wiki/wiki/Flakes) 283 | - [direnv Documentation](https://direnv.net/) 284 | -------------------------------------------------------------------------------- /tests/sql/test_extension_functions.sql: -------------------------------------------------------------------------------- 1 | -- pg_ai_query Extension Function Tests 2 | -- These tests verify the extension functions work correctly 3 | -- Run with: psql -f tests/sql/test_extension_functions.sql 4 | 5 | -- Test 1: Extension is installed 6 | DO $$ 7 | BEGIN 8 | IF NOT EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_ai_query') THEN 9 | RAISE EXCEPTION 'FAIL: Extension pg_ai_query is not installed'; 10 | ELSE 11 
| RAISE NOTICE 'PASS: Extension pg_ai_query is installed'; 12 | END IF; 13 | END $$; 14 | 15 | -- Test 2: generate_query function exists 16 | DO $$ 17 | BEGIN 18 | IF NOT EXISTS ( 19 | SELECT 1 FROM pg_proc p 20 | JOIN pg_namespace n ON p.pronamespace = n.oid 21 | WHERE p.proname = 'generate_query' 22 | ) THEN 23 | RAISE EXCEPTION 'FAIL: Function generate_query does not exist'; 24 | ELSE 25 | RAISE NOTICE 'PASS: Function generate_query exists'; 26 | END IF; 27 | END $$; 28 | 29 | -- Test 3: get_database_tables function exists 30 | DO $$ 31 | BEGIN 32 | IF NOT EXISTS ( 33 | SELECT 1 FROM pg_proc p 34 | JOIN pg_namespace n ON p.pronamespace = n.oid 35 | WHERE p.proname = 'get_database_tables' 36 | ) THEN 37 | RAISE EXCEPTION 'FAIL: Function get_database_tables does not exist'; 38 | ELSE 39 | RAISE NOTICE 'PASS: Function get_database_tables exists'; 40 | END IF; 41 | END $$; 42 | 43 | -- Test 4: get_table_details function exists 44 | DO $$ 45 | BEGIN 46 | IF NOT EXISTS ( 47 | SELECT 1 FROM pg_proc p 48 | JOIN pg_namespace n ON p.pronamespace = n.oid 49 | WHERE p.proname = 'get_table_details' 50 | ) THEN 51 | RAISE EXCEPTION 'FAIL: Function get_table_details does not exist'; 52 | ELSE 53 | RAISE NOTICE 'PASS: Function get_table_details exists'; 54 | END IF; 55 | END $$; 56 | 57 | -- Test 5: explain_query function exists 58 | DO $$ 59 | BEGIN 60 | IF NOT EXISTS ( 61 | SELECT 1 FROM pg_proc p 62 | JOIN pg_namespace n ON p.pronamespace = n.oid 63 | WHERE p.proname = 'explain_query' 64 | ) THEN 65 | RAISE EXCEPTION 'FAIL: Function explain_query does not exist'; 66 | ELSE 67 | RAISE NOTICE 'PASS: Function explain_query exists'; 68 | END IF; 69 | END $$; 70 | 71 | -- Test 6: get_database_tables returns valid JSON 72 | DO $$ 73 | DECLARE 74 | result TEXT; 75 | json_result JSONB; 76 | BEGIN 77 | SELECT get_database_tables() INTO result; 78 | 79 | -- Try to parse as JSON 80 | BEGIN 81 | json_result := result::jsonb; 82 | 83 | -- Check it's an array 84 | IF 
jsonb_typeof(json_result) != 'array' THEN 85 | RAISE EXCEPTION 'FAIL: get_database_tables did not return JSON array'; 86 | END IF; 87 | 88 | RAISE NOTICE 'PASS: get_database_tables returns valid JSON array'; 89 | EXCEPTION WHEN OTHERS THEN 90 | RAISE EXCEPTION 'FAIL: get_database_tables returned invalid JSON: %', SQLERRM; 91 | END; 92 | END $$; 93 | 94 | -- Test 7: get_database_tables returns tables (if any exist) 95 | DO $$ 96 | DECLARE 97 | result JSONB; 98 | table_names TEXT[]; 99 | table_count INTEGER; 100 | BEGIN 101 | SELECT get_database_tables()::jsonb INTO result; 102 | 103 | -- Extract table names 104 | SELECT array_agg(elem->>'table_name') 105 | INTO table_names 106 | FROM jsonb_array_elements(result) AS elem; 107 | 108 | -- Count tables (may be 0 if no user tables exist in database) 109 | table_count := COALESCE(array_length(table_names, 1), 0); 110 | 111 | IF table_count > 0 THEN 112 | RAISE NOTICE 'PASS: get_database_tables returned % tables', table_count; 113 | ELSE 114 | -- This is OK - the database might not have any user tables 115 | RAISE NOTICE 'SKIP: No user tables found in database (this is OK for empty databases)'; 116 | END IF; 117 | END $$; 118 | 119 | -- Test 8: get_table_details returns valid JSON for existing table 120 | DO $$ 121 | DECLARE 122 | result TEXT; 123 | json_result JSONB; 124 | BEGIN 125 | -- First check if test table exists 126 | IF EXISTS ( 127 | SELECT 1 FROM information_schema.tables 128 | WHERE table_schema = 'pg_ai_test' AND table_name = 'users' 129 | ) THEN 130 | SELECT get_table_details('users', 'pg_ai_test') INTO result; 131 | 132 | BEGIN 133 | json_result := result::jsonb; 134 | 135 | -- Check it has expected keys 136 | IF NOT (json_result ? 'table_name' AND json_result ? 
'columns') THEN 137 | RAISE EXCEPTION 'FAIL: get_table_details missing expected keys'; 138 | END IF; 139 | 140 | RAISE NOTICE 'PASS: get_table_details returns valid JSON with expected structure'; 141 | EXCEPTION WHEN OTHERS THEN 142 | RAISE EXCEPTION 'FAIL: get_table_details returned invalid JSON: %', SQLERRM; 143 | END; 144 | ELSE 145 | RAISE NOTICE 'SKIP: Test table pg_ai_test.users does not exist'; 146 | END IF; 147 | END $$; 148 | 149 | -- Test 9: get_table_details includes column information 150 | DO $$ 151 | DECLARE 152 | result JSONB; 153 | column_count INTEGER; 154 | BEGIN 155 | IF EXISTS ( 156 | SELECT 1 FROM information_schema.tables 157 | WHERE table_schema = 'pg_ai_test' AND table_name = 'users' 158 | ) THEN 159 | SELECT get_table_details('users', 'pg_ai_test')::jsonb INTO result; 160 | 161 | SELECT jsonb_array_length(result->'columns') INTO column_count; 162 | 163 | IF column_count < 1 THEN 164 | RAISE EXCEPTION 'FAIL: get_table_details returned no columns'; 165 | END IF; 166 | 167 | -- Check first column has expected properties 168 | IF NOT (result->'columns'->0 ? 'column_name' AND result->'columns'->0 ? 
'data_type') THEN 169 | RAISE EXCEPTION 'FAIL: Column missing expected properties'; 170 | END IF; 171 | 172 | RAISE NOTICE 'PASS: get_table_details returned % columns with proper structure', column_count; 173 | ELSE 174 | RAISE NOTICE 'SKIP: Test table pg_ai_test.users does not exist'; 175 | END IF; 176 | END $$; 177 | 178 | -- Test 10: generate_query fails gracefully without API key 179 | DO $$ 180 | DECLARE 181 | result TEXT; 182 | BEGIN 183 | -- This should return an error since no API key is configured 184 | BEGIN 185 | SELECT generate_query('show all users') INTO result; 186 | 187 | -- If we get here, check if it's an error message 188 | IF result LIKE '%API key%' OR result LIKE '%error%' THEN 189 | RAISE NOTICE 'PASS: generate_query returns appropriate error without API key'; 190 | ELSE 191 | -- It might have worked if there's a config file 192 | RAISE NOTICE 'INFO: generate_query returned: %', LEFT(result, 100); 193 | END IF; 194 | EXCEPTION WHEN OTHERS THEN 195 | -- Expected behavior - function may raise exception without API key 196 | RAISE NOTICE 'PASS: generate_query raises exception without API key: %', SQLERRM; 197 | END; 198 | END $$; 199 | 200 | -- Summary 201 | DO $$ 202 | BEGIN 203 | RAISE NOTICE ''; 204 | RAISE NOTICE '========================================'; 205 | RAISE NOTICE 'pg_ai_query Extension Tests Complete'; 206 | RAISE NOTICE '========================================'; 207 | END $$; 208 | -------------------------------------------------------------------------------- /src/pg_ai_query.cpp: -------------------------------------------------------------------------------- 1 | extern "C" { 2 | #include 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | } 13 | 14 | #include 15 | 16 | #include "include/config.hpp" 17 | #include "include/query_generator.hpp" 18 | #include "include/response_formatter.hpp" 19 | 20 | extern "C" { 21 | PG_MODULE_MAGIC; 22 | 23 | 
PG_FUNCTION_INFO_V1(generate_query); 24 | PG_FUNCTION_INFO_V1(get_database_tables); 25 | PG_FUNCTION_INFO_V1(get_table_details); 26 | PG_FUNCTION_INFO_V1(explain_query); 27 | 28 | /** 29 | * generate_query(natural_language_query text, api_key text DEFAULT NULL, 30 | * provider text DEFAULT 'auto') 31 | * 32 | * Generates a SQL query from natural language input with automatic schema 33 | * discovery Provider options: 'openai', 'anthropic', 'auto' (auto-select based 34 | * on config) 35 | */ 36 | Datum generate_query(PG_FUNCTION_ARGS) { 37 | try { 38 | text* nl_query_arg = PG_GETARG_TEXT_PP(0); 39 | text* api_key_arg = PG_ARGISNULL(1) ? nullptr : PG_GETARG_TEXT_PP(1); 40 | text* provider_arg = PG_ARGISNULL(2) ? nullptr : PG_GETARG_TEXT_PP(2); 41 | 42 | std::string nl_query = text_to_cstring(nl_query_arg); 43 | std::string api_key = api_key_arg ? text_to_cstring(api_key_arg) : ""; 44 | std::string provider = 45 | provider_arg ? text_to_cstring(provider_arg) : "auto"; 46 | 47 | pg_ai::QueryRequest request{ 48 | .natural_language = nl_query, .api_key = api_key, .provider = provider}; 49 | 50 | auto result = pg_ai::QueryGenerator::generateQuery(request); 51 | 52 | if (!result.success) { 53 | ereport(ERROR, (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), 54 | errmsg("Query generation failed: %s", 55 | result.error_message.c_str()))); 56 | } 57 | 58 | const auto& config = pg_ai::config::ConfigManager::getConfig(); 59 | 60 | std::string formatted_response = 61 | pg_ai::ResponseFormatter::formatResponse(result, config); 62 | 63 | if (result.generated_query.empty()) { 64 | ereport(INFO, (errmsg("%s", result.explanation.c_str()))); 65 | PG_RETURN_TEXT_P(cstring_to_text("")); 66 | } 67 | 68 | PG_RETURN_TEXT_P(cstring_to_text(formatted_response.c_str())); 69 | } catch (const std::exception& e) { 70 | ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), 71 | errmsg("Internal error: %s", e.what()))); 72 | PG_RETURN_NULL(); 73 | } 74 | } 75 | 76 | /** 77 | * get_database_tables() 78 | * 
79 | * Returns JSON array of all tables in the database with their schema info 80 | */ 81 | Datum get_database_tables(PG_FUNCTION_ARGS) { 82 | try { 83 | auto result = pg_ai::QueryGenerator::getDatabaseTables(); 84 | 85 | if (!result.success) { 86 | ereport(ERROR, (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), 87 | errmsg("Failed to get database tables: %s", 88 | result.error_message.c_str()))); 89 | } 90 | 91 | nlohmann::json json_result = nlohmann::json::array(); 92 | 93 | for (const auto& table : result.tables) { 94 | nlohmann::json table_json; 95 | table_json["table_name"] = table.table_name; 96 | table_json["schema_name"] = table.schema_name; 97 | table_json["table_type"] = table.table_type; 98 | table_json["estimated_rows"] = table.estimated_rows; 99 | json_result.push_back(table_json); 100 | } 101 | 102 | std::string json_string = json_result.dump(2); 103 | PG_RETURN_TEXT_P(cstring_to_text(json_string.c_str())); 104 | 105 | } catch (const std::exception& e) { 106 | ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), 107 | errmsg("Internal error: %s", e.what()))); 108 | PG_RETURN_NULL(); 109 | } 110 | } 111 | 112 | /** 113 | * get_table_details(table_name text, schema_name text DEFAULT 'public') 114 | * 115 | * Returns detailed JSON information about a specific table including columns, 116 | * constraints, and indexes 117 | */ 118 | Datum get_table_details(PG_FUNCTION_ARGS) { 119 | try { 120 | text* table_name_arg = PG_GETARG_TEXT_PP(0); 121 | text* schema_name_arg = PG_ARGISNULL(1) ? nullptr : PG_GETARG_TEXT_PP(1); 122 | 123 | std::string table_name = text_to_cstring(table_name_arg); 124 | std::string schema_name = 125 | schema_name_arg ? 
text_to_cstring(schema_name_arg) : "public"; 126 | 127 | auto result = 128 | pg_ai::QueryGenerator::getTableDetails(table_name, schema_name); 129 | 130 | if (!result.success) { 131 | ereport(ERROR, (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), 132 | errmsg("Failed to get table details: %s", 133 | result.error_message.c_str()))); 134 | } 135 | 136 | nlohmann::json json_result; 137 | json_result["table_name"] = result.table_name; 138 | json_result["schema_name"] = result.schema_name; 139 | 140 | nlohmann::json columns = nlohmann::json::array(); 141 | for (const auto& column : result.columns) { 142 | nlohmann::json column_json; 143 | column_json["column_name"] = column.column_name; 144 | column_json["data_type"] = column.data_type; 145 | column_json["is_nullable"] = column.is_nullable; 146 | column_json["column_default"] = column.column_default; 147 | column_json["is_primary_key"] = column.is_primary_key; 148 | column_json["is_foreign_key"] = column.is_foreign_key; 149 | if (!column.foreign_table.empty()) { 150 | column_json["foreign_table"] = column.foreign_table; 151 | column_json["foreign_column"] = column.foreign_column; 152 | } 153 | columns.push_back(column_json); 154 | } 155 | json_result["columns"] = columns; 156 | 157 | json_result["indexes"] = result.indexes; 158 | 159 | std::string json_string = json_result.dump(2); 160 | PG_RETURN_TEXT_P(cstring_to_text(json_string.c_str())); 161 | 162 | } catch (const std::exception& e) { 163 | ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), 164 | errmsg("Internal error: %s", e.what()))); 165 | PG_RETURN_NULL(); 166 | } 167 | } 168 | 169 | /** 170 | * explain_query(query_text text, api_key text DEFAULT NULL, 171 | * provider text DEFAULT 'auto') 172 | * 173 | * Runs EXPLAIN ANALYZE on a query and returns an AI-generated explanation 174 | * of the execution plan, performance insights, and optimization suggestions. 
175 | */ 176 | Datum explain_query(PG_FUNCTION_ARGS) { 177 | try { 178 | text* query_text_arg = PG_GETARG_TEXT_PP(0); 179 | text* api_key_arg = PG_ARGISNULL(1) ? nullptr : PG_GETARG_TEXT_PP(1); 180 | text* provider_arg = PG_ARGISNULL(2) ? nullptr : PG_GETARG_TEXT_PP(2); 181 | 182 | std::string query_text = text_to_cstring(query_text_arg); 183 | std::string api_key = api_key_arg ? text_to_cstring(api_key_arg) : ""; 184 | std::string provider = 185 | provider_arg ? text_to_cstring(provider_arg) : "auto"; 186 | 187 | pg_ai::ExplainRequest request{ 188 | .query_text = query_text, .api_key = api_key, .provider = provider}; 189 | 190 | auto result = pg_ai::QueryGenerator::explainQuery(request); 191 | 192 | if (!result.success) { 193 | ereport(ERROR, (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION), 194 | errmsg("Query explanation failed: %s", 195 | result.error_message.c_str()))); 196 | } 197 | 198 | PG_RETURN_TEXT_P(cstring_to_text(result.ai_explanation.c_str())); 199 | } catch (const std::exception& e) { 200 | ereport(ERROR, (errcode(ERRCODE_INTERNAL_ERROR), 201 | errmsg("Internal error: %s", e.what()))); 202 | PG_RETURN_NULL(); 203 | } 204 | } 205 | } -------------------------------------------------------------------------------- /docs/src/quick-start.md: -------------------------------------------------------------------------------- 1 | # Quick Start Guide 2 | 3 | This guide will get you up and running with `pg_ai_query` in just a few minutes. We'll walk through setting up the extension, configuring it, and generating your first AI-powered SQL queries. 
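The extension reads its settings from `~/.pg_ai.config`, which you will create in Step 2. A quick shell check — just a convenience sketch, not part of the extension — tells you whether one already exists:

```shell
# Check for a pg_ai_query config file in the default location (~/.pg_ai.config).
config_path="$HOME/.pg_ai.config"
if [ -f "$config_path" ]; then
  status="config found at $config_path"
else
  status="no config yet - Step 2 shows how to create one"
fi
echo "$status"
```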
4 | 5 | ## Step 1: Install and Enable the Extension 6 | 7 | First, make sure you have `pg_ai_query` installed (see [Installation Guide](./installation.md) if needed), then enable it in your database: 8 | 9 | ```sql 10 | -- Connect to your database 11 | psql -d your_database 12 | 13 | -- Enable the extension 14 | CREATE EXTENSION IF NOT EXISTS pg_ai_query; 15 | 16 | -- Verify installation 17 | \df generate_query 18 | ``` 19 | 20 | You should see the `generate_query` function listed. 21 | 22 | ## Step 2: Configure API Access 23 | 24 | Create a configuration file at `~/.pg_ai.config`: 25 | 26 | ```ini 27 | # Quick start configuration 28 | [general] 29 | enable_logging = true 30 | 31 | [openai] 32 | api_key = "your-openai-api-key-here" 33 | default_model = "gpt-4o" 34 | ``` 35 | 36 | **Get your OpenAI API key**: 37 | 1. Visit [platform.openai.com](https://platform.openai.com) 38 | 2. Sign up or log in 39 | 3. Create an API key in the API keys section 40 | 4. Replace `your-openai-api-key-here` with your actual key 41 | 42 | ## Step 3: Create Sample Data 43 | 44 | Let's create some sample tables to work with: 45 | 46 | ```sql 47 | -- Create a users table 48 | CREATE TABLE users ( 49 | id SERIAL PRIMARY KEY, 50 | name VARCHAR(100) NOT NULL, 51 | email VARCHAR(150) UNIQUE NOT NULL, 52 | age INTEGER, 53 | created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 54 | city VARCHAR(50) 55 | ); 56 | 57 | -- Create an orders table 58 | CREATE TABLE orders ( 59 | id SERIAL PRIMARY KEY, 60 | user_id INTEGER REFERENCES users(id), 61 | product_name VARCHAR(200), 62 | amount DECIMAL(10,2), 63 | status VARCHAR(20) DEFAULT 'pending', 64 | order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP 65 | ); 66 | 67 | -- Insert sample data 68 | INSERT INTO users (name, email, age, city) VALUES 69 | ('Alice Johnson', 'alice@example.com', 28, 'New York'), 70 | ('Bob Smith', 'bob@example.com', 35, 'San Francisco'), 71 | ('Carol Davis', 'carol@example.com', 31, 'Chicago'), 72 | ('David Wilson', 
'david@example.com', 27, 'Seattle'), 73 | ('Eva Brown', 'eva@example.com', 33, 'Boston'); 74 | 75 | INSERT INTO orders (user_id, product_name, amount, status) VALUES 76 | (1, 'Laptop Pro', 1299.99, 'completed'), 77 | (1, 'Wireless Mouse', 79.99, 'completed'), 78 | (2, 'Monitor 4K', 399.99, 'pending'), 79 | (3, 'Keyboard Mechanical', 159.99, 'completed'), 80 | (4, 'Tablet', 599.99, 'shipped'), 81 | (5, 'Phone Case', 29.99, 'completed'); 82 | ``` 83 | 84 | ## Step 4: Generate Your First Queries 85 | 86 | Now let's use natural language to query our data: 87 | 88 | ### Basic Query 89 | ```sql 90 | -- Simple user lookup 91 | SELECT generate_query('show me all users'); 92 | ``` 93 | 94 | **Result:** 95 | ```sql 96 | SELECT id, name, email, age, created_at, city FROM public.users LIMIT 1000; 97 | ``` 98 | 99 | ### Query with Conditions 100 | ```sql 101 | -- Users from specific city 102 | SELECT generate_query('find all users from New York'); 103 | ``` 104 | 105 | **Result:** 106 | ```sql 107 | SELECT id, name, email, age, created_at, city 108 | FROM public.users 109 | WHERE city = 'New York' 110 | LIMIT 1000; 111 | ``` 112 | 113 | ### Query with Joins 114 | ```sql 115 | -- Orders with user information 116 | SELECT generate_query('show me all orders with customer names'); 117 | ``` 118 | 119 | **Result:** 120 | ```sql 121 | SELECT o.id, o.product_name, o.amount, o.status, o.order_date, u.name as customer_name 122 | FROM public.orders o 123 | JOIN public.users u ON o.user_id = u.id 124 | LIMIT 1000; 125 | ``` 126 | 127 | ### Aggregation Query 128 | ```sql 129 | -- Revenue analysis 130 | SELECT generate_query('calculate total revenue by order status'); 131 | ``` 132 | 133 | **Result:** 134 | ```sql 135 | SELECT status, SUM(amount) as total_revenue 136 | FROM public.orders 137 | GROUP BY status 138 | LIMIT 1000; 139 | ``` 140 | 141 | ## Step 5: Execute Generated Queries 142 | 143 | The extension generates SQL queries that you can execute immediately: 144 | 145 | ```sql 146 | 
-- Generate the query text (generate_query returns SQL as text; it does not execute it) 147 | WITH generated AS ( 148 | SELECT generate_query('show users older than 30') as query 149 | ) 150 | SELECT query FROM generated; 151 | 152 | -- Copy the generated query and run it: 153 | SELECT id, name, email, age, created_at, city 154 | FROM public.users 155 | WHERE age > 30 156 | LIMIT 1000; 157 | ``` 158 | 159 | ## Step 6: Explore Advanced Features 160 | 161 | ### Schema Discovery 162 | Check what tables the extension can see: 163 | 164 | ```sql 165 | -- View all tables in your database 166 | SELECT get_database_tables(); 167 | 168 | -- Get detailed information about a specific table 169 | SELECT get_table_details('users'); 170 | ``` 171 | 172 | ### Different AI Providers 173 | If you have multiple providers configured: 174 | 175 | ```sql 176 | -- Use specific provider 177 | SELECT generate_query('show recent orders', null, 'openai'); 178 | SELECT generate_query('show recent orders', null, 'anthropic'); 179 | SELECT generate_query('show recent orders', null, 'gemini'); 180 | ``` 181 | 182 | ### Complex Queries 183 | Try more complex natural language requests: 184 | 185 | ```sql 186 | -- Complex aggregation 187 | SELECT generate_query('show top 3 customers by total order amount with their contact info'); 188 | 189 | -- Date-based filtering 190 | SELECT generate_query('find orders placed in the last 7 days'); 191 | 192 | -- Multiple conditions 193 | SELECT generate_query('show users from California or New York who are older than 25'); 194 | ``` 195 | 196 | ## Common Patterns 197 | 198 | ### 1. Exploratory Data Analysis 199 | ```sql 200 | -- Understand your data structure 201 | SELECT generate_query('describe the structure of my database'); 202 | SELECT generate_query('show me sample data from each table'); 203 | SELECT generate_query('count records in all tables'); 204 | ``` 205 | 206 | ### 2. 
Business Intelligence 207 | ```sql 208 | -- Sales analysis 209 | SELECT generate_query('monthly revenue trends'); 210 | SELECT generate_query('top selling products'); 211 | SELECT generate_query('customer acquisition by month'); 212 | ``` 213 | 214 | ### 3. Data Quality Checks 215 | ```sql 216 | -- Find data issues 217 | SELECT generate_query('find users with missing email addresses'); 218 | SELECT generate_query('show duplicate orders'); 219 | SELECT generate_query('find orders without valid user references'); 220 | ``` 221 | 222 | ## Troubleshooting Quick Start 223 | 224 | ### Error: "Extension not found" 225 | ```sql 226 | -- Solution: Install the extension first 227 | CREATE EXTENSION pg_ai_query; 228 | ``` 229 | 230 | ### Error: "API key not configured" 231 | - Check your `~/.pg_ai.config` file exists 232 | - Verify the API key is valid 233 | - Restart your PostgreSQL session 234 | 235 | ### Error: "No tables found" 236 | - Make sure you have user tables (not just system tables) 237 | - Check table permissions 238 | - Try creating the sample tables from Step 3 239 | 240 | ### Unexpected Results 241 | - Enable logging to see what's happening: 242 | ```ini 243 | [general] 244 | enable_logging = true 245 | log_level = "DEBUG" 246 | ``` 247 | - Check the generated query makes sense 248 | - Try simpler natural language descriptions 249 | 250 | ## Next Steps 251 | 252 | Now that you're up and running: 253 | 254 | 1. **Learn more about [Usage Patterns](./usage.md)** for advanced query generation 255 | 2. **Explore [Examples](./examples.md)** for inspiration 256 | 3. **Read about [AI Providers](./providers.md)** to optimize your model choice 257 | 4. 
**Check [Best Practices](./best-practices.md)** for production usage 258 | 259 | ## Getting Help 260 | 261 | If you run into issues: 262 | - Check the [Troubleshooting Guide](./troubleshooting.md) 263 | - Review the [FAQ](./faq.md) 264 | - Look at more [Examples](./examples.md) for inspiration -------------------------------------------------------------------------------- /tests/unit/test_provider_selector.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include "../test_helpers.hpp" 5 | #include "include/config.hpp" 6 | #include "include/provider_selector.hpp" 7 | 8 | using namespace pg_ai; 9 | using namespace pg_ai::config; 10 | using namespace pg_ai::test_utils; 11 | 12 | class ProviderSelectorTest : public ::testing::Test { 13 | protected: 14 | void SetUp() override { 15 | // Reset config state and load a valid config with both providers for most 16 | // tests 17 | ConfigManager::reset(); 18 | ConfigManager::loadConfig(getConfigFixture("valid_config.ini")); 19 | } 20 | }; 21 | 22 | // Test explicit OpenAI provider selection with API key parameter 23 | TEST_F(ProviderSelectorTest, ExplicitOpenAIWithApiKey) { 24 | auto result = ProviderSelector::selectProvider("sk-param-key", "openai"); 25 | 26 | EXPECT_TRUE(result.success); 27 | EXPECT_EQ(result.provider, Provider::OPENAI); 28 | EXPECT_EQ(result.api_key, "sk-param-key"); 29 | EXPECT_EQ(result.api_key_source, "parameter"); 30 | EXPECT_TRUE(result.error_message.empty()); 31 | } 32 | 33 | // Test explicit Anthropic provider selection with API key parameter 34 | TEST_F(ProviderSelectorTest, ExplicitAnthropicWithApiKey) { 35 | auto result = ProviderSelector::selectProvider("sk-ant-param", "anthropic"); 36 | 37 | EXPECT_TRUE(result.success); 38 | EXPECT_EQ(result.provider, Provider::ANTHROPIC); 39 | EXPECT_EQ(result.api_key, "sk-ant-param"); 40 | EXPECT_EQ(result.api_key_source, "parameter"); 41 | } 42 | 43 | // Test explicit provider selection 
falls back to config key 44 | TEST_F(ProviderSelectorTest, ExplicitProviderUsesConfigKey) { 45 | auto result = ProviderSelector::selectProvider("", "openai"); 46 | 47 | EXPECT_TRUE(result.success); 48 | EXPECT_EQ(result.provider, Provider::OPENAI); 49 | EXPECT_EQ(result.api_key, 50 | "sk-test-openai-key-12345"); // from valid_config.ini 51 | EXPECT_EQ(result.api_key_source, "openai_config"); 52 | } 53 | 54 | // Test auto-selection with API key defaults to OpenAI 55 | TEST_F(ProviderSelectorTest, AutoSelectWithKeyDefaultsToOpenAI) { 56 | auto result = ProviderSelector::selectProvider("sk-auto-key", ""); 57 | 58 | EXPECT_TRUE(result.success); 59 | EXPECT_EQ(result.provider, Provider::OPENAI); 60 | EXPECT_EQ(result.api_key, "sk-auto-key"); 61 | EXPECT_EQ(result.api_key_source, "parameter"); 62 | } 63 | 64 | // Test auto-selection without key uses config (OpenAI first) 65 | TEST_F(ProviderSelectorTest, AutoSelectWithoutKeyUsesConfig) { 66 | auto result = ProviderSelector::selectProvider("", ""); 67 | 68 | EXPECT_TRUE(result.success); 69 | EXPECT_EQ(result.provider, Provider::OPENAI); 70 | EXPECT_EQ(result.api_key, "sk-test-openai-key-12345"); 71 | EXPECT_EQ(result.api_key_source, "openai_config"); 72 | } 73 | 74 | // Test auto-selection falls back to Anthropic when no OpenAI key 75 | TEST_F(ProviderSelectorTest, AutoSelectFallsBackToAnthropic) { 76 | // Load config with only Anthropic 77 | ConfigManager::loadConfig(getConfigFixture("anthropic_only.ini")); 78 | 79 | auto result = ProviderSelector::selectProvider("", ""); 80 | 81 | EXPECT_TRUE(result.success); 82 | EXPECT_EQ(result.provider, Provider::ANTHROPIC); 83 | EXPECT_EQ(result.api_key, "sk-ant-only-key"); 84 | EXPECT_EQ(result.api_key_source, "anthropic_config"); 85 | } 86 | 87 | // Test auto-selection falls back to Gemini when configured 88 | TEST_F(ProviderSelectorTest, AutoSelectFallsBackToGemini) { 89 | // Load config with only Gemini 90 | ConfigManager::loadConfig(getConfigFixture("gemini_only.ini")); 91 | 
92 | auto result = ProviderSelector::selectProvider("", ""); 93 | 94 | EXPECT_TRUE(result.success); 95 | EXPECT_EQ(result.provider, Provider::GEMINI); 96 | EXPECT_EQ(result.api_key, "AIzaSyTest-gemini-key-12345"); 97 | EXPECT_EQ(result.api_key_source, "gemini_config"); 98 | } 99 | 100 | // Test explicit Gemini provider selection with API key parameter 101 | TEST_F(ProviderSelectorTest, ExplicitGeminiWithApiKey) { 102 | auto result = ProviderSelector::selectProvider("AIzaSyTest-key", "gemini"); 103 | 104 | EXPECT_TRUE(result.success); 105 | EXPECT_EQ(result.provider, Provider::GEMINI); 106 | EXPECT_EQ(result.api_key, "AIzaSyTest-key"); 107 | EXPECT_EQ(result.api_key_source, "parameter"); 108 | } 109 | 110 | // Test failure when no API key available 111 | TEST_F(ProviderSelectorTest, FailsWhenNoApiKeyAvailable) { 112 | // Load empty config 113 | ConfigManager::loadConfig(getConfigFixture("empty_config.ini")); 114 | 115 | auto result = ProviderSelector::selectProvider("", ""); 116 | 117 | EXPECT_FALSE(result.success); 118 | EXPECT_FALSE(result.error_message.empty()); 119 | EXPECT_THAT(result.error_message, testing::HasSubstr("API key required")); 120 | } 121 | 122 | // Test explicit provider fails without key 123 | TEST_F(ProviderSelectorTest, ExplicitProviderFailsWithoutKey) { 124 | // Load empty config 125 | ConfigManager::loadConfig(getConfigFixture("empty_config.ini")); 126 | 127 | auto result = ProviderSelector::selectProvider("", "openai"); 128 | 129 | EXPECT_FALSE(result.success); 130 | EXPECT_THAT(result.error_message, testing::HasSubstr("No API key available")); 131 | EXPECT_THAT(result.error_message, testing::HasSubstr("openai")); 132 | } 133 | 134 | // Test that config pointer is set correctly 135 | TEST_F(ProviderSelectorTest, SetsConfigPointer) { 136 | auto result = ProviderSelector::selectProvider("sk-test", "anthropic"); 137 | 138 | EXPECT_TRUE(result.success); 139 | EXPECT_NE(result.config, nullptr); 140 | EXPECT_EQ(result.config->provider, 
Provider::ANTHROPIC); 141 | } 142 | 143 | // Test provider preference "auto" behaves like empty string 144 | TEST_F(ProviderSelectorTest, AutoPreferenceBehavesLikeEmpty) { 145 | auto result_auto = ProviderSelector::selectProvider("sk-test", "auto"); 146 | auto result_empty = ProviderSelector::selectProvider("sk-test", ""); 147 | 148 | // Both should default to OpenAI when key is provided 149 | EXPECT_EQ(result_auto.provider, result_empty.provider); 150 | EXPECT_EQ(result_auto.api_key_source, result_empty.api_key_source); 151 | } 152 | 153 | // Test case insensitivity of provider names would need lowercase conversion 154 | // Current implementation expects lowercase, so this tests the current behavior 155 | TEST_F(ProviderSelectorTest, ProviderNamesCaseSensitive) { 156 | auto result_lower = ProviderSelector::selectProvider("sk-test", "openai"); 157 | EXPECT_TRUE(result_lower.success); 158 | EXPECT_EQ(result_lower.provider, Provider::OPENAI); 159 | 160 | // Uppercase won't match - will go to auto-select 161 | auto result_upper = ProviderSelector::selectProvider("sk-test", "OPENAI"); 162 | EXPECT_TRUE(result_upper.success); 163 | // Goes to auto-select path, still picks OpenAI due to provided key 164 | EXPECT_EQ(result_upper.provider, Provider::OPENAI); 165 | } 166 | 167 | // Test with minimal config (only OpenAI) 168 | TEST_F(ProviderSelectorTest, MinimalConfigOpenAIOnly) { 169 | ConfigManager::loadConfig(getConfigFixture("minimal_config.ini")); 170 | 171 | // Auto-select should find OpenAI 172 | auto result = ProviderSelector::selectProvider("", ""); 173 | EXPECT_TRUE(result.success); 174 | EXPECT_EQ(result.provider, Provider::OPENAI); 175 | EXPECT_EQ(result.api_key, "sk-minimal-test-key"); 176 | 177 | // Explicit Anthropic should fail (no key) 178 | auto result_anthropic = ProviderSelector::selectProvider("", "anthropic"); 179 | EXPECT_FALSE(result_anthropic.success); 180 | } 181 | 182 | // Test that ProviderSelectionResult has expected default values 183 | 
TEST(ProviderSelectionResultTest, DefaultValues) { 184 | ProviderSelectionResult result; 185 | 186 | // Verify default initialization 187 | EXPECT_FALSE(result.success); 188 | EXPECT_TRUE(result.api_key.empty()); 189 | EXPECT_TRUE(result.api_key_source.empty()); 190 | EXPECT_TRUE(result.error_message.empty()); 191 | EXPECT_EQ(result.config, nullptr); 192 | EXPECT_EQ(result.provider, config::Provider::OPENAI); 193 | } 194 | -------------------------------------------------------------------------------- /flake.nix: -------------------------------------------------------------------------------- 1 | { 2 | description = "PostgreSQL AI Query Extension - Natural language to SQL with AI"; 3 | 4 | inputs = { 5 | nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable"; 6 | flake-utils.url = "github:numtide/flake-utils"; 7 | }; 8 | 9 | outputs = { self, nixpkgs, flake-utils }: 10 | flake-utils.lib.eachDefaultSystem (system: 11 | let 12 | pkgs = nixpkgs.legacyPackages.${system}; 13 | 14 | # PostgreSQL version to use 15 | postgresql = pkgs.postgresql_16; 16 | 17 | # Build the extension 18 | pg-ai-query = pkgs.stdenv.mkDerivation { 19 | pname = "pg_ai_query"; 20 | version = "1.0.0"; 21 | 22 | src = ./.; 23 | 24 | nativeBuildInputs = with pkgs; [ 25 | cmake 26 | pkg-config 27 | postgresql 28 | ]; 29 | 30 | buildInputs = with pkgs; [ 31 | openssl 32 | zlib 33 | gettext 34 | ]; 35 | 36 | # Initialize git submodules 37 | preConfigure = '' 38 | if [ -d .git ]; then 39 | git submodule update --init --recursive || true 40 | fi 41 | ''; 42 | 43 | cmakeFlags = [ 44 | "-DCMAKE_BUILD_TYPE=Release" 45 | "-DPG_CONFIG=${postgresql}/bin/pg_config" 46 | ]; 47 | 48 | installPhase = '' 49 | mkdir -p $out/lib 50 | mkdir -p $out/share/extension 51 | 52 | # Install the shared library 53 | if [ -f pg_ai_query.dylib ]; then 54 | cp pg_ai_query.dylib $out/lib/ 55 | elif [ -f pg_ai_query.so ]; then 56 | cp pg_ai_query.so $out/lib/ 57 | fi 58 | 59 | # Install SQL and control files 60 | cp 
${./sql/pg_ai_query--1.0.sql} $out/share/extension/ 61 | cp ${./pg_ai_query.control} $out/share/extension/ 62 | ''; 63 | 64 | meta = with pkgs.lib; { 65 | description = "PostgreSQL extension for generating SQL queries from natural language using AI"; 66 | homepage = "https://github.com/benodiwal/pg_ai_query"; 67 | license = licenses.mit; 68 | platforms = platforms.unix; 69 | }; 70 | }; 71 | 72 | in 73 | { 74 | packages = { 75 | default = pg-ai-query; 76 | pg-ai-query = pg-ai-query; 77 | }; 78 | 79 | # Development shell 80 | devShells.default = pkgs.mkShell { 81 | name = "pg-ai-query-dev"; 82 | 83 | buildInputs = with pkgs; [ 84 | # Build tools 85 | cmake 86 | gnumake 87 | pkg-config 88 | git 89 | 90 | # Compiler toolchain 91 | clang 92 | clang-tools 93 | lldb 94 | 95 | # PostgreSQL 96 | postgresql 97 | 98 | # Libraries 99 | openssl 100 | zlib 101 | gettext 102 | 103 | # Development tools 104 | gdb 105 | 106 | # Rust toolchain (for mdbook and other tools) 107 | cargo 108 | rustc 109 | rust-analyzer 110 | rustfmt 111 | clippy 112 | 113 | # Documentation tools 114 | mdbook 115 | mdbook-mermaid # Diagrams in mdbook 116 | mdbook-linkcheck # Check for broken links 117 | mdbook-toc # Table of contents 118 | mdbook-admonish # Callout blocks (notes, warnings) 119 | 120 | # Additional documentation tools 121 | graphviz # For generating diagrams 122 | plantuml # UML diagrams 123 | 124 | # Formatting and linting 125 | clang-tools # includes clang-format 126 | yamllint 127 | markdownlint-cli # Markdown linting 128 | 129 | # Utility tools 130 | ripgrep # Fast grep alternative 131 | fd # Fast find alternative 132 | jq # JSON processing 133 | ] ++ pkgs.lib.optionals pkgs.stdenv.isLinux [ 134 | # Linux-only tools (valgrind doesn't work on macOS ARM) 135 | pkgs.valgrind 136 | ]; 137 | 138 | shellHook = '' 139 | echo "PostgreSQL AI Query Extension Development Environment" 140 | echo "=====================================================" 141 | echo "" 142 | echo "Tool Versions:" 143 
| echo " PostgreSQL: ${postgresql.version}" 144 | echo " C++ Compiler: $(clang++ --version | head -n1)" 145 | echo " CMake: $(cmake --version | head -n1)" 146 | echo " Rust: $(rustc --version)" 147 | echo " Cargo: $(cargo --version)" 148 | echo " mdbook: $(mdbook --version)" 149 | echo "" 150 | echo "Available commands:" 151 | echo " C++ Build:" 152 | echo " - cmake, make, clang-format" 153 | echo " PostgreSQL:" 154 | echo " - pg_config, psql, initdb, pg_ctl" 155 | echo " Rust/Cargo:" 156 | echo " - cargo, rustc, rust-analyzer, rustfmt, clippy" 157 | echo " Documentation:" 158 | echo " - mdbook, mdbook-mermaid, mdbook-linkcheck" 159 | echo " Utilities:" 160 | echo " - rg (ripgrep), fd, jq, yamllint, markdownlint" 161 | echo "" 162 | echo "Quick start:" 163 | echo " Extension: mkdir -p build && cd build && cmake .. && make" 164 | echo " Docs: cd docs && mdbook serve" 165 | echo "" 166 | 167 | # Set up PostgreSQL environment 168 | export PGDATA="$PWD/.pgdata" 169 | export PGHOST="$PWD/.pgdata" 170 | export PGPORT=5432 171 | export PG_CONFIG="${postgresql}/bin/pg_config" 172 | 173 | # Add pg_config to PATH 174 | export PATH="${postgresql}/bin:$PATH" 175 | 176 | # Set up build environment 177 | export CMAKE_EXPORT_COMPILE_COMMANDS=1 178 | 179 | # For clangd LSP 180 | if [ ! 
-f compile_commands.json ] && [ -f build/compile_commands.json ]; then 181 | ln -sf build/compile_commands.json compile_commands.json 182 | fi 183 | 184 | # Rust environment 185 | export RUST_BACKTRACE=1 186 | export CARGO_HOME="$PWD/.cargo" 187 | export RUSTUP_HOME="$PWD/.rustup" 188 | 189 | echo "Environment variables set:" 190 | echo " PGDATA=$PGDATA" 191 | echo " PG_CONFIG=$PG_CONFIG" 192 | echo " CARGO_HOME=$CARGO_HOME" 193 | echo "" 194 | echo "PostgreSQL Setup:" 195 | echo " initdb -D .pgdata" 196 | echo " pg_ctl -D .pgdata -l logfile start" 197 | echo "" 198 | echo "Documentation:" 199 | echo " cd docs && mdbook serve" 200 | echo " View at: http://localhost:3000" 201 | echo "" 202 | ''; 203 | 204 | # Set C++ standard and other flags 205 | hardeningDisable = [ "fortify" ]; 206 | 207 | # Additional environment variables 208 | CMAKE_BUILD_TYPE = "Debug"; 209 | CMAKE_EXPORT_COMPILE_COMMANDS = "1"; 210 | }; 211 | 212 | # Alternative development shells 213 | devShells.minimal = pkgs.mkShell { 214 | name = "pg-ai-query-minimal"; 215 | 216 | buildInputs = with pkgs; [ 217 | cmake 218 | gnumake 219 | clang 220 | postgresql 221 | openssl 222 | zlib 223 | gettext 224 | ]; 225 | 226 | shellHook = '' 227 | echo "Minimal PostgreSQL AI Query development environment" 228 | export PG_CONFIG="${postgresql}/bin/pg_config" 229 | export PATH="${postgresql}/bin:$PATH" 230 | ''; 231 | }; 232 | } 233 | ); 234 | } 235 | -------------------------------------------------------------------------------- /.github/workflows/formatting.yml: -------------------------------------------------------------------------------- 1 | name: Code Formatting 2 | 3 | on: 4 | push: 5 | branches: [ main, develop ] 6 | pull_request: 7 | branches: [ main, develop ] 8 | 9 | jobs: 10 | format-check: 11 | name: Check Code Formatting 12 | runs-on: ubuntu-latest 13 | 14 | steps: 15 | - name: Checkout code 16 | uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 17 | 18 | - name: Install 
clang-format 19 | run: | 20 | sudo apt-get update 21 | sudo apt-get install -y clang-format-15 22 | 23 | - name: Check C++ formatting 24 | run: | 25 | echo "Checking C++ code formatting..." 26 | 27 | # Check if .clang-format exists, if not skip formatting 28 | if [ ! -f .clang-format ]; then 29 | echo "No .clang-format file found, skipping C++ formatting" 30 | exit 0 31 | fi 32 | 33 | # Find all C++ source files 34 | CPP_FILES=$(find src -name "*.cpp" -o -name "*.hpp" -o -name "*.h" 2>/dev/null || true) 35 | 36 | if [ -z "$CPP_FILES" ]; then 37 | echo "No C++ files found to format" 38 | exit 0 39 | fi 40 | 41 | # Check formatting 42 | echo "Files to check:" 43 | echo "$CPP_FILES" 44 | 45 | UNFORMATTED_FILES="" 46 | for file in $CPP_FILES; do 47 | echo "Checking $file..." 48 | if ! clang-format-15 --dry-run --Werror "$file" > /dev/null 2>&1; then 49 | UNFORMATTED_FILES="$UNFORMATTED_FILES $file" 50 | fi 51 | done 52 | 53 | if [ -n "$UNFORMATTED_FILES" ]; then 54 | echo "ERROR: The following files need formatting:" 55 | for file in $UNFORMATTED_FILES; do 56 | echo " - $file" 57 | done 58 | echo "" 59 | echo "To fix formatting issues, run:" 60 | echo " clang-format -i $UNFORMATTED_FILES" 61 | exit 1 62 | else 63 | echo "All C++ files are properly formatted" 64 | fi 65 | 66 | - name: Check SQL formatting 67 | run: | 68 | echo "Checking SQL formatting..." 69 | 70 | # Install sqlformat (part of sqlparse) 71 | pip install sqlparse 72 | 73 | # Find SQL files 74 | SQL_FILES=$(find . -name "*.sql" -not -path "./.git/*" -not -path "./build/*" 2>/dev/null || true) 75 | 76 | if [ -z "$SQL_FILES" ]; then 77 | echo "No SQL files found to check" 78 | exit 0 79 | fi 80 | 81 | echo "SQL files to check:" 82 | echo "$SQL_FILES" 83 | 84 | # Basic SQL validation 85 | for file in $SQL_FILES; do 86 | echo "Checking $file..." 87 | if ! 
python -c " 88 | import sqlparse 89 | import sys 90 | with open('$file', 'r') as f: 91 | content = f.read() 92 | try: 93 | parsed = sqlparse.parse(content) 94 | if not parsed: 95 | sys.exit(1) 96 | except Exception as e: 97 | print(f'Error parsing SQL: {e}') 98 | sys.exit(1) 99 | print('SQL syntax is valid') 100 | "; then 101 | echo "ERROR: SQL syntax error in $file" 102 | exit 1 103 | fi 104 | done 105 | 106 | auto-format: 107 | name: Auto-format Code (on push to main) 108 | runs-on: ubuntu-latest 109 | if: github.event_name == 'push' && github.ref == 'refs/heads/main' 110 | needs: format-check 111 | # Note: 'needs' means this job runs only after format-check succeeds; 112 | # the commit/push steps below are skipped when no files change 113 | 114 | steps: 115 | - name: Checkout code 116 | uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 117 | with: 118 | token: ${{ secrets.GITHUB_TOKEN }} 119 | 120 | - name: Install clang-format 121 | run: | 122 | sudo apt-get update 123 | sudo apt-get install -y clang-format-15 124 | 125 | - name: Format C++ code 126 | run: | 127 | echo "Auto-formatting C++ code..." 128 | 129 | # Check if .clang-format exists 130 | if [ ! 
-f .clang-format ]; then 131 | echo "No .clang-format file found, skipping C++ formatting" 132 | exit 0 133 | fi 134 | 135 | # Find and format C++ files 136 | CPP_FILES=$(find src -name "*.cpp" -o -name "*.hpp" -o -name "*.h" 2>/dev/null || true) 137 | 138 | if [ -n "$CPP_FILES" ]; then 139 | echo "Formatting C++ files:" 140 | echo "$CPP_FILES" 141 | clang-format-15 -i $CPP_FILES 142 | else 143 | echo "No C++ files to format" 144 | fi 145 | 146 | - name: Check for changes 147 | id: verify-changed-files 148 | run: | 149 | if [ -n "$(git status --porcelain)" ]; then 150 | echo "changed=true" >> $GITHUB_OUTPUT 151 | echo "Files were formatted:" 152 | git status --porcelain 153 | else 154 | echo "changed=false" >> $GITHUB_OUTPUT 155 | echo "No formatting changes needed" 156 | fi 157 | 158 | - name: Commit formatted code 159 | if: steps.verify-changed-files.outputs.changed == 'true' 160 | run: | 161 | git config --local user.email "action@github.com" 162 | git config --local user.name "GitHub Action" 163 | git add -A 164 | git commit -m "Auto-format code with clang-format 165 | 166 | Automated formatting applied by GitHub Actions" || exit 0 167 | 168 | - name: Push changes 169 | if: steps.verify-changed-files.outputs.changed == 'true' 170 | uses: ad-m/github-push-action@master 171 | with: 172 | github_token: ${{ secrets.GITHUB_TOKEN }} 173 | branch: main 174 | 175 | markdown-lint: 176 | name: Lint Markdown Files 177 | runs-on: ubuntu-latest 178 | 179 | steps: 180 | - name: Checkout code 181 | uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 182 | 183 | - name: Setup Node.js 184 | uses: actions/setup-node@395ad3262231945c25e8478fd5baf05154b1d79f # v6.1.0 185 | with: 186 | node-version: '18' 187 | 188 | - name: Install markdownlint 189 | run: npm install -g markdownlint-cli 190 | 191 | - name: Check markdownlint config exists 192 | run: | 193 | if [ ! 
-f .markdownlint.json ]; then 194 | echo "No .markdownlint.json found, using default rules" 195 | else 196 | echo "Using existing .markdownlint.json configuration" 197 | fi 198 | 199 | - name: Lint markdown files 200 | continue-on-error: true 201 | run: | 202 | echo "Linting markdown files..." 203 | MARKDOWN_FILES=$(find . -name "*.md" -not -path "./.git/*" -not -path "./node_modules/*" -not -path "./build/*" 2>/dev/null || true) 204 | 205 | if [ -z "$MARKDOWN_FILES" ]; then 206 | echo "No markdown files found to lint" 207 | exit 0 208 | fi 209 | 210 | echo "Markdown files to lint:" 211 | echo "$MARKDOWN_FILES" 212 | 213 | markdownlint $MARKDOWN_FILES || echo "Markdown linting completed with warnings" 214 | 215 | yaml-lint: 216 | name: Lint YAML Files 217 | runs-on: ubuntu-latest 218 | 219 | steps: 220 | - name: Checkout code 221 | uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1 222 | 223 | - name: Install yamllint 224 | run: | 225 | sudo apt-get update 226 | sudo apt-get install -y yamllint 227 | 228 | - name: Check yamllint config exists 229 | run: | 230 | if [ ! -f .yamllint.yml ]; then 231 | echo "No .yamllint.yml found, using default rules" 232 | else 233 | echo "Using existing .yamllint.yml configuration" 234 | fi 235 | 236 | - name: Lint YAML files 237 | continue-on-error: true 238 | run: | 239 | echo "Linting YAML files..." 240 | YAML_FILES=$(find . 
\( -name "*.yml" -o -name "*.yaml" \) -not -path "./.git/*" -not -path "./build/*" 2>/dev/null || true) 241 | 242 | if [ -z "$YAML_FILES" ]; then 243 | echo "No YAML files found to lint" 244 | exit 0 245 | fi 246 | 247 | echo "YAML files to lint:" 248 | echo "$YAML_FILES" 249 | 250 | yamllint $YAML_FILES || echo "YAML linting completed with warnings" -------------------------------------------------------------------------------- /docs/src/explain-query.md: -------------------------------------------------------------------------------- 1 | # explain_query Function 2 | 3 | The `explain_query` function runs EXPLAIN ANALYZE on PostgreSQL queries and provides AI-powered performance analysis and optimization recommendations. 4 | 5 | ## Overview 6 | 7 | This function combines PostgreSQL's built-in EXPLAIN ANALYZE functionality with advanced AI analysis to provide: 8 | - Detailed execution plan analysis 9 | - Performance bottleneck identification 10 | - Index recommendations 11 | - Query optimization suggestions 12 | - Easy-to-understand explanations in plain English 13 | 14 | ## Function Signature 15 | 16 | ```sql 17 | explain_query( 18 | query_text text, 19 | api_key text DEFAULT NULL, 20 | provider text DEFAULT 'auto' 21 | ) RETURNS text 22 | ``` 23 | 24 | ## Parameters 25 | 26 | | Parameter | Type | Default | Description | 27 | |-----------|------|---------|-------------| 28 | | `query_text` | `text` | *required* | The SQL query to analyze | 29 | | `api_key` | `text` | `NULL` | OpenAI or Anthropic API key (uses config if not provided) | 30 | | `provider` | `text` | `'auto'` | AI provider: `'openai'`, `'anthropic'`, or `'auto'` | 31 | 32 | ## Basic Usage 33 | 34 | ### Simple Query Analysis 35 | 36 | ```sql 37 | SELECT explain_query('SELECT * FROM users WHERE created_at > NOW() - INTERVAL ''7 days'''); 38 | ``` 39 | 40 | ### Complex Query Analysis 41 | 42 | ```sql 43 | SELECT explain_query(' 44 | SELECT u.username, COUNT(o.id) as order_count 45 | FROM users u 46 | LEFT JOIN 
orders o ON u.id = o.user_id 47 | WHERE u.created_at > NOW() - INTERVAL ''30 days'' 48 | GROUP BY u.id, u.username 49 | HAVING COUNT(o.id) > 5 50 | ORDER BY order_count DESC 51 | LIMIT 100 52 | '); 53 | ``` 54 | 55 | ### Using Specific AI Provider 56 | 57 | ```sql 58 | -- Use OpenAI specifically 59 | SELECT explain_query( 60 | 'SELECT * FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10', 61 | 'your-openai-api-key', 62 | 'openai' 63 | ); 64 | 65 | -- Use Anthropic specifically 66 | SELECT explain_query( 67 | 'SELECT * FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10', 68 | 'your-anthropic-api-key', 69 | 'anthropic' 70 | ); 71 | ``` 72 | 73 | ## Output Format 74 | 75 | The function returns a structured text analysis with these sections: 76 | 77 | ### Query Overview 78 | Brief description of what the query accomplishes. 79 | 80 | ### Performance Summary 81 | - Overall execution time 82 | - Total cost estimate (PostgreSQL's relative measure) 83 | - Number of rows processed 84 | 85 | ### Execution Plan Analysis 86 | - Key operations in the execution plan 87 | - Join strategies and scan methods 88 | - Focus on expensive operations 89 | 90 | ### Performance Issues 91 | - Identified bottlenecks 92 | - Inefficient operations 93 | - Resource usage concerns 94 | 95 | ### Optimization Suggestions 96 | - Specific recommendations for improvement 97 | - Query rewriting suggestions 98 | - Configuration recommendations 99 | 100 | ### Index Recommendations 101 | - Missing indexes that could improve performance 102 | - Specific `CREATE INDEX` statements 103 | - Partial index suggestions where applicable 104 | 105 | ## Example Output 106 | 107 | ``` 108 | Query Overview: 109 | This query retrieves users created within the last 7 days along with their order statistics, 110 | focusing on active customers with more than 5 orders. 
111 | 112 | Performance Summary: 113 | - Overall Execution Time: 45.2 milliseconds 114 | - Total Cost: 1250.75 (PostgreSQL's relative cost estimate) 115 | - Rows Processed: 156 rows returned from 50,000 rows examined 116 | 117 | Execution Plan Analysis: 118 | - Hash Join: Efficiently joins users and orders tables 119 | - Index Scan: Uses existing index on users.created_at 120 | - Sequential Scan: Full table scan on orders table (potential bottleneck) 121 | - HashAggregate: Groups results for COUNT calculations 122 | - Sort: Orders results by order count 123 | 124 | Performance Issues: 125 | - Sequential scan on orders table indicates missing index on user_id 126 | - Hash join spills to disk due to large orders table size 127 | - HAVING clause applied after aggregation, could be optimized 128 | 129 | Optimization Suggestions: 130 | 1. Add index on orders.user_id to eliminate sequential scan 131 | 2. Consider partitioning orders table by date if very large 132 | 3. Move some HAVING conditions to WHERE clause if possible 133 | 4. Increase work_mem setting if hash joins frequently spill to disk 134 | 135 | Index Recommendations: 136 | -- Primary recommendation 137 | CREATE INDEX idx_orders_user_id ON orders(user_id); 138 | 139 | -- Optional: Composite index for better performance 140 | CREATE INDEX idx_orders_user_date ON orders(user_id, created_at); 141 | 142 | -- Consider partial index for recent orders 143 | CREATE INDEX idx_orders_recent ON orders(user_id) WHERE created_at > NOW() - INTERVAL '1 year'; 144 | ``` 145 | 146 | ## Supported Query Types 147 | 148 | The function supports analysis of: 149 | - `SELECT` statements 150 | - `WITH` (Common Table Expressions) queries 151 | - `VALUES` clauses 152 | 153 | > **Note**: Only read-only queries are supported for security reasons. DDL, DML, and other statement types will return an error. 154 | 155 | ## Configuration 156 | 157 | The function uses the same configuration system as other pg_ai_query functions. 
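API keys and the provider choice can also be supplied per call instead of through the config file. A minimal sketch (the key value is a placeholder; never hard-code real keys in SQL scripts):

```sql
-- Per-call override: the second and third arguments replace the
-- configured api_key and provider for this invocation only
SELECT explain_query(
    'SELECT count(*) FROM users',
    'sk-your-api-key',   -- placeholder key
    'anthropic'
);
```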
158 | 159 | ### API Keys 160 | 161 | Configure API keys in `~/.pg_ai.config`: 162 | 163 | ```ini 164 | [openai] 165 | api_key = "your-openai-api-key" 166 | default_model = "gpt-4o" 167 | 168 | [anthropic] 169 | api_key = "your-anthropic-api-key" 170 | default_model = "claude-sonnet-4-5-20250929" 171 | ``` 172 | 173 | ### Provider Selection 174 | 175 | - `'auto'` (default): Uses the first available API key from configuration 176 | - `'openai'`: Forces use of OpenAI models 177 | - `'anthropic'`: Forces use of Anthropic models 178 | 179 | ## Error Handling 180 | 181 | Common error scenarios: 182 | 183 | ### Invalid Query 184 | ```sql 185 | SELECT explain_query('SELECT * FROM non_existent_table'); 186 | -- Error: relation "non_existent_table" does not exist 187 | ``` 188 | 189 | ### Missing API Key 190 | ```sql 191 | -- When no API key is configured 192 | SELECT explain_query('SELECT * FROM users'); 193 | -- Error: API key required. Pass as parameter or configure ~/.pg_ai.config 194 | ``` 195 | 196 | ### Syntax Error 197 | ```sql 198 | SELECT explain_query('SELECT * FORM users'); -- typo in FROM 199 | -- Error: syntax error at or near "FORM" 200 | ``` 201 | 202 | ### Unsupported Query Type 203 | ```sql 204 | SELECT explain_query('DROP TABLE users'); 205 | -- Error: Only SELECT, WITH, and VALUES queries are allowed 206 | ``` 207 | 208 | ## Performance Considerations 209 | 210 | - **Query Execution**: The function actually executes your query via EXPLAIN ANALYZE 211 | - **Execution Time**: Query execution time is included in the analysis 212 | - **AI Processing**: AI analysis typically adds 1-3 seconds of processing time 213 | - **Large Queries**: Very complex queries may take longer to analyze 214 | 215 | ## Security Notes 216 | 217 | - Queries are executed with the same permissions as the calling user 218 | - No privilege escalation occurs 219 | - API keys are handled securely and not logged 220 | - Only read-only query types are permitted 221 | 222 
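Because `explain_query` executes the statement via EXPLAIN ANALYZE, a practical workflow is to capture a baseline analysis, apply a recommended index, and re-run the same call to confirm the improvement. A sketch using illustrative table and index names:

```sql
-- Baseline: expect the report to flag a sequential scan on orders
SELECT explain_query('SELECT * FROM orders WHERE user_id = 42');

-- Apply the index suggested in the report
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Re-analyze: the plan should now show an index scan instead
SELECT explain_query('SELECT * FROM orders WHERE user_id = 42');
```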
| ## Best Practices 223 | 224 | 1. **Use for Optimization**: Run on queries that are performing slowly 225 | 2. **Test Variations**: Compare different approaches to the same query 226 | 3. **Monitor Trends**: Regular analysis helps track performance changes over time 227 | 4. **Validate Recommendations**: Always test suggested indexes before implementing in production 228 | 5. **Consider Data Volume**: Remember that performance characteristics change with data size 229 | 230 | ## Integration with Other Functions 231 | 232 | The `explain_query` function works well with other pg_ai_query functions: 233 | 234 | ```sql 235 | -- Generate a query, then analyze its performance 236 | WITH generated_query AS ( 237 | SELECT generate_query('show recent high-value orders') as sql 238 | ) 239 | SELECT explain_query((SELECT sql FROM generated_query)); 240 | ``` 241 | 242 | ## Common Use Cases 243 | 244 | - **Query Optimization**: Identify and fix slow queries 245 | - **Index Planning**: Determine what indexes to create 246 | - **Performance Monitoring**: Regular health checks of critical queries 247 | - **Code Reviews**: Analyze query performance before deployment 248 | - **Learning**: Understand how PostgreSQL executes different query patterns 249 | 250 | ## See Also 251 | 252 | - [generate_query Function](./generate-query.md) 253 | - [Database Tables](./database-tables.md) 254 | - [Table Details](./table-details.md) 255 | - [Error Codes](./error-codes.md) --------------------------------------------------------------------------------