├── SUPPORT.md
├── GOVERNANCE.md
├── .github
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── ISSUE_TEMPLATE
│       ├── feature_request.md
│       └── bug_report.md
├── ATTRIBUTION.md
├── THIRD_PARTY_LICENSES.md
├── ROADMAP.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
└── ARCHITECTURE.md


/SUPPORT.md:
--------------------------------------------------------------------------------
# Support

- **Q&A:** GitHub Discussions
- **Bugs / Features:** GitHub Issues
- **Slack:** Link to Slack

--------------------------------------------------------------------------------
/GOVERNANCE.md:
--------------------------------------------------------------------------------
# Project Governance

- **Maintainers:** @rnik12 (More are welcome)
- **Decision making:** Lazy consensus on issues/PRs. If no consensus, maintainers decide.

--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
## What & Why
(Briefly describe the change and the problem it solves.)

## How Tested
- [ ] Unit tests
- [ ] Manual testing
- [ ] Screenshots (if UI)

## Checklist
- [ ] Conventional commit
- [ ] Docs updated (if needed)
- [ ] No secrets or keys in code

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
---
name: Feature request
about: Idea or enhancement
labels: enhancement
---

**Problem**
What pain or limitation are you hitting?

**Proposal**
What would you like rrHog to do?

**Alternatives considered**

**Additional context / mockups**

--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
---
name: Bug report
about: Something broke or behaves oddly
labels: bug
---

**What happened?**

**Steps to reproduce**
1.
2.

**Expected behavior**

**Environment**
- rrHog version:
- Deployment (Docker/Local):
- Browser/OS:

**Additional context / logs**

--------------------------------------------------------------------------------
/ATTRIBUTION.md:
--------------------------------------------------------------------------------
# Attribution

rrHog builds on fantastic open-source projects:

- **rrweb** — MIT
- **ClickHouse** — Apache-2.0
- **NATS** — Apache-2.0
- **PostgreSQL** — PostgreSQL License (BSD-style)

We respect each project’s license. If we vendor or redistribute their code/binaries, we will include their notices accordingly. See [THIRD_PARTY_LICENSES.md](THIRD_PARTY_LICENSES.md).

--------------------------------------------------------------------------------
/THIRD_PARTY_LICENSES.md:
--------------------------------------------------------------------------------
# Third-Party Licenses (Placeholder)

This document will list third-party components we **vendor or redistribute** (not merely depend on at runtime), along with their licenses and notices.

For now (runtime dependencies only):
- rrweb (MIT) — https://github.com/rrweb-io/rrweb
- ClickHouse (Apache-2.0) — https://github.com/ClickHouse/ClickHouse
- NATS (Apache-2.0) — https://github.com/nats-io/nats-server
- PostgreSQL (PostgreSQL License) — https://www.postgresql.org/about/licence/

If/when we vendor code, paste each project’s license text below this line.
--------------------------------------------------------------------------------
/ROADMAP.md:
--------------------------------------------------------------------------------
# Roadmap

## v0 (MVP)
- [ ] Ingest `/i` (batch rrweb) → NATS → ClickHouse bulk insert
- [ ] Playback API + rrweb-player UI
- [ ] Basic dashboards (pageviews, sessions, top pages)
- [ ] Per-project keys, rate limits
- [ ] Docker Compose + seed script

## v0.1–0.2
- [ ] Error events (console/network) + rage clicks
- [ ] Materialized views: retention, paths
- [ ] Quotas & retention policies per project
- [ ] S3/MinIO blob offload option for rrweb payloads

## v0.3+
- [ ] SSO (OIDC) & RBAC
- [ ] Multi-region ingest option
- [ ] Helm chart & Terraform examples
- [ ] Public demo with synthetic data

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
# Contributing to rrHog

Thanks for considering a contribution! 🙌

## Ground rules
- We use the **MIT** license. By contributing, you agree your contribution is licensed under MIT.
- Follow **Conventional Commits** (`feat:`, `fix:`, `docs:`, `chore:`…).
- Keep PRs focused and add/adjust tests where it makes sense.

## How to start
1. Comment on an issue (look for **good first issue** / **help wanted**).
2. Discuss design for non-trivial changes.
3. Open a draft PR early for feedback.

## Dev quickstart (high level)
- API: FastAPI (Python 3.11)
- Worker: Python asyncio consumer
- Web: Next.js (Node 20+)
- Storage: ClickHouse & Postgres
- Queue: NATS JetStream

## Testing
- Prefer small, deterministic tests.
- Add fixtures for sample rrweb batches.
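
One way to provide such a fixture is sketched below, under stated assumptions. The envelope fields (`session_id`, `seq`, `events`) and the helper name `make_rrweb_batch` are hypothetical and are not rrHog's actual wire format; only the per-event `type`/`data`/`timestamp` shape follows rrweb. Fixed IDs and timestamps keep the output identical on every run.

```python
import json


def make_rrweb_batch(
    session_id: str = "00000000-0000-0000-0000-000000000001",
    seq: int = 0,
    start_ms: int = 1_700_000_000_000,
) -> dict:
    """Build a small, fully deterministic batch (no clocks, no random IDs)."""
    events = [
        # Intended as an rrweb Meta event (type 4); verify the type codes
        # against the rrweb version in use.
        {
            "type": 4,
            "data": {"href": "https://example.test/", "width": 1280, "height": 720},
            "timestamp": start_ms,
        },
        # Intended as an IncrementalSnapshot (type 3) carrying a simplified,
        # illustrative payload rather than a schema-validated one.
        {
            "type": 3,
            "data": {"source": 2, "type": 2, "id": 1, "x": 10, "y": 20},
            "timestamp": start_ms + 50,
        },
    ]
    # Hypothetical envelope: the real ingest format is not specified here.
    return {"session_id": session_id, "seq": seq, "events": events}


def serialize(batch: dict) -> str:
    """Serialize with sorted keys so byte-level comparisons are stable."""
    return json.dumps(batch, sort_keys=True, separators=(",", ":"))
```

In a pytest suite this helper could be wrapped with `@pytest.fixture` so that each test receives a fresh copy of the batch.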

## Code style
- Python: black + ruff
- TS/JS: eslint + prettier

## Community
Be kind, be constructive. See our [Code of Conduct](CODE_OF_CONDUCT.md).

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 rrHog

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# rrHog — OSS Web Analytics & Session Replay (rrweb + ClickHouse)

> Open-source, self-hosted web analytics & session monitoring built on **rrweb** with **async ingestion** for rock-solid reliability.
> Stack: **FastAPI** (ingest/read) • **NATS JetStream** (queue) • **ClickHouse** (events) • **Postgres** (meta) • **Next.js** (UI)

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-Architecture-brightgreen)](ARCHITECTURE.md)
[![Good First Issues](https://img.shields.io/github/labels/YOUR_ORG/rrHog/good%20first%20issue)](#)

**Why rrHog?** Analytics pipelines that write straight to the database can lose data when the database or network hiccups. rrHog **ACKs fast** and persists to a **durable queue** first, so ingestion keeps flowing even if storage is down.

## Features
- 🌀 **Async ingest** → instant `202` to clients
- 🎥 **Session replay** with rrweb & rrweb-player
- ⚡ **ClickHouse** analytics (blazing-fast queries)
- 🧰 **Next.js** dashboard (App Router)
- 🐳 **Docker Compose** one-box deploy
- 🔒 Per-project keys, rate limits, and basic PII guards

## Quick start (high level)
1. `docker compose up -d` (services: nginx, nats, clickhouse, postgres, api, worker, web)
2. Create a project → get **WRITE** and **READ** keys.
3. Add rrweb snippet to your site and post batched segments to `/i`.

See **[ARCHITECTURE.md](ARCHITECTURE.md)** for schemas, mermaid diagrams, and example API/worker snippets.

## Repo metadata (GitHub)
- **Description:** `rrHog — Open-source, self-hosted web analytics & session replay built on rrweb.
  FastAPI + NATS JetStream + ClickHouse, Next.js UI.`
- **Topics (11):** `open-source, self-hosted, web-analytics, session-replay, session-recording, rrweb, clickhouse, fastapi, nats-jetstream, postgresql, nextjs`

## Community
- Start here: [CONTRIBUTING.md](CONTRIBUTING.md)
- Code of Conduct: [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)
- Support: [SUPPORT.md](SUPPORT.md)
- Roadmap: [ROADMAP.md](ROADMAP.md)

## Security
Please report vulnerabilities privately — see [SECURITY.md](SECURITY.md).

## License
MIT © 2025 You & Contributors — see [LICENSE](LICENSE).

> 💖 Want to support? Add a **GitHub Sponsors** button by creating `.github/FUNDING.yml` later.

--------------------------------------------------------------------------------
/ARCHITECTURE.md:
--------------------------------------------------------------------------------
# Architecture Overview

rrHog is optimized for **reliable ingestion** and **minimal ops**. The API **acknowledges immediately**, then a worker asynchronously writes to storage.
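
The ack-then-persist flow can be illustrated with a small simulation. This is only a sketch: `asyncio.Queue` stands in for the durable queue and a plain list stands in for storage, so nothing here is actually durable, and the function names (`ingest`, `worker`, `demo`) are hypothetical.

```python
import asyncio
import json


async def ingest(queue: asyncio.Queue, batch: dict) -> int:
    """Enqueue the batch and acknowledge without touching storage."""
    await queue.put(json.dumps(batch))
    return 202  # the client is released here, before any storage write


async def worker(queue: asyncio.Queue, storage: list) -> None:
    """Drain the queue and write to storage, independent of request handling."""
    while True:
        raw = await queue.get()
        storage.append(json.loads(raw))  # a bulk insert would happen here
        queue.task_done()


async def demo() -> tuple[int, int, list]:
    queue: asyncio.Queue = asyncio.Queue()  # stand-in for the durable queue
    storage: list = []                      # stand-in for the database
    status = await ingest(queue, {"session_id": "s1", "seq": 0, "events": []})
    stored_at_ack = len(storage)  # still empty when the ack is returned
    task = asyncio.create_task(worker(queue, storage))
    await queue.join()  # wait until the worker has processed the batch
    task.cancel()
    return status, stored_at_ack, storage
```

Running `asyncio.run(demo())` shows that the acknowledgement is produced while storage is still empty, and that the batch reaches storage only once the worker runs.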


## Diagram

```mermaid
flowchart LR
  subgraph Client
    B["Browser — rrweb"]
  end
  subgraph Edge
    NG["Nginx or Kong"]
  end
  subgraph App
    A["FastAPI (ingest & read)"]
    Q["NATS JetStream"]
    W["Python Worker"]
  end
  subgraph Data
    CH["ClickHouse"]
    PG["Postgres"]
  end
  subgraph UI
    NX["Next.js UI"]
  end

  B --> NG --> A
  A --> Q
  Q --> W
  W --> CH
  W --> PG
  NX --> A
  A --> CH
  A --> PG
```

## Ingest sequence

```mermaid
sequenceDiagram
  participant Browser
  participant Edge
  participant API
  participant NATS
  participant Worker
  participant CH

  Browser->>Edge: POST /i (batched rrweb)
  Edge->>API: proxy request
  API->>NATS: publish(batch)
  API-->>Browser: 202 Accepted
  Worker->>NATS: pull(batch)
  Worker->>CH: bulk insert (segments/events)
  Worker-->>NATS: ack
```

## Components (concise)

* **Nginx**: edge proxy, gzip/brotli, per-project rate limits, forwards to API `/i`.
* **FastAPI**:

  * **Ingest**: validate + publish to `events.rrweb` → return `202`.
  * **Read**: playback & analytics endpoints (ClickHouse/Postgres).
  * **Auth**: per-project **write** and **read** keys; session JWT for UI.
* **NATS JetStream**: durable streams, at-least-once delivery, back-pressure.
* **Worker (Python, asyncio)**: pull, batch, bulk insert; idempotency by hashing payload.
* **ClickHouse**: raw rrweb segments + analytics events; TTL on raw.
* **Postgres**: projects, API keys, sessions meta, dashboards.
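
The worker's "idempotency by hashing payload" could look roughly like the following. The choice of SHA-256, the inclusion of project, session and sequence number in the hash input, and the `DedupingSink` class are assumptions made for illustration; this document does not specify how the hash is computed or where duplicates are filtered.

```python
import hashlib
import uuid


def event_id_for(project_id: int, session_id: str, seq: int, payload: bytes) -> uuid.UUID:
    """Derive a deterministic ID so a redelivered batch maps to the same value."""
    digest = hashlib.sha256()
    digest.update(f"{project_id}:{session_id}:{seq}:".encode())
    digest.update(payload)
    # Truncate to 16 bytes so the result fits a UUID-typed column.
    return uuid.UUID(bytes=digest.digest()[:16])


class DedupingSink:
    """In-memory stand-in for storage that skips IDs it has already written."""

    def __init__(self) -> None:
        self.rows: dict[uuid.UUID, bytes] = {}

    def insert(self, project_id: int, session_id: str, seq: int, payload: bytes) -> bool:
        """Return True if the row was written, False if it was a duplicate."""
        eid = event_id_for(project_id, session_id, seq, payload)
        if eid in self.rows:
            return False
        self.rows[eid] = payload
        return True
```

Because delivery is at least once, a redelivered message produces the same ID as its first delivery, which lets the write path recognise and skip it.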

## Storage schemas (starter)

**ClickHouse**

```sql
CREATE TABLE events_rrweb (
  project_id UInt32,
  session_id UUID,
  seq UInt32,
  ts DateTime64(3, 'UTC'),
  payload String CODEC(ZSTD(9)),
  event_id UUID,
  ingest_ts DateTime DEFAULT now()
)
ENGINE=MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (project_id, session_id, ts, seq)
TTL ts + INTERVAL 90 DAY;

CREATE TABLE events (
  project_id UInt32,
  session_id UUID,
  user_id Nullable(String),
  event_name LowCardinality(String),
  ts DateTime64(3, 'UTC'),
  url String,
  props JSON,
  event_id UUID
)
ENGINE=MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (project_id, event_name, ts, session_id);
```

**Postgres**

```sql
CREATE TABLE projects (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  write_key TEXT UNIQUE NOT NULL,
  read_key TEXT UNIQUE NOT NULL
);

CREATE TABLE sessions_meta (
  project_id INT REFERENCES projects(id),
  session_id UUID,
  started_at TIMESTAMPTZ,
  user_id TEXT,
  ua TEXT,
  PRIMARY KEY (project_id, session_id)
);
```

## NATS JetStream (subjects/consumers)

* Stream: `EVENTS`

  * Subjects: `events.rrweb`, `events.analytics`
  * Storage: file, retention: limits, replicas: 1 (MVP)
* Consumer: `WORKER` (durable)

  * `max_ack_pending` tuned to worker capacity
  * `max_deliver` 10 (redelivery retries)

## Ops & Observability

* `/healthz` endpoints
* Basic Prometheus metrics (queue lag, batch size, CH latency)
* ClickHouse TTL to control disk (e.g., keep raw rrweb 60–90 days)
* PII masking on client (rrweb plugins), allowlist domains/selectors

## Trade-offs

* **NATS vs Kafka/Redpanda**: NATS is lighter → great for single host.
  Move to Redpanda later if you need the Kafka ecosystem.
* **Single-node ClickHouse**: perfect for MVP; add replicas and ClickHouse Keeper later for HA.

--------------------------------------------------------------------------------